Handle SimpleXMLElement error if bad url given - php

I am getting some products via some xml files daily.
Sometimes there are 4 files, sometimes there are up to 10.
I want to process them in a loop with SimpleXMLElement from an URL.
Here is what I try:
for ($i = 1; $i <= 10; $i++) {
    // URL pattern inferred from the warning shown below
    $file = "http://example.com/file_{$i}.xml";
    try {
        $SimpleXML = new \MFF_System\SimpleXMLExtended($file, LIBXML_NOCDATA, true);
    } catch (Exception $e) {
        var_dump($e);
    }
}
Unfortunately, I get this warning:
SimpleXMLElement::__construct(): http://example.com/file_6.xml:42:
parser error : Input is not proper UTF-8, indicate encoding ! Bytes:
0xE1 0x6D 0x6F 0x67
In this case there were only 5 files, so requesting the 6th returned the site's main page instead. I have also tried to suppress the warning with @, without success.
$SimpleXML = @new \MFF_System\SimpleXMLExtended($file, LIBXML_NOCDATA, true);
Is there any way to handle these errors in order to avoid them stopping my script?
EDIT
I cannot use file_get_contents because then I hit the memory limit, and I cannot increase it. The files are very large; one of them is 1 GB, because the developer who produces them embeds the binary product images in the XML (and I have no way to contact him).
EDIT2
In the PHP documentation I've read that $var = @new some_class(); should work.
If I use:
$SimpleXML = @new \MFF_System\SimpleXMLExtended($file, LIBXML_NOCDATA, true);
I get this:
Exception: String could not be parsed as XML in ........\Parser.php on line 19
I thought this was much better, because it is an Exception. But if I wrap the call in try/catch, I still get the same error.
try {
    $SimpleXML = @new \MFF_System\SimpleXMLExtended($file, LIBXML_NOCDATA, true);
} catch (Exception $e) {
    die('Exception');
}
\MFF_System\SimpleXMLExtended just extends the SimpleXMLElement class and adds a method for inserting CDATA, nothing special.
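One way to keep the loop running is sketched below, under stated assumptions: the helper name tryLoadXml is hypothetical, and the stock SimpleXMLElement stands in for \MFF_System\SimpleXMLExtended. libxml_use_internal_errors(true) makes libxml collect parser errors instead of emitting warnings, and the constructor's exception is caught so the caller can skip the bad file. Note that inside a namespace, catch (Exception $e) refers to a class in that namespace, so the global \Exception is used here; that may be why the try/catch above did not appear to work.

```php
<?php
// Hypothetical helper: returns the parsed document, or null if parsing failed.
function tryLoadXml(string $source, bool $isUrl = false): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        return new SimpleXMLElement($source, LIBXML_NOCDATA, $isUrl);
    } catch (\Exception $e) {
        // Log what libxml complained about, then let the caller move on.
        foreach (libxml_get_errors() as $error) {
            error_log(trim($error->message) . ' (line ' . $error->line . ')');
        }
        return null;
    } finally {
        // Restore the previous error-handling mode for the rest of the script.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}
```

In the loop, a null result would simply mean "skip this file", for example if (($xml = tryLoadXml($file, true)) === null) { continue; }.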

Related

Smarty returns an empty result every now and then

I am facing a weird problem using Smarty. I am generating an email's body through a template. Most of the time it works as expected, but from time to time the returned data is empty. I do not see any error in my logs, nor do I catch any exception. It is just as if the template were empty.
This is the piece of code I am using to get the email's body:
// $data is an array with the template's data
// $tpl is the template's path
$s = new Smarty();
$s->assignArray( $data );
try {
    $body = $s->fetch( $tpl );
} catch ( \Exception $e ) {
    Debug::Log( $e->getMessage() );
}
// Sometimes $body is empty, but no exception is thrown.
I checked that the template has no errors, after all, it works in most cases.
I also saved $data contents when $body is empty and I ran the code manually to get $body content, but it worked, so I do not think the problem is related to template vars.
Another test I did is to try to process the template up to 5 times, sleeping for a second between the tries, but the result was always empty.
The template's cache path is writable.
I am using PHP 5.6.40, Smarty 3.1.21 and Apache2.
Can you give me a hand to debug this issue?
Update
I have been able to reproduce the problem. Smarty always returns an empty result whenever the fetch method is called after PHP detected that the client closed the connection. For example, take this code:
ignore_user_abort(1); // Continue running even if the connection is closed
set_time_limit(180); // 3 minutes
$s = new Smarty();
$s->assignArray( $data );
// Keep writing data until PHP realises that the connection was closed
while( 1 ) {
    if(connection_status() != CONNECTION_NORMAL || connection_aborted( ) ) {
        break;
    }
    echo "123456789";
}
$body = $s->fetch( $tpl );
if ( '' == $body ) {
    throw new Exception("Result is empty");
}
die('Code never reaches this point');
If I call the script above and I close the connection immediately, the result of the fetch method is always empty.
However, if PHP did not detect that the connection was closed, even though it really was, the result of fetch is not empty.
ignore_user_abort(1); // Continue running even if the connection is closed
set_time_limit(180); // 3 minutes
$s = new Smarty();
$s->assignArray( $data );
// Sleep to make sure the connection was closed
// PHP does not realise the connection is closed until it tries to write something
sleep( 60);
$body = $s->fetch( $tpl );
if ( '' == $body ) {
    throw new Exception("Result is empty");
}
echo "Now the result is not empty";
This is the code I used to call the above scripts:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://myhost/test.php');
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 1);
curl_exec($ch);
curl_close($ch);
echo "all done";
This seems to be related to this question: PHP ob_get_contents "sometimes" returns empty when it should not?
My script does a lot of things so it takes quite a long time to finish. Some users close their browser before the script finished, and that is when Smarty returns an empty result, as it uses ob_start a lot.
Best wishes,
Blanking the page without exceptions seems to be the way Smarty does everything.
I'm not familiar with Python, but I suspect that only exceptions are thrown, not notices. You might try checking at the end of your code for raised notices or other warnings, however that is done in Python.
It might still be a folder permission issue: have you also checked that the templates_c directory exists and has the right permissions? Or look for any {var.name} without the $.
It can be anything; Smarty never throws exceptions, it just blanks the page.
If that still does not help, create a basic, overly simplified template and run with it for some time to see whether the problem still happens. If it does not, the mistake is in your template.
As far as I'm concerned, it turns out to be a bug in PHP 5.6. I made some tests using print_r with the return flag set to true, and the result after closing the connection was never empty with PHP 7.0 and PHP 8.0. However, when I used PHP 5.6 the result was empty.
Example:
<?php
error_reporting( E_ALL );
ini_set('display_errors', 1);
ignore_user_abort(true); // (curl disconnects after 1 second)
ini_set('max_execution_time','180'); // 3 minutes
ini_set('memory_limit','512M'); // 512 MB
function testPrint_r($length)
{
    $test1 = array('TEST'=>'SOMETHING');
    $test2 = print_r($test1, true);
    $test3 = "Array\n(\n [TEST] => SOMETHING\n)\n";
    if(strcmp($test2, $test3)!==0) {
        throw new Exception("Print_r check failed, output length so far: ".$length);
        // consult your error.log then, or use some other reporting means
    }
}
$message = "123456789\n";
$length = strlen($message);
$total_length = 0;
while(1)
{
    echo $message;
    $total_length += $length;
    testPrint_r($total_length);
}
die('it should not get here');
Using PHP 5.6, if you call the script and close the connection, the Exception is thrown because print_r returns an empty result. However, using PHP 7.0 or PHP 8.0 the script keeps running until it reaches the maximum execution time.
Kind regards,

PHP simple HTML DOM parser errors

I've started writing a scraper for one site that will also have a crawler, since I need to go through some links, but I'm getting this error :
PHP Fatal error: Uncaught Error: Call to a member function find() on
null in D:\Projekti\hemrank\simple_html_dom.php:1129 Stack trace:
#0 D:\Projekti\hemrank\scrapeit.php(37): simple_html_dom->find('ul')
#1 D:\Projekti\hemrank\scrapeit.php(19): ScrapeIt->getAllAddresses()
#2 D:\Projekti\hemrank\scrapeit.php(55): ScrapeIt->run()
#3 {main} thrown in D:\Projekti\hemrank\simple_html_dom.php on line 1129
When I var_dump the $html variable I get the full html with all the tags, etc, that's why it's strange to me that it says "Call to a member function find() on null", when there's actually value in the $html. Here's the part of the code that's not working :
    $html = new simple_html_dom();
    $html->load_file($baseurl);
    if(empty($html)){echo "HTTP Response not received!<br/>\n";exit;}
    $links = array();
    foreach ($html->find('ul') as $ul) {
        if(!empty($ul) && (count($ul)>0))
            foreach ($ul->find('li') as $li) {
                if(!empty($li) && (count($li)>0))
                    foreach ($li->find('a') as $a) {
                        $links[] = $a->href;
                    }
                else
                    die("NOT AVAILABLE");
            }
    }
    return $links;
}
Is this a common problem with PHP simple HTML DOM parser, is there a solution or should I switch to some other kind of scraping?
I just searched for the lib you are using, this is line 1129:
return $this->root->find($selector, $idx, $lowercase);
So your error message is telling you that $this->root inside the class is null, therefore no find() method exists!
I'm no expert on the lib, as I use the awesome DOMDocument for parsing HTML, but hopefully this should help you understand what has happened.
Also, $html will never be empty in that code of yours, you already populated it when you instantiated it!
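For comparison, here is a rough sketch of the same link collection done with DOMDocument, which the answer above mentions. The function name extractListLinks is hypothetical, and the sketch assumes the page's HTML has already been fetched into a string; it is not a drop-in replacement for the simple_html_dom code.

```php
<?php
// Hypothetical helper: collect href values of <a> elements nested in <ul>/<li>.
function extractListLinks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml's complaints out of the output.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return [];
    }
    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//ul//li//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}
```

A failed load here yields an empty array instead of a fatal "find() on null", so the caller can tell "no links" apart from a crash by checking the HTTP fetch separately.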
I suggest the following change:
$html->load_file($baseurl); to $html = file_get_html($baseurl);
On my VPS server it works with $html->load_file($baseurl); but on my dedicated local server it only works with $html = file_get_html($baseurl);
This solved my problem:
- Call to a member function find() on null
- simple_html_dom.php on line 1129

PHP Exception within a try catch block not being caught unless the print function is called?

TL;DR:
My PHP exception only gets caught if I use a print statement in the catch block.
I am seeing some strange behaviour with a PHP try catch block. My PHP version is 7.0.14.
Here is a method that I've defined which simply explodes a string and stores the result in memory. If the exploded string doesn't have the expected amount of constituent parts, an Exception will be thrown.
public function importString($string) {
    // Separate the string with the delimiter.
    $separated = explode($this->delimiter, $string);
    // If the number of segments differs from the number of specified ids, throw an exception.
    if (count($separated) != count($this->ids)) {
        $class = static::class;
        throw new \Exception("Invalid Pseudo ID '{$string}' for type '{$class}'.");
    }
    // Store the separated id strings.
    foreach($separated as $i => $item) {
        $this->data[$this->ids[$i]] = $item;
    }
}
This method is contained within a class called PseudoId from which the two classes below inherit.
Below is the code that calls the method defined above; the majority of the time I expect the imported string to be constructed from three different values (e.g. "one_two_three") but on occasion there can be strings with four components (e.g. "one_two_three_four"), therefore I first attempt to import the string into the object that expects the string to have three components, falling back to the object that expects four.
try {
    //Added later for debug
    throw new \Exception("error");
    $productInfo = new PseudoIdFeedProduct(0, 0, 0);
    $productInfo->importString($data['productId']);
} catch (\Exception $e) {
    //Added later for debug
    print "caught.";
    $productInfo = new PseudoIdFeedProductIndividualBilling(0, 0, 0, 0);
    $productInfo->importString($data['productId']);
}
// Added for debug
var_dump($productInfo);
Now here's the weird part: When passing in a string with four components, the Exception thrown in PseudoIdFeedProduct::importString() isn't caught. So for the sake of debugging I added an Exception to the top of the try block, calling the print function in the catch block as a check and it worked. So I removed the exception at the top of the try block, and it still worked. Slightly baffled, I removed the call to the print function and it stopped working...
Whether the thrown Exception is caught depends on whether the print function is called in the catch block, I added and removed it a couple of times to double check.
Clearly I don't want to have to call the print function here, what am I doing wrong here and how can I get this to work as expected?

php - Detect bad request

I have 2 JSON sources, and one of them sometimes replies with 400 Bad Request (depending on server load).
So I want my PHP code to check the response of both servers and select the working one.
<?php
$server1 = 'server1.lan';
$server2 = 'server2.lan';
/*
Here a code to check and select the working server
*/
$json = file_get_contents('https://'.$workingServer.'/v1/data?source='.$_GET['source']);
$data = json_decode($json);
if (count($data->data)) {
    // Cycle through the array
    foreach ($data->data as $idx => $item) {
        echo "<p>$item->name</p>\n";
    }
}
?>
Thanks !
Below is an idea of what you may want to implement. Your goal is to get that idea and implement something like that in your own way, with a normal error handling and removal of code duplication:
$json = file_get_contents('https://server1.lan/v1/data');
if ($json === false)
{
    $json = file_get_contents('https://server2.lan/v1/data');
    if ($json === false)
    {
        die('Both servers are unavailable');
    }
}
file_get_contents returns boolean false on failure, so if the first server is unavailable, call the second. If it is also unavailable, exit the script, or do some sort of error handling that you prefer.
You may want to create an array of possible server names, and use a function that iterates over all of them until it finds a working one, and returns the contents, or throws an exception on failure.
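The array-of-servers idea could look roughly like the sketch below. The function name fetchFromFirstWorking and the optional $fetch parameter are hypothetical; the parameter exists only so the logic can be exercised without real servers. By default the sketch relies on file_get_contents returning false on failure, as described above.

```php
<?php
// Hypothetical helper: try each server in order and return the first successful body.
function fetchFromFirstWorking(array $servers, string $path, ?callable $fetch = null): string
{
    // Default fetcher: plain file_get_contents, with its warning suppressed.
    $fetch = $fetch ?? function (string $url) {
        return @file_get_contents($url);
    };
    foreach ($servers as $server) {
        $body = $fetch('https://' . $server . $path);
        if ($body !== false) {
            return $body; // first server that answered wins
        }
    }
    throw new \RuntimeException('All servers are unavailable');
}
```

A call might then read $json = fetchFromFirstWorking(['server1.lan', 'server2.lan'], '/v1/data');, wrapped in a try/catch for the case where every server fails.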
I would also suggest that you use curl, which gives you an option to see the error codes of the request, customize the request itself, and so on.
Check $http_response_header after making the file_get_contents call.
$json = file_get_contents('https://'.$server1.'/v1/data?source='.$_GET['source']);
if (strpos($http_response_header[0], "400") !== false)
{
    $json = file_get_contents('https://'.$server2.'/v1/data?source='.$_GET['source']);
}
See examples at http://php.net/manual/en/reserved.variables.httpresponseheader.php
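Searching the status line for "400" works, but extracting the numeric code is a little more robust. The sketch below is only an illustration: the helper name lastStatusCode is hypothetical, and it takes the header array as an argument so it does not depend on where $http_response_header happens to be in scope. When redirects are followed the array can contain several status lines, so the last one is used.

```php
<?php
// Hypothetical helper: return the status code of the final response, or null if none found.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        // Status lines look like "HTTP/1.1 400 Bad Request".
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}
```

After a file_get_contents call, a check such as lastStatusCode($http_response_header) === 400 could then decide whether to fall back to the second server.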

"Comment not terminated" XML parsing error in Box API response

For months I've been running the "Box Rest Client" lib by Angela R that employs the following code to parse curl responses from the box API:
$xml = simplexml_load_string($res);
Today, after the code loops through dozens of request/responses I generate this following error:
ErrorException [ Warning ]: simplexml_load_string(): Entity: line 9:
parser error : Comment not terminated
This happened in 2 straight attempts to run the code - and now seems to have gone away without any changes to anything.
Interested if anyone knows what is up with that?
I have added a catch for this case, in case it is useful to anyone using this lib (for the next month or so, before it is deprecated by Box API 2.0).
private function parse_result($res) {
    try {
        $xml = simplexml_load_string($res);
        $json = json_encode($xml);
        $array = json_decode($json,TRUE);
        return $array;
    } catch (Exception $e){
        $error = 'xml parsing error: '. $e->getMessage(). "<br>";
        return array('status' => $error );
    }
}
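The try/catch above only fires when something converts PHP warnings into exceptions, as the ErrorException in the message suggests is happening here. On its own, simplexml_load_string emits warnings and returns false on failure. A sketch that does not depend on that conversion follows; the standalone function name parseXmlToArray is hypothetical and the error text format is just an example.

```php
<?php
// Hypothetical standalone variant of parse_result() that checks the return value.
function parseXmlToArray(string $res): array
{
    // Collect libxml errors instead of emitting warnings.
    $previous = libxml_use_internal_errors(true);
    $xml = simplexml_load_string($res);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        // Report the first parser error, if libxml recorded one.
        $message = $errors ? trim($errors[0]->message) : 'unknown error';
        return ['status' => 'xml parsing error: ' . $message];
    }
    // Same JSON round-trip as the original to turn the element into an array.
    return json_decode(json_encode($xml), true);
}
```

A truncated response, such as one cut off in the middle of a comment, then comes back as a status entry rather than a warning or an uncaught exception.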
It's possible it is related to including two minus signs -- inside of an HTML comment. For example:
<!-- this is my comment--but not a very good one. -->
The two dashes in the middle of the comment cause problems with the parser.
