PHP DOMDocument error handling - php

In my application I am loading XML from a URL in order to parse it.
But sometimes this URL may not be valid. In this case I need to handle errors.
I have the following code:
$xdoc = new DOMDocument();
try{
$xdoc->load($url); // This line causes Warning: DOMDocument::load(...)
// [domdocument.load]: failed to open stream:
// HTTP request failed! HTTP/1.1 404 Not Found in ...
} catch (Exception $e) {
$xdoc = null;
}
if($xdoc == null){
// Handle
} else {
// Proceed
}
I know I'm probably doing it wrong, but what is the correct way to handle this kind of error? I don't want to see error messages on my page.
The manual for DOMDocument::load() says:
If an empty string is passed as the filename or an empty file is named, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.
But there is no information on how to handle it.
Thanks.

From what I can gather from the documentation, handling warnings issued by this method is tricky because they are not generated by the libxml extension and thus cannot be handled by libxml_get_last_error(). You could either use the error suppression operator and check the return value for false...
if (@$xdoc->load($url) === false) {
    // ...handle it
}
...or register an error handler which throws an exception on error:
function exception_error_handler($errno, $errstr, $errfile, $errline ) {
throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
}
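Register it with set_error_handler('exception_error_handler') and then catch the ErrorException around the load() call.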
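A minimal sketch of how the handler above might be wired up end to end. The helper name load_xml_or_null() and the non-existent path are assumptions made for illustration only; the path is used so the failure can be reproduced without network access.

```php
<?php
// Convert PHP warnings/notices into ErrorException so try/catch can see them.
function exception_error_handler($errno, $errstr, $errfile, $errline) {
    throw new ErrorException($errstr, 0, $errno, $errfile, $errline);
}

// Hypothetical helper: returns a DOMDocument, or null if loading failed.
function load_xml_or_null($source) {
    set_error_handler('exception_error_handler');
    try {
        $doc = new DOMDocument();
        // load() may also report failure through its return value alone.
        if ($doc->load($source) === false) {
            return null;
        }
        return $doc;
    } catch (ErrorException $e) {
        return null;
    } finally {
        // Always put the previous handler back, on success and on failure.
        restore_error_handler();
    }
}

$xdoc = load_xml_or_null('/path/that/does/not/exist.xml');
if ($xdoc === null) {
    echo "Could not load the document\n";
}
```

Restoring the handler in a finally block (PHP 5.5 or later) keeps the conversion of warnings into exceptions limited to this one call.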

set_error_handler(function($number, $error){
if (preg_match('/^DOMDocument::loadXML\(\): (.+)$/', $error, $m) === 1) {
throw new Exception($m[1]);
}
});
$xml = new DOMDocument();
$xml->loadXML($xmlData);
restore_error_handler();
That works for me in PHP 5.3. If you're not using loadXML(), adjust the regular expression so that it matches the method you call.

To stop libxml from emitting warnings, have it collect its errors internally instead:
$internal_errors = libxml_use_internal_errors(true);
$dom = new DOMDocument();
// etc...
libxml_use_internal_errors($internal_errors);

From php.net:
If an empty string is passed as the filename or an empty file is named, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.
In your production environment, errors shouldn't be displayed to the user at all; they don't need to see them. Taking this into account, you can simply check the return value:
$xdoc = new DOMDocument();
if ( $xdoc->load($url) ) {
// valid
}
else {
// invalid
}
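A short sketch of what that production setup might look like, assuming display_errors is switched off at runtime with ini_set() rather than in php.ini. The non-existent path is a made-up value used only to force a failure without network access.

```php
<?php
// Production-style settings: keep warnings out of the page, send them to the log.
ini_set('display_errors', '0');
ini_set('log_errors', '1');

// Hypothetical path, chosen so that loading fails.
$url = '/path/that/does/not/exist.xml';

$xdoc = new DOMDocument();
$loaded = $xdoc->load($url);   // false on failure

if ($loaded) {
    echo "valid\n";
} else {
    echo "invalid\n";
}
```

The warning is still raised and can be logged; it is just no longer written into the page output.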

For me, the following did the trick:
$feed = new DOMDocument();
$res = @$feed->load('http://www.astrology.com/horoscopes/daily-extended.rss');
if ($res) {
    // do something
}

Related

PHP DOMDocument loading custom tags [duplicate]

I need to parse some HTML files; however, they are not well-formed and PHP prints out warnings. I want to suppress these warnings programmatically. Please advise. Thank you!
Code:
// create a DOM document and load the HTML data
$xmlDoc = new DomDocument;
// this dumps out the warnings
$xmlDoc->loadHTML($fetchResult);
This:
@$xmlDoc->loadHTML($fetchResult)
can suppress the warnings, but how can I capture those warnings programmatically?
Call
libxml_use_internal_errors(true);
prior to processing with $xmlDoc->loadHTML().
This tells libxml2 not to send errors and warnings through to PHP. Then, to check for errors and handle them yourself, you can consult libxml_get_last_error() and/or libxml_get_errors() when you're ready:
libxml_use_internal_errors(true);
$dom->loadHTML($html);
$errors = libxml_get_errors();
foreach ($errors as $error) {
// handle the errors as you wish
}
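As a rough illustration of what "handling the errors" could look like, the sketch below turns each collected LibXMLError into a readable string. It uses loadXML() with a deliberately malformed string (an assumption made so that the parser reliably reports something); the same loop applies after loadHTML().

```php
<?php
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadXML('<root><unclosed></root>');   // deliberately malformed input

$labels = array(
    LIBXML_ERR_WARNING => 'warning',
    LIBXML_ERR_ERROR   => 'error',
    LIBXML_ERR_FATAL   => 'fatal error',
);

$messages = array();
foreach (libxml_get_errors() as $error) {
    // Each $error is a LibXMLError with level, code, line, column and message.
    $label = isset($labels[$error->level]) ? $labels[$error->level] : 'unknown';
    $messages[] = sprintf('%s on line %d: %s', $label, $error->line, trim($error->message));
}
libxml_clear_errors();   // empty the queue once the errors have been read

print_r($messages);
```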
To hide the warnings, you have to give special instructions to libxml which is used internally to perform the parsing:
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
The libxml_use_internal_errors(true) indicates that you're going to handle the errors and warnings yourself and you don't want them to mess up the output of your script.
This is not the same as the @ operator. The warnings get collected behind the scenes and afterwards you can retrieve them by using libxml_get_errors() in case you wish to perform logging or return the list of issues to the caller.
Whether or not you're using the collected warnings you should always clear the queue by calling libxml_clear_errors().
Preserving the state
If you have other code that uses libxml it may be worthwhile to make sure your code doesn't alter the global state of the error handling; for this, you can use the return value of libxml_use_internal_errors() to save the previous state.
// modify state
$libxml_previous_state = libxml_use_internal_errors(true);
// parse
$dom->loadHTML($html);
// handle errors
libxml_clear_errors();
// restore
libxml_use_internal_errors($libxml_previous_state);
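One way to package the save/parse/restore steps is a small function with try/finally, so the previous state is restored even if something throws. This is only a sketch; the name load_html_quietly() is invented for the example.

```php
<?php
// Hypothetical helper: parse HTML without emitting warnings and
// return the document together with whatever libxml collected.
function load_html_quietly($html) {
    $previous = libxml_use_internal_errors(true);   // modify state
    try {
        $dom = new DOMDocument();
        $dom->loadHTML($html);                      // parse
        $errors = libxml_get_errors();              // keep the errors for the caller
        libxml_clear_errors();
        return array($dom, $errors);
    } finally {
        libxml_use_internal_errors($previous);      // restore, even on exceptions
    }
}

list($dom, $errors) = load_html_quietly('<p>Hello<p>World');
echo $dom->getElementsByTagName('p')->length, " paragraph(s), ",
     count($errors), " libxml message(s)\n";
```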
Setting the options "LIBXML_NOWARNING" & "LIBXML_NOERROR" works perfectly fine too:
$dom->loadHTML($html, LIBXML_NOWARNING | LIBXML_NOERROR);
You can install a temporary error handler with set_error_handler:
class ErrorTrap {
protected $callback;
protected $errors = array();
function __construct($callback) {
$this->callback = $callback;
}
function call() {
$result = null;
set_error_handler(array($this, 'onError'));
try {
$result = call_user_func_array($this->callback, func_get_args());
} catch (Exception $ex) {
restore_error_handler();
throw $ex;
}
restore_error_handler();
return $result;
}
function onError($errno, $errstr, $errfile, $errline) {
$this->errors[] = array($errno, $errstr, $errfile, $errline);
}
function ok() {
return count($this->errors) === 0;
}
function errors() {
return $this->errors;
}
}
Usage:
// create a DOM document and load the HTML data
$xmlDoc = new DomDocument();
$caller = new ErrorTrap(array($xmlDoc, 'loadHTML'));
// this doesn't dump out any warnings
$caller->call($fetchResult);
if (!$caller->ok()) {
var_dump($caller->errors());
}

How do I handle Warning: SimpleXMLElement::__construct()?

I get this warning when I run the page on localhost while the internet connection is down (it works fine when connected). It is fine if the error is shown, but I want to handle it so that it does not break the PHP page like a fatal error.
Warning: SimpleXMLElement::__construct() [simplexmlelement.--construct]:
php_network_getaddresses: getaddrinfo failed: No such host is known.
in F:\xampp\htdocs\shoptpoint\sections\docType_head_index.php on line 30
I'm trying to handle it with try-catch. Below is my code:
$apiurl="http://publisher.usb.api.shopping.com/publisher/3.0/rest/GeneralSearch?apiKey=78b0db8a-0ee1-4939-a2f9-d3cd95ec0fcc&trackingId=7000610&categoryId='5855855'";
try{
new SimpleXMLElement($apiurl,null, true);
}catch(Exception $e){
echo $e->getMessage();
}
How do I handle the error so that my page can keep executing to the end?
Using set_error_handler, you can convert any notices/warnings raised by SimpleXMLElement into a catchable Exception.
Take the following:
<?php
function getData() {
return new SimpleXMLElement('http://10.0.1.1', null, true);
}
$xml = getData();
/*
PHP Warning: SimpleXMLElement::__construct(http://10.0.1.1): failed to open stream: Operation timed out
PHP Warning: SimpleXMLElement::__construct(): I/O warning : failed to load external entity "http://10.0.1.1"
PHP Fatal error: Uncaught exception 'Exception' with message 'String could not be parsed as XML'
*/
See how we get 2 Warnings before the Exception from SimpleXMLElement is thrown? Well, we can convert those to an Exception like this:
<?php
function getData() {
    set_error_handler(function($errno, $errstr, $errfile, $errline) {
        throw new Exception($errstr, $errno);
    });
    try {
        $xml = new SimpleXMLElement('http://10.0.1.1', null, true);
    } catch(Exception $e) {
        restore_error_handler();
        throw $e;
    }
    // also restore the previous handler when loading succeeds
    restore_error_handler();
    return $xml;
}
$xml = getData();
/*
PHP Fatal error: Uncaught exception 'Exception' with message 'SimpleXMLElement::__construct(http://10.0.1.1): failed to open stream: Operation timed out'
*/
Good luck,
Anthony.
If for any reason you don't want to set up an error handler, you can also use some libxml functions to keep the E_WARNING from being raised:
// remembers the old setting and enables usage of libxml internal error handler
$previousSetting = libxml_use_internal_errors(true);
// still need to try/catch because invalid XML raises an Exception
try {
// XML is missing root node
new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?>', 0, false); // third argument false: the data is an XML string, not a URL
} catch(Exception $e) {
echo $e->getMessage(); // this won't help much: String could not be parsed as XML
$xmlError = libxml_get_last_error(); // returns object of class LibXMLError or FALSE
if ($xmlError) {
echo $xmlError->message; // this is more helpful: Start tag expected, '<' not found
}
}
// sets libxml usage of internal error handler to previous setting
libxml_use_internal_errors($previousSetting);
Alternatively you can use libxml_get_errors() instead of libxml_get_last_error() to get all errors. With it you can get all errors regarding parsing the XML as an array of LibXMLError objects.
Some helpful links:
https://www.php.net/manual/en/function.libxml-use-internal-errors.php
https://www.php.net/manual/en/function.libxml-get-last-error.php
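For the libxml_get_errors() variant mentioned in this answer, a minimal sketch might look like the following. The malformed XML string is an illustrative input, and the exact messages depend on the libxml2 version in use.

```php
<?php
$previousSetting = libxml_use_internal_errors(true);

$problems = array();
try {
    new SimpleXMLElement('<a><b></a>');   // deliberately malformed XML string
} catch (Exception $e) {
    // The exception message is generic; the libxml queue has the details.
    foreach (libxml_get_errors() as $error) {
        $problems[] = sprintf(
            'line %d, column %d: %s',
            $error->line,
            $error->column,
            trim($error->message)
        );
    }
}

libxml_clear_errors();
libxml_use_internal_errors($previousSetting);

print_r($problems);
```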

loadXML unhandleable error

I'm using PEAR XML_Feed_Parser.
I give it some bad XML and get this error:
DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xE8 0xCF 0xD3 0xD4 in Entity, line: 7
It's actually HTML in the wrong encoding, KOI8-R.
Getting an error is fine, but I can't handle it!
When I create new XML_Feed_Parser instance with
$feed = new XML_Feed_Parser($xml);
it calls __construct(), which looks like this:
$this->model = new DOMDocument;
if (! $this->model->loadXML($feed)) {
    if (extension_loaded('tidy') && $tidy) {
        /* tidy stuff */
    } else {
        throw new Exception('Invalid input: this is not valid XML');
    }
}
Here we can see that if loadXML() fails, an exception is thrown.
I want to catch the error from loadXML() so that I can skip bad XML and notify the user. So I wrapped my code in try-catch like this:
try
{
$feed = new XML_Feed_Parser($xml);
/* ... */
}
catch(Exception $e)
{
echo 'Feed invalid: '.$e->getMessage();
return False;
}
But even after that I get this error:
DOMDocument::loadXML(): Input is not proper UTF-8, indicate encoding !
Bytes: 0xE8 0xCF 0xD3 0xD4 in Entity, line: 7
I've read about loadXML() and found that
If an empty string is passed as the source, a warning will be generated. This warning is not generated by libxml and cannot be handled using libxml's error handling functions.
But somehow, instead of a warning, I get an error that halts my application. I wrote my own error handler and saw that it really is a warning ($errno is 2).
So I see 2 solutions:
1. Turn the warnings back into warnings, i.e. do not treat them like errors (Google doesn't help me here), and then handle the False returned from loadXML().
2. Somehow catch that error.
Any help?
libxml_use_internal_errors(true) solved my problem. It makes libxml collect its errors internally instead of raising PHP warnings, so I can check for False from loadXML().
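A small sketch of that behaviour, assuming input similar to the one in the question: the bytes are KOI8-R text placed in a document without an encoding declaration, so they are not valid UTF-8. The exact error text depends on the libxml2 version.

```php
<?php
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Illustrative input: KOI8-R bytes, which do not form valid UTF-8.
$ok = $doc->loadXML("<root>\xE8\xCF\xD3\xD4</root>");

$reasons = array();
if ($ok === false) {
    foreach (libxml_get_errors() as $error) {
        $reasons[] = trim($error->message);
    }
}
libxml_clear_errors();
libxml_use_internal_errors($previous);

if (!$ok) {
    // No PHP warning was raised; the failure is visible only via the return value.
    echo 'Feed invalid: ', implode('; ', $reasons), "\n";
}
```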
Try this one:
$this->model = new DOMDocument;
$converted = mb_convert_encoding($feed, 'UTF-8', 'KOI8-R');
if (! $this->model->loadXML($converted)) {
    if (extension_loaded('tidy') && $tidy) {
        /* tidy stuff */
    } else {
        throw new Exception('Invalid input: this is not valid XML');
    }
}
Or you can do it without modifying XML_Feed_Parser, like this:
$xml = mb_convert_encoding($loaded_xml, 'UTF-8', 'KOI8-R');
$feed = new XML_Feed_Parser($xml);
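Converting unconditionally would corrupt feeds that are already UTF-8, so it may be worth checking first. The sketch below assumes the mbstring extension is available; to_utf8() is a hypothetical helper, and the KOI8-R fallback is only a guess that fits the feed in the question.

```php
<?php
// Hypothetical helper: leave valid UTF-8 alone, otherwise convert
// from an assumed fallback encoding (KOI8-R here).
function to_utf8($raw, $fallbackEncoding = 'KOI8-R') {
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }
    return mb_convert_encoding($raw, 'UTF-8', $fallbackEncoding);
}

// The four bytes reported in the error message from the question.
$converted = to_utf8("\xE8\xCF\xD3\xD4");
echo $converted, "\n";
```

If the document carries an XML declaration naming a different encoding, that declaration would also need to be adjusted after conversion; this sketch does not handle that case.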
