(PHP) Sudden DOMDocument non-object problems - php

I've been developing an XML-based PHP application which has suddenly "lost" all XML capabilities.
All DOMDocument/XML calls such as item, replaceChild and removeChild now produce non-object errors. This didn't happen yesterday, and I haven't changed anything in the code.
There are also errors like "Failed to parse QName" and "error parsing attribute name in Entity", while the XML data is the same as it has been for the last few months, so nothing changed there either.
It seems as if the DOMDocument "library" is completely unavailable, although phpinfo() reports that all required modules are enabled.
EDIT:
Now it seems like SimpleXML function asXML() is adding a new element to the document:
<ns:#attributes/>

Thanks for the reply.
In the end the cause had nothing to do with the XML data or the XML-handling code.
I had a != comparison where I needed !==, which made it behave like this. So the next time I compare a variable against false, I'll probably remember to use !==.
Small things make big difference.
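The comparison that caused the problem was not posted, so the following is only a generic sketch of how != and !== can disagree when a value is tested against false. It uses strpos() as a stand-in; the variable names are hypothetical.

```php
<?php
// strpos() returns the integer 0 when the needle is found at the very start,
// and false only when the needle is not found at all.
$position = strpos('<root/>', '<');

// Loose comparison: 0 is loosely equal to false, so this wrongly reports "not found".
$foundLoose = ($position != false);

// Strict comparison: 0 and false have different types, so this correctly reports "found".
$foundStrict = ($position !== false);

var_dump($position, $foundLoose, $foundStrict);
```

With a loose check, code that should have run on a valid value is skipped, and later calls can end up operating on something that is not the expected object.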

Related

PHP's SimpleXMLElement returns parsing error despite seemingly valid XML

I'm having horrible trouble getting this XML to parse properly through PHP's SimpleXMLElement. The error that I'm getting (below) looks like a parsing error, but I can't seem to find any issue. And since this is the National Weather Service's alerts feed, I would have to assume others are pulling this feed and getting it to work correctly.
I've tried all of the following and several variations of them:
$simpleFeed = new SimpleXMLElement(simplexml_load_string(file_get_contents('http://alerts.weather.gov/cap/us.php?x=0')));
and
$simpleFeed = new SimpleXMLElement(simplexml_load_string('http://alerts.weather.gov/cap/us.php?x=0'));
and
$simpleFeed = new SimpleXMLElement('http://alerts.weather.gov/cap/us.php?x=0', NULL, TRUE);
I've included the error that I'm currently getting, but the line numbers do occasionally change around (I'm not sure if that's my doing or the National Weather Service's Feed's doing):
SimpleXMLElement::__construct(): Entity: line 107: parser error : Start tag expected, '<' not found
SimpleXMLElement::__construct():
SimpleXMLElement::__construct(): ^
Every XML parser / validator that I run this through says that it's valid, with a few warnings. I'm not seeing anything here that would indicate the XML is the problem, except that the error message makes it look like that is the case.
Does anyone have experience with something like this and can help?
To spare you some misdirections and extra-routes:
$simpleFeed = new SimpleXMLElement(simplexml_load_string(file_get_contents('http://alerts.weather.gov/cap/us.php?x=0')));
is really over-doing it. In two ways:
1. simplexml_load_string(file_get_contents(...)) is actually just simplexml_load_file(...). PHP uses the same layer internally to get the data, whether it is called from simplexml_load_file or from file_get_contents. It's not hurting, but I think you should know.
2. More hurting (in the sense of incompatible) is that you can't instantiate a SimpleXMLElement by passing a SimpleXMLElement as the first parameter of its constructor. That just does not work. I'm surprised you haven't seen other errors. You might think you can safely "skip" those, but it's actually crucial that you listen to error messages when developing and running software (see How to get useful error messages in PHP?).
So what does this ranting list say? Simple in code:
$feedURL = 'http://alerts.weather.gov/cap/us.php?x=0';
$xml = simplexml_load_file($feedURL);
if (!$xml) {
    throw new UnexpectedValueException(
        sprintf('failed to open %s', var_export($feedURL, true))
    );
}
This little code example will either give you the SimpleXMLElement or tell you that it could not. You don't have to argue about whether the XML is valid or whether the parser can deal with it. For the rest of your application the outcome is the same either way: opening the URL either failed or succeeded. If the URL is wrong, get the right one. If the XML might be broken, file a bug report. Beyond that there is nothing you really need to worry about. Just don't put code together without understanding how it works. In that case, read the manual to see how it's intended to work, double-check, troubleshoot, listen to PHP's error messages and, most importantly, add checks to your own software that deal with cases of failure. They are there. Design for those.
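As a sketch of what "listening to the error messages" can look like in code, the following collects libxml's parser messages instead of letting them scroll past as warnings. The function name loadXmlOrFail is hypothetical, and it parses a string rather than the weather feed so that it runs without network access.

```php
<?php
// Hypothetical helper: parse an XML string, or throw with the parser's own messages.
function loadXmlOrFail(string $xml): SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $element = simplexml_load_string($xml);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Strict comparison: only the documented failure value counts as failure.
    if ($element === false) {
        $messages = array_map(
            fn (LibXMLError $e) => sprintf('line %d: %s', $e->line, trim($e->message)),
            $errors
        );
        throw new UnexpectedValueException(
            "XML could not be parsed:\n" . implode("\n", $messages)
        );
    }

    return $element;
}

// Usage: a well-formed document is returned, a malformed one raises an exception.
$ok = loadXmlOrFail('<a><b>1</b></a>');
echo $ok->b, "\n";
```

The rest of the application then only has to handle two cases: a usable SimpleXMLElement or an exception carrying the reason.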

simplexml_load_string not parsing my XML string. Charset issue?

I'm using the following PHP code to read XML data from NOAA's tide reporting station API:
$rawxml = file_get_contents(
"http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/"
."response.jsp?v=2&format=xml&Submit=Submit"
);
$rawxml = utf8_encode($rawxml);
$ob = simplexml_load_string($rawxml);
var_dump($ob);
Unfortunately, I end up with it displaying this:
object(SimpleXMLElement)#246 (0) { }
It looks to me like the XML is perfectly well-formed - why won't this parse? From looking at another question (Simplexml_load_string() fail to parse error) I got the idea that the header might be the problem - the http call does indeed return a charset value of "ISO-8859-1". But adding in the utf8_encode() call doesn't seem to do the trick.
What's especially confusing is that simplexml_load_string() doesn't actually fail - it returns a cheerful XML array, just with nothing in it!
You've been fooled (and had me fooled) by the oldest trick in the SimpleXML book: SimpleXML doesn't parse the whole document into a PHP object, it presents a PHP API to an internal structure. Functions like var_dump can't see this structure, so don't always give a useful idea of what's in the object.
The reason it looks "empty" is that it is listing the children of the root element which are in the default namespace - but there aren't any, they're all in the "soapenv:" namespace.
To access namespaced elements, you need to use the children() method, passing in the full namespace name (recommended) or its local prefix (simpler, but could be broken by changes in the way the file is generated the other end). To switch back to the "default namespace", use ->children(null).
So you could get the ID attribute of the first stationV2 element like this (live demo):
// Define constant for the namespace names, rather than relying on the prefix the remote service uses remaining stable
define('NS_SOAP', 'http://schemas.xmlsoap.org/soap/envelope/');
// Download the XML
$rawxml = file_get_contents("http://opendap.co-ops.nos.noaa.gov/axis/webservices/activestations/response.jsp?v=2&format=xml&Submit=Submit");
// Parse it
$ob = simplexml_load_string($rawxml);
// Use it!
echo $ob->children(NS_SOAP)->Body->children(null)->ActiveStationsV2->stationsV2->stationV2[0]['ID'];
I've written some debugging functions to use with SimpleXML which should be much less misleading than var_dump etc. Here's a live demo with your code and simplexml_dump.
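For an offline illustration of the same idea, here is a minimal sketch using a made-up SOAP-style document. The inner element names (stations, station) and the ID value are assumptions for the example, not the real NOAA response.

```php
<?php
// Made-up document: the envelope and body are in the SOAP namespace,
// the payload elements are in no namespace.
$xml = <<<'XML'
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">
  <soapenv:Body>
    <stations>
      <station ID="8410140" name="Eastport"/>
    </stations>
  </soapenv:Body>
</soapenv:Envelope>
XML;

$ob = simplexml_load_string($xml);

// Without a namespace argument, only un-prefixed children are listed, so this is 0.
// That is why var_dump() made the object look empty.
$plainChildren = count($ob->children());

// Select the SOAP namespace by its full URI to reach Body...
$body = $ob->children('http://schemas.xmlsoap.org/soap/envelope/')->Body;

// ...then switch back to un-namespaced elements for the payload.
$id = (string) $body->children(null)->stations->station[0]['ID'];

echo $plainChildren, ' ', $id, "\n";
```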

XML Youtube, access to yt:accessControl? the character :

I have this XML file and I'm trying to access the value of the permission attribute in the yt:accessControl tag in PHP:
echo (string)$xmlyt->entry->children('yt')->{'accessControl'}->attributes()->$actionAttr."------------";
but I get the error
Node no longer exists
Understanding how SimpleXML works and XML generally is greatly beneficial to doing this...
You can discover things through trial and error and end up with something like this:
$sxml=simplexml_load_file('http://gdata.youtube.com/feeds/api/videos/'.$videoID.'?v=2');
$yt = $sxml->children('http://gdata.youtube.com/schemas/2007');
print_r($yt->accessControl->attributes());
print_r($yt->accessControl[4]->attributes());
For instance, this will give you the permissions for the first and fifth actions, which happen to be comment and embed at the moment. You should probably loop through all of them to identify the ones you're interested in, instead of relying on the order.
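A loop over all yt:accessControl elements could be sketched like this. The sample entry below is a reduced, made-up stand-in for the real feed, assuming each element carries action and permission attributes.

```php
<?php
// Reduced, made-up sample of a video entry.
$xml = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:yt="http://gdata.youtube.com/schemas/2007">
  <yt:accessControl action="comment" permission="allowed"/>
  <yt:accessControl action="embed" permission="denied"/>
</entry>
XML;

$sxml = simplexml_load_string($xml);

// Select children by the full namespace URI rather than by the "yt" prefix.
$yt = $sxml->children('http://gdata.youtube.com/schemas/2007');

// Build an action => permission map so nothing depends on element order.
$permissions = [];
foreach ($yt->accessControl as $control) {
    // The attributes themselves are not namespaced, so fetch them via attributes().
    $attrs = $control->attributes();
    $permissions[(string) $attrs['action']] = (string) $attrs['permission'];
}

print_r($permissions);
```

Looking up $permissions['embed'] then works regardless of where that element appears in the feed.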
Hope this helps,
aL

xml parsing error, using php's asxml()

$file = simplexml_load_file($url);
foreach ($file->entry as $post) {
    $row = simplexml_load_string($post->asXML()); // after adding this line, i get error message
    $links = $row->xpath('//link[@rel="alternate" and @type="text/html"]');
    echo (string) $post->title;
    echo (string) $links[0]['href'];
}
I use this script to parse an Atom feed. At first it didn't work because it couldn't read the link's href attribute properly. I added $row, and even though it worked, it gives an error: "namespace prefix gd for etag on entry is not defined". I've been searching for hours and can't find a solution. I was so close.
The line $row = simplexml_load_string($post->asXML());, if it worked, would be a long-winded way of writing $row = $post. ->asXML() and simplexml_load_string perform the opposite action to each other so you'd get back the same object you started with.
I think the reason it's behaving strangely in your case is that your XML document is using "namespaces", and the fragment of XML produced by $post->asXML() doesn't quite work as an XML document on its own.
I suspect that the original problem you had, which this line seemed to magically fix, was also with namespaces, as XPath is rather sensitive to them. Look up examples of using registerXPathNamespace and see if they solve your problem. If not, feel free to post a follow-up question showing your original problem, and including a sample of the XML you're processing.
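As a starting point, here is a minimal sketch of registerXPathNamespace on a made-up Atom feed. The prefix "atom" is an arbitrary choice for the example, and the sample feed is an assumption about what the real one looks like.

```php
<?php
// Made-up Atom feed with a default namespace and a gd:etag attribute.
$xml = <<<'XML'
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
  <entry gd:etag="abc">
    <title>First post</title>
    <link rel="self" type="application/atom+xml" href="http://example.com/feed/1"/>
    <link rel="alternate" type="text/html" href="http://example.com/post/1"/>
  </entry>
</feed>
XML;

$feed = simplexml_load_string($xml);

$results = [];
foreach ($feed->entry as $post) {
    // XPath has no notion of a default namespace, so bind the Atom URI to a prefix.
    $post->registerXPathNamespace('atom', 'http://www.w3.org/2005/Atom');

    // Relative path: only the links of this entry, not of the whole feed.
    $links = $post->xpath('atom:link[@rel="alternate" and @type="text/html"]');

    $results[] = [(string) $post->title, (string) $links[0]['href']];
}

print_r($results);
```

This works on $post directly, so the asXML() and simplexml_load_string round trip is not needed.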

Load an invalid XML in PHP DOM

I have an input XML file that is not well-formed (i.e. it has '&' instead of '&amp;').
When I try to load this XML using PHP DOM with $doc->load("file.xml"), it throws an error and stops parsing.
Is there any way to load this malformed XML? And no, I can't edit the source XML file.
I did try using $doc->loadHTML(), but it throws errors all over the place.
I wanted to know if there is a proper way to do this (like loading the file contents and changing them with a regex or something similar).
Try setting $doc->validateOnParse = false; before loading your XML via $doc->loadHTML(...).
First, check that it's the & that's causing the error and not something else.
One way or another, you'll have to modify the XML to get it parsed. The HTML in loadHTML is loaded from a string, can't you just replace the invalid characters with the correct ones?
If your installation supports the PHP Tidy extension (http://php.net/manual/en/book.tidy.php) you could try to clean it up with that, though in my experience it's far from foolproof.
If you are sure that's the only thing making it invalid, you could load the file into a string with file_get_contents(), search and replace through the string to change the &'s into &amp;'s, then pass that string to SimpleXML, like $xml = simplexml_load_string($cleaned_string);
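The search-and-replace approach could be sketched as follows, assuming bare ampersands really are the only problem. The sample string stands in for the result of file_get_contents("file.xml"), and the regex is one possible pattern rather than a complete repair tool: named entities other than the five XML built-ins (for example &nbsp;) are left alone and would still fail to parse.

```php
<?php
// Stand-in for file_get_contents("file.xml"): one bare "&", one proper entity,
// and one numeric character reference.
$raw = '<items><item>Fish & Chips</item><item>Salt &amp; Vinegar &#169;</item></items>';

// Escape every "&" that does not already start an entity or character reference.
$cleaned = preg_replace(
    '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
    '&amp;',
    $raw
);

$doc = new DOMDocument();
$loaded = $doc->loadXML($cleaned);

// Text of the first item, with the entity decoded again by the parser.
$first = $doc->getElementsByTagName('item')->item(0)->textContent;

echo $first, "\n";
```

The negative lookahead is what keeps existing entities such as &amp; from being escaped a second time.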
