I'm trying to get started using XMLReader to process large XML files, but I am getting a strange HTTP 400 Bad Request when I try to run the following code:
<?php
$reader = new XMLReader ();
$reader->open ( "testfile.xml" );
while ( $reader->read () ) {
switch ($reader->nodeType) {
case (XMLREADER::ELEMENT) :
echo "<" . $reader->name . "> <br>";
break;
case (XMLREADER::TEXT) :
if ($reader->hasValue) {
echo $reader->value . "<br>";
}
break;
}
}
$reader->close();
?>
I have also tried it this way and get the same 400 Bad Request error:
<?php
$reader = new XMLReader ();
$reader->open ( "testfile.xml" );
while ( $reader->read() ) {
switch ($reader->nodeType) {
case (XMLREADER::ELEMENT) :
echo "<" . $reader->name . "> <br>";
$reader->read();
if (($reader->nodeType == XMLREADER::TEXT) && $reader->hasValue) {
echo $reader->value . "<br>";
}
break;
}
}
$reader->close();
?>
In both cases, the error goes away when I comment out echo $reader->value . "<br>";. The Apache error logs aren't showing anything. Also, in spite of the 400 error, the page is created and rendered as expected with the elements and text values (i.e., the code appears to work, it just gives an HTTP error as well).
It is also worth noting that it seems to work without error on a small, simple test XML file with only one root and one child element with text. It's only on the larger more complicated XML file that I'm actually intending to process that I'm getting the error.
Thanks in advance for any help!
FYI in case anyone else runs into this, I found out I needed to use htmlspecialchars() to escape the value. I changed:
echo $reader->value . "<br>";
to
echo htmlspecialchars($reader->value, ENT_XML1, 'UTF-8') . "<br>";
I guess there must be some HTML in the XML that the browser was trying to interpret, causing the 400 error.
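For anyone wanting to try the fix in isolation, here is a minimal sketch of the same loop with escaping applied. The function name dumpNodes and the sample XML are made up for illustration, and the element name is escaped as well, which the original code did not do:

```php
<?php
// Hypothetical helper: walks an XML string and returns browser-safe HTML.
function dumpNodes(string $xml): string
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $out = '';
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            // Escape the tag name too, so the browser shows it as text.
            $out .= htmlspecialchars('<' . $reader->name . '>', ENT_XML1, 'UTF-8') . "<br>";
        } elseif ($reader->nodeType === XMLReader::TEXT && $reader->hasValue) {
            // Escape the text value so any markup inside it is not interpreted.
            $out .= htmlspecialchars($reader->value, ENT_XML1, 'UTF-8') . "<br>";
        }
    }
    $reader->close();
    return $out;
}

// Sample input (an assumption): text that contains markup-like characters.
echo dumpNodes('<note><body>Use &lt;b&gt; for bold &amp; more</body></note>');
```

Building the output as a string first also makes it easy to inspect what is actually sent to the browser.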
I'm trying to get the PS4 firmware version from their XML, but for some reason it's returning NULL.
<?php
$list = simplexml_load_file('http://feu01.ps4.update.playstation.net/update/ps4/list/eu/ps4-updatelist.xml');
if($list) {
echo $list->system_pup[0]['label']; // get firmware version
} else {
echo 'Error opening the XML file.';
}
?>
I have no idea what I'm doing wrong, because I've followed this article and it seems I've done it correctly.
Any ideas?
When you access an element that doesn't exist, SimpleXML doesn't throw an error; it just gives you an empty result. You should look at the structure of the document to determine where your element actually sits. In this case you are off by one level: system_pup is inside region.
$list = simplexml_load_file('http://feu01.ps4.update.playstation.net/update/ps4/list/eu/ps4-updatelist.xml');
if($list) {
//print_r($list);
echo $list->region->system_pup[0]['label']; // get firmware version
} else {
echo 'Error opening the XML file.';
}
Another option is to access the attributes of a node with the attributes() method:
$list = simplexml_load_file('http://feu01.ps4.update.playstation.net/update/ps4/list/eu/ps4-updatelist.xml');
echo $list->region->system_pup->attributes()->label;
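A small offline sketch of the same point, using an inline XML string instead of the remote feed. The sample structure and values below are assumptions for illustration; the real feed may differ:

```php
<?php
// Hypothetical sample that mimics the nesting described above.
$xml = '<update_data_list><region id="eu">'
     . '<system_pup version="1.00" label="1.76"/>'
     . '</region></update_data_list>';
$list = simplexml_load_string($xml);

// One level too shallow: SimpleXML raises no error, the child simply is not there.
$shallowExists = isset($list->system_pup);

// Correct path: go through <region> first.
$nestedExists = isset($list->region->system_pup);

// Cast to string to get the plain attribute value.
$label = (string) $list->region->system_pup[0]['label'];

var_dump($shallowExists, $nestedExists);
echo $label, "\n";
```

Checking with isset() before reading the attribute is one way to notice a wrong path early instead of silently getting nothing.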
I am very new to both PHP and XML. What I am trying to do in PHP is read in XML from a call to a URL, and then parse the XML. I can get this to work in the example below when $urlip = 'localfile.xml', but not when I put in a URL. I've checked the URL by going to it with my browser, and I can see the XML. I also did a "view source", copied it, and pasted the XML into the local file, and that works fine.
What am I doing wrong in trying to get the XML from the URL?
Thank you
The error being returned is:
Error loading XML Start tag expected, '<' not found
Here is my code snippet:
$urlip="test.xml";# for debugging since I cannot read from the url yet! not sure why....
if (($xml = file_get_contents($urlip))===false) {
echo "error fetching XML\n";
} else {
libxml_use_internal_errors(true);
$data = simplexml_load_string($xml,null,LIBXML_NOCDATA);
if (!$data) {
echo "Error loading XML\n";
foreach(libxml_get_errors() as $error) {
echo "\t", $error->message;
}
} else {
foreach ($data as $item) {
$type = $item->TAB_TYPE;
$number=$item->ALT_ID;
$title = $item->SHORT_DESCR;
$searchlink = $item->ID;
$rsite=$item->CATEGORY;
echo "type $type, number $number, title $title, search link $searchlink, site $rsite\n";
}
}
}
The most likely situation, from what it looks like:
Your function queries the remote URL and gets back an empty string. An empty string is not identical to false, so it passes the === false check in your 'if' statement.
After that, you try to parse the empty string as XML, which fails and gives you the error.
Your steps to solve it:
Configure PHP to open remote URLs, as the comments on your question state, by enabling allow_url_fopen.
Use another way to get the content from the URL; the cURL library works well.
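The cURL route could look roughly like this. It is a sketch only: the function names fetchBody and parseXmlBody are invented here, the timeout is arbitrary, and the URL in the usage comment is a placeholder:

```php
<?php
// Hypothetical helper: fetch a URL with cURL.
// Returns the body, or null on failure or an empty response.
function fetchBody(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // arbitrary timeout in seconds
    $body = curl_exec($ch);
    return (is_string($body) && trim($body) !== '') ? $body : null;
}

// Hypothetical helper: parse only when there is something to parse.
function parseXmlBody(?string $body): ?SimpleXMLElement
{
    if ($body === null || trim($body) === '') {
        return null; // an empty response is treated as a failure, not handed to the parser
    }
    libxml_use_internal_errors(true);
    $data = simplexml_load_string($body, 'SimpleXMLElement', LIBXML_NOCDATA);
    return $data === false ? null : $data;
}

// Usage (placeholder URL):
// $data = parseXmlBody(fetchBody('http://example.com/feed.xml'));
// if ($data === null) { echo "error fetching or parsing XML\n"; }
```

Treating an empty body as a fetch failure keeps the "Start tag expected" error from showing up as a confusing parse problem.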
I've been given data from a previous version of a website (it was a custom CMS) and am looking to get it into a state that I can import it into my Wordpress site.
This is what I'm working on - http://www.teamworksdesign.com/clients/ciw/datatest/index.php. If you scroll down to row 187 the data starts to fail (there should be a red message) with the following error message:
Fatal error: Uncaught exception 'Exception' with message 'String could
not be parsed as XML' in
/home/teamwork/public_html/clients/ciw/datatest/index.php:132 Stack
trace: #0
/home/teamwork/public_html/clients/ciw/datatest/index.php(132):
SimpleXMLElement->__construct('
Can anyone see what the problem is and how to fix it?
This is how I'm outputting the data:
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>
<?php
ini_set('memory_limit','1024M');
ini_set('max_execution_time', 500); // 500 seconds
echo "<br />memory_limit: " . ini_get('memory_limit') . "<br /><br />";
echo "<br />max_execution_time: " . ini_get('max_execution_time') . "<br /><br />";
libxml_use_internal_errors(true);
$z = new XMLReader;
$z->open('dbo_Content.xml');
$doc = new DOMDocument;
$doc->preserveWhiteSpace = false;
// move to the first <dbo_Content /> node
while ($z->read() && $z->name !== 'dbo_Content');
$c = 0;
// now that we're at the right depth, hop to the next <dbo_Content /> until the end of the tree
while ($z->name === 'dbo_Content')
{
if($c < 201) {
// either one should work
$node = simplexml_import_dom($doc->importNode($z->expand(), true));
if($node->ClassId == 'policydocument') {
$c++;
echo "<h1>Row: $c</h1>";
echo "<pre>";
echo htmlentities($node->XML) . "<br /><br /><br /><b>*******</b><br /><br /><br />";
echo "</pre>";
try{
$xmlObject = new SimpleXMLElement($node->XML);
foreach ($xmlObject->fields[0]->field as $field) {
switch((string) $field['name']) {
case 'parentId':
echo "<b>PARENT ID: </b> " . $field->value . "<br />";
break;
case 'title':
echo "<b>TITLE: </b> " . $field->value . "<br />";
break;
case 'summary':
echo "<b>SUMMARY: </b> " . $field->value . "<br />";
break;
case 'body':
echo "<b>BODY:</b> " . $field->value . "<br />";
break;
case 'published':
echo "<b>PUBLISHED:</b> " . $field->value . "<br />";
break;
}
}
echo '<br /><h2 style="color:green;">Success on node: '.$node->ContentId.'</h2><hr /><br />';
} catch (Exception $e){
echo '<h2 style="color:red;">Failed on node: '.$node->ContentId.'</h2>';
}
}
// go to next <dbo_Content />
$z->next('dbo_Content');
}
} ?>
</body>
</html>
The error message you're getting "String could not be parsed as XML" means that the XML parser found something in the input data that was not valid XML.
You haven't shown us the data, so I can't tell you exactly what is invalid, but something in there is failing to meet the strict rules for XML parsing. There are any number of possible reasons for this.
If I had to stick my neck out and guess, I'd say the most common cause of bad XML in the middle of an otherwise okay file is an unescaped & where there should be the &amp; entity.
Anyone creating their XML using a proper XML writer shouldn't have this issue, but I've come across plenty of cases where people don't bother using an XML writer and just output raw XML as text and have forgotten to escape the entities, which means that the data is fine until you come to a company name with an & in it.
If it's as simple as that, and it's a one-off import, you may be able to fix the file manually in a text editor.
However that's just a guess. You'll need to actually examine the XML file for yourself to see the problem. If you can't see the problem visually, I'd suggest using a GUI XML tool to analyse the file.
Hope that helps.
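To illustrate the ampersand guess, a minimal sketch with a made-up company name: the raw string fails to parse, while the escaped one loads fine.

```php
<?php
libxml_use_internal_errors(true); // collect parser errors instead of emitting warnings

// Made-up sample data: a bare & is not allowed in XML character data.
$raw     = '<company>Smith & Sons</company>';
$escaped = '<company>' . htmlspecialchars('Smith & Sons', ENT_XML1, 'UTF-8') . '</company>';

$bad  = simplexml_load_string($raw);     // parsing fails because of the bare &
$good = simplexml_load_string($escaped); // &amp; is decoded back to & on read

// Print whatever the parser complained about.
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), "\n";
}
libxml_clear_errors();
```

Printing the collected libxml errors like this is also a quick way to see what the parser objects to in a failing row.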
[EDIT]
Okay, I just took a better look at the data in the link you gave, and one thing sticks out like a sore thumb....
encoding="utf-16"
I note that all the data that has worked was using UTF-8, and all the data that has failed is using UTF-16.
PHP is generally fine with UTF-8, but it won't cope very well at all with UTF-16. So it's fairly clear that this is your problem.
And, to be honest, there's really no need to ever use UTF-16, so the solution here is to switch to UTF-8 encoding for everything.
How easy that is for you to do, I can't say, but worst case I'm sure you could find a batch conversion tool.
Hope that helps.
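If the embedded XML strings are in fact already UTF-8 bytes and only the declaration says utf-16 (an assumption; I have not seen the raw data), one possible sketch is to rewrite the declaration before parsing. The function name forceUtf8Declaration and the sample string are invented for illustration; genuinely UTF-16 encoded bytes would need converting as well, which this does not do.

```php
<?php
// Hypothetical helper: relabel a utf-16 XML declaration as utf-8.
// Assumes the string's bytes are already UTF-8; it does NOT convert the bytes.
function forceUtf8Declaration(string $xml): string
{
    $fixed = preg_replace(
        '/^(<\?xml[^>]*encoding=)(["\'])utf-16\2/i',
        '${1}${2}utf-8${2}',
        $xml,
        1
    );
    return $fixed ?? $xml; // fall back to the input if the regex engine reports an error
}

// Made-up sample in the style of the failing rows.
$in  = '<?xml version="1.0" encoding="utf-16"?><content><title>Hi</title></content>';
$out = forceUtf8Declaration($in);
$doc = simplexml_load_string($out);
echo $out, "\n";
```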
I have an XML file which contains text with some very simple layout constructs:
<?xml version='1.0'?>
<page>
<section>
<header>Header</header>
<par>Some paragraph</par>
<par>Another paragraph with <emph>formatting</emph></par>
</section>
</page>
In PHP then I read this file using SimpleXML (Note that I intentionally strip other tags!):
$page = file_get_contents("page.xml");
if ($page) {
$stripped = strip_tags($page, "<?xml><page><section><header><par><emph>");
$xml = new SimpleXMLElement($stripped);
}
Now I would like to iterate over the XML elements and print them in order as HTML for my website. The final result should be the following snippet:
<h1>Header</h1>
<p>Some paragraph
<p>Another paragraph with <i>formatting</i>
I've noodled through SimpleXML and XPath and tried to figure out how I can iterate over the XML tree in order so that I can digest the original XML file into HTML output. I can produce a somewhat desired result but the <emph></emph> is just gone; how do I descend further into the tree? My code so far:
foreach ($xml->section as $s) {
echo "<h1>" . $s->header . "</h1>";
foreach ($s->par as $p) {
echo "<p>" . $p;
// Do some magic here to ensure <emph> tags are recognized and responded to properly.
}
}
Any hints and pointers are appreciated! Thanks :-)
Well, without an answer I just had to noodle myself :-) So here is what I did and it worked out just fine.
Turned out that the SimpleXML thing didn't cut it, so I used the XMLReader:
$xml = new XMLReader();
Then I manually parsed the XML string, jumped from element to element and acted upon each of them:
if ($xml->xml($stripped)) { // $stripped here is a string that's been validated (see below).
while (false !== $xml->read()) {
$t = $xml->nodeType;
if ($t === XMLReader::ELEMENT) {
$n = $xml->name;
switch ($n) {
case "page":
case "section":
// Nothing to echo here.
break;
case "header":
// Handle attributes here
echo "<h1>";
break;
case "par":
echo "<p> ";
break;
case "emph":
echo "<i>"; // This can also open a <span> for more flexibility later.
break;
default:
// Nothing should arrive here.
echo "Gah!";
}
}
else if ($t === XMLReader::END_ELEMENT) {
// Close the opened tags here.
}
else if ($t === XMLReader::TEXT) {
$s = $xml->readString();
echo $s;
}
else {
// Everything else are comments or white spaces.
}
}
}
You get the drift. I basically had to bounce through the XML structure myself and, dependent on the element type, handle attributes and nodes of elements manually.
In fact, this is a two-step process. What you see here assumes a valid XML document. I also have a validator that runs before the above code, and which makes sure that the correct elements are nested properly and that the given XML is "well formed" as per my own definitions of nesting, attributes, and whatnot. The validator operates on the exact same principle.
Hope this helps.
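For completeness, a condensed sketch of the same idea that also closes the tags. The function name renderPageAsHtml and the tag map are my shorthand for this post, not the exact code I run, and the text is escaped on output, which the snippet above does not do:

```php
<?php
// Hypothetical helper: convert the simple page XML into HTML with XMLReader.
function renderPageAsHtml(string $xmlString): string
{
    // XML element name => HTML tag; elements not listed (page, section) emit nothing.
    $tagMap = ['header' => 'h1', 'par' => 'p', 'emph' => 'i'];

    $reader = new XMLReader();
    if (!$reader->XML($xmlString)) {
        return '';
    }

    $html = '';
    while ($reader->read()) {
        $name = $reader->name;
        switch ($reader->nodeType) {
            case XMLReader::ELEMENT:
                if (isset($tagMap[$name])) {
                    $html .= '<' . $tagMap[$name] . '>';
                    // Empty elements such as <par/> never produce END_ELEMENT.
                    if ($reader->isEmptyElement) {
                        $html .= '</' . $tagMap[$name] . '>';
                    }
                }
                break;
            case XMLReader::END_ELEMENT:
                if (isset($tagMap[$name])) {
                    $html .= '</' . $tagMap[$name] . '>';
                }
                break;
            case XMLReader::TEXT:
                $html .= htmlspecialchars($reader->value);
                break;
            // Whitespace and comments are ignored.
        }
    }
    $reader->close();
    return $html;
}

echo renderPageAsHtml(
    '<page><section><header>Header</header>'
    . '<par>Another paragraph with <emph>formatting</emph></par></section></page>'
);
```

Because the reader visits nodes in document order, the text before and inside <emph> comes out in the right place without any extra bookkeeping.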
I have an XML document that will either return a URL or an error code. If it returns an error code I'd like to redirect the page to an error image. I can get SimpleXML to return the URL, but I am not sure how to write a condition if the error returns. If anyone has any suggestions, that'd be great! This is what I have right now:
<?php
error_reporting(0);
$url = 'http://radiocast.co/art/api1/key=KeyHere&album=' . htmlspecialchars($_GET["album"]) . '&artist=' . htmlspecialchars($_GET["artist"]) . '';
$xml = simplexml_load_file($url);
$img = $xml->xpath('//image[@size="large"]');
$large = (string)$img[0];
header("Location:".urldecode($large));
?>
This is what the XML document returns if it cannot be found:
<?xml version="1.0" encoding="utf-8"?>
<lookup status="failed">
<error code="3">Art not found</error></lookup>
How about checking for the error node in your XML, and if you find it then handling the various errors? If you don't find it you can continue with your normal logic.
if (isset($xml->error)) {
switch ((string) $xml->error['code']) {
case '3':
// not found stuff here
break;
// other error codes here
}
} else {
// success logic here
$img = $xml->xpath('//image[@size="large"]');
$large = (string) $img[0];
header("Location: " . urldecode($large));
}
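A self-contained sketch of that check, runnable without the API. The function name resolveArtUrl is invented, the failure XML is the one from the question, and the shape of a successful response (an image element with a size attribute under lookup) is an assumption based on the XPath in the question:

```php
<?php
// Hypothetical helper: returns the large image URL, or null if the lookup failed.
function resolveArtUrl(string $xmlString): ?string
{
    libxml_use_internal_errors(true);
    $xml = simplexml_load_string($xmlString);
    if ($xml === false || isset($xml->error)) {
        return null; // unparsable response, or the API reported an error
    }
    $img = $xml->xpath('//image[@size="large"]');
    if (empty($img)) {
        return null; // no large image in the response
    }
    return (string) $img[0];
}

// Failure response as shown in the question.
$failed = '<?xml version="1.0" encoding="utf-8"?>'
        . '<lookup status="failed"><error code="3">Art not found</error></lookup>';
// Assumed shape of a successful response.
$ok = '<lookup status="ok"><image size="large">http://example.com/cover.png</image></lookup>';

var_dump(resolveArtUrl($failed));
var_dump(resolveArtUrl($ok));

// Usage idea: redirect to a fallback image (placeholder path) when null comes back.
// header('Location: ' . (resolveArtUrl($body) ?? '/images/error.png'));
```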