How do I process XML string in order using PHP - php

I have an XML file which contains text with some very simple layout constructs:
<?xml version='1.0'?>
<page>
<section>
<header>Header</header>
<par>Some paragraph</par>
<par>Another paragraph with <emph>formatting</emph></par>
</section>
</page>
In PHP then I read this file using SimpleXML (Note that I intentionally strip other tags!):
$page = file_get_contents("page.xml");
if ($page) {
$stripped = strip_tags($page, "<?xml><page><section><header><par><emph>");
$xml = new SimpleXMLElement($stripped);
}
Now I would like to iterate over the XML elements and print them in order as HTML for my website. The final result should be the following snippet:
<h1>Header</h1>
<p>Some paragraph
<p>Another paragraph with <i>formatting</i>
I've noodled through SimpleXML and XPath and tried to figure out how I can iterate over the XML tree in order so that I can digest the original XML file into HTML output. I can produce a somewhat desired result but the <emph></emph> is just gone; how do I descent further into the tree? My code so far:
foreach ($xml->section as $s) {
echo "<h1>" . $s->header . "</h1>";
foreach ($s->par as $p) {
echo "<p>" . $p;
// Do some magic here to ensure <emph> tags are recognized and responded to properly.
}
}
Any hints and pointers are appreciated! Thanks :-)

Well, without an answer I just had to noodle myself :-) So here is what I did and it worked out just fine.
Turned out that the SimpleXML thing didn't cut it, so I used the XMLReader:
$xml = new XMLReader();
Then I manually parsed the XML string, jumped from element to element and acted upon each of them:
if ($xml->xml($stripped)) { // $stripped here is a string that's been validated (see below).
while (false !== $xml->read()) {
$t = $xml->nodeType;
if ($t === XMLReader::ELEMENT) {
$n = $xml->name;
switch ($n) {
case "page":
case "section":
// Nothing to echo here.
break;
case "header":
// Handle attributes here
echo "<h1>";
break;
case "par":
echo "<p> ";
break;
case "emph";
echo "<i>"; // This can also open a <span> for more flexibility later.
break;
default:
// Nothing should arrive here.
echo "Gah!"
}
}
else if ($t === XMLReader::END_ELEMENT) {
... // Close the opened tags here.
}
else if ($t === XMLReader::TEXT) {
$s = $xml->readString();
echo $s;
}
else {
// Everything else are comments or white spaces.
}
}
}
You get the drift. I basically had to bounce through the XML structure myself and, dependent on the element type, handle attributes and nodes of elements manually.
In fact, this is a two-step process. What you see here assumes a valid XML document. I also have a validator that runs before the above code, and which makes sure that the correct elements are nested properly and that the given XML is "well formed" as per my own definitions of nesting, attributes, whatnot. The validator operates after the exact same principle.
Hope this helps.

Related

Need to remove headers from cURL XML response in PHP

There's a few threads about this, but I couldn't find a solution to this issue in them. I hope it doesn't violate duplicate rules.
I've tested the following code with static XML and it works great, but said XML did not contain any headers.
I'm trying to remove headers through code after making a POST request so I can continue to process the resulting XML, but I'm not having any luck with it.
This is the XML:
<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><AUTOS_Cotizar_PHPResponse xmlns="http://tempuri.org/"><AUTOS_Cotizar_PHPResult><auto xmlns=""><operacion>1555843</operacion><statusSuccess>TRUE</statusSuccess><statusText></statusText><cotizacion><cobertura><codigo>A0</codigo><descripcion>RESPONSABILIDAD CIVIL SOLAMENTE</descripcion><premio>928,45</premio><cuotas>01</cuotas><impcuotas>928,45</impcuotas></cobertura></cotizacion><datos_cotiz><suma>477250</suma><uso>901</uso></datos_cotiz></auto></AUTOS_Cotizar_PHPResult></AUTOS_Cotizar_PHPResponse></soap:Body></soap:Envelope>
this is the code:
//converting raw cURL response to XML
$temp1 = htmlspecialchars ($reply);
//replacing top headers
$temp2 = str_replace('<?xml version="1.0" encoding="utf-8"?><soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"><soap:Body><AUTOS_Cotizar_PHPResponse xmlns="http://tempuri.org/"><AUTOS_Cotizar_PHPResult>', "<<<'EOD'", $temp1);
//replacing closing header tags
$temp3 = str_replace('</AUTOS_Cotizar_PHPResult></AUTOS_Cotizar_PHPResponse></soap:Body></soap:Envelope>', "EOD;", $temp2);
//this returns the original $temp1 without having anything replaced
echo $temp3;
//simplexml conversion
$xml = simplexml_load_string($temp3);
//running through the array and printing all values
if ($xml !== false) {
foreach ($xml->cotizacion as $cotizacion) {
foreach ($cotizacion->cobertura as $cobertura) {
echo $cobertura->codigo;
echo '<br>';
echo $cobertura->descripcion;
echo '<br>';
echo $cobertura->premio;
echo '<br>';
echo $cobertura->cuotas;
echo '<br>';
echo $cobertura->impcuotas;
echo '<br>';
}
}
}
There are probably more efficient ways to do this, or maybe I'm not doing this correctly. I'm just about learning right now, so feel free to correct me in any way if you want, I'd appreciate it!
The way you are processing the response string is a bad idea, you should stick to processing the content as XML and work with it. This uses XPath to find a start point to process the data (which I can't test with the current sample), but should help with what you need to do...
// Load the original reply
$xml = simplexml_load_string($reply);
//running through the array and printing all values
if ($xml !== false) {
// Find the <auto> element (use [0] as you want the first one)
$auto = $xml->xpath("//auto")[0];
// Loop through the cotizacion elements in the auto element
foreach ($auto->cotizacion as $cotizacion) {
foreach ($cotizacion->cobertura as $cobertura) {
echo $cobertura->codigo;
echo '<br>';
echo $cobertura->descripcion;
echo '<br>';
echo $cobertura->premio;
echo '<br>';
echo $cobertura->cuotas;
echo '<br>';
echo $cobertura->impcuotas;
echo '<br>';
}
}
}
The SOAP response is still an XML document, so work with it instead of fighting it. Treating it as a string is definitely not great.
As far as I can tell you're trying to work with all the <cotizaction> elements. It's simple to find elements inside an XML document. Read up on XPath.
$xml = simplexml_load_string(htmlspecialchars($reply));
if ($xml) {
foreach ($xml->xpath('//cotizacion') as $cotizacion) {
// do your thing
}
}

PHP echo content from <tags>

I've searched around and around and I'm not sure how this really works.
I have the tags
<taghere>content</taghere>
and i want to pull the "content" so i can put an ifstatement depending on what the "content" is as the "content" is varrying depending on the page
i.e
<taghere>HelloWorld</taghere>
$content = //function that returns the text between <taghere> and </taghere>
if($content == "HelloWorld")
{
//execute function;
}
else if($content =="Bonjour")
{
//execute seperate function
}
i tried using preg but it doesnt seem to work and just returns whatever value is in the lines field instead of actually giving me the information within the tags
If I understand your question correctly, you want the data INSIDE the tag "taghere".
If you are parsing HTML, you should use DOMDocument
Try something similar to this:
<?php
// Assuming your content (the html where those tags are found) is available as $html
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your HTML
libxml_clear_errors();
// Note: Tag names are case sensitive
$text = $dom->getElementsByTagName('taghere');
// Echo the content
echo $text
you can use DomDocument and loadXML to do this
<?php
function doAction($word=""){
$html="<taghere>$word</taghere>";
$doc = new DOMDocument();
$doc->loadXML($html);
//discard white space
$hTwo= $doc->getElementsByTagName('taghere'); // here u use your desired tag
if($hTwo->item(0)->nodeValue== "HelloWorld")
{
echo "1";
}
else if($hTwo->item(0)->nodeValue== "Bonjour")
{
echo "2";
//execute seperate function
}
}
doAction($word="Bonjour");
You cannot do it like that. Technically it is possible but it's more than an overkill. And you mixed up PHP with HTML in a way that doesn't work.
To achieve the thing that you want you have to do something like this:
$content = 'something';
if ($comtent === 'something') {
//do something
}
if ($content === 'something else') {
//do something else
}
echo '<tag>'. $content . '</tag>' ;
Of course you can change $content in the ifs.
Dont forget, you can allways add an ID into a tag so you can reference it with java script.
<tag id='tagid'>blah blah blah </tag>
<script>
document.getElementById(tagid)
</script>
This might be a much simpler way to get what you are thinking about then some of the other responses
I don't know what regex you tried and therefor not what would have been wrong. Might have been the escaping of the <
<?php
if(preg_match('#\<taghere>(.*)\</taghere>#', $document, $a)){
$content = $a[1];
}
?>
I suppose there will be only one

Parsing large XML with XMLReader [duplicate]

I have the following XML file, the file is rather large and i haven't been able to get simplexml to open and read the file so i'm trying XMLReader with no success in php
<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
<last_updated>2009-11-30 13:52:40</last_updated>
<product>
<element_1>foo</element_1>
<element_2>foo</element_2>
<element_3>foo</element_3>
<element_4>foo</element_4>
</product>
<product>
<element_1>bar</element_1>
<element_2>bar</element_2>
<element_3>bar</element_3>
<element_4>bar</element_4>
</product>
</products>
I've unfortunately not found a good tutorial on this for PHP and would love to see how I can get each element content to store in a database.
It all depends on how big the unit of work, but I guess you're trying to treat each <product/> nodes in succession.
For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access them. This way, you keep the memory usage low because you're treating one node at a time and you still leverage SimpleXML's ease of use. For instance:
$z = new XMLReader;
$z->open('data.xml');
$doc = new DOMDocument;
// move to the first <product /> node
while ($z->read() && $z->name !== 'product');
// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
// either one should work
//$node = new SimpleXMLElement($z->readOuterXML());
$node = simplexml_import_dom($doc->importNode($z->expand(), true));
// now you can use $node without going insane about parsing
var_dump($node->element_1);
// go to next <product />
$z->next('product');
}
Quick overview of pros and cons of different approaches:
XMLReader only
Pros: fast, uses little memory
Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain
XMLReader + SimpleXML
Pros: doesn't use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.
XMLReader + DOM
Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use simplexml_import_dom() but it doesn't seem to work in that case
Cons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.
My advice: write a prototype with SimpleXML, see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the possibility of you introducing bugs or introducing performance regressions.
For xml formatted with attributes...
data.xml:
<building_data>
<building address="some address" lat="28.902914" lng="-71.007235" />
<building address="some address" lat="48.892342" lng="-75.0423423" />
<building address="some address" lat="58.929753" lng="-79.1236987" />
</building_data>
php code:
$reader = new XMLReader();
if (!$reader->open("data.xml")) {
die("Failed to open 'data.xml'");
}
while($reader->read()) {
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
$address = $reader->getAttribute('address');
$latitude = $reader->getAttribute('lat');
$longitude = $reader->getAttribute('lng');
}
$reader->close();
The accepted answer gave me a good start, but brought in more classes and more processing than I would have liked; so this is my interpretation:
$xml_reader = new XMLReader;
$xml_reader->open($feed_url);
// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'product');
// loop through the products
while ($xml_reader->name == 'product')
{
// load the current xml element into simplexml and we’re off and running!
$xml = simplexml_load_string($xml_reader->readOuterXML());
// now you can use your simpleXML object ($xml).
echo $xml->element_1;
// move the pointer to the next product
$xml_reader->next('product');
}
// don’t forget to close the file
$xml_reader->close();
Most of my XML parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). As such, my answer assumes you want only specific information and you know where it is located.
I find the easiest way to use XMLReader is to know which tags I want the information out of and use them. If you know the structure of the XML and it has lots of unique tags, I find that using the first case is the easy. Cases 2 and 3 are just to show you how it can be done for more complex tags. This is extremely fast; I have a discussion of speed over on What is the fastest XML parser in PHP?
The most important thing to remember when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {... - which checks to be sure we're only dealing with opening nodes and not whitespace or closing nodes or whatever.
function parseMyXML ($xml) { //pass in an XML string
$myXML = new XMLReader();
$myXML->xml($xml);
while ($myXML->read()) { //start reading.
if ($myXML->nodeType == XMLReader::ELEMENT) { //only opening tags.
$tag = $myXML->name; //make $tag contain the name of the tag
switch ($tag) {
case 'Tag1': //this tag contains no child elements, only the content we need. And it's unique.
$variable = $myXML->readInnerXML(); //now variable contains the contents of tag1
break;
case 'Tag2': //this tag contains child elements, of which we only want one.
while($myXML->read()) { //so we tell it to keep reading
if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') { // and when it finds the amount tag...
$variable2 = $myXML->readInnerXML(); //...put it in $variable2.
break;
}
}
break;
case 'Tag3': //tag3 also has children, which are not unique, but we need two of the children this time.
while($myXML->read()) {
if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
$variable3 = $myXML->readInnerXML();
break;
} else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
$variable4 = $myXML->readInnerXML();
break;
}
}
break;
}
}
}
$myXML->close();
}
Simple example:
public function productsAction()
{
$saveFileName = 'ceneo.xml';
$filename = $this->path . $saveFileName;
if(file_exists($filename)) {
$reader = new XMLReader();
$reader->open($filename);
$countElements = 0;
while($reader->read()) {
if($reader->nodeType == XMLReader::ELEMENT) {
$nodeName = $reader->name;
}
if($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
switch ($nodeName) {
case 'id':
var_dump($reader->value);
break;
}
}
if($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
$countElements++;
}
}
$reader->close();
exit(print('<pre>') . var_dump($countElements));
}
}
XMLReader is well documented on PHP site. This is a XML Pull Parser, which means it's used to iterate through nodes (or DOM Nodes) of given XML document. For example, you could go through the entire document you gave like this:
<?php
$reader = new XMLReader();
if (!$reader->open("data.xml"))
{
die("Failed to open 'data.xml'");
}
while($reader->read())
{
$node = $reader->expand();
// process $node...
}
$reader->close();
?>
It is then up to you to decide how to deal with the node returned by XMLReader::expand().
This Works Better and Faster For Me
<html>
<head>
<script>
function showRSS(str) {
if (str.length==0) {
document.getElementById("rssOutput").innerHTML="";
return;
}
if (window.XMLHttpRequest) {
// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp=new XMLHttpRequest();
} else { // code for IE6, IE5
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.onreadystatechange=function() {
if (this.readyState==4 && this.status==200) {
document.getElementById("rssOutput").innerHTML=this.responseText;
}
}
xmlhttp.open("GET","getrss.php?q="+str,true);
xmlhttp.send();
}
</script>
</head>
<body>
<form>
<select onchange="showRSS(this.value)">
<option value="">Select an RSS-feed:</option>
<option value="Google">Google News</option>
<option value="ZDN">ZDNet News</option>
<option value="job">Job</option>
</select>
</form>
<br>
<div id="rssOutput">RSS-feed will be listed here...</div>
</body>
</html>
**The Backend File **
<?php
//get the q parameter from URL
$q=$_GET["q"];
//find out which feed was selected
if($q=="Google") {
$xml=("http://news.google.com/news?ned=us&topic=h&output=rss");
} elseif($q=="ZDN") {
$xml=("https://www.zdnet.com/news/rss.xml");
}elseif($q == "job"){
$xml=("https://ngcareers.com/feed");
}
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
//get elements from "<channel>"
$channel=$xmlDoc->getElementsByTagName('channel')->item(0);
$channel_title = $channel->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
$channel_link = $channel->getElementsByTagName('link')
->item(0)->childNodes->item(0)->nodeValue;
$channel_desc = $channel->getElementsByTagName('description')
->item(0)->childNodes->item(0)->nodeValue;
//output elements from "<channel>"
echo("<p><a href='" . $channel_link
. "'>" . $channel_title . "</a>");
echo("<br>");
echo($channel_desc . "</p>");
//get and output "<item>" elements
$x=$xmlDoc->getElementsByTagName('item');
$count = $x->length;
// print_r( $x->item(0)->getElementsByTagName('title')->item(0)->nodeValue);
// print_r( $x->item(0)->getElementsByTagName('link')->item(0)->nodeValue);
// print_r( $x->item(0)->getElementsByTagName('description')->item(0)->nodeValue);
// return;
for ($i=0; $i <= $count; $i++) {
//Title
$item_title = $x->item(0)->getElementsByTagName('title')->item(0)->nodeValue;
//Link
$item_link = $x->item(0)->getElementsByTagName('link')->item(0)->nodeValue;
//Description
$item_desc = $x->item(0)->getElementsByTagName('description')->item(0)->nodeValue;
//Category
$item_cat = $x->item(0)->getElementsByTagName('category')->item(0)->nodeValue;
echo ("<p>Title: <a href='" . $item_link
. "'>" . $item_title . "</a>");
echo ("<br>");
echo ("Desc: ".$item_desc);
echo ("<br>");
echo ("Category: ".$item_cat . "</p>");
}
?>

simple_html_dom not returning <h1> elements?

I'm testing a parser using SIMPLE_HTML_DOM and while parsing
the returned HTML DOM from this URL: HERE
It is not finding the H1 elements...
I tried returning all the div's with success.
I'm using a simple request for diagnosing this problem:
foreach($html->find('H1') as $value) { echo "<br />F: ".htmlspecialchars($value); }
While looking at the source code I realized that:
h1 is upper case -> H1 - but the SIMPLE_HTML... is handling that:
//PaperG - If lowercase is set, do a case insensitive test of the value of the selector.
if ($lowercase) {
$check = $this->match($exp, strtolower($val), strtolower($nodeKeyValue));
} else {
$check = $this->match($exp, $val, $nodeKeyValue);
}
if (is_object($debugObject)) {$debugObject->debugLog(2, "after match: " . ($check ? "true" : "false"));}
Can any body help me understanding what is going on here?
Try This
$oHtml = str_get_html($html);
foreach($oHtml->find('h1') as $element)
{
echo $element->innertext;
}
You will also use regular expression following function return an array of all h1 tag's innertext
function getH1($yourhtml)
{
$h1tags = preg_match_all("/(<h1.*>)(\w.*)(<\/h1>)/isxmU", $yourhtml, $patterns);
$res = array();
array_push($res, $patterns[2]);
array_push($res, count($patterns[2]));
return $res;
}
Found it...
But cant explain it!
I tested with another code including H1 (uppercase) and it worked.
While playing with the SIMPLE_HTML_DOM code i commented the "remove_noise" and now its working
perfectly, I think it's because that this website has invalid HTML and
the noise remover is removing too much and not ending after the end tags scripts:
// $this->remove_noise("'<\s*script[^>]*[^/]>(.*?)<\s*/\s*script\s*>'is");
// $this->remove_noise("'<\s*script\s*>(.*?)<\s*/\s*script\s*>'is");
Thank you all for your help.

How to use XMLReader in PHP?

I have the following XML file, the file is rather large and i haven't been able to get simplexml to open and read the file so i'm trying XMLReader with no success in php
<?xml version="1.0" encoding="ISO-8859-1"?>
<products>
<last_updated>2009-11-30 13:52:40</last_updated>
<product>
<element_1>foo</element_1>
<element_2>foo</element_2>
<element_3>foo</element_3>
<element_4>foo</element_4>
</product>
<product>
<element_1>bar</element_1>
<element_2>bar</element_2>
<element_3>bar</element_3>
<element_4>bar</element_4>
</product>
</products>
I've unfortunately not found a good tutorial on this for PHP and would love to see how I can get each element content to store in a database.
It all depends on how big the unit of work, but I guess you're trying to treat each <product/> nodes in succession.
For that, the simplest way would be to use XMLReader to get to each node, then use SimpleXML to access them. This way, you keep the memory usage low because you're treating one node at a time and you still leverage SimpleXML's ease of use. For instance:
$z = new XMLReader;
$z->open('data.xml');
$doc = new DOMDocument;
// move to the first <product /> node
while ($z->read() && $z->name !== 'product');
// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
// either one should work
//$node = new SimpleXMLElement($z->readOuterXML());
$node = simplexml_import_dom($doc->importNode($z->expand(), true));
// now you can use $node without going insane about parsing
var_dump($node->element_1);
// go to next <product />
$z->next('product');
}
Quick overview of pros and cons of different approaches:
XMLReader only
Pros: fast, uses little memory
Cons: excessively hard to write and debug, requires lots of userland code to do anything useful. Userland code is slow and prone to error. Plus, it leaves you with more lines of code to maintain
XMLReader + SimpleXML
Pros: doesn't use much memory (only the memory needed to process one node) and SimpleXML is, as the name implies, really easy to use.
Cons: creating a SimpleXMLElement object for each node is not very fast. You really have to benchmark it to understand whether it's a problem for you. Even a modest machine would be able to process a thousand nodes per second, though.
XMLReader + DOM
Pros: uses about as much memory as SimpleXML, and XMLReader::expand() is faster than creating a new SimpleXMLElement. I wish it was possible to use simplexml_import_dom() but it doesn't seem to work in that case
Cons: DOM is annoying to work with. It's halfway between XMLReader and SimpleXML. Not as complicated and awkward as XMLReader, but light years away from working with SimpleXML.
My advice: write a prototype with SimpleXML, see if it works for you. If performance is paramount, try DOM. Stay as far away from XMLReader as possible. Remember that the more code you write, the higher the possibility of you introducing bugs or introducing performance regressions.
For xml formatted with attributes...
data.xml:
<building_data>
<building address="some address" lat="28.902914" lng="-71.007235" />
<building address="some address" lat="48.892342" lng="-75.0423423" />
<building address="some address" lat="58.929753" lng="-79.1236987" />
</building_data>
php code:
$reader = new XMLReader();
if (!$reader->open("data.xml")) {
die("Failed to open 'data.xml'");
}
while($reader->read()) {
if ($reader->nodeType == XMLReader::ELEMENT && $reader->name == 'building') {
$address = $reader->getAttribute('address');
$latitude = $reader->getAttribute('lat');
$longitude = $reader->getAttribute('lng');
}
$reader->close();
The accepted answer gave me a good start, but brought in more classes and more processing than I would have liked; so this is my interpretation:
$xml_reader = new XMLReader;
$xml_reader->open($feed_url);
// move the pointer to the first product
while ($xml_reader->read() && $xml_reader->name != 'product');
// loop through the products
while ($xml_reader->name == 'product')
{
// load the current xml element into simplexml and we’re off and running!
$xml = simplexml_load_string($xml_reader->readOuterXML());
// now you can use your simpleXML object ($xml).
echo $xml->element_1;
// move the pointer to the next product
$xml_reader->next('product');
}
// don’t forget to close the file
$xml_reader->close();
Most of my XML parsing life is spent extracting nuggets of useful information out of truckloads of XML (Amazon MWS). As such, my answer assumes you want only specific information and you know where it is located.
I find the easiest way to use XMLReader is to know which tags I want the information out of and use them. If you know the structure of the XML and it has lots of unique tags, I find that using the first case is the easy. Cases 2 and 3 are just to show you how it can be done for more complex tags. This is extremely fast; I have a discussion of speed over on What is the fastest XML parser in PHP?
The most important thing to remember when doing tag-based parsing like this is to use if ($myXML->nodeType == XMLReader::ELEMENT) {... - which checks to be sure we're only dealing with opening nodes and not whitespace or closing nodes or whatever.
function parseMyXML ($xml) { //pass in an XML string
$myXML = new XMLReader();
$myXML->xml($xml);
while ($myXML->read()) { //start reading.
if ($myXML->nodeType == XMLReader::ELEMENT) { //only opening tags.
$tag = $myXML->name; //make $tag contain the name of the tag
switch ($tag) {
case 'Tag1': //this tag contains no child elements, only the content we need. And it's unique.
$variable = $myXML->readInnerXML(); //now variable contains the contents of tag1
break;
case 'Tag2': //this tag contains child elements, of which we only want one.
while($myXML->read()) { //so we tell it to keep reading
if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') { // and when it finds the amount tag...
$variable2 = $myXML->readInnerXML(); //...put it in $variable2.
break;
}
}
break;
case 'Tag3': //tag3 also has children, which are not unique, but we need two of the children this time.
while($myXML->read()) {
if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Amount') {
$variable3 = $myXML->readInnerXML();
break;
} else if ($myXML->nodeType == XMLReader::ELEMENT && $myXML->name === 'Currency') {
$variable4 = $myXML->readInnerXML();
break;
}
}
break;
}
}
}
$myXML->close();
}
Simple example:
public function productsAction()
{
$saveFileName = 'ceneo.xml';
$filename = $this->path . $saveFileName;
if(file_exists($filename)) {
$reader = new XMLReader();
$reader->open($filename);
$countElements = 0;
while($reader->read()) {
if($reader->nodeType == XMLReader::ELEMENT) {
$nodeName = $reader->name;
}
if($reader->nodeType == XMLReader::TEXT && !empty($nodeName)) {
switch ($nodeName) {
case 'id':
var_dump($reader->value);
break;
}
}
if($reader->nodeType == XMLReader::END_ELEMENT && $reader->name == 'offer') {
$countElements++;
}
}
$reader->close();
exit(print('<pre>') . var_dump($countElements));
}
}
XMLReader is well documented on PHP site. This is a XML Pull Parser, which means it's used to iterate through nodes (or DOM Nodes) of given XML document. For example, you could go through the entire document you gave like this:
<?php
$reader = new XMLReader();
if (!$reader->open("data.xml"))
{
die("Failed to open 'data.xml'");
}
while($reader->read())
{
$node = $reader->expand();
// process $node...
}
$reader->close();
?>
It is then up to you to decide how to deal with the node returned by XMLReader::expand().
This Works Better and Faster For Me
<html>
<head>
<script>
function showRSS(str) {
if (str.length==0) {
document.getElementById("rssOutput").innerHTML="";
return;
}
if (window.XMLHttpRequest) {
// code for IE7+, Firefox, Chrome, Opera, Safari
xmlhttp=new XMLHttpRequest();
} else { // code for IE6, IE5
xmlhttp=new ActiveXObject("Microsoft.XMLHTTP");
}
xmlhttp.onreadystatechange=function() {
if (this.readyState==4 && this.status==200) {
document.getElementById("rssOutput").innerHTML=this.responseText;
}
}
xmlhttp.open("GET","getrss.php?q="+str,true);
xmlhttp.send();
}
</script>
</head>
<body>
<form>
<select onchange="showRSS(this.value)">
<option value="">Select an RSS-feed:</option>
<option value="Google">Google News</option>
<option value="ZDN">ZDNet News</option>
<option value="job">Job</option>
</select>
</form>
<br>
<div id="rssOutput">RSS-feed will be listed here...</div>
</body>
</html>
**The Backend File **
<?php
//get the q parameter from URL
$q=$_GET["q"];
//find out which feed was selected
if($q=="Google") {
$xml=("http://news.google.com/news?ned=us&topic=h&output=rss");
} elseif($q=="ZDN") {
$xml=("https://www.zdnet.com/news/rss.xml");
}elseif($q == "job"){
$xml=("https://ngcareers.com/feed");
}
$xmlDoc = new DOMDocument();
$xmlDoc->load($xml);
//get elements from "<channel>"
$channel=$xmlDoc->getElementsByTagName('channel')->item(0);
$channel_title = $channel->getElementsByTagName('title')
->item(0)->childNodes->item(0)->nodeValue;
$channel_link = $channel->getElementsByTagName('link')
->item(0)->childNodes->item(0)->nodeValue;
$channel_desc = $channel->getElementsByTagName('description')
->item(0)->childNodes->item(0)->nodeValue;
//output elements from "<channel>"
echo("<p><a href='" . $channel_link
. "'>" . $channel_title . "</a>");
echo("<br>");
echo($channel_desc . "</p>");
//get and output "<item>" elements
$x=$xmlDoc->getElementsByTagName('item');
$count = $x->length;
// print_r( $x->item(0)->getElementsByTagName('title')->item(0)->nodeValue);
// print_r( $x->item(0)->getElementsByTagName('link')->item(0)->nodeValue);
// print_r( $x->item(0)->getElementsByTagName('description')->item(0)->nodeValue);
// return;
for ($i=0; $i <= $count; $i++) {
//Title
$item_title = $x->item(0)->getElementsByTagName('title')->item(0)->nodeValue;
//Link
$item_link = $x->item(0)->getElementsByTagName('link')->item(0)->nodeValue;
//Description
$item_desc = $x->item(0)->getElementsByTagName('description')->item(0)->nodeValue;
//Category
$item_cat = $x->item(0)->getElementsByTagName('category')->item(0)->nodeValue;
echo ("<p>Title: <a href='" . $item_link
. "'>" . $item_title . "</a>");
echo ("<br>");
echo ("Desc: ".$item_desc);
echo ("<br>");
echo ("Category: ".$item_cat . "</p>");
}
?>

Categories