Parsing XML with XPath in PHP

Consider the following code:
$dom = new DOMDocument();
$dom->loadXML($file);
$xmlPath = new DOMXPath($dom);
$arrNodes = $xmlPath->query('*/item');
foreach($arrNodes as $item){
//missing code
}
$file is an XML string in which each item has a title and a description.
How can I display the title and description of each item?
$file = "<item>
<title>test_title</title>
<desc>test</desc>
</item>";

I suggest using PHP's SimpleXML. You still get XPath support, but with a simpler API. For example, you read an attribute like this:
$name = $item['name'];
Here's an example:
xmlfile.xml:
<?xml version="1.0" encoding="UTF-8"?>
<xml>
<items>
<item title="Hello World" description="Hellowing the world.." />
<item title="Hello People" description="greeting people.." />
</items>
</xml>
do.php:
<?php
$xml_str = file_get_contents('xmlfile.xml');
$xml = new SimpleXMLElement($xml_str);
$items = $xml->xpath('*/item');
foreach($items as $item) {
echo $item['title'], ': ', $item['description'], "\n";
}

If your item looks like this:
<item>
<title>foo</title>
<description>frob</description>
</item>
You could use getElementsByTagName() and nodeValue:
foreach($arrNodes as $item){
print $item->getElementsByTagName('title')->item(0)->nodeValue;
}
Are title and description attributes? E. g. does an item look like this:
<item title="foo" description="frob" />
If so, you could just use getAttribute():
...
foreach($arrNodes as $item){
print $item->getAttribute('title');
}
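Putting the DOM approach together for the element-based layout from the question, here is a minimal sketch. It assumes the items are wrapped in a root element (the name `items` and the second item are made up for illustration) and that each item has `title` and `desc` children. Note that the question's sample has `<item>` as the root element itself, in which case `*/item` matches nothing; the `//item` query used here matches in both layouts.

```php
<?php
// Sample data: the <items> wrapper and the second item are assumptions.
$file = <<<'XML'
<items>
  <item><title>test_title</title><desc>test</desc></item>
  <item><title>second</title><desc>another</desc></item>
</items>
XML;

$dom = new DOMDocument();
$dom->loadXML($file);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//item') as $item) {
    // string() yields a plain string ('' if the child is missing);
    // the second argument makes the expression relative to this item.
    $title = $xpath->evaluate('string(title)', $item);
    $desc  = $xpath->evaluate('string(desc)', $item);
    $rows[] = "$title: $desc";
}

echo implode("\n", $rows), "\n";
```

Using `evaluate()` with `string()` avoids the `->item(0)->nodeValue` chain, which fails with an error when a child element is absent.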

The right XPath expression should be:
/*/item/title | /*/item/desc
Or
/*/item/*[self::title or self::desc]
This evaluates to a node-set containing the title and desc elements in document order.
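As a rough illustration of the union expression with DOMXPath, assuming a wrapping root element and an extra `id` child (both invented here to show what is and is not selected):

```php
<?php
// Sample data: the <items> wrapper and the <id> elements are assumptions.
$file = <<<'XML'
<items>
  <item><id>1</id><title>first title</title><desc>first desc</desc></item>
  <item><id>2</id><title>second title</title><desc>second desc</desc></item>
</items>
XML;

$dom = new DOMDocument();
$dom->loadXML($file);
$xpath = new DOMXPath($dom);

$values = [];
// The union selects title and desc elements only; <id> is not matched.
foreach ($xpath->query('/*/item/title | /*/item/desc') as $node) {
    $values[] = $node->nodeName . '=' . $node->nodeValue;
}

print_r($values);
```

Because the result is a flat list of nodes, the title and desc of one item arrive as two separate entries; querying the items and reading the children per item is often easier when the pairs must stay together.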

Related

Simplexml : xpath with three conditions

I have an XML file like this:
<rss version="2.0" xmlns:atom="https://www.w3.org/2005/Atom">
<channel>
<item>
<city>London</city>
<description>Trip</description>
<link>page.php</link>
<img>img.jpg</img>
</item>
<item>
<city>London</city>
<description>Trip</description>
<link>page.php</link>
<img>img.jpg</img>
</item>
<item>
<city>Paris</city>
<description>Trip</description>
<link>page.php</link>
<img>img.jpg</img>
</item>
.
.
</channel>
</rss>
To select the trips in London, I do this:
<?php
$xml = simplexml_load_file('file.xml');
$items = $xml->xpath('//item[city[contains(.,"London")] and description[contains(.,"Trip")]]');
foreach($items as $item){
echo ' txt ';
}
?>
To select ONLY the first trip in London, I tried this:
<?php
$xml = simplexml_load_file('file.xml');
$items = $xml->xpath('//item[city[contains(.,"London")] and description[contains(.,"Trip")]]')[0];
foreach($items as $item){
echo ' txt ';
}
?>
I also tried 1 instead of 0, as well as
[position()=0]
but it does not work.
What's wrong?
I kept looking and ran several tests with only the position filter, for example:
<?php
$xml = simplexml_load_file('site-alpha.xml');
$items = $xml->xpath('//(/item)[1]');
foreach($xml->channel->item as $item){
echo '<div>....</div>';
}
?>
And it doesn't work.
I think I have a problem with this part, but I don't see where.
<?php
// Load the XML file
$xml = simplexml_load_file('your_xml_file.xml');
// The <item> elements sit inside <channel>, so iterate through those
foreach ($xml->channel->item as $item) {
// Output the description and city
echo $item->description . ' in ' . $item->city . '<br>';
}
?>
Unlike PHP arrays, XPath positions start at 1.
So either of these should get you only the first trip:
# the index is inside the XPath expression, so it starts at 1 (xpath() still returns an array, here with one element):
$items = $xml->xpath('//item[city[contains(.,"London")] and description[contains(.,"Trip")]][1]');
or
# the index is outside the XPath expression, so PHP handles it and it starts at 0 (the result is the element itself):
$items = $xml->xpath('//item[city[contains(.,"London")] and description[contains(.,"Trip")]]')[0];
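The difference between the two forms can be sketched as follows. The `link` values are made up so the items can be told apart, and the parenthesised form `(...)[1]` is used so that the position applies to the whole result rather than to each parent. Note also that looping with `foreach` over a single SimpleXMLElement iterates its child elements, not a list of items, which is likely why the loop in the question misbehaved.

```php
<?php
// Sample data modelled on the question; the link values are assumptions.
$feed = <<<'XML'
<rss version="2.0">
  <channel>
    <item><city>London</city><description>Trip</description><link>first.php</link></item>
    <item><city>London</city><description>Trip</description><link>second.php</link></item>
    <item><city>Paris</city><description>Trip</description><link>third.php</link></item>
  </channel>
</rss>
XML;

$xml = simplexml_load_string($feed);
$filter = '//item[city[contains(.,"London")] and description[contains(.,"Trip")]]';

// Position inside XPath is 1-based; xpath() still returns an array.
$inside = $xml->xpath('(' . $filter . ')[1]');

// Index applied by PHP is 0-based and yields the element itself.
$all = $xml->xpath($filter);
$outside = $all ? $all[0] : null;

// A single element is used directly instead of being looped over.
echo count($all), " matches\n";
echo (string) $inside[0]->link, "\n";
echo (string) $outside->link, "\n";
```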

How can I remove certain elements from XML using SimpleXML

I load the following XML data into SimpleXML like this:
<?php
$xmlString = <<<'XML'
<?xml version="1.0"?>
<response>
<item key="0">
<title>AH 2308</title>
<field_a>3.00</field_a>
<field_b>7.00</field_b>
<field_d1>35.00</field_d1>
<field_d2>40.00</field_d2>
<field_e></field_e>
<field_g2></field_g2>
<field_g>M 45x1,5</field_g>
<field_gewicht>0.13</field_gewicht>
<field_gtin>4055953012781</field_gtin>
<field_l>40.00</field_l>
<field_t></field_t>
<field_abdrueckmutter>KM 9</field_abdrueckmutter>
<field_sicherung>MB 7</field_sicherung>
<field_wellenmutter>KM 7</field_wellenmutter>
</item>
<item key="1">
<title></title>
<field_a></field_a>
<field_b></field_b>
<field_d1></field_d1>
<field_d2></field_d2>
<field_e></field_e>
<field_g2></field_g2>
<field_g></field_g>
<field_gewicht></field_gewicht>
<field_gtin></field_gtin>
<field_l></field_l>
<field_t></field_t>
<field_abdrueckmutter></field_abdrueckmutter>
<field_sicherung></field_sicherung>
<field_wellenmutter></field_wellenmutter>
</item>
</response>
XML;
$xml = simplexml_load_string($xmlString);
How can I achieve the following result:
<?xml version="1.0"?>
<response>
<item key="0">
<title>AH 2308</title>
<field_a>3.00</field_a>
<field_b>7.00</field_b>
<field_d1>35.00</field_d1>
<field_d2>40.00</field_d2>
<field_e></field_e>
<field_g2></field_g2>
<field_g>M 45x1,5</field_g>
<field_gewicht>0.13</field_gewicht>
<field_gtin>4055953012781</field_gtin>
<field_l>40.00</field_l>
<field_t></field_t>
<field_abdrueckmutter>KM 9</field_abdrueckmutter>
<field_sicherung>MB 7</field_sicherung>
<field_wellenmutter>KM 7</field_wellenmutter>
</item>
<item key="1"></item>
</response>
To delete all empty elements, I could use the following working code:
foreach ($xml->xpath('/child::*//*[not(*) and not(text()[normalize-space()])]') as $emptyElement) {
unset($emptyElement[0]);
}
But that's not exactly what I want.
Basically, when the <title> element is empty, I want to remove it together with all its siblings and keep the parent <item> element.
What's important: I also want to keep empty elements if the <title> is not empty. See <item key="0">, for example: the elements <field_e>, <field_g2> and <field_t> will be left untouched.
Is there an easy XPath query that can achieve that? I hope someone can help. Thanks in advance!
This xpath query is working:
foreach ($xml->xpath('//title[not(text()[normalize-space()])]/following-sibling::*') as $emptyElement) {
unset($emptyElement[0]);
}
It keeps the <title> element but I can live with that.
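If the empty <title> should go as well, one possible variation is to select every child of the items whose title is blank, instead of only the title's following siblings. This is a sketch on a shortened version of the sample data (only three fields per item), not a tested drop-in for the full document:

```php
<?php
// Shortened sample: only a few of the original fields are kept.
$xmlString = <<<'XML'
<?xml version="1.0"?>
<response>
  <item key="0"><title>AH 2308</title><field_a>3.00</field_a><field_e></field_e></item>
  <item key="1"><title></title><field_a></field_a><field_e></field_e></item>
</response>
XML;

$xml = simplexml_load_string($xmlString);

// Select every child of an <item> whose <title> is missing or blank.
$expression = '//item[not(title[normalize-space()])]/*';
foreach ($xml->xpath($expression) as $child) {
    unset($child[0]); // removes the node itself from the document
}

echo $xml->asXML();
```

The empty `<field_e>` in the first item survives because the predicate is evaluated on the item's title, not on each field.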
DOM is more flexible for manipulating nodes:
$document = new DOMDocument();
$document->loadXML($xmlString);
$xpath = new DOMXpath($document);
$expression = '/response/item[not(title[normalize-space()])]';
foreach ($xpath->evaluate($expression) as $emptyItem) {
// replace children with an empty text node
$emptyItem->textContent = '';
}
echo $document->saveXML();

PHP XML parsing going directly to value by attribute

I have an XML document that looks like this:
<body>
<item id="9982a">
<value>ab</value>
</item>
<item id="9982b">
<value>abc</value>
</item>
etc...
</body>
Now I need to get the value for a key. The document is very big; is there any way to go directly to the item when I know the id, rather than looping over everything?
Something like:
$xml = simplexml_load_string(file_get_contents('http://somesite.com/new.xml'));
$body = $xml->body;
$body->item['id'][9982a]; // ab
?
XPath is your friend, and you can stay with SimpleXML:
$xml = simplexml_load_string($x); // assume XML in $x
$result = $xml->xpath("/body/item[@id = '9982a']/value")[0]; // requires PHP >= 5.4
echo $result;
Comment:
in PHP < 5.4, do...
$result = $xml->xpath("/body/item[@id = '9982a']/value");
$result = $result[0];
see it working: https://eval.in/101766
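One caveat with indexing the result directly: xpath() returns an empty array when nothing matches, so `[0]` then triggers a warning. A small sketch of a guard, using a hypothetical helper named `findValueById` and assuming the ids never contain a single quote (the id is inlined into the expression):

```php
<?php
$x = <<<'XML'
<body>
  <item id="9982a"><value>ab</value></item>
  <item id="9982b"><value>abc</value></item>
</body>
XML;

$xml = simplexml_load_string($x);

/**
 * Hypothetical helper: the value for an id, or null when the id is absent.
 * Assumes ids contain no single quote, since the id is inlined in the query.
 */
function findValueById(SimpleXMLElement $xml, string $id): ?string
{
    $matches = $xml->xpath("/body/item[@id = '$id']/value");
    return $matches ? (string) $matches[0] : null;
}

var_dump(findValueById($xml, '9982a'));
var_dump(findValueById($xml, 'missing'));
```

This removes the loop from the PHP code, although the XPath engine still has to walk the parsed tree to find the match.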
Yes, use Simple HTML DOM Parser instead of SimpleXML.
It would be as easy as:
$xml->find('item[id="9982b"]',0)->find('value',0)->innertext;
It is possible with DOMXpath::evaluate() to fetch scalar values from a DOM using xpath expressions:
$xml = <<<'XML'
<body>
<item id="9982a">
<value>ab</value>
</item>
<item id="9982b">
<value>abc</value>
</item>
</body>
XML;
$dom = new DOMDocument;
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
var_dump(
$xpath->evaluate('string(//body/item[@id="9982b"]/value)')
);

easy xpath query but no results

I am trying to get all URL values from an XML file.
I have hundreds of entries in exactly this form, e.g. entry 16:
<?xml version="1.0" encoding="utf-8" ?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<entries>
<entry id="16">
<revision number="1" status="accepted" wordclass="v" nounclasses="" unverified="false"></revision>
<media type="audio" url="http://website.com/file/65.mp3" />
</entry>
<entry id="17">
....
</entry>
</entries>
</root>
I am using this code but cannot get it to work. Why?
$doc = new DOMDocument;
$doc->Load('data.xml');
$xpath = new DOMXPath($doc);
$query = '//root/entries/entry/media';
$entries = $xpath->query($query);
What is the correct query for that? Ideally I would get only the url value.
Your query probably returns the right elements, but reading them gives you the content of each media tag, which in your case is empty because the tag is self-closing.
To get the url attribute of the tag, use getAttribute(), for example:
$entries = $xpath->query('//root/entries/entry/media');
foreach($entries as $entry) {
print $entry->getAttribute("url")."<br/>";
}
Or you can query the attribute directly with XPath and read out its value:
$urlAttributes = $xpath->query('//root/entries/entry/media/@url');
foreach ($urlAttributes as $urlAttribute)
{
echo $urlAttribute->value, "<br/>\n";
}
See DOMAttr::$value in the docs:
value
The value of the attribute
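A self-contained sketch of the attribute query, run against inline data so it can be tried without data.xml. The content of entry 17 is elided in the question, so the second URL here is invented purely for illustration:

```php
<?php
// Inline stand-in for data.xml; entry 17's media element is an assumption.
$data = <<<'XML'
<?xml version="1.0" encoding="utf-8" ?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <entries>
    <entry id="16">
      <media type="audio" url="http://website.com/file/65.mp3" />
    </entry>
    <entry id="17">
      <media type="audio" url="http://website.com/file/66.mp3" />
    </entry>
  </entries>
</root>
XML;

$doc = new DOMDocument();
$doc->loadXML($data);
$xpath = new DOMXPath($doc);

$urls = [];
foreach ($xpath->query('/root/entries/entry/media/@url') as $attr) {
    $urls[] = $attr->value; // each result is a DOMAttr
}

print_r($urls);
```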
I would do that with SimpleXML actually:
$file = 'data.xml';
$xpath = '//root/entries/entry/media/@url';
$xml = simplexml_load_file($file);
$urls = array();
if ($xml) {
$urls = array_map('strval', $xml->xpath($xpath));
}
Which will give you all URLs as strings inside the $urls array. If there was an error loading the XML file, the array is empty.

php - simpleXML help

I have this XML code :
<?xml version="1.0"?>
<Days>
<day value="1">
<Imsaak>04:59</Imsaak>
<Fajr>05:09</Fajr>
<Sunrise>06:23</Sunrise>
<Dhuhr>12:39</Dhuhr>
<Asr>16:12</Asr>
<Sunset>18:55</Sunset>
<Maghrib>19:10</Maghrib>
<Isha>20:04</Isha>
</day>
<day value="2">
<Imsaak>04:58</Imsaak>
<Fajr>05:08</Fajr>
<Sunrise>06:22</Sunrise>
<Dhuhr>12:39</Dhuhr>
<Asr>16:12</Asr>
<Sunset>18:56</Sunset>
<Maghrib>19:11</Maghrib>
<Isha>20:05</Isha>
</day>
</Days>
and I want to select a <day> node depending on its attribute value.
I am using the SimpleXMLElement class, but I don't know how to select by attribute value.
How can I do that?
EDIT: my code :
include 'days.xml';
$xml = new SimpleXMLElement($xmlstr);
foreach ($xml->day as $day) {
// process data
}
From the PHP manual page for SimpleXMLElement::attributes (slightly edited):
Considering this data:
<?xml version="1.0" encoding="utf-8"?>
<data>
<item ID="30001">
<Company>Navarro Corp.</Company>
</item>
<item ID="30002">
<Company>Performant Systems</Company>
</item>
<item ID="30003">
<Company>Digital Showcase</Company>
</item>
</data>
Example of listing both the ID Attribute and Company Element values:
<?php
$xmlObject = new SimpleXMLElement($xmlstring);
foreach ($xmlObject->children() as $node) {
$attributes = $node->attributes(); // a SimpleXMLElement holding the attributes, not an array
if ((string) $attributes->ID === '30002') { // compare the value of the ID attribute
print ("Company=".$node->Company);
}
//depending on your needs, you could use a switch / case instead of an if
}
?>
$xml = new SimpleXMLElement($xmlStr);
echo $xml->day[0]->attributes()->value; // will echo out 1
Of course, you can loop through all of the days like this:
foreach ($xml->day as $day) {
echo $day->attributes()->value; // will print 1 and then 2
}
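To select a day by its attribute value without looping, an XPath predicate on `@value` is one option. A minimal sketch on a trimmed copy of the data (only two prayer times per day are kept); `$wanted` is a hypothetical input:

```php
<?php
// Trimmed sample: only Fajr and Isha are kept from the original data.
$xmlStr = <<<'XML'
<?xml version="1.0"?>
<Days>
  <day value="1"><Fajr>05:09</Fajr><Isha>20:04</Isha></day>
  <day value="2"><Fajr>05:08</Fajr><Isha>20:05</Isha></day>
</Days>
XML;

$xml = new SimpleXMLElement($xmlStr);

$wanted = 2; // hypothetical input: the day number to look up
// Casting to int keeps arbitrary input out of the expression.
$matches = $xml->xpath('/Days/day[@value="' . (int) $wanted . '"]');

if ($matches) {
    $day = $matches[0];
    echo 'Fajr: ', $day->Fajr, ', Isha: ', $day->Isha, "\n";
} else {
    echo "No such day\n";
}
```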
