XPath parent selector in PHP

A bit of background:
Hi guys,
I'm creating a router in PHP for an MVC application and decided that the structure would be in XML. I have an XML file containing all valid routes (pages) in the system along with their associated controller & action.
There's also a 'param' to indicate if there's a variable on the end of the URI and the variable name to assign it to (confusingly named I know!!)
What I'm doing is looking at the REQUEST_URI and using PHP's explode function to turn it into an array of 'route' elements which I then build a query for.
Here's some sample XML:
<routes>
<route>
<url>blog</url>
<params>
<controller>blogController</controller>
<action>indexAction</action>
</params>
<route>
<url>entry</url>
<params>
<controller>blogController</controller>
<action>entryAction</action>
<param>entryId</param>
</params>
</route>
</route>
</routes>
And here's the query that's being built:
/routes/route/url[text()="blog"]/../route/url[text()="entry"]/..
This always seems to return 0 nodes in PHP's XPath, but using an online expression tester I get the entry route matched.
Can anyone explain what might be going wrong? Does PHP's XPath parser understand this syntax? I have also tried the parent::* method.
Cheers!

You shouldn't need .. or parent::*.
Try this instead:
/routes/route[url="blog"]/route[url="entry"]
You shouldn't need text() either, but I also don't know PHP very well. If the above doesn't work, try:
/routes/route[url/text()="blog"]/route[url/text()="entry"]
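For what it's worth, here is a minimal sketch of running the predicate form through SimpleXML. It assumes the sample XML from the question is loaded from a string; the variable names ($routes, $matches, $entry) are mine, not from the original code.

```php
<?php
// Sample routes document from the question (nested <route> elements).
$xml = <<<XML
<routes>
  <route>
    <url>blog</url>
    <params>
      <controller>blogController</controller>
      <action>indexAction</action>
    </params>
    <route>
      <url>entry</url>
      <params>
        <controller>blogController</controller>
        <action>entryAction</action>
        <param>entryId</param>
      </params>
    </route>
  </route>
</routes>
XML;

$routes = simplexml_load_string($xml);

// Each predicate filters a <route> by the string value of its <url> child.
$matches = $routes->xpath('/routes/route[url="blog"]/route[url="entry"]');

// xpath() returns an array of SimpleXMLElement objects (empty if nothing matched).
$entry = $matches ? $matches[0] : null;
echo $entry === null ? 'no match' : (string) $entry->params->action, PHP_EOL;
```

Because the predicate sits on the route step itself, the result is already the route element, so there is no need to step back up with .. afterwards.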

XPath-wise I come to the same conclusion as DevNull, with one slight addition to select only the first match:
/routes/route[url="blog"]/route[url="entry"][1]
With an object interface:
$routes = new RoutesXML($xml);
var_dump($routes->fromPath('blog/entry')); # The SimpleXMLElement
var_dump($routes->fromPath('blog/entry2')); # NULL
Example Implementation:
class RoutesXML
{
private $xml;
public function __construct($xml) {
$this->xml = simplexml_load_string($xml);
}
public function fromPath($path) {
$expression = '/routes';
foreach(explode('/', $path) as $element)
$expression .= "/route[url='$element']";
$expression .= '[1]';
list($route) = $this->xml->xpath($expression) + array(NULL);
return $route;
}
}

Related

PHP - Unable to parse attribute using SimpleXML

Given the following xml:
<data xmlns:ns2="...">
<versions>
<ns2:version type="HW">E</ns2:version>
<ns2:version type="FW">3160</ns2:version>
<ns2:version type="SW">3.4.1 (777)</ns2:version>
</versions>
...
</data>
I am trying to read the third entry (ns2:version type="SW"), but when I run the following code I get nothing:
$s = simplexml_load_file('data.xml');
echo $s->versions[2]->{'ns2:version'};
Running this gives the following output:
$s = simplexml_load_file('data.xml');
var_dump($s->versions);
How can I properly get that attribute?
You've got some quite annoying XML to work with there, at least as far as SimpleXML is concerned.
Your version elements are in the ns2 namespace, so in order to loop over them, you need to do something like this:
$s = simplexml_load_string($xml);
foreach ($s->versions[0]->children('ns2', true)->version as $child) {
...
}
The children() method returns all children of the current tag, but only in the default namespace. If you want to access elements in other namespaces, you can pass the local alias and the second argument true.
The more complicated part is that the type attribute is not considered to be part of this same namespace. This means you can't use the standard $element['attribute'] form to access it, since your element and attribute are in different namespaces.
Fortunately, SimpleXML's attributes() method works in the same way as children(), and so to access the attributes in the global namespace, you can pass it an empty string:
$element->attributes('')->type
In full, this is:
$s = simplexml_load_string($xml);
foreach ($s->versions[0]->children('ns2', true)->version as $child) {
echo (string) $child->attributes()->type, PHP_EOL;
}
This will get you the output
HW
FW
SW
To get the value of the third version element (the one with type="SW"):
$sxe = simplexml_load_file('data.xml'); // already returns a SimpleXMLElement
foreach ($sxe as $out_ns) {
$ns = $out_ns->getNamespaces(true);
$child = $out_ns->children($ns['ns2']);
}
echo $child[2];
Output:
3.4.1 (777)
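If you only need one specific entry, an XPath query with a registered prefix may be simpler. This is a sketch, not a drop-in: the namespace URI is elided in the question, so 'urn:example:ns2' below is a placeholder you would replace with the real URI, and the prefix 'v' is an arbitrary choice.

```php
<?php
// 'urn:example:ns2' is a placeholder; the real namespace URI is elided in the question.
$xml = <<<XML
<data xmlns:ns2="urn:example:ns2">
  <versions>
    <ns2:version type="HW">E</ns2:version>
    <ns2:version type="FW">3160</ns2:version>
    <ns2:version type="SW">3.4.1 (777)</ns2:version>
  </versions>
</data>
XML;

$data = simplexml_load_string($xml);

// Bind a prefix of our choosing to the namespace URI for XPath queries.
$data->registerXPathNamespace('v', 'urn:example:ns2');

// The element is namespaced, but the type attribute is not, so @type needs no prefix.
$matches = $data->xpath('//v:version[@type="SW"]');

// Take the first match, or null when nothing was found.
$software = $matches ? (string) $matches[0] : null;
echo $software, PHP_EOL;
```

The prefix used in the query only has to map to the same URI as in the document; it does not have to be spelled ns2.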

How to parse/extract url from an xml file?

I have an XML file that contains the following type of data
<definition name="/products/phone" path="/main/something.jsp" > </definition>
There are dozens of nodes in the xml file.
What I want to do is extract the url under the 'name' parameter so my end result will be:
http://www.mysite.com/products/phone.jsp
Can I do this with a so called XML parser? I have no idea where to begin. Can someone steer me to a direction. What tools do I need to achieve something like that?
I am particularly interested in doing this with PHP.
It should be easy to append a path to an existing URL and expected resource type given the above basic XML.
If you are comfortable with C#, and you know there is one and only one "definition" element, here is a self contained little program that does what you require (and assumes you are loading the XML from a string):
using System;
using System.Xml;
public class parseXml
{
private const string myDomain = "http://www.mysite.com"; // no trailing slash: the name attribute already starts with "/"
private const string myExtension = ".jsp";
public static void Main()
{
string xmlString = "<definition name='/products/phone' path='/main/something.jsp'> </definition>";
XmlDocument doc = new XmlDocument();
doc.LoadXml(xmlString);
string fqdn = myDomain +
doc.DocumentElement.SelectSingleNode("//definition").Attributes["name"].Value + // .Value is the attribute text; ToString() would return the type name
myExtension;
Console.WriteLine("Original XML: {0}\nResultant FQDN: {1}", xmlString, fqdn);
}
}
You are going to need to be careful with SelectSingleNode above; the XPath expression assumes there is only one "definition" node and that you are searching from the document root.
Fundamentally, it's worthwhile to read a primer on XML. XML is not difficult: it's a self-describing hierarchical data format - lots of nested text, angle brackets, and quotation marks :).
A good primer would probably be that at the W3 Schools:
http://www.w3schools.com/xml/xml_whatis.asp
You may also want to read up on streaming (SAX/StreamReader) vs. loading (DOM/XmlDocument) Xml:
What is the difference between SAX and DOM?
I can provide a Java example too, if you feel that would be helpful.
Not sure if you solved your problem, so here is a PHP solution:
$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
<definition name="/products/mobile" path="/main/something.jsp"> </definition>
</root>
DATA;
$arr = array();
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML($xml); // the input is XML, so use the XML loader rather than loadHTML()
$xpath = new DOMXPath($dom);
$defs = $xpath->query('//definition');
foreach($defs as $def) {
$attr = $def->getAttribute('name');
if ($attr != "") {
array_push($arr, $attr);
}
}
print_r($arr);
See IDEONE demo
Result:
Array
(
[0] => /products/phone
[1] => /products/cell
[2] => /products/mobile
)
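To get from those names to the full URLs the question asks for, something along these lines might do. It is only a sketch: $baseUrl and $extension are assumptions taken from the desired result in the question, and the sample document is trimmed to two entries.

```php
<?php
// Assumed base URL and extension, taken from the desired result in the question.
$baseUrl = 'http://www.mysite.com';
$extension = '.jsp';

$xml = <<<XML
<root>
  <definition name="/products/phone" path="/main/something.jsp"> </definition>
  <definition name="/products/cell" path="/main/something.jsp"> </definition>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$urls = [];
foreach ($dom->getElementsByTagName('definition') as $definition) {
    $name = $definition->getAttribute('name');
    if ($name !== '') {
        // rtrim/ltrim keep exactly one slash between the base URL and the path.
        $urls[] = rtrim($baseUrl, '/') . '/' . ltrim($name, '/') . $extension;
    }
}
print_r($urls);
```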

PHP script to echo VLC now playing XML attributes

I've been searching for a while on this and haven't had much luck. I've found plenty of resources showing how to echo data from dynamic XML, but I'm a PHP novice, and nothing I've written seems to grab and print exactly what I want, though from everything I've heard, it should be relatively easy. The source XML (located at 192.168.0.15:8080/requests/status.xml) is as follows:
<root>
<fullscreen>0</fullscreen>
<volume>97</volume>
<repeat>false</repeat>
<version>2.0.5 Twoflower</version>
<random>true</random>
<audiodelay>0</audiodelay>
<apiversion>3</apiversion>
<videoeffects>
<hue>0</hue>
<saturation>1</saturation>
<contrast>1</contrast>
<brightness>1</brightness>
<gamma>1</gamma>
</videoeffects>
<state>playing</state>
<loop>true</loop>
<time>37</time>
<position>0.22050105035305</position>
<rate>1</rate>
<length>168</length>
<subtitledelay>0</subtitledelay>
<equalizer/>
<information>
<category name="meta">
<info name="description">
000003EC 00000253 00000D98 000007C0 00009C57 00004E37 000068EB 00003DC5 00015F90 00011187
</info>
<info name="date">2003</info>
<info name="artwork_url"> file://brentonshp04/music%24/Music/Hackett%2C%20Steve/Guitar%20Noir%20%26%20There%20Are%20Many%20Sides%20to%20the%20Night%20Disc%202/Folder.jpg
</info>
<info name="artist">Steve Hackett</info>
<info name="publisher">Recall</info>
<info name="album">Guitar Noir & There Are Many Sides to the Night Disc 2
</info>
<info name="track_number">5</info>
<info name="title">Beja Flor [Live]</info>
<info name="genre">Rock</info>
<info name="filename">Beja Flor [Live]</info>
</category>
<category name="Stream 0">
<info name="Bitrate">128 kb/s</info>
<info name="Type">Audio</info>
<info name="Channels">Stereo</info>
<info name="Sample rate">44100 Hz</info>
<info name="Codec">MPEG Audio layer 1/2/3 (mpga)</info>
</category>
</information>
<stats>
<lostabuffers>0</lostabuffers>
<readpackets>568</readpackets>
<lostpictures>0</lostpictures>
<demuxreadbytes>580544</demuxreadbytes>
<demuxbitrate>0.015997290611267</demuxbitrate>
<playedabuffers>0</playedabuffers>
<demuxcorrupted>0</demuxcorrupted>
<sendbitrate>0</sendbitrate>
<sentbytes>0</sentbytes>
<displayedpictures>0</displayedpictures>
<demuxreadpackets>0</demuxreadpackets>
<sentpackets>0</sentpackets>
<inputbitrate>0.016695899888873</inputbitrate>
<demuxdiscontinuity>0</demuxdiscontinuity>
<averagedemuxbitrate>0</averagedemuxbitrate>
<decodedvideo>0</decodedvideo>
<averageinputbitrate>0</averageinputbitrate>
<readbytes>581844</readbytes>
<decodedaudio>0</decodedaudio>
</stats>
</root>
What I'm trying to write is a simple PHP script that echoes the artist's name (In this example Steve Hackett). Actually I'd like it to echo the artist, song and album, but I'm confident that if I'm shown how to retrieve one, I can figure out the rest on my own.
What little of my script which actually seems to work goes as follows. I've tried more than what's below, but I left out the bits that I know for a fact aren't working.
<?PHP
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new SimpleXMLElement($file);
foreach($sxe->...
echo "Artist: "...
?>
I think I need to use foreach and echo, but I can't figure out how to do it in a way that will print what's between those info brackets.
I'm sorry if I've left anything out. I'm not only new to PHP, but I'm new to StackOverflow too. I've referenced this site in other projects, and it's always been incredibly helpful, so thanks in advance for your patience and help!
////////Finished Working Script - Thanks to Stefano and all who helped!
<?PHP
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new SimpleXMLElement($file);
$artist_xpath = $sxe->xpath('//info[@name="artist"]');
$album_xpath = $sxe->xpath('//info[@name="album"]');
$title_xpath = $sxe->xpath('//info[@name="title"]');
$artist = (string) $artist_xpath[0];
$album = (string) $album_xpath[0];
$title = (string) $title_xpath[0];
echo "<B>Artist: </B>".$artist."<br>";
echo "<B>Title: </B>".$title."<br>";
echo "<B>Album: </B>".$album."<br>";
?>
Instead of using a for loop, you can obtain the same result with XPath:
// Extraction split across two lines for clarity
$artist_xpath = $sxe->xpath('//info[@name="artist"]');
$artist = (string) $artist_xpath[0];
echo $artist;
You will have to adjust the XPath expression (i.e. change @name=... appropriately), but you get the idea. Also notice that [0] is necessary because xpath() returns an array of matches (and you only need the first), and the (string) cast is used to extract the text contained in the node.
Besides, your XML is invalid and will be rejected by the parser because of the literal & appearing in the <info name="album"> tag.
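A quick way to see the difference, as a sketch: the two strings below are cut down from the album line in the sample as pasted, and the variable names are only for illustration. What VLC actually sends over HTTP may already be escaped.

```php
<?php
// Collect parser errors instead of emitting warnings.
libxml_use_internal_errors(true);

$raw     = '<info name="album">Guitar Noir & There Are Many Sides to the Night Disc 2</info>';
$escaped = '<info name="album">Guitar Noir &amp; There Are Many Sides to the Night Disc 2</info>';

// A bare & starts an entity reference, so this document is not well-formed.
$broken = simplexml_load_string($raw);
var_dump($broken === false);

libxml_clear_errors();

// With &amp; the parser accepts the document and decodes the entity on read.
$ok = simplexml_load_string($escaped);
echo (string) $ok, PHP_EOL;
```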
If you look at your code again, what you are missing is a function that takes the first result of the XPath expression and casts that SimpleXMLElement to a string.
One way to write this only once is to extend SimpleXMLElement:
class BetterXMLElement extends SimpleXMLElement
{
public function xpathString($expression) {
list($result) = $this->xpath($expression);
return (string) $result;
}
}
You then create the more specific element class the same way you created the plain SimpleXMLElement before:
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new BetterXMLElement($file);
And then you benefit in your following code:
$artist = $sxe->xpathString('//info[@name="artist"]');
$album = $sxe->xpathString('//info[@name="album"]');
$title = $sxe->xpathString('//info[@name="title"]');
echo "<B>Artist: </B>".$artist."<br>";
echo "<B>Title: </B>".$title."<br>";
echo "<B>Album: </B>".$album."<br>";
This spares you some repeated code, which also means fewer places where you can make an error :)
Sure, you can optimize this further by accepting an array of several XPath queries and returning all of the values keyed by name. But that is something you need to write yourself, according to your specific needs. So use what you learn in programming to make programming easier :)
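As a rough sketch of that idea: xpathStrings() below is a hypothetical helper (not part of SimpleXML), written as a plain function rather than a subclass, and the XML is a trimmed stand-in for the VLC status document.

```php
<?php
// xpathStrings() is a hypothetical helper, not part of SimpleXML.
function xpathStrings(SimpleXMLElement $element, array $queries): array
{
    $values = [];
    foreach ($queries as $name => $expression) {
        $matches = $element->xpath($expression);
        // Use the first match, or null when the query finds nothing.
        $values[$name] = $matches ? trim((string) $matches[0]) : null;
    }
    return $values;
}

$xml = <<<XML
<root><information><category name="meta">
  <info name="artist">Steve Hackett</info>
  <info name="title">Beja Flor [Live]</info>
</category></information></root>
XML;

$meta = xpathStrings(new SimpleXMLElement($xml), [
    'artist' => '//info[@name="artist"]',
    'title'  => '//info[@name="title"]',
    'album'  => '//info[@name="album"]', // absent in this trimmed sample
]);
print_r($meta);
```

Returning null for a missing node avoids the undefined-offset notice you would get from indexing an empty result with [0].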
If you want some more suggestions, here is another, very detailed example using DOMDocument, the sister library of SimpleXML. It is quite advanced but might give you some good inspiration; I think something similar is possible with SimpleXML as well, and this is probably what you're looking for in the end:
Extracting data from HTML using PHP and xPath

PHP XML Strategy: Parsing DOM to fill "Bean"

I have a question concerning a good strategy on how to fill a data "bean" with data inside an xml file.
The bean might look like this:
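class Person
{
public $id;
public $forename = "";
public $surname = "";
public $bio;
public function __construct()
{
// Property defaults must be constant expressions, so the object is created here.
$this->bio = new Biography();
}
}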
class Biography
{
var $url = "";
var $id;
}
the xml subtree containing the info might look like this:
<root>
<!-- some more parent elements before node(s) of interest -->
<person>
<name pre="forename">
Foo
</name>
<name pre="surname">
Bar
</name>
<id>
1254
</id>
<biography>
<url>
http://www.someurl.com
</url>
<id>
5488
</id>
</biography>
</person>
</root>
At the moment, I have one approach using DOMDocument. A method iterates over the entries and fills the bean by "remembering" the last node. I think that's not a good approach.
What I have in mind is something like preconstructing some XPath expression(s) and then iterating over the subtrees/node lists, eventually returning an array containing the beans as defined above.
However, it does not seem to be possible to reuse a subtree/DOMNode as the DOMXPath constructor parameter.
Has anyone of you encountered such a problem?
Did you mean using an XML file as a sort of template?
You can use some factory to build the empty person or biography node and then feed it, or validate using a DTD.
You can search using XPath on selected DOM nodes; see the PHP DOMXPath manual.
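To make that last point concrete, here is a sketch under a few assumptions: the XML is the sample from the question with the whitespace condensed, and plain arrays stand in for the Person/Biography beans. The DOMXPath object is built once from the document, and each person node is passed as the context node of the relative queries.

```php
<?php
$xml = <<<XML
<root>
  <person>
    <name pre="forename"> Foo </name>
    <name pre="surname"> Bar </name>
    <id> 1254 </id>
    <biography>
      <url> http://www.someurl.com </url>
      <id> 5488 </id>
    </biography>
  </person>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom); // one DOMXPath for the whole document

$persons = [];
foreach ($xpath->query('/root/person') as $personNode) {
    // The second argument makes relative expressions start at $personNode.
    // normalize-space() trims the whitespace around the text values.
    $persons[] = [
        'id'       => $xpath->evaluate('normalize-space(id)', $personNode),
        'forename' => $xpath->evaluate('normalize-space(name[@pre="forename"])', $personNode),
        'surname'  => $xpath->evaluate('normalize-space(name[@pre="surname"])', $personNode),
        'bio'      => [
            'id'  => $xpath->evaluate('normalize-space(biography/id)', $personNode),
            'url' => $xpath->evaluate('normalize-space(biography/url)', $personNode),
        ],
    ];
}
print_r($persons);
```

So the subtree does not go into the DOMXPath constructor; it goes into query() or evaluate() as the context node, and the same pattern repeats for every other element type you need to map.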
No. The XML contains real data. I need to transform it into a PHP array (unfortunately it must be PHP :/ don't ask why ...).
---> You can use some factory to build the empty person or biography node and then feed it, or validate using DTD's
The "bean" is not the problem ... Constructing the list of beans is harder than I thought. Maybe the main problem is related to the solution, since I want to keep it as general as possible.
Here is some Java code I just wrote; maybe you get an idea:
public List<PersonBean> extract(String xml) throws Exception {
InputSource is =new InputSource(new StringReader(xml));
XPathFactory xfactory = XPathFactory.newInstance();
XPath xpath = xfactory.newXPath();
NodeList nodeList = (NodeList)xpath.evaluate("/root/person", is, XPathConstants.NODESET);
int length = nodeList.getLength();
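int pos = -1;
List<PersonBean> persons = new ArrayList<PersonBean>(); // result list; it was never declared in the original snippet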
Traverser tra = new Traverser();
Attribute nameAttr = new Attribute();
nameAttr.setName("attr");
while(++pos < length) {
PersonBean bean = new PersonBean();
Node person = nodeList.item(pos);
Node fore = tra.getElementByNodeName(person, "id");
nameAttr.setValue("forename");
Node pre = tra.getElementByNodeNameWithAttribute(person,"name",nameAttr);
nameAttr.setValue("surname");
Node sur = tra.getElementByNodeNameWithAttribute(person, "name", nameAttr);
bean.setForeName(pre.getTextContent());
bean.setSurName(sur.getTextContent());
bean.setId(fore.getTextContent());
Node bio = tra.getElementByNodeName(person, "biography");
Node bid = tra.getElementByNodeName(bio, "id");
Node url = tra.getElementByNodeName(bio, "url");
BiographyBean bioBean = new BiographyBean();
bioBean.setId(bid.getTextContent());
bioBean.setUrl(url.getTextContent());
bean.setBio(bioBean);
persons.add(bean);
}
return persons;
}
Traverser is just a simple iterative XML traverser.
Attribute is another bean holding a name and a value.
This solution works fine as long as there is a "person" node. However, the code could grow drastically for all the other elements that need to be parsed.
I don't expect ready made solutions, just a small hint in the right direction.. :)
Cheers,
Mike

PHP simplexml: why does xpath stop working?

A strange thing happened after a supplier changed the XML header a bit. I used to be able to read stuff using xpath, but now I can't even get a reply with
$xml->xpath('/');
They changed it from this...
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE NewsML SYSTEM "http://www.newsml.org/dl.php?fn=NewsML/1.2/specification/NewsML_1.2.dtd" [
<!ENTITY % nitf SYSTEM "http://www.nitf.org/IPTC/NITF/3.4/specification/dtd/nitf-3-4.dtd">
%nitf;
]>
<NewsML>
...
to this:
<?xml version="1.0" encoding="iso-8859-1"?>
<NewsML
xmlns="http://iptc.org/std/NewsML/2003-10-10/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://iptc.org/std/NewsML/2003-10-10/ http://www.iptc.org/std/NewsML/1.2/specification/NewsML_1.2.xsd http://iptc.org/std/NITF/2006-10-18/ http://contentdienst.pressetext.com/misc/nitf-3-4.xsd"
>
...
Most likely this is because they've introduced a default namespace (xmlns="http://iptc.org/std/NewsML/2003-10-10/") into their document. SimpleXML's support for default namespaces is not very good, to put it mildly.
Can you try to explicitly register a namespace prefix:
$xml->registerXPathNamespace("n", "http://iptc.org/std/NewsML/2003-10-10/");
$xml->xpath('/n:NewsML');
You would have to adapt your XPath expressions to use the "n:" prefix on every element. Here is some additional info: http://people.ischool.berkeley.edu/~felix/xml/php-and-xmlns.html.
EDIT: As per the spec:
The registerXPathNamespace() function creates a prefix/ns context for the next XPath query.
This means it would have to be called before every XPath query, thus a function to wrap XPath queries would be the natural thing to do:
function simplexml_xpath_ns($element, $xpath, $xmlns)
{
foreach ($xmlns as $prefix_uri)
{
list($prefix, $uri) = explode("=", $prefix_uri, 2);
$element->registerXPathNamespace($prefix, $uri);
}
return $element->xpath($xpath);
}
Usage:
$xmlns = ["n=http://iptc.org/std/NewsML/2003-10-10/"];
$result = simplexml_xpath_ns($xml, '/n:NewsML', $xmlns);
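For a self-contained check of the effect, here is a small sketch. The XML is a trimmed stand-in for the supplier's document (only the default namespace matters here), and the NewsItem child is used purely for illustration.

```php
<?php
// Trimmed stand-in for the supplier's document; only the default namespace matters here.
$xml = <<<XML
<NewsML xmlns="http://iptc.org/std/NewsML/2003-10-10/">
  <NewsItem>example</NewsItem>
</NewsML>
XML;

$doc = simplexml_load_string($xml);

// Unprefixed names in XPath 1.0 match only elements in no namespace.
$plain = $doc->xpath('/NewsML/NewsItem');

// After binding a prefix to the namespace URI, the same path matches.
$doc->registerXPathNamespace('n', 'http://iptc.org/std/NewsML/2003-10-10/');
$prefixed = $doc->xpath('/n:NewsML/n:NewsItem');

echo count($plain), ' ', count($prefixed), PHP_EOL;
```

Note that every step of the path needs the prefix, not just the root element.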
