PHP XML Strategy: Parsing DOM to fill "Bean"

I am looking for a good strategy for filling a data "bean" with the data in an XML file.
The bean might look like this:
class Person
{
    public $id;
    public $forename = "";
    public $surname = "";
    public $bio; // Biography; set in the constructor because a property default cannot use "new"

    public function __construct()
    {
        $this->bio = new Biography();
    }
}
class Biography
{
    public $url = "";
    public $id;
}
The XML subtree containing the information might look like this:
<root>
<!-- some more parent elements before node(s) of interest -->
<person>
<name pre="forename">
Foo
</name>
<name pre="surname">
Bar
</name>
<id>
1254
</id>
<biography>
<url>
http://www.someurl.com
</url>
<id>
5488
</id>
</biography>
</person>
</root>
At the moment, I have one approach using DOMDocument: a method
iterates over the entries and fills the bean by "remembering"
the last node. I don't think that is a good approach.
What I have in mind is to preconstruct some XPath expression(s),
iterate over the matching subtrees/node lists, and finally return
an array containing the beans defined above.
However, it does not seem to be possible to pass a subtree (a DOMNode)
to the DOMXPath constructor.
Has anyone encountered this problem?

Did you mean using an XML file as a sort of template?
You can use a factory to build the empty person or biography node and then fill it, or validate using DTDs.
You can also run XPath queries against a selected DOM node; see the PHP DOMXPath manual.
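A minimal sketch of that context-node lookup: the DOMXPath object is built once for the whole document, and query() and evaluate() take a context node as their second argument, so a relative expression is resolved against a single <person>. The inline XML is a shortened, made-up sample shaped like the one in the question.

```php
<?php
// Made-up sample data modeled on the question's XML (two persons instead of one).
$doc = new DOMDocument();
$doc->loadXML(
    '<root>'
    . '<person><name pre="forename"> Foo </name><biography><url> http://www.someurl.com </url></biography></person>'
    . '<person><name pre="forename"> Baz </name><biography><url> http://www.example.org </url></biography></person>'
    . '</root>'
);

$xpath = new DOMXPath($doc); // one instance for the whole document

$urls = [];
foreach ($xpath->query('/root/person') as $person) {
    // Second argument = context node: the relative paths below only see this <person>.
    $forename = trim($xpath->evaluate('string(name[@pre="forename"])', $person));
    $urlNode  = $xpath->query('biography/url', $person)->item(0);
    $urls[$forename] = trim($urlNode->textContent);
}
print_r($urls);
```

The trim() calls are there because the sample keeps whitespace around the text, as the question's XML does.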

No. The XML contains real data. I need to transform it into a PHP array (unfortunately it must be PHP; don't ask why...).
---> You can use some factory to build the empty person or biography node and then feed it, or validate using DTD's
The "bean" is not the problem. Constructing the list of beans is harder than I thought. Maybe the main problem lies in my solution, since I want to keep it as general as possible.
Here is some Java code I just wrote; maybe it gives you an idea:
public List<PersonBean> extract(String xml) throws Exception {
    InputSource is = new InputSource(new StringReader(xml));
    XPathFactory xfactory = XPathFactory.newInstance();
    XPath xpath = xfactory.newXPath();
    NodeList nodeList = (NodeList) xpath.evaluate("/root/person", is, XPathConstants.NODESET);
    List<PersonBean> persons = new ArrayList<>(); // result list
    int length = nodeList.getLength();
    int pos = -1;
    Traverser tra = new Traverser();
    Attribute nameAttr = new Attribute();
    nameAttr.setName("pre"); // attribute carried by the <name> elements in the XML above
    while (++pos < length) {
        PersonBean bean = new PersonBean();
        Node person = nodeList.item(pos);
        Node idNode = tra.getElementByNodeName(person, "id");
        nameAttr.setValue("forename");
        Node pre = tra.getElementByNodeNameWithAttribute(person, "name", nameAttr);
        nameAttr.setValue("surname");
        Node sur = tra.getElementByNodeNameWithAttribute(person, "name", nameAttr);
        bean.setForeName(pre.getTextContent());
        bean.setSurName(sur.getTextContent());
        bean.setId(idNode.getTextContent());
        Node bio = tra.getElementByNodeName(person, "biography");
        Node bid = tra.getElementByNodeName(bio, "id");
        Node url = tra.getElementByNodeName(bio, "url");
        BiographyBean bioBean = new BiographyBean();
        bioBean.setId(bid.getTextContent());
        bioBean.setUrl(url.getTextContent());
        bean.setBio(bioBean);
        persons.add(bean);
    }
    return persons;
}
Traverser is just a simple iterative XML traverser.
Attribute is another bean holding a name and a value.
This solution works fine as long as there is a "person" node. However, the code could grow drastically once all the other elements need to be parsed as well.
I don't expect ready-made solutions, just a small hint in the right direction. :)
Cheers,
Mike
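One possible direction in PHP, sketched under assumptions: keep the per-element knowledge in a map from field name to relative XPath expression, and let one generic function do the looping. The function name extractRows, its parameters, and the flat array result are all made up for this sketch; the XML is a compacted copy of the sample in the question.

```php
<?php
/**
 * Hypothetical helper: runs $rowQuery, then evaluates every relative XPath in
 * $fields against each matched node and returns one associative array per match.
 */
function extractRows(DOMXPath $xpath, string $rowQuery, array $fields, ?DOMNode $context = null): array
{
    $rows = [];
    foreach ($xpath->query($rowQuery, $context) as $node) {
        $row = [];
        foreach ($fields as $key => $expr) {
            // string(...) yields the text of the first match, or '' when nothing matches
            $row[$key] = trim($xpath->evaluate("string($expr)", $node));
        }
        $rows[] = $row;
    }
    return $rows;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<root><person>'
    . '<name pre="forename"> Foo </name><name pre="surname"> Bar </name><id> 1254 </id>'
    . '<biography><url> http://www.someurl.com </url><id> 5488 </id></biography>'
    . '</person></root>'
);

$persons = extractRows(new DOMXPath($doc), '/root/person', [
    'id'       => 'id',
    'forename' => 'name[@pre="forename"]',
    'surname'  => 'name[@pre="surname"]',
    'bioId'    => 'biography/id',
    'bioUrl'   => 'biography/url',
]);
print_r($persons);
```

Supporting another element type then means writing another map rather than another loop; copying the arrays into Person and Biography objects would be a separate, small step.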

Related

How to parse/extract url from an xml file?

I have an XML file that contains the following type of data
<definition name="/products/phone" path="/main/something.jsp" > </definition>
There are dozens of nodes in the xml file.
What I want to do is extract the url under the 'name' parameter so my end result will be:
http://www.mysite.com/products/phone.jsp
Can I do this with a so-called XML parser? I have no idea where to begin. Can someone point me in the right direction? What tools do I need to achieve something like that?
I am particularly interested in doing this with PHP.

It should be easy to append a path to an existing URL and expected resource type given the above basic XML.
If you are comfortable with C#, and you know there is one and only one "definition" element, here is a self contained little program that does what you require (and assumes you are loading the XML from a string):
using System;
using System.Xml;

public class parseXml
{
    // No trailing slash: the "name" attribute already starts with "/".
    private const string myDomain = "http://www.mysite.com";
    private const string myExtension = ".jsp";

    public static void Main()
    {
        string xmlString = "<definition name='/products/phone' path='/main/something.jsp'> </definition>";
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlString);
        // .Value returns the attribute text; ToString() would return the type name instead.
        string url = myDomain +
            doc.DocumentElement.SelectSingleNode("//definition").Attributes["name"].Value +
            myExtension;
        Console.WriteLine("Original XML: {0}\nResultant URL: {1}", xmlString, url);
    }
}
You are going to need to be careful with SelectSingleNode above; the XPath expression assumes there is only one "definition" node and that you are searching from the document root.
Fundamentally, it's worthwhile to read a primer on XML. XML is not difficult; it's a self-describing hierarchical data format: lots of nested text, angle brackets, and quotation marks :).
A good primer would probably be the one at W3Schools:
http://www.w3schools.com/xml/xml_whatis.asp
You may also want to read up on streaming (SAX/StreamReader) vs. loading (DOM/XmlDocument) Xml:
What is the difference between SAX and DOM?
I can provide a Java example too, if you feel that would be helpful.

Not sure if you solved your problem, so here is a PHP solution:
$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
<definition name="/products/mobile" path="/main/something.jsp"> </definition>
</root>
DATA;
$arr = array();
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML($xml); // the input is XML, so use loadXML() rather than loadHTML()
$xpath = new DOMXPath($dom);
$defs = $xpath->query('//definition');
foreach ($defs as $def) {
    $attr = $def->getAttribute('name');
    if ($attr != "") {
        array_push($arr, $attr);
    }
}
print_r($arr);
See IDEONE demo
Result:
Array
(
[0] => /products/phone
[1] => /products/cell
[2] => /products/mobile
)
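To get from these names to the full URL asked for in the question, the base URL and the extension can be added around each value. A short sketch, assuming the base URL and the .jsp extension shown in the question's desired result; the variable names are made up.

```php
<?php
// Assumed from the desired result in the question.
$base = 'http://www.mysite.com';
$extension = '.jsp';

$xml = <<<DATA
<?xml version="1.0"?>
<root>
<definition name="/products/phone" path="/main/something.jsp"> </definition>
<definition name="/products/cell" path="/main/something.jsp"> </definition>
</root>
DATA;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$urls = [];
// //definition/@name selects the attribute nodes directly.
foreach ($xpath->query('//definition/@name') as $name) {
    $urls[] = $base . $name->value . $extension;
}
print_r($urls);
```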

PHP script to echo VLC now playing XML attributes

I've been searching for a while on this and haven't had much luck. I've found plenty of resources showing how to echo data from dynamic XML, but I'm a PHP novice, and nothing I've written seems to grab and print exactly what I want, though from everything I've heard, it should be relatively easy. The source XML (located at 192.168.0.15:8080/requests/status.xml) is as follows:
<root>
<fullscreen>0</fullscreen>
<volume>97</volume>
<repeat>false</repeat>
<version>2.0.5 Twoflower</version>
<random>true</random>
<audiodelay>0</audiodelay>
<apiversion>3</apiversion>
<videoeffects>
<hue>0</hue>
<saturation>1</saturation>
<contrast>1</contrast>
<brightness>1</brightness>
<gamma>1</gamma>
</videoeffects>
<state>playing</state>
<loop>true</loop>
<time>37</time>
<position>0.22050105035305</position>
<rate>1</rate>
<length>168</length>
<subtitledelay>0</subtitledelay>
<equalizer/>
<information>
<category name="meta">
<info name="description">
000003EC 00000253 00000D98 000007C0 00009C57 00004E37 000068EB 00003DC5 00015F90 00011187
</info>
<info name="date">2003</info>
<info name="artwork_url"> file://brentonshp04/music%24/Music/Hackett%2C%20Steve/Guitar%20Noir%20%26%20There%20Are%20Many%20Sides%20to%20the%20Night%20Disc%202/Folder.jpg
</info>
<info name="artist">Steve Hackett</info>
<info name="publisher">Recall</info>
<info name="album">Guitar Noir & There Are Many Sides to the Night Disc 2
</info>
<info name="track_number">5</info>
<info name="title">Beja Flor [Live]</info>
<info name="genre">Rock</info>
<info name="filename">Beja Flor [Live]</info>
</category>
<category name="Stream 0">
<info name="Bitrate">128 kb/s</info>
<info name="Type">Audio</info>
<info name="Channels">Stereo</info>
<info name="Sample rate">44100 Hz</info>
<info name="Codec">MPEG Audio layer 1/2/3 (mpga)</info>
</category>
</information>
<stats>
<lostabuffers>0</lostabuffers>
<readpackets>568</readpackets>
<lostpictures>0</lostpictures>
<demuxreadbytes>580544</demuxreadbytes>
<demuxbitrate>0.015997290611267</demuxbitrate>
<playedabuffers>0</playedabuffers>
<demuxcorrupted>0</demuxcorrupted>
<sendbitrate>0</sendbitrate>
<sentbytes>0</sentbytes>
<displayedpictures>0</displayedpictures>
<demuxreadpackets>0</demuxreadpackets>
<sentpackets>0</sentpackets>
<inputbitrate>0.016695899888873</inputbitrate>
<demuxdiscontinuity>0</demuxdiscontinuity>
<averagedemuxbitrate>0</averagedemuxbitrate>
<decodedvideo>0</decodedvideo>
<averageinputbitrate>0</averageinputbitrate>
<readbytes>581844</readbytes>
<decodedaudio>0</decodedaudio>
</stats>
</root>
What I'm trying to write is a simple PHP script that echoes the artist's name (In this example Steve Hackett). Actually I'd like it to echo the artist, song and album, but I'm confident that if I'm shown how to retrieve one, I can figure out the rest on my own.
What little of my script which actually seems to work goes as follows. I've tried more than what's below, but I left out the bits that I know for a fact aren't working.
<?PHP
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new SimpleXMLElement($file);
foreach($sxe->...
echo "Artist: "...
?>
I think I need to use foreach and echo, but I can't figure out how to do it in a way that will print what's between those info brackets.
I'm sorry if I've left anything out. I'm not only new to PHP, but I'm new to StackOverflow too. I've referenced this site in other projects, and it's always been incredibly helpful, so thanks in advance for your patience and help!
Finished working script (thanks to Stefano and all who helped!):
<?php
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new SimpleXMLElement($file);
$artist_xpath = $sxe->xpath('//info[@name="artist"]');
$album_xpath = $sxe->xpath('//info[@name="album"]');
$title_xpath = $sxe->xpath('//info[@name="title"]');
$artist = (string) $artist_xpath[0];
$album = (string) $album_xpath[0];
$title = (string) $title_xpath[0];
echo "<b>Artist: </b>".$artist."<br>";
echo "<b>Title: </b>".$title."<br>";
echo "<b>Album: </b>".$album."<br>";
?>

Instead of using a foreach loop, you can obtain the same result with XPath:
// Extraction split across two lines for clarity
$artist_xpath = $sxe->xpath('//info[@name="artist"]');
$artist = (string) $artist_xpath[0];
echo $artist;
You will have to adjust the XPath expression (i.e. change @name=... appropriately), but you get the idea. Also notice that [0] is necessary because xpath() returns an array of matches (and you only need the first), and that the (string) cast extracts the text contained in the node.
Besides, your XML is not well-formed and will be rejected by the parser because of the literal & appearing in the <info name="album"> element.

If you look at your code again, you are missing a function that turns the first result of the XPath expression into a string (by casting the SimpleXMLElement).
One way to write this once is to extend SimpleXMLElement:
class BetterXMLElement extends SimpleXMLElement
{
    public function xpathString($expression) {
        list($result) = $this->xpath($expression);
        return (string) $result;
    }
}
You then create the more specific BetterXMLElement the same way you created the SimpleXMLElement before:
$file = file_get_contents('http://192.168.0.15:8080/requests/status.xml');
$sxe = new BetterXMLElement($file);
And then you benefit in the code that follows:
$artist = $sxe->xpathString('//info[@name="artist"]');
$album = $sxe->xpathString('//info[@name="album"]');
$title = $sxe->xpathString('//info[@name="title"]');
echo "<b>Artist: </b>".$artist."<br>";
echo "<b>Title: </b>".$title."<br>";
echo "<b>Album: </b>".$album."<br>";
This spares you some repeated code, which also means fewer places to make an error. :)
Of course, you can optimize this further by accepting an array of XPath queries and returning all the values by name. But that is something you need to write yourself according to your specific needs. So use what you learn in programming to make programming easier. :)
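A minimal sketch of that array variant, as one way it could look. The method name xpathStrings is made up for this example, and xpathString is written slightly more defensively here so that a query without matches yields an empty string. The inline XML is a cut-down stand-in for the VLC status document.

```php
<?php
class BetterXMLElement extends SimpleXMLElement
{
    /** Returns the text of the first match, or '' when nothing matches. */
    public function xpathString(string $expression): string
    {
        $result = $this->xpath($expression);
        return $result ? (string) $result[0] : '';
    }

    /** Hypothetical extension: maps each name to the text of its first match. */
    public function xpathStrings(array $expressions): array
    {
        $values = [];
        foreach ($expressions as $name => $expression) {
            $values[$name] = $this->xpathString($expression);
        }
        return $values;
    }
}

// Cut-down stand-in for status.xml; note that it has no "album" entry.
$sxe = new BetterXMLElement(
    '<root><information><category name="meta">'
    . '<info name="artist">Steve Hackett</info>'
    . '<info name="title">Beja Flor [Live]</info>'
    . '</category></information></root>'
);

$meta = $sxe->xpathStrings([
    'artist' => '//info[@name="artist"]',
    'title'  => '//info[@name="title"]',
    'album'  => '//info[@name="album"]',
]);
print_r($meta);
```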
If you want some more suggestions, here is another, very detailed example using DOMDocument, the sister library of SimpleXML. It is quite advanced but might give you some good inspiration. I think something similar is possible with SimpleXML as well, and it is probably what you're looking for in the end:
Extracting data from HTML using PHP and xPath

Has someone run across a way to force PHP SimpleXMLElement node names to upper case?

The API integration documents specify that all node names are case sensitive. I'm using PHP's SimpleXMLElement and I don't see a way to force upper-case node names. Has someone run across a way to force node names to upper case?
$xmlstr = '<Request>'.
'</Request>';
$sxe = new SimpleXMLElement($xmlstr);
$authentication = $sxe->addChild('Authentication');
$authentication->addChild('Version', '2.0');
$processid = $sxe->addChild('Process ID=importSale');
$importsale = $processid->addChild('importSale');
$importsale->addChild('SCRIPTCODE', '<![CDATA[SCRIPT001]]>');
$importsale->addChild('PRODID','<!CDATA[DNTMAN]]>');
echo $sxe->asXML();
When viewing this in “View Source” both “SCRIPTCODE” and “PRODID” are in lower case. How do I force these to upper case?

In theory, the code you provided already does the job! All children added to a SimpleXMLElement keep their original case by default.
$sxe = new SimpleXMLElement('<Request></Request>');
$authentication = $sxe->addChild('Authentication');
$authentication->addChild('Version', '2.0');
$processid = $sxe->addChild('Process ID=importSale');
$importsale = $processid->addChild('importSale');
$importsale->addChild('SCRIPTCODE', '<![CDATA[SCRIPT001]]>');
$importsale->addChild('PRODID','<!CDATA[DNTMAN]]>');
echo $sxe->asXML();
What you get when executing the code is something like this:
<Request>
<Authentication>
<Version>2.0</Version>
</Authentication>
<Process>
<Process ID=importSale>
<SCRIPTCODE><![CDATA[SCRIPT001]]></SCRIPTCODE>
<PRODID><!CDATA[DNTMAN]]></PRODID>
</importSale>
</Process>
</Request>
SCRIPTCODE and PRODID both remained upper case!
Note: this is not the proper way to add CDATA to your node values. It will lead to HTML-entity conversion like &lt;![CDATA[...]]&gt;
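A minimal sketch of one way to get real CDATA sections and a real ID attribute. It assumes the intended structure is <Process ID="importSale">, which is a guess based on the code above. SimpleXML has no CDATA method of its own, so the sketch switches to the DOM view of the same node via dom_import_simplexml(). The helper name addCdataChild is made up for this example.

```php
<?php
/** Hypothetical helper: adds <$name> to $parent and puts $text inside a CDATA section. */
function addCdataChild(SimpleXMLElement $parent, string $name, string $text): SimpleXMLElement
{
    $child = $parent->addChild($name);
    $node = dom_import_simplexml($child); // DOMElement view of the same node
    $node->appendChild($node->ownerDocument->createCDATASection($text));
    return $child;
}

$sxe = new SimpleXMLElement('<Request></Request>');
$authentication = $sxe->addChild('Authentication');
$authentication->addChild('Version', '2.0');

$process = $sxe->addChild('Process');
$process->addAttribute('ID', 'importSale'); // attribute instead of a space in the element name
$importSale = $process->addChild('importSale');

addCdataChild($importSale, 'SCRIPTCODE', 'SCRIPT001');
addCdataChild($importSale, 'PRODID', 'DNTMAN');

$out = $sxe->asXML();
echo $out;
```

The element names keep the case they were given, and the CDATA markers are written as markup rather than as escaped text.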

Instead of viewing the source code, try echoing the output to the screen with:
echo htmlentities($sxe->asXML());

Using DOMXml and Xpath, to update XML entries

Hello, I know there are many questions here about these three topics combined to update XML entries, but each one seems very specific to a given problem.
I have spent some time trying to understand XPath and how it works, but I still can't work out what I need to do.
Here we go.
I have this XML file:
<?xml version="1.0" encoding="UTF-8"?>
<storagehouse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema.xsd">
<item id="c7278e33ef0f4aff88da10dfeeaaae7a">
<name>HDMI Cable 3m</name>
<weight>0.5</weight>
<category>Cables</category>
<location>B3</location>
</item>
<item id="df799fb47bc1e13f3e1c8b04ebd16a96">
<name>Dell U2410</name>
<weight>2.5</weight>
<category>Monitors</category>
<location>C2</location>
</item>
</storagehouse>
What I would like to do is update/edit any of the nodes above when I need to. I will build an HTML form for that.
But my biggest concern is: how do I find the desired node and update it?
Here is some of what I am trying to do:
<?php
function fnDOMEditElementCond()
{
$dom = new DOMDocument();
$dom->load('storage.xml');
$library = $dom->documentElement;
$xpath = new DOMXPath($dom);
// I kind of understand this one here
$result = $xpath->query('/storagehouse/item[1]/name');
//This one not so much
$result->item(0)->nodeValue .= ' Series';
// This will remove the CDATA property of the element.
//To retain it, delete this element (see delete eg) & recreate it with CDATA (see create xml eg).
//2nd Way
//$result = $xpath->query('/library/book[author="J.R.R.Tolkein"]');
// $result->item(0)->getElementsByTagName('title')->item(0)->nodeValue .= ' Series';
header("Content-type: text/xml");
echo $dom->saveXML();
}
?>
Could someone give me an example with attributes and so on, so that once a user decides to update a node, I can find that node with XPath and then update it?

The following example uses SimpleXML, which is a close friend of DOMDocument. The XPath shown is the same whichever of the two you use; I use SimpleXML here to keep the code short. I'll show a more advanced DOMDocument example later on.
So, about the XPath: how to find the node and update it. First of all, how to find the node:
The node has the element (tag) name item. You are looking for it inside the storagehouse element, which is the root element of your XML document. All item elements in your document are expressed like this in XPath:
/storagehouse/item
From the root, first storagehouse, then item, separated by /. You already know that, so the interesting part is how to select only those item elements that have a specific ID. For that, a predicate is added at the end:
/storagehouse/item[@id="id"]
This again returns item elements, but this time only those whose id attribute has the value id (a string). For example, in your case, with the following XML:
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<storagehouse xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="schema.xsd">
<item id="c7278e33ef0f4aff88da10dfeeaaae7a">
<name>HDMI Cable 3m</name>
<weight>0.5</weight>
<category>Cables</category>
<location>B3</location>
</item>
<item id="df799fb47bc1e13f3e1c8b04ebd16a96">
<name>Dell U2410</name>
<weight>2.5</weight>
<category>Monitors</category>
<location>C2</location>
</item>
</storagehouse>
XML;
this XPath:
/storagehouse/item[@id="df799fb47bc1e13f3e1c8b04ebd16a96"]
will return the computer monitor (because an item with that id exists). If there were multiple items with the same id value, all of them would be returned. If there were none, none would be returned. So let's wrap that into a code example:
$id = 'df799fb47bc1e13f3e1c8b04ebd16a96';
$simplexml = simplexml_load_string($xml);
$result = $simplexml->xpath(sprintf('/storagehouse/item[@id="%s"]', $id));
if (!$result || count($result) !== 1) {
    throw new Exception(sprintf('Item with id "%s" does not exist or is not unique.', $id));
}
list($item) = $result;
In this example, $item is the SimpleXMLElement object of the computer monitor's XML element named item.
So now for the changes, which are extremely easy with SimpleXML in your case:
$item->category = 'LCD Monitor';
And to finally see the result:
echo $simplexml->asXML();
Yes, that's all it takes with SimpleXML in your case.
If you want to do this with DOMDocument, it works quite similarly. However, to update an element's value, you also need to access the child element of that item. The following example first fetches the item as before. If you compare it with the SimpleXML example above, you can see that things do not really differ:
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$result = $xpath->query(sprintf('/storagehouse/item[@id="%s"]', $id));
if (!$result || $result->length !== 1) {
    throw new Exception(sprintf('Item with id "%s" does not exist or is not unique.', $id));
}
$item = $result->item(0);
Again, $item contains the item XML element of the computer monitor, but this time as a DOMElement. To modify the category element in there (or more precisely its nodeValue), that child needs to be obtained first. You can do this with XPath again, but this time with an expression relative to the $item element:
./category
Assuming that there always is a category child element in the item element, this could be written as follows:
$category = $xpath->query('./category', $item)->item(0);
$category now contains the first category child element of $item. What's left is updating its value:
$category->nodeValue = "LCD Monitor";
And to finally see the result:
echo $doc->saveXML();
And that's it. Whether you choose SimpleXML or DOMDocument depends on your needs. You can even switch between the two. You might also want to map the items to objects and keep track of changes:
$repository = new Repository($xml);
$item = $repository->getItemByID($id);
$item->category = 'LCD Monitor';
$repository->saveChanges();
echo $repository->getXML();
Naturally this requires more code, which is too much for this answer.
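For orientation only, here is a rough sketch of what such a Repository could look like on top of SimpleXML. Only the method names used above come from the answer; the internals, the hasChanges() method, and the in-memory behavior of saveChanges() are assumptions, and the sample XML is a shortened stand-in with a made-up id.

```php
<?php
// Hypothetical sketch; a real implementation would likely persist to a file.
final class Repository
{
    private $xml;      // SimpleXMLElement holding the whole document
    private $original; // serialized document as of the last save

    public function __construct(string $xml)
    {
        $this->xml = new SimpleXMLElement($xml);
        $this->original = $this->xml->asXML();
    }

    public function getItemByID(string $id): SimpleXMLElement
    {
        $result = $this->xml->xpath(sprintf('/storagehouse/item[@id="%s"]', $id));
        if (!$result || count($result) !== 1) {
            throw new Exception(sprintf('Item with id "%s" does not exist or is not unique.', $id));
        }
        return $result[0]; // still attached to the document, so edits show up in getXML()
    }

    public function hasChanges(): bool
    {
        return $this->xml->asXML() !== $this->original;
    }

    public function saveChanges(): void
    {
        // This sketch only records the new state in memory.
        $this->original = $this->xml->asXML();
    }

    public function getXML(): string
    {
        return $this->xml->asXML();
    }
}

$repository = new Repository(
    '<storagehouse><item id="a1"><name>Dell U2410</name><category>Monitors</category></item></storagehouse>'
);
$item = $repository->getItemByID('a1');
$item->category = 'LCD Monitor';
$changed = $repository->hasChanges();
$repository->saveChanges();
echo $repository->getXML();
```

Because the id is placed into the XPath string with sprintf(), ids that come from user input (for example an HTML form) should be validated first, for instance against a hex pattern such as the ids in the question.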

Get child elements in xml with PHP

I have an XML file that I need to parse to get values. Below is a snippet of the XML:
<?xml version="1.0"?>
<mobile>
<userInfo>
</userInfo>
<CATALOG>
<s0>
<SUB0>
<DESCR>Paranormal Studies</DESCR>
<SUBJECT>147</SUBJECT>
</SUB0>
</s0>
<sA>
<SUB0>
<DESCR>Accounting</DESCR>
<SUBJECT>ACCT</SUBJECT>
</SUB0>
<SUB1>
<DESCR>Accounting</DESCR>
<SUBJECT>ACCTG</SUBJECT>
</SUB1>
<SUB2>
<DESCR>Anatomy</DESCR>
<SUBJECT>ANATOMY</SUBJECT>
</SUB2>
<SUB3>
<DESCR>Anthropology</DESCR>
<SUBJECT>ANTHRO</SUBJECT>
</SUB3>
<SUB4>
<DESCR>Art</DESCR>
<SUBJECT>ART</SUBJECT>
</SUB4>
<SUB5>
<DESCR>Art History</DESCR>
<SUBJECT>ARTHIST</SUBJECT>
</SUB5>
</sA>
So, I need to grab all the child elements of <sA>, and there are more elements called <sB>, etc.
But I do not know how to get all of the child elements of <sA>, <sB>, etc.

How about this:
$xmlstr = LoadTheXMLFromSomewhere(); // placeholder for however you obtain the XML string
$xml = simplexml_load_string($xmlstr);
$result = $xml->xpath('//sA');
foreach ($result as $node) {
    // do something with node
}
PHP has a nice extension for accessing XML, called SimpleXML for a reason. Consider using it heavily if your code only needs to access part of the XML (i.e. query it). Also consider doing the queries with XPath, which is a convenient way to do it.
Notice that the example handles sA nodes only, but you can easily adapt the code to other node types.
Hope this helps!

You should look into simplexml_load_string(), as I'm pretty sure it would make your life a lot easier. It returns a SimpleXMLElement for the root element, which you can use like so:
$xml = simplexml_load_string(<your huge xml string>);
foreach ($xml->CATALOG->sA->children() as $value) {
    // do things with sA children
}

$xml = new DOMDocument();
$xml->load('path_to_xml');
$catalog = $xml->getElementsByTagName('CATALOG')->item(0);
// childNodes also contains whitespace text nodes; skip anything that is not a DOMElement
$nodes = $catalog->getElementsByTagName('sA')->item(0)->childNodes;
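To cover <sA>, <sB>, and any further groups without naming each one, the children of CATALOG can be iterated generically. A minimal sketch with SimpleXML, using a shortened and properly closed stand-in for the catalog from the question; the shape of the result array is just one possible choice.

```php
<?php
// Shortened stand-in for the question's XML, closed so that it parses.
$xmlstr = <<<XML
<?xml version="1.0"?>
<mobile>
  <CATALOG>
    <s0>
      <SUB0><DESCR>Paranormal Studies</DESCR><SUBJECT>147</SUBJECT></SUB0>
    </s0>
    <sA>
      <SUB0><DESCR>Accounting</DESCR><SUBJECT>ACCT</SUBJECT></SUB0>
      <SUB1><DESCR>Anatomy</DESCR><SUBJECT>ANATOMY</SUBJECT></SUB1>
    </sA>
  </CATALOG>
</mobile>
XML;

$xml = simplexml_load_string($xmlstr);

$subjects = [];
// children() yields every child element whatever its name,
// so s0, sA, sB, ... need no special handling.
foreach ($xml->CATALOG->children() as $group) {
    foreach ($group->children() as $sub) {
        $subjects[$group->getName()][] = [
            'descr'   => (string) $sub->DESCR,
            'subject' => (string) $sub->SUBJECT,
        ];
    }
}
print_r($subjects);
```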
