How to parse an XML tree with DOMDocument? - php

Here is my XML file :
<?xml version="1.0" encoding="utf-8"?>
<root>
  <category>
    <name>Category</name>
    <desc>Category</desc>
    <category>
      <name>Subcategory</name>
      <desc>Sub-category</desc>
      <category>
        <name>Subcategory</name>
        <desc>Sub-category</desc>
      </category>
    </category>
  </category>
</root>
My tree could have any number of levels. There is no requirement limiting the depth.
First question:
Is my XML suited to this kind of requirement, and how could I optimize it (if needed)?
Second question:
How could I parse it with DOMDocument?
I know how to load an XML document, but I don't know how to walk through it.
I read a little about recursion, but I couldn't work out how to apply it with PHP/DOMDocument.
Thanks for the help !
EDIT
What I want to do is manage a category system.
I tried with SQL, but it was too hard to manage using the relational model, even with nested selects, etc.
So I want to be able to build a tree from my XML, like:
Category
  Sub Category
    Sub sub category
Without limits on the depth.
I want to be able to search for a category and retrieve all its children (subcategories), its parent(s), its siblings, etc., each of them optionally.

Well, there's nothing wrong with the XML you're using here, but you don't say enough about what you want to DO with the data for anyone to give you a quality answer about whether or not your XML will capture what you need. As for "[parsing] it with DOMDocument", you can load it into a DOMDocument object like so:
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<root>
<category>
<name>Category</name>
<desc>Category</desc>
<category>
<name>Subcategory</name>
<desc>Sub-category</desc>
<category>
<name>Subcategory</name>
<desc>Sub-category</desc>
</category>
</category>
</category>
</root>
XML;
$d = new DOMDocument();
$d->loadXML($xml);
At this point, the question once again becomes: Now what do you want to DO with it?

If you're just talking about how to handle a structure like this, I'd say write two functions: one that accepts the full structure, and one that accepts a reference to a single category DOMNode. The first function does any initial processing and then passes the top-level category node to the second. The second function processes the current node's properties as needed, and then recurses into its child categories if any are present.
It would be more efficient to process this flat, of course, in one loop, but then you would lose the literal representation of the hierarchy.
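As a rough sketch of that two-function approach, assuming the XML from the question: the function names `documentToTree` and `categoryToArray` and the shape of the returned array are made up for illustration, not part of any library.

```php
<?php
// Hypothetical helper: turn one <category> element into a nested array.
function categoryToArray(DOMElement $category): array
{
    $node = ['name' => null, 'desc' => null, 'children' => []];

    // Look only at direct children so deeper categories are not mixed in.
    foreach ($category->childNodes as $child) {
        if (!$child instanceof DOMElement) {
            continue; // skip whitespace text nodes and comments
        }
        if ($child->tagName === 'category') {
            $node['children'][] = categoryToArray($child); // recurse
        } elseif ($child->tagName === 'name' || $child->tagName === 'desc') {
            $node[$child->tagName] = trim($child->textContent);
        }
    }
    return $node;
}

// Hypothetical entry point: accepts the whole document and hands each
// top-level <category> to the recursive function.
function documentToTree(DOMDocument $doc): array
{
    $tree = [];
    foreach ($doc->documentElement->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'category') {
            $tree[] = categoryToArray($child);
        }
    }
    return $tree;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<root><category><name>Category</name><desc>Category</desc>'
    . '<category><name>Subcategory</name><desc>Sub-category</desc></category>'
    . '</category></root>'
);

$tree = documentToTree($doc);
print_r($tree);
```

The recursion depth simply follows the nesting of the document, so no depth limit has to be configured.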

Recapping the point above about what you want to do with it... IMHO there are three broad classes of thing one might do with a chunk of XML.
Having instantiated a DOMDocument and loaded XML into it, you can search it for nodes using XPath queries, much like you search a relational database using SQL SELECT queries. You can extract the attributes of nodes, the sub-nodes of nodes and the text within nodes, which is a species of parsing, I'd say. The DOMXPath class will do this for you.
You can instead turn your XML into something else (a different XML dialect, XHTML, etc.) using XSL transforms. That may or may not be parsing per se, but it does involve parsing. PHP's XSLTProcessor class will do this.
Another major idea, which DOMDocument does not support, is a streaming parser. The parser consumes XML in a linear manner, and while doing so invokes callback functions at each node of interest. SAX is the archetypal streaming API. PHP provides an event-based parser of this kind in its XML Parser extension (xml_parser_create() and related functions), and XMLReader offers a pull-style alternative; neither needs to hold the whole document in memory.
But, yeah, what do you want to do with your XML?
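For the category lookups described in the question's edit (children, parents, siblings), the XPath route might look like the sketch below. The category names Books, Fiction and Crime are invented sample data; the calls are the standard `DOMXPath::query()` with a context node.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<root><category><name>Books</name><desc>All books</desc>'
    . '<category><name>Fiction</name><desc>Novels</desc>'
    . '<category><name>Crime</name><desc>Detective stories</desc></category>'
    . '</category></category></root>'
);
$xpath = new DOMXPath($doc);

// Find a category by name, at any depth.
$fiction = $xpath->query('//category[name="Fiction"]')->item(0);

// Direct subcategories, evaluated relative to the node found above.
$children = [];
foreach ($xpath->query('category/name', $fiction) as $n) {
    $children[] = $n->textContent;
}

// Names of all ancestor categories, outermost first (document order).
$parents = [];
foreach ($xpath->query('ancestor::category/name', $fiction) as $n) {
    $parents[] = $n->textContent;
}

// Number of sibling categories on either side of the node.
$siblings = $xpath->query(
    'preceding-sibling::category | following-sibling::category',
    $fiction
)->length;

echo 'Children: ', implode(', ', $children), PHP_EOL;
echo 'Parents: ', implode(', ', $parents), PHP_EOL;
echo 'Siblings: ', $siblings, PHP_EOL;
```

Replacing `category/name` with `.//category/name` would return descendants at every depth instead of only direct children.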

You said you tried SQL and it didn't work for you. Just a tip: If you use Oracle, take a look at START WITH ... CONNECT BY, if you use SQL Server, use recursive CTEs. These approaches do solve the problem.

Related

Map XML accordingly

I am having trouble finding a solution to a problem I am facing while parsing XML files.
Let me describe what I have now and what the issue is:
I have links to XML files that contain, for example:
<products>
..
<product>
<id>1</id>
<name><![CDATA[ this is a test product name ]]></name>
<link><![CDATA[http://www.google.com]]></link>
<image><![CDATA[http://www.google.com/image.jpg]]></image>
<sku><![CDATA[ ]]></sku>
<category><![CDATA[ System > Technology ]]></category>
<price>20</price>
<description><![CDATA[ ]]></description>
<instock><![CDATA[ Y ]]></instock>
<availability>Y</availability>
</product>
..
</products>
Another XML has:
<products>
..
<product>
<productID>1</productID>
<title><![CDATA[ ]]></title>
<link><![CDATA[http://www.google.com]]></link>
<image><![CDATA[http://www.google.com/image.jpg]]></image>
<sku><![CDATA[ ]]></sku>
<categoryPath><![CDATA[ System > Technology ]]></categoryPath>
<price>20</price>
<description><![CDATA[ ]]></description>
<instock><![CDATA[ Y ]]></instock>
<availability>Y</availability>
<size>40</size>
</product>
..
</products>
Now, the differences between those are:
1) The first one has a tag named "name", while the other one has a tag named "title".
2) The second one has some tags that the first one does not.
The problem is that I am parsing the XML file via PHP like this:
$xml->products->product[$i]->id
$xml->products->product[$i]->name
and so on. If I do this, the code I have written will work only for the first one. The missing tags are not a problem for now, because those are not required fields and I insert NULL into the database for them.
But what about the second XML? Can I do something "automatically" to avoid asking clients to correct those tags?
Or can this only be done manually, by grabbing the content of the link (via PHP) and renaming the tags?
I do not have the files from my clients, just the links to the XML.
Thanks in advance!
OK! I believe I have found some solutions to my problem. I am writing them here in case someone has the same issue:
Solutions:
i) Read all the child elements of the XML file, however they are named (tag names are case-sensitive), and add them to the database. After that, use a dashboard/PHP file with SQL queries that match those XML tag names to the fields you want.
In this case, you may want to create your own file, for example test.xml, that contains the data under the correct tag names. You could then regenerate this file every few hours (according to your needs) via a cron job.
ii) Manually create a PHP file containing the parsing code for every XML feed that you get. Just make sure to keep the XML link in your DB.
iii) Ask the client to give you the correct XML. XML is case-sensitive for a reason.
If you choose the first solution, you also need to change php.ini, because the XML files may be large and max_execution_time is probably too low to run all these PHP/MySQL scripts.
If someone needs more explanation or has any better advice, please share!
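A lighter variant of the tag-matching idea keeps the mapping in PHP rather than in SQL. The sketch below is only an illustration under assumptions: the `FIELD_ALIASES` table and the `normalizeProduct` function are hypothetical names, and the alias lists cover only the two feeds shown in the question.

```php
<?php
// Hypothetical alias map: canonical field => tag names seen in different feeds.
const FIELD_ALIASES = [
    'id'       => ['id', 'productID'],
    'name'     => ['name', 'title'],
    'category' => ['category', 'categoryPath'],
    'price'    => ['price'],
    'size'     => ['size'],
];

// Return one <product> as a canonical array; missing or empty fields become null.
function normalizeProduct(SimpleXMLElement $product): array
{
    $row = [];
    foreach (FIELD_ALIASES as $field => $tags) {
        $row[$field] = null;
        foreach ($tags as $tag) {
            if (isset($product->{$tag})) {
                $value = trim((string) $product->{$tag});
                $row[$field] = $value === '' ? null : $value;
                break; // first alias present in the feed wins
            }
        }
    }
    return $row;
}

// LIBXML_NOCDATA merges CDATA sections into ordinary text nodes.
$feed = simplexml_load_string(
    '<products><product><productID>1</productID>'
    . '<title><![CDATA[ Test product ]]></title>'
    . '<categoryPath><![CDATA[ System > Technology ]]></categoryPath>'
    . '<price>20</price><size>40</size></product></products>',
    'SimpleXMLElement',
    LIBXML_NOCDATA
);

$rows = [];
foreach ($feed->product as $product) {
    $rows[] = normalizeProduct($product);
}
print_r($rows);
```

For a remote feed, `simplexml_load_file()` with the client's link could replace `simplexml_load_string()`, provided URL access is enabled in the PHP configuration. Supporting a new feed then means adding its tag names to the alias lists rather than writing a new parser.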

How to get attributes from parents of an xml node efficiently

I'm working on a PHP script using SimpleXML / XPath that needs to print citations for sentences from an XML file which has structure similar to the following:
<text name="text_title">
<book name="book_title">
<chapter name="chapter_title">
<sentence name="sentence_number" id="0000">
<word attr="desired_val" id="1111" />
<word attr="undesired_val" id="2222" />
</sentence>
</chapter>
</book>
</text>
The issue is that I need to return each sentence containing a word bearing attr="desired_val", and then a citation containing its text, book, chapter, and sentence number. I'm currently doing the first part with the xpath query
//word[@attr='desired_val']/ancestor::sentence
and the second part with a series of subsequent xpath queries based on the ID attribute of each returned sentence, e.g. for the text node:
/text[book/chapter/sentence[@id={$id}]]/@name
(and so on, for the other relevant nodes). My issue is that this becomes grossly inefficient with large numbers of records, and is causing the script to time out with more than about ten results. Can anyone suggest a better way to do this?
If you need all matches, the only optimization I can imagine is to reduce the enormous number of queries. It takes a long time to build the whole list of matches and then search the document again for each match to collect the remaining information. Instead, it would be better to query the necessary data from your document in just one step. The same problem occurs in database applications, where people execute too many SQL statements instead of doing everything in just one query.
The SQL for XML is called XQuery. If you use XQuery instead of XPath, you can collect all the necessary data in just one step. The following example has been tested with Saxon-HE as an XQuery engine.
<results>
{
  for $x in doc("text.xml")/text/book/chapter/sentence/word
  where $x/@attr = "desired_val"
  return <match text="{$x/../../../../@name}"
                book="{$x/../../../@name}"
                chapter="{$x/../../@name}"
                sentence="{$x/../@name}" />
}
</results>
The following command
java -cp /usr/share/java/Saxon-HE.jar net.sf.saxon.Query '!indent=yes' text.xquery
extracts the required information from the document in just one step.
<?xml version="1.0" encoding="UTF-8"?>
<results>
<match chapter="chapter_title"
text="text_title"
book="book_title"
sentence="sentence_number"/>
</results>
Saxon-HE can be installed on Ubuntu by the following command.
apt-get install libsaxonhe-java
I do not know which XQuery engine is best suited for PHP.
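The same one-step idea can be approximated in PHP itself, without an XQuery engine: run a single document-wide XPath query and read the citation from each match's ancestors through the DOM. This is a sketch that assumes the fixed text/book/chapter/sentence nesting shown in the question.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<text name="text_title"><book name="book_title">'
    . '<chapter name="chapter_title">'
    . '<sentence name="sentence_number" id="0000">'
    . '<word attr="desired_val" id="1111" />'
    . '<word attr="undesired_val" id="2222" />'
    . '</sentence></chapter></book></text>'
);
$xpath = new DOMXPath($doc);

$citations = [];
// One document-wide query; everything else is read relative to each match.
foreach ($xpath->query("//sentence[word/@attr='desired_val']") as $sentence) {
    // Assumes sentence -> chapter -> book -> text nesting, as in the sample.
    $chapter = $sentence->parentNode;
    $book    = $chapter->parentNode;
    $text    = $book->parentNode;

    $citations[] = [
        'text'     => $text->getAttribute('name'),
        'book'     => $book->getAttribute('name'),
        'chapter'  => $chapter->getAttribute('name'),
        'sentence' => $sentence->getAttribute('name'),
    ];
}
print_r($citations);
```

Following `parentNode` is a plain pointer lookup, so the cost per match no longer depends on the size of the document.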

Place a while loop in a string

I would like to create a function where users can create their own XML feed. The feed should be, for example, the following (quite simple) feed:
<xml>
<products>
<product>Product 1</product>
<product>Product 2</product>
</products>
</xml>
It is very important in this setup that there is a connection between the database and the feed template; for example, the product title is loaded from the database. So the user should create, for example, the following 'text/xml' as a basis:
<xml>
<products>
%whileProducts%
<product>%title%</product>
%/whileProducts%
</products>
</xml>
It is possible to enter the product title via a str_replace, but is it also possible to create a while loop via a replace function? To make it a bit more difficult: it could be possible that there are multiple loops in a loop, for example, a user would like to create a feed with a while loop for the products and inside this loop a new loop for the colors and/or sizes of the product.
No, it's not. str_replace() can only perform literal replacements of one set of constant strings with another corresponding set of constant strings; it can't do anything more complex.
What you want here is a templating engine. Since XML is involved, XSLT may be an appropriate tool to use; it's not simple, though. There are many other templating engines for PHP available, and recommending one is outside the scope of this question.
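To show roughly what a templating engine does with such loop markers, here is a minimal sketch built on `preg_replace_callback()`. The function name `renderTemplate` and the convention that `%whileProducts%` reads the `products` key of the data array are assumptions made for this example; a real engine handles far more cases, and this one does not support a loop nested inside a loop of the same name.

```php
<?php
// Hypothetical renderer for the %whileX% ... %/whileX% syntax from the question.
function renderTemplate(string $template, array $data): string
{
    // One pass: either a loop block (groups 1 and 2) or a placeholder (group 3).
    // The backreference \1 pairs each opening tag with its closing tag.
    return preg_replace_callback(
        '/%while(\w+)%(.*?)%\/while\1%|%(\w+)%/s',
        function (array $m) use ($data): string {
            if ($m[1] !== '') {
                // Loop block: render the body once per row.
                $rows = $data[lcfirst($m[1])] ?? [];
                $out = '';
                foreach (is_array($rows) ? $rows : [] as $row) {
                    $out .= renderTemplate($m[2], $row); // nested loops recurse
                }
                return $out;
            }
            // Scalar placeholder: escape the value for use in XML.
            $value = $data[$m[3]] ?? '';
            return is_scalar($value)
                ? htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES)
                : '';
        },
        $template
    );
}

$template = '<products>%whileProducts%<product><title>%title%</title>'
    . '%whileColors%<color>%name%</color>%/whileColors%'
    . '</product>%/whileProducts%</products>';

// Sample data; in the question this would come from the database.
$data = ['products' => [
    ['title' => 'Product 1', 'colors' => [['name' => 'red'], ['name' => 'blue']]],
    ['title' => 'Product 2', 'colors' => []],
]];

$xml = renderTemplate($template, $data);
echo $xml, PHP_EOL;
```

Because the callback's return value is not scanned again, a product title that happens to contain a `%word%` sequence is left as it is.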

Generating XML from XPath

I have an example where I need to update an XML node based on an XPath. SimpleXMLElement makes this easy enough by using an XPath to grab the node and then update its value in a pass-by-reference format.
However, this doesn't work at all if the node doesn't actually exist that needs to be updated. Are there any simple ways to automatically generate the XML that matches the XPath if it doesn't exist?
An example:
<example>
<childNode1>green</childNode1>
</example>
Given the XML, I could easily run the XPath query to get the <childNode1> by loading the XML into a SimpleXMLElement and then running $xml->xpath("/example/childNode1");. I could then set the value of the returned SimpleXMLElement to something new and then save the XML.
However, if I needed to set something like <childNode2> the $xml->xpath("example/childNode2") would return nothing and it wouldn't be possible to set the value or confirm that the XML was built.
Is iterating through the XPath and parsing its values the only way to confirm that each child node exists and then build them out as it goes or is there a better way to generate the necessary XPath?
XPath is a query language used for selecting nodes in an XML structure and optionally computing values based on the contents of the XML. It does not possess functions that enable the editing of the XML or automagic node creation.
PHP's SimpleXML and DOM both have XPath implementations and allow the creation or updating of XML structures; I assume you know about them since you talk about editing nodes (I can give examples if you don't). To perform the kind of node additions that you are proposing, you would need to ascertain whether the node existed (e.g. if the xpath query for example/node1 returned a node, but example/node2 returned nothing), create a new node, and add it to the XML tree. For long xpaths, e.g. example/apple/blueberry/canola/date/emblem/node1, you would indeed have to parse the path and check which elements did exist, and add those that did not.
XQuery, a more powerful, fully-featured XML query language of which XPath is a subset, can construct new XML from query results. Updating a document in place is not part of core XQuery; it is specified separately in the XQuery Update Facility, which only some engines implement, and PHP does not ship with an XQuery engine. From PHP, checking for each node and creating the missing ones yourself remains the practical route.
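The check-and-create approach described above could be sketched with DOM as follows. `ensurePath` is a hypothetical helper, and it understands only plain slash-separated element names starting at the root (no predicates, attributes or axes).

```php
<?php
// Hypothetical helper: walk a simple slash-separated path of element names
// from the root, create any element that is missing, and return the last one.
function ensurePath(DOMDocument $doc, string $path): DOMElement
{
    $names = explode('/', trim($path, '/'));
    $current = $doc->documentElement;

    // The first path segment must name the existing root element.
    if ($current === null || $current->tagName !== array_shift($names)) {
        throw new InvalidArgumentException('Path must start at the root element');
    }

    foreach ($names as $name) {
        $next = null;
        foreach ($current->childNodes as $child) {
            if ($child instanceof DOMElement && $child->tagName === $name) {
                $next = $child; // reuse the first existing match
                break;
            }
        }
        if ($next === null) {
            $next = $current->appendChild($doc->createElement($name));
        }
        $current = $next;
    }
    return $current;
}

$doc = new DOMDocument();
$doc->loadXML('<example><childNode1>green</childNode1></example>');

ensurePath($doc, 'example/childNode2')->textContent = 'blue'; // created
ensurePath($doc, 'example/childNode1')->textContent = 'red';  // updated

$result = $doc->saveXML($doc->documentElement);
echo $result, PHP_EOL;
```

The caller does not need to know beforehand whether a node exists: the same call either finds it or builds the missing part of the path.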

PHP XML DOM looping through complex xml records with varying levels

I'm completely unfamiliar with PHP XML DOM. Any help is appreciated.
I've been using the site and going through previous answers on this, and anywhere else I could find online, but I can't seem to find a good PHP XML DOM tutorial. My ultimate aim is to take a complex XML file and use PHP to upload it into a MySQL DB. Can anyone recommend a good PHP XML DOM tutorial for this?
I want to take all the child elements from an XML record and put them into an associative array. So in the 'entry' XML record below, it would get as far as 'ent_seq', realise there are no sub-nodes, and assign that to a value: $array = ($ent_seq[$counter]=>1636730). Then the loop would realise that the next node, 'k_ele', has sub-nodes, and assign these sub-nodes as $array = ($ent_seq[$counter]=>1636730, $keb[$counter]=>通い, $ke_pri[$counter]=>news1); and continue on like that, breaking out the sub-nodes as appropriate.
<entry>
<ent_seq>1636730</ent_seq>
<k_ele>
<keb>通い</keb>
<ke_pri>news1</ke_pri>
<ke_pri>nf13</ke_pri>
</k_ele>
<r_ele>
<reb>かよい</reb>
<re_pri>news1</re_pri>
<re_pri>nf13</re_pri>
</r_ele>
<sense>
<pos>&n;</pos>
<gloss>coming and going</gloss>
<gloss>commuting</gloss>
</sense>
</entry>
I've not included any of the code I've come up with, as it's not helpful: it just selects the nodes at the ent_seq level, so for each record it only stores a few pieces of data, with all of the sub-nodes of that record lumped together.
If anyone could point me in the right direction, it would be appreciated.
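One possible direction, sketched with DOMDocument: recurse through each entry and collect the text of every leaf element under its tag name. `collectLeaves` is a hypothetical helper. Every key maps to a list, because tags such as ke_pri and gloss repeat. The `<pos>&n;</pos>` line is left out of the sample because an entity that is not declared in the snippet would make loadXML() fail, and the `<entries>` wrapper element is an assumption standing in for the real file's root.

```php
<?php
// Hypothetical helper: collect the text of every leaf element under $node,
// keyed by tag name. Elements with element children are only descended into.
function collectLeaves(DOMElement $node, array &$out): void
{
    $hasElementChild = false;
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $hasElementChild = true;
            collectLeaves($child, $out); // descend into <k_ele>, <sense>, ...
        }
    }
    if (!$hasElementChild) {
        $out[$node->tagName][] = trim($node->textContent);
    }
}

$doc = new DOMDocument();
$doc->loadXML(
    '<entries><entry><ent_seq>1636730</ent_seq>'
    . '<k_ele><keb>通い</keb><ke_pri>news1</ke_pri><ke_pri>nf13</ke_pri></k_ele>'
    . '<r_ele><reb>かよい</reb><re_pri>news1</re_pri><re_pri>nf13</re_pri></r_ele>'
    . '<sense><gloss>coming and going</gloss><gloss>commuting</gloss></sense>'
    . '</entry></entries>'
);

$records = [];
foreach ($doc->getElementsByTagName('entry') as $entry) {
    $row = [];
    collectLeaves($entry, $row);
    $records[] = $row; // one associative array per <entry>
}
print_r($records);
```

Each element of `$records` can then be turned into INSERT statements, with the repeated values going into child tables or being joined into one column, depending on the database design.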
