reading xml: can xpath read 2 fields? - php

I'm using SimpleXMLElement and xpath to try and read the <subcategory><name> from the xml at the very bottom. This code works.. but the stuff inside the while loop looks a little messy, and now I also want to get the <subcategory><count> and somehow pair it with its appropriate <subcategory><name>.
$names = $xml->xpath('/theroot/category/subcategories/subcategory/name/');
while(list( , $node) = each($names)) {
echo $node;
}
My question: Is it possible to get this pairing while still using xpath since it looks like it can make the job easier?
<theroot>
<category>
<name>Category 1</name>
<subcategories>
<subcategory>
<name>Subcategory 1.1</name>
<count>18</count>
</subcategory>
<subcategory>
<name>Subcategory 1.2</name>
<count>29</count>
</subcategory>
</subcategories>
</category>
<category>
<name>Category 2</name>
<subcategories>
<subcategory>
<name>Subcategory 2.1</name>
<count>18</count>
</subcategory>
<subcategory>
<name>Subcategory 2.2</name>
<count>29</count>
</subcategory>
</subcategories>
</category>
</theroot>

If you are using SimpleXML, and you know the exact layout, it might be easier to do this:
$subcategories = $xml->xpath('/theroot/category/subcategories/subcategory');
foreach($subcategories as $subcategory){
echo $subcategory->name.'='.$subcategory->count;
}
With XPath, you could of course select all subnodes of subcategory, but pairing them back up could be more trouble than just foregoing XPath for the last node.
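A minimal sketch of that pairing, assuming the XML from the question (shortened here to one category) is available as a string. The $source and $pairs names are only illustrative and do not come from the original post.

```php
<?php
// Shortened copy of the question's document; $source is an illustrative name.
$source = <<<XML
<theroot>
  <category>
    <name>Category 1</name>
    <subcategories>
      <subcategory><name>Subcategory 1.1</name><count>18</count></subcategory>
      <subcategory><name>Subcategory 1.2</name><count>29</count></subcategory>
    </subcategories>
  </category>
</theroot>
XML;

$xml = new SimpleXMLElement($source);

// Select the parent <subcategory> nodes, then read both children from each one.
$pairs = [];
foreach ($xml->xpath('/theroot/category/subcategories/subcategory') as $subcategory) {
    // Cast explicitly: child accessors return SimpleXMLElement objects, not scalars.
    $pairs[(string) $subcategory->name] = (int) $subcategory->count;
}

print_r($pairs);
```

Selecting the parent node keeps each name next to its own count, so nothing has to be matched up afterwards.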

Related

SimpleXML append multiple files

I'm trying to combine multiple XML files, using SimpleXML if possible. I'm just trying to append products, children, and child data from file 2 into file 1. I'm not trying to merge elements, just append file 2 to the bottom of file 1, and so on. (Though I guess this is technically merging merchandiser elements?) The files contain the same schema and will both look similar to the example below, only thing that will be changing is the actual text. This is just XML for two different products, I added a large space in between products so that it's easier to see where it ends.
<merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="merchandiser.xsd">
<header>
<merchantId>35928</merchantId>
<merchantName>Sunspel Clothing</merchantName>
<createdOn>01/14/2016 02:03:31</createdOn>
</header>
<product product_id="14633" name="Cotton Socks" sku_number="1588/102">
<category>
<primary>Accessories</primary>
<secondary>Men's~~Socks</secondary>
</category>
<URL>
<product>
http://click.linksynergy.com/link?id=D*rqD2paIXY&offerid=191965.14633&type=15&murl=http%3A%2F%2Fwww.sunspel.com%2Fuk%2Fcotton-sock-black.html
</product>
<productImage>
http://www.sunspel.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/5/1588-102-new.jpg
</productImage>
</URL>
<description>
<short>
Our new cotton socks are designed by Sunspel and crafted in an Italian factory steeped in years of experience, skill and heritage. They are made from the highest quality, extra-long staple Egyptian cotton yarn which, prior to knitting is combed, twisted and mercerised to enhance the comfort, shine and absorption of the fabric as well as its resistance to pilling and shrinking.
</short>
</description>
<discount currency="GBP">
<type>amount</type>
</discount>
<price currency="GBP">
<retail>15.00</retail>
</price>
<shipping>
<availability>in-stock</availability>
</shipping>
<pixel>
http://ad.linksynergy.com/fs-bin/show?id=D*rqD2paIXY&bids=191965.14633&type=15&subid=0
</pixel>
</product>
<product product_id="15115" name="Cotton Socks" sku_number="1589/236">
<category>
<primary>Accessories</primary>
<secondary>Men's~~Socks~~Men's</secondary>
</category>
<URL>
<product>
http://click.linksynergy.com/link?id=D*rqD2paIXY&offerid=191965.15115&type=15&murl=http%3A%2F%2Fwww.sunspel.com%2Fuk%2Fmens-cotton-socks-navy-stripes.html
</product>
<productImage>
http://www.sunspel.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/5/1588-236-new.jpg
</productImage>
</URL>
<description>
<short>
Our new cotton socks are designed by Sunspel and crafted in an Italian factory steeped in years of experience, skill and heritage. They are made from the highest quality, extra-long staple Egyptian cotton yarn which, prior to knitting is combed, twisted and mercerised to enhance the comfort, shine and absorption of the fabric as well as its resistance to pilling and shrinking.
</short>
</description>
<discount currency="GBP">
<type>amount</type>
</discount>
<price currency="GBP">
<retail>17.00</retail>
</price>
<shipping>
<availability>in-stock</availability>
</shipping>
<pixel>
http://ad.linksynergy.com/fs-bin/show?id=D*rqD2paIXY&bids=191965.15115&type=15&subid=0
</pixel>
</product>
<product product_id="15116" name="Cotton Socks" sku_number="1589/711">
<category>
<primary>Accessories</primary>
<secondary>Men's~~Socks~~Men's</secondary>
</category>
<URL>
<product>
http://click.linksynergy.com/link?id=D*rqD2paIXY&offerid=191965.15116&type=15&murl=http%3A%2F%2Fwww.sunspel.com%2Fuk%2Fmens-cotton-socks-charcoal-melange-stripes.html
</product>
<productImage>
http://www.sunspel.com/media/catalog/product/cache/1/image/9df78eab33525d08d6e5fb8d27136e95/1/5/1588-711-new.jpg
</productImage>
</URL>
<description>
<short>
Our new cotton socks are designed by Sunspel and crafted in an Italian factory steeped in years of experience, skill and heritage. They are made from the highest quality, extra-long staple Egyptian cotton yarn which, prior to knitting is combed, twisted and mercerised to enhance the comfort, shine and absorption of the fabric as well as its resistance to pilling and shrinking.
</short>
</description>
<discount currency="GBP">
<type>amount</type>
</discount>
<price currency="GBP">
<retail>17.00</retail>
</price>
<shipping>
<availability>in-stock</availability>
</shipping>
<pixel>
http://ad.linksynergy.com/fs-bin/show?id=D*rqD2paIXY&bids=191965.15116&type=15&subid=0
</pixel>
</product>
With some foreach statements I'm able to append all product children and attributes, but this doesn't actually give me the child data.
$file1 = '35928_3210820_mp.xml';
$file2 = '39153_3210820_mp.xml';
$fileout = 'ukmerge.xml';
$xml1 = simplexml_load_file( $file1 );
$xml2 = simplexml_load_file( $file2 );
// loop through the product and add them and their attributes to xml1
$product = $xml2->product;
$prod = $xml2->merchandiser->header->product;
$category = $product->category;
$url = $product->URL;
$description = $product->description;
foreach( $xml2->children() as $child ) {
$new = $xml1->addChild( $child->getName() , htmlspecialchars($child) );
foreach( $child->attributes() as $key => $value ) {
$new->addAttribute( $key, $value );
}
}
$fh = fopen( $fileout, 'w') or die ( "can't open file $fileout" );
fwrite( $fh, $xml1->asXML() );
fclose( $fh );
When I try to add on from there, everything gets messed up and nothing is in the correct place or order anymore. I'd also like to put this into a function since I'm going to be doing it often. Any help is greatly appreciated, as I've been struggling with this for a few days now and have scoured a few dozen stackoverflow and php.net threads.
One of the things that's confusing me is the <merchandiser> and <header> tags that every file starts with. Once the merchandiser tag ends it is the end of the document, so I need to take only what's inside the merchandiser tag of file 2 and append it inside the merchandiser tag of file 1. The header tag just confuses me because I'm not sure if it gets in the way or not.
As a preliminary note, your XML sample is malformed. It is also not consistent with your code (i.e. there is no ->merchandiser->header->product).
So, in this example I will use a different sample, like this one (file1.xml):
<root>
<product>
<name>Product 1</name>
</product>
<product>
<name>Product 2</name>
</product>
</root>
and this one (file2.xml):
<root>
<product>
<name>Product 3</name>
</product>
<product>
<name>Product 4</name>
</product>
</root>
You say you don't want to use DOMDocument->importNode() because “it kept throwing a lot of errors”.
You can use DOMDocument in conjunction with SimpleXML and the dom_import_simplexml() function.
First of all, prepare the destination XML: load the file with SimpleXML, get its DOMDocument through dom_import_simplexml() and set the $parent variable to the <root> element:
$dst = simplexml_load_file( 'file1.xml' );
$dst = dom_import_simplexml( $dst )->ownerDocument;
$parent = $dst->getElementsByTagName( 'root' )->item(0);
Then, load second file with SimpleXML:
$src = simplexml_load_file( 'file2.xml' );
In a foreach() loop, import each <product> element from SimpleXML into the DOMDocument and append it as a child of the $parent node:
foreach( $src->product as $product )
{
$node = dom_import_simplexml( $product );
$node = $dst->importNode( $node, 1 );
$parent->appendChild( $node );
}
Now, your merged XML is ready. You can print it using $dst->saveXML().
I was not able to produce correctly indented XML directly. To get it, you can reload the merged document:
$final = new DOMDocument();
$final->loadXML( $dst->saveXML(), LIBXML_NOBLANKS );
$final->formatOutput = True;
echo $final->saveXML();
Final output:
<?xml version="1.0"?>
<root>
<product>
<name>Product 1</name>
</product>
<product>
<name>Product 2</name>
</product>
<product>
<name>Product 3</name>
</product>
<product>
<name>Product 4</name>
</product>
</root>
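Since the question asks for a reusable function, the steps above can be sketched as one. This is only a sketch under assumptions: mergeXmlChildren() is a hypothetical name, it takes XML strings instead of file names so that it runs on its own, and it appends to the document element of the first document.

```php
<?php
// Hypothetical helper: appends every <$tagName> child of the second document's
// root to the root of the first document and returns the merged XML string.
function mergeXmlChildren(string $dstXml, string $srcXml, string $tagName): string
{
    $dst = new DOMDocument();
    $dst->preserveWhiteSpace = false; // must be set before loading so formatOutput can re-indent
    $dst->formatOutput = true;
    $dst->loadXML($dstXml);

    $src = simplexml_load_string($srcXml);

    foreach ($src->{$tagName} as $element) {
        // Deep-copy the node into the destination document before appending it.
        $node = $dst->importNode(dom_import_simplexml($element), true);
        $dst->documentElement->appendChild($node);
    }

    return $dst->saveXML();
}

$file1 = '<root><product><name>Product 1</name></product></root>';
$file2 = '<root><product><name>Product 3</name></product><product><name>Product 4</name></product></root>';

$merged = mergeXmlChildren($file1, $file2, 'product');
echo $merged;
```

With real files, the strings could come from file_get_contents(), and a sibling element such as <header> in the second file is simply skipped because only the named tag is iterated.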

PHP xpath get elements by attribute in foreach loop

I am trying to loop through all the LINE_ITEMs, and that works fine, but inside the foreach loop I'm trying to access a value of the single LINE_ITEM by attribute using xpath and I don't get any result. Does anyone know what the problem is?
foreach($items = $xmlData->xpath('//zw:LINE_ITEM') as $item) {
$item->xpath('//namespace:PID[@type="erp_pid"]'); // No result
}
$xmlData->xpath('//zw:LINE_ITEM')
works fine, I get all LINE_ITEMs, but
when I try to run an xpath query on the item I don't get any result.
How can I access the PID value of, for example, "erp_pid"?
<?xml version="1.0" encoding="UTF-8"?>
<NM_DOCS>
<NM_DOC>
<DOCUMENT qualifier="default" role="original" test="false" type="orders">
<VERSION>4.0</VERSION>
<HEADER>
<CONTROL_INFO>
<LAST_SAVE_DATE>2015-03-18T13:44:32+01:00</LAST_SAVE_DATE>
<PROCESS_TYPE>silent</PROCESS_TYPE>
<SOURCE>sales</SOURCE>
</CONTROL_INFO>
<SOURCING_INFO>
<REFERENCES>
<REFERENCE type="order_nexmart">
<ID>109546063</ID>
<DATES>
<DATE type="order">2015-03-18T13:44:30+01:00</DATE>
</DATES>
</REFERENCE>
</REFERENCES>
</SOURCING_INFO>
<DOCUMENT_INFO>
<DOCUMENT_ID>Test bestellnummer</DOCUMENT_ID>
<DATES>
<DATE type="delivery_ordered">2015-04-19T12:41:41+02:00</DATE>
</DATES>
<PARTIES>
<PARTY type="buyer">
<BUSINESS_ROLE>commercial</BUSINESS_ROLE>
<PORTAL_ID>BDE600028</PORTAL_ID>
<ADDITIONAL_IDS>
</ADDITIONAL_IDS>
<ADDRESS>
</ADDRESS>
<CONTACT_DETAILS>
<ACCOUNTS>
<ID type="emart">sales.nexmart</ID>
</ACCOUNTS>
</CONTACT_DETAILS>
</PARTY>
<PARTY type="supplier">
<BUSINESS_ROLE>commercial</BUSINESS_ROLE>
<PORTAL_ID>zweygart_app_de</PORTAL_ID>
<ADDITIONAL_IDS>
</ADDITIONAL_IDS>
<ADDRESS>
</ADDRESS>
<CONTACT_DETAILS>
</CONTACT_DETAILS>
</PARTY>
<PARTY type="delivery">
<BUSINESS_ROLE>commercial</BUSINESS_ROLE>
<ADDRESS>
</ADDRESS>
<CONTACT_DETAILS>
</CONTACT_DETAILS>
</PARTY>
<PARTY type="invoice_recipient">
<ADDRESS>
</ADDRESS>
<CONTACT_DETAILS>
</CONTACT_DETAILS>
</PARTY>
</PARTIES>
<REMARKS>
<REMARK type="order">Test bemerkung</REMARK>
</REMARKS>
</DOCUMENT_INFO>
</HEADER>
<LINE_ITEMS>
<LINE_ITEM>
<PRODUCT_ID>
<SUPPLIER_PID>119556</SUPPLIER_PID>
<GTIN>4030646269130</GTIN>
<ADDITIONAL_PIDS>
<PID type="supplier_pid_original">119556</PID>
<PID type="gtin_original">4030646269130</PID>
<PID type="erp_pid">119556</PID>
</ADDITIONAL_PIDS>
<DESCRIPTIONS>
<DESCR type="short">short desc</DESCR>
<DESCR type="short_original">some text</DESCR>
</DESCRIPTIONS>
</PRODUCT_ID>
</LINE_ITEM>
<LINE_ITEM>
<PRODUCT_ID>
<SUPPLIER_PID>123456789</SUPPLIER_PID>
<GTIN>123456789</GTIN>
<ADDITIONAL_PIDS>
<PID type="supplier_pid_original">123456</PID>
<PID type="gtin_original">123456</PID>
<PID type="erp_pid">123456</PID>
</ADDITIONAL_PIDS>
<DESCRIPTIONS>
<DESCR type="short">short desc</DESCR>
<DESCR type="short_original">Some description</DESCR>
</DESCRIPTIONS>
</PRODUCT_ID>
</LINE_ITEM>
</LINE_ITEMS>
</DOCUMENT>
</NM_DOC>
</NM_DOCS>
The problem is not XPath but SimpleXML. SimpleXMLElement::xpath() is limited. It converts the result into an array of SimpleXMLElement objects, but there are other node types in a DOM. More importantly, you will have to register the namespaces on each new SimpleXMLElement again.
$element = new SimpleXMLElement($xml);
$element->registerXPathNamespace('namespace', 'urn:foo');
foreach($element->xpath('//namespace:LINE_ITEMS/namespace:LINE_ITEM') as $item) {
$item->registerXPathNamespace('namespace', 'urn:foo');
var_dump((string)$item->xpath('.//namespace:PID[@type="erp_pid"]')[0]);
}
Output:
string(6) "119556"
string(6) "123456"
You might notice that I prefixed your detail expression with a dot (.). A slash at the start of the expression always makes it relative to the document itself, not the current node. The . represents the current node.
If you use DOM directly, you create a separate DOMXPath object and register the namespaces on this object. Additionally you can use XPath expressions that return scalar values.
$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXPath($dom);
$xpath->registerNamespace('namespace', 'urn:foo');
foreach($xpath->evaluate('//namespace:LINE_ITEMS/namespace:LINE_ITEM') as $node) {
var_dump($xpath->evaluate('string(.//namespace:PID[@type="erp_pid"])', $node));
}
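To see the effect of the leading dot in isolation, here is a small sketch on a cut-down, namespace-free fragment. The fragment and the $absolute and $relative names are made up for illustration and are not the asker's real data.

```php
<?php
// Cut-down, namespace-free stand-in for the LINE_ITEMS part of the document.
$source = <<<XML
<LINE_ITEMS>
  <LINE_ITEM><PID type="erp_pid">119556</PID></LINE_ITEM>
  <LINE_ITEM><PID type="erp_pid">123456</PID></LINE_ITEM>
</LINE_ITEMS>
XML;

$items = new SimpleXMLElement($source);
$first = $items->LINE_ITEM[0];

// '//' starts at the document root, even when called on a child element.
$absolute = $first->xpath('//PID[@type="erp_pid"]');

// './/' starts at the current node, so only this LINE_ITEM is searched.
$relative = $first->xpath('.//PID[@type="erp_pid"]');

echo count($absolute), " match(es) with //\n";
echo count($relative), " match(es) with .//\n";
```

The absolute form finds the PID elements of every LINE_ITEM in the document, while the relative form is limited to the item it is called on.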
This works for me:
foreach($xmlData->LINE_ITEM as $item) {
$erp = $item->xpath('.//PID[@type="erp_pid"]');
foreach($erp as $v) {
echo $v. " / ";
}
}
Just remove the namespace in your xpath, your xml doesn't use a namespace.
If you want to iterate through a part of the xml be sure to use the correct path.

Prepending raw XML using PHP's SimpleXML

Given a base $xml and a file containing a <something> tag with attributes, children and children of its children, I would like to append it as first child and all of its children as raw XML.
Original XML:
<root>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
XML in file:
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
Result XML:
<root>
<something someval="x" otherthing="y">
<child attr="val" ..> { some children and values ... }</child>
<child attr="val2" ..> { some children and values ... }</child>
...
</something>
<people>
<person>
<name>John Doe</name>
<age>47</age>
</person>
<person>
<name>James Johnson</name>
<age>13</age>
</person>
</people>
</root>
This tag would contain several children both direct and recursively, so it would not be practical to build the XML via the SimpleXML operations. Besides, keeping it in a file would result in lower maintenance costs.
Technically it would simply be prepending one child. The problem is that this child would have other children and so on.
On the PHP addChild page there's a comment that says:
$x = new SimpleXMLElement('<root name="toplevel"></root>');
$f1 = new SimpleXMLElement('<child pos="1">alpha</child>');
$x->{$f1->getName()} = $f1; // adds $f1 to $x
However, this does not seem to treat my XML as raw XML, so escaped tags (&lt; and &gt;) appear in the output. Several warnings concerning namespaces seem to appear as well.
I suppose I could do a quick replace of such tags but I am not sure whether it could cause future problems and it certainly does not feel right.
Manually hacking the XML is not an option and neither is adding children one by one. Choosing a different library could be.
Any clues on how to get this working?
Thanks!
I'm really not sure if that will work. Try this or downvote this, but I hope it helps. Using DOMDocument (Reference)
<?php
$xml = new DOMDocument();
// Use loadXML(), not loadHTML(): the input is XML.
$xml->loadXML($yourOriginalXML);
// createElement() only takes a tag name; a document fragment can parse raw XML.
$newNode = $xml->createDocumentFragment();
$newNode->appendXML($someXMLtoPrepend);
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode, $nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>
Sometimes the encoding can cause problems. If the input is not UTF-8 and has no XML declaration naming its encoding, convert it to UTF-8 first (ISO-8859-1 below is only an example source encoding), then try this:
<?php
$xml = new DOMDocument();
$xml->loadXML(mb_convert_encoding($yourOriginalXML, 'UTF-8', 'ISO-8859-1'));
$newNode = $xml->createDocumentFragment();
$newNode->appendXML(mb_convert_encoding($someXMLtoPrepend, 'UTF-8', 'ISO-8859-1'));
$nodeRoot = $xml->getElementsByTagName('root')->item(0);
$nodeOriginal = $xml->getElementsByTagName('people')->item(0);
$nodeRoot->insertBefore($newNode, $nodeOriginal);
$finalXmlAsString = $xml->saveXML();
?>
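If the rest of the code has to stay on SimpleXML, the same idea can be sketched through dom_import_simplexml(), which exposes the SimpleXML tree as DOM nodes. The sample strings below are shortened stand-ins for the question's data, and the variable names are only illustrative.

```php
<?php
// Shortened stand-ins for the base document and the file content.
$root = new SimpleXMLElement('<root><people><person><name>John Doe</name></person></people></root>');
$raw  = '<something someval="x" otherthing="y"><child attr="val">text</child></something>';

// The DOMElement shares its underlying tree with $root, so changes show up in both.
$dom = dom_import_simplexml($root);

// A document fragment parses the raw XML without escaping it.
$fragment = $dom->ownerDocument->createDocumentFragment();
$fragment->appendXML($raw);

// Insert the fragment before the current first child, i.e. prepend it.
$dom->insertBefore($fragment, $dom->firstChild);

echo $root->asXML();
```

Because both objects point at the same tree, $root can be used as a normal SimpleXMLElement afterwards.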

How should I modify this foreach statement to only display entries where all of the values are unique?

foreach($resultXML->products->children() as $product) {
echo "<p>".$product->{'advertiser-name'}." - ".$product->price."</p>
<p>".$product->{'description'}."</p>";
}
Suppose I wanted to screen out the ones that had the same title, and only display the first title that appears in the return results.
I'm not working with my own database, this is all about what's displayed.
I suppose the easiest way would be to keep track of the titles in an array, and checking it each iteration.
$titles = array();
foreach($resultXML->products->children() as $product) {
// Cast to string: comparing SimpleXMLElement objects directly is unreliable.
$title = (string) $product->title;
if (in_array($title, $titles, true)) continue;
$titles[] = $title;
echo "<p>".$product->{'advertiser-name'}." - ".$product->price."</p>
<p>".$product->{'description'}."</p>";
}
Assuming that the title is contained in $product->title. You could do something fancier through array functions, but I don't see a reason to make a simple problem complicated.
You have not provided any example XML, so assume the following:
<?xml version="1.0" encoding="UTF-8"?>
<example>
<products>
<product>
<title>First Product</title>
<advertiser-name>First Name</advertiser-name>
</product>
</products>
<products>
<product>
<title>Second Product</title>
<advertiser-name>First Name</advertiser-name>
</product>
</products>
<products>
<product>
<title>Third Product</title>
<advertiser-name>Second Name</advertiser-name>
</product>
</products>
</example>
You want to get all product elements whose advertiser-name is not the advertiser-name of any preceding product element.
So for the XML above, that would be the 1st and 3rd product element.
You can write that down as an XPath expression:
/*/products/product[not(advertiser-name = preceding::product/advertiser-name)]
And as PHP code:
$xml = simplexml_load_string($buffer);
$expr = '/*/products/product[not(advertiser-name = preceding::product/advertiser-name)]';
foreach ($xml->xpath($expr) as $product) {
echo $product->asXML(), "\n";
}
This produces the following output:
<product>
<title>First Product</title>
<advertiser-name>First Name</advertiser-name>
</product>
<product>
<title>Third Product</title>
<advertiser-name>Second Name</advertiser-name>
</product>
So one answer to your question therefore is: Only query those elements from the document you're interested in. XPath can be used for that with SimpleXMLElement.
Related questions:
Implementing condition in XPath

SimpleXML query

hi my xml has this structure
<articles>
<article>
<title></title>
<text></text>
<notes>
<note>
<code></code>
<text></text>
</note>
<note>
<code></code>
<text></text>
</note>
</notes>
</article>
</articles>
i have to do this query (sql speaking :) ):
SELECT article WHERE note.code=XXX AND note.text CONTAINS YYY
how can i do that in php?
You use XPath, the query language for XML. W3Schools has a nice tutorial here: http://www.w3schools.com/XPath/
And using it with SimpleXML is simple :P:
$xml = new SimpleXMLElement($xmlstring);
$result = $xml->xpath('/articles/article[notes/note[code="XXX" and contains(text, "YYY")]]');
foreach ($result as $node)
{
//Do stuff
}
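A self-contained sketch of that kind of query, with made-up sample data. The codes, texts, and the $code and $needle variables are assumptions for illustration. Both conditions sit inside the same note predicate, so they must hold for one and the same <note>.

```php
<?php
// Made-up sample data following the structure from the question.
$source = <<<XML
<articles>
  <article>
    <title>First</title>
    <notes>
      <note><code>A1</code><text>contains the keyword here</text></note>
    </notes>
  </article>
  <article>
    <title>Second</title>
    <notes>
      <note><code>A1</code><text>nothing relevant</text></note>
      <note><code>B2</code><text>keyword under another code</text></note>
    </notes>
  </article>
</articles>
XML;

$xml = new SimpleXMLElement($source);

$code   = 'A1';      // plays the role of XXX
$needle = 'keyword'; // plays the role of YYY

// Note: values containing double quotes would need extra handling before being
// placed in the expression; this sketch assumes they contain none.
$expr = sprintf('/articles/article[notes/note[code="%s" and contains(text, "%s")]]', $code, $needle);

$titles = [];
foreach ($xml->xpath($expr) as $article) {
    $titles[] = (string) $article->title;
}

print_r($titles);
```

The second article is not returned: its A1 note lacks the keyword and its keyword note carries a different code.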
If you use SimpleXML, you have to iterate over all the children and then recurse over their children. Look at DOMDocument and its getElementsByTagName method.
I think, it is what you are looking for.
http://www.php.net/manual/en/class.domdocument.php
http://www.php.net/manual/en/domdocument.getelementsbytagname.php
