Remove XML node if child node equals an array value - PHP

I'm trying to remove "Product" nodes from an XML document if "ProductID" is one of the values in an array. For some reason, my code only removes the "Product" node with the first "ProductID" found in the array. How do I remove all "Product" nodes whose "ProductID" is one of the values in the array? Is there a better way to code this?
Example of the XML:
<Product>
<ProductID>ZZ-DS</ProductID>
<Item>Drop Ship Charges</Item>
<Qty>578</Qty>
<ListPrice>2.50</ListPrice>
<YourPrice>2.50</YourPrice>
<UPC/>
<VendorProductID/>
<ImageSmall>URL</ImageSmall>
<ImageLarge>URL</ImageLarge>
</Product>
<Product>
<ProductID>ZZAI-100</ProductID>
<Item>ZZAI-100</Item>
<Qty>0</Qty>
<ListPrice>0.75</ListPrice>
<YourPrice>0.75</YourPrice>
<UPC/>
<VendorProductID>AI- BUBBLES</VendorProductID>
<ImageSmall>URL</ImageSmall>
<ImageLarge>URL</ImageLarge>
</Product>
What I did:
foreach($xml->Product as $product) {
// Not allowed product IDs
$notAllowed = array('ZZ-DS','ZZAI-100','ZZAI-101','ZZAI-TG01','ZZWM-BL00001N','ZZWM-BL00176N','ZZWM-DJ00089N','ZZWM-DL00195N','ZZWM-DL00196N','ZZWM-DL00198N','ZZWM-DL00292N','ZZWM-DL00293N','ZZWM-DL00295N','ZZWM-UG00049N','ZZWM-UG00050N','ZZWM-UG00051N','ZZWM-V012194N','ZZWM-V012197N','ZZWM-V012207N','ZZWM-V012216N','ZZWM-V021012N','ZZWM-WM10086N','ZZWM-WMP0037N');
if (in_array($product->ProductID, $notAllowed)) {
$deleteNode = dom_import_simplexml($product);
$deleteNode->parentNode->removeChild($deleteNode);
}
}
As a result, the code removed only the "Product" node whose "ProductID" equals "ZZ-DS".

The issue is that you are looping over a list of nodes and removing nodes from that same list while iterating, so the list goes out of sync with the loop. You can see this if you add echo $product->ProductID.PHP_EOL; to the loop and see what gets displayed.
Here are two ways to solve this. The first uses XPath to fetch the list of nodes up front, then loops over that list, checks each ID and removes the node if needed (with $notAllowed defined once, before the loop). As you're looping over the array returned by xpath(), it will not change whilst you remove the nodes you're after...
$products = $xml->xpath("//Product");
foreach($products as $product) {
if (in_array($product->ProductID, $notAllowed)) {
$deleteNode = dom_import_simplexml($product);
$deleteNode->parentNode->removeChild($deleteNode);
}
}
The second method is to select only the nodes you want to delete. This involves building an XPath expression something like...
//Product[ProductID='ZZ-DS' or ProductID='ZZAI-100' or ...]
So it's just a case of imploding the list of IDs with the appropriate glue between them (this assumes none of the IDs contains a single quote), then removing all the matching nodes...
$notAllowedList = implode("' or ProductID='", $notAllowed);
$products = $xml->xpath("//Product[ProductID='".$notAllowedList."']");
foreach($products as $product) {
$deleteNode = dom_import_simplexml($product);
$deleteNode->parentNode->removeChild($deleteNode);
}
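For anyone wanting to try this outside the original feed, here is a minimal self-contained sketch of the first approach. The <Products> root element, the KEEP-1 ID and the shortened $notAllowed list are made up for illustration; only the removal technique comes from the answer above. The (string) cast and the strict in_array() are an optional tightening, not something the original code requires.

```php
<?php
// Hypothetical sample document: the <Products> root and the KEEP-1 ID are assumptions.
$xml = new SimpleXMLElement(
    '<Products>'
    . '<Product><ProductID>ZZ-DS</ProductID></Product>'
    . '<Product><ProductID>KEEP-1</ProductID></Product>'
    . '<Product><ProductID>ZZAI-100</ProductID></Product>'
    . '</Products>'
);

// Shortened list, defined once outside the loop.
$notAllowed = ['ZZ-DS', 'ZZAI-100'];

// xpath() returns a plain PHP array, so removing nodes from the
// document does not disturb the list being iterated.
foreach ($xml->xpath('//Product') as $product) {
    if (in_array((string) $product->ProductID, $notAllowed, true)) {
        $node = dom_import_simplexml($product);
        $node->parentNode->removeChild($node);
    }
}

// Collect the IDs that are left in the document.
$remaining = [];
foreach ($xml->Product as $product) {
    $remaining[] = (string) $product->ProductID;
}
print_r($remaining);
```

Only the KEEP-1 product should be left, even though the two removed products sit on either side of it.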

Related

PHP: How to force simplexml to use certain datatype for a node

How can one enforce simplexml_load_string( ) to use same data structure at each node point.
$xml = "
<level1>
<level2>
<level3>Hello</level3>
<level3>stackoverflow</level3>
</level2>
<level2>
<level3>My problem</level3>
</level2>
</level1>";
$xmlObj = simplexml_load_string($xml);
var_dump($xmlObj);
Examining the output:
level1 is an object; level2 is an array; level2[0] is an array.
level2[1] is an object, because there's only one child node, which I'd rather have as a single-index array.
I'm collecting the XML from the user, and there may be one or more nodes inside each level2. My sanitisation block is a foreach loop which fails when there's only one node inside level2.
The sanitisation block looks something like this:
foreach($xmlObj -> level2 as $lvl2){
if($lvl2 -> level3[0] == 'condition'){ doSomething( ); }
}
doSomething() works fine when <level2> always has more than one child node in the XML string. If <level2> has only one child <level3> node, an error about trying to get an attribute of a non-object comes up.
var_dump shows that the data type changes from object to array depending on how many nodes are nested within.
I'd prefer a way to ensure <level2> is always an array regardless of how many children it has. That saves me from editing too much. But any other way out would suffice.
Thanks
That information is not available in the XML itself, so you will have to add it in your implementation. SimpleXML provides both list and item access to child elements. If you access a child as a list (for example with foreach), it will provide all matching child elements, even when there is only one.
$xml = "
<level1>
<level2>
<level3>Hello</level3>
<level3>stackoverflow</level3>
</level2>
<level2>
<level3>My problem</level3>
</level2>
</level1>";
$level1 = new SimpleXMLElement($xml);
$result = [];
foreach($level1->level2 as $level2) {
$data2 = [];
foreach ($level2->level3 as $level3) {
$data2[] = (string)$level3;
}
$result[] = $data2;
}
var_dump($result);
So the trick is to use the SimpleXMLElement instance directly and not convert it into an array. Do not treat the creation of your JSON structure as a generic conversion. Build up a specific output while reading the XML using SimpleXML.
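As a small sketch of that point, the snippet below looks at the second <level2> from the question, which has only one <level3> child. The variable names are illustrative, and it assumes the element is read through SimpleXML directly rather than after a cast to array.

```php
<?php
$xml = '<level1>'
    . '<level2><level3>Hello</level3><level3>stackoverflow</level3></level2>'
    . '<level2><level3>My problem</level3></level2>'
    . '</level1>';
$level1 = new SimpleXMLElement($xml);

// The second <level2> holds a single <level3>.
$single = $level1->level2[1];

// Index access and count() behave the same as for several children.
$first = (string) $single->level3[0];
$howMany = count($single->level3);

// foreach also yields exactly one item.
$seen = [];
foreach ($single->level3 as $level3) {
    $seen[] = (string) $level3;
}

echo $first, ' (', $howMany, ")\n";
```

So the condition from the question, $lvl2->level3[0] == 'condition', can be evaluated the same way whether <level2> has one child or many.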

Xpath looping query

I have the following xml doc:
<shop id="123" name="xxx">
<product id="123456">
<name>Book</name>
<price>9.99</price
</product>
<product id="789012">
<name>Perfume</name>
<price>12.99</price
</product>
<product id="345678">
<name>T-Shirt</name>
<price>9.99</price
</product>
</shop>
<shop id="456" name="yyy">
<product id="123456">
<name>Book</name>
<price>9.99</price
</product>
</shop>
I have the following loop to gather the information for each product:
$data_feed = 'www.mydomain.com/xml/compression/gzip/';
$xml = simplexml_load_file("compress.zlib://$data_feed");
foreach ($xml->xpath('//product') as $row) {
$id = $row["id"]; // product id eg. "123456"
$name = $row->name;
$price = $row->price;
// update database etc.
}
HOWEVER, I also want to gather the information for each product's parent shop ("id" and "name").
I can easily change my xpath to start from shop as opposed to product, but I'm unsure of the most efficient way to then construct an additional loop within my foreach to loop over each nested product.
Make sense?
I'd go without xpath and just use two nested foreach-loops:
$xml = simplexml_load_string($x); // assume XML in $x
foreach ($xml->shop as $shop) {
echo "shop $shop[name], id $shop[id] <br />";
foreach ($shop->product as $product) {
echo "- $product->name (id $product[id]), $product->price <br />";
}
}
see it working: http://codepad.viper-7.com/vFmGvY
BTW: your XML is broken, probably a typo. Each closing </price> is missing its last >.
Sure, makes sense: you want one iteration, not nested iterations (although that won't gain you much, as @michi showed already). That is possible as well:
foreach ($xml->xpath('//product') as $row)
{
$id = $row["id"]; // product id eg. "123456"
$name = $row->name;
$price = $row->price;
$shopId = $row->xpath('../@id')[0];
$shopName = $row->xpath('../@name')[0];
// update database etc.
}
As this example shows, you can run xpath() on each element node and the context node is automatically set to the node itself, therefore the relative path .. in XPath works to access the parent element (see as well: Access an element's parent with PHP's SimpleXML?). Both attributes are then read from that parent, and via PHP 5.4 array dereferencing the first (and only) matching attribute is accessed.
I hope this helps and sheds some light on how it works. Your question reminds me a bit of an earlier one where I suggested a kind of generic solution to this kind of problem:
Answer to Combining two Xpaths into one loop?
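To make the parent lookup concrete, here is a hedged, self-contained sketch. The <shops> wrapper element is an assumption (the question's snippet shows no root element), and the sample is trimmed to two products. It fetches the parent <shop> once with the relative path .. and reads both attributes from it, which is a small variation on the two separate xpath() calls above.

```php
<?php
// Assumed wrapper <shops>; the original feed's root element is not shown.
$xml = new SimpleXMLElement(
    '<shops>'
    . '<shop id="123" name="xxx">'
    . '<product id="123456"><name>Book</name><price>9.99</price></product>'
    . '</shop>'
    . '<shop id="456" name="yyy">'
    . '<product id="789012"><name>Perfume</name><price>12.99</price></product>'
    . '</shop>'
    . '</shops>'
);

$rows = [];
foreach ($xml->xpath('//product') as $product) {
    // ".." is evaluated relative to the current <product>, so it yields its <shop>.
    $shop = $product->xpath('..')[0];
    $rows[] = [
        'shopId'    => (string) $shop['id'],
        'shopName'  => (string) $shop['name'],
        'productId' => (string) $product['id'],
        'name'      => (string) $product->name,
        'price'     => (string) $product->price,
    ];
}
print_r($rows);
```

Casting each value to string up front avoids passing SimpleXMLElement objects on to the database layer.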

SimpleXml attributes not showing for element with no children

I have the following XML:
<?xml version="1.0"?>
<STATUS_LIST>
<ORDER_STATUS SORDER_CODE="SO001" ASSOCIATED_REF="001">
<INVOICES>
<INVOICE INVOICE_CODE="???">SOMETHING</INVOICE>
</INVOICES>
</ORDER_STATUS>
</STATUS_LIST>
When I run the following code:
$statuses = simplexml_load_string($result); //Where $result is my XML
if (!empty($statuses))
{
foreach ($statuses as $status)
{
foreach ($status->INVOICES as $invoice)
{
echo (string)$invoice->attributes()->INVOICE_CODE;
}
}
}
I step through this code and I can see the attributes against ORDER_STATUS but I can't see the attributes against INVOICE. I can however see the value SOMETHING against invoice.
Any idea what could cause this?
Update
After some testing, I can get the attributes to show if I add an element into the INVOICE element, so if I use this xml instead it will work:
<?xml version="1.0"?>
<STATUS_LIST>
<ORDER_STATUS SORDER_CODE="SO001" ASSOCIATED_REF="001">
<INVOICES>
<INVOICE INVOICE_CODE="???"><TEST>tester</TEST></INVOICE>
</INVOICES>
</ORDER_STATUS>
</STATUS_LIST>
So it has to have an element inside to pick up the attributes!?
According to this past question, "SimpleXML doesn't allow attributes and text on the same element." It's pretty ridiculous, and I couldn't find any official coverage of that fact, but it seems true. Lame. It's valid XML. I know Perl SimpleXML reads it fine.
Your problem has nothing to do with the element having no child elements; you simply have your loops defined slightly wrong.
When you write foreach ($status->INVOICES as $invoice), SimpleXML will loop over every child of the $status element which is called INVOICES; in this case there will always be exactly one such element. But what you actually wanted is to loop over all the children of that element - the individual INVOICE nodes.
To do that you can use one of the following:
foreach ($status->INVOICES->children() as $invoice) (loop over all child nodes of the first, and in this case only, INVOICES element)
foreach ($status->INVOICES[0]->children() as $invoice) (the same, but being more explicit about selecting the first INVOICES node)
foreach ($status->INVOICES[0] as $invoice) (actually does the same again: because you've specifically selected one node, and then asked for a loop, SimpleXML assumes you want its children; this is why foreach($statuses as $status) works as the outer loop)
foreach ($status->INVOICES->INVOICE as $invoice) (loop over only child nodes called INVOICE, which happens to be all of them)
Personally, I would rewrite your sample loop as below:
foreach ($statuses->ORDER_STATUS as $status)
{
foreach ($status->INVOICES->INVOICE as $invoice)
{
echo (string)$invoice['INVOICE_CODE'];
}
}
Here's a live demo to prove that that works.
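A quick way to check this for yourself is a sketch like the following, which uses the XML from the question and reads both the attribute and the text of the same INVOICE element. The $codes and $texts variable names are just for illustration.

```php
<?php
$result = '<?xml version="1.0"?>'
    . '<STATUS_LIST>'
    . '<ORDER_STATUS SORDER_CODE="SO001" ASSOCIATED_REF="001">'
    . '<INVOICES><INVOICE INVOICE_CODE="???">SOMETHING</INVOICE></INVOICES>'
    . '</ORDER_STATUS>'
    . '</STATUS_LIST>';
$statuses = simplexml_load_string($result);

$codes = [];
$texts = [];
foreach ($statuses->ORDER_STATUS as $status) {
    foreach ($status->INVOICES->INVOICE as $invoice) {
        $codes[] = (string) $invoice['INVOICE_CODE']; // attribute
        $texts[] = (string) $invoice;                 // text content
    }
}
print_r($codes);
print_r($texts);
```

Both the attribute and the text are available on the same element, so no extra child element is needed.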

Two dimensional array

I've tried to figure this out by myself before asking, but can't really work it out.
What I have is a loop which reads XML data loaded with simplexml_load_file.
This XML file has data which I want to read and put into an array, a two-dimensional array actually.
Each Item in the XML file has an attribute called Tag and an attribute called Amount.
The Amount is always different, but the Tag is usually the same, though it can change sometimes too.
What I am trying to do now is:
Example:
This is the XML example:
<?xml version="1.0"?>
<Data>
<Items>
<Item Amount="9,21" Tag="tag1"/>
<Item Amount="4,21" Tag="tag1"/>
<Item Amount="6,21" Tag="tag2"/>
<Item Amount="1,21" Tag="tag1"/>
<Item Amount="6,21" Tag="tag2"/>
</Items>
</Data>
Now I have a loop which reads this, sees what tag it is and adds up the amounts.
It works with two loops and two different arrays, but I would like to have it all in one array in a single loop.
I tried something like this:
$tags = array();
for($k = 0; $k < sizeof($tags); $k++)
{
if (strcmp($tags[$k], $child['Tag']) == 0)
{
$foundTAG = true;
break;
}
else
$foundTAG = false;
}
if (!$foundTAG)
{
$tags[] = $child['Tag'];
}
and then, elsewhere in the code, I tried different variations of adding to the array ($counter is what adds the Amounts together):
$tags[$child['Tag']][$k] = $counter;
$tags[$child['Tag']][] = $counter;
$tags[][] = $counter;
I tried a few other combinations, which I've already deleted since they didn't work.
OK, this might be a really noob question, but I started with PHP yesterday and have no idea how multidimensional arrays work :)
Thank you
This is how you can iterate over the object returned by SimpleXML:
$xml=simplexml_load_file("/home/chris/tmp/data.xml");
foreach($xml->Items->Item as $obj){
foreach($obj->Attributes() as $key=>$val){
// php will automatically cast each of these to a string for the echo
echo "$key = $val\n";
}
}
So, to build an array with totals for each tag:
$xml=simplexml_load_file("/home/chris/tmp/data.xml");
$tagarray=array();
// iterate over the xml object
foreach($xml->Items->Item as $obj){
    // reset the attr vars.
    $tag="";
    $amount=0;
    // iterate over the attributes setting
    // the correct vars as you go
    foreach($obj->attributes() as $key=>$val){
        if($key=="Tag"){
            // if you don't cast this to a
            // string php (helpfully) gives you
            // a pseudo simplexml_element object
            $tag=(string)$val;
        }
        if($key=="Amount"){
            // same as for the string above, but cast to a float;
            // the sample data uses a decimal comma ("9,21"), which
            // (float) would truncate to 9, so swap it for a dot first
            $amount=(float)str_replace(',', '.', (string)$val);
        }
    }
    // once all attributes are read, add the amount to the tag's total
    // (done after the inner loop so it happens once per Item)
    if(strlen($tag) && $amount>0){
        if(!isset($tagarray[$tag])){
            $tagarray[$tag]=0;
        }
        $tagarray[$tag]+=$amount;
    }
}
print_r($tagarray);
print "\n";
This will break horribly should the schema change or you decide to wear blue socks (xml is extremely colour sensitive). As you can see dealing with the problem child that is xml is tedious - yet another design decision taken in a committee room :-)
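Since the question asked for a two-dimensional array, here is a shorter hedged sketch that reads the two attributes directly instead of looping over attributes(). It assumes well-formed XML (with </Items> closed before </Data>) and that Amount uses a decimal comma; the $byTag and $totals names are made up for this example.

```php
<?php
$xml = simplexml_load_string(
    '<Data><Items>'
    . '<Item Amount="9,21" Tag="tag1"/>'
    . '<Item Amount="4,21" Tag="tag1"/>'
    . '<Item Amount="6,21" Tag="tag2"/>'
    . '<Item Amount="1,21" Tag="tag1"/>'
    . '<Item Amount="6,21" Tag="tag2"/>'
    . '</Items></Data>'
);

$byTag = []; // two-dimensional: tag => list of amounts
foreach ($xml->Items->Item as $item) {
    $tag = (string) $item['Tag'];
    // Convert the decimal comma to a dot before casting to float.
    $amount = (float) str_replace(',', '.', (string) $item['Amount']);
    $byTag[$tag][] = $amount; // PHP creates the inner array on first use
}

// Sum each inner array to get one total per tag.
$totals = array_map('array_sum', $byTag);
print_r($byTag);
print_r($totals);
```

Keeping the individual amounts in $byTag and deriving $totals from it means a single loop over the XML gives both the grouped values and the sums.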

Can someone explain this block of code to me? PHP XML DOMDocument syntax

I am currently learning different ways to iterate through XML document tags using the PHP DOMDocument object. I understand the foreach loop for iterating through the tags, but $element->item(0)->childNodes->item(0)->nodeValue is a bit unclear to me. Could somebody explain it to me in detail? Thank you.
<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load('StudentData.xml');
$studentRoot = $xmlDoc->getElementsByTagName('Student');
for ($i = 0; $i < ($studentRoot->length); $i++) {
$firstNameTags = $studentRoot->item($i)->getElementsByTagName('FirstName');
echo $firstNameTags->item(0)->childNodes->item(0)->nodeValue.' <br />';
}
/* so much easier and clear to understand! */
foreach($studentRoot as $node) {
/* For every <Student> tag as a separate node,
step into its child nodes, and for each child,
echo the text content inside */
foreach($node->childNodes as $child) {
echo $child->textContent.'<br />';
}
}
?>
$elements->item(0)->childNodes->item(0)->nodeValue
First:
$elements
The current elements* as parsed and referenced. In the code example, that would be:
$firstNameTags = $studentRoot->item($i)->getElementsByTagName('FirstName');
$firstNameTags->...
Next:
->item(0)
Get a reference to the first item in the $elements node list. Since the list is zero-indexed, ->item(0) gets the first node in the list.
->childNodes
Get a list of the child nodes of that first $elements node referenced by ->item(0) above. As there is no (), this is a (read-only) property of the node, and its value is a DOMNodeList.
->item(0)
Again, get the first node in the list of child nodes by index.
->nodeValue
The value of the node itself. In the code example, that first child would be the text node inside <FirstName>, so this is the text of the name.
If the form of the statement alone:
$obj->method()->method()->prop
confuses you, look into method chaining, which is what this uses to put all of those calls together.
* Note, you left off the s, but by convention the s indicates there may be more than one. So $element would be zero or one element reference, while $elements might be zero, one or more in a collection of $element.
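To see what each link in the chain returns, the chain can be split into separate steps. This is a sketch against a made-up one-student document (the <Students> root and the name Ada are assumptions, since StudentData.xml is not shown); the intermediate variable names are illustrative.

```php
<?php
$xmlDoc = new DOMDocument();
// Hypothetical stand-in for StudentData.xml.
$xmlDoc->loadXML('<Students><Student><FirstName>Ada</FirstName></Student></Students>');

$firstNameTags = $xmlDoc->getElementsByTagName('FirstName'); // DOMNodeList
$firstNameElement = $firstNameTags->item(0);  // DOMElement <FirstName>
$children = $firstNameElement->childNodes;    // DOMNodeList of its children
$textNode = $children->item(0);               // DOMText holding "Ada"
$value = $textNode->nodeValue;                // the string itself

echo get_class($firstNameTags), "\n";
echo get_class($firstNameElement), "\n";
echo get_class($textNode), "\n";
echo $value, "\n";
```

For an element that contains only text, $firstNameElement->textContent gives the same string in one step.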
