XML: Delete parent with simplexml_import_dom (PHP)

I want to delete those entries where the title matches my $titleArray.
My XML file looks like:
<products>
<product>
<title>Battlefield 1</title>
<url>https://www.google.de/</url>
<price>0.80</price>
</product>
<product>
<title>Battlefield 2</title>
<url>https://www.google.de/</url>
<price>180</price>
</product>
</products>
Here is my code, but I don't think it is working. For the line $node->removeChild($product); my IDE reports "Expected DOMNode, got DOMNodeList".
What is wrong and how can I fix that?
function removeProduct($dom, $productTag, $pathXML, $titleArray){
$doc = simplexml_import_dom($dom);
$items = $doc->xpath($pathXML);
foreach ($items as $item) {
$node = dom_import_simplexml($item);
foreach ($titleArray as $title) {
if (mb_stripos($node->textContent, $title) !== false) {
$product = $node->parentNode->getElementsByTagName($productTag);
$node->removeChild($product);
}
}
}
}
Thank you and Greetings!

Most DOM methods that fetch nodes return a list of nodes. You can have several element nodes with the same name, so the result will be a list (an empty list if nothing is found). You can traverse the list and apply logic to each node in it.
There are two problems with the approach. Removing nodes modifies the document, so you have to be careful not to remove a node that you are still using afterwards; that can lead to all kinds of unexpected results. In addition, DOMElement::getElementsByTagName() returns a node list that is a "live" result: if you remove the first node, the list itself changes, not just the XML document.
DOMXpath::evaluate() solves both problems at the same time. The result is not "live", so you can iterate it with foreach() and remove nodes. Xpath expressions allow for conditions, so you can filter and fetch specific nodes. Unfortunately Xpath 1.0 has no lower-case function, but you can call back into PHP for that.
function isTitleInArray($title) {
$titles = [
'battlefield 2'
];
return in_array(mb_strtolower($title, 'UTF-8'), $titles);
}
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions('isTitleInArray');
$expression = '//product[php:function("isTitleInArray", normalize-space(title))]';
foreach ($xpath->evaluate($expression) as $product) {
$product->parentNode->removeChild($product);
}
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<products>
<product>
<title>Battlefield 1</title>
<url>https://www.google.de/</url>
<price>0.80</price>
</product>
</products>
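If the titles come from a variable such as the $titleArray in the question, one option is to skip the PHP callback and do the comparison in PHP while iterating the (non-live) Xpath result. This is only a minimal sketch: it assumes a case-insensitive substring match is wanted (as with mb_stripos() in the question), and the function name removeProductsByTitle is made up for this example.

```php
<?php
/**
 * Remove every <product> whose <title> contains one of the given titles.
 * Hypothetical helper, not part of the original answer.
 */
function removeProductsByTitle(DOMDocument $document, array $titleArray): int
{
    $xpath = new DOMXPath($document);
    $removed = 0;
    // The Xpath result is not "live", so removing nodes inside the loop is safe
    foreach ($xpath->evaluate('//product') as $product) {
        $title = $xpath->evaluate('normalize-space(title)', $product);
        foreach ($titleArray as $needle) {
            if (mb_stripos($title, $needle) !== false) {
                $product->parentNode->removeChild($product);
                $removed++;
                break; // the product is gone, stop checking further titles
            }
        }
    }
    return $removed;
}

$xml = <<<'XML'
<products>
  <product><title>Battlefield 1</title><price>0.80</price></product>
  <product><title>Battlefield 2</title><price>180</price></product>
</products>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$removed = removeProductsByTitle($document, ['battlefield 2']);
echo $removed, "\n", $document->saveXML();
```

The function returns the number of removed products, which makes it easy to check whether anything matched before saving the file again.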

Related

read xml content and add some numbers with php

I have an XML file, and through PHP I would like to make some changes for only some of the products (each item has its own id).
To explain: for some products I would like to add the shipping cost to the price, and change the use_grid item from 1 to 0.
<Products>
<Product>
<sku>35</sku>
<sku_manufacturer>test sku</sku_manufacturer>
<manufacturer>test manufacturer</manufacturer>
<ean>800000000000</ean>
<title><![CDATA[title test]]></title>
<description><![CDATA[description]]></description>
<product_price_vat_inc>8.08</product_price_vat_inc>
<shipping_price_vat_inc>4.99</shipping_price_vat_inc>
<quantity>2842</quantity>
<brand><![CDATA[Finder]]></brand>
<merchant_category><![CDATA[Home/test category]]></merchant_category>
<product_url><![CDATA[https://www.example.com]]></product_url>
<image_1><![CDATA[https://www.example.com]]></image_1>
<image_2><![CDATA[]]></image_2>
<image_3><![CDATA[]]></image_3>
<image_4><![CDATA[]]></image_4>
<image_5><![CDATA[]]></image_5>
<retail_price_vat_inc/>
<product_vat_rate>22</product_vat_rate>
<shipping_vat_rate>22</shipping_vat_rate>
<manufacturer_pdf/>
<ParentSKU/>
<parent_title/>
<Cross_Sell_Sku/>
<ManufacturerWarrantyTime/>
<use_grid>1</use_grid>
<carrier>DHL</carrier>
<shipping_time>2#3</shipping_time>
<carrier_grid_1>DHL</carrier_grid_1>
<shipping_time_carrier_grid_1>2#3</shipping_time_carrier_grid_1>
<carrier_grid_2/>
<shipping_time_carrier_grid_2/>
<carrier_grid_3/>
<shipping_time_carrier_grid_3/>
<carrier_grid_4/>
<shipping_time_carrier_grid_4/>
<carrier_grid_5/>
<shipping_time_carrier_grid_5/>
<DisplayWeight>0.050000</DisplayWeight>
<free_return/>
<min_quantity>1</min_quantity>
<increment>1</increment>
<sales>0</sales>
<eco_participation>0</eco_participation>
<shipping_price_supplement_vat_inc>0</shipping_price_supplement_vat_inc>
<Unit_count>-1.000000</Unit_count>
<Unit_count_type/>
</Product>
</Products>
Your XML is a little large, so let's strip it down for the example:
$xmlString = <<<'XML'
<Products>
<Product>
<sku>35</sku>
<title><![CDATA[title test]]></title>
<use_grid>1</use_grid>
</Product>
<Product>
<sku>42</sku>
<title><![CDATA[title test two]]></title>
<use_grid>1</use_grid>
</Product>
</Products>
XML;
DOM is a standard API for XML manipulation. PHP supports it and Xpath expressions for fetching nodes.
$document = new DOMDocument('1.0', "UTF-8");
// $document->load($xmlFile);
$document->loadXML($xmlString);
// $xpath for fetching node using expressions
$xpath = new DOMXpath($document);
// iterate "Product" nodes with a specific "sku" child
foreach ($xpath->evaluate('//Product[sku="35"]') as $product) {
// output sku and title for validation
var_dump(
$xpath->evaluate('string(sku)', $product),
$xpath->evaluate('string(title)', $product)
);
// iterate the "use_grid" child elements
foreach ($xpath->evaluate('./use_grid', $product) as $useGrid) {
// output current value
var_dump(
$useGrid->textContent
);
// change it
$useGrid->textContent = "0";
}
}
echo "\n\n", $document->saveXML();
Output:
string(2) "35"
string(10) "title test"
string(1) "1"
<?xml version="1.0"?>
<Products>
<Product>
<sku>35</sku>
<title><![CDATA[title test]]></title>
<use_grid>0</use_grid>
</Product>
<Product>
<sku>42</sku>
<title><![CDATA[title test two]]></title>
<use_grid>1</use_grid>
</Product>
</Products>
DOMXpath::evaluate()
DOMXpath::evaluate() fetches nodes using an Xpath expression. The result type depends on the expression. A location path like //Product[sku="35"] returns a list of nodes (DOMNodeList). However, Xpath functions inside the expression can return a scalar value: string(sku) returns the text content of the first sku child node as a string, or an empty string if there is none.
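To make the different result types visible, here is a small sketch. The two-product document is a shortened stand-in for the XML above, and the sku 99 is deliberately one that does not exist.

```php
<?php
$document = new DOMDocument();
$document->loadXML(
    '<Products>'
    . '<Product><sku>35</sku><title>title test</title></Product>'
    . '<Product><sku>42</sku><title>title test two</title></Product>'
    . '</Products>'
);
$xpath = new DOMXPath($document);

// A location path returns a DOMNodeList
$nodes = $xpath->evaluate('//Product[sku="35"]');
// string() casts the first matching node to a string
$title = $xpath->evaluate('string(//Product[sku="42"]/title)');
// No match: string() yields an empty string instead of an error
$missing = $xpath->evaluate('string(//Product[sku="99"]/title)');
// number() returns an Xpath number, which PHP exposes as a float
$sku = $xpath->evaluate('number(//Product[1]/sku)');
// boolean() is true if the location path matches at least one node
$hasSku42 = $xpath->evaluate('boolean(//Product[sku="42"])');

var_dump($nodes instanceof DOMNodeList, $title, $missing, $sku, $hasSku42);
```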
DOMNode::$textContent
Reading $node->textContent will return all the text inside a node - including inside descendant elements.
Writing it replaces the content while taking care of the escaping.
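The question also asked about adding the shipping cost to the price. The same two building blocks cover that. The following is a sketch under a few assumptions: the new price should be written with two decimals, the shipping element stays unchanged, and products with missing or non-numeric prices are skipped.

```php
<?php
$xmlString = <<<'XML'
<Products>
  <Product>
    <sku>35</sku>
    <product_price_vat_inc>8.08</product_price_vat_inc>
    <shipping_price_vat_inc>4.99</shipping_price_vat_inc>
    <use_grid>1</use_grid>
  </Product>
</Products>
XML;

$document = new DOMDocument();
$document->loadXML($xmlString);
$xpath = new DOMXPath($document);

foreach ($xpath->evaluate('//Product[sku="35"]') as $product) {
    // number() casts the first matching child to a float (NAN if missing or not numeric)
    $price = $xpath->evaluate('number(product_price_vat_inc)', $product);
    $shipping = $xpath->evaluate('number(shipping_price_vat_inc)', $product);
    if (is_nan($price) || is_nan($shipping)) {
        continue; // assumption: skip products without usable prices
    }
    foreach ($xpath->evaluate('product_price_vat_inc', $product) as $priceNode) {
        // two decimals, "." as decimal separator, no thousands separator
        $priceNode->textContent = number_format($price + $shipping, 2, '.', '');
    }
}
echo $document->saveXML();
```

Formatting with number_format() avoids writing floating point artifacts into the file.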

How to remove XML tag based on child attribute using php?

I have an XML like below
<entries>
<entry>
<title lang="en">Sample</title>
<entrydate>0</entrydate>
<contents>0</contents>
<entrynum>0</entrynum>
</entry>
<entry>
<title lang="fr">Sample</title>
<entrydate>1</entrydate>
<contents>1</contents>
<entrynum>1</entrynum>
</entry>
</entries>
Is there a way in PHP to delete the parent node (entry) based on the title lang attribute? I need to keep only the en ones, so in this case I would need to get the XML without the second entry node.
I tried looking around but couldn't find any solution...
You need to use the DOMDocument class to parse the string into an XML document. Then use the DOMXpath class to find the target element in the document, and DOMNode::removeChild() to remove the selected element from it.
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
// select target entry tag
$entry = $xpath->query("//entry[title[@lang='fr']]")->item(0);
// remove selected element
$entry->parentNode->removeChild($entry);
$xml = $doc->savexml();
You can check result in demo
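The snippet above removes only the first matching entry. If every entry that is not English should go, as the question asks, the condition can be negated and the result iterated. A minimal sketch; the third entry with lang="de" is added here just to show that more than one node gets removed.

```php
<?php
$xml = <<<'XML'
<entries>
  <entry><title lang="en">Sample</title></entry>
  <entry><title lang="fr">Sample</title></entry>
  <entry><title lang="de">Sample</title></entry>
</entries>
XML;

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);

// select every entry that has no title with lang="en"
foreach ($xpath->query('//entry[not(title/@lang = "en")]') as $entry) {
    $entry->parentNode->removeChild($entry);
}
echo $doc->saveXML();
```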
You could also read your file and generate a new one with your modifications:
<?php
$entries = array('title' => "What's For Dinner",
'link' => 'http://menu.example.com/',
'description' => 'Choose what to eat tonight.');
print "<entries>\n";
foreach ($entries as $element => $content) {
print " <$element>";
print htmlentities($content);
print "</$element>\n";
}
print "</entries>";
?>
Use the method described in this answer, i.e.
<?php
$xml = simplexml_load_file('1.xml');
$del_items = [];
foreach ($xml->entry as $e) {
$attr = $e->title->attributes();
if ($attr && $attr['lang'] != 'en') {
$del_items []= $e;
}
}
foreach ($del_items as $e) {
$dom = dom_import_simplexml($e);
$dom->parentNode->removeChild($dom);
}
echo $xml->asXML();
Output
<?xml version="1.0" encoding="UTF-8"?>
<entries>
<entry>
<title lang="en">Sample</title>
<entrydate>0</entrydate>
<contents>0</contents>
<entrynum>0</entrynum>
</entry>
</entries>
The items cannot be removed within the first loop, because otherwise we may break the iteration chain. Instead, we collect the entry objects into $del_items array, then remove them from XML in separate loop.
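A possible shortcut, sketched here as an alternative rather than as the method of the answer above: SimpleXMLElement::xpath() returns a plain PHP array, so the collecting step is already done by the time the loop starts.

```php
<?php
$xml = simplexml_load_string(
    '<entries>'
    . '<entry><title lang="en">Sample</title></entry>'
    . '<entry><title lang="fr">Sample</title></entry>'
    . '</entries>'
);

// xpath() returns an array, so removing nodes does not disturb the loop
foreach ($xml->xpath('//entry[not(title/@lang = "en")]') as $entry) {
    $node = dom_import_simplexml($entry);
    $node->parentNode->removeChild($node);
}
echo $xml->asXML();
```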

how to differentiate these two xml tags with childnodes

i have two tags in my sample xml as below,
<EmailAddresses>2</EmailAddresses>
<EmailAddresses>
<string>Allen.Patterson01@fantasyisland.com</string>
<string>Allen.Patterson12@fantasyisland.com</string>
</EmailAddresses>
How can I differentiate these two XML tags based on their child nodes? That is, how do I check with DOM in PHP that the first tag has no child elements and the other one has?
Hope it will meet your requirement. Just copy,paste and run it. And change/add logic whatever you want.
<?php
$xmlstr = <<<XML
<?xml version='1.0' standalone='yes'?>
<email>
<EmailAddresses>2</EmailAddresses>
<EmailAddresses>
<string>Allen.Patterson01@fantasyisland.com</string>
<string>Allen.Patterson12@fantasyisland.com</string>
</EmailAddresses>
</email>
XML;
$email = new SimpleXMLElement($xmlstr);
foreach ($email as $key => $value) {
// count() is the number of child elements; > 0 also covers a single <string>
if (count($value) > 0) {
var_dump($value);
//write your logic to process email strings
} else {
var_dump($value);
// count of emails
}
}
?>
You can use ->getElementsByTagName( 'string' ):
foreach( $dom->getElementsByTagName( 'EmailAddresses' ) as $node )
{
if( $node->getElementsByTagName( 'string' )->length )
{
// Code for <EmailAddresses><string/></EmailAddresses>
}
else
{
// Code for <EmailAddresses>2</EmailAddresses>
}
}
The text 2 counts as a child node (a text node) of <EmailAddresses>, so for your XML ->hasChildNodes() always returns true.
You have this problem because of the unusual design of your XML structure.
If you don't have a particular reason to keep this XML syntax, I suggest using only one tag:
<EmailAddresses count="2">
<string>Allen.Patterson01@fantasyisland.com</string>
<string>Allen.Patterson12@fantasyisland.com</string>
</EmailAddresses>
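If the XML cannot be changed, the difference between hasChildNodes() and "has element children" can be checked by looking at the node types. This is a sketch only: hasElementChild is a hypothetical helper name, and a@example.com is a placeholder address.

```php
<?php
/**
 * Hypothetical helper: true if the node has at least one element child.
 */
function hasElementChild(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_ELEMENT_NODE) {
            return true;
        }
    }
    return false;
}

$dom = new DOMDocument();
$dom->loadXML(
    '<email>'
    . '<EmailAddresses>2</EmailAddresses>'
    . '<EmailAddresses><string>a@example.com</string></EmailAddresses>'
    . '</email>'
);

$results = [];
foreach ($dom->getElementsByTagName('EmailAddresses') as $node) {
    // first value: any child node at all, second value: element children only
    $results[] = [$node->hasChildNodes(), hasElementChild($node)];
}
var_dump($results);
```

Both elements report child nodes, but only the second one has an element child.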
Xpath allows you to do that.
$xml = <<<'XML'
<xml>
<EmailAddresses>2</EmailAddresses>
<EmailAddresses>
<string>Allen.Patterson01@fantasyisland.com</string>
<string>Allen.Patterson12@fantasyisland.com</string>
</EmailAddresses>
</xml>
XML;
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
var_dump(
$xpath->evaluate('number(//EmailAddresses[not(*)])')
);
foreach ($xpath->evaluate('//EmailAddresses/string') as $address) {
var_dump($address->textContent);
}
Output:
float(2)
string(35) "Allen.Patterson01@fantasyisland.com"
string(35) "Allen.Patterson12@fantasyisland.com"
The Expressions
Fetch the first EmailAddresses node without any element node child as a number.
Select any EmailAddresses element node:
//EmailAddresses
That does not contain another element node as child node:
//EmailAddresses[not(*)]
Cast the first of the fetched EmailAddresses nodes into a number:
number(//EmailAddresses[not(*)])
Fetch the string child nodes of the EmailAddresses element nodes.
Select any EmailAddresses element node:
//EmailAddresses
Get their string child nodes:
//EmailAddresses/string
In your example the first EmailAddresses seems to hold duplicate information, stored in a weird way. Xpath can count nodes, too: the expression count(//EmailAddresses/string) returns the number of string nodes.
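A short sketch of the counting expression, with placeholder addresses instead of the ones from the question:

```php
<?php
$document = new DOMDocument();
$document->loadXML(
    '<xml><EmailAddresses>'
    . '<string>first@example.com</string>'
    . '<string>second@example.com</string>'
    . '</EmailAddresses></xml>'
);
$xpath = new DOMXPath($document);

// count() returns an Xpath number, which PHP receives as a float
$count = $xpath->evaluate('count(//EmailAddresses/string)');
var_dump($count);
```

Cast the result with (int) if an integer is needed.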

xPath select attribute based on other attribute's value

I'm struggling with xPath for a while now and i thought i'd got the hang of it until now.
The strange thing is that if i test my pattern online it works, but when i run it locally it doesn't
I have the following XML
<Products>
<Product>
<Property Title="Toy" Text="Car" />
</Product>
</Products>
Now i want to replace all Car values with Bal so i came up with something like this:
$xml_src = 'feed.xml';
$document = new DOMDocument();
$document->load($xml_src);
$xpath = new DOMXpath($document);
foreach($xpath->evaluate('//Property[@Title="Toy"]/@Text') as $text){
$text->data = str_replace('Car', 'Bal', $text->data);
}
echo $document->saveXml();
But that doesn't do anything (i just get the whole feed with the original values), while the xPath pattern works on the site i mentioned above. I don't have a clue why
Your Xpath expression returns DOMAttr nodes. You will have to manipulate the $value property.
foreach($xpath->evaluate('//Property[@Title="Toy"]/@Text') as $text) {
$text->value = str_replace('Car', 'Bal', $text->value);
}
echo $document->saveXml();
A DOMAttr has a single child node that is a DOMText representing its value, but I don't think it is possible to address it with Xpath. Text nodes would be instances of DOMText, they would have a $data property.
You'd have to manipulate the DOM, not just the representation, which in this case means using DOMElement::setAttribute():
foreach($xpath->evaluate('//Property[@Title="Toy"]') as $elProperty) {
$elProperty->setAttribute(
'Text',
str_replace(
'Car',
'Bal',
$elProperty->getAttribute('Text')
)
);
}
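For checking either variant without a feed file, here is a self-contained sketch of the $value approach. The inline document replaces feed.xml, and the second Property (Title="Color") is an assumption added to show that the condition leaves other attributes alone.

```php
<?php
$document = new DOMDocument();
$document->loadXML(
    '<Products><Product>'
    . '<Property Title="Toy" Text="Car"/>'
    . '<Property Title="Color" Text="Car red"/>'
    . '</Product></Products>'
);
$xpath = new DOMXPath($document);

// "@" addresses attributes: only the Text attribute of Toy properties is selected
foreach ($xpath->evaluate('//Property[@Title="Toy"]/@Text') as $attribute) {
    $attribute->value = str_replace('Car', 'Bal', $attribute->value);
}
echo $document->saveXML();
```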

php only appending 3 out of 5 child nodes

This code is only appending 3 of the 5 name nodes. Why is that?
Here is the original XML:
It has 5 name nodes.
<?xml version='1.0'?>
<products>
<product>
<itemId>531670</itemId>
<modelNumber>METRA ELECTRONICS/MOBILE AUDIO</modelNumber>
<categoryPath>
<category><name>Buy</name></category>
<category><name>Car, Marine &amp; GPS</name></category>
<category><name>Car Installation Parts</name></category>
<category><name>Deck Installation Parts</name></category>
<category><name>Antennas &amp; Adapters</name></category>
</categoryPath>
</product>
</products>
Then I run this PHP code, which is supposed to append ALL name nodes to the product node.
<?php
// load up your XML
$xml = new DOMDocument;
$xml->load('book.xml');
// Find all elements you want to replace. Since your data is really simple,
// you can do this without much ado. Otherwise you could read up on XPath.
// See http://www.php.net/manual/en/class.domxpath.php
//$elements = $xml->getElementsByTagName('category');
// WARNING: $elements is a "live" list -- it's going to reflect the structure
// of the document even as we are modifying it! For this reason, it's
// important to write the loop in a way that makes it work correctly in the
// presence of such "live updates".
foreach ($xml->getElementsByTagName('product') as $product ) {
foreach($product->getElementsByTagName('name') as $name ) {
$product->appendChild($name );
}
$product->removeChild($xml->getElementsByTagName('categoryPath')->item(0));
}
// final result:
$result = $xml->saveXML();
echo $result;
?>
The end result is this and it only appends 3 of the name nodes:
<?xml version="1.0"?>
<products>
<product>
<itemId>531670</itemId>
<modelNumber>METRA ELECTRONICS/MOBILE AUDIO</modelNumber>
<name>Buy</name>
<name>Antennas &amp; Adapters</name>
<name>Car Installation Parts</name>
</product>
</products>
Why is it only appending 3 of the name nodes?
You can temporarily add the name elements to an array before appending them, owing to the fact that you're modifying the DOM in real time. The node list generated by getElementsByTagName() may change as you are moving nodes around (and indeed that appears to be what's happening).
<?php
// load up your XML
$xml = new DOMDocument;
$xml->load('book.xml');
foreach ($xml->getElementsByTagName('product') as $product ) {
// Array to store the name nodes (reset for each product)
$append = array();
foreach($product->getElementsByTagName('name') as $name ) {
// Stick $name onto the array
$append[] = $name;
}
// Now append all of them to product
foreach ($append as $a) {
$product->appendChild($a);
}
$product->removeChild($xml->getElementsByTagName('categoryPath')->item(0));
}
// final result:
$result = $xml->saveXML();
echo $result;
?>
Output, with all values appended:
<?xml version="1.0"?>
<products>
<product>
<itemId>531670</itemId>
<modelNumber>METRA ELECTRONICS/MOBILE AUDIO</modelNumber>
<name>Buy</name><name>Car, Marine &amp; GPS</name><name>Car Installation Parts</name><name>Deck Installation Parts</name><name>Antennas &amp; Adapters</name></product>
</products>
You're modifying the DOM tree as you're pulling results from it. Any modifications to the tree that cover the results of a previous query operation (your getElementsByTagName) invalidate those results, so you're getting undefined results. This is especially true of operations that add/remove nodes.
You're moving nodes as you're iterating through them so 2 are being skipped. I'm not a php guy so I can't give you the code to do this, but what you need to do is build a collection of the name nodes and iterate through that collection in reverse.
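In PHP, one way to build such a collection is iterator_to_array(), which copies the live list into a plain array. With a snapshot like that, the order of iteration no longer matters, so this sketch goes forward rather than in reverse. The inline XML is a shortened version of the document from the question.

```php
<?php
$xml = new DOMDocument();
$xml->loadXML(
    '<products><product><itemId>531670</itemId><categoryPath>'
    . '<category><name>Buy</name></category>'
    . '<category><name>Car, Marine &amp; GPS</name></category>'
    . '<category><name>Car Installation Parts</name></category>'
    . '</categoryPath></product></products>'
);

foreach ($xml->getElementsByTagName('product') as $product) {
    // copy the live list into a plain array before moving anything
    $names = iterator_to_array($product->getElementsByTagName('name'));
    foreach ($names as $name) {
        $product->appendChild($name);
    }
    // remove the categoryPath elements of this product, again via a snapshot
    $paths = iterator_to_array($product->getElementsByTagName('categoryPath'));
    foreach ($paths as $path) {
        $path->parentNode->removeChild($path);
    }
}
echo $xml->saveXML();
```

The outer loop still uses the live list of product elements, which is fine here because no product is added or removed.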
A less complicated way to do it is to manipulate the nodes with insertBefore
foreach($xml->getElementsByTagName('name') as $node){
$gp = $node->parentNode->parentNode;
$ggp = $gp->parentNode;
// move the node above gp without removing gp or parent
$ggp->insertBefore($node,$gp);
}
// remove the empty categoryPath node
$ggp->removeChild($gp);
