I'm trying to work with an external XML file, which is structured like this:
<?xml version="1.0" encoding="UTF-8"?>
<merchandiser xsi:noNamespaceSchemaLocation="merchandiser.xsd" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<header>
<merchantId>44235</merchantId>
<merchantName>Feelunique (UK)</merchantName>
<createdOn>04/27/2020 00:05:33</createdOn>
</header>
<product part_number="99082" manufacturer_name="Sanctuary Spa" sku_number="99082" name="Sanctuary Spa Sleep Dream Easy Pillow Mist 100ml" product_id="15927186808">
<URL>
<product>https://click.linksynergy.com/link?id=y/LyuzvjryY&offerid=687217.15927186808&type=15&murl=https%3A%2F%2Fwww.feelunique.com%2Fp%2FSanctuary-Spa-Sleep-Dream-Easy-Pillow-Mist-100ml%26curr%3DGBP</product>
</URL>
</product>
</merchandiser>
As you can see the node <product> is used twice, and I need to grab an attribute from the first one, and the value in the second.
My code seems to jump straight to the second one by default and allows me to define the $xml->value of the second <product> node, but I can't seem to figure out how to separate the two in my code and get the attribute I need.
while($xml->read()) {
if($xml->nodeType == XMLReader::ELEMENT) {
if($xml->localName == 'header') {
$header = array();
}
if($xml->localName == 'merchantName') {
$xml->read();
$header['merchant'] = addslashes($xml->value);
}
if($xml->localName == 'product') {
$product = array();
$product['merchant'] = $header['merchant'];
$product['title'] = $xml->getAttribute('name');
}
if($xml->localName == 'product') {
$xml->read();
$product['link'] = $xml->value;
}
}
}
Can somebody point me in the right direction as to how I can achieve both values in my php code?
This isn't a complete solution, but just a demonstration of how to reach elements from each of the two product nodes - and you can modify it as needed:
$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXpath($doc);
$product = $xpath->evaluate("//product/@name");
$link = $xpath->evaluate("//product//URL//product");
foreach ($product as $node1) {
foreach ($link as $node2){
echo trim($node2->nodeValue), PHP_EOL,trim($node1->nodeValue);
}}
Output:
https://click.linksynergy.com...
Sanctuary Spa Sleep Dream Easy Pillow Mist 100ml
XMLReader just moves from node to node, and by the time you hit 'product', both of your if statements evaluate to true.
The only way to know which product node you are in is to keep track of its parent.
Doing this with one big loop will be a pain. It's probably better to call a separate function once the level-1 product opens and run a new loop there to parse the 'product' subtree.
I wrote a library to help with this.
XMLReader (and expat) can be a great tool for parsing large XML documents fast, but you need to learn how to traverse nested structures effectively. If that is too hard to grasp, I would recommend a simpler XML parser like DOM or SimpleXML.
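As a rough illustration of keeping track of where you are, here is a minimal sketch that uses XMLReader's depth property to tell the outer product element from the inner one. The sample document is a shortened stand-in for the feed in the question (the link URL is made up), and the depth values assume product elements sit directly under the root.

```php
<?php
// Shortened stand-in for the feed in the question; the URL is hypothetical.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<merchandiser>
  <header><merchantName>Feelunique (UK)</merchantName></header>
  <product name="Sanctuary Spa Sleep Dream Easy Pillow Mist 100ml">
    <URL><product>https://example.com/link?id=1&amp;offerid=2</product></URL>
  </product>
</merchandiser>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$products = [];
$current = null;

while ($reader->read()) {
    $isProduct = $reader->localName === 'product';
    if ($reader->nodeType === XMLReader::ELEMENT && $isProduct) {
        if ($reader->depth === 1) {
            // Outer <product>: a direct child of the root element.
            $current = ['title' => $reader->getAttribute('name'), 'link' => null];
        } elseif ($current !== null) {
            // Inner <product> (nested below <URL>): take its text content.
            $current['link'] = trim($reader->readString());
        }
    } elseif ($reader->nodeType === XMLReader::END_ELEMENT && $isProduct && $reader->depth === 1) {
        // The outer <product> has closed, so the record is complete.
        // Note: an empty outer <product/> would produce no END_ELEMENT.
        $products[] = $current;
        $current = null;
    }
}
$reader->close();

print_r($products);
```

Checking the depth is only one way to do it; a stack of open element names would work for feeds where the nesting level is not fixed.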
I have an XML file that contains the following content.
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article>
<article
xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">test</emphasis> sentence.
</para>
</article>
When I use
$xml_data = simplexml_load_string($filedata);
foreach ($xml_data['para'] as $data) {
echo $data;
}
I get This is an sentence. But I want This is an <b>test</b> sentence. as the result.
Instead of simplexml_load_string I'd recommend DOMDocument, but that is just a personal preference. A naïve implementation might just do a string replacement and that might totally work for you. However, since you've provided actual XML that even includes a NS I'm going to try to keep this as XML-centric as possible, while skipping XPath which could possibly be used, too.
This code loads the XML and walks the children of the root element. If it finds a <para> element, it walks all of the children of that node looking for an <emphasis> node, and if it finds one it replaces it with a new <b> node.
The replacement process is a little complex, however, because if we just use nodeValue we might lose any HTML that lives in there, so we need to walk the children of the <emphasis> node and clone those into our replacement node.
Because the source document has a NS, however, we also need to remove that from our final HTML. Since we are going from XML to HTML, I think that is a safe use of str_replace without going too crazy in XML land.
The code should have enough comments to make sense, hopefully.
<?php
$filedata = <<<EOT
<?xml version="1.0" encoding="utf-8" ?>
<article
xmlns="http://docbook.org/ns/docbook" version="5.0"
xmlns:xlink="http://www.w3.org/1999/xlink" >
<para>
This is an <emphasis role="strong">hello <em>world</em></emphasis> sentence.
</para>
</article>
EOT;
$dom = new DOMDocument();
$dom->loadXML($filedata);
foreach($dom->documentElement->childNodes as $node){
if(XML_ELEMENT_NODE === $node->nodeType && 'para' === $node->nodeName){
// Replace any emphasis elements
foreach($node->childNodes as $childNode) {
if(XML_ELEMENT_NODE === $childNode->nodeType && 'emphasis' === $childNode->nodeName){
// This is arguably the most "correct" way to replace, just in case
// there's extra nodes inside. A cheaper way would be to not loop
// and just use the nodeValue however you might lose some HTML.
$newNode = $dom->createElement('b');
foreach($childNode->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$childNode->replaceWith($newNode);
}
}
// Build our output
$output = '';
foreach($node->childNodes as $childNode) {
$output .= $dom->saveHTML($childNode);
}
// The provided XML has a namespace, and when cloning nodes that NS comes
// along. Since we are going from regular XML to irregular HTML I think
// a string replacement is best.
$output = str_replace(' xmlns="http://docbook.org/ns/docbook"', '', $output);
echo $output;
}
}
Demo here: https://3v4l.org/04Tc3#v8.0.23
NOTE: PHP 8 added replaceWith. If you are using PHP 7 or earlier you'd use replaceChild and just play around with things a bit.
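For PHP 7, a minimal sketch of the replaceChild variant might look like the following. It is not the code from the demo above: it uses a simplified, namespace-free <para> as the document root, and it moves the children into the new node instead of cloning them.

```php
<?php
// Simplified input without the DocBook namespace (an assumption for brevity).
$dom = new DOMDocument();
$dom->loadXML('<para>This is an <emphasis role="strong">hello <em>world</em></emphasis> sentence.</para>');
$para = $dom->documentElement;

// Iterate over a snapshot, because childNodes is a live list and we modify it.
foreach (iterator_to_array($para->childNodes) as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'emphasis') {
        $bold = $dom->createElement('b');
        // appendChild() moves a node that is already in the document,
        // so this loop empties <emphasis> one child at a time.
        while ($child->firstChild !== null) {
            $bold->appendChild($child->firstChild);
        }
        // replaceChild() takes the new node first, then the node to replace.
        $para->replaceChild($bold, $child);
    }
}

$output = '';
foreach ($para->childNodes as $child) {
    $output .= $dom->saveHTML($child);
}
echo $output;
```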
What if you have the following XML?
<entry>
<para>This is the first text</para>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem>
<para>
This is an paragraph inside a list
</para>
</listitem>
<itemizedlist>
<listitem>
<para>
This is an paragraph inside a list inside a list
</para>
</listitem>
</itemizedlist>
</itemizedlist>
</entry>
using
if(XML_ELEMENT_NODE === $stuff2->nodeType && 'para' === $stuff2->nodeName){
$newNode = $dom->createElement('p');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'itemizedlist' === $stuff2->nodeName) {
$newNode = $dom->createElement('ul');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
if(XML_ELEMENT_NODE === $stuff2->nodeType && 'emphasis' === $stuff2->nodeName){
$newNode = $dom->createElement('b');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
if (XML_ELEMENT_NODE === $stuff2->nodeType && 'listitem' === $stuff2->nodeName) {
$newNode = $dom->createElement('li');
foreach($stuff2->childNodes as $grandChild){
$newNode->appendChild($grandChild->cloneNode(true));
}
$stuff2->replaceWith($newNode);
}
only results in
<p>This is the first text</p>
<emphasis>This is the second text</emphasis>
<para>This is the <emphasis>next</emphasis> text</para>
<itemizedlist>
<listitem>
<para>This is an paragraph inside a list</para>
</listitem>
<itemizedlist>
<listitem>
<para>This is an paragraph inside a list inside a list</para>
</listitem>
</itemizedlist>
</itemizedlist>
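One possible reason for the result above is that each replacement only renames one element, while the cloned children keep their old names. A hedged sketch of a different approach is to rebuild the tree recursively into a new document using a name map. The function name convertNode, the map, and the choice of div for entry are all assumptions made for illustration, and attributes are deliberately dropped.

```php
<?php
// Hypothetical helper: copies $source into $target, renaming elements via $map.
function convertNode(DOMNode $source, DOMDocument $target, array $map): DOMNode
{
    if ($source->nodeType !== XML_ELEMENT_NODE) {
        // Text, comments and similar nodes are copied unchanged.
        return $target->importNode($source, true);
    }
    // Elements missing from the map keep their name; attributes are not copied.
    $name = $map[$source->localName] ?? $source->localName;
    $element = $target->createElement($name);
    foreach ($source->childNodes as $child) {
        $element->appendChild(convertNode($child, $target, $map));
    }
    return $element;
}

$map = [
    'entry' => 'div',
    'para' => 'p',
    'emphasis' => 'b',
    'itemizedlist' => 'ul',
    'listitem' => 'li',
];

$source = new DOMDocument();
$source->loadXML('<entry><para>This is the <emphasis>next</emphasis> text</para>'
    . '<itemizedlist><listitem><para>Item</para></listitem></itemizedlist></entry>');

$result = new DOMDocument();
$result->appendChild(convertNode($source->documentElement, $result, $map));
$html = $result->saveXML($result->documentElement);
echo $html;
```

Because the source document is never modified, there is no live node list to worry about while walking it.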
I want to delete those entries where the title matches my $titleArray.
My XML files looks like:
<products>
<product>
<title>Battlefield 1</title>
<url>https://www.google.de/</url>
<price>0.80</price>
</product>
<product>
<title>Battlefield 2</title>
<url>https://www.google.de/</url>
<price>180</price>
</product>
</products>
Here is my code but I don't think that it is working and my IDE says here $node->removeChild($product); -> "Expected DOMNode, got DOMNodeList"
What is wrong and how can I fix that?
function removeProduct($dom, $productTag, $pathXML, $titleArray){
$doc = simplexml_import_dom($dom);
$items = $doc->xpath($pathXML);
foreach ($items as $item) {
$node = dom_import_simplexml($item);
foreach ($titleArray as $title) {
if (mb_stripos($node->textContent, $title) !== false) {
$product = $node->parentNode->getElementsByTagName($productTag);
$node->removeChild($product);
}
}
}
}
Thank you and Greetings!
Most DOM methods that fetch nodes return a list of nodes. You can have several element nodes with the same name, so the result will be a list (an empty list if nothing is found). You can traverse the list and apply logic to each node in it.
There are two problems with the approach. First, removing nodes modifies the document, so you have to be careful not to remove a node that you are still using afterwards; that can lead to all kinds of unexpected results. Second, DOMElement::getElementsByTagName() returns a node list that is a "live" result: if you remove the first node, the list itself changes, not just the XML document.
DOMXpath::evaluate() solves both problems at the same time. The result is not "live", so you can iterate it with foreach() and remove nodes. Xpath expressions allow for conditions, so you can filter and fetch specific nodes. Unfortunately Xpath 1.0 has no lower-case function, but you can call back into PHP for that.
function isTitleInArray($title) {
$titles = [
'battlefield 2'
];
return in_array(mb_strtolower($title, 'UTF-8'), $titles);
}
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace("php", "http://php.net/xpath");
$xpath->registerPHPFunctions('isTitleInArray');
$expression = '//product[php:function("isTitleInArray", normalize-space(title))]';
foreach ($xpath->evaluate($expression) as $product) {
$product->parentNode->removeChild($product);
}
echo $document->saveXml();
Output:
<?xml version="1.0"?>
<products>
<product>
<title>Battlefield 1</title>
<url>https://www.google.de/</url>
<price>0.80</price>
</product>
</products>
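To make the "live" list point more concrete, here is a small sketch (with a made-up four-product document) that removes nodes found via getElementsByTagName() by first copying the list into a plain array. This is an alternative to the Xpath approach above, not a replacement for it.

```php
<?php
$document = new DOMDocument();
$document->loadXML('<products><product/><product/><product/><product/></products>');

// This list is live: it always reflects the current state of the document.
$list = $document->getElementsByTagName('product');
$before = $list->length;

// iterator_to_array() takes a snapshot, so removing nodes does not
// disturb the iteration.
foreach (iterator_to_array($list) as $product) {
    $product->parentNode->removeChild($product);
}

// The live list has shrunk along with the document.
$after = $list->length;
echo $before, ' -> ', $after, PHP_EOL;
```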
I want to read this xml document:
<?xml version="1.0" encoding="UTF-8"?>
<tns:getPDMNumber xmlns:tns="http://www.testgroup.com/TestPDM" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.testgroup.com/TestPDM getPDMNumber.xsd ">
<tns:getPDMNumberResponse>
<tns:requestID>22222</tns:requestID>
<tns:pdmNumber>654321</tns:pdmNumber>
<tns:responseCode>0</tns:responseCode>
</tns:getPDMNumberResponse>
</tns:getPDMNumber>
I tried it this way:
$dom->load('response/17_getPDMNumberResponse.xml');
$nodes = $dom->getElementsByTagName("tns:requestID");
//$nodes = $dom->getElementsByTagName("tns:getPDMNumber");
//$nodes = $dom->getElementsByTagName("tns:getPDMNumberResponse");
foreach($nodes as $node)
{
$response=$node->getElementsByTagName("tns:getPDMNumber");
foreach($response as $info)
{
$test = $info->getElementsByTagName("tns:pdmNumber");
$pdm = $test->nodeValue;
}
}
The code never enters the foreach loop.
Just to clarify, my goal is to read the "tns:pdmNumber" node.
Does anybody have an idea?
EDIT: I have also tried the commented-out lines.
The XML uses a namespace, so you should use the namespace-aware methods. They have the suffix NS.
$tns = 'http://www.testgroup.com/TestPDM';
$document = new DOMDocument();
$document->loadXml($xml);
foreach ($document->getElementsByTagNameNS($tns, "pdmNumber") as $node) {
var_dump($node->textContent);
}
Output:
string(6) "654321"
A better option is to use Xpath expressions. They allow more comfortable access to DOM nodes. In this case you have to register a prefix for the namespace so that you can use it in the Xpath expression:
$document = new DOMDocument();
$document->loadXml($xml);
$xpath = new DOMXpath($document);
$xpath->registerNamespace('t', 'http://www.testgroup.com/TestPDM');
var_dump(
$xpath->evaluate('string(/t:getPDMNumber/t:getPDMNumberResponse/t:pdmNumber)')
);
This:
$nodes = $dom->getElementsByTagName("tns:requestID");
you find all the requestID nodes and try to loop over them. That's fine, but then you use each of those nodes as the starting point to find any getPDMNumber nodes UNDER the requestID, and there are none, because requestID is a terminal node. So
$response=$node->getElementsByTagName("tns:getPDMNumber");
finds nothing, and the inner loop has nothing to do.
It's like saying "Start digging a hole until you reach China. Once you reach China, keep digging until you reach Australia". But you can't keep digging: you've reached the "bottom", and the only thing deeper than China would be going into orbit.
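Put differently, the lookup has to start at an ancestor and work downwards. A minimal sketch of that order, using the namespace-aware methods and the response document from the question embedded as a string (the $results array is just an illustration):

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<tns:getPDMNumber xmlns:tns="http://www.testgroup.com/TestPDM">
  <tns:getPDMNumberResponse>
    <tns:requestID>22222</tns:requestID>
    <tns:pdmNumber>654321</tns:pdmNumber>
    <tns:responseCode>0</tns:responseCode>
  </tns:getPDMNumberResponse>
</tns:getPDMNumber>
XML;

$tns = 'http://www.testgroup.com/TestPDM';
$dom = new DOMDocument();
$dom->loadXML($xml);

$results = [];
// Start at the container element, then look for the leaf nodes inside it.
foreach ($dom->getElementsByTagNameNS($tns, 'getPDMNumberResponse') as $response) {
    $requestId = $response->getElementsByTagNameNS($tns, 'requestID')->item(0);
    $pdmNumber = $response->getElementsByTagNameNS($tns, 'pdmNumber')->item(0);
    if ($requestId !== null && $pdmNumber !== null) {
        $results[] = [
            'requestID' => $requestId->textContent,
            'pdmNumber' => $pdmNumber->textContent,
        ];
    }
}

print_r($results);
```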
This question already has answers here:
How to use XMLReader in PHP?
(7 answers)
Closed 6 years ago.
Any PHP developers here?
I have a PHP function that parses an XML file using DOMDocument (I'm proficient with this tool). I want to do the same with XMLReader, but I don't understand how XMLReader works.
I want to use XMLReader because it's a lightweight tool.
Feel free to ask me other questions about my issue.
function getDatas($filepath)
{
$doc = new DOMDocument();
$xmlfile = file_get_contents($filepath);
$doc->loadXML($xmlfile);
$xmlcars = $doc->getElementsByTagName('car');
$mycars= [];
foreach ($xmlcars as $xmlcar) {
$car = new Car();
$car->setName(
$xmlcar->getElementsByTagName('id')->item(0)->nodeValue
);
$car->setBrand(
$xmlcar->getElementsByTagName('brand')->item(0)->nodeValue
);
array_push($mycars, $car);
}
return $mycars;
}
PS: I'm not a senior PHP dev.
Haha, thanks.
This is a good example from this topic, I hope it helps you to understand.
$z = new XMLReader;
$z->open('data.xml');
$doc = new DOMDocument;
// move to the first <product /> node
while ($z->read() && $z->name !== 'product');
// now that we're at the right depth, hop to the next <product/> until the end of the tree
while ($z->name === 'product')
{
// either one should work
//$node = new SimpleXMLElement($z->readOuterXML());
$node = simplexml_import_dom($doc->importNode($z->expand(), true));
// now you can use $node without going insane about parsing
var_dump($node->element_1);
// go to next <product />
$z->next('product');
}
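Adapted to the car data from the question, the same pattern might look roughly like this. The layout of the XML (a cars root with car children holding id and brand) is guessed from the DOMDocument code, and plain arrays stand in for the Car class, so treat it as a sketch, not a drop-in replacement.

```php
<?php
// Hypothetical helper; the <cars>/<car>/<id>/<brand> layout is an assumption.
function readCars(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);
    $doc = new DOMDocument();
    $cars = [];

    // Move to the first <car> element.
    while ($reader->read() && $reader->name !== 'car');

    // Hop from one <car> to the next, expanding only one subtree at a time.
    while ($reader->name === 'car') {
        $node = simplexml_import_dom($doc->importNode($reader->expand(), true));
        $cars[] = [
            'id' => (string) $node->id,
            'brand' => (string) $node->brand,
        ];
        $reader->next('car');
    }

    $reader->close();
    return $cars;
}

$cars = readCars('<cars><car><id>1</id><brand>Fiat</brand></car>'
    . '<car><id>2</id><brand>Volvo</brand></car></cars>');
print_r($cars);
```

For a file on disk, $reader->open($filepath) would take the place of $reader->XML($xml).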
XMLReader does not, as far as I can tell, have an equivalent way of filtering by element name. So the closest equivalent would be, as mentioned in rvbarreto's answer, to iterate through all elements using XMLReader::read() and grab the info you need when the element name matches what you want.
Alternatively, you might want to check out SimpleXML, which supports filtering using XPath expressions, as well as seeking to a node in the XML using the element structure like they are sub-objects of the main object. For instance, instead of using:
$xmlcar->getElementsByTagName('id')->item(0)->nodeValue;
You would use:
$xmlcar->id[0];
Assuming all of your car elements are at the first level of the XML document tree, the following should work as an example:
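function getDatas($filepath) {
// The third argument (TRUE) means $filepath is a path or URL, not XML data
$carsData = new SimpleXMLElement($filepath, 0, TRUE);
$mycars = [];
foreach($carsData->car as $xmlcar) {
$car = new Car();
// Cast to string so the setters receive plain values, not SimpleXMLElement objects
$car->setName((string) $xmlcar->id[0]);
$car->setBrand((string) $xmlcar->brand[0]);
$mycars[] = $car;
}
return $mycars;
}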
I am using PHP DOM to parse XML from another platform, extract certain data from it, and upload it to my own platform. However, I am stuck when it comes to extracting a certain node value only if another node value within the same 'row' element is greater than 0. In the example below, I would like to iterate over the XML and pull out the 'affcustomid' value only if the CPACommission node value is greater than 0. Does anyone have any ideas how I can do this? The XML below is a shortened version; in reality I would get back hundreds of rows in the same format.
<row>
<rowid>1</rowid>
<currencysymbol>€</currencysymbol>
<totalrecords>2145</totalrecords>
<affcustomid>11159_4498302</affcustomid>
<period>7/1/2014</period>
<impressions>0</impressions>
<clicks>1</clicks>
<clickthroughratio>0</clickthroughratio>
<downloads>1</downloads>
<downloadratio>1</downloadratio>
<newaccountratio>1</newaccountratio>
<newdepositingacc>1</newdepositingacc>
<newaccounts>1</newaccounts>
<firstdepositcount>1</firstdepositcount>
<activeaccounts>1</activeaccounts>
<activedays>1</activedays>
<newpurchases>12.4948</newpurchases>
<purchaccountcount>1</purchaccountcount>
<wageraccountcount>1</wageraccountcount>
<avgactivedays>1</avgactivedays>
<netrevenueplayer>11.8701</netrevenueplayer>
<Deposits>12.4948</Deposits>
<Bonus>0</Bonus>
<NetRevenue>11.8701</NetRevenue>
<TotalBetsHands>4</TotalBetsHands>
<Product1Bets>4</Product1Bets>
<Product1NetRevenue>11.8701</Product1NetRevenue>
<Product1Commission>30</Product1Commission>
<Commission>0</Commission>
<CPACommission>30</CPACommission>
</row>
Thanks in advance!
Mark
The easiest way to fetch data from an XML DOM is Xpath:
$dom = new DOMDocument();
$dom->load('file.xml');
$xpath = new DOMXpath($dom);
var_dump(
$xpath->evaluate('string(//row[CPACommission > 0]/affcustomid)')
);
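Note that string() only yields the value of the first matching node. Since the question mentions hundreds of rows, a sketch that collects every match could look like this; the rows wrapper element and the second and third rows are invented for the example.

```php
<?php
// Sample data: only the first row comes from the question, the rest is made up.
$xml = <<<'XML'
<rows>
  <row><affcustomid>11159_4498302</affcustomid><CPACommission>30</CPACommission></row>
  <row><affcustomid>11159_0000001</affcustomid><CPACommission>0</CPACommission></row>
  <row><affcustomid>11159_0000002</affcustomid><CPACommission>12.5</CPACommission></row>
</rows>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXpath($dom);

$ids = [];
// Without string(), the expression returns a node list with all matches.
foreach ($xpath->evaluate('//row[CPACommission > 0]/affcustomid') as $node) {
    $ids[] = $node->textContent;
}

print_r($ids);
```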
It would be easier using SimpleXML:
$doc = simplexml_load_file('file.xml');
foreach ($doc->row as $row) {
if ((float) $row->CPACommission > 0) {
echo $row->affcustomid;
}
}
But if you still need to use DOMDocument:
$doc = new DOMDocument();
$doc->load('file.xml');
foreach ($doc->getElementsByTagName('row') AS $row) {
if($row->getElementsByTagName('CPACommission')->item(0)->textContent > 0){
echo $row->getElementsByTagName('affcustomid')->item(0)->textContent;
}
}