XML reforming with DOM - php

I am trying to reformat XML adding intermediate level node.
Here is what I have as input:
<channel>
<item>
<title>Advanced PHP Book</title>
</item>
<item>
<title>MySQL primer</title>
</item>
<item>
<title>C++ for beginners</title>
</item>
</channel>
I need it to be like that at the end (page node added between channel and item):
<channel>
<page>
<item>
<title>Advanced PHP Book</title>
</item>
<item>
<title>MySQL primer</title>
</item>
<item>
<title>C++ for beginners</title>
</item>
</page>
</channel>
Here is my testing code:
$sxe = simplexml_load_string($string);
$dom_sxe = dom_import_simplexml($sxe);
$dom = new DOMDocument('1.0');
$channel = $dom->appendChild($dom->createElement('channel'));
$page = $channel->appendChild($dom->createElement('page'));
$dom_sxe = $dom->importNode($dom_sxe, true);
$dom_sxe = $page->appendChild($dom_sxe);
$dom->formatOutput = true;
echo $dom->saveXML();
The problem I have is that channel element is doubled.
Please help.

I don't think this should be too hard: I think you're overcomplicating it by using the simplexml stuff.
$dom = new DOMDocument;
$dom->loadXML($string);
// create the <page> element
$page = $dom->createElement('page');
while ($dom->firstChild->firstChild) {
// move the items in <channel> to the <page> element
$page->appendChild($dom->firstChild->firstChild);
}
// insert the <page> element into <channel>
$dom->firstChild->appendChild($page);
$dom->saveXML();

$xml = '<channel> <item> <title>Advanced PHP Book</title> </item> <item> <title>MySQL primer</title> </item> <item> <title>C++ for beginners</title> </item> </channel>';
$dom = new DOMDocument;
$dom->loadXML($xml);
$page = $dom->createElement('page');
$items = $dom->getElementsByTagName('item');
while ($items->length) {
$page->appendChild($items->item(0));
}
$dom->getElementsByTagName('channel')->item(0)->appendChild($page);
echo $dom->saveXML();
Output
<?xml version="1.0"?>
<channel> <page><item> <title>Advanced PHP Book</title> </item><item> <title>MySQL primer</title> </item><item> <title>C++ for beginners</title> </item></page></channel>
See it.

Related

How to edit large XML files in PHP based on a record in the XML Node

I'm trying to modify a 130mb+ XML file via PHP so it only shows the results where a child node is a specific value. I'm trying to filter this because of limitations via the software we're using to import the XML into our website.
Example: (mockup data)
<Items>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>false</BrandDescr>
</Item>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>true</BrandDescr>
</Item>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>false</BrandDescr>
</Item>
</Items>
Desired result:
I want to create a new XML file with only the records where the child "ShowOnWebsite" is true.
Problems I've run into
Because the XML is so large simple solutions like using SimpleXML or loading the XML into the body and editing the nodes in there don't work. Because they all read the entire file into memory which is too slow and usually fails.
I've also looked at prewk/xml-string-streamer (https://github.com/prewk/xml-string-streamer) which is great for streaming large XML files because it doesn't place them in memory, although I can't find any way to modify the XML via that solution. (Other online posts say you need to have the nodes in memory to edit them).
Anyone got an idea on how to tackle this problem?
Goal
Desired result: I want to create a new XML file with only the records where the child "ShowOnWebsite" is true.
Given
test.xml
<Items>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>false</ShowOnWebsite>
</Item>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>true</ShowOnWebsite>
</Item>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>false</ShowOnWebsite>
</Item>
</Items>
Code
This is the implementation I wrote. The getItems yields the childs without loading the xml at once into the memory.
function getItems($fileName) {
if ($file = fopen($fileName, "r")) {
$buffer = "";
$active = false;
while(!feof($file)) {
$line = fgets($file);
$line = trim(str_replace(["\r", "\n"], "", $line));
if($line == "<Item>") {
$buffer .= $line;
$active = true;
} elseif($line == "</Item>") {
$buffer .= $line;
$active = false;
yield new SimpleXMLElement($buffer);
$buffer = "";
} elseif($active == true) {
$buffer .= $line;
}
}
fclose($file);
}
}
$output = new SimpleXMLElement('<?xml version="1.0" encoding="utf-8"?><Items></Items>');
foreach(getItems("test.xml") as $element)
{
if($element->ShowOnWebsite == "true") {
$item = $output->addChild('Item');
$item->addChild('Barcode', (string) $element->Barcode);
$item->addChild('BrandCode', (string) $element->BrandCode);
$item->addChild('Title', (string) $element->Title);
$item->addChild('Content', (string) $element->Content);
$item->addChild('ShowOnWebsite', $element->ShowOnWebsite);
}
}
$fileName = __DIR__ . "/test_" . rand(100, 999999) . ".xml";
$output->asXML($fileName);
Output
<?xml version="1.0" encoding="utf-8"?>
<Items><Item><Barcode>...</Barcode><BrandCode>...</BrandCode><Title>...</Title><Content>...</Content><ShowOnWebsite>true</ShowOnWebsite></Item></Items>
XMLReader has an expand() method, but XMLWriter is missing the counterpart. So I added a XMLWriter::collapse() method in FluentDOM.
This allows to read the XML with XMLReader, expand it to DOM, use DOM methods to filter/manipulate the it and write it back with XMLWriter:
require __DIR__.'/../../vendor/autoload.php';
// Create the target writer and add the root element
$writer = new \FluentDOM\XMLWriter();
$writer->openUri('php://stdout');
$writer->setIndent(2);
$writer->startDocument();
$writer->startElement('Items');
// load the source into a reader
$reader = new \FluentDOM\XMLReader();
$reader->open(getXMLAsURI());
// iterate the Item elements - the iterator expands them into a DOM node
foreach (new FluentDOM\XMLReader\SiblingIterator($reader, 'Item') as $item) {
/** #var \FluentDOM\DOM\Element $item */
// only "ShowOnWebsite = true"
if ($item('ShowOnWebsite = "true"')) {
// write expanded node to the output
$writer->collapse($item);
}
}
$writer->endElement();
$writer->endDocument();
function getXMLAsURI() {
$xml = <<<'XML'
<Items>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>false</ShowOnWebsite>
</Item>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>true</ShowOnWebsite>
</Item>
<Item>
<Barcode>...</Barcode>
<BrandCode>...</BrandCode>
<Title>...</Title>
<Content>...</Content>
<ShowOnWebsite>false</ShowOnWebsite>
</Item>
</Items>
XML;
return 'data://text/plain;base64,'.base64_encode($xml);
}

How to parse xml file using php script [duplicate]

This question already has answers here:
Simple XML - Dealing With Colons In Nodes
(4 answers)
Closed 5 years ago.
This is may xml file
<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title>werwer</title>
<link>werwerwe</link>
<item>
<g:id>704667</g:id>
<title>Nike</title>
<description>erterterter</description>
</item>
<item>
<g:id>4456456</g:id>
<title>Nike</title>
<description>erterterter</description>
</item>
</channel></rss>
how to parse that xml file, I have script but it doesnt work
if (file_exists('products.xml')) {
$xml = simplexml_load_file('products.xml');
print_r($xml);
} else {
exit('Failed to open products.xml.');
}
any idea how to get information between g:id?
You need to get an apply the namespace to the children. http://php.net/manual/en/simplexmlelement.getnamespaces.php
<?php
$xml = '<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title>werwer</title>
<link>werwerwe</link>
<item>
<g:id>704667</g:id>
<title>Nike</title>
<description>erterterter</description>
</item>
<item>
<g:id>4456456</g:id>
<title>Nike</title>
<description>erterterter</description>
</item>
</channel>
</rss>';
$xml = simplexml_load_string($xml);
$ns = $xml->getNamespaces(true);
foreach ($xml->channel->item as $item) {
echo $item->children($ns['g'])->id.PHP_EOL;
}
/*
704667
4456456
*/
https://3v4l.org/t2D84

Delete Selected Items From Google Merchant XML

i want to remove g:price=0 OR out of stock OR no image ITEMS from my Google Merchant xml feed by PHP.
i'm trying for hours and hours; but could not find a solution yet..
example: (if i have xml like this; the new xml must list only the second item)
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:additional_image_link><![CDATA[]]></g:additional_image_link>
<g:image><![CDATA[]]></g:image>
<g:availability><![CDATA[out of stock]]></g:availability>
<g:price>0.00 TRY</g:price>
</item>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
Could someone help me? Expected output is this:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
Here we are using DOMDocument for extracting nodes and removing un-required nodes.
Try this code snippet here
<?php
ini_set('display_errors', 1);
libxml_use_internal_errors(true);
$string = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:additional_image_link><![CDATA[]]></g:additional_image_link>
<g:image><![CDATA[]]></g:image>
<g:availability><![CDATA[out of stock]]></g:availability>
<g:price>0.00 TRY</g:price>
</item>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>
XML;
$array = array("g:image", "g:price", "g:availability");
$domObject = new DOMDocument();
$domObject->loadXML($string);
$results = $domObject->getElementsByTagName("item");
$nodesToRemove = array();
foreach ($results as $node)
{
foreach ($node->childNodes as $innerNode)
{
if ($innerNode instanceof DOMElement && in_array($innerNode->tagName, $array))
{
if ($innerNode->tagName == "g:image" && empty($innerNode->textContent))
{
$nodesToRemove[] = $innerNode->parentNode;
break;
} elseif ($innerNode->tagName == "g:price" && preg_match("/\b0+(\.[0]+)\b/", $innerNode->textContent))
{
$nodesToRemove[] = $innerNode->parentNode;
break;
} elseif ($innerNode->tagName == "g:availability" && $innerNode->textContent == "out of stock")
{
$nodesToRemove[] = $innerNode->parentNode;
break;
}
}
}
}
foreach ($nodesToRemove as $node)
{
$node->parentNode->removeChild($node);
}
echo $domObject->saveXML();
Output:
<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:g="http://base.google.com/ns/1.0" version="2.0">
<channel>
<title><![CDATA[example title]]></title>
<link><![CDATA[http://www.example.com]]></link>
<description><![CDATA[example description]]></description>
<item>
<g:image><![CDATA[http://www.example.com/image.jpg]]></g:image>
<g:availability><![CDATA[in stock]]></g:availability>
<g:price>100.00 TRY</g:price>
</item>
</channel>
</rss>

xml DOM : delete element with condition

May be the question is already answered in a way or in another in many questions, but since I'm a new bie in XML, I can't figured it out in my project.
I have an RSS (XML) file with this structure:
<rss>
<channel>
<item>
<title>some title</title>
<description> some descrp </description>
...
</item>
</channel>
</rss>
How can I, in PHP, delete some item when the title is equal to some value? THanks.
EDIT1 : I have my XML file stored at my web server.
$rss = "
<rss>
<channel>
<item>
<title>some title</title>
<description> some descrp </description>
</item>
<item>
<title>some other title</title>
<description> some descrp </description>
</item>
</channel>
</rss>
";
$doc = new DOMDocument();
$doc->loadXML($rss);
$xpath = new DOMXPath($doc);
$els = $xpath->query('//title[text()="some title"]');
foreach($els as $el)
{
$parent = $el->parentNode;
$parent->parentNode->removeChild($parent);
}
echo $doc->saveXML();
It searches for exact match.
ps: another method, without xpath
$doc = new DOMDocument();
$doc->loadXML($rss);
$els = $doc->getElementsByTagName('title');
for($i = $els->length-1; $i >= 0; $i--)
{
$el = $els->item($i);
if ($el->nodeValue == 'some title')
{
$parent = $el->parentNode;
$parent->parentNode->removeChild($parent);
}
}
echo $doc->saveXML();

Generate XML with namespace URI in PHP

Ho to make this "simple" xml with php using DOM? Full code will be welcomed.
<rss version="2.0"
xmlns:wp="http://url.com"
xmlns:dc="http://url2.com"
>
<channel>
<items>
<sometags></sometags>
<wp:status></wp:status>
</items>
</channel>
</rss>
i'm so lost. Code will help me more than any explanation.
Here is the code:
<?php
$dom = new DOMDocument('1.0', 'utf-8');
$rss = $dom->createElement('rss');
$dom->appendChild($rss);
$version = $dom->createAttribute('version');
$rss->appendChild($version);
$value = $dom->createTextNode('2.0');
$version->appendChild($value);
$xmlns_wp = $dom->createAttribute('xmlns:wp');
$rss->appendChild($xmlns_wp);
$value = $dom->createTextNode('http://url.com');
$xmlns_wp->appendChild($value);
$xmlns_dc = $dom->createAttribute('xmlns:dc');
$rss->appendChild($xmlns_dc);
$value = $dom->createTextNode('http://url2.com');
$xmlns_dc->appendChild($value);
$channel = $dom->createElement('channel');
$rss->appendChild($channel);
$items = $dom->createElement('items');
$channel->appendChild($items);
$sometags = $dom->createElement('sometags', '');
$items->appendChild($sometags);
$wp_status = $dom->createElement('wp:status', '');
$items->appendChild($wp_status);
echo $dom->saveXML();
?>
It outputs:
<?xml version="1.0" encoding="utf-8"?>
<rss
version="2.0"
xmlns:wp="http://url.com"
xmlns:dc="http://url2.com"
>
<channel>
<items>
<sometags></sometags>
<wp:status></wp:status>
</items>
</channel>
</rss>
Fore more help: http://us2.php.net/manual/en/book.dom.php
If you want "simple" then use SimpleXML, not DOM. Note that SimpleXML won't indent the XML.
$rss = simplexml_load_string('<rss version="2.0" xmlns:wp="http://url.com" xmlns:dc="http://url2.com" />');
$channel = $rss->addChild('channel');
$items = $channel->addChild('items');
$sometags = $items->addChild('sometags');
$status = $items->addChild('status', null, 'http://url.com');
echo $rss->asXML();

Categories