PHP and XPath - Get node from inner text

I have the following XML structure
<url>
<loc>some-text</loc>
</url>
<url>
<loc>some-other-text</loc>
</url>
My goal is to get the loc node from its inner text (e.g. some-text) or from a part of it (e.g. other-text). Here's my best attempt:
$doc = new DOMDocument('1.0','UTF-8');
$doc->load($filename);
$xpath = new DOMXPath($doc);
$locs = $xpath->query('/url/loc');
foreach($locs as $loc) {
if(preg_match("/other-text/i", $loc->nodeValue)) return $loc->parentNode;
}
Is it possible to get a specific loc node without iterating over all nodes, simply by using an XPath query?

Yes, you can use a query like //url/loc[contains(., "other-text")]
Example:
$xml = <<<'XML'
<root>
<url>
<loc>some-text</loc>
</url>
<url>
<loc>some-other-text</loc>
</url>
</root>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//url/loc[contains(., "other-text")]') as $node) {
echo $dom->saveXML($node);
}
Output:
<loc>some-other-text</loc>
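Since the question's loop returned `$loc->parentNode` and matched case-insensitively, the same idea can be pushed a little further. The following is only a sketch under those assumptions: it puts the predicate on `url` so the parent element is selected directly, and it uses `translate()` to fold ASCII letters to lower case, because XPath 1.0 has no `lower-case()` function. The sample data and the `$matched` variable are illustrative.

```php
<?php
$xml = <<<'XML'
<root>
<url><loc>some-text</loc></url>
<url><loc>some-OTHER-text</loc></url>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Select the <url> element whose <loc> child contains the needle.
// translate() lower-cases ASCII letters only; other alphabets are not folded.
$query = '//url[loc[contains('
    . 'translate(., "ABCDEFGHIJKLMNOPQRSTUVWXYZ", "abcdefghijklmnopqrstuvwxyz"), '
    . '"other-text")]]';

$matched = [];
foreach ($xpath->query($query) as $url) {
    // $url is the <url> element itself, so no parentNode hop is needed.
    $matched[] = trim($url->textContent);
}
```

The needle has to be written in lower case for this to work, since only the node text is folded.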

Related

I want to remove a node in XML with PHP but it is not working [duplicate]

I am trying to develop a function that removes certain URL nodes from my sitemap file. Here is what I have so far.
$xpath = new DOMXpath($DOMfile);
$elements = $xpath->query("/urlset/url/loc[contains(.,'$pageUrl')]");
echo count($elements);
foreach($elements as $element){
//this is where I want to delete the URL
echo $element;
echo "here".$element->nodeValue;
}
This outputs "111111". I don't understand why I can't echo a string inside the foreach loop when the $elements count is 1.
Up until now, I've been doing
$urls = $dom->getElementsByTagName( "url" );
foreach( $urls as $url ){
$locs = $url->getElementsByTagName( "loc" );
$loc = $locs->item(0)->nodeValue;
echo $loc;
if($loc == $fullPageUrl){
$removeUrl = $dom->removeChild($url);
}
}
Which would work fine if my sitemap wasn't so big. It times out right now, so I'm hoping using xpath queries will be faster.
After Gordon's comment, I tried:
$xpath = new DOMXpath($DOMfile);
$query = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);
foreach($xpath->query($query) as $element) {
//this is where I want to delete the URL
echo $element;
echo "here".$element->nodeValue;
}
And it's not returning anything.
I tried going a step further and used codepad, using what was used in the other post mentioned, and did this:
<?php error_reporting(-1);
$xml = <<< XML
<?xml version="1.0" encoding="UTF-8" ?>
<url>
<loc>professional_services</loc>
<loc>5professional_services</loc>
<loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$query = sprintf('/url/[loc = $id]');
foreach($xpath->query($query) as $record) {
$record->parentNode->removeChild($record);
}
echo $dom->saveXml();
and I'm getting a "Warning: DOMXPath::query(): Invalid expression" at the foreach loop line. Thanks for the other comment about urlset; I tried adding the double slashes to my code, and it returned nothing.
XML from a sitemap should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc></loc>
...
</url>
<url>
<loc></loc>
...
</url>
...
</urlset>
Since it has a namespace, the query is a little more complicated than in my previous answer:
$xpath = new DOMXpath($DOMfile);
// Here register your namespace with a shortcut
$xpath->registerNamespace('sm', "http://www.sitemaps.org/schemas/sitemap/0.9");
// this request should work
$elements = $xpath->query('/sm:urlset/sm:url[sm:loc = "'.$pageUrl.'"]');
foreach($elements as $element){
// This is a hint from the manual comments
$element->parentNode->removeChild($element);
}
echo $DOMfile->saveXML();
I'm writing this from memory just before going to bed. If it doesn't work, I'll test it tomorrow morning. (And yes, I'm aware that this could attract some downvotes.)
If you don't have a namespace (you should, but it's not mandatory, sigh):
$elements = $xpath->query('/urlset/url[loc = "'.$pageUrl.'"]');
There is a concrete working example here: http://codepad.org/vuGl1MAc
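For reference, here is a self-contained sketch of the namespaced removal. The sample sitemap, the `$removed` counter, and the `xpathLiteral()` helper are assumptions added for illustration and are not part of the original answer. The helper exists because concatenating `$pageUrl` straight into the expression breaks as soon as the URL contains a quote character.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url><loc>http://www.example.com/</loc></url>
<url><loc>http://www.example.com/about</loc></url>
</urlset>
XML;

// Hypothetical helper: turn a PHP string into an XPath 1.0 string literal.
function xpathLiteral(string $value): string
{
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    // Both quote types are present: build concat("a", '"', "b", ...).
    return 'concat("' . str_replace('"', '", \'"\', "', $value) . '")';
}

$pageUrl = 'http://www.example.com/about';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// The prefix "sm" is arbitrary; only the namespace URI has to match.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

$query = '/sm:urlset/sm:url[sm:loc = ' . xpathLiteral($pageUrl) . ']';
$removed = 0;
foreach ($xpath->query($query) as $url) {
    // Remove the whole <url> entry through its parent.
    $url->parentNode->removeChild($url);
    $removed++;
}
$result = $dom->saveXML();
```

To persist the change, `$dom->save($path)` could be called instead of `saveXML()`.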

Creating attribute in domdocument

I have to generate this type of XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
<changefreq>weekly</changefreq>
</url>
</urlset>
For this I have written the following code:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->formatOutput = true;
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$sxe = simplexml_import_dom( $dom );
$urlMain = $sxe->addChild("url");
$loc = $urlMain->addChild("loc","http://www.example.com");
$lastmod = $urlMain->addChild("lastmod","$date");
$changefreq = $urlMain->addChild("changefreq","daily");
$priority = $urlMain->addChild("priority","1");
Everything works completely fine, but for some reason xmlns for urlset is not getting added. What might be wrong here?
Any suggestion would be helpful.
You need to append the root element to the document prior to conversion to simplexml:
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$dom->appendChild($rootElement);
$sxe = simplexml_import_dom( $dom );
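Put together, the corrected flow might look like the sketch below. The `$date` value is a placeholder, and `$output` is a name introduced here for illustration. The children are added through SimpleXML as in the question, and the document is serialized through the same DOMDocument.

```php
<?php
$date = '2005-01-01'; // placeholder value for illustration

$dom = new DOMDocument('1.0', 'utf-8');
$dom->formatOutput = true;

// Create the namespaced root and attach it BEFORE importing into SimpleXML.
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$dom->appendChild($rootElement);

$sxe = simplexml_import_dom($dom);
$urlMain = $sxe->addChild('url');
$urlMain->addChild('loc', 'http://www.example.com/');
$urlMain->addChild('lastmod', $date);
$urlMain->addChild('changefreq', 'daily');
$urlMain->addChild('priority', '1');

// SimpleXML and DOM share the same underlying tree, so the DOM sees the new children.
$output = $dom->saveXML();
```

One caveat: `addChild()` does not escape a bare `&` in the value, so URLs with query strings need `&amp;` or a different way of setting the text.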

Using xPath for sitemap.xml

Here is the XML file content:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url id="first_url">
<loc>http://example.com</loc>
<lastmod>2014-05-21</lastmod>
</url>
</urlset>
And here goes the PHP code:
<?php
$dom = new DOMDocument('1.0', 'utf-8');
$dom->load('sitemap.xml');
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//url[@id="first_url"]');
foreach($tags as $tag)
print $tag->getAttribute("id")."<br/>";
?>
This code does not work. But if I remove xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" from the file, it works. Why is that? Thanks!
It is because of the namespace. Here is a way to do it that ignores the namespace:
XPath 1.0:
//*[local-name()="url"][@id="first_url"]
XPath 2.0 (not available in PHP's DOMXPath, which implements XPath 1.0):
//*:url[@id="first_url"]
Alternatively, register the namespace using DOMXPath::registerNamespace:
$xpath->registerNamespace("s",
"http://www.sitemaps.org/schemas/sitemap/0.9");
Then use the prefix in your XPath:
$tags = $xpath->query('//s:url[@id="first_url"]');
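To make the difference concrete, here is a small sketch that runs all three variants against the sitemap from the question, loaded from a string instead of a file. The variable names `$plain`, `$byLocalName`, and `$ids` are illustrative.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url id="first_url">
<loc>http://example.com</loc>
<lastmod>2014-05-21</lastmod>
</url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// An unprefixed name in XPath 1.0 only matches elements in no namespace.
$plain = $xpath->query('//url[@id="first_url"]')->length;

// local-name() compares the element name and ignores the namespace.
$byLocalName = $xpath->query('//*[local-name()="url"][@id="first_url"]')->length;

// Registering a prefix for the sitemap namespace makes the element addressable.
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$ids = [];
foreach ($xpath->query('//s:url[@id="first_url"]') as $tag) {
    $ids[] = $tag->getAttribute('id');
}
```

The attribute test stays unprefixed in every variant because unprefixed attributes are not placed in the default namespace.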

Removing a node from xml file

I have an XML file:
<?xml version="1.0"?>
<category>
<name>SWEATERS</name>
<name>WATCHES</name>
<name>PANTS</name>
<name>test</name>
<name>1</name>
</category>
How can I remove the node <name>test</name> using XPath, XQuery, and PHP? I used this code:
$name='test';
$xmlfile="config/shop_categories.xml";
$xml = simplexml_load_file($xmlfile);
$target = $xml->xpath('/category[name="'.trim($name).'"]');
print_r($target[0]);
if($target == false)
return;
$domRef = dom_import_simplexml($target[0]); //Select position 0 in XPath array
$domRef->parentNode->removeChild($domRef);
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml->asXML());
$dom->save($xmlfile);
But it is not working.
Pretty sure this is a duplicate, but am too lazy to find it. Here you go:
$xml = <<< XML
<?xml version="1.0"?>
<category>
<name>SWEATERS</name>
<name>WATCHES</name>
<name>PANTS</name>
<name>test</name>
<name>1</name>
</category>
XML;
$dom = new DOMDocument;
$dom->loadXML($xml);
$xPath = new DOMXPath($dom);
foreach($xPath->query('//name[text() = "test"]') as $node) {
$node->parentNode->removeChild($node);
}
echo $dom->saveXML();
Output:
<?xml version="1.0"?>
<category>
<name>SWEATERS</name>
<name>WATCHES</name>
<name>PANTS</name>
<name>1</name>
</category>
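The SimpleXML route from the question can probably be made to work as well. The original query, `/category[name="..."]`, selects the `category` root rather than the `name` child, which appears to be why the removal went wrong. The following sketch, using a shortened copy of the XML and the illustrative names `$sxe` and `$result`, selects the `name` element itself and removes it through its DOM counterpart.

```php
<?php
$xml = <<<'XML'
<?xml version="1.0"?>
<category>
<name>SWEATERS</name>
<name>test</name>
<name>1</name>
</category>
XML;

$name = 'test';
$sxe = simplexml_load_string($xml);

// Select the <name> child itself, not the <category> that contains it.
$target = $sxe->xpath('/category/name[. = "' . trim($name) . '"]');

if (!empty($target)) {
    // dom_import_simplexml() returns the DOM node backing the SimpleXML element,
    // so removing it also changes what $sxe sees.
    $domRef = dom_import_simplexml($target[0]);
    $domRef->parentNode->removeChild($domRef);
}

$result = $sxe->asXML();
```

Because the value is concatenated into the expression, this assumes `$name` contains no double quotes.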
