Creating attribute in domdocument - php

I have to generate this kind of XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
<changefreq>weekly</changefreq>
</url>
</urlset>
For this I have written the following code:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->formatOutput = true;
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$sxe = simplexml_import_dom( $dom );
$urlMain = $sxe->addChild("url");
$loc = $urlMain->addChild("loc","http://www.example.com");
$lastmod = $urlMain->addChild("lastmod","$date");
$changefreq = $urlMain->addChild("changefreq","daily");
$priority = $urlMain->addChild("priority","1");
Everything works completely fine, but for some reason xmlns for urlset is not getting added. What might be wrong here?
Any suggestion would be helpful.

You need to append the root element to the document prior to conversion to simplexml:
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$dom->appendChild($rootElement);
$sxe = simplexml_import_dom( $dom );
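For reference, the whole flow with that fix applied might look like the sketch below. It assumes the element values from the sample XML in the question (the date is a placeholder), and it relies on SimpleXML's addChild() letting children inherit the parent's namespace when none is passed.

```php
<?php
// Sketch: create the namespaced root with DOM, then add children via SimpleXML.
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$dom = new DOMDocument('1.0', 'utf-8');
$dom->formatOutput = true;

// Attach the root first; an element that was only created is not part of the document yet.
$dom->appendChild($dom->createElementNS($ns, 'urlset'));

// SimpleXML now wraps the <urlset> document element.
$sxe = simplexml_import_dom($dom);

$url = $sxe->addChild('url');                 // inherits the sitemap namespace
$url->addChild('loc', 'http://www.example.com/');
$url->addChild('lastmod', '2005-01-01');      // placeholder date
$url->addChild('changefreq', 'monthly');
$url->addChild('priority', '0.8');

// Both APIs share the same underlying document, so DOM sees the new nodes.
$output = $dom->saveXML();
echo $output;
```

The xmlns attribute should now appear once, on urlset, with no extra namespace declarations on the child elements.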

Related

generating sitemap with DOMDocument : missing AttributeNode in the output

I'm trying to generate a sitemap.xml. Here is a simplified version of my code:
$dom = new \DOMDocument();
$dom->encoding = 'utf-8';
$dom->xmlVersion = '1.0';
$dom->formatOutput = true;
$xml_file_name = './sitemap.xml';
$urlset = $dom->createElement('urlset');
$attr_ = new \DOMAttr('xmlns:xsi', "http://www.w3.org/2001/XMLSchema-instance");
$urlset->setAttributeNode($attr_);
$url_node = $dom->createElement('url');
$url_node_loc = $dom->createElement('loc', 'http://localhost' );
$url_node->appendChild($url_node_loc);
$url_node_lastmod = $dom->createElement('lastmod', '2021-08-03T22:17:47+04:30' );
$url_node->appendChild($url_node_lastmod);
$urlset->appendChild($url_node);
$dom->appendChild($urlset);
$dom->save($xml_file_name);
dd('done');
Here is the output in my sitemap.xml:
<urlset>
<url>
<loc>http://localhost</loc>
<lastmod>2021-08-03T22:17:47+04:30</lastmod>
</url>
</urlset>
I need to add some attributes to my urlset tag. Here is how I did it:
$attr_ = new \DOMAttr('xmlns:xsi', "http://www.w3.org/2001/XMLSchema-instance");
$urlset->setAttributeNode($attr_);
But for some reason this doesn't show up in my sitemap file; urlset has no attributes.
Use setAttribute() instead of setAttributeNode().
$urlset->setAttribute('xmlns:xsi', 'http://www.w3.org/2001/XMLSchema-instance');
However, this will still not be a valid sitemap, because sitemaps use an XML namespace (https://www.sitemaps.org/protocol.html).
To create nodes in a namespace, use the namespace-aware DOM methods with the NS suffix, such as createElementNS() and setAttributeNS(). They add namespace definitions as needed.
xmlns:xsi is a namespace definition. Namespace definitions can be considered attribute nodes in the reserved namespace http://www.w3.org/2000/xmlns/.
$xmlns = [
'sitemap' => 'http://www.sitemaps.org/schemas/sitemap/0.9',
'xmlns' => 'http://www.w3.org/2000/xmlns/',
'xsi' => 'http://www.w3.org/2001/XMLSchema-instance',
];
$document = new \DOMDocument('1.0', 'utf-8');
$document->formatOutput = true;
$urlset = $document->appendChild(
$document->createElementNS($xmlns['sitemap'], 'urlset')
);
// explicit namespace definition
$urlset->setAttributeNS(
$xmlns['xmlns'], 'xmlns:xsi', $xmlns['xsi']
);
$url_node = $urlset->appendChild(
$document->createElementNS($xmlns['sitemap'], 'url')
);
$url_node
->appendChild($document->createElementNS($xmlns['sitemap'], 'loc'))
->textContent = 'http://localhost';
$url_node
->appendChild($document->createElementNS($xmlns['sitemap'], 'lastmod'))
->textContent = '2021-08-03T22:17:47+04:30';
echo $document->saveXML();
Output:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<url>
<loc>http://localhost</loc>
<lastmod>2021-08-03T22:17:47+04:30</lastmod>
</url>
</urlset>
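If several entries have to be added, the same namespace-aware calls can be wrapped in a small function. The sketch below is only an illustration: appendSitemapUrl() and the SITEMAP_NS constant are hypothetical names, not part of the DOM extension, and the URLs are taken from the sitemap protocol sample. Setting the value through textContent also takes care of escaping characters such as &.

```php
<?php
const SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9';

// Hypothetical helper: appends one <url> entry to a namespaced <urlset>.
function appendSitemapUrl(DOMElement $urlset, string $loc, ?string $lastmod = null): DOMElement
{
    $document = $urlset->ownerDocument;
    $url = $urlset->appendChild($document->createElementNS(SITEMAP_NS, 'url'));

    // textContent stores plain text, so "&" is serialized as "&amp;".
    $url->appendChild($document->createElementNS(SITEMAP_NS, 'loc'))->textContent = $loc;

    if ($lastmod !== null) {
        $url->appendChild($document->createElementNS(SITEMAP_NS, 'lastmod'))->textContent = $lastmod;
    }
    return $url;
}

$document = new DOMDocument('1.0', 'utf-8');
$document->formatOutput = true;
$urlset = $document->appendChild($document->createElementNS(SITEMAP_NS, 'urlset'));

appendSitemapUrl($urlset, 'http://www.example.com/', '2005-01-01');
appendSitemapUrl($urlset, 'http://www.example.com/catalog?item=12&desc=vacation_hawaii');

$xml = $document->saveXML();
echo $xml;
```

Because every element is created with createElementNS() in the same namespace, the serializer only needs to declare xmlns once on the root.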

Add string to sitemap.xml file inside <urlset> tag in PHP

I tried to add a string into the sitemap.xml file inside <urlset> tag, but it stores differently.
<?php
$date_mod = date('Y-m-d');
$sitemap = "<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>$date_mod</lastmod>
<priority>0</priority>
</url>";
$xml = simplexml_load_file("sitemap.xml");
$xml->addChild($sitemap);
file_put_contents("sitemap.xml", $xml->asXML());
?>
The output is like:
<?xml version="1.0"?>
<urlset>
<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>2018-01-12</lastmod>
<priority>0</priority>
</url>
<//www.website.com/article.php?page=3</loc>
<lastmod>2018-01-12</lastmod>
<priority>0</priority>
</url>/></urlset>
Please help me.
If the original XML looks like this:
<?xml version="1.0"?>
<urlset>
</urlset>
and the updated XML should look like this:
<?xml version="1.0"?>
<urlset>
<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>2018-01-12</lastmod>
<priority>0</priority>
</url>
</urlset>
then you could use the following code:
<?php
$date_mod = date('Y-m-d');
$sitemap = "<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>$date_mod</lastmod>
<priority>0</priority>
</url>";
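// Parse the new <url> fragment into its own SimpleXML object
$sitemap_node = simplexml_load_string($sitemap);
$xml = simplexml_load_file("sitemap.xml");
sxml_append($xml, $sitemap_node);
$xml->asXML('sitemap.xml');
// Append one SimpleXML element (with all its children) to another via DOM
function sxml_append(SimpleXMLElement $to, SimpleXMLElement $from) {
$toDom = dom_import_simplexml($to);
$fromDom = dom_import_simplexml($from);
// importNode() copies the node into the target document before it is appended
$toDom->appendChild($toDom->ownerDocument->importNode($fromDom, true));
}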
?>
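Your previous code failed because addChild() expects an element name as its first argument and, optionally, a text value as its second (and even that has some drawbacks). It cannot take a string of XML markup or another XML object, so the whole string was used as the tag name.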
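For comparison, here is a minimal sketch of the same append done with SimpleXML alone, one addChild() call per element. It assumes an empty urlset and loads it from an in-memory string instead of sitemap.xml so that it can be run on its own.

```php
<?php
// In-memory stand-in for the existing sitemap.xml (assumed to be an empty <urlset>).
$xml = simplexml_load_string('<urlset></urlset>');

// The first argument is the element NAME; the optional second one is its text value.
$url = $xml->addChild('url');
$url->addChild('loc', 'http://www.website.com/article.php?page=3');
$url->addChild('lastmod', date('Y-m-d'));
$url->addChild('priority', '0');

echo $xml->asXML();
```

The DOM-based sxml_append() approach is still the more convenient one when the fragment already exists as an XML string.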

PHP and Xpath - Get node from inner text

I have the following XML structure
<url>
<loc>some-text</loc>
</url>
<url>
<loc>some-other-text</loc>
</url>
My goal is to get the loc node from its inner text (i.e. some-text) or a part of it (i.e. other-text). Here's my best attempt:
$doc = new DOMDocument('1.0','UTF-8');
$doc->load($filename);
$xpath = new DOMXPath($doc);
$locs = $xpath->query('/url/loc');
foreach($locs as $loc) {
if(preg_match("/other-text/i", $loc->nodeValue)) return $loc->parentNode;
}
Is it possible to get specific loc node without iterating over all nodes, simply using xpath query?
Yes, you can use a query like //url/loc[contains(., "other-text")]
Example:
$xml = <<<'XML'
<root>
<url>
<loc>some-text</loc>
</url>
<url>
<loc>some-other-text</loc>
</url>
</root>
XML;
$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//url/loc[contains(., "other-text")]') as $node) {
echo $dom->saveXML($node);
}
Output:
<loc>some-other-text</loc>
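The attempt in the question matched case-insensitively (the /i flag) and returned the parent url node. A rough XPath 1.0 equivalent is sketched below; it assumes ASCII-only text, since XPath 1.0 has no lower-case() function and translate() has to fold the letters instead. The mixed-case sample value is made up for the demonstration.

```php
<?php
$xml = <<<'XML'
<root>
<url>
<loc>some-text</loc>
</url>
<url>
<loc>Some-OTHER-text</loc>
</url>
</root>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

// Fold ASCII upper case to lower case before comparing.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';

// Select the <url> element whose <loc> child contains the text, not the <loc> itself.
$query = sprintf(
    '//url[loc[contains(translate(., "%s", "%s"), "other-text")]]',
    $upper,
    $lower
);

$matches = $xpath->query($query);
foreach ($matches as $urlNode) {
    echo $dom->saveXML($urlNode), "\n";
}
```

Putting the condition in a predicate on url avoids the separate parentNode step.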

Using xPath for sitemap.xml

Here is the XML file content:
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url id="first_url">
<loc>http://example.com</loc>
<lastmod>2014-05-21</lastmod>
</url>
</urlset>
And here goes the PHP code:
<?php
$dom = new DOMDocument('1.0', 'utf-8');
$dom->load('sitemap.xml');
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//url[@id="first_url"]');
foreach($tags as $tag)
print $tag->getAttribute("id")."<br/>";
?>
This code does not work, but if I remove xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" from the file, it works. Why is that? Thanks!
It is because of the namespace. Here is a way to do it that ignores the namespace:
XPath 1.0:
//*[local-name()="url"][@id="first_url"]
XPath 2.0 (not available in PHP's DOMXPath, which implements XPath 1.0):
//*:url[@id="first_url"]
Alternatively, register the namespace using DOMXPath::registerNamespace:
$xpath->registerNamespace("s",
"http://www.sitemaps.org/schemas/sitemap/0.9");
Then use the prefix in your XPath:
$tags = $xpath->query('//s:url[@id="first_url"]');
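Put together, the namespace-aware variant could look like the following sketch. The XML is inlined so the example runs without a sitemap.xml file, and the prefix s is an arbitrary choice; it only has to match the one used in the expression.

```php
<?php
$xml = <<<'XML'
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url id="first_url">
<loc>http://example.com</loc>
<lastmod>2014-05-21</lastmod>
</url>
</urlset>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind a prefix to the sitemap namespace for use in XPath expressions.
$xpath->registerNamespace('s', 'http://www.sitemaps.org/schemas/sitemap/0.9');

$ids = [];
foreach ($xpath->query('//s:url[@id="first_url"]') as $tag) {
    $ids[] = $tag->getAttribute('id');
}

// Every element step needs the prefix, including children such as <loc>.
$loc = $xpath->evaluate('string(//s:url[@id="first_url"]/s:loc)');

echo implode("\n", $ids), "\n", $loc, "\n";
```

The id attribute itself takes no prefix, because unprefixed attributes are not in the default namespace.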

Format PHP simpleXML file

I'm creating simple PHP code to generate a sitemap.xml file, but the problem is that the file is unformatted. If the file were smaller I could look at it in a browser, but it is about 1.5 MB and Chrome just gives up while loading it (IE loads it and you can view it, but it lags badly).
I googled and tried different solutions (even from this site), but none of them worked, so I'm humbly asking for your help. I'm using the newest PHP version and my server runs on a Unix-based OS, if that's important.
Here is the code I'm using (there are more than 15,000 rows in the table, so the XML file is pretty long):
$xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/</loc>";
$xml .= "<changefreq>always</changefreq>";
$xml .= "<priority>1.00</priority>";
$xml .= "</url>";
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/upload.php</loc>";
$xml .= "<changefreq>weekly</changefreq>";
$xml .= "<priority>0.80</priority>";
$xml .= "</url>";
$select = dbquery("SELECT * FROM MY TABLE");
while ($data = dbarray($select)) {
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/?id=".$data['id']."</loc>";
$xml .= "<changefreq>daily</changefreq>";
$xml .= "<priority>0.50</priority>";
$xml .= "</url>";
}
$xml .= '</urlset>';
$sxml = new SimpleXMLElement($xml);
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML($sxml->asXML());
$dom->saveXML("sitemap.xml");
Also, every time a new page is added I run this code to append it to the sitemap file, and it appends everything on one line, so I need to fix that too.
I tried both pieces of code separately, so I'm sure the first one doesn't format the document at all.
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$sitemap->asXML("sitemap.xml");
By unreadable I mean it's saved on one line, like this:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url><loc>MY.URL.COM?id=1</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=2</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=5</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=7</loc><changefreq>daily</changefreq><priority>0.50</priority></url></urlset>
Instead of:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>MY.URL.COM?id=1</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=2</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=5</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=7</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
</urlset>
WORKING CODE:
$xml = "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'>";
$select = dbquery("SELECT * FROM MY TABLE");
while ($data = dbarray($select)) {
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/?id=".$data['id']."</loc>";
$xml .= "<changefreq>daily</changefreq>";
$xml .= "<priority>0.50</priority>";
$xml .= "</url>"; // close each <url> element so the XML stays well-formed
}
$xml .= '</urlset>';
// SimpleXML is not needed here; DOMDocument can load the string directly
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml);
$dom->save("sitemap.xml");
Code to append to existing xml file:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($sitemap->asXML());
$dom->save('sitemap.xml');
Edit Start
//you don't need simplexml
// $sxml = new SimpleXMLElement($xml);
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML($xml); // can use the xml string
$dom->save("sitemap.xml"); // need to use save() rather than saveXML
Edit End
The code you wrote for the initial generation formats the file properly once it calls save(). The problem arises when you add a new element to sitemap.xml using SimpleXML, because asXML() does not re-indent the document.
If you want properly formatted XML, you need to save it through DOMDocument every time, the same way you do initially.
What you are doing now:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$sitemap->asXML("sitemap.xml");
Try changing it to this:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
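$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
// LIBXML_NOBLANKS is a parser option: it drops whitespace-only text nodes so formatOutput can re-indent
$dom->loadXML($sitemap->asXML(), LIBXML_NOBLANKS);
$dom->save('sitemap.xml'); // save() writes a file; saveXML() only returns a string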
Hope it helps.
One other thing to check: if the server where your code runs is Unix/Linux, the file will use Unix line endings (LF), which may not display correctly in Windows editors or browsers that expect CR-LF.
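To see the effect of the two formatting properties in isolation, here is a small sketch that re-indents a one-line document in memory. The URL is a placeholder, and saveXML() is used instead of save() so that nothing is written to disk.

```php
<?php
// One-line input standing in for an unformatted sitemap.xml.
$compact = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    . '<url><loc>http://example.com/?id=1</loc><changefreq>daily</changefreq></url>'
    . '<url><loc>http://example.com/?id=2</loc><changefreq>daily</changefreq></url>'
    . '</urlset>';

$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false; // must be set before loading
$dom->formatOutput = true;        // indent nested elements on output
$dom->loadXML($compact);

// saveXML() returns the document as a string; save('sitemap.xml') would write the file.
$pretty = $dom->saveXML();
echo $pretty;
```

Setting preserveWhiteSpace to false matters mostly when the input already contains line breaks or partial indentation, since existing whitespace nodes otherwise prevent formatOutput from re-indenting.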
