I'm creating simple PHP code to generate a sitemap.xml file, but the problem is that the file is unformatted. If the file were smaller, I could look at it in the browser, but it is about 1.5 MB and Chrome just gives up while loading it (IE loads it and you can view it, but it lags really hard).
I googled and tried different solutions (even from this site), but none of them worked, so I'm humbly asking for your help. I'm also using the newest PHP version, and my server runs on a Unix-based OS, if that's important.
Here is the code I'm using (there are several thousand (15,000+) rows in the table, so the XML file is pretty long):
$xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/</loc>";
$xml .= "<changefreq>always</changefreq>";
$xml .= "<priority>1.00</priority>";
$xml .= "</url>";
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/upload.php</loc>";
$xml .= "<changefreq>weekly</changefreq>";
$xml .= "<priority>0.80</priority>";
$xml .= "</url>";
$select = dbquery("SELECT * FROM MY TABLE");
while ($data = dbarray($select)) {
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/?id=".$data['id']."</loc>";
$xml .= "<changefreq>daily</changefreq>";
$xml .= "<priority>0.50</priority>";
$xml .= "</url>";
}
$xml .= '</urlset>';
$sxml = new SimpleXMLElement($xml);
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML($sxml->asXML());
$dom->saveXML("sitemap.xml");
Also, every time a new page is added, I run this code to append it to the sitemap file, and it appends everything on one line, so I need to fix that too.
I'm testing both pieces of code separately, so I'm sure the first one doesn't format the document at all.
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$sitemap->asXML("sitemap.xml");
By unreadable I mean it's saved on one line, like this:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url><loc>MY.URL.COM?id=1</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=2</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=5</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=7</loc><changefreq>daily</changefreq><priority>0.50</priority></url></urlset>
Instead of:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>MY.URL.COM?id=1</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=2</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=5</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=7</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
</urlset>
**WORKING CODE:**
$xml = "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'>";
$select = dbquery("SELECT * FROM MY TABLE");
while ($data = dbarray($select)) {
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/?id=".$data['id']."</loc>";
$xml .= "<changefreq>daily</changefreq>";
$xml .= "<priority>0.50</priority>";
$xml .= "</url>"; // close each <url>, otherwise loadXML() rejects the string as malformed
}
$xml .= '</urlset>';
// No SimpleXML step is needed: DOMDocument can load the string directly
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml);
$dom->save("sitemap.xml");
Code to append to the existing XML file:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($sitemap->asXML());
$dom->save('sitemap.xml');
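As an alternative to concatenating strings, the sitemap could also be built with the DOM API directly, which escapes text content and produces indented output without a reload step. This is a minimal sketch, assuming a hypothetical `$rows` array standing in for the `dbquery()`/`dbarray()` loop; only standard `DOMDocument` methods are used.

```php
<?php
// Hypothetical rows standing in for the dbquery()/dbarray() loop.
$rows = [['id' => 1], ['id' => 2], ['id' => 5]];

$ns  = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->formatOutput = true; // indent child elements when serializing

$urlset = $dom->createElementNS($ns, 'urlset');
$dom->appendChild($urlset);

foreach ($rows as $data) {
    // Attach <url> to <urlset> before adding its children,
    // so every child is appended under the in-scope sitemap namespace.
    $url = $urlset->appendChild($dom->createElementNS($ns, 'url'));

    $loc = $url->appendChild($dom->createElementNS($ns, 'loc'));
    // Text nodes are escaped on output, so characters such as & are safe here.
    $loc->appendChild($dom->createTextNode('MY.URL.COM/?id=' . $data['id']));

    $url->appendChild($dom->createElementNS($ns, 'changefreq', 'daily'));
    $url->appendChild($dom->createElementNS($ns, 'priority', '0.50'));
}

$xml = $dom->saveXML();       // serialized string, handy for checking
// $dom->save('sitemap.xml'); // writes the same output to a file
echo $xml;
```

Because no whitespace text nodes are ever created, `formatOutput` can indent the whole tree on output.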
Edit Start
//you don't need simplexml
// $sxml = new SimpleXMLElement($xml);
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML($xml); // can use the xml string
$dom->save("sitemap.xml"); // need to use save() rather than saveXML
Edit End
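To illustrate the difference the edit points out: `saveXML()` returns the serialized document as a string. Its optional parameter is a node to serialize, not a filename. `save()` writes the document to a path. This is a minimal sketch; the temporary output path is a hypothetical choice for the demo.

```php
<?php
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML('<urlset><url><loc>MY.URL.COM/</loc></url></urlset>');

// saveXML() only returns the serialized document as a string.
$xml = $dom->saveXML();

// save() writes the serialized document to a file and returns the number of bytes written.
$path  = sys_get_temp_dir() . '/sitemap-demo.xml'; // hypothetical output path
$bytes = $dom->save($path);

echo $xml;
```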
The code you wrote to generate the file initially will format it properly. The problem arises when you add a new element/node to sitemap.xml using SimpleXML.
If you want properly formatted XML, you need to save it through DOMDocument every time, the same way you do initially.
What you are doing:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$sitemap->asXML("sitemap.xml");
Try changing it to this:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
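$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false; // drop old whitespace so formatOutput can re-indent
$dom->formatOutput = true;
$dom->loadXML($sitemap->asXML());
$dom->save('sitemap.xml'); // save() writes the file; saveXML() only returns a string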
Hope it helps.
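The append-and-reformat approach above can be checked without touching a file. This minimal sketch assumes hypothetical sitemap content held in a string. It shows that a node added with SimpleXML ends up indented like the existing ones once the document is reloaded with `preserveWhiteSpace = false`.

```php
<?php
// Stand-in for an existing, already indented sitemap.xml (hypothetical content).
$existing = <<<XML
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>MY.URL.COM/?id=1</loc>
  </url>
</urlset>
XML;

$sitemap = simplexml_load_string($existing);
$url = $sitemap->addChild('url');           // inherits the urlset namespace
$url->addChild('loc', 'MY.URL.COM/?id=2');  // SimpleXML writes it without indentation

$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false; // discard the old indentation text nodes
$dom->formatOutput = true;        // re-indent the whole tree on output
$dom->loadXML($sitemap->asXML());
$xml = $dom->saveXML();           // use $dom->save('sitemap.xml') to write the file
echo $xml;
```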
One other thing to check: if the server running your code is Unix/Linux, it will use Unix line endings (LF), which may not display correctly in Windows editors or browsers that expect CR-LF.
Related
I am trying to develop a function that removes certain URL nodes from my sitemap file. Here is what I have so far.
$xpath = new DOMXpath($DOMfile);
$elements = $xpath->query("/urlset/url/loc[contains(.,'$pageUrl')]");
echo count($elements);
foreach($elements as $element){
//this is where I want to delete the URL
echo $element;
echo "here".$element->nodeValue;
}
Which outputs "111111". I don't know why I can't echo a string in a foreach loop if the $elements count is '1'.
Up until now, I've been doing this:
$urls = $dom->getElementsByTagName( "url" );
foreach( $urls as $url ){
$locs = $url->getElementsByTagName( "loc" );
$loc = $locs->item(0)->nodeValue;
echo $loc;
if($loc == $fullPageUrl){
$removeUrl = $dom->removeChild($url);
}
}
Which would work fine if my sitemap wasn't so big. It times out right now, so I'm hoping using xpath queries will be faster.
After Gordon's comment, I tried:
$xpath = new DOMXpath($DOMfile);
$query = sprintf('/urlset/url[./loc = "%d"]', $pageUrl);
foreach($xpath->query($query) as $element) {
//this is where I want to delete the URL
echo $element;
echo "here".$element->nodeValue;
}
And it's not returning anything.
I went a step further and tried it on codepad, using the approach from the other post mentioned, like this:
<?php
error_reporting(-1);
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8" ?>
<url>
<loc>professional_services</loc>
<loc>5professional_services</loc>
<loc>6professional_services</loc>
</url>
XML;
$id = '5professional_services';
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
$query = sprintf('/url/[loc = $id]');
foreach($xpath->query($query) as $record) {
$record->parentNode->removeChild($record);
}
echo $dom->saveXml();
and I'm getting "Warning: DOMXPath::query(): Invalid expression" on the foreach line. Thanks for the other comment about urlset; I'll be sure to include the double slashes in my code. I tried it, and it returned nothing.
The XML of a sitemap should look like this:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc></loc>
...
</url>
<url>
<loc></loc>
...
</url>
...
</urlset>
Since it has a namespace, the query is a little more complicated than in my previous answer:
$xpath = new DOMXpath($DOMfile);
// Here register your namespace with a shortcut
$xpath->registerNamespace('sm', "http://www.sitemaps.org/schemas/sitemap/0.9");
// this request should work
$elements = $xpath->query('/sm:urlset/sm:url[sm:loc = "'.$pageUrl.'"]');
foreach($elements as $element){
// This is a hint from the manual comments
$element->parentNode->removeChild($element);
}
echo $DOMfile->saveXML();
I'm writing this from memory just before going to bed. If it doesn't work, I'll test it tomorrow morning. (And yes, I'm aware that it could bring some downvotes.)
If you don't have a namespace (you should, but it's not mandatory, sigh):
$elements = $xpath->query('/urlset/url[loc = "'.$pageUrl.'"]');
Here is a concrete working example: http://codepad.org/vuGl1MAc
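Here is a self-contained check of the namespaced query above. It is a minimal sketch with a small inline sitemap; the sample data is made up, reusing the `professional_services` values from the question. It assumes `$pageUrl` contains no double quote, because the value is embedded directly in an XPath string literal.

```php
<?php
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>professional_services</loc></url>
  <url><loc>5professional_services</loc></url>
  <url><loc>6professional_services</loc></url>
</urlset>
XML;

$pageUrl = '5professional_services'; // the URL to remove

$DOMfile = new DOMDocument();
$DOMfile->preserveWhiteSpace = false;
$DOMfile->loadXML($xml);

$xpath = new DOMXPath($DOMfile);
// Without a registered prefix, /urlset/url would not match namespaced elements.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');

// Assumes $pageUrl has no double quote (it is inserted into an XPath string literal).
$elements = $xpath->query('/sm:urlset/sm:url[sm:loc = "' . $pageUrl . '"]');

// DOMXPath::query() returns a fixed list, so removing nodes while looping is safe.
foreach ($elements as $element) {
    $element->parentNode->removeChild($element);
}

$result = $DOMfile->saveXML();
echo $result;
```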
I want to add an XML tree into another XML document. I tried the following code, but it is not working:
<?php
$str1 = '<parent>
<name>mrs smith</name>
</parent>';
$xml1 = simplexml_load_string($str1);
print_r($xml1);
$str2 = '<tag>
<child>child1</child>
<age>3</age>
</tag>';
$xml2 = simplexml_load_string($str2);
print_r($xml2);
$xml1->addChild($xml2);
print_r($xml1);
?>
Expected output XML:
<parent>
<name>mrs smith</name>
<tag>
<child>child1</child>
<age>3</age>
</tag>
</parent>
Please assist me.
You can use DOMDocument::importNode:
<?php
$str2 = '<tag>
<child>child1</child>
<age>3</age>
</tag>';
$str1 = '<parent>
<name>mrs smith</name>
</parent>';
$tagDoc = new DOMDocument;
$tagDoc->loadXML($str2);
$tagNode = $tagDoc->getElementsByTagName("tag")->item(0);
//echo $tagDoc->saveXML();
$newdoc = new DOMDocument;
$newdoc->loadXML($str1);
$node = $newdoc->importNode($tagNode, true);
$newdoc->documentElement->appendChild($node);
echo $newdoc->saveXML();die;
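If you want to keep working with the SimpleXML objects from the question, `dom_import_simplexml()` exposes their underlying DOM nodes, so the same `importNode` technique applies. This is a minimal sketch under that approach. Since SimpleXML and DOM share one tree, `$xml1` reflects the appended node.

```php
<?php
$xml1 = simplexml_load_string('<parent><name>mrs smith</name></parent>');
$xml2 = simplexml_load_string('<tag><child>child1</child><age>3</age></tag>');

// Get the DOM nodes behind both SimpleXML objects.
$parentDom = dom_import_simplexml($xml1);
$tagDom    = dom_import_simplexml($xml2);

// A node from another document must be imported (deep copy) before it can be appended.
$imported = $parentDom->ownerDocument->importNode($tagDom, true);
$parentDom->appendChild($imported);

// $xml1 shares the same underlying tree, so it now contains <tag>.
$out = $xml1->asXML();
echo $out;
```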
I need to generate this kind of XML:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
<changefreq>weekly</changefreq>
</url>
</urlset>
For that, I have written this code:
$dom = new domDocument('1.0', 'utf-8');
$dom->formatOutput = true;
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$sxe = simplexml_import_dom( $dom );
$urlMain = $sxe->addChild("url");
$loc = $urlMain->addChild("loc","http://www.example.com");
$lastmod = $urlMain->addChild("lastmod","$date");
$changefreq = $urlMain->addChild("changefreq","daily");
$priority = $urlMain->addChild("priority","1");
Everything works fine, but for some reason the xmlns for urlset is not getting added. What might be wrong here?
Any suggestion would be helpful.
You need to append the root element to the document prior to conversion to simplexml:
$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$dom->appendChild($rootElement);
$sxe = simplexml_import_dom( $dom );
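Put together with the question's code, the fix looks like this minimal sketch. `$date` is given a hypothetical value because it is undefined in the question. Children added through SimpleXML inherit the parent's namespace, so they are written without their own `xmlns`.

```php
<?php
$date = '2005-01-01'; // hypothetical value; undefined in the question's code

$dom = new DOMDocument('1.0', 'utf-8');
$dom->formatOutput = true;

$rootElement = $dom->createElementNS('http://www.sitemaps.org/schemas/sitemap/0.9', 'urlset');
$dom->appendChild($rootElement); // the root must be in the document before importing

$sxe = simplexml_import_dom($dom); // wraps the document element (urlset)
$urlMain = $sxe->addChild('url');  // inherits the urlset namespace
$urlMain->addChild('loc', 'http://www.example.com');
$urlMain->addChild('lastmod', $date);
$urlMain->addChild('changefreq', 'daily');
$urlMain->addChild('priority', '1');

$out = $dom->saveXML();
echo $out;
```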
Does anyone know how I can "explode" a string back into "normal" XML format?
I found this script (ref: gooseflight, 2010) that looks like it can do the job, but the output comes out all stuck together.
Here's the code:
function combineXML($file)
{
global $xmlstr;
$xml = simplexml_load_file($file);
foreach($xml as $element)
$xmlstr .= $element->asXML();
}
$files[] = "tmp.xml";
$files[] = "traduction.xml";
$xmlstr = '<CAB>';
foreach ($files as $file)
combineXML($file);
$xmlstr .= '</CAB>';
// Convert string to XML for further processing
$xml = simplexml_load_string($xmlstr);
$bytes = file_put_contents("combined.xml", $xml->asXML());
Here is the output:
<?xml version="1.0" encoding="UTF-8"?>
<CAB>
<CABO>XXXXXXXXXX0987650003</CABO><ACTIVITY>NONE</ACTIVITY><BEORI>blablaE</BEORI><BEDEST>blabla</BEDEST><NATRELA>more blabla</NATRELA><ANE>2014</ANE><NODEP>1111</NODEP>
</CAB>
So how can I separate the nodes so they look like this?
<?xml version="1.0" encoding="UTF-8"?>
<CAB>
<CABO>XXXXXXXXXX0987650003</CABO>
<ACTIVITY>NONE</ACTIVITY>
<BEORI>blablaE</BEORI>
<BEDEST>blabla</BEDEST>
<NATRELA>more blabla</NATRELA>
<ANE>2014</ANE>
<NODEP>1111</NODEP>
.....
</CAB>
Would anyone know how to fix it?
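I would suggest using the DOMDocument class to save the XML; check this:
$dom_obj = new DOMDocument();
$dom_obj->preserveWhiteSpace = false; // discard existing whitespace so formatOutput can re-indent
$dom_obj->load($file); // load() reads from a file path; loadXML() expects an XML string
// Do all your changes to the file by using DOMDocument methods (e.g. createElement, createAttribute, etc.)
$dom_obj->formatOutput = true;
$dom_obj->save($file);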
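Applied to the question's combining script, the same idea could look like this minimal sketch. It copies the top-level children of each input into a `<CAB>` element with `importNode` instead of string concatenation. The inputs are hypothetical inline strings with an assumed `<root>` element, because the real roots of tmp.xml and traduction.xml are not shown. With real files, use `load($file)`.

```php
<?php
// Hypothetical inputs standing in for tmp.xml and traduction.xml (root name assumed).
$sources = [
    '<root><CABO>XXXXXXXXXX0987650003</CABO><ACTIVITY>NONE</ACTIVITY></root>',
    '<root><BEORI>blablaE</BEORI><BEDEST>blabla</BEDEST></root>',
];

$out = new DOMDocument('1.0', 'UTF-8');
$out->formatOutput = true;
$cab = $out->appendChild($out->createElement('CAB'));

foreach ($sources as $source) {
    $in = new DOMDocument();
    $in->preserveWhiteSpace = false; // ignore the inputs' own indentation
    $in->loadXML($source);           // with real files: $in->load($file)
    foreach ($in->documentElement->childNodes as $child) {
        // importNode() copies the node, so the input document is left untouched.
        $cab->appendChild($out->importNode($child, true));
    }
}

$xml = $out->saveXML(); // or $out->save('combined.xml')
echo $xml;
```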