How can I save several sitemap files, limited to 1000 URLs per file, like sitemap1.xml, sitemap2.xml?
Basically I want to limit how many URLs the foreach writes into each file created with file_put_contents.
My code is:
$sitemap = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>
<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">
<url>
<loc>". Yii::app() -> getBaseUrl(true) ."</loc>
<priority>1</priority>
</url>
";
foreach($websites as $website) {
$sitemap .= "<url>
<loc>".$website['domain']."</loc>
<priority>0.5</priority>
</url>
";
}
$sitemap .= "</urlset>";
file_put_contents("sitemap.xml", $sitemap, LOCK_EX);
Let's create that application quickly:
1. Create some template XML to which you add the websites.
2. Chunk the $websites with the help of a NoRewindIterator and a LimitIterator.
Let's start with the second point, faking the URLs as well as the XML for now, just to see whether this is easy to wire up:
$limit = 3;
$urls = new ArrayIterator(range(0, 9)); // 10 Fake URLs
$urls->rewind();
$it = new NoRewindIterator($urls);
First we set a limit per file (here three to keep it low for testing) and then we setup the data-source for the URLs. Here those are 10 fake URLs, that are just the numbers from zero to nine.
The URLs are rewound once up front because they are then wrapped in a NoRewindIterator, which never rewinds its inner iterator, and the data source still has to be rewound once before use (not every iterator needs this, but quite a few do, so it is done here to be correct).
The rewind operation is blocked by the NoRewindIterator so that we can continue to get X chunks by the size of $limit. And that is exactly what is done now:
$fileCounter = 0;
while ($it->valid()) {
$fileCounter++;
printf("File %d:\n", $fileCounter);
$websites = new LimitIterator($it, 0, $limit);
foreach($websites as $website) {
printf(" * Website: %s\n", $website);
}
}
As long as $it is valid (read: as long as there are URLs to output), a new file is created (starting at one) and then three websites are foreach-ed via the LimitIterator. When that iteration is done, the loop continues until all website URLs have been consumed. The output is as follows:
File 1:
* Website: 0
* Website: 1
* Website: 2
File 2:
* Website: 3
* Website: 4
* Website: 5
File 3:
* Website: 6
* Website: 7
* Website: 8
File 4:
* Website: 9
This so far shows how to do the chunking (sometimes also called pagination). As the example shows, only the part about creating the XML documents is missing.
For creating an XML document you could concatenate a string; however, we don't do that. We use an existing library that does all of this perfectly well. That library is called DOMDocument, and here is an example of how to create a sitemap file with two exemplary locations within the urlset:
$doc = new DOMDocument();
$doc->formatOutput = TRUE;
$nsUri = 'http://www.sitemaps.org/schemas/sitemap/0.9';
$urlset = $doc->appendChild($doc->createElementNS($nsUri, 'urlset'));
$url = $doc->createElementNS($nsUri, 'url');
$location = $url->appendChild($doc->createElementNS($nsUri, 'loc', 'BASEURL'));
$priority = $url->appendChild($doc->createElementNS($nsUri, 'priority', '1'));
$urlset->appendChild(clone $url);
$priority->nodeValue = '0.5';
$location->nodeValue = 'TEST';
$urlset->appendChild(clone $url);
echo $doc->saveXML();
This code example shows how to create the document and then how to add the elements with their proper namespaces to it. It also shows how to create a boilerplate <url> element that can be modified and added easily by cloning it.
The output of this example then is:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>BASEURL</loc>
<priority>1</priority>
</url>
<url>
<loc>TEST</loc>
<priority>0.5</priority>
</url>
</urlset>
So now all general problems have been solved. All that is needed is to weave these two together and to store the result to disk. I skip the latter part for this example's sake (you would call $doc->save($filename) instead of saveXML()) and output the XML instead:
<?php
/**
* Save Sitemap XML Files Limit by 1000 URLs per each File
*
* @link https://stackoverflow.com/q/19750485/367456
*/
$limit = 3;
$urls = new ArrayIterator(range(0, 9)); // 10 Fake URLs
$urls->rewind();
$it = new NoRewindIterator($urls);
$fileCounter = 0;
$baseDoc = new DOMDocument();
$baseDoc->formatOutput = TRUE;
$nsUri = 'http://www.sitemaps.org/schemas/sitemap/0.9';
while ($it->valid()) {
$fileCounter++;
$doc = clone $baseDoc;
$urlset = $doc->appendChild($doc->createElementNS($nsUri, 'urlset'));
$url = $doc->createElementNS($nsUri, 'url');
$location = $url->appendChild($doc->createElementNS($nsUri, 'loc', 'BASEURL'));
$priority = $url->appendChild($doc->createElementNS($nsUri, 'priority', '1'));
$urlset->appendChild(clone $url);
$priority->nodeValue = '0.5';
printf("File %d:\n", $fileCounter);
$websites = new LimitIterator($it, 0, $limit);
foreach ($websites as $website) {
$location->nodeValue = $website;
$urlset->appendChild(clone $url);
}
echo $doc->saveXML();
}
The output then is in XML instead of plain text:
File 1:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>BASEURL</loc>
<priority>1</priority>
</url>
<url>
<loc>0</loc>
<priority>0.5</priority>
</url>
<url>
<loc>1</loc>
<priority>0.5</priority>
</url>
<url>
<loc>2</loc>
<priority>0.5</priority>
</url>
</urlset>
File 2:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>BASEURL</loc>
<priority>1</priority>
</url>
<url>
<loc>3</loc>
<priority>0.5</priority>
</url>
<url>
<loc>4</loc>
<priority>0.5</priority>
</url>
<url>
<loc>5</loc>
<priority>0.5</priority>
</url>
</urlset>
File 3:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>BASEURL</loc>
<priority>1</priority>
</url>
<url>
<loc>6</loc>
<priority>0.5</priority>
</url>
<url>
<loc>7</loc>
<priority>0.5</priority>
</url>
<url>
<loc>8</loc>
<priority>0.5</priority>
</url>
</urlset>
File 4:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>BASEURL</loc>
<priority>1</priority>
</url>
<url>
<loc>9</loc>
<priority>0.5</priority>
</url>
</urlset>
So all that is left to do now is to provide your original data source as an iterator at the very beginning, raise the limit to the number of URLs you want per file, and add the correct base URL to each file (if you really need that).
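A minimal sketch of that last step, writing each chunk to its own file with DOMDocument::save(). The 2500 generated URLs, the example.com host and the use of the system temp directory are assumptions for illustration only, and the per-file base URL entry is left out to keep it short:

```php
<?php
// Hypothetical data source: 2500 generated URLs stand in for the real websites.
$limit = 1000;
$urls  = new ArrayIterator(array_map(
    function ($n) { return "https://www.example.com/page/$n"; },
    range(1, 2500)
));
$dir   = sys_get_temp_dir(); // assumption: replace with your web root
$nsUri = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$urls->rewind();
$it = new NoRewindIterator($urls);
$fileCounter = 0;
$written = [];

while ($it->valid()) {
    $fileCounter++;
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $urlset = $doc->appendChild($doc->createElementNS($nsUri, 'urlset'));

    // Consume at most $limit URLs from the shared, never-rewinding iterator.
    foreach (new LimitIterator($it, 0, $limit) as $website) {
        $url = $urlset->appendChild($doc->createElementNS($nsUri, 'url'));
        $loc = $url->appendChild($doc->createElementNS($nsUri, 'loc'));
        $loc->appendChild($doc->createTextNode($website)); // text node escapes & and <
        $url->appendChild($doc->createElementNS($nsUri, 'priority', '0.5'));
    }

    // sitemap1.xml, sitemap2.xml, ...
    $filename = sprintf('%s/sitemap%d.xml', $dir, $fileCounter);
    $doc->save($filename);
    $written[] = $filename;
}
```

With these numbers the loop should produce three files, the last one holding the remaining 500 URLs.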
As far as XML sitemaps are concerned, you can also create one index file that links to the other files. The limits are a bit higher IIRC; compare with: Multiple Sitemap: entries in robots.txt?.
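Such an index file can be built with DOMDocument in the same way. This is only a sketch: the base URL and the file count of 3 are placeholders (in practice the count would be the final $fileCounter), and the file names follow the sitemap1.xml, sitemap2.xml pattern from the question:

```php
<?php
// Assumptions: placeholder base URL and a fixed number of chunk files.
$baseUrl   = 'https://www.example.com/';
$fileCount = 3;
$nsUri     = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$index = new DOMDocument('1.0', 'UTF-8');
$index->formatOutput = true;
$root = $index->appendChild($index->createElementNS($nsUri, 'sitemapindex'));

for ($i = 1; $i <= $fileCount; $i++) {
    // One <sitemap><loc>...</loc></sitemap> entry per chunk file.
    $sitemap = $root->appendChild($index->createElementNS($nsUri, 'sitemap'));
    $loc = $sitemap->appendChild($index->createElementNS($nsUri, 'loc'));
    $loc->appendChild($index->createTextNode($baseUrl . "sitemap$i.xml"));
}

$indexXml = $index->saveXML();
echo $indexXml;
```

The index is then the only file that has to be announced to crawlers; the chunk files are discovered through it.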
I hope this helps you to achieve what you're looking for in a well established way.
You can try a for loop ( for ( $x = 0 ; $x < 1000 ; $x++ ) { $websites[$x] } ) or you can exit the foreach loop early with a counter variable, like so:
$i = 0;
foreach ($websites as $website)
{
if ($i === 1000) break; // stop once 1000 items have been processed
$i++;
#do your thing
}
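A counter like this only limits a single file. If $websites is a plain array, array_chunk() can split it into groups of 1000 so that each group becomes its own file. The sketch below is a variation on the question's string-building code, not part of the answer above; the 2500 fake rows and the temp directory are assumptions:

```php
<?php
// Hypothetical data: 2500 fake rows shaped like the question's $websites.
$websites = [];
for ($n = 1; $n <= 2500; $n++) {
    $websites[] = ['domain' => "https://www.example.com/site/$n"];
}

$dir   = sys_get_temp_dir(); // assumption: replace with your target directory
$files = [];

foreach (array_chunk($websites, 1000) as $index => $chunk) {
    $sitemap = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        . "<urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">\n";
    foreach ($chunk as $website) {
        // Escape &, <, > and quotes so the URL is valid inside XML.
        $loc = htmlspecialchars($website['domain'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $sitemap .= "<url>\n<loc>$loc</loc>\n<priority>0.5</priority>\n</url>\n";
    }
    $sitemap .= "</urlset>";

    // Chunk index is zero-based, file names start at sitemap1.xml.
    $name = $dir . '/sitemap' . ($index + 1) . '.xml';
    file_put_contents($name, $sitemap, LOCK_EX);
    $files[] = $name;
}
```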
Related
How do I set the route for our sitemap? Currently we set the header in our controller, but the view is still shown as plain text.
controller code
public function sitemap(){
$data = [];
$model = new BlogModel();
$data['blogs'] = $model->where('STATUS', '1')->orderBy('ID', 'DESC')->findAll();
return view('sitemap', $data);
}
sitemap code
<?php echo '<?xml version="1.0" encoding="UTF-8"?>'; ?>
<urlset
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1"
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd"
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- created with Free Online Sitemap Generator www.xml-sitemaps.com -->
<url>
<loc><?= base_url();?></loc>
<changefreq>daily</changefreq>
<priority>1.0</priority>
</url>
<?php foreach($blogs as $blog){
$cat_id = $blog['PAGE_ID'];
$db = \Config\Database::connect();
$query = $db->query("SELECT * from mcms_links WHERE ID = '$cat_id'");
$result = $query->getRowArray();
?>
<url>
<loc><?=base_url();?>/<?=$result['VALID_NAME'];?>/<?=$blog['VALID_NAME'];?></loc>
<changefreq>daily</changefreq>
<priority>1.00</priority>
</url>
<?php }
?>
</urlset>
routes
$routes->get('sitemap\.xml', 'Sitemap::sitemap');
I tried to add a string into the sitemap.xml file inside <urlset> tag, but it stores differently.
<?php
$date_mod = date('Y-m-d');
$sitemap = "<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>$date_mod</lastmod>
<priority>0</priority>
</url>";
$xml = simplexml_load_file("sitemap.xml");
$xml->addChild($sitemap);
file_put_contents("sitemap.xml", $xml->asXML());
?>
The output is like:
<?xml version="1.0"?>
<urlset>
<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>2018-01-12</lastmod>
<priority>0</priority>
</url>
<//www.website.com/article.php?page=3</loc>
<lastmod>2018-01-12</lastmod>
<priority>0</priority>
</url>/></urlset>
Please help me.
If the raw xml is like this:
<?xml version="1.0"?>
<urlset>
</urlset>
And your updated XML should look like this:
<?xml version="1.0"?>
<urlset>
<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>2018-01-12</lastmod>
<priority>0</priority>
</url>
</urlset>
Then you could refer to the following code:
<?php
$date_mod = date('Y-m-d');
$sitemap = "<url>
<loc>http://www.website.com/article.php?page=3</loc>
<lastmod>$date_mod</lastmod>
<priority>0</priority>
</url>";
$sitemap_node =simplexml_load_string($sitemap);
$xml = simplexml_load_file("sitemap.xml");
sxml_append($xml,$sitemap_node);
$xml->asXML('sitemap.xml');
function sxml_append(SimpleXMLElement $to, SimpleXMLElement $from) {
$toDom = dom_import_simplexml($to);
$fromDom = dom_import_simplexml($from);
$toDom->appendChild($toDom->ownerDocument->importNode($fromDom, true));
}
?>
Your previous code failed because the addChild() method expects an element name (plus an optional text value) as its arguments, not a string of XML markup or another XML object (and even for text values it still has some drawbacks).
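For comparison, here is a sketch of how addChild() is meant to be called, building the same <url> entry element by element. It starts from an in-memory empty <urlset> instead of sitemap.xml so that it runs on its own:

```php
<?php
// Assumption: an empty in-memory <urlset> stands in for sitemap.xml.
$xml = simplexml_load_string('<?xml version="1.0"?><urlset></urlset>');

// First argument: element name. Second argument (optional): text value.
$url = $xml->addChild('url');
$url->addChild('loc', 'http://www.website.com/article.php?page=3');
$url->addChild('lastmod', date('Y-m-d'));
$url->addChild('priority', '0');

// Note: a value containing a bare "&" would need to be escaped first,
// which is one of the drawbacks of addChild().
$out = $xml->asXML();
echo $out;
```

This avoids the DOM round trip, at the cost of building the entry one element at a time.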
I have a XML image Sitemap for google like:
<?xml version="1.0" encoding="UTF-8"?>
<urlset
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://www.domain.de</loc>
<image:image>
<image:loc>http://www.domain.de/image1.jpg</image:loc>
<image:title>image 1</image:title>
</image:image>
<image:image>
<image:loc>http://www.domain.de/image2.jpg</image:loc>
<image:title>image 2</image:title>
</image:image>
</url>
</urlset>
Now I want to delete the "image:image" child whose "image:loc" is "http://www.domain.de/image2.jpg".
How can I do this with php?
I have tested code like the following:
$xmlPageUrl="http://www.domain.de/image2.jpg";
foreach($sitemap->xPath('//url[image:image/image:loc="' . $xmlPageUrl . '"]') as $node) {
$sitemap->parentNode->removeChild($node);
}
Who can help me?
You can use text() in XPath to search for specific content:
//image:image/image:loc[text()="http://www.domain.de/image2.jpg"]
so :
$doc = new DOMDocument;
$doc->loadxml($xmlString);
$xpath = new DOMXpath($doc);
$xmlPageUrl="http://www.domain.de/image2.jpg";
foreach($xpath->query('//image:image/image:loc[text()="'.$xmlPageUrl.'"]') as $node) {
$node->parentNode->parentNode->removeChild($node->parentNode);
}
the XML now contains :
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>http://www.domain.de</loc>
<image:image>
<image:loc>http://www.domain.de/image1.jpg</image:loc>
<image:title>image 1</image:title>
</image:image>
</url>
</urlset>
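A self-contained variant of the same idea, as a sketch: it registers both namespaces explicitly with DOMXPath::registerNamespace() and selects the <image:image> element directly. The prefix sm for the default sitemap namespace is an arbitrary choice made here, and the inline XML is a shortened copy of the question's document:

```php
<?php
$xmlString = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://www.domain.de</loc>
    <image:image>
      <image:loc>http://www.domain.de/image1.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>http://www.domain.de/image2.jpg</image:loc>
    </image:image>
  </url>
</urlset>
XML;

$doc = new DOMDocument();
$doc->loadXML($xmlString);

$xpath = new DOMXPath($doc);
// "sm" is a prefix chosen for this sketch; only the namespace URI has to match.
$xpath->registerNamespace('sm', 'http://www.sitemaps.org/schemas/sitemap/0.9');
$xpath->registerNamespace('image', 'http://www.google.com/schemas/sitemap-image/1.1');

$xmlPageUrl = 'http://www.domain.de/image2.jpg';
// Select the <image:image> element whose <image:loc> equals the URL.
// Assumes the URL contains no double quote, which would break the expression.
$query = '//sm:url/image:image[image:loc="' . $xmlPageUrl . '"]';
foreach ($xpath->query($query) as $imageNode) {
    $imageNode->parentNode->removeChild($imageNode);
}

$result = $doc->saveXML();
echo $result;
```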
I have a problem when I try to insert a new node into my XML file. I need to add:
<url><loc>mylink</loc></url>
before the closing </urlset> tag at the end of my XML file.
I tried the code below, but it doesn't work and it breaks the last node before I insert the new one:
$userfile = "sitemap.xml";
$fh = fopen($userfile, 'r+');
$addUser = "<url><loc>https://www.example.com/Privacy-Policy</loc></url></urlset> ";
fseek($fh, -10, SEEK_END);
fwrite($fh, $addUser);
fclose($fh);
This is part of my xml file:
<urlset >
<url>
<loc>example.com</loc>
</url>
<url>
<loc>example.com</loc>
</url>
</urlset>
my output is:
<urlset >
<url>
<loc>example1.com</loc>
</url>
<url>
<loc>example2.com</loc>
</ur<url> //<-- See here my xml file broke
<loc>example2.com</loc>
</url>
</urlset>
This should work for you:
(Here I just load the xml file with simplexml_load_file(), to create a SimpleXMLElement(). After this you can simply add a child to the root node)
<?php
$xml = simplexml_load_file("file.xml");
$xml = new SimpleXMLElement($xml->asXML());
$urlChild = $xml->addChild("url", "");
$urlChild->addChild("loc", "example2.com");
$xml->asXML("file.xml");
?>
input file:
<urlset >
<url>
<loc>example.com</loc>
</url>
<url>
<loc>example.com</loc>
</url>
</urlset>
output file:
<?xml version="1.0"?>
<urlset>
<url>
<loc>example.com</loc>
</url>
<url>
<loc>example.com</loc>
</url>
<url>
<loc>example2.com</loc>
</url>
</urlset>
I'm currently working on the sitemaps for a website, and I'm using SimpleXML (simplexml_load_file("small.xml")) to import and do some checks on the original XML file. After this I use dom_import_simplexml() to convert it to a DOMDocument to make it easier to precisely add and manipulate XML elements. Below is the test XML sitemap that I'm working from:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:52:32-Orouke.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:23-castle technology.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:38-banana split.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:42-Waveney.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:55:12-pure orange.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:57:54-tau press.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:21-E.f.m.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:45-townhouse communications.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
</urlset>
Now, here is the test code I'm using to modify it:
<?php
$root = simplexml_load_file("small.xml");
$domRoot = dom_import_simplexml($root);
$dom = $domRoot->ownerDocument;
$urlElement = $dom->createElement("url");
$locElement = $dom->createElement("loc");
$locElement->appendChild($dom->createTextNode("www.google.co.uk"));
$urlElement->appendChild($locElement);
$lastmodElement = $dom->createElement("lastmod");
$lastmodElement->appendChild($dom->createTextNode("2011-08-02"));
$urlElement->appendChild($lastmodElement);
$domRoot->appendChild($urlElement);
$dom->formatOutput = true;
echo $dom->saveXML();
?>
The main problem is that no matter where I place $dom->formatOutput = true; the existing XML that was imported from SimpleXML is formatted correctly, but anything new is formatted in the "all one line" style, as follows:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:52:32-Orouke.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:23-castle technology.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:38-banana split.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:53:42-Waveney.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:55:12-pure orange.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:57:54-tau press.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:21-E.f.m.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:31-apple.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url>
<loc>http://www.companycheck.co.uk/searches/2011/08/22/23:59:45-townhouse communications.html</loc>
<lastmod>2011-08-23</lastmod>
</url>
<url><loc>www.google.co.uk</loc><lastmod>2011-08-02</lastmod></url></urlset>
If anyone has an idea why this is happening and how to fix it I would be very grateful.
There is a workaround. You can force reformatting by saving your new xml to string first, then load it again after setting the formatOutput property, e.g.:
$strXml = $dom->saveXML();
$dom->formatOutput = true;
$dom->loadXML($strXml);
echo $dom->saveXML();
To format output nicely, you need to set the preserveWhiteSpace property to false before loading, as stated in the documentation.
Example:
$Xhtml = "<div><span></span></div>";
$doc = new DOMDocument('1.0','UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$doc->loadXML($Xhtml);
$formattedXhtml = $doc->saveXML($doc->documentElement, LIBXML_NOXMLDECL);
$expectedFormatting =<<<EOF
<div>
<span/>
</div>
EOF;
$this->assertEquals($expectedFormatting,$formattedXhtml,"The XHTML is formatted");
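Applied to a sitemap, the same settings might look like the sketch below. The inline XML is a shortened stand-in for small.xml, and createElementNS() is used instead of the question's createElement() so that the new elements land in the sitemap namespace:

```php
<?php
// Assumption: a one-entry sitemap stands in for the real small.xml.
$existing = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/a.html</loc>
    <lastmod>2011-08-23</lastmod>
  </url>
</urlset>
XML;

$nsUri = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // must be set before loading
$dom->formatOutput = true;
$dom->loadXML($existing);

// Append a new <url> entry to the root element.
$url = $dom->documentElement->appendChild($dom->createElementNS($nsUri, 'url'));
$url->appendChild($dom->createElementNS($nsUri, 'loc', 'http://www.google.co.uk/'));
$url->appendChild($dom->createElementNS($nsUri, 'lastmod', '2011-08-02'));

$formatted = $dom->saveXML();
echo $formatted;
```

Because the whitespace-only text nodes are dropped on load, the serializer is free to indent the old and the new entries alike.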
Just for the visitor that comes here as this was the first answer on Google Search.
I had this same problem using code like Simon's.
Turns out that when you disable errors (either with $doc->loadHTML(..., LIBXML_NOERROR) or libxml_use_internal_errors(true);), it won't format anymore (example: https://3v4l.org/ur76E).
The solution is to not disable errors and to suppress them on the PHP side instead (with @).
Ugly, but it works: https://3v4l.org/BSJVu
The final silver bullet function looks like:
function beautifyDoc(DOMDocument $doc): void
{
$previousLibXmlState = libxml_use_internal_errors(false);
$previousErrorHandler = set_error_handler(null);
try {
$html = $doc->saveHTML();
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
@$doc->loadHTML($html);
} finally {
libxml_use_internal_errors($previousLibXmlState);
set_error_handler($previousErrorHandler);
}
}
// usage
$doc = new DOMDocument();
// ...load html and do stuff...
beautifyDoc($doc);
echo $doc->saveHTML(); // done
(it also takes care of the php error handler, if already set)