Sitemap creation with DOMDocument throws parsing error

I'm creating an XML sitemap. It works with a single record, but as soon as the output includes more than one record the browser reports an error:
XML Parsing Error: junk after document element
The output it points to is:
<?xml version="1.0" encoding="UTF-8"?>
<url><loc>http://www.mywebsite.com/page/1</loc><changefreq>daily</changefreq><priority>0.6</priority></url>
<url><loc>http://www.mywebsite.com/page/2</loc><changefreq>daily</changefreq><priority>0.6</priority></url>
My code:
$xml = new DOMDocument('1.0', 'UTF-8');
for($i = 0; $i < 2; $i++)
{
$url = $xml->createElement('url');
$xml->appendChild($url);
$website_url = 'http://www.mywebsite.com/page/' . $i;
$loc = $xml->createElement('loc', $website_url);
$url->appendChild($loc);
$change = $xml->createElement('changefreq', 'daily');
$url->appendChild($change);
$priority = $xml->createElement('priority', '0.6');
$url->appendChild($priority);
}
header('Content-type: text/xml');
echo $xml->saveXML();
Why is it throwing this kind of error when the XML seems valid to me?

Your example output has two root nodes (the two <url> elements). XML allows exactly one root element per document, so the parser treats the second <url> as the "junk after document element".
You're missing the <urlset> root node that wraps all <url> entries; see: http://www.sitemaps.org/protocol.php
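As a minimal sketch of that fix (not the asker's final code), the loop below appends every <url> to a single <urlset> root instead of to the document. The namespace URI comes from the sitemaps.org protocol page rather than from the question, and the $ns, $fields and $sitemap names are only illustrative.

```php
<?php
// Namespace taken from the sitemaps.org protocol page (not from the question).
$ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';

$xml = new DOMDocument('1.0', 'UTF-8');

// Create the single root element once, outside the loop.
$urlset = $xml->appendChild($xml->createElementNS($ns, 'urlset'));

for ($i = 1; $i <= 2; $i++) {
    // Each <url> is appended to <urlset>, not to the document itself.
    $url = $urlset->appendChild($xml->createElementNS($ns, 'url'));

    $fields = [
        'loc'        => 'http://www.mywebsite.com/page/' . $i,
        'changefreq' => 'daily',
        'priority'   => '0.6',
    ];
    foreach ($fields as $name => $value) {
        $el = $url->appendChild($xml->createElementNS($ns, $name));
        // A text node escapes characters such as & that can occur in URLs.
        $el->appendChild($xml->createTextNode($value));
    }
}

$sitemap = $xml->saveXML();
// In a web context: header('Content-Type: text/xml'); echo $sitemap;
```

Because the document now has one root element, any number of <url> entries can be added without triggering the parsing error.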

Related

Trouble creating a valid RSS feed in PHP

I'm trying to get an RSS feed, change some text, and then serve it again as an RSS feed. However, the code I've written doesn't validate properly. I get these errors:
line 3, column 0: Missing rss attribute: version
line 14, column 6: Undefined item element: content (10 occurrences)
Here is my code:
<?php
header("Content-type: text/xml");
echo "<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl'?>
<?xml-stylesheet type='text/xsl' media='screen'
href='/~d/styles/rss2full.xsl'?>
<rss xmlns:content='http://purl.org/rss/1.0/modules/content/'>
<channel>
<title>Blaakdeer</title>
<description>Blog RSS</description>
<language>en-us</language>
";
$html = "";
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$description = $xml->channel->item[$i]->description;
$content = $xml->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
echo "<item>
<title>$title</title>
<description>$description</description>
<content>$content</content>
</item>";
}
echo "</channel></rss>";

Just as you don't treat XML as a string when parsing it, you don't treat it as a string when you create it. Use the proper tools to create your XML; in this case, the DomDocument class.
You had a number of problems with your XML; the biggest is that you were creating a <content> element, but the original RSS had a <content:encoded> element. That means the element is named encoded and lives in the content namespace, which is very different from an element named content. I've added comments to explain the other steps.
<?php
// create the XML document with version and encoding
$xml = new DomDocument("1.0", "UTF-8");
$xml->formatOutput = true;
// add the stylesheet PI
$xml->appendChild(
$xml->createProcessingInstruction(
'xml-stylesheet',
'type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"'
)
);
// create the root element
$root = $xml->appendChild($xml->createElement('rss'));
// add the version attribute
$v = $root->appendChild($xml->createAttribute('version'));
$v->appendChild($xml->createTextNode('2.0'));
// add the namespace
$root->setAttributeNS(
'http://www.w3.org/2000/xmlns/',
'xmlns:content',
'http://purl.org/rss/1.0/modules/content/'
);
// create some child elements
$ch = $root->appendChild($xml->createElement('channel'));
// specify the text directly as second argument to
// createElement because it doesn't need escaping
$ch->appendChild($xml->createElement('title', 'Blaakdeer'));
$ch->appendChild($xml->createElement('description', 'Blog RSS'));
$ch->appendChild($xml->createElement('language', 'en-us'));
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$rss = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
if (empty($rss->channel->item[$i])) {
continue;
}
$title = $rss->channel->item[$i]->title;
$description = $rss->channel->item[$i]->description;
$content = $rss->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
$item_el = $ch->appendChild($xml->createElement('item'));
$title_el = $item_el->appendChild($xml->createElement('title'));
// this stuff is unknown so it has to be escaped
// so have to create a separate text node
$title_el->appendChild($xml->createTextNode($title));
$desc_el = $item_el->appendChild($xml->createElement('description'));
// the other alternative is to create a cdata section
$desc_el->appendChild($xml->createCDataSection($description));
// the content:encoded element is not the same as a content element
// the element must be created with the proper namespace prefix
$cont_el = $item_el->appendChild(
$xml->createElementNS(
'http://purl.org/rss/1.0/modules/content/',
'content:encoded'
)
);
$cont_el->appendChild($xml->createCDataSection($content));
}
header("Content-type: text/xml");
echo $xml->saveXML();

The first error is just a missing attribute, easy enough:
<rss version="2.0" ...>
For the <p> and other HTML elements, you need to escape them. The file should look like this:
&lt;p&gt;...
There are other ways, but this is the easiest. In PHP you can call a function to encode the entities:
$output .= htmlspecialchars(" <p>Paragraph</p> ");
As for the <content> tag problem, it should be <description> instead. The <content> tag currently generates two errors. Changing it to <description> in both places should fix both errors.
Otherwise it looks like you understand the basics. You <open> and </close> tags, and those have to match. You can also use what are called empty tags, <empty/>, which stand on their own with no content and no closing tag.
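To illustrate the escaping step, here is a small sketch; the sample string and the variable names are made up for the example. It escapes an HTML fragment with htmlspecialchars() and then checks that an XML parser reads the original text back out of a <description> element.

```php
<?php
// Hypothetical HTML fragment that should travel inside an RSS <description>.
$html = '<p>Paragraph & more</p>';

// ENT_XML1 restricts the output to entities that plain XML understands.
$escaped = htmlspecialchars($html, ENT_XML1 | ENT_QUOTES, 'UTF-8');

// Building the element as a string is only safe because $escaped is escaped.
$fragment = '<description>' . $escaped . '</description>';

// Round trip: the parser turns the entities back into the original text.
$doc = new DOMDocument();
$doc->loadXML($fragment);
$roundTrip = $doc->documentElement->textContent;
```

When the feed is built with DOMDocument instead, createTextNode() performs the same escaping, so a separate htmlspecialchars() call is not needed there.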

How to update Document-A XML nodes with Document-B XML nodes using php

I have two XML files: one from a client and one created from a db query. The db XML file has this structure:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<tags>
<title>Wordsleuth (2006, volume 3, 4): The Dictionary: Disapproving Schoolmarm or Accurate Record?</title>
<alias>favart/wordsleuth-2006-volume-3-4-the-dictionary-disapproving-schoolmarm-or-accurate-record</alias>
<id>4361</id>
</tags>
</metadata>
The client XML has this structure:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<metadata xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<tags>
<title>Wordsleuth (2006, vol. 3, 4): The Dictionary: Disapproving Schoolmarm or Accurate Record? – Search by Title – Favourite Articles – TERMIUM Plus® – Translation Bureau</title>
<description>A Language Update article on the role that the dictionary plays in language usage.</description>
<keywords>language usage; dictionaries</keywords>
<subject>English language; Terminology</subject>
</tags>
</metadata>
Each has approximately 200 'tags' elements. After getting some hints from other questions and consulting the PHP manual, my first crack at it produced this:
$client = 'C:\xampp\htdocs\wetkit\sites\all\modules\my_metatags\favart.xml';
$db = 'C:\xampp\htdocs\wetkit\sites\all\modules\my_metatags\tmp\from db\favart_db.xml';
$c_xmlstr = file_get_contents($client);
$d_xmlstr = file_get_contents($db);
$favartdoc_db = new DomDocument('1.0','UTF-8');
$favartdoc_cl = new DomDocument('1.0','UTF-8');
$favartdoc_db->loadXML($d_xmlstr);
$favartdoc_cl->loadXML($c_xmlstr);
for ($i=0;$i==$favartdoc_cl->getElementsByTagName('title')->count; $i++){
$c_nodes = $x_favartdoc_cl->query('/metadata/tags/title');
$c_node = $c_nodes->item($i);
for ($j=0; $j==$favartdoc_db->getElementsByTagName('title')->count; $j++){
$d_nodes = $x_favartdoc_db->query('/metadata/tags/title');
$d_node = $d_nodes->item($j);
if(stripos(trim($c_node->nodeValue), trim($d_node->nodeValue))===0){
$favartdoc_cl->replaceChild($d_node,$c_node);
if($i==($c_nodes->count)){break;};
}
}
$favartdoc_cl->saveXML();
}
This code runs, generates no errors, and does nothing. An echo statement at the end
echo "\n\n" . "THE TOTAL NUMBER OF MATCHES EQUALS " . $i . " IN " . $j . " NODES." . "\n";
generates this message:
THE TOTAL NUMBER OF MATCHES EQUALS 1 IN 1 NODES.
A second simpler approach produced this:
$favartdoc_db = new DomDocument('1.0','UTF-8');
$favartdoc_cl = new DomDocument('1.0','UTF-8');
$favartdoc_db->load($db);
$favartdoc_cl->load($client);
$favartdoc_cl->formatOutput = true;
$c_meta_x = new DOMXpath($favartdoc_cl);
$d_meta_x = new DOMXpath($favartdoc_db);
foreach ($c_meta_x->query('//tags') as $c_tag){
foreach ($d_meta_x->query('//tags') as $d_tag){
if(strncasecmp(trim($c_tag->title), trim($d_tag->title) , strlen(trim($d_tag->title)))===0){
$c_tag->appendChild($d_tag);
}
}
}
$favartdoc_cl->saveXML();
But this generates an error:
exception 'DOMException' with message 'Wrong Document Error'
Suggestions to correct that error, by calling importNode before attaching it to the DOM, still generate the same error.
As you can see I'm trying a different string matching function in each. Ultimately I want to replace the titles in the client XML with those from the db or append the whole tag set from the db XML to the client XML then delete the client title element afterwards.
Any help would be appreciated.

This is what worked for me.
$client = 'some\where\somefile.xml';
$db = 'some\where\someOtherfile.xml';
$c_xmlstr = file_get_contents($client);
$d_xmlstr = file_get_contents($db);
$doc_db = new DOMDocument('1.0', 'UTF-8');
$doc_cl = new DOMDocument('1.0', 'UTF-8');
$doc_db->loadXML($d_xmlstr);
$doc_cl->loadXML($c_xmlstr);
$x_doc_db = new DOMXPath($doc_db);
$x_doc_cl = new DOMXPath($doc_cl);
// Query each node list once. This assumes every <tags> has exactly one <title>
// (and, in the db file, one <id>), so the lists line up by index.
$c_nodes = $x_doc_cl->query('/metadata/tags');
$c_nodes_titles = $x_doc_cl->query('/metadata/tags/title');
$d_nodes_titles = $x_doc_db->query('/metadata/tags/title');
$d_nodes_ids = $x_doc_db->query('/metadata/tags/id');
// DOMNodeList exposes ->length; valid indexes run from 0 to length - 1
for ($i = 0; $i < $c_nodes->length; ++$i) {
    $c_node = $c_nodes->item($i);
    $c_title = trim($c_nodes_titles->item($i)->textContent);
    for ($j = 0; $j < $d_nodes_titles->length; ++$j) {
        $d_title = trim($d_nodes_titles->item($j)->textContent);
        // exact match on the trimmed titles
        if ($c_title === $d_title) {
            // create the new nodes in the client document, so no importNode() is needed
            $db_id = $doc_cl->createElement('db_id');
            $db_id->appendChild($doc_cl->createTextNode($d_nodes_ids->item($j)->nodeValue));
            $c_node->appendChild($db_id);
        }
    }
}
// saveXML() only returns the XML string; it has to be output or written somewhere
echo $doc_cl->saveXML();

PHP XML response start tag expected, but i see it in var_dump

I have the following being returned
var_dump:
string(799) "<?xml version="1.0" encoding="ISO-8859-1"?> <serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service" xmlns:com="http://www.webex.com/schemas/2002/06/common" xmlns:att="http://www.webex.com/schemas/2002/06/service/attendee"><serv:header><serv:response><serv:result>SUCCESS</serv:result><serv:gsbStatus>BACKUP</serv:gsbStatus></serv:response></serv:header><serv:body><serv:bodyContent xsi:type="att:registerMeetingAttendeeResponse" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"><att:register><att:attendeeID>29281003</att:attendeeID></att:register></serv:bodyContent></serv:body></serv:message>"
I'm trying to use SimpleXML, but I'm first validating the output with this function (sorry, I can't remember where I found it on Stack Overflow):
function isXML($xml){
libxml_use_internal_errors(true);
$doc = new DOMDocument('1.0', 'utf-8');
$doc->loadXML($xml);
$errors = libxml_get_errors();
if(empty($errors)){
return true;
}
$error = $errors[0];
if($error->level < 3){
return true;
}
$explodedxml = explode("r", $xml);
$badxml = $explodedxml[($error->line)-1];
$message = $error->message . ' at line ' . $error->line . '. Bad XML: ' . htmlentities($badxml);
return $message;
}
result of isXML()
Start tag expected, '<' not found at line 1. Bad XML: <?xml ve
I see the '<', unless the var_dump is inaccurate. I've broken this thing down as much as I could. Any help would be greatly appreciated.

I stripped the problem down a little more:
$xml = '<?xml version="1.0" encoding="ISO-8859-1"?>
<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"/>';
// escape xml special chars - this will provoke the error
$xml = htmlspecialchars($xml);
$document = new DOMDocument();
$document->loadXml($xml);
Output:
Warning: DOMDocument::loadXML(): Start tag expected, '<' not found in Entity, line: 1 in /tmp/...
What happens is that your XML is still escaped/encoded. You do not see that in the browser because the browser interprets the special characters: it treats the response (including the var_dump()) as HTML. Open the source view to check the actual value.
Debug the code that reads the XML string; you might want to change it or add an html_entity_decode() there.
HINT: Your XML uses namespaces, so you might be better off with DOM + XPath. Check out DOMXPath::evaluate().
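A rough sketch of both suggestions follows, assuming the response really arrives HTML-escaped. The shortened sample message is made up for the example, and the serv prefix is registered under the namespace URI shown in the question.

```php
<?php
// Simulate an escaped response; a shortened, made-up version of the message.
$escaped = htmlspecialchars('<?xml version="1.0" encoding="ISO-8859-1"?>
<serv:message xmlns:serv="http://www.webex.com/schemas/2002/06/service"><serv:header><serv:response><serv:result>SUCCESS</serv:result></serv:response></serv:header></serv:message>');

// Undo the escaping before handing the string to the parser.
$xml = html_entity_decode($escaped, ENT_QUOTES);

$document = new DOMDocument();
$document->loadXML($xml);

// Register the prefix so it can be used inside XPath expressions.
$xpath = new DOMXPath($document);
$xpath->registerNamespace('serv', 'http://www.webex.com/schemas/2002/06/service');

// string() makes evaluate() return a scalar instead of a node list.
$result = $xpath->evaluate('string(/serv:message/serv:header/serv:response/serv:result)');
```

The prefix registered on the DOMXPath object does not have to match the one used in the document; only the namespace URI has to be the same.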

DOM get node value by "brother" value in this string

I am creating a PHP system for editing the XML files used to translate a game.
I am using DOM, e.g. to compare files for translators (when an XML file is updated).
I have an old and a new XML file (note in advance: I cannot change the XML structure), and the new one has new strings and/or new IDs.
To later echo the node values side by side for comparison, matched by ID in the same order, I have the following code:
<?php
$xml2 = new DOMDocument('1.0', 'utf-16');
$xml2->formatOutput = true;
$xml2->preserveWhiteSpace = false;
$xml2->load(substr($file, 0, -4).'-pl.xml');
$xml = new DOMDocument('1.0', 'utf-16');
$xml->formatOutput = true;
$xml->preserveWhiteSpace = false;
$xml->load($file);
for ($i = 0; $i < $xml->getElementsByTagName('string')->length; $i++) {
if ($xml2->getElementsByTagName('string')->item($i)) {
$element_pl = $xml2->getElementsByTagName('string')->item($i);
$body_pl = $element_pl->getElementsByTagName('body')->item(0);
$id_pl = $element_pl->getElementsByTagName('id')->item(0);
} else $id_pl->nodeValue = "";
$element = $xml->getElementsByTagName('string')->item($i);
$id = $element->getElementsByTagName('id')->item(0);
$body = $element->getElementsByTagName('body')->item(0);
if ($id_pl->nodeValue == $id->nodeValue) {
$element->appendChild( $xml->createElement('body-pl', $body_pl->nodeValue) );
}
}
$xml = simplexml_import_dom($xml);
?>
The above code changes:
<?xml version="1.0" encoding="utf-16"?>
<strings>
<string>
<id>1</id>
<name>ABC</name>
<body>English text</body>
</string>
</strings>
to (by adding text from *-pl.xml file):
<?xml version="1.0" encoding="utf-16"?>
<strings>
<string>
<id>1</id>
<name>ABC</name>
<body>English text</body>
<body-pl>Polish text</body-pl>
</string>
</strings>
But I need to find the "body" value in *-pl.xml by the "name" value.
In the "for" loop:
get "ABC" from the "name" tag [*.xml] ->
find "ABC" in a "name" tag [*-pl.xml] ->
get the body node from that "string" [*-pl.xml]
I could do that with strpos(), but my smallest file has 25346 lines.
Is there something like "has child ("name", "ABC") -> parent"?
Then I could get the "body" value of that string.
Thank you in advance for suggestions or a link to a similar, resolved question.
Greetings

You need XPath expressions:
//name[text()='ABC']/../body
or
//name[text()='ABC']/following-sibling::body
Check the PHP manual for DOMXPath class and its query method. In a nutshell, you'd use it like this:
$xpath = new DOMXPath($dom_document);
// find all `body` nodes that have a `name` sibling
// with an `ABC` value in the entire document
$nodes = $xpath->query("//name[text()='ABC']/../body");
foreach($nodes as $node) {
echo $node->textContent , "\n\n";
}
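Applied to the question, the lookup could be wrapped in a small helper. This is only a sketch: findBodyByName() is a hypothetical name, the inline sample stands in for the *-pl.xml file, and the quote check exists because XPath 1.0 string literals have no escape syntax.

```php
<?php
// Stand-in for the *-pl.xml document from the question.
$pl = new DOMDocument();
$pl->loadXML('<strings><string><id>7</id><name>ABC</name><body>Polish text</body></string></strings>');
$xpath = new DOMXPath($pl);

// Hypothetical helper: returns the <body> text of the <string> whose <name>
// equals $name, or null when there is no such entry.
function findBodyByName(DOMXPath $xpath, string $name): ?string
{
    // XPath 1.0 cannot escape quotes inside a literal, so this sketch
    // simply refuses names that contain a double quote.
    if (strpos($name, '"') !== false) {
        return null;
    }
    $nodes = $xpath->query('//string[name="' . $name . '"]/body');
    return $nodes->length > 0 ? $nodes->item(0)->textContent : null;
}

$body = findBodyByName($xpath, 'ABC');
```

Inside the question's loop, the name read from the original file would be passed to the helper and the returned text used for the new body-pl element.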

XML Parsing Error

Here I am creating an XML file dynamically at run time, but I'm getting this error:
XML Parsing Error: junk after document element
Location: http://localhost/tam/imagedata.php?imageid=8
Line Number 9, Column 1:
^
$id=$_GET['imageid'];
$dom = new DomDocument('1.0');
$query="select * from tbl_image_gallery where imageId='$id'";
$select=mysql_query($query);
while($res=mysql_fetch_array($select))
{
$content = $dom->appendChild($dom->createElement('content'));
$image = $content->appendChild($dom->createElement('image'));
$small_image_path = $image->appendChild($dom->createElement('small_image_path'));
$small_image_path->appendChild($dom->createTextNode("load/images/small/".$res['image']));
$big_image_path = $image->appendChild($dom->createElement('big_image_path'));
$big_image_path->appendChild($dom->createTextNode("load/images/big/".$res['image']));
$description = $image->appendChild($dom->createElement('description'));
$description->appendChild($dom->createTextNode($res['description']));
$dom->formatOutput = true;
}
echo $test1 = $dom->saveXML();
and the XML output is:
<?xml version="1.0"?>
<content>
<image>
<small_image_path>load/images/small/1.jpg</small_image_path>
<big_image_path>load/images/big/1.jpg</big_image_path>
<description>hgjghj</description>
</image>
<image><small_image_path>load/images/small/2.jpg</small_image_path><big_image_path>load/images/big/2.jpg</big_image_path><description>fgsdfg</description></image><image><small_image_path>load/images/small/3.jpg</small_image_path><big_image_path>load/images/big/3.jpg</big_image_path><description>sdfgsdfg</description></image><image><small_image_path>load/images/small/4.jpg</small_image_path><big_image_path>load/images/big/4.jpg</big_image_path><description>gsbhsg</description></image><image><small_image_path>load/images/small/4.jpg</small_image_path><big_image_path>load/images/big/4.jpg</big_image_path><description>gsbhsg</description></image><image><small_image_path>load/images/small/avatar.jpg</small_image_path><big_image_path>load/images/big/avatar.jpg</big_image_path><description></description></image></content>

Could it be that you are putting HTML code into the description field?
It could be useful to add a CDATA section instead of a text node:
$cdata = $dom->createCDATASection($res['description']);
$description->appendChild($cdata);
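For reference, a minimal sketch of what a CDATA section looks like once serialized; the description text here is made up and stands in for the database value.

```php
<?php
$dom = new DOMDocument('1.0');
$image = $dom->appendChild($dom->createElement('image'));
$description = $image->appendChild($dom->createElement('description'));

// Made-up description containing markup; stands in for $res['description'].
$description->appendChild($dom->createCDATASection('<b>bold</b> & more'));

// Serialize only the <image> subtree, without the XML declaration.
$out = $dom->saveXML($image);
// $out is: <image><description><![CDATA[<b>bold</b> & more]]></description></image>
```

The markup inside the CDATA section is carried through verbatim, whereas a text node would store the same string with < and & written as entities.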
