PHP XML DOM translates special chars to &#xYY;

I send this with AJAX POST:
<li><ul class="zone zCentral ui-sortable"><li><ul class="region rCol3 ui-sortable"><li class="" style=""><div><span class="tc tc_video">574081</span> <span>video: 'Mundo.Hoy': ¿Dónde habré olvidado... mi memoria?</span></div></li></ul></li></ul></li>
I do this to create XML:
header('Content-type: text/html; charset=utf-8');
if(isset($_POST) && isset($_POST['data']))
{
$data = '<ul id="zone_container" class="ui-sortable">';
$data .= $_POST['data'];
$data .= '</ul>';
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML($data);
echo $dom->saveXML();
exit();
}
and I get this:
<?xml version="1.0"?>
<ul id="zone_container" class="ui-sortable">
<li><ul class="zone zCentral ui-sortable"><li><ul class="region rCol3 ui-sortable"><li class="" style=""><div><span class="tc tc_video">574081</span> <span>video: 'Mundo.Hoy': ¿Dónde habré olvidado... mi memoria?</span></div> </li></ul></li></ul></li></ul>
¿Dónde habré olvidado... mi memoria?
translates to:
¿Dónde habr&#xE9; olvidado... mi memoria?
I need the original characters in the XML. They are valid UTF-8, and I don't know why they are being encoded :(

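loadXML() discards the encoding passed to the constructor, and when the loaded XML declares no encoding, saveXML() writes non-ASCII characters as numeric character references. The easiest way to fix this is to set the encoding after you have loaded the XML: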
$dom = new DOMDocument();
$dom->loadXML($data);
$dom->encoding = 'utf-8';
echo $dom->saveXML();
exit();
You can also fix it by putting an XML declaration at the beginning of your data:
$data = '<?xml version="1.0" encoding="utf-8"?>' . $data;
$dom = new DOMDocument();
$dom->loadXML($data);
echo $dom->saveXML();
exit();

I solved it with this:
header('Content-type: text/html; charset=utf-8');
if(isset($_POST) && isset($_POST['data']))
{
$data = '<?xml version="1.0" encoding="utf-8"?>';
$data .= '<ul id="zone_container" class="ui-sortable">';
$data .= $_POST['data'];
$data .= '</ul>';
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadXML($data);
echo $dom->saveXML();
exit();
}
by adding:
$data = '<?xml version="1.0" encoding="utf-8"?>';
to the beginning of the XML.
Thanks for the responses :)

Related

Creating an XML file from PHP

I am trying to create XML files via PHP for Google's Merchant Center.
I have previously done the same thing to create a sitemap, but now I am having issues.
$xmlString = '<?xml version="1.0" encoding="UTF-8"?>';
$xmlString .= '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
$xmlString .= '<url>';
$xmlString .= '<loc>example.com</loc>';
$xmlString .= '<lastmod>'.date(DATE_ATOM,time()).'</lastmod>';
$xmlString .= '<changefreq>daily</changefreq>';
$xmlString .= '<priority>1.0</priority>';
$xmlString .= '</url>';
$xmlString .= '</urlset>';
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($xmlString);
$dom->save('../sitemap.xml');
This is more or less what I have to create my sitemap except I am obviously creating more URLs by querying my database.
But then I do more or less the exact same thing for my product feed but it does not work.
$xmlString .= '<?xml version="1.0" encoding="UTF-8"?>';
$xmlString .= '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">';
$xmlString .= '<channel>';
$xmlString .= '<title>Product Feed</title>';
$xmlString .= '<link>https://www.example.com/</link>';
$xmlString .= '<description>Product Feed</description>';
$xmlString .= '<item>';
$xmlString .= '<g:id>'.$product_id.'</g:id>';
$xmlString .= '<title>'.$product_name.'</title>';
$xmlString .= '<description>'.$product_description.'</description>';
$xmlString .= '<link>https://www.example.com/product/'.$product_url.'</link>';
$xmlString .= '<g:condition>new</g:condition>';
$xmlString .= '<g:price>'.$rounded_lowest_sale_price.'</g:price>';
$xmlString .= '<g:availability>in stock</g:availability>';
$xmlString .= '<g:image_link>https://www.example.com/media/product_images/'.$first_image.'</g:image_link>';
$xmlString .= '<g:mpn>'.$model_no.'</g:mpn>';
$xmlString .= '<g:brand>'.$brand.'</g:brand>';
$xmlString .= '<g:google_product_category>Business & Industrial > Material Handling</g:google_product_category>';
$xmlString .= '</item>';
}
$xmlString .= '</channel>';
$xmlString .= '</rss>';
echo $xmlString;
$dom = new DOMDocument;
$dom->preserveWhiteSpace = FALSE;
$dom->loadXML($xmlString);
$dom->save('../product_feeds/product_feed_int.xml');
It saves the XML file but all it contains is:
<?xml version="1.0"?>
without even the UTF-8 encoding.
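If you don't need DOMDocument to create the XML, you can write your string directly to a file:
file_put_contents('../product_feeds/product_feed_int.xml', $xmlString);
Also, use &amp; instead of a bare & in:
Business & Industrial
A bare & makes the string not well-formed, so loadXML() fails and save() writes an empty document.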
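As a minimal sketch of the escaping advice above: values that come from a database can contain &, < or >, so one option is to escape each value before concatenating it. The helper name xmlText() and the trimmed-down feed are assumptions made for illustration; htmlspecialchars() with ENT_XML1 is the standard PHP call.

```php
<?php
// Hypothetical helper (not from the original post): escape a value
// so it is safe to use as XML element text.
function xmlText(string $value): string
{
    return htmlspecialchars($value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

$category = 'Business & Industrial > Material Handling';

// Trimmed-down feed, just enough to show the escaped value in place.
$xmlString  = '<?xml version="1.0" encoding="UTF-8"?>';
$xmlString .= '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0"><channel><item>';
$xmlString .= '<g:google_product_category>' . xmlText($category) . '</g:google_product_category>';
$xmlString .= '</item></channel></rss>';

$dom = new DOMDocument();
// loadXML() returns false when the string is not well-formed.
$loaded = $dom->loadXML($xmlString);
```

Checking the return value of loadXML() before calling save() also makes this kind of failure visible instead of silently writing an empty file.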
This might help:
<?php
$xmldoc = new DOMDocument();
$xmldoc->encoding = 'utf-8';
$xmldoc->xmlVersion = '1.0';
$xmldoc->formatOutput = true;
$xml_file_name = 'xmlfile.xml';
$root = $xmldoc->createElement('Cars');
$car_node = $xmldoc->createElement('car');
$attr_movie_id = new DOMAttr('car_model', 'ritz');
$car_node->setAttributeNode($attr_movie_id);
$child_node_title = $xmldoc->createElement('Make', 'Maruti');
$car_node->appendChild($child_node_title);
$child_node_year = $xmldoc->createElement('Year', 2012);
$car_node->appendChild($child_node_year);
$child_node_genre = $xmldoc->createElement('Type', 'Hatchback');
$car_node->appendChild($child_node_genre);
$child_node_ratings = $xmldoc->createElement('Ratings', 6.2);
$car_node->appendChild($child_node_ratings);
$root->appendChild($car_node);
$xmldoc->appendChild($root);
$xmldoc->save($xml_file_name);
echo "$xml_file_name created";
?>

Save XML file in minified form using DOMDocument

I want to save space by saving the XML file in minified form.
For example:
<body>
<div>
<p>hello</p>
<div/>
</div>
</body>
It should be saved like this:
<body><div><p>hello</p><div/></div></body>
I'm using DOMDocument to create the XML file like this:
$xml = new DOMDocument("1.0", "UTF-8");
$xml->preserveWhiteSpace = false;
$xml->formatOutput = false;
$feed = $xml->createElement("feed");
$feed = $xml->appendChild($feed);
/*add attribute*/
$feed_attribute = $xml->createAttribute('xmlns:xsi');
$feed_attribute->value = 'http://www.w3.org/2001/XMLSchema-instance';
$feed->appendChild($feed_attribute);
$aggregator = $xml->createElement("aggregator");
$aggregator = $feed->appendChild($aggregator);
$name = $xml->createElement('name', 'test.com');
$aggregator->appendChild($name);
...etc
$xml->save(public_path() .$string, LIBXML_NOEMPTYTAG);
You're already using the right options: DOMDocument::$formatOutput and DOMDocument::$preserveWhiteSpace.
Format Output
DOMDocument::$formatOutput adds indentation whitespace nodes to an XML DOM if saved. (It is disabled by default.)
$document = new DOMDocument();
$body = $document->appendChild($document->createElement('body'));
$div = $body->appendChild($document->createElement('div'));
$div
->appendChild($document->createElement('p'))
->appendChild($document->createTextNode('hello'));
echo "Not Formatted:\n", $document->saveXML();
$document->formatOutput = TRUE;
echo "\nFormatted:\n", $document->saveXML();
Output:
Not Formatted:
<?xml version="1.0"?>
<body><div><p>hello</p></div></body>
Formatted:
<?xml version="1.0"?>
<body>
<div>
<p>hello</p>
</div>
</body>
However, it does not indent if there are text child nodes. It tries to avoid changes to the text output of an HTML/XML document, so it will usually not reformat a loaded document that already contains indentation whitespace nodes.
Preserve White Space
DOMDocument::$preserveWhiteSpace is an option for the parser. If disabled (it is enabled by default), the parser ignores any text node that would consist only of whitespace. Indentations are text nodes with a line break and some spaces or tabs, so this option can be used to remove indentation from an XML document.
$xml = <<<'XML'
<?xml version="1.0"?>
<body>
<div>
<p>hello</p>
</div>
</body>
XML;
$document = new DOMDocument();
$document->preserveWhiteSpace = FALSE;
$document->loadXML($xml);
echo $document->saveXML();
Output:
<?xml version="1.0"?>
<body><div><p>hello</p></div></body>
Try this. You have to use saveXML() instead of save():
<?php
$xml = new DOMDocument('1.0');
$xml->preserveWhiteSpace = false;
$xml->formatOutput = false;
$root = $xml->createElement('book');
$root = $xml->appendChild($root);
$title = $xml->createElement('title');
$title = $root->appendChild($title);
$text = $xml->createTextNode("This is the \n title");
$text = $title->appendChild($text);
echo "Saving all the document:\n";
$xml_content = $xml->saveXML();
echo $xml_content . "\n";
$xml_content = str_replace(array(">\n", ">\t"), '>', trim($xml_content, "\n"));
echo $xml_content . "\n";
// Write the contents back to the file
$filename = "/tmp/xml_minified.xml";
file_put_contents($filename, $xml_content);
?>

PHP DOM Document: How to replace or change the node "default:link" into "xhtml:link" in the xml?

I tried to remove the attribute "xmlns:xhtml" from the tag "xhtml:link" with the following code:
Source Code:
$doc = new DOMDocument('1.0', 'utf-8');
$url = 'android-app://com.domain.name';
$element = $doc->createElementNS($url,'xhtml:link');
$attribute = $doc->childNodes->item(0);
//echo '<br>tag: '.$doc->getElementsByTagName("xhtml:link")[0];
$element->setAttribute('href', $url);
$element->setAttribute('rel', 'alternate');
//echo '<pre>';print_r($element);echo '</pre>';
$element->hasAttributeNS($url, 'xhtml');
$element->removeAttributeNS($url, 'xhtml');
$doc->appendChild($element);
echo $doc->saveXML();
Output:
<?xml version="1.0" encoding="utf-8"?>
<default:link href="android-app://com.domain.name" rel="alternate"/>
But I expect the output to look like this:
<?xml version="1.0" encoding="utf-8"?>
<xhtml:link href="android-app://com.domain.name" rel="alternate"/>
What do I have to do? I am stuck on replacing the tag...
Thanks!
Try the createElement function instead of createElementNS:
$doc = new DOMDocument('1.0', 'utf-8');
$url = 'android-app://com.domain.name';
$element = $doc->createElement('xhtml:link');
$attribute = $doc->childNodes->item(0);
$element->setAttribute('href', $url);
$element->setAttribute('rel', 'alternate');
$doc->appendChild($element);
echo $doc->saveXML();
OUTPUT:
<?xml version="1.0" encoding="utf-8"?>
<xhtml:link href="android-app://com.domain.name" rel="alternate"/>
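A namespace-aware alternative, as a sketch only: createElement('xhtml:link') produces the prefix without binding it to any namespace. If the link is meant to be an XHTML element (an assumption here; the question never states the intended namespace), the XHTML namespace URI can be passed to createElementNS() and the prefix declared once on a parent element. The <url> parent below is hypothetical.

```php
<?php
// Assumption: the xhtml prefix is meant to bind to the XHTML namespace.
$xhtmlNs = 'http://www.w3.org/1999/xhtml';

$doc = new DOMDocument('1.0', 'utf-8');

// Hypothetical parent element that carries the namespace declaration.
$root = $doc->appendChild($doc->createElement('url'));
$root->setAttributeNS('http://www.w3.org/2000/xmlns/', 'xmlns:xhtml', $xhtmlNs);

// The first argument is the namespace URI, not the href value.
$link = $doc->createElementNS($xhtmlNs, 'xhtml:link');
$link->setAttribute('rel', 'alternate');
$link->setAttribute('href', 'android-app://com.domain.name');
$root->appendChild($link);

$out = $doc->saveXML();
echo $out;
```

In the question, the android-app URL was passed as the namespace URI, which is why removing the generated xmlns:xhtml declaration left the element without a usable prefix.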

Format PHP simpleXML file

I'm writing simple PHP code to generate a sitemap.xml file, but the problem is that the file is unformatted. If the file were smaller, I could look at it in a browser, but it is about 1.5 MB and Chrome just gives up while loading it (IE loads it and you can view it, but it lags badly).
I googled and tried different solutions (even from this site), but none of them worked, so I'm asking for your help. I'm using the newest PHP version and my server runs on a Unix-based OS, if that's important.
Here is the code I'm using (there are 15,000+ rows in the table, so the XML file is pretty long):
$xml = '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">';
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/</loc>";
$xml .= "<changefreq>always</changefreq>";
$xml .= "<priority>1.00</priority>";
$xml .= "</url>";
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/upload.php</loc>";
$xml .= "<changefreq>weekly</changefreq>";
$xml .= "<priority>0.80</priority>";
$xml .= "</url>";
$select = dbquery("SELECT * FROM MY TABLE");
while ($data = dbarray($select)) {
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/?id=".$data['id']."</loc>";
$xml .= "<changefreq>daily</changefreq>";
$xml .= "<priority>0.50</priority>";
$xml .= "</url>";
}
$xml .= '</urlset>';
$sxml = new SimpleXMLElement($xml);
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML($sxml->asXML());
$dom->saveXML("sitemap.xml");
Also, every time a new page is added, I run this code to append it to the sitemap file, and it appends everything on one line, so I need to fix that too.
I'm trying both pieces of code separately, so I'm sure the first one doesn't format the document at all.
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$sitemap->asXML("sitemap.xml");
By unreadable I mean it's saved on one line like this:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url><loc>MY.URL.COM?id=1</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=2</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=5</loc><changefreq>daily</changefreq><priority>0.50</priority></url><url><loc>MY.URL.COM?id=7</loc><changefreq>daily</changefreq><priority>0.50</priority></url></urlset>
Instead of:
<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
<url>
<loc>MY.URL.COM?id=1</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=2</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=5</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
<url>
<loc>MY.URL.COM?id=7</loc>
<changefreq>daily</changefreq>
<priority>0.50</priority>
</url>
</urlset>
WORKING CODE:
$xml = "<urlset xmlns='http://www.sitemaps.org/schemas/sitemap/0.9' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xsi:schemaLocation='http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd'>";
$select = dbquery("SELECT * FROM MY TABLE");
while ($data = dbarray($select)) {
$xml .= "<url>";
$xml .= "<loc>MY.URL.COM/?id=".$data['id']."</loc>";
$xml .= "<changefreq>daily</changefreq>";
$xml .= "<priority>0.50</priority>";
$xml .= "</url>"; // close each <url> element, otherwise the XML is not well-formed
}
$xml .= '</urlset>';
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml);
$dom->save("sitemap.xml");
Code to append to existing xml file:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($sitemap->asXML());
$dom->save('sitemap.xml');
Edit Start
//you don't need simplexml
// $sxml = new SimpleXMLElement($xml);
$dom = new DOMDocument('1.0');
$dom->formatOutput = true;
$dom->loadXML($xml); // can use the xml string
$dom->save("sitemap.xml"); // need to use save() rather than saveXML
Edit End
The code you wrote first will format the document properly once it calls save(). The problem arises when you add a new element/node to sitemap.xml using SimpleXML.
If you want properly formatted XML, you need to save it through DOMDocument every time, the same way you do initially.
What you are doing
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
$sitemap->asXML("sitemap.xml");
Try changing it to this:
$sitemap = simplexml_load_file("sitemap.xml");
$url = $sitemap->addChild('url');
$url->addChild("loc", "MY.URL.COM/?id=".$get_id['id']);
$url->addChild("changefreq", "daily");
$url->addChild("priority", "0.50");
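$dom = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false; // drop whitespace-only nodes so formatOutput can re-indent
$dom->formatOutput = true;
$dom->loadXML($sitemap->asXML());
$dom->save('sitemap.xml'); // save() writes to a file; saveXML() only returns a string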
Hope it helps.
One other thing to check: if the server running your code is Unix/Linux, it will use Unix line endings (LF), which some Windows editors or browsers may not display correctly because they expect CR-LF.

Cannot parse CDATA with SimpleXML

I have the following XML code:
<para>
<![CDATA[
<?php
$data = '<?xml version="1.0"?>
<root>content</root>';
$sxe = simplexml_load_string($data);
var_dump($sxe);
?>
]]>
</para>
I want to parse the CDATA section to take this result:
Content:
<?php
$data = '<?xml version="1.0"?>
<root>content</root>';
$sxe = simplexml_load_string($data);
var_dump($sxe);
?>
I use the following but it doesn't work.
$xml='sxml.xml';
$book = simplexml_load_file($xml, 'SimpleXMLElement', LIBXML_NOCDATA);
$para = $book->chapter->para[1];
print "Content: ".$para."<br>";
foreach($para AS $node) {
print "Iter Content: ".$node."<br>";
}
This results in:
Content:
content';
$sxe = simplexml_load_string($data);
var_dump($sxe);
?>
In phpinfo() it shows that libxml is enabled and active.
Any help? Thanks in advance
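The CDATA content is parsed correctly, but the browser interprets the < and > characters in it as HTML markup, so part of the text is hidden. You should encode the special characters using htmlspecialchars.
For example: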
print "Content: ".htmlspecialchars($para)."<br>";
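A self-contained sketch of the same idea, using a shortened inline document instead of the asker's sxml.xml (the sample XML and variable names are assumptions for illustration):

```php
<?php
// Shortened stand-in for the asker's file: one <para> holding a CDATA section.
$xml = <<<'XML'
<para><![CDATA[<root>content</root>]]></para>
XML;

$para = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// Casting the element to string returns the CDATA text unchanged.
$raw = (string) $para;

// Escape it before printing into an HTML page so the browser shows the tags.
$safe = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

print "Content: " . $safe . "<br>";
```

Viewing the page source, or running the script from the command line, is a quick way to confirm that the unescaped text was complete all along.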
