SimpleXMLElement won't save readable well-formatted file - php

Here's the code:
for($i = 0; $i < count(array_values($resources['titles'])); $i++){
//var_dump($key);
$ad = $xml->addChild('ad');
$ad->addChild('title', htmlentities(htmlspecialchars(substr($resources['titles'][$i], 0, 70))));
$ad->addChild('text', 'Текст текст');
$ad->addChild('price', htmlentities($resources['prices'][$i]));
//file_put_contents('test.txt',$resources['titles'][$i]."\n", FILE_APPEND);
}
$xml->asXML($this->_xmlOutput);
It saves all the data okay, but the XML file is not formatted well, and the Cyrillic characters (there are a lot of them) are turned into &#x447; (what is that code?). Also, the file is saved as ANSI, not UTF-8. So the question is: how do I properly create a well-formatted and readable XML document (with Cyrillic characters)?

Prefix the XML with the appropriate encoding declaration.
The first line of the XML should be:
<?xml version="1.0" encoding="UTF-8"?>
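With SimpleXML, one way to get that declaration is to create the root element from a string that already contains it. The serializer then writes Cyrillic text as UTF-8 bytes instead of numeric character references such as &#x447; (the hexadecimal reference for the Cyrillic letter ч). SimpleXML has no indentation option, so this sketch hands the finished document to DOMDocument for formatting. The root name `ads` and the sample strings are assumptions for illustration.

```php
<?php
// Root created with an explicit UTF-8 declaration, so the serializer writes
// Cyrillic text as UTF-8 bytes instead of numeric character references.
$xml = new SimpleXMLElement('<?xml version="1.0" encoding="UTF-8"?><ads/>');

$ad = $xml->addChild('ad');
// addChild() does not escape "&", so escape the text for XML first.
$ad->addChild('title', htmlspecialchars('Цена & доставка', ENT_XML1, 'UTF-8'));
$ad->addChild('text', 'Текст текст');

// SimpleXML cannot indent its output, so hand the document to DOM for formatting.
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->loadXML($xml->asXML());

$output = $dom->saveXML();   // $dom->save('ads.xml') would write the same text to a file
```

Note that htmlentities() is not needed here: htmlspecialchars() with ENT_XML1 escapes only what XML requires and leaves the Cyrillic letters alone.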

I found a better solution using DOMDocument. Here is the rewritten code inside the loop:
$node_ad = $xml->createElement('ad');
$node_ads->appendChild($node_ad);
// Let DOM do the escaping via a text node; mb_substr() keeps multibyte Cyrillic characters intact
$title = $xml->createElement('title');
$title->appendChild($xml->createTextNode(mb_substr($resources['titles'][$i], 0, 70, 'UTF-8')));
$node_ad->appendChild($title);
$text = $xml->createElement('text', 'Текст текст');
$node_ad->appendChild($text);
$images_node = $xml->createElement('images');
$node_ad->appendChild($images_node);
// URLs may contain "&", so they also go in through a text node
$image = $xml->createElement('image');
$image->appendChild($xml->createTextNode($this->_mainUrl.'/uploads/'.$resources['images'][$i]));
$images_node->appendChild($image);
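For context, here is a minimal self-contained sketch of the whole loop with DOMDocument, including the document setup that the snippet leaves out. The `$resources` contents and the root element name `ads` are hypothetical stand-ins for the question's data.

```php
<?php
// Hypothetical input shaped like the question's $resources array.
$resources = [
    'titles' => ['Продам велосипед', 'Сдам квартиру & гараж'],
    'prices' => ['1500', '20000'],
];

$xml = new DOMDocument('1.0', 'UTF-8');
$xml->formatOutput = true;   // indent nested elements when saving
$node_ads = $xml->appendChild($xml->createElement('ads'));

foreach ($resources['titles'] as $i => $rawTitle) {
    $node_ad = $node_ads->appendChild($xml->createElement('ad'));

    // createTextNode() escapes &, < and > itself, so no htmlentities() is needed;
    // mb_substr() cuts by characters, so a Cyrillic letter is never split in half.
    $title = $xml->createElement('title');
    $title->appendChild($xml->createTextNode(mb_substr($rawTitle, 0, 70, 'UTF-8')));
    $node_ad->appendChild($title);

    $price = $xml->createElement('price');
    $price->appendChild($xml->createTextNode($resources['prices'][$i]));
    $node_ad->appendChild($price);
}

// Raw UTF-8 output, no &#x...; references; $xml->save($path) writes the same to disk.
$output = $xml->saveXML();
```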

Related

PHP get base64 decoding xml array and encode it to pdf

I need to take an XML array (containing base64-encoded data) and decode it to a PDF.
This is the array:
<response>
<xmlArray>
<blabla>TG9yZW0gaXBzdW0gZG9sb3Igc2l0IGFtZXQsIGNvbnNlY3RldHVyIGFkaXBpc2NpbmcgZWxpdC4gTnVuYyB2ZW5lbmF0aXMsIGp
1c3RvIHV0IGF1Y3RvciBzZW1wZXIsIHB1cnVzIGxlY3R1cyBlbGVtZW50dW0gbGliZXJvLCBhYyBwZWxsZW50ZXNxdWUgYW50ZSBqdXN0b
yBldCB0dXJwaXMuIE51bmMgYmliZW5kdW0gZWdlc3RhcyBkb2xvciB2b2x1dHBhdCBlZ2VzdGFzLiBTdXNwZW5kaXNzZSBkYXBpYnVzIHN
lbSBvZGlvLCBpbiBmYXVjaWJ1cyBsZWN0dXMgcHVsdmluYXIgdml0YWUuIE5hbSBtYXR0aXMgZXVpc21vZCBhdWd1ZSwgZWdldCBmaW5pY
nVzIGxlbyBoZW5kcmVyaXQgbmVjLiBDbGFzcyBhcHRlbnQgdGFjaXRpIHNvY2lvc3F1IGFkIGxpdG9yYSB0b3JxdWVudCBwZXIgY29udWJ
pYSBub3N0cmEsIHBlciBpbmNlcHRvcyBoaW1lbmFlb3MuIE51bmMgaWQgbnVuYyBsZWN0dXMuIFBlbGxlbnRlc3F1ZSBsYWN1cyB1cm5hL
CB2aXZlcnJhIGNvbnZhbGxpcyBlZmZpY2l0dXIgdmVsLCBhbGlxdWFtIHNpdCBhbWV0IGRpYW0uIEluIGEgYWxpcXVldCBtYXNzYS4gU2V
udCwgdGVtcHVzIGhlbmRyZXJpdCBlbmltIGZhdWNpYnVzLiBEb25lYyBtYXR0aXMgZWxpdCBub24gbWFzc2EgaW50ZXJkdW0gZmF1Y2lid
XMuIEFlbmVhbiBub24gbWF1cmlzIGluIHVybmEgbWFsZXN1YWRhIGx1Y3R1cy4=
</blabla>
</xmlArray>
</response>
My attempt so far:
<?php
// 1st: get the base64 text out of the <blabla> element
$blabla2 = (string) $xml->xmlArray->blabla;
// remove CR, new lines and tabs from blabla2
$blabla3 = str_replace(array("\n", "\t", "\r"), '', $blabla2);
// 2nd: decode the base64 and output it as a PDF
header('Content-type: application/pdf');
$blabla4 = base64_decode($blabla3 );
echo $blabla4 ;
?>
The result is a PDF, but not what I expected: it shows this line in the PDF:
%PDF-1.410obj<</Title(þÿ)/Creator(þÿwkhtmltopdf0.12.2.4) ...
Would you mind telling me how to display the PDF properly?
Thanks a lot!
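A minimal sketch, not a confirmed fix for this question: it separates reading and decoding the payload from sending it, removes all whitespace (spaces included) before decoding in strict mode, and checks for the "%PDF-" signature so a wrong payload is caught before any header is sent. The function names extractPdf() and sendPdf() and the file name "document.pdf" are hypothetical; the element path follows the question's <response><xmlArray><blabla> layout.

```php
<?php
/**
 * Read the base64 payload from <response><xmlArray><blabla> and decode it.
 * Returns the raw PDF bytes, or null when the XML cannot be parsed, the payload
 * is not valid base64, or the result does not start with the "%PDF-" signature.
 */
function extractPdf(string $xmlString): ?string
{
    $xml = simplexml_load_string($xmlString);
    if ($xml === false) {
        return null;
    }
    // The base64 text is wrapped over several lines; drop every whitespace character.
    $b64 = preg_replace('/\s+/', '', (string) $xml->xmlArray->blabla);
    $bytes = base64_decode($b64, true);   // strict: reject characters outside the base64 alphabet
    if ($bytes === false || strncmp($bytes, '%PDF-', 5) !== 0) {
        return null;
    }
    return $bytes;
}

/** Send PDF bytes to the browser; must run before any other output. */
function sendPdf(string $bytes): void
{
    header('Content-Type: application/pdf');
    header('Content-Disposition: inline; filename="document.pdf"');
    header('Content-Length: ' . strlen($bytes));
    echo $bytes;
}
```

In a page you would call `$pdf = extractPdf($responseXml);` (where `$responseXml` is your XML string) and, only if it is not null, `sendPdf($pdf);`, making sure nothing, not even whitespace before `<?php`, has been output earlier.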

How can I use ucfirst() on a PHP SimpleXML node?

I use PHP and SimpleXML to parse a URL. I want to take the value of a SimpleXML node and change it. I convert it to a string first, but ucfirst() doesn't work on that string.
$xml = simplexml_load_file($url);
foreach($xml->offers->offer as $offer)
{
$bodyType = (string) $offer->{"body-type"}; //I convert simplexml to string first
echo ucfirst($bodyType); // In this line ucfirst doesn't work
}
How can I deal with this?
UPDATE: The problem was the Cyrillic letters: ucfirst() is not multibyte-aware, so it leaves UTF-8 Cyrillic letters unchanged.
A working solution is this code:
$bodyType = (string) $offer->{"body-type"};
$encoding='UTF-8';
$str = mb_ereg_replace('^[\ ]+', '', $bodyType);
$str = mb_strtoupper(mb_substr($str, 0, 1, $encoding), $encoding). mb_substr($str, 1, mb_strlen($str), $encoding);
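The same logic can be wrapped in a reusable function. This is a sketch: the name utf8_ucfirst() is hypothetical (PHP 8.4 adds a built-in mb_ucfirst(), so a custom name avoids a clash), and ltrim() replaces the mb_ereg_replace() call, which is safe because an ASCII space byte never occurs inside a multibyte UTF-8 sequence.

```php
<?php
/**
 * Multibyte-safe ucfirst(): trims leading spaces and upper-cases the first
 * character, leaving the rest of the string untouched.
 */
function utf8_ucfirst(string $str, string $encoding = 'UTF-8'): string
{
    $str = ltrim($str, ' ');
    $first = mb_substr($str, 0, 1, $encoding);
    $rest = mb_substr($str, 1, null, $encoding);   // null length = to the end
    return mb_strtoupper($first, $encoding) . $rest;
}

$xml = simplexml_load_string('<data><offers><offer><body-type>фургон</body-type></offer></offers></data>');
foreach ($xml->offers->offer as $offer) {
    echo utf8_ucfirst((string) $offer->{"body-type"}), "\n";   // prints "Фургон"
}
```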
Please share your XML file data as well. I used the following and it works fine:
<?xml version="1.0"?>
<data>
<offers>
<offer>
<body-type>offer 1</body-type>
</offer>
<offer>
<body-type>offer 2</body-type>
</offer>
</offers>
</data>
My output is:
Offer 1
Offer 2
HTML: Offer 1<br />Offer 2<br />
with the following PHP code:
<?PHP
$url = "test.xml";
$xml = simplexml_load_file($url);
foreach($xml->offers->offer as $offer)
{
$bodyType = (string) $offer->{"body-type"}; //I convert simplexml to string first
echo ucfirst($bodyType); // works here because the values are plain ASCII
echo '<br />';
}
?>
Given the test.xml from Farrukh's answer, you can actually even omit the typecasting. This works as well for me:
<?php
$url = "test.xml";
$xml = simplexml_load_file($url);
foreach($xml->offers->offer as $offer) {
echo ucfirst($offer->{"body-type"}) .'<br>';
}
Here's a live demo: http://codepad.viper-7.com/L4VwPL
UPDATE (after the OP provided the URL)
You most likely have an encoding issue. When I set the UTF-8 charset explicitly, it works as expected (otherwise the browser displays the UTF-8 strings returned by SimpleXML as garbage).
$url = "http://carsguru.net/x/used/exchange/4.xml";
$xml = simplexml_load_file($url);
header('Content-Type: text/html; charset=utf-8');
foreach($xml->offers->offer as $offer) {
echo ucfirst($offer->{"body-type"}) .'<br>';
}
When I run the above snippet, I get this output (truncated):
фургон
универсал
хэтчбек
хэтчбек
минивэн
минивэн
минивэн
седан
седан
универсал
хэтчбек
универсал
седан
хэтчбек
седан
NOTE: You don't serve a Content-Type/charset header for the XML! I'd add one.
Anyway, you may want to have a look at iconv(): iconv("cp1251", "UTF-8", $str);
Actually, the file encoding is Cyrillic Windows-1251, which probably makes sense.
Why? You can, of course, use valid UTF-8! Here is an example node from your XML converted with this cp1251-to-UTF-8 function (it might look odd, but it renders perfectly):
<?xml version="1.0" encoding="UTF-8"?>
<auto-catalog>
<creation-date>2013-02-07 02:00:08 GMT+4</creation-date>
<host>carsguru.net</host>
<offers>
<offer type="commercial">
<url>http://carsguru.net/used/5131406/view.html</url>
<date>2013-02-07</date>
<mark>ГАЗ</mark>
<model>2705</model>
<year>2003</year>
<seller-city>Санкт-Петербург</seller-city>
<seller-phone>8-921-997-74-06</seller-phone>
<price>150000</price>
<currency-type>RUR</currency-type>
<steering-wheel>левый</steering-wheel>
<run-metric>км</run-metric>
<run>194</run>
<displacement>2300</displacement>
<stock>в наличии</stock>
<state>Хорошее</state>
<color>синий</color>
<body-type>фургон</body-type>
<engine-type>бензин</engine-type>
<gear-type>задний</gear-type>
<transmission>ручная</transmission>
<horse-power>98</horse-power>
<image>http://carsguru.net/clf/03/af/9c/8b/used.4r9v39h31facog8cs0w0wk8ws.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/ae/51/be/3a/used.bxyc3q9mx80sko0wg80880w0k.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/28/dc/c1/d4/used.8i1b76l1b8o4cwg8gc08oos4s.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/55/3d/37/10/used.7dmn7puczuo0wo4cs8kko0cco.jpg.medium.jpg</image>
<image>http://carsguru.net/clf/49/02/15/54/used.7k8lhomw4j4s4040kssk4kgso.jpg.medium.jpg</image>
<equipment>Магнитола</equipment>
<equipment>Подогрев зеркал</equipment>
</offer>
</offers>
</auto-catalog>
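If you want to rewrite such a feed as UTF-8 yourself, here is a minimal sketch using the iconv() call mentioned above: it converts the bytes and then updates the encoding declaration so it matches the new bytes. cp1251XmlToUtf8() is a hypothetical name, and the sketch assumes the input really is Windows-1251. If the declaration has no encoding attribute, the string is returned converted but otherwise unchanged, which is fine because XML defaults to UTF-8.

```php
<?php
/**
 * Convert a Windows-1251 XML document to UTF-8 and fix its encoding declaration.
 */
function cp1251XmlToUtf8(string $raw): string
{
    $utf8 = iconv('CP1251', 'UTF-8', $raw);
    if ($utf8 === false) {
        throw new RuntimeException('iconv could not convert the input');
    }
    // The bytes are UTF-8 now, so the declaration has to say so as well.
    return preg_replace('/(<\?xml[^>]*encoding=)["\'][^"\']*["\']/i', '$1"UTF-8"', $utf8, 1);
}
```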

PHP htmlentities and saving the data in xml format

I'm trying to save some data into an XML file using the following PHP script:
<?php
$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';
$string = htmlentities($string, ENT_QUOTES, 'UTF-8');
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->preserveWhiteSpace = false;
$doc->formatOutput = true;
$root = $doc->createElement('top');
$root = $doc->appendChild($root);
$title = $doc->createElement('title');
$title = $root->appendChild($title);
$id = $doc->createAttribute('id');
$id->value = '1';
$text = $title->appendChild($id);
$text = $doc->createTextNode($string);
$text = $title->appendChild($text);
$doc->save('data.xml');
echo 'data saved!';
?>
I'm using htmlentities() to translate the whole string into HTML format; if I leave it out, the special characters won't be translated into HTML. This is the output:
<?xml version="1.0" encoding="UTF-8"?>
<top>
<title id="1">&amp;lt;a href=&amp;quot;google.com/maps&amp;quot;&amp;gt;Go to google maps&amp;lt;/a&amp;gt; and some special characters &amp;euml; &amp;egrave; &amp;amp; &amp;auml; etc.</title>
</top>
The characters of the HTML tags get a double HTML code (&amp;lt;), and an ampersand becomes &amp;amp;.
Is this normal behavior? How can I prevent this from happening? It looks like double encoding.
Try removing this line:
$string = htmlentities($string, ENT_QUOTES, 'UTF-8');
because the text passed to createTextNode() is escaped anyway.
Update:
If you want the UTF-8 characters to be escaped, you could keep that line and pass $string directly to createElement().
For example:
$title = $doc->createElement('title', $string);
$title = $root->appendChild($title);
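The PHP documentation says that the value passed to createElement() is used almost verbatim: only < and > are escaped, and & is treated as the start of an entity reference. I haven't tried it, but it should work.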
It is htmlentities() that turns a & into &amp;.
When working with XML data you should not use htmlentities(), as DOMDocument expects a raw & (not &amp;) and escapes it itself.
As of PHP 5.4, the default encoding of htmlentities() is UTF-8, so there is no need to pass 'UTF-8' explicitly.
This line:
$string = htmlentities($string, ENT_QUOTES, 'UTF-8');
… encodes a string as HTML.
This line:
$text = $doc->createTextNode($string);
… encodes your string of HTML as XML.
This gives you an XML representation of an HTML string. When the XML is parsed you get the HTML back.
how can I prevent this from happening?
If your goal is to store some text in an XML document, remove the line that encodes it as HTML.
Looks like a double encoding.
Pretty much. It is encoded twice, it just uses different (albeit very similar) encoding methods for each of the two passes.
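A minimal sketch of the script without the htmlentities() line, reusing the question's element names: the string goes into the text node as-is, DOMDocument escapes it exactly once, and parsing the file back returns the original string rather than HTML entities. The round-trip check at the end is added only for illustration.

```php
<?php
$string = '<a href="google.com/maps">Go to google maps</a> and some special characters ë è & ä etc.';

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;
$root = $doc->appendChild($doc->createElement('top'));
$title = $root->appendChild($doc->createElement('title'));
$title->setAttribute('id', '1');
// No htmlentities(): createTextNode() performs the XML escaping exactly once.
$title->appendChild($doc->createTextNode($string));

$xmlString = $doc->saveXML();   // or $doc->save('data.xml')
echo $xmlString;

// Parsing the XML back gives the original string, not HTML entities.
$check = new DOMDocument();
$check->loadXML($xmlString);
var_dump($check->getElementsByTagName('title')->item(0)->textContent === $string);   // bool(true)
```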

What is an alternative way to load HTML, strip out the JavaScript and put it in an array for later use?

I am using the following code to strip JavaScript out of an HTML DOM string and put it into an array for later use.
What would be a good alternative?
My problem:
I am having problems with Unicode inside the file. When files containing Unicode are parsed, the following error is generated:
Warning: DOMDocument::saveHTML() [domdocument.savehtml]: output
conversion failed due to conv error, bytes 0x97 0xC3 0xA0 0xC2 in
My code:
function loadJSCodeToLast( $strDOM ){
//Find all the <script></script> code and add to $objApp
global $objApp;
$objDOM = new DOMDocument();
//$x = new DOMImplementation();
//$doc = $x->createDocument(NULL,"rootElementName");
//$strDOM = '<kool>'.$strDOM.'</kool>';
$objDOM->preserveWhiteSpace = false;
//$objDOM->formatOutput = true;
$objDOM->loadHTML( $strDOM );
$xpath = new DOMXPath($objDOM);
$objScripts = $xpath->query('//script');
$totCount = $objScripts->length;
if ($totCount > 0) {
//document contains script tags
foreach($objScripts as $entries){
$strSrc = $entries->getAttribute('src');
if( $strSrc !== ''){
$objApp->AddJSFile( $strSrc );
}else{
$objApp->AddJSScript( $entries->nodeValue );
}
$entries->parentNode->removeChild( $entries );
}
}
//return $objDOM->saveHTML();
//echo $GLOBALS['strTemplateDirAbs'];
return preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $objDOM->saveHTML()));
}
Try converting your string with utf8_encode() before loading it (note that utf8_encode() assumes the input is ISO-8859-1, and it is deprecated as of PHP 8.2):
$txt = utf8_encode($txt);
var_dump(loadJSCodeToLast($txt));
The XML parser converts the text of an XML document into UTF-8, even if you have set the character encoding of the XML, for example as the second parameter of the DOMDocument constructor. After parsing the XML with the load() command, all its texts have been converted to UTF-8. In case you append text nodes with special characters (e.g. umlauts) to your XML document, you should therefore use utf8_encode() on your text to convert it into UTF-8 before you append it to the document. Otherwise you will get an error message like "output conversion failed due to conv error" at the save() call.
From the DOMDocument::save() documentation comments.
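A sketch of an alternative, under stated assumptions: the input is a UTF-8 HTML fragment, the function returns the stripped HTML plus two arrays instead of calling the question's global $objApp, and loadHTML() is told the input is UTF-8 by prepending a charset <meta> tag (a common workaround for the parser's ISO-8859-1 default, not the only one). extractScripts() is a hypothetical name.

```php
<?php
/**
 * Remove <script> elements from a UTF-8 HTML fragment.
 * Returns [html without scripts, list of src URLs, list of inline script bodies].
 */
function extractScripts(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);   // tolerate sloppy real-world HTML
    // Without a charset hint loadHTML() reads the bytes as ISO-8859-1.
    $dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $files = [];
    $inline = [];
    // An XPath result is a snapshot, so removing nodes while looping is safe.
    foreach ((new DOMXPath($dom))->query('//script') as $script) {
        $src = $script->getAttribute('src');
        if ($src !== '') {
            $files[] = $src;
        } else {
            $inline[] = $script->nodeValue;
        }
        $script->parentNode->removeChild($script);
    }

    // Serialize only the children of <body>: no DOCTYPE, <html> or <body> wrapper.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $out .= $dom->saveHTML($child);
        }
    }
    return [$out, $files, $inline];
}

[$html, $files, $inline] = extractScripts('<p>Привет, мир</p><script src="a.js"></script><script>var x = "ä";</script>');
```

Returning plain arrays keeps the function free of global state; the caller can still pass `$files` and `$inline` to its own AddJSFile()/AddJSScript() methods afterwards.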

PHP SimpleXML doesn't preserve line breaks in XML attributes

I have to parse externally provided XML that has attributes with line breaks in them. Using SimpleXML, the line breaks seem to be lost. According to another Stack Overflow question, line breaks are valid in XML attributes (even though far from ideal!).
Why are they lost, and how can I preserve them?
Here is a demo script (note that line breaks that are not inside an attribute are preserved).
PHP File with embedded XML
$xml = <<<XML
<?xml version="1.0" encoding="utf-8"?>
<Rows>
<data Title='Data Title' Remarks='First line of the row.
Followed by the second line.
Even a third!' />
<data Title='Full Title' Remarks='None really'>First line of the row.
Followed by the second line.
Even a third!</data>
</Rows>
XML;
$xml = new SimpleXMLElement( $xml );
print '<pre>'; print_r($xml); print '</pre>';
Output from print_r
SimpleXMLElement Object
(
[data] => Array
(
[0] => SimpleXMLElement Object
(
[#attributes] => Array
(
[Title] => Data Title
[Remarks] => First line of the row. Followed by the second line. Even a third!
)
)
[1] => First line of the row.
Followed by the second line.
Even a third!
)
)
Using SimpleXML, the line breaks seem to be lost.
Yes, that is expected... in fact it is required of any conformant XML parser that newlines in attribute values represent simple spaces. See attribute value normalisation in the XML spec.
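To see that normalisation rule in action, here is a small sketch comparing a raw newline with the &#10; character reference inside an attribute; the element and attribute names are made up for the demo.

```php
<?php
// A raw newline inside an attribute value is normalised to a space,
// while the character reference &#10; survives as a real line feed.
$raw = "<r a='line one\nline two'/>";
$ref = "<r a='line one&#10;line two'/>";

$rawValue = (string) simplexml_load_string($raw)['a'];
$refValue = (string) simplexml_load_string($ref)['a'];

var_dump($rawValue);   // string(17) "line one line two"
var_dump($refValue);   // string(17) with a real line feed between "one" and "line"
```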
If there was supposed to be a real newline character in the attribute value, the XML should have included a &#10; character reference instead of a raw newline.
The character reference for a new line is &#10;. I played with your code until I found something that did the trick. It's not very elegant, I warn you:
//First remove any indentation at the start of each line:
$xml = preg_replace('/^[ \t]+/m', '', $xml);
//Next unify all new-lines into Unix LF:
$xml = str_replace(array("\r\n", "\r"), "\n", $xml);
//Next replace all new lines with the character reference:
$xml = str_replace("\n", "&#10;", $xml);
//Finally, turn any new-line references between > and < back into real new lines:
$xml = str_replace(">&#10;<", ">\n<", $xml);
The assumption, based on your example, is that any new lines that occur inside a node or attribute will have more text on the next line, not a < to open a new element.
This of course would fail if your next line had some text that was wrapped in a line-level element.
Assuming $xmlData is your XML string before it is sent to the parser, this should replace all newlines in attributes with the correct entity. I had the issue with XML coming from SQL Server.
$parts = explode("<", $xmlData); //split over <
array_shift($parts); //remove the blank array element
$newParts = array(); //create array for storing new parts
foreach($parts as $p)
{
list($attr,$other) = explode(">", $p, 2); //get attribute data into $attr
$attr = str_replace("\r\n", "&#10;", $attr); //do the replacement
$newParts[] = $attr.">".$other; // put parts back together
}
$xmlData = "<".implode("<", $newParts); // put parts back together prefixing with <
Probably can be done more simply with a regex, but that's not a strong point for me.
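One regex-based way to do it, as a sketch: an outer pattern finds each tag, an inner pattern finds each quoted attribute value inside it, and every line break in those values is replaced with &#10;. It assumes attribute values never contain < or >, and that no comments or CDATA sections contain tag-like text with quoted line breaks; escapeAttributeNewlines() is a hypothetical name.

```php
<?php
/**
 * Escape raw line breaks inside quoted attribute values so the parser keeps them.
 */
function escapeAttributeNewlines(string $xml): string
{
    // Outer pattern: one tag at a time (newlines inside a tag are matched too).
    return preg_replace_callback('/<[^<>]+>/', function (array $tag) {
        // Inner pattern: each double- or single-quoted value inside that tag.
        return preg_replace_callback('/"[^"]*"|\'[^\']*\'/', function (array $value) {
            return str_replace(["\r\n", "\r", "\n"], '&#10;', $value[0]);
        }, $tag[0]);
    }, $xml);
}

$xml = <<<XML
<Rows>
<data Title='Data Title' Remarks='First line.
Second line.' />
</Rows>
XML;

$rows = new SimpleXMLElement(escapeAttributeNewlines($xml));
echo $rows->data['Remarks'];   // the two lines come out separately
```

Text between elements is left alone, because only the characters inside quoted values within a tag are touched.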
Here is code to replace the new lines with the appropriate character reference in that particular XML fragment. Run this code prior to parsing.
$replaceFunction = function ($matches) {
return str_replace("\n", "&#10;", $matches[0]);
};
$xml = preg_replace_callback(
"/<data Title='[^']+' Remarks='[^']+'/i",
$replaceFunction, $xml);
This is what worked for me:
First, get the xml as a string:
$xml = file_get_contents($urlXml);
Then do the replacement:
$xml = str_replace(".\xe2\x80\xa9<as:eol/>",".\n\n<as:eol/>",$xml);
The "." and "<as:eol/>" were there because I needed to add breaks in that case. The new lines ("\n") can be replaced with whatever you like.
After replacing, just load the xml-string as a SimpleXMLElement object:
$xmlo = new SimpleXMLElement( $xml );
Et Voilà
Well, this question is old, but like me, someone might come to this page eventually.
I had a slightly different approach, which I think is the most elegant of the ones mentioned.
Inside the XML, put some unique marker that you will use for a new line.
Change the XML to:
<data Title='Data Title' Remarks='First line of the row. \n
Followed by the second line. \n
Even a third!' />
Then, when you have the desired node's value from SimpleXML as a string in $output, write something like this:
$findme = '\n'; // the literal two-character marker, not a real line feed
if (strpos($output, $findme) !== false)
{
$output = str_replace($findme, '<br/>', $output);
}
It doesn't have to be '\n'; it can be any unique marker.
