I have an XML file to parse with PHP that contains some umlaut characters.
Every node that contains a string has the string wrapped in a CDATA section, but my problem starts before parsing the XML: when I load the file (I've also tried printing the file's contents with file_get_contents(), with the same result), the umlaut characters get broken, so for example ü becomes Ã¼. Running htmlentities() is futile, as the characters are already broken at that point. The XML encoding is UTF-8, so I don't know what else to do to avoid this problem. Can anyone help me?
Edit:
XML sample (locations.xml):
<?xml version="1.0" encoding="utf-8"?>
<locations>
<location>
<id>481</id>
<city><![CDATA[Zürich]]></city>
</location>
</locations>
PHP code:
function parseLocations(){
    $xml = new DOMDocument();
    // preserveWhiteSpace only takes effect if it is set before load()
    $xml->preserveWhiteSpace = false;
    $xml->load('locations.xml');
    $data = array();
    $locations = $xml->childNodes->item(0);
    for($i=0; $i<$locations->childNodes->length; $i++){
        $location = $locations->childNodes->item($i);
        if($location->nodeName=="location"){
            $tmp = parseVenue($location);
            $data[] = $tmp;
        }
    }
    echo var_export($data, true);
}
function parseVenue($location){
    //I need to exclude some of the nodes
    $exclude = array('#text');
    $data = array();
    for($i=0; $i<$location->childNodes->length; $i++){
        $tag = $location->childNodes->item($i);
        if(!in_array($tag->nodeName, $exclude)){
            $data[$tag->nodeName] = $tag->nodeValue;
        }
    }
    return $data;
}
Echoed output:
array ( 0 => array ( 'id' => '481', 'city' => 'ZÃ¼rich'), )
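When ü shows up as Ã¼, the bytes are usually correct UTF-8 (C3 BC) that something downstream reads as ISO-8859-1 or Windows-1252, most often a browser receiving the page without a UTF-8 charset. DOMDocument returns node values as UTF-8, so the parsing itself is likely fine. The sketch below assumes that cause; parse_locations() is a hypothetical rewrite of the functions above that takes the XML as a string, sets preserveWhiteSpace before loading, and skips non-element nodes by type rather than by name.

```php
<?php
// Hypothetical rewrite of parseLocations()/parseVenue() that works on an XML string.
function parse_locations(string $xmlString): array
{
    $doc = new DOMDocument();
    $doc->preserveWhiteSpace = false; // only has an effect if set before loading
    $doc->loadXML($xmlString);

    $data = [];
    foreach ($doc->documentElement->childNodes as $location) {
        if ($location->nodeType !== XML_ELEMENT_NODE || $location->nodeName !== 'location') {
            continue;
        }
        $row = [];
        foreach ($location->childNodes as $tag) {
            // Only element children are fields; text and whitespace nodes are skipped.
            if ($tag->nodeType === XML_ELEMENT_NODE) {
                $row[$tag->nodeName] = $tag->nodeValue; // already a UTF-8 string
            }
        }
        $data[] = $row;
    }
    return $data;
}

$xml = '<?xml version="1.0" encoding="utf-8"?>
<locations>
<location>
<id>481</id>
<city><![CDATA[Zürich]]></city>
</location>
</locations>';

$locations = parse_locations($xml);
echo var_export($locations, true), "\n";
```

If the result is sent to a browser, calling header('Content-Type: text/html; charset=utf-8'); before the first echo should make it display Zürich correctly, assuming the cause described above.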
I'm trying to get an RSS feed, change some text, and then serve it again as an RSS feed. However, the code I've written doesn't validate properly. I get these errors:
line 3, column 0: Missing rss attribute: version
line 14, column 6: Undefined item element: content (10 occurrences)
Here is my code:
<?php
header("Content-type: text/xml");
echo "<?xml version='1.0' encoding='UTF-8'?>
<?xml-stylesheet type='text/xsl'?>
<?xml-stylesheet type='text/xsl' media='screen'
href='/~d/styles/rss2full.xsl'?>
<rss xmlns:content='http://purl.org/rss/1.0/modules/content/'>
<channel>
<title>Blaakdeer</title>
<description>Blog RSS</description>
<language>en-us</language>
";
$html = "";
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$xml = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++){
$title = $xml->channel->item[$i]->title;
$description = $xml->channel->item[$i]->description;
$content = $xml->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
echo "<item>
<title>$title</title>
<description>$description</description>
<content>$content</content>
</item>";
}
echo "</channel></rss>";
Just as you don't treat XML as a string when parsing it, you don't treat it as a string when you create it. Use the proper tools to create your XML; in this case, the DOMDocument class.
You had a number of problems with your XML; the biggest is that you were creating a <content> element, but the original RSS had a <content:encoded> element. That means the element name is encoded, and it's in the content namespace, which is very different from an element named content. I've added comments to explain the other steps.
<?php
// create the XML document with version and encoding
$xml = new DomDocument("1.0", "UTF-8");
$xml->formatOutput = true;
// add the stylesheet PI
$xml->appendChild(
$xml->createProcessingInstruction(
'xml-stylesheet',
'type="text/xsl" media="screen" href="/~d/styles/rss2full.xsl"'
)
);
// create the root element
$root = $xml->appendChild($xml->createElement('rss'));
// add the version attribute
$v = $root->appendChild($xml->createAttribute('version'));
$v->appendChild($xml->createTextNode('2.0'));
// add the namespace
$root->setAttributeNS(
'http://www.w3.org/2000/xmlns/',
'xmlns:content',
'http://purl.org/rss/1.0/modules/content/'
);
// create some child elements
$ch = $root->appendChild($xml->createElement('channel'));
// specify the text directly as second argument to
// createElement because it doesn't need escaping
$ch->appendChild($xml->createElement('title', 'Blaakdeer'));
$ch->appendChild($xml->createElement('description', 'Blog RSS'));
$ch->appendChild($xml->createElement('language', 'en-us'));
$url = "http://feeds.feedburner.com/vga4a/mPSm";
$rss = simplexml_load_file($url);
for ($i = 0; $i < 10; $i++) {
if (empty($rss->channel->item[$i])) {
continue;
}
$title = $rss->channel->item[$i]->title;
$description = $rss->channel->item[$i]->description;
$content = $rss->channel->item[$i]->children("content", true);
$content = preg_replace("/The post.*/","", $content);
$item_el = $ch->appendChild($xml->createElement('item'));
$title_el = $item_el->appendChild($xml->createElement('title'));
// this stuff is unknown so it has to be escaped
// so have to create a separate text node
$title_el->appendChild($xml->createTextNode($title));
$desc_el = $item_el->appendChild($xml->createElement('description'));
// the other alternative is to create a cdata section
$desc_el->appendChild($xml->createCDataSection($description));
// the content:encoded element is not the same as a content element
// the element must be created with the proper namespace prefix
$cont_el = $item_el->appendChild(
$xml->createElementNS(
'http://purl.org/rss/1.0/modules/content/',
'content:encoded'
)
);
$cont_el->appendChild($xml->createCDataSection($content));
}
header("Content-type: text/xml");
echo $xml->saveXML();
The first error is just a missing attribute, easy enough:
<rss version="2.0" ...>
For the <p> and other HTML elements, you need to escape them. The file should look like this:
&lt;p&gt;...
There are other ways, but this is the easiest way. In PHP you can just call a function to encode entities.
$output .= htmlspecialchars(" <p>Paragraph</p> ");
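The same applies to every dynamic value (title, description, content) when the feed is built as a string. A minimal sketch, with rss_item() as a hypothetical helper that only emits the title and description; ENT_XML1 asks htmlspecialchars() for XML-style entities:

```php
<?php
// Hypothetical helper: builds one RSS <item> as a string, escaping every dynamic value.
function rss_item(string $title, string $description): string
{
    // ENT_XML1 produces XML entities; ENT_QUOTES also escapes both quote characters.
    $esc = fn (string $s): string => htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    return '<item>'
        . '<title>' . $esc($title) . '</title>'
        . '<description>' . $esc($description) . '</description>'
        . '</item>';
}

// prints <item><title>Tom &amp; Jerry</title><description>&lt;p&gt;Paragraph&lt;/p&gt;</description></item>
echo rss_item('Tom & Jerry', '<p>Paragraph</p>'), "\n";
```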
As for the <content> tag problem, it should be <description> instead. The <content> tag currently generates two errors. Changing it to <description> in both places should fix both errors.
Otherwise it looks like you understand the basics: you <open> and </close> tags, and those have to match. You can also use what are called empty tags, such as <empty/>, which stand on their own: they have no content and no closing tag.
I tried writing into my XML file with SimpleXML. I wanted to write a string with the value "<test>asd</test>", but it turned into total gibberish. (I know this is related to encoding, but I don't know how to fix it; I tried setting encoding="UTF-8", but it still yielded a similar result.)
My XML file:
<?xml version="1.0"?>
<userinfos>
<userinfo>
<account>
<user>TIGERBOY-PC</user>
<toDump>2014-02-04 22:17:22</toDump>
<nextToDump>2014-02-05 00:17:22</nextToDump>
<lastChecked>2014-02-04 16:17:22</lastChecked>
<isActive>0</isActive>
<upTime>2014-02-04 16:17:22</upTime>
<toDumpDone>1</toDumpDone>
<systemInfo>&lt;test&gt;asd&lt;/test&gt;</systemInfo>
</account>
<account>
<user>TIGERBOY-PCV</user>
<toDump>2014-02-04 22:17:22</toDump>
<nextToDump>2014-02-05 00:17:22</nextToDump>
<lastChecked>2014-02-04 16:17:22</lastChecked>
<isActive>1</isActive>
<upTime>2014-02-04 16:17:22</upTime>
<toDumpDone>1</toDumpDone>
</account>
</userinfo>
</userinfos>
My PHP file:
<?php
//Start of Functions
function changeAgentInfo()
{
$userorig = $_POST['user'];
$userinfos = simplexml_load_file('userInfo.xml'); // Opens the user XML file
$flag = false;
foreach ($userinfos->userinfo->account as $account)
{
// Checks if the user in this iteration of the loop is the same as $userorig (the user i want to find)
if($account->user == $userorig)
{
$flag = true; // Flag that user is found
$meow = "<test>asd</test>";
$account->addChild('systemInfo',$meow);
}
}
$userinfos->saveXML('userInfo.xml');
echo "Success";
}
//End of Functions
// Start of Program
changeAgentInfo();
?>
This isn't gibberish; it is simply the XML entities for &lt; (<) and &gt; (>). addChild() treats its second argument as text, so any markup in it is escaped. To add nested XML elements with SimpleXML, you can do the following:
$node = $account->addChild('systemInfo');
$node->addChild('test', 'asd');
You'll see that first we add a node to <account>, then add a child to that newly created node.
If you plan on adding several children to the <systemInfo> element, you could perhaps do the following:
$items = array(
'os' => 'Windows 7',
'ram' => '8GB',
'browser' => 'Google Chrome'
);
$node = $account->addChild('systemInfo');
foreach ($items as $key => $value) {
$node->addChild($key, $value);
}
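To see both behaviours side by side, here is a small sketch against a trimmed-down copy of the question's XML; system_info_demo() and the rawInfo element are hypothetical names used only for the comparison:

```php
<?php
// Compare passing markup as a value with building real child elements.
function system_info_demo(): string
{
    $userinfos = simplexml_load_string(
        '<userinfos><userinfo><account><user>TIGERBOY-PC</user></account></userinfo></userinfos>'
    );
    $account = $userinfos->userinfo->account[0];

    // A string value is stored as text, so its < and > are escaped on output.
    $account->addChild('rawInfo', '<test>asd</test>');

    // Real nesting: create the parent element, then add children to it.
    $node = $account->addChild('systemInfo');
    $node->addChild('test', 'asd');

    return $userinfos->asXML();
}

echo system_info_demo();
```

Note that addChild() escapes < and > in its value but does not escape &, so a value containing a bare & still needs htmlspecialchars() first.</systemInfo>') !== false);
$d = new DOMDocument();
assert($d->loadXML($out) === true);
</test>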
The addChild() method adds a child element to an XML node, and its second argument is the element's text, not markup. You are trying to add XML instead of text.
You have
$meow = "<test>asd</test>";
$account->addChild('systemInfo',$meow);
You should change it to
$account->addChild('systemInfo','my system info text');
I parse an XML file with this code:
$file = file_get_contents('test.xml');
$xml = $file;
echo '<pre>';
$xml = htmlentities_decode ($xml);
print_r (simplexml_load_string($xml));
function htmlentities_decode( $string ){
$trans = get_html_translation_table(HTML_ENTITIES, ENT_QUOTES);
$trans = array_flip($trans);
return strtr($string, $trans);
}
My XML file has umlauts encoded as entities, like &auml; or &szlig;.
How do I decode/encode them so that my output shows them the same way as above (&auml; or &szlig;)?
SimpleXML cannot read them directly, so I have to decode them first so that SimpleXML can work with the file.
Afterwards (after parsing) I want to save them as UTF-8 to the database.
What is the best way, to do that?
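One way to approach this, sketched under assumptions: decode only HTML named entities such as &auml; and &szlig; to UTF-8 before handing the string to SimpleXML, and leave XML's five predefined entities (&amp;, &lt;, &gt;, &quot;, &apos;) untouched so the markup stays well-formed. The flipped translation table in the code above also turns &amp; and &lt; back into raw characters, which can break the parse. The helper name decode_html_entities_for_xml() is hypothetical. SimpleXML then returns UTF-8 strings that can go into a UTF-8 database column as-is, and htmlentities() turns them back into entities for display.

```php
<?php
// Hypothetical helper: decode HTML named entities (e.g. &auml;, &szlig;) to UTF-8
// while keeping the five XML predefined entities, so the markup stays well-formed.
function decode_html_entities_for_xml(string $xml): string
{
    $xmlEntities = ['amp', 'lt', 'gt', 'quot', 'apos'];
    return preg_replace_callback(
        '/&([A-Za-z][A-Za-z0-9]*);/',
        function (array $m) use ($xmlEntities): string {
            if (in_array($m[1], $xmlEntities, true)) {
                return $m[0]; // leave &amp; &lt; &gt; &quot; &apos; for the XML parser
            }
            // Unknown entity names come back unchanged from html_entity_decode().
            return html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        },
        $xml
    );
}

$raw = '<?xml version="1.0" encoding="UTF-8"?><cities><city>M&uuml;nchen &amp; Stra&szlig;e</city></cities>';
$cities = simplexml_load_string(decode_html_entities_for_xml($raw));
$city = (string) $cities->city; // UTF-8 string, suitable for a UTF-8 database column
echo htmlentities($city, ENT_QUOTES, 'UTF-8'), "\n"; // M&uuml;nchen &amp; Stra&szlig;e
```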
I'm creating a sitemap in XML. It works well with one record, but with more than one record it throws an error:
XML Parsing Error: junk after document element
The generated output is:
<?xml version="1.0" encoding="UTF-8"?>
<url><loc>http://www.mywebsite.com/page/1</loc><changefreq>daily</changefreq><priority>0.6</priority></url>
<url><loc>http://www.mywebsite.com/page/2</loc><changefreq>daily</changefreq><priority>0.6</priority></url>
My code:
$xml = new DOMDocument('1.0', 'UTF-8');
for($i = 0; $i < 2; $i++)
{
$url = $xml->createElement('url');
$xml->appendChild($url);
$website_url = 'http://www.mywebsite.com/page/' . $i;
$loc = $xml->createElement('loc', $website_url);
$url->appendChild($loc);
$change = $xml->createElement('changefreq', 'daily');
$url->appendChild($change);
$priority = $xml->createElement('priority', '0.6');
$url->appendChild($priority);
}
header('Content-type: text/xml');
echo $xml->saveXML();
Why is it throwing this kind of error when the XML seems valid to me?
At least in your example, you have two root nodes (<url>). XML allows only one root element, so the second one is the "junk after document element".
You're missing the <urlset> root node, see: http://www.sitemaps.org/protocol.php
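A minimal sketch of the fix, assuming the sitemaps.org 0.9 protocol: every <url> goes inside one <urlset> root in the http://www.sitemaps.org/schemas/sitemap/0.9 namespace. build_sitemap() is a hypothetical helper, and the loop starts at 1 so the URLs match the output in the question.

```php
<?php
// Hypothetical helper: one <urlset> root holding every <url>, so there is a single document element.
function build_sitemap(array $urls): string
{
    $ns = 'http://www.sitemaps.org/schemas/sitemap/0.9';
    $xml = new DOMDocument('1.0', 'UTF-8');
    $xml->formatOutput = true;
    $urlset = $xml->appendChild($xml->createElementNS($ns, 'urlset'));

    foreach ($urls as $website_url) {
        $url = $urlset->appendChild($xml->createElementNS($ns, 'url'));
        // createTextNode escapes characters such as & in query strings
        $loc = $url->appendChild($xml->createElementNS($ns, 'loc'));
        $loc->appendChild($xml->createTextNode($website_url));
        $url->appendChild($xml->createElementNS($ns, 'changefreq', 'daily'));
        $url->appendChild($xml->createElementNS($ns, 'priority', '0.6'));
    }
    return $xml->saveXML();
}

$pages = [];
for ($i = 1; $i <= 2; $i++) {
    $pages[] = 'http://www.mywebsite.com/page/' . $i;
}
$sitemap = build_sitemap($pages);
// In a web page, send header('Content-type: text/xml'); before this echo.
echo $sitemap;
```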
I am trying to convert an array to XML in PHP, using the PEAR XML_Serializer package. My array is:
$arr=array(1000=>'name is john');
When I convert it to xml using this code:
$options=array ('mode'=>'simplexml','addDecl'=>true,'indent'=>' ','rootName'=>'names');
$serializer = new XML_Serializer($options);
$result = $serializer->serialize($arr);
if($result == true)
$data=$serializer->getSerializedData();
echo $data;
I get following response:
<?xml version="1.0"?>
<names>name is john</names>
But I want this kind of response:
<?xml version="1.0"?>
<names>
<1000>name is john</1000>
</names>
Can anyone tell me where my mistake is?
I guess this is because XML element names cannot start with a digit, so 1000 is not a valid element name. However, if you really want "XML-style" output like the above (even though it is not real XML), you must bypass the library and code it by hand. I think this will do it for you:
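// plain function (not a class method), so the recursive call below resolves
function xml_encode($array, $tag = "root"){
    $result = '<'.$tag.'>';
    foreach($array as $key => $value){
        if(is_array($value)){
            $result .= xml_encode($value, $key);
        }else{
            // escape &, < and > so a value cannot break the markup
            $result .= '<'.$key.'>'.htmlspecialchars((string)$value).'</'.$key.'>';
        }
    }
    $result .= '</'.$tag.'>';
    return $result;
}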
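If the goal is output that is still well-formed XML, one alternative (a swapped-in approach, not what XML_Serializer does) is to keep invalid keys such as 1000 in an attribute. array_to_xml(), append_array() and the <item key="..."> layout are hypothetical choices for this sketch, and the name check is a simplified ASCII-only version of the XML naming rules:

```php
<?php
// Hypothetical helpers: keys that are not valid XML names (such as 1000) become <item key="...">.
function array_to_xml(array $data, string $rootName = 'names'): string
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $doc->formatOutput = true;
    $root = $doc->appendChild($doc->createElement($rootName));
    append_array($doc, $root, $data);
    return $doc->saveXML();
}

function append_array(DOMDocument $doc, DOMElement $parent, array $data): void
{
    foreach ($data as $key => $value) {
        $key = (string) $key;
        // Simplified name check: a letter or underscore, then letters, digits, ., - or _
        if (preg_match('/^[A-Za-z_][A-Za-z0-9._-]*$/', $key)) {
            $el = $doc->createElement($key);
        } else {
            $el = $doc->createElement('item');
            $el->setAttribute('key', $key);
        }
        $parent->appendChild($el);
        if (is_array($value)) {
            append_array($doc, $el, $value);
        } else {
            // createTextNode escapes &, < and > in the value
            $el->appendChild($doc->createTextNode((string) $value));
        }
    }
}

echo array_to_xml([1000 => 'name is john']);
```

For array(1000=>'name is john') this yields a <names> root containing <item key="1000">name is john</item>.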