How can I create an XML file that uses special characters like À, Æ, Ç and È?
Using SimpleXML, I get the following error:
Warning: SimpleXMLElement::__construct(): Entity: line 24: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE5 0x6C 0x3A 0x20 in C:\xampp\htdocs\protech\admin\xml and rss\xml_create2.php on line 84
Try this:
<?xml version='1.0' encoding='UTF-8'?>
utf8_encode($variable)
Most likely, utf8_encode() is enough to fix your problem. It converts an ISO-8859-1 string to UTF-8, as the function name suggests. So when creating your element, use something like:
new SimpleXMLElement(utf8_encode($xml));
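A minimal sketch of the same idea, assuming the source bytes really are ISO-8859-1 (Latin-1). It uses mb_convert_encoding() instead of utf8_encode(), which is deprecated as of PHP 8.2. The sample string is hypothetical.

```php
<?php
// Hypothetical Latin-1 (ISO-8859-1) input: bytes 0xC0 0xC6 0xC7 0xC8 are À, Æ, Ç, È.
$latin1Xml = "<?xml version='1.0' encoding='UTF-8'?>\n<chars>\xC0\xC6\xC7\xC8</chars>";

// Re-encode the raw Latin-1 bytes as UTF-8 so they match the declared encoding.
$utf8Xml = mb_convert_encoding($latin1Xml, 'UTF-8', 'ISO-8859-1');

$xml = new SimpleXMLElement($utf8Xml);
echo (string) $xml, PHP_EOL; // SimpleXML hands text back as UTF-8
```

Without the conversion, the same string triggers the "Input is not proper UTF-8" error from the question.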
You can use DOMDocument to create the XML document and add whatever elements and text you want.
See the DOMDocument reference in the PHP manual.
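A short sketch of building a document with DOMDocument, assuming the input strings are already UTF-8; the element names and values are made up for illustration. Passing 'UTF-8' to the constructor sets the declared encoding, and createTextNode() takes care of escaping characters such as &.

```php
<?php
// Declare the document encoding up front so libxml writes UTF-8 on output.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$root = $doc->createElement('items');
$doc->appendChild($root);

// Hypothetical sample values; they must already be UTF-8 strings.
foreach (["\u{C0}\u{C6}\u{C7}\u{C8}", 'Fish & Chips'] as $value) {
    $item = $doc->createElement('item');
    // createTextNode() escapes markup characters such as & for us.
    $item->appendChild($doc->createTextNode($value));
    $root->appendChild($item);
}

$out = $doc->saveXML();
echo $out;
```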
Related
I am using PHP's SimpleXML to process an XML file, and get this error:
Message: simplexml_load_string(): Entity: line 9: parser error : EntityRef: expecting ';'
A quick Google search reveals that this is generally caused by an unescaped &; there are dozens of questions with that answer here on Stack Overflow. However, here's line 9 of the file:
<p>In-kingdom commentary on the following items can be found on the November LoP. https://oscar.sca.org/kingdom/kingloi.php?kingdom=9&amp;loi=4191</p>
As you can see, the & is escaped. A text search on the file reveals no other instances of &.
What am I missing?
Please note: I have no ability to edit the XML file - I must take it as it comes and only fix things in my code. I currently open the XML with the following code:
$rawstring = file_get_contents($filename);
$safestring = html_entity_decode($rawstring, 0, 'ISO-8859-1');
$xmlstring = simplexml_load_string($safestring);
(the html_entity_decode is necessary as the file uses Latin-1 encoding and simplexml expects UTF-8)
Help appreciated.
html_entity_decode() is not intended for what you appear to think it is, and it is exactly what is causing your problem. As the name suggests, it decodes HTML entities such as &amp; into the characters they represent; in this case &amp; => &.
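To see that failure in isolation, here is a small sketch using the question's line 9 (shortened). It only demonstrates why html_entity_decode() breaks the input; it is not a fix.

```php
<?php
libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings

// Line 9 of the question, shortened: the & in the URL is correctly escaped.
$raw = '<p>https://oscar.sca.org/kingdom/kingloi.php?kingdom=9&amp;loi=4191</p>';
var_dump(simplexml_load_string($raw) !== false); // bool(true)

// html_entity_decode() turns &amp; back into a bare &, which XML does not allow.
$decoded = html_entity_decode($raw, ENT_QUOTES, 'ISO-8859-1');
echo $decoded, PHP_EOL;
var_dump(simplexml_load_string($decoded)); // bool(false)

foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL; // e.g. "EntityRef: expecting ';'"
}
```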
If you want to convert the character encoding of the original $rawstring to ISO-8859-1 or UTF-8 you should use something like iconv() or mb_convert_encoding().
Here's an example that should work:
$rawstring = file_get_contents($filename);
$safestring = mb_convert_encoding($rawstring, 'ISO-8859-1' /*, $optionalOriginalEncoding */);
$xmlstring = simplexml_load_string($safestring);
See the list of supported encodings, as well.
However, since the original $rawstring is Latin-1, conversion to ISO-8859-1 is pointless, since Latin-1 is ISO-8859-1. You may need to convert to UTF-8, but I'm fairly certain that that's not even necessary either.
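Building on that last point, a sketch assuming the file starts with an encoding="ISO-8859-1" declaration (the contents below are a made-up stand-in). libxml reads the declaration and converts the text itself, so neither html_entity_decode() nor a manual conversion is needed. If the file has no such declaration, converting first with mb_convert_encoding($rawstring, 'UTF-8', 'ISO-8859-1') would be the alternative.

```php
<?php
// Made-up Latin-1 file contents: byte 0xE9 is "é" in ISO-8859-1.
$rawstring = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n<p>caf\xE9 &amp; more</p>";

// No html_entity_decode(): libxml reads the encoding declaration, converts the
// text to UTF-8 internally, and decodes &amp; into & only in the parsed text.
$xml = simplexml_load_string($rawstring);
echo (string) $xml, PHP_EOL; // "café & more", as UTF-8
```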
I want to parse an external php feed.
The address: http://www.hittadjur.se/feed.php?count=1
The output:
<?xml version="1.0"?>
<annons>
<rubrik>Wilja</rubrik>
<datum>2013-03-22</datum>
<ras>Chihuahua långhår</ras>
<ort>Göteborg</ort><bildurl>http://www.hittadjur.se/images/uploaded/thumbs/1363984467.jpg</bildurl><addurl>http://www.hittadjur.se/index.php?page=case&type=&county=32&subpage=show&case=1363984558</addurl>
</annons>
My PHP code that doesn't work:
$content = utf8_encode(file_get_contents('http://www.hittadjur.se/feed.php?count=1'));
$xml = simplexml_load_file($content);
echo $xml->annons->rubrik;
The reason I use utf8_encode is that I get this message if I don't:
parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE5 0x6E 0x67 0x68
The error now is:
Warning: simplexml_load_file() [function.simplexml-load-file]: I/O warning : failed to load external entity
Any ideas?
Thanks!
Try passing the full directory path if you are loading XML files stored on your server:
simplexml_load_file($_SERVER['DOCUMENT_ROOT'].'/example.xml')
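A self-contained sketch of loading by path. To make it runnable anywhere it writes a throwaway file to the temp directory; on a real server the path would be built from $_SERVER['DOCUMENT_ROOT'] as above, and example.xml is only a placeholder name.

```php
<?php
// Create a throwaway file so the sketch runs anywhere; on a real server the
// path would be something like $_SERVER['DOCUMENT_ROOT'] . '/example.xml'.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, '<annons><rubrik>Wilja</rubrik></annons>');

// simplexml_load_file() takes a filesystem path (or URL), not XML text.
$xml = simplexml_load_file($path);
echo $xml->rubrik, PHP_EOL; // Wilja

unlink($path);
```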
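If you want to access the XML over HTTP, allow_url_fopen must be On. It is a PHP_INI_SYSTEM setting, so it has to be enabled in php.ini or the server configuration; calling ini_set() at runtime has no effect.
Also note that simplexml_load_file() expects a path or URL, not the XML text itself. If you already have the contents in a string (as in your code), parse them with simplexml_load_string():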
$temp = file_get_contents($url);
$XmlObj = simplexml_load_string($temp);
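A sketch of that approach, with a hypothetical, already well-formed string standing in for the downloaded feed (fetching the real URL is left out). One detail beyond the answer above: the object SimpleXML returns represents the root <annons> element itself, so its children are read as $XmlObj->rubrik rather than $XmlObj->annons->rubrik.

```php
<?php
// Stand-in for $temp = file_get_contents($url); a hypothetical, well-formed
// excerpt is used so the sketch runs offline.
$temp = "<?xml version=\"1.0\"?>\n<annons><rubrik>Wilja</rubrik><ort>G\u{F6}teborg</ort></annons>";

// XML that is already in a string is parsed with simplexml_load_string().
$XmlObj = simplexml_load_string($temp);

// $XmlObj is the root <annons> element, so children are read directly.
echo $XmlObj->rubrik, PHP_EOL; // Wilja
echo $XmlObj->ort, PHP_EOL;    // Göteborg
```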
As Alvaro Vicario wrote, the problem is the parameter separators (&) in your URLs. In XML, an ampersand is an entity marker (the start of a named entity or of a numeric character reference) and must be escaped.
Either replace & with &amp; in your URLs, or mark the URLs as literal text (a CDATA section in XML terms): <![CDATA[http://...]]>.
For example: <addurl><![CDATA[http://www.hittadjur.se/index.php?page=case&type=&county=32&subpage=show&case=1363984558]]></addurl>
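If you cannot change the feed itself, one option (a regex approach swapped in here, not something the answer above prescribes) is to escape every bare & in code before parsing. The pattern assumes that anything shaped like &name;, &#123; or &#x1F; is already a valid reference and leaves it alone; the feed excerpt is hypothetical.

```php
<?php
// Hypothetical feed excerpt with raw & separators, as in the question.
$feed = '<annons><addurl>http://www.hittadjur.se/index.php?page=case&type=&county=32</addurl></annons>';

// Escape every & that does not already begin an entity or character reference.
$pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#x[0-9a-fA-F]+);)/';
$fixed = preg_replace($pattern, '&amp;', $feed);

$xml = simplexml_load_string($fixed);
echo $xml->addurl, PHP_EOL; // SimpleXML turns &amp; back into & in the text
```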
If you are uncomfortable with the explicit UTF-8 conversion in your code and you know the character encoding of your data source, you can declare it in the XML prologue instead (ISO-8859-1 contains the offending å/0xE5 from your XML):
<?xml version="1.0" encoding="iso-8859-1"?>
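A sketch of declaring the encoding in code, assuming the feed's bytes really are ISO-8859-1 and its prologue is exactly <?xml version="1.0"?>; the feed contents below are hypothetical. SimpleXML still returns the text as UTF-8.

```php
<?php
// Hypothetical Latin-1 feed: byte 0xE5 is "å" in ISO-8859-1.
$feed = "<?xml version=\"1.0\"?>\n<annons><ras>Chihuahua l\xE5ngh\xE5r</ras></annons>";

// Declare the encoding the source really uses instead of converting the bytes.
$feed = str_replace(
    '<?xml version="1.0"?>',
    '<?xml version="1.0" encoding="iso-8859-1"?>',
    $feed
);

$xml = simplexml_load_string($feed);
echo $xml->ras, PHP_EOL; // printed as UTF-8: Chihuahua långhår
```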
I'm afraid that the feed provides malformed XML. Apart from the encoding mess:
<addurl>http://www.hittadjur.se/index.php?page=case&type=&county=32&subpage=show&case=1363984558</addurl>
(the & characters in the URL are not properly escaped)
I may be wrong but I don't think you can parse it using regular XML functions because they're designed for valid XML (that's the whole purpose of using XML in the first place).
Perhaps you can try DOMDocument. Its loadHTML() method uses a lenient HTML parser that can cope with invalid input, and DOMDocument can load XML as well.
Edit: Here's a trick to fix invalid XML but, honestly, I'm not sure it's worth the effort.
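A rough sketch of the DOMDocument route via loadHTML(). The feed excerpt is hypothetical, and the HTML parser wraps the fragment in <html><body> and may normalise it in other ways, so treat this as a fallback for broken input rather than a substitute for well-formed XML.

```php
<?php
libxml_use_internal_errors(true); // silence the HTML parser's warnings

// Hypothetical feed excerpt with the raw & separators left in place.
$feed = '<annons><rubrik>Wilja</rubrik>'
      . '<addurl>http://www.hittadjur.se/index.php?page=case&type=&county=32</addurl></annons>';

$doc = new DOMDocument();
// loadHTML() uses libxml's forgiving HTML parser, which tolerates unknown
// tags and bare ampersands (it wraps the fragment in <html><body>).
$doc->loadHTML($feed);

$url = $doc->getElementsByTagName('addurl')->item(0)->textContent;
echo $url, PHP_EOL;
libxml_clear_errors();
```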
I have an XML file that I parse, but I didn't generate it. I ran into a problem while parsing: one node contains ' single quotes, and this generates errors.
I tried using addslashes() and htmlentities() with simplexml_load_file(), but nothing changed. Is there a way to resolve this and parse the file with the quotes?
Warning: simplexml_load_file() [function.simplexml-load-file]: THE URL:853: parser error : Input is not proper UTF-8, indicate encoding ! Bytes: 0xE2 0x20 0x20 0x6C in /****/parseXML.php on line 7
It sounds like you have an encoding problem. The single quotes are probably not the apostrophe found on your keyboard but a fancier "curly" quote inserted by a word-processing program. As a result, your input file is likely not UTF-8 but some other character set.
You need to either convert the file to UTF-8 with a text editor, or use PHP's iconv() to convert from the file's encoding (probably ISO-8859-1) to UTF-8 and then load the result with simplexml_load_string().
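A sketch of the convert-then-parse step. It assumes the file is Windows-1252 rather than ISO-8859-1, because curly quotes exist in Windows-1252 (0x91 to 0x94) but not in ISO-8859-1; adjust the source charset to whatever your file really uses. The sample bytes are hypothetical.

```php
<?php
// Hypothetical Windows-1252 input: byte 0x92 is the curly apostrophe (’) that
// word processors substitute for a plain ' (ISO-8859-1 has no such character).
$raw = "<?xml version=\"1.0\"?>\n<node>It\x92s here</node>";

// Convert to UTF-8 before parsing. iconv('WINDOWS-1252', 'UTF-8', $raw) should
// do the same where the system iconv knows that charset.
$utf8 = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

$xml = simplexml_load_string($utf8);
echo $xml, PHP_EOL; // It’s here
```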
I am creating an XML navigation file for my website. This line is causing a SimpleXML issue:
<label>Osnabrück</label>
My PHP code, which uses htmlentities(), has changed Osnabrück into Osnabr&Atilde;&frac14;ck. However, when I try to parse my XML with this line in it, I get this error:
/application/configs/navigation.xml:318: parser error : Entity 'Atilde' not defined simplexml_load_file()
Should I not be using htmlentities()? Or is there some kind of setting I'm missing?
Kind Regards
Steve
You should not be using HTML entities in XML. Plain UTF-8 characters are fine.
The occurrence of Osnabr&Atilde;&frac14;ck means that, most likely, the city name is processed as ISO-8859-1 instead of UTF-8 at some point. It is not htmlentities()'s fault; you need to find that point and fix it.
You can use the iconv() function to convert to UTF-8 dynamically:
iconv("ISO-8859-1", "UTF-8", $text);
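A sketch contrasting the two encoders, assuming the city name is already a UTF-8 string. htmlspecialchars() with ENT_XML1 is swapped in here as an XML-safe alternative to htmlentities(); the last line reproduces the Ã¼ mojibake that appears when UTF-8 bytes are treated as Latin-1.

```php
<?php
$city = "Osnabr\u{FC}ck"; // already UTF-8

// htmlentities() emits HTML-only named entities that XML parsers do not know.
echo htmlentities($city, ENT_QUOTES, 'UTF-8'), PHP_EOL; // Osnabr&uuml;ck

// htmlspecialchars() with ENT_XML1 only escapes the XML specials (&, <, >, quotes)
// and leaves ü as a plain UTF-8 character.
$label = htmlspecialchars($city, ENT_XML1 | ENT_QUOTES, 'UTF-8');
$xml = simplexml_load_string("<label>$label</label>");
echo $xml, PHP_EOL; // Osnabrück

// Treating the UTF-8 bytes as Latin-1 is what produces the Ã¼ mojibake.
echo iconv('ISO-8859-1', 'UTF-8', $city), PHP_EOL; // OsnabrÃ¼ck
```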
I'm trying to parse an XML file, but when loading it SimpleXML prints the following warning:
Warning: simplexml_load_file() [function.simplexml-load-file]: gpr_545.xml:55: parser error : Entity 'Oslash' not defined in import.php on line 35
This is that line:
<forenames>B&Oslash;IE</forenames><x> </x>
As it is a warning, I might ignore it, but I'd like to understand what is happening.
HTML entities like &Oslash; are not the same as XML entities. Here's a table for converting HTML entities to XML entities.
As I can tell from one of your comments on another post, you're having trouble with an entity /. I don't know if this is even a valid HTML entity; my Firefox won't show the character and only outputs the entity name. But I found another table listing most entities and their character reference numbers. Try adding them to your replacement table and you should be safe. /'s reference number is / by the way.
HTML encoding of Latin-1 characters (like &Oslash;, which is what that entity describes) is what has broken the XML parser. If you're in control of the data, you need to escape it using XML-style character references (Ø just happens to be &#216;).
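A sketch of the replacement-table idea, assuming you have to fix the text in code rather than at the source. Instead of a hand-maintained table it uses html_entity_decode() to look up each HTML entity, then writes the character out as a numeric reference such as &#216;, which every XML parser understands. The function name htmlEntitiesToXml is made up for this example.

```php
<?php
// Hypothetical helper: rewrite HTML named entities (e.g. &Oslash;) as numeric
// character references, leaving XML's five predefined entities alone.
function htmlEntitiesToXml(string $xml): string
{
    $predefined = ['amp', 'lt', 'gt', 'quot', 'apos'];
    return preg_replace_callback('/&([a-zA-Z][a-zA-Z0-9]*);/', function (array $m) use ($predefined) {
        if (in_array($m[1], $predefined, true)) {
            return $m[0];
        }
        $chars = html_entity_decode($m[0], ENT_QUOTES | ENT_HTML5, 'UTF-8');
        if ($chars === $m[0]) {
            return $m[0]; // unknown name: leave it for the parser to report
        }
        // Some HTML5 entities expand to several code points; emit one ref each.
        $refs = '';
        foreach (mb_str_split($chars, 1, 'UTF-8') as $char) {
            $refs .= '&#' . mb_ord($char, 'UTF-8') . ';';
        }
        return $refs;
    }, $xml);
}

$fixed = htmlEntitiesToXml('<forenames>B&Oslash;IE</forenames>');
echo $fixed, PHP_EOL;                        // <forenames>B&#216;IE</forenames>
echo simplexml_load_string($fixed), PHP_EOL; // BØIE
```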
I think this is an encoding problem. PHP (SimpleXML in this particular case) does not like the Danish Ø you've got in that forenames tag. You could try encoding the whole file in UTF-8 and replacing the escaped version in the tag with the actual character. Afterwards you can read the file, now free of HTML entities, into SimpleXML.
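I just had a very similar problem and solved it as follows. The main idea is to load the file into a string, replace every & with a placeholder such as "[[entity]]" (so &Oslash; becomes "[[entity]]Oslash;"), and reverse the replacement before displaying an XML node.
function readXML($filename){
    // Read the whole file, then hide every & behind a placeholder so that
    // HTML-only entities such as &Oslash; never reach the XML parser.
    $xml_string = file_get_contents($filename);
    $xml_string = str_replace("&", "[[entity]]", $xml_string);
    return simplexml_load_string($xml_string);
}
function xml2str($xml){
    // Put the original entities back (e.g. &Oslash;) for the browser to render,
    // then convert from SimpleXML's UTF-8 output to the page's encoding.
    $str = str_replace("[[entity]]", "&", (string)$xml);
    $str = iconv("UTF-8", "WINDOWS-1251", $str);
    return $str;
}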
$xml = readXML($filename);
echo xml2str($xml->forenames);
I use iconv("UTF-8", "WINDOWS-1251", $str) because my page uses WINDOWS-1251 encoding.
Try using this line:
<forenames><![CDATA[B&Oslash;IE]]></forenames><x> </x>
and read up on CDATA sections.
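A small sketch of what CDATA does here, using a hypothetical <person> wrapper: the parser stops complaining, but it also stops expanding the entity, so the text comes back verbatim and any decoding is left to your code.

```php
<?php
$xml = simplexml_load_string('<person><forenames><![CDATA[B&Oslash;IE]]></forenames></person>');

// Inside CDATA the parser does not expand entities, so the text is returned verbatim.
echo $xml->forenames, PHP_EOL; // B&Oslash;IE

// Decoding it afterwards is up to you, e.g. with html_entity_decode().
echo html_entity_decode((string) $xml->forenames, ENT_QUOTES | ENT_HTML5, 'UTF-8'), PHP_EOL; // BØIE
```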