Converting & to &amp; for XML in PHP

I am building an XML RSS feed for my page and am running into this error:
error on line 39 at column 46: xmlParseEntityRef: no name
Apparently this is because I can't have a bare & in XML, which I do in my last field row.
What is the best way to clean all my $row['field'] values in PHP so that each & turns into &amp;?

Use htmlspecialchars to encode just the HTML special characters &, <, >, " and optionally ' (see second parameter $quote_style).
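A minimal sketch of applying this to every field of a row before building the feed. The helper name xml_escape_row and the sample row are made up for illustration; ENT_XML1 | ENT_QUOTES is used here so that both quote characters are encoded as well.

```php
<?php
// Hypothetical helper: escape every value of a row for use in XML text or attributes.
function xml_escape_row(array $row): array
{
    return array_map(
        function ($value) {
            // ENT_XML1 selects XML entity names; ENT_QUOTES also encodes " and '.
            return htmlspecialchars((string) $value, ENT_XML1 | ENT_QUOTES, 'UTF-8');
        },
        $row
    );
}

// Sample data standing in for a fetched $row.
$row  = ['title' => 'Cars & trucks', 'summary' => 'Prices < 5000 "today"'];
$safe = xml_escape_row($row);

echo "<item><title>{$safe['title']}</title></item>", PHP_EOL;
// <item><title>Cars &amp; trucks</title></item>
```

Escaping once, at the point where the XML is written, avoids double-encoding values that are also used elsewhere.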

It's called htmlentities() and html_entity_decode()

You really should look into the DOM XML functions in PHP. It's a bit of work to figure out, but you avoid problems like this.
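As a rough illustration of the DOM route, assuming the dom extension is enabled: text added through createTextNode() is escaped when the document is serialized, so no manual replacement is needed. The element names and the sample value are placeholders, not taken from the question.

```php
<?php
// Build a tiny RSS fragment with DOMDocument; text nodes are escaped on output.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->formatOutput = true;

$rss = $doc->appendChild($doc->createElement('rss'));
$rss->setAttribute('version', '2.0');
$channel = $rss->appendChild($doc->createElement('channel'));
$item    = $channel->appendChild($doc->createElement('item'));

// Sample value standing in for $row['field'].
$field = 'Cars & trucks';

// createTextNode() takes raw text; the serializer writes & as &amp;.
$title = $item->appendChild($doc->createElement('title'));
$title->appendChild($doc->createTextNode($field));

$xml = $doc->saveXML();
echo $xml;
```

The text goes in through createTextNode() rather than the second argument of createElement(), because that argument is not escaped for you.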

Convert Reserved XML characters to Entities
function xml_convert($str, $protect_all = FALSE)
{
$temp = '__TEMP_AMPERSANDS__';
// Replace entities to temporary markers so that
// ampersands won't get messed up
$str = preg_replace("/&#(\d+);/", "$temp\\1;", $str);
if ($protect_all === TRUE)
{
$str = preg_replace("/&(\w+);/", "$temp\\1;", $str);
}
$str = str_replace(array("&","<",">","\"", "'", "-"),
array("&amp;", "&lt;", "&gt;", "&quot;", "&apos;", "&#45;"),
$str);
// Decode the temp markers back to entities
$str = preg_replace("/$temp(\d+);/","&#\\1;",$str);
if ($protect_all === TRUE)
{
$str = preg_replace("/$temp(\w+);/","&\\1;", $str);
}
return $str;
}

Use
html_entity_decode($row['field']);
This converts &amp; back to &. Also, if you have &nbsp; it will change that to a non-breaking space.
http://us.php.net/html_entity_decode
Cheers

Related

PHP Convert Unicode to text

I am receiving from a form the following urlencoded string %F0%9D%90%B4%F0%9D%91%99%F0%9D%91%92%F0%9D%91%97%F0%9D%91%8E%F0%9D%91%9B%F0%9D%91%91%F0%9D%91%9F%F0%9D%91%8E
If I decode it I get the following formatted text: 𝐴𝑙𝑒𝑗𝑎𝑛𝑑𝑟𝑎
Is there any way with PHP to get the plain "Alejandra" text from the encoded or decoded string?
I have tried several ways without success, including
mb_convert_encoding($string, "UTF-16", mb_detect_encoding($string))
iconv('utf-16', 'utf-8', rawurldecode($string))
and every other solution I could find on Stack Overflow.
Edit:
I tried the proposed solution $strAscii = iconv('UTF-8','ASCII//TRANSLIT',$str); but it deletes special characters such as áéíóúñç, which we need to keep.
Expected result
input: 𝐴𝑙𝑒𝑗𝑎𝑛𝑑𝑟𝑎
output: Alejandra
input: Álejandra
output: Álejandra
Thank you in advance.
urldecode or rawurldecode is sufficient.
$string = "%F0%9D%90%B4%F0%9D%91%99%F0%9D%91%92%F0%9D%91%97%F0%9D%91%8E%F0%9D%91%9B%F0%9D%91%91%F0%9D%91%9F%F0%9D%91%8E";
$str = urldecode($string);
var_dump($str);
//string(36) "𝐴𝑙𝑒𝑗𝑎𝑛𝑑𝑟𝑎"
Demo: https://3v4l.org/OMQ35
A special debugger gives me: string(36) UTF-8mb4. This means that the string also contains UTF-8 characters that require 4 bytes. The character that looks like A is the Unicode character “𝐴” (U+1D434).
Note:
If the special UTF-8 characters cause problems, you can try to display the strings as ASCII characters with iconv.
$strAscii = iconv('UTF-8','ASCII//TRANSLIT',$str);
//string(9) "Alejandra"
What you are getting is called a "pseudo-alphabet"; you can see a list of them here: https://qaz.wtf/u/convert.cgi. The one that you appear to be getting can be seen here: https://en.wikipedia.org/wiki/Mathematical_Alphanumeric_Symbols
Basically, what you need to do is take the string, split it into characters, and use a lookup table to convert each one back to a regular character. This implementation is terribly inefficient, but that's because I grabbed the alphabets from the above Wikipedia page and was too lazy to reorganise them.
function math_symbols_to_plain_text($input, $alphabet)
{
$alphabets = [
['a','𝐚','𝑎','𝒂','𝖺','𝗮','𝘢','𝙖','𝒶','𝓪','𝔞','𝖆','𝚊','𝕒'],
['b','𝐛','𝑏','𝒃','𝖻','𝗯','𝘣','𝙗','𝒷','𝓫','𝔟','𝖇','𝚋','𝕓'],
['c','𝐜','𝑐','𝒄','𝖼','𝗰','𝘤','𝙘','𝒸','𝓬','𝔠','𝖈','𝚌','𝕔'],
['d','𝐝','𝑑','𝒅','𝖽','𝗱','𝘥','𝙙','𝒹','𝓭','𝔡','𝖉','𝚍','𝕕'],
['e','𝐞','𝑒','𝒆','𝖾','𝗲','𝘦','𝙚','ℯ','𝓮','𝔢','𝖊','𝚎','𝕖'],
['f','𝐟','𝑓','𝒇','𝖿','𝗳','𝘧','𝙛','𝒻','𝓯','𝔣','𝖋','𝚏','𝕗'],
['g','𝐠','𝑔','𝒈','𝗀','𝗴','𝘨','𝙜','ℊ','𝓰','𝔤','𝖌','𝚐','𝕘'],
['h','𝐡','ℎ','𝒉','𝗁','𝗵','𝘩','𝙝','𝒽','𝓱','𝔥','𝖍','𝚑','𝕙'],
['i','𝐢','𝑖','𝒊','𝗂','𝗶','𝘪','𝙞','𝒾','𝓲','𝔦','𝖎','𝚒','𝕚'],
['j','𝐣','𝑗','𝒋','𝗃','𝗷','𝘫','𝙟','𝒿','𝓳','𝔧','𝖏','𝚓','𝕛'],
['k','𝐤','𝑘','𝒌','𝗄','𝗸','𝘬','𝙠','𝓀','𝓴','𝔨','𝖐','𝚔','𝕜'],
['l','𝐥','𝑙','𝒍','𝗅','𝗹','𝘭','𝙡','𝓁','𝓵','𝔩','𝖑','𝚕','𝕝'],
['m','𝐦','𝑚','𝒎','𝗆','𝗺','𝘮','𝙢','𝓂','𝓶','𝔪','𝖒','𝚖','𝕞'],
['n','𝐧','𝑛','𝒏','𝗇','𝗻','𝘯','𝙣','𝓃','𝓷','𝔫','𝖓','𝚗','𝕟'],
['o','𝐨','𝑜','𝒐','𝗈','𝗼','𝘰','𝙤','ℴ','𝓸','𝔬','𝖔','𝚘','𝕠'],
['p','𝐩','𝑝','𝒑','𝗉','𝗽','𝘱','𝙥','𝓅','𝓹','𝔭','𝖕','𝚙','𝕡'],
['q','𝐪','𝑞','𝒒','𝗊','𝗾','𝘲','𝙦','𝓆','𝓺','𝔮','𝖖','𝚚','𝕢'],
['r','𝐫','𝑟','𝒓','𝗋','𝗿','𝘳','𝙧','𝓇','𝓻','𝔯','𝖗','𝚛','𝕣'],
['s','𝐬','𝑠','𝒔','𝗌','𝘀','𝘴','𝙨','𝓈','𝓼','𝔰','𝖘','𝚜','𝕤'],
['t','𝐭','𝑡','𝒕','𝗍','𝘁','𝘵','𝙩','𝓉','𝓽','𝔱','𝖙','𝚝','𝕥'],
['u','𝐮','𝑢','𝒖','𝗎','𝘂','𝘶','𝙪','𝓊','𝓾','𝔲','𝖚','𝚞','𝕦'],
['v','𝐯','𝑣','𝒗','𝗏','𝘃','𝘷','𝙫','𝓋','𝓿','𝔳','𝖛','𝚟','𝕧'],
['w','𝐰','𝑤','𝒘','𝗐','𝘄','𝘸','𝙬','𝓌','𝔀','𝔴','𝖜','𝚠','𝕨'],
['x','𝐱','𝑥','𝒙','𝗑','𝘅','𝘹','𝙭','𝓍','𝔁','𝔵','𝖝','𝚡','𝕩'],
['y','𝐲','𝑦','𝒚','𝗒','𝘆','𝘺','𝙮','𝓎','𝔂','𝔶','𝖞','𝚢','𝕪'],
['z','𝐳','𝑧','𝒛','𝗓','𝘇','𝘻','𝙯','𝓏','𝔃','𝔷','𝖟','𝚣','𝕫'],
['A','𝐀','𝐴','𝑨','𝖠','𝗔','𝘈','𝘼','𝒜','𝓐','𝔄','𝕬','𝙰','𝔸'],
['B','𝐁','𝐵','𝑩','𝖡','𝗕','𝘉','𝘽','ℬ','𝓑','𝔅','𝕭','𝙱','𝔹'],
['C','𝐂','𝐶','𝑪','𝖢','𝗖','𝘊','𝘾','𝒞','𝓒','ℭ','𝕮','𝙲','ℂ'],
['D','𝐃','𝐷','𝑫','𝖣','𝗗','𝘋','𝘿','𝒟','𝓓','𝔇','𝕯','𝙳','𝔻'],
['E','𝐄','𝐸','𝑬','𝖤','𝗘','𝘌','𝙀','ℰ','𝓔','𝔈','𝕰','𝙴','𝔼'],
['F','𝐅','𝐹','𝑭','𝖥','𝗙','𝘍','𝙁','ℱ','𝓕','𝔉','𝕱','𝙵','𝔽'],
['G','𝐆','𝐺','𝑮','𝖦','𝗚','𝘎','𝙂','𝒢','𝓖','𝔊','𝕲','𝙶','𝔾'],
['H','𝐇','𝐻','𝑯','𝖧','𝗛','𝘏','𝙃','ℋ','𝓗','ℌ','𝕳','𝙷','ℍ'],
['I','𝐈','𝐼','𝑰','𝖨','𝗜','𝘐','𝙄','ℐ','𝓘','ℑ','𝕴','𝙸','𝕀'],
['J','𝐉','𝐽','𝑱','𝖩','𝗝','𝘑','𝙅','𝒥','𝓙','𝔍','𝕵','𝙹','𝕁'],
['K','𝐊','𝐾','𝑲','𝖪','𝗞','𝘒','𝙆','𝒦','𝓚','𝔎','𝕶','𝙺','𝕂'],
['L','𝐋','𝐿','𝑳','𝖫','𝗟','𝘓','𝙇','ℒ','𝓛','𝔏','𝕷','𝙻','𝕃'],
['M','𝐌','𝑀','𝑴','𝖬','𝗠','𝘔','𝙈','ℳ','𝓜','𝔐','𝕸','𝙼','𝕄'],
['N','𝐍','𝑁','𝑵','𝖭','𝗡','𝘕','𝙉','𝒩','𝓝','𝔑','𝕹','𝙽','ℕ'],
['O','𝐎','𝑂','𝑶','𝖮','𝗢','𝘖','𝙊','𝒪','𝓞','𝔒','𝕺','𝙾','𝕆'],
['P','𝐏','𝑃','𝑷','𝖯','𝗣','𝘗','𝙋','𝒫','𝓟','𝔓','𝕻','𝙿','ℙ'],
['Q','𝐐','𝑄','𝑸','𝖰','𝗤','𝘘','𝙌','𝒬','𝓠','𝔔','𝕼','𝚀','ℚ'],
['R','𝐑','𝑅','𝑹','𝖱','𝗥','𝘙','𝙍','ℛ','𝓡','ℜ','𝕽','𝚁','ℝ'],
['S','𝐒','𝑆','𝑺','𝖲','𝗦','𝘚','𝙎','𝒮','𝓢','𝔖','𝕾','𝚂','𝕊'],
['T','𝐓','𝑇','𝑻','𝖳','𝗧','𝘛','𝙏','𝒯','𝓣','𝔗','𝕿','𝚃','𝕋'],
['U','𝐔','𝑈','𝑼','𝖴','𝗨','𝘜','𝙐','𝒰','𝓤','𝔘','𝖀','𝚄','𝕌'],
['V','𝐕','𝑉','𝑽','𝖵','𝗩','𝘝','𝙑','𝒱','𝓥','𝔙','𝖁','𝚅','𝕍'],
['W','𝐖','𝑊','𝑾','𝖶','𝗪','𝘞','𝙒','𝒲','𝓦','𝔚','𝖂','𝚆','𝕎'],
['X','𝐗','𝑋','𝑿','𝖷','𝗫','𝘟','𝙓','𝒳','𝓧','𝔛','𝖃','𝚇','𝕏'],
['Y','𝐘','𝑌','𝒀','𝖸','𝗬','𝘠','𝙔','𝒴','𝓨','𝔜','𝖄','𝚈','𝕐'],
['Z','𝐙','𝑍','𝒁','𝖹','𝗭','𝘡','𝙕','𝒵','𝓩','ℨ','𝖅','𝚉','ℤ']
];
$replace = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'];
$lookup = [
'serif-normal',
'serif-bold',
'serif-italic',
'serif-bolditalic',
'sans-normal',
'sans-bold',
'sans-italic',
'sans-bolditalic',
'script-normal',
'script-bold',
'fraktur-normal',
'fraktur-bold',
'monospace',
'doublestruck'
];
$map_index = array_search($alphabet, $lookup);
$split = mb_str_split($input);
$output = '';
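foreach ($split as $char) {
// Keep characters that are not in the chosen alphabet (e.g. "Á") unchanged
$plain = $char;
foreach ($alphabets as $i => $letter) {
if ($letter[$map_index] === $char) {
$plain = $replace[$i];
break;
}
}
$output .= $plain;
}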
return $output;
}
$input = '𝐴𝑙𝑒𝑗𝑎𝑛𝑑𝑟𝑎';
$output = math_symbols_to_plain_text($input, 'serif-italic');
echo $input . PHP_EOL . $output . PHP_EOL;
Yields:
𝐴𝑙𝑒𝑗𝑎𝑛𝑑𝑟𝑎
Alejandra
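Since the Mathematical Alphanumeric Symbols block lays its Latin letters out in consecutive runs of 52 (capitals, then small letters), the lookup table can also be replaced by arithmetic on the code point. A rough sketch, assuming valid UTF-8 input; the function name fold_math_letters is made up here, and letters that Unicode keeps in the Letterlike Symbols block (such as the italic h, U+210E) are not covered. Anything outside U+1D400..U+1D6A3, including accented letters like Á, is passed through untouched.

```php
<?php
// Sketch: fold the Latin letters of the Mathematical Alphanumeric Symbols
// block (U+1D400..U+1D6A3) back to plain A-Z / a-z.
function fold_math_letters(string $input): string
{
    $result = preg_replace_callback(
        '/[\x{1D400}-\x{1D6A3}]/u',
        function (array $m): string {
            // Decode the four UTF-8 bytes of the match into a code point.
            $b  = array_map('ord', str_split($m[0]));
            $cp = (($b[0] & 0x07) << 18) | (($b[1] & 0x3F) << 12)
                | (($b[2] & 0x3F) << 6) | ($b[3] & 0x3F);
            // Each style occupies 52 code points: 26 capitals, then 26 small letters.
            $index = ($cp - 0x1D400) % 52;
            return $index < 26 ? chr(65 + $index) : chr(97 + $index - 26);
        },
        $input
    );
    // preg_replace_callback() returns null on invalid UTF-8; fall back to the input.
    return $result ?? $input;
}

$encoded = '%F0%9D%90%B4%F0%9D%91%99%F0%9D%91%92%F0%9D%91%97%F0%9D%91%8E'
         . '%F0%9D%91%9B%F0%9D%91%91%F0%9D%91%9F%F0%9D%91%8E';
$plain = fold_math_letters(urldecode($encoded));
echo $plain, PHP_EOL; // Alejandra
```

Unlike the table version, this does not need to know which style was used, at the cost of ignoring the handful of letters that live outside the block.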
If I'm not mistaken, you are trying to decode a URL-encoded string, so why not use urldecode()?
See the PHP documentation for details.

PHP simplexml_load_file and LIBXML_NOENT [duplicate]

I have a php file which prints an xml based on a MySql db.
I get an error every time at exactly the point where there is an & sign.
Here is some php:
$query = mysql_query($sql);
$_xmlrows = '';
while ($row = mysql_fetch_array($query)) {
$_xmlrows .= xmlrowtemplate($row);
}
function xmlrowtemplate($dbrow){
return "<AD>
<CATEGORY>".$dbrow['category']."</CATEGORY>
</AD>";
}
The output is what I want, i.e. the file outputs the correct category, but still gives an error.
The error says: xmlParseEntityRef: no name
And then it points to the exact character which is a & sign.
This complains only if the $dbrow['category'] is something with an & sign in it, for example: "cars & trucks", or "computers & telephones".
Anybody know what the problem is?
BTW: I have the encoding set to UTF-8 in all documents, as well as the xml output.
In XML, & starts an entity reference. Since you haven't defined an entity named &WhateverIsAfterThat, an error is thrown. You should escape it as &amp;.
$string = str_replace('&', '&amp;', $string);
How do I escape ampersands in XML
To escape the other reserved characters:
function xmlEscape($string) {
return str_replace(array('&', '<', '>', '\'', '"'), array('&amp;', '&lt;', '&gt;', '&apos;', '&quot;'), $string);
}
$string = htmlspecialchars($string, ENT_XML1);
is the most universal way to solve all of these encoding errors (IMHO it is better than writing custom functions, and there is no point in handling only &).
Credit: Put Wrikken's and joshweir's comment as answer to be more visible.
You need to either turn & into its entity &amp;, or wrap the contents in CDATA tags.
If you choose the entity route, there are additional characters you need to turn into entities:
> &gt;
< &lt;
' &apos;
" &quot;
Background: Beware of the ampersand when using XML
Wikipedia: List of XML character entity references
An XML escape function using a regex and a switch (JavaScript):
function XmlEscape(str) {
if (!str || str.constructor !== String) {
return "";
}
return str.replace(/[\"&><]/g, function (match) {
switch (match) {
case "\"":
return "&quot;";
case "&":
return "&amp;";
case "<":
return "&lt;";
case ">":
return "&gt;";
}
});
};
public function sanitize(string $data) {
return str_replace('&', '&amp;', $data);
}
You are right, here is more context: the example relates to how to deal with data containing '&' when we pass this data to SimpleXML. Of course, another solution is to use
<![CDATA[some stuff]]>
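A small sketch of the CDATA route in PHP. The helper name xml_cdata is hypothetical. The one thing a CDATA section cannot contain is the sequence "]]>", so the sketch splits it across two sections.

```php
<?php
// Hypothetical helper: wrap text in a CDATA section so & and < need no escaping.
// A literal "]]>" would end the section early, so it is split across two sections.
function xml_cdata(string $text): string
{
    return '<![CDATA[' . str_replace(']]>', ']]]]><![CDATA[>', $text) . ']]>';
}

echo '<CATEGORY>' . xml_cdata('cars & trucks') . '</CATEGORY>', PHP_EOL;
// <CATEGORY><![CDATA[cars & trucks]]></CATEGORY>
```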

Replace characters with word in PHP?

I want to replace specific letters in a string with a full word.
I'm using:
function spec2hex($instr) {
for ($i=0; $i<strlen($instr); $i++) {
$char = substr($instr, $i,1);
if ($char == "a"){
$char = "hello";
}
$convString .= "&#".ord($char).";";
}
return $convString;
}
$myString = "adam";
$convertedString = spec2hex($myString);
echo $convertedString;
but that's returning:
hdhm
How do I do this? By the way, this is to replace punctuation with hex characters.
Thanks all.
Use http://php.net/substr_replace
substr_replace($instr, $word, $i,1);
ord() expects only a SINGLE character. You're passing in hello, so ord is doing its thing only on the h:
php > echo ord('hello');
104
php > echo ord('h');
104
So in effect your output is actually
&#104;&#100;&#104;&#109; (which a browser renders as hdhm)
If you want to keep your existing code, just change $convString .= "&#".ord($char).";";
to $convString .= $char;
If you just want to replace the occurrence of a with hello within the string you pass to the function, why not use PHP's str_replace()?
function spec2hex($instr) {
return str_replace("a","hello",$instr);
}
I assume that you want HTML entities, not hex characters, in place of punctuation. Be aware that str_replace(), when called with arrays, runs over the string multiple times, and so also replaces the ";" in "&#123;"!
Your posted code is not useful for replacing punctuation.
Use strtr() with an array; it doesn't have this drawback of str_replace().
$aReplacements = array(',' => '&#44;', '.' => '&#46;'); //todo: complete the array
$sText = strtr($sText, $aReplacements);
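To make the difference concrete, here is a short comparison on a made-up input string. It only illustrates the rescanning behaviour described above.

```php
<?php
$text = 'a{b;c';

// str_replace() applies the pairs one after another, so the ";" produced by
// the first replacement is itself replaced by the second.
$sequential = str_replace(['{', ';'], ['&#123;', '&#59;'], $text);

// strtr() with an array scans the input once and never rescans its own output.
$singlePass = strtr($text, ['{' => '&#123;', ';' => '&#59;']);

echo $sequential, PHP_EOL; // a&#123&#59;b&#59;c
echo $singlePass, PHP_EOL; // a&#123;b&#59;c
```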

Revert escaped characters

I saved some data in the database using mysql_real_escape_string(), so the single quotes are escaped like this: &#039;. It looks OK in the browser, but how can I convert it back to a single quote when I save the text in a txt file?
Please note that mysql_real_escape_string() does not turn apostrophes ' into &#039;. Only HTML-oriented functions do, so you must have calls to htmlentities() somewhere in your script.
As for your question, the function you're looking for is html_entity_decode()
echo html_entity_decode('&#039;', ENT_QUOTES);
This is the reason why you should not store encoded text in the database. You should have stored it in its original format, and encoded it when you display it.
Now you have to check what characters the function does encode, and write string replacements that converts them back, in reverse order.
Pseudo-code example:
s = Replace(s, "&#039;", "'")
s = Replace(s, "&lt;", "<")
s = Replace(s, "&gt;", ">")
s = Replace(s, "&amp;", "&")
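The order matters because decoding &amp; too early can create new entities that then get decoded a second time. A quick PHP sketch with a made-up stored value, assuming the text was encoded with &#039; for apostrophes:

```php
<?php
// Stored value: the user literally typed "&lt;" and it was then entity-encoded.
$stored = 'Use &amp;lt; for &#039;less than&#039;';

// Wrong order: decoding &amp; first creates a fresh "&lt;" that is decoded again.
$wrong = str_replace(['&amp;', '&lt;', '&gt;', '&#039;'], ['&', '<', '>', "'"], $stored);

// Order from the pseudo-code above: &amp; is decoded last.
$right = str_replace(['&#039;', '&lt;', '&gt;', '&amp;'], ["'", '<', '>', '&'], $stored);

echo $wrong, PHP_EOL; // Use < for 'less than'
echo $right, PHP_EOL; // Use &lt; for 'less than'
```

html_entity_decode() with ENT_QUOTES decodes in a single pass and gives the same result as the second variant for this input.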
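That is just the ASCII value of "'"; use chr to turn it back into a character. Here's the code:
$string = "Hello &#039; Man";
// The /e modifier was removed in PHP 7, so use a callback instead
$string = preg_replace_callback('|&#(\d{1,3});|', function ($m) { return chr((int) $m[1]); }, $string);
echo $string; # Hello ' Man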
