PHP How to encode text to numeric entity? - php

I have xml like this:
<formula type="inline">
<default:math xmlns="http://www.w3.org/1998/Math/MathML">
<default:mi>
&Zopf;
</default:mi>
</default:math>
</formula>
My goal is to get rid of all special entities like &Zopf; by replacing them by their numeric entity presentations.
I tried :
$test = <content of the xml>;
$convmap = array(0x80, 0xffff, 0, 0xffff);
$test = mb_encode_numericentity($test, $convmap, 'UTF-8');
But this will not replace the &Zopf; Any idea?
My goal is to get:
ℤ
as shown here: http://www.fileformat.info/info/unicode/char/2124/index.htm
Thank you.

Your converter is converting your LaTeX into MathML, not HTML entities. You need something that converts directly into HTML character references, or a MathML to HTML character reference converter.
You should be able to use htmlentities:
htmlentities($symbolsToEncode, ENT_XML1, 'UTF-8');
http://pt1.php.net/htmlentities
You can change ENT_XML1 to ENT_SUBSTITUTE and it will return Unicode Replacement Characters or Hex character references.
As an alternative, you could use strtr to convert the characters to something you specify:
$chars = array(
"\x8484" => "蒄"
...
);
$convertedXML = strtr($xml, $chars);
http://php.net/strtr
Someone has done something similar on GitHub.

So you need to decode the named entities first:
function decodeNamedEntities($string) {
static $entities = NULL;
if (NULL === $entities) {
$entities = array_flip(
array_diff(
get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_HTML5, 'UTF-8'),
get_html_translation_table(HTML_ENTITIES, ENT_COMPAT | ENT_XML1, 'UTF-8')
)
);
}
return str_replace(array_keys($entities), $entities, $string);
}
After that you can use htmlentities to encode them in a different format if it is really needed.

Related

PHP Escaped special characters to html

I have string that looks like this "v\u00e4lkommen till mig" that I get after doing utf8_encode() on the string.
I would like that string to become
välkommen till mig
where the character
\u00e4 = ä = ä
How can I achive this in PHP?
Do not use utf8_(de|en)code. It just converts from UTF8 to ISO-8859-1 and back. ISO 8859-1 does not provide the same characters as ISO-8859-15 or Windows1252, which are the most used encodings (besides UTF-8). Better use mb_convert_encoding.
"v\u00e4lkommen till mig" > This string looks like a JSON encoded string which IS already utf8 encoded. The unicode code positiotion of "ä" is U+00E4 >> \u00e4.
Example
<?php
header('Content-Type: text/html; charset=utf-8');
$json = '"v\u00e4lkommen till mig"';
var_dump(json_decode($json)); //It will return a utf8 encoded string "välkommen till mig"
What is the source of this string?
There is no need to replace the ä with its HTML representation ä, if you print it in a utf8 encoded document and tell the browser the used encoding. If it is necessary, use htmlentities:
<?php
$json = '"v\u00e4lkommen till mig"';
$string = json_decode($json);
echo htmlentities($string, ENT_COMPAT, 'UTF-8');
Edit: Since you want to keep HTML characters, and I now think your source string isn't quite what you posted (I think it is actual unicode, rather than containing \unnnn as a string), I think your best option is this:
$html = str_replace( str_replace( str_replace( htmlentities( $whatever ), '<', '<' ), '>', '>' ), '&', '&' );
(note: no call to utf8-decode)
Original answer:
There is no direct conversion. First, decode it again:
$decoded = utf8_decode( $whatever );
then encode as HTML:
$html = htmlentities( $decoded );
and of course you can do it without a variable:
$html = htmlentities( utf8_decode( $whatever ) );
http://php.net/manual/en/function.utf8-decode.php
http://php.net/manual/en/function.htmlentities.php
To do this by regular expression (not recommended, likely slower, less reliable), you can use the fact that HTML supports &#xnnnn; constructs, where the nnnn is the same as your existing \unnnn values. So you can say:
$html = preg_replace( '/\\\\u([0-9a-f]{4})/i', '&#x$1;', $whatever )
The html_entity_decode worked for me.
$json = '"v\u00e4lkommen till mig"';
echo $decoded = html_entity_decode( json_decode($json) );

PHP not have a function for XML-safe entity decode? Not have some xml_entity_decode?

THE PROBLEM: I need a XML file "full encoded" by UTF8; that is, with no entity representing symbols, all symbols enconded by UTF8, except the only 3 ones that are XML-reserved, "&" (amp), "<" (lt) and ">" (gt). And, I need a build-in function that do it fast: to transform entities into real UTF8 characters (without corrupting my XML).
PS: it is a "real world problem" (!); at PMC/journals, for example, have 2.8 MILLION of scientific articles enconded with a special XML DTD (knowed also as JATS format)... To process as "usual XML-UTF8-text" we need to change from numeric entity to UTF8 char.
THE ATTEMPTED SOLUTION: the natural function to this task is html_entity_decode, but it destroys the XML code (!), transforming the reserved 3 XML-reserved symbols.
Illustrating the problem
Suppose
$xmlFrag ='<p>Hello world!    Let A<B and A=∬dxdy</p>';
Where the entities 160 (nbsp) and x222C (double integral) must be transformed into UTF8, and the XML-reserved lt not. The XML text will be (after transformed),
$xmlFrag = '<p>Hello world!    Let A<B and A=∬dxdy</p>';
The text "A<B" needs an XML-reserved character, so MUST stay as A<B.
Frustrated solutions
I try to use html_entity_decode for solve (directly!) the problem... So, I updated my PHP to v5.5 to try to use the ENT_XML1 option,
$s = html_entity_decode($xmlFrag, ENT_XML1, 'UTF-8'); // not working
// as I expected
Perhaps another question is, "WHY there are no other option to do what I expected?" -- it is important for many other XML applications (!), not only for me.
I not need a workaround as answer... Ok, I show my ugly function, perhaps it helps you to understand the problem,
function xml_entity_decode($s) {
// here an illustration (by user-defined function)
// about how the hypothetical PHP-build-in-function MUST work
static $XENTITIES = array('&','>','<');
static $XSAFENTITIES = array('#_x_amp#;','#_x_gt#;','#_x_lt#;');
$s = str_replace($XENTITIES,$XSAFENTITIES,$s);
//$s = html_entity_decode($s, ENT_NOQUOTES, 'UTF-8'); // any php version
$s = html_entity_decode($s, ENT_HTML5|ENT_NOQUOTES, 'UTF-8'); // PHP 5.3+
$s = str_replace($XSAFENTITIES,$XENTITIES,$s);
return $s;
} // you see? not need a benchmark:
// it is not so fast as direct use of html_entity_decode; if there
// was an XML-safe option was ideal.
PS: corrected after this answer. Must be ENT_HTML5 flag, for convert really all named entities.
This question is creating, time-by-time, a "false answer" (see answers). This is perhaps because people not pay attention, and because there are NO ANSWER: there are a lack of PHP build-in solution.
... So, lets repeat my workaround (that is NOT an answer!) to not create more confusion:
The best workaround
Pay attention:
The function xml_entity_decode() below is the best (over any other) workaround.
The function below is not an answer to the present question, it is only a workwaround.
function xml_entity_decode($s) {
// illustrating how a (hypothetical) PHP-build-in-function MUST work
static $XENTITIES = array('&','>','<');
static $XSAFENTITIES = array('#_x_amp#;','#_x_gt#;','#_x_lt#;');
$s = str_replace($XENTITIES,$XSAFENTITIES,$s);
$s = html_entity_decode($s, ENT_HTML5|ENT_NOQUOTES, 'UTF-8'); // PHP 5.3+
$s = str_replace($XSAFENTITIES,$XENTITIES,$s);
return $s;
}
To test and to demonstrate that you have a better solution, please test first with this simple benckmark:
$countBchMk_MAX=1000;
$xml = file_get_contents('sample1.xml'); // BIG and complex XML string
$start_time = microtime(TRUE);
for($countBchMk=0; $countBchMk<$countBchMk_MAX; $countBchMk++){
$A = xml_entity_decode($xml); // 0.0002
/* 0.0014
$doc = new DOMDocument;
$doc->loadXML($xml, LIBXML_DTDLOAD | LIBXML_NOENT);
$doc->encoding = 'UTF-8';
$A = $doc->saveXML();
*/
}
$end_time = microtime(TRUE);
echo "\n<h1>END $countBchMk_MAX BENCKMARKs WITH ",
($end_time - $start_time)/$countBchMk_MAX,
" seconds</h1>";
Use the DTD when loading the JATS XML document, as it will define any mapping from named entities to Unicode characters, then set the encoding to UTF-8 when saving:
$doc = new DOMDocument;
$doc->load($inputFile, LIBXML_DTDLOAD | LIBXML_NOENT);
$doc->encoding = 'UTF-8';
$doc->save($outputFile);
I had the same problem because someone used HTML templates to create XML, instead of using SimpleXML. sigh... Anyway, I came up with the following. It's not as fast as yours, but it's not an order of magnitude slower, and it is less hacky. Yours will inadvertently convert #_x_amp#; to $amp;, however unlikely its presence in the source XML.
Note: I'm assuming default encoding is UTF-8
// Search for named entities (strings like "&abc1;").
echo preg_replace_callback('#&[A-Z0-9]+;#i', function ($matches) {
// Decode the entity and re-encode as XML entities. This means "&"
// will remain "&" whereas "€" becomes "€".
return htmlentities(html_entity_decode($matches[0]), ENT_XML1);
}, "<Foo>€&foo Ç</Foo>") . "\n";
/* <Foo>€&foo Ç</Foo> */
Also, if you want to replace special characters with numbered entities (in case you don't want a UTF-8 XML), you can easily add a function to the above code:
// Search for named entities (strings like "&abc1;").
$xml_utf8 = preg_replace_callback('#&[A-Z0-9]+;#i', function ($matches) {
// Decode the entity and re-encode as XML entities. This means "&"
// will remain "&" whereas "€" becomes "€".
return htmlentities(html_entity_decode($matches[0]), ENT_XML1);
}, "<Foo>€&foo Ç</Foo>") . "\n";
echo mb_encode_numericentity($xml_utf8, [0x80, 0xffff, 0, 0xffff]);
/* <Foo>€&foo Ç</Foo> */
In your case you want it the other way around. Encode numbered entities as UTF-8:
// Search for named entities (strings like "&abc1;").
$xml_utf8 = preg_replace_callback('#&[A-Z0-9]+;#i', function ($matches) {
// Decode the entity and re-encode as XML entities. This means "&"
// will remain "&" whereas "€" becomes "€".
return htmlentities(html_entity_decode($matches[0]), ENT_XML1);
}, "<Foo>€&foo Ç</Foo>") . "\n";
// Encodes (uncaught) numbered entities to UTF-8.
echo mb_decode_numericentity($xml_utf8, [0x80, 0xffff, 0, 0xffff]);
/* <Foo>€&foo Ç</Foo> */
Benchmark
I've added a benchmark for good measure. This also demonstrates the flaw in your solution for clarity. Below is the input string I used.
<Foo>€&foo Ç é #_x_amp#; ∬</Foo>
Your method
php -r '$q=["&",">","<"];$y=["#_x_amp#;","#_x_gt#;","#_x_lt#;"]; $s=microtime(1); for(;++$i<1000000;)$r=str_replace($y,$q,html_entity_decode(str_replace($q,$y,"<Foo>€&foo Ç é #_x_amp#; ∬</Foo>"),ENT_HTML5|ENT_NOQUOTES)); $t=microtime(1)-$s; echo"$r\n=====\nTime taken: $t\n";'
<Foo>€&foo Ç é & ∬</Foo>
=====
Time taken: 2.0397531986237
My method
php -r '$s=microtime(1); for(;++$i<1000000;)$r=preg_replace_callback("#&[A-Z0-9]+;#i",function($m){return htmlentities(html_entity_decode($m[0]),ENT_XML1);},"<Foo>€&foo Ç é #_x_amp#; ∬</Foo>"); $t=microtime(1)-$s; echo"$r\n=====\nTime taken: $t\n";'
<Foo>€&foo Ç é #_x_amp#; ∬</Foo>
=====
Time taken: 4.045273065567
My method (with unicode to numbered entity):
php -r '$s=microtime(1); for(;++$i<1000000;)$r=mb_encode_numericentity(preg_replace_callback("#&[A-Z0-9]+;#i",function($m){return htmlentities(html_entity_decode($m[0]),ENT_XML1);},"<Foo>€&foo Ç é #_x_amp#; ∬</Foo>"),[0x80,0xffff,0,0xffff]); $t=microtime(1)-$s; echo"$r\n=====\nTime taken: $t\n";'
<Foo>€&foo Ç é #_x_amp#; ∬</Foo>
=====
Time taken: 5.4407880306244
My method (with numbered entity to unicode):
php -r '$s=microtime(1); for(;++$i<1000000;)$r=mb_decode_numericentity(preg_replace_callback("#&[A-Z0-9]+;#i",function($m){return htmlentities(html_entity_decode($m[0]),ENT_XML1);},"<Foo>€&foo Ç é #_x_amp#;</Foo>"),[0x80,0xffff,0,0xffff]); $t=microtime(1)-$s; echo"$r\n=====\nTime taken: $t\n";'
<Foo>€&foo Ç é #_x_amp#; ∬</Foo>
=====
Time taken: 5.5400078296661
public function entity_decode($str, $charset = NULL)
{
if (strpos($str, '&') === FALSE)
{
return $str;
}
static $_entities;
isset($charset) OR $charset = $this->charset;
$flag = is_php('5.4')
? ENT_COMPAT | ENT_HTML5
: ENT_COMPAT;
do
{
$str_compare = $str;
// Decode standard entities, avoiding false positives
if ($c = preg_match_all('/&[a-z]{2,}(?![a-z;])/i', $str, $matches))
{
if ( ! isset($_entities))
{
$_entities = array_map('strtolower', get_html_translation_table(HTML_ENTITIES, $flag, $charset));
// If we're not on PHP 5.4+, add the possibly dangerous HTML 5
// entities to the array manually
if ($flag === ENT_COMPAT)
{
$_entities[':'] = '&colon;';
$_entities['('] = '&lpar;';
$_entities[')'] = '&rpar';
$_entities["\n"] = '&newline;';
$_entities["\t"] = '&tab;';
}
}
$replace = array();
$matches = array_unique(array_map('strtolower', $matches[0]));
for ($i = 0; $i < $c; $i++)
{
if (($char = array_search($matches[$i].';', $_entities, TRUE)) !== FALSE)
{
$replace[$matches[$i]] = $char;
}
}
$str = str_ireplace(array_keys($replace), array_values($replace), $str);
}
// Decode numeric & UTF16 two byte entities
$str = html_entity_decode(
preg_replace('/(&#(?:x0*[0-9a-f]{2,5}(?![0-9a-f;]))|(?:0*\d{2,4}(?![0-9;])))/iS', '$1;', $str),
$flag,
$charset
);
}
while ($str_compare !== $str);
return $str;
}
For those coming here because your numeric entity in the range 128 to 159 remains as numeric entity instead of being converted to a character:
echo xml_entity_decode('€');
//Output € instead expected €
This depends on PHP version (at least for PHP >=5.6 the entity remains) and on the affected characters. The reason is that the characters 128 to 159 are not printable characters in UTF-8. This can happen if the data to be converted mix up windows-1252 content (where € is the € sign).
Try this function:
function xmlsafe($s,$intoQuotes=1) {
if ($intoQuotes)
return str_replace(array('&','>','<','"'), array('&','>','<','"'), $s);
else
return str_replace(array('&','>','<'), array('&','>','<'), html_entity_decode($s));
}
example usage:
echo '<k nid="'.$node->nid.'" description="'.xmlsafe($description).'"/>';
also: https://stackoverflow.com/a/9446666/2312709
this code used in production seem that no problems happened with UTF-8

php non latin to hex function

I have website that's in win-1251 encoding and it needs to stay that way. But I also need to be able to echo few links that contain non latin, non cyrillic characters like šžāņūī...
I need a function that convert this
"māja un man tā patīk"
to
"māja un man tā patīk"
and that does not touch html, so if there is <b> it needs to stay as <b>, not > or <
And please no advices about the encoding and how wrong that is.
$str = "<b>Obāchan</b> おばあちゃん";
$str = preg_replace_callback('/./u', function ($matches) {
$chr = $matches[0];
if (strlen($chr) > 1) {
$chr = mb_convert_encoding($chr, 'HTML-ENTITIES', 'UTF-8');
}
return $chr;
}, $str);
This expects the original $str to be UTF-8 encoded, i.e. your PHP file should be saved in UTF-8. It encodes all non-ASCII compatible code points to HTML entities. Since all HTML special characters are ASCII characters, they remain untouched. The resulting string is pure ASCII. Since the lower Win-1251 code points are ASCII compatible, the resulting string is also a valid Win-1251 string. The above $str converts to:
<b>Obāchan</b> おばあちゃん
The main things you probably don't want to encode are <, > and &. Those are really the only special characters. So how about encoding everything first, and then just decode <, > and & I feel you should be fine.
This is untested:
$output =
htmlspecialchars_decode(
htmlentities($input, ENT_NOQUOTES, 'CP-1251')
);
let me know
What Evert suggest looks logical to me too! If you insist this is a way to do it if there are only two letters that bother you. For more letters the scrit will not be as effective and needs to change.
<?PHP
function myConvert($str)
{
$chars['ā']='ā';
$chars['ī']='ī';
foreach ($chars as $key => $value)
$output = str_replace($key, $value, $str);
echo $str;
}
myConvert("māja un man tā patīk");
?>
==================edited==============
For many characters maybe this one can help you:
<?PHP
function myConvert($str)
{
$final=null;
$parts = preg_split("/&#[0-9]*;/i", $str);//get all text parts
preg_match_all("/&#[0-9]*;/i", $str, $delimiters );//get delimiters;
$delimiters[0][]='';//make arrays equal size
foreach($parts as $key => $value)
$final.=$value.mb_convert_encoding
($delimiters[0][$key], "UTF-8", "HTML-ENTITIES");
return $final;
}
$fh = fopen("testFile.txt", 'w') ;
fwrite($fh, myConvert("māja un man tā patīkī"));
fclose($fh);
?>
The desired output is written in the text file. This code, exactly as it is -not merged in some project- does what it claims to do. Converts codes like ā to the analogous character they present.

How do I convert arabic letters in htmlentities symbols?

I need convert arabic letters in htmlentities symbols. Codepage: ISO-8859-1.
سك - this is arabic symbol for example.
htmlentities("سك")
returns:
س�
How can I get from this symbol the html-entities سك?
htmlentities() can do only characters that have named entities. See this question on how to convert arbitrary characters into numeric entities.
You're probably not targeting the correct charset. Try: htmlentities('سك', ENT_QUOTES, 'UTF-8');
i'm using a function to make sure there are no html code or cotation posted by user
function cleartext($x1){
$x1 = str_replace('"','',$x1);
$x1 = str_replace("'",'',$x1);
$x1 = htmlentities($x1, ENT_QUOTES, 'UTF-8');
return $x1;
}
so thank for ( ENT_QUOTES, 'UTF-8' ) it helped me to find what am looking for

Generating XML document in PHP (escape characters)

I'm generating an XML document from a PHP script and I need to escape the XML special characters.
I know the list of characters that should be escaped; but what is the correct way to do it?
Should the characters be escaped just with backslash (\') or what is the proper way?
Is there any built-in PHP function that can handle this for me?
I created simple function that escapes with the five "predefined entities" that are in XML:
function xml_entities($string) {
return strtr(
$string,
array(
"<" => "<",
">" => ">",
'"' => """,
"'" => "&apos;",
"&" => "&",
)
);
}
Usage example Demo:
$text = "Test & <b> and encode </b> :)";
echo xml_entities($text);
Output:
Test &amp; <b> and encode </b> :)
A similar effect can be achieved by using str_replace but it is fragile because of double-replacings (untested, not recommended):
function xml_entities($string) {
return str_replace(
array("&", "<", ">", '"', "'"),
array("&", "<", ">", """, "&apos;"),
$string
);
}
Use the DOM classes to generate your whole XML document. It will handle encodings and decodings that we don't even want to care about.
Edit: This was criticized by #Tchalvak:
The DOM object creates a full XML document, it doesn't easily lend itself to just encoding a string on it's own.
Which is wrong, DOMDocument can properly output just a fragment not the whole document:
$doc->saveXML($fragment);
which gives:
Test & <b> and encode </b> :)
Test &amp; <b> and encode </b> :)
as in:
$doc = new DOMDocument();
$fragment = $doc->createDocumentFragment();
// adding XML verbatim:
$xml = "Test & <b> and encode </b> :)\n";
$fragment->appendXML($xml);
// adding text:
$text = $xml;
$fragment->appendChild($doc->createTextNode($text));
// output the result
echo $doc->saveXML($fragment);
See Demo
What about the htmlspecialchars() function?
htmlspecialchars($input, ENT_QUOTES | ENT_XML1, $encoding);
Note: the ENT_XML1 flag is only available if you have PHP 5.4.0 or higher.
htmlspecialchars() with these parameters replaces the following characters:
& (ampersand) becomes &
" (double quote) becomes "
' (single quote) becomes &apos;
< (less than) becomes <
> (greater than) becomes >
You can get the translation table by using the get_html_translation_table() function.
Tried hard to deal with XML entity issue, solve in this way:
htmlspecialchars($value, ENT_QUOTES, 'UTF-8')
In order to have a valid final XML text, you need to escape all XML entities and have the text written in the same encoding as the XML document processing-instruction states it (the "encoding" in the <?xml line). The accented characters don't need to be escaped as long as they are encoded as the document.
However, in many situations simply escaping the input with htmlspecialchars may lead to double-encoded entities (for example é would become &eacute;), so I suggest decoding html entities first:
function xml_escape($s)
{
$s = html_entity_decode($s, ENT_QUOTES, 'UTF-8');
$s = htmlspecialchars($s, ENT_QUOTES, 'UTF-8', false);
return $s;
}
Now you need to make sure all accented characters are valid in the XML document encoding. I strongly encourage to always encode XML output in UTF-8, since not all the XML parsers respect the XML document processing-instruction encoding. If your input might come from a different charset, try using utf8_encode().
There's a special case, which is your input may come from one of these encodings: ISO-8859-1, ISO-8859-15, UTF-8, cp866, cp1251, cp1252, and KOI8-R -- PHP treats them all the same, but there are some slight differences in them -- some of which even iconv() cannot handle. I could only solve this encoding issue by complementing utf8_encode() behavior:
function encode_utf8($s)
{
$cp1252_map = array(
"\xc2\x80" => "\xe2\x82\xac",
"\xc2\x82" => "\xe2\x80\x9a",
"\xc2\x83" => "\xc6\x92",
"\xc2\x84" => "\xe2\x80\x9e",
"\xc2\x85" => "\xe2\x80\xa6",
"\xc2\x86" => "\xe2\x80\xa0",
"\xc2\x87" => "\xe2\x80\xa1",
"\xc2\x88" => "\xcb\x86",
"\xc2\x89" => "\xe2\x80\xb0",
"\xc2\x8a" => "\xc5\xa0",
"\xc2\x8b" => "\xe2\x80\xb9",
"\xc2\x8c" => "\xc5\x92",
"\xc2\x8e" => "\xc5\xbd",
"\xc2\x91" => "\xe2\x80\x98",
"\xc2\x92" => "\xe2\x80\x99",
"\xc2\x93" => "\xe2\x80\x9c",
"\xc2\x94" => "\xe2\x80\x9d",
"\xc2\x95" => "\xe2\x80\xa2",
"\xc2\x96" => "\xe2\x80\x93",
"\xc2\x97" => "\xe2\x80\x94",
"\xc2\x98" => "\xcb\x9c",
"\xc2\x99" => "\xe2\x84\xa2",
"\xc2\x9a" => "\xc5\xa1",
"\xc2\x9b" => "\xe2\x80\xba",
"\xc2\x9c" => "\xc5\x93",
"\xc2\x9e" => "\xc5\xbe",
"\xc2\x9f" => "\xc5\xb8"
);
$s=strtr(utf8_encode($s), $cp1252_map);
return $s;
}
If you need proper xml output, simplexml is the way to go:
http://www.php.net/manual/en/simplexmlelement.asxml.php
Proper escaping is the way to get correct XML output but you need to handle escaping differently for attributes and elements. (That is Tomas' answer is incorrect).
I wrote/stole some Java code a while back that differentiates between attribute and element escaping. The reason is that the XML parser considers all white space special particularly in attributes.
It should be trivial to port that over to PHP (you can use Tomas Jancik's approach with the above appropriate escaping). You don't have to worry about escaping extended entities if your using UTF-8.
If you don't want to port my Java code you can look at XMLWriter which is stream based and uses libxml so it should be very efficient.
You can use this methods:
http://php.net/manual/en/function.htmlentities.php
In that way all entities (html/xml) are escaped and you can put your string inside XML tags
Based on the solution of sadeghj the following code worked for me:
/**
* #param $arr1 the single string that shall be masked
* #return the resulting string with the masked characters
*/
function replace_char($arr1)
{
if (strpos ($arr1,'&')!== FALSE) { //test if the character appears
$arr1=preg_replace('/&/','&', $arr1); // do this first
}
// just encode the
if (strpos ($arr1,'>')!== FALSE) {
$arr1=preg_replace('/>/','>', $arr1);
}
if (strpos ($arr1,'<')!== FALSE) {
$arr1=preg_replace('/</','<', $arr1);
}
if (strpos ($arr1,'"')!== FALSE) {
$arr1=preg_replace('/"/','"', $arr1);
}
if (strpos ($arr1,'\'')!== FALSE) {
$arr1=preg_replace('/\'/','&apos;', $arr1);
}
return $arr1;
}
function replace_char($arr1)
{
$arr[]=preg_replace('>','&gt', $arr1);
$arr[]=preg_replace('<','&lt', $arr1);
$arr[]=preg_replace('"','&quot', $arr1);
$arr[]=preg_replace('\'','&apos', $arr1);
$arr[]=preg_replace('&','&amp', $arr1);
return $arr;
}

Categories