PHP: htmlentities and htmlspecialchars not converting some characters

I'm trying to convert all special characters into HTML-safe entities on their way into my database, but I can't get PHP to handle certain characters. For example, if my string contains any of the following: ¡£¢∞§¶, it gets turned into an empty string.
So for example, the following string:
Hello£
gets turned into an empty string after it's POSTed and processed by the following code:
$workDetails["copy"] = htmlentities($workDetails["copy"], ENT_QUOTES, "UTF-8");
I presume I'm doing something wrong? :(

It may be enough to declare the encoding of your website as UTF-8 with the header() function:
header("Content-Type: text/html; charset=utf-8"); in PHP
or
<?xml version="1.0" encoding="utf-8" ?>; at the top of your HTML template if you use one.
If you definitely need to convert those characters to their specific HTML codes, you should also create your own function to replace the symbols that htmlspecialchars() does not cover.
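One likely cause of the empty string, assuming the POSTed bytes arrive as ISO-8859-1 rather than UTF-8 (the question does not confirm this): htmlentities() returns an empty string when the input is not valid in the encoding it is told to expect, unless ENT_SUBSTITUTE or ENT_IGNORE is passed. The sketch below is only an illustration; the helper name toHtmlEntities and the ISO-8859-1 fallback are assumptions, not part of the original code.

```php
<?php
// Hypothetical helper: encode a string as HTML entities without losing
// input that is not valid UTF-8.
function toHtmlEntities(string $input, string $fallbackEncoding = 'ISO-8859-1'): string
{
    // Assumption: anything that is not valid UTF-8 is in $fallbackEncoding.
    if (!mb_check_encoding($input, 'UTF-8')) {
        $input = mb_convert_encoding($input, 'UTF-8', $fallbackEncoding);
    }
    return htmlentities($input, ENT_QUOTES, 'UTF-8');
}

// "Hello£" as ISO-8859-1 bytes: 0xA3 on its own is not valid UTF-8.
$latin1 = "Hello\xA3";

// Reproduces the problem: invalid UTF-8 input yields an empty string.
$broken = htmlentities($latin1, ENT_QUOTES, 'UTF-8');

// Converting first keeps the text and produces the named entity.
$fixed = toHtmlEntities($latin1);
```

If the form page is already served and submitted as UTF-8, the input passes the check untouched and the helper behaves like a plain htmlentities() call.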

Related

PHP http_build_query generating a ® sign when I write reg, how to escape?

I have the following code:
$data1 = [
"user_number" => "423423", // unique_id
"reg_date" => "2013-01-20", // date of registration yyyy-mm-dd
];
echo http_build_query($data1);
This is generating the following string:
user_number=423423®_date=2013-01-20
As you can see, it converts "reg" to ®, breaking the API query. How to prevent it from doing that?
&reg is how you write a registered trademark symbol in HTML.
Your URL is fine, the problem is that you are interpreting it as a URL in HTML and not a URL in plain text.
Use htmlspecialchars to convert the string to HTML source code.
The file you are editing (the physical .php file) may be saved in ISO-8859-1 or a similar encoding.
You need to convert your file to UTF-8 to avoid this problem.
You may also want to apply a function that converts special characters:
echo htmlspecialchars(http_build_query($data1));
echo htmlentities(http_build_query($data1));
Both will work.
The issue here is that the browser renders &reg as ®.
Escaping the string with one of these functions makes it display the way you want.
The online PHP interpreters probably do this already, which is why we can't reproduce the issue there.
echo http_build_query($data1, '', '&amp;');
&reg is how you write a registered trademark symbol in HTML.
That means your original code returns
user_number=423423&reg_date=2013-01-20
and when you output it to a browser, the browser converts the &reg part to ®.
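The answers above agree that the query string itself is correct and only its display as HTML is misleading. A small sketch of keeping the two forms apart; the host api.example.com and the variable names $query, $forHtml and $link are placeholders for illustration.

```php
<?php
$data1 = [
    "user_number" => "423423",     // unique_id
    "reg_date"    => "2013-01-20", // date of registration yyyy-mm-dd
];

// Raw query string: this is what an API request should receive.
// The separator is passed explicitly so the result does not depend on
// the arg_separator.output ini setting.
$query = http_build_query($data1, '', '&');

// Escaped form: only for writing the string into HTML source.
$forHtml = htmlspecialchars($query, ENT_QUOTES, 'UTF-8');

// Hypothetical link; the browser decodes &amp; back to & when it is followed.
$link = '<a href="https://api.example.com/register?' . $forHtml . '">Register</a>';
```

Sending $forHtml to the API instead of $query would break the request, so the escaping belongs at the point of HTML output, not where the query is built.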

MySQL PHP, retrieved string contains apostrophe, can't replace

I have some records in my table and, as I can see in phpMyAdmin, they contain apostrophes like this:
Brazil’s ‘car wash’
When I make a query and echo them on web page without any header these apostrophes appear as question marks like this:
Brazil�s �car wash�
but with this header:
header("Content-Type: text/html; charset=ISO-8859-1");
they appear correctly.
Now the problem: I cannot replace them using this code:
$title = str_replace('’',"",$title);
$title = str_replace("‘","",$title);
How can I replace those apostrophes if str_replace is not working?
This indicates your data is stored in an ISO encoding, while the default encoding of your web page might be UTF-8 or something else. Why replace them? You could convert your data to UTF-8 prior to outputting it, or change the whole site encoding to ISO. But I would always prefer UTF-8.
You can convert to UTF-8 with $title = utf8_encode($title).
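One caveat worth illustrating: utf8_encode() assumes strict ISO-8859-1, where the curly quotes do not exist, and it is deprecated as of PHP 8.2. Browsers treat a page labelled ISO-8859-1 as Windows-1252, so the sketch below assumes the stored bytes are Windows-1252; that is a guess based on the symptoms, not something the question states. It also shows why str_replace() found nothing: the search strings in a UTF-8 script are three bytes each, while the data holds a single byte per quote.

```php
<?php
// Assumption: the database returns Windows-1252 bytes, where 0x91 and 0x92
// are the left and right single quotation marks.
$title = "Brazil\x92s \x91car wash\x92";

// Convert to UTF-8 so the data matches UTF-8 string literals in the script.
$utf8Title = mb_convert_encoding($title, 'UTF-8', 'Windows-1252');

// "\u{2018}" and "\u{2019}" are the curly quotes written as code points,
// which avoids depending on the encoding of the script file itself.
$stripped = str_replace(["\u{2018}", "\u{2019}"], '', $utf8Title);
```

After the conversion the page should be served as UTF-8; setting the connection character set on the database side would make the conversion unnecessary.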

Parsing xml with PHP what to do with characters like these

I'm parsing an xml document using php.
When I see the result in my browser I get the following characters:
Ã± instead of Spanish ñ
Ã­ instead of í
Ã¡ instead of á
Ã³ instead of ó
Ã© instead of é
I was going to use a str_replace and replace every odd character for the good ones, but sadly the pattern before happens only sometimes and in general I have a wide collection of odd characters :(
The xml heading is:
<?xml version="1.0" encoding="iso-8859-1"?>
But if I change it to utf-8 it simply won't be printed ..
I load the xml as a string with simplexml_load_string (comes from database like that)
Can you please give me any ideas on how to solve this?
Thanks a lot
You have 2 options:
a) include a header('Content-Type: text/html; charset=iso-8859-1'); before any output in your php file.
b) convert the output to utf-8 with $str = mb_convert_encoding($str, 'UTF-8', 'ISO-8859-1');
Both should do the trick.
SimpleXML uses UTF-8 to encode stored strings. You can use an XML file declared as iso-8859-1, but if you want to print XML values in that encoding, you have to run them through utf8_decode first.
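The point about SimpleXML returning UTF-8 can be checked with a short sketch. The sample document and variable names are made up for illustration; mb_convert_encoding() is used for the reverse conversion instead of utf8_decode(), which is deprecated as of PHP 8.2.

```php
<?php
// An ISO-8859-1 document: 0xF1 is "ñ" in that encoding.
$source = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?><name>Espa\xF1a</name>";

$xml = simplexml_load_string($source);

// SimpleXML hands values back as UTF-8, whatever the prolog declares.
$utf8 = (string) $xml;

// Only needed if the page itself is served as ISO-8859-1.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');
```

Printing $utf8 on a page declared as ISO-8859-1 produces exactly the two-character sequences described in the question, so either the page header or the string has to change, not both.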
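// strip control characters and every non-ASCII byte from the string
$string = preg_replace('/[\x00-\x1F\x80-\xFF]/', '', $string);
// new xml: the third argument marks 'new.xml' as a file path rather than XML data
$xml = new SimpleXMLElement('new.xml', 0, true);
// Displaying XML in textual form
echo $xml->asXML();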

Handle special symbols using PHP (i.e. ™ instead of â„¢)

I'm trying to read an XML document which contains ™ (™), but for some reason, no matter what I try, it always displays as â„¢.
For example:
$xml = new SimpleXMLElement('<item><title>test</title></item><item><title>™</title></item>');
foreach ($xml->item AS $item)
{
echo $item->title . "<br />";
}
Results in:
test
â„¢
Just to be clear, I don't want it to just show appropriately, I need to insert it to a DB.
Thanks!
The code in your original post works fine for me, at least if I add <xml> tags. Make sure the content encoding of your HTML page is set correctly, i.e. send the HTTP header Content-Type:text/html; charset=UTF-8 or set this in your <head>. When inserting the string containing this symbol into the database, first set the character set to UTF-8 using SET NAMES UTF8. Of course, the database/table/field into which you are inserting should be set to UTF8 too.
Try using utf8_decode or utf8_encode php functions. They should convert it into the correct character.
echo utf8_decode($this->title);
Run htmlentities() over the whole string before you load it into SimpleXMLElement. This will convert anything PHP recognises as an HTML entity (e.g. £, &, €). This lets you store them in your database without having to use all the mb* functions and the other hoops you need to jump through for UTF-8 support in databases.
If you have any really special characters that cannot be encoded this way, this will not work.
If PHP is getting it from the XML file correctly, and the problem is writing it into your database, use htmlentities, which converts every symbol that has a named entity into its HTML equivalent (htmlspecialchars only converts &, <, > and quotes). The symbol will be stored as "&trade;", which can be handled well when you retrieve it from the database.
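The difference between the two functions for this symbol can be seen in a few lines. This is a sketch with made-up variable names; it assumes the input is valid UTF-8.

```php
<?php
$symbol = "\u{2122}"; // the trade mark sign, encoded as UTF-8

// htmlspecialchars() only touches &, <, > and quotes: the symbol is unchanged.
$special = htmlspecialchars($symbol, ENT_QUOTES, 'UTF-8');

// htmlentities() replaces every character that has a named entity.
$entities = htmlentities($symbol, ENT_QUOTES, 'UTF-8');

// Reading the value back from storage: decode to the original character.
$decoded = html_entity_decode($entities, ENT_QUOTES, 'UTF-8');
```

Storing entities keeps the column ASCII-only, at the cost of having to decode on every read that is not destined for HTML.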

json with special characters like é

I'm developing a dependent select script using jQuery, PHP and JSON as the response.
Everything goes well except for using special characters like French ones (é , è , à...)
If I pre-encode them like (& eacute; , & egrave; , & agrave;) (here I'm using spaces between the ampersand and the rest of the word to prevent auto-encoding in my question) it works, but when rendered with jQuery the characters are not converted to what they should look like (é...); instead they are shown as is (& eacute;).
If I write them like (é) and don't pre-encode them the full value in this array entry is not shown.
What should I do here?
Thanks.
"If I write them like (é) and don't pre-encode them the full value in this array entry is not shown. What should I do here?"
In JSON you do not HTML-encode values. You send them literally (é) and set the Content-Type correctly:
header('Content-Type: application/json; charset=UTF-8');
Declare the encoding your data is actually in, of course.
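A value that silently disappears from the JSON is typical of input that is not valid UTF-8, which json_encode() requires. The sketch below assumes the problem string was ISO-8859-1; the question does not say so, and the sample values are invented.

```php
<?php
$utf8   = "caf\u{E9}"; // "café" encoded as UTF-8
$latin1 = "caf\xE9";   // the same text as ISO-8859-1 bytes (invalid UTF-8)

// Default output escapes non-ASCII characters as \uXXXX, which is valid JSON.
$default = json_encode(['title' => $utf8]);

// JSON_UNESCAPED_UNICODE sends the character literally instead.
$literal = json_encode(['title' => $utf8], JSON_UNESCAPED_UNICODE);

// Invalid UTF-8 makes json_encode() fail; the error code says why.
$failed    = json_encode(['title' => $latin1]);
$errorCode = json_last_error();
```

Both $default and $literal decode to the same string in jQuery, so the flag is a matter of readability rather than correctness.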
This worked for me, hopefully it will work for anyone else experiencing similar issues.
$title = '&eacute;';
$title = mb_convert_encoding($title, "UTF-8", "HTML-ENTITIES");
header('Content-Type: application/json; Charset="UTF-8"');
echo json_encode(array('title' => $title));
The mb_convert_encoding function takes a value and converts it from (in this case) HTML-ENTITIES to UTF-8.
See here for more details on the function: http://php.net/manual/en/function.mb-convert-encoding.php
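The 'HTML-ENTITIES' pseudo-encoding of mbstring is deprecated as of PHP 8.2. A possible alternative, sketched here with the same sample value, is html_entity_decode(); the variable names $decoded and $json are introduced only for this example.

```php
<?php
// An entity-encoded value, as it might come out of storage.
$title = '&eacute;';

// Decode the entity to the real character, encoded as UTF-8.
$decoded = html_entity_decode($title, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Encode for the response; the flag keeps the character readable in the output.
$json = json_encode(['title' => $decoded], JSON_UNESCAPED_UNICODE);
```

As with the original snippet, the Content-Type header would be sent before echoing $json.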
Just like the first answer says:
Do you use a database? If yes, make sure the database table is declared as UTF-8.
How is the HTML page declared? It should be UTF-8.
Is the string in the PHP script file? If yes, make sure the file is saved as UTF-8.
You could also use utf8_encode (to send to HTML) and utf8_decode (to receive), but that is not the right way.
