I'm using PHP 5.2. My file and DB table are UTF-8.
When I insert a column with JSON-encoded data in it, the non-ASCII chars are converted into \u-something escapes. OK. But when I json_decode the data, those \u-somethings are still there!
Wasn't json_decode supposed to convert them back to the normal chars for display on a UTF-8 page? For example, instead of f\u00f6tter it should display fötter. Do I have to use another function for the conversion?
json_encode and json_decode are kind of weak in PHP; both do the minimum to produce valid, but not necessarily the intended, output. To be precise, json_decode does convert valid \uXXXX escapes inside a JSON string back to UTF-8. If the escapes are still there after decoding, the data was most likely encoded twice: the first decode only peels off the outer layer, and json_decode has no way of knowing that a literal \u00f6 sequence in what is now plain text was supposed to be ö. Remember that JSON is designed to be directly eval'able by JavaScript, and JavaScript will likewise evaluate those escapings.
But why are you JSON-encoding your data to store it in MySQL?
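A sketch of both cases, using the fötter example from the question: a single encode/decode round trip restores the character, while a double encode leaves literal escapes behind.

```php
// One encode/decode round trip restores the character:
$json = json_encode(array('name' => 'fötter'));
// $json is now '{"name":"f\u00f6tter"}'

$data = json_decode($json, true);
// $data['name'] is 'fötter' again -- the escape was decoded.

// But if the already-encoded string is encoded a second time,
// one json_decode only peels off the outer layer:
$double = json_encode($json);
$back   = json_decode($double);
// $back still contains the literal characters \u00f6,
// because they are now plain text, not a JSON escape.
```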
I know this is an old question, but the answer did not really answer it for me.
Eventually I was able to get what I wanted using the JSON_UNESCAPED_UNICODE flag:
$output = json_encode($input, JSON_THROW_ON_ERROR | JSON_UNESCAPED_UNICODE | JSON_UNESCAPED_SLASHES);
https://www.php.net/manual/en/function.json-encode.php
Caveat: this only works if you are able to store UTF-8 in your database; you said your table is UTF-8, so that should be OK. I was not able to find a way to convert the \u sequences back at decode time, so encoding them as unescaped UTF-8 is what worked for me.
Also, if you're using PHP 7.1+ (your post said 5.2, but that was a long time ago), you may want to add the JSON_UNESCAPED_LINE_TERMINATORS flag as well. (Note that JSON_UNESCAPED_UNICODE and JSON_UNESCAPED_SLASHES require PHP 5.4+, and JSON_THROW_ON_ERROR requires 7.3+.)
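For illustration, here is how the flag changes the encoder's output (assuming a UTF-8 source file):

```php
$input = array('name' => 'fötter');

echo json_encode($input), "\n";
// {"name":"f\u00f6tter"}

echo json_encode($input, JSON_UNESCAPED_UNICODE), "\n";
// {"name":"fötter"}
```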
This question is different from UTF-8 all the way through, as it asks whether using the mb_convert_encoding function is safe and good practice.
Let's say that a user can upload files using the PHP API. Each filename and path gets stored in a PostgreSQL database table which has UTF-8 as its default encoding.
Sometimes users upload files whose names aren't UTF-8 encoded, and those names get imported into the database. The problem is that the characters that are not UTF-8 encoded end up scrambled and do not display as they should in the table columns.
I was thinking of adding the following to the PHP code before import:
if ( ! mb_check_encoding($output, 'UTF-8')) {
    $output = mb_convert_encoding($output, 'UTF-8');
}
Does this look like good practice, and will the output be converted and displayed correctly by the user's client if I return it as UTF-8? Is there a potential loss of bytes when using mb_convert_encoding?
Thanks
If you're going to convert an encoding, you need to know what you're converting from. You can check whether a string is or isn't valid UTF-8, but if the check tells you it's not valid UTF-8, you still have no clue what it actually is. Omitting the $from_encoding parameter from mb_convert_encoding just makes it assume some preset encoding for that parameter, but that doesn't mean $content actually is in that encoding.
In other words: if you don't know what encoding a string is in, you cannot meaningfully convert it to anything else, and trying to convert it from ¯\_(ツ)_/¯ is a crapshoot whose result is as likely to be utter garbage as something useful.
If you encounter unknown encodings, you only have a few choices:
Reject the input value.
Test whether it's one of a handful of other expected encodings and then explicitly convert from your best guess; but that is pretty much a crapshoot as well.
Just use bin2hex or something similar on the value, essentially giving up on trying to interpret it correctly, but still leaving some semblance of the original value.
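A sketch combining the second and third options (the toUtf8 name and the candidate list are illustrative, not a recommendation):

```php
// Hypothetical helper: try a short list of expected legacy encodings,
// fall back to a hex dump when nothing matches. Note that ISO-8859-1
// accepts any byte sequence, so including it effectively turns the
// detection into the crapshoot described above.
function toUtf8($s) {
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s;                        // already valid UTF-8
    }
    $guess = mb_detect_encoding($s, array('Windows-1252', 'ISO-8859-1'), true);
    if ($guess !== false) {
        return mb_convert_encoding($s, 'UTF-8', $guess);
    }
    return bin2hex($s);                   // give up, but keep the bytes
}
```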
I am facing a problem with storing special characters in a database and retrieving them again as symbols.
For example, I have a string like Côte d'Ivoire.
What I want to do is convert the special character ô to its HTML numeric entity &#244; or named entity &ocirc;, and at retrieval time convert the HTML back to the special symbol again.
I also need to pass this string in the JSON response of a web service.
I tried some PHP functions like htmlspecialchars() and htmlspecialchars_decode(), but I am not getting the desired output.
Any help will be appreciated. If there is another way to do it, that will also be very helpful.
Thanks in advance
You can use the htmlentities function to transform the special characters.
You have to pass UTF-8 to json_encode, so if your data is in ISO-8859-1 you can run utf8_encode on it before encoding. (Note that utf8_encode only converts from ISO-8859-1, and it is deprecated as of PHP 8.2; mb_convert_encoding is the more general tool.)
http://php.net/manual/en/function.htmlentities.php
http://php.net/manual/en/function.utf8-encode.php
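A minimal round trip, assuming the string is already UTF-8 (if it is Latin-1, convert it first):

```php
$s = "Côte d'Ivoire";                                   // UTF-8 input

$html = htmlentities($s, ENT_QUOTES, 'UTF-8');
// C&ocirc;te d&#039;Ivoire  -- entity-encoded form

$back = html_entity_decode($html, ENT_QUOTES, 'UTF-8');
// Côte d'Ivoire  -- original string restored

// For the JSON response, encode the plain UTF-8 string directly:
echo json_encode(array('country' => $s));
// {"country":"C\u00f4te d'Ivoire"}
```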
Use the 'utf8_unicode_ci' collation when saving data to the database, retrieve the data the usual way, and check that the exact data is actually being saved in the database.
This problem is much easier to solve when you use UTF-8 for the whole site, including your database. Escaping should be done as late as possible and only for the target system that needs it.
An example:
Your HTML page is UTF-8 encoded, so when you receive user input, you get it in UTF-8 as well. You can store this value as-is in the database; just use prepared statements, or call mysqli_real_escape_string() before building the SQL string. This escaping only makes the input safe for the SQL statement; the database will contain the original user input.
When you read the value back from the database, you get the original UTF-8 input, and you can then call htmlspecialchars() to escape it for display in HTML output. I wrote a small article about using UTF-8 for the whole site where you can find more information.
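The flow described above could be sketched like this (connection details, table and field names are made up for illustration):

```php
// Store raw UTF-8; escape only at the output stage.
function saveComment(mysqli $mysqli, $body) {
    $mysqli->set_charset('utf8mb4');     // UTF-8 connection encoding
    // The prepared statement keeps the SQL safe; the stored value
    // stays the original user input.
    $stmt = $mysqli->prepare('INSERT INTO comments (body) VALUES (?)');
    $stmt->bind_param('s', $body);
    $stmt->execute();
}

function renderComment($body) {
    // Escape only when emitting HTML.
    return htmlspecialchars($body, ENT_QUOTES, 'UTF-8');
}
```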
I've got JSON data. There are Cyrillic strings in the JSON file, like this one:
\u0418\u0432\u0430\u043D\u043E\u0432 \u0418.
When I decode the JSON and put this data in a database table, I get the string
Ð˜Ð²Ð°Ð½Ð¾Ð² Ð˜.
On one decoding web-site I entered this string and got exactly the one I need:
Иванов И.
The site also told me that it was converted from CP1252 to UTF-8.
So I tried to convert data from json after decoding manually using
mb_convert_encoding ( $string, "UTF-8","windows-1252");
mb_convert_encoding ( $string, "UTF-8","CP1252");
and
iconv("windows-1252","UTF-8",$string);
iconv("CP1252","UTF-8",$string);
Each of these functions made the string in the database table look like
Øòðýþò ÃËœ.
or
Øòðýþò Ø.
and neither is decoded properly on the site above. So the question is: how do I convert this string?
Update: I used this SQL statement:
ALTER DATABASE logenterprise
CHARACTER SET utf8
Afterwards I tried the same things described above; the result is the same.
Also tried this just in case:
alter table mytable convert to CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Curse you, damned encodings ^^
They gave me a hard time too.
Everything looked fine (the database, the encoding of the input data, and the website), but I still got cryptic chars in my tables. So what's the problem then? It's the connection to your database server.
Fortunately you can fix this with a simple query.
Right after establishing the MySQL connection, you need to execute the following query:
mysql_query("SET NAMES 'utf8'");
Voilà. When you execute your INSERT query, the data gets saved nicely in your DB.
This saved my ass many times when I was handling umlauts and the €-sign.
Note: you shouldn't use the mysql_* functions anymore, as they are deprecated (and removed in PHP 7); I just used them in the example to keep the code clear.
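For reference, sketches of the modern equivalents with mysqli and PDO (host and credentials are placeholders):

```php
// Hypothetical connection helpers; connection details are placeholders.
function connectMysqli() {
    $mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');
    // Preferred over sending a raw SET NAMES query: this way the
    // client library also knows the encoding for escaping purposes.
    $mysqli->set_charset('utf8mb4');
    return $mysqli;
}

function connectPdo() {
    // With PDO the charset goes straight into the DSN:
    return new PDO('mysql:host=localhost;dbname=mydb;charset=utf8mb4', 'user', 'pass');
}
```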
I have a prepared statement written in PHP that retrieves a string from MySQL; then I use
json_encode to send the data to the client, and it works perfectly.
The problem happens when the string in MySQL contains “ .
Should I encode it differently? Or use some special flags? Or are there other solutions?
Thanks
Should I encode it differently?
I'd say yes. Obviously the string you receive from the database is not UTF-8 encoded, and that's the problem, because json_encode needs UTF-8 encoded strings. If the input is invalid, encoding fails (current PHP versions return false; very old versions emitted null for the offending value), because there was no valid data to encode.
You can verify this by checking for the last error with the json_last_error function.
So when you query data from your database, tell the database server that you expect UTF-8 encoded data by setting the database client encoding. Consult the documentation of the database client library you're using, it's documented there.
See also json_encode() non utf-8 strings?, which shows how you can re-encode the strings yourself if you don't want to change the database client connection.
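To see the failure mode concretely (the \xE9 byte is "é" in ISO-8859-1 and invalid as UTF-8):

```php
$latin1 = "\xE9t\xE9";                       // "été" in ISO-8859-1, invalid UTF-8

var_dump(json_encode($latin1));              // bool(false) in PHP 5.5+
var_dump(json_last_error() === JSON_ERROR_UTF8);   // bool(true)

// Fix the encoding first, then json_encode succeeds:
$utf8 = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
echo json_encode($utf8);                     // "\u00e9t\u00e9"
```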
Well, the subject says everything. I'm using json_encode to convert some UTF-8 data to JSON, and I need to transfer it to a layer that is currently ASCII-only. So I wonder whether I need to make that layer UTF-8 aware, or whether I can leave it as it is.
Looking at the JSON RFC, UTF-8 is also a valid charset in JSON output, although not recommended, i.e. some implementations can leave UTF-8 data inside. The question is whether PHP's implementation dumps everything as ASCII or opts to leave something as UTF-8.
Unlike JSON support in some other languages, json_encode() does not generate anything other than ASCII by default. (Since PHP 5.4, the JSON_UNESCAPED_UNICODE flag changes this.)
According to the JSON article in Wikipedia, Unicode characters in strings are always
double-quoted Unicode with backslash escaping
The examples in the PHP Manual on json_encode() seem to confirm this.
So any UTF-8 character outside ASCII is escaped like this: \u0027 (note, as @Ignacio points out in the comments, that this is the recommended way to deal with those characters, not a required one).
However, I suppose json_decode() will convert the characters back to their byte values? You may get in trouble there.
If you need to be sure, take a look at iconv(), which can convert your UTF-8 string into ASCII (dropping any unsupported characters) beforehand.
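A quick check of that default behavior:

```php
$json = json_encode('fötter');

echo $json, "\n";                            // "f\u00f6tter"

// Every byte of the default output is plain ASCII:
var_dump(mb_check_encoding($json, 'ASCII')); // bool(true)
```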
Well, json_encode returns a string. According to the PHP documentation for string:
A string is series of characters. Before PHP 6, a character is the same as a byte. That is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode. See utf8_encode() and utf8_decode() for some basic Unicode functionality.
So for the time being you do not need to worry about making it UTF-8 aware. Of course you still might want to think about this anyway, to future-proof your code.