I have a CSV file that looks like this:
http://ideone.com/YWuuWx
I read the file and convert it to an array, which works completely fine, but then I jsonize the array and json_encode doesn't put in the real values; it puts null. Here is the dump of the array and of the jsonized array:
http://jave.jecool.net/stackoverflowdemos/csv_to_json_to_arraydump.php
I convert like this: $php_array = json_encode($json_array, JSON_PRETTY_PRINT);
Does anyone know what might cause the problem?
EDIT: I think there is about a 90% chance that it's caused by the latin1 characters. Does anyone know the best workaround?
Assuming that it is in fact an encoding error, and that your data is actually encoded in some ISO-8859 variant (I'm guessing latin2 rather than latin1 based on your use of LATIN SMALL LETTER R WITH CARON), and that it is CONSISTENTLY so, you can use iconv() to re-encode it as UTF-8 before doing json_encode():
$foo = iconv('ISO-8859-2', 'UTF-8', $foo);
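A fuller sketch of that approach (my addition; the sample data and delimiter are illustrative, not taken from the linked CSV): re-encode every field while reading, then encode the whole array once.

```php
<?php
// Sketch of the suggested fix: re-encode each CSV field from ISO-8859-2
// to UTF-8 before json_encode(). The in-memory stream stands in for the
// real CSV file; the data is illustrative.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "\xF8eka;3\n");   // 0xF8 is LATIN SMALL LETTER R WITH CARON in ISO-8859-2
rewind($fh);

$rows = [];
while (($fields = fgetcsv($fh, 0, ';')) !== false) {
    // iconv() returns false for bytes invalid in the source charset,
    // so check its result in real code.
    $rows[] = array_map(
        function ($field) { return iconv('ISO-8859-2', 'UTF-8', $field); },
        $fields
    );
}
fclose($fh);

// Without the re-encoding, json_encode() fails on the raw 0xF8 byte
// (false, or null values, depending on PHP version).
$json = json_encode($rows, JSON_PRETTY_PRINT);
```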
Related
In php, json_encode() will encode UTF8 in hex entities, e.g.
json_encode('中'); // becomes "\u4e2d"
Assume the data "\u4e2d" is now being stored in MySQL, is it possible to convert back from "\u4e2d" to 中 without using PHP, just plain MySQL?
On my configuration, select hex('中'); returns E4B8AD
which is the hex code of the UTF8 bytes. Naturally it
is not the same as the hex of the code point 4e2d, but you can get
that with select hex(cast('中' as char(1) character set utf16));.
Update: The questioner has edited the question into what looks to me like a completely different one. Now it's apparently: how to get '中' given a string containing '\u4e2d', where 4e2d is the code point of 中 and the default character set is utf8. Okay, that is:
select cast(char(conv(right('\u4e2d',4),16,10) using utf16) as char(1) character set utf8);
Encoding non-ASCII characters as JavaScript entities is only one of several things a JSON encoder may do, and it isn't actually mandatory:
echo json_encode('中'), PHP_EOL;
echo json_encode('中', JSON_UNESCAPED_UNICODE), PHP_EOL;
echo json_encode('One "Two" Three \中'), PHP_EOL;
"\u4e2d"
"中"
"One \"Two\" Three \\\u4e2d"
Thus the only safe decoding approach is using a dedicated JSON decoder. MySQL bundles the required abilities since 5.7.8:
SET @input = '"One \\"Two\\" Three \\\\\\u4e2d"';
SELECT @input AS json_string, JSON_UNQUOTE(@input) AS original_string;
json_string original_string
============================ ===================
"One \"Two\" Three \\\u4e2d" One "Two" Three \中
If you have an older version, you'll have to resort to more elaborate solutions (you can Google for third-party UDFs).
In any case, I suggest you get back to the design table. It's strange that you need JSON data in a context where you don't have a proper JSON decoder available.
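For comparison (my addition, not part of the answer above): when PHP is available after all, the same unquoting is a one-liner with json_decode(), which handles both the \uXXXX escapes and the backslash/quote escapes.

```php
<?php
// The same decoding done in PHP rather than MySQL.
$stored = '"One \\"Two\\" Three \\\\\\u4e2d"';   // the JSON string as stored
echo json_decode($stored), PHP_EOL;              // One "Two" Three \中
```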
I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use mysql or any other standard DB (School project, so structure stays as is).
I built two functions:
function saveDB($db) {
    $json_db = json_encode($db);
    file_put_contents("wordsDB.json", $json_db);
} // saveDB

function loadDB() {
    $json_db = file_get_contents("wordsDB.json");
    return json_decode($json_db, true);
} // loadDB
When echoing the string I get after encoding, or after loading it from the file, I get valid JSON (tested it in a JSON viewer). But whenever I try to decode the string using json_decode(), I get null (tested with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double quotes with double/single quotes (without backslashes, with one backslash, and with three backslashes, plus every combination I could think of with different numbers of backslashes in the original and replacement strings), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (Right after encoding).
Checked for slashes and backslashes. None to be found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't work: I get no error number (I get null, not 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
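One diagnostic that would have shortened the search (my addition; the input below is a guess at the shape of the failing data, modeled on the null key mentioned in one of the answers below): json_last_error() and, on PHP 5.5+, json_last_error_msg() report why json_decode() returned null.

```php
<?php
// Reproduce a decode failure and inspect it. The input mimics a bad
// bare null used as an object key; it is illustrative.
$raw = '{"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":1}}';
$data = json_decode($raw, true);

if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    echo json_last_error_msg(), PHP_EOL;   // "Syntax error" (PHP >= 5.5)
}
```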
In your JSON file change null to "null" and it will solve the problem.
Check whether your file is UTF-8 encoded. json_decode() works with UTF-8 encoded data only.
EDIT:
After I saw the uploaded JSON data, I did some digging and found that there is a null key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to a valid property name and json_decode() will do the job.
I had a similar problem last week: my JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode() fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dumping and using the find function to look for these kinds of characters.
Just on the off chance, and more for anyone hitting this thread rather than the OP's issue: I missed that someone had htmlentities($json) way above me in the call stack. Just make sure you haven't been bitten by the same thing, and check the HTML source.
This has been bugging me for ages and I want to get to the bottom of it once and for all. I have an associative array whose keys I have defined using ISO-8859-1 characters. For instance:
array("utført" => "red");
I also have another array that I have loaded from a file. I have printed this array out in a browser, checking that values like Æ, Ø and Å are intact. I try to compare two fields from these arrays and I'm slapped with the message:
Undefined index: utfã¸rt on line 39
I can't help but sob. Every single damn time I involve any letters outside ASCII in a script, they are at some point converted into ã¸r or similar nonsense.
My script file is encoded in ISO-8859-1, the document from which I'm loading my data is the same, and so is the MySQL table I'm trying to save the data to.
So the only conclusion I can draw is that PHP isn't accepting just any character set into its code, and I have to somehow force PHP to speak Norwegian.
Thanks for any suggestions
Just FYI, I won't accept any answers along the lines of "Just don't use those characters" or "Just replace those characters with UTF equivalents at file load" or any other hack solutions.
When you read your data from the external file, try to convert it to the proper encoding.
Something like this is what I have in mind...
$f = file_get_contents('externaldata.txt');
$f = mb_convert_encoding($f, 'ISO-8859-1', 'UTF-8'); // state the source encoding explicitly
// from this point deal with $f whatever you want
Also, look at mb_convert_encoding() manual for more info.
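A sketch of what is likely going on (my reconstruction of the symptom, not the asker's actual files): the data file is really UTF-8, so converting it to ISO-8859-1 makes it match keys written literally in a latin1-encoded script file.

```php
<?php
// "utført" as UTF-8 bytes, as it would come out of a UTF-8 data file:
$fromFile = "utf\xC3\xB8rt";   // 0xC3 0xB8 = "ø" in UTF-8;
                               // displayed as latin1 it reads "utfã¸rt"
$key = mb_convert_encoding($fromFile, 'ISO-8859-1', 'UTF-8');

// In an ISO-8859-1 script file, the literal "utført" is the byte 0xF8 for "ø":
$colors = ["utf\xF8rt" => 'red'];

echo $colors[$key], PHP_EOL;   // red -- the lookup now succeeds
```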
I'm having a problem with some characters like 'í' or 'ñ' working in a web project with PHP and MySQL.
The database table uses the UTF-8 charset and the web page is ISO-8859-1 (latin1). At first glance everything is handled OK, but a problem comes up when I use PHP's json_encode function.
When I get a query result, let's say, this row:
| ID | VALUE |
--------------------
| 1 | Línea |
I got the following (correct) array in PHP:
Array("ID"=>"1","VALUE"=>"Línea");
So far, so good. But when I apply json_encode:
$result = json_encode($result);
//$result is {"id":"1","value":"L"}
Then I tried some encoding/decoding, but I couldn't get the right result.
First I tried to decode the UTF-8 chars as follows:
$result['value'] = utf8_decode($result['value']);
//and I get $result['value'] is "L?a"
Then I tried with mb functions:
$result['value'] = mb_convert_encoding($result['value'],"ISO-8859-1","UTF-8");
//and I get that $result['value'] is "Lnea"
I don't really know why json_encode is breaking my string, and I can't figure out what else to try. I will appreciate any help :)
Thanks!
The documentation for json_encode states that the function will only work on UTF-8 data. If it's not working for you, it means that your data is not UTF-8.
To understand what's going wrong, you need to know what your connection character set is. Is it UTF-8? Something else? Use SET NAMES utf8 and see if it makes any difference.
Assuming the connection character set is indeed UTF-8, json_encode should work just fine. Then, you still have the final issue of converting the encoded data to ISO-8859-1. For example:
// assume any strings in $result are UTF-8 encoded
$json = json_encode($result);
$output = mb_convert_encoding($json, 'ISO-8859-1', 'UTF-8');
echo $output;
If it still doesn't work, it means that your UTF-8 strings contain characters not available in the ISO-8859-1 character set. There's nothing you can do about that.
Update:
When debugging complex character set conversions like this, you can use file_put_contents to write intermediate results to a file which you can inspect with a hex editor. This will help confirm that the output of a particular step of the process is correct or not.
I have a site I want to migrate from ISO to UTF-8.
I have a record in database indexed by the following primary key :
s:22:"Informations générales";
The problem is, now (with UTF-8), when I serialize the string, I get :
s:24:"Informations générales";
(notice the size of the string is now the number of bytes, not string length)
So this is not compatible with previous non-UTF-8 records!
Did I do something wrong ? How could I fix this ?
Thanks
The behaviour is completely correct. Two strings with different encodings will generate different byte streams, thus different serialization strings.
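To make the byte-length behaviour concrete (my example, assuming the mbstring extension is available and a UTF-8 source file):

```php
<?php
// serialize() records the length in *bytes*, so the same text serialized
// from two encodings differs:
$utf8   = "générales";                                       // 9 chars, 11 bytes in UTF-8
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8'); // 9 bytes

echo serialize($latin1), PHP_EOL;   // s:9:"..."
echo serialize($utf8), PHP_EOL;     // s:11:"générales";
```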
Dump the database in latin1.
In the command line:
sed -e 's/latin1/utf8/g' -i ./DBNAME.sql
Import the file converted to a new database in UTF-8.
Use a PHP script to update each field: make a query, loop through each field, and update the serialized string using this:
$str = preg_replace('!s:(\d+):"(.*?)";!se', "'s:'.strlen('$2').':\"$2\";'", $str);
After that, I was able to use unserialize() and everything worked with UTF-8.
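A caveat on that regex (my addition): the /e modifier is deprecated since PHP 5.5 and removed in PHP 7, so on current PHP the same fix needs preg_replace_callback(). Note also that the non-greedy pattern can miscount when a serialized value itself contains a "; sequence.

```php
<?php
// Same length fix without the removed /e modifier.
$str = 's:22:"Informations générales";';   // latin1 character count, data now UTF-8
$fixed = preg_replace_callback(
    '!s:(\d+):"(.*?)";!s',
    function ($m) {
        // Recompute the length from the actual bytes of the captured value.
        return 's:' . strlen($m[2]) . ':"' . $m[2] . '";';
    },
    $str
);

var_dump(unserialize($fixed));   // string(24) "Informations générales"
```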
To unserialize a UTF-8 encoded serialized array:
$array = @unserialize($arrayFromDatabase);
if ($array === false) {
    $array = @unserialize(utf8_decode($arrayFromDatabase)); // decode first
    $array = array_map('utf8_encode', $array); // encode the array again
}
PHP 4 and 5 do not have built-in Unicode support; I believe PHP 6 is starting to add more Unicode support although I'm not sure how complete that is.
You did nothing wrong. PHP prior to v6 just isn't Unicode-aware, and as such doesn't support it unless you make it so (e.g., via the mbstring extension or other means).
We wrote our own wrapper around serialize() here to remedy this. You could also move to other serialization techniques, like JSON (with json_encode() and json_decode(), available in PHP since 5.2.0).