PHP: Converting \xc3\xa4 to ä

I'm trying to correct an encoding error.
For example, a string which should read "Morgan Pålsson - världsreporter" has been encoded as "Morgan P\xc3\xa5lsson - v\xc3\xa4rldsreporter".
How do I convert "\xc3\xa5" back to "å" and "\xc3\xa4" back to "ä"?
I've tried combinations of various encode/decode functions and iconv, with no luck.
This seems like it should be straightforward. Any ideas?

We had this problem when decoding strings coming from Salesforce via the SOAP interface. Our solution to this "\xc3\xa4" problem looks really weird, but it works. Note that this is a Python solution, but maybe you can apply this to PHP as well! :)
decoded_string=encoded_string.encode('raw_unicode_escape').decode('unicode_escape').encode('latin1').decode('utf-8')
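A rough PHP equivalent of the Python trick above, as a minimal sketch. It assumes the string really contains the literal four-character sequences \xc3, \xa5 and so on (backslash, x, two hex digits) rather than the raw bytes, and that the decoded bytes are UTF-8. The helper name decode_hex_escapes() is hypothetical.

```php
<?php
// Hypothetical helper: turn literal "\xHH" sequences (backslash, 'x', two hex
// digits) back into the single bytes they stand for. Assumes those bytes form UTF-8.
function decode_hex_escapes(string $s): string
{
    return preg_replace_callback(
        '/\\\\x([0-9A-Fa-f]{2})/',          // regex: a literal backslash, 'x', two hex digits
        function (array $m): string {
            return chr(hexdec($m[1]));      // "c3" -> byte 0xC3
        },
        $s
    );
}

// Single quotes keep the backslashes literal, reproducing the broken input.
$broken = 'Morgan P\xc3\xa5lsson - v\xc3\xa4rldsreporter';
echo decode_hex_escapes($broken), "\n"; // Morgan Pålsson - världsreporter (on a UTF-8 terminal)
```

PHP's built-in stripcslashes() also turns \xHH into bytes, but it decodes every other C-style escape (\n, \t, octal) as well, so the narrower regex is safer when the text may contain other backslashes.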

Related

PHP's var_dump / print_r output is garbled - encoding issue?

I'm having a problem where, on one server, the output of var_dump and print_r comes out entirely garbled. print_r outputs pure gibberish (e.g. ��]{W�8�����- ... etc.), while var_dump at least gives string(1664), followed by similar gibberish (this time wrapped in double quotes).
This looks like a character encoding issue, but no encoding I can find seems to fix it (and I don't know why just dumping a PHP object should output non-ASCII characters anyway), and echo works fine. Alternatively, I wonder if it could be a gzip issue. Either way, I suspect it must be something in PHP's or Apache's configuration, but I have no idea how to fix it.
I'd be very grateful if anyone has any suggestions on how to fix this!
Update: on further investigation, it seems it's a problem specific to the particular object I'm trying to dump. The object in question is decoded JSON requested (via curl) from an API. Is it possible that either json_decode or curl could be misconfigured / mangling the encoding?
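If you want to rule out the gzip theory, one quick check is to look at the first two bytes of the response body: gzip data always starts with 0x1f 0x8b. The sketch below uses a hypothetical looks_gzipped() helper and a locally gzencode()d string as a stand-in for the curl response. With curl itself, setting CURLOPT_ENCODING to an empty string asks curl to accept any supported encoding and decompress the body for you.

```php
<?php
// Hypothetical helper: gzip streams always begin with the magic bytes 1f 8b.
function looks_gzipped(string $body): bool
{
    return strncmp($body, "\x1f\x8b", 2) === 0;
}

// Stand-in for a compressed API response (requires the zlib extension).
$body = gzencode('{"ok":true}');

if (looks_gzipped($body)) {
    $body = gzdecode($body); // decompress before treating the body as text/JSON
}
var_dump(json_decode($body, true));
```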
For what it's worth, I finally got to the bottom of this problem (I think!)
The problem seems to be that the API's output was being run through json_decode whether it was JSON or not. MySQL errors were producing an error page rather than a JSON response, and running that page through json_decode (in the API-handling code that received it) before the var_dump produced the garbled character salad shown above.
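One way to catch this earlier is to make the API-handling code refuse non-JSON responses instead of decoding them blindly. A minimal sketch, assuming PHP 7.3+ for JSON_THROW_ON_ERROR; decode_api_response() is a hypothetical name and the HTML body is made up:

```php
<?php
// Hypothetical wrapper: decode an API body, or fail loudly if it isn't JSON.
function decode_api_response(string $body)
{
    // With JSON_THROW_ON_ERROR, invalid input (e.g. an HTML error page)
    // raises JsonException instead of quietly returning null.
    return json_decode($body, true, 512, JSON_THROW_ON_ERROR);
}

try {
    $data = decode_api_response('<html><body>MySQL error</body></html>');
    var_dump($data);
} catch (JsonException $e) {
    echo 'Not JSON: ', $e->getMessage(), "\n"; // Not JSON: Syntax error
}
```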

PHP json_decode returns null

I'm writing PHP code that uses a database. To do so, I use an array as a hash-map.
Every time content is added or removed from my DB, I save it to file.
I'm forced by my DB structure to use this method and can't use MySQL or any other standard DB (it's a school project, so the structure stays as is).
I built two functions:
function saveDB($db){
    $json_db = json_encode($db);
    file_put_contents("wordsDB.json", $json_db);
} // saveDB
function loadDB(){
    $json_db = file_get_contents("wordsDB.json");
    return json_decode($json_db, true);
} // loadDB
When I echo the string, either right after encoding or after loading it from the file, I get valid JSON (I tested it in a JSON viewer). But whenever I try to decode the string with json_decode(), I get null (checked with var_dump()).
The json string itself is very long (~200,000 characters, and that's just for testing).
I tried the following:
Replacing single/double quotes with double/single quotes (with no backslashes, with one backslash and with three backslashes, plus every combination I could think of with a different number of backslashes in the original and replacement strings), both manually and using str_replace().
Adding quotes before and after the json string.
Changing the page's encoding.
Decoding without saving to file (right after encoding).
Checked for slashes and backslashes. None found.
Tried addslashes().
Tried using various "Escape String" variants.
json_last_error() doesn't help either: it returns null rather than an error number (not even 0).
It's not my server, so I'm not sure what PHP version is used, and I can't upgrade/downgrade/install anything.
I believe the size has something to do with it, because small strings seem to work fine.
Thanks Everybody :)
In your JSON file change null to "null" and it will solve the problem.
Check whether your file is UTF-8 encoded; json_decode only works with UTF-8 encoded data.
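A small sketch of the UTF-8 point, using made-up data in which ø is stored as the single ISO-8859-1 byte 0xF8. It assumes the mbstring extension is available, and the conversion step assumes the file really is ISO-8859-1:

```php
<?php
// Illustrative data: "utførrt"-style word with ø as the ISO-8859-1 byte 0xF8.
$latin1Json = "{\"word\":\"utf\xf8rt\"}";

var_dump(mb_check_encoding($latin1Json, 'UTF-8')); // bool(false)
var_dump(json_decode($latin1Json, true));          // NULL
echo json_last_error_msg(), "\n";                  // reports malformed UTF-8

// Re-encode to UTF-8 first (assumption: the source really is ISO-8859-1).
$utf8Json = mb_convert_encoding($latin1Json, 'UTF-8', 'ISO-8859-1');
var_dump(json_decode($utf8Json, true));
```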
EDIT:
After seeing the uploaded JSON data, I did some digging and found that there is a null key. Search for:
"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":
Change that null to be valid property name and json_decode will do the job.
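The effect of the unquoted null key can be reproduced with a trimmed-down document. The value of the second object below is invented for illustration, since the snippet above is cut off:

```php
<?php
// Unquoted null as an object key: the whole document becomes invalid JSON.
$bad  = '{"exceeding":{"S01E01.html":{"2217":1}},null:{"S01E01.html":{"1":1}}}';
// Same document with the key quoted ("null" is then just an ordinary string key).
$good = '{"exceeding":{"S01E01.html":{"2217":1}},"null":{"S01E01.html":{"1":1}}}';

var_dump(json_decode($bad, true)); // NULL
echo json_last_error_msg(), "\n";  // Syntax error

$data = json_decode($good, true);
echo $data['null']['S01E01.html']['1'], "\n"; // 1
```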
I had a similar problem last week. My JSON was valid according to jsonlint.com.
My JSON string contained a # and a &, and those two made json_decode fail and return null.
By using var_dump(json_decode($myvar)), which stops right where it fails, I managed to figure out where the problem was coming from.
I suggest var_dumping and using a find function to look for these kinds of characters.
Just on the off chance, and more for anyone hitting this thread than for the OP's issue: I missed that someone had called htmlentities($json) way above me in the call stack. Make sure you haven't been bitten by the same thing, and check the HTML source.
Kickself #124
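To see why htmlentities() breaks things: it turns every double quote in the JSON into &quot;, which json_decode() can't parse. A minimal sketch with illustrative data; html_entity_decode() undoes it, though the real fix is to stop encoding the JSON in the first place.

```php
<?php
$json = json_encode(['name' => 'Tom & Jerry']); // {"name":"Tom & Jerry"}

// What an htmlentities() call somewhere up the call stack does to it.
$escaped = htmlentities($json);                  // {&quot;name&quot;:&quot;Tom &amp; Jerry&quot;}
var_dump(json_decode($escaped, true));           // NULL

// Reversing the entity encoding restores parseable JSON.
var_dump(json_decode(html_entity_decode($escaped), true));
```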

Not able to parse this json

I am trying to parse the json output from
http://www.nyc.gov/portal/apps/311_contentapi/services/all.json
And my php json_decode returns a NULL
I am not sure where the issue is, I tried running a small subset of the data through JSONLint and it validated the json.
Any Ideas?
The error is in this section:
{
"id":"2002-12-05-22-24-56_000010083df0188b4001eb56",
"service_name":"Outdoor Electric System Complaint",
"expiration":"2099-12-31T00:00:00Z",
"brief_description":"Report faulty Con Edison equipment, including dangling or corroded power lines or "hot spots.""
}
See where it says "hot spots." inside an already-quoted string? Those double quotes should have been escaped. Since you don't have access to edit the JSON, you could search for "hot spots."" and replace it with \"hot spots.\"", e.g. str_replace('"hot spots.""', '\\"hot spots.\\""', $str);, for as long as that error is in the feed. Of course, that only helps if this is a one-time thing. If the site continues to make errors in its JSON output, you'll have to come up with something more robust.
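The same one-off patch as a runnable sketch. The record is trimmed to the id and brief_description fields from the section quoted above, and the replacement only works while the feed contains exactly this mistake:

```php
<?php
// Trimmed copy of the broken record (unescaped quotes around "hot spots.").
$str = '[{"id":"2002-12-05-22-24-56_000010083df0188b4001eb56",'
     . '"brief_description":"Report faulty Con Edison equipment, including dangling or corroded power lines or "hot spots.""}]';

var_dump(json_decode($str, true)); // NULL: the inner quote ends the string early

// Escape the two inner quotes; the final quote still closes the string.
$patched = str_replace('"hot spots.""', '\\"hot spots.\\""', $str);
$data = json_decode($patched, true);
echo $data[0]['brief_description'], "\n";
// Report faulty Con Edison equipment, including dangling or corroded power lines or "hot spots."
```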
Here is what I did to identify the errors in the JSON:
Since faulty quoting is the first thing to look for, I downloaded the JSON to a text file, opened it in a text editor (I used vim, but any full-featured editor would do), ran a search and replace that removed every character except double quotes, and looked at the result. It was clear that correct lines should have 4 double quotes, so I simply searched for 5 double quotes in a row and found the first bad line. I noted the line number, undid the search and replace to get the original file back, and looked at that line. This gives you what you need to get the developers of the API to fix the JSON.
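The quote-counting trick can also be scripted. A rough sketch, assuming the JSON is pretty-printed with one "key":"value" pair per line so that a healthy line has at most 4 double quotes; find_suspect_lines() is a hypothetical helper, and legitimately escaped quotes (\") will also be flagged, so treat the results as candidates to inspect:

```php
<?php
// Hypothetical helper: report lines with more double quotes than expected.
// Returns [1-based line number => quote count].
function find_suspect_lines(string $json, int $maxQuotes = 4): array
{
    $suspects = [];
    foreach (explode("\n", $json) as $i => $line) {
        $count = substr_count($line, '"'); // counts escaped \" too
        if ($count > $maxQuotes) {
            $suspects[$i + 1] = $count;
        }
    }
    return $suspects;
}
```

For example, print_r(find_suspect_lines(file_get_contents('all.json'))) on a saved copy of the feed (the file name is just an example) lists the suspicious line numbers with their quote counts.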
Writing code to automatically fix the bad JSON before giving it to json_decode() would be quite a bit harder but doable using techniques like those in another answer.
According to the PHP manual:
In the event of a failure to decode, json_last_error() can be used to determine the exact nature of the error.
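Try calling it to find out what went wrong. Note that it reports the kind of error (for example JSON_ERROR_SYNTAX), not its position in the string.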
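A minimal sketch of that check; the input is the broken fragment from the 311 feed, shortened for illustration:

```php
<?php
// Shortened broken fragment: the inner quotes are not escaped.
$input = '{"brief_description":"power lines or "hot spots.""}';

$data = json_decode($input, true);
// A null result is only an error if json_last_error() says so
// (the valid JSON document "null" also decodes to null).
if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
    // json_last_error() gives the error code; json_last_error_msg() (PHP 5.5+) a readable message.
    echo 'Error ', json_last_error(), ': ', json_last_error_msg(), "\n"; // Error 4: Syntax error
}
```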

How do I get PHP to accept ISO-8859-1 characters in general?

This has been bugging me for ages and I want to get to the bottom of it once and for all. I have an associative array whose keys I have defined using ISO-8859-1 characters. For instance:
array("utført" => "red");
I also have another array that I have loaded from a file. I have printed this array out in a browser and checked that values like Æ, Ø and Å are intact. When I try to compare two fields from these arrays, I'm slapped with the message:
Undefined index: utfã¸rt on line 39
I can't help but sob. Every single damn time I involve any letters outside plain ASCII in a script, they are at some point converted into ã¸r or similar nonsense.
My script file is encoded in ISO-8859-1, the document from which I'm loading my data is the same, and so is the MySQL table I'm trying to save the data to.
So the only conclusion I can draw is that PHP doesn't accept just any character set in its code, and I somehow have to force PHP to speak Norwegian.
Thanks for any suggestions
Just FYI, I won't accept any answers along the lines of "Just don't use those characters" or "Just replace those characters with UTF equivalents at file load", or any other hack solutions.
When you read data from an external file, convert it to the proper encoding.
Something like this is what I have in mind:
$f = file_get_contents('externaldata.txt');
$f = mb_convert_encoding($f, 'iso-8859-1', 'UTF-8'); // to ISO-8859-1, from UTF-8 (PHP's default internal encoding)
// from this point on, do whatever you want with $f
See the mb_convert_encoding() manual page for more info.
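To see why the lookup fails, the sketch below builds the same key in both encodings and shows that PHP treats them as different array keys, since it compares keys byte by byte. It assumes the mbstring extension and uses \u{...} escapes (PHP 7+) to spell ø:

```php
<?php
$utf8Key   = "utf\u{f8}rt";                                        // ø as UTF-8: bytes c3 b8
$latin1Key = mb_convert_encoding($utf8Key, 'ISO-8859-1', 'UTF-8');  // ø as ISO-8859-1: byte f8

$map = [$latin1Key => 'red'];     // like an array defined in an ISO-8859-1 source file
var_dump(isset($map[$utf8Key]));  // bool(false): different bytes, different key

// Normalise the incoming key to the array's encoding before the lookup.
$lookup = mb_convert_encoding($utf8Key, 'ISO-8859-1', 'UTF-8');
var_dump($map[$lookup]);          // string(3) "red"
```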

What JSON does this CF code return?

I'm trying to implement the excellent jQuery bidirectional infinite scroll explained here:
http://www.bennadel.com/blog/1803-Creating-A-Bidirectional-Infinite-Scroll-Page-With-jQuery-And-ColdFusion.htm
For the server side, which returns JSON, the example is in ColdFusion. I'm trying to implement it in PHP.
I need to find out what the format of the JSON is.
Right now, I am returning
[{"src":"https:\/\/s3.amazonaws.com\/gbblr_2\/100\/IMG_1400 - original.jpg","offset":"5"},{"src":"https:\/\/s3.amazonaws.com\/gbblr_2\/100\/IMG_1399 - original.jpg","offset":6},{"src":"https:\/\/s3.amazonaws.com\/gbblr_2\/100\/IMG_1398 - original.jpg","offset":7}]
which doesn't work: in the generated HTML, it shows "UNDEFINED" for both the src and the offset variables.
So my question: what kind of JSON does that coldfusion code generate? What is the format of JSON that I need to return.
Thanks for any tips!!
CF's JSON mentioned in Ben's post is similar to this:
[{"SRC":"http:\/\/example.com\/public","OFFSET":3.0},{"SRC":"http:\/\/example.com\/public","OFFSET":3.0}]
I'd check the key names first. Yes, CF makes them uppercase, and JavaScript property names are case-sensitive, so code reading src will not find SRC. Check his function applyListItems() and see whether its RegExp finds anything.
If that doesn't help, a little line-by-line debugging in Firebug plus console.log should do the trick, I guess.
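If the uppercase keys do turn out to matter, a PHP version could emit the same shape as the CF output: uppercase SRC and OFFSET keys with a numeric OFFSET. This is a sketch under that assumption, using two of the URLs from the question; whether Ben's JavaScript actually expects uppercase keys should be checked in applyListItems().

```php
<?php
// Rows as the question currently has them (note the first offset is a string).
$images = [
    ['src' => 'https://s3.amazonaws.com/gbblr_2/100/IMG_1400 - original.jpg', 'offset' => '5'],
    ['src' => 'https://s3.amazonaws.com/gbblr_2/100/IMG_1399 - original.jpg', 'offset' => 6],
];

// Assumption: the client expects CF-style uppercase keys and numeric offsets.
$items = array_map(function (array $img): array {
    return ['SRC' => $img['src'], 'OFFSET' => (int) $img['offset']];
}, $images);

header('Content-Type: application/json');
echo json_encode($items);
```

The (int) cast also normalises the first offset, which the question currently sends as the string "5".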
Looks like the JSON you're creating should be equivalent to his. He is creating an array of structures; where each structure contains the keys "src" and "offset".
He is converting to base64 and binary for streaming purposes, but I don't know how that would work in a PHP implementation, or whether it would be required at all.
I would use Firebug to figure out exactly where in your JavaScript the error is being thrown. That will tell you more about what exactly the problem is.
