How to fix malformed JSON in PHP

I have this JSON string that I want to decode with json_decode():
{"phase":2,"id":"pagelet_profile_picture","css":["VCxcl","Ix2pq"],"js":["fZYUE","VfnZ3"],"content":{"pagelet_profile_picture":"\u003cdiv class=\"profile-picture\">\u003cspan class=\"profile-picture-overlay\">\u003c\/span>\u003cimg class=\"photo img\" src\=\"http:\/\/profile.ak.fbcdn.net\/hprofile-ak-snc4\/222_111_2222_n.jpg\" alt=\"bla bla\" id=\"profile_pic\" \/>\u003c\/div>"}}
json_last_error() is not helping me: I sometimes get JSON_ERROR_STATE_MISMATCH and sometimes JSON_ERROR_SYNTAX.
I want to know what is wrong with this JSON string and how I can fix it automatically in PHP so that I can decode it.
Some code would be very helpful.
Thanks.

Running the string through a JSON lint shows that the problem is the src\=
The \ escapes the = sign, which is not a valid JSON escape sequence.
If you replace src\= with src= it passes the validator.
The fix:
Fix the code that generates the JSON string in the first place,
or
use str_replace to change 'src\=' to 'src='
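A minimal sketch of the str_replace approach. The string below is a shortened stand-in for the one in the question (the example.com URL is made up); only the src\= part matters.

```php
<?php
// Shortened stand-in for the string in the question; the invalid part is src\=
$json = '{"content":"\u003cimg src\=\"http:\/\/example.com\/a.jpg\" \/>"}';

$before = json_decode($json, true);   // null, because decoding fails
$error  = json_last_error();          // some error code other than JSON_ERROR_NONE

// Remove the stray backslash, then decode again.
$fixed = str_replace('src\=', 'src=', $json);
$data  = json_decode($fixed, true);

echo $data['content'], "\n";
```

This only repairs the one sequence that is known to be wrong, so it will not help if the generator emits other invalid escapes.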

The problem with wrongly encoded input is that it is simply wrong, and things then break.
If the problem is invalid escape sequences, as Ben pointed out in his answer, you can try to repair the input string with a smarter algorithm: look for every unnecessary escape sequence and replace it with its unescaped value by removing the escape character \.
To do so, create a list of the characters that actually need to be escaped, then scan the whole string for the escape character; whenever you find one, check whether the next character requires escaping and act accordingly.
However, that is only one strategy. Because the input is not properly encoded, there is no easy way to just fix it: it is already broken.
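The scanning strategy described above could be sketched as follows. The function name strip_invalid_json_escapes is hypothetical, and the list of allowed characters is taken from the JSON grammar (" \ / b f n r t u). It does not verify that \u is followed by four hex digits, so treat it as a starting point rather than a complete repair tool.

```php
<?php
// Hypothetical helper: drop the backslash from any escape JSON does not define.
function strip_invalid_json_escapes(string $json): string
{
    return preg_replace_callback(
        '/\\\\(.)/s',                      // a backslash followed by any one byte
        function (array $m): string {
            // Keep the pair if JSON allows it, otherwise keep only the character.
            return strpos('"\\/bfnrtu', $m[1]) !== false ? $m[0] : $m[1];
        },
        $json
    );
}

// Shortened stand-in for the string in the question (example.com is made up).
$json = '{"content":"\u003cimg src\=\"http:\/\/example.com\/a.jpg\" \/>"}';
$data = json_decode(strip_invalid_json_escapes($json), true);

echo $data['content'], "\n";
```

Because the pattern consumes the backslash and the following character together, an escaped backslash (\\) is treated as one valid pair and the character after it is not misread as being escaped.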

Related

Understanding what \u0000 is in PHP / JSON and getting rid of it

I haven't a clue what is going on, but I have a string inside an array. It must be a string, as I ran this on it first:
$array[0] = (string)$array[0];
If I output $array[0] to the browser in plain text it shows this:
hellothere
But if I JSON encode $array I get this:
hello\u0000there
Also, I need to separate the 'there' part (the bit after the \u0000), but this doesn't work:
explode('\u0000', $array[0]);
I don't even know what \u0000 is or how to control it in PHP.
I did see this link: Trying to find and get rid of this \u0000 from my json
...which suggests str_replacing the JSON that is generated. I can't do that (and need to separate it as mentioned above first) so I then checked Google for 'php check for backslash \0 byte' but I still can't work out what to do.
\uXXXX is the JSON Unicode escape notation (each X is a hexadecimal digit).
In this case it means the ASCII character 0, aka the NUL byte. To split on it you can either do:
explode('\u0000', json_encode($array[0]));
Or better yet:
explode("\0", $array[0]); // PHP doesn't use the same notation as JSON
The string you have is "hello\0there", or "hello\x00there" if you prefer. If you echo it, the NUL character \0 is not displayed, which is why you see hellothere. json_encode, however, detects it and escapes it as it does any other special character, which is why it shows up as a visible \u0000.
As I see it, json_encode is encoding the string correctly: the \u0000 is there to reproduce the input string faithfully in JSON. You don't have to touch its output. If you don't want the \u0000 there, fix the input instead.
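To make the difference between the two notations concrete, here is a small sketch. The value is rebuilt by hand from the question's description, so it is an assumption about what $array[0] actually contains.

```php
<?php
$value = "hello\0there";                 // double quotes: \0 is a real NUL byte

echo strlen($value), "\n";               // 11, the NUL byte counts as one character
echo json_encode($value), "\n";          // "hello\u0000there"

$parts = explode("\0", $value);          // splits on the NUL byte
$none  = explode('\u0000', $value);      // looks for six literal characters, finds none

print_r($parts);
```

The second explode leaves the string in one piece because the raw PHP string never contains the text \u0000; that text only exists in the JSON output.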
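If the NUL byte is at the start or end of the string, you can simply call trim($str) without a character list: the default list includes "\0". trim() does not remove a NUL byte in the middle of the string.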
\uXXXX is the Unicode character with code XXXX (hexadecimal).
For example: http://msdn.microsoft.com/en-us/library/aa664669(v=vs.71).aspx
If you really get 0000, then it's just the character with code 0.
I came across this issue today and solved it by replacing \u0000 with "" in the encoded output before sending it back to the client.
echo str_replace('\\u0000', "", json_encode($send));
In my case I found the character inside the JSON payload of a serialized Laravel job, something like s:8:"\0*\0order"; (or s:8:"\u0000*\u0000order";), which means that the serialized object's property order had protected visibility at the moment of serialization.
Just in case anyone needs to apply it to the whole array:
$data = (array)json_decode(str_replace('\u0000*\u0000', '', json_encode($data)));
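The protected-property marker mentioned above can be reproduced without Laravel. In this sketch the class name Job and its property are hypothetical; only serialize() and json_encode() are real PHP functions.

```php
<?php
// Hypothetical class: the protected visibility is what produces the marker.
class Job
{
    protected $order = 1;
}

$serialized = serialize(new Job());

// The property name is stored as NUL, "*", NUL, "order", which is 8 bytes long,
// hence the s:8 prefix even though "order" has only 5 letters.
$json = json_encode($serialized);

echo $json, "\n";
```

Stripping the marker from the JSON makes it readable, but the result can no longer be passed to unserialize() to rebuild the original object.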
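Try explode("\0", $array[0]);, making sure you use double quotes. With single quotes PHP treats \0 as two literal characters. Note that "\u0000" is not an escape sequence in PHP strings, even in double quotes; PHP 7 and later accept the braced form "\u{0}" instead.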
As others have mentioned, \u0000 is the Unicode NUL character.

JSON with \x26 in values breaks PHP's json_decode

https://www.googleapis.com/freebase/v1/search?query=madonna#
The JSON result is breaking PHP's json_decode. To be exact, the following string is breaking decoding: "Sticky \x26amp; Sweet Tour".
Browsers however seem to be able to understand it: http://jsfiddle.net/nggX2/ & http://jsfiddle.net/QUVFt/
http://jsonlint.com/ claims it's invalid JSON.
On PHP's side I've tried: http://codepad.viper-7.com/suUbQD and http://codepad.viper-7.com/QjqCH7
Any thoughts on what's going on?
What's going on is that this is invalid JSON. The response from that url is incorrect--JSON doesn't allow the \xXX two-digit hexadecimal binary escape sequences, only \uXXXX unicode code point escape sequences. Here it should just be &, though--no escape sequence needed.
No idea why google/freebase is outputting invalid JSON.
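If the response cannot be fixed at the source, one possible workaround is to rewrite each \xXX into the equivalent \u00XX before decoding. This is a sketch, not something the original answers proposed; the function name fix_hex_escapes is hypothetical.

```php
<?php
// Hypothetical helper: turn the non-JSON escape \xXX into the valid \u00XX.
function fix_hex_escapes(string $json): string
{
    return preg_replace_callback(
        // Match every backslash pair so that an escaped backslash (\\)
        // followed by the text "x26" is skipped rather than rewritten.
        '/\\\\(?:x([0-9A-Fa-f]{2})|.)/s',
        function (array $m): string {
            return isset($m[1]) && $m[1] !== '' ? '\u00' . $m[1] : $m[0];
        },
        $json
    );
}

$raw  = '{"name":"Sticky \x26amp; Sweet Tour"}';   // single quotes keep \x26 literal
$data = json_decode(fix_hex_escapes($raw), true);

echo $data['name'], "\n";
```

The decoded value still contains the HTML entity &amp;, because that part of the string was ordinary text all along.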
Your JSON should look like the following:
"Sticky \\x26amp; Sweet Tour"
The backslash needs to be escaped, because it is the escape character.

Weird Word Character breaks AJAX

I seem to be having a problem whenever I try to send something by AJAX that contains the Word '-' (hyphen) character. It seems to turn the whole string into 'null' in PHP when I convert it to JSON.
Has anyone else seen/solved this?
The "Word hyphen" you're talking about is probably an em-dash. This is not a standard ASCII character, which means that your issue is likely related to character encoding.
Either encode all the extended characters in your string as HTML entities using the PHP htmlentities() function, or else ensure that all your content is served as UTF-8.
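A sketch of the UTF-8 route, assuming the text arrives as Windows-1252 (the question does not say which encoding is in use, so that is a guess). It relies on the iconv extension.

```php
<?php
// Assumption: the input is Windows-1252, where byte 0x97 is the em dash.
$raw = "Word \x97 dash";

$bad = json_encode($raw);        // fails: the byte sequence is not valid UTF-8
$err = json_last_error();        // JSON_ERROR_UTF8

// Convert to UTF-8 first, then encode.
$utf8 = iconv('Windows-1252', 'UTF-8', $raw);
$ok   = json_encode($utf8);

echo $ok, "\n";                  // "Word \u2014 dash"
```

If the real source encoding is different, the first argument to iconv() has to change accordingly; converting with the wrong source encoding produces valid JSON with the wrong characters.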
What are you using, json_decode? Try checking what json_last_error() returns:
http://www.php.net/manual/en/function.json-last-error.php
The example in the json_decode documentation contains a dash, so the dash itself is probably not the issue.
http://php.net/manual/en/function.json-decode.php
Check the section there that covers common errors.

PHP Json_Encode strange characters?

I am using json_encode in PHP to output data.
When it gets to the word Æther, it outputs \u00c6ther.
Does anyone know of a way to make json_encode output that character, or am I going to have to change the text so that it does not contain it?
That's the Unicode escape for the character. JavaScript should handle it properly. Notice the backslash before it, which marks an escape sequence. The u indicates a Unicode code point, and the four hex digits identify the actual character.
See here for some more info.
That is working as specified. The RFC ( http://www.ietf.org/rfc/rfc4627.txt ) indicates that any character may be escaped, and your average printable character can be written in the \uXXXX format.
Any JSON parser that cannot understand a character escaped in that way is not compliant with the standard. Work on resolving that problem rather than trying to coax PHP into misbehaving as well.
(It is legal to put UTF-8 characters into JSON strings without escaping them as well, with a few exceptions, but the safe approach of escaping anything questionable is wise.)
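A short sketch showing that both forms round-trip to the same string. The JSON_UNESCAPED_UNICODE flag is not mentioned in the answers above; it is available from PHP 5.4 on. The character is written as raw UTF-8 bytes so the example does not depend on the file's own encoding.

```php
<?php
$word = "\xC3\x86ther";                               // "Æther" as UTF-8 bytes

$escaped = json_encode($word);                        // "\u00c6ther"
$literal = json_encode($word, JSON_UNESCAPED_UNICODE); // the character is kept as is

// A compliant parser decodes both forms to the same value.
var_dump(json_decode($escaped) === json_decode($literal));
```

The escaped form is pure ASCII, which is why it survives a wrongly declared charset; the literal form is shorter but relies on the response actually being served as UTF-8.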

preg_replace - NULL result?

Here's a small example (download, rename to .php and execute it in your shell):
test.txt
Why does preg_replace return NULL instead of the original string?
\x{2192} is the same as HTML "&rarr;" ("→").
I got a NULL result when my regular expression included the u (UTF-8) PCRE modifier. If your source text is not valid UTF-8 and you use this modifier, you'll get a NULL result.
From the documentation on preg_replace():
Return Values
preg_replace() returns an array if the subject parameter is an array, or a string otherwise.
If matches are found, the new subject will be returned, otherwise subject will be returned unchanged or NULL if an error occurred.
In your pattern, I don't think the u flag is supported. (Edit: this was wrong.)
Edit: It seems like some kind of encoding issue with the subject. When I erase '147 3.2 V6 - GTA (184 kW)' and manually re-type it everything seems to work.
Edit 2: In the pattern you provided, there are 3 spaces that seem to be giving issues to the regex engine. When I convert them to decimal their value is 160 (as opposed to normal space 32). When I replace those spaces with normal ones it seems to work.
I've replaced the offending spaces with underscores below:
'147 3.2 V6 - GTA (184 kW)'
'147 3.2_V6 - GTA_(184_kW)'
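The NULL result can be reproduced with a subject that contains byte 160 (0xA0). This sketch assumes the original text was ISO-8859-1, where that byte is a non-breaking space; the replacement string '->' is arbitrary. It relies on the iconv extension.

```php
<?php
// Byte 0xA0 on its own is not valid UTF-8 (assumed here to be a Latin-1 NBSP).
$subject = "147 3.2\xA0V6 - GTA\xA0(184\xA0kW)";

$result = preg_replace('~\x{2192}~u', '->', $subject);
$error  = preg_last_error();     // PREG_BAD_UTF8_ERROR

var_dump($result);               // NULL

// Convert the subject to UTF-8 and the same call works.
$utf8  = iconv('ISO-8859-1', 'UTF-8', $subject);
$fixed = preg_replace('~\x{2192}~u', '->', $utf8);

var_dump($fixed === $utf8);      // true: no arrow in the text, so nothing changes
```

Checking preg_last_error() right after a NULL result is a quick way to tell an encoding problem apart from a broken pattern.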
You are using single quotes, in which the only things you can escape are single quotes and the backslash itself. To enable escape sequences such as \x32 in the PHP string, use double quotes ("").
I am not a UTF-8 expert, but writing \x2192 in a double-quoted PHP string is not correct either: \x takes at most two hex digits there, so it yields \x21 followed by the text 92. The UTF-8 bytes for U+2192 are \xE2\x86\x92. You may also want to look at utf8_encode and utf8_decode.
Your source string has invalid characters in it, or something. PHP gives:
Warning: preg_replace(): Compilation failed: invalid UTF-8 string at offset 0 in test.php on line 7
I believe there is also a fault in your regular expression: ~\x{2192}~u
Try replacing it with the following and see if that works for you: /\x{2192}/u
