I haven't a clue what is going on, but I have a string inside an array. It must be a string, as I ran this on it first:
$array[0] = (string)$array[0];
If I output $array[0] to the browser in plain text it shows this:
hellothere
But if I JSON encode $array I get this:
hello\u0000there
Also, I need to separate the 'there' part (the bit after the \u0000), but this doesn't work:
explode('\u0000', $array[0]);
I don't even know what \u0000 is or how to control it in PHP.
I did see this link: Trying to find and get rid of this \u0000 from my json
...which suggests str_replacing the JSON that is generated. I can't do that (and need to separate it as mentioned above first) so I then checked Google for 'php check for backslash \0 byte' but I still can't work out what to do.
\uXXXX is the JSON Unicode escape notation (X is hexadecimal).
In this case it means the ASCII character 0, aka the NUL byte. To split on it you can either do:
explode('\u0000', json_encode($array[0]));
Or better yet:
explode("\0", $array[0]); // PHP doesn't use the same notation as JSON
The string you have is "hello\0there", or "hello\x00there" if you prefer. If you echo it, the NUL byte \0 isn't displayed, which is why you see hellothere instead. json_encode, however, detects it and escapes it as it does any other special character, which is why it shows up as a visible \u0000 sequence.
As I see it, JSON is encoding the string perfectly: the \u0000 is there to reproduce the input string faithfully in JSON. You don't have to touch the output. If you don't want that \u0000 there, fix the input instead.
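To make the above concrete, here is a minimal sketch. The sample value is an assumption reconstructed from the question: "hello", one NUL byte, then "there".

```php
<?php
// Assumed sample value: "hello", a NUL byte, then "there".
$array = ["hello\0there"];

// strlen() counts the invisible NUL byte: 5 + 1 + 5 bytes.
$length = strlen($array[0]);

// json_encode() escapes the NUL byte as the six characters \u0000.
$json = json_encode($array);

// Split on the real NUL byte. "\0" must be in double quotes so PHP
// turns it into a single byte with value 0.
$parts = explode("\0", $array[0]);

var_dump($length, $json, $parts);
```

Splitting the raw string rather than the JSON output avoids having to deal with the surrounding quotes and other JSON escapes.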
You can simply call trim($str) without giving it a character list; by default it also strips NUL bytes, though only from the start and end of the string.
\uXXXX is the Unicode character with code point XXXX (hexadecimal).
For example: http://msdn.microsoft.com/en-us/library/aa664669(v=vs.71).aspx
If you really get 0000, then it's just the character with code 0.
I came across this issue today and I sorted it out by replacing \u0000 in my array with "" before sending it back to the client.
echo str_replace('\\u0000', "", json_encode($send));
In my case I found the symbol inside the payload JSON of a serialized Laravel job, something like s:8:"\0*\0order"; (or s:8:"\u0000*\u0000order";), which meant that the serialized object's property order had protected visibility at the moment of serialization.
Just in case anyone needs to apply it to the whole array:
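A small sketch of where those NUL bytes come from. The Job class here is hypothetical and only stands in for whatever object the framework serializes.

```php
<?php
// Hypothetical class, for illustration only.
class Job
{
    protected $order = 1;
}

$serialized = serialize(new Job());

// PHP stores a protected property name as NUL, "*", NUL, then the name.
$hasMarker = strpos($serialized, "\0*\0order") !== false;

// Once JSON-encoded, each NUL byte becomes a visible \u0000.
$json = json_encode($serialized);

var_dump($hasMarker);
echo $json, "\n";
```

Removing the marker from the JSON changes the serialized property name, so the result may no longer unserialize into the original object.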
$data = (array)json_decode(str_replace('\u0000*\u0000', '', json_encode($data)));
Try explode("\0", $array[0]); or, on PHP 7 and later, explode("\u{0000}", $array[0]);, making sure you use double quotes. With single quotes, or with "\u0000" written without braces, PHP keeps the literal 6-character value.
As others have mentioned, \u0000 is the Unicode NUL character.
Related
In PHP, is it at all possible to output the contents of a string to show any escaped characters that may be contained within the string? I get that the whole point of escaping characters is so that they aren't treated in the usual way. But I would still like to be able to view the raw contents of a string so I can see for myself exactly how characters like \n and \r, etc. are represented. Does PHP have a method for doing this?
Use json_encode() to encode the string as JSON. The JSON notation for escaped characters (which comes from JavaScript) is largely the same as the one PHP uses in double-quoted strings. Both JavaScript and PHP were inspired by C and copied its string-literal notation.
If you use single quotation marks it should do what you need.
E.g. echo 'this\n'; will output this\n, whereas echo "this\n"; will output this and a newline.
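A minimal sketch of that approach; the sample string is an assumption chosen to contain a carriage return, a newline and a tab.

```php
<?php
// Assumed sample containing CR, LF and a tab.
$raw = "line one\r\nline two\ttabbed";

// json_encode() renders control characters as visible escape sequences
// and wraps the result in double quotes.
$visible = json_encode($raw);

echo $visible, "\n";
```

Note that json_encode() also escapes forward slashes and non-ASCII characters by default, so the output is a JSON view of the string rather than an exact PHP literal.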
I have a binary Word .doc that looks something like this in string format:
þÿÿÿÿÿÿÿppp„±¶g œÙ Text in word doc here I'm interested in [|`ñÿ|Standard1$S_HmHnHsHtHOJPJQJCJEH567>
When I echo that string, I can see all the text I'm interested in between unrecognized characters (I'm not worried about those, since I only want the text). The issue is that PHP does not seem to recognize it as a string, so I cannot search it: strpos(), strchr(), and mb_strpos() all return nothing. No -1, no error in the PHP error log, just nothing.
However, when I call gettype() I get string. I suspect this is an encoding issue, but mb_detect_encoding returns UTF-8. I have tried converting it to multiple different encodings, to no avail.
How can I get PHP to search this string? I understand that parsing a Word .doc is a more complex issue, but for my purposes the plaintext I'm interested in is in the binary data. Does anyone have any experience with this?
Thank you :)
Since your string seems to be binary and you are only interested in the text, a quick solution would be to use filter_var to strip the non-printable and non-ASCII characters from it. Try using this before searching:
$clean_string = filter_var($str, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH);
Notice the part "Standard1$". Inside a double-quoted string, PHP treats $ as the start of a variable name rather than as a literal character.
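As a sketch of the filtering idea: the strip flags are options of a filter, so this example assumes FILTER_UNSAFE_RAW as the filter and combines both flags with a bitwise OR. The sample bytes are made up for illustration.

```php
<?php
// Assumed sample: readable text surrounded by binary bytes.
$str = "\xFE\xFF\x00\x01Text in word doc\x00\x9C\xD9";

// STRIP_LOW removes bytes below 32, STRIP_HIGH removes bytes above 127.
$clean = filter_var(
    $str,
    FILTER_UNSAFE_RAW,
    FILTER_FLAG_STRIP_LOW | FILTER_FLAG_STRIP_HIGH
);

// The cleaned string can now be searched as usual.
$pos = strpos($clean, 'word');

var_dump($clean, $pos);
```

Be aware that FILTER_FLAG_STRIP_LOW also removes tabs and line breaks, which may or may not matter for the search.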
check here.
<?php
$s = "þÿÿÿÿÿÿÿppp„±¶g œÙ Text in word doc here I'm interested in [|`ñÿ|Standard1$S_HmHnHsHtHOJPJQJCJEH567>";
$s2 = strpos($s, "interested");
echo $s2;
?>
You might want to put a backslash before that $ sign.
I have this JSON string that I want to decode with json_decode():
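An alternative sketch, assuming the text can be written as a literal: single quotes switch off variable interpolation entirely. The string below is a shortened, made-up version of the one in the question.

```php
<?php
// Single quotes: no interpolation, so "$S_HmHn" stays literal text.
$s = 'Text here I\'m interested in |Standard1$S_HmHn';

$pos = strpos($s, 'interested');
$hasDollar = strpos($s, '$S_HmHn') !== false;

var_dump($pos, $hasDollar);
```

This only matters for string literals in source code; data read from a file is never interpolated.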
{"phase":2,"id":"pagelet_profile_picture","css":["VCxcl","Ix2pq"],"js":["fZYUE","VfnZ3"],"content":{"pagelet_profile_picture":"\u003cdiv class=\"profile-picture\">\u003cspan class=\"profile-picture-overlay\">\u003c\/span>\u003cimg class=\"photo img\" src\=\"http:\/\/profile.ak.fbcdn.net\/hprofile-ak-snc4\/222_111_2222_n.jpg\" alt=\"bla bla\" id=\"profile_pic\" \/>\u003c\/div>"}}
There is json_last_error(), but it isn't helping me (I get JSON_ERROR_STATE_MISMATCH, and sometimes JSON_ERROR_SYNTAX).
I want to know what is wrong with this JSON string and how I can fix it automatically in PHP so that I can decode it.
Some code would be very helpful.
Thanks.
Using a JSON lint, it seems the problem is the src\=
The \ escapes the = sign, which is not a valid JSON escape sequence.
If you replace src\= with src= it passes the validator.
The fix:
Fix the code that generates the json string in the first place.
or
use str_replace to change 'src\=' to 'src='
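A sketch of the str_replace route. The JSON below is a shortened, assumed sample that contains the same invalid \= escape as the original.

```php
<?php
// Shortened sample with the same invalid "\=" escape (assumption).
$broken = '{"content":"\u003cimg src\=\"pic.jpg\" \/>"}';

// The unmodified string fails to decode.
$before = json_decode($broken, true);

// Remove the stray backslash, then decode again.
$fixed = str_replace('src\=', 'src=', $broken);
$data = json_decode($fixed, true);

var_dump($before, $data);
```

This only patches the one known bad sequence; any other invalid escape in the input would still make json_decode() fail.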
The problem with a wrong encoding is that it's just a wrong encoding. Things then break.
If the problem is related to invalid escape sequences, as Ben pointed out in his answer, you can try to fix the input string for these sequences, probably with a smarter algorithm that looks for any unneeded escape sequence and replaces it with its non-escaped value by removing the escape character \.
You can do so by creating a list of characters that actually need to be escaped, then scanning the whole string for the escape character. When you find one, check whether the next character requires escaping, and act accordingly.
However that's only one strategy and as the input is not properly encoded, it's not easy to just fix things because they are already broken.
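One possible sketch of that strategy. The helper name strip_invalid_json_escapes is hypothetical, and the approach assumes the only damage is a backslash placed before characters that JSON does not allow to be escaped.

```php
<?php
// Hypothetical helper: removes the backslash from escape sequences
// that JSON does not define, leaving valid ones untouched.
function strip_invalid_json_escapes(string $json): string
{
    // Characters that may legally follow a backslash in JSON:
    // " \ / b f n r t u
    $allowed = '"\\/bfnrtu';

    // Match a backslash plus the character after it, left to right,
    // so an escaped backslash is consumed as a pair.
    return preg_replace_callback(
        '#\\\\(.)#s',
        function (array $m) use ($allowed): string {
            return strpos($allowed, $m[1]) !== false ? $m[0] : $m[1];
        },
        $json
    );
}

// Assumed sample with one invalid escape (\=) among valid ones.
$broken = '{"a":"x\=y \"q\" \\\\ \u003c"}';
$decoded = json_decode(strip_invalid_json_escapes($broken), true);

var_dump($decoded);
```

The sketch does not check that \u is followed by four hexadecimal digits, so it is a partial repair, not a validator.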
I seem to be having a problem whenever I try to send something by AJAX that has the Word '-' (hyphen) character in it. It seems to turn the whole string into 'null' in PHP when I convert to JSON.
Has anyone else seen/solved this?
The "Word hyphen" you're talking about is probably an em-dash. This is not a standard ASCII character, which means that your issue is likely to be around character encoding.
Either encode all the extended characters in your string as HTML entities using the PHP htmlentities() function, or else ensure that all your content is served as UTF-8.
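A sketch of the UTF-8 route, assuming the dash arrives as a single Windows-1252 byte (0x97 for an em-dash) and that the mbstring extension is available. Both are assumptions; the real source encoding has to be checked.

```php
<?php
// Assumed input: an em-dash as the single Windows-1252 byte 0x97.
$cp1252 = "10\x9720";

// Not valid UTF-8, so json_encode() fails (false on current PHP versions).
$failed = json_encode($cp1252);

// Convert to UTF-8 first, then encode.
$utf8 = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
$json = json_encode($utf8);

var_dump($failed, $json);
```

When json_encode() fails, json_last_error_msg() gives the reason, which is a quick way to confirm an encoding problem.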
What are you using? json_decode? Try seeing what you get out of json_last_error
http://www.php.net/manual/en/function.json-last-error.php
The json_decode example has a dash in it, so it's probably not an issue.
http://php.net/manual/en/function.json-decode.php
Check the section on there that says 'common errors'.
I am pulling data from the Facebook graph which has characters encoded like so: \u2014 and \u2014
Is there a function to convert those characters into HTML? i.e \u2014 -> —
If you have some further reading on these character codes, or suggested reading about Unicode in general, I would appreciate it. This is so confusing to me. I don't know what to call these codes... I guess Unicode, but Unicode seems to mean a whole lot of things.
that's not entirely true bobince.
How do you handle json containing spanish accents?
there are 2 problems.
I make FB.api(url, function(response)
... var s=JSON.stringify(response);
and pass it to a php script via $.post
First I get a truncated string. I need escape(JSON.stringify(response)).
Then I get a full JSON-encoded string with Spanish accents.
As a test, I place it in a text file, load it with file_get_contents, apply PHP's json_decode, and get nothing.
You first need utf8_encode.
And then you get the object you were waiting for.
After a full day of test and google without any result when decoding unicode properly, I found your post.
So many thanks to you.
Someone asked me to solve the problem of Arabic texts from the Facebook JSON archive, maybe this code helps someone who searches for reading Arabic texts from Facebook (or instagram) JSON:
$str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';
function decode_encoded_utf8($string){
return preg_replace_callback('#\\\\u([0-9a-f]{4})#ism', function($matches) { return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE"); }, $string);
}
echo iconv("UTF-8", "ISO-8859-1//TRANSLIT", decode_encoded_utf8($str));
Facebook Graph API returns JSON objects. Use json_decode() to read them into PHP and you do not have to worry about handling string literal escapes like \uNNNN. Don't try to decode JSON/JavaScript string literals by yourself, or extract chosen properties using regex.
Having read the string value, you'll have a UTF-8-encoded string. If your target HTML is also UTF-8-encoded, you don't need to replace — (U+2014) with any entity reference. Just use htmlspecialchars() on the string when outputting it, so that any < or & characters in the string are properly encoded.
If you do for some reason need to produce ASCII-safe HTML, use htmlentities() with the charset arg set to 'utf-8'.
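A minimal sketch of the steps above. The payload is an assumed sample, not a real Graph API response.

```php
<?php
// Assumed sample payload; real responses contain more fields.
$json = '{"message":"Fish \u2014 chips & <b>peas</b>"}';

// json_decode() turns \u2014 into the UTF-8 bytes of the character.
$data = json_decode($json, true);
$message = $data['message'];

// For a UTF-8 page: escape only the HTML-special characters.
$html = htmlspecialchars($message, ENT_QUOTES, 'UTF-8');

// For ASCII-safe output: also turn the dash into a named entity.
$ascii = htmlentities($message, ENT_QUOTES, 'UTF-8');

echo $html, "\n", $ascii, "\n";
```

In both cases the decoding is left to json_decode(), so no manual handling of \uNNNN sequences is needed.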