Weird Word Character breaks AJAX - php

I seem to be having a probelm whenever I try and send something by AJAX that has the Word '-' (hyphen) character in it. It seems to turn he whole string into 'null' in PHP when I convert to JSON.
Has anyone else seen/solved this?

the "Word hyphen" you're talking about is probably an em-dash. This is not a standard ascii character, which means that your issue is likely to be around character encoding.
Either encode all the extended characters in your string as HTML entities using the PHP htmlentities() function, or else ensure that all your content is served as UTF-8.

What are you using? json_decode? Try seeing what you get out of json_last_error
http://www.php.net/manual/en/function.json-last-error.php
The json decode example function has in it, a dash, so its probably not an issue.
http://php.net/manual/en/function.json-decode.php
Check the section on there that says 'common errors'.

Related

Why is php converting certain characters to '?'

Everything in my code is running my database(Postgresql) is using utf8 encoding, I've checked the php.ini file its encoding is utf8, I tried debugging to see if it was any of the functions I used that were doing this, but nothing everything is running as expected, however after my frontend sends a post request to backend server through curl for some text to be inserted in the database, some characters like 'da' are converted to '?' in postgre and in memcached, I think php is converting them to Latin-1 again after the request reaches the other side for some reason becuase I use utf8_encode before the request and utf8_decode on the other side
this is the code to send the request
$pre_opp->
Send_Request_To_BackEnd("/Settings",$school_name,$uuid,"Upload_Bio","POST",str_replace(" ","%",utf8_encode($bio)));
this is how the backend system receives this
$data= str_replace("%"," ",utf8_decode($_POST["Data"]));
Don't replace " " with "%".
Use urlencode and urldecode instead of utf8_encode and utf8_decode - It will give you a clean alphanumeric representation of any character to easily transport your data.
If everything in your environment defaults to UTF-8, you shouldn't need utf_encode and utf_decode anyways, I guess. But if you still do, you could try combining both like this:
Send_Request_To_BackEnd("/Settings",$school_name,$uuid,"Upload_Bio","POST", urlencode(utf8_encode($bio)));
and
$data= str_replace("%"," ",utf8_decode(urldecode($_POST["Data"])));
You say this like it's a mystery:
I think php is converting them to Latin-1 again after the request reaches the other side for some reason
But then you give the reason yourself:
because I use utf8_encode before the request and utf8_decode on the other side
That is exactly what uf8_decode does: it converts UTF-8 to Latin-1.
As the manual explains, this is also where your '?' replacements come from:
This function converts the string string from the UTF-8 encoding to ISO-8859-1. Bytes in the string which are not valid UTF-8, and UTF-8 characters which do not exist in ISO-8859-1 (that is, characters above U+00FF) are replaced with ?.
Since you'd picked the unfortunate replacement of % for space, sequences like "%da" were being interpreted as URL percent escapes, and generating invalid UTF-8 strings. You then asked PHP to convert them to Latin-1, and it couldn't, so it substituted "?".
The simple solution is: don't do that. If your data is already in UTF-8, neither of those functions will do anything but mess it up; if it's not already in UTF-8, then work out what encoding it's in and use iconv or mb_convert_encoding to convert it, once. See also "UTF-8 all the way through".
Since we can't see your Send_Request_To_BackEnd function, it's hard to know why you thought you needed it. If you're constructing a URL with that string, you should use urlencode inside your request sending code; you shouldn't need to decode it the other end, PHP will do that for you.

PHP and Unicode or UTF-8?

My PHP application outputs JSON where special characters are encoded, f.ex. the string "Brøndum" is represented as "Br\u00f8ndum".
Can you tell me which encoding this is, as well as how I get back from "Br\u00f8ndum" to "Brøndum".
I have tried utf8_encode/decode but they don't work as expected.
Thanks!
That's standard JSON unicode escaping.
You get back to the actual character by using a JSON parser. json_decode in the case of PHP.
You can tell PHP not to escape Unicode characters in the first place with the JSON_UNESCAPED_UNICODE flag.
json_encode("Brøndum", JSON_UNESCAPED_UNICODE)
mb_detect_encoding is your function. You just pass it the string and it detects the codification. You can also send it an array with the possibilities (as a regular string like "hello" could potentially be encoded in different codifications.
echo mb_detect_encoding("Br\u00f8ndum");

decoding ISO characters

I got Chinese characters encoded in ISO-8859-1, for example 兼 = 兼
Those characters are taken form the database using AJAX and sent by Json using json_encode.
I then use the template Handlebars to set the data on the page.
When I look at the ajax page the characters are displayed correctly, the source is still encoded.
But the final result displays the encrypted characters.
I tried to decode on the javascript part with unescape but there is no foreach with the template that gives me the possibility to decode the specific variable, so it crashes.
I tried to decode on the PHP side with htmlspecialchars_decode but without success.
Both pages are encoded in ISO-8859-1, but I can change them in UTF8 if necessary, but the data in the database remains encoded in ISO-8859-1.
Thank you for your help.
You're simply representing your characters in HTML entities. If you want them as "actual characters", you'll need to use an encoding that can represent those characters, ISO-8859 won't do. htmlspecialchars_decode doesn't work because it only decodes a handful of characters that are special in HTML and leaves other characters alone. You'll need html_entity_decode to decode all entities, and you'll need to provide it with a character set to decode to which can handle Chinese characters, UTF-8 being the obvious best choice:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
You'll then need to make sure the browser knows that you're sending it UTF-8. If you want to store the text in the database in UTF-8 as well (which you really should), best follow the guide How to handle UTF-8 in a web app which explains all the pitfalls.
Are you including your text with the "double-stache" Handlebars syntax?
{{your expression}}
As the Handlebars documentation mentions, that syntax HTML-escapes its output, which would cause the results you're mentioning, where you're seeing the entity 兼 instead of 兼.
Using three braces instead ("triple-stache") won't escape the output and will let the browser correctly interpet those numeric entities:
{{{your expression}}}

how to fix a malformed JSON in php

i have this JSON string that i want to decode it with json_decode(); function
{"phase":2,"id":"pagelet_profile_picture","css":["VCxcl","Ix2pq"],"js":["fZYUE","VfnZ3"],"content":{"pagelet_profile_picture":"\u003cdiv class=\"profile-picture\">\u003cspan class=\"profile-picture-overlay\">\u003c\/span>\u003cimg class=\"photo img\" src\=\"http:\/\/profile.ak.fbcdn.net\/hprofile-ak-snc4\/222_111_2222_n.jpg\" alt=\"bla bla\" id=\"profile_pic\" \/>\u003c\/div>"}}
there is the json_last_error(); but it not helping me. (got JSON_ERROR_STATE_MISMATCH and JSON_ERROR_SYNTAX sometimes)
i want to know what wrong with this JSON string and how i can fix it automatically in PHP so i can decode it.
some code will be very helpful
thanks.
Using a json lint, it seems the problem is the src\=
the \ escapes the = sign, which makes no sense.
If you replace src\= with src= it passes the validator.
The fix:
Fix the code that generates the json string in the first place.
or
use str_replace to change 'src\=' to 'src='
The problem with a wrong encoding is that it's just a wrong encoding. Things then break.
If the problem is related to invalid escape sequences as Ben pointed out in his answer, you can try to fix the input string for these sequences, probably with a smarter algorithm that is looking for any not-needed escape sequence replacing it with it's non-escaped value by removing the escape character \.
You can do so by creating a list of characters that need actual to be escaped, then parse the whole string for the escape character, if found, check if the next character requires escaping or not and then act upon.
However that's only one strategy and as the input is not properly encoded, it's not easy to just fix things because they are already broken.

How to parse unicode format (e.g. \u201c, \u2014) using PHP

I am pulling data from the Facebook graph which has characters encoded like so: \u2014 and \u2014
Is there a function to convert those characters into HTML? i.e \u2014 -> —
If you have some further reading on these character codes), or suggested reading about unicode in general I would appreciate it. This is so confusing to me. I don't know what to call these codes... I guess unicode, but unicode seems to mean a whole lot of things.
that's not entirely true bobince.
How do you handle json containing spanish accents?
there are 2 problems.
I make FB.api(url, function(response)
... var s=JSON.stringify(response);
and pass it to a php script via $.post
First I get a truncated string. I need escape(JSON.stringify(response))
Then I get a full json encoded string with spanish accents.
As a test, I place it in a text file I load with file_get_contents and apply php json_decode and get nothing.
You first need utf8_encode.
And then you get awaiting object of your desire.
After a full day of test and google without any result when decoding unicode properly, I found your post.
So many thanks to you.
Someone asked me to solve the problem of Arabic texts from the Facebook JSON archive, maybe this code helps someone who searches for reading Arabic texts from Facebook (or instagram) JSON:
$str = '\u00d8\u00ae\u00d9\u0084\u00d8\u00b5';
function decode_encoded_utf8($string){
return preg_replace_callback('#\\\\u([0-9a-f]{4})#ism', function($matches) { return mb_convert_encoding(pack("H*", $matches[1]), "UTF-8", "UCS-2BE"); }, $string);
}
echo iconv("UTF-8", "ISO-8859-1//TRANSLIT", decode_encoded_utf8($str));
Facebook Graph API returns JSON objects. Use json_decode() to read them into PHP and you do not have to worry about handling string literal escapes like \uNNNN. Don't try to decode JSON/JavaScript string literals by yourself, or extract chosen properties using regex.
Having read the string value, you'll have a UTF-8-encoded string. If your target HTML is also UTF-8-encoded, you don't need to replace — (U+2014) with any entity reference. Just use htmlspecialchars() on the string when outputting it, so that any < or & characters in the string are properly encoded.
If you do for some reason need to produce ASCII-safe HTML, use htmlentities() with the charset arg set to 'utf-8'.

Categories