Invalid PHP JSON encoding - php

I'm working on a project in PHP (5.3.1) where I need to send a JSON string to a webservice (in python), but the result I get from json_encode does not pass as a valid JSON (i'm using JSLint to check validity).
I should add that the structure I'm trying to encode is fairly big (13K encoded), and consists partially of UTF8 data, and while json_encode does handle it, i get spaces in weird places in the result. For example, I could get {"hello":tru e} or {"hell o":true} which results in an error from the webservice since the JSON is invalid (or data, like in the second example).
I've also tried to use Zend framework for JSON encoding, but that didn't make much different.
Is there a known issue with JSON in PHP? Did anyone encounter that behavior and found a solution?

You state that "the structure I'm trying to encode ... consists partially of UTF8 data." This implies that it is also partially of non-UTF8 data. The json_encode doc has a comment at the bottom, that
json_encode() expects strings to be encoded to be in UTF8 format, while by default PHP strings are ISO-8859-1 encoded.
This means that
json_encode(array('àü'));
will produce a json representation of an empty string, while
json_encode(array(utf8_encode('àü')));
will work.
Are the failing segments of the JSON due to non-UTF8 input?

For sure object keys cannot contain spaces or any non unicode characters, unquoted variables can be only boolean, integer ,float, object and array value, strings should always be quoted.
Also, I would recommend you to add correct header before your json output.
if(!headers_sent())
header('Content-Type: application/json; charset=utf-8', true,200);
Can you also post your array or object that you passing to json_encode?

I was handling some automatically generated emails the other day and noticed the same weird behavior (spaces were inserted to the email body), so I started to check the email post and found the culprit:
From the SMTP RFC2821:
The maximum total length of a text
line including the is 1000
characters (not counting the leading
dot duplicated for transparency).
My email body was indeed in one line, so breaking it with \n's fixed the spaces issue.

After scratching my head for nearly a day, I've come to the conclusion that the problem was not in the json_encode function. It was with my post function.
Basically, the json_encode was preparing the data to be sent to another service. Before today, I've used stream_context_create and fopen to post data to the external service, but now I use fsockopen and fputs and it seems to be working.
Although I'm unsure as to the nature of the problem, I'm happy it works now :)
BTW: After this process, I mail myself the input and output (both in JSON) and this is how I saw there was a problem in the first place. This problem still persists but I guess that's related to the encoding of the mail or something of that sort.

Related

php filter a string so that json_encode does not error out

I'm grabbing a bunch of data from a database and putting it into a PHP array. I'm then looking to json_encode that array using $output = json_encode($out).
My issue is that from time to time, something in the array is not able to be read by json_encode and the whole thing fails. If I use print_r($out) to have a look, I can clearly see where it's failing, because the character that is screwing things up always appears as a question mark inside of a black diamond �.
First - what are these characters?
Second - Is there a function I can pass the elements through prior to adding them to the array that would strip these out, or replace 'them' with blanks?
I found the answer to this. Since the data coming FROM the database was stored with the "black diamond" character, I needed to get this out POST grabbing it from the database.
$x[4] = utf8_encode(odbc_result($query, 'B'));
By passing the result through utf8_encode, the string is encoded into UTF-8 and the illegal character is removed.
Say echo json_encode($out);
This will solve your issue
Black diamonds are browser issue. Database uses plain question marks.
It seems you are getting already wrong data from databalse. But that's quite tricky to have incorrect utf with your settings. You need to check everything
if your table marked with utf8 charset
if your data indeed encoded in utf (not marked but indeed encoded)
if your server sending correct charset in Content-type header.
it is also useful to see the page choosing different charsets from your browser menu.
But first of all you have to wipe any trace of all random actions you tried, all these various encode, decode and stuff. Just plain and direct output from database. Otherwise you will never get to the problem

JavaScript to PHP - POST encoding issue

I'm attempting to send a string from client-side JavaScript to a back-end PHP script. The string contains special quotes like ’ and “.
When I look at the console in Chrome I can see that these are sent in the POST headers as they are. On the PHP side I then immediately json_encode() the $_POST array and send it back to see what its collected. The special characters now look like this \u2019. This is for testing please note I would normally sanitize all post data.
I wish to use UTF-8 but I'm not sure what I'm missing. My HTML includes:
<meta charset="utf-8">
My PHP server has UTF-8 set as its default charset.
If I start saving such data to the database I start ending up with strings like this: â for ’. However this is not a database issue the characters are already bad before going into the database. MySQL purely accentuates them.
Any ideas?
Update
I've noticed that if I return the string back to javascript without using json_encode() then it's in its original format with the special quotes (’ and “) still.
Have you tried:
utf8_decode()
On the server side for the variables you're passing? PHP is likely expecting iso-8859-1 rather than uft-8.
Turns out there was an issue both sides of the pond. The issue PHP side which this question regards was that the data was being sent to the back-end via a GET request (url encoded). I have changed this to a POST request.
This has allowed me to specify the UTF-8 charset when sending the headers for the POST request.

Posting json string which contains £

I have a VB6 application that is creating a JSON string and posting it to a website (PHP5). It may look like this:
data=thisisthejsonstringitcontainsthe£hmtlcharacter&code=123&api_key=321
This is an issue because the £ is thought to be the start of a new variable so the json string is being cut.
Does this need to be encoded somehow at the VB source? Or can I do something with this when it arrives at the website? If it needs encoded by VB can anyone suggest a suitable function?
I'm using the application/x-www-form-urlencoded content type when posting.
This might seem far fetched, but have you tried sending &pound urlencoded beforehand? being %26pound
If you send url arguments to a webserver, you'll need to urlencode them. That's true for all arguments in formats that don't escape and can contain problematic characters themselves, which includes JSON.

JSON Encode and curly quotes

I've run into an interesting behavior in the native PHP 5 implementation of json_encode(). Apparently when serializing an object to a json string, the encoder will null out any properties that are strings containing "curly" quotes, the kind that would potentially be copy-pasted out of MS Word documents with the auto conversion enabled.
Is this an expected behavior of the function? What can I do to force these kinds of characters to covert to their basic equivalents? I've checked for character encoding mismatches between the database returning the data and the administration page the inserts it and everything is setup correctly - it definitely seems like the encoder just refuses these values because of these characters. Has anyone else encountered this behavior?
EDIT:
To clarify;
MSWord will take standard quotation marks and apostraphes and convert them to more aesthetic "fancy" or "curly" quotes. These characters can cause problems when placed in content managers that have charset mistmatches between their editing interface (in the html) and the database encoding.
That's not the problem here, though. For example, I have a json_object representing a person's profile and the string:
Jim O’Shea
The UTF code for that apostraphe being \u2019
Will come out null in the json object when fetched from database and directly json_encoded.
{"model_name":"Bio","logged":true,"BioID":"17","Name":null,"Body":"Profile stuff!","Image":"","Timestamp":"2011-09-23 11:15:24","CategoryID":"1"}
Never had this specific problem (i.e. with json_encode()) but a simple - albeit a bit ugly - solution I have used in other places is to loop through your data and pass it through this function I got from somewhere (will credit it when I find out where I got it):
function convert_fancy_quotes ($str) {
return str_replace(array(chr(145),chr(146),chr(147),chr(148),chr(151)),array("'","'",'"','"','-'),$str);
}
json_encode has the nasty habit of silently dropping strings that it finds invalid (i.e. non-UTF8) characters in. (See here for background: How to keep json_encode() from dropping strings with invalid characters)
My guess is the curly quotes are in the wrong character set, or get converted along the way. For example, it could be that your database connection is ISO-8859-1 encoded.
Can you clarify where the data comes from in what format?
If I ever need to do that, I first copy the text into Notepad and then copy it from there. Notepad forces it to be normal quotes. Never had to do it through code though...

What is the correct encoding / escaping / htmlentities needed when sending data from js to php, php to mysql, and for REST json responses

This is something I encounter often, and I usually end up resorting to the try and try again method until the data works. I figured SO would know what the best practice is in order to maintain the data and not mess up the json.
Let's assume the data I want to send is text data of the most annoying sort - special characters, &,<,", \n, \n\r, \t, +, etc.
Let's also assume I want to keep everything in utf8, and my mysql table is configured to be utf8. However, since PHP's utf8 support is lacking, this should be considered.
What encoding / escaping / htmlentities should I be doing from:
1) Sending JSON data from client JS to PHP via AJAX POST (anything different for GET?)
2) Decoding data in PHP and storing text string in mysql database (or store the escaped/encoded data? )
3) Retrieving data from MySQL DB in PHP and returned as JSON response to JS AJAX request
4) In a JSON response from our REST api
Whenever I use php/mysql/jquery to pass data back and forth, I end up using the following combination of encodings/escapings, and it seems to work well for me.
1) you don't need to do anything here, UNLESS you are sending a URL (I think this is only for GET requests) - but if you're sending a url you need to use encodeURIComponent(url), which will properly escape the &'s and special characters in the url (see more here).
2) Use mysqli and bound parameters, it will do all the escaping for you (read about it here)
3) I always use this when echoing data into an HTML file :
<?php
htmlspecialchars($string_to_escape, ENT_QUOTES, 'UTF-8', false);
?>
This will properly encode all special characters (the false is for "no double encoding"). Also make sure you the proper UTF-8 meta tags at the top of your html pages.
4) Using json_encode should always escape your data properly, but I would use the code from #3 just to make sure. But you'll probably only need it if you're returning data with special characters in it.
for sending json data to php you don't have to do anything special. JSON is just a serialized javascript 'variable'
use prepared statements! do not try to decode/strip/alter the content with php
use the appropriate functions to escape data for json (I don't know if there's a builtin php function for that)
same as 3.

Categories