I'm using AFNetworking in an iOS project and so far everything has gone OK. Now I have a PHP script that is supposed to get some info and return some JSON. Both the info the script is given and the JSON it is supposed to return contain Latin characters, mainly ã and õ.
The thing is that when I receive the JSON back in my iOS app, the characters come encoded as what I think is NSNonLossyASCIIStringEncoding. I think the encoding is not UTF-8 because back in the app:
[jsonManager GET:myURL parameters:sendingData success:^(AFHTTPRequestOperation *op, id responseObject) {
    NSLog(@"%lu", (unsigned long)op.responseStringEncoding);
    NSLog(@"%lu", (unsigned long)op.responseSerializer.stringEncoding);
    NSLog(@"%@", op.responseString);
    NSLog(@"%@", [[NSString alloc] initWithData:op.responseData encoding:NSNonLossyASCIIStringEncoding]);
} failure:^(AFHTTPRequestOperation *op, NSError *error) {
    NSLog(@"%@", op.responseString);
}];
The last NSLog (in the success case) is the only one that outputs the responseString as it was supposed to be. The third log outputs \u00e3 in place of every ã.
And the first log confirms that the encoding used was NSUTF8StringEncoding.
The second log states that responseSerializer.stringEncoding is NSNonLossyASCIIStringEncoding, because I set it that way before making the request; it made no difference, and I don't know why either...
The really strange thing is that if I invoke the script from a browser, I can see that the output is encoded as UTF-8.
What is wrong here?
Thank you.
It sounds like your server is using different encoding types depending on the client or some header.
NSJSONSerialization strictly implements RFC 4627, which states:
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
JSON is always Unicode-encoded, so my guess is that your server isn't following the spec.
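As an aside, literal \u00e3 sequences in the raw response body are often just PHP's default json_encode() escaping, which is valid JSON that any conforming parser decodes back to ã. A minimal illustration:
echo json_encode(array('name' => 'João'));
// prints {"name":"Jo\u00e3o"}, which a JSON parser decodes back to "João"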
Instead of using your browser, try to replicate the behavior using curl or a Chrome plug-in like Advanced REST Client. One place to start is your server's handling of the Accept, User-Agent and Content-Type headers.
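For instance, a sketch using PHP's curl extension to replicate the request with app-like headers (the URL and header values are hypothetical placeholders):
$ch = curl_init('http://example.com/script.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: application/json',
    'User-Agent: MyiOSApp/1.0', // swap in a browser UA to compare responses
));
$body = curl_exec($ch);
curl_close($ch);
echo bin2hex(substr($body, 0, 32)); // inspect the raw bytes the server actually sent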
Related
I have an old web app that generates XML files in PHP. These XML files are requested by an XMLHttpRequest object (AJAX). Everything worked correctly, but today there was a server upgrade and the web app broke a little.
The problem is that the code contains checks related to the XMLHttpRequests.
1) If I have a response, then I parse it based on its content type.
var contentType = xhr.getResponseHeader("Content-Type");
// build the json object if the response has one
if (contentType == "application/json") {
    response = JSON.parse(xhr.responseText);
}
// get the dom element if the response is XML
else if (contentType == "text/xml") {
    response = xhr.responseXML;
} else { // by default get the response as text
    response = xhr.responseText;
}
And here is the problem, because the server now returns:
text/xml;charset=UTF-8
instead of
text/xml
OK, I can just change this line and the error disappears. But I would like to know why a server upgrade (Bluehost) can have an influence on this.
This is a PHP/MySQL environment.
Both are valid content types. The content type can be set by the web server software (e.g. Apache) or the script (PHP). I'm assuming it's PHP because of the tag on your question.
If you control the script on the server and want to specify the content type, it's easy to do within PHP by adding the line:
header('Content-Type: text/xml');
This must occur before any other output is sent from the script, because headers appear before content in HTTP responses. If the header is not set within the PHP script, then the web server will choose one instead.
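A minimal sketch of the ordering (the XML payload is a made-up example):
// the header must be sent before any byte of the body
header('Content-Type: text/xml; charset=UTF-8');
echo '<?xml version="1.0" encoding="UTF-8"?>';
echo '<response><status>ok</status></response>';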
If you don't control the script that produces the XML, or the server, then you just need to accept that it is common for systems to be upgraded and that this may impact your own application.
Just to add to Steve E's answer, the "charset=UTF-8" portion is specifying a character set.
There is no better explanation of Unicode (UTF-8 is an implementation of Unicode) and character sets than the one on Joel on Software, here (incidentally, Joel also created Stack Overflow). In short, character sets define the set of characters that can be used in text. Unicode, a character set, supports nearly all international languages. UTF-8 specifies how the Unicode character set is implemented in bytes (so with UTF-8, Unicode characters take anywhere from 1 to 4 bytes). When you see garbled text (for example, ?s instead of characters), that is often because the document is not being interpreted with the correct character encoding.
It's actually best practice to include the encoding in the Content-Type header, so I would keep it as "text/xml;charset=UTF-8". Bluehost was likely updating their default settings (i.e. the default content type they send for XML documents), which caused the change. Just as an aside, the terms character set and encoding are sometimes used interchangeably, but when you specify "charset=UTF-8" you are more correctly specifying the encoding (UTF-8 is the encoding, Unicode is the character set).
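A quick way to see the 1-to-4-byte behavior in PHP, a sketch assuming the source file itself is saved as UTF-8:
echo strlen('a');             // 1 byte (ASCII)
echo strlen('º');             // 2 bytes (0xC2 0xBA)
echo strlen('€');             // 3 bytes (0xE2 0x82 0xAC)
echo mb_strlen('€', 'UTF-8'); // 1 character, regardless of byte count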
I'm having some trouble with my $_POST/$_REQUEST data; it appears to still be UTF-8 encoded.
I am sending conventional AJAX POST requests, under these conditions:
- oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
- JS file saved in UTF-8 (no BOM)
- meta tags set up in the HTML <head>
- PHP files saved in UTF-8 (no BOM) as well
- encodeURIComponent is used, but I tried without it and got the same result
OK, so everything seems fine: the database is also in UTF-8 and receives the data this way, and the pages display well.
But when I receive the character "º" for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while "º" hardcoded in PHP (UTF-8...) has the binary representation 10111010 only.
What is going on? I just don't know whether this is a good thing or not... For instance, if I use "#º#" as a delimiter for PHP's explode() function, it won't be detected, and this is actually the problem that led me here.
Any help will, as usual, be greatly appreciated. Thank you so much for your time.
Best regards.
EDIT 1: checking with mb_check_encoding:
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
    raise('$_REQUEST is encoded properly in utf8 at index ' . $i);
} else {
    raise(false);
}
The encoding was confirmed: the message was raised properly.
Single-byte UTF-8 characters do not have bit 7 (the eighth bit) set, so 10111010 is not UTF-8; your file is probably encoded in ISO-8859-1.
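A small sketch showing both representations of "º" (U+00BA):
// ISO-8859-1 stores "º" as the single byte 0xBA;
// UTF-8 stores it as the two bytes 0xC2 0xBA
$iso  = "\xBA";
$utf8 = iconv('ISO-8859-1', 'UTF-8', $iso);
echo bin2hex($iso);  // "ba"
echo bin2hex($utf8); // "c2ba"
So if a literal "º" typed into the script compares as the single byte 0xBA, the file on disk is ISO-8859-1, and explode() ends up comparing bytes from two different encodings.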
We have a PHP server that sends a JSON string in UTF-8 encoding.
I'm responsible for the iPhone app that gets the data.
I want to be sure that everything on my side is correct:
// after I download the data stream:
NSString *content = [[NSString alloc] initWithData:self.m_dataToParse encoding:NSUTF8StringEncoding];
// here the data is shown correctly in the console
NSLog(@"%@", content);
SBJsonParser *_parser = [[SBJsonParser alloc] init];
NSDictionary *jsonContentDictionary = [_parser objectWithData:self.m_dataToParse];
// here, when I print the values of the array in the array, I see \u454 \u545 \u4545 format. any ideas why?
for (id key in jsonContentDictionary) {
    NSLog(@"key:%@, value:%@", key, [jsonContentDictionary objectForKey:key]);
}
I'm using the latest version of the JSON library:
https://github.com/stig/json-framework/
Is the problem on the iPhone side (the JSON parser?) or on the PHP server?
Just to be clear again:
1. On the console, before JSON parsing, the string looks OK.
2. After JSON parsing, the array-in-array values are in the format \u545 \u453 \u545.
Thanks in advance.
Your code is correct.
A possible reason for the issue, which you must investigate with your content provider (the server that sends the JSON to you), is the following. Even if the whole JSON string is correctly encoded as UTF-8 (remember: JSON text is a sequence of characters, so an encoding must be specified), it may happen that some or all of the text content (that is, the values of the individual objects contained in the JSON message) was originally encoded in another format, typically HTML encoding (ISO-8859), especially when particular characters are used (e.g. Cyrillic or Asian ones). The JSON framework decodes all data as UTF-8 by default, but if there is a coding mismatch between the UTF-8 characters and the ISO-8859 ones (just to stay with the example), then the only way to represent them is the \uxxxx format. This happens quite often, especially when PHP scripts extract the info from HTML pages, which are usually encoded using ISO-8859. Consider also that iOS is not able to convert the whole ISO-8859 character set to Unicode (e.g. Cyrillic).
So possible solutions are:
- do a content encoding of the texts server side (ISO-8859 --> UTF-8), as sketched below
- or, if this is not possible, recognize the \uxxxx sequences coming from your content provider and replace them with the corresponding UTF-8 characters.
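A minimal sketch of the first option, assuming the source text really is ISO-8859-1 (the variable names are hypothetical):
// convert a value taken from an ISO-8859-1 source before JSON-encoding it
$value = mb_convert_encoding($value, 'UTF-8', 'ISO-8859-1');
echo json_encode(array('text' => $value));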
I have the nth encoding-related problem with PHP!
So the story is:
I read a URL from a file (ISO-8859). I can't change the encoding of this file, for various reasons I won't discuss here.
I use that URL to make a call to a REST web service.
The URL happens to contain the symbol "è", which is converted to � when it is loaded by the PHP engine.
As a result, the web service returns an unexpected result, because what it gets is actually the word "perch�" instead of "perchè".
I tried to force PHP to work with ISO-8859 by doing:
ini_set('default_charset', "ISO-8859");
The problem is that it still doesn't work, and the web service doesn't answer properly. I am sure the web service works, because when I copied and pasted the URL into a browser by hand, I received the expected data.
You can convert data from one character set into another using iconv().
Your REST web service is most likely expecting UTF-8 data, so you would have to do something like this:
$data = iconv("iso-8859-1", "utf-8", $data);
before sending the request.
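A sketch of the whole flow, under these assumptions: the file name is hypothetical and the file's bytes are ISO-8859-1:
$url = trim(file_get_contents('url.txt'));   // bytes read as ISO-8859-1
$url = iconv('iso-8859-1', 'utf-8', $url);   // "perchè" now survives intact
// if the accented text sits in a query-string value, that value may also
// need rawurlencode() before the request goes out
$result = file_get_contents($url);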
Hey there, I have an Arabic contact script that uses Ajax to retrieve a response from the server after filling the form.
On some Apache servers, jQuery.parseJSON() throws an invalid JSON exception for the same JSON it parses perfectly on other servers. This exception is thrown only on Chrome and IE.
The JSON content gets encoded using PHP's json_encode() function. I tried sending the correct header with the JSON data and setting the charset to UTF-8, but that didn't help.
This is one of the JSON responses I try to parse (I removed the second part because it's long):
{"pageTitle":"\u062e\u0637\u0623 \u0639\u0646\u062f \u0627\u0644\u0625\u0631\u0633\u0627\u0644 !"}
Note: the language of this data is Arabic, which is why it looks like this after being encoded with PHP's json_encode().
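For reference, these \uXXXX sequences are PHP's default escaping and are valid JSON; a minimal sketch:
// json_encode() escapes non-ASCII characters as \uXXXX by default
echo json_encode(array('pageTitle' => 'خطأ'));
// prints {"pageTitle":"\u062e\u0637\u0623"}
// PHP 5.4+ can emit the raw UTF-8 instead:
echo json_encode(array('pageTitle' => 'خطأ'), JSON_UNESCAPED_UNICODE);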
You can make a request in the examples given below and look at the full response data using Firebug or the WebKit developer tools. The response passes JSONLint!
Finally, here are two URLs using the same version of the script; browse them using Chrome or IE to see the error in the broken example.
The working example : http://namodg.com/n/
The broken example: http://www.mt-is.co.cc/my/call-me/
Update: To clarify, I would like to note that I managed to fix this by using the old eval() to parse the content. I released another version with this fix; it looked like this:
// Parse the JSON data
try {
    // Use jQuery's default parser
    data = $.parseJSON(data);
} catch (e) {
    /*
     * Fix a bug where strange unicode chars in the json data make jQuery's
     * parseJSON() throw an error (only on some servers), by using the
     * old eval() - slower though!
     */
    data = eval("(" + data + ")");
}
I still want to know whether this is a bug in jQuery's parseJSON() method, so that I can report it to them.
Found the problem! It was very hard to notice, but I saw something funny about that opening brace... there seemed to be a couple of little dots near it. I used this JavaScript bookmarklet to find out what it was:
javascript:window.location='http://www.google.com/search?q=u+'+('000'+prompt('String?').charCodeAt(prompt('Index?')).toString(16)).slice(-4)
I got the results page. Guess what the problem is! There is an invisible character, repeated twice actually, at the beginning of your output. The zero-width no-break space is also called the Unicode byte order mark (BOM). It is the reason why jQuery is rejecting your otherwise valid JSON, and why pasting the JSON into JSONLint mysteriously works (depending on how you do it).
One way to get this unwanted character into your output is to save your PHP files using Windows Notepad in UTF-8 mode! If this is what you are doing, get another text editor such as Notepad++. Resave all your PHP files without the BOM to fix your problem.
Step 1: Set up Notepad++ to encode files in UTF-8 without BOM by default.
Step 2: Open each existing PHP file, change the Encoding setting, and resave it.
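If re-saving every file right away isn't possible, a stopgap sketch (fixing the files is still the real cure) is to buffer the output and strip any leading BOMs before the response is sent:
// strip one or more leading UTF-8 BOMs (bytes EF BB BF) from the output
ob_start(function ($out) {
    return preg_replace('/^(\xEF\xBB\xBF)+/', '', $out);
});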
You should try using json2.js (it's on https://github.com/douglascrockford/JSON-js)
Even John Resig (the creator of jQuery) says you should:
This version of JSON.js is highly recommended. If you're still using the old version, please please upgrade (this one, undoubtedly, causes fewer issues than the previous one).
http://ejohn.org/blog/the-state-of-json/
I don't see anything related to parseJSON().
The only difference I see is that in the working example a session cookie is set (I guess it is needed for the "captcha", the mathematical calculation), while in the other example no session cookie is set. So maybe the comparison of the calculation result fails without the session cookie.