I have an old web app that generates XML files in PHP. These XML files are requested through an XMLHttpRequest object (AJAX). Everything used to work correctly, but today there was a server upgrade and the web app partly broke.
The problem is that the code contains checks related to XMLHttpRequest.
1) If I have a response, then I parse it based on its content type.
var contentType = xhr.getResponseHeader("Content-Type");
//build the json object if the response has one
if(contentType == "application/json") {
response = JSON.parse(xhr.responseText);
}
//get the dom element if the response is XML
else if(contentType == "text/xml") {
response = xhr.responseXML;
} else { //by default get the response as text
response = xhr.responseText;
}
And here is the problem, because the server now returns:
text/xml;charset=UTF-8
instead of
text/xml
OK, I can just change this line and the error disappears. But I would like to know why a server upgrade (Bluehost) can have an influence on this.
This is PHP/MySQL environment.
Both are valid content types. The content type can be set by the web server software (e.g. Apache) or the script (PHP). I'm assuming it's PHP because of the tag on your question.
If you control the script on the server and want to specify the content type, it's easy to do within PHP by adding the line:
header('Content-Type: text/xml');
This must occur before any other output is sent from the script, because headers come before content in HTTP responses. If the header is not set within the PHP script, then the web server will choose one instead.
If you don't control the script that produces the XML or the server then you just need to accept that it is common for systems to be upgraded and this may impact on your own application.
Just to add to Steve E's answer, the "charset=UTF-8" portion is specifying a character set.
There is no better explanation of Unicode (UTF-8 is an implementation of Unicode) and character sets than the one on Joel on Software, here (incidentally, Joel also created Stack Overflow). In short, character sets define the set of characters that can be used in text. Unicode, a character set, supports nearly all international languages. UTF-8 specifies how the Unicode character set is implemented in bytes (so with UTF-8, Unicode characters take anywhere from 1 to 4 bytes). When you see garbled text (for example ?s instead of characters), that is often because the document is not being interpreted in the correct character encoding.
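To make the "1 to 4 bytes" point concrete, here is a minimal sketch in JavaScript (run under Node or a modern browser). The helper name utf8ByteLength and the sample characters are illustrative choices, not something from the question.

```javascript
// Count how many bytes UTF-8 needs to encode a single character.
// utf8ByteLength is a hypothetical helper name used only for this sketch.
function utf8ByteLength(ch) {
  return new TextEncoder().encode(ch).length;
}

// One sample per encoded length: A, ã, €, and an emoji.
const samples = ["A", "\u00e3", "\u20ac", "\u{1F600}"];
for (const ch of samples) {
  console.log(ch, utf8ByteLength(ch)); // byte counts: 1, 2, 3, 4
}
```

Plain ASCII stays at one byte per character, which is why purely English text looks identical in UTF-8 and in older single-byte encodings.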
It's actually best practice to include the encoding in the Content-Type header, so I would keep it as "text/xml;charset=UTF-8". Bluehost was likely updating their default settings (i.e. the default content type they send for XML documents), which caused the change. Just as an aside, the terms character set and encoding are sometimes used interchangeably, but when you specify "charset=UTF-8" you are more correctly specifying the encoding (UTF-8 is the encoding, Unicode is the character set).
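Since both forms of the header are valid, the client-side check can be made tolerant of the charset parameter instead of comparing the whole header string. The following is only a sketch of that idea: mediaType and parseResponse are hypothetical helper names, and the branching mirrors the code in the question.

```javascript
// Hypothetical helper: return only the media type, dropping parameters
// such as "charset=UTF-8" and normalising case and whitespace.
function mediaType(contentTypeHeader) {
  if (!contentTypeHeader) return "";
  return contentTypeHeader.split(";")[0].trim().toLowerCase();
}

// Same branching as the question's code, but it compares only the media type,
// so "text/xml" and "text/xml;charset=UTF-8" take the same branch.
function parseResponse(xhr) {
  const type = mediaType(xhr.getResponseHeader("Content-Type"));
  if (type === "application/json") {
    return JSON.parse(xhr.responseText);
  }
  if (type === "text/xml") {
    return xhr.responseXML;
  }
  return xhr.responseText; // by default, return the response as text
}
```

With this shape, a future change in the server's default charset parameter should not affect which branch is taken.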
Related
I'm using AFNetworking in an iOS project and so far everything has gone OK. Now I have a PHP script that is supposed to get some info and return some JSON. Both the info the script is given and the JSON it is supposed to return contain Latin characters, mainly ã and õ.
The thing is that when I receive the JSON back in my iOS app, the characters come encoded as what I think is NSNonLossyASCIIStringEncoding. I think the encoding is not UTF-8 because, back in the app:
[jsonManager GET:myURL parameters:sendingData success:^(AFHTTPRequestOperation *op, id responseObject) {
// NSStringEncoding is an unsigned integer type, so cast it for the format string
NSLog(@"%lu", (unsigned long)op.responseStringEncoding);
NSLog(@"%lu", (unsigned long)op.responseSerializer.stringEncoding);
NSLog(@"%@", op.responseString);
NSLog(@"%@", [[NSString alloc] initWithData:op.responseData encoding:NSNonLossyASCIIStringEncoding]);
} failure:^(AFHTTPRequestOperation *op, NSError *error) {
NSLog(@"%@", op.responseString);
}];
The last NSLog (in the success case) is the only one that outputs the response string as it was supposed to be. The third log outputs \u00e3 in place of every ã.
The first log confirms that the encoding used was NSUTF8StringEncoding.
The second log states that responseSerializer.stringEncoding is NSNonLossyASCIIStringEncoding because I set it that way before making the request. It made no difference, and I don't know why either...
The really strange thing is that if I invoke the script from a browser, I can see that the output is encoded as UTF-8.
What is wrong here?
Thank You.
It sounds like your server is using different encoding types depending on the client or some header.
NSJSONSerialization strictly implements RFC 4627, which states:
JSON text SHALL be encoded in Unicode. The default encoding is
UTF-8.
JSON is always Unicode-encoded, so my guess is that your server isn't following the spec.
Instead of using your browser, try to replicate the behavior using CURL, or a Chrome plug-in like Advanced REST Client. One place to start is your server's parsing of the Accept, User-Agent and Content-Type headers.
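One more thing that may be worth checking while replicating the request: a \u00e3 sequence in the raw response text is a legal JSON escape, and a JSON parser turns it back into ã. PHP's json_encode emits such escapes by default, so the escaped form in responseString is not by itself proof of a wrong encoding. The sketch below illustrates this in JavaScript with a made-up payload.

```javascript
// Made-up payload for illustration: the raw JSON text contains the six
// characters \u00e3 rather than a literal ã.
const raw = '{"city":"S\\u00e3o Paulo"}';

// The raw text is pure ASCII...
console.log(/^[\x00-\x7f]*$/.test(raw)); // true

// ...but parsing it yields the real character.
const parsed = JSON.parse(raw);
console.log(parsed.city === "S\u00e3o Paulo"); // true
```

If the parsed object (responseObject in the question's code) already contains the right characters, only the raw string logging looks odd.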
I am currently moving blog posts from WordPress to Drupal. However, after moving them,
some of the text is not being displayed correctly.
WordPress is displaying:
When it hasn’t (html code is <h2>When it hasn’t</h2>)
Drupal is displaying:
When it hasnâ€™t (html code is <h2>When it hasn’t</h2>)
In the WordPress and Drupal databases the value is correct. The source is the same.
<h2>When it hasn’t</h2>
I did a search and found many options. None of them helped.
Below are the ones I have done and checked.
1) I double-checked that UTF-8 is the character encoding in Drupal and WordPress.
I also made a simple test.php file to check that nothing else was getting in the way,
and it still did not display correctly.
2) I made sure that UTF-8 is used when we take a mysqldump and upload it to
Drupal.
3) I also made sure the .php file is saved as UTF-8.
4) I changed the encoding in Chrome to every available option and none of them
displayed it correctly.
5) I also used PHP functions to recode it, but they did not work.
$value2="<h2>When it hasn’t</h2>";
$out = recode_string('..utf-8', $value2);
//output - When it hasnt
$out2= mb_convert_encoding($value2,'UTF-8', "UTF-8");
// output - When it hasn’t
$out3= @iconv('UTF-8', 'utf-8', $value2);
// output - When it hasn’t
I have run out of options and I am stuck. Please help.
You say the text in both databases is correct, but this doesn't mean much: to view the content of a record you must use some client, and several transformations may happen on the way to rendering the text so you can read it.
So only two things matter:
the encoding of the column
the encoding of the HTML page returned by Drupal
Since your page outputs â€™ (the bytes 0xE2 0x80 0x99 read as CP1252) for ’ (Unicode U+2019, whose UTF-8 encoding is 0xE2 0x80 0x99), I guess the column is indeed UTF-8; however, something between the database and the browser thinks the text is CP1252. This is what you have to check:
If using MySQL, the connection encoding must be UTF-8 so that what you have in your PHP script is UTF-8 text. You can use SET NAMES 'utf8' (MySQL spells the charset name utf8 or utf8mb4, not UTF-8). Note that if you don't need the full Unicode set, you can even use CP1252: the only important thing is that you know the encoding, since PHP strings are just byte arrays.
Explicitly define the response encoding in the HTTP Content-Type header. I mean, configure Drupal to call header('Content-Type: text/html; charset=utf-8');
If the HTTP response encoding is different from the one used for the text retrieved from the database, transcode the query result accordingly.
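The mismatch described above can be reproduced in a few lines. This is a sketch in JavaScript (Node) rather than PHP; the cp1252 lookup table is deliberately limited to the three bytes in this example and is an assumption made for illustration, not a complete CP1252 decoder.

```javascript
// UTF-8 bytes of the right single quotation mark U+2019.
const bytes = Buffer.from("\u2019", "utf8");
console.log(bytes.toString("hex")); // e28099

// Partial CP1252 table covering only the three bytes above:
// 0xE2 is â, 0x80 is the euro sign, 0x99 is the trademark sign.
const cp1252 = { 0xe2: "\u00e2", 0x80: "\u20ac", 0x99: "\u2122" };

// Reading the same three bytes as CP1252 gives three characters instead of one.
const misread = Array.from(bytes, (b) => cp1252[b]).join("");
console.log(misread);

// Reading them as UTF-8 gives the original character back.
console.log(bytes.toString("utf8") === "\u2019"); // true
```

The bytes never change; only the decoder's assumption does, which is why the database content can be correct while the page looks wrong.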
How do I set the charset to UTF-8 for a received HTTP variable in PHP?
I have an HTML form using the POST method with one input field. But when I submit the form and echo the contents of the input field via $_POST['input_name'], I get this: KrkiÄ - but I entered (and I need) this: Krkič
So how can I fix this?
I figured it out now. :)
If I want to add the contents to MySQL, then I need to add this:
if(!$mysqli->set_charset("utf8")){
printf("Error loading character set utf8: %s\n",$mysqli->error);
}
If I just need to echo the contents, then adding this meta tag
<meta charset="utf-8">
to the HTML head is enough.
There is no global default charset in PHP -- lots of things are encoding-aware, and each needs to be configured independently.
mb_internal_encoding applies only to the multibyte string family of functions, so it has an effect only if you are already using them (you need to do so most of the time that you operate on multibyte text from PHP code).
Other places where an incorrectly set encoding will give you problems include:
The source file itself (saved on the disk using which encoding?)
The HTTP headers sent to the browser (display the content received as which encoding?)
Your database connection (which encoding should be used to interpret your queries? which encoding for the results sent back to you?)
Each of these needs to be addressed independently, and most of the time they also need to agree among themselves.
Therefore, it is not enough to say "I want to display some characters". You also need to show how you are displaying them, where they are coming from and what the advertised encoding is for your HTML.
You can use:
<meta charset="UTF-8" />
At the top of your PHP file, place this:
header('Content-Type: text/html; charset="UTF-8"');
I'm having some trouble with my $_POST/$_REQUEST data: it appears to still be UTF-8 encoded.
I am sending conventional ajax post requests, in these conditions:
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
JS file saved in UTF-8 without BOM
meta tags set up in the HTML <head>
PHP files saved in UTF-8 without BOM as well
encodeURIComponent is used, but I tried without it and got the same result
OK, so everything is fine: the database is also in UTF-8 and receives the data that way, and pages display well.
But when I receive the character "º", for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while the binary representation of "º" hardcoded in PHP (utf8...) is 10111010 only.
wtf? I just don't know whether this is a good thing or not... for instance, if I use "#º#" as the delimiter for PHP's explode function, it is not detected, and this is actually the problem that led me here.
Any help will be as usual greatly appreciated, thank you so much for your time.
Best rgds.
EDIT1: checking against mb_check_encoding
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
raise('$_REQUEST is encoded properly in utf8 at index ' . $i);
} else {
raise(false);
}
The encoding was confirmed: the message was raised as expected.
Single-byte UTF-8 characters never have bit 7 (the eighth bit) set, so the lone byte 10111010 is not valid UTF-8; your file is probably encoded in ISO-8859-1.
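The two bit patterns from the question can be reproduced by encoding "º" (U+00BA) both ways. This is a sketch in JavaScript (Node) rather than PHP; toBinary is a hypothetical helper and the "a#º#b" input is made up to show why the delimiter is not found.

```javascript
// Hypothetical helper: print each byte of a buffer as 8 binary digits.
function toBinary(buf) {
  return Array.from(buf, (b) => b.toString(2).padStart(8, "0")).join(" ");
}

const asUtf8 = Buffer.from("\u00ba", "utf8");     // what arrives in $_POST
const asLatin1 = Buffer.from("\u00ba", "latin1"); // what an ISO-8859-1 source file holds

console.log(toBinary(asUtf8));   // 11000010 10111010
console.log(toBinary(asLatin1)); // 10111010

// A delimiter stored as ISO-8859-1 bytes cannot match UTF-8 input bytes.
const input = Buffer.from("a#\u00ba#b", "utf8");
const delimiter = Buffer.from("#\u00ba#", "latin1");
console.log(input.indexOf(delimiter)); // -1 (not found)
```

Saving the PHP source file as UTF-8 makes the hardcoded delimiter use the same two bytes as the incoming data.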
The setup is a text input field in an HTML form, autocompleted using jQuery.autocomplete, which fetches the appropriate server response (e.g. a JSON list of city names). The whole package works well... except that the client does not get data back from the server when typing accented characters (éèà..). Like many others, it looks like I'm facing a character encoding issue, but I cannot figure out where it is or how to solve it despite many attempts (iconv, utf8_encode, urldecode...) and readings like this one for example.
Therefore I need some help/hints to understand where to act (before prototyping jQuery autocomplete code...?)
EDIT: it might also be a jQuery accent-folding issue; I'll try that route as well.
Configuration:
server: Apache2.2 (debian lenny)
php : compiled 5.3.3 (so the option JSON_UNESCAPED_UNICODE is not available for json_encode)
mysql: 5.1.49 with MySQL charset: UTF-8 Unicode (utf8),
class: using a modified PFBC2.x version for the php form building
meta
The website is mostly for French users, so it's all designed with ISO-8859-1 (a bad initial choice, I guess):
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
jQuery autocomplete code (applied to the city input field)
// DEBUG Testing (tested w/ and w/o the $charset_attr: no change)
$charset_attr = 'contentType: "application/x-www-form-urlencoded;charset=ISO-8859-1"';
echo 'jQuery("#' . $this->attributes["id"] . '").autocomplete({source:"' , $this->xhr_path . '", minLength:2, ' . $charset_attr .'});';
The generated code for that input field matches the above expectation.
Converting MySQL rows into UTF-8 using this function:
I convert the array returned by MySQL into UTF-8 before sending the JSON back to the client. I also wrote and tested other functions, but this does not change anything, so I guess the problem is not there.
$encoded_arr = utf8json($returnData);
echo json_encode($encoded_arr);
flush();
Encoding control 1 (client side)
A check embedded in the HTML form to see which character encoding is actually passed to jQuery.autocomplete:
jQuery(document).ready(function() {
<?php
$test_str ="foobar";
$check_encoding = "'" . mb_detect_encoding($test_str) . "'";
?>
alert('Check charset server encoding: ' + <?php echo $check_encoding;?> ); // output : ASCII
});
Encoding control 2 (server side)
$inputData = isset($_GET['term']) ? htmlspecialchars($_GET['term'], ENT_COMPAT, 'UTF-8') : NULL;
$encoding_get = mb_detect_encoding($_GET['term']);
$encoding_data = mb_detect_encoding($inputData);
$utf8converted = @iconv(strtolower($encoding_get), 'utf-8', $inputData);
$checkconversion = mb_detect_encoding($utf8converted);
Sending lowercase unaccented characters (ea..), everything is detected as ASCII.
Sending lowercase accented characters (éèà..), everything is detected as UTF-8.
So I'm lost: the server receives the proper character string and produces a JSON response (tested without AJAX), but it looks like the client does not receive or interpret it properly.
For those facing the same kind of ...%$# issue, here is what I did to solve my case:
Checked the character encoding at each node (e.g. client, Apache server, MySQL server), using mb_detect_encoding on the server side,
Finally pinned down the node where the problem was: in my case, UTF-8 characters were being passed to the MySQL server instead of Latin-1 (ISO-8859-1), so the MySQL server did not return the expected answers. I could not detect or debug this by POSTing data directly to the server script by URL, so I had to log the input and output to a file, checking the character encoding of the input and of the MySQL server output.
Changed the AJAX request to POST instead of GET,
Solved by converting the $_POST data to ISO-8859-1 before sending the request to the MySQL server, using mb_convert_encoding, as described here.
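The conversion step in the last point can be pictured at the byte level. This is a sketch in JavaScript (Node) of what a UTF-8 to ISO-8859-1 conversion does, not the PHP code from this answer; utf8ToLatin1 is a hypothetical helper and "évry" is a made-up search term.

```javascript
// Hypothetical helper: re-encode UTF-8 bytes as ISO-8859-1 bytes.
// Characters above U+00FF have no ISO-8859-1 byte, so they are rejected
// here instead of being silently mangled.
function utf8ToLatin1(utf8Bytes) {
  const text = utf8Bytes.toString("utf8");
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xff) {
      throw new RangeError("Character not representable in ISO-8859-1: " + ch);
    }
  }
  return Buffer.from(text, "latin1");
}

// Made-up search term "évry" as it would arrive from a UTF-8 AJAX request.
const term = Buffer.from("\u00e9vry", "utf8");
console.log(term.toString("hex"));               // c3a9767279
console.log(utf8ToLatin1(term).toString("hex")); // e9767279
```

The accented letter shrinks from two bytes to one, which is the form a Latin-1 database connection expects; the longer-term alternative is to move every layer to UTF-8 so that no conversion is needed.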