Special characters (ë) in JSON-response - php

My database stores some texts which I have to fetch with AJAX. This works, but only when the text does not contain special characters such as ë or ä. I found some articles on this topic that told me to change the charset of the AJAX request, but none of them worked for me.
When I opened Firebug, it showed these headers:
Antwoordheaders (Dutch for response headers)
Cache-Control no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection close
Content-Length 94
Content-Type text/html; charset=ISO-8859-15
Date Wed, 26 Sep 2012 09:52:56 GMT
Expires Thu, 19 Nov 1981 08:52:00 GMT
Pragma no-cache
Server Apache
X-Powered-By PleskLin
Verzoekheaders (Dutch for request headers)
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language nl,en-us;q=0.7,en;q=0.3
Authorization Basic c3BvdGlkczp6SkBVajRrcw==
Connection keep-alive
Content-Type text/html; charset=ISO-8859-15
Cookie __utma=196329838.697518114.1346065716.1346065716.1346065716.1; __utmz=196329838.1346065716.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); PHPSESSID=2h4vu8gu9v8fe5l1t3ad5agp86
DNT 1
Host www.spotids.com
Referer http://www.spotids.com/private/?p=16
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Both sets of headers specify charset=ISO-8859-15, which should include characters like ë, but it doesn't work for me.
The code I used for this (PHP):
`$newresult = mysql_query($query2);
$result = array();
while( $row = mysql_fetch_array($newresult))
{
array_push($result, $row);
}
$jsonText = json_encode($result);
echo $jsonText;`

Make sure you set the headers to UTF-8:
header('Content-Type: application/json; charset=utf-8');
Make sure your connection to database is made with UTF-8 encoding before any queries:
$query = mysql_query("SET NAMES 'UTF8'");
As far as I know, json_encode escapes any characters that cannot be represented in pure ASCII, so you should decode the JSON when you receive the response.
Also try to move to PDO, as the mysql_* functions are deprecated.

From the JSON RFC 4627: JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Use mb_convert_encoding or iconv to change string encoding.
And send correct header:
header('Content-Type: application/json;charset=utf-8');
echo json_encode($data);
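To make the conversion step concrete, here is a minimal sketch, assuming the rows come back from the database as ISO-8859-15 (as the response header in the question suggests). The helper name rows_to_json and the sample row are hypothetical; mb_convert_encoding and json_encode are the standard PHP functions.

```php
<?php
// Hypothetical helper: convert every string value to UTF-8, then encode as JSON.
// Assumes the source data is ISO-8859-15; pass another encoding if yours differs.
function rows_to_json(array $rows, string $sourceEncoding = 'ISO-8859-15'): string
{
    array_walk_recursive($rows, function (&$value) use ($sourceEncoding) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, 'UTF-8', $sourceEncoding);
        }
    });

    $json = json_encode($rows);
    if ($json === false) {
        // json_encode() fails on strings that are not valid UTF-8.
        throw new RuntimeException(json_last_error_msg());
    }
    return $json;
}

// "\xEB" is the single byte for "ë" in ISO-8859-15 (sample data, not from the question).
$json = rows_to_json([['name' => "Zo\xEB"]]);
```

In the response script you would send the Content-Type header shown above and then echo the returned string. If the connection already delivers UTF-8 (for example after SET NAMES), the conversion step should be skipped, otherwise the text is encoded twice.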

Verify the Content-Type meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Related

GuzzleHttp request sends garbled characters

I use GuzzleHTTP 6.0 to get data from the API server. For some reason the requests the API server receives are not UTF-8 encoded: the characters ü, ö, ä and ß arrive garbled.
My default system and database are UTF-8 encoded.
I set debug to true in the RequestOptions; this is the output:
User-Agent: GuzzleHttp/6.2.1 curl/7.47.0 PHP/7.0.22-0ubunut0.16.04.1
Content-type: text/xml;charset="UTF-8"
Accept: text/xml"
Cache-Control: no-cache
Content-Length: 2175
* upload completely sent off: 2175 out of 2175 bytes
< HTTP/1.1 200 OK
< Server:Apache:Coyote/1.1
< Content-Type: text/xml; charset=utf-8
< Transfer-Encoding: chunked
< Date: Thu, 23 Nov 2017 9:34:12 GMT
* Connection #5 to host www.abcdef.com left intact
I have explicitly set the header contents to UTF-8:
$headers = array(
'Content-type' => 'text/xml;charset="utf-8"',
'Accept' => 'text/xml',
'Content-length' => strlen($requestBody),
);
I also tested the body with mb_detect_encoding():
mb_detect_encoding($requestBody,'UTF-8',true); // returns UTF-8
Any further ideas on how to debug this issue?
Content-Length must contain the number of bytes, not the number of characters. That could be the reason if you use mbstring.func_overload, which makes strlen() count characters instead of bytes. Try omitting this header; Guzzle will then set it correctly for you.
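The difference between the two counts can be seen in a short sketch. The sample body below is made up for illustration; the '8bit' encoding of mb_strlen counts raw bytes regardless of whether strlen() is overloaded.

```php
<?php
// Sample body (an assumption for illustration): "ü" takes two bytes in UTF-8.
$requestBody = "<name>M\u{FC}ller</name>";

$characters = mb_strlen($requestBody, 'UTF-8'); // counts code points
$bytes      = mb_strlen($requestBody, '8bit');  // counts bytes

// If the header is set by hand at all, it has to carry the byte count.
$headers = [
    'Content-Type'   => 'text/xml; charset=utf-8',
    'Accept'         => 'text/xml',
    'Content-Length' => (string) $bytes,
];
```

A server that receives a Content-Length based on the character count would read one byte too few for every two-byte character, which is consistent with a truncated or garbled body.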

Encoding accents csv - PHP

I've been looking through all the answers here and I haven't found the solution.
Here's what I got:
MySQL :
Database & Table encoding => utf8_unicode_ci
I'm trying to convert an array (containing rows from a query) to CSV;
however, when I open the CSV I get this
Prénom
instead of
Prénom
here's my code
$allQueryRows = array();
while($row_query = $stmt_select->fetch(PDO::FETCH_ASSOC)){
$row_query = array_map("utf8_encode", $row_query);
array_push($allQueryRows, $row_query);
}
download_send_headers("csv" . date("Y-m-d") . ".csv");
echo array2csv($allQueryRows);
die();
function array2csv(array &$array)
{
if (count($array) == 0) {
return null;
}
ob_start();
$df = fopen("php://output", 'w');
fputcsv($df, array_keys(reset($array)));
foreach ($array as $row) {
fputcsv($df, $row);
}
fclose($df);
return ob_get_clean();
}
function download_send_headers($filename) {
// disable caching
$now = gmdate("D, d M Y H:i:s");
header("Expires: Tue, 03 Jul 2001 06:00:00 GMT");
header("Cache-Control: max-age=0, no-cache, must-revalidate, proxy-revalidate");
header("Last-Modified: {$now} GMT");
// force download
header("Content-Type: application/force-download");
header("Content-Type: application/octet-stream");
header("Content-Type: application/download");
// disposition / encoding on response body
header("Content-Disposition: attachment;filename={$filename}");
header("Content-Transfer-Encoding: binary");
}
You should send a SET NAMES utf8; MySQL query before your SELECT query instead of array_mapping your data after.
Then in HTTP headers, send
Content-Type: text/csv; charset=utf-8;
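A minimal sketch of the export without the utf8_encode step, assuming the connection already returns UTF-8 rows. The helper name rows_to_csv and the sample row are hypothetical; it writes to php://temp instead of using output buffering.

```php
<?php
// Hypothetical helper: build CSV text from rows that are already UTF-8.
function rows_to_csv(array $rows): string
{
    if ($rows === []) {
        return '';
    }
    $handle = fopen('php://temp', 'r+');
    // Header line taken from the keys of the first row.
    fputcsv($handle, array_keys(reset($rows)), ',', '"', '\\');
    foreach ($rows as $row) {
        fputcsv($handle, $row, ',', '"', '\\');
    }
    rewind($handle);
    $csv = stream_get_contents($handle);
    fclose($handle);
    return $csv;
}

// Sample row (not from the question): key "Prénom", value "Zoë", both UTF-8.
$csv = rows_to_csv([["Pr\u{E9}nom" => "Zo\u{EB}"]]);
```

In the download script, the Content-Type header above would be sent before echoing the result of rows_to_csv($allQueryRows).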
At first blush it looks like a UTF-16 / UTF-8 issue. Here's how to start diagnosing it:
First, when you say, "however when i open the csv I get this", what do you mean by "open", i.e. open with what?
I would suggest looking at the file in hex to see exactly what is in it. Either the program you open it with (your editor or whatever) is displaying correct data wrongly, or the underlying data in the file is itself wrong. I think you need to sort out that question first.
(Tip: in general this is the old process of divide and conquer: find a way to test somewhere in the middle of your problem to see which half of your system is causing it. The quickest results come from picking test points about halfway through the complexity, not near an edge of the problem, i.e. a binary search for the bug. It might not find the problem in the first iteration, but it will help narrow it down.)
Also, perhaps you need to tell MySQL which character set to use, e.g. $connection->set_charset("utf8");
Or perhaps what you are seeing is displayed differently from what you expect because of a UTF-8/UTF-16 mix-up at the display level. I generally stay with UTF-8 and so set Content-Type: text/plain; charset=UTF-8; (Also, if you are viewing this file in your editor, make sure it's set to the correct character set.)
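As a sketch of the hex check: the snippet below compares the bytes of a correctly encoded "Prénom" with the bytes produced when UTF-8 text is converted from ISO-8859-1 to UTF-8 a second time, which is what utf8_encode does. That double encoding is only one possible cause, assumed here because it reproduces the output in the question.

```php
<?php
$correct = "Pr\u{E9}nom"; // "Prénom" as UTF-8

// Treat the UTF-8 bytes as ISO-8859-1 and convert to UTF-8 again:
// every byte of the two-byte "é" becomes its own character.
$doubled = mb_convert_encoding($correct, 'UTF-8', 'ISO-8859-1');

$correctHex = bin2hex($correct); // c3a9 in the middle: a single "é"
$doubledHex = bin2hex($doubled); // c383c2a9 in the middle: "Ã" followed by "©"
```

If the downloaded file contains c383c2a9 where é should be, the data itself is wrong; if it contains c3a9, the data is fine and the viewer is reading it in the wrong encoding.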

How to determine if a string was compressed?

How can I determine whether a string was compressed with gzcompress (apart from comparing the sizes of the string before and after calling gzuncompress, or would that be the proper way of doing it)?
PRE: I guess, if you send a request, you can immediately look into $http_response_header to see if one of the items in the array is a variation of Content-Encoding: gzip. But this is not ideal!
There is a far better method.
Here is HOW TO...
Check if it's GZIP. Like a BOSS!
According to the GZIP RFC,
the header of gzip content looks like this:
+---+---+---+---+---+---+---+---+---+---+
|ID1|ID2|CM |FLG| MTIME |XFL|OS | (more-->)
+---+---+---+---+---+---+---+---+---+---+
the ID1 and ID2 identify the content as GZIP. And CM states that the ZLIB_ENCODING (the compression method) is ZLIB_ENCODING_DEFLATE - which is customarily used by GZIP with all web-servers.
oh! and they have fixed values:
The value of ID1 is "\x1f"
The value of ID2 is "\x8b"
The value of CM is "\x08" (or just 8...)
almost there:
`$is_gzip = 0 === mb_strpos($mystery_string , "\x1f" . "\x8b" . "\x08");`
Working example
<?php
/** @link https://gist.github.com/eladkarako/d8f3addf4e3be92bae96#file-checking_gzip_like_a_boss-php */
date_default_timezone_set("Asia/Jerusalem");
while (ob_get_level() > 0) ob_end_flush();
mb_language("uni");
@mb_internal_encoding('UTF-8');
setlocale(LC_ALL, 'en_US.UTF-8');
header('Time-Zone: Asia/Jerusalem');
header('Charset: UTF-8');
header('Content-Encoding: UTF-8');
header('Content-Type: text/plain; charset=UTF-8');
header('Access-Control-Allow-Origin: *');
function get($url, $cookie = '') {
$html = @file_get_contents($url, false, stream_context_create([
'http' => [
'method' => "GET",
'header' => implode("\r\n", [''
, 'Pragma: no-cache'
, 'Cache-Control: no-cache'
, 'User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2310.0 Safari/537.36'
, 'DNT: 1'
, 'Accept-Language: en-US,en;q=0.8'
, 'Accept: text/plain'
, 'X-Forwarded-For: ' . implode(', ', array_unique(array_filter(array_map(function ($item) { return filter_input(INPUT_SERVER, $item, FILTER_SANITIZE_SPECIAL_CHARS); }, ['HTTP_X_FORWARDED_FOR', 'REMOTE_ADDR', 'HTTP_CLIENT_IP', 'SERVER_ADDR', 'REMOTE_ADDR']), function ($item) { return null !== $item; })))
, 'Referer: http://eladkarako.com'
, 'Connection: close'
, 'Cookie: ' . $cookie
, 'Accept-Encoding: gzip'
])
]]));
$is_gzip = 0 === mb_strpos($html, "\x1f" . "\x8b" . "\x08", 0, "US-ASCII");
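// zlib_decode() detects the gzip wrapper itself; its second parameter is a maximum output length, not an encoding constant
return $is_gzip ? zlib_decode($html) : $html;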
}
$html = get('http://www.pogdesign.co.uk/cat/');
echo $html;
What do we see here that is worth mentioning?
We start by initializing the PHP engine to use UTF-8 (since we don't really know whether the web server will return GZIP content).
Providing the header Accept-Encoding: gzip tells the web server that it may output GZIP content.
Discovering GZIP content (you should use the multi-byte functions with ASCII encoding).
Finally, returning the plain output is easy using the ZLIB methods.
A string and a compressed string are both simply sequences of bytes. You cannot really distinguish one sequence of bytes from another sequence of bytes. You should know whether a blob of bytes represents a compressed format or not from accompanying metadata.
If you really need to guess programmatically, you have several things you can try:
Try to uncompress the string and see if the uncompress operation succeeds. If it fails, the bytes probably did not represent a compressed string.
Try to check for obvious "weird" bytes like anything below 0x20. Those bytes aren't typically used in regular text. There's no real guarantee that they occur in a compressed string though.
Use mb_check_encoding to see whether a string is valid in the encoding you suspect it to be in. If it isn't, it's probably compressed (or you checked for the wrong encoding). With the caveat that virtually any byte sequence is valid in virtually every single-byte encoding, so this'll only work for multi-byte encodings.
This works fine for me:
if (@gzuncompress($_xml) !== false) {
// compressed string
}
You can simply try gzuncompress() on the data as noted by @DiDiegodaFonseca. If it fails, it was not made by gzcompress(), or it was not faithfully transmitted.
If you really want to, you can check the first two bytes for a zlib header (not a gzip header, as incorrectly suggested in the accepted answer). gzcompress() produces a zlib stream, not a gzip stream. gzencode() is what produces a gzip stream. gzdeflate() produces a raw deflate stream.
RFC 1950 describes the zlib header. It is two bytes, where the two bytes taken as a big-endian 16-bit unsigned integer must be a multiple of 31. In addition to checking that, you can check that the low four bits of the first byte is 8 (1000), and that the high bit is zero.
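The header checks described above can be sketched as follows. The function name looks_like_zlib is hypothetical, and passing the check only means the first two bytes are plausible; actually calling gzuncompress() remains the reliable test.

```php
<?php
// Hypothetical helper: does $data start with a plausible zlib (RFC 1950) header?
function looks_like_zlib(string $data): bool
{
    if (strlen($data) < 2) {
        return false;
    }
    $cmf = ord($data[0]); // first byte: compression method and info
    $flg = ord($data[1]); // second byte: flags, including the check bits

    return ($cmf & 0x0F) === 8              // low four bits of the first byte are 8 (deflate)
        && ($cmf & 0x80) === 0              // high bit of the first byte is zero
        && (($cmf << 8) | $flg) % 31 === 0; // big-endian 16-bit value is a multiple of 31
}

$zlibHeader = looks_like_zlib("\x78\x9c");     // a common zlib header
$gzipHeader = looks_like_zlib("\x1f\x8b\x08"); // gzip magic bytes, not zlib
$plainText  = looks_like_zlib("hello");
```

Plain text can still pass by coincidence, since only about two bytes are inspected, so this is best used as a cheap pre-check before attempting decompression.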

unicode is wrong when get file from server

I want to download this link from Google and save it as a txt file with PHP.
When I do it in the browser, the Unicode is correct and everything is fine, but when I do it with curl or file_get_contents, the result contains wrong characters.
What is the difference, and how should I solve it?
Downloaded by browser:
[[["سلام","hello","",""]],[["interjection",["سلام","هالو","الو"],[["سلام",["hello","hi","aloha","all hail"]],["هالو",["hallo","hello","halloo"]],["الو",["hello"]]]]],"en",,[["سلام",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["سلام",1000,0,0],["خوش",0,0,0],["میهمان گرامی",0,0,0],["خوش آمدید",0,0,0],["درود کاربر",0,0,0]],[[0,5]],"hello"]],,,[["en"]],65]
Downloaded by the following PHP script:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<?php
$t = file_get_contents("http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello");
$f = fopen("t.txt", "w+");
fwrite($f, $t);
fclose($f);
?>
</body></html>
[[["ÓáÇã","hello","",""]],[["interjection",["ÓáÇã","åÇáæ","Çáæ"],[["ÓáÇã",["hello","hi","aloha","all hail"]],["åÇáæ",["hallo","hello","halloo"]],["Çáæ",["hello"]]]]],"en",,[["ÓáÇã",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["ÓáÇã",1000,0,0],["ÎæÔ",0,0,0],["ã\u06CCåãÇä ÑÇã\u06CC",0,0,0],["ÎæÔ ÂãÏ\u06CCÏ",0,0,0],["ÏÑæÏ ÇÑÈÑ",0,0,0]],[[0,5]],"hello"]],,,[["en"]],4]
The headers are:
HTTP/1.1 200 OK
Pragma: no-cache
Date: Fri, 25 May 2012 22:29:12 GMT
Expires: Fri, 25 May 2012 22:29:12 GMT
Cache-Control: private, max-age=600
Content-Type: text/javascript; charset=UTF-8
Content-Language: fa
Set-Cookie: PREF=ID=b6c08a0545f50594:TM=1337984952:LM=1337984952:S=Sf1xcow2qPZrFeu0; expires=Sun, 25-May-2014 22:29:12 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Content-Disposition: attachment
Server: HTTP server (unknown)
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
Add the parameters ie=UTF-8 and oe=UTF-8 to the query string of the URL:
$t = file_get_contents("http://translate.google.com/translate_a/t?ie=UTF-8&oe=UTF-8&client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello");
This worked for me once, just as I was about to throw lots of code in the garbage! Maybe it will help you too:
iconv( 'CP1252', 'UTF-8', $string);
Echoing what you get from file_get_contents into the PHP output should work fine, as you are going from a UTF-8 JSON response to a UTF-8 HTML response. Works for me off the given URL.
When you store to a file, you then have to worry about what encoding the tools you are using to read the file are working in. Just fwriteing is fine as long as the text editor you view it in knows the output is UTF-8. On Windows, Notepad may instead try to read it in the locale-dependent default ('ANSI') codepage, which won't be UTF-8. On a Western European install it'd be code page 1252 and you'd get output like Ø³Ù„Ø§Ù… for سلام.
(One way around that is to put a UTF-8 fake-BOM at the front of the file with fwrite($f, "\xef\xbb\xbf");. This is a bit dodgy because UTF-8 doesn't need a Byte Order Mark (its byte order is fixed) and it breaks UTF-8's ASCII-compatibility, but Windows tools like fake-BOMs. The other way around it is to get a better text editor that allows you to default to handling files as UTF-8.)
You've got something slightly different here, as ÓáÇã is what you get when you save سلام in the Windows default Arabic encoding (code page 1256) and then read it in the Windows default Western encoding (code page 1252). This implies there's some kind of extra store-and-load step involved in your testing, that's messing up the encoding.
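Both kinds of garbling can be reproduced with iconv. This is a minimal sketch, assuming the iconv extension is available and knows the CP1252 and CP1256 code pages; the helper name misread is hypothetical.

```php
<?php
// Hypothetical helper: encode UTF-8 text as $storedAs, then misread those bytes as $readAs.
function misread(string $utf8Text, string $storedAs, string $readAs): string
{
    $bytes = iconv('UTF-8', $storedAs, $utf8Text);
    return iconv($readAs, 'UTF-8', $bytes);
}

$salam = "\u{0633}\u{0644}\u{0627}\u{0645}"; // سلام

// UTF-8 bytes read as Western code page 1252 (the Notepad case).
$utf8AsWestern = misread($salam, 'UTF-8', 'CP1252');

// Arabic code page 1256 bytes read as code page 1252 (the output in the question).
$arabicAsWestern = misread($salam, 'CP1256', 'CP1252');
```

Comparing a garbled sample against both results is a quick way to tell which pair of encodings was mixed up somewhere along the way.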
If it's anything to do with Windows command line tools you might as well give up, because the Command Prompt and MSVCRT apps don't really play well with Unicode at all.

what could be an easiest way of parsing headers and body php

Hi guys, below is what I receive from a curl response.
HTTP/1.1 200 OK
X-Account-Object-Count: 4
X-Account-Bytes-Used: 3072798
X-Account-Container-Count: 3
Accept-Ranges: bytes
Content-Length: 15
Content-Type: text/plain; charset=utf-8
Date: Thu, 12 Jan 2012 04:07:33 GMT
a1
abc
testing
I found a good function that parses the headers, and grabbing the key-value pairs in the headers is not a problem. The problem is how to grab the names in the body:
a1
abc
testing
I think maybe regex can do the job, but I don't know whether regex is the best approach, or whether there is another function that can return the headers and the body separately.
Any help is appreciated. Thanks.
Updates
Now I am getting the response as
HTTP/1.1 200 OK
X-Account-Object-Count: 4
X-Account-Bytes-Used: 3072798
X-Account-Container-Count: 3
Accept-Ranges: bytes
Content-Length: 115
Content-Type: application/json; charset=utf-8
Date: Thu, 12 Jan 2012 04:47:36 GMT
[{"name":"a1","count":0,"bytes":0},{"name":"abc","count":0,"bytes":0},{"name":"testing","count":4,"bytes":3072798}]
So the names are in JSON.
Seems like it should be as simple as:
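// A raw HTTP response separates headers and body with an empty CRLF line; the limit of 2 keeps blank lines in the body intact
list($headers, $body) = explode("\r\n\r\n", $response, 2);
$bodyValues = explode("\n", $body);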
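For the JSON variant of the response, a slightly fuller sketch might look like this. The function name split_response and the shortened sample response are assumptions for illustration; the split accepts both CRLF and bare LF line endings.

```php
<?php
// Hypothetical helper: split a raw HTTP response into status line, headers and body.
function split_response(string $response): array
{
    // Split once on the first empty line, so blank lines inside the body survive.
    $parts = preg_split('/\r?\n\r?\n/', $response, 2);
    $headerLines = preg_split('/\r?\n/', $parts[0]);
    $statusLine = array_shift($headerLines);

    $headers = [];
    foreach ($headerLines as $line) {
        if (strpos($line, ':') === false) {
            continue; // skip anything that is not a "Name: value" line
        }
        [$name, $value] = explode(':', $line, 2);
        $headers[trim($name)] = trim($value);
    }
    return ['status' => $statusLine, 'headers' => $headers, 'body' => $parts[1] ?? ''];
}

// Shortened sample modelled on the response in the question.
$response = "HTTP/1.1 200 OK\r\n"
    . "Content-Type: application/json; charset=utf-8\r\n"
    . "\r\n"
    . '[{"name":"a1","count":0,"bytes":0},{"name":"abc","count":0,"bytes":0}]';

$parsed = split_response($response);
$names = array_column(json_decode($parsed['body'], true), 'name');
```

When the response comes from cURL, curl_getinfo($ch, CURLINFO_HEADER_SIZE) reports the length of the header block, which allows the same split with substr() instead of a pattern.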
