I use GuzzleHTTP 6.0 to get data from an API server. For some reason, the requests the API server receives are not UTF-8 encoded: the characters ü, ö, ä and ß arrive garbled.
My system and my database both default to UTF-8.
I set debug to true in the RequestOptions; this is the output:
User-Agent: GuzzleHttp/6.2.1 curl/7.47.0 PHP/7.0.22-0ubuntu0.16.04.1
Content-type: text/xml;charset="UTF-8"
Accept: text/xml
Cache-Control: no-cache
Content-Length: 2175
* upload completely sent off: 2175 out of 2175 bytes
< HTTP/1.1 200 OK
< Server: Apache-Coyote/1.1
< Content-Type: text/xml; charset=utf-8
< Transfer-Encoding: chunked
< Date: Thu, 23 Nov 2017 9:34:12 GMT
* Connection #5 to host www.abcdef.com left intact
I have explicitly set the charset in the headers to UTF-8:
$headers = array(
'Content-type' => 'text/xml;charset="utf-8"',
'Accept' => 'text/xml',
'Content-length' => strlen($requestBody),
);
I also checked the request body with mb_detect_encoding():
mb_detect_encoding($requestBody,'UTF-8',true); // returns UTF-8
Any further ideas on how to debug this issue?
Content-Length must contain the number of bytes, not the number of characters. That could be the reason if you use mbstring.func_overload, because strlen() then counts characters instead of bytes. Try omitting this header instead of setting it manually; Guzzle will then set it correctly for you.
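To illustrate the bytes-versus-characters point, here is a minimal sketch. The sample body and the helper name byteLength() are made up for the example; it assumes the mbstring extension is loaded. Asking mb_strlen() for the '8bit' encoding gives the byte count regardless of mbstring.func_overload (a setting that only exists before PHP 8.0).

```php
<?php
// Hypothetical helper: length in bytes, independent of mbstring.func_overload.
function byteLength(string $body): int
{
    return mb_strlen($body, '8bit');
}

// Sample request body (an assumption for this example); "ü" is written as its two UTF-8 bytes.
$requestBody = "<name>M\xC3\xBCller</name>";

$characters = mb_strlen($requestBody, 'UTF-8'); // counts code points
$bytes      = byteLength($requestBody);         // counts bytes, which is what Content-Length needs

printf("characters: %d, bytes: %d\n", $characters, $bytes);
```

If the two numbers differ and the smaller one ends up in Content-Length, the server stops reading before the end of the body, so leaving the header out of $headers is the simpler fix.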
My database stores some texts which I have to fetch with AJAX. This works, but only when the text does not contain special characters such as ë or ä. I found some articles on this topic which told me to change the charset of the AJAX request, but none of them worked for me.
When I opened Firebug, it showed this for the headers:
Antwoordheaders (Dutch for response headers)
Cache-Control no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Connection close
Content-Length 94
Content-Type text/html; charset=ISO-8859-15
Date Wed, 26 Sep 2012 09:52:56 GMT
Expires Thu, 19 Nov 1981 08:52:00 GMT
Pragma no-cache
Server Apache
X-Powered-By PleskLin
Verzoekheaders (Dutch for request headers)
Accept text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding gzip, deflate
Accept-Language nl,en-us;q=0.7,en;q=0.3
Authorization Basic c3BvdGlkczp6SkBVajRrcw==
Connection keep-alive
Content-Type text/html; charset=ISO-8859-15
Cookie __utma=196329838.697518114.1346065716.1346065716.1346065716.1; __utmz=196329838.1346065716.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); PHPSESSID=2h4vu8gu9v8fe5l1t3ad5agp86
DNT 1
Host www.spotids.com
Referer http://www.spotids.com/private/?p=16
User-Agent Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1
Both sets of headers specify charset=ISO-8859-15, which should include characters like ë, but it doesn't work for me.
The code I used for this (PHP):
$newresult = mysql_query($query2);
$result = array();
while ($row = mysql_fetch_array($newresult))
{
    array_push($result, $row);
}
$jsonText = json_encode($result);
echo $jsonText;
Make sure you set the headers to UTF-8:
header('Content-Type: application/json; charset=utf-8');
Make sure your connection to database is made with UTF-8 encoding before any queries:
$query = mysql_query("SET NAMES 'UTF8'");
As far as I know, json_encode() escapes any character that cannot be represented in pure ASCII, so you should decode the JSON when you receive the response.
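A small sketch of that escaping behaviour, using only json_encode() and json_decode(). The sample string is an assumption; it is written as raw UTF-8 bytes so the example does not depend on how the file is saved.

```php
<?php
$text = "\xC3\xAB"; // "ë" as UTF-8 bytes

// By default json_encode() escapes non-ASCII characters as \uXXXX sequences.
$escaped = json_encode($text);

// JSON_UNESCAPED_UNICODE keeps the character as literal UTF-8 instead.
$literal = json_encode($text, JSON_UNESCAPED_UNICODE);

// Decoding the escaped form gives back the original UTF-8 string.
$decoded = json_decode($escaped);

echo $escaped, "\n", $literal, "\n";
```

Either form is valid JSON; a client that parses the response (for example with JSON.parse) gets the same character back in both cases.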
Also try to move to PDO, as the mysql_* functions are deprecated.
From the JSON specification, RFC 4627: JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
Use mb_convert_encoding or iconv to change string encoding.
And send correct header:
header('Content-Type: application/json;charset=utf-8');
echo json_encode($data);
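As a rough sketch of the conversion step: the helper name rowsToUtf8() and the sample row are made up, and it assumes the database really returns ISO-8859-15 (as the headers in the question suggest) and that mbstring is available. Without the conversion, json_encode() rejects the data because the bytes are not valid UTF-8.

```php
<?php
// Hypothetical helper: convert every string in a result set to UTF-8.
// $from is an assumption; use whatever charset the database connection really delivers.
function rowsToUtf8(array $rows, string $from = 'ISO-8859-15'): array
{
    array_walk_recursive($rows, function (&$value) use ($from) {
        if (is_string($value)) {
            $value = mb_convert_encoding($value, 'UTF-8', $from);
        }
    });
    return $rows;
}

// Sample row: "Zoë" with ë stored as the single ISO-8859-15 byte 0xEB.
$rows = [['name' => "Zo\xEB"]];

$broken = json_encode($rows);             // false on PHP 7+: malformed UTF-8
$json   = json_encode(rowsToUtf8($rows)); // valid JSON

var_dump($broken);
echo $json, "\n";
```

If the connection is switched to UTF-8 with SET NAMES as suggested above, the conversion is no longer needed and would corrupt the text if applied anyway, so use one approach or the other.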
Also verify the Content-Type meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
I want to download the response from this Google link into a txt file using PHP.
When I do it in the browser, the Unicode text is correct and everything is fine, but when I do it with curl or file_get_contents, the file contains wrong characters.
What is the difference and how should I solve it?
Downloaded by the browser:
[[["سلام","hello","",""]],[["interjection",["سلام","هالو","الو"],[["سلام",["hello","hi","aloha","all hail"]],["هالو",["hallo","hello","halloo"]],["الو",["hello"]]]]],"en",,[["سلام",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["سلام",1000,0,0],["خوش",0,0,0],["میهمان گرامی",0,0,0],["خوش آمدید",0,0,0],["درود کاربر",0,0,0]],[[0,5]],"hello"]],,,[["en"]],65]
Downloaded by the following PHP script (the contents of t.txt are shown after the script):
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<?php
$t = file_get_contents("http://translate.google.com/translate_a/t?client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello");
$f = fopen("t.txt", "w+");
fwrite($f, $t);
fclose($f);
?>
</body></html>
[[["ÓáÇã","hello","",""]],[["interjection",["ÓáÇã","åÇáæ","Çáæ"],[["ÓáÇã",["hello","hi","aloha","all hail"]],["åÇáæ",["hallo","hello","halloo"]],["Çáæ",["hello"]]]]],"en",,[["ÓáÇã",[5],0,0,1000,0,1,0]],[["hello",4,,,""],["hello",5,[["ÓáÇã",1000,0,0],["ÎæÔ",0,0,0],["ã\u06CCåãÇä ÑÇã\u06CC",0,0,0],["ÎæÔ ÂãÏ\u06CCÏ",0,0,0],["ÏÑæÏ ÇÑÈÑ",0,0,0]],[[0,5]],"hello"]],,,[["en"]],4]
The response headers are:
HTTP/1.1 200 OK
Pragma: no-cache
Date: Fri, 25 May 2012 22:29:12 GMT
Expires: Fri, 25 May 2012 22:29:12 GMT
Cache-Control: private, max-age=600
Content-Type: text/javascript; charset=UTF-8
Content-Language: fa
Set-Cookie: PREF=ID=b6c08a0545f50594:TM=1337984952:LM=1337984952:S=Sf1xcow2qPZrFeu0; expires=Sun, 25-May-2014 22:29:12 GMT; path=/; domain=.google.com
X-Content-Type-Options: nosniff
Content-Disposition: attachment
Server: HTTP server (unknown)
X-XSS-Protection: 1; mode=block
Transfer-Encoding: chunked
Add the parameters ie=UTF-8 and oe=UTF-8 to the query string of the URL:
$t = file_get_contents("http://translate.google.com/translate_a/t?ie=UTF-8&oe=UTF-8&client=t&hl=en&sl=auto&tl=fa&multires=1&prev=btn&ssel=0&tsel=3&uptl=fa&alttl=en&sc=1&text=hello");
This worked for me once, just as I was about to throw lots of code in the garbage! Maybe it will help you too:
iconv( 'CP1252', 'UTF-8', $string);
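A slightly more defensive version of that one-liner, as a sketch only: the helper name toUtf8() is made up, and the "is it already valid UTF-8?" check is a heuristic rather than real charset detection. It assumes the iconv and mbstring extensions are available and that anything which is not valid UTF-8 is CP1252.

```php
<?php
// Hypothetical helper: convert to UTF-8 only when the input is not valid UTF-8 already,
// so text that is fine does not get converted a second time.
function toUtf8(string $text, string $assumedSource = 'CP1252'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    return iconv($assumedSource, 'UTF-8', $text);
}

$fromCp1252 = toUtf8("\xFC\xF6\xE4\xDF");   // the CP1252 bytes for "üöäß"
$alreadyOk  = toUtf8("\xC3\xBC");           // "ü" that is already UTF-8

echo $fromCp1252, "\n", $alreadyOk, "\n";
```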
Echoing what you get from file_get_contents into the PHP output should work fine, as you are going from a UTF-8 JSON response to a UTF-8 HTML response. Works for me off the given URL.
When you store to a file, you then have to worry about what encoding the tools you are using to read the file are working in. Just calling fwrite() is fine as long as the text editor you view the file in knows the content is UTF-8. On Windows, Notepad may instead try to read it in the locale-dependent default ('ANSI') code page, which won't be UTF-8. On a Western European install it'd be code page 1252 and you'd get output like Ø³Ù„Ø§Ù… for سلام.
(One way around that is to put a UTF-8 fake-BOM at the front of the file with fwrite($f, "\xef\xbb\xbf");. This is a bit dodgy because UTF-8 doesn't need a Byte Order Mark (its byte order is fixed) and it breaks UTF-8's ASCII-compatibility, but Windows tools like fake-BOMs. The other way around it is to get a better text editor that allows you to default to handling files as UTF-8.)
You've got something slightly different here, as ÓáÇã is what you get when you save سلام in the Windows default Arabic encoding (code page 1256) and then read it in the Windows default Western encoding (code page 1252). This implies there's some kind of extra store-and-load step involved in your testing, that's messing up the encoding.
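Both kinds of garbling can be reproduced with iconv(), which is a handy way to work out which store-and-load step is at fault. This is only a sketch: the helper name misread() is made up, and it assumes the iconv implementation on your system knows the CP1252 and CP1256 code pages.

```php
<?php
// Hypothetical helper: simulate saving UTF-8 text in one encoding and reading it back in another.
function misread(string $utf8, string $savedAs, string $readAs): string
{
    $bytesOnDisk = iconv('UTF-8', $savedAs, $utf8);
    return iconv($readAs, 'UTF-8', $bytesOnDisk);
}

$salam = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85"; // "سلام" as UTF-8 bytes

// Saved as Windows Arabic (1256), read as Windows Western (1252): the pattern in the question.
$arabicAsWestern = misread($salam, 'CP1256', 'CP1252');

// Saved as UTF-8, read as Windows Western (1252): what Notepad's 'ANSI' guess would show.
$utf8AsWestern = misread($salam, 'UTF-8', 'CP1252');

echo $arabicAsWestern, "\n", $utf8AsWestern, "\n";
```

Comparing the garbled text you actually see against the outputs of a few such combinations usually narrows down where the wrong encoding is being assumed.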
If it's anything to do with Windows command line tools you might as well give up, because the Command Prompt and MSVCRT apps don't really play well with Unicode at all.
Hey Guys,
I am totally confused by PHP's imap functions; they are all over the place, and I've dug through the manual comments to get this far. I can call imap_fetchbody($connection, 5, null); and retrieve the entire message, which gives me all that I need.
For example:
To: user#gmail.com
Content-Type: multipart/mixed; boundary=001517570c8a681a5004a36a1fd6
--001517570c8a681a5004a36a1fd6
Content-Type: multipart/alternative; boundary=001517570c8a681a4804a36a1fd4
--001517570c8a681a4804a36a1fd4
Content-Type: text/plain; charset=ISO-8859-1
asdf
--001517570c8a681a4804a36a1fd4
Content-Type: text/html; charset=ISO-8859-1
asdf
--001517570c8a681a4804a36a1fd4--
--001517570c8a681a5004a36a1fd6
Content-Type: image/gif; name="ajax-loader.gif"
Content-Disposition: attachment; filename="ajax-loader.gif"
Content-Transfer-Encoding: base64
X-Attachment-Id: f_gnruant80
... bunch of code here omitted ...
But I really want this information in an array or object that I can work with, without having to think up some regular expression to pull the data out of the text above.
Does anyone know a way to do that, or which PHP imap command to use? Or does anyone know of a place with really good examples?
You can pass a non-null value for the section parameter (the 3rd parameter) to fetch a single part of the message. Please see the second example at http://php.net/manual/en/function.imap-fetchbody.php. I think that is what you are looking for.
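To find out which section strings exist for a message, one option is to walk the object returned by imap_fetchstructure(). The sketch below is an assumption-laden illustration, not the manual's example: listSections() is a made-up helper, it relies only on the parts and subtype properties of that object, and it ignores special cases such as embedded message/rfc822 parts. The structure at the bottom is built by hand to mirror the message in the question, so the snippet runs without a mail server.

```php
<?php
// Hypothetical helper: flatten a structure object into a list of section numbers.
function listSections(object $structure, string $prefix = ''): array
{
    // A part without sub-parts is a leaf; a non-multipart message is section "1".
    if (empty($structure->parts)) {
        return [[
            'section' => $prefix === '' ? '1' : $prefix,
            'subtype' => $structure->subtype ?? '',
        ]];
    }
    $sections = [];
    foreach ($structure->parts as $index => $part) {
        // Sub-parts are numbered from 1 and nested with dots: 1, 1.1, 1.2, 2, ...
        $number = ($prefix === '' ? '' : $prefix . '.') . ($index + 1);
        $sections = array_merge($sections, listSections($part, $number));
    }
    return $sections;
}

// Hand-built stand-in for imap_fetchstructure($connection, 5) on the message above.
$alternative = (object) ['subtype' => 'ALTERNATIVE', 'parts' => [
    (object) ['subtype' => 'PLAIN'],
    (object) ['subtype' => 'HTML'],
]];
$message = (object) ['subtype' => 'MIXED', 'parts' => [
    $alternative,
    (object) ['subtype' => 'GIF'],
]];

$sections = listSections($message);
foreach ($sections as $s) {
    echo $s['section'], ' => ', $s['subtype'], "\n";
}
```

With a real connection, each section string can then be passed as the third argument, e.g. imap_fetchbody($connection, 5, '1.1') for the plain-text part.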
I have a custom web-based contact management system that we built in PHP to track contacts. We recently started checking our Google e-mail box using IMAP and then, if the sender is in our contact management system:
Copying the message into a MySQL database table that's associated with that contact
Marking that contact to follow up with that day
Archiving the message in Gmail
Everything seems to be working great, EXCEPT... every so many emails we get a really garbled message that looks like this:
FABRRRQAUUUUAJXDjxZrUtzNFa2UMwjYj5YnYgZ74Ndwa4bwfzqmpH3/wDZjTcl
CnKdr2Fa7SJP+Ek8S/8AQJX/AMB5P8aZN4s162j33GmxxrnG54XUfqa6ysHxp/yA/wDtqv8AWuej
jFUqKDgtSpQsr3L13r4tPDcOoShBcTxgog6FiP5CsrwtpjuzavekvcTZKFuwPf8AH+VZOlwS+Iby
1jlBFnZRKhGeDjt9Sf0Fd0qhVCqAABgA
I go back and check the message and it appears to be only text, so I don't think it is an image. Any idea how to prevent that?
Thanks in advance.
Sincerely,
James
The example you provided looks like it is base64 encoded. The headers of the email message will tell you how to handle the content of the email message.
For example, the following defines an email message where the body is plain text, but it is stored as being base64 encoded. I have "x"ed out the privacy sensitive information.
Received: from xxxxxxxxx ([xxx.xx.xx.xxx]) by xxxxxxxxxx.xxx.xxxxxxxxxxxxxxx.xxx with Microsoft SMTPSVC(6.0.3790.3959);
Wed, 29 Apr 2009 21:29:16 +0000
Received: from xxxx-xxx-xxxxxx ([xxx.xx.xxx.xxxx]) by xxxxxxxx ; Wed, 29 Apr 2009 15:29:16
-0600
Message-ID: <AADB29A7-AAED-4068-B4A8-300E3B0D93AB#localhost>
MIME-Version: 1.0
From: xxxxxxxxxx#xxxxxxxxxxxxxxx.com
To: xxxxxxxxxx#xxxxxxxxxxxxxxx.com
Date: 29 Apr 2009 15:29:16 -0600
Subject: xxxx Account Update
Content-Type: text/html; charset=utf-8
Content-Transfer-Encoding: base64
Return-Path: xxxxxx#xxxxxxxx.com
X-OriginalArrivalTime: 29 Apr 2009 21:29:16.0374 (UTC) FILETIME=[8C63AF60:01C9C911]
Pay close attention to the Content-Type and Content-Transfer-Encoding headers.
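A minimal sketch of acting on that header: decodeBody() is a made-up helper that takes the body and the value of Content-Transfer-Encoding as strings. How you obtain those two values (parsing the raw headers, or the imap_* functions) is left out and depends on your code.

```php
<?php
// Hypothetical helper: undo the transfer encoding named in Content-Transfer-Encoding.
function decodeBody(string $body, string $transferEncoding): string
{
    switch (strtolower(trim($transferEncoding))) {
        case 'base64':
            // Non-strict decoding skips the line breaks that mail bodies contain.
            return (string) base64_decode($body);
        case 'quoted-printable':
            return quoted_printable_decode($body);
        default:
            // 7bit, 8bit and binary need no decoding.
            return $body;
    }
}

// Sample body (an assumption): base64 text wrapped over two lines, as in an e-mail.
$encoded = chunk_split(base64_encode('Hello, this is the plain text of the message.'), 20, "\r\n");
$plain   = decodeBody($encoded, 'base64');

echo $plain, "\n";
```

After decoding, the charset from Content-Type still applies, so a body declared as ISO-8859-1 may need converting before it goes into a UTF-8 database.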
I believe the IMAP connection is over SSL, so it might be the connection that gets out of sync. The best workaround I have is to check whether the body contains a really long word, since that garble has no spaces:
<?php
// Returns false if any space-separated word is longer than $limit characters.
function wordlength($txt, $limit)
{
    $words = explode(' ', $txt);
    foreach($words as $v)
    {
        if(strlen($v) > $limit)
        {
            return false;
        }
    }
    return true;
}
?>
Usage:
<?php
$txt = "Message Body would be here";
if(!wordlength($txt, 45))
{
    //maybe try to pull the message again or
    //send an email to you telling you there is a problem
}
?>
I picked 45 just in case someone uses the word Pneumonoultramicroscopicsilicovolcanoconiosis in an email. :D
Jordan might be right though. It may just be base64 encoded. In that case I would just explode() the headers, search for the base64 Content-Transfer-Encoding, and if it's there, a simple base64_decode() will do the trick.
This helped me with a garbled e-mail subject.
http://php.net/manual/en/function.imap-header.php