cannot convert JSON response from windows-1253 to utf8 - php

I'm trying to parse a JSON response from a web service I have no control over.
These are the headers
This is the body I see in php with sensitive parts hidden
I'm using the Guzzle HTTP client to send the request and retrieve the response.
If I try to decode it directly I receive an empty object, so I'm assuming a conversion is needed, and I'm trying to convert the response contents like this:
json_decode(iconv($charset, 'UTF-8', $contents))
or
mb_convert_encoding($contents, 'UTF-8', $charset);
both of which throw an exception.
Notice: iconv(): Wrong charset, conversion from 'windows-1253' to 'UTF-8' is not allowed in Client.php on line 205
Warning: mb_convert_encoding(): Illegal character encoding specified in Client.php on line 208
I've used this piece of code successfully before, but I can't understand why it fails now.
Sending the same request using Postman correctly retrieves the data without broken characters, and it seems to show the same headers and body.
Update based on the comments:
mb_detect_encoding($response->getBody()) -> UTF-8
mb_detect_encoding($response->getBody()->getContents()) -> ASCII
json_last_error_msg() -> Malformed UTF-8 characters, possibly incorrectly encoded
Additionally, as a trial-and-error attempt, I looped over all iconv encodings to see which, if any, could convert the contents to UTF-8 without an error, using this function:
private function detectEncoding($str)
{
    $iconvEncodings = [...];
    $finalEncoding = "unknown";
    foreach ($iconvEncodings as $encoding) {
        try {
            iconv($encoding, 'UTF-8', $str);
            return $encoding;
        } catch (\Exception $exception) {
            continue;
        }
    }
    return $finalEncoding;
}
Apparently no encoding worked and every one of them raised the same exception. I'm assuming the problem is with retrieving the response body correctly via Guzzle and not with iconv itself; it can't be that the data matches none of the 1000+ encodings.
Some more info with cURL
I just retried the same payload using cURL:
/**
 * @param $options
 * @return bool|string
 */
public function makeCurlRequest($options)
{
    $payload = json_encode($options);

    // Prepare new cURL resource
    $ch = curl_init($this->softoneurl);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return web page
        CURLOPT_HEADER         => false,  // don't return headers
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_MAXREDIRS      => 10,     // stop after 10 redirects
        CURLOPT_ENCODING       => "",     // handle compressed
        CURLOPT_USERAGENT      => "test", // name of client
        CURLOPT_AUTOREFERER    => true,   // set referrer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,    // time-out on connect
        CURLOPT_TIMEOUT        => 120,    // time-out on response
        CURLINFO_HEADER_OUT    => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $payload,
    ]);

    // Set HTTP headers for the POST request
    curl_setopt($ch, CURLOPT_HTTPHEADER, array(
        'Content-Type: application/json',
        'Content-Length: ' . strlen($payload),
    ));

    // Submit the POST request
    $result = curl_exec($ch);

    // Close cURL session handle
    curl_close($ch);

    return $result;
}
I received the exact same string and the exact same results when converting it. Perhaps there's an option I'm missing?
Apparently there's something wrong with iconv itself in the environment, and it's not application-specific. Running the following code via SSH:
php -r "var_dump(iconv('Windows-1253', 'UTF-8', 'test'));"
yields
PHP Notice: iconv(): Wrong charset, conversion from `Windows-1253' to `UTF-8' is not allowed in Command line code on line 1
PHP Stack trace:
PHP 1. {main}() Command line code:0
PHP 2. iconv(*uninitialized*, *uninitialized*, *uninitialized*) Command line code:1
Command line code:1:
bool(false)
Perhaps some dependency is missing

About 14 hours of troubleshooting later, I'm able to answer my own question. In my case, since this was running in the context of a CLI command, the issue was caused by missing libraries: the CLI PHP binary didn't have access to some libraries iconv needed.
More specifically, the gconv libraries.
In my case, on Debian 9, they were located in
/usr/lib/x86_64-linux-gnu/gconv
and this folder contains a library for each supported encoding.
A good way to confirm this, on a system where you have root access, is to run the command
strace iconv -f <needed_encoding> -t utf-8
It will list the many directories iconv tries to access, including the gconv folder, and will point you to the location of the files you need to make available in your SSH environment. If you don't have root access, you have to ask your hosting provider.
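Until the modules are available, it also helps to make the failure explicit in code. Here is a minimal sketch (the toUtf8() helper is just illustrative, not part of my original code) that tries iconv first and falls back to mbstring where it can:
// Illustrative sketch only: try iconv, fall back to mbstring, and fail
// loudly instead of getting false plus a notice.
function toUtf8($contents, $charset)
{
    // iconv() returns false (and raises a notice) when the conversion
    // table for $charset is unavailable, e.g. missing gconv modules.
    $converted = @iconv($charset, 'UTF-8', $contents);
    if ($converted !== false) {
        return $converted;
    }

    // mbstring ships its own conversion tables and does not depend on gconv,
    // but it supports fewer encodings, so check before calling it.
    $supported = array_map('strtolower', mb_list_encodings());
    if (in_array(strtolower($charset), $supported, true)) {
        return mb_convert_encoding($contents, 'UTF-8', $charset);
    }

    throw new RuntimeException("Cannot convert $charset to UTF-8 in this environment");
}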

Try this:
use GuzzleHttp\Psr7; // needed for Psr7\parse_header()

$response = $guzzle->request('GET', $url);
$type = $response->getHeader('content-type');
$parsed = Psr7\parse_header($type);
$original_body = (string) $response->getBody();
$utf8_body = mb_convert_encoding($original_body, 'UTF-8', $parsed[0]['charset'] ?: 'UTF-8');

For those who have the same issue, there is a simpler method to resolve it. I know it's 3 years later, but you can also set some headers:
header('Content-Type: application/json; charset=windows-1253');
That solved my problem instantly.

Related

PHP script no longer receiving data from context stream

We have a "legacy" script that stopped working a little while back. Pretty sure it's because the endpoint it's connecting to changed from http to https, and the old http address now returns a 301.
I've never done anything other than tiny changes to PHP scripts, so am a little out of my depth here.
Note that our PHP version is old - 5.3.0. This may well be part of the problem.
The script as-is (relevant bit anyway):
$uri = "http://www.imf.org/external/np/fin/data/rms_mth.aspx"
."?SelectDate=$date&reportType=CVSDR&tsvflag=Y";
$opts = array('http' => array(
'proxy' => 'tcp://internal.proxy.address:port',
'method' => 'GET',
'request_fulluri' => true)
);
$ctx = stream_context_create($opts);
$lines = file($uri, false, $ctx);
foreach ($lines as $line)
...
This now returns nothing. The link, by the way, is the IMF link for exchange rates, so it is open to all; if you open it you'll get a download with a rate table in it. The rest of the script basically parses this for the data we want.
Now, I'm pretty sure our proxy is OK. Running some tests with curl gives the following results:
curl --proxy tcp://internal.proxy.address:port -v https://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y
(specify https) works just fine.
curl --proxy tcp://internal.proxy.address:port -v http://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y
(specify http) does not work, and shows a 301 error
curl --proxy tcp://internal.proxy.address:port -v -L http://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y
(specify http with follow redirects) then works OK.
I've tried a few things after some googling. It seems I need opts for 'ssl' as well when using https, so I've made the following changes:
$uri = "https://www.imf.org/external/np/fin/data/rms_mth.aspx"
."?SelectDate=$date&reportType=CVSDR&tsvflag=Y";
$opts = array('http' => array(
'proxy' => 'tcp://internal.proxy.address:port',
'method' => 'GET',
'request_fulluri' => true),
'ssl' => array(
'verify_peer' => false,
'verify_peer_name' => false,
'SNI_enabled' => false)
);
Sadly, the SNI_enabled flag was introduced after 5.3.0, so I don't think this helps. There's also a follow_location context option for http, but that was introduced in 5.3.4, so also no use.
(BTW, I have little to no control over the version of PHP we have, so while I appreciate higher versions may offer better solutions, that's not a lot of use to me I'm afraid).
Basically, I am now stuck. No combination of these parameters or settings returns any data at all. I can see it works via curl and the proxy, so it's not a general connectivity issue.
Any and all suggestions gratefully received!
Update: After adding the lines to enable error reporting, the error code is for the stream connecting:
Warning: file(https://www.imf.org/external/np/fin/data/rms_mth.aspx?SelectDate=05/28/2020&reportType=CVSDR&tsvflag=Y): failed to open stream: Cannot connect to HTTPS server through proxy in /usr/bass/apps/htdocs/BASS/mods/module.XSM.php on line 79
(line 79 is the $lines = ... line)
So it doesn't connect in the php script, but running the same connection via the proxy in curl works fine. What's the difference in php that causes this?
You can use PHP's cURL functions to get the response from your given URL, and then use the explode() function to break the response up line by line.
$uri = "https://www.imf.org/external/np/fin/data/rms_mth.aspx"
."?SelectDate=$date&reportType=CVSDR&tsvflag=Y";
$opts = array(
CURLOPT_URL => $uri,
CURLOPT_PROXY => 'tcp://internal.proxy.address:port',
CURLOPT_HEADER => false,
CURLOPT_RETURNTRANSFER => true
);
$ch = curl_init();
curl_setopt_array($ch, $opts);
$lines = curl_exec($ch);
curl_close($ch);
$lines = explode("\n", $lines); // breaking the whole response string line by line
foreach ($lines as $line)
...
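Since the original failure started with the http-to-https 301, it may also be worth letting cURL follow redirects and checking for transport errors. A small sketch of optional additions to the options above (not part of the original answer):
// Optional extras: follow any future redirects and fail loudly
// instead of silently ending up with no data.
$opts[CURLOPT_FOLLOWLOCATION] = true;
$opts[CURLOPT_MAXREDIRS] = 5;

$ch = curl_init();
curl_setopt_array($ch, $opts);
$raw = curl_exec($ch);
if ($raw === false) {
    die('cURL request failed: ' . curl_error($ch));
}
curl_close($ch);

$lines = explode("\n", $raw);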

GET request with headers using cURL PHP

I'm trying to make a GET request with headers using cURL in PHP. I am getting an empty response from the server. I would like to know if I have made this request correctly using cURL.
// curl GET request with headers
$url = $sendMailURL;
$requestHeaders = array(
    $hConLength_ . ':' . $conLengthValue,
    $hConType_ . ':' . $conTypeValue,
    $hHost_ . ':' . $conHostValue,
    $hDate_ . ':' . $conAmzDateValue
);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $requestHeaders);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output = curl_exec($ch);

if (curl_error($ch)) {
    echo 'curl error: ' . curl_error($ch);
} else {
    print_r($output);
}
print_r($requestHeaders):
Array ( [0] => Content-Length:207
[1] => Content-Type:application/x-www-form-urlencoded
[2] => Host:email.eu-west-1.amazonaws.com
[3] => X-amz-date:20180115T224433Z )
I'm positive that you have found your answer by now, but this is for others who want to figure out how to check the headers, or who want to go down the rabbit hole of figuring out how Signature V4 works.
You need to set the curl option CURLINFO_HEADER_OUT.
I used the following to set all the options in one go, but I don't see why you can't set them one by one.
$conf = [
    \CURLOPT_CUSTOMREQUEST => $method,              // method
    \CURLOPT_URL => $url,                           // url
    \CURLOPT_HTTPHEADER => $requestHeaders,         // headers
    \CURLOPT_RETURNTRANSFER => true,
    \CURLOPT_HEADER => true,
    \CURLOPT_CONNECTTIMEOUT => $timeout,            // set to anything that fits
    \CURLOPT_PROTOCOLS => CURLPROTO_HTTPS,          // whatever your protocol is
    \CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1, // normally 1.1
    \CURLINFO_HEADER_OUT => 1,                      // this will cause request headers to show up
];
curl_setopt_array($this->curl, $conf);
After running your request, and before closing curl, you can check your own request headers and the response headers all together.
$result = curl_exec($this->curl);
$info = curl_getinfo($this->curl, CURLINFO_HEADER_OUT); // run after exec before close
var_dump($info);
curl_close($this->curl);
You don't have to specify CURLINFO_HEADER_OUT when calling curl_getinfo(), but then it will show a lot more info, which makes it a little harder to find what you are looking for.
Since you mentioned in the comments that you are trying to build a Signature V4 for AWS, and I had the same issue, a few notes:
Although HTTP headers are not case sensitive, when calculating the hash they should be lowercase, and they should be specified in the authorization token in lowercase too. To make it easier, I just included the headers in both the request and the hash in lowercase.
You don't need to hash all the headers, since you specify in the authorization token which headers are included in the hash.
Someone at AWS thought it would be cool to sort query-string items alphabetically before calculating the hash, so if you are sending a query string, make sure the parameters are sorted before hashing it (see the sketch after these notes), e.g.
Engine=standard&LanguageCode=en-US // will work
LanguageCode=en-US&Engine=standard // will not work
The only two headers that need to be in the hash are x-amz-date and host (at least that was the case for Polly), and they need to look like this:
host:service.region.amazonaws.com
x-amz-date:20210920T180242Z
authorization:AWS4-HMAC-SHA256 Credential=xxxxxxxxxxxxxxxxxxxx/20210920/region/service/aws4_request, SignedHeaders=host;x-amz-date, Signature=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
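For the query-string sorting point above, here is a rough sketch of building the canonical query string before hashing. The canonicalQueryString() helper is hypothetical, just to illustrate the ordering and encoding AWS expects:
// Hypothetical helper: sort parameters by name and RFC 3986 percent-encode
// them, which is what the SigV4 canonical request expects.
function canonicalQueryString(array $params)
{
    ksort($params, SORT_STRING);
    $pairs = array();
    foreach ($params as $key => $value) {
        $pairs[] = rawurlencode($key) . '=' . rawurlencode($value);
    }
    return implode('&', $pairs);
}

echo canonicalQueryString(array('LanguageCode' => 'en-US', 'Engine' => 'standard'));
// Engine=standard&LanguageCode=en-US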

XML-RPC failing to respond to POST requests via cURL in PHP

I'm having some issues with calling WordPress XML-RPC via cURL in PHP. It's a WordPress.com hosted blog, and the XML-RPC file is located at http://sunseekerblogbook.com/xmlrpc.php.
Starting yesterday (or at least, yesterday was when it was noticed), cURL has been failing with error #52: Empty reply from server.
The code snippet we're using is below:
$ch = curl_init('http://sunseekerblogbook.com/xmlrpc.php');
curl_setopt_array($ch, [
    CURLOPT_HEADER => false,
    CURLOPT_HTTPHEADER => [
        'Content-Type: text/xml'
    ],
    CURLOPT_POSTFIELDS => xmlrpc_encode_request('wp.getPosts', [
        1,
        WP_USERNAME,
        WP_PASSWORD,
        [
            'number' => 15
        ]
    ]),
    CURLOPT_RETURNTRANSFER => true
]);
$ret = curl_exec($ch);
$data = xmlrpc_decode($ret, 'UTF-8');
Using cURL directly from the command line, however, everything returns exactly as expected:
$output = [];
exec('curl -d "<?xml version=\"1.0\" encoding=\"UTF-8\"?><methodCall><methodName>wp.getPosts</methodName><params><param><value><int>1</int></value></param><param><value><string>' . WP_USERNAME . '</string></value></param><param><value><string>' . WP_PASSWORD . '</string></value></param><param><value><struct><member><name>number</name><value><int>15</int></value></member></struct></value></param></params></methodCall>" sunseekerblogbook.com/xmlrpc.php', $output);
$data = xmlrpc_decode(implode('', $output), 'UTF-8');
We've been able to query WordPress successfully since July 2013, and we're at a dead end as to why this has happened. It doesn't look like PHP or cURL has been updated or changed recently on the server, but the first code snippet has now failed on every server we've tried it on (with PHP 5.4+).
Using the http://sunseekerblogbook.wordpress.com/xmlrpc.php link gives the same issue.
Is there anything missing from the PHP code that would cause this issue? That it's suddenly stopped working over 12 months down the line is what has flummoxed me.
Managed to fix it. Looking at the headers sent by cURL, the only differences were that the cURL command line used Content-Type: application/x-www-form-urlencoded and that its user agent was set to User-Agent: curl/7.30.0.
The choice of content type didn't affect it, but setting a user agent sorted it! It seems WordPress.com (but not self-hosted WordPress.org sites running the latest v3.9.2) now requires a user agent for XML-RPC requests, though this hasn't been documented anywhere that I can find.
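In code that amounts to a single extra option on the handle from the question; the exact user-agent string is arbitrary (this is a sketch, not the literal fix that was applied):
// WordPress.com appears to reject XML-RPC requests that send no user agent,
// so send one; the value here is only an example.
curl_setopt($ch, CURLOPT_USERAGENT, 'MyXmlRpcClient/1.0');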

'&' becomes '&amp;' when trying to get contents from a URL

I was running my web server for months with the same algorithm, where I got the content of a URL by using this line of code:
$response = file_get_contents('http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw),'','&'));
But now something must have changed, as all of a sudden it stopped working.
In earlier days the URL looked the way it should have:
http://femoso.de:8019/api/2/getVendorLogin?vendor=100&user=test&pw=test
but now I get an error in my nginx log saying that I requested the following URL, which returned a 403:
http://femoso.de:8019/api/2/getVendorLogin?vendor=100&amp;user=test&amp;pw=test
I know that something changed on the target server, but I don't think that should affect me, should it?
I already spent hours and hours reading and searching through Google and Stack Overflow, but all the suggested approaches, such as
urlencode() or
htmlspecialchars() etc...
didn't work for me.
For your information, the environment is a Zend application with an nginx server on my end and a PHP web service with Apache on the other end.
Like I said, it changed without any change on my side!
Thanks
Let's find out the culprit!
1) Is it http_build_query ? Try replacing:
'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw))
with:
"http://femoso.de:8019/api/2/getVendorLogin?vendor={$vendor}&user={$login}&pw={$pw}"
2) Is some kind of post-processing in place? Try replacing '&' with chr(38).
3) Maybe give cURL a try and play around with it a little bit?
$ch = curl_init();
curl_setopt_array($ch, array(
    CURLOPT_URL => 'http://femoso.de:8019/api/2/getVendorLogin?' . http_build_query(array('vendor'=>$vendor,'user'=>$login,'pw'=>$pw)),
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HEADER => true,           // include response header in result
    //CURLOPT_FOLLOWLOCATION => true, // uncomment to follow redirects
    CURLINFO_HEADER_OUT => true,      // track request header, see var_dump below
));
$data = curl_exec($ch);
var_dump($data, curl_getinfo($ch, CURLINFO_HEADER_OUT));
curl_close($ch);
exit;
Sounds like your arg_separator.output is set to "&amp;" in your php.ini. Either comment that line out or change it to just "&".
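If you can't edit php.ini, the same change can be made at runtime, since arg_separator.output is changeable from scripts. A quick sketch of the suggestion above:
// Runtime equivalent of editing php.ini: force '&' as the output separator
// so http_build_query() no longer emits '&amp;'.
ini_set('arg_separator.output', '&');

$query = http_build_query(array('vendor' => $vendor, 'user' => $login, 'pw' => $pw));
// vendor=100&user=test&pw=test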
I'm no expert, but that's the way the computer reads the address, since it's a special character; something to do with encoding. A simple fix would be to filter it using str_replace(), something along those lines.

New Spin on cURL not working over SSL

So I've been finding a lot of posts here and elsewhere on the interwebs regarding PHP, cURL and SSL, but I've got a problem that I'm not seeing covered.
Obviously, if I set SSL_VERIFYPEER/HOST to blindly accept, I can get this to work, but I would like to use my cert to verify the connection.
So here is some code:
$options = array(
    CURLOPT_URL => $oAuthResult['signed_url'],
    CURLOPT_RETURNTRANSFER => TRUE,
    CURLOPT_HEADER => 0,
    CURLOPT_SSL_VERIFYPEER => TRUE,
    CURLOPT_SSL_VERIFYHOST => 2,
    CURLOPT_CAINFO => getcwd() . '\application\third_party\certs\rootCerr.crt'
);
curl_setopt_array($ch, $options);

try {
    $result = curl_exec($ch);
    $errCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) != 200) {
        throw new Exception('<strong>Error trying to ExecuteWebRequest, returned: ' . $errCode . '<br>URL:' . $url . '<br>POST data (if any):</strong><br>');
    }
    curl_close($ch);
} catch (Exception $e) {
    // print the error stuff
}
The error code that is returned is 0...which means that everything is A-OK...but since nothing comes back to the screen...I'm pretty sure it's not working.
Anyone?
The $errCode you extract is the HTTP code, which is 200-299 when OK. Getting 0 means it was never set, due to a problem or similar.
You should rather use curl_errno() after curl_exec() to figure out whether things went fine or not. (You can't check the curl_exec() return code for errors as easily, because you have CURLOPT_RETURNTRANSFER enabled, which makes that function return the contents of the transfer instead. Of course, getting no contents at all should also be a good indicator that something failed.)
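In code, the check described above looks roughly like this (a sketch, not taken from the question):
$result = curl_exec($ch);
if (curl_errno($ch) !== 0) {
    // Transport-level failure: DNS, proxy, TLS handshake, CA verification, ...
    throw new Exception('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
}
// Only meaningful when no transport error occurred
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);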
I've implemented libcurl certs by using CURLOPT_CAINFO as you have indicated...
However, providing just the file name wasn't good enough... it had crashed on me too.
For me, the file was referenced by a relative path... Additionally, I had to make sure the cert was in Base64 format too. Then everything went through without a hitch.
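Along the same lines, it can help to resolve and validate the CA bundle path up front instead of letting curl_exec() fail with an opaque error. A sketch using the file name from the question:
// Resolve the CA bundle path explicitly and fail early if it is unreadable;
// the bundle must be a PEM (Base64) encoded file.
$caFile = getcwd() . '/application/third_party/certs/rootCerr.crt';
if (!is_readable($caFile)) {
    throw new Exception('CA bundle not readable: ' . $caFile);
}
curl_setopt($ch, CURLOPT_CAINFO, $caFile);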
