PHP cURL Japanese output garbled

Consider the following URL:
click here
The page contains Japanese text. Firefox on my PC detects the encoding automatically and displays the characters, but in Chrome I have to change the encoding manually to "Shift_JIS" to see the Japanese characters.
When I fetch the content with PHP cURL, the text appears garbled, like this:
���ϕi�̂��ƂȂ��I�݂��Ȃ̃N�`�R�~�T�C�g�������������i�A�b�g�R�X���j�ɂ��܂����I
I tried:
curl_setopt($ch, CURLOPT_ENCODING, 'Shift_JIS');
I also tried (after downloading the curl response):
$output_str = mb_convert_encoding($curl_response, 'Shift_JIS', 'auto');
$output_str = mb_convert_encoding($curl_response, 'SJIS', 'auto');
But that does not work either.
Here is the full code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Connection: keep-alive'
));
//curl_setopt($ch, CURLOPT_ENCODING, 'SJIS');
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($ch);

That page doesn't return valid HTML; it's actually JavaScript. If you fetch it with cURL and output it, add header('Content-type: text/html; charset=shift_jis'); to your code, and the characters will display properly when you load the page in Chrome.
Since the response doesn't specify its character set, you can specify it from your server using header().
To actually convert the encoding so it will display properly in your terminal, you can try the following:
Use iconv() to convert to UTF-8
$curl_response = iconv('shift-jis', 'utf-8', $curl_response);
Use mb_convert_encoding() to convert to UTF-8
$curl_response = mb_convert_encoding($curl_response, 'utf-8', 'shift-jis');
Both methods worked for me, and the Japanese characters displayed correctly in my terminal.
UTF-8 should be fine, but if you know your system uses a different encoding, convert to that instead.
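A minimal sketch of the mb_convert_encoding() route, assuming the mbstring extension is enabled and the script is saved as UTF-8. To stay offline, it builds a Shift_JIS string locally instead of fetching the page; `sjisToUtf8()` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: convert a Shift_JIS response body to UTF-8.
// Requires mbstring; 'SJIS' is mbstring's name for Shift_JIS.
function sjisToUtf8(string $body): string
{
    return mb_convert_encoding($body, 'UTF-8', 'SJIS');
}

// Stand-in for a cURL response: "日本語" encoded as Shift_JIS bytes.
$original = '日本語'; // assumes this file is saved as UTF-8
$sjisBody = mb_convert_encoding($original, 'SJIS', 'UTF-8');

echo bin2hex($sjisBody), "\n";    // 93fa967b8cea (2 bytes per character)
echo sjisToUtf8($sjisBody), "\n"; // 日本語
```

The raw Shift_JIS bytes are not valid UTF-8, which is why they show up as replacement characters when echoed into a UTF-8 terminal or page without conversion.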
Hope that helps.

The following code outputs the Japanese characters correctly in the browser:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $setUrlHere);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// grab URL content
$response = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
header('Content-type: text/html; charset=shift_jis');
echo $response;

Related

Accented characters show up wrong when using cURL

I am trying to add contacts to a Constant Contact account using the V3 API. Some user data contains accented characters, and when I add these, they show up as unicode characters in the Constant Contact account.
For example, the first name GÒKÜL and last name NÁTH show up in Constant Contact as Gu00d2Ku00dcL and Nu00c1TH. I want them to appear as the original characters.
I think the issue is in my cURL function that adds/updates a contact. Here is the code:
function updateContact($access_token,$contactid,$entry){
$ch = curl_init();
$base = 'https://api.cc.email/v3/';
$url = $base . '/contacts/'.$contactid;
curl_setopt($ch, CURLOPT_URL, $url);
$authorization = 'Authorization: Bearer ' . $access_token;
$ct = 'Content-Type: application/json;';
curl_setopt($ch, CURLOPT_HTTPHEADER, array($authorization, $ct));
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "PUT");
curl_setopt($ch, CURLOPT_POSTFIELDS, $entry);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_BASIC);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
I tried $ct = 'Content-Type: application/json; charset=UTF-8'; and $result = utf8_decode(curl_exec($ch));, but neither worked.
Can anyone help me?
Those are not 'unicode' characters; they look like Unicode escape sequences, except that a backslash is missing. You should not be getting Gu00d2Ku00dcL, but G\u00d2K\u00dcL.
Hopefully those backslashes are actually there and were lost when you shared this output. If so, the easiest way to decode them is the json_decode() function.
If the backslashes really are missing, that suggests the server you are working with is broken, and there's no easy fix. In that case, contact whoever runs the server and let them know.
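A small sketch of the json_decode() route, using the names from the question as sample data. The field names `first_name` and `last_name` are assumptions for illustration, not a confirmed Constant Contact schema. It also shows that json_decode() cannot repair the backslash-less form, and that json_encode() itself emits \uXXXX escapes unless JSON_UNESCAPED_UNICODE is passed; both of those encoded forms are valid JSON.

```php
<?php
// Escaped JSON as the answer expects it (single quotes keep each \u literal).
$json = '{"first_name":"G\u00d2K\u00dcL","last_name":"N\u00c1TH"}';
$contact = json_decode($json, true);
echo $contact['first_name'], ' ', $contact['last_name'], "\n"; // GÒKÜL NÁTH

// With the backslashes missing, json_decode() sees an ordinary string.
echo json_decode('"Gu00d2Ku00dcL"'), "\n"; // Gu00d2Ku00dcL (unchanged)

// json_encode() escapes non-ASCII characters by default;
// JSON_UNESCAPED_UNICODE writes the UTF-8 characters directly.
$payload = ['first_name' => 'GÒKÜL'];
echo json_encode($payload), "\n";                         // {"first_name":"G\u00d2K\u00dcL"}
echo json_encode($payload, JSON_UNESCAPED_UNICODE), "\n"; // {"first_name":"GÒKÜL"}
```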

Twitter API token request returning gobbledygook

I'm trying to use Application Only Authentication, as described here:
https://developer.twitter.com/en/docs/basics/authentication/overview/application-only
I'm using the following PHP code to do so.
if(empty($_COOKIE['twitter_auth'])) {
require '../../social_audit_config/twitter_config.php';
$encoded_key = urlencode($api_key);
$encoded_secret = urlencode($api_secret);
$credentials = $encoded_key.":".$encoded_secret;
$encoded_credentials = base64_encode($credentials);
$request_headers = array(
'Host: api.twitter.com',
'User-Agent: BF Sharing Report',
'Authorization: Basic '.$encoded_credentials,
'Content-Type: application/x-www-form-urlencoded;charset=UTF-8',
'Content-Length: 29',
'Accept-Encoding: gzip'
);
print_r($request_headers);
$ch = curl_init();
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_HTTPHEADER, $request_headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, 'https://api.twitter.com/oauth2/token');
curl_setopt($ch, CURLOPT_POSTFIELDS, 'grant_type=client_credentials');
$attempt_auth = curl_exec($ch);
print_r($attempt_auth);
}
It should return JSON with the token in it, but instead it returns gobbledygook, as seen in the image below:
I'm sure I'm missing a very simple step. Where am I going wrong?
If I send the curl request without the headers, it returns an error in JSON format as expected, so is there something wrong with my headers?
You have a few options here. Instead of setting the Accept-Encoding header directly, use:
curl_setopt($ch, CURLOPT_ENCODING, 'gzip');
If you do set the header directly, decode the response yourself:
print_r(gzdecode($attempt_auth));
See also these threads:
Decode gzipped web page retrieved via cURL in PHP
Get compressed contents using cURL
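For illustration, a sketch of what the body looks like when `Accept-Encoding: gzip` is set by hand. It assumes the zlib extension (which provides gzencode() and gzdecode()); `maybeGunzip()` is a hypothetical helper that checks for the two gzip magic bytes 0x1f 0x8b before decoding, and the JSON token is made-up sample data, not a real Twitter response.

```php
<?php
// Hypothetical helper: gunzip a body only if it starts with the gzip magic bytes.
function maybeGunzip(string $body): string
{
    if (strncmp($body, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($body);
        if ($decoded === false) {
            throw new RuntimeException('Body looks gzipped but could not be decoded');
        }
        return $decoded;
    }
    return $body; // not gzip: return unchanged
}

// Simulate a server honouring "Accept-Encoding: gzip" (sample data only).
$json = '{"token_type":"bearer","access_token":"EXAMPLE"}';
$compressed = gzencode($json);

echo bin2hex(substr($compressed, 0, 2)), "\n"; // 1f8b
echo maybeGunzip($compressed), "\n";           // the original JSON
echo maybeGunzip($json), "\n";                 // plain input passes through
```

With CURLOPT_ENCODING set instead, cURL performs this decoding itself; passing an empty string makes it advertise every encoding it supports.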

Save an image from a URL

I want to download a set of images by URL in PHP. I've tried both file_get_contents and cURL. Here is the code I tried:
$image = file_get_contents('http://user:pwd#server/directory/images/image1.jpg');
file_put_contents('D:/images/image1.jpg', $image);
and
$url = 'http://server/directory/images/image1.jpg';
$localFilePath = 'D:/images/image1.jpg';
$ch = curl_init ($url);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER,1);
curl_setopt($ch, CURLOPT_USERPWD, "user:pwd");
$raw = curl_exec($ch);
curl_close ($ch);
if(file_exists($localFilePath)){
unlink($localFilePath);
}
$fp = fopen($localFilePath,'wb');
fwrite($fp, $raw);
fclose($fp);
In both cases, I am getting the following error:
401 - Unauthorized : Access is denied due to invalid credentials.
The password has a special character. I can't change it to a plain password, as the password policies don't allow it.
I don't see where you tell cURL which authentication method to use:
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
It might also be because you're not using cookies; enable them:
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookies.txt");
The line
curl_setopt($ch, CURLOPT_USERPWD, "user:pwd");
is equivalent to the following, which you can try in its place:
curl_setopt($ch, CURLOPT_HTTPHEADER,
array("Authorization: Basic ".base64_encode("user:pwd")));
// you can also try replacing + with - and / with _
Now to the point. Base64 has several variants defined in RFCs, for example RFC 3548 (since superseded by RFC 4648). Different libraries and languages implement different variants: some use the + and / characters, while others use - and _ instead.
So say cURL sends the Base64 string xyz+12= for the authorization, but your server expects xyz-12=. It will fail to decode.
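A short sketch of the two alphabets side by side. `base64ToUrlSafe()` is a hypothetical helper, and "user:pwd" is the placeholder credential from the question. For reference, RFC 7617 specifies the standard alphabet for HTTP Basic credentials, so the swap is only worth trying if the server in question deviates from that.

```php
<?php
// The standard Base64 alphabet uses '+' and '/'; the URL-safe variant
// (RFC 4648, section 5) uses '-' and '_' in their place.
function base64ToUrlSafe(string $standard): string
{
    return strtr($standard, '+/', '-_');
}

// Two bytes chosen so the standard encoding contains both '+' and '/'.
$bytes = "\xfb\xff";
$standard = base64_encode($bytes);
echo $standard, "\n";                  // +/8=
echo base64ToUrlSafe($standard), "\n"; // -_8=

// Header built by hand, as in the answer ("user:pwd" is a placeholder).
$header = 'Authorization: Basic ' . base64_encode('user:pwd');
echo $header, "\n"; // Authorization: Basic dXNlcjpwd2Q=
```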

PHP File Get Contents & String Encoding

Retrieved the contents of a CSS file: (http://gizmodo.com/assets/stylesheets/app-ecbc6044c59319aab4c2a1e31380ef56.css)
Detected the encoding with mb_detect_encoding(); it says UTF-8.
Viewed the page in a browser: it looks fine (readable) and declares @charset "UTF-8";
Tried to output the string, got garbage.
Tried to save it to a file, got garbage.
Tried to convert the encoding to ASCII, ISO-8859-1, and HTML-ENTITIES. No luck.
Any ideas on how to determine why this string is garbage, and how to fix it?
$url = 'http://gizmodo.com/assets/stylesheets/app-ecbc6044c59319aab4c2a1e31380ef56.css';
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch,CURLOPT_ENCODING , "gzip");
$data = curl_exec($ch);
curl_close($ch);
echo $data;
The important line is:
curl_setopt($ch,CURLOPT_ENCODING , "gzip");
The Content-Encoding of the page you're trying to fetch is gzip. You'll need to uncompress it before using it.
I just tried the following and it worked fine:
echo gzdecode(file_get_contents($your_url));
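Calling gzdecode() unconditionally fails on a server that sends the file uncompressed. A sketch that decodes only when the response says so, assuming the zlib extension; `decodeByContentEncoding()` is a hypothetical helper, and the header list mimics the raw lines PHP stores in `$http_response_header` after file_get_contents() on an http:// URL (with redirects, lines from earlier responses are included too). Only gzip is handled here.

```php
<?php
// Hypothetical helper: gunzip $body when a Content-Encoding: gzip header is present.
function decodeByContentEncoding(string $body, array $headers): string
{
    foreach ($headers as $line) {
        // Header names are case-insensitive, hence the /i flag.
        if (preg_match('/^Content-Encoding:\s*gzip\b/i', $line)) {
            $decoded = gzdecode($body);
            if ($decoded === false) {
                throw new RuntimeException('Content-Encoding is gzip but the body did not decode');
            }
            return $decoded;
        }
    }
    return $body; // no gzip header: return the body unchanged
}

// Offline stand-in for the stylesheet download.
$css = '@charset "UTF-8"; body { margin: 0; }';
$headers = ['HTTP/1.1 200 OK', 'Content-Type: text/css', 'content-encoding: gzip'];
echo decodeByContentEncoding(gzencode($css), $headers), "\n"; // prints $css
```

Against a live URL it would be called as decodeByContentEncoding(file_get_contents($url), $http_response_header).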

Why does my cURL request with a password fail sometimes?

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $PathUrl);
curl_setopt($ch, CURLOPT_USERPWD, 'someuser:somepass');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
$info = curl_getinfo($ch);
Any ideas why it works about 30% of the time and fails the other 70%? Viewing the URL in any browser works every time.
You may be better off setting the Authorization header via CURLOPT_HTTPHEADER.
E.g. curl_setopt($ch, CURLOPT_HTTPHEADER, array('Authorization: Basic ' . base64_encode('user:pass')));
Edit: this may not apply, since you say it works 30% of the time, but be aware of the common encodings for Auth headers, e.g. base64.
