PHP File Get Contents & String Encoding

Retrieved the contents of a css file: (http://gizmodo.com/assets/stylesheets/app-ecbc6044c59319aab4c2a1e31380ef56.css)
Detected the encoding with mb_detect_encoding... says UTF-8.
Viewed the page in a browser; it looks fine (readable) and declares @charset "UTF-8";
Tried to output the string, got garbage.
Tried to save it to a file, got garbage.
Tried to convert the encoding to ASCII, ISO-8859-1, and HTML-ENTITIES. No luck.
Any ideas here how to determine why this string is garbage, and how to fix it?

$url = 'http://gizmodo.com/assets/stylesheets/app-ecbc6044c59319aab4c2a1e31380ef56.css';
$ch = curl_init();
$timeout = 5; // seconds to wait while connecting
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // request gzip and let cURL decompress the response
$data = curl_exec($ch);
curl_close($ch);
echo $data;
The important line is:
curl_setopt($ch, CURLOPT_ENCODING, "gzip");

The Content-Encoding of the page you're trying to fetch is gzip. You'll need to uncompress it before using it.
I just tried the following and it worked fine:
echo gzdecode(file_get_contents($your_url));
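For a server that may or may not compress its responses, a minimal sketch of the decompression step could check for the gzip signature first. maybe_gunzip() is a hypothetical helper name, and the sketch assumes the zlib extension is loaded.

```php
<?php
// Hypothetical helper: returns the decoded body if $data looks gzip-compressed,
// otherwise returns $data unchanged.
function maybe_gunzip(string $data): string
{
    // Every gzip stream begins with the two magic bytes 0x1f 0x8b.
    if (strncmp($data, "\x1f\x8b", 2) === 0) {
        $decoded = gzdecode($data);
        if ($decoded !== false) {
            return $decoded;
        }
    }
    return $data;
}

// Stand-in for a fetched stylesheet; gzencode() simulates a compressed response.
$css = '@charset "UTF-8"; body { margin: 0; }';
echo maybe_gunzip(gzencode($css)), "\n"; // compressed input is decoded
echo maybe_gunzip($css), "\n";           // plain input passes through
```

With cURL, setting CURLOPT_ENCODING as shown in the other answer makes this step unnecessary, because cURL decodes the body itself.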

Related

PHP file_get_contents only works for some servers

I am working on a website that will have a large number of files, so I set up a separate server for files such as images and text files. The problem is that PHP's file_get_contents function does not work for this server.
I have tried echo file_get_contents("http://url"); and I get nothing, but when I do echo file_get_contents("http://google.com"); I get Google's homepage. The same happens with a cURL connection:
$ch = curl_init();
$url = "http://running-files.rf.gd/hello.html";
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$body = curl_exec($ch);
$info = curl_getinfo($ch);
$error = curl_errno($ch);
curl_close($ch);
echo $body;
My guess is that something is needed in the .htaccess file. Does anyone have suggestions?

If you're opening a URI that contains special characters (such as spaces), you need to encode the affected part with urlencode().
Example:
file_get_contents("http://domain-name.com?id=".urlencode("something with special characters"));
In most cases, "something with special characters" will be a variable.

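The urlencode() advice above can be sketched as follows. The value of $id and the page parameter are made up for illustration; domain-name.com is the placeholder host from the answer.

```php
<?php
// Illustrative value containing spaces and a reserved character.
$id = "something with special characters & more";

// urlencode() turns spaces into "+" and escapes reserved characters such as "&".
$url = "http://domain-name.com/?id=" . urlencode($id);
echo $url, "\n";

// http_build_query() applies the same encoding to every value in an array.
$params = ['id' => $id, 'page' => 2];
echo "http://domain-name.com/?" . http_build_query($params), "\n";
```

Only the parameter value is encoded here; the scheme, host and "?" separator are left alone.
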
php curl japanese output garbled

Consider the following URL:
click here
The page contains text encoded in Japanese characters. The Firefox browser on my PC detects the encoding automatically and shows the characters. In Chrome, on the other hand, I have to change the encoding manually to "Shift_JIS" to see the Japanese characters.
If I try to access the content via PHP cURL, the encoded text appears garbled like this:
���ϕi�̂��ƂȂ��I�݂��Ȃ̃N�`�R�~�T�C�g�������������i�A�b�g�R�X���j�ɂ��܂����I
I tried:
curl_setopt($ch, CURLOPT_ENCODING, 'Shift_JIS');
I also tried (after downloading the curl response):
$output_str = mb_convert_encoding($curl_response, 'Shift_JIS', 'auto');
$output_str = mb_convert_encoding($curl_response, 'SJIS', 'auto');
But that does not work either.
Here is the full code
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.5',
'Connection: keep-alive'
));
//curl_setopt($ch, CURLOPT_ENCODING, 'SJIS');
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
$response = curl_exec($ch);

That page doesn't return valid HTML; it's actually JavaScript. If you fetch it with curl and output it, add header('Content-type: text/html; charset=shift_jis'); to your code, and when you load it in Chrome the characters will display properly.
Since the HTML doesn't specify the character set, you can specify it from the server using header().
To actually convert the encoding so it will display properly in your terminal, you can try the following:
Use iconv() to convert to UTF-8
$curl_response = iconv('shift-jis', 'utf-8', $curl_response);
Use mb_convert_encoding() to convert to UTF-8
$curl_response = mb_convert_encoding($curl_response, 'utf-8', 'shift-jis');
Both of those methods worked for me and I was able to see Japanese characters displayed correctly on my terminal.
UTF-8 should be fine, but if you know your system is using something different, you can try that instead.
Hope that helps.
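Note that mb_convert_encoding() takes the target encoding before the source encoding, so the attempts in the question were converting to Shift_JIS rather than from it. A minimal sketch of the conversion, assuming the mbstring extension is loaded; the byte string below is a stand-in for the real cURL response.

```php
<?php
// "日本語" encoded as Shift_JIS, written as raw bytes so that this file's own
// encoding does not matter. It stands in for the cURL response body.
$curl_response = "\x93\xFA\x96\x7B\x8C\xEA";

// Argument order: string, target encoding, source encoding.
$utf8 = mb_convert_encoding($curl_response, 'UTF-8', 'SJIS');

echo $utf8, "\n";
// Confirm that the result is well-formed UTF-8.
var_dump(mb_check_encoding($utf8, 'UTF-8'));
```

Naming the source encoding explicitly avoids relying on 'auto' detection, which has to guess from the bytes alone.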

The following code will output the Japanese characters correctly in the browser:
<?php
// create a new cURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, $setUrlHere);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
// grab URL content
$response = curl_exec($ch);
// close cURL resource, and free up system resources
curl_close($ch);
header('Content-type: text/html; charset=shift_jis');
echo $response;

How to pass a UTF-8 URL to cURL?

I have a table stored in UTF-8. I want to read a URL from the database and then fetch its content with cURL. I pass the URL in UTF-8, but I can't get the content. What should I do to solve the problem?
I use the code below, but it doesn't work:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$this->content = curl_exec($ch);
curl_close($ch);

Try applying utf8_encode() and then urlencode() to the URL:
$url = urlencode(utf8_encode($url));
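One caution on the suggestion above: urlencode() applied to a whole URL also escapes the "://" and "/" characters, and utf8_encode() assumes ISO-8859-1 input (it is deprecated as of PHP 8.2), so a string that is already UTF-8 would be encoded twice. A minimal sketch of an alternative, assuming the URL can be assembled from parts that are already UTF-8: percent-encode each path segment and query value separately. build_url() and the example.com address are hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper: joins a base URL, path segments and query parameters,
// percent-encoding each segment and value on its own.
function build_url(string $base, array $segments, array $query = []): string
{
    // rawurlencode() escapes UTF-8 bytes as %XX and spaces as %20.
    $path = implode('/', array_map('rawurlencode', $segments));
    $url = rtrim($base, '/') . '/' . $path;
    if ($query !== []) {
        // PHP_QUERY_RFC3986 makes http_build_query() use %20 instead of "+".
        $url .= '?' . http_build_query($query, '', '&', PHP_QUERY_RFC3986);
    }
    return $url;
}

// "\u{00E9}" is the UTF-8 character e-acute, as it might come from the database.
$url = build_url('http://example.com', ["caf\u{00E9}", 'menu'], ['q' => 'a b']);
echo $url, "\n";
```

The resulting string contains only ASCII characters and can be passed to CURLOPT_URL as is.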

Unable to use file_get_contents(), returns nothing

I'm trying to get some data from a website that is not mine, using this code.
<?php
$text = file_get_contents("https://ninjacourses.com/explore/4/");
echo $text;
?>
However, nothing is being echoed, and the string length is 0.
I've done this method before, and it has worked no problem, but with this website, it is not working at all.
Thanks!

I managed to get the contents using curl like this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, "https://ninjacourses.com/explore/4/");
$result = curl_exec($ch);
curl_close($ch);

cURL lets you request a URL from your code and get an HTML response back. cURL stands for "client URL"; it allows you to connect to other URLs and use their responses in your code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, "https://ninjacourses.com/explore/4/");
$result = curl_exec($ch);
curl_close($ch);
I think this is useful for you: curl-with-php, and another.
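When a request returns nothing, cURL can also say why. The following is a rough sketch of how the failure reason might be surfaced; fetch_with_diagnostics() is a hypothetical helper, and the demo URL deliberately uses an unsupported scheme so the sketch runs without network access.

```php
<?php
// Hypothetical helper: fetches $url and reports why a request failed.
function fetch_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    // curl_exec() returns false on failure; an empty string is a valid body.
    $body = curl_exec($ch);
    $result = [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
    ];
    curl_close($ch);
    return $result;
}

// Deliberately broken URL to show how a failure is reported.
$r = fetch_with_diagnostics('nosuchscheme://example.invalid/');
if ($r['body'] === null) {
    echo "Request failed: ", $r['error'], "\n";
} else {
    echo "HTTP ", $r['status'], ", ", strlen($r['body']), " bytes\n";
}
```

Passing the URL from the question instead would show whether the empty result comes from a transport error, a redirect, or an HTTP error status.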

Incorrect MD5 checksum after downloading with CURL in PHP - file_get_contents works fine

I have a script where I have to download some files and to make sure that everything worked fine I'm comparing MD5 checksums.
I have found that the checksums are not correct when downloading with CURL. The script below demonstrates this. It downloads the Google logo and compares checksums.
$url = 'http://www.google.com/intl/en_ALL/images/logo.gif';
echo md5_file($url)."\n";
$path = 'f1';
file_put_contents($path, file_get_contents($url));
echo md5_file($path)."\n";
$path = 'f2';
$out = fopen($path, 'wb');
$ch = curl_init();
curl_setopt($ch, CURLOPT_FILE, $out);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
curl_close($ch);
echo md5_file($path)."\n";
The output is:
e80d1c59a673f560785784fb1ac10959
e80d1c59a673f560785784fb1ac10959
d83892759d58a1281e3f3bc7503159b5
The first two are correct (they match the MD5 checksum when I download the logo using Firefox), but the result produced by cURL is wrong.
Any ideas how to fix that?
Thanks for your help.
UPDATE:
Interestingly, the code below works just fine and produces the correct output. The problem really only seems to exist when cURL writes directly to a file. Unfortunately, I have to save directly to a file since the files I'm downloading can get rather large.
$path = 'f3';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
file_put_contents($path, curl_exec($ch));
echo md5_file($path)."\n";
curl_close($ch);

You're missing an fclose($out), which could account for md5_file seeing an incomplete file.

Try adding
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
to your cURL options.
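The fclose() suggestion can be sketched like this. download_to_file() is a hypothetical helper, and the demo reads from a local file:// URL (assuming a Unix-like path and a libcurl build with the file protocol enabled) so that it runs without network access. As far as I know, CURLOPT_BINARYTRANSFER has had no effect since PHP 5.1.3, so the sketch leaves it out.

```php
<?php
// Hypothetical helper: streams $url into $path and returns the file's MD5.
function download_to_file(string $url, string $path): string
{
    $out = fopen($path, 'wb');
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FILE, $out); // write the body straight to the handle
    curl_setopt($ch, CURLOPT_HEADER, false);
    $ok = curl_exec($ch);
    curl_close($ch);
    fclose($out); // close the handle so all bytes are on disk before hashing

    if ($ok === false) {
        throw new RuntimeException("Download failed: $url");
    }
    return md5_file($path);
}

// Demo: copy a local temporary file through cURL and compare checksums.
$source = tempnam(sys_get_temp_dir(), 'src');
file_put_contents($source, random_bytes(100000));
$target = tempnam(sys_get_temp_dir(), 'dst');

$md5 = download_to_file('file://' . $source, $target);
var_dump($md5 === md5_file($source));
```

Because the body is written to the handle as it arrives, memory use stays flat even for large downloads.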
