Downloading an image using cURL
https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg
When saving this image manually from the browser to the local PC, the size shown by the system is 139,880 bytes.
When downloading it with cURL, the file appears to be damaged and is not recognized as a valid image.
Its size, when downloaded with cURL, is 139,845 bytes, which is smaller than the manual download.
Digging into the issue further, I found that the server returns the following content length in the response headers:
content-length: 139845
This length is identical to what cURL downloaded, so I suspect that cURL closes the transfer once it reaches the (possibly wrong) length reported by the server.
Is there any way to make cURL download the file completely even if the Content-Length header is wrong?
Used code:
//curl ini
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER,0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,20);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.bing.com/');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8');
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // Good leeway for redirections.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Many login forms redirect at least once.
curl_setopt($ch, CURLOPT_COOKIEJAR , "cookie.txt");
//curl get
$x='error';
$url='https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg';
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_URL, trim($url));
$exec=curl_exec($ch);
$x=curl_error($ch);
$fp = fopen('test.jpg','x');
fwrite($fp, $exec);
fclose($fp);
The server has a buggy implementation of the Accept-Encoding compressed transfer mechanism.
The response is ALWAYS gzip-compressed, but the server won't tell the client that it's gzip-compressed unless the client sends an Accept-Encoding: gzip header in the request. When the server doesn't tell the client that the body is gzipped, the client won't gzip-decompress it before saving it, hence your corrupted download. Tell cURL to offer gzip compression by setting CURLOPT_ENCODING:
curl_setopt($ch,CURLOPT_ENCODING,'gzip');
Then the server will tell cURL that the response is gzip-compressed, and cURL will decompress it for you before handing it to PHP.
You should probably tell the server admin about this; it's a serious bug in their web server that corrupts downloads.
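For completeness, here is a minimal sketch of the corrected download based on the question's code; the only functional change is the added CURLOPT_ENCODING option (the file name test.jpg is just the one used in the question):
$ch = curl_init('https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // advertise gzip support so cURL decompresses the response
$data = curl_exec($ch);
if ($data === false) {
    die('cURL error: ' . curl_error($ch));
}
curl_close($ch);
file_put_contents('test.jpg', $data); // should now be the full, valid JPEG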
libcurl has an option for that called CURLOPT_IGNORE_CONTENT_LENGTH. Unfortunately it is not natively exposed in PHP, but you can trick PHP into setting the option anyway by using the correct magic number (which, at least on my system, is 136):
if (!defined('CURLOPT_IGNORE_CONTENT_LENGTH')) {
    define('CURLOPT_IGNORE_CONTENT_LENGTH', 136);
}
if (!curl_setopt($ch, CURLOPT_IGNORE_CONTENT_LENGTH, 1)) {
    throw new \RuntimeException('failed to set CURLOPT_IGNORE_CONTENT_LENGTH! - ' . curl_errno($ch) . ': ' . curl_error($ch));
}
You can find the correct number for your system by compiling and running the following C++ code:
#include <iostream>
#include <curl/curl.h>

int main() {
    std::cout << CURLOPT_IGNORE_CONTENT_LENGTH << std::endl;
}
But it's probably 136.
Lastly, pro tip: file_get_contents ignores the Content-Length header altogether and just keeps downloading until the server closes the connection (which is potentially much slower than cURL). Also, you should probably contact the server operator and let them know that something is wrong with their server.
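If the slower file_get_contents route is acceptable, a minimal sketch (using the image URL from the question, and assuming the body bytes themselves arrive intact):
$data = file_get_contents('https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg');
if ($data === false) {
    die('download failed');
}
// file_get_contents keeps reading until the server closes the connection,
// so a wrong Content-Length does not truncate the file.
file_put_contents('test.jpg', $data);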
Related
I have audio files on a remote server that are streamed / chunked to the user. This all works great in the client's browser.
But when I try to download and save the files locally from another server using cURL, it only seems to download small files (less than 10 MB) successfully; anything larger and it seems to download only the header.
I assume this is because of the chunking, so my question is: how do I make cURL download the larger (chunked) files?
With wget on the CLI on Linux this is as simple as:
wget -cO - https://example.com/track?id=460 > mytrack.mp3
This is the function I have written using cURL in PHP, but as I say it only downloads the headers on large files:
private function downloadAudio($url, $fn)
{
    $ch = curl_init($url);
    $path = TEMP_DIR . $fn;
    $fp = fopen($path, 'wb');
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);

    if (file_exists($path)) {
        return true;
    }
    return false;
}
In my case it was failing as I had forgotten to increase the default PHP memory_limit on the origin server.
It turned out after posting this question that it was actually downloading any file below roughly the 100 MB mark successfully, not 10 MB as I had stated in the question. As soon as I realised this I checked the memory_limit and, lo and behold, it was set to the default 128M.
I hadn't noticed any problems client-side as the files were being chunked, but when the server tried to grab an entire 300 MB file in less than a second, the memory limit must have been reached.
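If you run into the same thing, a quick sketch for checking and, where appropriate, raising the limit on the downloading server (512M is just an example value; pick one that suits your setup):
// Inspect the current limit before pulling large files through PHP.
echo 'memory_limit is ' . ini_get('memory_limit') . PHP_EOL;

// Raise it for this script only; adjust the value to your server.
ini_set('memory_limit', '512M');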
I have the following PHP script that works perfectly 99% of the time, but it will not download an image from this one server, which I think is running Varnish Cache.
<?php
$imglink = 'http://www.dirtrider.com/wp-content/uploads/2014/10/WD-10_1_14-001.jpg';
$ch = curl_init($imglink);
$fp = fopen('/home/path/www/tmp/test.jpg', "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
curl_close($ch);
fclose($fp);
You get a 403 Forbidden error if you use cURL to load that image. You can work around this error very easily: just add an alternate user agent for your cURL request:
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
Et voila! It works like a charm. It seems that Varnish Cache blocks cURL requests that use cURL's default user agent.
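One caveat: $_SERVER['HTTP_USER_AGENT'] is only populated when the script is handling a browser request; if it can also run from the CLI or a cron job, a fallback string (the one below is just an example) avoids sending an empty user agent:
$ua = isset($_SERVER['HTTP_USER_AGENT'])
    ? $_SERVER['HTTP_USER_AGENT']
    : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0';
curl_setopt($ch, CURLOPT_USERAGENT, $ua);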
Here's the script I use to download an image from Facebook
function downloadImage($image_url)
{
    // Set filename
    $filename = dirname(__FILE__) . '/wow.jpg';

    // Proceed to download: open file to save
    $file = fopen($filename, 'w');

    // Use curl
    $ch = curl_init($image_url);

    // Set options
    curl_setopt($ch, CURLOPT_FILE, $file);
    curl_setopt($ch, CURLOPT_ENCODING, '');
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0');
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);

    // Execute
    $data = curl_exec($ch);

    // Close curl
    curl_close($ch);

    // Close file
    fclose($file);
}

// Download image
downloadImage('https://graph.facebook.com/WowSuchPage/picture?width=720&height=720');
The download succeeds, but when I open the image file it appears to be broken.
This only occurs when the image source is from Facebook; any other domain is OK. I don't think it has anything to do with my ISP, because if I download the image through my browser it's fine.
I hope you can help me with this one, as it has been bugging me for some time now. Thanks!
EDIT
By the way, I'm using WampServer 2.4 on localhost. My PHP version is 5.4.12.
FIXED
Alright, I finally found the issue. It seems that either cURL or the SSL component/extension in my local WampServer is the root of the problem; I only had to use "http://" rather than "https://" to fetch the image perfectly.
But it would really be great if I could still download the image correctly over https://, so I won't close this one yet until I find some answers. Thanks for your help!
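For anyone who wants to keep https:// working on a local WAMP setup, a commonly suggested approach (a sketch, not verified against this exact environment) is to point cURL at an up-to-date CA bundle instead of disabling peer verification; the cacert.pem path below is an assumption about where you saved the bundle:
// Verify the certificate against a local CA bundle instead of turning
// verification off. Download cacert.pem from the curl project and adjust
// the path to wherever you keep it (the path here is just an example).
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_CAINFO, 'C:/wamp/certs/cacert.pem');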
I wish to download files from my web server with download progress information. For that purpose, PHP cURL seems to be the best choice.
However, I have the problem that the downloaded files are not placed into the Downloads folder, where files downloaded from the web normally go. I use the following file download routine:
$fp = fopen(dirname(__FILE__) . '/uploaded.pdf', 'w+');
$url = "file:///D:/WEB/SAIFA/WWW/PickUpTest.pdf";
$ch = curl_init(str_replace(" ", "%20", $url));
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_BUFFERSIZE, 1024 * 8);
curl_setopt($ch, CURLOPT_NOPROGRESS, false);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, 'progressCallback');
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 0);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
unset($fp);
My problem is that, instead of the Downloads folder, the file is silently downloaded into my WWW folder, where my PHP scripts (including this cURL one) reside. I don't get a File Download / Save As dialog box either.
To force the Save As dialog box, I added the following header at the beginning of the script:
header("Content-Disposition: attachment; filename=\"uploaded.pdf\"");
$fp = fopen(dirname(__FILE__) . '/uploaded.pdf', 'w+');
...
After adding the header I do get the Save As dialog box; however, the file is still silently downloaded into the folder with my PHP scripts, and in the Downloads folder a file 'uploaded.pdf' with a file size of 0 is saved.
My question is: how do I make PHP cURL download files properly, place them into the Downloads folder, and offer a Save As dialog box?
I use:
WAMP
Windows 7
PHP Version 5.4.12
Curl Version 7.29.0
By using the file functions you're actually asking your server to save the file, so it makes sense that the results of the cURL call end up in your PHP folder.
What you really want, if I understand the problem, is to send the results of the cURL call back to the browser. You're halfway there by sending the header(...), which lets the user's browser know a file is coming and should be downloaded; the step you've missed is sending the cURL data along with that header.
You could echo the contents of the file after you've saved it or, more efficiently (assuming you don't want an extra copy of the file), remove the code that saves the file locally and remove the cURL option CURLOPT_RETURNTRANSFER. That will tell cURL to send its output directly, so it becomes the data for the download.
Hope that helps!
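If you do want to keep a local copy as well, here is a rough sketch of the first approach: save with cURL exactly as in the question, then send the saved file back (readfile is used here instead of echoing a string to keep memory use low; the path and filename are just the ones from the question):
// Let the browser know a download is coming.
header('Content-Disposition: attachment; filename="uploaded.pdf"');
header('Content-Type: application/pdf');

// Assume the cURL code from the question has already saved the file here.
$path = dirname(__FILE__) . '/uploaded.pdf';

// Stream the saved copy back to the browser so it ends up in Downloads.
header('Content-Length: ' . filesize($path));
readfile($path);
exit;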
EDIT A simple example that grabs a local file (C:\test.pdf) and sends it to the user's browser (as uploaded.pdf).
<?php
header("Content-Disposition: attachment; filename=\"uploaded.pdf\"");

// Get a FILE url to my test document
$url = 'file://c:/test.pdf';
$url = str_replace(" ", "%20", $url);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_exec($ch);
curl_close($ch);
Hope that helps a bit more!
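One small optional addition of mine (not part of the original answer): sending a Content-Type header alongside Content-Disposition tells the browser what kind of file it is receiving:
header('Content-Type: application/pdf'); // identify the payload as a PDF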
I am using cURL and PHP to find out information about a given URL (e.g. HTTP status code, MIME type, HTTP redirect location, page title, etc.).
$ch = curl_init($url);
$useragent = "Mozilla/5.0 (X11; U; Linux x86_64; ga-GB) AppleWebKit/532.9 (KHTML, like Gecko) Chrome/5.0.307.11 Safari/532.9";
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    "Accept: application/rdf+xml;q=0.9, application/json;q=0.6, application/xml;q=0.5, application/xhtml+xml;q=0.3, text/html;q=0.2, */*;q=0.1"
));
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$content = curl_exec($ch);
$chinfo = curl_getinfo($ch);
curl_close($ch);
This generally works well. However, if the URL points to a large file, I get a fatal error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate 14421576 bytes)
Is there any way of preventing this? For example, by telling cURL to give up if the file is too large, or by catching the error?
As a workaround, I've added
curl_setopt($ch, CURLOPT_TIMEOUT, 3);
which assumes that any file that takes longer than 3 seconds to load will exhaust the allowed memory, but this is far from satisfactory.
Have you tried using CURLOPT_FILE to save the file directly to disk instead of using memory? You can even specify /dev/null to put it nowhere at all...
Or, you can use CURLOPT_WRITEFUNCTION to set a custom data-writing function. Have the function just scan the headers and then throw away the actual data.
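A rough sketch of that idea (the variable names and the 1 MB cap are arbitrary choices for illustration; with a write callback in place you would drop CURLOPT_RETURNTRANSFER from the code above):
// Collect the response headers and keep only the first part of the body,
// which is usually enough to find a <title> tag.
$maxBytes = 1024 * 1024;
$body     = '';
$headers  = '';

curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$headers) {
    $headers .= $line;
    return strlen($line);      // report the whole header line as handled
});

curl_setopt($ch, CURLOPT_WRITEFUNCTION, function ($ch, $data) use (&$body, $maxBytes) {
    if (strlen($body) < $maxBytes) {
        $body .= $data;        // keep the beginning of the page
    }
    return strlen($data);      // claim the rest so the transfer is not aborted
});

curl_exec($ch);
$chinfo = curl_getinfo($ch);   // status code, content type, redirect count, etc.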
Alternately, give PHP some more memory via php.ini.
If you're only after the header information, why not use a HEAD request? That avoids the memory cost of pulling the whole page into a maximum 16 MiB memory slot.
curl_setopt($ch, CURLOPT_NOBODY, true);  // issue a HEAD request (no response body)
curl_setopt($ch, CURLOPT_HEADER, true);  // include the response headers in the output
Then, for the page title, use file_get_contents() instead, as it's much better with its native memory allocation.
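A sketch of that suggestion (the 200 KB cap is an arbitrary example; file_get_contents takes a maximum length as its fifth argument, which keeps memory use bounded):
// Read at most the first 200 KB of the page; usually plenty to find <title>.
$html = file_get_contents($url, false, null, 0, 200 * 1024);
if ($html !== false && preg_match('/<title>(.*?)<\/title>/is', $html, $m)) {
    $title = trim($m[1]);
}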