Which is the fastest way to save an external file to my server, why, and how?
Using cURL:
$fp = fopen($local_file, 'w+');
$ch = curl_init($remote_file);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_exec($ch);
curl_close($ch);
fclose($fp);
Using copy():
copy($extFile, "report.csv");
It mostly depends on the protocol (for a local file, for instance, copy() would be faster), but since you say "remote file", curl will probably be faster. You're using CURLOPT_ENCODING and CURLOPT_FOLLOWLOCATION, so I assume the file is transferred over HTTP, where curl is generally much faster than copy() for at least two reasons:
1: PHP's fopen HTTP wrapper doesn't use compression, but setting CURLOPT_ENCODING to an empty string tells curl to use compression when possible (and while it depends on how libcurl was compiled, gzip and deflate support is usually compiled in). A sketch of the wrapper approach follows this list.
2: copy() keeps reading from the socket until the remote server closes the connection, which may be long after the file has been completely downloaded. curl, by contrast, reads only until it has received as many bytes as the Content-Length header announces and then closes the connection itself, which is often much faster than stalling on read() until the remote server hangs up.
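For comparison, here is a minimal sketch (assuming PHP 5.4+) of what requesting gzip through the plain HTTP stream wrapper would look like, using the same $remote_file and $local_file variables as above; note that, unlike curl, the wrapper will not decompress the body for you, so you have to check the response headers and call gzdecode() yourself:
// Ask the server for gzip via a stream context.
$context = stream_context_create([
    'http' => ['header' => "Accept-Encoding: gzip\r\n"],
]);
$raw = file_get_contents($remote_file, false, $context);
// The wrapper hands back the raw (possibly compressed) bytes;
// inspect the response headers and inflate manually if needed.
$gzipped = false;
foreach ($http_response_header as $header) {
    if (stripos($header, 'Content-Encoding: gzip') === 0) {
        $gzipped = true;
    }
}
file_put_contents($local_file, $gzipped ? gzdecode($raw) : $raw);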
But the only way to know for sure is, of course, to try it and see:
$starttime=microtime(true);
$fp = fopen($local_file, 'w+');
$ch = curl_init($remote_file);
curl_setopt($ch, CURLOPT_TIMEOUT, 50);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, "");
curl_exec($ch);
curl_close($ch);
fclose($fp);
echo "used ".(microtime(true)-$starttime)." seconds.\n";
vs
$starttime=microtime(true);
copy($extFile, "report.csv");
echo "used ".(microtime(true)-$starttime)." seconds.\n";
This gives you roughly microsecond precision (IEEE 754 double-precision floating-point rounding probably distorts it somewhat, but probably not enough to matter).
Related
I'm using basic cURL requests to fetch web pages in PHP; however, these pages are large and my bandwidth usage is limited.
Is there a way to reduce or optimize cURL's data usage, for example by using compression? I have also heard that Brotli compression is the best, but I'm not sure how to use it.
$headers[] = "Accept-Encoding: gzip"; // tell the server you accept gzip
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body to $data instead of printing it
curl_setopt($ch, CURLOPT_ENCODING, "gzip"); // tells curl to gunzip it automatically
$data = curl_exec($ch);
I haven't tried this with Brotli; support varies with the libcurl and PHP versions, which you didn't tell us about.
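As a rough sketch (assuming a reasonably recent PHP and libcurl), you can pass an empty string to CURLOPT_ENCODING on the same handle to let libcurl offer every encoding it was built with, Brotli included if present, and you can check what your build actually supports with curl_version():
// Empty string = offer every encoding this libcurl build supports
// (typically gzip and deflate; br as well if libcurl was built with Brotli).
curl_setopt($ch, CURLOPT_ENCODING, "");
// Inspect the local libcurl build; CURL_VERSION_BROTLI is only defined
// on newer PHP versions, hence the defined() guard.
$v = curl_version();
if (defined('CURL_VERSION_BROTLI') && ($v['features'] & CURL_VERSION_BROTLI)) {
    echo "libcurl {$v['version']} supports Brotli\n";
}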
Downloading an image using cURL
https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg
When saving this image manually from the browser to the local PC, the size shown by the system is 139,880 bytes.
When downloading it using cURL, the file appears to be damaged and is not considered a valid image.
Its size, when downloaded using cURL, is 139,845 bytes, which is lower than the size of the manual download.
Digging into the issue further, I found that the server returns the content length in the response headers as
content-length: 139845
This length is identical to what cURL downloaded, so I suspect cURL closes the transfer once it reaches the (possibly wrong) length claimed by the server.
Is there any way to make cURL download the file completely even if the Content-Length header is wrong?
The code used:
//curl ini
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER,0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,20);
curl_setopt($ch, CURLOPT_REFERER, 'http://www.bing.com/');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.8) Gecko/2009032609 Firefox/3.0.8');
curl_setopt($ch, CURLOPT_MAXREDIRS, 5); // Good leeway for redirections.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // Many login forms redirect at least once.
curl_setopt($ch, CURLOPT_COOKIEJAR , "cookie.txt");
//curl get
$x='error';
$url='https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg';
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_URL, trim($url));
$exec=curl_exec($ch);
$x=curl_error($ch);
$fp = fopen('test.jpg','x');
fwrite($fp, $exec);
fclose($fp);
The server has a bugged implementation of the Accept-Encoding compressed-transfer mechanism.
The response is ALWAYS gzip-compressed, but the server won't tell the client that it's gzip-compressed unless the client sends an Accept-Encoding: gzip request header. When the server doesn't announce that the body is gzipped, the client won't decompress it before saving it, hence your corrupted download. Tell curl to offer gzip compression by setting CURLOPT_ENCODING:
curl_setopt($ch,CURLOPT_ENCODING,'gzip');
Then the server will tell curl that the response is gzip-compressed, and curl will decompress it for you before handing it to PHP.
You should probably tell the server admin about this; it's a serious bug in their web server that corrupts downloads.
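A minimal sketch of the corrected download, using the URL and filename from the question, with a sanity check that the saved bytes really start with the JPEG magic number:
$ch = curl_init('https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip'); // the fix: announce gzip so curl also decodes it
$img = curl_exec($ch);
curl_close($ch);
// JPEG files start with the bytes FF D8; if this check fails,
// the body is still compressed or otherwise mangled.
if ($img !== false && substr($img, 0, 2) === "\xFF\xD8") {
    file_put_contents('test.jpg', $img);
} else {
    echo "download still looks corrupted\n";
}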
libcurl has an option for that called CURLOPT_IGNORE_CONTENT_LENGTH. Unfortunately, it is not natively exposed in PHP, but you can trick PHP into setting the option anyway by using the correct magic number (which, at least on my system, is 136):
if (!defined('CURLOPT_IGNORE_CONTENT_LENGTH')) {
    define('CURLOPT_IGNORE_CONTENT_LENGTH', 136);
}
if (!curl_setopt($ch, CURLOPT_IGNORE_CONTENT_LENGTH, 1)) {
    throw new \RuntimeException('failed to set CURLOPT_IGNORE_CONTENT_LENGTH! - ' . curl_errno($ch) . ': ' . curl_error($ch));
}
You can find the correct number for your system by compiling and running the following C++ code:
#include <iostream>
#include <curl/curl.h>

int main() {
    std::cout << CURLOPT_IGNORE_CONTENT_LENGTH << std::endl;
}
But it's probably 136.
Lastly, a pro tip: file_get_contents() ignores the Content-Length header altogether and just keeps downloading until the server closes the connection (which is potentially much slower than curl). Also, you should probably contact the server operator and let them know that something is wrong with their server.
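If you want to go that route, a sketch of the file_get_contents() approach for the same image is simply the following (note that it only sidesteps the Content-Length problem; it does not fix the compression issue described in the other answer):
// file_get_contents() keeps reading until the server closes the connection,
// so a wrong Content-Length header cannot truncate the download.
$img = file_get_contents('https://cdni.rt.com/deutsch/images/2018.04/article/5ac34e500d0403503d8b4568.jpg');
if ($img !== false) {
    file_put_contents('test.jpg', $img);
}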
I have the following PHP script that works perfectly 99% of the time, but it will not download an image from this one server, which I think is running Varnish Cache.
<?php
$imglink = 'http://www.dirtrider.com/wp-content/uploads/2014/10/WD-10_1_14-001.jpg';
$ch = curl_init($imglink);
$fp = fopen('/home/path/www/tmp/test.jpg', "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_exec($ch);
fclose($fp);
You get a 403 Forbidden error if you use cURL to load that image. You can work around this very easily: just add an alternate user agent to your cURL request:
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
Et voilà! It works like a charm. It seems that Varnish Cache blocks cURL requests that use cURL's default user agent.
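Putting it together, here is a sketch of the script from the question with the user-agent line added; a fixed browser string is used as a fallback (assuming PHP 7+ for the ?? operator) in case the script runs from the CLI, where $_SERVER['HTTP_USER_AGENT'] isn't set:
$imglink = 'http://www.dirtrider.com/wp-content/uploads/2014/10/WD-10_1_14-001.jpg';
$ch = curl_init($imglink);
$fp = fopen('/home/path/www/tmp/test.jpg', "w");
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
// Any browser-like user agent works; the server only rejects curl's default one.
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT'] ?? 'Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0');
curl_exec($ch);
curl_close($ch);
fclose($fp);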
This is very weird: my PHP cURL download stops at 95% every time, and it's driving me crazy.
Here is the code I'm using, nothing fancy:
$fp = fopen($file, 'w');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://www.domain.com/");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("ETag: $rddash"));
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
Something I noticed: the remote website uses an ETag, so I sent it too, but it still doesn't work.
What could be the reason the download stops before it completes?
It may be a timeout issue with your php.ini settings. Use set_time_limit(0); in your code.
See the manual for more details.
Also check the PHP error log; a diagnostic sketch follows.
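A minimal diagnostic sketch along those lines, using the same URL and $file from the question: lift PHP's time limit, give curl no hard timeout, and print what curl itself reports after the transfer:
set_time_limit(0);                              // lift PHP's max_execution_time
$fp = fopen($file, 'w');
$ch = curl_init('http://www.domain.com/');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_TIMEOUT, 0);           // no curl-side time limit
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$ok = curl_exec($ch);
// If the transfer really stops early, curl usually knows why.
if (!$ok) {
    echo 'curl error: ' . curl_error($ch) . "\n";
}
echo 'downloaded ' . curl_getinfo($ch, CURLINFO_SIZE_DOWNLOAD) . " bytes\n";
curl_close($ch);
fclose($fp);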
I have sites where some XML files are stored, and I want to download them to our server. We don't have an FTP connection, so we have to download them over HTTP. I have always used file($url); is there a better way to download files with PHP?
If you can access them via HTTP, file() (which reads the file into an array) and file_get_contents() (which reads the content into a string) are perfectly fine, provided the HTTP stream wrappers are enabled.
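A minimal sketch (the remote URL and the local path here are placeholders):
// Read the remote XML over HTTP and write it to a local file in one go.
$xml = file_get_contents('http://www.example.com/data.xml');
if ($xml !== false) {
    file_put_contents('/path/to/local/data.xml', $xml);
}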
Using cURL could also be a nice option:
// create a new CURL resource
$ch = curl_init();
// set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.server.com/file.zip");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
set_time_limit(300); # 5 minutes for PHP
curl_setopt($ch, CURLOPT_TIMEOUT, 300); # and also for CURL
$outfile = fopen('/mysite/file.zip', 'wb');
curl_setopt($ch, CURLOPT_FILE, $outfile);
// grab file from URL
curl_exec($ch);
fclose($outfile);
// close CURL resource, and free up system resources
curl_close($ch);