PHP - Safe way to download large files?

Information
There are many ways to download files in PHP: file_get_contents + file_put_contents, fopen, readfile and cURL.
Question?
When you have a large file, let's say 500 MB, on another server / domain, what is the "correct" way to download it safely? If the connection fails, it should find the position and continue, OR download the file again if it contains errors.
It's going to be used on a web site, not from the php.exe command line.
What I figured out so far
I've read about AJAX solutions with progress bars but what I'm really looking for is a PHP solution.
I don't need to buffer the file into a string like file_get_contents does. That probably uses a lot of memory as well.
I've also read about memory problems. A solution that doesn't use much memory would be preferred.
Concept
This is roughly what I want returned when the download fails.
function download_url( $url, $filename ) {
// Code
$success['success'] = false;
$success['message'] = 'File not found';
return $success;
}

The easiest way to copy large files is demonstrated in "Save large files from php stdin", but that answer does not show how to copy files with HTTP range requests.
$url = "http://REMOTE_FILE";
$local = __DIR__ . "/test.dat";
try {
$download = new Downloader($url);
$download->start($local); // Start Download Process
} catch (Exception $e) {
printf("Copied %d bytes\n", $pos = $download->getPos());
}
When an Exception occurs, you can resume the file download from the last position:
$download->setPos($pos);
Class used
class Downloader {
private $url;
private $length = 8192;
private $pos = 0;
private $timeout = 60;
public function __construct($url) {
$this->url = $url;
}
public function setLength($length) {
$this->length = $length;
}
public function setTimeout($timeout) {
$this->timeout = $timeout;
}
public function setPos($pos) {
$this->pos = $pos;
}
public function getPos() {
return $this->pos;
}
public function start($local) {
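// Probe the first two bytes to check partial (range) support
$part = $this->getPart("0-1");
// getPart() returns false or -1 on failure, and strlen(-1) is also 2, so check the type first
if (is_string($part) && strlen($part) === 2) {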
// Split data with curl
$this->runPartial($local);
} else {
// Use stream copy
$this->runNormal($local);
}
}
private function runNormal($local) {
$in = fopen($this->url, "r");
$out = fopen($local, 'w');
// Without range support the server always sends the file from byte 0,
// so restart the copy from the beginning instead of skipping ahead
$this->pos = stream_copy_to_stream($in, $out);
fclose($in);
fclose($out);
return $this->pos;
}
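private function runPartial($local) {
$i = $this->pos;
// 'c' creates the file if needed but does not truncate it,
// so a resumed download keeps the bytes written earlier
$fp = fopen($local, 'c');
fseek($fp, $this->pos);
while(true) {
// Range bounds are inclusive, so this requests exactly $this->length bytes
$data = $this->getPart(sprintf("%d-%d", $i, ($i + $this->length - 1)));
// Check for errors before writing anything
if ($data === - 1) {
fclose($fp);
throw new Exception("File Corrupted");
}
// false (416) or an empty string means there is nothing left to fetch
if ($data === false || $data === '')
break;
fwrite($fp, $data);
$i += strlen($data);
$this->pos = $i;
}
fclose($fp);
}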
private function getPart($range) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $this->url);
curl_setopt($ch, CURLOPT_RANGE, $range);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, $this->timeout);
$result = curl_exec($ch);
$code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
// Request not Satisfiable
if ($code == 416)
return false;
// Check 206 Partial Content
if ($code != 206)
return - 1;
return $result;
}
}
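
As an alternative to fetching the ranges by hand, cURL can also issue the range request itself. The sketch below is only an illustration under assumptions: the helper name resume_download() and the shape of the returned array (borrowed from the question's concept) are made up here, and it assumes the remote server honours range requests. CURLOPT_RESUME_FROM asks for the remainder of the file, and opening the local file in append mode keeps whatever an earlier attempt already saved.

```php
<?php
// Hypothetical helper: continues a download from the current size of $local.
function resume_download(string $url, string $local): array
{
    // Append mode: bytes from an earlier attempt are kept, never truncated.
    $out = fopen($local, 'ab');
    if ($out === false) {
        return ['success' => false, 'message' => 'Cannot open local file', 'bytes' => 0];
    }
    clearstatcache(true, $local);
    $offset = (int) filesize($local);

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $out);            // stream straight to disk, no big string in memory
    curl_setopt($ch, CURLOPT_RESUME_FROM, $offset);  // request the bytes from $offset onwards
    curl_setopt($ch, CURLOPT_FAILONERROR, true);     // treat HTTP status >= 400 as a failure
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    // Give up when the transfer stays below 1 KB/s for 30 seconds.
    curl_setopt($ch, CURLOPT_LOW_SPEED_LIMIT, 1024);
    curl_setopt($ch, CURLOPT_LOW_SPEED_TIME, 30);

    $ok = curl_exec($ch);
    $error = curl_error($ch);
    unset($ch);
    fclose($out);

    clearstatcache(true, $local);
    return [
        'success' => $ok !== false,
        'message' => $ok !== false ? 'Done' : $error,
        'bytes'   => (int) filesize($local),
    ];
}
```

Calling the function again after a failure simply continues from the bytes already on disk. Verifying the finished file (for example against a checksum published by the remote site) is still up to the caller.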

You'd want to download the remote file in chunks. This answer has a great example:
How download big file using PHP (low memory usage)
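
A minimal sketch of what "in chunks" means here, assuming both ends are already open as PHP stream resources (the function name copy_in_chunks() is made up for this illustration). Only one chunk is held in memory at a time, regardless of the file size.

```php
<?php
// Hypothetical helper: copies $in to $out one chunk at a time.
function copy_in_chunks($in, $out, int $chunkSize = 8192): int
{
    $total = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        // false means a read error; an empty string means EOF or a timeout
        if ($chunk === false || $chunk === '') {
            break;
        }
        $written = fwrite($out, $chunk);
        if ($written === false || $written < strlen($chunk)) {
            throw new RuntimeException('Write failed after ' . $total . ' bytes');
        }
        $total += $written;
    }
    return $total; // number of bytes copied
}
```

With allow_url_fopen enabled, $in could be the result of fopen($url, 'rb') and $out a local file opened with 'wb'.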

Related

"$keyMaterial must be a string, resource, or OpenSSLAsymmetricKey" Error in firebase/php-jwt decode function

I'm using the below php code to retrieve the keys from given google docs url and it's working fine because I can check them on my browser. The code is based on this answer.
<?php
require_once('../vendor/autoload.php');
require_once('../vendor/firebase/php-jwt/src/BeforeValidException.php');
require_once('../vendor/firebase/php-jwt/src/ExpiredException.php');
require_once('../vendor/firebase/php-jwt/src//SignatureInvalidException.php');
use \Firebase\JWT\JWT;
use \Firebase\JWT\Key;
$token = "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJpc3MiOiJmaXJlYmFzZS1hZG1pbnNkay04Y25oM0BvcmRlcnMtYXBwLTdiMTYxLmlhbS5nc2VydmljZWFjY291bnQuY29tIiwic3ViIjoiZmlyZWJhc2UtYWRtaW5zZGstOGNuaDNAb3JkZXJzLWFwcC03YjE2MS5pYW0uZ3NlcnZpY2VhY2NvdW50LmNvbSIsImF1ZCI6Im9yZGVycy1hcHAtN2IxNjEiLCJpYXQiOjE2NTU5MjY3NzAsImV4cCI6MTY1NTkzMDM3MCwidWlkIjoxLCJjbGFpbXMiOnsiZnVsbG5hbWUiOiJNYWRzb24ifX0.XVdrqlzKxEexcDwbRzxrPiVXwcV9WHPBjSvYxkO86DmSZXGzt2Fpqe-Vuy3qhDHD5B73vqnKRNxomPil47ig49AGJPmci9o0HeZCt1lr7WVtKyPa4uHudkLor3c3VrhXstfXFnrCo6t9UHDLmAPUjeLbKKA4w1mqygN7KCTMCXbKV7QQgqsVfxu0DdI4npuBWEBj3z0W3vJaXz0R3NvpdMWgrVvBc5YXGn_NB2JQ9HvrLG2U2WYvqKWtIJF5xrDKP48OgU1-DO82dQFu2ouLN0dOjnmbOLU8qlau21rXeCu0zMbJ5C-_5kI5EIsXSs22yYU-BPXsGRhRwRAOo85GSA";
$keys_file = "publicKeys.json"; // the file for the downloaded public keys
$cache_file = "pkeys.cache"; // this file contains the next time the system has to revalidate the keys
/**
* Checks whether new keys should be downloaded, and retrieves them, if needed.
*/
function checkKeys()
{
global $cache_file;
if (file_exists($cache_file)) {
$fp = fopen($cache_file, "r+");
if (flock($fp, LOCK_SH)) {
$contents = fread($fp, filesize($cache_file));
if ($contents > time()) {
flock($fp, LOCK_UN);
} elseif (flock($fp, LOCK_EX)) { // upgrading the lock to exclusive (write)
// here we need to revalidate since another process could've got to the LOCK_EX part before this
if (fread($fp, filesize($cache_file)) <= time()) {
refreshKeys($fp);
}
flock($fp, LOCK_UN);
} else {
throw new \RuntimeException('Cannot refresh keys: file lock upgrade error.');
}
} else {
// you need to handle this by signaling error
throw new \RuntimeException('Cannot refresh keys: file lock error.');
}
fclose($fp);
} else {
refreshKeys();
}
}
/**
* Downloads the public keys and writes them in a file. This also sets the new cache revalidation time.
* @param null $fp the file pointer of the cache time file
*/
function refreshKeys($fp = null)
{
global $keys_file;
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "https://www.googleapis.com/robot/v1/metadata/x509/securetoken@system.gserviceaccount.com");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$data = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$headers = trim(substr($data, 0, $header_size));
$raw_keys = trim(substr($data, $header_size));
if (preg_match('/age:[ ]+?(\d+)/i', $headers, $age_matches) === 1) {
$age = $age_matches[1];
if (preg_match('/cache-control:.+?max-age=(\d+)/i', $headers, $max_age_matches) === 1) {
$valid_for = $max_age_matches[1] - $age;
ftruncate($fp, 0);
fwrite($fp, "" . (time() + $valid_for));
fflush($fp);
// $fp will be closed outside, we don't have to
$fp_keys = fopen($keys_file, "w");
if (flock($fp_keys, LOCK_EX)) {
fwrite($fp_keys, $raw_keys);
fflush($fp_keys);
flock($fp_keys, LOCK_UN);
}
fclose($fp_keys);
}
}
}
/**
* Retrieves the downloaded keys.
* This should be called anytime you need the keys (i.e. for decoding / verification).
* @return null|string
*/
function getKeys()
{
global $keys_file;
$fp = fopen($keys_file, "r");
$keys = null;
if (flock($fp, LOCK_SH)) {
$keys = fread($fp, filesize($keys_file));
flock($fp, LOCK_UN);
}
fclose($fp);
return $keys;
}
checkKeys();
$pKeys_raw = getKeys();
// echo json_encode($pKeys_raw); exit;
if ($pKeys_raw) {
$pkeys = json_decode($pKeys_raw, true);
// $decodejwt = JWT::decode($token, $pkeys, ["RS256"]);
$decodejwt = JWT::decode($token, new Key($pkeys, "RS256"));
$decoded_array = (array) $decodejwt;
echo "Decode:\n" . print_r($decoded_array, true) . "\n";
} else {
echo "empty";
}
These are the keys from google api:
{
"1aef569f52414e9f4a7104b6d071f066dfeed677":
"-----BEGIN CERTIFICATE-----
\nMIIDHDCCAgSgAwIBAgIIQ8idkMV5aoQwDQYJKoZIhvcNAQEFBQAwMTEvMC0GA1UE
\nAwwmc2VjdXJldG9rZW4uc3lzdGVtLmdzZXJ2aWNlYWNjb3VudC5jb20wHhcNMjIw
\nNjA0MDkzODQyWhcNMjIwNjIwMjE1MzQyWjAxMS8wLQYDVQQDDCZzZWN1cmV0b2tl
\nbi5zeXN0ZW0uZ3NlcnZpY2VhY2NvdW50LmNvbTCCASIwDQYJKoZIhvcNAQEBBQAD
\nggEPADCCAQoCggEBAM9SHVisixHJe5omHxC4iUIdPoKmODvIkVWt4VgJQk4XNUn3
\nm8J1JRIVfIuNCLFiwvQUKu2Gb8e4pQQY0DAuTeno3NY+HLvb6dgq04tXWWo44IHQ
\n8t6IZoctzI9Vz41Vi/te9sk0fU5mMSX2zkQPmN4eSkwA9Vxcm1I1C+9m2njM6+Fy
\nrGfA5PPpFCKEU3rvWNalS/oOHQK9oG9ch4QXDm6ax6wgPXdxCMTm/oX58h+0d4F0
\n0iO20NEHFbjT5C+B4S+d4HOYVfY3tJOmtVBHxMNGe4N5LamsLQIqDRoQId14oT/A
\nYrFvp1RYLkkNXfiShmkHtgH9iutDi6as5LIzLgUCAwEAAaM4MDYwDAYDVR0TAQH/
\nBAIwADAOBgNVHQ8BAf8EBAMCB4AwFgYDVR0lAQH/BAwwCgYIKwYBBQUHAwIwDQYJ
\nKoZIhvcNAQEFBQADggEBAJjhWc3AO86f/5SFontdVUrRC+C7c+u9EyE8WMnEX5eK
\nU05vEiqqi22MR+Cv3SaB1gC/koKt7gGWKR+n7yRCdRHQALK0gSpIb6K4aSJR3qKW
\naR0TrXSisRVEHwMXVWAXMHM+jCHsFCDf4EJlm2CJMLODKNdwOsRdxG0No6sB7I92
\nattm8pJ2+qL+Q/Pe7NwTMd5PlEHxebJZFDAE5+F6QeO7hRPftA6B/PT+lTSRmdbS
\nRIJgAJmUFO5rSmcIsrcyCCrI9IbwKyA7qP8jKQ30ROHJyR10smTRYAIvpXhZbPm2
\nPxgtkJNN0GCVv7fLEnpWzF4+6nUe73sbdzLPdXIdL6A=
\n-----END CERTIFICATE-----\n",
"f90fb1ae048a548fb681ad6092b0b869ea467ac6":
"-----BEGIN CERTIFICATE-----
\nMIIDHTCCAgWgAwIBAgIJAI5jwaS/+yl0MA0GCSqGSIb3DQEBBQUAMDExLzAtBgNV
\nBAMMJnNlY3VyZXRva2VuLnN5c3RlbS5nc2VydmljZWFjY291bnQuY29tMB4XDTIy
\nMDYxMjA5Mzg0NVoXDTIyMDYyODIxNTM0NVowMTEvMC0GA1UEAwwmc2VjdXJldG9r
\nZW4uc3lzdGVtLmdzZXJ2aWNlYWNjb3VudC5jb20wggEiMA0GCSqGSIb3DQEBAQUA
\nA4IBDwAwggEKAoIBAQC/UMsDz3GlGlDZsDYq7//fjP3x4hKdPVygGADdA2OK2akz
\n7it/Wk80fowrE46PhnG/NJ4aU6MHteJDBfeVAn5kN5K9Ljl9YgqsNbfcDIWf5nhU
\nUktVFvuPiyotrrGxOPmuRskEPDAZsZc6jfujkTB+fRLnYYUOOXYAsp7EiC7txQoo
\nezKSv+HoPpF2HCke+Mb8fk6ar2ZjvAPEtO+1jKuk3fA40B/i4ywmf0YOAywC7tSS
\nENIgJfmOaFVQO9gkDcUqiQXKMbs91602eHTSmsv8K0fUGzx/TqxbBApAxMNSsoTI
\nQe5zZvNY18ZdGtz5z+BE1Y/2Tu/M5NwAgVJaUDsXAgMBAAGjODA2MAwGA1UdEwEB
\n/wQCMAAwDgYDVR0PAQH/BAQDAgeAMBYGA1UdJQEB/wQMMAoGCCsGAQUFBwMCMA0G
\nCSqGSIb3DQEBBQUAA4IBAQAjXtjKN4RPPNEVTDAWcOuao7kiD+8zjzz25aXz+32d
\nUawyBF602j3Q2hPIfLBp2Zja7crigzKHBXF7bixLkleKkSb/0HLwoNPH4AiPneJn
\njSVyvcOGQ4x4ktDwlYWQZJM8hGkurvf6IUf4uJf5wEyMM1qNDxlGdkXqe1L8Ub0x
\nIKvywHeCbjdySMoSC2+6fYxqnhVlmxBhsOfdvW6SxuyUWpkMpY/Q4KekTCU7NPpQ
\nF7hAypfuLYiEv/EJd0tSa6HLLQ10jP0042bqCJXWNmYF/zh1clGjlm3G96y89EjX
\nVAGeFTGwUgzF5WQCMFa9wx+8Ch1zEAxLREoQmbIkFCSs
\n-----END CERTIFICATE-----\n"
}
I'm trying to decode the token generated with my private key and my google service account email but it's returning the error below:
"$keyMaterial must be a string, resource, or OpenSSLAsymmetricKey"
Am I missing something? I've read some other answers and checked the server time, which is OK, so I couldn't find what's wrong.

Get size of readfile transferred [duplicate]

Is there a way to get the size of a remote file http://my_url/my_file.txt without downloading the file?

Found something about this here:
Here's the best way (that I've found) to get the size of a remote
file. Note that HEAD requests don't get the actual body of the request,
they just retrieve the headers. So making a HEAD request to a resource
that is 100MB will take the same amount of time as a HEAD request to a
resource that is 1KB.
<?php
/**
* Returns the size of a file without downloading it, or -1 if the file
* size could not be determined.
*
* @param $url - The location of the remote file to download. Cannot
* be null or empty.
*
* @return The size of the file referenced by $url, or -1 if the size
* could not be determined.
*/
function curl_get_file_size( $url ) {
// Assume failure.
$result = -1;
$curl = curl_init( $url );
// Issue a HEAD request and follow any redirects.
curl_setopt( $curl, CURLOPT_NOBODY, true );
curl_setopt( $curl, CURLOPT_HEADER, true );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_USERAGENT, get_user_agent_string() );
$data = curl_exec( $curl );
curl_close( $curl );
if( $data ) {
$content_length = "unknown";
$status = "unknown";
if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $data, $matches ) ) {
$status = (int)$matches[1];
}
if( preg_match( "/Content-Length: (\d+)/", $data, $matches ) ) {
$content_length = (int)$matches[1];
}
// http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
if( $status == 200 || ($status > 300 && $status <= 308) ) {
$result = $content_length;
}
}
return $result;
}
?>
Usage:
$file_size = curl_get_file_size( "http://stackoverflow.com/questions/2602612/php-remote-file-size-without-downloading-file" );

Try this code
function retrieve_remote_file_size($url){
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
$data = curl_exec($ch);
$size = curl_getinfo($ch, CURLINFO_CONTENT_LENGTH_DOWNLOAD);
curl_close($ch);
return $size;
}

As mentioned a couple of times, the way to go is to retrieve the information from the response header's Content-Length field.
However, you should note that
the server you're probing does not necessarily implement the HEAD method(!)
there's absolutely no need to manually craft a HEAD request (which, again, might not even be supported) using fopen or the like, or even to invoke the curl library, when PHP has get_headers() (remember: K.I.S.S.)
Use of get_headers() follows the K.I.S.S. principle and works even if the server you're probing does not support the HEAD request.
So, here's my version (gimmick: returns human-readable formatted size ;-)):
Gist: https://gist.github.com/eyecatchup/f26300ffd7e50a92bc4d (curl and get_headers version)
get_headers()-Version:
<?php
/**
* Get the file size of any remote resource (using get_headers()),
* either in bytes or - default - as human-readable formatted string.
*
* @author Stephan Schmitz <eyecatchup@gmail.com>
* @license MIT <http://eyecatchup.mit-license.org/>
* @url <https://gist.github.com/eyecatchup/f26300ffd7e50a92bc4d>
*
* @param string $url Takes the remote object's URL.
* @param boolean $formatSize Whether to return size in bytes or formatted.
* @param boolean $useHead Whether to use HEAD requests. If false, uses GET.
* @return string Returns human-readable formatted size
* or size in bytes (default: formatted).
*/
function getRemoteFilesize($url, $formatSize = true, $useHead = true)
{
if (false !== $useHead) {
stream_context_set_default(array('http' => array('method' => 'HEAD')));
}
$head = array_change_key_case(get_headers($url, 1));
// content-length of download (in bytes), read from Content-Length: field
$clen = isset($head['content-length']) ? $head['content-length'] : 0;
// cannot retrieve file size, return "-1"
if (!$clen) {
return -1;
}
if (!$formatSize) {
return $clen; // return size in bytes
}
$size = $clen;
switch (true) { // each case is a boolean condition, so compare against true rather than $clen
case $clen < 1024:
$size = $clen .' B'; break;
case $clen < 1048576:
$size = round($clen / 1024, 2) .' KiB'; break;
case $clen < 1073741824:
$size = round($clen / 1048576, 2) . ' MiB'; break;
case $clen < 1099511627776:
$size = round($clen / 1073741824, 2) . ' GiB'; break;
}
return $size; // return formatted size
}
Usage:
$url = 'http://download.tuxfamily.org/notepadplus/6.6.9/npp.6.6.9.Installer.exe';
echo getRemoteFilesize($url); // echoes "7.51 MiB"
Additional note: The Content-Length header is optional. Thus, as a general solution it isn't bullet proof!
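
The size formatting used above can also be written as a small loop instead of a switch. This is only a sketch; the function name format_bytes() and the unit list are choices made for this example, not part of the answer's gist.

```php
<?php
// Hypothetical helper: formats a byte count with binary (1024-based) units.
function format_bytes(int $bytes): string
{
    $units = ['B', 'KiB', 'MiB', 'GiB', 'TiB'];
    $value = (float) $bytes;
    $i = 0;
    // Divide by 1024 until the value fits the current unit
    while ($value >= 1024 && $i < count($units) - 1) {
        $value /= 1024;
        $i++;
    }
    return round($value, 2) . ' ' . $units[$i];
}

echo format_bytes(1536), "\n"; // prints "1.5 KiB"
```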

PHP's get_headers() function works for me to check the Content-Length:
$headers = get_headers('http://example.com/image.jpg', 1);
$filesize = $headers['Content-Length'];
For More Detail : PHP Function get_headers()
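
One caveat with reading $headers['Content-Length'] directly: header names may arrive in a different case, and when the request is redirected, get_headers() in associative mode collects one value per response into an array. A hedged sketch of a lookup that copes with both (the helper name header_value() is made up for this example):

```php
<?php
// Hypothetical helper: case-insensitive lookup in the array returned by get_headers($url, 1).
function header_value(array $headers, string $name): ?string
{
    $headers = array_change_key_case($headers, CASE_LOWER);
    $key = strtolower($name);
    if (!isset($headers[$key])) {
        return null;
    }
    $value = $headers[$key];
    // After redirects the header holds one entry per response;
    // the last entry belongs to the final response.
    return is_array($value) ? (string) end($value) : (string) $value;
}
```

Usage would look like header_value(get_headers($url, 1), 'Content-Length'), after checking that get_headers() did not return false.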

Sure. Make a headers-only request and look for the Content-Length header.

A one-line solution:
echo array_change_key_case(get_headers("http://.../file.txt",1))['content-length'];
PHP is too delicious. The same thing as a function:
function urlsize($url):int{
return array_change_key_case(get_headers($url,1))['content-length'];
}
echo urlsize("http://.../file.txt");

I'm not sure, but couldn't you use the get_headers function for this?
$url = 'http://example.com/dir/file.txt';
$headers = get_headers($url, true);
if ( isset($headers['Content-Length']) ) {
$size = 'file size:' . $headers['Content-Length'];
}
else {
$size = 'file size: unknown';
}
echo $size;

The simplest and most efficient implementation:
function remote_filesize($url, $fallback_to_download = false)
{
static $regex = '/^Content-Length: *+\K\d++$/im';
if (!$fp = @fopen($url, 'rb')) {
return false;
}
if (isset($http_response_header) && preg_match($regex, implode("\n", $http_response_header), $matches)) {
return (int)$matches[0];
}
if (!$fallback_to_download) {
return false;
}
return strlen(stream_get_contents($fp));
}

Since this question is already tagged "php" and "curl", I'm assuming you know how to use Curl in PHP.
If you set curl_setopt($ch, CURLOPT_NOBODY, TRUE), cURL will make a HEAD request, and you can probably check the "Content-Length" header of the response, which will consist of headers only.

Try the function below to get the remote file size:
function remote_file_size($url){
$head = "";
$url_p = parse_url($url);
$host = $url_p["host"];
if(!preg_match("/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*/",$host)){
$ip=gethostbyname($host);
if(!preg_match("/[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*/",$ip)){
return -1;
}
}
if(isset($url_p["port"]))
$port = intval($url_p["port"]);
else
$port = 80;
if(!$port) $port=80;
$path = $url_p["path"];
$fp = fsockopen($host, $port, $errno, $errstr, 20);
if(!$fp) {
return false;
} else {
fputs($fp, "HEAD " . $url . " HTTP/1.1\r\n");
fputs($fp, "HOST: " . $host . "\r\n");
fputs($fp, "User-Agent: http://www.example.com/my_application\r\n");
fputs($fp, "Connection: close\r\n\r\n");
$headers = "";
while (!feof($fp)) {
$headers .= fgets ($fp, 128);
}
}
fclose ($fp);
$return = -2;
$arr_headers = explode("\n", $headers);
foreach($arr_headers as $header) {
$s1 = "HTTP/1.1";
$s2 = "Content-Length: ";
$s3 = "Location: ";
if(substr(strtolower ($header), 0, strlen($s1)) == strtolower($s1)) $status = substr($header, strlen($s1));
if(substr(strtolower ($header), 0, strlen($s2)) == strtolower($s2)) $size = substr($header, strlen($s2));
if(substr(strtolower ($header), 0, strlen($s3)) == strtolower($s3)) $newurl = substr($header, strlen($s3));
}
if(intval($size) > 0) {
$return=intval($size);
} else {
$return=$status;
}
if (intval($status)==302 && strlen($newurl) > 0) {
$return = remote_file_size($newurl);
}
return $return;
}

Here is another approach that will work with servers that do not support HEAD requests.
It uses cURL to make a request for the content with an HTTP Range header asking for the first two bytes of the file (bytes=0-1).
If the server supports range requests (most media servers will), the Content-Range header of the response includes the total size of the resource.
If the server does not respond with a byte range, the function looks for a Content-Length header to determine the length.
If the size is found in a range or content-length header, the transfer is aborted. If the size is not found and the function starts reading the response body, the transfer is aborted.
This could be a supplementary approach if a HEAD request results in a 405 method not supported response.
/**
* Try to determine the size of a remote file by making an HTTP request for
* a byte range, or look for the content-length header in the response.
* The function aborts the transfer as soon as the size is found, or if no
* length headers are returned, it aborts the transfer.
*
* @return int|null null if size could not be determined, or length of content
*/
function getRemoteFileSize($url)
{
$ch = curl_init($url);
$headers = array(
'Range: bytes=0-1',
'Connection: close',
);
$in_headers = true;
$size = null;
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2450.0 Iron/46.0.2450.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_VERBOSE, 0); // set to 1 to debug
curl_setopt($ch, CURLOPT_STDERR, fopen('php://output', 'w')); // php://output is write-only
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function($curl, $line) use (&$in_headers, &$size) {
$length = strlen($line);
if (trim($line) == '') {
$in_headers = false;
}
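// The status line and the blank line contain no colon, so pad the result to two elements
list($header, $content) = array_pad(explode(':', $line, 2), 2, '');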
$header = strtolower(trim($header));
if ($header == 'content-range') {
// found a content-range header
list($rng, $s) = explode('/', $content, 2);
$size = (int)$s;
return 0; // aborts transfer
} else if ($header == 'content-length' && 206 != curl_getinfo($curl, CURLINFO_HTTP_CODE)) {
// found content-length header and this is not a 206 Partial Content response (range response)
$size = (int)$content;
return 0;
} else {
// continue
return $length;
}
});
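// $in_headers must be captured by reference, otherwise this closure only ever sees its initial value (true)
curl_setopt($ch, CURLOPT_WRITEFUNCTION, function($curl, $data) use (&$in_headers) {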
if (!$in_headers) {
// shouldn't be here unless we couldn't determine file size
// abort transfer
return 0;
}
// write function is also called when reading headers
return strlen($data);
});
$result = curl_exec($ch);
$info = curl_getinfo($ch);
return $size;
}
Usage:
$size = getRemoteFileSize('http://example.com/video.mp4');
if ($size === null) {
echo "Could not determine file size from headers.";
} else {
echo "File size is {$size} bytes.";
}
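
The Content-Range parsing is the part of this approach that is easiest to get subtly wrong, because the total after the slash may be "*" when the server does not know the size. A small sketch of that step in isolation (the function name total_from_content_range() is an assumption made for this example):

```php
<?php
// Hypothetical helper: extracts the total size from a Content-Range header value.
function total_from_content_range(string $value): ?int
{
    // Expected shapes: "bytes 0-1/12345", "bytes */12345", or "bytes 0-1/*" (total unknown)
    if (preg_match('#^\s*bytes\s+(?:\d+-\d+|\*)/(\d+)\s*$#i', $value, $m)) {
        return (int) $m[1];
    }
    return null; // unknown total or malformed header
}
```

Returning null for an unknown total keeps that case distinguishable from a file that really is 0 bytes long.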

Most answers here either use cURL or rely on reading headers. But in certain situations you can use a much easier solution. Consider the note in the filesize() docs on PHP.net. You'll find a tip there saying: "As of PHP 5.0.0, this function can also be used with some URL wrappers. Refer to Supported Protocols and Wrappers to determine which wrappers support stat() family of functionality".
So, if your server and PHP are properly configured and the URL's wrapper supports stat(), you can simply call filesize() with the full URL of the remote file whose size you want and let PHP do all the magic. Check the wrapper documentation first: ftp:// supports filesize(), while the http:// and https:// wrappers do not support stat(), so this will not work for plain HTTP URLs.

Try this: I use it and got good result.
function getRemoteFilesize($url)
{
$file_headers = @get_headers($url, 1);
if($size =getSize($file_headers)){
return $size;
} elseif($file_headers[0] == "HTTP/1.1 302 Found"){
if (isset($file_headers["Location"])) {
$url = $file_headers["Location"][0];
if (strpos($url, "/_as/") !== false) {
$url = substr($url, 0, strpos($url, "/_as/"));
}
$file_headers = @get_headers($url, 1);
return getSize($file_headers);
}
}
return false;
}
function getSize($file_headers){
if (!$file_headers || $file_headers[0] == "HTTP/1.1 404 Not Found" || $file_headers[0] == "HTTP/1.0 404 Not Found") {
return false;
} elseif ($file_headers[0] == "HTTP/1.0 200 OK" || $file_headers[0] == "HTTP/1.1 200 OK") {
$clen=(isset($file_headers['Content-Length']))?$file_headers['Content-Length']:false;
$size = $clen;
if($clen) {
switch (true) { // each case is a boolean condition, so compare against true rather than $clen
case $clen < 1024:
$size = $clen . ' B';
break;
case $clen < 1048576:
$size = round($clen / 1024, 2) . ' KiB';
break;
case $clen < 1073741824:
$size = round($clen / 1048576, 2) . ' MiB';
break;
case $clen < 1099511627776:
$size = round($clen / 1073741824, 2) . ' GiB';
break;
}
}
return $size;
}
return false;
}
Now, test it like this:
echo getRemoteFilesize('http://mandasoy.com/wp-content/themes/spacious/images/plain.png').PHP_EOL;
echo getRemoteFilesize('http://bookfi.net/dl/201893/e96818').PHP_EOL;
echo getRemoteFilesize('https://stackoverflow.com/questions/14679268/downloading-files-as-attachment-filesize-incorrect').PHP_EOL;
Results:
24.82 KiB
912 KiB
101.85 KiB

To cover the HTTP/2 request, the function provided here https://stackoverflow.com/a/2602624/2380767 needs to be changed a bit:
<?php
/**
* Returns the size of a file without downloading it, or -1 if the file
* size could not be determined.
*
* @param $url - The location of the remote file to download. Cannot
* be null or empty.
*
* @return The size of the file referenced by $url, or -1 if the size
* could not be determined.
*/
function curl_get_file_size( $url ) {
// Assume failure.
$result = -1;
$curl = curl_init( $url );
// Issue a HEAD request and follow any redirects.
curl_setopt( $curl, CURLOPT_NOBODY, true );
curl_setopt( $curl, CURLOPT_HEADER, true );
curl_setopt( $curl, CURLOPT_RETURNTRANSFER, true );
curl_setopt( $curl, CURLOPT_FOLLOWLOCATION, true );
curl_setopt( $curl, CURLOPT_USERAGENT, get_user_agent_string() );
$data = curl_exec( $curl );
curl_close( $curl );
if( $data ) {
$content_length = "unknown";
$status = "unknown";
if( preg_match( "/^HTTP\/1\.[01] (\d\d\d)/", $data, $matches ) ) {
$status = (int)$matches[1];
} elseif( preg_match( "/^HTTP\/2 (\d\d\d)/", $data, $matches ) ) {
$status = (int)$matches[1];
}
if( preg_match( "/Content-Length: (\d+)/", $data, $matches ) ) {
$content_length = (int)$matches[1];
} elseif( preg_match( "/content-length: (\d+)/", $data, $matches ) ) {
$content_length = (int)$matches[1];
}
// http://en.wikipedia.org/wiki/List_of_HTTP_status_codes
if( $status == 200 || ($status > 300 && $status <= 308) ) {
$result = $content_length;
}
}
return $result;
}
?>
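
The two pairs of regular expressions above can be folded into one case-insensitive pass. The sketch below is an illustration under assumptions: the function name parse_head_response() is made up, and it expects the raw header string that curl_exec() returns when CURLOPT_HEADER and CURLOPT_NOBODY are set. With CURLOPT_FOLLOWLOCATION that string holds one header block per hop, so only the last block is inspected.

```php
<?php
// Hypothetical helper: pulls status code and Content-Length out of raw response headers.
function parse_head_response(string $raw): array
{
    // One header block per redirect hop, separated by a blank line; keep the last one
    $blocks = preg_split('/\r?\n\r?\n/', trim($raw));
    $last = end($blocks);

    $status = null;
    $length = null;
    // Matches "HTTP/1.0", "HTTP/1.1" and "HTTP/2" status lines alike
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $last, $m)) {
        $status = (int) $m[1];
    }
    // Header names are case-insensitive (HTTP/2 sends them in lower case)
    if (preg_match('/^content-length:\s*(\d+)\s*$/im', $last, $m)) {
        $length = (int) $m[1];
    }
    return ['status' => $status, 'length' => $length];
}
```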

If you are using Laravel 7 or later:
use Illuminate\Support\Facades\Http;
Http::head($url)->header('Content-Length');

PHP Curl check for file existence before downloading

I am writing a PHP program that downloads a PDF from a backend and saves it to a local drive. Now how do I check whether the file exists before downloading?
Currently I am using curl (see code below) to check and download, but it still downloads a file, which is 1 KB in size.
$url = "http://wedsite/test.pdf";
$path = "C:\\test.pdf";
downloadAndSave($url,$path);
function downloadAndSave($urlS,$pathS)
{
$fp = fopen($pathS, 'w');
$ch = curl_init($urlS);
curl_setopt($ch, CURLOPT_FILE, $fp);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
echo $httpCode;
//If 404 is returned, then file is not found.
if(strcmp($httpCode,"404") == 1)
{
echo $httpCode;
echo $urlS;
}
fclose($fp);
}
I want to check whether the file exists before even downloading. Any idea how to do it?

You can do this with a separate curl HEAD request:
curl_setopt($ch, CURLOPT_NOBODY, true);
$data = curl_exec($ch);
$httpCode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
When you actually want to download, you can set NOBODY back to false.

Call this before your download function and it's done:
<?php function remoteFileExists($url) {
$curl = curl_init($url);
//don't fetch the actual page, you only want to check the connection is ok
curl_setopt($curl, CURLOPT_NOBODY, true);
//do request
$result = curl_exec($curl);
$ret = false;
//if request did not fail
if ($result !== false) {
//if request was ok, check response code
$statusCode = curl_getinfo($curl, CURLINFO_HTTP_CODE);
if ($statusCode == 200) {
$ret = true;
}
}
curl_close($curl);
return $ret;
}
?>

Since you are using HTTP to fetch a resource on the internet, what you really want to check is that the return code is a 404.
On some PHP installations, you can just use file_exists($url) out of the box. This does not work in all environments, however. http://www.php.net/manual/en/wrappers.http.php
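Here is a function much like file_exists but for URLs (despite its name, it uses get_headers() rather than cURL):
<?php function curl_exists($url) {
$file_headers = @get_headers($url);
// get_headers() returns false when the request itself fails
if($file_headers === false || $file_headers[0] == 'HTTP/1.1 404 Not Found') {
$exists = false;
}
else {
$exists = true;
}
return $exists;
} ?>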
source: http://www.php.net/manual/en/function.file-exists.php#75064
Sometimes the CURL extension isn't installed with PHP. In that case you can still use the socket library in the PHP core:
<?php function url_exists($url) {
$a_url = parse_url($url);
if (!isset($a_url['port'])) $a_url['port'] = 80;
$errno = 0;
$errstr = '';
$timeout = 30;
if(isset($a_url['host']) && $a_url['host']!=gethostbyname($a_url['host'])){
$fid = fsockopen($a_url['host'], $a_url['port'], $errno, $errstr, $timeout);
if (!$fid) return false;
$page = isset($a_url['path']) ?$a_url['path']:'';
$page .= isset($a_url['query'])?'?'.$a_url['query']:'';
fputs($fid, 'HEAD '.$page.' HTTP/1.0'."\r\n".'Host: '.$a_url['host']."\r\n\r\n");
$head = fread($fid, 4096);
$head = substr($head,0,strpos($head, 'Connection: close'));
fclose($fid);
if (preg_match('#^HTTP/\S+\s+(200|302)\s#i', $head)) { // a group, not a character class
$pos = strpos($head, 'Content-Type');
return $pos !== false;
}
} else {
return false;
}
} ?>
source: http://www.php.net/manual/en/function.file-exists.php#73175
An even faster function can be found here:
http://www.php.net/manual/en/function.file-exists.php#76246

In the first example above $file_headers[0] may contain more than or something other than 'HTTP/1.1 404 Not Found', e.g:
HTTP/1.1 404 Document+%2Fdb%2Fscotbiz%2Freports%2FR20131212%2Exml+not+found
So it's important to use some other test, e.g., regex, as '==' is not reliable.
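
A sketch of such a regex test, assuming the input is the first element of the array returned by get_headers() (the function name http_status_code() is made up for this example). It extracts the numeric code and ignores whatever reason phrase follows it.

```php
<?php
// Hypothetical helper: extracts the numeric status code from an HTTP status line.
function http_status_code(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null; // not a status line
}

$line = 'HTTP/1.1 404 Document+%2Fdb%2Fscotbiz%2Freports%2FR20131212%2Exml+not+found';
var_dump(http_status_code($line) === 404); // bool(true)
```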

Using curl as an alternative to fopen file resource for fgetcsv

Is it possible to make curl access a URL and return the result as a file resource, like fopen does?
My goals:
Parse a CSV file
Pass it to fgetcsv
My obstruction: fopen is disabled
My code (using fopen):
$url = "http://download.finance.yahoo.com/d/quotes.csv?s=USDEUR=X&f=sl1d1t1n&e=.csv";
$f = fopen($url, 'r');
print_r(fgetcsv($f));
Then, I am trying this on curl.
$curl = curl_init();
curl_setopt($curl, CURLOPT_VERBOSE, true);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, false);
curl_setopt($curl, CURLOPT_POST, true);
curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
curl_setopt($curl, CURLOPT_URL, $url);
$content = @curl_exec($curl);
curl_close($curl);
But, as usual, $content is returned as a string.
Now, is it possible for curl to return it as a file resource pointer, just like fopen? I'm using PHP < 5.1.x or so, which means I can't use str_getcsv, since it's only available from 5.3.
My error
Warning: fgetcsv() expects parameter 1 to be resource, boolean given
Thanks

Assuming that by fopen is disabled you mean "allow_url_fopen is disabled", a combination of CURLOPT_FILE and php://temp make this fairly easy:
$f = fopen('php://temp', 'w+');
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $url);
curl_setopt($curl, CURLOPT_FILE, $f);
// Do you need these? Your fopen() method isn't a post request
// curl_setopt($curl, CURLOPT_POST, true);
// curl_setopt($curl, CURLOPT_POSTFIELDS, $param);
curl_exec($curl);
curl_close($curl);
rewind($f);
while ($line = fgetcsv($f)) {
print_r($line);
}
fclose($f);
Basically this creates a pointer to a "virtual" file, and cURL stores the response in it. Then you just reset the pointer to the beginning and it can be treated as if you had opened it as usual with fopen($url, 'r');
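
The same rewind-and-read pattern also works when the response is already in a string, for instance because CURLOPT_RETURNTRANSFER was used. A minimal sketch, with the helper name csv_rows_from_string() made up for this example and the delimiter, enclosure and escape characters passed explicitly:

```php
<?php
// Hypothetical helper: parses a CSV string through a php://temp stream.
function csv_rows_from_string(string $csv): array
{
    $f = fopen('php://temp', 'w+');
    fwrite($f, $csv);
    rewind($f); // move the pointer back to the start before reading

    $rows = [];
    while (($row = fgetcsv($f, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() reports a blank line as an array holding a single null
        }
        $rows[] = $row;
    }
    fclose($f);
    return $rows;
}
```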

You can create a temporary file using fopen() and then fwrite() the contents into it. After that, the newly created file will be readable by fgetcsv(). The tempnam() function should handle the creation of arbitrary temporary files.
According to the comments on str_getcsv(), users on PHP versions that lack the function (it was added in 5.3) could try the function below. There are also various other approaches in the comments; make sure you check them out.
function str_getcsv($input, $delimiter = ',', $enclosure = '"', $escape = '\\', $eol = '\n') {
if (is_string($input) && !empty($input)) {
$output = array();
$tmp = preg_split("/".$eol."/",$input);
if (is_array($tmp) && !empty($tmp)) {
while (list($line_num, $line) = each($tmp)) {
if (preg_match("/".$escape.$enclosure."/",$line)) {
while ($strlen = strlen($line)) {
$pos_delimiter = strpos($line,$delimiter);
$pos_enclosure_start = strpos($line,$enclosure);
if (
is_int($pos_delimiter) && is_int($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
) {
$enclosed_str = substr($line,1);
$pos_enclosure_end = strpos($enclosed_str,$enclosure);
$enclosed_str = substr($enclosed_str,0,$pos_enclosure_end);
$output[$line_num][] = $enclosed_str;
$offset = $pos_enclosure_end+3;
} else {
if (empty($pos_delimiter) && empty($pos_enclosure_start)) {
$output[$line_num][] = substr($line,0);
$offset = strlen($line);
} else {
$output[$line_num][] = substr($line,0,$pos_delimiter);
$offset = (
!empty($pos_enclosure_start)
&& ($pos_enclosure_start < $pos_delimiter)
)
?$pos_enclosure_start
:$pos_delimiter+1;
}
}
$line = substr($line,$offset);
}
} else {
$line = preg_split("/".$delimiter."/",$line);
/*
* Validating against pesky extra line breaks creating false rows.
*/
if (is_array($line) && !empty($line[0])) {
$output[$line_num] = $line;
}
}
}
return $output;
} else {
return false;
}
} else {
return false;
}
}

How to use CURLOPT_WRITEFUNCTION when downloading a file with cURL

My class for downloading a file directly from a link:
class MyClass {
function download($link){
......
$ch = curl_init($link);
curl_setopt($ch, CURLOPT_FILE, $File->handle);
curl_setopt($ch,CURLOPT_WRITEFUNCTION , array($this,'__writeFunction'));
curl_exec($ch);
curl_close($ch);
$File->close();
......
}
function __writeFunction($curl, $data) {
return strlen($data);
}
}
I want to know how to use CURLOPT_WRITEFUNCTION when downloading a file.
In the code above, if I remove this line:
curl_setopt($ch,CURLOPT_WRITEFUNCTION , array($this,'__writeFunction'));
then it runs fine and I can download the file. But if I use the CURLOPT_WRITEFUNCTION option, I can't download the file.

I know this is an old question, but maybe my answer will be of some help for you or someone else. Try this:
function get_write_function(){
return function($curl, $data){
return strlen($data);
};
}
I don't know exactly what you want to do, but since PHP 5.3 you can do a lot with closures as callbacks. What's really great about generating a function this way is that the values passed through the 'use' keyword are copied when the closure is created and stay with it afterward, kind of like constants.
function get_write_function($var){
$obj = $this;//access variables or functions within your class with the object variable
return function($curl, $data) use ($var, $obj) {
$len = strlen($data);
//just an example - you can come up with something better than this:
if ($len > $var){
return -1;//abort the download
} else {
$obj->do_something();//call a class function
return $len;
}
};
}
You can retrieve the function as a variable as follows:
function download($link){
......
$var = 5000;
$write_function = $this->get_write_function($var);
$ch = curl_init($link);
curl_setopt($ch, CURLOPT_FILE, $File->handle);
curl_setopt($ch, CURLOPT_WRITEFUNCTION , $write_function);
curl_exec($ch);
curl_close($ch);
$File->close();
......
}
That was just an example. You can see how I used it here: Parallel cURL Request with WRITEFUNCTION Callback. I didn't actually test all of this code, so there may be minor errors. Let me know if you have problems, and I'll fix it.
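To illustrate how 'use' binds values, here is a small standalone sketch (the variable names are made up for illustration). A plain use ($limit) copies the value when the closure is created, while use (&$limit) keeps a reference to the original variable:

```php
<?php
$limit = 5000;

// Captured by value: the closure keeps its own copy of 5000.
$byValue = function () use ($limit) {
    return $limit;
};

// Captured by reference: the closure sees later changes to $limit.
$byReference = function () use (&$limit) {
    return $limit;
};

$limit = 1; // changed after both closures were created
```

After the last line, $byValue() still returns 5000 and $byReference() returns 1. For a cURL write callback, capture by reference is what you would want for something like a running byte counter.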

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_BUFFERSIZE, 8096);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, 'http://blog.ronnyristau.de/wp-content/uploads/2008/12/php.jpg');
$content = curl_exec($ch);
curl_close($ch);
$out = fopen('/tmp/out.jpg','wb'); // the source is a JPEG; write in binary mode
if($out){
fwrite($out, $content);
fclose($out);
}

Why do you use cURL to download a file? Is there a special reason? You can simply use fopen and fread (for remote URLs this requires allow_url_fopen to be enabled).
I have written a small class for it.
<?php
class Utils_FileDownload {
private $source;
private $dest;
private $buffer;
private $overwrite;
public function __construct($source,$dest,$buffer=4096,$overwrite=false){
$this->source = $source;
$this->dest = $dest;
$this->buffer = $buffer;
$this->overwrite = $overwrite;
}
public function download(){
if($this->overwrite||!file_exists($this->dest)){
if(!is_dir(dirname($this->dest))){mkdir(dirname($this->dest),0755,true);}
if($this->source==""){
$resource = false;
Utils_Logging_Logger::getLogger()->log("source must not be empty.",Utils_Logging_Logger::TYPE_ERROR);
}
else{ $resource = fopen($this->source,"rb"); }
if($this->dest==""){
$dest = false;
Utils_Logging_Logger::getLogger()->log("destination must not be empty.",Utils_Logging_Logger::TYPE_ERROR);
}
else{ $dest = fopen($this->dest,"wb"); }
if($resource!==false&&$dest!==false){
while(!feof($resource)){
$read = fread($resource,$this->buffer);
fwrite($dest,$read,$this->buffer);
}
chmod($this->dest,0644);
fclose($dest); fclose($resource);
return true;
}else{
return false;
}
}else{
return false;
}
}
}
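The read/write loop at the heart of the class above can also be reduced to a standalone function. The following is a sketch under stated assumptions, not the author's code: the name copy_in_chunks() is hypothetical, and reading from an http:// source this way only works when allow_url_fopen is enabled. Only one chunk is held in memory at a time.

```php
<?php
// Hypothetical helper: copy $source to $dest in fixed-size chunks.
// Returns the number of bytes copied, or false on failure.
function copy_in_chunks(string $source, string $dest, int $chunkSize = 8192)
{
    $in = @fopen($source, 'rb');
    if ($in === false) {
        return false;
    }
    $out = @fopen($dest, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $total = 0;
    $ok = true;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            $ok = false;
            break;
        }
        if ($chunk === '') {
            // Nothing was read: give up if the stream timed out, otherwise re-check feof().
            $meta = stream_get_meta_data($in);
            if (!empty($meta['timed_out'])) {
                $ok = false;
                break;
            }
            continue;
        }
        $written = fwrite($out, $chunk);
        if ($written !== strlen($chunk)) { // short or failed write
            $ok = false;
            break;
        }
        $total += $written;
    }

    fclose($in);
    fclose($out);

    return $ok ? $total : false;
}
```

Unlike the class, this version writes exactly the bytes that were read and reports failure if a read or write goes wrong, so the caller can decide whether to retry.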

It seems that once CURLOPT_WRITEFUNCTION is specified, cURL calls your function instead of writing the response data itself.
So the correct solution would be:
class MyClass {
function download($link){
......
$ch = curl_init($link);
curl_setopt($ch, CURLOPT_FILE, $File->handle);
curl_setopt($ch,CURLOPT_WRITEFUNCTION , array($this,'__writeFunction'));
curl_exec($ch);
curl_close($ch);
$File->close();
......
}
function __writeFunction($curl, $data) {
echo $data;
return strlen($data);
}
}
This also handles binary files.
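If the goal is to save the data to a file rather than print it, the callback can write each chunk to the file handle itself. The following is a minimal sketch, not the original poster's code: make_write_callback() and its $maxBytes limit are hypothetical names introduced for illustration.

```php
<?php
// Hypothetical factory: returns a callback suitable for CURLOPT_WRITEFUNCTION
// that writes every chunk to $handle and aborts once more than $maxBytes
// have been received.
function make_write_callback($handle, int $maxBytes): callable
{
    $received = 0; // captured by reference so the count survives between calls

    return function ($curl, string $data) use ($handle, $maxBytes, &$received): int {
        $received += strlen($data);
        if ($received > $maxBytes) {
            return -1; // any value other than strlen($data) aborts the transfer
        }
        $written = fwrite($handle, $data);
        return $written === false ? -1 : $written;
    };
}

// Usage (assumes $ch is a cURL handle and $fh a file opened with fopen($path, 'wb')):
// curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_write_callback($fh, 500 * 1024 * 1024));
```

Because the callback must return the number of bytes it handled, returning the result of fwrite() also makes cURL stop if the disk write comes up short.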

PHP - Safe way to download large files? - php

You'd want to download the remote file in chunks. This answer has a great example: How download big file using PHP (low memory usage)
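For the resume requirement in the question, one possible shape is sketched below. It is not taken from the linked answer: resume_offset() and download_resumable() are hypothetical names, and the sketch assumes the server honours HTTP Range requests (answers 206 Partial Content) and that the remote file has not changed between attempts. It does not verify the finished file, so a checksum comparison would still be needed to detect corruption.

```php
<?php
// Hypothetical helper: number of bytes already on disk, i.e. where to resume.
function resume_offset(string $path): int
{
    clearstatcache(true, $path);
    return is_file($path) ? (int) filesize($path) : 0;
}

// Hypothetical helper: stream $url into $path, continuing a partial file if one exists.
function download_resumable(string $url, string $path, int $timeout = 60): bool
{
    $offset = resume_offset($path);
    $fh = fopen($path, $offset > 0 ? 'ab' : 'wb'); // append when resuming
    if ($fh === false) {
        return false;
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fh);            // cURL writes straight to the file
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);    // treat HTTP status >= 400 as failure
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    if ($offset > 0) {
        curl_setopt($ch, CURLOPT_RESUME_FROM, $offset); // sends "Range: bytes=<offset>-"
    }

    $ok = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    unset($ch); // release the cURL handle
    fclose($fh);

    // A resumed transfer should answer 206; a fresh one 200.
    return $ok !== false && $status === ($offset > 0 ? 206 : 200);
}
```

A caller could retry download_resumable() a few times in a loop; each attempt picks up from the current size of the partial file.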
