Download multiple images from remote server with PHP (a LOT of images)

I am trying to download a lot of files from an external server (approx. 3700 images). The images range from 30KB to 200KB each.
When I use the copy() function on a single image, it works. When I use it in a loop, all I get are 30-byte files (empty image files).
I tried copy(), cURL, wget, and file_get_contents(). Every time, I either get a lot of empty files or nothing at all.
Here is the code I tried:
wget:
exec('wget "http://mediaserver.centris.ca/media.ashx?id=ADD4B9DD110633DDDB2C5A2D10&t=pi&f=I" -O SIA/8605283.jpg'); // the URL must be quoted, otherwise the shell treats & as a command separator
copy:
if (copy($donnees['PhotoURL'], $filetocheck)) {
    echo 'Photo '.$filetocheck.' updated<br/>';
}
cURL:
$ch = curl_init();
$source = $data['PhotoURL']; // quote the array key
curl_setopt($ch, CURLOPT_URL, $source);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$imageData = curl_exec($ch); // don't reuse $data here, it still holds the record
curl_close($ch);
$destination = $newfile;
$file = fopen($destination, "w+");
fputs($file, $imageData);
fclose($file);
Nothing seems to work properly. Unfortunately, I don't have much choice but to download all these files at once, and I need a way to make it work as soon as possible.
Thanks a lot, Antoine

Getting them one by one might be quite slow. Consider splitting them into batches of 20-50 images and grabbing them in parallel with curl_multi. Here's the code to get you started:
// $targets is the array of image URLs for the current batch
$tc  = count($targets);
$chs = array();
$cmh = curl_multi_init();
for ($t = 0; $t < $tc; $t++) {
    $chs[$t] = curl_init();
    curl_setopt($chs[$t], CURLOPT_URL, $targets[$t]);
    curl_setopt($chs[$t], CURLOPT_RETURNTRANSFER, 1);
    curl_multi_add_handle($cmh, $chs[$t]);
}
$running = null;
do {
    curl_multi_exec($cmh, $running);
    curl_multi_select($cmh); // wait for activity instead of busy-looping
} while ($running > 0);
for ($t = 0; $t < $tc; $t++) {
    $path_to_file = 'your logic for file path';
    file_put_contents($path_to_file, curl_multi_getcontent($chs[$t]));
    curl_multi_remove_handle($cmh, $chs[$t]);
    curl_close($chs[$t]);
}
curl_multi_close($cmh);
I used that approach to grab a few million images recently, since downloading them one by one would have taken up to a month.
The number of images you grab at once should depend on their expected size and your memory limit.
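For completeness, here is a rough sketch of the batching idea. The downloadBatch() helper, the urls.txt source file, and the naming scheme are placeholders, not part of the snippet above; the helper is assumed to wrap the curl_multi code and return the raw image bodies keyed like its input array.
// Hypothetical driver: fetch the ~3700 URLs in parallel batches of 30.
$allTargets = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$batchSize  = 30; // 20-50 is reasonable for 30KB-200KB images
foreach (array_chunk($allTargets, $batchSize) as $batchIndex => $targets) {
    $bodies = downloadBatch($targets); // assumed wrapper around the curl_multi code above
    foreach ($bodies as $t => $body) {
        $path = 'SIA/'.($batchIndex * $batchSize + $t).'.jpg'; // placeholder naming scheme
        file_put_contents($path, $body);
    }
}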

I used this function for that and it worked pretty well.
function saveImage($urlImage, $title) {
    $fullpath = '../destination/'.$title;
    $ch = curl_init($urlImage);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    $rawdata = curl_exec($ch);
    curl_close($ch);
    // remove any previous copy so fopen() in 'x' mode does not fail
    if (file_exists($fullpath)) {
        unlink($fullpath);
    }
    $fp = fopen($fullpath, 'x');
    $r = fwrite($fp, $rawdata);
    setMemoryLimit($fullpath);
    fclose($fp);
    return $r;
}
Combined with this other one to prevent memory overflow:
function setMemoryLimit($filename) {
    set_time_limit(50);
    $maxMemoryUsage = 258; // hard cap, in MB
    $size = (int) ini_get('memory_limit'); // current limit, e.g. "128M" -> 128
    list($width, $height) = getimagesize($filename);
    // rough GD estimate: width * height * 4 bytes * fudge factor, plus 1MB headroom
    $size = $size + floor(($width * $height * 4 * 1.5 + 1048576) / 1048576);
    if ($size > $maxMemoryUsage) {
        $size = $maxMemoryUsage;
    }
    ini_set('memory_limit', $size.'M');
}
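For illustration, a minimal usage sketch; the $images array and its URLs are made up, and saveImage() is the function above (it already calls setMemoryLimit() internally):
// Hypothetical list of images: destination file name => source URL.
$images = array(
    'photo1.jpg' => 'http://example.com/images/photo1.jpg',
    'photo2.jpg' => 'http://example.com/images/photo2.jpg',
);
foreach ($images as $title => $urlImage) {
    if (saveImage($urlImage, $title) === false) {
        echo 'Failed to save '.$title.'<br/>';
    }
}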

Related

Downloading Images from URL in PHP

I have a file from which I read URLs line by line. The URLs are links to images, and all of them work in the browser. Now I want to download the images to my current directory. For the record, I need to use a PHP script.
This is how I get my images:
for ($j = 0; $j < $count; $j++) {
    // Get the image URL as a string, then download it
    $image = $arr[$j];
    $newname = $j;
    getImage($image, $newname);
}
function getImage($image, $newname)
{
    $ch = curl_init($image);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
    $rawdata = curl_exec($ch);
    curl_close($ch);
    $fp = fopen("$newname.jpg", 'w');
    fwrite($fp, $rawdata);
    fclose($fp);
}
But the problem is that although I get all the images, only the last one is viewable. The others won't open and are only 1KB each.
So what did I do wrong?
Also, I will need to download PNG files later on; can I then just change $newname.jpg into $newname.png?
Thanks in advance. I really need an answer fast, I've been sitting here for hours trying to figure it out.
Why not use stream_copy_to_stream?
function getImage($image, $newname, $fileType = "jpg")
{
    $in  = fopen($image, "r");
    $out = fopen("$newname.$fileType", 'w');
    stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);
}
You can also play with stream_set_read_buffer
stream_set_read_buffer($in, 4096);
Just tested this with our avatar pics.
$data = [
"https://www.gravatar.com/avatar/42eec337b6404f97aedfb4f39d4991f2?s=32&d=identicon&r=PG&f=1",
"https://www.gravatar.com/avatar/2700a034dcbecff07c55d4fef09d110b?s=32&d=identicon&r=PG&f=1",
];
foreach ($data as $i => $image) getImage($image, $i, "png");
Works perfectly.
Debug: Read the remote headers for more info?
$http_response_header
var_dump($http_response_header);
or stream_get_meta_data
var_dump(stream_get_meta_data($in), stream_get_meta_data($out));

How Can php know if the image has been loaded fully?

I wrote a PHP script which simply takes URLs of images and tries to download/save them on the server.
My problem is that sometimes an image is not fully loaded and the code below only saves it partially. I did some research but couldn't figure out whether I can add something so it checks that it is saving the full image.
Image sizes and other properties are random, so I can't rely on those factors unless there is a way to get that information before loading the image.
Thank you all.
if ($leng >= "5") {
    define('UPLOAD_DIR', dirname(__FILE__) . '/files/');
    $length = 512000;
    $handle = fopen($url, 'rb');
    $filename = UPLOAD_DIR . substr(strrchr($url, '/'), 1);
    $write = fopen($filename, 'w');
    while (!feof($handle)) {
        $buffer = fread($handle, $length);
        fwrite($write, $buffer);
    }
    fclose($handle);
    fclose($write);
} else {
    echo "failed";
}
I think that using cURL is a better solution than using fopen on a URL. Check this out:
$file = fopen($filename, 'wb'); //Write it as a binary file - like image
$c = curl_init($url);
curl_setopt($c, CURLOPT_FILE, $file);
curl_setopt($c, CURLOPT_FOLLOWLOCATION, true); //Allow cURL to follow redirects (but take note that this does not work with safemode enabled)
curl_exec($c);
$httpCode = curl_getinfo($c, CURLINFO_HTTP_CODE); //200 means OK, you may check it later, just for sure
curl_close($c);
fclose($file);
Partially based on Downloading a large file using curl
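To actually detect a truncated transfer, one option is to compare the number of bytes cURL received with the advertised Content-Length (read via curl_getinfo() before curl_close()) and discard the file if they disagree. This is only a sketch against the snippet above ($c, $file, $filename, $httpCode), and it assumes the server sends a Content-Length header:
// before curl_close($c):
$expected = curl_getinfo($c, CURLINFO_CONTENT_LENGTH_DOWNLOAD); // -1 if the server sent no Content-Length
$received = curl_getinfo($c, CURLINFO_SIZE_DOWNLOAD);           // bytes actually written through CURLOPT_FILE
// after fclose($file):
if ($httpCode != 200 || ($expected > 0 && $received < $expected)) {
    unlink($filename); // the download is truncated; remove it so it is not mistaken for a good image
}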

PHP Prevent Remote Image from downloading over 5mb while checking for Width and Height

I need to know how to stop PHP from downloading a file larger than 5MB, returning an error if it is. If the file passes that check, I want to read its width and height, but without downloading the file a second time. Here is my current code:
<?php
list($width, $height) = getimagesize('http://www.spacetelescope.org/static/archives/images/large/heic0601a.jpg');
echo $width.' x '.$height;
thanks.
Because the dimensions are stored inside the image file itself, not in the HTTP response headers, you cannot get the width and height without fetching at least part of the file.
To avoid trying to read the dimensions of an image larger than 5MB, you can send a HEAD request to the server to get the size of the file; see PHP: Remote file size without downloading file for how to do this.
It would be something like:
function getRemoteFileSize($remoteFile) {
    $ch = curl_init($remoteFile);
    curl_setopt($ch, CURLOPT_NOBODY, true); // HEAD-style request: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);
    $data = curl_exec($ch);
    curl_close($ch);
    if ($data === false) {
        return 0;
    }
    if (preg_match('/Content-Length: (\d+)/i', $data, $matches)) {
        return (int)$matches[1];
    }
    return 0;
}
$remoteFile = 'http://www.spacetelescope.org/static/archives/images/large/heic0601a.jpg';
if (getRemoteFileSize($remoteFile) < 5 * 1024 * 1024) {
    list($width, $height) = getimagesize($remoteFile);
    echo $width.' x '.$height;
}
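If you also want to abort a transfer that turns out to be larger than 5MB while it is downloading (for example when the server sends no Content-Length), here is a sketch using cURL's progress callback (PHP 5.5+ callback signature; the limit and URL are taken from the question):
$limit = 5 * 1024 * 1024;
$ch = curl_init($remoteFile);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_NOPROGRESS, false); // required so the progress callback is invoked
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION, function ($ch, $downloadSize, $downloaded, $uploadSize, $uploaded) use ($limit) {
    return ($downloaded > $limit) ? 1 : 0; // returning non-zero makes cURL abort the transfer
});
$data = curl_exec($ch); // false if the callback aborted the download
curl_close($ch);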

verify if a given URL is a valid Image/size using HEAD method

How should one check a URL with the HEAD method to verify that it is a valid image and does not exceed 200x100 px?
This is inherently impossible with HEAD alone: the response carries only headers, not the image data that holds the dimensions.
You can, however, download just enough of the data (not the entire image) and then check:
function ranger($url) {
    $headers = array(
        "Range: bytes=0-32768" // ask the server for only the first 32KB
    );
    $curl = curl_init($url);
    curl_setopt($curl, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
    $data = curl_exec($curl);
    curl_close($curl);
    return $data;
}
$start = microtime(true);
$url = "http://news.softpedia.com/images/news2/Debian-Turns-15-2.jpeg";
$raw = ranger($url);
$im = imagecreatefromstring($raw);
$width = imagesx($im);
$height = imagesy($im);
$stop = round(microtime(true) - $start, 5);
echo $width." x ".$height." ({$stop}s)";
Test output:
640 x 480 (0.20859s)
Loading 32KB of data worked for me (the full image is 90KB).
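If all you need is the dimensions and MIME type rather than a GD resource, getimagesizefromstring() (PHP 5.4+) on the same partial data is lighter than imagecreatefromstring() and makes the 200x100 check from the question straightforward. A sketch reusing ranger() above:
$raw  = ranger($url);
$info = getimagesizefromstring($raw); // false if the bytes are not a recognisable image
if ($info === false) {
    echo "Not an image";
} elseif ($info[0] > 200 || $info[1] > 100) {
    echo "Too large: {$info[0]} x {$info[1]} ({$info['mime']})";
} else {
    echo "OK: {$info[0]} x {$info[1]} ({$info['mime']})";
}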

Downloading big files and writing it locally

Which is the best way to download large files in PHP without consuming all of the server's memory?
I could do this (bad code):
$url='http://server/bigfile';
$cont = file_get_contents($url);
file_put_contents('./localfile',$cont);
This example loads the entire remote file into $cont, which could exceed the memory limit.
Is there a safe function (maybe built-in) to do this (maybe stream_*)?
Thanks
You can use curl and the option CURLOPT_FILE to save the downloaded content directly to a file.
set_time_limit(0);
$fp = fopen ('file', 'w+b');
$ch = curl_init('http://remote_url/file');
curl_setopt($ch, CURLOPT_TIMEOUT, 75);
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_exec($ch);
curl_close($ch);
fclose($fp);
Here is a function I use when downloading large files. It will avoid loading the entire file into the buffer. Instead, it will write to the destination as it receives the bytes.
function download($file_source, $file_target)
{
    $rh = fopen($file_source, 'rb');
    $wh = fopen($file_target, 'wb');
    if (!$rh || !$wh) {
        return false;
    }
    while (!feof($rh)) {
        // copy 1KB at a time so only a small buffer is ever held in memory
        if (fwrite($wh, fread($rh, 1024)) === false) {
            fclose($rh);
            fclose($wh);
            return false;
        }
    }
    fclose($rh);
    fclose($wh);
    return true;
}
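Since the question asks about the stream_* functions: the same chunked copy can also be done with the built-in stream_copy_to_stream(), which never holds more than a buffer's worth of data in memory. A sketch with placeholder paths (requires allow_url_fopen for the remote side):
$src = fopen('http://remote_url/file', 'rb');
$dst = fopen('./localfile', 'wb');
if ($src && $dst) {
    stream_copy_to_stream($src, $dst); // copies chunk by chunk, not all at once
}
if ($src) { fclose($src); }
if ($dst) { fclose($dst); }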
