Download thousands of images from URLs with PHP/AJAX

I have an array of image URLs and want to download the images to my server with cURL or file_get_contents(), but the JPG, PNG and WebP files I download end up corrupt on my server. I also want to download the files in batches through AJAX: each AJAX request should take 10 or 20 images from the array and download them, and as soon as one request succeeds, another request should grab the next 20 images from the array.
$images = array('http://ensemblepakistan.com/wp-content/uploads/2018/06/11.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/08/SHK-242-3.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/08/SHK-242-1.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/08/SHK-242-2.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/08/SHBK-258-3.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/08/SHBK-258-2.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/08/SHBK-258-1.jpg',
    'https://ensemblepakistan.com/wp-content/uploads/2018/10/SHBK-313-3-min.jpg'
);
foreach ($images as $image) {
    $name = basename($image);
    $newfile = $_SERVER['DOCUMENT_ROOT'] . '/test/' . $name;
    $ch = curl_init($image);
    $fp = fopen($newfile, 'wb');
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}
That is the code I'm using right now.
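For the batching part of the question, here is a minimal server-side sketch. It assumes a hypothetical endpoint script (for example download-batch.php) that receives a POST field `offset`, downloads the next 20 URLs with cURL, deletes any file whose request did not end with HTTP 200, and replies with JSON containing the next offset. The names `nextBatch`, `offset`, `next`, `done` and `failed` are illustrative only, not part of the original code.

```php
<?php
// Hypothetical endpoint, e.g. download-batch.php.
// Each AJAX call POSTs "offset"; the script downloads the next batch of
// URLs and replies with JSON telling the client where to continue.

/**
 * Return the slice of $urls handled by this request and the offset
 * the next request should start from.
 */
function nextBatch(array $urls, int $offset, int $batchSize): array
{
    $offset = max(0, $offset);
    $batch = array_slice($urls, $offset, $batchSize);
    $next = $offset + count($batch);
    return [
        'batch' => $batch,
        'next'  => $next,
        'done'  => $next >= count($urls),
    ];
}

// The download part only runs when the script is requested over HTTP.
if (PHP_SAPI !== 'cli') {
    $images = [/* the full array of image URLs */];
    $step = nextBatch($images, (int) ($_POST['offset'] ?? 0), 20);
    $failed = [];

    foreach ($step['batch'] as $image) {
        $target = $_SERVER['DOCUMENT_ROOT'] . '/test/' . basename($image);
        $fp = fopen($target, 'wb');
        if ($fp === false) {
            $failed[] = $image;
            continue;
        }
        $ch = curl_init($image);
        curl_setopt($ch, CURLOPT_FILE, $fp);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        $ok = curl_exec($ch) && curl_getinfo($ch, CURLINFO_HTTP_CODE) === 200;
        curl_close($ch);
        fclose($fp);
        if (!$ok) {
            unlink($target); // don't keep an error page saved under an image name
            $failed[] = $image;
        }
    }

    header('Content-Type: application/json');
    echo json_encode([
        'next'   => $step['next'],
        'done'   => $step['done'],
        'failed' => $failed,
    ]);
}
```

On the JavaScript side, each success handler would check `done` and, if it is false, send the next request with `offset` set to `next`. Keeping the position on the client means the server needs no session state between requests.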

Aside from Malwarebytes blocking these WP pages, the site's SSL certificate seems to have an issue, so remove https:// and make all calls over http://. I was able to grab your images easily with code like this:
<?php
$images = array('http://ensemblepakistan.com/wp-content/uploads/2018/06/11.jpg',
    'http://ensemblepakistan.com/wp-content/uploads/2018/08/SHK-242-3.jpg',
    'http://ensemblepakistan.com/wp-content/uploads/2018/08/SHK-242-1.jpg'
);
$imgcount = 0;
foreach ($images as $image) {
    $img = file_get_contents($image);
    file_put_contents("image-" . $imgcount . ".jpg", $img);
    $imgcount = $imgcount + 1;
}
?>
Your images were likely corrupt because of the SSL handshake (the site has a bad certificate) or the MIME type you were saving them as. Your method should also have worked had you gone through the non-SSL URLs, but the example above should suffice.
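To tell whether a download really is an image before saving it, you can check its first bytes ("magic numbers"). This is a minimal sketch that assumes JPEG, PNG and WebP are the only formats you need; `detectImageType` and `saveImage` are hypothetical helper names. If the server returns an HTML error page (for example because of the certificate problem), nothing is saved, and real images are stored with the extension that matches their actual content.

```php
<?php
/**
 * Guess an image type from its first bytes.
 * Returns 'jpg', 'png', 'webp', or null if the data is none of these
 * (for example an HTML error page).
 */
function detectImageType(string $bytes): ?string
{
    if (strncmp($bytes, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpg';
    }
    if (strncmp($bytes, "\x89PNG\r\n\x1A\n", 8) === 0) {
        return 'png';
    }
    // WebP: "RIFF", 4-byte size, then "WEBP"
    if (strncmp($bytes, 'RIFF', 4) === 0 && substr($bytes, 8, 4) === 'WEBP') {
        return 'webp';
    }
    return null;
}

/**
 * Download $url and save it as $dir/$name.<detected extension>.
 * Returns the saved path, or null if the download failed or the
 * body is not a JPEG/PNG/WebP image.
 */
function saveImage(string $url, string $dir, string $name): ?string
{
    $body = @file_get_contents($url);
    if ($body === false) {
        return null;
    }
    $type = detectImageType($body);
    if ($type === null) {
        return null;
    }
    $path = rtrim($dir, '/') . '/' . $name . '.' . $type;
    return file_put_contents($path, $body) === false ? null : $path;
}
```

In the loop from the answer you would call `saveImage($image, $dir, 'image-' . $imgcount)` instead of `file_put_contents()` directly, and log the URLs for which it returns null.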

Related

Preventing hotlinking on Amazon S3 with PHP

On my site I am building an image-protection section to cut Amazon S3 costs. As part of that, I have made anti-hotlinking links for images using PHP (to the best of my understanding).
<video src="/media.php?id=711/video.mp4"></video>
My media.php file looks like this:
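// Compare only the host part of the Referer: browsers send a full URL
// such as https://example.com/page, which never equals 'example.com'.
$refererHost = isset($_SERVER['HTTP_REFERER'])
    ? parse_url($_SERVER['HTTP_REFERER'], PHP_URL_HOST)
    : null;
if ($refererHost !== null && $refererHost !== 'example.com')
{
    header('HTTP/1.1 403 Hot Linking Not Permitted');
    header("Content-type: image/jpeg");
    readfile("http://example.com/monkey.jpg");
    exit;
}
$file = $_GET['id']; // e.g. "711/video.mp4", passed in by the rewrite rule
$url = 'https://s3.example.com';
$s3 = "$url/$file";
// HEAD request (no body) just to read the object's Content-Type header
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $s3);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_NOBODY, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$results = explode("\n", trim(curl_exec($ch)));
curl_close($ch);
$mime = 'application/octet-stream'; // fallback if no Content-Type header is found
foreach ($results as $line) {
    // Header names are case-insensitive (HTTP/2 sends them in lower case)
    if (strcasecmp(strtok($line, ':'), 'Content-Type') === 0) {
        $parts = explode(':', $line, 2);
        $mime = trim($parts[1]);
    }
}
header("Content-type: $mime");
readfile($s3);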
To make it less obvious, I have set up a rewrite rule that routes cdn/711/video.mp4 to media.php?id=711/video.mp4. That way, it doesn't look like there is a PHP script.
RewriteRule ^cdn/([0-9a-zA-Z_]+)/([0-9a-zA-Z-\w.-]+)([\w.-]+)?$ media\.php\?id=$1/$2 [QSA,L]
The system above works fine, but when I load an image directly its loading time is 237 ms, whereas when it is loaded through this PHP script the loading time is 1.65 s.
I have shared all the code I have, so if there is any room for improvement, please point me in the right direction so I can make changes accordingly.
The reason your script takes longer than querying s3 directly is that you've added a lot of overhead to the image request. Your webserver needs to download the image and then forward it to your users. That is almost definitely your biggest bottleneck.
The first thing I would suggest is using the S3 API. It still uses cURL under the hood, but it has optimizations that should improve performance slightly. It would also allow you to make your S3 bucket "private", which would make obscuring the S3 URL unnecessary.
All of that said, the recommended way to prevent hotlinking with AWS is to use CloudFront with referrer checking. AWS outlines how to set that up in its documentation.
If you don't want to refactor your infrastructure, the best way to improve performance is likely to implement a local cache. At its most basic, that would look something like this:
$cacheDir = '/path/to/a/local/directory/';
$cacheFileName = str_replace('/', '_', $file);
if (file_exists($cacheDir . $cacheFileName)) {
    $mime = mime_content_type($cacheDir . $cacheFileName);
    $content = file_get_contents($cacheDir . $cacheFileName);
} else {
    $url = 'https://s3.example.com';
    $s3 = "$url/$file";
    // HEAD request to read the Content-Type header
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $s3);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_NOBODY, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    $results = explode("\n", trim(curl_exec($ch)));
    curl_close($ch);
    $mime = 'application/octet-stream'; // fallback if S3 sends no Content-Type
    foreach ($results as $line) {
        if (strcasecmp(strtok($line, ':'), 'Content-Type') === 0) {
            $parts = explode(':', $line, 2);
            $mime = trim($parts[1]);
        }
    }
    $content = file_get_contents($s3);
    // Only cache a successful download
    if ($content !== false) {
        file_put_contents($cacheDir . $cacheFileName, $content);
    }
}
header("Content-type: $mime");
echo $content;
This stores a copy of the file locally so that the server does not need to download it from S3 every time it is requested. That should reduce your overhead somewhat, though it will not perform as well as a purely AWS-based solution. With this approach you will also have to add ways of cache-busting, periodically expiring the cache, and so on. To reiterate: do not just copy and paste this into a production environment. It is a starting point, more a proof of concept than production-ready code.
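Periodic expiry can be as simple as comparing the cache file's modification time with a time-to-live. This is a minimal sketch assuming a TTL in seconds suits your content; `cacheIsFresh` and `cachedFetch` are hypothetical helpers, and `$fetch` stands for whatever code downloads the object from S3.

```php
<?php
/**
 * True if $path exists and was written less than $ttl seconds ago.
 * $now can be passed in to make the check easy to test.
 */
function cacheIsFresh(string $path, int $ttl, ?int $now = null): bool
{
    clearstatcache(true, $path); // filemtime() results are cached by PHP
    if (!is_file($path)) {
        return false;
    }
    $now = $now ?? time();
    return ($now - filemtime($path)) < $ttl;
}

/**
 * Return the cached content for $key, calling $fetch() to refill the
 * cache when the entry is missing or older than $ttl seconds.
 */
function cachedFetch(string $cacheDir, string $key, int $ttl, callable $fetch): string
{
    $path = rtrim($cacheDir, '/') . '/' . str_replace('/', '_', $key);
    if (cacheIsFresh($path, $ttl)) {
        return file_get_contents($path);
    }
    $content = $fetch();
    if (!is_string($content)) {
        throw new RuntimeException("Could not fetch $key");
    }
    file_put_contents($path, $content, LOCK_EX); // lock against concurrent writers
    return $content;
}
```

For example: `$content = cachedFetch('/path/to/a/local/directory/', $file, 3600, fn () => file_get_contents($s3));`. Deleting a cache file, or touching it with an old timestamp, forces the next request to fetch from S3 again.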

Getting an image with file_get_contents() returns a Not Found error

I have an image on another server (Image), but when I fetch it with the file_get_contents() function it returns a Not Found error and generates this image.
file_put_contents(destination_path, file_get_contents(another_server_path));
Please help me, or let me know if there is another way to get these images.
Try this.
There is a problem with special characters in the URL, so you have to encode the special characters in the URL's basename.
$imgfile = 'http://www.lagrolla.com.au/image/m fr 137 group.jpg';
$destinationPath = '/path/to/folder/';
$filename = basename($imgfile);
// Percent-encode the file name (spaces become %20) so the remote server can find it
$imgfile = substr($imgfile, 0, -strlen($filename)) . rawurlencode($filename);
copy($imgfile, $destinationPath . $filename);
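If the URLs come from elsewhere and may contain spaces or other special characters in any part of the path, not just the basename, you can encode every path segment. This is a minimal sketch; `encodeUrlPath` is a hypothetical helper, and it assumes the URL has no query string or fragment.

```php
<?php
/**
 * Percent-encode each path segment of a URL, leaving the scheme, host
 * and slashes untouched. Assumes no query string or fragment.
 */
function encodeUrlPath(string $url): string
{
    $schemeEnd = strpos($url, '://');
    $pathStart = $schemeEnd === false ? false : strpos($url, '/', $schemeEnd + 3);
    if ($pathStart === false) {
        return $url; // no path to encode
    }
    $base = substr($url, 0, $pathStart);
    $segments = explode('/', substr($url, $pathStart));
    $encoded = array_map(
        // Decode first so an already-encoded "%20" is not turned into "%2520"
        static fn (string $s): string => rawurlencode(rawurldecode($s)),
        $segments
    );
    return $base . implode('/', $encoded);
}
```

For the URL in the question, `encodeUrlPath('http://www.lagrolla.com.au/image/m fr 137 group.jpg')` gives `http://www.lagrolla.com.au/image/m%20fr%20137%20group.jpg`, which can then be passed to copy() or file_get_contents().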
Another way to download a copy of a file from another server is to use cURL:
$ch = curl_init('http://www.lagrolla.com.au/image/data/m%20fr%20137%20group.jpg');
$destinationPath = '/path/to/folder/filenameWithNoSpaces.jpg';
$fp = fopen($destinationPath, 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
Note: it is bad practice to save images with spaces in the file name, so save this file under a proper name.
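One way to pick a "proper name" is to replace spaces and other unusual characters with hyphens. This is a minimal sketch; `safeFilename` is a hypothetical helper, and the allowed character set (letters, digits, `-` and `_`) is an assumption you can adjust.

```php
<?php
/**
 * Turn an arbitrary file name into one without spaces or special
 * characters, keeping a lower-case extension.
 */
function safeFilename(string $name): string
{
    $ext = preg_replace('/[^a-z0-9]/', '', strtolower(pathinfo($name, PATHINFO_EXTENSION)));
    $base = pathinfo($name, PATHINFO_FILENAME);
    // Collapse every run of disallowed characters into a single hyphen
    $base = trim(preg_replace('/[^A-Za-z0-9_-]+/', '-', $base), '-');
    if ($base === '') {
        $base = 'file';
    }
    return $ext === '' ? $base : $base . '.' . $ext;
}
```

With the file from the question, `safeFilename('m fr 137 group.jpg')` returns `m-fr-137-group.jpg`, which you could use as the destination name in either the copy() or the cURL version.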

PHP: Get metadata of a remote .mp3 file

I am looking for a function that gets the metadata of a .mp3 file from a URL (NOT local .mp3 file on my server).
Also, I don't want to install the ID3 extension (http://php.net/manual/en/id3.installation.php) or anything similar on my server.
I am looking for a standalone function.
Right now I am using this function:
<?php
// The getID3 library must already be loaded (e.g. require_once 'getid3/getid3.php').
function getfileinfo($remoteFile)
{
    $url = $remoteFile;
    $uuid = uniqid("designaeon_", true);
    $file = "../temp/" . $uuid . ".mp3";
    $size = 0;
    $ch = curl_init($remoteFile);
    //==============================Get Size==========================//
    $contentLength = 'unknown';
    $ch1 = curl_init($remoteFile);
    curl_setopt($ch1, CURLOPT_NOBODY, true);
    curl_setopt($ch1, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch1, CURLOPT_HEADER, true);
    curl_setopt($ch1, CURLOPT_FOLLOWLOCATION, true); //not necessary unless the file redirects (like the PHP example we're using here)
    $data = curl_exec($ch1);
    curl_close($ch1);
    if (preg_match('/Content-Length: (\d+)/i', $data, $matches)) {
        $contentLength = (int)$matches[1];
        $size = $contentLength;
    }
    //==============================Get Size==========================//
    if (!$fp = fopen($file, "wb")) {
        echo 'Error opening temp file for binary writing';
        return false;
    } else if (!$urlp = fopen($url, "rb")) {
        fclose($fp);
        echo 'Error opening URL for reading';
        return false;
    }
    try {
        $to_get = 65536; // 64 KB
        $chunk_size = 4096; // Haven't bothered to tune this, maybe other values would work better??
        $got = 0; $data = '';
        // Grab the first 64 KB of the file (fread, not fgets: MP3 data is binary, not lines)
        while (!feof($urlp) && $got < $to_get) {
            $chunk = fread($urlp, $chunk_size);
            if ($chunk === false || $chunk === '') {
                break;
            }
            $data .= $chunk;
            $got += strlen($chunk);
        }
        fwrite($fp, $data);
        // Grab the last 64 KB of the file, if we know how big it is (and it is bigger than 64 KB).
        if ($size > $to_get) {
            curl_setopt($ch, CURLOPT_FILE, $fp);
            curl_setopt($ch, CURLOPT_HEADER, 0);
            curl_setopt($ch, CURLOPT_RESUME_FROM, $size - $to_get);
            curl_exec($ch);
        }
        // Now $fp should be the first and last 64KB of the file!!
    } catch (Exception $e) {
        echo 'Error transferring file using fopen and cURL !!';
        return false;
    } finally {
        // Close everything so the data is flushed before getID3 reads the temp file
        curl_close($ch);
        fclose($fp);
        fclose($urlp);
    }
    $getID3 = new getID3;
    $filename = $file;
    $ThisFileInfo = $getID3->analyze($filename);
    getid3_lib::CopyTagsToComments($ThisFileInfo);
    unlink($file);
    return $ThisFileInfo;
}
?>
This function downloads 64 KB from the URL of an .mp3 file, returns the metadata array produced by getID3 (which works on local .mp3 files only), and then deletes the 64 KB it downloaded.
The problem with this function is that it is inherently slow: it downloads 64 KB per .mp3, so imagine 1,000 MP3 files.
To make my question clear: I need a fast, standalone function that reads the metadata of a remote .mp3 file from its URL.
Yeah, well what do you propose? How do you expect to get data if you don't get data? There is no way to have a generic remote HTTP server send you that ID3 data. Really, there is no magic. Think about it.
What you're doing now is already pretty solid, except that it doesn't handle all versions of ID3 and won't work for files with more than 64 KB of ID3 tags. What I would do to improve it is use multi-cURL.
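Without a library, PHP's built-in curl_multi_* functions can run the requests in parallel. This is a minimal sketch; `fetchFirstBytes` is a hypothetical name, and it sets CURLOPT_RANGE so each request asks only for the first `$bytes` bytes, which assumes the remote servers honour HTTP Range requests (a server that ignores Range sends the whole file).

```php
<?php
/**
 * Fetch the first $bytes bytes of several URLs in parallel.
 * Returns [key => body, or null on failure], keyed like $urls.
 */
function fetchFirstBytes(array $urls, int $bytes = 65536): array
{
    if ($urls === []) {
        return [];
    }
    $mh = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RANGE, '0-' . ($bytes - 1));
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }
    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // wait for activity instead of busy-looping
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        // 206 = partial content (Range honoured), 200 = full file sent
        $results[$key] = ($code === 200 || $code === 206) ? curl_multi_getcontent($ch) : null;
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

To follow the "10 at a time" advice, call it once per `array_chunk($urls, 10, true)` chunk and write each returned body to a temp file for getID3.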
There are several PHP classes available that make this easier:
https://github.com/jmathai/php-multi-curl
$mc = EpiCurl::getInstance();
$results[] = $mc->addUrl(/* Your stream URL here */); // Run this in a loop, 10 at a time or so
foreach ($results as $result) {
    // Do something with the data.
}
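On the point about files with more than 64 KB of ID3 tags: an ID3v2 tag starts with a 10-byte header ("ID3", two version bytes, one flags byte, then a 4-byte "syncsafe" size with 7 usable bits per byte). Reading just those 10 bytes tells you how many bytes to request, instead of always taking 64 KB. This is a minimal sketch; `id3v2TagSize` is a hypothetical helper, and ID3v1 tags (the last 128 bytes of the file) are not covered.

```php
<?php
/**
 * Given at least the first 10 bytes of an MP3 file, return the total
 * size of the leading ID3v2 tag (header + tag data + optional footer),
 * or 0 if the file does not start with an ID3v2 tag.
 */
function id3v2TagSize(string $head): int
{
    if (strlen($head) < 10 || substr($head, 0, 3) !== 'ID3') {
        return 0;
    }
    $flags = ord($head[5]);
    // Syncsafe integer: 4 bytes, only the low 7 bits of each are used
    $size = 0;
    for ($i = 6; $i < 10; $i++) {
        $size = ($size << 7) | (ord($head[$i]) & 0x7F);
    }
    $hasFooter = ($flags & 0x10) !== 0; // ID3v2.4 "footer present" flag
    return 10 + $size + ($hasFooter ? 10 : 0);
}
```

For example, a header ending in the size bytes `00 00 02 01` encodes 257 bytes of tag data, so `id3v2TagSize()` returns 267, and a Range request for bytes 0 to 266 fetches the whole tag.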

PHP function file_get_contents() gets only a few KB of a remote file

Hello, I want to download a remote ZIP file of about 8 MB. I wrote a simple script:
set_time_limit(0);
$zip = file_get_contents('http://web.tld/folder/download/getfile.do?filename=file.zip&_lang=Lang');
file_put_contents('zip_files/file.zip',$zip);
It works, but the stored file is only 52 KB instead of 8 MB.
It's the same if I use:
set_time_limit(0);
$url = 'http://web.tld/folder/download/getfile.do?filename=file.zip&_lang=Lang';
$path = 'zip_files/file.zip';
/* get and save remote data without exceeding php memory limit */
$fp = fopen($path, 'w');
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FILE, $fp);
$data = curl_exec($ch);
curl_close($ch);
fclose($fp);
So maybe I have to use some stream option? Thank you.
PS: I tried the Snoopy library (http://sourceforge.net/projects/snoopy/) and it's the same, only 52 KB :P
include "libs/Snoopy-2.0/Snoopy.class.php";
$snoopy = new Snoopy;
$snoopy->submit($url);
print $snoopy->results;
Look inside the saved file with any text editor: it is probably not a ZIP at all, just an HTML page reporting a wrong URL or some other error.
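You can also make that check from PHP by comparing the first four bytes of the saved file with the ZIP signature. This is a minimal sketch; `looksLikeZip` is a hypothetical helper. With cURL you can additionally inspect curl_getinfo($ch, CURLINFO_HTTP_CODE) and curl_getinfo($ch, CURLINFO_CONTENT_TYPE) after the transfer to see whether the server sent an error page instead of the archive.

```php
<?php
/**
 * True if the file at $path starts with a ZIP signature:
 * "PK\x03\x04" (local file header) or "PK\x05\x06" (empty archive).
 * An HTML error page saved as file.zip fails this check.
 */
function looksLikeZip(string $path): bool
{
    $fp = @fopen($path, 'rb');
    if ($fp === false) {
        return false;
    }
    $magic = fread($fp, 4);
    fclose($fp);
    return in_array($magic, ["PK\x03\x04", "PK\x05\x06"], true);
}
```

If `looksLikeZip('zip_files/file.zip')` returns false, open the file as text to read the server's message; it usually says what the download URL is missing.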

Download a file from the Google Drive API to my server using PHP

1 - I have configured Google Picker and it works fine: I select the file in the picker and get the file ID.
2 - After the refresh-token process and so on, I get the file metadata and the file's export link:
$downloadExpLink = $file->getExportLinks();
$downloadUrl = $downloadExpLink['application/vnd.openxmlformats-officedocument.wordprocessingml.document'];
3 - After that I use this:
if ($downloadUrl) {
    $request = new Google_HttpRequest($downloadUrl, 'GET', null, null);
    $httpRequest = Google_Client::$io->authenticatedRequest($request);
    if ($httpRequest->getResponseHttpCode() == 200)
    {
        $content = $httpRequest->getResponseBody();
        print_r($content);
    } else {
        // An error occurred.
        return null;
    }
}
and get this response:
[responseBody:protected] => PK...docProps/app.xml... (binary DOCX data, truncated)
4 - I use some cURL functions to get the file from Google Drive and save it to the server. A file is created in the server directory, but it is truncated. I use this code:
$downloadExpLink = $file->getExportLinks();
$downloadUrl = $downloadExpLink['application/vnd.openxmlformats-officedocument.wordprocessingml.document'];
//$downloadUrl value is
/*https://docs.google.com/feeds/download/documents/export/Export?id=1CEt1ya5kKLtgK************IJjDEY5BdfaGI&exportFormat=docx*/
When I put this URL into a browser, it downloads the file successfully, but when I fetch it with cURL or any other PHP code and save it on the server, the saved file is corrupted.
$ch = curl_init();
$source = $downloadUrl;
curl_setopt($ch, CURLOPT_URL, $source);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
$destination = "test/afile5.docx";
$file = fopen($destination, "wb"); // binary mode so the DOCX bytes are written unchanged
fputs($file, $data);
fclose($file);
This results in a corrupted file on the server, but when I use the same code to get any file other than a Google Drive file, it downloads to the server successfully.
Can anyone please help me download the file from $downloadUrl to my server using PHP?
