How can I tell if a URL points to an image? - php

I have a PHP page with a text input where the user is supposed to paste the remote URL of an image; I then have to store the image on the server and display it to the user. The problem is that I don't trust users to always provide a proper image URL, and I don't want them to upload a PDF or some other file, or a huge file of a few GB. I can check the extension, but that isn't very helpful. I hear I can check the MIME type, but I don't know how to open the file once, run all the validations (MIME type, file size) in one go, and then copy the file over. Moreover, since the file will be served pretty much as is (with a minor name change), I would like to know whether it is possible to make sure the file doesn't contain an injected virus or other problematic code.
Any suggestions appreciated.

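You can use exif_imagetype() to check whether it's an image.
If you want to be more confident that it doesn't contain malware or anything else unexpected, it's a good idea to load it with the GD library and save a fresh copy through GD. Re-encoding keeps only the pixel data, so code appended to or embedded in the original file is normally not carried over.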
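A minimal sketch of that re-encoding step, assuming the GD extension is loaded and the downloaded bytes are already in memory. The function name reencodeAsPng() is hypothetical, and PNG is just one possible output format.

```php
<?php
/**
 * Decode untrusted image bytes with GD and re-encode them as PNG.
 * Returns the new PNG bytes, or null if GD cannot decode the input.
 * (Hypothetical helper; requires the GD extension.)
 */
function reencodeAsPng(string $bytes): ?string
{
    // imagecreatefromstring() emits a warning and returns false on invalid data.
    $image = @imagecreatefromstring($bytes);
    if ($image === false) {
        return null;
    }

    // Keep the alpha channel, if any, when writing the PNG.
    imagesavealpha($image, true);

    // imagepng() writes to the output buffer when no file is given, so capture it.
    ob_start();
    $ok = imagepng($image);
    $png = ob_get_clean();

    return $ok ? $png : null;
}
```

The returned bytes can then be written to the final location with file_put_contents(), so the original upload never reaches a web-accessible directory.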

There are really multiple things that can be done here. I would suggest using cURL as your mechanism for transferring the file (rather than file_get_contents() or similar). The reason is that you can first send a HEAD request for the resource to get just the header information before committing to the download. From the headers you should be able to evaluate the file name, file size, MIME type, etc. Note that NONE of this information should be trusted, but it at least gives you a sanity check before committing to the file download.
Once you have done the sanity check, you can download the file into a local sandbox directory. This should not be a web-accessible directory. You can then use exif_imagetype() to determine whether the file is indeed an image of a type you are interested in.
Assuming this all looks good, I would do the last bit of cleanup and renaming with the GD library (perhaps using the imagecreatefrom*() functions to build the final image from the temporary download).
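Because the Content-Length header cannot be trusted either, the size limit also has to be enforced while the body is arriving. One way to sketch this, assuming the cURL extension, is a write callback that stops the transfer once a byte budget is used up. The names makeCappedWriter() and downloadCapped(), and the 5 MB figure in the usage note, are illustrative assumptions.

```php
<?php
/**
 * Build a cURL write callback that copies data to $fp but refuses to
 * write more than $maxBytes in total. (Hypothetical helper.)
 */
function makeCappedWriter($fp, int $maxBytes): callable
{
    $written = 0;
    return function ($ch, string $chunk) use ($fp, $maxBytes, &$written): int {
        $length = strlen($chunk);
        if ($written + $length > $maxBytes) {
            // Returning a count different from the chunk length makes cURL abort.
            return 0;
        }
        $written += $length;
        return (int) fwrite($fp, $chunk);
    };
}

/**
 * Download $url to $destination, giving up if the body exceeds $maxBytes.
 * Returns true only if the whole body was saved.
 */
function downloadCapped(string $url, string $destination, int $maxBytes): bool
{
    $fp = fopen($destination, 'wb');
    if ($fp === false) {
        return false;
    }
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_FAILONERROR, true);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, makeCappedWriter($fp, $maxBytes));
    $ok = curl_exec($ch) !== false;
    curl_close($ch);
    fclose($fp);
    if (!$ok) {
        unlink($destination); // do not keep partial or oversized files
    }
    return $ok;
}
```

For example, downloadCapped($url, $tmpFile, 5 * 1024 * 1024) would give up on anything larger than 5 MB, regardless of what the headers claimed.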

With cURL you have no problem with HTTPS, and you can store the file and then check it.
Here is code that checks the Content-Type for an image and then verifies the downloaded file with exif_imagetype() (enable the php_mbstring and php_exif extensions).
$url = 'https://www.google.com/images/icons/ui/doodle_plus/doodle_plus_google_logo_on_grey.gif';
$userAgent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 60);
curl_setopt($ch, CURLOPT_FAILONERROR, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// TLS certificate verification is left at its default (enabled);
// turning it off would let anyone on the network path swap the file.
// First pass: HEAD request, headers only.
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
curl_exec($ch);
if (!curl_errno($ch)) {
    $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
    // The Content-Type header is only a hint; the file itself is checked below.
    if (is_string($type) && stripos($type, 'image/') === 0) {
        // Second pass: switch back to GET and stream the body into a temp file.
        curl_setopt($ch, CURLOPT_NOBODY, false);
        curl_setopt($ch, CURLOPT_HTTPGET, true);
        curl_setopt($ch, CURLOPT_HEADER, false);
        $filename = tempnam('/path/to/store/file/', 'prefix');
        $fp = fopen($filename, 'wb');
        curl_setopt($ch, CURLOPT_FILE, $fp);
        curl_exec($ch);
        fclose($fp);
        if (!curl_errno($ch) && exif_imagetype($filename) !== false) {
            echo "100% IMAGE!";
            // take it! (move or copy the file before it is deleted below)
        }
        unlink($filename);
    }
}
curl_close($ch);

Related

How to figure out type of the image before downloading it?

I know I can save an image using the following method:
$input = 'http://images.websnapr.com/?size=size&key=Y64Q44QLt12u&url=http://google.com';
$output = 'google.com.jpg'; // << How to save the image with proper extension?
file_put_contents($output, file_get_contents($input));
But what if I don't know the format of the downloaded image? What if it's "png"? How can I figure out the type of target image before saving it?
The best thing to do is just download it and rename the file later. That way, you only have to make one request.
Another thing you can do however is make an HTTP HEAD request. This gets all of the response headers, including the Content-Type header, so you can decide if you want that data or not before you download the file. However, not all servers support HEAD requests.
In any case, cURL is the easiest way to do this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/something');
curl_setopt($ch, CURLOPT_NOBODY, true);
curl_setopt($ch, CURLOPT_HEADER, true);
$headers = curl_exec($ch);
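Once the headers are in hand, the Content-Type value can be mapped to a file extension. A rough sketch follows; the function name extensionForContentType() and the short list of types are assumptions for illustration, and the header is only as reliable as the server that sent it.

```php
<?php
/**
 * Map a Content-Type header value to a file extension.
 * Returns null for anything that is not a recognised image type.
 * (Hypothetical helper with a deliberately short type list.)
 */
function extensionForContentType(string $contentType): ?string
{
    $map = [
        'image/jpeg' => 'jpg',
        'image/png'  => 'png',
        'image/gif'  => 'gif',
        'image/webp' => 'webp',
    ];
    // Drop parameters such as "; charset=binary" and normalise case.
    $mime = strtolower(trim(explode(';', $contentType)[0]));
    return $map[$mime] ?? null;
}

// With a cURL handle that has already executed, as in the snippet above:
// $ext = extensionForContentType((string) curl_getinfo($ch, CURLINFO_CONTENT_TYPE));
```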

Check if URL is an image?

This has been asked before, but most, if not all, of the answers were either solutions that didn't work in all situations or did more work than necessary (like using getimagesize(), which downloads the entire image).
How would I check if a given URL leads to an image without having to hardcode image extensions (like .png, .jpg, etc.)?
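You can request only the HTTP response headers with cURL and then check whether the Content-Type header names an image type.
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, $file_url);
curl_setopt($curl, CURLOPT_NOBODY, true);          // HEAD request: no body
curl_setopt($curl, CURLOPT_HEADER, true);          // include the response headers in the result
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
$header = curl_exec($curl);
curl_close($curl);
// Header names are case-insensitive, so match accordingly.
if (is_string($header) && preg_match('/^content-type:\s*image\//mi', $header)) {
    // file is an image (according to the server)
}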
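Since the Content-Type header is whatever the server chooses to send, a complementary check is to look at the first few bytes of the body, which for common formats carry a fixed signature. The sketch below assumes the leading bytes have already been fetched (for instance with a Range request, which not every server honours); sniffImageType() is a hypothetical name and the format list is intentionally short.

```php
<?php
/**
 * Guess the image format from the first bytes of a file.
 * Returns 'png', 'jpeg', 'gif', 'webp', or null if no signature matches.
 * (Hypothetical helper covering only a few common formats.)
 */
function sniffImageType(string $head): ?string
{
    if (strncmp($head, "\x89PNG\r\n\x1a\n", 8) === 0) {
        return 'png';
    }
    if (strncmp($head, "\xFF\xD8\xFF", 3) === 0) {
        return 'jpeg';
    }
    if (strncmp($head, 'GIF87a', 6) === 0 || strncmp($head, 'GIF89a', 6) === 0) {
        return 'gif';
    }
    // WebP: "RIFF", four size bytes, then "WEBP".
    if (strlen($head) >= 12 && substr($head, 0, 4) === 'RIFF' && substr($head, 8, 4) === 'WEBP') {
        return 'webp';
    }
    return null;
}

// Fetching only the leading bytes could look like this (assumes $file_url is set):
// $curl = curl_init($file_url);
// curl_setopt($curl, CURLOPT_RANGE, '0-11');
// curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
// $type = sniffImageType((string) curl_exec($curl));
```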

Download image using PHP but image is htaccess redirected?

I want to download an image to my server using PHP. This image's html only allows target="_self" meaning it can only be downloaded from the browser apparently. I try to access the image directly in the browser and I get redirected. Is there any way to download this image onto my server via PHP? Maybe I'm missing an option in cURL?
Thanks!
Yes, you have to tell cURL to follow redirects. Try this function:
function wgetImg($img, $pathToSaveTo) {
    $ch = curl_init($img);
    $fp = fopen($pathToSaveTo, 'wb');
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_AUTOREFERER, 1);    // set the Referer header automatically on redirects
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow Location: headers
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
}
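When following redirects for URLs you do not control, it may be worth bounding how far cURL is allowed to go. The sketch below collects a few options for that purpose; the function name redirectOptions() and the limit of 5 are assumptions, not values from the answer.

```php
<?php
/**
 * cURL options for following redirects conservatively.
 * (Hypothetical helper; the default limit is an arbitrary choice.)
 */
function redirectOptions(int $maxRedirects = 5): array
{
    return [
        CURLOPT_FOLLOWLOCATION  => true,
        CURLOPT_MAXREDIRS       => $maxRedirects, // stop redirect loops
        CURLOPT_AUTOREFERER     => true,          // set Referer on each hop
        // Only follow redirects that stay on HTTP or HTTPS.
        CURLOPT_REDIR_PROTOCOLS => CURLPROTO_HTTP | CURLPROTO_HTTPS,
    ];
}

// Usage with an existing handle:
// curl_setopt_array($ch, redirectOptions());
```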

fopen working for some urls but not others

I am using fopen to check the existence of an image file (and as a precursor for extracting the image from the external url).
It is working perfectly fine for most images, for example,
http://ecx.images-amazon.com/images/I/51DbiFInDUL.SY300.jpg
But it is not working for images from a website like Victoria's Secret, for example:
http://dm.victoriassecret.com/product/428x571/V360249.jpg
Is this a permissions problem? And if so, is there any work around?
$url = "http://dm.victoriassecret.com/product/428x571/V360249.jpg";
$handle = @fopen($url,'r');
if($handle !== false){
return true;
}
For successful link, $handle returns "Resource ID #11", but for unsuccessful link like Victoria's Secret, $handle returns nothing.
Additionally, exif_imagetype is not returning anything for the images (we have the exif extension installed).
Is there any work around for this? We are building a bookmarklet that allows users to extract pictures from sites. We noticed that other bookmarklets are able to get around this (i.e. Pinterest) and are able to get the pictures from Victoria's Secret.
The server returns no data because of hotlink protection (probably configured in its .htaccess file). You need to fetch the image the way a browser client would. In my tests this works with cURL if you send a browser-like User-Agent header, read the contents, and save them to a file.
The solution below solves your problem.
Note: pay attention to the file types listed in the Accept header. The example includes image/gif for GIF files, for instance; add other types, such as image/png for PNG, as needed.
Example of solution that WORKS:
error_reporting(E_ALL);
ini_set('display_errors', '1');
$url = "http://dm.victoriassecret.com/product/428x571/V360249.jpg";
// Fetch $url while presenting browser-like request headers.
function getimg($url) {
    $headers = [];
    $headers[] = 'Accept: image/gif, image/x-bitmap, image/jpeg, image/pjpeg';
    $headers[] = 'Connection: Keep-Alive';
    $headers[] = 'Content-type: application/x-www-form-urlencoded;charset=UTF-8';
    $user_agent = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';
    $process = curl_init($url);
    curl_setopt($process, CURLOPT_HTTPHEADER, $headers);
    curl_setopt($process, CURLOPT_HEADER, 0);
    curl_setopt($process, CURLOPT_USERAGENT, $user_agent);
    curl_setopt($process, CURLOPT_TIMEOUT, 30);
    curl_setopt($process, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($process, CURLOPT_FOLLOWLOCATION, 1);
    $return = curl_exec($process);
    curl_close($process);
    return $return;
}
$imgurl = $url;
$imagename = basename($imgurl);
// Skip the download if the file already exists locally.
if (!file_exists($imagename)) {
    $image = getimg($imgurl);
    // curl_exec() returns false on failure; do not write that to disk.
    if ($image !== false) {
        file_put_contents($imagename, $image);
    }
}
Note: If you are on a Linux filesystem, make sure the target folder is writable (check with chmod); otherwise the file will not be saved.
As for the EXIF data and whether the image downloaded with cURL is identical to the original: I compared the md5sum of the original image on the victoriassecret server with that of the cURL download. The results are identical, so you can grab the data, analyze it later, and delete it when you no longer need it.
On a Linux platform you can test whether two files are identical by comparing their MD5 sums with md5sum:
md5sum V360249.jpg V360249_original.jpg
893a47cbf0b4fbe4d1e49d9d4480b31d V360249.jpg
893a47cbf0b4fbe4d1e49d9d4480b31d V360249_original.jpg
The sums match, so you can be sure that exif_imagetype() will report the same information for the downloaded file as for the original.
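The same comparison can be made from PHP itself. A small sketch, assuming both files are on the local disk; filesIdentical() is a hypothetical helper name.

```php
<?php
/**
 * Report whether two local files have the same MD5 digest.
 * (Hypothetical helper; MD5 is fine for spotting transfer damage,
 * not for defending against deliberate tampering.)
 */
function filesIdentical(string $pathA, string $pathB): bool
{
    $sumA = md5_file($pathA);
    $sumB = md5_file($pathB);
    // md5_file() returns false if a file cannot be read.
    return $sumA !== false && $sumA === $sumB;
}

// var_dump(filesIdentical('V360249.jpg', 'V360249_original.jpg'));
```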
By removing the @ (error suppression) operator in front of fopen(), I was able to get a more meaningful error:
Warning: fopen(http://dm.victoriassecret.com/product/428x571/V360249.jpg) [function.fopen]: failed to open stream: HTTP request failed! in [removedSomedatahere]/test.php on line 5
It fails in a similar way with curl, wget, and fopen when no other options are set. I would hypothesize that this has something to do with cookies or some other setting not being set, but I don't have a direct answer for you. Hopefully that helps a little.
[Edited - Solution based on comments]
So it appears that using curl may be a better option in this case if you also set the user agent. The site was blocking based on the user agent. So the solution is to set a commonly used browser as the agent.
Here is an example of setting the user agent:
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

cURL script will not download images; Instead renders junk

I have been using the following function to download pictures from a distributor for use on our website as was described here:
$url = "http://covers.stl-distribution.com/7819/lg-9781936417445.jpg";
$itemnum = 80848;
$path = "www.gullions.com/localstore/test/test.jpg";
header('Content-type: image/jpeg');
$ch = curl_init($url);
$fp = fopen("http://www.gullions.com/localstore/test/test.jpg", 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_exec($ch);
curl_close($ch);
fclose($fp);
Instead of downloading and saving the picture, it only prints crazy characters to the screen. I know that means that for some reason the browser is trying to render the picture but most likely doesn't recognize the file type. But I can't seem to find the problem. You can see the crazy output by navigating here. To verify that the image wasn't downloaded and saved, you can navigate here. Also, FTP also shows that no file was downloaded. If you navigate to the original picture's download url you'll see that the file we are trying to download does in fact exist.
I have contacted my host and verified that no settings have been changed with the server, that cURL is functioning properly, and have even rolled back my browser to verify that a recent update didn't cause the issue. I created a test file by removing the function from the application and have tried running it separately but have only had the same results.
Add:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
as this parameter is false by default. The docs read:
"TRUE to return the transfer as a string of the return value of curl_exec() instead of outputting it out directly."
EDIT
Setting CURLOPT_RETURNTRANSFER makes curl_exec() return the data, so it has to be written to the file manually, like this:
$url = "http://covers.stl-distribution.com/7819/lg-9781936417445.jpg";
$ch = curl_init($url);
$fp = fopen("./test.jpg", 'wb');
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_HEADER, 0);
fwrite($fp, curl_exec($ch));
curl_close($ch);
fclose($fp);
Also, this code, which uses CURLOPT_FILE, works just fine for me:
$url = "http://covers.stl-distribution.com/7819/lg-9781936417445.jpg";
$ch = curl_init($url);
$fp = fopen("./test.jpg", 'wb');
curl_setopt($ch, CURLOPT_FILE, $fp);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_exec($ch);
curl_close($ch);
fclose($fp);
So I suspect that your file handle is not valid, and cURL therefore falls back to its default of printing the output. In your code, fopen() is given an http:// URL in 'wb' mode; the HTTP stream wrapper cannot open URLs for writing, so fopen() returns false. Try this code with elementary error checking (you should ALWAYS check for errors):
$url = "http://covers.stl-distribution.com/7819/lg-9781936417445.jpg";
$fp = fopen("./test.jpg", 'wb');
// fopen() returns false on failure, so compare strictly against false.
if ($fp !== false) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_exec($ch);
    curl_close($ch);
    fclose($fp);
} else {
    exit('ERROR: Failed to open file');
}
Note that my examples write to the same folder the script sits in. This should work unless your server's file permissions are badly misconfigured. If it works for you, then check whether the user your scripts run as can write to your target folder.
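That last check can also be done from PHP before the download starts. A minimal sketch, with the hypothetical helper name openTargetForWriting(), that reports why a target path cannot be used instead of letting cURL fall back to printing the data:

```php
<?php
/**
 * Open $path for binary writing, or throw with a reason if that is
 * not possible. (Hypothetical helper.)
 *
 * @return resource
 */
function openTargetForWriting(string $path)
{
    $dir = dirname($path);
    if (!is_dir($dir)) {
        throw new RuntimeException("Directory does not exist: $dir");
    }
    if (!is_writable($dir)) {
        throw new RuntimeException("Directory is not writable by the PHP user: $dir");
    }
    $fp = @fopen($path, 'wb');
    if ($fp === false) {
        throw new RuntimeException("Could not open file for writing: $path");
    }
    return $fp;
}
```

The returned handle can be passed to CURLOPT_FILE exactly as in the examples above.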
You haven't told your browser what type of file to expect, so it's probably defaulting to text/plain.
You need at least:
header('Content-type: image/jpeg');
As well, curl by default outputs whatever it fetches, unless you explicitly tell it you want to have the fetched data returned, or saved directly to file.
