Would calling getimagesize() on a file and checking if the returned value differs from false suffice to determine whether or not a file is an image?
Are there any other ways to determine whether a file is an image in PHP, solutions that are more foolproof than simply checking the extension?
getimagesize() is a pretty reliable indication that the file is an image, yes.
It will determine if the image appears to have a valid header.
It (usually) won't determine if there is any corruption in the actual image data, which may show up as a messed up image or an error part way through loading the image.
Also keep in mind that a file can be a valid image file and still conceal other data - either within metadata, within image data, or after the end of the image data. So while getimagesize() may tell you that you have a valid image, it doesn't necessarily mean the file isn't also valid as another type. Since JAR and ZIP files are read from the end of the file, it's possible for a file to be both a valid image and a valid JAR/ZIP file, and JAR files could be executed in a browser through the Java plugin - the basis of the GIFAR exploit.
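To make the point about appended data concrete, here is a minimal sketch in Python (standard library only). The byte string for the tiny GIF, the helper names and the member name payload.txt are illustrative assumptions, not taken from any real exploit. It shows that a header-only check still sees a GIF, while a ZIP reader, which starts from the end of the file, sees an archive.

```python
import io
import struct
import zipfile

# A tiny 1x1 GIF (illustrative bytes; only the header matters for this demo).
TINY_GIF = (
    b"GIF89a\x01\x00\x01\x00\x80\x00\x00"      # signature + logical screen descriptor
    b"\x00\x00\x00\xff\xff\xff"                # two-colour palette
    b"!\xf9\x04\x01\x00\x00\x00\x00"           # graphic control extension
    b",\x00\x00\x00\x00\x01\x00\x01\x00\x00"   # image descriptor
    b"\x02\x02D\x01\x00;"                      # image data + trailer
)


def build_polyglot() -> bytes:
    """Return GIF bytes with a ZIP archive appended after the image data."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("payload.txt", "hidden content")  # hypothetical member
    return TINY_GIF + archive.getvalue()


def looks_like_gif(data: bytes) -> bool:
    """Header-only check: look at the six signature bytes and nothing else."""
    return data[:6] in (b"GIF87a", b"GIF89a")


def gif_dimensions(data: bytes) -> tuple:
    """Read width and height from the logical screen descriptor."""
    return struct.unpack("<HH", data[6:10])


polyglot = build_polyglot()
print(looks_like_gif(polyglot))                  # the header still says GIF
print(zipfile.is_zipfile(io.BytesIO(polyglot)))  # a ZIP reader accepts it too
```

A header check alone therefore cannot rule out a second interpretation of the same bytes; re-encoding the pixels into a new file is what drops the appended data.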
It would suffice to find out whether it's one of the supported file formats, yes. It actually parses the header bytes of the file, and is therefore very reliable.
It's the best method to use that is built into PHP.
Advanced tools like ImageMagick's identify command do essentially the same - consider them only if you need to support many more file formats than those supported by getimagesize() (the supported formats are listed in the IMAGETYPE_* constants).
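As a rough illustration of what "parsing the header bytes" means, the sketch below (Python, standard library only) compares the first bytes of a file against a few well-known signatures. It is a simplification, not PHP's implementation, and the function name sniff_image_type is made up for this example.

```python
from typing import Optional

# Well-known file signatures ("magic bytes") for a few image formats.
SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"BM", "bmp"),
)


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return None


# An executable renamed to .jpg still starts with "MZ", not a JPEG marker.
print(sniff_image_type(b"MZ\x90\x00 rest of an exe"))
print(sniff_image_type(b"\xff\xd8\xff\xe0 rest of a jpeg"))
```

A real header parser goes further and also reads fields such as width and height, but the principle is the same: the file name plays no part in the decision.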
I noticed that PHP Imagick changes the IDAT chunks when processing PNGs.
How exactly is this done? Is there a possibility to create IDAT chunks that remain unchanged? Is it possible to predict the outcome of Imagick?
Background information for this question:
I wondered whether the following code (part of a PHP file upload) can prevent hiding PHP code (e.g. webshells) in PNGs:
$image = new Imagick('uploaded_file.png');
$image->stripImage();
$image->writeImage('secure_file.png');
Comments are stripped out, so the only way to bypass this filter is hiding the PHP payload in the IDAT chunk(s). As described here, it is theoretically possible, but Imagick somehow reinterprets this image data even if I set Compression and CompressionQuality to the values I used to create the PNG. I also managed to create a PNG whose ZLIB header remained unchanged by Imagick, but the raw compressed image data didn't. The only PNGs where I got identical input and output are the ones which went through Imagick before. I also tried to find the reason for this in the source code, but couldn't locate it.
I'm aware of the fact that other checks are necessary to ensure the uploaded file is actually a PNG etc. and PHP code in PNGs is no problem if the server is configured properly, but for now I'm just interested in this issue.
IDAT chunks can vary and still produce an identical image. The PNG spec unfortunately forces the data of all IDAT chunks, concatenated, to form a single continuous data stream. This means the data can be grouped/chunked differently, but when reassembled into a single stream it will be identical. Is the actual data different, or is just the "chunking" changed? If the latter, why does it matter, as long as the image is identical? PNG compression is lossless, so stripping the metadata and even decompressing and recompressing an image shouldn't change any pixel values.
If you're comparing the compressed data and expecting it to be identical, it can be different and still yield an identical image. This is because FLATE compression uses an iterative process to find the best matches in previous data. The higher the "quality" number you give it, the more it will search for matches and shrink the output data size. With zlib, a level 9 deflate request will take a lot longer than the default and result in slightly smaller output data size.
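The effect is easy to reproduce outside of any image library. A minimal sketch with Python's zlib module (the sample data is made up): the same input compressed at two levels gives different compressed bytes, yet both decompress to identical data.

```python
import zlib

# Sample "pixel" data: repetitive enough for the compressor to find matches.
raw = bytes(range(256)) * 64 + b"some less regular tail data" * 32

fast = zlib.compress(raw, 1)  # little effort spent searching for matches
best = zlib.compress(raw, 9)  # much more effort spent searching for matches

print(len(raw), len(fast), len(best))
print(fast == best)                                    # compressed bytes differ
print(zlib.decompress(fast) == zlib.decompress(best))  # decoded data is the same
```

This is why a byte-for-byte comparison of IDAT contents says nothing about whether the pixels changed; compare the decompressed data instead.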
So, please answer the following questions:
1) Are you trying to compare the compressed data before/after your strip operation to see if somehow the image changed? If so, then looking at the compressed data is not the way to do it.
2) If you want to strip metadata without any other aspect of the image file changing then you'll need to write the tool yourself. It's actually trivial to walk through PNG chunks and reassemble a new file while skipping the chunks you want to remove.
Answer my questions and I'll update my answer with more details...
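The chunk walk mentioned in point 2 can be sketched as follows (Python, standard library only). All names here are made up for the example, the demo PNG is built by hand, and the sketch does not verify the CRC of incoming chunks. It copies the data of kept chunks untouched, so the IDAT bytes stay identical.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
CRITICAL = (b"IHDR", b"PLTE", b"IDAT", b"IEND")


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialise one chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk, stopping at IEND."""
    if not png.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 12 <= len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        yield chunk_type, png[pos + 8:pos + 8 + length]
        if chunk_type == b"IEND":
            return  # anything after IEND is ignored
        pos += 12 + length


def strip_chunks(png: bytes, keep=CRITICAL) -> bytes:
    """Rebuild the file from the kept chunks only; chunk data is copied as-is."""
    out = [PNG_SIGNATURE]
    for chunk_type, data in iter_chunks(png):
        if chunk_type in keep:
            out.append(make_chunk(chunk_type, data))
    return b"".join(out)


# Demo input: a 1x1 greyscale PNG with a tEXt comment (all values illustrative).
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
idat = zlib.compress(b"\x00\x00")  # filter byte + one black pixel
sample = PNG_SIGNATURE + b"".join([
    make_chunk(b"IHDR", ihdr),
    make_chunk(b"tEXt", b"Comment\x00<?php /* payload */ ?>"),
    make_chunk(b"IDAT", idat),
    make_chunk(b"IEND", b""),
]) + b"trailing junk"

cleaned = strip_chunks(sample)
print([t.decode() for t, _ in iter_chunks(cleaned)])  # ['IHDR', 'IDAT', 'IEND']
```

Keeping only the four critical chunks is the strictest choice. Some ancillary chunks, such as tRNS or gAMA, affect how the image is rendered, so a real tool would probably pass a longer keep list.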
I wondered whether the following code (part of a PHP file upload) can prevent hiding PHP code (e.g. webshells) in PNGs
You should never need to think about this. If you are worried about people hiding webshells in a file that is uploaded to your server, you are doing something wrong.
For example, serving those files through the PHP parser, which is the way a webshell could be invoked to attack a server.
From the Imagick readme file:
5) NEVER directly serve any files that have been uploaded by users directly through PHP, instead either serve them through the webserver, without invoking PHP, or use readfile to serve them within PHP.
readfile doesn't execute the file, it just sends it to the end-user without invoking it, and so completely prevents the type of attack you seem to be concerned about.
I am wondering what the difference is between $_FILES["file"]["type"] and end(explode(".", $name)), and what an appropriate method would be to determine whether the reported file type really matches the content of the file.
For example, what's the best way to handle a file that was named "image.exe" and then renamed to "image.jpg"?
I've seen a lot of talk about MIME types, but it seems that method has been deprecated.
This is the correct way to read an extension:
$ext = pathinfo($name, PATHINFO_EXTENSION);
The correct way to check if something is an image is to try and read it with an image tool, such as imagemagick or GD. GD is easier, but imagemagick is better at handling big images, such as one uploaded from a 12 megapixel camera.
If you're worried a jpg is really an exe, the only way to safely process it is to read it as a jpg and try to create a new jpg (typically resizing it at the same time). Gmail does this with image attachments.
Also beware a real jpeg might have some kind of exploit, so even if it is an image it is not safe to just pass it onto the user anyway. You really should resize it to create a new "safe" jpeg and then give that to the user. You could make a new jpg the same size if you want, but passing the original data on to other users is dangerous.
I wouldn't even allow an admin user for your website to access a jpg uploaded by a random internet user. It could be used to hack into the admin's PC.
$_FILES["file"]["type"] is supplied by the user's browser and hence useless for security.
The file extension, as you note, is easy to fake as well.
If you want to make sure an image is an image, the easiest way is to run getimagesize() on it.
If you want to make super sure and remove any and all metadata possibly embedded in the image, use GD's imagecreatefromstring() to copy the image to an empty canvas (but be prepared for a possible slight loss in quality.)
For other file types, there is now the Fileinfo extension. It uses a magic database (the kind of signature file used by libmagic and the Unix file command) to estimate a file's type by checking certain characteristics and "header bytes" in the file.
If you're just working with image files, then getimagesize() should do it:
http://www.php.net/manual/en/function.getimagesize.php
It will return the image type as an element of the array it returns or FALSE if the file is not a valid image.
Beware of double extensions when using end(explode(".", $name)).
Apache will read the right-most extensions if 2 extensions are given which map onto the same meta information. E.g. file.gif.html will be associated as an html file.
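The double-extension problem can be made visible by looking at every extension in the name rather than only the last one. A small sketch in Python (standard library only); the helper names and the allow-list are assumptions chosen for the example:

```python
from pathlib import PurePosixPath


def all_extensions(filename: str) -> list:
    """Return every extension in the name, lower-cased, without the dots."""
    return [s.lstrip(".").lower() for s in PurePosixPath(filename).suffixes]


def has_single_allowed_extension(filename: str,
                                 allowed=("gif", "jpg", "jpeg", "png")) -> bool:
    """Accept only names with exactly one extension, and only if it is allowed."""
    exts = all_extensions(filename)
    return len(exts) == 1 and exts[0] in allowed


print(all_extensions("file.gif.html"))                # ['gif', 'html']
print(has_single_allowed_extension("file.gif.html"))  # False
print(has_single_allowed_extension("photo.JPG"))      # True
```

This check is deliberately strict and would also reject harmless names such as my.photo.jpg. Generating the stored file name yourself avoids the question altogether.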
As stated above, the client-provided browser info is useless as a security measure.
Best bet - getimagesize();
As with everything, there are ways around it. A malicious code comment could be added to the picture and would pass the getimagesize() check, because the header is still valid.
I want to check the type of an uploaded file. Which way is more reliable, and which gives more accurate information about the file type? This:
$_FILES['uploaded_file']['type']
or this:
$imgType = getimagesize($_FILES['uploaded_file']['tmp_name']);
$imgType['mime'];
?
$_FILES['uploaded_file']['type'] is user input - it is an arbitrary string defined by the client. It is therefore not safe to use for anything at all, ever.
getimagesize() is a much safer way to do it, but it does not protect you completely.
You also need to:
Store the file on your local server with a name completely of your own devising. It is not safe to rely on any user input for generating local file system file names.
Use GD to copy the pixel data from the source file to the destination file. getimagesize() only looks at the meta data associated with the file and does not look at the file data. It is possible to hide malicious code inside something that looks like an image. Once you have resampled the image, ensure the uploaded file is deleted from the server.
Ensure that the file is stored in the local file system with the minimum required permissions and a restrictive owner/group.
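The first point, a file name entirely of your own devising, can be sketched like this (Python, standard library only). The mapping and the function name are assumptions for illustration; the key idea is that neither the name nor the extension is derived from anything the client sent.

```python
import secrets

# Extension chosen from the type *detected* on the server, never from user input.
EXTENSION_FOR_TYPE = {"jpeg": ".jpg", "png": ".png", "gif": ".gif"}


def storage_name(detected_type: str) -> str:
    """Build a random file name for a validated upload."""
    extension = EXTENSION_FOR_TYPE[detected_type]  # KeyError for unsupported types
    return secrets.token_hex(16) + extension


print(storage_name("png"))  # 32 random hex characters followed by ".png"
```

The original file name, if it is needed for display, can be stored separately (for example in a database column) and escaped on output.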
$_FILES[...]['type'] is never reliable, it's an arbitrary user-supplied value. getimagesize, exif_imagetype or finfo are the preferred ways to check what you've got. Also see Security threats with uploads.
I trust getimagesize(), because anyone can change the uploaded file's type in the HTTP/HTTPS request, for example with an add-on such as Tamper Data for Firefox.
In practice you should probably use the metadata that comes with the upload ($_FILES['uploaded_file']['type']); however, this could be tampered with before upload.
If you're just after the size, use this metadata, as it will be faster than actually measuring the image with getimagesize(). However, if you are after file type information, it may be best to check beyond what the file 'says' it is, as it's all too easy to sneak an executable through a naive check.
Using getimagesize() will provide you with more consistent MIME information, since it determines the type on the server from the file's own contents rather than from anything the client sent (it does not require the GD extension for this).
If you rely solely on $_FILES['uploaded_file']['type'], it will contain the MIME type as defined on the client computer (i.e. by the browser), or the browser may not even send a MIME type.
One pretty stupid example of this is checking if($_FILES['uploaded_file']['type'] == 'image/jpeg'), which may fail with IE6/7, which send the 'image/pjpeg' MIME type.
An alternative to both of these is mime_content_type(), but it is being deprecated as a PECL module, so as of PHP 5.3.0 there are the Fileinfo functions, which should work much more consistently with any file type - though I haven't tested them.
For more overview of mime-types check this SO article: How do I find the mime-type of a file with php?
Don't ever trust $_FILES["image"]["type"]. It contains whatever the browser sent, so don't rely on it for the image type. Use getimagesize() instead, or, if you want to be on the safer side, use finfo_open().
Source: http://php.net/manual/en/reserved.variables.files.php
http://i.stack.imgur.com/JU3e2.jpg
So I am uploading some images, but when I run them through some would-be PHP validation, these photos seem not to have a MIME type. Should I check against the file extension instead (explode(".", "img.jpg")), or is there a fix for this?
The MIME type for uploaded files is set by the client, not by the server. As such, it is completely unreliable.
If you want to check that a file actually is an image of a given format, you should rather use the exif_imagetype function.
http://php.net/manual/en/function.exif-imagetype.php
Okay. So I have about 250,000 high resolution images. What I want to do is go through all of them and find ones that are corrupted. If you know what 4scrape is, then you know the nature of the images I.
Corrupted, to me, means that when the image is loaded into Firefox, it says
The image “such and such image” cannot be displayed, because it contains errors.
Now, I could select all of my 250,000 images (~150gb) and drag-n-drop them into Firefox. That would be bad though, because I don't think Mozilla designed Firefox to open 250,000 tabs. No, I need a way to programmatically check whether an image is corrupted.
Does anyone know a PHP or Python library which can do something along these lines? Or an existing piece of software for Windows?
I have already removed obviously corrupted images (such as ones that are 0 bytes) but I'm about 99.9% sure that there are more diseased images floating around in my throng of a collection.
An easy way would be to try loading and verifying the files with PIL (Python Imaging Library).
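from PIL import Image

# "file" is the path of the image to check
try:
    v_image = Image.open(file)
    v_image.verify()
except Exception as e:  # verify() raises if it finds a problem
    print("corrupted:", file, e)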
From the documentation:
im.verify()
Attempts to determine if the file is broken, without actually decoding the image data. If this method finds any problems, it raises suitable exceptions. This method only works on a newly opened image; if the image has already been loaded, the result is undefined. Also, if you need to load the image after using this method, you must reopen the image file.
I suggest you check out ImageMagick for this: http://www.imagemagick.org/
It includes a tool called identify, which you can either use in combination with a script/stdout, or you can use the programming interface provided.
In PHP, with exif_imagetype():
if (exif_imagetype($filename) === false)
{
    unlink($filename); // signature not recognised: not a supported image type
}
EDIT: Or you can try to fully load the image with ImageCreateFromString():
if (ImageCreateFromString(file_get_contents($filename)) === false)
{
    unlink($filename); // image is corrupted
}
An image resource will be returned on success. FALSE is returned if the image type is unsupported, the data is not in a recognized format, or the image is corrupt and cannot be loaded.
If your exact requirement is that it shows correctly in Firefox, you may have a difficult time - the only way to be sure would be to link against the exact same image loading source code that Firefox uses.
Basic image corruption (file is incomplete) can be detected simply by trying to open the file using any number of image libraries.
However, many images can fail to display simply because they stretch a part of the file format that the particular viewer you are using can't handle (GIF in particular has a lot of these edge cases, but you can find JPEG and the rare PNG file that can only be displayed in specific viewers). There are also some ugly JPEG edge cases where the file appears to be uncorrupted in viewer X, but in reality the file has been cut short and is only displaying correctly because very little information has been lost (Firefox can show some cut-off JPEGs correctly [you get a grey bottom], but others result in Firefox seeming to load them halfway and then displaying the error message instead of the partial image).
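For the "file has been cut short" case specifically, a cheap first pass is to check whether the file still ends with its format's end marker. The sketch below (Python, standard library only, function name made up) does this for JPEG and PNG. It is only a heuristic, not a substitute for decoding the image.

```python
JPEG_SOI = b"\xff\xd8"  # start-of-image marker
JPEG_EOI = b"\xff\xd9"  # end-of-image marker
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
PNG_IEND = b"\x00\x00\x00\x00IEND\xaeB`\x82"  # the fixed final chunk of a PNG


def looks_truncated(data: bytes) -> bool:
    """Return True if a JPEG or PNG lacks its end marker (heuristic only)."""
    if data.startswith(JPEG_SOI):
        return not data.endswith(JPEG_EOI)
    if data.startswith(PNG_SIGNATURE):
        return not data.endswith(PNG_IEND)
    raise ValueError("unsupported or unrecognised format")


# Stand-in bytes for the demo; a real caller would read the file from disk.
complete_jpeg = JPEG_SOI + b"...image data..." + JPEG_EOI
print(looks_truncated(complete_jpeg))        # False
print(looks_truncated(complete_jpeg[:-5]))   # True
```

The heuristic errs in both directions: a file with extra bytes after the end marker is flagged even though most viewers display it, and a file damaged in the middle passes. It is best used to shortlist candidates before a full decode.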
You could use imagemagick if it is available:
if you want to do a whole folder
identify "./myfolder/*" >log.txt 2>&1
if you want to just check a file:
identify myfile.jpg