I am creating a file upload script and I'm looking for the best techniques and practices to validate uploaded files.
Allowed extensions are:
$allowed_extensions = array('gif','jpg','png','swf','doc','docx','pdf','zip','rar','rtf','psd');
Here's the list of what I'm doing.
Checking file extension
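$path_info = pathinfo($filename);
// Lowercase the extension and tolerate file names that have none.
$extension = strtolower($path_info['extension'] ?? '');
if( !in_array($extension, $allowed_extensions, true) ) {
    die('File #'.$i.': Incorrect file extension.');
}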
Checking file mime type
$allowed_mimes = array('image/jpeg','image/png','image/gif','text/richtext','multipart/x-zip','application/x-shockwave-flash','application/msword','application/pdf','application/x-rar-compressed','image/vnd.adobe.photoshop');
$finfo = finfo_open(FILEINFO_MIME_TYPE);
if( !in_array(finfo_file($finfo, $file), $allowed_mimes, true) ) {
    die('File #'.$i.': Incorrect mime type.');
}
Checking file size.
What should I do to make sure uploaded files are valid files? I noticed a strange thing: I changed a .jpg file's extension to .zip and it was uploaded. I expected it to be rejected for having an incorrect MIME type, but then I noticed that I'm not checking that the MIME type matches the extension, only that it exists in the array. I'll fix that later; it presents no problem for me (of course, if you have a good solution or idea, please do not hesitate to share it).
I know what to do with images (try to resize, rotate, crop, etc.), but have no idea how to validate other extensions.
Now's time for my questions.
Do you know good techniques to validate such files? Maybe I should unpack archives for .zip/.rar files, but what about documents (doc, pdf)?
Will rotating, resizing work for .psd files?
Basically, I thought that a .psd file has the MIME type application/octet-stream, but when I tried to upload a .psd file it showed me image/vnd.adobe.photoshop. I'm a bit confused about this. Do files always have the same MIME type?
Also, I cannot force code block to work. Does anyone have a guess as to why?
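One way to close the gap described in the question (a .jpg renamed to .zip passing both checks) is to tie each extension to the MIME types it may have, instead of checking both against independent lists. This is a minimal sketch, assuming the fileinfo extension is loaded. The function name extensionMatchesContent and the map are illustrative, and the exact MIME strings reported can differ between libmagic versions, so the lists may need adjusting.

```php
<?php
// Hypothetical helper: true only if the detected MIME type is one of the
// types expected for the file's (lowercased) extension.
function extensionMatchesContent(string $path, string $originalName, array $map): bool
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!isset($map[$ext])) {
        return false; // extension is not on the allow-list at all
    }
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        return false; // fileinfo database could not be opened
    }
    $mime = finfo_file($finfo, $path);
    return $mime !== false && in_array($mime, $map[$ext], true);
}

// Illustrative map; extend it for the other allowed extensions.
$map = array(
    'gif' => array('image/gif'),
    'png' => array('image/png'),
    'jpg' => array('image/jpeg'),
    'pdf' => array('application/pdf'),
    'zip' => array('application/zip', 'application/x-zip-compressed'),
);
```

With this shape, a GIF uploaded as photo.zip fails because image/gif is not in the list for zip, even though both values are individually allowed.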
Lots of file formats have a pretty standard set of starting bytes to indicate the format. If you do a binary read for the first several bytes and test them against the start bytes of known formats it should be a fairly reliable way to confirm the file type matches the extension.
For example, JPEG's start bytes are 0xFF, 0xD8; so something like:
$fp = fopen("filename.jpg", "rb");
$startbytes = fread($fp, 8);
fclose($fp);
$exts = array();
// Compare raw bytes: a one-byte string never equals the integer 0xFF.
if (strncmp($startbytes, "\xFF\xD8", 2) === 0) {
    $exts[] = "jpg";
    $exts[] = "jpeg";
}
and then checking the uploaded file's extension against $exts could work.
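The same idea extends to a small table of signatures. The sketch below is illustrative only: the function name extensionsForSignature is made up, and the table covers just a few of the formats from the question. A .docx file is a ZIP container, so it shares the ZIP signature and needs a further check to tell the two apart.

```php
<?php
// Hypothetical helper: returns the extensions whose signature matches the
// first bytes of $head, or an empty array if nothing matches.
function extensionsForSignature(string $head): array
{
    // Leading bytes ("magic numbers") of a few common formats.
    $signatures = array(
        "\xFF\xD8\xFF"      => array('jpg', 'jpeg'),
        "\x89PNG\r\n\x1A\n" => array('png'),
        "GIF87a"            => array('gif'),
        "GIF89a"            => array('gif'),
        "%PDF-"             => array('pdf'),
        "PK\x03\x04"        => array('zip', 'docx'),
        "Rar!\x1A\x07"      => array('rar'),
        "8BPS"              => array('psd'),
    );
    foreach ($signatures as $magic => $exts) {
        if (strncmp($head, $magic, strlen($magic)) === 0) {
            return $exts;
        }
    }
    return array();
}

// Usage: read the first 8 bytes and compare with the claimed extension.
// $head = file_get_contents($tmpName, false, null, 0, 8);
// $ok = in_array($claimedExt, extensionsForSignature($head), true);
```

A matching signature only shows that the file starts like the claimed format; it says nothing about the bytes that follow.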
If you want to validate images, a good thing to do is use getimagesize(), and see if it returns a valid set of sizes - or errors out if it's an invalid image file. Or use a similar function for whatever files you are trying to support.
The key is that the file name means absolutely nothing. The file extensions (.jpg, etc), the mime types... are for humans.
The only way you can guarantee that a file is of the correct type is to open it and evaluate it byte by byte. That is, obviously, a pretty daunting task if you want to try to validate a large number of file types. At the simplest level, you'd look at the first few bytes of the file to ensure that they match what is expected of a file of that type.
Related
Checking for mime type in php is pretty easy but as far as I know mime can be spoofed. The attacker can upload a php script with for example jpeg mime type. One thing that comes to mind is to check the file extension of the uploaded file and make sure it matches the mime type. All of this is assuming the upload directory is browser accessible.
Question: Are there any other techniques for preventing "bad files" from getting in with mime type spoofing?
Short answer: No.
Longer answer:
Comparing the extension and making sure that it matches the MIME type doesn't really prevent anything. As was said in the comments, it's even easier to modify a file extension. MIME type and extension are only meant as hints; there's no inherent security in them.
Ensuring that incoming files do no harm is very dependent on what your purpose for them is going to be. In your case I understood that you are expecting images. So what you could do is perform some sanity checks first: scan the first couple of bytes to see if the files contain the relevant image header signatures - all relevant image formats have these.
The "signature headers" help you to decide what kind of image format a file tries to impersonate. In a next step you could check if the rest of the contents are compliant with the underlying image format. This would guarantee you that the file is really an image file of that specific format.
But even then, the file could be carefully crafted in a way that when you display the image, a popular library used to display that image (e.g. libpng etc.) would run into a buffer overflow that the attacker found in that library.
Unfortunately there's no way to actively prevent this besides not allowing any input from the client side at all.
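As an illustration of checking that the rest of the contents comply with the format, here is a sketch of a structural check for PNG. It assumes a 64-bit PHP build, and the function name isWellFormedPng is hypothetical. It walks the chunk list and verifies the signature, each declared chunk length, each CRC, and that IHDR comes first and IEND last with nothing after it. It does not decode pixel data, so it cannot rule out the crafted-file attack on a decoding library described above.

```php
<?php
// Hypothetical helper: structural check of a PNG byte string.
function isWellFormedPng(string $bytes): bool
{
    $signature = "\x89PNG\r\n\x1A\n";
    if (strncmp($bytes, $signature, 8) !== 0) {
        return false;
    }
    $offset = 8;
    $total = strlen($bytes);
    $first = true;
    while ($offset + 12 <= $total) {
        // Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        $length = unpack('N', substr($bytes, $offset, 4))[1];
        $type = substr($bytes, $offset + 4, 4);
        if ($offset + 12 + $length > $total) {
            return false; // declared length runs past the end of the data
        }
        $data = substr($bytes, $offset + 8, $length);
        $crc = unpack('N', substr($bytes, $offset + 8 + $length, 4))[1];
        if (crc32($type . $data) !== $crc) {
            return false; // the CRC covers the chunk type and data
        }
        if ($first && $type !== 'IHDR') {
            return false; // IHDR must be the first chunk
        }
        $first = false;
        $offset += 12 + $length;
        if ($type === 'IEND') {
            return $offset === $total; // nothing may follow IEND
        }
    }
    return false; // ran out of data before reaching IEND
}
```

Rejecting trailing data after IEND catches the common trick of appending a script to an otherwise valid image.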
Caution - this answer is now obsolete
The documentation for getimagesize explicitly states "Do not use getimagesize() to check that a given file is a valid image."
In case of Images
Check the extension with a list of allowed ones (ex. ".jpg", ".jpeg", ".png")
Check the uploaded file itself by running getimagesize on the file, it will return FALSE if it's not an image.
Other types of upload
Check the allowed extensions (ex. ".pdf")
Check that the mime type of the file corresponds to the extension
Sample code:
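function getRealMimeType($filename) {
    // FILEINFO_MIME_TYPE returns only the type (e.g. "application/pdf"),
    // without the charset suffix that FILEINFO_MIME appends.
    $finfo = finfo_open(FILEINFO_MIME_TYPE);
    if ($finfo === false) {
        error_log("Opening fileinfo database failed");
        return "";
    }
    return finfo_file($finfo, $filename);
}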
See finfo_file documentation.
"mime_content_type" and "exif_imagetype" should not be used for security purposes because both of them allow spoofed files!
More details from link below:
https://straighttips.blogspot.com/2021/01/php-upload-spoofed-files.html
File extension check in order to block dangerous file extensions such as ".php" is the best way to go if files are going to be uploaded somewhere in the "public_html" folder!
Antivirus scan may be a nice alternative because some spoofed files are detected by antivirus!
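Blocking a list of dangerous extensions is fragile, because other extensions (for example .phtml) may also be executed depending on server configuration. A more conservative sketch is to allow only known extensions and to discard the client-supplied name entirely. The helper name safeStoredName is hypothetical.

```php
<?php
// Hypothetical helper: returns a random file name with an allow-listed
// extension, or null if the original name's final extension is not allowed.
function safeStoredName(string $originalName, array $allowed): ?string
{
    $ext = strtolower(pathinfo($originalName, PATHINFO_EXTENSION));
    if (!in_array($ext, $allowed, true)) {
        return null;
    }
    // The client-supplied base name is discarded, so a name such as
    // "shell.php.jpg" is stored as "<random>.jpg".
    return bin2hex(random_bytes(16)) . '.' . $ext;
}
```

This only controls the stored name; the content checks discussed in the other answers are still needed.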
Check the extension.
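<?php
$okFiles = array('jpg', 'png', 'gif');
// Lowercase the extension; PATHINFO_EXTENSION yields "" when there is none.
$extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
if (in_array($extension, $okFiles, true)) {
    //Upload
}
else {
    //Error
}
?>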
You can also - like you said - check if the extension matches the MIME type, but it's much easier to just check the extension.
Btw why do you care about the MIME type?
Not so much of a coding problem here, but a general question relating to security.
I'm currently working on a project that allows user submitted content.
A key part of this content is the user uploads a Zip file.
The zip file should contain only mp3 files.
I then unzip those files to a directory on the server, so that we can stream the audio on the website for users to listen to.
My concern is that this opens us up for some potentially damaging zip files.
I've read about 'zipbombs' in the past, and obviously don't want a malicious zip file causing damage.
So, is there a safe way of doing this?
Can I scan the zip file without unzipping it first, and if it contains anything other than MP3s, delete it or flag a warning to the admin?
If it makes a difference, I'm developing the site on WordPress.
I currently use the built-in upload features of WordPress to let the user upload the zip file to our server (I'm not sure if there's any form of security within WordPress already to scan the zip file?)
Code, only extract MP3s from zip, ignore everything else
$zip = new ZipArchive();
$filename = 'newzip.zip';
if ($zip->open($filename)!==TRUE) {
exit("cannot open <$filename>\n");
}
for ($i=0; $i<$zip->numFiles;$i++) {
$info = $zip->statIndex($i);
$file = pathinfo($info['name']);
if(strtolower($file['extension'] ?? '') === "mp3") {
file_put_contents(basename($info['name']), $zip->getFromIndex($i));
}
}
$zip->close();
I would use something like id3_get_version (http://www.php.net/manual/en/function.id3-get-version.php) to ensure the contents of the file are MP3 too
Is there a reason they need to ZIP the MP3s? Unless there's a lot of text frames in the ID3v2 info in the MP3s, the file size will actually increase with the ZIP due to storage of the dictionary.
As far as I know, there isn't any way to scan a ZIP without actually parsing it. The data are opaque until you run each bit through the Huffman dictionary. And how would you determine what file is an MP3? By file extension? By frames? MP3 encoders have a loose standard (decoders have a more stringent spec) which makes it difficult to scan the file structure without false negatives.
Here are some ZIP security risks:
1. Comment data that causes buffer overflows. Solution: remove comment data.
2. ZIPs that are small in compressed size but inflate to fill the filesystem (classic ZIP bomb). Solution: check inflated size before inflating; check dictionary to ensure it has many entries, and that the compressed data isn't all 1's.
3. Nested ZIPs (related to #2). Solution: stop when an entry in the ZIP archive is itself ZIP data. You can determine this by checking for the central directory's marker, the number 0x02014b50 (hex, always little-endian in ZIP - http://en.wikipedia.org/wiki/Zip_%28file_format%29#Structure).
4. Nested directory structures, intended to exceed the filesystem's limit and hang the deflating process. Solution: don't unzip directories.
So, either do a lot of scrubbing and integrity checks, or at the very least use PHP to scan the archive; check each file for its MP3-ness (however you do that - extension and the presence of MP3 headers? You can't rely on them being at byte 0, though. http://en.wikipedia.org/wiki/MP3#File_structure) and inflated (uncompressed) file size (http://www.php.net/manual/en/function.zip-entry-filesize.php). Bail out if an inflated file is too big, or if there are any non-MP3s present.
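The scan described above can be sketched with ZipArchive, which reads the entry list without extracting anything. The function name scanZipEntries and both limits are illustrative. The size reported for each entry is the value declared in the archive, which a malicious archive can misstate, so extraction should still enforce a byte limit while reading.

```php
<?php
// Hypothetical helper: returns a list of problems found in the archive's
// entry list; an empty list means every entry passed these checks.
function scanZipEntries(string $zipPath, int $maxEntryBytes, int $maxTotalBytes): array
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        return array('cannot open archive');
    }
    $problems = array();
    $total = 0;
    for ($i = 0; $i < $zip->numFiles; $i++) {
        $info = $zip->statIndex($i);
        $name = $info['name'];
        if (substr($name, -1) === '/') {
            $problems[] = "directory entry: $name";
            continue;
        }
        if (strtolower(pathinfo($name, PATHINFO_EXTENSION)) !== 'mp3') {
            $problems[] = "not an .mp3 name: $name";
        }
        // 'size' is the uncompressed size declared in the archive.
        if ($info['size'] > $maxEntryBytes) {
            $problems[] = "entry too large: $name";
        }
        $total += $info['size'];
    }
    if ($total > $maxTotalBytes) {
        $problems[] = 'archive inflates past the total limit';
    }
    $zip->close();
    return $problems;
}
```

Rejecting every entry that is not named .mp3 also rejects nested archives, since a .zip entry fails the extension check.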
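Use the following code to check the file names inside a .zip archive (note that the zip_* functions are deprecated as of PHP 8.0 in favour of ZipArchive):
$zip = zip_open('test.zip');
if (!is_resource($zip)) {
    exit("cannot open test.zip\n");
}
while ($entry = zip_read($zip)) {
    $file_name = zip_entry_name($entry);
    $ext = pathinfo($file_name, PATHINFO_EXTENSION);
    if (strtoupper($ext) !== 'MP3') {
        notify_admin($file_name);
    }
}
zip_close($zip);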
Note that the code above only looks at the extension, meaning that a user can upload anything that has an .mp3 extension. To really check whether a file is an MP3, you'll have to unpack it. I would advise you to do that in a temporary directory.
After the file is unpacked, you can analyze it using, for example, ffmpeg. Having detailed data about bitrate, track length, etc. will be interesting in any case.
If the analysis fails you can flag the file.
Why is it that for some MP3 files, when I call mime_content_type($mp3_file_path), it returns application/octet-stream?
This is my code:
if (!empty($_FILES)) {
$tempFile = $_FILES['Filedata']['tmp_name'];
$image = getimagesize($tempFile);
$mp3_mimes = array('audio/mpeg', 'audio/x-mpeg', 'audio/mp3', 'audio/x-mp3', 'audio/mpeg3', 'audio/x-mpeg3', 'audio/mpg', 'audio/x-mpg', 'audio/x-mpegaudio');
if (in_array(mime_content_type($tempFile), $mp3_mimes)) {
echo json_encode("mp3");
} elseif ($image !== false && $image['mime']=='image/jpeg') {
echo json_encode("jpg");
} else{
echo json_encode("error");
}
}
EDIT:
I've found a nice class here:
http://www.zedwood.com/article/127/php-calculate-duration-of-mp3
MP3 files are a strange beast when it comes to identifying them. You can have an MP3 stored with a .wav container. There can be an ID3v2 header at the start of the file. You can embed an MP3 essentially within any file.
The only way to detect them reliably is to parse slowly through the file and try to find something that looks like an MP3 frame. A frame is the smallest unit of valid MP3 data possible, and represents (going off memory) about 0.026 seconds of audio at 44.1 kHz. The size of the frame varies based on bitrate and sampling rate, so you can't just grab the bitrate/sample rate of the first frame and assume all the other frames will be the same size - a VBR MP3 must be parsed in its entirety to calculate the total playing time.
All this boils down to that identifying an MP3 by using PHP's fileinfo and the like isn't reliable, as the actual MP3 data can start ANYWHERE in a file. fileinfo only looks at the first kilobyte or two of data, so if it says it's not an MP3, it might very well be lying because the data started slightly farther in.
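To make the frame search concrete, here is a rough sketch that skips a leading ID3v2 tag and then looks for the first byte sequence that resembles an MPEG audio frame header (11 sync bits set, with no reserved version, layer, bitrate, or sample-rate value). The function name findMp3FrameOffset is hypothetical. A single match can occur by chance in arbitrary data, so a stricter check would compute the frame length and confirm that another header follows.

```php
<?php
// Hypothetical helper: returns the offset of the first byte sequence that
// looks like an MPEG audio frame header, or -1 if none is found.
function findMp3FrameOffset(string $bytes): int
{
    $offset = 0;
    // Skip an ID3v2 tag: "ID3", 2 version bytes, 1 flag byte, and a 4-byte
    // "syncsafe" size (7 bits per byte) that excludes the 10-byte header.
    if (strlen($bytes) >= 10 && substr($bytes, 0, 3) === 'ID3') {
        $size = (ord($bytes[6]) << 21) | (ord($bytes[7]) << 14)
              | (ord($bytes[8]) << 7) | ord($bytes[9]);
        $offset = 10 + $size;
    }
    $last = strlen($bytes) - 3; // a header needs 4 bytes
    for ($i = $offset; $i < $last; $i++) {
        $b1 = ord($bytes[$i]);
        $b2 = ord($bytes[$i + 1]);
        $b3 = ord($bytes[$i + 2]);
        if ($b1 !== 0xFF || ($b2 & 0xE0) !== 0xE0) {
            continue; // the 11 sync bits are not all set
        }
        $version = ($b2 >> 3) & 0x03;    // 1 is reserved
        $layer = ($b2 >> 1) & 0x03;      // 0 is reserved
        $bitrate = ($b3 >> 4) & 0x0F;    // 15 is invalid
        $sampleRate = ($b3 >> 2) & 0x03; // 3 is reserved
        if ($version !== 1 && $layer !== 0 && $bitrate !== 15 && $sampleRate !== 3) {
            return $i;
        }
    }
    return -1;
}
```

Because the search scans the whole input, it also finds frames that start beyond the first kilobyte or two that a magic-byte lookup inspects.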
application/octet-stream is probably mime_content_type's fallback type when it fails to recognize a file.
The MP3 in that case is either not a real MP3 file, or - more likely - the file is a real MP3 file, but does not contain the "magic bytes" the PHP function uses to recognize the format - maybe because it's a different sub-format or has a variable bitrate or whatever.
You could try whether getid3 gives you better results. I've never worked with it but it looks like a pretty healthy library to get lots of information out of multimedia files.
If you have access to PHP's configuration, you may also be able to change the mime.magic file PHP uses, although I have no idea whether a better file exists that is able to detect your MP3s. (The mime.magic file is the file containing all the byte sequences that mime_content_type uses to recognize certain file types.)
Fleep is the answer to this question. Allowing application/octet-stream is dangerous since .exe and other dangerous files can display with that mime type.
See this answer https://stackoverflow.com/a/52570299/14482130
Simple question. Is there a way to only allow txt files upon uploading? I've looked around and all I find is text/php, which allows PHP.
$uploaded_type=="text/php"
When you upload a file with PHP, it's stored in the $_FILES array. Within this there is a key called "type" which holds the MIME type of the file, e.g. $_FILES['file']['type']. Note that this value is sent by the browser and is not verified by PHP.
So to check it is a txt file you do
if($_FILES['file']['type'] == 'text/plain'){
//Do stuff with it.
}
It's explained very well here. Also, don't rely on file extensions; they're very unreliable.
Simply put: there's no way. Browsers don't consistently support type limiters on file upload fields (AFAIK that was planned or even is integrated into the HTML standard, but barely implemented at best). Both the file extension and mime-type information are user supplied and hence can't be trusted.
You can really only try to parse the file and see if it validates to whatever format you expect, that's the only reliable way. What you need to be careful with are buffer overflows and the like caused by maliciously malformed files. If all you want are text files, that's probably not such a big deal though.
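For plain text, parsing the file can be as simple as checking that the bytes decode as text. This is a minimal sketch, assuming uploads are expected to be UTF-8 (ASCII is a subset); the function name looksLikePlainText is hypothetical. A file that passes may still contain PHP source, since that is also text, so it should be stored where the server will not execute it.

```php
<?php
// Hypothetical helper: true if $bytes is valid UTF-8 and contains no control
// characters other than tab, line feed, and carriage return.
function looksLikePlainText(string $bytes): bool
{
    // With the /u modifier, preg_match() fails on invalid UTF-8 input.
    if (preg_match('//u', $bytes) !== 1) {
        return false;
    }
    return preg_match('/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F]/', $bytes) === 0;
}

// Usage: $ok = looksLikePlainText(file_get_contents($_FILES['file']['tmp_name']));
```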
You could check the MIME type of the uploaded file. In CodeIgniter, this code is used in the upload library:
$this->file_type = preg_replace("/^(.+?);.*$/", "\\1", $_FILES[$field]['type']);
The variable $this->file_type is then used to check the upload configuration, to see whether the uploaded file is of an allowed type. You can see the complete code in the CodeIgniter upload library file.
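To show what that regular expression does in isolation: it keeps everything before the first ";" in the type string, which strips parameters such as a charset. The wrapper function stripMimeParameters below is only for illustration and is not part of CodeIgniter.

```php
<?php
// Illustrative wrapper around the pattern quoted above.
function stripMimeParameters(string $type): string
{
    return preg_replace("/^(.+?);.*$/", "\\1", $type);
}

// stripMimeParameters('text/plain; charset=utf-8') gives 'text/plain'
// stripMimeParameters('text/plain') is returned unchanged
```

The input here is still the client-supplied type from $_FILES, so the result is a normalized hint, not a verified type.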
You need to check the file extension of the uploaded file.
There is the PEAR HTTP_Upload package, which supports this.