Security of unzipping user submitted files

Not so much of a coding problem here, but a general question relating to security.
I'm currently working on a project that allows user submitted content.
A key part of this content is the user uploads a Zip file.
The zip file should contain only mp3 files.
I then unzip those files to a directory on the server, so that we can stream the audio on the website for users to listen to.
My concern is that this opens us up for some potentially damaging zip files.
I've read about 'zipbombs' in the past, and obviously don't want a malicious zip file causing damage.
So, is there a safe way of doing this?
Can I scan the zip file without unzipping it first, and if it contains anything other than MP3s, delete it or flag a warning to the admin?
If it makes a difference, I'm developing the site on WordPress.
I currently use the built-in upload features of WordPress to let the user upload the zip file to our server (I'm not sure whether WordPress already has any security in place to scan the zip file).

Code to extract only the MP3s from the zip and ignore everything else:
$zip = new ZipArchive();
$filename = 'newzip.zip';
if ($zip->open($filename) !== true) {
    exit("cannot open <$filename>\n");
}
for ($i = 0; $i < $zip->numFiles; $i++) {
    $info = $zip->statIndex($i);
    $file = pathinfo($info['name']);
    // Entries without an extension (e.g. directories) have no 'extension' key.
    if (isset($file['extension']) && strtolower($file['extension']) === 'mp3') {
        // basename() drops any directory part of the entry name.
        file_put_contents(basename($info['name']), $zip->getFromIndex($i));
    }
}
$zip->close();
I would also use something like id3_get_version (http://www.php.net/manual/en/function.id3-get-version.php) to check that the contents of each file really are MP3.

Is there a reason they need to ZIP the MP3s? Unless there's a lot of text frames in the ID3v2 info in the MP3s, the file size will actually increase with the ZIP due to storage of the dictionary.
As far as I know, there isn't any way to scan a ZIP without actually parsing it. The data are opaque until you run each bit through the Huffman dictionary. And how would you determine what file is an MP3? By file extension? By frames? MP3 encoders have a loose standard (decoders have a more stringent spec) which makes it difficult to scan the file structure without false negatives.
Here are some ZIP security risks:
1) Comment data that causes buffer overflows. Solution: remove comment data.
2) ZIPs that are small in compressed size but inflate to fill the filesystem (the classic ZIP bomb). Solution: check the inflated size before inflating; check the dictionary to ensure it has many entries, and that the compressed data isn't all 1's.
3) Nested ZIPs (related to #2). Solution: stop when an entry in the ZIP archive is itself ZIP data. You can determine this by checking for the central directory's marker, the number 0x02014b50 (hex, always little-endian in ZIP - http://en.wikipedia.org/wiki/Zip_%28file_format%29#Structure).
4) Nested directory structures, intended to exceed the filesystem's limit and hang the inflating process. Solution: don't unzip directories.
So, either do a lot of scrubbing and integrity checks, or at the very least use PHP to scan the archive: check each file for its MP3-ness (however you do that - extension and the presence of MP3 headers? You can't rely on them being at byte 0, though. http://en.wikipedia.org/wiki/MP3#File_structure) and its inflated file size (http://www.php.net/manual/en/function.zip-entry-filesize.php). Bail out if an inflated file is too big, or if there are any non-MP3s present.
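A minimal sketch of that "scan, then bail out" step might look like the following. The helper name find_zip_entry_problems() and the size limit are my own assumptions; the input is the metadata array that ZipArchive::statIndex() returns for each entry, so nothing is inflated while checking. Keep in mind that 'size' is only what the archive declares.

```php
<?php
// Hypothetical helper: inspects entry metadata (the arrays returned by
// ZipArchive::statIndex()) and returns a list of problems; empty means OK.
function find_zip_entry_problems(array $entries, int $maxEntryBytes): array
{
    $problems = [];
    foreach ($entries as $entry) {
        $name = $entry['name'];
        // Directory entries end with a slash; the answer suggests not unzipping them.
        if (substr($name, -1) === '/') {
            $problems[] = "$name: directory entry";
            continue;
        }
        // Extension check only; this says nothing about the actual contents.
        $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
        if ($ext !== 'mp3') {
            $problems[] = "$name: not an .mp3 file";
        }
        // 'size' is the uncompressed size declared in the archive.
        if ($entry['size'] > $maxEntryBytes) {
            $problems[] = "$name: declared size {$entry['size']} exceeds limit";
        }
    }
    return $problems;
}
```

To use it, open the upload with ZipArchive, collect $zip->statIndex($i) for every $i below $zip->numFiles, and reject the archive if the returned list is not empty.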

Use the following code to read the file names inside a .zip archive:
$zip = zip_open('test.zip');
// zip_open() returns an error code instead of a resource on failure.
if (!is_resource($zip)) {
    exit('cannot open test.zip');
}
while ($entry = zip_read($zip)) {
    $file_name = zip_entry_name($entry);
    $ext = pathinfo($file_name, PATHINFO_EXTENSION);
    if (strtoupper($ext) !== 'MP3') {
        // notify_admin() stands for your own alerting function.
        notify_admin($file_name);
    }
}
zip_close($zip);
Note that this code only looks at the extension, meaning that a user can upload anything that has an MP3 extension. To really check whether a file is an MP3 you'll have to unpack it. I would advise you to do that in a temporary directory.
After the file is unpacked you can analyze it using, for example, ffmpeg. Having detailed data about bitrate, track length, etc. will be useful in any case.
If the analysis fails you can flag the file.

Related

Is it possible to limit the size of uncompressed zipped files in Unix?

I am implementing a service where I have to extract a zip file that was uploaded by a user.
To avoid filling the disk, I have to limit BOTH the zip file size AND the size of the unzipped files.
Is there any way to check the size of the unzipped files BEFORE unzipping (for security reasons)?
I am using Unix, called from a PHP script.

Since you're working in PHP, use its Zip extension:
$zip = zip_open($file);
$extracted_size = 0;
while (($zip_entry = zip_read($zip))) {
    $extracted_size += zip_entry_filesize($zip_entry);
    if ($extracted_size > $max_extracted_size) {
        // abort
    }
}
// do the actual unzipping
You might want to put a limit on the number of files as well, or add a constant amount per file, to take into account the size of the metadata for each file. While you can't easily get a precise figure for that, adding a few hundred bytes to a couple of kilobytes per file is a reasonable estimate.
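The per-file allowance and the file-count limit could be folded into one check, roughly as below. This is only a sketch: fits_extraction_budget() is a made-up name, and the 1024-byte default overhead is an assumed figure within the "few hundred bytes to a couple of kilobytes" range mentioned above.

```php
<?php
// Hypothetical helper: decides whether an archive may be extracted, given the
// uncompressed sizes of its entries (e.g. the 'size' field of ZipArchive::statIndex()).
function fits_extraction_budget(
    array $entrySizes,
    int $maxTotalBytes,
    int $maxFiles,
    int $perFileOverhead = 1024 // assumed allowance for filesystem metadata
): bool {
    if (count($entrySizes) > $maxFiles) {
        return false;
    }
    $total = 0;
    foreach ($entrySizes as $size) {
        $total += $size + $perFileOverhead;
        if ($total > $maxTotalBytes) {
            return false; // stop early, no need to add up the rest
        }
    }
    return true;
}
```

Only start the actual unzipping when this returns true.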

Ways to stop people from uploading GIFs with injections in them?

I have a PHP website where people can fill out help-tickets. It allows them to upload screenshots for their ticket. I allow gif, psd, bmp, jpg, png, tif to be uploaded. Upon receiving the upload, the PHP script ignores the file extension. It identifies the filetype using only the MIME information, which for these filetypes is always stored within the first 12 bytes of the file.
Someone uploaded several GIFs. When I viewed them in a browser, the browser said they were invalid, and my virus scanner alerted me that it was an injection (or something like that). See below for a zip file containing these GIFs.
I don't think only checking header info is adequate. I have heard that an image can be completely valid, but also contain exploit code.
So I have two basic questions:
1) Does anyone know how they injected bad stuff into a GIF (while still keeping a valid GIF MIME type)? If I know this, maybe I can check for it at upload time.
2) How can I prevent someone from uploading files like this?
I am on shared hosting, so I can't install a server-side virus scanner.
Submitting the files to an online virus scanning website might be too slow.
Is there any way to check myself using a PHP class that checks for these things?
Will resizing the image using GD fail if it's not valid? Or would the exploit still slip through and be in the resized image? If it fails, that would be ideal because then I could use resizing as a technique to see if they are valid.
Update: Everyone, thanks for replying so far. I am attempting to look on the server for the GIFs that were uploaded. I will update this post if I find them.
Update 2: I located the GIFs for anyone interested. I put them in a zip file encrypted with password "123". It is located here (be careful there are multiple "Download" buttons on this hosting site -- some of them are for ads) http://www.filedropper.com/badgifs. The one called 5060.gif is flagged by my antivirus as a trojan (TR/Graftor.Q.2). I should note that these files were uploaded prior to me implementing the MIME check of the first 12 bytes. So now, I am safe for these particular ones. But I'd still like to know how to detect an exploit hiding behind a correct MIME type.
Important clarification: I'm only concerned about the risk to the PC who downloads these files to look at them. The files are not a risk to my server. They won't be executed. They are stored using a clean name (a hex hash output) with extension of ".enc" and I save them to disk in an encrypted state using an fwrite filter:
// Generate random key to encrypt this file.
$AsciiKey = '';
for ($i = 0; $i < 20; $i++) {
    $AsciiKey .= chr(mt_rand(1, 255));
}
// The proper key size for the encryption mode we're using is 256-bits (32-bytes).
// That's what "mcrypt_get_key_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC)" says.
// So we'll hash our key using SHA-256 and pass TRUE to the 2nd parameter, so we
// get raw binary output. That will be the perfect length for the key.
$BinKey = hash('SHA256', '~~'.TIME_NOW.'~~'.$AsciiKey.'~~', true);
// Create Initialization Vector with block size of 128 bits (AES compliant) and CBC mode
$InitVec = mcrypt_create_iv(mcrypt_get_iv_size(MCRYPT_RIJNDAEL_128, MCRYPT_MODE_CBC), MCRYPT_RAND);
$Args = array('iv' => $InitVec, 'key' => $BinKey, 'mode' => 'cbc');
// Save encoded file in uploads_tmp directory.
$hDest = fopen(UPLOADS_DIR_TMP.'/'.$Hash.'.enc', 'w');
stream_filter_append($hDest, 'mcrypt.rijndael-128', STREAM_FILTER_WRITE, $Args);
fwrite($hDest, $Data);
fclose($hDest);

As for the first question, you'll never really know if you can't retrieve any logs or the images in question. There are many things these exploits may have targeted, and depending on the target, the way the exploit was put into the file can be completely different.
Edit: W32/Graftor is a generic name for programs that appear to have trojan-like characteristics.
After opening the file 5060.gif in a hex editor, I noticed the file is actually a renamed Windows program. Although it's not a browser exploit and thus harmless unless it's actually opened and executed, you'll have to make sure it isn't served with the MIME type defined by the uploader, because a user may still be tricked into opening the program; see the answer to the second question.
As for the second question: to prevent any exploit code from being run on a user's machine, you'll have to make sure all files are stored with a safe extension in the filename so they are served with the correct MIME type. For example, you can use this regular expression to check the file name:
if (!preg_match('/\\.(gif|p(sd|ng)|tiff?|jpg)$/', $fileName)) {
    // header("415 ...") alone does not set the status code; use http_response_code().
    http_response_code(415);
    die("File type not allowed.");
}
Also make sure you're serving the files with the correct Content Type; make sure you don't use the content type specified with the uploaded file when serving the file to the user. If you rely on the Content-Type specified by the uploader, the file may be served as text/html or anything similar and will be parsed by the users' browser as such.
Please note that this only protects against malicious files exploiting vulnerabilities in the users' browser, the image parser excluded.
If you're trying to prevent exploits against the server you'll have to make sure that you won't let the PHP parser execute the contents of the image and that the image library you are using to process the image does not have any known vulnerabilities.
Also note that this code does not defend you against images that contain an exploit for the image parser used by the user's browser; to defend against this, you can check whether getimagesize() evaluates to true, as suggested by Jeroen.
Note that using getimagesize() alone isn't sufficient if you don't check file names and make sure files are served with the correct Content-Type header, because completely valid images can have HTML / PHP code embedded inside comments.
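To illustrate the "serve with the correct Content-Type" part, here is a rough sketch in which the type comes from a server-side table and never from the uploader. The function names and the exact table are my own assumptions; the extensions follow the regular expression above, and the nosniff header is an extra precaution I'm adding, not something from the answer.

```php
<?php
// Hypothetical helper: picks the Content-Type from a server-side whitelist,
// keyed by the extension of the stored file name, never from uploader input.
function safe_content_type(string $storedName): ?string
{
    $types = [
        'gif'  => 'image/gif',
        'png'  => 'image/png',
        'jpg'  => 'image/jpeg',
        'tif'  => 'image/tiff',
        'tiff' => 'image/tiff',
        'psd'  => 'image/vnd.adobe.photoshop',
    ];
    $ext = strtolower(pathinfo($storedName, PATHINFO_EXTENSION));
    return $types[$ext] ?? null;
}

// Hypothetical usage when serving a file to a visitor.
function send_image_headers(string $storedName): bool
{
    $type = safe_content_type($storedName);
    if ($type === null) {
        http_response_code(415);
        return false;
    }
    header('Content-Type: ' . $type);
    // Ask browsers not to guess a different type from the file contents.
    header('X-Content-Type-Options: nosniff');
    return true;
}
```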

You can use the getimagesize() function for this. If the image is invalid it will simply return false.
if (getimagesize($filename)) {
    // valid image
} else {
    // not a valid image
}
It's worth noting that this isn't 100% safe either, but it's the best you can do as far as I know.

I don't know much about image formats, but I feel that recreating the images and then storing the result has a good chance of eliminating unnecessary tricky stuff, especially if you strip all the metadata such as comments and the other optional embedded fields that some image formats support.

You can try phpMussel on any php script that accepts uploads. The file will be scanned using ClamAV signatures, plus some internal heuristic signatures that look for this type of intrusion specifically.

1) You're never going to know exactly what the problem was if you deleted the .gif and your A/V didn't write a log.
Q: Is the .gif in question still on the server?
Q: Have you checked your A/V logs?
2) There are many different possible exploits, which may or may not have anything directly to do with the .gif file format. Here is one example:
http://www.phpclasses.org/blog/post/67-PHP-security-exploit-with-GIF-images.html
3) To mitigate the risk in this example, you should:
a) Only upload files (any files) to a secure directory on the server
b) Only serve files with specific suffixes (.gif, .png, etc)
c) Be extremely paranoid about anything that's uploaded to your site (especially if you then allow other people to download it from your site!)

One very useful tip for preventing problems with injected PHP came from my host's system admin. I have a site where people can upload their own content, and I wanted to make sure the directory that uploaded images are served from doesn't run any PHP. That way someone could even post a picture named "test.php" and it would still NEVER be parsed by PHP if it was in the upload directory. The solution was simple: in the folder the uploaded content is served from, put the following .htaccess:
RewriteEngine On
RewriteRule \.$ - [NC]
php_flag engine off
This will switch off the PHP engine for the folder, thus stopping any attempt to launch any PHP to exploit server side vulnerabilities.

Late response, but may be useful for somebody.
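You may try an approach like this:
// Re-encodes $image with GD and saves the filtered result to the specified $path.
function save($image, $path, $mime) {
    $codecs = array(
        "image/jpeg" => array('imagecreatefromjpeg', 'imagejpeg'),
        "image/gif"  => array('imagecreatefromgif', 'imagegif'),
        "image/png"  => array('imagecreatefrompng', 'imagepng'),
    );
    if (!isset($codecs[$mime])) {
        return false;
    }
    list($load, $write) = $codecs[$mime];
    $img = @$load($image);
    // GD returns false when it cannot decode the file, so such images are rejected.
    return $img !== false && $write($img, $path);
}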

Secure User Image Upload Capabilities in PHP

I'm implementing a user-based image uploading tool for my website. The system should allow any users to upload JPEG and PNG files only. I'm, of course, worried about security and so I'm wondering how the many smarter people than myself feel about the following checks for allowing uploads:
1) First white list the allowable file extensions in PHP to allow only PNG, png, jpg, JPG and JPEG. Retrieve the user's file's extension via a function such as:
return pathinfo($filename, PATHINFO_EXTENSION);
This should help disallow the user from uploading something malicious like .png.php. If this passes, move to step 2.
2) Run the PHP function getimagesize() on the TMP file, via something like:
getimagesize($_FILES['userfile']['tmp_name']);
If this does not return false, proceed.
3) Ensure a .htaccess file is placed within the uploads directory so that any files within this directory cannot parse PHP files:
php_flag engine off
4) Rename the user's file to something pre-determined, for example:
$filename = 'some_pre_determined_unique_value' . $the_file_extension;
This will also help prevent SQL injection as the filename will be the only user-determined variable in any queries used.
If I perform the above, how vulnerable for attack am I still? Before accepting a file I should hopefully have 1) only allowed jpgs and pngs, 2) Verified that PHP says it's a valid image, 3) disabled the directory the images are in from executing .php files and 4) renamed the users file to something unique.

Regarding file names, random names are definitely a good idea and take away a lot of headaches.
If you want to make totally sure the content is clean, consider using GD or ImageMagick to copy the incoming image 1:1 into a new, empty one.
That will slightly diminish image quality because content gets compressed twice, but it will remove any EXIF information present in the original image. Users are often not even aware how much info gets put into the Metadata section of JPG files! Camera info, position, times, software used... It's good policy for sites that host images to remove that info for the user.
Also, copying the image will probably get rid of most exploits that use faulty image data to cause overflows in the viewer software, and inject malicious code. Such manipulated images will probably simply turn out unreadable for GD.

Regarding your number 2), don't just check for FALSE. getimagesize will also return the mime type of the image. This is by far a more secure way to check proper image type than looking at the mime type the client supplies:
$info = getimagesize($_FILES['userfile']['tmp_name']);
if ($info === FALSE) {
    die("Couldn't read image");
}
if (($info[2] !== IMAGETYPE_PNG) && ($info[2] !== IMAGETYPE_JPEG)) {
    die("Not a JPEG or PNG");
}

All the checks seem good, number 3 in particular. If performance is not an issue, or you are doing this in the background, you could try accessing the image using GD and seeing if it is indeed an image and not just a bunch of crap that someone is trying to fill your server with.

Concerning No. 2, I read on php.net (documentation of the function getimagesize() ):
Do not use getimagesize() to check that a given file is a valid image. Use a purpose-built solution such as the Fileinfo extension instead.
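For what it's worth, a Fileinfo-based check could look roughly like this. It assumes the Fileinfo extension is enabled; the function name detect_allowed_image_type() and the two-entry allow list (JPEG and PNG, as in the question) are mine.

```php
<?php
// A sketch assuming the Fileinfo extension is available. Returns the detected
// MIME type if it is on the allow list, or null otherwise.
function detect_allowed_image_type(string $bytes): ?string
{
    $finfo = new finfo(FILEINFO_MIME_TYPE);
    // The type is derived from the file contents, not from the name
    // or from anything the client sent.
    $mime = $finfo->buffer($bytes);
    $allowed = ['image/jpeg', 'image/png'];
    return in_array($mime, $allowed, true) ? $mime : null;
}
```

You would call it with file_get_contents($_FILES['userfile']['tmp_name']) and reject the upload when it returns null. Like getimagesize(), this only identifies the format; it does not prove the image is harmless.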

PHP Upload File Validation

I am creating file upload script and I'm looking for the best techniques and practices to validate uploaded files.
Allowed extensions are:
$allowed_extensions = array('gif','jpg','png','swf','doc','docx','pdf','zip','rar','rtf','psd');
Here's the list of what I'm doing.
Checking file extension
$path_info = pathinfo($filename);
// Files without an extension have no 'extension' key at all.
if (!isset($path_info['extension']) || !in_array($path_info['extension'], $allowed_extensions)) {
    die('File #'.$i.': Incorrect file extension.');
}
Checking file mime type
$allowed_mimes = array('image/jpeg','image/png','image/gif','text/richtext','multipart/x-zip','application/x-shockwave-flash','application/msword','application/pdf','application/x-rar-compressed','image/vnd.adobe.photoshop');
if (!in_array(finfo_file($finfo, $file), $allowed_mimes)) {
    die('File #'.$i.': Incorrect mime type.');
}
Checking file size.
What should I do to make sure uploaded files are valid files? I noticed a strange thing: I changed a .jpg file's extension to .zip and... it was uploaded. I thought it would have an incorrect MIME type, but then I noticed that I'm not checking for the specific type that matches the extension, only whether the MIME type exists in the array. I'll fix it later; that presents no problems for me (of course, if you have a good solution or idea, do not hesitate to share it, please).
I know what to do with images (try to resize, rotate, crop, etc.), but have no idea how to validate other extensions.
Now it's time for my questions.
Do you know good techniques to validate such files? Maybe I should unpack archives for .zip/.rar files, but what about documents (doc, pdf)?
Will rotating, resizing work for .psd files?
Basically, I thought that a .psd file has the MIME type application/octet-stream, but when I tried to upload a .psd file it showed me image/vnd.adobe.photoshop. I'm a bit confused about this. Do files always have the same MIME type?

Lots of file formats have a pretty standard set of starting bytes to indicate the format. If you do a binary read for the first several bytes and test them against the start bytes of known formats it should be a fairly reliable way to confirm the file type matches the extension.
For example, JPEG's start bytes are 0xFF, 0xD8, so something like the following could work:
$fp = fopen("filename.jpg", "rb");
$startbytes = fread($fp, 8);
fclose($fp);
$exts = array();
// Compare raw bytes; a one-character string never equals an integer like 0xFF.
if (substr($startbytes, 0, 2) === "\xFF\xD8") {
    $exts[] = "jpg";
    $exts[] = "jpeg";
}
Then check the file's extension against $exts.

If you want to validate images, a good thing to do is use getimagesize() and see if it returns a valid set of sizes, or errors out if it's an invalid image file. Or use a similar function for whatever files you are trying to support.
The key is that the file name means absolutely nothing. The file extensions (.jpg, etc), the mime types... are for humans.
The only way you can guarantee that a file is of the correct type is to open it and evaluate it byte by byte. That is, obviously, a pretty daunting task if you want to try to validate a large number of file types. At the simplest level, you'd look at the first few bytes of the file to ensure that they match what is expected of a file of that type.
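As a rough illustration of the "first few bytes" check, something like the following compares the start of a file with the signature expected for its extension. The function name and the table are my own; the table only covers a handful of well-known formats and would need extending for the rest of your list.

```php
<?php
// Illustrative signature table; only a few well-known formats are listed.
function extension_matches_signature(string $extension, string $firstBytes): bool
{
    $signatures = [
        'jpg'  => ["\xFF\xD8\xFF"],
        'jpeg' => ["\xFF\xD8\xFF"],
        'png'  => ["\x89PNG\r\n\x1A\n"],
        'gif'  => ['GIF87a', 'GIF89a'],
        'pdf'  => ['%PDF-'],
        'zip'  => ["PK\x03\x04"],
    ];
    $ext = strtolower($extension);
    if (!isset($signatures[$ext])) {
        return false; // unknown extension: treat as not verified
    }
    foreach ($signatures[$ext] as $magic) {
        // strncmp() is binary safe and fails if $firstBytes is too short.
        if (strncmp($firstBytes, $magic, strlen($magic)) === 0) {
            return true;
        }
    }
    return false;
}
```

With this, a JPEG renamed to .zip fails, because its first bytes are not the ZIP signature. A matching signature is still only a first filter, not a guarantee that the rest of the file is valid.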

Why does mime_content_type return application/octet-stream for some MP3s?

Why is it that for some MP3 files, mime_content_type($mp3_file_path) returns application/octet-stream?
This is my code:
if (!empty($_FILES)) {
    $tempFile = $_FILES['Filedata']['tmp_name'];
    $image = getimagesize($tempFile);
    $mp3_mimes = array('audio/mpeg', 'audio/x-mpeg', 'audio/mp3', 'audio/x-mp3', 'audio/mpeg3', 'audio/x-mpeg3', 'audio/mpg', 'audio/x-mpg', 'audio/x-mpegaudio');
    if (in_array(mime_content_type($tempFile), $mp3_mimes)) {
        echo json_encode("mp3");
    } elseif ($image['mime']=='image/jpeg') {
        echo json_encode("jpg");
    } else{
        echo json_encode("error");
    }
}
EDIT:
I've found a nice class here:
http://www.zedwood.com/article/127/php-calculate-duration-of-mp3

MP3 files are a strange beast when it comes to identifying them. You can have an MP3 stored within a .wav container. There can be an ID3v2 header at the start of the file. You can embed an MP3 within essentially any file.
The only way to detect them reliably is to parse slowly through the file and try to find something that looks like an MP3 frame. A frame is the smallest unit of valid MP3 data possible, and represents about 0.026 seconds of audio (1152 samples at 44.1 kHz). The size of the frame varies based on bitrate and sampling rate, so you can't just grab the bitrate/sample rate of the first frame and assume all the other frames will be the same size - a VBR mp3 must be parsed in its entirety to calculate the total playing time.
What this boils down to is that identifying an MP3 with PHP's fileinfo and the like isn't reliable, as the actual MP3 data can start ANYWHERE in a file. fileinfo only looks at the first kilobyte or two of data, so if it says it's not an MP3, it might very well be lying because the data started slightly farther in.
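To give an idea of what "find something that looks like an MP3 frame" means, here is a rough sketch. The function name is made up, and it only checks a single header: it skips an ID3v2 tag if there is one, then looks for the frame sync bits followed by non-reserved version, layer, bitrate and sample-rate fields. A single match can be a false positive, so a real check should confirm that further frames follow.

```php
<?php
// A rough sketch: returns the offset of the first byte sequence that looks
// like an MPEG audio frame header, or null if none is found.
function find_mp3_frame_offset(string $data): ?int
{
    $len = strlen($data);
    $start = 0;
    // Skip an ID3v2 tag if present: a 10-byte header whose last 4 bytes hold
    // the tag size as a "syncsafe" integer (7 bits per byte).
    if ($len >= 10 && substr($data, 0, 3) === 'ID3') {
        $size = ((ord($data[6]) & 0x7F) << 21)
              | ((ord($data[7]) & 0x7F) << 14)
              | ((ord($data[8]) & 0x7F) << 7)
              | (ord($data[9]) & 0x7F);
        $start = 10 + $size;
    }
    for ($i = $start; $i + 3 <= $len; $i++) {
        if (ord($data[$i]) !== 0xFF) {
            continue; // first 8 sync bits must all be set
        }
        $b1 = ord($data[$i + 1]);
        $b2 = ord($data[$i + 2]);
        $syncOk    = ($b1 & 0xE0) === 0xE0;         // remaining 3 sync bits
        $versionOk = (($b1 >> 3) & 0x03) !== 0x01;  // 01 is reserved
        $layerOk   = (($b1 >> 1) & 0x03) !== 0x00;  // 00 is reserved
        $bitrateOk = (($b2 >> 4) & 0x0F) !== 0x0F;  // 1111 is invalid
        $rateOk    = (($b2 >> 2) & 0x03) !== 0x03;  // 11 is reserved
        if ($syncOk && $versionOk && $layerOk && $bitrateOk && $rateOk) {
            return $i;
        }
    }
    return null;
}
```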

application/octet-stream is probably mime_content_type's fallback type when it fails to recognize a file.
The MP3 in that case is either not a real MP3 file, or - more likely - the file is a real MP3 file, but does not contain the "magic bytes" the PHP function uses to recognize the format - maybe because it's a different sub-format or has a variable bitrate or whatever.
You could try whether getid3 gives you better results. I've never worked with it but it looks like a pretty healthy library to get lots of information out of multimedia files.
If you have access to PHP's configuration, you may also be able to change the mime.magic file PHP uses, although I have no idea whether a better file exists that is able to detect your MP3s. (The mime.magic file is the file containing all the byte sequences that mime_content_type uses to recognize certain file types.)

Fleep is the answer to this question. Allowing application/octet-stream is dangerous since .exe and other dangerous files can display with that mime type.
See this answer https://stackoverflow.com/a/52570299/14482130
