When a user uploads a file, randomly it gets replaced by another user's upload, I've finally tracked down the issue to PHP and the tmp file name being reused. Is there a way to fix this? Is there a way to make better random names? It seems to degrade over time, as in the random file name seed gets weaker? This is on PHP 5.2.8 and FreeBSD 7.0
Here is a log showing how the same tmp file name gets used and is overwritten by another upload: http://pastebin.com/m65790440
Any help is GREATLY appreciated. I've been trying to fix this for over 4 months and has gotten worse over time. Thank you.
EDIT: Keep in mind that this is not a PHP code issue, this is happening before it reaches any PHP code, the file received via $_FILES['name']['tmp_name'] is incorrect when it is received and its been traced back that it is being overwritten with someone else's upload before it reaches the upload processing script
After chasing the relevant code down to _gettemp in FreeBSD 7's libc implementation, I'm unclear regarding how the contents of the file tmp_name could be invalid. (To trace it, you might download a copy of PHP 5.2.8 and read in main/rfc1867.c - line 1018 calls in main/php_open_temporary_file.c, the function starting on line 227, which does it's main work in the function starting on line 97, which, however, is essentially just a wrapper for mkstemp on your system, which is found in the FreeBSD libc implementation on line 66 (linked), which uses _gettemp (same as above) to actually generate the random filename. However the manpage for mkstemp mentions in the BUGS section that the arc4random() function is not reentrant. It might be a possibility that 2 simultaneous requests are entering the critical code section and returning the same tmp_name - I know too little about how Apache works with either mod_php or php-cgi to comment there (though using FastCGI/php-cgi might work - I can't comment successfully on this at this time).
However, aiming for the simpliest solution, if you are not quite experiencing the file tmp_name itself being invalid, but colliding instead with other uploaded files (for example, if using the filename portion of tmp_name as your only source of uniqueness in the stored filename), you could be facing collisions due to the birthday paradox. In another question you mention having some 5,000,000 files to move, and in still another question you mention recieving 30-40k uploads a day. This strikes me as a prime situation for a birthday paradox collision. The mktemp man page mentions that (if using six 'Xs' as PHP does) there are 56,800,235,584 possible filenames (62 ** 6, or 62 ** n where n = number of 'Xs', etc). However, given that you have more than some 5 million files, the probability of a collision is approximately 100% (another heuristic suggests you'll have already experienced some order of 220 collisions already, if ((files*(files-1))/2)/(62**6) means anything, where files = 5,000,000). If this is the problem you are facing (probable, if not adding further entropy to the generated uploaded filename), you might try something like move_uploaded_file($file['tmp_name'], UPLOADS.sha1(mt_rand().$file['tmp_name']).strrchr($file['name'], '.')) - the idea being to add more randomness to the random filename, preventing collisions. An alternative could be to add two more 'Xs' to line 134 of main/php_open_temporary_file.c and recompile.
It sounds like something is seriously wrong with either your PHP installation or whichever system call PHP is internally using to generate the random file names (most likely tempnam).
For everyone else: PHP handles uploaded files internally before the user code is ever processed. These names are stored in $_FILES['file']['tmp_name'] (where 'file' is the (quoted) name of the file input element on the form).
Is PHP running under apache, as mod_php?
You may try to create a per-process temporary upload directory whose name contains your php getmypid(), then ini_set your PHP process' upload_tmp_dir to
that directory. This will not work if a new php process is spawned for every request.
Move your files to a user dir after they have been uploaded. Those temp files should be removed.
I would recommend using a GUID generator for the filename seeing that you are getting so many.
Related
I have hundreds of mp3 files on my server. Each file's modified-date is important because it is fetched by PHPs filemtime to represent it's upload date (since there's no way to determine an upload time without storing values in a database).
I have come across an audio issue in which all the files need to be normalized and re-uploaded to the server. This would, of course, change the modified-date of each file to "today". I need each file to retain it's original modified-date.
I'm not sure if this is a software-recommendation question or a programming question, so I apologize if this is the wrong .SE site. Is this even possible?
You should be able to set the modified time with touch: http://php.net/manual/en/function.touch.php
This requires PHP > 5.3 and the user running the script (probably your web user unless you run it from the cli) needs to have write permission on the file.
You have two options for implementation:
Store the filenames and their mtimes in temporary storage (either a file or a database table). When you finish the upload, run through all of the files and use touch to reset the mtime.
As you upload the files, check to see if the file already exists. If it does, grab the mtime in a temporary variable, overwrite the file, then touch it with the correct mtime.
I know this isn't the answer you're looking for, but it would make far more sense to start storing this information in a database than relying on the last-modified date. This way you can show your users the date that they need to know and retain the true date of modification.
An approach like this also gives you much more flexibility.
As requested by #Snailer - for the sake of closing the question.
I want to allow registered users of a website (PHP) to upload files (documents), which are going to be publicly available for download.
In this context, is the fact that I keep the file's original name a vulnerability ?
If it is one, I would like to know why, and how to get rid of it.
While this is an old question, it's surprisingly high on the list of search results when looking for 'security file names', so I'd like to expand on the existing answers:
Yes, it's almost surely a vulnerability.
There are several possible problems you might encounter if you try to store a file using its original filename:
the filename could be a reserved or special file name. What happens if a user uploads a file called .htaccess that tells the webserver to parse all .gif files as PHP, then uploads a .gif file with a GIF comment of <?php /* ... */ ?>?
the filename could contain ../. What happens if a user uploads a file with the 'name' ../../../../../etc/cron.d/foo? (This particular example should be caught by system permissions, but do you know all locations that your system reads configuration files from?)
if the user the web server runs as (let's call it www-data) is misconfigured and has a shell, how about ../../../../../home/www-data/.ssh/authorized_keys? (Again, this particular example should be guarded against by SSH itself (and possibly the folder not existing), since the authorized_keys file needs very particular file permissions; but if your system is set up to give restrictive file permissions by default (tricky!), then that won't be the problem.)
the filename could contain the x00 byte, or control characters. System programs may not respond to these as expected - e.g. a simple ls -al | cat (not that I know why you'd want to execute that, but a more complex script might contain a sequence that ultimately boils down to this) might execute commands.
the filename could end in .php and be executed once someone tries to download the file. (Don't try blacklisting extensions.)
The way to handle this is to roll the filenames yourself (e.g. md5() on the file contents or the original filename). If you absolutely must allow the original filename to best of your ability, whitelist the file extension, mime-type check the file, and whitelist what characters can be used in the filename.
Alternatively, you can roll the filename yourself when you store the file and for use in the URL that people use to download the file (although if this is a file-serving script, you should avoid letting people specify filenames here, anyway, so no one downloads your ../../../../../etc/passwd or other files of interest), but keep the original filename stored in the database for display somewhere. In this case, you only have SQL injection and XSS to worry about, which is ground that the other answers have already covered.
That depends where you store the filename. If you store the name in a database, in strictly typed variable, then HTML encode before you display it on a web page, there won't be any issues.
The name of the files could reveal potentially sensitive information. Some companies/people use different naming conventions for documents, so you might end up with :
Author name ( court-order-john.smith.doc )
Company name ( sensitive-information-enterprisename.doc )
File creation date ( letter.2012-03-29.pdf )
I think you get the point, you can probably think of some other information people use in their filenames.
Depending on what your site is about this could become an issue (consider if wikileaks published leaked documents that had the original source somewhere inside the filename).
If you decide to hide the filename, you must consider the problem of somebody submitting an executable as a document, and how you make sure people know what they are downloading.
I'm making a real simple "backend" (PHP5) for two flash/air-applications. One of them will upload a photo, the backend will save it to a folder, and the second app will poll the backend for new photo's and show them.
I don't got any access to a database, so the backend has to be pure PHP5 and nothing more. That's why I chose to save the images to a folder (with a timestamp in their names) and use readdir() to get them back.
This all works like a charm. Nevertheless, I would really like to make sure the backend only returns photo's that are completely uploaded, preventing the second app to try to load an unfinished image. Are there any methods/tricks that I can use to validate a file?
You could check the filesize a couple hundred milliseconds apart and see if it changes:
$first = filesize($file);
// wait 100ms
usleep(100000);
$second = filesize($file);
if($first == $second) {
// file is no longer being actively uploaded
}
The usual trick for atomic filesystem operations is to write into a temporary file that is not matched by the reader (e.g. XXX.jpg.tmp) and once it's completely uploaded, rename it to it's target name. Renames on the same volume are atomic, so there is no point where the file is either uncomplete or unavailable.
A really easy and common way to do so would be to create a trigger file based on the files name, so that you get something like
123.jpg
123.rdy
or
123.jpg
123.jpg.rdy
You create that file (just an empty stub) as soon as the upload is complete. The application that grabs files to load only cares about files with a trigger file and then processes those. Alternatively, you could also save the uploaded file as ie. 123.bsy or 123.jpg.bsy while it is still being uploaded and then rename it to the finale name 123.jpg after the upload is done. Since renames in the same directory are usually really cheap operations in term of processing time, the chances of running in a race condition should be pretty low. (This might or might now depend on the OS used, though ...)
If you need to keep the files in that place, you could, of course, use a database where you add a record for each file, as the upload is complete. The other app could then just provide files with a matching database record.
After writing this all down I figgered it out myself. What I did was adding the exact amount of bytes in the filename as well and validate that while outputting the list of images. The .tmp/.bsy-sollution is nice also, but I read it a bit to late :)
Upside to my solution is that no more renaming is required after the upload is done. Thanks everybody for your fast answers!
I've got a Mac server and I'm building PHP code to allow users to upload images, documents, and even video files. Researching this has definitely gotten me nervous, I want the uploaded content to be virus free.
Is building something myself going to be a huge challenge? Would you do it, or would you find some OS or OTS product? (And do you know of any you can recommend)?
Conceptually, what you're talking about is pretty straightforward. Accepting and processing uploads is pretty simple, it's definitely not something I think you need to worry about buying a pre-built solution for.
Generally things like images and videos can't really have "viruses" (unless the viewer application is really poor and lets them run code somehow - also known as "Internet Explorer"), but it's not really difficult to virus-scan them anyway if you'd like to. Just find a command-line scanner that can run on the server (something like Clam AV), and whenever a file is uploaded, run it through the scanner and reject the upload (and log the event) if it fails the scan.
If you're uploading very large files, you might also consider a Flash upload/status bar so that users can see how much of the file is uploaded. SWFUpload is a good choice for that.
You can scan files with ClamAV by doing something like this in PHP:
$out = '';
$int = -1;
exec('/usr/local/bin/clamscan --stdout /path/to/file.ext', $out, $int);
if ($int == 0)
{
print('No virus!');
}
/*
Return codes from clamscan:
0 : No virus found.
1 : Virus(es) found.
40: Unknown option passed.
50: Database initialization error.
52: Not supported file type.
53: Can't open directory.
54: Can't open file. (ofm)
55: Error reading file. (ofm)
56: Can't stat input file / directory.
57: Can't get absolute path name of current working directory.
58: I/O error, please check your file system.
59: Can't get information about current user from /etc/passwd.
60: Can't get information about user '' from /etc/passwd.
61: Can't fork.
62: Can't initialize logger.
63: Can't create temporary files/directories (check permissions).
64: Can't write to temporary directory (please specify another one).
70: Can't allocate memory (calloc).
71: Can't allocate memory (malloc).
*/
The short answer: Don't buy anything. The experience and sense of accomplishment you will gain from coding this yourself is far more worth it.
The long answer: Trusting any form of user input is generally a bad idea. However, being sensible about what you do with user data is always the best way to go. If you don't do foolish things*, you'll be fine, and you'll gain tremendously from the experience.
( * I know that's a little ambiguous, but hey, try identifying a mistake before you've made it. I know I rarely can. ;)
I'm building sort of the same right now using FancyUpload from digitarald for Mootools 1.2.1
check this example: http://localhost/fancyupload/showcase/photoqueue/ to see how cool that is.
Just make sure you read up on how to pass a session to Flash (using GET / POST parameters!! Your session cookies will not work. ) and do some checks on the filetype.
Personally, i'd not let my users upload video's. Just use youtube and embed that stuff.
Oh yeah, and if you want to have thumbnails of thet stuff that's uploaded, go for ImageMagick installed on your server along with Ghostscript. Imagemagick can then even generate thumbnails from PDF's!
"Is building something myself going to be a huge challenge?" Yes, it is. Not as huge as to outsource it to a third party solution, but what you want to code here is possibly the most dangerous thing that you can get to code on a php web script: allowing users to upload files to your server. You need to be extremelly careful to filter the files you are going to accept to prevent users from uploading php scripts to your server. Common mistakes that people do while filtering are:
Not filter at all.
Filter based on incorrect regular expressions easily bypassables.
Not using is_uploaded_file and move_uploaded_file functions can get to LFI vulnerabilities.
Not using the $_FILES array (using global variables instead) can get to RFI vulns.
Filter based on the type from the $_FILES array, fakeable as it comes from the browswer.
Filter based on server side checked mime-type, fooled by simulating what the magic files contain (i.e. a file with this content GIF8 is identified as an image/gif file but perfectly executed as a php script)
Use blacklisting of dangerous files or extensions as opposed to whitelisting of those that are explicitely allowed.
Incorrect apache settings that allow to upload an .htaccess files that redefines php executable extensions (i.e. txt)..
I could go on, but I think you were already scared before asking :)
As per the viruses thing, yeah, just run an AV.
Here’s code to process the uploaded files, just so you get the idea:
foreach ($_FILES as $file) {
if (!$file['error']) {
move_uploaded_file ($file['tmp_name'], 'uploads/'. $file['name']);
} elseif (4 != $file['error']) {
$error_is = $file['error'];
// do something with the error :-)
}
}
header ('Location: ...'); // go to the updated page, like, with the new files
die;
You're better off using a third-party virus scanner to make sure the uploads are virus-free.
(Writing your own code to check for virus sounds like a daunting task)
Examples:
Gmail I think is using Norton, while Yahoo!Mail I think is using McAfee.
Keith Palmer suggestes small script.
Use clamdscan instead of clamscan.
clamdscan communicates with setup clamd (clamav daemon), while clamscan is a standalone application so virus signatures are loaded EACH TIME you call it, this could be generate quite a big load on your server.
Besides you could also try clamuko (this gives you on-access scanning), so you could just drop files into dir observed by clamuko.
There is also FUSE-based ClamFS which could probably be better solution, if you can't insert modules into the kernel.
To quote some famous words:
“Programmers… often take refuge in an understandable, but disastrous, inclination towards complexity and ingenuity in their work. Forbidden to design anything larger than a program, they respond by making that program intricate enough to challenge their professional skill.”
While solving some mundane problem at work I came up with this idea, which I'm not quite sure how to solve. I know I won't be implementing this, but I'm very curious as to what the best solution is. :)
Suppose you have this big collection with JPG files and a few odd SWF files. With "big" I mean "a couple thousand". Every JPG file is around 200KB, and the SWFs can be up to a few MB in size. Every day there's a few new JPG files. The total size of all the stuff is thus around 1 GB, and is slowly but steadily increasing. Files are VERY rarely changed or deleted.
The users can view each of the files individually on the webpage. However there is also the wish to allow them to download a whole bunch of them at once. The files have some metadata attached to them (date, category, etc.) that the user can filter the collection by.
The ultimate implementation would then be to allow the user to specify some filter criteria and then download the corresponding files as a single ZIP file.
Since the amount of criteria is big enough, I cannot pre-generate all the possible ZIP files and must do it on-the-fly. Another problem is that the download can be quite large and for users with slow connections it's quite likely that it will take an hour or more. Support for "resume" is therefore a must-have.
On the bright side however the ZIP doesn't need to compress anything - the files are mostly JPEGs anyway. Thus the whole process shouldn't be more CPU-intensive than a simple file download.
The problems then that I have identified are thus:
PHP has execution timeout for scripts. While it can be changed by the script itself, will there be no problems by removing it completely?
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Will passing large amounts of file data through PHP not be a performance hit in itself?
How would you implement this? Is PHP up to the task at all?
Added:
By now two people have suggested to store the requested ZIP files in a temporary folder and serving them from there as usual files. While this is indeed an obvious solution, there are several practical considerations which make this infeasible.
The ZIP files will usually be pretty large, ranging from a few tens of megabytes to hundreads of megabytes. It's also completely normal for a user to request "everything", meaning that the ZIP file will be over a gigabyte in size. Also there are many possible filter combinations and many of them are likely to be selected by the users.
As a result, the ZIP files will be pretty slow to generate (due to sheer volume of data and disk speed), and will contain the whole collection many times over. I don't see how this solution would work without some mega-expensive SCSI RAID array.
This may be what you need:
http://pablotron.org/software/zipstream-php/
This lib allows you to build a dynamic streaming zip file without swapping to disk.
Use e.g. the PhpConcept Library Zip library.
Resuming must be supported by your webserver except the case where you don't make the zipfiles accessible directly. If you have a php script as mediator then pay attention to sending the right headers to support resuming.
The script creating the files shouldn't timeout ever just make sure the users can't select thousands of files at once. And keep something in place to remove "old zipfiles" and watch out that some malicious user doesn't use up your diskspace by requesting many different filecollections.
You're going to have to store the generated zip file, if you want them to be able to resume downloads.
Basically you generate the zip file and chuck it in a /tmp directory with a repeatable filename (hash of the search filters maybe). Then you send the correct headers to the user and echo file_get_contents to the user.
To support resuming you need to check out the $_SERVER['HTTP_RANGE'] value, it's format is detailed here and once your parsed that you'll need to run something like this.
$size = filesize($zip_file);
if(isset($_SERVER['HTTP_RANGE'])) {
//parse http_range
$range = explode( '-', $seek_range);
$new_length = $range[1] - $range[0]
header("HTTP/1.1 206 Partial Content");
header("Content-Length: $new_length");
header("Content-Range: bytes {$range[0]}-$range[1]");
echo file_get_contents($zip_file, FILE_BINARY, null, $range[0], $new_length);
} else {
header("Content-Range: bytes 0-$size");
header("Content-Length: ".$size);
echo file_get_contents($zip_file);
}
This is very sketchy code, you'll probably need to play around with the headers and the contents to the HTTP_RANGE variable a bit. You can use fopen and fwrite rather than file_get contents if you wish and just fseek to the right place.
Now to your questions
PHP has execution timeout for scripts. While it can be changed by the script itself, will there be no problems by removing it completely?
You can remove it if you want to, however if something goes pear shaped and your code get stuck in an infinite loop at can lead to interesting problems should that infinite loop be logging and error somewhere and you don't notice, until a rather grumpy sys-admin wonders why their server ran out of hard disk space ;)
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Cache the file to the hard disk, means you wont have this problem.
Will passing large amounts of file data through PHP not be a performance hit in itself?
Yes it wont be as fast as a regular download from the webserver. But it shouldn't be too slow.
i have a download page, and made a zip class that is very similar to your ideas.
my downloads are very big files, that can't be zipped properly with the zip classes out there.
and i had similar ideas as you.
the approach to give up the compression is very good, with that you not even need fewer cpu resources, you save memory because you don't have to touch the input files and can pass it throught, you can also calculate everything like the zip headers and the end filesize very easy, and you can jump to every position and generate from this point to realize resume.
I go even further, i generate one checksum from all the input file crc's, and use it as an e-tag for the generated file to support caching, and as part of the filename.
If you have already download the generated zip file the browser gets it from the local cache instead of the server.
You can also adjust the download rate (for example 300KB/s).
One can make zip comments.
You can choose which files can be added and what not (for example thumbs.db).
But theres one problem that you can't overcome with the zip format completely.
Thats the generation of the crc values.
Even if you use hash-file to overcome the memory problem, or use hash-update to incrementally generate the crc, it will use to much cpu resources.
Not much for one person, but not recommend for professional use.
I solved this with an extra crc value table that i generate with an extra script.
I add this crc values per parameter to the zip class.
With this, the class is ultra fast.
Like a regular download script, as you mentioned.
My zip class is work in progress, you can have a look at it here: http://www.ranma.tv/zip-class.txt
I hope i can help someone with that :)
But i will discontinue this approach, i will reprogram my class to a tar class.
With tar i don't need to generate crc values from the files, tar only need some checksums for the headers, thats all.
And i don't need an extra mysql table any more.
I think it makes the class easier to use, if you don't have to create an extra crc table for it.
It's not so hard, because tars file structure is easier as the zip structure.
PHP has execution timeout for scripts. While it can be changed by the script itself, will there be no problems by removing it completely?
If your script is safe and it closes on user abort, then you can remove it completely.
But it would be safer, if you just renew the timeout on every file that you pass throught :)
With the resume option, there is the possibility of the filter results changing for different HTTP requests. This might be mitigated by sorting the results chronologically, as the collection is only getting bigger. The request URL would then also include a date when it was originally created and the script would not consider files younger than that. Will this be enough?
Yes that would work.
I had generated a checksum from the input file crc's.
I used this as an e-tag and as part of the zip filename.
If something changed, the user can't resume the generated zip,
because the e-tag and filename changed together with the content.
Will passing large amounts of file data through PHP not be a performance hit in itself?
No, if you only pass throught it will not use much more then a regular download.
Maybe 0.01% i don't know, its not much :)
I assume because php don't do much with the data :)
You can use ZipStream or PHPZip, which will send zipped files on the fly to the browser, divided in chunks, instead of loading the entire content in PHP and then sending the zip file.
Both libraries are nice and useful pieces of code. A few details:
ZipStream "works" only with memory, but cannot be easily ported to PHP 4 if necessary (uses hash_file())
PHPZip writes temporary files on disk (consumes as much disk space as the biggest file to add in the zip), but can be easily adapted for PHP 4 if necessary.