I've been reading up on PHP file upload security and a few articles have recommended renaming the files. For example, the OWASP article Unrestricted File Upload
says:
It is recommended to use an algorithm to determine the filenames. For
instance, a filename can be a MD5 hash of the name of file plus the
date of the day.
If a user uploads a file named Cake Recipe.doc is there really any reason to rename it to 45706365b7d5b1f35?
If the answer is yes, for whatever reason, then how do you keep track of the original file name and extension?
To your primary question, is it good practice to rename files, the answer is a definite yes, especially if you are creating a form of File Repository where users upload files (and filenames) of their choosing, for several reasons:
Security - if you have a poorly written application that allows the download of files by name or through direct access (it's horrid, but it happens), it's much harder for a user, whether maliciously or by accident, to "guess" the names of files.
Uniqueness -- the likelihood of two different people uploading a file of the same name is very high (ie. avatar.gif, readme.txt, video.avi, etc). The use of a unique identifier significantly decreases the likelihood that two files will be of the same name.
Versioning -- It is much easier to keep multiple "versions" of a document using unique names. It also avoids the need for additional code to parse a filename to make changes. A simple example would be renaming document.pdf to document(1).pdf, which becomes more complicated once you account for users' ability to create horrible names for things.
Length -- working with known filename lengths is always better than working with unknown filename lengths. I can always know that (my filepath) + (X letters) is a certain length, where (my filepath) + (random user filename) is completely unknown.
OS -- the length above can also create problems when attempting to write extremely random/long filenames to a drive. You have to account for special characters, lengths and the concerns for trimmed filenames (user may not receive a working file because the extension has been trimmed).
Execution -- It's easy for the OS to execute a file named .exe, or .php, or (insert other extension). It's hard when there isn't an extension.
URL encoding -- Ensuring the name is URL safe. Cake Recipe.doc is not a URL safe name, and can on some systems (either server or browser side) / some situations, cause inconsistencies when the name should be a urlencoded value.
As for storing the information, you would typically do this in a database, no different than the need you have already, since you need a way to refer back to the file (who uploaded it, what the name is, occasionally where it is stored, the time of upload, sometimes the size). You're simply adding the actual stored name of the file alongside the user's name for the file.
The OWASP recommendation isn't a bad one -- using the filename and a timestamp (not date) would be mostly unique. I take it a step further and include the microtime with the timestamp, and often some other unique bit of information, so that duplicate uploads of a small file can't collide within the same timeframe. I also store the date of the upload, which is additional insurance against md5 clashes; these become more probable in systems that store many files over many years. It is incredibly unlikely that you would generate two identical md5s, using filename and microtime, on the same day. An example would be:
$filename = date('Ymd') . '_' . md5($uploaded_filename . microtime());
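A variation on the same idea, as a rough sketch only: if you would rather not derive the stored name from user input at all, the name can come from PHP's random_bytes() instead. The helper name make_stored_name() is hypothetical and is not part of the OWASP recommendation.

```php
<?php
// Hypothetical helper: keeps the date prefix from the example above,
// but replaces md5(name . microtime()) with 16 random bytes.
function make_stored_name(): string
{
    // 16 random bytes become 32 lowercase hex characters
    return date('Ymd') . '_' . bin2hex(random_bytes(16));
}

$stored = make_stored_name();
echo $stored, PHP_EOL;
```

The original filename still has to be kept in the database, since nothing about it can be recovered from a name built this way.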
My 2 cents.
When I upload files I use PHP's uniqid() function for the filename that is stored on the server (and I preserve the file extension since it makes it easier for me when I am looking at all the files in the storage directory via the local file system).
I save the file outside of the website file system (aka you can never browse directly to the files).
I always use php's move_uploaded_file() function to save the file to the server.
I store the original filename, the path/filename where it is stored, and any other project related information you might need about who uploaded it, etc in a database.
In some of my implementations I also create a hash of the file contents and save that in the database too. Then with other uploaded files look in the database to see if I have a copy of that exact file already stored.
Some code examples:
The form:
<form method="post" enctype="multipart/form-data" action="your_form_handler.php">
<input type="file" name="file1" value="" />
<input type="submit" name="b1" value="Upload File" />
</form>
The form handler:
<?php
// pass the file input name used in the form and any other pertinent info to store in the db, username in this example
_process_uploaded_file('file1', 'jsmith');
exit;
function _process_uploaded_file($file_key, $username='guest'){
    if(array_key_exists($file_key, $_FILES)){
        $file = $_FILES[$file_key];
        if($file['size'] > 0){
            $data_storage_path = '/path/to/file/storage/directory/';
            $original_filename = $file['name'];
            // pathinfo() also copes with names that contain no dot
            $file_basename = pathinfo($original_filename, PATHINFO_FILENAME); // name without extension
            $file_ext = pathinfo($original_filename, PATHINFO_EXTENSION);
            $file_md5_hash = md5_file($file['tmp_name']);
            $stored_filename = uniqid();
            if($file_ext !== ''){
                $stored_filename .= '.' . $file_ext;
            }
            if(! move_uploaded_file($file['tmp_name'], $data_storage_path.$stored_filename)){
                // unable to move, check error_log for details
                return 0;
            }
            // insert a record into your db using your own mechanism ...
            // $statement = "INSERT into yourtable (original_filename, stored_filename, file_md5_hash, username, activity_date) VALUES (?, ?, ?, ?, NOW())";
            // success, all done
            return 1;
        }
    }
    return 0;
}
?>
Program to handle download requests
<?php
// Do all necessary security checks to make sure the user is allowed to download the file.
//
$file = '/path/to/your/storage/directory/' . 'the_stored_filename';
$filesize = filesize($file);
header('Content-Description: File Transfer');
header("Content-type: application/octet-stream");
header("Content-disposition: attachment; filename=\"filename_to_display.example\"");
header("Content-Transfer-Encoding: Binary");
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Pragma: public');
header("Content-length: ".$filesize);
ob_clean();
flush();
readfile("$file");
exit;
If you want to present the download in the same page that the user is requesting it from then look at my answer to this post: Dowloading multiple PDF files from javascript
There is a good reason to rename uploaded files:
if two users upload the same file, or files with the same name, the later file will replace the earlier one, which is undesirable.
You can use a hashing algorithm, for example:
$extensions = explode(".", $file_name);
$ext = $extensions[count($extensions)-1];
$file_name = md5($file_name . $_SERVER['REMOTE_ADDR']) . '.' . $ext;
Then you can save the original filename, hashed filename, uploader details, date and time to keep track of files.
why should I use this code to get the name of the file?
$filename = pathinfo($_FILES['file']['name'], PATHINFO_FILENAME)
If I could also get the name through this code:
$filename = $_FILES['file']['name']
Thank you very much! I'm a beginner in PHP, so sorry if the question is too dumb :D
Because $_FILES['file']['name'] comes from the user end, and although ordinarily it is just the file name, an ill-intentioned user can actually set it to whatever they want (for example, a full path name to overwrite files on the server). You have to filter it just like every other user input to avoid opening an attack vector in your system.
The same is true for everything in $_FILES: don't trust the reported MIME type, don't save files without checking that the extension is safe (saving a .php file will be a disaster), etc.
For example, I've seen a system that trusted files whose reported type was image/jpeg or another image type, and then saved them without checking the actual file extension. A forged request can inject a .php shell script into this website's upload folder, which can then be used to take control.
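To make the "full path name" risk concrete, here is a minimal sketch of stripping directory components from a client-supplied name before using it for anything. The helper client_basename() is hypothetical; it only illustrates the filtering step and is not a complete validation routine.

```php
<?php
// Hypothetical helper, shown for illustration only.
function client_basename(string $clientName): string
{
    // Treat backslashes as separators too, so Windows-style paths are stripped on any OS.
    $normalized = str_replace('\\', '/', $clientName);
    // basename() keeps only the last path component, which discards "../" sequences.
    $base = basename($normalized);
    // Remove NUL bytes and other control characters.
    return preg_replace('/[\x00-\x1F\x7F]/', '', $base);
}

echo client_basename('../../var/www/index.php'), PHP_EOL;
echo client_basename('C:\\Users\\me\\Cake Recipe.doc'), PHP_EOL;
```

Even after this step the result is still user input: the extension checks described above are needed as well.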
First of all, I apologize if the question is not clear, I'm explaining it below.
For every file uploaded, I'm renaming the file and recording its hash value (using the sha1_file function; please suggest any better or faster hashing techniques for files in PHP) in a separate DB table, and checking the hash of every new file to avoid duplicate files.
In this manner, anyone uploading a duplicate file will get an error message and the file won't be uploaded.
My question is: is there any technique or algorithm by which I can prevent duplicate file uploads, while the person uploading the duplicate remains unaware of it and finds the file in his/her account under a different name than the one already present? However, users shouldn't be able to upload banned files by any means.
Yes, you should use xxhash which is much faster than sha1.
According to their benchmarks:
The benchmark uses SMHasher speed test, compiled with Visual 2010 on a
Windows Seven 32-bits box. The reference system uses a Core 2 Duo
@3GHz
SHA1-32 is 0.28 GB/s fast, and xxHash is 5.4 GB/s.
The PHP library only accepts a string as input, so for files you should use the command-line binary and call it from PHP, with something like this:
list($hash) = explode(" ", shell_exec("/path/to/xxHash/xxhsum " . escapeshellarg($filePath)));
echo $hash;
Installing xxhash:
$ wget https://codeload.github.com/Cyan4973/xxHash/tar.gz/v0.6.3 -O xx.tar.gz
$ tar xvzf xx.tar.gz
$ cd xxHash-0.6.3; make
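As an alternative to shelling out, and assuming PHP 8.1 or newer (where the bundled hash extension lists xxh32, xxh64, xxh3 and xxh128), hash_file() can compute an xxHash digest directly. This is only a sketch: the helper fast_file_hash() and its fallback to sha1 are assumptions for illustration, not part of the answer above.

```php
<?php
// Hypothetical helper: uses xxh64 when this PHP build offers it, sha1 otherwise.
function fast_file_hash(string $path): string
{
    $algo = in_array('xxh64', hash_algos(), true) ? 'xxh64' : 'sha1';
    // Prefix the digest with the algorithm name so stored values stay unambiguous.
    return $algo . ':' . hash_file($algo, $path);
}

// Usage with a throwaway file
$tmp = tempnam(sys_get_temp_dir(), 'up');
file_put_contents($tmp, 'hello');
echo fast_file_hash($tmp), PHP_EOL;
unlink($tmp);
```

Note that xxHash is not a cryptographic hash, so it suits duplicate detection between cooperating users better than anything security-sensitive.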
Just add some extra logic in your code possibly using an extra table or extra fields in the existing table (it is up to you, there is more than one way to do it) that saves the file to an alternate location should you discover it is a duplicate rather than sending an error. Not sure, though, if what you are doing is a good idea from the UI design point of view, as you are doing something different with the user input in a way that the user will notice without telling the user why.
Use an example like this to generate your sha1 hash client side before upload.
Save all your uploaded files with their hash as the filename, or have a database table which contains the hash and your local filename for each file, also save file size and content type.
Before uploading, submit the hash from the client side to your server and check for it in the database. If it's not present, commence the file upload. If it is present, fake the upload client side, or do whatever you want so that the user thinks they have uploaded their file.
Create a column in your users table for files uploaded. Store a serialised associative array in this column with hash => users_file_name as key=>value pairs. Unserialize and display to each user to maintain their own file names then use readfile to serve them the file with the correct name, selecting it server side using the hash
As for your URL question. Create a page for the downloads but include the user in the url as well, so mysite.com/image.php?user=NewBee&image=filename.jpg
Query the database for files uploaded by NewBee and unserialize the array. Then:
$upload = $_GET['image'];
foreach($array as $hash => $filename){
if($filename == $upload)
$file = $hash;
}
Search the database for the path to your copy of that file; then, using readfile, you can output the same file with whatever name you want.
header("Content-Description: File Transfer");
header("Content-type: {$contenttype}");
header("Content-Disposition: attachment; filename=\"{$filename}\"");
header("Content-Length: " . filesize($file));
header('Pragma: public');
header("Expires: 0");
readfile($file);
You could create an extra table which links uploaded files (the entries in your table of file hashes) with user accounts. This table can contain an individual file name for every file belonging to a specific user, so the same file can have a different name per user. With current technologies you could also create the file hash in the browser via JavaScript and upload the file only if there isn't already a file with that hash in your database; if there is, you can simply link this user to the existing file.
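A minimal sketch of that linking logic, using plain arrays as stand-ins for the two database tables. The function name register_upload() and the array layout are assumptions made for illustration; a real implementation would use your own schema and queries.

```php
<?php
// Hypothetical in-memory stand-ins for two tables:
//   $files     : content hash => stored path (one row per distinct file)
//   $userFiles : user => [display name => content hash] (one row per user/file link)
function register_upload(array &$files, array &$userFiles, string $user,
                         string $displayName, string $hash, string $storedPath): bool
{
    $isNew = !isset($files[$hash]);
    if ($isNew) {
        // First time this content is seen: record where the single copy lives.
        $files[$hash] = $storedPath;
    }
    // Always link the user to the content under the name they chose.
    $userFiles[$user][$displayName] = $hash;
    return $isNew; // false means a duplicate was silently linked instead of stored
}

$files = [];
$userFiles = [];
var_dump(register_upload($files, $userFiles, 'alice', 'holiday.jpg', 'h1', '/store/h1'));
var_dump(register_upload($files, $userFiles, 'bob', 'beach.jpg', 'h1', '/store/ignored'));
```

Both users see their own file name, while only one copy of the content is kept.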
Addition because of comment:
If you want the same file to be accessible through multiple URLs you can use something like Apache's mod_rewrite. I'm no expert with that but you can look here for a first idea. You could update the .htaccess dynamically with your upload script.
I am new to PHP and would like to ask you for some help returning a unique result from file_get_contents(). The reason is that I want to give each photo a unique name, so that later it will be possible to delete just one of them and not all.
$file =addslashes(file_get_contents($_FILES['image']['tmp_name'][$key]));
Unfortunately time() and microtime() don't help in this situation.
Maybe this will help you: http://php.net/manual/en/function.uniqid.php
uniqid();
$imageName = $imageName . '_' . uniqid();
Why not just generate the SHA-1 of the content and use that as the file name (sort of like how git stores objects in a repository's loose object database)? I typically don't like to insert blobs into the RDBMS; it's a little clunky and you have to make sure the blob field has enough space for the kind of file sizes you're expecting to work with. Plus, the filesystem is optimized to handle, well, files. So it makes sense to keep a special directory on the server to which the Web server has write access. Then you write the files there and store references to them in the database. Here's an example:
// Read the contents of the file and create a SHA-1 hash signature.
$tmpPath = $_FILES['image']['tmp_name'];
$blob = file_get_contents($tmpPath);
$name = sha1($blob) . '.img'; // e.g. b99c6e26c3775fca9918ad614b7be7fe4fd7bee3.img
// Save the file to your server somewhere.
$dstPath = "/path/to/imgdb/$name";
move_uploaded_file($tmpPath,$dstPath);
// TODO: insert reference (i.e. $name) into database somehow...
And yes, researchers have broken SHA-1, but you could easily write some safeguards against that if you're paranoid enough (e.g. check for an existing upload with the same hash; if the content differs, just mutate/append to the name a little to change it). You aren't identifying the image on the backend by its content: you just need a unique name. Once you have that, you just look it up from the DB to figure out the file path on disk corresponding to the actual image data.
If you expect the image files to be pretty large, you could limit how much you read into memory using the length parameter (the 5th parameter) of file_get_contents().
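Another option for large files, sketched here as an assumption rather than as part of the answer above: sha1_file() hashes a file by path, so the upload never has to be held in a PHP string. The helper name content_name() is hypothetical.

```php
<?php
// Hypothetical helper: derives the stored name from the file's content hash.
function content_name(string $tmpPath): string
{
    // sha1_file() takes a path rather than the contents, so the script
    // does not need to load the whole upload into a variable first.
    return sha1_file($tmpPath) . '.img';
}

// Usage with a throwaway file standing in for $_FILES['image']['tmp_name']
$tmp = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($tmp, 'abc');
echo content_name($tmp), PHP_EOL;
unlink($tmp);
```

Unlike a length-limited read, this still hashes the complete file, so two files that share only a common prefix get different names.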
So I'm making a website with image upload functionality and I'm storing the image name in the database. I took a screenshot on my Mac and wanted to upload this photo: "Screen shot 2011-02-18 at 6.52.20 PM.png". Well, that's not a nice name to store in MySQL! How do people usually rename photos in such a way that each uploaded photo has a unique name? Also, how would I make sure I keep the file extension at the end when renaming the photo?
I would drop the extension, otherwise Apache (or equivalent) will run a1e99398da6cf1faa3f9a196382f1fadc7bb32fb7.php if requested (which may contain malicious PHP). I would also store it above the docroot.
If you need to make an image stored above the docroot accessible, you can store a safe copy that has been run through image functions, or serve it from some PHP with header('Content-Type: image/jpeg') for example and readfile() (not include, because I can embed PHP in a GIF file).
Also, pathinfo($path, PATHINFO_EXTENSION) is the best way to get an extension.
Ensure you have stored a reference to this file with original filename and other meta data in a database.
function getUniqueName($originalFilename) {
return sha1(microtime() . $_SERVER['REMOTE_ADDR'] . $originalFilename);
}
The only way this can generate a duplicate is if one user with the same IP uploads the same filename more than once within a microsecond.
Alternatively, you could just use the basename($_FILES['upload']['tmp_name']) that PHP assigns when you upload an image. I would say it should be unique.
Hash the image name. Could be md5, sha1 or even a unix timestamp.
Here is an (untested) example with a random number (10 to 99)
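<?php
function generate_unique_name($file_name)
{
    // split() treated "." as a regex and was removed in PHP 7;
    // pathinfo() extracts the extension directly
    $extension = pathinfo($file_name, PATHINFO_EXTENSION);
    return time() . rand(10,99) . "." . $extension;
}
?>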
You could use an image table like:
id: int
filename: varchar
hash: varchar
format: enum('jpeg', 'png')
The hash can be something like sha1_file($uploaded_file) and used to make sure duplicate images aren't uploaded. (So you could have multiple entries in the image table with the same hash, if you wanted.) The id is useful so you can have integer foreign key links back to the image table.
Next store the images in either:
/image/$id.$format
or
/image/$hash.$format
The second format via the hash would make sure you don't duplicate image data. If you are dealing with lots of images, you may want to do something like:
/image/a/b/c/abcdef12345.jpg
where you use multiple layers of folders to store the images. Many file systems get slowed down with too many files in a single directory.
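The layered layout can be derived from the hash itself. A small sketch, assuming the first few characters of the hash become one directory level each; the function name sharded_path() and the default of three levels are illustrative choices, not something the answer prescribes.

```php
<?php
// Hypothetical helper: turns a hash into a nested path such as /image/a/b/c/<hash>.<ext>
function sharded_path(string $hash, string $ext, int $levels = 3): string
{
    // Each of the first $levels characters becomes one directory level.
    $dirs = str_split(substr($hash, 0, $levels));
    return '/image/' . implode('/', $dirs) . '/' . $hash . '.' . $ext;
}

echo sharded_path('abcdef12345', 'jpg'), PHP_EOL; // /image/a/b/c/abcdef12345.jpg
```

With hex hashes each level has at most 16 subdirectories, so three levels spread the files over up to 4096 leaf directories.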
Now you can link to those files directly, or set up a URL like:
/image/$id/$filename
For example:
/image/12347/foo.jpg
The foo.jpg comes from whatever the user uploaded. It is actually ignored because you look up via the id. However, it makes the image have a nice name if the person chooses to download it. (You may optionally validate that the image filename matches after you look up the id.)
The above path can be translated to image.php via Apache's MultiViews or mod_rewrite. Then you can readfile() or use X-SendFile (better performance, but not always available) to send the file to the user.
Note that if you don't have X-SendFile and don't want to process things through PHP, you could use a RewriteRule to convert /image/$hash/foo.jpg into /image/a/b/c/$hash.jpg.
As the title says: which of them is better, and why? Are there any weaknesses in doing it?
I've been hearing that jQuery/JavaScript checking is bad and have been advised to use PHP, but I don't know why.
I'd appreciate recommendations from any of you. Thanks in advance.
Anyone see if this is good or bad:
<input type="file" name="task_doc" class="task_doc" onChange="checkext();"/>
function checkext(){
    var permittedFileType = ['pdf', 'doc', 'docx', 'xls', 'xlsx'];
    var fext = $(".task_doc").val().split('.').pop().toLowerCase();
    var resultFile = validate_filetype(fext, permittedFileType);
    if(resultFile === false){
        $(".task_doc").replaceWith("<input type='file' name='task_doc' class='task_doc' onChange='checkext();'>");
        alert("Invalid Extension");
    }
    else{
        alert("Success");
    }
}
function validate_filetype(fext, ftype)
{
    for(var num in ftype)
    {
        if(fext == ftype[num])
            return true;
    }
    return false;
}
If you use only javascript to check for data-validity, advanced users will have the possibility of uploading any data they want.
On the other hand using javascript might be a convenient way for the user to get fast feedback, if his entered data (files in this case) is invalid.
So I suggest using both client side and server side scripts.
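A server-side counterpart to the JavaScript check might look like the sketch below. It reuses the same list of permitted extensions from the question; the function name is_permitted_extension() is hypothetical.

```php
<?php
// Hypothetical helper mirroring the client-side list on the server.
function is_permitted_extension(string $clientName, array $permitted): bool
{
    // pathinfo() returns the text after the last dot, or '' when there is none.
    $ext = strtolower(pathinfo($clientName, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $permitted, true);
}

$permitted = ['pdf', 'doc', 'docx', 'xls', 'xlsx'];
var_dump(is_permitted_extension('Report.PDF', $permitted)); // bool(true)
var_dump(is_permitted_extension('shell.php', $permitted));  // bool(false)
```

An extension check only looks at the name the client sent, so it complements, rather than replaces, the other precautions discussed in these answers.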
You have to assume that any outside data is tainted and could be malicious. A user could disable JavaScript and send any file they want. Or a user could send a file to the server and change the MIME type and/or extension to bypass checks on the server as well.
Your best bet is to make sure your server is set up to correctly handle the various MIME types and not by default parse unknown file types as PHP. In other words, don't set Apache to handle anything but .php files as PHP and block .php files from being uploaded at all. Handling file uploads is a sticky situation at best, security-wise. I would highly recommend saving uploads outside of your document root directory, renaming them to a random string that only you know (i.e. on upload store the random name in a database), then send the file via PHP to the browser.
header('Content-Description: File Transfer');
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename=' . basename($filename));
header('Content-Transfer-Encoding: binary');
readfile($filename);
I recommend doing this because storing them outside the document root prevents access, using a unique filename stops somebody from directly accessing it, and forcing a download (should) prevent any auto execution of a malicious file so hopefully the user's anti-virus could find it....
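One detail worth sketching: when the display name contains spaces or quotes, the Content-Disposition value is safer if the name is quoted and cleaned first. The helper content_disposition() below is a hypothetical illustration; it pairs a quoted filename with the percent-encoded filename* form and is not taken from the answer above.

```php
<?php
// Hypothetical helper that builds the value for a Content-Disposition header.
function content_disposition(string $displayName): string
{
    $base = basename($displayName);
    // Quotes, backslashes and control characters would break the quoted form, so drop them.
    $quoted = preg_replace('/[\x00-\x1F\x7F"\\\\]/', '', $base);
    // filename* carries a percent-encoded UTF-8 copy for clients that support it.
    return 'attachment; filename="' . $quoted . '"; filename*=UTF-8\'\'' . rawurlencode($base);
}

echo content_disposition('Cake Recipe.doc'), PHP_EOL;
// attachment; filename="Cake Recipe.doc"; filename*=UTF-8''Cake%20Recipe.doc
```

The result would be sent with header('Content-Disposition: ' . content_disposition($originalName)), where $originalName is the name stored in your database.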