I am trying to convert our uploaded filenames from an unreadable pile of files to an organized and human-readable structure of files. I am wondering if there are any additional security measures I need to take to secure this type of system.
To give a brief overview, the current system uploads files, generates a random filename, and allows the files to be accessed only through a download script (I have no need to serve the files directly to the browser).
In short, I'd like to implement a WebDav system and would think the easiest solution would be to store uploaded files with their original name (separated into different folders).
Thank you
Edit: To clarify, I'd like to retain the filename as much as possible, but I'd obviously need to at least sanitize the filenames first. I've considered chmod-ing the containing folder to prevent execution (a folder located outside of the web directory). What else am I not considering?
In short, I'd like to implement a WebDav system and would think the easiest solution would be to store uploaded files with their original name
This is a pretty broad question, but to keep the answer short: NEVER TRUST USER-PROVIDED DATA. You must always validate and sanitize on the server side, otherwise you will be hacked sooner or later.
The original file name is sent by the client, so it can be anything. Here are some ideas of what I'd try to send you as "original" file names, knowing you are so carefree: ../../../../etc/passwd or ../../config/db.php. Handle as-it-comes. Enjoy :)
EDIT
I should have mentioned things I've considered -- sanitizing filenames
A sanitized file name is not the original file name any more. However, there is an approach you could consider to meet your goal and still stay safe. You could validate/sanitize the original file name and, if the result is still identical to what came from the user, keep the file and retain the original name. If it is not, you should reject the upload as a whole. At the end of the day you will only have files whose original names are safe to expose via other APIs/interfaces.
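A minimal sketch of that accept-or-reject check, written in Python for brevity (the same logic ports to PHP). The allow-list, the 100-character limit, and the function names are assumptions for illustration, not a recommended policy:

```python
import re

MAX_LENGTH = 100  # assumed limit, pick whatever fits your storage


def sanitize(name: str) -> str:
    """Drop every character outside a conservative allow-list."""
    cleaned = re.sub(r"[^A-Za-z0-9._ -]", "", name)
    cleaned = cleaned.lstrip(".")  # no hidden files such as .htaccess
    cleaned = cleaned.strip()      # no leading/trailing spaces
    return cleaned[:MAX_LENGTH]


def accept_original_name(name: str) -> bool:
    """Accept the upload only if sanitizing leaves the name unchanged."""
    return name != "" and sanitize(name) == name
```

With this policy a name such as "report 2012.pdf" passes unchanged, while "../../../../etc/passwd" and ".htaccess" are rejected because sanitizing would alter them.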
EDIT
I've considered chmod-ing the containing folder to prevent execution
This is bad security. You should instead keep the files in a folder that is not directly accessible.
From How to Securely Allow Users to Upload Files:
Always Store Uploaded Files Outside of the Document Root
If your website is example.com and when a visitor accesses this website in their browser, the script located at /home/example/public_html/index.php is executed, then you should not be storing the files that users have uploaded in /home/example/public_html/ or any of its subdirectories. A good candidate, instead, would be /home/example/uploaded/.
...
Instead of storing the file at /home/example/uploaded/some/directories/user_provided.file, store all relevant metadata in a database record (while taking care to prevent SQL injection vulnerabilities) and use a random filename for the actual filesystem storage.
This does three things:
It guarantees that your user's file will never be executed as a script. They get read-only access whether they like it or not. (No reverse shells!)
It prevents the user from controlling the filename, to prevent security-critical files from being overwritten.
It allows you to retain as much metadata about each file as you'd like without sacrificing security.
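As a rough illustration of the approach quoted above (random name on disk, metadata in a database), here is a sketch in Python using SQLite. The table layout, column names, and the store_upload helper are hypothetical; the parameterized query is what guards against SQL injection:

```python
import secrets
import sqlite3
from pathlib import Path

# Hypothetical schema: one row of metadata per stored file.
SCHEMA = """
CREATE TABLE IF NOT EXISTS uploads (
    stored_name   TEXT PRIMARY KEY,
    original_name TEXT NOT NULL,
    size          INTEGER NOT NULL
)
"""


def store_upload(db: sqlite3.Connection, upload_dir: Path,
                 original_name: str, data: bytes) -> str:
    """Write the bytes under a random name and record the metadata."""
    stored_name = secrets.token_hex(16)  # 32 hex characters, not user-controlled
    (upload_dir / stored_name).write_bytes(data)
    # Placeholders keep the user-supplied name out of the SQL text.
    db.execute(
        "INSERT INTO uploads (stored_name, original_name, size) VALUES (?, ?, ?)",
        (stored_name, original_name, len(data)),
    )
    db.commit()
    return stored_name
```

The user-supplied name only ever travels as data, so even a name like ../../etc/passwd ends up as a harmless string in the original_name column.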
If you need a real implementation to reference, here are two from a CMS that I'm developing:
Uploading files
Serving files
I'm programming a file converter. The user uploads a file, e.g. test.txt, which is then converted, and a download link is sent back to the user. For security purposes I change the name of the files as soon as they are uploaded, as is also suggested here.
Instead create files and folders with randomly generated names like fg3754jk3h
The problem starts when it comes to download. For a better UX I want the downloadable files to have the same name as the user supplied files, not a random string. At the moment I also get an error in Chrome:
<Filename> is an unusual download and may be harmful. [translated]
I think this could also be a result of the cryptographic file names.
So my question: What is the best method to change the file names back to the original ones without introducing any security issues, or would it be better to do a strict validation of the file names? And will this get rid of the displayed error message?
You can provide the original filename when returning the file to the user. (see Downloading a file with a different name to the stored name for a few ways of doing it)
The principle of not storing the file under its original name is to prevent a malicious user from uploading a script to your server that they can then execute. You should do that, but you should also put the files in a separate directory that is not reachable through your web server.
For example:
Your web server is pointing to /var/www
When you receive the uploaded file, store it in /var/uploads instead of /var/www/uploads. This way, the file will never be accessible to the user (at least not from the web)
You save the original filename in your database
You should still generate a random filename; this avoids filename collisions (many people will upload their cute-cat.jpg images). There's no problem keeping the file extension, e.g. kr3242sd93fdsh.jpg
You provide an endpoint that lets your user download the file by some random string (I suggest you avoid reusing the random string that you used to name the file): https://youserver.com/download?id=uoqq41jsak
On your download endpoint, you set the original filename in the filename parameter of the Content-Disposition header.
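The last step can be sketched as follows (Python here for brevity; the content_disposition helper is a hypothetical name). Because the original filename is user-controlled, the sketch strips control characters, quotes, and slashes from the plain filename parameter and adds a percent-encoded filename* parameter in the RFC 6266 style for non-ASCII names:

```python
from urllib.parse import quote


def content_disposition(original_name: str) -> str:
    """Build a Content-Disposition header value for a download."""
    # ASCII fallback: replace anything that could break the quoted string
    # or inject a new header line (CR/LF, quotes, backslashes, slashes).
    fallback = "".join(
        c if c.isascii() and c.isprintable() and c not in '"\\/' else "_"
        for c in original_name
    )
    # Extended parameter: UTF-8, percent-encoded, for modern browsers.
    encoded = quote(original_name, safe="")
    return f"attachment; filename=\"{fallback}\"; filename*=UTF-8''{encoded}"
```

The download endpoint would look up the original name by the id in the URL, send this value as the Content-Disposition header, and stream the file from /var/uploads.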
This is a best practice question regarding how to handle user uploads and distribute static content to a large number of concurrent users.
I have an upload form for images (png, jpg, gif) and other forms of multimedia (mp4, webm). The files are created, hashed, and stored in storage/app/attachments/ as their hash with no extension.
The request URL /file/md5/filename (such as /file/9d42b752ecd0e3b4542aeca21c7c50a9/dancing_cat.gif) will distribute the file with that name. The route is completely flexible, so replacing dancing_cat.gif with boring_cat_dancing_poorly.gif will still fetch the same file, but will distribute it with the new filename specified.
The point of this system is to stop duplicates from being uploaded while preserving the original name of the document that the uploader had. Other instances of the same file uploaded will also keep their name.
The code I have for this works; however, people raise issues with distributing static content through PHP. I am told that on my large target platform, this system will work poorly and will immediately become a bottleneck. I am told I should use routes in Apache/nginx/lighttpd/whatever webserver to serve the static file directly by capturing the request URL before it hits PHP, but that may cause issues with MIME types (i.e. an image won't render correctly).
My question is: What is the best practice for achieving what I am doing? How would a big website handle distributing static, user-uploaded content while avoiding a "PHP Bottleneck". I am early enough into my project to consider major rewrites, so please be as informative as possible.
I hope I've understood the problem correctly. You could hash the current user name together with the file name and file extension, using sha1 or any other function that produces a shorter hash. It is very unlikely that two different combinations produce the same hash. Then add the generated hash to the file name saved in your directory. For example:
/file/9d42b752ecd0e3b4542aeca21c7c50a9/gifhse3peo40ed-user_photo.jpg
You could then organize the hashes per user, for example by creating a specific folder for each user's uploads, so that when a user re-uploads a file the code knows where to save it, and vice versa.
Hope it helps!
I want to allow registered users of a website (PHP) to upload files (documents), which are going to be publicly available for download.
In this context, is the fact that I keep the file's original name a vulnerability ?
If it is one, I would like to know why, and how to get rid of it.
While this is an old question, it's surprisingly high on the list of search results when looking for 'security file names', so I'd like to expand on the existing answers:
Yes, it's almost surely a vulnerability.
There are several possible problems you might encounter if you try to store a file using its original filename:
the filename could be a reserved or special file name. What happens if a user uploads a file called .htaccess that tells the webserver to parse all .gif files as PHP, then uploads a .gif file with a GIF comment of <?php /* ... */ ?>?
the filename could contain ../. What happens if a user uploads a file with the 'name' ../../../../../etc/cron.d/foo? (This particular example should be caught by system permissions, but do you know all locations that your system reads configuration files from?)
if the user the web server runs as (let's call it www-data) is misconfigured and has a shell, how about ../../../../../home/www-data/.ssh/authorized_keys? (Again, this particular example should be guarded against by SSH itself (and possibly the folder not existing), since the authorized_keys file needs very particular file permissions; but if your system is set up to give restrictive file permissions by default (tricky!), then that won't be the problem.)
the filename could contain the \x00 byte, or control characters. System programs may not respond to these as expected - e.g. a simple ls -al | cat (not that I know why you'd want to execute that, but a more complex script might contain a sequence that ultimately boils down to this) might execute commands.
the filename could end in .php and be executed once someone tries to download the file. (Don't try blacklisting extensions.)
The way to handle this is to roll the filenames yourself (e.g. md5() on the file contents or the original filename). If you absolutely must preserve the original filename to the best of your ability, whitelist the file extension, check the file's MIME type, and whitelist the characters that can be used in the filename.
Alternatively, you can roll the filename yourself when you store the file and for use in the URL that people use to download the file (although if this is a file-serving script, you should avoid letting people specify filenames here, anyway, so no one downloads your ../../../../../etc/passwd or other files of interest), but keep the original filename stored in the database for display somewhere. In this case, you only have SQL injection and XSS to worry about, which is ground that the other answers have already covered.
That depends where you store the filename. If you store the name in a database, in strictly typed variable, then HTML encode before you display it on a web page, there won't be any issues.
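A tiny illustration of the "HTML encode before you display it" step, sketched in Python (render_filename is a hypothetical helper; in PHP the equivalent call would be htmlspecialchars):

```python
import html


def render_filename(original_name: str) -> str:
    """Escape a user-supplied filename before placing it in an HTML page."""
    # quote=True also escapes " and ', so the result is safe inside attributes.
    return html.escape(original_name, quote=True)
```

A filename such as <script>alert(1)</script>.pdf is then shown as literal text instead of being interpreted as markup.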
The name of the files could reveal potentially sensitive information. Some companies/people use different naming conventions for documents, so you might end up with :
Author name ( court-order-john.smith.doc )
Company name ( sensitive-information-enterprisename.doc )
File creation date ( letter.2012-03-29.pdf )
I think you get the point, you can probably think of some other information people use in their filenames.
Depending on what your site is about this could become an issue (consider if wikileaks published leaked documents that had the original source somewhere inside the filename).
If you decide to hide the filename, you must consider the problem of somebody submitting an executable as a document, and how you make sure people know what they are downloading.
I have a topic/question concerning your upload filename standards, if any, that you are using. Imagine you have an application that allows many types of documents to be uploaded to your server and placed into a directory. Perhaps the same document could even be uploaded twice. Usually, you have to make some kind of unique filename adjustment when saving the document. Assume it is saved in a directory, not saved directly into a database. Of course, the Meta Data would probably need to be saved into the database. Perhaps the typical PHP upload methods could be the application used; simple enough to do.
Possible Filenaming Standard:
1.) Append the document filename with a unique id: image.png changed to image_20110924_ahd74vdjd3.png
2.) Perhaps use a UUID/GUID and store the actual file type (meta) in a database: 2dea72e0-a341-11e0-bdc3-721d3cd780fb
3.) Perhaps a combination: image_2dea72e0-a341-11e0-bdc3-721d3cd780fb.png
Can you recommend a good standard approach?
Thanks, Jeff
I always just hash the file using md5() or sha1() and use that as a filename.
E.g.
3059e384f1edbacc3a66e35d8a4b88e5.ext
And I would save the original filename in the database in case I ever need it.
This will make the filename unique AND it makes sure you don't have the same file multiple times on your server (since they would have the same hash).
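A small sketch of content-hash naming, in Python for brevity. Note one deliberate swap: it uses SHA-256 rather than md5() or sha1(), since practical collisions are known for both of those; store_by_hash is a hypothetical helper name:

```python
import hashlib
from pathlib import Path


def store_by_hash(upload_dir: Path, data: bytes) -> str:
    """Store the bytes under their content hash; identical uploads share one file."""
    digest = hashlib.sha256(data).hexdigest()
    target = upload_dir / digest
    if not target.exists():  # same content already stored, nothing to write
        target.write_bytes(data)
    return digest
```

Uploading the same bytes twice returns the same name and leaves a single file on disk, which is the de-duplication effect described above.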
EDIT
As you can see I had some discussion with zerkms about my solution and he raised some valid points.
I would always serve the file through PHP instead of letting users download it directly.
This has some advantages:
I would add a record to the database whenever a user uploads a file. It would contain the user who uploaded the file, the original filename, and the hash of the file.
If a user wants to delete a file, you just delete that user's record for the file.
If no other user has the file after the delete, you can delete the file itself (or keep it anyway).
You should not keep the files somewhere in the document root, but rather somewhere else where it isn't accessible by the public and serve the file using PHP to the user.
A disadvantage as zerkms has pointed out is that serving files through PHP is more resource consuming, although I find the advantages to be worth the extra resources.
Another thing zerkms has pointed out is that the extension isn't really needed when saving the file under its hash (since it is already in the database), but I always like to know what kind of files are in the directory by simply doing an ls -la, for example. Again, though, it isn't really necessary.
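The per-user records and the "delete the file only when nobody references it" rule described in this answer could look roughly like this (a Python/SQLite sketch; the user_files table and remove_user_file helper are hypothetical names):

```python
import sqlite3
from pathlib import Path

# Hypothetical schema: one row per (user, file) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS user_files (
    user_id       INTEGER,
    file_hash     TEXT,
    original_name TEXT
)
"""


def remove_user_file(db: sqlite3.Connection, upload_dir: Path,
                     user_id: int, file_hash: str) -> bool:
    """Delete the user's record; remove the stored file once no record is left.

    Returns True if the file on disk was removed as well.
    """
    db.execute(
        "DELETE FROM user_files WHERE user_id = ? AND file_hash = ?",
        (user_id, file_hash),
    )
    (remaining,) = db.execute(
        "SELECT COUNT(*) FROM user_files WHERE file_hash = ?", (file_hash,)
    ).fetchone()
    db.commit()
    if remaining == 0:
        (upload_dir / file_hash).unlink(missing_ok=True)
        return True
    return False
```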
I have a web application that allows users to upload certain documents relevant to their account (word/excel/powerpoint etc). I'm building this with CodeIgniter and I just wanted to check that I'm not missing anything security wise.
MIME types of the file are checked
Maximum size is checked
The filename is hashed
The filename is never seen by any user; rather, when clicking a "download" link, the safedownload controller is called with an ID ( http://www.example.com/safedownload/1245/ )
Is there anything I'm missing? The CHMOD of the files in the directory is currently set to 0600, is that secure?
Thanks.
Have you considered the way the files are accessed later? There is a common flaw you should be aware of:
If the file path can be manipulated in any way, it is possible that files completely outside the folder you store the documents in could be accessed, for instance ../../../etc/somefile
To protect against this, you could check the file path that is about to be accessed for '..' to be sure no one has found a way to get those characters into the command your code executes!
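As a sketch of a slightly stricter variant of that check (Python for brevity; resolve_inside is a hypothetical helper): instead of only searching for '..', resolve the final path and confirm it still lies inside the storage folder. This also catches absolute paths and symlinks that point elsewhere:

```python
from pathlib import Path


def resolve_inside(base_dir: Path, requested: str) -> Path:
    """Return the resolved path, or raise ValueError if it escapes base_dir."""
    base = base_dir.resolve()
    candidate = (base / requested).resolve()  # collapses '..' and follows symlinks
    # The storage folder must be a strict ancestor of the resolved path.
    if candidate == base or base not in candidate.parents:
        raise ValueError("requested path is outside the upload directory")
    return candidate
```

A request for ../../../etc/somefile or /etc/passwd raises ValueError, while an ordinary name resolves to a path under the storage folder.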