This is a best practice question regarding how to handle user uploads and distribute static content to a large number of concurrent users.
I have an upload form for images (png, jpg, gif) and other forms of multimedia (mp4, webm). The files are created, hashed, and stored in storage/app/attachments/ as their hash with no extension.
The request URL /file/md5/filename (such as /file/9d42b752ecd0e3b4542aeca21c7c50a9/dancing_cat.gif) will distribute the file with that name. The route is completely flexible, so replacing dancing_cat.gif with boring_cat_dancing_poorly.gif will still fetch the same file, but will distribute it with the new filename specified.
The point of this system is to stop duplicates from being uploaded while preserving the original name of the document that the uploader had. Other instances of the same file uploaded will also keep their name.
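Roughly, the route looks something like this (a simplified Laravel-style sketch rather than my exact code, since the storage/app/ path follows Laravel's conventions):

Route::get('/file/{hash}/{name}', function ($hash, $name) {
    // Only accept a bare 32-character hex hash, so no path tricks are possible.
    if (!preg_match('/^[a-f0-9]{32}$/', $hash)) {
        abort(404);
    }
    $path = storage_path('app/attachments/' . $hash);
    if (!is_file($path)) {
        abort(404);
    }
    // MIME type is detected from the content; the display name comes from the URL.
    return response()->file($path, [
        'Content-Type'        => mime_content_type($path),
        'Content-Disposition' => 'inline; filename="' . addslashes($name) . '"',
    ]);
});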
The code I have for this works; however, people have raised issues with distributing static content through PHP. I am told that on my large target platform this system will perform poorly and will quickly become a bottleneck. I am told I should use rules in Apache/nginx/Lighttpd/whatever webserver to serve the static file directly by capturing the request URL before it hits PHP, but that may cause issues with MIME types (i.e. an image won't render correctly).
My question is: what is the best practice for achieving what I am doing? How would a big website handle distributing static, user-uploaded content while avoiding a "PHP bottleneck"? I am early enough into my project to consider major rewrites, so please be as informative as possible.
I hope the problem is clear. You could hash the current user name plus the file name and extension with sha1 (or any shorter encoder that generates a hash); it is very hard to end up with the same hash from these combinations. Then add that generated hash to the file name saved in your directory, for example:
/file/9d42b752ecd0e3b4542aeca21c7c50a9/gifhse3peo40ed-user_photo.jpg
You could then organize the hashes per user, for example by creating a specific folder for each user's uploads, so that when a user re-uploads a file the code knows where to save it and where to look it up.
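As a rough sketch (the form field, paths, and variable names are just examples):

// Rough sketch only: 'upload' form field, per-user folders, illustrative paths.
$username = 'alice';                                          // the current user
$original = preg_replace('/[^A-Za-z0-9._-]/', '_', basename($_FILES['upload']['name']));
$prefix   = substr(sha1($username . $original . microtime(true)), 0, 14);
$stored   = $prefix . '-' . $original;                        // e.g. gifhse3peo40ed-user_photo.jpg
$dir      = __DIR__ . '/uploads/' . $username;

if (!is_dir($dir)) {
    mkdir($dir, 0750, true);                                  // one folder per user
}
move_uploaded_file($_FILES['upload']['tmp_name'], $dir . '/' . $stored);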
Hope it helps!
Related
I am trying to convert our uploaded filenames from an unreadable pile of files to an organized and human-readable structure of files. I am wondering if there are any additional security measures I need to take to secure this type of system.
To give a brief overview, the current system uploads files, generates a random filename, and allows the files to be accessed only through a download script (I have no need to serve the files directly to the browser).
In short, I'd like to implement a WebDAV system and would think the easiest solution would be to store uploaded files with their original names (separated into different folders).
Thank you
Edit: To clarify, I'd like to retain the filename as much as possible, but I'd obviously need to at least sanitize the filenames first. I've considered chmod-ing the containing folder to prevent execution (a folder located outside of the web directory). What, in addition, am I not considering?
In short, I'd like to implement a WebDAV system and would think the easiest solution would be to store uploaded files with their original name
This is a pretty broad question, but to keep the answer short: NEVER TRUST USER-PROVIDED DATA. You must always do server-side validation and sanitization, otherwise you will be hacked sooner or later.
The original file name is sent by the client, so it can be anything. Here are some ideas of what I'd try to send you as an "original" file name, knowing you are so carefree: ../../../../etc/passwd or ../../config/db.php. Handle them as they come. Enjoy :)
EDIT
I should have mentioned things I've considered -- sanitizing filenames
A sanitized file name is not the original file name any more. However, there is an approach you could consider to meet your goal and still stay safe: validate/sanitize the original file name, and if it is still exactly the same as it came from the user, keep the file and retain the original name. If it is not, reject the upload as a whole. At the end of the day you will only have files whose original names you can safely expose via other APIs/interfaces.
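As a rough sketch of that approach (the character whitelist here is only an example; adjust it to your own rules):

// Keep the original name only if sanitization leaves it completely unchanged.
function acceptOriginalName(string $original): ?string
{
    // Example whitelist: letters, digits, spaces, dot, underscore, hyphen.
    $sanitized = preg_replace('/[^A-Za-z0-9 ._-]/', '', $original);
    $sanitized = ltrim($sanitized, '.');          // no hidden or relative names

    return ($sanitized === $original && $sanitized !== '') ? $sanitized : null;
}

$name = acceptOriginalName($_FILES['upload']['name'] ?? '');
if ($name === null) {
    http_response_code(400);
    exit('Upload rejected: the file name contains characters we do not allow.');
}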
EDIT
I've considered chmod-ing the containing folder to prevent execution
That is weak security. You should instead keep the files in a folder that is not directly accessible at all.
From How to Securely Allow Users to Upload Files:
Always Store Uploaded Files Outside of the Document Root
If your website is example.com and when a visitor accesses this website in their browser, the script located at /home/example/public_html/index.php is executed, then you should not be storing the files that users have uploaded in /home/example/public_html/ or any of its subdirectories. A good candidate, instead, would be /home/example/uploaded/.
...
Instead of storing the file at /home/example/uploaded/some/directories/user_provided.file, store all relevant metadata in a database record (while taking care to prevent SQL injection vulnerabilities) and use a random filename for the actual filesystem storage.
This does three things:
It guarantees that your user's file will never be executed as a script. They get read-only access whether they like it or not. (No reverse shells!)
It prevents the user from controlling the filename, to prevent security-critical files from being overwritten.
It allows you to retain as much metadata about each file as you'd like without sacrificing security.
If you need a real implementation to reference, here are two from a CMS that I'm developing:
Uploading files
Serving files
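To make the pattern concrete, here is a minimal sketch (this is not the CMS code linked above; the table and column names are made up):

// Random on-disk name, metadata in the database, storage outside the docroot.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$originalName = basename($_FILES['upload']['name']);
$storageName  = bin2hex(random_bytes(16));             // random name, no extension
$storageDir   = '/home/example/uploaded/';             // outside the document root

if (move_uploaded_file($_FILES['upload']['tmp_name'], $storageDir . $storageName)) {
    $stmt = $pdo->prepare(
        'INSERT INTO attachments (original_name, storage_name, mime_type, uploaded_at)
         VALUES (?, ?, ?, NOW())'
    );
    $stmt->execute([
        $originalName,
        $storageName,
        mime_content_type($storageDir . $storageName),
    ]);
}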
I'm developing an Android app which has avatars, somewhat like WhatsApp does. As you know, in WhatsApp you can create a group and set a group picture for it.
I don't have any problems taking the image, saving it, etc. The problem is that I'm developing the web service in Symfony2 (PHP) and I want to receive the image and save it somewhere on the server. However, those images are obviously NOT public and should only be viewable by users with permission. I've thought about the traditional method, saving the image in a folder and handing out the link (or not), but that is far too easy to get around.
So guys, how would you do this? Maybe save the binary data directly into MySQL? Is there any clean way to achieve this?
Any tips are appreciated.
Thanks.
Another approach is to set the MIME type of the PHP response to an image type. A call to a URL like http://xxx/images.php?id=8989031289130 would then return an image instead of an HTML page.
You then have access to the PHP security context and can validate whether the user actually has permissions to view this file.
There are some more details at:
Setting Mime type in PHP
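A rough sketch of such a script; findImagePathById() and userMayView() are placeholders for your own database lookup and permission logic:

// images.php (sketch): serve an image through PHP with the correct MIME type.
session_start();

$id   = $_GET['id'] ?? '';
$file = findImagePathById($id);                        // placeholder: your DB lookup

if ($file === null || !userMayView($_SESSION['user_id'] ?? null, $id)) {  // placeholder check
    http_response_code(403);
    exit;
}

header('Content-Type: ' . mime_content_type($file));   // e.g. image/jpeg
header('Content-Length: ' . filesize($file));
readfile($file);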
The typical answer here is to use a file naming scheme that precludes guessing. For example, you could take the filename plus a secret salt, hash them together, and append the hash to the filename (before the extension). Thus, what would be /foo/bar/baz.jpg would become /foo/bar/baz_8843d7f92416211de9ebb963ff4ce28125932878.jpg.
So long as your hash salt remains secret, filenames are more or less mathematically protected from random or brute-force discovery. This is, for example, the core of how Facebook protects its users' pictures without having to require authentication for each image request (which doesn't scale well at all).
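In PHP, that scheme could look roughly like this (the salt value is, of course, only an example):

// Append a salted hash of the base name to the file name, before the extension.
$secretSalt = 'keep-this-out-of-source-control';       // example value only

function protectedName(string $path, string $salt): string
{
    $dir  = pathinfo($path, PATHINFO_DIRNAME);
    $name = pathinfo($path, PATHINFO_FILENAME);
    $ext  = pathinfo($path, PATHINFO_EXTENSION);

    return $dir . '/' . $name . '_' . sha1($name . $salt) . ($ext !== '' ? '.' . $ext : '');
}

echo protectedName('/foo/bar/baz.jpg', $secretSalt);
// e.g. /foo/bar/baz_<40-character-hash>.jpg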
I have no idea how the big websites save pictures on their servers. Could anyone tell me how they save the pictures that are uploaded by users, and how that relates to their database?
I was thinking maybe they just save the file (the picture) at some path and only store that path in the database. Is that right?
But I want to do it this way: is this right? For example, take a website named www.photos.com. When a user uploads a picture I would create a folder with the user's name and save their pictures in that folder.
I believe we can create a directory using PHP's filesystem functions, so when a new user uploads his picture or file, I want to create a directory with his name.
Example: if the user name is john, I would create a directory like www.photos.com/john/ and then save all his pictures to this directory whenever he uploads a picture. Is this the right way to do it?
I have no one here with good knowledge of saving files to servers, so please let me know how to do this. I want to do it the correct and secure way.
Big websites don't save pictures in the database; they store them on disk.
They save a reference to the picture's location in a table, and then link to it from there.
Why? Performance.
Pulling heavy content from a database is a huge performance bottleneck, and databases don't scale horizontally that well, so it would become an even bigger problem. All big sites use static content farms to deal with static content such as images: servers that couldn't care less about your identity.
How do they keep the pictures really private, you might ask? They don't.
The picture's link is, in itself, the address and the password. Let's take Facebook, for example. If I store a private picture on my account you should not be able to open it. But, as long as you have the correct address you can.
This picture is private. Notice the filename
10400121_87110566301_7482172_n.jpg
(facebook changes the url from time to time so the link may be broken)
It's non-sequential. The only way to get the picture is to know its address.
Based on a previous user photo you can't guess the next one.
It has huge entropy, so even if you start taking random wild guesses you'll rack up an extensive number of failures, and if you do land on a picture you won't be able to work out the owner's identity from it, which is, in itself, protection through anonymity.
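Generating a name with that kind of entropy in PHP is a one-liner (a sketch; the extension should be decided server-side, not taken from the client):

// 32 random hex characters: non-sequential and practically unguessable.
$name = bin2hex(random_bytes(16)) . '.jpg';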
Edit (why you should not store images in a "username" folder):
After your edit it became clear that you do intend to put files on disk and not in the database. This edit covers that scenario.
Even though your logic (create a folder per user) seems more organized, it creates problems once you have many users and many pictures. Imagine that your server has 1TB of disk space, and let's also assume that 1TB roughly matches the load the server can handle.
Now you have 11 users; assume they start uploading at the same time and each will upload more than 100GB of files. When they reach 91GB each, the server is full and you must start storing images on a different server. If the user/folder structure is followed, you would have to pick one of the users and migrate all of their data to a different server. It also imposes a hard limit: a single user can never upload more than 1TB of files.
Should I store all files in the same folder, then?
No. Big sites generally store files in sequential folders (/000001/, /000002/, etc.) with a defined maximum number of files per folder. This is mainly for filesystem performance reasons.
More on how many files in a directory is too many?
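A sketch of how such a bucket folder can be derived from a numeric file id (the cap of 1000 files per folder is an arbitrary example):

const FILES_PER_FOLDER = 1000;   // example cap

function folderForId(int $fileId): string
{
    // Files 0-999 go to /000001/, 1000-1999 to /000002/, and so on.
    $bucket = intdiv($fileId, FILES_PER_FOLDER) + 1;
    return sprintf('/%06d/', $bucket);
}

echo folderForId(42);       // /000001/
echo folderForId(123456);   // /000124/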
It is usually a bad idea to store images in your database (if your site is popular). The database is, traditionally, one of the main bottlenecks in almost any application out there. No need to load it more than necessary. If images are in the filesystem, many HTTP servers (nginx, for example) will serve them very efficiently.
The biggest social network in Russia, VKontakte, does exactly this: it stores images in the filesystem.
Another big social network implemented a sophisticated scalable blob storage. But it's not available to the public, AFAIK.
Summary of this answer: don't store blobs in the database.
is this the right way to do it
Yes.
The only thing I'd suggest is to use an id rather than a name.
www.photos.com/albums/1234/ would be okay for a start.
Image management may best be achieved by physically uploading images to the server and then recording the file location and image details in a database. A search form could then be configured to let the user run a text search, part-number search, or other queries, and a PHP script could produce a valid HTML image tag based on data found in the table.
Uploading images into a MySQL BLOB field is a bad idea; such image data is generally problematic if the images are much larger than thumbnails. If the images are large, you can end up having to copy/paste one SQL INSERT statement at a time (into phpMyAdmin), and if a large image's INSERT statement is broken into two lines by your text editor, you'll never be able to restore the image.
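A sketch of the "record the location, let PHP emit the tag" idea (table and column names are made up):

// Look the stored file path up in a table and emit an <img> tag from it.
$pdo  = new PDO('mysql:host=localhost;dbname=catalog', 'user', 'pass');
$stmt = $pdo->prepare('SELECT file_path, title FROM images WHERE part_number = ?');
$stmt->execute([$_GET['part'] ?? '']);

if ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    printf(
        '<img src="%s" alt="%s">',
        htmlspecialchars($row['file_path'], ENT_QUOTES),
        htmlspecialchars($row['title'], ENT_QUOTES)
    );
}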
What are some ideas out there for storing images on web servers? I'm interacting with PHP and MySQL for the application.
Question 1
Do we change the name of the physical file to a000000001.jpg and store it in a base directory, or keep the user's unmanaged file name, e.g. 'Justin Beiber Found dead.jpg'? For example:
wwroot/imgdir/a0000001.jpg
and all metadata in a database, such as FileName, ReadableName, Size, Location, etc.
I need to build a custom file manager and am just weighing some pros and cons of the underlying structure for storing the images.
Question 2
How would I prevent an image from being downloaded if my app/database has not marked it as published/public?
In my app I can publish images or keep them from being downloaded. If I stored the image in a DB table I could store it as a BLOB and use PHP to prevent the user from downloading it. I want to be able to do the same if the image is stored in the filesystem, but I'm not sure whether that is possible with PHP and plain files on disk.
Keeping relevant file names can be good for SEO, but you must also make sure you don't create duplicates.
In all cases I would rename files to lowercase and replace spaces with underscores (or hyphens):
Justin Beiber Found dead.jpg => justin_beiber_found_dead.jpg
If the photo belongs to an article or something specific, you can perhaps prefix the image with the article ID, i.e. 123_justin_beiber_found_dead.jpg. Alternatively you can store the images in an article-specific folder, i.e. /images/123/justin_beiber_found_dead.jpg.
Naming the files like a0000001 removes all relevance to the files and adds no value whatsoever.
Store (full) filepaths only in the database.
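As a sketch, the renaming could look like this:

// Lowercase, non-alphanumerics to underscores, optional article id prefix.
function slugFilename(string $original, ?int $articleId = null): string
{
    $ext  = strtolower(pathinfo($original, PATHINFO_EXTENSION));
    $name = strtolower(pathinfo($original, PATHINFO_FILENAME));
    $name = trim(preg_replace('/[^a-z0-9]+/', '_', $name), '_');

    return ($articleId !== null ? $articleId . '_' : '') . $name . '.' . $ext;
}

echo slugFilename('Justin Beiber Found dead.jpg', 123);
// 123_justin_beiber_found_dead.jpg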
For part 2:
I'm not sure what the best solution is here, but using the filesystem I think you will have to configure Apache to route all requests for files in a particular directory through PHP. In PHP you can then check whether the file may be published and, if so, output it; if not, you can serve a dummy image. This, however, is not very efficient and will be much heavier on Apache.
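A sketch of that gatekeeper script (the table, columns, and paths are made up):

// image.php (sketch): serve the file only if it is published, otherwise a dummy.
$pdo  = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare('SELECT file_path, is_published FROM images WHERE id = ?');
$stmt->execute([(int) ($_GET['id'] ?? 0)]);
$row  = $stmt->fetch(PDO::FETCH_ASSOC);

$path = ($row && $row['is_published'])
    ? $row['file_path']                        // stored outside the public web root
    : __DIR__ . '/assets/placeholder.png';     // dummy image for unpublished files

header('Content-Type: ' . mime_content_type($path));
readfile($path);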
I have a topic/question concerning the upload filename standards, if any, that you are using. Imagine you have an application that allows many types of documents to be uploaded to your server and placed into a directory; perhaps the same document could even be uploaded twice. Usually you have to make some kind of unique filename adjustment when saving the document. Assume it is saved in a directory, not directly into a database, though the metadata would of course still go into the database. The typical PHP upload methods could be used; simple enough to do.
Possible Filenaming Standard:
1.) Append the document filename with a unique id: image.png changed to image_20110924_ahd74vdjd3.png
2.) Perhaps use a UUID/GUID and store the actual file type (meta) in a database: 2dea72e0-a341-11e0-bdc3-721d3cd780fb
3.) Perhaps a combination: image_2dea72e0-a341-11e0-bdc3-721d3cd780fb.png
Can you recommend a good standard approach?
Thanks, Jeff
I always just hash the file using md5() or sha1() and use that as a filename.
E.g.
3059e384f1edbacc3a66e35d8a4b88e5.ext
And I would save the original filename in the database in case I ever need it.
This makes the filename unique AND ensures you don't end up with the same file stored multiple times on your server (since identical files have the same hash).
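In code that boils down to something like this (a sketch; the upload field, paths, and table layout are only examples):

$pdo    = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$userId = 42;                                     // the uploading user (example)

$tmp      = $_FILES['upload']['tmp_name'];
$original = basename($_FILES['upload']['name']);
$ext      = strtolower(pathinfo($original, PATHINFO_EXTENSION));
$hash     = md5_file($tmp);                       // or sha1_file($tmp)
$target   = '/var/uploads/' . $hash . '.' . $ext;

if (!is_file($target)) {                          // identical content already stored?
    move_uploaded_file($tmp, $target);
}

// Keep the original name (and owner) in the database in case it is ever needed.
$stmt = $pdo->prepare('INSERT INTO uploads (user_id, original_name, hash) VALUES (?, ?, ?)');
$stmt->execute([$userId, $original, $hash]);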
EDIT
As you can see I had some discussion with zerkms about my solution and he raised some valid points.
I would always serve the files through PHP instead of letting users download them directly.
This has some advantages:
I would add a record to the database whenever a user uploads a file, containing the user who uploaded it, the original filename, and the hash of the file.
If a user wants to delete a file, you just delete that user's record for the file.
If no user has the file any more after the delete, you can delete the file itself (or keep it anyway); there is a sketch of this at the end of the answer.
You should not keep the files anywhere in the document root, but rather somewhere that isn't accessible to the public, and serve them to the user through PHP.
A disadvantage, as zerkms has pointed out, is that serving files through PHP consumes more resources, although I find the advantages to be worth it.
Another thing zerkms pointed out is that the extension isn't really needed when saving the file under its hash (since it is already in the database), but I always like to be able to see what kinds of files are in a directory by simply doing an ls -la, for example. Again, though, it isn't strictly necessary.
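A sketch of the delete flow mentioned above, using the same illustrative uploads table as the earlier sketch:

function deleteUpload(PDO $pdo, int $userId, string $hash, string $ext): void
{
    // Remove this user's reference to the file.
    $stmt = $pdo->prepare('DELETE FROM uploads WHERE user_id = ? AND hash = ?');
    $stmt->execute([$userId, $hash]);

    // If nobody references the hash any more, the file on disk can go as well.
    $stmt = $pdo->prepare('SELECT COUNT(*) FROM uploads WHERE hash = ?');
    $stmt->execute([$hash]);
    if ((int) $stmt->fetchColumn() === 0) {
        unlink('/var/uploads/' . $hash . '.' . $ext);   // or keep it, as noted above
    }
}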