How are file uploads to a clustered environment typically handled using PHP?
I am currently working on a backend system for a cloud storage/sharing website.
I would like to make sure that a client's upload is stored on the node with the least load and the most available resources, and that a unique file reference is created and stored in the database. Multiple uploads of the same file should also be detected to save space.
Are there any best practices, design patterns, or documentation covering this topic that might help illustrate what is needed to get this done?
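As a rough illustration of the "least load, most available resources" requirement, here is a minimal sketch in Python (the same idea carries over to PHP). The `Node` fields and the `pick_node` helper are hypothetical names, and it assumes each node already reports its load and free disk space somehow:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    load: float       # assumed: e.g. a load average reported by the node
    free_bytes: int   # assumed: free disk space reported by the node


def pick_node(nodes, upload_size):
    """Return the least-loaded node that can hold the upload, or None."""
    candidates = [n for n in nodes if n.free_bytes >= upload_size]
    if not candidates:
        return None
    # Lowest load wins; ties go to the node with more free space.
    return min(candidates, key=lambda n: (n.load, -n.free_bytes))
```

How the load figures are collected (heartbeats, a monitoring service, etc.) is left open here; that part depends entirely on the cluster.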
I am working on a web application where many users can upload files. I can store the hash of each stored file, check whether an uploaded file is already on the server, and if so store only some metadata (filename, owner, ...) and redirect the user to the existing file when they need to retrieve it. I can delete the file once all users have soft-deleted their copies.
My question is: should I do that in an enterprise application (cloud drives, ...), or does the additional programming effort outweigh its benefits?
What is the general approach regarding this issue? What about mail servers, social networks and sites similar to SE?
Only implement the hashing strategy if you can foresee a significant amount of duplicates being uploaded. Otherwise it is not worth adding the unnecessary complexity that you will have to troubleshoot and maintain.
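If you do decide it is worth it, the scheme from the question (hash the content, keep one copy, track who references it, delete when the last reference goes) can be sketched like this. It is a minimal in-memory sketch in Python; `DedupStore` and its methods are hypothetical names, the dictionaries stand in for the disk and the database, and SHA-256 is just one reasonable choice of hash:

```python
import hashlib


class DedupStore:
    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> file bytes (stands in for disk)
        self.refs = {}   # sha256 hex digest -> set of (owner, filename)

    def upload(self, owner, filename, data):
        """Store the bytes once per distinct content; always record the reference."""
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blobs:
            self.blobs[digest] = data
        self.refs.setdefault(digest, set()).add((owner, filename))
        return digest

    def delete(self, owner, filename, digest):
        """Drop one reference; remove the bytes when nobody references them."""
        owners = self.refs.get(digest, set())
        owners.discard((owner, filename))
        if not owners:
            self.blobs.pop(digest, None)
            self.refs.pop(digest, None)
```

The reference bookkeeping is exactly the extra complexity mentioned above: it has to stay consistent with the stored files, which is what you end up troubleshooting.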
I asked a question earlier about the difference between cloud apps and web apps, and the answers and links I received led me to believe that 'cloud' is more about the location of an application than about specific applications. That prompts these questions:
1) Suppose I'm developing an application that will be based in the cloud, using PHP and MySQL. Traditional server setups require me to have the PHP and MySQL engines on the server; otherwise the application won't run. Is it the same with the cloud? Do I have to look for clouds with these engines, install them myself, or are they not needed at all?
2) When building applications, files are usually referenced relatively or absolutely, based on their location to the calling file. With the cloud, since you don't know the location of the files, how can you reference the required files? Do you have to use URLs for that?
I've pored over many of the cloud questions on here, and it seems that there are a lot of confused souls out there just like myself, and most of the answers don't seem too convincing. Hence, my reason for asking again.
Thanks.
The cloud doesn't mean you don't know the location of your files; it only means that the files are (possibly) not stored on the end user's computer. From your perspective as the developer of the web application, you still will (indeed must) know the locations of any stored files, since it is your application storing them.
To give your end user a reference URL to a file, you can do many different things. One method, for example, involves storing some kind of unique identifier along with the file path to a stored file on your server together in a database. You give the user a URL that references the unique identifier, and in your code you then retrieve the file from disk and stream it down to the user using the correct headers.
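A minimal sketch of that first method, written in Python with SQLite purely for illustration (the same structure applies in PHP with MySQL). The table layout and the function names `init_file_table`, `register_file` and `build_download` are assumptions, not anything prescribed:

```python
import mimetypes
import sqlite3
import uuid


def init_file_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "id TEXT PRIMARY KEY, path TEXT NOT NULL, name TEXT NOT NULL)"
    )


def register_file(conn, path, name):
    """Record where a file lives and return the identifier to put in the URL."""
    file_id = uuid.uuid4().hex
    conn.execute(
        "INSERT INTO files (id, path, name) VALUES (?, ?, ?)",
        (file_id, path, name),
    )
    return file_id


def build_download(conn, file_id):
    """Look up the identifier and return (status, headers, body)."""
    row = conn.execute(
        "SELECT path, name FROM files WHERE id = ?", (file_id,)
    ).fetchone()
    if row is None:
        return 404, {}, b""
    path, name = row
    content_type = mimetypes.guess_type(name)[0] or "application/octet-stream"
    # Simplification: reads the whole file; a real handler would stream in chunks.
    with open(path, "rb") as fh:
        body = fh.read()
    headers = {
        "Content-Type": content_type,
        "Content-Length": str(len(body)),
        # Simplification: assumes the name contains no quote characters.
        "Content-Disposition": f'attachment; filename="{name}"',
    }
    return 200, headers, body
```

The user only ever sees the identifier, so the files can be moved on disk without breaking any URLs you have handed out.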
Another method is to store files in the database as binary BLOBs, and retrieve the data and send it down to the browser with the correct headers. Again, you as the application developer are still responsible for the fate of those files, even though the end user doesn't need to worry about where or how they're stored.
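For comparison, the BLOB variant might look roughly like this. Again a Python/SQLite sketch under assumptions: the `blobs` table and the helper names are made up for illustration.

```python
import sqlite3


def init_blob_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blobs ("
        "id INTEGER PRIMARY KEY, mime TEXT NOT NULL, data BLOB NOT NULL)"
    )


def save_blob(conn, mime, data):
    """Store the raw bytes in the database and return the new row id."""
    cur = conn.execute(
        "INSERT INTO blobs (mime, data) VALUES (?, ?)", (mime, data)
    )
    return cur.lastrowid


def load_blob(conn, blob_id):
    """Return (headers, body) for the stored file, or None if it is missing."""
    row = conn.execute(
        "SELECT mime, data FROM blobs WHERE id = ?", (blob_id,)
    ).fetchone()
    if row is None:
        return None
    mime, data = row
    headers = {"Content-Type": mime, "Content-Length": str(len(data))}
    return headers, bytes(data)
```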
Is it better to read and list images directly from the file system using simple PHP, or to store the image meta info and filename in the database and access the images with a MySQL select? What are the pros and cons of both solutions?
Listing files on a file system is probably the easiest way to accomplish what you are trying to do, but it's going to be very slow if you are cycling through several thousand directories/files on a networked file system (NFS, CIFS, GlusterFS, etc.).
Storing files in a database will create much more overhead, since you are now involving an external application to store the information. You have to remember that every time you use a database you are also using network I/O, an authentication mechanism, a query parser, etc. At the same time, even with all of this overhead, it might still respond faster than a networked file system.
To conclude: everything depends on the number of files you are working with and the underlying infrastructure. The two major things to look out for are disk I/O and network I/O.
I would do the following:
Upload all the images in one directory
Store references to those images that are tied to the uploader's User ID
Then just select the image URLs that are tied to that ID, and output them however necessary.
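In code, those three steps come down to one table and one query. This is a small Python/SQLite sketch of the idea; the `images` table and the function names are hypothetical:

```python
import sqlite3


def setup_images(conn):
    conn.execute(
        "CREATE TABLE images ("
        "id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL, url TEXT NOT NULL)"
    )
    # Index the lookup column so selecting one user's images stays cheap.
    conn.execute("CREATE INDEX idx_images_user ON images (user_id)")


def add_image(conn, user_id, url):
    conn.execute("INSERT INTO images (user_id, url) VALUES (?, ?)", (user_id, url))


def image_urls_for(conn, user_id):
    """Return the uploader's image URLs in upload order."""
    rows = conn.execute(
        "SELECT url FROM images WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [row[0] for row in rows]
```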
People find it easier to store their files within folders and parse that folder with PHP. If you go the database route, the database eventually gets larger and larger.
I can see it coming down to personal preference, but I have gone with parsing folders for images rather than storing them in a database.
Depends on the scale of what you are doing.
This is what I would be doing.
Store the file metadata in the database. You can store quite a bit of information about this image this way.
Store the image file on a distributed storage system like Amazon S3. Store the path in your metadata. Replication is part of the system. And it easily integrates with Cloudfront CDN.
Distribute the images through the Amazon CloudFront CDN.
Possible Duplicate:
Storing Images in DB - Yea or Nay?
I need to store profile pictures. Lots of them.
I'm not sure if I should store them in the database. I'm not even sure if that's a good idea to begin with, or if I should just store them in a separate directory on the server, and disallow access to them with HTAccess.
But I'm not overly familiar with HTAccess and when I have used snippets to disallow access to a folder, it has just never worked.
I am using winhost.com to host my sites, so I would assume that HTAccess would work.
Can anyone suggest which way would be better for storing tens of thousands of profile pictures on a single server? I have read many blogs, forum posts etc that I've found on Google, and am a little bit more confused since half of them suggest one thing, and the other half disagree and suggest using a database would be perfectly fine.
Personal experience says that storing lots of images in a database makes the database very slow to back up. That can be irritating when you want to run repeatable tests, or when you update the DB schema and want to take an ad-hoc backup, as well as in the general case. Also, depending on the database, storing blobs (which inevitably means that you're storing rows of non-fixed length) can make querying the table quite slow, although that can easily be fixed with appropriate indexing.
If you store them in the filesystem and serve them directly with your webserver as you suggest, one problem you will find is how to appropriately access-control them if you want only logged-in users to see them. That will depend on the design of your application and may not be a problem.
Two other options:
you can store them in the filesystem and serve them with an application page, so that it can e.g. check access control before fetching the image and sending it to the client.
you can use X-SendFile: headers if your webserver supports them to serve a file on the filesystem - the application page tells the webserver the file to fetch, and the webserver will fetch the file and send it. Potentially the application and the image files can live on different machines if you use e.g. FastCGI, and the image is never sent over the FastCGI connection.
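Both options can be sketched with one small handler. This is a framework-free Python illustration; `serve_image`, the `image` dictionary layout and the `allowed_users` field are hypothetical. The `X-Sendfile` header only has an effect if the web server in front of the application is set up to honour it (nginx uses a different header, `X-Accel-Redirect`, for the same purpose).

```python
def serve_image(user, image, use_xsendfile=True):
    """Check access first, then either delegate to the web server or send bytes."""
    if user is None or user not in image["allowed_users"]:
        return 403, {}, b""
    headers = {"Content-Type": image["mime"]}
    if use_xsendfile:
        # The application sends no body; the web server reads the file itself.
        headers["X-Sendfile"] = image["path"]
        return 200, headers, b""
    # Fallback: the application page reads the file and sends it.
    with open(image["path"], "rb") as fh:
        body = fh.read()
    headers["Content-Length"] = str(len(body))
    return 200, headers, body
```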
You may also want to consider caching: if you write any programmatic way to send the file, you'll need to add additional logic so that the image can be cached by the browser, or you'll just end up serving the image over and over again and increasing your bandwidth costs.
There is a trade-off; it will depend on your exact situation and needs. The benefits of each include:
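One common form of that extra logic is an `ETag` plus a `304 Not Modified` reply. Here is a minimal Python sketch, assuming the request headers arrive as a plain dictionary; `conditional_response` is a hypothetical helper, and it deliberately ignores weak validators and comma-separated `If-None-Match` lists:

```python
import hashlib


def conditional_response(body, request_headers, max_age=86400):
    """Return (status, headers, body), answering 304 when the browser copy is current."""
    etag = '"' + hashlib.sha256(body).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": f"private, max-age={max_age}"}
    if request_headers.get("If-None-Match") == etag:
        # The browser already has this exact content: send headers only.
        return 304, headers, b""
    return 200, headers, body
```

Hashing the body on every request is itself a cost, so in practice you would probably store the hash alongside the file's metadata.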
Filesystem
Performance, especially caching and I/O
Database
Easier to scale out to multiple web servers
Easier to administer (backup, security etc)
I'm guessing that you are using MySQL, but on the off chance that you have a SQL Server 2008 database, have a look at FileStream in this SO article - this gives the best of both worlds.
I'd definitely root for storing only the image path in the database. Storing the image data will slow your site down and put extra strain on your system.
The only case where I could imagine an advantage in storing the image data inside the database is if you're planning on moving the site around. Then you wouldn't have to worry about file paths etc.
I have a general question about this.
When you have a gallery, people sometimes need to upload thousands of images at once, most likely as a .zip file. What is the best way to upload this sort of thing to a server? Servers often have timeouts etc. that need to be accounted for. I am wondering what kinds of things I should be looking out for, and what the best way is to handle a large number of images being uploaded.
I'm guessing that you would allow a user to upload a zip file (assuming the timeout does not affect you), and that this zip file is uploaded to a specific directory; let's assume a directory is created for each user in the system. You would then unzip the archive on the server, scan the user's folder for any directories containing .jpg, .png or .gif files (etc.), and import them into a table accordingly, presumably labeled by folder name.
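The unzip-and-scan step might look roughly like the following. This is a Python sketch rather than Rails code; `list_images_by_folder`, the extension list and the size limit are all assumptions. It only reads the archive's listing and groups image names by folder, skipping entries with suspicious paths and refusing archives that would expand beyond a limit:

```python
import io
import os
import zipfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def list_images_by_folder(zip_bytes, max_total_bytes=500 * 1024 * 1024):
    """Map each folder in the archive to the image file names it contains."""
    albums = {}
    total = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename
            # Skip absolute paths and ".." segments instead of trusting them.
            if name.startswith("/") or ".." in name.split("/"):
                continue
            if os.path.splitext(name)[1].lower() not in IMAGE_EXTENSIONS:
                continue
            total += info.file_size  # uncompressed size claimed by the archive
            if total > max_total_bytes:
                raise ValueError("archive expands beyond the allowed size")
            folder = os.path.dirname(name) or "(root)"
            albums.setdefault(folder, []).append(os.path.basename(name))
    return albums
```

The folder names can then be used as the labels when importing into the table.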
What kind of server side troubles could I run into?
I'm aware that there may be many issues. Even general ideas would be good so I can then research further. Thanks!
Also, I would be programming in Ruby on Rails, but I think this question applies across any language.
There's no reason why you couldn't handle this kind of thing with a web application. There's a couple of excellent components that would be useful for this:
Uploadify (based on jquery/flash)
plupload (from moxiecode, the tinymce people)
The reason they're useful is that Uploadify uses a Flash component to handle uploads, so you can select groups of files from the file browser window (assuming no one is going to individually select thousands of images!), while plupload also supports drag and drop and works on more platforms.
Once you've got your interface working, the server side stuff just needs to be able to handle individual uploads, associating them with some kind of user account, and from there it should be pretty straightforward.
With regards to server side issues, that's really a big question, depending on how many people will be using the application at the same time, size of images, any processing that takes place after. Remember, the files are kept in a temporary location while the script is processing them, and either deleted upon completion or copied to a final storage location by your script, so space/memory overheads/timeouts could be an issue.
If the images are massive in size, say raw or TIFF, then this kind of thing could still work with chunked uploads, but implementing some kind of FTP upload might be easier. It's a bit of a vague question, but there should be plenty here to get you going ;)
With that many images it has to be a serious app, which gives you the liberty to suggest a piece of software running on the client (something like what Yahoo Mail or Picasa does) that takes care of managing the upload of images (network interruptions, resume support, etc.).
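For what it's worth, the server side of a chunked upload is mostly bookkeeping. Here is a minimal in-memory sketch in Python; `ChunkedUpload` and `split_into_chunks` are hypothetical names, and a real server would write chunks to temporary files rather than keep them in memory:

```python
class ChunkedUpload:
    def __init__(self, total_chunks):
        self.total_chunks = total_chunks
        self.chunks = {}  # chunk index -> bytes

    def add_chunk(self, index, data):
        if not 0 <= index < self.total_chunks:
            raise ValueError("chunk index out of range")
        self.chunks[index] = data  # re-sending a chunk simply overwrites it

    def missing(self):
        """Indexes the client still has to send (useful for resuming)."""
        return [i for i in range(self.total_chunks) if i not in self.chunks]

    def assemble(self):
        if self.missing():
            raise ValueError("upload incomplete")
        return b"".join(self.chunks[i] for i in range(self.total_chunks))


def split_into_chunks(data, chunk_size):
    """What the client side does: cut the file into fixed-size pieces."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
```

Because each request carries only one small chunk, no single request runs long enough to hit the server's timeout, and chunks may arrive in any order.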
For the server side, you could process these one at a time (assuming your client is sending them that way)..thus keeping it simple.
Take a peek at http://gallery.menalto.com
They have a dozen methods for uploading pictures into galleries.
You can choose the ones that suit you.
Either have a client app, or some Ajax code that sends the images one by one, preventing timeouts. Alternatively, if this is not available to the public, FTP still works...
I'd suggest a client application (maybe written in AIR or Titanium) or telling your users what FTP is.
deviantArt.com for example offers FTP as an upload method for paying subscribers and it works really well.
Flickr instead has its own app for this, the "Flickr Uploadr".