I am developing a cloud file storage service that can be accessed by specific users.
What is the best way to store folders and files?
What I mean is: should the folder structure live in the database, or should real folders be created on the server?
I'd like to weigh the options for both speed and security.
Each method has its own advantages and should be chosen based on your use case.
1) Storing in the database
2) Storing in a real folder on the server
or, maybe,
3) Storing in cloud object storage like AWS S3
Some of the pros/cons revolve around:
Size of each individual file
How many files you are storing
How many concurrent users will be downloading each file
How you plan to do backups, etc.
Do have a look at these articles:
https://habiletechnologies.com/blog/better-saving-files-database-file-system/
https://softwareengineering.stackexchange.com/a/150787
Related
I have made a database with tables like projects and employees.
I have some photos and PDF files (for instance, scans of certificates) related to the entries in my DB that should be accessible via our internal web site.
Does anyone have any suggestions on how I should manage this?
I was thinking of setting up a subdomain "files.ourdomain.com", creating subdirectories there for each table, and making a directory for each record. Should I create a DB field like "employee.certificates" with the full path/filename of the certificate photo?
Or should I actually store the files in the database? (MySQL InnoDB)
In a comment, Peter said that it's easier and better to store the files in a database for security reasons, and there is a case for this: only DBAs and the webservers would have access to the files, and they could be encrypted. I don't agree with the "easier" part, though, as you would need to encode the image before placing it into the DB and decode it again when you want to display it.
You have a choice:
you store the files actually in the database itself, or
you store each file in a folder (outside of the web tree) and only store the filename in the database.
Each method has pros and cons:
If you store the files actually within the database, you can take the backup file, put it onto another server, and everything will be present. However, if you're using replication or clustering, then each server will have a copy of every file, so storage requirements increase. Obviously, you will also need more space for backups, and backing up and restoring will take proportionally longer.
If you store the file in a central location and only record the location, your DB storage requirements are lower, and multiple DB servers can be confident that there is only ever one copy of a file. Again, the files can be backed up separately. The downside is "what happens if your file storage server fails?". However, with mirroring and backups, this can be mitigated.
In both cases, you would need to store a filename for each file so that the webservers can use them.
Have a look at this Stack Overflow question which does a much better job of the pros and cons than I ever could.
Speaking personally, I store a filename which links to an external file.
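As a rough sketch of that approach in PHP (the table name, columns, and storage path are all made up for illustration), a minimal upload/download pair might look like this:

<?php
// Sketch: file on disk outside the web tree, filename in the DB.
$pdo = new PDO('mysql:host=localhost;dbname=intranet', 'user', 'pass');
$storageDir = '/var/files';   // not under the document root

// Upload: give the file an opaque name and record it.
$name = bin2hex(random_bytes(16));
move_uploaded_file($_FILES['certificate']['tmp_name'], "$storageDir/$name");
$stmt = $pdo->prepare(
    'INSERT INTO employee_files (employee_id, filename, original_name)
     VALUES (?, ?, ?)');
$stmt->execute([$employeeId, $name, $_FILES['certificate']['name']]);
// $employeeId: assumed to come from the app

// Download: after checking the user is allowed to see the file,
// look the name up and stream it through PHP.
$stmt = $pdo->prepare(
    'SELECT filename, original_name FROM employee_files WHERE id = ?');
$stmt->execute([$_GET['id']]);
$file = $stmt->fetch(PDO::FETCH_ASSOC);
header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="' . $file['original_name'] . '"');
readfile("$storageDir/" . $file['filename']);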
I'm writing an app that lets you share files in the cloud.
You create a user account, upload your files, and send links to friends.
I'm using Amazon S3 to store the data,
but I'm not sure how I should proceed.
There are buckets, which you can create in S3, and in those buckets you save your files.
I thought about making a bucket for each user, but then I read that you can only have 100 buckets at a time.
Isn't there a better way to manage this than to just save all user files in one "directory"?
That would get messy. I have never used S3 before, so I would be very thankful for any advice.
And if this is the only way, what naming convention has proved to be the best?
Thanks!
Even though S3 has a flat structure within a bucket, each object has its own path much like the directories you're used to.
So you can structure your paths like so:
/<user-id>/<album-1>/...
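To make that concrete, here's a minimal sketch using the AWS SDK for PHP (the bucket name, region, user id, and file paths are placeholders):

<?php
require 'vendor/autoload.php';   // aws/aws-sdk-php

use Aws\S3\S3Client;

$s3 = new S3Client(['version' => 'latest', 'region' => 'us-east-1']);

// One shared bucket; the user id becomes the leading path segment.
$s3->putObject([
    'Bucket' => 'my-app-user-files',
    'Key'    => 'user-123/album-1/photo.jpg',
    'Body'   => fopen('/tmp/photo.jpg', 'rb'),
]);

// List everything "inside" one user's folder by prefix.
$result = $s3->listObjectsV2([
    'Bucket' => 'my-app-user-files',
    'Prefix' => 'user-123/',
]);
foreach ($result['Contents'] ?? [] as $object) {
    echo $object['Key'], "\n";
}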
One thing to keep in mind is that not all directory-related features are available, such as:
denying access to /<user-123>/*
copying from one directory to another
Is it better to read and list images directly from the file system using simple PHP, or is it better to store image meta info and the filename in the database and access the images by doing a MySQL SELECT? What are the pros and cons of both solutions?
Listing files on a file system is probably the easiest way to accomplish what you're trying to do, but it's going to be very slow if you are cycling through several thousand directories/files on a networked file system (NFS, CIFS, GlusterFS, etc.).
Storing file information in a database creates much more overhead, since you are now involving an external application to store the information. Remember that every time you use a database you are also paying for network I/O, the authentication mechanism, the query parser, etc. At the same time, all of this overhead might still give a faster response than using a networked file system.
To conclude: everything depends on the number of files you are working with and the underlying infrastructure. The two major things to watch are disk I/O and network I/O.
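For a sense of what the file-system route looks like, listing images with plain PHP is only a few lines (the directory path is made up), but every request re-reads the directory:

<?php
// List images straight off the file system; trivial to write, but each
// call re-walks the directory, which gets slow over NFS/CIFS.
foreach (glob('/var/www/images/*.jpg') as $path) {
    echo basename($path), "\n";
}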
I would do the following (a sketch follows the list):
Upload all the images to one directory.
Store references to those images, tied to the uploader's user ID.
Then just select the image URLs tied to that ID, and output them however necessary.
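A minimal sketch of that flow, assuming a hypothetical images table with user_id and url columns:

<?php
// Sketch: select the image URLs tied to one uploader and emit <img> tags.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$stmt = $pdo->prepare('SELECT url FROM images WHERE user_id = ?');
$stmt->execute([$userId]);   // $userId: assumed to come from the session

foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $url) {
    echo '<img src="' . htmlspecialchars($url) . '" alt="">';
}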
People find it easier to store their files within folders and parse those folders with PHP; if you go the database route, the database just keeps growing and growing.
I can see it becoming personal preference, but I have personally gone with parsing folders for images rather than storing them in a database.
It depends on the scale of what you are doing. This is what I would do (see the sketch below):
Store the file metadata in the database. You can record quite a bit of information about each image this way.
Store the image file on a distributed storage system like Amazon S3, and store its path in your metadata. Replication is part of the system, and it integrates easily with the CloudFront CDN.
Distribute the images through the Amazon CloudFront CDN.
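Roughly, and assuming a hypothetical image_meta table plus a placeholder CloudFront domain, the display side could look like this:

<?php
// Sketch: metadata in MySQL, bytes in S3, delivery via CloudFront.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Look up the S3 key recorded at upload time...
$stmt = $pdo->prepare('SELECT s3_key FROM image_meta WHERE id = ?');
$stmt->execute([$imageId]);   // $imageId: assumed input
$key = $stmt->fetchColumn();

// ...and build the CDN URL from it. CloudFront fetches the object from
// the bucket on the first request and caches it at the edge.
$cdnUrl = 'https://d1234example.cloudfront.net/' . $key;
echo '<img src="' . htmlspecialchars($cdnUrl) . '" alt="">';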
I accept file uploads from users. Each file has a pointer in the DB with info on the file's location in the filesystem.
Currently, I store the files in the filesystem without any categorisation, and each file is just named with a unique value. All categorisation, naming, etc. is done in the app using the DB.
A factor I'm concerned about is file synchronization.
If I wanted to set up file synchronization where, for example, the user's files are automatically updated by bridging with a PC app, would this system still work well?
I have no idea how such a system would work, so hopefully I can get some input.
Basically, is representing a file's name and location purely in the database optimal, especially if said file may be synchronized with a PC application?
Yes, the way you are doing this is the best way to do it: you are using the file system to store files and a database to store structured data.
One suggestion I would make is that you create a directory tree on the file system, because you may one day run up against a maximum-files-per-directory limit of your file system. I have built systems that create a new subdirectory for each day or week.
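A sketch of the per-day variant in PHP (the base path is a placeholder):

<?php
// Fan uploads out into one subdirectory per day so no single
// directory ever accumulates an enormous number of files.
$baseDir = '/var/files';
$dayDir  = $baseDir . '/' . date('Y/m/d');   // e.g. /var/files/2024/05/17

if (!is_dir($dayDir)) {
    mkdir($dayDir, 0750, true);              // create the whole tree at once
}
$target = $dayDir . '/' . bin2hex(random_bytes(16));
move_uploaded_file($_FILES['upload']['tmp_name'], $target);

// Store $target (or its path relative to $baseDir) in the database.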
Make sure you have good backups of the database as well as the document repository.
All you need to make such a system work is to make sure the API you use (or, more likely, create) can talk to the database and to the filesystem in a sensible way. Since this is what your site is already doing anyway, it shouldn't be hard to implement.
The mere fact that your files are given identifiers instead of plain-English names is mostly irrelevant with regard to remote synchronization.
Store a file hash (e.g. SHA-1) in the database rather than a path, and have a separate database connect the hash with the path. Write a small app that synchronizes the hash database, so that when you move your files to a different location it's easy to build a new database with updated paths.
That way the system can also load the file from a different location depending on which hash database you use to locate the file, which offers some transparency if you need people to access the same file from diverse locations (e.g. NFS or WebDAV).
We use exactly this model for file storage, along with (shameless plug) SabreDAV to make it seem to the end-user it's a normal filesystem.
I think this is a perfectly fine model; as long as looking up the file is documented and easy, there shouldn't be an issue. Just make backups of your DB :)
One other piece of advice: we use an md5() of the file id to generate a unique filename, and we use parts of that hash to generate a directory structure. For example, id 1 will yield b026324c6904b2a9cb4b88d6d61c81d1, and the resulting filename becomes:
b02/632/4c6/904b2a9cb4b88d6d61c81d1. The reason for this is that most filesystems can become very slow after a high number of files (or directories) accumulate in one directory; it's much, much faster to traverse a few subdirectories.
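A sketch of that layout in PHP (note the exact digest depends on how the id is serialised before hashing):

<?php
// Hash the file id, then use leading chunks of the hex digest as
// nested directories so no single directory grows too large.
function storagePath(int $fileId): string
{
    $hash = md5((string) $fileId);
    // e.g. a digest of b026324c6904... becomes b02/632/4c6/904...
    return substr($hash, 0, 3) . '/'
         . substr($hash, 3, 3) . '/'
         . substr($hash, 6, 3) . '/'
         . substr($hash, 9);
}

echo storagePath(1), "\n";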
The Boring Answer™:
I think it depends on what you wanna do, as always :)
I mean, take your regular web hosting company. Developers are syncing files to web servers all the time. Would it make sense for a web server to store hash-generated file names in a DB that pointed to physical files? No. Then you couldn't log in with your FTP client and upload files like that, and you'd have to code a custom module to get Apache to work, etc. Instant headache.
Does it make sense for Flickr to use a DB? Yes, absolutely! (Then again, you can't log in with an FTP client and manage your photos, and that's probably a good thing!)
Just remember, a file system is a (very simple) db too. And it's a db that comes with a lot of useful free tools.
my 2¢
A while ago I developed a music site that allowed audio files to be uploaded and then converted into various formats using ffmpeg. People would then download the audio files after purchasing them: a tmp file was created at the download location, valid only for that download instance, and then deleted.
Now I am revisiting the project, and I have to add pictures and video as upload content as well.
I want to find the best method for storing the files:
Option 1: storing the files in a folder and referencing them in the database
Option 2: storing the actual file in the database (MySQL) as a BLOB
I am weighing the security implications of each method, along with other issues I might not have accounted for.
See this earlier Stack Overflow question: Storing images in a database, Yea or nay?.
I know you mentioned images and video; however, this question is relevant to all large binary media files.
The consensus seems to be that storing file paths to the images on the filesystem, rather than the actual images, is the way to go.
I would recommend storing them as files and keeping their locations in the database.
Storing the files in a database requires more resources and makes backing up/restoring the database slower.
Do you really want to have to transfer lots of videos every time you do a database dump?
File systems work very well for dishing out files, and you can back them up/sync them very easily.
I would go for the database option. I've used it on a number of projects, some very large (100+ GB). The storage implementation is key: design it poorly and your performance will suffer. See this example for some good implementation ideas.
Database storage allows more scalability and security.
I would go for storing files directly on disk, with the database holding only their ID/URL.
That way, accessing those files (which can be large binary files) doesn't require any PHP/database operation; it's done by the webserver directly.
It will also be easier to move those files to another server if you want to.
Actually, the only upside I can see of storing them in the database is easier backup: you want to back up your DB anyway, and this way you'll have all your data in one place and can be sure that each backup is complete (i.e. you don't have files on disk that aren't used by database entries, and you don't have image IDs in your database that point to nowhere).
I asked a similar question using Oracle as the backend for a Windows Forms application.
The answer really boils down to your requirements for backing up and restoring the files. If that requirement is important, then use the database, as it'll be easier (you're backing up the database anyway, right? :o)