I am debating whether to store images in MongoDB GridFS or on a cloud file system. I am leaning towards the cloud for a few reasons. The language being used is PHP on an Nginx server.
Storing images in GridFS increases the size of the database. Therefore more of the database has to be kept in memory, and I will spend more time/money managing the servers when it comes to things like sharding.
Retrieving an image from GridFS takes longer than retrieving it from the cloud because I have to:
a) query the image using its id
b) read the image into memory
c) use a PHP header to display the image
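To make the extra work in steps a) to c) concrete, here is a minimal Python sketch of what the application has to do per request. It is illustrative only: the dict `FAKE_GRIDFS` and the function `serve_image` are hypothetical stand-ins, not the MongoDB driver API; a real PHP app would query GridFS through the driver and call `header()`.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a GridFS bucket: file id -> (content type, bytes).
FAKE_GRIDFS: Dict[str, Tuple[str, bytes]] = {
    "img-1": ("image/png", b"\x89PNG\r\n\x1a\n fake image bytes"),
}


def serve_image(file_id: str) -> Tuple[int, List[Tuple[str, str]], bytes]:
    """Return (status, headers, body) for one image request."""
    # a) query the image using its id
    record: Optional[Tuple[str, bytes]] = FAKE_GRIDFS.get(file_id)
    if record is None:
        return 404, [("Content-Type", "text/plain")], b"not found"
    # b) the whole image now sits in the application's memory
    content_type, data = record
    # c) emit headers (what PHP's header() calls do) followed by the bytes
    headers = [
        ("Content-Type", content_type),
        ("Content-Length", str(len(data))),
    ]
    return 200, headers, data
```

With a direct cloud URL, none of this runs in the application: the browser fetches the bytes from the storage service itself.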
The cloud would be better because the page just links directly to the image's URL in the cloud.
Do those reasons sound valid, or should I be going in a different direction with my thinking?
That's not entirely true.
Storing images in the GridFS increases the size of the database. Therefore more of the database has to be in memory and I will spend more time/money managing the servers when it comes to things like sharding.
MongoDB GridFS splits large files into chunks, and only the chunks that are requested are loaded and served. Yes, it will definitely take more memory than the file system. These are the trade-offs of using an in-memory data store.
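The chunking idea can be sketched in a few lines of Python: a file is split into numbered fixed-size pieces, and a request for a byte range only needs the pieces that cover it. This is not the GridFS driver; `CHUNK_SIZE` is an assumption (255 KiB is the usual GridFS default), and the helper names are hypothetical.

```python
from typing import List

CHUNK_SIZE = 255 * 1024  # assumed chunk size in bytes


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a file into numbered chunks, the way GridFS stores them as separate documents."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunks_for_range(offset: int, length: int, chunk_size: int = CHUNK_SIZE) -> range:
    """Chunk numbers (the 'n' field in GridFS terms) covering [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return range(first, last + 1)


def read_range(chunks: List[bytes], offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> bytes:
    """Reassemble only the chunks needed; assumes the range lies inside the file."""
    needed = chunks_for_range(offset, length, chunk_size)
    joined = b"".join(chunks[n] for n in needed)
    start = offset - needed.start * chunk_size
    return joined[start:start + length]
```

Serving a thumbnail-sized range from a large file therefore touches one or two chunks rather than the whole file.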
Retrieving the image from GridFS takes longer than cloud because I have to a) Query the image using the id b) read the image into memory c) use a php header to display the image
As I stated in my previous point, the image is loaded into memory the first time it is requested, so you won't have much of a performance problem. In fact it will be a gain, since it is served from RAM instead of disk. If you are still not satisfied, I would recommend caching the images in Nginx, so that requests don't reach MongoDB after the first one.
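In production, the Nginx suggestion would be handled by its proxy or FastCGI cache in front of the PHP script. The Python class below only illustrates the same read-through idea under stated assumptions: `fetch` is a hypothetical function standing in for the GridFS query, and the cache is a small in-process LRU.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Tiny LRU read-through cache: the backing store is only hit on a miss."""

    def __init__(self, fetch: Callable[[str], bytes], max_items: int = 128):
        self._fetch = fetch
        self._max = max_items
        self._items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = self._fetch(key)  # first request: go to the database
        self._items[key] = value
        if len(self._items) > self._max:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value
```

Only the first request for each image (or the first after eviction) reaches the database.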
I use Jelastic to host a PHP application. Editors can upload pictures through the application, and these are stored in the file system, within the document root. They are served on the frontend as, e.g., http://example.com/uploads/123/picture.jpeg
For the NGINX application server, I have enabled vertical scaling but have a single node, i.e. no horizontal scaling.
Picture uploads are not reliable. When I update picture #1 through my PHP admin interface and then update another one, picture #1 reverts to the old picture.
My questions: are picture uploads synced across multiple cloudlets on a single node? And what will happen if I scale horizontally to multiple nodes?
My question: Are picture uploads sync'ed across multiple cloudlets on a single node?
I think there is a terminology problem here.
Cloudlet: A composite resource unit composed of RAM and CPU usage. 1 Cloudlet = 128MB RAM and approx. 200MHz CPU. A server (Jelastic refers to this as a 'node') typically uses multiple cloudlets; e.g. it may use several GB RAM and/or several GHz CPU at any given moment.
More details at http://kb.layershift.com/introducing-cloudlets
Each node is a self-contained (virtual) server, with its own filesystem. So if you have a single NGINX PHP application server, it doesn't matter if it uses 1 or 100 cloudlets (remember, this is only a measurement of RAM and CPU consumption!), it has 1 filesystem and all of the files that you successfully write there will be available for any subsequent requests.
What will happen if I scale horizontally to multiple nodes?
Right, you have to be careful here. If your application is writing to the local filesystem, you have a problem when dealing with multiple horizontally scaled servers. This is a very typical scaling problem that every application must deal with.
If we're simply talking about static resources (e.g. images), one of the best and simplest ways to handle this issue is to upload all of them to a single server. For example, if you have 4 NGINX PHP servers load-balancing your-application.com, you might make one of those servers (or perhaps a completely separate environment) images.your-application.com.
So you perform the uploads to images.your-application.com, and reference that directly in your HTML when you wish to display those uploaded images.
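As a small illustration of "reference that directly in your HTML", the helpers below build image URLs on the dedicated host. The host name comes from the answer; the `https` scheme and the function names are assumptions for this sketch.

```python
from html import escape
from urllib.parse import quote

# Dedicated image host from the answer; the https scheme is an assumption.
IMAGE_HOST = "https://images.your-application.com"


def image_url(path: str) -> str:
    """Absolute URL of an uploaded image on the dedicated image host."""
    return f"{IMAGE_HOST}/{quote(path.lstrip('/'))}"


def img_tag(path: str, alt: str = "") -> str:
    """HTML emitted by the app servers; the browser then fetches from the image host."""
    return f'<img src="{escape(image_url(path))}" alt="{escape(alt)}">'
```

Because every app server emits the same absolute URL, it no longer matters which server handled the page request.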
Remember, images.your-application.com is only responsible for serving the actual images, so it's really lightweight and should handle a decent volume with vertical scaling alone, which is completely automatic on Jelastic.
When you need to scale images.your-application.com, the easy way is to use a CDN service (CloudFlare, Incapsula, etc.). This leaves images.your-application.com handling only the uploads and the small amount of download traffic that is not already cached at the CDN.
I had the same issue; please read this Jelastic tutorial.
In summary, Jelastic provides a script that helps with synchronization: you just run the script and specify the folders you want to keep in sync across all nodes.
After that, every time you upload a file to those folders, it becomes available on all nodes within seconds or minutes, depending on the file size.
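This is not Jelastic's script. It is a hypothetical one-way sync in Python, sketched only to show the general idea: copy any file that is missing or newer on the source into each replica directory. It does not propagate deletions or resolve conflicting edits, which a real multi-node sync tool has to handle.

```python
import os
import shutil
from typing import Iterable, List


def sync_folder(src: str, replicas: Iterable[str]) -> List[str]:
    """Copy files that are missing or newer in `src` into each replica directory.

    Returns the relative paths that were copied (once per replica).
    """
    replicas = list(replicas)  # iterated once per file, so materialize it
    copied: List[str] = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            src_path = os.path.join(root, name)
            rel = os.path.relpath(src_path, src)
            for replica in replicas:
                dst_path = os.path.join(replica, rel)
                if (not os.path.exists(dst_path)
                        or os.path.getmtime(src_path) > os.path.getmtime(dst_path)):
                    os.makedirs(os.path.dirname(dst_path), exist_ok=True)
                    shutil.copy2(src_path, dst_path)  # copy2 also copies the mtime
                    copied.append(rel)
    return copied
```

Running it periodically gives the "available within seconds or minutes" behaviour; the delay grows with file size because each copy is a full transfer.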
This question is confusing me, so I thought I should ask an expert.
Is it better to upload images to a folder and just save the link in MySQL, or to upload the image itself into a BLOB field in MySQL?
I have often built systems to store images in the database, there are pros and cons to doing this.
Pros:
All your data is kept in one place; if you migrate your website/database, the images will just be there.
It's easier to sort, delete, etc.
Since you have to serve the images via a PHP script, you can add things such as access control if required, or image processing (you can do this with flat files too, but you have to make sure the security can't be bypassed by leaving the images in a public directory).
Cons:
It's slower than serving a flat file from the web server, because a PHP script needs to retrieve it and MySQL needs to return the data.
Your database will become large very fast, and not all web hosts take too kindly to this.
The file system is faster for flat-file storage and retrieval, as that's exactly what a file system is designed for.
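The access-control point in the pros can be sketched as follows: because every image request goes through a script, the script can check who is asking before returning any bytes, which a public directory cannot do. `IMAGES` and `fetch_image_for` are hypothetical stand-ins for the images table and the serving script.

```python
from typing import Dict, Optional, Tuple

# Hypothetical images table: image_id -> (owner user id, MIME type, bytes).
IMAGES: Dict[int, Tuple[int, str, bytes]] = {
    1: (42, "image/jpeg", b"\xff\xd8\xff fake jpeg bytes"),
}


def fetch_image_for(user_id: Optional[int], image_id: int) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body), enforcing that only the owner sees the image."""
    row = IMAGES.get(image_id)
    if row is None:
        return 404, {}, b""
    owner, mime, data = row
    if user_id != owner:  # the check a plain public directory cannot perform
        return 403, {}, b""
    return 200, {"Content-Type": mime, "Content-Length": str(len(data))}, data
```

The cost is the one listed in the cons: every request now runs this code and a query instead of a direct file read by the web server.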
Bad. Your web server does a much better job of managing expiry headers and loading files directly from the file system. Throughput will be much higher using the file system. It's what it's designed for; use it.
SQL databases are designed for relational data, not images. You're just loading your database unnecessarily. Store the path/image name instead.
If your application is large, i.e. you have to display a large number of images (or large images) repeatedly, then you should go with the first method (storing only the image path in the database and the actual images on the file system). This reduces the processing time needed to display images and consumes fewer resources. On the other hand, if your application needs only a small number of images, you can store them directly in the database. This makes it easy to take backups and to port the application to another OS.
Is it better to read and list images directly from the file system using plain PHP, or to store image metadata and filenames in the database and access the images through a MySQL SELECT? What are the pros and cons of each solution?
Listing files on a file system is probably the easiest way to accomplish what you are trying to do, but it's going to be very slow if you are trying to cycle through several thousand directories/files on a networked file system (NFS, CIFS, GlusterFS, etc.).
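For reference, the file-system approach boils down to a directory scan like the Python sketch below. The extension set is an assumption. Each call reads the directory from disk (or over the network on NFS/CIFS), which is where the cost comes from once there are thousands of directories.

```python
import os
from typing import List

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}  # assumed set of image types


def list_images(directory: str) -> List[str]:
    """Return image file names in `directory`, sorted; one directory read, no database."""
    with os.scandir(directory) as entries:
        return sorted(
            e.name for e in entries
            if e.is_file() and os.path.splitext(e.name)[1].lower() in IMAGE_EXTENSIONS
        )
```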
Storing files in a database creates much more overhead, since you are now involving an external application to store information. Remember that every time you use a database you are also using network I/O, an authentication mechanism, a query parser, etc. At the same time, all of this overhead might still give a faster response than a networked file system.
To conclude: everything depends on the number of files you are working with and on the underlying infrastructure. The two major things to look out for are disk I/O and network I/O.
I would do the following:
Upload all the images in one directory
Store references to those images that are tied to the uploader's User ID
Then just select the image URLs that are tied to that ID, and output them however necessary.
People find it easier to store their files in folders and parse those folders with PHP. If you go with the database method, the database keeps getting larger and larger.
I can see it coming down to personal preference, but I have personally gone with parsing folders for images rather than storing them in a database.
Depends on the scale of what you are doing.
This is what I would be doing.
Store the file metadata in the database. You can store quite a bit of information about this image this way.
Store the image file on a distributed storage system like Amazon S3, and store its path in your metadata. Replication is part of the system, and it integrates easily with the Amazon CloudFront CDN.
Distribute the images through the Amazon CloudFront CDN.
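A sketch of the metadata row for this setup, under assumptions: the content-addressed key scheme, the `images` prefix and the CDN hostname `cdn.example.com` are all hypothetical, and the actual upload to S3 (via an AWS SDK) is a separate step not shown here.

```python
import hashlib
import mimetypes
import os
from dataclasses import dataclass

# Hypothetical CDN hostname (e.g. a CNAME pointing at a CloudFront distribution).
CDN_BASE = "https://cdn.example.com"


@dataclass(frozen=True)
class ImageMetadata:
    original_name: str
    content_type: str
    size_bytes: int
    sha256: str
    s3_key: str  # the "path" kept in the database; the bytes themselves live in S3

    @property
    def cdn_url(self) -> str:
        return f"{CDN_BASE}/{self.s3_key}"


def build_metadata(original_name: str, data: bytes, prefix: str = "images") -> ImageMetadata:
    """Collect the database row for an upload; uploading `data` to S3 is a separate step."""
    digest = hashlib.sha256(data).hexdigest()
    ext = os.path.splitext(original_name)[1].lower()
    content_type = mimetypes.guess_type("x" + ext)[0] or "application/octet-stream"
    # Content-addressed key: identical uploads map to the same object.
    key = f"{prefix}/{digest[:2]}/{digest}{ext}"
    return ImageMetadata(original_name, content_type, len(data), digest, key)
```

Pages then link to `cdn_url`, so image traffic goes to the CDN rather than to your servers.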
I did some Google searches and can't seem to find what I want. I'm designing my web site to use MySQL and PHP web servers. So far, multiple web servers behind load balancers and a MySQL Cluster are planned for scaling. But then I get to images/videos/MP3s. I need a file system that multiple servers can read files from and write files to. At first one server can run MySQL, the networked file system and the web server, but as the site scales it can be switched to multiple servers. Does anyone have any examples, tutorials or resources to help me with this? The site runs on Ubuntu servers. My original idea was to just store the images in MySQL (I know how to do that and have working examples) so all servers could read/write them, but other people told me that's a bad idea and that I should use a file system (but I don't want to use the local one, as I don't think it can scale for large sites).
There are three systems that come to mind: MogileFS, MongoDB GridFS and a cloud-based storage solution.
MogileFS (OMG Files!) was developed for LiveJournal and stores its metadata in MySQL. It uses that metadata to find the actual disk holding the requested file and streams it out.
MongoDB GridFS is a lot newer, and probably easier to get going, certainly for a smaller system. It uses a newer 'NoSQL' database to store files as chunks spread across the database, reassembling them as required. Searching around will turn up plenty of information.
Finally, you could simply avoid the whole issue and upload images into Amazon S3 or Rackspace Cloud Files. I've done the latter before (though the site was already running inside Rackspace's system) and it's not very difficult, again with plenty of examples around.
For S3 there is also a command-line tool, s3cmd, that can be set to sync a directory full of files into an S3 'bucket' or, better, to upload them and then delete the local copies.
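The "upload, then delete" variant can also be scripted. This Python sketch does not reproduce s3cmd; it expects a client with an `upload_file(Filename, Bucket, Key)` method, which a boto3 S3 client (`boto3.client("s3")`) provides. The bucket name and key prefix are hypothetical. The local file is removed only after `upload_file` returns, because boto3 raises an exception if the upload fails.

```python
import os
from typing import Any, List


def upload_then_delete(client: Any, directory: str, bucket: str, prefix: str = "") -> List[str]:
    """Upload every file under `directory` to `bucket`, deleting each local copy afterwards.

    `client` must provide upload_file(Filename, Bucket, Key), e.g. a boto3 S3 client.
    """
    uploaded: List[str] = []
    for root, _dirs, files in os.walk(directory):
        for name in sorted(files):
            local_path = os.path.join(root, name)
            # S3 keys use "/" regardless of the local OS separator.
            rel = os.path.relpath(local_path, directory).replace(os.sep, "/")
            key = f"{prefix}{rel}"
            client.upload_file(local_path, bucket, key)  # raises on failure
            os.remove(local_path)  # only reached if the upload succeeded
            uploaded.append(key)
    return uploaded
```

Passing the client in, rather than creating it inside the function, keeps credentials and region configuration outside this helper.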
First, storing images and other large files is not really feasible in MySQL because of the maximum packet size limitation.
To quote this answer to "Choosing data type for MySQL?":
MySQL is incapable of working with any data that is larger than max_allowed_packet (default: 1M) in size, unless you construct complicated and memory intense workarounds at the server side. This further restricts what can be done with TEXT/BLOB-like types, and generally makes the LONGTEXT/LONGBLOB type useless in a default configuration.
Now, for storage and upgrade compatibility, why not just store them on a NAS or RAID system that you can keep adding drives to? Then just store the path to the file in your DB. This is much less DB-intensive and allows for decent scalability.
I have a LAMP server with 256MB RAM (poor man's server in cloud). I have an app written to run on this machine. Currently people upload images and they go straight into mysql as BLOB.
There are concerns that this might be a very memory-consuming operation and that we should move to plain files instead. Can someone tell me if these concerns are valid? (Is it worth the effort of changing a lot of code that's already written, given that we will have sufficient RAM in the next 6 months?)
As a general rule when should we store images in DB and when as files?
To read a BLOB in MySQL you need three times as much memory as it takes (it gets copied into several buffers).
So yes, reading a BLOB in MySQL consumes more memory than reading a file.
You should store them in the file system for several reasons:
The images are easily accessible via other apps (shell, FTP, www, etc.)
It's less resource intensive (including memory) to read them from the file system than from a database
If the database gets corrupted, the images are safe.
You also won't have tables bumping up against their size limits (determined by OS file size limits), which slows them down and makes them require more resources to read.
The only time you should consider storing images in the DB is when they are used in transaction processing, and even then, there are numerous workarounds to that when storing in the file system.
To summarize:
Database Storage:
Pros:
Assures referential integrity
Easier backup strategy
Easier clustering (database cluster)
Cons:
Higher cost in memory usage and storage
Hard to scale
Additional code must be written to support HTTP caching
Requires a database and associated querying code
File System Storage:
Pros:
Low memory footprint (more efficient)
Storage equal to file size
Easy retrieval and storage
Allows the web server to control caching
Cons:
Referential integrity not assured
Backups are not always in sync with database backups
Requires additional backup strategy
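The database con "Additional code must be written to support HTTP caching" refers to logic like the Python sketch below, which a web server does automatically for static files. It derives a strong ETag from the image bytes and answers 304 when the browser already holds that version. The one-day `max-age` is an assumption. In a real application you would store the hash in a column, so that a 304 can be answered without loading the BLOB at all.

```python
import hashlib
from typing import Dict, Optional, Tuple


def conditional_image_response(
    data: bytes, content_type: str, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for image bytes loaded from the database."""
    # Derive a validator from the content itself, so it changes whenever the image does.
    etag = '"' + hashlib.sha256(data).hexdigest() + '"'
    headers = {"ETag": etag, "Cache-Control": "public, max-age=86400"}  # max-age is an assumption
    if if_none_match is not None:
        # The header may list several tags; compare ignoring any weak "W/" prefix.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]
        if "*" in tags or etag in tags:
            # The browser already has this version: reply without a body.
            return 304, headers, b""
    headers["Content-Type"] = content_type
    headers["Content-Length"] = str(len(data))
    return 200, headers, data
```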
If referential integrity of your images is important, store them in the database. The advantage is that a backup of your database always keeps your rows and images in sync. It does mean, though, that storing and retrieving them is a bit more costly resource-wise.
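The integrity argument can be illustrated with SQLite standing in for MySQL (a transactional engine such as InnoDB gives the same all-or-nothing behaviour). The schema is hypothetical: the image and the row that references it are written in one transaction, so a failure on the row also rolls back the image, and no orphaned image is left behind. Files written to disk would not be undone this way.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")
    conn.execute(
        "CREATE TABLE products ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " image_id INTEGER NOT NULL REFERENCES images(id))"
    )
    return conn


def add_product_with_image(conn: sqlite3.Connection, name: str, image: bytes) -> int:
    """Insert the image and the row that uses it in one transaction."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        cur = conn.execute("INSERT INTO images (data) VALUES (?)", (image,))
        cur = conn.execute(
            "INSERT INTO products (name, image_id) VALUES (?, ?)", (name, cur.lastrowid)
        )
        return cur.lastrowid
```

For example, `add_product_with_image(conn, None, b"...")` raises `sqlite3.IntegrityError` on the NOT NULL name, and the image inserted just before it is rolled back too.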
If the images themselves are not that important, store them as files. It allows for fast and simple retrieval and storage. The downside of using files though is that your backup strategy becomes more complicated and your files will not always be in sync with your database rows.
I personally always store them in the database. For me, the rewards are greater than the cost. This is not always the case, though, and you should look at your application requirements to see which is best for you.
Some large websites are using BLOBs to store their website content. Flickr's use of BLOBs is actually well documented. To answer your question though, file storage is more memory efficient than database storage.
If it is for serving on a web page, I would use the plain file system, with the image file name stored as text in the database. Apache and browsers usually do an amazing job of caching static files.
Even though, in theory, you could achieve similar performance serving images from a database, the amount of work required does not justify this choice, given that the only advantage I can think of is a more cohesive database (with a simple DB dump you get ALL your data: images + data).
If you have a lot of files to keep track of, or they are very large, I'd store them as files. Especially if these files are to be accessed via the web, in which case you can offload all that effort from the SQL server and let the web server handle the transfer.
A good way to track images is to name them using the primary key, and then keep track of the original file name (if you need it) in the database. This way you always know which file connects to which row. Also, if you have many files (thousands, millions, ...), you might consider 'hashing' them into directories, so that 1-1000 are stored in /1, 1001-2000 are stored in /2, etc. Some OSes see a bit of slowdown when you get a large number of files in a single directory.
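The naming and bucketing scheme described above can be sketched as a small Python function. It assumes primary keys start at 1 and uses 1000 files per directory, as in the answer; the function name and the root path in the usage are hypothetical.

```python
import os

FILES_PER_DIR = 1000


def storage_path(root: str, image_id: int, extension: str) -> str:
    """Path for an image named after its primary key, bucketed 1000 per directory.

    IDs 1-1000 go to <root>/1, 1001-2000 to <root>/2, and so on.
    """
    if image_id < 1:
        raise ValueError("primary keys are assumed to start at 1")
    bucket = (image_id - 1) // FILES_PER_DIR + 1
    return os.path.join(root, str(bucket), f"{image_id}{extension}")
```

Because the path is computed from the key alone, the database only needs the original file name if you want to show it to users.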