In the context of a web application, my old boss always said to put a reference to an image in the database, not the image itself. I tend to agree that storing a URL rather than the image itself in the DB is a good idea, but where I work now, we store a lot of images in the database.
The only reason I can think of is that perhaps it's more secure: you don't want someone to have a direct URL to an image? But if that is the case, you can always have the web site/server handle images, like handlers in ASP.NET, so that a user needs to authenticate to view the image. I'm also thinking performance would be hurt by pulling the images out of the database. Are there any other reasons why it might be a good (or not so good) idea to store images in a database?
Exact Duplicate: User Images: Database or filesystem storage?
Exact Duplicate: Storing images in database: Yea or nay?
Exact Duplicate: Should I store my images in the database or folders?
Exact Duplicate: Would you store binary data in database or folders?
Exact Duplicate: Store pictures as files or in the database for a web app?
Exact Duplicate: Storing a small number of images: blob or fs?
Exact Duplicate: store image in filesystem or database?
Pros of putting images in a database:
1. Transactions. When you save the blob, you can commit it just like any other piece of DB data. That means you can commit the blob along with any of the associated metadata and be assured that the two are in sync. Run out of disk space? No commit. File didn't upload completely? No commit. Silly application error? No commit. If keeping the images and their associated metadata consistent with each other is important to your application, then the transactions that a DB provides can be a boon.
2. One system to manage. Need to back up the metadata and blobs? Back up the database. Need to replicate them? Replicate the database. Need to recover from a partial system failure? Reload the DB and roll the logs forward. All of the advantages that DBs bring to data in general (volume mapping, storage control, backups, replication, recovery, etc.) apply to your blobs. More consistency, easier management.
3. Security. Databases have very fine-grained security features that can be leveraged: schemas, user roles, even things like "read-only views" to give secure access to a subset of data. All of those features work with tables holding blobs as well.
4. Centralized management. Related to #2, but basically the DBAs (as if they don't have enough power) get to manage one thing: the database. Modern databases (especially the larger ones) work very well with large installations across several machines. A single source of management simplifies procedures and simplifies knowledge transfer.
5. Most modern databases handle blobs just fine. With first-class support for blobs in your data tier, you can easily stream blobs from the DB to the client. While there are operations that will "suck in" the entire blob all at once, if you don't need that facility, then don't use it. Study the SQL interface for your DB and leverage its features. There's no reason to treat blobs like "big strings" that are handled monolithically, which turns them into big, memory-gobbling, cache-smashing bombs.
Just like you can set up dedicated file servers for images, you can set up dedicated blob servers in your database. Give them dedicated disk volumes, dedicated schemas, dedicated caches, etc. Not all of your data in the DB is the same or behaves the same, so there's no reason to configure it all the same. Good databases offer that fine level of control.
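A minimal sketch of the transactions point, using Python's built-in sqlite3 module as a stand-in for whatever DB you actually run. The table names image_meta and image_blob, the function names and the fail_midway flag are hypothetical. The metadata row and the blob row are written in one transaction, so a failure between the two inserts leaves neither behind.

```python
import sqlite3


def make_db():
    """Create an in-memory database with separate metadata and blob tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE image_meta (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            size INTEGER NOT NULL
        );
        CREATE TABLE image_blob (
            image_id INTEGER PRIMARY KEY REFERENCES image_meta(id),
            data     BLOB NOT NULL
        );
        """
    )
    return conn


def save_image(conn, name, data, fail_midway=False):
    """Insert metadata and bytes atomically.

    `with conn:` commits if the block finishes and rolls back if it raises.
    `fail_midway` (hypothetical) only simulates an upload or application error.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO image_meta (name, size) VALUES (?, ?)", (name, len(data))
        )
        if fail_midway:
            raise IOError("simulated failure after the metadata insert")
        conn.execute(
            "INSERT INTO image_blob (image_id, data) VALUES (?, ?)",
            (cur.lastrowid, data),
        )
    return cur.lastrowid
```

Calling save_image with fail_midway=True raises, and a following SELECT COUNT(*) FROM image_meta shows no orphaned metadata row: "File didn't upload completely? No commit."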
The primary nit regarding serving up a blob from a DB is ensuring that your HTTP layer actually leverages all of the HTTP protocol to perform the service.
Many naive implementations simply grab the blob and dump it wholesale down the socket. But HTTP has several important features well suited to streaming images, etc.: notably caching headers, ETags, chunked transfer encoding, and byte-range requests that let clients request "pieces" of the blob.
Ensure that your HTTP service properly honors all of those requests, and your DB can be a very good Web citizen. By caching the files in a filesystem for serving by the HTTP server, you gain some of those advantages "for free" (since a good server will do that anyway for "static" resources), but if you do that, make sure you honor things like the images' modification dates.
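To make that concrete, here is a minimal Python sketch of an HTTP layer that honors ETag/If-None-Match, Last-Modified/If-Modified-Since and single byte-range requests for a blob already fetched from the DB. The function names are hypothetical; it deliberately ignores multi-range requests, If-Range and weak-ETag subtleties, and it assumes last_modified is a UTC-aware datetime stored alongside the blob.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Sentinel: a Range header was sent but lies entirely beyond the blob.
UNSATISFIABLE = object()


def parse_range(value, size):
    """Parse a single "bytes=" range into an inclusive (start, end) pair.

    Returns None when there is no usable header (absent, malformed or
    multi-range), in which case the whole blob is sent, or UNSATISFIABLE.
    """
    if not value or not value.startswith("bytes=") or "," in value:
        return None
    start_s, sep, end_s = value[len("bytes="):].strip().partition("-")
    if not sep:
        return None
    try:
        if start_s == "":  # suffix form "bytes=-N": the last N bytes
            n = int(end_s)
            if n <= 0 or size == 0:
                return UNSATISFIABLE
            return max(size - n, 0), size - 1
        start = int(start_s)
        end = int(end_s) if end_s else size - 1
    except ValueError:
        return None
    if end_s and end < start:
        return None  # invalid range: ignore the header
    if start >= size:
        return UNSATISFIABLE
    return start, min(end, size - 1)


def respond(blob, etag, last_modified, req):
    """Return (status, headers, body) for a blob read from the database.

    `last_modified` must be a UTC-aware datetime; `req` maps request
    header names to values.
    """
    headers = {
        "ETag": etag,
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Accept-Ranges": "bytes",
    }
    # If-None-Match takes precedence over If-Modified-Since.
    if "If-None-Match" in req:
        tags = [t.strip() for t in req["If-None-Match"].split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""
    elif "If-Modified-Since" in req:
        try:
            since = parsedate_to_datetime(req["If-Modified-Since"])
        except (TypeError, ValueError):
            since = None
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, headers, b""

    rng = parse_range(req.get("Range"), len(blob))
    if rng is UNSATISFIABLE:
        headers["Content-Range"] = f"bytes */{len(blob)}"
        return 416, headers, b""
    if rng is not None:
        start, end = rng
        headers["Content-Range"] = f"bytes {start}-{end}/{len(blob)}"
        return 206, headers, blob[start:end + 1]
    return 200, headers, blob
```

For example, respond(data, '"v1"', datetime(2009, 1, 1, tzinfo=timezone.utc), {"Range": "bytes=0-99"}) would return a 206 with the first 100 bytes, while a repeat request carrying If-None-Match: "v1" gets an empty 304.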
For example, someone requests spaceshuttle.jpg, an image created on Jan 1, 2009. That ends up cached on the file system on the request date, say, Feb 1, 2009. Later, the image is purged from the cache (FIFO policy, or whatever), and someone requests it again on Mar 1, 2009. Now the cached copy has a Mar 1, 2009 "create date", even though its real creation date was Jan 1 all along. So, especially if your cache turns over a lot, clients using If-Modified-Since headers may get more data than they actually need, since the server THINKS the resource has changed when in fact it has not.
If you keep the cache creation date in sync with the actual creation date, this can be less of a problem.
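One way to keep the cache creation date in sync, sketched in Python: when a blob is written to the filesystem cache, stamp the file's modification time with the image's real creation date from the database. A static file server that derives Last-Modified from the file's mtime (as servers such as Apache typically do) would then keep reporting Jan 1 for spaceshuttle.jpg no matter when it was re-cached. The function name and the created_at argument are assumptions.

```python
import os
import tempfile
from datetime import datetime, timezone


def write_cache_file(cache_dir, name, data, created_at):
    """Write `data` into the filesystem cache as `name`, with the file's
    modification time set to `created_at` (an aware datetime taken from
    the database) instead of "now"."""
    path = os.path.join(cache_dir, name)
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    ts = created_at.timestamp()
    os.utime(tmp, (ts, ts))  # (access time, modification time)
    os.replace(tmp, path)    # atomic rename keeps the mtime we just set
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        created = datetime(2009, 1, 1, tzinfo=timezone.utc)
        p = write_cache_file(d, "spaceshuttle.jpg", b"...", created)
        print(p, os.path.getmtime(p))
```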
But the point is that you need to think through the entire problem in order to be a "good web citizen", and potentially save you and your clients some bandwidth, etc.
I've just gone through all this for a Java project serving videos from a DB, and it all works a treat.
The one case for it is when you occasionally need to retrieve an image and it has to be available on several different web servers. But I think that's pretty much it.
If it doesn't have to be available on several servers, it's always better to put them in the file system.
If it has to be available on several servers and there's actually some kind of load in the system, you'll need some kind of distributed storage.
We're talking an edge case here, where you can avoid adding an additional level of complexity to your system by leveraging the database.
Other than that, don't do it.
I understand that the majority of database professionals will cross their fingers and hiss at you if you store images in the database (or even mention it). Yes, there are definitely performance and storage implications when using the database as the repository for large blocks of binary data of any kind (images just tend to be the most common bits of data that can't be normalized). However, there are most certainly circumstances where database storage of images is not only allowable but advisable.
For instance, in my old job we had an application where users would attach images to several different points of a report that they were writing, and those images had to be printed out when it was done. These reports were moved about via SQL Server replication, and it would have introduced a HUGE headache to try to manage these images and file paths across multiple systems and servers with any sort of reliability. Storing them in the database gave us all of that "for free," and the reporting tool didn't have to go out to the file system to retrieve the image.
My general advice would be not to limit yourself to one approach or the other - go with the technique that fits the situation. File systems are very good at storing files, and databases are very good at providing bite-sized chunks of data on request. On the other hand, one of my company's products has a requirement to store the entire state of the application in the database, which means that file attachments go in there as well. With our DB server (SQL Server 2005) I've yet to run into observable performance problems even with large customers and databases.
Microsoft SQL Server 2008 gives you the best of both worlds with the FILESTREAM feature; it might be worth checking out: http://technet.microsoft.com/en-us/library/bb933993.aspx
One of the advantages of storing images in the database is that they are portable across systems and independent of the filesystem layout.
The simplest / most performant / most scalable solution is to store your images on the file system. If security is a concern, put them in a location that is not accessible by the web server and write a script that handles security and serves up the files.
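A minimal sketch of that "script that handles security", assuming images live in a directory outside the web root and that user_can_view is a hypothetical hook into whatever authentication you already have. Besides the permission check, it refuses file names that resolve outside the storage directory (e.g. ../../etc/passwd), a classic hole in hand-written file-serving scripts.

```python
import os
import tempfile


class Forbidden(Exception):
    """Raised when the caller may not see the requested file."""


def read_protected_image(storage_root, filename, user_can_view):
    """Return image bytes from a directory the web server does not expose.

    `user_can_view` is a hypothetical callback standing in for the
    application's own authorisation check.
    """
    if not user_can_view(filename):
        raise Forbidden(filename)
    root = os.path.realpath(storage_root)
    path = os.path.realpath(os.path.join(root, filename))
    # Reject names that resolve outside the storage root.
    if os.path.commonpath([root, path]) != root:
        raise Forbidden(filename)
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "a.jpg"), "wb") as f:
            f.write(b"img")
        print(read_protected_image(d, "a.jpg", lambda name: True))
```

The script would then write these bytes to the response with the right Content-Type, so the files themselves never need a public URL.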
Assuming your web/app server and DB server are different machines, you will take a few hits by putting images in the DB: (1) Network latency between the two machines, (2) DB connection overhead, (3) consuming an additional DB connection for each image served. I would be more concerned about the last point: if your site serves a lot of images, your web servers are going to be consuming many DB connections and could exhaust your connection pools.
If your application runs on multiple servers, I'd store the reference copy of your images in the database and then cache them on demand on the filesystems. Doing so is just way less of an error prone pain in the ass than trying to sync filesystems laterally.
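A rough Python sketch of that read-through arrangement, with sqlite3 standing in for the shared database. The images table, the cache file naming and the function name are hypothetical. Each server checks its local cache first and only goes to the database on a miss, writing to a temporary file and renaming it so other requests never read a half-written image.

```python
import os
import sqlite3
import tempfile


def get_image(conn, cache_dir, image_id):
    """Return image bytes from the local cache, falling back to the
    database's reference copy (which then populates the cache)."""
    path = os.path.join(cache_dir, f"{image_id}.img")
    try:
        with open(path, "rb") as f:
            return f.read()  # cache hit: no DB round trip
    except FileNotFoundError:
        pass
    row = conn.execute(
        "SELECT data FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    data = bytes(row[0])
    tmp = f"{path}.{os.getpid()}.tmp"  # per-process temp name
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, path)  # atomic publish into the cache
    return data


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")
    conn.execute("INSERT INTO images VALUES (1, ?)", (b"abc",))
    with tempfile.TemporaryDirectory() as d:
        print(get_image(conn, d, 1))
```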
If your application is on a single server, then yeah, stick to the filesystem and have the database maintain a path to the data.
Most SQL databases are of course not designed with serving up images in mind, but there is a certain amount of convenience associated with having them in the database.
For example, if you already have a database running with replication configured, you instantly have an HA image store, rather than having to work out some rsync- or NFS-based filesystem replication. Also, having a bunch of web processes (or designing some new service) write files to disk increases your complexity a bit. Really, it's just more moving parts.
At the very least, I would recommend keeping 'meta' data about the image (such as any permissions, who owns it, etc) and the actual data separated into different tables so it will be fairly easy to switch to a different data store down the line. That coupled with some sort of CDN or caching should give you pretty good performance up to a point, so I suppose it depends on how scalable this application needs to be and how you balance that with ease of implementation.
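A sketch of that split, again using sqlite3 as a stand-in: the image table holds ownership and permission metadata, image_data holds only the bytes, and blob access goes through a small store class. All table, column and class names are hypothetical. The idea is that moving the bytes to a filesystem or CDN later would mean writing another class with the same get/put methods, while permission queries never touch the blob table.

```python
import sqlite3

SCHEMA = """
CREATE TABLE image (          -- metadata: small, queried often
    id         INTEGER PRIMARY KEY,
    owner      TEXT NOT NULL,
    is_public  INTEGER NOT NULL DEFAULT 0,
    mime_type  TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE TABLE image_data (     -- bytes only
    image_id INTEGER PRIMARY KEY REFERENCES image(id) ON DELETE CASCADE,
    data     BLOB NOT NULL
);
"""


def open_image_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def images_for_owner(conn, owner):
    # Ownership/permission queries read only the metadata table.
    rows = conn.execute(
        "SELECT id FROM image WHERE owner = ? ORDER BY id", (owner,)
    )
    return [r[0] for r in rows]


class DbBlobStore:
    """Blob access behind a tiny get/put interface, so the storage backend
    can be swapped without touching the metadata code."""

    def __init__(self, conn):
        self.conn = conn

    def put(self, image_id, data):
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO image_data (image_id, data) VALUES (?, ?)",
                (image_id, data),
            )

    def get(self, image_id):
        row = self.conn.execute(
            "SELECT data FROM image_data WHERE image_id = ?", (image_id,)
        ).fetchone()
        return None if row is None else row[0]
```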
You don't have to store the URL (if you feel this is unsafe). You can just store a unique id that references the image elsewhere.
Database storage tends to be more expensive to run and maintain than a file system, hence I wouldn't store LOTS of images in a database.
database for data
filesystem for files
Disaster recovery is absolutely no fun when you have terabytes of image data stored in the database. You're better off finding a better way to distribute your data to make it more reliable, etc. And of course, all the overhead mentioned above is multiplied when replicating and so on.
Just don't do it!
This really seems like a KISS (keep it simple, stupid) problem. File systems are made to handle storing picture files easily; doing it in a database is harder, and it's easier to mess up the data. Why take a performance hit, and all the difficulty in the SQL and rendering, when you can just worry about file security? You can also handle mixed systems with NFS or CIFS. File systems are mature technology: much simpler, more robust.
I stored images in a database for a demonstration application. The reason I did it was security - deleting a record that I shouldn't have wasn't a big problem, but deleting a file I shouldn't have might have been a problem!
If performance became an issue, I would have investigated whether rogue file deletion was a real possibility or not.
If the images are pulled out of the database on a regular basis, I would always try to use the filesystem.
If they only need to be pulled out once in a while, and saving them in the database makes life easier, I have no problem at all with that.
I'm using PHP. I want to store temporary data in MySQL. I also have a 64GB Memcached server, so I'd like to use it to store temporary data instead. But I'm unsure about the performance of MySQL versus the Memcached server.
Which is the best (fastest and most reliable) way to store and retrieve temporary data: MySQL or Memcached?
The answer is...it depends. Memcached is very fast and, even better, you can add multiple machines or shards to your memcached pool more or less invisibly to your application. MySQL has a variety of storage engines (and potentially replication) which can persist your data if you need to store it.
So it depends on how "temporary" your temp data needs to be. A good way of thinking about it is this: if your server(s) reboot, do you care if the data is there or not?
Data in a memcached shard will be gone if the memcached instance is restarted, but it is often much faster to access than data in a MySQL table. Data in MySQL will be more persistent but will be slower to access.
MySQL does have a memory storage engine. Memory storage is fast compared to most other MySQL lookups, and data in it will disappear if the server or service restarts.
If you are choosing between that and Memcached, usually Memcached is the better choice (although it always will depend on your own data - YMMV). The MySQL memory engine has strict size limits and you will need to prune your data or risk hitting them. With Memcached, you may hit size limits but the Memcached service will simply start evicting values instead of throwing errors. Evictions aren't great but at least they won't cause errors you need to handle and you can usually prevent them by watching your metrics and setting appropriate TTLs on your content.
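To illustrate the difference in failure mode (this is not memcached's actual implementation, which manages memory in slab classes), here is a small in-process Python cache that, like memcached, evicts the least recently used entry when full instead of rejecting the write, and drops entries whose TTL has passed. The class name and parameters are hypothetical; clock is injectable so the behavior can be tested without sleeping.

```python
import time
from collections import OrderedDict


class EvictingCache:
    """Bounded key/value store: full means evict (LRU), never error."""

    def __init__(self, max_items, clock=time.monotonic):
        self.max_items = max_items
        self.clock = clock
        self._data = OrderedDict()  # key -> (expires_at, value)

    def set(self, key, value, ttl):
        self._data[key] = (self.clock() + ttl, value)
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # evict least recently used

    def get(self, key, default=None):
        item = self._data.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:  # TTL elapsed: treat as a miss
            del self._data[key]
            return default
        self._data.move_to_end(key)  # mark as recently used
        return value
```

A caller therefore always has to be prepared for a miss and fall back to the persistent store, which is exactly the "do you care if the data is there after a reboot?" question above.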
First, what I intend to do is to use memory to store the most recent "user update" records for each user.
I am new to MySQL. How can I create tables in memory?
The official website says we can set ENGINE = MEMORY when creating a table. But the documentation seems to say that in-memory tables are only for reading, not for writing.
I have simply no idea how to do that.
I have been stuck on this problem for a few days. I can't install memcache or any PHP extension on the server because I'm not on a Virtual Private Server; all I can do is upload scripts and files to the httpdocs folder. I have also tried using flat files as a buffer/cache, but I found that I cannot write or create files in the server's file directory because permission is denied, and I am not allowed to change the permissions.
Using MySQL to buffer may be the only choice left for me. Hope someone can give me some hints.
Thank you.
p.s. I am using Linux Apache server running PHP, with MySQL as DB.
ENGINE = MEMORY tables can be used for both reads and writes.
The only thing to be careful of is that all data in a memory table disappears when the server crashes, is turned off, rebooted, etc. (As you would expect for an in-memory table.)
You really should read carefully about the MEMORY engine in MySQL. The data is stored in RAM, so when the server is powered off or rebooted, the RAM is cleared and the data is wiped. A MEMORY table should be the fastest table type to access in MySQL, but it only stores temporary data, with no guarantees.
If I understood right, you are trying to build a static cache of some data generated by PHP, aren't you? The easiest way is to write it as a plain file cache in your www directory, either HTML or JS. If you can't chmod your directory to make it writable, then storing the data in MySQL should be fine too, but only if that actually helps.
The idea of caching data is to reduce SQL queries, reduce disk I/O, and reduce code generation. But a MEMORY table costs a lot of memory. Storing the data in a normal MyISAM table should be fine too, and will save you a lot of background work.
However, there are two things to consider: (1) what happens if the cache does not exist when it is accessed, and (2) whether the cache is up to date.
Giving each result some sort of key is a good idea: PHP checks for cached data first; if it doesn't exist, it generates the cache and then displays it; otherwise, it displays the cache directly.
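A minimal sketch of that check-then-generate logic, written in Python with sqlite3 in place of a MyISAM table. The page_cache table, its columns and the generate callback are hypothetical. It covers both points above: a missing key and a stale entry (older than max_age seconds) both trigger regeneration. In MySQL, REPLACE INTO or INSERT ... ON DUPLICATE KEY UPDATE would play the role of SQLite's INSERT OR REPLACE.

```python
import sqlite3
import time

CREATE_CACHE_TABLE = """
CREATE TABLE IF NOT EXISTS page_cache (
    cache_key  TEXT PRIMARY KEY,
    body       TEXT NOT NULL,
    created_at REAL NOT NULL
)
"""


def get_cached(conn, key, max_age, generate, now=time.time):
    """Return the cached body for `key`, regenerating it if it is missing
    or older than `max_age` seconds. `generate` builds the fresh body."""
    row = conn.execute(
        "SELECT body, created_at FROM page_cache WHERE cache_key = ?", (key,)
    ).fetchone()
    if row is not None and now() - row[1] < max_age:
        return row[0]  # 1) the cache exists and 2) it is still fresh
    body = generate()
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO page_cache (cache_key, body, created_at) "
            "VALUES (?, ?, ?)",
            (key, body, now()),
        )
    return body
```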
I have a LAMP server with 256MB RAM (poor man's server in cloud). I have an app written to run on this machine. Currently people upload images and they go straight into mysql as BLOB.
There are concerns that this might be a very memory-consuming operation and that we should move over to plain files. Can someone tell me if these concerns are valid? (Is it worth the effort of changing a lot of code that's already written, given that we will have sufficient RAM in the next 6 months?)
As a general rule when should we store images in DB and when as files?
To read a BLOB in MySQL, you need three times as much memory as the BLOB itself occupies (it gets copied into several buffers).
So yes, reading a BLOB in MySQL consumes more memory than reading a file.
You should store them in the file system for several reasons:
The images are easily accessible via other apps (shell, FTP, www, etc...),
It's less resource intensive (including memory) to read them from the file system than from a database
If the database gets corrupted, the images are safe.
You also won't have tables bumping up against their size limits (determined by OS file size limits), which slows them down and makes them require more resources to read.
The only time you should consider storing images in the DB is when they are used in transaction processing, and even then, there are numerous workarounds to that when storing in the file system.
To summarize:
Database Storage:
Pros:
Assures referential integrity
Easier backup strategy
Easier clustering (database cluster)
Cons:
Higher cost in memory usage and storage
Hard to scale
Additional code must be written to support HTTP caching
Requires a database and associated querying code
File System Storage
Pros:
Low memory footprint (more efficient)
Storage equal to file size
Easy retrieval and storage
Allows the web server to control caching
Cons:
Referential integrity not assured
Backups are not always in sync with database backups
Requires additional backup strategy
If referential integrity of your images is important, store them in the database. The advantage is that a backup of your database always keeps your rows and images in sync. It does mean, though, that storing and retrieving them is a bit more costly resource-wise.
If the images themselves are not that important, store them as files. It allows for fast and simple retrieval and storage. The downside of using files though is that your backup strategy becomes more complicated and your files will not always be in sync with your database rows.
I personally always store them in the database. For me, the rewards are greater than the cost. That's not always the case, though, and you should look at your application's requirements to see which is best for you.
Some large websites are using BLOBs to store their website content. Flickr's use of BLOBs is actually well documented. To answer your question though, file storage is more memory efficient than database storage.
If it is for serving on a web page, I would use the plain file system and store the image's file name as text in the database. Apache and browsers usually do an amazing job of caching static files.
Even though in theory, you could achieve similar performance serving images from a database, the amount of work you need to do for it does not justify this selection given that the only advantage I can think of is a more cohesive database (with a simple DB dump you get ALL your data: images + data).
If you have a lot of files to keep track of, or they are very large, I'd store them as files. Especially if these files are to be accessed via the web, in which case you can offload all that effort from the SQL server and let the web server handle the transfer.
A good way to track images is to name them using the primary key, and then keep track of the original file name (if you need it) in the database. This way you always know which file connects to which row. Also, if you have many files (thousands, millions, ...), you might consider 'hashing' them into directories, so that 1-1000 are stored in /1, 1001-2000 are stored in /2, etc. Some OSes see a bit of slowdown when you get a large number of files in a single directory.
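The bucketing scheme described above, as a small Python function. The function name, the root path and the per_dir default of 1000 are illustrative assumptions; primary keys are assumed to start at 1.

```python
import os


def image_path(root, image_id, ext, per_dir=1000):
    """Map a primary key to root/<bucket>/<id>.<ext>, where ids 1-1000 go
    to bucket 1, 1001-2000 to bucket 2, and so on."""
    if image_id < 1:
        raise ValueError("primary keys are assumed to start at 1")
    bucket = (image_id - 1) // per_dir + 1
    return os.path.join(root, str(bucket), f"{image_id}.{ext}")
```

The original file name stays in the database row, so the on-disk name never has to be unique or safe beyond the key itself.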
I need a simple way for multiple running PHP scripts to share data.
Should I create a MySQL DB with a RAM storage engine, and share data via that (can multiple scripts connect to the same DB simultaneously?)
Or would flat files with one piece of data per line be better?
Flat files? Nooooooo...
Use a good DB engine (MySQL, SQLite, etc). Then, for maximum performance, use memcached to cache content.
In this way, you have the ease and reliability of sharing data between processes using proven server software that handles concurrency, etc... But you get the speed of having your data cached.
Keep in mind a couple things:
MySQL has a query cache. If you are issuing the same queries repeatedly, you can gain a lot of performance without adding a caching layer. (Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0.)
MySQL is really fast anyway. Have you load-tested to demonstrate it is not fast enough?
Please don't use flat files, for the sanity of the maintainers.
If you're just looking to have shared data, as fast as possible, and you can hold it all in RAM, then memcached is the perfect solution.
If you'd like persistence of data, then use a DBMS, like MySQL.
Generally, a DB is better, however, if you are sharing a small, mostly static amount of data, there might be performance benefits (and simplicity) of doing it with flat files.
Anything other than trivial data sharing and I would pick a DB however.
1- Where flat files can be useful:
Flat files can be faster than a database, but only in very specific applications.
They are faster if the data is read from start to finish without any searching or writing.
If the data doesn't fit in memory and needs to be read in full to get the job done, a flat file 'can' be faster than a database. Flat files also shine if there are a lot more writes than reads: with most default database setups, read queries have to wait for writes to finish so that indexes and foreign keys are maintained, which usually makes write queries slower than simple reads.
TL;DR version:
Use flat files for job-based systems (e.g., simple log parsing), not for web search queries.
2- Flat file pitfalls:
If you're going with a flat file, you will need to synchronize your scripts when the file changes, using a custom lock mechanism. That can lead to slowdowns, corruption, or even deadlocks if you have a bug.
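For reference, a minimal lock-mechanism sketch in Python using the POSIX advisory lock fcntl.flock (PHP exposes the same kind of lock through its flock() function). Writers take an exclusive lock and readers a shared one, so a reader never sees a half-appended line. The function names are hypothetical; the locks are advisory, so every script that touches the file has to use them, and this does not work on Windows.

```python
import fcntl
import os
import tempfile


def append_record(path, line):
    """Append one line to a shared flat file under an exclusive lock."""
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder
        try:
            f.write(line.rstrip("\n") + "\n")
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def read_records(path):
    """Read all lines under a shared lock (many readers, no writer)."""
    with open(path, "r", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return [line.rstrip("\n") for line in f]
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "shared.txt")
        append_record(p, "alpha")
        print(read_records(p))
```

Every lock held is a point where other scripts wait, which is where the slowdowns and deadlock risk mentioned above come from.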
3- RAM-based database?
Most databases have an in-memory cache for query results and search indexes, which makes them very hard to beat with a flat file. Because they already cache in memory, making them run entirely from memory is, most of the time, ineffective and dangerous. It's better to tune the database configuration properly.
If you're looking to optimize performance using RAM, I would first look at running your PHP scripts, HTML pages, and small images from a RAM drive. That's where the caching mechanism is more likely to be crude and to hit the hard drive systematically for unchanging static data.
Better results can be reached with a load balancer and clustering with backplane connections, up to a RAM-based SAN array. But that's a whole other topic.
5- Can multiple scripts connect to the same DB simultaneously?
Yes. In PHP (client side), the function that opens a persistent connection, which later requests can reuse instead of reconnecting, is mysql_pconnect (http://php.net/manual/en/function.mysql-pconnect.php). Reusing connections this way is a form of connection pooling. Note that the old mysql_* extension was removed in PHP 7.0; mysqli (with a "p:" host prefix) and PDO (PDO::ATTR_PERSISTENT) offer persistent connections as well.
You can configure the maximum number of open connections in php.ini, I think. A similar setting on the MySQL server side (max_connections, in /etc/mysql/my.cnf) defines the maximum number of concurrent client connections.
You need this in order to take advantage of the CPU's parallel processing and to avoid PHP scripts waiting for each other's queries to finish. It greatly increases performance under heavy load.
There is also a connection/thread pool in the Apache configuration for regular web clients; see httpd.conf.
Sorry for the wall of text, was bored.
Louis.
If you're running them on multiple servers, a filesystem-based approach will not cut it (unless you've got a consistent shared filesystem, which is unlikely and may not be scalable).
Therefore you'll need a server-based database anyway to allow the sharing of data between web servers. If you're serious about either performance or availability, your application will support multiple web servers.
I would say that the MySQL DB would be the better choice unless you have some mechanism in place to deal with locks on the flat files (and some way to control access). In this case, the DB layer (regardless of the specific DBMS) acts as an indirection layer, so you don't have to worry about it.
Since the OP doesn't specify a web server (and PHP can actually run from the command line), I'm not certain that caching technologies are what they're after here. The OP could be looking to do some sort of on-the-fly data transformation that isn't website-driven. Who knows.
If your system has a PHP cache (that caches compiled PHP code in memory, like APC), try putting your data into a PHP file, as PHP code. If you have to write data, there are some security issues.
I need a simple way for multiple running PHP scripts to share data.
APC and memcached are both good options, depending on context. Shared memory may also be an option.
Should I create a MySQL DB with a RAM storage engine, and share data via that (can multiple scripts connect to the same DB simultaneously?)
That's also a decent option, but will probably not be as fast as APC or memcached.
Or would flat files with one piece of data per line be better?
If this is read-only data, that's a possibility -- but may be slower than any of the options above. Especially if the data is large. Rather than writing custom parsing code, however, consider simply building a PHP array, and include() the file.
If this is a datastore that may be accessed by several writers simultaneously, by all means do NOT use a flat file! Writing to a flat file from multiple processes is likely to lead to file corruption. You can lock the file, but you risk lock contention issues, and long lock wait times.
Handling concurrent writes is the reason applications like mysql and memcached exist.