I want to create an image gallery and obviously, it must have images in it.
I've been wondering which is better: storing the images in a directory and retrieving them one by one, or storing them in the database as BLOB data?
Thank you people! Cheers!
I am willing to learn either of the methods so please enlighten me.
This question has been debated for many years. Advocates will make strong cases for each. Neither side has ever been definitively proven right in all cases.
Both methods break down when the number of images that you need to warehouse gets very large. Both databases and file systems have become better in the years since I benchmarked both options against each other. At that time, you could fix the performance hit on the "file system" option by creating a hierarchy of directories instead of putting all the files in one directory. By now, file systems may have been optimized so that they don't choke when the number of directory entries gets large.
This is truly a "your mileage may vary" situation. Factors include which file system and which database engine you use, how many images you have, and their average size. Will you be "tagging" the images in the database as well as storing them?
Typically, you have to just try all the options until you find something that works in your configuration.
Definitely stress test it. If you think you need to store one million images, don't test with five and assume that it will scale.
Test it with at least a million images, if not with two or five million.
That said, if you only need to store 1,000 images or fewer (maybe even 10k or fewer), and if you need to index the images by attributes like date, location, subject matter, etc., then at the risk of offending many well-meaning people, I am going to recommend storing the image as a blob in the database. The convenience of using the database to join the image to the metadata will outweigh anyone's performance concerns at that scale. When you store the metadata in a database with a pointer to a file in the file system, it is too easy for things to get out of sync. The file gets moved, renamed, deleted, etc.; your database won't know, and now your system is broken.
Using a database will ensure the integrity of your data, including the images.
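To make the "blob plus metadata in one table" idea concrete, here is a minimal sketch. It is in Python with the standard-library sqlite3 module rather than PHP and MySQL, and the table and column names (images, subject, taken_on, data) are made up for illustration.

```python
import sqlite3

def open_gallery(path=":memory:"):
    """Open (or create) a small gallery database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS images (
               id       INTEGER PRIMARY KEY,
               subject  TEXT NOT NULL,
               taken_on TEXT NOT NULL,   -- ISO date string, e.g. 2020-01-02
               data     BLOB NOT NULL    -- the image bytes themselves
           )"""
    )
    return conn

def add_image(conn, subject, taken_on, data):
    """Store the image bytes and their metadata in a single row."""
    cur = conn.execute(
        "INSERT INTO images (subject, taken_on, data) VALUES (?, ?, ?)",
        (subject, taken_on, data),
    )
    conn.commit()
    return cur.lastrowid

def images_by_subject(conn, subject):
    """Fetch metadata and bytes together, so they cannot drift out of sync."""
    rows = conn.execute(
        "SELECT id, taken_on, data FROM images WHERE subject = ? ORDER BY taken_on",
        (subject,),
    )
    return [(image_id, taken_on, bytes(data)) for image_id, taken_on, data in rows]
```

Because the bytes live in the same row as the metadata, deleting or updating the row handles both at once, which is the integrity argument made above.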
First you will need to upload the file to a predefined folder, and then you can store the name of that file in your database (in a varchar column).
When you fetch those records, use that name to rebuild the image path wherever the application requires it.
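A rough sketch of that flow, in Python with sqlite3 standing in for PHP and MySQL. The folder, table, and function names are assumptions for the example, and the temporary directory only stands in for a real, fixed upload folder.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical upload folder; a real application would use a fixed, configured path.
UPLOAD_DIR = Path(tempfile.mkdtemp(prefix="uploads_"))

def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id INTEGER PRIMARY KEY, name VARCHAR(255) NOT NULL)"
    )
    return conn

def save_upload(conn, filename, data):
    """Write the file into the upload folder and record only its name."""
    safe_name = Path(filename).name  # drop any directory parts sent by the client
    (UPLOAD_DIR / safe_name).write_bytes(data)
    cur = conn.execute("INSERT INTO images (name) VALUES (?)", (safe_name,))
    conn.commit()
    return cur.lastrowid

def image_path(conn, image_id):
    """Rebuild the full path from the stored name, or return None if unknown."""
    row = conn.execute(
        "SELECT name FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    return None if row is None else UPLOAD_DIR / row[0]
```

Note that this sketch does nothing about two uploads sharing the same name; the second would overwrite the first.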
I have over 1.3 million images that I have to compare with each other, and a few hundred more are added per day.
My company takes an image and creates a version that can be used by our vendors.
The files are often very similar to each other. For example, two different companies can send us two different images, a JPG and a GIF, both with the McDonald's logo, with months between the submissions.
What happens is that we end up creating the same logo twice when we could simply copy the one already created, or at least suggest it as a possible starting point for the artists.
I have looked around for algorithms to create a fingerprint or something similar that will allow me to do a simple query when a new image is uploaded. Time is not much of an issue: if it takes 1 second to create a fingerprint, fingerprinting all 1.3 million images will take about 15 days, but the savings will be so great that we might even get 3 or 4 servers to do it.
I am fluent in PHP, but if the algorithm is in pseudocode or even C I can read it and try to translate (unless it uses some C specific libraries)
Currently I compute an MD5 of every image to catch the ones that are exactly the same. This question came up when I was thinking of resizing each image and running MD5 on the resized version to catch the ones that have been saved in a different format or resized, but then I would still not have good enough recognition.
If I didn't mention it, I will be happy with something that just suggests possible "similar" images.
EDIT
Keep in mind that the check needs to be done multiple times per minute, so the best solution is one that gives me some values per image that I can store and use in the future to compare with the image that I am looking at without having to re-scan the whole server.
I am reading some pages that mention histograms, or resizing the image to a very small size, stripping any tags, converting it to grayscale, hashing the resulting file, and using that hash for comparison. If I am successful I will post the code/answer here.
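The "shrink, grayscale, hash" idea is often implemented as an average hash. Below is a simplified sketch in pure Python, not a tested production algorithm. It assumes the image has already been decoded into a 2D list of grayscale values (0 to 255) by some image library, and it uses a crude nearest-neighbour downscale. The function names are made up for the example.

```python
def average_hash(pixels, size=8):
    """Return a size*size-bit fingerprint of a grayscale image.

    pixels is a list of rows, each a list of 0-255 grayscale values
    (decoding the JPG/GIF into this form is assumed to happen elsewhere).
    """
    height, width = len(pixels), len(pixels[0])
    # Crude nearest-neighbour downscale to size x size samples.
    small = [
        pixels[(y * height) // size][(x * width) // size]
        for y in range(size)
        for x in range(size)
    ]
    mean = sum(small) / len(small)
    bits = 0
    for value in small:
        # One bit per sample: brighter than the average or not.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits

def hamming_distance(a, b):
    """Number of differing bits; small distances suggest similar images."""
    return bin(a ^ b).count("1")
```

The fingerprint is a single integer per image, so it can be stored in a table once and compared against new uploads without rescanning the files. Which distance threshold counts as "similar" has to be tuned on real data.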
Try using file_get_contents and hash_file:
http://www.php.net/manual/en/function.hash-file.php
If the hashes match, then you know the files are exactly the same.
EDIT:
If possible, I think storing the image hashes and the image paths in a database table would help you limit server load. It is much easier to run the hash algorithm once on your initial images and store each hash in a table. Then, when a new image is submitted, you can hash it and do a lookup on the database table. If the hash is already there, discard the image. You can use the hash as the table index, so once you find a match you don't need to check the rest.
The other option is to not use a database, but then you would always have to do an O(n) lookup: hash the incoming image and then run an in-memory O(n) search against all saved images.
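A minimal sketch of the hash table lookup, in Python with sqlite3 instead of PHP and MySQL. The table name image_hashes and the function names are assumptions, and SHA-256 is swapped in for the MD5 mentioned in the question.

```python
import hashlib
import sqlite3

def open_index(path=":memory:"):
    conn = sqlite3.connect(path)
    # The digest is the primary key, so lookups use the index instead of a scan.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS image_hashes ("
        "digest TEXT PRIMARY KEY, path TEXT NOT NULL)"
    )
    return conn

def register_image(conn, path, data):
    """Return the path of an identical stored image, or None if this one is new."""
    digest = hashlib.sha256(data).hexdigest()
    row = conn.execute(
        "SELECT path FROM image_hashes WHERE digest = ?", (digest,)
    ).fetchone()
    if row is not None:
        return row[0]  # exact duplicate: the caller can discard the upload
    conn.execute(
        "INSERT INTO image_hashes (digest, path) VALUES (?, ?)", (digest, path)
    )
    conn.commit()
    return None
```

This only catches byte-for-byte duplicates; a resized or re-encoded copy of the same logo produces a different digest.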
EDIT #2:
Please view the solution here: Image comparison - fast algorithm
To speed up the process, sort all the files by size and compare their contents only if two sizes are equal. For comparing the contents, hash comparison is also the fastest way. Hope this helps.
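As a sketch of that two-step check (Python, standard library only; the function name is made up): group files by size first, and hash only the files that share a size with at least one other file.

```python
import hashlib
import os
from collections import defaultdict

def find_exact_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.path.getsize(path)].append(path)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a unique size cannot have a duplicate, so skip hashing
        by_digest = defaultdict(list)
        for path in same_size:
            with open(path, "rb") as handle:
                digest = hashlib.sha256(handle.read()).hexdigest()
            by_digest[digest].append(path)
        groups.extend(group for group in by_digest.values() if len(group) > 1)
    return groups
```

Reading the file size is much cheaper than reading and hashing the whole file, which is where the speedup comes from.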
I'm making an Android application which takes a photo and pushes the image (as a base64-encoded string) to a PHP script. From there I'll be storing data about the image inside a MySQL database.
Would it be wise to store the image inside the database (since it's passed as a base64 string), or would it be better to convert it back to an image and store it on the filesystem?
A base64-encoded image takes too much space (about 33% more than the binary equivalent).
MySQL offers binary types (BLOB, MEDIUMBLOB); use them.
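The 33% figure follows from base64 turning every 3 bytes into 4 characters. A quick check in Python, with random bytes standing in for a real image:

```python
import base64
import os

raw = os.urandom(3000)            # stand-in for binary image data
encoded = base64.b64encode(raw)   # what the client would send as a string

overhead = len(encoded) / len(raw) - 1
print(len(raw), len(encoded), f"{overhead:.0%}")  # prints: 3000 4000 33%

# Decoding on the server recovers the original bytes for binary storage.
assert base64.b64decode(encoded) == raw
```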
Alternatively, most people prefer to store in the DB only a key to a file, which the filesystem will store more efficiently, especially if it's a big image. That's the solution I prefer for the long term. I usually use a SHA1 hash of the file content to form the path to the file, so that nothing is stored twice and it's easy to retrieve the record from the file if I want to. I use a three-level file tree, with the first two levels made from the first two characters and from characters 3 and 4 of the hash respectively, so that no directory has too many direct children. Note that this is, for example, the logic of git's storage.
The advantage of storing images in the DB is that backups are easier to manage, especially while your project is small. The database will offer you a cache, but so will your server and the client. It's hard to decide a priori which will be fastest, and the difference won't be big (I suppose you don't make too many concurrent writes).
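Here is roughly what that hash-based layout could look like, sketched in Python. Using the full digest as the file name and a ".jpg" suffix are my assumptions; the answer only specifies the two directory levels.

```python
import hashlib
from pathlib import Path

def content_path(root, data, suffix=".jpg"):
    """Build root/ab/cd/<digest><suffix> from the SHA1 of the file content."""
    digest = hashlib.sha1(data).hexdigest()
    # Level 1: characters 1-2, level 2: characters 3-4, level 3: the file itself.
    return Path(root) / digest[:2] / digest[2:4] / (digest + suffix)

def store(root, data, suffix=".jpg"):
    """Write the content once; identical content maps to the same path."""
    path = content_path(root, data, suffix)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path
```

The database row then only needs the digest, and two users uploading the same picture end up sharing one file on disk.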
I've done it both ways, and every time I come back to code where I stored binary data in a MySQL table I always switch it to filesystem with a pointer in the MySQL table.
When it comes to performance, you're going to be much better off going to the FS as pulling multiple large BLOBs from a MySQL server will tend to saturate its pipe quickly. Usually it's a pipe you don't want clogged.
You could always save the base64_encode($image) in a file and only store the file path in the database, then use fopen() to get the encoded image.
My apologies if I didn't understand the question correctly.
"wise" is pretty subjective, I think. I think it would be wise from a "keep people from directly linking to my images" perspective. Also, it may be helpful as far as if you decide you need to change up dir structures etc.. it might make it easier on you (but this really depends on how you wrote your scripts to begin with..) but other than that... offhand I can't really think of any benefits to doing this.
I have no idea how the big websites save pictures on their servers. Could anyone tell me how they save the pictures that are uploaded by their users in their database?
I was thinking that maybe they just save the file (the picture) in some path and save that path in the database. Is that right?
But I want to do it this way. Is this right? For example, a website named www.photos.com. When a user uploads a picture I would create a folder of the user name and save those pictures in that folder.
I believe we can create a directory using php file concepts. So when a new user uploads his picture or file, I want to create a directory with his name.
Example: if the user name is john, I would create a directory like this on photos.com: www.photos.com/john/, and then save all his pictures to this directory when he uploads them. Is this the right way to do this?
I have no one here that has good knowledge of saving the files to servers so please let me know how to do this? I want to do it the correct and secure way.
Big websites don't save pictures to the database; they store them on disk.
They save a reference to the picture's location in a table, and then link from there.
Why? Performance.
Pulling heavy content from a database is a huge performance bottleneck. And databases don't scale horizontally that well, which would make the problem even bigger. All big sites use static content farms to deal with static content such as images. Those are servers that couldn't care less about your identity.
How do they keep the pictures really private you might ask? They don't.
The picture's link is, in itself, the address and the password. Let's take Facebook, for example. If I store a private picture on my account you should not be able to open it. But, as long as you have the correct address you can.
This picture is private. Notice the filename
10400121_87110566301_7482172_n.jpg
(facebook changes the url from time to time so the link may be broken)
It's non-sequential. The only way to get the picture is to know its address.
Based on a previous user photo you can't guess the next one.
It has so much entropy that even if you start taking random wild guesses you'll have an enormous number of failures and, if you do get to a picture, you won't be able to work out the owner's identity from it, which in itself is protection through anonymity.
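Generating a name like that is straightforward. A small Python sketch using the standard secrets module follows; the function name and the 16-byte default are my choices, not how Facebook actually does it.

```python
import secrets

def random_image_name(extension="jpg", nbytes=16):
    """Return a non-sequential, hard-to-guess file name.

    16 random bytes give 128 bits of entropy, encoded as URL-safe text.
    """
    return f"{secrets.token_urlsafe(nbytes)}.{extension}"
```

The mapping from this random name to the owning user lives only in the database, so knowing one URL tells you nothing about the next one.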
Edit (why you should not store images in a "username" folder):
After your edit it became clear that you do intend to put files on disk and not in the database. This edit covers the new scenario.
Even though your logic (create a folder per user) seems more organized, it creates problems when you start having many users and many pictures. Imagine that your servers have 1 TB of disk space. And let's also imagine that 1 TB roughly matches the load the server can handle.
Now you have 11 users. Assume they start uploading at the same time and each will upload more than 100 GB of files. When they reach 91 GB each, the server is full and you must start storing images on a different server. If that user/folder structure is followed, you would have to select one of the users and migrate all of his data to a different server. It also imposes a hard limit: no single user can upload more than 1 TB of files.
Should I store all files in the same folder, then?
No, big sites generally store files in sequential folders (/000001/, /000002/, etc.) with a fixed number of files per folder. This is mainly for file-system performance reasons.
More on how many files in a directory is too many?
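The sequential-folder scheme can be derived directly from a numeric image id. A sketch in Python, where the limit of 1000 files per folder and the naming of the file by its id are assumptions for illustration:

```python
FILES_PER_FOLDER = 1000  # hypothetical limit; tune it for your file system

def bucket_path(image_id, extension="jpg"):
    """Map a sequential image id to /000001/, /000002/, ... style folders."""
    folder = image_id // FILES_PER_FOLDER + 1
    return f"/{folder:06d}/{image_id}.{extension}"
```

With these numbers, ids 0 to 999 land in /000001/ and id 1000 starts /000002/. Since the folder is not tied to a user, any folder can live on any server.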
It is usually a bad idea to store images in your database (if your site is popular). The database is, traditionally, one of the main bottlenecks in almost any application out there. There is no need to load it more than necessary. If the images are in the filesystem, many HTTP servers (nginx, for example) will serve them very efficiently.
The biggest social network in Russia, Vkontakte does exactly this: store images in the filesystem.
Another big social network implemented a sophisticated scalable blob storage. But it's not available to the public, AFAIK.
Summary of this answer: don't store blobs in the database.
is this the right way to do
Yes.
The only thing I'd suggest is to use an id rather than a name.
www.photos.com/albums/1234/ would be okay for starter.
Image management may best be achieved by physically uploading images to the server and then recording file location and image details in a database. Subsequently, a Search Form could be configured to permit the user to do a text search, part number search, or other queries. A PHP script could be written to produce a valid HTML image tag based on data found in the table.
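The answer talks about a PHP script; the same idea in Python looks like the sketch below. The row keys "path" and "description" are assumed column names, not something from the original.

```python
import html

def image_tag(row):
    """Build an HTML <img> tag from a database row (a dict in this sketch)."""
    # Escape both values so stored text cannot break out of the attributes.
    src = html.escape(row["path"], quote=True)
    alt = html.escape(row["description"], quote=True)
    return f'<img src="{src}" alt="{alt}">'
```

For example, a row with path /img/a.jpg and description Tom & "Jerry" yields a tag whose alt text is safely escaped.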
Uploading images into a MySQL™ BLOB field is a bad idea. Such image data is generally problematic if the images are much larger than thumbnails. If the images are large, you can end up having to copy/paste one SQL INSERT statement at a time (into phpMyAdmin). If the images are large and an SQL INSERT statement is broken into two lines by your text editor, you'll never be able to restore the image.
What's the advantage of storing images or the path to images in a database compared to directly linking to the images from your script?
Edit: Isn't hardcoding the urls in the script also faster since you don't have to do a database lookup for every image in your webpage?
Because you can dynamically alter the paths later, or be able to manipulate them, otherwise your 'script' would have to be updated EVERYWHERE (imagine your script(s) grow to large sizes).
Database makes management of data easier, and eliminates hard coding in your example in scripts.
It is never good to hard-code something.
EDIT
I just noticed you said 'storing image'. I wouldn't store images in the DB; save them to the file system and reference them by path, as you stated in your question.
It's impossible to answer such a vague question.
What images are you talking about? Design images? Photo gallery images? Avatar images? These are all different cases, each with its own solution. Storing image names in the database does any good in only one of these three cases: the gallery, where it makes it easier to group, arrange and interlink images. For the other cases there is not a single reason to store image names in the database.
Anyway, all of this applies to image names only, as there is not a single reason to store any URL or path besides the image name. The URL should be computed from some rules, not hardcoded.
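"Computed from rules" can be as simple as one function that every script calls. A Python sketch, where the base URL and the size segment are invented for the example:

```python
from urllib.parse import quote

# Hypothetical base URL; changing it here changes every generated link.
IMAGE_BASE_URL = "https://static.example.com/gallery"

def image_url(name, size="full"):
    """Build an image URL from the stored image name and a size rule."""
    return f"{IMAGE_BASE_URL}/{size}/{quote(name)}"
```

Only the bare name ("my photo.jpg") is stored; moving the images to another host or directory means editing one constant instead of every row or script.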
I do not typically do this with basic site images, but the definite advantage can be for scalability purposes. If the image is going to show up in different scripts, they can all reference to the db, thus giving you the ability to only have to change the url in one place.
I have 500k unique images in my folder/directory. I want to call each one by name, and all the names are stored in a MySQL database. But I heard that images can be stored in a database. So my question is: which is the faster option for displaying images? Do I need to store them in MySQL, or can I keep the method I am following now?
If I need to store in mySQL then how do I create a table for it, and how do I store all these images?
This has been answered quite a few times. But you haven't said what type of application you are building. If you are building a web application, then storing the images on the file system has advantages for caching.
Storing Images In Filesystem As Files Or In BLOB Database Field As Binaries
Storing Images in DB - Yea or Nay?
It's easy enough to store the images in a table, I would definitely avoid it though if your images are quite large.
I do not think 500k entries in a single directory will go over very well: How many files can I put in a directory?
Once upon a time, Linux ext3 started running slowly for very simple operations once a directory accumulated around 2000 files. (O(N) sorts of slowly!) After the htree patches were merged, large directory performance improved drastically, but I'd still recommend partitioning your image store based on some very easy criteria. (Squid web cache, for example, creates directory trees like 01/, 02/, and places files based on the first two characters in the filename.)
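If you already have a flat directory with 500k files, a one-off script can move them into such a tree. A sketch in Python follows. The function names are hypothetical, and writing into a separate destination root is my choice to avoid clashes between file and folder names.

```python
import shutil
from pathlib import Path

def partitioned_path(dest_root, name):
    """Where a file lives after partitioning: dest_root/<first two chars>/<name>."""
    return Path(dest_root) / name[:2].lower() / name

def partition_directory(flat_dir, dest_root):
    """Move every file from flat_dir into two-character buckets under dest_root."""
    moved = 0
    for entry in list(Path(flat_dir).iterdir()):
        if not entry.is_file():
            continue
        target = partitioned_path(dest_root, entry.name)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(entry), str(target))
        moved += 1
    return moved
```

If most of your file names share a prefix (IMG_0001.jpg and so on), the first two characters will spread the files badly. In that case bucketing on a hash of the name is a common alternative.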
Do not store that much data in a database like MySQL, especially if you are not very familiar with it, as you seem to be. Keep the images on the file system.
"I have 500k unique images in my folder/directory. I want call it by name and all names are stored in Mysql database. but I heard that images can be store in database. so my question is which is more fastest option to display image faster."
You should use the file system. Storing the images in the database is not going to work very well. You should read Facebook Photo Storage Architecture to learn how Facebook does it. They have the most photos in the world.
Haystack file storage:
Also interesting:
http://www.fredberinger.com/high-performance-at-massive-scale-lessons-learned-at-facebook/
Storing images in the database (in a blob datatype) is much less efficient than keeping those images on the file system.
BTW, here is an explanation of how to insert binary data into a MySQL table
If you have Redis, you could put all the images in memory, that would be the quickest way