What is the most efficient way to store 500.000 images? - php

I'm coding a basic gallery for a website with around 40.000 online people at any given time. Users will be able to create galleries and upload images.
My question is, should I make a seperate folder for each gallery and put the images in them, or make a single folder and put all images in it, but keep the gallery_id for each image in the database? Or, should I make a directory for every user, then another directory inside them for the gallery names?
How would you do this?
Ps. I need it to be as light as it can.

I would store them by id
and i would split them into folders (dependant of filesystem, some don't perform well with lots of files in 1 folder), plus it is easier to find them if you have to manually look at something
Give each file an id, then using the first 3 digits of the file name, split them into folders. (you could start your auto-increment counter at 100000 or zero pad the id, so there is at least 3 levels
/photos/1/0/3/103456.jpg
/photos/9/4/1/941000.jpg
/photos/0/0/0/000001.jpg
You can store the relationship of photo to user / gallery / etc in the database
Or if you want to see how the big boys do it
Needle in a haystack: efficient storage of billions of photos

Typically web servers don't want you to have more than a few thousand images in a single folder (I recently had to deal with 70,000 images causing super slow reads and sorts so trust me on this) so certainly not a single folder if you think you will have thousands of images. I would suggest the best solution would be to host off of amazon's S3 connected to their CDN CloudFront but if that isn't realistic you can still do several things just on your own server.
Make a separate folder for each gallery like you suggest only if you know some bounds on how large a gallery can get and have an idea of how many galleries will be created. (This is what I would suggest for your specific problem right now)
Put the image name through a hash function then use the first 1-3 characters of the hash to name folders to put the images into. The hash ensures that the images are roughly equally split among the folders and you can decide how many folders you need.
At any rate having the information of what gallery and the image id in the actual path will probably be useful to you moving forward both in code and whenever a human has to hunt bugs on the server. I would probably name the folders based on the gallery id and just make sure that no gallery has more than a few thousand images in it.

I store mine like this:
images/userid/photoid
This way I can quickly isolate user images if I need to inspect anything at a later date. It seems more organized than dropping them all in one central directory.

Related

Folder Structure for storing millions of images?

I am building a site that is looking at Millions of photos being uploaded easily (with 3 thumbnails each for each image uploaded) and I need to find the best method for storing all these images.
I've searched and found examples of images stored as hashes.... for example...
If I upload, coolparty.jpg, my script would convert it to an Md5 hash resulting in..
dcehwd8y4fcf42wduasdha.jpg
and that's stored in /dc/eh/wd/dcehwd8y4fcf42wduasdha.jpg
but for the 3 thumbnails I don't know how to store them
QUESTIONS..
Is this the correct way to store these images?
How would I store thumbnails?
In PHP what is example code for storing these images using the method above?
How am I using the folder structure:
I'm uploading the photo, and move it like you said:
$image = md5_file($_FILES['image']['tmp_name']);
// you can add a random number to the file name just to make sure your images will be "unique"
$image = md5(mt_rand().$image);
$folder = $image[0]."/".$image[1]."/".$image[2]."/";
// IMAGES_PATH is a constant stored in my global config
define('IMAGES_PATH', '/path/to/my/images/');
// coolparty = f3d40fc20a86e4bf8ab717a6166a02d4
$folder = IMAGES_PATH.$folder.'f3d40fc20a86e4bf8ab717a6166a02d4.jpg';
// thumbnail, I just append the t_ before image name
$folder = IMAGES_PATH.$folder.'t_f3d40fc20a86e4bf8ab717a6166a02d4.jpg';
// move_uploaded_file(), with thumbnail after process
// also make sure you create the folders in mkdir() before you move them
I do believe is the base way, of course you can change the folder structure to a more deep one, like you said, with 2 characters if you will have millions of images.
The reason you would use a method like that is simply to reduce the total number of files per directory (inodes).
Using the method you have described (3 levels deeps) you are very unlikely to reach even hundreds of images per directory since you will have a max number of directories of almost 17MM. 16**6.
As far as your questions.
Yeah, that is a fine way to store them.
The way I would do it would be
/aa/bb/cc/aabbccdddddddddddddd_thumb.jpg
/aa/bb/cc/aabbccdddddddddddddd_large.jpg
/aa/bb/cc/aabbccdddddddddddddd_full.jpg
or similar
There are plenty of examples on the net as far as how to actually store images. Do you have a more specific question?
If you're talking millions of photos, I would suggest you farm these off to a third party such as Amazon Web Services, more specifically for this Amazon S3. There is no limit for the number of files and, assuming you don't need to actually list the files, there is no need to separate them into directories at all (and if you do need to list, you can use different delimeters and prefixes - http://docs.amazonwebservices.com/AmazonS3/latest/dev/ListingKeysHierarchy.html). And your hosting/rereival costs will probably be lower than doing yourself - and they get backed up.
To answer more specifically, yes, split by sub directories; using your structure, you can drop the first 5 characters of the filename as you alsready have it in the directory name.
And thumbs, as suggested by aquinas, just appent _thumb1 etc to the filename. Or store in separate folders themsevles.
1) That's something only you can answer. Generally, I prefer to store the images in the database so you can have ONE consistent backup, but YMMV.
2) How? How about /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb1.jpg, /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb2.jpg and /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb3.jpg
3) ??? Are you asking how to write a file to the file system or...?
Improve Answer.
For millions of Images, as yes, it is correct that using database will slow down the process
The best option will be either use "Server File System" to store images and use .htaccess to add security.
or you can use web-services. many servers like provide Images Api for uploading, displaying.
You can go on that option also. For example Amazon

What is the best way to upload and store pictures on the site?

I have no idea how the big websites save the pictures on their servers. Could any one tell me how do they save the pictures that are uploaded by the users in their database?
I was thinking, maybe they would just save the file(the picture) in some path and just save that path in the databse is that right?
But I want to do it this way. Is this right? For example, a website named www.photos.com. When a user uploads a picture I would create a folder of the user name and save those pictures in that folder.
I believe we can create a directory using php file concepts. So when a new user uploads his picture or file, I want to create a directory with his name.
Example: if user name is john, I would create a directory like this on photos.com www.photos.com/john/ and then save all his pictures to this directory when he uploads a picture. Is this the right way to do this?
I have no one here that has good knowledge of saving the files to servers so please let me know how to do this? I want to do it the correct and secure way.
All big websites don't save pictures to the database they store them in the disk.
They save a reference to the picture's position in a table. And then link from there.
Why? Performance.
Pulling heavy content from a database is a huge performance bottleneck. And databases don't scale horizontally that well, so it would mean even a bigger problem. All big sites use static content farms to deal with static content such as images. That's servers who won't care less about your identity.
How do they keep the pictures really private you might ask? They don't.
The picture's link is, in itself, the address and the password. Let's take Facebook, for example. If I store a private picture on my account you should not be able to open it. But, as long as you have the correct address you can.
This picture is private. Notice the filename
10400121_87110566301_7482172_n.jpg
(facebook changes the url from time to time so the link may be broken)
It's non sequential. The only way to get the picture is to know it's address.
Based on a previous user photo you can't guess the next one.
It has a huge entropy so even if you start taking random wild guesses you'll have an extensive amount of failures and, if you do get to a picture, you won't be able to, from there, realize the owners identity which, in itself, is protection in anonymity.
Edit (why you should not store images in a "username" folder:
After your edit it became clear that you do intent to put files on disk and not on the database. This edit covers the new scenario.
Even though your logic (create a folder per user) seams more organized it creates problems when you start having many users and many pictures. Imagine that your servers have 1T disk space. And lets also imagine that 1T is more or less accurate with the load the server can handle.
Now you have 11 users, assume they start uploading at the same time and each will upload more than 100GB of files. When they reach 91GB each the server is full and you must start storing images on a different server. If that user/folder structure is followed you would have to select one of the users and migrate all of his data to a different server. Also, it makes a hard-limit on a user who can't upload more than 1T in files.
Should I store all files in the same folder, then?
No, big-sites generally store files in sequential folders (/000001/, /000002/, etc) having an x defined number of files per folder. This is mainly for file-system performance issues.
More on how many files in a directory is too many?
It is usually a bad idea to store images in your database (if your site is popular). Database is, traditionally, one of main bottlenecks in most any application out there. No need to load it more than necessary. If images are in the filesystem, many http servers (nginx, for example) will serve them most efficiently.
The biggest social network in Russia, Vkontakte does exactly this: store images in the filesystem.
Another big social network implemented a sophisticated scalable blob storage. But it's not available to the public, AFAIK.
Summary of this answer: don't store blobs in the database.
is this the right way to do
Yes.
The only thing I'd suggest to use not name but id.
www.photos.com/albums/1234/ would be okay for starter.
Image management may best be achieved by physically uploading images to the server and then recording file location and image details in a database. Subsequently, a Search Form could be configured to permit the user to do a text search, part number search, or other queries. A PHP script could be written to produce a valid HTML image tag based on data found in the table.
uploading images into a MySQLâ„¢ BLOB field is such a bad idea such image data is generally problematic if the images are much larger than thumbnails. If the images are large, you can end up having to copy/paste one SQL INSERT statement at a time (into phpMyAdmin). If the images are large and the SQL INSERT statement is broken into two lines by your text editor, you'll never be able to restore the image.

What is the best way to keep track of images uploaded by an user using PHP

I am planning to do a photo album website, So each user may upload as many number of images. What is the best way to keep track of images for an individual user. What should be the server configuration to handle this part.
-Lokesh
Depending on the amount of images, you will probably want to store them on a static domain. Then, have a table in whatever database you are using to store the paths to each of the images for each user.
Well like many design topics there are lots of different ways to go about it. Two ways that come to mind right now are as follows.
you could simply have a directory created on the server for each user and then have the images each use uploads saved into that directory. Ofcourse you'd want to make sure they didn't over write any existing images with images of the same name. You could do this by warning them about conflicting names or by adding some sort of noce string (like a time stamp) to the end of of the file name. This is a pretty straight forward solution and means that you can login to your server and see all the images each user has uploaded right there for you to do anything you like with.
Another idea would be to save the images in a database. This can be done by serializing the images to a string and storing it in a database. This is nice becaues it means you don't have to worry about handling directories and duplicate file names. You will have to deserialize each image when you want to display it which will put your DB under load so for a very high traffic volume site this might not really be the way to go.
There are ofcourse combinations of these ideas and many others. It really comes down to working out which solution best fits your exact needs.

Should I place all the images in the same directory?

I have one simple question. Now I'm working on a e-commerce script and of course users will be able to upload images for each product (up to 10 images for each product). So my question is should I place all the images in the same directory (it will probably be thousands after a time) or create new ones from time to time? Will this slow down the performance or cause any other problems in future if I place them all together?
Thanks in advance
Disk approach
Create an img parent directory, with a subdirectory for each product's images.
./img
./img/eggs
./img/eggs/eggs1.jpg
./img/eggs/eggs2.jpg
./img/spam
./img/spam/myspamimage.jpg
./img/cheese
...
This way you'll have all your images stored in a single tree hierarchy that makes good sense. If you're going to have a very large amount if images (say, more than 100,000) you can group the images according to creation date:
./img
./img/2010-08/eggs
./img/2010-08/eggs/eggs1.jpg
./img/2010-08/eggs/eggs2.jpg
./img/2010-09/spam
./img/2010-09/spam/myspamimage.jpg
./img/2010-09/cheese
...
This way, you will be able to move some months (probably the older ones) to an archive and keep the month subdirectory as a link to another disk.
Database approach
If you need to keep a lot of metadata on each image (e.g., username, SKU, description, copyright etc) you can store the images using arbitrary image names (probably img/img0000001.jpg, /img/img0000002.jpg, ...) and keep a database record that maps a product to its image. This is very useful for searching all the images with a certain characteristics (user, creation date, etc.) associated with them.
I suggest you split them up into separate directories (maybe /year/month/ where year and month are the year and month of the time the image was uploaded, alternatively just do product_id/).
The problem is the inode structure used by most linux file systems. It needs more operations if the number of files in a directory increased.
Like adam said.
Make default directory like Images_Products
Use the mkdir function from php.net to make directories after the user uploaded an image.
Try to make the names of the directories dynamicly with some variable from the productname they added.
Example: $beer_heineken

Upload sets of images to one single folder or seperate folders for each set?

I would be having 150-200 products on my website which could grow in the future and I have around 30 - 40 images for each product so I wanted to ask should I have separate folders for storing images of each product or save all the images in one single folder?
Thanks
I'd use a separate folder for each product - it seems much neater that way and won't end up with too many files in the directory.
It would also make it easier to iterate over all a product's images.
If each product has a unique ID number then you should probably use that for the folder name, unless you want to go the full SEO route and have something like /images/my-product-name/...jpg
That depends on the file system - with FAT32 a directory can contain up to 65,536 entries (i.e. files), so you'll probably be fine.
It also depends on whether you only ever want programmatic access, or whether you want some person to have to ever look at a directory with 65K files.

Categories