I have one simple question. I'm working on an e-commerce script, and users will be able to upload images for each product (up to 10 images per product). So my question is: should I place all the images in the same directory (there will probably be thousands after a while), or create new directories from time to time? Will placing them all together slow down performance or cause any other problems in the future?
Thanks in advance
Disk approach
Create an img parent directory, with a subdirectory for each product's images.
./img
./img/eggs
./img/eggs/eggs1.jpg
./img/eggs/eggs2.jpg
./img/spam
./img/spam/myspamimage.jpg
./img/cheese
...
This way you'll have all your images stored in a single tree hierarchy that makes good sense. If you're going to have a very large number of images (say, more than 100,000) you can group the images by creation date:
./img
./img/2010-08/eggs
./img/2010-08/eggs/eggs1.jpg
./img/2010-08/eggs/eggs2.jpg
./img/2010-09/spam
./img/2010-09/spam/myspamimage.jpg
./img/2010-09/cheese
...
This way, you will be able to move some months (probably the older ones) to an archive and replace each month's subdirectory with a link to another disk.
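For illustration, a minimal sketch of how that dated layout could be built at upload time in PHP ($productSlug and the img base directory are assumptions, not part of the answer above):
$productSlug = 'eggs';                           // hypothetical product identifier
$monthDir    = date('Y-m');                      // e.g. "2010-08"
$targetDir   = __DIR__.'/img/'.$monthDir.'/'.$productSlug;
if (!is_dir($targetDir)) {
    mkdir($targetDir, 0755, true);               // creates img/2010-08/eggs recursively
}
move_uploaded_file($_FILES['image']['tmp_name'], $targetDir.'/'.basename($_FILES['image']['name']));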
Database approach
If you need to keep a lot of metadata on each image (e.g., username, SKU, description, copyright, etc.) you can store the images under arbitrary image names (probably img/img0000001.jpg, img/img0000002.jpg, ...) and keep a database record that maps each product to its images. This is very useful for searching for all images with certain characteristics (user, creation date, etc.) associated with them.
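A rough sketch of that database approach, assuming $pdo is an existing PDO connection; the images table (id, product_id, username, created_at, path) and the $productId/$username variables are illustrative, not part of the answer:
$stmt = $pdo->prepare('INSERT INTO images (product_id, username, created_at) VALUES (?, ?, NOW())');
$stmt->execute([$productId, $username]);
$imageId  = (int) $pdo->lastInsertId();
$fileName = sprintf('img%07d.jpg', $imageId);    // e.g. img0000001.jpg
$path     = 'img/'.$fileName;
move_uploaded_file($_FILES['image']['tmp_name'], $_SERVER['DOCUMENT_ROOT'].'/'.$path);
$pdo->prepare('UPDATE images SET path = ? WHERE id = ?')->execute([$path, $imageId]);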
I suggest you split them up into separate directories (maybe /year/month/ where year and month are the year and month of the time the image was uploaded, alternatively just do product_id/).
The problem is how most Linux file systems handle directory entries (inodes/dentries): operations on a directory can require more work as the number of files in it increases.
Like Adam said:
Make a default directory such as Images_Products.
Use PHP's mkdir() function to create a directory after the user uploads an image.
Build the directory names dynamically from the product name they added.
Example: beer_heineken
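A rough sketch of that idea in PHP (the product name and the Images_Products path are just placeholders):
$productName = 'Beer Heineken';                                                 // hypothetical product name
$dirName     = strtolower(preg_replace('/[^A-Za-z0-9]+/', '_', $productName));  // "beer_heineken"
$target      = 'Images_Products/'.$dirName;
if (!is_dir($target)) {
    mkdir($target, 0755, true);                                                 // Images_Products/beer_heineken
}
move_uploaded_file($_FILES['image']['tmp_name'], $target.'/'.basename($_FILES['image']['name']));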
This is a completely theoretical question.
I have a photo storage site where photos are uploaded by users registered on the website.
The Question
Which of the two approaches is faster?
Which is better in the long term, when I need to use a lot of computers and hard disks?
Is there any other approach that's even better?
I have thought of two approaches for accomplishing this.
The number of files uploaded to my server is expected to be huge: over 100 million.
Approach 1
These two /pictures/hd/ & /pictures/low/ directories will contain all the files uploaded by the user.
$newfilename = $user_id.time().$filename; //$filename = actual filename of uploaded file
$src = '/pictures/hd/'.$newfilename; //for hd pics
Inserting that into MySQL with:
INSERT INTO pics (`user_id`, `src`) VALUES ('$user_id', '$newfilename')
Approach 2
These two /pictures/hd/ & /pictures/low/ directories will contain subdirectories holding the files uploaded by the users.
This is going to create lots of subdirectories, each named with the user_id of the user who uploaded the file to the server.
if (!is_dir('/pictures/hd/'.$user_id.'/')) {
    mkdir('/pictures/hd/'.$user_id.'/');
}
$newfilename = $user_id.'/'.$user_id.time().$filename; //$filename = actual filename of uploaded file
$src = '/pictures/hd/'.$newfilename; //for hd pics
Inserting that into MySQL with:
INSERT INTO pics (`user_id`, `src`) VALUES ('$user_id', '$newfilename')
Retrieval
When retrieving an image, I can use the src column of my pics table to get the filename and load the HD file using '/pictures/hd/'.$src_of_picstable and the low-quality file using '/pictures/low/'.$src_of_picstable.
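For illustration, a minimal retrieval sketch under those assumptions ($pdo being a PDO connection and the pics table having an id column are assumptions, not stated above):
$stmt = $pdo->prepare('SELECT src FROM pics WHERE id = ?');
$stmt->execute([$picId]);
$src = $stmt->fetchColumn();
$hdPath  = '/pictures/hd/'.$src;     // high-quality version
$lowPath = '/pictures/low/'.$src;    // low-quality version
echo '<img src="'.htmlspecialchars($lowPath).'" alt="">';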
The right way to answer the question is to test it.
Which is faster will depend on the number of files and the underlying filesystem; ext3/ext4 will quite happily cope with very large numbers of files in a single directory (dentries are managed in an HTree index). Some filesystems just use simple lists; others have different ways of optimizing file access.
Your first scaling problem will be how to manage the file set across multiple disks. Just extending a single filesystem across lots of disks is a bad idea. If you have lots of directories, then you can have lots of mount points, but this doesn't work all that well when you get to terabytes of data.
However, the fact that the content is indexed independently of the file storage means that it doesn't matter much what you choose now for your file layout: you can easily change the mapping of files to locations later without having to move your existing dataset around.
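One way to sketch that decoupling (the table, column and mount-point names here are illustrative, not from the answer): keep a logical volume name next to the path in the database, so moving files to another disk or server only means updating rows and the volume map.
// illustrative schema:
//   CREATE TABLE images (id INT AUTO_INCREMENT PRIMARY KEY, volume VARCHAR(32), path VARCHAR(255));
$volumes = [
    'vol1' => '/mnt/disk1/pictures',   // wherever each logical volume is mounted today
    'vol2' => '/mnt/disk2/pictures',
];
$stmt = $pdo->prepare('SELECT volume, path FROM images WHERE id = ?');
$stmt->execute([$imageId]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$fullPath = $volumes[$row['volume']].'/'.$row['path'];   // resolved at read time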
I wouldn't suggest the single-directory approach, for two reasons. Firstly, if you're planning to have a lot of images, your directory will get really big, and searching for a single image manually will take a lot longer. You will need to do that when you debug something or test new features.
The second reason for multiple directories is that you can make smaller backups of parts of your gallery. And if you have a really big gallery (let's say several terabytes), a single hard drive might not be enough to contain it all; with multiple directories you can mount each directory on a separate hard drive and handle an almost unlimited gallery size that way.
My favorite approach is a YYYY/MM/type-of-image directory structure. This way you can spot when you introduced some bug by looking month by month. You can also make monthly backups without duplicating redundant files, and take quarterly snapshots of the whole gallery just in case.
As for type-of-image: there are several variants of each image that I might need, such as the original image, a small thumbnail, a regular thumbnail, the normal image and so on. This way I can just swap the type of image in the path and get a different size.
In your case I would suggest a YYYY/MM/type-of-image/user_id approach, where you could easily find all of a user's uploaded files in one place.
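A rough sketch of building that YYYY/MM/type-of-image/user_id path in PHP ($userId, $type and the base directory are placeholders):
$userId  = 42;                       // hypothetical user id
$type    = 'thumbnail';              // original, thumbnail, normal, ...
$baseDir = '/var/www/gallery';       // assumed base path
$dir     = sprintf('%s/%s/%s/%s/%d', $baseDir, date('Y'), date('m'), $type, $userId);
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);         // e.g. /var/www/gallery/2012/08/thumbnail/42
}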
I am building a site that expects millions of photos to be uploaded (with 3 thumbnails for each uploaded image), and I need to find the best method for storing all these images.
I've searched and found examples of images stored as hashes... for example:
If I upload coolparty.jpg, my script converts it to an MD5 hash, resulting in:
dcehwd8y4fcf42wduasdha.jpg
and that's stored in /dc/eh/wd/dcehwd8y4fcf42wduasdha.jpg
but for the 3 thumbnails I don't know how to store them.
QUESTIONS..
Is this the correct way to store these images?
How would I store thumbnails?
In PHP what is example code for storing these images using the method above?
Here is how I am using the folder structure:
I upload the photo and move it like you said:
// hash the uploaded file, then add a random number to the hash just to make sure your images will be "unique"
$image = md5_file($_FILES['image']['tmp_name']);
$image = md5(mt_rand().$image);
// IMAGES_PATH is a constant stored in my global config
define('IMAGES_PATH', '/path/to/my/images/');
// e.g. coolparty.jpg => f3d40fc20a86e4bf8ab717a6166a02d4
$folder = $image[0].'/'.$image[1].'/'.$image[2].'/';
// make sure you create the folders with mkdir() before you move anything
if (!is_dir(IMAGES_PATH.$folder)) {
    mkdir(IMAGES_PATH.$folder, 0755, true);
}
// full-size image, e.g. /path/to/my/images/f/3/d/f3d40fc20a86e4bf8ab717a6166a02d4.jpg
$path = IMAGES_PATH.$folder.$image.'.jpg';
// thumbnail: I just prepend t_ before the image name
$thumbPath = IMAGES_PATH.$folder.'t_'.$image.'.jpg';
// move_uploaded_file() the original, then generate the thumbnail from it
move_uploaded_file($_FILES['image']['tmp_name'], $path);
I believe this is the basic way; of course you can change the folder structure to a deeper one, like you said, with 2 characters per level, if you will have millions of images.
The reason you would use a method like that is simply to reduce the total number of files per directory (inodes).
Using the method you have described (3 levels deep with 2 characters each), you are very unlikely to reach even hundreds of images per directory, since you can have up to almost 17 million directories (16^6 = 16,777,216).
As for your questions:
Yeah, that is a fine way to store them.
The way I would do it would be
/aa/bb/cc/aabbccdddddddddddddd_thumb.jpg
/aa/bb/cc/aabbccdddddddddddddd_large.jpg
/aa/bb/cc/aabbccdddddddddddddd_full.jpg
or similar
There are plenty of examples on the net as far as how to actually store images. Do you have a more specific question?
If you're talking millions of photos, I would suggest you farm these off to a third party such as Amazon Web Services, more specifically Amazon S3. There is no limit on the number of files and, assuming you don't need to actually list the files, there is no need to separate them into directories at all (and if you do need to list them, you can use different delimiters and prefixes - http://docs.amazonwebservices.com/AmazonS3/latest/dev/ListingKeysHierarchy.html). Your hosting/retrieval costs will probably be lower than doing it yourself - and the files get backed up.
To answer more specifically: yes, split into subdirectories; using your structure, you can drop the first 6 characters of the filename since you already have them in the directory name.
And for thumbs, as suggested by aquinas, just append _thumb1 etc. to the filename, or store them in separate folders themselves.
1) That's something only you can answer. Generally, I prefer to store the images in the database so you can have ONE consistent backup, but YMMV.
2) How? How about /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb1.jpg, /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb2.jpg and /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb3.jpg
3) ??? Are you asking how to write a file to the file system or...?
For millions of images, yes, it is correct that storing them in the database will slow things down.
The best option is either to use the server's file system to store the images and use .htaccess to add security,
or to use web services: many providers offer an images API for uploading and displaying.
You can go with that option too - for example, Amazon.
I'm coding a basic gallery for a website with around 40,000 people online at any given time. Users will be able to create galleries and upload images.
My question is: should I make a separate folder for each gallery and put its images in it, or make a single folder and put all images in it but keep the gallery_id for each image in the database? Or should I make a directory for every user, and then another directory inside it for each gallery name?
How would you do this?
P.S. I need it to be as lightweight as possible.
I would store them by id,
and I would split them into folders (depending on the filesystem - some don't perform well with lots of files in one folder); it also makes files easier to find if you ever have to look at something manually.
Give each file an id, then use the first 3 digits of the file name to split the files into folders. (You could start your auto-increment counter at 100000 or zero-pad the id, so there are at least 3 levels.)
/photos/1/0/3/103456.jpg
/photos/9/4/1/941000.jpg
/photos/0/0/0/000001.jpg
You can store the relationship of photo to user / gallery / etc in the database
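A small sketch of that id-based layout in PHP (the photo id is assumed to come from an auto-increment column; the /photos base path is a placeholder):
$photoId = 103456;                                            // hypothetical id from the database
$name    = str_pad((string) $photoId, 6, '0', STR_PAD_LEFT);  // zero-padded, e.g. "103456" or "000001"
$dir     = sprintf('/photos/%s/%s/%s', $name[0], $name[1], $name[2]);
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);                                  // /photos/1/0/3
}
$path = $dir.'/'.$name.'.jpg';                                // /photos/1/0/3/103456.jpg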
Or if you want to see how the big boys do it
Needle in a haystack: efficient storage of billions of photos
Typically web servers don't cope well with more than a few thousand images in a single folder (I recently had to deal with 70,000 images causing super slow reads and sorts, so trust me on this), so certainly don't use a single folder if you think you will have thousands of images. I would suggest the best solution would be to host off of Amazon's S3 connected to their CloudFront CDN, but if that isn't realistic you can still do several things just on your own server.
Make a separate folder for each gallery like you suggest only if you know some bounds on how large a gallery can get and have an idea of how many galleries will be created. (This is what I would suggest for your specific problem right now)
Put the image name through a hash function then use the first 1-3 characters of the hash to name folders to put the images into. The hash ensures that the images are roughly equally split among the folders and you can decide how many folders you need.
At any rate, having the gallery and the image id in the actual path will probably be useful to you moving forward, both in code and whenever a human has to hunt bugs on the server. I would probably name the folders based on the gallery id and just make sure that no gallery has more than a few thousand images in it.
I store mine like this:
images/userid/photoid
This way I can quickly isolate user images if I need to inspect anything at a later date. It seems more organized than dropping them all in one central directory.
We are building a web app which will have a lot of images being uploaded. What is the best solution for optimizing these images and storing them on the website?
Also, is there a way I can auto-enhance the images that are being uploaded?
1. Do not store images in the DB; store them in the file system (as real files). You'll probably need to store information about them in the DB though, e.g., filename, time of upload, size, owner, etc.
2. Filenames must be unique. You might use yyyymmddhhiissnnnn, where yyyymmdd is year, month and date, hhiiss is hour, minutes and seconds, and nnnn is the number of the image within that second, i.e., 0001 for the first image, 0002 for the second image, etc. This will give you unique filenames with nice ordering (a rough sketch follows after this list).
3. Think about making some logical directory structure. Storing millions of images in a single folder is not a good idea, so you will need something like images/<x>/<y>/<z>/<filename>. This could also be spanned across multiple servers.
4. Keep the original images. You never know what you will want to do with them after a year or two. You can convert them to some common format though, i.e., if you allow uploading JPG, PNG and other formats, you might store all of them as JPG.
5. Create and store all the resized versions of the images that are necessary on your website. For example, social networks often have 3 kinds of resized images - one for displaying with user comments in various places (very small), one for displaying on the profile page (quite small, but not icon-sized; maybe some 240x320 pixels) and one for "full size" viewing (often smaller than the original). Filenames of these related images should be similar to the filenames of the original images, e.g., suffixes _icon, _profile and _full might be added to the original filenames. Depending on your resources and the amount of images being uploaded at the same time, you can do this either in realtime (in the same HTTP request) or use some background processing (a cron job that continuously checks if there are new images to be converted).
6. As for auto-enhancing images - it is possible, but only if you know exactly what must be done with the images. I think that analyzing every image and deciding what should be done with it might be too complex and take too many resources.
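A rough sketch of points 2, 3 and 5 above ($sequence and the images base directory are assumptions; in a real app the per-second counter would come from the database or a cache):
$sequence = 1;                                                  // hypothetical per-second counter
$baseName = date('YmdHis').sprintf('%04d', $sequence);          // e.g. 201208151230450001
$dir = 'images/'.date('Y').'/'.date('m').'/'.date('d').'/';     // one possible <x>/<y>/<z> split
if (!is_dir($dir)) {
    mkdir($dir, 0755, true);
}
$original = $dir.$baseName.'.jpg';
$icon     = $dir.$baseName.'_icon.jpg';
$profile  = $dir.$baseName.'_profile.jpg';
$full     = $dir.$baseName.'_full.jpg';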
All good suggestions from binaryLV above. Adding to his suggestion #5, you also probably want to optimize the thumbnails you create. When images are uploaded, they are likely to have metadata that is unnecessary for the thumbnails to have. You can losslessly remove the metadata to make the thumbnail sizes smaller, as suggested here: http://code.google.com/speed/page-speed/docs/payload.html#CompressImages. I personally use jpegtran on the images for my website to automatically optimize my thumbnails whenever they are created. If you ever need the metadata, you can get it from the original image.
Something else to consider if you plan to display these images for users is to host your images on a cookie-free domain or sub-domain as mentioned here: http://developer.yahoo.com/performance/rules.html#cookie_free. If the images are hosted on a domain or sub-domain that has cookies, then every image will send along an unnecessary cookie. It can save a few KB per image requested, which can add up to a decent amount, especially on restricted bandwidth connection such as on a mobile device.
I will have 150-200 products on my website, which could grow in the future, and I have around 30-40 images for each product, so I wanted to ask: should I have a separate folder for storing each product's images, or save all the images in one single folder?
Thanks
I'd use a separate folder for each product - it seems much neater that way and won't end up with too many files in the directory.
It would also make it easier to iterate over all a product's images.
If each product has a unique ID number then you should probably use that for the folder name, unless you want to go the full SEO route and have something like /images/my-product-name/...jpg
That depends on the file system - with FAT32 a directory can contain up to 65,536 entries (i.e. files), so you'll probably be fine.
It also depends on whether you only ever want programmatic access, or whether you want some person to have to ever look at a directory with 65K files.