I have up to 40 000 images stored in one directory (ID photos). I need to periodically synchronize them with a database. I'm currently using a PHP script to loop through the directory and add new files and remove missing files, through ODBC. Obviously, this is not working so well.
Is there a robust way to do it in PHP? Or, what is the best alternative?
Thanks
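One way to keep the sync cheap is to read both lists once and compare them in memory, so that only the differences touch the database. The following is a minimal sketch, assuming all file names fit in memory; `syncPlan` and the sample names are hypothetical and not taken from the question.

```php
<?php
// Hypothetical helper: work out what to insert and delete so the database
// matches the directory. Both arguments are plain lists of file names.
function syncPlan(array $onDisk, array $inDb): array
{
    return [
        // files present on disk but not yet in the database
        'add'    => array_values(array_diff($onDisk, $inDb)),
        // database rows whose file no longer exists on disk
        'remove' => array_values(array_diff($inDb, $onDisk)),
    ];
}

// In a real run $onDisk could come from scandir() and $inDb from one SELECT.
$plan = syncPlan(['a.jpg', 'b.jpg', 'c.jpg'], ['b.jpg', 'c.jpg', 'd.jpg']);
print_r($plan);
```

Only the entries in `add` and `remove` then need an INSERT or DELETE, instead of one query per file.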
Related
I'm working on a small PHP application which updates product stock regularly. I get the updated file from the server, and I have the old one in my directory, so what is the best way to get only the updated products (lines) between these two files? For information, both files contain around 70,000 product lines.
I thought of storing the data of each file in an array and then using "array_diff" to compare them. It will work in theory, but is it a good idea with 70,000 entries in each array?
Thanks in advance.
I'd use the diff command.
For reference:
https://www.geeksforgeeks.org/diff-command-linux-examples/
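If shelling out to diff is not an option, the same comparison can stay in PHP. This is a minimal sketch, assuming both files fit in memory and each product occupies one line; `changedLines` and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: return the lines of $newLines that do not appear in $oldLines.
function changedLines(array $oldLines, array $newLines): array
{
    // Flip the old lines into array keys so each lookup is a hash check.
    $seen = array_flip($oldLines);
    $changed = [];
    foreach ($newLines as $line) {
        if (!isset($seen[$line])) {
            $changed[] = $line;
        }
    }
    return $changed;
}

// With real files the arrays could come from
// file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).
$old = ['1;apple;10', '2;pear;5', '3;plum;7'];
$new = ['1;apple;10', '2;pear;4', '3;plum;7', '4;kiwi;2'];
print_r(changedLines($old, $new));
```

The result contains both modified and newly added lines; products that disappeared from the new file would need the same call with the arguments swapped.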
I have ~280,000 files that will need to be searched through, and the proper file returned and opened. The file names are exact matches of the expected search terms.
The search terms will be taken by an input box using PHP. What is the best way to accomplish this so that searches do not take a large amount of time?
Thanks!
I suspect the file system itself will struggle with 280,000 files in one directory.
An approach I've taken in the past is to put those files in subdirectories based upon the initial letters of the filename e.g.
1/100000.txt
1/100001.txt
...
9/900000.txt
etc. You can subdivide further using the second letter etc.
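The layout above can be sketched as a small path builder. This is only an illustration; `shardedPath` and its `$depth` parameter are hypothetical names.

```php
<?php
// Hypothetical helper: build a path such as "1/100000.txt" from a file name,
// using its first $depth characters as nested subdirectories.
function shardedPath(string $fileName, int $depth = 1): string
{
    $parts = [];
    for ($i = 0; $i < $depth && $i < strlen($fileName); $i++) {
        $parts[] = $fileName[$i];
    }
    $parts[] = $fileName;
    return implode('/', $parts);
}

echo shardedPath('100000.txt'), "\n";
echo shardedPath('900000.txt', 2), "\n";
```

Because the search terms are exact file names, a lookup then needs no directory scan: build the path from the search term and check it with is_file().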
It's good you added mysql to your tags. Ideally I would have a cron task that indexes the directories into a MySQL table, and use that table for the actual search. A query against an indexed table is faster than iterating over the file system. You could run the task daily or hourly depending on how often your files change, or use something like Guard to monitor the file system for changes and make the appropriate updates.
See: https://github.com/guard/guard
I am building a site that could easily see millions of photos uploaded (with 3 thumbnails for each image uploaded), and I need to find the best method for storing all these images.
I've searched and found examples of images stored as hashes. For example:
If I upload coolparty.jpg, my script would convert it to an MD5 hash, resulting in
dcehwd8y4fcf42wduasdha.jpg
and that's stored in /dc/eh/wd/dcehwd8y4fcf42wduasdha.jpg
but for the 3 thumbnails I don't know how to store them
QUESTIONS..
Is this the correct way to store these images?
How would I store thumbnails?
In PHP what is example code for storing these images using the method above?
How I'm using the folder structure:
I upload the photo and move it like you said:
// IMAGES_PATH is a constant stored in my global config
define('IMAGES_PATH', '/path/to/my/images/');
$image = md5_file($_FILES['image']['tmp_name']);
// you can add a random number to the file name just to make sure your images will be "unique"
$image = md5(mt_rand().$image);
// e.g. coolparty = f3d40fc20a86e4bf8ab717a6166a02d4 goes into the folder f/3/d/
$folder = IMAGES_PATH.$image[0]."/".$image[1]."/".$image[2]."/";
// make sure the folders exist before you move the files into them
if (!is_dir($folder)) {
    mkdir($folder, 0755, true);
}
$path = $folder.$image.'.jpg';
// thumbnail, I just prepend t_ to the image name
$thumb = $folder.'t_'.$image.'.jpg';
move_uploaded_file($_FILES['image']['tmp_name'], $path);
// then create the thumbnail from $path and save it to $thumb
I believe this is the basic way. Of course you can make the folder structure deeper, as you said, with 2 characters per level, if you will have millions of images.
The reason you would use a method like that is simply to reduce the total number of files per directory (inodes).
Using the method you have described (3 levels deep), you are very unlikely to reach even hundreds of images per directory, since the maximum number of directories is 16**6 = 16,777,216, almost 17 million.
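The arithmetic can be checked quickly. The photo count below is an assumed figure for illustration only, not a number from the question.

```php
<?php
// Number of leaf directories for 3 levels of 2 hex characters each.
$directories = 16 ** 6;

// Assumed figure for illustration: 5 million photos, 4 files each
// (the original plus 3 thumbnails).
$files = 5000000 * 4;

printf("%d directories, about %.1f files per directory\n",
    $directories, $files / $directories);
```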
As for your questions:
Yeah, that is a fine way to store them.
The way I would do it would be
/aa/bb/cc/aabbccdddddddddddddd_thumb.jpg
/aa/bb/cc/aabbccdddddddddddddd_large.jpg
/aa/bb/cc/aabbccdddddddddddddd_full.jpg
or similar
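As a rough sketch of that naming scheme, the paths can be derived from the hash alone. `variantPaths` and the default variant names are hypothetical; adjust them to whatever sizes you generate.

```php
<?php
// Hypothetical helper: derive the directory from the first six characters of
// the hash and return one path per image variant.
function variantPaths(string $hash, array $variants = ['thumb', 'large', 'full']): array
{
    $dir = '/' . implode('/', str_split(substr($hash, 0, 6), 2)) . '/';
    $paths = [];
    foreach ($variants as $variant) {
        $paths[$variant] = $dir . $hash . '_' . $variant . '.jpg';
    }
    return $paths;
}

print_r(variantPaths(md5('coolparty.jpg')));
```

Since every path is computed from the hash, only the hash itself needs to be stored with the photo record.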
There are plenty of examples on the net as far as how to actually store images. Do you have a more specific question?
If you're talking millions of photos, I would suggest you farm these off to a third party such as Amazon Web Services, more specifically Amazon S3. There is no limit on the number of files and, assuming you don't need to actually list the files, there is no need to separate them into directories at all (and if you do need to list, you can use different delimiters and prefixes - http://docs.amazonwebservices.com/AmazonS3/latest/dev/ListingKeysHierarchy.html). And your hosting/retrieval costs will probably be lower than doing it yourself - and they get backed up.
To answer more specifically: yes, split by subdirectories. Using your structure, you can drop the first 6 characters of the filename, as you already have them in the directory name.
And for thumbs, as suggested by aquinas, just append _thumb1 etc. to the filename. Or store them in separate folders.
1) That's something only you can answer. Generally, I prefer to store the images in the database so you can have ONE consistent backup, but YMMV.
2) How? How about /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb1.jpg, /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb2.jpg and /dc/eh/wd/dcehwd8y4fcf42wduasdha_thumb3.jpg
3) ??? Are you asking how to write a file to the file system or...?
For millions of images, yes, it is correct that using a database will slow down the process.
The best option is either to use the server file system to store the images and use .htaccess to add security,
or to use a web service. Many providers offer image APIs for uploading and displaying images.
You can go with that option too, for example Amazon.
Basically I have a simple form which users use to upload files. Files should be stored under the /files/ directory, with some subdirectories to split the files almost equally, e.g. /files/sub1/sub2/file1.txt
I also need to avoid storing identical files (by filename).
I have my own solution: calculate the sha1 of the filename, take the first 5 symbols (abcde, for example) and put the file in /files/a/b/c/d/e/. This works well, but leads to a situation where one folder contains 4k files and another 6k files. Is there any way to make the file counts closer to each other? The maximum file count can be 10k or 10kk.
Thanks for any help.
P.S. Maybe I explained something wrong, so once again :) The task is simple: you have only HTML and PHP (without any DB) and a files directory where you should store only the uploaded files, without any data of your own. You should develop a script that stores uploads in the files directory without storing duplicates (by filename) and splits the uploaded files across subdirectories by file count (the number of files in each directory should be close to each other).
I have no idea why you want it that way. But if you REALLY have to do it this way, I would suggest you set a limit on how many bytes are stored in each folder. Every time you have to save the data, you open a log with
the current sub
the total number of bytes written to that directory
If necessary, you create a new subdirectory (you could use the current timestamp, because it won't repeat) and reset the byte count.
Then you save the file and increment the byte count by the number of bytes written.
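The steps above can be sketched as a single state update. This is a minimal illustration; `nextState`, the state array layout, and the limit are hypothetical, and reading and writing the log file is left out.

```php
<?php
// Hypothetical helper: given the logged state, decide where the next file goes.
// $state is ['sub' => current subdirectory, 'bytes' => bytes written to it].
function nextState(array $state, int $fileSize, int $limit, string $newSub): array
{
    // Start a new subdirectory when this file would push the current one over the limit.
    if ($state['bytes'] + $fileSize > $limit) {
        $state = ['sub' => $newSub, 'bytes' => 0];
    }
    $state['bytes'] += $fileSize;
    return $state;
}

$state = ['sub' => 'sub1', 'bytes' => 900];
$state = nextState($state, 200, 1000, 'sub2'); // 900 + 200 exceeds the limit of 1000
print_r($state);
```

In practice `$newSub` could be the current timestamp, as suggested above, and the returned state would be written back to the log after the file has been saved into `$state['sub']`.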
I highly doubt it is worth the work, but I do not really know why you want to distribute the files that way.
I have a PHP website which creates and stores HTML template files on the server based on user input. One user can create many templates. To store the template files and associate them with the DB record, this is what I do:
"templates" is the table which holds other information about the template, such as who created it, with a unique auto-increment id as template_id.
For example:
If the template id is 1001,
I convert it to hex, which is 03e9.
Now I split the hex number after two digits into 03 and e9: 03 becomes the folder and e9 becomes
the file with some extension, as in "e9.tpl".
This is how I can find the template in the file system if I know the template ID. I don't need to store the path to the file separately.
Is it a good approach? Are there any shortfalls to this approach? Is there any other approach better than this?
What are the advantages and disadvantages of storing the path to the file in the database itself, for example to enable serving templates from different disks?
If the ID in the DB table is already UNIQUE, why transform the id for the filesystem at all?
Just add a file 1001.tpl and you are all set. If you want to have template files sorted into folders, use the User ID (which I assume to be UNIQUE too), so you get folder 124/1001.tpl.
Depending on your deployment process, you will want to keep the created files outside the application folder, so you do not accidentally delete them when updating the application.
Are you doing this because you are worried that you might run out of file entries/inodes in the directory? In ext3 the practical limit is somewhere around 100.000 files (and 32.000 dirs).
Creating a directory structure on the fly is better done using modulo, as in $dir = $id % 1000, and then putting the new template in that dir ($dir/$id.tpl). That strategy creates at most 1000 dirs, which makes it possible to handle around 100.000.000 files.
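A minimal sketch of that strategy, where `templatePath` and its `$dirs` parameter are hypothetical names:

```php
<?php
// Hypothetical helper following the modulo idea: at most $dirs directories.
function templatePath(int $id, int $dirs = 1000): string
{
    $dir = $id % $dirs;
    return "$dir/$id.tpl";
}

echo templatePath(1001), "\n";
echo templatePath(123456), "\n";
```

With sequential auto-increment ids, consecutive templates land in consecutive directories, so the files spread evenly across all of them.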
I don't see any reason for messing with hexadecimal values or substrings.
If you have to hit the database to get the id, you may be just as well off storing the template in it as well. But there's nothing categorically wrong with storing them on the file system. I generally would.
When you hit 65,536, you'll get 0x10000. Make sure your code can handle that. I'd be more apt to store 0x1234 like: 1/1234.tpl, just for the sake of clarity. Note that by virtue of sequential IDs, your folders will fill up sequentially.
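One possible way to cope with ids beyond four hex digits is to let the last two digits name the file and everything before them name the folder. This is only a sketch of one interpretation, not the asker's code; `hexTemplatePath` is a hypothetical name.

```php
<?php
// Hypothetical helper: the last two hex digits name the file, everything
// before them names the folder, so ids above 0xffff still map cleanly.
function hexTemplatePath(int $id): string
{
    $hex = sprintf('%04x', $id);   // pad to at least four hex digits
    $folder = substr($hex, 0, -2);
    $file = substr($hex, -2);
    return "$folder/$file.tpl";
}

echo hexTemplatePath(1001), "\n";
echo hexTemplatePath(65536), "\n";
```

Each folder holds at most 256 files under this scheme, but the number of folders grows with the id, which is the trade-off against the single-level layout suggested above.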
I'd probably not even convert them to hex. You could use a modulus operator to determine which folder to put them in. Figure out how many files you are likely to have and use that to determine how many folders you want.
For example:
$path = ($id % NUMBER_OF_FOLDERS) . "/$id.tpl";
where $id is the template id in decimal.
I don't understand the point in separating the hex into two parts to create different folders... That could create hundreds and hundreds of different folders, which would become a complete mess on your server. Why not just store the templates in one single folder with the hex value as the file name, such as 03e9.tpl?