I am building a PHP application that lets users upload photos. To keep things manageable, a folder should hold a maximum of 5,000 photos. Each uploaded photo will be assigned an ID in the database, and the photo will be renamed to that ID.
How do I check whether a certain folder already has 5,000 photos? Should I check the ID of the last row in the photo table?
Folders will be named incrementally, e.g. folder_1, folder_2, etc.
My proposed steps:
Retrieve the last insert ID from the photo table.
Compute floor((last_insert_id + 1) / 5000) = some_number (the folder number it should be saved in).
Save the photo to folder_some_number. If the folder does not exist, create it.
Or is there a better way to do this?
I'm guessing database reads are going to be faster than file system reads like this.
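The steps above could be sketched like this (a rough sketch; `$pdo`, the upload field name, and the paths are assumptions, and note that integer division, rather than the remainder, gives the folder number):

```php
<?php
// Sketch of the proposed steps. $pdo, the upload field name, and the
// uploads path are assumptions, not part of the question.
// Note: integer division (not the remainder) gives the folder number.
$lastId = (int) $pdo->lastInsertId();      // last insert ID from the photo table
$nextId = $lastId + 1;
$folderNumber = intdiv($nextId, 5000);     // folder_0 holds IDs 1..4999, etc.
$folder = __DIR__ . "/uploads/folder_$folderNumber";

if (!is_dir($folder)) {
    mkdir($folder, 0755, true);            // create the folder if it does not exist
}
move_uploaded_file($_FILES['photo']['tmp_name'], "$folder/$nextId.jpg");
```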
You'd be better off running an SQL query that returns a count per folder, grouped by folder.
// Example query
SELECT folderId, COUNT(file) AS Count FROM photos WHERE folderId IN ('1', '2') GROUP BY folderId
// It would be beneficial to have a flag on the folders that enables or disables them,
// so that you're not iterating through folders we already know hold more than 5k files.
// You would run a separate query that pulls in the names of the non-full folders
// and pass them to the query above.
SELECT foldername, folderId FROM folders WHERE countFlag = 0;
// Example conditional
if ($Count > 5000):
    // Over 5k: do something.
    // Since we're over 5k, set this folder's flag to 1
    // so that we aren't iterating through it again.
    $countFlag = 1;
else:
    // Under 5k: do something else.
endif;
Note: If you need actual code samples, I can whip something up real quick. The examples above leave out working code and are just for illustrating the idea. You will need to iterate through the returned rows, as they are grouped by folder.
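A rough sketch of how the pieces above could fit together (the answer leaves out working code, so `$mysqli` and the exact column names are assumptions based on the example queries):

```php
<?php
// Fetch folders not yet flagged as full ($mysqli is an assumed connection).
$folders = $mysqli->query("SELECT folderId FROM folders WHERE countFlag = 0");

while ($folder = $folders->fetch_assoc()) {
    $id  = (int) $folder['folderId'];
    $res = $mysqli->query(
        "SELECT COUNT(file) AS cnt FROM photos WHERE folderId = $id"
    );
    $cnt = (int) $res->fetch_assoc()['cnt'];

    if ($cnt >= 5000) {
        // Folder is full: flag it so we skip it next time.
        $mysqli->query("UPDATE folders SET countFlag = 1 WHERE folderId = $id");
    } else {
        // Folder still has room: save the new photo here.
    }
}
```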
$last_id = 9591; // for example
$current_folder = floor($last_id / 5000);
$picture_num_in_folder = $last_id - ($current_folder * 5000);
if ($picture_num_in_folder == 5000) {
    // New directory and such: the new folder ID is $current_folder + 1
    // and $picture_num_in_folder restarts at 1.
} else {
    // Just place the picture with unique ID $last_id + 1, named
    // image_($picture_num_in_folder + 1), in folder $current_folder.
}
Don't use autoincrement IDs for calculations. When you delete files, you'll get holes in your ID sequences, which will throw off your math. Save the filepath in your db table, then do a simple COUNT:
SELECT filepath, COUNT(filename) AS num FROM mytable
GROUP BY filepath;
You can save new files to filepaths where num < 5000.
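A sketch of how that lookup might be used, with a HAVING clause added so the database returns only folders that still have room (`$pdo` is an assumed PDO connection and `createNewFolder()` is a hypothetical helper):

```php
<?php
// Find a filepath that still has room ($pdo is an assumed PDO connection).
$sql = "SELECT filepath, COUNT(filename) AS num
        FROM mytable
        GROUP BY filepath
        HAVING num < 5000
        LIMIT 1";
$row = $pdo->query($sql)->fetch(PDO::FETCH_ASSOC);

// Fall back to a new folder when every existing one is full;
// createNewFolder() is a hypothetical helper.
$targetDir = $row ? $row['filepath'] : createNewFolder();
```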
Using the insert ID might not be very accurate, as there are a number of error conditions that can cause an ID to be "skipped". You could store the folder number in a separate column, called "folder_number" or similar. That way you can get the highest folder number and then count the records in that folder: if it's less than 5000, add the photo to the same folder; otherwise run your logic to increment the folder count (creating the physical folder at the same time).
That should be faster than checking via the file system, which could be quite slow for the volume of files you're talking about.
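That logic might look roughly like this (`$pdo` and the table, column, and path names are assumptions):

```php
<?php
// Sketch of the folder_number approach; $pdo and the names are assumptions.
$folder = (int) $pdo->query("SELECT MAX(folder_number) FROM photos")->fetchColumn();
$count  = (int) $pdo->query(
    "SELECT COUNT(*) FROM photos WHERE folder_number = $folder"
)->fetchColumn();

if ($count >= 5000) {
    $folder++;                                           // move on to the next folder
    mkdir("/path/to/photos/folder_$folder", 0755, true); // create the physical folder
}

$stmt = $pdo->prepare("INSERT INTO photos (folder_number) VALUES (?)");
$stmt->execute([$folder]);
```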
Let me suggest another approach:
Use a hash of the picture's ID and use its first few characters as the path. For example,
let's assume the image ID is 1, for which the (MD5) hash is c4ca4238a0b923820dcc509a6f75849b.
Create the directory structure /rootpath/images/c/4/c/ and save the image in it.
You can use mkdir() to create nested directories like mkdir( '/dir/c/4/c/', 0777, true );.
This way you automatically distribute the load across many folders and the ID itself is the path.
If my memory serves, WordPress uses something similar...
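A minimal sketch of the hash-path idea (paths follow the example above; the upload handling and field name are assumptions):

```php
<?php
// Build the nested path from the hash of the image ID.
$id   = 1;
$hash = md5((string) $id);                  // "c4ca4238a0b923820dcc509a6f75849b"
$dir  = '/rootpath/images/' . $hash[0] . '/' . $hash[1] . '/' . $hash[2];

if (!is_dir($dir)) {
    mkdir($dir, 0777, true);                // recursive mkdir(), as noted above
}
move_uploaded_file($_FILES['image']['tmp_name'], "$dir/$id.jpg");
```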
If you want to organize your folders and files this way, you don't need to check whether a folder has 5000 files; just assign each file to its corresponding folder.
If a folder x that should be full (because the last ID is bigger than 5000 * x) contains fewer than 5000 files, that means some images have been removed. You can't reassign those IDs to new rows in your database, so you can't refill the space of the removed files.
I have a table in the database that keeps track of all images for each main system like so:
Table: system_images
field: systemid
field: imglarge
field: imgthumb
Sometimes a system gets deleted from the database, including all records in system_images that belong to that system entry. However, the images themselves are still physically on the server. Currently a cron job grabs all the images in the directory, then queries for each one to see if it is still in the table; if not, it deletes the image from the server. Here is what the current cron job looks like:
$system_images = array_diff(scandir($global_productimages), array('..', '.'));
$image_count = count($system_images);
if($image_count > 0)
{
    foreach($system_images AS $curr_image)
    {
        $image_name = trim($curr_image);
        $find = $image_query->findSystemImage($image_name); // one query per image
        if($find == 0)
        {
            unlink($imgpath . $image_name); // not in the table, remove the file
        }
    }
}
Is there a way where I don't have to run a separate query for each image? There could be thousands of images in the directory.
Why don't you delete them right after you delete the system?
1) Before you delete the system record, use a SELECT statement and push the image names that belong to it into an array.
2) Delete the record.
3) Unlink the images by iterating through the array holding the names.
There is no need to use a cron for this job, which looks far too heavy if you scan the directory and query the DB one file at a time.
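Those three steps might look like this (`$pdo`, `$systemId`, and `$imgpath` follow the question; the exact column handling is an assumption):

```php
<?php
// Step 1: collect the image filenames before the delete.
$stmt = $pdo->prepare("SELECT imglarge, imgthumb FROM system_images WHERE systemid = ?");
$stmt->execute([$systemId]);
$images = $stmt->fetchAll(PDO::FETCH_ASSOC);

// Step 2: delete the records.
$del = $pdo->prepare("DELETE FROM system_images WHERE systemid = ?");
$del->execute([$systemId]);

// Step 3: unlink the files we collected.
foreach ($images as $img) {
    foreach ([$img['imglarge'], $img['imgthumb']] as $file) {
        if ($file !== '' && is_file($imgpath . $file)) {
            unlink($imgpath . $file);
        }
    }
}
```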
Two suggestions
Short term (expanding on the suggestion by @paokg4)
You could very easily perform the clean up on the command line with something like
FILEPATH=/global/product/imagedir; mysql -ENe 'select imglarge, imgthumb from `yourdbname`.`system_images` where systemid = 123' | grep -v '^\*' | xargs -i rm $FILEPATH/{}
Here we use mysql's vertical output format (-E) to get both filenames, suppress the column names (-N), strip the row-separator lines with grep, and pipe the results through to rm.
Longer term
It sounds as though the main problem you have here is in the case of removing a 'system' where there might be thousands of associated images - a final tidying up exercise. For this to be a problem it suggests that you store images for multiple systems in a single directory and that there is no obvious way at a file system level of distinguishing between images associated with different systems.
If this does indeed characterise the current setup, then it also suggests a couple of longer-term solutions which would allow you to remove the files with a single operation, e.g.
Store images for different systems in different directories, in which case you can just remove the directory.
Add a system prefix to the filenames you store, which would then allow you to delete them with a simple wildcard search.
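For the second option, a prefix such as the system ID makes the cleanup a one-liner (the `123_` naming convention and `$imageDir` are assumptions for illustration):

```php
<?php
// If filenames carry a system prefix (e.g. "123_photo.jpg"), one glob
// removes everything belonging to that system.
foreach (glob($imageDir . '/123_*') as $file) {
    if (is_file($file)) {
        unlink($file);
    }
}
```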
I'm explaining my problem with an example.
I have a user table with id, name, and image fields.
A sample record would be:
id - 1
name - TestUser
image - temp/testuser.jpg (sometimes there is no 'temp' sub-directory, i.e. image - testuser.jpg)
After a user registers, I'm moving the image to a new sub-directory,
i.e. moving from temp/testuser.jpg (or testuser.jpg) to 1/testuser.jpg,
so I need to update the record with the new URL.
That means I need two operations here: replace the sub-directory temp with 1, or, if there is no sub-directory, prepend '1/' to the existing image URL.
How can I manage these operations in one query? Kindly help me :)
I recommend using the following query:
update users set
image = concat('new_dir/', substring(image from locate('/', image) + 1))
where id = 1
This handles both cases: when the image has no slash, locate() returns 0, so substring(... from 1) returns the whole filename and 'new_dir/' is simply prepended.
I would suggest separating the directory from the filename. You're adding another field, but it makes things easier.
Upon update, you only need to update the directory it resides in.
Otherwise you can do a substr() on the whole path and replace everything up to the last slash.
I am making a website for a car show. I want to store images in the database (just the URL), and what I want to do is add all of the image URLs to the same cell in the table.
Then, at retrieval time, I want to use PHP's explode() so I can separate each URL and use a loop to display them.
The problem I am facing is that I do not know what to use as a delimiter: I cannot use anything that can appear in a file name on Windows, Mac, or Linux, and I am afraid that using a system-reserved character would cause problems.
I am also concerned about the data type that will hold this information. I am thinking TEXT is best here, but I have heard many say it causes problems.
To be clear, the idea is:
When someone uploads 3 images, the images will be uploaded into a folder, then the names will be taken and put into one string (after the directory names are added) with a separator between them, which will then be stored in the database.
Then I take that string, use explode() to store the separated data in an array, and use a loop to display an image with the source being the stored data in the array.
I need a special delimiter or another way... can someone help me do this, or tell me another way of saving the images without potential risk? I have seen many websites which use dynamic bullet points (lists), but I was never able to get code or even an idea of how they do them.
EDIT:
The current approach I am using has 10 rows, one for each image, but the problem here is that the user cannot add more than 10 images, and if he has fewer than 10 there will be a few empty images displayed. (I know a solution for the empty images, but it is impossible to add more images.)
You can use any kind of serialization (serialize, json_encode) when storing your array, and deserialization (unserialize, json_decode) when you want to use it.
But! I advise you to create a new table for your images, with a car_id field, for example.
Then you can just join it and get everything.
It could be CarImages ('id', 'car_id', 'image_file').
I also recommend adding a foreign key constraint on CarImages.car_id -> Cars.id;
then your orphaned images will be removed by cascade when cars are removed.
Storing serialized values is always a bad idea.
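A possible schema for that suggestion, assuming MySQL with InnoDB (table and column names as proposed above):

```sql
CREATE TABLE CarImages (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    car_id     INT UNSIGNED NOT NULL,
    image_file VARCHAR(255) NOT NULL,
    FOREIGN KEY (car_id) REFERENCES Cars (id) ON DELETE CASCADE
) ENGINE=InnoDB;
```

With ON DELETE CASCADE in place, deleting a row from Cars automatically removes its rows in CarImages.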
If you can't store one row per image in a separate table for some technical-debt reason, then you should json_encode the array of image paths and store the result in the database.
Solution one:
Create a table called images containing 3 columns (id, image_path, user_id). Every time the user uploads an image, insert it into this table. In your script, if you want to display the uploads for a specific user, fetch them by user_id:
$user_id = 1; // the specified user id
mysqli_query($connect, "SELECT * FROM images WHERE user_id = '$user_id'");
// And loop through the images here
Solution two:
Inserting image paths into one column.
The files table contains 1 column called path.
Inserting the values into the files table:
$array = array(
    '/path/to/file/1',
    '/path/to/file/2',
    '/path/to/file/3'
);
foreach($array as $path) {
    $path = $path . '|';
    mysqli_query($connect, "INSERT INTO files (path) VALUES ('$path')");
}
Now display the results:
$query = mysqli_query($connect, "SELECT path FROM files");
while($row = mysqli_fetch_assoc($query)) {
    $path = rtrim($row['path'], '|'); // strip the trailing delimiter
    echo $path . '<br>';
}
If you do not want to change your database, then you should try serialization. I think the links below will be useful for you:
json_encode
serialize
You can use either one.
If you design your tables like:
Table: user
user_id
username
and another table for user images:
Table: images
serial_num
user_id
image_url
then you can store many images for one user. Here, user_id in the images table is actually a foreign key referencing the user table's user_id.
You are using a relational database, so this is a good fit for you; otherwise you could use a NoSQL database.
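With that layout, a single JOIN fetches a user together with all of their images (names follow the table sketch above):

```sql
SELECT u.user_id, u.username, i.image_url
FROM user AS u
LEFT JOIN images AS i ON i.user_id = u.user_id
WHERE u.user_id = 1;
```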
I am using the following as a 'delete all' option for users. Users have images in their account along with a thumbnail of each image. Each image also has a corresponding row in the database.
It works. What I am worried about is the fact that new images could be uploaded along with new rows being added to the db 'during' the execution of this script.
So say there is a large number of images that are about to be deleted. All the images are deleted in the first loop; maybe new images are added during this. The thumbnail deletion then occurs and removes new thumbnails whose images weren't deleted in the first pass. Finally, the rows in the db are deleted, and the same thing happens: new entries are deleted, and now there is an image and thumbnail present on disk with no corresponding entry in the db. Hopefully that makes sense.
How can I ensure the same data is deleted from all three operations?
//delete all screenshots first
$ss_files = glob($_SERVER['DOCUMENT_ROOT'].'/../user-data/'.$_SESSION['user']['account_id'].'/screenshots/*');
foreach($ss_files as $ss_file)
{
if(is_file($ss_file))
{
unlink($ss_file); // delete file
}
}
//then delete all thumbnails
$ss_files = glob($_SERVER['DOCUMENT_ROOT'].'/../user-data/'.$_SESSION['user']['account_id'].'/screenshots/thumbs/*');
foreach($ss_files as $ss_file)
{
if(is_file($ss_file))
{
unlink($ss_file); // delete file
}
}
//create stmt
$stmt = $db->prepare("
DELETE del_table.*
FROM image_logs AS del_table
INNER JOIN users
ON users.user_id = del_table.user_id
INNER JOIN computers
ON computers.computer_id = users.computer_id
WHERE computers.account_id = :account_id
");
//add bindings and execute
$binding = array(
'account_id' => $_SESSION['user']['account_id']
);
$stmt->execute($binding);
What I am worried about is the fact that new images could be uploaded along with new rows being added to the db 'during' the execution of this script.
Your current approach is not good. First, if you expect deleting to take a long time, you should introduce a "deleted" flag. In that case, to delete e.g. an image, you flag it as deleted in the DB (same for the user account if you need to do more cleaning) and that's it. Done. Deleted. Got related data to physically remove? Build a separate "garbage collector" that removes the files and purges the cleaned "deleted" records from the DB, and run it via cron. With this approach, deleting is quick, while cleaning up leftovers may take longer, but that is no longer a problem.
Also, you should iterate over DB records rather than files in the first place, as the DB is the more authoritative data.
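A sketch of the flag-then-collect flow (the `deleted` column, `log_id`, the path columns, and `$pdo` are assumptions on top of the question's schema):

```php
<?php
// "Delete all" for the user: instant, just one UPDATE.
$stmt = $pdo->prepare("UPDATE image_logs SET deleted = 1 WHERE user_id = ?");
$stmt->execute([$userId]);

// Cron-driven garbage collector: iterate DB records, not the file system.
$rows = $pdo->query(
    "SELECT log_id, file_path, thumb_path FROM image_logs WHERE deleted = 1"
);
foreach ($rows as $row) {
    foreach ([$row['file_path'], $row['thumb_path']] as $file) {
        if (is_file($file)) {
            unlink($file);                 // remove image and thumbnail
        }
    }
    // Only purge the row once its files are gone.
    $pdo->prepare("DELETE FROM image_logs WHERE log_id = ?")->execute([$row['log_id']]);
}
```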
Here is the best method I have come up with so far and I would like to know if there is an even better method (I'm sure there is!) for storing and fetching millions of user images:
In order to keep the directory sizes down and avoid having to make any additional calls to the DB, I am using nested directories that are calculated based on the User's unique ID as follows:
$firstDir = './images';
$secondDir = floor($userID / 100000);
$thirdDir = floor(substr($userID, -5, 5) / 100);
$fourthDir = $userID;
$imgLocation = "$firstDir/$secondDir/$thirdDir/$fourthDir/1.jpg";
User ID's ($userID) range from 1 to the millions.
So if I have User ID 7654321, for example, that user's first pic will be stored in:
./images/76/543/7654321/1.jpg
For User ID 654321:
./images/6/543/654321/1.jpg
For User ID 54321 it would be:
./images/0/543/54321/1.jpg
For User ID 4321 it would be:
./images/0/43/4321/1.jpg
For User ID 321 it would be:
./images/0/3/321/1.jpg
For User ID 21 it would be:
./images/0/0/21/1.jpg
For User ID 1 it would be:
./images/0/0/1/1.jpg
This ensures that with up to 100,000,000 users, I will never have a directory with more than 1,000 sub-directories, so it seems to keep things clean and efficient.
I benchmarked this method against the following "hash" method, which uses the fastest hash function available in PHP (crc32). This "hash" method takes the first 3 characters of the hash of the User ID as the second directory and the next 3 characters as the third directory, distributing the files randomly but evenly:
$hash = crc32($userID);
$firstDir = './images';
$secondDir = substr($hash,0,3);
$thirdDir = substr($hash,3,3);
$fourthDir = $userID;
$imgLocation = "$firstDir/$secondDir/$thirdDir/$fourthDir/1.jpg";
However, this "hash" method is slower than the method I described earlier above, so it's no good.
I then went one step further and found an even faster method of calculating the Third Directory in my original example (floor(substr($userID, -5, 5) / 100);) as follows:
$thirdDir = floor(substr($userID, -5, 3));
Now, this changes how/where the first 10,000 User IDs are stored, making some third directories have either 1 user sub-directory or 111 instead of 100, but it has the advantage of being faster, since we do not have to divide by 100, so I think it is worth it in the long run.
Once the directory structure is defined, here is how I plan on storing the actual individual images: if a user uploads a 2nd pic, for example, it would go in the same directory as their first pic, but it would be named 2.jpg. The default pic of the user would always just be 1.jpg, so if they decide to make their 2nd pic the default pic, 2.jpg would be renamed to 1.jpg and 1.jpg would be renamed 2.jpg.
Last but not least, if I needed to store multiple sizes of the same image, I would store them as follows for User ID 1 (for example):
1024px:
./images/0/0/1/1024/1.jpg
./images/0/0/1/1024/2.jpg
640px:
./images/0/0/1/640/1.jpg
./images/0/0/1/640/2.jpg
That's about it.
So, are there any flaws with this method? If so, could you please point them out?
Is there a better method? If so, could you please describe it?
Before I embark on implementing this, I want to make sure I have the best, fastest, and most efficient method for storing and retrieving images so that I don't have to change it again.
Thanks!
Don't worry about the small speed differences in calculating the path; they don't matter. What matters is how well and uniformly the images are distributed across the directories, how short the generated path is, and how hard it is to deduce the naming convention (let's replace 1.jpg with 2.jpg... wow, it works...).
For example, in your hash solution the path is based entirely on the user ID, which puts all pictures belonging to one user in the same directory.
Use the whole alphabet (lower and uppercase, if your FS supports it), not just numbers. Check what other software does; good places to look at hashed directory names are Google Chrome, Mozilla, ... It's better to have short directory names: they are faster to look up and occupy less space in your HTML documents.
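One way to get short, mixed-case alphanumeric directory names beyond hex digits is to base64-encode the raw hash bytes (a sketch; the path layout and `$userId` are assumptions):

```php
<?php
// base64 of the raw MD5 bytes uses A-Z, a-z, 0-9 plus '+' and '/',
// which we swap for path-safe characters.
$hash = rtrim(strtr(base64_encode(md5((string) $userId, true)), '+/', '-_'), '=');
$dir  = '/rootpath/images/' . $hash[0] . '/' . $hash[1] . '/' . $userId;
```

Single-character directory names at each level keep the paths short while still spreading files across many directories.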