I have a folder that contains 1000+ subfolders. Each subfolder holds 1-800 images, plus a thumbs folder with the same number of smaller versions. In total there are around 18000 photos and 18000 thumbs.
I have created a script to run occasionally that checks whether everything is present (all paths, folder names and photo names are stored in a database).
Using either file_exists() or is_file(), plus clearstatcache(), I loop over my db records to check that everything is OK.
To check if the script actually works, I have included a check field in my table structure:
photo_present SET('Y','N')
Each time I run the script each photo validated will have the photo_present flag set to Y.
After only a few records (300-800), the script stops with an Internal Server Error (500).
I checked my table and I know the script ran for a while, since I can see the photo_present field set to Y.
My question is: how can I optimize this so that file_exists() or is_file() keeps working until all the files have been checked?
How are you running the script? If it's by hitting the web page, it could be that your server caps the runtime of a script at 30 seconds or so and is killing PHP after that time, resulting in an Internal Server Error. The same could be true if it's being run via cron, but that's less likely.
Are you sure this isn't a script timeout? Check http://php.net/manual/en/function.set-time-limit.php
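If the loop really must run via a web request, a minimal sketch (assuming your host lets you override the limits; the memory value is an illustration):

```php
<?php
// Lift PHP's own execution limit for this run. Note the web server may
// still enforce its own timeout, so CLI or cron remains more reliable.
set_time_limit(0);                 // 0 = no PHP time limit
ini_set('memory_limit', '256M');   // optional headroom for a long loop

// ... run the file_exists()/is_file() loop over the db records here ...
```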
I consider glob() a good alternative for browsing folders and files; you might be able to build something with it.
<?php
// Get all image files (brace patterns require the GLOB_BRACE flag)
$files = glob("{*.jpg,*.JPG,*.gif,*.GIF,*.png,*.PNG,*.cr2,*.CR2,*.DNG,*.dng}", GLOB_BRACE);

foreach ($files as $file) {
    // Do your db validation here.
    // echo "$file";
}
?>
Related
I have 1000+ txt files with usernames as the file names. I'm currently reading them in a loop. Here is my code:
for ($i = 0; $i < 1240; $i++) {
    $node = $users_array[$i];
    $read_file = "Uploads/" . $node . "/" . $node . ".txt";
    if (file_exists($read_file)) {
        if (filesize($read_file) > 0) {
            $myfile = fopen($read_file, "r");
            $file_str = fread($myfile, filesize($read_file));
            fclose($myfile);
        }
    }
}
When the loop runs, it takes too much time and the server times out.
I don't know why it takes that long, because the files don't hold much data. Reading all the text from a txt file should be fast. Am I right?
Well, you are doing read operations on an HDD/SSD, which is not as fast as memory, so you should expect a longer running time depending on how big the text files are. You can try the following:
if you are running the script from a browser, run it from the command line instead; that way you won't hit the web server timeout and the script can finish, provided PHP has no execution time limit set (or you increase it)
in your script above, store the result of filesize($read_file) in a variable so you don't call it twice; it may improve the running time
if you still can't finish the job, consider running it in batches of 100 or 500
keep an eye on memory usage; maybe that is why the script dies
if you need the content of the file as a string, try file_get_contents() and maybe skip the filesize() check altogether
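Putting those suggestions together, a sketch (the function name, base directory and batch size are mine, not from the question):

```php
<?php
// Read each user's file once with file_get_contents(), in batches so a
// single run stays well under the timeout.
function read_user_files(array $users, string $baseDir, int $offset, int $batchSize): array
{
    $contents = [];
    foreach (array_slice($users, $offset, $batchSize) as $node) {
        $path = "$baseDir/$node/$node.txt";
        if (is_file($path)) {
            // One call replaces fopen/fread/fclose and both filesize() calls.
            $contents[$node] = file_get_contents($path);
        }
    }
    return $contents;
}
```

Each run you pass a different $offset (from the command line or a query parameter) until the whole list is covered.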
It sounds like your problem is having 1000+ files in a single directory. On a traditional Unix file system, finding a single file by name requires scanning through the directory entries one by one. If you have a list of files and try to read all of them, it'll require traversing about 500000 directory entries, and it will be slow. It's an O(n^2) algorithm and it'll only get worse as you add files.
Newer file systems have options to enable more efficient directory access (for example https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash_Tree_Directories) but if you can't/don't want to change file system options you'll have to split your files into directories.
For example, you could take the first two letters of the user name and use that as the directory. That's not great because you'll get an uneven distribution, it would be better to use a hash, but then it'll be difficult to find entries by hand.
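A sketch of the hash variant (the path layout is an illustration, not a standard):

```php
<?php
// Derive a two-level subdirectory from a hash of the user name, so files
// spread evenly instead of clustering on common first letters.
function shard_path(string $user, string $base = 'Uploads'): string
{
    $h = md5($user); // any stable hash works; md5 is fine for bucketing
    return $base . '/' . substr($h, 0, 2) . '/' . substr($h, 2, 2) . '/' . $user;
}
```

Each level then holds at most 256 subdirectories, so directory scans stay short even with millions of users.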
Alternatively you could iterate the directory entries (with opendir and readdir) and check if the file names match your users, and leave dealing with the problems the huge directory creates for later.
Alternatively, look into using a database for your storage layer.
I have a problem when I create a new directory with PHP when uploading a file.
The directory is created, but if another instance of the same script runs at the same time, the directory-exists check doesn't work correctly (PHP gives a warning).
Someone told me it's a race condition, but I still have the issue after adding some random sleep time:
usleep(mt_rand(1, 50));
if (!is_dir($dir)) {
    mkdir($dir);
}
usleep(mt_rand(1, 50));
can anyone help?
Does anybody know a safe way to upload a file in multiple parts, with 3-4 parts being uploaded at the same time? Currently I move the uploaded parts into a temporary directory (is_dir fails on the temporary dir if several parts arrive at the same time); then, when the number of files in that dir equals the number of parts, the parts get combined. But it fails many times: sometimes is_dir gives a warning, sometimes the parts get combined twice...
mkdir() returns true on success and false on failure (e.g. the directory already exists), but it also emits a warning on failure, and checking is_dir() first still leaves a window in which another request can create the directory. Suppress the warning and re-check after a failed mkdir():
if (!is_dir($dir) && !@mkdir($dir) && !is_dir($dir)) {
    // creation genuinely failed (permissions, bad path, ...)
} else {
    // good to go
}
You want to call clearstatcache() at the top of the page testing for the directory, so that PHP's cached directory information is fresh; see the clearstatcache() manual page for details.
The upload JavaScript splits the file into multiple parts and sends 4 AJAX requests to PHP, so PHP receives 4 parts simultaneously. When all parts are received, PHP should combine them to reproduce the file on the server.
So why split them up? Send the AJAX as a single request and let PHP handle each blob file part in its own scripting. This will sidestep the problem. PHP, with the correct setup, can comfortably handle large files in chunked blocks.
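If the parts do stay parallel, the double-combine usually happens because two requests both see the last part arrive. One way to guard the combine step is an exclusive flock() (a sketch; the part naming and lock file are assumptions):

```php
<?php
// Only one request at a time may run the combine; a later caller waits
// for the lock and then sees the target already exists, so it does nothing.
function try_combine(string $partsDir, int $expectedParts, string $target): bool
{
    $lock = fopen("$partsDir/.lock", 'c');
    if ($lock === false || !flock($lock, LOCK_EX)) {
        return false;
    }
    $combined = false;
    $parts = glob("$partsDir/part_*");
    if (count($parts) === $expectedParts && !is_file($target)) {
        sort($parts, SORT_NATURAL);       // part_1, part_2, ... part_10
        $out = fopen($target, 'wb');
        foreach ($parts as $p) {
            fwrite($out, file_get_contents($p));
        }
        fclose($out);
        $combined = true;
    }
    flock($lock, LOCK_UN);
    fclose($lock);
    return $combined;
}
```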
I'm trying to mass compress images with GD on my site, it works just fine when I try to compress a small folder with 20 images but I have around 70k images and when I use the script I get a timeout and 500 error message. This is the code:
$di = new RecursiveDirectoryIterator('./image/data/');
$iter = new RecursiveIteratorIterator($di);
$regexIter = new RegexIterator($iter, '/^.+\.jpg$/i', RecursiveRegexIterator::GET_MATCH);

foreach ($regexIter as $fileInfo) {
    $img = imagecreatefromjpeg($fileInfo[0]);
    imagejpeg($img, $fileInfo[0], 75);
}
Now I already searched for this topic and found out that I can use:
set_time_limit();
So I decided to add
set_time_limit(100000);
but this is not working; I still get the timeout message and no images are compressed.
Do you have any suggestions on how I could do this efficiently because typing in every folder would take me weeks.
The better way to do big jobs is to split them into parts.
E.g. move the processed pictures into another directory and stop the script after 100 pictures.
Then you just have to restart the same script a few times and all pictures are done.
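A sketch of that batch idea (the helper and directory names are mine; the compress step is passed in so the batching logic stays separate):

```php
<?php
// Process at most $limit files per run, moving finished ones to $doneDir
// so the next run skips them automatically.
function process_batch(array $files, int $limit, string $doneDir, callable $compress): int
{
    $done = 0;
    foreach ($files as $src) {
        if ($done >= $limit) {
            break;
        }
        $compress($src);
        rename($src, $doneDir . '/' . basename($src));
        $done++;
    }
    return $done; // how many were handled this run
}
```

With GD the callable would be something like `function ($f) { $img = imagecreatefromjpeg($f); imagejpeg($img, $f, 75); imagedestroy($img); }` — note the imagedestroy(): without it, 70k images will exhaust memory long before any timeout.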
To answer your question, increasing the timeout is something you should ask your hosting provider. Of course, they may refuse to do it.
A good idea is to transform your script to run from command line. The processing is faster and the timeout is usually much, much higher. But then again, it requires for you to have command line access on the server.
Last and preferred option is to transform your script into "chaining". Since most of the time will be spent doing the actual image conversion, this is what I would do:
get a list of all images with their full paths; save it in the session or in a temporary table
start processing each image from the list, deleting it from the list once it's done
at every image, check how much time has passed since the script started; if it's getting close to the timeout, redirect to the same script with an additional "offset" parameter
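The time check could look like this (the 25-second budget and names are assumptions; the caller issues the redirect):

```php
<?php
// Process images until the time budget is nearly spent, then report the
// index to resume from; null means the whole list is finished.
function process_until_deadline(array $images, int $offset, int $budgetSeconds, callable $work): ?int
{
    $start = time();
    $total = count($images);
    for ($i = $offset; $i < $total; $i++) {
        if (time() - $start > $budgetSeconds) {
            return $i; // redirect to ?offset=$i and continue there
        }
        $work($images[$i]); // e.g. the GD compression step
    }
    return null;
}

// In the web script, roughly:
// $next = process_until_deadline($images, (int)($_GET['offset'] ?? 0), 25, $compress);
// if ($next !== null) { header("Location: {$_SERVER['PHP_SELF']}?offset=$next"); exit; }
```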
So I'm trying to see if something like this is possible WITHOUT using database.
A file is uploaded to the server /files/file1.html
PHP is tracking the upload time by checking last update time in database
If the file (file1.html) has been updated since the last DB time, PHP makes changes; Otherwise, no changes are made
Basically, for a text simulation game (basketball), it outputs HTML files for rosters/stats/standings/etc. and I'd like to be able to insert each team's Logo at the top (which the outputted files don't do). Obviously, it would need to be done often as the outputted files are uploaded to the server daily. I don't want to have to go through each team's roster manually inserting images at the top.
Don't have an example as the league hasn't started.
I've been thinking of just creating a button on the league's website (not created yet) that when pushed would update the pages, but I'm hoping to have PHP do it by itself.
Yes, you could simply let PHP check the file's modification date (the point in time when the file was last changed on the server, not when the picture itself was taken). Check http://php.net/manual/en/function.filemtime.php and you should be done within 30 mins ;)
Quick & dirty, unproven code:
$filename = 'somefile.txt';
$last_check = $timestamp_from_db; // when you last processed this file
if (filemtime($filename) > $last_check) {
    // the file changed since the last run: rewrite it (insert the logo etc.)
}
I will be sending new files from one computer to another. How do I make PHP automatically detect new/updated files in the folders and enter the information inside the files into a MySQL database?
Get all files you already know from the database
loop through the directory with http://www.php.net/manual/de/function.readdir.php
if the file is known, do nothing
if the file is not known, add it to the database
In the end, delete all files no longer in the directory
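The steps above, sketched (the DB calls are left as comments since the schema isn't given):

```php
<?php
// Compare a directory listing against the names already in the database.
// Returns [files seen on disk, known names that have disappeared].
function sync_dir(string $dir, array $known): array
{
    $seen = [];
    $dh = opendir($dir);
    while (($entry = readdir($dh)) !== false) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $seen[] = $entry;
        if (!in_array($entry, $known, true)) {
            // new file: read it and INSERT into the database here
        }
    }
    closedir($dh);
    $gone = array_values(array_diff($known, $seen)); // DELETE these rows
    return [$seen, $gone];
}
```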
I would pick a setup where new files and old files are in separate directories.
But if you have no choice, you could check the modification date and match it with your last directory iteration. (Use filemtime for this).
Don't forget to do some database checking when you process an image though.
Save the timestamp of the last check, and on the next check look at the file info and compare. Better yet, since you store the file contents in a database, check the time each file was modified using filemtime().
You can't do it fully automatically. PHP works as a preprocessor, and it even has an execution time limit (set in the configuration). If you need to process with PHP, make a PHP script that outputs a web page which uses a meta redirect to itself. Inside the script, loop over the files and query the database for each file name and its modification time: if both match, there is nothing to do; if the name exists but the time differs, it's an update; otherwise it's a new file.
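The meta-redirect part can be reduced to a tiny helper (the batch parameter name `i` is an assumption):

```php
<?php
// Emit a meta refresh pointing at the next batch, or nothing when done.
function next_batch_tag(int $i, int $totalBatches): string
{
    if ($i + 1 >= $totalBatches) {
        return ''; // last batch processed: stop reloading
    }
    return '<meta http-equiv="refresh" content="1;url=?i=' . ($i + 1) . '">';
}

// In the page: process batch (int)($_GET['i'] ?? 0) against the database,
// then echo next_batch_tag($i, $totalBatches);
```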