Run large PHP scripts without them timing out?

I'm trying to mass-compress images with GD on my site. It works just fine when I compress a small folder with 20 images, but I have around 70k images, and when I run the script on those I get a timeout and a 500 error. This is the code:
$di = new RecursiveDirectoryIterator('./image/data/');
$iter = new RecursiveIteratorIterator($di);
$regexIter = new RegexIterator($iter, '/^.+\.jpg$/i', RecursiveRegexIterator::GET_MATCH);
foreach ($regexIter as $fileInfo) {
    $img = imagecreatefromjpeg($fileInfo[0]);
    imagejpeg($img, $fileInfo[0], 75);
}
I already searched this topic and found out that I can use:
set_time_limit();
So I decided to add
set_time_limit(100000);
but this is not working; I still get the timeout message and no images are compressed.
Do you have any suggestions on how I could do this efficiently? Typing in every folder by hand would take me weeks.

A better way to handle big jobs is to split them into smaller parts.
For example, move each treated picture into another directory and stop the script after 100 pictures.
Then you just have to restart the same script a few times and all pictures are done.
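For what it's worth, a minimal sketch of that batching idea could look like this (the './image/done/' directory and the batch size of 100 are placeholders, not from the question):
$batchSize = 100;
$processed = 0;

$di   = new RecursiveDirectoryIterator('./image/data/');
$iter = new RecursiveIteratorIterator($di);
$jpgs = new RegexIterator($iter, '/^.+\.jpg$/i', RecursiveRegexIterator::GET_MATCH);

foreach ($jpgs as $fileInfo) {
    $path = $fileInfo[0];

    $img = imagecreatefromjpeg($path);
    imagejpeg($img, $path, 75);
    imagedestroy($img); // free memory before the next image

    // Move the treated picture out of the source tree so the next run skips it
    rename($path, './image/done/' . basename($path));

    if (++$processed >= $batchSize) {
        echo "Batch done; run the script again for the next batch.";
        break;
    }
}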

To answer your question, increasing the timeout is something you should ask your hosting provider about. Of course, they may refuse to do it.
A good idea is to transform your script to run from the command line. The processing is faster and the timeout is usually much, much higher. But then again, it requires you to have command-line access on the server.
The last, and my preferred, option is to turn your script into a "chain". Since most of the time will be spent doing the actual image conversion, this is what I would do (a sketch follows the list below):
get a list of all images with their full paths; save it in the session or in a temporary table
start processing the images from the list, deleting each entry once it's been done
after every image, check how much time has passed since the start of the script, and if it's getting close to the timeout, redirect to the same script with an additional "offset" parameter
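A rough sketch of that chaining idea, assuming the list is kept in the session and the script is reached over HTTP (the "offset" parameter name and the 25-second budget are only illustrative):
session_start();

$timeBudget = 25;   // stay well below a typical 30-second limit
$start      = time();
$offset     = isset($_GET['offset']) ? (int) $_GET['offset'] : 0;

if (!isset($_SESSION['images'])) {
    // Build the full list once and keep it for the following requests
    $di   = new RecursiveDirectoryIterator('./image/data/');
    $iter = new RecursiveIteratorIterator($di);
    $_SESSION['images'] = array();
    foreach (new RegexIterator($iter, '/^.+\.jpg$/i') as $file) {
        $_SESSION['images'][] = (string) $file;
    }
}

$total = count($_SESSION['images']);
for ($i = $offset; $i < $total; $i++) {
    $path = $_SESSION['images'][$i];
    $img  = imagecreatefromjpeg($path);
    imagejpeg($img, $path, 75);
    imagedestroy($img);

    // Close to the limit? Hand off to a fresh request and continue from here.
    if (time() - $start >= $timeBudget) {
        header('Location: ' . $_SERVER['PHP_SELF'] . '?offset=' . ($i + 1));
        exit;
    }
}

echo "All $total images processed.";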

Related

Read more than 1000 txt files in core PHP

I have 1000-plus txt files whose file names are usernames. Right now I'm reading them in a loop. Here is my code:
for ($i = 0; $i < 1240; $i++) {
    $node = $users_array[$i];
    $read_file = "Uploads/" . $node . "/" . $node . ".txt";
    if (file_exists($read_file)) {
        if (filesize($read_file) > 0) {
            $myfile = fopen($read_file, "r");
            $file_str = fread($myfile, filesize($read_file));
            fclose($myfile);
        }
    }
}
When the loop runs, it takes too much time and the server times out.
I don't know why it takes that long, because the files don't have much data in them. Reading all the text from a txt file should be fast, right?
Well, you are doing read operations on an HDD/SSD, which is not as fast as memory, so you should expect a long running time depending on how big the text files are. You can try the following:
if you are running the script from the browser, run it from the command line instead; that way you will not get a web-server timeout, and the script will manage to finish as long as no execution time limit is set in PHP (and if one is, maybe you should increase it)
in your script above, store the result of filesize($read_file) in a variable so you do not call it twice; it might speed up the script
if you still can't finish the job, consider running it in batches of 100 or 500
keep an eye on memory usage; maybe that is why the script dies
if you need the content of the file as a string, try file_get_contents() and maybe skip the filesize() check altogether
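Putting those points together, a sketch of the reworked loop might look like this (assuming $users_array is already populated as in the question):
foreach ($users_array as $node) {
    $read_file = "Uploads/" . $node . "/" . $node . ".txt";

    if (!is_file($read_file)) {
        continue;
    }

    // file_get_contents() replaces the fopen/filesize/fread/fclose sequence
    $file_str = file_get_contents($read_file);
    if ($file_str === false || $file_str === '') {
        continue; // unreadable or empty file
    }

    // ... work with $file_str here ...
}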
It sounds like your problem is having 1000+ files in a single directory. On a traditional Unix file system, finding a single file by name requires scanning through the directory entries one by one. If you have a list of files and try to read all of them, it'll require traversing about 500000 directory entries, and it will be slow. It's an O(n^2) algorithm and it'll only get worse as you add files.
Newer file systems have options to enable more efficient directory access (for example https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout#Hash_Tree_Directories) but if you can't/don't want to change file system options you'll have to split your files into directories.
For example, you could take the first two letters of the user name and use that as the directory. That's not great because you'll get an uneven distribution; it would be better to use a hash, but then it'll be difficult to find entries by hand.
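To illustrate, the two layouts could be computed like this (the "Uploads" base path comes from the question; the helper names are hypothetical):
// Variant 1: first two letters of the user name -- easy to browse by hand,
// but the number of files per directory will be uneven.
function pathByPrefix($user) {
    return "Uploads/" . substr($user, 0, 2) . "/" . $user . ".txt";
}

// Variant 2: first two hex characters of a hash -- even distribution,
// but you can no longer guess the directory from the name alone.
function pathByHash($user) {
    return "Uploads/" . substr(md5($user), 0, 2) . "/" . $user . ".txt";
}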
Alternatively, you could iterate over the directory entries (with opendir and readdir), check whether the file names match your users, and leave the problems the huge directory creates for later.
Alternatively, look into using a database for your storage layer.

Load time of GD drawing and web parsing

I created a PHP file that draws a map with GD based on data obtained from another site. The problem is that the PHP run time makes the page load very slowly.
The question is: is there any way to run this PHP code only once a day, or any way to have the web server run it automatically?
You need to cache your map image and load it from a file if it already exists. Regenerate it once a day. This skeletal code outlines how that can be accomplished. The first time the page loads after the image has become more than a day old, it will be regenerated and saved to a file.
// If the cached file is missing or older than 1 day, create a new one
if (!file_exists("imagecache.jpg") || filemtime("imagecache.jpg") < time() - 86400) {
    // Generate your new image and let imagejpeg() write it to the file
    // (assuming $im is an image resource from GD)
    imagejpeg($im, "imagecache.jpg");
}
There are many ways to do this. They all start with a PHP script that creates a static graphic file using GD and saves it somewhere on disk. That file is what you will show to users.
Once you're generating that file, your two easiest choices might be:
Point your users to the static file, and invoke the php periodically using cron or something similar.
Point your users to a PHP script. Have your PHP script check the graphic file's timestamp and if it's older than a certain age, regenerate it and then send that to the user. Thus you have some PHP overhead but it's less than generating the graphic every time.
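A sketch of that second choice (mapcache.jpg and generateMap() are stand-ins for your own file name and GD drawing code):
$cache  = __DIR__ . '/mapcache.jpg';
$maxAge = 86400; // one day in seconds

if (!file_exists($cache) || filemtime($cache) < time() - $maxAge) {
    $im = generateMap();        // your existing GD code that builds the map
    imagejpeg($im, $cache);
    imagedestroy($im);
}

header('Content-Type: image/jpeg');
header('Content-Length: ' . filesize($cache));
readfile($cache);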
Create a cron job that runs once a day (preferably during a light traffic time) to do your heavy lifting, save or cache the result (for example, using APC or memcached, or even just overwriting the currently-used image with the new one), and display that result to your users.

PHP file_exists problem - 1000s of files

I have a folder that contains 1000+ sub folders. In each subfolder there are 1-800 images, plus a thumbs folder with the same number of smaller versions. In total there are around 18000 photos and 18000 thumbs
I have created a script to run occasionally to check if everything is present (all paths, folders and photo names are stored in database).
Using either file_exists() or is_file(), plus clearstatcache(), I run a loop over my db records to check if everything is ok.
To check if the script actually works, I have included a check field in my table structure:
photo_present SET('Y','N')
Each time I run the script each photo validated will have the photo_present flag set to Y.
After only a few records, 300-800, the script gives an internal server error 500.
I checked my table and I know that the script has run for a while since I see the photo_present field is set to Y.
My question is how to optimize it so that either file_exists() or is_file() will continue to work until all the files have been checked.
How are you running the script? If it's by hitting the web page, it could be that your server caps the runtime of a script at 30 seconds or so and is killing PHP after that time, resulting in an Internal Server Error. The same could be true if it's being run via cron, but that's less likely.
Are you sure this isn't a script timeout? Check http://php.net/manual/en/function.set-time-limit.php
I consider glob() a good alternative for browsing folders and files; you might be able to build something on this.
<?php
// get all image files
$files = glob("{*.jpg,*.JPG,*.gif,*.GIF,*.png,*.PNG,*.cr2,*.CR2,*.DNG,*.dng}", GLOB_BRACE);

// print each file name
foreach ($files as $file) {
    // Do your db validation here.
    // echo "$file";
}
?>

Only grab completed files

I'm making a really simple "backend" (PHP5) for two Flash/AIR applications. One of them uploads a photo, the backend saves it to a folder, and the second app polls the backend for new photos and shows them.
I don't have any access to a database, so the backend has to be pure PHP5 and nothing more. That's why I chose to save the images to a folder (with a timestamp in their names) and use readdir() to get them back.
This all works like a charm. Nevertheless, I would really like to make sure the backend only returns photos that are completely uploaded, preventing the second app from trying to load an unfinished image. Are there any methods/tricks that I can use to validate a file?
You could check the filesize a couple hundred milliseconds apart and see if it changes:
$first = filesize($file);
// wait 100ms
usleep(100000);
clearstatcache();   // filesize() results are cached, so clear the stat cache first
$second = filesize($file);
if ($first == $second) {
    // file is no longer being actively uploaded
}
The usual trick for atomic filesystem operations is to write into a temporary file that is not matched by the reader (e.g. XXX.jpg.tmp) and, once it's completely uploaded, rename it to its target name. Renames on the same volume are atomic, so there is no point at which the file is either incomplete or unavailable.
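On the upload side that could look roughly like this (the "photo" field name and target directory are placeholders):
$target = __DIR__ . '/photos/' . time() . '_' . basename($_FILES['photo']['name']);
$temp   = $target . '.tmp';

// Write to a name the reader never matches...
if (move_uploaded_file($_FILES['photo']['tmp_name'], $temp)) {
    // ...then atomically expose it under its final name.
    rename($temp, $target);
}
The listing script then only returns *.jpg names, so a half-written .tmp file is never handed to the second app.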
A really easy and common way to do this would be to create a trigger file based on the file's name, so that you get something like
123.jpg
123.rdy
or
123.jpg
123.jpg.rdy
You create that file (just an empty stub) as soon as the upload is complete. The application that grabs files only cares about files with a trigger file and then processes those. Alternatively, you could save the uploaded file as e.g. 123.bsy or 123.jpg.bsy while it is still being uploaded and then rename it to the final name 123.jpg after the upload is done. Since renames in the same directory are usually really cheap operations in terms of processing time, the chances of running into a race condition should be pretty low. (This might or might not depend on the OS used, though...)
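The reader side of the trigger-file convention could be as simple as this sketch (the directory name is assumed):
$ready = array();
foreach (glob(__DIR__ . '/photos/*.jpg') as $jpg) {
    if (file_exists($jpg . '.rdy')) {
        $ready[] = basename($jpg);   // only fully uploaded photos end up here
    }
}
The upload script would simply call touch() on the matching .rdy name once the image has been written out completely.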
If you need to keep the files in that place, you could, of course, use a database in which you add a record for each file once its upload is complete. The other app could then just serve files that have a matching database record.
After writing this all down I figured it out myself. What I did was add the exact number of bytes to the filename as well and validate that while outputting the list of images. The .tmp/.bsy solution is nice too, but I read it a bit too late :)
The upside to my solution is that no renaming is required after the upload is done. Thanks everybody for your fast answers!
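For completeness, the listing side of that approach might look like this sketch (the exact naming pattern, timestamp_bytecount.jpg, is only an illustration):
$complete = array();
foreach (glob(__DIR__ . '/photos/*_*.jpg') as $path) {
    // The trailing number in the name is the expected byte count
    if (preg_match('/_(\d+)\.jpg$/i', $path, $m) && filesize($path) == (int) $m[1]) {
        $complete[] = basename($path);
    }
}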

Output stuff while a PHP script is running

So I just created a script to resize a whole bunch of images. Is there any way to have it print output as it's running through the loop?
Basically I have about 400 photos in a photo db table. The script gathers a list of all these photos, then loops through each one and resizes it 3 times (large, medium, and small versions).
Right now I echo each image's results inside the loop, but I don't see the results until everything is done. So, like 10 minutes later, I get all the output at once. I added set_time_limit(0); to make sure it doesn't time out.
EDIT: It looks like the output does actually reach the browser every so often, maybe every 30 seconds?
You can use flush() or ob_flush() to tell the script to send something back to the client after you echo it.
BUT you never really have complete control over it; the web server does, so it may not cooperate depending on how it's configured. For example, if you have the server doing gzip compression rather than using PHP's gzip features, the web server may still buffer the output.
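A minimal sketch of how that looks inside the resize loop (resizePhoto() and $photos stand in for your own code and query results):
set_time_limit(0);

// Drop any PHP-level output buffers so flush() actually reaches the server
while (ob_get_level() > 0) {
    ob_end_flush();
}

$total = count($photos);
foreach ($photos as $i => $photo) {
    resizePhoto($photo, 'large');
    resizePhoto($photo, 'medium');
    resizePhoto($photo, 'small');

    echo "Resized " . ($i + 1) . " of $total<br>\n";
    flush();   // ask PHP to push this line to the web server right away
}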
