Perform large operations in PHP without running out of memory? - php

I'm generating about 700 PDFs with dompdf and saving them to the disk. Each one involves a database query to fetch the information for the PDF. When I run it I get about 2 minutes in and then it runs out of memory.
Why am I running out of memory? Shouldn't each PDF generation consume x memory, then release it when finished?
What's good practice in PHP for doing large operations like this to avoid strangling your server?
Some more info:
It generates many PDFs (several hundred) before running out of memory.
The PDF it fails on is not abnormally large or special in any way; they're all between 4KB and ~500KB.
Watching the memory usage as it processes, it just slowly climbs in fits and starts until it runs out.
I can't get the wysiwyg to code-format properly, so here's a pastie for the snippets in question: http://pastie.org/3800751

Your problem is probably that you're running your code asynchronously. Try running it synchronously; it might take a really long time, but it will work. Also, make sure to dispose of your objects at the end of each loop iteration, as in the sketch below.
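As a rough illustration of that last point, here is a minimal sketch of the generation loop. It assumes Composer autoloading and the current Dompdf\Dompdf API; fetchHtmlForRecord() is a hypothetical helper that runs the per-record database query:

require __DIR__ . '/vendor/autoload.php';

use Dompdf\Dompdf;

foreach ($records as $record) {
    $dompdf = new Dompdf();                 // fresh instance for every PDF
    $dompdf->loadHtml(fetchHtmlForRecord($record));
    $dompdf->render();
    file_put_contents("pdfs/{$record['id']}.pdf", $dompdf->output());

    unset($dompdf);                         // drop the reference at the end of the iteration...
    gc_collect_cycles();                    // ...and collect any leftover circular references
}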

Ideas:
- Increase the memory limit with ini_set()
- Use a cron job, or generally try to run each PDF generation asynchronously with a queue
- Batch (split) the processing into chunks that can be processed within one page load: redirect to the same page after each batch using header(), keep track of where you are on each script load with the session, and beware of redirect limits on browsers (see the sketch after this list)
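A minimal sketch of that batching idea, assuming it runs in a web request and that getRecordIds() and generatePdf() are hypothetical stand-ins for the real work:

session_start();

$batchSize = 25;                              // tune to what fits comfortably in one request
$offset    = isset($_SESSION['pdf_offset']) ? $_SESSION['pdf_offset'] : 0;
$ids       = getRecordIds();                  // placeholder: IDs of all records to process

foreach (array_slice($ids, $offset, $batchSize) as $id) {
    generatePdf($id);                         // placeholder: query + dompdf + save to disk
}

$_SESSION['pdf_offset'] = min($offset + $batchSize, count($ids));

if ($_SESSION['pdf_offset'] < count($ids)) {
    header('Location: ' . $_SERVER['PHP_SELF']);  // redirect to process the next batch
    exit;
}

unset($_SESSION['pdf_offset']);
echo 'All PDFs generated.';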

In general, large operations like this should be forked into child processes that each handle generating a single PDF. This should avoid the out of memory situation in case your PDF or DB libraries leak memory.
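A rough sketch of that approach using the pcntl extension (CLI only; $recordIds and generatePdf() are hypothetical placeholders for the real data and work):

$maxChildren = 4;              // how many PDFs to build concurrently
$running     = 0;

foreach ($recordIds as $id) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    } elseif ($pid === 0) {
        generatePdf($id);      // the child does the heavy work...
        exit(0);               // ...and exits, so any leaked memory dies with it
    }

    if (++$running >= $maxChildren) {
        pcntl_wait($status);   // wait for one child to finish before forking more
        $running--;
    }
}

while ($running-- > 0) {
    pcntl_wait($status);       // reap the remaining children
}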

Try increasing memory_limit in php.ini.
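For example (512M is just an illustrative value; pick what your server can actually spare):

ini_set('memory_limit', '512M');   // or raise memory_limit directly in php.ini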

Related

Saving Memory / Calling external code and dumping memory used before continuing

I've got a backup script that continually builds excel files as it executes. The main script calls a class to handle the excel file build a few hundred times (one for each excel file).
The problem is that with each cycle of the loop, calling on the class to build the next excel file just adds to the used memory. Eventually this overwhelms the memory allocated for the execution.
I know the temporary answer is increase the memory allowed, but I was hoping I could wrap the file building with some memory-type ob_start/ob_clean functions.
I've tried to debug with xdebug for the past couple of days and I don't see any gaping holes that would cause the problem; the memory usage is a pretty gradual increase over time.
Thank you!
Try separating the actual "work" into a separate .php script and calling it using shell_exec (http://php.net/manual/en/function.shell-exec.php) from your main script.
That way any memory allocated by the "worker" script is automatically cleared when it finishes executing.
Also, you might look at cron jobs to execute the main script at intervals instead of letting it run as a daemon.
There's also forking.
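A minimal sketch of the shell_exec() approach, assuming $totalFiles holds the number of files to build and build_one_file.php is a hypothetical worker that builds a single Excel file identified by its argument:

// Main script: hand each file to a short-lived worker process.
// Everything the worker allocates is freed when that process exits.
for ($i = 0; $i < $totalFiles; $i++) {
    shell_exec('php ' . escapeshellarg(__DIR__ . '/build_one_file.php') . ' ' . (int) $i);
}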

PHP Scripts with a huge amount of Execution Time

I am writing a PHP script which downloads pictures from the Internet. As the data is huge, the execution time for the script varies from 10 to 15 minutes. Are there any better ways to handle such a situation, or should I simply execute the script and let it take the time it takes?
Your script appears to be essentially I/O bound. Short of getting more bandwidth, there's little you can do.
You can improve user experience (if any) by increasing interactivity. For example you can save the filenames you intend to download in a session, and redisplay the page (and refresh it, or go AJAX) after each one, showing expected completion time, current speed, and percentage of completion.
Basically, the script will save in session the array of URLs, and at each iteration pop some of them and download them, maybe checking the time it takes (if you download one file in half a second, it's worth it to download another).
Since the script is executed several times, not only one, you need not worry about its timeout. You do, however, have to deal with the possibility of the user aborting the whole process.
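A rough sketch of that session-driven loop (the $allImageUrls array and the half-second budget are assumptions based on the description above):

session_start();

if (!isset($_SESSION['queue'])) {
    $_SESSION['queue'] = $allImageUrls;              // assumed: list of image URLs to fetch
    $_SESSION['total'] = count($allImageUrls);
}

$start = microtime(true);
while ($_SESSION['queue'] && microtime(true) - $start < 0.5) {
    $url = array_pop($_SESSION['queue']);
    file_put_contents('images/' . basename($url), file_get_contents($url));
}

$done = $_SESSION['total'] - count($_SESSION['queue']);
echo "Downloaded {$done} of {$_SESSION['total']} images";

if ($_SESSION['queue']) {
    echo '<meta http-equiv="refresh" content="1">';  // reload the page for the next chunk
}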
I would've recommended multiple threads to do it faster if there were no bandwidth restrictions, but the closest thing PHP has is process control.
Alternatively, some time ago I wrote a similar scraper, and to execute it faster I used the exec functions to launch multiple instances of the same file. That means you also need a shared work repository and a locking mechanism. It sounds and looks dirty, but it works!
If optimisation is worth the time investment, and if a substantial part of the execution time is spent on image processing, then calling a shell script that spins up a few processes might be an option.
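A sketch of spawning several background instances of a worker from PHP (scrape_worker.php and its index/total arguments are hypothetical; each worker would take every Nth URL):

$workers = 4;
for ($i = 0; $i < $workers; $i++) {
    // Redirecting output and appending & lets exec() return immediately.
    exec(sprintf(
        'php %s %d %d > /dev/null 2>&1 &',
        escapeshellarg(__DIR__ . '/scrape_worker.php'),
        $i,
        $workers
    ));
}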

Creating thumbnails from images 10-20 in parallel at the same time with PHP

I have big imports where I need to create a thumbnail of an image for each entry. The problem is that, when the server is not the fastest one, I need 1-2 seconds per entry to import it with the thumbnail into the DB. That is a huge amount of time with 200k rows.
Is there any library in PHP where I can start, for example, the creation of 10-20 thumbnails as threads in parallel at the same time, so I can speed up the import 10x, I hope?
PHP provides functions such as pcntl_fork(), but they should only be used from CLI scripts. There's no way of making your web script parallel. However, you can always execute (for example) a bash script which will run imagemagick resize ... &. But generally the better approach is to prepare a cron job which will generate the thumbnails in the background.
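A sketch of that fire-and-forget approach with ImageMagick's convert binary (paths and the thumbnail size are placeholders; convert must be installed on the server):

// Start the resize in the background and return to the import loop immediately.
$src = escapeshellarg('/path/to/uploads/photo123.jpg');
$dst = escapeshellarg('/path/to/thumbs/photo123.jpg');
exec("convert {$src} -thumbnail 150x150 {$dst} > /dev/null 2>&1 &");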
Perhaps you should look into increasing the PHP memory limit. More memory may enable PHP to process each image faster. If you have PHP process multiple images simultaneously, they will most likely each be processed more slowly and you will end up with the same or worse overall speed.
Also, why are you storing the images in the DB? Maybe you should instead store them in the filesystem and just store a reference to their location in the DB.

reading large file from remote server using php script

I am trying to read large files, let's say an Illustrator file or a Photoshop file, using a cron job in my system.
File sizes vary from 20 MB to 300 MB.
I have been using some functions, but they break in the middle while reading, so I wanted to have a fresh opinion.
Among these functions:
file_get_contents
readfile
curl
which is most effective in terms of:
consistency (should not break while reading the file)
speed
resource usage
If there are more than two cron jobs, do they impact overall server performance?
Please share best-practice code.
Thanks in advance
Use cURL. The file functions have been deprecated in favor of cURL to open remote files. It's not only faster, but also more reliable¹ (you are less likely to experience timeouts).
If your script times out or runs out of memory anyway, you'll want to increase the execution time and memory limits (max_execution_time and memory_limit).
Other notes:
readfile() reads a file and prints it to the output buffer; it's not the same thing as file_get_contents().
If you compile PHP with --with-curlwrappers, then file_get_contents() will use cURL instead of the fopen() wrappers.
¹ Citation needed.
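A sketch of a cURL download that streams the remote file straight to disk instead of buffering it in memory (the URL and paths are placeholders):

$dst = fopen('/tmp/design.psd', 'wb');

$ch = curl_init('http://example.com/files/design.psd');
curl_setopt($ch, CURLOPT_FILE, $dst);            // write the body directly to the file handle
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 600);          // generous timeout for 300 MB files

if (!curl_exec($ch)) {
    error_log('Download failed: ' . curl_error($ch));
}

curl_close($ch);
fclose($dst);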
You need to split the task in two if the files are that big.
First you download the file with wget, and once you have your file you process it with PHP.
This way you are less likely to run into timeout problems.
If you don't know which file to download, because it comes from a PHP variable of some sort, you can write the name of the required file to a file as the first step of your job,
then pass it to wget via --input-file=file as the second step,
and then process it as the third and final step with your PHP program.
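A sketch of those three steps driven from PHP ($remoteUrl, the list file path and processFile() are placeholders):

// Step 1: write the name of the required file to a list file.
file_put_contents('/tmp/to_download.txt', $remoteUrl . "\n");

// Step 2: let wget handle the transfer; it retries on its own.
exec('wget --input-file=/tmp/to_download.txt --directory-prefix=/tmp/downloads/ -q');

// Step 3: process the local copy with PHP.
processFile('/tmp/downloads/' . basename($remoteUrl));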
DirectIO is a low-level extension that bypasses PHP's stream layer and works directly with the file descriptors provided by the operating system, so it is probably the most efficient option.
http://php.net/manual/en/ref.dio.php
Note that as of PHP 5.1.0 it is no longer bundled with PHP. Also, if your script is breaking in the middle of the operation, check your max_execution_time and memory_limit settings.
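For illustration, a minimal read loop with the PECL dio extension (it must be installed separately; the path and 8 KB chunk size are arbitrary):

$fd = dio_open('/path/to/large.psd', O_RDONLY);   // raw POSIX open

while (true) {
    $chunk = dio_read($fd, 8192);                 // read up to 8 KB at a time
    if ($chunk === false || $chunk === '') {
        break;                                    // error or end of file
    }
    // handle $chunk here, e.g. hash it or write it somewhere else
}

dio_close($fd);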

PHP calling multiple videos to convert at once via mencoder.How can I limit it?

I recently installed my video script on a new server, but I am seeing that it will start to convert one video (via mencoder), then before finishing it, it will try to convert another, and another, so it ends up trying to convert 4+ videos at the same time, causing the server to shut down. The script developer said:
"It converts each video in a PHP background process. There might be a way to limit the number of PHP background processes on your server and queue them."
So how is this done please?
Regards
Use PHP Semaphores
You can use a shared counting semaphore in PHP and implement a queue with a cap on the number of parallel executions. Semaphores are one of the most widely recommended mechanisms for this kind of concurrency control.
Using this you can easily configure and control the parallel executions of mencoder, and limit them as well.
A sketch using PHP's SysV semaphore functions (the sysvsem extension; $max plays the role of MAX):

$max = 4;                                   // MAX: number of parallel encoders allowed
$sem = sem_get(ftok(__FILE__, 'm'), $max);  // every background process uses the same key

sem_acquire($sem);   // sem--, blocks while $max encoders are already in the critical section

// Critical section: execute mencoder here, e.g.
// exec('mencoder ' . escapeshellarg($src) . ' ... -o ' . escapeshellarg($dst));

sem_release($sem);   // sem++
Use some sort of lock. For example, use file locking on a directory so only one process at a time can use the resource.
This would require a wrapper script for the encoder which will wait for the lock to be released by the currently running encoder.
It should also be smart enough to detect when a lock wasn't freed if the encoder crashes, and release the lock.
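A sketch of such a wrapper built on flock(); the operating system drops the lock automatically if the holder crashes, which covers the stale-lock concern ($mencoderCommand is a placeholder for the full command line):

$lock = fopen('/tmp/mencoder.lock', 'c');   // 'c' creates the lock file if it doesn't exist

if (flock($lock, LOCK_EX)) {                // blocks until the currently running encoder is done
    exec($mencoderCommand);                 // only one encoder runs while the lock is held
    flock($lock, LOCK_UN);
}

fclose($lock);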
Edit:
My understanding of the problem was that there were multiple invocations of the script, each calling mencoder. However, from the other response, it seems that it might be one invocation running all the processes in the background. In that case, I think the solution using semaphores is better.
Edit:
Looks like someone asked this question before:
best-way-to-obtain-a-lock-in-php
