Long PHP script runs multiple times - php

I have a products database that synchronizes with product data ever morning.
The process is very clear:
Get all products from database by query
Loop through all products, and get and xml from the other server by product_id
Update data from xml
Log the changes to file.
If I query a low amount of items, but limiting it to 500 random products for example, everything goes fine. But when I query all products, my script SOMETIMES goes on the fritz and starts looping multiple times. Hours later I still see my log file growing and products being added.
I checked everything I could think of, for example:
Are variables not used twice without overwriting each other
Does the function call itself
Does it happen with a low amount of products too: no.
The script is called using a cronjob, are the settings ok. (Yes)
The reason that makes it especially weird is that it sometimes goes right, and sometimes it doesnt. Could this be some memory problem?
EDIT
wget -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync its in webmin called on a specific hour and minute
Code is hundreds of lines long...
Thanks

You have:
max_execution_time disabled. Your script won't end until the process is complete for as long as it needed.
memory_limit disabled. There is no limit to how much data stored in memory.
500 records were completed without issues. This indicates that the scripts completes its process before the next cronjob iteration. For example, if your cron runs every hour, then the 500 records are processed in less than an hour.
If you have a cronjob that is going to process large amount of records, then consider adding lock mechanism to the process. Only allow the script to run once, and start again when the previous process is complete.
You can create script lock as part of a shell script before executing your php script. Or, if you don't have an access to your server you can use database lock within the php script, something like this.
class ProductCronJob
{
protected $lockValue;
public function run()
{
// Obtain a lock
if ($this->obtainLock()) {
// Run your script if you have valid lock
$this->syncProducts();
// Release the lock on complete
$this->releaseLock();
}
}
protected function syncProducts()
{
// your long running script
}
protected function obtainLock()
{
$time = new \DateTime;
$timestamp = $time->getTimestamp();
$this->lockValue = $timestamp . '_syncProducts';
$db = JFactory::getDbo();
$lock = [
'lock' => $this->lockValue,
'timemodified' => $timestamp
];
// lock = '0' indicate that the cronjob is not active.
// Update #__cronlock set lock = '', timemodified = '' where name = 'syncProducts' and lock = '0'
// $result = $db->updateObject('#__cronlock', $lock, 'id');
// $lock = SELECT * FROM #__cronlock where name = 'syncProducts';
if ($lock !== false && (string)$lock !== (string)$this->lockValue) {
// Currently there is an active process - can't start a new one
return false;
// You can return false as above or add extra logic as below
// Check the current lock age - how long its been running for
// $diff = $timestamp - $lock['timemodified'];
// if ($diff >= 25200) {
// // The current script is active for 7 hours.
// // You can change 25200 to any number of seconds you want.
// // Here you can send notification email to site administrator.
// // ...
// }
}
return true;
}
protected function releaseLock()
{
// Update #__cronlock set lock = '0' where name = 'syncProducts'
}
}

Your script is running for quite some time (~45m) and wget think it's "timing out" since you don't return any data. By default wget will have a 900s timeout value and a retry count of 20. So first you should probably change your wget command to prevent this:
wget --tries=0 --timeout=0 -q -O /dev/null http://example.eu/xxxxx/cron.php?operation=sync
Now removing the timeout could lead to other issue, so instead you could send (and flush to force webserver to send it) data from your script to make sure wget doesn't think the script "timed out", something every 1000 loops or something like that. Think of this as a progress bar...
Just keep in mind that you will hit an issue when the run time will get close to your period as 2 crons will run in parallel. You should optimize your process and/or have a lock mechanism maybe?

I see two possibilities:
- chron calls the script much more often
- script takes too long somehow.
you can try estimate the time a single iteration of the loop takes.
this can be done with time(). perhaps the result is suprising, perhaps not. you can probably get the number of results too. multiply the two, that way you will have an estimate of how long the process should take.
$productsToSync = $db->loadObjectList();
and
foreach ($productsToSync AS $product) {
it seems you load every result into an array. this wont work for huge databases because obviously a million rows wont fit in memory. you should just get one result at a time. with mysql there are methods that just fetch one thing at a time from the resource, i hope yours allows the same.
I also see you execute another query each iteration of the loop. this is something I try to avoid. perhaps you can move this to after the first query has ended and do all of those in one big query? otoh this may bite my first suggestion.
also if something goes wrong, try to be paranoid when debugging. measure as much as you can. time as much as you can when its a performance issue. put the timings in you log file. usually you will find the bottleneck.

I solved the problem myself. Thanks for all the replies!
My MySQL timed out, that was the problem. As soon as I added:
ini_set('mysql.connect_timeout', 14400);
ini_set('default_socket_timeout', 14400);
to my script the problem stopped. I really hope this helps someone. Ill upvote all the locking answers, because those were very helpful!

Related

In PHP, exec fails silently, sometimes, when calling many exec commands, but the same command run again later will work

I have a PHP script that uses exec('command args > /log/file &'); within a loop to create multiple child scripts that run at the same time. Basically, the parent script gets user information out of a database and creates child scripts running in parallel, then the child script creates an email to send to a single user. This happens approximately 50,000 times.
To prevent the creation of 50,000 simultaneously running processes, I have a database table that keeps track of the currently running processes, and before creating a new process the parent checks the current child count and sleeps if 25 children are currently active. The child, upon completing its task, deletes its row in the table, freeing the parent to create more children.
The problem is, about 10% of the exec commands fail silently, and for seemingly no reason. I can run the parent script again (it's smart enough not to email the same user twice), and it will work, once again, 90% of the time using the same exec commands that failed last time. Running the script five or six times in a row will email everyone.
By putting a sleep immediately after the exec, I can increase my success rate to around 95%.
Why would exec be failing, if the same command will work later? I can just keep the script repeating until it completes, but I'd much rather solve the exec problem.
Some highly simplified sample code:
Parent script:
do {
//get user, group, and supergroup information for users that haven't
//been emailed yet
foreach ($users as $userArray) {
$processId = insertIntoProcessQueue($userArray);
$cmd = 'sudo php -q ./childScript.php ' . cliArg($userArray) . ' ' .
cliArg($groupArray) . ' ' . cliArg($supergroupArray)
' ' . $proccessId . ' > file.log &';
exec($cmd);
do {
if (numChildren() >= 25) {
sleep(1);
$waiting = true;
}
} while ($waiting);
}
$incomplete = moreUsersToEmail() > 0 ? true : false;
} while ($incomplete);
function cliArg($array) {
return escapeshellarg(json_encode($arg));
}
Child script:
ignore_user_abort(true);
$user = json_decode($argv[1]);
$group = json_decode($argv[2]);
$supergroup = json_decode($argv[3]);
print_r($user);
$email = createEmail($user, $group, $supergroup);
$email->sendEmail();
removeFromProcessQueue($argv[4]);
flush();
exit;
The print_r will only show up in the log file when the script completes and I never get any errors, so I can't get any data about why it's failing. To add to that, it doesn't fail consistently on any individual users, and it doesn't fail running a single user at a time, so I have to run the script through everyone and try and catch the errors amidst the 45,000 that are working properly. And, since the parent and child never communicate beyond the parent starting the child, I can't detect (from the parent) when a child fails (otherwise I could immediately try and start any failed children again instead of rerunning the parent post-hoc).
Edit: So it turns out there's an included script that's dynamically generated and is destroyed and regenerated every time it's used (don't ask me why), which creates a race condition while running processes in parallel that caused the script to fail.
Thanks everyone for your unfortunately wasted time.
I just looked at the PHP docs for exec() and you can pass an array as a reference with a second parameter which will be filled with the output of exec. You can use this to determine a) why the command is failing and b) when the command fails and integrate that into your code.
So I'd change:
exec($cmd);
To something like:
function check_exec_results($results)
{
echo '<HR><PRE>',print_r($output,true),'</PRE><HR>'; //use this to figure out what output you're getting from the exec commands then remove when you've figured out a way to set $results_look_good below
$results_look_good = ?; //you will need to edit this yourself to actually do some kind of check
return $results_look_good;
}
$successful_exec = false;
do
{
$exec_results = array();
exec($cmd,$exec_results);
$successful_exec = check_exec_results($exec_results);
}
while (!$successful_exec);
Note that this is potentially an infinite loop so I'd also go a step further and set a limit to the number of times exec() can be called for each user.
So it turns out there's an included script that's dynamically generated and is destroyed and regenerated every time it's used (don't ask me why), which creates a race condition while running processes in parallel that caused the script to fail.
Thanks everyone for your unfortunately wasted time.

Don't run script if it's already running

I've been completely unsuccessful finding an answer to this question. Hopefully someone here can help.
I have a PHP script (a WordPress template, to be specific) that automatically imports and processes images when a user hits it. The problem is that the image processing takes up a lot of memory, particularly if multiple users are accessing the template at the same time and initiating the image processing. My server crashed multiple times because of this.
My solution to this was to not execute the image-processing function if it was already running. Before the function started running, I would check a database entry named image_import_running to see if it was set to false. If it was, the function then ran. The very first thing the function did was set image_import_running to true. Then, after it was all finished, I set it back to false.
It worked great -- in theory. The site hasn't crashed since, I can tell you that. But there are two major problems with it:
If the user closes the page while it's loading, the script never finishes processing the images and therefore never sets image_import_running back to false. The template will never process images again until it's manually set to false.
If the script times out while it's processing images -- and that's a strong possibility if there are many images in the queue -- you have essentially the same problem as No. 1: the script never gets to the point where it sets image_import_running back to false.
To handle No. 1 (the first one of the two problems I realized), I added ignore_user_abort(true) to the script. Did it work? I don't know, because No. 2 is still an issue. That's where I'm stumped.
If I could ask the server whether the script was running or not, I could do something like this:
if($import_running && $script_not_running) {
$import_running = false;
}
But how do I set that $script_not_running variable? Beats me.
I've shared this entire story with you just in case you have some other brilliant solution.
Try using
ignore_user_abort(true); it will continue to run even if the person leaves and closes the browser.
you might also want to put a number instead of true false in the db record and set a maximum number of processes that can run together
As others have suggested, it would be best to move the image processing out of the request itself.
As an interim "fix", store a timestamp alongside image_import_running when a processing job begins (e.g., image_import_commenced). This is a very crude mechanism, but if you know the maximum time that a job can run before timing out, the script can check whether that period of time has elapsed.
e.g., if image_import_running is still true but the current time is more than 10 minutes since image_import_commenced, run the processing anyway.
What about setting a transient with an expiry time that would throttle the operation?
if(!get_transient( 'import_running' )) {
set_transient( 'import_running', true, 30 ); // set a 30 second transient on the import.
run_the_import_function();
}
I would rather store the job into database flagging it pending and set a cron job to execute the processing one job at a time.
For Me i use just this simple idea with a text document. for example run.txt file
in the top script use :
if((file_get_contents('run.txt') != 'run'){ // here the script will work
$file = fopen('run.txt', 'w+');
fwrite($file, 'run');
fclose('run.txt');
}else{
exit(); // if it find 'run' in run.txt the script will stop
}
And add this in the end of your script file
$file = fopen('run.txt', 'w+');
fwrite($file, ''); //will delete run word for the next try ;)
fclose('run.txt');
That will check if script already work by checking runt.txt contents
if run word exist in run.txt it will not run
Running a cron would definitively be a better solution. Idea to store url in a table is a good one.
To answer to the original question, you may run a ps auxwww command with exec (Check this page: How to get list of running php scripts using PHP exec()? ) and move your function in a separated php file.
exec("ps auxwww|grep myfunction.php|grep -v grep", $output);
Just add following on the top of your script.
<?php
// Ensures single instance of script run at a time.
$fileName = basename(__FILE__);
$output = shell_exec("ps -ef | grep -v grep | grep $fileName | wc -l");
//echo $output;
if ($output > 2)
{
echo "Already running - $fileName\n";
exit;
}
// Your php script code.
?>

PHP ends script execution automaticly

Firstly I want to give you the basic idea of what I am trying to do:
I'm trying to make a free web hosting service do some work for me. I've created one php page and MySQL db. The basic idea behind my PHP page is I have a while loop with condition of $shutdown, and some counter inside while loop to track whether code is running or not
<?php
/*
Connect to database etc. etc
*/
$shutdown = false;
// Main loop
while (!$shutdown)
{
// Check for user shutdown request
$strq = "SELECT * FROM TB_Shutdown;";
$result = mysql_query($strq);
$row = mysql_fetch_array($result);
if ($row[0] == "true")
{
$shutdown = true; // I know this statement is useless but nevermind
break;
}
//Increase counter
$strq = "SELECT * FROM TB_Counter;";
$result = mysql_query($strq);
$row = mysql_fetch_array($result);
if (intval($row[0]) == 60)
{
// Reset counter
$strq = "UPDATE TB_Counter SET value = 0";
$result = mysql_query($strq);
/*
I have some code to do some works at here its not important just curl stuff
*/
else
{
// Increase counter
$strq = "UPDATE TB_Counter SET value = " . (intval($row[0]) + 1);
$result = mysql_query($strq);
}
/*
I have some code to do some works at here its not important just curl stuff
*/
// Sleep
sleep(1);
}
?>
And I have a check.php which returns me the value from TB_Counter.
The problem is: I'm tracking the TB_Counter table every second. It stops after a while. If I close my webbrowser (which I called my main while php loop page from) it stops after like 2 minutes. If not after 5-7 mins I get the error "connection has been reset" on browser and loop stops.
What should I do to make my loop lasts forever?
You need to allow PHP to execute completely. There is an option in the PHP.INI file which says:
max_execution_time = 30;
This sets the maximum time in seconds a script is allowed to run
before it is terminated by the parser. This helps prevent poorly
written scripts from tying up the server. The default setting is 30.
When running PHP from the command line the default setting is 0.
The function set_time_limit:
Set the number of seconds a script is allowed to run. If this is
reached, the script returns a fatal error. The default limit is 30
seconds or, if it exists, the max_execution_time value defined in the
php.ini.
To check if PHP is running in safe mode, you can use this:
echo $phpinfo['PHP Core']['safe_mode'][0]
If it is going to be a huge process, you can consider running on Cron as a CronJob. A small explanation on it:
Cron is very simply a Linux module that allows you to run commands at predetermined times or intervals. In Windows, it’s called Scheduled Tasks. The name Cron is in fact derived from the same word from which we get the word chronology, which means order of time.
Using Cron, a developer can automate such tasks as mailing ezines that might be better sent during an off-hour, automatically updating stats, or the regeneration of static pages from dynamic sources. Systems administrators and Web hosts might want to generate quota reports on their clients, complete automatic credit card billing, or similar tasks. Cron has something for everyone!
Read more about Cron
You could use php function set_time_limit().
You should not handle this from a browser. Run a cron every minute doing the checks you need would be a better solution.
Next to that why would you update every second? Just write down a timestamp so you know when a request was made?
Making something to run forever is not doable. More important is to secure that your business process keeps running. So maybe it would be wise to put your business case here, you seem to need to count seconds and do something within a minute but it's not totally clear. So what do you need to do?

PHP resets variables

I'm trying to create a script that creates unique codes and writes them to a textfile.
I've managed to generate the codes, and write them to the file.
Now my problem is the fact that my loop keeps running, resulting in over 92 000 codes being written to the file, before the server times-out.
I've done some logging, and it seems that everything works fine, it's just that after a certain amount of seconds, all my variables are reset and everything starts from scratch. The time interval after which this happens varies from time to time.
I've already set ini_set('memory_limit', '200M'); ini_set('max_execution_time',0); at the top of my script. Maybe there's a php time-out setting I'm missing?
The script is a function in a controller. I set the ini_set at the beginning of this function. This is the loop I'm going through:
public function generateAction() {
ini_set('memory_limit', '200M');
ini_set('max_execution_time',0);
$codeArray = array();
$numberOfCodes = 78000;
$codeLength = 8;
$totaalAantal = 0;
$file = fopen("codes.txt","a+");
while(count($codeArray)<$numberOfCodes){
$code = self::newCode($codeLength);
if(!in_array($code,$codeArray))
{
$totaalAantal++;
$codeArray[] = $code;
fwrite($file,'total: '.$totaalAantal."\r\n");
}
}
fclose($file);
}
In the file this would give something like this:
total: 1
total: 2
total: ...
total: 41999
total: 42000
total: 1
total: 2
total: ...
total: 41999
total: 42000
Thanks.
Edit: so far we've established that the generateAction() is called 2 or 3 times, before the end of the script, when it should only be called once.
I already found the solution for this problem.
The host's script limit was set to 90 seconds, and because this script had to run for longer, I had to run it via the command line.
Taking account of the test with uniqid(), we can say that variables are not reseted, but the method generateAction() is called several times.
Since you code is probably synchronous, we may say that generateAction() is called several times because the main script is called several times.
What happens in detail?
Because of the nature of your algorithm, each pass in the loop is slower then the previous one. So the duration of executing generateAction() may be quite long.
You probably don't wait for the end, and you stop the process or even start the process from a new page. Nevertheless, the process don't really stop so soon, and it keeps running in back-end. I've observed such a behavior on my local WAMP/LAMP installation: the script is not actually stopped even if I stop the page, if I close the page, even if I close the navigator or if I restart Apache.
So it happens to you that several script processes are writing simultaneously in the codes.txt file.
In order to avoid this, you can for example lock the file during the loop using function flock().

Wanted to execute Php code for some particular seconds

I wanted to execute a bunch of code for 5 seconds and if it has not finished executing within the specificed time frame I need to execute another piece of code..
Whether it's possible?
Ex..
There are two functions A and B
If A takes more than 30 seconds to execute the control should pass on to B
During function A you could periodically check how long the script has been executing, and if it goes over x seconds, run B:
function checkTime($start) {
$current = time();
$secondsToExecute = 5;
if (($start+$secondsToExecute) <= $current) {
func_b();
}
}
function func_a($start) {
// do some code
checkTime($start);
// do some code
checkTime($start);
// do some code
}
function func_b() {
// do something else
exit();
}
func_a(time());
http://php.net/manual/en/features.connection-handling.php
Set a time limit and a shutdown function, which checks if the status is 2 (timeout) and does your stuff if so.
One thing to note is that the time limit set this way only counts actual php processing time. Time spent with php waiting for another process or a database or http connection, etc, will not count and your time limit will not be considered reached.
If you need to count actual time that passed, even if it was not php processing time, you're going to have to go with the above suggested answer. Manually inserting that time check in places where it makes sense is the best, i.e. inside loops that you know may run too long, maybe even not on every iteration but on every N iterations, etc. Alternatively a more general approach is to use register_tick_function(), but that might lead to a noticeable performance hit with a low tick count, and you must take care to unregister it or use appropriate flags so you don't end up infinitely starting more and more calls to your timeout handling code once the timeout has happened.
Other approaches are also possible, you can register a handler for some signal using pcntl_signal() and have it sent to your process when the time limit is reached by an outside program ('man timeout' if you are on a linux box) or by a fork()-ed instance of your own php script, etc.

Categories