Prevent PHP scripts from running concurrently

I have a cron script that spiders a website for new content and saves the entries I need into the database. Entries are MD5-hashed and validated to prevent duplicates. However, I have noticed that two instances sometimes run at the same time, and the hashing check fails at that point: I get two copies of each preg_match result inserted into the database.
Can someone recommend the best way to prevent this from happening in the future?
I have considered locking execution by checking log files, but in that case the script may get permanently locked if an error occurs in the middle.
I'm looking into setting $_SESSION['lock'] instead, so that if the script locks and then breaks, the session is bound to expire at some point.
Any ideas?

I think $_SESSION should be left for scripts run from a web server; it isn't available when running from the command line.
I would store the last activity time in a file. If the cron job finishes its work normally, you delete the file.
When the cron script runs, check the file: if the file doesn't exist, or the last activity is older than a certain time span, continue executing; otherwise, stop.
This would be pretty easy to implement, too.
Check whether the script should run:
if (file_exists('lock.txt') && (int) file_get_contents('lock.txt') > (time() - 60)) {
    die('Should not run!');
}
Log activity at key points in the script's life cycle:
file_put_contents('lock.txt', time());
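A sketch of an alternative that sidesteps the stale-lock problem entirely is flock(): the operating system releases the lock automatically when the process exits, even after a fatal error, so a crashed run cannot leave the script permanently locked. The lock-file path here is illustrative.

```php
<?php
// Hypothetical flock()-based guard; the lock file path is illustrative.
// The OS drops the lock when the process exits, even on a crash, so a
// failed run can never leave the script permanently locked.
$fp = fopen('/tmp/spider.lock', 'c'); // 'c' creates the file if missing
if ($fp === false || !flock($fp, LOCK_EX | LOCK_NB)) {
    die('Another instance is already running.');
}

// ... spider the site and insert entries here ...

flock($fp, LOCK_UN); // optional: released automatically on exit
fclose($fp);
```

The timestamp approach above still has one advantage: flock() semantics can be unreliable on some shared filesystems (e.g. older NFS setups), where a plain timestamp file behaves predictably.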

Related

PHP script continuing to run after page close [initiated from AJAX]

Everything I Google tells me this should not be happening; however, it is.
I'm building a migration tool to build a 'master database'. I have an admin panel, only accessible to a few select people. There is a merge button that starts an AJAX call to run the migration PHP function. I'm not positive how long this script takes, considering I'm still developing it, but nonetheless I'm expecting a minimum of 20 minutes once pushed to production and populated with the production database. I do NOT need a lecture on best practices telling me not to do it via a GUI. This will become a cron as well; however, I want to be able to invoke it manually if the admin desires.
So here's my process. The migration function immediately closes the session with session_write_close(), allowing me to run multiple PHP scripts simultaneously. I do this because I start a setInterval that checks a session variable. This is my 'progress', which is just an int tracking which loop iteration I'm on. In my migration script I open the session, add 1 to that int, and close the session again. I do this at the end of each loop.
By doing this I have successfully created a progress for my AJAX. Now I noticed something. If I start my migration, then close out of my tab - or refresh. Once I reload the page my progress continues to grow. This tells me that the migration script is still executing in the background.
If I close 100% out of my browser, or clear my sessions I no longer see progress go up. This however is not because the script stops. This is because my progress indication relies on sessions and once I clear my sessions or close out my browser my session cookie changes. However I know the script is still running because I can query the database manually and see that entries are being added.
NOW to my question:
I do NOT want this. If my browser closes, if I press refresh, if I lose my connection, etc., I want the script to be TERMINATED. I want it to stop mid-process.
I tried ignore_user_abort(false); however, I'm pretty sure that setting is specific to the command line, and it made no difference for me.
I want it to be terminated because I'm building a 'progress resume' function where we can choose where to resume the migration progress again.
Any suggestions?
UPDATE:
I didn't want to go this route, but one solution I just thought of: I could have another session variable, my 'last time client was validated', which would be a timestamp. In my JavaScript, on the client side, every 30 seconds or so I could hit a PHP script to update that timestamp. And in my migration function, at the beginning of each loop, I could check that the timestamp isn't, say, 60 seconds old. If it IS 60 seconds old or older, I do a die(), thus stopping my script. This would logically mean 'if no client is updating this timestamp, we can assume the user closed his browser/tab or refreshed'. As for the function, I can skip this check when running from the command line (cron). Not the ideal solution, but it is my plan B.
I am, and did, go with the solution to ping from the client to indicate if the client is still alive or not.
So essentially this is what I did:
From the client, in JavaScript, I set up a setInterval to run every 1.5 seconds, hitting a PHP script via AJAX. This PHP script updates a session variable with the current timestamp (this could easily be a database value if you needed it to be; however, I didn't want the overhead of another query).
$_SESSION['migration_listsync_clientLastPing_'.$decodedJson['progressKey']] = time();
Then, inside my migration function I run a check to see if the 'timestamp' is over 10 seconds old, and if it is I die - thus killing the script.
if (isset($_SESSION['migration_listsync_clientLastPing_'.$progressKey])) {
    $calc = time() - $_SESSION['migration_listsync_clientLastPing_'.$progressKey];
    if ($calc > 10) {
        die();
    }
}
I added a 'progressKey' parameter, a random number from 1-100 that is generated when the function is called. This number is generated in JavaScript and passed into both of my AJAX calls. This way, if the user refreshes the page and then immediately presses the button again, we won't have two instances of the function running: the 'old' instance will die after a few seconds and the new instance will take over.
This isn't an ideal solution however it is an effective one.

Script Instance Checker

I have been researching how to approach this. What I am trying to prevent is overlapping execution of a cron job. I would like to run my script on an every-minute basis, because the application it supports needs a constant lookout. The problem is that if one run takes quite a long time to finish, the next cron execution will catch up with it.
I have searched, and some posts mentioned using a PID file, but I couldn't work out how to do it. I cannot use lock files because they can be unreliable; I tried that already.
Is there any other approach on this?
Thank you.
Get each job to write to a database on completion. Then put an if statement at the start of each script to ensure that the other script has run and completed (by checking your database).
Alternatively...
You could have your first script run your second script at the end?
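For completeness, the PID-file approach the asker mentioned can also be sketched. This is a hypothetical example (the path /tmp/myjob.pid and the availability of the posix extension are assumptions): unlike a plain lock file, a stale file left behind by a crash is harmless, because the recorded process no longer exists.

```php
<?php
// Hypothetical PID-file guard; the file path is illustrative and the
// posix extension is assumed to be available on the CLI.
$pidFile = '/tmp/myjob.pid';

if (file_exists($pidFile)) {
    $pid = (int) file_get_contents($pidFile);
    // Signal 0 checks whether the process exists without signalling it,
    // so a stale file from a crashed run is simply overwritten below.
    if ($pid > 0 && function_exists('posix_kill') && posix_kill($pid, 0)) {
        exit('Previous run (PID ' . $pid . ') is still active.');
    }
}
file_put_contents($pidFile, getmypid());

// ... do the actual work here ...

unlink($pidFile);
```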

How do I detect that a PHP CLI script is in a Hung state

I am using supervisor (http://supervisord.org/) to daemonize a fairly standard PHP script. The script is structured something like:
while (1) {
// Do a SQL select
// for any matching rows, do something
// if I have been running for longer than 60 mins, exit
}
Today, this script (which has been fairly stable for some time now) hung. It did not crash (i.e., raise SIGHUP or SIGTERM signals), which would have alerted supervisord to restart the process. It did not encounter any errors in its processing, which would have either been caught by the script or at least triggered a fatal error and an exit. Instead of these "catchable" scenarios, it just sat there. We do have a cron job set up to run every hour to restart the script through the supervisorctl hook, because it seems to be generally accepted that PHP scripts are leaky in terms of memory and do well to be restarted if running long. The script resumed operations normally after that reboot.
My question: how can I detect that this script has hung? I can't even begin to diagnose or troubleshoot this problem of why it has hung, if I am not somehow alerted to that state. I am looking for either a software solution to this, or some approach that I can take to author a solution myself ( in either PHP, Python, perl or shell).
The script is written in PHP 5.2.6 and runs on an up-to-date RHEL 5 server.
Please let me know if I can share any additional information if it will help with a more awesome solution.
Thank you!
Shaheeb R.
Since this is a case where the script is hanging, PHP possibly may not process any additional code that could detect this hang. For this reason, I suggest modifying the script to keep a log. This would allow the main script to let anything outside of it know it is still running, and with some well placed updates it can also help pinpoint where things have gone awry.
The logging can be written to a file or database, and should contain at least an indicator of the script's status, such as a last-modified date. If this script is not constantly running, then something should also indicate whether it is running or has stopped. In the example you gave, the log write would occur within the while loop at least once, possibly more. It costs time/resources to open file pointers or DB connections, so I recommend logging only what is needed. (Note: if using the text-file approach, the file would need to be closed right after each write.)
Example:
while (1) {
    logStatus('Running SQL select');
    // Do a SQL select
    logStatus('Results retrieved');
    // for any matching rows, do something
    // (check log) if I have been running for longer than 60 mins, exit
}

// Named logStatus() because log() is PHP's built-in natural-logarithm
// function and cannot be redeclared.
function logStatus($msg) {
    // Write timestamp, $msg to log
}
A separate script would need to check the log and report any errors, which could be problematic if it's affected by what's making the main script hang, but I can't think of an alternative.
In regards to memory, if you are not already using mysql_free_result, you should give it a try.
My suggestion would be similar to what @Shroder described, but taking it a little further. With each run you would create a log/DB entry; it would be timestamped and transaction-aware (you would update the entry at the start of the run to 'processing', and then when done, sign off the entry as 'completed').
On the side you would run a simple cron check to see whether the current time exceeds your trigger (60 minutes, etc.), using the timestamp and transaction state. At that point you throw an alert, etc.
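A minimal sketch of that heartbeat pattern, assuming a PDO connection and an illustrative job_log table with id, status, and updated_at columns (the table and column names are invented for the example):

```php
<?php
// Worker side: sign the entry as 'processing' and touch the timestamp
// on every loop iteration.
function heartbeat(PDO $db, $jobId) {
    $db->prepare('UPDATE job_log SET status = ?, updated_at = ? WHERE id = ?')
       ->execute(array('processing', time(), $jobId));
}

// Watchdog side (run from a separate cron): true if the worker claims
// to be processing but has not touched its row within $maxAge seconds.
function isHung(PDO $db, $jobId, $maxAge = 3600) {
    $stmt = $db->prepare('SELECT status, updated_at FROM job_log WHERE id = ?');
    $stmt->execute(array($jobId));
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    return $row !== false
        && $row['status'] === 'processing'
        && (time() - (int) $row['updated_at']) > $maxAge;
}
```

When a run finishes normally, the worker would update the status to 'completed' so the watchdog ignores that entry.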
It's quite simple! Just calculate the difference in time from the start of the loop to the current execution point.
$starttime = microtime(true);
while (1)
{
    // Do your stuff here
    // More SQL, whatever you need

    // Put this at the end of the loop
    $curtime = microtime(true);
    $timetaken = $curtime - $starttime;
    if ($timetaken > (60 * 60))
    {
        break;
    }
}
microtime(true) returns the seconds since the Unix epoch as a float, so if we subtract the start time from the current time, we get the elapsed time and exit the loop once it exceeds 60*60 seconds.

Migrate MySQL data, speed & efficiency

I had to change the blueprint of my webapplication to decrease loading time (http://stackoverflow.com/questions/5096127/best-way-to-scale-data-decrease-loading-time-make-my-webhost-happy).
This change of blueprint implies that the data of my application has to be migrated to the new blueprint (otherwise my app won't work). To migrate all my MySQL records (thousands of them), I wrote a PHP/MySQL script.
Opening this script in my browser doesn't work. I've set the time limit of the script to 0 for unlimited execution time, but after a few minutes the script stops loading. A cron job is also not really an option: 1) strangely enough, it doesn't load; but the bigger problem is 2) I'm afraid it's going to cost too many resources on my shared server.
Do you know a fast and efficient way to migrate all my MySQL records, using this PHP/MySQL script?
You could try PHP's ignore_user_abort. It's a little dangerous in that you need SOME way to end its execution, but it's possible your browser is aborting the request after the script takes too long.
I solved the problem!
Yes, it will take a lot of time; yes, it will cause an increase in server load; but it just needs to be done. I use the error log to check for errors while migrating.
How?
1) I added ignore_user_abort(true); and set_time_limit(0); to make sure the script keeps running on the server (it stops when the while() loop is completed).
2) Within the while() loop, I added some code to be able to stop the migration script by creating a small textfile called stop.txt:
if (file_exists(dirname(__FILE__)."/stop.txt")) {
    error_log('Migration Stopped By User ('.date("d-m-Y H:i:s", time()).')');
    break;
}
3) Migration errors and duplicates are logged into my errorlog:
error_log('Migration Fail => UID: '.$uid.' - '.$email.' ('.date("d-m-Y H:i:s",time()).')');
4) Once the migration is completed, I receive an email (sent using mail()) with the result of the migration, so I don't have to check this manually.
This might not be the best solution, but it's a good solution to work with!

Is it possible to make a PHP script run itself every hour or so without the use of a cronjob?

I'm pretty sure I've seen this done in a PHP script once, although I can't find the script. It was some script that would automatically check for updates to itself, and then replace itself if there was an update.
I don't actually need all that; I just want to be able to make my PHP script automatically run every 30 minutes to an hour, but I'd like to do it without cron jobs, if it's possible.
Any suggestions? Or is it even possible?
EDIT: After reading through a possible duplicate that RC linked to, I'd like to clarify.
I'd like to do this completely without using resources outside of the PHP script, i.e., no outside cron jobs that send a GET request. I'd also like to do it without keeping the script running constantly and sleeping for 30 minutes.
If you get enough hits, this will work:
Store a last-update time somewhere (file, DB, etc.). In a file that gets enough hits, add code that checks whether the last update was more than xx minutes ago. If it was, then run the script.
You may want to use PHP's sleep function with a specified time to run your code at that interval, or you may want to try one of the online cron-job services.
Without keeping the script running constantly, you'll either have to use something hackish that's not guaranteed to actually run (using regular user pages accesses to run a side routine to see if X amount of time has passed since last run of the script and if so, run it again), or use an external service like cron. There's no way for a regular PHP script to just magically invoke itself.
You can either use AJAX calls from your real visitors to run scheduled jobs in the background (google for "poor man's cron", there are a number of implementations out there) or use some external cron-like service (for example a cronjob on some other machine). In theory you could just run a PHP script with no timeout and make it loop forever and fire off requests at the appropriate time, but the only thing that would achieve is reinventing cron in a very ineffective and fragile way (if the script dies for some reason, it will never start again on its own, while cron would just call it again).
Either way, you will need to set proper execution time so the script does not exceed it.
I found this:
<?php
// name of your file
$myFile = "time.db";
// guard against the first run, when the file doesn't exist yet
$time = file_exists($myFile) ? file($myFile) : array(0);
if (time() - 3600 > (int) $time[0]) {
    // an hour has elapsed
    // do your thing.

    // write the new timestamp to file
    $fh = fopen($myFile, 'w') or die("can't open file");
    fwrite($fh, time());
    fclose($fh);
} else {
    // it hasn't been an hour yet, so do nothing
}
?>
If the host includes a MySQL 5.1+ database, then perhaps timed triggers (the Event Scheduler) are available to call the script? I like these mission-impossible-type questions, but I need more information on what kind of playground and rules to give the best answer.
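For reference, MySQL 5.1's Event Scheduler can run SQL on a schedule, though it cannot invoke a PHP script directly, so it only helps if the periodic work can be expressed in SQL. A sketch (the event, table, and column names are invented for the example):

```sql
-- Enable the scheduler (requires the SUPER privilege, so it may not
-- be possible on shared hosting).
SET GLOBAL event_scheduler = ON;

-- Hypothetical hourly job; 'sessions' and 'expires_at' are invented.
CREATE EVENT hourly_cleanup
ON SCHEDULE EVERY 1 HOUR
DO
  DELETE FROM sessions WHERE expires_at < NOW();
```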
