I run a large number of PHP scripts via CLI, executed by CRON, in the following format:
/usr/bin/php /var/www/html/a15432/run.php
/usr/bin/php /var/www/html/a29821/run.php
In some instances there is a problem with one of the files, whereby the execution enters a loop or simply does not terminate due to an error in that particular script.
What I would like to achieve is that if either of the above occurs, PHP CLI simply stops executing the script.
I have searched for this, and all indications are that I need to use the following call, which I have added at the beginning of each file, but it does not seem to make any difference.
ini_set('max_execution_time', 30);
Is there any other way I can better protect my server, so that one (or more) of these rogue files cannot bring down the server by entering some kind of infinite loop and using all of the server's resources trying to process the file? To make the problem worse, these files can be triggered every 5 minutes, which means that at times there are many copies of the same script trying to process at once.
Of course I realise that the correct solution is to fix the files so that there is better error handling and that they are never allowed to enter this state, but at the moment I'd like to understand why this doesn't work as expected.
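For reference, the kind of guard I am considering adding at the top of each run.php looks like this. It is only a rough sketch: it assumes the pcntl extension is available on the CLI, and the lock file path is made up.
<?php
// Rough sketch only. Skips the run if a previous cron invocation still holds the
// lock, and hard-exits after 30 seconds of wall-clock time, unlike
// max_execution_time, which does not count time spent sleeping or waiting on I/O.
declare(ticks=1);

$lock = fopen('/tmp/a15432-run.lock', 'c');
if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0); // a previous run is still active, skip this one
}

pcntl_signal(SIGALRM, function () {
    fwrite(STDERR, "watchdog: giving up after 30 seconds\n");
    exit(1);
});
pcntl_alarm(30);

// ... existing script body ...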
Thank you.
Related
I wrote some TV guide scrapers in PHP. I run them from a script that is executed by a cron job. This script runs every minute and checks whether a scraper needs to start. This way I can alter and manage these scraping jobs without having to modify the cron itself.
These scraping scripts vary in runtime; some take no more than 1 minute, and others can take up to 4 hours. When I run them one after another there is no problem, but when I try to run two scripts simultaneously, one or both scripts hang, resulting in an email from cron:
sh: line 1: 700865 Hangup /usr/local/bin/php /home/id789533/domains/erdesigns.eu/public_html/tvg_schedules/scraper.php --country=dk --provider=1 --scraper=tv2 2>&1
where /usr/local/..... is the command for the script, which is called from the scheduler script.
I just can't find anything related to this message, and I have no idea how to fix it. I can send the script itself if needed.
All advice and help would be appreciated.
[Edit] I also took a look at the resource usage: memory never gets higher than 150 MB and the load never exceeds 15%. I have a limit of 400% and 1 GB.
I execute the scripts from the PHP scheduler script like so:
shell_exec(sprintf("/usr/local/bin/php %s 2>&1", $scraper));
where $scraper is the filename. It executes the script like it should, but after a while I get the message sh: line 1: 000000 Hangup
I know for sure that it is not allocating too much memory. Can someone point me in the right direction? I don't know where to look right now.
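One thing I have been wondering about (just a sketch, I have not confirmed it helps) is whether detaching the scraper from the shell would avoid the Hangup, something like:
// Hypothetical variation: background the scraper and append its output to a log
// file instead of having the scheduler capture it (the log path is a placeholder).
shell_exec(sprintf("nohup /usr/local/bin/php %s >> /tmp/scraper.log 2>&1 &", $scraper));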
PHP is a language intended for the web, with features like a cap on maximum execution time to make sure scripts do not run indefinitely and thereby block resources. Therefore, PHP is not the best choice for this task.
If it is only a short script I would advise you to convert it into a Bash or Python script. However, if you want to stick to PHP, check your php.ini file for settings restricting execution time.
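For example, a quick check run with the same binary the cron job uses will show which php.ini is loaded and what the relevant limits are (just a sketch):
<?php
// Print the ini file the CLI binary actually loads and a few relevant limits.
echo 'Loaded php.ini: ', php_ini_loaded_file(), PHP_EOL;
foreach (array('max_execution_time', 'memory_limit', 'default_socket_timeout') as $key) {
    echo $key, ' = ', ini_get($key), PHP_EOL;
}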
I just want to know how PHP will behave in the following case.
Suppose I have a cron script which is running every minute.
Or there is an infinite-loop script which is processing the queue table.
Now suppose I update a related class file which is used by the infinite-loop script.
Does it generate any error, or stop the infinite-loop script?
And what good practices should be followed in such a situation?
Nothing will happen to already running scripts when you change any source code.
The source code is read from the file once at the start of the script, is parsed into bytecode, and then stays in memory until the end of the script. Your program is not actually "running from source" all the time or any such thing and it will not notice any changes to the source code files until it needs to load the file again.
An infinite loop program will only reflect changes when you stop and restart it.
A cron job will pick up any change the next time it runs.
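As an illustration only (this is not the only way), one common practice for the infinite-loop case is to have the worker watch its own source file and exit when it changes, so the next cron run or supervisor restart picks up the new code. A sketch:
<?php
// Sketch: exit when this file is modified on disk, so a restart loads the new code.
$startMtime = filemtime(__FILE__);

while (true) {
    // ... process one item from the queue table ...

    clearstatcache(true, __FILE__);
    if (filemtime(__FILE__) !== $startMtime) {
        exit(0); // source changed; let cron or a supervisor start the new version
    }
    sleep(1);
}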
I am working on web scraping with PHP and cURL to scrape a whole website, but it takes more than one day to complete the scraping process.
I have even used
ignore_user_abort(true);
set_error_handler(array(&$this, 'customError'));
set_time_limit (0);
ini_set('memory_limit', '-1');
I have also cleared memory after scraping each page. I am using Simple HTML DOM to get the scraping details from each page.
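The per-page cleanup is essentially the standard Simple HTML DOM pattern, roughly like this (simplified; the real code does more):
$html = file_get_html($url);      // Simple HTML DOM
if ($html) {
    // ... pull the details out of $html here ...
    $html->clear();               // break the library's internal references
    unset($html);
}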
But still, the process runs and works fine for a certain number of links, and after that it stops, although the browser keeps spinning and no error log is generated.
I cannot work out what the problem is.
Also, I need to know whether PHP can run a process for two or three days.
Thanks in advance.
PHP can run for as long as you need it to, but the fact that it stops at what seems like the same point every time indicates there is an issue with your script.
You said you have tried ignore_user_abort(true);, but then indicated you were running this via a browser. This setting only works on the command line, as closing a browser window for a script of this type will not terminate the process anyway.
Do you have Xdebug? Simple HTML DOM will throw some rather interesting errors with malformed HTML (a link within a broken link, for example). Xdebug will throw a MAX_NESTING_LEVEL error in a browser, but will not throw this in a console unless you have explicitly told it to with the -d flag.
There are lots of other errors, notices, warnings, etc. which will break/stop your script without writing anything to error_log.
Are you getting any errors?
When using cURL in this way it is important to use multi cURL to process URLs in parallel - depending on your environment, 150-200 URLs at a time is easy to achieve.
If you have truly sorted out the memory issue and freed all available space like you have indicated, then the issue must be with a particular page it is crawling.
I would suggest running your script via a console and finding out exactly where it stops, then running that URL separately - at least this will indicate whether it is a memory issue or not.
Also remember that set_error_handler(array(&$this, 'customError')); will NOT catch every type of error PHP can throw.
When you next run it, debug via a console to show progress, and keep track of actual memory use - either via PHP (printed to the console) or via your system's process manager. This way you will be closer to finding out what the actual issue with your script is.
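To illustrate the multi cURL point above, a bare-bones sketch (the URLs are examples and error handling is left out):
<?php
// Bare-bones multi cURL sketch: fetch a batch of URLs in parallel.
$urls = array('http://example.com/page1', 'http://example.com/page2');

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// Run all handles until every transfer has finished.
do {
    curl_multi_exec($mh, $running);
    if (curl_multi_select($mh) === -1) {
        usleep(100000); // avoid a busy loop if select() fails
    }
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);
    // ... hand $html to the parser here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);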
Even if you set unlimited memory, there is still a physical limit.
If you crawl the URLs recursively, memory can fill up.
Try a loop instead, and work with a database:
scan a page, store the found links if they are not in the database yet;
when finished, do a SELECT and get the first unscanned URL;
{loop}
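A rough sketch of that loop (the table and column names are made up, and scanPage() is a hypothetical stand-in for the actual scraping code):
<?php
// Made-up schema: pages(id, url, scanned), with a unique index on url.
$db = new PDO('mysql:host=localhost;dbname=crawler', 'user', 'pass');

while (true) {
    // get the first unscanned URL
    $row = $db->query("SELECT id, url FROM pages WHERE scanned = 0 LIMIT 1")
              ->fetch(PDO::FETCH_ASSOC);
    if (!$row) {
        break; // nothing left to scan
    }

    $links = scanPage($row['url']); // hypothetical: scrape one page, return the links found

    // store newly found links, skipping ones already in the database
    $insert = $db->prepare("INSERT IGNORE INTO pages (url, scanned) VALUES (?, 0)");
    foreach ($links as $link) {
        $insert->execute(array($link));
    }

    $db->prepare("UPDATE pages SET scanned = 1 WHERE id = ?")->execute(array($row['id']));
}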
I am using a script with set_time_limit(60*60*24) to process a large number of images. But after 1k images or so (1 or 2 minutes), the script stops without showing any errors on the command line.
I'm also using a logger that writes any error thrown by the script to a file on shutdown (by using register_shutdown_function). But when this script stops, nothing is written (it should write something, even if no errors are thrown; it works perfectly with any other script, in any other situation I have ever had).
Apache error_log doesn't show anything either.
Any ideas?
Edit: My environment is CentOS 5.5, with PHP 5.3.
It is probably running out of memory.
ini_set('memory_limit', '1024M');
May get you going if you can allocate that much.
Please make sure you're not running in safe mode:
http://php.net/manual/en/features.safe-mode.php
Please note that register_shutdown_function does NOT guarantee that the associated function will be executed every time, so you should not rely on it.
See http://php.net/register_shutdown_function
To debug the issue, check the PHP error log (which is NOT the Apache error log when you're using PHP from the console; check your php.ini or ini_get('error_log') to find out where it is).
A solution may be to write a simple wrapper script in bash that executes the script and then does what you want to be executed at the end of the script.
Also note that PHP doesn't count the time spent in external, non-PHP activities, like network calls, some library functions, ImageMagick, etc.
So the time limit you set may actually last much longer than you expect it to.
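On the error log point, a few lines at the top of the script make sure console errors actually end up somewhere visible (a sketch; the log path here is arbitrary):
<?php
// Make CLI errors visible while debugging.
error_reporting(E_ALL);
ini_set('display_errors', '1');               // also print errors to the console
ini_set('log_errors', '1');
ini_set('error_log', '/tmp/image-job.log');   // arbitrary path, pick your own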
I have a PHP script that grabs a chunk of data from a database, processes it, and then looks to see if there is more data. This processes runs indefinitely and I run several of these at a time on a single server.
It looks something like:
<?php
while ($shouldStillRun)
{
    // do stuff
}
logThatWeExitedLoop();
?>
The problem is, after some time, something causes the process to stop running and I haven't been able to debug it and determine the cause.
Here is what I'm using to get information so far:
error_log - Logging all errors, but no errors are shown in the error log.
register_shutdown_function - Registered a custom shutdown function. This does get called so I know the process isn't being killed by the server, it's being allowed to finish. (or at least I assume that is the case with this being called?)
debug_backtrace - Logged a debug_backtrace() in my custom shutdown function. This shows only one call and it's my custom shutdown function.
Log if reaches the end of script - Outside of the loop, I have a function that logs that the script exited the loop (and therefore would be reaching the end of the source file normally). When the script dies randomly, it's not logging this, so whatever kills it, kills it while it's in the middle of processing.
What other debugging methods would you suggest for finding the culprit?
Note: I should add that this is not an issue with max_execution_time, which is disabled for these scripts. The time before being killed is inconsistent. It could run for 10 seconds or 12 hours before it dies.
Update/Solution: Thank you all for your suggestions. By logging the output, I discovered that when a MySQL query failed, the script was set to die(). D'oh. I updated it to log the MySQL errors and then terminate. It's working like a charm now!
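For anyone curious, the pattern of the fix looked roughly like this (a simplified sketch using mysqli; the connection details and table names are placeholders and the real code is larger):
<?php
// Simplified sketch: log the MySQL error and terminate cleanly instead of a bare die().
$db = new mysqli('localhost', 'user', 'pass', 'queue');
if ($db->connect_error) {
    error_log('connect failed: ' . $db->connect_error);
    exit(1);
}

$result = $db->query('SELECT id, payload FROM jobs WHERE done = 0 LIMIT 100');
if ($result === false) {
    error_log('query failed: ' . $db->error);
    exit(1);
}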
I'd log the memory usage of your script. Maybe it acquires too much memory, hits the memory limit and dies?
Remember, PHP has a setting in the ini file that says how long a script may run: max_execution_time.
Make sure that you are not going over this, or use set_time_limit() to increase the execution time. Is this program running through a web server or via CLI?
Adding: my bad experiences with PHP, from looking through some background scripts I wrote earlier this year. Sorry, but PHP is a terrible scripting language for doing anything over long lengths of time. I see that newer PHP (which we haven't upgraded to) adds the ability to force the GC to run. The problem I've been having is using too much memory, because the GC almost never runs to clean up after itself. If you use things that recursively reference themselves, they will also never be freed.
Creating an array of 100,000 items allocates memory, but setting the array to an empty array, or splicing it all out, does NOT free it immediately and doesn't mark it as unused (i.e. making a new 100,000-element array increases memory).
My personal solution was to write a perl script that ran forever and called system("php my_php.php"); when needed, so that the interpreter would be freed completely. I'm currently supporting 5.1.6; this might be fixed in 5.3+, or at the very least they now have GC commands that you can use to force the GC to clean up.
Simple script
#!/usr/bin/perl -w
use strict;

while(1) {
    if( system("php /to/php/script.php") != 0 ) {
        sleep(30);
    }
}
Then in your PHP script:
<?php
// do a single processing block
if( $moreblockstodo ) {
    exit(0);  // more work queued: the perl loop runs us again immediately
} else {
    // no? then lets sleep for a bit until we get more
    exit(1);  // non-zero exit: the perl loop sleeps 30 seconds before retrying
}
?>
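For reference, the GC commands mentioned above look like this in 5.3+ (a sketch; the objects here exist only to create circular references):
<?php
// PHP 5.3+ only: force the cycle collector to run.
gc_enable();

class Node { public $other; }
for ($i = 0; $i < 10000; $i++) {
    $a = new Node();
    $b = new Node();
    $a->other = $b;        // the two objects reference each other,
    $b->other = $a;        // so refcounting alone never frees them
}

$collected = gc_collect_cycles();   // returns the number of cycles collected
echo "collected $collected cycles, memory: ", memory_get_usage(true), "\n";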
I'd log the state of the function to a file in a few different places in each loop.
You can get the contents of most variables as a string with var_export, using the var_export($varname,true) form.
You could just log this to a certain file, and keep an eye on it. The latest state of the function before the log ends should provide some clues.
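A minimal version of that logging (the log path and variable names are placeholders):
<?php
// Append a snapshot of the loop state to a file on every iteration.
while ($shouldStillRun) {
    // ... do stuff ...

    $state = array(
        'time'   => date('c'),
        'memory' => memory_get_usage(true),
        'lastId' => isset($lastProcessedId) ? $lastProcessedId : null, // placeholder variable
    );
    file_put_contents('/tmp/worker-state.log', var_export($state, true) . PHP_EOL, FILE_APPEND);
}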
It sounds like whatever is happening is not a standard PHP error. You should be able to catch and log your own errors using a try...catch statement. I don't have more details than that because I'm on my phone, away from a PC.
I've encountered this before on one of our projects at work. We have a similar setup - a PHP script checks the DB to see if there are tasks to be done (such as sending out an email, updating records, or processing some data). The PHP script has a while loop inside, which is set to
while(true) {
    //do something
}
After a while, the script will also be killed somehow. I've already tried most of what has been said here, like setting max_execution_time, using var_export to log all output, placing a try...catch, redirecting the script output (php ... > output.txt), etc., and we've never been able to find out what the problem is.
I think PHP just isn't built to do background tasks by itself. I know it's not answering your question (how to debug this), but the way we worked around this is that we used a cron job to call the PHP file every 5 minutes. This is similar to Jeremy's answer of using a perl script - it ensures that the interpreter is freed after the execution is done.
If this is on Linux, try looking in the system logs - the process could be killed by the OOM (out-of-memory) killer (unlikely; you'd also see other problems if this were happening), or crash with a segmentation fault (some versions of PHP don't like some versions of extensions, resulting in weird crashes).