Best practice to find source of PHP script termination - php

I have a PHP script that grabs a chunk of data from a database, processes it, and then looks to see if there is more data. This process runs indefinitely, and I run several of these at a time on a single server.
It looks something like:
<?php
while($shouldStillRun)
{
    // do stuff
}
logThatWeExitedLoop();
?>
The problem is, after some time, something causes the process to stop running and I haven't been able to debug it and determine the cause.
Here is what I'm using to get information so far:
error_log - Logging all errors, but no errors are shown in the error log.
register_shutdown_function - Registered a custom shutdown function. This does get called, so I know the process isn't being hard-killed by the server; it's being allowed to finish (or at least I assume that's the case, given that the shutdown function runs?)
debug_backtrace - Logged a debug_backtrace() in my custom shutdown function. This shows only one call and it's my custom shutdown function.
Log if it reaches the end of the script - Outside of the loop, I have a function that logs that the script exited the loop (and therefore would be reaching the end of the source file normally). When the script dies randomly, it's not logging this, so whatever kills it, kills it while it's in the middle of processing.
What other debugging methods would you suggest for finding the culprit?
Note: I should add that this is not an issue with max_execution_time, which is disabled for these scripts. The time before being killed is inconsistent. It could run for 10 seconds or 12 hours before it dies.
Update/Solution: Thank you all for your suggestions. By logging the output, I discovered that when a MySQL query failed, the script was set to die(). D'oh. I updated it to log the MySQL error and then terminate. It's working like a charm now!
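For anyone hitting the same thing, the fix amounted to something like this (a minimal sketch using the legacy mysql_* functions; adapt to mysqli/PDO as appropriate - the query variable is illustrative):
<?php
$result = mysql_query($sql);
if ($result === false) {
    // log the real reason before terminating, instead of a bare die()
    error_log('MySQL query failed: ' . mysql_error() . ' -- query was: ' . $sql);
    exit(1);
}
?>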

I'd log the memory usage of your script. Maybe it acquires too much memory, hits the memory limit, and dies?
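A minimal way to do that, assuming the loop from the question (the log path is just an example), is to write memory_get_usage() out on every pass:
<?php
while ($shouldStillRun) {
    // do stuff

    // append current and peak usage so a slow climb toward memory_limit shows up in the log
    file_put_contents(
        '/tmp/worker-memory.log', // example path
        date('c') . ' ' . memory_get_usage(true) . ' ' . memory_get_peak_usage(true) . "\n",
        FILE_APPEND
    );
}
?>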

Remember, PHP has a directive in the ini file that says how long a script is allowed to run: max_execution_time.
Make sure that you are not going over this, or use set_time_limit() to increase the execution time. Is this program running through a web server or via the CLI?
Adding my bad experiences with PHP, from looking through some background scripts I wrote earlier this year. Sorry, but PHP is a terrible scripting language for doing anything for long lengths of time. I see that newer PHP (which we haven't upgraded to) adds the ability to force the GC to run. The problem I've been having is using too much memory, because the GC almost never runs to clean up after itself. Anything that references itself recursively will also never be freed.
Creating an array of 100,000 items allocates memory, but setting the array to an empty array or splicing everything out does NOT free it immediately, and doesn't mark it as unused (i.e. creating a new 100,000-element array increases memory further).
My personal solution was to write a Perl script that ran forever and called system("php my_php.php"); when needed, so that the interpreter would be freed completely. I'm currently supporting 5.1.6; this might be fixed in 5.3+, or at the very least they now have GC functions you can call to force a cleanup.
Simple script
#!/usr/bin/perl -w
use strict;
while(1) {
    if( system("php /to/php/script.php") != 0 ) {
        sleep(30);
    }
}
then in your php script
<?php
// do a single processing block
if( $moreblockstodo ) {
    exit(0); // more work waiting: exit cleanly so the wrapper restarts us immediately
} else {
    // no? then lets sleep for a bit until we get more
    exit(1); // non-zero exit makes the Perl wrapper sleep(30) before restarting
}
?>

I'd log the state of the function to a file in a few different places in each loop.
You can get the contents of most variables as a string with var_export, using the var_export($varname,true) form.
You could just log this to a certain file, and keep an eye on it. The latest state of the function before the log ends should provide some clues.
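Something along these lines, for example (logState() and the log path are made up for illustration):
<?php
// hypothetical helper: dump the interesting variables at a named checkpoint
function logState($label, array $vars)
{
    file_put_contents(
        '/tmp/worker-state.log', // example path
        date('c') . " [$label] " . var_export($vars, true) . "\n",
        FILE_APPEND
    );
}

// e.g. inside the loop, before and after the database call
logState('before-query', array('shouldStillRun' => $shouldStillRun));
?>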

Sounds like whatever is happening is not a standard PHP error. You should be able to throw and catch your own exceptions using a try ... catch statement, which you can then log. I don't have more details than that because I'm on my phone, away from a PC.
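A rough sketch of what that could look like around the body of the loop (the message format is just an example):
<?php
while ($shouldStillRun) {
    try {
        // do stuff
    } catch (Exception $e) {
        // log anything thrown instead of letting it silently end the process
        error_log('Worker caught: ' . $e->getMessage() . "\n" . $e->getTraceAsString());
    }
}
?>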

I've encountered this before on one of our projects at work. We have a similar setup - a PHP script checks the DB for tasks to be done (such as sending out an email, updating records, or processing some data). The PHP script has a while loop inside, which is set to
while(true) {
    //do something
}
After a while, the script will also be killed somehow. I've already tried most of what has been said here, like setting max_execution_time, using var_export to log all output, placing a try/catch, redirecting the script output (php ... > output.txt), etc., and we were never able to find out what the problem was.
I think PHP just isn't built to do background tasks by itself. I know it's not answering your question (how to debug this), but the way we worked around it is that we used a cron job to call the PHP file every 5 minutes. This is similar to Jeremy's answer of using a Perl script - it ensures that the interpreter is freed after the execution is done.

If this is on Linux, try looking into the system logs - the process could be killed by the OOM (out-of-memory) killer (unlikely; you'd also see other problems if this were happening), or by a segmentation fault (some versions of PHP don't like some versions of extensions, resulting in weird crashes).

Related

How to run PHP process for longer time

I am working on web scraping with PHP and cURL to scrape a whole website,
but it takes more than one day to complete the scraping process.
I have even used
ignore_user_abort(true);
set_error_handler(array(&$this, 'customError'));
set_time_limit (0);
ini_set('memory_limit', '-1');
I have also cleared memory after scraping each page. I am using Simple HTML DOM
to get the scraping details from a page.
But still the process runs and works fine for a certain number of links; after that it stops, although the process keeps the browser spinning and no error log is generated.
I could not understand what the problem seems to be.
Also, I need to know whether PHP can
run a process for two or three days?
thanks in advance
PHP can run for as long as you need it to, but the fact that it stops after what seems like the same point every time indicates there is an issue with your script.
You said you have tried ignore_user_abort(true);, but then indicated you were running this via a browser. This setting only works on the command line, as closing a browser window for a script of this type will not terminate the process anyway.
Do you have Xdebug? Simple HTML DOM will throw some rather interesting errors with malformed HTML (a link within a broken link, for example). Xdebug will throw a MAX_NESTING_LEVEL error in a browser, but will not throw this in a console unless you have explicitly told it to with the -d flag.
There are lots of other errors, notices, warnings etc which will break/stop your script without writing anything to error_log.
Are you getting any errors?
When using cURL in this way it is important to use multi cURL to process URLs in parallel - depending on your environment, 150-200 URLs at a time is easy to achieve.
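A bare-bones curl_multi pattern looks roughly like this (the URL list and options are placeholders):
<?php
$urls = array('http://example.com/page1', 'http://example.com/page2'); // placeholder URLs

$mh = curl_multi_init();
$handles = array();

foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// drive all transfers until every handle has finished
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $html = curl_multi_getcontent($ch);
    // ... hand $html to your parser here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);
?>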
If you have truly sorted out the memory issue and freed all available space as you have indicated, then the issue must be with a particular page it is crawling.
I would suggest running your script via a console and finding out exactly when it stops, then running that URL separately - at the very least this will indicate whether or not it is a memory issue.
Also remember that set_error_handler(array(&$this, 'customError')); will NOT catch every type of error PHP can throw.
When you next run it, debug via a console to show progress, and keep track of actual memory use - either via PHP (printed to the console) or via your system's process manager. This way you will be closer to finding out what the actual issue with your script is.
Even if you set unlimited memory, there still exists a physical limit.
If you crawl the URLs recursively, memory can fill up.
Try to do a loop and work with a database instead (see the sketch after this list):
scan a page, and store the found links if they aren't in the database yet.
when finished, do a select and get the first unscanned URL.
{loop}
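A minimal sketch of that queue-in-a-table idea (the table, columns and scrapePage() are made up for illustration, and it assumes a unique index on url):
<?php
while (true) {
    $res = mysql_query("SELECT url FROM urls WHERE scanned = 0 LIMIT 1");
    $row = mysql_fetch_assoc($res);
    if (!$row) {
        break; // nothing left to scan
    }

    $links = scrapePage($row['url']); // your existing Simple HTML DOM scraping code

    foreach ($links as $link) {
        $link = mysql_real_escape_string($link);
        // INSERT IGNORE skips links that are already queued
        mysql_query("INSERT IGNORE INTO urls (url, scanned) VALUES ('$link', 0)");
    }
    mysql_query("UPDATE urls SET scanned = 1 WHERE url = '" . mysql_real_escape_string($row['url']) . "'");
}
?>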

start /B doesn't start the task

I'm currently launching an asynchronous job with PHP to perform some tests.
To make it work, I found some tips here on SO, like the use of popen and start:
$commande = "testu.bat";
$pid = popen('start /B ' . $commande, 'r');
$status = pclose($pid);
The folder containing testu.bat is in my user PATH.
This script performs some tasks, and to verify its execution it should generate a log file, but I never get that file.
Whereas if I just remove the /B option, it works fine and I get my log file.
Did I miss something about background execution? How can I catch the error informations when it is running in the background?
It appears you are operating under the assumption that the /B switch to the start command means "background". It does not. From the start usage:
B Start application without creating a new window. The
application has ^C handling ignored. Unless the application
enables ^C processing, ^Break is the only way to interrupt
the application.
Processes launched by start are asynchronous by default. Since that appears to be what you want, just run the command without the /B switch.
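In other words, keeping the rest of your snippet as it is:
$commande = "testu.bat";
$pid = popen('start ' . $commande, 'r');
$status = pclose($pid);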
Interesting one... Ok, here's what I think is going on:
Because you run the task in the background, the PHP script will just carry on; it is not waiting for testu.bat to return anything...
Or, put another way, popen does what it was instructed to do, which is to start the task in the background (which it does), but then control is handed immediately back to PHP, while the log file is still being created in the background and the PHP script carries on at the same time...
What I would do in this case is let testu.bat call the PHP script (or another PHP script) in a callback-type fashion once it has done its processing, in a similar way to how, in JavaScript, you would use callbacks for asynchronous Ajax calls...
Maybe provide the callback script command as a parameter to testu.bat..?
Hope this is of any help...
I'm not quite sure about your goal here, but here is some info you might be able to use:
For figuring out background errors, these functions can help:
set_exception_handler();
set_error_handler();
register_shutdown_function();
Of course, write out the errors they catch into some file.
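A minimal combined setup could look something like this (the log path is just an example; requires PHP 5.3+ for the closures):
<?php
$logFile = 'C:\\temp\\background-errors.log'; // example path

set_error_handler(function ($errno, $errstr, $errfile, $errline) use ($logFile) {
    file_put_contents($logFile, date('c') . " error [$errno] $errstr in $errfile:$errline\n", FILE_APPEND);
});

set_exception_handler(function ($e) use ($logFile) {
    file_put_contents($logFile, date('c') . ' uncaught exception: ' . $e->getMessage() . "\n", FILE_APPEND);
});

register_shutdown_function(function () use ($logFile) {
    $err = error_get_last();
    if ($err !== null) {
        // fatal errors bypass set_error_handler(), but error_get_last() still sees them here
        file_put_contents($logFile, date('c') . ' shutdown with error: ' . print_r($err, true) . "\n", FILE_APPEND);
    }
});
?>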
If you do not need any data back from your requests, you can simply use:
fsockopen()
curl
and give them a short timeout (for example 10 milliseconds). The scripts will run in the background.
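The fsockopen variant is roughly this (host, port, path and timeouts are placeholders):
<?php
// fire-and-forget HTTP request: open a socket, send the request, don't wait for the reply
$fp = fsockopen('localhost', 80, $errno, $errstr, 1); // 1 second connect timeout
if ($fp) {
    stream_set_timeout($fp, 0, 10000); // roughly 10 ms read timeout
    fwrite($fp, "GET /testu.php HTTP/1.1\r\nHost: localhost\r\nConnection: Close\r\n\r\n");
    fclose($fp);
}
?>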
Alternatively, if you do need the data back, you can either put it into a database and set up a loop that checks whether the data has already been inserted, or simply output it to a file and check for its existence.
In my opinion, start launches the specified command by creating a new process in the "background". Therefore the execution of start itself "just" starts the second process and exits immediately.
However, with the /B switch, the command to be executed will be executed in the context of the start process. Therefore the execution of the start process takes longer. Now what I suspect is that executing pclose terminates the start process, and as a result of this you don't get your log file.
Maybe one solution (not tested though) could be executing something like
start /B cmd "/C testu.bat" - here start just tries to execute cmd, and cmd gets /C testu.bat as a parameter, which is the "command" it shall execute.
Another thought:
What happens if you don't call $status = pclose($pid);?
Just for people who find this trick and want it to work: in my case it simply needed the PHP directive ignore_user_abort to be enabled, either in php.ini or via the PHP function of the same name.
Without this enabled, the process is killed by pclose() without finishing the job.
Your problem is most likely properly solved by using a queue system. You insert a job into a queue that a background process picks up and works on. In this way the background task is completely independent of the HTTP request that initiated the task - but you can still monitor its progress.
The two most popular software packages that can help you in your scenario:
Gearman
Check out this gist and this tutorial for installation on Windows.
RabbitMQ
Check out this tutorial for installation on Windows.
Note that implementing a queuing solution is not a 5 minute prospect, but it is technically the right approach for this sort of situation. For a company product this is the only viable approach; for a personal project where you just want something to work, there is a level of commitment required to see it through for the first time. If you're looking to expand your development horizons, I strongly suggest you give the queue a shot.
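For what it's worth, here is roughly what the PHP side looks like with the Gearman extension (the function name and payload are made up):
<?php
// producer side (e.g. the web request): queue the job and return immediately
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('run_tests', json_encode(array('suite' => 'smoke')));

// worker side (a long-running CLI script): pick jobs up and process them
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('run_tests', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    // ... do the actual test run here ...
});
while ($worker->work()) {
    // loop forever, handling one job per iteration
}
?>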

PHP script stops running after some time run by browser

I have made a PHP script which would probably take about 3 hours to complete. I run it from the browser, and after about 45 minutes it stops doing anything. I know this because it polls certain web addresses and then saves some data to a database, and it simply stops putting any data into the database, which led me to the conclusion that it has stopped. The browser still looks as if it were loading the page, but it never finishes.
There aren't any errors, so it is probably some kind of timeout... but where it occurs is a mystery, as is how to prevent it from happening. In my case I can't use the CLI; I must use a browser client to initiate the script.
I have tried to put
set_time_limit(0);
But it had no apparent effect. Any suggestions as to what could cause the timeout, and a fix for it?
Try this:
set_time_limit(0);
ignore_user_abort(true);
ini_set('max_execution_time', 0);
Most webhosts kill processes that run for a certain length of time. This is intended as a failsafe against infinite loops.
Ask your host about this and see if there's any way it can be disabled for this particular script of yours. In some cases, the killer doesn't apply to Cron tasks, or processes run by SSH. However, this varies from host to host.
It might be the browser that's timing out; I'm not sure whether browsers do that or not, but then I've also never had a page require so much time.
Suggestion: I'm assuming you are running a loop. Load a page, then run each iteration of the loop in an Ajax call to another page, not firing the next iteration until the previous one returns.
There's a setting in PHP to kill processes after some time. This is especially important for shared servers (you do not want one process slowing down the whole server).
You need to ask your host whether you can make modifications to php.ini (or through .htaccess). In particular, the max_execution_time setting.
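If your host allows the override, the .htaccess line would look like this (mod_php only; the value is in seconds, 0 meaning no limit):
php_value max_execution_time 0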
If you are using sessions, then you would also need to look at session.cookie_lifetime and not just set_time_limit. If you are building up a large array, its size might also exhaust memory.
Without more info on how your script handles the task, it would be difficult to identify.

I am looking for the PHP equivalent for VB doevents

I am looking for the PHP equivalent for VB doevents.
I have written a realtime analysis package in VB and used DoEvents to release control to the operating system.
DoEvents allows me to stay in memory and run continuously without filling up memory, and it allows me to respond to user input.
I have rewritten the package in PHP and I am looking for that same doevents feature.
If it doesn't exist I could reschedule myself and exit.
But I currently don't know how to do that and I think that would add a lot more overhead.
Thank you, gerardg
usleep is what you are looking for. It delays program execution for the given number of microseconds.
http://php.net/manual/en/function.usleep.php
It's been almost 10 years since I last wrote anything in VB, and as I recall, the DoEvents() function allowed the application to yield to the processor during intensive processing (usually to allow other system events to fire - the most common being WM_PAINT, so that your UI won't appear hung).
I don't think PHP has such functionality - your script will run as a single process and end (either when it's done or when it hits the default 30 second timeout).
If you are thinking in terms of threads (as most Windows programmers tend to do) and need to spawn more than one instance of your script, perhaps you should take a look at PHP's Process Control functions as a start.
I'm not entirely sure which aspects of doevents you're looking to emulate, so here's pretty much everything that could be useful for you.
You can use ob_implicit_flush(true) at the top of your script to enable implicit output buffer flushing. That means that whenever your script calls echo or print or whatever you use to display stuff, PHP will automatically send it all to the user's browser. You could also just use ob_flush() after each call that displays something, which acts more like Application.DoEvents() in VB with regard to keeping your UI active, but it must be called every time something is output.
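Something like this at the top of the script, as a sketch only (the loop body is a placeholder):
<?php
ob_implicit_flush(true); // push output to the browser as soon as it is produced

for ($i = 0; $i < 1000; $i++) {
    // ... one chunk of the long-running work ...
    echo "processed item $i\n";
    flush(); // also push it through any web-server level buffering
}
?>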
Naturally, if your script uses the output buffer already, you could build a copy of the buffer before flushing with ob_get_contents().
If you need to allow the script to run for more time than usual, you can set a longer timeout with set_time_limit($time). If you need more memory, and you have access to edit your .htaccess file, place the following code and edit the value:
php_value memory_limit 64M
That sets the memory limit to 64 megabytes.
For running multiple scripts at once, you can use pcntl_exec to start another one running.
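Note that pcntl_exec() replaces the current process image, so to launch a second script while the first keeps running you would fork first; a rough sketch (the interpreter and script paths are placeholders):
<?php
$pid = pcntl_fork();
if ($pid === 0) {
    // child process: become a fresh PHP interpreter running the other script
    pcntl_exec('/usr/bin/php', array('/path/to/other-script.php'));
    exit(1); // only reached if the exec itself failed
}
// parent carries on; wait for the child if you need to know when it is done
pcntl_waitpid($pid, $status);
?>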
If I am missing something important about DoEvents(), let me know and I will try to help you make it work.
PHP is designed for asynchronous, on-demand processing. However, it can be forced to become a background task with a little hackery.
As PHP runs as a single thread, you do not have to worry about letting the CPU do other things; that is already taken care of. If this were not the case, then a web server would only be able to serve up one page at a time and all other requests would have to sit in a queue. You will need to write some sort of loop that never exits until some detectable condition happens (like the "now please exit" message you set in the DB or something similar).
As pointed out by others, you will need set_time_limit($something); and perhaps usleep to stop the code from running "too fast" if it eats a lot of CPU on each pass. However, if you are also using a database connection, most of your script's time is actually spent waiting for the database (by far the biggest overhead for a script).
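Put together, the loop described above might look something like this (the flags table and query are made up for illustration):
<?php
while (true) {
    // check for a "please exit" flag somewhere the outside world can set it
    $res = mysql_query("SELECT value FROM flags WHERE name = 'stop_worker'");
    $row = mysql_fetch_assoc($res);
    if ($row && $row['value'] == '1') {
        break; // somebody asked us to exit
    }

    // ... one unit of the real-time analysis work ...

    usleep(250000); // sleep 250 ms so an idle loop doesn't spin the CPU
}
?>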
I have seen PHP worker threads created by using screen and detaching it as a background task. Other approaches also work, so long as you do not have a session that will time out or exit (say, when the web browser is closed). A cron job that checks every x minutes or hours whether the script is still running gives you automatic recovery from forced exits and/or system restarts.
TL;DR: doevents is "baked in" to PHP and you don't have to worry about it.

When does a script end without showing the reason?

I am using a script with set_time_limit(60*60*24) to process a large number of images. But after 1k images or so (1 or 2 minutes), the script stops without showing any errors on the command line.
I'm also using a logger that writes any error thrown by the script to a file on shutdown (by using register_shutdown_function). But when this script stops, nothing is written (it should write something even if no errors are thrown; it works perfectly with any other script, in any other situation I've ever had).
Apache error_log doesn't show anything either.
Any ideas?
Edit: My environment is CentOS 5.5, with PHP 5.3.
It is probably running out of memory.
ini_set('memory_limit', '1024M');
May get you going if you can allocate that much.
Please make sure you're not running in safe mode:
http://php.net/manual/en/features.safe-mode.php
Please note that register_shutdown_function does NOT guarantee that the associated function will be executed every time, so you should not rely on it.
see http://php.net/register_shutdown_function
To debug the issue, check the PHP error log (which is NOT the Apache error log when you're using PHP from the console; check your php.ini or ini_get('error_log') to find out where it is).
A solution may be to write a simple wrapper script in bash that executes the script and then does what you want to be executed at the end of the script.
Also note that PHP doesn't count the time spent in external, non-PHP activities, like network calls, some library functions, ImageMagick, etc.
So the time limit you set may actually last much longer than you expect it to.
