See also Having a PHP script loop forever doing computing jobs from a queue system, but that question doesn't answer all of mine.
If I want to run a PHP script forever, accessing a queue and doing jobs:
What is the potential for memory problems, and how do I avoid them? (Are there any flush functions or similar that I should use?)
What if the script dies for some reason? What would be a good method to automatically start it up again?
What would be the best basic approach to start the script? Since it runs forever, I don't need cron, but how do I start it up? (See also 2.)
Set the queue processor up as a cron script and have it execute every 10 seconds. When the script fires up, check whether a lock file (something like .lock) is present. If there is, exit immediately; if not, create the .lock and start processing. If any errors occur, email/log them, delete the .lock and exit. If there are no tasks, exit as well.
I think this approach is ideal, since PHP isn't really designed to run a single script for extended periods of time like you're asking. To avoid potential memory leaks, crashes and so on, repeatedly executing a short-lived script is the safer approach.
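A minimal sketch of that pattern, assuming a hypothetical processJobs() function and lock path; it uses flock() rather than a bare .lock existence check, which has the advantage that the lock is released automatically if the script dies:

<?php
// Cron entry point: run it frequently (cron itself only resolves to a minute, so use a wrapper loop for sub-minute intervals).
$lock = fopen('/tmp/queue-worker.lock', 'c');    // hypothetical lock file path
if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
    exit(0);                                     // another instance already holds the lock
}
try {
    processJobs();                               // hypothetical function: pull and run queued jobs
} catch (Exception $e) {
    error_log($e->getMessage());                 // log (or email) the error
}
flock($lock, LOCK_UN);
fclose($lock);                                   // the OS also drops the lock if the script dies mid-run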
While PHP can access (publish to and consume from) MQs, if at all possible try to use a fully featured MQ application to do this.
A fully featured MQ application (in Ruby, Perl, .NET, Java, etc.) will handle all of the concurrency, error logging, state management and scalability issues that you describe.
Without going too far into state machines, it is at least a good idea to introduce states both for 'jobs' (example: flv2avi conversion) and for 'tasks' (flv2avi 1.flv).
In my script (Perl), zombie processes sometimes start to degrade the whole script's performance. It is a rare case, but it is inherent in the design, so the script should be able to stop reading from the queue and allow a new instance to continue its tasks and jobs; keeping as much of the running tasks' data as possible is welcome. Once the first instance is down to one or two tasks, it gets killed.
On start:
check for common errors (due to shutdown)
check for known errors (out of space, can't read input)
kill whatever may be killed and set its status to 'waiting'
start everything that is waiting.
If you run piped jobs (vlc | ffmpeg, tail -f | grep), try to avoid pushing the data through your own program's I/O; instead fork() (a bad idea for PHP?) or just call /bin/bash -c "prog1 | prog2". This saves a lot of CPU load.
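For the PHP case, a hedged sketch of handing a whole pipe to the shell (prog1 and prog2 are placeholders for the real programs):

<?php
// Let the shell connect the two programs directly; PHP never has to shuttle the stream itself.
$cmd = '/bin/bash -c ' . escapeshellarg('prog1 | prog2');
exec($cmd, $output, $exitCode);                  // blocks until the pipeline finishes
if ($exitCode !== 0) {
    error_log("pipeline failed with exit code $exitCode");
}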
Start points: both /etc/rc.d and cron (check running processes; run the first instance, or run a second one with a 'debug' argument).
I've previously used Gearman along with supervisor to manage jobs.
In this case we are using Amazon SQS, which I have spent some time trying to get my head around.
I have set up a separate micro instance from our main webserver to use as an image processing server (purely for testing at the moment; it will be upgraded and become part of a cluster before this implementation goes live).
On this micro instance I have installed PHP and ImageMagick in order to perform the image processing.
I have also written a worker script which receives the messages from Amazon SQS.
It all works perfectly; however, I need this script to run over and over again in order to continuously check for messages.
I don't like the thought of running a continuous loop, so I have started to look at other methods, with little success.
So my question is what is generally considered the best practice way to do this?
I am worried about memory, since PHP wasn't really designed for this; it therefore feels like running the script for a while, then stopping and restarting it, might be my best bet.
I have experience using supervisor (to ensure that Gearman workers kept running) and am wondering if I could simply use that to continuously execute the simple PHP script over and over.
My thoughts are as follows:
Set up SQS long polling so that the script waits up to 20 seconds for messages.
Use a while loop, with a 20-second sleep, to keep this script running for, say, an hour at a time.
Have all this run through supervisor. When the hour is up and the loop is complete, allow the script to exit.
Supervisor should then automatically restart it.
Does this sound viable? Is there a better way? What is generally considered the best practice for receiving SQS messages in PHP?
Thanks in advance
In supervisord you can set autorestart to true to have it run your command over and over again. See: http://supervisord.org/configuration.html#program-x-section-settings
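For reference, a minimal program section along those lines (the program name, script path and log paths are assumptions; adjust to your own layout):

[program:sqs-worker]
; hypothetical path to the worker script
command=/usr/bin/php /var/www/worker.php
autostart=true
; start a fresh worker whenever the previous one exits
autorestart=true
stdout_logfile=/var/log/sqs-worker.log
stderr_logfile=/var/log/sqs-worker.err.log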
Overall, using an endless while loop is perfectly fine; PHP will free your objects correctly and keep memory in check if the code is written correctly. It can run for years without leaks (if there is a leak, you probably created it yourself, so review your code).
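The worker itself might look roughly like this; a sketch assuming the AWS SDK for PHP (v3-style client construction), with the region, queue URL and processImage() all placeholders:

<?php
require 'vendor/autoload.php';

use Aws\Sqs\SqsClient;

$sqs = new SqsClient(['region' => 'eu-west-1', 'version' => 'latest']);   // assumed region
$queueUrl = 'https://sqs.eu-west-1.amazonaws.com/123456789012/images';    // placeholder queue URL

$stopAt = time() + 3600;                         // work for roughly an hour, then exit and let supervisor restart us
while (time() < $stopAt) {
    $result = $sqs->receiveMessage([
        'QueueUrl'            => $queueUrl,
        'WaitTimeSeconds'     => 20,             // long polling: the call itself blocks up to 20s, so no extra sleep is needed
        'MaxNumberOfMessages' => 1,
    ]);
    foreach ((array) $result->get('Messages') as $message) {
        processImage($message['Body']);          // hypothetical function doing the ImageMagick work
        $sqs->deleteMessage([
            'QueueUrl'      => $queueUrl,
            'ReceiptHandle' => $message['ReceiptHandle'],
        ]);
    }
}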
How do I stop a Supervisord process without killing the program it's controlling? might be of interest to you; the OP had a similar setup, with autorestart and wanted to add graceful shutdowns to it.
I need to write a server-side program that lives on the server, and is checking a database consistently for new entries.
When a new entry shows up in the database, the program should process the data and put the results somewhere else.
It is important to highlight that the process isn't instigated by new entries showing up, but by the program checking for new entries on its own.
Some people I've spoken to brought up cron jobs; I was curious whether that is the solution for me. I see that it has limitations: it won't run more often than once a minute. I was hoping for the program to run every 5 seconds; would I be better off writing a shell script, or is that a bootleg fix?
I'm not sure if this is conventional (?) but...
Use a database trigger on INSERT that runs an external program (PHP, Python, .. whatever). Which database are you using? I think this post is old but might be of help: http://crazytechthoughts.blogspot.co.uk/2011/12/call-external-program-from-mysql.html
There is a technique I've frequently used when dealing with queues that need processing.
#!/bin/sh
php -f checkDBAndAct.php
sleep 5
exec $0
The exec $0 part starts the script running again, replacing itself in memory, so it will run forever without issues. Any memory the PHP script uses is cleaned up whenever it exits, so that's not a problem either.
A simple line will start it, and put it into the background:
cd /x/y/z ; nohup ./loopToProcessDB.sh &
or it can be similarly started when the machine boots, by various means (such as cron's '@reboot ....')
-- from https://stackoverflow.com/a/2686100/6216
An extended version is on http://PHPscaling.com and https://gist.github.com/alister/1386212
Though I'd use an actual queue system, rather than a DB, as there are a number of downsides to bending a database to this task.
I am developing a website that requires a lot of background processes for the site to run. For example, a queue, a video encoder and a few other types of background processes. Currently I have these running as a PHP CLI script that contains:
while (true) {
// some code
sleep($someAmountOfSeconds);
}
OK, these work fine and everything, but I was thinking of setting these up as a daemon, which would give them an actual process ID that I can monitor; I could also run them in the background and not have a terminal open all the time.
I would like to know if there is a better way of handling these. I was also thinking about cron jobs, but some of these processes need to loop every few seconds.
Any suggestions?
Creating a daemon which you can make calls to and ask questions of would seem the sensible option. It depends on whether your hoster permits such things; if you're requiring it to do work every few seconds, then an OS-based service/daemon would definitely seem far more sensible than anything else.
You could create a daemon in PHP, but in my experience this is a lot of hard work and the result is unreliable due to PHP's memory management and error handling.
I had the same problem: I wanted to write my logic in PHP but have it daemonised by a stable program that could restart the PHP script if it failed, and so I wrote The Fat Controller.
It's written in C, runs as a daemon and can run PHP scripts, or indeed anything. If the PHP script ends for whatever reason, The Fat Controller will restart it. This means you don't have to take care of daemonising or error recovery - it's all handled for you.
The Fat Controller can also do lots of other things such as parallel processing which is ideal for queue processing, you can read about some potential use cases here:
http://fat-controller.sourceforge.net/use-cases.html
I've done this for 5 years, using PHP to run background tasks, and it's no different to doing it in any other language. Just use cron and lock files. The lock file will prevent multiple instances of your script running.
Also, it's important to monitor your code. One check I always do, to prevent stale lock files from blocking the script, is to have a second cron job that checks whether the lock file is older than a few minutes and whether an instance of the PHP script is running; if not, it removes the lock file.
Using this technique allows you to set your cron to run the script every minute without issues.
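That watchdog can itself be a tiny PHP script on its own cron entry; a sketch, assuming the worker writes its PID into the lock file (the path and the five-minute threshold are made up):

<?php
// Second cron job: remove a stale lock if the worker that owned it is gone.
$lockFile = '/tmp/worker.lock';                            // hypothetical lock file written by the worker
if (!file_exists($lockFile)) {
    exit(0);
}
$pid     = (int) trim(file_get_contents($lockFile));
$stale   = (time() - filemtime($lockFile)) > 5 * 60;       // "older than a few minutes"
$running = $pid > 0 && file_exists('/proc/' . $pid);       // Linux-only liveness check
if ($stale && !$running) {
    unlink($lockFile);                                     // the next cron run can start cleanly
}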
Use the System_Daemon module from PEAR.
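If memory serves, the basic usage looks roughly like the following; treat the option names as assumptions and check the package documentation:

<?php
require_once 'System/Daemon.php';                      // PEAR System_Daemon package

System_Daemon::setOption('appName', 'queueworker');    // assumed option name
System_Daemon::setOption('appDir', dirname(__FILE__));
System_Daemon::start();                                // forks and detaches into the background

while (!System_Daemon::isDying()) {
    // do one unit of background work here
    System_Daemon::iterate(5);                         // pause roughly 5 seconds between runs
}
System_Daemon::stop();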
One solution (that I really need to try myself, as I may need it) is to use cron, but have the process loop for five minutes or so. Then get cron to kick it off every five minutes; as a new instance starts, the previous one should be finishing (or close to finishing).
Bear in mind that the two may overlap a bit, and so you need to ensure that this doesn't cause a clash (e.g. writing to the same video file). Some simple inter-process communication may be useful, even if it is just writing to a PID file in the temp directory.
This approach is a bit low-tech but helps avoid PHP hanging onto memory over the longer term - sort of in-built task restarts!
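A rough sketch of that shape, with the PID file used only as a simple inter-process marker (the path and timings are placeholders):

<?php
// Started by cron every 5 minutes; each instance works for about 5 minutes and then exits.
$pidFile = sys_get_temp_dir() . '/video-worker.pid';   // simple marker other instances can check
file_put_contents($pidFile, getmypid());

$deadline = time() + 5 * 60;
while (time() < $deadline) {
    // ... claim one item and process it, skipping anything another instance has marked as in progress ...
    sleep(2);
}
unlink($pidFile);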
I have a PHP script running on my server via a cron job. The job runs every minute. In the PHP script I have a loop that executes, then waits one second and loops again, essentially creating a script that runs once every second.
Now I'm wondering: if I make the cron job run only once per hour and have the script loop for an entire hour, or possibly an entire day, would this have any impact on the server's CPU and/or memory, and if so, would it be positive or negative?
I spot a design flaw.
You can always have a PHP script permanently running in a loop performing whatever functionality you require, without dependency upon a webserver or clients.
You are obviously checking something with this script; any insight into what? There may be better solutions for you. For example, if it is a database, consider SQL triggers.
In my opinion it would have a negative impact, since the script keeps using resources.
cron runs on a time-based schedule and is already running on the server anyway.
But a cron job can only run once a minute at most.
Another thing: if the script times out, fails or crashes for whatever reason, you end up not running the script for up to an hour. That would have a positive impact on server load, but it's not what you're looking for, I guess? :)
Maybe run it every 2 or even 5 minutes to spare server load?
Or maybe change the script so it does not wait but just executes once, and call it from the cron job; that should have a positive impact on server load.
I think you should change the script's logic if possible.
If the tasks your script executes are not periodic but are triggered by events, then you can use a message queue (like Gearman).
Otherwise your solution is OK. Memory leaks can occur, but in newer PHP versions (5.3.x) the garbage collector is pretty good. Some extensions can lead to memory leaks, or your application design can lead to hungry memory usage (like Doctrine ORM's loaded-objects cache).
But you can control the script's memory usage with tools like monit: restart the script when its memory usage reaches some limit, or start it again when it unexpectedly shuts down.
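Besides external tools, the script can also watch its own footprint and exit cleanly, so that whatever supervises it starts a fresh copy; a small sketch (the threshold and doOneJob() are made up):

<?php
$memoryCeiling = 128 * 1024 * 1024;          // arbitrary 128 MB ceiling, in bytes
while (true) {
    doOneJob();                              // hypothetical single unit of work
    if (memory_get_usage(true) > $memoryCeiling) {
        exit(0);                             // let monit (or similar) start a clean process
    }
    sleep(1);
}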
I am looking for the PHP equivalent of VB's DoEvents.
I have written a real-time analysis package in VB and used DoEvents to release control to the operating system.
DoEvents allows me to stay in memory and run continuously without filling up memory, and allows me to respond to user input.
I have rewritten the package in PHP and I am looking for that same doevents feature.
If it doesn't exist I could reschedule myself and exit.
But I currently don't know how to do that and I think that would add a lot more overhead.
Thank you, gerardg
usleep is what you are looking for. It delays program execution for the given number of microseconds.
http://php.net/manual/en/function.usleep.php
It's been almost 10 years since I last wrote anything in VB, but as I recall the DoEvents() function allowed the application to yield to the processor during intensive processing (usually to allow other system events to fire, the most common being WM_PAINT, so that your UI won't appear hung).
I don't think PHP has such functionality; your script will run as a single process and end (either when it's done or when it hits the default 30-second timeout).
If you are thinking in terms of threads (as most Windows programmers tend to do) and needing to spawn more than 1 instance of your script, perhaps you should take a look at PHP's Process Control functions as a start.
I'm not entirely sure which aspects of doevents you're looking to emulate, so here's pretty much everything that could be useful for you.
You can use ob_implicit_flush(true) at the top of your script to enable implicit output buffer flushing. That means that whenever your script calls echo or print or whatever you use to display stuff, PHP will automatically send it all to the user's browser. You could also just use ob_flush() after each call to display something, which acts more like Application.DoEvents() in VB with regards to keeping your UI active, but must be called each time something is output.
Naturally if your script uses the output buffer already, you could build a copy of the buffer before flushing, with ob_get_contents().
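A small sketch of that flushing pattern inside a long-running loop (the loop body is a placeholder for the real analysis work):

<?php
set_time_limit(0);           // lift the default 30 second limit (see the next point)
ob_implicit_flush(true);     // send output to the browser as soon as it is echoed

for ($i = 1; $i <= 100; $i++) {
    // ... do one slice of the real-time analysis here ...
    echo "processed chunk $i\n";
    flush();                 // belt and braces on servers that buffer output anyway
    usleep(100000);          // 0.1 s pause, the closest analogue to yielding in DoEvents()
}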
If you need to allow the script to run for more time than usual, you can set a longer timeout with set_time_limit($time). If you need more memory, and you have access to edit your .htaccess file, place the following code and edit the value:
php_value memory_limit 64M
That sets the memory limit to 64 megabytes.
For running multiple scripts at once, you can use pcntl_fork() followed by pcntl_exec() to start another one running.
If I am missing something important about DoEvents(), let me know and I will try to help you make it work.
PHP is designed for on-demand request processing; however, it can be forced to become a background task with a little hackery.
As PHP runs as a single thread, you do not have to worry about letting the CPU do other things, as that is already taken care of. If this were not the case, a web server would only be able to serve up one page at a time and all other requests would have to sit in a queue. You will need to write some sort of loop that never exits until some detectable condition happens (like the "now please exit" message you set in the DB or something).
As pointed out by others, you will need set_time_limit($something), with perhaps usleep() stopping the code from running "too fast" if it eats a lot of CPU on each loop. However, if you are also using a database connection, most of your script's time is actually spent waiting for the database (by far the biggest overhead for a script).
I have seen PHP worker threads created by using screen and detaching it as a background task. Other approaches also work, so long as you do not have a session that will time out or exit (say, when the web browser is closed). A cron job that checks every x minutes or hours whether the script is still running gives you automatic recovery from forced exits and/or system restarts.
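Putting those pieces together, a hedged sketch of such a background loop (the table, column and credentials are made up):

<?php
set_time_limit(0);                                  // background task: no execution time limit
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // placeholder credentials

while (true) {
    // hypothetical control flag: set worker_stop to 1 in the DB to end the loop gracefully
    $stop = $db->query("SELECT value FROM settings WHERE name = 'worker_stop'")->fetchColumn();
    if ($stop) {
        break;
    }
    // ... pull and process pending work here ...
    usleep(500000);                                 // 0.5 s pause so the loop does not spin the CPU
}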
TL;DR: doevents is "baked in" to PHP and you don't have to worry about it.