Running an infinite loop in a cron job
Suppose I have written a PHP-based script to run on my server using a cron job, and I want to use an infinite loop in that PHP script. Any ideas for running an infinite loop from a cron job?
Infinite-looping applications are usually called daemons. They are system services that perform some kind of constant processing and/or stand ready to accept incoming work.
Gearman is a system daemon you can install that can handle various tasks you give it. It's a complex tool that allows many things, but it could be used to cover your needs.
PHP::Gearman is a Gearman client that talks to the Gearman daemon, sending it tasks along with the conditions under which each task must be executed.
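For illustration, here is a minimal sketch of how that might look with the pecl/gearman extension (the function name send_report and the payload are made up for the example):

// worker.php -- a long-running worker process, started once and left running
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730); // default gearmand port
$worker->addFunction('send_report', function (GearmanJob $job) {
    $data = json_decode($job->workload(), true);
    // ... the actual "constant processing" happens here ...
    return 'done';
});
while ($worker->work()); // block forever, handling jobs as they arrive

// client side, e.g. inside a normal page request
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('send_report', json_encode(array('user_id' => 123)));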
The limitations that @Jeffrey emphasized about PHP are true because PHP was designed as a share-nothing architecture (one page load equals one script execution; each page load works under its own data context).
Perhaps System_Daemon (a PEAR package) can help overcome some or all of the limitations mentioned above. I haven't used it, so I can't tell you much more, but it's as good a place to start as any.
Related
I am programming a website on a Linux CentOS server (I am planning to upgrade to a VPS plan where I will have root access). Much of the website will rely on scripts that are automated.
I have 2 questions about starting automated processes.
Is there any way I can start a daemon thread, or anything like that, which will constantly be running? I need to execute a script every time an email account gets a new e-mail. I am aware of cron jobs that can run every minute, but a script that runs constantly would be ideal, so I can execute the script the moment a new e-mail arrives.
Is there any way from code (ideally PHP) to start a thread which runs concurrently with the main program? In the script I am using, imap_open() is used to connect to an e-mail account, which takes a few seconds every time. If I could fire off multiple concurrent scripts at the same time, that would ideally reduce the overall run time. Is there any way to do this?
Any help with these questions would be greatly appreciated.
You can certainly write a daemon / service that runs constantly. For a starting tutorial see
http://www.netzmafia.de/skripten/unix/linux-daemon-howto.html
Your daemon can poll the account over IMAP or POP3 (there are existing libraries available to support this) to periodically check for new emails and act accordingly.
Here's a question with answers from SO that discusses how to accomplish all of this with Python
How to make a Python script run like a service or daemon in Linux
For the first part, there are two easy solutions:
Use the Vixie cron @reboot start specification to start your daemon at reboot as a standard user. This and every-minute cron jobs are the only mechanisms that make it easy to run a daemon-style service as a user.
Use procmail to start a new script on every email delivery. The downside here is that procmail will run and then start a new program on every email -- when you're getting a hundred emails per second, this could be a serious hindrance compared to a daemon that uses inotify(7) to alert a long-lived program about new emails.
For the second part, look for a wrapper for the fork(2) system call. It cleaves a program cleanly in half -- parent and child -- and allows each to continue independent execution from then on. If the child and parent need to communicate again in the future, then perhaps see if PHP supports threaded execution.
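For the second part, a minimal sketch using PHP's pcntl_fork() (CLI only, requires the pcntl extension; the mailbox strings and credentials are placeholders):

$pid = pcntl_fork();
if ($pid === -1) {
    die("fork failed\n");
} elseif ($pid === 0) {
    // Child: open its own connection to the second account, then exit.
    $inbox = imap_open('{imap.example.com:993/imap/ssl}INBOX', 'user2', 'pass2');
    // ... process new messages ...
    exit(0);
}
// Parent: handle the first account while the child works in parallel.
$inbox = imap_open('{imap.example.com:993/imap/ssl}INBOX', 'user1', 'pass1');
// ... process new messages ...
pcntl_waitpid($pid, $status); // reap the child before exiting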
And what about incron? Maybe there is a way to use it in your case, but you would need to produce a filesystem event (for example, create a new file) to trigger it.
I'm making a scan server for my company, which will be used to launch scans from tools such as nessus, nmap, nikto etc. I've written the pages in PHP, but I need to have control over the subsequent processes (spawned with nohup and backgrounded with &) as I need to perform various actions once the scans have completed (such as emailing them, downloading reports off the nessus server etc).
I was advised on here to create a python daemon that the PHP pages communicated with. I've googled endlessly, but I can't find anything that explains the logic behind the communication from a beginner's perspective (coding a daemon will be my most advanced project yet). I'm aware of IPC and unix domain sockets for example, but not sure as to how I can employ them in my situation. As such, I'm after some advice or pointers as to what I should do.
I was thinking I could create a python script with a while loop that constantly checks to see if the process has terminated and when it does, perform the appropriate post process termination action. The script would be daemonized so it runs in the background and I would call it from the PHP pages with the PID as a parameter, which I could access with the argparse module for example.
Am I on the right lines in terms of logic - or are there better solutions?
Any help, or just something to google, is much appreciated! Thanks
I think something like Gearman would certainly make it easier to implement this.
Gearman is a job server which lets you start jobs, query if the job is still running and fetch the output of the job (as text).
It supports PHP and Python (among others).
(This answer made me feel like a salesman).
So your plan is: PHP spawns nmap and a watchdog. The watchdog keeps polling for nmap to finish running and then does some post-processing once it's done.
Slightly cleaner would be:
PHP spawns a 'process manager' (which also you write). This process manager is basically a program that executes nmap in a child process, waits for this child process to finish (using the 'wait' system call, which for example in C looks like: http://linux.die.net/man/2/wait), and does the post processing.
It will also be more efficient because a 'wait' will probably be cheaper than repeatedly checking if the PID has terminated.
If you like python more than C, python has subprocess management too:
http://docs.python.org/library/subprocess.html
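If you would rather stay in PHP for the process manager, the same idea can be sketched with a blocking exec() call (the nmap options, report path and email address below are placeholders):

// process_manager.php -- launched in the background by the PHP page
$target = $argv[1];
// exec() blocks until nmap exits, which replaces polling the PID in a loop.
exec('nmap -oX /tmp/scan.xml ' . escapeshellarg($target), $output, $exitCode);
if ($exitCode === 0) {
    // Post-processing once the scan has finished, e.g. mail the report.
    mail('security@example.com', "Scan of $target finished",
         file_get_contents('/tmp/scan.xml'));
}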
My app takes a long list of URLs and splits it into X chunks (where X = $threads); it then starts a thread.php for each chunk and hands it its share of the URLs. Each one then makes GET and POST requests to retrieve data.
I am using this:
for ($x = 1; $x <= $threads; $x++) {
    // Launch thread.php in the background; "echo $!" returns the PID of the background process.
    $pid[] = exec("/path/bin/php thread.php <options> > /dev/null & echo \$!");
}
For "threading" (I know its not really threading, is it forking or what?), I save the pids into a file for later checking if N thread is running and to stop them.
Now I want to move away from PHP. I was thinking about using Python because I'd like to learn more about it.
How can I achieve this kind of "threading" with Python (or Ruby)?
Or is there a better way to launch multiple background threads in Python or Ruby that run in parallel (at the same time)?
The threads don't need to communicate with each other or with a main thread; they are independent. They make HTTP requests and interact with a MySQL DB, and they may need to access/modify the same table entries (I haven't thought about how I will solve that yet).
The app works with "projects", each project has a "max threads" variable and I use a web interface to control it (so I could still use php for the interface [starting/stopping threads] in the new app).
I wanted to use
from threading import Thread
in Python, but I've been told those threads won't run in parallel, only one at a time.
The app is intended to run on linux web servers.
Any suggestion will be appreciated.
For Python 2.6+, consider the multiprocessing module:
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows
For Python 2.5, the same functionality is available via pyprocessing.
In addition to the example at the links above, here are some additional links to get you started:
multiprocessing Basics
Communication between processes with multiprocessing
You don't want threading. You want a work queue like Gearman that you can send jobs to asynchronously.
It's worth noting that this is a cross-platform, cross-language solution. There are bindings for many languages (including Python and PHP) provided officially, and many more unofficially with a bit of work with Google.
The original intent is effectively load balancing, but it works just as well with only one machine. Basically, you can create one or more Workers that listen for Jobs. You can control the number of Workers and the types of Jobs they can listen for.
If you insert five Jobs into the queue at the same time, and there happen to be five Workers waiting, each Worker will be handed one of the Jobs. If there are more Jobs than Workers, the Jobs get handled sequentially. Your Client (the thing that submits Jobs) can either wait for all of the Jobs it's created to complete, or it can simply place them in the queue and continue on.
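In PHP (which you already have for the web interface), the client side might look roughly like this, assuming the pecl/gearman extension and a gearmand server on localhost:

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

// Fire-and-forget: returns a job handle immediately instead of blocking.
$handle = $client->doBackground('fetch_urls', json_encode(array('project_id' => 7)));

// Later, e.g. from the web interface, ask the job server whether it is still running.
list($known, $running, $numerator, $denominator) = $client->jobStatus($handle);
echo $running ? "still running\n" : "finished or unknown\n";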
I know PHP is not multithreaded, but I talked with a friend about this: if I have a large algorithmic problem I want to solve with PHP, isn't the solution simply to use the curl_multi_* interface and start n HTTP requests against the same server? This is what I would call PHP-style multithreading.
Are there any problems with this in a typical web server environment? The master request, which is only waiting on curl_multi_exec(), shouldn't count much time against its maximum runtime or memory limit.
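For reference, the pattern I have in mind looks roughly like this (worker.php and the request count are just placeholders):

$mh = curl_multi_init();
$handles = array();
for ($i = 0; $i < 4; $i++) {
    // Each request runs a chunk of the algorithm in its own PHP process on the server.
    $ch = curl_init("http://localhost/worker.php?chunk=$i");
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}
do {
    curl_multi_exec($mh, $active);   // drive all transfers
    curl_multi_select($mh);          // wait for activity instead of busy-looping
} while ($active > 0);
$results = array();
foreach ($handles as $ch) {
    $results[] = curl_multi_getcontent($ch);
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);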
I have never seen this promoted anywhere as a way to keep a script from being killed by overly restrictive PHP admin settings.
If I add this as a feature in a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
If I add this as a feature in a popular PHP system, will server admins hire a Russian mafia hitman to get revenge for this hack?
No, but it's still a terrible idea, if for no other reason than that PHP is supposed to render web pages, not run big algorithms. I see people trying to do this in ASP.NET all the time. There are two proper solutions:
Have your PHP script spawn a process that runs independently of the web server and updates a common data store (probably a database) with information about the progress of the task that your PHP scripts can access (see the sketch below).
Have a constantly running daemon that checks for jobs in a common data store, to which the PHP scripts can issue jobs and in which they can view the progress of currently running jobs.
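A sketch of the progress-check half of the first solution, assuming a task_progress table that the background process keeps updated (the table and column names are made up):

// progress.php -- a page the browser can poll for the task's progress
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $db->prepare('SELECT state, percent_done FROM task_progress WHERE task_id = ?');
$stmt->execute(array($_GET['task_id']));
header('Content-Type: application/json');
echo json_encode($stmt->fetch(PDO::FETCH_ASSOC));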
By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process to monitor the children it spawned.
Keep in mind that while one process can start up, load everything, and then fork, you can't share things like database connections, so each forked process should establish its own. I've used forking for up to 50 processes.
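A rough sketch of that pattern (the worker count and MySQL credentials are placeholders):

$children = array();
for ($i = 0; $i < 10; $i++) {
    $pid = pcntl_fork();
    if ($pid === -1) {
        die("fork failed\n");
    }
    if ($pid === 0) {
        // Child: establish its OWN database connection, do its share of the work, exit.
        $db = new mysqli('localhost', 'user', 'pass', 'mydb');
        // ... work ...
        exit(0);
    }
    $children[] = $pid; // parent keeps track of each child PID
}
// Parent: wait for every child to finish.
foreach ($children as $pid) {
    pcntl_waitpid($pid, $status);
}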
If forking isn't available for your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't stop processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.
Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better to write it as an Amazon EC2 service, then sell access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this, they're generally known as "spammers". Please don't.
Lothar,
As far as I know, PHP doesn't run as a long-lived service the way some of its competitors do, so there is no way for PHP to know how much time has passed unless you constantly interrupt the process to check the elapsed time. So, IMO, no, you can't do that in PHP :)
When executing proc_nice(), is it actually nice'ing Apache's thread?
If so, and if the current user (a non-superuser) can't renice back to its original priority, is killing the Apache thread (apache_child_terminate) appropriate on an Apache 2.0.x server?
The issue is that I am trying to limit the impact of an app that allows the user to run ad-hoc queries. The queries can be massive, and the resulting transform on the data requires a lot of memory and CPU.
I've already rewritten the process to be more stream-based, which helps with memory consumption, but I would also like the process to run at a lower priority. However, I can't leave the Apache thread at low priority, as we have a lot of high-priority web services running on this same box.
TIA
In that kind of situation, a solution is often not to do that kind of heavy work within the Apache processes, but to either:
run an external PHP process, using something like shell_exec, for instance -- this is if you must work in synchronous mode (i.e., if you cannot execute the task a couple of minutes later)
or push the task to a FIFO system, and immediately return a message to the user saying "your task will be processed soon",
then have some other process (launched via a crontab entry every minute, for instance) check that FIFO queue
and do the processing if there is something in the queue.
That process, itself, can run in low-priority mode.
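A minimal sketch of such a queue worker, run from cron every minute (the job_queue table and its columns are assumptions made for the example):

// worker.php -- processes one pending job per run, at reduced priority
proc_nice(10); // lowers THIS process's priority, not Apache's

$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$job = $db->query("SELECT id, payload FROM job_queue WHERE status = 'pending' LIMIT 1")
          ->fetch(PDO::FETCH_ASSOC);

if ($job) {
    $db->prepare("UPDATE job_queue SET status = 'running' WHERE id = ?")->execute(array($job['id']));
    // ... run the heavy ad-hoc query / transform here ...
    $db->prepare("UPDATE job_queue SET status = 'done' WHERE id = ?")->execute(array($job['id']));
}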
As often as possible, especially if the heavy calculations take some time, I would go for the second solution:
It allows users to get some feedback immediately: "the server has received your request, and will process it soon".
It doesn't keep Apache's processes "working" for long: the heavy stuff is done by other processes.
If, one day, you need so much processing power that one server is not enough anymore, this kind of system will be easier to scale: just add a second server that picks from the same FIFO queue.
If your server is really too loaded, you can stop processing from the queue, at least for some time, so the load can recover -- for instance, this can be useful if your critical web services are used heavily in a specific time frame.
Another (nice-looking, but I haven't tried it yet) solution would be to use some kind of tool like, for instance, Gearman:
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.