I have a website, created using PHP and running on Apache. I want a subscriber to be able to log in and start a process on the server. They can then log out or close the browser without interrupting the process. Later they can log in and see the progress or see the results of the original process. What is the best way to accomplish this (having the process run until completion, after the browser is closed)?
Just looking for someone to point me in the right direction. A few people mentioned Gearman.
Gearman would be an ideal candidate, and I would use it for exactly the purpose you describe. It has everything you need out of the box to meet your requirements ("background" a long running, CPU-bound process to another machine, e.g. video encoding).
There is a Gearman PHP library, but you can write your worker code in a different language if it's better suited to doing the work.
For reporting progress information, I recommend having the worker write to Redis or Memcached - some kind of temporary storage that your web server can also access.
Check out the simple PHP example on the Gearman site. For learning, I recommend setting up a lab environment that contains 3 separate VM's, one for your web server (the client), one for the Gearman job queue (the server) and another for processing jobs (the workers).
Related
I found that pthreads does not work on web environment. I use PHP7.1 on FPM on Linux Debian which i also use Symfony 3.2. All I want to do is, for example:
User made a request and PUT a file (which is 1GB)
PHP Server receives the file and process it.
Immediately return true to user (jsonResponse) without awaiting processing uploaded file
Later, when processing file is finished (move, copy, duplicate whatever you want) just add an event or do callback from background and notify user.
Now. For this I created Console Command. I execute a Process('bin/console my:command')->start(); from background and I do my processing. But this is killing a fly with bazooka for me. I have to pass many variables to this executable command.
All I want to is creating another thread and just return to user without awaiting processing.
You may say this is duplicate. And point to pthreads. But pthreads stated that it is only intended for CLI. Also last version of pthreads doesn't work with symfony. (fatal error).
I am stuck at this point and have doubt if should I stay with creating processes for each uploaded file or move to python -> django
You don't want threads. You want a job queue. Have a look at Gearman or similar things.
Gearman provides a generic application framework to farm out work to other machines or processes that are better suited to do the work. It allows you to do work in parallel, to load balance processing, and to call functions between languages. It can be used in a variety of applications, from high-availability web sites to the transport of database replication events. In other words, it is the nervous system for how distributed processing communicates.
I'm looking for some ideas to do the following. I need a PHP script to perform certain action for quite a long time. This is an extension for a CMS and this can't be anything else but PHP. It also can't be a command line script because it should be used by common people that will have only the standard means of the CMS. One of the options is having a cron job (most simple hostings have it) that will trigger the script often so that instead of working for a long time it could perform the action step by step preserving its state from one launch to the next one. This is not perfect but I can't see of any other solutions. If the script will be redirecting to itself server will interrupt it. What other options can suit?
Thanks everyone in advance!
What you're talking about is a daemon or long running program that waits for calls by client programs, performs and action, provides a response then keeps on waiting for more calls.
You might be familiar w/ these in the form of Apache & MySQL ;) Anyway PHP is generally OK in this regard, it does have the ability to function over raw sockets as well as fork sub-processes to handle multiple requests simultaneously.
Having said that PHP daemons are a tool where YMMV. Some folks will say they work great, other folks like me will say they have issues w/ interprocess communication and leaking memory even amidst plethora unset() calls.
Anyway you likely won't be able to deploy a daemon of any type on a shared hosting environment. You'll need to get a better server package or stick with a Cron based solution.
Here's a link about writing a PHP daemon.
Also, one more note. Daemons do crash from time to time and therefore you may still need to store state about whats going on, just in case someone trips over the power cord to your shared server :)
I would also suggest that you think about making it a daemon but if not then you can simply use
set_time_limit(0);
ignore_user_abort(true);
at the top to tell it not to time out and not to get interrupted by anything. Then call it from the cron to start it every day or whatever. I have this on many long processing daily tasks and it works great for me. However, it won't be able to easily talk to the outside world (other scripts can't query it or anything -- if that is what you want look into php services) so once you get it running make sure it will stop and have it print its progress to a logfile.
I want to have my own variable that would be (most likely an array) storing what my php application is up to right now.
The application can trigger few processes that are in background (like downloading files) and I want to have a list what is being currently processed.
For example
if php calls exec() that will be downloading for 15mins
and then another download starts
and another download starts
then if I access my application I want to be able to see that 3 downloads are in process. If none of them finished yet.
Can do that? Only in memory, not storing anything on the disk?
I thought that the solution would be a some kind of server variable.
PHP doesn't have knowledge of previous processes. As soon has a php process is finished everything it knows about itself goes with it.
I can think of two options. Write knowledge about spawned processes to a file or database and use it to sync all your php request, (store the PID of each spawned process)
Or
Create an Daemon. The people behind PHP have worked hard to clean up PHP memory handling and such to make this more feasible. Take a look at their PEAR package - http://pear.php.net/package/System_Daemon
Off the top of my head, a quick architecture would compose of 3 peices
Part A) The web app that will take in request for downloads, and report back the progress of all request
Part B) You daemon, which accepts requests for downloads, spawns process, and will report back status of all spawned reqeust
Part C) The spawn request that will perform the download you need.
Anyone for shared memory?
Obviously you would have to have some sort of daemon, but you could use the inbuilt semaphore functions to easily have contact between each of the scripts. You need to be careful though because sometimes if you're not closing the memory block properly, you could risk ending up with no blocks left.
You can't store your own variables in $_SERVER. The best method would be to store your data in a database where and query/update it as required.
I'm putting together my first commercial PHP application, it's nothing really huge as I'm still eagerly learning PHP :)
Right now I'm still in the conceptual stage of planning my application but I run into one problem all the time, the application is supposed to be self-hosted by my customers, on their own servers and will include some very long running scripts, depending on how much data every customer enters in his application.
Now I think I have two options, either use cronjobs, like for example let one or multiple cronjobs run at a time that every customer can set himself, OR make the whole processing of data as daemons that run in the background...
My question is, since it's a self-hosted application (and every server is different)... is it even recommended to try to write php that starts background processes on a customers server, or is this more something that you can do reliably only on your own server...?
Or should I use cronjobs for these long running processes?
(depending on the amount of data my customers will enter in the application, a process could run 3+ hours)
Is that even a problem that can be solved, reliably, with PHP...? Excuse me if this should be a weird question, I'm really not experienced with PHP daemons and/or long running cronjobs created by php.
So to recap everything:
Commercial self-hosted application, including long running processes, cronjobs or daemons? And is either or maybe both also a reliable solution for a paid application that you can give to your customers with a clear conscience because you know it will work reliable on all kinds of different servers...?
EDIT*
PS: Sorry, I forgot to mention that the application targets only Linux servers, so everything like Debian, Ubuntu etc etc.
Short answer, no, don't go for background process if this will be a client hosted solution. If you go towards the ASP concept (Application Service Provider... not Active Server Pages ;)) then you can do some wacky stuff with background processes and external apps connecting to your sql servers and processing stuff for you.
What i suggest is to create a strong task management backbone and link that to a solid task processing infrastructure. I'll recommend you read an old post i did quite some time ago regarding background processes and a strategy i had adopted to fix long running processes:
Start & Stop PHP Script from Backend Administrative Webpage
Happy reading...
UPDATE
I realize that my old post is far from easy to understand so here goes:
You need 2 models: Job and JobQueue, 2 controller: JobProcessor, XYZProcessor
JobProcessor is called either by a user when a page triggers or using a cronjob as you wish. JobProcessor::process() is the key that starts the whole processing or continues it. It loads the JobQueues and asks the job queues if there is work to do. If there is work to do, it asks the jobqueue to start/continue it's job.
JobQueue Model: Used to queue several JOBS one behind each other and controls what job is currently current by keep some kind of ID and STATE about which job is running.
Job Model: Represents exactly what needs to be done, it contains for example the name of the controller that will process the data, the function to call to process the data and a serialized configuration property that describe what must be done.
XYZController: Is the one that contains the processing method. When the processing method is called, the controller must load everything it needs to memory and then process each individual unit of work as fast as possible.
Example:
Call of index.php
Index.php creates a jobprocessor controller
Index.php calls the jobprocessor's process()
JobProcessor::Process() loads all the queues and processes them
For each JobQueue::Process(), the job queue loads it's possible Jobs and detects if one is currently running or not. If none is running, it starts the next one by calling Job::Process();
Job::Process() creates the XYZController that will work the task at hand. For example, my old system had an InvoicingController and a MassmailingController that worked hand in hand.
Job::Process() calls XYZController::Prepare() so that it loads it's information to process. (For example, load a batch of emails to process, load a batch of invoices to create)
Job::Process() calls XYZController::RunWorkUnit() so that it processes a single unit of work (For example, create one invoice, send one email)
Job::Process() asks JobProcessingController::DoIStillHaveTimeToProcess() and if so, continues processing the next element.
Job::Process() runs out of time and calls XYZController::Cleanup() so that all resources are released
JobQueue::Process() ends and returns to JobController
JobController::Process() is about to end? Open a socket, call myself back so i can start another round of processing until i don't have anything to do anymore
Handle the request from the user that start in position #1.
Ultimately, you can instead open a socket each time and ask the processor to do something, or you can queue a CronJob to call your processor. This way your users won't get stuck waiting for the 3/4 work units to complete each time.
Its worth noting that, in addition to running daemons or cron jobs, you can kick off long running processes from a web request (but note that it must run outside of the webserver process group) and of course asynchronous message processing (which is essentially a variant on the batch approach).
All four of these approaches are very different in terms of how they behave, how concurrency and timing are managed. The factors which make them all different are the same ones you omitted from your question - so it's not really possible to answer.
Unfortunately all rely on facilities which are very different between MSWindows and POSIX systems - so although PHP will run on both, if you want to sell your app on both platforms it's going to need 2 versions.
Maybe you should talk to your potential customer base and ask them what they want?
I know about PHP not being multithreaded but i talked with a friend about this: If i have a large algorithmic problem i want to solve with PHP isn't the solution to simply using the "curl_multi_xxx" interface and start n HTTP requests on the same server. This is what i would call PHP style multithreading.
Are there any problems with this in the typical webserver environment? The master request which is waiting for "curl_multi_exec" shouldn't count any time against its maximum runtime or memory length.
I have never seen this anywhere promoted as a solution to prevent a script killed by too restrictive admin settings for PHP.
If i add this as a feature into a popular PHP system will there be server admins hiring a russian mafia hitman to get revenge for this hack?
If i add this as a feature into a
popular PHP system will there be
server admins hiring a russian mafia
hitman to get revenge for this hack?
No but it's still a terrible idea for no other reason than PHP is supposed to render web pages. Not run big algorithms. I see people trying to do this in ASP.Net all the time. There are two proper solutions.
Have your PHP script spawn a process
that runs independently of the web
server and updates a common data
store (probably a database) with
information about the progress of
the task that your PHP scripts can
access.
Have a constantly running daemon
that checks for jobs in a common
data store that the PHP scripts can
issue jobs to and view the progress
on currently running jobs.
By using curl, you are adding a network timeout dependency into the mix. Ideally you would run everything from the command line to avoid timeout issues.
PHP does support forking (pcntl_fork). You can fork some processes and then monitor them with something like pcntl_waitpid. You end up with one "parent" process to monitor the children it spanned.
Keep in mind that while one process can startup, load everything, then fork, you can't share things like database connections. So each forked process should establish it's own. I've used forking for up 50 processes.
If forking isn't available for your install of PHP, you can spawn a process as Spencer mentioned. Just make sure you spawn the process in such a way that it doesn't stop processing of your main script. You also want to get the process ID so you can monitor the spawned processes.
exec("nohup /path/to/php.script > /dev/null 2>&1 & echo $!", $output);
$pid = $output[0];
You can also use the above exec() setup to spawn a process started from a web page and get control back immediately.
Out of curiosity - what is your "large algorithmic problem" attempting to accomplish?
You might be better to write it as an Amazon EC2 service, then sell access to the service rather than the package itself.
Edit: you now mention "mass emails". There are already services that do this, they're generally known as "spammers". Please don't.
Lothar,
As far as I know, php don't work with services, like his concorrent, so you don't have a way for php to know how much time have passed unless you're constantly interrupting the process to check the time passed .. So, imo, no, you can't do that in php :)