I have been working on a php mvc web application using codeigniter and need to process some long running tasks.
I checked through several options (RabbitMQ, Gearman, IronMQ etc) and decided to use Gearman of it's simplecity. I went through the samples and tutorials in gearman.org which shows how to start a GearmanWorker using worker.php.
my concern is, in mvc architecture where does this GearmanWorker is initiated and started?
Does it started through a controller method OR
Do we need to initiate the GearmanWorker from cli(console)? If it's started from cli then how to handle if the already started worker has stopped for some reason when we make a GearmanClient->do('some task')
a similar question but not clear enough for me
I wouldn't recommend to start worker from the controller. You can start several workers distributed over network and using workers text command for monitoring purpose. gearmand dispatch a job to the next idle worker.
Maybe SUBMIT_JOB_BG is a good option for you to avoid web server timeout, if job execution takes long.
Related
Would appreciate some help understanding typical best practices in carrying out a series of tasks using Gearman in conjunction with PHP (among other things).
Here is the basic scenario:
A user uploads a set of image files through a web-based interface. The php code responding to the POST request generates an entry in a database for each file, mostly with null entries in the columns, queues a job for each to do analysis using Gearman, generates a status page and exits.
The Gearman worker gets a job for a file and starts a relatively long-running analysis. The result of that analysis is a set of parameters that need to be inserted back into the database record for that file.
My question is, what is the generally accepted method of doing this? Should I use a callback that will ultimately kick off a different php script that is going to do the modification, or should the worker function itself do the database modification?
Everything is currently running on the same machine; I'm planning on using Gearman for background scheduling, rather than for scaling by farming out to different machines, but in any case any of the functions could connect to the database wherever it is.
Any thoughts appreciated; just looking for some insights on how this typically gets structured and what might be considered best practice.
Are you sure you want to use Gearman? I only ask because it was the defacto PHP job server about 15 years ago but hasn't been a reliable solution for quite some time. I am not sure if things have drastically improved in the last 12 months, but last time I evaluated Gearman, it wasn't production capable.
Now, on to the questions.
what is the generally accepted method of doing this? Should I use a callback that will ultimately kick off a different php script that is going to do the modification, or should the worker function itself do the database modification?
You are going to follow this general pattern with any job queue:
Collect a unit of work. In your case, it will be 1 of the images and any information about who that image belongs to, user id, etc.
Submit the work to the job queue with this information.
Job Queue's worker process picks up the work, and starts processing it. This is where I would create records in the database as you can opt to not create them on job failure.
The job queue is going to track which jobs have completed and usually the status of completion. If you are using gearman, this is the gearmand process. You also need something pickup work and process that work, I will refer to this as the job worker. The job worker is where the concurrency happens which is what i think you were referring to when you said "kick off a different php script." You can just kick off a PHP script at an interval (with supervisord or a cronjob) for a kind of poll & fork approach. It's not the most efficient approach, but it doesn't sound like it will really matter for your applications use case. You could also use pcntl_fork or pthreads in PHP to get more control over your concurrent processes and implement a worker pool pattern, but it is much more complicated than just firing off a script. If you are interested in trying to implement some concurrency in PHP, I have a proof-of-concept job worker for beanstalkd available on GitHub that implements a worker pool with both fork and pthreads. I have also include a couple of other resources on the subject of concurrency.
Job Worker (pthreads)
Job Worker (fork)
PHP Daemon Example
PHP IPC Example
I would like to know if someone could explain to me how to build a real-time application with Symfony?
I have looked at a lot of documentation with my best friend Google, but I have not found quite detailed articles.
I would like some more PHP-oriented thing and saw that there were technologies like ReactPHP / Ratchet (but I can not find a tutorial clear enough to integrate it into an existing symfony project).
Do you have any advice on which technologies to use and why? (If you have tutorial links I take!)
Thank you in advance for your answers !
Every useful Symfony application does some form of I/O. In traditional applications this is most often blocking I/O. Even if it's non-blocking I/O, it doesn't integrate a global event loop that could schedule other things while waiting for I/O.
If you integrate Symfony into an existing event loop based WebSocket server, it will work with blocking I/O as a proof of concept, but you will quickly notice it isn't running fine in production, because any blocking I/O blocks your whole event loop and thus blocks all other connected clients.
One solution is rewriting everything to non-blocking I/O, but then you'd no longer be using Symfony. You might be able to reuse some components, but only those not doing any I/O.
Another solution is to use RPC and queue WebSocket requests into a queue. The intermediary can be written using non-blocking I/O only, it doesn't have much to do. It basically just forwards WebSocket messages as RPC requests to a queue. Then you have a set of workers pulling from that queue, doing a normal Symfony kernel dispatch and sending the response into a response queue. The worker can then continue to fetch the next job.
With the second solution you can totally use blocking I/O and all existing Symfony components. You can spawn as many workers as you need and you can even keep them alive between requests. The difference with a queue in between is that one blocking worker doesn't block the responsiveness of the WebSocket endpoint.
If you want multiple WebSocket processes, you'll need separate response queues for them, so the responses are sent back to the right process where the client is connected.
You can find a working implementation with BeanstalkD as queue in kelunik/rpc-demo. src/Server.php is just for the demo purpose and can be replaced with a HTTP server at any time. To keep the demo simple it uses a single WebSocket process but that can be changed as outlined above. You can start php bin/server and php bin/worker, then use telnet localhost 2000 to connect and send messages. It will respond with the same message but base64 encoded in the workers.
The mentioned demo is built on Amp, but the same concepts apply to ReactPHP as well.
In this issue of the official Symfony repository you may find comments and ideas about this: https://github.com/symfony/symfony/issues/17051
Im looking to build a distributed video encoding cluster of a few dozen machines. Ive never worked with a messaging queue before, but the 2 that I started playing around with were Gearman and Beanstalkd.
Beanstalk seems to be a lot simpler and easier to use than Gearman, but its not as feature rich as.
One thing I don't understand is... how do you spawn new workers on all the servers? I plan to use php. Is it as simple as running worker.php in CLI with "&" and just have it sit there waiting for work?
I noticed gearman doesn't actually kill the process after a job is done, but Beanstalk does, so I have to restart the script after every job, on every server.
Currently Im more inclined to use Beanstalk, the general flow of things I planned was:
Run a minutely cron on each server that checks if there are pre-defined amount of workers running. If its less than supposed to be, spawn new worker processes. Each process will take roughly 2-30 minutes.
Maybe I have a flaw in my logic here? Let me know what would be a "better" or "proper" way of doing this?
Terminology I will use just to try and be clear...
There is the concept of a producer and a consumer. The producer generates jobs that are put on a queue (i.e. the beanstalk service) that is then read by a consumer.
There are multiple ways to write a consumer. You can either every x time frame via a cron job run the task or just have a consumer running in a while 1 loop via php (or what have you).
Where to install the service is really dependent on what you are going after. For me I normally install the service either on a consumer(s) or on its separate box (with sometimes the latter being overkill depending on your needs).
If you want durability on the queue side then you should use Beanstalk's binlog parameter (-b ). If something happens to your beanstalk service this will allow you to restart with minimal loss of data in the queues (if not no information). Durability on the producer side can come from having multiple queues to try against.
I have a PHP project (Symfony2) that uses RabbitMQ. I use its as simple message queue to delay some jobs (sending mails, important data from APIs). The consumers run on the webserver and their code is part of the webserver repo - they are deployed in the same with with the web.
The questions are:
How do I start the consumers as daemons and make sure they always run?
When deploying the app, how do I shut down consumers "gracefully" so that they stop consuming but finish processing the message they started?
If it's any important, for deployment I use Capifony.
Thank you!
It maybe worth looking at something supervisord which is written in python. I've used it before for running workers for Gearmand which is a job queue that fullfils a similar role to the way your using RabbitMQ.
I am familiar with the various methods available within php for spawning new processes, forking, etc... Everything I have read urges against using pcntl_fork from within a web-accessible app. Can anyone tell me why this is not recommended?
At a fundamental level, I can see how if you are not careful, things could quickly get out of hand. But what if you are careful? In my case, I would like to pcntl_fork my parent script into a new child, run a short series of specific functions, and then close the child. Seems pretty straightforward, right? Would it still be dangerous for me to try this?
On a related note, can anyone talk about the overhead involved in doing this a different way... Calling proc_open() to launch an entirely new PHP process? Will I lose any possible speed increase by having to launch the new process?
Background: Consider a site with roughly 2,000 concurrent users running fastcgi.
Have you considered gearman for 'forking' new processes? It's also described as 'a distributed forking mechanism' so your workers do not need to be on the same machine.
Synchronous and asynchronous calls are also available.
You will find it here: http://gearman.org/ and it might be a candidate solution to the problem.
I would like to propose another possibility... Tell me what you think about this.
What if I created a pool of web servers whose sole job was to respond to job requests from the master application server? I would have something like this:
Master Application Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Application Worker Server (Apache, PHP - FastCGI)
Instead of spawning new PHP processes on my master application server, I would send out job requests to my "workers" using asynchronous sockets. The workers would then run these jobs in realtime and send the result back to the main application server.
Has anyone tried this? Do you foresee any problems? It seems to me that this might work wonderfully.
The problem is not that the app is web-accessible.
The problem is that the web server (or here the FastCGI module) may not handle forks very well. Just try yourself.