CakePHP work queue - php

I'm looking to use a work queue to delegate some jobs.
I know that services like Amazon SQS or Beanstalkd are a good fit for this problem.
But with both of them I have to create a daemon that polls the queue every x seconds.
Are there other ways to do this with some kind of push system?
Does anyone have experience using SQS + SNS to call the workers?
Thanks.

Related

Creating scheduled jobs in a Multi-Tenant application

I am building a multi-tenant web application using Laravel/PHP that will eventually be hosted on AWS as SaaS. I have around 15-20 different background jobs that need scheduling for each tenant, and each job needs to be fired every 5 minutes. For 100 tenants, that means up to about 2000 scheduled jobs. I am left with two challenges in achieving this:
Is there a cloud solution that distributes and manages the load of the scheduled jobs automatically?
If one is out there, how can we create those 15+ scheduled jobs on the fly? Is there an API available?
Looking for your assistance
Finally, I have found a solution to my problem.
We cannot scale the background jobs in the way I want. It required me to look into the solution from a completely different angle.
The ideal solution to my problem is to generate SQS messages on a set interval, one per tenant, each with a payload describing the tenant id, the job that needs to be executed, and any additional parameters, and to queue them.
For example, if I have 100 tenants and I want to run "Job 1" every hour, the main application will generate 100 SQS messages and put them on a particular SQS queue every hour. It will do the same for all 15 different jobs I have per tenant.
On the other end, a scalable AWS Lambda function listening to the SQS queue will pick up the payload and execute the intended task based on the data being carried by the payload.
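The fan-out described above can be sketched in plain PHP. This is only an illustration under assumptions: the function names and the payload fields (`tenant_id`, `job`, `params`) are hypothetical, and the actual call that sends each body to SQS (for example through the AWS SDK) is left out.

```php
<?php
declare(strict_types=1);

/**
 * Producer side: build one message body per (tenant, job) pair.
 * The payload field names are illustrative assumptions.
 *
 * @param int[]    $tenantIds
 * @param string[] $jobNames
 * @return string[] JSON-encoded message bodies
 */
function buildJobMessages(array $tenantIds, array $jobNames, array $params = []): array
{
    $messages = [];
    foreach ($tenantIds as $tenantId) {
        foreach ($jobNames as $jobName) {
            $messages[] = json_encode([
                'tenant_id' => $tenantId,
                'job'       => $jobName,
                'params'    => $params,
            ], JSON_THROW_ON_ERROR);
        }
    }
    return $messages;
}

/**
 * Consumer side: decode a message body and run the matching handler.
 *
 * @param array<string, callable> $handlers job name => function(int $tenantId, array $params)
 * @return mixed whatever the handler returns
 */
function dispatchJobMessage(string $body, array $handlers)
{
    $payload = json_decode($body, true, 512, JSON_THROW_ON_ERROR);
    $job = $payload['job'] ?? null;
    if (!is_string($job) || !isset($handlers[$job])) {
        throw new InvalidArgumentException('Unknown job in payload');
    }
    return $handlers[$job]((int) $payload['tenant_id'], $payload['params'] ?? []);
}
```

With 100 tenants and 15 jobs, `buildJobMessages` would return 1500 bodies per scheduling interval, and whichever worker receives a body only needs `dispatchJobMessage` and a handler map to run it.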
But unfortunately, my expertise lies in PHP/Laravel technology which is still not in the AWS Lambda stack. Hence I figured out a workaround as follows.
I built a Docker image with my PHP/Laravel application and placed it in Amazon ECS (EC2 container service). Still, I have the AWS Lambda function in place but this time it acts as a trigger to my docker containers. The Lambda picks an SQS Message, processes the payload and spawns a Docker container on ECS based on my Docker image. I got some of the ideas from the following article to arrive at this solution.
https://aws.amazon.com/blogs/compute/better-together-amazon-ecs-and-aws-lambda/
Laravel has an option to schedule tasks/jobs:
Refer: https://laravel.com/docs/6.x/scheduling
So you can keep your clients' jobs in your database and then do something like the following:
Scheduling Queued Jobs
The job method may be used to schedule a queued job. This method provides a convenient way to schedule jobs without using the call method to manually create Closures to queue the job:
$schedule->job(new ClientJob)->everyFiveMinutes();
// Dispatch the job to the "clientjob" queue...
$schedule->job(new ClientJob, 'clientjob')->everyFiveMinutes();
or
Scheduling Shell Commands
The exec method may be used to issue a command to the operating system:
$schedule->exec('node /home/forge/script.js')->everyFiveMinutes();

Using laravel queue with cron jobs on shared hosting

At the moment I have a shared hosting server.
In my app I want to use Laravel's queue system, but I can't keep the command php artisan queue:work running because I can't install Supervisor.
With a little effort I can move my app to a VPS, but I don't have much experience with servers and I'm a little worried that my app will be offline a lot.
Considering my lack of experience on the server side, I have these questions:
Is it OK to use Laravel queues with cron jobs? Can it break in any way?
Should I upgrade to a VPS just for this problem, or should I stay on this shared hosting server (I have SSH access here)?
Quick answer: you should not use Laravel queues without a process monitor such as Supervisor.
It all depends on what you want to achieve, but an alternative to queues would be the Laravel scheduler: you can trigger the scheduler with a cron task (every minute, for example) and dispatch jobs easily.
And if you really want to use the queues, one solution is to add your jobs to the queue and process them with a cron task that runs the following command every minute: php artisan queue:work. Note that queue:work keeps running once started, so on Laravel versions that support it, add the --stop-when-empty flag so that each cron run exits when the queue is drained. I would still recommend the previous solution.
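The main ways a cron-driven worker can break are overlapping runs and runs that never finish. A minimal framework-independent sketch of guarding against both, assuming hypothetical `$pop` and `$handle` callables that stand in for whatever queue backend is used:

```php
<?php
declare(strict_types=1);

/**
 * Process jobs until the queue is empty or the time budget is used up.
 *
 * @param callable $pop    returns the next job, or null when the queue is empty
 * @param callable $handle processes one job
 * @return int number of jobs processed
 */
function drainQueue(callable $pop, callable $handle, float $maxSeconds = 50.0): int
{
    $deadline = microtime(true) + $maxSeconds;
    $processed = 0;
    while (microtime(true) < $deadline) {
        $job = $pop();
        if ($job === null) {
            break; // queue is empty, the next cron run picks up new work
        }
        $handle($job);
        $processed++;
    }
    return $processed;
}

/**
 * Run $work only if no other run holds the lock file.
 * Returns false when a previous cron run is still active.
 */
function runExclusively(string $lockFile, callable $work): bool
{
    $handle = fopen($lockFile, 'c');
    if ($handle === false) {
        throw new RuntimeException("Cannot open lock file: $lockFile");
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return false;
    }
    try {
        $work();
        return true;
    } finally {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```

The time budget is checked between jobs only, so a single long job can still run past it; the lock is what keeps the next cron run from starting a second worker in that case.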

Valid Architecture for a Message Queue & Worker System in PHP?

I'm trying to wrap my head around the message queue model and jobs that I want to implement in a PHP app:
My goal is to offload messages / data that need to be sent to multiple third party APIs, so accessing them doesn't slow down the client. So sending the data to a message queue is ideal.
I considered using just Gearman to hold the MQ/jobs, but I wanted to use a cloud queue service like SQS or Rackspace Cloud Queues so I wouldn't have to manage the messages.
Here's a diagram of what I think I should do:
Questions:
My workers would be written in PHP. Do they all have to poll the cloud queue service? That could get expensive, especially when you have a lot of workers.
I was thinking of maybe having 1 worker just for polling the queue and, if there are messages, notifying the other workers that they have jobs. I would just have to keep this 1 worker online, using supervisord perhaps? Is this polling method better than using an MQ that can notify? How should I poll the MQ, once every second or as fast as it can poll? And then increase the polling workers if I see it slowing down?
I was also thinking of having a single queue for all the messages, then the worker monitoring that distributes the messages to other cloud MQs depending on where they need to be processed, since 1 message might need to be processed by 2 diff workers.
Would I still need gearman to manage my workers or can I just use supervisord to spin workers up and down?
Isn't it more effective and faster to also send a notification to the main worker whenever a message is sent, instead of polling the MQ? I assume I would then need to use Gearman to notify my main worker that the MQ has a message, so it can start checking it. Or, if I have 300 messages per second, would this generate 300 jobs to check the MQ?
Basically how could I check the MQ as efficiently and as effectively as possible?
Suggestions or corrections to my architecture?
My suggestions basically boil down to: Keep it simple!
With that in mind my first suggestion is to drop the DispatcherWorker. From my current understanding, the sole purpose of the worker is to listen to the MAIN queue and forward messages to the different task queues. Your application should take care of enqueuing the right message onto the right queue (or topic).
Answering your questions:
My workers would be written in PHP. Do they all have to poll the cloud queue service? That could get expensive, especially when you have a lot of workers.
Yes, there is no free lunch. Of course you could adapt and optimize your worker poll rate by application usage (when more messages arrive increase poll rate) by day/week time (if your users are active at specific times), and so on. Keep in mind that engineering costs might soon be higher than unoptimized polling.
Instead, you might consider push queues (see below).
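One simple way to adapt the poll rate to usage is a backoff on empty polls. The sketch below is an assumption about how that could be done, not a recommendation from the answer; the function name and default bounds are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Pick the delay (in seconds) before the next poll.
 * An empty poll doubles the delay, capped at $maxDelay; a poll that
 * returned messages resets it to $minDelay so a busy queue drains quickly.
 */
function nextPollDelay(int $currentDelay, int $messagesReceived, int $minDelay = 1, int $maxDelay = 60): int
{
    if ($messagesReceived > 0) {
        return $minDelay;
    }
    return min($maxDelay, max($minDelay, $currentDelay * 2));
}
```

A worker would call this after each poll and `sleep()` for the returned number of seconds. With SQS specifically, long polling (the `WaitTimeSeconds` parameter of `ReceiveMessage`, up to 20 seconds) is another way to cut down on empty responses.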
I was thinking of maybe having 1 worker just for polling the queue and, if there are messages, notifying the other workers that they have jobs. I would just have to keep this 1 worker online, using supervisord perhaps? Is this polling method better than using an MQ that can notify? How should I poll the MQ, once every second or as fast as it can poll? And then increase the polling workers if I see it slowing down?
This sounds too complicated. Communication between processes is unreliable, whereas reliable message queues already exist. If you don't want to lose data, stick to the message queues and don't invent custom protocols.
I was also thinking of having a single queue for all the messages, then the worker monitoring that distributes the messages to other cloud MQs depending on where they need to be processed, since 1 message might need to be processed by 2 diff workers.
As already mentioned, the application should enqueue your message to multiple queues as needed. This keeps things simple and in place.
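Having the application enqueue to several queues can be as small as a routing table. A minimal sketch, in which the message types, queue names, and the `$send` callable (standing in for the real queue client) are all hypothetical:

```php
<?php
declare(strict_types=1);

/**
 * Return the queues a message of the given type must be copied to.
 *
 * @param array<string, string[]> $routes message type => queue names
 * @return string[]
 */
function queuesFor(string $messageType, array $routes): array
{
    if (!isset($routes[$messageType])) {
        throw new InvalidArgumentException("No route for message type: $messageType");
    }
    return $routes[$messageType];
}

/**
 * Enqueue one message onto every queue it is routed to.
 *
 * @param callable $send function(string $queueName, string $body): void
 * @return int number of queues the message was sent to
 */
function publish(string $messageType, string $body, array $routes, callable $send): int
{
    $targets = queuesFor($messageType, $routes);
    foreach ($targets as $queueName) {
        $send($queueName, $body);
    }
    return count($targets);
}
```

Because the routing lives in the producer, no dispatcher worker has to stay online just to move messages from one queue to another.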
Would I still need gearman to manage my workers or can I just use supervisord to spin workers up and down?
There are so many message queues and even more ways to use them. In general, if you are using poll queues you'll need to keep your workers alive by yourself. If however you are using push queues, the queue service will call an endpoint specified by you. Thus you'll just need to make sure your workers are available.
Basically how could I check the MQ as efficiently and as effectively as possible?
This depends on your business requirements and the job your workers do. What time spans are critical? Seconds, Minutes, Hours, Days? If you use workers to send emails, it shouldn't take hours, ideally a couple of seconds. Is there a difference (for the user) between polling every 3 seconds or every 15 seconds?
Solving your problem (with push queues):
My goal is to offload messages / data that need to be sent to multiple third party APIs, so accessing them doesn't slow down the client. So sending the data to a message queue is ideal. I considered using just Gearman to hold the MQ/jobs, but I wanted to use a cloud queue service like SQS or Rackspace Cloud Queues so I wouldn't have to manage the messages.
Indeed the scenario you describe is a good fit for message queues.
As you mentioned you don't want to manage the message queue itself, maybe you do not want to manage the workers either? This is where push queues pop in.
Push queues basically call your worker. For example, AWS Elastic Beanstalk worker environments do the heavy lifting (polling) in the background and simply call your application with an HTTP request containing the queue message (refer to the docs for details). I have personally used the AWS push queues and have been happy with how easy they are to use. Note that there are other push queue providers, such as Iron.io.
As you mentioned you are using PHP, there is the QPush Bundle for Symfony, which handles incoming message requests. You may have a look at the code to roll your own solution.
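A hand-rolled endpoint for such pushed messages can be quite small. The following is a sketch under assumptions: the message body is taken to be JSON, the function name is hypothetical, and the status-code convention follows Elastic Beanstalk worker environments, which treat a 200 response as success and redeliver the message otherwise. Other providers may behave differently.

```php
<?php
declare(strict_types=1);

/**
 * Handle one pushed queue message and return the HTTP status to send back.
 * 200 tells the queue service the message is done; any other status
 * asks it to deliver the message again later.
 *
 * @param callable $process function(array $payload): void, throws on failure
 */
function handlePushedMessage(string $rawBody, callable $process): int
{
    try {
        $payload = json_decode($rawBody, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        return 400; // malformed body
    }
    if (!is_array($payload)) {
        return 400; // valid JSON, but not an object or list
    }
    try {
        $process($payload);
        return 200;
    } catch (Throwable $e) {
        return 500; // processing failed, let the service retry
    }
}

// In the endpoint script itself, something like:
// http_response_code(handlePushedMessage(file_get_contents('php://input'), $process));
```

Note that a malformed message answered with 400 would also be redelivered under that convention, so a dead-letter queue or a retry limit on the service side is worth configuring.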
I would recommend a different route, and that would be to use sockets. ZMQ is an example of an existing socket-based library. With sockets you can create a queue and manage what to do with messages as they come in. The machine will be in stand-by mode and use minimal resources while waiting for a message to come in.

Tasks/Queues management in Amazon AWS

I'm looking for a way to add items to a queue and execute them one by one, similar to Google App Engine's task manager. Each task will be executed via an HTTP request to a PHP script.
As I'm using Amazon, I understood that the best practice is to use the SNS service, which will be responsible for receiving new tasks, adding them to a queue (Amazon's SQS service), and informing my PHP worker that a new task has been pushed onto the queue so it can look for it and execute it.
There are several issues with that method (like the need to limit the number of worker instances via the worker itself, or the possibility that the task won't be in the queue yet when the worker is called, because the task is added to the queue at the same time).
I would like to hear whether there are better options or a nicer way of implementing a task manager. I prefer using Amazon's services, but I'm open to any suggestion and am looking for the best method. Features that are missing in Amazon, like FIFO and priority support, would also be a nice addition.
Thanks!
Ben
I have found a good solution.
AWS Elastic Beanstalk apparently offers an option to define a new Elastic Beanstalk instance as either a "worker" or a "web server". If you define it as a "worker", you can attach it to an SQS queue, and it will be responsible for polling the queue and performing the task (with the code you deploy to the instance).

Distributed video encoding - Gearman vs Beanstalkd

I'm looking to build a distributed video encoding cluster of a few dozen machines. I've never worked with a message queue before, but the 2 that I started playing around with were Gearman and Beanstalkd.
Beanstalk seems to be a lot simpler and easier to use than Gearman, but it's not as feature-rich.
One thing I don't understand is: how do you spawn new workers on all the servers? I plan to use PHP. Is it as simple as running worker.php in the CLI with "&" and just having it sit there waiting for work?
I noticed Gearman doesn't actually kill the process after a job is done, but Beanstalk does, so I have to restart the script after every job, on every server.
Currently Im more inclined to use Beanstalk, the general flow of things I planned was:
Run a cron job every minute on each server that checks whether the pre-defined number of workers is running. If there are fewer than there should be, spawn new worker processes. Each process will take roughly 2-30 minutes.
Maybe I have a flaw in my logic here? Let me know what would be a "better" or "proper" way of doing this?
Terminology I will use just to try and be clear...
There is the concept of a producer and a consumer. The producer generates jobs that are put on a queue (i.e. the beanstalk service) that is then read by a consumer.
There are multiple ways to write a consumer. You can either run the task every x minutes via a cron job or just have a consumer running in a while (1) loop in PHP (or what have you).
Where to install the service really depends on what you are going after. I normally install the service either on one of the consumers or on its own separate box (the latter is sometimes overkill, depending on your needs).
If you want durability on the queue side, then you should use Beanstalk's binlog parameter (-b followed by a directory for the log). If something happens to your beanstalk service, this lets you restart with minimal, if any, loss of the data in the queues. Durability on the producer side can come from having multiple queues to try against.
