Multiple Sites on same Beanstalkd Queue - php

I currently have a couple of sites set up on a server, and they all use beanstalkd for their queues. Some of these sites are staging sites. While I know it would be ideal to have the staging site on another server, it doesn't make financial sense to spin up another server for it in this situation.
I recently ran into a very confusing issue while deploying a staging site, which included reseeding the database. I have an observer set up on some model saves that triggers a job to be queued, which usually ends up sending out an email. The staging site does not actually have any queue workers set up to run those jobs, but the production site (on the same server) does have queue workers running.
What appeared to be happening is that the staging site was generating the queue jobs, and the production site was running them! This caused random users of mine to be spammed with email: the job serialized the model from staging, and when the production worker unserialized it to run the job, it matched up with an actual production user.
It seems like it would be very common to have multiple sites on a server running queues, so I'm curious if there is a way to avoid this issue. Elasticsearch has the concept of a 'cluster', so you can run multiple search 'clusters' on one server. I'm curious if beanstalkd or redis or any other queue provider have this ability, so we don't have crosstalk between completely separate websites.
Thanks!

Beanstalkd has the concept of tubes:
Tubes are job queues.
A common use case of tubes would be to have completely different sets
of producers and consumers running through a single beanstalk instance
such that a given consumer will not know what to do with jobs produced
by some of the producers. Producer1 can enqueue jobs into Tube1 and
Consumer1 can pick up the jobs from there completely independently of
what Producer2 and Consumer2 are doing with Tube2, for example.
For example, if you're using pheanstalk, the producer will call useTube():
$pheanstalk = new Pheanstalk('127.0.0.1'); // the constructor needs a host; 127.0.0.1 is assumed here
$pheanstalk->useTube('foo')->put(...);
And the worker will call watch():
$pheanstalk = new Pheanstalk('127.0.0.1'); // same assumed host as the producer
$pheanstalk->watch('foo')->ignore('default')->reserve();
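To keep staging and production apart with tubes, each site needs a tube name that no other site uses. Here is a minimal sketch of one way to derive such a name. The helper tubeName() and the "site.environment" naming scheme are my own assumptions, not part of pheanstalk or beanstalkd:

```php
<?php
// Hypothetical helper: builds a per-site, per-environment tube name so that
// staging producers and production workers never share a tube.
function tubeName(string $site, string $environment): string
{
    // Lowercase each part and collapse anything that is not a letter or digit into "-".
    $clean = static function (string $part): string {
        return trim(preg_replace('/[^a-z0-9]+/', '-', strtolower($part)), '-');
    };

    return $clean($site) . '.' . $clean($environment);
}

echo tubeName('Example Shop', 'staging'), "\n";    // example-shop.staging
echo tubeName('Example Shop', 'production'), "\n"; // example-shop.production
```

The producer would pass the result to useTube() and the worker would pass it to watch(), so a staging job can only be reserved by a worker that watches the staging tube.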

This is an old question, but have you tried running multiple beanstalkd daemons? Simply bind each one to a different port.
Example:
beanstalkd -p 11301 &
The trailing & runs the daemon in the background.
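If you go the one-daemon-per-site route, it helps to keep the port assignment in one place. A small sketch, assuming a hypothetical site-to-port map (the site names and the 11301 assignment are made up; 11300 is beanstalkd's default port):

```php
<?php
// Hypothetical site-to-port map; only the default port 11300 comes from beanstalkd itself.
$ports = [
    'production' => 11300,
    'staging'    => 11301,
];

// Builds the command line that starts one daemon bound to localhost on the given port.
function daemonCommand(int $port): string
{
    return sprintf('beanstalkd -l 127.0.0.1 -p %d', $port);
}

foreach ($ports as $site => $port) {
    echo $site, ': ', daemonCommand($port), "\n";
}
```

Each site's queue configuration then points at its own port, so the sites never see each other's jobs.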

Related

Reloading multiple application servers on deploy

I have a setup where several application servers run the php-fpm service and all share a GlusterFS mount for the application code and other assets. In the current deploy process, the files get updated directly on the file server, and often the application service must be reloaded for the changes to take effect. To achieve that, the deployment script needs to get into every server and issue a reload command, but with autoscaling the number of servers is not the same at every moment.
Overall, I am sketching a couple of alternatives to solve this problem:
The first one, more artisanal and not perfect, as a proof of concept, would be a cron job that runs every X minutes on the application machines and looks in a shared file for a unique identifier of the machine, like its hostname or IP address. If the identifier is there, the machine takes no action; if not, it reloads and writes its identifier into the file. During deployment, the script would clear the file, and all servers would get reloaded on the next cron run.
The second one uses a more sophisticated approach, like a message queue or notification service that the running application machines would subscribe to at boot time and wait on for an order to reload. The deploy script would then publish a notification to make all servers aware that it is time. A cron job similar to the one in the previous method would then notice that and reload the app server.
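The first (cron-based) alternative could be sketched roughly like this in PHP. The function names, the marker file layout (one hostname per line) and the injected $reload callable are all assumptions for illustration; whether the file lock behaves well on the shared mount would need to be verified:

```php
<?php
// Returns true when this host is not yet listed in the marker file,
// i.e. it has not reloaded since the last deploy cleared the file.
function needsReload(string $markerFile, string $host): bool
{
    $lines = is_file($markerFile)
        ? file($markerFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
        : [];

    return !in_array($host, $lines, true);
}

// Appends this host to the marker file after a reload.
function markReloaded(string $markerFile, string $host): void
{
    file_put_contents($markerFile, $host . "\n", FILE_APPEND | LOCK_EX);
}

// Cron entry point. $reload is a callable, so the actual reload command
// is left to the caller and not assumed here.
function reloadIfNeeded(string $markerFile, string $host, callable $reload): bool
{
    if (!needsReload($markerFile, $host)) {
        return false;
    }
    $reload();
    markReloaded($markerFile, $host);

    return true;
}
```

The deploy script would empty the marker file, for example with file_put_contents($markerFile, ''), and every server would reload once on its next cron run.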
Would any of that make sense? Is there a simpler or more standard way to trigger a broadcast to the application servers running at a given moment during the deploy procedure, without having to ssh to each one and issue the reload command? Any other advice or suggestions?
Thanks!

How to safely update Laravel queues without errors?

I have a system with multiple web servers behind a load balancer. I also have one or more servers for running queued jobs (each server can also have more than one queue listener). The web servers push jobs on to redis (separate server) and the workers pull the jobs.
I'm trying to find the best way to update the code (git pull) in each of these locations without the workers having failed jobs due to code changes. An example would be removing a dependency. Queued jobs would still rely on the "old" code after running git pull. Thus, the job would error out since the dependency was removed.
Does anyone have insight on updating the queue workers without causing jobs to fail?

How to manage/balance semi persistent jobs over service instances

I see a common pattern in the services that we try to develop, and I wonder if there are tools / libraries out there that would help here. While the default jobs discussed in microservice literature are of a REQUEST -> RESPONSE nature, our jobs are more or less assignments of semi-permanent tasks.
Examples of such tasks:
Listen on the message queue for data from source X and Y, correlate the data that comes in and store it in Z.
Keep an in-memory buffer that calculates a running average of the past 15 mins of data every time a new data entry comes in.
Currently our services are written in PHP. Due to the perceived overhead of PHP processes and connections to the message queue, we'd like a single service process to handle several of those jobs simultaneously.
A chart that hopefully illustrates the setup that we have in our head:
Service Workers are currently daemonized PHP scripts
For the Service Registry we are looking at Zookeeper
While Zookeeper (and Curator) do load balancing, I did not find anything about distributing permanent jobs (that are updatable, removable, and must be reassigned when a worker dies)
Proposed responsibilities of a Job Manager
Knows about jobs
Knows about services that can do these jobs
Can assign jobs to services
Can send job updates to services
Can reassign jobs if a worker dies
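To make these responsibilities concrete, here is a minimal in-memory sketch of such a Job Manager in PHP. The class and method names are hypothetical, it covers assignment and reassignment only (not sending job updates), and a real version would keep its state somewhere durable and detect dead workers itself:

```php
<?php
// In-memory sketch only: state is lost on restart and worker death
// must be reported from outside via workerDied().
final class JobManager
{
    /** @var array<string, string|null> job id => worker id (null = unassigned) */
    private $assignments = [];
    /** @var array<string, true> set of live worker ids */
    private $workers = [];

    public function addWorker(string $workerId): void
    {
        $this->workers[$workerId] = true;
        $this->rebalance();
    }

    public function addJob(string $jobId): void
    {
        $this->assignments[$jobId] = null;
        $this->rebalance();
    }

    public function removeJob(string $jobId): void
    {
        unset($this->assignments[$jobId]);
    }

    // Called when a worker is known to be dead: its jobs go back to the pool.
    public function workerDied(string $workerId): void
    {
        unset($this->workers[$workerId]);
        foreach ($this->assignments as $jobId => $owner) {
            if ($owner === $workerId) {
                $this->assignments[$jobId] = null;
            }
        }
        $this->rebalance();
    }

    public function workerFor(string $jobId): ?string
    {
        return $this->assignments[$jobId] ?? null;
    }

    // Gives every unassigned job to the live worker that currently has the fewest jobs.
    private function rebalance(): void
    {
        foreach ($this->assignments as $jobId => $owner) {
            if ($owner !== null || $this->workers === []) {
                continue;
            }
            $load = array_fill_keys(array_keys($this->workers), 0);
            foreach ($this->assignments as $assigned) {
                if ($assigned !== null) {
                    $load[$assigned]++;
                }
            }
            asort($load);
            $this->assignments[$jobId] = (string) array_key_first($load);
        }
    }
}
```

Jobs added while no worker is alive stay unassigned and are picked up as soon as a worker registers.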
Are there any libraries / tools that can tackle such problems, and can thus function as the Job Manager? Or is this all one big anti pattern and should we do it some other way?
You should have a look at Gearman.
It consists of a client which assigns the jobs, one or more workers which pick up and execute the jobs, and a server which maintains the list of functions (services) and pending jobs. It will re-assign the jobs if a worker dies.
Your workers sound like (API-less) services themselves. So, your requirements can be reformulated as:
Knows about deployed services
Knows about nodes that can host these services
Can deploy services to nodes
Can [send job updates to services] = redeploy services/invoke some API on deployed services
Can redeploy a service if the service or node dies
Look at Docker to deploy, run and manage isolated processes on a host.
RabbitMQ is a simple message queue that is fairly easy to get going with.

Distributed video encoding - Gearman vs Beanstalkd

I'm looking to build a distributed video encoding cluster of a few dozen machines. I've never worked with a message queue before, but the 2 that I started playing around with were Gearman and Beanstalkd.
Beanstalk seems to be a lot simpler and easier to use than Gearman, but it's not as feature-rich.
One thing I don't understand is... how do you spawn new workers on all the servers? I plan to use PHP. Is it as simple as running worker.php in the CLI with "&" and just having it sit there waiting for work?
I noticed Gearman doesn't actually kill the process after a job is done, but Beanstalk does, so I have to restart the script after every job, on every server.
Currently I'm more inclined to use Beanstalk. The general flow of things I planned was:
Run a cron job every minute on each server that checks whether the pre-defined number of workers is running. If there are fewer than there are supposed to be, spawn new worker processes. Each process will take roughly 2-30 minutes.
Maybe I have a flaw in my logic here? Let me know what would be a "better" or "proper" way of doing this.
Terminology I will use just to try and be clear...
There is the concept of a producer and a consumer. The producer generates jobs that are put on a queue (i.e. the beanstalk service) that is then read by a consumer.
There are multiple ways to write a consumer. You can either run the task every x amount of time via a cron job, or keep a consumer running in a while (1) loop in PHP (or what have you).
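The long-running variant could look roughly like the sketch below. $reserve and $handle are hypothetical callables standing in for whatever your queue client provides, and the $maxJobs limit is my own addition so that the process exits now and then and can be restarted by cron or a supervisor:

```php
<?php
// Long-running consumer loop.
// $reserve returns the next job, or null when there is none.
// $handle processes a single job.
// Returns the number of jobs taken off the queue (including failed ones).
function runConsumer(callable $reserve, callable $handle, int $maxJobs = 1000): int
{
    $taken = 0;
    while ($taken < $maxJobs) {
        $job = $reserve();
        if ($job === null) {
            break; // this sketch stops when the queue is empty; a daemon would keep waiting
        }
        try {
            $handle($job);
        } catch (Throwable $e) {
            // One bad job should not kill the whole worker.
            error_log('job failed: ' . $e->getMessage());
        }
        $taken++;
    }

    return $taken;
}

// Usage with an in-memory array standing in for the queue.
$queue = ['video-1', 'video-2'];
runConsumer(
    function () use (&$queue) { return array_shift($queue); },
    function ($job) { echo "encoding $job\n"; }
);
```

With a real queue client, $reserve would typically block until a job arrives, so the loop does not spin while idle.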
Where to install the service really depends on what you are going after. I normally install the service either on the consumer(s) or on its own separate box (with the latter sometimes being overkill, depending on your needs).
If you want durability on the queue side then you should use Beanstalk's binlog parameter (-b ). If something happens to your beanstalk service, this will allow you to restart with minimal, if any, loss of data in the queues. Durability on the producer side can come from having multiple queues to try against.

Scaling cronjobs over multiple servers

Right now, we have a single server with a crontab that sends out daily emails. We would like to scale that server. The application is a standard Zend Framework application deployed on a CentOS server in the Amazon cloud.
We already took care of load balancing, content management and deployment. However, the cron job is still an issue for us, as we need to guarantee that some jobs are performed only once.
For example, the daily emails cron job must be executed only once, by a single server. I'm looking for the best method to guarantee that exactly one server executes it, exactly once.
I'm thinking about 2 solutions, but I was wondering if someone else has had the same issue.
Make one of the servers the "master", the only one that sends out the daily emails. That will be an issue if the server malfunctions, and generally we don't want to have a "special" server. It would also mean we need to keep track of which server is the master.
Have a queue of scheduled tasks to be performed. Each server opens that queue and sees which tasks need to be performed. The first server to "grab" a task performs it and marks it as done. I was looking at Amazon Simple Queue Service as a solution for the queue.
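The "first server to grab the task wins" idea from the second solution can be sketched as below. This is only an illustration of the claim step: claimTask() is a hypothetical helper, and a local directory stands in for whatever shared store the servers would really use, which would have to offer the same create-if-absent guarantee:

```php
<?php
// Tries to claim one run of a task (e.g. "daily-emails" for "2024-01-01").
// fopen() mode "x" creates the file and fails if it already exists,
// so only the first caller for a given task and run gets true.
function claimTask(string $claimDir, string $task, string $runId): bool
{
    $path = $claimDir . '/' . $task . '.' . $runId . '.claim';
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        return false; // someone else already claimed this run
    }
    // Record who claimed it, purely for debugging.
    fwrite($handle, gethostname() . "\n");
    fclose($handle);

    return true;
}
```

Every server can then run the same cron entry, and only the one whose claimTask() call returns true actually sends the emails.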
Both of these solutions have advantages and disadvantages, and I was wondering if someone has thought of something else that might help us here.
When you need to scale out cron jobs, you are better off using a job manager like Gearman.
Beanstalkd could also be an option for you.
I had the same problem. What I did was dead simple.
I spun up the cheapest EC2 instance on AWS.
I created the cronjob(s) only on this server.
The cron jobs do nothing but make a simple request to my endpoint / API (i.e. api.mydomain.com).
On my API, I just have a route watching for these special requests that runs the job I want. So basically, instead of running the task from a cron job, I'm running the task via an HTTP request.
I hope that makes sense! Now it doesn't matter how many servers you have, it will just scale! Also, your cronjob server's only function is to run dead simple jobs to send a request, nothing more.
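The route on the API side might look something like the following sketch. The function name, the task registry and the shared secret are assumptions on my part (the secret is there so that not just anyone can trigger the job), and the function returns a status and body instead of sending output so it is easy to try out:

```php
<?php
// $tasks is a hypothetical registry: task name => callable.
// Returns [httpStatus, body].
function handleCronRequest(string $taskName, string $givenToken, string $secret, array $tasks): array
{
    // Constant-time comparison of the shared secret (an assumption, see above).
    if (!hash_equals($secret, $givenToken)) {
        return [403, 'forbidden'];
    }
    if (!isset($tasks[$taskName])) {
        return [404, 'unknown task'];
    }
    $tasks[$taskName]();

    return [200, 'ok'];
}
```

The cron server then only needs to issue one HTTP request per task, and the load balancer decides which application server does the work.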
