I have a system with multiple web servers behind a load balancer. I also have one or more servers for running queued jobs (each server can also have more than one queue listener). The web servers push jobs on to redis (separate server) and the workers pull the jobs.
I'm trying to find the best way to update the code (git pull) in each of these locations without the workers having failed jobs due to code changes. An example would be removing a dependency. Queued jobs would still rely on the "old" code after running git pull. Thus, the job would error out since the dependency was removed.
Does anyone have insight on updating the queue workers without causing jobs to fail?
I am making an API that requires the job to be dispatched multiple times; however, each job takes 10 seconds, and it takes forever to process them one by one. Is there any way to run multiple jobs at once?
GetCaptcha::dispatch($task_id)->afterCommit()->onQueue('default');
You can achieve that by running multiple workers at the same time.
From the Laravel docs:
To assign multiple workers to a queue and process jobs concurrently,
you should simply start multiple queue:work processes. This can either
be done locally via multiple tabs in your terminal or in production
using your process manager's configuration settings. When using
Supervisor, you may use the numprocs configuration value.
Read more here:
https://laravel.com/docs/9.x/queues#running-multiple-queue-workers
https://laravel.com/docs/9.x/queues#supervisor-configuration
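A minimal Supervisor program entry illustrating numprocs (a sketch; the program name, paths, and worker count are placeholders you'd adjust for your project):

```ini
[program:laravel-worker]
process_name=%(program_name)s_%(process_num)02d
command=php /home/forge/example.com/artisan queue:work --sleep=3 --tries=3
autostart=true
autorestart=true
user=forge
; numprocs starts this many queue:work processes, so up to
; eight jobs from the queue are processed concurrently.
numprocs=8
stdout_logfile=/home/forge/example.com/worker.log
```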
I'm using Laravel 5.5 and I'm trying to setup some fast queue processing. I've been running into one roadblock after another.
This site is an employer/employee matching service. So when an employer posts a job position, it needs to then run through all the employees in our system and calculate a number of variables to determine how well they match the job. We have this all figured out, but it takes a long time to process one at a time when you have thousands of employees in the system. So I set it up to write a couple of tables. The first is a simple table that defines the position ID and the status. The second is a table listing all the employee IDs, the position ID, and the status of that employee being processed. This takes only a few seconds to write and then allows the user to move on in the application.
Then I have another server setup to run a cron every minute that checks for new entries in the first table. When found, it marks it out as started and then grabs all the employees and runs through each employee and starts a queued job in Laravel. The job I have defined does properly submit to the queue and running queue:work does in fact process the job properly. This is all tested.
However, the problem I'm running into is that I've tried database (MySQL), Redis and SQS for the queue and they are all very slow. I was using this same server to try to operate the queue:work (using Supervisor and attempting to run up to 300 processes) but then created 3 clones that don't run the cron but only run Supervisor (100 processes per clone) and killed Supervisor on the first server. With database it would process ok, though to run through 10k queued jobs would take hours, but with SQS and Redis I'm getting a ton of failures. The scripts are taking too long or something. I checked the CPUs on the clones running the workers and they are barely hitting 40% so I'm not over-taxing the servers.
I was just reading about Horizon and I'm not sure if it would help the situation. I keep trying to find information about how to properly setup a queue processing system with Laravel and just keep running into more questions than answers.
Is anyone familiar with this stuff and have any advice on how to set this up correctly so that it's very fast and failure free (assuming my code has no bugs)?
UPDATE: Following some other post advice, I figured I'd share a few more details:
I'm using Forge as the setup tool with AWS EC2 servers with 2G of RAM.
Each of the three clones has the following worker configuration:
command=php /home/forge/default/artisan queue:work sqs --sleep=10 --daemon --quiet --timeout=30 --tries=3
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
stopasgroup=true
killasgroup=true
user=forge
numprocs=100
stdout_logfile=/home/forge/.forge/worker-149257.log
The database is on Amazon RDS.
I'm curious if the Laravel cache will work with the queue system. There are elements of the queued script that are common to every run, so perhaps if I queued that data up from the beginning it may save some time. But I'm not convinced it will be a huge improvement.
If we ignore the actual logic processed by each job, and consider the overhead of running jobs alone, Laravel's queueing system can easily handle 10,000 jobs per hour, if not several times that, in the environment described in the question—especially with a Redis backend.
For a typical queue setup, 100 queue worker processes per box seems extremely high. Unless these jobs spend a significant amount of time in a waiting state—such as jobs that make requests to web services across a network and use only a few milliseconds processing the response—the large number of processes running concurrently will actually diminish performance. We won't gain much by running more than one worker per processor core. Additional workers create overhead because the operating system must divide and schedule compute time between all the competing processes.
I checked the CPUs on the clones running the workers and they are barely hitting 40% so I'm not over-taxing the servers.
Without knowing the project, I can suggest that it's possible that these jobs do spend some of their time waiting for something. You may need to tune the number of workers to find the sweet spot between idle time and overcrowding.
With database it would process ok, though to run through 10k queued jobs would take hours, but with SQS and Redis I'm getting a ton of failures.
I'll try to update this answer if you add the error messages and any other related information to the question.
I'm curious if the Laravel cache will work with the queue system. There are elements of the queued script that are common to every run, so perhaps if I queued that data up from the beginning it may save some time.
We can certainly use the cache API when executing jobs in the queue. Any performance improvement we see depends on the cost of reproducing the data for each job that we could store in the cache. I can't say for sure how much time caching would save because I'm not familiar with the project, but you could profile sections of the code in the job to find expensive operations.
Alternatively, we could cache reusable data in memory. When we initialize a queue worker using artisan queue:work, Laravel starts a PHP process and boots the application once for all of the jobs that the worker executes. This is different from the application lifecycle for a typical PHP web app wherein the application reboots for every request and disposes state at the end of each request. Because every job executes in the same process, we can create an object that caches shared job data in the process memory, perhaps by binding a singleton into the IoC container, which the jobs can read much faster than even a Redis cache store because we avoid the overhead needed to fetch the data from the cache backend.
Of course, this also means that we need to make sure that our jobs don't leak memory, even if we don't cache data as described above.
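As a sketch of that in-memory approach (assuming a Laravel 5.5 app; the EmployeeMatchData class and its loading method are hypothetical names standing in for whatever shared data the matching jobs need):

```php
<?php
// In a service provider's register() method: bind a singleton so the
// expensive shared data is built once per worker process and reused
// by every job that the process executes.
$this->app->singleton(EmployeeMatchData::class, function ($app) {
    // Hypothetical: load the data common to every matching job.
    return EmployeeMatchData::loadFromDatabase();
});

// In the queued job, type-hint the singleton in handle(); Laravel
// injects the same instance for every job this worker runs, so the
// data is fetched from process memory rather than a cache backend.
public function handle(EmployeeMatchData $shared)
{
    $score = $this->calculateMatchScore($this->employeeId, $shared);
    // ...
}
```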
I was just reading about Horizon and I'm not sure if it would help the situation.
Horizon provides a monitoring service that may help to track down problems with this setup. It may also improve efficiency a bit if the application uses other queues that Horizon can distribute work between when idle, but the question doesn't seem to indicate that this is the case.
Each of the three clones has the following worker configuration:
command=php /home/forge/default/artisan queue:work sqs --sleep=10 --daemon --quiet --timeout=30 --tries=3
(Sidenote: for Laravel 5.3 and later, the --daemon option is deprecated, and the queue:work command runs in daemon mode by default.)
I currently have a couple of sites set up on a server, and they all use beanstalkd for their queues. Some of these sites are staging sites. While I know it would be ideal to have the staging site on another server, it doesn't make financial sense to spin up another server for it in this situation.
I recently ran into a very confusing issue while deploying a staging site, which included a reseeding of the database. I have an observer set up on some model saves that triggers a job to be queued, which usually ends up sending out an email. The staging site does not actually have any queue workers set up to run them, but the production site (on the same server) does have the queue workers running.
What appeared to be happening is that the staging site was generating the queue jobs, and the production site was running those queue jobs! This caused random users of mine to be spammed with email: the job would serialize the model from staging, and when it was unserialized while running the job, it matched up with an actual production user.
It seems like it would be very common to have multiple sites on a server running queues, so I'm curious if there is a way to avoid this issue. Elasticsearch has the concept of a 'cluster', so you can run multiple search 'clusters' on one server. I'm curious if beanstalkd or redis or any other queue provider have this ability, so we don't have crosstalk between completely separate websites.
Thanks!
Beanstalkd has the concept of tubes:
Tubes are job queues.
A common use case of tubes would be to have completely different sets
of producers and consumers running through a single beanstalk instance
such that a given consumer will not know what to do with jobs produced
by some of the producers. Producer1 can enqueue jobs into Tube1 and
Consumer1 can pick up the jobs from there completely independently of
what Producer2 and Consumer2 are doing with Tube2, for example.
For example, if you're using pheanstalk, the producer will call useTube():
$pheanstalk = new Pheanstalk('127.0.0.1');
$pheanstalk->useTube('foo')->put(...);
And the worker will call watch():
$pheanstalk = new Pheanstalk('127.0.0.1');
$pheanstalk->watch('foo')->ignore('default')->reserve();
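In a Laravel app, you can get the same separation without touching pheanstalk directly by pointing each site's beanstalkd connection at its own tube (a sketch of config/queue.php; the tube names are arbitrary):

```php
// config/queue.php on the production site
'beanstalkd' => [
    'driver' => 'beanstalkd',
    'host' => 'localhost',
    // Each site gets its own default tube, so production workers
    // never pick up jobs queued by the staging site.
    'queue' => 'production-default',
    'retry_after' => 90,
],

// config/queue.php on the staging site
'beanstalkd' => [
    'driver' => 'beanstalkd',
    'host' => 'localhost',
    'queue' => 'staging-default',
    'retry_after' => 90,
],
```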
This is an old question but have you tried running multiple beanstalkd daemons? Simply bind to another port.
Example:
beanstalkd -p 11301 &
Use & to fork into background.
I see a common pattern in the services we develop, and I wonder if there are tools or libraries out there that would help here. While the typical jobs discussed in the microservices literature are of a REQUEST -> RESPONSE nature, our jobs are more or less assignments of semi-permanent tasks.
Examples of such tasks
Listen on the message queue for data from source X and Y, correlate the data that comes in and store it in Z.
Keep an in-memory buffer that calculates a running average of the past 15 minutes of data every time a new data entry comes in.
Currently our services are written in PHP. Due to the perceived overhead of PHP processes and connections to the message queue, we'd like a single service process to handle multiple of those jobs simultaneously.
A chart that hopefully illustrates the setup we have in our head:
Service Workers are currently daemonized PHP scripts
For the Service Registry we are looking at Zookeeper
While Zookeeper (and Curator) do load balancing, I did not find anything around distributing permanent jobs (that are updatable, removable, and must be reassigned when a worker dies)
Proposed responsibilities of a Job Manager
Knows about jobs
Knows about services that can do these jobs
Can assign jobs to services
Can send job updates to services
Can reassign jobs if a worker dies
Are there any libraries / tools that can tackle such problems, and can thus function as the Job Manager? Or is this all one big anti-pattern and should we do it some other way?
You should have a look at Gearman.
It consists of a client which assigns the jobs, one or more workers which pick up and execute the jobs, and a server which maintains the list of functions (services) and pending jobs. It will reassign jobs if a worker dies.
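A minimal sketch using PHP's pecl gearman extension (assumes a gearmand server on localhost; the 'process_data' function name and payload are arbitrary):

```php
<?php
// worker.php — registers a function with the server and blocks,
// waiting for jobs assigned to 'process_data'.
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('process_data', function (GearmanJob $job) {
    // ... the actual task logic goes here ...
    return strrev($job->workload());
});
while ($worker->work());

// client.php — assigns a job; doBackground() returns immediately,
// and whichever idle worker registered 'process_data' picks it up.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('process_data', 'some payload');
```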
Your workers sound like (api-less) services itself. So, your requirements can be reformulated as:
Knows about deployed services
Knows about nodes that can host these services
Can deploy services to nodes
Can [send job updates to services] = redeploy services/invoke some API on deployed services
Can redeploy service if service or node dies
Look at Docker to deploy, run and manage isolated processes on host.
RabbitMQ is a simple message queue that is fairly easy to get going with.
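For example, publishing a job with the php-amqplib library (a sketch; assumes a RabbitMQ server on localhost with the default guest credentials, and a queue name of your choosing):

```php
<?php
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

// Connect and declare a durable queue for the jobs.
$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();
$channel->queue_declare('jobs', false, true, false, false);

// Publish a job payload; a worker consuming 'jobs' will pick it up.
$channel->basic_publish(new AMQPMessage('job payload'), '', 'jobs');

$channel->close();
$connection->close();
```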
I'm looking for PHP solutions for running cronjobs over multiple servers and guaranteeing that only a single server runs these cronjobs automatically. We have cronjobs that we need to run only once, like daily digest emails or weekly reports.
Right now, we have a "master" server which has the crontab installed, and multiple "normal" servers which only have Apache installed on them. The issue is that if the "master" server fails, nobody will run the cronjobs anymore. It also means we need to keep track of which server is the master, and it's creating some scaling issues for us.
Are there any ready-made php solutions for running unique tasks by multiple servers?
I have looked at Gearman (http://www.slideshare.net/felixdv/high-gear-php-with-gearman), but it's a little too complex. I just need to guarantee that only one server out of the farm runs the cronjobs.
I build my concurrency and 'running' checks in the scripts themselves.
Update a 'lock' in the database or in memcached that states the last execution time.
If that lock is present, bail in the other copies of the script... unless the lock is too old.
If the lock has been sitting around > max_execution, the script failed or ran too long and never unlocked. Email yourself on that condition.
Remember to unset the lock at script close.
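A sketch of that locking pattern using the Memcached extension (the key name, timeout, and job function are illustrative; Memcached::add() is atomic, so only one server in the farm can acquire the lock):

```php
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$lockKey = 'cron:daily_digest:lock';
$maxExecution = 600; // seconds; the lock expires on its own if the
                     // script dies, covering the "lock too old" case

// add() fails if the key already exists, so other copies bail out.
if (!$mc->add($lockKey, gethostname() . ':' . time(), $maxExecution)) {
    exit(0); // another server is already running this cron
}

try {
    run_daily_digest(); // hypothetical: the actual cron work
} finally {
    $mc->delete($lockKey); // unset the lock at script close
}
```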
I don't know if you want a free solution or a paid one.
Anyway, the only solution I can think of right now is IBM Tivoli Workload Scheduler (TWS).
As far as I know it works on both Windows and Linux/Unix machines.
http://www-01.ibm.com/software/tivoli/products/scheduler/