I wrote a PHP shell script that queues jobs on CentOS with the 'at' command.
The queued jobs vary in timing and content, which means the system needs to hold quite a large number of jobs.
The application logic would also be a bit difficult to set up as a cron job.
Is there a limit on the number of queued jobs in CentOS, or is there an alternative way of queuing jobs?
You might consider writing to a "distributed" queue such as dropr or an implementation of AMQP.
A lot of job processes may slow down the whole server, especially if many of them start at the same time (or simply run at the same time). If you have 20 hard-working processes and only 4 CPU cores, they will compete for CPU and the OS will have to switch between them very often.
I'd suggest using a message queue and a pool of workers. You can start with a filesystem-based queue (or a MySQL-based one); you only have to install a PHP library, no extra services. Later, if needed, you can switch to a real message queue broker with little effort.
I would recommend using the enqueue library.
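As a rough sketch of how that looks with the filesystem transport (enqueue/fs); the storage path, queue name and payload below are assumptions:

    <?php
    // A sketch of a filesystem-backed queue with enqueue/fs
    // (composer require enqueue/fs); paths and names are illustrative.
    require __DIR__ . '/vendor/autoload.php';

    use Enqueue\Fs\FsConnectionFactory;

    $context = (new FsConnectionFactory('/var/spool/myapp-queue'))->createContext();
    $queue   = $context->createQueue('jobs');

    // Producer side: enqueue a job instead of scheduling it with 'at'.
    $context->createProducer()->send(
        $queue,
        $context->createMessage(json_encode(['task' => 'rebuild_report', 'shop' => 42]))
    );

    // Worker side: a long-running process pulls jobs off the queue.
    $consumer = $context->createConsumer($queue);
    while (true) {
        if ($message = $consumer->receive(5000)) {   // wait up to 5 seconds
            $job = json_decode($message->getBody(), true);
            // ... process the job ...
            $consumer->acknowledge($message);
        }
    }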
Related
How can I make Laravel's queue:work process as many jobs as possible? (With either Redis or Beanstalkd)
By default it processes one job at a time, but I need it to run multiple jobs at the same time, for as long as the CPU has capacity.
Any help is appreciated.
It depends on how you are running the worker(s); the way to do it is to increase the number of workers that are started to pull items from the queue and run them.
If you are using supervisord to run the workers, it could be as easy as increasing the numprocs in the configuration.
You would not usually set it to a very large number, as trying to run too many processes at once is likely to end up as a major problem when things run out of memory or CPU.
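If supervisord is what runs the workers, the relevant part of the program configuration might look roughly like this (the program name, command and log paths are assumptions; numprocs is the setting that controls how many workers run in parallel):

    [program:queue-worker]
    process_name=%(program_name)s_%(process_num)02d
    command=php /var/www/artisan queue:work --sleep=3 --tries=3
    autostart=true
    autorestart=true
    ; numprocs controls how many worker processes supervisord keeps running
    numprocs=8
    user=www-data
    redirect_stderr=true
    stdout_logfile=/var/log/queue-worker.log

After editing the config, run supervisorctl reread and supervisorctl update so the new process count takes effect.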
Have a look at this listener. The best part is that it runs workers depending on how much load you have in your queue, and it is very easy to configure. When you need more workers, it spawns new ones automatically to handle the tasks; when you don't, the previously spawned workers are killed and only a minimum remains, consuming just a few resources on your machine. You can tune it depending on your needs and server capabilities.
https://github.com/smaugho/TunedQueue
I have about 35 cron jobs right now. Most of them are PHP scripts that either scrape or do some calculations. The scripts also loop over 10-20 different servers to do those scrapes. (They are in different countries, so they have to be separate calls.)
So we have 30 scripts, each of which loops over 20 servers and therefore takes about 5-15 minutes to run. I have the scripts spaced out right now.
But is it better to have 80 individual scripts run instead of 35 scripts that loop and take a while? Each script would take maybe 1-2 minutes instead of 10-15 minutes.
That would of course spawn a ton more PHP processes. Is there any issue or limit with 10-15 or more PHP processes running at once?
I'm running this on a Rackspace cloud server.
Personally, if the jobs need to complete in a certain order, I would make the process as linear as possible. It might take longer, but I always err on the side of data accuracy.
It depends.
If you create more processes that run at the same time, you are going to increase your overall memory footprint. Each process carries its own memory overhead just to run and to load whatever libraries it needs (aside from whatever memory its actual work requires). You will also have more than twice as many scripts to monitor to make sure they are all running successfully.
However, by creating more processes you will be able to speed things up, since you are essentially multi-processing: one process can continue while another is blocked waiting for I/O.
If each script doesn't have a dependency on another, breaking them into smaller scripts should be fine. If you can handle monitoring more scripts, and the server can handle it, then I would do it.
If scripts do have dependencies, or if you would have to run so many at the same time that your server usage maxes out, keep them together.
That being said, I would also try to optimize the scripts and make sure there isn't something you can do to make them faster without creating more processes.
Depending on how you have the servers set up, I would run them all at once. I would also run them at night, during off hours when the web servers aren't in use and outside business operations, unless your web app depends on it. If you're on a Rackspace cloud server I wouldn't worry about bandwidth, although having to increase your RAM could become an issue further down the road.
Spawning a ton more PHP processes shouldn't be a worry if you have a sufficient amount of RAM; there is no hard limit on the Linux side beyond the usual process and memory limits.
a) Figure out which cron job needs to run in which order
b) Schedule the cron jobs to run at night, around midnight
c) Run and fire off the 80 scripts at once
It would also be a good idea to have cron email you the results, or a report that everything went through successfully, per batch rather than per individual cron job.
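A minimal sketch of such a batch runner (the paths, the scrape_*.php naming scheme and the recipient address are assumptions):

    <?php
    // Hypothetical nightly batch runner: fire off every scrape script at once,
    // then send one summary email for the whole batch.
    $scripts = glob('/var/www/crons/scrape_*.php');

    $procs = [];
    foreach ($scripts as $script) {
        $log = '/var/log/crons/' . basename($script, '.php') . '.log';
        // Start each script as its own PHP process; its output goes to a log file.
        $procs[$script] = proc_open(
            'php ' . escapeshellarg($script),
            [1 => ['file', $log, 'a'], 2 => ['file', $log, 'a']],
            $pipes
        );
    }

    $report = '';
    foreach ($procs as $script => $proc) {
        $exit = proc_close($proc); // waits for that script to finish
        $report .= sprintf("%s exited with code %d\n", basename($script), $exit);
    }

    mail('ops@example.com', 'Nightly scrape batch report', $report);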
I'm starting to use asynchronous jobs/messages to do some heavy background work on a PHP page instead of making the user sit there and wait for it. So far I'm leaning towards using Beanstalkd over RabbitMQ or Amazon's SQS, but my question below is a bit more generic and applies to all of them:
Is it better to have one huge worker acting as a dispatcher for multiple job types?
Worker watches all jobs, delegates based on job type
Only one open connection to Beanstalkd
Use meta-data to dispatch Worker objects to do the actual work?
May only process 1 job at a time on the server
Or is it better to have several, smaller focused worker scripts on the same server?
Each worker only watches 1 kind of job
Multiple, sustained connections to Beanstalkd
Less complexity, as each script only does one thing
Other job types don't clog up while waiting for one long job to run
Takes more resources
There are probably several other factors that I don't even know about, so any additional tips would be appreciated.
(If it matters, I'm planning to daemon-ize a PHP-based worker script using Supervisor. For now the worker will only be running on 1 server but that may expand to two in the future...)
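For reference, a focused worker along the lines of the second option might look roughly like this (a sketch assuming the Pheanstalk client; the tube name, payload format and error handling are assumptions):

    <?php
    // A minimal single-purpose worker watching one Beanstalkd tube.
    // (composer require pda/pheanstalk; classic Pheanstalk API shown here.)
    require __DIR__ . '/vendor/autoload.php';

    use Pheanstalk\Pheanstalk;

    $pheanstalk = new Pheanstalk('127.0.0.1');
    $pheanstalk->watch('email-jobs')->ignore('default');

    while (true) {
        $job     = $pheanstalk->reserve();          // blocks until a job arrives
        $payload = json_decode($job->getData(), true);

        try {
            // ... do the actual work for this one job type ...
            $pheanstalk->delete($job);              // done, remove it from the queue
        } catch (\Throwable $e) {
            $pheanstalk->bury($job);                // park it for later inspection
        }
    }

Supervisor would keep one such process running per job type (or several per type, if a type needs more throughput).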
Is it really that memory-consuming to have a daemon written in PHP (which listens to and processes a queue), compared to the crontab way of executing background tasks?
I have ~600 shops on one server under one engine. Some tasks a shop owner runs require a lot of time, so it is reasonable to fork them off. Putting a task into cron works well; I just don't like the up-to-59-second delay before it starts (a restriction of cron), so I'd like to try a queue system. I'm just afraid it will force me to run 600 PHP threads to listen to and process those queues (the shops belong to different customers, so I can't make one common daemon). Wouldn't that automatically require some 600-1000 MB more memory, which would make it a poor choice compared to cron (which only loads a process when one is scheduled)?
Instead of putting them into cron with up to a 59-second delay, why not run them using the "at" daemon? You can simply use "at now" and they'll run immediately. See, for example:
http://unixhelp.ed.ac.uk/CGI/man-cgi?at
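From PHP that could be as simple as piping the command into "at now" (the script path below is an assumption):

    <?php
    // Hand a task to the at daemon so it starts immediately,
    // without cron's up-to-59-second delay.
    $cmd = 'php /var/www/tasks/rebuild_index.php --shop=42';
    shell_exec('echo ' . escapeshellarg($cmd) . ' | at now 2>&1');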
I certainly wouldn't consider running 600 threads in PHP as daemons simultaneously.
I've previously built queue-runners that ran as many as 75-100 separate PHP processes, using supervisord to start as many as I wanted. Since they share so much common code, the OS shares that code between processes rather than duplicating it.
You can run a few dozen or more, maybe with some type of high-priority queue for the small, fast jobs and a subset of the workers that can happily run the large, slow ones.
I've written on the subject at my tech blog, phpscaling.com.
I'm looking to build a distributed video encoding cluster of a few dozen machines. I've never worked with a message queue before, but the two that I started playing around with were Gearman and Beanstalkd.
Beanstalk seems to be a lot simpler and easier to use than Gearman, but it's not as feature-rich.
One thing I don't understand is: how do you spawn new workers on all the servers? I plan to use PHP. Is it as simple as running worker.php in the CLI with "&" and just having it sit there waiting for work?
I noticed Gearman doesn't actually kill the process after a job is done, but Beanstalk does, so I have to restart the script after every job, on every server.
Currently I'm more inclined to use Beanstalk; the general flow I planned was:
Run a minutely cron job on each server that checks whether a pre-defined number of workers is running. If there are fewer than there are supposed to be, spawn new worker processes (see the sketch below). Each process will take roughly 2-30 minutes.
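A minimal sketch of that minutely check (the worker path, target count and the pgrep/nohup approach are assumptions):

    <?php
    // Count running workers and top them up to the target number.
    $target = 4;
    // The [w] trick keeps pgrep from matching this command's own shell.
    $running = (int) trim((string) shell_exec("pgrep -fc '[w]orker.php'"));

    for ($i = $running; $i < $target; $i++) {
        // nohup + & detaches the worker so it outlives this cron run.
        exec('nohup php /var/www/worker.php > /dev/null 2>&1 &');
    }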
Maybe I have a flaw in my logic here? Let me know what would be a "better" or "proper" way of doing this?
Terminology I will use just to try and be clear...
There is the concept of a producer and a consumer. The producer generates jobs that are put on a queue (i.e. the beanstalk service) that is then read by a consumer.
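For example, the producer side might put an encode job on a tube roughly like this (a sketch using the classic Pheanstalk client API; the tube name and payload are assumptions):

    <?php
    // Producer: push an encode job onto a Beanstalkd tube.
    require __DIR__ . '/vendor/autoload.php';

    use Pheanstalk\Pheanstalk;

    $pheanstalk = new Pheanstalk('127.0.0.1');
    $pheanstalk->useTube('video-encode')->put(json_encode([
        'source'  => '/srv/uploads/clip123.mov',
        'profile' => '720p',
    ]));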
There are multiple ways to write a consumer. You can either run the task on a schedule via a cron job, or just have a consumer running in a while(1) loop in PHP (or whatever you prefer).
Where to install the service really depends on what you are after. I normally install the service either on the consumer(s) or on its own separate box (the latter sometimes being overkill, depending on your needs).
If you want durability on the queue side then you should use Beanstalk's binlog parameter (-b <directory>). If something happens to your Beanstalk service, this will allow you to restart with minimal (if any) loss of data in the queues. Durability on the producer side can come from having multiple queues to try against.