Is it that memory-consuming to have a daemon written in PHP (which listens to and processes a queue) compared with the crontab way of executing background tasks?
I have ~600 shops on one server under one engine. Some of the tasks shop owners run take a lot of time, so it is reasonable to fork them. Putting a task into cron works well; I just don't like the up-to-59-second delay before it starts (a restriction of cron). So I'd like to try a queue system. I'm just afraid it will force me to run 600 PHP processes to listen to and process those queues (the shops belong to different customers, so I can't make a common daemon). Doesn't that automatically require some 600-1000 MB more memory, making it a poor choice compared with cron (which only loads a process when one is scheduled)?
Instead of putting them into cron with an up-to-59-second delay, why not run them using the "at" daemon? You can simply use "at now" and they'll run immediately. See, for example:
http://unixhelp.ed.ac.uk/CGI/man-cgi?at
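For instance, a minimal sketch of handing a job to at from PHP might look like this (the script path is made up, and at writes its "job N at ..." confirmation to stderr):

<?php
// Queue a long-running script with the "at" daemon so it starts immediately,
// outside the web request, instead of waiting for the next cron minute.
$script = '/usr/local/bin/long_task.php';                 // hypothetical worker script
$cmd = 'echo ' . escapeshellarg('php ' . $script) . ' | at now 2>&1';
exec($cmd, $output, $exitCode);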
I certainly wouldn't consider running 600 threads in PHP as daemons simultaneously.
I've previously built queue-runners that ran as many as 75-100 separate PHP processes, using supervisor to start as many as I wanted. Since they share so much common code, that code is also shared by the OS rather than duplicated.
Running a few dozen, or more, works fine, perhaps with some kind of high-priority queue for the small, fast jobs and a subset of the workers that can happily run the large, slow ones.
I've written on the subject at my tech blog, phpscaling.com.
Related
How can I make Laravel queue:work process as many jobs as possible? (With either Redis or Beanstalkd)
By default it processes one job at a time, but I need it to run multiple jobs in parallel for as long as the CPU has capacity.
Any help is appreciated.
It depends on how you are running the worker(s), and on how you can increase the number of workers being started to pull items from the queue and run them.
If you are using supervisord to run the workers, it could be as easy as increasing the numprocs in the configuration.
You would not usually set it to a very large number, as trying to run too many processes at once is likely to end up as a major problem when things run out of memory or CPU.
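As a rough illustration only (the program name, command, paths, and user are placeholders), a supervisord section for Laravel queue workers could look like this; raising numprocs starts more workers in parallel:

[program:laravel-worker]
command=php /var/www/example/artisan queue:work redis --sleep=3 --tries=3
process_name=%(program_name)s_%(process_num)02d
numprocs=8
autostart=true
autorestart=true
user=www-data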
Have a look at this listener. The best part is that it runs workers depending on how much load you have in your queue, and it is very easy to configure. When you need more workers, it spawns new ones automatically to handle the tasks; when fewer are needed, the extra ones are killed, so only a minimum remains, consuming very little of your machine's resources. You can tune it depending on your needs and server capabilities.
https://github.com/smaugho/TunedQueue
I have about 35 cron jobs right now. Most of them are PHP scripts that either scrape or do some calculations. The scripts also loop over 10-20 different servers to do those scrapes. (They are in different countries, so they have to be separate calls.)
So we have ~30 scripts, each of which loops over ~20 servers and therefore takes about 5-15 minutes to run. I have the scripts spaced out in the schedule right now.
But is it better to have 80 individual scripts run instead of 35 scripts that loop and take a while? Each script would take maybe 1-2 minutes instead of 10-15 minutes.
That would of course spawn a ton more PHP processes. Is there any issue or limit with 10-15 or more PHP processes running at once?
I'm running a Cloud Server on Rackspace.
Personally, if the jobs need to complete in a certain order, I would keep things as linear as possible. It might take longer, but I always err on the side of data accuracy.
It depends.
If you create more processes that run at the same time, you increase your overall memory footprint. Each process carries its own memory overhead just to run and to load whatever libraries it needs, aside from the memory used for the actual work. You will also have more than twice as many scripts to monitor to make sure they are running successfully all the time.
However, by creating more processes you will be able to speed things up, since you are essentially multi-threading: one process can continue while another is blocked waiting for I/O.
If each script doesn't have a dependency on another, breaking them into smaller scripts should be fine. If you can handle monitoring more scripts, and the server can handle it, then I would do it.
If the scripts do have dependencies, or if you would have to run so many at the same time that your server usage maxes out, keep them together.
That being said, I would also try to optimize the scripts and make sure there isn't something you can do to make them faster without creating more processes.
Depending on how you have the servers set up, I would run them all at once. I would also run them at night, during off hours when the web servers aren't in use, rather than during business operations, unless your web app depends on it. If you're on a Cloud Server at Rackspace I wouldn't worry about bandwidth, although increasing your RAM could become an issue further down the road.
Spawning a ton more PHP processes shouldn't be a worry if you have a sufficient amount of RAM; there is no practical limit on the Linux side.
a) Figure out which cron needs to run in which order
b) Order the crons to run at night, around midnight
c) Run and fire off the 80 scripts at once
It would also be a good idea to send yourself an email with the cron results, or a report that everything went through successfully, per batch rather than per individual cron.
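As an illustration only (the email address and script paths are invented), a crontab arranged along those lines might look like:

MAILTO=ops@example.com
# a) + b) dependent jobs first, in order, starting around midnight
0 0 * * * php /var/www/jobs/prepare_data.php
30 0 * * * php /var/www/jobs/import_results.php
# c) independent scrapers fired off together at 2am, backgrounded; one mail for the batch
0 2 * * * for s in /var/www/jobs/scrapers/*.php; do php "$s" & done; wait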
I'm starting to use asynchronous jobs/messages to do some heavy background work on a PHP page instead of making the user sit there and wait for it. So far I'm leaning towards using Beanstalkd over RabbitMQ or Amazon's SQS, but my question below is a bit more generic and applies to all of them:
Is it better to have one huge worker acting as a dispatcher for multiple job types?
Worker watches all jobs, delegates based on job type
Only one open connection to Beanstalkd
Use meta-data to dispatch Worker objects to do the actual work?
May only process 1 job at a time on the server
Or is it better to have several, smaller focused worker scripts on the same server?
Each worker only watches 1 kind of job
Multiple, sustained connections to Beanstalkd
Less complexity, as each script only does one thing
Other job types don't clog up while waiting for one long job to run
Takes more resources
There are probably several other factors that I don't even know about, so any additional tips would be appreciated.
(If it matters, I'm planning to daemon-ize a PHP-based worker script using Supervisor. For now the worker will only be running on 1 server but that may expand to two in the future...)
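For reference, a minimal sketch of the second option (one small worker per job type) using the Pheanstalk client might look like the following. The "email" tube name and sendEmail() handler are made up, and method signatures vary a bit between Pheanstalk versions:

<?php
// A focused worker that only watches the "email" tube.
require 'vendor/autoload.php';

$queue = Pheanstalk\Pheanstalk::create('127.0.0.1');   // v4+ style; older versions use new Pheanstalk(...)
$queue->watch('email');
$queue->ignore('default');

while (true) {
    $job  = $queue->reserve();                // blocks until a job is available
    $data = json_decode($job->getData(), true);
    try {
        sendEmail($data);                     // hypothetical handler for this one job type
        $queue->delete($job);                 // done, remove the job from the queue
    } catch (Exception $e) {
        $queue->bury($job);                   // park failed jobs for later inspection
    }
}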
I wrote a PHP shell script which queues jobs on CentOS with the 'at' command.
The queued jobs may vary in time and content, which means the system needs to keep quite a large number of jobs.
The application logic would also be a bit difficult to set up with a cronjob.
Is there a limit on the number of queued jobs in CentOS, or is there an alternative way of queuing jobs?
You might consider writing to a "distributed" queue such as dropr or an implementation of AMQP.
A lot of job processes may slow down the whole server. That can happen if many of them are started at the same time (or simply run at the same time). If you have 20 hard-working processes and only 4 CPU cores, they will fight for CPU and the system will have to switch between them very often.
I'd suggest using a message queue and a pool of workers. You can start with a filesystem-based queue (or a MySQL-based one); you only have to install a PHP library, no extra services. Later, if needed, you can switch to a real message queue broker with little effort.
I would recommend using the enqueue library.
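As a rough sketch of that idea, assuming the enqueue/fs filesystem transport (the directory, queue name, and payload are invented, and the method names follow the queue-interop interfaces, so check the current docs for your version):

<?php
require 'vendor/autoload.php';

use Enqueue\Fs\FsConnectionFactory;

// Producer side: drop a job onto a filesystem-backed queue.
$context = (new FsConnectionFactory('file:///var/spool/myqueue'))->createContext();
$queue   = $context->createQueue('heavy_jobs');
$context->createProducer()->send(
    $queue,
    $context->createMessage(json_encode(['task' => 'report', 'id' => 123]))
);

// Consumer side, run from a separate worker process:
$consumer = $context->createConsumer($queue);
while (true) {
    if ($message = $consumer->receive(5000)) {    // wait up to 5 seconds for a message
        // ... do the actual work here ...
        $consumer->acknowledge($message);
    }
}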
I'm working on a PHP web interface that will receive huge traffic. Some insert/update requests will contain images that will have to be resized to some common sizes to speed up their further retrieval.
One way to do it is probably to set up some asynchronous queue on the server. E.g. set up a table in a DB as a task queue that is populated by the PHP requests, and let some other process on the server watch the table and process any waiting tasks. How would you do that? What would be the proper environment for that long-running process? Java, or maybe something lighter would do?
If what you're doing is really high volume then what you're looking for is something like beanstalkd. It is a distributed work queue processor. You just put a job on the queue and then forget about it.
Of course then you need something at the other end reading the queue and processing the work. There are multiple ways of doing this.
The easiest is probably to have a cron job that runs sufficiently often to read the work queue and process the requests. Alternatively you can use some kind of persistent daemon process that is woken up by work becoming available.
The advantage of this kind of approach is that you can tailor the number of workers to how much work needs to get done, and beanstalkd handles distributed processing (in the sense that the listeners can be on different machines).
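For example, putting a job on the queue from the web request can be as small as this (again assuming the Pheanstalk client; the tube name and payload are made up):

<?php
// Fire-and-forget: enqueue the resize job and return to the user immediately.
$queue = Pheanstalk\Pheanstalk::create('127.0.0.1');
$queue->useTube('resize');
$queue->put(json_encode(['image_id' => 123, 'sizes' => ['thumb', 'medium']]));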
You may set up a cron task that checks the queue table. The script that handles the actions waiting in the queue can be written in PHP, for example, so you don't have to change implementation language.
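A minimal sketch of such a cron-driven script, assuming a hypothetical tasks table with id, payload, and status columns (the DSN and the resizeImages() helper are also invented):

<?php
// Run by cron every minute: claim any pending tasks, then process them.
$db = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

$db->beginTransaction();
// Lock pending rows so two overlapping cron runs don't pick up the same task.
$rows = $db->query("SELECT id, payload FROM tasks WHERE status = 'pending' FOR UPDATE")->fetchAll();
foreach ($rows as $row) {
    $db->prepare("UPDATE tasks SET status = 'running' WHERE id = ?")->execute([$row['id']]);
}
$db->commit();

foreach ($rows as $row) {
    resizeImages(json_decode($row['payload'], true));          // hypothetical worker function
    $db->prepare("UPDATE tasks SET status = 'done' WHERE id = ?")->execute([$row['id']]);
}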
I use Perl for long running process in combination with beanstalkd. The nice thing is that the Beanstalkd client for Perl has a blocking reserve method. This way it uses almost no CPU time when there is nothing to do. But when it has to do its job, it will automatically start processing. Very efficient.
You would want to create a daemon which would "sleep" for a period of time and then check the database for items to process. Once it found items, it would process them and check again as soon as it was done; if there were no more, it would go back to sleep.
You can create daemons in any language, including PHP.
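A skeleton of such a daemon loop in PHP (processPendingTasks() here is a stand-in for whatever actually checks the database and handles the items):

<?php
// Runs forever: work while there is work, sleep briefly when there isn't.
while (true) {
    $handled = processPendingTasks();     // hypothetical: returns the number of items processed
    if ($handled === 0) {
        sleep(10);                        // nothing to do, back off before polling again
    }
}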
Alternatively, you can just have PHP execute a script and continue on. So that PHP won't wait for the script to finish before continuing, execute it in the background:
exec("nohup /usr/bin/php -f /path/to/script/script.php > /dev/null 2>&1 &");
You have to be careful with that, though, since you could end up having too many processes running in the background because there is no queueing.
You could use a service like IronWorker to do the image processing in the background and take the load off your servers. Since it's a service, you won't need to manage or set anything up, and it will scale with you as you grow, so if you can process one image with it, you can scale to millions of images with zero effort.
Here's an article on how to do a bunch of image processing transformations:
http://dev.iron.io/solutions/image-processing/
The examples there are in Ruby, but you could do the same stuff with PHP pretty easily.