I have a Laravel app (on Forge) that's posting messages to SQS. I then have another box on Forge which is running Supervisor with queue workers that are consuming the messages from SQS.
Right now, I just have one daemon worker processing a particular tube of data from SQS. When messages come up, they do take some time to process - anywhere from 30 to 60 seconds. The memory usage on the box is fine, but the CPU spikes almost instantly and then everything seems to get slower.
Is there any way to handle this? Should I instead dispatch many smaller jobs (which can be consumed by multiple workers) rather than one large job which can't be split amongst workers?
Also, I noted that Supervisor is only using one of my two cores. Any way to have it use both?
Memory-intensive applications are manageable as long as you can scale, but CPU spikes are hard to manage because they happen within a single core; when that happens everything slows down, and sometimes your servers might even get sandboxed.
To answer your question, I see two possible ways to handle your problem.
Concurrent programming. Keep the job as it is, and check whether the large task can be parallelized (see this). If it can, split the code so that each core handles a specific part of the large task, then gather the results in one coordinating process and assemble the final result. (Additionally, this can be done efficiently if GPU programming is an option.)
Dispatch smaller jobs (as suggested in the question). This is a good approach if you can have multiple workers working on smaller tasks, plus a mechanism to coordinate the results at the end. This could be arranged as a master/worker setup. It makes things easier (because truly parallelizing one large task is hard), but you still need to pull everything together at the end.
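To make the second option concrete, here is a minimal sketch in Laravel terms (the job name ProcessChunk, the table name and the chunk size are made up; this assumes the records to process live in a database table):

<?php
// Instead of one job that walks every record (and pegs one core for
// 30-60 seconds), cut the work into many small queued jobs so several
// workers on several cores can run them concurrently.

use Illuminate\Support\Facades\DB;

DB::table('records')->orderBy('id')->chunk(500, function ($rows) {
    // ProcessChunk is a hypothetical job class; each chunk becomes its
    // own message on SQS and can be picked up by any available worker.
    dispatch(new \App\Jobs\ProcessChunk($rows->pluck('id')->all()));
});

A final job (or the last chunk) can then assemble the partial results, which is the coordination step mentioned above.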
Related
I'm working on a composer package for PHP apps. The goal is to send some data after requests, queued jobs, and other actions that are taken. My initial (and working) idea is to use register_shutdown_function to do it. There are a couple of issues with this approach. Firstly, it increases the page response time, meaning there's the overhead of computing the request plus sending the data via my API. Another issue is that long-running processes, such as queue workers, don't execute this function for a long time, so there can be massive gaps between when the data was created and when it's sent and processed.
My thought is that I could use some sort of temporary storage to store the data and have a cronjob to send it every minute. The only issue I can see with this approach is managing concurrency under high I/O. Because many processes will be writing to the file every (n) ms, there's an issue with reading the file and removing lines that have already been sent.
Another option which I'm trying to desperately avoid is using the client database. This could potentially cause performance issues.
What would be the preferred way to do this?
Edit: the package is essentially a monitoring agent.
There are a couple of issues with this approach. Firstly, it increases the page response time, meaning there's the overhead of computing the request plus sending the data via my API.
I'm not sure you can get around this; there will be additional overhead to doing more work within the context of a web request. I feel that a job-queue-based/asynchronous system is what minimizes this for the client. Whether you choose a local filesystem write or a socket write, you'll still have that extra overhead, but you'll be able to return to the client immediately and not block on the processing of that request.
Another issue is that long-running processes, such as queue workers, don't execute this function for a long time, so there can be massive gaps between when the data was created and when it's sent and processed.
Isn't this the whole point?? :p To return to your client immediately, and then asynchronously complete the job at some point in the future? Using a job queue allows you to decouple and scale your worker pool and webserver separately. Your webservers can be pretty lean because heavy lifting is deferred to the workers.
My thought is that I could use some sort of temporary storage to store the data and have a cronjob to send it every minute.
I would definitely recommend looking at a job queue as opposed to rolling your own. This is a pretty much solved problem and there are many extremely popular open-source projects to handle it (any of the MQs). Some questions to ask of the cron approach:
Will the minute cron job be doing the computation for the client? How do you scale that?
If a file has 1,000 entries, or you scale 10x and it has 10,000, will you be able to do all those computations in less than a minute?
What happens if a server dies? How do you recover?
Inter-process concurrency? Will you need to manage locks for each process?
Will you use a separate file for each process and each minute, to bucket events?
What happens if you want runs more frequent than one minute?
Durability Guarantees
What sort of guarantees are you offering your clients? If a request returns, can the client be sure that the job is persisted and will be completed at some point in the future?
I would definitely recommend choosing a worker queue and having your webserver processes write to it. It's an extremely common problem, with plenty of resources on how to scale it and with clear durability and performance guarantees.
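As a rough sketch of that shape (not the package's actual code; it assumes the phpredis extension and a Redis server, and the key name, payload and sendToApi() helper are invented for illustration):

<?php
// --- in the web request (or the shutdown handler) ---
// One cheap local write, then return to the client immediately.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->lPush('monitoring:events', json_encode([
    'type'        => 'request',
    'duration_ms' => 123,
    'created_at'  => time(),
]));

// --- in a separate long-running worker process (kept alive by Supervisor) ---
function sendToApi(array $event)
{
    // placeholder: POST the event to the monitoring API
}

while (true) {
    $item = $redis->brPop(['monitoring:events'], 5); // block for up to 5 seconds
    if ($item) {
        $payload = $item[1];                          // [key, value]
        sendToApi(json_decode($payload, true));
    }
}

The same idea works with beanstalkd, SQS or any other queue; the point is that the request only pays for one local write, and the worker owns delivery, retries and batching.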
I know Laravel's queue drivers such as redis and beanstalkd and I read that you can increase the number of workers for beanstalkd etc. However I'm just not sure if these solutions are right for my scenario. Here's what I need;
I listen to an XML feed over a socket connection, and the data just keeps coming rapidly, forever. I get tens of XML documents per second.
I read data from this socket line by line, and once I get to the XML closing tag, I send the buffer to another process to be parsed. I used to just encode the XML in base64 and run a separate PHP process for each document: shell_exec('php parse.php ' . $base64XML);
This allowed me to parse this never ending xml data quite rapidly. Sort of a manual threading. Now I'd like to utilize the same functionality with Laravel, but I wonder if there is a better way to do it. I believe Artisan::call('command') doesn't push it to the background. I could of course do a shell_exec within Laravel too, but I'd like to know if I can benefit from Beanstalkd or a similar solution.
So the real question is this: How can I set the number of queue workers for beanstalkd or redis drivers? Like I want 20 threads running at the same time. More if possible.
A slightly less important question is: How many threads is too many? If I had a very high-end dedicated server that can process the load just fine, would creating 500 threads/workers with these tools cause any problems on the code level?
Well, Laravel queues are made for exactly this.
Basically, you create a Job class. All the heavy work you want to do on an XML document goes in there.
Then you read the XML off the socket, and as soon as you have received one complete document, you push it onto your queue.
Later, a queue worker will pick it up from the queue and do the heavy work.
The advantage of this is that if you queue up documents faster than you can work on them, the queue absorbs that high-load moment and holds the tasks for later.
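As an illustration only (the class name is made up and a standard Laravel 5.x queue setup is assumed), the job might look something like this:

<?php
// app/Jobs/ParseXmlDocument.php -- hypothetical job class; the real
// parsing / persistence work lives in handle().

namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class ParseXmlDocument implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    protected $xml;

    public function __construct($xml)
    {
        $this->xml = $xml;
    }

    public function handle()
    {
        $document = simplexml_load_string($this->xml);
        // ... heavy parsing / persistence work goes here ...
    }
}

In the socket-reading loop, once a full document has been buffered you just call dispatch(new \App\Jobs\ParseXmlDocument($buffer)); and let workers started with php artisan queue:work consume the documents in parallel.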
I also wouldn't recommend doing it without a queue (forking a process per document like you did). If too many documents come in, you'll create too many child processes and overload your server. Keeping track of those processes correctly is risky, and it isn't worth it when a simple queue with a fixed number of workers solves all of these problems out of the box.
After a little more research, I found how to set the number of worker processes. I had missed that part in the documentation. Silly me. I still wonder whether Supervisor can handle hundreds of workers for situations like mine. Hopefully someone can share their experience; if not, I'll update this answer once I do a performance test this week.
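For anyone landing here: the usual way to get multiple workers is to let Supervisor start several copies of the same queue:work command via numprocs. A rough sketch (paths, program name and worker count are placeholders to adapt):

; /etc/supervisor/conf.d/laravel-worker.conf
[program:laravel-worker]
command=php /home/forge/example.com/artisan queue:work --sleep=3 --tries=3
process_name=%(program_name)s_%(process_num)02d
numprocs=8                      ; 8 independent worker processes
autostart=true
autorestart=true
user=forge
redirect_stderr=true
stdout_logfile=/home/forge/example.com/worker.log

Each process is a fully independent PHP worker, so running dozens of them is mostly a question of RAM and of how many concurrent jobs your queue backend and database can tolerate.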
I tell you from experience that shell_exec() is not the ideal way to run async tasks in PHP.
It seems OK while developing, but if you have a small VPS (1-2 GB of RAM) you can overload your server, and Apache/nginx/SQL/something could break while you're not around, leaving your website down for hours or days.
I recommend Laravel Queues + Scheduler for these kind of things.
I'm about to undertake a large project, where I'll need scheduled tasks (cron jobs) to run a script that will loop through my entire database of entities and make calls to multiple API's such as Facebook, Twitter & Foursquare every 10 minutes. I need this application to be scalable.
I can already foresee a few potential pitfalls...
Fetching data from APIs is slow.
With thousands of records (and constantly increasing) in my database, it's going to take too long to process every record within the 10-minute window.
Some shared servers only stop scripts running after 30 seconds.
Server issues due to constant intensive scripts running.
My question is how to structure my application...?
Could I create multiple cron jobs to handle small segments of my database (this will have to be automated)?
This will require potentially thousands of cron jobs. Is that sustainable?
How to bypass the 30 sec issue with some servers?
Is there a better way to go about this?
Thanks!
I'm about to undertake a large project, where I'll need scheduled tasks (cron jobs) to run a script that will loop through my entire database of entities and make calls to multiple API's such as Facebook, Twitter & Foursquare every 10 minutes. I need this application to be scalable.
Your best option is to design the application to make use of a distributed database, and deploy it on multiple servers.
You can design it to work in two "ranks" of servers, not unlike the map-reduce approach: lightweight servers that only perform queries and "pre-digest" some data ("map"), and servers that aggregate the data ("reduce").
Once you do that, you can establish a performance baseline and calculate that, say, if you can generate 2000 queries per minute and you can handle as many responses, then you need a new server every 20,000 users. In that "generate 2000 queries per minute" you need to factor in:
data retrieval from the database
traffic bandwidth from and to the control servers
traffic bandwidth to Facebook, Foursquare, Twitter etc.
necessity to log locally (and maybe distill and upload log digests to Command and Control)
An advantage of this architecture is that you can start small - a testbed can be built with a single machine running both Connector, Mapper, Reducer, Command and Control and Persistence. When you grow, you just outsource different services to different servers.
On several distributed computing platforms, this also allows you to run queries faster by judiciously allocating mappers geographically or connectivity-wise, and to reduce the traffic costs between your various platforms by playing with, e.g., Amazon "zones" (Amazon also has a message service that you might find valuable for communicating between the tasks).
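To make the "mapper" idea concrete, here is a rough sketch of how one lightweight worker could claim a slice of the entity table on each run (the table, columns and slicing scheme are invented for illustration):

<?php
// Hypothetical mapper worker: started as `php mapper.php <index> <count>`,
// it only touches entities whose id falls in its slice, e.g.
//   php mapper.php 0 8   -> handles ids where id % 8 == 0
//   php mapper.php 1 8   -> handles ids where id % 8 == 1, and so on.

list(, $workerIndex, $workerCount) = $argv;

$pdo  = new PDO('mysql:host=127.0.0.1;dbname=app', 'user', 'secret');
$stmt = $pdo->prepare(
    'SELECT id, twitter_id, facebook_id FROM entities WHERE MOD(id, ?) = ?'
);
$stmt->execute([(int) $workerCount, (int) $workerIndex]);

foreach ($stmt as $entity) {
    // call Facebook / Twitter / Foursquare for this entity, "pre-digest"
    // the response, and hand the summary off to a reducer / message queue
}

Adding capacity is then just a matter of starting more mapper processes (or servers) with a larger count.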
One note: I'm not sure that PHP is the right tool for this whole thing. I'd rather think Python.
At the 20,000-users-per-instance traffic level, though, I think you'd better take this up with the folks at Facebook, Foursquare etc. At a minimum you might glean some strategies, such as running the connector scripts as independent tasks, each connector sorting its queue by that service's user IDs to leverage what little data locality there might be, and taking advantage of pipelining to squeeze more bandwidth out of less server load. At most, they might point you to bulk APIs or different protocols, or buy you out for a trillion bucks :-)
See http://php.net/manual/en/function.set-time-limit.php to bypass the 30 second limit.
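For what it's worth, this only lifts PHP's own execution-time limit; webserver- or host-level timeouts, if any, still apply, which is another reason to run this kind of work from the CLI or a queue:

<?php
// Remove PHP's max_execution_time limit for a long-running script.
set_time_limit(0);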
For scheduling jobs in PHP look at:
http://www.phpjobscheduler.co.uk/
http://www.zend.com/en/products/server/zend-server-job-queue
I personally would look at a more robust framework that handles job scheduling (see Grails with Quartz) instead of reinventing the wheel and writing your own job scheduler. Don't forget that you are probably going to need to be checking on the status of tasks from time to time so you will need a logging solution around the tasks.
I am considering building a site using PHP, but there are several aspects of it that would perform far, far better if made in node.js. At the same time, large portions of the site need to remain in PHP. This is because a lot of functionality is already developed in PHP, and redeveloping, testing, and so forth would be too large of an undertaking, and quite frankly, those parts of the site run perfectly fine in PHP.
I am considering rebuilding the sections that would benefit most from running in node.js, then having PHP pass the request to node.js using Gearman. This way, I can scale out by launching more workers and have Gearman handle the load distribution.
Our site gets a lot of traffic, and I am concerned whether Gearman can handle this load. I want to keep this question productive, so let's focus largely on the following addressable points:
Can Gearman handle all of our expected load, assuming we have the memory (potentially around 3000+ queued jobs at a time, with several thousand being processed per second)?
Would this run better if I just passed the requests to node.js using cURL, and if so, does node.js provide any way to distribute the load over multiple instances of a given script?
Can gearman be configured in a way that there is no single point of failure?
What are some issues that you guys can see arising both in terms of development and scaling?
I am addressing this wide range of points so anyone viewing this post can collect a wide range of information in one place regarding matters that strongly affect each other.
Of course I will test all of this, but I want to collect as much information as possible before potentially undertaking something like this.
Edit: A large reason I am using Gearman is not its non-blocking structure, but its sheer speed.
I can only speak to your questions on Gearman:
Can Gearman handle all of our expected load, assuming we have the memory (potentially around 3000+ queued jobs at a time, with several thousand being processed per second)?
Short: Yes
Long: Everything has its limit. If your job payloads are inordinately large you may run into issues. Gearman stores its queue in memory, so if your payloads exceed the amount of memory available to Gearman, you'll run into problems.
Can gearman be configured in a way that there is no single point of failure?
Gearman has a plugin/extension/component available for using MySQL as a persistence store. That way, if Gearman or the machine itself goes down, you can bring it right back up where it left off. Multiple worker servers can also help keep things going if other workers go down.
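For illustration, a minimal client/worker pair using the PHP gearman extension; registering more than one job server on both sides is what removes the single point of failure (hostnames and the function name are placeholders):

<?php
// --- client side (the PHP web app): fire-and-forget background job ---
$client = new GearmanClient();
$client->addServer('gearman1.internal', 4730);   // placeholder hosts;
$client->addServer('gearman2.internal', 4730);   // more than one server avoids a SPOF
$client->doBackground('render_page_fragment', json_encode(['id' => 42]));

// --- worker side (this could just as well be a node.js worker) ---
$worker = new GearmanWorker();
$worker->addServer('gearman1.internal', 4730);
$worker->addServer('gearman2.internal', 4730);
$worker->addFunction('render_page_fragment', function (GearmanJob $job) {
    $payload = json_decode($job->workload(), true);
    // ... do the work; return a result here if the client waits for one ...
});
while ($worker->work());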
Node has a cluster module that can do basic load balancing across n processes. You might find it useful.
A common architecture here in nodejs-land is to have your nodes talk http and then use some way of load balancing such as an http proxy or a service registry. I'm sure it's more or less the same elsewhere. I don't know enough about gearman to say if it'll be "good enough," but if this is the general idea then I'd imagine it would be fine. At the least, other people would be interested in hearing how it went I'm sure!
Edit: Remember, number-crunching will block node's event loop! This is somewhat obvious if you think about it, but definitely something to keep in mind.
I have a PHP class that selects data about a file from a MySQL database, processes that data in PHP and then outputs the final data to the command line. Then it moves onto the next file within a foreach loop. ( later I'll be inserting this data into another table ... but that's not important now )
I want to make the processing as fast as possible.
When I run the script and monitor my system using top or iostat:
my cpus are never less than 65% idle ( 4 core EC2 instance )
the PHP script sits at about 45%
mysqld sits at about 8%
my memory usage never passes ~1.5GB ( 8GB of ram total )
there is very little disk IO
What other bottlenecks could be preventing this process from running faster and using the available CPU and Memory?
EDIT 1:
This does not need to be a procedural process and I've designed it to parallelize the processing if necessary. If I can speed it up some, it'd be simpler to leave it as procedural processing.
I've monitored the disk I/O using iostat -x 1 and there is very little.
I need to speed this up in general because it will ultimately be used to process hundreds of millions of files and I'd like it to be as fast as possible as it's part of a larger processing step.
Well, it may be because a single PHP process can only run on one core at a time and you're not loading up your system to the point where it will have four concurrent jobs running continuously.
Example: if PHP were the only thing running on that box, each "job" were inherently tied to a single core, and only one request at a time were being made, I'd fully expect a CPU load of around 25% even though it's already going as fast as it possibly can.
Of course, once that system started ramping up to the point where there are continuously four PHP scripts running, you may find higher CPU utilisation.
In my opinion, you should only really worry about a performance problem if it's an actual problem (such as not being able to keep up with incoming requests). Optimisation just because you want it using more CPU and/or memory resources seems to be looking at it the wrong way around. I would just get it running as fast as possible without worrying about the actual resources used.
If you want to process hundreds of millions of files as fast as possible (as per your update) and PHP is core-bound, you should think about horizontal scaling.
In other words, if the processing of a single file is independent, you can simply start two or three PHP processes and have them process one file each. That will be more likely to get them running on distinct cores.
You can even scale across physical machines if necessary though that's likely to introduce network latency on the DB access (unless the DB is replicated across all the machines as well).
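A rough sketch of that horizontal split, under the assumption that the files are tracked by id in a database table (the table, column and count are made up; requires the CLI and the pcntl extension):

<?php
// Hypothetical launcher: fork one child per core, each processing a
// contiguous slice of ids, so the work lands on separate cores.
$cores = 4;
$maxId = 1000000;                       // current highest file id (example)
$slice = (int) ceil($maxId / $cores);

for ($i = 0; $i < $cores; $i++) {
    if (pcntl_fork() === 0) {           // child process
        $from = $i * $slice + 1;
        $to   = min($maxId, $from + $slice - 1);
        // Each child opens its own PDO connection (never share one across forks).
        $pdo  = new PDO('mysql:host=127.0.0.1;dbname=files_db', 'user', 'secret');
        $stmt = $pdo->prepare('SELECT id, path FROM files WHERE id BETWEEN ? AND ?');
        $stmt->execute([$from, $to]);
        foreach ($stmt as $file) {
            // the existing per-file processing goes here
        }
        exit(0);
    }
}

while (pcntl_wait($status) !== -1);     // parent waits for all children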
Without a fair bit more detail, the options I can provide will be mostly generic ones.
The first problem you need to fix is the word "bottleneck", because it means everything and nothing.
It conjures up an image of some sort of constriction in the flow of whatever the machine does, which happens so fast it must surely be like water running through pipes.
Computation isn't like that.
I find it helps to see how a very simple, slow, computer works, namely Harry Porter's Relay Computer.
You can watch it chug along, at a very slow clock rate, executing every little step within each instruction and finishing them before it starts the next.
(Now, obviously, machines these days are multi-core, pipelined, multi-level cache, blah blah. That's all fine, but that makes you think computation is like water flowing, and that prevents you from understanding software performance.)
Think of any computer and software as just like in that relay machine, except on a scale of nanoseconds, not seconds.
When a computer is calculating in a program, it is executing instructions one after the other. Call that "X".
When a program wants to read or write some bits to external hardware, it has to request that hardware to start, and then it has to find a way to kill time until the result is ready.
Call that "y".
It could be an idle loop, or letting another "thread" run, etc.
So the execution of a program looks like
XXXXXyyyyyyyXXXXXXXXyyyyyyy
If there are more "y"s in there than "X"s we tend to call it "I/O bound".
If not, we might call it "compute bound".
Either way, it's just a matter of proportion of time spent.
If you say it's "memory bound", that's just like I/O except it could be different external hardware.
It still occupies some fraction of the overall sequential timeline.
Now for any given task, there are infinitely many programs that could be written to do it. Some of them will get done in fewer steps than all the others.
When you want performance, you want to get as close as possible to writing one of those programs.
One way to do it is to find "X"s and "y"s that you can get rid of, and get rid of as many as possible.
Now, within a single thread, if you pick an "X" or "y" at random, how can you tell if you can get rid of it?
Find out what its purpose is!
That "X" or "y" represents a moment in the execution sequence of the program, and if you look at the state of the program at that time, and look at the source code, you will be able to figure out why that moment is being spent.
Do that a few times.
As soon as you see two moments in time having a similar less-than-absolutely-necessary purpose,
there are probably a lot more like them, and you've found something you can get rid of.
If you do so, the program will no longer be spending that time.
That's the basic idea behind this method of performance tuning.
Here's an example where that method was used, over several iterations, to remove over 97% of the time spent in a program.
Not all programs are that far away from optimal.
(Some are much farther.)
Many programs just have to do a certain amount of "X"s or "y"s, and there's no way around it.
Nevertheless, it is often very surprising how much room you can find for speedup in otherwise perfectly good code - provided - you forget about "bottlenecks" and look for steps that it's doing, over time, that could be removed or done better.
It's easy.
I suspect you're spending most of your time communicating with MySQL and reading the files. How are you determining that there's very little IO? Communicating with MySQL is going to be over the network, which is very slow compared to direct memory access. Same with reading files.
Looks like the CPU is your bottleneck. Or, to be more precise, a single core is your bottleneck.
100% utilisation of a single core will result in a "25% CPU utilisation" if the other three cores are idle.
Your numbers are consistent with a PHP script running at 100% on a single core, with 5 to 10% utilisation on the other three cores. (Roughly: 100% plus three cores at ~8% each is about 125% out of the 400% total available, i.e. around 30% busy and 70% idle, which matches "never less than 65% idle".)
Sorry to resurrect an old thread, but thought this might help someone out.
I had a similar problem, and it came down to a command-line script that was throwing numerous 'Notice' warnings. That somehow led to it performing slowly and using less than 10% of the CPU. The behaviour only showed up after migrating from Mac OS X to Ubuntu, as the default on OS X seems to be to suppress the warnings. Once I fixed the offending code it performed much better, with processes consistently using around 100% CPU.
As the other guy said, sorry to resurrect an old thread, but this may help somebody.
I had the same issue: running a bunch of processes in parallel, all using MySQL. The machine was slow, with no identifiable bottleneck: not CPU, not memory, not disk.
It turns out that the most probable cause of my problems was that MySQL internal threads were hung on the same semaphore most of the time. Switching from vanilla MySQL 5.5 to MariaDB 10.0 fixed the problem.
Also, to ensure that my machine is always running at full capacity while not being flooded, I have created a Perl script raspawn.pl (on GitHub).
You can read the full sad story here.