SQS is slow publishing requests - PHP

I have a Laravel application which sends data to SQS on nearly every request. However, every so often one of these requests takes several seconds to execute. Attached is a stack trace from New Relic. It seems that the tick() method (within cURL) gets called many times, and the seconds just pile up. It also seems to be making several attempts to connect to the same endpoint, though these are AWS services, so I can't imagine they'd be unresponsive this often.
Any idea why this might occur?
My code is hosted on AWS, on two m4.large instances behind an ELB. In general, the application operates at fairly low throughput: roughly 50-100 requests per minute.
Stack trace: https://ibb.co/f05gLk
Additional thought: given that these instances are in a private subnet, is it possible that the long request times to SQS endpoints are a DNS-related issue?

SQS pushes are sometimes slow (>50 ms), especially if your payload is large. I noticed SQS pushes taking around 80 ms for a fairly small payload (200 KB). To solve this, I shifted the push to Redis, with a batched push from Redis to SQS.
I did not spend time investigating why the pushes are slow.
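For illustration, a minimal sketch of that buffering pattern, assuming the AWS SDK for PHP and the phpredis extension; the Redis key, queue URL, and payload are placeholders:

```php
<?php
require 'vendor/autoload.php';

// Hot path (per request): buffer the message locally instead of calling SQS.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->rPush('sqs:buffer', json_encode($payload)); // sub-millisecond on local Redis

// Drain worker (cron or loop): flush the buffer in batches of up to 10 messages,
// the maximum SQS accepts per SendMessageBatch call.
$sqs = new Aws\Sqs\SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);

$entries = [];
while (count($entries) < 10 && ($body = $redis->lPop('sqs:buffer')) !== false) {
    $entries[] = ['Id' => (string) count($entries), 'MessageBody' => $body];
}
if ($entries !== []) {
    $sqs->sendMessageBatch([
        'QueueUrl' => 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue', // placeholder
        'Entries'  => $entries,
    ]);
}
```

The trade-off is durability: anything still sitting in Redis is lost if the box dies before the drain runs, which is worth weighing against the saved request latency.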

Related

Temporary storage for collecting data prior to sending

I'm working on a Composer package for PHP apps. The goal is to send some data after requests, queue jobs, and other actions the application takes. My initial (and working) idea is to use register_shutdown_function to do it. There are a couple of issues with this approach: firstly, it increases the page response time, since there's the overhead of computing the request plus sending the data via my API. Another issue is that long-running processes, such as queue workers, do not execute this method for a long time, so there can be massive gaps between when the data was created and when it's sent and processed.
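For reference, the current approach looks roughly like this (a minimal sketch; $collectedData and the API endpoint are placeholders, and fastcgi_finish_request() exists only under PHP-FPM, where it at least flushes the response to the client before the upload starts):

```php
<?php
// Registered early in the request; the callback runs after the script finishes.
register_shutdown_function(function () use ($collectedData) {
    // Under PHP-FPM, flush the response first so the network call below
    // does not add to the page time the client perceives.
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request();
    }

    // Hypothetical API call; endpoint and payload shape are placeholders.
    $ch = curl_init('https://api.example.com/ingest');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => json_encode($collectedData),
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 2, // tight bound so workers aren't blocked long
    ]);
    curl_exec($ch);
    curl_close($ch);
});
```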
My thought is that I could use some sort of temporary storage to store the data and have a cron job send it every minute. The only issue I can see with this approach is managing concurrency under high I/O: because many processes will be writing to the file every (n) ms, there's a problem with reading the file and removing lines that have already been sent.
Another option, which I'm desperately trying to avoid, is using the client's database. This could potentially cause performance issues.
What would be the preferred way to do this?
Edit: the package is essentially a monitoring agent.
There are a couple of issues with this approach: firstly, it increases the page response time, since there's the overhead of computing the request plus sending the data via my API.
I'm not sure you can get around this; there will always be additional overhead to doing more work within the context of a web request. Using a job-queue-based/asynchronous system minimizes this for the client. Whether you choose a local filesystem write or a socket write, you'll have that extra overhead, but you'll be able to return to the client immediately instead of blocking on the processing of that request.
Another issue is that long-running processes, such as queue workers, do not execute this method for a long time, so there can be massive gaps between when the data was created and when it's sent and processed.
Isn't this the whole point?? :p To return to your client immediately, and then asynchronously complete the job at some point in the future? Using a job queue lets you decouple and scale your worker pool and web servers separately. Your web servers can stay lean, because the heavy lifting is deferred to the workers.
My thought is that I could use some sort of temporary storage to store the data and have a cron job send it every minute.
I would definitely recommend looking at a proper job queue as opposed to rolling your own. This problem is pretty much solved, and there are many extremely popular open-source projects to handle it (any of the MQs). Will the one-minute cron job be doing the computation for the client? How do you scale that? If a file has 1,000 entries, or you grow 10x and it has 10,000, can you do all those computations in under a minute? What happens if a server dies, and how do you recover? What about inter-process concurrency: will you need to manage locks for each process? Will you use a separate file per process and per minute to bucket events? What happens if you want runs more frequent than one minute?
Durability Guarantees
What sort of guarantees are you offering your clients? If a request returns, can the client be sure that the job is persisted and will be completed at some point in the future?
I would definitely recommend choosing a worker queue and having your webserver processes write to it. It's an extremely common problem, with plenty of resources on how to scale it, and with clear durability and performance guarantees.
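As a concrete illustration of "webserver writes to the queue", here is a minimal sketch using the AWS SDK for PHP (the queue URL and event payload are placeholders). Once sendMessage() returns, SQS has durably stored the message, which is exactly the guarantee discussed above:

```php
<?php
require 'vendor/autoload.php';

$sqs = new Aws\Sqs\SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);

// The web process only persists the event; a separate worker pool consumes it.
$sqs->sendMessage([
    'QueueUrl'    => 'https://sqs.us-east-1.amazonaws.com/123456789012/events', // placeholder
    'MessageBody' => json_encode(['type' => 'page_view', 'ts' => time()]),      // hypothetical event
]);
```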

Laravel - Queue Workers, High CPU

I have a Laravel app (on Forge) that's posting messages to SQS. I then have another box on Forge which is running Supervisor with queue workers that are consuming the messages from SQS.
Right now, I just have one daemon worker processing a particular tube of data from SQS. When messages come up, they do take some time to process - anywhere from 30 to 60 seconds. The memory usage on the box is fine, but the CPU spikes almost instantly and then everything seems to get slower.
Is there any way to handle this? Should I instead dispatch many smaller jobs (which can be consumed by multiple workers) rather than one large job which can't be split amongst workers?
Also, I noted that Supervisor is only using one of my two cores. Any way to have it use both?
Memory-intensive applications are manageable as long as you can scale, but CPU spikes are harder to deal with, since a spike happens within a single core; when that happens, your servers might even get sandboxed.
To answer your question, I see two possible ways to handle your problem.
Concurrent programming: keep the job as it is and check whether the larger task can be parallelized (see this). If it can, parallelize the code so that each core handles a specific part of the large task, then gather the results in one coordinating process and assemble the final result. (Additionally, this can be done efficiently if GPU programming is considered.)
Dispatch smaller jobs (as suggested in the question): this is a good approach if you can manage multiple workers working on smaller tasks, with a mechanism at the end to coordinate the results. It could be arranged as a master/worker setup. This makes things easier (fully parallelizing a problem is a bit hard), but you still need to bring everything back together; see the sketch below.
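A minimal Laravel-style sketch of that second option; ProcessChunk is a hypothetical queued job and the chunk size is arbitrary:

```php
<?php
// Instead of one job that runs for 30-60 seconds on one core, split the
// input and dispatch one small job per chunk. Each chunk can then be
// picked up by a different worker process (and therefore a different core).
foreach (array_chunk($records, 100) as $chunk) {
    dispatch(new ProcessChunk($chunk)); // ProcessChunk: hypothetical ShouldQueue job
}
```

On the Supervisor side, note that a single queue worker is one single-threaded PHP process; to use the second core, run two worker processes (e.g. numprocs=2 in the Supervisor program config) rather than expecting one worker to spread across cores.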

High-CPU PHP (Symfony) process

We are using https://github.com/php-amqplib/php-amqplib to consume messages from RabbitMQ. We've got several consumers running happily with no issues. Today we discovered that one of our consumer processes uses around 7% of the host's CPU when idle (no messages in the queue), whereas the rest use about 1% each.
On top of that, when switching this process on and off we see large changes in the CPU utilization of our DB (an AWS RDS Postgres instance). With three consumer processes running, our DB sits above 30% CPU utilization all the time (even when there's nothing in the queue).
We've got a standard Symfony configuration, and our consumers are run using app/console rabbitmq:consumer -w consumer_name. As far as we can tell, there is nothing special about the consumer in question. We are completely out of clues here, so any help would be much appreciated.
More details:
When we turn on the consumer, we can see the same set of queries running a huge number of times against the DB (200,001 times in the space of 10 minutes). There are no unacked messages in the queue, and the consumer otherwise processes messages correctly. The query is a SELECT that would normally run as part of the consumer's logic.

Valid Architecture for a Message Queue & Worker System in PHP?

I'm trying to wrap my head around the message queue model and jobs that I want to implement in a PHP app:
My goal is to offload messages/data that need to be sent to multiple third-party APIs, so that accessing them doesn't slow down the client. Sending the data to a message queue seems ideal for this.
I considered using just Gearman to hold the MQ/jobs, but I wanted to use a cloud queue service like SQS or Rackspace Cloud Queues so I wouldn't have to manage the messages.
Here's a diagram of what I think I should do:
Questions:
My workers would be written in PHP. Do they all have to be polling the cloud queue service? That could get expensive, especially when you have a lot of workers.
I was thinking of having just one worker polling the queue and, when there are messages, notifying the other workers that they have jobs; I'd just have to keep this one worker online, perhaps using supervisord. Is this polling method better than using an MQ that can notify? How should I poll the MQ: once every second, or as fast as it can? And should I add more polling workers if I see it slowing down?
I was also thinking of having a single queue for all the messages, with a worker monitoring it and distributing the messages to other cloud MQs depending on where they need to be processed, since one message might need to be processed by two different workers.
Would I still need Gearman to manage my workers, or can I just use supervisord to spin workers up and down?
Isn't it more effective and faster to send a notification to the main worker whenever a message is sent, versus polling the MQ? I assume I would then need to use Gearman to notify my main worker that the MQ has a message so it can start checking it. Or, if I have 300 messages per second, would this generate 300 jobs to check the MQ?
Basically, how could I check the MQ as efficiently and effectively as possible?
Suggestions or corrections to my architecture?
My suggestions basically boil down to: Keep it simple!
With that in mind, my first suggestion is to drop the DispatcherWorker. From my current understanding, the sole purpose of that worker is to listen to the MAIN queue and forward messages to the different task queues. Your application should instead take care of enqueuing the right message onto the right queue (or topic) in the first place.
Answering your questions:
My workers would be written in PHP. Do they all have to be polling the cloud queue service? That could get expensive, especially when you have a lot of workers.
Yes, there is no free lunch. You can of course adapt and optimize your workers' poll rate to application usage (increase the poll rate when more messages arrive), to time of day or week (if your users are active at specific times), and so on. Keep in mind, though, that the engineering cost might soon exceed the cost of unoptimized polling.
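If you do stay with polling against SQS, long polling makes it much cheaper: ReceiveMessage blocks server-side for up to 20 seconds until a message arrives, so idle workers issue a handful of requests per minute instead of hammering the API. A minimal sketch with the AWS SDK for PHP ($queueUrl and processMessage() are placeholders):

```php
<?php
require 'vendor/autoload.php';

$sqs = new Aws\Sqs\SqsClient(['region' => 'us-east-1', 'version' => '2012-11-05']);

while (true) {
    // Blocks up to 20s; empty responses (and billed requests) drop sharply
    // compared to tight-loop short polling.
    $result = $sqs->receiveMessage([
        'QueueUrl'            => $queueUrl, // placeholder
        'WaitTimeSeconds'     => 20,
        'MaxNumberOfMessages' => 10,
    ]);

    foreach ($result->get('Messages') ?? [] as $message) {
        processMessage($message['Body']); // hypothetical handler
        $sqs->deleteMessage([
            'QueueUrl'      => $queueUrl,
            'ReceiptHandle' => $message['ReceiptHandle'],
        ]);
    }
}
```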
Instead, you might consider push queues (see below).
I was thinking of having just one worker polling the queue and, when there are messages, notifying the other workers that they have jobs; I'd just have to keep this one worker online, perhaps using supervisord. Is this polling method better than using an MQ that can notify? How should I poll the MQ: once every second, or as fast as it can? And should I add more polling workers if I see it slowing down?
This sounds too complicated. Communication is unreliable; reliable message queues, however, exist. If you don't want to lose data, stick to the message queues and don't invent custom protocols.
I was also thinking of having a single queue for all the messages, with a worker monitoring it and distributing the messages to other cloud MQs depending on where they need to be processed, since one message might need to be processed by two different workers.
As already mentioned, the application should enqueue the message onto multiple queues as needed. This keeps things simple and in one place.
Would I still need Gearman to manage my workers, or can I just use supervisord to spin workers up and down?
There are many message queues and even more ways to use them. In general, if you are using poll queues you'll need to keep your workers alive yourself. If, however, you are using push queues, the queue service calls an endpoint you specify, so you only need to make sure your workers are available.
Basically, how could I check the MQ as efficiently and effectively as possible?
This depends on your business requirements and the job your workers do. What time spans are critical: seconds, minutes, hours, days? For example, if you use workers to send emails, it shouldn't take hours, ideally just a couple of seconds. Is there a difference (for the user) between polling every 3 seconds and every 15 seconds?
Solving your problem (with push queues):
My goal is to offload messages/data that need to be sent to multiple third-party APIs, so that accessing them doesn't slow down the client. Sending the data to a message queue seems ideal for this. I considered using just Gearman to hold the MQ/jobs, but I wanted to use a cloud queue service like SQS or Rackspace Cloud Queues so I wouldn't have to manage the messages.
Indeed the scenario you describe is a good fit for message queues.
As you mentioned, you don't want to manage the message queue itself; maybe you don't want to manage the workers either? This is where push queues come in.
Push queues basically call your worker. For example, Amazon Elastic Beanstalk Worker Environments do the heavy lifting (the polling) in the background and simply call your application with an HTTP request containing the queue message (refer to the docs for details). I have personally used the AWS push queues and have been happy with how easy they are. Note that there are other push queue providers, such as Iron.io.
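On the receiving end, the worker is then just an ordinary HTTP endpoint. A minimal sketch of a handler for the POSTs the Elastic Beanstalk worker daemon sends (handleJob() is a placeholder; the daemon deletes the message on a 2xx response and retries otherwise):

```php
<?php
// Routed to e.g. POST /worker/queue-message; the SQS message body
// arrives as the raw HTTP request body.
$payload = json_decode(file_get_contents('php://input'), true);

try {
    handleJob($payload);     // hypothetical job handler
    http_response_code(200); // 2xx: the daemon deletes the message from the queue
} catch (Throwable $e) {
    http_response_code(500); // non-2xx: the message becomes visible again for retry
}
```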
Since you mentioned you are using PHP: there is the QPush Bundle for Symfony, which handles exactly these incoming message requests. You can have a look at its code to roll your own solution.
I would recommend a different route: sockets. ZMQ is an example of an already-written socket-based library. With sockets you can create a queue and decide what to do with messages as they come in, and the machine stays in standby, using minimal resources, while waiting for a message to arrive.
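A minimal sketch of that socket route, assuming the php-zmq extension; the port and message format are arbitrary:

```php
<?php
// Worker: a PULL socket blocks in recv() and uses almost no CPU while idle.
$context = new ZMQContext();
$pull = $context->getSocket(ZMQ::SOCKET_PULL);
$pull->bind('tcp://*:5555');

while (true) {
    $message = $pull->recv(); // blocks until a producer pushes something
    // ... process $message ...
}

// Producer (separate process): PUSH round-robins across connected PULL workers.
// $push = (new ZMQContext())->getSocket(ZMQ::SOCKET_PUSH);
// $push->connect('tcp://127.0.0.1:5555');
// $push->send(json_encode($job));
```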

How to use only 2 broker connections across 15+ processes?

The scenario: I am building a message queue model using RabbitMQ and php-amqplib. The model has 15 programs; each program consumes a message from one queue and publishes a message to another. All of these queues are different (around 30 queues in total). But I want to use only 2 connections across all these programs, one for publishing and one for consuming; I don't want to create broker connections in each program. I can't work out how to do this. Any help? Thanks in advance.
If you want to use only 2 connections, the 15 producers and consumers should be part of a single process and run as threads, with two additional threads: one for consuming and one for publishing.
The consumer thread consumes messages and pushes them to the worker thread pool.
Once the worker threads have completed their work, the response is pushed to internal storage inside the publisher, which in turn pops the responses onto the Rabbit queues.
A few points to keep in mind:
Throughput: the number of consumers and producers is decided by the throughput you want your application to achieve.
Scalability: with a fixed number of consumers and producers, you can only scale your application up to a limit.
Flow control: the number of consumers can be crucial in avoiding connection-based flow control.
Internal message caching by the consumer thread (QoS): set a well-defined QoS prefetch value based on the desired throughput.
Also check whether multi-threading is supported by the AMQP library you intend to use; if it is, you can share a connection across threads. A channel-based sketch follows below.
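Plain PHP has no built-in threads, so the closer equivalent with php-amqplib is channels: open the two connections once and multiplex every queue over channels on them. A rough sketch (host, credentials, and queue names are placeholders; $msg->ack() assumes a reasonably recent php-amqplib):

```php
<?php
require 'vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

// Exactly two TCP connections, as the question asks.
$consumeConn = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$publishConn = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$publishChannel = $publishConn->channel();

$channels = [];
foreach (['queue_1', 'queue_2' /* ... 'queue_15' */] as $queue) {
    // Channels are cheap; each is multiplexed over the single consume connection.
    $channel = $consumeConn->channel();
    $channel->basic_qos(null, 10, null); // cap unacked messages per channel
    $channel->basic_consume($queue, '', false, false, false, false,
        function (AMQPMessage $msg) use ($publishChannel, $queue) {
            $result = $msg->getBody(); // ... real processing goes here ...
            $publishChannel->basic_publish(new AMQPMessage($result), '', $queue . '_out');
            $msg->ack();
        }
    );
    $channels[] = $channel;
}

// One loop services every consumer channel without extra connections.
while (true) {
    foreach ($channels as $channel) {
        $channel->wait(null, true); // non-blocking poll of this channel
    }
    usleep(10000);
}
```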
