I have a Laravel app (iOS API) that pushes data to SQS to be processed in the background. Depending on the request, we need to dispatch anywhere from 1 to 4 jobs to SQS. For example:
Dispatch a job to SQS, to be processed by a Worker:
for connecting to a socket service (Pusher)
for connecting to Apple's APNS service (Push Notifications)
for sending data to Loggly for basic centralized request logging
for storing analytics data in a SQL database (for the time being)
The problem is, we might have a feature like "chat", which is a pretty light request as far as server processing is concerned, however it needs to connect to SQS three times over to send:
1) Socket push to all the devices
2) Analytics processing
3) Centralized Request / Error Logging
In total, these connections end up doubling or tripling the time that the rest of request takes. ie. POSTing to /chat might otherwise take about 40-50ms, but with SQS, it takes more like 100 - 120ms
Should i be approaching this differently? Is there a way batch these to SQS so we only need to connect once rather than 3 times?
As Michael suggests in the comments, one way to do it is to use a sendMessageBatch request to send up to 10 messages to one single queue. When you're using multiple queues, and maybe even if you're not, there's also another approach.
If you can approach all the different messages as the same element, namely a notification of an action to an arbitrary receiver, you will find yourself in the fanout pattern. This is a commonly used pattern in which multiple receivers need to act on a single message, a single action.
Although AWS SQS doesn't natively support it, in combination with AWS Simple Notification Service you can actually achieve the same thing. Jeff Barr wrote a short blog post on the fanout setup using SNS and SQS a while ago (2012). It boils down to sending a notification to SNS that'll trigger messages to be posted on multiple SQS queues.
If anyone is curious, I released an MIT package to batch dispatch :)
https://packagist.org/packages/atymic/laravel-bulk-sqs-queue
It uses async and sendMessageBatch under the hood.
Related
I was looking for a good way to manage a lot of background tasks, and i found out AWS SQS.
My software is coded in PHP. To complete a background task, the worker must be a CLI PHP application.
How am i thinking of acclompishing this with AWS SQS:
Client creates a message (message = task)
Message Added to Mysql DB
A Cron Job checks mysql db for messages and adds them to SQS queue
SQS Queue Daemon listents to queue for messages and sends HTTP POST requests to worker when a message is received
Worker receives POST request and forks a php shell_execute with parameters to do the work
Its neccessary to insert messages in MySQL because they are scheduled to be completed at a certain time
A little over complicated.
I need to know what is the best way to do this.
I would use AWS Lambda, with an SQS trigger to asynchronoulsy process messages dropped in the queue.
First, your application can post messages directly to SQS, there is no need to first insert the message in MySQL and have a separate daemon to feed the queue.
Secondly, you can write an AWS Lambda function in PHP, check https://aws.amazon.com/blogs/apn/aws-lambda-custom-runtime-for-php-a-practical-example/
Thirdly, I would wire the Lambda function to the queue, following this documentation : https://aws.amazon.com/blogs/apn/aws-lambda-custom-runtime-for-php-a-practical-example/
This will simplify your architecture (less moving parts, less code) and make it more scalable.
I am trying to set up an API system that synchronously communicates with a number of workers in Laravel. I use Laravel 5.4 and, if possible, would like to use its functionality whenever possible without too many plugins.
What I had in mind are two servers. The first one with a Laravel instance – let’s call it APP – receiving and answering requests from and to a user. The second one runs different workers, each a Laravel instance. This is how I see the workflow:
APP receives a request from user
APP puts request on a queue
Workers look for jobs on the queue and eventually finds one.
Worker resolves job
Worker responses to APP OR APP finds out somehow that job is resolved
APP sends response to user
My first idea was to work with queues and beanstalkd. The problem is that this all seem to work asynchronously. Is there a way for the APP to wait for the result of one of the workers?
After some more research I stumbled upon Guzzle. Would this be a way to go?
EDIT: Some extra info on the project.
I am talking about a Restful API. E.g. a user sends a request in the form of "https://our.domain/article/1" and their API token in the header. What the user receives is a JSON formatted string like {"id":1,"name":"article_name",etc.}
The reason for using two sides is twofold. At one hand there is the use of different workers. On the other hand we want all the logic of the API as secure as possible. When a hack attack is made, only the APP side would be compromised.
Perhaps I am making things all to difficult with the queues and all that? If you have a better approach to meet the same ends, that would of course also help.
I know your question was how you could run this synchronously, I think that the problem that you are facing is that you are not able to update the first server after the worker is done. The way you could achieve this is with broadcasting.
I have done something similar with uploads in our application. We use a Redis queue but beanstalk will do the same job. On top of that we use pusher which the uses sockets that the user can subscribe to and it looks great.
User loads the web app, connecting to the pusher server
User uploads file (at this point you could show something to tell the user that the file is processing)
Worker sees that there is a file
Worker processes file
Worker triggers and event when done or on fail
This event is broadcasted to the pusher server
Since the user is listening to the pusher server the event is received via javascript
You can now show a popup or update the table with javascript (works even if the user has navigated away)
We used pusher for this but you could use redis, beanstalk and many other solutions to do this. Read about Event Broadcasting in the Laravel documentation.
I'm trying to wrap my head around the message queue model and jobs that I want to implement in a PHP app:
My goal is to offload messages / data that needs to be sent to multiple third party APIs, so accessing them doesnt slow down the client. So sending the data to a message queue is ideal.
I considered using just Gearman to hold the MQ/Jobs, but I wanted to use a Cloud Queue service like SQS or Rackspace Cloud Queues so i wouldnt have to manage the messages.
Here's a diagram of what I think I should do:
Questions:
My workers, would be written in PHP they all have to be polling the cloud queue service? that could get expensive especially when you have a lot of workers.
I was thinking maybe have 1 worker just for polling the queue, and if there are messages, notify the other workers that they have jobs, i just have to keep this 1 worker online using supervisord perhaps? is this polling method better than using a MQ that can notify? How should I poll the MQ, once every second or as fast as it can poll? and then increase the polling workers if I see it slowing down?
I was also thinking of having a single queue for all the messages, then the worker monitoring that distributes the messages to other cloud MQs depending on where they need to be processed, since 1 message might need to be processed by 2 diff workers.
Would I still need gearman to manage my workers or can I just use supervisord to spin workers up and down?
Isn't it more effective and faster to also send a notification to the main worker whenever a message is sent vs polling the MQ? I assume I would the need to use gearman to notify my main worker that the MQ has a message, so it can start checking it. or if I have 300 messages per second, this would generate 300 jobs to check the MQ?
Basically how could I check the MQ as efficiently and as effectively as possible?
Suggestions or corrections to my architecture?
My suggestions basically boil down to: Keep it simple!
With that in mind my first suggestion is to drop the DispatcherWorker. From my current understanding, the sole purpose of the worker is to listen to the MAIN queue and forward messages to the different task queues. Your application should take care of enqueuing the right message onto the right queue (or topic).
Answering your questions:
My workers, would be written in PHP they all have to be polling the cloud queue service? that could get expensive especially when you have a lot of workers.
Yes, there is no free lunch. Of course you could adapt and optimize your worker poll rate by application usage (when more messages arrive increase poll rate) by day/week time (if your users are active at specific times), and so on. Keep in mind that engineering costs might soon be higher than unoptimized polling.
Instead, you might consider push queues (see below).
I was thinking maybe have 1 worker just for polling the queue, and if there are messages, notify the other workers that they have jobs, i just have to keep this 1 worker online using supervisord perhaps? is this polling method better than using a MQ that can notify? How should I poll the MQ, once every second or as fast as it can poll? and then increase the polling workers if I see it slowing down?
This sounds too complicated. Communication is unreliable, there are reliable message queues however. If you don't want to loose data, stick to the message queues and don't invent custom protocols.
I was also thinking of having a single queue for all the messages, then the worker monitoring that distributes the messages to other cloud MQs depending on where they need to be processed, since 1 message might need to be processed by 2 diff workers.
As already mentioned, the application should enqueue your message to multiple queues as needed. This keeps things simple and in place.
Would I still need gearman to manage my workers or can I just use supervisord to spin workers up and down?
There are so many message queues and even more ways to use them. In general, if you are using poll queues you'll need to keep your workers alive by yourself. If however you are using push queues, the queue service will call an endpoint specified by you. Thus you'll just need to make sure your workers are available.
Basically how could I check the MQ as efficiently and as effectively as possible?
This depends on your business requirements and the job your workers do. What time spans are critical? Seconds, Minutes, Hours, Days? If you use workers to send emails, it shouldn't take hours, ideally a couple of seconds. Is there a difference (for the user) between polling every 3 seconds or every 15 seconds?
Solving your problem (with push queues):
My goal is to offload messages / data that needs to be sent to multiple third party APIs, so accessing them doesnt slow down the client. So sending the data to a message queue is ideal. I considered using just Gearman to hold the MQ/Jobs, but I wanted to use a Cloud Queue service like SQS or Rackspace Cloud Queues so i wouldnt have to manage the messages.
Indeed the scenario you describe is a good fit for message queues.
As you mentioned you don't want to manage the message queue itself, maybe you do not want to manage the workers either? This is where push queues pop in.
Push queues basically call your worker. For example, Amazon ElasticBeanstalk Worker Environments do the heavy lifting (polling) in the background and simply call your application with an HTTP request containing the queue message (refer to the docs for details). I have personally used the AWS push queues and have been happy with how easy they are. Note, that there are other push queue providers like Iron.io.
As you mentioned you are using PHP, there is the QPush Bundle for Symfony, which handles incoming message requests. You may have a look at the code to roll your own solution.
I would recommend a different route, and that would be to use sockets. ZMQ is an example of a socket based library already written. With sockets you can create a Q and manage what to do with messages as they come in. The machine will be in stand-by mode and use minimal resources while waiting for a message to come in.
We are currently developing a mobile app for iOS and Android. For this, we need a stable webservices.
Requirements: - Based on PHP and MySQL, must be blazing fast, must be scalable
I've created a custom-coded simple webservices with multiple endpoints to allow passing data from the app to our database, and vice versa.
My Question:
our average response time with my custom coded solution is below 100ms (measured using newrelic) for normal requests (say, updating a DB field, or performing INSERT INTO). This is without any load however (below 100 users daily). When we are creating outbound requests (specifically, sending E-Mail using SendGrid PHP-Framework) we are seeing a response time of > 1000ms. It appears that the request is "waiting" for a response from Sendgrid. Is it possible to tell the script not to "wait for a response"? This is not really ideal. My idea was to store all "pending" requests to a separate table, and then using a cron to run through all "pending" requests and mark them as "completed". Is this a viable solution? And will one cron each minute be enough for processing requests (possible delay of 1min for each E-Mail)?
As always, any replies or suggestions are very appreciated. Thanks in advance!
To answer the first part of your question: Yes you can make asynchronous requests with PHP, and even ignore the service's response. However, as you correctly say it's not a super great solution.
Asynchronous Requests
This excellent blog post on PHP Asynchronous Requests by Segment.io comes to several conclusions:
You can open a socket and write to it, as described by this Stack Overflow Topic - However, it seems that this is actually blocking and fairly slow (300ms in their tests).
You can write to a log file and then process it in another way (essentially a queue, like you describe) - However, this requires another process to read the log and process it. Using the file system can be slow, and shared files can cause all sorts of problems.
You can fork a cURL request - However, this means you aren't waiting for a response, so if SendGrid (or some other service) responds with an error, you can't catch it and react.
Opinion Land
We're now entering semi-opinion land, but queues as you describe (such as a mySQL one with a cron job, or a text file, or something else) tend to be very scalable as you can throw workers at the queue if you need it to process faster. These can be outside your user facing system (and therefor not share resources).
Queues
With a queue, you'd have a separate service that would be responsible for sending an email with SendGrid (e.g.). It would pull tasks off a queue (e.g. "send an email to Nick")and then execute on it.
There are several ways to implement queues that you can process.
You can write your own - As you seem to want to stay on PHP/mySQL, if you do this you'll need to take into account a bunch of queueing problems and weird edge cases. However, you'll have absolute control and for a simple application maybe this will work.
You can implement a self hosted task queue - Celery is meant to be a distributed task queue, øMQ (ZeroMQ) and RabbitMQ can also be used as Task Queues. These are meant to be fast and distributed and have had a lot of thought put into them. You'd need to benchmark them in your system to see if they speed it up. It'd also mean you have to host additional pieces yourself. This however, is likely to be the fastest solution from a communication standpoint.
You can pass things off to a hosted task queue - IronMQ and Amazon SQS are both cool hosted solutions which means you wouldn't need to dedicate resources to them, additionally with IronWorkers (e.g.) you could have the other service taken care of. However, since you're trying to optimize a request to an external service, this probably isn't the solution in this scenario.
Queueing Emails
On the topic of queuing emails (specifically), this is something common to email senders. Like with everything else it means you can have better reliability (because if a service down the line fails you can keep it in the queue and retry).
With email however, there's some specific services out there for queueing messages. These are SMTP Servers. Theoretically you can setup a server like sendmail and then set SendGrid as your "smarthost" or relay and have the server send to SendGrid. It then queues and deals with service interruptions and sends mail with little additional code. However, SMTP servers are pains to deal with, even if they're just forwarding messages. Additionally, SMTP is even slower than HTTP to establish a connection and therefor probably not what you want, but it's good to know.
Another possible solution if you control your own server environment that will speed up your email sending and your application is to install a mail server such as Postfix locally. You then configure Postfix to use your Sendgrid credentials, so any email sent will go from your server to sendgrid.
This is not a PHP solution, but removes the need for writing your own customer solution. If you set Postfix as the default mail server. You can then just use the php mail() function to send email.
https://sendgrid.com/docs/Integrate/Mail_Servers/postfix.html
I developed an android app where it subscribes to a queue and also publishes to other queues. At a time it publishes the same message to two different queues, one of them is a queue named "Queue" and now from a appfog instance i need to subscribe to the "Queue" and consume messages and insert them in a mysql db.
I created a php standalone app for the above purpose with codeigniter. By some reason the worker app loses its connection to rabbitmq. i would like to know the best way to do this.
How can a worker app on appfog can sustain the application restarts.
what of kind of thing i need to use to solve the above problem.
What version of RabbitMQ are you using? In 3.0 heartbeats were enabled by default for connections. If there is a regular disconnect (10 minutes by default) period then it is probably the heartbeat that is disconnecting.
Your client (hopefully) can configure a longer during the connection negotiation or you can have your PHP app send a heartbeat at a regular interval. Here is the announcement post that describes it a bit.