I was wondering, because I cannot find anything about it in Symfony's documentation or other resources, whether PHP's symfony/messenger can handle messages in "bulk" with any async transport.
For example: grab 20 messages from the bus, handle those 20 messages, and ack or reject each of them.
I know RabbitMQ has a feature to grab n-amount of messages from the queue, and process all of them in a single run.
In some cases this performs better than scaling up the number of async workers.
Does anybody have any leads, resources or experience with it? Or am I trying to resolve something by going against the idea of symfony/messenger?
[update]
I'm aware that bulk messages are not part of the (async) messaging concept, and that each message should be processed individually. But some message brokers have implemented a feature to "grab" X messages from a queue and process them (by sending an acknowledgement or a rejection for each, or otherwise). I know that handling multiple messages in a single iteration increases the complexity of a consumer, but in some cases it improves performance.
I've used this concept of consuming multiple messages in a single iteration many times, but never with php's symfony/messenger.
This was not natively possible prior to Symfony 5.4.
Symfony 5.4 added a BatchHandlerInterface, which lets you handle your messages in batches (and choose the size of the batch).
You can find more info here:
Symfony - Handle messages in batches
GitHub PR of the feature
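For reference, here is a minimal sketch of what such a handler can look like, assuming symfony/messenger 5.4 or later is installed. SendNewsletter and NewsletterBatchHandler are made-up names for illustration, and the batch size of 20 is simply the number from the question. BatchHandlerInterface, BatchHandlerTrait and Acknowledger come from the Messenger component. Note that each message is still acked or nacked individually; the handler just gets to look at them together.

```php
<?php

use Symfony\Component\Messenger\Handler\Acknowledger;
use Symfony\Component\Messenger\Handler\BatchHandlerInterface;
use Symfony\Component\Messenger\Handler\BatchHandlerTrait;

// Hypothetical message class, not part of Symfony.
final class SendNewsletter
{
    /** @var string */
    public $email;

    public function __construct(string $email)
    {
        $this->email = $email;
    }
}

// Hypothetical handler; register it like any other message handler.
final class NewsletterBatchHandler implements BatchHandlerInterface
{
    use BatchHandlerTrait;

    public function __invoke(SendNewsletter $message, ?Acknowledger $ack = null)
    {
        // The trait buffers the message and calls process() once shouldFlush() returns true.
        return $this->handle($message, $ack);
    }

    /** @param array $jobs list of [message, Acknowledger] pairs */
    private function process(array $jobs): void
    {
        foreach ($jobs as [$message, $ack]) {
            try {
                // Placeholder for the real work, e.g. one bulk API call for the whole batch.
                $ack->ack($message->email);
            } catch (\Throwable $e) {
                // Only this message is rejected; the others in the batch are unaffected.
                $ack->nack($e);
            }
        }
    }

    // Overrides the trait's default threshold; 20 is an assumption taken from the question.
    private function shouldFlush(): bool
    {
        return \count($this->jobs) >= 20;
    }
}
```

As far as I can tell, the worker also flushes a partially filled batch when it goes idle or stops, so messages should not sit in the buffer forever, but check this against the version you run.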
First, I think you have the wrong concept. There is no such thing as "messages in bulk" in the queue world.
The idea of a queue is that a consumer receives one message and is responsible for letting the queue know that the message was acknowledged, so it can be deleted. If this does not happen within X time, the message becomes visible to other consumers again.
If the messenger gets 20 messages from the queue, it still processes them one by one and acknowledges each message once it has finished processing it. These 20 messages are "hidden" from other consumers for some time (how long depends on the configuration of the queue). This also answers the question about multiple consumers.
Related
I have a question concerning the third RabbitMQ tutorial. I am trying to implement something similar, except there is no guarantee the consumer(s) would be running at the time the producer sends a message to the exchange.
So, I have my producer which publishes the messages to a fanout exchange:
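use PhpAmqpLib\Message\AMQPMessage;
$channel->exchange_declare('my_exchange', 'fanout', false, false, false);
$channel->basic_publish(new AMQPMessage('my_message'), 'my_exchange'); // basic_publish expects an AMQPMessage, not a string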
In my consumers, I declare queues, which I then bind to the exchange:
list($queueName,, ) = $channel->queue_declare("", false, false, true, false);
$channel->queue_bind($queueName, 'my_exchange');
And this is where my problem has its root. The tutorial says:
The messages will be lost if no queue is bound to the exchange yet,
but that's okay for us; if no consumer is listening yet we can safely
discard the message.
Is there a way to somehow preserve those messages, so when a consumer starts, it would access the previously sent messages? The only way I figured out how to do it is to declare the same queue in my producer and my publisher, but it kind of defeats the purpose of having an exchange and separate queues for different consumers.
The queues need to exist; it doesn't really matter who or what creates them. It can be the producer (although I would strongly discourage this), the consumer, some third admin app that just creates queues via the REST API, rabbitmqctl... If you want to consume from the queue(s) later, just make sure that they're durable and that the TTL for messages is long enough (and use durable, i.e. persistent, messages if needed). But take care that your queue(s) don't go into flow state.
The only way I figured out how to do it is to declare the same queue
in my producer and my publisher, but it kind of defeats the purpose of
having an exchange and separate queues for different consumers.
First - I think you meant to say in my producer and my subscriber :)
Second, separate queues for consumers (or a queue per consumer) is just how this example works. Bear in mind that this is a fanout exchange, and each consumer declares an exclusive queue: when the consumer disconnects, the queue is gone. That is why "that's okay for us": we're simply broadcasting, and whoever wants the broadcast (the messages) needs to be there to get it. A fanout exchange just puts messages into all the queues bound to it, that's it. It's perfectly OK to have multiple consumers consuming from the same queue (look at tutorial 2).
So you just need to consider your use case. Of course it doesn't make sense to create a fanout exchange and pre-set up the queues for the consumers... Perhaps you just need some routing keys or something else.
In this example (tutorial 3), the point is that messages are broadcast, and if no one gets them, it's not a big (or small) deal. If anyone wants them, they need to be there to get them. It's like a TV channel: the signal goes out regardless of whether anyone is watching.
Consumers should attach themselves to queues, they shouldn't declare their own queues. Think of queues as buckets of work to be done. Depending on the workload you can add N consumers to those queues to do the work.
When you create an exchange you should have one or more queues (buckets of work) that are attached to that exchange. If you do this, messages will flow into the queues and start to queue-up (pardon the pun). Your consumers can then attach whenever they are ready and start doing the work.
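As a rough sketch of the "create the queues up front" advice in the answers above, assuming php-amqplib and an already opened channel: setUpDurableQueue and publishPersistent are made-up helper names, and the fanout type is carried over from the question. The exchange here is declared durable, so it would need a different name than an existing non-durable my_exchange.

```php
<?php

use PhpAmqpLib\Channel\AMQPChannel;
use PhpAmqpLib\Message\AMQPMessage;

/**
 * Hypothetical setup step: run it once (deploy script, admin tool) before
 * the producer publishes anything, so messages have somewhere to go.
 */
function setUpDurableQueue(AMQPChannel $channel, string $exchange, string $queue): void
{
    // exchange_declare(name, type, passive, durable, auto_delete)
    $channel->exchange_declare($exchange, 'fanout', false, true, false);

    // queue_declare(name, passive, durable, exclusive, auto_delete)
    // A named, durable, non-exclusive queue outlives both broker restarts and consumers.
    $channel->queue_declare($queue, false, true, false, false);

    $channel->queue_bind($queue, $exchange);
}

/** Publish a message that the broker writes to disk, so it survives a restart. */
function publishPersistent(AMQPChannel $channel, string $exchange, string $body): void
{
    $message = new AMQPMessage($body, [
        'delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT,
    ]);
    $channel->basic_publish($message, $exchange);
}
```

A consumer that starts later only has to call basic_consume on the queue name; everything published in the meantime is waiting there.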
To avoid common concurrency problems I have single threaded php consumers that consume several messages from one rabbitMQ queue.
Basically it's the same PHP script executed X times and waiting for new messages.
My question is:
Is it correct to assume that, since my consumers are single threaded, I should set a prefetch count of 1 message?
PHP Prefetch Count RabbitMQ
Because obviously it won't process more than 1 message at a time...
Right?
Prefetch is simply the maximum number of unacknowledged messages that the broker will hand to a consumer; each message is removed from the queue only after it has been acknowledged. Now, if we assume that the client (consumer) is processing one message at a time, then prefetch_count is not really important. But if the client consumes messages in one thread and then spawns new threads, each to process one message, then it's obviously a different story. So prefetch matters more when messages are acknowledged from multiple threads.
Since you wrote "single threaded consumers", I'm pretty sure you mean that the entire client is single threaded, not just the "consuming" portion, so my direct answer is: you can set it to 1, but you don't have to. It depends more on the way you are ACKing the messages. I just wanted to elaborate on the multi-threaded processing part.
I have a Symfony2 app that under some circumstances has to send more than 10,000 push and email notifications.
I developed an SQS flow with some workers polling the queues to send emails and mobile push notifications.
But now I have a problem: when I need to send these tasks/jobs to SQS during the request/response cycle (maybe not that many of them), the sending itself takes a lot of time, and the response timeout is usually reached.
Should I process this task in the background (I need to send back a quick response)? And how do I handle possible errors in this scenario?
NOTE: Amazon SQS can receive 10 messages in one request, and I am already using this method. Should I maybe build a single SQS message containing many notification jobs (max. 256 KB), so that I send fewer HTTP requests to SQS?
The moment you have a single action that triggers 10k actions, you need to try to find a way to tell the user that "OK, I got it. I'll start working on it and will let you know when it's done".
So to bring that work into the background, a domain event should be raised from your user's action which would be queued into SQS. The user gets notified, and then a worker can pick up that message from the queue and start sending emails and push notifications to another queue.
At the end of the day, 10k messages in batches of 10 are just 1k requests to SQS, which should be pretty quick anyway.
Try to keep your messages small. Don't put the whole content of an email into a queue message, because then you'll get unnecessarily long latencies. Keep the content in a reachable place, or just query for it again when consuming the message, instead of passing big content up and down the network.
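To make the arithmetic concrete, here is a small sketch in plain PHP. buildSqsBatches and the notification_id field are made up for illustration; the Id and MessageBody keys are the entry shape that SendMessageBatch expects. Each message carries only an ID, in line with the advice to keep messages small.

```php
<?php

/**
 * Split jobs into groups of at most $batchSize entries (SQS accepts up to 10 per batch request).
 *
 * @param array $jobs list of small job payloads
 * @return array list of batches, each a list of ['Id' => ..., 'MessageBody' => ...] entries
 */
function buildSqsBatches(array $jobs, int $batchSize = 10): array
{
    $batches = [];
    foreach (array_chunk($jobs, $batchSize) as $chunk) {
        $entries = [];
        foreach ($chunk as $i => $job) {
            $entries[] = [
                'Id' => 'job-' . $i, // only has to be unique within one batch
                'MessageBody' => json_encode($job),
            ];
        }
        $batches[] = $entries;
    }

    return $batches;
}

// 10,000 jobs that carry just an ID; the worker looks up the actual content.
$jobs = [];
for ($n = 1; $n <= 10000; $n++) {
    $jobs[] = ['notification_id' => $n];
}

$batches = buildSqsBatches($jobs);
echo count($batches), " requests to SQS\n";
```

With the AWS SDK for PHP, each batch would then go out as the Entries of one sendMessageBatch call. Ideally that loop runs in a background worker, not in the request itself.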
And how to handle possible errors with this scenario?
Amazon provides dead letter queues for this. In asynchronous systems I've built, I usually create a queue and then attach a redrive policy to it that says "if I see the same message on this queue 10 times, send it to a dead letter queue so that it doesn't bounce back and forth between the queue and a consumer for all eternity". The dead letter queue is simply another queue.
From a dead letter queue you can decide what to do with data that did not process. Since it's notifications (emails or push notifications) in your case, you might have another component in your system that will periodically reprocess a dead letter queue. Scheduled Lambdas are good for this.
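A sketch of what that redrive policy looks like as queue attributes. redrivePolicyAttributes is a made-up helper and the ARN is a placeholder; RedrivePolicy, deadLetterTargetArn and maxReceiveCount are the attribute names SQS uses.

```php
<?php

/**
 * Build the queue attributes for a redrive policy: after $maxReceiveCount
 * receives without a delete, SQS moves the message to the dead letter queue.
 */
function redrivePolicyAttributes(string $deadLetterQueueArn, int $maxReceiveCount = 10): array
{
    return [
        // SQS expects the policy as a JSON string, not as a nested array.
        'RedrivePolicy' => json_encode([
            'deadLetterTargetArn' => $deadLetterQueueArn,
            'maxReceiveCount' => $maxReceiveCount,
        ]),
    ];
}

// Placeholder ARN, for illustration only.
$attributes = redrivePolicyAttributes('arn:aws:sqs:us-east-1:123456789012:notifications-dlq');
echo $attributes['RedrivePolicy'], "\n";
```

With the AWS SDK for PHP, this array would be passed as the Attributes of a setQueueAttributes call on the source queue.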
I'm implementing RabbitMQ to perform some image editing operations on another server. From time to time, though, the request may arrive on that server before the source image has been synced to it, in which case I would like to pop the message back in the queue and process it after all other operations have completed.
However, calling basic.nack with the requeue bit set makes my queue re-receive that message immediately, ahead of any operations that can actually complete.
Currently I feel like I'm forced to implement some logic that just re-submits the original message to the exchange, but I'd like to avoid that, both because the same message may have been successfully processed on another server (with its own queue), and because I expect this to be such a common pattern that there must be a better way.
(oh, I'm using php-amqplib in both consumer and server code)
Thanks!
Update: I solved my problem using Dead Letter Exchange, as suggested by zaq178miami
My current solution:
Declares a dead letter exchange $dead_letter_exchange on the original queue $worker
Declares a recovery exchange $recovery_exchange
Declares a queue $dead_letter_queue, with an x-message-ttl of 5 seconds and x-dead-letter-exchange set to $recovery_exchange
Binds $dead_letter_queue to $dead_letter_exchange
And binds $worker to $recovery_exchange
$dead_letter_exchange and $recovery_exchange are generated names, based on the exchange I'm consuming from and the value of $worker
This makes every message that gets nack'ed return to the worker, only on that specific queue (server), after five seconds for a retry. I may still want to apply some logic that throws the message away after $n retries.
I'm still open to better ideas ;-)
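For anyone who wants to reproduce this, the topology above could be sketched with php-amqplib roughly as follows. The function names are made up, and the fanout exchange type and the durable flags are assumptions, since the list above does not specify them.

```php
<?php

use PhpAmqpLib\Channel\AMQPChannel;
use PhpAmqpLib\Wire\AMQPTable;

/** Arguments for the worker queue: rejected messages go to the dead letter exchange. */
function workerQueueArguments(string $deadLetterExchange): array
{
    return ['x-dead-letter-exchange' => $deadLetterExchange];
}

/** Arguments for the holding queue: messages expire after $ttlMs and move on to the recovery exchange. */
function deadLetterQueueArguments(string $recoveryExchange, int $ttlMs = 5000): array
{
    return [
        'x-message-ttl' => $ttlMs,
        'x-dead-letter-exchange' => $recoveryExchange,
    ];
}

function declareRetryTopology(
    AMQPChannel $channel,
    string $worker,
    string $deadLetterExchange,
    string $recoveryExchange,
    string $deadLetterQueue
): void {
    // exchange_declare(name, type, passive, durable, auto_delete)
    $channel->exchange_declare($deadLetterExchange, 'fanout', false, true, false);
    $channel->exchange_declare($recoveryExchange, 'fanout', false, true, false);

    // queue_declare(name, passive, durable, exclusive, auto_delete, nowait, arguments)
    $channel->queue_declare($worker, false, true, false, false, false,
        new AMQPTable(workerQueueArguments($deadLetterExchange)));
    $channel->queue_declare($deadLetterQueue, false, true, false, false, false,
        new AMQPTable(deadLetterQueueArguments($recoveryExchange)));

    // Nobody consumes from the holding queue; messages just wait there until the TTL expires.
    $channel->queue_bind($deadLetterQueue, $deadLetterExchange);
    $channel->queue_bind($worker, $recoveryExchange);
}
```

Two things to keep in mind: a message is only dead-lettered when it is rejected or nack'ed with requeue set to false, and redeclaring an existing queue with different arguments fails, so the worker queue has to be created with these arguments from the start.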
It looks like you have a race condition, which is the cause of the problem. It may be a good idea to delay publishing the message (or to publish it as a delayed message) so you can be sure the image has been synced to the target machine, or to publish the message only when the image arrives (which might be tricky), or just to sync the image on demand when the message is consumed. You could even add an API for fetching the source image, so that you can scale your consumers horizontally at any time without pain. The idea is to make consumers as atomic and independent as possible.
Back to the original question: if it is an option for you, try Dead Letter Exchanges to move failed messages to a separate queue. Mixing failed and valid messages without a definite mechanism for detecting re-published ones smells a bit (because of potential cycling problems and management difficulties). But it really depends on your needs, message rate and hardware; if some solution yields a stable result and you are sure about it, just stick with it.
Note that with php-amqplib you can consume from more than one queue at the same time, so you can consume from the main queue and from a queue of postponed messages (but in that case you also have to delay publishing to the postponed queue, to prevent the messages from being consumed immediately).
Delayed publishing is usually done with a per-message or per-queue TTL and an extra queue whose DLX routes to the main working queue or, in your case, to the postponed messages queue.
Has someone come up with an elegant way in PHP to recognize that SQS has just sent you the same message from the queue that it has sent previously? This could happen if the SQS has never received a 'deletemessage' request sent by the requestor because the original SQS queue could have gone down, so its replacement will resend the same message as per Amazon docs.
Would appreciate any pointers, code samples in PHP etc.
When you process a message you receive from SQS, there's a message ID that is the same every time you receive the message, and a receipt handle (used to send the delete message) that changes each time. Saving the message ID along with the work you do as a result of the message allows your application to be aware that the work should not be repeated.
Of course, you need to set the visibility timeout on your queue high enough that you won't fail to delete each message, or extend its invisibility, before the timeout expires.
A recent addition to SQS is the ability to divert messages to a different queue -- they call it a "dead letter queue" -- once they've been delivered via the original queue a configurable number of times.
This seems like an easy enough workaround.
http://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/SQSDeadLetterQueue.html
It's always possible (though I have not caught it happening) that SQS could still deliver the message more than once, because, being a massive distributed system, SQS is designed to deliver messages "at least" once, so you have to accommodate that possibility. I save the message ID in a column in my database that has a unique constraint on it, which makes inserting a duplicate impossible.
Amazon SQS is engineered to provide “at least once” delivery of all messages in its queues. Although most of the time each message will be delivered to your application exactly once, you should design your system so that processing a message more than once does not create any errors or inconsistencies.
https://aws.amazon.com/sqs/faqs/
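A minimal sketch of the unique-constraint approach described above, using PDO. The table and column names and the claimMessage function are made up, and the in-memory SQLite database is only there so the snippet runs on its own (it assumes the pdo_sqlite extension); the same idea should carry over to MySQL or PostgreSQL.

```php
<?php

/**
 * Record the SQS message ID. Returns true the first time an ID is seen,
 * false when the unique constraint says it was already recorded.
 */
function claimMessage(PDO $db, string $messageId): bool
{
    try {
        $stmt = $db->prepare('INSERT INTO processed_messages (message_id) VALUES (:id)');
        $stmt->execute([':id' => $messageId]);

        return true;
    } catch (PDOException $e) {
        // SQLSTATE class 23 means an integrity constraint violation: the ID already exists.
        if (strncmp((string) $e->getCode(), '23', 2) === 0) {
            return false;
        }
        throw $e; // anything else is a real error
    }
}

// Demo setup only.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE processed_messages (message_id TEXT NOT NULL UNIQUE)');

foreach (['msg-1', 'msg-2', 'msg-1'] as $id) {
    echo $id, claimMessage($db, $id) ? ": process it\n" : ": duplicate, skip it\n";
}
```

In a real worker you would probably record the ID in the same transaction as the work it triggers, so that a crash between the two does not leave a message marked as done when it was not.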