To avoid common concurrency problems I have single-threaded PHP consumers that consume messages from one RabbitMQ queue.
Basically it's the same PHP script executed X times, each instance waiting for new messages.
My question is:
Is it correct to assume that, since my consumers are single-threaded, I should set a prefetch configuration of 1 message?
Because obviously it won't process more than 1 message at a time...
Right?
Prefetch is simply the number of messages that the broker will push to the consumer side, removing them only after they have been acknowledged. Now, if we assume that the client (consumer) processes one message at a time, then this prefetch_count is not really important. But if the client consumes messages in one thread and then spawns new threads, each to process one message, then it's obviously a different story; in that case it's really a question of multi-threaded acknowledging.
Since you wrote single-threaded consumers I'm pretty sure you mean the entire client is single-threaded, not just the "consuming" portion, so my direct answer is: you can set it to 1, but you don't have to; it depends more on the way you are ACKing the messages. I just wanted to elaborate on the multi-threaded processing part.
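For a single-threaded php-amqplib consumer with manual acknowledgements, that looks roughly like this (a minimal sketch; the connection parameters and queue name are placeholders):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// prefetch_count = 1: the broker keeps at most one unacknowledged
// message in flight to this consumer.
$channel->basic_qos(null, 1, false);

// no_ack = false, so we acknowledge manually after processing
$channel->basic_consume('my_queue', '', false, false, false, false,
    function ($msg) {
        // ... process the message ...
        $msg->ack(); // only now will the broker push the next one
    }
);

while ($channel->is_consuming()) {
    $channel->wait();
}
```

With manual ACKs after processing, a prefetch of 1 gives strictly one-at-a-time delivery; a higher value merely buffers messages locally without changing the single-threaded processing order.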
I was wondering, because I cannot find anything on Symfony or other resources, whether PHP's symfony/messenger can handle messages in "bulk" with any async transport.
For example. Grab 20 messages from the bus, handle those 20 messages, and ack or reject any of the messages.
I know RabbitMQ has a feature to grab n messages from the queue and process all of them in a single run.
In some cases this will have a better performance over scaling the async workers.
Does anybody have any leads, resources or experience with it? Or am I trying to resolve something by going against the idea of symfony/messenger?
[update]
I'm aware that bulk messages are not part of the (async) messaging concept, and that each message should be processed individually. But some message brokers have implemented a feature to "grab" X messages from a queue and process them (acknowledging or rejecting each, or otherwise). I know handling multiple messages in a single iteration increases the complexity of any consumer, but in some cases it improves performance.
I've used this concept of consuming multiple messages in a single iteration many times, but never with php's symfony/messenger.
This was not natively possible prior to Symfony 5.4.
Symfony 5.4 added a BatchHandlerInterface which allows you to handle messages in batches and to choose the batch size.
You can find more info here:
Symfony - Handle messages in batches
GitHub PR of the feature
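A handler then looks roughly like this (a sketch based on the Symfony docs; MyMessage and the batch size of 20 are placeholders):

```php
<?php

use Symfony\Component\Messenger\Handler\Acknowledger;
use Symfony\Component\Messenger\Handler\BatchHandlerInterface;
use Symfony\Component\Messenger\Handler\BatchHandlerTrait;

class MyBatchHandler implements BatchHandlerInterface
{
    use BatchHandlerTrait;

    public function __invoke(MyMessage $message, Acknowledger $ack = null)
    {
        // The trait buffers the message and its Acknowledger until flush.
        return $this->handle($message, $ack);
    }

    // Called by the trait once the batch is full (or the worker is idle).
    private function process(array $jobs): void
    {
        // $jobs is a list of [$message, $ack] pairs
        foreach ($jobs as [$message, $ack]) {
            try {
                // ... handle $message ...
                $ack->ack($message);
            } catch (\Throwable $e) {
                $ack->nack($e);
            }
        }
    }

    private function shouldFlush(): bool
    {
        return 20 <= \count($this->jobs);
    }
}
```

Each message is still acked or nacked individually via its Acknowledger, so the transport semantics are unchanged; only the processing is grouped.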
First, I think you have the wrong concept: there is no such thing as "messages in bulk" in the queue world.
The idea of the queue is that one message is delivered to the consumer, and the consumer is responsible for letting the queue know the message was acknowledged, so it can be deleted. If this does not happen within X time, the message becomes visible again to other consumers.
If the messenger gets 20 messages from the queue, it still processes them one by one and acknowledges each message after it finishes processing it. These 20 messages are "hidden" from other consumers for some time (it depends on the configuration of the queue). This also answers the question about multiple consumers.
There is a long running process(Excel report creation) in my web app that needs to be executed in a background.
Some details about the app and environment.
The app consists of many instances, one per client (each with customized business logic), all hosted on our server. The functionality that produces the Excel file is the same.
I'm planning to have one RabbitMQ server installed. One part of the app (the publisher) will take all report options from the user and put them into a message. A background job (the consumer) will consume it, produce the report and send it via email.
However, there is a flaw in such a design: say users from one instance queue lots of complicated reports (worth ~10 min of work each) and a user from another instance queues an easy one (1-2 min); the latter will have to wait until the others finish.
There could be separate queues for each app instance, but in that case I would need to create one consumer per instance. Given that there are 100+ instances at the moment, that doesn't look like a viable approach.
I was thinking about a script that checks all available queues (and consumers) and creates a new consumer for any queue that doesn't have one. There are no limitations on the language for the consumer or such a script.
Does that sound like a feasible approach? If not, please give a suggestion.
Thanks
If I understood the topic correctly, everything lies on one server: RabbitMQ, the web application, the different instances per client and the message consumers. In that case I would rather use different topics per message (https://www.rabbitmq.com/tutorials/tutorial-five-python.html) and introduce consumer priorities (https://www.rabbitmq.com/consumer-priority.html). Based on those options, when publishing a message I would create a combination of topic and priority: the publisher knows the number of reports already sent per client and the selected options, and decides whether the priority is high, low or normal.
The logic to pull messages based on that data lives in the consumer, so the consumer will not pick up heavy topics when, for example, 3 are already in process.
Based on the total number of messages in the queue (it's not 100% accurate) and the previous topics and priorities, you can implement a kind of leaky bucket strategy to keep control of resources, e.g. a maximum of 100 reports generated simultaneously.
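With php-amqplib, a consumer priority is set via the x-priority consumer argument (a sketch; the queue name and priority value are illustrative):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// This consumer is preferred while it can receive messages;
// lower-priority consumers on the same queue only get messages
// when the higher-priority ones are blocked.
$channel->basic_consume(
    'reports',            // queue (placeholder name)
    '',                   // consumer tag (server-generated)
    false, false, false, false,
    function ($msg) {
        // ... generate the report, email it ...
        $msg->ack();
    },
    null,
    new AMQPTable(['x-priority' => 10])
);

while ($channel->is_consuming()) {
    $channel->wait();
}
```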
You can also consider ZeroMQ (http://zeromq.org) for your case; it may be more suitable than RabbitMQ because it is simpler and brokerless.
I have a question concerning the third RabbitMQ tutorial. I am trying to implement something similar, except there is no guarantee that the consumer(s) will be running at the time the producer sends a message to the exchange.
So, I have my producer which publishes the messages to a fanout exchange:
$channel->exchange_declare('my_exchange', 'fanout', false, false, false);
$channel->basic_publish(new AMQPMessage('my_message'), 'my_exchange');
In my publishers, I declare queues, which I then bind to the exchange:
list($queueName,, ) = $channel->queue_declare("", false, false, true, false);
$channel->queue_bind($queueName, 'my_exchange');
And this is where my problem has its root. The tutorial says:
The messages will be lost if no queue is bound to the exchange yet,
but that's okay for us; if no consumer is listening yet we can safely
discard the message.
Is there a way to somehow preserve those messages, so that when a consumer starts, it can access the previously sent messages? The only way I figured out how to do it is to declare the same queue in my producer and my publisher, but it kind of defeats the purpose of having an exchange and separate queues for different consumers.
The queues need to exist; it doesn't really matter who/what creates them: it can be the producer (although I would strongly discourage this), the consumer, some third admin app that just creates queues via the REST API, rabbitmqctl... If you want to consume the queue(s) later, just make sure that they're durable and that the TTL for messages is long enough (and make the messages durable too if needed). But beware that your queue(s) don't get into flow state.
The only way I figured out how to do it is to declare the same queue
in my producer and my publisher, but it kind of defeats the purpose of
having an exchange and separate queues for different consumers.
First - I think you meant to say in my producer and my subscriber :)
Second, separate queues for consumers (or a queue per consumer) is just for this example. Bear in mind that this is a fanout exchange, and each consumer declares an exclusive queue: when the consumer disconnects, the queue is gone. And that's why "that's okay for us": we're simply broadcasting, and whoever wants the broadcast (the messages) needs to be there to get it. A fanout exchange just puts messages into all the queues bound to it, that's it. It's perfectly okay to have multiple consumers consuming from the same queue (look at tutorial 2).
So you just need to consider your use case. Of course it doesn't make sense to create a fanout exchange and pre-set-up the queues for the consumers... Perhaps you just need some routing keys or something else.
In this example (tutorial 3) it's meant as a broadcast of messages, and if no one gets them it's no big (or small) deal. If anyone wants them, they need to be listening. It's like a TV channel: regardless of whether someone is watching or not, the signal goes on.
Consumers should attach themselves to queues, they shouldn't declare their own queues. Think of queues as buckets of work to be done. Depending on the workload you can add N consumers to those queues to do the work.
When you create an exchange you should have one or more queues (buckets of work) that are attached to that exchange. If you do this, messages will flow into the queues and start to queue-up (pardon the pun). Your consumers can then attach whenever they are ready and start doing the work.
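In php-amqplib terms, that pre-setup could look like this (a sketch; names are placeholders), run once by a deploy script or admin tool rather than by the consumers:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable exchange and durable, non-exclusive queue: messages queue up
// here even while no consumer is attached, and survive a broker restart.
$channel->exchange_declare('my_exchange', 'fanout', false, true, false);
$channel->queue_declare('work_queue', false, true, false, false);
$channel->queue_bind('work_queue', 'my_exchange');
```

Consumers then simply basic_consume from 'work_queue' and pick up whatever accumulated while they were down.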
I'm implementing RabbitMQ to perform some image editing operations on another server. However, from time to time the request may arrive on that server before the source image has been synced to it - in which case I would like to pop the message back into the queue and process it after all other operations have completed.
However, calling basic.nack with the requeue flag set makes my queue re-receive that message immediately - ahead of any operations that can actually complete.
Currently I feel like I'm forced to implement some logic that just re-submits the original message to the exchange, but I'd like to avoid that. Both because the same message may have been successfully processed on another server (with its own queue), and because I expect this to be such a common pattern that there must be a better way.
(oh, I'm using php-amqplib in both consumer and server code)
Thanks!
Update: I solved my problem using Dead Letter Exchange, as suggested by zaq178miami
My current solution:
Declares a dead letter exchange $dead_letter_exchange on the original queue $worker
Declares a recovery exchange $recovery_exchange
Declares a queue $dead_letter_queue, with an x-message-ttl of 5 seconds and x-dead-letter-exchange set to $recovery_exchange
Binds $dead_letter_queue to $dead_letter_exchange
And binds $worker to $recovery_exchange
$dead_letter_exchange and $recovery_exchange are generated names, based on the exchange I'm consuming from and the value of $worker
This makes every nack'ed message return to the worker, only on that specific queue (server), after five seconds for a retry. I may still want to add some logic that throws the message away after $n retries.
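Declared with php-amqplib, the setup above looks roughly like this (a sketch with literal names instead of the generated ones):

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

$worker = 'worker_queue';
$dead_letter_exchange = 'dlx.worker_queue';
$recovery_exchange = 'recovery.worker_queue';

$channel->exchange_declare($dead_letter_exchange, 'fanout', false, true, false);
$channel->exchange_declare($recovery_exchange, 'fanout', false, true, false);

// Nack'ed messages from $worker land here, wait 5 seconds, then are
// dead-lettered into the recovery exchange.
$channel->queue_declare('dead_letter_queue', false, true, false, false, false, new AMQPTable([
    'x-message-ttl'          => 5000,
    'x-dead-letter-exchange' => $recovery_exchange,
]));
$channel->queue_bind('dead_letter_queue', $dead_letter_exchange);

// The worker queue dead-letters rejections into the DLX, and is bound
// to the recovery exchange so expired messages come back for a retry.
$channel->queue_declare($worker, false, true, false, false, false, new AMQPTable([
    'x-dead-letter-exchange' => $dead_letter_exchange,
]));
$channel->queue_bind($worker, $recovery_exchange);
```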
I'm still open to better ideas ;-)
Looks like you have a 'race condition' issue, which is the cause of the problem. Maybe it is a good choice to delay message publishing, or to publish delayed messages, to be sure that the image has synced to the target machine; or publish the message when the image arrives (which might be tricky); or just sync the image on demand (when the message is consumed). You could even add an API to fetch the source image, so you can scale your consumers horizontally without any pain at any time. The idea is to make the consumers as atomic and independent as they can be.
Back to the original question: if it's an option for you, try Dead Letter Exchanges to move failed messages to a separate queue. Mixing failed and valid messages without a definitive mechanism to detect re-published ones smells a bit (due to reasons like potential cycling problems and management difficulties). But it really depends on your needs, message rate and hardware; if some solution yields stable results and you are sure about it, just stick to it.
Note: if you are using php-amqplib you can consume messages from more than one queue at the same time, so you can consume messages from the main queue and from the postponed-messages queue (but in that case you have to publish to the postponed queue with a delay too, to prevent immediate consumption).
Usually delayed message publishing is done via a per-message or per-queue TTL and an extra queue with its DLX set to the main working queue - or, in your case, to the postponed-messages queue.
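A per-queue-TTL delay queue can be declared like this with php-amqplib (a sketch; the names and the 10-second delay are illustrative). Messages published to it are dead-lettered into the main working queue when the TTL expires:

```php
<?php
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Messages sit here for 10 s, then are dead-lettered via the default
// exchange ('') straight into 'main_queue'.
$channel->queue_declare('delay_queue', false, true, false, false, false, new AMQPTable([
    'x-message-ttl'             => 10000,
    'x-dead-letter-exchange'    => '',
    'x-dead-letter-routing-key' => 'main_queue',
]));

// Publish to the default exchange with the delay queue's name as routing key.
$channel->basic_publish(new AMQPMessage('payload'), '', 'delay_queue');
```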
I'm planning to write a system which should accept input from users (from browser), make some calculations and show updated data to all users, currently visiting certain website.
Input can come once an hour, but it can also come 100 times each second. It is VERY important not to lose any of the user inputs, but to register and process ALL of them.
So the idea was to create two programs. One will receive data (input) from the browser and store it somehow in a queue (maybe an array, to be really fast?). The second program should wait until there are new items in the queue (saving resources) and then become active and begin to process the queue items. Both programs should run asynchronously.
I know PHP, so I would write the first program using PHP. But I'm not sure about the second part... I'm not sure how to send an event from the first to the second program. I need some advice at this point. Are threads not possible with PHP? I need some ideas on how to create a system like I described.
I would use a comet server to communicate feedback to the website the input came from (this part is already tested).
As per the comments above, trivially you appear to be describing a message queueing / processing system, however looking at your question in more depth this is probably not the case:
Both programs should run asynchronously.
Having a program which processes a request from a browser but does it asynchronously is an oxymoron. While you could handle the enqueueing of a message after dealing with the HTTP request, it's still a synchronous process.
It is VERY important not to loose any of user inputs
PHP is not a good language for writing control systems for nuclear reactors (nor, according to Microsoft, is Java). HTTP and TCP/IP are not ideal for real time systems either.
100 times each second
Sorry - I thought you meant there could be a lot of concurrent requests. This is not a huge amount.
You seem to be confusing the objective of using COMET/Ajax with asynchronous processing of the application. Even with very large amounts of data, it should be possible to handle the interaction using a single PHP script working synchronously.
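That said, if you do go the queueing route, the "second program" would simply be a long-running PHP worker that blocks until the broker pushes work to it - no polling and no custom event mechanism needed. A sketch with php-amqplib (connection details and queue name are placeholders):

```php
<?php
// worker.php - the "second program": run from the CLI, it sleeps until
// the broker delivers a message, then processes it.
require __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue so no input is lost if the worker or broker restarts.
$channel->queue_declare('inputs', false, true, false, false);
$channel->basic_qos(null, 1, false);

$channel->basic_consume('inputs', '', false, false, false, false, function ($msg) {
    // ... process the user input, run the calculations ...
    $msg->ack(); // remove from the queue only once fully processed
});

while ($channel->is_consuming()) {
    $channel->wait(); // blocks until the broker pushes a message
}
```

The web-facing script just publishes each input as a persistent message to the 'inputs' queue and returns; the worker catches up at its own pace.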