There is a long-running process (Excel report creation) in my web app that needs to be executed in the background.
Some details about the app and environment:
The app consists of many instances, where each client has a separate one (with customized business logic), while everything is hosted on our server. The functionality that produces the Excel report is the same everywhere.
I'm planning to have one RabbitMQ server installed. One part of the app (the publisher) will take the report options from the user and put them into a message. A background job (the consumer) will consume it, produce the report, and send it via email.
However, there is a flaw in this design: say users from one instance queue lots of complicated reports (worth ~10 minutes of work each) and a user from another instance queues an easy one (1-2 minutes); that user will have to wait until the others finish.
There could be a separate queue for each app instance, but in that case I would need to create one consumer per instance. Given that there are 100+ instances at the moment, that doesn't look like a viable approach.
I was thinking it might be possible to have a script that checks all available queues (and consumers) and creates a new consumer for any queue that doesn't have one. There are no limitations on language for the consumer or such a script.
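Something along these lines is what I have in mind; a rough sketch against the RabbitMQ management HTTP API (assuming the management plugin is enabled on localhost:15672 with default credentials, and a hypothetical worker.php consumer script):

<?php
// Rough sketch, not production code: list all queues via the management HTTP
// API and spawn a detached consumer for any queue that has none.
// Credentials, host, and worker.php are assumptions.
$json   = file_get_contents('http://guest:guest@localhost:15672/api/queues');
$queues = json_decode($json, true);

foreach ($queues as $queue) {
    if ($queue['consumers'] === 0 && $queue['messages'] > 0) {
        // Launch a consumer process for this queue in the background.
        $cmd = sprintf(
            'php worker.php %s > /dev/null 2>&1 &',
            escapeshellarg($queue['name'])
        );
        exec($cmd);
    }
}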
Does that sound like a feasible approach? If not, please give a suggestion.
Thanks
If I understood the topic correctly, everything lies on one server: RabbitMQ, the web application, the different instances per client, and the message consumers. In that case I would rather use a different topic per message (https://www.rabbitmq.com/tutorials/tutorial-five-python.html) and introduce consumer priorities (https://www.rabbitmq.com/consumer-priority.html). Based on the options chosen when publishing a message, I would build a combination of topic and message priority: the publisher knows the number of reports already sent per client and the selected options, and can decide whether the priority is high, normal, or low.
The logic to pull messages based on that data lives in the consumer, so a consumer will not pick up heavy topics when, for example, three are already in progress.
Based on the total number of messages in the queue (not 100% accurate) and the previous topics and priorities, you can implement a kind of leaky-bucket strategy to keep control of resources, e.g. at most 100 reports generated simultaneously.
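A rough php-amqplib sketch of that combination (the exchange/queue names, routing keys, and the x-priority value are placeholders I chose):

<?php
// Sketch: one topic exchange, routing keys like "client42.heavy", and a
// consumer priority so this worker is preferred while it is connected.
use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

$channel->exchange_declare('reports', 'topic', false, true, false);
$channel->queue_declare('reports.heavy', false, true, false, false);
$channel->queue_bind('reports.heavy', 'reports', '*.heavy');

// x-priority: this consumer is served first while it is active.
$channel->basic_consume(
    'reports.heavy', '', false, false, false, false,
    function ($msg) {
        // generate the report here, then acknowledge
        $msg->ack(); // requires a recent php-amqplib
    },
    null,
    new AMQPTable(['x-priority' => 10])
);

while ($channel->is_consuming()) {
    $channel->wait();
}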
You can also consider ZeroMQ (http://zeromq.org) for your case; it may be more suitable than RabbitMQ because it is simpler and is a brokerless solution.
I'm building a website in PHP and need an API to be checked on a regular basis for EACH USER individually.
In a nutshell it's a SaaS to steer a user account on another website with additional/automated options. A web based bot if you want.
So basically, I need to dynamically create a new cron job with its individual interval for each user, and these should be executed in parallel (it would take too long to put all queries into one cron job if there are a lot of users).
Also, it might become necessary to make each request from a different IP. The reason is that the API provider may get annoyed with us and want to block us. Since the API key is public, they will most likely do it by simply blocking our IP; if we change that frequently, it should help a lot.
Is something like that possible? What would this require? Any option that doesn't get too expensive?
I thought of RabbitMQ, for example, but that wouldn't quite tackle all the issues, and I'm wondering if there's a better/smarter solution.
Thanks!
Look at the temporal.io open source project. It supports a practically unlimited number of such periodic jobs. The PHP SDK can be found here.
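For illustration, scheduling one cron workflow per user might look roughly like this with the PHP SDK (the workflow interface, the 5-minute interval, and the server address are my assumptions):

<?php
use Temporal\Client\GRPC\ServiceClient;
use Temporal\Client\WorkflowClient;
use Temporal\Client\WorkflowOptions;
use Temporal\Workflow\WorkflowInterface;
use Temporal\Workflow\WorkflowMethod;

// Hypothetical workflow: one cron workflow per user, each with its own interval.
#[WorkflowInterface]
interface CheckUserApiWorkflow
{
    #[WorkflowMethod]
    public function check(int $userId);
}

$client = WorkflowClient::create(ServiceClient::create('localhost:7233'));

$stub = $client->newWorkflowStub(
    CheckUserApiWorkflow::class,
    WorkflowOptions::new()
        ->withWorkflowId('check-api-user-42') // unique per user
        ->withCronSchedule('*/5 * * * *')     // this user's interval
);
$client->start($stub, 42);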
I'm just curious how you would handle a scenario where a lot of PDFs have to be generated on the server and sent to the user by email. The PDF must not be tampered with, because it needs to be 100% secure, or close to that.
For example the PDF contains the order you just made in a webshop, proof of purchase or something like that.
The application will have a lot of concurrent users. For this question I will use Laravel as a base platform for the web application.
I had the idea of running a cron job at night that would generate all these PDFs at once and send them by email.
What is considered best practice in this scenario?
For example the PDF contains the order you just made in a webshop, proof of purchase or something like that.
Given that these will presumably occur throughout the day, a queue may be a better solution than a cron. Every time someone does an action that'd require a PDF, fire off a queue job. A background process will check for queued jobs and process them.
This avoids having a giant backlog, protects you in the case a cron fails, and gets PDFs out to the clients in a more timely fashion.
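A minimal Laravel sketch of that flow (the job name, the PDF generation, and the mailable are placeholders, not a prescribed API):

<?php
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Queued job: generate one order PDF and email it to the buyer.
class SendOrderPdf implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public function __construct(private int $orderId)
    {
    }

    public function handle(): void
    {
        // Both helpers below are hypothetical placeholders:
        // $pdf = PdfGenerator::forOrder($this->orderId);
        // Mail::to(Order::find($this->orderId)->email)->send(new OrderPdfMail($pdf));
    }
}

// Dispatched right after the purchase completes, e.g. in the controller:
// SendOrderPdf::dispatch($order->id);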
I am making a payment system in PHP which depends on a REST API.
My Business Logic:
If someone submits a request through my system, let's say "transfer money from point A to point B", that transaction is saved in my database with status "submitted", then submitted to the (Mobile Network Operator) API URL, which processes it and returns a status to my system; I then update my database transaction status to the new status (e.g. "waiting for confirmation") and notify the user of the incoming status.
The problem is:
My application should keep requesting at an interval of 10 seconds to check for a new status, showing the new status to the user until a final status of "complete" or "declined", since there can be around 5 statuses, e.g. "waiting, declined, approved, complete...".
I have managed to do this using AJAX, setting time intervals in JavaScript. But it stops requesting if the user closes the browser or anything happens on their end, resulting in my app not knowing whether the money was delivered or not.
I would like to know how I can run these recurring tasks in the background using Gearman, without involving JavaScript time intervals. Thanks.
Gearman is more of a worker queue, not a scheduling system. I would probably set up some type of cron job that queries the database and submits the appropriate jobs to Gearman in an async way. With Gearman, you will want to use libdrizzle or something else for persistent queues, and also some type of GearmanWorker process manager to run more than one job at a time. There are a number of projects that currently do this with varying degrees of success, like https://github.com/brianlmoon/GearmanManager. None of the worker managers I have evaluated have really been up to par, so I created my own, which will probably be open-sourced shortly.
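As a rough sketch, the cron-fed part could look like this (the function name, the DSN, and the table schema are assumptions):

<?php
// Runs from cron every minute: enqueue one background Gearman job per
// transaction that has not reached a final status yet.
$pdo = new PDO('mysql:host=localhost;dbname=payments', 'user', 'pass'); // assumed DSN

$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);

$pending = $pdo->query(
    "SELECT id FROM transactions WHERE status NOT IN ('complete', 'declined')"
)->fetchAll(PDO::FETCH_COLUMN);

foreach ($pending as $id) {
    // doBackground() returns immediately; a GearmanWorker registered for
    // 'check_transaction_status' polls the MNO API and updates the row.
    $client->doBackground('check_transaction_status', json_encode(['id' => $id]));
}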
You wouldn't use Gearman in the background for circular tasks, which is normally referred to as polling. Gearman is normally used as a job queue for doing things like video compression, resizing images, sending emails, or other tasks that you want to 'background'.
I don't recommend polling the database, either on the frontend or the backend. Polling is generally considered bad because it doesn't scale. In your JavaScript example, you can see that as your application grows and is used by thousands of users, polling will introduce a lot of unnecessary traffic and load on your servers. On the backend, the machine doing the polling is a single point of failure.
The architecture you want to explore is a message queue. It's similar to the Listener/Observer pattern in programming, but applied at the systems level. This will allow a more robust system that can handle interruptions, from a user closing the browser all the way to a backend system going down for maintenance.
I have a question concerning the third RabbitMQ tutorial. I am trying to implement something similar, except there is no guarantee the consumer(s) would be running at the time the producer sends a message to the exchange.
So, I have my producer which publishes the messages to a fanout exchange:
$channel->exchange_declare('my_exchange', 'fanout', false, false, false);
$msg = new AMQPMessage('my_message'); // php-amqplib expects an AMQPMessage, not a plain string
$channel->basic_publish($msg, 'my_exchange');
In my publishers, I declare queues, which I then bind to the exchange:
list($queueName,, ) = $channel->queue_declare("", false, false, true, false);
$channel->queue_bind($queueName, 'my_exchange');
And this is where my problem has its root. The tutorial says:
The messages will be lost if no queue is bound to the exchange yet,
but that's okay for us; if no consumer is listening yet we can safely
discard the message.
Is there a way to somehow preserve those messages, so that when a consumer starts it can access the previously sent messages? The only way I figured out how to do it is to declare the same queue in my producer and my publisher, but it kind of defeats the purpose of having an exchange and separate queues for different consumers.
The queues need to exist; it doesn't really matter who/what creates them: it can be the producer (although I would strongly discourage this), the consumer, some third admin app that just creates queues via the REST API, rabbitmqctl... If you want to consume from the queue(s) later, just make sure that they're durable and that the TTL for messages is long enough (also use durable messages if needed). But beware that your queue(s) don't get into flow state.
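For illustration, declaring such a queue with php-amqplib might look like this (the queue name and the 24-hour TTL are arbitrary choices on my part):

<?php
use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Wire\AMQPTable;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// durable = true, auto_delete = false, and a per-queue message TTL of 24h;
// publish with delivery_mode = 2 as well if messages must survive a broker restart.
$channel->queue_declare(
    'my_queue', false, true, false, false, false,
    new AMQPTable(['x-message-ttl' => 24 * 60 * 60 * 1000])
);
$channel->queue_bind('my_queue', 'my_exchange');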
The only way I figured out how to do it is to declare the same queue
in my producer and my publisher, but it kind of defeats the purpose of
having an exchange and separate queues for different consumers.
First - I think you meant to say in my producer and my subscriber :)
Second, separate queues for consumers (or a queue per consumer) is just for this example. Bear in mind that this is a fanout exchange, and each consumer declares an exclusive queue: when the consumer disconnects, the queue is gone. And that's why "that's okay for us": we're simply broadcasting, and whoever wants the broadcast (the messages) needs to be listening to get it. A fanout exchange just puts messages into all the queues bound to it, that's it. It's perfectly OK to have multiple consumers consuming from the same queue (look at tutorial 2).
So you just need to consider your use case. Of course it doesn't make sense to create a fanout exchange and pre-set up the queues for the consumers... Perhaps you just need some routing keys or something else.
In this example (tutorial 3), the point is that there is a broadcast of messages, and if no one gets them, it's no big (or small) deal. If anyone wants them, they need to be listening. It's like a TV channel: regardless of whether someone is watching or not, the signal goes on.
Consumers should attach themselves to queues, they shouldn't declare their own queues. Think of queues as buckets of work to be done. Depending on the workload you can add N consumers to those queues to do the work.
When you create an exchange you should have one or more queues (buckets of work) that are attached to that exchange. If you do this, messages will flow into the queues and start to queue-up (pardon the pun). Your consumers can then attach whenever they are ready and start doing the work.
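In code, the consumer side then shrinks to attaching to an already-declared queue (the name 'work_queue' is just an example):

<?php
use PhpAmqpLib\Connection\AMQPStreamConnection;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// No queue_declare here: the queue already exists and has been filling up
// while no consumer was attached.
$channel->basic_consume('work_queue', '', false, false, false, false, function ($msg) {
    // ... do the work ...
    $msg->ack(); // requires a recent php-amqplib
});

while ($channel->is_consuming()) {
    $channel->wait();
}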
I'm planning to write a system which should accept input from users (from a browser), make some calculations and show the updated data to all users currently visiting a certain website.
Input can come once an hour, but it can also come 100 times each second. It is VERY important not to lose any of the user inputs, but to really register and process ALL of them.
So the idea was to create two programs. One will receive data (input) from the browser and store it somehow in a queue (maybe an array, to be really fast?). The second program should wait until there are new items in the queue (saving resources), then become active and begin to process the queue items. Both programs should run asynchronously.
I know PHP, so I would write the first program in PHP. But I'm not sure about the second part... I'm not sure how to send an event from the first program to the second. I need some advice at this point. Threads are not possible with PHP? I need some ideas on how to create a system like I described.
I would use a Comet server to communicate feedback to the website the input came from (this part is already tested).
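One way I could imagine wiring the two programs together is a shared queue in something like Redis, where the worker blocks until an item arrives; a rough sketch (the key name and process() are hypothetical):

<?php
// Program 1: the web-facing script; enqueue the input and return immediately.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);
$redis->rPush('inputs', json_encode($_POST));

// Program 2: a long-running CLI worker in a separate process.
// $redis = new Redis();
// $redis->connect('127.0.0.1', 6379);
// while (true) {
//     // blPop blocks until an item arrives, so the worker idles cheaply
//     [$key, $payload] = $redis->blPop(['inputs'], 0);
//     process(json_decode($payload, true)); // process() is a placeholder
// }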
As per the comments above, trivially you appear to be describing a message queueing / processing system, however looking at your question in more depth this is probably not the case:
Both programs should run asynchronously.
Having a program which processes a request from a browser but does so asynchronously is an oxymoron. While you could handle the enqueueing of a message after dealing with the HTTP request, it's still a synchronous process.
It is VERY important not to loose any of user inputs
PHP is not a good language for writing control systems for nuclear reactors (nor, according to Microsoft, is Java). HTTP and TCP/IP are not ideal for real time systems either.
100 times each second
Sorry - I thought you meant there could be a lot of concurrent requests. This is not a huge amount.
You seem to be confusing the objective of using Comet / Ajax with asynchronous processing of the application. Even with very large amounts of data, it should be possible to handle the interaction using a single PHP script working synchronously.