I'm writing a web app in PHP + Laravel + MySQL.
In the system, a user can schedule emails (and other API calls) at arbitrary times (much like how you schedule posts in WordPress). I can use cron to inspect the database every 5 minutes or so to find emails that should be sent, send them, and update their status.
However, this is a SaaS app, so the number of emails to be sent at any given time can grow rapidly. I can create a "lock file" every time the cron script runs so that only one instance of it runs at a time; the lock file is deleted when the script finishes.
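A minimal sketch of such a lock (here using flock(), which has the advantage that the lock is released automatically if the script crashes; the lock path is illustrative):

    <?php
    // Cron entry point: ensure only one instance runs at a time.
    $lock = fopen('/tmp/send-scheduled-emails.lock', 'c');

    if (!flock($lock, LOCK_EX | LOCK_NB)) {
        // Another instance is still running; exit quietly.
        exit(0);
    }

    // ... query for due emails, send them, update their status ...

    flock($lock, LOCK_UN);
    fclose($lock);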
But with potentially large volumes, I would want a way to process multiple messages simultaneously, using multiple "workers." Is there any existing solution to manage such a queue?
Yes! Task/message/job queues are what you are looking for. They let you put tasks onto queues and have workers retrieve and process them; this scales horizontally, because each worker pulls a new task as soon as it has finished the previous one.
You should have the cron run maybe every minute or two and do nothing more than enqueue the task and what needs to be done. This keeps the cron run very quick.
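Since the original question mentions Laravel, a minimal sketch of that cron-driven enqueue step might look like the following (ScheduledEmail and SendEmailJob are assumed, illustrative names; the heavy work happens in the queue workers, not here):

    <?php
    // Hypothetical Laravel console command, run by cron (or the scheduler)
    // every minute or two. It only enqueues jobs; workers do the sending.

    namespace App\Console\Commands;

    use App\Jobs\SendEmailJob;   // assumed queued job class
    use App\ScheduledEmail;      // assumed Eloquent model
    use Illuminate\Console\Command;

    class DispatchDueEmails extends Command
    {
        protected $signature = 'emails:dispatch-due';

        public function handle()
        {
            ScheduledEmail::where('status', 'pending')
                ->where('send_at', '<=', now())
                ->each(function ($email) {
                    SendEmailJob::dispatch($email);          // push onto the queue
                    $email->update(['status' => 'queued']);  // avoid double-dispatch
                });
        }
    }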
Take a look at Iron.io. Here is an extract from the website that gives a nice overview of these kinds of systems:
An easy-to-use scalable task queue that gives cloud developers a
simple way to offload front-end tasks, run scheduled jobs, and process
tasks in the background and at scale.
Gearman is also a great solution that you can host yourself, and it is very simple. You can send the message from one language and process it in another, say PHP -> C, etc.
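A minimal Gearman sketch in PHP (assuming the PECL gearman extension and a gearmand server on the default port; the "send_email" function name and the payload are illustrative):

    <?php
    // --- producer (e.g. called from your web app) ---
    $client = new GearmanClient();
    $client->addServer('127.0.0.1', 4730);
    $client->doBackground('send_email', json_encode([
        'to'      => 'user@example.com',
        'subject' => 'Hello',
    ]));

    // --- worker (long-running process, e.g. kept alive by supervisord) ---
    $worker = new GearmanWorker();
    $worker->addServer('127.0.0.1', 4730);
    $worker->addFunction('send_email', function (GearmanJob $job) {
        $payload = json_decode($job->workload(), true);
        // send the email here ...
    });

    while ($worker->work()) {
        // loop forever, processing one job at a time
    }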
The Wikipedia link will tell you everything you need to know; here is a quick excerpt:
Message queues provide an asynchronous communications protocol,
meaning that the sender and receiver of the message do not need to
interact with the message queue at the same time. Messages placed onto
the queue are stored until the recipient retrieves them.
Related
I want to replace my cron job with Apache Kafka using PHP.
Is this possible?
Right now my cron job updates databases. It also sends email and SMS depending on conditions, and performs periodic database updates.
It also takes a daily backup of the database.
Is it possible to implement this using Kafka?
You need to design your entire environment in terms of events rather than "batch time slots", but yes, it's possible in theory. As a shim, you can start with a Kafka consumer run from cron that reads messages for a configurable amount of time (or up to a maximum number of messages) and then processes that chunk.
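A rough sketch of that shim, assuming the php-rdkafka extension; the broker address, topic name, group id, and the 60-second budget are all illustrative:

    <?php
    // Cron-run consumer: drain messages for at most one minute, then exit.
    $conf = new RdKafka\Conf();
    $conf->set('group.id', 'cron-shim');
    $conf->set('metadata.broker.list', 'localhost:9092');

    $consumer = new RdKafka\KafkaConsumer($conf);
    $consumer->subscribe(['db-events']);

    $deadline = time() + 60;

    while (time() < $deadline) {
        $message = $consumer->consume(5000); // block up to 5s for a message

        if ($message->err === RD_KAFKA_RESP_ERR_NO_ERROR) {
            // handle the event: update the DB, send email/SMS, etc.
        } elseif ($message->err !== RD_KAFKA_RESP_ERR__TIMED_OUT) {
            break; // unexpected error, stop this run
        }
    }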
As for what you have asked for: you can create a CDC / changelog topic for database events (if you make this a compacted topic, you remove the need for a daily backup, since every database event is persisted in Kafka from the beginning of your DB history; look at the Debezium project for a starting point). From that you can derive corresponding email or SMS topics, with consumers polling them and firing off SMTP, SMS, or GCM/APNs messages, as you are probably already doing in the system you are migrating from.
None of this necessarily needs to be in PHP, either (or really Kafka over another pub-sub system, for that matter). I would urge you to consider a microservices-based approach that uses the client library and technology that make the most sense for your use cases. For example, on AWS you can integrate Kinesis (or MSK) + SNS + SES and have an equivalent Kafka + SMS + email solution with no infrastructure to maintain yourself.
Before you can go down this path, though, you need to stop batching your data into slices for cron to process, and instead publish the data event by event, doing continuous, rolling aggregations as necessary over some time windows.
I'm trying to wrap my head around the message queue model and jobs that I want to implement in a PHP app:
My goal is to offload messages / data that need to be sent to multiple third-party APIs, so that accessing them doesn't slow down the client. Sending the data to a message queue seems ideal.
I considered using just Gearman to hold the MQ/jobs, but I wanted to use a cloud queue service like SQS or Rackspace Cloud Queues so I wouldn't have to manage the messages myself.
Here's a diagram of what I think I should do:
Questions:
My workers would be written in PHP. Do they all have to be polling the cloud queue service? That could get expensive, especially when you have a lot of workers.
I was thinking of maybe having one worker just for polling the queue, and if there are messages, it notifies the other workers that they have jobs; I would just have to keep this one worker online, using supervisord perhaps. Is this polling method better than using an MQ that can notify? How should I poll the MQ: once every second, or as fast as it can poll? And should I then increase the polling workers if I see it slowing down?
I was also thinking of having a single queue for all the messages, with a worker monitoring it that distributes the messages to other cloud MQs depending on where they need to be processed, since one message might need to be processed by two different workers.
Would I still need Gearman to manage my workers, or can I just use supervisord to spin workers up and down?
Isn't it more effective and faster to also send a notification to the main worker whenever a message is sent, versus polling the MQ? I assume I would then need to use Gearman to notify my main worker that the MQ has a message, so it can start checking it. Or, if I have 300 messages per second, would this generate 300 jobs to check the MQ?
Basically, how can I check the MQ as efficiently and effectively as possible?
Any suggestions or corrections to my architecture?
My suggestions basically boil down to: Keep it simple!
With that in mind, my first suggestion is to drop the DispatcherWorker. From my current understanding, the sole purpose of that worker is to listen to the MAIN queue and forward messages to the different task queues. Your application should take care of enqueuing the right message onto the right queue (or topic).
Answering your questions:
My workers would be written in PHP. Do they all have to be polling the cloud queue service? That could get expensive, especially when you have a lot of workers.
Yes, there is no free lunch. Of course you could adapt and optimize your workers' poll rate to application usage (increase the poll rate when more messages arrive), to the time of day or week (if your users are active at specific times), and so on. Keep in mind that the engineering cost might soon be higher than the cost of unoptimized polling.
Instead, you might consider push queues (see below).
I was thinking of maybe having one worker just for polling the queue, and if there are messages, it notifies the other workers that they have jobs; I would just have to keep this one worker online, using supervisord perhaps. Is this polling method better than using an MQ that can notify? How should I poll the MQ: once every second, or as fast as it can poll? And should I then increase the polling workers if I see it slowing down?
This sounds too complicated. Ad-hoc communication is unreliable; message queues, on the other hand, are built to be reliable. If you don't want to lose data, stick to the message queues and don't invent custom protocols.
I was also thinking of having a single queue for all the messages, with a worker monitoring it that distributes the messages to other cloud MQs depending on where they need to be processed, since one message might need to be processed by two different workers.
As already mentioned, the application should enqueue your message to multiple queues as needed. This keeps things simple and in place.
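For example, with the AWS SDK for PHP the application itself can fan a message out to every queue that cares about it; a sketch (the queue URLs, region, and payload are placeholders):

    <?php
    require 'vendor/autoload.php';

    use Aws\Sqs\SqsClient;

    $sqs = new SqsClient([
        'region'  => 'us-east-1',
        'version' => 'latest',
    ]);

    $payload = json_encode(['type' => 'order.created', 'orderId' => 123]);

    // The app decides which queues need this event; no dispatcher worker required.
    $queueUrls = [
        'https://sqs.us-east-1.amazonaws.com/123456789012/email-worker',
        'https://sqs.us-east-1.amazonaws.com/123456789012/crm-sync-worker',
    ];

    foreach ($queueUrls as $url) {
        $sqs->sendMessage([
            'QueueUrl'    => $url,
            'MessageBody' => $payload,
        ]);
    }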
Would I still need Gearman to manage my workers, or can I just use supervisord to spin workers up and down?
There are so many message queues and even more ways to use them. In general, if you are using poll queues you'll need to keep your workers alive by yourself. If however you are using push queues, the queue service will call an endpoint specified by you. Thus you'll just need to make sure your workers are available.
Basically, how can I check the MQ as efficiently and effectively as possible?
This depends on your business requirements and the job your workers do. What time spans are critical: seconds, minutes, hours, days? If you use workers to send emails, it shouldn't take hours; ideally a couple of seconds. Is there a difference (for the user) between polling every 3 seconds or every 15 seconds?
Solving your problem (with push queues):
My goal is to offload messages / data that need to be sent to multiple third-party APIs, so that accessing them doesn't slow down the client. Sending the data to a message queue seems ideal. I considered using just Gearman to hold the MQ/jobs, but I wanted to use a cloud queue service like SQS or Rackspace Cloud Queues so I wouldn't have to manage the messages myself.
Indeed the scenario you describe is a good fit for message queues.
As you mentioned that you don't want to manage the message queue itself, maybe you do not want to manage the workers either? This is where push queues come in.
Push queues basically call your worker. For example, Amazon Elastic Beanstalk worker environments do the heavy lifting (polling) in the background and simply call your application with an HTTP request containing the queue message (refer to the docs for details). I have personally used AWS push queues and have been happy with how easy they are. Note that there are other push queue providers, like Iron.io.
Since you mentioned you are using PHP, there is the QPush Bundle for Symfony, which handles incoming message requests. You may have a look at its code to roll your own solution.
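The receiving side of a push queue can be as small as a plain PHP endpoint; a sketch (the payload shape is an assumption, and most push queue services typically treat a 2xx response as "handled" and anything else as "retry later"):

    <?php
    // Worker endpoint that a push queue (e.g. an Elastic Beanstalk worker
    // environment) calls with the queue message in the request body.
    $body    = file_get_contents('php://input');
    $message = json_decode($body, true);

    if ($message === null) {
        http_response_code(400); // malformed message: reject it
        exit;
    }

    // do the actual work here, e.g. call the third-party API ...

    http_response_code(200); // tell the queue the message was handled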
I would recommend a different route: use sockets. ZMQ is an example of a socket-based library that is already written. With sockets you can create a queue and manage what to do with messages as they come in. The machine will sit in standby and use minimal resources while waiting for a message to arrive.
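A minimal sketch of such a worker with the php-zmq extension (the port and payload are illustrative); a PUSH socket on the producing side would connect to the same address and send work:

    <?php
    // Long-running PULL worker: blocks with almost no CPU use until work arrives.
    $context  = new ZMQContext();
    $receiver = new ZMQSocket($context, ZMQ::SOCKET_PULL);
    $receiver->bind('tcp://*:5555');

    while (true) {
        $message = $receiver->recv(); // blocks until a message arrives
        $job = json_decode($message, true);
        // process the job (send email, call an API, ...)
    }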
I have a web application written in PHP using a Postgres database.
The next phase of development is to build background batch processes that will need to be executed once a day (or ad hoc as requested) for each user of the app. The process will query third-party services, await their responses, and process those responses to feed information into the user's account within the web application.
Are there good ways to do this?
How would batches be triggered every day at 3am for each user?
Given there could be a delay in the response, is this a good scenario for something like Node.js?
Is it best to have the output of the batch process directly update the web application's database with the appropriate data?
Or, is there some other way to handle the output?
Update: The process doesn't have to run at 3am. The key is that a few batch processes may need to run for each user, and the execution of the batches could be spread throughout the day. I want this to be a "background" process separate from the app.
You could write a PHP script that runs through any users that need to be processed, and set up a cron job to run that script at 3am. Running it as a cron job means you don't need to worry so much about how slow the third-party call is. Obviously you'd need to store any necessary data in the database.
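A sketch of what that cron-run script could look like, using PDO with Postgres (the table, column, and API names are assumptions about your schema):

    <?php
    // Nightly batch: process every user flagged as pending.
    $db = new PDO('pgsql:host=localhost;dbname=app', 'app_user', 'secret');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $users = $db->query('SELECT id FROM users WHERE batch_pending = true');

    foreach ($users as $user) {
        // A slow third-party response only delays this batch, not a web request.
        $response = file_get_contents('https://api.example.com/data?user=' . $user['id']);

        $update = $db->prepare(
            'UPDATE user_accounts SET external_data = :data, batched_at = now() WHERE user_id = :id'
        );
        $update->execute(['data' => $response, 'id' => $user['id']]);
    }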
Alternatively, if the process is triggered by the user doing something on the site, you could use exec() to trigger the PHP script to process just that user, right away, without the user having to wait. The risk with this is that you can't control how rapidly the process is triggered.
A third option is to just do the request live and make the user wait, but it sounds like that is not an option for you.
It really depends on what third party you're calling and why: how long the third party takes to respond, how reliable it is, what kind of rate limits it might enforce, and so on.
I run a website and my subscriber base is gradually increasing.
I have had to manually batch my subscribers, that is, Batch A (1-700), Batch B (701-1400), etc., and manually trigger the email sending every hour.
In addition to sending them emails, I want to perform some other tasks alongside the email.
I believe there should be a way of triggering the sending once from the web interface (that is, from my website's backend, not from the command line) and having it batch the emails and process them automatically every hour.
Looking forward to replies on how I can get this done.
Thanks in advance.
If you are unable to schedule cron jobs on your server (as is the case with most cheap hosting solutions), there are some pure PHP alternatives for running scheduled jobs; phpjobscheduler is one of them.
In UNIX-like systems, this can be done with cron. In Windows, see the Task Scheduler, schtasks or at.
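For example, a crontab entry that runs a PHP batch script at the top of every hour might look like this (the path is a placeholder):

    # run the batch script at minute 0 of every hour
    0 * * * * /usr/bin/php /path/to/send_batch.php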
If you don't have access to these tools, you cannot programmatically run scripts at a given interval (short of having another machine call your scripts via HTTP).
I am creating a web application using Zend. In it I have built an interface from which user A can send email to more than one user. It works well, but it slows down execution, so user A has to wait too long for the acknowledgement response (which is shown after the emails have been sent).
In Java there are threads, with which we could perform that task (sending the emails) without slowing down the rest of the application.
Is there any technique in PHP/Zend, like threads in Java, by which we can split off tasks that could take a long time, e.g. sending emails?
EDIT (thanks @Efazati, there seems to be new development in this direction):
http://php.net/manual/en/book.pthreads.php
Caution (from the bottom of the page linked here):
pthreads was, and is, an experiment with pretty good results. Any of its limitations or features may change at any time; [...]
/EDIT
No threads in PHP!
The workaround is to store jobs in a queue (say, rows in a table with the emails) and have a cron job call your PHP script at a given interval (say, every 2 minutes) to poll for jobs. When jobs are present, fetch a few (depending on your PHP install's timeout) and send the emails (a minimal sketch follows the gotchas below).
The main idea is to defer execution:
the main script adds jobs to the queue
the cron script sends them in tiny slices
Gotchas:
make sure you don't send an email without removing it from the queue (worst case: a user receives the same email as spam every 2 minutes ...)
make sure you don't delete a job without executing it first ...
handle bouncing emails using a scoring algorithm
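A minimal sketch of this queue-table approach (the table name, columns, and the batch size of 50 are assumptions):

    <?php
    // Cron-run script: send a small slice of the queued emails.
    $db = new PDO('mysql:host=localhost;dbname=app', 'app_user', 'secret');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Fetch a small slice so we stay well under PHP's max execution time.
    $jobs = $db->query(
        "SELECT id, recipient, subject, body FROM email_queue
         WHERE status = 'pending' ORDER BY id LIMIT 50"
    )->fetchAll(PDO::FETCH_ASSOC);

    $claim = $db->prepare("UPDATE email_queue SET status = 'processing' WHERE id = ?");
    $done  = $db->prepare("UPDATE email_queue SET status = 'sent' WHERE id = ?");

    foreach ($jobs as $job) {
        $claim->execute([$job['id']]);  // claim it so the next run can't re-send it
        mail($job['recipient'], $job['subject'], $job['body']);
        $done->execute([$job['id']]);   // mark done only after it was actually sent
    }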
You could look into using multiple processes, such as with fork. The communication between them wouldn't be as simple as with threads (but then, it won't come with all of its pitfalls either), but if you're just sending emails, it might not be necessary to communicate much, if at all.
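A rough sketch of that fork approach using the pcntl extension (CLI only; the batch size and the way $emails is loaded are illustrative):

    <?php
    // Split the pending emails into chunks and fork one child per chunk.
    $emails   = []; // ... load pending emails from your queue table ...
    $batches  = array_chunk($emails, 100);
    $children = [];

    foreach ($batches as $batch) {
        $pid = pcntl_fork();

        if ($pid === -1) {
            die('fork failed');
        } elseif ($pid === 0) {
            // child: send its batch and exit
            foreach ($batch as $email) {
                mail($email['to'], $email['subject'], $email['body']);
            }
            exit(0);
        }

        $children[] = $pid; // parent keeps track of its children
    }

    // parent waits for all children to finish
    foreach ($children as $pid) {
        pcntl_waitpid($pid, $status);
    }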
Watch out for doing forks on an Apache process. You may get some behaviors that you are not expecting. If you are looking to do any kind of asynchronous execution it should be via some kind of queuing mechanism. Gearman is one. Zend Server Job Queue is another. I have some demo code at Do you queue? Introduction to the Zend Server Job Queue. Cron can be used, but you'll have the problem of depending on your cron scheduler to run tasks whereas asynchronous computing often needs to be run immediately. Using a queuing system allows you to do that without threading.
There is a Threading extension being developed based on PThreads that looks promising at https://github.com/krakjoe/pthreads
There is pcntl, which allows you to create sub-processes, but PHP doesn't work very well for this kind of architecture. You're probably better off creating a long-running script (a daemon) and spawning multiple instances of it.
As of now, PHP has no threads. However, you can have a look at this roundabout way:
http://www.alternateinterior.com/2007/05/multi-threading-strategies-in-php.html
You may want to use a queue system for your email sending and send the email from another system which supports threads. PHP is just a tool, and you should use the tool that is best fitted for the job.
PHP doesn't include threading as part of the language; there are some methods that can emulate it, but they aren't foolproof.
This Google search shows a few potential workarounds