I am handling a project which contain message queue concept. Now the project is in PHP, and it's making more delay in message sending or mail sending. So I suggest to develop a message queue in Perl or Python script. Could you please suggest which is best either PHP or Perl or Python?
A possible solution could be to use Gearman as a queue :
Your PHP project would send messages to Gearman, as background jobs ; and finish
Gearman would dispatch those messages to workers
Workers will deal with the jobs -- doing the stuff that might take time
One additional advantage : the day you need several servers to handle a larger amount of jobs, you'll already have what's needed : Gearman will deal with load-balancing for you.
PHP is perfectly adequate to implement a simple message queue. So, if your current code is causing delays then it is because of your design, not because of some limitation with PHP. Switching to a different language isn't going to help you. Bad code is bad code regardless of language.
The best thing you can probably do is going with an existing message queue. Pascal recommended Gearman. I have worked with (and quite liked) Beanstalkd. If you need a metric ton of features, have a look at ApacheMQ or RabbitMQ.
That said, if you insist on implementing your own message queue, I would suggest sticking with PHP. That way you can re-use code from your existing application (e.g. re-use your models and database API for example).
Here are two alternative for gearman
a. Beanstalkd
b. MemcacheQ
MemcacheQ http://memcachedb.org/memcacheq/
Adding and fetching from queue needs to be done manually using code.
Its not like you send it to queue and MemcacheQ will execute it one by one.
but its very very fast.
Beanstalkd
http://kr.github.com/beanstalkd/download.html
It supports many languages.
Related
Would appreciate some help understanding typical best practices in carrying out a series of tasks using Gearman in conjunction with PHP (among other things).
Here is the basic scenario:
A user uploads a set of image files through a web-based interface. The php code responding to the POST request generates an entry in a database for each file, mostly with null entries in the columns, queues a job for each to do analysis using Gearman, generates a status page and exits.
The Gearman worker gets a job for a file and starts a relatively long-running analysis. The result of that analysis is a set of parameters that need to be inserted back into the database record for that file.
My question is, what is the generally accepted method of doing this? Should I use a callback that will ultimately kick off a different php script that is going to do the modification, or should the worker function itself do the database modification?
Everything is currently running on the same machine; I'm planning on using Gearman for background scheduling, rather than for scaling by farming out to different machines, but in any case any of the functions could connect to the database wherever it is.
Any thoughts appreciated; just looking for some insights on how this typically gets structured and what might be considered best practice.
Are you sure you want to use Gearman? I only ask because it was the defacto PHP job server about 15 years ago but hasn't been a reliable solution for quite some time. I am not sure if things have drastically improved in the last 12 months, but last time I evaluated Gearman, it wasn't production capable.
Now, on to the questions.
what is the generally accepted method of doing this? Should I use a callback that will ultimately kick off a different php script that is going to do the modification, or should the worker function itself do the database modification?
You are going to follow this general pattern with any job queue:
Collect a unit of work. In your case, it will be 1 of the images and any information about who that image belongs to, user id, etc.
Submit the work to the job queue with this information.
Job Queue's worker process picks up the work, and starts processing it. This is where I would create records in the database as you can opt to not create them on job failure.
The job queue is going to track which jobs have completed and usually the status of completion. If you are using gearman, this is the gearmand process. You also need something pickup work and process that work, I will refer to this as the job worker. The job worker is where the concurrency happens which is what i think you were referring to when you said "kick off a different php script." You can just kick off a PHP script at an interval (with supervisord or a cronjob) for a kind of poll & fork approach. It's not the most efficient approach, but it doesn't sound like it will really matter for your applications use case. You could also use pcntl_fork or pthreads in PHP to get more control over your concurrent processes and implement a worker pool pattern, but it is much more complicated than just firing off a script. If you are interested in trying to implement some concurrency in PHP, I have a proof-of-concept job worker for beanstalkd available on GitHub that implements a worker pool with both fork and pthreads. I have also include a couple of other resources on the subject of concurrency.
Job Worker (pthreads)
Job Worker (fork)
PHP Daemon Example
PHP IPC Example
I know Laravel's queue drivers such as redis and beanstalkd and I read that you can increase the number of workers for beanstalkd etc. However I'm just not sure if these solutions are right for my scenario. Here's what I need;
I listen to an XML feed over a socket connection, and the data just keeps coming rapidly. forever. I get tens of XML documents in a second.
I read data from this socket line by line, and once I get to the XML closing tag, I send the buffer to another process to be parsed. I used to just encode the xml in base64, and run a separate php process for each xml. shell_exec('php parse.php' . $base64XML);
This allowed me to parse this never ending xml data quite rapidly. Sort of a manual threading. Now I'd like to utilize the same functionality with Laravel, but I wonder if there is a better way to do it. I believe Artisan::call('command') doesn't push it to the background. I could of course do a shell_exec within Laravel too, but I'd like to know if I can benefit from Beanstalkd or a similar solution.
So the real question is this: How can I set the number of queue workers for beanstalkd or redis drivers? Like I want 20 threads running at the same time. More if possible.
A slightly less important question is: How many threads is too many? If I had a very high-end dedicated server that can process the load just fine, would creating 500 threads/workers with these tools cause any problems on the code level?
Well laravel queues are just made for that.
Basicaly, you have to create a Job Class. All the heavy work you want to do on your xml document need to be here.
Then, you fetch your xml out of the socket, and as soon as you have received one document, you push it on your Queue.
Later, a queue worker will pick it up from the queue, and do the heavy work.
The advantage of that is that if you queue up documents faster than you work on them, the queue will take care of that high load moment and queue up tasks for later.
I also don't recommend you to do it without a queue (with a fork like you did). In fact, if too much documents come in, you'll create too many childs threads and overload your server. Bookkeeping these threads correctly is risky and not worth it when a simple queue with a fixed number of workers solve all these problems out of the box).
After a little more research, I found how to set the number of worker processes. I had missed that part in the documentation. Silly me. I still wonder if this supervisor tool can handle hundreds of workers for situations like mine. Hopefully someone can share their experience, but if not I'll be updating this answer once I do a performance test this week.
I tell you from experience that shell_exec() is not the ideal way to run async tasks in PHP.
Seems ok while developing, but if you have a small vps (1-2 GB ram) you could overload your server and apache/nginx/sql/something could brake while you're not around and your website could be down for hours / days.
I recommend Laravel Queues + Scheduler for these kind of things.
We have a large web application built on PHP. This application allows scheduling tweets and wall posts and there are scheduled emails that go out from the server.
By 'scheduled', I mean that these are PHP scripts scheduled to run at particular time using cron. There are about 7 PHP files that do the above jobs.
I have been hearing about Message Queues. Can anyone explain if Message Queues are the best fit in this scenario? Do Message Queues execute PHP scripts? or do we need to configure this entirely differently? What are the advantages / disadvantages?
Using Crontab to make asynchronous tasks (asynchronous from your PHP code) is a basic approach where using a job/task queue manager is an elaborate one and give you more control, power and scalability/elasticity.
Crontab are very easy to deal with but does not offer a lot of functionalities. It is best for scheduled jobs rather than for asynchronous tasks.
On the other hand, deploying a Task queue (and its message broker) require more time. You have to choose the right tools first then learn how to implement them in your PHP code. But this is the way to go in 2011.
Thank God, I don't do PHP but have played around with Celery (coupled with RabbitMQ) on Python projects ; I am sure you can find something similar in the PHP world.
I am creating a web application using zend, here I create an interface from where user-A can send email to more than one user(s) & it works excellent but it slow the execution time because of which user-A wait too much for the "acknowledged response" ( which will show after the emails have sent. )
In Java there are "Threads" by which we can perform that task (send emails) & it does not slow the rest application.
Is there any technique in PHP/Zend just like in Java by which we can divide our tasks which could take much time eg: sending emails.
EDIT (thanks #Efazati, there seems to be new development in this direction)
http://php.net/manual/en/book.pthreads.php
Caution: (from here on the bottom):
pthreads was, and is, an experiment with pretty good results. Any of its limitations or features may change at any time; [...]
/EDIT
No threads in PHP!
The workaround is to store jobs in a queue (say rows in a table with the emails) and have a cronjob call your php script at a given interval (say 2 minutes) and poll for jobs. When jobs present fetch a few (depending on your php's install timeout) and send emails.
The main idea to defer execution:
main script adds jobs in the queue
cron script sends them in tiny slices
Gotchas:
make sure u don't send an email without deleting from queue (worst case would be if a user rescieves some spam at 2 mins interval ...)
make sure you don't delete a job without executing it first ...
handle bouncing email using a score algorithm
You could look into using multiple processes, such as with fork. The communication between them wouldn't be as simple as with threads (but then, it won't come with all of its pitfalls either), but if you're just sending emails, it might not be necessary to communicate much, if at all.
Watch out for doing forks on an Apache process. You may get some behaviors that you are not expecting. If you are looking to do any kind of asynchronous execution it should be via some kind of queuing mechanism. Gearman is one. Zend Server Job Queue is another. I have some demo code at Do you queue? Introduction to the Zend Server Job Queue. Cron can be used, but you'll have the problem of depending on your cron scheduler to run tasks whereas asynchronous computing often needs to be run immediately. Using a queuing system allows you to do that without threading.
There is a Threading extension being developed based on PThreads that looks promising at https://github.com/krakjoe/pthreads
There is pcntl, which allows you to create sub-processes, but php doesn't work very well for this kind of architecture. You're probably better off creating a long-running script (a daemon) and spawning multiple of them.
As of PHP there are no threads in it. However for php, you can have a look at this roundabout way
http://www.alternateinterior.com/2007/05/multi-threading-strategies-in-php.html
You may want to use a queue system for your email sending and send the email from another system which supports threads. PHP is just a tool and you should the tool that is best fitted for the job.
PHP doesn't include threading as part of the language, there are some methods that can emulate it but they aren't foolproof.
This Google search shows a few potential workarounds
I'm working an image processing website, instead of having lengthy jobs hold up the users browser I want all commands to return fast with a job id and have a background task do the actual work. The id could then be used to check for status and results (ie a url of the processed image). I've found a lot of distributed queue managers for ruby, java and python but I don't know nearly enough of any of those languages to be able to use them.
My own tests have been with shared mysql database to queue jobs, lock them to a worker, and mark them as completed (saving the return data in the db). It was just a messy prototype, and the entire time I felt as if I was reinventing the wheel (and not very elegantly). Does something exist in php (or that I can talk to RESTfully?) that I could use?
Reading around a bit more, I've found that what I'm looking for is a queuing system that has a php api, it doesn't have to be written in php. I've only found classes for use with Amazon's SQS, but not only is that not free, it's also quite latent sometimes (over a minute for a message to show up).
Have you tried ActiveMQ? It makes mention of supporting PHP via the Stomp protocol. Details are available on the activemq site.
I've gotten a lot of mileage out of the database approach your describing though, so I wouldn't worry too much about it.
Do you have full control over server?
MySQL queue could be fine in such case. Have a PHP script that is running constantly (in endless while loop), querying the MySQL database for new "tasks" and sleep()ing in between to reduce the load in idle time.
When each task is completed, mark it in the database and move to the next one.
To prevent that whole thing stops if your script crashes/exists (PHP memory overflow, etc.) you can, for example, place it in inittab (if you use Linux as a server) and init will restart it automatically.
Zend_Framework has a queue class, with a number of implementations of Mysql-backed, SQS and some other back-ends.
Personally, I've had excellent results with BeanstalkD recently, which also has a PHP client. I'm just serialising some data with JSON to throw into it, which gets decoded and run on the worker(s).