I have a web-app, written in PHP, and I am using Symfony2 as the core framework.
I need to regularly run thousands of small network requests every 10 minutes or so and I am trying to find the best solution for asynchronously running these jobs, without conflicting or doubling up.
Currently I have a very basic and inelegant solution where a cron job executes a PHP command script. The command synchronously works through each entry in the database and sends a network request. When that request completes (or fails), it moves on to the next one. When it has iterated over all entries, it exits, to be executed again by the cron job.
For the rewrite, I have looked at php-resque and pcntl_fork as solutions for running jobs in parallel, thereby speeding up the execution significantly. I have also looked at running multiple non-blocking socket requests from PHP but, so far, have preferred the simplicity of isolated jobs.
PHP can't do threading in the traditional sense, so what you're trying to do isn't really possible in the way that you're thinking. You've looked at the best solution (probably) in pcntl_fork, but that still won't be truly asynchronous. Have you considered using cron to accomplish this instead?
http://php.net/pthreads
http://github.com/krakjoe/pthreads
Examples are on GitHub and included in the distribution. Use the code from GitHub: it contains fixes that have not been released yet.
Related
I have created a PHP script that creates cache files from an API, and it takes around 30 minutes for the page to load completely, i.e. for it to create all the cache files.
My concern is that Hostinger's customer support is telling me it won't run for 30 minutes, but in some answers I found that it can run in the background and there is nothing to worry about until it has finished.
So is it possible for the cron job to run for up to 30 minutes?
If not, what is the best way to run that cache-building script in the background at a specific time, like a cron job does? Please explain briefly so I can find a way.
Thanks for the great answer.
Ideally, a long-running task should be hosted on a platform that allows extended operations, and defined so that it can be triggered externally; this might take the form of an endpoint in a web API.
Then you can use the cronjob to trigger that process.
Without creating a whole API, you could make this a single endpoint on your website, a hidden page that only the cronjob knows how to call, then run your script from there.
There are lots of ways around this, but the methodology is similar: use the cron job only as the trigger for a different process, and move the core logic of your script to a platform that allows long execution times.
This is a similar post: Run a “long” php-script via Cronjob. An answer there suggests executing the script without waiting for the response, which matches the expectation when calling an external web process or API: the cron job should not wait for a response.
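A minimal sketch of such a hidden trigger endpoint, assuming PHP-FPM (where `fastcgi_finish_request()` exists) and a secret shared with the cron job. The `token` query parameter, the `TRIGGER_SECRET` environment variable, `trigger.php` and `buildCacheFiles()` are hypothetical names. The endpoint answers immediately so the cron job does not wait, then keeps working; a host can still enforce its own limits, so this does not guarantee a 30-minute run.

```php
<?php

/** Constant-time comparison of the token sent by the cron job against the shared secret. */
function isValidTrigger(?string $given, string $secret): bool
{
    return $given !== null && $secret !== '' && hash_equals($secret, $given);
}

/** Send a short response to the caller, then let the script keep running. */
function respondThenContinue(string $body): void
{
    ignore_user_abort(true); // keep running if the cron job's HTTP client disconnects
    set_time_limit(0);       // lift PHP's own limit; the host may still enforce one
    header('Content-Type: text/plain');
    header('Content-Length: ' . strlen($body));
    header('Connection: close');
    echo $body;
    if (function_exists('fastcgi_finish_request')) {
        fastcgi_finish_request(); // PHP-FPM: send the response now, keep executing
    } else {
        flush(); // best effort on other SAPIs
    }
}

// Hypothetical usage in trigger.php, called from cron with something like
//   curl -s "https://example.com/trigger.php?token=SECRET" > /dev/null
//
// if (!isValidTrigger($_GET['token'] ?? null, getenv('TRIGGER_SECRET') ?: '')) {
//     http_response_code(404); // pretend the page does not exist
//     exit;
// }
// respondThenContinue("started\n");
// buildCacheFiles(); // hypothetical: your existing cache-building code
```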
It's good practice to limit resources on a web server, especially on a shared hosting account, because in most cases long-running scripts can slow the web server down and even cause a denial-of-service situation.
It's recommended to run the script using php-cli and cron.
php-cli offers much more relaxed time and resource limits (by default, the CLI has no max_execution_time). Please also read:
Events in MariaDB VS Cron in php - which is better
I have implemented a command in my Symfony setup which grabs a job from the DB and then processes it.
How can I run multiple instances of the command at once, to get through jobs quicker? I know that multithreading is not supported in PHP, but seeing as the command is called from the shell, I was wondering if there was a workaround.
I call the command using:
app/console job:process
The way I would solve this is to use a work queue with multiple workers. It's easier to manage and scale than manually running multiple processes and worrying about concurrency.
The simplest general-purpose queue I've found for working with PHP/Symfony is beanstalkd, which you can integrate into Symfony2 with the LeezyPheanstalkBundle.
In general, I'd suggest using the enqueue library. You can choose from a variety of available transports, from the simplest ones like filesystem and Doctrine DBAL to real ones like RabbitMQ and Amazon SQS.
Regarding the consumers, you need some sort of process manager. There are several options:
http://supervisord.org/ - You need an extra service, and it has to be configured properly.
A pure PHP process manager like this. Based on the Symfony Process component and pure PHP code. It can handle process reboots, a correct exit on the SIGTERM signal and a lot more.
A PHP/Swoole process manager like this. It requires the Swoole PHP extension, but its performance is amazing.
I have written a blog post on how to solve this exact problem. https://plume.baucum.me/~/Absolutely/running-multiple-processes-simultaneously-in-a-symfony-command
It is much too long to rehash everything here, but the basic concept is that your command optionally takes in a job ID. The command checks whether an ID was given. If not, it grabs all the jobs from the DB, loops over them, and calls itself again with the job ID parameter. As each command is kicked off, you store it in an array, and if the array gets too big you sleep, for rate throttling. As commands finish, you remove them from the array.
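The post itself uses Symfony; the sketch below is not its code. It only illustrates the throttling loop described above with core PHP's `proc_open()` and `proc_get_status()`: start commands while under the limit, keep them in an array, and remove them as they finish. The `--job-id` option in the usage comment is hypothetical.

```php
<?php

/**
 * Run each command (an argv array) as its own process, at most $maxParallel at a time.
 * Returns the exit code of every command, keyed like $commands.
 */
function runThrottled(array $commands, int $maxParallel, int $pollMicros = 50_000): array
{
    $maxParallel = max(1, $maxParallel);
    $devNull = PHP_OS_FAMILY === 'Windows' ? 'NUL' : '/dev/null';
    $spec = [1 => ['file', $devNull, 'w'], 2 => ['file', $devNull, 'w']];
    $queue = $commands;
    $running = [];   // key => process handle: the array of started commands
    $exitCodes = [];

    while ($queue || $running) {
        // Start commands while we are under the limit.
        while ($queue && count($running) < $maxParallel) {
            $key = array_key_first($queue);
            $command = $queue[$key];
            unset($queue[$key]);
            $process = proc_open($command, $spec, $pipes);
            if ($process === false) {
                $exitCodes[$key] = -1; // could not start
                continue;
            }
            $running[$key] = $process;
        }
        // Remove commands that have finished.
        foreach ($running as $key => $process) {
            $status = proc_get_status($process);
            if (!$status['running']) {
                $exitCodes[$key] = $status['exitcode'];
                proc_close($process);
                unset($running[$key]);
            }
        }
        if ($running) {
            usleep($pollMicros); // throttle: wait before checking again
        }
    }
    return $exitCodes;
}

// Hypothetical usage from the parent job:process run (no job ID given):
// $cmds = array_map(fn(int $id) => [PHP_BINARY, 'app/console', 'job:process', "--job-id=$id"], $jobIds);
// runThrottled($cmds, 8);
```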
When the command is run with a job ID, it creates a lock using Symfony's Lock component so that a job cannot accidentally be processed twice at the same time. It is important that you release the lock when the job either finishes or errors out. Once it has the ID and the lock, it calls whatever code you have written to actually process the job.
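The post uses Symfony's Lock component. As a dependency-free stand-in (a swapped-in technique, not the Lock component's API), the sketch below uses PHP's built-in `flock()` on a per-job lock file; the lock directory and file naming are assumptions. The `finally` blocks release the lock whether the job finishes or throws, and the OS also drops an `flock()` lock if the process dies.

```php
<?php

/**
 * Run $work($jobId) only if no other process holds this job's lock.
 * Returns false (without running $work) when the job is already being processed.
 */
function withJobLock(int $jobId, callable $work, ?string $lockDir = null): bool
{
    $lockDir = $lockDir ?? sys_get_temp_dir();
    $handle = fopen($lockDir . "/job-$jobId.lock", 'c'); // create if missing, never truncate
    if ($handle === false) {
        throw new RuntimeException("cannot open lock file for job $jobId");
    }
    try {
        if (!flock($handle, LOCK_EX | LOCK_NB)) {
            return false; // someone else is processing this job right now
        }
        try {
            $work($jobId);
        } finally {
            flock($handle, LOCK_UN); // unlock whether the job finished or threw
        }
        return true;
    } finally {
        fclose($handle);
    }
}

// Hypothetical usage inside job:process when a job ID is given:
// withJobLock($jobId, fn(int $id) => $processor->process($id)); // $processor is hypothetical
```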
Using this technique, I have taken commands that took hours to run (because they went through each task synchronously) down to only minutes. Make sure to try different throttle settings to balance resource utilization against the time it takes to execute your task.
I'm looking for some ideas for the following. I need a PHP script to perform a certain action for quite a long time. This is an extension for a CMS, so it can't be anything but PHP. It also can't be a command-line script, because it should be usable by ordinary people who will have only the standard means of the CMS. One option is a cron job (most simple hosting plans have it) that triggers the script often, so that instead of working for a long time it performs the action step by step, preserving its state from one launch to the next. This is not perfect, but I can't think of any other solutions. If the script redirects to itself, the server will interrupt it. What other options might suit?
Thanks everyone in advance!
What you're talking about is a daemon: a long-running program that waits for calls from client programs, performs an action, provides a response and then keeps waiting for more calls.
You might be familiar with these in the form of Apache and MySQL ;) Anyway, PHP is generally OK in this regard: it can work over raw sockets as well as fork sub-processes to handle multiple requests simultaneously.
Having said that, PHP daemons are a tool where YMMV. Some folks will say they work great; other folks like me will say they have issues with interprocess communication and leaking memory, even amid a plethora of unset() calls.
Anyway you likely won't be able to deploy a daemon of any type on a shared hosting environment. You'll need to get a better server package or stick with a Cron based solution.
Here's a link about writing a PHP daemon.
Also, one more note. Daemons do crash from time to time, so you may still need to store state about what's going on, just in case someone trips over the power cord to your shared server :)
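For the cron-driven, step-by-step approach from the question, and the advice above about storing state, here is a minimal checkpointing sketch. Each run processes the next chunk and records its position in a JSON state file. The file path, chunk size and `$step` callback are assumptions. The state is written to a temporary file and then renamed, so a crash mid-write does not corrupt it.

```php
<?php

/**
 * Process the next $chunkSize items after the saved offset.
 * Returns true once every item has been processed.
 */
function processNextChunk(array $items, callable $step, string $stateFile, int $chunkSize = 100): bool
{
    $offset = 0;
    if (is_file($stateFile)) {
        $state = json_decode((string) file_get_contents($stateFile), true);
        $offset = is_array($state) ? (int) ($state['offset'] ?? 0) : 0;
    }

    foreach (array_slice(array_values($items), $offset, $chunkSize) as $item) {
        $step($item);
        $offset++;
        // Save progress after every item: write a temp file, then rename it over
        // the old state (atomic on POSIX filesystems), so a crash repeats at most one item.
        $tmp = $stateFile . '.tmp';
        file_put_contents($tmp, json_encode(['offset' => $offset]));
        rename($tmp, $stateFile);
    }

    return $offset >= count($items);
}

// Hypothetical usage from a script that cron runs every few minutes:
// $done = processNextChunk($allItems, 'handleItem', __DIR__ . '/progress.json', 200);
```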
I would also suggest that you think about making it a daemon but if not then you can simply use
set_time_limit(0);
ignore_user_abort(true);
at the top to tell it not to time out and not to get interrupted by anything. Then call it from the cron to start it every day or whatever. I have this on many long processing daily tasks and it works great for me. However, it won't be able to easily talk to the outside world (other scripts can't query it or anything -- if that is what you want look into php services) so once you get it running make sure it will stop and have it print its progress to a logfile.
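A minimal sketch of that pattern, with progress written to a log file as suggested. The item list, the `$step` callback and the log path are placeholders. Some shared hosts disable `set_time_limit()` (hence the `function_exists()` check) or kill long processes regardless.

```php
<?php

/** Process every item without PHP's time limit, appending progress lines to $logFile. */
function runLongTask(iterable $items, callable $step, string $logFile): int
{
    if (function_exists('set_time_limit')) { // some hosts disable it
        set_time_limit(0);                   // no max_execution_time from here on
    }
    ignore_user_abort(true);                 // keep going if the caller disconnects
    $done = 0;
    foreach ($items as $item) {
        $step($item);
        $done++;
        // Message type 3 appends the string to $logFile as-is (no newline is added).
        error_log(sprintf("[%s] processed %d\n", date('c'), $done), 3, $logFile);
    }
    error_log(sprintf("[%s] finished, %d items\n", date('c'), $done), 3, $logFile);
    return $done;
}

// Hypothetical usage in the script that cron starts every day:
// runLongTask(fetchPendingItems(), 'processItem', __DIR__ . '/task.log');
```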
I am developing a website that requires a lot of background processes to run, for example a queue, a video encoder and a few other types of background processes. Currently I have these running as PHP CLI scripts that contain:
while (true) {
// some code
sleep($someAmountOfSeconds);
}
OK, these work fine, but I was thinking of setting them up as daemons, which would give them an actual process ID that I can monitor; I could also run them in the background without having a terminal open all the time.
I would like to know if there is a better way of handling these? I was also thinking about cron jobs but some of these processes need to loop every few seconds.
Any suggestions?
Creating a daemon that you can make calls to and ask questions would seem the sensible option. It depends on whether your host permits such things; especially if you need it to do work every few seconds, an OS-based service/daemon would definitely seem far more sensible than anything else.
You could create a daemon in PHP, but in my experience this is a lot of hard work and the result is unreliable due to PHP's memory management and error handling.
I had the same problem, I wanted to write my logic in PHP but have it daemonised by a stable program that could restart the PHP script if it failed and so I wrote The Fat Controller.
It's written in C, runs as a daemon and can run PHP scripts, or indeed anything. If the PHP script ends for whatever reason, The Fat Controller will restart it. This means you don't have to take care of daemonising or error recovery - it's all handled for you.
The Fat Controller can also do lots of other things such as parallel processing which is ideal for queue processing, you can read about some potential use cases here:
http://fat-controller.sourceforge.net/use-cases.html
I've done this for 5 years, using PHP to run background tasks, and it's no different from doing it in any other language. Just use cron and lock files. The lock file will prevent multiple instances of your script from running.
It's also important to monitor your code. One check I always do, to stop stale lock files from preventing scripts from running, is a second cron job that checks whether the lock file is older than a few minutes and whether an instance of the PHP script is running; if the file is old and no instance is running, it removes the lock file.
Using this technique allows you to set your CRON to run the script every minute without issues.
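A sketch of the lock-file approach, assuming the lock file stores the PID of the instance that created it. Unlike the answer, which runs the stale check from a second cron job, this version folds the check into the script's own startup. The five-minute threshold and the lock path are assumptions. `posix_kill($pid, 0)` only tests whether the process exists; the `/proc` fallback is Linux-only.

```php
<?php

/** True if a process with this PID appears to exist. */
function isProcessRunning(int $pid): bool
{
    if (function_exists('posix_kill')) {
        // Signal 0 sends nothing; it only checks the PID. Note it also returns
        // false for a live process owned by another user (EPERM).
        return posix_kill($pid, 0);
    }
    return is_dir("/proc/$pid"); // Linux-only fallback
}

/**
 * Try to take the lock. Returns false if another instance holds it.
 * A lock file is treated as stale only if it is older than $staleAfter seconds
 * AND the PID written in it is no longer running.
 */
function acquireLockFile(string $path, int $staleAfter = 300): bool
{
    clearstatcache(true, $path);
    if (file_exists($path)) {
        $pid = (int) file_get_contents($path);
        $age = time() - (int) filemtime($path);
        if ($age < $staleAfter || ($pid > 0 && isProcessRunning($pid))) {
            return false;
        }
        @unlink($path); // stale lock left by a crashed run (a small race window remains here)
    }
    // Mode 'x' fails if the file exists, so only one racing instance can create it.
    $handle = @fopen($path, 'x');
    if ($handle === false) {
        return false;
    }
    fwrite($handle, (string) getmypid());
    fclose($handle);
    return true;
}

// Hypothetical usage at the top of the script cron runs every minute:
// $lock = '/tmp/my-task.lock';
// if (!acquireLockFile($lock)) { exit(0); }
// try { /* ... do the work ... */ } finally { unlink($lock); }
```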
Use the System::Daemon module from PEAR.
One solution (that I really need to try myself, as I may need it) is to use cron, but get the process to loop for five mins or so. Then, get cron to kick it off every five minutes. As one dies, the next one should be finishing (or close to finishing).
Bear in mind that the two may overlap a bit, and so you need to ensure that this doesn't cause a clash (e.g. writing to the same video file). Some simple inter-process communication may be useful, even if it is just writing to a PID file in the temp directory.
This approach is a bit low-tech but helps avoid PHP hanging onto memory over the longer term - sort of in-built task restarts!
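A minimal sketch of the "loop for a while, then exit" idea, assuming cron starts the script every five minutes and the loop stops a little before that. The `$work` callback is a placeholder; returning `false` from it ends the run early. As noted above, consecutive runs can still overlap slightly (the last pause can overshoot the deadline), so some form of locking, such as a PID file, is still needed.

```php
<?php

/**
 * Call $work repeatedly until $seconds have elapsed or $work returns false.
 * Returns the number of calls made.
 */
function runForAWhile(float $seconds, callable $work, int $pauseMicros = 1_000_000): int
{
    $deadline = microtime(true) + $seconds;
    $calls = 0;
    while (microtime(true) < $deadline) {
        $calls++;
        if ($work() === false) {
            break; // nothing left to do; the next cron run starts fresh
        }
        usleep($pauseMicros); // may overshoot the deadline by up to one pause
    }
    return $calls;
}

// Hypothetical crontab entry and call for a 5-minute schedule:
//   */5 * * * * php /path/to/worker.php
// runForAWhile(280.0, 'processQueueOnce', 2_000_000); // processQueueOnce is hypothetical
```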
Greetings All!
I'm having some trouble figuring out how to execute thousands upon thousands of requests to a web service (eBay). I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make a API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
So you can imagine 1,000-10,000 GET requests occurring...
This was OK: around 100-200 requests were occurring in about a minute or two. However, only 100-200 of the 1,000 items were actually processed. I'm thinking that I'm hitting some sort of Apache or MySQL limit?
But this does add latency; it's almost like performing a DoS attack on myself.
How would you handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the data returned by the web service... and this needed to be done within 5 minutes?
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As grossvogel suggested, you need to use many processes to speed things up, because the script spends most of its time waiting for responses.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!', $output);
$processId = (int) $output[0]; // the PID printed by "echo $!"
You can pass parameters (read with getopt) to the PHP script on the command line to tell it which "batch" to process. You can have the master script run a sleep/check cycle, checking the process IDs to see whether the scripts are still running. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
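A sketch of that master/worker split, assuming a POSIX shell. The master launches one background worker per batch with `exec()` and collects the PIDs printed by `echo $!`; each worker reads its batch number with `getopt()`. The `--batch`/`--batches` option names, the log path and the even split with `array_chunk()` are assumptions. The master's sleep/check cycle could then poll the returned PIDs, for example with `posix_kill($pid, 0)`.

```php
<?php

/** Master: start $batches background workers and return their PIDs (POSIX shell only). */
function launchBatches(string $script, int $batches, string $log = '/tmp/logfile'): array
{
    $pids = [];
    for ($i = 0; $i < $batches; $i++) {
        $output = [];
        $cmd = sprintf(
            'nohup %s %s --batch=%d --batches=%d >> %s 2>&1 & echo $!',
            escapeshellarg(PHP_BINARY), escapeshellarg($script), $i, $batches, escapeshellarg($log)
        );
        exec($cmd, $output);
        $pids[$i] = (int) ($output[0] ?? 0);
    }
    return $pids;
}

/** Worker: read --batch / --batches from the command line. */
function batchOptions(): array
{
    $opts = getopt('', ['batch:', 'batches:']);
    return [(int) ($opts['batch'] ?? 0), (int) ($opts['batches'] ?? 1)];
}

/** Worker: the slice of $ids that belongs to batch $batch (0-based) of $batches. */
function idsForBatch(array $ids, int $batch, int $batches): array
{
    if ($batches < 1 || $batch < 0 || $batch >= $batches) {
        throw new InvalidArgumentException('batch must be in [0, batches)');
    }
    $size = (int) ceil(count($ids) / $batches);
    if ($size === 0) {
        return []; // no ids at all
    }
    return array_chunk(array_values($ids), $size)[$batch] ?? [];
}
```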
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
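A sketch of the multi-cURL half, assuming the `curl` extension. Instead of adding all 1,000-10,000 handles at once (as in the question), it keeps at most `$maxConcurrent` transfers in flight and starts a new one as each finishes. The concurrency and timeout defaults are guesses to tune, failed transfers come back as `null`, and an API POST would need extra options such as `CURLOPT_POSTFIELDS` per handle.

```php
<?php

/**
 * Fetch every URL with a rolling window of at most $maxConcurrent parallel transfers.
 * Returns body strings keyed like $urls (null for transfers that failed).
 */
function fetchAll(array $urls, int $maxConcurrent = 50, int $timeoutSeconds = 10): array
{
    $mh = curl_multi_init();
    $queue = $urls;
    $results = [];
    $inFlight = 0;

    $startNext = function () use (&$queue, &$inFlight, $mh, $timeoutSeconds): void {
        $key = array_key_first($queue);
        $ch = curl_init($queue[$key]);
        unset($queue[$key]);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_TIMEOUT => $timeoutSeconds,
            CURLOPT_PRIVATE => (string) $key, // remember which item this handle is for
        ]);
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    while ($queue && $inFlight < $maxConcurrent) {
        $startNext();
    }

    while ($inFlight > 0) {
        curl_multi_exec($mh, $stillRunning);
        // Collect finished transfers and refill the window.
        while (($info = curl_multi_info_read($mh)) !== false) {
            $ch = $info['handle'];
            $key = curl_getinfo($ch, CURLINFO_PRIVATE);
            $results[$key] = $info['result'] === CURLE_OK ? curl_multi_getcontent($ch) : null;
            curl_multi_remove_handle($mh, $ch);
            $inFlight--;
            if ($queue) {
                $startNext();
            }
        }
        if ($inFlight > 0 && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // select unavailable; avoid a busy loop
        }
    }

    curl_multi_close($mh);
    return $results;
}
```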
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and caching wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
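One way to batch the database writes, shown here instead of a multi-statement query (a swapped-in technique), is to build a single multi-row MySQL `INSERT ... ON DUPLICATE KEY UPDATE` with placeholders and run it as one prepared statement. The table and column names are hypothetical, and the first column is assumed to be the unique key. Very large batches can hit packet or placeholder limits, so a few hundred rows per statement is a safer start. MySQL 8.0.20+ deprecates `VALUES()` in this clause in favour of a row alias, though it still works.

```php
<?php

/**
 * Build one "INSERT ... VALUES (...), (...) ON DUPLICATE KEY UPDATE ..." statement
 * with ? placeholders for $rowCount rows. The first column is assumed to be the key.
 */
function buildUpsert(string $table, array $columns, int $rowCount): string
{
    if ($rowCount < 1 || count($columns) < 2) {
        throw new InvalidArgumentException('need at least one row and two columns');
    }
    $quote = fn(string $name): string => '`' . str_replace('`', '``', $name) . '`';
    $row = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $updates = array_map(
        fn(string $c): string => $quote($c) . ' = VALUES(' . $quote($c) . ')',
        array_slice($columns, 1)
    );
    return 'INSERT INTO ' . $quote($table)
        . ' (' . implode(', ', array_map($quote, $columns)) . ')'
        . ' VALUES ' . implode(', ', array_fill(0, $rowCount, $row))
        . ' ON DUPLICATE KEY UPDATE ' . implode(', ', $updates);
}

// Hypothetical usage with PDO, writing many API results in one round trip:
// $rows = [[101, 9.99], [102, 4.50]];               // [item_id, price] pairs
// $stmt = $pdo->prepare(buildUpsert('items', ['item_id', 'price'], count($rows)));
// $stmt->execute(array_merge(...$rows));            // flatten to one parameter list
```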
The http requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an api call and stores the results in a file or memcached. The master can periodically read the results and update the db. (Careful to rotate the data store for safe reading and writing by multiple processes.)
To understand your requirements better: must you implement your solution only in PHP, or can you interface a PHP part with another part written in another language?
If you can't use another language, try performing this update as a PHP script that runs in the background rather than through Apache.
You can follow Brent Baisley's advice for a simple use case.
If you want to build a robust solution, then you need to:
set up a representation of the actions in a database table that will be your process queue;
set up a script that pops this queue and processes your actions;
set up a cron daemon that runs this script every x.
This way you can have 1,000 PHP scripts running, using your OS's parallelism capabilities, without hanging when eBay is taking time to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting:
the number of requests one PHP script makes;
the order / number / type / priority of the action in the queue;
the number of scripts the cron daemon runs.
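A sketch of the queue-table idea with PDO; the `jobs` table, its columns and the status values are hypothetical. A worker picks the highest-priority pending row and claims it with a conditional `UPDATE`; if another worker claimed it first, the update affects zero rows and the worker tries again. This works on MySQL and SQLite alike; on MySQL 8, `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction is an alternative.

```php
<?php

// Hypothetical schema:
//   CREATE TABLE jobs (id INTEGER PRIMARY KEY, payload TEXT,
//                      priority INTEGER NOT NULL DEFAULT 0,
//                      status VARCHAR(10) NOT NULL DEFAULT 'pending');

/** Claim the next pending job for this worker, or return null if the queue is empty. */
function claimNextJob(PDO $db, int $maxAttempts = 5): ?array
{
    $claim = $db->prepare("UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'pending'");
    for ($i = 0; $i < $maxAttempts; $i++) {
        $select = $db->query(
            "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY priority DESC, id LIMIT 1"
        );
        $row = $select->fetch(PDO::FETCH_ASSOC);
        $select->closeCursor();
        if ($row === false) {
            return null; // nothing left to do
        }
        $claim->execute([$row['id']]);
        if ($claim->rowCount() === 1) {
            return $row; // this worker owns the job now
        }
        // Another worker claimed it between our SELECT and UPDATE; try the next one.
    }
    return null;
}

/** Mark a claimed job as done or failed. */
function finishJob(PDO $db, int $id, bool $ok): void
{
    $db->prepare('UPDATE jobs SET status = ? WHERE id = ?')->execute([$ok ? 'done' : 'failed', $id]);
}
```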
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely. Rather than executing the sub-processes using cURL like I did before, forking takes a massive load off, and it also nicely gets around the issue of maxing out my Apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be set up with multiple processes.
I have created a system that resembles the one you are describing. It runs in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because I do not need any communication between the processes. Everything resides in a database, so each process is spawned with its full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately, using PHP for this type of application does not appear to be a widespread idea, and I really had to write wrappers for the low-level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that I'm able to fork off actual PHP code rather than execute other programs.
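A minimal sketch of such a control process using the PCNTL functions, assuming the CLI SAPI with the `pcntl` extension loaded. It forks a child per job that runs actual PHP code, keeps a table of child PIDs to cap concurrency, and records each child's exit code. The `$handle` callback and the default of 8 children are placeholders. Children should not reuse database connections opened before the fork.

```php
<?php

/**
 * Run $handle($job) for every job in a forked child, with at most $maxChildren alive.
 * Returns the exit code per job key (0 = success, 1 = the handler threw).
 */
function forkEach(array $jobs, callable $handle, int $maxChildren = 8): array
{
    if (!function_exists('pcntl_fork')) {
        throw new RuntimeException('the pcntl extension (CLI) is required');
    }
    $children = []; // pid => job key: the table of child processes
    $exitCodes = [];

    $waitForOne = function () use (&$children, &$exitCodes): void {
        $pid = pcntl_wait($status); // block until any child exits
        if ($pid === -1) {
            throw new RuntimeException('pcntl_wait failed');
        }
        if (isset($children[$pid])) {
            $exitCodes[$children[$pid]] = pcntl_wexitstatus($status);
            unset($children[$pid]);
        }
    };

    foreach ($jobs as $key => $job) {
        while (count($children) >= $maxChildren) {
            $waitForOne();
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            throw new RuntimeException('fork failed');
        }
        if ($pid === 0) {
            // Child: run the PHP code, then exit so it never continues the parent's loop.
            try {
                $handle($job);
                exit(0);
            } catch (Throwable $e) {
                exit(1);
            }
        }
        $children[$pid] = $key; // parent: remember the child
    }

    while ($children) {
        $waitForOne();
    }
    return $exitCodes;
}
```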