I have some PHP code that executes for a very long time.
I need to implement the following scheme:
The user opens some page (page 1).
This page starts execution of my large PHP script in the background (every change is written to the database).
Every N seconds we send a query to the database to get the current status of the execution.
I don't want to use the exec command, because 1,000 users would mean 1,000 PHP processes. That's not an option for me...
So you basically want a queue (possibly stored in a database) and a command-line script, run by cron, that processes queued items.
Clarification: I'm not sure what's unclear about my answer, but it complies with the two requirements imposed by the question:
The script cannot be aborted by the client
You share a single process between 1,000 clients
Use HTTP requests to the local HTTP server from within your script, in combination with PHP's ignore_user_abort() function.
That way you keep the load inside the HTTP server's worker processes, you have a natural limit, and queuing of requests comes for free.
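As a rough illustration of that idea (worker.php, the port, and the job parameter are placeholders, not anything from the original setup):
<?php
// Page 1: hand the job to a worker URL on the same host and disconnect immediately.
$fp = fsockopen('127.0.0.1', 80, $errno, $errstr, 1);
if ($fp) {
    fwrite($fp, "GET /worker.php?job=123 HTTP/1.1\r\n"
              . "Host: localhost\r\nConnection: Close\r\n\r\n");
    fclose($fp);   // don't wait for the response
}

// worker.php would then start with:
//   ignore_user_abort(true);  // keep running although the caller disconnected
//   set_time_limit(0);
//   ... long job, writing progress to the database ...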
You can use the CLI to execute multiple PHP scripts
or
you can try Easy Parallel Processing in PHP
First of all sorry to post a question that seems to have been flogged to death on SO before. However, none of the questions I have reviewed helped me to solve my specific problem.
I have built a web application that runs an extensive data processing routine in PHP (i.e. MySQL queries, calculations, etc.).
Depending on the amount of data fed to the app this processing can take quite a long time so the script needs to run server-side and independently from the web front-end.
There is a problem, however: it seems I cannot control the script execution time limit as long as the script is invoked via CGI.
When I run the script via SSH and the command line, it works fine for however long it takes to process the data.
But if I use the exec() command in a PHP script called via the web server, I always end up with the error "End of script output before headers" after approximately 45 seconds.
Rather than having to fiddle with server settings (a nightmare in terms of portability), I would like to find a solution that kicks off the script independently of CGI.
Any suggestions?
Don't execute the long script directly from the website (AKA, directly from Apache) because, as you've mentioned, it will block until it finishes and potentially time out. Instead, use the website to schedule a job (an execution of the long script) to be run immediately.
Here is a basic outline of how you can potentially do this:
Create a new, small database to store job requests, including fields such as job_id, processing_status, run_start_time, and any other relevant fields
Create some Ajax that hits your server and writes a "job request" to this jobs database, set to execute immediately.
Add a crontab script or bot that periodically watches for new jobs. If it finds a job that is yet to be processed but has passed its run_start_time, run it using exec() or some other command executor. This way the command won't time out, because it is not being run by Apache but by the cron daemon.
When the command finishes, update the jobs database saying that processing is finished.
From your website, write a frontend that allows the user to see if the requested job is finished yet. Once it finishes, it displays some kind of "Done" indicator or something similar.
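A minimal sketch of the crontab-side worker described above (the PDO connection details, the jobs table name, and the long script's path are placeholders; only the job_id, processing_status, and run_start_time fields come from the outline):
<?php
// Run by cron, e.g. once a minute: pick up due jobs and execute them outside Apache.
$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

$due = $db->query(
    "SELECT job_id FROM jobs
     WHERE processing_status = 'pending' AND run_start_time <= NOW()"
);

foreach ($due as $row) {
    $id = (int) $row['job_id'];
    $db->exec("UPDATE jobs SET processing_status = 'running' WHERE job_id = $id");

    // The long-running script path is a placeholder; cron owns this process, not Apache.
    exec('/usr/bin/php /path/to/long_script.php ' . $id);

    $db->exec("UPDATE jobs SET processing_status = 'done' WHERE job_id = $id");
}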
I've written a PHP script which performs web scraping from one site and parses input for my website.
The script is driven periodically by a cron job, and everything is hosted on a shared web server.
The problem is: my script terminates several times a day, with no error message and in a random location in the code each time.
The script is long, performing 2 HTTP GETs and 4 HTTP POSTs to a website in another country; each HTTP request takes ~3 seconds to complete. It also writes to files and reads from/writes to a MySQL database.
I'm stuck on it after trying the following things:
1) Talking with my hosting support (IxWebHosting) - they just wasted my time, denied any responsibility, and advised me to limit the cron job frequency to at most once every 5 minutes (before, it was a 3-minute interval); however, it didn't change anything.
2) Instead of running from cronjob context, I've switched to the following method:
a. A cron job calls a "loader PHP script" every 5 minutes.
b. The "loader PHP script" calls the real PHP script using an HTTP GET and terminates without waiting for an answer.
c. The real PHP script performs its ~20-second job (this is where the program terminates in a random location).
3) Writing timestamps to a log file in many places throughout the code in order to see where the program terminated on each run - this showed me that the program terminates all over the code.
4) In order to prove it's not my code's fault, I've performed the following test:
a. A cron job calls another loader PHP script.
b. The PHP script performs an HTTP request to a different testing-purpose PHP script and terminates without waiting for a response.
c. The 2nd PHP script performs a dummy 20-second task: it sleeps for a second and writes a timestamp into a log file, 20 times.
Result: the test succeeded! The 2nd program didn't fail... which means it has something to do with my code and the web server I'm running on - however, since it fails every time in a different place and only ~10 times a day (out of the 288 times it runs per day), I can't tell where the problem is (and there's no PHP error message).
Thanks in advance, and sorry for the long description - I'll be happy to provide more details upon request.
Are you logging the actual process, rather than writing logs from within the process? E.g. does your cron job look like this:
* * * * * /home/user/myTroublesomeJob.php >/tmp/crash.log 2>&1
This will catch the stdout/stderr of the process itself. It may also be worth invoking your script from a parent shell script, which can catch the PHP process exiting and dump out the exit code (which would indicate a core dump, a signal being caught, etc.). See here for more info.
Try setting the timeout at the start of the script.
set_time_limit(1800);
I found recently that if I ran a script manually, it went fine, but if it was run by Cron, it would throw timeout errors. Putting this limit in helped.
If you are running a script on a shared server, it will not allow you to run long-running scripts.
If your script takes a long time, please use a dedicated server, because on a shared server many users are using shared resources, so the server automatically kills a script that is using too many resources.
I would suggest you use the Amazon EC2 free tier. There you will be able to run long-running scripts.
Thanks
For example, there is a very simple PHP script which updates some tables in a database, but this process takes a long time (maybe 10 minutes). Therefore, I want this script to continue processing even if the user closes the browser, because sometimes users do not wait and they close the browser or go to another web page.
If the task takes 10 minutes, do not use a browser to execute it directly. You have lots of other options:
Use a cronjob to execute the task periodically.
Have the browser request insert a new row into a database table, so that a regular cronjob can process the new row and execute the PHP script with the appropriate arguments.
Have the browser request write a message to a queue system, which has a subscriber listening for such events (which then executes the script).
While some of these suggestions are probably overkill for your situation, the key feature they all share is decoupling the browser request from the execution of the job, so that it can be completed asynchronously.
If you need the browser window updated with progress, you will need to use a periodically-executed AJAX request to retrieve the job status.
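For the progress polling, the AJAX call could hit something like this hypothetical status.php (the connection details, table, and column names are just examples):
<?php
// status.php - returns the current status of one job as JSON for the polling AJAX call.
$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

$stmt = $db->prepare('SELECT processing_status FROM jobs WHERE job_id = ?');
$stmt->execute([(int) ($_GET['job_id'] ?? 0)]);

header('Content-Type: application/json');
echo json_encode(['status' => $stmt->fetchColumn() ?: 'unknown']);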
To answer your question directly, see ignore_user_abort
More broadly, you probably have an architecture problem here.
If many users can initiate this stuff, you'll want the web application to add jobs to some kind of queue, and have a set number of background processes that chew through all the work.
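For the direct route via ignore_user_abort, the essential calls at the top of the long script are just (a minimal sketch):
<?php
ignore_user_abort(true);   // keep running when the client disconnects
set_time_limit(0);         // lift the default 30-second execution limit
// ... the 10-minute table updates go here ...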
The PHP script will keep running after the client terminates the connection (not doing so would be a security risk), but only up to max_execution_time (set in php.ini or through a PHP script; generally 30 seconds by default).
For example:
<?php
$fh = fopen("bluh.txt", 'w');
for ($i = 0; $i < 20; $i++) {
    echo $i . "<br/>";
    fwrite($fh, $i . "\n");
    sleep(1);
}
fclose($fh);
?>
Start running that in your browser and close the browser before it completes. You'll find that after 20 seconds the file contains all of the values of $i.
Change the upper bound of the for loop to 100 instead of 20, and you'll find it only runs from 0 to 29. Because of PHP's max_execution_time the script times out and dies.
If the script is completely server-based (no feedback to the user), it will finish even if the client is closed.
The general architecture of PHP is that a client sends a request to a script, which returns a reply to the user. If nothing is given back to the user, the script will still execute, even if the user is no longer on the other side. Put more simply: there is no persistent connection between server and client for a regular script.
You can make the PHP script run every 20 minutes using a crontab file, which contains the schedule and the command to run - in this case, the PHP script.
Yes. The server doesn't know if the user closed the browser. At least it doesn't notice that immediately.
No: the server probably (depending on how it is configured) won't allow a PHP script to run for 10 minutes. On cheap shared hosting, I wouldn't rely on a script running for longer than a reasonable response time.
A server-side script will carry on with what it is doing regardless of what the client is doing.
EDIT: By the way, are you sure that you want pages that take 10 minutes to open? I suggest you employ a task queue (whose items are executed by cron on a regular basis) and redirect the user to an "ok, I am on it" page.
I have a simple messaging queue set up and running using the Zend_Queue object hierarchy. I'm using a Zend_Queue_Adapter_Db back-end. I'm interested in using this as a job queue, to schedule things for processing at a later time. They're jobs that don't need to happen immediately, but should happen sooner rather than later.
Is there a best-practices/standard way to set up your infrastructure to run jobs? I understand the code for receiving a message from the queue, but what's not so clear to me is how to run the program that does the receiving. A cron that receives n messages on the command line, run once a minute? A cron that fires off multiple web requests, each web request running the receiver script? Something else?
Tangential bonus question. If I'm running other queries with Zend_Db, will the message queue queries be considered part of that transaction?
You can do it like a thread pool. Create a command-line PHP script to handle the receiving. It should be started by a shell script that automatically restarts the process if it dies. The shell script should not start the process if it is already running (use a $pid.running file or similar). Have cron run several of these every 1-10 minutes. That should handle the receiving nicely.
I wouldn't have the cron fire a web request unless your cron is on another server for some strange reason.
Another way to use this would be to have some background process creating data, and have web users consume it as they naturally browse the site. A report generator might work this way. Company-wide reports are available to all users, but you don't want them all generating this DB/time-intensive report. So you create a queue and process the requests one at a time, possibly removing duplicates. All users can view the report(s) when ready.
According to the docs, it doesn't look like the queue adapter is even using the same connection as your other Zend_Db queries. But of course the best way to find out is to run a simple test.
EDIT
The multiple lines in the cron are for concurrency; each line represents a worker for the pool. I was not clear earlier: you don't want the PID as the identifier, you want to pass the worker name as a parameter.
/home/byron/run_queue.sh Process1
/home/byron/run_queue.sh Process2
/home/byron/run_queue.sh Process3
The bash script would check for the $process.running file; if it finds it, exit.
Otherwise:
Create the $process.running file.
Start the PHP process and block/wait until it finishes.
Delete the $process.running file.
This allows the PHP script to die without the pool losing a worker.
If the queue is empty, the PHP script exits immediately and is started again by the next invocation of cron.
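The same lock-file logic, sketched here as a small PHP CLI wrapper rather than the bash script described above (the receiver script path and the /tmp lock directory are placeholders):
#!/usr/bin/env php
<?php
// Called from cron as: run_queue.php Process1 (one cron line per worker, as above).
$name = isset($argv[1]) ? $argv[1] : 'Process1';   // worker name passed by cron
$lock = "/tmp/{$name}.running";

if (file_exists($lock)) {
    exit(0);                       // this worker is already running
}
touch($lock);                      // mark the worker as running

// Start the real receiver and block until it finishes.
passthru('/usr/bin/php /home/byron/receive_messages.php ' . escapeshellarg($name));

unlink($lock);                     // free the slot for the next cron invocation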
Greetings All!
I am having some trouble figuring out how to execute thousands upon thousands of requests to a web service (eBay). I have a limit of 5 million calls per day, so there are no problems on that end.
However, I'm trying to figure out how to process 1,000 - 10,000 requests every minute to every 5 minutes.
Basically the flow is:
1) Get list of items from database (1,000 to 10,000 items)
2) Make an API POST request for each item
3) Accept return data, process data, update database
Obviously a single PHP instance running this in a loop would be impossible.
I am aware that PHP is not a multithreaded language.
I tried the CURL solution, basically:
1) Get list of items from database
2) Initialize multi curl session
3) For each item add a curl session for the request
4) execute the multi curl session
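In rough code, the multi-curl part looked something like this (a simplified sketch with placeholder URLs, not the exact code):
<?php
// $urls would come from the items in the database; these are placeholders.
$urls = ['https://api.example.com/item/1', 'https://api.example.com/item/2'];

$mh = curl_multi_init();
$handles = [];
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
}

// Run all transfers until they are finished.
do {
    $status = curl_multi_exec($mh, $active);
    if ($active) {
        curl_multi_select($mh);
    }
} while ($active && $status === CURLM_OK);

foreach ($handles as $ch) {
    $response = curl_multi_getcontent($ch);
    // ... process $response and update the database ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);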
So you can imagine 1,000-10,000 GET requests occurring...
This was OK; around 100-200 requests were occurring in about a minute or two. However, only 100-200 of the 1,000 items actually got processed, so I am thinking I'm hitting some sort of Apache or MySQL limit?
It also adds latency; it's almost like performing a DoS attack on myself.
I'm wondering how you would handle this problem? What if you had to make 10,000 web service requests and 10,000 MySQL updates from the data returned by the web service... and this needs to be done within 5 minutes at most.
I am using PHP and MySQL with the Zend Framework.
Thanks!
I've had to do something similar, but with Facebook, updating 300,000+ profiles every hour. As suggested by grossvogel, you need to use many processes to speed things up, because the script spends most of its time waiting for a response.
You can do this with forking, if your PHP install has support for forking, or you can just execute another PHP script via the command line.
exec('nohup /path/to/script.php >> /tmp/logfile 2>&1 & echo $!', $output);
$processId = (int) $output[0];
You can pass parameters (getopt) to the PHP script on the command line to tell it which "batch" to process. You can have the master script do a sleep/check cycle to see if the scripts are still running by checking for the process IDs. I've tested up to 100 scripts running at once in this manner, at which point the CPU load can get quite high.
Combine multiple processes with multi-curl, and you should easily be able to do what you need.
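A rough sketch of that master/worker split (worker.php, the batch numbers, and the log paths are hypothetical placeholders):
<?php
// Master: launch one background worker per batch, then poll until they all exit.
$batches = [0, 1, 2, 3];   // placeholder batch ids
$pids = [];

foreach ($batches as $batch) {
    $out = [];
    exec("nohup /usr/bin/php /path/to/worker.php --batch=$batch >> /tmp/worker_$batch.log 2>&1 & echo $!", $out);
    $pids[$batch] = (int) $out[0];
}

// Each worker would read its batch number with getopt('', array('batch:')).
while ($pids) {
    sleep(5);
    foreach ($pids as $batch => $pid) {
        if (!file_exists("/proc/$pid")) {   // Linux-specific check: process has exited
            unset($pids[$batch]);
        }
    }
}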
My two suggestions are (a) do some benchmarking to find out where your real bottlenecks are and (b) use batching and caching wherever possible.
Mysqli allows multiple-statement queries, so you could definitely batch those database updates.
The HTTP requests to the web service are more likely the culprit, though. Check the API you're using to see if you can get more info from a single call, maybe? To break up the work, maybe you want a single master script to shell out to a bunch of individual processes, each of which makes an API call and stores the results in a file or memcached. The master can periodically read the results and update the DB. (Be careful to rotate the data store for safe reading and writing by multiple processes.)
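For the mysqli batching point above, a rough sketch with mysqli::multi_query (the connection details, the items table, and the $results array are made up for the example):
<?php
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// $results: item id => new price, as returned by the web service calls.
$results = [101 => 9.99, 102 => 14.50];

$sql = '';
foreach ($results as $id => $price) {
    $sql .= sprintf('UPDATE items SET price = %.2f WHERE id = %d;', $price, (int) $id);
}

if ($db->multi_query($sql)) {
    // Drain every statement's result so the connection stays usable afterwards.
    while ($db->more_results() && $db->next_result()) {
        // UPDATEs return no result sets, so nothing to fetch here.
    }
}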
To understand your requirements better: must you implement your solution only in PHP, or can you interface a PHP part with another part written in another language?
If you cannot go for another language, try to perform this update as a PHP script that runs in the background and not through Apache.
You can follow Brent Baisley advice for a simple use case.
If you want to build a robust solution, then you need to:
set up a representation of the actions in a database table that will be your process queue;
set up a script that pops this queue and processes your actions;
set up a cron job that runs this script every X minutes.
This way you can have 1,000 PHP scripts running, using your OS's parallelism capabilities and not hanging when eBay is taking too long to respond.
The real advantage of this system is that you can fully control the firepower you throw at your task by adjusting:
the number of requests one PHP script makes;
the order / number / type / priority of the actions in the queue;
the number of scripts the cron daemon runs.
Thanks everyone for the awesome and quick answers!
The advice from Brent Baisley and e-satis works nicely. Rather than executing the sub-processes using cURL like I did before, forking takes a massive load off; it also nicely gets around the issue of maxing out my Apache connection limit.
Thanks again!
It is true that PHP is not multithreaded, but it can certainly be set up with multiple processes.
I have created a system that resembles the one you are describing. It runs in a loop and is basically a background process. It uses up to 8 processes for batch processing and a single control process.
It is somewhat simplified because I do not need any communication between the processes. Everything resides in a database, so each process is spawned with the full context taken from the database.
Here is a basic description of the system.
1. Start control process
2. Check database for new jobs
3. Spawn child process with the job data as a parameter
4. Keep a table of the child processes to be able to control the number of simultaneous processes.
Unfortunately, it does not appear to be a widespread idea to use PHP for this type of application, and I really had to write wrappers for the low-level functions.
The manual has a whole section on these functions, and it appears that there are methods for allowing IPC as well.
PCNTL has the functions to control forking/child processes, and Semaphore covers IPC.
The interesting part of this is that I'm able to fork off actual PHP code, not execute other programs.
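A rough sketch of that control-process pattern with PCNTL (fetch_new_jobs() and process_job() are hypothetical application functions; the pool size of 8 matches the description above):
<?php
$maxChildren = 8;
$children = [];
$status = 0;

foreach (fetch_new_jobs() as $job) {          // e.g. rows read from the database
    // Throttle: wait for a child to finish once the pool is full.
    while (count($children) >= $maxChildren) {
        $pid = pcntl_wait($status);
        unset($children[$pid]);
    }

    $pid = pcntl_fork();
    if ($pid === -1) {
        die('fork failed');
    } elseif ($pid === 0) {
        process_job($job);                    // child: runs actual PHP code, no exec()
        exit(0);
    }
    $children[$pid] = true;                   // parent: track the child
}

// Let the remaining children finish before the control process exits.
while ($children) {
    $pid = pcntl_wait($status);
    unset($children[$pid]);
}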