Best way to manage long-running php script? - php

I have a PHP script that takes a long time (5-30 minutes) to complete. Just in case it matters, the script is using curl to scrape data from another server. This is the reason it's taking so long; it has to wait for each page to load before processing it and moving to the next.
I want to be able to initiate the script and let it be until it's done, which will set a flag in a database table.
What I need to know is how to be able to end the http request before the script is finished running. Also, is a php script the best way to do this?

Certainly it can be done with PHP, however you should NOT do this as a background task - the new process has to be dissociated from the process group where it is initiated.
Since people keep giving the same wrong answer to this FAQ, I've written a fuller answer here:
http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html
From the comments:
The short version is shell_exec('echo /usr/bin/php -q longThing.php | at now'); but the reasons "why", are a bit long for inclusion here.
Update +12 years
While this is still a good way to invoke a long-running bit of code, it is good for security to limit or even disable the ability of PHP in the webserver to launch other executables. And since this decouples the behaviour of the long-running thing from that which started it, in many cases it may be more appropriate to use a daemon or a cron job.
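The `at now` one-liner from the comments can be wrapped in a small helper so that the script path is quoted before it reaches the shell. This is only a sketch: `build_at_command()` is a hypothetical name, the paths are illustrative, and it assumes `at` is installed and usable by the web server user.

```php
<?php
// Hypothetical helper: builds the shell command that hands a PHP CLI script
// to the `at` daemon, so the job is detached from the web server process.
function build_at_command(string $scriptPath, string $phpBinary = '/usr/bin/php'): string
{
    // The command line that `at` will run; each part is quoted on its own.
    $job = escapeshellarg($phpBinary) . ' -q ' . escapeshellarg($scriptPath);

    // `echo <job> | at now` queues the job for immediate execution.
    return 'echo ' . escapeshellarg($job) . ' | at now';
}

// Usage (illustrative path):
// shell_exec(build_at_command('/var/www/jobs/longThing.php'));
```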

The quick and dirty way would be to use the ignore_user_abort function in php. This basically says: Don't care what the user does, run this script until it is finished. This is somewhat dangerous if it is a public facing site (because it is possible, that you end up having 20++ versions of the script running at the same time if it is initiated 20 times).
The "clean" way (at least IMHO) is to set a flag (in the db for example) when you want to initiate the process and run a cronjob every hour (or so) to check if that flag is set. If it IS set, the long running script starts; if it is NOT set, nothing happens.
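The cron-side check could look roughly like the sketch below. It is an assumption-heavy illustration: the flag and the lock are plain files rather than a database column, and `run_if_flagged()` is a made-up name. The non-blocking lock keeps a second cron run from starting the job while the first is still working.

```php
<?php
// Sketch of the script that cron invokes. Flag and lock are files here
// (an assumption; the answer suggests keeping the flag in a database).
function run_if_flagged(string $flagFile, string $lockFile, callable $job): bool
{
    if (!file_exists($flagFile)) {
        return false; // Nothing was requested since the last cron run.
    }

    $lock = fopen($lockFile, 'c');
    if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
        return false; // Lock unavailable: a previous run is still in progress.
    }

    try {
        unlink($flagFile); // Clear the request before starting the work.
        $job();
        return true;
    } finally {
        flock($lock, LOCK_UN);
        fclose($lock);
    }
}
```

The web page only has to create the flag file (or set the database flag); the long work happens entirely in the cron-started process.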

You could use exec or system to start a background job, and then do the work in that.
Also, there are better approaches to scraping the web than the one you're using. You could use a threaded approach (multiple threads doing one page at a time), or one using an event loop (one thread doing multiple pages at a time). My personal approach using Perl would be using AnyEvent::HTTP.
ETA: symcbean explained how to detach the background process properly here.
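For readers staying in PHP, the "one thread doing multiple pages at a time" idea can be approximated with the curl extension's `curl_multi_*` functions. This is a swapped-in PHP technique, not the AnyEvent::HTTP approach named above; `fetch_all()` is a hypothetical helper and the timeout default is arbitrary.

```php
<?php
// Downloads several URLs concurrently in a single process using curl_multi.
// Returns, per input key, the HTTP status (0 if the transfer failed) and body.
function fetch_all(array $urls, int $timeoutSeconds = 30): array
{
    if ($urls === []) {
        return [];
    }

    $multi = curl_multi_init();
    $handles = [];
    foreach ($urls as $key => $url) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);
        curl_multi_add_handle($multi, $ch);
        $handles[$key] = $ch;
    }

    // Drive all transfers; wait for socket activity instead of busy-looping.
    do {
        $status = curl_multi_exec($multi, $running);
        if ($running) {
            curl_multi_select($multi, 1.0);
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = [
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => curl_multi_getcontent($ch),
        ];
        curl_multi_remove_handle($multi, $ch);
        curl_close($ch);
    }
    curl_multi_close($multi);

    return $results;
}
```

Because the waiting time for each page overlaps, total run time tends toward that of the slowest page in a batch rather than the sum of all pages.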

No, PHP is not the best solution.
I'm not sure about Ruby or Perl, but with Python you could rewrite your page scraper to be multi-threaded and it would probably run at least 20x faster. Writing multi-threaded apps can be somewhat of a challenge, but the very first Python app I wrote was a multi-threaded page scraper. And you could simply call the Python script from within your PHP page by using one of the shell execution functions.

Yes, you can do it in PHP. But in addition to PHP it would be wise to use a Queue Manager. Here's the strategy:
Break up your large task into smaller tasks. In your case, each task could be loading a single page.
Send each small task to the queue.
Run your queue workers somewhere.
Using this strategy has the following advantages:
For long running tasks it has the ability to recover in case a fatal problem occurs in the middle of the run -- no need to start from the beginning.
If your tasks do not have to be run sequentially, you can run multiple workers to run tasks simultaneously.
You have a variety of options (this is just a few):
RabbitMQ (https://www.rabbitmq.com/tutorials/tutorial-one-php.html)
ZeroMQ (http://zeromq.org/bindings:php)
If you're using the Laravel framework, queues are built-in (https://laravel.com/docs/5.4/queues), with drivers for Amazon SQS, Redis, Beanstalkd
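The strategy above can be sketched without any queue server by letting an in-memory `SplQueue` stand in for RabbitMQ or Beanstalkd. The function names are illustrative and not part of any queue library; a real setup would persist the queue so that tasks survive a crash.

```php
<?php
// In-memory stand-in for a real queue manager.
function enqueue_pages(SplQueue $queue, array $urls): void
{
    foreach ($urls as $url) {
        $queue->enqueue($url); // One small task per page.
    }
}

// Worker loop: takes tasks one at a time and records the ones that fail.
function run_worker(SplQueue $queue, callable $handler): array
{
    $failed = [];
    while (!$queue->isEmpty()) {
        $url = $queue->dequeue();
        try {
            $handler($url);
        } catch (Throwable $e) {
            $failed[] = $url; // Only this task needs a retry, not the whole run.
        }
    }
    return $failed;
}
```

A failure in one page leaves the other tasks untouched, which is the recovery property the answer describes.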

PHP may or may not be the best tool, but you know how to use it, and the rest of your application is written using it. These two qualities, combined with the fact that PHP is "good enough" make a pretty strong case for using it, instead of Perl, Ruby, or Python.
If your goal is to learn another language, then pick one and use it. Any language you mentioned will do the job, no problem. I happen to like Perl, but what you like may be different.
Symcbean has some good advice about how to manage background processes at his link.
In short, write a CLI PHP script to handle the long bits. Make sure that it reports status in some way. Make a PHP page to handle status updates, either using AJAX or traditional methods. Your kickoff script will then start the process running in its own session, and return confirmation that the process is going.
Good luck.

I agree with the answers that say this should be run in a background process. But it's also important that you report on the status so the user knows that the work is being done.
When receiving the PHP request to kick off the process, you could store in a database a representation of the task with a unique identifier. Then, start the screen-scraping process, passing it the unique identifier. Report back to the iPhone app that the task has been started and that it should check a specified URL, containing the new task ID, to get the latest status. The iPhone application can now poll (or even "long poll") this URL. In the meantime, the background process would update the database representation of the task as it worked with a completion percentage, current step, or whatever other status indicators you'd like. And when it has finished, it would set a completed flag.
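A minimal sketch of that bookkeeping follows. To stay self-contained it keeps each task's status in a JSON file instead of a database table (an assumption), and all function names are hypothetical.

```php
<?php
// Path of the status file for one task.
function task_file(string $dir, string $taskId): string
{
    return $dir . '/task_' . $taskId . '.json';
}

// Called by the background process as it makes progress.
function write_status(string $dir, string $taskId, int $percent, bool $completed): void
{
    $status = ['percent' => $percent, 'completed' => $completed];
    file_put_contents(task_file($dir, $taskId), json_encode($status), LOCK_EX);
}

// Called by the kickoff request; the returned ID goes back to the client.
function create_task(string $dir): string
{
    $taskId = bin2hex(random_bytes(8)); // Unique identifier for this task.
    write_status($dir, $taskId, 0, false);
    return $taskId;
}

// Called by the status URL that the client polls; null means unknown task.
function read_status(string $dir, string $taskId): ?array
{
    $file = task_file($dir, $taskId);
    if (!is_file($file)) {
        return null;
    }
    return json_decode(file_get_contents($file), true);
}
```

The kickoff page would call `create_task()` and return the ID, the background process would call `write_status()` as it goes, and the polled URL would return `read_status()` as JSON.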

You can send it as an XHR (Ajax) request. Clients don't usually have any timeout for XHRs, unlike normal HTTP requests.

I realize this is a quite old question but would like to give it a shot. This script tries to address both the initial kick off call to finish quickly and chop down the heavy load into smaller chunks. I haven't tested this solution.
<?php
/**
 * crawler.php located at http://mysite.com/crawler.php
 */
// Make sure this script keeps running after we close the connection to it.
ignore_user_abort(TRUE);

function get_remote_sources_to_crawl() {
    // Do a database or a log file query here.
    $query_result = array(
        1 => 'http://exemple.com',
        2 => 'http://exemple1.com',
        3 => 'http://exemple2.com',
        4 => 'http://exemple3.com',
        // ... and so on.
    );
    // Return the first entry on the list as array($id, $url).
    foreach ($query_result as $id => $url) {
        return array($id, $url);
    }
    return FALSE;
}

function update_remote_sources_to_crawl($id) {
    // Update my database or log file list so the $id record won't show up
    // on my next call to get_remote_sources_to_crawl().
}

$crawling_source = get_remote_sources_to_crawl();
if ($crawling_source) {
    list($id, $url) = $crawling_source;
    // Run your scraping code on $url here and set this flag when it succeeds.
    $your_scraping_has_finished = TRUE;
    if ($your_scraping_has_finished) {
        // Update your database or log file.
        update_remote_sources_to_crawl($id);
        $ctx = stream_context_create(array(
            'http' => array(
                // I am not quite sure but I reckon the timeout set here actually
                // starts rolling after the connection to the remote server is made,
                // limiting only how long the downloading of the remote content should take.
                // So as we are only interested in triggering this script again, 5 seconds
                // should be plenty of time.
                'timeout' => 5,
            )
        ));
        // Open a new connection to this script and close it after 5 seconds.
        file_get_contents('http://' . $_SERVER['HTTP_HOST'] . '/crawler.php', FALSE, $ctx);
        print 'The cronjob kick off has been initiated.';
    }
}
else {
    print 'Yay! The whole thing is done.';
}

I would like to propose a solution that is a little different from symcbean's, mainly because I have the additional requirement that the long-running process needs to run as another user, and not as the apache / www-data user.
First solution using cron to poll a background task table:
PHP web page inserts into a background task table, state 'SUBMITTED'
cron runs once each 3 minutes, using another user, running PHP CLI script that checks the background task table for 'SUBMITTED' rows
PHP CLI will update the state column in the row into 'PROCESSING' and begin processing, after completion it will be updated to 'COMPLETED'
Second solution using Linux inotify facility:
PHP web page updates a control file with the parameters set by user, and also giving a task id
shell script (as a non-www user) running inotifywait will wait for the control file to be written
after the control file is written, a close_write event will be raised and the shell script will continue
shell script executes PHP CLI to do the long running process
PHP CLI writes the output to a log file identified by task id, or alternatively updates progress in a status table
PHP web page could poll the log file (based on task id) to show progress of the long running process, or it could also query status table
Some additional info could be found in my post : http://inventorsparadox.blogspot.co.id/2016/01/long-running-process-in-linux-using-php.html

I have done similar things with Perl, double fork() and detaching from parent process. All http fetching work should be done in forked process.

Use a proxy to delegate the request.

What I ALWAYS use is one of these variants (because different flavors of Linux have different rules about handling output, and some programs output differently):
Variant I
exec('./myscript.php > /dev/null 2>&1 &');
Variant II
exec('php -f myscript.php > /dev/null 2>&1 &');
Variant III
exec('nohup php -f myscript.php > /dev/null 2>&1 &');
You might have to install "nohup". But for example, when I was automating FFMPEG video conversions, the output interface somehow wasn't 100% handled by redirecting output streams 1 & 2, so I used nohup AND redirected the output.

If you have a long script, split the work into separate requests driven by an input parameter for each task (each page then acts like a thread).
For example, if a page loops over 1 lakh (100,000) product_keywords, replace the loop with logic that handles a single keyword, and pass that keyword in from a driver page such as cornjobpage.php (see the following example).
For the background worker, I think you should try this technique: it lets you call as many pages as you like, and they all run at once, independently and asynchronously, without waiting for each page's response.
cornjobpage.php //mainpage
<?php
post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
//post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
//post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");
//call as many as pages you like all pages will run at once independently without waiting for each page response as asynchronous.
?>
<?php
/*
* Executes a PHP page asynchronously so the current page does not have to wait for it to finish running.
*
*/
function post_async($url, $params)
{
    $parts = parse_url($url);
    $fp = fsockopen($parts['host'],
        isset($parts['port']) ? $parts['port'] : 80,
        $errno, $errstr, 30);
    if (!$fp) {
        return false; // Could not connect; $errstr holds the reason.
    }
    // A GET request carries its parameters in the query string and has no body,
    // so no Content-Type or Content-Length header is sent. (For POST, send
    // $params as the body together with those two headers.)
    $out = "GET " . $parts['path'] . "?" . $params . " HTTP/1.1\r\n";
    $out .= "Host: " . $parts['host'] . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    // Close without reading the response, so the caller does not wait for it.
    fclose($fp);
    return true;
}
?>
testpage.php
<?php
echo $_REQUEST["Keywordname"];//case1 Output > testValue
?>
PS:if you want to send url parameters as loop then follow this answer :https://stackoverflow.com/a/41225209/6295712

Not the best approach, as many stated here, but this might help:
ignore_user_abort(1); // run script in background even if user closes browser
set_time_limit(1800); // run it for 30 minutes
// Long running script here

If the desired output of your script is some processing, not a webpage, then I believe the desired solution is to run your script from shell, simply as
php my_script.php

Related

Is it possible to cancel execution of a loop once started by some external event

I am developing a web application using PHP and MySQL along with AJAX. One of my scripts fetches data from a MySQL table. What if I want to cancel the execution of that PHP script partway through? To make it clearer: if an AJAX call takes, say, 30 minutes to complete because of a heavy loop, I want to be able to exit from that call before completion by clicking a button. How can I achieve that? Otherwise my script runs well, except that it hangs if I don't wait for the final AJAX response text and try to switch to another page of the web application.
You can create script like this:
$someStorage->set($sessionId . '_someFuncStop', true);
Call it through AJAX, when button STOP pressed.
In your script with loop check that var from storage
while(1) {
if ($someStorage->get($sessionId . '_someFuncStop') === true ) break;
}
To my best knowledge, PHP doesn't support some sort of event listeners that can interrupt running script by an external cause.
There are 2 paths you might want to consider (if you don't want to write shell scripts on the server that would terminate system processes that execute the script):
ignore_user_abort function, though it is not 100% reliable
http://php.net/manual/en/function.ignore-user-abort.php
Inside the loop you wish to terminate, create a database call (or read from a file), where you can set some kind of flag and if that flag is set, run a break/return/die command inside the script. The button you mentioned can then write to database/file and set the interrupt flag.
(In general this is not really useful for script interruptions, since most scripts run in tens of milliseconds and the flag would not be set fast enough to terminate the script, in tens of minutes however, this is a viable solution.)
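The flag-checking loop from these answers could be sketched as follows. The flag is a file here (an assumption; a database row or cache key works the same way), and `process_until_stopped()` is an illustrative name.

```php
<?php
// Processes items until the list is exhausted or a stop flag file appears.
// Returns how many items were actually processed.
function process_until_stopped(array $items, string $stopFlagFile, callable $work): int
{
    $processed = 0;
    foreach ($items as $item) {
        clearstatcache(true, $stopFlagFile); // Drop any cached stat result for the flag.
        if (file_exists($stopFlagFile)) {
            break; // The STOP request created the flag; leave the loop.
        }
        $work($item);
        $processed++;
    }
    return $processed;
}
```

The script called by the STOP button only needs to create the flag file, for example with `touch()`; the loop notices it at the start of the next iteration.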

Php, how to kill the self-request?

Let's imagine a request that runs for a while, and while it is running, PHP is echoing content. To flush the content, I use:
echo str_repeat(' ', 99999).str_repeat(' ', 99999); ob_implicit_flush(true);
so, while the request is being processed, we actually can see an output.
Now I would like to have a button like "stop it" so that from Php I could kill this process (Apache process I guess). How to?
Now I would like to have a button like "stop it" so that from Php I could kill this process
It's not clear what you're asking here. Normally PHP runs in a single thread (of execution that is - not talking about light weight processes here). Further, for any language running as CGI/FastCGI/mod_php there are no input events - input from the HTTP channel is only read once at the beginning of execution.
It is possible (depending on whether the thread of execution is regularly re-entering the PHP interpreter) to ask PHP to run a function at intervals (register_tick_function()) which could poll for some event communicated via another channel (e.g. a different HTTP request setting a semaphore).
Sending a stream of undefined and potentially very large length to the browser is a really bad idea. The right solution (your example is somewhat contrived) may be to spawn a background process on the webserver and poll the output via Ajax. You would still need to implement some sort of control channel though.
Sometimes the thread of execution goes out of PHP and stays there for a long time. In many cases if the user terminates a PHP script which has a long running database query, the PHP may stop running but the SQL will keep on running until completion. There are solutions - but you didn't say if that was the problem.

php starting process in background and also waiting for it to finish

I have two scripts (A & B) that a user can call.
If the user calls A, I do a bunch of database access and retrieve a result to return to the user. After I've worked out what I need to send back, I then do a bunch of extra processing and modifying of the database.
I am wondering if it's possible to return the result to the user, and then perform the rest of the processing in some sort of background task.
A further condition would be that if the user that called script A then calls script B, any processing task that user triggered by calling A must be complete, or script B must wait until it completes.
Is there a way to do this?
PHP can't perform tasks after closing a request, because the request (and the response sent to the browser) is only really closed when the PHP process finishes.
Also, PHP is good for short actions, not long-running programs like daemons, because PHP lacks a good garbage collector (so it will eat up all available memory before crashing).
What you are looking for is called a queue. When you need to perform some resource (or time) intensive tasks, you put a task into a queue. Then later a worker process will take one item from the queue then perform the task.
This enables you to limit resource usage by limiting the number of workers, to avoid peaks and service failures.
Take a look at resque (for a self hosted solution) or iron.io (for a cloud, setup free solution)
If you are on a shared host (so, no queue and no cron are available) then I recommend you to look at iron.io push queue. That sort of queue will call your server (via HTTP) to send task to it while the queue isn't empty. This way, all the polling/checking queue is done on the iron.io side and you only have to setup a regular page that will perform your task.
Also, if you want script B to wait for script A to finish, you'll have to create some sort of locking system. But I would try to avoid that if I were you, because it can cause a deadlock (one thread waiting for another, but the other never finishes, thus blocking the waiting thread forever).
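If a lock does turn out to be necessary, one common option is an advisory file lock per user via `flock()`. The sketch below is only an illustration: the lock directory, the per-user file naming and both function names are assumptions. The operating system releases such a lock when the holding process exits, which limits the deadlock risk mentioned above.

```php
<?php
// Script A would hold this lock while doing its post-processing;
// script B would acquire it (blocking until A is done) before it starts.
// Returns the lock handle, or false if the lock file cannot be used.
function acquire_user_lock(string $lockDir, string $userId)
{
    $handle = fopen($lockDir . '/user_' . md5($userId) . '.lock', 'c');
    if ($handle === false) {
        return false;
    }
    if (!flock($handle, LOCK_EX)) { // Blocks until the lock is free.
        fclose($handle);
        return false;
    }
    return $handle;
}

// Releases the lock so that a waiting script can continue.
function release_user_lock($handle): void
{
    flock($handle, LOCK_UN);
    fclose($handle);
}
```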
You could do something like this:
a.php
<?php
echo "hi there!";
//in order to run another program in background you have to
//redirect stdout and stderr output and use &
//otherwise php will wait for the output
$cmd = "/usr/bin/php ".__DIR__."/b.php";
`$cmd > /dev/null 2> /dev/null &`;
echo "<br>";
echo "finished!";
b.php
<?php
$f = fopen(__DIR__."/c.txt", "w");
//some long running task
for($i=0; $i<300; $i++){
fwrite($f, "$i\n");
sleep(1);
}
fclose($f);
Notes:
Not the most elegant solution but it does the job.
If you want just one b.php running at time, you can add a pid check.
The process will run with http user (apache or other) make sure it will have the proper permissions.
I guess you are looking for ignore_user_abort().
This function keeps your script alive for cleanup tasks when the browser has closed the connection.
http://php.net/manual/en/function.ignore-user-abort.php
You can virtually fork off a browser request by a good combination of header(), set_time_limit(), ignore_user_abort(), connection_status().
It's quite funny in combination with Ajax.
For something like this it's better to use Node.js; PHP is synchronous and does its job line by line. (You can also write asynchronous code in other languages, such as Python with the asyncore lib.)
But for your question ,you should develop something like this:
set_time_limit(0);
ignore_user_abort(true);
// Working With database
// Echo data to user
ob_end_flush();
ob_flush();
flush();
// Second part of database access,this can be take longer
Take a look at the ob_flush() PHP function: it sends the buffered HTML to the user without waiting for the rest of the script to finish.

How can I make a scheduler in PHP without the help of cron

How can I make a scheduler in PHP without writing a cron script? Is there any standard solution?
Feature [for example]: send a reminder to all subscribers 24 hours before the subscription expires.
The standard solution is to use cron on Unix-like operating systems and Scheduled Tasks on Windows.
If you don't want to use cron, I suppose you could try to rig something up using at. But it is difficult to imagine a situation where cron is a problem but at is A-OK.
The solution I see is a loop (for or while) and a sleep(3600*24);
Execute it by sending an Ajax call from JavaScript at an interval you set.
Please read my final opinion at the bottom before rushing to implement.
Cron really is the best way to schedule things. It's simple, effective and widely available.
Having said that, if cron is not available or you absolutely don't want to use it, two general approaches for a non-cron, Apache/PHP pseudo cron running on a traditional web server are as follows.
Check using a loadable resource
Embed an image/script/stylesheet/other somewhere on each web page. Images are probably the best supported by browsers (if javascript is turned off there's no guarantee that the browser will even load .js source files). This page will send headers and empty data back to the browser (a 1x1 clear .gif is fine - look at fpassthru)
from the php manual notes
<?php
header("Content-Length: 0");
header("Connection: close");
flush();
// browser should be disconnected at this point
// and you can do your "cron" work here
?>
Check on each page load
For each task you want to automate, you would create some sort of callable API - static OOP, function calls - whatever. On each request you check to see if there is any work to do for a given task. This is similar to the above except you don't use a separate URL for the script. This could mean that the page takes a long time to load while the work is being performed.
This would involve a select query to your database on either a task table that records the last time a task has run, or simply directly on the data in question, in your example, perhaps on a subscription table.
Final opinion
You really shouldn't reinvent the wheel on this if possible. Cron is very easy to set up.
However, even if you decide that, in your opinion, cron is not easy to set up, consider this: for each and every page load on your site, you will be incurring the overhead of checking to see what needs to be done. True cron, on the other hand, will execute command line PHP on the schedule you set up (hourly, etc) which means your server is running the task checking code much less frequently.
Biggest potential problem without true cron
You run the risk of not having enough traffic to your site to actually get updates happening frequently enough.
Create a cronjob table that stores the dates on which each job should run. Add a condition: if today's date equals a date in the cronjob table, call the method to execute. This works much like a cron job.
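Both pseudo-cron approaches come down to a "is this task due yet?" check on every request. A minimal sketch of that check, with hypothetical function names and Unix timestamps assumed for all times:

```php
<?php
// True when a task last run at $lastRun (null if it never ran)
// should run again at time $now.
function is_task_due(?int $lastRun, int $intervalSeconds, int $now): bool
{
    return $lastRun === null || ($now - $lastRun) >= $intervalSeconds;
}

// True when a subscription expires within the next 24 hours
// and no reminder has been sent for it yet.
function needs_reminder(int $expiresAt, bool $alreadySent, int $now): bool
{
    $secondsLeft = $expiresAt - $now;
    return !$alreadySent && $secondsLeft > 0 && $secondsLeft <= 24 * 3600;
}
```

Storing the last run time (or an "already sent" marker) is what prevents the task from firing again on every subsequent page load.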

run php script after every 100ms

Is it possible to run a php script after every 100ms ? This script will check a database for changes and then the changes will be reflected into some other database. I am already doing it using Triggers. But I want to know if there is any other way to do it without using Cron and Triggers. I will be using Linux for this purpose.
Thanks
Running something every 100ms almost means that it runs all the time; you might as well create a daemon that continuously loops and executes
or use triggers. Essentially on every database change it will copy to another table/db.
http://codespatter.com/2008/05/06/how-to-use-triggers-to-track-changes-in-mysql/
It is not possible to do this with cron (it has a max frequency of one minute) and this is a really bad idea. You will be running a whole new php interpreter ten times per second, not to mention doing database connection too.
Far better, perhaps, would be to run one program that reuses its connection and checks every second or so.
Sounds a little like you are trying to make your own database replication or sync between two databases.
You could write a daemon to do it, essentially a script which continually runs in memory somewhere to then run whatever code you want to.
So that daemon would then do the database processing for you, and you wouldn't have to call a script over and over again.
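The core of such a daemon is a paced loop inside one long-lived process, so the interpreter and the database connection are set up once instead of on every check. A sketch under stated assumptions: `run_polling_loop()` is a hypothetical name, and `$maxIterations` exists only so the example can terminate.

```php
<?php
// Calls $tick repeatedly, sleeping so that iterations start roughly
// $intervalSeconds apart. A real daemon would loop until it is told
// to shut down rather than counting iterations.
function run_polling_loop(callable $tick, float $intervalSeconds, int $maxIterations): int
{
    $iterations = 0;
    while ($iterations < $maxIterations) {
        $startedAt = microtime(true);
        $tick(); // e.g. check for changes over an already-open connection
        $iterations++;

        // Sleep only for the part of the interval the work did not use up.
        $remaining = $intervalSeconds - (microtime(true) - $startedAt);
        if ($remaining > 0) {
            usleep((int) round($remaining * 1000000));
        }
    }
    return $iterations;
}
```

Run from the command line (for example under a process supervisor), this avoids starting a new PHP interpreter ten times per second.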
Use your favorite programming language and set up a permanent loop to run it every 100ms, then put the script into inittab with 'respawn' (man inittab for complete syntax). Finally, init q to reload init.
It's best if you write a little daemon for that. Use the pcntl functions to do so. In your case you might get away with:
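<?php
while (1) {
    usleep(100000);
    $pid = pcntl_fork();
    if ($pid == -1) {
        continue; // fork failed; try again on the next tick
    }
    if ($pid == 0) {
        include("/lib/100ms-script.php");
        exit;
    }
    // Reap finished children without blocking, so they don't pile up as zombies.
    while (pcntl_waitpid(-1, $status, WNOHANG) > 0);
}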
I'm assuming that this is in reference to some type of web page to be created. If so, this sounds like this is a job for Ajax, not PHP. As you may already know PHP processing is done on the server side. Once processing is complete the page is served up to the client.
With Ajax/JavaScript processing can continue via the browser. You can setup a timer that can then be used to communicate with the server. Depending on the output of the response the page may be updated to reflect the necessary changes.
