PHP: how to kill the self-request?

Let's imagine a request is made which lasts for a while; while it's running, PHP is echoing content. To flush the content, I use:
echo str_repeat(' ', 99999).str_repeat(' ', 99999); ob_implicit_flush(true);
so while the request is being processed, we can actually see output.
Now I would like to have a button like "stop it" so that from PHP I could kill this process (the Apache process, I guess). How?

Now I would like to have a button like "stop it" so that from PHP I could kill this process
It's not clear what you're asking here. Normally PHP runs in a single thread of execution (not talking about lightweight processes here). Further, for any language running as CGI/FastCGI/mod_php there are no input events - input from the HTTP channel is only read once, at the beginning of execution.
It is possible (depending on whether the thread of execution is regularly re-entering the PHP interpreter) to ask PHP to run a function at intervals (register_tick_function()) which could poll for some event communicated via another channel (e.g. a different HTTP request setting a semaphore).
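A minimal sketch of that idea, assuming the "stop it" button fires a second request which creates a flag file (the file name, job parameter and polling interval here are all hypothetical):
<?php
// Re-enter the engine regularly so tick functions get a chance to run.
declare(ticks=100);

// Hypothetical flag file that a separate "stop" request would create.
$stopFlag = sys_get_temp_dir() . '/stop-' . basename(isset($_GET['job']) ? $_GET['job'] : 'job');

register_tick_function(function () use ($stopFlag) {
    if (file_exists($stopFlag)) {
        unlink($stopFlag);
        exit('Stopped by user.');
    }
});

while (true) { // the long-running, output-producing work
    echo str_repeat(' ', 4096);
    flush();
    sleep(1);
}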
Sending a stream of undefined and potentially very large length to the browser is a really bad idea. The right solution (your example is somewhat contrived) may be to spawn a background process on the webserver and poll its output via Ajax. You would still need to implement some sort of control channel, though.
Sometimes the thread of execution goes out of PHP and stays there for a long time. In many cases, if the user terminates a PHP script which has a long-running database query, the PHP may stop running but the SQL will keep running until completion. There are solutions - but you didn't say whether that was the problem.

Related

PHP - Responding to incoming API call without returning

I'm trying to figure out the best way to send a response to an incoming API POST to our system, without returning.
We receive an XML post from another service and consume it, and then our response to them is in XML also. What we are currently doing is digesting the incoming post, doing some stuff on our end, then doing a PHP return with the XML.
I would like to change this so that we can respond to their call with the XML, but then do some processing after the fact, without making some type of exec/background call.
What's the best way to send a response in PHP without returning? If we do an "echo", will that close the connection and allow us to process more afterwards without the other server "waiting"?
Calling PHP's echo will not close the connection; in fact, you can call echo multiple times in your PHP script and the output will be added to the response. The connection will only close when:
The end of the script is reached
exit() or the alias die() are called
A fatal/parse error or an uncaught Exception occurs or your server runs out of memory
The maximum script execution time which you can set in php.ini is exceeded
Usually, the calling client code will also have some kind of timeout, so if your 'digesting' code could take longer and you want to guard against this as well as against point 4 in the list, you can store the request data for later processing, for example in a database or serialized in files. Having successfully stored the data, you then have basically two options:
Option 1: Spawn a background PHP process
To spawn a background PHP process that will survive the life cycle of the calling script, use exec and nohup. The basic usage could look like this:
exec('RESOURCE_ID=123 nohup /path/to/your/php/executable your_script.php > /dev/null 2>&1 &');
Within the first segment of the command, RESOURCE_ID=123, you can pass a unique identifier of the previously stored request data, maybe a database entry id or the storage filename, to the background script. Use getenv('RESOURCE_ID') in your background script to retrieve the variable.
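On the receiving side, the background script might start like this (the storage path is hypothetical; use whatever matches how you stored the request data):
<?php
// your_script.php - started by the exec()/nohup call above.
$resourceId = getenv('RESOURCE_ID');
if ($resourceId === false) {
    exit(1); // started without an id, nothing to do
}
// Load the previously stored request data by its id. A database row
// or a serialized file would both work; a file is shown here.
$data = file_get_contents('/var/spool/myapp/' . basename($resourceId));
// ... long-running processing of $data goes here ...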
[EDIT] The > /dev/null output redirection is crucial for running the process in the background; otherwise the parent script will wait for the output of the background process. I also suggest writing the output, including error output, to an actual file, e.g. &> my_script.out, which has the same effect. You can also get the process id of the background process by appending echo $! after the trailing &; exec() will then return it.
After starting the background script, you can send your 'OK' response and exit the parent script.
Option 2: Cronjob for processing, as suggested by Jim Panse
As your system grows more complex, you will probably need more control over the execution of your 'digesting' code. Perhaps you want to balance server load peaks, restart failed tasks, or throttle malicious usage of your API. If you need this kind of control, you are better off with this option.
Since I assume you want your system-to-system communication to stay synchronous, there are multiple things to consider.
Even for time-consuming requests, you usually still want a fast response, and to satisfy this you can't process the request immediately.
So, just save the request and process it later (giving the client a 202 response back). Systems like queues are very popular for saving time-consuming jobs and running them later. Another time-controlled script (a cronjob) can then poll and process the stacked messages/data.
If you want to provide the results to the client too, return a unique resource id on the initial REST call and implement another resource that takes exactly this parameter as input. Once your system has finished processing, the result will appear there.
Spawning a process from within another PHP script isn't very handy, since it's difficult to debug and error-prone. I personally wouldn't go for this solution.
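For completeness, here is a minimal sketch of the cron-driven variant, assuming a tasks table with id, status and payload columns (all names and credentials are hypothetical):
<?php
// process_tasks.php - run from cron, e.g. */5 * * * * php /path/to/process_tasks.php
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Claim one submitted task inside a transaction so parallel cron runs don't collide.
$pdo->beginTransaction();
$task = $pdo->query(
    "SELECT id, payload FROM tasks WHERE status = 'SUBMITTED' LIMIT 1 FOR UPDATE"
)->fetch(PDO::FETCH_ASSOC);
if ($task) {
    $pdo->prepare("UPDATE tasks SET status = 'PROCESSING' WHERE id = ?")
        ->execute(array($task['id']));
}
$pdo->commit();

if ($task) {
    // ... do the actual 'digesting' work on $task['payload'] here ...
    $pdo->prepare("UPDATE tasks SET status = 'COMPLETED' WHERE id = ?")
        ->execute(array($task['id']));
}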

php: flush data and end client connection

I have a PHP script (in a normal LAMP environment) that runs a couple of housekeeping tasks at the end of the script.
I use flush() to push all the data to the client, which works fine (the page is fully loaded), but the browser still waits for data (indicated by the "loading" animation), which is confusing for the user - though understandable, because Apache cannot know whether PHP will generate more output after flush(). In my case it never does, however.
Is there a way to tell the client that the output is finished and the http-connection should be closed immediately even though the script keeps running?
It sounds like you have a long-running script performing various tasks. In particular, it appears the script goes on doing things after it has sent the reply to the client. This is a design that opens up a whole lot of potential problems. You should re-think your architecture.
Keep housekeeping tasks and client communication strictly separate. For example, you could have a client request processed and trigger internal sub-requests (which you can detach from), or delegate tasks to a cron-like system. Then offer a second view to the client which visualizes the progress and result of those tasks. This approach is much safer, more flexible and easier to extend when required. And your problem at hand is solved, too :-)
You can use the special function fastcgi_finish_request() to finish the request and flush all data while continuing to do something time-consuming (video converting, stats processing etc.); see http://php.net/manual/en/install.fpm.php - but you need to install PHP-FPM for it. For example:
<?php
echo "You can see this from the browser immediately.<br>";
fastcgi_finish_request();
sleep(10);
echo "You can't see this from the browser.";
?>

Dead Man's Switch in PHP/Python

So this is as much a theoretical question as a language-specific one, but consider this:
I need PHP to execute a rather system-intensive process (using PHP's exec()) that will be running in the background, but then when a user leaves that specific page, the process will be killed.
I quickly realized that a dead man's switch would be an easy way to implement this, since I'm not making use of any session variables or other server-side variables. It could end up looking like:
if($_SERVER['REQUEST_URI'] !== 'page_with_session.php'){
//Instead of 'session_destroy();' this would be used to kill said process
}
In any case, the idea is a while loop in PHP resetting a timer in a Python script (or re-calling said script every 15 seconds) so that the Python script never reaches its end and kills the process. When the user leaves the page, the script is no longer re-called, the timer runs out, and the process is killed.
Are there any gaping holes in this idea? If not, how would the implementation in PHP/JS look? The order I see it working in would be:
Page is hit by user
<?php exec('killer.py') ?>
killer.py:
Listen for 20 seconds - If no response...
os.system('pkill process')
<?php while(true){sleep(15); exec('killer.py no_wait_dont');} ?>
Any thoughts you guys have would be greatly appreciated!
Mason
Javascript is a lot easier, and about as safe (that is, not much).
Just write a JavaScript ping function that, once every 10 seconds, posts something to ping.php (via Ajax). This ping.php would log when the last ping was received in the user session (say in $_SESSION['last_ping']).
You can then check for user activity from other pages by comparing $_SESSION['last_ping'] to the current time. You would have to pepper your runtime-intensive pages with this, but it would certainly work.
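A minimal version of that ping endpoint (the 20-second threshold and the process name below are just examples):
<?php
// ping.php - called by the JS heartbeat every 10 seconds
session_start();
$_SESSION['last_ping'] = time();
And the corresponding check, run wherever you need to detect the departed user:
<?php
session_start();
if (!isset($_SESSION['last_ping']) || time() - $_SESSION['last_ping'] > 20) {
    // No ping for 20 seconds: assume the user left, kill the process.
    exec('pkill -f my_intensive_process'); // process name is hypothetical
}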
Implement a heartbeat in JS. If it stops for more than a certain time then kill the subprocess:
JS sends a request
PHP/Python starts the subprocess in the background and returns its pid to JS
JS pings PHP/Python with the given pid
PHP/Python signals the subprocess corresponding to the pid via IPC, e.g. by sending SIGUSR1
if the subprocess doesn't receive a signal in time, it dies - i.e., there is a constant self-destruct countdown in the subprocess
If you can't add the self-destruct mechanism to the subprocess then you need a watcher process that would receive signals and kill the subprocess. It is less reliable because you need to make sure that the watcher process is running.
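A sketch of that self-destruct countdown, assuming the subprocess is itself a PHP CLI script with the pcntl extension available: each SIGUSR1 re-arms an alarm, and if no heartbeat arrives within the timeout, SIGALRM fires and the process exits. The parent would deliver the heartbeat with posix_kill($pid, SIGUSR1).
<?php
// worker.php - CLI subprocess with a self-destruct countdown (requires pcntl)
declare(ticks=1); // needed so pending signals are actually dispatched

const TIMEOUT = 20; // seconds without a heartbeat before self-destructing

pcntl_signal(SIGUSR1, function () {
    pcntl_alarm(TIMEOUT); // heartbeat received: re-arm the countdown
});
pcntl_signal(SIGALRM, function () {
    exit(0);              // no heartbeat in time: self-destruct
});
pcntl_alarm(TIMEOUT);     // arm the initial countdown

while (true) {
    // ... the system-intensive work goes here ...
    sleep(1);
}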

Best way to manage long-running php script?

I have a PHP script that takes a long time (5-30 minutes) to complete. Just in case it matters, the script is using curl to scrape data from another server. This is the reason it's taking so long; it has to wait for each page to load before processing it and moving to the next.
I want to be able to initiate the script and let it be until it's done, which will set a flag in a database table.
What I need to know is how to end the HTTP request before the script has finished running. Also, is a PHP script the best way to do this?
Certainly it can be done with PHP; however, you should NOT do this as a background task - the new process has to be dissociated from the process group where it is initiated.
Since people keep giving the same wrong answer to this FAQ, I've written a fuller answer here:
http://symcbean.blogspot.com/2010/02/php-and-long-running-processes.html
From the comments:
The short version is shell_exec('echo /usr/bin/php -q longThing.php | at now'); but the reasons why are a bit long for inclusion here.
Update +12 years
While this is still a good way to invoke a long-running bit of code, for security it is good to limit, or even disable, the ability of PHP in the webserver to launch other executables. And since this decouples the behaviour of the long-running thing from that which started it, in many cases it may be more appropriate to use a daemon or a cron job.
The quick and dirty way would be to use the ignore_user_abort function in PHP. This basically says: don't care what the user does, run this script until it is finished. This is somewhat dangerous if it is a public-facing site (because it is possible that you end up with 20+ versions of the script running at the same time if it is initiated 20 times).
The "clean" way (at least IMHO) is to set a flag (in the db, for example) when you want to initiate the process and run a cronjob every hour (or so) to check if that flag is set. If it IS set, the long-running script starts; if it is NOT set, nothing happens.
You could use exec or system to start a background job, and then do the work in that.
Also, there are better approaches to scraping the web than the one you're using. You could use a threaded approach (multiple threads doing one page at a time), or one using an event loop (one thread doing multiple pages at a time). My personal approach using Perl would be AnyEvent::HTTP.
ETA: symcbean explained how to detach the background process properly here.
No, PHP is not the best solution.
I'm not sure about Ruby or Perl, but with Python you could rewrite your page scraper to be multi-threaded and it would probably run at least 20x faster. Writing multi-threaded apps can be somewhat of a challenge, but the very first Python app I wrote was a multi-threaded page scraper. And you could simply call the Python script from within your PHP page by using one of the shell execution functions.
Yes, you can do it in PHP. But in addition to PHP it would be wise to use a Queue Manager. Here's the strategy:
Break up your large task into smaller tasks. In your case, each task could be loading a single page.
Send each small task to the queue.
Run your queue workers somewhere.
Using this strategy has the following advantages:
For long running tasks it has the ability to recover in case a fatal problem occurs in the middle of the run -- no need to start from the beginning.
If your tasks do not have to be run sequentially, you can run multiple workers to run tasks simultaneously.
You have a variety of options (these are just a few):
RabbitMQ (https://www.rabbitmq.com/tutorials/tutorial-one-php.html)
ZeroMQ (http://zeromq.org/bindings:php)
If you're using the Laravel framework, queues are built-in (https://laravel.com/docs/5.4/queues), with drivers for AWS SQS, Redis, and Beanstalkd
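As an illustration, enqueuing a single page-fetch task with php-amqplib, following the pattern from the RabbitMQ PHP tutorial linked above (the queue name and payload are hypothetical):
<?php
require_once __DIR__ . '/vendor/autoload.php';

use PhpAmqpLib\Connection\AMQPStreamConnection;
use PhpAmqpLib\Message\AMQPMessage;

$connection = new AMQPStreamConnection('localhost', 5672, 'guest', 'guest');
$channel = $connection->channel();

// Durable queue so queued tasks survive a broker restart.
$channel->queue_declare('scrape_tasks', false, true, false, false);

$msg = new AMQPMessage(
    json_encode(array('url' => 'http://example.com/page1')),
    array('delivery_mode' => AMQPMessage::DELIVERY_MODE_PERSISTENT)
);
$channel->basic_publish($msg, '', 'scrape_tasks');

$channel->close();
$connection->close();
A worker process (run from the command line, or supervised) would then consume from the same queue and fetch one page per message.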
PHP may or may not be the best tool, but you know how to use it, and the rest of your application is written using it. These two qualities, combined with the fact that PHP is "good enough" make a pretty strong case for using it, instead of Perl, Ruby, or Python.
If your goal is to learn another language, then pick one and use it. Any language you mentioned will do the job, no problem. I happen to like Perl, but what you like may be different.
Symcbean has some good advice about how to manage background processes at his link.
In short, write a CLI PHP script to handle the long bits. Make sure that it reports status in some way. Make a PHP page to handle status updates, either using AJAX or traditional methods. Your kickoff script will then start the process running in its own session, and return confirmation that the process is going.
Good luck.
I agree with the answers that say this should be run in a background process. But it's also important that you report on the status so the user knows that the work is being done.
When receiving the PHP request to kick off the process, you could store in a database a representation of the task with a unique identifier. Then, start the screen-scraping process, passing it the unique identifier. Report back to the iPhone app that the task has been started and that it should check a specified URL, containing the new task ID, to get the latest status. The iPhone application can now poll (or even "long poll") this URL. In the meantime, the background process would update the database representation of the task as it worked with a completion percentage, current step, or whatever other status indicators you'd like. And when it has finished, it would set a completed flag.
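The status URL the app polls can be very small; a sketch, with a hypothetical tasks table holding status and progress columns:
<?php
// status.php?task=<task-id> - polled by the client application
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');
$stmt = $pdo->prepare('SELECT status, progress FROM tasks WHERE id = ?');
$stmt->execute(array(isset($_GET['task']) ? $_GET['task'] : ''));
$row = $stmt->fetch(PDO::FETCH_ASSOC);

header('Content-Type: application/json');
echo json_encode($row ? $row : array('status' => 'unknown'));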
You can send it as an XHR (Ajax) request. Clients don't usually have any timeout for XHRs, unlike normal HTTP requests.
I realize this is quite an old question but would like to give it a shot. This script tries to address both letting the initial kick-off call finish quickly and chopping the heavy load down into smaller chunks. I haven't tested this solution.
<?php
/**
 * crawler.php located at http://mysite.com/crawler.php
 */

// Make sure this script will keep on running after we close the connection
// with it.
ignore_user_abort(TRUE);

function get_remote_sources_to_crawl() {
  // Do a database or a log file query here.
  $query_result = array (
    1 => 'http://example.com',
    2 => 'http://example1.com',
    3 => 'http://example2.com',
    4 => 'http://example3.com',
    // ... and so on.
  );
  // Return the first one on the list as an array($id, $url) pair.
  foreach ($query_result as $id => $url) {
    return array($id, $url);
  }
  return FALSE;
}

function update_remote_sources_to_crawl($id) {
  // Update my database or log file list so the $id record won't show up
  // on my next call to get_remote_sources_to_crawl().
}

$crawling_source = get_remote_sources_to_crawl();
if ($crawling_source) {
  list($id, $url) = $crawling_source;
  // Run your scraping code on $url here.
  if ($your_scraping_has_finished) {
    // Update your database or log file.
    update_remote_sources_to_crawl($id);
    $ctx = stream_context_create(array(
      'http' => array(
        // I am not quite sure, but I reckon the timeout set here only starts
        // rolling after the connection to the remote server is made, limiting
        // how long the download of the remote content may take. As we are only
        // interested in triggering this script again, 5 seconds should be
        // plenty of time.
        'timeout' => 5,
      )
    ));
    // Open a new connection to this script and close it 5 seconds in.
    file_get_contents('http://' . $_SERVER['HTTP_HOST'] . '/crawler.php', FALSE, $ctx);
    print 'The cronjob kick off has been initiated.';
  }
}
else {
  print 'Yay! The whole thing is done.';
}
I would like to propose a solution that is a little different from symcbean's, mainly because I have the additional requirement that the long-running process needs to run as another user, not as the apache/www-data user.
First solution using cron to poll a background task table:
PHP web page inserts a row into a background task table with state 'SUBMITTED'
cron runs once every 3 minutes, as another user, running a PHP CLI script that checks the background task table for 'SUBMITTED' rows
the PHP CLI script updates the state column of the row to 'PROCESSING' and begins processing; after completion the state is updated to 'COMPLETED'
Second solution using Linux inotify facility:
PHP web page updates a control file with the parameters set by the user, and also assigns a task id
shell script (as a non-www user) running inotifywait will wait for the control file to be written
after the control file is written, a close_write event is raised and the shell script continues
shell script executes PHP CLI to do the long running process
PHP CLI writes the output to a log file identified by task id, or alternatively updates progress in a status table
PHP web page can poll the log file (based on task id) to show the progress of the long-running process, or it can query the status table
Some additional info can be found in my post: http://inventorsparadox.blogspot.co.id/2016/01/long-running-process-in-linux-using-php.html
I have done similar things with Perl, using a double fork() and detaching from the parent process. All HTTP fetching work should be done in the forked process.
Use a proxy to delegate the request.
What I ALWAYS use is one of these variants (because different flavors of Linux have different rules about handling output, and some programs output differently):
Variant I
#exec('./myscript.php \1>/dev/null \2>/dev/null &');
Variant II
#exec('php -f myscript.php \1>/dev/null \2>/dev/null &');
Variant III
#exec('nohup myscript.php \1>/dev/null \2>/dev/null &');
You might have to install nohup. For example, when I was automating FFMPEG video conversions, the output somehow wasn't 100% handled by redirecting output streams 1 & 2, so I used nohup AND redirected the output.
If you have a long script, then divide the page's work with the help of an input parameter for each task (each page then acts like a thread). I.e., if the page has a long processing loop over 100,000 product_keywords, then instead of the loop, write the logic for a single keyword and pass that keyword in from magic or cornjobpage.php (in the following example).
For the background worker, I think you should try this technique; it will help you call as many pages as you like. All pages will run independently at once, without waiting for each page's response, asynchronously.
cornjobpage.php //mainpage
<?php
post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue");
//post_async("http://localhost/projectname/testpage.php", "Keywordname=testValue2");
//post_async("http://localhost/projectname/otherpage.php", "Keywordname=anyValue");
//call as many as pages you like all pages will run at once independently without waiting for each page response as asynchronous.
?>
<?php
/*
* Executes a PHP page asynchronously so the current page does not have to wait for it to finish running.
*
*/
function post_async($url, $params)
{
    $post_string = $params;
    $parts = parse_url($url);
    $fp = fsockopen($parts['host'],
        isset($parts['port']) ? $parts['port'] : 80,
        $errno, $errstr, 30);
    if (!$fp) {
        return; // connection failed
    }
    $out  = "GET " . $parts['path'] . "?$post_string" . " HTTP/1.1\r\n"; // you can use POST instead of GET if you like
    $out .= "Host: " . $parts['host'] . "\r\n";
    $out .= "Content-Type: application/x-www-form-urlencoded\r\n";
    $out .= "Content-Length: " . strlen($post_string) . "\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    fclose($fp);
}
?>
testpage.php
<?php
echo $_REQUEST["Keywordname"];//case1 Output > testValue
?>
PS: if you want to send URL parameters in a loop, then follow this answer: https://stackoverflow.com/a/41225209/6295712
Not the best approach, as many stated here, but this might help:
ignore_user_abort(1); // run script in background even if user closes browser
set_time_limit(1800); // run it for 30 minutes
// Long running script here
If the desired output of your script is some processing, not a webpage, then I believe the desired solution is to run your script from shell, simply as
php my_script.php

php asynchronous call and getting response from the background job

I have done some Google searching on this topic and couldn't find the answer to my question.
What I want to achieve is the following:
the client makes an asynchronous call to a function on the server
the server runs that function in the background (because it is time-consuming), and the client is not left hanging in the meantime
the client repeatedly makes calls to the server requesting the status of the background job
Can you please give me some advice on resolving my issue?
Thank you very much! ^-^
You are not specifying what language the asynchronous call is in, but I'm assuming PHP on both ends.
I think the most elegant way would be this:
HTML page loads, defines a random key for the operation (e.g. using rand() or an already available session ID [be careful though that the same user could be starting two operations])
HTML page makes Ajax call to PHP script to start_process.php
start_process.php executes exec('/path/to/long_process.php') to start the process; see the User Contributed Notes on exec() for suggestions on how to start a process in the background. Which one is right for you depends mainly on your OS.
long_process.php frequently writes its status into a status file, named after the random key that your Ajax page generated
HTML page makes frequent calls to show_status.php that reads out the status file, and returns the progress.
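A minimal version of steps 4 and 5, using the random key in a status file name (the paths and parameter names are hypothetical):
<?php
// in long_process.php: write progress as the work advances
$key = basename($argv[1]);                    // random key passed on the command line
file_put_contents("/tmp/status-$key", '42%'); // e.g. 42% done so far
<?php
// show_status.php?key=<key> - read back for the Ajax poll
$key = basename(isset($_GET['key']) ? $_GET['key'] : '');
echo ($key !== '' && file_exists("/tmp/status-$key"))
    ? file_get_contents("/tmp/status-$key")
    : 'pending';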
Have a google for long-running PHP processes (be warned that there's a lot of bad advice out there on the topic - including the note referred to by Pekka - it will work on Microsoft but will fail in unpredictable ways on anything else).
You could develop a service which responds to requests over a socket (your client would use fsockopen to connect). A simple way of achieving this would be to use Aleksey Zapparov's socket server (http://www.phpclasses.org/browse/package/5758.html), which handles requests coming in via a socket; however, since it runs as a single thread it may not be very appropriate for something which requires a lot of processing. Alternatively, if you are using a non-Microsoft system, you could hang your script off [x]inetd; however, you'll need to do some clever stuff to prevent it terminating when the client disconnects.
To keep the thing running after your client disconnects, the PHP code must be running from the standalone PHP executable (not via the webserver). Spawn a process in a new process group (see posix_setsid() and pcntl_fork()). To enable the client to come back and check on progress, the easiest way is to configure the server to write out its status to somewhere the client can read.
C.
Ajax call runs the method longRunningMethod() and gets back an identifier (e.g. an id)
Server runs the method and sets a key in, e.g., shared memory
Client calls checkTask(id)
Server looks up the key in shared memory and checks for ready status
[repeat 3 & 4 until longRunningMethod is finished]
longRunningMethod finishes and sets its state to finished in shared memory.
All Ajax calls are by definition asynchronous.
You could (although not a strictly necessary step) use AJAX to instantiate the call, and the script could then create a reference to the status of the background job in shared memory (or even a temporary entry in an SQL table, or even a temp file), in the form of a unique job id.
The script could then kick off your background process and immediately return the job ID to the client.
The client could then call the server repeatedly (via another AJAX interface, for example) to query the status of the job, e.g. "in progress", "complete".
If the background process to be executed is itself written in PHP (e.g. a command line PHP script) then you could pass the job id to it and it could provide meaningful progress updates back to the client (by writing to the same shared memory area, or database table).
If the process to be executed is not itself written in PHP, then I suggest wrapping it in a command-line PHP script, so that it can monitor when the process being executed has finished running (and check the output to see if it was successful) and update the status entry for that task appropriately.
Note: using shared memory for this is best practice, but it may not be available if you are on shared hosting, for example. Don't forget you want a means to clean up old status entries, so I would store "started_on"/"completed_on" timestamp values for each one and have it delete stale entries (e.g. those with a completed_on timestamp of more than X minutes ago - and, ideally, also check for jobs that started some time ago but were never marked as completed, and raise an alert about them).
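A cleanup pass along those lines might look like this (the table layout is hypothetical; run it periodically from cron):
<?php
// cleanup.php - purge stale job-status entries
$pdo = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');

// Remove jobs completed more than 30 minutes ago.
$pdo->exec("DELETE FROM job_status
            WHERE completed_on IS NOT NULL
              AND completed_on < NOW() - INTERVAL 30 MINUTE");

// Alert on jobs that started over an hour ago but never completed.
$stale = $pdo->query("SELECT id FROM job_status
                      WHERE completed_on IS NULL
                        AND started_on < NOW() - INTERVAL 1 HOUR");
foreach ($stale as $job) {
    error_log("Job {$job['id']} appears to have stalled");
}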
