I'm trying to make a sort of PHP bot. The idea is to have two PHP files, named a.php and b.php. a.php does something, then sleeps 30 seconds and calls b.php; b.php ends the HTTP request, does some processing, and then calls a.php, which ends the HTTP request, and so on.
The only problem now is how to end the HTTP request, made using cURL. I've tried the code below:
<?php
ob_end_clean();
header("Connection: close");
ignore_user_abort(); // optional
ob_start();
echo ('Text the user will see');
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush(); // Will not work
flush(); // Unless both are called !
// At this point, the browser has closed connection to the web server
// Do processing here
echo('Text user will never see');
The slight problem is that it doesn't work, and I actually see "Text user will never see". I've tried cron jobs and such, but my host doesn't allow them. I can't raise the script timeout limit either. So my only option is to create repeating PHP scripts. So how would I end the HTTP request?
Based on the new understanding of your problem: you are creating a system that checks a remote URL every 30 seconds to monitor a fragment of content. For this I recommend cron, which can either be server based: http://en.wikipedia.org/wiki/Cron or web based if your host does not permit it: http://www.webbasedcron.com/ (example).
PHP scripts in this case run in the context of a web server request, so you can't stop talking to the web connection and then continue doing stuff, which is what I think you're attempting to do with the connection close.
The reason you're seeing the output at the end is that at the end of a script PHP performs an implicit flush (see ob_implicit_flush in the manual), and the connection to the browser only closes when the PHP script ends.
Ways around this:
You might be able to use set_time_limit to extend the execution limit. DO NOT USE ZERO. It's tempting to say "take all the time you need" on a post-process script, but that way lies madness and bitter sysadmins; also remember you're still running on cURL's timeout stopwatch (though you can extend that as an option). set_time_limit(5) gives you five more seconds, so calling it periodically lets you do your post-processing while - if you're careful - still protecting you from infinite loops. Infinite loops with no execution limit in the context of Apache requests are also likely to make you unpopular with your sysadmin.
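A minimal sketch of that, assuming the post-processing loops over some work items ($items and process_item() are placeholders, not anything from your code):
<?php
// Placeholder loop: $items and process_item() stand in for your real post-processing.
foreach ($items as $item) {
    // Each call resets the clock: five more seconds per item, so one slow item
    // can't run forever but the batch as a whole can still finish.
    set_time_limit(5);
    process_item($item);
}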
It might be possible to build a shell script in your application, save it to disk, execute that in the background and have it delete itself after. That way it will run outside the web-request context, and if the script still exists when you next do the request, you can know that the other processing is still happening. Be really careful about things that might take longer than your gap between executions, as that way leads to sorrow and more bitter sysadmins. This course of action would get you thrown off my hosting environment if you did it without talking to me about it first, though, as it's a terrible hack with a myriad of possible security issues.
But you appear to be attempting to run a regular batch process on a system where they don't want you to do that - or they'd have given you access to cron - so your best and most reliable method is to find a host that actually supports the thing you're trying to do.
Related
This question occurred to me when I encountered a bug that caused my PHP program to loop infinitely. Here is an example situation:
Suppose I have a PHP webpage that receives picture uploads (the page is perhaps the response page for an image upload form). On the server, the script should store the image in a temporary file, output a confirmation message to the client, then stop sending data so that the client does not wait. The script should then continue executing, processing the image (like resizing it) before ending.
I think this "technique" could be useful such that the client will not wait during time-consuming processes, therefore preventing time-outs.
Also, could this be solved using HTTP methods?
Yes.
This can easily be done without any asynchronous processing if you correctly utilize HTTP headers.
Under normal conditions PHP will stop processing as soon as the client on the other end closes the connection. If you want to continue processing after this event, you need to do one thing: tell PHP to ignore user aborts. How?
ignore_user_abort()
This will allow your script to keep running even after the client gets the heck out of dodge. But we're also faced with the problem of how to tell the client that the request they made is finished so that it will close the connection. Normally, PHP transparently handles sending these headers for us if we don't specify them. Here, though, we need to do it explicitly or the client won't know when we want them to stop reading the response.
To do this, we have to send the appropriate HTTP headers to tell the client when to close:
Connection: close
Content-Length: 42
This combination of headers tells the client that once it reads 42 bytes of entity body response that the message is finished and that they should close the connection. There are a couple of consequences to this method:
You have to generate your response BEFORE sending any output, because you have to determine its length in bytes so you can send the correct Content-Length header.
You have to actually send these headers BEFORE you echo any output.
So your script might look something like this:
<?php
ignore_user_abort(true);
// do work to determine the response you want to send ($responseBody)
$contentLength = strlen($responseBody);
header('Connection: close');
header("Content-Length: $contentLength");
echo $responseBody;
flush(); // push the complete response out to the client
// --- client will now disconnect and you can continue processing here ---
The big "Gotchya" with this method is that when you're running PHP in a web SAPI you can easily run up against the max time limit directive if you do time-consuming processing after the end user client closes the connection. If this is a problem, you may need to consider an asynchronous processing option using cron because there is no time limit when PHP runs in a CLI environment. Alternatively, you could just up the time limit of your scripts in the web environment using set_time_limitdocs.
It's worth mentioning that if you do something like this, you may also want to add a check of connection_aborted() while generating your response body, so that you can avoid the additional processing if the user aborts before completing the transfer.
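A rough illustration of that check, assuming the body is built up in stages ($expensiveSteps is a placeholder list of callables, not part of the original answer):
<?php
// Sketch only: bail out between expensive steps if the client has gone away.
// Note that PHP usually only notices the abort when it tries to send output.
$responseBody = '';
foreach ($expensiveSteps as $step) {   // placeholder
    if (connection_aborted()) {
        exit; // the client gave up; skip the remaining work
    }
    $responseBody .= $step();
}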
I faced the same problem when uploading images to Twitter and Facebook from an iPhone through a PHP web service.
If the image upload does not take much processing time, the comment from Musa may help you, but if it takes too long to process, try these steps (a minimal sketch follows below):
1. Store the image in a folder
2. Fetch the image from the folder using cron
3. Run the cron every 2 minutes in the background
This will reduce your processing time.
Hope this helps.
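A minimal sketch of steps 1-3, assuming an upload form field named image and a pending/ folder that the cron script watches (all file and directory names here are made up for the example):
<?php
// upload.php - store the image and respond immediately; no heavy processing here.
$pendingDir = __DIR__ . '/pending/'; // assumed to exist and be writable

if (isset($_FILES['image']) && $_FILES['image']['error'] === UPLOAD_ERR_OK) {
    $target = $pendingDir . uniqid('img_', true) . '.jpg';
    move_uploaded_file($_FILES['image']['tmp_name'], $target);
    echo 'Upload received; it will be processed shortly.';
}

// process_pending.php would then be run by cron every 2 minutes, e.g.:
// */2 * * * * php /path/to/process_pending.php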
It is advisable to do this asynchronously. That is, make another script that only processes the previously created tmp files, and run it with cron (don't even involve Apache). When PHP is running as a web server module, it should be dedicated to quickly forming a response and then going away to free up resources for the next request.
You are doing the right thing by thinking this way; just keep going one small architectural step further, and fully decouple the request from the heavy lifting that needs to take place.
You can do it in several ways:
1. Send the Content-Length and Connection: close headers, then flush the output buffer:
ob_start();
// ... echo the response the user should see ...
header("Content-Length: " . ob_get_length());
header("Connection: close");
ob_end_flush();
flush();
// do other stuff
2. Using PHP's system() or exec(), run the work in a separate process and close the current one (see the sketch after this list)
3. Close the process using a shell script
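A sketch of option 2, assuming a worker.php script that does the actual processing (the file name, the output redirection and the trailing & are Unix-specific assumptions, not part of the original answer):
<?php
// Launch the heavy work in a separate PHP process; exec() returns immediately
// because output is redirected and the process is backgrounded with '&'.
$jobId = uniqid('job_', true);
exec('php ' . escapeshellarg(__DIR__ . '/worker.php') . ' ' . escapeshellarg($jobId)
    . ' > /dev/null 2>&1 &');
echo "Job $jobId queued";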
You can use ob_implicit_flush(). It turns implicit flushing on or off; with implicit flushing enabled, a flush operation happens after every output call, so explicit calls to flush() are no longer needed.
Refer to: How do i implement this scenario using PHP?
OR
You should create a standalone cron job which runs after a specific amount of time and does the work asynchronously, without the user knowing what processing is going on and without making the user wait. This way you will even be able to detect failed cases.
You should also try to minimize the loading time.
If I have a PHP page that is doing a task that takes a long time, and I try to load another page from the same site at the same time, that page won't load until the first page has timed out. For instance, if my timeout was set to 60 seconds, I wouldn't be able to load any other page until 60 seconds after the page that was taking a long time to load/timeout. As far as I know this is expected behaviour.
What I am trying to figure out is whether an erroneous/long-loading PHP script that creates the above situation would also affect other people on the same network. I personally thought it was a browser issue (i.e. if I loaded http://somesite.com/myscript.php in Chrome and it started working its magic in the background, I couldn't then load http://somesite.com/myscript2.php until that had timed out, but I could load that page in Firefox). However, I've heard contradictory statements saying that the timeout would happen to everyone on the same network (IP address?).
My script works on some data imported from Sage and takes quite a long time to run - sometimes it can time out before it finishes (i.e. if the Sage import crashes over the weekend), so I run it again and it picks up where it left off. I am worried that other staff in the office will not be able to access the site while this is running.
The problem you have here is actually related to the fact that (I'm guessing) you are using sessions. This may be a bit of a stretch, but it would account for exactly what you describe.
This is not in fact "expected behaviour" unless your web server is set up to run a single process with a single thread, which I highly doubt. This would create a situation where the web server is only able to handle a single request at any one time, and this would affect everybody on the network. This is exactly why your web server probably won't be set up like this - in fact I suspect you will find it is impossible to configure your server like this, as it would make the server somewhat useless. And before some smart alec chimes in with "what about Node.js?" - that is a special case, as I am sure you are already well aware.
When a PHP script has a session open, it has an exclusive lock on the file in which the session data is stored. This means that any subsequent request will block at the call to session_start() while PHP tries to acquire that exclusive lock on the session data file - which it can't, because your previous request still has one. As soon as your previous request finishes, it releases its lock on the file and the next request is able to complete. Since sessions are per-machine (in fact per browsing session, as the name suggests, which is why it works in a different browser), this will not affect other users of your network, but leaving your site set up so that this is an issue even just for you is bad practice and easily avoidable.
The solution to this is to call session_write_close() as soon as you have finished with the session data in a given script. This causes the script to close the session file and release its lock. You should try to either finish with the session data before you start the long-running process, or not call session_start() until after it has completed.
In theory you can call session_write_close() and then call session_start() again later in the script, but I have found that PHP sometimes exhibits buggy behaviour in this respect (I think this is cookie related, but don't quote me on that). Obviously, pay attention to the fact that setting cookies modifies the headers, so you have to call session_start() before you output any data or enable output buffering.
For example, consider this script:
<?php
session_start();
if (!isset($_SESSION['someval'])) {
    $_SESSION['someval'] = 1;
} else {
    $_SESSION['someval']++;
}
echo "someval is {$_SESSION['someval']}";
sleep(10);
With the above script, you will have to wait 10 seconds before you are able to make a second request. However, if you add a call to session_write_close() after the echo line, you will be able to make another request before the previous request has completed.
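For clarity, this is the same script with that one extra call added after the echo:
<?php
session_start();
if (!isset($_SESSION['someval'])) {
    $_SESSION['someval'] = 1;
} else {
    $_SESSION['someval']++;
}
echo "someval is {$_SESSION['someval']}";
session_write_close(); // release the session lock before the slow part
sleep(10);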
Hmm... I did not check, but I think that each request to the webserver is handled in a thread of its own, so a different request should not be blocked. Just try it :-) Use a different browser and access your page while the big script is running!
Err.. I just see that this worked for you :-) And it should for others, too.
This question already has answers here: How do I close a connection early?
Is there a way in PHP to close the connection (essentially tell a browser that there's no more data to come) but continue processing? The specific circumstance I'm thinking of is that I would want to serve up cached data, and if the cache had expired, I would still serve the stale cached data for a fast response, close the connection, and continue processing to regenerate and cache new data. Essentially the only purpose is to make a site appear more responsive, as there wouldn't be the occasional delay while a user waits for content to be regenerated.
UPDATE:
PLuS has the closest answer to what I was looking for. To clarify for a couple of people I'm looking for something that enables the following steps:
1. User requests page
2. Connection opens to server
3. PHP checks if cache has expired; if still fresh, serve cache and close connection (END HERE). If expired, continue to 4.
4. Serve expired cache
5. Close connection so browser knows it's not waiting for more data.
6. PHP regenerates fresh data and caches it.
7. PHP shuts down.
UPDATE:
This is important, it must be a purely PHP solution. Installing other software is not an option.
If running under fastcgi you can use the very nifty:
fastcgi_finish_request();
http://php.net/manual/en/function.fastcgi-finish-request.php
More detailed information is available in a duplicate answer.
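For example, a minimal sketch of how it is typically used (do_slow_work() is a placeholder for whatever processing you want to continue with):
<?php
echo 'Text the user will see';

// Only available under PHP-FPM / FastCGI; flushes the response and closes
// the connection while the script keeps running.
if (function_exists('fastcgi_finish_request')) {
    fastcgi_finish_request();
}

// The client has the full response by now; continue with the slow part.
do_slow_work(); // placeholder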
I finally found a solution (thanks to Google, I just had to keep trying different combinations of search terms). Thanks to the comment from arr1 on this page (it's about two thirds of the way down the page).
<?php
ob_end_clean();
header("Connection: close");
ignore_user_abort(true);
ob_start();
echo 'Text the user will see';
$size = ob_get_length();
header("Content-Length: $size");
ob_end_flush(); // All output buffers must be flushed here
flush(); // Force output to client
// Do processing here
sleep(30);
echo('Text user will never see');
I have yet to actually test this but, in short, you send two headers: one that tells the browser exactly how much data to expect, then one telling the browser to close the connection (which it will only do after receiving the expected amount of content).
You can do that by setting the time limit to unlimited and ignoring user aborts:
<?php
ignore_user_abort(true);
set_time_limit(0);
see also: http://www.php.net/manual/en/features.connection-handling.php
PHP doesn't have such persistence (by default). The only way I can think of is to run cron jobs to pre-fill the cache.
Can compile and run programs from PHP-CLI (not on shared hosting > VPS)
Caching
For caching I would not do it that way. I would use Redis as my LRU cache. It is going to be very fast (see the benchmarks), especially when you compile it with a client library written in C.
Offline processing
When you install the beanstalkd message queue you can also do delayed puts. But I would use Redis BRPOP/RPUSH for the other message-queuing part, because Redis is going to be faster, especially if you use a PHP client library written in C.
Can NOT compile or run programs from PHP-CLI (on shared hosting)
set_time_limit
Most of the time set_time_limit is not available (because of safe mode or the max_execution_time directive), at least not to set it to 0, when on shared hosting. Also, shared hosting providers really don't like users holding up PHP processes for a long time. Most of the time the default limit is set to 30.
Cron
Use cron to write data to disc using Cache_lite. Some stackoverflow topic already explaining this:
crontab with wget - why is it running twice?
Bash commands not executed when through cron job - PHP
How can I debug a PHP CRON script that does not appear to be running?
Also rather easy, but still hacky. I think you should upgrade (> VPS) when you have to do this kind of hacking.
Asynchronous request
As a last resort you could do an asynchronous request, caching data using Cache_lite for example. Be aware that shared hosting providers do not like you to hold up a lot of long-running PHP processes. I would use only one background process, which calls another one when it reaches the max_execution_time limit. I would note the time when the script starts, and between a couple of cache calls I would measure the time spent; when it gets near the limit I would do another asynchronous request. I would use locking to make sure only one process is running (see the sketch below). This way I will not piss off the provider and it can be done. On the other hand, I don't think I would write any of this, because it is kind of hacky if you ask me. When I get to that scale I would upgrade to a VPS.
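A rough sketch of that locking idea; the lock file path and the helper functions (work_remaining(), refresh_next_cache_entry(), trigger_next_async_request()) are placeholders you would replace with your own code:
<?php
// Only one background worker at a time, enforced with an exclusive file lock.
$lock = fopen(sys_get_temp_dir() . '/cache-worker.lock', 'c');
if (!flock($lock, LOCK_EX | LOCK_NB)) {
    exit; // another worker is already running
}

$start = time();
while (work_remaining()) {               // placeholder
    refresh_next_cache_entry();          // placeholder
    if (time() - $start > 25) {          // stay safely under a 30 second limit
        trigger_next_async_request();    // placeholder: fire the next request
        break;
    }
}

flock($lock, LOCK_UN);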
As far as I know, unless you're running FastCGI, you can't drop the connection and continue execution (unless you got Endophage's answer to work, which I failed to do). So you can:
Use cron or anything like that to schedule this kind of tasks
Use a child process to finish the job
But it gets worse. Even if you spawn a child process with proc_open(), PHP will wait for it to finish before closing the connection, even after calling exit(), die(), or some_undefined_function_causing_fatal_error(). The only workaround I found is to spawn a child process that itself spawns a child process, like this:
function doInBackground($_variables, $_code)
{
    proc_open(
        'php -r ' .
        escapeshellarg("if (pcntl_fork() === 0) { extract(unserialize(\$argv[1])); $_code }") .
        ' ' . escapeshellarg(serialize($_variables)),
        array(), $pipes
    );
}

$message  = 'Hello world!';
$filename = tempnam(sys_get_temp_dir(), 'php_test_workaround');
$delay    = 10;

doInBackground(compact('message', 'filename', 'delay'), <<< 'THE_NOWDOC_STRING'
// Your actual code goes here:
sleep($delay);
file_put_contents($filename, $message);
THE_NOWDOC_STRING
);
If you are doing this to cache content, you may instead want to consider using an existing caching solution such as memcached.
No. As far as the webserver is concerned, the request from the browser is handled by the PHP engine, and that's that. The request lasts as long as the PHP script does.
You might be able to fork() though.
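If you want to experiment with that, a very rough sketch follows; note that the pcntl extension is normally only enabled for the CLI SAPI, not under Apache, so treat this purely as an illustration:
<?php
// Sketch only: fork, let the child do the slow work, answer from the parent.
$pid = pcntl_fork();
if ($pid === 0) {
    // Child process: do the long-running work, then exit.
    sleep(30); // placeholder for the real work
    exit(0);
}
// Parent process: respond to the client right away.
echo 'Done';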
This is more of a fundamental question about how Apache/threading works.
In this hypothetical (read: sometimes I suck and write terrible code), I write some code that enters the infinite-recursion phase of its life. Then what's expected happens: the server stalls.
Even if I close the tab, open up a new one, and hit the site again (locally, of course), it does nothing. Even if I hit a different domain I'm hosting through a vhost declaration, nothing. I normally have to wait a number of seconds before Apache can begin handling traffic again. Most of the time I just get tired and restart the server manually.
Can someone explain this process to me? I have the PHP runtime setting 'ignore_user_abort' set to true to allow AJAX calls to keep running even if the user closes their browser, but would setting this to false affect it?
Any help would be appreciated. I didn't know what to search for.
Thanks.
ignore_user_abort() allows your script (and Apache) to ignore a user disconnecting (closing the browser/tab, moving away from the page, hitting ESC, etc.) and continue processing. This is useful in some cases - for instance in a shopping cart once the user hits "yes, place the order". You really don't want an order to die halfway through the process, e.g. the order is in the database but the charge hasn't been sent to the payment facility yet. Or vice versa.
However, while this script is busily running away in "the background", it will lock up resources on the server, especially the session file - PHP locks the session file to make sure that multiple parallel requests won't stomp all over it, so while your infinite loop is running in the background, you won't be able to use any other session-enabled part of the site. And if the loop is intensive enough, it could tie up the CPU enough that Apache is unable to handle any other requests on other hosted sites, where the session lock might not apply.
If it is an infinite loop, you'll have to wait until PHP's own maximum allowed run time (set_time_limit() and the max_execution_time directive) kicks in and kills the script. There are also some server-side limiters, like Apache's RLimitCPU and TimeOut, that can handle situations like this.
Note that except on Windows, PHP doesn't count "external" time in the set_time_limit. So if your runaway process is doing database stuff, calling external programs via system() and the like, the time spent running those external calls is NOT accounted for in the parent's time limit.
If you write code that causes an (effectively) never-ending loop, then Apache will execute that and be unable to respond to any additional new requests for a page, because it's trying to determine the page content (for the served page which caused the never-ending loop) by executing the (non-terminating) PHP code.
Solution: don't write code that doesn't terminate (in a reasonable amount of time). Understand loop invariants.
I've noticed many times that some PHP scripts call exit. It seems to me that this will force an exit of the httpd/apache child (of course another will be started if required for the next request).
But in the CMS, that next request will require the entire init.php initialization, and of course the overhead of cleaning up and starting PHP in the first place.
It seems that the php files usually start with
if ( !defined( 'SMARTY_DIR' ) ) {
    include_once( 'init.php' );
}
which suggests that somebody was imagining that one php process would serve multiple requests. But if every script exits, then each php/apache process will serve one request only.
Any thoughts on the performance and security implications of removing many of the exit calls (especially from the most-frequently-called scripts like index.php etc) to allow one process to serve multiple requests?
Thanks, Peter
--ADDENDUM --
Thank you for the answers. That (PHP will never serve more than one request) is what I thought originally, until last week, when I was debugging a config variable that could only have been set in one script (because of the way the path was set up) but was still set in another script (this is on a webserver with about 20 hits/sec). In that case, I did not have a PHP exit call in the one script that set up its config slightly differently. But when I added the exit call to that one script (in the alternate directory), it solved the misconfiguration I was experiencing in all my main scripts in the main directory (which was due to having a CSS directory variable set erroneously in a previous page execution). So now I'm confused again, because from what all the answers so far say, PHP should never serve more than one request.
exit does nothing to Apache processes (it certainly doesn't kill a worker!). It simply ends the execution of the PHP script and returns execution to the Apache process, which'll send the results to the browser and continue on to the next request.
The Smarty code you've excerpted doesn't have anything to do with a PHP process serving multiple requests. It just ensures that Smarty is initialised at all times - useful if a PHP script might alternatively be included in another script or accessed directly.
I think your confusion comes from what include_once is for. PHP is basically a "shared-nothing" system, where there are no real persistent server objects. include_once doesn't mean once per Apache child, but once per web request.
PHP may hack up a hairball if you include the same file twice. For instance a function with a particular name can only be defined once. This led to people implementing a copy of the old C #ifndef-#define-#include idiom once for each included file. include_once was the fix for this.
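For illustration, here is roughly what that guard idiom looks like in PHP next to include_once (the constant HELPERS_INCLUDED and the file helpers.php are made-up names for this sketch):
<?php
// Hand-rolled guard, the PHP equivalent of #ifndef/#define/#include
// (HELPERS_INCLUDED and helpers.php are hypothetical):
if (!defined('HELPERS_INCLUDED')) {
    define('HELPERS_INCLUDED', true);
    include 'helpers.php';
}

// include_once does that bookkeeping for you, once per request:
include_once 'helpers.php';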
Even if you do not call exit your PHP script is still going to end execution, at which point any generated HTML will be returned to the web server to send on to your browser.
The exit keyword allows you to signal to the PHP engine that your work is done and no further processing needs to take place.
Also note that exit is typically used for error handling and flow control - removing it from includes will likely break your application.