I've written a PHP script that takes a long time to execute [image processing for thousands of pictures]. It's a matter of hours - maybe 5.
After 15 minutes of processing, I get the error:
ERROR
The requested URL could not be retrieved
The following error was encountered while trying to retrieve the URL: The URL which I clicked
Read Timeout
The system returned: [No Error]
A Timeout occurred while waiting to read data from the network. The network or server may be down or congested. Please retry your request.
Your cache administrator is webmaster.
What I need is to enable that script to run for much longer.
Now, here are all the technical info:
I'm writing in PHP and using the Zend Framework. I'm using Firefox. The long script that is processed is done after clicking a link. Obviously, since the script is not over I see the web page on which the link was and the web browser writes "waiting for ...".
After 15 minutes the error occurs.
I tried to make changes to Firefox through about:config but without any success. I don't know, but the changes might be needed somewhere else.
So, any ideas?
Thanks ahead.
set_time_limit(0) will only affect the server-side running of the script. The error you're receiving is purely browser-side. You have to send SOMETHING to keep the browser from deciding the connection's dead - even a single character of output (followed by a flush() to make sure it actually gets sent out over the wire) will do. Maybe once for every image that's processed, or on a fixed time interval (if the last char was sent more than 5 minutes ago, output another one).
If you don't want any intermediate output, you could do ignore_user_abort(TRUE), which will allow the script to keep running even if the connection gets shut down from the client side.
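A minimal sketch of both suggestions, assuming the images are handled one at a time. The function name processWithHeartbeat and the $processImage callback are made up for illustration; the PHP calls themselves (set_time_limit, ignore_user_abort, ob_flush, flush) are the standard ones.

```php
<?php
// Hypothetical helper: processes each image and emits one character of
// output per image so the connection never looks idle to the client.
function processWithHeartbeat(array $imagePaths, callable $processImage): int
{
    set_time_limit(0);        // lift the server-side execution limit
    ignore_user_abort(true);  // keep going even if the client disconnects

    $done = 0;
    foreach ($imagePaths as $path) {
        $processImage($path);
        $done++;

        echo '.';             // any output will do
        if (ob_get_level() > 0) {
            ob_flush();       // empty PHP's own output buffer first
        }
        flush();              // then hand the bytes to the web server
    }
    return $done;
}
```

Whether the byte really reaches the browser also depends on buffering in the web server or any proxy in between, so treat this as a starting point rather than a guarantee.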
If the process runs for hours then you should probably look into batch processing. So you just store a request for image processing (in a file, database or whatever works for you) instead of starting the image processing. This request is then picked up by a scheduled (cron) process running on the server, which will do the actual processing (this can be a PHP script, which calls set_time_limit(0)). And when processing is finished you could signal the user (by mail or any other way that works for you) that the processing is finished.
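Roughly, that could look like the sketch below. It assumes a plain directory is used as the queue; enqueueJob, runPendingJobs and the job_*.json naming are invented for this example, and the notification step is left out.

```php
<?php
// Web side (hypothetical): record the request instead of doing the work.
function enqueueJob(string $queueDir, array $job): string
{
    $file = $queueDir . '/' . uniqid('job_', true) . '.json';
    file_put_contents($file, json_encode($job), LOCK_EX);
    return $file;
}

// Cron side (hypothetical): run from the CLI on a schedule.
function runPendingJobs(string $queueDir, callable $handler): int
{
    set_time_limit(0); // the worker may run for as long as it needs

    $count = 0;
    foreach (glob($queueDir . '/job_*.json') ?: [] as $file) {
        $job = json_decode(file_get_contents($file), true);
        $handler($job);  // the actual image processing goes here
        unlink($file);   // drop the request once it has been handled
        $count++;
    }
    return $count;
}
```

The web request then only calls enqueueJob() and returns immediately, while the scheduled script calls runPendingJobs() and can mail the user when it is done.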
use set_time_limit
documentation here
http://nl.php.net/manual/en/function.set-time-limit.php
If you can split your work in batches, after processing X images display the page with some javascript (or META redirects) on it to open the link http://server/controller/action/nextbatch/next_batch_id.
Rinse and repeat.
Batching the entire process also has the added benefit that if something goes wrong, you don't have to start the entire thing anew.
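A rough sketch of the batch-and-redirect idea, assuming batches are numbered from 0. processBatch and nextBatchPage are hypothetical names, and the URL is the placeholder one from above.

```php
<?php
// Processes one batch and returns the id of the next batch,
// or null when there is nothing left to do.
function processBatch(array $images, int $batchId, int $batchSize, callable $processImage): ?int
{
    $slice = array_slice($images, $batchId * $batchSize, $batchSize);
    foreach ($slice as $image) {
        $processImage($image);
    }
    $nextStart = ($batchId + 1) * $batchSize;
    return $nextStart < count($images) ? $batchId + 1 : null;
}

// Builds the page body: a META redirect to the next batch, or a final message.
function nextBatchPage(?int $nextBatchId): string
{
    if ($nextBatchId === null) {
        return 'All batches processed.';
    }
    $url = 'http://server/controller/action/nextbatch/' . $nextBatchId;
    return '<meta http-equiv="refresh" content="0;url=' . htmlspecialchars($url) . '">';
}
```

Each request then stays short, and the batch id in the URL doubles as the resume point if one request fails.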
If you're running on a server of your own and can get out of safe_mode, then you could also fork background processes to do the actual heavy lifting, independent of your browser view of things. If you're in a multicore or multiprocessor environment, you can even schedule more than one running process at any time.
We've done something like that for large computation scripts; synchronization of the processes happened over a shared database, but luckily enough, the processes were so independent that the only thing we needed to see was their completion or termination.
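One simple way to get such a background process, sketched here under assumptions: instead of a real fork, the web request launches a separate PHP CLI process through the shell and returns immediately. This only works on Unix-like systems where exec() is allowed; backgroundCommand and the worker.php path are made-up names.

```php
<?php
// Hypothetical helper: builds a shell command that runs a PHP CLI script
// detached from the current request (Unix-like shells only).
function backgroundCommand(string $script, array $args = []): string
{
    $parts = ['php', escapeshellarg($script)];
    foreach ($args as $arg) {
        $parts[] = escapeshellarg((string) $arg);
    }
    // Redirecting the output and appending "&" lets exec() return at once.
    return implode(' ', $parts) . ' > /dev/null 2>&1 &';
}

// Usage (worker.php is an assumed script name):
// exec(backgroundCommand('/path/to/worker.php', [42]));
```

Starting several of these with different arguments is the crude version of running more than one process at a time; the database is still where they would report completion.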
Related
I currently have a website that has twice been suspended by my hosting provider for "overusing system resources". In each case, there were 300 - 400 crashed copies of one of my PHP scripts left running on the server.
The scripts themselves pull an image from a web camera at home and copy it to the server. They make use of file locks to ensure only one can write at a time. The scripts are called every 3 seconds by any client viewing the page.
Initially I was confused, as I had understood that a PHP script either completes (returning the result), or crashes (returning the internal server error page). I am, however, informed that "defunct scripts" are a very common occurrence.
Would anyone be able to educate me? I have Googled this to death but I cannot see how a script can end up in a crashed state. Would it not time out when it reaches the max execution time?
My hosting provider is using PHP set up as CGI on a Linux platform. I believe that I have actually identified the problem with my script in that I did not realise that flock was a blocking function (and I am not using the LOCK_NB mask). I am assuming that somehow hundreds of copies of my script end up blocked waiting for a resource to become available and this leads to a crash? Does this sound plausible? I am reluctant to re-enable the site for fear of it being suspended again.
Any insights greatly appreciated.
Probably the approach I would recommend is to use tempnam() first and write the contents inside (which may take a while). Once done, you do the file locking, etc.
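Something along these lines, as a sketch only. It assumes a separate lock file next to the target, takes the lock with LOCK_NB so a busy lock makes the script give up instead of piling up, and uses rename() to swap the finished file in; publishFile and the .lock suffix are invented for the example.

```php
<?php
// Hypothetical helper: write slowly to a temp file, then publish it quickly.
function publishFile(string $target, string $data): bool
{
    $tmp = tempnam(dirname($target), 'tmp_');
    if ($tmp === false) {
        return false;
    }
    file_put_contents($tmp, $data); // the slow part, no lock held yet

    $lock = fopen($target . '.lock', 'c');
    if ($lock === false || !flock($lock, LOCK_EX | LOCK_NB)) {
        // Someone else holds the lock: give up instead of blocking.
        if ($lock !== false) {
            fclose($lock);
        }
        unlink($tmp);
        return false;
    }

    $ok = rename($tmp, $target);    // quick swap while the lock is held
    if (!$ok) {
        unlink($tmp);
    }
    flock($lock, LOCK_UN);
    fclose($lock);
    return $ok;
}
```

A request that returns false simply skips this round; with clients calling every 3 seconds, the next one will pick up a fresh image anyway.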
Not sure if this happens when a PUT request is being done; typically PHP will handle file uploads first before handing over the execution to your script.
A script could crash on either of these two limits while working with resources:
max_execution_time
memory_limit
That assumes there are no other errors in the script, so check for notice-level errors too.
If I have a PHP page that is doing a task that takes a long time, and I try to load another page from the same site at the same time, that page won't load until the first page has timed out. For instance if my timeout was set to 60 seconds, then I wouldn't be able to load any other page until 60 seconds after the page that was taking a long time to load/timeout. As far as I know this is expected behaviour.
What I am trying to figure out is whether an erroneous/long loading PHP script that creates the above situation would also affect other people on the same network. I personally thought it was a browser issue (i.e. if I loaded http://somesite.com/myscript.php in Chrome and it started working its magic in the background, I couldn't then load http://somesite.com/myscript2.php until that had timed out, but I could load that page in Firefox). However, I've heard contradictory statements, saying that the timeout would happen to everyone on the same network (IP address?).
My script works on some data imported from sage and takes quite a long time to run - sometimes it can time out before it finishes (i.e. if the sage import crashes over the weekend), so I run it again and it picks up where it left off. I am worried that other staff in the office will not be able to access the site while this is running.
The problem you have here is actually related to the fact that (I'm guessing) you are using sessions. This may be a bit of a stretch, but it would account for exactly what you describe.
This is not in fact "expected behaviour" unless your web server is set up to run a single process with a single thread, which I highly doubt. This would create a situation where the web server is only able to handle a single request at any one time, and this would affect everybody on the network. This is exactly why your web server probably won't be set up like this - in fact I suspect you will find it is impossible to configure your server like this, as it would make the server somewhat useless. And before some smart alec chimes in with "what about Node.js?" - that is a special case, as I am sure you are already well aware.
When a PHP script has a session open, it has an exclusive lock on the file in which the session data is stored. This means that any subsequent request will block at the call to session_start() while PHP tries to acquire that exclusive lock on the session data file - which it can't, because your previous request still has one. As soon as your previous request finishes, it releases its lock on the file and the next request is able to complete. Since sessions are per-machine (in fact per-browsing session, as the name suggests, which is why it works in a different browser) this will not affect other users of your network, but leaving your site set up so that this is an issue even just for you is bad practice and easily avoidable.
The solution to this is to call session_write_close() as soon as you have finished with the session data in a given script. This causes the script to close the session file and release its lock. You should try to either finish with the session data before you start the long-running process, or not call session_start() until after it has completed.
In theory you can call session_write_close() and then call session_start() again later in the script, but I have found that PHP sometimes exhibits buggy behaviour in this respect (I think this is cookie related, but don't quote me on that). Obviously, pay attention to the fact that setting cookies modifies the headers, so you have to call session_start() before you output any data, or enable output buffering.
For example, consider this script:
<?php
session_start();
if (!isset($_SESSION['someval'])) {
    $_SESSION['someval'] = 1;
} else {
    $_SESSION['someval']++;
}
echo "someval is {$_SESSION['someval']}";
sleep(10);
With the above script, you will have to wait 10 seconds before you are able to make a second request. However, if you add a call to session_write_close() after the echo line, you will be able to make another request before the previous request has completed.
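A sketch of the same idea with the lock released early. It is wrapped in a hypothetical function, countVisitThenWork, so the slow part can be passed in as a callback; in the script above, the callback would be the sleep(10).

```php
<?php
// Hypothetical wrapper: touch the session, release its lock, then do slow work.
function countVisitThenWork(callable $longTask): int
{
    session_start();
    $_SESSION['someval'] = ($_SESSION['someval'] ?? 0) + 1;
    $someval = $_SESSION['someval']; // copy what we need before closing

    // Saves the session data and releases the lock on the session file,
    // so other requests from the same browser are no longer blocked.
    session_write_close();

    $longTask();                     // e.g. function () { sleep(10); }
    return $someval;
}
```

Anything written to $_SESSION after session_write_close() is not saved, which is why the value is copied into a local variable first.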
Hmm... I did not check but I think that each request to the webserver is handled in a thread of its own. Thereby a different request should not be blocked. Just try :-) Use a different browser and access your page while the big script is running!
Err.. I just see that this worked for you :-) And it should for others, too.
I'm currently running a Linux based VPS, with 768MB of Ram.
I have an application which collects details of domains and then connect to a service via cURL to retrieve details of the pagerank of these domains.
When I run a check on about 50 domains, it takes the remote page about 3 mins to load with all the results, before the script can parse the details and return it to my script. This causes a problem as nothing else seems to function until the script has finished executing, so users on the site will just get a timer / 'ball of death' while waiting for pages to load.
(The remote page retrieves the domain details and updates the page by AJAX, but the cURL request (rightfully) doesn't return the page until loading is complete.)
Can anyone tell me if I'm doing anything obviously wrong, or if there is a better way of doing it. (There can be anything between 10 and 10,000 domains queued, so I need a process that can run in the background without affecting the rest of the site)
Thanks
A more sensible approach would be to "batch process" the domain data via the use of a cron triggered PHP cli script.
As such, once you'd inserted the relevant domains into a database table with a "processed" flag set as false, the background script would then:
Scan the database for domains that aren't marked as processed.
Carry out the CURL lookup, etc.
Update the database record accordingly and mark it as processed.
...
To ensure no overlap with an existing executing batch processing script, you should only invoke the PHP script every five minutes from cron and (within the PHP script itself) check how long the script has been running at the start of the "scan" stage and exit if it's been running for four minutes or longer. (You might want to adjust these figures, but hopefully you can see where I'm going with this.)
By using this approach, you'll be able to leave the background script running indefinitely (as it's invoked via cron, it'll automatically start after reboots, etc.) and simply add domains to the database/review the results of processing, etc. via a separate web front end.
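The loop described above could be sketched like this. The three callbacks stand in for the database scan, the cURL lookup and the update; runBatch and its parameter names are made up for the example, and 240 seconds is the four-minute figure mentioned above.

```php
<?php
// Assumed crontab entry (path is hypothetical):
//   */5 * * * * php /path/to/process_domains.php

// Handles unprocessed domains until none are left or the time budget is spent.
function runBatch(
    callable $fetchNext,      // returns the next unprocessed domain, or null
    callable $lookup,         // performs the cURL lookup for one domain
    callable $markProcessed,  // stores the result and sets the processed flag
    int $budgetSeconds = 240
): int {
    $startedAt = time();
    $handled = 0;

    // Check the elapsed time at the start of every "scan" step.
    while (time() - $startedAt < $budgetSeconds) {
        $domain = $fetchNext();
        if ($domain === null) {
            break;            // queue is empty
        }
        $markProcessed($domain, $lookup($domain));
        $handled++;
    }
    return $handled;
}
```

Note that the budget is only checked between domains, so a single very slow lookup can still push one run past the five-minute mark.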
This isn't the ideal solution, but if you need to trigger this process based on a user request, you can add the following at the end of your script.
set_time_limit(0);
flush();
This will allow the PHP script to continue running, but it will return output to the user. But seriously, you should use batch processing. It will give you much more control over what's going on.
Firstly I'm sorry but I'm an idiot! :)
I've loaded the site in another browser (FF) and it loads fine.
It seems Chrome puts some sort of lock on a domain when it's waiting for a server response, and I was testing the script manually through a browser.
Thanks for all your help and sorry for wasting your time.
CJ
While I agree with others that you should consider processing these tasks outside of your webserver, in a more controlled manner, I'll offer an explanation for the "server standstill".
If you're using native php sessions, php uses an exclusive locking scheme so only a single php process can deal with a given session id at a time. Having a long running php script which uses sessions can certainly cause this.
You can search for combinations of terms like:
php session concurrency lock session_write_close()
I'm sure it's been discussed many times here. I'm too lazy to search for you. Maybe someone else will come along and make an answer with bulleted lists and pretty hyperlinks in exchange for stackoverflow reputation :) But not me :)
good luck.
I'm not sure how your code is structured but you could try using sleep(). That's what I use when batch processing.
this is more of a fundamental question at how apache/threading works.
in this hypothetical (read: sometimes i suck and write terrible code), i write some code that enters the infinite-recursion phase of its life. then, what's expected happens: the server stalls.
even if i close the tab, open up a new one, and hit the site again (locally, of course), it does nothing. even if i hit a different domain i'm hosting through a vhost declaration, nothing. i normally have to wait a number of seconds before apache can begin handling traffic again. most of the time i just get tired and restart the server manually.
can someone explain this process to me? i have the php runtime setting 'ignore_user_abort' set to true to allow ajax calls that are initiated to keep running even if they close their browser, but would this being set to false affect it?
any help would be appreciated. didn't know what to search for.
thanks.
ignore_user_abort() allows your script (and Apache) to ignore a user disconnecting (closing browser/tab, moving away from page, hitting ESC, etc.) and continue processing. This is useful in some cases - for instance in a shopping cart once the user hits "yes, place the order". You really don't want an order to die halfway through the process, e.g. order's in the database, but the charge hasn't been sent to the payment facility yet. Or vice-versa.
However, while this script is busily running away in "the background", it will lock up resources on the server, especially the session file - PHP locks the session file to make sure that multiple parallel requests won't stomp all over the file, so while your infinite loop is running in the background, you won't be able to use any other session-enabled part of the site. And if the loop is intensive enough, it could tie up the CPU enough that Apache is unable to handle any other requests on other hosted sites, where the session lock might not apply.
If it is an infinite loop, you'll have to wait until PHP's own maximum allowed run time (set via set_time_limit() or the max_execution_time ini setting) kicks in and kills the script. There are also some server-side limiters, like Apache's RLimitCPU and TimeOut, that can handle situations like this.
Note that except on Windows, PHP doesn't count "external" time in the set_time_limit. So if your runaway process is doing database stuff, calling external programs via system() and the like, the time spent running those external calls is NOT accounted for in the parent's time limit.
If you write code that causes an (effectively) neverending loop, then apache will execute that, and be unable to respond to any additional new requests for a page, because it's trying to determine the page content (for the served page which caused the neverending loop) by executing the (non-terminating) php code.
Solution: don't write code that doesn't terminate (in a reasonable amount of time). Understand loop invariants.
I created a script that gets data from some web services and our database, formats a report, then zips it and makes it available for download. When I first started I made it a command line script to see the output as it came out and to get around the script timeout limit you get when viewing in a browser. But because I don't want my user to have to use it from the command line or have to run php on their computer, I want to make this run from our webserver instead.
Because this script could take minutes to run, I need a way to let it process in the background and then start the download once the file has been created successfully. What's the best way to let this script run without triggering the timeout? I've attempted this before (using the backticks to run the script separately and such) but gave up, so I'm asking here. Ideally, the user would click the submit button on the form to start the request, then be returned to the page instead of making them stare at a blank browser window. When the zip file exists (meaning the process has finished), it should notify them (via AJAX? reloaded page? I don't know yet).
This is on windows server 2007.
You should run it in a different process. Make a daemon that runs continuously, hits a database and looks for a flag, like "ShouldProcessData". Then when you hit that website switch the flag to true. Your daemon process will see the flag on its next iteration and begin the processing. Stick the results into the database. Use the database as the communication mechanism between the website and the long running process.
In PHP you have to tell what time-out you want for your process
See PHP manual set_time_limit()
You may have another problem: the time-out of the browser itself (could be around 1~2 minutes). While that time-out should be changeable within the browser (for each browser), you can usually prevent the user-side time-out from being triggered by sending some data to the browser every 20 seconds, for instance (like the header for download; you can then send other headers, like encoding etc.).
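For the "every 20 seconds" part, a time-based keep-alive could look roughly like the sketch below. heartbeatIfDue is a hypothetical helper that you would call from inside the long-running loop; it sends a single space only when the interval has elapsed. Keep in mind that once any body output has been sent, further header() calls no longer work.

```php
<?php
// Hypothetical helper: emits one byte if at least $intervalSeconds have
// passed since the last one, and returns the time of the latest heartbeat.
function heartbeatIfDue(int $lastSentAt, int $intervalSeconds = 20): int
{
    $now = time();
    if ($now - $lastSentAt < $intervalSeconds) {
        return $lastSentAt;   // too early, send nothing
    }

    echo ' ';                 // harmless padding to keep the connection alive
    if (ob_get_level() > 0) {
        ob_flush();           // empty PHP's own output buffer first
    }
    flush();                  // then hand the byte to the web server
    return $now;
}

// Usage inside the long loop:
// $last = time();
// foreach ($steps as $step) { doStep($step); $last = heartbeatIfDue($last); }
```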
Gearman is very handy for it (create a background task, let javascript poll for progress). It does of course require having gearman installed & workers created. See: http://www.php.net/gearman
Why don't you make an ajax call from the page where you want to offer the download, then just wait for the ajax call to return, and also call set_time_limit(0) on the other page?