PHP provides a mechanism to register a shutdown function:
register_shutdown_function('shutdown_func');
The problem is that in recent versions of PHP, this function is still executed DURING the request.
I have a platform (in Zend Framework, if that matters) where any piece of code throughout the request can register an entry to be logged into the database. Rather than have tons of individual insert statements throughout the request, slowing the page down, I queue them up to be inserted at the end of the request. I would like to be able to do this after the HTTP response has been sent to the user, so the time spent logging or doing any other cleanup tasks doesn't affect the user's perceived load time of the page.
Is there a built in method in PHP to do this? Or do I need to configure some kind of shared memory space scheme with an external process and signal that process to do the logging?
If you're really concerned about the insert times of MySQL, you're probably addressing the symptoms and not the cause.
For instance, if your PHP/Apache process keeps executing after the user gets their HTML, that process is still tied up by the request. Since it's busy, if another request comes along, Apache has to spawn another worker, dedicate more memory to it, open additional database connections, etc.
If you're running into performance problems, you need to remove heavy lifting from your PHP/Apache execution. If you have a lot of cleanup tasks going on, you're burning precious Apache processes.
I would consider logging your entries to a file and have a crontab load these into your database out of band. If you have other heavy duty tasks, use a queuing/job processing system to do the work out of band.
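A minimal sketch of the file-based approach (the log path and entry format here are illustrative, not from your code):
<?php
// Append one JSON line per log entry; a cron job can later bulk-load
// the accumulated file into MySQL and truncate it.
function queue_log_entry(array $entry)
{
    $line = json_encode($entry) . "\n";
    // LOCK_EX guards against interleaved writes from concurrent requests.
    file_put_contents('/var/log/myapp/pending.log', $line, FILE_APPEND | LOCK_EX);
}
?>
Appending to a local file is far cheaper than a database round trip, so the per-request cost is negligible.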
Aside from register_shutdown_function() there aren't built in methods for determining when a script has exited. However, the Zend Framework has several hooks for running code at specific points in the dispatch process.
For your requirements the most relevant would be the action controller's post-dispatch hook which takes place just after an action has been dispatched, and the dispatchLoopShutdown event for the controller plugin broker.
You should read the manual to determine which will be a better fit for you.
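For example, a dispatchLoopShutdown() plugin could flush your queued entries in one batch after the last action has run. This is just a sketch; Logger::flushQueue() is a placeholder for however you batch the inserts:
<?php
class My_Plugin_LogFlusher extends Zend_Controller_Plugin_Abstract
{
    public function dispatchLoopShutdown()
    {
        Logger::flushQueue(); // placeholder: write all queued entries in one batch
    }
}
// Registered on the front controller during bootstrap:
// Zend_Controller_Front::getInstance()->registerPlugin(new My_Plugin_LogFlusher());
?>
Note that this still runs before the response is sent, so it consolidates the work but doesn't move it out of the request.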
EDIT: I guess I didn't understand completely. From those hooks you could fork the current process in some form or another.
PHP has several ways to fork processes as you can read in the manual under program execution. I would suggest going over the pcntl extension - read this blog post to see an example of forking child processes to run in the background.
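A bare-bones fork along those lines (note pcntl is typically only available in CLI builds, not under mod_php; the helper name is a placeholder):
<?php
$pid = pcntl_fork();
if ($pid === -1) {
    die('could not fork');
} elseif ($pid === 0) {
    // Child: do the slow work, then exit so it never re-enters the parent's code path.
    flush_log_queue_to_database(); // placeholder for your batched insert
    exit(0);
}
// Parent: finish the request immediately.
?>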
The comments on this random guy's blog sound similar to what you want. If that header trick doesn't work, one of the comments on that blog suggests exec()ing to a separate PHP script to run in the background, if your Web host's configuration allows such a thing.
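The exec() trick those comments describe boils down to something like this (the script path is illustrative); redirecting output and backgrounding with & is what lets the request return immediately:
<?php
exec('php /path/to/flush_log_queue.php > /dev/null 2>&1 &');
?>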
This might be a little hackish for your taste, but it would be an effective and simple workaround.
You could create a "queue" managed by your DB of choice: first store the request in your queue table in the database, then output an iframe that leads to a script that will trigger the instructions that match up with the queue_id in your database.
ex:
<?php
mysql_query("INSERT INTO queue (instructions) VALUES ('something would go here')");
echo '<iframe src="/yourapp/execute_queue?id=' . mysql_insert_id() . '"></iframe>';
?>
and the frame would do something like
ex:
<?php
$result = mysql_query('SELECT instructions FROM queue WHERE id = ' . (int) $_GET['id']); // cast to int to prevent SQL injection
// From here, simply execute some instruction based on the "instructions" field, then delete the instruction from the database.
?>
Like I said, I can see how you could consider this hackish, but the frame will load independently of its parent page, so it achieves what you want without running some other app in the background.
Hope this at least points you in the right direction.
Cheers!
I'm creating a PHP script that will allow a user to log into a website and execute database queries and do other actions that could take some time to complete. If the PHP script runs these actions and they take too long, the browser page times out on the user end and the action never completes on the server end. If I redirect the user to another page and then attempt to run the action in the PHP script, will the server run it even though the user is not on the page? Could the action still time out?
In the event of long-running server-side actions in a web application like this, a good approach is to separate the queueing of the actions (which should be handled by the web application) from the running of the actions (which should be handled by a different server-side application).
In this case it could be as simple as the web application inserting a record into a database table which indicates that User X has requested Action Y to be processed at Time Z. A back-end process (always-running daemon, scheduled script, whatever you prefer) would be constantly polling that database table to look for new entries. ("New" might be denoted by something like an "IsComplete" column in that table.) It could poll every minute, every few minutes, every hour... whatever is a comfortable balance between server performance and the responsiveness of an action beginning when it's requested.
Once the action is complete, the server-side application that ran the action would mark it as complete in the database and would store the results wherever you need them to be stored. (Another database table or set of tables? A file? etc.) The web application can check for these results whenever you need it to (such as on each page load, maybe there could be some sort of "current status" of queued actions on each page so the user can see when it's ready).
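A minimal sketch of such a back-end worker, run from cron or a daemon loop (table and column names are illustrative):
<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$rows = $pdo->query('SELECT id, action, user_id FROM queued_actions WHERE IsComplete = 0');
foreach ($rows as $row) {
    run_action($row['action'], $row['user_id']); // placeholder for your action dispatcher
    $done = $pdo->prepare('UPDATE queued_actions SET IsComplete = 1 WHERE id = ?');
    $done->execute([$row['id']]);
}
?>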
The reason for all of this is simply to keep the user-facing web application responsive. Even if you do things like increase timeouts, users' browsers may still give up. Or the users themselves may give up after staring at a blank page and a spinning cursor for too long. The user interface should always respond back to the user quickly.
You could look at using something like ignore_user_abort but that is still not ideal in my opinion. I would look at deferring these actions and running them through a message queue. PHP has a Gearman extension (via PECL) - that is one option. Using a message queue scales well and does a better job of ensuring the requested actions actually get completed.
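With the gearman extension, queuing a background job is only a few lines (the server address and the 'run_queries' function name are assumptions - a GearmanWorker you write must register that function):
<?php
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
// Returns immediately; a worker processes the job out of band.
$client->doBackground('run_queries', json_encode(['user_id' => 42]));
?>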
Lots on SO on the subject... Asynchronous processing or message queues in PHP (CakePHP) ...but don't use Cake :)
set_time_limit() is your friend.
If it were me, I would put a loading icon animation in the user interface telling them to wait. Then I would execute the "long process" using an asynchronous AJAX call that would then return an answer, positive or negative, that you would pass to the user through JavaScript.
Just like when you upload pictures to Facebook, you can tell the user what is going on. Very clean!
So here's the lowdown:
The client I'm developing for is on HostGator, which has limited their max_execution_time to 30 seconds, and it cannot be overridden (I've tried, and confirmed via their support and wiki that it cannot).
What I have the code doing is to take an uploaded file and...
loop through the XML
get all feed download links within the file
download each XML file
individually loop through the XML array of each file and insert the information of each item into the database based on where it came from (i.e. the filename)
Now is there any way I can queue this somehow or split the workload into multiple files possibly? I know the code works flawlessly and checks to see if each item exists before inserting it but I'm stuck getting around the execution_limit.
Any suggestions are appreciated, let me know if you have any questions!
The time limit is in effect only when executing PHP scripts through a web server; if you execute the script from the CLI or as a background process, it should work fine.
Note that executing an external script is somewhat dangerous if you are not careful enough, but it's a valid option.
Check the following resources:
Process Control Extensions
And specifically:
pcntl-exec
pcntl-fork
Did you know you can trick the max_execution_time by registering a shutdown handler? Within that code you can run for another 30 seconds ;-)
Okay, now for something more useful.
You can add a small queue table in your database to keep track of where you are in case the script dies mid-way.
After getting all the download links, you add those to the table
Then you download one file and process it; when you're done, you check it off (delete it from the queue)
Upon each run you check if there's still work left in the queue
For this to work you need to request that URL a few times; perhaps use JavaScript to keep reloading until the work is done?
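A sketch of one such chunked run, stopping well before the 30-second ceiling so the next request (or JS reload) can pick up where this one left off (helper names are placeholders):
<?php
$start = time();
while (time() - $start < 25) { // leave headroom under the 30s limit
    $row = fetch_next_queued_feed(); // placeholder: SELECT ... FROM queue LIMIT 1
    if ($row === null) {
        break; // queue empty - all done
    }
    import_feed_xml($row['url']);  // placeholder: download the feed and insert its items
    delete_from_queue($row['id']); // check it off
}
?>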
I am in such a situation. My approach is similar to Jack's:
accept that execution time limit will simply be there
design the application to cope with sudden exit (look into register_shutdown_function)
identify all time-demanding parts of the process
continuously save progress of the process
modify your components so that they are able to start from an arbitrary point, e.g. a position in an XML file, or continue downloading your to-be-fetched list of XML links
For the task I made two modules, Import for the actual processing; TaskManagement for dealing with these tasks.
For invoking the TaskManager I use cron; whether that's enough depends on what your web hosting offers. There's also WebCron.
Jack's JavaScript method has the advantage that it only adds requests when needed. If there are no tasks to execute, the script's runtime will be very short, and its cost is perhaps overstated*. The downsides are that it requires the user to wait the whole time, not close the tab/browser, have JS support, etc.
*) Likely much less demanding than one click of one user at such a moment
Then of course look into performance improvements, caching, skipping what's not needed/hasn't changed etc.
I have a PHP function that I want to make available publically on the web - but it uses a lot of server resources each time it is called.
What I'd like to happen is that a user who calls this function is forced to wait for some time, before the function is called (or, at the least, before they can call it a second time).
I'd greatly prefer this 'wait' to be enforced on the server-side, so that it can't be overridden by dubious clients.
I plan to insist that users log into an online account.
Is there an efficient way I can make the user wait, without using server resources?
Would 'sleep()' be an appropriate way to do this?
Are there any suggested problems with using sleep()?
Is there a better solution to this?
Excuse my ignorance, and thanks!
sleep would be fine if you were using PHP as a command line tool for example. For a website though, your sleep will hold the connection open. Your webserver will only have a finite number of concurrent connections, so this could be used to DOS your site.
A better - but more involved - way would be to use a job queue. Add the task to a queue which is processed by a scheduled script and update the web page using AJAX or a meta-refresh.
sleep() is a bad idea in almost all possible situations. In your case, it's bad because it keeps the connection to the client open, and most webservers have a limit of open connections.
sleep() will not help you at all. The user could just load the page twice at the same time, and the command would be executed twice right after each other.
Instead, you could save a timestamp in your database for when your function was last invoked. Then, before invoking it, you should check the database to see if a suitable amount of time has passed. If it has, invoke the function and update the timestamp in the database.
If you're planning on enforcing a user login, then the problem just got a whole lot simpler.
Have a record in the database listing users and the last time they used your resource-consuming service, and measure the time difference between then and now. If the time difference is too low, deny access and display an error message.
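A minimal sketch of that check, assuming an open PDO connection in $pdo and illustrative table/column names (user_id as the unique key):
<?php
$stmt = $pdo->prepare('SELECT last_run FROM usage_log WHERE user_id = ?');
$stmt->execute([$userId]);
$lastRun = $stmt->fetchColumn();

if ($lastRun !== false && time() - strtotime($lastRun) < 3600) {
    exit('Please wait before running this again.');
}

run_expensive_function(); // placeholder for the costly work
$pdo->prepare('REPLACE INTO usage_log (user_id, last_run) VALUES (?, NOW())')
    ->execute([$userId]);
?>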
This is best handled at the server level. No reason to even invoke PHP for repeat requests.
Like many sites, I use Nginx, and you can use its rate limiting to block repeat requests over a certain number - say, three requests per IP per hour.
How can I make a scheduler in PHP without writing a cron script? Is there any standard solution?
Feature [for example]: send a reminder to all subscribers 24 hours before their subscription expires.
The standard solution is to use cron on Unix-like operating systems and Scheduled Tasks on Windows.
If you don't want to use cron, I suppose you could try to rig something up using at. But it is difficult to imagine a situation where cron is a problem but at is A-OK.
The solution I see is a loop (for or while) and a sleep(3600*24);
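A sketch of that loop - note it is only viable as a long-running CLI process, never inside a normal web request:
<?php
set_time_limit(0); // the CLI has no limit by default, but be explicit
while (true) {
    send_expiry_reminders(); // placeholder for the scheduled task
    sleep(3600 * 24);        // wake once a day
}
?>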
Execute it by sending an AJAX call from JavaScript at a set interval of your choosing.
Please read my final opinion at the bottom before rushing to implement.
Cron really is the best way to schedule things. It's simple, effective and widely available.
Having said that, if cron is not available or you absolutely don't want to use it, two general approaches for a non-cron Apache/PHP pseudo-cron on a traditional web server are as follows.
Check using a loadable resource
Embed an image/script/stylesheet/other somewhere on each web page. Images are probably the best supported by browsers (if javascript is turned off there's no guarantee that the browser will even load .js source files). This page will send headers and empty data back to the browser (a 1x1 clear .gif is fine - look at fpassthru)
From the PHP manual notes:
<?php
ignore_user_abort(true); // keep running even after the client disconnects
header("Content-Length: 0");
header("Connection: close");
flush();
// browser should be disconnected at this point
// and you can do your "cron" work here
?>
Check on each page load
For each task you want to automate, you would create some sort of callable API - static OOP, function calls - whatever. On each request you check to see if there is any work to do for a given task. This is similar to the above except you don't use a separate URL for the script. This could mean that the page takes a long time to load while the work is being performed.
This would involve a select query to your database on either a task table that records the last time a task has run, or simply directly on the data in question, in your example, perhaps on a subscription table.
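That per-request check might look like this (assuming a PDO connection in $pdo; the tasks table is illustrative):
<?php
$due = $pdo->query("SELECT name FROM tasks WHERE last_run < NOW() - INTERVAL 1 DAY");
foreach ($due as $task) {
    run_task($task['name']); // placeholder for your task dispatcher
    $pdo->prepare('UPDATE tasks SET last_run = NOW() WHERE name = ?')
        ->execute([$task['name']]);
}
?>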
Final opinion
You really shouldn't reinvent the wheel on this if possible. Cron is very easy to set up.
However, even if you decide that, in your opinion, cron is not easy to set up, consider this: for each and every page load on your site, you will be incurring the overhead of checking to see what needs to be done. True cron, on the other hand, will execute command line PHP on the schedule you set up (hourly, etc) which means your server is running the task checking code much less frequently.
Biggest potential problem without true cron
You run the risk of not having enough traffic to your site to actually get updates happening frequently enough.
Create a cron-job table that stores the date each job should run. On each request, check whether today's date matches a date in the cron-job table; if it does, call the method to execute. This works much like a cron job.
I need to do some additional processing after a Drupal page has been sent.
I know I could fire a background shell command, but I need the current Drupal execution context to be maintained.
I've spent a lot of time looking, but I can't find any documentation in this regard. This is surprising because it must surely be a common requirement.
The only real idea I have is to fire up Drupal (again) via a shell command (exec, etc) and supply it with a pseudo path which would invoke the continued processing. But this seems unnecessarily complex/wasteful.
Any pointers greatly appreciated, tks.
UPDATE: Based on Googletorp and Matt's replies, I just want to point out that I'm not doing housekeeping with this additional processing. Without going into too much detail, I have a number of pages whose content is based on multiple nodes. If one of these child nodes changes then the page needs to be updated immediately, but there is no reason why the user who updated the child node needs to be kept waiting while this happens.
So, the control flow would be:
UPDATE CHILD NODE
RETURN UPDATED CHILD NODE VIEW TO USER (this is where Drupal would normally terminate)
REGENERATE PARENT PAGE
EXIT
Neither Cron nor the rules module have the immediacy I require - but thanks for your input.
This is something you should be doing with http://drupal.org/project/job_queue and cron to complete whatever you need. However, if you want a quick fix and just want the page to appear while things continue happening, you can use the PHP function flush().
http://php.net/manual/en/function.flush.php
For background processes I recommend cron, which is built into the Drupal API via hook_cron().
Your update indicates you require immediate execution. Therefore you require forking or an asynchronous process. One option is pcntl_fork(), but I've had issues because (with mod_php at least) you don't really want to fork the web server process.
Your best option is probably to set up a page to specifically perform your update, then call it via a forked curl process through popen(). For example,
pclose(popen('curl http://localhost/update_parent_nodes &', 'r'));
You can take a look at the rules module, which lets you run code whenever certain events happen.
Take a look at hook_exit() - it gets invoked from within drupal_page_footer(), right after the main content has been sent out to the client (which happens indirectly via the page_set_cache() call).
Be aware that it will usually not be invoked for JavaScript/AHAH callbacks, as those terminate the processing themselves earlier on in most cases, but for your described scenario, it should be the right way to go.
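A skeleton of that hook for the described scenario (MYMODULE and the helper are placeholders):
<?php
/**
 * Implements hook_exit().
 * Runs from drupal_page_footer(), after the page has gone out to the client.
 */
function mymodule_exit($destination = NULL) {
  // Regenerate the parent pages for any child nodes updated during this request.
  mymodule_regenerate_parent_pages(); // placeholder helper
}
?>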
OP here. Just to share my solution with people for future reference:
I used the second Drupal shell template outlined here
This bootstraps Drupal from the command line. I modified it a bit so I could specify my own function and args from the command line.
I then invoked the above script using the PHP system() call in conjunction with the background operator (&) - remember to redirect output or the & is irrelevant.
It's a bit messy, but it does exactly what I need.