So here's the lowdown:
The client I'm developing for is on HostGator, which has limited max_execution_time to 30 seconds, and it cannot be overridden (I've tried, and their support and wiki confirm it).
What I have the code doing is taking an uploaded file and...
looping through the XML
getting all feed download links within the file
downloading each XML file
individually looping through the XML array of each file and inserting the information of each item into the database based on where it is from (i.e. the filename)
Now is there any way I can queue this somehow, or possibly split the workload into multiple files? I know the code works flawlessly and checks whether each item exists before inserting it, but I'm stuck getting around the execution time limit.
Any suggestions are appreciated, let me know if you have any questions!
The time limit is in effect only when executing PHP scripts through a web server; if you execute the script from the CLI or as a background process, it should work fine.
Note that executing an external script is somewhat dangerous if you are not careful, but it's a valid option.
Check the following resources:
Process Control Extensions
And specifically:
pcntl-exec
pcntl-fork
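As a minimal sketch of the background-process idea, the snippet below builds a shell command that starts a PHP script detached from the web request. The helper name, the script path and the argument are hypothetical, and many shared hosts disable exec(), so treat it as an illustration rather than something guaranteed to work on HostGator.

```php
<?php
// Hypothetical helper: builds a shell command that runs a PHP script
// detached from the current request. Paths and arguments are assumptions.
function buildBackgroundCommand(string $script, array $args = []): string
{
    $parts = ['php', escapeshellarg($script)];
    foreach ($args as $arg) {
        // Escape every argument so user input cannot inject shell syntax.
        $parts[] = escapeshellarg((string) $arg);
    }
    // Discard output and append "&" so the shell returns immediately.
    return 'nohup ' . implode(' ', $parts) . ' > /dev/null 2>&1 &';
}

$command = buildBackgroundCommand('/path/to/import.php', ['feeds.xml']);
echo $command, PHP_EOL;
// On a host that permits it, the job would be started with: exec($command);
```

Escaping each argument is the "careful" part: anything taken from the upload should never reach the shell unescaped.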
Did you know you can trick the max_execution_time by registering a shutdown handler? Within that code you can run for another 30 seconds ;-)
Okay, now for something more useful.
You can add a small queue table in your database to keep track of where you are in case the script dies mid-way.
After getting all the download links, you add those to the table
Then you download one file and process it; when you're done, you check it off (delete it from the queue)
Upon each run you check if there's still work left in the queue
For this to work you need to request that URL a few times; perhaps use JavaScript to keep reloading until the work is done?
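The queue steps above could be sketched roughly as follows. This is only an illustration: the table name feed_queue, the function names and the in-memory SQLite database are assumptions standing in for whatever database the site really uses, and the time budget is checked between feeds, so a single slow feed can still overrun it.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE feed_queue (id INTEGER PRIMARY KEY, url TEXT NOT NULL)');

// Step 1: store every download link found in the uploaded file.
function enqueueFeeds(PDO $pdo, array $urls): void
{
    $stmt = $pdo->prepare('INSERT INTO feed_queue (url) VALUES (?)');
    foreach ($urls as $url) {
        $stmt->execute([$url]);
    }
}

// Steps 2 and 3: process queued feeds until the time budget is used up.
// Returns the number of rows still waiting in the queue.
function runQueue(PDO $pdo, callable $processFeed, float $budgetSeconds): int
{
    $deadline = microtime(true) + $budgetSeconds;
    while (microtime(true) < $deadline) {
        $row = $pdo->query('SELECT id, url FROM feed_queue ORDER BY id LIMIT 1')
                   ->fetch(PDO::FETCH_ASSOC);
        if ($row === false) {
            break; // queue is empty
        }
        $processFeed($row['url']);
        // Check the item off only after it has been processed.
        $pdo->prepare('DELETE FROM feed_queue WHERE id = ?')->execute([$row['id']]);
    }
    return (int) $pdo->query('SELECT COUNT(*) FROM feed_queue')->fetchColumn();
}

enqueueFeeds($pdo, ['http://example.com/a.xml', 'http://example.com/b.xml']);
$processed = [];
$remaining = runQueue($pdo, function (string $url) use (&$processed) {
    $processed[] = $url; // a real handler would download and import the feed
}, 20.0);
echo "Remaining: $remaining", PHP_EOL;
```

A page that reloads itself would keep requesting the script while the returned count is greater than zero.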
I am in such a situation. My approach is similar to Jack's
accept that execution time limit will simply be there
design the application to cope with sudden exit (look into register_shutdown_function)
identify all time-demanding parts of the process
continuously save progress of the process
modify your components so that they are able to start from an arbitrary point, e.g. a position in an XML file, or to continue downloading from your to-be-fetched list of XML links
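One way to picture "continuously save progress" and "start from an arbitrary point" is a small checkpoint file. The sketch below is an assumption-laden illustration: the helper names, the JSON layout and the batch size are all made up for the example, and a real importer would store the position per file or per task.

```php
<?php
// Hypothetical checkpoint helpers; the file name and layout are assumptions.
function loadPosition(string $file): int
{
    if (!is_file($file)) {
        return 0; // no checkpoint yet, start from the beginning
    }
    $state = json_decode((string) file_get_contents($file), true);
    return isset($state['position']) ? (int) $state['position'] : 0;
}

function savePosition(string $file, int $position): void
{
    file_put_contents($file, json_encode(['position' => $position]), LOCK_EX);
}

// Handles at most $maxItems items per run, recording progress after each one.
function importBatch(array $items, string $file, int $maxItems, callable $handle): bool
{
    $start = loadPosition($file);
    $end = min(count($items), $start + $maxItems);
    for ($i = $start; $i < $end; $i++) {
        $handle($items[$i]);
        savePosition($file, $i + 1); // progress survives a sudden exit
    }
    return $end >= count($items); // true once everything is done
}

$checkpoint = sys_get_temp_dir() . '/import_checkpoint_' . uniqid() . '.json';
$items = ['a', 'b', 'c', 'd', 'e'];
$seen = [];
$collect = function ($item) use (&$seen) { $seen[] = $item; };

$doneFirst = importBatch($items, $checkpoint, 3, $collect);  // handles a, b, c
$doneSecond = importBatch($items, $checkpoint, 3, $collect); // resumes at d
unlink($checkpoint);
```

Saving after every item costs a write each time; saving every N items trades some repeated work after a crash for fewer writes.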
For the task I made two modules, Import for the actual processing; TaskManagement for dealing with these tasks.
For invoking TaskManager I use cron. Whether that is enough depends on what your web hosting offers. There's also WebCron.
The advantage of Jack's JavaScript method is that it only adds requests when they are needed. With cron, a run that finds no tasks to execute is very short and its cost is perhaps overstated*, but it is still a run. The downsides of the JavaScript method are that it requires the user to wait the whole time without closing the tab/browser, and that it depends on JS support.
*) Likely much less demanding than one click by one user at that moment
Then of course look into performance improvements, caching, skipping what's not needed/hasn't changed etc.
Related
On my web service, users are allowed to create forms and let their friends or co-workers create data with them. The collected data can be downloaded as a zip file stream. Sometimes users have huge amounts of data (up to 2 GB) and the server simply kills the PHP process for obvious reasons. Is it somehow possible to create such a file on the client side without Flash, Java, etc. (Java doesn't work for most of my users anyway)?
Increase your script timeout and memory usage.
Use the set_time_limit() function.
And use ini_set() for the memory_limit parameter.
One more solution is to give the clients the file in parts, i.e. limit the number of records per download: 1-1000, 1001-2000, etc.
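To illustrate the "file parts" idea, here is a small sketch that computes the record ranges to offer as separate downloads. The function name and the numbers are made up for the example; the part size would be tuned to whatever the server can produce within its limits.

```php
<?php
// Splits $total records into inclusive, 1-based ranges of at most $size rows.
function recordRanges(int $total, int $size): array
{
    $ranges = [];
    for ($start = 1; $start <= $total; $start += $size) {
        $ranges[] = [$start, min($start + $size - 1, $total)];
    }
    return $ranges;
}

// Each range could become one download link for the user.
foreach (recordRanges(2500, 1000) as [$from, $to]) {
    echo "records $from-$to", PHP_EOL;
}
```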
If you have control over the web server process I suggest you explore x-send-file as a solution to this.
See this SO question
In essence it will end the php process and send the file via the http server. This way time limits aren't an issue and you don't have a php instance hanging around.
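A rough sketch of what the PHP side might look like is below. It assumes Apache with mod_xsendfile, which reads the X-Sendfile header (nginx uses X-Accel-Redirect with an internal URI instead), and it assumes the zip already exists on disk; the helper name and both file paths are hypothetical.

```php
<?php
// Builds the response headers that hand the download over to the web server.
// Assumption: Apache with mod_xsendfile enabled for the directory in $path.
function buildSendfileHeaders(string $path, string $downloadName): array
{
    return [
        'Content-Type: application/zip',
        'Content-Disposition: attachment; filename="' . basename($downloadName) . '"',
        'X-Sendfile: ' . $path,
    ];
}

$headers = buildSendfileHeaders('/var/exports/form-42.zip', 'form-data.zip');
foreach ($headers as $line) {
    // In a web request this would be header($line); followed by exit;
    echo $line, PHP_EOL;
}
```

Because the file has to exist before the headers are sent, this pairs naturally with building the archive in a background job first.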
Create a worker shell that keeps running in the background in a loop and checks for new data. If it finds new, unprocessed data, have it prepare the download in the background. When the data is ready for download, flag it as "ready" and inform the user (by email, by polling via AJAX for a status update, however you like) that the data was processed and is ready for download.
You can use nice to limit the CPU power used by that shell, so that it does not consume all the available processing power and slow your site down.
That's exactly how I handle audio and video processing in one of my projects and it works fine.
I'm building a feature of a site that will generate a PDF (using TCPDF) as a booklet of 500+ pages. The layout is very simple, but because of the number of records I think it qualifies as a "long running PHP process". This will only need to be done a handful of times per year, and if I could just have it run in the background and email the admin when done, that would be perfect. I considered cron, but this is a user-triggered type of feature.
What can I do to keep my PDF rendering for as long as it takes? I am "good" with PHP but not so much with *nix. Even a tutorial link would be helpful.
Honestly you should avoid doing this entirely from a scalability perspective. I'd use a database table to "schedule" the job with the parameters, have a script that is continuously checking this table. Then use JavaScript to poll your application for the file to be "ready", when the file is ready then let the JavaScript pull down the file to the client.
It will be incredibly hard to maintain/troubleshoot this process while you're wondering why is my web server so slow all of a sudden. Apache doesn't make it easy to determine what process is eating up what CPU.
Also by using a database you can do things like limit the number of concurrent threads, or even provide faster rendering time by letting multiple processes render each PDF page and then re-assemble them together with yet another process... etc.
Good luck!
What you need is to change the allowed maximum execution time for PHP scripts. You can do that by several means from the script itself (you should prefer this if it would work) or by changing php.ini.
BEWARE - Changing the execution time might seriously lower the performance of your server. A script is allowed to run only for a certain time (30 seconds by default) before it is terminated by the parser. This helps prevent poorly written scripts from tying up the server. You should know exactly what you are doing before you do this.
You can find some more info about:
setting max-execution-time in php.ini here http://www.php.net/manual/en/info.configuration.php#ini.max-execution-time
limiting the maximum execution time by set_time_limit() here http://php.net/manual/en/function.set-time-limit.php
PS: This should work if you use PHP to generate the PDF. It will not work if you use some stuff outside of the script (called by exec(), system() and similar).
This question is already answered, but as a result of other questions / answers here, here is what I did and it worked great: (I did the same thing using pdftk, but on a smaller scale!)
I put the following code in an iframe:
set_time_limit(0); // ignore php timeout
//ignore_user_abort(true); // optional- keep on going even if user pulls the plug*
while (ob_get_level()) { ob_end_clean(); } // remove output buffers
ob_implicit_flush(true);
This avoided the page load timeout. You might want to put a countdown or progress bar on the parent page. I originally had the iframe issuing progress updates back to the parent, but browser updates broke that.
How can I make a scheduler in PHP without writing a cron script? Is there any standard solution?
Feature (for example): send a reminder to all subscribers 24 hours before the subscription expires.
The standard solution is to use cron on Unix-like operating systems and Scheduled Tasks on Windows.
If you don't want to use cron, I suppose you could try to rig something up using at. But it is difficult to imagine a situation where cron is a problem but at is A-OK.
The solution I see is a loop (for or while) and a sleep(3600*24);
Trigger it with an AJAX call sent from JavaScript at an interval you choose.
Please read my final opinion at the bottom before rushing to implement.
Cron really is the best way to schedule things. It's simple, effective and widely available.
Having said that, if cron is not available or you absolutely don't want to use it, two general approaches for a non-cron, Apache/PHP pseudo-cron running on a traditional web server are as follows.
Check using a loadable resource
Embed an image/script/stylesheet/other somewhere on each web page. Images are probably the best supported by browsers (if javascript is turned off there's no guarantee that the browser will even load .js source files). This page will send headers and empty data back to the browser (a 1x1 clear .gif is fine - look at fpassthru)
from the php manual notes
<?php
header("Content-Length: 0");
header("Connection: close");
flush();
// browser should be disconnected at this point
// and you can do your "cron" work here
?>
Check on each page load
For each task you want to automate, you would create some sort of callable API - static OOP, function calls - whatever. On each request you check to see if there is any work to do for a given task. This is similar to the above except you don't use a separate URL for the script. This could mean that the page takes a long time to load while the work is being performed.
This would involve a select query to your database on either a task table that records the last time a task has run, or simply directly on the data in question, in your example, perhaps on a subscription table.
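The per-request check described above might boil down to something like this sketch. The function name, the task names and the array standing in for rows from a task table are assumptions for illustration; timestamps are fixed so the example is deterministic.

```php
<?php
// Hypothetical helper: decides whether a task should run on this page load.
// $lastRun and $now are Unix timestamps; $intervalSeconds is the schedule.
function isTaskDue(?int $lastRun, int $intervalSeconds, int $now): bool
{
    if ($lastRun === null) {
        return true; // the task has never run
    }
    return ($now - $lastRun) >= $intervalSeconds;
}

// Assumed shape of rows read from a task table: name => last run time.
$tasks = ['send_reminders' => 1700000000, 'cleanup' => null];
$now = 1700000000 + 3600; // one hour after the last reminder run

foreach ($tasks as $name => $lastRun) {
    if (isTaskDue($lastRun, 86400, $now)) {
        echo "run $name", PHP_EOL; // only "cleanup" is due here
    }
}
```

In a real setup the last-run time would be updated before the work starts, so that two simultaneous page loads do not both run the same task.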
Final opinion
You really shouldn't reinvent the wheel on this if possible. Cron is very easy to set up.
However, even if you decide that, in your opinion, cron is not easy to set up, consider this: for each and every page load on your site, you will be incurring the overhead of checking to see what needs to be done. True cron, on the other hand, will execute command line PHP on the schedule you set up (hourly, etc) which means your server is running the task checking code much less frequently.
Biggest potential problem without true cron
You run the risk of not having enough traffic to your site to actually get updates happening frequently enough.
Create a table of cron jobs in which you keep the dates on which each job should run. On each request, check whether today's date equals a date in the cron job table; if it does, call the method to execute. This works much like a cron job.
I am working on a tool in PHP that processes a lot of data and takes a while to finish. I would like to keep the user updated on what is going on and which task is currently being processed.
What is, in your opinion, the best way to do it? I've got some ideas but can't decide on the most effective one:
The old way: execute a small part of the script and display a page to the user with a Meta Redirect or a JavaScript timer to send a request to continue the script (like /script.php?step=2).
Sending AJAX requests constantly to read a server file that PHP keeps updating through fwrite().
Same as above but PHP updates a field in the database instead of saving a file.
Do any of those sound good? Any ideas?
Thanks!
Rather than writing to a static file you fetch with AJAX or to an extra database field, why not have another PHP script that simply returns a completion percentage for the specified task. Your page can then update the progress via a very lightweight AJAX request to said PHP script.
As for implementing this "progress" script, I could offer more advice if I had more insight as to what you mean by "processes a lot of data". If you are writing to a file, your "progress" script could simply check the file size and return the percentage complete. For more complex tasks, you might assign benchmarks to particular processes and return an estimated percentage complete based on which process has completed last or is currently running.
UPDATE
This is one suggested method to "check the progress" of an active script which is simply waiting for a response from a request. I have a data mining application that I use a similar method for.
In your script that makes the request you're waiting for (the script you want to check the progress of), you can store a progress variable for the process, either in a file or in a database. I use a database, as I have hundreds of processes running at any time which all need to track their progress, and I have another script that allows me to monitor the progress of these processes. When the process begins, set this variable to 1. You can easily select an arbitrary number of 'checkpoints' the script will pass and calculate the percentage given the current checkpoint.
For a large request, however, you might be more interested in knowing the approximate percentage of the request that has completed. One possible solution would be to know the size of the returned content and set your status variable according to the percentage received at any moment, i.e. if you receive the request data in a loop, you could update the status on each iteration. Or, if you are downloading to a flat file, you could poll the size of the file.
This could be done less accurately with time (rather than file size) if you know the approximate time the request should take to complete and simply compare it against the script's current execution time. Obviously neither of these is a perfect solution, but I hope they'll give you some insight into your options.
I suggest using the AJAX method, but not using a file or a database. You could probably use session values or something like that, that way you don't have to create a connection or open a file to do anything.
In the past, I've just written messages out to the page and used flush() to flush the output buffer. Very simple, but it may not work correctly on every web server or with every web browser (as they may do their own internal buffering).
Personally, I like your second option the best. Should be reliable and fairly simple to implement.
I like option 2 - using AJAX to read a status file that PHP writes to periodically. This opens up a lot of different presentation options. If you write a JSON object to the file, you can easily parse it and display things like a progress bar, status messages, etc...
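A minimal sketch of the JSON status file could look like the following. The function names, the field names and the temporary file location are assumptions; the write goes through a temporary file and a rename so that a polling request is unlikely to see a half-written document.

```php
<?php
// Writes the status to a temporary file and renames it into place, so a
// poller does not read a partially written JSON document.
function writeStatus(string $file, int $done, int $total, string $message): void
{
    $status = [
        'percent' => $total > 0 ? (int) floor($done * 100 / $total) : 0,
        'message' => $message,
    ];
    $tmp = $file . '.tmp';
    file_put_contents($tmp, json_encode($status));
    rename($tmp, $file);
}

// What the AJAX endpoint would return; a missing file means "not started".
function readStatus(string $file): array
{
    if (!is_file($file)) {
        return ['percent' => 0, 'message' => 'waiting'];
    }
    return json_decode((string) file_get_contents($file), true);
}

$statusFile = sys_get_temp_dir() . '/job_status_' . uniqid() . '.json';
writeStatus($statusFile, 50, 200, 'Importing records');
$status = readStatus($statusFile);
unlink($statusFile);
echo $status['percent'], '% ', $status['message'], PHP_EOL;
```

The long-running script would call writeStatus() every so often, and the page would request the output of readStatus() on a timer to update a progress bar.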
A 'dirty' but quick-and-easy approach is to just echo out the status as the script runs along. So long as you don't have output buffering on, the browser will render the HTML as it receives it from the server (I know WordPress uses this technique for its auto-upgrade).
But yes, a 'better' approach would be AJAX, though I wouldn't say there's anything wrong with 'breaking it up' using redirects.
Why not incorporate 1 & 2, where AJAX sends a request to script.php?step=1, checks response, writes to the browser, then goes back for more at script.php?step=2 and so on?
If you can do without IE support, then use server-sent events. It's the ideal solution.
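For a sense of what server-sent events involve on the PHP side, here is a small sketch of the wire format. The helper name, the event name "progress" and the step labels are assumptions; the browser side would open the endpoint with EventSource and listen for that event name.

```php
<?php
// Formats one Server-Sent Events message. Each line of the payload gets its
// own "data:" field, and a blank line terminates the event.
function formatSseEvent(string $data, ?string $event = null): string
{
    $out = $event !== null ? "event: $event\n" : '';
    foreach (explode("\n", $data) as $line) {
        $out .= "data: $line\n";
    }
    return $out . "\n";
}

// In the real endpoint one would first send
// header('Content-Type: text/event-stream'); and flush after every echo.
$steps = ['Downloading feeds', 'Parsing XML', 'Done'];
foreach ($steps as $index => $label) {
    $payload = json_encode(['step' => $index + 1, 'label' => $label]);
    echo formatSseEvent($payload, 'progress');
}
```

Note that the connection stays open while the script runs, so the usual execution time limit and output buffering concerns still apply.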
PHP provides a mechanism to register a shutdown function:
register_shutdown_function('shutdown_func');
The problem is that in the recent versions of PHP, this function is still executed DURING the request.
I have a platform (in Zend Framework, if that matters) where any piece of code throughout the request can register an entry to be logged into the database. Rather than have tons of individual insert statements throughout the request, slowing the page down, I queue them up to be inserted at the end of the request. I would like to be able to do this after the HTTP request with the user is complete, so that the time taken to log or do any other cleanup tasks doesn't affect the user's perceived load time of the page.
Is there a built in method in PHP to do this? Or do I need to configure some kind of shared memory space scheme with an external process and signal that process to do the logging?
If you're really concerned about the insert times of MySQL, you're probably addressing the symptoms and not the cause.
For instance, if your PHP/Apache process is executing after the user gets their HTML, your PHP/Apache process is still locked into that request. Since it's busy, if another request comes along, Apache has to fork another thread, dedicate more memory to it, open additional database connections, etc.
If you're running into performance problems, you need to remove heavy lifting from your PHP/Apache execution. If you have a lot of cleanup tasks going on, you're burning precious Apache processes.
I would consider logging your entries to a file and have a crontab load these into your database out of band. If you have other heavy duty tasks, use a queuing/job processing system to do the work out of band.
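The file-based logging suggested above could be sketched like this. The function names, the JSON-lines layout and the temporary file path are assumptions; the second function shows what a cron-driven importer might read before inserting the rows into the database in bulk.

```php
<?php
// Appends one JSON document per line; FILE_APPEND | LOCK_EX keeps concurrent
// requests from interleaving their writes.
function queueLogEntry(string $file, array $entry): void
{
    file_put_contents($file, json_encode($entry) . "\n", FILE_APPEND | LOCK_EX);
}

// What the out-of-band importer would do: read every line back as an array.
function readLogEntries(string $file): array
{
    $entries = [];
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        $entries[] = json_decode($line, true);
    }
    return $entries;
}

$logFile = sys_get_temp_dir() . '/request_log_' . uniqid() . '.jsonl';
queueLogEntry($logFile, ['event' => 'login', 'user' => 7]);
queueLogEntry($logFile, ['event' => 'view', 'user' => 7]);
$entries = readLogEntries($logFile);
unlink($logFile);
echo count($entries), ' entries queued', PHP_EOL;
```

Appending a line to a local file is usually much cheaper than a database round trip, which is the point of moving the inserts out of band.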
Aside from register_shutdown_function() there aren't built in methods for determining when a script has exited. However, the Zend Framework has several hooks for running code at specific points in the dispatch process.
For your requirements the most relevant would be the action controller's post-dispatch hook which takes place just after an action has been dispatched, and the dispatchLoopShutdown event for the controller plugin broker.
You should read the manual to determine which will be a better fit for you.
EDIT: I guess I didn't understand completely. From those hooks you could fork the current process in some form or another.
PHP has several ways to fork processes as you can read in the manual under program execution. I would suggest going over the pcntl extension - read this blog post to see an example of forking child processes to run in the background.
The comments on this random guy's blog sound similar to what you want. If that header trick doesn't work, one of the comments on that blog suggests exec()ing to a separate PHP script to run in the background, if your Web host's configuration allows such a thing.
This might be a little hackish for your taste, but it would be an effective and simple workaround.
You could create a "queue" managed by your DB of choice, and first, store the request in your queue table in the database, then output an iframe that leads to a script that will trigger the instructions that match up with the queue_id in your database.
ex:
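<?php
// Assumes $pdo is an already-connected PDO instance (mysql_* was removed in PHP 7).
$stmt = $pdo->prepare('INSERT INTO queue (instructions) VALUES (?)');
$stmt->execute(['something would go here']);
echo '<iframe src="/yourapp/execute_queue?id=' . (int) $pdo->lastInsertId() . '"></iframe>';
?>
and the frame would do something like
ex:
<?php
// Bind the id as a parameter instead of concatenating $_GET into the SQL.
$stmt = $pdo->prepare('SELECT instructions FROM queue WHERE id = ?');
$stmt->execute([(int) $_GET['id']]);
$instructions = $stmt->fetchColumn();
// From here, simply execute some instruction based on the "instructions" field, then delete the instruction from the database.
?>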
Like I said, I can see how you could consider this hackish, but the frame will load independent of its parent page, so it would achieve what you want without running some other app in the background.
Hope this at least points you in the right direction.
Cheers!