I have a few time-consuming and (potentially) memory-intensive functions in my LAMP web application. Most of these functions will be executed every minute via cron (in some cases, the cron job will execute multiple instances of these functions).
Since memory is finite, I don't want to run into issues where I am trying to execute a function the environment can no longer handle. What is a good approach for dealing with potential memory problems?
I'm guessing that I need to determine how much memory is available to me, how much memory each function requires before executing it, determine what other functions are being executed by the cron AND their memory usage, etc.
Also, I don't want to run into the issue where a certain function somehow gets execution priority over other functions. If any priority is given, I'd like to have control over that somehow.
You could look into caching technologies like APC, which lets you write data straight into RAM so you can access it quickly; that's useful if you don't want to perform expensive tasks like MySQL queries repeatedly.
An example of caching I can think of: you could cache emails rather than retrieving them again and again from the email server. Basically, RAM caching is a very useful technique if your script has things it wants to preserve for the next execution, but if your script does unique things every time it runs it would be useless. Also, as for control, you could call memory_get_usage() on each script execution and write that value into the APC cache, so that every cron job can retrieve it and check whether enough memory is free for it to complete.
As for average usage, you could keep an array of, say, the last 100 executions of a function; when you call that function again it could apc_fetch that array from RAM, calculate the average memory usage for that function, compare it to how much RAM is in use right now, and then decide whether to start. Furthermore, it could add that estimate to the current-memory-usage value to prevent other scripts from being started, and at the end of the function subtract that amount again.
tl;dr:
look into the apc_fetch, apc_store and memory_get_usage functions
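A very rough sketch of that idea, assuming the APC extension is installed; the cache key, the 16 MB default guess, and the memory_limit parsing are all illustrative assumptions, not a tested recipe:
<?php
$key = 'myjob_memory_history';                        // hypothetical cache key
$history = apc_fetch($key) ?: array();

// Estimate this run's need from the average of previous runs (default guess: 16 MB)
$estimate = $history ? array_sum($history) / count($history) : 16 * 1024 * 1024;

// Very crude limit parsing: assumes php.ini uses an "M" value like "128M"
$limit = (int) ini_get('memory_limit') * 1024 * 1024;

if (memory_get_usage() + $estimate > $limit) {
    exit("Not enough memory headroom, skipping this run\n");
}

// ... do the actual work here ...

// Record what this run actually used and keep only the last 100 samples
$history[] = memory_get_peak_usage();
apc_store($key, array_slice($history, -100));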
Part of your problem may be the fact that you are running a cron job every minute. Why not set a flag so that only one instance of the job runs its full logic at a time? I.e. create a flat file that's deleted at the end of the cron run to act as a 'lock'. This will make sure one cron process fully completes before any others go forward. However, I urge you to refer to my comment on your post so that I and others can give you more solid advice.
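A minimal sketch of that lock-file idea (the path is arbitrary):
<?php
$lock = '/tmp/my_cron_job.lock';

if (file_exists($lock)) {
    exit("Previous run still in progress, skipping this minute\n");
}

file_put_contents($lock, getmypid());

// ... run the actual cron logic here ...

unlink($lock);   // release the lock so the next run can proceed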
Try optimizing your algorithms. Like...
Once you're finished with a variable and no longer need it, destroy it (a short sketch follows this list).
Close MySQL connections after you've finished with them.
Use recursion.
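Here's a tiny illustration of the first two points; the connection details and query are placeholders:
<?php
$db = mysqli_connect('localhost', 'user', 'pass', 'mydb');   // placeholder credentials

$rows = array();
$result = mysqli_query($db, 'SELECT id, email FROM users');  // placeholder query
while ($row = mysqli_fetch_assoc($result)) {
    $rows[] = $row;
}

mysqli_free_result($result);   // release the result set
mysqli_close($db);             // close the connection as soon as you're done with it

// ... process $rows ...

unset($rows);                  // destroy large variables once you no longer need them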
Also, as Jauzsika said, change your memory limit in php.ini, although don't make it too high. If you need more than 256 MB of RAM then I would suggest changing to a different language instead of PHP.
In your position, I'd consider writing a daemon instead of relying on cron. The daemon could monitor a queue and be aware of the number of child processes it has running. Managing multiple processes definitely isn't php's biggest strength, but you can do it. Pear even includes a System_Daemon package.
Your daemon could use memory_get_usage and call out to free, uptime, and friends to throttle the number of workers to match system conditions.
I don't have any direct experience with this, and I wouldn't be too surprised if a daemon written in PHP gradually leaked memory. But if the leak is acceptably slow, cron could cycle the daemon every so often...
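As a rough, untested sketch of that throttling logic, with made-up thresholds and the assumption that free(1) is available on the host:
<?php
// Decide whether the daemon should spawn another worker right now.
function canSpawnWorker($currentWorkers, $maxWorkers = 4)
{
    if ($currentWorkers >= $maxWorkers) {
        return false;
    }

    // Free memory in MB, parsed from "free -m" (assumes a Linux host)
    $freeMb = 0;
    if (preg_match('/^Mem:\s+\d+\s+\d+\s+(\d+)/m', shell_exec('free -m'), $m)) {
        $freeMb = (int) $m[1];
    }

    // 1-minute load average, the same number "uptime" reports
    $load = sys_getloadavg();

    return $freeMb > 128 && $load[0] < 2.0;   // arbitrary thresholds
}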
You can find out how much memory is currently in use by your script using memory_get_usage.
But you cannot determine how much memory your next function will need before executing it; you can only see that after executing, using memory_get_usage. You can, however, store the memory your function used on previous runs in a database and work with the average amount.
Regarding execution priority, I don't think it is possible to determine that with PHP. Apache (or whatever web server you are using) spawns multiple processes, and the operating system schedules which one is executed in which order.
I am writing a daemon in PHP. I did not take an OS class in college. So, I'm wondering, what are the server and other statistics I need to be looking at to make sure my daemon is not consuming too many system resources and will be able to scale when there are more MySQL records? Basically, my daemon is processing a bunch of MySQL table rows.
For example, I understand I need to see how long the daemon is taking to process a certain number of rows, and the amount of memory it is using. But, how do I determine if it is leaking memory? Also, what other system parameters should I be judging the daemon by?
But, how do I determine if it is leaking memory?
The stuff you're asking about here has little to do with the operating system. You're right to be concerned about memory usage. A proper answer to this question goes way beyond the scope of a post here but you might want to start by looking at how reference counting works for memory management, and make sure you've got the circular reference checker configured in your PHP installation. The plot thickens when you discover that the mysql client blocks PHP while it is running and ignores PHP's memory limits - so if you fetch too large a result set, you won't know about it until mysql_query returns and your code falls over: always use LIMIT in queries (or PK selection) and for preference run the daemon under a watchdog. Test using varying memory limits lower than you intend to use in production.
Note that PHP will only start making more memory available to itself via garbage collection when it thinks it's running out of memory.
Write lots of stuff to log files!
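One simple, concrete way to apply that advice is to log memory_get_usage() after every batch and watch the trend across a long run; the log path and batch loop below are just examples:
<?php
$log = fopen('/tmp/daemon_memory.log', 'a');

for ($batch = 1; $batch <= 100; $batch++) {
    // ... process the next chunk of MySQL rows here ...

    fwrite($log, sprintf(
        "%s batch=%d mem=%d peak=%d\n",
        date('c'), $batch, memory_get_usage(true), memory_get_peak_usage(true)
    ));
}

fclose($log);
// If "mem" keeps climbing instead of levelling off, something is holding references.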
Depending on how you are going to execute the daemon, fire up top on Linux and then have the daemon process a lot of rows (100k+, or something that would take about 30 seconds to execute) of the kind of work you anticipate. Watch how fast memory usage increases: with small tasks it's over too quickly to see anything, so you need the long-running process.
Then be sure that you unset($objectOrString), close all files and connections to the database as soon as you are done using them: this will help.
Again, depending on what this script will be doing, you may want to let it terminate and use a cron job to start it up again so that PHP can run its garbage collection for you.
I have a PHP script running on my server via a cron job. The job runs every minute. In the PHP script I have a loop that executes, then waits one second and loops again, essentially creating a script that runs once every second.
Now I'm wondering: if I make the cron job run only once per hour and have the script keep looping for an entire hour, or possibly an entire day, would this have any impact on the server's CPU and/or memory, and if so, would it be positive or negative?
I spot a design flaw.
You can always have a PHP script permanently running in a loop performing whatever functionality you require, without dependency upon a webserver or clients.
You are obviously checking something with this script; any insight into what? There may be better solutions for you. For example, if it is a database, consider SQL triggers.
In my opinion it would have a negative impact, since the script keeps using resources.
cron is called on a time-based schedule that is already running on the server anyway.
But a cron job can only run once a minute at most.
Another thing: if the script times out, fails, or crashes for whatever reason, you could end up not running it for up to an hour. That would have a positive impact on server load, but it's not what you're looking for, I guess? :)
Maybe run it every 2 or even 5 minutes to spare server load?
Or maybe change the script so it does not wait but just executes once, and call it from the cron job. That should have a positive impact on server load.
I think you should change script logic if it is possible.
If the tasks your script executes are not periodic but are triggered by some event, then you can use a message queue (like Gearman).
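If you go the Gearman route, a worker looks roughly like this, assuming the pecl/gearman extension; the job name 'process_user' is made up:
<?php
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);   // default Gearman port

// Register a callback for jobs named "process_user" (the name is an example)
$worker->addFunction('process_user', function (GearmanJob $job) {
    $userId = $job->workload();
    // ... do the actual per-user work here ...
    return 'done';
});

// Block and handle jobs as they arrive instead of polling on a cron schedule
while ($worker->work()) {
    // keep going; restart via monit/supervisor if the process dies
}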
Otherwise your solution is OK. Memory leaks can occur, but in newer PHP versions (5.3.x) the garbage collector is pretty good. Some extensions can lead to memory leaks, or your application design can lead to hungry memory usage (like Doctrine ORM's loaded-objects cache).
But you can control the script's memory usage with tools like monit: restart your script when memory usage reaches some limit, or start it again when it unexpectedly shuts down.
I am looking for the PHP equivalent for VB doevents.
I have written a realtime analysis package in VB and used doevents to release to the operating system.
Doevents allows me to stay in memory and run continuously without filling up memory and allows me to respond to user input.
I have rewritten the package in PHP and I am looking for that same doevents feature.
If it doesn't exist I could reschedule myself and exit.
But I currently don't know how to do that and I think that would add a lot more overhead.
Thank you, gerardg
usleep is what you are looking for. It delays program execution for the given number of microseconds.
http://php.net/manual/en/function.usleep.php
It's been almost 10 years since I last wrote anything in VB, and as I recall, the DoEvents() function allowed the application to yield control to the operating system during intensive processing (usually to allow other system events to fire, the most common being WM_PAINT, so that your UI won't appear hung).
I don't think PHP has such functionality - your script will run as a single process and end (either when it's done or when it hits the default 30 second timeout).
If you are thinking in terms of threads (as most Windows programmers tend to do) and needing to spawn more than one instance of your script, perhaps you should take a look at PHP's Process Control functions as a start.
I'm not entirely sure which aspects of doevents you're looking to emulate, so here's pretty much everything that could be useful for you.
You can use ob_implicit_flush(true) at the top of your script to enable implicit output buffer flushing. That means that whenever your script calls echo or print or whatever you use to display stuff, PHP will automatically send it all to the user's browser. You could also just use ob_flush() after each call to display something, which acts more like Application.DoEvents() in VB with regards to keeping your UI active, but must be called each time something is output.
Naturally if your script uses the output buffer already, you could build a copy of the buffer before flushing, with ob_get_contents().
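For instance, a long task that reports progress as it goes could look like this sketch; the loop body is a placeholder:
<?php
ob_implicit_flush(true);        // flush automatically after every piece of output
while (ob_get_level() > 0) {    // drop any buffers already opened by the framework/ini
    ob_end_flush();
}

for ($i = 1; $i <= 10; $i++) {
    // ... one chunk of the long-running work ...
    echo "Finished step $i of 10<br>\n";   // reaches the browser right away
    sleep(1);                              // stand-in for real work
}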
If you need to allow the script to run for more time than usual, you can set a longer timeout with set_time_limit($time). If you need more memory, and you have access to edit your .htaccess file, add the following line and adjust the value:
php_value memory_limit 64M
That sets the memory limit to 64 megabytes.
For running multiple scripts at once, you can use pcntl_exec to start another one running.
If I am missing something important about DoEvents(), let me know and I will try to help you make it work.
PHP is designed for asynchronous on demand processing. However it can be forced to become a background task with a little hackery.
As PHP runs as a single thread, you do not have to worry about letting the CPU do other things; that is already taken care of. If this were not the case, a web server would only be able to serve up one page at a time and all other requests would have to sit in a queue. You will need to write some sort of loop that never exits until some detectable condition happens (like the "now please exit" message you set in the DB or something).
As pointed out by others, you will need to set_time_limit($something); perhaps with usleep stopping the code from running "too fast" if it eats too much CPU each loop. However, if you are also using a database connection, most of your script's time is actually spent waiting for the database (by far the biggest overhead for a script).
I have seen PHP worker processes created by using screen and detaching it as a background task. Other approaches also work, so long as you do not have a session that will time out or exit (say, when the web browser is closed). A cron job that checks every x minutes or hours whether the script is still running, and restarts it if not, gives you automatic recovery from forced exits and/or system restarts.
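A sketch of that "loop until told to exit" idea, assuming a PDO connection and a made-up control table with a stop_requested flag:
<?php
set_time_limit(0);   // let the CLI script run indefinitely

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');   // placeholder DSN

while (true) {
    $stop = $pdo->query('SELECT stop_requested FROM control LIMIT 1')->fetchColumn();
    if ($stop) {
        break;        // the "now please exit" message mentioned above
    }

    // ... do one unit of background work ...

    usleep(250000);   // 250 ms pause so an idle loop doesn't spin the CPU
}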
TL;DR: doevents is "baked in" to PHP and you don't have to worry about it.
I have a few questions about PHP memory usage. I'm going to run some tests on my own, but getting various advice is quite helpful.
I recently learned about the PHP function ignore_user_abort(), which allows a script to continue running even if a user closes the page. I was thinking about using this for my e-mail newsletter tool instead of cron jobs, as configuring cron jobs has various pitfalls. The alternative approaches of making a user stay on the page, using AJAX requests, or running part of the script after the page content has been delivered all have issues as well.
My solution would be to call ignore_user_abort(true) at the beginning of the script and, at the end after the content has been generated, call flush() for good measure, and then run the newsletter script. Alternatively, do this with an AJAX request.
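As a rough sketch of that flow (sendNewsletterBatch() is a made-up helper standing in for the real sending code):
<?php
ignore_user_abort(true);   // keep running even if the visitor closes the page
set_time_limit(0);         // no execution time limit for this request

// ... generate and echo the normal page content here ...

// Push the finished page out to the browser so the user isn't left waiting
while (ob_get_level() > 0) {
    ob_end_flush();
}
flush();

// Now do the slow work after the response has been delivered
sendNewsletterBatch(50);   // hypothetical helper: send the next 50 queued emails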
First of all, does anyone see issues with that approach?
Second of all, if I used the script with no time limit set, and a while loop going through each email, what would the memory usage be like if I did it in one go? Since I'd be overwriting variables, not using new ones, I'd think it would be low.
Third, if I am sending a large volume of emails, say 1000 per run, I don't want to overload my mail server. With my cron job, I run the script every 5 minutes, sending a batch of 50 emails each time. If I was doing this in a single pass, could I send out 50 emails, sleep for say 5 minutes, and then continue with another 50? If so, what is the script's memory usage like during the sleep period? Would this be an efficient method?
What I'm really trying to do here is come up with a way to create a newsletter tool that doesn't require the complex (for non-technical folks) task of setting up a Cron job (Which isn't even an option on shared hosts), and doesn't require the user to keep their browser open on a single page.
Any ideas, suggestions, or feedback are welcome. Thanks!
At a former job we wrote a daemon for a critical function in PHP; not exactly what you describe, but similar enough, certainly with loops and sleeps. We were very doubtful about its long-term stability, especially its memory management, so we subjected it to pretty tough stress testing. The results were excellent, and the code was put into production and ran flawlessly for months if not years.
Caveats:
IIRC, PHP has a counter-based garbage collector. This means that, unlike in Java, two objects referencing each other will stay in memory even if not accessible by your program. You need to be careful about this when you 'abandon' your objects.
Web servers often have mechanisms to kill long-running scripts. This may defeat your purpose here, especially if the server's configuration can't be tuned.
I have a PHP script run as a cron job that executes a set of simple tasks, loops over each user in the database, and takes about 30 minutes to complete. This process starts over every hour and needs to be as fast and efficient as possible. The problem I'm having is that, like with any server script, execution time varies and I need to figure out the best cron time settings.
If I run cron every minute, I need to stop the last loop of the script 20 seconds before the end of the minute to make sure that the current loop finishes in time. Over the course of the hour this adds up to a lot of wasted time.
I'm wondering if it's a bad idea to simply remove the PHP execution time limit, run the script once an hour and let it run to completion... is this a bad idea?
Instead of setting the max_execution_time you could also use set_time_limit() to reset the counter on every loop. This will ensure your script never runs out of time unless something serious hangs within the current loop (and takes longer than the max_execution_time).
Basically this should make your script run as long as it needs, while giving it a 30-second timeout between two set_time_limit() calls.
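In code, the idea is simply this; getUsers() and processUser() are placeholders for your own loop:
<?php
foreach (getUsers() as $user) {   // placeholder: iterate over every user
    set_time_limit(30);           // reset the clock: 30 more seconds for this user
    processUser($user);           // placeholder: the actual per-user work
}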
Assuming you'd like the work done ASAP, don't use cron. Cron is good for things that need to happen at specific times. It's often abused to simulate a background process that would ideally process work as soon as work appears. You should probably write a daemon that runs continuously. (Note: you could also look at a message/work-queue type system, there are nice libraries out there to do this too)
You can write a daemon from scratch using the pcntl functions (since you don't care about multiple worker processes, it's super-easy to get a process running in the background), or cheat and just make a script that runs forever and run it via screen, or leverage some solid library code like PEAR's System_Daemon or nanoserv.
Once the daemonization stuff is taken care of, all you really care about is having a loop that runs forever. You'll want to take care that your script doesn't leak memory, or consume too many resources.
Generally, you can do something like:
<?php
// some setup code

while (true) {
    // the author's placeholder: fetch an array of pending work items
    $todo = figureOutIfIHaveWorkToDo();

    foreach ($todo as $something) {
        // do stuff with $something
        // remember to clean up resources (unset(), close handles) so you don't leak memory!
        usleep(100000);    // e.g. 100 ms between items; tune to your workload
    }

    usleep(1000000);       // e.g. 1 s between polls when there's nothing to do
}
And it'll work pretty well.
Setting the time limit to 0 and letting it do its thing is fairly typical of PHP based cronjobs (in my experience), but this is also the point when you should ask yourself a few important questions, such as "Should I rewrite this job in a compiled language?" and "Am I using all of my tools (database, etc) to their maximum efficiency?"
That said, maybe better than completely removing the time limit would be to set it to the upper limit you actually want. If that means 48 minutes, then set_time_limit(48 * 60);
I really think you shouldn't set the timeout to 0; that is just looking for trouble. At most, set it to 59*60 seconds. Setting it to 0 might cause security problems: if a script hangs, it will hang almost forever until the server host stops the execution. It is considered bad practice to do so.
I have used the php command-line interface for similar long running tasks in the past. You probably do not want to remove the execution time limit for any request.
Sounds like a great idea if there's little chance that it will take more than an hour. Note, however, that the wrong bug can be a really good way of making it take longer than expected.
To avoid all sorts of nasty problems, you should have a guard file with the process ID of the script. On startup, you should check to make sure the file doesn't exist, or if it does that the process ID in the file doesn't exist (through a kill( pid, 0 ) call). If these conditions are met, create a new file with the script's PID and delete the file when you're done.
This is the same trick that many daemons use to ensure it isn't already running. If the daemon was killed suddenly, the file will still exist but the PID of the process therein is unlikely to be running.
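A sketch of that guard-file check, assuming the posix extension is available; the file path is arbitrary:
<?php
$pidFile = '/tmp/hourly_job.pid';

if (file_exists($pidFile)) {
    $oldPid = (int) file_get_contents($pidFile);
    // Signal 0 doesn't kill anything; it only tests whether the PID still exists
    if ($oldPid > 0 && posix_kill($oldPid, 0)) {
        exit("Previous run (pid $oldPid) is still active\n");
    }
    // Stale file from a crashed run: fall through and overwrite it
}

file_put_contents($pidFile, getmypid());

// ... do the long-running work ...

unlink($pidFile);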
Depending on what your script does, removing the time limit can lead to problems. If, for example, you are polling an external server that becomes unresponsive while the job is running, and your cron job takes 2 hours instead of 30 minutes to complete, you may get a stack of PHP processes being fired up even though the previous ones haven't completed yet. This can cause system instability and crashes.
You probably have two options:
Make sure that no other instance of your script is running beforehand, otherwise exit() on start.
Consider changing your cronjob into a daemon.
Does it have to run hourly like clockwork?
If not, split the job (you mentioned it was more than one simple task) and do each task every hour?
Or split it per user: do A-M one hour, then N-Z the next?