Common technique in cleaning cache with php infinite loop - php

Hi, the question is: when you run an infinite loop in PHP, how do you control memory cleanup?
A rough example is reading results from, or writing updates to, MySQL inside an infinite while loop.
I am looking for any common methods.
Thank you.
PS - all the nemesis and bugs of PHP were replaced by moving to python completely ...

As far as I know, PHP frees memory when a variable goes out of scope. But there are some other problems:
1. Circular references. PHP 5.3 should solve this; it also lets you run the GC when you want.
2. If PHP takes, for example, 5 MB of memory in the first iteration, its process will keep occupying that memory even if later iterations need only 1 MB.
3. You have to free some things manually (for example, the database results mentioned before).
Using a scripting language for a long-running, process-like job is a very bad idea. Try doing it another way:
1. Write a script that processes an amount of data that takes approximately 55-60 seconds to run.
2. Add a cron job to run it every minute.
3. Add some kind of mutual exclusion to the script so cron does not run concurrent copies. You can synchronise on a database table (using SELECT FOR UPDATE).
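The cron-based steps above can be sketched roughly as follows. This is only an illustration under assumptions: it swaps the SELECT FOR UPDATE lock for a file lock via flock() so that it runs without a database, and the lock file name, the helper names acquire_lock() and run_with_budget(), and the placeholder queue are all hypothetical.

```php
<?php
/** Try to take an exclusive, non-blocking lock; returns the handle or null. */
function acquire_lock(string $path)
{
    $handle = fopen($path, 'c');
    if ($handle === false) {
        return null;
    }
    if (!flock($handle, LOCK_EX | LOCK_NB)) {
        fclose($handle);
        return null; // another run still holds the lock
    }
    return $handle;
}

/** Call $step until it returns false or the time budget is spent; returns steps completed. */
function run_with_budget(callable $step, float $budgetSeconds): int
{
    $deadline = microtime(true) + $budgetSeconds;
    $done = 0;
    while (microtime(true) < $deadline) {
        if ($step() === false) {
            break; // no more work
        }
        $done++;
    }
    return $done;
}

// Hypothetical lock file; any path writable by the cron user works.
$lock = acquire_lock(sys_get_temp_dir() . '/batch-worker.lock');
if ($lock === null) {
    echo "previous run still active, skipping\n";
} else {
    $queue = range(1, 5); // stand-in for rows fetched from MySQL
    $processed = run_with_budget(function () use (&$queue) {
        if ($queue === []) {
            return false; // nothing left to do
        }
        array_shift($queue); // "process" one item
        return true;
    }, 55.0);
    flock($lock, LOCK_UN);
    fclose($lock);
    echo "processed $processed items\n";
}
```

Because every run is a fresh process, whatever memory the script accumulated is returned to the system when it exits, which is the main point of the cron approach.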

As of PHP 5.3, you can explicitly trigger a GC cycle with gc_collect_cycles() as documented here.
Before that, it was out of your control, and you'd have to wait for PHP to decide it was time to take out the trash on its own - either by trying to exceed the memory limit with a significant amount of used-but-unattached memory objects or sacrificing a goat under the full moon and hoping for the best.
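As a minimal sketch of triggering the collector from inside a long loop (PHP 5.3 or later; the return type declaration needs PHP 7.1): the Node class and the make_cycle() helper are made up for the illustration, and the interval of 1000 iterations is an arbitrary choice.

```php
<?php
// Hypothetical class whose instances point at each other.
class Node
{
    public $peer;
}

function make_cycle(): void
{
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a; // circular reference: plain reference counting cannot free this pair
}

gc_enable();
for ($i = 1; $i <= 3000; $i++) {
    make_cycle();
    if ($i % 1000 === 0) {
        $freed = gc_collect_cycles(); // returns the number of collected cycles
        echo "iteration $i: collected $freed, using " . memory_get_usage() . " bytes\n";
    }
}
```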

Related

Prevent PHP script using up all resources while it runs?

I have a daily cron job which takes about 5 minutes to run (it does some data gathering and then various database updates). It works fine, but the problem is that, during those 5 minutes, the site is completely unresponsive to any requests, HTTP or otherwise.
It would appear that the cron job script takes up all the resources while it runs. I couldn't find anything in the PHP docs to help me out here - how can I make the script know to only use up, say, 50% of available resources? I'd much rather have it run for 10 minutes and have the site available to users during that time, than have it run for 5 minutes and have user complaints about downtime every single day.
I'm sure I could come up with a way to configure the server itself to make this happen, but I would much prefer if there was a built-in approach in PHP to resolving this issue. Is there?
Alternatively, as plan B, we could redirect all user requests to a static downtime page while the script is running (as opposed to what's happening now, which is the page loading indefinitely or eventually timing out).
A normal script can't hog 100% of the resources; resources get split across processes. It could slow everything down intensely, but not lock up all resources (without doing some funky stuff). You can get a hint by running top -s on the command line to see which process is using a lot.
That leads to the conclusion that something is blocking all other processes. As Arkascha comments, there is a fair chance that your database gets locked. This answer explains which table type you should use; if you do not have it set to InnoDB, you probably want that, at least for the tables that get locked.
It could also be disk I/O. If you write huge files, try to split the work into smaller reads/writes, or try to move some of the information (e.g. files containing lists) into your database (assuming that has room to spare).
It could also be CPU. To fix that, you need to make your code more efficient. Recheck your code, see if you do heavy operations and try to make those smaller. Normally you want code to be as fast as possible; here you want it to be as lightweight as possible, and that changes the way you write it.
If it still locks up, it's time to debug. Turn off a large part of your code and check if the locking still happens. Keep turning code back on until you notice locking, then fix that part. Try to figure out what is costing you so much. Only a few scripts require intense resources, so it is now time to optimize. One option might be splitting it into two (or more) steps: run one cron job that prepares/sanitizes the data, and one that processes the data. These don't have to run synchronously; there might be a few minutes between them.
If that is not an option, benchmark your code and improve as much as you can. If you have a heavy query, it might improve by selecting only ID's in the heavy query and use a second query just to fetch the data. If you can, use your database to filter, sort and manage data, don't do that in PHP.
What I have also implemented once is a sleep every N actions.
If your script really is that extreme, another solution could be moving it to a time when little/no visitors are on your site. Even if you remove the bottleneck, nobody likes a slow website.
And there is always the option of increasing your hardware.
You don't mention which resources are your bottleneck; CPU, memory or disk I/O.
However, if it is CPU or memory, you can do something like this in your script:
http://php.net/manual/en/function.sys-getloadavg.php
http://php.net/manual/en/function.memory-get-usage.php
$yourlimit = 100000000;
$load = sys_getloadavg();
if ($load[0] > 0.80 || memory_get_usage() > $yourlimit) {
    sleep(5);
}
Another thing to try would be to lower your process priority from within the script.
Lowering the priority (a positive increment) does not need superuser rights; only raising it does, so this should be fine for a cron job.
http://php.net/manual/en/function.proc-nice.php
proc_nice(50);
I did a quick test of both and it works like a charm. Thanks for asking; I have a cron job like that as well and will implement it. It looks like proc_nice alone will do fine.
My test code:
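proc_nice(50);
$yourlimit = 100000000;
$x = 0; // initialise the counter before the loop
while (1) {
    $x = $x + 1;
    $load = sys_getloadavg();
    // back off while the 1-minute load or the memory usage is too high
    if ($load[0] > 0.80 || memory_get_usage() > $yourlimit) {
        sleep(5);
    }
    echo $x."\n";
}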
It really depends on your environment.
If you are on a Unix-based system, there are built-in tools to limit the CPU usage or priority of a given process.
You can limit the whole server or PHP alone, which is probably not what you are looking for.
What you can do first is move your task into a separate process.
There is popen for that, but I found it much easier to make the task an executable script. Let's name it hugetask for the example.
#!/usr/bin/php
<?php
// Huge task here
Then to call from the command line (or cron):
nice -n 15 ./hugetask
This lowers the scheduling priority of the task relative to other processes. The system will do the rest.
You can also call it directly from your PHP code:
exec("nice -n 15 ./hugetask &");
Usage: nice [OPTION] [COMMAND [ARG]...] Run COMMAND with an adjusted
niceness, which affects process scheduling. With no COMMAND, print the
current niceness. Niceness values range from
-20 (most favorable to the process) to 19 (least favorable to the process).
To enforce a CPU limit, see the tool cpulimit, which has more options.
That said, I usually just put some usleep() calls in my scripts to slow them down and avoid creating a bottleneck. This is OK if you are using loops in your script. If you slow your task down so that it runs in, say, 30 minutes, there won't be many issues.
See also proc_nice http://php.net/manual/en/function.proc-nice.php
proc_nice() changes the priority of the current process by the amount
specified in increment. A positive increment will lower the priority
of the current process, whereas a negative increment will raise the
priority.
And sys_getloadavg can also help. It returns an array with the system load averages over the last 1, 5, and 15 minutes.
It can be used as a test condition before launching the huge task.
Or you can log the average to find the best time of day to launch the huge task. It can be surprising!
print_r(sys_getloadavg());
http://php.net/manual/en/function.sys-getloadavg.php
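Using the load average as a test condition before launching the task might look like the sketch below. The helper name load_is_acceptable() and the threshold of 0.80 are assumptions for the example; a sensible threshold depends on how many cores the machine has.

```php
<?php
/**
 * Decide whether the 1-minute load average is low enough to start.
 * $maxLoad is a hypothetical threshold; tune it to the number of cores.
 */
function load_is_acceptable(array $loadAvg, float $maxLoad): bool
{
    return $loadAvg[0] < $maxLoad;
}

$load = sys_getloadavg(); // [1 min, 5 min, 15 min], or false on failure
if ($load !== false && load_is_acceptable($load, 0.80)) {
    echo "load {$load[0]} is fine, starting the huge task\n";
} else {
    echo "system is busy, try again later\n";
}
```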
You could try to delay execution using sleep. Just make your script pause between batches of database updates.
sleep(60); // stop execution for 60 seconds
This depends a lot on the kind of processing your script does, so it may or may not be helpful in your case. It is worth a try, so you could:
1. Split your queries
2. Do the updates in steps with a sleep in between
References
Using sleep for cron process
I could not describe it better than the quote in the above answer:
Maybe you're walking the database of 9,000,000 book titles and updating about 10% of them. That process has to run in the middle of the day, but there are so many updates to be done that running your batch program drags the database server down to a crawl for other users.
So modify the batch process to submit, say, 1000 updates, then sleep for 5 seconds to give the database server a chance to finish processing any requests from other users that have backed up.
Sleep and server resources
sleep resources depend on OS
adding sleep to alleviate server resources
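The "submit a batch, then sleep" idea from the quote above can be sketched like this. The function name process_in_batches() and the in-memory list of IDs are stand-ins; a real script would issue UPDATE statements and pause for seconds rather than a millisecond.

```php
<?php
/**
 * Apply $update to every item, calling $pause between batches.
 * Returns the number of pauses taken.
 */
function process_in_batches(array $items, int $batchSize, callable $update, callable $pause): int
{
    $pauses = 0;
    foreach (array_chunk($items, $batchSize) as $i => $batch) {
        if ($i > 0) {
            $pause(); // let other database clients catch up
            $pauses++;
        }
        foreach ($batch as $item) {
            $update($item);
        }
    }
    return $pauses;
}

$ids = range(1, 2500); // stand-in for the rows to update
$updated = 0;
$pauses = process_in_batches(
    $ids,
    1000,
    function ($id) use (&$updated) { $updated++; }, // a real script would run an UPDATE here
    function () { usleep(1000); }                   // the quote suggests sleep(5)
);
echo "updated $updated rows with $pauses pauses\n";
```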
To minimize your memory usage, you should probably process heavy and lengthy operations in batches. If you query the database using an ORM like Doctrine, you can easily use its existing batch-processing functions:
http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/batch-processing.html
It's hard to tell what exactly the issue may be without having a look at your code (cron script). But to confirm that the issue is caused by the cron job you can run the script manually and check website responsiveness. If you notice the site being down when running the cron job then we would have to have a look at your script in order to come up with a solution.
Many loops in your cron script might consume a lot of CPU resources.
To prevent that and reduce CPU usage simply put some delays in your script, for example:
while ($long_time_condition) {
    // Do something here
    usleep(100000);
}
Basically, you are giving the processor some time to do something else.
Also you can use the proc_nice() function to change the process priority. For example proc_nice(20);//very low priority. Look at this question.
If you want to find the bottlenecks in your code you can try to use Xdebug profiler.
Just set it up in your dev environment, start the cron job manually and then profile any page. You can also profile the cron script itself with php -d xdebug.profiler_enable=On script.php; look at this question.
If you suspect that the database is your bottleneck, then import a fairly large dataset (or the entire database) into your local database and repeat the steps, logging and inspecting all the queries.
Alternatively, if possible, set up Xdebug on a staging server that is as close as possible to production and profile the page during cron execution.

PHP script (algorithm) - only loops till a certain iteration after which it stops

I am running an algorithm in PHP which has a lot of data involved. All the processing happens within a nested for loop. Strangely, the outer for loop stops working after 'X' iterations (where 'X' changes every time I run the script). It takes anywhere between 5 and 30 minutes for the script to crash, depending on 'X'. It does not throw any errors, and only produces an incomplete printout of my var_dump (in the first iteration of the outer loop).
These are the precautions I took:
1. I have set the timeout limit in php.ini to be 3600sec (60mins).
2. I am printing out memory_get_usage() after every outer for loop iteration and I have verified that it is much lower than the max memory allocated to PHP.
3. I am unsetting arrays once they are used
4. I reuse variable names to limit memory within the for loop
5. I have minimal calls to my DB
I have been solving this for a long time to no avail. So my question is what can be the cause of this problem/how do I go about debugging it. Thank you so much!
Extra: If i work with a much smaller test data size, everything works fine.
Obviously without code this is just a guess, but are you making sure to use a single connection to your database? If you are reconnecting every time you may get too many connections which could cause an error like this.
This sounds like an issue with utilisation of your server cores and a similar answer/workaround could be found here: Boost Apache2 up to 4 cores usage, running PHP
Try running your datasets in parallel.

How does the garbage collector work in PHP

I have a PHP script that has a large array of people. It grabs their details from an external resource via SOAP, modifies the data and sends it back. Due to the size of the details I upped PHP's memory to 128MB. After about 4 hours of running (it will probably take 4 days to run) it ran out of memory. Here are the basics of what it does:
$people = getPeople();
foreach ($people as $person) {
    $data = get_personal_data();
    if ($data == "blah") {
        importToPerson("blah", $person);
    } else {
        importToPerson("else", $person);
    }
}
After it ran out of memory and crashed I decided to initialise $data before the foreach loop and according to top, memory usage for the process hasn't risen above 7.8% and it's been running for 12 hours.
So my question is, does PHP not run a garbage collector on variables initialised inside the loop even if reused? Is the system reclaiming the memory and PHP hasn't marked it as usable yet and will eventually crash again (I've upped it to 256MB now so I've changed 2 things and not sure which has fixed it, I could probably change my script back to answer this but don't want to wait another 12 hours for it to crash to figure out)?
I'm not using the Zend framework so the other question like this I don't think is relevant.
EDIT: I don't actually have an issue with the script or what it's doing. At the moment, as far as all system reporting is concerned I don't have any issues. This question is about the garbage collector and how / when it reclaims resources in a foreach loop and / or how the system reports on memory usage of a php process.
I don't know the insides of PHP's VM, but from my experience, it doesn't garbage collect whilst your page is running. This is because it throws away everything your page created when it finishes.
Most of the time, when a page runs out of memory and the limit is pretty high (and 128Mb isn't high), there is an algorithm problem. Many PHP programmers assemble a structure of data, then pass it to the next step which iterates over the structure, usually creating another one. Lather, rinse, repeat. Unfortunately, this approach is a big memory hog and you end up creating multiple copies of your data in memory. Two of the really big changes in PHP 5 were that objects are reference counted, not copied, and that the entire string subsystem was made much, much faster. But it's still a problem.
To minimise memory use, you would look at re-structuring your algorithm so it can work with one piece of data from start to finish. Then you get the next and start again. Best case scenario is that you don't ever have the entire dataset in memory. For a database-backed website, this would mean processing a row of data from a database query all the way to presentation before getting the next. Of course, this approach isn't always possible and the script just has to keep a huge wodge of data in memory.
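One way to work on a single piece of data at a time is a generator, sketched below (generators need PHP 5.5 or later, the return type declaration PHP 7). The function read_records() and its fabricated records are assumptions for the example; in a real script the generator body would fetch one row per iteration from the database result.

```php
<?php
/**
 * Yield one record at a time instead of building the whole array.
 * The records are generated here; a real script would fetch rows from a query.
 */
function read_records(int $count): Generator
{
    for ($i = 1; $i <= $count; $i++) {
        yield ['id' => $i, 'name' => "person $i"];
    }
}

$total = 0;
foreach (read_records(100000) as $record) {
    $total += strlen($record['name']); // work on one record, then let it go
}
echo "total name length: $total, peak memory: " . memory_get_peak_usage() . " bytes\n";
```

Only one record exists at any moment, so peak memory stays roughly flat no matter how many records are processed.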
That said, you can do this memory-saving approach for part of the data. The trick is that you explicitly unset() a key variable or two at the end of the loop. This should reclaim the space. The other "best-practice" trick is to shift out of the loop data manipulation that doesn't need to be in the loop. As you seem to have discovered.
I've run PHP scripts that need upwards of 1Gb of memory. You can set the memory limit per script, actually, with ini_set('memory_limit', '1G');
Use memory_get_usage() to see what is going on. You could put it inside the loop to see how memory allocation behaves.
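A small sketch of measuring memory inside the loop, combined with an explicit unset() at the end of each iteration. The 1 MB string is only a stand-in for whatever data each iteration actually loads.

```php
<?php
$baseline = memory_get_usage();
for ($i = 1; $i <= 5; $i++) {
    $data = str_repeat('x', 1024 * 1024); // stand-in for one person's details
    // ... work with $data ...
    unset($data); // release the buffer before the next iteration
    $delta = memory_get_usage() - $baseline;
    echo "iteration $i: $delta bytes above baseline\n";
}
```

If the reported number keeps climbing from one iteration to the next, something is still holding a reference to data from earlier iterations.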
Have you tried looking at the system monitor or whatever to see how much memory php is using during that process?

Why is it so bad to run a PHP script continuously?

I have a map. On this map I want to show live data collected from several tables, some of which have astounding amounts of rows. Needless to say, fetching this information takes a long time. Also, pinging is involved. Depending on servers being offline or far away, the collection of this data could vary from 1 to 10 minutes.
I want the map to be snappy and responsive, so I've decided to add a new table to my database containing only the data the map needs. That means I need a background process to update the information in my new table continuously. Cron jobs are of course a possibility, but I want the refreshing of data to happen as soon as the previous interval has completed. And what if the number of offline IP addresses suddenly spike and the loop takes longer to run than the interval of the Cron job?
My own solution is to create an infinite loop in PHP that runs by the command line. This loop would refresh the data for the map into MySQL as well as record other useful data such as loop time and failed attempts at pings etc, then restart after a short pause (a few seconds).
However - I'm being repeatedly told by people that a PHP script running for ever is BAD. After a while it will hog gigabytes of RAM (and other terrible things)
Partly I'm writing this question to confirm if this is in fact the case, but some tips and tricks on how I would go about writing a clean loop that doesn't leak memory (If that is possible) wouldn't go amiss. Opinions on the matter would also be appreciated.
The reply I feel sheds the most light on the issue I will mark as correct.
The loop should live in one script, which launches the actual script as a separate process, much like cron does.
That way, even if memory leaks and uncollected memory accumulates, it will be freed after each cycle.
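A rough sketch of such a supervising loop, run from the command line. Everything here is illustrative: run_cycle() is a hypothetical helper, the worker is inline code passed with php -r instead of a real script path, and the loop stops after three cycles so the example terminates.

```php
<?php
/**
 * Run one cycle of work in a fresh PHP process and return its exit code.
 * $workerCode is inline code for the sketch; a real setup would pass a script path.
 */
function run_cycle(string $workerCode): int
{
    $cmd = escapeshellarg(PHP_BINARY) . ' -r ' . escapeshellarg($workerCode);
    passthru($cmd, $exitCode); // blocks until the child exits and its memory is released
    return $exitCode;
}

$worker = 'echo "worker pid " . getmypid() . " finished\n";';
for ($cycle = 1; $cycle <= 3; $cycle++) { // a real supervisor would use while (true)
    $code = run_cycle($worker);
    echo "cycle $cycle exited with code $code\n";
    usleep(200000); // short pause before the next cycle
}
```

The supervising script itself does almost nothing, so it has very little opportunity to leak, while each worker starts with a clean heap.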
However - I'm being repeatedly told by people that a PHP script running for ever is BAD. After a while it will hog gigabytes of RAM (and other terrible things)
This used to be very true. Previous versions of PHP had horrible garbage collection, so long-running scripts could easily accidentally consume far more memory than they were actually using. PHP 5.3 introduced a new garbage collector that can understand and clean up circular references, the number one cause of "memory leaks." It's enabled by default. Check out that link for more info and pretty graphs.
As long as your code takes steps to allow variables to go out of scope at proper times and otherwise unset variables that will no longer be used, your script should not consume unnecessary amounts of memory just because it's PHP.
I don't think it's bad; as with anything that you want to run continuously, you just have to be more careful.
There are libraries out there to help you with the task. Have a look at System_Daemon, which released RC 1 just over a month ago and allows you to "Set options like max RAM usage".
Rather than running an infinite loop I'd be tempted to go with the cron option you mention in conjunction with a database table entry or flat-file that you'd use to store a "currently active" status bit to ensure that you didn't have overlapping processes attempting to run at the same time.
Whilst I realise that this would mean a minor delay before you perform the next iteration, this is probably a better idea anyway as:
It'll let the RDBMS perform any pending low-priority updates, etc. that may well have been on hold due to the amount of activity that you've been carrying out.
Even if you neatly unset all the temporary variables you've been using, it's still possible that PHP will "leak" memory, although recent improvements (5.2 introduced a new memory management system and garbage collection was overhauled in 5.3) should hopefully mean that this is less of an issue.
In general, it'll also be easier to deal with other issues (if the DB connection temporarily goes down due to a config change and restart for example) if you use the cron approach, although in an ideal world you'd cater for such eventualities in your code anyway. (That said, the last time I checked, this was far from an ideal world.)
First I fail to see how you need a daemon script in order to provide the functionality you describe.
Cron jobs are of course a possibility, but I want the refreshing of data to happen as soon as the previous interval has completed
Neither a cron job nor a daemon is the way to solve the problem (unless the daemon becomes the data sink for the scripts). I'd spawn a dissociated process when the data is available, using a locking strategy to avoid concurrency.
Long-running PHP scripts are not intrinsically bad. The reference-counting garbage collector does not deal with all possible scenarios for cleaning up memory, but more recent implementations have a more advanced collector (a circular reference checker) which should clean up a lot more.

PHP and CPU - Process of chat + notifications

My site has a PHP process running, for each window/tab open, that runs in a maximum of 1 minute, and it returns notifications/chat messages/people online or offline. When JavaScript gets the output, it calls the same PHP process again and so on.
This is like Facebook chat.
But, seems it is taking too much CPU when it is running. Have you something in mind how Facebook handles this problem? What do they do so their processes don't take too much CPU and put their servers down?
My process has a "while(true)", with a "sleep(1)" at the end. Inside the cycle, it checks for notifications, checks if one of the current online people got offline/changed status, reads unread messages, etc.
Let me know if you need more info about how my process works.
Does calling other PHPs from "system()" (and wait for its output) alleviate this?
I ask this because it makes other processes to check notifications, and flushes when finished, while the main PHP is just collecting the results.
Thank you.
I think your main problem here is the parallelism. Apache and PHP do not excel at tasks like this, where 100+ users each have an open HTTP request.
If in your while(true) you spend 0.1 second on CPU-bound workload (checking changed status or other useful things) and 1 second on the sleep, this would result in a CPU load of 100% as soon as you have 10 users online in the chat. So in order to serve more users with THIS model of a chat, you would have to optimize the workload in your while(true) cycle and/or raise the sleep interval from 1 second to 3 or higher.
I had the same problem in an HTTP-based chat system I wrote many years ago, where at some point too many parallel MySQL selects were slowing down the chat and creating heavy load on the system.
What I did was implement a fast "ring buffer" for messages and status information in shared memory (sysv back in the day; today I would probably use APC or memcached). All operations write and read in the buffer, and the buffer itself gets periodically "flushed" into the database to persist it (but a lot less often than once per second per user). If no persistence is needed you can of course omit the backend.
I was able to increase the number of users I could serve by roughly 500% that way.
BUT as soon as you have solved this issue you will be faced with another: available system memory (100+ Apache processes at ~5MB each, fun) and process context-switching overhead. The more active processes you have, the more time your operating system will spend on the overhead involved in assigning "fair enough" CPU slots, AFAIK.
You'll see it is very hard to scale efficiently with Apache and PHP alone for your use case. There are open source tools, client-based and server-based, to help though. One I remember places a server in front of Apache and queues messages internally while having very efficient multi-socket communication with JavaScript clients, making real "push" events possible. Unfortunately I do not remember any names, so you'll have to research or hope for the stackoverflow community to bring in what my brain has already discarded ;)
Edit:
Hi Nuno,
the comment field has too few characters so I reply here.
Let's get to the 10 users in parallel again:
10*0.1 second CPU time per cycle (assumed) is roughly 1 s of combined CPU time over a period of 1.1 seconds (1 second sleep + 0.1 second execution). This is 1 / 1.1, which I would boldly round to 100% CPU utilization even though it is "only" 90.9%.
If the 10*0.1 s of CPU time is "stretched" over a period of not 1.1 seconds but 3.1 (3 seconds sleep + 0.1 seconds execution), the calculation is 1 / 3.1 = 32%.
And it is logical. If your checking cycle queries your backend three times less often, you have only a third of the load on your system.
Regarding the shared memory: the name might imply that everything is shared, but if you use good IDs for your cache areas, like one ID per conversation or user, you will have private areas within the shared memory. Database tables also rely on you providing good IDs to separate private data from public information, so those should be around already :)
I would also not "split" any more. The fewer PHP processes you have to "juggle" in parallel, the easier it is for your systems and for you, unless you see that it absolutely makes sense because one type of notification takes a lot more querying resources than another and you want different refresh times or something like that. But even this can be decided in the while cycle. A user's "away" status could be checked every 30 seconds while the messages he might have written could get checked every 3. There is no reason to create more cycles, just different counter variables or the right divisor in a modulo operation.
The inventor of PHP said that he believes man is too limited to control parallel processes :)
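The modulo idea can be sketched as follows, assuming one loop tick per second. The function name checks_due() and the check labels are invented for the illustration; the intervals of 3 and 30 are the ones mentioned above.

```php
<?php
/**
 * Return the checks that are due on a given tick of the loop.
 * Messages are checked every 3 ticks, away status every 30.
 */
function checks_due(int $tick): array
{
    $due = [];
    if ($tick % 3 === 0) {
        $due[] = 'messages';
    }
    if ($tick % 30 === 0) {
        $due[] = 'away_status';
    }
    return $due;
}

for ($tick = 1; $tick <= 30; $tick++) {
    foreach (checks_due($tick) as $check) {
        echo "tick $tick: run $check check\n";
    }
    // sleep(1); // one tick per second in the real loop
}
```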
Edit 2
OK, let's build a formula. We have these variables:
duration of execution (e)
duration of sleep (s)
duration of one cycle (c)
number of concurrent users (u)
CPU load (l)
c = e + s
l = u*e / c   # expresses "how often" the available time slot c fits into the CPU time generated by u CONCURRENT users
l = u*e / (e + s)
For 30 users, ASSUMING that you have 0.1 s execution time and 1 second sleep:
l = 30*0.1 / (0.1 + 1)
l = 2.73
l = 273% CPU utilization (aka you need 3 cores :P)
Exceeding the capacity of your CPU means that cycles will run longer than you intend. The overall response time will increase (and the CPU runs hot).
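The formula is easy to turn into a small helper for trying out other numbers. The function name estimated_load() is made up; the three calls reproduce the cases worked through in this answer.

```php
<?php
/** Estimated CPU load l = u * e / (e + s), where 1.0 means one fully busy core. */
function estimated_load(int $users, float $execSeconds, float $sleepSeconds): float
{
    return $users * $execSeconds / ($execSeconds + $sleepSeconds);
}

printf("10 users, 1 s sleep: %.2f\n", estimated_load(10, 0.1, 1.0)); // 0.91
printf("10 users, 3 s sleep: %.2f\n", estimated_load(10, 0.1, 3.0)); // 0.32
printf("30 users, 1 s sleep: %.2f\n", estimated_load(30, 0.1, 1.0)); // 2.73
```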
PHP blocks on all sleep() and system() calls. What you really need is to research pcntl_fork(). Fortunately, I had these problems over a decade ago and you can look at most of my code.
I had the need for a PHP application that could connect to multiple IRC servers, sit in unlimited IRC chatrooms, moderate, interact with, and receive commands from people. All this and more was done in a process efficient way.
You can check out the entire project at http://sourceforge.net/projects/phpegg/ The code you want is in source/connect.inc.
