My platform is PHP 5.2, Apache, Magento EE 1.9 and CentOS.
I have a pretty basic script which fetches about 60,000 rows of data from an MS-SQL database using PHP's mssql_*() functions. The data is then processed a bit using data from Magento and finally written to a text file.
Really simple stuff...
$result = mssql_query($query);

while ($row = mssql_fetch_assoc($result)) {
    $member = $row; // Copied so I can modify it

    // Do some stuff with each row... e.g.:
    $customer = Mage::getModel("customer/customer");
    $customer->loadByEmail($member["email"]);
    $customerId = $customer->getId();
    // Some more stuff like that...
    $ordersCollection = Mage::getResourceModel('sales/order_collection');
    // ...........
    // Some more stuff like that...
    $wishList = Mage::getModel('wishlist/wishlist')->loadByCustomer($customer);
    // ...........

    // Write straight to a file
    fwrite($fp, implode("\t", $member) . "\r\n");

    // Probably not even necessary
    unset($member);
}
The problem is, the memory usage of my script increases with each iteration of the loop (about 10MB for every 300 rows), with a theoretical peak of about 2GB (though it hasn't got there yet).
I've taken great pains to ensure that I'm not leaving any data in memory. No huge arrays are building up, no variables are being added to, everything is either unset() or directly overwritten with each iteration of the loop.
So my question is: could the Magento functions be causing memory leaks?
And if so, how do I stop them from doing so?
Ideally this script should be totally "passive": just grab the query results, modify them a bit (very temporary memory needed for this) then dump them straight to a file and destroy the memory. But this is not happening!
Thanks
Exclude all Mage:: calls from your code and just dump the data to the file without processing, and watch what happens to memory while doing that. Then start adding the Mage:: calls back one by one and see when it breaks.
This way you'll find the culprit. Then you need to start digging into its implementation and see what could go wrong. You could also consider doing the processing without relying on your Mage:: calls: just write plain code to deal with the data in self-contained functions/classes and compare how things turn out when you exclude Mage:: entirely from the process.
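If it helps, here is a minimal sketch of that kind of instrumentation, assuming nothing beyond the original query/loop and PHP's built-in memory_get_usage(); the batch size of 300 simply mirrors the reported growth rate:

$result = mssql_query($query);
$i = 0;
while ($row = mssql_fetch_assoc($result)) {
    // ... re-enable one Mage:: call at a time in here ...
    fwrite($fp, implode("\t", $row) . "\r\n");

    if (++$i % 300 === 0) {
        // log memory every 300 rows, since ~10MB per 300 rows was observed
        error_log(sprintf("rows=%d mem=%.1f MB", $i, memory_get_usage(true) / 1048576));
    }
}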
Yes, PHP has a long history of non-ideal behavior when it comes to memory management and code that pushes the edges of its object-oriented model.
You can try an alternate method of querying for your data that wastes less memory, or you can read up on how the Magento core team deals with this same issue.
Related
I have a daily cron job which fetches an XML file from a web service. Sometimes it is large: it can contain information for more than 10K products and be around 14 MB.
What I need to do is parse the XML into objects and then process them. The processing is quite complicated. It's not a matter of putting them straight into the database; I need to perform a lot of operations on them and finally write them into many database tables.
It is all in one PHP script. I don't have any experience dealing with large data sets.
The problem is that it takes a lot of memory and a very long time. I raised my localhost PHP memory_limit to 4G and it ran for 3.5 hours before finishing successfully, but my production host does not allow that much memory.
I have done some research but I am quite confused about the right way to deal with this situation.
Here is a sample of my code:
function my_items_import($xml){
    $results = new SimpleXMLElement($xml);
    $results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
    // it will loop over 10K items
    foreach ($results->xpath('//i:Item') as $data) {
        $data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
        // my processing code here; it calls other functions to do a lot of things
        processing($data);
    }
    unset($results);
}
As a start, don't use SimpleXMLElement on the whole document. SimpleXMLElement loads everything into memory and is not efficient for large data. Here is a snippet from real code; you'll need to adapt it to your case, but hopefully you'll get the general idea.
$reader = new XMLReader();
$reader->xml($xml);

// Get cursor to first article
while ($reader->read() && $reader->name !== 'article');

// Iterate articles
while ($reader->name === 'article') {
    $doc = new DOMDocument('1.0', 'UTF-8');
    $article = simplexml_import_dom($doc->importNode($reader->expand(), true));
    processing($article);
    $reader->next('article');
}

$reader->close();
$article is a SimpleXMLElement which can be processed further.
This way you save a lot of memory because only a single article node is in memory at a time.
Additionally, if each processing() call takes a long time, you can turn it into a background process which runs separately from the main script, so that several processing() calls can run in parallel.
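A rough sketch of that background-process idea, assuming a hypothetical worker script (process_item.php) that takes an item id; throttling how many workers run at once is left out:

foreach ($itemIds as $id) {
    // '&' detaches the worker so the cron script doesn't wait for it,
    // and each worker's memory is freed when its process exits
    exec('php /path/to/process_item.php ' . escapeshellarg($id) . ' > /dev/null 2>&1 &');
}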
Key hints:
Dispose of data during processing.
Disposing of data means overwriting it with blank data; by the way, unset() is slower than overwriting with null (see the sketch after this list).
Use functions or static methods, and avoid creating more object instances than necessary.
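For example, a minimal illustration of the "dispose during processing" hint above (the loop is the asker's; only the null assignment is added):

foreach ($results->xpath('//i:Item') as $data) {
    processing($data);
    $data = null; // release the reference immediately, per the hint above
}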
One extra question: how long does it take to loop over your XML without doing [lots of things]?
function my_items_import($xml){
    $results = new SimpleXMLElement($xml);
    $results->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
    // it will loop over 10K items
    foreach ($results->xpath('//i:Item') as $data) {
        $data->registerXPathNamespace('i', 'http://schemas.microsoft.com/dynamics/2008/01/documents/Item');
        // my processing code here; it calls other functions to do a lot of things
        //processing($data);
    }
    //unset($results); // no need
}
I've encountered the dreaded error message; possibly through painstaking effort, PHP has run out of memory:
Allowed memory size of #### bytes exhausted (tried to allocate #### bytes) in file.php on line 123
Increasing the limit
If you know what you're doing and want to increase the limit see memory_limit:
ini_set('memory_limit', '16M');
ini_set('memory_limit', -1); // no limit
Beware! You may only be solving the symptom and not the problem!
Diagnosing the leak:
The error message points to a line within a loop that I believe to be leaking, or needlessly accumulating, memory. I've printed memory_get_usage() statements at the end of each iteration and can see the number slowly grow until it reaches the limit:
foreach ($users as $user) {
    $task = new Task;
    $task->run($user);
    unset($task); // Free the variable in an attempt to recover memory
    print memory_get_usage(true); // increases over time
}
For the purposes of this question let's assume the worst spaghetti code imaginable is hiding in global-scope somewhere in $user or Task.
What tools, PHP tricks, or debugging voodoo can help me find and fix the problem?
PHP doesn't have a garbage collector. It uses reference counting to manage memory. Thus, the most common sources of memory leaks are cyclic references and global variables. If you use a framework, you'll have a lot of code to trawl through to find it, I'm afraid. The simplest instrument is to selectively place calls to memory_get_usage and narrow down where the code leaks. You can also use Xdebug to create a trace of the code: run the code with execution traces and show_mem_delta.
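As a sketch of that trace idea (assuming Xdebug 2 is loaded; xdebug.show_mem_delta usually has to be enabled in php.ini for the trace to include per-call memory deltas):

xdebug_start_trace('/tmp/leak_trace'); // writes a function trace file
foreach ($users as $user) {
    $task = new Task;
    $task->run($user);
    unset($task);
}
xdebug_stop_trace(); // inspect /tmp/leak_trace.xt for growing memory deltas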
Here's a trick we've used to identify which scripts are using the most memory on our server.
Save the following snippet in a file at, e.g., /usr/local/lib/php/strangecode_log_memory_usage.inc.php:
<?php
function strangecode_log_memory_usage()
{
    $site = '' == getenv('SERVER_NAME') ? getenv('SCRIPT_FILENAME') : getenv('SERVER_NAME');
    $url = $_SERVER['PHP_SELF'];
    $current = memory_get_usage();
    $peak = memory_get_peak_usage();
    error_log("$site current: $current peak: $peak $url\n", 3, '/var/log/httpd/php_memory_log');
}
register_shutdown_function('strangecode_log_memory_usage');
Employ it by adding the following to httpd.conf:
php_admin_value auto_prepend_file /usr/local/lib/php/strangecode_log_memory_usage.inc.php
Then analyze the log file at /var/log/httpd/php_memory_log
You might need to touch /var/log/httpd/php_memory_log && chmod 666 /var/log/httpd/php_memory_log before your web user can write to the log file.
I noticed one time in an old script that PHP would keep the "as" variable in scope even after my foreach loop. For example,
foreach ($users as $user) {
    $user->doSomething();
}
var_dump($user); // would output the data from the last $user
I'm not sure whether later PHP versions have changed this since I last saw it. If this is the case, you could unset($user) after the doSomething() line to clear it from memory. YMMV.
There are several possible points of memory leaking in php:
php itself
php extension
php library you use
your php code
It is quite hard to find and fix the first three without deep reverse engineering or knowledge of the PHP source code. For the last one, you can binary-search for the memory-leaking code with memory_get_usage.
I recently ran into this problem on an application, under what I gather to be similar circumstances: a script that runs in PHP's CLI and loops over many iterations. My script depends on several underlying libraries. I suspected a particular library was the cause and spent several hours in vain trying to add appropriate destruct methods to its classes, to no avail. Faced with a lengthy conversion to a different library (which could turn out to have the same problems), I came up with a crude workaround for the problem in my case.
In my situation, on the Linux CLI, I was looping over a bunch of user records and, for each one, creating new instances of several classes I had written. I decided to try creating the new instances using PHP's exec function so that those processes would run in a "new thread". Here is a really basic sample of what I am referring to:
foreach ($ids as $id) {
    $lines = array();
    exec("php ./path/to/my/classes.php $id", $lines);
    foreach ($lines as $line) { echo $line . "\n"; } // display some output
}
Obviously this approach has limitations, and one needs to be aware of its dangers, as it would be easy to spawn a runaway number of processes. But in some rare cases it might help get over a tough spot until a better fix can be found, as in my case.
I came across the same problem, and my solution was to replace foreach with a regular for. I'm not sure about the specifics, but it seems like foreach creates a copy of (or somehow a new reference to) the object. With a regular for loop, you access the item directly.
I would suggest you check the PHP manual or add the gc_enable() function to collect garbage, so that memory leaks don't affect how your code runs.
PS: PHP has a garbage collector; gc_enable() takes no arguments.
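A minimal sketch of that suggestion, assuming PHP 5.3+ where the cycle collector exists; gc_collect_cycles() forces a collection and returns how many cycles were freed:

gc_enable(); // make sure the cycle collector is on
foreach ($users as $user) {
    $task = new Task;
    $task->run($user);
    unset($task);
    gc_collect_cycles(); // collect any circular references left behind
}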
I recently noticed that PHP 5.3 lambda functions leave extra memory in use even after they are removed.
for ($i = 0; $i < 1000; $i++) {
    //$log = new Log;
    $log = function() { return new Log; };
    //unset($log);
}
I'm not sure why, but each lambda seems to take an extra 250 bytes even after the function is removed.
I didn't see it explicitly mentioned, but Xdebug does a great job profiling time and memory (as of 2.6). You can take the information it generates and pass it to a GUI front end of your choice: webgrind (time only), KCacheGrind, QCacheGrind or others, and it generates very useful call trees and graphs to let you find the sources of your various woes.
Example (of qcachegrind):
If what you say about PHP only doing GC after a function returns is true, you could wrap the loop's contents inside a function as a workaround/experiment.
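For instance, a quick sketch of that experiment, using the names from the question:

function handleUser($user) {
    $task = new Task;
    $task->run($user);
    // $task and any other locals fall out of scope when this returns
}

foreach ($users as $user) {
    handleUser($user);
}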
One huge problem I had was with create_function. Like lambda functions, it leaves the generated temporary function name in memory.
Another cause of memory leaks (in case of Zend Framework) is the Zend_Db_Profiler.
Make sure it is disabled if you run scripts under Zend Framework.
For example, I had the following in my application.ini:
resources.db.profiler.enabled = true
resources.db.profiler.class = Zend_Db_Profiler_Firebug
Running approximately 25,000 queries plus loads of processing before that brought the memory usage up to a nice 128 MB (my max memory limit).
By just setting:
resources.db.profiler.enabled = false
it was enough to keep it under 20 MB.
This script was running in CLI, but it was instantiating Zend_Application and running the Bootstrap, so it used the "development" config.
Running the script with Xdebug profiling really helped as well.
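If you can't touch application.ini, here is a sketch of turning the profiler off from the CLI script itself, assuming a standard Zend_Application bootstrap with a 'db' resource:

$application->bootstrap('db');
$db = $application->getBootstrap()->getResource('db'); // Zend_Db adapter
$db->getProfiler()->setEnabled(false); // stop Zend_Db_Profiler from recording every query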
I'm a little late to this conversation but I'll share something pertinent to Zend Framework.
I had a memory leak problem after installing PHP 5.3.8 (using phpfarm) to work with a ZF app that was developed with PHP 5.2.9. I discovered that the memory leak was being triggered in Apache's httpd.conf file, in my virtual host definition, where it says SetEnv APPLICATION_ENV "development". After commenting this line out, the memory leaks stopped. I'm trying to come up with an inline workaround in my PHP script (mainly by defining it manually in the main index.php file).
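For reference, the usual inline approach follows the ZF1 skeleton's index.php pattern (sketched here; adjust the fallback value to taste):

defined('APPLICATION_ENV')
    || define('APPLICATION_ENV',
        (getenv('APPLICATION_ENV') ? getenv('APPLICATION_ENV') : 'production'));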
I didn't see it mentioned here, but one thing that might be helpful is using Xdebug and xdebug_debug_zval('variableName') to see the refcount.
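A tiny example of what that looks like (with Xdebug loaded):

$a = array('x' => 1);
$b = $a; // both names now point at the same zval
xdebug_debug_zval('a'); // prints the refcount and is_ref flag for $a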
I can also provide an example of a PHP extension getting in the way: Zend Server's Z-Ray. If data collection is enabled, memory use will balloon on each iteration just as if garbage collection were off.
I wrote a web spider to spider pages concurrently. For each link that the spider finds, I want to fork off a new child that starts the process all over again.
I don't want to overload the target server, so I created a static array that all objects can access. Each child adds its PID to the array, and either the parent or the child checks the array to see whether $maxChildren has been reached and, if so, patiently waits until a child finishes.
As you can see, I have $maxChildren set to 3, so I am expecting to see 3 simultaneous processes at any given time. However, that's not the case: the Linux top command shows 12 to 30 processes at any given time. In concurrent programming, how can I regulate the number of simultaneous processes? My logic is currently inspired by how Apache handles its max children, but I'm not exactly sure how that works.
As pointed out in one of the answers, globally accessing the static variable brings up issues with race conditions. To deal with this, the $children array uses the unique $pid of the process as both the key and its value, thereby creating a unique entry. My thinking is that since any object only ever deals with one $children[$pid] value, locking is not necessary. Is this not true? Is there a chance that two processes could try to unset or add the same value at some point?
private static $children = array();
private $maxChildren = 3;

public function concurrentSpider($url) {
    // STEP 1:
    // Download the $url
    $pageData = http_get($url, $ref = '');
    if (!$this->checkIfSaved($url)) {
        $this->save_link_to_db($url, $pageData);
    }

    // STEP 2:
    // Extract all hyperlinks from this url's page data
    $linksOnThisPage = $this->harvest_links($url, $pageData);

    // STEP 3:
    // Check the links array from STEP 2 to see if this page has
    // already been saved or is excluded because of any other
    // logic from the excluded_link() function
    $filteredLinks = $this->filterLinks($linksOnThisPage);
    shuffle($filteredLinks);

    // STEP 4: loop through each of the links and
    // repeat the process
    foreach ($filteredLinks as $filteredLink) {
        $pid = pcntl_fork();
        switch ($pid) {
            case -1:
                print "Could not fork!\n";
                exit(1);
            case 0:
                if ($this->checkIfSaved($filteredLink)) {
                    exit();
                }
                //$pid = getmypid();
                print "In child with PID: " . getmypid() . " processing $filteredLink \n";
                $var[$pid]->concurrentSpider($filteredLink);
                sleep(2);
                exit(1);
            default:
                // Add an element to the children array
                self::$children[$pid] = $pid;
                // If the maximum number of children has been
                // achieved, wait until one or more return
                // before continuing.
                while (count(self::$children) >= $this->maxChildren) {
                    //print count(self::$children) . " children \n";
                    $pid = pcntl_waitpid(-1, $status);
                    unset(self::$children[$pid]);
                }
        }
    }
}
This is written in PHP. I know that the pcntl_waitpid function with an argument of -1 waits for any child to complete, regardless of the parent (http://php.net/manual/en/function.pcntl-waitpid.php).
What's wrong with my logic and how can I correct it so that only $maxChildren processes are running simultaneously? I'm also open to improving the logic in general if you have suggestions.
First thing to note: if this is truly a global being shared among multiple threads, it's possible that multiple threads are adding to it at once and you're running afoul of a race condition. You need some sort of concurrency control to ensure that only one process is accessing your global array at once.
Also, try the simple debugging trick of having each process write out (to the console or to a file) its PID and the full contents of the global array each time a new spider is forked. It will help you to check your assumptions (which are plainly wrong at some point) and figure out what's going wrong.
EDIT: (In response to the comments)
I'm not a PHP developer, but if I had to guess, based on the fact that you're using an OS tool that counts OS-level processes, I'd guess that your fork is spawning multiple processes, but your static array is global within the current process. Implementing system-wide shared memory is a lot more complicated!
If you just want to count something and ensure that instances of a shared resource don't grow out of control, look into semaphores, and see if you can find a way in PHP to create a named semaphore object that can be shared between multiple instances of your spider.
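In PHP that could look roughly like the sketch below, using the System V semaphore functions from the sysvsem extension (the key derivation from __FILE__ is arbitrary). Note that it only guards the bookkeeping; it does not magically share the static array across forked processes:

$sem = sem_get(ftok(__FILE__, 's'), 1); // 1 permit, so it behaves like a mutex

sem_acquire($sem);
// ... read/update the shared record of running children here ...
sem_release($sem);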
Use a real programming language ;)
Step 1 is kind of bad: why are you downloading if the page might already be in the db? Put the download inside the if, and see if you can put a mutex around it, perhaps by doing something in SQL to imitate one.
I hope harvest_links uses a proper HTML parser with CSS selector support (I like Fizzler for .NET). I guess a regular expression would be fine if it's just to get links, but it is possible to mess up.
I see step 4 and I don't think it's bad, but personally I'd do it a different way.
I'd have something like step 1 insert the url, page and a flag into a db. Then I'd have another process (or the same one) ask the db for unprocessed pages and set the flag to one value if processing errors and another if it succeeds. That way, if something fails or the process exits (shutdown, crash, power outage, etc.), it can pick up again easily and doesn't need to scan every page to find where it left off; it just asks the database for the next link and redoes what it didn't finish.
PHP doesn't support multithreading, therefore it doesn't support mutexes or any other synchronization methods. As others have said in their answers, this will lead to a race condition.
You'll have to write a wrapper in C or bash. That way, the PHP script can submit targets to the wrapper, and the wrapper will handle scheduling.
Another approach is to rewrite your spider in Python or Ruby, both of which support multithreading. That will eliminate the need for interprocess communication.
Edit: On second thought, the best way is to write the wrapper in Python or Ruby and reuse your existing PHP code as a black box. That's a compromise of the solutions above.
If the spider is for practical purposes, you might want to google "curl multithread"
cURL Multi Threading with PHP
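For illustration, a rough sketch of the curl_multi approach (no forking involved); $urls and the save step are assumptions:

$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}

// drive all transfers in parallel until they finish
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh);
} while ($running > 0);

foreach ($handles as $url => $ch) {
    $pageData = curl_multi_getcontent($ch);
    // ... save $pageData / harvest links from it here ...
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
curl_multi_close($mh);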
A site I am working with is starting to get a little sluggish, and I would like to refine it. I think the problem is with the PHP, but I can't be sure. How can I see how long functions are taking to perform?
If you want to test the execution time:
<?php
$startTime = microtime(true);
// Your content to test
$endTime = microtime(true);
$elapsed = $endTime - $startTime;
echo "Execution time : $elapsed seconds";
?>
Try the profiler feature in XDebug or Zend Debugger?
There are two things you can do.
First, place microtime() calls everywhere, although that's not convenient if you want to test more than one function.
A simpler, better solution if you want to test many functions (which I assume you would like to do) is to use a class (follow the link for a tutorial) that times how long all your functions take, rather than placing microtime() everywhere. It's very convenient:
http://codeaid.net/php/calculate-script-execution-time-%28php-class%29
The second thing you can do to optimize your script is to take a look at its memory usage.
By observing the memory usage of your scripts, you may be able to optimize your code better.
PHP has a garbage collector and a pretty complex memory manager. The amount of memory being used by your script can go up and down during execution. To get the current memory usage, use the memory_get_usage() function; to get the highest amount of memory used at any point, use the memory_get_peak_usage() function.
echo "Initial: ".memory_get_usage()." bytes \n";
/* prints
Initial: 361400 bytes
*/
// let's use up some memory
for ($i = 0; $i < 100000; $i++) {
$array []= md5($i);
}
// let's remove half of the array
for ($i = 0; $i < 100000; $i++) {
unset($array[$i]);
}
echo "Final: ".memory_get_usage()." bytes \n";
/* prints
Final: 885912 bytes
*/
echo "Peak: ".memory_get_peak_usage()." bytes \n";
/* prints
Peak: 13687072 bytes
*/
http://net.tutsplus.com/tutorials/php/9-useful-php-functions-and-features-you-need-to-know/
PK
You can also make it manually, by recording microtime() value in various places, like this:
<?
$TIMER['start'] = microtime(TRUE);

// some code
$query = "SELECT ...";
$TIMER['before q'] = microtime(TRUE);
$res = mysql_query($query);
$TIMER['after q'] = microtime(TRUE);
while ($row = mysql_fetch_array($res)) {
    // some code
}
$TIMER['array filled'] = microtime(TRUE);

// some code
$TIMER['pagination'] = microtime(TRUE);
// and so on
?>
and then visualize it
<?
if ('127.0.0.1' === $_SERVER['REMOTE_ADDR']) {
    echo "<table border=1><tr><td>name</td><td>so far</td><td>delta</td><td>per cent</td></tr>";
    reset($TIMER);
    $start = $prev = current($TIMER);
    $total = end($TIMER) - $start;
    foreach ($TIMER as $name => $value) {
        $sofar = round($value - $start, 3);
        $delta = round($value - $prev, 3);
        $percent = round($delta / $total * 100);
        echo "<tr><td>$name</td><td>$sofar</td><td>$delta</td><td>$percent</td></tr>";
        $prev = $value;
    }
    echo "</table>";
}
?>
The IP address check is there because we are doing this profiling on the live site, and the output should only be visible to you.
Though I doubt it's PHP itself; most likely it's the database, so pay most attention to query execution timing.
However, the term "site" is very broad: it also includes JS, CSS, images and so on. So I'd suggest starting from Firebug's Net panel to see which part of the whole page load takes the most time.
Of course, refining can be done only after analysis of profiling results, and cannot be advised here without it.
Your best bet is Xdebug. I'm happy with it, as it comes bundled with my PhpED IDE and I can get profiler data at the click of a button.
So maybe you could consider that.
I had similar issues, so I created two new tables in the database and two new functions. One was audit_sql and the other was audit_code. Because I used an SQL abstraction class, it was easy to time every single SQL call (I used PHP's microtime as some others have suggested). So I called microtime before and after each SQL call and stored the results in the database.
Similarly with pages: I called microtime at the start and end of each page and, where necessary, at the start and end of functions, divs - whatever I thought might be a culprit.
The general results were:
SQL calls to MySQL were almost instantaneous and were not a problem at all. The only thing I would say is that even I was surprised at the number being executed! The site is generated from the database - even the menus, permissions etc. To produce the home page, the SQL calls were measured in the hundreds.
PHP was not the culprit. It was even more instantaneous than MySQL.
The culprit was.... (big build-up!) calls to YouTube and Picasa and other sites like that. I host videos and photo albums on the site (well, I don't actually store them - they are stored on YouTube etc.) and on the home page are thumbnails that are extracted from YouTube and the like via the YouTube PHP API/Zend Framework. Because these are all HTTP calls to other sites, each one was taking 1, 2 or 3 seconds. This was causing the divs containing them to take between 6 and 12 seconds, and the home page up to 17 seconds.
The solution: store all thumbnails on my server. The first time, a thumbnail has to be served from the remote site (YouTube, Picasa etc.), so do that and then store it on your own site. On future requests, check whether you already have it and, if so, always serve it from your server. That cuts the page load time down to 2-3 seconds tops. Granted, the first person to view the home page after someone has added more videos/images will see a slow load, but not thereafter. People will put a one-off long page load down to their connection or the internet in general; too many slow loads of your site and they will stop visiting!
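A rough sketch of that caching idea (the cache path and the use of file_get_contents() are assumptions, not the actual implementation):

function cached_thumbnail($remoteUrl) {
    $local = '/var/www/site/thumbs/' . md5($remoteUrl) . '.jpg';
    if (!file_exists($local)) {
        // first request only: slow fetch from YouTube/Picasa, then keep a copy
        $data = file_get_contents($remoteUrl);
        if ($data !== false) {
            file_put_contents($local, $data);
        }
    }
    return $local; // serve this local copy from now on
}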
I hope that helps somewhat.