I often see code like this:
function load_items(&$items_arr) {
    // ... some code
}
load_items($items_arr);
$v = &$items_arr[$id];
compared to code like this:
function load_items() {
    // ... some code
    return $items_arr;
}
$items_arr = load_items();
$v = $items_arr[$id];
Will the second version copy $items_arr and $items_arr[$id]?
Will the first version improve performance?
No, it will not copy the value right away. Copy-on-write is one of the memory management techniques used by PHP. It ensures that memory isn't wasted when you copy values between variables.
What that means is that when you assign:
$v = $items_arr[$id];
PHP simply updates the symbol table to indicate that $v points to the same memory as $items_arr[$id]. Only if you change $items_arr or $v afterwards does PHP allocate more memory and perform the actual copy.
By delaying this extra allocation and copying, PHP saves time and memory in many cases.
There's a nice article about memory management in PHP: http://hengrui-li.blogspot.no/2011/08/php-copy-on-write-how-php-manages.html
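Copy-on-write is easy to observe with memory_get_usage(); here is a minimal sketch (the exact byte counts vary by PHP version, so only the relative sizes matter):

```php
<?php
// Assigning a large array does not copy it immediately.
$a = range(1, 100000);              // allocate a large array
$before = memory_get_usage();

$b = $a;                            // no copy yet: $a and $b share the same storage
$afterAssign = memory_get_usage();

$b[0] = -1;                         // the first write triggers the actual copy
$afterWrite = memory_get_usage();

echo ($afterAssign - $before), "\n"; // close to zero
echo ($afterWrite - $before), "\n";  // roughly the size of the whole array
```

The first difference is tiny because only a reference count changed; the second jumps because modifying $b forced PHP to separate the two arrays.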
Related
I am getting 'memory exhausted' in my PHP script. I cannot figure out why. The limit is 128 MB. I am sending, from a JavaScript XMLHttpRequest, a JSON string that is at most 10 MB (usually under 1 MB). The PHP script itself is 43K. Even copying everything a dozen times over, I shouldn't run out of memory. I do a few calls to the database in getParentFolders(), but that produces only a few rows. Certainly not the 62 megabytes it claims I am using. I have used Xdebug (see picture), but this tells me nothing useful, only that yup, I'm using a lot of memory.
So at this point I am trying to do 'best practices' to minimize memory usage. A simple fix, or so I thought, was to pass in values by reference. So I put a '&' before every variable in every function. To my surprise, there was no change in memory consumption. It was slightly worse by a few bytes, in fact. I have also tried using every variable as a global, but again, to my surprise, there was little difference.
So what's going on? Why is passing by reference and using globals not producing the performance benefits I expected? (see images)
Xdebug screenshots: 'pass by value' vs 'pass by reference'. Note they are pretty much identical.
For those who want code, here is the getParentFolders() function, which returns just a short string but somehow uses 70 MB!
function getParentFolders(&$node) { // returns a string of folders
    debugprint("getParentFolders()"); // prints if $DEBUG flag is on
    $parent = getParent($node);
    $path = "";
    while ($parent) { // goes till we hit the root folder
        $path = $parent->title . '/' . $path; // prepend it
        $parent = getParent($parent);
    }
    return $path;
}
function getParent(&$node) { // return the node that is the parent
    global $email;
    $parentId = $node->parentId;
    $clientId = $node->parentClient;
    $idCol = $clientId . "_id";
    $tablename = $email . "_bookmarks";
    $query = "SELECT * FROM `$tablename`
              WHERE $idCol = '$parentId'"; // only returns one row since id is unique
    $result = sendquery($query);
    return (object) $result[0];
}
Edit:
Just to clarify, I am looking for a technical explanation of PHP memory usage and best practices - specifically why I am not seeing memory differences - not a workaround for the issue.
Make sure the query returns at most one row by adding LIMIT 1. If no result is found, make getParent() return false or null, so that $parent becomes falsy and the while loop exits. Also, I don't think you need to pass the $node argument by reference in your case.
The value of $node is a reference, i.e. a pointer to an object. In PHP 5+, "objects" are not values directly. You always work with objects through pointers to them: when you do new ClassName, it evaluates to a pointer to an object; when you use the -> operator, it takes a pointer to an object on the left side.
So even if both of these functions were pass-by-value, the size of the thing passed would be just the size of a pointer. Passing by reference basically wraps that pointer in another level of pointer, which is the same size (though total memory is greater, since there are now two pointers). The only point of passing by reference is that it lets you assign to $node inside the function (making it point to a different object, or hold a different type of value) and have that reflected in the calling function's variable. But you do not assign to $node anywhere here (modifying a field of the object pointed to by $node is not the same as assigning to $node), so pass-by-reference is pointless here.
(Plus, even if you passed a value type, like arrays in PHP, they have copy-on-write semantics and are not copied until you write to them. And even if you wrote to one and caused a copy, that copy would only last for the lifetime of the local variable, which ends at the end of the function, so it would not create persistent memory usage.)
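This is easy to verify. The class and function names below are made up for the sketch; the point is that a by-value object parameter still refers to the caller's object, and only reassignment of the variable itself needs &:

```php
<?php
class Node { public $title = 'root'; }

function renameByValue($n)  { $n->title = 'changed'; } // $n is a copy of the handle
function replaceByValue($n) { $n = new Node(); }       // reassigns only the local handle
function replaceByRef(&$n)  { $n = new Node(); }       // reassigns the caller's variable

$node = new Node();
renameByValue($node);
echo $node->title, "\n";   // "changed": both handles point to the same object
replaceByValue($node);
echo $node->title, "\n";   // still "changed": the caller's handle is untouched
replaceByRef($node);
echo $node->title, "\n";   // "root": the caller's variable now holds a new object
```

So modifying fields works fine without &, which is exactly why adding & everywhere changed nothing in the measurements above.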
The answer is that objects are never copied when passed: what gets passed is an object handle (i.e. a pointer), not the object itself. The 'pass by reference' operator, '&', does nothing here other than wrap that handle in another level of indirection and pass that instead.
As for the memory usage, there is no simple answer other than that database query results seem to come with a lot of overhead, and you want to keep your 'footprint' (any results returned from the database, or any variables) as small as possible.
I have read several times that, in order to invoke the garbage collector and actually clean the RAM used by a variable, you have to assign a new value (e.g. NULL) to it instead of simply calling unset() on it.
This code, however, shows that the memory allocated for the array $a is not cleaned after the NULL assignment.
function build_array()
{
    for ($i = 0; $i < 10000; $i++) {
        $a[$i] = $i;
    }
    $i = null;
    return $a;
}
echo '<p>'.memory_get_usage(true);
$a = build_array();
echo '<p>'.memory_get_usage(true);
$a = null;
echo '<p>'.memory_get_usage(true);
The output I get is:
262144
1835008
786432
So part of the memory is cleaned, but not all the memory. How can I completely clean the RAM?
There is no way to definitively clear a variable from memory in PHP.
It is up to the PHP garbage collector to do that when it sees fit.
Fortunately, while the PHP garbage collector may not be perfect, it is one of the better parts of PHP. If you do things as the PHP documentation says, there's no reason to have problems.
If you have a realistic scenario where it is a problem, post the scenario here or report it to the PHP core team.
Otherwise, unset() is the best way to clear variables.
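One nuance worth knowing: unset() only decrements a reference count, and reference cycles keep that count above zero, so they can only be reclaimed by the cycle collector. A minimal sketch:

```php
<?php
// unset() drops the refcount; a reference cycle keeps it above zero,
// so only the cycle collector (gc_collect_cycles) can reclaim it.
$a = new stdClass();
$a->self = $a;                  // the object now references itself
unset($a);                      // refcount stays at 1 because of the cycle
$collected = gc_collect_cycles();
echo $collected, "\n";          // at least 1 cycle collected
```

In normal code you rarely need to call gc_collect_cycles() yourself; PHP runs the collector automatically when the root buffer fills up.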
The point is that memory_get_usage(true) shows the memory allocated to your PHP process, not the amount actually in use. The system can reclaim the unused part once it is needed elsewhere.
More details can be found in the PHP manual page for memory_get_usage.
If you run the same script with memory_get_usage(false), you will see that the array was actually garbage-collected.
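For comparison, here is a variant of the question's script using memory_get_usage(false). The absolute numbers differ per machine, but the third reading drops back close to the first, showing the array really was freed:

```php
<?php
function build_array() {
    $a = [];
    for ($i = 0; $i < 10000; $i++) {
        $a[$i] = $i;
    }
    return $a;
}

echo memory_get_usage(false), "\n"; // baseline
$a = build_array();
echo memory_get_usage(false), "\n"; // jumps by roughly the size of the array
$a = null;
echo memory_get_usage(false), "\n"; // drops back near the baseline
```

With true you are watching the allocator's reserved blocks, which PHP keeps around for reuse; with false you are watching the live values themselves.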
I was just wondering how PHP works under the hood in this certain scenario. Let's say I have these two pieces of code:
function foo() {
    return 2 * 2;
}

// First.
if (foo()) {
    bar(foo());
}

// Second.
if (($ref = foo())) {
    bar($ref);
}
Now the questions:
In the first case, does PHP create some sort of temporary variable inside the if clause? If so, isn't the second piece of code always the better approach?
Does the second case take more memory? Or, if the answer to the first question is yes, does it not?
The two snippets are not equivalent, because the first one calls foo() twice (if it returns a truthy value). If foo() has side effects, such as printing something, they will happen twice. And if it depends on something that can change (e.g. the contents of a file or database), the two calls may return different values. In your example, where it just multiplies two numbers, none of this happens, but it still means an extra, unnecessary multiplication.
The answer to your questions is:
Yes, it needs to hold the returned value in a temporary memory location so it can test whether it's true or not.
Yes, it uses a little more memory. In the first version, the temporary memory can be reclaimed as soon as the if test is completed. In the second version, it will not be reclaimed until $ref is reassigned or goes out of scope.
In the first case, you are calling the function twice, so if the function is time-consuming, it is inefficient. The second case is indeed better, since you save the result of foo().
In both cases, PHP needs to allocate memory depending on what data foo() generates. That memory will be freed by the garbage collector later on. In terms of memory both cases are pretty much equivalent. Maybe the memory will be released earlier, maybe not, but most likely you won't encounter a case where it matters.
PHP can't reuse the first call's return value as a cached temporary, because it can't be sure that foo() will return the same value each time: microtime() and rand(), for example, return different values on each call.
In the second example, it takes indeed more memory, since PHP needs to create and keep the value in memory.
Here is how to test it :
<?php
function foo() {
    return true;
}

function bar($bool) {
    echo memory_get_usage();
}

if (1) {
    // 253632 bytes on my machine
    if (foo()) {
        bar(foo());
    }
} else {
    // 253720 bytes on my machine
    if (($ref = foo())) {
        bar($ref);
    }
}
I'm trying to extract data from many HTML files. To do it fast I don't use a DOM parser, but simple strpos(). Everything goes well if I generate from about 200,000 files. But if I do it with more files (300,000) it outputs nothing, with this strange effect:
Look at the bottom diagram. (The upper one is the CPU.) In the first (marked RED) phase the output file size is growing; everything seems OK. After that (marked ORANGE) the file size becomes zero and the memory usage grows. (Everything appears twice because I restarted the computation at halftime.)
I forgot to say that I use WAMP.
I have tried unsetting variables, putting the loop into a function, using implode instead of concatenating strings, using fopen instead of file_get_contents, and garbage collection too...
What is the 2nd phase? Am I out of memory? Is there some limit that I don't know about (max_execution_time and memory_limit are already ignored)? Why does this small program use so much memory?
Here is the code.
$datafile = fopen("meccsek2b.jsb", 'w');
for ($i = 0; $i < 100000; $i++) {
    $a = explode('|', $data[$i]);
    $file = "data2/$mid.html";
    if (file_exists($file)) {
        $c = file_get_contents($file);
        $o = 0;
        $a_id = array();
        $a_h = array();
        $a_d = array();
        $a_v = array();
        while ($o = strpos($c, '<a href="/test/', $o)) {
            $o = $o + 15;
            $a_id[] = substr($c, $o, strpos($c, '/', $o) - $o);
            $o = strpos($c, 'val_h="', $o) + 7;
            $a_h[] = substr($c, $o, strpos($c, '"', $o) - $o);
            $o = strpos($c, 'val_d="', $o) + 7;
            $a_d[] = substr($c, $o, strpos($c, '"', $o) - $o);
            $o = strpos($c, 'val_v="', $o) + 7;
            $a_v[] = substr($c, $o, strpos($c, '"', $o) - $o);
        }
        fwrite($datafile,
            $mid . '|' .
            implode(';', $a_id) . '|' .
            implode(';', $a_h) . '|' .
            implode(';', $a_d) . '|' .
            implode(';', $a_v) .
            PHP_EOL);
    }
}
fclose($datafile);
I think I found the problem:
there was an infinite loop, because strpos() returned 0.
The allocated memory kept growing until a fatal error:
PHP Fatal error: Out of memory
Ensino's note about using the command line was very useful; it finally led me to this.
You should consider running your script from the command line; this way you might catch the error without digging through the error logs.
Furthermore, as stated in the PHP manual, strpos() may return boolean FALSE, but may also return a non-boolean value which evaluates to FALSE (such as 0), so the correct way to test its return value is with the !== operator:
while (($o = strpos($c,'<a href="/test/',$o)) !== FALSE){
...
}
The CPU spike most likely means that PHP is doing garbage collection. If you want to gain some performance at the cost of bigger memory usage, you can disable garbage collection with gc_disable().
Looking at the code, I'd guess that you've reached the point where file_get_contents is reading some big file and PHP realizes it has to free some memory by running garbage collection in order to store its content.
The best approach is to read the file continuously and process it in chunks, rather than holding the whole thing in memory.
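A minimal sketch of that approach, using fgets() to scan line by line instead of file_get_contents(). The sample file and markup here are made up; it assumes each anchor fits on one line, which the real parser would need to account for:

```php
<?php
// Create a small sample file so the sketch is runnable.
$tmp = tempnam(sys_get_temp_dir(), 'html');
file_put_contents($tmp, "<p>x</p>\n<a href=\"/test/42/\">link</a>\n");

$handle = fopen($tmp, 'r');
while (($line = fgets($handle)) !== false) {        // one line at a time
    if (($o = strpos($line, '<a href="/test/')) !== false) {
        $start = $o + 15;                            // skip past the prefix
        $id = substr($line, $start, strpos($line, '/', $start) - $start);
        echo $id, "\n";                              // prints "42"
    }
}
fclose($handle);
unlink($tmp);
```

This keeps memory usage bounded by the longest line rather than the largest file, and the !== false test also avoids the infinite-loop pitfall when a match sits at offset 0.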
A huge amount of data is going into the system's internal cache. When the data in the system cache is written to disk, it may have an impact on memory and performance.
There is a system function, FlushFileBuffers, to enforce writes:
please look at http://msdn.microsoft.com/en-us/library/windows/desktop/aa364451%28v=vs.85%29.aspx and http://winbinder.org/ for calling the function.
(Though this does not explain the empty file, unless there is a Windows bug.)
When running this simple script I get the output posted below.
It makes me think that there is a memory leak in either my code or the Zend Framework/Magento stack. This issue occurs when iterating any kind of Magento collection.
Is there anything that I am missing or doing wrong?
Script:
$customersCollection = Mage::getModel('customer/customer')->getCollection();
foreach ($customersCollection as $customer) {
    $customer->load();
    $customer = null;
    echo memory_get_usage() . "\n";
}
Output:
102389104
102392920
...
110542528
110544744
Your issue is that you are issuing fairly expensive queries with each iteration, when you could load the necessary data via the collection queries:
$collection = Mage::getResourceModel('customer/customer_collection')->addAttributeToSelect('*');
will do the same, but all in one query. The caveat to this approach is that if there are any custom event observers for customer_load_before or customer_load_after events (there are no core observers for these), the observer will need to be run manually for each data model.
Edit: credit to osonodoar for spotting an incorrect class reference (customer/customer vs customer/customer_collection)
The memory for an object (or other value) can only be freed when there are no references to it anywhere in the PHP process. In your case, the line $customer = null only decreases the number of references to that object by one, but it doesn't make it reach zero.
If you consider a simpler loop, this may become clearer:
$test = array('a' => 'hello');
foreach ($test as $key => $value)
{
    // $value points at the same memory location as $test['a']
    // internally, that "zval" has a "refcount" of 2
    $value = null;
    // $value now points to a new memory location, but $test['a'] is unaffected
    // the refcount drops to 1, but no memory is freed
}
Because you are using objects, there is an added twist - you can modify the object inside the loop without creating a new copy of it:
$test = array('a' => new stdClass);
// $test['a'] is an empty object
foreach ($test as $key => $value)
{
    // $value points at the same object as $test['a']
    // internally, that object has a "refcount" of 2
    $value->foo = "Some data that wasn't there before";
    // $value is still the same object as $test['a'], but that object now has extra data
    // this requires additional memory to store that object
    $value = null;
    // $value now points to a new memory location, but $test['a'] is unaffected
    // the refcount drops to 1, but no memory is freed
}
// $test['a']->foo now contains the string assigned in the loop, consuming extra memory
In your case, the ->load() method is presumably expanding the amount of data in each of the members of $customersCollection in turn, requiring more memory for each. Inspecting $customersCollection before and after the loop would probably confirm this.
First off, when unsetting variables, use unset($variable) instead of $variable = null. It does essentially the same thing, but is much clearer as to your intent.
Second, PHP is meant to die: memory leaks aren't a huge issue, as a PHP request lasts maybe a few seconds, and then the process dies and all memory it was using is freed up for the next request. Unless you are running into scaling issues, it's nothing to worry about.
Edit: which isn't to say don't worry about the quality of your code, but for something like this, it's most likely not worth the effort of trying to prevent it unless it is causing problems.
Another way to handle a memory leak is to call exec() within the loop and let that separate process do the part of the job that leaks.
Once it completes and terminates, all memory leaked within that exec'd process is released.
With huge iteration counts, the memory loss that would otherwise keep accumulating is thus taken care of.
@benmarks' response would be the right approach here, as calling load() within a loop is a very, very expensive call.
Calling $customer->load() allocates memory incrementally that stays referenced by $customersCollection, and that memory won't be released until the end of the loop.
However, if load() needs to be called for some reason, the code below won't leak memory, as the GC releases all the memory allocated by the model in each iteration:
$customersCollection = Mage::getModel('customer/customer')->getCollection();
foreach ($customersCollection as $customer) {
    $customerCopy = Mage::getModel('customer/customer')->load($customer->getId());
    // call $customerCopy methods
    echo memory_get_usage() . "\n";
}