Doctrine appears to be taking well over 4MB of RAM to execute a single, simple query:
print memory_get_peak_usage()." <br>\n";
$q = Doctrine_Query::create()
->from('Directories d')
->where('d.DIRECTORY_ID = ?', 5);
$dir = $q->fetchOne();
print $dir['name']." ".$dir['description']."<br>\n";
print memory_get_peak_usage()." <br>\n";
/*************** OUTPUT: **************************
6393616
testname testdescription
10999648
/***************************************************/
This is on a test database with very little data in it - the item that I am querying doesn't contain any data other than what is displayed here.
Is there potentially something wrong with the way I have the system set up, or is this standard memory usage for Doctrine?
From what I can see, you code doesn't seem to be wrong...
As a test, I've set up a quick example, with a very simple table (only four fields).
Here is the relevant code :
var_dump(number_format(memory_get_peak_usage()));
$test = Doctrine::getTable('Test')->find(1);
var_dump(number_format(memory_get_peak_usage()));
When doing that, I have this kind of output :
string '1,316,088' (length=9)
string '2,148,760' (length=9)
Considering the table is really simple and that I am only fetching one line, it seems "much" to me too -- but that's quite consistent with what you are getting, and with what I saw on other projects :-(
If you only need to display your data, and not work with it (ie update/delete/...), a solution might be to not fetch complex objects, but only a simple array :
$test = Doctrine::getTable('Test')->find(1, Doctrine::HYDRATE_ARRAY);
But, in this case, it doesn't make much of a difference, actually :-( :
string '1,316,424' (length=9)
string '2,107,128' (length=9)
Only 40 KB of difference -- well, with bigger objects / more lines, it might still be a good idea...
In the Doctrine manual, there is a page called Improving Performance ; maybe it could help you, especially for these sections :
Conservative Fetching
Free Objects
Oh, btw : I did this test on PHP 5.3.0 ; maybe this can have an impact on the amount of memory used...
I agree with romanb's answer - using an OpCode cache is a definite must when using large libs/frameworks.
An example related to OpCode caching
I've recently adopted Doctrine usage with Zend Framework and was curious about memory usage - so like the OP, I created a method using similar criteria to the OPs test and ran it as an overall test to see what ZF + Doctrine's peak memory usage would be.
I got the following results:
Result without APC:
10.25 megabytes
RV David
16.5 megabytes
Result with APC:
3 megabytes
RV David
4.25 megabytes
Opcode caching makes a very significant difference.
Well, where does this memory usage come from? As Pascal MARTIN pointed out, array hydration does not make a great difference which is logical concerning that we're only talking about a few records here.
The memory consumption comes from all the classes that are loaded on demand through autoloading.
If you dont have APC set up, then yes, there is something wrong with the way your system is set up. Dont even start to measure performance and expect good results with any large php library without an opcode cache like APC. It will not only speed up the execution but also reduce memory usage by at least 50% on all page loads except the very first one (where APC needs to cache the bytecodes first).
And 4MB with your simple example really smells like no-APC, otherwise it would really be a bit high.
Caution with fetchOne() on Doctrine Query. This function call will not append "Limit 1" on SQL
If you just need to get one records from DB, make sure:
$q->limit(1)->fetchOne()
The memory usage is tremendous dropped on large table.
You could see fetchOne() will fetch from DB as a collection first then return the first element.
public function fetchOne($params = array(), $hydrationMode = null)
{
$collection = $this->execute($params, $hydrationMode);
if (is_scalar($collection)) {
return $collection;
}
if (count($collection) === 0) {
return false;
}
if ($collection instanceof Doctrine_Collection) {
return $collection->getFirst();
} else if (is_array($collection)) {
return array_shift($collection);
}
return false;
}
Doctrine provides a free() function on Doctrine_Record, Doctrine_Collection, and Doctrine_Query which eliminates the circular references on those objects, freeing them up for garbage collection.
More info..
To make memory usage a little bit less You can try to use folowing code:
$record->free(true) – will do deep free-up's, calls free() on all relations too
$collection->free() – this will free all collection references
Doctrine_Manager::connection()->clean()/clear() – cleanup connection (and remove identity map entries)
$query->free()
I would guess that most of that memory is used up loading Doctrine's classes, not actually for objects associated with the query itself.
Which version of Doctrine are you using?
Are you using the autoloader?
In Doctrine 1.1, the default autoload behavior is called 'aggressive', which means that it load all of your model classes even if you're only using one or two on any particular request. Setting that behavior to 'conservative' would reduce memory usage.
I have just did "daemonized" script with symfony 1.4 and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
Related
I have read several times that, in order to invoke the garbage collector and actually clean the RAM used by a variable, you have to assign a new value (e.g. NULL) instead of simply unset() it.
This code, however, shows that the memory allocated for the array $a is not cleaned after the NULL assignment.
function build_array()
{
for ($i=0;$i<10000;$i++){
$a[$i]=$i;
}
$i=null;
return $a;
}
echo '<p>'.memory_get_usage(true);
$a = build_array();
echo '<p>'.memory_get_usage(true);
$a = null;
echo '<p>'.memory_get_usage(true);
The output I get is:
262144
1835008
786432
So part of the memory is cleaned, but not all the memory. How can I completely clean the RAM?
You have no way to definitely clear a variable from the memory with PHP.
Its is up to PHP garbage collector to do that when it sees that it should.
Fortunately, PHP garbage collector may not be perfect but it is one of the best features of PHP. If you do things as per the PHP documentation there's no reasons to have problems.
If you have a realistic scenario where it may be a problem, post the scenario here or report it to PHP core team.
Other unset() is the best way to clear vars.
Point is that memory_get_usage(true) shows the memory allocated to your PHP process, not the amount actually in use. System could free unused part once it is required somewhere.
More details on memory_get_usage could be found there
If you run that with memory_get_usage(false), you will see that array was actually gc'd. example
I'm curious about using of unset() language construct just about everywhere, where I took memory or declare some variables (regardless of structure).
I mean, when somebody declares variable, when should it really be left for GC, or be unset()?
Example 1:
<?php
$buffer = array(/* over 1000 elements */);
// 1) some long code, that uses $buffer
// 2) some long code, that does not use $buffer
?>
Is there any chance, that $buffer might affect performance of point 2?
Am I really need (or should) to do unset($buffer) before entering point 2?
Example 2:
<?php
function someFunc(/* some args */){
$buffer = new VeryLargeObject();
// 1) some actions with $buffer methods and properties
// 2) some actions without usage of $buffer
return $something;
}
?>
Am I really need (or should) to do unset($buffer) within someFunc()s body before entering point 2?
Will GC free all allocated memory (references and objects included) within someFunc()s scope, when function will come to an end or will find return statement?
I'm interested in technical explaination, but code style suggestions are welcome too.
Thanks.
In php, all memory gets cleaned up after script is finished, and most of the time it's enough.
From php.net:
unset() does just what it's name says - unset a variable. It does not
force immediate memory freeing. PHP's garbage collector will do it
when it see fits - by intention as soon, as those CPU cycles aren't
needed anyway, or as late as before the script would run out of
memory, whatever occurs first.
If you are doing $whatever = null; then you are rewriting variable's
data. You might get memory freed / shrunk faster, but it may steal CPU
cycles from the code that truly needs them sooner, resulting in a
longer overall execution time.
In reality you would use unset() for cleaning memory pretty rare, and it's described good in this post:
https://stackoverflow.com/a/2617786/1870446
By doing an unset() on a variable, you mark the variable for being "garbage collected" so the memory isn't immediately available. The variable does not have the data anymore, but the stack remains at the larger size.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass. (after doing gc_enable() first).
But you must understand that PHP is script language, it's not Java so you shouldn't consider it like one. If your script is really that heavy to use tons of RAM - you can use unset and when script is close to exceed the memory - GC will trigger and clean up everything useless, including your unset variables. But in most cases you can forget about it.
Also, if you would want to go for unsetting every variable you do not use - don't. It will actually make your script execute longer - by using more CPU cycles - for the sake of getting free memory that would, in most cases, would never be needed.
Some people also say that they use unset to explicitly show that they won't use variable anymore. I find it a bad practice too, for me it just makes code more verbose with all these useless unsets.
edit: wrong assumptions were made by my un-perfect self when I posted this question and I feel this question might be misleading.
The efficiency problem actually turned out to be unrelated to push_array.
The comments were helpful in helping me to understand that:
1)this should not take so long, and
2) diagnosing efficiency problems with microtime() is a good practice.
end edit
I am creating ~1400 objects in a test scenario. I think ~1400 will be within a magnitude of typical use.
public $t = Array();
...
for(...){
code...
for(...){
array_push($this->t, new T($i, $str)); //<--this line slows program.
count++;
}
code...
}
Unfortunately the script is taking about 90 seconds to run. If I comment out the one line of code with array_push, the script runs in about 1/6 the time, about 15 seconds.
The inner loop count varies, but averages about 3 to 15 cycles with one new object for each cycle.
Questions:
I am not an expert in PHP. I would like to know:
1) if it would help (and if so, how) to allocate memory space beforehand.
2) if there are any efficiency steps I should be taking that would help the script run faster or a data structure that would be more efficient then an array of objects. The newly created objects currently have two attributes, an integer and a string representing a single word (roughly averaging ~10 characters).
edit:
This is the constructor:
class T{
public $line;
public $text;
function __construct($ln, $txt){
$this->line = $ln;
$this->text = $txt;
}
}
The runtime depends on few different factors :
The server your using to run the script
Code efficiency of course - here's a great article about writing efficient php code
Of course there are more factors , but from previous exprience it shouldn't take that long , but still , the information I've got about the objects you're creating is not broad enough so I could detect where the problem is.
By the way, using PHP Accelerators such as APC or xcache might improve your code runtime.
In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.
Unfortunately, with each iteration (I traverse over an array containing the filenames), there seems to be some memory lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds about 10MB overall), but I am frequently rewriting (actually, strings in an array). Apparently, PHP does not free the memory correctly, thus using more and more RAM until it hits the limit.
Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?
it has to do with memory fragmentation.
Consider two strings, concatenated to one string. Each original must remain until the output is created. The output is longer than either input.
Therefore, a new allocation must be made to store the result of such a concatenation. The original strings are freed but they are small blocks of memory.
In a case of 'str1' . 'str2' . 'str3' . 'str4' you have several temps being created at each . -- and none of them fit in the space thats been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing the string creates a problem because the space can't be reused effectively. So you grow with each tmp you create. And you don't re-use anything, ever.
Using the array based implode, you create only 1 output -- exactly the length you require. Performing only 1 additional allocation. So its much more memory efficient and it doesn't suffer from the concatenation fragmentation. Same is true of python. If you need to concatenate strings, more than 1 concatenation should always be array based:
''.join(['str1','str2','str3'])
in python
implode('', array('str1', 'str2', 'str3'))
in PHP
sprintf equivalents are also fine.
The memory reported by memory_get_peak_usage is basically always the "last" bit of memory in the virtual map it had to use. So since its always growing, it reports rapid growth. As each allocation falls "at the end" of the currently used memory block.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.
Note: You need to have zend.enable_gc enabled in your php.ini enabled, or call gc_enable() to activate the circular reference collector.
Found the solution: it was a string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, thus effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding it with commas just before fputs-ing it to the outfile) circumvented this behavior.
For some reason - not obvious to me - PHP reported the increased memory usage during json_decode calls, which mislead me to the assumption that the json_decode function was the problem.
There's a way.
I had this problem one day. I was writing from a db query into csv files - always allocated one $row, then reassigned it in the next step. Kept running out of memory. Unsetting $row didn't help; putting an 5MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it + unsetting the whole thing in every 5000th step) didn't help. But it was not the end, to quote a classic.
When I made a separate function that opened the file, transferred 100.000 lines (just enough not to eat up the whole memory) and closed the file, THEN I made subsequent calls to this function (appending to the existing file), I found that for every function exit, PHP removed the garbage. It was a local-variable-space thing.
TL;DR
When a function exits, it frees all local variables.
If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000 and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)
Side note: for reference-passed variables it will obviously not work; a function can only free its inside variables that would be lost anyway on return.
I hope this saves your day as it saved mine!
I've found that PHP's internal memory manager is most-likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:
while (condition) {
// do
// cool
// stuff
}
to
while (condition) {
do_cool_stuff();
}
function do_cool_stuff() {
// do
// cool
// stuff
}
EDIT
I ran this quick benchmark and did not see an increase in memory usage. This leads me to believe the leak is not in json_decode()
for($x=0;$x<10000000;$x++)
{
do_something_cool();
}
function do_something_cool() {
$json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
$result = json_decode($json);
echo memory_get_peak_usage() . PHP_EOL;
}
I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem - since presumably the files are no longer mapped to zvars. But did you check that gc_enable was called before loading any files?
I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - this may be a similar problem. I'm not saying that this is a bug though.
I believe one workaround would be not to use file_get_contents but rather fopen()....fgets()...fclose() rather than mapping the whole file into memory in one go. But you'd need to try it to confirm.
HTH
C.
Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original (foreach()).
foreach( $x as &$y)
If PHP is actually leaking memory a forced garbage collection won't make any difference.
There's a good article on PHP memory leaks and their detection at IBM
There recently was a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.
Could you try using fread instead? I think this may solve your problem.
If it does, it's probably time to do a bugreport over at PHP.
I am aware of that All associated result memory is automatically freed at the end of the script's execution. But would you recommend using it, if I am using a quite of lot of somewhat similar actions as below?
$sql = "select * from products";
$result = mysql_query($sql);
if($result && mysql_num_rows($result) > 0) {
while($data = mysql_fetch_assoc($result)) {
$sql2 = "insert into another_table set product_id = '".$data['product_id']."'
, product_name = '".$data['product_name']."'
";
$result2 = mysql_query($sql2);
**mysql_free_result($result2);**
}
}
Thanks.
Quoting the documentation of mysql_free_result :
mysql_free_result() only needs to be
called if you are concerned about how
much memory is being used for queries
that return large result sets. All
associated result memory is
automatically freed at the end of the
script's execution.
So, if the documentation says it's generally not necessary to call that function, I would say it's not really necessary, nor good practice, to call it ;-)
And, just to say : I almost never call that function myself ; memory is freed at the end of the script, and each script should not eat too much memory.
An exception could be long-running batches that have to deal with large amounts of data, though...
Yes, it is good practice to use mysql_free_result($result). The quoted documentation in the accepted answer is inaccurate. That is what the documentation says, but that doesn't make any sense. Here is what it says:
mysql_free_result() only needs to be called if you are concerned about how much memory is being used for queries that return large result sets. All associated result memory is automatically freed at the end of the script's execution.
The first part of the first sentence is correct. It is true that you don't need to use it for reasons other than memory concerns. Memory concerns are the only reason to use it. However, the second part of the first sentence doesn't make any sense. The claim is that you would only be concerned about memory for queries that return large result sets. This is very misleading as there are other common scenarios where memory is a concern and calling mysql_free_result() is very good practice. Any time queries may be run an unknown number of times, more and more memory will be used up if you don't call mysql_free_result(). So if you run your query in a loop, or from a function or method, it is usually a good idea to call mysql_free_result(). You just have to be careful not to free the result until after it will not be used anymore. You can shield yourself from having to think about when and how to use it by making your own select() and ex() functions so you are not working directly with result sets. (None of the code here is exactly the way I would actually write it, it is more illustrative. You may want to put these in a class or special namespace, and throw a different Exception type, or take additional parameters like $class_name, etc.)
// call this for select queries that do not modify anything
function select($sql) {
$array= array();
$rs= query($sql);
while($o= mysql_fetch_object($rs))
$array[]= $o;
mysql_free_result($rs);
return $array;
}
// call this for queries that modify data
function ex($sql) {
query($sql);
return mysql_affected_rows();
}
function query($sql) {
$rs= mysql_query($sql);
if($rs === false) {
throw new Exception("MySQL query error - SQL: \"$sql\" - Error Number: "
.mysql_errno()." - Error Message: ".mysql_error());
}
return $rs;
}
Now if you only call select() and ex(), you are just dealing with normal object variables and only normal memory concerns instead of manual memory management. You still have to think about normal memory concerns like how much memory is in use by the array of objects. After the variable goes out of scope, or you manually set it to null, it become available for garbage collection so PHP takes care of that for you. You may still want to set it to null before it goes out of scope if your code does not use it anymore and there are operations following it that use up an unknown amount of memory such as loops and other function calls. I don't know how result sets and functions operating on them are implemented under the hood (and even if I did, this could change with different/future versions of PHP and MySQL), so there is the possibility that the select() function approximately doubles the amount of memory used just before mysql_free_result($rs) is called. However using select() still eliminates what us usually the primary concern of more and more memory being used during loops and various function calls. If you are concerned about this potential for double memory usage, and you are only working with one row at a time over a single iteration, you can make an each() function that will not double your memory usage, and will still shield you from thinking about mysql_free_result():
each($sql,$fun) {
$rs= query($sql);
while($o= mysql_fetch_object($rs))
$fun($o);
mysql_free_result($rs);
}
You can use it like this:
each("SELECT * FROM users", function($user) {
echo $user->username."<BR>";
});
Another advantage of using each() is that it does not return anything, so you don't have to think about whether or not to set the return value to null later.
The answer is of course YES in mysqli.
Take a look at PHP mysqli_free_result documentation:
You should always free your result with mysqli_free_result(), when your result object is not needed anymore.
I used to test it with memory_get_usage function:
echo '<br>before mysqli free result: '.memory_get_usage();
mysqli_free_result($query[1]);
echo '<br>after mysqli free result'.memory_get_usage();
And it is the result:
before mysqli free result:2110088
after mysqli free result:1958744
And here, we are talking about 151,344 bytes of memory in only 1000 rows of mysql table.
How about a million rows and how is it to think about large projects?
mysqli_free_result() is not only for large amount of data, it is also a good practice for small projects.
It depends on how large your queries are or how many queries you run.
PHP frees the memory at the end of the script(s) automatically, but not during the run. So if you have a large amount of data comming from a query, better free the result manually.
I would say: YES, it is good practice because you care about memory during the development or your scripts and that is what makes a good developer :-)