Force freeing memory in PHP

In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.
Unfortunately, with each iteration (I traverse an array containing the filenames), some memory seems to be lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds about 10 MB overall), but I am frequently rewriting strings in an array. Apparently, PHP does not free the memory correctly, thus using more and more RAM until it hits the limit.
Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?

It has to do with memory fragmentation.
Consider two strings, concatenated to one string. Each original must remain until the output is created. The output is longer than either input.
Therefore, a new allocation must be made to store the result of such a concatenation. The original strings are freed but they are small blocks of memory.
In the case of 'str1' . 'str2' . 'str3' . 'str4', you have several temporaries being created at each '.' -- and none of them fit in the space that's been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem, because the space can't be reused effectively. You grow with each temporary you create, and you never reuse anything.
Using the array-based implode, you create only one output -- exactly the length you require -- performing only one additional allocation. So it's much more memory-efficient, and it doesn't suffer from concatenation fragmentation. The same is true of Python. If you need to concatenate strings, anything more than one concatenation should always be array-based:
in Python:
''.join(['str1', 'str2', 'str3'])
in PHP:
implode('', array('str1', 'str2', 'str3'))
sprintf equivalents are also fine.
The memory reported by memory_get_peak_usage() is basically always the "last" bit of memory in the virtual map it had to use. Since that is always growing, it reports rapid growth, as each allocation falls "at the end" of the currently used memory block.
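A minimal sketch of the two idioms (data made up; how much the peak differs depends on the PHP version and allocator):
$pieces = array();
for ($i = 0; $i < 100000; $i++) {
    $pieces[] = "chunk$i";
}

// Array-based: implode() makes one allocation of exactly the final length.
$out = implode('', $pieces);
echo memory_get_peak_usage(), PHP_EOL;

// Concatenation: every .= may allocate a new, larger temporary and copy.
$out = '';
foreach ($pieces as $p) {
    $out .= $p;
}
echo memory_get_peak_usage(), PHP_EOL;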

In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.
Note: you need to have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
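A minimal sketch (assuming the collector might be off in php.ini):
gc_enable();                      // turn the circular reference collector on
// ... create and drop structures with circular references ...
$collected = gc_collect_cycles(); // force a collection pass
echo "Collected $collected cycles", PHP_EOL;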

Found the solution: it was string concatenation. I was generating the output line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, thus effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding it with commas just before fputs-ing it to the outfile) circumvented this behavior.
For some reason - not obvious to me - PHP reported the increased memory usage during json_decode calls, which misled me to the assumption that the json_decode function was the problem.

There's a way.
I had this problem one day. I was writing from a db query into csv files - always allocating one $row, then reassigning it in the next step. Kept running out of memory. Unsetting $row didn't help; putting a 5 MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it + unsetting the whole thing on every 5000th step) didn't help. But it was not the end, to quote a classic.
When I made a separate function that opened the file, transferred 100,000 lines (just enough not to eat up the whole memory) and closed the file, then made subsequent calls to this function (appending to the existing file), I found that on every function exit, PHP removed the garbage. It was a local-variable-space thing.
TL;DR
When a function exits, it frees all local variables.
If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000 and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)
Side note: for reference-passed variables it will obviously not work; a function can only free its internal variables, which would be lost on return anyway.
I hope this saves your day as it saved mine!
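A rough sketch of the pattern; fetch_rows() and $total are placeholders for however you page through your query:
// Each call opens the file, writes a bounded batch, and returns;
// on return PHP frees the function's locals ($rows, $fh, $row).
function write_batch($path, $offset, $limit)
{
    $rows = fetch_rows($offset, $limit); // placeholder for your DB paging
    $fh = fopen($path, 'a');             // append to the existing file
    foreach ($rows as $row) {
        fputcsv($fh, $row);
    }
    fclose($fh);
}

for ($offset = 0; $offset < $total; $offset += 100000) {
    write_batch('out.csv', $offset, 100000);
}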

I've found that PHP's internal memory manager is most likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:
while (condition) {
    // do
    // cool
    // stuff
}
to
while (condition) {
    do_cool_stuff();
}

function do_cool_stuff() {
    // do
    // cool
    // stuff
}
EDIT
I ran this quick benchmark and did not see an increase in memory usage, which leads me to believe the leak is not in json_decode():
for ($x = 0; $x < 10000000; $x++) {
    do_something_cool();
}

function do_something_cool() {
    $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
    $result = json_decode($json);
    echo memory_get_peak_usage() . PHP_EOL;
}

I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem - since presumably the files are no longer mapped to zvars. But did you check that gc_enable() was called before loading any files?
I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - this may be a similar problem. I'm not saying that this is a bug though.
I believe one workaround would be not to use file_get_contents() but rather fopen()/fgets()/fclose(), rather than mapping the whole file into memory in one go. But you'd need to try it to confirm.
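An untested sketch of what I mean; since your files are gzipped, the gzopen()/gzgets() variants may be the more direct fit:
$fh = fopen($filename, 'r');             // or gzopen($filename, 'r')
while (($line = fgets($fh)) !== false) { // or gzgets($fh, 8192)
    // process one line at a time instead of holding the whole file
}
fclose($fh);                             // or gzclose($fh)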
HTH
C.

Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original:
foreach ($x as &$y)
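One caveat worth adding: the reference survives the loop, so break it afterwards:
foreach ($x as &$y) {
    // ... work with $y in place ...
}
unset($y); // otherwise $y still points at the last element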
If PHP is actually leaking memory a forced garbage collection won't make any difference.
There's a good article on PHP memory leaks and their detection at IBM

There was recently a similar issue with System_Daemon. Today I isolated my problem to file_get_contents().
Could you try using fread instead? I think this may solve your problem.
If it does, it's probably time to file a bug report over at PHP.

Related

Assigning NULL to a PHP variable doesn't clean the memory

I have read several times that, in order to invoke the garbage collector and actually clean the RAM used by a variable, you have to assign a new value (e.g. NULL) to it rather than simply unset() it.
This code, however, shows that the memory allocated for the array $a is not cleaned after the NULL assignment.
function build_array()
{
    for ($i = 0; $i < 10000; $i++) {
        $a[$i] = $i;
    }
    $i = null;
    return $a;
}
echo '<p>'.memory_get_usage(true);
$a = build_array();
echo '<p>'.memory_get_usage(true);
$a = null;
echo '<p>'.memory_get_usage(true);
The output I get is:
262144
1835008
786432
So part of the memory is cleaned, but not all the memory. How can I completely clean the RAM?
You have no way to definitively clear a variable from memory with PHP. It is up to the PHP garbage collector to do that when it sees fit.
Fortunately, the PHP garbage collector may not be perfect, but it is one of the best features of PHP. If you do things as per the PHP documentation, there's no reason to have problems.
If you have a realistic scenario where it may be a problem, post the scenario here or report it to the PHP core team.
Otherwise, unset() is the best way to clear variables.
The point is that memory_get_usage(true) shows the memory allocated to your PHP process, not the amount actually in use. The system can reclaim the unused part once it is required elsewhere.
More details on memory_get_usage() can be found in its documentation.
If you run that with memory_get_usage(false), you will see that the array was in fact gc'd.
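A quick way to see the difference between the two modes (the numbers will vary by build):
echo memory_get_usage(true), ' ', memory_get_usage(false), PHP_EOL;
$a = range(1, 10000);
echo memory_get_usage(true), ' ', memory_get_usage(false), PHP_EOL;
$a = null;
// true  = memory allocated to the process (may stay high)
// false = memory actually in use by PHP (drops once the array is freed)
echo memory_get_usage(true), ' ', memory_get_usage(false), PHP_EOL;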

Dump all variables in PHP

There's a memory leak in my script and I couldn't find it after 2 days. I found the loop that is causing the memory leak; each iteration of the loop increases the memory usage. I moved the loop into a function to isolate the variables. At the end of the function, I unset every variable created by the function, so that get_defined_vars() returns an empty array. Here's what I mean:
function the_loop() {
    $var = "value";
    ... // processing, including using a library
    unset($var);
    print_r(get_defined_vars()); // prints empty array
}

while (true) {
    the_loop();
    echo memory_get_usage() . "\n"; // steadily increases until the memory limit is reached
}
I'm guessing that some variables defined in the_loop() are still in memory. I tried using XDebug's trace tool, but it didn't help. All it showed was that memory usage increases on average over the long run. I'm looking for a tool that can show me all the values in PHP's memory. I will be able to recognize the variable based on the value. What tool can dump PHP's memory?
As Dragon mentioned, unset() doesn't instantly free memory.
What's better at freeing memory with PHP: unset() or $var = null
I'd consider re-evaluating the way you're using PHP; it's not designed for long/constantly running scripts, and the garbage handler simply isn't that great.
If you want to dig further into the executing script I'd suggest checking out some tools like:
https://github.com/jokkedk/webgrind
http://xhprof.io/
http://derickrethans.nl/xdebug-and-tracing-memory-usage.html
Also worth a read: What gc_collect_cycles function is useful for?
Calling unset() does not force garbage collection, so while the reference count should decrease, there may be other references to the value. Use xdebug_debug_zval() before calling unset() to see how many references to the value there actually are.
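A small sketch of what that looks like (requires Xdebug; note that the variable name is passed as a string):
$var = str_repeat('x', 1000);
$copy = $var;             // second reference to the same value
xdebug_debug_zval('var'); // prints something like: var: (refcount=2, is_ref=0)='xxx...'
unset($copy);
xdebug_debug_zval('var'); // refcount drops to 1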

PHP Memory Allocation and Deallocation

So, I am running a long-running script that is dealing with memory-sensitive data (large amounts of it). I think I am doing a good job of properly destroying large objects throughout the long-running process, to save memory.
I have a log that continuously outputs current memory usage (using memory_get_usage()), and I do not notice rises and drops (significant ones) in memory usage. Which tells me I am probably doing the right thing with memory management.
However, if I log on to the server and run a top command, I notice that the apache process that is dealing with this script never deallocates memory (at least not visibly through the top command). It simply remains at the highest memory usage, even if the current memory usage reported by PHP is much, much lower.
So, my question is: are my attempts to save memory futile if the memory isn't really being freed back to the server? Or am I missing something here?
Thank you.
ps. using php 5.4 on linux
pps. For those who want code, this is a basic representation:
function bigData()
{
    $obj = new BigDataObj();
    $obj->loadALotOfData();
    $varA = $obj->getALotOfData();

    // all done
    $obj = NULL;
    $varA = NULL;
    unset($obj, $varA);
}
Update: as hek2mgl recommended, I ran debug_zval_dump(), and the output, to me, seems correct:
function bigData()
{
    $obj = new BigDataObj();
    $obj->loadALotOfData();

    // all done
    $obj = NULL;
    debug_zval_dump($obj);
    unset($obj);
    debug_zval_dump($obj);
}
Output:
NULL refcount(2)
NULL refcount(1)
PHP has a garbage collector. It will free the memory of variable containers whose reference count has dropped to 0, meaning that no userland reference exists anymore.
I guess there are still references to variables which you think you have cleaned. I'd need to see your code to show you what the problem is.
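A hypothetical example of the kind of hidden reference meant here (the static registry is made up for illustration, not taken from the question's code):
class BigDataObj
{
    public static $instances = array(); // hypothetical internal registry

    public function loadALotOfData()
    {
        self::$instances[] = $this;     // keeps the refcount above zero
    }
}

$obj = new BigDataObj();
$obj->loadALotOfData();
$obj = null; // the static registry still references the object,
             // so its memory cannot be reclaimed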

When unset() should really be used?

I'm curious about the use of the unset() language construct just about everywhere I allocate memory or declare variables (regardless of structure).
I mean, when somebody declares a variable, when should it really be left for the GC, and when should it be unset()?
Example 1:
<?php
$buffer = array(/* over 1000 elements */);
// 1) some long code, that uses $buffer
// 2) some long code, that does not use $buffer
?>
Is there any chance that $buffer might affect the performance of point 2?
Do I really need to (or should I) unset($buffer) before entering point 2?
Example 2:
<?php
function someFunc(/* some args */){
$buffer = new VeryLargeObject();
// 1) some actions with $buffer methods and properties
// 2) some actions without usage of $buffer
return $something;
}
?>
Do I really need to (or should I) unset($buffer) within someFunc()'s body before entering point 2?
Will the GC free all the allocated memory (references and objects included) within someFunc()'s scope when the function comes to an end or hits a return statement?
I'm interested in technical explaination, but code style suggestions are welcome too.
Thanks.
In PHP, all memory gets cleaned up when the script finishes, and most of the time that's enough.
From php.net:
unset() does just what its name says - unsets a variable. It does not force immediate memory freeing. PHP's garbage collector will do it when it sees fit - by intention as soon as those CPU cycles aren't needed anyway, or as late as before the script would run out of memory, whichever occurs first.
If you are doing $whatever = null; then you are rewriting the variable's data. You might get the memory freed/shrunk faster, but it may steal CPU cycles from the code that truly needs them sooner, resulting in a longer overall execution time.
In reality you would use unset() for cleaning memory pretty rarely; it's described well in this post:
https://stackoverflow.com/a/2617786/1870446
By doing an unset() on a variable, you mark the variable for garbage collection, so the memory isn't immediately made available. The variable no longer holds the data, but the stack remains at the larger size.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass (after calling gc_enable() first).
But you must understand that PHP is a scripting language, not Java, so you shouldn't treat it like one. If your script is really heavy enough to use tons of RAM, you can use unset(); when the script is close to exceeding the memory limit, the GC will trigger and clean up everything useless, including your unset variables. But in most cases you can forget about it.
Also, if you are tempted to unset every variable you no longer use - don't. It will actually make your script execute longer, by using more CPU cycles, for the sake of freeing memory that in most cases would never be needed.
Some people also say that they use unset to explicitly show that they won't use a variable anymore. I find that a bad practice too; for me it just makes code more verbose with all these useless unsets.
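If you want to measure the difference on your own build rather than take anyone's word for it, a quick check might look like:
$a = range(1, 100000);
$before = memory_get_usage();
unset($a);
printf("unset() freed:     %d bytes\n", $before - memory_get_usage());

$b = range(1, 100000);
$before = memory_get_usage();
$b = null;
printf("null assign freed: %d bytes\n", $before - memory_get_usage());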

Memory leak in fgets

I have been playing with PHP sockets for some days while making a simple IRC bot for some other projects. The bot is up and running, but I noticed that after a couple of hours it will have eaten up all available memory.
I have been doing some debugging with memory_get_usage(), and after making sure that I null out all variables I use within my loops, the only thing that causes an increase in memory usage is fgets(), and I cannot seem to figure out why it won't release its memory after use.
Any ideas of what I have been doing wrong?
Pseudo-code:
$this->socket = stream_socket_client("tcp://$server:$port");
stream_set_blocking($this->socket, 0);
stream_set_timeout($this->socket, 600);

while (true) {
    usleep(500000);
    $data = fgets($this->socket, 8192);
    if (strlen($data) > 0) {
        // work with $data
    }
    $data = null;
}
Note that I have disabled blocking so that the bot can do some background tasks even when there is no activity on the channels it is watching.
Memory usage before and after calling fgets (the same result with stream_get_line):
int(959504)
string(0) "" // data returned from fgets
int(967736)
Note that I am testing against an SSL server; could this be some kind of SSL "overflow"?
Or if you want to look at the whole code for yourself: https://github.com/Ueland/VikingBot
According to https://bugs.php.net/bug.php?id=38962, this is a bug that was reproduced in PHP 5.2.6 specifically. So if you are using a higher version, you can report your findings :)
Simply setting a variable to null doesn't release the memory the previous data was using. It simply disconnects the data from the variable. At some point in the future, the PHP garbage collector MAY kick in and actually free up the memory, but it's not guaranteed to do so. Garbage collection is a very expensive operation, CPU-usage-wise, and PHP will not run the GC unless it absolutely has to. Usually this'd be when memory usage gets close to the memory_limit setting.
You can try to force a GC run via gc_collect_cycles()
Use stream_get_line() instead of fgets().
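For reference, the drop-in replacement would be something like:
// reads up to 8192 bytes, stopping at the delimiter (which is not included)
$data = stream_get_line($this->socket, 8192, "\n");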
Figured it out at last. I realized that instead of doing a while(true) loop, I had used a function that called itself when it was done, thereby keeping a reference to itself lying around. Dunno why I didn't notice it before now, but at least now the memory usage stays the same for every round. ;)
Thanks for all suggestions!
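For anyone hitting the same thing, the leaking shape versus the fix looks roughly like this (a sketch; names are illustrative):
// Leaky: each pass is a nested call, so every frame's locals stay
// alive until the whole chain unwinds -- which is never.
function tick($socket)
{
    $data = fgets($socket, 8192);
    // ... work ...
    tick($socket); // recursion instead of looping
}

// Fixed: one frame; locals are reclaimed on every iteration.
while (true) {
    $data = fgets($socket, 8192);
    // ... work ...
}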
