Is there a way in PHP to figure out from where an object is being referenced, in order to find stale references not actually needed any more?
Some background:
I am debugging/optimizing a large system written in PHP, trying to reduce the memory footprint of the system when running some large batch processing jobs.
The flow is basically:
1) Set up some context/objects needed for all processing
2) Iterate N times, operating on objects related only to the objects set up in #1; there is no relation/coupling between the individual objects created in the loop
Given a big enough N, the system will always run out of memory, even though each object created in step #2 should be eligible for garbage collection once the processing of that specific object is done.
At the end of each iteration in step 2 I am doing the following:
debug_zval_dump($lObj);
echo gc_collect_cycles();
I consistently see the following results:
debug_zval_dump: refcount(3)
gc_collect_cycles: 0
The above makes me assume that for some reason there are some stale references to the object being kept somewhere in the system, but I'm having trouble finding them just by inspecting the code.
Any help greatly appreciated!
The short answer is that what you're doing is not possible. From a variable, it's impossible to figure out what other variables are pointing to it (well, impossible from PHP at least).
What I would suggest is setting up an object pool: you "release" an object back to the pool when you're done with it, so the pool knows whether it can be re-used (or thrown away if there are too many free objects).
In short, memory management needs to be cooperative across the different pieces of code involved. You can't expect it to just work transparently if copies of the objects are being stored on either side.
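A minimal sketch of what such a pool could look like (the Worker class and its reset() method are placeholders, not part of any existing library):

class Worker
{
    public $state = array();
    public function reset() { $this->state = array(); }   // drop per-iteration data
}

class ObjectPool
{
    private $free = array();

    public function acquire()
    {
        // Re-use a released object if one is available, otherwise create a new one.
        return $this->free ? array_pop($this->free) : new Worker();
    }

    public function release(Worker $obj)
    {
        $obj->reset();
        if (count($this->free) < 100) {   // cap how many idle objects are kept around
            $this->free[] = $obj;
        }
        // Anything over the cap is simply dropped and becomes collectable.
    }
}

$pool = new ObjectPool();
for ($i = 0; $i < 1000; $i++) {
    $worker = $pool->acquire();
    // ... process one item with $worker, keeping no other references to it ...
    $pool->release($worker);
}

The point is that both sides agree on who owns the object at any moment, instead of hoping the engine will figure it out.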
As for debug_zval_dump(), I would be very wary of trusting it. To get an accurate count you would need to pass the variable by reference, which you can no longer do at call time in 5.4+. So if the variable is a reference, it will always report a refcount of 1; and if it's not a reference, the reported refcount will be one higher than the real one, because the call itself adds a reference. It's useful in some edge-case scenarios, but I wouldn't rely on it for anything...
Related
I have a PHP script to scrape a website (text files only). After running for a few hours I noticed that the script stops when it reaches the memory limit. I know I can increase the limit, but since the files the script loads are only HTML files, I can only explain hitting the limit by the script's inability to free memory after each loop. Could I optimize my script's memory management by flush()ing its memory regularly?
In general, you shouldn't need to manually manage memory in PHP, as it has a high-level Memory Manager built in to the Zend Engine which takes care of this for you. However, it is useful to know a bit about how this works in order to better understand why your code is running out of memory.
As a very basic overview, PHP frees memory based on a "refcount" of how many variables are referencing a particular piece of data. So if you say $a = 'hello'; $b = $a;, a single piece of memory containing the string 'hello' will have a refcount of 2. If you call unset() on either variable, or they fall out of scope (e.g. at the end of the function they were defined in), the refcount will decrease. Once the refcount reaches zero, the data will be deleted and the memory freed. Note that "freed" in this case means freed for use by other parts of that PHP script, not necessarily freed back to the Operating System for use by other processes.
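A small illustration of the idea (the exact numbers printed will vary by PHP version and build):

// Rough illustration of refcounting with a large string.
$a = str_repeat('x', 1000000);   // ~1 MB of data, refcount 1
$b = $a;                         // same data now referenced twice, no copy is made
echo memory_get_usage() . "\n";

unset($a);                       // refcount drops to 1; the memory is NOT freed yet
echo memory_get_usage() . "\n";

unset($b);                       // refcount reaches 0; the string is freed
echo memory_get_usage() . "\n";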
There are a few differences between PHP versions worth knowing:
The reference counting mechanism described above doesn't work if you have circular references (e.g. $obj1->foo = $obj2; $obj2->bar = $obj1;) because the reference count never reaches zero. In PHP 5.2 and earlier, this meant that such circular references led to memory leaks, and had to be manually handled by the programmer. In PHP 5.3, a "Garbage Collector" was added specifically to handle this case. It does not replace the normal refcount mechanism, but if circular references are common in your code, it may be worth reading up on (a short demonstration follows after these two points).
PHP 5.4 included a large number of optimizations to the way PHP allocates and uses memory. AFAIK, none of these change the fundamental recommendations of how to write efficient code, they are just a good reason to upgrade your PHP version if you can.
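To make the circular-reference point concrete, here is a small demonstration (automatic collection is disabled only so the effect is visible):

class Node
{
    public $other;
    public $payload;
}

gc_disable();   // turn off automatic collection so the leak is observable

for ($i = 0; $i < 10000; $i++) {
    $a = new Node();
    $b = new Node();
    $a->other = $b;                       // $a references $b ...
    $b->other = $a;                       // ... and $b references $a: a cycle
    $a->payload = str_repeat('x', 1024);
}
// Plain refcounting cannot free these objects: their refcounts never reach zero.
echo memory_get_usage() . "\n";

echo gc_collect_cycles() . " cycles collected\n";   // the PHP 5.3+ cycle collector can
echo memory_get_usage() . "\n";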
Other than that, there are a few common tips for writing PHP code that makes good use of memory:
Make sure unused variables are discarded when no longer needed. In a well-structured program, this is often a non-issue, because most variables will be local to a particular function; when the function exits, they will go out of scope, and be freed. But if you are creating large intermediate variables, or dynamically creating large numbers of variables, manually calling unset() may be a good idea. And if your code is very linear, or uses large numbers of global and static variables, just refactoring it into a more modular structure may improve its memory performance as well as its readability, maintainability, etc.
Assigning or passing a variable by reference ($foo = &$bar) may cause PHP to use more memory than a straight assignment ($foo = $bar). This is because PHP uses a "Copy On Write" mechanism to store variables with the same content in one location of memory, and reference assignment conflicts with this mechanism, so PHP has to copy the variable early (see the sketch after these tips).
Objects are more memory-hungry than scalar values (int, boolean, string) or arrays. This is one of the things that has been much improved in PHP 5.4, but is still worth thinking about - although obviously not to the exclusion of writing well-structured code!
You can unset variables as you no longer need them (e.g. unset($var) or $var = null). If you're on PHP 5.3 or later, you can also explicitly call the garbage collector: see gc_collect_cycles() and gc_enable().
Some functions seem to be worse than others. I recently found that array_merge_recursive() did horrible things to my code's memory footprint.
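To make the copy-on-write point concrete (the absolute numbers will vary; the jump after the reference assignment is the interesting part):

$big  = range(1, 100000);        // one large array
$copy = $big;                    // copy-on-write: the data is still stored only once
echo memory_get_usage() . "\n";

$ref = &$big;                    // taking a reference forces PHP to separate $big
                                 // from $copy, so the array gets copied now
echo memory_get_usage() . "\n";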
If you want to be able to analyse where the memory's going, you can use tools like Xdebug or XHProf/XHGui to help. e.g. Xdebug and tracing memory usage and Profiling with XHProf
See also:
Force freeing memory in PHP
php garbage collection while script running
I have written a PHP plugin that imports records into a database from an Endnote XML bibliography file. The import process involves several stages, one of them being the initial reading of the Endnote records into memory and creating an internal, object-based representation for them. Secondly, all records to be imported have to be scanned for whether their author, publication, keywords etc. already have corresponding records in the database.
I am running PHP 5.4.7 (64 bit) on OS X 10.8.2.
In order to accomplish these tasks in a speedy manner, I am doing almost all data storage in memory, as opposed to writing data out to a db or repeatedly consulting the database... all necessary data is read in once and consulted as needed.
This is of course memory-intensive. However, I have developed a number of strategies which have been quite effective in reducing the amount of memory used. In particular, I make extensive use of the native PHP serialization/unserialization facilities, together with zlib to compress the serialized representations. Even so, memory usage is still uncomfortably high, with maximum memory being exhausted after importing only 500 Endnote records.
To resolve this, I have tried using the unset() internal function to deallocate all variables, arrays and objects that I no longer need, as soon as I am done with them. Some of these objects are quite large when they are instantiated, which is why I dispose of them as soon as possible. However, after doing some memory use profiling, I am finding that the memory usage reported by memory_get_usage(true) is NOT going down, despite unsetting variables, enabling garbage collection using gc_enable() and requesting periodic garbage collection via gc_collect_cycles().
I have read other posts which indicate that so long as the reference counts for a particular variable have not gone down to zero, PHP will not free the associated memory, which I understand. I have designed my code to avoid circular references... each of the distinct memory-consuming objects has an independent set of internal storage arrays, none of which share data with other objects. Hence, upon destroying the host object, theoretically, all of its private data should be freed immediately. However, I am not seeing this happening.
If anyone wants to look at my code, it is available on Github.
1. The main unit test that puts the various high-memory usage objects through their paces, and measures memory usage (processing 500 Endnote records, it uses 122MB total) is found in the file /test/ObjectStoreTest.php
2. The routine for parsing the Endnote data and turning it into an object-based representation (uses about 15MB during the run) is found in /controller/ParseAndStoreEndnoteRecordsHandler.class.php
3. The class for discovering already-existing authors in the main database (seems to use up about 30MB) is found in /model/resolvers/CreatorExternalReferenceResolver.class.php
I give this information for reference, in case it is needed for answering the question... clearly, I don't expect anybody to spend half their day analyzing my code. Hopefully, however, this information will be sufficient in order to clearly specify the particular memory usage issue I am having.
The problem you're seeing is due to the fact that PHP's Garbage Collector has not kicked in yet, or the reference counts for the memory-consuming objects are not zero yet. The GC is more likely to kick in with a lower memory limit, but it looks like you need all of that memory space. I would set the memory limit as-is or higher and let the engine do its job.
The only true fix for this is to upgrade to the alpha version of PHP 5.5.0 and use the generators (co-routines) found in that build to keep the memory footprint down. They allow you to look at only the value of the object in question, without keeping that value in RAM when you move on to the next object. This lets the garbage collector do its job, because the reference counts of the finished objects drop to zero and they can be removed from memory.
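A rough sketch of what that could look like with a PHP 5.5 generator, streaming the Endnote records instead of holding them all in memory (the 'record' element name and the file name are assumptions, not taken from the actual plugin):

// Stream records one at a time with a generator (PHP 5.5+),
// instead of building one big in-memory array of parsed records.
function endnoteRecords($xmlFile)
{
    $reader = new XMLReader();
    $reader->open($xmlFile);
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'record') {
            yield $reader->readOuterXml();   // hand back one raw record
        }
    }
    $reader->close();
}

foreach (endnoteRecords('library.xml') as $recordXml) {
    // parse and import this single record; once the loop moves on, nothing
    // keeps a reference to it and its memory can be reclaimed
}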
Before starting, I'm not asking about standard coding practice or "etiquette." My question is more from curiosity with the internals of PHP. My research so far mostly seems to find people confused about scope in PHP.
Does re-using variables come with any benefit/detriment in PHP, either in memory or speed? For science.
Say you are sequentially accessing multiple files throughout a script.
Scenario A: your handles each have a variable, $file1, $file2, $file3, etc.
Scenario B: your handles all reuse the variable $fp
Would these theoretical scenarios only matter in very resource-intensive scripts? Will B allow garbage collection to get rid of the old handles while A won't? Will optimization by the Zend Engine make this a non-issue either way?
There is no cut-and-dried answer to this question. Optimization and performance will depend heavily on your particular codebase, the platform it runs on, what else is running on the server, and more.
To start with, your scenarios are too vague to allow an adequate answer. However, to touch on some of the prior comments/concerns...
PHP does not have very well defined rules for garbage collection. In THEORY, scenario A will release the memory when a function exits, thanks to garbage collection. In reality this rarely happens. There are a number of triggers that will cause garbage collection to release that memory, but behind the scenes the actual low-level free() and malloc() calls are not cut and dried. If you watch your memory stack closely you will find that after a function exits, the memory space for $file1, $file2, $file3 will remain, sometimes until the entire application exits.
Your application's construction will also determine which is faster: creating a new symbol table entry for each of $file1, $file2, $file3, or re-using $fp over and over. Re-using $fp, again IN THEORY, would typically mean the memory space does not need to be re-allocated, and a new symbol table entry and corresponding management object do not need to be re-created. However, this is not always the case. Sometimes re-using $fp can actually be slower, because the old value needs to be destroyed before the new object is created. In some corner cases it may be faster to just create a new $file1, $file2, $file3 on each iteration and let garbage collection happen all at once.
So, the bottom line of all this....
You need to analyze and test your own apps in their native environment to learn how things behave in YOUR playground. It is rarely an "always do this" or "never do that" scenario.
I'm not confident in my answer, but I have found that re-using variables saves memory, especially when re-using variables for query results, since those variables often end up holding a lot of other unwanted data.
You can echo memory_get_usage() at different stages of the code's execution to see the difference and compare.
But re-using variables could get confusing as your code grows, and it makes the code harder for people to read.
Also, PHP runs garbage collection when the script is done, so how you name your variables probably won't have anything to do with that; rather, it affects how much memory the script uses during execution.
Memory management is not something that most PHP developers ever need to think about. I'm running into an issue where my command line script is running out of memory. It performs multiple iterations over a large array of objects, making multiple database requests per iteration. I'm sure that increasing the memory ceiling may be a short term fix, but I don't think it's an appropriate long-term solution. What should I be doing to make sure that my script is not using too much memory, and using memory efficiently?
The golden rule
The number one thing to do when you encounter (or expect to encounter) memory pressure is: do not read massive amounts of data into memory at once if you intend to process it sequentially.
Examples:
Do not fetch a large result set in memory as an array; instead, fetch each row in turn and process it before fetching the next
Do not read large text files in memory (e.g. with file); instead, read one line at a time
This is not always the most convenient thing in PHP (arrays don't cut it, and there is a lot of code that only works on arrays), but in recent versions and especially after the introduction of generators it's easier than ever to stream your data instead of chunking it.
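For example, a minimal generator that streams a file line by line ('huge.log' is just a placeholder; the same pattern applies to fetching database rows one at a time with fetch() instead of fetchAll()):

// Yield one line at a time instead of loading the whole file with file().
function lines($path)
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($fh)) !== false) {
            yield $line;
        }
    } finally {
        fclose($fh);
    }
}

foreach (lines('huge.log') as $line) {
    // only the current line is held in memory
}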
Following this practice religiously will "automatically" take care of other things for you as well:
There is no longer any need to clean up resources with a big memory footprint by closing them and losing all references to them on purpose, because there will be no such resources to begin with
There is no longer a need to unset large variables after you are done with them, because there will be no such variables either
Other things to do
Be careful not to create closures inside loops; this should be easy to avoid, as creating closures inside loops is a bad code smell. You can always lift the closure out of the loop and give it more parameters; there is a sketch of this below.
When expecting massive input, design your program and pick algorithms accordingly. For example, you can mergesort any amount of text files of any size using a constant amount of memory.
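As a sketch of the closure point above (process() is a placeholder for whatever work each iteration actually does):

// process() stands in for the real per-item work.
function process($row) { return strtoupper($row); }

$rows = array('a', 'b', 'c');

// Smell: a new closure object is created on every iteration, and if these are
// stored somewhere (e.g. as callbacks), they all stay alive together.
$handlers = array();
foreach ($rows as $row) {
    $handlers[] = function () use ($row) {
        return process($row);
    };
}

// Lifted version: one closure, with the varying data passed in as a parameter.
$handler = function ($row) {
    return process($row);
};
foreach ($rows as $row) {
    $handler($row);
}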
You could try profiling it by putting some calls to memory_get_usage() in the code, to look for the place where memory usage peaks.
Of course, knowing what the code really does will give you more information for reducing its memory usage.
When you compute your large array of objects, try not to compute it all at once. Work in steps: process a batch of elements, free the memory, then take the next batch, as sketched below.
It will take more time, but you can manage the amount of memory you use.
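A rough sketch of that batching idea (loadBatch() is a placeholder standing in for however the objects are actually loaded; here it just simulates a data source with 1000 items):

function loadBatch($offset, $size)
{
    $total = 1000;
    if ($offset >= $total) {
        return array();
    }
    return range($offset, min($offset + $size, $total) - 1);
}

$offset = 0;
$size   = 100;

while ($batch = loadBatch($offset, $size)) {
    foreach ($batch as $object) {
        // ... process $object, run its database requests ...
    }
    unset($batch);            // drop the batch before loading the next one
    gc_collect_cycles();      // optional: clean up circular references (PHP 5.3+)
    $offset += $size;
}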
Can anybody give me an introduction to programming efficiently in PHP, minimizing memory usage so that my program generates its results using the minimum amount of memory?
Based on how I read your question, I think you may be barking up the wrong tree with PHP. It was never designed for a low memory overhead.
If you just want to be as efficient as possible, then look at the other answers. Remember that every single variable costs a fair bit of memory, so use only what you have to, and let the garbage collector work. Make sure that you only declare variables in a local scope so they can be GC'd when the program leaves that scope. Objects will be more expensive than scalar variables. But the biggest common abuse I see is multiple copies of data. If you have a large array, operate directly on it rather than copying it (it may be less CPU-efficient, but it should be more memory-efficient).
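A small illustration of working on the array in place instead of building a transformed copy:

$data = range(1, 100000);

// Builds a complete second array while $data still exists (roughly double the peak memory):
$doubled = array_map(function ($v) { return $v * 2; }, $data);
unset($doubled);

// Walks the same array by reference and keeps only one copy in memory:
foreach ($data as &$value) {
    $value *= 2;
}
unset($value);   // break the dangling reference left behind by foreach-by-reference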
If you are looking to run it in a low-memory environment, I'd suggest finding a different language to use. PHP is nice because it manages everything for you (with respect to variables). But that type coercion and flexibility comes at a price (speed and memory usage). Each variable requires a lot of metadata stored with it. An integer that takes 8 bytes to store in C will likely take more than 64 bytes in PHP (because of all of the "tracking" information associated with it, such as type, name, scoping information, etc.). That overhead is normally seen as OK, since PHP was not designed for large memory loads. So it's a trade-off: more memory used for easier programming. But if you have tight memory constraints, I'd suggest moving to a different language...
It's difficult to give advice with so little information on what you're trying to do and why memory utilization is a problem. In the common scenarios (web servers that serve many requests), memory is not a limiting factor, and it's preferable to serve the requests as fast as possible, even if this means sacrificing memory for speed.
However, the following general guidelines apply:
Unset your variables as soon as you no longer need them. In a well-written program, however, this won't have a big impact, as variables going out of scope have the same effect.
In long-running scripts with lots of variables holding circular references, and if you're using PHP 5.3, try calling the garbage collector explicitly at certain points.
First of all: Don't try to optimize memory usage by using references. PHP is smart enough not to copy the contents of a variable if you do something like this:
$array = array(1,2,3,4,5,);
$var = $array;
PHP will only copy the contents of the variable when you write to it. Using references all the time because you think they will save you copying the variable's contents can often backfire ;)
But I think your question is hard to answer unless you are more precise.
For example, if you are working with files, it is often advisable not to file_get_contents() the whole file, but to use the fopen()/fread()/fgets() family of functions to load only small parts of the file at a time, or even to skip whole chunks.
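For instance, something along these lines (the file name and chunk size are arbitrary):

// Read the file a small piece at a time instead of loading it whole.
$fh = fopen('large-file.txt', 'r');
if ($fh !== false) {
    while (!feof($fh)) {
        $chunk = fread($fh, 8192);   // 8 KB at a time
        // ... work on $chunk ...
    }
    // fseek($fh, $bytesToSkip, SEEK_CUR) could be used to skip whole chunks.
    fclose($fh);
}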
Or, if you are working with strings, make use of functions that return a string offset instead of the rest of the string (e.g. strcspn() instead of strpbrk()) when possible.
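For example (both calls scan the same string, but only one of them allocates a new string for the tail):

$haystack = 'short head?' . str_repeat('a', 1000000);

$tail   = strpbrk($haystack, '?!');   // returns everything from the '?' on: a new ~1 MB string
$offset = strcspn($haystack, '?!');   // returns just the integer position (10), no copy made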