I got to wondering about the efficiency of this:
I have a csv file with about 200 rows in it, I use a class to filter/break up the csv and get the bits I want. It is cached daily.
I found that many descriptions (each can be up to ~500 chars) end with a hanging word "Apply" that needs chopping off.
Thinking that calling toString() on my object more than once would be bad practice, I created a temp var, $UJM_desc (this code is inside a loop):
// many descriptions have a hanging 'Apply' at the end; cut it off
$UJM_desc = $description->toString();
$hanging = substr($UJM_desc, -5);
if ($hanging == "Apply") {
    $UJM_desc = substr($UJM_desc, 0, -5);
}
$html .= '<p>' . $UJM_desc;
But I could have just called $description->toString() a couple of times. I am aware there is room to simplify this, maybe with a ternary, but still, I froze for a moment and thought I'd ask.
Call a method twice or use a temp var? Which is best?
I'd just use regex to strip off the end:
$html .= '<p>' . preg_replace('/Apply$/', '', $description->toString());
That said, if $description->toString() gives the same output no matter where you use it, there's absolutely no reason to call it multiple times, and a temporary variable will be the most efficient.
There's also no reason to save $hanging to a variable, as you only use it once.
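For completeness, a minimal sketch of the temp-var version with the check inlined as the ternary you allude to ($desc is just a shortened illustrative name):
$desc = $description->toString();
$html .= '<p>' . (substr($desc, -5) === 'Apply' ? substr($desc, 0, -5) : $desc);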
In general, it depends, and it's a tradeoff.
Keeping a calculated value in a variable takes up memory, and runs the risk of containing stale data.
Calculating the value anew might be slow, or expensive in some other way.
So it's a matter of deciding which resource is most important to you.
In this case, however, the temporary variable is so short-lived, it's definitely worth using.
Related
Let's say I have $variable holding more than 500 KB of info.
while ($row = mysqli_fetch_assoc($selectFromTable))
{
    $variable .= "<p>$row[info]</p>";
}
or
while ($row = mysqli_fetch_assoc($selectFromTable))
{
    echo "<p>$row[info]</p>";
}
Optimization-wise, is it better to echo the info right away than to save it to a variable?
I can't decide because I can't see the difference in performance, and I don't know what tool to use to measure the response time. Any suggestions?
Even if there isn't enough difference in performance to matter, I still want to learn how to optimize my code.
There is no significant difference in speed or memory usage between the two pieces of code you listed. They both build a new string that contains the value of $row['info'] enclosed in a <p> HTML element.
You can pass each string as an individual argument to echo:
echo "<p>", $row['info'], "</p>";
This avoids the creation of a new string, uses less memory and runs slightly faster (the speed improvement is not significant unless you do it thousands of times in a loop).
Read about the echo language construct.
Also, please note that a bare $row[info] outside a double-quoted string is not correct; the correct way is $row['info']. The documentation explains why: the unquoted word is treated as a constant, and only falls back to the string 'info' with a notice. (Inside a double-quoted string, as in your loops, the unquoted $row[info] is actually valid; alternatively use the curly-brace form {$row['info']}.)
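A quick illustration of the quoting rules (a contrived sketch; $row stands in for a fetched row):
echo $row['info'];            // correct outside a string
echo "<p>$row[info]</p>";     // inside double quotes, the unquoted key is the valid form
echo "<p>{$row['info']}</p>"; // or use the curly-brace syntax with quotes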
You need to do something with the variable eventually, instead of just storing data in it as in the first loop.
As written, the first loop (variable storage only) will always be faster, because operations with I/O (input/output) devices are slow, and echo writes output to the screen.
But if you add an echo of $variable after the first loop, the version with a single echo call will obviously be faster than echoing inside the loop.
I'm curious about the use of the unset() language construct just about everywhere I allocate memory or declare variables (regardless of structure).
I mean, when someone declares a variable, when should it really be left for the GC, and when should it be unset()?
Example 1:
<?php
$buffer = array(/* over 1000 elements */);
// 1) some long code, that uses $buffer
// 2) some long code, that does not use $buffer
?>
Is there any chance that $buffer might affect the performance of point 2?
Do I really need to (or should I) unset($buffer) before entering point 2?
Example 2:
<?php
function someFunc(/* some args */) {
    $buffer = new VeryLargeObject();
    // 1) some actions with $buffer methods and properties
    // 2) some actions without usage of $buffer
    return $something;
}
?>
Do I really need to (or should I) unset($buffer) within someFunc()'s body before entering point 2?
Will the GC free all allocated memory (references and objects included) within someFunc()'s scope when the function reaches its end or hits a return statement?
I'm interested in a technical explanation, but code style suggestions are welcome too.
Thanks.
In PHP, all memory gets cleaned up after the script finishes, and most of the time that's enough.
From php.net:
unset() does just what its name says - unset a variable. It does not force immediate memory freeing. PHP's garbage collector will do it when it sees fit - by intention as soon as those CPU cycles aren't needed anyway, or as late as before the script would run out of memory, whatever occurs first.

If you are doing $whatever = null; then you are rewriting the variable's data. You might get memory freed/shrunk faster, but it may steal CPU cycles from code that truly needs them sooner, resulting in a longer overall execution time.
In reality you would use unset() to free memory pretty rarely; this is described well in this post:
https://stackoverflow.com/a/2617786/1870446
By doing an unset() on a variable, you mark the variable for being "garbage collected" so the memory isn't immediately available. The variable does not have the data anymore, but the stack remains at the larger size.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass. (after doing gc_enable() first).
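For illustration, a minimal sketch of forcing a collection pass on a reference cycle (the kind of garbage the cycle collector exists for):
gc_enable();                  // make sure the circular reference collector is on
$a = new stdClass();
$b = new stdClass();
$a->other = $b;               // create a reference cycle
$b->other = $a;
unset($a, $b);                // the refcounts stay above zero because of the cycle
$freed = gc_collect_cycles(); // returns the number of cycles collected
echo "Collected $freed cycles\n";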
But you must understand that PHP is a scripting language; it's not Java, so you shouldn't treat it like one. If your script really is heavy enough to use tons of RAM, you can use unset, and when the script is close to exceeding the memory limit, the GC will trigger and clean up everything useless, including your unset variables. But in most cases you can forget about it.
Also, if you were planning to unset every variable you no longer use - don't. It will actually make your script run longer, by using more CPU cycles, for the sake of freeing memory that in most cases would never be needed anyway.
Some people also say that they use unset to explicitly show that they won't use a variable anymore. I find that a bad practice too; for me it just makes the code more verbose with all those useless unsets.
I have a very big array stored in memory after reading a whole file into an array (as hex) like this:
$bin_content = fread(fopen($filename, "r"), filesize($filename));
$hex_decode = explode(" ", chunk_split(bin2hex($bin_content), 2, " "));
unset($bin_content);

function myfunction($i) {
    global $hex_decode;
    // stuff here
}
When I create a function that uses this $hex_decode array as a global, the script runs like forever (very slowly), but if I call that function passing $hex_decode as a parameter (myfunction($i, $hex_decode) instead of myfunction($i), where $i is a pointer into the array), things are much faster.
Can anyone explain why? And is there any way to speed up the script by reading the file in a different way?
I need to have the whole file in that array rather than reading line by line, because I'm building a custom ASN.1 decoder and I need to have it all.
And is there any way to speed up the script by reading the file in a different way?
Personally, I'd use a stream filter to chunk-read and convert the file as it is read, rather than reading the entire file in one go and then converting and fixing it with the entire file in memory, handling any filtering and fixing of the ASN.1 structure within the stream filter.
I know this isn't a direct response to the actual question, but rather to the single quote above; but it could provide a less memory-hungry alternative.
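To make the idea concrete, here's a rough sketch of a user-defined stream filter that hex-encodes data as it is read, so the whole file never sits in memory at once (the class and filter names are illustrative, not from your code):
class HexEncodeFilter extends php_user_filter {
    public function filter($in, $out, &$consumed, $closing): int {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = bin2hex($bucket->data); // convert each chunk as it passes through
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('hexencode', 'HexEncodeFilter');

$fp = fopen($filename, 'rb');
stream_filter_append($fp, 'hexencode', STREAM_FILTER_READ);
while (!feof($fp)) {
    $chunk = fread($fp, 8192); // already hex-encoded by the filter
    // feed $chunk to the ASN.1 decoder here
}
fclose($fp);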
There was already a similar question on Stack Overflow; please take a look at:
The advantage / disadvantage between global variables and function parameters in PHP?
If your files can be huge, you could consider a memory-conservative approach. ASN.1 is usually encoded in type-length-value structures, which means you don't have to store the whole thing in memory, just the data you need to process at any given time.
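For instance, a rough sketch of reading one type-length-value element at a time from an open file handle; it assumes short-form DER lengths (under 128 bytes) for simplicity, and read_tlv is an illustrative name:
function read_tlv($fp) {
    $tag = fread($fp, 1);
    if ($tag === false || $tag === '') {
        return null; // end of stream
    }
    $len = ord(fread($fp, 1));                 // short-form length only, in this sketch
    $value = $len > 0 ? fread($fp, $len) : '';
    return array('tag' => ord($tag), 'length' => $len, 'value' => $value);
}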
If I were to say
echo $arr[some_index];
as opposed to saying
echo $arr['some_index'];
Will there be a significant amount of processor time/power lost to the error notice? I am aware that it is not proper syntax, but there is a huge amount of code written like this already on a project I am working on.
Well, it's simple enough to check. You can measure the execution time of any statement(s) like so:
$start = microtime(true);
// Do your code. Try an echo of one kind here.
$end = microtime(true);
echo ($end - $start); // The elapsed time in seconds, precise to about a microsecond.
Do one of those for each variant you'd like to test; whichever is consistently fastest is the one to go with.
You can also use memory_get_usage to determine how much memory has been used, before and after each call.
Now, you should also be getting a large number of notices. If a constant isn't defined, PHP treats it as a string instead, but throws a notice. Another problem is that if your key ever conflicts with a defined constant, you'll be checking the wrong value. It's really just not good practice; I'd go through and replace everything.
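To illustrate the collision pitfall (a contrived sketch):
define('some_index', 'other_key');
$arr = array('some_index' => 'a', 'other_key' => 'b');
echo $arr[some_index];   // prints 'b': the constant's value is used as the key
echo $arr['some_index']; // prints 'a'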
I think the performance impact would be negligible, however the purist in me would want to see consistent use of quotes/no quotes.
In a PHP program, I sequentially read a bunch of files (with file_get_contents), gzdecode them, json_decode the result, analyze the contents, throw most of it away, and store about 1% in an array.
Unfortunately, with each iteration (I traverse an array containing the filenames), there seems to be some memory lost (according to memory_get_peak_usage, about 2-10 MB each time). I have double- and triple-checked my code; I am not storing unneeded data in the loop (and the needed data hardly exceeds about 10 MB overall), but I am frequently rewriting strings in an array. Apparently, PHP does not free the memory correctly, and so it uses more and more RAM until it hits the limit.
Is there any way to do a forced garbage collection? Or, at least, to find out where the memory is used?
It has to do with memory fragmentation.
Consider two strings concatenated into one. Each original must remain until the output is created, and the output is longer than either input.
Therefore, a new allocation must be made to store the result of such a concatenation. The original strings are freed, but they are small blocks of memory.
In the case of 'str1' . 'str2' . 'str3' . 'str4' you have several temporaries being created at each . -- and none of them fit in the space that's been freed up. The strings are likely not laid out in contiguous memory (that is, each string is, but the various strings are not laid end to end) due to other uses of the memory. So freeing a string creates a problem because the space can't be reused effectively. You grow with each temporary you create, and you never reuse anything.
Using the array-based implode, you create only one output -- exactly the length you require -- performing only one additional allocation. So it's much more memory efficient, and it doesn't suffer from concatenation fragmentation. The same is true of Python. Anything beyond a single concatenation should always be array-based:
''.join(['str1','str2','str3'])
in Python
implode('', array('str1', 'str2', 'str3'))
in PHP
sprintf equivalents are also fine.
The memory reported by memory_get_peak_usage() is basically always the "last" bit of memory in the virtual map it had to use. So since that is always growing, it reports rapid growth, as each allocation falls "at the end" of the currently used memory block.
In PHP >= 5.3.0, you can call gc_collect_cycles() to force a GC pass.
Note: you need to have zend.enable_gc enabled in your php.ini, or call gc_enable() to activate the circular reference collector.
Found the solution: it was string concatenation. I was generating the input line by line by concatenating some variables (the output is a CSV file). However, PHP seems not to free the memory used for the old copy of the string, effectively clobbering RAM with unused data. Switching to an array-based approach (and imploding the array with commas just before fputs-ing it to the outfile) circumvented this behavior.
For some reason - not obvious to me - PHP reported the increased memory usage during json_decode calls, which misled me to the assumption that json_decode was the problem.
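For the record, a minimal sketch of the array-based approach (names like $outfile and $records are illustrative):
$out = fopen($outfile, 'w');
foreach ($records as $record) {
    // collect the fields in an array, then implode once per line
    $fields = array($record['a'], $record['b'], $record['c']);
    fputs($out, implode(',', $fields) . "\n");
}
fclose($out);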
There's a way.
I had this problem one day. I was writing from a db query into CSV files - always allocating one $row, then reassigning it in the next step. I kept running out of memory. Unsetting $row didn't help; putting a 5 MB string into $row first (to avoid fragmentation) didn't help; creating an array of $row-s (loading many rows into it + unsetting the whole thing every 5000th step) didn't help. But it was not the end, to quote a classic.
When I made a separate function that opened the file, transferred 100,000 lines (just enough not to eat up the whole memory) and closed the file, then made subsequent calls to this function (appending to the existing file), I found that for every function exit, PHP removed the garbage. It was a local-variable-space thing.
TL;DR
When a function exits, it frees all local variables.
If you do the job in smaller portions, like 0 to 1000 in the first function call, then 1001 to 2000, and so on, then every time the function returns, your memory will be regained. Garbage collection is very likely to happen on return from a function. (If it's a relatively slow function eating a lot of memory, we can safely assume it always happens.)
Side note: this obviously won't work for variables passed by reference; a function can only free its internal variables, which would be lost anyway on return.
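Here's a rough sketch of what I mean (the function and table names are illustrative, not from my actual code):
function export_chunk($db, $file, $offset, $limit) {
    $fp = fopen($file, 'a'); // append to the existing file
    $res = $db->query("SELECT info FROM big_table LIMIT $offset, $limit");
    $written = 0;
    while ($row = $res->fetch_assoc()) {
        fputs($fp, $row['info'] . "\n");
        $written++;
    }
    fclose($fp);
    return $written; // all locals are freed when the function returns
}

$offset = 0;
while (export_chunk($db, 'out.csv', $offset, 100000) === 100000) {
    $offset += 100000; // move on to the next slice
}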
I hope this saves your day as it saved mine!
I've found that PHP's internal memory manager is most likely to be invoked upon completion of a function. Knowing that, I've refactored code in a loop like so:
while (condition) {
    // do
    // cool
    // stuff
}
to
while (condition) {
    do_cool_stuff();
}

function do_cool_stuff() {
    // do
    // cool
    // stuff
}
EDIT
I ran this quick benchmark and did not see an increase in memory usage. This leads me to believe the leak is not in json_decode()
for ($x = 0; $x < 10000000; $x++)
{
    do_something_cool();
}

function do_something_cool() {
    $json = '{"a":1,"b":2,"c":3,"d":4,"e":5}';
    $result = json_decode($json);
    echo memory_get_peak_usage() . PHP_EOL;
}
I was going to say that I wouldn't necessarily expect gc_collect_cycles() to solve the problem - since presumably the files are no longer mapped to zvars. But did you check that gc_enable was called before loading any files?
I've noticed that PHP seems to gobble up memory when doing includes - much more than is required for the source and the tokenized file - this may be a similar problem. I'm not saying that this is a bug though.
I believe one workaround would be to use fopen()/fgets()/fclose() instead of file_get_contents, rather than mapping the whole file into memory in one go. But you'd need to try it to confirm.
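Something along these lines (a minimal sketch of the line-by-line alternative):
$fp = fopen($filename, 'r');
while (($line = fgets($fp)) !== false) {
    // process one line at a time instead of holding the whole file
}
fclose($fp);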
HTH
C.
Call memory_get_peak_usage() after each statement, and ensure you unset() everything you can. If you are iterating with foreach(), use a referenced variable to avoid making a copy of the original:
foreach ($x as &$y)
If PHP is actually leaking memory a forced garbage collection won't make any difference.
There's a good article on PHP memory leaks and their detection at IBM
There was recently a similar issue with System_Daemon. Today I isolated my problem to file_get_contents.
Could you try using fread instead? I think this may solve your problem.
If it does, it's probably time to file a bug report over at PHP.