After reading about branch prediction, I tried to understand how it works.
I wrote a basic test (below).
I thought that after the first loop PHP would know how the array is structured and predict the result of the condition ($items[$i] <= 150000), but the second loop takes the same time as the first and does not seem to benefit from branch prediction.
Some questions:
Maybe PHP does not support branch prediction?
Maybe I did not understand branch prediction?
Maybe my test is wrong or has other performance issues?
My test:
$count = 300000;
$items = array();
$found = 0;
// build array
for ($i = 0; $i <= $count; $i++) {
    array_push($items, rand(0, $i));
}
// first loop, no prediction benefit expected
// ------------------------------------
$time_start = microtime(true);
for ($i = 0; $i <= $count; $i++) {
    $found = ($items[$i] <= 150000) ? $found + 1 : $found;
}
$time_end = microtime(true);
$time = $time_end - $time_start;
// ------------------------------------
echo "first loop\r\n";
echo 'Found:'. $found . "\r\n";
echo 'End:' . $time . "\r\n";
// reset counter
$found = 0;
// second loop, prediction benefit expected
// ------------------------------------
$time_start = microtime(true);
for ($i = 0; $i <= $count; $i++) {
    $found = ($items[$i] <= 150000) ? $found + 1 : $found;
}
$time_end = microtime(true);
$time = $time_end - $time_start;
// ------------------------------------
echo "second loop\r\n";
echo 'Found:'. $found . "\r\n";
echo 'End:' . $time . "\r\n";
Output:
first loop
Found:254052
End:0.093052864074707
second loop
Found:254052
End:0.092923879623413
http://codepad.org/Zni0b5rS
TL;DR
Branch prediction is a hardware feature.
The language that benefits from this hardware feature here is C, because PHP is written in C (or, strictly speaking, the machine code compiled from PHP's C sources). So yes, you are already benefiting from branch prediction, just not in the way you might think.
Benefiting from branch prediction happens at a far lower level. The key point is that upcoming machine(!) instructions are loaded into the CPU's pipeline before they get executed, and loading instructions takes time.
If a conditional jump instruction is about to be executed and the condition evaluates to true, the instructions already in the pipeline may no longer be the right ones, because the jump tells the CPU to continue loading instructions from a completely different location in the (machine!) code: the jump target.
This means the CPU has to flush the existing pipeline and refill it with instructions starting at the jump target's location. If there is another conditional jump at that location, the same thing can happen again. Note that conditional jumps are among the most common machine code instructions.
Without some countermeasure, the performance of the instruction pipeline would therefore be poor.
What if the CPU were smart enough to guess whether the condition will evaluate to true before it is actually evaluated? It could then start loading instructions from the jump target immediately after loading the jump instruction into the pipeline. Note that loading is not the same as executing! If the guess turns out to be wrong, the CPU still has to flush the pipeline and refill it with the correct instructions, but the chance that a flush is unnecessary is much better than with no guess at all.
There are multiple algorithms that implement this guessing, from static ones to dynamic approaches that take into account what the code has been doing so far. They are quite sophisticated and reach amazingly high success rates. The algorithms are explained on Wikipedia:
https://en.wikipedia.org/wiki/Branch_predictor
Returning to your PHP code: even the first loop most likely already benefits from branch prediction.
Yes, it is obvious that both loops do the same thing and would therefore only need to be executed once. However, PHP as an interpreted language operates at far too high a level to observe such hardware features, because too many things happen behind the scenes (probably even a task switch between the two loops).
By the way, if you had written comparable code in C, the C compiler would likely have optimized it away; gcc, for example, can detect a lot of such situations. (Amazing!) However, that would already happen at compile time, before the code ever runs.
If you really want to analyze branch prediction, assembly language and the GDB debugger can show it working in practice.
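If you still want to see what a branch-prediction-sensitive experiment looks like, the classic test compares the same loop over unsorted and sorted data, so that the conditional branch becomes highly predictable in the sorted case. Below is a minimal PHP sketch of that idea; as argued above, interpreter overhead will most likely mask any difference here, which is exactly the point.
// Sketch: the same condition over unpredictable vs. predictable data.
$count = 300000;
$items = array();
for ($i = 0; $i < $count; $i++) {
    $items[] = rand(0, $count);
}
$sorted = $items;
sort($sorted); // in the sorted copy, the branch outcome barely ever flips

foreach (array('unsorted' => $items, 'sorted' => $sorted) as $label => $data) {
    $found = 0;
    $start = microtime(true);
    foreach ($data as $value) {
        if ($value <= 150000) { // the conditional branch under test
            $found++;
        }
    }
    echo $label . ': ' . $found . ' found in ' . (microtime(true) - $start) . "s\r\n";
}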
Related
This is not a question about principles or common coding practices; it is a question about how PHP processes code, or more precisely, how it handles code that it should ignore, in the name of better understanding how PHP works.
Scenario 1:
if (1==2) { echo rand(0,99); }
Obviously, the code above will not produce any output, and that's not what the question is about; it is about whether PHP even considers producing any output. As PHP goes through the page, does it entirely skip the code belonging to the failed if-check, or does that code get allocated any resources beyond what its share of the file size accounts for?
Scenario 2:
if (1==2) { for ($x = 0; $x <= 999999; $x++) { echo rand(0,99); } }
Similar to scenario 1, but with a key difference to clarify the point: considering that 1==2 is always going to be false, does this code use any more resources than the previous one, or will they both be equally "cheap" to process? Or are there "hidden" actions that add up even if the code in the loop is as minimal as this?
Scenario 3:
for ($x = 0; $x <= 999999; $x++) { if (1==2) { echo rand(0,99); } }
Now, this one should see a false statement a million times, but how significant is that really in terms of resources? Will it keep checking if 1 is 2 or does PHP "learn" from the first time it checks? And does it spend any resources beyond that, or is a simple if-check like this inside a loop the only thing PHP will process? Will it "read" echo rand(0,99); a million times, even though 1 is not 2?
Scenario 4:
for ($x = 0; $x <= 999999; $x++) { if (1==2) { for ($y = 0; $y <= 999999; $y++) { echo rand(0,99); } } }
Finally, a combination of them all: will this example be a massive loop-in-a-loop level of resource wasting, or will the inner loop be completely excluded from processing? In other words, will 1!=2 cause PHP to entirely skip processing the inner loop, or will it waste memory on code that it should ignore? And how different is this scenario compared to the previous three in terms of processing and resources?
Thanks in advance for any PHP and memory-usage expertise on the matter; I hope the answer to this question will bring a better understanding of how PHP processes code to me and others.
EDIT:
Another somewhat relevant example would be that of having a large amount of comments within a loop compared to outside of it; would comments inside of a loop affect performance differently in any way (regardless of how "unnoticeable" you might consider it to be) than the same amount of comments outside of the loop?
1 & 2) Nothing inside these if blocks is evaluated.
3) PHP doesn't learn anything; it will perform 1 million if checks. This isn't hugely significant, but it's not insignificant either. As one commenter suggested, try it and see the page-time hit (see the sketch below).
4) This generates the same amount of processing as #3.
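A rough way to see that for yourself is to wrap scenarios 2 and 3 in microtime() calls; a minimal sketch (the absolute numbers will depend on your machine):
$start = microtime(true);
if (1 == 2) { for ($x = 0; $x <= 999999; $x++) { echo rand(0, 99); } } // scenario 2: loop never runs
echo 'Scenario 2: ' . (microtime(true) - $start) . "s\n";

$start = microtime(true);
for ($x = 0; $x <= 999999; $x++) { if (1 == 2) { echo rand(0, 99); } } // scenario 3: a million failed if checks
echo 'Scenario 3: ' . (microtime(true) - $start) . "s\n";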
This question already has answers here:
Commenting interpreted code and performance
I have had this doubt for many days. In every PHP script I write, I add many comments, lots of whitespace, and 4 to 5 empty lines between sections (to make it clearer for me).
Will all these spaces, comments and empty lines make my PHP code run slower?
Personal experiences are much appreciated :)
This is really a matter of I/O and hard drive speed. If your bare file is 10 KB and comments and line breaks add 4 KB, then the extra time the hard drive spends reading those extra kilobytes is what you would need to benchmark, and it is negligible; not even worth your time.
If you start getting into micro-optimization then you run the risk of making your code absolutely horrid to read and maintain.
The best way to speed up your code is to re-factor code where necessary and don't do silly things that obviously hog resources like this crude example:
<?php
$arr = array(); // pretend it has 50,000 items

// GOOD IDEA: count the array once and reference that number
$arr_count = count($arr);
for ($i = 0; $i < $arr_count; $i++) {
    echo $arr[$i];
}

// BAD IDEA: re-counting the array on every iteration
for ($i = 0; $i < count($arr); $i++) {
    echo $arr[$i];
}
?>
Also, unsetting a large array after you are done using it is better than waiting for the garbage collector to kick in. For example: pull data from the DB, loop through it, unset the data when done, and keep coding.
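A hypothetical sketch of that pattern (the data-fetching helper below is made up and just stands in for "pull a large result set from the DB"):
$rows = fetch_all_rows_somehow(); // made-up helper returning a big array from the DB
foreach ($rows as $row) {
    echo $row['name'];
}
unset($rows); // release the memory now instead of waiting for the garbage collector
// ... keep coding; the large array no longer occupies memory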
Comments and white space are completely ignored when the code is run. You can think of all that extra stuff as being completely wiped away once you're done and the code is doing its thing.
Extra white space and comments are solely there for you and fellow coders to be better able to read and understand your code. In fact, if you don't use extra white space and comments, coders will get angry with you for writing and providing terrible code!
Consider the following code.
<?php
$time = round(microtime(true) * 1000);
for ($i = 0; $i < 1000000; $i++) {
    /*
     */
}
echo (round(microtime(true) * 1000) - $time) . "<br/>";

$time = round(microtime(true) * 1000);
for ($i = 0; $i < 1000000; $i++) {
}
echo (round(microtime(true) * 1000) - $time) . "<br/>";
?>
Sometimes the first one is faster and sometimes the second one is, so comments do not affect the speed.
Not really; that is the simple answer for general scripts and coding.
If you ever had to consider gaining a few milliseconds here and there, and removing comments was actually effective, then A) you have too many comments, and B) you would already know all about it and be running benchmarks, etc.
The amount of comments is usually proportionate to the amount of code you have, i.e. a line or two of comments for a block of if/else logic, assignments from POST or SESSION variables, DB queries, and so on. And since the majority of PHP's time on a script goes into opening the file, accessing memory, checking caches, reading and executing the code, accessing the database and so on, the time taken to skip over your comments is probably 0.001%.
Comments are used by you, and possibly other developers, to understand the code. Just keep them neat and try to keep them as short as possible while remaining concise, factual and useful.
I have a PHP script that loops many times. Is there a way in PHP to tell whether the current iteration is the last one? The script is rather complex (1700 lines) and I can't locate the snippet responsible for running the script from the beginning.
Ideally I'm looking for a function (placed at the end of the file) which predicts whether or not the script is going to run again from the beginning (as it currently does). Of course, other solutions are welcome. The number of iterations varies.
UPDATE:
Sorry, it's not the loop that causes the script to start over. There is something else (that I can't identify) that makes the page run from the beginning.
I assume you mean something like this:
$max = 10;
for ($i = 0; $i <= $max; $i++) {
    if ($i == $max) {
        // last iteration
    }
}
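If the loop is a foreach rather than a counted for, the same idea can be expressed with a counter; just a sketch:
$items = array('a', 'b', 'c');
$total = count($items);
$n = 0;
foreach ($items as $item) {
    $n++;
    if ($n === $total) {
        // last iteration
    }
}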
I often use the function sizeof($var) in my web application, and I'd like to know whether it is better (in terms of resources) to store this value in a new variable and use that, or to call the function every time; or maybe it makes no difference :)
TL;DR: it's better to set a variable and call sizeof() only once (IMO).
I ran some tests on the looping aspect of this small array:
$myArray = array("bill", "dave", "alex", "tom", "fred", "smith", "etc", "etc", "etc");
// A)
for ($i = 0; $i < 10000; $i++) {
    echo sizeof($myArray);
}

// B)
$sizeof = sizeof($myArray);
for ($i = 0; $i < 10000; $i++) {
    echo $sizeof;
}
With an array of 9 items:
A) took 0.0085 seconds
B) took 0.0049 seconds
With an array of 180 items:
A) took 0.0078 seconds
B) took 0.0043 seconds
With an array of 3600 items:
A) took 0.5-0.6 seconds
B) took 0.35-0.5 seconds
Although there isn't much of a difference, you can see that as the array grows, the gap becomes larger and larger. This has made me rethink my opinion, and from now on I'll be setting the variable before the loop.
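For reference, timings like the ones above can be taken with a simple microtime() wrapper around each loop. A sketch (the value is assigned to a variable here instead of echoed, so the output stays readable):
$myArray = range(1, 180); // vary the size to repeat the tests

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $x = sizeof($myArray); // A) call sizeof() on every iteration
}
echo 'A) ' . (microtime(true) - $start) . "s\n";

$start = microtime(true);
$sizeof = sizeof($myArray);
for ($i = 0; $i < 10000; $i++) {
    $x = $sizeof; // B) reuse the cached value
}
echo 'B) ' . (microtime(true) - $start) . "s\n";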
Storing a PHP integer takes 68 bytes of memory. That is a small enough amount that I'd rather worry about processing time than memory space.
In general, it is preferable to assign the result of a function you are likely to repeat to a variable.
In the example you suggested, the difference in processing cost between this approach and the alternative (repeatedly calling the function) would be insignificant. However, where the function in question is more complex, it would be better to avoid executing it repeatedly.
For example:
for ($i = 0; $i < 10000; $i++) {
    echo date('Y-m-d');
}
Executes in 0.225273 seconds on my server, while:
$date = date('Y-m-d');
for ($i = 0; $i < 10000; $i++) {
    echo $date;
}
executes in 0.134742 seconds. I know these snippets aren't quite equivalent, but you get the idea. Over many page loads by many users over many months or years, even a difference of this size can be significant. If we were to use some complex function, serious scalability issues could be introduced.
The main advantage of not assigning a return value to a variable is that you need one less line of code. In PHP, though, we can commonly do the assignment at the same time as invoking the function:
$sql = "SELECT...";
if(!$query = mysql_query($sql))...
...although this is sometimes discouraged for readability reasons.
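Spelled out a bit more fully, that combined assign-and-test idiom might look like this (the query and column names are only placeholders):
$sql = "SELECT id, name FROM users"; // placeholder query
if (!($result = mysql_query($sql))) { // assign and test in one expression
    die('Query failed: ' . mysql_error());
}
while ($row = mysql_fetch_object($result)) {
    echo $row->name;
}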
In my view for the sake of consistency assigning return values to variables is broadly the better approach, even when performing simple functions.
If you are calling the function over and over, it is probably best to keep this info in a variable. That way the server doesn't have to keep recomputing the answer; it just looks it up. If the result is likely to change, however, it is best to keep calling the function.
Since you allocate a new variable, this will take a tiny bit more memory. But it might make your code a tiny bit faster.
The trouble it can bring, though, could be big. For example, if you include another file that applies the same trick, and both store the size in a variable called $sizeof, bad things might happen: strange bugs that appear when you least expect them, or you forget to add global $sizeof in your function.
There are so many possible bugs you could introduce, and for what? Since the speed gain is likely not even measurable, I don't think it's worth it.
Unless you are calling this function a million times your "performance boost" will be negligible.
I do not think it really matters. In principle you do not want to perform the same computation over and over again, but considering that it is sizeof(), unless it is an enormous array you should be fine either way.
I think you should avoid constructs like:
for ($i = 0; $i < sizeof($array); $i += 1) {
    // do stuff
}
Here, sizeof() will be executed on every iteration, even though the size is often not going to change.
Whereas in constructs like this:
while (sizeof($array) > 0) {
    if ($someCondition) {
        $entry = array_pop($array);
    }
}
You often have no choice but to calculate it every iteration.
I recently ran across some code in which the person used the first approach below. I'd like some thoughts on whether the top one is better, or why a person would write it that way. Are there any advantages over the bottom way?
$result = mysql_query($query) or die("Obtaining location data failed!");
for ($i = mysql_num_rows($result) - 1; $i >= 0; $i--) {
    if (!mysql_data_seek($result, $i)) {
        echo "Cannot seek to row $i\n";
        continue;
    }
    if (!($row = mysql_fetch_object($result))) {
        continue;
    }
    echo $row->locationname;
}
mysql_free_result($result);
vs
$result = mysql_query($query) or die("Obtaining location data failed!");
while ($row = mysql_fetch_object($result)) {
    echo $row->locationname;
    unset($row);
}
mysql_free_result($result);
It looks like the top code is iterating through the MySQL result backwards, whereas the second one goes through it forwards.
The second code example looks cleaner, and there is probably a way to adjust the query to get the results in reverse order in the first place, instead of the somewhat convoluted way the top loop was performed.
Those two are not equivalent since only the first processes the result set in reverse order.
I'd do that with an ORDER BY x DESC clause, if only to keep the code simple. When using mysql_query(), the complete result set is transferred from the MySQL server to the PHP process before the function returns, and mysql_data_seek() only moves a pointer within the process's memory, so performance-wise it shouldn't matter much. But if you at some point decide to use an unbuffered query instead, it might very well affect performance.
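In other words, something along these lines (the table name and sort column are placeholders) keeps the reverse order while using the simpler forward loop:
// Placeholder table/column names; the point is pushing the reversal into the query.
$result = mysql_query("SELECT locationname FROM locations ORDER BY id DESC")
    or die("Obtaining location data failed!");
while ($row = mysql_fetch_object($result)) {
    echo $row->locationname;
}
mysql_free_result($result);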
Definitely the second one:
less code = less code to maintain =~ maybe fewer bugs!
The top one has definite advantages when it comes to job security and 'lines of code' performance metrics. Apart from that there is no good reason to do what they did.