I would like to compare different pieces of PHP code to find out which one executes faster. I am currently using the following code:
<?php
$load_time_1 = 0;
$load_time_2 = 0;
$load_time_3 = 0;
for($x = 1; $x <= 20000; $x++)
{
//code 1
$start_time = microtime(true);
$i = 1;
$i++;
$load_time_1 += (microtime(true) - $start_time);
//code 2
$start_time = microtime(true);
$i = 1;
$i++;
$load_time_2 += (microtime(true) - $start_time);
//code 3
$start_time = microtime(true);
$i = 1;
$i++;
$load_time_3 += (microtime(true) - $start_time);
}
echo $load_time_1;
echo '<br />';
echo $load_time_2;
echo '<br />';
echo $load_time_3;
?>
I have executed the script several times.
The first result is
0.44057559967041
0.43392467498779
0.43600964546204
The second result is
0.50447297096252
0.48595094680786
0.49943733215332
The third result is
0.5283739566803
0.55247902870178
0.55091571807861
The result looks okay, but the problem is that every time I execute this code the result is different. Also, I am comparing the same code three times, on the same machine.
Why would there be a difference in speed while comparing? And is there a way to compare execution times and to see the real difference?
There is a thing called observational error.
As long as your numbers do not exceed it, all your measurements are just a waste of time.
The only proper way of doing measurements is called profiling, and it means measuring significant parts of the code, not senseless ones.
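For example (a minimal sketch, not part of the original answer): time one significant block of work as a whole instead of wrapping single statements in microtime() calls. The str_repeat() call below is just a stand-in for whatever you actually want to measure.
<?php
// Sketch: measure a whole block once, rather than accumulating
// per-statement microtime() deltas that drown in observational error.
$start = microtime(true);
for ($x = 1; $x <= 20000; $x++) {
    $dummy = str_repeat('a', 100); // stand-in for the real work under test
}
echo (microtime(true) - $start) . " seconds\n";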
Why would there be a difference in speed while comparing?
There are two reasons for this, which are both related to how things that are out of your control are handled by the PHP and the Operating system.
Firstly, the computer processor can only do a certain amount of operations at any given time. The operating system is basically responsible for handling the multitasking that divides these available cycles among your applications. Since these cycles aren't given out at a constant rate, small speed variations are to be expected even with identical PHP commands, because of how processor cycles are allocated.
Secondly, a bigger cause of time variations is the background operations of PHP. There are many things that are completely hidden from the user, like memory allocation, garbage collection and handling the various namespaces for variables and the like. These operations also take computer cycles and they can run at unexpected times during your script. If garbage collection is performed during the first incrementation but not the second, it causes the first operation to take longer than the second. Sometimes, because of garbage collection, the order in which the tests are performed can also impact the execution time.
Speed testing can be a bit tricky, because unrelated factors (like other applications running in the background) can skew the results of your test. Small speed differences between scripts are generally hard to tell apart, but when a speed test is run a sufficient number of times, the real results can be seen. For example, if one script is consistently faster than another, that usually points to the script being more efficient in terms of processing speed.
The reason the results vary is that there are other things going on at the same time, such as Windows or Linux background tasks and other processes. You will never get an exact result; you are best off running the code over 100+ iterations and then dividing the result to find the average time taken, and using that as your figure.
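A minimal sketch of that averaging idea (the inner $i++ block is just a placeholder for whatever code is being compared):
<?php
// Sketch: repeat the whole test many times and report the average,
// so one-off hiccups (GC runs, other processes) get smoothed out.
$runs = 100;
$total = 0;
for ($run = 1; $run <= $runs; $run++) {
    $start = microtime(true);
    for ($x = 1; $x <= 20000; $x++) {
        $i = 1; // code under test goes here
        $i++;
    }
    $total += (microtime(true) - $start);
}
echo 'Average: ' . ($total / $runs) . " seconds per run\n";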
Also, it would be beneficial to create a class that can handle this for you; that way you can use it all the time without having to write the code every time:
try something like this (untested):
class CodeBench
{
    private $benches = array();

    public function __construct(){}

    public function begin($name)
    {
        if(!isset($this->benches[$name]))
        {
            $this->benches[$name] = array();
        }
        $this->benches[$name]['start'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function end($name)
    {
        if(!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        $this->benches[$name]['end'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function calculate($name)
    {
        if(!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        if(!isset($this->benches[$name]['end']))
        {
            throw new Exception("You must first call end() for " . $name);
        }
        // microtime(true) values are in seconds, so the difference is too
        return ($this->benches[$name]['end']['microtime'] - $this->benches[$name]['start']['microtime']) . ' seconds';
    }
}
And then use like so:
$CB = new CodeBench();
$CB->begin("bench_1");
//Do work:
$CB->end("bench_1");
$CB->begin("bench_2");
//Do work:
$CB->end("bench_2");
echo "First benchmark took: " . $CB->calculate("bench_1");
echo "Second benchmark took: " . $CB->calculate("bench_2");
Computing speeds are never 100% set in stone. PHP is a server-side script, and thus depending on the computing power available to the server, it can take a varying amount of time.
Since you're subtracting from the start time with each step, it is expected that load time 3 will be greater than 2 which will be greater than 1.
Related
I'm relatively new to PHP and slowly learning the idiosyncrasies specific to the language. One thing I get dinged by a lot is that I (so I'm told) use too many function calls and am generally asked to do things to work around them. Here's two examples:
// Change this:
} catch (Exception $e) {
print "It seems that error " . $e->getCode() . " occured";
log("Error: " . $e->getCode());
}
// To this:
} catch (Exception $e) {
$code = $e->getCode();
print "It seems that error " . $code . " occured";
log("Error: " . $code);
}
2nd Example
// Change this:
$customer->setProducts($products);
// To this:
if (!empty($products)) {
$customer->setProducts($products);
}
In the first example I find that assigning $e->getCode() to $code adds a slight cognitive overhead; "What's '$code'? Ah, it's the code from the exception." Whereas the second example adds cyclomatic complexity. In both examples I find the optimization to come at the cost of readability and maintainability.
Is the performance increase worth it or is this micro optimization?
I should note that we're stuck with PHP 5.2 for right now.
I've done some very rough bench tests and find the function call performance hit to be on the order of 10% to 70%, depending on the nature of my bench test. I'll concede that this is significant. But before that catch block is hit there was a call to a database and an HTTP endpoint. Before $products was set on the $customer there was a complex sort that happened to the $products array. At the end of the day, does this optimization justify the cost of making the code harder to read and maintain? Or, although these examples are simplifications, does anybody find the second examples just as easy or easier to read than the first (am I being a wiener)?
Can anyone cite any good articles or studies about this?
Edit:
An example bench test:
<?php
class Foo {
private $list;
public function setList($list) {
$this->list = $list;
}
}
$foo1 = new Foo();
for($i = 0; $i < 1000000; $i++) {
$a = array();
if (!empty($a))
$foo1->setList($a);
}
?>
Run that file with the time command. On one particular machine it takes an average of 0.60 seconds after several runs. Commenting out the if (!empty($a)) causes it to take an average of 3.00 seconds to run.
Clarification: These are examples. The 1st example demonstrates horrible exception handling and a possible DRY violation at the expense of a simple, non-domain-specific example.
Nobody has yet discussed how the server's hardware is related to function call overhead.
When a function is called, all of the CPU's registers contain data relevant to the current execution point. All of the CPU's registers must be saved to memory (typically the process' stack) or there is no hope of ever returning to that execution point and resuming execution. When returning from the function, all of the CPU's registers must be restored from memory (typically the process' stack).
So, one can see how a string of nested function calls can add overhead to the process. The CPU's registers must be saved over and over again on the stack and restored over and over again to get back from the functions.
This is really the source of the overhead of function calls. And if function arguments are passed, those must all be duplicated before the function can be called. Therefore, passing huge arrays as function arguments is a poor design.
Studies have been done on object-oriented PHP on overhead of use of getters/setters. Removing all getters/setters cut execution time by about 50%. And that simply is due to function call overhead.
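As a rough illustration only (this toy sketch is not the study mentioned above and won't necessarily reproduce the 50% figure), a getter-versus-direct-access comparison could look like this:
<?php
// Sketch: compare direct property access against going through a getter.
class Point {
    public $x = 42;
    public function getX() { return $this->x; }
}
$p = new Point();
$n = 1000000;
$start = microtime(true);
for ($i = 0; $i < $n; $i++) { $v = $p->x; }      // direct access
$direct = microtime(true) - $start;
$start = microtime(true);
for ($i = 0; $i < $n; $i++) { $v = $p->getX(); } // via getter
$getter = microtime(true) - $start;
echo "direct: $direct s, getter: $getter s\n";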
PHP function call overhead is precisely 15.5355%.
:) Just stirring the pot.
Seriously, here are a couple great links on the subject:
Is it possible to have too many functions in a PHP application?
functions vs repeated code
The code-maintainability-versus-speed discussions at these links address the (maybe more important) question implied by the OP. But just to add a little data that may also be pertinent, and hopefully useful to people who come across this thread in the future, here are results from running the below code on a 2011 MacBook Pro (with very little drive space and too many programs running).
As noted elsewhere, an important consideration in deciding whether to call a function or put the code "in-line" is how many times the function will be called from within a certain block of code. The more times the function will be called, the more it's worth considering doing the work in-line.
Results (times in seconds)
Call Function Method | In-Line Method | Difference | Percent Difference
1,000 iterations (4 runs)
0.0039088726043701 | 0.0031478404998779 | 0.00076103210449219 | 19.4694
0.0038208961486816 | 0.0025999546051025 | 0.0012209415435791 | 31.9543
0.0030159950256348 | 0.0029480457305908 | 6.7949295043945E-5 | 2.2530
0.0031449794769287 | 0.0031390190124512 | 5.9604644775391E-6 | 0.1895
1,000,000 iterations (4 runs)
3.1843111515045 | 2.6896121501923 | 0.49469900131226 | 15.5355
3.131945848465 | 2.7114839553833 | 0.42046189308167 | 13.4249
3.0256152153015 | 2.7648048400879 | 0.26081037521362 | 8.6201
3.1251409053802 | 2.7397727966309 | 0.38536810874939 | 12.3312
function postgres_friendly_number($dirtyString) {
$cleanString = str_ireplace("(", "-", $dirtyString);
$badChars = array("$", ",", ")");
$cleanString = str_ireplace($badChars, "", $cleanString);
return $cleanString;
}
//main
$badNumberString = '-$590,832.61';
$iterations = 1000000;
$startTime = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
$goodNumberString = postgres_friendly_number($badNumberString);
}
$endTime = microtime(true);
$firstTime = ($endTime - $startTime);
$startTime = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
$goodNumberString = str_ireplace("(", "-", $badNumberString);
$badChars = array("$", ",", ")");
$goodNumberString = str_ireplace($badChars, "", $goodNumberString);
}
$endTime = microtime(true);
$secondTime = ($endTime - $startTime);
$timeDifference = $firstTime - $secondTime;
$percentDifference = (( $timeDifference / $firstTime ) * 100);
// Output the figures used in the results table above.
echo $firstTime . " | " . $secondTime . " | " . $timeDifference . " | " . $percentDifference . "\n";
The canonical PHP implementation is very slow because it's easy to implement and the applications PHP aims at do not require raw performance like fast function calls.
You might want to consider other PHP implementations.
If you are writing the applications you should be writing in PHP (dumping data from a DB to the browser over a network), then the function call overhead is not significant. Certainly don't go out of your way to duplicate code because you were afraid using a function would be too much overhead.
I think you are confusing terminology. Generally, function call overhead
means the overhead involved in calling and returning from a function,
as opposed to processing inline. It's not the total cost of a function call.
It's just the cost of preparing the arguments and return values and doing the branch.
The problem you have is that PHP and other weakly typed scripting-style languages are really bad at determining whether functions have side effects. So rather than storing the result of a function as a temporary, they will make multiple calls. If the function is doing something complex, this will be very inefficient.
So, the bottom line is: call the function once and store and reuse the result!
Don't call the same function multiple times with the same arguments. (without a good reason)
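A tiny sketch of that advice (expensiveLookup() is a made-up placeholder for any costly, side-effect-free function):
<?php
// Sketch: call the function once, keep the result, and reuse it,
// instead of calling expensiveLookup() with the same argument several times.
function expensiveLookup($key) {
    usleep(1000);               // stand-in for a costly call (DB, parsing, ...)
    return strtoupper($key);
}
$value = expensiveLookup('customer_42');
echo "It seems the value is " . $value . "\n";
error_log("value was: " . $value);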
I have the 2 following codes.
1:
$i = 0;
while(1)
{
$i++;
echo "big text for memory usage ";
if ( $i == 50000 )
break;
}
echo "<br />" . memory_get_usage();
It echoes every time : 1626464
2:
$i = 0;
for(;;)
{
$i++;
echo "big text for memory usage ";
if ( $i == 50000 )
break;
}
echo "<br />" . memory_get_usage();
It echoes every time : 1626656
Can anybody explain this difference between the two memory usages, even though it is so small?
It's an implementation detail. With the for loop, PHP probably uses some space to store three pointers: one for the for initialization, one for the incrementation, and one for the stop condition. If you're on a 64-bit system, that kind of extra bookkeeping could plausibly account for the 192 extra bytes you're seeing (memory_get_usage() reports bytes; the three 64-bit pointers alone would only be 24 of them). Of course, it's hard to tell if I'm right without looking at the actual code.
The amount of memory difference you're seeing is very negligible and shouldn't be a concern. The way the two loops are compiled should not affect memory usage, but they may affect runtime speed (still negligible though).
For instance:
while(1): This will cause the compiler to check if 1 is true; if it isn't, it will jump to the end of your loop - if it is, it will process the loop's contents.
for(;;): A standard for-loop defines an iterator (which you skip here), checks if the iterator meets a condition (which you also skip), executes the body content, and then increments the iterator and jumps back to the condition check. The jumps are all still in place, but your code doesn't do anything at each step (hence the "endless loop" - until you manually break, of course).
As a pure guess at why the for-loop shows a very slight rise in memory: it could be because of the way a for-loop is supposed to work, with a defined/managed iterator. PHP could pre-allocate a little extra space to accommodate this iterator and its garbage collection even if you're not using it.
While logging some data using microtime() (using PHP 5), I encountered some values that seemed slightly out of phase in respect to the timestamp of my log file, so I just tried to compare the output of time() and microtime() with a simple script (usleep is just here in order to limit the data output):
<?php
for($i = 0; $i < 500; $i++) {
$microtime = microtime();
$time = time();
list($usec, $sec) = explode(" ", $microtime);
if ((int)$sec > $time) {
echo $time . ' : ' . $microtime . '<br>';
}
usleep(50000);
}
?>
Now, as $microtime is declared before $time, I expect it to be smaller, and nothing should ever be output; however, this obviously is not the case, and every now and then, $time is smaller than the seconds returned from microtime(), as in this example (truncated) output:
1344536674 : 0.15545100 1344536675
1344536675 : 0.15553900 1344536676
1344536676 : 0.15961000 1344536677
1344536677 : 0.16758900 1344536678
Now, this is just a small gap; however, I have observed some series where the difference is (quite) more than a second... so, how is this possible?
If you look at the implementations of time and microtime, you see they're radically different:
time just calls the C time function.
microtime has two implementations: If the C function gettimeofday is available (which it should be on a Linux system), it is called directly. Otherwise they pull off some acrobatics to use rusage to calculate the time.
Since the C time call is only precise up to a second, it may also intentionally use a low-fidelity time source.
Furthermore, on modern x86_64 systems, both C functions can be implemented without a system call by looking into certain CPU registers. If you're on a multi-core system, these registers may not exactly match across cores, and that could be a reason.
Another potential reason for the discrepancies is that NTPd (a time-keeping daemon) or some other user-space process is changing the clock. Normally, these kinds of effects should be avoided by adjtime.
All of these rationales are pretty esoteric. To further debug the problem, you should:
Determine OS and CPU architecture (hint: and tell us!)
Try to force the process on one core
Stop any programs that are (or may be) adjusting the system time.
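As an aside not covered above: if you are on PHP 7.3 or newer, hrtime() gives you a monotonic clock, so interval measurements cannot be disturbed by NTP or other clock adjustments at all. A minimal sketch:
<?php
// hrtime(true) returns nanoseconds from a monotonic clock (PHP 7.3+),
// which never jumps when the wall clock is adjusted.
$start = hrtime(true);
usleep(50000);
$elapsedNs = hrtime(true) - $start;
echo ($elapsedNs / 1e9) . " seconds\n";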
This could be due to the fact that microtime() uses floating point numbers, and therefore rounding errors may occur.
You can specify floating point numbers' precision in php.ini
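For example, the precision directive only controls how many significant digits PHP shows when a float is turned into a string; a quick sketch of the effect:
<?php
// 'precision' changes how floats are displayed, not how they are stored.
ini_set('precision', 10);
echo microtime(true), "\n"; // fewer significant digits shown
ini_set('precision', 17);
echo microtime(true), "\n"; // more significant digits shown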
Is there a table of how much "work" it takes to execute a given function in PHP? I'm not a compsci major, so maybe I don't have the formal background to know that "oh yeah, strings take longer to work with than integers" or anything like that. Are all steps/lines in a program created equal? I just don't even know where to start researching this.
I'm currently doing some Project Euler questions where I'm very sure my answer will work, but I'm timing out my local Apache server at a minute with my requests (and PE has said that all problems can be solved < 1 minute). I don't know how/where to start optimizing, so knowing more about PHP and how it uses memory would be useful. For what it's worth, here's my code for question 206:
<?php
$start = time();
for ($i=1010374999; $i < 1421374999; $i++) {
$a = number_format(pow($i,2),0,".","");
$c = preg_split('//', $a, -1, PREG_SPLIT_NO_EMPTY);
if ($c[0]==1) {
if ($c[2]==2) {
if ($c[4]==3) {
if ($c[6]==4) {
if ($c[8]==5) {
if ($c[10]==6) {
if ($c[12]==7) {
if ($c[14]==8) {
if ($c[16]==9) {
if ($c[18]==0) {
echo $i;
}
}
}
}
}
}
}
}
}
}
}
$end = time();
$elapsed = ($end-$start);
echo "<br />The time to calculate was $elapsed seconds";
?>
If this is a wiki question about optimization, just let me know and I'll move it. Again, not looking for an answer, just help on where to learn about being efficient in my coding (although cursory hints wouldn't be flat out rejected, and I realize there are probably more elegant mathematical ways to set up the problem)
There's no such table that's going to tell you how long each PHP function takes to execute, since the time of execution will vary wildly depending on the input.
Take a look at what your code is doing. You've created a loop that's going to run 411,000,000 times. Given the code needs to complete in less than 60 seconds (a minute), in order to solve the problem you're assuming each trip through the loop will take less than (approximately) .000000145 seconds. That's unreasonable, and no amount of using the "right" function will fix that. Try your loop with nothing in there:
for ($i=1010374999; $i < 1421374999; $i++) {
}
Unless you have access to science fiction computers, this probably isn't going to complete execution in less than 60 seconds. So you know this approach will never work.
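If you want to see that baseline for yourself, a quick sketch that just times the empty loop:
<?php
// Time the bare loop to get a lower bound for the 411 million iterations.
$start = microtime(true);
for ($i = 1010374999; $i < 1421374999; $i++) {
}
echo (microtime(true) - $start) . " seconds just to spin the loop\n";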
This is known as a brute-force solution to a problem. The point of Project Euler is to get you thinking creatively, both from a math and a programming point of view, about problems. You want to reduce the number of trips you need to take through that loop. The obvious solution will never be the answer here.
I don't want to tell you the solution, because the point of these things is to think your way through it and become a better algorithm programmer. Examine the problem, think about its restrictions, and think about ways to reduce the total number of numbers you'd need to check.
A good tool for taking a look at execution times for your code is xDebug: http://xdebug.org/docs/profiler
It's an installable PHP extension which can be configured to output a complete breakdown of function calls and execution times for your script. Using this, you'll be able to see what in your code is taking longest to execute and try some different approaches.
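A hedged sketch for checking the profiler setup from inside a script; note the ini directive names differ between Xdebug versions (xdebug.mode and xdebug.output_dir in Xdebug 3, xdebug.profiler_enable and xdebug.profiler_output_dir in Xdebug 2):
<?php
// Quick sanity check of the Xdebug profiler configuration.
if (extension_loaded('xdebug')) {
    echo 'Xdebug loaded. mode=', ini_get('xdebug.mode'),
         ', output_dir=', ini_get('xdebug.output_dir'), "\n";
} else {
    echo "Xdebug is not loaded; install and enable it to profile this script.\n";
}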
EDIT: now that I'm actually looking at your code, you're running 400 million+ regex calls! I don't know anything about Project Euler, but I have a hard time believing this code can be executed in under a minute on commodity hardware.
preg_split is likely to be slow because it's using a regex. Is there not a better way to do that line?
Hint: You can access chars in a string like this:
$str = 'This is a test.';
echo $str[0];
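Applied to the squares above, that hint might look roughly like this (a sketch; the plain string cast assumes the square still fits in a 64-bit integer, which it does for this range):
<?php
// Sketch: check digits via string offsets instead of preg_split().
$i = 1010101015;
$square = (string)($i * $i);  // "1020304060504030225"
if ($square[0] == '1' && $square[2] == '2' && $square[4] == '3') {
    echo "first three fixed digits match for $i\n";
}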
Try switching preg_split() to explode() or str_split(), which are faster.
First, here's a slightly cleaner version of your function, with debug output
<?php
$start = time();
$min = (int)floor(sqrt(1020304050607080900));
$max = (int)ceil(sqrt(1929394959697989990));
for ($i=$min; $i < $max; $i++) {
$c = (string)($i * $i); // cast to string so the digit positions can be checked by offset below
echo $i, ' => ', $c, "\n";
if ($c[0]==1
&& $c[2]==2
&& $c[4]==3
&& $c[6]==4
&& $c[8]==5
&& $c[10]==6
&& $c[12]==7
&& $c[14]==8
&& $c[16]==9
&& $c[18]==0)
{
echo $i;
break;
}
}
$end = time();
$elapsed = ($end-$start);
echo "<br />The time to calculate was $elapsed seconds";
And here's the first 10 lines of output:
1010101010 => 1020304050403020100
1010101011 => 1020304052423222121
1010101012 => 1020304054443424144
1010101013 => 1020304056463626169
1010101014 => 1020304058483828196
1010101015 => 1020304060504030225
1010101016 => 1020304062524232256
1010101017 => 1020304064544434289
1010101018 => 1020304066564636324
1010101019 => 1020304068584838361
That, right there, seems like it oughta inspire a possible optimization of your algorithm. Note that we're not even close, as of the 6th entry (1020304060504030225) -- we've got a 6 in a position where we need a 5!
In fact, many of the next entries will be worthless, until we're back at a point where we have a 5 in that position. Why bother calculating the intervening values? If we can figure out how, we should jump ahead to 1010101060, where that digit becomes a 5 again... If we can keep skipping dozens of iterations at a time like this, we'll save well over 90% of our run time!
Note that this may not be a practical approach at all (in fact, I'm fairly confident it's not), but this is the way you should be thinking. What mathematical tricks can you use to reduce the number of iterations you execute?
Or: Should I optimize my string operations in PHP? I tried to find an answer in PHP's manual, but I didn't get any hints.
PHP already optimises it - variables are assigned using copy-on-write, and objects are passed by reference. In PHP 4 it doesn't, but nobody should be using PHP 4 for new code anyway.
One of the most essential speed optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least 2 factors:
1. Less instantiations means less time spent on construction.
2. The less the amount of memory that the application uses, the less CPU cache misses there probably are.
For applications, where the speed is the #1 priority, there exists a truly tight bottleneck between the CPU and the RAM. One of the reasons for the bottleneck is the latency of the RAM.
PHP, Ruby, Python, etc. are affected by cache misses because they store at least some (probably all) of the run-time data of the interpreted programs in RAM.
String instantiation is one of the operations that is done pretty often, in relatively "huge quantities", and it may have a noticeable impact on speed.
Here's a run_test.bash of a measurement experiment:
#!/bin/bash
for i in `seq 1 200`;
do
/usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_instantiate=False; // 0.1624 seconds
$b_instantiate=True; // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar=array();
$s='This is a string.';
$n=10000;
$s_1=NULL;
for($i=0;$i<$n;$i++) {
if($b_instantiate) {
$s_1=''.$s;
} else {
$s_1=&$s;
}
// The rand is for avoiding optimization at storage.
array_push($ar,''.rand(0,9).$s_1);
} // for
echo($ar[rand(0,$n)]."\n");
?>
My conclusion from this experiment and one other experiment that I did with Ruby 1.8 is that it makes sense to pass string values around by reference.
One possible way to allow the "pass-strings-by-reference" approach to work at whole-application scope is to consistently create a new string instance whenever one needs to use a modified version of a string.
To increase locality, therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates the case for string concatenations:
<?php
// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_suboptimal=False; // 0.0611 seconds
$b_suboptimal=True; // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.
$n=1000;
$s_token="This is a string with a Linux line break.\n";
$s_whole='';
if($b_suboptimal) {
for($i=0;$i<$n;$i++) {
$s_whole=$s_whole.$s_token.$i;
} // for
} else {
$i_watershed=(int)round((($n*1.0)/2),0);
$s_part_1='';
$s_part_2='';
for($i=0;$i<$i_watershed;$i++) {
$s_part_1=$s_part_1.$i.$s_token;
} // for
for($i=$i_watershed;$i<$n;$i++) {
$s_part_2=$s_part_2.$i.$s_token;
} // for
$s_whole=$s_part_1.$s_part_2;
} // else
// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle=fopen('./it_might_have_been_a_served_HTML_page.txt','w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain a considerable amount of text, then one might want to think about the order in which different parts of the generated HTML are concatenated together.
A BSD-licensed PHP implementation and a Ruby implementation of the watershed string concatenation algorithm are available. The same algorithm can be (and has been, by me) generalized to speed up multiplication of arbitrary-precision integers.
Arrays and strings have copy-on-write behaviour. They are mutable, but when you assign them to a variable initially that variable will contain the exact same instance of the string or array. Only when you modify the array or string is a copy made.
Example:
$a = array_fill(0, 10000, 42); //Consumes 545744 bytes
$b = $a; // " 48 "
$b[0] = 42; // " 545656 "
$s = str_repeat(' ', 10000); // " 10096 "
$t = $s; // " 48 "
$t[0] = '!'; // " 10048 "
A quick google would seem to suggest that they are mutable, but the preferred practice is to treat them as immutable.
PHP strings are mutable (tested on PHP 7.4):
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
Test: PHP Sandbox
PHP strings are immutable.
Try this:
$a="string";
echo "<br>$a<br>";
echo str_replace('str','b',$a);
echo "<br>$a";
It echoes:
string
bing
string
If a string was mutable, it would have continued to show "bing".