I'm trying to work out which method would be faster (if either would be?). I've designed a test here: http://codepad.org/odyUN0xg
When I run that test my results are very inconsistent. Both vary wildly.
Is there a problem with the test?
Otherwise, can anyone suggest which one would or wouldn't be faster?
Edit
Okay guys, thanks. I've edited the codepad here: http://codepad.org/n1Xrt98J
With all the comments and discussion I've decided to go with array_sum. As soon as I used microtime(true) it started to look a lot faster (array_sum).
Cheers for the advice. Oh, and I've added a "for" loop so that results are more even, but as noted in the results there is little time saving, if any, over a foreach.
The problem is that you took a very low limit, 1000; at that size the overhead of the PHP interpreter is MUCH larger than the work being measured. I would take 100000000 or something like that.
However, I think array_sum is faster, since it's more specialized and probably implemented in fast C.
Oh, and as Michael McTiernan said you must change every instance of microtime() to microtime(true). http://php.net/manual/en/function.microtime.php
And finally, I wouldn't use codepad as testing environment since you have no control over it. You have no idea what happens and whether your process is paused or not.
To be honest, there's little value in using an artificial test and either way this sounds like fairly pointless micro-optimisation unless you've specifically identified this as a problem area after profiling the necessary code.
As such, it probably makes sense to use whichever feels more appropriate. I'd personally plump for array_sum - that's what it's there for, after all.
Change any instance of microtime() to microtime(true).
Also, after testing this, the results aren't that wildly different.
$s = microtime();
A call to microtime() with no arguments will return a string like this: 0.35250000 1300737802. You probably want this:
$s = microtime(TRUE);
array sum took 0.125188 seconds
sum numbers took 0.166603 seconds
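With the float return value, the elapsed time can be computed by plain subtraction; a minimal sketch (the usleep() call simply stands in for the code being timed):
<?php
// Minimal sketch: microtime(true) returns a float (seconds since the epoch),
// so the start and end values can be subtracted directly.
$s = microtime(true);
usleep(50000);                    // stand-in for the code under test
$elapsed = microtime(true) - $s;  // elapsed time in seconds, as a float
printf("took %.6f seconds\n", $elapsed);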
These kinds of tests need to be run a few thousand times so you can get large execution times that are not affected by tiny external factors.
You need much bigger runs, and need to average several of them. You should also separate the two tests into two files. The server has other things going on that will disturb test timing, hence the need to average many runs.
array_sum() should be the faster of the two, as there is no extra script parsing associated with it, but it's worth checking.
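As a rough illustration of the "bigger runs, averaged" advice, here is a minimal sketch; the array size and number of runs are arbitrary choices, not taken from the original test:
<?php
// Sketch: time array_sum() on a large array several times and report the average.
// The array size and run count are arbitrary illustrative values.
$array = range(1, 1000000);
$runs  = 5;
$total = 0.0;

for ($r = 0; $r < $runs; $r++) {
    $start  = microtime(true);
    $sum    = array_sum($array);
    $total += microtime(true) - $start;
}

printf("array_sum: %.6f seconds on average over %d runs (sum = %d)\n",
       $total / $runs, $runs, $sum);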
Almost always, array_sum is faster. The difference depends more on your server's php.ini configuration than on the actual choice between array_sum and foreach.
Coming to the point of this question, for a sample setup like the following:
<?php
set_time_limit(0);

$s = microtime(TRUE);
$array = range(1, 10000);
$sum = 0;
for ($j = 0; $j < 1000; $j++) {
    $sum += array_sum($array);
}
$s1 = microtime(TRUE);
$diff = $s1 - $s;
echo "for 1000 pass, array_sum took {$diff} seconds. Result = {$sum}<br/>";

$sum = 0;
$s2 = microtime(TRUE);
for ($j = 0; $j < 1000; $j++) {
    foreach ($array as $val) {
        $sum += $val;
    }
}
$s3 = microtime(TRUE);
$diff = $s3 - $s2;
echo "for 1000 pass, foreach took {$diff} seconds. Result = {$sum}<br/>";
I got results where foreach was always slower. So that should answer your question. Sample:
for 1000 pass, array_sum took 0.2720000743866 seconds. Result = 50005000000
for 1000 pass, foreach took 1.7239999771118 seconds. Result = 50005000000
Related
A question that has always puzzled me is why people write it like the first version when the second version is smaller and easier to read. I thought it might be because PHP calculates the strlen each time it iterates. Any ideas?
FIRST VERSION
for ($i = 0, $len = strlen($key); $i < $len; $i++) {}
You can obviously use $len inside the loop and further on in the code, but what are the benefits over the following version?
SECOND VERSION
for ($i = 0; $i < strlen($key); $i++) {}
It's a matter of performance.
The second version of the for loop will recalculate the strlen on every iteration, and thus performance can suffer.
Even though it may not seem significant, you could be surprised how quickly that repeated work adds up.
You can see here for some performance benchmarks with loops.
The first version is best used if the loop is expected to have many iterations and $key won't change in the process.
The second one is best used if the loop is updating $key and its length needs to be recalculated, or when recalculating it doesn't affect your performance.
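To see the effect for yourself, a minimal sketch along these lines (the test string length is an arbitrary value) times both versions:
<?php
// Sketch: compare caching strlen() once against calling it on every iteration.
// The length of the test string is an arbitrary choice.
$key = str_repeat('x', 100000);

$t = microtime(true);
for ($i = 0, $len = strlen($key); $i < $len; $i++) {}   // first version: length cached once
$cached = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < strlen($key); $i++) {}                // second version: strlen() on every pass
$recalculated = microtime(true) - $t;

printf("cached: %.6fs, recalculated: %.6fs\n", $cached, $recalculated);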
I am using PHP CLI to provide standard input. Am I using the optimal method of reading that input?
For example, I will provide it 50,000 lines of data. Each line contains two numbers. Is my code below the most efficient way to read 50,000 lines of data? Or is this a very inefficient way to do so?
Here is my code:
<?php
// Testing time period for execution
// Time tracker: TESTING
$micropoint1 = microtime(true);
// First, retrieve the number of points that will be provided.
$no_points = fgets(STDIN);
for ($i = 1, $max = $no_points + 1; $i < $max; $i++) {
    list($x, $y) = fscanf(STDIN, "%d %d"); // Parse the next line from standard input into two integers
}
// Time tracker: TESTING
$micropoint2 = microtime(true);
$pointelapsed = $micropoint2 - $micropoint1;
fwrite(STDOUT, "\nPoint Loop Took ".$pointelapsed." seconds\n");
?>
I can't imagine your approach getting any more efficient.
To be more efficient, it's better to:
Make your "%d %d" format string single-quoted: '%d %d'
Move this string into a variable/constant and reuse it across the 50,000 loop iterations
The loop itself is already very minimal and can't be trimmed much further. However, since you didn't specify that you want to optimize the loop only, here are a few points for the rest of the code.
When displaying the microtime, you can do this:
$pointelapsed = number_format(microtime(true) - $micropoint1, 7);
And also, what happens if fscanf() can't return anything? Wouldn't it give an error?
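Putting those suggestions together, a minimal sketch might look like this; the single-quoted format string is hoisted into a variable and the fscanf() return value is checked (the $format and $pair names are illustrative, not from the original code):
<?php
// Sketch: read N lines of "x y" pairs from STDIN with the format string
// defined once outside the loop, and check what fscanf() hands back.
$micropoint1 = microtime(true);

$no_points = (int) fgets(STDIN);
$format    = '%d %d';                 // single-quoted, created once, reused below

for ($i = 0; $i < $no_points; $i++) {
    $pair = fscanf(STDIN, $format);
    if (!is_array($pair)) {           // bail out if fscanf() did not return an array (e.g. end of input)
        fwrite(STDERR, "Unexpected input at line " . ($i + 1) . "\n");
        break;
    }
    list($x, $y) = $pair;
}

$pointelapsed = number_format(microtime(true) - $micropoint1, 7);
fwrite(STDOUT, "\nPoint loop took {$pointelapsed} seconds\n");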
I would like to compare different PHP code to know which one would be executed faster. I am currently using the following code:
<?php
$load_time_1 = 0;
$load_time_2 = 0;
$load_time_3 = 0;
for ($x = 1; $x <= 20000; $x++)
{
    //code 1
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_1 += (microtime(true) - $start_time);

    //code 2
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_2 += (microtime(true) - $start_time);

    //code 3
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_3 += (microtime(true) - $start_time);
}
echo $load_time_1;
echo '<br />';
echo $load_time_2;
echo '<br />';
echo $load_time_3;
?>
I have executed the script several times.
The first result is
0.44057559967041
0.43392467498779
0.43600964546204
The second result is
0.50447297096252
0.48595094680786
0.49943733215332
The third result is
0.5283739566803
0.55247902870178
0.55091571807861
The results look okay, but the problem is that every time I execute this code the result is different, even though I am comparing the same code three times on the same machine.
Why would there be a difference in speed while comparing? And is there a way to compare execution times and see the real difference?
There is a thing called observational error.
As long as your measured differences do not exceed it, all your measurements are just a waste of time.
The only proper way of doing measurements is called profiling, and it means measuring significant parts of the code, not meaningless ones.
Why would there be a difference in speed while comparing?
There are two reasons for this, which are both related to how things that are out of your control are handled by the PHP and the Operating system.
Firstly, the computer processor can only do a certain number of operations at any given time. The operating system is responsible for handling the multitasking that divides these available cycles among your applications. Since these cycles aren't given at a constant rate, small speed variations are to be expected even with identical PHP commands, because of how processor cycles are allocated.
Secondly, a bigger cause of time variations is the background work PHP itself does. There are many things that are completely hidden from the user, like memory allocation, garbage collection and handling various namespaces for variables and the like. These operations also take processor cycles and they can run at unexpected times during your script. If garbage collection is performed during the first incrementation, but not the second, it causes the first operation to take longer than the second. Sometimes, because of garbage collection, the order in which the tests are performed can also impact the execution time.
Speed testing can be a bit tricky, because unrelated factors (like other applications running in the background) can skew the results of your test. Generally, small speed differences between scripts are hard to tell apart, but when a speed test is run a sufficient number of times, the real results can be seen. For example, if one script is consistently faster than another, that usually points to the script being more efficient in terms of processing speed.
The reason the results vary is that there are other things going on at the same time, such as Windows or Linux background tasks and other processes. You will never get an exact result; you are best off running the code over 100 iterations, then dividing the total to find the average time taken, and using that as your figure.
Also, it would be beneficial for you to create a class that can handle this for you; this way you can use it all the time without having to write the code every time.
Try something like this (untested):
class CodeBench
{
    private $benches = array();

    public function __construct(){}

    public function begin($name)
    {
        if (!isset($this->benches[$name]))
        {
            $this->benches[$name] = array();
        }
        $this->benches[$name]['start'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function end($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        $this->benches[$name]['end'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function calculate($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        if (!isset($this->benches[$name]['end']))
        {
            throw new Exception("You must first call end() for " . $name);
        }
        // Subtract the stored microtime(true) floats, which are in seconds
        return ($this->benches[$name]['end']['microtime'] - $this->benches[$name]['start']['microtime']) . ' seconds';
    }
}
And then use like so:
$CB = new CodeBench();

$CB->begin("bench_1");
//Do work:
$CB->end("bench_1");

$CB->begin("bench_2");
//Do work:
$CB->end("bench_2");

echo "First benchmark took: " . $CB->calculate("bench_1");
echo "Second benchmark took: " . $CB->calculate("bench_2");
Computing speeds are never 100% set in stone. PHP is a server-side script, and thus depending on the computing power available to the server, it can take a varying amount of time.
Since you're subtracting from the start time with each step, it is expected that load time 3 will be greater than 2 which will be greater than 1.
Many, many times on a page I will have to set POST and GET values in PHP like this.
I just want to know if it is better to just continue doing it the way I have above, or if performance would not be touched by adding it into a function like in the code below?
This would make it much easier to write code, but at the expense of making extra function calls on the page.
I have all the time in the world, so making the code as fast as possible is more important to me than making it "easier to write or faster to develop".
Appreciate any advice, and please nothing about whichever makes it easier to develop; I am talking pure performance here =)
<?php
function arg_p($name, $default = null) {
    return (isset($_GET[$name])) ? $_GET[$name] : $default;
}
$pagesize = arg_p('pagesize', 10);
$pagesize = (isset($_GET['pagesize'])) ? $_GET['pagesize'] : 10;
?>
If you have all the time in the world, why don't you just test it?
<?php
// How many iterations?
$iterations = 100000;

// Inline
$timer_start = microtime(TRUE);
for ($i = 0; $i < $iterations; $i++) {
    $pagesize = (isset($_GET['pagesize'])) ? $_GET['pagesize'] : 10;
}
$time_spent = microtime(TRUE) - $timer_start;
printf("Inline: %.3fs\n", $time_spent);

// By function call
function arg_p($name, $default = null) {
    return (isset($_GET[$name])) ? $_GET[$name] : $default;
}
$timer_start = microtime(TRUE);
for ($i = 0; $i < $iterations; $i++) {
    $pagesize = arg_p('pagesize', 10);
}
$time_spent = microtime(TRUE) - $timer_start;
printf("By function call: %.3fs\n", $time_spent);
?>
On my machine, this gives pretty clear results in favor of inline execution by a factor of almost 10. But you need a lot of iterations to really notice it.
(I would still use a function though, even if me answering this shows that I have time to waste ;)
Sure you'll probably get a performance benefit from not wrapping it into a function. But would it be noticeable? Not really.
Your time is worth more than the small amount of CPU resources you'd save.
I doubt the difference in speed would be noticeable unless you are doing it many hundreds of times.
A function call is a performance hit, but you should also think about maintainability - wrapping it in a function could ease future changes (and copy-paste is bad for that).
Whilst performance wouldn't really be affected, anything that takes code out of the HTML stream is a good thing.
Even with a thousand calls to your arg_p() you wouldn't be able to measure, let alone notice, the difference in performance. The time you will spend typing the extra "inline" code, plus the time you will spend duplicating changes across every inlined copy, plus the added complexity and higher probability of a typo or random error, will cost you more than the unmeasurable performance improvement. In fact, that time could be spent on optimizing what really counts, such as improving your database design or profiling your code to find the areas that really affect the generation time.
You'll be better off keeping your code clean. It will save you time that you can in turn invest into optimizing what really counts.
Is there a table of how much "work" it takes to execute a given function in PHP? I'm not a compsci major, so I don't have maybe the formal background to know that "oh yeah, strings take longer to work with than integers" or anything like that. Are all steps/lines in a program created equal? I just don't even know where to start researching this.
I'm currently doing some Project Euler questions where I'm very sure my answer will work, but I'm timing out my local Apache server at a minute with my requests (and PE has said that all problems can be solved < 1 minute). I don't know how/where to start optimizing, so knowing more about PHP and how it uses memory would be useful. For what it's worth, here's my code for question 206:
<?php
$start = time();
for ($i = 1010374999; $i < 1421374999; $i++) {
    $a = number_format(pow($i, 2), 0, ".", "");
    $c = preg_split('//', $a, -1, PREG_SPLIT_NO_EMPTY);
    if ($c[0] == 1) {
        if ($c[2] == 2) {
            if ($c[4] == 3) {
                if ($c[6] == 4) {
                    if ($c[8] == 5) {
                        if ($c[10] == 6) {
                            if ($c[12] == 7) {
                                if ($c[14] == 8) {
                                    if ($c[16] == 9) {
                                        if ($c[18] == 0) {
                                            echo $i;
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
$end = time();
$elapsed = ($end-$start);
echo "<br />The time to calculate was $elapsed seconds";
?>
If this is a wiki question about optimization, just let me know and I'll move it. Again, not looking for an answer, just help on where to learn about being efficient in my coding (although cursory hints wouldn't be flat out rejected, and I realize there are probably more elegant mathematical ways to set up the problem)
There's no such table that's going to tell you how long each PHP function takes to execute, since the time of execution will vary wildly depending on the input.
Take a look at what your code is doing. You've created a loop that's going to run 411,000,000 times. Given the code needs to complete in less than 60 seconds (a minute), in order to solve the problem you're assuming each trip through the loop will take less than (approximately) .000000145 seconds. That's unreasonable, and no amount of using the "right" function will fix that. Try your loop with nothing in there:
for ($i=1010374999; $i < 1421374999; $i++) {
}
Unless you have access to science fiction computers, this probably isn't going to complete execution in less than 60 seconds. So you know this approach will never work.
This is known as a brute-force solution to the problem. The point of Project Euler is to get you thinking creatively, both from a math and a programming point of view, about problems. You want to reduce the number of trips you need to take through that loop. The obvious solution will never be the answer here.
I don't want to tell you the solution, because the point of these things is to think your way through it and become a better algorithm programmer. Examine the problem, think about its restrictions, and think about ways you can reduce the total number of numbers you'd need to check.
A good tool for taking a look at execution times for your code is xDebug: http://xdebug.org/docs/profiler
It's an installable PHP extension which can be configured to output a complete breakdown of function calls and execution times for your script. Using this, you'll be able to see what in your code is taking longest to execute and try some different approaches.
EDIT: now that I'm actually looking at your code, you're running 400 million+ regex calls! I don't know anything about Project Euler, but I have a hard time believing this code can be executed in under a minute on commodity hardware.
preg_split is likely to be slow because it's using a regex. Is there not a better way to do that line?
Hint: You can access chars in a string like this:
$str = 'This is a test.';
echo $str[0];
Try switching preg_split() to explode() or str_split(), which are faster.
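For example, since number_format() already returns a string, its digits can be read with string offsets directly and the regex call dropped entirely. A minimal sketch, using the first value of the poster's loop purely for illustration:
<?php
// Sketch: number_format() returns a string, so the digit positions can be read
// with offsets directly; no preg_split() (and no regex) is needed.
$i = 1010374999;                               // first value of the poster's loop
$a = number_format(pow($i, 2), 0, '.', '');    // the 19-digit square as a string
if ($a[0] == '1' && $a[2] == '2' && $a[4] == '3' && $a[6] == '4' &&
    $a[8] == '5' && $a[10] == '6' && $a[12] == '7' && $a[14] == '8' &&
    $a[16] == '9' && $a[18] == '0') {
    echo $i;
}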
First, here's a slightly cleaner version of your function, with debug output
<?php
$start = time();
$min = (int)floor(sqrt(1020304050607080900));
$max = (int)ceil(sqrt(1929394959697989990));
for ($i = $min; $i < $max; $i++) {
    $c = (string)($i * $i); // cast to string so the digits can be read by offset
    echo $i, ' => ', $c, "\n";
    if ($c[0] == 1
        && $c[2] == 2
        && $c[4] == 3
        && $c[6] == 4
        && $c[8] == 5
        && $c[10] == 6
        && $c[12] == 7
        && $c[14] == 8
        && $c[16] == 9
        && $c[18] == 0)
    {
        echo $i;
        break;
    }
}
$end = time();
$elapsed = ($end - $start);
echo "<br />The time to calculate was $elapsed seconds";
And here are the first 10 lines of output:
1010101010 => 1020304050403020100
1010101011 => 1020304052423222121
1010101012 => 1020304054443424144
1010101013 => 1020304056463626169
1010101014 => 1020304058483828196
1010101015 => 1020304060504030225
1010101016 => 1020304062524232256
1010101017 => 1020304064544434289
1010101018 => 1020304066564636324
1010101019 => 1020304068584838361
That, right there, seems like it oughta inspire a possible optimization of your algorithm. Note that we're not even close, as of the 6th entry (1020304060504030225) -- we've got a 6 in a position where we need a 5!
In fact, many of the next entries will be worthless, until we're back at a point where we have a 5 in that position. Why bother calculating the intervening values? If we can figure out how, we should jump ahead to 1010101060, where that digit becomes a 5 again... If we can keep skipping dozens of iterations at a time like this, we'll save well over 90% of our run time!
Note that this may not be a practical approach at all (in fact, I'm fairly confident it's not), but this is the way you should be thinking. What mathematical tricks can you use to reduce the number of iterations you execute?