A question that has always puzzled me is why people write loops like the first version below when the second version is shorter and easier to read. I thought it might be because PHP recalculates the strlen() on every iteration. Any ideas?
FIRST VERSION
for ($i = 0, $len = strlen($key); $i < $len; $i++) {}
You can obviously use $len inside the loop and further on in the code, but what are the benefits over the following version?
SECOND VERSION
for ($i = 0; $i < strlen($key); $i++) {}
It's a matter of performance.
The second version recalculates strlen($key) on every iteration, which can slow things down.
Even though a single strlen() call is cheap (PHP strings store their length, so the call itself is O(1)), the per-iteration call overhead adds up over many iterations.
You can see here for some performance benchmarks with loops.
The first version is best used if the loop is expected to have many iterations and $key won't change in the process.
The second one is best used if the loop updates $key and you need to recalculate its length, or when recalculating it doesn't affect your performance.
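If you want to measure it yourself, a minimal sketch along these lines works; absolute numbers vary by machine and PHP version (and on PHP 7+, strlen() on a plain variable is typically compiled down to a cheap opcode, so expect the gap to be small):
<?php
// Micro-benchmark sketch: cached length vs. strlen() on every iteration.
$key = str_repeat('x', 1000000);

$start = microtime(true);
for ($i = 0, $len = strlen($key); $i < $len; $i++) {} // length computed once
printf("cached length:   %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < strlen($key); $i++) {}              // strlen() every pass
printf("strlen per pass: %.4fs\n", microtime(true) - $start);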
Related
I am currently updating a very old script written for PHP 5.2.17 to PHP 8.1.2. There are a lot of text-processing code blocks, and almost all of them use preg_match/preg_match_all. I used to know that strpos has always been faster than preg_match for string matching, but I decided to check one more time.
Code was:
$c = file_get_contents('readme-redist-bins.txt');
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    strpos($c, '[SOMEMACRO]');
}
$el = microtime(true) - $start;
echo $el; // exit($el) would use the float as an exit code instead of printing it
and
$c = file_get_contents('readme-redist-bins.txt');
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    preg_match_all("/\[([a-z0-9-]{0,100})".'[SOMEMACRO]'."/", $c, $pma);
}
$el = microtime(true) - $start;
echo $el; // as above, echo rather than exit()
I used the readme-redist-bins.txt file that ships with the PHP 8.1.2 distribution, about 30 KB.
Results (preg_match_all):
PHP 8.1.2: 1.2461s
PHP 5.2.17: 11.0701s
Results (strpos):
PHP 8.1.2: 9.97s
PHP 5.2.17: 0.65s
Double checked... Tried Windows and Linux PHP builds, on two machines.
Tried the same code with a small file (200 B).
Results (preg_match_all):
PHP 8.1.2: 0.0867s
PHP 5.2.17: 0.6097s
Results (strpos):
PHP 8.1.2: 0.0358s
PHP 5.2.17: 0.2484s
And now the timings look OK.
So how can it be that preg_match is so much faster on large text? Any ideas?
PS: Tried PHP 7.2.10 - same result.
PCRE2 is really fast. It's so fast that there is usually barely any difference between it and plain string processing in PHP, and sometimes it's even faster. PCRE2 internally uses a JIT and contains a lot of optimizations. It's really good at what it does.
On the other hand, strpos is poorly optimized. It does a simple byte comparison in C and doesn't use parallelization/vectorization. For short needles and short haystacks it uses memchr, but for longer values it falls back to the Sunday algorithm.
For small datasets, the overhead from calling PCRE2 will probably outweigh its optimizations, but for larger strings, or case-insensitive/Unicode strings PCRE2 might offer better performance.
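As a quick illustration of that last point, here is a sketch of my own (not from the question) timing a case-insensitive literal search both ways; the absent needle forces a full scan of the haystack on every call:
<?php
// Sketch: case-insensitive search via stripos() vs. PCRE with the /i flag.
$haystack = str_repeat('lorem ipsum dolor sit amet ', 10000);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    stripos($haystack, 'CONSECTETUR'); // never matches, scans the whole string
}
printf("stripos:    %.4fs\n", microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 1000; $i++) {
    preg_match('/consectetur/i', $haystack);
}
printf("preg_match: %.4fs\n", microtime(true) - $start);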
I'm trying to work out which method would be faster (if either would be?). I've designed a test here: http://codepad.org/odyUN0xg
When I run that test my results are very inconsistent; both vary wildly.
Is there a problem with the test?
Otherwise, can anyone suggest which one would or wouldn't be faster?
Edit
Okay guys, thanks I've edited the codepad here: http://codepad.org/n1Xrt98J
With all the comments and discussion I've decided to go with array_sum. As soon as I used that microtime(true) thing it started to look a lot faster (array_sum).
Cheers for the advice. Oh, and I've added a "for" loop so that results are more even, but as noted in the results there is little time saving, if any, over a foreach.
The problem is that you used a very low limit, 1000; at that size the overhead of the PHP interpreter is MUCH larger than the work you're measuring. I would use 100000000 or something like that.
However I think array_sum is faster since it's more specialized and probably implemented in fast C.
Oh, and as Michael McTiernan said, you must change every instance of microtime() to microtime(true). http://php.net/manual/en/function.microtime.php
And finally, I wouldn't use codepad as testing environment since you have no control over it. You have no idea what happens and whether your process is paused or not.
To be honest, there's little value in using an artificial test, and either way this sounds like fairly pointless micro-optimisation unless you've specifically identified this as a problem area after profiling the relevant code.
As such, it probably makes sense to use whichever feels more appropriate. I'd personally plump for array_sum - that's what it's there for, after all.
Change any instance of microtime() to microtime(true).
Also, after testing this, the results aren't that wildly different.
$s = microtime();
A call to microtime() with no arguments will return a string like this: 0.35250000 1300737802. You probably want this:
$s = microtime(TRUE);
array sum took 0.125188 seconds
sum numbers took 0.166603 seconds
These kinds of tests need to be run a few thousand times so you get large execution times that are not affected by tiny external factors.
You need much bigger runs, and need to average several of them. You should also separate the two tests into two files. The server has other things going on that will disturb test timing, hence the need to average many runs.
array_sum() should be the faster of the two, as its iteration happens in C rather than in interpreted PHP code, but it's worth checking.
Almost always, array_sum is faster. It depends more on your server's php.ini configuration than on the actual choice between array_sum and foreach.
Coming to the point of this question, for a sample setup like the following:
<?php
set_time_limit(0);
$s = microtime(TRUE);
$array = range(1, 10000);
$sum = 0;
for ($j = 0; $j < 1000; $j++) {
    $sum += array_sum($array);
}
$s1 = microtime(TRUE);
$diff = $s1 - $s;
echo "for 1000 pass, array_sum took {$diff} seconds. Result = {$sum}<br/>";
$sum = 0;
$s2 = microtime(TRUE);
for ($j = 0; $j < 1000; $j++) {
    foreach ($array as $val) {
        $sum += $val;
    }
}
$s3 = microtime(TRUE);
$diff = $s3 - $s2;
echo "for 1000 pass, foreach took {$diff} seconds. Result = {$sum}<br/>";
I got results where foreach was always slower. So that should answer your question. Sample:
for 1000 pass, array_sum took 0.2720000743866 seconds. Result = 50005000000
for 1000 pass, foreach took 1.7239999771118 seconds. Result = 50005000000
Many, many times on a page I have to read POST and GET values in PHP like this.
I just want to know if it is better to keep doing it inline the way I have been, or if performance would suffer by moving it into a function, as in the code below.
This would make the code much easier to write, but at the expense of extra function calls on the page.
I have all the time in the world, so making the code as fast as possible is more important to me than making it "easier to write or faster to develop".
I'd appreciate any advice, and please, nothing about whichever makes it easier to develop; I am talking pure performance here =)
<?php
function arg_p($name, $default = null) {
    return (isset($_GET[$name])) ? $_GET[$name] : $default;
}
$pagesize = arg_p('pagesize', 10);

$pagesize = (isset($_GET['pagesize'])) ? $_GET['pagesize'] : 10;
?>
If you have all the time in the world, why don't you just test it?
<?php
// How many iterations?
$iterations = 100000;

// Inline
$timer_start = microtime(TRUE);
for ($i = 0; $i < $iterations; $i++) {
    $pagesize = (isset($_GET['pagesize'])) ? $_GET['pagesize'] : 10;
}
$time_spent = microtime(TRUE) - $timer_start;
printf("Inline: %.3fs\n", $time_spent);

// By function call
function arg_p($name, $default = null) {
    return (isset($_GET[$name])) ? $_GET[$name] : $default;
}
$timer_start = microtime(TRUE);
for ($i = 0; $i < $iterations; $i++) {
    $pagesize = arg_p('pagesize', 10);
}
$time_spent = microtime(TRUE) - $timer_start;
printf("By function call: %.3fs\n", $time_spent);
?>
On my machine, this gives pretty clear results in favor of inline execution by a factor of almost 10. But you need a lot of iterations to really notice it.
(I would still use a function though, even if me answering this shows that I have time to waste ;)
Sure, you'll probably get a performance benefit from not wrapping it in a function. But would it be noticeable? Not really.
Your time is worth more than the small amount of CPU resources you'd save.
I doubt the difference in speed would be noticeable unless you are doing it many hundreds of times.
A function call is a performance hit, but you should also think about maintainability - wrapping the lookup in a function eases future changes (and copy-paste is bad for that).
Whilst performance wouldn't really be affected, anything that takes code out of the HTML stream is a good thing.
Even with a thousand calls to your arg_p() you wouldn't be able to measure (let alone notice) the difference in performance. The time you will spend typing the extra "inline" code, plus the time you will spend duplicating changes across every inlined copy, plus the added complexity and higher probability of a typo or random error, will cost you more than the unmeasurable performance improvement. In fact, that time could be spent on optimizing what really counts, such as improving your database design or profiling your code to find the areas that really affect generation time.
You'll be better off keeping your code clean. It will save you time that you can in turn invest into optimizing what really counts.
Is there a table of how much "work" it takes to execute a given function in PHP? I'm not a compsci major, so maybe I don't have the formal background to know things like "oh yeah, strings take longer to work with than integers" or anything like that. Are all steps/lines in a program created equal? I just don't even know where to start researching this.
I'm currently doing some Project Euler questions where I'm very sure my answer will work, but my requests are timing out my local Apache server at a minute (and PE has said that all problems can be solved in under 1 minute). I don't know how or where to start optimizing, so knowing more about PHP and how it uses memory would be useful. For what it's worth, here's my code for question 206:
<?php
$start = time();
for ($i = 1010374999; $i < 1421374999; $i++) {
    $a = number_format(pow($i, 2), 0, ".", "");
    $c = preg_split('//', $a, -1, PREG_SPLIT_NO_EMPTY);
    if ($c[0] == 1) {
        if ($c[2] == 2) {
            if ($c[4] == 3) {
                if ($c[6] == 4) {
                    if ($c[8] == 5) {
                        if ($c[10] == 6) {
                            if ($c[12] == 7) {
                                if ($c[14] == 8) {
                                    if ($c[16] == 9) {
                                        if ($c[18] == 0) {
                                            echo $i;
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}
$end = time();
$elapsed = ($end - $start);
echo "<br />The time to calculate was $elapsed seconds";
?>
If this is a wiki question about optimization, just let me know and I'll move it. Again, I'm not looking for an answer, just help on where to learn about writing efficient code (although cursory hints wouldn't be flat-out rejected, and I realize there are probably more elegant mathematical ways to set up the problem).
There's no such table that's going to tell you how long each PHP function takes to execute, since the time of execution will vary wildly depending on the input.
Take a look at what your code is doing. You've created a loop that's going to run 411,000,000 times. Given the code needs to complete in less than 60 seconds, you're assuming each trip through the loop will take less than approximately 0.000000145 seconds. That's unreasonable, and no amount of using the "right" function will fix it. Try your loop with nothing in it:
for ($i=1010374999; $i < 1421374999; $i++) {
}
Unless you have access to science fiction computers, this probably isn't going to complete execution in less than 60 seconds. So you know this approach will never work.
This is known as a brute-force solution to a problem. The point of Project Euler is to get you thinking creatively, both from a math and a programming point of view, about problems. You want to reduce the number of trips you need to take through that loop; the obvious solution will never be the answer here.
I don't want to tell you the solution, because the point of these things is to think your way through it and become a better algorithm programmer. Examine the problem, think about its restrictions, and think about ways you can reduce the total number of numbers you'd need to check.
A good tool for taking a look at execution times for your code is Xdebug's profiler: http://xdebug.org/docs/profiler
It's an installable PHP extension which can be configured to output a complete breakdown of function calls and execution times for your script. Using this, you'll be able to see what in your code is taking longest to execute and try some different approaches.
EDIT: now that I'm actually looking at your code, you're running 400 million+ regex calls! I don't know anything about Project Euler, but I have a hard time believing this code can be executed in under a minute on commodity hardware.
preg_split is likely to be slow because it's using a regex. Is there not a better way to do that line?
Hint: You can access chars in a string like this:
$str = 'This is a test.';
echo $str[0];
Try switching preg_split() to explode() or str_split(), which are faster.
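For instance, here is a sketch of what that could look like (my own illustration; it assumes 64-bit PHP, where $i * $i, at roughly 2e18, still fits in an integer):
// Cast the square to a string once and read digits by offset,
// avoiding the regex engine entirely.
$a = (string) ($i * $i);
if ($a[0] === '1' && $a[2] === '2' && $a[4] === '3') {
    // ... keep checking the remaining fixed positions the same way
}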
First, here's a slightly cleaner version of your function, with debug output
<?php
$start = time();
$min = (int) floor(sqrt(1020304050607080900));
$max = (int) ceil(sqrt(1929394959697989990));
for ($i = $min; $i < $max; $i++) {
    $c = (string) ($i * $i); // cast to string so the digits can be indexed
    echo $i, ' => ', $c, "\n";
    if ($c[0] == 1
        && $c[2] == 2
        && $c[4] == 3
        && $c[6] == 4
        && $c[8] == 5
        && $c[10] == 6
        && $c[12] == 7
        && $c[14] == 8
        && $c[16] == 9
        && $c[18] == 0)
    {
        echo $i;
        break;
    }
}
$end = time();
$elapsed = ($end - $start);
echo "<br />The time to calculate was $elapsed seconds";
And here's the first 10 lines of output:
1010101010 => 1020304050403020100
1010101011 => 1020304052423222121
1010101012 => 1020304054443424144
1010101013 => 1020304056463626169
1010101014 => 1020304058483828196
1010101015 => 1020304060504030225
1010101016 => 1020304062524232256
1010101017 => 1020304064544434289
1010101018 => 1020304066564636324
1010101019 => 1020304068584838361
That, right there, seems like it oughta inspire a possible optimization of your algorithm. Note that we're not even close, as of the 6th entry (1020304060504030225) -- we've got a 6 in a position where we need a 5!
In fact, many of the next entries will be worthless, until we're back at a point where we have a 5 in that position. Why bother calculating the intervening values? If we can figure out how, we should jump ahead to 1010101060, where that digit becomes a 5 again... If we can keep skipping dozens of iterations at a time like this, we'll save well over 90% of our run time!
Note that this may not be a practical approach at all (in fact, I'm fairly confident it's not), but this is the way you should be thinking. What mathematical tricks can you use to reduce the number of iterations you execute?
I'm starting out on my expedition into Project Euler. And like many others, I've figured I need to make a prime number generator. Problem is: PHP doesn't like big numbers. If I use the standard Sieve of Eratosthenes function and set the limit to 2 million, it will crash; it doesn't like creating arrays of that size. Understandable.
So now I'm trying to optimize it. One way I found was that instead of creating an array with 2 million entries, I only need 1 million (apart from 2, only odd numbers can be prime). But now it's crashing because it exceeds the maximum execution time...
function getPrimes($limit) {
    $count = 0;
    for ($i = 3; $i < $limit; $i += 2) {
        $primes[$count++] = $i;
    }
    for ($n = 3; $n < $limit; $n += 2) {
        // array will be half the size of $limit
        for ($i = 1; $i < $limit / 2; $i++) {
            if ($primes[$i] % $n === 0 && $primes[$i] !== $n) {
                $primes[$i] = 0;
            }
        }
    }
    return $primes;
}
The function works, but as I said, it's a bit slow...any suggestions?
One thing I've found to make it a bit faster is to switch the loop around.
$count = 0;
foreach ($primes as $value) {
    // $limitSq is the sqrt of the limit, as that is as high as I have to go
    for ($n = 3; $n <= $limitSq; $n += 2) {
        if ($value !== $n && $value % $n === 0) {
            $primes[$count] = 0;
            $n = $limitSq; // breaking the inner loop
        }
    }
    $count++;
}
And in addition, by setting the time and memory limits (thanks Greg), I've finally managed to get an answer. Phew.
Without knowing much about the algorithm:
You're recalculating $limit/2 each time around the $i loop
Your if statement will be evaluated in order, so think about (or test) whether it would be faster to test $primes[$i] !== $n first.
Side note, you can use set_time_limit() to give it longer to run and give it more memory using
ini_set('memory_limit', '128M');
Assuming your setup allows this, of course - on a shared host you may be restricted.
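Assuming both are permitted, the two overrides together at the top of the script would look like:
set_time_limit(0);               // lift the max_execution_time cap
ini_set('memory_limit', '128M'); // raise the memory ceiling for this script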
From Algorithmist's proposed solution:
This is a modification of the standard Sieve of Eratosthenes. It would be highly inefficient, using up far too much memory and time, to run the standard sieve all the way up to n. However, no composite number less than or equal to n will have a factor greater than sqrt(n), so we only need to know all primes up to this limit, which is no greater than 31622 (square root of 10^9). This is accomplished with a sieve. Then, for each query, we sieve through only the range given, using our pre-computed table of primes to eliminate composite numbers.
This problem has also appeared on the UVa and Sphere online judges. Here's how it's stated on Sphere.
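Here is a rough sketch of that segmented approach in PHP (my own illustration, not from Algorithmist; the function name is made up and it assumes PHP 7+ for intdiv()):
<?php
// Find the primes in [$lo, $hi] without sieving everything below $lo:
// first sieve the base primes up to sqrt($hi), then use them to cross
// off composites inside the requested window only.
function segmentedSieve(int $lo, int $hi): array {
    $root = (int) sqrt($hi);

    // 1) ordinary sieve up to sqrt($hi)
    $isPrime = array_fill(0, $root + 1, true);
    $isPrime[0] = $isPrime[1] = false;
    for ($p = 2; $p * $p <= $root; $p++) {
        if ($isPrime[$p]) {
            for ($m = $p * $p; $m <= $root; $m += $p) {
                $isPrime[$m] = false;
            }
        }
    }

    // 2) sieve only the [$lo, $hi] window with those base primes
    $window = array_fill(0, $hi - $lo + 1, true);
    for ($p = 2; $p <= $root; $p++) {
        if (!$isPrime[$p]) continue;
        $start = max($p * $p, intdiv($lo + $p - 1, $p) * $p); // first multiple >= $lo
        for ($m = $start; $m <= $hi; $m += $p) {
            $window[$m - $lo] = false;
        }
    }

    $primes = array();
    foreach ($window as $off => $flag) {
        if ($flag && $lo + $off > 1) {
            $primes[] = $lo + $off;
        }
    }
    return $primes;
}

print_r(segmentedSieve(100, 130)); // 101, 103, 107, 109, 113, 127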
You can use a bit field to store your sieve. That is, it's roughly identical to an array of booleans, except you pack your booleans into a large integer. For instance if you had 8-bit integers you would store 8 bits (booleans) per integer which would further reduce your space requirements.
Additionally, using a bit field allows the possibility of using bit masks to perform your sieve operation. For example, if your sieve kept all numbers (not just odd ones), you could construct a bit mask of b01010101 which you could then AND against every element in your array. For threes you could use three integers as the mask: b00100100 b10010010 b01001001.
Finally, when crossing off multiples of $n, you do not need to check numbers lower than $n; in fact, you can start at $n * $n, since any smaller composite has already been eliminated by a smaller prime.
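Here is a minimal sketch of that idea (my own code, assuming PHP 7+): it packs one flag per odd number into a byte string, so sieving up to 2 million takes about 125 KB instead of a million-element array:
<?php
function bitSieve(int $limit): array {
    // One bit per odd number; the bit index for odd $n is $n >> 1.
    $bits = str_repeat("\xFF", ($limit >> 4) + 1);
    $primes = array(2);
    for ($n = 3; $n <= $limit; $n += 2) {
        $idx = $n >> 1;
        if ((ord($bits[$idx >> 3]) >> ($idx & 7)) & 1) {
            $primes[] = $n;
            // Clear the bits of all odd multiples, starting at $n * $n.
            for ($m = $n * $n; $m <= $limit; $m += 2 * $n) {
                $i = $m >> 1;
                $bits[$i >> 3] = chr(ord($bits[$i >> 3]) & ~(1 << ($i & 7)));
            }
        }
    }
    return $primes;
}

echo count(bitSieve(2000000)); // 148933 primes below 2 million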
Once you know the number is not a prime, I would exit the inner loop. I don't know PHP, but you need a statement like break in C or last in Perl.
If that is not available, I would set a flag and use it as a condition for continuing the inner loop. This should speed up your execution, as you are not checking all $limit/2 items when the number is not a prime.
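For reference, PHP does have break, so the inner loop from the question can bail out as soon as the first factor is found (a sketch based on the question's second snippet):
$count = 0;
foreach ($primes as $value) {
    for ($n = 3; $n <= $limitSq; $n += 2) {
        if ($value !== $n && $value % $n === 0) {
            $primes[$count] = 0;
            break; // stop dividing once one factor is found
        }
    }
    $count++;
}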
If you want speed, don't use PHP for this one :P
No, seriously, I really like PHP and it's a cool language, but it's not suited at all for such algorithms.