Why is foreach so slow? - php

PHPBench.com runs quick benchmark scripts on each pageload. On the foreach test, when I load it, foreach takes anywhere from 4 to 10 times as long to run as the third example.
Why is it that a native language construct is apparently slower than performing the logic oneself?

Maybe it has to do with the fact that foreach works on a copy of the array ?
Or maybe it has to do with the fact that, when looping with foreach, on each iteration, the internal array pointer is changed, to point to the next element ?
Quoting the relevant portion of foreach's manual page:
Note: Unless the array is referenced, foreach operates on a copy of the specified array and not the array itself. foreach has some side effects on the array pointer.
As far as I can tell, the third test you linked to doesn't do either of those two things -- which means the two tests don't do the same thing -- which means you are not comparing two ways of writing the same code.
(I would also say that this kind of micro-optimization will not matter at all in a real application -- but I guess you already know that, and just asked out of curiosity)
There is also one thing that doesn't feel right in this test: it only runs each test once. For a "better" test, it might be useful to run each of them more than once -- with timings in the order of 100 microseconds, not much is required to make a huge difference.
(Considering the first test varies between 300% and 500% over a few refreshes...)
For those who don't want to click, here's the first test (I've gotten 3xx%, 443%, and 529%):
foreach ($aHash as $key => $val) {
    $aHash[$key] .= "a";
}
And the third one (100%):
$key = array_keys($aHash);
$size = sizeOf($key);
for ($i = 0; $i < $size; $i++) {
    $aHash[$key[$i]] .= "a";
}

I'm sorry, but the website got it wrong. Here's my own script that shows the two are almost the same in speed, and in fact, foreach is faster!
<?php
function start() {
    global $aHash;
    // Initial configuration
    $i = 0;
    $tmp = '';
    while ($i < 10000) {
        $tmp .= 'a';
        ++$i;
    }
    $aHash = array_fill(100000000000000000000000, 100, $tmp);
    unset($i, $tmp);
    reset($aHash);
}

/* The Test */
$t = microtime(true);
for ($x = 0; $x < 500; $x++) {
    start();
    $key = array_keys($aHash);
    $size = sizeOf($key);
    for ($i = 0; $i < $size; $i++) $aHash[$key[$i]] .= "a";
}
print (microtime(true) - $t);
print ('<br/>');
$t = microtime(true);
for ($x = 0; $x < 500; $x++) {
    start();
    foreach ($aHash as $key => $val) $aHash[$key] .= "a";
}
print (microtime(true) - $t);
?>
If you look at the source code of the tests: http://www.phpbench.com/source/test2/1/ and http://www.phpbench.com/source/test2/3/ , you can see that $aHash isn't repopulated with the initial data after each iteration. It is created once at the beginning, then each test is run X times. In this sense, you are working with an ever-growing $aHash on each iteration... in pseudocode:
iteration 1: $aHash[10000000000000] == 'aaaaaa....10000 times...a';
iteration 2: $aHash[10000000000000] == 'aaaaaa....10001 times...a';
iteration 3: $aHash[10000000000000] == 'aaaaaa....10002 times...a';
Over time, the data for all the tests is getting larger on each iteration, so of course by iteration 100 the array_keys method is faster, because it'll always have the same keys, whereas the foreach loop has to contend with an ever-growing data set and store the values in arrays!
If you run my code provided above on your server, you'll see clearly that foreach is faster AND neater AND clearer.
If the author of the site intended his test to do what it does, then that certainly is not made clear; otherwise, it's an invalid test.

Benchmark results for such micro measurements, coming from a live, busy webserver that is subject to extreme amounts of varying load and other influences, should be disregarded. This is not an environment to benchmark in.

Related

What is the fastest way to check amount of specific chars in a string in PHP?

So I need to check whether the number of characters from a specific set in a string is higher than some number; what is the fastest way to do that?
For example, I have a long string "some text & some text & some text + a lot more + a lot more ... etc." and I need to check if there are more than 3 of the following symbols: [&, ., +]. So when I encounter the 4th occurrence of one of these chars I just need to return false and stop the loop. So I'm thinking of creating a simple function like that. But I wonder, is there any native method in PHP to do such a thing? I need a function which will not waste time parsing the string to the end, because the string may be pretty long. So I think regexps and functions like count_chars are not suited for that kind of job...
Any suggestions?
I don't know about a native method, I think count_chars is probably as close as you're going to get. However, rolling a custom solution would be relatively simple:
$str = 'your text here';
$chars = ['&', '.', '+'];
$count = array_fill_keys($chars, 0); // initialise counters up front to avoid undefined-index notices
$length = strlen($str);
$limit = 3;
for ($i = 0; $i < $length; $i++) {
    if (in_array($str[$i], $chars)) {
        $count[$str[$i]] += 1;
        if ($count[$str[$i]] > $limit) {
            break; // stop scanning as soon as one char exceeds the limit
        }
    }
}
Where the data is actually coming from might also make a difference. For example, if it's from a file then you could take advantage of fread's 2nd parameter to only read x number of bytes at a time within a while loop.
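For example, a rough sketch of that chunked-reading idea (the file name, the 8192-byte chunk size, and the counting logic are illustrative choices, not anything prescribed):

$handle = fopen('your-file.txt', 'r');
$count = 0;
$limit = 3;
$found = false;
while (!feof($handle) && !$found) {
    $chunk = fread($handle, 8192); // read at most 8192 bytes per iteration
    $count += substr_count($chunk, '&') + substr_count($chunk, '.') + substr_count($chunk, '+');
    if ($count > $limit) {
        $found = true; // stop reading as soon as the limit is exceeded
    }
}
fclose($handle);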
Finding the fastest way might be too broad of a question as PHP has a lot of string related functions; other solutions might use strstr, strpos, etc...
I haven't benchmarked the other solutions, but http://php.net/manual/en/function.str-replace.php passing an array of search values will be fast. There is an optional fourth parameter which returns the count of replacements; check that number:
str_replace(['&', '.', '+'], '', $subject, $count);
if ($count > $number) {
    // more occurrences than allowed
}
Well, all my thoughts were wrong and my expectations were crushed by real tests. A regexp seems to work from 2 to 7 times faster (with different strings) than a self-made function with a simple symbol-checking loop.
The code:
// self-made function:
function chk_occurs($str, $chrs, $limit) {
    $r = false;
    $count = 0;
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        if (in_array($str[$i], $chrs)) {
            $count++;
            if ($count > $limit) {
                $r = true;
                break;
            }
        }
    }
    return $r;
}
// RegExp I've used for tests:
preg_match('/([&\\.\\+]|[&\\.\\+][^&\\.\\+]+?){3,}?/', $str);
Of course it works faster because it's a single call to a native function, but even the same pattern wrapped into a function works from 2 to ~4.8 times faster.
// RegExp wrapped into a function:
function chk_occurs_preg($str, $chrs, $limit) {
    $chrs = preg_quote($chrs);
    return preg_match('/(['.$chrs.']|['.$chrs.'][^'.$chrs.']+?){'.$limit.',}?/', $str);
}
P.S. I didn't bother to check CPU time; I was just testing walltime measured via microtime(true) over a 200k-iteration loop, but that's enough for me.
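In case anyone wants to reproduce this, a minimal sketch of the kind of walltime loop I mean (the test string and iteration count are arbitrary):

$str = str_repeat('some text & some text + more. ', 100);
$chrs = array('&', '.', '+');

$t = microtime(true);
for ($i = 0; $i < 200000; $i++) {
    chk_occurs($str, $chrs, 3); // loop-based version takes the chars as an array
}
echo 'loop: ', microtime(true) - $t, "s\n";

$t = microtime(true);
for ($i = 0; $i < 200000; $i++) {
    chk_occurs_preg($str, '&.+', 3); // regexp version takes them as a string for preg_quote()
}
echo 'regexp: ', microtime(true) - $t, "s\n";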

How to iterate efficiently when some sub intervals results are known

You have a function that always takes an interval (of natural numbers in this case) as input; this function returns a result but is quite expensive on the processor, simulated by sleep in this example:
function calculate($start, $end) {
    $result = 0;
    for ($x = $start; $x <= $end; $x++) {
        $result++;
        usleep(250000);
    }
    return $result;
}
In order to be more efficient, there is an array of old results that contains the interval used and the result of the function for that interval:
$oldResults = [
    ['s' => 1, 'e' => 2, 'r' => 1],
    ['s' => 2, 'e' => 6, 'r' => 4],
    ['s' => 4, 'e' => 7, 'r' => 3]
];
If I call calculate(1,10), the function should be able to derive new intervals based on old results and accumulate them. In this particular case it should take the old result from 1 to 2, add that to the old result from 2 to 6, then do a new calculate(6,10) and add that too. Take into consideration that the function ignores the old saved interval from 4 to 7, since it was more convenient to use 2-6.
This is a visual representation of the problem:
Of course in this example, calculate() is quite simple and you can just find particular ways to solve this problem around it, but in the real code calculate() is complex and the only thing I know is that calculate(n0,n3)==calculate(n0,n1)+calculate(n1,n2)+calculate(n2,n3).
I cannot find a way to reuse the old data without using a bunch of ifs and foreach loops; I'm sure there is a more elegant approach to this problem.
You can play with the code here.
Note: I'm using PHP, but I can read JS, Python, C, and similar languages.
If you are certain that calculate(n0,n3) == calculate(n0,n1) + calculate(n1,n2) + calculate(n2,n3), then it seems to me that one approach might simply be to establish a database cache.
You can pre-calculate each discrete interval and store its result in a record.
$start = 0;
$end = 1000;
for ($i = 1; $i <= $end; $i++) {
    $result = calculate($start, $i);
    $sql = "INSERT INTO calculated_cache (start, end, result) VALUES ($start,$i,$result)";
    // execute statement via whatever dbms api
    $start++;
}
Now whenever new requests come in, a database lookup should be significantly faster. Note you may need to tinker with the boundary cases in this rough example.
function fetch_calculated_cache($start, $end) {
    $sql = "
        SELECT SUM(result)
        FROM calculated_cache
        WHERE (start BETWEEN $start AND $end)
          AND (end BETWEEN $start AND $end)
    ";
    $result = // whatever dbms api you chose
    return $result;
}
There are a couple of obvious considerations, such as:
Cache invalidation: how often will the results of your calculate function change? You'll need to repopulate the database then.
How many intervals do you want to store? In my example, I arbitrarily picked 1000.
Will you ever need to retrieve non-sequential interval results? You'll need to apply the above procedure in chunks.
I wrote this:
function findFittingFromCache($from, $to, $cache) {
    // length for measuring usefulness of a chunk from cache (0.1 means 10% of total length)
    $totalLength = abs($to - $from);
    $candidates = array_filter($cache, function($val) use ($from, $to, $totalLength) {
        $chunkLength = abs($val['e'] - $val['s']);
        if ($from <= $val['s'] && $to >= $val['e'] && ($chunkLength / $totalLength > 0.1)) {
            return true;
        }
        return false;
    });
    // sorting to have non-decreasing values of $x['s']
    usort($candidates, function($a, $b) { return $a['s'] - $b['s']; });
    $flowCheck = $from;
    $needToCompute = array();
    foreach ($candidates as $key => $val) {
        if ($val['s'] < $flowCheck) {
            // already using something within this interval
            unset($candidates[$key]);
        } else {
            if ($val['s'] > $flowCheck) {
                // save what will be needed to compute
                $needToCompute[] = array('s' => $flowCheck, 'e' => $val['s']);
            }
            // increase starting position for the next loop
            $flowCheck = $val['e'];
        }
    }
    // the rest needs to be computed as well
    if ($flowCheck < $to) {
        $needToCompute[] = array('s' => $flowCheck, 'e' => $to);
    }
    return array("computed" => $candidates, "missing" => $needToCompute);
}
It is a function that returns two arrays: "computed" holds the already-computed pieces that were found, and "missing" holds the gaps between them which must still be computed.
Inside the function there is a 0.1 threshold, which disqualifies chunks shorter than 10% of the total searched length; you can rewrite the function to take the threshold as a parameter, or omit it completely.
I presume results will be stored and, after computing, added into the cache ($oldResults), which might be of any form (for example a database, as Jeff Puckett suggested). Do not forget to add all the computed chunks and the whole sought length into the cache.
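For illustration, a minimal usage sketch with the question's data (the exact chunks returned depend on the 0.1 threshold):

$oldResults = [
    ['s' => 1, 'e' => 2, 'r' => 1],
    ['s' => 2, 'e' => 6, 'r' => 4],
    ['s' => 4, 'e' => 7, 'r' => 3]
];
$fit = findFittingFromCache(1, 10, $oldResults);
// $fit['computed'] -> cached chunks to reuse, here 1-2 and 2-6 (4-7 overlaps and is dropped)
// $fit['missing']  -> gaps still to run through calculate(), here 6-10
foreach ($fit['missing'] as $gap) {
    $result = calculate($gap['s'], $gap['e']);
    // ... then add ['s' => $gap['s'], 'e' => $gap['e'], 'r' => $result] to the cache
}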
I am sorry, but I can't find a way without loops and ifs.
Working demo:
link

PHP's array_slice vs Python's splitting arrays

Some background
I was having a go at the common "MaxProfit" programming challenge. It basically goes like this:
Given a zero-indexed array A consisting of N integers containing daily
prices of a stock share for a period of N consecutive days, returns
the maximum possible profit from one transaction during this period.
I was quite pleased with this PHP algorithm I came up with, having avoided the naive brute-force attempt:
public function maxProfit($prices)
{
    $maxProfit = 0;
    $key = 0;
    $n = count($prices);
    while ($key < $n - 1) {
        $buyPrice = $prices[$key];
        $maxFuturePrice = max( array_slice($prices, $key + 1) );
        $profit = $maxFuturePrice - $buyPrice;
        if ($profit > $maxProfit) $maxProfit = $profit;
        $key++;
    }
    return $maxProfit;
}
However, having tested my solution, it seems to perform badly performance-wise, perhaps even in O(n^2) time.
I did a bit of reading around the subject and discovered a very similar Python solution. Python has some quite handy array abilities which allow splitting an array with an a[s : e] syntax, unlike in PHP where I used the array_slice function. I decided this must be the bottleneck, so I did some tests:
Tests
PHP array_slice()
$n = 10000;
$a = range(0, $n);
$start = microtime(1);
foreach ($a as $key => $elem) {
    $subArray = array_slice($a, $key);
}
$end = microtime(1);
echo sprintf("Time taken: %sms", round(1000 * ($end - $start), 4)) . PHP_EOL;
Results:
$ php phpSlice.php
Time taken: 4473.9199ms
Time taken: 4474.633ms
Time taken: 4499.434ms
Python a[s : e]
import time
n = 10000
a = range(0, n)
start = time.time()
for key, elem in enumerate(a):
    subArray = a[key:]
end = time.time()
print "Time taken: {0}ms".format(round(1000 * (end - start), 4))
Results:
$ python pySlice.py
Time taken: 213.202ms
Time taken: 212.198ms
Time taken: 215.7381ms
Time taken: 213.8121ms
Question
Why is PHP's array_slice() around 20x less efficient than Python's slicing?
Is there an equivalently efficient method in PHP that achieves the above and thus hopefully makes my maxProfit algorithm run in O(N) time? Edit: I realise my implementation above is not actually O(N), but my question still stands regarding the efficiency of slicing arrays.
I don't really know, but PHP's arrays are messed up hybrid monsters, maybe that's why. Python's lists are really just lists, not at the same time dictionaries, so they might be simpler/faster because of that.
Yes, do an actual O(n) solution. Your solution isn't just slow because PHP's slicing is apparently slow; it's slow because you obviously have an O(n^2) algorithm. Just walk over the array once, keep track of the minimum price found so far, and compare it with the current price. Not something like taking the max over half the array in every single loop iteration.
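To make that concrete, here's a minimal sketch of the single-pass approach this answer describes (my own illustration, not the poster's code):

function maxProfit($prices)
{
    $maxProfit = 0;
    $minPrice = PHP_INT_MAX;
    foreach ($prices as $price) {
        if ($price < $minPrice) {
            $minPrice = $price; // cheapest buy seen so far
        } elseif ($price - $minPrice > $maxProfit) {
            $maxProfit = $price - $minPrice; // best sell for that buy
        }
    }
    return $maxProfit;
}

One pass, no slicing, so it runs in O(n) regardless of how array_slice performs.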

Implementing Cutting Stock Algorithm in PHP

I need to implement the Cutting Stock Problem with a php script.
As my math skills are not that great I am just trying to brute force it.
Starting with these parameters:
$inventory is an array of lengths that are available to be cut.
$requestedPieces is an array of lengths that were requested by the
customer.
$solution is an empty array
I have currently worked out this recursive function to come up with all possible solutions:
function branch($inventory, $requestedPieces, $solution) {
    // Loop through the requested pieces and find all inventory that can fulfill them
    foreach ($requestedPieces as $requestKey => $requestedPiece) {
        foreach ($inventory as $inventoryKey => $piece) {
            if ($requestedPiece <= $piece) {
                $solution2 = $solution;
                array_push($solution2, array($requestKey, $inventoryKey));
                $requestedPieces2 = $requestedPieces;
                unset($requestedPieces2[$requestKey]);
                $inventory2 = $inventory;
                $inventory2[$inventoryKey] = $piece - $requestedPiece;
                if (count($requestedPieces2) > 0) {
                    branch($inventory2, $requestedPieces2, $solution2);
                } else {
                    global $solutions;
                    array_push($solutions, $solution2);
                }
            }
        }
    }
}
The biggest inefficiency I have discovered with this is that it will find the same solution multiple times but with the steps in a different order.
For example:
$inventory = array(1.83, 20.66);
$requestedPieces = array(0.5, 0.25);
The function will come up with 8 solutions where it should come up with 4 solutions.
What is a good way to resolve this?
This does not answer your question, but I thought it could be worth being mentioned:
You have several other ways to solve your problem rather than brute-forcing it. The wikipedia page on the topic is pretty thorough, but I'll just describe two other, simpler ideas. I will use the wikipedia terminology for certain words, namely master for an inventory piece, and cut for a requested piece. I will use set to denote a set of cuts pertaining to a given master.
The first one is based on the greedy algorithm and consists of filling a set with the largest available cut until no more cuts fit, then repeating that same process for each master, yielding a set for each one of them; see the sketch below.
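A rough sketch of that greedy idea (my own, untested illustration; the function and variable names are made up):

function greedy($masters, $cuts)
{
    rsort($cuts); // try the largest cuts first
    $sets = array();
    foreach ($masters as $m => $length) {
        $set = array();
        foreach ($cuts as $k => $cut) {
            if ($cut <= $length) {
                $set[] = $cut;      // schedule this cut on the current master
                $length -= $cut;    // shrink what is left of the master
                unset($cuts[$k]);   // a cut can only be used once
            }
        }
        $sets[$m] = $set;
    }
    return array($sets, array_values($cuts)); // one set per master, plus leftover cuts
}

This is essentially a first-fit-decreasing heuristic: fast, but not guaranteed to be optimal.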
The second one is more dynamic: it uses recursion (like yours) and looks for the best fit for the remaining length of master and cuts at each step of the recursion, the goal being to minimize the wasted length when no more cuts can fit.
function branch($master, $cuts, $set) {
    // cuts that still fit on the remaining master length
    // (array_values reindexes from 0, which the skip loop below relies on)
    $goods = array_values(array_filter($cuts, function($v) use ($master) { return $v <= $master; }));
    $res = array($master, $set, $cuts);
    if (empty($goods))
        return $res;
    $remaining = array_diff($cuts, $goods);
    foreach ($goods as $k => $g) {
        $t = $set;
        array_push($t, $g);
        $r = $remaining;
        $c = $goods;
        // the $k cuts before $g are skipped here: they go back into the remaining pool
        for ($i = 0; $i < $k; $i++)
            array_push($r, array_shift($c));
        array_shift($c); // remove $g itself
        $t = branch($master - $g, $c, $t);
        $t[2] = array_merge($t[2], $r); // hand the skipped cuts back to the caller
        if ($t[0] == 0) return $t; // perfect fit, no waste: stop searching
        if ($t[0] < $res[0])
            $res = $t;
    }
    return $res;
}
The function above should give you the optimal set for a given master. It returns an array of 3 values:
the wasted length on master
the set
the remaining cuts
The parameters are:
the master length,
the cuts to be performed (must be sorted in descending order),
the set of cuts already scheduled (a preexisting set, which would be empty for the first call for each master)
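For example, a call for a single master from the question might look like this (just an illustration of the calling convention):

$cuts = array(0.5, 0.25);
rsort($cuts); // descending order, as required
list($waste, $set, $leftover) = branch(1.83, $cuts, array());
// $waste    -> length wasted on this master
// $set      -> cuts scheduled on it
// $leftover -> cuts to try on the next master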
Caveats: it depends on the masters' order; you could certainly write a function which tries all the relevant possibilities to find the best order of masters.

Random number between x and y excluding a range of numbers inbetween

I am implementing a system at the moment that needs to allocate a number in a certain range to a person, but not use any number that has been used before.
Keep in mind, both the number range and exclusion list are both going to be quite large.
Initially, I thought doing something like this would be best:
<?php
$start = 1;
$end = 199999;
$excluded = array(4, 6, 7, 8, 9, 34);
$found = FALSE;
while (!$found) {
    $rand = mt_rand($start, $end);
    if (!in_array($rand, $excluded)) {
        $found = TRUE;
    }
}
?>
But I don't think this is ideal, there is the possibility of an infinite loop (or it taking a very long time / timing out the script).
I also thought about generating an array of all the numbers I needed, but surely a massive array would be worse? Also doing an array diff on 2 massive arrays would surely take a long time too?
Something like this:
<?php
$start = 1;
$end = 199999;
$allnums = range($start, $end);
$excluded = array(4, 6, 7, 8, 9, 34);
$searcharray = array_diff($allnums, $excluded);
$rand = $searcharray[array_rand($searcharray)]; // array_rand() returns a key, so look up the value
?>
So, my question would be which would be a better option? And is there another (better) way of doing this that someone has used before?
Arrays holding large amounts of data will use up a lot of memory; can you not use a database to hold these numbers instead? That's generally what they are designed for.
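As a sketch of what that could look like, assuming MySQL and a hypothetical allocated_numbers table with the number as primary key: insert a random candidate and retry only when the insert is ignored as a duplicate:

// Hypothetical schema: CREATE TABLE allocated_numbers (n INT PRIMARY KEY);
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
$stmt = $pdo->prepare('INSERT IGNORE INTO allocated_numbers (n) VALUES (?)');
do {
    $rand = mt_rand(1, 199999);
    $stmt->execute(array($rand));
} while ($stmt->rowCount() === 0); // 0 rows inserted => number already taken, try again
// $rand is now reserved without holding the whole range in memory

It can still retry, but each attempt is a cheap indexed insert rather than an in_array() scan over a huge exclusion list, and the unique key makes the reservation safe against concurrent requests.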
