Randomize a PHP array with a seed? - php

I'm looking for a function that I can pass an array and a seed to in PHP and get back a "randomized" array. If I passed the same array and same seed again, I would get the same output.
I've tried this code
//sample array
$test = array(1,2,3,4,5,6);
//show the array
print_r($test);
//seed the random number generator
mt_srand('123');
//generate a random number based on that
echo mt_rand();
echo "\n";
//shuffle the array
shuffle($test);
//show the results
print_r($test);
But it does not seem to work. Any thoughts on the best way to do this?
This question dances around the issue but it's old and nobody has provided an actual answer on how to do it: Can i randomize an array by providing a seed and get the same order? - "Yes" - but how?
Update
The answers so far work with PHP 5.1 and 5.3, but not 5.2. Just so happens the machine I want to run this on is using 5.2.
Can anyone give an example without using mt_rand? It is "broken" in php 5.2 because it will not give the same sequence of random numbers based off the same seed. See the php mt_rand page and the bug tracker to learn about this issue.

Sorry, but accordingly to the
documentation the
shuffle function is seeded automatically.
Normally, you shouldn't try to come up with your own algorithms to randomize things since they are very likely to be biased. The Fisher-Yates algorithm is known to be both efficient and unbiased though:
function fisherYatesShuffle(&$items, $seed)
{
#mt_srand($seed);
for ($i = count($items) - 1; $i > 0; $i--)
{
$j = #mt_rand(0, $i);
$tmp = $items[$i];
$items[$i] = $items[$j];
$items[$j] = $tmp;
}
}
Example (PHP 5.5.9):
php > $original = array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
php > $shuffled = (array)$original;
php > fisherYatesShuffle($shuffled, 0);
php > print_r($shuffled);
Array
(
[0] => 6
[1] => 0
[2] => 7
[3] => 2
[4] => 9
[5] => 3
[6] => 1
[7] => 8
[8] => 5
[9] => 4
)
php > $shuffled = (array)$original;
php > fisherYatesShuffle($shuffled, 0);
php > print_r($shuffled);
Array
(
[0] => 6
[1] => 0
[2] => 7
[3] => 2
[4] => 9
[5] => 3
[6] => 1
[7] => 8
[8] => 5
[9] => 4
)

You can use array_multisort to order the array values by a second array of mt_rand values:
$arr = array(1,2,3,4,5,6);
mt_srand('123');
$order = array_map(create_function('$val', 'return mt_rand();'), range(1, count($arr)));
array_multisort($order, $arr);
var_dump($arr);
Here $order is an array of mt_rand values of the same length as $arr. array_multisort sorts the values of $order and orders the elements of $arr according to the order of the values of $order.

The problem you have is that PHP comes with two random number generators built in.
The shuffle() command does not use the mt_rand() random number generator; it uses the older rand() random number generator.
Therefore, if you want shuffle() to use a seeded number sequence, you need to seed the older randomiser, using srand() rather than mt_srand().
In most other cases, you should use mt_rand() rather than rand(), since it is a better random number generator.

The main question involves two parts. One is about how to shuffle. The other is about how to add randomness to it.
A simple solution
This is probably the simplest answer to the main question. It is sufficient for most cases in PHP scripting. But not all (see below).
function /*array*/ seedShuffle(/*one dimentional array*/ $array, /*integer*/ $seed) {
$tmp = array();
for ($rest = $count = count($array);$count>0;$count--) {
$seed %= $count;
$t = array_splice($array,$seed,1);
$tmp[] = $t[0];
$seed = $seed*$seed + $rest;
}
return $tmp;
}
The above method will do, even though it doesn't produce true random shuffles for all possible seed-array combinations. However, if you really want it to be balanced and all, I guess PHP shuldn't be your choice.
A more useful solution for advanced programmers
As stated by André Laszlo, randomization is a tricky business. It is usually best to let a dedicated object handle it. My point is, that you shouldn't need to bother with the randomness when you write the shuffle function. Depending on what degree of ramdomness you would like in your shuffle, you may have a number of PseudoRandom objects to choose from. Thus the above could look like this:
abstract class PseudoRandom {
protected abstract function /*integer*/ nextInt();
public function /*integer*/ randInt(/*integer*/ $limit) {
return $this->nextInt()%$limit;
}
}
function /*array*/ seedShuffle($array, /*PseudoRandom Object*/ $rnd) {
$tmp = array();
$count = count($array);
while($count>0) {
$t = array_splice($array,$rnd->randInt($count--),1);
$tmp[] = $t[0];
}
return $tmp;
}
Now, this solution is the one I would vote for. It separates shuffle codes from randomization codes. Depending on what kind of random you need you can subclass PseudoRandom, add the needed methods and your preferred formulas.
And, as the same shuffle function may be used with many random algorithms, one random algorithm may be used in different places.

In recent PHP versions, seeding the PHP builtin rand() and mt_rand() functions will not give you the same results everytime. The reason for this is not clear to me (why would you want to seed the function anyway if the result is different every time.) Anyway, it seems like the only solution is to write your own random function
class Random {
// random seed
private static $RSeed = 0;
// set seed
public static function seed($s = 0) {
self::$RSeed = abs(intval($s)) % 9999999 + 1;
self::num();
}
// generate random number
public static function num($min = 0, $max = 9999999) {
if (self::$RSeed == 0) self::seed(mt_rand());
self::$RSeed = (self::$RSeed * 125) % 2796203;
return self::$RSeed % ($max - $min + 1) + $min;
}
}
Usage:
// set seed
Random::seed(42);
// echo 10 numbers between 1 and 100
for ($i = 0; $i < 10; $i++) {
echo Random::num(1, 100) . '<br />';
}
The code above will output the folowing sequence every time you run it:
76
86
14
79
73
2
87
43
62
7
Just change the seed to get a completely different "random" sequence

A variant that also works with PHP version 7.2, because the php function create_function is deprecated in the newest php version.
mt_srand($seed);
$getMTRand = function () {
return mt_rand();
};
$order = array_map($getMTRand, range(1, count($array)));
array_multisort($order, $array);
return $array;

I guess this will do the job :
function choose_X_random_items($original_array , $number_of_items_wanted = -1 , $seed = FALSE ){
//save the keys
foreach ($original_array as $key => $value) {
$original_array[$key]['key_memory'] = $key;
}
$original_array = array_values($original_array);
$results = array();
if($seed !== FALSE){srand($seed);}
$main_random = rand();
$random = substr($main_random,0,( $number_of_items_wanted == -1 ? count($original_array) : min($number_of_items_wanted,count($original_array)) ));
$random = str_split($random);
foreach ($random AS $id => $value){
$pick = ($value*$main_random) % count($original_array);
$smaller_array[] = $original_array[$pick];
unset($original_array[$pick]);
$original_array = array_values($original_array);
}
//retrieve the keys
foreach ($smaller_array as $key => $value) {
$smaller_array[$value['key_memory']] = $value;
unset($smaller_array[$value['key_memory']]['key_memory']);
unset($smaller_array[$key]);
}
return $smaller_array;
}
In order to not limit the resulting array, set $number_of_items_wanted to -1
In order to not use a seed, set $seed to FALSE

Seeded shuffle while maintaining the key index:
function seeded_shuffle(array &$items, $seed = false) {
mt_srand($seed ? $seed : time());
$keys = array_keys($items);
$items = array_values($items);
for ($i = count($items) - 1; $i > 0; $i--) {
$j = mt_rand(0, $i);
list($items[$i], $items[$j]) = array($items[$j], $items[$i]);
list($keys[$i], $keys[$j]) = array($keys[$j], $keys[$i]);
}
$items = array_combine($keys, $items);
}

A simple solution:
$pool = [1, 2, 3, 4, 5, 6];
$seed = 'foo';
$randomIndex = crc32($seed) % count($pool);
$randomElement = $pool[$randomIndex];
It might not be quite as random as the Fisher Yates shuffle, but I found it gave me more than enough entropy for what I needed it for.

Based on #Gumbo, #Spudley, #AndreyP answers, it works like that:
$arr = array(1,2,3,4,5,6);
srand(123); //srand(124);
$order = array_map(function($val) {return rand();}, range(1, count($arr)));
array_multisort($order, $arr);
var_dump($arr);

Home Made function, using crc32(note:also returns negative value(https://www.php.net/manual/en/function.crc32.php))
$sortArrFromSeed = function($arr, $seed){
$arrLen = count($arr);
$newArr = [];
$hash = crc32($seed); // returns hash (0-9 numbers)
$hash = strval($hash);
while(strlen($hash) < $arrLen){
$hash .= $hash;
}
for ($i=0; $i<$arrLen; $i++) {
$index = (int) $hash[$i] * (count($arr)/9); // because 0-9 range
$index = (int) $index; // remove decimal
if($index !== 0) $index--;
array_push($newArr, $arr[$index]);
unset($arr[$index]);
$arr = array_values($arr);
}
return $newArr;
};
// TESTING
$arr = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'];
$arr = $sortArrFromSeed($arr,'myseed123');
echo '<pre>'.print_r($arr,true).'</pre>';

This seems the easiest for me...
srand(123);
usort($array,function($a,$b){return rand(-1,1);});

Related

php - How to create a random bit stream

How can I generate following random array:
array(1, 0 , 0, 1, 1, 1, 1, 0, 1, 0, 0)
with length = n?
Of course, always the for solution is out there. But is there any One-line simple solution on this, like any combination of array_fill and mt_rand(0, 1)?
You can use combination of array_map and range (or array_fill if you need) to generate random values.
$length = 10;
$randBits = array_map(
function () {
return mt_rand(0, 1);
},
range(0, $length - 1)
);
Working example
Use range with mt_rand
$random_bit_stream = array_map(function() {
return mt_rand(0,1);
}, range(1,6));
or
$random_bit_stream = array_map(function() {
return rand(0,1);
}, range(1,6));
print_r($random_bit_stream);
Output:
Array ( [0] => 0 [1] => 1 [2] => 1 [3] => 1 [4] => 0 [5] => 0 )
try this,
function randome_array($n)
{
$array=array();
$x = 1;
while($x <= $n) {
$temp = mt_rand(0,1);
array_push($array,$temp);
$x++;
}
return $array;
}
print_r(randome_array(5));
https://3v4l.org/ZSrMX
i hope it will be helpful.
Well, as you mentioned, you could one-line the for solution.
for ($i = 0; $i < count($arr); $i++) $arr[$i] = rand(0, 1);
aside that, since you mentioned array_fill, you could simply use it with the random function as well
$arr = array_fill (0, count($arr), rand(0, 1));
You can try with range()
range — Create an array containing a range of elements
for more info check http://php.net/manual/en/function.range.php

array_slice with continuous sequence php

Good Day!
I have array structure for 13 with range (1,13);
it something like
array(1,2,3,4,5,6,7,8,9,10,11,12,13);
$slice=2;
$ignore = 3;
i want array structure like array(3,4,5,8,9,10,13);
I tried with array_slice function but i couldn't offset the value sequentially
array_slice($array,2,13,true);
is there a way to ignore the next 3 values and add slice to the next second until last array, any native function or key will be great.
Thanks,
vicky
Using array_merge(), function one could join two array slice. This will return a new array.
http://php.net/manual/en/function.array-merge.php
<?php
$slice = 2;
$ignore = 3;
$a = array(1,2,3,4,5,6,7,8,9,10,11,12,13);
$a = array_merge(array_slice($a, $slice,$ignore),array_merge(array_slice($a, 2* $slice + $ignore,3), array_slice($a, 3* $slice + 2*$ignore)));
var_dump($a);
?>
I hope you got it. :)
Here's a function that will recursively perform an "asymmetric" slicing of an array of n length:
function slice_asym(array $head, $exclude, $include)
{
$keep = array_slice($head, $exclude, $include);
$tail = array_slice($head, $exclude + $include);
return $tail ? array_merge($keep, slice_asym($tail, $exclude, $include))
: $keep;
}
Recursion requires a base case where recursion stops and all scopes unwind and return. In our case, we want to stop when the tail contains no more elements.
array_slice() always returns an array - if there are no elements to return for its given offset, array_slice() returns an empty array. So, for each recursion, we want to:
Slice out the element(s) that we want to $keep
Create a $tail - this is a subset of the array that excludes the elements that we both want to ignore and keep in the current scope ($exclude + $include).
If the $tail is not empty, create a new level of recursion. Otherwise stop recursion and merge all current $keep elements with the result of the next level of recursion.
In pseudocode, this looks something like:
# step 1: first recursion
head = [1,2,3,4,5,6,7,8,9,10,11,12,13]
keep = [3,4,5]
tail = [6,7,8,9,10,11,12,13] # tail not empty
# step 2: second recursion
head = [6,7,8,9,10,11,12,13]
keep = [8,9,10]
tail = [11,12,13] # tail not empty
# step 3: third recursion
head = [11,12,13]
keep = [13]
tail = [] # tail empty: base case met!
# return and merge `keep` elements
[3,4,5] + [8, 9, 10] + [13] -> [3,4,5,8,9,10,13]
As per your example, the following invocation:
$range = range(1, 13); // [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
$sliced = slice_asym($range, 2, 3);
print_r($sliced);
Yields:
Array
(
[0] => 3
[1] => 4
[2] => 5
[3] => 8
[4] => 9
[5] => 10
[6] => 13
)
Edit
If you are using PHP version 5.5 or above, this is also a good use case for a generator.
These are useful because:
A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate.
So, you would not have to pre-create an array of sequential values:
Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
We could implement a generator for your use case like so:
/**
*
* #param int $limit
* #param int $exclude
* #param int $include
* #return int
*/
function xrange_asym($limit, $exclude, $include) {
if ($limit < 0 || $exclude < 0 || $include < 0) {
throw new UnexpectedValueException(
'`xrange_asym` does not accept negative values');
}
$seq = 1;
$min = $exclude;
$max = $exclude + $include;
while ($seq <= $limit) {
if ($seq > $min && $seq <= $max) {
yield $seq;
}
if ($seq === $max) {
$min = $exclude + $max;
$max = $exclude + $include + $max;
}
$seq = $seq + 1;
}
}
and use it, like so:
foreach (xrange_asym(13, 2, 3) as $number) {
echo $number . ', '; // 3, 4, 5, 8, 9, 10, 13
}
Or use iterator_to_array() to generate the whole range as an array:
print_r(iterator_to_array(xrange_asym(13, 2, 3)));
Hope this helps :)

Efficiently pick n random elements from PHP array (without shuffle)

I have the following code to pick $n elements from an array $array in PHP:
shuffle($array);
$result = array_splice($array, 0, $n);
Given a large array but only a few elements (for example 5 out of 10000), this is relatively slow, so I would like to optimize it such that not all elements have to be shuffled. The values must be unique.
I'm looking fo the most performant alternative. We can assume that $array has no duplicates and is 0-indexed.
$randomArray = [];
while (count($randomArray) < 5) {
$randomKey = mt_rand(0, count($array)-1);
$randomArray[$randomKey] = $array[$randomKey];
}
This will provide exactly 5 elements with no duplicates and very quickly. The keys will be preserved.
Note: You'd have to make sure $array had 5 or more elements or add some sort of check to prevent an endless loop.
This function performs a shuffle on only $n elements where $n is the number of random elements you want to pick. It will also work on associative arrays and sparse arrays. $array is the array to work on and $n is the number of random elements to retrieve.
If we define the $max_index as count($array) - 1 - $iteration.
It works by generating a random number between 0 and $max_index. Picking the key at that index, and replacing its index with the value at $max_index so that it can never be picked again, as $max_index will be one less at the next iteration and unreachable.
In summary this is the Richard Durstenfeld's Fisher-Yates shuffle but operating only on $n elements instead of the entire array.
function rand_pluck($array, $n) {
$array_keys = array_keys($array);
$array_length = count($array_keys);
$max_index = $array_length -1;
$iterations = min($n, $array_length);
$random_array = array();
while($iterations--) {
$index = mt_rand(0, $max_index);
$value = $array_keys[$index];
$array_keys[$index] = $array_keys[$max_index];
array_push($random_array, $array[$value]);
$max_index--;
}
return $random_array;
}
The trick is to use a variation of shuffle or in other words a partial shuffle.
performance is not the only criterion, statistical efficiency, i.e unbiased sampling is as important (as the original shuffle solution is)
function random_pick( $a, $n )
{
$N = count($a);
$n = min($n, $N);
$picked = array_fill(0, $n, 0); $backup = array_fill(0, $n, 0);
// partially shuffle the array, and generate unbiased selection simultaneously
// this is a variation on fisher-yates-knuth shuffle
for ($i=0; $i<$n; $i++) // O(n) times
{
$selected = mt_rand( 0, --$N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
$value = $a[ $selected ];
$a[ $selected ] = $a[ $N ];
$a[ $N ] = $value;
$backup[ $i ] = $selected;
$picked[ $i ] = $value;
}
// restore partially shuffled input array from backup
// optional step, if needed it can be ignored, e.g $a is passed by value, hence copied
for ($i=$n-1; $i>=0; $i--) // O(n) times
{
$selected = $backup[ $i ];
$value = $a[ $N ];
$a[ $N ] = $a[ $selected ];
$a[ $selected ] = $value;
$N++;
}
return $picked;
}
NOTE the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffling) and produces output which is proper array with consecutive keys (not needing extra array_values etc..)
Use example:
$randomly_picked = random_pick($my_array, 5);
// or if an associative array is used
$randomly_picked_keys = random_pick(array_keys($my_array), 5);
$randomly_picked = array_intersect_key($my_array, array_flip($randomly_picked_keys));
For further variations and extensions of shuffling for PHP:
PHP - shuffle only part of an array
PHP shuffle with seed
How can I take n elements at random from a Perl array?
This will only show benifits for small n compared to an array shuffle, but you could
Choose a random index r n times, each time decreasing the limit by 1
Adjust for previously used indices
Take value
Store used index
Pseudocode
arr = []
used = []
for i = 0..n-1:
r = rand 0..len-i
d = 0
for j = 0..used.length-1:
if r >= used[j]:
d += 1
arr.append($array[r + d])
used.append(r)
return arr
You could generate n-times a random number with mt_rand() and then fill these values in a new array. To go against the case where the same index gets returned twice we use the actual returned index to fill the new array and check always if the index exists in the new array, if so we use while to loop through it as long as we get a duplicate index. At the end we use array_values() to get a 0-indexed array.
$count = count($array) - 1;
$new_array = array();
for($i = 0; $i < $n; $i++) {
$index = mt_rand(0, $count);
while(isset($new_array[$index])) {
$index = mt_rand(0, $count);
}
$new_array[$index] = $array[$index];
}
$new_array = array_values($new_array);
I wonder why everyone here make it so complicated?
Here's the fastest and simplest way:
$randomArray = array_rand(array_flip($array), $n);

Get no. of array elements between certain values in PHP?

How to the find the number of array elements present in the array between 2 values in PHP .
lets say this is my array =>
$a = array(1,2,3,5,10);
I want to find the length of array between 2 values i.e. 2 and 10. So the answer will be 3 in the case. If the highest value to be searched is present in the array it should be added in count.
also length of array between 2 and 9 is 2.
Hope I am clear with my question. Any help would be appreciated.
Use array_filter() to filter the array down to the matching elements, then use count() on the resulting filtered array.
$a = array(1,2,3,5,10);
print count(array_filter($a, function($e) {return ($e>=2 && $e<=10);}));
Hope that helps.
Note: The syntax I've used here, with the embedded function, requires PHP v5.3 or higher.
[EDIT]
Turn it into a simple callable function:
$a = array(1,2,3,5,10);
print countInRange(2,10);
function countInRange($min,$max) {
return count(array_filter($a, function($e) use($min,$max) {return ($e>=$min && $e<=$max);}));
}
See PHP manual for more info on array_filter() function.
$count = 0;
for( $i = 0; $i < count($a); $i++ ) {
if( $a[$i] <= $max && $a[$i] >= $min )
$count++;
}
echo $count;
$count = 0;
$min = 2;
$max = 10;
$a = array(1,2,3,5,10);
foreach($a as $val) {
if($val > $min && $val <= $max) ++$count;
}
This should work, right? At least it will find all numbers, which are higher than your minimum but smaller/equal to your maximum.
Of course this also means, that an array of 1, 2, 9, 5 would return 2, since both 9 and 5 are greater than 2 and smaller than 10.

PHP Arrays - Remove duplicates ( Time complexity )

Okay this is not a question of "how to get all uniques" or "How to remove duplicates from my array in php". This is a question about the time complexity.
I figured that the array_unique is somewhat O(n^2 - n) and here's my implementation:
function array_unique2($array)
{
$to_return = array();
$current_index = 0;
for ( $i = 0 ; $i < count($array); $i++ )
{
$current_is_unique = true;
for ( $a = $i+1; $a < count($array); $a++ )
{
if ( $array[$i] == $array[$a] )
{
$current_is_unique = false;
break;
}
}
if ( $current_is_unique )
{
$to_return[$current_index] = $array[$i];
}
}
return $to_return;
}
However when benchmarking this against the array_unique i got the following result:
Testing (array_unique2)... Operation took 0.52146291732788 s.
Testing (array_unique)... Operation took 0.28323101997375 s.
Which makes the array_unique twice as fast, my question is, why ( Both had the same random data ) ?
And a friend of mine wrote the following:
function array_unique2($a)
{
$n = array();
foreach ($a as $k=>$v)
if (!in_array($v,$n))
$n[$k]=$v;
return $n;
}
which is twice as fast as the built in one in php.
I'd like to know, why?
What is the time-complexity of array_unique and in_array?
Edit
I removed the count($array) from both loops and just used a variable in the top of the function, that gained 2 seconds on 100 000 elements!
While I can't speak for the native array_unique function, I can tell you that your friends algorithm is faster because:
He uses a single foreach loop as opposed to your double for() loop.
Foreach loops tend to perform faster than for loops in PHP.
He used a single if(! ) comparison while you used two if() structures
The only additional function call your friend made was in_array whereas you called count() twice.
You made three variable declarations that your friend didn't have to ($a, $current_is_unique, $current_index)
While none of these factors alone is huge, I can see where the cumulative effect would make your algorithm take longer than your friends.
The time complexity of in_array() is O(n). To see this, we'll take a look at the PHP source code.
The in_array() function is implemented in ext/standard/array.c. All it does is call php_search_array(), which contains the following loop:
while (zend_hash_get_current_data_ex(target_hash, (void **)&entry, &pos) == SUCCESS) {
// checking the value...
zend_hash_move_forward_ex(target_hash, &pos);
}
That's where the linear characteristic comes from.
This is the overall characteristic of the algorithm, becaus zend_hash_move_forward_ex() has constant behaviour: Looking at Zend/zend_hash.c, we see that it's basically just
*current = (*current)->pListNext;
As for the time complexity of array_unique():
first, a copy of the array will be created, which is an operation with linear characteristic
then, a C array of struct bucketindex will be created and pointers into our array's copy will be put into these buckets - linear characteristic again
then, the bucketindex-array will be sorted usign quicksort - n log n on average
and lastly, the sorted array will be walked and and duplicate entries will be removed from our array's copy - this should be linear again, assuming that deletion from our array is a constant time operation
Hope this helps ;)
Try this algorithm. It takes advantage of the fact that the key lookup is faster than in_array():
function array_unique_mine($A) {
$keys = Array();
$values = Array();
foreach ($A as $k => $v) {
if (!array_key_exists($v, $values)) {
$keys[] = $k;
$values[$v] = $v;
}
}
return array_combine($keys, $values);
}
Gabriel's answer has some great points about why your friend's method beats yours. Intrigued by the conversation following Christoph's answer, I decided to run some tests of my own.
Also, I tried this with differing lengths of random strings and although the results were different, the order was the same. I used 6 chars in this example for brevity.
Notice that array_unique5 actually has the same keys as native, 2 and 3, but just outputs in a different order.
Results...
Testing 10000 array items of data over 1000 iterations:
array_unique6: 1.7561039924622 array ( 9998 => 'b', 9992 => 'a', 9994 => 'f', 9997 => 'e', 9993 => 'c', 9999 => 'd', )
array_unique4: 1.8798060417175 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 4 => 'c', 5 => 'd', )
array_unique5: 7.5023629665375 array ( 10 => 'd', 0 => 'b', 3 => 'e', 2 => 'f', 9 => 'c', 1 => 'a', )
array_unique3: 11.356487989426 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
array_unique: 22.535032987595 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
array_unique2: 62.107122898102 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
array_unique7: 71.557286024094 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
And The Code...
set_time_limit(0);
define('HASH_TIMES', 1000);
header('Content-Type: text/plain');
$aInput = array();
for ($i = 0; $i < 10000; $i++) {
array_push($aInput, chr(rand(97, 102)));
}
function array_unique2($a) {
$n = array();
foreach ($a as $k=>$v)
if (!in_array($v,$n))
$n[$k]=$v;
return $n;
}
function array_unique3($aOriginal) {
$aUnique = array();
foreach ($aOriginal as $sKey => $sValue) {
if (!isset($aUnique[$sValue])) {
$aUnique[$sValue] = $sKey;
}
}
return array_flip($aUnique);
}
function array_unique4($aOriginal) {
return array_keys(array_flip($aOriginal));
}
function array_unique5($aOriginal) {
return array_flip(array_flip(array_reverse($aOriginal, true)));
}
function array_unique6($aOriginal) {
return array_flip(array_flip($aOriginal));
}
function array_unique7($A) {
$keys = Array();
$values = Array();
foreach ($A as $k => $v) {
if (!array_key_exists($v, $values)) {
$keys[] = $k;
$values[$v] = $v;
}
}
return array_combine($keys, $values);
}
function showResults($sMethod, $fTime, $aInput) {
echo $sMethod . ":\t" . $fTime . "\t" . implode("\t", array_map('trim', explode("\n", var_export(call_user_func($sMethod, $aInput), 1)))) . "\n";
}
echo 'Testing ' . (count($aInput)) . ' array items of data over ' . HASH_TIMES . " iterations:\n";
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique($aInput);
$aResults['array_unique'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique2($aInput);
$aResults['array_unique2'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique3($aInput);
$aResults['array_unique3'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique4($aInput);
$aResults['array_unique4'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique5($aInput);
$aResults['array_unique5'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique6($aInput);
$aResults['array_unique6'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique7($aInput);
$aResults['array_unique7'] = microtime(1) - $fTime;
asort($aResults, SORT_NUMERIC);
foreach ($aResults as $sMethod => $fTime) {
showResults($sMethod, $fTime, $aInput);
}
Results using Christoph's data set from the comments:
$aInput = array(); for($i = 0; $i < 1000; ++$i) $aInput[$i] = $i; for($i = 500; $i < 700; ++$i) $aInput[10000 + $i] = $i;
Testing 1200 array items of data over 1000 iterations:
array_unique6: 0.83235597610474
array_unique4: 0.84050011634827
array_unique5: 1.1954448223114
array_unique3: 2.2937450408936
array_unique7: 8.4412341117859
array_unique: 15.225166797638
array_unique2: 48.685120105743
PHP's arrays are implemented as hash tables, i.e. their performance characteristics are different from what you'd expect from 'real' arrays. An array's key-value-pairs are additionally stored in a linked list to allow fast iteration.
This explains why your implementation is so slow compared to your friend's: For every numeric index, your algorithm has to do a hash table lookup, whereas a foreach()-loop will just iterate over a linked list.
The following implementation uses a reverse hash table and might be the fastest of the crowd (double-flipping courtesy of joe_mucchiello):
function array_unique2($array) {
return array_flip(array_flip($array));
}
This will only work if the values of $array are valid keys, ie integers or strings.
I also reimplemented your algorithm using foreach()-loops. Now, it will actually be faster than your friend's for small data sets, but still slower than the solution via array_flip():
function array_unique3($array) {
$unique_array = array();
foreach($array as $current_key => $current_value) {
foreach($unique_array as $old_value) {
if($current_value === $old_value)
continue 2;
}
$unique_array[$current_key] = $current_value;
}
return $unique_array;
}
For large data sets, the built-in version array_unique() will outperform all other's except the double-flipping one. Also, the version using in_array() by your friend will be faster than array_unique3().
To summarize: Native code for the win!
Yet another version, which should preserve keys and their ordering:
function array_flop($array) {
$flopped_array = array();
foreach($array as $key => $value) {
if(!isset($flopped_array[$value]))
$flopped_array[$value] = $key;
}
return $flopped_array;
}
function array_unique4($array) {
return array_flip(array_flop($array));
}
This is actually enobrev's array_unique3() - I didn't check his implementations as thoroughly as I should have...
PHP is slower to execute than raw machine code (which is most likely executed by array_unique).
Your second example function (the one your friend wrote) is interesting. I do not see how it would be faster than the native implementation, unless the native one is removing elements instead of building a new array.
I'll admit I don't understand the native code very well, but it seems to copy the entire array, sort it, then loop through it removing duplicates. In that case your second piece of code is actually a more efficient algorithm, since adding to the end of an array is cheaper than deleting from the middle of it.
Keep in mind the PHP developers probably had a good reason for doing it the way they do. Does anyone want to ask them?
The native PHP function array_unique is implemented in C. Thus it is faster than PHP, that has to be translated first. What’s more, PHP uses an different algorithm than you do. As I see it, PHP first uses Quick sort to sort the elements and then deletes the duplicates in one run.
Why his friend’s implementation is faster has his own? Because it uses more built-in functionality that trying to recreate them.

Categories