PHP Arrays - Remove duplicates ( Time complexity )

PHP Arrays - Remove duplicates ( Time complexity ) - php

Okay this is not a question of "how to get all uniques" or "How to remove duplicates from my array in php". This is a question about the time complexity.
I figured that the array_unique is somewhat O(n^2 - n) and here's my implementation:
function array_unique2($array)
{
$to_return = array();
$current_index = 0;
for ( $i = 0 ; $i < count($array); $i++ )
{
$current_is_unique = true;
for ( $a = $i+1; $a < count($array); $a++ )
{
if ( $array[$i] == $array[$a] )
{
$current_is_unique = false;
break;
}
}
if ( $current_is_unique )
{
$to_return[$current_index] = $array[$i];
}
}
return $to_return;
}
However when benchmarking this against the array_unique i got the following result:
Testing (array_unique2)... Operation took 0.52146291732788 s.
Testing (array_unique)... Operation took 0.28323101997375 s.
Which makes the array_unique twice as fast, my question is, why ( Both had the same random data ) ?
And a friend of mine wrote the following:
function array_unique2($a)
{
$n = array();
foreach ($a as $k=>$v)
if (!in_array($v,$n))
$n[$k]=$v;
return $n;
}
which is twice as fast as the built in one in php.
I'd like to know, why?
What is the time-complexity of array_unique and in_array?
Edit
I removed the count($array) from both loops and just used a variable in the top of the function, that gained 2 seconds on 100 000 elements!

While I can't speak for the native array_unique function, I can tell you that your friends algorithm is faster because:
He uses a single foreach loop as opposed to your double for() loop.
Foreach loops tend to perform faster than for loops in PHP.
He used a single if(! ) comparison while you used two if() structures
The only additional function call your friend made was in_array whereas you called count() twice.
You made three variable declarations that your friend didn't have to ($a, $current_is_unique, $current_index)
While none of these factors alone is huge, I can see where the cumulative effect would make your algorithm take longer than your friends.

The time complexity of in_array() is O(n). To see this, we'll take a look at the PHP source code.
The in_array() function is implemented in ext/standard/array.c. All it does is call php_search_array(), which contains the following loop:
while (zend_hash_get_current_data_ex(target_hash, (void **)&entry, &pos) == SUCCESS) {
// checking the value...
zend_hash_move_forward_ex(target_hash, &pos);
}
That's where the linear characteristic comes from.
This is the overall characteristic of the algorithm, becaus zend_hash_move_forward_ex() has constant behaviour: Looking at Zend/zend_hash.c, we see that it's basically just
*current = (*current)->pListNext;
As for the time complexity of array_unique():
first, a copy of the array will be created, which is an operation with linear characteristic
then, a C array of struct bucketindex will be created and pointers into our array's copy will be put into these buckets - linear characteristic again
then, the bucketindex-array will be sorted usign quicksort - n log n on average
and lastly, the sorted array will be walked and and duplicate entries will be removed from our array's copy - this should be linear again, assuming that deletion from our array is a constant time operation
Hope this helps ;)

Try this algorithm. It takes advantage of the fact that the key lookup is faster than in_array():
function array_unique_mine($A) {
$keys = Array();
$values = Array();
foreach ($A as $k => $v) {
if (!array_key_exists($v, $values)) {
$keys[] = $k;
$values[$v] = $v;
}
}
return array_combine($keys, $values);
}

Gabriel's answer has some great points about why your friend's method beats yours. Intrigued by the conversation following Christoph's answer, I decided to run some tests of my own.
Also, I tried this with differing lengths of random strings and although the results were different, the order was the same. I used 6 chars in this example for brevity.
Notice that array_unique5 actually has the same keys as native, 2 and 3, but just outputs in a different order.
Results...
Testing 10000 array items of data over 1000 iterations:
array_unique6: 1.7561039924622 array ( 9998 => 'b', 9992 => 'a', 9994 => 'f', 9997 => 'e', 9993 => 'c', 9999 => 'd', )
array_unique4: 1.8798060417175 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 4 => 'c', 5 => 'd', )
array_unique5: 7.5023629665375 array ( 10 => 'd', 0 => 'b', 3 => 'e', 2 => 'f', 9 => 'c', 1 => 'a', )
array_unique3: 11.356487989426 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
array_unique: 22.535032987595 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
array_unique2: 62.107122898102 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
array_unique7: 71.557286024094 array ( 0 => 'b', 1 => 'a', 2 => 'f', 3 => 'e', 9 => 'c', 10 => 'd', )
And The Code...
set_time_limit(0);
define('HASH_TIMES', 1000);
header('Content-Type: text/plain');
$aInput = array();
for ($i = 0; $i < 10000; $i++) {
array_push($aInput, chr(rand(97, 102)));
}
function array_unique2($a) {
$n = array();
foreach ($a as $k=>$v)
if (!in_array($v,$n))
$n[$k]=$v;
return $n;
}
function array_unique3($aOriginal) {
$aUnique = array();
foreach ($aOriginal as $sKey => $sValue) {
if (!isset($aUnique[$sValue])) {
$aUnique[$sValue] = $sKey;
}
}
return array_flip($aUnique);
}
function array_unique4($aOriginal) {
return array_keys(array_flip($aOriginal));
}
function array_unique5($aOriginal) {
return array_flip(array_flip(array_reverse($aOriginal, true)));
}
function array_unique6($aOriginal) {
return array_flip(array_flip($aOriginal));
}
function array_unique7($A) {
$keys = Array();
$values = Array();
foreach ($A as $k => $v) {
if (!array_key_exists($v, $values)) {
$keys[] = $k;
$values[$v] = $v;
}
}
return array_combine($keys, $values);
}
function showResults($sMethod, $fTime, $aInput) {
echo $sMethod . ":\t" . $fTime . "\t" . implode("\t", array_map('trim', explode("\n", var_export(call_user_func($sMethod, $aInput), 1)))) . "\n";
}
echo 'Testing ' . (count($aInput)) . ' array items of data over ' . HASH_TIMES . " iterations:\n";
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique($aInput);
$aResults['array_unique'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique2($aInput);
$aResults['array_unique2'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique3($aInput);
$aResults['array_unique3'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique4($aInput);
$aResults['array_unique4'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique5($aInput);
$aResults['array_unique5'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique6($aInput);
$aResults['array_unique6'] = microtime(1) - $fTime;
$fTime = microtime(1);
for ($i = 0; $i < HASH_TIMES; $i++) array_unique7($aInput);
$aResults['array_unique7'] = microtime(1) - $fTime;
asort($aResults, SORT_NUMERIC);
foreach ($aResults as $sMethod => $fTime) {
showResults($sMethod, $fTime, $aInput);
}
Results using Christoph's data set from the comments:
$aInput = array(); for($i = 0; $i < 1000; ++$i) $aInput[$i] = $i; for($i = 500; $i < 700; ++$i) $aInput[10000 + $i] = $i;
Testing 1200 array items of data over 1000 iterations:
array_unique6: 0.83235597610474
array_unique4: 0.84050011634827
array_unique5: 1.1954448223114
array_unique3: 2.2937450408936
array_unique7: 8.4412341117859
array_unique: 15.225166797638
array_unique2: 48.685120105743

PHP's arrays are implemented as hash tables, i.e. their performance characteristics are different from what you'd expect from 'real' arrays. An array's key-value-pairs are additionally stored in a linked list to allow fast iteration.
This explains why your implementation is so slow compared to your friend's: For every numeric index, your algorithm has to do a hash table lookup, whereas a foreach()-loop will just iterate over a linked list.
The following implementation uses a reverse hash table and might be the fastest of the crowd (double-flipping courtesy of joe_mucchiello):
function array_unique2($array) {
return array_flip(array_flip($array));
}
This will only work if the values of $array are valid keys, ie integers or strings.
I also reimplemented your algorithm using foreach()-loops. Now, it will actually be faster than your friend's for small data sets, but still slower than the solution via array_flip():
function array_unique3($array) {
$unique_array = array();
foreach($array as $current_key => $current_value) {
foreach($unique_array as $old_value) {
if($current_value === $old_value)
continue 2;
}
$unique_array[$current_key] = $current_value;
}
return $unique_array;
}
For large data sets, the built-in version array_unique() will outperform all other's except the double-flipping one. Also, the version using in_array() by your friend will be faster than array_unique3().
To summarize: Native code for the win!
Yet another version, which should preserve keys and their ordering:
function array_flop($array) {
$flopped_array = array();
foreach($array as $key => $value) {
if(!isset($flopped_array[$value]))
$flopped_array[$value] = $key;
}
return $flopped_array;
}
function array_unique4($array) {
return array_flip(array_flop($array));
}
This is actually enobrev's array_unique3() - I didn't check his implementations as thoroughly as I should have...

PHP is slower to execute than raw machine code (which is most likely executed by array_unique).
Your second example function (the one your friend wrote) is interesting. I do not see how it would be faster than the native implementation, unless the native one is removing elements instead of building a new array.

I'll admit I don't understand the native code very well, but it seems to copy the entire array, sort it, then loop through it removing duplicates. In that case your second piece of code is actually a more efficient algorithm, since adding to the end of an array is cheaper than deleting from the middle of it.
Keep in mind the PHP developers probably had a good reason for doing it the way they do. Does anyone want to ask them?

The native PHP function array_unique is implemented in C. Thus it is faster than PHP, that has to be translated first. What’s more, PHP uses an different algorithm than you do. As I see it, PHP first uses Quick sort to sort the elements and then deletes the duplicates in one run.
Why his friend’s implementation is faster has his own? Because it uses more built-in functionality that trying to recreate them.

Related

Cluster PHP array

Let's say I have an array of items with each item a value. I'd like to
create a new array where the items are clustered by their relative distance to each other.
When an item has a distance of one to another item, they belong to each other.
$input = [
'item-a' => 1,
'item-b' => 2,
'item-c' => 3,
'item-d' => 5,
];
$output = [
['item-a', 'item-b'],
['item-b', 'item-c'],
['item-d'],
];
This will create an output of overlapping arrays. What I want is that, because item-a and item-b are related, and item-b is also
related to item-c, I'd like to group item-a, item-b, and item-c to each other. The distance to item-c and item-d is greater than
1 so it will for a cluster of itself.
$output = [
['item-a', 'item-b', 'item-c'],
['item-d'],
];
How do I even start coding this?
Thanks in advance and have a nice day!

This can only be tested in your environment but here is what it does
it attempts to find relative distances based on array index 0's hash
it resorts the input array by distances (assuming that in this stage some will be positive and some negative) - that gives us the info to put the hash array in an order
Take this new array and put the hash back in
build a final output array measuring distances and sorting the level of output array by a threshhold.
I put in a couple dummy functions to return distances, obviously replace with your own. This might need tweaking but at this point, it's in your hands.
<?php
// example code
$input = [
'item-a' => 'a234234d',
'item-f' => 'h234234e',
'item-h' => 'e234234f',
'item-b' => 'f234234g',
'item-m' => 'd234234j',
'item-d' => 'm234234s',
'item-e' => 'n234234d',
'item-r' => 's234234g',
'item-g' => 'f234234f',
];
function getDistanceFrom($from, $to) {
return rand(-3,3);
}
function getDistanceFrom2($from, $to) {
return rand(0,7);
}
// first sort by relative distance from the first one
$tmp = [];
$ctr = 0;
foreach ($input as $item => $hash) {
if ($ctr === 0) { $ctr ++; continue; }
$tmp[$item]=getDistanceFrom(reset($input), $hash);
}
uasort($tmp, function ($a, $b)
{
return ($a < $b) ? -1 : 1;
});
//now they're in order, ditch the relative distance and put the hash back in
$sortedinput = [];
foreach ($tmp as $item => $d) {
$sortedinput[$item] = $input[$item];
}
$output=[];
$last=0;
$level=0;
$thresh = 3; // if item is within 3 of the previous, group
foreach($sortedinput as $v=>$i) {
$distance = getDistanceFrom2($last, $i);
if (abs($distance) > $thresh) $level++;
$output[$level][]=array("item" => $v, "distance" => $distance, "hash" => $i);
$last = $i;
}
print_r($output);

How to output different values from an array

I have a problem with my code. what I want to do is to delete an item from the array when it has been called meaning that, I want every output to be different. I want to use it to rotate proxy and there are over 150 proxies in the array. Here's an example of my code.
for ( $i = 1; $i < 2; $i++ )
{
// If the array_history is empty, re-populate it.
if (empty($array_history))
$array_history = $array;
// Select a random key.
$key = array_rand($array_history, 1);
// Save the record in $selected.
$selected = $array_history[$key];
// Remove the key/pair from the array.
unset($array_history[$key]);
// Echo the selected value.
echo $selected;
}
How can I do this or is a for loop not suitable for this? thanks in advance.

What you want to do is spread access over 150 proxies. In this case, it is not really necessary to do it randomly. You can just go through the array.
<?php
$array = [0, 1, 2, 3, 4, 5, 6];
for ( $i = 1; $i < 20; $i++ )
{
echo getNext($array) . '<br>';
}
function getNext (&$array) {
$e = next($array); // Every time next element is selected. Each output is different.
if ($e)
return $e;
else
return reset($array);
}
?>

This seems like a good application for a generator. This one takes an array of proxy addresses and loops over the array in random order, shuffling the array each time it starts the loop again.
function get_proxy($proxies) {
$i = 0;
$len = count($proxies);
while (true) {
if ($i == 0) shuffle($proxies);
yield $proxies[$i];
$i = ($i + 1) % $len;
}
}
To use this, you would do something like this:
$proxies = array('10.0.0.4', '192.168.0.1', '10.1.0.1');
$i = 0;
foreach (get_proxy($proxies) as $proxy) {
echo "$proxy\n";
$i++;
// stop otherwise infinite loop
if ($i == 9) break;
}
Note that since the generator has an infinite loop in it, the external foreach loop will also be infinite, and so needs a way to break out (I've used a simple counter in this case).
Sample output for the above code:
10.1.0.1
10.0.0.4
192.168.0.1
192.168.0.1
10.1.0.1
10.0.0.4
10.1.0.1
192.168.0.1
10.0.0.4
Demo on 3v4l.org
If a generator doesn't suit your code structure, you could use a function with static variables to return a new proxy on each call:
$proxies = array('10.0.0.4', '192.168.0.1', '10.1.0.1');
function get_proxy($proxies) {
static $i = 0, $keys;
if (!isset($keys)) $keys = array_keys($proxies);
if ($i == 0) shuffle($keys);
$proxy = $proxies[$keys[$i]];
$i = ($i + 1) % count($keys);
return $proxy;
}
for ($i= 0; $i < 9; $i++) {
echo get_proxy($proxies) . "\n";
}
Sample output for this code:
10.1.0.1
10.0.0.4
192.168.0.1
192.168.0.1
10.1.0.1
10.0.0.4
10.0.0.4
192.168.0.1
10.1.0.1
Demo on 3v4l.org

When you define an array in php such as
<?php
$alphabet = array(a, b, c)
?>
Your trying to look for the elements in the array. The element list always starts at a count of 0. So to call individual elements count from left to right starting at 0.
<?php
#a
echo $alphabet[0];
#b
echo $alphabet[1];
#c
echo $alphabet[2];
?>
The above section should yield a result of abc because there are no breaks.
For loops are really handy for going through the entire array and running checks, error analysis or even math as examples.

I've changed tho cede slightly, to me it reads as code that will take a selection of items, pick one at random, discard the item, and then pick again until none are left. If there are none left, copy the original array and start again.
Currently your code loops once. I've extended the loop to 4 iteratons here. And upon each random selection, am storing that random key in a history array. You can then refer to that history later in your code.
<?php
$array =
[
'beatle' => 'John',
'stone' => 'Mick',
'floyd' => 'Syd'
];
for ($history = [], $i = 1; $i < 5; $i++)
{
if (empty($buffer))
$buffer = $array;
$key = array_rand($buffer, 1);
$history[] = $key;
echo $buffer[$key], "\n";
unset($buffer[$key]);
}
var_export($history);
Example output:
Syd
Mick
John
Syd
array (
0 => 'floyd',
1 => 'stone',
2 => 'beatle',
3 => 'floyd',
)
Referring to the history array above:
echo $array[$history[3]];
Would output Syd.
Adapting and encapsulating the above in a class. This will do the same (but not store the history of picks), taking an item from an array at random and remove until there are no more items in the array, and then replenish the list and so on:
class RandomProxies
{
const PROXIES =
[
'proxy1' => 'foo.example.com',
'proxy2' => 'bar.example.com',
'proxy3' => 'baz.example.com',
];
private $buffer;
public function getProxy() {
if (empty($this->buffer))
$this->buffer = self::PROXIES;
$key = array_rand($this->buffer);
$proxy = self::PROXIES[$key];
unset($this->buffer[$key]);
return $proxy;
}
}
$randomiser = new RandomProxies;
foreach(range(1,4) as $n) {
echo $randomiser->getProxy(), "\n";
}
Example output:
foo.example.com
baz.example.com
bar.example.com
foo.example.com

combine array entries with every other entry

Sorry for the title as it looks like most of the other questions about combining arrays, but I don't know how to write it more specific.
I need a PHP function, which combines the entries of one array (dynamic size from 1 to any) to strings in every possible combination.
Here is an example with 4 entries:
$input = array('e1','e2','e3','e4);
This should be the result:
$result = array(
0 => 'e1',
1 => 'e1-e2',
2 => 'e1-e2-e3',
3 => 'e1-e2-e3-e4',
4 => 'e1-e2-e4',
5 => 'e1-e3',
6 => 'e1-e3-e4',
7 => 'e1-e4'
8 => 'e2',
9 => 'e2-e3',
10 => 'e2-e3-e4',
11 => 'e2-e4',
12 => 'e3',
13 => 'e3-e4',
14 => 'e4'
);
The sorting of the input array is relevant as it affects the output.
And as you see, there should be an result like e1-e2 but no e2-e1.
It seems really complicated, as the input array could have any count of entries.
I don't even know if there is a mathematical construct or a name which describes such a case.
Has anybody done this before?

You are saying that there might be any number of entries in the array so I'm assuming that you aren't manually inserting the data and there would be some source or code entering the data. Can you describe that? It might be easier to directly store it as per your requirement than having an array and then changing it as per your requirement
This might be helpful Finding the subsets of an array in PHP

I have managed to bodge together a code that creates the output you want from the input you have.
I think I have understood the logic of when and why each item looks the way it deos. But Im not sure, so test it carefully before using it live.
I have a hard time explaining the code since it's really a bodge.
But I use array_slice to grab the values needed in the strings, and implode to add the - between the values.
$in = array('e1','e2','e3','e4');
//$new =[];
$count = count($in);
Foreach($in as $key => $val){
$new[] = $val; // add first value
// loop through in to greate the long incrementing string
For($i=$key; $i<=$count-$key;$i++){
if($key != 0){
$new[] = implode("-",array_slice($in,$key,$i));
}else{
if($i - $key>1) $new[] = implode("-",array_slice($in,$key,$i));
}
}
// all but second to last except if iteration has come to far
if($count-2-$key >1) $new[] = Implode("-",Array_slice($in,$key,$count-2)). "-". $in[$count-1];
// $key (skip one) next one. except if iteration has come to far
If($count-2-$key >1) $new[] = $in[$key] . "-" . $in[$key+2];
// $key (skip one) rest of array except if iteration has come to far
if($count-2-$key > 1) $new[] = $in[$key] ."-". Implode("-",Array_slice($in,$key+2));
// $key and last item, except if iteration has come to far
if($count-1 - $key >1) $new[] = $in[$key] ."-". $in[$count-1];
}
$new = array_unique($new); // remove any duplicates that may have been created
https://3v4l.org/uEfh6

here is a modificated version of Finding the subsets of an array in PHP
function powerSet($in,$minLength = 1) {
$count = count($in);
$keys = array_keys($in);
$members = pow(2,$count);
$combinations = array();
for ($i = 0; $i < $members; $i++) {
$b = sprintf("%0".$count."b",$i);
$out = array();
for ($j = 0; $j < $count; $j++) {
if ($b{$j} == '1') {
$out[] = $keys[$j];
}
}
if (count($out) >= $minLength) {
$combinations[] = $out;
}
}
$result = array();
foreach ($combinations as $combination) {
$values = array();
foreach ($combination as $key) {
$values[$key] = $in[$key];
}
$result[] = implode('-', $values);
}
sort($result);
return $result;
}
This seems to work.

How to implode a generator?

I have a PHP generator which generates some $key => $value.
Is there an "easy" way to implode the values (by passing the generator name)? Or to transform it to an array?
I can do this with a couple of lines of code, but are there some builtins functions to accomplish this?

You can use iterator_to_array function for the same, have a look on below example:
function gen_one_to_three() {
for ($i = 1; $i <= 3; $i++) {
// Note that $i is preserved between yields.
yield $i;
}
}
$generator = gen_one_to_three();
$array = iterator_to_array($generator);
print_r($array);
Output
Array
(
[0] => 1
[1] => 2
[2] => 3
)

One has to keep in mind that
A generator is simply a function that returns an iterator.
For instance:
function digits() {
for ($i = 0; $i < 10; $i++) {
yield $i;
}
}
digits is a generator, digits() is an iterator.
Hence one should search for "iterator to array" functions, to find iterator_to_array (suggested by Chetan Ameta )

Randomize a PHP array with a seed?

I'm looking for a function that I can pass an array and a seed to in PHP and get back a "randomized" array. If I passed the same array and same seed again, I would get the same output.
I've tried this code
//sample array
$test = array(1,2,3,4,5,6);
//show the array
print_r($test);
//seed the random number generator
mt_srand('123');
//generate a random number based on that
echo mt_rand();
echo "\n";
//shuffle the array
shuffle($test);
//show the results
print_r($test);
But it does not seem to work. Any thoughts on the best way to do this?
This question dances around the issue but it's old and nobody has provided an actual answer on how to do it: Can i randomize an array by providing a seed and get the same order? - "Yes" - but how?
Update
The answers so far work with PHP 5.1 and 5.3, but not 5.2. Just so happens the machine I want to run this on is using 5.2.
Can anyone give an example without using mt_rand? It is "broken" in php 5.2 because it will not give the same sequence of random numbers based off the same seed. See the php mt_rand page and the bug tracker to learn about this issue.

Sorry, but accordingly to the
documentation the
shuffle function is seeded automatically.
Normally, you shouldn't try to come up with your own algorithms to randomize things since they are very likely to be biased. The Fisher-Yates algorithm is known to be both efficient and unbiased though:
function fisherYatesShuffle(&$items, $seed)
{
#mt_srand($seed);
for ($i = count($items) - 1; $i > 0; $i--)
{
$j = #mt_rand(0, $i);
$tmp = $items[$i];
$items[$i] = $items[$j];
$items[$j] = $tmp;
}
}
Example (PHP 5.5.9):
php > $original = array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9);
php > $shuffled = (array)$original;
php > fisherYatesShuffle($shuffled, 0);
php > print_r($shuffled);
Array
(
[0] => 6
[1] => 0
[2] => 7
[3] => 2
[4] => 9
[5] => 3
[6] => 1
[7] => 8
[8] => 5
[9] => 4
)
php > $shuffled = (array)$original;
php > fisherYatesShuffle($shuffled, 0);
php > print_r($shuffled);
Array
(
[0] => 6
[1] => 0
[2] => 7
[3] => 2
[4] => 9
[5] => 3
[6] => 1
[7] => 8
[8] => 5
[9] => 4
)

You can use array_multisort to order the array values by a second array of mt_rand values:
$arr = array(1,2,3,4,5,6);
mt_srand('123');
$order = array_map(create_function('$val', 'return mt_rand();'), range(1, count($arr)));
array_multisort($order, $arr);
var_dump($arr);
Here $order is an array of mt_rand values of the same length as $arr. array_multisort sorts the values of $order and orders the elements of $arr according to the order of the values of $order.

The problem you have is that PHP comes with two random number generators built in.
The shuffle() command does not use the mt_rand() random number generator; it uses the older rand() random number generator.
Therefore, if you want shuffle() to use a seeded number sequence, you need to seed the older randomiser, using srand() rather than mt_srand().
In most other cases, you should use mt_rand() rather than rand(), since it is a better random number generator.

The main question involves two parts. One is about how to shuffle. The other is about how to add randomness to it.
A simple solution
This is probably the simplest answer to the main question. It is sufficient for most cases in PHP scripting. But not all (see below).
function /*array*/ seedShuffle(/*one dimentional array*/ $array, /*integer*/ $seed) {
$tmp = array();
for ($rest = $count = count($array);$count>0;$count--) {
$seed %= $count;
$t = array_splice($array,$seed,1);
$tmp[] = $t[0];
$seed = $seed*$seed + $rest;
}
return $tmp;
}
The above method will do, even though it doesn't produce true random shuffles for all possible seed-array combinations. However, if you really want it to be balanced and all, I guess PHP shuldn't be your choice.
A more useful solution for advanced programmers
As stated by André Laszlo, randomization is a tricky business. It is usually best to let a dedicated object handle it. My point is, that you shouldn't need to bother with the randomness when you write the shuffle function. Depending on what degree of ramdomness you would like in your shuffle, you may have a number of PseudoRandom objects to choose from. Thus the above could look like this:
abstract class PseudoRandom {
protected abstract function /*integer*/ nextInt();
public function /*integer*/ randInt(/*integer*/ $limit) {
return $this->nextInt()%$limit;
}
}
function /*array*/ seedShuffle($array, /*PseudoRandom Object*/ $rnd) {
$tmp = array();
$count = count($array);
while($count>0) {
$t = array_splice($array,$rnd->randInt($count--),1);
$tmp[] = $t[0];
}
return $tmp;
}
Now, this solution is the one I would vote for. It separates shuffle codes from randomization codes. Depending on what kind of random you need you can subclass PseudoRandom, add the needed methods and your preferred formulas.
And, as the same shuffle function may be used with many random algorithms, one random algorithm may be used in different places.

In recent PHP versions, seeding the PHP builtin rand() and mt_rand() functions will not give you the same results everytime. The reason for this is not clear to me (why would you want to seed the function anyway if the result is different every time.) Anyway, it seems like the only solution is to write your own random function
class Random {
// random seed
private static $RSeed = 0;
// set seed
public static function seed($s = 0) {
self::$RSeed = abs(intval($s)) % 9999999 + 1;
self::num();
}
// generate random number
public static function num($min = 0, $max = 9999999) {
if (self::$RSeed == 0) self::seed(mt_rand());
self::$RSeed = (self::$RSeed * 125) % 2796203;
return self::$RSeed % ($max - $min + 1) + $min;
}
}
Usage:
// set seed
Random::seed(42);
// echo 10 numbers between 1 and 100
for ($i = 0; $i < 10; $i++) {
echo Random::num(1, 100) . '<br />';
}
The code above will output the folowing sequence every time you run it:
76
86
14
79
73
2
87
43
62
7
Just change the seed to get a completely different "random" sequence

A variant that also works with PHP version 7.2, because the php function create_function is deprecated in the newest php version.
mt_srand($seed);
$getMTRand = function () {
return mt_rand();
};
$order = array_map($getMTRand, range(1, count($array)));
array_multisort($order, $array);
return $array;

I guess this will do the job :
function choose_X_random_items($original_array , $number_of_items_wanted = -1 , $seed = FALSE ){
//save the keys
foreach ($original_array as $key => $value) {
$original_array[$key]['key_memory'] = $key;
}
$original_array = array_values($original_array);
$results = array();
if($seed !== FALSE){srand($seed);}
$main_random = rand();
$random = substr($main_random,0,( $number_of_items_wanted == -1 ? count($original_array) : min($number_of_items_wanted,count($original_array)) ));
$random = str_split($random);
foreach ($random AS $id => $value){
$pick = ($value*$main_random) % count($original_array);
$smaller_array[] = $original_array[$pick];
unset($original_array[$pick]);
$original_array = array_values($original_array);
}
//retrieve the keys
foreach ($smaller_array as $key => $value) {
$smaller_array[$value['key_memory']] = $value;
unset($smaller_array[$value['key_memory']]['key_memory']);
unset($smaller_array[$key]);
}
return $smaller_array;
}
In order to not limit the resulting array, set $number_of_items_wanted to -1
In order to not use a seed, set $seed to FALSE

Seeded shuffle while maintaining the key index:
function seeded_shuffle(array &$items, $seed = false) {
mt_srand($seed ? $seed : time());
$keys = array_keys($items);
$items = array_values($items);
for ($i = count($items) - 1; $i > 0; $i--) {
$j = mt_rand(0, $i);
list($items[$i], $items[$j]) = array($items[$j], $items[$i]);
list($keys[$i], $keys[$j]) = array($keys[$j], $keys[$i]);
}
$items = array_combine($keys, $items);
}

A simple solution:
$pool = [1, 2, 3, 4, 5, 6];
$seed = 'foo';
$randomIndex = crc32($seed) % count($pool);
$randomElement = $pool[$randomIndex];
It might not be quite as random as the Fisher Yates shuffle, but I found it gave me more than enough entropy for what I needed it for.

Based on #Gumbo, #Spudley, #AndreyP answers, it works like that:
$arr = array(1,2,3,4,5,6);
srand(123); //srand(124);
$order = array_map(function($val) {return rand();}, range(1, count($arr)));
array_multisort($order, $arr);
var_dump($arr);

Home Made function, using crc32(note:also returns negative value(https://www.php.net/manual/en/function.crc32.php))
$sortArrFromSeed = function($arr, $seed){
$arrLen = count($arr);
$newArr = [];
$hash = crc32($seed); // returns hash (0-9 numbers)
$hash = strval($hash);
while(strlen($hash) < $arrLen){
$hash .= $hash;
}
for ($i=0; $i<$arrLen; $i++) {
$index = (int) $hash[$i] * (count($arr)/9); // because 0-9 range
$index = (int) $index; // remove decimal
if($index !== 0) $index--;
array_push($newArr, $arr[$index]);
unset($arr[$index]);
$arr = array_values($arr);
}
return $newArr;
};
// TESTING
$arr = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'];
$arr = $sortArrFromSeed($arr,'myseed123');
echo '<pre>'.print_r($arr,true).'</pre>';

This seems the easiest for me...
srand(123);
usort($array,function($a,$b){return rand(-1,1);});

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

PHP Arrays - Remove duplicates ( Time complexity ) - php

Related

Cluster PHP array

How to output different values from an array

combine array entries with every other entry

How to implode a generator?

Randomize a PHP array with a seed?

Categories

Resources