Efficiently pick n random elements from PHP array (without shuffle) - php

I have the following code to pick $n elements from an array $array in PHP:
shuffle($array);
$result = array_splice($array, 0, $n);
Given a large array but only a few elements (for example 5 out of 10000), this is relatively slow, so I would like to optimize it such that not all elements have to be shuffled. The values must be unique.
I'm looking fo the most performant alternative. We can assume that $array has no duplicates and is 0-indexed.

$randomArray = [];
while (count($randomArray) < 5) {
$randomKey = mt_rand(0, count($array)-1);
$randomArray[$randomKey] = $array[$randomKey];
}
This will provide exactly 5 elements with no duplicates and very quickly. The keys will be preserved.
Note: You'd have to make sure $array had 5 or more elements or add some sort of check to prevent an endless loop.

This function performs a shuffle on only $n elements where $n is the number of random elements you want to pick. It will also work on associative arrays and sparse arrays. $array is the array to work on and $n is the number of random elements to retrieve.
If we define the $max_index as count($array) - 1 - $iteration.
It works by generating a random number between 0 and $max_index. Picking the key at that index, and replacing its index with the value at $max_index so that it can never be picked again, as $max_index will be one less at the next iteration and unreachable.
In summary this is the Richard Durstenfeld's Fisher-Yates shuffle but operating only on $n elements instead of the entire array.
function rand_pluck($array, $n) {
$array_keys = array_keys($array);
$array_length = count($array_keys);
$max_index = $array_length -1;
$iterations = min($n, $array_length);
$random_array = array();
while($iterations--) {
$index = mt_rand(0, $max_index);
$value = $array_keys[$index];
$array_keys[$index] = $array_keys[$max_index];
array_push($random_array, $array[$value]);
$max_index--;
}
return $random_array;
}

The trick is to use a variation of shuffle or in other words a partial shuffle.
performance is not the only criterion, statistical efficiency, i.e unbiased sampling is as important (as the original shuffle solution is)
function random_pick( $a, $n )
{
$N = count($a);
$n = min($n, $N);
$picked = array_fill(0, $n, 0); $backup = array_fill(0, $n, 0);
// partially shuffle the array, and generate unbiased selection simultaneously
// this is a variation on fisher-yates-knuth shuffle
for ($i=0; $i<$n; $i++) // O(n) times
{
$selected = mt_rand( 0, --$N ); // unbiased sampling N * N-1 * N-2 * .. * N-n+1
$value = $a[ $selected ];
$a[ $selected ] = $a[ $N ];
$a[ $N ] = $value;
$backup[ $i ] = $selected;
$picked[ $i ] = $value;
}
// restore partially shuffled input array from backup
// optional step, if needed it can be ignored, e.g $a is passed by value, hence copied
for ($i=$n-1; $i>=0; $i--) // O(n) times
{
$selected = $backup[ $i ];
$value = $a[ $N ];
$a[ $N ] = $a[ $selected ];
$a[ $selected ] = $value;
$N++;
}
return $picked;
}
NOTE the algorithm is strictly O(n) in both time and space, produces unbiased selections (it is a partial unbiased shuffling) and produces output which is proper array with consecutive keys (not needing extra array_values etc..)
Use example:
$randomly_picked = random_pick($my_array, 5);
// or if an associative array is used
$randomly_picked_keys = random_pick(array_keys($my_array), 5);
$randomly_picked = array_intersect_key($my_array, array_flip($randomly_picked_keys));
For further variations and extensions of shuffling for PHP:
PHP - shuffle only part of an array
PHP shuffle with seed
How can I take n elements at random from a Perl array?

This will only show benifits for small n compared to an array shuffle, but you could
Choose a random index r n times, each time decreasing the limit by 1
Adjust for previously used indices
Take value
Store used index
Pseudocode
arr = []
used = []
for i = 0..n-1:
r = rand 0..len-i
d = 0
for j = 0..used.length-1:
if r >= used[j]:
d += 1
arr.append($array[r + d])
used.append(r)
return arr

You could generate n-times a random number with mt_rand() and then fill these values in a new array. To go against the case where the same index gets returned twice we use the actual returned index to fill the new array and check always if the index exists in the new array, if so we use while to loop through it as long as we get a duplicate index. At the end we use array_values() to get a 0-indexed array.
$count = count($array) - 1;
$new_array = array();
for($i = 0; $i < $n; $i++) {
$index = mt_rand(0, $count);
while(isset($new_array[$index])) {
$index = mt_rand(0, $count);
}
$new_array[$index] = $array[$index];
}
$new_array = array_values($new_array);

I wonder why everyone here make it so complicated?
Here's the fastest and simplest way:
$randomArray = array_rand(array_flip($array), $n);

Related

shuffle an array with a "limit" in PHP

What I try to achieve: randomize the order of all elements in an array, but allow each element to change its position only by a limited number of "steps".
Say I have an array like below, and I wish to randomize with a limit of 2 steps:
$array = [92,12,2,18,17,88,56];
An outcome could be: [2,12,92,17,18,56,88] (all elements of the array moved a maximum of 2 steps), but it could not be: [56,92,2,12,17,18,88] because in this example 56 moved too far.
I considered using a combination of array_chunk and shuffle, but this is problematic because elements will be shuffled inside their chunk, resulting in elements at the beginning or end of a chunk only moving in one direction. This is what I came up with (and problematic):
// in chunks of 3 an element can move a max. of 2 steps.
$chunks = array_chunk($array, 3);
$newChunks = [];
foreach ($chunks as $chunk){
$keys = array_keys($chunk);
shuffle($keys);
$newChunk = [];
foreach ($keys as $key){
$newChunk[$key] = $chunk[$key];
}
$newChunks[] = $newChunk;
}
Another idea I had was to get the key of the item in the array and with rand add of subtract my limit. For example:
foreach ( $array as $key => $value ) {
$newArray[] = ["key" => $key+rand(-2,2), "value" => $value];
};
This creates a new array with each of its elements being an array with the original value plus a value key that is the original key plus or minus 2. I could flatten this array, but the problem with this is that I can have duplicate keys.
I created this function to do this, but I guess it needs more improvements:
/**
* #param array $array
* #param int $limit
* #return array
*/
function shuffleArray(array $array, int $limit): array
{
$arrayCount = count($array);
$limit = min($arrayCount, $limit);
for ($i = 0; $i < $limit; $i++) {
for ($j = 0; $j < $arrayCount;) {
$toIndex = min($arrayCount - 1, $j + rand(0, 1));
[$array[$j], $array[$toIndex]] = [$array[$toIndex], $array[$j]];
$j += (($toIndex === $j) ? 1 : 2);
}
}
return $array;
}
Test:
$array = [92, 12, 2, 18, 17, 88, 56];
$limit = 2;
$result = shuffleArray($array, $limit); // [12, 92, 17, 2, 18, 56, 88]
Here is a possible solution in one pass :
Try to swap each element at position i with an element between i (stay in place) and i+x. I look only forward to avoid swaping an element several times. And I need an extra array to flag the already swapped elements. I don't need to process them in the future as they were already moved.
function shuffle_array($a, $limit)
{
$result = $a ;
$shuffled_index = array() ; // list of already shuffled elements
$n = count($result);
for($i = 0 ; $i < $n ; ++$i)
{
if( in_array($i, $shuffled_index) ) continue ; // already shuffled, go to the next elements
$possibleIndex = array_diff( range($i, min($i + $limit, $n-1)), $shuffled_index) ; // get all the possible "jumps", minus the already- shuffled index
$selectedIndex = $possibleIndex[ array_rand($possibleIndex) ]; // randomly choose one of the possible index
// swap the two elements
$tmp = $result[$i] ;
$result[$i] = $result[$selectedIndex] ;
$result[$selectedIndex] = $tmp ;
// element at position $selectedIndex is already shuffled, it needs no more processing
$shuffled_index[] = $selectedIndex ;
}
return $result ;
}
$array = [92,12,2,18,17,88,56];
$limit = 2 ;
shuffle_array($array, $limit); // [2, 18, 92, 12, 17, 56, 88]
I expect more elements to stay in place than in the solution of Kerkouch, as some elements can have very few remaining free choices.

Sort a flat array in recurring ascending sequences

I am trying to sort it in a repeating, sequential pattern of numerical order with the largest sets first.
Sample array:
$array = [1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3];
In the above array, I have the highest value of 5 which appears twice so the first two sets would 1,2,3,4,5 then it would revert to the second, highest value set etc.
Desired result:
[1,2,3,4,5,1,2,3,4,5,1,2,3,4,1,2,4]
I am pretty sure I can split the array into chunks of the integer values then cherrypick an item from each subarray sequentially until there are no remaining items, but I just feel that this is going to be poor for performance and I don't want to miss a simple trick that PHP can already handle.
Here's my attempt at a very manual loop using process, the idea is to simply sort the numbers into containers for array_unshifting. I'm sure this is terrible and I'd love someone to do this in five lines or less :)
$array = array(1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3);
sort($array);
// Build the container array
$numbers = array_fill_keys(array_unique($array),array());
// Assignment
foreach( $array as $number )
{
$numbers[ $number ][] = $number;
}
// Worker Loop
$output = array();
while( empty( $numbers ) === false )
{
foreach( $numbers as $outer => $inner )
{
$output[] = array_shift( $numbers[ $outer ] );
if( empty( $numbers[ $outer ] ) )
{
unset( $numbers[ $outer ] );
}
}
}
var_dump( $output );
I think I'd look at this not as a sorting problem, but alternating values from multiple lists, so rather than coming up with sets of distinct numbers I'd make sets of the same number.
Since there's no difference between one 1 and another, all you actually need is to count the number of times each appears. It turns out PHP can do this for you with aaray_count_values.
$sets = array_count_values ($input);
Then we can make sure the sets are in order by sorting by key:
ksort($sets);
Now, we iterate round our sets, counting down how many times we've output each number. Once we've "drained" a set, we remove it from the list, and once we have no sets left, we're all done:
$output = [];
while ( count($sets) > 0 ) {
foreach ( $sets as $number => $count ) {
$output[] = $number;
if ( --$sets[$number] == 0 ) {
unset($sets[$number]);
}
}
}
This algorithm could be adapted for cases where the values are actually distinct but can be put into sets, by having the value of each set be a list rather than a count. Instead of -- you'd use array_shift, and then check if the length of the set was zero.
You can use only linear logic to sort using php functions. Here is optimized way to fill data structures. It can be used for streams, generators or anything else you can iterate and compare.
$array = array(1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3);
sort($array);
$chunks = [];
$index = [];
foreach($array as $i){
if(!isset($index[$i])){
$index[$i]=0;
}
if(!isset($chunks[$index[$i]])){
$chunks[$index[$i]]=[$i];
} else {
$chunks[$index[$i]][] = $i;
}
$index[$i]++;
}
$result = call_user_func_array('array_merge', $chunks);
print_r($result);
<?php
$array = array(1,1,1,2,3,2,3,4,5,4,4,4,5,1,2,2,3);
sort($array);
while($array) {
$n = 0;
foreach($array as $k => $v) {
if($v>$n) {
$result[] = $n = $v;
unset($array[$k]);
}
}
}
echo implode(',', $result);
Output:
1,2,3,4,5,1,2,3,4,5,1,2,3,4,1,2,4
New, more elegant, more performant, more concise answer:
Create a sorting array where each number gets its own independent counter to increment. Then use array_multisort() to sort by this grouping array, then sort by values ascending.
Code: (Demo)
$encounters = [];
foreach ($array as $v) {
$encounters[] = $e[$v] = ($e[$v] ?? 0) + 1;
}
array_multisort($encounters, $array);
var_export($array);
Or with a functional style with no global variable declarations: (Demo)
array_multisort(
array_map(
function($v) {
static $e;
return $e[$v] = ($e[$v] ?? 0) + 1;
},
$array
),
$array
);
var_export($array);
Old answer:
My advice is functionally identical to #El''s snippet, but is implemented in a more concise/modern/attractive fashion.
After ensuring that the input array is sorted, make only one pass over the array and push each re-encountered value into its next row of values. The $counter variable indicates which row (in $grouped) the current value should be pushed into. When finished looping and grouping, $grouped will have unique values in each row. The final step is to merge/flatten the rows (preserving their order).
Code: (Demo)
$grouped = [];
$counter = [];
sort($array);
foreach ($array as $v) {
$counter[$v] = ($counter[$v] ?? -1) + 1;
$grouped[$counter[$v]][] = $v;
}
var_export(array_merge(...$grouped));

Get x highest values from an array including identical values

I'm using the following code to retrieve the highest 3 numbers from an array.
$a = array(1,2,5,10,15,20,10,15);
arsort($a, SORT_NUMERIC);
$highest = array_slice($a, 0, 3);
This code correctly gives me the highest three numbers array(20,15,10); however, I'm interested in getting the highest 3 numbers including the ones that are identical. In this example, I'm expecting to get an array like array(10, 10, 15, 15, 20)
Might be simpler but my brain is tired. Use arsort() to get the highest first, count the values to get unique keys with their count and slice the first 3 (make sure to pass true to preserve keys):
arsort($a, SORT_NUMERIC);
$counts = array_slice(array_count_values($a), 0, 3, true);
Then loop those 3 and fill an array with the number value the number of times it was counted and merge with the previous result:
$highest = array();
foreach($counts as $value => $count) {
$highest = array_merge($highest, array_fill(0, $count, $value));
}
You can use a function like this:
$a = array(1,2,5,10,15,20,10,15); //-- Original Array
function get3highest($a){
$h = array(); //-- highest
if(count($a) >= 3){ //-- Checking length
$c = 0; //-- Counter
while ($c < 3 || in_array($a[count($a)-1],$h) ){ //-- 3 elements or repeated value
$max = array_pop($a);
if(!in_array($max,$h)){
++$c;
}
$h[] = $max;
}
sort($h); //-- sorting
}
return $h; //-- values
}
print_r(get3Highest($a));
Of course you can improve this function to accept a dinamic value of "highest" values.
The below function may be usefull
$a = array(1,2,5,10,15,20,10,15);
function getMaxValue($array,$n){
$max_array = array(); // array to store all the max values
for($i=0;$i<$n;$i++){ // loop to get number of highest values
$keys = array_keys($array,max($array)); // get keys
if(is_array($keys)){ // if keys is array
foreach($keys as $v){ // loop array
$max_array[]=$array[$v]; // set values to max_array
unset($array[$v]); // unset the keys to get next max value
}
}else{ // if not array
$max_array[]=$array[$keys]; // set values to max_array
unset($array[$keys]); // unset the keys to get next max value
}
}
return $max_array;
}
$g = getMaxValue($a,3);
Out Put:
Array
(
[0] => 20
[1] => 15
[2] => 15
[3] => 10
[4] => 10
)
You can modify it to add conditions.
I thought of a couple of other possibilities.
First one:
Find the lowest of the top three values
$min = array_slice(array_unique($a, SORT_NUMERIC), -3)[0];
Filter out any lower values
$top3 = array_filter($a, function($x) use ($min) { return $x >= $min; });
Sort the result
sort($top3);
Advantages: less code
Disadvantages: less inefficient (sorts, iterates the entire array, sorts the result)
Second one:
Sort the array in reverse order
rsort($a);
Iterate the array, appending items to your result array until you've appended three distinct items.
$n = 0;
$prev = null;
$top = [];
foreach ($a as $x) {
if ($x != $prev) $n++;
if ($n > 3) break;
$top[] = $x;
$prev = $x;
}
Advantages: more efficient (sorts only once, iterates only as much as necessary)
Disadvantages: more code
This gives the results in descending order. You can optionally use array_unshift($top, $x) instead of $top[] = $x; to get it in ascending order, but I think I've read that array_unshift is less efficient because it reindexes the array after each addition, so if optimization is important it would probably be better to just use $top[] = $x; and then iterate the result in reverse order.

Given an array of integers, what's the most efficient way to get the number of other integers in the array within n?

Given the following array:
$arr = array(0,0,1,2,2,5,6,7,7,9,10,10);
And assuming $n = 2, what is the most efficient way to get a count of each value in the array within $n of each value?
For example, 6 has 3 other values within $n: 5,7,7.
Ultimately I'd like a corresponding array with simply the counts within $n, like so:
// 0,0,1,2,2,5,6,7,7,9,10,10 // $arr, so you can see it lined up
$count_arr = array(4,4,4,4,4,3,3,4,4,4, 2, 2);
Is a simple foreach loop the way to go? CodePad Link
$arr = array(0,0,1,2,2,5,6,7,7,9,10,10);
$n = 2;
$count_arr = array();
foreach ($arr as $v) {
$range = range(($v-$n),($v+$n)); // simple range between lower and upper bound
$count = count(array_intersect($arr,$range)); // count intersect array
$count_arr[] = $count-1; // subtract 1 so you don't count itself
}
print_r($arr);
print_r($count_arr);
My last answer was written without fully groking the problem...
Try sorting the array, before processing it, and leverage that when you run through it. This has a better runtime complexity.
$arr = array(0,0,1,2,2,5,6,7,7,9,10,10);
asort($arr);
$n = 2;
$cnt = count($arr);
$counts = array_pad(array(), $cnt, 0);
for ($x=0; $x<$cnt; $x++) {
$low = $x - 1;
$lower_range_bound = $arr[$x]-$n;
while($low >= 0 && ($arr[$low] >= $lower_range_bound)) {
$counts[$x]++;
$low--;
}
$high = $x + 1;
$upper_range_bound = $arr[$x]+$n;
while($high < $cnt && $arr[$high] <= $upper_range_bound) {
$counts[$x]++;
$high++;
}
}
print_r($arr);
print_r($counts);
Play with it here: http://codepad.org/JXlZNCxW

How to sort it better?

I have 2 arrays, both are multidimensional with same number of elements and same values, which are on different positions (those values are actually ID-s from my database, so one ID appears only once). How can I sort second array with values which are in first array?
For example - if first array looks like:
$array1[0][0] = 1;
$array1[0][x] = it doesn't matter what's here
$array1[1][0] = 4;
$array1[1][x] = it doesn't matter what's here
$array1[2][0] = 3;
$array1[2][x] = it doesn't matter what's here
...
how to sort second array so it would have same values as array1 on indexes [0][0], [1][0], [2][0], etc.
How I could solve problem is:
$i=0
while ($i < (count($array1)-2)){ // * check down
$find_id = $array1[$i][0];
// here I need to search for index of that ID in other array
$position = give_index($find_id, $array2);
// swapping positions
$temp = array2[$i][0];
$array2[$i][0] = $array2[$position][0];
$array2[$position][0] = $temp;
// increasing counter
i++;
}
function give_index($needle, $haystack){
for ($j = 0, $l = count($haystack); $j < $l; ++$j) {
if (in_array($needle, $haystack[$j][0])) return $j;
}
return false;
}
*There is only -2 because indexes start from 0 and also for the last element you don't need to check since it would be automatically sorted by last iteration of while-loop.
I don't find this solution good as I think that this is quite simple issue (maybe it's not even correct). Is there easier way in PHP that I'm missing?
This is the most efficient way I can think of:
function swap(&$a, &$b) {
$t = $a;
$a = $b;
$b = $t;
}
function find_index($id, $array, $from = 0) {
$index = false;
for ($i = $from, $c = count($array); $i < $c; $i++) {
if ($array[$i][0] == $id) {
$index = $i;
break;
}
}
return $index;
}
for ($i = 0, $c = count($array1); $i < ($c - 2); $i++) {
if ($array1[$i][0] != $array2[$i][0]) {
$fi = find_index($array1[$i][0], $array2, $i);
swap($array2[$i][0], $array2[$fi][0]);
}
}
What changes from yours?
I've defined a swap() function in order to swap any variable. That doesn't cost anything and makes everything look nicer. Also you can reuse that function later if you need to.
In the find_index (give_index in your code) we stop the loop once we find the correct index. Also we avoid the cost of an in_array function call.
We modified the find_index function to start only from the part of the array we haven't checked yet. Leading to a way more efficient way of scan the array.
In the for loop (a while loop was just wrong there) we stored the count of the array once, avoiding multiple calls.
Also we swap the $array2 values only if they are in the wrong place.
Other improvements
If you know anything else of the $array2 array you can make this even more performant. For example if you know that indexes are alternated like in $array1 you can change the main for loop from:
for ($i = 0, $c = count($array1); $i < ($c - 2); $i++) {
to
for ($i = 0, $c = count($array1); $i < ($c - 2); $i+2) {
(notice the $i+2 at the end) And you could do that in the find_index function as well.
Look into usort (http://php.net/manual/en/function.usort.php).
It provides a simple way to sort arrays using a user provided comparison function.

Categories