Is there an elegant way of getting values from a massive multi-dimensional array using another array for the keys to lookup?
e.g.
$cats[A][A1][A11][A111] = $val;
$cats[A][A1][A11][A112] = $val;
$cats[A][A1][A12] = $val;
$cats[A][A1][A12][A121] = $val;
$cats[A][A2] = $val;
$cats[A][A2][A21] = $val;
$cats[A][A2][A22] = $val;
$cats[A][A2][A22][A221] = $val;
$cats[A][A2][A22][A222] = $val;
I'd like to access values from $cats using $keys = array('A', 'A2', 'A22', 'A221');
without checking the length of $keys and doing something like...
switch (count($keys)) {
case 1: $val = $cats[$keys[0]]; break;
case 2: $val = $cats[$keys[0]][$keys[1]]; break;
case 3: $val = $cats[$keys[0]][$keys[1]][$keys[2]]; break;
...
}
many thanks.
Why not use recursion? Something like this:
function get_val($array, $keys) {
if(empty($keys) || !is_array($keys) || !is_array($array)) return $array;
else {
$first_key = array_shift($keys);
return get_val($array[$first_key], $keys);
}
}
I originally had this written in a loop, but changed it to recursive for some reason. It's true, as yeoman said, that a recursive function is more likely than a loop to cause a stack overflow, especially if your array is sufficiently large (PHP does not optimize tail recursion), so here's a loop that should accomplish the same purpose:
// given a multidimensional array $array and single-dimensional array of keys $keys
$desired_value = $array;
while(count($keys) > 0) {
$first_key = array_shift($keys);
$desired_value = $desired_value[$first_key];
}
That's fine so far. Otherwise you would need to iterate through the array and check its depth. To make it dynamic, I assume you add keys to the $keys array when constructing $cats. Recursion is also a solution, but it will take more steps and more memory.
jburbage's suggestion of using recursion is OK in principle, but from what I know, PHP doesn't support end-recursion.
And the question was about a "massive" multidimensional array.
As "massive" suggests great depth in addition to great overall size, it's possible to run into a stack overflow with this solution, as it's usually possible to create data structures on the heap that reach deeper than the stack can cope with via recursion.
The approach is also not desirable from the performance point of view in this case.
Simply refactor jburbage's recursive solution to work in a loop, and you're almost there :-)
Here's jburbage's original suggested code once again:
function get_val($array, $keys) {
if(empty($keys) || !is_array($keys) || !is_array($array)) return $array;
else {
$first_key = array_shift($keys);
return get_val($array[$first_key], $keys);
}
}
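One way the loop refactoring suggested above might look is sketched here. The function name get_by_path and the $default parameter are hypothetical, not part of the original answers; the sketch also assumes you want a fallback value rather than a notice when a key is missing.

```php
<?php
// Hypothetical helper: walk $keys one level at a time, without recursion.
function get_by_path(array $array, array $keys, $default = null)
{
    $current = $array;
    foreach ($keys as $key) {
        // Stop as soon as the path leaves the array or a key is missing.
        if (!is_array($current) || !array_key_exists($key, $current)) {
            return $default;
        }
        $current = $current[$key];
    }
    return $current;
}

$cats = ['A' => ['A2' => ['A22' => ['A221' => 'leaf']]]];
var_dump(get_by_path($cats, ['A', 'A2', 'A22', 'A221'])); // string(4) "leaf"
var_dump(get_by_path($cats, ['A', 'missing']));            // NULL
```

Because foreach only reads $keys, the caller's key list is left intact, and the call depth stays constant however deep the array is.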
EDIT: Thanks to nice_dev for providing an excellent solution below!
I had to rewrite part of his query for correct results:
From:
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray, &$matched) {
$matched[ $val ] = $matched [$val] ?? 0;
$matched[ $val ]++;
return $matched[ $val ] <= $tmparray[ $val ];
}
);
To:
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray) {
return empty($tmparray[$val]) || !$tmparray[$val]++;
}
);
Original:
I'm attempting to compare arrays to get a new array of matched and unmatched items. I have a loop that goes through roughly 55,000 items. The script can take more than 20 minutes to complete, and I've narrowed it down to the use of array_intersect and array_filter within the foreach. Ideally, I need it to complete much faster. If I limit the foreach to 1000 items it still takes around 3 minutes to complete, which is slow for the client-side experience.
If I remove them, the script completes almost immediately. As such, I will include only these relevant pieces in the code below.
I'm using a custom array_intersect_fixed function as regular array_intersect returned wrong results with duplicate values as per here.
Explanations:
$totalidsarray = An array of numbers such as ['11233', '35353', '3432433', '123323']. Could contain thousands of items.
$deck['main_deck'] = An array of numbers to compare against $totalidsarray. Similar structure. Max length is 60 items.
foreach($dbdeckarray as $deck){
$tmparray = $totalidsarray;
//Get an array of unmatched cards between the deck and the collection
//Collection = $tmparray
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray) {
$key = array_search($val, $tmparray);
if ( $key === false ) return true;
unset($tmparray[$key]);
return false;
}
);
//Get an array of matched cards between the deck and the collection
$tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $totalidsarray);
//Push results to matcharray
$matcharray[] = $tmpmatch;
}
//Built in array_intersect function returns wrong result when input arrays have duplicate values.
function array_intersect_fixed($array1, $array2) {
$result = array();
foreach ($array1 as $val) {
if (($key = array_search($val, $array2, FALSE))!==false) {
$result[] = $val;
unset($array2[$key]);
}
}
return $result;
}
To make matters worse, I have to do 2 further matched/unmatched checks within that same foreach loop against another array extra_deck, further increasing processing time.
Is there a more optimized approach I can take to this?
EDIT: Explanation of what the code needs to achieve.
The script retrieves the user's collection of cards that they own from a card game. This is assigned to $totalidsarray. It then queries every deck in the database (~55,000) and compares the collection you own against the built deck of cards (main_deck). It then attempts to extract all owned cards (matched) and all un-owned cards (unmatched) into two arrays. Once the full foreach loop is done, the client side receives a list of each deck alongside the matched/unmatched cards of each (with a % match for each).
A couple of optimizations I can suggest:
The array_intersect_fixed routine you have is quadratic, because it is 2 nested loops under the hood (array_search scans the array for every element). We can use array_count_values, which builds a map of value counts, to make it run in linear time.
json_decode() doesn't need to be done twice for every deck. If you do it once and use the result wherever needed, it should work just fine (unless you make any edits in place, which I don't see right now). It also needs to be decoded to an array and not to an object, using the true flag.
For your array_filter, the comparison is also quadratic. We will use array_count_values again to optimize it, together with a $matched array. We keep counting the frequency of elements and if any of them surpasses the count in $tmparray, we return false; otherwise, we return true.
Snippet:
<?php
$tmparray = array_count_values($totalidsarray);
foreach($dbdeckarray as $deck){
$matched = [];
$deck['main_deck'] = json_decode($deck['main_deck'], true);
$tmpmatch['unmatched'] = array_filter($deck['main_deck'],
function ($val) use (&$tmparray, &$matched) {
$matched[ $val ] = $matched [$val] ?? 0;
$matched[ $val ]++;
return $matched[ $val ] <= $tmparray[ $val ];
}
);
$tmpmatch['matched'] = array_intersect_fixed($deck['main_deck'], $tmparray);
$matcharray[] = $tmpmatch;
}
function array_intersect_fixed($array1, $array2) {
$result = array();
$matched = [];
foreach ($array1 as $val) {
$matched[ $val ] = $matched[ $val ] ?? 0;
$matched[ $val ]++;
if (isset($array2[ $val ]) && $matched[ $val ] <= $array2[ $val ]) {
$result[] = $val;
}
}
return $result;
}
Note: array_intersect_fixed expects $array2 to be in the Hashmap way by default. If you wish to use it elsewhere, make sure to pass array_count_values of the array as 2nd parameter or use a third parameter to indicate a flag check otherwise.
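As a rough sketch of the counting idea, matched and unmatched cards can be collected in a single pass over the deck. The helper name partition_deck and the sample data are hypothetical and not part of the answer above; the count map is passed by value, so each deck starts from the full collection.

```php
<?php
// Hypothetical helper: split a deck into owned (matched) and missing (unmatched) cards.
// $ownedCounts is the result of array_count_values() on the collection.
function partition_deck(array $deck, array $ownedCounts): array
{
    $matched = [];
    $unmatched = [];
    foreach ($deck as $card) {
        if (!empty($ownedCounts[$card])) {
            // One owned copy is used up by this deck slot.
            // $ownedCounts is a by-value copy, so the caller's map is unchanged.
            $ownedCounts[$card]--;
            $matched[] = $card;
        } else {
            $unmatched[] = $card;
        }
    }
    return ['matched' => $matched, 'unmatched' => $unmatched];
}

$totalidsarray = ['11233', '35353', '35353', '123323'];
$ownedCounts = array_count_values($totalidsarray); // built once, outside the deck loop

$result = partition_deck(['35353', '35353', '35353', '999'], $ownedCounts);
print_r($result);
```

With two owned copies of 35353, the first two deck slots are matched and the third one, together with 999, is reported as unmatched. Each deck costs time proportional to its own length, independent of the collection size.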
Besides @nice_dev's suggestion, your code can be simplified.
The unmatched part is an array diff:
array_diff($deck['main_deck'], $tmparray);
If the problem with array_intersect is duplicated values in the array, array_intersect_fixed() can be avoided by running array_unique() on the array (I guess it is $deck['main_deck']) before calling array_intersect().
This will also speed up array_diff(), as it will have fewer array elements to compare.
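A small sketch of how these built-ins behave with duplicates may help when deciding whether they fit. The sample values are made up; the point is that array_diff and array_intersect compare by value only and do not count copies.

```php
<?php
$deck = ['a', 'a', 'b', 'c'];   // the deck wants two copies of 'a'
$collection = ['a', 'b'];       // only one copy of 'a' is owned

// array_diff keeps the deck entries whose value does not appear in $collection at all.
$unmatched = array_diff($deck, $collection);
print_r($unmatched); // only 'c' (original key 3); the second 'a' is not reported

// array_unique removes repeated deck values before intersecting.
$matched = array_intersect(array_unique($deck), $collection);
print_r($matched);   // 'a' (key 0) and 'b' (key 2)
```

So this approach answers "is this card owned at all?", which is enough only if the number of copies does not matter.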
I want to modify a big array inside a function, so I'm pretty sure I need to use references there, but I'm not sure which of these two alternatives is better (more performant, but also maybe some side effects?):
Option 1:
$array1 = getSomeBigArray();
$array2 = getAnotherBigArray();
$results[] = combineArrays($array1, $array2);
function combineArrays(&$array1, $array2){
// this is not important, just example of modification
foreach($array2 as $value){
if($value > 0){
$array1[] = $value;
}
}
return $array1; // will returning $array1 make a copy?
}
Option 2:
$array1 = getSomeBigArray();
$array2 = getAnotherBigArray();
combineArrays($array1, $array2);
$results[] = $array1;
function combineArrays(&$array1, $array2){
foreach($array2 as $value){
if($value > 0){
$array1[] = $value;
}
}
// void function
}
EDIT:
I have run some tests and now I'm more confused.
This is the test:
https://ideone.com/v7sepC
From those results it seems to be faster not to use references at all! And if references are used, option 1 (with return) is faster.
But in my local environment, using references seems to be faster (though not by much).
EDIT 2:
Maybe there is a problem with ideone.com? Running the same code here:
https://3v4l.org/LaffP
the result is:
Option 1 and Option 2 (references) are almost equal and faster than passing by value
code 1: 1000000 values, resources: 32
code 1: 10000000 values, resources: 67
code 2: 1000000 values, resources: 27
code 2: 2000000 values, resources: 49
I calculated the resource usage of the system by calling
getrusage
And code 2 seems to be more performant. You can use the following code to make some tests yourself:
<?php
function getSomeBigArray() {
$arr = [];
for ($i=0;$i<2000000;$i++) {
$arr[] = $i;
}
return $arr;
}
function rutime($ru, $rus, $index) {
return ($ru["ru_$index.tv_sec"]*1000 + intval($ru["ru_$index.tv_usec"]/1000))
- ($rus["ru_$index.tv_sec"]*1000 + intval($rus["ru_$index.tv_usec"]/1000));
}
$array1 = getSomeBigArray();
$array2 = getSomeBigArray();
$rustart = getrusage();
$results[] = combineArrays($array1, $array2);
$ru = getrusage();
echo rutime($ru, $rustart, "utime");
function combineArrays(&$array1, $array2){
// The array combining method.
}
Note: the rutime function was copied from the accepted answer to the following Stack Overflow post: Tracking the script execution time in PHP
When you do return $array1; (in the first option) it does not copy the array; it only increases the reference counter and returns a reference to the same array.
That is, the function's return value and $array1 will point to the same array in memory, unless you modify either of them: at that moment the data will actually be copied.
The same happens when you assign a value with $results[] = $array1; no data is actually copied, only a reference is put into a new element of $results.
In the end, both options have the same result: you'll have references to the same data in the variable $array1 and in the last item of $results. Therefore, there is no notable performance difference between those two options.
Also, consider using native functions to perform typical actions. E.g. array_merge()
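As a sketch of the native-function route, the example loop from the question could be written with array_filter and array_merge. The function name combine_positive is hypothetical, and this assumes both arrays are plain lists: array_merge renumbers integer keys and overwrites duplicate string keys.

```php
<?php
// Hypothetical replacement for the question's combineArrays() example.
function combine_positive(array $array1, array $array2): array
{
    // Keep only the values greater than zero, as in the original loop.
    $positives = array_filter($array2, function ($value) {
        return $value > 0;
    });
    // Append them to the first list; integer keys are renumbered from 0.
    return array_merge($array1, $positives);
}

$combined = combine_positive([1, 2], [-1, 3, 0, 4]);
print_r($combined); // values 1, 2, 3, 4 with keys 0..3
```

No references are needed here; the inputs are only copied if they are modified.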
I'm trying to find a simpler way to create new arrays from existing arrays and values. There are two routines I'd like to optimize that are similar in construction. The form of the first one is:
$i = 0;
$new_array = array();
foreach ($my_array as $value) {
$new_array[$i][0] = $constant; // defined previously and unchanging
$new_array[$i][1] = $value; // different for each index of $my_array
$i++;
}
The form of the second one has not one but two different values per constant; notice that $value comes before $key in the indexing:
$i = 0;
$new_array = array();
foreach ($my_array as $key => $value) {
$new_array[$i][0] = $constant; // defined previously and unchanging
$new_array[$i][1] = $value; // different for each index of $my_array
$new_array[$i][2] = $key; // different for each index of $my_array
$i++;
}
Is there a way to optimize these procedures with shorter and more efficient routines using the array operators of PHP? (There are many, of course, and I can't find one that seems to fit the bill.)
I believe a combination of Wouter Thielen's suggestions regarding the other solutions actually holds the best answer for me.
For the first case I provided:
$new_array = array();
// $my_array is numeric, so $key will be index count:
foreach ($my_array as $key => $value) {
$new_array[$key] = array($constant, $value);
}
For the second case I provided:
// $my_array is associative, so $key will initially be a text index (or similar):
$new_array = array();
foreach ($my_array as $key => $value) {
$new_array[$key] = array($constant, $value, $key);
}
// This converts the indexes to consecutive integers starting with 0:
$new_array = array_values($new_array);
It is shorter when you use the array key instead of the $i counter:
$new_array = array();
foreach ($my_array as $key => $value) {
$new_array[$key][0] = $constant; // defined previously and unchanging
$new_array[$key][1] = $value; // different for each index of $my_array
}
Use array_map:
$new_array = array_map(function($v) use ($constant) {
return array($constant, $v);
}, $my_array);
If you want to use the keys too, for your second case:
$new_array = array_map(function($k, $v) use ($constant) {
return array($constant, $v, $k);
}, array_keys($my_array), $my_array);
Assuming the $constant variable is defined in the caller's scope, you'll need to use use ($constant) to pass it into the function's scope.
array_walk is similar, but modifies the array you pass to it, so if you want to update $my_array itself, use array_walk. Your second case then becomes this:
array_walk($my_array, function(&$val, $key) use($constant) {
$val = array($constant, $val, $key);
});
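With array_walk, and with array_map called on a single array, the original keys are preserved, so you'll end up with an associative array (i.e. with the keys still being the keys for the array). When array_map is given two arrays, as in the second-case example above, the result already has sequential integer keys. If you want to convert an associative result into a numerically indexed array, use array_values: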
$numerically_indexed = array_values($associative);
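A quick way to check which keys each array_map variant produces is to run both on a small associative array. The sample values below are made up for illustration.

```php
<?php
$constant = 'c';
$my_array = ['x' => 10, 'y' => 20];

// One input array: array_map preserves the original keys.
$one = array_map(function ($v) use ($constant) {
    return [$constant, $v];
}, $my_array);

// Two input arrays: array_map returns sequential integer keys.
$two = array_map(function ($k, $v) use ($constant) {
    return [$constant, $v, $k];
}, array_keys($my_array), $my_array);

print_r(array_keys($one)); // x, y
print_r(array_keys($two)); // 0, 1
```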
I asked a question similar to this a few days ago, check it out:
PHP - Fastest way to convert a 2d array into a 3d array that is grouped by a specific value
I think that you have an optimal way when it comes to dealing with large amount of data. For smaller amounts there is a better way as was suggested by the benchmarks in my question.
I also think that readability can be an issue here. Code you understand is worth more later on than ideas you do not really grasp, because it takes a long time to understand them again, and that gets confusing while debugging.
I would suggest, you take a look at the differences between JSON encoded arrays and serialised arrays as there can be major performance differences when working with the two. It seems that as it is now JSON encoded arrays are a more optimised format (faster) for holding and working with data however this will likely change with PHP 7. It would be useful to note that they are also more portable.
Further Reading:
Preferred method to store PHP arrays (json_encode vs serialize)
http://techblog.procurios.nl/k/n618/news/view/34972/14863/cache-a-large-array-json-serialize-or-var_export.html
I have an array that contains the location of a value in a very large multidimensional array. I need to take this location and replace the value at the location with another value. I have found numerous articles about returning the value of a position using such an array of indexes by writing a recursive function. However, this won't work because I can't slice up the large array, I need to replace just that one value.
The location would look something like:
array(1,5,3,4,6);
The code I had to find a value is the following:
function replace_value($indexes, $array, $replacement){
if(count($indexes) > 1)
return replace_value(array_slice($indexes, 1), $array[$indexes[0]], $replacement);
else
return $array[$indexes[0]];
}
How would I modify this to instead of recursively cutting down an array until the value is found I can simply modify a part of a large array? Is there a way to build
array(1,5,3,4,6);
Into
$array[1][5][3][4][6];
Thanks
You could modify your function like this:
function replace_value($indexes, &$array, $replacement){
if(count($indexes) > 1) {
return replace_value(array_slice($indexes, 1), $array[$indexes[0]], $replacement);
} else {
return $array[$indexes[0]] = $replacement;
}
}
Make sure you write &$array in the function definition, not $array. This will pass in the actual array, so that you can modify it in place. Otherwise you would just be passing in a copy.
Assuming you trust the contents of the variable containing your array indices, this is a completely valid use of eval:
$keys = array(1,5,3,4,6);
$keys = "[" . join("][", $keys) . "]";
$value = "what";
eval("\$array$keys = '$value';"); # $array[1][5][3][4][6] = 'what';
Here's a solution without using eval. Go through each key and reduce the array as you go. The $ref variable below is a reference to the original array so changing it will change the original.
$keys = array(1,5,3,4,6);
$array[1][5][3][4][6] = 'foo';
$ref = &$array;
foreach( $keys as $key ) {
$ref = &$ref[ $key ];
}
$ref = 'bar';
echo $array[1][5][3][4][6]; // 'bar'
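The same reference walk can be wrapped in a function so that the reference does not linger in the calling scope. This is a minimal sketch; the name set_by_path is hypothetical, and it assumes every intermediate level is either an array or not set yet.

```php
<?php
// Hypothetical helper: write $value at the nested position described by $keys.
function set_by_path(array &$array, array $keys, $value)
{
    $ref = &$array;
    foreach ($keys as $key) {
        // Taking a reference to a missing element creates it, so absent
        // levels are filled in as nested arrays along the way.
        $ref = &$ref[$key];
    }
    $ref = $value;
    // $ref is local to this function, so the reference ends when it returns.
}

$array = [];
$array[1][5][3][4][6] = 'foo';
set_by_path($array, [1, 5, 3, 4, 6], 'bar');
echo $array[1][5][3][4][6]; // bar
```

Only the addressed element is written; sibling elements at every level are left as they were.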
This is untested. I tend to shy away from using references because I think they're particularly confusing, and they leave remnant references in your code that can cause hard-to-find bugs.
$keys = array(1,5,3,4,6);
$path = 'new leaf value';
foreach (array_reverse($keys) as $key) {
$path = array($key => $path);
}
$modified = array_replace_recursive($originalArray, $path);
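A small check of this idea on sample data is sketched below; the variable $original and its contents are made up. One caveat worth noting: if the new leaf value is itself an array, array_replace_recursive merges it into the existing element instead of replacing it outright.

```php
<?php
$keys = [1, 5, 3, 4, 6];
$original = [1 => [5 => [3 => [4 => [6 => 'foo', 7 => 'sibling']]]]];

// Build a nested array containing only the path to the new leaf value.
$path = 'bar';
foreach (array_reverse($keys) as $key) {
    $path = [$key => $path];
}

// Returns a modified copy; $original itself is left untouched.
$modified = array_replace_recursive($original, $path);

echo $modified[1][5][3][4][6], "\n"; // bar
echo $modified[1][5][3][4][7], "\n"; // sibling
echo $original[1][5][3][4][6], "\n"; // foo
```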
Is it possible to remove a string (see example below) from a PHP array without knowing the index?
Example:
$array = array("string1", "string2", "string3", "string4", "string5");
I need to remove string3.
$index = array_search('string3',$array);
if($index !== FALSE){
unset($array[$index]);
}
If you think your value will be in there more than once, try using array_keys with a search value to get all of the indexes. You'll probably want to make sure
EDIT:
Note, that indexes remain unchanged when using unset. If this is an issue, there is a nice answer here that shows how to do this using array_splice.
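The array_keys variant mentioned above might look like the following sketch, with sample data chosen to contain the value twice. The final array_values call is optional and only needed if consecutive indexes matter.

```php
<?php
$array = ['string1', 'string3', 'string2', 'string3'];

// array_keys with a search value returns every index holding that value;
// the third argument requests strict (===) comparison.
foreach (array_keys($array, 'string3', true) as $index) {
    unset($array[$index]);
}
print_r($array); // string1 at key 0, string2 at key 2

// Renumber the remaining elements from 0.
$array = array_values($array);
print_r($array); // string1 at key 0, string2 at key 1
```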
This is probably not the fastest method, but it's a short and neat one line of code:
$array = array_diff($array, array("string3"));
or if you're using PHP 5.4.0 or higher:
$array = array_diff($array, ["string3"]);
You can do this.
$arr = array("string1", "string2", "string3", "string4", "string5");
$new_arr=array();
foreach($arr as $value)
{
if($value=="string3")
{
continue;
}
else
{
$new_arr[]=$value;
}
}
print_r($new_arr);
Use a combination of array_search and array_splice.
function array_remove(&$array, $item){
$index = array_search($item, $array);
if($index === false)
return false;
array_splice($array, $index, 1);
return true;
}
You can also try like this.
$array = ["string1", "string2", "string3", "string4", "string5"];
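$key = array_search('string3',$array);
// array_search returns false when nothing is found; false used as a key means index 0
if ($key !== false) {
unset($array[$key]);
}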
It sort of depends on how big the array is likely to be, and there are multiple options.
If it's typically quite small, array_diff is likely the fastest consistent solution, as Jorge posted.
Another solution for slightly larger sets:
$data = array_flip($data);
unset($data[$item2remove]);
$data = array_flip($data);
But that's only good if you don't have duplicate items. Depending on your workload it might be advantageous to guarantee uniqueness of items too.
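The duplicate caveat can be seen in a short sketch with made-up data: array_flip keeps only the last index for each repeated value, so earlier copies vanish on the round trip.

```php
<?php
$data = ['a', 'b', 'a', 'c'];

// Values become keys; for the repeated 'a' only the last index (2) survives.
$flipped = array_flip($data);
unset($flipped['b']);

// Flip back: 'b' is gone, but so is the first 'a'.
$result = array_flip($flipped);
print_r($result); // 'a' at key 2, 'c' at key 3
```

Note that array_flip only works when every value is an integer or a string, since the values have to serve as array keys.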