I need to determine the keys of the values that have duplicates in an array.
What I came up with is:
$duplicates_keys = array();
$unique = array_unique($in);
$duplicates = array_diff_assoc($in, $unique);
foreach ($in as $key => $val) {
    if (in_array($val, $duplicates)) {
        $duplicates_keys[] = $key;
    }
}
This works, but it's pretty resource intensive. Is there a faster way to do it?
As per my comment, I doubt this is a bottleneck. However, you can reduce your iterations to a single pass through the array, as follows:
$temp = [];
$dup = [];
foreach ($in as $key => $val) {
    if (isset($temp[$val])) {
        // value seen before: record this key as a duplicate
        $dup[] = $key;
    } else {
        // first occurrence: remember the value as a key
        $temp[$val] = 0;
    }
}
Note that each value is stored as an array key in $temp, so the lookup can use isset(), which is O(1), rather than in_array(), which must scan the array until the value is found.
This is theoretically faster than your example, but you would need to profile it to be sure (as you should already have done to establish that your current code is actually slow).
Quite possibly something else, such as caching or a better database query, would have a far greater impact.
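If you do decide to profile, a minimal microtime()-based sketch along these lines would do; the sample data and sizes are purely illustrative:
$in = array_merge(range(1, 5000), range(1, 2500)); // sample data with duplicates

// Approach 1: array_unique() + in_array()
$start = microtime(true);
$unique = array_unique($in);
$duplicates = array_diff_assoc($in, $unique);
$keys1 = array();
foreach ($in as $key => $val) {
    if (in_array($val, $duplicates)) {
        $keys1[] = $key;
    }
}
printf("unique/in_array:   %.4fs\n", microtime(true) - $start);

// Approach 2: single pass with isset()
$start = microtime(true);
$temp = array();
$keys2 = array();
foreach ($in as $key => $val) {
    if (isset($temp[$val])) {
        $keys2[] = $key;
    } else {
        $temp[$val] = 0;
    }
}
printf("single pass/isset: %.4fs\n", microtime(true) - $start);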
Use array_intersect() for this. It returns the entries of $in whose values appear in $duplicates, with their keys preserved, so wrapping it in array_keys() gives you just the duplicate keys:
$duplicates_keys = array_keys(array_intersect($in, $duplicates));
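For example, with a hypothetical input array (the data is illustrative only):
$in = array('a' => 1, 'b' => 2, 'c' => 1, 'd' => 3, 'e' => 2);
$unique = array_unique($in);                  // array('a' => 1, 'b' => 2, 'd' => 3)
$duplicates = array_diff_assoc($in, $unique); // array('c' => 1, 'e' => 2)
$duplicates_keys = array_keys(array_intersect($in, $duplicates));
print_r($duplicates_keys);                    // a, b, c, e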
I have a multidimensional array with more than 2000 values.
What is the most efficient simple way to search for a value within it?
What I am especially curious about is whether array_search() narrows the search down alphabetically to be more efficient (though there is no parameter to indicate whether the data is alphanumeric).
$_array = ["abidabi", "beda", "cedi", "zamibula"];
$_target = "zamibula";
foreach ($_array as $val) {
    if ($val == $_target) {
        echo $val;
    }
}
vs.
echo $_array[array_search('zamibula', $_array)];
For sure, my actual code is much more complicated (JSON, a multidimensional array with massive data used to populate an INPUT->SELECT->OPTIONS), so it already lags browser-side.
What are your thoughts on the alphabetical search? Can I structure the data somehow to make it less expensive :)?
Thanks.
If all the values in the array are unique, then the fastest way is to switch values to keys and then call isset(), like so:
$_array = ["abidabi", "beda", "cedi", "zamibula"];
$_target = "zamibula";
$indexed = array_flip($_array);
var_dump(isset($indexed[$_target]));
This is almost constant-time no matter the size of the array. (And to answer the question about array_search(): it performs a plain linear scan from the start of the array; there is no alphabetical narrowing, so it is O(n), just like the foreach loop.)
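If you also need the position of the match, the flipped array already holds it: each value maps to its original index. A small sketch (note that array_flip() requires the values to be strings or integers, which they are here):
$indexed = array_flip($_array); // value => original index
if (isset($indexed[$_target])) {
    echo $indexed[$_target];    // prints 3, the index of "zamibula"
}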
I am looping through a large dataset (contained in a multidimensional associative array $values in this example) with many duplicate index values, with the goal of producing an array containing only the unique values from a given index 'data'.
Currently I am doing it like this:
foreach ($values as $value) {
    $unique[$value['data']] = true;
}
This accomplishes the objective, because duplicate array keys simply get replaced. But it feels a bit odd, since the keys themselves don't actually contain any data.
It was suggested that I build the array first and then use array_unique() to remove duplicates. I'm inclined to stick with the former method, but am wondering: are there pitfalls or problems I should be aware of with this approach? Or any benefits to using array_unique() instead?
I would do it like this.
$unique = array();
foreach ($values as $value) {
    if (!in_array($value['data'], $unique)) {
        $unique[] = $value['data']; // collect each 'data' value only once
    }
}
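For comparison, a sketch of the key-based approach from the question, with array_keys() used at the end to turn the keys back into a plain list; this avoids the O(n) in_array() scan on every iteration (assuming the 'data' values are strings or integers, so they can serve as array keys):
$seen = array();
foreach ($values as $value) {
    $seen[$value['data']] = true; // duplicate keys are simply overwritten
}
$unique = array_keys($seen);      // plain list of the unique 'data' values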
I need to check some input strings against a huge (and growing) list of strings coming from a CSV file (1,000,000+ entries). I currently load every string into an array and check against it via in_array(). The code looks like this:
$filter = ReadFromCSV();
$input = array("foo", "bar" /* more elements... */);
foreach ($input as $i) {
    if (in_array($i, $filter)) {
        // do stuff
    }
}
It already takes some time, and I was wondering whether there is a faster way to do this.
in_array() checks every element in the array until it finds a match, so its average complexity is O(n).
Since you are comparing strings, you can store your filter entries as array keys instead of values and look them up via array_key_exists(), which takes constant time, O(1).
Some code:
$filter = ReadFromCSV();
$filter = array_flip($filter); // switch key <=> value
$input = array("foo", "bar" /* more elements... */);
foreach ($input as $i) {
    if (array_key_exists($i, $filter)) {
        // do stuff
    }
}
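As an aside, isset() is usually a touch faster than array_key_exists(), since it is a language construct rather than a function call; the behavioural difference is that isset() returns false for a key whose stored value is null, which cannot happen here because array_flip() never produces null values. A sketch of the same check:
if (isset($filter[$i])) {
    // do stuff
}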
That's what indexes were invented for.
It's not a matter of in_array() speed: as the data grows, you should probably consider loading it into a real DBMS and using an index there.
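A minimal sketch of that idea using SQLite via PDO; the file name, table name, and the ReadFromCSV() helper are assumptions for illustration:
$pdo = new PDO('sqlite:filter.db');
$pdo->exec('CREATE TABLE IF NOT EXISTS filter (value TEXT PRIMARY KEY)'); // PRIMARY KEY = indexed

// One-time (or incremental) load of the CSV strings
$insert = $pdo->prepare('INSERT OR IGNORE INTO filter (value) VALUES (?)');
foreach (ReadFromCSV() as $string) {
    $insert->execute([$string]);
}

// Indexed lookup instead of in_array()
$lookup = $pdo->prepare('SELECT 1 FROM filter WHERE value = ?');
foreach ($input as $i) {
    $lookup->execute([$i]);
    if ($lookup->fetchColumn()) {
        // do stuff
    }
}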
It is my understanding that using isset() to prevent duplicate values from being inserted into an array is the best method with regard to memory consumption, resource usage, and ease of code processing. I am currently using array_count_values(), like this:
$XMLproducts = simplexml_load_file("products.xml");
foreach ($XMLproducts->product as $Product) {
    if (condition exists) {
        $storeArray[] = (string)$Product->store; // potentially several more arrays will have values stored in them
    }
}
$storeUniques = array_count_values($storeArray);
foreach ($storeUniques as $stores => $amts) {
    echo $stores . " (" . $amts . ")" . "<br>";
}
How would I prevent duplicate values from being inserted into an array (similar to the above) using isset()? And is there a big performance difference between the two if the XML file being parsed is very large (5-6 MB)?
As you are using the count in your output, you cannot use array_unique(), because you would lose that information.
What you could do is build the array you need in your loop, using the string as your key and counting the values as you go:
$storeArray = array();
foreach ($XMLproducts->product as $Product) {
    if (condition exists) {
        $store = (string)$Product->store;
        if (array_key_exists($store, $storeArray)) {
            $storeArray[$store]++;
        } else {
            $storeArray[$store] = 1;
        }
    }
}
Note that this is just to illustrate; you can probably wrap it up in one line (see the sketch below).
This way you will not have multiple duplicate strings in your array (assuming that is your concern), and you don't increase your memory consumption by generating a second (potentially big) array.
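A sketch of that one-liner, assuming PHP 7+ for the null coalescing operator:
$storeArray[$store] = ($storeArray[$store] ?? 0) + 1; // creates the key on first sight, increments afterwards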
I think array_unique() and company are considered unfriendly because they have to scan the whole array each time they are called. The code you're trying to write is doing essentially the same check on every insert, so I don't see a problem with using array_unique().
Very simple, no checking required:
foreach ($XMLproducts->product as $Product) {
    $helperArray[(string)$Product->store] = ""; // cast needed: a SimpleXMLElement cannot be used as an array key
}
Associative arrays have unique keys by definition: if a key already exists, it is simply overwritten.
Now swap key and value:
$storeArray = array_keys($helperArray);
EDIT: to also count the number of occurrences of each <store>, I suggest:
foreach ($XMLproducts->product as $Product) {
    $helperArray[] = (string)$Product->store;
}
And then:
$storeArray = array_count_values($helperArray);
Result: key = unique store, value = count.
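For example, with hypothetical store names (the data is illustrative only):
$helperArray = array("Acme", "Bolt", "Acme", "Acme", "Bolt");
print_r(array_count_values($helperArray));
// Array ( [Acme] => 3 [Bolt] => 2 )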
If you have an array $p that you populated in a loop like so:
$p[] = array( "id"=>$id, "Name"=>$name);
What's the fastest way to search for John in the Name key, and if found, return the $p index? Is there a way other than looping through $p?
I have up to 5000 names to find in $p, and $p can also contain up to 5000 rows. Currently I loop through $p looking for each name; if it is found, I parse it (and add it to another array), splice the row out of $p, and break, ready to start searching for the next of the 5000 names.
I was wondering if there is a faster way to get the index rather than looping through $p, e.g. an isset()-type lookup?
Thanks for taking a look, guys.
Okay, so as I see this problem, you have unique ids, but the names may not be unique.
You could initialize the array as:
array($id => $name);
And your searches can be like:
array_search($name, $arr);
This will work very well, as a native method of finding a needle in a haystack will have a better implementation than your own loop.
e.g.
$id = 2;
$name = 'Sunny';
$arr = array($id => $name);
echo array_search($name, $arr); // echoes 2
The major advantage of this method is its readability.
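Since the names may not be unique, note that array_search() returns only the first matching key; to collect every id for a given name, array_keys() with a search value can be used. A small sketch with hypothetical data:
$arr = array(2 => 'Sunny', 5 => 'John', 9 => 'John');
print_r(array_keys($arr, 'John')); // Array ( [0] => 5 [1] => 9 )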
If you know that you are going to need to perform many of these searches within the same request, then you can create an index array for them. This will loop through the array once per index you need to create.
$piName = array();
foreach ($p as $k => $v) {
    $piName[$v['Name']] = $k;
}
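Lookups against the index are then O(1); a usage sketch, assuming PHP 7+ for the null coalescing operator (note that with duplicate names the index keeps the last occurrence):
$index = $piName['John'] ?? null; // the $p index of the last row named 'John', or null if absent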
If you only need to perform one or two searches per page, then consider moving the data into an external database and creating the index there.
$index = 0;
$search_for = 'John';
$result = array_reduce($p, function ($r, $v) use (&$index, $search_for) {
    if ($v['Name'] == $search_for) {
        $r[] = $index;
    }
    ++$index;
    return $r;
}, array()); // start with an empty array so $r is never null
$result will contain all the indices of elements in $p where the element with key Name had the value John. (This of course only works for an array that is indexed numerically beginning with 0 and has no “holes” in the index.)
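A more compact alternative under the same assumption of a 0-based, gap-free index, assuming PHP 5.5+ for array_column():
$result = array_keys(array_column($p, 'Name'), 'John'); // indices of all rows whose Name is John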
Edit: It is possibly even easier to just use array_filter(); that will not return only the indices, but all array elements where Name equals John, with the indices preserved as keys:
$result2 = array_filter($p, function ($elem) {
    return $elem["Name"] == "John";
});
var_dump($result2);
Which one suits your needs better, and which one is perhaps faster, is for you to figure out.