Optimizing a big string array for "does a string exist" queries - PHP

I have a big array that contains string values. I want to optimize this array so that I can check whether a certain string exists in it as quickly as possible. Let's say I create the array with $arr = []; and then add values like this:
foreach ($names as $name)
    $arr[] = $name;
Now I want to perform a lot of queries like if (in_array($random_string, $arr)), but it's pretty slow. I'd like to add some kind of indexing to the array to improve performance. Should I simply use the sort() function on the array?
How do I optimize an array of string data for "does this string exist" queries?
EDIT: No, this is obviously not a duplicate of "what is faster: in_array or isset? [closed]", as you can already see from the answer by vivek_23.

I would suggest sorting the array and using binary search to check whether a value exists. The time complexity will be O(N log N) for sorting and O(log N) for each individual search, where N is the number of elements in the array.
You could also create an associative array and check with isset() whether the value exists. However, hashing the keys makes PHP manage a hash structure internally, which consumes a bit of memory since you have big string arrays. Also, isset($arr['some_key']) is not necessarily an O(1) operation because of collisions.
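For illustration, here is a minimal sketch of that alternative (assuming $names holds the strings from your question; the helper name is made up):
<?php
// Build the lookup table once: the strings become keys, the values are dummies.
$lookup = array_fill_keys($names, true);

// Each membership test is then a single hash lookup.
function valueExists(array $lookup, string $random_string): bool
{
    return isset($lookup[$random_string]);
}

var_dump(valueExists($lookup, 'some string'));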
Below is my code, which uses the binary search approach:
<?php
function checkIfValueExists($arr, $search_value){
    $low  = 0;
    $high = count($arr) - 1;
    while($low <= $high){
        $mid = $low + intval(($high - $low) / 2);
        $compare_result = strcmp($arr[$mid], $search_value);
        if($compare_result === 0) return true;
        else if($compare_result < 0) $low = $mid + 1;
        else $high = $mid - 1;
    }
    return false;
}
The driver code to test the above function:
<?php
$arr = array();
$str = "abcdefghijklmnopqrstuvwxyz";
$values_to_check = array();

for($i = 1; $i <= 50000; ++$i){
    $str_length = rand(1, 50);
    $new_str = "";
    while($str_length-- > 0){
        $new_str .= $str[rand(0, 25)];
    }
    $arr[] = $new_str;
    if(rand(0, 1) === 1){
        $values_to_check[] = rand(0, 1) === 1 ? $new_str . $str[rand(0, 25)] : $new_str;
    }
}

// sort the array of strings.
sort($arr);

// test the functionality
foreach($values_to_check as $each_value){
    var_dump(checkIfValueExists($arr, $each_value));
    echo "<br/>";
}

Related

PHP - How to check that the values inside an array are not duplicated? [duplicate]

I'm sure this is an extremely obvious question, and that there's a function that does exactly this, but I can't seem to find it. In PHP, I'd like to know if my array has duplicates in it, as efficiently as possible. I don't want to remove them like array_unique does, and I don't particularly want to run array_unique and compare it to the original array to see if they're the same, as this seems very inefficient. As far as performance is concerned, the "expected condition" is that the array has no duplicates.
I'd just like to be able to do something like
if (no_dupes($array))
// this deals with arrays without duplicates
else
// this deals with arrays with duplicates
Is there any obvious function I'm not thinking of?
How to detect duplicate values in PHP array?
has the right title and is a very similar question; however, if you actually read it, the asker is looking for array_count_values.
I know you are not after array_unique(). However, you will not find a magical obvious function nor will writing one be faster than making use of the native functions.
I propose:
function array_has_dupes($array) {
    // streamlined per @Felix
    return count($array) !== count(array_unique($array));
}
Adjust the second parameter of array_unique() to meet your comparison needs.
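For example, the flag changes how values are compared; this hypothetical snippet contrasts the default SORT_STRING with SORT_REGULAR:
$values = array("10", "1e1");

var_dump(count(array_unique($values, SORT_STRING)));  // 2: distinct as strings
var_dump(count(array_unique($values, SORT_REGULAR))); // 1: equal when compared numerically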
Performance-Optimized Solution
If you care about performance and micro-optimizations, check this one-liner:
function no_dupes(array $input_array) {
    return count($input_array) === count(array_flip($input_array));
}
Description:
The function compares the number of elements in $input_array with the number of elements after array_flip. Values become keys, and since keys must be unique in associative arrays, non-unique values are lost and the final number of elements is lower than the original.
Warning:
As noted in the manual, array keys can only be of type int or string, so that is what you must have in the original array's values; otherwise PHP will start casting, with unexpected results. See https://3v4l.org/7bRXI for an example of this fringe-case failure mode.
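As a hypothetical illustration of that pitfall: numeric-looking strings are cast when they become keys, so the flip-based check can report duplicates where a strict comparison sees none:
$values = array(1, "1", "01");   // no two elements are identical (===)
var_dump(no_dupes($values));     // false: 1 and "1" collapse into the same key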
Proof for an array with 10 million records:
The top-voted solution by Jason McCreary: 14.187316179276s 🐌🐌🐌🐌🐌🐌🐌🐌🐌🐌🐌🐌🐌🐌
The accepted solution by Mike Sherov: 2.0736091136932s 🐌🐌
This answer's solution: 0.14155888557434s 🐌/10
Test case:
<?php
$elements = array_merge(range(1, 10000000), [1]);

$time = microtime(true);
accepted_solution($elements);
echo 'Accepted solution: ', (microtime(true) - $time), 's', PHP_EOL;

$time = microtime(true);
most_voted_solution($elements);
echo 'Most voted solution: ', (microtime(true) - $time), 's', PHP_EOL;

$time = microtime(true);
this_answer_solution($elements);
echo 'This answer solution: ', (microtime(true) - $time), 's', PHP_EOL;

function accepted_solution($array){
    $dupe_array = array();
    foreach($array as $val){
        // sorry, but I had to add below line to remove millions of notices
        if(!isset($dupe_array[$val])){ $dupe_array[$val] = 0; }
        if(++$dupe_array[$val] > 1){
            return true;
        }
    }
    return false;
}

function most_voted_solution($array) {
    return count($array) !== count(array_unique($array));
}

function this_answer_solution(array $input_array) {
    return count($input_array) === count(array_flip($input_array));
}
Note that the accepted solution might be faster in certain conditions, for example when the non-unique values are near the beginning of a huge array.
You can do:
function has_dupes($array) {
    $dupe_array = array();
    foreach ($array as $val) {
        if (++$dupe_array[$val] > 1) {
            return true;
        }
    }
    return false;
}
$hasDuplicates = count($array) > count(array_unique($array));
It will be true if there are duplicates, false if there are none.
Here's my take on this… after some benchmarking, I found the following to be the fastest method.
function has_duplicates( $array ) {
    return count( array_keys( array_flip( $array ) ) ) !== count( $array );
}
…or, depending on circumstances, this could be marginally faster:
function has_duplicates( $array ) {
    $array = array_count_values( $array );
    rsort( $array );
    return $array[0] > 1;
}
$duplicate = false;
if(count($array) != count(array_unique($array))){
    $duplicate = true;
}
Keep it simple, silly! ;)
Simple OR logic...
function checkDuplicatesInArray($array){
    $duplicates = FALSE;
    $seen = array();
    foreach($array as $k => $i){
        if(!isset($seen[$i])){
            $seen[$i] = TRUE;
        }
        else{
            $duplicates |= TRUE;
        }
    }
    return (bool) $duplicates;
}
Regards!
To remove all the empty values from the comparison you can add array_diff()
if (count(array_unique(array_diff($array,array("")))) < count(array_diff($array,array(""))))
Reference taken from @AndreKR's answer here.
Two ways to do it efficiently that I can think of:
inserting all the values into some sort of hashtable and checking whether the value you're inserting is already in it (expected O(n) time and O(n) space)
sorting the array and then checking whether adjacent cells are equal (O(n log n) time and O(1) or O(n) space, depending on the sorting algorithm; see the sketch below)
stormdrain's solution would probably be O(n^2), as would any solution that scans the array for a duplicate of each element.
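A minimal sketch of the second approach (sort, then compare adjacent elements); this is only an illustration, not code from the answer:
function has_dupes_sorted(array $array): bool
{
    sort($array);                        // O(n log n)
    $count = count($array);
    for ($i = 1; $i < $count; $i++) {
        if ($array[$i] === $array[$i - 1]) {
            return true;                 // two identical neighbours => duplicate
        }
    }
    return false;
}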
Here is a useful solution:
function get_duplicates( $array ) {
    return array_unique( array_diff_assoc( $array, array_unique( $array ) ) );
}
After that, count the result: if it is greater than 0 there are duplicates, otherwise the values are unique.
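A brief, hypothetical usage example:
$array = array('a', 'b', 'a', 'c');
$dupes = get_duplicates($array);   // array(2 => 'a')
var_dump(count($dupes) > 0);       // true: duplicates exist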
I'm using this:
if(count($array) == count(array_count_values($array))){
    echo("all values are unique");
}else{
    echo("there's dupe values");
}
I don't know if it's the fastest, but it has worked pretty well so far.
One more solution from me, related to performance:
$array_count_values = array_count_values($array);
if(is_array($array_count_values) && count($array_count_values) > 0)
{
    foreach ($array_count_values as $key => $value)
    {
        if($value > 1)
        {
            // duplicate values found here, write code to handle duplicate values
        }
    }
}
As you specifically said you didn't want to use array_unique, I'm going to ignore the other answers, despite the fact that they're probably better.
Why don't you use array_count_values() and then check if the resulting array has any value greater than 1?
PHP has a function to count the occurrences of values in an array: http://www.php.net/manual/en/function.array-count-values.php
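A minimal sketch of that idea (an illustration, not the answerer's code); it assumes the values are ints or strings, as array_count_values() requires:
function has_dupes_counted(array $array): bool
{
    foreach (array_count_values($array) as $value => $occurrences) {
        if ($occurrences > 1) {
            return true;   // some value appears more than once
        }
    }
    return false;
}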
You can also do it this way:
This will return true if there are duplicate values, and false otherwise.
$nofollow = count($modelIdArr) !== count(array_unique($modelIdArr));
A simple solution, but quite fast.
$elements = array_merge(range(1, 10000000), [1]);

function unique_val_inArray($arr) {
    $count = count($arr);
    foreach ($arr as $i_1 => $value) {
        for($i_2 = $i_1 + 1; $i_2 < $count; $i_2++) {
            if($arr[$i_2] === $arr[$i_1]){
                return false;
            }
        }
    }
    return true;
}
$time = microtime(true);
unique_val_inArray($elements);
echo 'This solution: ', (microtime(true) - $time), 's', PHP_EOL;
Speed - [0.71s]!
function hasDuplicate($array){
    $d = array();
    foreach($array as $elements) {
        if(!isset($d[$elements])){
            $d[$elements] = 1;
        }else{
            return true;
        }
    }
    return false;
}

PHP: Fastest way to handle undefined array key

In a very tight loop I need to access tens of thousands of values in an array containing millions of elements. The key can be undefined; in that case it shall be legal to return NULL without any error message:
Array key exists: return value of element.
Array key does not exist: return null.
I do know multiple solutions:
if (isset($lookup_table[$key])) {
    return $lookup_table[$key];
} else {
    return;
}
or
return @$lookup_table[$key];
or
error_reporting(0);
$return = $lookup_table[$key];
error_reporting(E_ALL);
return $return;
All solutions are far from optimal:
The first one requires two lookups in the B-tree: one to check existence, another to retrieve the value. That effectively doubles the runtime.
The second one uses the error suppression operator, and thus creates a massive overhead on that line.
The third one calls the error handler (which checks the error_reporting setting and then displays nothing) and thereby creates overhead.
My question is whether I am missing a way to avoid error handling and still get by with a single B-tree lookup.
To answer some questions:
The array caches the results of a complex calculation, too complex to be done in real time.
Out of billions of possible values, only millions yield a valid result. The array looks like 1234567 => 23457, 1234999 => 74361, .... That is saved to a PHP file of several megabytes, and include_once-d at the beginning of the execution. Initial load time does not matter.
If the key is not found, it simply means that this specific value will not return a valid result. The trouble is doing this 50k+ times per second.
Conclusion
Edit: outdated, check the accepted answer.
As no way was found to get the value with a single lookup and without error handling, I have trouble accepting a single answer. Instead, I upvoted all the great contributions.
The most valuable inputs were:
use array_key_exists, as it is faster than alternatives
Check out PHP's QuickHash
There was a lot of confusion about how PHP handles arrays; as the accepted answer explains, they are implemented as hash tables, not trees. Building your own lookup methods is common in C and C++, but it is not performant in higher-level scripting languages like PHP.
Update
Since PHP 7 you can accomplish this with the null coalescing operator:
return $table[$key] ?? null;
Old answer
First of all, arrays are not implemented as a B-tree; they are a hash table: an array of buckets (indexed via a hash function), each with a linked list of actual values (in case of hash collisions). This means that lookup times depend on how well the hash function has "spread" the values across the buckets, i.e. the number of hash collisions is an important factor.
Technically, this statement is the most correct:
return array_key_exists($key, $table) ? $table[$key] : null;
This introduces a function call and is therefore much slower than the optimized isset(). How much? ~2e3 times slower.
Next up is using a reference to avoid the second lookup:
$tmp = &$lookup_table[$key];
return isset($tmp) ? $tmp : null;
Unfortunately, this modifies the original $lookup_table array if the item does not exist, because references are always made valid by PHP.
That leaves the following method, which is much like your own:
return isset($lookup_table[$key]) ? $lookup_table[$key] : null;
Besides not having the side effect of references, it's also faster in runtime, even when performing the lookup twice.
You could look into dividing your arrays into smaller pieces as one way to mitigate long lookup times.
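A hypothetical sketch of that idea for the integer keys described in the question: split the one huge table into smaller buckets so each individual array stays smaller (the bucket count and names are illustrative only):
const BUCKET_COUNT = 64;

// Build: distribute the big table into smaller ones once.
$buckets = array_fill(0, BUCKET_COUNT, array());
foreach ($lookup_table as $key => $value) {
    $buckets[$key % BUCKET_COUNT][$key] = $value;
}

// Lookup: pick the bucket first, then do the usual isset() check.
function bucket_lookup(array $buckets, int $key)
{
    $bucket = $buckets[$key % BUCKET_COUNT];
    return isset($bucket[$key]) ? $bucket[$key] : null;
}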
I did some benchmarking with the following code:
set_time_limit(100);

$count = 2500000;
$search_index_end = $count * 1.5;
$search_index_start = $count * .5;

$array = array();
for ($i = 0; $i < $count; $i++)
    $array[md5($i)] = $i;

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = isset($array[$key]) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = array_key_exists($key, $array) ? $array[$key] : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = @$array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";

$error_reporting = error_reporting();
error_reporting(0);
$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $test = $array[$key];
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
error_reporting($error_reporting);

$start = microtime(true);
for ($i = $search_index_start; $i < $search_index_end; $i++) {
    $key = md5($i);
    $tmp = &$array[$key];
    $test = isset($tmp) ? $tmp : null;
}
$end = microtime(true);
echo ($end - $start) . " seconds<br/>";
and I found that the fastest running test was the one that uses isset($array[$key]) ? $array[$key] : null, followed closely by the solution that just disables error reporting.
This works for me:
{{ isset($array['key']) ? $array['key'] : 'Default' }}
but this is faster:
{{ $array['key'] or 'Default' }}
There are two typical approaches to this.
Define defaults for an undefined key.
Check for undefined key.
Here is how to perform the first with as little code as possible.
$data = array_merge(array($key=>false),$data);
return $data[$key];
Here is how to perform the second.
return isset($data[$key]) ? $data[$key] : false;
Just a sudden idea that would have to be tested, but did you try using array_intersect_key() to get the existing values and array_merge() to fill in the rest? It would remove the need for a loop to access the data. Something like this:
$searched_keys = array('key1' => null, 'key2' => null); // the list of the keys to find
$existing_values = array_intersect_key($lookup_table, $searched_keys);
$all_values = array_merge($searched_keys, $existing_values);
Please note that I did not try it performance-wise.
The @ operator and the error_reporting methods will both be slower than using isset. Both of these methods modify the error reporting setting for PHP, but PHP's error handler will still be called. The error handler checks against the error_reporting setting and exits without reporting anything; however, this still takes time.
I prefer using the isset function instead of suppressing the error.
I made a function that checks whether the key exists and, if not, returns a default value. For nested arrays you just need to add the other keys in order:
Nested array lookup:
/**
 * Lookup array value.
 *
 * @param array $array
 * @param array $keys
 * @param $defaultValue
 */
public static function array_key_lookup($array, $keys, $defaultValue)
{
    $value = $array;
    foreach ($keys as $key) {
        if (isset($value[$key])) {
            $value = $value[$key];
        } else {
            $value = $defaultValue;
            break;
        }
    }
    return $value;
}
Usage example:
$array = [
    'key1' => 'value1',
    'key2' => 'value2',
    'key3' => [
        'key3a' => 'value3a',
        'key3b' => 'value3b'
    ]
];
array_key_lookup($array, ['key3', 'key3a'], 'default')
'value3a'
array_key_lookup($array, ['key2', 'key2a'], 'default')
'default'
array_key_lookup($array, ['key2'], 'default')
'value2'
array_key_lookup($array, ['key5'], 'default')
'default'
Suppressing the error:
$value = @$array[$key1][$key2] ?: $defaultValue;
First, re-organize the data for performance by saving a new array where the data is sorted by the keys, but the new array contains a regular numeric index.
This part will be time-consuming, but it is only done once.
// first sort the array by its keys
ksort($data);

// second, create a new array with a numeric index
$tmp = array();
foreach($data as $key => $value)
{
    $tmp[] = array('key' => $key, 'value' => $value);
}

// now save and use this data instead
save_to_file($tmp);
Once that is done, it should be quick to find the key using a binary search. Later you can use a function like this:
function findKey($key, $data, $start, $end)
{
    if($end < $start)
    {
        return null;
    }
    $mid = (int)(($end - $start) / 2) + $start;
    if($data[$mid]['key'] > $key)
    {
        return findKey($key, $data, $start, $mid - 1);
    }
    else if($data[$mid]['key'] < $key)
    {
        return findKey($key, $data, $mid + 1, $end);
    }
    return $data[$mid]['value'];
}
To perform a search for a key, you would do this:
$result = findKey($key, $data, 0, count($data) - 1); // end index is inclusive
if($result === null)
{
    // key not found.
}
If count($data) is needed all the time, you could cache that value in the file where you stored the array data.
I suspect this method will perform a lot faster than a regular linear search repeated against $data, though I can't promise it's faster. Only an octree would be quicker, but the time to build the octree might cancel out the search gains (I've experienced that before). It depends on how much searching of the data you have to do.

PHP in_array() horrible performance. Fastest way to search an array for a value

I have the following simple code to test for collisions on a primary key I am creating:
$machine_ids = array();
for($i = 0; $i < 100000; $i++) {
    //Generate machine id returns a 15 character alphanumeric string
    $mid = Functions::generate_machine_id();
    if(in_array($mid, $machine_ids)) {
        die("Collision!");
    } else {
        $machine_ids[] = $mid;
    }
}
die("Success!");
Any idea why this is taking many minutes to run? Any way to speed it up?
for($i = 0; $i < 100000; $i++)
{
    //Generate machine id returns a 15 character alphanumeric string
    $mid = Functions::generate_machine_id();
    if (isset($machine_ids[$mid]))
    {
        die("Collision!");
    }
    $machine_ids[$mid] = true;
}
For this, use $mid as the key and a dummy value as the value. Specifically, instead of
if(in_array($mid, $machine_ids)) {
    die("Collision!");
} else {
    $machine_ids[] = $mid;
}
use
if(isset($machine_ids[$mid])) {
    die("Collision!");
} else {
    $machine_ids[$mid] = 1;
}
At the end you can extract the array you originally wanted with array_keys($machine_ids).
This should be much faster. If it is still slow, then your Functions::generate_machine_id() is slow.
EDITED to add isset as per comments.
Checking for array membership with in_array() is an O(n) operation, since you have to compare the value to every element in the array. After you add a whole bunch of stuff to the array, it naturally gets slower.
If you need to do a whole bunch of membership tests, as is the case here, you should use a different data structure that supports O(1) membership tests, such as a hash.
Refactor your code so that it uses an associative array to hold the machine IDs, and use isset to check:
if( isset($machine_ids[$mid]) ) die("Collision");
$machine_ids[$mid] = $mid;
Using isset should be faster
If you need the best performance for your case, you need to store your data as array keys and use isset or array_key_exists (since PHP >= 7.4, array_key_exists is as fast as isset) instead of in_array.
Attention. It is true that isset on a hash map is faster than searching through an array for a value (in_array), but keep in mind
that converting an array of values, ["foo", "bar", "baz"], to a hash
map, ["foo" => true, "bar" => true, "baz" => true], incurs a memory
cost (as well as potentially constructing the hash map, depending on
how and when you do it). As with all things, you'll have to weigh the
pros & cons for each case to determine if a hash map or array (list)
of values works best for your needs. This isn't specific to PHP but
more of a general problem space of computer science.
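As a hedged illustration of that trade-off, here is one way (not the only one) to build such a lookup map and observe the extra memory it costs:
$values = ["foo", "bar", "baz"];              // the plain list

$before = memory_get_usage();
$lookup = array_fill_keys($values, true);     // ["foo" => true, "bar" => true, "baz" => true]
$after  = memory_get_usage();

var_dump(isset($lookup["bar"]));              // O(1)-style membership test
echo "Extra memory for the map: ", ($after - $before), " bytes\n";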
And some performance tests from https://gist.github.com/alcaeus/536156663fac96744eba77b3e133e50a
<?php declare(strict_types = 1);

function testPerformance($name, Closure $closure, $runs = 1000000)
{
    $start = microtime(true);
    for (; $runs > 0; $runs--)
    {
        $closure();
    }
    $end = microtime(true);
    printf("Function call %s took %.5f seconds\n", $name, $end - $start);
}

$items = [1111111];
for ($i = 0; $i < 100000; $i++) {
    $items[] = rand(0, 1000000);
}
$items = array_unique($items);
shuffle($items);

$assocItems = array_combine($items, array_fill(0, count($items), true));

$in_array = function () use ($items) {
    in_array(1111111, $items);
};

$isset = function () use ($assocItems) {
    isset($assocItems[1111111]);
};

$array_key_exists = function () use ($assocItems) {
    array_key_exists(1111111, $assocItems);
};

testPerformance('in_array', $in_array, 100000);
testPerformance('isset', $isset, 100000);
testPerformance('array_key_exists', $array_key_exists, 100000);
Output:
Function call in_array took 5.01030 seconds
Function call isset took 0.00627 seconds
Function call array_key_exists took 0.00620 seconds


alternatives to php in_array for large arrays for avoiding duplicate entries

I need to generate a large list of random numbers, from 600k to 2000k entries, but the list cannot have duplicates.
My current 'implementation' looks like this:
<?php
header('Content-type: text/plain');

$startTime = microtime(true);
$used = array();
for ($i = 0; $i < 600000; ) {
    $random = mt_rand();
    //if (!in_array($random, $used)) {
        $used[] = $random;
        $i++;
    //}
}
$endTime = microtime(true);
$runningTime = $endTime - $startTime;

echo 'Running Time: ' . $runningTime;
//print_r($used);
?>
If I keep the in_array test commented out, the processing time is around 1 second, so the mt_rand calls and the filling of the $used array are relatively 'cheap'. But when I uncomment the in_array test, bad things happen! (I'm still waiting, it's been more than 10 minutes, for the script to terminate...)
So I'm looking for alternatives, either on the duplicate detection side or in the generation part (how could I generate random numbers without the risk of duplicates?).
I'm open to any suggestion.
For a quick/dirty solution, does using/checking array keys improve your speed at all?
$used = array();
for ($i = 0; $i < 600000; ) {
    $random = mt_rand();
    if (!isset($used[$random])) {
        $used[$random] = $random;
        $i++;
    }
}
$used = array_values($used);
in_array has to search the whole array in the worst case, which means linear cost (O(n)). But using the value as the array key, the cost is constant (O(1)), since the cost of an array access is always constant.
You could, for example, do something like this instead:
$random = mt_rand();
$array = range($random, $random + 600000);
shuffle($array); // shuffle() works in place and returns a bool, so don't reassign it
That creates an array that is in order at first, but then shuffles it, so the values will be in random order. No collisions! :D
If you do the looping anyway, and you don't need more than 600000 values, why check them at all? Why not just append $i to $random? Done. Not random enough?
for ($i = 0; $i < 600000; $i++)
{
    $yourArray[] = mt_rand() . $i;
}
Furthermore, there is the array function array_unique(), which removes duplicate values from an array.
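As a hypothetical sketch of that suggestion: over-generate, deduplicate with array_unique(), and top up until you have enough values:
$needed = 600000;
$used = array();

while (count($used) < $needed) {
    // generate a batch of candidates, then drop duplicates in one pass
    for ($i = count($used); $i < $needed; $i++) {
        $used[] = mt_rand();
    }
    $used = array_unique($used);
}

$used = array_values($used);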
