Consider the script below: two arrays, each with only three values. When I compare these two arrays using array_intersect(), the result is fast.
<?php
$arrayOne = array('3', '4', '5');
$arrayTwo = array('4', '5', '6');
$intersect = array_intersect($arrayOne, $arrayTwo);
print_r($intersect);
?>
My question is: what is the efficiency of array_intersect()? If we compare two arrays with 1000 values each, will it still perform well, or do we need to use some hash-based technique to find the common values quickly? I need your suggestions on this.
I am building an application: when a person logs in with Facebook Login, the application fetches their friends list and checks whether any of those friends have commented in my app before, then shows those comments to them. Roughly, a person may have 200 to 300 friends on Facebook, and the DB has more than 1000 records. How can I find the matches efficiently?
Intersection can be implemented by constructing a set of the searched values in the second array, and looking up in a set can be made so fast that it takes essentially constant time on average. Therefore, the runtime of the whole algorithm can be in O(n).
Alternatively, one can sort the second array (in O(n log n)). Since looking up in a sorted array has a runtime in O(log n), the whole algorithm should then have a runtime in O(n log n).
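As an illustration, here is a minimal sketch of that second approach (intersectBySearch is a hypothetical helper, not a built-in; it assumes scalar, comparable values):
// Sort one array (O(n log n)), then binary-search it (O(log n))
// for each value of the other array.
function intersectBySearch(array $a, array $b) {
    sort($b);
    $result = array();
    foreach ($a as $value) {
        $lo = 0;
        $hi = count($b) - 1;
        while ($lo <= $hi) {
            $mid = (int) (($lo + $hi) / 2);
            if ($b[$mid] < $value) {
                $lo = $mid + 1;
            } elseif ($b[$mid] > $value) {
                $hi = $mid - 1;
            } else {
                $result[] = $value; // present in both arrays
                break;
            }
        }
    }
    return $result;
}
print_r(intersectBySearch(array(3, 4, 5), array(4, 5, 6))); // 4 and 5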
According to a (short, unscientific) test I just ran, this seems to be the case for PHP's array_intersect.
For an input size as small as 1000, you don't need to worry.
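A minimal harness along these lines can reproduce the measurement (a sketch, not the exact test code; absolute numbers will vary by machine):
// Time array_intersect at growing input sizes to see how the runtime scales.
foreach (array(1000, 10000, 100000, 1000000) as $n) {
    $a = array();
    $b = array();
    for ($i = 0; $i < $n; $i++) {
        $a[] = rand(0, $n);
        $b[] = rand(0, $n);
    }
    $start = microtime(true);
    array_intersect($a, $b);
    printf("n = %7d: %.4f seconds\n", $n, microtime(true) - $start);
}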
array_intersect sorts the arrays before comparing their values in parallel (see the use of zend_qsort in the source file array.c). This alone takes O(n·log n) for each array. Then the actual intersection does only take linear time.
Depending on the values in your arrays, you could implement this intersection in linear time without the sorting, for example:
$index = array_flip($arrayOne); // value => original key (values must be ints or strings)
foreach ($arrayTwo as $value) {
    // Remove every value that also occurs in the second array, leaving
    // $index with only the values unique to $arrayOne.
    if (isset($index[$value])) unset($index[$value]);
}
foreach ($index as $value => $key) {
    // Drop those unique values from $arrayOne; what remains is the intersection.
    unset($arrayOne[$key]);
}
var_dump($arrayOne); // keys of the original array are preserved
The fastest solution I found:
function arrayIntersect($arrayOne, $arrayTwo) {
    // Flip both arrays so their values become keys; key-based
    // intersection uses hash lookups instead of value comparisons.
    $index = array_flip($arrayOne);
    $second = array_flip($arrayTwo);
    $x = array_intersect_key($index, $second);
    return array_flip($x);
}
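For example (array_flip requires values that are valid keys, i.e. integers or strings; numeric strings come back as integers, and duplicate values are collapsed):
print_r(arrayIntersect(array('3', '4', '5'), array('4', '5', '6')));
// Array ( [1] => 4 [2] => 5 )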
The tests I made look like this:
function intersect($arrayOne, $arrayTwo)
{
    $index = array_flip($arrayOne);
    foreach ($arrayTwo as $value) {
        if (isset($index[$value])) unset($index[$value]);
    }
    foreach ($index as $value => $key) {
        unset($arrayOne[$key]);
    }
    return $arrayOne;
}

function intersect2($arrayOne, $arrayTwo)
{
    $index = array_flip($arrayOne);
    $second = array_flip($arrayTwo);
    $x = array_intersect_key($index, $second);
    return array_flip($x);
}

for ($i = 0; $i < 1000000; $i++) {
    $one[] = rand(0, 1000000);
    $two[] = rand(0, 100000);
    $two[] = rand(0, 10000);
}
$one = array_unique($one);
$two = array_unique($two);

$time_start = microtime(true);
$res = intersect($one, $two);
$time = microtime(true) - $time_start;
echo "Sort time $time seconds 'intersect' \n";

$time_start = microtime(true);
$res2 = array_intersect($one, $two);
$time = microtime(true) - $time_start;
echo "Sort time $time seconds 'array_intersect' \n";

$time_start = microtime(true);
$res3 = intersect2($one, $two);
$time = microtime(true) - $time_start;
echo "Sort time $time seconds 'intersect2' \n";
Results from PHP 5.6:
Sort time 0.77021193504333 seconds 'intersect'
Sort time 6.9765028953552 seconds 'array_intersect'
Sort time 0.4631941318512 seconds 'intersect2'
From what you state above, I would recommend that you implement a caching mechanism. That way you would offload the DB and speed up your application. I would also recommend profiling the speed of array_intersect with increasing amounts of data to see how the performance scales. You could do this by simply wrapping the call in calls for the system time and calculating the difference, but I would recommend a real profiler to get good data.
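A minimal sketch of that timing approach (timeIt is a hypothetical helper for illustration; a real profiler such as Xdebug will give you richer data):
function timeIt(callable $fn) {
    $start = microtime(true);
    $fn();
    return microtime(true) - $start;
}

// Double the input size and watch how the runtime of array_intersect grows.
foreach (array(25000, 50000, 100000) as $n) {
    $a = range(1, $n);
    $b = range((int) ($n / 2), $n + (int) ($n / 2));
    printf("n = %6d: %.4f s\n", $n, timeIt(function () use ($a, $b) {
        array_intersect($a, $b);
    }));
}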
I implemented this simple code comparing array_intersect and array_intersect_key:
$array = array();
for ($i = 0; $i < 130000; $i++)
    $array[$i] = $i;
for ($i = 200000; $i < 230000; $i++)
    $array[$i] = $i;
for ($i = 300000; $i < 340000; $i++)
    $array[$i] = $i;

$array2 = array();
for ($i = 100000; $i < 110000; $i++)
    $array2[$i] = $i;
for ($i = 90000; $i < 100000; $i++)
    $array2[$i] = $i;
for ($i = 110000; $i < 290000; $i++)
    $array2[$i] = $i;

echo 'Intersect two arrays -> array1[' . count($array) . '] : array2[' . count($array2) . ']' . '<br>';
echo date('Y-m-d H:i:s') . '<br>';

$time = time();
$array_r2 = array_intersect_key($array, $array2);
echo 'Intersect key: ' . (time() - $time) . ' secs<br>';

$time = time();
$array_r = array_intersect($array, $array2);
echo 'Intersect: ' . (time() - $time) . ' secs<br>';
The result:
Intersect two arrays -> array1[200000] : array2[200000]
2014-10-30 08:52:52
Intersect key: 0 secs
Intersect: 4 secs
This comparison of the efficiency of array_intersect and array_intersect_key shows that intersecting by keys is much faster.
I have been thinking about this: for code readability I tend to use PHP built-in functions like range and array_combine to generate an array of numbers like this:
array(
5 => 5,
10 => 10,
15 => 15,
20 => 20,
25 => 25,
...
60 => 60
);
So currently I use this to generate above:
$nums = range(5, 60, 5);
$nums = array_combine($nums, $nums);
I was wondering if there is a speed or memory difference between above approach and simply using loop like this:
for ($i = 5; $i <= 60; $i += 5) {
    $nums[$i] = $i;
}
I just want to know if my approach is good practice, or if someone looking at my code would try to find out where I live?
The following method seems to be fast for small numbers:
$tmp = range(5,$limit,5);
$tmp = array_combine($tmp, $tmp);
However, the for loop is much faster for bigger numbers:
for ($i = 5; $i <= $limit; $i += 5)
    $tmp[$i] = $i;
Try the following code here:
<?php
$limit = 200;

$time_start = microtime(true);
$tmp = range(5, $limit, 5);
$tmp = array_combine($tmp, $tmp);
$time_end = microtime(true);
echo $time_end - $time_start;
echo "<br/>";
//print_r($tmp);
echo "<br/>";

$time_start = microtime(true);
$tmp = array();
for ($i = 5; $i <= $limit; $i += 5)
    $tmp[$i] = $i;
$time_end = microtime(true);
echo $time_end - $time_start;
echo "<br/>";
//print_r($tmp);
for $limit = 200 the first method is faster:
1=> 2.0980834960938E-5
2=> 2.1934509277344E-5
range & combination wins!
for $limit = 500 the second method is faster:
1=> 3.7908554077148E-5
2=> 2.9087066650391E-5
for loop wins!
So in my opinion I would pick the second method (the for loop): for small numbers, even if the first method is faster, the time difference is negligible. For large numbers the second method is always faster, and that is what we care about in computer science, the worst case.
Conclusion:
For loop is the winner!
I am unsure which to use:
foreach(){
    // .....
    if (!in_array($view, $this->_views[$condition]))
        array_push($this->_views[$condition], $view);
    // ....
}
OR
foreach(){
    // .....
    array_push($this->_views[$condition], $view);
    // ....
}
$this->_views[$condition] = array_unique($this->_views[$condition]);
UPDATE
The goal is to get an array of unique values. This can be done either by checking every time whether the value already exists with in_array, or by adding all values each time and using array_unique at the end. So is there any major difference between these two ways?
I think the second approach would be more efficient. In fact, array_unique sorts the array and then scans it.
Sorting takes N log N steps; scanning takes N steps.
The first approach takes N² steps (each foreach iteration scans all N previous elements). On big arrays, there is a very big difference.
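To put rough numbers on that: for N = 10,000 elements, N² is 100,000,000 steps, while N log N is only about 132,000, roughly a 750-fold difference.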
Honestly if you're using a small dataset it does not matter which one you use. If your dataset is in the 10000s you'll most definitely want to use a hash map for this sort of thing.
This is assuming the views are strings or something similar, which it looks like they are.
This is typically O(n) and possibly the fastest way to deal with tracking unique values.
foreach ($views as $view)
{
    // Using the value as an array key makes the existence check a
    // hash lookup instead of a linear scan.
    if (!array_key_exists($view, $unique_views[$condition]))
    {
        $unique_views[$condition][$view] = true;
    }
}
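If you later need the unique views back as a plain list, array_keys recovers them:
$views_list = array_keys($unique_views[$condition]); // plain array of the unique views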
TL;DR: foreach combined with if (!in_array()) is better.
Truthfully, you should not really worry about which performs better; in most cases the difference is so small it's negligible (unless you're really doing some big-data stuff). I would suggest going with whatever seems more readable.
If you're interested, check out this script I wrote. It loops each case 100,000 times, and both take between 50 and 200 ms.
https://3v4l.org/lkTCF
Note that array_unique() keeps the original keys, so to counter that we also have to wrap the result with array_values().
In case the link ever dies:
<?php
$loops = 100000;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    for ($i = 0; $i <= 10; $i++) {
        if (!in_array($i, $x)) {
            $x[] = $i;
        }
    }
}
$duration = microtime(true) - $start;
echo "in_array took $duration<br>".PHP_EOL;

$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
    $x = [1,2,3,4,6,7,8,9];
    $x = array_values(array_unique(array_merge($x, [0,1,2,3,4,5,6,7,8,9,10])));
}
$duration = microtime(true) - $start;
echo "array_unique took $duration<br>".PHP_EOL;
I have to generate a large number of unique keys. One key should consist of 16 digits. I came up with the following code:
function make_seed()
{
    // Build a numeric seed from the current microtime.
    list($usec, $sec) = explode(' ', microtime());
    return (float) $sec + ((float) $usec * 100000);
}

function generate_4_digits(){
    // rand(100, 9999) never yields values below 100, so only
    // three-digit results need zero-padding to four characters.
    $randval = rand(100, 9999);
    if ($randval < 1000) {
        $randval = '0' . $randval;
    }
    return (string) $randval;
}

function generate_cdkey(){
    return generate_4_digits() . '-' . generate_4_digits() . '-' . generate_4_digits() . '-' . generate_4_digits();
}

srand(make_seed());
echo generate_cdkey();
The result was quite promising, 6114-0461-7825-1604.
Then I decided to generate 10 000 keys and see how many duplicates I get:
srand(make_seed());
$keys = array();
$duplicates = array();
for ($i = 0; $i < 10000; $i++) {
    $new_key = generate_cdkey();
    if (in_array($new_key, $keys)) {
        $duplicates[] = $new_key;
    }
    $keys[] = $new_key;
}
$keys_length = count($keys);
var_dump($duplicates);
echo '<pre>';
for ($i = 0; $i < $keys_length; $i++) {
    echo $keys[$i] . "\n";
}
echo '</pre>';
On the first run I got 1807 duplicates, which was quite disappointing. But to my great surprise, on each following run I got the same number of duplicates!? When I looked closely at the generated keys, I realized the last 1807 keys were exactly the same as the first ones. So I can generate 8193 keys without a single duplicate?! That is so close to 2^13! Can we conclude rand() is suited to generate at most 2^13 unique numbers? But why?
I changed the code to use mt_rand() and I get no duplicates even when generating 50 000 keys.
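For reference, a sketch of the four-digit generator rewritten with mt_rand (sprintf handles the zero-padding, and this version uses the full 0-9999 range rather than the original lower bound of 100):
function generate_4_digits() {
    // %04d zero-pads values below 1000 to four digits
    return sprintf('%04d', mt_rand(0, 9999));
}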
This is probably more what you're looking for: openssl_random_pseudo_bytes(int $length [, bool &$crypto_strong ]).
Throw some uniqid() in there too:
http://www.php.net/manual/en/function.uniqid.php
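For example, a sketch of a 16-digit key derived from openssl_random_pseudo_bytes (the modulo mapping is slightly biased toward the digits 0-5, which is fine for uniqueness but not for cryptographic purposes):
function generate_cdkey() {
    $digits = '';
    foreach (str_split(openssl_random_pseudo_bytes(16)) as $byte) {
        $digits .= ord($byte) % 10; // map each random byte to one decimal digit
    }
    return implode('-', str_split($digits, 4)); // e.g. 6114-0461-7825-1604
}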
This might be something to do with the behaviour of srand. When checking for duplicates you are only running srand once for all 10000 keys. Perhaps one seed only produces enough entropy for ~2^13 keys? What PHP version are you using? Since 4.2.0 srand isn't needed any more, but perhaps if you call it anyway it stops being done automatically for the rest of the script.
I know of several ways to get a character off a string given the index.
<?php
$string = 'abcd';
echo $string[2];
echo $string{2};
echo substr($string, 2, 1);
?>
I don't know if there are any more ways; if you know of any, please don't hesitate to add them. The question is: if I were to pick one of the methods above and repeat it a couple of million times, possibly using mt_rand to get the index value, which method would be the most efficient in terms of least memory consumption and fastest speed?
To arrive at an answer, you'll need to set up a benchmark test rig. Compare all methods over several (hundreds of thousands or millions of) iterations on an idle box. Try the built-in microtime function to measure the difference between start and finish. That's your elapsed time.
The test should take you all of 2 minutes to write.
To save you some effort, I wrote a test. It shows that the functional solution (substr) is MUCH slower (as expected). The idiomatic PHP ({}) solution is as fast as the index method; they are interchangeable. The ([]) form is preferred, as this is the direction PHP is going regarding string offsets (the braces syntax was later deprecated in PHP 7.4 and removed in PHP 8.0).
<?php
$string = 'abcd';
$limit = 1000000;
$r = array(); // results

// PHP idiomatic string index method (braces)
$s = microtime(true);
for ($i = 0; $i < $limit; ++$i) {
    $c = $string{2};
}
$r[] = microtime(true) - $s;
echo "\n";

// PHP functional solution
$s = microtime(true);
for ($i = 0; $i < $limit; ++$i) {
    $c = substr($string, 2, 1);
}
$r[] = microtime(true) - $s;
echo "\n";

// index method
$s = microtime(true);
for ($i = 0; $i < $limit; ++$i) {
    $c = $string[2];
}
$r[] = microtime(true) - $s;
echo "\n";

// RESULTS
foreach ($r as $i => $v) {
    echo "RESULT ($i): $v \n";
}
?>
Results:
RESULT (PHP4 & 5 idiomatic braces syntax): 0.19106006622314
RESULT (string slice function): 0.50699090957642
RESULT (index syntax, the future, as the braces are being deprecated): 0.19102001190186
So I have two arrays
Array
(
[0] => test
[1] => test 1
[2] => test 2
[3] => test 3
)
and
Array
(
[0] => test
[1] => test 1
[2] => test 2
[3] => test 3
)
I want to combine them together so I get an array like this?
Array
(
[0] => test test
[1] => test 1 test 1
[2] => test 2 test 2
[3] => test 3 test 3
)
I have found lots of functions like array_merge and array_combine but nothing that does what I want to do.
Any ideas?
Thanks in advance.
Max
You could do it with array_map:
$combined = array_map(function($a, $b) { return $a . ' ' . $b; }, $a1, $a2);
Here is a one-line solution if you are using PHP 5.3.0+:
$result = array_map(function ($x, $y) { return $x.$y; }, $array1, $array2);
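Note that this version concatenates without a separator, so with the question's arrays it produces "testtest", "test 1test 1", and so on; use return $x . ' ' . $y; if you want the space in between.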
Many answers recommend the array_map way, and many the more trivial for loop way.
I think the array_map solution looks nicer and "more advanced" than looping over the arrays and building the concatenated array in a for loop, BUT, contrary to my expectations, it is much slower than a regular for.
I ran some tests with PHP 7.1.23-4 on Ubuntu 16.04.1: with two arrays each containing 250k elements of 10-digit random numbers, a for solution took 4.7004 sec for 20 runs, while the array_map solution took 11.7939 sec for 20 runs on my machine, almost 2.5 times slower!
I would have expected PHP to optimise the built-in array_map feature better than a for loop, but it looks like the opposite.
The code I've tested:
// Init the test
$total_time_for = 0;
$total_time_arraymap = 0;
$array1 = [];
$array2 = [];
for ($i = 1; $i < 250000; $i++) {
    $array1[] = mt_rand(1000000000, 9999999999);
    $array2[] = mt_rand(1000000000, 9999999999);
}
// Init completed

for ($j = 1; $j <= 20; $j++) {
    // Init for method
    $array_new = [];
    $startTime = microtime(true);
    // Test for method
    for ($i = 0; $i < count($array1); $i++) {
        $array_new[] = $array1[$i] . " " . $array2[$i];
    }
    // End of test content
    $endTime = microtime(true);
    $elapsed = $endTime - $startTime;
    $total_time_for += $elapsed;
    //echo "for - Execution time : $elapsed seconds" . "\n";
    unset($array_new);

    //----

    // Init array_map method
    $array_new = [];
    $startTime = microtime(true);
    // Test array_map method
    $array_new = array_map(function($a, $b) { return $a . ' ' . $b; }, $array1, $array2);
    // End of test content
    $endTime = microtime(true);
    $elapsed = $endTime - $startTime;
    $total_time_arraymap += $elapsed;
    //echo "array_map - Execution time : $elapsed seconds" . "\n";
    unset($array_new);
}

echo "for - Total execution time : $total_time_for seconds" . "\n";
echo "array_map - Total execution time : $total_time_arraymap seconds" . "\n";
The question arises, then: what is array_map good for? One possible answer that comes to mind is the case where we have a predefined function somewhere, maybe in a 3rd-party library, that we'd like to apply to the arrays, and we don't want to reimplement it inside our for loop. array_map seems convenient in that case. But is it any better than calling the function from a for loop?
I've tested this as well, and it looks like array_map truly excels when using predefined functions. This time array_map took 8.7176 sec, while the for loop took 12.8452 sec to do the same job as above.
The code I've tested:
// Init the test
$total_time_for = 0;
$total_time_arraymap = 0;
$array1 = [];
$array2 = [];
for ($i = 1; $i <= 250000; $i++) {
    $array1[] = mt_rand(1000000000, 9999999999);
    $array2[] = mt_rand(1000000000, 9999999999);
}

function combine($a, $b) {
    return $a . ' ' . $b;
}
// Init completed

for ($j = 1; $j <= 20; $j++) {
    // Init for method
    $array_new = [];
    $startTime = microtime(true);
    // Test for method
    for ($i = 0; $i < count($array1); $i++) {
        $array_new[] = combine($array1[$i], $array2[$i]);
    }
    // End of test content
    $endTime = microtime(true);
    $elapsed = $endTime - $startTime;
    $total_time_for += $elapsed;
    //echo "for external function call - Execution time : $elapsed seconds" . "\n";
    unset($array_new);

    //----

    // Init array_map method
    $array_new = [];
    $startTime = microtime(true);
    // Test array_map method
    $array_new = array_map('combine', $array1, $array2);
    // End of test content
    $endTime = microtime(true);
    $elapsed = $endTime - $startTime;
    $total_time_arraymap += $elapsed;
    //echo "array_map external function call - Execution time : $elapsed seconds" . "\n";
    unset($array_new);
}

echo "for external function call - Total execution time : $total_time_for seconds" . "\n";
echo "array_map external function call - Total execution time : $total_time_arraymap seconds" . "\n";
So long story short, the general conclusion:
Calling a predefined function: use array_map; it takes ~40% less time (8.7 sec vs. 12.8 sec).
Implementing the array manipulation right where it is needed: use a for loop; it takes ~60% less time (4.7 sec vs. 11.8 sec).
If you have a choice between using a predefined function and (re-)implementing it right where it is needed: use a for loop and implement the required manipulation inside the loop; it takes ~45% less time (4.7 sec vs. 8.7 sec).
Based on this, in your particular use case, use a for loop and do the concatenation inside the loop body, without calling other functions.
You can do it like this:
for ($i = 0; $i < count($a); $i++)
{
    $arr[$i] = $a[$i] . " " . $b[$i];
}
Just loop through and assign the concatenation to a new array:
$array1=array("test","test 1","test 2","test 3");
$array2=array("x","y","z","w");
$new_array=array();
foreach (range(0,count($array1)-1) as $i)
{
array_push($new_array,$array1[$i] . $array2[$i]);
}
Assuming the two arrays are $array1 and $array2
for ($x = 0; $x < count($array2); $x++) {
    $array1[$x] = $array1[$x] . ' ' . $array2[$x];
}
If you have data coming from two different queries and they end up as two different arrays, combining them is not always an answer.
Therefore, once they are in arrays, you can loop over one with a foreach to count how many results there are, then loop over both together by index.
Note: they must have the same number of elements in each array, or one may finish before the other…..
$loopnumber = 0;
foreach ($monthlytarget as $value) {
    // find how many results there were
    $loopnumber++;
}
echo $loopnumber;

for ($i = 0; $i < $loopnumber; $i++) {
    echo $shop[$i];
    echo " - ";
    echo $monthlytarget[$i];
    echo "<br>";
}
This will then display:
Tescos - 78
Asda - 89
Morrisons - 23
Sainsburys - 46
You can even print the counter to show the list item number.
There's no built-in function (that I know of) to accomplish that. Use a loop:
$combined = array();
for($i = 0, $l = min(count($a1), count($a2)); $i < $l; ++$i) {
$combined[$i] = $a1[$i] . $a2[$i];
}
Adapt the loop to your liking: only concatenate the minimum number of elements, concatenate empty string if one of the arrays is shorter, etc.
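For instance, a variant (just a sketch) that concatenates an empty string when one of the arrays is shorter:
$combined = array();
$l = max(count($a1), count($a2));
for ($i = 0; $i < $l; ++$i) {
    $combined[$i] = (isset($a1[$i]) ? $a1[$i] : '') . (isset($a2[$i]) ? $a2[$i] : '');
}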
You loop through it to create a new array. There's no built-in function. Welcome to the wonderful world of programming :)
Hints:
http://pt2.php.net/manual/en/control-structures.foreach.php
You can combine two strings with "."