How does array_diff work? - php

How does array_diff() work? It obviously couldn't work as follows:
function array_diff($arraya, $arrayb)
{
$diffs = array();
foreach ($arraya as $keya => $valuea)
{
$equaltag = 0;
foreach ($arrayb as $valueb)
{
if ($valuea == $valueb)
{
$equaltag =1;
break;
}
}
if ($equaltag == o)
{
$diffs[$keya]=$valuea;
}
}
return $diffs;
} //couldn't be worse than this
Does anyone know a better solution?
EDIT #animuson:
function array_diff($arraya, $arrayb)
{
foreach ($arraya as $keya => $valuea)
{
if (in_array($valuea, $arrayb))
{
unset($arraya[$keya]);
}
}
return $arraya;
}

user187291's suggestion to do it in PHP via hash tables is simply great! In a rush of adrenaline taken from this phantastic idea, I even found a way to speed it up a little more (PHP 5.3.1):
function leo_array_diff($a, $b) {
$map = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) unset($map[$val]);
return array_keys($map);
}
With the benchmark taken from user187291's posting:
LEO=0.0322 leo_array_diff()
ME =0.1308 my_array_diff()
YOU=4.5051 your_array_diff()
PHP=45.7114 array_diff()
The array_diff() performance lag is evident even at 100 entries per array.
Note: This solution implies that the elements in the first array are unique (or they will become unique). This is typical for a hash solution.
Note: The solution does not preserve indices. Assign the original index to $map and finally use array_flip() to preserve keys.
function array_diff_pk($a, $b) {
$map = array_flip($a);
foreach($b as $val) unset($map[$val]);
return array_flip($map);
}
PS: I found this while looking for some array_diff() paradoxon: array_diff() took three times longer for practically the same task if used twice in the script.

UPDATE
see below for faster/better code.
array_diff behaviour is much better in php 5.3.4, but still ~10 times slower than Leo's function.
also it's worth noting that these functions are not strictly equivalent to array_diff since they don't maintain array keys, i.e. my_array_diff(x,y) == array_values(array_diff(x,y)).
/UPDATE
A better solution is to use hash maps
function my_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) if(isset($map[$val])) $map[$val] = 0;
foreach($map as $val => $ok) if($ok) $out[] = $val;
return $out;
}
$a = array('A', 'B', 'C', 'D');
$b = array('X', 'C', 'A', 'Y');
print_r(my_array_diff($a, $b)); // B, D
benchmark
function your_array_diff($arraya, $arrayb)
{
foreach ($arraya as $keya => $valuea)
{
if (in_array($valuea, $arrayb))
{
unset($arraya[$keya]);
}
}
return $arraya;
}
$a = range(1, 10000);
$b = range(5000, 15000);
shuffle($a);
shuffle($b);
$ts = microtime(true);
my_array_diff($a, $b);
printf("ME =%.4f\n", microtime(true) - $ts);
$ts = microtime(true);
your_array_diff($a, $b);
printf("YOU=%.4f\n", microtime(true) - $ts);
result
ME =0.0137
YOU=3.6282
any questions? ;)
and, just for fun,
$ts = microtime(true);
array_diff($a, $b);
printf("PHP=%.4f\n", microtime(true) - $ts);
result
ME =0.0140
YOU=3.6706
PHP=19.5980
that's incredible!

The best solution to know how it works it to take a look at its source-code ;-)
(Well, that's one of the powers of open source -- and if you see some possible optimization, you can submit a patch ;-) )
For array_diff, it should be in ext/standard -- which means, for PHP 5.3, it should be there : branches/PHP_5_3/ext/standard
And, then, the array.c file looks like a plausible target ; the php_array_diff function, line 3381, seems to correspond to array_diff.
(Good luck going through the code : it's quite long...)

It seems you can speed it up a good deal more by using another array instead of unsetting. Though, this uses more memory, which might be an issue depeding on the use-case (I haven't tested actual differences in memory allocation).
<?php
function my_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) if(isset($map[$val])) $map[$val] = 0;
foreach($map as $val => $ok) if($ok) $out[] = $val;
return $out;
}
function leo_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) unset($map[$val]);
return array_keys($map);
}
function flip_array_diff_key($b, $a) {
$at = array_flip($a);
$bt = array_flip($b);
$d = array_diff_key($bt, $at);
return array_keys($d);
}
function flip_isset_diff($b, $a) {
$at = array_flip($a);
$d = array();
foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;
return $d;
}
function large_array_diff($b, $a) {
$at = array();
foreach ($a as $i)
$at[$i] = 1;
$d = array();
foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;
return $d;
}
$functions = array("flip_array_diff_key", "flip_isset_diff", "large_array_diff", "leo_array_diff", "my_array_diff", "array_diff");
#$functions = array_reverse($functions);
$l = range(1, 1000000);
$l2 = range(1, 1000000, 2);
foreach ($functions as $function) {
$ts = microtime(true);
for ($i = 0; $i < 10; $i++) {
$f = $function($l, $l2);
}
$te = microtime(true);
$timing[$function] = $te - $ts;
}
asort($timing);
print_r($timing);
My timings are (PHP 5.3.27-1~dotdeb.0):
[flip_isset_diff] => 3.7415699958801
[flip_array_diff_key] => 4.2989008426666
[large_array_diff] => 4.7882599830627
[flip_flip_isset_diff] => 5.0816700458527
[leo_array_diff] => 11.086831092834
[my_array_diff] => 14.563184976578
[array_diff] => 99.379411935806
The three new functions were found at http://shiplu.mokadd.im/topics/performance-optimization/

As this has been brought up (see #BurninLeo's answer), what about something like this?
function binary_array_diff($a, $b) {
$result = $a;
asort($a);
asort($b);
list($bKey, $bVal) = each($b);
foreach ( $a as $aKey => $aVal ) {
while ( $aVal > $bVal ) {
list($bKey, $bVal) = each($b);
}
if ( $aVal === $bVal ) {
unset($result[$aKey]);
}
}
return $result;
}
After performing some tests, results seem to be acceptable:
$a = range(1, 10000);
$b = range(5000, 15000);
shuffle($a);
shuffle($b);
$ts = microtime(true);
for ( $n = 0; $n < 10; ++$n ) {
array_diff($a, $b);
}
printf("PHP => %.4f\n", microtime(true) - $ts);
$ts = microtime(true);
for ( $n = 0; $n < 10; ++$n ) {
binary_array_diff($a, $b);
}
printf("binary => %.4f\n", microtime(true) - $ts);
$binaryResult = binary_array_diff($a, $b);
$phpResult = array_diff($a, $b);
if ( $binaryResult == $phpResult && array_keys($binaryResult) == array_keys($phpResult) ) {
echo "returned arrays are the same\n";
}
Output:
PHP => 1.3018
binary => 1.3601
returned arrays are the same
Of course, PHP code cannot perform as good as C code, therefore there's no wonder that PHP code is a bit slower.

From PHP: "Returns an array containing all the entries from array1 that are not present in any of the other arrays."
So, you just check array1 against all arrayN and any values in array1 that don't appear in any of those arrays will be returned in a new array.
You don't necessarily even need to loop through all of array1's values. Just for all the additional arrays, loop through their values and check if each value is in_array($array1, $value).

Related

Explode an array php

I had a array of list like this :
A = (a.11, b.12, c.dd)
I want to store the above array values in two different arrays like
B = (a, b, c)
C = (11,12,dd)
I tried a lot but all in vain. i am bit new to php. please help me out in this regard. Your prompt response is highly appreciated
Thanx
Hope this will help you:
$a = array("a.11,b.12,c.dd");
$b = array();
$d = array();
foreach ($a as $val)
{
$c =explode(',', $val);
foreach ($c as $v)
{
$e =explode('.', $v);
array_push($b,$e[0]);
array_push($d,$e[1]);
}
}
print_r($b);
print_r($d);
Working demo
foreach($A as $v) {
$v = explode('.', $v);
$B[] = $v[0];
$C[] = $v[1]
}
Try this,
$C = [];
$B = array_map(function($v) use(&$C){$arr = explode('.', $v); $C[] = $v[1]; return $v[0];}, $A);

Checking a for progression in a list of variables

Let's say I want to check for simple mathematical progression. I understand I can do it like this:
if ($a<$b and $b<$c and $c<$d and $d<$e and $e<$f) { echo OK; }
Is there a way to do it in a more convenient way? Like so
if ($a..$f isprog(<)) { echo OK; }
I don 't know if I get your problem right. But propably the solution for your progression could be the SplHeap object of the SPL delivered with php.
$stack = new SplMaxHeap();
$stack->insert(1);
$stack->insert(3);
$stack->insert(2);
$stack->insert(4);
$stack->insert(5);
foreach ($stack as $value) {
echo $value . "\n";
}
// output will be: 5, 4, 3, 2, 1
I havent heard of something like this, but how about using simple function:
function checkProgress($vars){ //to make it easie i assume that vars can be given in an array
$result = true;
for ($i=0; $i<= count($vars); $i++){
if ($i>0 && $vars[$i] > $vars[$i-1]) continue;
$result = false;
}
return $result;
}
Solved it quick and dirty:
function ispositiveprogression($vars) {
$num=count($vars)-1;
while ($num) {
$result = true;
if ($vars[$num] > $vars[$num-1]) {
$num--;
}
else { $result = false; break; }
}
return $result;
}
Create an array of values, iterate over them and maintaining a flag that checks if the current element value is greater than / less than that of the next value. Unlike some of the solutions in this thread, this doesn't loop through the whole array. It stops looping when it discovers the first value that's not a progression. This will be a lot more faster if the operation involves a lot of numbers.
function checkIfProg($arr, $compare) {
$flag = true;
for ($i = 0, $c = count($arr); $i < $c; $i++) {
if ($compare == '<') {
if (isset($arr[$i + 1]) && $arr[$i] > $arr[$i + 1]) {
$flag = false;
break;
}
} elseif ($compare == '>') {
if (isset($arr[$i + 1]) && $arr[$i] < $arr[$i + 1]) {
$flag = false;
break;
}
}
}
return $flag;
}
Usage:
$a = 2;
$b = 3;
$c = 4;
$d = 5;
$e = 9;
$f = 22;
$arr = array($a, $b, $c, $d, $e, $f);
var_dump(checkIfProg($arr, '<')); // => bool(true)
If you want the array to be created dynamically, you could use some variable variable magic to achieve this:
$arr = array();
foreach (range('a','f') as $v) {
$arr[] = $$v;
}
This will create an array containing all the values of variables from $a ... $f.

What is the faster method than array_diff() on PHP? [duplicate]

How does array_diff() work? It obviously couldn't work as follows:
function array_diff($arraya, $arrayb)
{
$diffs = array();
foreach ($arraya as $keya => $valuea)
{
$equaltag = 0;
foreach ($arrayb as $valueb)
{
if ($valuea == $valueb)
{
$equaltag =1;
break;
}
}
if ($equaltag == o)
{
$diffs[$keya]=$valuea;
}
}
return $diffs;
} //couldn't be worse than this
Does anyone know a better solution?
EDIT #animuson:
function array_diff($arraya, $arrayb)
{
foreach ($arraya as $keya => $valuea)
{
if (in_array($valuea, $arrayb))
{
unset($arraya[$keya]);
}
}
return $arraya;
}
user187291's suggestion to do it in PHP via hash tables is simply great! In a rush of adrenaline taken from this phantastic idea, I even found a way to speed it up a little more (PHP 5.3.1):
function leo_array_diff($a, $b) {
$map = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) unset($map[$val]);
return array_keys($map);
}
With the benchmark taken from user187291's posting:
LEO=0.0322 leo_array_diff()
ME =0.1308 my_array_diff()
YOU=4.5051 your_array_diff()
PHP=45.7114 array_diff()
The array_diff() performance lag is evident even at 100 entries per array.
Note: This solution implies that the elements in the first array are unique (or they will become unique). This is typical for a hash solution.
Note: The solution does not preserve indices. Assign the original index to $map and finally use array_flip() to preserve keys.
function array_diff_pk($a, $b) {
$map = array_flip($a);
foreach($b as $val) unset($map[$val]);
return array_flip($map);
}
PS: I found this while looking for some array_diff() paradoxon: array_diff() took three times longer for practically the same task if used twice in the script.
UPDATE
see below for faster/better code.
array_diff behaviour is much better in php 5.3.4, but still ~10 times slower than Leo's function.
also it's worth noting that these functions are not strictly equivalent to array_diff since they don't maintain array keys, i.e. my_array_diff(x,y) == array_values(array_diff(x,y)).
/UPDATE
A better solution is to use hash maps
function my_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) if(isset($map[$val])) $map[$val] = 0;
foreach($map as $val => $ok) if($ok) $out[] = $val;
return $out;
}
$a = array('A', 'B', 'C', 'D');
$b = array('X', 'C', 'A', 'Y');
print_r(my_array_diff($a, $b)); // B, D
benchmark
function your_array_diff($arraya, $arrayb)
{
foreach ($arraya as $keya => $valuea)
{
if (in_array($valuea, $arrayb))
{
unset($arraya[$keya]);
}
}
return $arraya;
}
$a = range(1, 10000);
$b = range(5000, 15000);
shuffle($a);
shuffle($b);
$ts = microtime(true);
my_array_diff($a, $b);
printf("ME =%.4f\n", microtime(true) - $ts);
$ts = microtime(true);
your_array_diff($a, $b);
printf("YOU=%.4f\n", microtime(true) - $ts);
result
ME =0.0137
YOU=3.6282
any questions? ;)
and, just for fun,
$ts = microtime(true);
array_diff($a, $b);
printf("PHP=%.4f\n", microtime(true) - $ts);
result
ME =0.0140
YOU=3.6706
PHP=19.5980
that's incredible!
The best solution to know how it works it to take a look at its source-code ;-)
(Well, that's one of the powers of open source -- and if you see some possible optimization, you can submit a patch ;-) )
For array_diff, it should be in ext/standard -- which means, for PHP 5.3, it should be there : branches/PHP_5_3/ext/standard
And, then, the array.c file looks like a plausible target ; the php_array_diff function, line 3381, seems to correspond to array_diff.
(Good luck going through the code : it's quite long...)
It seems you can speed it up a good deal more by using another array instead of unsetting. Though, this uses more memory, which might be an issue depeding on the use-case (I haven't tested actual differences in memory allocation).
<?php
function my_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) if(isset($map[$val])) $map[$val] = 0;
foreach($map as $val => $ok) if($ok) $out[] = $val;
return $out;
}
function leo_array_diff($a, $b) {
$map = $out = array();
foreach($a as $val) $map[$val] = 1;
foreach($b as $val) unset($map[$val]);
return array_keys($map);
}
function flip_array_diff_key($b, $a) {
$at = array_flip($a);
$bt = array_flip($b);
$d = array_diff_key($bt, $at);
return array_keys($d);
}
function flip_isset_diff($b, $a) {
$at = array_flip($a);
$d = array();
foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;
return $d;
}
function large_array_diff($b, $a) {
$at = array();
foreach ($a as $i)
$at[$i] = 1;
$d = array();
foreach ($b as $i)
if (!isset($at[$i]))
$d[] = $i;
return $d;
}
$functions = array("flip_array_diff_key", "flip_isset_diff", "large_array_diff", "leo_array_diff", "my_array_diff", "array_diff");
#$functions = array_reverse($functions);
$l = range(1, 1000000);
$l2 = range(1, 1000000, 2);
foreach ($functions as $function) {
$ts = microtime(true);
for ($i = 0; $i < 10; $i++) {
$f = $function($l, $l2);
}
$te = microtime(true);
$timing[$function] = $te - $ts;
}
asort($timing);
print_r($timing);
My timings are (PHP 5.3.27-1~dotdeb.0):
[flip_isset_diff] => 3.7415699958801
[flip_array_diff_key] => 4.2989008426666
[large_array_diff] => 4.7882599830627
[flip_flip_isset_diff] => 5.0816700458527
[leo_array_diff] => 11.086831092834
[my_array_diff] => 14.563184976578
[array_diff] => 99.379411935806
The three new functions were found at http://shiplu.mokadd.im/topics/performance-optimization/
As this has been brought up (see #BurninLeo's answer), what about something like this?
function binary_array_diff($a, $b) {
$result = $a;
asort($a);
asort($b);
list($bKey, $bVal) = each($b);
foreach ( $a as $aKey => $aVal ) {
while ( $aVal > $bVal ) {
list($bKey, $bVal) = each($b);
}
if ( $aVal === $bVal ) {
unset($result[$aKey]);
}
}
return $result;
}
After performing some tests, results seem to be acceptable:
$a = range(1, 10000);
$b = range(5000, 15000);
shuffle($a);
shuffle($b);
$ts = microtime(true);
for ( $n = 0; $n < 10; ++$n ) {
array_diff($a, $b);
}
printf("PHP => %.4f\n", microtime(true) - $ts);
$ts = microtime(true);
for ( $n = 0; $n < 10; ++$n ) {
binary_array_diff($a, $b);
}
printf("binary => %.4f\n", microtime(true) - $ts);
$binaryResult = binary_array_diff($a, $b);
$phpResult = array_diff($a, $b);
if ( $binaryResult == $phpResult && array_keys($binaryResult) == array_keys($phpResult) ) {
echo "returned arrays are the same\n";
}
Output:
PHP => 1.3018
binary => 1.3601
returned arrays are the same
Of course, PHP code cannot perform as good as C code, therefore there's no wonder that PHP code is a bit slower.
From PHP: "Returns an array containing all the entries from array1 that are not present in any of the other arrays."
So, you just check array1 against all arrayN and any values in array1 that don't appear in any of those arrays will be returned in a new array.
You don't necessarily even need to loop through all of array1's values. Just for all the additional arrays, loop through their values and check if each value is in_array($array1, $value).

Finding 4 highest values from an array

Instead of just 1, how can I pick the 4 highest values from an array using max()?
You could use an SplMaxHeap
function maxN(array $numbers, $n)
{
$maxHeap = new SplMaxHeap;
foreach($numbers as $number) {
$maxHeap->insert($number);
}
return iterator_to_array(
new LimitIterator($maxHeap, 0, $n)
);
}
Usage (demo):
print_r( maxN( array(7,54,2,4,26,7,82,4,34), 4 ) );
You could try this:
$a = array(3,5,6,1,23,6,78,99);
asort($a);
var_dump(array_slice($a, -4));
HTH.
This will do it in Θ(n) time:
$a = $b = $c = $d = null;
foreach($array as $v) {
if(!isset($a) || $v > $a) {
$d = $c;
$c = $b;
$b = $a;
$a = $v;
}elseif(!isset($b) || $v > $b) {
$d = $c;
$c = $b;
$b = $v;
}elseif(!isset($c) || $v > $c) {
$d = $c;
$c = $v;
}elseif(!isset($d) || $v > $d) {
$d = $v;
}
}
$result = array($a, $b, $c, $d);
function maxs($ar, $count=4)
{
$res = array();
foreach ($ar as $v)
{
for ($i = 0;$i < $count;$i++)
{
if ($i >= count($res) || $v > $res[$i])
{
do
{
$tmp = $res[$i];
$res[$i] = $v;
$v = $tmp;
$i++;
}
while ($i < $count);
break;
}
}
}
return $res;
}
A simple method using php predefined functions.
<?php
$arr = array(6, 8, 3, 2, 7, 9);
rsort($arr);
$first = array_shift($arr);
$second = array_shift($arr);
$third = array_shift($arr);
echo $first; // print 9
echo $second; // print 8
echo $third; // print 7
?>
While storing itself you can maintain another array as soon as the new item is inserted check with the max value in the inner array if the item being inserted is greater insert this item. During the item pop do viceversa. From the inner maintained array you can get as many max numbers as possible.

Sorting x and y coordinates in an array in PHP most efficently?

Currently I have an array that contains x and y coordinates of various positions.
ex.
$location[0]['x'] = 1; $location[0]['y'] = 1
This indicates id 0 has a position of (1,1).
Sometimes I want to sort this array by x, and other times by y.
Currently I am using array_multisort() to sort my data, but I feel this method is inefficient since every time before I sort, I must make a linear pass through the $location array just to build the index (on the x or y key) before I can invoke the array_multisort() command.
Does anyone know a better way to do this? Perhaps it is a bad idea even to store the data like this? Any suggestions would be great.
You could use usort() which lets you choose how your array elements are compared.
// sort by 'y'
usort($location, 'cmp_location_y');
// or sort by 'x'
usort($location, 'cmp_location_x');
// here are the comparison functions
function cmp_location_x($a, $b) {
return cmp_location($a, $b, 'x');
}
function cmp_location_y($a, $b) {
return cmp_location($a, $b, 'y');
}
function cmp_location($a, $b, $key) {
if ($a[$key] == $b[$key]) {
return 0;
} else if ($a[$key] < $b[$key]) {
return -1;
} else {
return 1;
}
}
You want to keep using multisort.
I made a quick benchmark of usort and array_multisort. Even at a count of only 10 multisort with building an index is faster than usort. At 100 elements it's about 5 times faster. At around 1000 elements improvement levels off right at a magnitude faster. User function calls are just too slow. I'm running 5.2.6
$count = 100;
for ($i = 0; $i < $count; $i++)
{
$temp = array('x' => rand(), 'y' => rand());
$data[] = $temp;
$data2[] = $temp;
}
function sortByX($a, $b) { return ($a['x'] > $b['x']); }
$start = microtime(true);
usort($data, "sortByX");
echo (microtime(true) - $start) * 1000000, "<br/>\n";
$start = microtime(true);
foreach ($data2 as $temp)
$s[] = $temp['x'];
array_multisort($s, SORT_NUMERIC, $data2);
echo (microtime(true) - $start) * 1000000, "<br/>\n";
PHP currently doesn't have an array_pluck function like ruby. Once it does you can replace this code
foreach ($data2 as $temp)
$s[] = $temp['x'];`
with
$s = array_pluck('x', $data2);
Something like what jcinacio said. With this class you can store and sort all sorts of data really, not just locations in different dimensions. You can implement other methods like remove etc as needed.
class Locations {
public $locations = array();
public $data = array();
public $dimensions = 2;
public function __construct($dimensions = null)
{
if (is_int($dimensions))
$this->dimensions = $dimensions;
}
public function addLocation()
{
$t = func_num_args();
if ($t !== $this->dimensions)
throw new Exception("This Locations object has {$this->dimensions} dimensions");
$args = func_get_args();
for ($i = 0; $i < $t; $i++)
$this->locations[$i][] = $args[$i];
return $this;
}
public function sortByDimension($dimension = 1)
{
if ($dimension > $this->dimensions)
throw new Exception("Wrong number of dimensions");
--$dimension;
$params[] = &$this->locations[$dimension];
for ($i = 0, $t = $this->dimensions; $i < $t; $i++) {
if ($i === $dimension)
continue;
$params[] = &$this->locations[$i];
}
call_user_func_array('array_multisort', $params);
return $this;
}
}
test data:
$loc = new Locations(3);
$loc
->addLocation(1, 1, 'A')
->addLocation(2, 3, 'B')
->addLocation(4, 2, 'C')
->addLocation(3, 2, 'D')
;
$loc->sortByDimension(1);
var_dump($loc->locations);
$loc->sortByDimension(2);
var_dump($loc->locations);
keeping the arrays and the multisort you have, changing the structure to something like the following would eliminate the need for a previous pass:
$locations = array(
'x' => $x_coordinates,
'y' => $y_coordinates,
'data' => $data_array
);
then just use the array_multisort() on all columns.

Categories