What is the best way to generate an MD5 (or any other hash) of a multi-dimensional array?
I could easily write a loop which would traverse through each level of the array, concatenating each value into a string, and simply performing the MD5 on the string.
However, this seems cumbersome at best and I wondered if there was a funky function which would take a multi-dimensional array, and hash it.
(Copy-n-paste-able function at the bottom)
As mentioned prior, the following will work.
md5(serialize($array));
However, it's worth noting that (ironically) json_encode performs noticeably faster:
md5(json_encode($array));
In fact, the speed increase is two-fold here as (1) json_encode alone performs faster than serialize, and (2) json_encode produces a smaller string and therefore less for md5 to handle.
Edit: Here is evidence to support this claim:
<?php //this is the array I'm using -- it's multidimensional.
$array = unserialize('a:6:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}i:5;a:5:{i:0;a:0:{}i:1;a:4:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}i:3;a:6:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}i:5;a:5:{i:0;a:0:{}i:1;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:3:{i:0;a:0:{}i:1;a:0:{}i:2;a:0:{}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}}}}i:2;s:5:"hello";i:3;a:2:{i:0;a:0:{}i:1;a:0:{}}i:4;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:1:{i:0;a:0:{}}}}}}}}}');
//The serialize test
$b4_s = microtime(1);
for ($i=0;$i<10000;$i++) {
$serial = md5(serialize($array));
}
echo 'serialize() w/ md5() took: '.($sTime = microtime(1)-$b4_s).' sec<br/>';
//The json test
$b4_j = microtime(1);
for ($i=0;$i<10000;$i++) {
$serial = md5(json_encode($array));
}
echo 'json_encode() w/ md5() took: '.($jTime = microtime(1)-$b4_j).' sec<br/><br/>';
echo 'json_encode is <strong>'.( round(($sTime/$jTime)*100,1) ).'%</strong> faster with a difference of <strong>'.($sTime-$jTime).' seconds</strong>';
JSON_ENCODE is consistently over 250% (2.5x) faster (often over 300%) -- this is not a trivial difference. You may see the results of the test with this live script here:
http://nathanbrauer.com/playground/serialize-vs-json.php
http://nathanbrauer.com/playground/plain-text/serialize-vs-json.php
Now, one thing to note is that array(1,2,3) will produce a different MD5 than array(3,2,1). If this is NOT what you want, try the following code:
//Optionally make a copy of the array (if you want to preserve the original order)
$original = $array;
array_multisort($array);
$hash = md5(json_encode($array));
Edit: There's been some question as to whether reversing the order would produce the same results. So, I've done that (correctly) here:
http://nathanbrauer.com/playground/json-vs-serialize.php
http://nathanbrauer.com/playground/plain-text/json-vs-serialize.php
As you can see, the results are exactly the same. Here's the (corrected) test originally created by someone related to Drupal:
http://nathanjbrauer.com/playground/drupal-calculation.php
http://nathanjbrauer.com/playground/plain-text/drupal-calculation.php
And for good measure, here's a function/method you can copy and paste (tested in 5.3.3-1ubuntu9.5):
function array_md5(Array $array) {
//since we're inside a function (which uses a copied array, not
//a referenced array), you shouldn't need to copy the array
array_multisort($array);
return md5(json_encode($array));
}
md5(serialize($array));
I'm joining a very crowded party by answering, but there is an important consideration that none of the extant answers address. The value of json_encode() and serialize() both depend upon the order of elements in the array!
Here are the results of not sorting and sorting the arrays, on two arrays with identical values but added in a different order (code at bottom of post):
serialize()
1c4f1064ab79e4722f41ab5a8141b210
1ad0f2c7e690c8e3cd5c34f7c9b8573a
json_encode()
db7178ba34f9271bfca3a05c5dddf502
c9661c0852c2bd0e26ef7951b4ca9e6f
Sorted serialize()
1c4f1064ab79e4722f41ab5a8141b210
1c4f1064ab79e4722f41ab5a8141b210
Sorted json_encode()
db7178ba34f9271bfca3a05c5dddf502
db7178ba34f9271bfca3a05c5dddf502
Therefore, the two methods that I would recommend to hash an array would be:
// You will need to write your own deep_ksort(), or see
// my example below
md5( serialize(deep_ksort($array)) );
md5( json_encode(deep_ksort($array)) );
The choice of json_encode() or serialize() should be determined by testing on the type of data that you are using. By my own testing on purely textual and numerical data, if the code is not running a tight loop thousands of times then the difference is not even worth benchmarking. I personally use json_encode() for that type of data.
Here is the code used to generate the sorting test above:
$a = array();
$a['aa'] = array( 'aaa'=>'AAA', 'bbb'=>'ooo', 'qqq'=>'fff',);
$a['bb'] = array( 'aaa'=>'BBBB', 'iii'=>'dd',);
$b = array();
$b['aa'] = array( 'aaa'=>'AAA', 'qqq'=>'fff', 'bbb'=>'ooo',);
$b['bb'] = array( 'iii'=>'dd', 'aaa'=>'BBBB',);
echo " serialize()\n";
echo md5(serialize($a))."\n";
echo md5(serialize($b))."\n";
echo "\n json_encode()\n";
echo md5(json_encode($a))."\n";
echo md5(json_encode($b))."\n";
$a = deep_ksort($a);
$b = deep_ksort($b);
echo "\n Sorted serialize()\n";
echo md5(serialize($a))."\n";
echo md5(serialize($b))."\n";
echo "\n Sorted json_encode()\n";
echo md5(json_encode($a))."\n";
echo md5(json_encode($b))."\n";
My quick deep_ksort() implementation fits this case, but check it before using it on your own projects:
/*
* Sort an array by keys, and additionally sort its array values by keys
*
* Does not try to sort an object, but does iterate its properties to
* sort arrays in properties
*/
function deep_ksort($input)
{
    if ( !is_object($input) && !is_array($input) ) {
        return $input;
    }
    foreach ( $input as $k=>$v ) {
        if ( is_object($v) || is_array($v) ) {
            // Arrays use index access, objects use property access
            if ( is_array($input) ) {
                $input[$k] = deep_ksort($v);
            } else {
                $input->$k = deep_ksort($v);
            }
        }
    }
    if ( is_array($input) ) {
        ksort($input);
    }
    // Do not sort objects
    return $input;
}
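As a compact variant of the same idea, here is a minimal sketch that sorts keys at every level and then hashes the result. The helper names ksort_recursive() and canonical_array_hash() are hypothetical, and the sketch assumes the array contains only scalars and nested arrays (no objects).

```php
<?php
// Hypothetical helper: sort keys at every nesting level, in place.
function ksort_recursive(array &$array)
{
    foreach ($array as &$value) {
        if (is_array($value)) {
            ksort_recursive($value);
        }
    }
    unset($value); // break the reference left over from the loop
    ksort($array);
}

// Hypothetical helper: hash that ignores the insertion order of keys.
function canonical_array_hash(array $array)
{
    ksort_recursive($array); // $array is a copy, so the caller's array is untouched
    return md5(json_encode($array));
}

$a = array('aa' => array('aaa' => 'AAA', 'bbb' => 'ooo'), 'bb' => array('iii' => 'dd'));
$b = array('bb' => array('iii' => 'dd'), 'aa' => array('bbb' => 'ooo', 'aaa' => 'AAA'));

var_dump(canonical_array_hash($a) === canonical_array_hash($b)); // bool(true)
```

Only keys are sorted here, so two lists with the same values in a different order still hash differently, which is usually what you want for lists.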
The answer depends heavily on the data types of the array values.
For big strings use:
md5(serialize($array));
For short strings and integers use:
md5(json_encode($array));
Four built-in PHP functions can transform an array into a string:
serialize(), json_encode(), var_export(), print_r().
Note: json_encode() slows down when processing associative arrays with strings as values. In that case, consider using serialize().
Test results for multi-dimensional array with md5-hashes (32 char) in keys and values:
Test name      Repeats    Result          Performance
serialize      10000      0.761195 sec    +0.00%
print_r        10000      1.669689 sec    -119.35%
json_encode    10000      1.712214 sec    -124.94%
var_export     10000      1.735023 sec    -127.93%
Test result for numeric multi-dimensional array:
Test name      Repeats    Result          Performance
json_encode    10000      1.040612 sec    +0.00%
var_export     10000      1.753170 sec    -68.47%
serialize      10000      1.947791 sec    -87.18%
print_r        10000      9.084989 sec    -773.04%
Aside from Brock's excellent answer (+1), any decent hashing library allows you to update the hash in increments, so you should be able to update it with each string sequentially instead of having to build up one giant string.
See: hash_update
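A minimal sketch of that incremental approach, using PHP's hash_init(), hash_update() and hash_final(). The helper name array_hash_incremental() and the "key=value;" separator format are assumptions made for illustration, and the sketch assumes every leaf value can be converted to a string.

```php
<?php
// Hypothetical helper: feed every leaf of a nested array into one hash context.
function array_hash_incremental(array $array, $algo = 'md5')
{
    $ctx = hash_init($algo);
    array_walk_recursive($array, function ($value, $key) use ($ctx) {
        // Separators keep adjacent keys and values from running together.
        hash_update($ctx, $key . '=' . $value . ';');
    });
    return hash_final($ctx);
}

echo array_hash_incremental(array('a' => 1, 'b' => array('c' => 2))), "\n";
```

Note that array_walk_recursive() only visits leaves, so the nesting structure itself does not enter the hash: two arrays with the same sequence of leaf keys and values but different nesting produce the same result.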
md5(serialize($array));
Will work, but the hash will change depending on the order of the array (that might not matter though).
Note that serialize and json_encode act differently when it comes to numeric arrays whose keys don't start at 0, or associative arrays.
json_encode will store such arrays as an object, so json_decode returns an object by default, whereas unserialize will return an array with exactly the same keys.
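A short illustration of that difference; the variable names $list and $sparse are made up for this example.

```php
<?php
$list   = array('a', 'b');            // keys 0 and 1
$sparse = array(1 => 'a', 2 => 'b');  // keys start at 1

echo json_encode($list), "\n";   // ["a","b"]
echo json_encode($sparse), "\n"; // {"1":"a","2":"b"}

// json_decode() yields an object by default; unserialize() restores the array as it was.
var_dump(is_object(json_decode(json_encode($sparse)))); // bool(true)
var_dump(unserialize(serialize($sparse)) === $sparse);  // bool(true)
```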
I think that this could be a good tip:
Class hasharray {
public function array_flat($in,$keys=array(),$out=array()){
foreach($in as $k => $v){
$keys[] = $k;
if(is_array($v)){
$out = $this->array_flat($v,$keys,$out);
}else{
$out[implode("/",$keys)] = $v;
}
array_pop($keys);
}
return $out;
}
public function array_hash($in){
$a = $this->array_flat($in);
ksort($a);
return md5(json_encode($a));
}
}
$h = new hasharray;
echo $h->array_hash($multi_dimensional_array);
Important note about serialize()
I don't recommend using it as part of a hashing function, because it can return different results for data that is otherwise equivalent. Compare the two examples below:
Simple example:
$a = new \stdClass;
$a->test = 'sample';
$b = new \stdClass;
$b->one = $a;
$b->two = clone $a;
Produces
"O:8:"stdClass":2:{s:3:"one";O:8:"stdClass":1:{s:4:"test";s:6:"sample";}s:3:"two";O:8:"stdClass":1:{s:4:"test";s:6:"sample";}}"
But the following code:
<?php
$a = new \stdClass;
$a->test = 'sample';
$b = new \stdClass;
$b->one = $a;
$b->two = $a;
Output:
"O:8:"stdClass":2:{s:3:"one";O:8:"stdClass":1:{s:4:"test";s:6:"sample";}s:3:"two";r:2;}"
So instead of serializing the second object, PHP just creates a reference "r:2;" to the first instance. That is a good and correct way to serialize data, but it can cause problems for your hashing function.
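The two cases can be compared side by side. In this sketch the variable names $shared and $cloned are made up; it shows that serialize() distinguishes a shared instance from a clone, while json_encode() only looks at the public property values.

```php
<?php
$a = new stdClass;
$a->test = 'sample';

$shared = new stdClass;
$shared->one = $a;
$shared->two = $a;        // the same instance twice

$cloned = new stdClass;
$cloned->one = $a;
$cloned->two = clone $a;  // equal content, separate instance

var_dump(serialize($shared) === serialize($cloned));     // bool(false)
var_dump(json_encode($shared) === json_encode($cloned)); // bool(true)
```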
// Convert nested arrays to a simple array
$array = array();
array_walk_recursive($input, function ($a) use (&$array) {
$array[] = $a;
});
sort($array);
$hash = md5(json_encode($array));
----
These arrays have the same hash:
$arr1 = array(0 => array(1, 2, 3), 1, 2);
$arr2 = array(0 => array(1, 3, 2), 1, 2);
I didn't see the solution so easily above so I wanted to contribute a simpler answer. For me, I was getting the same key until I used ksort (key sort):
Sort first with ksort, then perform sha1 on the json_encode output:
ksort($array);
$hash = sha1(json_encode($array)); //be mindful of UTF8
example:
$arr1 = array( 'dealer' => '100', 'direction' => 'ASC', 'dist' => '500', 'limit' => '1', 'zip' => '10601');
ksort($arr1);
$arr2 = array( 'direction' => 'ASC', 'limit' => '1', 'zip' => '10601', 'dealer' => '100', 'dist' => '5000');
ksort($arr2);
var_dump(sha1(json_encode($arr1)));
var_dump(sha1(json_encode($arr2)));
Output of altered arrays and hashes:
string(40) "502c2cbfbe62e47eb0fe96306ecb2e6c7e6d014c"
string(40) "b3319c58edadab3513832ceeb5d68bfce2fb3983"
Several answers suggest using json_encode, but json_encode doesn't work well with ISO-8859-1 strings: as soon as there is a special character, the string is cropped.
I would advise using var_export:
md5(var_export($array, true))
Not as slow as serialize, not as buggy as json_encode.
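On current PHP versions (assuming PHP 5.5 or later) the failure is easier to detect: json_encode() returns false for input that is not valid UTF-8. The sketch below shows this, together with a hypothetical helper array_md5_checked() that refuses to hash when encoding fails.

```php
<?php
// "caf\xE9" is "café" encoded as ISO-8859-1; the lone \xE9 byte is not valid UTF-8.
$latin1 = array('name' => "caf\xE9");

$json = json_encode($latin1);
var_dump($json);                                 // bool(false)
var_dump(json_last_error() === JSON_ERROR_UTF8); // bool(true)

// Hypothetical helper: fail loudly, because md5(false) equals md5('')
// for every input that cannot be encoded.
function array_md5_checked(array $array)
{
    $json = json_encode($array);
    if ($json === false) {
        throw new RuntimeException('json_encode failed: ' . json_last_error_msg());
    }
    return md5($json);
}
```

Without such a check, every array that fails to encode would silently end up with the same hash.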
Currently the most up-voted answer md5(serialize($array)); doesn't work well with objects.
Consider code:
$a = array(new \stdClass());
$b = array(new \stdClass());
Even though the arrays are different (they contain different objects), they have the same hash when using md5(serialize($array));. So your hash is useless!
To avoid that problem, you can replace objects with the result of spl_object_hash() before serializing. You should also do it recursively if your array has multiple levels.
The code below also sorts arrays by keys, as dotancohen has suggested.
function replaceObjectsWithHashes(array $array)
{
    foreach ($array as &$value) {
        if (is_array($value)) {
            $value = replaceObjectsWithHashes($value);
        } elseif (is_object($value)) {
            $value = spl_object_hash($value);
        }
    }
    unset($value); // break the reference left over from the loop
    ksort($array);
    return $array;
}
Now you can use md5(serialize(replaceObjectsWithHashes($array))).
(Note that an array in PHP is a value type, so the replaceObjectsWithHashes function does NOT change the original array.)
In some cases it may be better to use http_build_query to convert the array to a string:
md5( http_build_query( $array ) );
Related
I personally like that title. My question is about the simplest and yet most secure way to find out whether an array is contained in another array of arrays.
Here's my sample code to explain it a little more clearly:
$container = array();
$array1 = array('A','B','C');
$container[] = $array1;
$array2 = array();
$array2[2] = 'C';
$array2[1] = 'B';
$array2[0] = 'A'; //now, the array is physically the same as $array1
if (in_array($array2,$container)) {
echo "is inside";
}
If I have a more complex array (no objects in it) containing several keys that may get added in a different order but are physically the same, does in_array compare reliably, or do I have to check every key myself?
You can use PHP's native function array_walk_recursive with your custom callback.
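For the sample from the question, here is a small sketch of how PHP's array comparison behaves. By default in_array() compares loosely (==), which for arrays means the same key/value pairs in any order; passing true as the third argument switches to strict comparison (===), which also requires the same order and types.

```php
<?php
$array1 = array('A', 'B', 'C');

$array2 = array();
$array2[2] = 'C';
$array2[1] = 'B';
$array2[0] = 'A'; // same key/value pairs, different insertion order

$container = array($array1);

var_dump($array1 == $array2);                  // bool(true)
var_dump($array1 === $array2);                 // bool(false)
var_dump(in_array($array2, $container));       // bool(true)
var_dump(in_array($array2, $container, true)); // bool(false)
```

Keep in mind that the loose form also compares the values loosely, so for example '1' and 1 count as equal.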
I have an array like:
$array = array('foo' => 'bar', 33 => 'bin', 'lorem' => 'ipsum');
echo native_function($array, 0); // bar
echo native_function($array, 1); // bin
echo native_function($array, 2); // ipsum
So, this native function would return a value based on a numeric index (second arg), ignoring assoc keys, looking for the real position in array.
Is there any native function to do that in PHP, or should I write it?
Thanks
$array = array('foo' => 'bar', 33 => 'bin', 'lorem' => 'ipsum');
$array = array_values($array);
echo $array[0]; //bar
echo $array[1]; //bin
echo $array[2]; //ipsum
array_values() will do pretty much what you want:
$numeric_indexed_array = array_values($your_array);
// $numeric_indexed_array = array('bar', 'bin', 'ipsum');
print($numeric_indexed_array[0]); // bar
I am proposing an alternative because of a disadvantage of array_values(): it is not a direct getter.
It first has to create a numerically indexed copy of the values and only then access the element. Unless PHP hides a method that directly translates an integer into the position of the desired element, a slightly better solution might be a function that walks the array with a counter until it reaches the desired position and then returns that element.
This optimizes the work for very large arrays, since the algorithm performs best for small indices and stops immediately. With the array_values() solution, by contrast, the whole array is traversed even if, for example, I only need to access $array[1].
function array_get_by_index($index, $array) {
$i=0;
foreach ($array as $value) {
if($i==$index) {
return $value;
}
$i++;
}
// $index may exceed the size of $array. In that case NULL is returned.
return NULL;
}
Yes, for scalar values, a combination of implode and array_slice will do:
$bar = implode(array_slice($array, 0, 1));
$bin = implode(array_slice($array, 1, 1));
$ipsum = implode(array_slice($array, 2, 1));
Or mix it up with array_values and list (thanks @nikic) so that it works with all types of values:
list($bar) = array_values(array_slice($array, 0, 1));
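The slice-based approach can be wrapped in a small function. This is only a sketch; the name value_at() and its $default parameter are hypothetical.

```php
<?php
// Hypothetical helper: value at a zero-based position, ignoring the keys.
function value_at(array $array, $position, $default = null)
{
    $slice = array_slice($array, $position, 1);
    // reset() returns the first element of the one-element slice
    return $slice ? reset($slice) : $default;
}

$array = array('foo' => 'bar', 33 => 'bin', 'lorem' => 'ipsum');
echo value_at($array, 0), "\n"; // bar
echo value_at($array, 1), "\n"; // bin
echo value_at($array, 2), "\n"; // ipsum
```

Because it never calls implode(), this works for values of any type, and it returns $default when the position is past the end of the array.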
Is there a command to know whether an array has string keys or plain int keys?
Like:
$array1 = array('value1','value2','value3');
checkArr($array1); //> Returns true because there aren't keys as string
And:
$array2 = array('key1'=>'value1','key2'=>'value2','value3');
checkArr($array2); //> Returns false because there are keys as string
Note: I know I can parse all the array to check it.
The "compact" version to test this is:
$allnumeric =
array_sum(array_map("is_numeric", array_keys($array))) == count($array);
@Gumbo's suggestion is 1 letter shorter and could very well be a bit speedier for huge arrays:
count(array_filter(array_keys($array), "is_numeric")) == count($array);
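One caveat with is_numeric(): it accepts string keys such as '1.5', which PHP keeps as strings. A sketch that checks the actual key type instead, using a hypothetical helper has_string_keys():

```php
<?php
// Hypothetical helper: true when at least one key is a string.
function has_string_keys(array $array)
{
    foreach ($array as $key => $unused) {
        if (is_string($key)) {
            return true;
        }
    }
    return false;
}

var_dump(has_string_keys(array('value1', 'value2')));           // bool(false)
var_dump(has_string_keys(array('key1' => 'value1', 'value3'))); // bool(true)
var_dump(has_string_keys(array('1.5' => 'x')));                 // bool(true), although is_numeric('1.5') is true
```

The loop also stops at the first string key instead of building intermediate arrays.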
You can use array_keys to obtain the keys for the array and then analyse the resultant array.
Look at array_keys(): if the keys are ints you get ints, and if they are strings you get strings.
If you want to check the array's keys, it would be probably better to use something like this:
reset($array);
while (($key = key($array)) !== null) {
// check the key, for example:
if (is_string($key)) {
// ...
}
next($array);
}
This will be most performant, as there are no extraneous copies made of variables that you are not going to use.
On the other hand, this way is probably the most readable and makes the intent crystal clear:
$keys = array_keys($array);
foreach($keys as $key) {
// check the key, for example:
if (is_string($key)) {
// ...
}
}
Take your pick.
Important note:
Last time I checked, PHP would not let you have keys which are string representations of integers. For example:
$array = array();
$array["5"] = "foo";
echo $array[5]; // You might think this will not work, but it will
So keep that in mind when you are checking what the types of such keys are: they might have been created as strings, but PHP will have converted them to integers behind the scenes.
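A quick way to see the conversion is to inspect the keys directly. In this sketch the second key, "05", is added for contrast: only strings in canonical decimal integer form are converted.

```php
<?php
$array = array();
$array["5"]  = "foo"; // canonical integer string: stored under int 5
$array["05"] = "bar"; // not canonical: stays the string "05"

var_dump(array_keys($array) === array(5, "05")); // bool(true)
var_dump($array[5]);                             // string(3) "foo"
```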
Let's assume we have two PHP arrays:
$some_array = array('a','b','c','d','e','f');
$another_array = array(101,102,103,104,105,106);
In PHP, there are already some array functions that allow the construction of an associative array (AKA hash), e.g.:
$hash = array_combine(
$some_array,
$another_array
);
And here it comes. What should I do if I want to create a hash in a more functional style, if I want to compute key and value on the fly and build the hash through a map-operation, like (not working):
# wishful thinking
$hash = array_map(
function($a, $b){ return ($a => $b); },
$some_array,
$another_array
);
The problem here seems to be the expectation that the line
...
function($a, $b){ return ($a => $b); }
...
would return a key/value pair of a hash, which it doesn't.
Q: How can I return a key/value pair from a function - which can be used to build up an associative array?
Addendum
To make clear what I really was looking for, I'll provide a perl example of hash generation:
...
# we have one array of characters (on which our hash generation is based)
my @array = qw{ a b c d e f };
# now *generate* a hash with array contents as keys and
# ascii numbers as values in a *single operation*
# (in Perl, the $_ variable represents the actual array element)
my %hash = map +($_ => ord $_), @array;
...
Result (%hash):
a => 97
b => 98
c => 99
d => 100
e => 101
f => 102
From the responses, I now think this is impossible in PHP. Thanks to all respondents.
EDIT: It's not entirely clear whether you're having a problem merely with returning multiple variables from a function, or whether you're having problems storing a function in an array. Your post gives the impression that storing the function in the array works, so I'll tackle the return-multiple-variables problem.
There is no way to return a single instance of a key/value pair in PHP. You have to have them in an array... but remember that in PHP, an array and hashmap are exactly the same thing. It's weird (and controversial), but that means it's perfectly legitimate to return an array/hashmap with the multiple values you wish to return.
There are only two sane ways that I know (from 10+ years of PHP experience) to get more than one variable out of a function. One is the good old-fashioned way of making the input variables changeable.
function vacuumPackSandwitch(&$a, &$b) {
$a = 505;
$b = 707;
}
This will change both $a and $b as opposed to changing copies of them like usual. For example:
$a = 1;
$b = 2;
vacuumPackSandwitch($a, $b);
print $a.' '.$b;
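This will print "505 707", not "1 2" as it normally would. On old PHP versions you may have seen the call written as:
vacuumPackSandwitch(&$a, &$b);
But call-time pass-by-reference was removed in PHP 5.4 and now causes a fatal error, so declare the references in the function signature, as above.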
The other way is to return an array, which I suppose is the clearer and preferred way.
function vacuumPackSandwitch($a, $b) {
return array($a, $b);
}
You can grab both variables at the same time by doing:
list($c, $d) = vacuumPackSandwitch($a, $b);
Hope it helps!
In PHP arrays can be associative too! An int-indexed array is just an associative array with 0 => element0 and so on. Thus, you can accomplish what you are looking for like this:
$somearray = array('name' => 'test', 'pass' => '***', 'group' => 'admin');
function myfunc($a, $b) { return array($a => $b); }
if($somearray['name'] == 'test') { echo 'it works!'; }
// Let's add more data with the function now
$somearray += myfunc('location', 'internet');
//Test the result
if($somearray['location'] == 'internet') { echo 'it works again!'; }
It is really very simple. Hope this helps.
I know that this is an old question, but google still likes it for a search I recently made, so I'll post my findings. There are two ways to do this that come close to what you're attempting, both relying on the same general idea.
Idea 1:
Instead of returning a key => value pair, we return an array with only one element, 'key => value', for each sequential element of the original arrays. Then, we reduce these arrays, merging at every step.
$array = array_map(
function($a, $b){
return array($a => $b);
},
$arr1,
$arr2
);
$array = array_reduce(
$array,
function($carry, $element){
$carry = array_merge($carry, $element);
return $carry;
},
array()
);
OR
Idea 2:
Similar to idea one, but we do the key => value assignment in array_reduce. We pass NULL to array_map, which creates an array of arrays (http://php.net/manual/en/function.array-map.php)
$array = array_map(NULL, $a, $b);
$array = array_reduce(
$array,
function($carry, $element){
$carry[$element[0]] = $element[1];
return $carry;
},
array()
);
Personally, I find Idea 2 to be a lot more elegant than Idea 1, though it requires knowing that passing NULL as the function to array_map creates an array of arrays and is therefore somewhat un-intuitive. I just think of it as a precursor to array_reduce, where all the business happens.
Idea 3:
$carry = array();
$uselessArray = array_map(
function($a, $b) use (&$carry){ // capture by reference, otherwise the outer $carry stays empty
$carry[$a] = $b;
},
$a,
$b
);
Idea 3 is an alternative to Idea 2, but I think it's hackier than Idea 2. We have to use 'use' to jump out of the function's scope, which is pretty ugly and probably contrary to the functional style OP was seeking.
Lets just streamline Idea 2 a little and see how that looks:
Idea 2(b):
$array = array_reduce(
array_map(NULL, $a, $b),
function($carry, $element){
$carry[$element[0]] = $element[1];
return $carry;
},
array()
);
Yeah, that's nice.
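For the specific Perl example in the question (characters as keys, their ASCII codes as values), a shorter sketch combines array_combine() with array_map(). The variable name $chars is made up for illustration.

```php
<?php
$chars = array('a', 'b', 'c', 'd', 'e', 'f');

// Keys are the characters; values are computed on the fly with ord().
$hash = array_combine($chars, array_map('ord', $chars));

print_r($hash); // a => 97, b => 98, c => 99, d => 100, e => 101, f => 102
```

This only covers the case where the keys are already known as a list; when both key and value are computed per element, the array_reduce() variants above are the more general tool.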
What exactly is the difference between array_map, array_walk and array_filter? What I could see from the documentation is that you can pass a callback function to perform an action on the supplied array. But I can't seem to find any particular difference between them.
Do they perform the same thing?
Can they be used interchangeably?
I would appreciate your help with an illustrative example if they are different at all.
Changing Values:
array_map cannot change the values inside input array(s) while array_walk can; in particular, array_map never changes its arguments.
Array Keys Access:
array_map cannot operate with the array keys, array_walk can.
Return Value:
array_map returns a new array, array_walk only returns true. Hence, if you don't want to create an array as a result of traversing one array, you should use array_walk.
Iterating Multiple Arrays:
array_map also can receive an arbitrary number of arrays and it can iterate over them in parallel, while array_walk operates only on one.
Passing Arbitrary Data to Callback:
array_walk can receive an extra arbitrary parameter to pass to the callback. This is mostly irrelevant since PHP 5.3 (when anonymous functions were introduced).
Length of Returned Array:
The resulting array of array_map has the same length as that of the largest input array; array_walk does not return an array but at the same time it cannot alter the number of elements of original array; array_filter picks only a subset of the elements of the array according to a filtering function. It does preserve the keys.
Example:
<pre>
<?php
$origarray1 = array(2.4, 2.6, 3.5);
$origarray2 = array(2.4, 2.6, 3.5);
print_r(array_map('floor', $origarray1)); // $origarray1 stays the same
// changes $origarray2
array_walk($origarray2, function (&$v, $k) { $v = floor($v); });
print_r($origarray2);
// this is a more proper use of array_walk
array_walk($origarray1, function ($v, $k) { echo "$k => $v", "\n"; });
// array_map accepts several arrays
print_r(
array_map(function ($a, $b) { return $a * $b; }, $origarray1, $origarray2)
);
// select only elements that are > 2.5
print_r(
array_filter($origarray1, function ($a) { return $a > 2.5; })
);
?>
</pre>
Result:
Array
(
[0] => 2
[1] => 2
[2] => 3
)
Array
(
[0] => 2
[1] => 2
[2] => 3
)
0 => 2.4
1 => 2.6
2 => 3.5
Array
(
[0] => 4.8
[1] => 5.2
[2] => 10.5
)
Array
(
[1] => 2.6
[2] => 3.5
)
The idea of mapping a function to an array of data comes from functional programming. You shouldn't think about array_map as a foreach loop that calls a function on each element of the array (even though that's how it's implemented). It should be thought of as applying the function to each element in the array independently.
In theory such things as function mapping can be done in parallel, since the function being applied to the data should ONLY affect the data and NOT the global state. This is because an array_map could choose any order in which to apply the function to the items (even though in PHP it doesn't).
array_walk, on the other hand, is the exact opposite approach to handling arrays of data. Instead of handling each item separately, it uses a state (&$userdata) and can edit the item in place (much like a foreach loop). Since each time an item has the $funcname applied to it, it could change the global state of the program, it therefore requires a single correct order of processing the items.
Back in PHP land, array_map and array_walk are almost identical, except that array_walk gives you more control over the iteration of data and is normally used to "change" the data in place rather than returning a new "changed" array.
array_filter is really an application of array_walk (or array_reduce) and is more or less just provided for convenience.
From the documentation,
bool array_walk ( array &$array , callback $funcname [, mixed $userdata ] ) <-return bool
array_walk takes an array and a function F and modifies it by replacing every element x with F(x).
array array_map ( callback $callback ,
array $arr1 [, array $... ] )<-return array
array_map does the exact same thing except that instead of modifying in-place it will return a new array with the transformed elements.
array array_filter ( array $input [,
callback $callback ] )<-return array
array_filter with function F, instead of transforming the elements, will remove any elements for which F(x) is not true
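The three behaviours can be put next to each other in a few lines. This is a small sketch with made-up variable names; note that array_walk() only modifies the array because its callback takes the element by reference.

```php
<?php
$numbers = array(1, 2, 3, 4);

$doubled = array_map(function ($n) { return $n * 2; }, $numbers);
// $doubled is array(2, 4, 6, 8); $numbers is unchanged

$even = array_filter($numbers, function ($n) { return $n % 2 === 0; });
// $even is array(1 => 2, 3 => 4): the original keys survive

$evenList = array_values($even);
// $evenList is array(2, 4): re-indexed from 0

array_walk($numbers, function (&$n, $key) { $n = $n * 2; });
// $numbers is now array(2, 4, 6, 8): modified in place
```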
The other answers demonstrate the difference between array_walk (in-place modification) and array_map (return modified copy) quite well. However, they don't really mention array_reduce, which is an illuminating way to understand array_map and array_filter.
The array_reduce function takes an array, a two-argument function and an 'accumulator', like this:
array_reduce(array('a', 'b', 'c', 'd'),
'my_function',
$accumulator)
The array's elements are combined with the accumulator one at a time, using the given function. The result of the above call is the same as doing this:
my_function(
my_function(
my_function(
my_function(
$accumulator,
'a'),
'b'),
'c'),
'd')
If you prefer to think in terms of loops, it's like doing the following (I've actually used this as a fallback when array_reduce wasn't available):
function array_reduce($array, $function, $accumulator) {
foreach ($array as $element) {
$accumulator = $function($accumulator, $element);
}
return $accumulator;
}
This looping version makes it clear why I've called the third argument an 'accumulator': we can use it to accumulate results through each iteration.
So what does this have to do with array_map and array_filter? It turns out that they're both a particular kind of array_reduce. We can implement them like this:
array_map($function, $array) === array_reduce($array, $MAP, array())
array_filter($array, $function) === array_reduce($array, $FILTER, array())
Ignore the fact that array_map and array_filter take their arguments in a different order; that's just another quirk of PHP. The important point is that the right-hand-side is identical except for the functions I've called $MAP and $FILTER. So, what do they look like?
$MAP = function($accumulator, $element) use ($function) {
$accumulator[] = $function($element);
return $accumulator;
};
$FILTER = function($accumulator, $element) use ($function) {
if ($function($element)) $accumulator[] = $element;
return $accumulator;
};
As you can see, both functions take in the $accumulator and return it again. There are two differences in these functions:
$MAP will always append to $accumulator, but $FILTER will only do so if $function($element) is TRUE.
$FILTER appends the original element, but $MAP appends $function($element).
Note that this is far from useless trivia; we can use it to make our algorithms more efficient!
We can often see code like these two examples:
// Transform the valid inputs
array_map('transform', array_filter($inputs, 'valid'))
// Get all numeric IDs
array_filter(array_map('get_id', $inputs), 'is_numeric')
Using array_map and array_filter instead of loops makes these examples look quite nice. However, it can be very inefficient if $inputs is large, since the first call (map or filter) will traverse $inputs and build an intermediate array. This intermediate array is passed straight into the second call, which will traverse the whole thing again, then the intermediate array will need to be garbage collected.
We can get rid of this intermediate array by exploiting the fact that array_map and array_filter are both examples of array_reduce. By combining them, we only have to traverse $inputs once in each example:
// Transform valid inputs
array_reduce($inputs,
function($accumulator, $element) {
if (valid($element)) $accumulator[] = transform($element);
return $accumulator;
},
array())
// Get all numeric IDs
array_reduce($inputs,
function($accumulator, $element) {
$id = get_id($element);
if (is_numeric($id)) $accumulator[] = $id;
return $accumulator;
},
array())
NOTE: My implementations of array_map and array_filter above won't behave exactly like PHP's, since my array_map can only handle one array at a time and my array_filter won't use "empty" as its default $function. Also, neither will preserve keys.
It's not difficult to make them behave like PHP's, but I felt that these complications would make the core idea harder to spot.
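To make the equivalence concrete, here is a runnable sketch of both reductions wrapped in functions. The names map_via_reduce() and filter_via_reduce() are hypothetical, and as noted above neither preserves keys, so the filter is compared against array_values(array_filter(...)).

```php
<?php
// Hypothetical helper: array_map for a single array, expressed as a reduction.
function map_via_reduce($function, array $array)
{
    return array_reduce($array, function ($accumulator, $element) use ($function) {
        $accumulator[] = $function($element);
        return $accumulator;
    }, array());
}

// Hypothetical helper: array_filter (without key preservation) as a reduction.
function filter_via_reduce(array $array, $function)
{
    return array_reduce($array, function ($accumulator, $element) use ($function) {
        if ($function($element)) {
            $accumulator[] = $element;
        }
        return $accumulator;
    }, array());
}

$inputs = array(1, 'x', 2, 'y', 3);
var_dump(map_via_reduce('strval', $inputs) === array_map('strval', $inputs));                     // bool(true)
var_dump(filter_via_reduce($inputs, 'is_int') === array_values(array_filter($inputs, 'is_int'))); // bool(true)
```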
The following revision seeks to more clearly delineate PHP's array_filter(), array_map(), and array_walk(), all of which originate from functional programming:
array_filter() filters out data, producing as a result a new array holding only the desired items of the former array, as follows:
<?php
$array = array(1, "apples",2, "oranges",3, "plums");
$filtered = array_filter( $array, "ctype_alpha");
var_dump($filtered);
?>
All numeric values are filtered out of $array, leaving $filtered with only types of fruit.
array_map() also creates a new array but unlike array_filter() the resulting array contains every element of the input $filtered but with altered values, owing to applying a callback to each element, as follows:
<?php
$nu = array_map( "strtoupper", $filtered);
var_dump($nu);
?>
The code in this case applies a callback using the built-in strtoupper() but a user-defined function is another viable option, too. The callback applies to every item of $filtered and thereby engenders $nu whose elements contain uppercase values.
In the next snippet, array_walk() traverses $nu and makes changes to each element by way of the reference operator '&'. The changes occur without creating an additional array. Every element's value changes in place into a more informative string specifying its key, category and value.
<?php
$f = function(&$item,$key,$prefix) {
$item = "$key: $prefix: $item";
};
array_walk($nu, $f,"fruit");
var_dump($nu);
?>
Note: the callback function for array_walk() takes two parameters, which automatically receive an element's value and its key, in that order, when invoked by array_walk(). Any optional third argument given to array_walk(), such as "fruit" above, is passed to the callback as its third parameter.