Is there a way to sort integers or strings in an instance of the SplFixedArray class? Is converting to a PHP array, sorting, and then converting back the only option?
Firstly, congratulations on finding and using SplFixedArrays! I think they're a highly under-utilised feature in vanilla PHP ...
As you've probably appreciated, their performance is unrivalled (compared to the usual PHP arrays) - but this does come at some trade-offs, including a lack of PHP functions to sort them (which is a shame)!
Implementing your own bubble sort is a relatively easy solution. Just iterate through, looking at each consecutive pair of elements and putting the higher one on the right. Rinse and repeat until the array is sorted:
<?php
$arr = new SplFixedArray(10);
$arr[0] = 2345;
$arr[1] = 314;
$arr[2] = 3666;
$arr[3] = 93;
$arr[4] = 7542;
$arr[5] = 4253;
$arr[6] = 2343;
$arr[7] = 32;
$arr[8] = 6324;
$arr[9] = 1;
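$moved = 0;
while ($moved < sizeof($arr) - 1) {
    $i = 0;
    // after each pass the largest remaining value has settled at the end
    while ($i < sizeof($arr) - 1 - $moved) {
        if ($arr[$i] > $arr[$i + 1]) {
            // swap so the larger value ends up on the right
            $tmp = $arr[$i + 1];
            $arr[$i + 1] = $arr[$i];
            $arr[$i] = $tmp;
        }
        $i++;
    }
    $moved++;
}
var_dump($arr);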
It's not fast, it's not efficient. For that you might consider Quicksort - there are documented examples online, including one at wikibooks.org (it will need modification to work with SplFixedArrays).
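For what it's worth, here is a minimal sketch of what such a modification might look like: an in-place quicksort using the Lomuto partition scheme. The function name quickSortFixed() is my own and not part of SPL, and this is only one of several possible partition schemes, not the wikibooks version.

```php
<?php
// In-place quicksort (Lomuto partition) for an SplFixedArray.
// quickSortFixed() is an illustrative name, not an SPL function.
function quickSortFixed(SplFixedArray $a, int $lo = 0, ?int $hi = null): void
{
    if ($hi === null) {
        $hi = $a->getSize() - 1;
    }
    if ($lo >= $hi) {
        return; // zero or one element: nothing to sort
    }

    $pivot = $a[$hi];   // last element of the range is the pivot
    $store = $lo;       // next slot for a value smaller than the pivot
    for ($i = $lo; $i < $hi; $i++) {
        if ($a[$i] < $pivot) {
            $tmp = $a[$i];
            $a[$i] = $a[$store];
            $a[$store] = $tmp;
            $store++;
        }
    }
    // move the pivot between the smaller and the larger values
    $tmp = $a[$hi];
    $a[$hi] = $a[$store];
    $a[$store] = $tmp;

    quickSortFixed($a, $lo, $store - 1);
    quickSortFixed($a, $store + 1, $hi);
}

$arr = SplFixedArray::fromArray([2345, 314, 3666, 93, 7542, 4253, 2343, 32, 6324, 1]);
quickSortFixed($arr);
print_r($arr->toArray());
```

Picking the last element as pivot keeps the sketch short, but on input that is already sorted it degrades to quadratic time and deep recursion, so a random or median pivot would be worth considering for real data.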
Seriously, beyond getting your question answered, I truly feel that forcing yourself to ask why things like SplFixedArray exist and forcing yourself to understand what goes on behind a "quick call to sort()" (and why it quickly takes a very long time to run) make the difference between programmers and programmers. I applaud your question!
Here's my adaptation of bubble sort using SplFixedArray. In PHP 7 this simple program is twice as fast as the regular bubble sort:
function bubbleSort(SplFixedArray $a)
{
$len = $a->getSize() - 1;
$sorted = false;
while (!$sorted) {
$sorted = true;
for ($i = 0; $i < $len; $i++)
{
$current = $a->offsetGet($i);
$next = $a->offsetGet($i + 1);
if ( $next < $current ) {
$a->offsetSet($i, $next);
$a->offsetSet($i + 1, $current);
$sorted = false;
}
}
}
return $a;
}
$starttime = microtime(true);
$array = SplFixedArray::fromArray([3,4,1,3,5,1,92,2,4124,424,52,12]);
$array = bubbleSort($array);
print_r($array->toArray());
echo (microtime(true) - $starttime) * 1000, PHP_EOL;
I am using this library to work with fractions in PHP. This works fine but sometimes, I have to loop over a lot of values and this results in the following error:
Allowed memory size of 134217728 bytes exhausted
I can allocate more memory using PHP ini but that is a slippery slope. At some point, I am going to run out of memory when the loops are big enough.
Here is my current code:
for($q = 10; $q <= 20; $q++) {
for($r= 10; $r <= 20; $r++) {
for($p = 10; $p <= 20; $p++) {
for($s = 10; $s <= 20; $s++) {
for($x = 50; $x <= 100; $x++) {
for($y = 50; $y <= 100; $y++) {
$den = ($q + $r + 1000) - ($p + $s);
$num = $x + $y;
$c_diff = new Fraction($num, $den);
}
}
}
}
}
}
I used memory_get_peak_usage(true)/(1024*1024) to keep track of the memory the script is using. The total memory used was just 2MB until I added the line that creates a new fraction.
Could anyone please guide me on how to get rid of this error? I went through the code of the library posted on GitHub here but can't figure out how to get rid of the exhausted memory error. Is this because of the static keyword? I am a beginner so I am not entirely sure what's going on.
The library code is about 100 lines after removing the empty lines and comments. Any help would be highly appreciated.
UPDATE:
The script exhausts its memory even if I use just this block of code and nothing else. I definitely know that creating a new Fraction object is the cause of exhausting memory.
I thought that there was no need to unset() anything because the same variable is used to store the new fractional value over and over again.
This leads me to think that whenever I create a new Fraction object, something else happens in the library code that takes up memory, and that memory is not released when the value in the $c_diff variable is overwritten.
I am not very good at this so I thought it has something to do with the static keyword used at a couple of places. Could anyone confirm it for me?
If this issue can indeed be resolved by using unset(), should I place it at the end of the loop?
Various Possible fixes and efficiencies:
You have 6 for loops, each loop cycles a single integer value within various ranges.
But your calculation only uses 3 values, so it doesn't matter whether $p = 10; $s = 14; or $p = 13; $s = 11; these are entirely equivalent in the calculation.
All you need is the sum. Once you've found that the value 24 works, you can find all the pairs (each part at least the minimum value of 10) that add up to it: 24 (sum) - 10 (min) = 14, so the valid pairs within the range are 10,14; 11,13; 12,12; 13,11; 14,10. This saves you 80%+ of the processing work in the inner for loops.
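$sum = 24; // the sum that was found to work
$min = 10; // lower bound of each part
$max = 20; // upper bound of each part
$pairs = "p,s<BR>"; //the set of paired values found
$hardMin = $min;
$other = $sum - $min;
if ($other > $max) {
    // clamp the larger part and raise the smaller one so the pair still adds up to $sum
    $other = $max;
    $min = $sum - $max;
}
while ($other >= $hardMin && $min <= $max) {
    $pairs .= $min.", ".$other."<BR>";
    $other--; // -1
    $min++; // +1
}
print $pairs;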
Giving:
p,s
10,14
11,13
12,12
13,11
14,10
So for this for loop already, you may only need to do ~10% of the total work cycling the inner loops.
Stop instantiating new objects. Creating an object is expensive. Instead, create one instance and simply plug the values in:
Example:
$c_diff = new Fraction();
for(...){
for(...){
$c_diff->checkValuesOrWhateverMethod($num, $den);
}
}
This will save you significant overhead (depending on the structure of the class)
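To make the "one instance, new values" idea concrete, here is a rough sketch. ReusableFraction and its set() method are hypothetical stand-ins that I made up for illustration; the library's real Fraction class may not offer anything like this, in which case you would have to add such a method yourself.

```php
<?php
// Hypothetical stand-in for the library's Fraction class: a single instance
// is created up front and overwritten on every iteration.
final class ReusableFraction
{
    private int $num = 0;
    private int $den = 1;

    // Overwrite the stored value, reduced to lowest terms.
    public function set(int $num, int $den): void
    {
        if ($den === 0) {
            throw new InvalidArgumentException('Denominator must not be zero');
        }
        $g = self::gcd(abs($num), abs($den));
        if ($den < 0) {
            $g = -$g; // keep the sign on the numerator
        }
        $this->num = intdiv($num, $g);
        $this->den = intdiv($den, $g);
    }

    public function __toString(): string
    {
        return $this->num . '/' . $this->den;
    }

    // Euclid's algorithm; expects non-negative input.
    private static function gcd(int $a, int $b): int
    {
        while ($b !== 0) {
            [$a, $b] = [$b, $a % $b];
        }
        return $a;
    }
}

$frac = new ReusableFraction(); // created once, outside the loops
for ($x = 50; $x <= 52; $x++) {
    $frac->set($x + 50, 1000);
    echo $frac, PHP_EOL;
}
```

The typed properties need PHP 7.4 or later. Whether this actually helps depends on what the real class keeps around per instance, so measure with memory_get_peak_usage() before and after.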
The code you linked on GitHub is simply to turn the value into a fraction and seems to be highly inefficient.
All you need is this:
function float2frac($n, $tolerance = 1.e-6) {
$h1=1; $h2=0;
$k1=0; $k2=1;
$b = 1/$n;
do {
$b = 1/$b;
$a = floor($b);
$aux = $h1; $h1 = $a*$h1+$h2; $h2 = $aux;
$aux = $k1; $k1 = $a*$k1+$k2; $k2 = $aux;
$b = $b-$a;
} while (abs($n-$h1/$k1) > $n*$tolerance);
return $h1."/".$k1;
}
Taken from this excellent answer.
Example:
for(...){
for(...){
$den = ($q + $r + 1000) - ($p + $s);
$num = $x + $y;
$value = $num/$den;
$c_diff = float2frac($value);
unset($value,$den,$num);
}
}
If you need more precision you can read this question and update PHP.ini as appropriate, but personally I would recommend you use more specialist maths languages such as Matlab or Haskell.
Putting it all together:
You want to check three values, and then find the equivalent parts of each one.
You want to simply find the lowest common denominator fraction (I think).
So:
/***
* to generate a fraction with Lowest Common Denominator
***/
function float2frac($n, $tolerance = 1.e-6) {
$h1=1; $h2=0;
$k1=0; $k2=1;
$b = 1/$n;
do {
$b = 1/$b;
$a = floor($b);
$aux = $h1; $h1 = $a*$h1+$h2; $h2 = $aux;
$aux = $k1; $k1 = $a*$k1+$k2; $k2 = $aux;
$b = $b-$a;
} while (abs($n-$h1/$k1) > $n*$tolerance);
return $h1."/".$k1;
}
/***
* To find equivalents
***/
function find_equivs($sum = 1, $min = 1, $max = 2){
$value_A = $sum - $min;
$value_B = $min;
if($value_A > $max){
$value_B = $sum - $max;
$value_A = $max;
}
$output = "";
while ($value_A >= $min && $value_B <= $max){
if($value_A + $value_B == $sum){
$output .= $value_A . ", " . $value_B . "<BR>";
}
$value_A--; // -1
$value_B++; // +1
}
return $output;
}
/***
* Script...
***/
$c_diff = []; // an array of results.
for ($qr = 20; $qr <= 40; $qr++) {
    for ($ps = 20; $ps <= 40; $ps++) {
        for ($xy = 100; $xy <= 200; $xy++) {
            $den = ($qr + 1000) - $ps;
            $num = $xy;
            $value = $num / $den; // decimalised
            $c_diff[] = float2frac($value);
            /***
            What is your criteria for success?
            ***/
            $success = false; // placeholder: replace with your own success test
            if ($success) {
                $qr_text = "Q,R<BR>";
                $qr_text .= find_equivs($qr, 10, 20);
                $sp_text = "S,P<BR>";
                $sp_text .= find_equivs($ps, 10, 20);
                $xy_text = "X,Y<BR>";
                $xy_text .= find_equivs($xy, 50, 100);
            }
        }
    }
}
This should do only a small percentage of the original looping.
I guess this isn't the entire block of code you are using.
The ranges are inclusive, so each 10..20 loop runs 11 times and each 50..100 loop runs 51 times: this loop creates 11^4 * 51^2 = 38,081,241 Fraction objects. Consider using PHP's unset() to clean up memory, since you are allocating memory to create objects, but you never free it up.
editing for clarification
When you create anything in PHP, be it variable, array, object, etc. PHP allocates memory to store it and usually, the allocated memory is freed when script execution ends.
unset() is the way to tell PHP, "hey, I don't need this anymore. Can you, pretty please, free up the memory it takes?". PHP frees the memory as soon as nothing else references the value; its garbage collector is only needed for values that are caught in circular references.
It is better to prevent memory exhaustion rather than feeding your script with more memory.
Allowed memory size of 134217728 bytes exhausted
134217728 bytes = 128 MiB (about 134.2 MB)
Can you try this?
ini_set('memory_limit', '140M');
/* loop code below */
I have a problem with object comparison in PHP. What seems like straightforward code actually runs way too slowly for my liking, and as I am not that advanced in the language I would like some feedback and suggestions regarding the following code:
class TestTokenGroup {
private $tokens;
...
public static function create($tokens) {
$instance = new static();
$instance->tokens = $tokens;
...
return $instance;
}
public function getTokens() {
return $this->tokens;
}
public static function compare($tokenGroup1, $tokenGroup2) {
$i = 0;
$minLength = min(array(count($tokenGroup1->getTokens()), count($tokenGroup2->getTokens())));
$equalLengths = (count($tokenGroup1->getTokens()) == count($tokenGroup2->getTokens()));
$comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
while ($comparison == 0) {
$i++;
if (($i == $minLength) && ($equalLengths == true)) {
return 0;
}
$comparison = strcmp($tokenGroup1->getTokens()[$i], $tokenGroup2->getTokens()[$i]);
}
$result = $comparison;
if ($result < 0)
return -1;
elseif ($result > 0)
return 1;
else
return 0;
}
...
}
In the code above $tokens is just a simple array of strings.
Using the method above through usort() for an array of TestTokenGroup consisting of around 40k objects takes ~2secs.
Is there a sensible way to speed that up? Where is the bottleneck here?
EDIT: Added the getTokens() method I initially forgot to include.
You know that objects are "pass by reference", and arrays are "pass by value"?
If getTokens() returns $this->tokens, the array is copied every time you invoke that method.
Try accessing $tokens directly via $tokenGroup1->tokens. You could also use references (&) although returning a reference doesn't work in all PHP versions.
Alternatively, make one copy only:
$tokens1 = $tokenGroup1->getTokens();
$tokens2 = $tokenGroup2->getTokens();
Even if each token group is relatively small, it will save at least 40000 * ( 6 + $average_token_group_length * 2) array copies.
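Roughly, a comparator built around those two local copies could look like the sketch below. compareTokenLists() is a hypothetical free function of my own that takes the plain token arrays. Note one deliberate difference from the OP's method: when one list is a prefix of the other, I assume the shorter one should sort first, instead of indexing past the end of the shorter array.

```php
<?php
// Sketch of a comparator that works on two token arrays fetched once.
// compareTokenLists() is a hypothetical helper, not part of the OP's class.
function compareTokenLists(array $tokens1, array $tokens2): int
{
    $count1 = count($tokens1);
    $count2 = count($tokens2);
    $minLength = min($count1, $count2);

    for ($i = 0; $i < $minLength; $i++) {
        $comparison = strcmp($tokens1[$i], $tokens2[$i]);
        if ($comparison !== 0) {
            return $comparison < 0 ? -1 : 1; // first differing token decides
        }
    }

    // All shared positions are equal: assume the shorter list sorts first.
    return $count1 <=> $count2;
}

// Usage with plain arrays; with the OP's objects the callback would pass
// $a->getTokens() and $b->getTokens() instead.
$lists = [['b', 'a'], ['a', 'c'], ['a'], ['a', 'b']];
usort($lists, 'compareTokenLists');
print_r($lists);
```

The spaceship operator needs PHP 7 or later.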
UPDATE
I've benchmarked OP's code (removing the ... lines) using:
function gentokens() {
$ret = [];
for ( $i=0; $i< 3; $i++)
{
$str = "";
for ( $x = rand(0,3); $x < 10; $x ++ )
$str .= chr( rand(0,25) + ord('a') );
$ret[] = $str;
}
return $ret;
}
$start = microtime(true);
$array = []; // this will hold the TestTokenGroup instances
$dummy = ""; // this will hold the tokens, space-separated and newline-separated
$dummy2= []; // this will hold the space-concatenated strings
for ( $i=0; $i < 40000; $i++)
{
$array[] = TestTokenGroup::create( $t = gentokens() );
$dummy .= implode(' ', $t ) . "\n";
$dummy2[] = implode(' ', $t );
}
// write a test file to benchmark GNU sort:
file_put_contents("sort-data.txt", $dummy);
$inited = microtime(true);
printf("init: %f s\n", ($inited-$start));
usort( $array, [ 'TestTokenGroup', 'compare'] );
$sorted = microtime(true);
printf("sort: %f s\n", ($sorted-$inited));
usort( $dummy2, 'strcmp' );
$sorted2 = microtime(true);
printf("sort: %f s\n", ($sorted2-$sorted));
With the following results:
init: 0.359329 s // for generating 40000 * 3 random strings and setup
sort: 1.012096 s // for the TestTokenGroup::compare
sort: 0.120583 s // for the 'strcmp' compare
And, running time sort sort-data.txt > /dev/null yields
.052 u (user-time, in seconds).
Optimisation 1: remove array copies
replacing ->getTokens() with ->tokens yields (I'll only list the TestTokenGroup::compare results):
sort: 0.832794 s
Optimisation 2: remove redundant array() in min
Changing the $minLength line to:
$minLength = min(count($tokenGroup1->tokens), count($tokenGroup2->tokens));
gives
sort: 0.779134 s
Optimisation 3: Only call count once for each tokenGroup
$count1 = count($tokenGroup1->tokens);
$count2 = count($tokenGroup2->tokens);
$minLength = min($count1, $count2);
$equalLengths = ($count1 == $count2);
gives
sort: 0.679649 s
Alternative approach
The fastest sort so far is usort( $dummy2, 'strcmp' ): 0.12s - still about twice as slow as GNU sort, but the latter only does one thing, and does it well.
So, to sort the TokenGroups efficiently we need to construct a sort key consisting of a simple string. We can use \0 as a delimiter for the tokens, and we don't have to worry about them being equal length, because as soon as one character is different, the compare aborts.
Here's the implementation:
$arr2 = [];
foreach ( $array as $o )
$arr2[ implode("\0", $o->getTokens() ) ] = $o;
$init2 = microtime(true);
printf("init2: %f s\n", ($init2-$sorted2));
uksort( $arr2, 'strcmp' );
$sorted3 = microtime(true);
printf("sort: %f s\n", ($sorted3-$init2));
and here the results:
init2: 0.125939 s
sort: 0.104717 s
In short, which is more efficient or better or faster than the other?
if (in_array($value, array('val1', 'val2', 'val3'))) { ... }
or
$arr = array('val1', 'val2', 'val3');
if (in_array($value, $arr)) { ... }
This code will run inside a loop, so the same array(...) declaration would happen multiple times.
Does it really matter if the array is in a variable or if it is redeclared on the fly for every in_array run inside a loop?
Speaking of speed - I prefer to use isset($arr[$key]) on the keys of an associative array instead of in_array, where that is possible.
$arraySize = 10000;
$loop = 10000;
// for numbers, at first
$array = range(1, $arraySize);
$start = getMtime();
for($i = 0; $i < $loop; $i++)
{
if (in_array(rand(1, $arraySize), $array))
continue;
}
$end = getMtime();
echo ($end - $start) . '<br>';
$start = getMtime();
for($i = 0; $i < $loop; $i++)
{
if (isset($array[rand(1, $arraySize)]))
continue;
}
$end = getMtime();
echo ($end - $start) . '<br>';
// the same, but for strings
foreach($array as &$el)
$el = generateId(10);
$start = getMtime();
for($i = 0; $i < $loop; $i++)
{
if (in_array(generateId(10), $array))
continue;
}
$end = getMtime();
echo ($end - $start) . '<br>';
// now set them as keys
$array = array_flip($array);
echo 'Size of array ' . count($array) . '<br>';
$start = getMtime();
for($i = 0; $i < $loop; $i++)
{
if (isset($array[generateId(10)]))
continue;
}
$end = getMtime();
echo $end - $start;
function getMtime()
{
$mtime = microtime();
$mtime = explode(" ",$mtime);
$mtime = $mtime[1] + $mtime[0];
return $mtime;
}
function generateId($len = 10)
{
return join('', array_map(function() { return substr('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', rand(0, 61), 1); }, range(1, $len)));
}
Results:
0.69922399520874 // in_array, numbers
0.0039558410644531 // isset, numbers
3.5183579921722 // in_array, strings
Size of array 10000
0.15712094306946 // isset, strings
No repeating keys, so the size of the flipped array is the same. Of course, maybe I'm missing something, but I do not expect high speed from the in_array function.
The first way is faster, but not as legible. However, most arrays you need to evaluate this way are going to be arrays which you've already assigned to a variable for other purposes. The latter method requires an extra assignment, so it would take slightly more memory and time.
You should create or appropriate a time tracking class, if you care how fast things are (which is definitely a good way to approach programming). For almost every new method of doing things I create, I test its speed and compare that to existing methods, to make sure it's beneficial/efficient and that I'm on the right track. Create a time class and execute each scenario you want to compare for some 10000+ iterations, depending on what you're testing, to see which is faster. That way you'll never have to guess or get somebody else's insight on the matter.
Edit:
This may not need to be said, but I thought I should add it just in case:
Any time class in php would need the ability to track milliseconds to be relevant.
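As a starting point, something along these lines might do. timeIt() is a hypothetical helper of my own, and it relies on hrtime(), which needs PHP 7.3 or later (on older versions microtime(true) would be the fallback). I'm not quoting any numbers, since they depend entirely on your machine and PHP version.

```php
<?php
// timeIt() is a hypothetical helper: it runs $fn $iterations times and returns
// the average duration of one call in microseconds, using the monotonic
// high-resolution clock exposed by hrtime().
function timeIt(callable $fn, int $iterations = 10000): float
{
    $start = hrtime(true); // nanoseconds as an integer
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / $iterations / 1000;
}

$values = array('val1', 'val2', 'val3');

// array literal rebuilt inside every call
$inline = timeIt(function () {
    return in_array('val2', array('val1', 'val2', 'val3'));
});

// array declared once and reused
$hoisted = timeIt(function () use ($values) {
    return in_array('val2', $values);
});

printf("inline: %.4f us, hoisted: %.4f us\n", $inline, $hoisted);
```

The closure call itself adds overhead to both measurements, so only the difference between the two figures is meaningful, and it is worth running the script several times.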
I recently sent my CV to a company that was hiring PHP developers. They sent me back a task to solve, to measure whether I'm experienced enough.
The task goes like this:
You have an array with 10k unique elements, sorted in descending order. Write a function that generates this array, then write three different functions that insert a new element into the array so that the array is still sorted in descending order after the insert. Write some code to measure the speed of those functions. You can't use PHP sorting functions.
So I wrote a function to generate the array and four functions to insert a new element into it.
/********** Generating array (because use of range() was too simple :)): *************/
function generateSortedArray($start = 300000, $elementsNum = 10000, $dev = 30){
$arr = array();
for($i = 1; $i <= $elementsNum; $i++){
$rand = mt_rand(1, $dev);
$start -= $rand;
$arr[] = $start;
}
return $arr;
}
/********************** Four insert functions: **************************/
// for loop, and array copying
function insert1(&$arr, $elem){
if(empty($arr)){
$arr[] = $elem;
return true;
}
$c = count($arr);
$lastIndex = $c - 1;
$tmp = array();
$inserted = false;
for($i = 0; $i < $c; $i++){
if(!$inserted && $arr[$i] <= $elem){
$tmp[] = $elem;
$inserted = true;
}
$tmp[] = $arr[$i];
if($lastIndex == $i && !$inserted) $tmp[] = $elem;
}
$arr = $tmp;
return true;
}
// new element inserted at the end of array
// and moved up until correct place
function insert2(&$arr, $elem){
$c = count($arr);
array_push($arr, $elem);
for($i = $c; $i > 0; $i--){
if($arr[$i - 1] >= $arr[$i]) break;
$tmp = $arr[$i - 1];
$arr[$i - 1] = $arr[$i];
$arr[$i] = $tmp;
}
return true;
}
// binary search for correct place + array_splice() to insert element
function insert3(&$arr, $elem){
$startIndex = 0;
$stopIndex = count($arr) - 1;
$middle = 0;
while($startIndex < $stopIndex){
$middle = ceil(($stopIndex + $startIndex) / 2);
if($elem > $arr[$middle]){
$stopIndex = $middle - 1;
}else if($elem <= $arr[$middle]){
$startIndex = $middle;
}
}
$offset = $elem >= $arr[$startIndex] ? $startIndex : $startIndex + 1;
array_splice($arr, $offset, 0, array($elem));
}
// for loop to find correct place + array_splice() to insert
function insert4(&$arr, $elem){
$c = count($arr);
$inserted = false;
for($i = 0; $i < $c; $i++){
if($elem >= $arr[$i]){
array_splice($arr, $i, 0, array($elem));
$inserted = true;
break;
}
}
if(!$inserted) $arr[] = $elem;
return true;
}
/*********************** Speed tests: *************************/
// check if array is sorted descending
function checkIfArrayCorrect($arr, $expectedCount = null){
$c = count($arr);
if(isset($expectedCount) && $c != $expectedCount) return false;
$correct = true;
for($i = 0; $i < $c - 1; $i++){
if(!isset($arr[$i + 1]) || $arr[$i] < $arr[$i + 1]){
$correct = false;
break;
}
}
return $correct;
}
// calculates microtime diff
function timeDiff($startTime){
$diff = microtime(true) - $startTime;
return $diff;
}
// prints formatted execution time info
function showTime($func, $time){
printf("Execution time of %s(): %01.7f s\n", $func, $time);
}
// generated elements num
$elementsNum = 10000;
// generate starting point
$start = 300000;
// generated elements random range 1 - $dev
$dev = 50;
echo "Generating array with descending order, $elementsNum elements, beginning from $start\n";
$startTime = microtime(true);
$arr = generateSortedArray($start, $elementsNum, $dev);
showTime('generateSortedArray', timeDiff($startTime));
$step = 2;
echo "Generating second array using range(), $elementsNum elements, beginning from $start, step $step\n";
$startTime = microtime(true);
$arr2 = range($start, $start - $elementsNum * $step, $step);
showTime('range', timeDiff($startTime));
echo "Checking if array is correct\n";
$startTime = microtime(true);
$sorted = checkIfArrayCorrect($arr, $elementsNum);
showTime('checkIfArrayCorrect', timeDiff($startTime));
if(!$sorted) die("Array is not in descending order!\n");
echo "Array OK\n";
$toInsert = array();
// number of elements to insert from every range
$randElementNum = 20;
// some ranges of elements to insert near begining, middle and end of generated array
// start value => end value
$ranges = array(
300000 => 280000,
160000 => 140000,
30000 => 0,
);
foreach($ranges as $from => $to){
$values = array();
echo "Generating $randElementNum random elements from range [$from - $to] to insert\n";
while(count($values) < $randElementNum){
$values[mt_rand($from, $to)] = 1;
}
$toInsert = array_merge($toInsert, array_keys($values));
}
// some elements to insert on begining and end of array
array_push($toInsert, 310000);
array_push($toInsert, -1000);
echo "Generated elements: \n";
for($i = 0; $i < count($toInsert); $i++){
if($i > 0 && $i % 5 == 0) echo "\n";
printf("%8d, ", $toInsert[$i]);
if($i == count($toInsert) - 1) echo "\n";
}
// functions to test
$toTest = array('insert1' => null, 'insert2' => null, 'insert3' => null, 'insert4' => null);
foreach($toTest as $func => &$time){
echo "\n\n================== Testing speed of $func() ======================\n\n";
$tmpArr = $arr;
$startTime = microtime(true);
for($i = 0; $i < count($toInsert); $i++){
$func($tmpArr, $toInsert[$i]);
}
$time = timeDiff($startTime);
showTime($func, $time);
echo "Checking if after using $func() array is still correct: \n";
if(!checkIfArrayCorrect($tmpArr, count($arr) + count($toInsert))){
echo "Array INCORRECT!\n\n";
}else{
echo "Array OK!\n\n";
}
echo "Few elements from begining of array:\n";
print_r(array_slice($tmpArr, 0, 5));
echo "Few elements from end of array:\n";
print_r(array_slice($tmpArr, -5));
//echo "\n================== Finished testing $func() ======================\n\n";
}
echo "\n\n================== Functions time summary ======================\n\n";
print_r($toTest);
Results can be found here: http://ideone.com/1xQ3T
Unfortunately I was rated only 13 points out of 30 for this task (I don't know how it was calculated or what exactly was taken into account). I can only assume that's because there are better ways to insert a new element into a sorted array in PHP. I've been searching this topic for some time now but couldn't find anything good. Maybe you know of a better approach or some articles about this topic?
Btw on my localhost (PHP 5.3.6-13ubuntu3.6 with Suhosin-Patch, AMD Athlon(tm) II X4 620) insert2() is fastest, but on ideone (PHP 5.2.11) insert3() is fastest.
Any ideas why? I suppose that array_splice() is tuned up somehow :).
//EDIT
Yesterday I thought about it again, and figured out a better way to do inserts. If you only need a sorted structure and a way to iterate over it, and your primary concern is the speed of the insert operation, then the best choice would be the SplMaxHeap class. In SplMaxHeap inserts are damn fast :) I've modified my script to show how fast inserts are. Code is here: http://ideone.com/vfX98 (ideone has PHP 5.2 so there won't be an SplMaxHeap class)
On my localhost I get results like that:
================== Functions time summary ======================
insert1() => 0.5983521938
insert2() => 0.2605950832
insert3() => 0.3288729191
insert4() => 0.3288729191
SplMaxHeap::insert() => 0.0000801086
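For anyone who doesn't want to follow the link, here is a minimal sketch of the SplMaxHeap approach (the values are made up and this is not the linked script). One thing to keep in mind: iterating over a heap extracts its elements, which is why the sketch iterates over a clone.

```php
<?php
// Keep values in an SplMaxHeap instead of a manually sorted array.
$heap = new SplMaxHeap();
foreach ([300000, 299970, 299950] as $value) {
    $heap->insert($value);
}

// New elements can be inserted in any order.
$heap->insert(299960);
$heap->insert(310000);

// Iteration yields values in descending order but removes them from the heap,
// so walk over a clone to keep the original intact.
$sorted = [];
foreach (clone $heap as $value) {
    $sorted[] = $value;
}

print_r($sorted);
echo count($heap), " elements still in the heap\n";
```

The trade-off is that a heap gives you no random access by index, so it only fits if iterating in order (or repeatedly taking the largest element) is all you need.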
It may just be me, but maybe they were looking for readability and maintainability as well?
I mean, you're naming your variables $arr, and $c and $middle, without even bothering to place proper documentation.
Example:
/**
* generateSortedArray() Function to generate a descending sorted array
*
* @param int $start Beginning with this number
* @param int $elementsNum Number of elements in array
* @param int $dev Maximum difference between elements
* @return array Sorted descending array.
*/
function generateSortedArray($start = 300000, $elementsNum = 10000, $dev = 30) {
$arr = array(); #Variable definition
for ($i = 1; $i <= $elementsNum; $i++) {
$rand = mt_rand(1, $dev); #Generate a random number
$start -= $rand; #Subtract from initial value
$arr[] = $start; #Push to array
}
return $arr;
}
I need a unique string from an array so that I can tell when it changes without measuring the inputs of that array. I'm trying to work out if it is computationally efficient to calculate a value rather than add code to look out for changes in the array. The array itself can have a variety of values and for future proofing I don't want to try and measure whether new values have been added to the array, I'd much rather just create some string or hash that will change if the array itself changes.
So for example:
$a = Array(
'var1' => 1,
'var2' => 2,
'var3' => 3,
);
If I were to use md5(http_build_query($a)), perhaps with an added ksort to make sure the order of the keys hasn't changed, this might then produce a unique string that I can use to compare against another run of the application to evaluate whether the array has changed.
I'm looking for an alternate, possibly faster or more elegant solutions to this.
I use md5(serialize($array)) for this. It's better because it works for multi-dimensional arrays.
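If key order shouldn't count as a change, the keys can be sorted recursively before serializing. A rough sketch, where ksortRecursive() and arrayFingerprint() are names I made up:

```php
<?php
// Sort keys at every nesting level so that key order does not affect the hash.
function ksortRecursive(array &$data): void
{
    ksort($data);
    foreach ($data as &$value) {
        if (is_array($value)) {
            ksortRecursive($value);
        }
    }
    unset($value); // break the reference left behind by foreach
}

// arrayFingerprint() is a hypothetical helper: md5 of the serialized,
// key-sorted array. The caller's array is untouched because it is passed by value.
function arrayFingerprint(array $data): string
{
    ksortRecursive($data);
    return md5(serialize($data));
}

$a = array('var1' => 1, 'var2' => array('x' => 1, 'y' => 2));
$b = array('var2' => array('y' => 2, 'x' => 1), 'var1' => 1);

var_dump(arrayFingerprint($a) === arrayFingerprint($b)); // same content, different key order
```

Be aware that serialize() is type-sensitive, so the integer 1 and the string '1' produce different fingerprints. md5 is fine for change detection like this, but not for anything security-related.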
Thanks for all the ideas guys.
I've tried all of them except a sha-256 which my server doesn't have installed.
Here's the results:
Average (http_build_query): 1.3954045954045E-5
Average (diff): 0.00011533766233766
Average (serialize): 1.7588411588412E-5
Average (md5): 1.6036963036966E-5
Average (implode-haval160,4): 1.5349650349649E-5
That's running the operation 1000 times and averaging the result. After refreshing a couple times I could tell that the http_build_query was the quickest. I guess my next question would be if anyone can think of any pitfalls of using this method?
Thanks
Here's my code:
class a {
    static $input;
    // static, because it is called as a::test() below
    static function test() {
        $start = null;
        $s = $e = $d = $g = $h = $i = $k = array();
        self::$input = array();
        for ($x = 0; $x <= 30; $x++) {
            self::$input['variable_' . $x] = rand();
        }
        // microtime(true) returns a float, so the subtraction is a real time difference
        for ($x = 0; $x <= 1000; $x++) {
            $start = microtime(true);
            $c = http_build_query(self::$input);
            ($c == $c);
            $s[] = microtime(true) - $start;
        }
        for ($x = 0; $x <= 1000; $x++) {
            $start = microtime(true);
            $c = md5(http_build_query(self::$input));
            ($c == $c);
            $e[] = microtime(true) - $start;
        }
        for ($x = 0; $x <= 1000; $x++) {
            $start = microtime(true);
            $c = array_diff(self::$input, self::$input);
            $d[] = microtime(true) - $start;
        }
        for ($x = 0; $x <= 1000; $x++) {
            $start = microtime(true);
            $c = serialize(self::$input);
            ($c == $c);
            $g[] = microtime(true) - $start;
        }
        for ($x = 0; $x <= 1000; $x++) {
            $start = microtime(true);
            $c = hash("haval160,4", implode(',',self::$input));
            ($c == $c);
            $h[] = microtime(true) - $start;
        }
        echo "<pre>";
        //print_r($s);
        echo "Average (http_build_query): " . array_sum($s) / count($s) . "<br>";
        echo "Average (diff): " . array_sum($d) / count($d) . "<br>";
        echo "Average (serialize): " . array_sum($g) / count($g) . "<br>";
        echo "Average (md5): " . array_sum($e) / count($e). "<br>";
        echo "Average (implode-haval160,4): " . array_sum($h) / count($h);
    }
}
a::test();
PHP has an array_diff() function, don't know if it's of any use for you.
Otherwise, you can use the incremental hashing offered by PHP: http://www.php.net/manual/en/function.hash-init.php by iterating over each value of the array and adding it to the incremental hash.
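A rough sketch of what that could look like, assuming a flat array of scalar values. incrementalArrayHash() is a made-up name; hash_init(), hash_update() and hash_final() are the real functions from the hash extension.

```php
<?php
// incrementalArrayHash() is a hypothetical helper: it feeds each key/value
// pair of a flat array into one hashing context, without building one big string.
function incrementalArrayHash(array $data, string $algo = 'sha256'): string
{
    $ctx = hash_init($algo);
    foreach ($data as $key => $value) {
        $k = (string) $key;
        $v = (string) $value; // assumes scalar values only
        // Length prefixes keep ('a' => 'bc') and ('ab' => 'c') from colliding.
        hash_update($ctx, strlen($k) . ':' . $k . strlen($v) . ':' . $v);
    }
    return hash_final($ctx);
}

$a = array('var1' => 1, 'var2' => 2, 'var3' => 3);
echo incrementalArrayHash($a), PHP_EOL;
```

The result depends on the order of the elements, so ksort() the array first if the order should not matter. For nested arrays the loop would have to recurse.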
You could always just do
$str = implode(",", $a);
$check = hash("sha256", $str);
Theoretically, that should detect changes in array size, data, or ordering.
Of course, you can use whatever hash you wish.