Is this a fair method of algorithm comparison?


I wanted to convert an array to lowercase and was wondering the most efficient method. I came up with two options, one using array_walk and one using foreach and wanted to compare them. Is this the best way to compare the two? Is there an even more efficient method that I have overlooked?
<?
$a = array_fill(0, 200000, genRandomString());
$b = array_fill(0, 200000, genRandomString());
$t = microtime(true);
array_walk($a, create_function('&$a', '$a = strtolower($a);'));
echo "array_walk: ".(microtime(true) - $t);
echo "<br />";
$t = microtime(true);
foreach($b as &$source) { $source = strtolower($source); }
echo "foreach: ".(microtime(true) - $t);
function genRandomString($length = 10) {
    $characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    $string = '';
    for ($p = 0; $p < $length; $p++) {
        $string .= $characters[mt_rand(0, strlen($characters)-1)];
    }
    return $string;
}
The output:
array_walk: 0.52975487709045
foreach: 0.29656505584717

Two questions in one!
How to run the tests:
Personally, I'd write individual test scripts for each method, then use the Apache ab utility to run the tests:
ab -n 100 -c 1 http://localhost/arrayWalkTest.php
ab -n 100 -c 1 http://localhost/foreachTest.php
That gives me a much more detailed set of statistics for comparison.
I'd also try to ensure that the two methods were working on identical datasets for each test, not different random data.
The most efficient method:
You should unset($source) after your loop as a safety measure: because you're accessing by reference in the loop, $source will still contain a reference to the last entry in the array and may give you unpredictable results if you reference $source anywhere else in your script.
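The hazard can be reproduced in a few lines. This is only a minimal sketch: the variable names ($items, $item, $safe, $entry) are made up for illustration, and the second loop stands in for any later code that happens to reuse the loop variable.

```php
<?php
$items = ['A', 'B', 'C'];

foreach ($items as &$item) {
    $item = strtolower($item);
}
// $item is still a reference to the last element, $items[2].

// Reusing $item in a later by-value loop writes through that reference.
foreach ($items as $item) {
    // empty body: the assignment to $item on each pass does the damage
}
print_r($items); // last element has been overwritten: a, b, b

// Same steps, but with unset() right after the by-reference loop.
$safe = ['A', 'B', 'C'];
foreach ($safe as &$entry) {
    $entry = strtolower($entry);
}
unset($entry); // break the reference

foreach ($safe as $entry) {
    // empty body again
}
print_r($safe); // a, b, c
```

Without the unset(), the array is silently corrupted even though the second loop never assigns anything explicitly.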

I have had lots of weird results in the past when using the microtime approach rather than a dedicated profiler, such as the one in Xdebug or Zend Debugger. Also, for a fair comparison your arrays should be identical instead of two random arrays.
In addition, you could consider using array_map and strtolower:
$a = array_map('strtolower', $a);
which would save you the lambda for array_walk. Anonymous functions created with create_function (unlike PHP 5.3's anonymous functions) are known to be slow and strtolower is a native function, so using it directly should be faster.
I did a quick benchmark and I don't see any relevant speed difference between this approach and your foreach. As so often, I'd say it's a micro-optimization. Of course, you should test that in a real-world application if you think it matters. Synthetic benchmarks are fun, but ultimately useless.
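For what it's worth, a comparison along those lines might look like the sketch below. It is not the benchmark I actually ran: the helper name time_lowercase, the fixed seed, and the array size are assumptions chosen for illustration. The point is that every method gets the same input and the results are kept so they can be checked for equality.

```php
<?php
// Hypothetical helper: times $fn on its own copy of $input and
// returns [seconds, result]. Uses microtime(true), as in the question.
function time_lowercase(callable $fn, array $input): array
{
    $start = microtime(true);
    $result = $fn($input);
    return [microtime(true) - $start, $result];
}

// One shared dataset so every method sees the same strings.
mt_srand(42); // fixed seed, an arbitrary choice
$data = [];
for ($i = 0; $i < 20000; $i++) {
    $data[] = strtoupper(md5((string) mt_rand()));
}

$methods = [
    'array_map' => function (array $a) {
        return array_map('strtolower', $a);
    },
    'foreach' => function (array $a) {
        foreach ($a as &$s) {
            $s = strtolower($s);
        }
        unset($s); // drop the dangling reference
        return $a;
    },
    'array_walk' => function (array $a) {
        array_walk($a, function (&$s) {
            $s = strtolower($s);
        });
        return $a;
    },
];

$results = [];
foreach ($methods as $name => $fn) {
    [$seconds, $results[$name]] = time_lowercase($fn, $data);
    printf("%-10s %.6f s\n", $name, $seconds);
}
```

A single pass like this is still noisy, so treat the printed numbers as a rough indication at best.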
On a side note, to change the case of the array keys, you can use
array_change_key_case: changes the case of all keys in an array
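A small usage sketch, with a made-up $row array for illustration:

```php
<?php
// $row is a hypothetical example record.
$row = ['Name' => 'John', 'EMAIL' => 'John@Example.com'];

// CASE_LOWER is the default; it is spelled out here for clarity.
$lowerKeys = array_change_key_case($row, CASE_LOWER);

// Keys become 'name' and 'email'; the values are left untouched.
print_r($lowerKeys);
```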

I don't know PHP, so this is a wild guess:
str_split(strtolower(implode("", $a)))

Related

Is there any manual method other than "str_repeat" to repeat the string?

I mean if we give 3, b as parameters passed into function, it should return "bbb" by using loops.
I've tried some code, but I do not want to post it because it might look crazy to a well-versed developer. I can provide a link: this question was asked in an interview, where they mainly want it solved in C or C++. Since I am a PHP practitioner, I am curious to know whether it is possible in PHP. Below is the link (ROUND 2: SIMPLE CODING(3 hours))
https://www.geeksforgeeks.org/zoho-interview-set-3-campus/

A PHP function to do that would probably look like this:
function string_repeat($num, $string)
{
    $result = "";
    for ($x = 0; $x < $num; $x++) {
        $result .= $string;
    }
    return $result;
}
So calling echo string_repeat(3, 'b'); would output:
bbb

One way would be to keep around a "dummy" string that is at least as long as any string you want to generate. Then, we can use preg_replace to replace each character with whatever the input is. Finally, we can take a substring of the replaced string to get the desired length.
$dummy = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx";
$length = 3;
$dummy = preg_replace('/./', 'b', $dummy);
$output = substr($dummy, 0, $length);
echo $output;
This prints:
bbb
You could wrap this into a helper function, if you really wanted to do that. Note that internally the regex engine is most likely doing some looping of its own, so we are not really freeing ourselves from looping, just hiding it from the current context.
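Such a helper might look roughly like this. The function name repeat_with_dummy and the range check are assumptions added for illustration, and it uses preg_replace_callback instead of preg_replace so that sequences like $1 in the input are not treated as backreferences.

```php
<?php
// Hypothetical helper wrapping the dummy-string idea.
function repeat_with_dummy(string $input, int $count): string
{
    // Placeholder string; its length caps the repeat count.
    $dummy = 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
    $max = strlen($dummy);

    if ($count < 0 || $count > $max) {
        throw new InvalidArgumentException("count must be between 0 and $max");
    }

    // Replace every placeholder character with the input string.
    $filled = preg_replace_callback('/./', function () use ($input) {
        return $input;
    }, $dummy);

    // Keep only as many bytes as $count copies of the input need.
    return substr($filled, 0, $count * strlen($input));
}

echo repeat_with_dummy('b', 3), "\n";
```

The hard cap is the obvious weakness of this approach compared with a plain loop.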

strpos for arrays

Let's say I have 90 'indexes' in my array and a function that checks whether a value exists in that array. Would it be faster if I used strpos with a string instead?
Instead of using in_array() on
$data = array('John','Mary','Steven');
the data would be
$data = 'John.Mary.Steven';
and then I would just call strpos() on that string.

Without bothering to profile it, I'd say that imploding to a string followed by strpos would be slower than PHP's built-in in_array() function.... because you're adding all the overhead of converting the entire array (all 90 elements) to a string before you can even use strpos(). Premature Micro-optimisation isn't a good idea, unless you really need it, and then you should test your ideas.
EDIT
If you're using your own function instead of in_array(), it probably is slower, but raises the question "why"?

I was quite sure that use of strpos will be slower but I made a test below, and it looks like (at least in this particular case - searching for the last element) strpos is faster than in_array.
$array = array();
for ($i = 0; $i < 10000; $i++) {
    $array[] = md5($i . date('now'));
}
$string = implode('.', $array);
$lastElement = $array[9999];
$start = microtime(TRUE);
$isit = in_array($lastElement, $array);
$end = microtime(TRUE);
echo ($end - $start) . PHP_EOL;
$start = microtime(TRUE);
$pos = strpos($string, $lastElement);
$end = microtime(TRUE);
echo ($end - $start) . PHP_EOL;
Results I'm getting:
0.0012338161468506
0.00036406517028809
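Speed aside, the two checks are not equivalent: strpos matches substrings, while in_array matches whole elements. The sketch below uses the sample names from the question. Wrapping the string in the delimiter is just one possible workaround, and it assumes the delimiter never occurs inside a value.

```php
<?php
$names = ['John', 'Mary', 'Steven'];
$joined = implode('.', $names);

// Substring search reports a hit for a value that is not an element.
// Compare against false explicitly: a match at position 0 is falsy.
$substringHit = strpos($joined, 'Mar') !== false;
$elementHit = in_array('Mar', $names, true);

// Wrapping haystack and needle in the delimiter restricts matches
// to whole elements.
$wrapped = '.' . $joined . '.';
$wholeMar = strpos($wrapped, '.Mar.') !== false;
$wholeMary = strpos($wrapped, '.Mary.') !== false;

var_dump($substringHit, $elementHit, $wholeMar, $wholeMary);
```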

According to this test, looping your array and checking with strpos() would be slower than just using in_array(). They claim that in_array() is actually 2.4 times faster than doing a foreach loop with strpos().
On the other hand, a question here on SO seems to indicate otherwise.
Efficiency of Searching an Array Vs Searching in Text.... Which is Better?
If I were you, I would run my own performance tests to see what works best with my specific set of data.

Is it quicker to use array_keys() in a foreach statement or set a value variable that's never used?

Suppose I want to iterate over an array and either I never look at the values, or I am setting things in them, so I only want the keys. Which is quicker:
// Set a variable each iteration which is unused.
foreach ($array as $key => $value) {
    $array[$key]['foo'] = 'bar';
}
// Call array_keys() before iterating.
foreach (array_keys($array) as $key) {
    $array[$key]['foo'] = 'bar';
}

I think this would also work, and may be quicker:
foreach ($array as &$value) {
    $value['foo'] = 'bar';
}
UPDATE: I did a little test, and it seems this is faster. http://codepad.org/WI7Mtp8K

Leave aside that the second example is contrived; as Ashley Banks noted, why would you be doing this?
The first example would be more performant of the two. The second has the overhead of the additional function call to array_keys().
Check out the PHP Benchmark for additional performance tests. When in doubt, benchmark it yourself with microtime().

People need to stop prioritizing questioning the motive of the OP.
I did a simple microtime test and noticed barely any notable difference in execution time.
The method - creating the test-array:
$arr = array();
for ($i = 0; $i < 10000; $i++) {
    $arr[] = mt_rand(0, PHP_INT_MAX);
    //NOTE: I also tested setting values in random keys, no measurable difference
}
Then use that array for testing:
//Method 1
$start = microtime(true);
foreach (array_keys($arr) as $k) {
    $arr[$k] = 0; //Do something
}
$end = microtime(true);
echo 'we spent '.($end - $start).' seconds doing this.';
Here is the code for testing the other method (I ran it separately; otherwise the results varied even more):
//Method 2
$start = microtime(true);
foreach ($arr as $k => $v) {
    $arr[$k] = 0; //Do something
}
$end = microtime(true);
echo 'we spent '.($end - $start).' seconds doing this.';
My timings were around 2 milliseconds runtime for both. I guess my system isn't stable enough for consistent measures of microseconds.
I could not determine if one was faster than the other.
I also tested running each foreach 1000 times with a smaller array to measure the method itself, but I just got the same 2 milliseconds result.
So what does this mean?
Based on these simple tests, the choice of method is pointless in regards to performance.
Maybe you can measure a tiny improvement with one method over the other if you are running extremely large arrays on extremely stable systems, or running an extremely large number of foreach loops one way or the other.
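One way to cut down on the noise is to repeat each measurement and keep the fastest run. The sketch below is not what I ran above: the helper name best_of and the run count are assumptions, and it uses hrtime(), which needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: runs $fn $runs times and returns the fastest
// time in seconds.
function best_of(callable $fn, int $runs = 20): float
{
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = hrtime(true); // monotonic clock, nanoseconds
        $fn();
        $elapsed = (hrtime(true) - $start) / 1e9;
        $best = min($best, $elapsed);
    }
    return $best;
}

$arr = range(1, 10000);

// Each closure captures $arr by value, so every run starts from the
// same unmodified data and the outer $arr is never changed.
$viaKeys = best_of(function () use ($arr) {
    foreach (array_keys($arr) as $k) {
        $arr[$k] = 0;
    }
});
$viaPairs = best_of(function () use ($arr) {
    foreach ($arr as $k => $v) {
        $arr[$k] = 0;
    }
});

printf("array_keys: %.6f s\nkey => value: %.6f s\n", $viaKeys, $viaPairs);
```

Taking the minimum filters out runs that were disturbed by other activity on the machine, though it still says nothing about a real application.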

Is it better to call function in a nested way or to split each passage inside a var?

Take these three examples:
ONE
return lowercaseKeys(json_decode(trim($json),true));
TWO
$trimmed = trim($json);
$array = json_decode($trimmed,true);
$return = lowercaseKeys($array);
return $return;
THREE
$return = trim($json);
$return = json_decode($return,true);
$return = lowercaseKeys($return);
return $return;
Aside from readability, which is the best performance wise?
What is considered best practice?
p.s. code is only an example, not related to the question, I just copy pasted from my notepad window.

Number one rule is do whatever is most readable when dealing with micro-optimizations. But here is a small test I did.
<?php
$iterations = 1000000;
$tests = array('one', 'two', 'three');
$json = json_encode($tests);
foreach ($tests as $function) {
    echo $function;
    $start = microtime(true);
    for ($i = 1; $i <= $iterations; $i++) {
        $function($json);
    }
    $end = microtime(true);
    echo ' - ' . ($end - $start) . " sec\n";
}
function one($json) {
    return array_change_key_case(json_decode(trim($json),true), CASE_LOWER);
}
function two($json) {
    $trimmed = trim($json);
    $array = json_decode($trimmed,true);
    $return = array_change_key_case($array, CASE_LOWER);
    return $return;
}
function three($json) {
    $return = trim($json);
    $return = json_decode($return,true);
    $return = array_change_key_case($return, CASE_LOWER);
    return $return;
}
?>
Results:
one - 3.3994290828705 sec
two - 3.5148930549622 sec
three - 3.5086510181427 sec
Option one is indeed a tiny bit faster, but that was one million iterations, and the time difference still wouldn't even be noticeable. With smaller amounts of iterations, the timing is too variable to even have a pattern to declare one better than the other.

Performance wise, I don't think you'd really notice any difference. Would be minuscule...
But I vote for option 2, because it's the easiest to read (as opposed to option 1), and I don't think it's a good idea to keep overwriting it like you do in option 3 (not very good form).

There should be no difference performance wise in your example.

If PHP is doing it right, these should all be the same once tokenized. Assuming that's true, option two is the most readable and easiest to modify in the future. Perhaps this page is worth reading.

Option 1 will be maybe a microsecond faster than the other 2 options and use a few bits less physical memory, but any difference will be completely negligible unless multiplied on an exponential level. The core here is going to be the ultimate readability. I find Option 1 to be perfectly suitable personally, but I recognize that in some of the teams where I work, this option would not meet the standard for the lowest common denominator of developer. Option 2 and Option 3 really are exactly the same since you're not doing anything with the extra variables that are created. They both create 3 separate tokens. Option 2 is more explicitly readable in that the variables describe the method being applied at that stage, so I would vote for 2 as well.

I tend to do things the 'option 2' way, myself. You never know when you might need something from halfway through. It helps with readability too. OK, it's not quite as efficient, but for the sake of being able to understand and edit your code at a later date (especially if someone else has to do it), option 2 feels best to me.

I like the first option, and it uses less memory, but the difference is not significant.

Personally I like option one, as for your questions:
which is the best performance wise? Option One
What is considered best practice? Option One and Two depending on the complexity of the function.
But let's look at how they look as a function
Option One:
// Simple, readable
function optionOne($json) {
    return lowercaseKeys(json_decode(trim($json),true));
}
Option Two:
// Still readable with a little more detail to a novice developer
function optionTwo($json) {
    $trimmed = trim($json);
    $array = json_decode($trimmed,true);
    $return = lowercaseKeys($array);
    return $return;
}
Option Three:
// This might cause some problems with the $return variable
// Still looks like it would work but I'm not fond of this option
function optionThree($json) {
    $return = trim($json);
    $return = json_decode($return,true);
    $return = lowercaseKeys($return);
    return $return;
}

Which is faster in PHP, $array[] = $value or array_push($array, $value)?

What's better to use in PHP for appending an array member,
$array[] = $value;
or
array_push($array, $value);
?
Though the manual says you're better off to avoid a function call, I've also read $array[] is much slower than array_push(). What are some clarifications or benchmarks?

I personally feel like $array[] is cleaner to look at, and honestly splitting hairs over milliseconds is pretty irrelevant unless you plan on appending hundreds of thousands of strings to your array.
I ran this code:
$t = microtime(true);
$array = array();
for ($i = 0; $i < 10000; $i++) {
    $array[] = $i;
}
print microtime(true) - $t;
print '<br>';
$t = microtime(true);
$array = array();
for ($i = 0; $i < 10000; $i++) {
    array_push($array, $i);
}
print microtime(true) - $t;
The first method using $array[] is almost 50% faster than the second one.
Some benchmark results:
Run 1
0.0054171085357666 // array_push
0.0028800964355469 // array[]
Run 2
0.0054559707641602 // array_push
0.002892017364502 // array[]
Run 3
0.0055501461029053 // array_push
0.0028610229492188 // array[]
This shouldn't be surprising, as the PHP manual notes this:
If you use array_push() to add one element to the array it's better to use $array[] = because in that way there is no overhead of calling a function.
The way it is phrased I wouldn't be surprised if array_push is more efficient when adding multiple values. Out of curiosity, I did some further testing, and even for a large amount of additions, individual $array[] calls are faster than one big array_push. Interesting.

The main use of array_push() is that you can push multiple values onto the end of the array.
It says in the documentation:
If you use array_push() to add one element to the array it's better to use $array[] = because in that way there is no overhead of calling a function.

From the PHP documentation for array_push:
Note: If you use array_push() to add one element to the array it's better to use $array[] = because in that way there is no overhead of calling a function.

Word on the street is that [] is faster because no overhead for the function call. Plus, no one really likes PHP's array functions...
"Is it...haystack, needle....or is it needle haystack...ah, f*** it...[] = "

One difference is that you can call array_push() with more than two parameters, i.e. you can push more than one element at a time to an array.
$myArray = array();
array_push($myArray, 1,2,3,4);
echo join(',', $myArray);
prints 1,2,3,4

A simple $myarray[] assignment will be quicker, because you are just appending an item to the array without the overhead that a function call would bring.

Since array_push is a function, calling it repeatedly inside a loop means that each call allocates memory on the stack.
With $array[] = $value, we are just assigning a value to the array.

The second one is a function call, so generally it should be slower than using core array-access features. But I think even one database query within your script will outweigh 1000000 calls to array_push().
See here for a quick benchmark using 1000000 inserts: https://3v4l.org/sekeV

I just want to add: array_push(...) returns the new number of elements in the array (per the PHP documentation), which can be useful and more compact than $myArray[] = ...; $total = count($myArray);.
Also array_push(...) is meaningful when variable is used as a stack.
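A short sketch of both points, with made-up variable names:

```php
<?php
$stack = [];

// array_push() returns the new element count, so no separate
// count() call is needed.
$size = array_push($stack, 'a', 'b');

// The bracket form appends too, but the size has to be queried
// separately.
$stack[] = 'c';
$sizeAfter = count($stack);

// Used as a stack, array_push() pairs naturally with array_pop().
$top = array_pop($stack);

var_dump($size, $sizeAfter, $top);
```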
