If I have
return recordsAffected > 0;
which would return either true or false, do I need to put return recordsAffected > 0 ? true : false?
No, you do not have to as your code works just fine. You may find some developers recommend doing it because it is clearer to read and understand but that's a matter of personal opinion.
Always code as if the person maintaining your code is a violent psychopath who knows where you live - Martin Golding
You do not need that. The main reason is that it would be two operations instead of one: first the comparison, then the value selection. I also want to mention that every conditional operator (the ternary included) adds a small runtime cost.
Short test:
<?php
header('Content-Type: text/plain; charset=utf-8');
$start = microtime(true);
for($i = 1; $j = 1, $i <= 10000000; $i++){
($i == $j);
}
$end = microtime(true);
echo 'Not ternary: ', $end - $start, PHP_EOL;
$start = microtime(true);
for($i = 1; $j = 1, $i <= 10000000; $i++){
($i == $j ? true : false);
}
$end = microtime(true);
echo 'Ternary: ', $end - $start, PHP_EOL;
?>
And its results.
You certainly do not need to. The intention and semantics of
return recordsAffected > 0;
is perfectly clear. This should hold true for every decent programmer reading your code.
return recordsAffected > 0 ? true : false;
is redundant at best, but I'd go further and call it detrimental. The second snippet adds nothing to the statement but complexity. I'll bet that unless you write code like this all the time (and I believe most decent programmers don't), the second statement will take you at least two passes to grasp, if not more. When there are two semantically equal solutions you should always stick to the clearest one, which is not necessarily the most explicit one. Nobody would ever write something like
if(recordsAffected > 0 ? true : false)
{
}
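To illustrate the equivalence, here is a minimal PHP sketch. The two function names are hypothetical and only serve to compare the forms; the comparison operator already produces a boolean, so the ternary changes nothing about the result.

```php
<?php
// Hypothetical helpers for illustration; neither appears in the original question.
function hasAffectedRows(int $recordsAffected): bool
{
    // The comparison operator already yields a boolean.
    return $recordsAffected > 0;
}

function hasAffectedRowsTernary(int $recordsAffected): bool
{
    // Same result, with a redundant extra step.
    return $recordsAffected > 0 ? true : false;
}

var_dump(hasAffectedRows(3) === hasAffectedRowsTernary(3)); // bool(true)
var_dump(hasAffectedRows(0) === hasAffectedRowsTernary(0)); // bool(true)
```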
Related
I'm trying to produce a timing attack in PHP and am using PHP 7.1 with the following script:
<?php
$find = "hello";
$length = array_combine(range(1, 10), array_fill(1, 10, 0));
for ($i = 0; $i < 1000000; $i++) {
for ($j = 1; $j <= 10; $j++) {
$testValue = str_repeat('a', $j);
$start = microtime(true);
if ($find === $testValue) {
// Do nothing
}
$end = microtime(true);
$length[$j] += $end - $start;
}
}
arsort($length);
$length = key($length);
var_dump($length . " found");
$found = '';
$alphabet = array_combine(range('a', 'z'), array_fill(1, 26, 0));
for ($len = 0; $len < $length; $len++) {
$currentIteration = $alphabet;
$filler = str_repeat('a', $length - $len - 1);
for ($i = 0; $i < 1000000; $i++) {
foreach ($currentIteration as $letter => $time) {
$testValue = $found . $letter . $filler;
$start = microtime(true);
if ($find === $testValue) {
// Do nothing
}
$end = microtime(true);
$currentIteration[$letter] += $end - $start;
}
}
arsort($currentIteration);
$found .= key($currentIteration);
}
var_dump($found);
This is searching for a word with the following constraints
a-z only
up to 10 characters
The script finds the length of the word without any issue, but the value of the word never comes back as expected with a timing attack.
Is there something I am doing wrong?
The script loops through lengths and correctly identifies the length. It then loops through each letter (a-z) and checks the speed of each. In theory, 'haaaa' should be slightly slower than 'aaaaa' because the first letter is an h. It then carries on for each of the five letters.
Running gives something like 'brhas' which is clearly wrong (it's different each time, but always wrong).
Is there something I am doing wrong?
I don't think so. I tried your code and, like you and the other people who tried in the comments, I get completely random results for the second loop. The first one (the length) is mostly reliable, though not 100% of the time. By the way, the $argv[1] trick suggested didn't really improve the consistency of the results, and honestly I don't see why it should.
Since I was curious I had a look at the PHP 7.1 source code. The string identity function (zend_is_identical) looks like this:
case IS_STRING:
return (Z_STR_P(op1) == Z_STR_P(op2) ||
(Z_STRLEN_P(op1) == Z_STRLEN_P(op2) &&
memcmp(Z_STRVAL_P(op1), Z_STRVAL_P(op2), Z_STRLEN_P(op1)) == 0));
Now it's easy to see why the first timing attack on the length works great. If the length is different then memcmp is never called and therefore it returns a lot faster. The difference is easily noticeable, even without too many iterations.
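The short-circuit described above can be mirrored in user-land PHP. This is only an illustrative sketch with a hypothetical function name, not the engine's code; the pointer-equality shortcut from the C snippet has no direct PHP equivalent and is left out.

```php
<?php
// Illustrative user-land mirror of the C logic above; this is NOT the engine's code.
function looksIdentical(string $a, string $b): bool
{
    // Fast path: different lengths return immediately, with no byte comparison.
    if (strlen($a) !== strlen($b)) {
        return false;
    }
    // Slow path: only equal-length strings reach the byte-by-byte comparison.
    return strcmp($a, $b) === 0;
}

var_dump(looksIdentical('hello', 'a'));     // bool(false), rejected by the length check
var_dump(looksIdentical('hello', 'hellp')); // bool(false), rejected by the byte comparison
var_dump(looksIdentical('hello', 'hello')); // bool(true)
```

Only the second and third calls ever reach the byte comparison, which is why the length leaks so much more readily than the content.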
Once you have the length figured out, in your second loop you are basically trying to attack the underlying memcmp. The problem is that the difference in timing highly depends on:
the implementation of memcmp
the current load and interfering processes
the architecture of the machine.
I recommend this article titled "Benchmarking memcmp for timing attacks" for more detailed explanations. They did a much more precise benchmark and still were not able to get a clear noticeable difference in timing. I'm simply going to quote the conclusion of the article:
In conclusion, it highly depends on the circumstances if a memcmp() is subject to a timing attack.
Suppose the string is:
$a = "abc-def";
$i = ''; // stays empty when no violation is found
if (preg_match("/[^a-z0-9]/i", $a, $m)){
$i = "I stopped scanning '$a' because I found a violation in it while scanning it from left to right. The violation was: $m[0]";
}
echo $i;
The example above should indicate that "-" was the violation.
I would like to know if there is a non-preg_match way of doing this.
I will likely run benchmarks if there is a non-preg_match way of doing this perhaps 1000 or 1 million runs, to see which is faster and more efficient.
In the benchmarks "$a" will be much longer, to make sure the check is not scanning the entire "$a" and that it stops as soon as it detects a violation within "$a".
Based on information I have witnessed on the internet, preg_match stops when the first match is found.
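A small sketch of that behaviour, assuming a made-up input with two violations: with the PREG_OFFSET_CAPTURE flag, preg_match reports where the first match sits, and the later violation is never reported.

```php
<?php
// Hypothetical input containing two violations: '-' at offset 3 and '!' at offset 7.
$input = 'abc-def!ghi';

// PREG_OFFSET_CAPTURE makes each match an array of [text, byte offset].
if (preg_match('/[^a-z0-9]/i', $input, $m, PREG_OFFSET_CAPTURE) === 1) {
    list($char, $offset) = $m[0];
    echo "first violation '$char' at offset $offset", PHP_EOL;
}
```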
UPDATE:
This is based on the answer given by "bishop", which will likely be chosen as the accepted answer shortly.
I modified it a little because I only want it to report the violating character, but I also commented that line out so the benchmark can run without extra output.
Let's do 1 million runs based on that answer.
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000){
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if ($validLen < strlen($input)){
#echo "violation at: ". substr($input, $validLen,1);
}
$count = $count + 1;
};
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
The result is: 0.606614112854
(about 0.6 seconds)
Let's do it with the preg_match method.
I hope everything is the same (and fair).
(I say this because there is the ^ character in the preg_match.)
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000){
$input = 'abc-def';
preg_match("/[^a-z0-9]/i", $input, $m);
#echo "violation at:". $m[0];
$count = $count + 1;
};
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
I use "dif" as shorthand for "difference".
The "dif" was 1.1145210266113
(about 11 percent more than a whole second)
(1.2 would have meant it is 2x slower than the strspn way; at 1.11 it is roughly 1.8x slower)
You want to find the location of the first character not in the given range, without using regular expressions? You might want strspn or its complement strcspn:
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if (strlen($input) !== $validLen) {
printf('Input invalid, starting at %s', substr($input, $validLen));
} else {
echo 'Input is valid';
}
Outputs Input invalid, starting at -def. See it live.
strspn (and its complement) are very old, very well specified (POSIX even). The standard implementations are optimized for this task. PHP just leverages that platform implementation, so PHP should be fast, too.
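If only the offending character is needed, the same idea can be wrapped in a small helper. This is a sketch; the function name firstViolation is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: returns the first character of $input not in $allowed, or null.
function firstViolation(string $input, string $allowed): ?string
{
    // strspn() gives the length of the leading run made up only of allowed characters.
    $validLen = strspn($input, $allowed);
    return $validLen < strlen($input) ? $input[$validLen] : null;
}

$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
var_dump(firstViolation('abc-def', $allowed)); // string(1) "-"
var_dump(firstViolation('abcdef', $allowed));  // NULL
```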
I am in doubt what to use:
foreach(){
// .....
if(!in_array($view, $this->_views[$condition]))
array_push($this->_views[$condition], $view);
// ....
}
OR
foreach(){
// .....
array_push($this->_views[$condition], $view);
// ....
}
$this->_views[$condition] = array_unique($this->_views[$condition]);
UPDATE
The goal is to get an array of unique values. This can be done either by checking with in_array each time whether the value already exists, or by adding all values and calling array_unique at the end. So is there any major difference between these two ways?
I think the second approach would be more efficient. In fact, array_unique sorts the array then scans it.
Sorting is done in N log N steps, then scanning takes N steps.
The first approach takes N^2 steps (for each element, in_array scans up to N previous elements). On big arrays, there is a very big difference.
Honestly if you're using a small dataset it does not matter which one you use. If your dataset is in the 10000s you'll most definitely want to use a hash map for this sort of thing.
This is assuming the views are a string or something, which it looks like it is.
This is typically O(n) and possibly the fastest way to deal with tracking unique values.
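$unique_views[$condition] = array();
foreach($views as $view)
{
// Keys act as a hash set, so each lookup is O(1) on average.
if(!isset($unique_views[$condition][$view]))
{
$unique_views[$condition][$view] = true;
}
}
// array_keys($unique_views[$condition]) then yields the unique views.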
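A self-contained sketch of the hash-map idea, assuming the views are strings (the sample values are made up). Because assigning to an existing key just overwrites it, the existence check can be dropped entirely.

```php
<?php
// Hypothetical input; in the question these values arrive inside a loop.
$views = ['home', 'list', 'home', 'detail', 'list'];

$seen = [];
foreach ($views as $view) {
    // Assigning to an existing key simply overwrites it, so no lookup is needed.
    $seen[$view] = true;
}

// The keys are the unique values, in first-seen order.
$uniqueViews = array_keys($seen);
print_r($uniqueViews);
```

Note that this only works for values that are valid array keys (strings and integers), and numeric strings come back as integers.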
TL;DR: foreach combined with if (!in_array()) is better.
Truthfully, you should not really worry about which performs better; in most cases the difference is so small it's negligible (unless you're really doing some big data stuff). I would suggest going with whatever seems more readable.
If you're interested, check out this script I wrote. It loops each case 100,000 times and both take between 50 and 200 ms.
https://3v4l.org/lkTCF
Note that array_unique() keeps the original keys so to counter that we also have to wrap the result with array_values().
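A quick sketch of the key-preservation behaviour, using made-up sample values:

```php
<?php
$values = ['a', 'b', 'a', 'c', 'b'];

// array_unique() keeps the first occurrence of each value under its original key.
$unique = array_unique($values);
var_dump(array_keys($unique));    // keys 0, 1, 3: gaps where duplicates were dropped

// array_values() reindexes the result from zero.
$reindexed = array_values($unique);
var_dump(array_keys($reindexed)); // keys 0, 1, 2
```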
In case the link ever dies:
<?php
$loops = 100000;
$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
$x = [1,2,3,4,6,7,8,9];
for ($i = 0; $i <= 10; $i++) {
if (!in_array($i, $x)) {
$x[] = $i;
}
}
}
$duration = microtime(true) - $start;
echo "in_array took $duration<br>".PHP_EOL;
$start = microtime(true);
for ($l = 0; $l < $loops; $l++) {
$x = [1,2,3,4,6,7,8,9];
$x = array_values(array_unique(array_merge($x, [0,1,2,3,4,5,6,7,8,9,10])));
}
$duration = microtime(true) - $start;
echo "array_unique took $duration<br>".PHP_EOL;
Please consider the following code:
$start = microtime();
for($i = 2; $i < 100; $i++)
{
for($y = 2; $y <= sqrt($i); $y++)
{
if($i%$y != 0)
{
continue;
}
else
{
continue 2;
}
}
echo $i.',';
}
echo "\nFinished in " . (microtime() - $start);
Given that the above code effectively uses continue 2 to break the inner loop and skip any code post the inner loop, why does the following code on average execute faster when it appears to do more:
$start = microtime();
for($i = 2; $i < 100; $i++)
{
$flag = true;
for($y = 2; $y <= sqrt($i); $y++)
{
if($i%$y != 0)
{
continue;
}
else
{
$flag = false;
break;
}
}
if($flag === true) echo $i.',';
}
echo "\nFinished in " . (microtime() - $start);
Thanks for any input.
_____ Update ____________
Thanks for the feedback, but we seem to have missed the point. Regardless of whether this is good programming practice, I was trying to understand why the performance difference (which is tiny but consistent) does not lean in the direction I expected.
The passing of true to microtime seems insignificant as both samples are measured using the same method with the same overhead and the same inaccuracy.
More than one run was tested, as implied by use of the word average.
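One caveat on the microtime() point: without an argument the function returns a string of the form "msec sec" rather than a float, so subtracting two raw results compares only the fractional-second parts, can go negative across a second boundary, and may raise a notice or warning depending on the PHP version. A minimal sketch of the difference:

```php
<?php
// Without an argument, microtime() returns a string such as "0.12345600 1700000000".
$raw = microtime();
list($fraction, $seconds) = explode(' ', $raw);

// Casting the raw string to float keeps only the leading fractional part.
var_dump((float) $raw === (float) $fraction); // bool(true)

// microtime(true) returns seconds and fraction combined as a single float.
var_dump(is_float(microtime(true)));          // bool(true)
```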
Just for illustration please consider the following small samples using microtime(true) which shows the same pattern as using microtime().
I know this is a small sample but the pattern is quite clear:
Continue
0.00037288665771484
0.00048208236694336
0.00046110153198242
0.00039386749267578
0.0003662109375
Break
0.00033903121948242
0.00035715103149414
0.00033307075500488
0.00034403800964355
0.00032901763916016
Thanks for looking, and thanks for any further feedback.
______ UPDATE Further investigation ____________
Interestingly if the echo statements are removed from the code the continue performs faster, with the echo statements in place break is faster.
Please consider the following code sample, and note that the results conflict depending on whether the echo statements are removed or not:
<?php
$breakStats = array();
$continueStats = array();
ob_start();
for($i = 0; $i < 10000; $i++)
{
$breakStats[] = doBreakTest();
$continueStats[] = doContinueTest();
}
ob_clean();
echo "<br/>Continue Mean " . (array_sum($continueStats) / count($continueStats));
echo "<br/>Break Mean " . (array_sum($breakStats) / count($breakStats));
function doBreakTest()
{
$start = microtime(true);
for($i = 2; $i < 100; $i++)
{
$flag = true;
$root = sqrt($i);
for($y = 2; $y <= $root; $y++)
{
if($i%$y != 0)
{
continue;
}
else
{
$flag = false;
break;
}
}
}
if($flag === true) echo $i . '';
return microtime(true) - $start;
}
function doContinueTest()
{
$start = microtime(true);
for($i = 2; $i < 100; $i++)
{
$root = sqrt($i);
for($y = 2; $y <= $root; $y++)
{
if($i%$y != 0)
{
continue;
}
else
{
echo $i . '';
continue 2;
}
}
}
return microtime(true) - $start;
}
Echo statements present :
Continue Mean 0.00014134283065796
Break Mean 0.00012669243812561
Echo statements not present :
Continue Mean 0.00011746988296509
Break Mean 0.00013022310733795
Note that by removing the echo statement from the break and flag test we also remove the ($flag === true) check, so the load should reduce, but continue in this case still wins.
So in a pure continue n versus break + flag scenario, it appears that continue n is the faster construct. But add an equal number of identical echo statements, and the continue n performance flags.
It makes sense to me logically that continue n should be faster, but I would have expected to see the same with the echo statements present.
This is clearly a difference in the generated opcodes and in the position of the echo statement (inner loop vs outer loop). Does anyone know a way of seeing the opcodes generated? This, I suppose, is ultimately what I need, as I am trying to understand what is happening internally.
Thanks :)
Yes, the first one is a bit faster. That's because it either jumps to the next candidate on continue 2 or falls through and prints $i.
The second example has more work to do: it assigns a value to $flag, breaks out of the loop, checks $flag's value and type (=== compares both), and then prints $i. So it's a bit slower (simple logic).
Anyway, does it serve any purpose?
Some of my results for comparing
0.0011570 < 0.0012173
0.0011540 < 0.0011754
0.0011820 < 0.0012036
0.0011570 < 0.0011693
0.0011970 < 0.0012790
Used: PHP 5.3.5 # Windows (1000 attempts; 100% first was faster)
0.0011570 < 0.0012173
0.0005000 > 0.0003333
0.0005110 > 0.0004159
0.0003900 < 0.0014029
0.0003950 > 0.0003119
0.0003120 > 0.0002370
Used: PHP 5.3.3 # Linux (1000 attempts; 32% first : 68% second was faster)
0.0006700 > 0.0004863
0.0003470 > 0.0002591
0.0005360 > 0.0004027
0.0004720 > 0.0004229
0.0005300 > 0.0004366
Used: PHP 5.2.13 # Linux (1000 attempts; 9% first : 91% second was faster)
Sorry, I don't have any more servers for testing :) Now I think it mostly depends on the hardware (and maybe on the OS too).
Generally: it proves only that the Linux server is faster than the one running Windows :)
The continue 2 version is slightly faster for me. But these aren't the types of things you generally need to concern yourself with. Consider:
for($y = 2; $y <= sqrt($i); $y++)
Here you are calculating the sqrt on every iteration. Just changing that to:
$sqrt = sqrt($i);
for($y = 2; $y <= $sqrt; $y++)
will give you a much better improvement than switching between two nearly identical loop styles.
The continue 2 should be used if you find that it's easier for you to understand. The computer doesn't really care.
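Putting the two points together, here is a sketch of the loop from the question with the square root hoisted and continue 2 kept. The function name primesBelow is hypothetical and was added only to make the snippet testable.

```php
<?php
// Hypothetical helper wrapping the loop from the question, with sqrt() hoisted.
function primesBelow(int $limit): array
{
    $primes = [];
    for ($i = 2; $i < $limit; $i++) {
        $root = sqrt($i); // computed once per candidate, not once per inner iteration
        for ($y = 2; $y <= $root; $y++) {
            if ($i % $y === 0) {
                continue 2; // divisor found: move straight to the next candidate
            }
        }
        $primes[] = $i;
    }
    return $primes;
}

echo implode(',', primesBelow(30)), PHP_EOL; // 2,3,5,7,11,13,17,19,23,29
```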
To address your update regarding looking at opcodes, see:
http://pecl.php.net/package/vld
php -d vld.active=1 -d vld.execute=0 -f foo.php
for($i=0;$i<$num;$i++) {
if($i==even) $hilite="hilite";
dothing($i,$hilite);
}
This is basically what I want to accomplish.
What is the most efficient way to determine if $i is even?
I know I could check if half == mod 2 ... but that seems a little excessive on the calculations? Is there a simpler way?
if ($i % 2 == 0)
The already mentioned % 2 syntax is most used, and most readable for other programmers. If you really want to avoid an 'overhead' of calculations:
for($i = 0, $even = true; $i < $num; $i++, $even = !$even) {
$hilite = $even ? "hilite" : ""; // reset on odd rows so the class does not stick
dothing($i,$hilite);
}
Although the assignment itself is probably more work than the '% 2' (which for a constant power of two can typically be reduced to a cheap bitwise operation).
It doesn't get any simpler than $i % 2 == 0. Period.
Change the i++ in the loop statement to i+=2, so that you only examine even values of i?
Typically, a number is odd if its LSB (Least Significant Bit) is set. You can check the state of this bit by using the bitwise AND operator:
if($testvar & 1){
// $testvar is odd
}else{
// $testvar is even
}
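As a sanity check, the sketch below compares the modulo test and the bitwise test on a few integers, including negative ones. Both helper names are hypothetical.

```php
<?php
// Hypothetical helpers comparing the two parity checks discussed in this thread.
function isEvenModulo(int $n): bool
{
    return $n % 2 === 0;
}

function isEvenBitwise(int $n): bool
{
    // The least significant bit is 0 for even integers, including negative ones.
    return ($n & 1) === 0;
}

foreach ([-3, -2, 0, 1, 2, 7] as $n) {
    // Both checks should agree for every integer.
    var_dump(isEvenModulo($n) === isEvenBitwise($n)); // bool(true) each time
}
```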
In your code above, a more efficient way would be to have $i increment by 2 in every loop (assuming you can ignore odd-values):
for($i=0;$i<$num;$i+=2){
// $i will always be even!
}