I am trying to micro-optimize some code, and I was wondering which is the faster way to convert a variable to a boolean:
<?php
$a='test';
$result1 = !!$a;
$result2 = (bool)$a;
?>
I am not worried about code size, only about execution time.
Here is a benchmark, but it is inconclusive (I ran it several times), so I am wondering what happens in the PHP source code and whether the two forms are handled differently:
<?php
$a = 'test';
for($c=0;$c<3;$c++){
$start = microtime(true);
for($i=0;$i<10000000;$i++){
$result = !!$a;
}
$end = microtime(true);
$delta = $end-$start;
echo '!!: '.$delta.'<br />';
}
$a = 'test';
for($c=0;$c<3;$c++){
$start = microtime(true);
for($i=0;$i<10000000;$i++){
$result = (bool)$a;
}
$end = microtime(true);
$delta = $end-$start;
echo '(bool): '.$delta.'<br />';
}
Result:
!!: 0.349671030045
!!: 0.362552021027
!!: 0.351779937744
(bool): 0.346690893173
(bool): 0.36114192009
(bool): 0.373970985413
(bool)$a means: take $a and cast it to boolean.
!!$a means: take $a, cast it to boolean if it isn't one already, then take the resulting value and flip it, then flip it again.
Not only is (bool) faster to execute (yes, I have benchmarked it; no, you won't notice any difference unless you have millions of such operations), but it's way faster to read. If you need to cast a type, just cast a type; don't use some "clever" hackiness that'll confuse the hell out of whoever has to read your code.
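As a quick sanity check that the two forms are interchangeable in what they return, here is a minimal sketch. The sample values are my own, picked to cover PHP's usual truthy/falsy edge cases; they are not from the question.

```php
<?php
// Hypothetical sample values covering common truthy/falsy edge cases.
$samples = ['test', '', '0', '0.0', 0, 1, 0.0, null, [], [0]];

$mismatches = 0;
foreach ($samples as $value) {
    $viaCast = (bool) $value; // explicit cast
    $viaNot  = !!$value;      // implicit cast, then two negations
    if ($viaCast !== $viaNot) {
        $mismatches++;
    }
}

echo "mismatches: $mismatches", PHP_EOL;
```

Both forms should agree on every value, so the choice comes down to readability rather than behavior.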
Well I got this code:
function importXML($filename){
if(file($filename)) $xml_file = file($filename);
else umar(97);
$content = substr($xml_file[1], strpos($xml_file[1], "<office:spreadsheet>"), strpos($xml_file[1], "</office:spreadsheet>") - strpos($xml_file[1], "<office:spreadsheet>"));
$arr = array();
$arr2 = array();
$check = strripos($content, "</table:table-row>") + 18;
$offset = 0;
while($offset != $check){
$start2 = strpos($content, "<table:table-row ", $offset);
$end2 = strpos($content, "</table:table-row>", $offset);
$offset = $end2 + 18;
array_push($arr2, $start2, $end2);
array_push($arr, substr($content, $start2, $end2-$start2));
}
return json_encode($arr);
}
And this line
$check = strripos($content, "</table:table-row>") + 18;
returns error
Allowed memory size of 134217728 bytes exhausted (tried to allocate 58887728 bytes)
I'm working with a 58893135-byte file.
In php.ini I have set:
memory_limit=128M
So I'm wondering why I get this error, and how can I get rid of it.
Let's break this down a bit:
if(file($filename)) $xml_file = file($filename);
Here $xml_file will take about 58MB, and you are also reading the file twice. Instead, you should do:
if (file_exists($filename)) $xml_file = file($filename);
Next bit:
$content = substr($xml_file[1],
strpos($xml_file[1], "<office:spreadsheet>"),
strpos($xml_file[1], "</office:spreadsheet>") -
strpos($xml_file[1], "<office:spreadsheet>")
);
Here you're looking for <office:spreadsheet> twice in a large file, then storing the substring in $content, which will also take about 58MB. At this point you can set $xml_file = null; to release the file contents if you no longer need them, i.e.
$spreadSheetStart = strpos($xml_file[1], "<office:spreadsheet>");
$spreadSheetLen = strpos($xml_file[1], "</office:spreadsheet>") - $spreadSheetStart;
$content = substr($xml_file[1], $spreadSheetStart, $spreadSheetLen);
$xml_file = null;
Now at the end you have an array of approximately 58MB, and you need to turn it into JSON of about the same size while $content is still in memory, which means you need roughly 170MB in total. Again, you can do:
$content = null; //unset($content); would also work.
return json_encode($arr);
The main thing to understand about strings and arrays in PHP is that they behave like primitive types, which means they are always passed by value. Example:
$a = "Test";
$b = $a;
$b .= " again";
echo $a.PHP_EOL.$b.PHP_EOL;
This prints:
Test
Test again
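This indicates that $b is a copy of $a and not a reference to the same memory location. PHP defers the physical copy until one of the two variables is modified (copy-on-write), so it is the $b .= " again"; line, not $b = $a;, that actually duplicates the data; from then on the code needs memory for both strings.
The same value semantics apply to arrays and to the other primitives (integers, booleans, floats).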
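PHP strings have value semantics, but the engine only pays for a second copy once one of the variables is written to. The rough sketch below shows when that happens using memory_get_usage(). The 1 MB size is an arbitrary assumption of mine, and the exact byte counts will vary by PHP version.

```php
<?php
// Assumption: 1 MB is large enough for the effect to stand out in memory_get_usage().
$size = 1000000;
$a = str_repeat('x', $size);

$before = memory_get_usage();
$b = $a;                        // value semantics, but no physical copy yet
$afterAssign = memory_get_usage();
$b .= ' again';                 // first write: $b gets its own string
$afterWrite = memory_get_usage();

$assignCost = $afterAssign - $before;
$writeCost  = $afterWrite - $afterAssign;

printf("assignment cost:  %d bytes\n", $assignCost);
printf("first write cost: %d bytes\n", $writeCost);
```

The assignment should cost next to nothing, while the first write should cost roughly the size of the string. For the 58MB case above, this means that releasing a variable you no longer need before building the next large value is what keeps peak usage down.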
As apokryfos has mentioned in a comment, we should NEVER use something like:
while(1)
or
while(true)
or
while(1===1)
or something like that.
If you must use one, though, I believe what is happening is that your while loop never reaches your break statement, which means your condition ($offset == $check) never becomes true.
To verify this, start a counter before the while loop:
$count = 1;
Increase it at the end of each iteration and break once it reaches, say, 50:
while(true){
//your code
if($count === 50){
break;
}
$count++;
}
If the problem goes away, then it is certainly because your break condition never becomes true. Think through all the cases and add more conditions.
If it doesn't, then I believe your app needs more memory than you are providing. I HIGHLY DISCOURAGE this method, but until you find the real problem, you could add:
ini_set('memory_limit','256M');
at the beginning of your code.
But again, THIS IS TEMPORARY. The problem is in your code; increasing the memory limit is not a solution, but it will give you the opportunity to locate the real problem.
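One way to make sure the loop always terminates is to let the search result drive it instead of comparing offsets. This is only a sketch: extractRows() and the sample string are hypothetical, not the asker's code, and it assumes rows are delimited exactly as in the question.

```php
<?php
// Hypothetical helper: collects each row fragment, stopping as soon as
// strpos() finds no further opening tag, so the loop cannot spin forever.
function extractRows(string $content): array
{
    $open  = '<table:table-row ';
    $close = '</table:table-row>';
    $rows  = [];
    $offset = 0;

    while (($start = strpos($content, $open, $offset)) !== false) {
        $end = strpos($content, $close, $start);
        if ($end === false) {
            break; // unterminated row: stop rather than loop
        }
        // Same slice as the original code: opening tag up to the closing tag.
        $rows[] = substr($content, $start, $end - $start);
        $offset = $end + strlen($close);
    }

    return $rows;
}

// Made-up sample input for illustration.
$sample = '<table:table-row a="1">one</table:table-row>'
        . '<table:table-row a="2">two</table:table-row>';
$rows = extractRows($sample);
print_r($rows);
```

Because strpos() returns false when nothing is left to find, there is no separate $check value that the offset has to hit exactly.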
Try this! You can add this line:
ini_set("memory_limit","80M");
before
$check = strripos($content, "</table:table-row>") + 18;
Suppose the string is:
$a = "abc-def";
if (preg_match("/[^a-z0-9]/i", $a, $m)){
$i = "i stopped scanning '$a' because I found a violation in it while
scanning it from left to right. The violation was: $m[0]";
}
echo $i;
In the example above, "-" should be reported as the violation.
I would like to know if there is a non-preg_match way of doing this.
If there is, I will likely benchmark it, perhaps with 1,000 or 1 million runs, to see which is faster and more efficient.
In the benchmarks, $a will be much longer,
to ensure that the function does not scan the entire $a and stops as soon as it detects a violation.
Based on what I have read on the internet, preg_match stops when the first match is found.
UPDATE:
This is based on the answer given by "bishop", which I will likely accept shortly.
I modified it a little because I only want it to report the violating character, but I commented that line out so the benchmark runs without side effects.
Let's do 1 million runs based on that answer.
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000){
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if ($validLen < strlen($input)){
#echo "violation at: ". substr($input, $validLen,1);
}
$count = $count + 1;
};
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
The result is: 0.606614112854
(about 0.6 seconds)
Now let's do it with the preg_match method.
I hope everything else is the same (and fair).
(I say this because of the ^ character in the preg_match pattern.)
$start_time = microtime(TRUE);
$count = 0;
while ($count < 1000000){
$input = 'abc-def';
preg_match("/[^a-z0-9]/i", $input, $m);
#echo "violation at:". $m[0];
$count = $count + 1;
};
$end_time = microtime(TRUE);
$dif = $end_time - $start_time;
echo $dif;
I use "dif" as shorthand for "difference".
The "dif" was 1.1145210266113
(about 1.11 seconds)
(roughly 1.8 times slower than the strspn version; 1.2 would have meant 2x slower)
You want to find the location of the first character not in the given range, without using regular expressions? You might want strspn or its complement strcspn:
$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
$input = 'abc-def';
$validLen = strspn($input, $allowed);
if (strlen($input) !== $validLen) {
printf('Input invalid, starting at %s', substr($input, $validLen));
} else {
echo 'Input is valid';
}
Outputs: Input invalid, starting at -def.
strspn (and its complement) are very old, very well specified (POSIX even). The standard implementations are optimized for this task. PHP just leverages that platform implementation, so PHP should be fast, too.
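If only the offending character is needed, the same strspn call can be wrapped up as in the sketch below. firstViolation() is a name I made up for illustration; it is not a built-in.

```php
<?php
// Hypothetical helper: returns the first character of $input that is not
// in $allowed, or null when every character is allowed.
function firstViolation(string $input, string $allowed): ?string
{
    // Length of the leading run made up only of allowed characters.
    $validLen = strspn($input, $allowed);

    return $validLen < strlen($input) ? $input[$validLen] : null;
}

$allowed = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';

var_dump(firstViolation('abc-def', $allowed)); // string(1) "-"
var_dump(firstViolation('abcdef', $allowed));  // NULL
```

Returning null for valid input keeps the caller's check to a single comparison.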
I have numbers coming out of a database (very controlled input) that will have underscores before and after them. They are stored like this:
_51_ _356_
They will not be stored in any other format, but there will be times where I need to get just the numbers out of them. I have chosen to use either
$x = filter_var($myNumber, FILTER_SANITIZE_NUMBER_INT);
or
$y = preg_replace("/[^0-9]/","",$myNumber);
I am not sure of the nuances between the 2 in the backend, but they both produce exactly what I need (I think so, anyway), so it doesn't matter to me which I use. What are the pros and cons of using each of these options? (For example, does one use an array or other weird thing that I might need to know about? One uses way too many resources?)
Well, there isn't a big difference in your case. I think preg_replace is more expensive in terms of resources, since it has to parse the regex pattern.
Alternatively you can use trim:
echo trim('_12_', '_');
It removes the '_' on both sides, and I think it is the most readable way to do this.
Filters don't use regular expressions, but work in a similar way: iterate a string char-by-char and remove characters that don't match the pattern:
for (i = 0; i < Z_STRLEN_P(value); i++) {
if ((*map)[str[i]]) {
buf[c] = str[i];
++c;
}
}
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#filter_map_apply
and the FILTER_SANITIZE_NUMBER_INT is defined as [^0-9+-]:
/* strip everything [^0-9+-] */
const unsigned char allowed_list[] = "+-" DIGIT;
filter_map map;
filter_map_init(&map);
filter_map_update(&map, 1, allowed_list);
filter_map_apply(value, &map);
#http://lxr.php.net/xref/PHP_5_6/ext/filter/sanitizing_filters.c#php_filter_number_int
Of course, [^0-9+-] is not the right expression for filtering integer numbers, so be prepared for surprises:
$x = filter_var("+++123---", FILTER_SANITIZE_NUMBER_INT);
var_dump($x); // WTF? string(9) "+++123---"
My suggestion is to stick to regular expressions: they are explicit and far less buggy than filters.
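To see where the three approaches agree and where they diverge, here is a small side-by-side sketch. The input list is my own: two values in the stored format plus the adversarial string from above.

```php
<?php
// Sample inputs: two values in the stored format plus one adversarial string.
$inputs = ['_51_', '_356_', '+++123---'];

$results = [];
foreach ($inputs as $raw) {
    $results[$raw] = [
        // Keeps digits, '+' and '-'.
        'filter_var'   => filter_var($raw, FILTER_SANITIZE_NUMBER_INT),
        // Keeps digits only.
        'preg_replace' => preg_replace('/[^0-9]/', '', $raw),
        // Strips underscores from both ends only.
        'trim'         => trim($raw, '_'),
    ];
}

var_export($results);
```

For the controlled `_51_` format all three return the same digits; they only differ on input that contains signs or other stray characters.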
I wanted to try some various methods for this, so set up the following benchmark. It looks like for your case, trim is definitely the best option as it only has to look at the beginning and end of the string instead of each character. Here are my test results on 10,000,000 random integers surrounded by underscores running PHP 7.0.18.
preg_replace: 1.9469740390778 seconds.
filter_var: 1.6922700405121 seconds.
str_replace: 0.72129797935486 seconds.
trim: 0.37275195121765 seconds.
And here is my code if anyone wants to run similar tests:
<?php
$ints = array();//array_fill(0, 10000000, '_1029384756_');
for($i = 0; $i < 10000000; $i++) {
$ints[] = '_'.mt_rand().'_';
}
$start = microtime(true);
foreach($ints as $v) {
preg_replace('/[^0-9]/', '', $v);
}
$end = microtime(true);
echo 'preg_replace in '.($end-$start).' seconds.',PHP_EOL;
$start = microtime(true);
foreach($ints as $v) {
filter_var($v, FILTER_SANITIZE_NUMBER_INT);
}
$end = microtime(true);
echo 'filter_var in '.($end-$start).' seconds.',PHP_EOL;
$start = microtime(true);
foreach($ints as $v) {
str_replace('_', '', $v);
}
$end = microtime(true);
echo 'str_replace in '.($end-$start).' seconds.',PHP_EOL;
$start = microtime(true);
foreach($ints as $v) {
trim($v, '_');
}
$end = microtime(true);
echo 'trim in '.($end-$start).' seconds.',PHP_EOL;
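The four timing loops above can be folded into one helper so each candidate is measured the same way. This is only a sketch: bench() is a hypothetical name, it assumes PHP 7.3+ for hrtime(), and the closure call adds a little overhead to every candidate equally.

```php
<?php
// Hypothetical helper: runs $fn over every input and returns elapsed seconds.
// Assumes PHP 7.3+ (hrtime); microtime(true) would work on older versions.
function bench(callable $fn, array $inputs): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    foreach ($inputs as $value) {
        $fn($value);
    }
    return (hrtime(true) - $start) / 1e9;
}

// Small input set so the sketch finishes quickly; scale up for real measurements.
$ints = [];
for ($i = 0; $i < 10000; $i++) {
    $ints[] = '_' . mt_rand() . '_';
}

$timings = [
    'preg_replace' => bench(function ($v) { return preg_replace('/[^0-9]/', '', $v); }, $ints),
    'filter_var'   => bench(function ($v) { return filter_var($v, FILTER_SANITIZE_NUMBER_INT); }, $ints),
    'str_replace'  => bench(function ($v) { return str_replace('_', '', $v); }, $ints),
    'trim'         => bench(function ($v) { return trim($v, '_'); }, $ints),
];

foreach ($timings as $name => $seconds) {
    printf("%s: %.6f seconds\n", $name, $seconds);
}
```

Absolute numbers from this version will not match the figures above, but the relative ordering is what matters for the comparison.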
<?php
$a = microtime(true);
$num = 0;
for($i=0;$i<10000000;$i++)
{
$num = $i;
}
$b= microtime(true);
echo $b-$a;
?>
I ran this on Ubuntu 12.10 with Apache 2.
It takes approximately 0.50 seconds to run the assignment ten million times. BUT...
with the same code, if instead of $num = $i I write
$num = $i + 10; it takes almost 1.5 times less time to execute, consistently around 0.36 seconds.
How come the simple assignment takes longer, while an assignment plus adding 10 takes less time?
I am by no means an expert, but here are my findings:
$s = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $i;
$t = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $i+10;
$u = microtime(true);
echo ($t-$s).chr(10).($u-$t);
Results in:
9.9528648853302
9.0821340084076
On the other hand, using a constant value for the assignment test:
$x = 0;
$s = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $x;
$t = microtime(true);
for($i=0;$i<100000000;$i++) $tmp = $x+10;
$u = microtime(true);
echo ($t-$s).chr(10).($u-$t);
Results in:
6.1365358829498
9.3231790065765
This leads me to believe that the answer has something to do with opcode caching. I honestly couldn't tell you what exactly is making the difference, but as you can see, using a constant value for the assignment makes a huge difference.
This is just an educated guess, based on looking at the latest php source on Github, but I'd say this difference is due to function call overhead in the interpreter source.
$tmp = $i;
compiles to a single opcode ASSIGN !2, !1;, which copies one named variable's value to another named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
/* nothing to destroy */
ZVAL_COPY_VALUE(variable_ptr, value);
zendi_zval_copy_ctor(*variable_ptr);
}
$tmp = $i + 10;
compiles to two opcodes ADD ~8 !1, 10; ASSIGN !2, ~8;, which creates a temporary variable ~8 and assigns its value to a named variable. In the source, the key part looks like this:
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
/* nothing to destroy */
ZVAL_COPY_VALUE(variable_ptr, value);
}
Notice that there's an extra function call to zendi_zval_copy_ctor() in the first case. That function performs some bookkeeping as needed (e.g. if the original variable is a resource, it needs to make sure that resource is not freed until this new variable is gone, etc.). For a primitive type such as a number, there's nothing to do, but the function call itself introduces some overhead, which accumulates over 10 million iterations of your test. You should note that this overhead is normally negligible, because even in 10 million iterations it only accumulated to .14 seconds.
@Kolink's observation about a constant being faster can also be answered in the same function. It includes a check to avoid redundant copying if the new value is the same as the old one:
if (EXPECTED(variable_ptr != value)) {
copy_value:
// the same code that handles `$tmp = $i` above
if (EXPECTED(Z_TYPE_P(variable_ptr) <= IS_BOOL)) {
/* nothing to destroy */
ZVAL_COPY_VALUE(variable_ptr, value);
zendi_zval_copy_ctor(*variable_ptr);
} else {
/* irrelevant to the question */
}
}
So only the first assignment of $tmp = $x copies the value of $x, the following ones see that the value of $tmp would not change and skip the copying, making it faster.
Hey there. Today I wrote a small benchmark script to compare the performance of copying variables vs. creating references to them. I was expecting that creating references to large arrays, for example, would be significantly faster than copying the whole array. Here is my benchmark code:
<?php
$array = array();
for($i=0; $i<100000; $i++) {
$array[] = mt_rand();
}
function recursiveCopy($array, $count) {
if($count === 1000)
return;
$foo = $array;
recursiveCopy($array, $count+1);
}
function recursiveReference($array, $count) {
if($count === 1000)
return;
$foo = &$array;
recursiveReference($array, $count+1);
}
$time = microtime(1);
recursiveCopy($array, 0);
$copyTime = (microtime(1) - $time);
echo "Took " . $copyTime . "s \n";
$time = microtime(1);
recursiveReference($array, 0);
$referenceTime = (microtime(1) - $time);
echo "Took " . $referenceTime . "s \n";
echo "Reference / Copy: " . ($referenceTime / $copyTime);
The actual result was that recursiveReference took about 20 times (!) as long as recursiveCopy.
Can somebody explain this PHP behaviour?
PHP will very likely implement copy-on-write for its arrays, meaning when you "copy" an array, PHP doesn't do all the work of physically copying the memory until you modify one of the copies and your variables can no longer reference the same internal representation.
Your benchmarking is therefore fundamentally flawed, as your recursiveCopy function doesn't actually copy the object; if it did, you would run out of memory very quickly.
Try this: By assigning to an element of the array you force PHP to actually make a copy. You'll find you run out of memory pretty quickly as none of the copies go out of scope (and aren't garbage collected) until the recursive function reaches its maximum depth.
function recursiveCopy($array, $count) {
if($count === 1000)
return;
$foo = $array;
$foo[9492] = 3; // Force PHP to copy the array
recursiveCopy($array, $count+1);
}
In recursiveReference you're calling recursiveCopy. This doesn't make any sense; in this case you're calling recursiveReference just once. Correct your code, run the benchmark again, and come back with your new results.
In addition, I don't think it's useful to run this benchmark recursively. A better solution would be to call a function 1000 times in a loop, once with the array directly and once with a reference to that array.
You don't need to (and thus shouldn't) assign or pass variables by reference just for performance reasons. PHP does such optimizations automatically.
The test you ran is flawed because of these automatic optimizations. I ran the following test instead:
<?php
for($i=0; $i<100000; $i++) {
$array[] = mt_rand();
}
$time = microtime(1);
for($i=0; $i<1000; $i++) {
$copy = $array;
unset($copy);
}
$duration = microtime(1) - $time;
echo "Normal Assignment and don't write: $duration<br />\n";
$time = microtime(1);
for($i=0; $i<1000; $i++) {
$copy =& $array;
unset($copy);
}
$duration = microtime(1) - $time;
echo "Assignment by Reference and don't write: $duration<br />\n";
$time = microtime(1);
for($i=0; $i<1000; $i++) {
$copy = $array;
$copy[0] = 0;
unset($copy);
}
$duration = microtime(1) - $time;
echo "Normal Assignment and write: $duration<br />\n";
$time = microtime(1);
for($i=0; $i<1000; $i++) {
$copy =& $array;
$copy[0] = 0;
unset($copy);
}
$duration = microtime(1) - $time;
echo "Assignment by Reference and write: $duration<br />\n";
?>
This was the output:
//Normal Assignment without write: 0.00023698806762695
//Assignment by Reference without write: 0.00023508071899414
//Normal Assignment with write: 21.302103042603
//Assignment by Reference with write: 0.00030708312988281
As you can see there is no significant performance difference in assigning by reference until you actually write to the copy, i.e. when there is also a functional difference.
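The same effect can be observed in memory rather than time. The sketch below uses memory_get_usage() to show that a plain assignment costs almost nothing until the first write, whereas a write through a reference never duplicates the array. The array size is arbitrary and exact byte counts depend on the PHP version.

```php
<?php
// Assumption: 100,000 integers is large enough to dominate other allocations.
$array = range(1, 100000);

$base = memory_get_usage();
$copy = $array;             // shares storage with $array for now
$afterAssign = memory_get_usage();
$copy[0] = 0;               // first write: PHP separates the two arrays
$afterWrite = memory_get_usage();
unset($copy);

$beforeRef = memory_get_usage();
$ref = &$array;             // both names point at the same array
$ref[0] = 0;                // writes in place, no duplicate
$afterRef = memory_get_usage();

$assignCost = $afterAssign - $base;
$writeCost  = $afterWrite - $afterAssign;
$refCost    = $afterRef - $beforeRef;

printf("assign: %d bytes, write to copy: %d bytes, write via reference: %d bytes\n",
    $assignCost, $writeCost, $refCost);
```

Note that the write through $ref also changes $array itself, which is the functional difference that should drive the choice, not speed.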
Generally speaking in PHP, calling by reference is not something you'd do for performance reasons; it's something you'd do for functional reasons, i.e. because you actually want the referenced variable to be updated.
If you don't have a functional reason for calling by reference then you should stick with regular parameter passing, because PHP handles things perfectly efficiently that way.
(that said, as others have pointed out, your example code isn't exactly doing what you think it is anyway ;))
In the recursiveReference() function you call the recursiveCopy() function. Is that what you really intended to do?
You do nothing with the $foo variable; was it perhaps supposed to be used in the further method call?
Passing variable by reference should generally save stack memory in case of passing large objects.
recursiveReference is calling recursiveCopy.
Not that that would necessarily harm performance, but that's probably not what you're trying to do.
Not sure why performance is slower, but it doesn't reflect the measurement you're trying to make.