While logging some data with microtime() (in PHP 5), I noticed that some values seemed slightly out of phase with the timestamps of my log file, so I compared the output of time() and microtime() with a simple script (the usleep() call is only there to limit the amount of output):
<?php
for($i = 0; $i < 500; $i++) {
    $microtime = microtime();
    $time = time();
    list($usec, $sec) = explode(" ", $microtime);
    if ((int)$sec > $time) {
        echo $time . ' : ' . $microtime . '<br>';
    }
    usleep(50000);
}
?>
Since $microtime is assigned before $time, I expect its seconds part never to exceed $time, so nothing should ever be printed. However, that is clearly not the case: every now and then, $time is smaller than the seconds value returned by microtime(), as in this (truncated) example output:
1344536674 : 0.15545100 1344536675
1344536675 : 0.15553900 1344536676
1344536676 : 0.15961000 1344536677
1344536677 : 0.16758900 1344536678
This is only a small gap, but I have also observed series where the difference is noticeably more than a second. How is this possible?
If you look at the implementations of time() and microtime(), you'll see they're radically different:
time() just calls the C time() function.
microtime() has two implementations: if the C function gettimeofday() is available (which it should be on a Linux system), it is called directly. Otherwise, the implementation pulls off some acrobatics and uses rusage to calculate the time.
Since the C time() call is only precise to the second, it may also intentionally use a low-fidelity time source.
Furthermore, on modern x86_64 systems, both C functions can be implemented without a system call by reading certain CPU registers. On a multi-core system, these registers may not match exactly across cores, which could explain the discrepancy.
Another potential reason for the discrepancies is that NTPd (a time-keeping daemon) or some other user-space process is changing the clock. Normally, such effects should be avoided by adjusting the clock gradually with adjtime().
All of these explanations are pretty esoteric. To debug the problem further, you should:
Determine the OS and CPU architecture (hint: and tell us!)
Try to pin the process to a single core
Stop any programs that are (or may be) adjusting the system time.
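If the aim is only to keep log timestamps consistent with one another, one possible workaround (an assumption on my part, not something proposed in the answers here) is to read the clock once and derive both the whole seconds and the microseconds from that single reading, so the two values can never disagree. `log_timestamp()` is a hypothetical helper name. For measuring durations rather than wall-clock timestamps, PHP 7.3+ also offers hrtime(), which reads a monotonic clock that is not affected by changes to the system time.

```php
<?php
// Hypothetical helper: take ONE clock reading and derive both the whole
// seconds and the microseconds from it, instead of calling time() and
// microtime() separately (the two calls can disagree, as shown above).
function log_timestamp()
{
    $now = microtime(true);                   // float seconds since the Unix epoch
    $sec = (int) floor($now);                 // whole seconds from the same reading
    $usec = (int) (($now - $sec) * 1000000);  // fractional part, truncated to microseconds
    return array($sec, $usec, $now);
}

list($sec, $usec, $now) = log_timestamp();
printf("%d.%06d\n", $sec, $usec);
```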
This could be due to the fact that microtime() uses floating-point numbers, so rounding errors may occur.
You can specify the floating-point precision (the precision directive) in php.ini.
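A small sketch of what the `precision` setting changes. It assumes the default `precision` of 14 significant digits, at which a current Unix timestamp from microtime(true) is echoed with at most about four decimal places; ini_set() raises the setting at runtime, the same as editing php.ini. Note that microtime() without an argument returns a string ("msec sec"), so only the `true` form involves a float.

```php
<?php
// microtime() without an argument returns a string such as "0.15545100 1344536675".
$asString = microtime();
// microtime(true) returns a float; how many digits echo prints for it depends
// on the `precision` ini setting (14 significant digits by default).
$asFloat = microtime(true);

echo $asString, "\n";
echo $asFloat, "\n";          // at precision=14: at most about 4 decimal places

ini_set('precision', '17');   // runtime equivalent of precision = 17 in php.ini
echo $asFloat, "\n";          // the same float, printed with more digits
```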
In PHP, function parameters can be passed by reference by prepending an ampersand to the parameter in the function declaration, like so:
function foo(&$bar)
{
// ...
}
I am aware that this is not designed to improve performance, but to allow functions to change variables that are normally outside their scope.
Instead, PHP seems to use copy-on-write to avoid copying objects (and maybe also arrays) until they are changed. So, for functions that do not change their parameters, the effect should be the same as passing them by reference.
However, I was wondering whether the copy-on-write logic is perhaps short-circuited on pass-by-reference, and whether that has any performance impact.
ETA: To be clear, I assume that it's not faster, and I am well aware that this is not what references are for. I think my own guesses are quite good; I'm just looking for an answer from someone who really knows what is happening under the hood. In five years of PHP development, I've always found it hard to get quality information on PHP internals short of reading the source.
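For readers less familiar with the `&` syntax, the difference in what a function can do to the caller's variable can be shown with two hypothetical functions (`append_by_value()` and `append_by_ref()` are illustrative names, not from the question):

```php
<?php
// Receives its own copy (copy-on-write): changes stay local to the function.
function append_by_value($bar)
{
    $bar .= '!';
    return $bar;
}

// Receives a reference: changes are visible to the caller.
function append_by_ref(&$bar)
{
    $bar .= '!';
}

$a = 'hi';
$result = append_by_value($a);  // $a is still 'hi', $result is 'hi!'

$b = 'hi';
append_by_ref($b);              // $b is now 'hi!'

echo "$a $result $b\n";         // prints "hi hi! hi!"
```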
In a test with 100,000 iterations of calling a function with a 20 kB string, the results are:
Function that just reads / uses the parameter
pass by value: 0.12065005 seconds
pass by reference: 1.52171397 seconds
Function to write / change the parameter
pass by value: 1.52223396 seconds
pass by reference: 1.52388787 seconds
Conclusions
Passing the parameter by value is always faster.
If the function changes the value of the variable passed, then for practical purposes passing by value performs the same as passing by reference.
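A minimal sketch of the benchmark described above, assuming a 20 kB string and 100,000 calls per variant; the four function names and the `bench()` helper are hypothetical, and absolute timings will differ considerably between PHP versions and machines:

```php
<?php
// Hypothetical read-only and writing functions, by value and by reference.
function read_val($s)   { return strlen($s); }
function read_ref(&$s)  { return strlen($s); }
function write_val($s)  { $s[0] = 'x'; return $s; }
function write_ref(&$s) { $s[0] = 'x'; }

// Time $iterations calls of the function named $fn on a local copy of $data.
function bench($fn, $data, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($data);
    }
    return microtime(true) - $start;
}

$str = str_repeat('a', 20 * 1024);   // ~20 kB string, as in the answer
$iterations = 100000;

$results = array();
foreach (array('read_val', 'read_ref', 'write_val', 'write_ref') as $fn) {
    $results[$fn] = bench($fn, $str, $iterations);
    printf("%-10s %.5f s\n", $fn, $results[$fn]);
}
```

The `bench()` helper receives the string by value, so the caller's `$str` is left untouched even by the writing variants.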
The Zend Engine uses copy-on-write, and when you use a reference yourself, it incurs a little extra overhead. I could only find this one mention at the time of writing, though; the comments in the manual contain other links.
(EDIT) The manual page on Objects and references contains a little more info on how object variables differ from references.
I ran some tests on this because I was unsure of the answers given.
My results show that passing large arrays or strings by reference IS significantly faster.
Here are my results:
[Chart not preserved: runs for each function/variable combination]
The Y axis (Runs) is how many times a function could be called in 1 second, multiplied by 10.
The test was repeated 8 times for each function/variable.
And here are the variables I used:
$large_array = array_fill(PHP_INT_MAX / 2, 1000, 'a');
$small_array = array('this', 'is', 'a', 'small', 'array');
$large_object = (object)$large_array;
$large_string = str_repeat('a', 100000);
$small_string = 'this is a small string';
$value = PHP_INT_MAX / 2;
These are the functions:
function pass_by_ref(&$var) {
}
function pass_by_val($var) {
}
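The chart counted how many calls fit into one second. A sketch of such a harness, assuming a shorter 0.2-second budget per combination and only the two string variables; `calls_per_second()` is a hypothetical helper. It checks the clock once per batch of 1000 calls so that microtime() itself does not dominate the measurement, and it prints raw calls per second rather than the chart's ×10 scale.

```php
<?php
function pass_by_ref(&$var) {
}
function pass_by_val($var) {
}

// Hypothetical helper: call $fn($var) in batches of 1000 until $budget
// seconds have elapsed, then return the achieved calls per second.
function calls_per_second($fn, $var, $budget = 0.2)
{
    $calls = 0;
    $start = microtime(true);
    do {
        for ($i = 0; $i < 1000; $i++) {
            $fn($var);
        }
        $calls += 1000;
        $elapsed = microtime(true) - $start;
    } while ($elapsed < $budget);
    return $calls / $elapsed;
}

$inputs = array(
    'large_string' => str_repeat('a', 100000),
    'small_string' => 'this is a small string',
);
$rates = array();
foreach ($inputs as $label => $value) {
    foreach (array('pass_by_val', 'pass_by_ref') as $fn) {
        $rates["$label/$fn"] = calls_per_second($fn, $value);
        printf("%-26s %d calls/s\n", "$label/$fn", $rates["$label/$fn"]);
    }
}
```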
I experimented with passing a 10 kB string by value and by reference to two otherwise identical functions. They were ordinary functions: take an argument, do some simple processing, and return a value. I made 100,000 calls to each and concluded that references are not designed to increase performance: the gain from using a reference was about 4-5%, and it grows only when the string becomes large enough (100 kB and longer gave a 6-7% improvement). So my conclusion is: do not use references to increase performance; that is not what they are for.
I used PHP Version 5.3.1
I'm pretty sure that no, it's not faster.
Additionally, it says specifically in the manual not to try using references to increase performance.
Edit: Can't find where it says that, but it's there!
I tried to benchmark this with a real-world example based on a project I was working on. As always, the differences are trivial, but the results were somewhat unexpected. In most of the benchmarks I've seen, the called function doesn't actually change the value passed in; here, I performed a simple str_replace() on it.
**Pass by Value Test Code:**
$originalString=''; // 1000 pseudo-random digits
function replace($string) {
    return str_replace('1', 'x', $string);
}
$output = '';
/* set start time */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tstart = $mtime;
set_time_limit(0);
for ($i = 0; $i < 10; $i++ ) {
    for ($j = 0; $j < 1000000; $j++) {
        $string = $originalString;
        $string = replace($string);
    }
}
/* report how long it took */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tend = $mtime;
$totalTime = ($tend - $tstart);
$totalTime = sprintf("%2.4f s", $totalTime);
$output .= "\n" . 'Total Time' .
': ' . $totalTime;
$output .= "\n" . $string;
echo $output;
**Pass by Reference Test Code:**
The same, except for:
function replace(&$string) {
    $string = str_replace('1', 'x', $string);
}
/* ... */
replace($string);
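Both variants can be timed in one script, with the explode()-based timing replaced by microtime(true), which returns the same timestamp directly as a float. This is a sketch: the 1000-digit input is generated with mt_rand() because the original string is elided, the functions are renamed `replace_by_value()` / `replace_by_ref()` (hypothetical names) so both can coexist, and the iteration count is reduced to 100,000:

```php
<?php
// Assumed input: 1000 pseudo-random digits (the original string is elided).
$originalString = '';
for ($k = 0; $k < 1000; $k++) {
    $originalString .= mt_rand(0, 9);
}

function replace_by_value($string) {
    return str_replace('1', 'x', $string);
}

function replace_by_ref(&$string) {
    $string = str_replace('1', 'x', $string);
}

$iterations = 100000;

$tstart = microtime(true);
for ($j = 0; $j < $iterations; $j++) {
    $byValue = $originalString;
    $byValue = replace_by_value($byValue);
}
$valueTime = microtime(true) - $tstart;

$tstart = microtime(true);
for ($j = 0; $j < $iterations; $j++) {
    $byRef = $originalString;
    replace_by_ref($byRef);
}
$refTime = microtime(true) - $tstart;

printf("Value: %.4f s\nReference: %.4f s\n", $valueTime, $refTime);
```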
Results in seconds (10 million iterations):
PHP 5
Value: 14.1007
Reference: 11.5564
PHP 7
Value: 3.0799
Reference: 2.9489
The difference is a fraction of a microsecond per function call (about 0.25 µs in the PHP 5 run), but for this use case, passing by reference is faster in both PHP 5 and PHP 7.
(Note: the PHP 7 tests were performed on a faster machine -- PHP 7 is faster, but probably not that much faster.)
There is nothing better than a piece of test code:
<?PHP
$r = array();
for($i=0; $i<500;$i++){
    $r[]=5;
}
function a($r){
    $r[0]=1;
}
function b(&$r){
    $r[0]=1;
}
$start = microtime(true);
for($i=0;$i<9999;$i++){
    //a($r);
    b($r);
}
$end = microtime(true);
echo $end-$start;
?>
Final result: the bigger the array (or the greater the number of calls), the bigger the difference. So in this case, calling by reference is faster because the value is changed inside the function.
Otherwise there is no real difference between "by reference" and "by value"; PHP is smart enough not to create a new copy each time if there is no need.
It's simple; there is no need to test anything.
It depends on the use case.
Passing by value will ALWAYS BE FASTER than passing by reference for a small number of arguments. This depends on how many values the architecture's calling convention (ABI) allows to be passed in registers.
For example, the Microsoft x64 calling convention passes the first four 64-bit integer arguments in registers (the System V AMD64 ABI used on Linux passes six).
https://en.wikipedia.org/wiki/X86_calling_conventions
This is because you don't have to dereference a pointer; you just use the value directly.
If the data to be passed is larger than what the ABI can fit in registers, the rest of the values go on the stack.
In this case, an array or an object (which is an instance of a class, i.e., a structure plus headers) will ALWAYS BE FASTER BY REFERENCE.
This is because a reference is just a pointer to your data (not the data itself), with a fixed size of 32 or 64 bits depending on the machine. That pointer fits in one CPU register.
PHP is written in C, so I'd expect it to behave the same.
There is no need to add the & operator when passing objects: in PHP 5+, objects are passed by reference anyway.
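Strictly speaking, PHP 5+ passes an object handle by value rather than a PHP reference: without `&`, a function can modify the object's properties, but reassigning the parameter does not affect the caller. A short sketch with a hypothetical `Counter` class and hypothetical function names:

```php
<?php
class Counter
{
    public $n = 0;
}

// Without &: property changes are visible to the caller...
function bump($c)
{
    $c->n++;
}

// ...but reassigning the parameter only changes the local variable.
function replace_obj($c)
{
    $c = new Counter();
    $c->n = 100;
}

// With &, reassignment does reach the caller's variable.
function replace_obj_ref(&$c)
{
    $c = new Counter();
    $c->n = 100;
}

$x = new Counter();
bump($x);             // $x->n is 1
replace_obj($x);      // $x->n is still 1
$y = new Counter();
replace_obj_ref($y);  // $y->n is 100

echo $x->n, ' ', $y->n, "\n";  // prints "1 100"
```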
strcmp: what does "binary safe string comparison" mean? Is this comparison safe against timing attacks?
If not, how can I compare two strings so as to prevent a timing attack? Is comparing hashes of the strings enough? Or must I use some library (or my own code) that compares in constant time?
I've read that timing attacks can be carried out over the web. But does this type of attack happen in the real world? Or can it only be used by a narrow class of attackers (like governments), making this protection excessive for web applications?
"binary safe" means that any bytes can be safely compared with strcmp, not just valid characters in some character set. A quick test confirms that strcmp is not safe against timing attacks:
$nchars = 1000;
$s1 = str_repeat('a', $nchars + 1);
$s2 = str_repeat('a', $nchars) . 'b';
$s3 = 'b' . str_repeat('a', $nchars);
$times = 100000;
$start = microtime(true);
for ($i = 0; $i < $times; $i++) {
    strcmp($s1, $s2);
}
$timeForSameAtStart = microtime(true) - $start;
$start = microtime(true);
for ($i = 0; $i < $times; $i++) {
    strcmp($s1, $s3);
}
$timeForSameAtEnd = microtime(true) - $start;
printf("'b' at the end: %.4f\n'b' at the front: %.4f\n", $timeForSameAtStart, $timeForSameAtEnd);
For me this prints something like:
'b' at the end: 0.0634
'b' at the front: 0.0287
Many other string-based functions in PHP likely suffer from similar issues. Working around this is tricky, especially in PHP where you don't actually know what a lot of functions are really doing at the physical level.
One possible tactic is simply to add a random delay in your code before you return the answer to the caller/potential attacker. Even better, measure how long it took to check the input data (e.g., with microtime()), and then wait for a random time minus that amount. This is not 100% secure, but it makes attacking the system MUCH harder because, at a minimum, an attacker will have to try each input many times in order to filter out the randomness.
The problem with strcmp is that its behavior depends on the implementation. If it compares the strings byte by byte until it reaches a difference or the end of either string, then it is vulnerable to timing attacks.
Now, how about hashing?
I found this question on Security Stack Exchange, and I believe it has the correct answer for you:
https://security.stackexchange.com/a/46215
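Rather than comparing hashes, PHP 5.6+ provides hash_equals(), whose running time does not depend on where the two strings first differ. On older versions, the usual approach is to XOR every pair of bytes and accumulate the result, so the loop always covers the whole string. A minimal sketch of that idea (`constant_time_equals()` is a hypothetical name; like hash_equals(), it does not hide the string length):

```php
<?php
// XOR-accumulate comparison: every byte is examined regardless of where
// the first mismatch occurs. A length mismatch is rejected up front, so
// the length itself is not hidden.
function constant_time_equals($known, $user)
{
    $len = strlen($known);
    if (strlen($user) !== $len) {
        return false;
    }
    $diff = 0;
    for ($i = 0; $i < $len; $i++) {
        $diff |= ord($known[$i]) ^ ord($user[$i]);
    }
    return $diff === 0;
}

$secret = 'correct horse battery staple';
var_dump(constant_time_equals($secret, 'correct horse battery staple')); // bool(true)
var_dump(constant_time_equals($secret, 'xorrect horse battery staple')); // bool(false)

if (function_exists('hash_equals')) {   // PHP >= 5.6
    var_dump(hash_equals($secret, 'correct horse battery staple'));      // bool(true)
}
```

Where hash_equals() is available, it is the simpler choice; the userland loop mainly illustrates the technique.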
Timing attacks are a myth.
Let me explain.
The difference between the time it takes to validate a text that nearly matches and one that is completely different is a fraction of a second; let's say +/- 0.1 second (exaggerated!).
However, the time an attacker actually measures is:
network delay + 0.1 seconds + system delay (the server may be busy with some other task) + other delays.
So no, it's not possible: even on a local system (zero lag), the measured interval is always unclear.
In a test, let's say the difference between one method and another is 1 µs.
So, if we test it and the difference is 1 µs, then we could guess part of the number.
But what if there are other factors, for example the network, the CPU usage at that moment, the CPU cycle at that moment, and so on?
Even if we exclude the network, most operating systems are multitasking, so the test must be done on a single-tasking operating system or a system running a single task, and that is not something you see in the wild. Even embedded systems run multiple threads at the same time.
But let's say we run locally (no network) and we are doing a dry run on a computer that runs only a single task: ours. We still have another problem: modern CPUs don't run at a constant clock rate; it varies depending on load, temperature and other factors.
So, it is only possible if:
it is executed locally and there is no other factor;
it runs as a single task and no other task is running on the server;
the CPU runs at a constant rate.
In other words, it is ABSURD.
Here is the test:
<?php
$text='123456789012345678901234567890123456789012345678901234567890123456789012345678901234';
$compare1='12345678901234567890123456789012345678901234567890123456789012345678901234567890123x';
$compare2='2222222222222222222222222222222222222222222222222222222222222222222222222222222222222';
$a1=microtime(true);
for($i=0;$i<100000;$i++) {
    if($compare1===$text) {
        // do something
    }
}
$a2=microtime(true);
var_dump($a2-$a1);
$a1=microtime(true);
for($i=0;$i<100000;$i++) {
    if($compare2===$text) {
        // do something
    }
}
$a2=microtime(true);
var_dump($a2-$a1);
It took me 5 minutes to invalidate this hypothesis.
What is tested:
It compares a 512-bit text against two candidates and compares the times.
The test is designed to prove the hypothesis, so it forces an unrealistic situation where the first candidate is almost the same as the text (except for the last character).
It also excludes latency and other operations.
(Why 512 bits? Most passwords are encrypted to 128 or 256 bits; 512 bits is what we could call safe.)
And here are the results:
one round:
0.021588087081909
0.021672010421753 (long time)
another run:
0.021767854690552
0.022729873657227 (long time)
and another run:
0.021697998046875 (long time)
0.021611213684082
and again
0.021565914154053 (long time)
0.020948171615601
and again
0.021995067596436
0.0224769115448 (long time)
So, even when the test is rigged to prove the point, it fails.
In other words, you can't find a trend when one of the variables is unknown, and this factor compromises the whole test. I could run it 1 million times and the result would be the same. And this test in particular avoids any variables such as latency, other processes, database access, etc.
How can I calculate the CPU time actually used by my php script?
Note this is NOT what I'm looking for:
<?php
$begin_time=microtime(true);
//..
//end of the script:
$total_time=microtime(true)-$begin_time;
because that would give me the elapsed time. That may include a lot of time used by unrelated processes running at the same time, as well as time spent waiting for I/O.
I've seen there is getrusage(), but a user comment in the documentation page says:
getrusage() reports kernel counters that are updated only once application loses context and a switch to kernel space happens. For example on modern Linux server kernels that would mean that getrusage() calls would return information rounded at 10ms, desktop kernels - at 1ms.
getrusage() isn't usable for micro-measurements at all - and getmicrotime(true) might be much more valuable resource.
so that doesn't seem to be an option, does it?
What alternatives do I have?
Define "your" I/O. Technically, moving data from memory into CPU registers is I/O. Are you saying you want to exclude that time from your calculation? (I'm guessing not.)
What I'm getting at is that you most likely want to profile your code. If you want to measure the time not spent reading from or writing to files or network sockets, just use microtime() and put extra calls around the portions doing I/O. That will also give you an idea of how much time your I/O takes. More likely, you will find some loop taking more time than you expect.
When I profile like this, I either use the profiling tools in Eclipse, or I use a time logger and insert time logging into the code in a binary-search-like fashion. Usually I find some small area of code that takes 85% of the measured time and optimize there.
Also, as a side note, don't let the perfect become the enemy of the practical. 90% of the time, your calls won't be interrupted by some other process, and your microtime() measurements will be close enough.
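A minimal sketch of that approach: wrap the I/O-heavy parts in a timer that accumulates durations per label, so the total minus the I/O time approximates the time spent in your own code. The `Timer` class, its `measure()` method, and the temporary-file writes standing in for real I/O are all hypothetical:

```php
<?php
// Hypothetical accumulating timer: measure() runs a callable and adds its
// wall-clock duration to a named bucket.
class Timer
{
    private $totals = array();

    public function measure($label, $callable)
    {
        $start = microtime(true);
        $result = call_user_func($callable);
        if (!isset($this->totals[$label])) {
            $this->totals[$label] = 0.0;
        }
        $this->totals[$label] += microtime(true) - $start;
        return $result;
    }

    public function total($label)
    {
        return isset($this->totals[$label]) ? $this->totals[$label] : 0.0;
    }
}

$timer = new Timer();
$scriptStart = microtime(true);

$file = tempnam(sys_get_temp_dir(), 'bench');
for ($i = 0; $i < 100; $i++) {
    // I/O portion: timed separately.
    $timer->measure('io', function () use ($file, $i) {
        file_put_contents($file, str_repeat((string) $i, 1000));
    });
    // CPU portion: not wrapped.
    $sum = 0;
    for ($k = 0; $k < 10000; $k++) {
        $sum += $k;
    }
}
unlink($file);

$total = microtime(true) - $scriptStart;
$io = $timer->total('io');
printf("total %.4f s, I/O %.4f s, rest %.4f s\n", $total, $io, $total - $io);
```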
You can use getrusage()
Full example:
<?php
function rutime($ru, $rus, $index){
    return ($ru["ru_$index.tv_sec"]*1000 + intval($ru["ru_$index.tv_usec"]/1000))
        - ($rus["ru_$index.tv_sec"]*1000 + intval($rus["ru_$index.tv_usec"]/1000));
}
$cpu_before = getrusage();
$ms = microtime(true) * 1000;
sleep(3);
$tab = [];
for($i = 0; $i < 500000; $i++) {
    $tab[] = $i;
}
$cpu_after = getrusage();
echo "Took ".rutime($cpu_after, $cpu_before, "utime")." ms CPU usage" . PHP_EOL;
echo "Took ".((microtime(true) * 1000) - $ms)." ms total". PHP_EOL;
Source: https://helpdesk.nodehost.ca/en/article/how-to-calculate-real-cpu-usage-in-a-php-script-o3ceu8/
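The example above reports only user time (`ru_utime`); getrusage() also reports system (kernel) time in the `ru_stime.*` fields, and the CPU time a process has consumed is the sum of the two. A variant that reports both, in seconds, as a sketch (`cpu_seconds()` is a hypothetical helper, and the 10 ms / 1 ms granularity caveat quoted in the question still applies):

```php
<?php
// Sum user + system CPU time from a getrusage() result, in seconds.
function cpu_seconds(array $ru)
{
    return $ru['ru_utime.tv_sec'] + $ru['ru_utime.tv_usec'] / 1e6
         + $ru['ru_stime.tv_sec'] + $ru['ru_stime.tv_usec'] / 1e6;
}

$ruBefore = getrusage();
$wallBefore = microtime(true);

usleep(200000);               // waiting: wall time passes, CPU time barely moves
$tab = array();
for ($i = 0; $i < 500000; $i++) {
    $tab[] = $i;              // busy work: consumes CPU time
}

$cpu = cpu_seconds(getrusage()) - cpu_seconds($ruBefore);
$wall = microtime(true) - $wallBefore;
printf("CPU: %.3f s, wall: %.3f s\n", $cpu, $wall);
```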
I would like to compare different PHP code to know which one would be executed faster. I am currently using the following code:
<?php
$load_time_1 = 0;
$load_time_2 = 0;
$load_time_3 = 0;
for($x = 1; $x <= 20000; $x++)
{
    //code 1
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_1 += (microtime(true) - $start_time);
    //code 2
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_2 += (microtime(true) - $start_time);
    //code 3
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_3 += (microtime(true) - $start_time);
}
echo $load_time_1;
echo '<br />';
echo $load_time_2;
echo '<br />';
echo $load_time_3;
?>
I have executed the script several times.
The first result is
0.44057559967041
0.43392467498779
0.43600964546204
The second result is
0.50447297096252
0.48595094680786
0.49943733215332
The third result is
0.5283739566803
0.55247902870178
0.55091571807861
The results look okay, but the problem is that every time I execute this code, the result is different, even though I am comparing the same code three times on the same machine.
Why would there be a difference in speed between the comparisons? And is there a way to compare execution times and see the real difference?
There is a thing called observational error.
As long as your differences do not exceed it, all your measurements are just a waste of time.
The only proper way of measuring is called profiling, which means measuring the significant parts of the code, not meaningless ones.
Why would there be a difference in speed while comparing?
There are two reasons for this, both related to things outside your control that are handled by PHP and the operating system.
Firstly, the processor can only perform a certain number of operations at any given time. The operating system is responsible for multitasking, dividing the available cycles among your applications. Since these cycles aren't handed out at a constant rate, small speed variations are to be expected even with identical PHP commands.
Secondly, a bigger cause of time variations is PHP's background work. Many things are completely hidden from the user, such as memory allocation, garbage collection and handling the various namespaces for variables. These operations also take CPU cycles, and they can run at unexpected times during your script. If garbage collection runs during the first increment but not the second, the first operation takes longer than the second. Because of garbage collection, the order in which the tests are performed can sometimes also affect the execution time.
Speed testing can be a bit tricky, because unrelated factors (like other applications running in the background) can skew the results of your test. Small speed differences between scripts are generally hard to detect, but when a speed test is run enough times, the real results become visible. For example, if one script is consistently faster than another, that usually indicates it is more efficient in terms of processing speed.
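One common way to make repeated runs more comparable, offered as a sketch rather than a definitive method, is to do one warm-up run, repeat each measurement several times, and compare medians instead of single totals, since a median is less sensitive to an occasional interruption by the OS or garbage collection. `bench_median()` and the two candidate snippets are hypothetical:

```php
<?php
// Run $fn once as a warm-up, then $repeats timed runs of $loops calls each,
// and return the median run duration in seconds.
function bench_median($fn, $loops = 100000, $repeats = 7)
{
    $fn(); // warm-up
    $times = array();
    for ($r = 0; $r < $repeats; $r++) {
        $start = microtime(true);
        for ($i = 0; $i < $loops; $i++) {
            $fn();
        }
        $times[] = microtime(true) - $start;
    }
    sort($times);
    return $times[(int) (count($times) / 2)];
}

$candidates = array(
    'post-increment' => function () { $i = 1; $i++; },
    'pre-increment'  => function () { $i = 1; ++$i; },
);
$medians = array();
foreach ($candidates as $name => $fn) {
    $medians[$name] = bench_median($fn);
    printf("%-15s %.5f s\n", $name, $medians[$name]);
}
```

The closure-call overhead is included in every measurement, so only the difference between candidates is meaningful, and a difference smaller than the spread between repeated runs should be treated as noise.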
The reason the results vary is that other things are going on at the same time, such as Windows or Linux system tasks and other processes, so you will never get an exact result. You're best off running the code over 100 iterations and then dividing the total to find the average time taken, and using that as your figure.
It would also be beneficial to create a class that handles this for you, so you can reuse it without writing the code every time.
Try something like this (untested):
class CodeBench
{
    private $benches = array();

    public function __construct(){}

    public function begin($name)
    {
        if (!isset($this->benches[$name]))
        {
            $this->benches[$name] = array();
        }
        $this->benches[$name]['start'] = array(
            'microtime' => microtime(true)
            /* Other information*/
        );
    }

    public function end($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        $this->benches[$name]['end'] = array(
            'microtime' => microtime(true) // a float, like 'start'
            /* Other information*/
        );
    }

    public function calculate($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        if (!isset($this->benches[$name]['end']))
        {
            throw new Exception("You must first call an end call for " . $name);
        }
        // microtime(true) is in seconds; convert the difference to milliseconds.
        $seconds = $this->benches[$name]['end']['microtime']
                 - $this->benches[$name]['start']['microtime'];
        return ($seconds * 1000) . 'ms';
    }
}
And then use like so:
$CB = new CodeBench();
$CB->begin("bench_1");
//Do work:
$CB->end("bench_1");
$CB->begin("bench_2");
//Do work:
$CB->end("bench_2");
echo "First benchmark had taken: " . $CB->calculate("bench_1");
echo "Second benchmark had taken: " . $CB->calculate("bench_2");
Computing speeds are never 100% set in stone. PHP is a server-side language, so depending on the computing power available to the server, the same code can take a varying amount of time.
Since you're subtracting from the start time with each step, it is expected that load time 3 will be greater than 2 which will be greater than 1.
Or: Should I optimize my string operations in PHP? I tried to find hints in PHP's manual, but I didn't find anything.
PHP already optimises it - variables are assigned using copy-on-write, and objects are passed by reference. In PHP 4 it doesn't, but nobody should be using PHP 4 for new code anyway.
One of the most essential speed-optimization techniques in many languages is instance reuse. In that case, the speed increase comes from at least two factors:
1. Fewer instantiations mean less time spent on construction.
2. The less memory the application uses, the fewer CPU cache misses there are likely to be.
For applications where speed is the #1 priority, there is a truly tight bottleneck between the CPU and the RAM. One reason for the bottleneck is RAM latency.
PHP, Ruby, Python, etc. are affected by cache misses too, because they also store at least some (probably all) of the interpreted programs' run-time data in RAM.
String instantiation is one of the operations that is done quite often, in relatively "huge quantities", and it may have a noticeable impact on speed.
Here's a run_test.bash of a measurement experiment:
#!/bin/bash
for i in `seq 1 200`;
do
    /usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_instantiate=False; // 0.1624 seconds
$b_instantiate=True; // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar=array();
$s='This is a string.';
$n=10000;
$s_1=NULL;
for($i=0;$i<$n;$i++) {
    if($b_instantiate) {
        $s_1=''.$s;
    } else {
        $s_1=&$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar,''.rand(0,9).$s_1);
} // for
echo($ar[rand(0,$n-1)]."\n"); // valid indices are 0..$n-1
?>
My conclusion from this experiment and one other experiment that I did with Ruby 1.8 is that it makes sense to pass string values around by reference.
One possible way to apply "pass-strings-by-reference" across the whole application is to consistently create a new string instance whenever one needs a modified version of a string.
To increase locality, therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates the case for string concatenations:
<?php
// The comments on the
// next 2 lines show arithmetic mean of (user time + sys time) for 200 runs.
$b_suboptimal=False; // 0.0611 seconds
$b_suboptimal=True; // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.
$n=1000;
$s_token="This is a string with a Linux line break.\n";
$s_whole='';
if($b_suboptimal) {
    for($i=0;$i<$n;$i++) {
        $s_whole=$s_whole.$s_token.$i;
    } // for
} else {
    $i_watershed=(int)round((($n*1.0)/2),0);
    $s_part_1='';
    $s_part_2='';
    for($i=0;$i<$i_watershed;$i++) {
        $s_part_1=$s_part_1.$i.$s_token;
    } // for
    for($i=$i_watershed;$i<$n;$i++) {
        $s_part_2=$s_part_2.$i.$s_token;
    } // for
    $s_whole=$s_part_1.$s_part_2;
} // else
// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle=fopen('./it_might_have_been_a_served_HTML_page.txt','w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain a considerable amount of text, one might want to think about the order in which the different parts of the generated HTML are concatenated.
A BSD-licensed PHP implementation and Ruby implementation of the watershed string concatenation algorithm are available. The same algorithm can be (and, in my case, has been) generalized to speed up multiplication of arbitrary-precision integers.
Arrays and strings have copy-on-write behaviour. They are mutable, but when you assign one to another variable, that variable initially contains the exact same instance of the string or array. A copy is made only when you modify the array or string.
Example:
$a = array_fill(0, 10000, 42); //Consumes 545744 bytes
$b = $a; // " 48 "
$b[0] = 42; // " 545656 "
$s = str_repeat(' ', 10000); // " 10096 "
$t = $s; // " 48 "
$t[0] = '!'; // " 10048 "
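The same effect can be observed with memory_get_usage(). A sketch: exact byte counts depend on the PHP version, so the script prints the deltas instead of assuming them, and it writes a different value than the example above so the separation is also visible in the data:

```php
<?php
$a = array_fill(0, 10000, 42);

$before = memory_get_usage();
$b = $a;                        // no copy yet: $b shares $a's storage
$afterAssign = memory_get_usage();
$b[0] = 7;                      // first write separates $b from $a (full copy)
$afterWrite = memory_get_usage();

printf("assign: %d bytes, write: %d bytes\n",
    $afterAssign - $before, $afterWrite - $afterAssign);
echo $a[0], ' ', $b[0], "\n";   // prints "42 7"
```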
A quick Google search suggests that they are mutable, but the preferred practice is to treat them as immutable.
Strings are mutable in PHP 7.4:
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
Test: PHP Sandbox
PHP strings are immutable.
Try this:
$a="string";
echo "<br>$a<br>";
echo str_replace('str','b',$a);
echo "<br>$a";
It echoes:
string
bing
string
If strings were mutable, it would have continued to show "bing".
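The two answers above test different things: str_replace() always returns a new string and never modifies its argument, whereas offset assignment such as `$str[2] = 'y'` changes the string held by that variable (copy-on-write still leaves any other variable that shared it untouched). A sketch showing both in one script:

```php
<?php
$a = 'string';
$replaced = str_replace('str', 'b', $a);   // new string; $a untouched

$str = "Hello";
$alias = $str;        // shares the same string until one side writes
$str[2] = 'y';        // modifies $str only; $alias keeps "Hello"

echo $a, ' ', $replaced, "\n";   // prints "string bing"
echo $str, ' ', $alias, "\n";    // prints "Heylo Hello"
```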