How Significant Is PHP Function Call Overhead?

I'm relatively new to PHP and slowly learning the idiosyncrasies specific to the language. One thing I get dinged for a lot is that (so I'm told) I use too many function calls and am generally asked to do things to work around them. Here are two examples:
// Change this:
} catch (Exception $e) {
    print "It seems that error " . $e->getCode() . " occurred";
    log("Error: " . $e->getCode());
}

// To this:
} catch (Exception $e) {
    $code = $e->getCode();
    print "It seems that error " . $code . " occurred";
    log("Error: " . $code);
}
2nd Example
// Change this:
$customer->setProducts($products);

// To this:
if (!empty($products)) {
    $customer->setProducts($products);
}
In the first example I find that assigning $e->getCode() to $code adds a slight cognitive overhead: "What's '$code'? Ah, it's the code from the exception." The second example adds cyclomatic complexity. In both examples I find the optimization comes at the cost of readability and maintainability.
Is the performance increase worth it or is this micro optimization?
I should note that we're stuck with PHP 5.2 for right now.
I've done some very rough bench tests and find the function call performance hit to be on the order of 10% to 70% depending on the nature of my bench test. I'll concede that this is significant. But before that catch block is hit there was a call to a database and an HTTP end point. Before $products was set on the $customer there was a complex sort that happened to the $products array. At the end of the day does this optimization justify the cost of making the code harder to read and maintain? Or, although these examples are simplifications, does anybody find the 2nd examples just as easy or easier to read than the first (am I being a wiener)?
Can anyone cite any good articles or studies about this?
Edit:
An example bench test:
<?php
class Foo {
    private $list;

    public function setList($list) {
        $this->list = $list;
    }
}

$foo1 = new Foo();
for ($i = 0; $i < 1000000; $i++) {
    $a = array();
    if (!empty($a))
        $foo1->setList($a);
}
?>
Run that file with the time command. On one particular machine it takes an average of 0.60 seconds after several runs. Commenting out the if (!empty($a)) causes it to take an average of 3.00 seconds to run.
Clarification: these are examples. The 1st example demonstrates horrible exception handling and a possible DRY violation for the sake of being a simple, non-domain-specific example.

Nobody has yet discussed how the server's hardware is related to function call overhead.
When a function is called, all of the CPU's registers contain data relevant to the current execution point. All of the CPU's registers must be saved to memory (typically the process' stack) or there is no hope of ever returning to that execution point and resuming execution. When returning from the function, all of the CPU's registers must be restored from memory (typically the process' stack).
So, one can see how a string of nested function calls can add overhead to the process. The CPU's registers must be saved over and over again on the stack and restored over and over again to get back from the functions.
This is really the source of the overhead of function calls. And if function arguments are passed, those must all be duplicated before the function can be called. Therefore, passing huge arrays as function arguments is a poor design.
Studies have been done on the overhead of getters/setters in object-oriented PHP. Removing all getters/setters cut execution time by about 50%, and that is simply due to function call overhead.
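As a rough, illustrative sketch of that kind of measurement (this is my own example, not the code from those studies), you can time a getter against direct property access:

<?php
// Compare a getter call against direct property access over many iterations.
class Point {
    public $x = 0;
    public function getX() { return $this->x; }
}

$p = new Point();
$iterations = 1000000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $v = $p->getX();   // goes through a method call
}
$getterTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $v = $p->x;        // reads the property directly
}
$directTime = microtime(true) - $start;

printf("getter: %.4fs, direct: %.4fs\n", $getterTime, $directTime);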

PHP function call overhead is precisely 15.5355%.
:) Just stirring the pot.
Seriously, here are a couple great links on the subject:
Is it possible to have too many functions in a PHP application?
functions vs repeated code
The code maintainability versus speed discussions at these links address the (maybe more important) question implied by the OP. But to add a little data that may also be pertinent, and hopefully useful to people who come across this thread in the future, here are results from running the code below on a 2011 MacBook Pro (with very little drive space and too many programs running).
As noted elsewhere, an important consideration in deciding whether to call a function or put the code "in-line" is how many times the function will be called from within a certain block of code. The more times the function will be called, the more it's worth considering doing the work in-line.
Results (times in seconds)
Call Function Method | In-Line Method | Difference | Percent Difference
1,000 iterations (4 runs)
0.0039088726043701 | 0.0031478404998779 | 0.00076103210449219 | 19.4694
0.0038208961486816 | 0.0025999546051025 | 0.0012209415435791 | 31.9543
0.0030159950256348 | 0.0029480457305908 | 6.7949295043945E-5 | 2.2530
0.0031449794769287 | 0.0031390190124512 | 5.9604644775391E-6 | 0.1895
1,000,000 iterations (4 runs)
3.1843111515045 | 2.6896121501923 | 0.49469900131226 | 15.5355
3.131945848465 | 2.7114839553833 | 0.42046189308167 | 13.4249
3.0256152153015 | 2.7648048400879 | 0.26081037521362 | 8.6201
3.1251409053802 | 2.7397727966309 | 0.38536810874939 | 12.3312
function postgres_friendly_number($dirtyString) {
    $cleanString = str_ireplace("(", "-", $dirtyString);
    $badChars = array("$", ",", ")");
    $cleanString = str_ireplace($badChars, "", $cleanString);
    return $cleanString;
}

// main
$badNumberString = '-$590,832.61';
$iterations = 1000000;

$startTime = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
    $goodNumberString = postgres_friendly_number($badNumberString);
}
$endTime = microtime(true);
$firstTime = ($endTime - $startTime);

$startTime = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
    $goodNumberString = str_ireplace("(", "-", $badNumberString);
    $badChars = array("$", ",", ")");
    $goodNumberString = str_ireplace($badChars, "", $goodNumberString);
}
$endTime = microtime(true);
$secondTime = ($endTime - $startTime);

$timeDifference = $firstTime - $secondTime;
$percentDifference = (($timeDifference / $firstTime) * 100);

// Output the four values reported in the table above.
echo $firstTime . " | " . $secondTime . " | " . $timeDifference . " | " . $percentDifference . "\n";

The canonical PHP implementation is very slow because it is simple to implement, and the applications PHP is aimed at do not require raw performance such as fast function calls.
You might want to consider other PHP implementations.
If you are writing the applications you should be writing in PHP (dump data from DB to the browser over a network) then the function call overhead is not significant. Certainly don't go out of your way to duplicate code because you were afraid using a function would be too much overhead.

I think you are confusing terminology. Generally, function call overhead means the overhead involved in calling and returning from a function, rather than processing inline. It is not the total cost of a function call; it is just the cost of preparing the arguments and return values and doing the branch.
The problem you have is that PHP and other weakly typed scripting style languages are really bad at determining if functions have side effects. So rather than store the results of a function as a temp, they will make multiple calls. If the function is doing something complex this will be very inefficient.
So, the bottom line is: call the function once and store and reuse the result!
Don't call the same function multiple times with the same arguments. (without a good reason)
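As a hedged illustration of "store and reuse" (the count() example is my own, not from this answer): avoid re-evaluating the same call inside a loop condition:

<?php
$items = range(1, 100000);

// Repeated calls: count() is evaluated on every iteration.
for ($i = 0; $i < count($items); $i++) {
    // ...
}

// Store and reuse: the result is computed once.
$n = count($items);
for ($i = 0; $i < $n; $i++) {
    // ...
}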


Comparing execution times in PHP

I would like to compare different PHP code to know which one would be executed faster. I am currently using the following code:
<?php
$load_time_1 = 0;
$load_time_2 = 0;
$load_time_3 = 0;

for ($x = 1; $x <= 20000; $x++)
{
    // code 1
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_1 += (microtime(true) - $start_time);

    // code 2
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_2 += (microtime(true) - $start_time);

    // code 3
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_3 += (microtime(true) - $start_time);
}

echo $load_time_1;
echo '<br />';
echo $load_time_2;
echo '<br />';
echo $load_time_3;
?>
I have executed the script several times.
The first result is
0.44057559967041
0.43392467498779
0.43600964546204
The second result is
0.50447297096252
0.48595094680786
0.49943733215332
The third result is
0.5283739566803
0.55247902870178
0.55091571807861
The results look okay, but the problem is that every time I execute this code the result is different, even though I am comparing the same code three times on the same machine.
Why would there be a difference in speed while comparing? And is there a way to compare execution times and see the real difference?
There is a thing called observational error.
As long as your numbers do not exceed it, all your measurements are just a waste of time.
The only proper way of doing measurements is called profiling, and it means measuring significant parts of the code, not senseless ones.
Why would there be a difference in speed while comparing?
There are two reasons for this, both related to how things that are out of your control are handled by PHP and the operating system.
Firstly, the processor can only do a certain amount of work at any given time. The operating system is responsible for multitasking and divides the available cycles between your applications. Since these cycles aren't granted at a constant rate, small speed variations are to be expected even with identical PHP commands, because of how processor cycles are allocated.
Secondly, a bigger cause of time variations is the background work PHP does. There are many things that are completely hidden from the user, like memory allocation, garbage collection and handling the various namespaces for variables and the like. These operations also take processor cycles and can run at unexpected times during your script. If garbage collection is performed during the first incrementation but not the second, the first operation takes longer than the second. Sometimes, because of garbage collection, the order in which the tests are performed can also affect the execution time.
Speed testing can be a bit tricky, because unrelated factors (like other applications running in the background) can skew the results of your test. Generally, small speed differences are hard to tell between scripts, but when a speed test is run a large enough number of times, the real results can be seen. For example, if one script is consistently faster than another, that usually points to the script being more efficient in terms of processing speed.
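As a rough sketch of that advice (my own illustration, assuming PHP 5.3+ closures): repeat each snippet several times and compare an aggregate such as the minimum, rather than a single reading:

<?php
// Time a snippet several times and keep the minimum, which is usually
// the reading least polluted by background activity.
function time_snippet($snippet, $iterations = 20000, $runs = 10) {
    $best = INF;
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $snippet();
        }
        $best = min($best, microtime(true) - $start);
    }
    return $best;
}

echo time_snippet(function () { $i = 1; $i++; }) . "\n";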
The reason the results vary is that there are other things going on at the same time, such as Windows or Linux background tasks and other processes. You will never get an exact result; you are best off running the code over 100 iterations, then dividing the result to find the average time taken, and using that as your figure.
Also, it would be beneficial for you to create a class that can handle this for you; that way you can use it all the time without having to write the code every time.
Try something like this (untested):
class CodeBench
{
    private $benches = array();

    public function __construct(){}

    public function begin($name)
    {
        if (!isset($this->benches[$name]))
        {
            $this->benches[$name] = array();
        }

        $this->benches[$name]['start'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function end($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }

        $this->benches[$name]['end'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function calculate($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }

        if (!isset($this->benches[$name]['end']))
        {
            throw new Exception("You must first call an end call for " . $name);
        }

        // Subtract the recorded timestamps and convert to milliseconds.
        $seconds = $this->benches[$name]['end']['microtime'] - $this->benches[$name]['start']['microtime'];
        return ($seconds * 1000) . 'ms';
    }
}
And then use it like so:
$CB = new CodeBench();

$CB->begin("bench_1");
// Do work:
$CB->end("bench_1");

$CB->begin("bench_2");
// Do work:
$CB->end("bench_2");

echo "First benchmark took: " . $CB->calculate("bench_1");
echo "Second benchmark took: " . $CB->calculate("bench_2");
Computing speeds are never 100% set in stone. PHP is a server-side script, and thus depending on the computing power available to the server, it can take a varying amount of time.
Since you're subtracting from the start time with each step, it is expected that load time 3 will be greater than 2 which will be greater than 1.

Is wrapping code into a function that doesn't need to be, bad in PHP?

Many, many times on a page I will have to set POST and GET values in PHP like this.
I just want to know if it is better to continue doing it the way I have above, or if performance would be unaffected by moving it into a function like in the code below.
This would make it much easier to write code, but at the expense of making extra function calls on the page.
I have all the time in the world, so making the code as fast as possible is more important to me than making it "easier to write or faster to develop".
I appreciate any advice, and please, nothing about whichever makes it easier to develop; I am talking pure performance here =)
<?php
function arg_p($name, $default = null) {
    return (isset($_GET[$name])) ? $_GET[$name] : $default;
}

$pagesize = arg_p('pagesize', 10);

$pagesize = (isset($_GET['pagesize'])) ? $_GET['pagesize'] : 10;
?>
If you have all the time in the world, why don't you just test it?
<?php
// How many iterations?
$iterations = 100000;

// Inline
$timer_start = microtime(TRUE);
for ($i = 0; $i < $iterations; $i++) {
    $pagesize = (isset($_GET['pagesize'])) ? $_GET['pagesize'] : 10;
}
$time_spent = microtime(TRUE) - $timer_start;
printf("Inline: %.3fs\n", $time_spent);

// By function call
function arg_p($name, $default = null) {
    return (isset($_GET[$name])) ? $_GET[$name] : $default;
}

$timer_start = microtime(TRUE);
for ($i = 0; $i < $iterations; $i++) {
    $pagesize = arg_p('pagesize', 10);
}
$time_spent = microtime(TRUE) - $timer_start;
printf("By function call: %.3fs\n", $time_spent);
?>
On my machine, this gives pretty clear results in favor of inline execution by a factor of almost 10. But you need a lot of iterations to really notice it.
(I would still use a function though, even if me answering this shows that I have time to waste ;)
Sure you'll probably get a performance benefit from not wrapping it into a function. But would it be noticeable? Not really.
Your time is worth more than the small amount of CPU resources you'd save.
I doubt the difference in speed would be noticeable unless you are doing it many hundreds of times.
Function call is a performance hit, but you should also think about maintainability - wrapping it in the function could ease future changes (and copy-paste is bad for that).
Whilst performance wouldn't really be affected, the more code you take out of the HTML stream, the better.
Even with a thousand calls to your arg_p() you wouldn't be able to measure, let alone notice, the difference in performance. The time you will spend typing the extra inline code, plus the time you will spend duplicating changes to every inlined copy, plus the added complexity and higher probability of a typo or random error, will cost you more than the unmeasurable performance improvement. In fact, that time could be spent on optimizing what really counts, such as improving your database design and profiling your code to find the areas that really affect generation time.
You'll be better off keeping your code clean. It will save you time that you can in turn invest into optimizing what really counts.

In PHP (>= 5.0), is passing by reference faster?

In PHP, function parameters can be passed by reference by prepending an ampersand to the parameter in the function declaration, like so:
function foo(&$bar)
{
// ...
}
Now, I am aware that this is not designed to improve performance, but to allow functions to change variables that are normally out of their scope.
Instead, PHP seems to use Copy On Write to avoid copying objects (and maybe also arrays) until they are changed. So, for functions that do not change their parameters, the effect should be the same as if you had passed them by reference.
However, I was wondering if the Copy On Write logic is perhaps short-circuited on pass-by-reference, and whether that has any performance impact.
ETA: To be sure, I assume that it's not faster, and I am well aware that this is not what references are for. So I think my own guesses are quite good; I'm just looking for an answer from someone who really knows what's definitely happening under the hood. In five years of PHP development, I've always found it hard to get quality information on PHP internals short of reading the source.
In a test with 100 000 iterations of calling a function with a string of 20 kB, the results are:
Function that just reads / uses the parameter
pass by value: 0.12065005 seconds
pass by reference: 1.52171397 seconds
Function to write / change the parameter
pass by value: 1.52223396 seconds
pass by reference: 1.52388787 seconds
Conclusions
Passing the parameter by value is always faster.
If the function changes the value of the variable passed in, for practical purposes it is the same as passing by reference.
The Zend Engine uses copy-on-write, and when you use a reference yourself, it incurs a little extra overhead. I can only find this mention at the time of writing though, and the comments in the manual contain other links.
(EDIT) The manual page on Objects and references contains a little more info on how object variables differ from references.
I ran some tests on this because I was unsure of the answers given.
My results show that passing large arrays or strings by reference IS significantly faster.
Here are my results:
The Y axis (Runs) is how many times a function could be called in 1 second * 10
The test was repeated 8 times for each function/variable
And here are the variables I used:
$large_array = array_fill(PHP_INT_MAX / 2, 1000, 'a');
$small_array = array('this', 'is', 'a', 'small', 'array');
$large_object = (object)$large_array;
$large_string = str_repeat('a', 100000);
$small_string = 'this is a small string';
$value = PHP_INT_MAX / 2;
These are the functions:
function pass_by_ref(&$var) {
}
function pass_by_val($var) {
}
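The answer does not include its timing harness; a minimal sketch of how such calls-per-second numbers might be gathered (my own assumption, reusing the functions and the $large_string defined above) could look like this:

<?php
// Rough sketch: count how many calls to each function fit into one second.
function count_calls_by_val($var) {
    $calls = 0;
    $start = microtime(true);
    while (microtime(true) - $start < 1.0) {
        pass_by_val($var);
        $calls++;
    }
    return $calls;
}

function count_calls_by_ref(&$var) {
    $calls = 0;
    $start = microtime(true);
    while (microtime(true) - $start < 1.0) {
        pass_by_ref($var);
        $calls++;
    }
    return $calls;
}

echo "by value: " . count_calls_by_val($large_string) . " calls/second\n";
echo "by reference: " . count_calls_by_ref($large_string) . " calls/second\n";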
I have experimented with passing a 10 kB string by value and by reference to two otherwise identical functions. They were ordinary functions: take an argument, do simple processing, and return a value. I made 100,000 calls to both and found that references are not designed to increase performance; the gain from the reference was around 4-5%, and it grows only when the string becomes large enough (100 kB and longer, which gave a 6-7% improvement). So my conclusion is: do not use references to increase performance; that is not what they are for.
I used PHP Version 5.3.1
I'm pretty sure that no, it's not faster.
Additionally, it says specifically in the manual not to try using references to increase performance.
Edit: Can't find where it says that, but it's there!
I tried to benchmark this with a real-world example based on a project I was working on. As always, the differences are trivial, but the results were somewhat unexpected. For most of the benchmarks I've seen, the called function doesn't actually change the value passed in. I performed a simple str_replace() on it.
Pass by Value Test Code:
$originalString = ''; // 1000 pseudo-random digits

function replace($string) {
    return str_replace('1', 'x', $string);
}

$output = '';

/* set start time */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tstart = $mtime;

set_time_limit(0);

for ($i = 0; $i < 10; $i++) {
    for ($j = 0; $j < 1000000; $j++) {
        $string = $originalString;
        $string = replace($string);
    }
}

/* report how long it took */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tend = $mtime;
$totalTime = ($tend - $tstart);
$totalTime = sprintf("%2.4f s", $totalTime);
$output .= "\n" . 'Total Time: ' . $totalTime;
$output .= "\n" . $string;
echo $output;
Pass by Reference Test Code
The same, except for:
function replace(&$string) {
    $string = str_replace('1', 'x', $string);
}

/* ... */

replace($string);
Results in seconds (10 million iterations):
PHP 5
Value: 14.1007
Reference: 11.5564
PHP 7
Value: 3.0799
Reference: 2.9489
The difference is a fraction of a millisecond per function call, but for this use case, passing by reference is faster in both PHP 5 and PHP 7.
(Note: the PHP 7 tests were performed on a faster machine -- PHP 7 is faster, but probably not that much faster.)
There is nothing better than testing with a piece of code:
<?php
$r = array();
for ($i = 0; $i < 500; $i++) {
    $r[] = 5;
}

function a($r) {
    $r[0] = 1;
}

function b(&$r) {
    $r[0] = 1;
}

$start = microtime(true);
for ($i = 0; $i < 9999; $i++) {
    //a($r);
    b($r);
}
$end = microtime(true);

echo $end - $start;
?>
Final result! The bigger the array (or the greater the count of calls) the bigger the difference. So in this case, calling by reference is faster because the value is changed inside the function.
Otherwise there is no real difference between "by reference" and "by value"; the compiler is smart enough not to create a new copy each time if there is no need.
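A small sketch of the copy-on-write behaviour being described (my own illustration, using memory_get_usage()): a large array passed by value is not copied until the function writes to it:

<?php
$big = array_fill(0, 100000, 'x');

function reads_only($arr) {
    // No write, so no copy is made; memory stays roughly flat.
    return memory_get_usage();
}

function writes($arr) {
    $arr[0] = 'y'; // The write triggers the copy.
    return memory_get_usage();
}

$before = memory_get_usage();
echo "read-only call grew memory by: " . (reads_only($big) - $before) . " bytes\n";
echo "writing call grew memory by: " . (writes($big) - $before) . " bytes\n";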
It's simple; there is no need to test anything.
Depends on use-case.
Passing by value will ALWAYS BE FASTER than passing by reference for a small number of arguments. This depends on how many variables the architecture allows to be passed through registers (ABI).
For example, x64 will allow you to pass 4 values of 64 bits each through registers.
https://en.wikipedia.org/wiki/X86_calling_conventions
This is because you don't have to dereference the pointers; you just use the value directly.
If the data that needs to be passed is bigger than what the ABI allows, the rest of the values go on the stack.
In that case, an array or an object (which is an instance of a class, or a structure plus headers) will ALWAYS BE FASTER BY REFERENCE.
This is because a reference is just a pointer to your data (not the data itself), of fixed size, say 32 or 64 bits depending on the machine. That pointer will fit in one CPU register.
PHP is written in C/C++, so I'd expect it to behave the same.
There is no need for adding & operator when passing objects. In PHP 5+ objects are passed by reference anyway.
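A brief sketch of what that means in practice (my own example, not from the answer above): property changes made inside a function are visible to the caller even without &, but reassigning the parameter to a new object only rebinds the local variable:

<?php
class Config {
    public $debug = false;
}

function enable_debug($config) {
    $config->debug = true;    // visible to the caller without &
}

function swap_config($config) {
    $config = new Config();   // only rebinds the local variable
    $config->debug = true;
}

$c = new Config();
enable_debug($c);
var_dump($c->debug); // bool(true): the property change is visible

$d = new Config();
swap_config($d);
var_dump($d->debug); // bool(false): the caller's object was not replaced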

Are php strings immutable?

Or: should I optimize my string operations in PHP? I looked in PHP's manual for an answer, but I didn't find any hints.
PHP already optimises it - variables are assigned using copy-on-write, and objects are passed by reference. In PHP 4 it doesn't, but nobody should be using PHP 4 for new code anyway.
One of the most essential speed optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least 2 factors:
1. Less instantiations means less time spent on construction.
2. The less the amount of memory that the application uses, the less CPU cache misses there probably are.
For applications, where the speed is the #1 priority, there exists a truly tight bottleneck between the CPU and the RAM. One of the reasons for the bottleneck is the latency of the RAM.
PHP, Ruby, Python, etc. are affected by cache misses because they store at least some (probably all) of the run-time data of the interpreted programs in RAM.
String instantiation is one of the operations that is done pretty often, in relatively "huge quantities", and it may have a noticeable impact on speed.
Here's a run_test.bash of a measurement experiment:
#!/bin/bash
for i in `seq 1 200`;
do
    /usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the next 2 lines show the arithmetic mean of
// (user time + sys time) for 200 runs.
$b_instantiate = False; // 0.1624 seconds
$b_instantiate = True;  // 0.1676 seconds

// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.

$ar = array();
$s = 'This is a string.';
$n = 10000;
$s_1 = NULL;

for ($i = 0; $i < $n; $i++) {
    if ($b_instantiate) {
        $s_1 = '' . $s;
    } else {
        $s_1 = &$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar, '' . rand(0, 9) . $s_1);
} // for

// Print a random element so the array is actually used
// (upper bound is $n - 1 to stay within the array).
echo($ar[rand(0, $n - 1)] . "\n");
?>
My conclusion from this experiment and one other experiment that I did with Ruby 1.8 is that it makes sense to pass string values around by reference.
One possible way to allow the "pass-strings-by-reference" to take place at the whole application scope is to consistently create a new string instance, whenever one needs to use a modified version of a string.
To increase locality, therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates the case for string concatenations:
<?php
// The comments on the next 2 lines show the arithmetic mean of
// (user time + sys time) for 200 runs.
$b_suboptimal = False; // 0.0611 seconds
$b_suboptimal = True;  // 0.0785 seconds

// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.

$n = 1000;
$s_token = "This is a string with a Linux line break.\n";
$s_whole = '';

if ($b_suboptimal) {
    for ($i = 0; $i < $n; $i++) {
        $s_whole = $s_whole . $s_token . $i;
    } // for
} else {
    $i_watershed = (int)round((($n * 1.0) / 2), 0);
    $s_part_1 = '';
    $s_part_2 = '';
    for ($i = 0; $i < $i_watershed; $i++) {
        $s_part_1 = $s_part_1 . $i . $s_token;
    } // for
    for ($i = $i_watershed; $i < $n; $i++) {
        $s_part_2 = $s_part_2 . $i . $s_token;
    } // for
    $s_whole = $s_part_1 . $s_part_2;
} // else

// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle = fopen('./it_might_have_been_a_served_HTML_page.txt', 'w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain considerable amount of text, then one might want to think about the order, how different parts of the generated HTML are concated together.
A BSD-licensed PHP implementation and Ruby implementation of the watershed string concatenation algorithm is available. The same algorithm can be (has been by me) generalized to speed up multiplication of arbitrary precision integers.
Arrays and strings have copy-on-write behaviour. They are mutable, but when you initially assign one variable to another, the second variable refers to the exact same instance of the string or array. Only when you modify the array or string is a copy made.
Example:
$a = array_fill(0, 10000, 42); // Consumes 545744 bytes
$b = $a;                       // Consumes     48 bytes
$b[0] = 42;                    // Consumes 545656 bytes

$s = str_repeat(' ', 10000);   // Consumes  10096 bytes
$t = $s;                       // Consumes     48 bytes
$t[0] = '!';                   // Consumes  10048 bytes
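A hedged sketch of how figures like these can be obtained (my own illustration, using memory_get_usage()):

<?php
// Print how much memory each statement adds, to observe copy-on-write.
$before = memory_get_usage();
$a = array_fill(0, 10000, 42);
echo "fill:   " . (memory_get_usage() - $before) . " bytes\n";

$before = memory_get_usage();
$b = $a; // no copy yet
echo "assign: " . (memory_get_usage() - $before) . " bytes\n";

$before = memory_get_usage();
$b[0] = 42; // the write forces the copy
echo "write:  " . (memory_get_usage() - $before) . " bytes\n";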
A quick google would seem to suggest that they are mutable, but the preferred practice is to treat them as immutable.
PHP 7.4 still uses mutable strings:
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
Test: PHP Sandbox
PHP strings are immutable.
Try this:
$a="string";
echo "<br>$a<br>";
echo str_replace('str','b',$a);
echo "<br>$a";
It echos:
string
bing
string
If a string was mutable, it would have continued to show "bing".
