In PHP, function parameters can be passed by reference by prepending an ampersand to the parameter in the function declaration, like so:
function foo(&$bar)
{
// ...
}
Now, I am aware that this is not designed to improve performance, but to allow functions to change variables that are normally out of their scope.
Instead, PHP seems to use Copy On Write to avoid copying objects (and maybe also arrays) until they are changed. So, for functions that do not change their parameters, the effect should be the same as if you had passed them by reference.
However, I was wondering whether the Copy On Write logic is perhaps short-circuited on pass-by-reference, and whether that has any performance impact.
ETA: To be sure, I assume that it's not faster, and I am well aware that this is not what references are for. So I think my own guesses are quite good; I'm just looking for an answer from someone who really knows what's definitely happening under the hood. In five years of PHP development, I've always found it hard to get quality information on PHP internals short of reading the source.
In a test with 100 000 iterations of calling a function with a string of 20 kB, the results are:
Function that just reads / uses the parameter
pass by value: 0.12065005 seconds
pass by reference: 1.52171397 seconds
Function to write / change the parameter
pass by value: 1.52223396 seconds
pass by reference: 1.52388787 seconds
Conclusions
Passing the parameter by value is always faster.
If the function changes the value of the variable passed in, then for practical purposes passing by reference is the same speed as passing by value.
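For reference, a minimal sketch of how a test like the one above might look. The payload size and iteration count follow the figures quoted; the function bodies are assumptions, not the original benchmark code:
<?php
$payload = str_repeat('x', 20 * 1024); // a 20 kB string

function read_by_val($s)   { strlen($s); }  // just reads / uses the parameter
function read_by_ref(&$s)  { strlen($s); }
function write_by_val($s)  { $s[0] = 'y'; } // writes / changes the parameter
function write_by_ref(&$s) { $s[0] = 'y'; }

$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    read_by_val($payload);
}
echo microtime(true) - $start, " seconds\n";
// Repeat the timing loop for each of the other three functions to compare.
?>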
The Zend Engine uses copy-on-write, and when you use a reference yourself, it incurs a little extra overhead. I can only find this mentioned in one place at the time of writing, though, and the comments in the manual contain other links.
(EDIT) The manual page on Objects and references contains a little more info on how object variables differ from references.
I ran some tests on this because I was unsure of the answers given.
My results show that passing large arrays or strings by reference IS significantly faster.
Here are my results (the chart itself is not reproduced here):
The Y axis (Runs) is how many times a function could be called in 1 second, × 10.
The test was repeated 8 times for each function/variable.
And here are the variables I used:
$large_array = array_fill(PHP_INT_MAX / 2, 1000, 'a');
$small_array = array('this', 'is', 'a', 'small', 'array');
$large_object = (object)$large_array;
$large_string = str_repeat('a', 100000);
$small_string = 'this is a small string';
$value = PHP_INT_MAX / 2;
These are the functions:
function pass_by_ref(&$var) {
    // intentionally empty: measures pure call overhead by reference
}
function pass_by_val($var) {
    // intentionally empty: measures pure call overhead by value
}
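The timing harness itself isn't shown; here is a minimal sketch of one way to produce a calls-per-second figure like the chart's (the loop shape is an assumption):
<?php
// Count how many calls complete within one second, matching the Y axis
// described above; run once per function/variable combination.
$runs = 0;
$deadline = microtime(true) + 1.0;
while (microtime(true) < $deadline) {
    pass_by_ref($large_array);
    $runs++;
}
echo $runs, " calls per second\n";
?>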
I have experimented with values and references of a 10 kB string, passing it to two otherwise identical functions: one takes its argument by value, the second by reference. They were ordinary functions: take an argument, do simple processing, and return a value. I made 100,000 calls to each and found that references are not designed to increase performance: the gain from the reference was around 4-5%, and it grows only when the string becomes large enough (100 kB and longer, which gave a 6-7% improvement). So my conclusion is: do not use references to increase performance; that is not what they are for.
I used PHP Version 5.3.1
I'm pretty sure that no, it's not faster.
Additionally, it says specifically in the manual not to try using references to increase performance.
Edit: Can't find where it says that, but it's there!
I tried to benchmark this with a real-world example based on a project I was working on. As always, the differences are trivial, but the results were somewhat unexpected. For most of the benchmarks I've seen, the called function doesn't actually change the value passed in. I performed a simple str_replace() on it.
**Pass by Value Test Code:**
$originalString = ''; // 1000 pseudo-random digits
function replace($string) {
    return str_replace('1', 'x', $string);
}
$output = '';
/* set start time */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tstart = $mtime;
set_time_limit(0);
for ($i = 0; $i < 10; $i++) {
    for ($j = 0; $j < 1000000; $j++) {
        $string = $originalString;
        $string = replace($string);
    }
}
/* report how long it took */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tend = $mtime;
$totalTime = ($tend - $tstart);
$totalTime = sprintf("%2.4f s", $totalTime);
$output .= "\n" . 'Total Time: ' . $totalTime;
$output .= "\n" . $string;
echo $output;
**Pass by Reference Test Code:**
The same, except for:
function replace(&$string) {
    $string = str_replace('1', 'x', $string);
}
/* ... */
replace($string);
Results in seconds (10 million iterations):
PHP 5
Value: 14.1007
Reference: 11.5564
PHP 7
Value: 3.0799
Reference: 2.9489
The difference is a fraction of a microsecond per function call, but for this use case, passing by reference is faster in both PHP 5 and PHP 7.
(Note: the PHP 7 tests were performed on a faster machine -- PHP 7 is faster, but probably not that much faster.)
There is nothing better than testing it with a piece of code:
<?php
$r = array();
for ($i = 0; $i < 500; $i++) {
    $r[] = 5;
}
function a($r) {
    $r[0] = 1;
}
function b(&$r) {
    $r[0] = 1;
}
$start = microtime(true);
for ($i = 0; $i < 9999; $i++) {
    //a($r);   // uncomment one call at a time to compare
    b($r);
}
$end = microtime(true);
echo $end - $start;
?>
Final result: the bigger the array (or the greater the number of calls), the bigger the difference. So in this case, calling by reference is faster, because the value is changed inside the function.
Otherwise there is no real difference between "by reference" and "by value": the engine is smart enough not to create a new copy each time if there is no need.
It's simple; there is no need to test anything.
It depends on the use case.
Passing by value will always be faster than passing by reference for a small number of arguments. How many depends on the number of variables the architecture's ABI allows to be passed through registers.
For example, x64 will allow you to pass 4 values of 64 bits each through registers.
https://en.wikipedia.org/wiki/X86_calling_conventions
This is because you don't have to dereference the pointers; you just use the value directly.
If the data that needs to be passed is bigger than the ABI's register allotment, the rest of the values go on the stack.
In that case, an array or an object (which in turn is a class instance, or a structure plus headers) will always be faster by reference.
This is because a reference is just a pointer to your data (not the data itself), of fixed size, say 32 or 64 bits depending on the machine. That pointer fits in one CPU register.
PHP is written in C/C++, so I'd expect it to behave the same.
There is no need to add the & operator when passing objects. In PHP 5+, an object variable holds a handle to the object, so the function sees (and can modify) the same object anyway.
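A minimal sketch illustrating that point (the property name is just for illustration):
<?php
function relabel($obj) {       // no & in the signature
    $obj->label = 'changed';   // still modifies the caller's object
}
$o = new stdClass();
$o->label = 'original';
relabel($o);
echo $o->label; // changed
?>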
I'm relatively new to PHP and slowly learning the idiosyncrasies specific to the language. One thing I get dinged by a lot is that I (so I'm told) use too many function calls and am generally asked to do things to work around them. Here's two examples:
// Change this:
} catch (Exception $e) {
print "It seems that error " . $e->getCode() . " occured";
log("Error: " . $e->getCode());
}
// To this:
} catch (Exception $e) {
$code = $e->getCode();
print "It seems that error " . $code . " occured";
log("Error: " . $code);
}
2nd Example
// Change this:
$customer->setProducts($products);
// To this:
if (!empty($products)) {
$customer->setProducts($products);
}
In the first example I find that assigning $e->getCode() to $code adds a slight cognitive overhead: "What's $code? Ah, it's the code from the exception." Whereas the second example adds cyclomatic complexity. In both examples I find the optimization to come at the cost of readability and maintainability.
Is the performance increase worth it or is this micro optimization?
I should note that we're stuck with PHP 5.2 for right now.
I've done some very rough bench tests and find the function call performance hit to be on the order of 10% to 70% depending on the nature of my bench test. I'll concede that this is significant. But before that catch block is hit there was a call to a database and an HTTP end point. Before $products was set on the $customer there was a complex sort that happened to the $products array. At the end of the day does this optimization justify the cost of making the code harder to read and maintain? Or, although these examples are simplifications, does anybody find the 2nd examples just as easy or easier to read than the first (am I being a wiener)?
Can anyone cite any good articles or studies about this?
Edit:
An example bench test:
<?php
class Foo {
    private $list;
    public function setList($list) {
        $this->list = $list;
    }
}
$foo1 = new Foo();
for ($i = 0; $i < 1000000; $i++) {
    $a = array();
    if (!empty($a))
        $foo1->setList($a);
}
?>
Run that file with the time command. On one particular machine it takes an average of 0.60 seconds after several runs. Commenting out the if (!empty($a)) causes it to take an average of 3.00 seconds to run.
Clarification: These are examples. The 1st example demonstrates horrible exception handling and a possible DRY violation at the expense of a simple, non-domain-specific example.
Nobody has yet discussed how the server's hardware is related to function call overhead.
When a function is called, all of the CPU's registers contain data relevant to the current execution point. All of the CPU's registers must be saved to memory (typically the process' stack) or there is no hope of ever returning to that execution point and resuming execution. When returning from the function, all of the CPU's registers must be restored from memory (typically the process' stack).
So, one can see how a chain of nested function calls can add overhead to the process. The CPU's registers must be saved over and over again on the stack and restored over and over again to get back from the functions.
This is really the source of the overhead of function calls. And if function arguments are passed, those must all be duplicated before the function can be called. Therefore, passing huge arrays as function arguments is a poor design.
Studies have been done on the overhead of getters/setters in object-oriented PHP. Removing all getters/setters cut execution time by about 50%. And that is simply due to function call overhead.
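For the curious, a rough sketch of the kind of micro-benchmark behind such numbers (the class and loop count are assumptions, not from the cited studies):
<?php
class Point {
    public $x = 1;
    public function getX() { return $this->x; }
}
$p = new Point();

$t = microtime(true);
for ($i = 0; $i < 1000000; $i++) { $v = $p->getX(); } // through the getter
echo "getter: ", microtime(true) - $t, " s\n";

$t = microtime(true);
for ($i = 0; $i < 1000000; $i++) { $v = $p->x; }      // direct property access
echo "direct: ", microtime(true) - $t, " s\n";
?>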
PHP function call overhead is precisely 15.5355%.
:) Just stirring the pot.
Seriously, here are a couple great links on the subject:
Is it possible to have too many functions in a PHP application?
functions vs repeated code
The code maintainability versus speed discussions at these links address the (maybe more important) question implied by the OP. But just to add a little data that may also be pertinent, and hopefully useful to people who come across this thread in the future, here are results from running the code below on a 2011 MacBook Pro (with very little drive space and too many programs running).
As noted elsewhere, an important consideration in deciding whether to call a function or put the code "in-line" is how many times the function will be called from within a certain block of code. The more times the function will be called, the more it's worth considering doing the work in-line.
Results (times in seconds)
Call Function Method | In-Line Method | Difference | Percent Difference
1,000 iterations (4 runs)
0.0039088726043701 | 0.0031478404998779 | 0.00076103210449219 | 19.4694
0.0038208961486816 | 0.0025999546051025 | 0.0012209415435791 | 31.9543
0.0030159950256348 | 0.0029480457305908 | 6.7949295043945E-5 | 2.2530
0.0031449794769287 | 0.0031390190124512 | 5.9604644775391E-6 | 0.1895
1,000,000 iterations (4 runs)
3.1843111515045 | 2.6896121501923 | 0.49469900131226 | 15.5355
3.131945848465 | 2.7114839553833 | 0.42046189308167 | 13.4249
3.0256152153015 | 2.7648048400879 | 0.26081037521362 | 8.6201
3.1251409053802 | 2.7397727966309 | 0.38536810874939 | 12.3312
function postgres_friendly_number($dirtyString) {
    $cleanString = str_ireplace("(", "-", $dirtyString);
    $badChars = array("$", ",", ")");
    $cleanString = str_ireplace($badChars, "", $cleanString);
    return $cleanString;
}

//main
$badNumberString = '-$590,832.61';
$iterations = 1000000;

$startTime = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
    $goodNumberString = postgres_friendly_number($badNumberString);
}
$endTime = microtime(true);
$firstTime = ($endTime - $startTime);

$startTime = microtime(true);
for ($i = 1; $i <= $iterations; $i++) {
    $goodNumberString = str_ireplace("(", "-", $badNumberString);
    $badChars = array("$", ",", ")");
    $goodNumberString = str_ireplace($badChars, "", $goodNumberString);
}
$endTime = microtime(true);
$secondTime = ($endTime - $startTime);

$timeDifference = $firstTime - $secondTime;
$percentDifference = (($timeDifference / $firstTime) * 100);
echo $firstTime . " | " . $secondTime . " | " . $timeDifference . " | " . $percentDifference . "\n"; // print one row of the results table
The canonical PHP implementation is very slow because it's easy to implement and the applications PHP aims at do not require raw performance like fast function calls.
You might want to consider other PHP implementations.
If you are writing the applications you should be writing in PHP (dump data from DB to the browser over a network) then the function call overhead is not significant. Certainly don't go out of your way to duplicate code because you were afraid using a function would be too much overhead.
I think you are confusing terminology. Generally, "function call overhead" means the overhead involved in calling and returning from a function, rather than processing inline. It is not the total cost of a function call; it is just the cost of preparing the arguments and return values and doing the branch.
The problem is that PHP and other weakly typed, scripting-style languages are really bad at determining whether functions have side effects. So rather than storing the result of a function in a temp, they will make multiple calls. If the function does something complex, this will be very inefficient.
So, the bottom line is: call the function once, then store and reuse the result!
Don't call the same function multiple times with the same arguments (without a good reason).
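A minimal sketch of that advice; count() here is just an illustrative stand-in for any repeated call with the same argument:
<?php
$items = range(1, 100000);

// Suboptimal: count() is re-evaluated on every loop iteration.
for ($i = 0; $i < count($items); $i++) {
    $v = $items[$i]; // ... work with the element ...
}

// Better: call it once, store the result, and reuse it.
$n = count($items);
for ($i = 0; $i < $n; $i++) {
    $v = $items[$i];
}
?>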
I want to send a UDP message over a network in PHP.
The message has a predefined protocol say the message must be 10 bytes long, in which the first 3 bytes are for entry 1, 2 bytes for entry 2, 3 bytes are for entry 3, and the last 2 bytes for entry 4.
How can I do this in PHP?
In C we could use memcpy.
Just create a string of the required bytes. A character of a string is a byte in PHP.
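A minimal sketch of how that might look, using pack() to lay out the fixed-width fields and a UDP stream socket to send the result (the field values, host, and port are assumptions):
<?php
// 'a3' etc. produce NUL-padded fields of exactly that many bytes,
// matching the 3 + 2 + 3 + 2 = 10-byte layout in the question.
$message = pack('a3a2a3a2', 'abc', 'de', 'fgh', 'ij');
var_dump(strlen($message)); // int(10)

$sock = stream_socket_client('udp://192.0.2.1:9999', $errno, $errstr);
if ($sock) {
    fwrite($sock, $message);
    fclose($sock);
}
?>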
I really wish there was a fast way to do this in PHP! All the PHP string functions return a string rather than letting you copy directly into an existing string. The latter is extremely useful from a memory perspective if you have megabyte-sized strings. The following code is a PHP variant of the kind of thing I would like. It would be nice if you could do this kind of manipulation directly in PHP as a native function call rather than having to make your own extension. In the code below, $source is a short string we would like to copy over part of the data in the much longer string $destination. We pass the latter by reference so we don't waste memory by doing the function call.
function charCopy($source, &$destination, $start, $length)
{
    $endk = intval($length - 1);
    $end = intval($start + $endk);
    $start = intval($start);
    for ($j = $end, $k = $endk; $j >= $start; $j--, $k--) {
        $destination[$j] = $source[$k];
    }
}
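For example, a hypothetical call (assuming $length does not exceed strlen($source)):
$destination = str_repeat('-', 20);
charCopy('ABC', $destination, 5, 3);
echo $destination; // -----ABC------------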
This is close:
function &memcpy(&$dest, $src, $n) {
    $dest = substr($src, 0, $n) . substr($dest, $n);
    return $dest;
}
Note that unlike C, PHP manages memory and reallocation for you, and you can't overflow :p I don't think you can make an exact memcpy clone in userland. You could throw some exceptions / trigger_error() if $n > strlen($src) or $n > strlen($dest), of course. The code above will just automatically resize $dest for you if needed, unlike C's memcpy.
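A sketch of the guard suggested above; the function name and warning behaviour are assumptions:
function safe_memcpy(&$dest, $src, $n) {
    // Refuse to copy more bytes than either operand holds,
    // instead of silently resizing $dest.
    if ($n > strlen($src) || $n > strlen($dest)) {
        trigger_error('safe_memcpy(): $n exceeds an operand length', E_USER_WARNING);
        return false;
    }
    $dest = substr($src, 0, $n) . substr($dest, $n);
    return $dest;
}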
I am curious, is there a size limit on serialize in PHP. Would it be possible to serialize an array with 5,000 keys and values so it can be stored into a cache?
I am hoping to cache a users friend list on a social network site, the cache will need to be updated fairly often but it will need to be read almost every page load.
On a single-server setup I am assuming APC would be better than memcache for this.
As quite a few other people have answered already, just for fun, here's a very quick benchmark (do I dare call it that?); consider the following code:
$num = 1;
$list = array_fill(0, 5000, str_repeat('1234567890', $num));

$before = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $str = serialize($list);
}
$after = microtime(true);

var_dump($after - $before);
var_dump(memory_get_peak_usage());
I'm running this on PHP 5.2.6 (the one bundled with Ubuntu jaunty).
And, yes, there are only values; no keys; and the values are quite simple: no objects, no sub-arrays, nothing but strings.
For $num = 1, you get:
float(11.8147978783)
int(1702688)
For $num = 10, you get:
float(13.1230671406)
int(2612104)
And, for $num = 100, you get:
float(63.2925770283)
int(11621760)
So, it seems the bigger each element of the array is, the longer it takes (which seems fair, actually). But for elements 100 times bigger, it doesn't take 100 times longer...
Now, with an array of 50000 elements, instead of 5000, which means this part of the code is changed :
$list = array_fill(0, 50000, str_repeat('1234567890', $num));
With $num = 1, you get:
float(158.236332178)
int(15750752)
Considering the time it took for $num = 1, I won't be running this for $num = 10 or $num = 100...
Yes, of course, in a real situation you wouldn't be doing this 10000 times; so let's try with only 10 iterations of the for loop.
For $num = 1:
float(0.206310987473)
int(15750752)
For $num = 10:
float(0.272629022598)
int(24849832)
And for $num = 100:
float(0.895547151566)
int(114949792)
float(0.895547151566)
int(114949792)
Yeah, that's almost 1 second -- and quite a bit of memory used ^^
(No, this is not a production server : I have a pretty high memory_limit on this development machine ^^ )
So, in the end, to be a bit shorter than those numbers -- and, yes, you can have numbers say whatever you want them to -- I wouldn't say there is a "limit" as in "hardcoded" in PHP, but you'll end up facing one of these:
max_execution_time (generally, on a webserver, it's never more than 30 seconds)
memory_limit (on a webserver, it's generally not much more than 32MB)
the load your webserver will have: while one of those big serialize loops was running, it took up one of my CPUs; if you have quite a few users on the same page at the same time, I let you imagine what that will give ;-)
the patience of your user ^^
But, except if you are really serializing long arrays of big data, I am not sure it will matter that much...
And you must take into consideration the amount of time/CPU-load using that cache might help you gain ;-)
Still, the best way to know would be to test by yourself, with real data ;-)
And you might also want to take a look at what Xdebug can do when it comes to profiling : this kind of situation is one of those it is useful for!
The serialize() function is only limited by available memory.
There's no limit enforced by PHP. Serialize returns a bytestream representation (string) of the serialized structure, so you would just get a large string.
The only practical limit is your available memory, since serialization involves creating a string in memory.
There is no limit, but remember that serialization and unserialization have a cost.
Unserialization is extremely costly.
A less costly way of caching that data would be via var_export() as such (since PHP 5.1.0, it works on objects):
$largeArray = array(1, 2, 3, 'hello' => 'world', 4);
file_put_contents('cache.php', "<?php\nreturn " .
    var_export($largeArray, true) .
    ';');
You can then simply retrieve the array by doing the following:
$largeArray = include('cache.php');
Resources are usually not cache-able.
Unfortunately, if you have circular references in your array, you'll need to use serialize().
As suggested by Thinker above:
You could use
$string = json_encode($your_array_here);
and to decode it:
$array = json_decode($string, true);
This returns an array. It works well even if the encoded array was multilevel.
OK... more numbers! (PHP 5.3.0 on OS X, no opcode cache)
Pascal's code on my machine for $num = 1 at 10k iterations produces:
float(18.884856939316)
int(1075900)
I add unserialize() to the above, like so:
$num = 1;
$list = array_fill(0, 5000, str_repeat('1234567890', $num));

$before = microtime(true);
for ($i = 0; $i < 10000; $i++) {
    $str = serialize($list);
    $list = unserialize($str);
}
$after = microtime(true);

var_dump($after - $before);
var_dump(memory_get_peak_usage());
produces
float(50.204112052917)
int(1606768)
I assume the extra 600 kB or so is the serialized string.
I was curious about var_export() and its include/eval partner. Using $str = var_export($list, true); instead of serialize() in the original produces:
float(57.064643859863)
int(1066440)
so just a little less memory (at least for this simple example), but way more time already.
Adding in eval('$list = ' . $str . ';'); instead of unserialize() in the above produces:
float(126.62566018105)
int(2944144)
This indicates there's probably a memory leak somewhere when doing eval() :-/
So again, these aren't great benchmarks (I really should isolate the eval/unserialize by putting the string in a local variable or something, but I'm being lazy), but they show the associated trends. var_export() seems slow.
Nope, there is no limit, and this:
set_time_limit(0);
ini_set('memory_limit', -1);
unserialize('s:2000000000:"a";');
is why you should have safe_mode = On or an extension like Suhosin installed; otherwise it will eat up all the memory in your system.
I think json_encode() is better than serialize. It has a drawback, in that associative arrays and objects are not distinguished, but the string result is smaller and easier for a human to read, and therefore also to debug and edit.
If you want to cache it (so I assume performance is the issue), use apc_add instead, to avoid the performance hit of converting it to a string, plus you gain a cache in memory.
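A minimal sketch of that approach (the key name, $userId, and TTL are assumptions):
$friends = array('alice', 'bob' /* ... */);
apc_add('friends_' . $userId, $friends, 300); // cache for 300 s; no explicit serialize step
$cached = apc_fetch('friends_' . $userId);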
As stated above the only size limit is available memory.
A few other gotchas:
serialize()'d data is not portable between multi-byte and single-byte character encodings.
Serialized PHP 5 objects include NUL bytes (around private and protected property names) that can cause havoc with code that doesn't expect them.
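A quick illustration of that second gotcha (the class name is arbitrary):
class C { private $p = 1; }
echo str_replace("\0", '\0', serialize(new C));
// Prints O:1:"C":1:{s:4:"\0C\0p";i:1;}
// The NUL bytes wrapping the class name mark $p as private.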
Your use case sounds like you're better off using a database rather than relying solely on PHP's available resources. The advantage of using something like MySQL instead is that it's specifically engineered with memory management in mind for such things as storage and lookup.
It's really no fun constantly serializing and unserializing data just to update or change a few pieces of information.
I've just come across an instance where I thought I was hitting an upper limit of serialisation.
I'm persisting serialised objects to a database using a MySQL TEXT field.
The limit for single-byte characters is 65,535, so while I can serialise much larger objects than that with PHP, it's impossible to unserialise them, as they are truncated by the limit of the TEXT field.
I have a case in which unserialize() throws an exception on a large serialized object, size 65,535 bytes (the magic number: a full 16 bits = 65,536).
Or: should I optimize my string operations in PHP? I consulted PHP's manual about it, but I didn't find any hints.
PHP already optimises it: variables are assigned using copy-on-write, and object variables are just handles to the object, so no copy is made there either. In PHP 4 this isn't the case, but nobody should be using PHP 4 for new code anyway.
One of the most essential speed optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least two factors:
1. Fewer instantiations means less time spent on construction.
2. The less memory the application uses, the fewer CPU cache misses there probably are.
For applications where speed is the #1 priority, there exists a truly tight bottleneck between the CPU and the RAM. One of the reasons for the bottleneck is the latency of the RAM.
PHP, Ruby, Python, etc. are affected by cache misses because they store at least some (probably all) of the run-time data of the interpreted programs in RAM.
String instantiation is an operation that is done pretty often, in relatively huge quantities, and it can have a noticeable impact on speed.
Here's a run_test.bash of a measurement experiment:
#!/bin/bash
for i in `seq 1 200`;
do
    /usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the next two lines show the arithmetic mean of
// (user time + sys time) for 200 runs; comment one of them out to
// select which variant is measured.
$b_instantiate = False; // 0.1624 seconds
$b_instantiate = True;  // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar = array();
$s = 'This is a string.';
$n = 10000;
$s_1 = NULL;
for ($i = 0; $i < $n; $i++) {
    if ($b_instantiate) {
        $s_1 = '' . $s;
    } else {
        $s_1 = &$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar, '' . rand(0, 9) . $s_1);
} // for
echo($ar[rand(0, $n - 1)] . "\n"); // $n - 1: valid indexes are 0 .. $n - 1
?>
My conclusion from this experiment, and one other experiment that I did with Ruby 1.8, is that it makes sense to pass string values around by reference.
One possible way to allow "pass strings by reference" to take place at whole-application scope is to consistently create a new string instance whenever one needs a modified version of a string.
To increase locality, and therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates the case for string concatenation:
<?php
// The comments on the next two lines show the arithmetic mean of
// (user time + sys time) for 200 runs; comment one of them out to
// select which variant is measured.
$b_suboptimal = False; // 0.0611 seconds
$b_suboptimal = True;  // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.
$n = 1000;
$s_token = "This is a string with a Linux line break.\n";
$s_whole = '';
if ($b_suboptimal) {
    for ($i = 0; $i < $n; $i++) {
        $s_whole = $s_whole . $i . $s_token;
    } // for
} else {
    $i_watershed = (int)round((($n * 1.0) / 2), 0);
    $s_part_1 = '';
    $s_part_2 = '';
    for ($i = 0; $i < $i_watershed; $i++) {
        $s_part_1 = $s_part_1 . $i . $s_token;
    } // for
    for ($i = $i_watershed; $i < $n; $i++) {
        $s_part_2 = $s_part_2 . $i . $s_token;
    } // for
    $s_whole = $s_part_1 . $s_part_2;
} // else
// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle = fopen('./it_might_have_been_a_served_HTML_page.txt', 'w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain considerable amount of text, then one might want to think about the order, how different parts of the generated HTML are concated together.
A BSD-licensed PHP implementation and Ruby implementation of the watershed string concatenation algorithm is available. The same algorithm can be (has been by me) generalized to speed up multiplication of arbitrary precision integers.
Arrays and strings have copy-on-write behaviour. They are mutable, but when you assign them to a variable initially that variable will contain the exact same instance of the string or array. Only when you modify the array or string is a copy made.
Example:
$a = array_fill(0, 10000, 42); // consumes 545744 bytes
$b = $a;                       // consumes 48 bytes
$b[0] = 42;                    // consumes 545656 bytes

$s = str_repeat(' ', 10000);   // consumes 10096 bytes
$t = $s;                       // consumes 48 bytes
$t[0] = '!';                   // consumes 10048 bytes
A quick Google search suggests that they are mutable, but the preferred practice is to treat them as immutable.
PHP strings are mutable via offset assignment (tested on PHP 7.4):
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
PHP strings are immutable.
Try this:
$a="string";
echo "<br>$a<br>";
echo str_replace('str','b',$a);
echo "<br>$a";
It echoes:
string
bing
string
If the string were mutable, it would have continued to show "bing".