Are php strings immutable? - php

Or: should I optimize my string operations in PHP? I tried to find an answer in PHP's manual, but I didn't get any hints.

PHP already optimises this - variables are assigned using copy-on-write, and objects are passed by reference. PHP 4 didn't, but nobody should be using PHP 4 for new code anyway.

One of the most essential speed optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least 2 factors:
1. Fewer instantiations means less time spent on construction.
2. The less memory the application uses, the fewer CPU cache misses it is likely to incur.
For applications where speed is the #1 priority, there is a truly tight bottleneck between the CPU and the RAM; one of the reasons for the bottleneck is the latency of the RAM.
PHP, Ruby, Python, etc. are all affected by cache misses, because they store at least some (probably all) of the run-time data of the interpreted programs in RAM.
String instantiation is an operation that is performed pretty often, in relatively "huge quantities", and it may have a noticeable impact on speed.
Here's run_test.bash, the driver script for a measurement experiment:
#!/bin/bash
for i in `seq 1 200`; do
    /usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the next 2 lines show the arithmetic mean of
// (user time + sys time) for 200 runs. Toggle by commenting one of
// the two assignments out to select the variant being measured.
$b_instantiate=False; // 0.1624 seconds
$b_instantiate=True;  // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar=array();
$s='This is a string.';
$n=10000;
$s_1=NULL;
for($i=0;$i<$n;$i++) {
    if($b_instantiate) {
        $s_1=''.$s;
    } else {
        $s_1=&$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar,''.rand(0,9).$s_1);
} // for
// rand(0,$n-1): valid indexes are 0 .. $n-1.
echo($ar[rand(0,$n-1)]."\n");
?>
My conclusion from this experiment, and from another experiment that I did with Ruby 1.8, is that it makes sense to pass string values around by reference.
One possible way to allow "pass-strings-by-reference" at whole-application scope is to consistently create a new string instance whenever one needs to use a modified version of a string.
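A minimal sketch of that convention (the function and variable names are mine, purely illustrative): the string is received by reference, so no copy is made, and the "modified" version is returned as a brand-new instance.
// Hypothetical helper: reads $s through a reference (no copy),
// returns a new string instance for the modified version.
function emphasized(&$s) {
    return strtoupper($s) . '!'; // new instance; $s itself stays untouched
}
$greeting = 'hello';
$loud = emphasized($greeting);
echo $greeting . "\n"; // hello
echo $loud . "\n";     // HELLO!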
To increase locality, and therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates this for string concatenation:
<?php
// The comments on the next 2 lines show the arithmetic mean of
// (user time + sys time) for 200 runs. Toggle by commenting one of
// the two assignments out to select the variant being measured.
$b_suboptimal=False; // 0.0611 seconds
$b_suboptimal=True;  // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.
$n=1000;
$s_token="This is a string with a Linux line break.\n";
$s_whole='';
if($b_suboptimal) {
    for($i=0;$i<$n;$i++) {
        $s_whole=$s_whole.$i.$s_token;
    } // for
} else {
    $i_watershed=(int)round((($n*1.0)/2),0);
    $s_part_1='';
    $s_part_2='';
    for($i=0;$i<$i_watershed;$i++) {
        $s_part_1=$s_part_1.$i.$s_token;
    } // for
    for($i=$i_watershed;$i<$n;$i++) {
        $s_part_2=$s_part_2.$i.$s_token;
    } // for
    $s_whole=$s_part_1.$s_part_2;
} // else
// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle=fopen('./it_might_have_been_a_served_HTML_page.txt','w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain a considerable amount of text, then one might want to think about the order in which the different parts of the generated HTML are concatenated together.
A BSD-licensed PHP implementation and a Ruby implementation of the watershed string concatenation algorithm are available. The same algorithm can be (and has been, by me) generalized to speed up multiplication of arbitrary-precision integers.
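The linked implementations are not reproduced here, but the core idea can be sketched roughly as follows (my own illustrative code, not the BSD-licensed original): instead of appending every token to one ever-growing string, concatenate balanced halves so that the operands of each concatenation stay short.
// Illustrative sketch: recursively join balanced halves of the token
// list so that no single concatenation operand grows huge.
function watershed_concat(array $tokens) {
    $n = count($tokens);
    if ($n <= 1) {
        return ($n === 1) ? $tokens[0] : '';
    }
    $mid = (int)($n / 2); // the "watershed" index
    return watershed_concat(array_slice($tokens, 0, $mid))
         . watershed_concat(array_slice($tokens, $mid));
}
For what it's worth, implode('', $tokens) produces the same result in a single call and is usually the simplest option.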

Arrays and strings have copy-on-write behaviour. They are mutable, but when you assign them to a variable initially that variable will contain the exact same instance of the string or array. Only when you modify the array or string is a copy made.
Example:
$a = array_fill(0, 10000, 42); // consumes 545744 bytes
$b = $a;                       // consumes only 48 bytes: $b shares $a's storage
$b[0] = 42;                    // consumes 545656 bytes: the write triggers the copy
$s = str_repeat(' ', 10000);   // consumes 10096 bytes
$t = $s;                       // consumes only 48 bytes: $t shares $s's storage
$t[0] = '!';                   // consumes 10048 bytes: the write triggers the copy

A quick google would seem to suggest that they are mutable, but the preferred practice is to treat them as immutable.

PHP strings are mutable, demonstrated here on PHP 7.4:
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
Test: PHP Sandbox

PHP strings are immutable.
Try this:
$a="string";
echo "<br>$a<br>";
echo str_replace('str','b',$a);
echo "<br>$a";
It echoes:
string
bing
string
If strings were mutable, it would have continued to show "bing".

Related

Can a PHP script be expected to have more performance (i.e. processing more code in less time) by passing variables by reference? [duplicate]

In PHP, function parameters can be passed by reference by prepending an ampersand to the parameter in the function declaration, like so:
function foo(&$bar)
{
// ...
}
Now, I am aware that this is not designed to improve performance, but to allow functions to change variables that are normally out of their scope.
Instead, PHP seems to use Copy On Write to avoid copying objects (and maybe also arrays) until they are changed. So, for functions that do not change their parameters, the effect should be the same as if you had passed them by reference.
However, I was wondering if the Copy On Write logic maybe is shortcircuited on pass-by-reference and whether that has any performance impact.
ETA: To be sure, I assume that it's not faster, and I am well aware that this is not what references are for. So I think my own guesses are quite good; I'm just looking for an answer from someone who really knows what's definitely happening under the hood. In five years of PHP development, I've always found it hard to get quality information on PHP internals short of reading the source.
In a test with 100 000 iterations of calling a function with a string of 20 kB, the results are:
Function that just reads / uses the parameter
pass by value: 0.12065005 seconds
pass by reference: 1.52171397 seconds
Function to write / change the parameter
pass by value: 1.52223396 seconds
pass by reference: 1.52388787 seconds
Conclusions
Passing the parameter by value is always faster.
If the function changes the value of the passed variable, then for practical purposes passing by value is the same as passing by reference.
The Zend Engine uses copy-on-write, and when you use a reference yourself, it incurs a little extra overhead. Can only find this mention at time of writing though, and comments in the manual contain other links.
(EDIT) The manual page on Objects and references contains a little more info on how object variables differ from references.
I ran some tests on this because I was unsure of the answers given.
My results show that passing large arrays or strings by reference IS significantly faster.
Here are my results:
The Y axis (Runs) is how many times a function could be called in 1 second * 10
The test was repeated 8 times for each function/variable
And here are the variables I used:
$large_array = array_fill(PHP_INT_MAX / 2, 1000, 'a');
$small_array = array('this', 'is', 'a', 'small', 'array');
$large_object = (object)$large_array;
$large_string = str_repeat('a', 100000);
$small_string = 'this is a small string';
$value = PHP_INT_MAX / 2;
These are the functions:
function pass_by_ref(&$var) {
}
function pass_by_val($var) {
}
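The timing loop itself isn't shown above; a minimal sketch of such a "calls per second" measurement (my own reconstruction, not the original test script) could look like this:
// Count how many calls to each function fit into one wall-clock second.
$large_string = str_repeat('a', 100000);

$calls = 0;
$deadline = microtime(true) + 1.0;
while (microtime(true) < $deadline) {
    pass_by_val($large_string);
    $calls++;
}
echo "by value: $calls calls/second\n";

$calls = 0;
$deadline = microtime(true) + 1.0;
while (microtime(true) < $deadline) {
    pass_by_ref($large_string);
    $calls++;
}
echo "by reference: $calls calls/second\n";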
I have experimented with values and references of a 10 kB string, passing it to two identical functions: one takes its argument by value and the second one by reference. They were common functions - take an argument, do simple processing, and return a value. I made 100 000 calls to both and found that references are not designed to increase performance - the profit of references was near 4-5%, and it grows only when the string becomes large enough (100 kB and longer, which gave a 6-7% improvement). So my conclusion is: do not use references to increase performance; that is not what they are for.
I used PHP Version 5.3.1
I'm pretty sure that no, it's not faster.
Additionally, it says specifically in the manual not to try using references to increase performance.
Edit: Can't find where it says that, but it's there!
I tried to benchmark this with a real-world example based on a project I was working on. As always, the differences are trivial, but the results were somewhat unexpected. For most of the benchmarks I've seen, the called function doesn't actually change the value passed in. I performed a simple str_replace() on it.
Pass by Value Test Code
$originalString = ''; // 1000 pseudo-random digits
function replace($string) {
    return str_replace('1', 'x', $string);
}
$output = '';
/* set start time */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tstart = $mtime;
set_time_limit(0);
for ($i = 0; $i < 10; $i++) {
    for ($j = 0; $j < 1000000; $j++) {
        $string = $originalString;
        $string = replace($string);
    }
}
/* report how long it took */
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$tend = $mtime;
$totalTime = ($tend - $tstart);
$totalTime = sprintf("%2.4f s", $totalTime);
$output .= "\n" . 'Total Time: ' . $totalTime;
$output .= "\n" . $string;
echo $output;
Pass by Reference Test Code
The same except for
function replace(&$string) {
    $string = str_replace('1', 'x', $string);
}
/* ... */
replace($string);
Results in seconds (10 million iterations):
PHP 5
Value: 14.1007
Reference: 11.5564
PHP 7
Value: 3.0799
Reference: 2.9489
The difference is a fraction of a millisecond per function call, but for this use case, passing by reference is faster in both PHP 5 and PHP 7.
(Note: the PHP 7 tests were performed on a faster machine -- PHP 7 is faster, but probably not that much faster.)
There is nothing better than a test piece of code:
<?php
$r = array();
for ($i = 0; $i < 500; $i++) {
    $r[] = 5;
}

function a($r) {
    $r[0] = 1;
}

function b(&$r) {
    $r[0] = 1;
}

$start = microtime(true);
for ($i = 0; $i < 9999; $i++) {
    //a($r);
    b($r);
}
$end = microtime(true);
echo $end - $start;
?>
Final result! The bigger the array (or the greater the number of calls), the bigger the difference. So in this case, calling by reference is faster because the value is changed inside the function.
Otherwise there is no real difference between "by reference" and "by value": the engine is smart enough not to create a new copy each time if there is no need.
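That last claim is easy to check with memory_get_usage(); here is a hedged sketch (illustrative only, my own code) showing that a by-value parameter is not physically copied until the function writes to it:
// Measure memory inside the callee: a read-only by-value parameter
// shares storage with the caller (copy-on-write), while the first
// write forces a full copy of the array.
function reads_only($arr)  { return memory_get_usage(); }
function writes_once($arr) { $arr[0] = 1; return memory_get_usage(); }

$big = array_fill(0, 100000, 42);
$base = memory_get_usage();
echo 'read-only call: +' . (reads_only($big) - $base) . " bytes\n";  // roughly 0
echo 'writing call:   +' . (writes_once($big) - $base) . " bytes\n"; // roughly the size of $big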
It's simple; there is no need to test anything.
It depends on the use case.
Passing by value will ALWAYS be faster than passing by reference for a small number of arguments. This depends on how many variables the architecture's ABI allows to be passed through registers.
For example, x64 will allow you to pass 4 values of 64 bits each through registers:
https://en.wikipedia.org/wiki/X86_calling_conventions
This is because you don't have to dereference the pointers, just use the values directly.
If the data that needs to be passed is bigger than the ABI's register budget, the remaining values go on the stack.
In this case, an array or an object (which in essence is a class, or a structure + headers) will ALWAYS be faster by reference.
This is because a reference is just a pointer to your data (not the data itself), of fixed size, say 32 or 64 bits depending on the machine. That pointer will fit in one CPU register.
PHP is written in C/C++ so I'd expect it to behave the same way.
There is no need to add the & operator when passing objects. In PHP 5+, objects are passed by reference anyway.
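A quick illustration (my own example) of that last point:
// Objects are passed as handles in PHP 5+: the function can modify
// the object through its parameter even without the & operator.
class Counter {
    public $n = 0;
}
function bump($counter) { // no & needed
    $counter->n++;
}
$c = new Counter();
bump($c);
echo $c->n; // 1 - the caller's object was modified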

PHP built in functions complexity (isAnagramOfPalindrome function)

I've been googling for the past 2 hours, and I cannot find a list of PHP built-in functions' time and space complexity. I have the isAnagramOfPalindrome problem to solve with the following maximum allowed complexity:
expected worst-case time complexity is O(N)
expected worst-case space complexity is O(1) (not counting the storage required for input arguments).
where N is the input string length. Here is my simplest solution, but I don't know if it is within the complexity limits.
class Solution {
    // Function to determine if the input string can make a palindrome by rearranging it
    static public function isAnagramOfPalindrome($S) {
        // here I am counting how many characters have an odd number of occurrences
        $odds = count(array_filter(count_chars($S, 1), function($var) {
            return ($var & 1);
        }));
        // If the string length is odd, then a palindrome has exactly 1 character with an odd number of occurrences.
        // If the string length is even, all characters must have an even number of occurrences.
        return (int)($odds == (strlen($S) & 1));
    }
}
echo Solution::isAnagramOfPalindrome($_POST['input']);
Anyone have an idea where to find this kind of information?
EDIT
I found out that array_filter has O(N) complexity, and count has O(1) complexity. Now I need to find info on count_chars, but a full list would be very convenient for future problems.
EDIT 2
After some research on space and time complexity in general, I found out that this code has O(N) time complexity and O(1) space complexity because:
count_chars will loop N times (the full length of the input string), giving it a base complexity of O(N). It generates an array with a limited maximum number of fields (26 precisely, the number of different characters in the problem's alphabet; at most 256 in general, since count_chars works on bytes), and then it applies a filter on this array, which means the filter loops at most a constant number of times. When pushing the input length towards infinity, this loop is insignificant and is seen as a constant. count also applies to this constant-size generated array and, besides, the count function's complexity is O(1). Hence, the time complexity of the algorithm is O(N).
It goes the same for space complexity. When calculating space complexity, we do not count the input, only the objects generated in the process. These objects are the bounded array and the count variable, and both are treated as constants because their size cannot increase beyond that point, no matter how big the input is. So we can say that the algorithm has a space complexity of O(1).
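The bounded-array claim is easy to check empirically; a quick sketch (my own, illustrative):
// No matter how long the input grows, count_chars(..., 1) can never
// return more entries than there are distinct byte values (256 max).
foreach (array(100, 10000, 1000000) as $len) {
    $s = str_repeat('abcdefghij', $len / 10);
    echo $len . " chars -> " . count(count_chars($s, 1)) . " distinct\n"; // always 10 here
}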
Anyway, that list would still be valuable so we do not have to look inside the PHP source code. :)
A probable reason for not including this information is that it is likely to change per release, as improvements / optimizations for the general case are made.
PHP is built on C, and some of the functions are simply wrappers around their C counterparts; take hypot for example: a Google search, a look at man hypot, or the docs for the math lib:
http://www.gnu.org/software/libc/manual/html_node/Exponents-and-Logarithms.html#Exponents-and-Logarithms
The source actually provides no better info:
https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/math/w_hypot.c (Not official, Just easy to link to)
Not to mention, this is only glibc; Windows will have a different implementation. So there MAY even be a different big O per OS that PHP is compiled on.
Another reason could be that it would confuse most developers.
Most developers I know would simply choose a function with the "best" big O.
A worse maximum doesn't always mean it's slower:
http://www.sorting-algorithms.com/
has a good visual representation of what is happening with some algorithms; e.g. bubble sort is a "slow" sort, yet it's one of the fastest for nearly sorted data.
Quicksort is what many will use, and it is actually very slow for nearly sorted data.
Big O is the worst case: PHP may decide between releases to optimize for a certain condition, and that will change the big O of the function, and there's no easy way to document that.
There is a partial list here (which I guess you have seen):
List of Big-O for PHP functions
It does list some of the more common PHP functions.
For this particular example...
It's fairly easy to solve without using the built-in functions.
Example code:
function isPalAnagram($string) {
    $string = str_replace(" ", "", $string);
    $len = strlen($string);
    $oddCount = $len & 1;
    $string = str_split($string);
    while ($len > 0 && $oddCount >= 0) {
        $current = reset($string);
        $replace_count = 0;
        foreach ($string as $key => &$char) {
            if ($char === $current) {
                unset($string[$key]);
                $len--;
                $replace_count++;
                continue;
            }
        }
        $oddCount -= ($replace_count & 1);
    }
    return ($len - $oddCount) === 0;
}
Using the fact that there cannot be more than 1 odd count, you can return early from the array.
I think mine is also O(N) time, because its worst case is O(N), as far as I can tell.
Test
$a = microtime(true);
for ($i = 1; $i < 100000; $i++) {
    testMethod("the quick brown fox jumped over the lazy dog");
    testMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
    testMethod("testest");
}
printf("Took %s seconds, %s memory", microtime(true) - $a, memory_get_peak_usage(true));
Tests run using really old hardware
My way
Took 64.125452041626 seconds, 262144 memory
Your way
Took 112.96145009995 seconds, 262144 memory
I'm fairly sure that my way is not the quickest way either.
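For completeness, here is a hedged one-pass variant (my own sketch, still using count_chars like the original Solution) that makes the O(N) bound and the early return on a second odd count explicit:
function isPalAnagramOnePass($string) {
    $string = str_replace(" ", "", $string);
    $len = strlen($string);
    $odds = 0;
    // count_chars does one O(N) pass; the result has at most 256 entries
    foreach (count_chars($string, 1) as $count) {
        $odds += ($count & 1);
        if ($odds > 1) {
            return false; // early exit: two odd counts can never form a palindrome
        }
    }
    return $odds == ($len & 1);
}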
I actually can't see much info either for languages other than PHP (Java for example).
I know a lot of this post is speculation about why it's not there, without much drawn from credible sources, but I hope it at least partially explains why big O isn't listed in the documentation pages.

Timing attack in PHP

strcmp - what does "binary safe string comparison" mean? Is this comparison safe against timing attacks?
If not, how can I compare two strings in a way that prevents timing attacks? Is comparing hashes of the strings enough? Or must I use some library (or my own code) that takes constant time for the comparison?
It is written here that timing attacks can be used on the web. But does this type of attack exist in the real world? Or can it be used only by a small class of attackers (like governments), so that this protection on the web is overkill?
"binary safe" means that any bytes can be safely compared with strcmp, not just valid characters in some character set. A quick test confirms that strcmp is not safe against timing attacks:
$nchars = 1000;
$s1 = str_repeat('a', $nchars + 1);
$s2 = str_repeat('a', $nchars) . 'b';
$s3 = 'b' . str_repeat('a', $nchars);
$times = 100000;
$start = microtime(true);
for ($i = 0; $i < $times; $i++) {
    strcmp($s1, $s2);
}
$timeForSameAtStart = microtime(true) - $start;
$start = microtime(true);
for ($i = 0; $i < $times; $i++) {
    strcmp($s1, $s3);
}
$timeForSameAtEnd = microtime(true) - $start;
printf("'b' at the end: %.4f\n'b' at the front: %.4f\n", $timeForSameAtStart, $timeForSameAtEnd);
For me this prints something like: 'b' at the end: 0.0634; 'b' at the front: 0.0287.
Many other string-based functions in PHP likely suffer from similar issues. Working around this is tricky, especially in PHP where you don't actually know what a lot of functions are really doing at the physical level.
One possible tactic is just sticking a random wait time in your code before you return the answer to the caller/potential attacker. Even better, measure how long it took to check the input data (e.g., with microtime), and then wait a random time minus that amount of time. This is not 100% secure, but it makes attacking the system MUCH harder because, at a minimum, an attacker will have to try each input many times in order to filter out the randomness.
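A hedged sketch of that tactic (illustrative numbers, my own code, not a vetted security implementation):
// Pad the comparison out to a randomly chosen total duration so the
// response time carries (almost) no information about where the
// strings first differ.
function check_with_noise($known, $input) {
    $start = microtime(true);
    $ok = (strcmp($known, $input) === 0); // the timing-sensitive operation
    $elapsed = microtime(true) - $start;
    $budget = mt_rand(1000, 5000) / 1000000.0; // random 1-5 ms total
    $remaining = $budget - $elapsed;
    if ($remaining > 0) {
        usleep((int)($remaining * 1000000)); // sleep away the difference
    }
    return $ok;
}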
The problem with strcmp is that it depends on the implementation. If it compares each byte of the strings until it reaches a difference or the end of either string, then it is vulnerable to a timing attack.
Now how about hashing?
I have found this Security question and I believe it has the correct answer for you:
https://security.stackexchange.com/a/46215
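For reference, PHP 5.6+ ships a purpose-built constant-time comparison, hash_equals(); hashing both sides first (as the linked answer discusses) additionally hides length differences. The token values below are illustrative only:
// hash_equals() compares two strings in time independent of where
// they differ; hashing first makes both operands a fixed length.
$knownHash = hash('sha256', 'the-secret-token');       // expected value
$userHash  = hash('sha256', 'whatever-the-user-sent'); // attacker-controlled input
var_dump(hash_equals($knownHash, $userHash)); // bool(false), in constant time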
Timing attacks are a myth.
Let me explain.
The difference in the time that it takes to validate a text, between one similar version and a different one, is a fraction of a second, let's say +/- 0.1 second (exaggerated!).
However, the time that it takes an attacker to measure this is:
delay of the network + 0.1 seconds + delay of the system (it may be busy doing some other task) + other delays.
So no, it's not possible: even on a local system (zero lag), the resulting time interval is always unclear.
In a test, let's say the difference between one method and another is 1 µs.
So, if we test it and the difference is 1 µs, then we could guess part of the secret.
But what if there is another factor, for example the network, the CPU usage at that moment, the CPU's clock rate at that moment, and such?
Even if we exclude the network, most operating systems are multi-tasking, so the test would have to be done on a system with a single-tasking operating system or a system running a single task, and that is not something that you see in the wild. Even embedded systems run multiple threads at the same time.
But let's say we run locally (no network) and we are doing a dry run on a computer that only runs a single task: our task. But we have another problem: modern CPUs don't run at a constant clock rate; they vary depending on usage, temperature and other factors.
So, it is only possible if:
it is executed locally and there is no other factor,
it runs as a single task and no other task is running on the server,
the CPU runs at a constant rate,
i.e. it is ABSURD.
Here is the test.
<?php
$text='123456789012345678901234567890123456789012345678901234567890123456789012345678901234';
$compare1='12345678901234567890123456789012345678901234567890123456789012345678901234567890123x';
$compare2='2222222222222222222222222222222222222222222222222222222222222222222222222222222222222';
$a1 = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    if ($compare1 === $text) {
        // do something
    }
}
$a2 = microtime(true);
var_dump($a2 - $a1);
$a1 = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    if ($compare2 === $text) {
        // do something
    }
}
$a2 = microtime(true);
var_dump($a2 - $a1);
It took me 5 minutes to invalidate this hypothesis.
What is tested:
It tests a 512-bit text and compares it against two candidates, comparing the times.
The test is set up to favour the hypothesis: it forces an unrealistic situation where the first candidate is almost identical to the text (all but the last character).
It also excludes latencies and other operations.
(Why 512 bits? Most passwords are hashed to 128 or 256 bits; 512 bits is what we can call safe.)
And here is the result.
one round:
0.021588087081909
0.021672010421753 (long time)
another run:
0.021767854690552
0.022729873657227 (long time)
and another run:
0.021697998046875 (long time)
0.021611213684082
and again
0.021565914154053 (long time)
0.020948171615601
and again
0.021995067596436
0.0224769115448 (long time)
So, even when the test is rigged to validate the point, it fails.
I.e., you can't find a trend when one of the variables is unknown, and this factor compromises the whole test. I could test it 1 million times and the result would be the same. And this test, in particular, avoids every variable such as latency, other processes, access to the database, etc.

PHP - Checking Character Type and Length and Trimming Leading Zeros

I have this code that works well, just inquiring to see if there is a better, shorter way to do it:
$sString = $_GET["s"];
if (ctype_digit($sString) == true) {
    if (strlen($sString) == 5) {
        $sString = ltrim($sString, '0');
    }
}
The purpose of this code is to remove the leading zeros from a search query before passing it to a search engine, if the query is 5 digits. The exact scenario is: A) all digits, B) length equals 5, and C) it has leading zeros.
Better/Shorter means in execution time...
$sString = $_GET["s"];
if (preg_match('/^\d{5}$/', $sString)) {
    $sString = ltrim($sString, '0');
}
This is shorter (and makes some assumptions you're not clear about in your question). However, shorter is not always better. Readability is sacrificed here, as well as some performance I guess (which will be negligible in a single execution, but could be "costly" when run in tight loops), etc. You ask for a "better, shorter way to do it"; then you need to define "better" and "shorter". Shorter in terms of lines/characters of code? Shorter in execution time? Better as in more readable? Better as in "most best practices" applied?
Edit:
So you've added the following:
The purpose of this code is to remove the leading zeros from a search
query before passing it to a search engine if it is 5 digits and. The
exact scenario is A) All digits, B) Length equals 5, and C) It has
leading zeros.
A) OK, that clears things up. (Are you sure characters like '+' or '-' won't be required? I assume these inputs will be things like 5-digit invoice numbers, product numbers etc.?)
B) What to do on inputs of length 1, 2, 3, 4, 6, 7, ... 28? 97? Nothing? Keep input intact?
Better/Shorter means in execution time...
Could you explain why you'd want to "optimize" this tiny piece of code? Unless you run this thousands of times in a tight loop the effects of optimizing this will be negligible. (Mumbles something about premature, optimization, root, evil). What is it that you're hoping to "gain" in "optimizing" this piece of code?
I haven't profiled/measured it yet, but my (educated) guess is that my preg_match is more 'expensive' in terms of execution time than your original code. It is "shorter" in terms of code though (but see above about sacrificing readability etc.).
Long story short: this code is not worth optimizing*; any I/O, database queries etc. will "cost" hundreds, maybe even thousands of times more. Go optimize that first.
* Unless you have "optimized" everything else as far as possible (and even then, a database query might be more efficient when written in another way so that the execution plan is more efficient, or I/O might be saved using (more) caching etc. etc.). The fraction of a millisecond (at best) you're about to save optimizing this code had better be worth it! And even then you'd probably need to consider switching to better hardware, another platform or another programming language, for example.
Edit 2:
I have quick'n'dirty "profiled" your code:
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $sString = '01234';
    if (ctype_digit($sString) == true) {
        if (strlen($sString) == 5) {
            $sString = ltrim($sString, '0');
        }
    }
}
echo microtime(true) - $start;
Output: 0.806390047073
So, that's 0.8 seconds for 1 million(!) iterations. My code, "profiled" the same way:
$start = microtime(true);
for ($i = 0; $i < 1000000; $i++) {
    $sString = '01234';
    if (preg_match('/^\d{5}$/', $sString)) {
        $sString = ltrim($sString, '0');
    }
}
echo microtime(true) - $start;
Output: 1.09024000168
So, it's slower. As I guessed/predicted.
Note my explicitly mentioning "quick'n'dirty": for good/accurate profiling you need better tools and if you do use a poor-man's profiling method like I demonstrated above then at least make sure you average your numbers over a few dozen runs etc. to make sure your numbers are consistent and reliable.
But either solution takes, worst case, less than 0.0000011 seconds per run. If this code runs "tens of thousands of times a day" (let's assume 100,000 times), you'll save exactly 0.11 seconds over an entire day IF you got it down to 0! If this 0.11 seconds is a problem, you have a bigger problem at hand, not these few lines of code. It's just not worth "optimizing" (hell, it's not even worth discussing; the time you and I have taken going back-and-forth over this will not be "earned back" in at least 50 years).
Lesson learned here:
Measure / Profile before optimizing!
A little shorter:
if (ctype_digit($sString) && strlen($sString) == 5) {
    $sString = ltrim($sString, '0');
}
It also matters whether you want "digits" 0-9 or a "numeric value", which may be -1, +0123.45e6, 0xf4c3b00c, 0b10100111001 etc., in which case use is_numeric.
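A quick illustration of the difference (hex and binary strings behave differently across PHP versions, so only the stable cases are shown):
var_dump(ctype_digit('01234'));     // true  - characters 0-9 only
var_dump(ctype_digit('-1'));        // false - '-' is not a digit
var_dump(is_numeric('-1'));         // true
var_dump(is_numeric('+0123.45e6')); // true
var_dump(is_numeric('01234'));      // true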
I guess you could also just do this to remove 0s:
$sString = (int)$sString;
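Note that the cast does more than strip zeros, so it only fits when the input is guaranteed to be all digits; a few illustrative cases:
var_dump((int)'01234');         // int(1234) - leading zeros gone, but the type is now int
var_dump((int)'0123abc');       // int(123)  - trailing non-digits are silently dropped
var_dump((string)(int)'01234'); // string(4) "1234" - cast back if a string is needed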

How to determine the memory footprint (size) of a variable?

Is there a function in PHP (or a PHP extension) to find out how much memory a given variable uses? sizeof just tells me the number of elements/properties.
memory_get_usage helps in that it gives me the memory size used by the whole script. Is there a way to do this for a single variable?
Note that this is on a development machine, so loading extensions or debug tools is feasible.
There's no direct way to get the memory usage of a single variable, but as Gordon suggested, you can use memory_get_usage. That will return the total amount of memory allocated, so you can use a workaround and measure usage before and after to get the usage of a single variable. This is a bit hacky, but it should work.
$start_memory = memory_get_usage();
$foo = "Some variable";
echo memory_get_usage() - $start_memory;
Note that this is in no way a reliable method; you can't be sure that nothing else touches memory while assigning the variable, so this should only be used as an approximation.
You can actually turn that into a function by creating a copy of the variable inside the function and measuring the memory used. I haven't tested this, but in principle, I don't see anything wrong with it:
function sizeofvar($var) {
    $start_memory = memory_get_usage();
    $tmp = unserialize(serialize($var));
    return memory_get_usage() - $start_memory;
}
You probably need a memory profiler. I have gathered information from SO, and I have copied some of the important things below, which may also help you.
As you probably know, Xdebug dropped memory profiling support as of version 2.*. Please search for the "removed functions" string here: http://www.xdebug.org/updates.php
Removed functions
Removed support for Memory profiling as that didn't work properly.
Other Profiler Options
php-memory-profiler
https://github.com/arnaud-lb/php-memory-profiler. This is what I've done on my Ubuntu server to enable it:
sudo apt-get install libjudy-dev libjudydebian1
sudo pecl install memprof
echo "extension=memprof.so" > /etc/php5/mods-available/memprof.ini
sudo php5enmod memprof
service apache2 restart
And then in my code:
<?php
memprof_enable();
// do your stuff
memprof_dump_callgrind(fopen("/tmp/callgrind.out", "w"));
Finally open the callgrind.out file with KCachegrind
Using Google gperftools (recommended!)
First of all install the Google gperftools by downloading the latest package here: https://code.google.com/p/gperftools/
Then as always:
sudo apt-get update
sudo apt-get install libunwind-dev -y
./configure
make
make install
Now in your code:
memprof_enable();
// do your magic
memprof_dump_pprof(fopen("/tmp/profile.heap", "w"));
Then open your terminal and launch:
pprof --web /tmp/profile.heap
pprof will create a new window in your existing browser session with a graphical view of the heap profile.
Xhprof + Xhgui (the best in my opinion to profile both cpu and memory)
With Xhprof and Xhgui you can profile the cpu usage as well or just the memory usage if that's your issue at the moment.
It's a very complete solution; it gives you full control, and the logs can be written either to MongoDB or to the filesystem.
For more details see here.
Blackfire
Blackfire is a PHP profiler by SensioLabs, the Symfony2 guys https://blackfire.io/
If you use puphpet to set up your virtual machine you'll be happy to know it's supported ;-)
Xdebug and tracing memory usage
Xdebug 2 is an extension for PHP. Xdebug allows you to log all function calls, including parameters and return values, to a file in different formats. There are three output formats: one is meant as a human-readable trace, another is more suited for computer programs as it is easier to parse, and the last uses HTML for formatting the trace. You can switch between the formats with a setting. An example is available here.
forp
forp is a simple, non-intrusive, production-oriented PHP profiler. Some of its features are:
measurement of time and allocated memory for each function
CPU usage
file and line number of the function call
output as Google's Trace Event format
caption of functions
grouping of functions
aliases of functions (useful for anonymous functions)
DBG
DBG is a full-featured PHP debugger, an interactive tool that helps you debug PHP scripts. It works on a production and/or development web server and allows you to debug your scripts locally or remotely, from an IDE or console. Its features are:
Remote and local debugging
Explicit and implicit activation
Call stack, including function calls, dynamic and static method calls, with their parameters
Navigation through the call stack with ability to evaluate variables in corresponding (nested) places
Step in/Step out/Step over/Run to cursor functionality
Conditional breakpoints
Global breakpoints
Logging for errors and warnings
Multiple simultaneous sessions for parallel debugging
Support for GUI and CLI front-ends
IPv6 and IPv4 networks supported
All data transferred by debugger can be optionally protected with SSL
No, there is not. But you can serialize($var) and check the strlen of the result for an approximation.
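For example (a rough approximation only; the serialized length is not the same thing as the in-memory footprint):
$var = array_fill(0, 1000, 42);
echo strlen(serialize($var)); // byte length of the serialized representation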
In answer to Tatu Ulmanen's answer:
It should be noted that $start_memory itself will take up memory (PHP_INT_SIZE * 8).
So the whole function should become:
function sizeofvar($var) {
    $start_memory = memory_get_usage();
    $var = unserialize(serialize($var));
    return memory_get_usage() - $start_memory - PHP_INT_SIZE * 8;
}
Sorry to add this as an extra answer, but I cannot yet comment on an answer.
Update: The *8 is not definite. It can apparently depend on the PHP version and possibly on 64/32 bit.
You can't retrospectively calculate the exact footprint of a variable, as two variables can share the same allocated space in memory.
Let's try to share memory between two arrays; we see that allocating the second array costs half the memory of the first one. When we unset the first one, nearly all that memory is still used by the second one.
echo memory_get_usage()."\n"; // <-- 433200
$c=range(1,100);
echo memory_get_usage()."\n"; // <-- 444348 (+11148)
$d=array_slice($c, 1);
echo memory_get_usage()."\n"; // <-- 451040 (+6692)
unset($c);
echo memory_get_usage()."\n"; // <-- 444232 (-6808)
unset($d);
echo memory_get_usage()."\n"; // <-- 433200 (-11032)
So we can't conclude that the second array uses half the memory, as that becomes false when we unset the first one.
For a full view of how memory is allocated in PHP and what it is used for, I suggest you read the following article: How big are PHP arrays (and values) really? (Hint: BIG!)
The Reference Counting Basics page in the PHP documentation also has a lot of information about memory use and reference counts to shared data segments.
The different solutions exposed here are good for approximations, but none can handle the subtle management of PHP memory.
calculating newly allocated space
If you want the newly allocated space after an assignment, then you have to use memory_get_usage() before and after the allocation, as using it on a copy gives you an erroneous view of reality.
// open output buffer
echo "Result: ";
// call every function once
range(1,1); memory_get_usage();
echo memory_get_usage()."\n";
$c=range(1,100);
echo memory_get_usage()."\n";
Remember that if you want to store the result of the first memory_get_usage(), the variable has to exist beforehand, and memory_get_usage() has to have been called once previously, and so does every other function used.
If you want to echo like in the above example, your output buffer has to be opened already, to avoid accounting for the memory needed to open it.
calculating required space
If you want to rely on a function to calculate the required space to store a copy of a variable, the following code takes care of different optimizations:
<?php
function getMemorySize($value) {
    // existing variable with an integer value so that the next line
    // does not add memory consumption when initiating the $start variable
    $start = 1;
    $start = memory_get_usage();
    // json functions yield a lower byte consumption than serialize
    $tmp = json_decode(json_encode($value));
    return memory_get_usage() - $start;
}
// open the output buffer, and call the function a first time
echo ".\n";
getMemorySize(NULL);
// test inside a function in order to not care about the memory used
// by the addition of the variable name to the $GLOBALS array
function test() {
    // call the function name once
    range(1,1);
    // we will compare the two values (see comment above about initialization of $start)
    $start = 1;
    $start = memory_get_usage();
    $c = range(1,100);
    echo (memory_get_usage() - $start) . "\n";
    echo getMemorySize($c) . "\n";
}
test();
// same result, this works fine.
// 11044
// 11044
Note that the size of the variable name matters in the memory allocated.
Check your code!!
A variable has a basic size defined by the inner C structure used in the PHP source code. This size does not fluctuate in the case of numbers. For strings, it would add the length of the string.
typedef union _zvalue_value {
    long lval;             /* long value */
    double dval;           /* double value */
    struct {
        char *val;
        int len;
    } str;
    HashTable *ht;         /* hash table value */
    zend_object_value obj;
} zvalue_value;
If we do not take the initialization of the variable name into account, we already know how much a variable uses (in the case of numbers and strings):
44 bytes in the case of numbers
+ 24 bytes in the case of strings
+ the length of the string (including the final NUL character)
(those numbers can change depending on the PHP version)
You have to round up to a multiple of 4 bytes due to memory alignment. If the variable is in the global space (not inside a function), it will also allocate 64 more bytes.
So if you want to use one of the code snippets on this page, you have to check that the results on some simple test cases (strings or numbers) match those figures, taking into account every one of the indications in this post (the $GLOBALS array, first function call, output buffer, ...).
See:
memory_get_usage() — Returns the amount of memory allocated to PHP
memory_get_peak_usage() — Returns the peak of memory allocated by PHP
Note that this won't give you the memory usage of a specific variable, though. But you can put calls to these functions before and after assigning the variable and then compare the values. That should give you an idea of the memory used.
You could also have a look at the PECL extension Memtrack, though the documentation is a bit lacking, if not to say, virtually non-existent.
You could opt for calculating the memory difference on a callback return value. It's a more elegant solution, available in PHP 5.3+.
function calculateFootprint($callback) {
    $startMemory = memory_get_usage();
    $result = call_user_func($callback);
    return memory_get_usage() - $startMemory;
}

$memoryFootprint = calculateFootprint(
    function() {
        return range(1, 1000000);
    }
);

echo ($memoryFootprint / (1024 * 1024)) . ' MB' . PHP_EOL;
I had a similar problem, and the solution I used was to write the variable to a file then run filesize() on it. Roughly like this (untested code):
function getVariableSize($foo)
{
    $tmpfile = "temp-" . microtime(true) . ".txt";
    file_put_contents($tmpfile, $foo);
    $size = filesize($tmpfile);
    unlink($tmpfile);
    return $size;
}
This solution isn't terribly fast because it involves disk IO, but it should give you something much more exact than the memory_get_usage tricks. It just depends upon how much precision you require.
The following script shows total memory usage of a single variable.
function getVariableUsage($var) {
    $total_memory = memory_get_usage();
    $tmp = unserialize(serialize($var));
    return memory_get_usage() - $total_memory;
}

$var = "Hey, what's you doing?";
echo getVariableUsage($var);
Check this out
http://www.phpzag.com/how-much-memory-do-php-variables-use/
Never tried, but Xdebug traces with xdebug.collect_assignments may be enough.
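A hedged sketch of what that might look like with Xdebug 2 (directive names changed in Xdebug 3, so check the docs for your version):
; php.ini (Xdebug 2): include variable assignments in function traces
xdebug.collect_assignments = 1
xdebug.collect_params = 1

<?php
xdebug_start_trace('/tmp/assignments_trace'); // writes /tmp/assignments_trace.xt
$foo = str_repeat('x', 1000); // this assignment should appear in the trace
xdebug_stop_trace();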
