Strange memory usage in while(1) vs. for(;;) - php

I have the following two snippets.
1:
$i = 0;
while (1)
{
    $i++;
    echo "big text for memory usage ";
    if ($i == 50000)
        break;
}
echo "<br />" . memory_get_usage();
It echoes 1626464 every time.
2:
$i = 0;
for (;;)
{
    $i++;
    echo "big text for memory usage ";
    if ($i == 50000)
        break;
}
echo "<br />" . memory_get_usage();
It echoes 1626656 every time.
Can anybody explain the difference between the two memory usages, even though it is so small?

It's an implementation detail. With the for loop, PHP probably uses some space to store three pointers: one for the initialization, one for the increment, and one for the stop condition. On a 64-bit system three pointers only come to 24 bytes, though, so the 192 extra bytes you're seeing would have to be several such structures or a coarser allocation granularity. Of course, it's hard to tell if I'm right without looking at the actual code.

The memory difference you're seeing is negligible and shouldn't be a concern. The way the two loops are compiled should not affect memory usage, though it may affect runtime speed (also negligibly).
For instance:
while(1): This causes the compiler to check whether 1 is true; if it isn't, it jumps to the end of your loop - if it is, it processes the loop's contents.
for(;;): A standard for loop does three things: it runs an initialization expression, which you skip; checks whether a condition is met, which you also skip; executes the body; and then runs the increment expression, also skipped, before jumping back to the condition check. The jumps are all still in place, but your code does nothing at each of them (hence the "endless loop" - until you manually break, of course).
As a pure guess at why the for loop shows a very slight rise in memory: it could be because of the way a for loop is supposed to work - with a defined/managed iterator. PHP could pre-allocate a small amount of extra space to accommodate that iterator and its garbage collection, even if you're not using it.
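If you want to poke at this yourself, here is a minimal sketch along the lines of the question's code (my own; the absolute numbers, and even the delta, will vary by PHP version and platform):
<?php
// Take a baseline first, so the loop's cost shows up as a delta rather
// than being buried in the interpreter's fixed overhead.
$before = memory_get_usage();

$i = 0;
while (1) {        // swap in for(;;) here to compare the two forms
    $i++;
    if ($i == 50000) {
        break;
    }
}

echo memory_get_usage() - $before, " bytes\n";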

Related

PHP built in functions complexity (isAnagramOfPalindrome function)

I've been googling for the past 2 hours, and I cannot find a list of the time and space complexities of PHP's built-in functions. I have the isAnagramOfPalindrome problem to solve with the following maximum allowed complexity:
expected worst-case time complexity is O(N)
expected worst-case space complexity is O(1) (not counting the storage required for input arguments).
where N is the input string length. Here is my simplest solution, but I don't know if it is within the complexity limits.
class Solution {
    // Determine whether the input string can be rearranged into a palindrome
    static public function isAnagramOfPalindrome($S) {
        // Count how many characters occur an odd number of times
        $odds = count(array_filter(count_chars($S, 1), function($var) {
            return ($var & 1);
        }));
        // If the string length is odd, a palindrome has exactly 1 character with an odd count;
        // if it is even, every character must occur an even number of times
        return (int)($odds == (strlen($S) & 1));
    }
}
echo Solution::isAnagramOfPalindrome($_POST['input']);
Anyone have an idea where to find this kind of information?
EDIT
I found out that array_filter has O(N) complexity and count has O(1) complexity. Now I need to find info on count_chars, but a full list would be very convenient for future problems.
EDIT 2
After some research on space and time complexity in general, I found out that this code has O(N) time complexity and O(1) space complexity because:
count_chars will loop N times (the full length of the input string), giving it a base complexity of O(N). It generates an array with a bounded maximum number of entries (at most 256, one per possible byte value - only 26 if the input is restricted to lowercase letters), and the filter then loops over this bounded array, i.e. at most a constant number of times. When pushing the input length towards infinity, this loop is insignificant and is treated as a constant. count also applies to this constant-size array, and besides, it is insignificant because the count function's complexity is O(1). Hence, the time complexity of the algorithm is O(N).
It goes the same for space complexity. When calculating space complexity, we do not count the input, only the objects generated in the process. These objects are the bounded counts array and the $odds variable, and both are treated as constants because their size cannot increase past that point, no matter how big the input is. So we can say that the algorithm has a space complexity of O(1).
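A quick way to convince yourself that the intermediate array stays bounded (my own check, not from any reference): the number of buckets count_chars() returns cannot exceed the number of distinct byte values, no matter how long the input grows.
<?php
// The bucket count saturates while the input keeps growing.
foreach (array(100, 10000, 1000000) as $reps) {
    $s = str_repeat('abcde', $reps);
    echo strlen($s), " chars -> ", count(count_chars($s, 1)), " buckets\n";
}
// Prints "... -> 5 buckets" for every size.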
Anyway, that list would still be valuable, so we do not have to look inside the PHP source code. :)
A probable reason for not including this information is that it is likely to change from release to release, as improvements and optimizations are made for the general case.
PHP is built on C, and some of its functions are simply wrappers around their C counterparts - hypot, for example: a Google search, a look at man hypot, or the docs for the math library:
http://www.gnu.org/software/libc/manual/html_node/Exponents-and-Logarithms.html#Exponents-and-Logarithms
The source actually provides no better info:
https://github.com/lattera/glibc/blob/a2f34833b1042d5d8eeb263b4cf4caaea138c4ad/math/w_hypot.c (not official, just easy to link to)
Not to mention, this is only glibc; Windows will have a different implementation, so there may even be a different big O per OS that PHP is compiled on.
Another reason could be that it would confuse most developers.
Most developers I know would simply choose the function with the "best" big O, but a worse worst case doesn't always mean it's slower in practice.
http://www.sorting-algorithms.com/
has a good visualization of what happens with some algorithms: bubble sort is a "slow" sort, yet it's one of the fastest for nearly-sorted data. Quicksort is what many will use, and it is actually very slow for nearly-sorted data.
Big O is the worst case. Between releases, PHP may decide to optimize for a certain condition, which would change a function's big O, and there's no easy way to document that.
There is a partial list here (which I guess you have seen):
List of Big-O for PHP functions
It covers some of the more common PHP functions.
For this particular example, it's fairly easy to solve without using the built-in functions.
Example code
function isPalAnagram($string) {
    $string = str_replace(" ", "", $string);
    $len = strlen($string);
    // A palindrome allows exactly one odd-count character for odd lengths, none for even
    $oddCount = $len & 1;
    $string = str_split($string);
    while ($len > 0 && $oddCount >= 0) {
        // Take the first remaining character and remove every occurrence of it
        $current = reset($string);
        $replace_count = 0;
        foreach ($string as $key => $char) {
            if ($char === $current) {
                unset($string[$key]);
                $len--;
                $replace_count++;
            }
        }
        // Spend the odd-count budget if this character occurred an odd number of times
        $oddCount -= ($replace_count & 1);
    }
    return ($len - $oddCount) === 0;
}
Using the fact that there cannot be more than one odd count, you can return early from the loop.
I think mine is also O(N) time, because its worst case is O(N) as far as I can tell.
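For what it's worth, here is a minimal sketch of that early-return idea (my own variant, reusing count_chars like the question does; the function name is made up): a string can be rearranged into a palindrome iff at most one character has an odd count, so you can bail out as soon as a second odd count appears.
function isPalAnagramEarlyExit($string) {
    $string = str_replace(" ", "", $string);
    $odds = 0;
    foreach (count_chars($string, 1) as $count) {
        // Stop as soon as a second odd-count character shows up
        if (($count & 1) && ++$odds > 1) {
            return false;
        }
    }
    return true;
}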
Test
$a = microtime(true);
for ($i = 1; $i < 100000; $i++) {
    // testMethod() stands in for whichever implementation is being timed
    testMethod("the quick brown fox jumped over the lazy dog");
    testMethod("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
    testMethod("testest");
}
printf("Took %s seconds, %s memory", microtime(true) - $a, memory_get_peak_usage(true));
Tests run using really old hardware
My way
Took 64.125452041626 seconds, 262144 memory
Your way
Took 112.96145009995 seconds, 262144 memory
I'm fairly sure that my way is not the quickest way either.
I actually can't see much of this information for languages other than PHP either (Java, for example).
I know a lot of this post speculates about why the information isn't there, without drawing much from credible sources, but I hope it at least partially explains why big O isn't listed in the documentation.

Amount of data stored in a PHP array

I'm looking for a way to measure the amount of data stored in a PHP array. I'm not talking about the number of elements in the array (which you can figure out with count($array, COUNT_RECURSIVE)), but the cumulative amount of data from all the keys and their corresponding values. For instance:
array('abc'=>123); // size = 6
array('a'=>1,'b'=>2); // size = 4
As what I'm interested in is order of magnitude rather than the exact amount (I want to compare the processing memory and time usage versus the size of the arrays) I thought about using the following trick:
strlen(print_r($array,true));
However, the amount of overhead coming from print_r varies depending on the structure of the array, which doesn't give me consistent results:
echo strlen(print_r(array('abc'=>123),true)); // 27
echo strlen(print_r(array('a'=>1,'b'=>2),true)); // 35
Is there a way (ideally in a one-liner and without impacting too much performance as I need to execute this at run-time on production) to measure the amount of data stored in an array in PHP?
Does this do the trick:
<?php
$arr = array('abc'=>123);
echo strlen(implode('',array_keys($arr)).implode('',$arr));
?>
Short answer: mission impossible
You could try something like:
strlen(serialize($myArray)) // either this
strlen(json_encode($myArray)) // or this
But to approximate the true memory footprint of an array, you will have to do a lot more than that. If you're looking for a ballpark estimate, arrays take 3-8x more than their serialized version, depending on what you store in them and how many elements you have. It increases gradually, in bigger and bigger chunks as your array grows. To give you an idea of what's happening, here's an array estimation function I came up with, after many hours of trying, for one-level arrays only:
function estimateArrayFootprint($a) { // copied from one of my failed quests :(
    $size = 0;
    foreach ($a as $k => $v) {
        foreach ([$k, $v] as $x) {
            $n = strlen($x);
            // Round the length up to the allocation block size PHP appears to use
            do {
                if ($n > 8192) { $n = (1 + ($n >> 12)) << 12; break; }
                if ($n > 1024) { $n = (1 + ($n >>  9)) <<  9; break; }
                if ($n >  512) { $n = (1 + ($n >>  8)) <<  8; break; }
                if ($n >    0) { $n = (1 + ($n >>  5)) <<  5; break; }
            } while (0);
            $size += $n + 96; // plus a rough per-value bookkeeping overhead
        }
    }
    return $size;
}
So that's how easy it is - not. And again, it's not a reliable estimation; it probably depends on the PHP memory limit, the architecture, the PHP version, and a lot more. The question is how accurate you need this value to be.
Also, let's not forget that these values came from memory_get_usage(1), which is not very exact either: PHP allocates memory in big blocks to avoid noticeable overhead as your string/array/whatever grows, as in a for(...) $x .= "yada" situation.
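If a ballpark figure is all you need, one rough sanity check (my own, with made-up keys and sizes) is to compare the estimate against the memory_get_usage() delta around building the array; don't expect an exact match, since the delta also reflects allocator block granularity:
$before = memory_get_usage();
$arr = array();
for ($i = 0; $i < 1000; $i++) {
    $arr["key$i"] = str_repeat('x', 100);
}
$after = memory_get_usage();
echo "measured : ", $after - $before, " bytes\n";
echo "estimated: ", estimateArrayFootprint($arr), " bytes\n";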
I wish I could say anything more useful.

Comparing execution times in PHP

I would like to compare different pieces of PHP code to find out which one executes faster. I am currently using the following code:
<?php
$load_time_1 = 0;
$load_time_2 = 0;
$load_time_3 = 0;
for ($x = 1; $x <= 20000; $x++)
{
    //code 1
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_1 += (microtime(true) - $start_time);
    //code 2
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_2 += (microtime(true) - $start_time);
    //code 3
    $start_time = microtime(true);
    $i = 1;
    $i++;
    $load_time_3 += (microtime(true) - $start_time);
}
echo $load_time_1;
echo '<br />';
echo $load_time_2;
echo '<br />';
echo $load_time_3;
?>
I have executed the script several times.
The first result is
0.44057559967041
0.43392467498779
0.43600964546204
The second result is
0.50447297096252
0.48595094680786
0.49943733215332
The third result is
0.5283739566803
0.55247902870178
0.55091571807861
The results look okay, but the problem is that every time I execute this code the results are different. Also, I am comparing the same code three times on the same machine.
Why would there be a difference in speed while comparing? And is there a way to compare execution times and see the real difference?
There is a thing called observational error. As long as your measured differences do not exceed it, all your measurements are just a waste of time.
The only proper way of doing measurements is called profiling, and it means measuring significant parts of the code, not senseless ones.
Why would there be a difference in speed while comparing?
There are two reasons for this, both related to how things that are out of your control are handled by PHP and the operating system.
Firstly, the processor can only do a certain amount of work at any given time. The operating system is responsible for multitasking, dividing the available cycles among your applications. Since these cycles aren't granted at a constant rate, small speed variations are to be expected even with identical PHP commands, because of how processor cycles are allocated.
Secondly, a bigger cause of time variation is the background work PHP itself does. Many things are completely hidden from the user, such as memory allocation, garbage collection, and the handling of the various namespaces for variables and the like. These operations take processor cycles too, and they can run at unexpected points during your script. If garbage collection is performed during the first incrementation but not the second, the first operation takes longer than the second. Sometimes, because of garbage collection, even the order in which the tests are performed can affect the execution time.
Speed testing can be a bit tricky, because unrelated factors (like other applications running in the background) can skew the results. Generally, small speed differences between scripts are hard to call, but when a speed test is run enough times, the real picture emerges: if one script is consistently faster than another, that usually indicates it really is more efficient in terms of processing speed.
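As a minimal sketch of that advice (my own helper, not from any of the answers): time each candidate over many iterations, repeat the whole run several times, and keep the median, which is far less sensitive to background noise than a single measurement.
function benchmarkMedian(callable $fn, $runs = 50, $iterations = 10000) {
    $times = array();
    for ($r = 0; $r < $runs; $r++) {
        $start = microtime(true);
        for ($i = 0; $i < $iterations; $i++) {
            $fn();
        }
        $times[] = microtime(true) - $start;
    }
    sort($times);
    return $times[(int)($runs / 2)]; // median run time, in seconds
}
echo benchmarkMedian(function () { $i = 1; $i++; });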
The reason the results vary is that there are other things going on at the same time, such as Windows or Linux background tasks and other processes. You will never get an exact result; you are best off running the code over 100 or more iterations and then dividing the total to find the average time taken, and using that as your figure.
Also, it would be beneficial to create a class that can handle this for you, so you can reuse it instead of rewriting the timing code every time.
Try something like this (untested):
class CodeBench
{
    private $benches = array();

    public function __construct(){}

    // Renamed from begin() so it matches the usage example below
    public function start($name)
    {
        if (!isset($this->benches[$name]))
        {
            $this->benches[$name] = array();
        }
        $this->benches[$name]['start'] = array(
            'microtime' => microtime(true)
            /* Other information */
        );
    }

    public function end($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        $this->benches[$name]['end'] = array(
            'microtime' => microtime(true) // float timestamp, so it can be subtracted
            /* Other information */
        );
    }

    public function calculate($name)
    {
        if (!isset($this->benches[$name]))
        {
            throw new Exception("You must first declare a benchmark for " . $name);
        }
        if (!isset($this->benches[$name]['end']))
        {
            throw new Exception("You must first call end() for " . $name);
        }
        // Subtract the stored float timestamps, not the arrays holding them
        return (($this->benches[$name]['end']['microtime']
               - $this->benches[$name]['start']['microtime']) * 1000) . ' ms';
    }
}
And then use like so:
$CB = new CodeBench();
$CB->start("bench_1");
//Do work:
$CB->end("bench_1");
$CB->start("bench_2");
//Do work:
$CB->end("bench_2");
echo "First benchmark had taken: " . $CB->calculate("bench_1");
echo "Second benchmark had taken: " . $CB->calculate("bench_2");
Computing speeds are never 100% set in stone. PHP is a server-side language, so depending on the computing power available to the server, the same code can take a varying amount of time.
Since you're subtracting from the start time with each step, it is expected that load time 3 will be greater than 2 which will be greater than 1.

PHP memory_get_usage

I came across the PHP's memory_get_usage() and memory_get_peak_usage().
The problem is that, as far as I can tell, these two functions do not report the real memory used by the current script.
My test script is:
<?php
echo memory_get_usage();
echo '<br />';
$a = str_repeat('hello', 100000);
echo '<br />';
echo memory_get_usage();
echo '<br />';
echo memory_get_peak_usage();
?>
Which returns:
355120
5355216
5356008
What do you understand from this?
The first value is taken before executing str_repeat(), so I expected it to be 0.
The second is taken after the operation, and it's fine for it to be greater than 0, but not by that much.
The third is the "peak" value, and it's slightly greater than the second; I would think it should be the biggest value reached at any instant during processing.
So do you think that the real value of the current script's memory consumption should be like this:
memory_usage = the second memory usage - the first memory usage
peak_memory_usage = the third (peak_usage) - the first memory usage
which gives:
1) 5355216 - 355120 = 5000096 bytes
2) 5356008 - 355120 = 5000888 bytes
If this is how it works, I assume that the first 355120 bytes are memory allocated by Apache and other modules, since the first value never changes when I increase or decrease the number of repeats in str_repeat(); only the two values taken after the operation increase or decrease, and they never get smaller than the first value.
According to the PHP manual, memory_get_usage() returns the amount of memory allocated to PHP, not necessarily the amount being used.
OK, your first assertion that the first memory_get_usage() should be 0 is wrong. According to PHP's documentation:
Returns the amount of memory, in bytes, that's currently being allocated to your PHP script.
Your script is running, therefore it must have some memory allocated to it. The first call informs you of how much that is.
Your second assertion that str_repeat() should not use that much memory is not looking at the whole picture.
You have the string "hello" (which uses 5 bytes) repeated 100,000 times, for a total of 500,000 bytes... minimum. The question is, how did PHP perform this action? Did it use code such as this (illustrative PHP, not the actual implementation)?
$s = "";
for ($i = 0; $i < 100000; $i++) {
    $s .= "hello";
}
This code would reallocate the string on each iteration of the loop. Now, I can't pretend to say that I know how PHP implements str_repeat(), but you have to be extremely careful with how you use memory to keep memory usage down. From the appearance of things, that function does not manage memory as well as it could.
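A small experiment (mine, not the answerer's) makes the contrast visible; the exact figures will vary with PHP version and platform, and modern PHP grows appended strings fairly efficiently:
$base = memory_get_usage();
$s = str_repeat('hello', 100000);   // allocated (roughly) once, at final size
echo memory_get_usage() - $base, "\n";
unset($s);
$base = memory_get_usage();
$s = '';
for ($i = 0; $i < 100000; $i++) {
    $s .= 'hello';                  // grows, and may reallocate, as it goes
}
echo memory_get_usage() - $base, "\n";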
Third, the difference between the peak memory usage and current memory usage likely comes from the stack that was necessary to make the function call to str_repeat(), as well as any local variables necessary within that function. The memory was probably reclaimed when the function returned.
Finally, Apache runs in a different process and we are dealing with virtual memory. Nothing that Apache does will affect the result of memory_get_usage() as processes do not "share" virtual memory.
In my case (PHP 5.3.3 on Mac OS X 10.5) your script prints:
323964
824176
824980
Now, the difference between the second measurement and the first gives 500212, which is very close to the length of "hello" (5) times 100,000. So I would say no surprises here. The peak is a bit greater because of some temporary allocations when evaluating these statements.
(Your other questions are answered already)

Are PHP strings immutable?

Or: should I optimize my string operations in PHP? I looked through PHP's manual but didn't find any hints about it.
PHP already optimises it - variables are assigned using copy-on-write, and objects are passed by reference. In PHP 4 it doesn't, but nobody should be using PHP 4 for new code anyway.
One of the most essential speed optimization techniques in many languages is instance reuse. In that case the speed increase comes from at least two factors:
1. Fewer instantiations means less time spent on construction.
2. The less memory the application uses, the fewer CPU cache misses there probably are.
For applications where speed is the #1 priority, there is a genuinely tight bottleneck between the CPU and the RAM, and one of the reasons for it is the latency of the RAM.
PHP, Ruby, Python, etc. are all affected by cache misses, because even they store at least some (probably all) of the run-time data of the interpreted programs in RAM.
String instantiation is an operation performed often, and in relatively huge quantities, so it may have a noticeable impact on speed.
Here's a run_test.bash of a measurement experiment:
#!/bin/bash
for i in `seq 1 200`;
do
/usr/bin/time -p -a -o ./measuring_data.rb php5 ./string_instantiation_speedtest.php
done
Here are the ./string_instantiation_speedtest.php and the measurement results:
<?php
// The comments on the
// next 2 lines show the arithmetic mean of (user time + sys time) for 200 runs.
$b_instantiate = False; // 0.1624 seconds
$b_instantiate = True;  // 0.1676 seconds
// The time consumed by the reference version is about 97% of the
// time consumed by the instantiation version, but a thing to notice is
// that the loop contains at least 1, probably 2, possibly 4,
// string instantiations at the array_push line.
$ar = array();
$s = 'This is a string.';
$n = 10000;
$s_1 = NULL;
for ($i = 0; $i < $n; $i++) {
    if ($b_instantiate) {
        $s_1 = '' . $s;
    } else {
        $s_1 = &$s;
    }
    // The rand is for avoiding optimization at storage.
    array_push($ar, '' . rand(0, 9) . $s_1);
} // for
// rand(0, $n - 1): valid indices run from 0 to $n - 1.
echo($ar[rand(0, $n - 1)] . "\n");
?>
My conclusion from this experiment and one other experiment that I did with Ruby 1.8 is that it makes sense to pass string values around by reference.
One possible way to make "pass strings by reference" safe at whole-application scope is to consistently create a new string instance whenever one needs a modified version of a string.
To increase locality, therefore speed, one may want to decrease the amount of memory that each of the operands consumes. The following experiment demonstrates the case for string concatenations:
<?php
// The comments on the
// next 2 lines show the arithmetic mean of (user time + sys time) for 200 runs.
$b_suboptimal = False; // 0.0611 seconds
$b_suboptimal = True;  // 0.0785 seconds
// The time consumed by the optimal version is about 78% of the
// time consumed by the suboptimal version.
//
// The number of concatenations is the same and the resultant
// string is the same, but what differs is the "average" and maximum
// lengths of the tokens that are used for assembling the $s_whole.
$n = 1000;
$s_token = "This is a string with a Linux line break.\n";
$s_whole = '';
if ($b_suboptimal) {
    for ($i = 0; $i < $n; $i++) {
        $s_whole = $s_whole . $i . $s_token;
    } // for
} else {
    $i_watershed = (int)round((($n * 1.0) / 2), 0);
    $s_part_1 = '';
    $s_part_2 = '';
    for ($i = 0; $i < $i_watershed; $i++) {
        $s_part_1 = $s_part_1 . $i . $s_token;
    } // for
    for ($i = $i_watershed; $i < $n; $i++) {
        $s_part_2 = $s_part_2 . $i . $s_token;
    } // for
    $s_whole = $s_part_1 . $s_part_2;
} // else
// To circumvent possible optimization one actually "uses" the
// value of the $s_whole.
$file_handle = fopen('./it_might_have_been_a_served_HTML_page.txt', 'w');
fwrite($file_handle, $s_whole);
fclose($file_handle);
?>
For example, if one assembles HTML pages that contain a considerable amount of text, one might want to think about the order in which different parts of the generated HTML are concatenated together.
A BSD-licensed PHP implementation and a Ruby implementation of the watershed string concatenation algorithm are available. The same algorithm can be (and has been, by me) generalized to speed up multiplication of arbitrary-precision integers.
Arrays and strings have copy-on-write behaviour. They are mutable, but when you assign them to a variable initially that variable will contain the exact same instance of the string or array. Only when you modify the array or string is a copy made.
Example:
$a = array_fill(0, 10000, 42); // consumes 545744 bytes
$b = $a;                       // consumes     48 bytes (no copy yet)
$b[0] = 42;                    // consumes 545656 bytes (the copy happens here)
$s = str_repeat(' ', 10000);   // consumes  10096 bytes
$t = $s;                       // consumes     48 bytes (no copy yet)
$t[0] = '!';                   // consumes  10048 bytes (the copy happens here)
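Those byte figures can be reproduced with something along these lines (a sketch; the exact numbers depend on PHP version and platform):
$m0 = memory_get_usage();
$a = array_fill(0, 10000, 42);
$m1 = memory_get_usage();
$b = $a;        // copy-on-write: only a cheap reference is created
$m2 = memory_get_usage();
$b[0] = 42;     // the write triggers the actual copy
$m3 = memory_get_usage();
printf("%d %d %d\n", $m1 - $m0, $m2 - $m1, $m3 - $m2);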
A quick google would seem to suggest that they are mutable, but the preferred practice is to treat them as immutable.
PHP strings can be modified in place through index assignment (tested on PHP 7.4):
<?php
$str = "Hello\n";
echo $str;
$str[2] = 'y';
echo $str;
Output:
Hello
Heylo
PHP strings are immutable.
Try this:
$a="string";
echo "<br>$a<br>";
echo str_replace('str','b',$a);
echo "<br>$a";
It echoes:
string
bing
string
If strings were mutable, str_replace() would have modified $a in place, and the last echo would have shown "bing".
