I am trying to create a 2D array in PHP with a size of 2000x2000 (4 million entries). It seems that I run out of memory here, but the manner in which the error is appearing is confusing me.
When I define the array and fill it initially using the array_fill command, and initialize each position in the array (matrix) with 0 there is no problem.
However if I try iterating over the array and fill each position with 0, it runs out of memory.
I would assume that once I run array_fill it allocates the memory at that point, and it should not run out of memory in the loop.
Of course, this is just a simplified version of the code. In my actual application I will be using the X & Y coordinates to lookup value from another table, process it, and then store it in my matrix. These will be floating point values.
Can somebody help throw some light on this, please? Is there some other way I should be doing this?
Thank you!
<?php
// Set error reporting.
error_reporting(E_ALL);
ini_set('display_errors', TRUE);
ini_set('display_startup_errors', TRUE);
// Define Matrix dimensions.
define("MATRIX_WIDTH", 2000+1);
define("MATRIX_HEIGHT", 2000+1);
// Setup array for matrix and initialize it.
$matrix = array_fill(0,MATRIX_HEIGHT,array_fill(0,MATRIX_WIDTH,0));
// Populate each matrix point with calculated value.
for($y_cood=0;$y_cood<MATRIX_HEIGHT;$y_cood++) {
// Debugging statement to see where the script stops running.
if( ($y_cood % 100) == 0 ) {print("Y=$y_cood<br>"); flush();}
for($x_cood=0;$x_cood<MATRIX_WIDTH;$x_cood++) {
$fill_value = 0;
$matrix[$y_cood][$x_cood]=$fill_value;
}
}
print("Matrix height: ".count($matrix)."<br>");
print("Matrix width: ".count($matrix[0])."<br>");
?>
I would assume that once I run array_fill it allocates the memory at that point, and it should not run out of memory in the loop.
Yes... and no. Allocating memory and executing the program code are (usually) two different things.
The memory allocated to a program/process is usually divided in two - heap and stack. When you "allocate memory" (in the meaning you used in your question), this occurs in the heap. When you execute program code, the stack is also used. Both are not completely separated, since you may push and/or pop references (pointers to the heap) on and/or from the stack.
The thing is that the heap and the stack share part of the memory allocated to the process. Usually one of them grows from higher addresses toward lower ones and the other from lower addresses toward higher ones, so there is a "floating" border between them. As soon as the two meet at that border, you're "out of memory".
So, in your case, when you create and fill your array(matrix) you've used memory for 2001 x 2001 integers. If an integer requires 32 bits or 4 Bytes, then there are 2001 x 2001 x 4 Bytes = 4004001 x 4 Bytes = 16016004 Bytes ~ 16 MB.
When executing the code, the stack's being filled with the (local) variables - loop condition variable, loop counter and all the other variables.
You should also not forget that the PHP (library) code should also be loaded in the memory, so depending on the value you have set as memory_limit in your configuration, you may quickly run out of memory.
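One more thing that may be worth measuring, offered here as an assumption about PHP's copy-on-write behaviour rather than as part of the explanation above: the rows produced by the nested array_fill() call can all refer to the same inner array until a row is written to, so most of the allocation may only happen inside the loop. A minimal sketch with a smaller matrix (the function name measureFillThenWrite and the 200x200 size are made up for illustration):

```php
<?php
// Sketch: observe memory growth after array_fill() and after writing to every cell.
// Assumption: identical rows are shared until written to (copy-on-write).
function measureFillThenWrite(int $height, int $width): array
{
    $before = memory_get_usage();

    // Every row initially refers to the same inner array.
    $matrix = array_fill(0, $height, array_fill(0, $width, 0));
    $afterFill = memory_get_usage();

    // Writing into a shared row forces PHP to give that row its own copy.
    for ($y = 0; $y < $height; $y++) {
        for ($x = 0; $x < $width; $x++) {
            $matrix[$y][$x] = 0;
        }
    }
    $afterLoop = memory_get_usage();

    return [
        'fill' => $afterFill - $before,
        'loop' => $afterLoop - $afterFill,
    ];
}

$growth = measureFillThenWrite(200, 200);
echo "array_fill: {$growth['fill']} bytes, loop: {$growth['loop']} bytes\n";
```

If the loop figure is much larger than the array_fill figure on your installation, that would explain why the script only fails once the loop is running.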
I'm using a php script for updating product data.
While the consumed memory is constant, the consumed time per 1.000 products is increasing all the time:
[26000 - 439.75 MB / 14.822s]..........
[27000 - 439.25 MB / 15.774s]..........
[28000 - 438.25 MB / 15.068s]..........
[29000 - 437.75 MB / 16.317s]..........
[30000 - 437.25 MB / 16.968s]..........
[31000 - 436.25 MB / 17.521s]....
Even if I disable everything except reading a line of my variable containing the CSV data, the effect is the same, only with a lower rate of increase:
[65000 - 424.75 MB / 0.001s]..........
[66000 - 424.75 MB / 0.63s]..........
[67000 - 424.75 MB / 0.716s]..........
[68000 - 424.75 MB / 0.848s]..........
[69000 - 424.75 MB / 0.943s]..........
[70000 - 424.25 MB / 1.126s]..........
[71000 - 423.5 MB / 1.312s]....
I tried changing the GC settings (php -dzend.enable_gc=1 and php -dzend.enable_gc=0).
I load my data in advance with:
$this->file = file($file_path);
The next line is retrieved with:
$line = array_shift($this->file);
I don't know why this should consistently increase the required time, especially when I just array_shift the line without performing any actions on it.
My current solution is to split the file up in 10.000 pieces, which is not a desirable solution for a file that contains more than 300.000 lines and has to be updated every day.
It would be nice to at least understand what happens here...
Thanks in advance for any hints.
The issue with array_shift()
Part of the data maintained internally for every single element in an array is a sequence number identifying the position of that element within the array. These values are effectively sequential integers, starting from 0 for the first element. Don't confuse this with the key value of an enumerated array, it's maintained purely internally, and completely separate to the key so that you can do associative sorts, which effectively just re-organize these internal position values.
When you add a new element to an array, it needs to be given a new sequence value. If you're just adding the new element to the end of the array, then it's as simple as taking the previous highest sequence value, adding one, and assigning that as the sequence value for the new element: a simple O(1) activity. Likewise, if you remove the last element, it can simply be removed, and the sequence for all other elements remains valid.
However, if you add a new element to the beginning of the array using array_unshift(), then it will be assigned the 0 value, and every existing element already in the array will need to have its sequence value increased by 1, so PHP internally has to traverse every element, making this an O(n) operation. Likewise, array_shift() has to decrement the sequence value for every remaining array element once it has removed the first element from the array, which is also O(n). If your array is very large, this can be a major overhead.
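If you really need first-in, first-out consumption, one option that is not mentioned in the question is SplQueue from the SPL. It is backed by a doubly linked list, so taking the first item does not renumber the rest. A minimal sketch (drainWithQueue is a made-up helper name):

```php
<?php
// Sketch: drain a list front-to-back without re-indexing on every removal.
function drainWithQueue(array $lines): array
{
    $queue = new SplQueue();
    foreach ($lines as $line) {
        $queue->enqueue($line);
    }

    $processed = [];
    while (!$queue->isEmpty()) {
        // dequeue() hands back the oldest item, like array_shift() would.
        $processed[] = $queue->dequeue();
    }
    return $processed;
}

$result = drainWithQueue(['a', 'b', 'c']);
echo implode(',', $result), "\n"; // a,b,c
```

Whether this beats simply iterating with foreach depends on what else the script does with the lines, so treat it as something to benchmark rather than a guaranteed fix.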
General performance
In answer to your performance issues.... why are you reading the entire file into memory in one go? Why can't you simply process it one line at a time?
$fh = fopen('filename.txt', 'r');
// fgets() returns one line per call and false at end of file
while (($item = fgets($fh)) !== false) {
// ... processing here
}
fclose($fh);
And don't try to out-think PHP's garbage collection.
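The line-at-a-time idea can also be wrapped in a generator, so the calling code still looks like a foreach over an array. This is only a sketch: readLines is a hypothetical helper, and the temporary file stands in for the real CSV.

```php
<?php
// Sketch: yield one line at a time so the whole file never sits in memory.
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            // Strip the trailing line break before handing the line over.
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($handle);
    }
}

// Usage with a throwaway file (made-up sample data).
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "1;foo\n2;bar\n3;baz\n");
$count = 0;
foreach (readLines($path) as $line) {
    $count++; // process $line here
}
unlink($path);
echo "$count lines\n";
```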
array_shift() should technically run faster the more it is used, as it has to re-index a smaller set.
Are you doing anything else with the returned result?
Alternatively, you may think about reversing the array before the loop:
$reversed = array_reverse($file);
And then popping the last value inside your loop
$item = array_pop($reversed);
Is there a specific reason why you need to use array_shift()?
Maybe just reading the file and closing it would make your script run faster:
$this->file = file($file_path);
foreach ($this->file as $line) {
// do the thing you need to do
}
unset ($this->file);
Another thing is that you seem to be reading one array ($file) and turning it into another ($line). Maybe it might be worth using the $file array as it is?
I'm not sure exactly what you're doing - but hopefully these suggestions might help.
I am using the following code in an application based on ZF1:
$select = $db->select()->from('table', array('id', 'int', 'float'))->limit(10000, (($i - 1) * 10000));
$data = $select->query();
while ($row = $data->fetch()) {
# ...
}
This operation is happening in a foreach loop for some 800 times. I output the memory usage for each pass and can see it increasing by about 5MB per pass. I suppose that is because Zend apparently does not free the result from the query once the pass is complete. A simple unset didn't solve the issue. Using fetchAll also did not improve (or change) the situation.
Is there any way to free the result from a Zend_Db_Statement_PDO thus freeing the memory used by it? Or do you suspect another reason?
I believe you want to do this:
$sql = "SELECT something FROM random-table-with-an-obscene-large-amount-of-entries";
$res = $db->query($sql);
while ($row = $res->fetch(Zend_Db::FETCH_NUM)) {
// do something with the data returned in $row
}
Zend_Db::FETCH_NUM - return data in an array of arrays. The arrays are indexed by integers, corresponding to the position of the respective field in the select-list of the query.
Since you overwrite $row on each loop, the memory should be reclaimed. If you are paranoid you can unset($row) at the bottom of the loop I believe. I've not tested this myself recently, but I ran into a batch problem about a year ago that was similar, and I seem to recall using this solution.
Actually the problem was hidden somewhere else:
Inside the loop some integer results were stored in an array for modification at a later planned stage in the workflow.
While one might expect PHP arrays to be small, that is not the case: arrays grow big really fast, and in the measurement quoted below a PHP array turned out to be about 18 times larger than 'expected'. Watch out while working with arrays, even if you only store integers in them!
In case the linked article disappears sometime:
In this post I want to investigate the memory usage of PHP arrays (and values in general) using the following script as an example, which creates 100000 unique integer array elements and measures the resulting memory usage:
$startMemory = memory_get_usage();
$array = range(1, 100000);
echo memory_get_usage() - $startMemory, ' bytes';
How much would you expect it to be? Simple, one integer is 8 bytes (on a 64 bit unix machine and using the long type) and you got 100000 integers, so you obviously will need 800000 bytes. That’s something like 0.76 MBs.
Now try and run the above code. This gives me 14649024 bytes. Yes, you heard right, that’s 13.97 MB - eighteen times more than we estimated.
Suppose a multidimensional associative array, when printed as text with print_r(), creates a 470 KiB file. Is it reasonable to assume that the variable in question takes up half a MiB of server memory per instance if it is different for each user? Therefore, if 1000 users hit the server at the same time, will almost half a GiB of memory be consumed?
Thanks.
There is an excellent article on this topic at IBM:
http://www.ibm.com/developerworks/opensource/library/os-php-v521/
UPDATE
The original page was taken down, for now the JP version is still there https://www.ibm.com/developerworks/jp/opensource/library/os-php-v521/
Basic takeaways from it are that you can use memory_get_usage() to check how much memory your script currently occupies:
// This is only an example, the numbers below will differ depending on your system
echo memory_get_usage() . "\n"; // 36640
$a = str_repeat("Hello", 4242);
echo memory_get_usage() . "\n"; // 57960
unset($a);
echo memory_get_usage() . "\n"; // 36744
Also, you can check the peak memory usage of your script with memory_get_peak_usage().
As an answer to your questions: print_r() is a representation of data which is bloated with text and formatting. The occupied memory itself will be less than the number of characters of print_r(). How much depends on the data. You should check it like in the example above.
Whatever result you get, it will be for each user executing the script, so yes - if 1000 users are requesting it at the same time, you will need that memory.
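To check this against your own data, you could measure both numbers side by side. The sketch below is only an illustration: compareFootprint is a made-up helper and the 1000 sample users are invented. It makes no promise about which of the two figures comes out larger.

```php
<?php
// Sketch: compare the memory an array occupies with the size of its print_r() dump.
function compareFootprint(callable $build): array
{
    $before = memory_get_usage();
    $data = $build();
    $memoryBytes = memory_get_usage() - $before;

    // print_r(..., true) returns the dump as a string instead of printing it.
    $dumpBytes = strlen(print_r($data, true));

    return ['memory' => $memoryBytes, 'dump' => $dumpBytes];
}

// Hypothetical sample data: 1000 users with two fields each.
$sizes = compareFootprint(function () {
    $users = [];
    for ($i = 0; $i < 1000; $i++) {
        $users["user$i"] = ['id' => $i, 'name' => "Name $i"];
    }
    return $users;
});
echo "memory: {$sizes['memory']} bytes, print_r: {$sizes['dump']} bytes\n";
```

Run it with the real structure in place of the sample closure and multiply the memory figure by the number of simultaneous requests you expect.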
I came across the PHP's memory_get_usage() and memory_get_peak_usage().
The problem is that I found that these two functions do not provide the real memory used by the current script.
My test script is:
<?php
echo memory_get_usage();
echo '<br />';
$a = str_repeat('hello', 100000);
echo '<br />';
echo memory_get_usage();
echo '<br />';
echo memory_get_peak_usage();
?>
Which returns:
355120
5355216
5356008
What do you understand from this?
The first value is before executing the str_repeat() so it has to be the value of 0.
The second is after the process and it's OK to have a value greater than 0 but not that big value.
The third is the "peak" value and it's slightly greater than the second as I think it should be the biggest value in a processing microsecond.
So do you think that the real value of the current script's memory consumption should be like this:
memory_usage = the second memory usage - the first memory usage
peak_memory_usage = the third (peak_usage) - the first memory usage
which gives:
1) 5355216 - 355120 = 5000096 bytes
2) 5356008 - 355120 = 5000888 bytes
If this is how it works, I assume that the first 355120 bytes are the whole system allocated memory used by apache and other modules, as the first value never changes when you increase or decrease the number of repeats in the str_repeat(); only the two values after the process increase or decrease, but they never get smaller than the first value.
According to the php manual, memory_get_usage returns the amount of memory allocated to php, not necessarily the amount being used.
Ok, your first assertion that the first memory_get_usage() should be 0 is wrong. According to PHP's documentation:
Returns the amount of memory, in bytes, that's currently being allocated to your PHP script.
Your script is running, therefore it must have some memory allocated to it. The first call informs you of how much that is.
Your second assertion that str_repeat() should not use that much memory is not looking at the whole picture.
You have the string "hello" (which uses 5 bytes) repeated 100,000 times, for a total of 500,000 bytes...minimum. The question is, how did PHP perform this action? Did they use code such as this?
$s = "";
for ($i = 0; $i < 100000; $i++) {
$s .= "hello"; // appends to the string on every pass
}
This code would require that you reallocate a new string for each iteration of the for loop. Now I can't pretend to say that I know how PHP implements str_repeat(), but you have to be extremely careful with how you use memory to keep memory usage down. From the appearance of things, they did not manage memory in that function as well as they could have.
Third, the difference between the peak memory usage and current memory usage likely comes from the stack that was necessary to make the function call to str_repeat(), as well as any local variables necessary within that function. The memory was probably reclaimed when the function returned.
Finally, Apache runs in a different process and we are dealing with virtual memory. Nothing that Apache does will affect the result of memory_get_usage() as processes do not "share" virtual memory.
In my case (PHP 5.3.3 on Mac OS X 10.5) your script prints:
323964
824176
824980
Now, the difference between the second measurement and the first gives 500212, which is very close to the length of "hello" (5) times 100,000. So I would say no surprises here. The peak is a bit greater because of some temporary allocations when evaluating these statements.
(Your other questions are answered already)
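The subtraction discussed in this thread can be written out as a small script. It is a sketch only: the variable names are mine and the exact figures will differ between PHP versions and platforms.

```php
<?php
// Sketch: measure how much memory one allocation adds, relative to a baseline.
$baseline = memory_get_usage();

$text = str_repeat('hello', 100000); // 5 bytes x 100000 = 500000 bytes of payload
$length = strlen($text);
$delta = memory_get_usage() - $baseline;
$peakDelta = memory_get_peak_usage() - $baseline;

echo "string length: $length\n"; // 500000
echo "delta: $delta bytes, peak delta: $peakDelta bytes\n";

// Releasing the string should bring the usage back close to the baseline.
unset($text);
$afterUnset = memory_get_usage() - $baseline;
echo "after unset: $afterUnset bytes\n";
```

The delta should come out slightly above the payload size, because the string carries a small header and the allocator rounds the request up.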
Here is my code, which creates 2d array filled with zeros, array dimensions are (795,6942):
function zeros($rowCount, $colCount){
$matrix = array();
for ($rowIndx=0; $rowIndx<$rowCount; $rowIndx++){
$matrix[] = array();
for($colIndx=0; $colIndx<$colCount; $colIndx++){
$matrix[$rowIndx][$colIndx]=0;
}
}
return $matrix;
}
$matrix = zeros(795,6942);
And here is the error that I receive:
Allowed memory size of 134217728 bytes exhausted (tried to allocate 35 bytes)
Any ideas how to solve this?
As a quick calculation, you are trying to create an array that contains :
795*6942 = 5,518,890
integers.
If we consider that one integer is stored on 4 bytes (i.e. 32 bits ; with PHP, it will not be less), it means :
5518890*4 = 22,075,560
bytes.
OK, quick calculation... result is "it should be OK".
But things are not that easy, unfortunately :-(
I suppose it's related to the fact that data is stored by PHP using an internal data-structure that's much more complicated than a plain 32 bits integer.
Now, just to be curious, let's modify your function so it outputs how much memory is used at the end of each one of the outer for-loop :
function zeros($rowCount, $colCount){
$matrix = array();
for ($rowIndx=0; $rowIndx<$rowCount; $rowIndx++){
$matrix[] = array();
for($colIndx=0; $colIndx<$colCount; $colIndx++){
$matrix[$rowIndx][$colIndx]=0;
}
var_dump(memory_get_usage());
}
return $matrix;
}
With this, I'm getting this kind of output (PHP 5.3.2-dev on a 64bits system ; memory_limit is set to 128MB -- which is already a lot !) :
int 1631968
int 2641888
int 3651808
...
...
int 132924168
int 133934088
Fatal error: Allowed memory size of 134217728 bytes exhausted
Which means each iteration of the outer for-loop requires about 1 MB of memory (1,009,920 bytes between consecutive readings) -- and I only get to 131 iterations before the script runs out of memory ; and not 795 like you wanted.
Considering you set your memory_limit to 128M, you'd have to set it to something really much higher -- like
128*(795/131) = 777 MB
Well, even with
ini_set('memory_limit', '750M');
it's still not enough... with 800MB, it seems enough ;-)
But I would definitely not recommend setting memory_limit to such a high value !
(If you have 2GB of RAM, your server will not be able to handle more than 2 concurrent users ^^ ;; I wouldn't actually test this if my computer had 2GB of RAM, to be honest)
The only solution I see here is for you to re-think your design : there has to be something else you can do than use this portion of code :-)
(BTW : maybe "re-think your design" means using another language than PHP : PHP is great when it comes to developing web-sites, but is not suited to every kind of problem)
The default PHP array implementation is very memory-intensive. If you are just storing integers (and lots of them), you probably want to look at SplFixedArray. It uses a regular contiguous memory block to store the data, as opposed to the traditional hash structure.
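A minimal sketch of the zeros() function from the question rebuilt on SplFixedArray follows. The name zerosFixed and the small 100x200 size are my own choices for illustration, and how much memory this actually saves depends on the PHP version, so measure it on your own setup.

```php
<?php
// Sketch: a zero-filled matrix built from SplFixedArray rows.
function zerosFixed(int $rowCount, int $colCount): SplFixedArray
{
    $matrix = new SplFixedArray($rowCount);
    for ($row = 0; $row < $rowCount; $row++) {
        $cells = new SplFixedArray($colCount);
        for ($col = 0; $col < $colCount; $col++) {
            $cells[$col] = 0; // slots start out as null
        }
        $matrix[$row] = $cells;
    }
    return $matrix;
}

$before = memory_get_usage();
$matrix = zerosFixed(100, 200);
$used = memory_get_usage() - $before;

$matrix[5][7] = 1.5; // floats can be stored as well
echo $matrix->getSize(), " x ", $matrix[0]->getSize(), ", $used bytes\n";
```

Indexes outside the fixed size throw an exception instead of growing the array, which is the price for the simpler storage.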
You could try increasing the amount of memory available to PHP to a value above the 128 MB (134217728 bytes) limit reported in the error, for example:
ini_set('memory_limit', '1024M');
in your PHP file.