Problems with PHP when trying to create big array - php

Here is my code, which creates 2d array filled with zeros, array dimensions are (795,6942):
function zeros($rowCount, $colCount){
$matrix = array();
for ($rowIndx=0; $rowIndx<$rowCount; $rowIndx++){
$matrix[] = array();
for($colIndx=0; $colIndx<$colCount; $colIndx++){
$matrix[$rowIndx][$colIndx]=0;
}
}
return $matrix;
}
$matrix = zeros(795,6942);
And here is the error that I receive:
Allowed memory size of 134217728 bytes exhausted (tried to allocate 35 bytes)
Any ideas how to solve this?

As a quick calculation, you are trying to create an array that contains :
795*6942 = 5,518,890
integers.
If we consider that one integer is stored on 4 bytes (i.e. 32 bits ; using PHP, it not be less), it means :
5518890*4 = 22,075,560
bytes.
OK, quick calculation... result is "it should be OK".
But things are not that easy, unfortunatly :-(
I suppose it's related to the fact that data is stored by PHP using an internal data-structure that's much more complicated than a plain 32 bits integer
Now, just to be curious, let's modify your function so it outputs how much memory is used at the end of each one of the outer for-loop :
function zeros($rowCount, $colCount){
$matrix = array();
for ($rowIndx=0; $rowIndx<$rowCount; $rowIndx++){
$matrix[] = array();
for($colIndx=0; $colIndx<$colCount; $colIndx++){
$matrix[$rowIndx][$colIndx]=0;
}
var_dump(memory_get_usage());
}
return $matrix;
}
With this, I'm getting this kind of output (PHP 5.3.2-dev on a 64bits system ; memory_limit is set to 128MB -- which is already a lot !) :
int 1631968
int 2641888
int 3651808
...
...
int 132924168
int 133934088
Fatal error: Allowed memory size of 134217728 bytes exhausted
Which means each iteration of the outer for-loop requires something like 1.5 MB of memory -- and I only get to 131 iterations before the script runs out of memory ; and not 765 like you wanted.
Considering you set your memory_limit to 128M, you'd have to set it to something really much higher -- like
128*(765/131) = 747 MB
Well, even with
ini_set('memory_limit', '750M');
it's still not enough... with 800MB, it seems enough ;-)
But I would definitly not recommend setting memory_limit to such a high value !
(If you have 2GB of RAM, your server will not be able to handle more than 2 concurrent users ^^ ;; I wouldn't actually test this if my computer had 2GB of RAM, to be honest)
The only solution I see here is for you to re-think your design : there has to be something else you can do than use this portion of code :-)
(BTW : maybe "re-think your design" means using another language PHP : PHP is great when it comes to developping web-sites, but is not suited to every kind of problem)

The default PHP array implementation is very memory-intensive. If you are just storing integers (and lots of them), you probably want to look at SplFixedArray. It uses a regular contiguous memory block to store the data, as opposed to the traditional hash structure.

You should try increasing the amount of memory available to PHP:
ini_set('memory_limit', '32M');
in your PHP file.

Related

PHP memory limit on line running foreach

Is there a more efficient way to set JSON array values than this?
for($i=0;$i<sizeOf($json['activity']);$i++){
$json['activity'][$i]['active'] = 'false';
}
I want to set all sub keys named 'active' to 'false'
The arrays are not huge, they are multi-dimension with about 8-10 sub arrays and I am running on XAMPP localhost.
I am getting
Allowed memory size of 134217728 bytes exhausted (tried to allocate 32 bytes)
error briefly and then the rest of the code runs OK on a setInterval. I have looked at ways to set the memory limit but suspect there must be a cleaner way to set the array keys.
Thank you
If I understand this correctly, you created an infinite loop since everytime it runs, your array gets one more value, same as your $i-counter. Try getting the count of the array first in a seperate variable and then run the loop with that
$c = sizeOf($json['activity']); for($i=0;$i<$c;$i++){
$json['activity'][$i]['active'] = 'false';
}

Need an array-like structure in PHP with minimal memory usage

In my PHP script I need to create an array of >600k integers. Unfortunately my webservers memory_limit is set to 32M so when initializing the array the script aborts with message
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes) in /home/www/myaccount/html/mem_test.php on line 8
I am aware of the fact, that PHP does not store the array values as plain integers, but rather as zvalues which are much bigger than the plain integer value (8 bytes on my 64-bit system). I wrote a small script to estimate how much memory each array entry uses and it turns out, that it's pretty exactly 128 bytes. 128!!! I'd need >73M just to store the array. Unfortunately the webserver is not under my control so I cannot increase the memory_limit.
My question is, is there any possibility in PHP to create an array-like structure that uses less memory. I don't need this structure to be associative (plain index-access is sufficient). It also does not need to have dynamic resizing - I know exactly how big the array will be. Also, all elements would be of the same type. Just like a good old C-array.
Edit:
So deceze's solution works out-of-the-box with 32-bit integers. But even if you're on a 64-bit system, pack() does not seem to support 64-bit integers. In order to use 64-bit integers in my array I applied some bit-manipulation. Perhaps the below snippets will be of help for someone:
function push_back(&$storage, $value)
{
// split the 64-bit value into two 32-bit chunks, then pass these to pack().
$storage .= pack('ll', ($value>>32), $value);
}
function get(&$storage, $idx)
{
// read two 32-bit chunks from $storage and glue them back together.
return (current(unpack('l', substr($storage, $idx * 8, 4)))<<32 |
current(unpack('l', substr($storage, $idx * 8+4, 4))));
}
The most memory efficient you'll get is probably by storing everything in a string, packed in binary, and use manual indexing to it.
$storage = '';
$storage .= pack('l', 42);
// ...
// get 10th entry
$int = current(unpack('l', substr($storage, 9 * 4, 4)));
This can be feasible if the "array" initialisation can be done in one fell swoop and you're just reading from the structure. If you need a lot of appending to the string, this becomes extremely inefficient. Even this can be done using a resource handle though:
$storage = fopen('php://memory', 'r+');
fwrite($storage, pack('l', 42));
...
This is very efficient. You can then read this buffer back into a variable and use it as string, or you can continue to work with the resource and fseek.
A PHP Judy Array will use significantly less memory than a standard PHP array, and an SplFixedArray.
I quote "An array with 1 million entries using regular PHP array data structure takes 200MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation."
You could use an object if possible. These often use less memory than array's.
Also SplFixedArray is an good option.
But it really depends on the implementation that you need to do. If you need an function to return an array and are using PHP 5.5. You could use the generator yield to stream the array back.
You can try to use a SplFixedArray, it's faster and take less memory (the doc comment say ~30% less). Test here and here.
Use a string - that's what I'd do. Store it in a string on fixed offsets (16 or 20 digits should do it I guess?) and use substr to get the one needed. Blazing fast write / read, super easy, and 600.000 integers will only take ~12M to store.
base_convert() - if you need something more compact but with minimum effort, convert your integers to base-36 instead of base-10; in this case, a 14-digit number would be stored in 9 alphanumeric characters. You'll need to make 2 pieces of 64-bit ints, but I'm sure that's not a problem. (I'd split them to 9-digit chunks where conversion gives you a 6-char version.)
pack()/unpack() - binary packing is the same thing with a bit more efficiency. Use it if nothing else works; split your numbers to make them fit to two 32-bit pieces.
600K is a lot of elements. If you are open to alternative methods, I personally would use a database for that. Then use standard sql/nosql select syntax to pull things out. Perhaps memcache or redis if you have an easy host for that, such as garantiadata.com. Maybe APC.
Depending on how you are generate the integers, you could potentially use PHP's generators, assuming you are traversing the array and doing something with individual values.
I took the answer by #deceze and wrapped it in a class that can handle 32-bit integers. It is append-only, but you can still use it as a simple, memory-optimized PHP Array, Queue, or Heap. AppendItem and ItemAt are both O(1), and it has no memory overhead. I added currentPosition/currentSize to avoid unnecessary fseek function calls. If you need to cap memory usage and switch to a temporary file automatically, use php://temp instead.
class MemoryOptimizedArray
{
private $_storage;
private $_currentPosition;
private $_currentSize;
const BYTES_PER_ENTRY = 4;
function __construct()
{
$this->_storage = fopen('php://memory', 'rw+');
$this->_currentPosition = 0;
$this->_currentSize = 0;
}
function __destruct()
{
fclose($this->_storage);
}
function AppendItem($value)
{
if($this->_currentPosition != $this->_currentSize)
{
fseek($this->_storage, SEEK_END);
}
fwrite($this->_storage, pack('l', $value));
$this->_currentSize += self::BYTES_PER_ENTRY;
$this->_currentPosition = $this->_currentSize;
}
function ItemAt($index)
{
$itemPosition = $index * self::BYTES_PER_ENTRY;
if($this->_currentPosition != $itemPosition)
{
fseek($this->_storage, $itemPosition);
}
$binaryData = fread($this->_storage, self::BYTES_PER_ENTRY);
$this->_currentPosition = $itemPosition + self::BYTES_PER_ENTRY;
$unpackedElements = unpack('l', $binaryData);
return $unpackedElements[1];
}
}
$arr = new MemoryOptimizedArray();
for($i = 0; $i < 3; $i++)
{
$v = rand(-2000000000,2000000000);
$arr->AddToEnd($v);
print("added $v\n");
}
for($i = 0; $i < 3; $i++)
{
print($arr->ItemAt($i)."\n");
}
for($i = 2; $i >=0; $i--)
{
print($arr->ItemAt($i)."\n");
}

Memory optimization in PHP array

I'm working with a large array which is a height map, 1024x1024 and of course, i'm stuck with the memory limit. In my test machine i can increase the mem limit to 1gb if i want, but in my tiny VPS with only 256 ram, it's not an option.
I've been searching in stack and google and found several "well, you are using PHP not because memory efficiency, ditch it and rewrite in c++" and honestly, that's ok and I recognize PHP loves memory.
But, when digging more inside PHP memory management, I did not find what memory consumes every data type. Or if casting to another type of data reduces mem consumption.
The only "optimization" technique i found was to unset variables and arrays, that's it.
Converting the code to c++ using some PHP parsers would solve the problem?
Thanks!
If you want a real indexed array, use SplFixedArray. It uses less memory. Also, PHP 5.3 has a much better garbage collector.
Other than that, well, PHP will use more memory than a more carefully written C/C++ equivalent.
Memory Usage for 1024x1024 integer array:
Standard array: 218,756,848
SplFixedArray: 92,914,208
as measured by memory_get_peak_usage()
$array = new SplFixedArray(1024 * 1024); // array();
for ($i = 0; $i < 1024 * 1024; ++$i)
$array[$i] = 0;
echo memory_get_peak_usage();
Note that the same array in C using 64-bit integers would be 8M.
As others have suggested, you could pack the data into a string. This is slower but much more memory efficient. If using 8 bit values it's super easy:
$x = str_repeat(chr(0), 1024*1024);
$x[$i] = chr($v & 0xff); // store value $v into $x[$i]
$v = ord($x[$i]); // get value $v from $x[$i]
Here the memory will only be about 1.5MB (that is, when considering the entire overhead of PHP with just this integer string array).
For the fun of it, I created a simple benchmark of creating 1024x1024 8-bit integers and then looping through them once. The packed versions all used ArrayAccess so that the user code looked the same.
mem write read
array 218M 0.589s 0.176s
packed array 32.7M 1.85s 1.13s
packed spl array 13.8M 1.91s 1.18s
packed string 1.72M 1.11s 1.08s
The packed arrays used native 64-bit integers (only packing 7 bytes to avoid dealing with signed data) and the packed string used ord and chr. Obviously implementation details and computer specs will affect things a bit, but I would expect you to get similar results.
So while the array was 6x faster it also used 125x the memory as the next best alternative: packed strings. Obviously the speed is irrelevant if you are running out of memory. (When I used packed strings directly without an ArrayAccess class they were only 3x slower than native arrays.)
In short, to summarize, I would use something other than pure PHP to process this data if speed is of any concern.
In addition to the accepted answer and suggestions in the comments, I'd like to suggest PHP Judy array implementation.
Quick tests showed interesting results. An array with 1 million entries using regular PHP array data structure takes ~200 MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation.
A little bit late to the party, but if you have a multidimensional array you can save a lot of RAM when you store the complete array as json.
$array = [];
$data = [];
$data["a"] = "hello";
$data["b"] = "world";
To store this array just use:
$array[] = json_encode($data);
instead of
$array[] = $data;
If you want to get the arrry back, just use something like:
$myData = json_decode($array[0], true);
I had a big array with 275.000 sets and saved about 36% RAM consumption.
EDIT:
I found a more better way, when you zip the json string:
$array[] = gzencode(json_encode($data));
and unzip it when you need it:
$myData = json_decode(gzdecode($array[0], true));
This saved me nearly 75% of RAM peak usage.

More than 640 000 elements in the array - memory problem [Dijkstra]

I have a script which puts 803*803 (644 809) graph with 1 000 000 value inside each. With ~500*500 everything works fine - but now it crashes - it tries to allocate more than 64MB of memory (which I haven't). What's the solution? Somehow "split" it or...?
$result=mysql_query("SELECT * FROM some_table", $connection);
confirm($result);
while($rows = mysql_fetch_array($result)){
$result2=mysql_query("SELECT * FROM some_table", $connection);
confirm($result2);
while($rows2 = mysql_fetch_array($result2)){
$first = $rows["something"];
$second = $rows2["something2"];
$graph[$first][$second] = 1000000;
}
}
*it's about Dijkstra algorithm
p.s. no, I can't allocate more than 64MB
Try freeing your inside sql result at the end of each loop, using mysql_free_result($result2);, the PHP script may not do it for you, depending on the PHP version (the garbage collector may not be enabled or may be useless due to a too old PHP version).
Do not instanciate the two temporary variables inside the loop, use the mysql_fetch_array result directly such as $graph[$rows["something"]][$rows2["something2"]] = 1000000; , you will save 2 memory allocations per loop..
PS: This is micro-optimization, therefore it may help you to save enough memory to fit into your 64M of memory. Don't forget that with 64 * 1024 * 1024 bytes of memory, you have an average 104 bytes maximum size for each of your 644 809 elements, plus the array size itself, plus the rest of the temporary data you may allocate for your algorithm.
If it doesn't fit, consider splitting your matrix and doing batched jobs or such to split your work in lesser memory consuming but more than one script run.
If your above code sample actually matches your real code, you're fetching the same result two times (the second one even in a loop). If it's the same dataset fetching it once from the database should be enough and will reduce database-load, execution time and memory footprint altogether.
Perhaps the following approach may work in your memory-restricted environment.
$result = mysql_unbuffered_query("SELECT * FROM some_table", $connection);
confirm($result);
$rawData = array();
while ($rows = mysql_fetch_assoc($result)) {
$rawData[] = array($rows["something"], $rows["something2"]);
}
mysql_free_result($result);
$graph = array();
foreach ($rawData as $r1) {
foreach ($rawData as $r2) {
$graph[$r1[0]][$r2[1]] = 1000000;
}
}
unset($rawData);
Notes:
I'm using mysql_fetch_assoc() instead of mysql_fetch_array() because the latter will return every column twice (one numerically indexed and one indexed by the column name)
Perhaps using mysql_unbuffered_query() instead of mysql_query() might also reduce the memory footprint (depending on the actual dataset size)
Try using http://en.wikipedia.org/wiki/Adjacency_list to represent the graph instead of Adjacency Matrix ( i think you are using matrix cause of $graph[$first][$second] = 1000000;
For a sparse graph, it takes less memory.
If you insist on using PHP for high memory operations (which is not really a good idea to begin with), I would partition the graph into quadrants, and use GD to combine the quadrants. That way you will only have to build the graph with 1/4 the memory footprint.
Again, this is not ideal, but you're trying to use a nail to drive in a hammer :D

PHP memory_get_usage

I came across the PHP's memory_get_usage() and memory_get_peak_usage().
The problem is that I found that these two functions do not provide the real memory used by the current script.
My test script is:
<?php
echo memory_get_usage();
echo '<br />';
$a = str_repeat('hello', 100000);
echo '<br />';
echo memory_get_usage();
echo '<br />';
echo memory_get_peak_usage();
?>
Which returns:
355120
5355216
5356008
What do you understand from this?
The first value is before executing the str_repeat() so it has to be the value of 0.
The second is after the process and it's OK to have a value greater than 0 but not that big value.
The third is the "peak" value and it's slightly greater than the second as I think it should be the biggest value in a processing microsecond.
So do you think that the real value of the current script's memory consumption should be like this:
memory_usage = the second memory usage - the first memory usage
peak_memory_usage = the third (peak_usage) - the first memory usage
which gives:
1) 5355216 - 355120 = 5000096 bytes
2) 5356008 - 355120 = 5000888 bytes
If this is how it works, I assume that the first 355120 bytes are the whole system allocated memory used by apache and other modules, as the first value never changes when you increase or decrease the number of repeats in the str_repeat(), only the two values after the process increase or decrease but never gets smaller that the first value.
According to the php manual, memory_get_usage returns the amount of memory allocated to php, not necessarily the amount being used.
Ok, your first assertion that the first memory_get_usage() should be 0 is wrong. According to PHP's documentation:
Returns the amount of memory, in
bytes, that's currently being
allocated to your PHP script.
Your script is running, therefore it must have some memory allocated to it. The first call informs you of how much that is.
Your second assertion that str_repeat() should not use that much memory is not looking at the whole picture.
You have the string "hello" (which uses 5 bytes) repeated 100,000 times, for a total of 500,000 bytes...minimum. The question is, how did PHP perform this action? Did they use code such as this? (pseudocode):
s = ""
for(i=0; i<100000; i++)
s += "hello"
This code would require that you reallocate a new string for each iteration of the for loop. Now I can't pretend to say that I know how PHP implements str_repeat(), but you have to be extremely careful with how you use memory to keep memory usage down. From the appearance of things, they did not manage memory in that function as well as they could have.
Third, the difference between the peak memory usage and current memory usage likely comes from the stack that was necessary to make the function call to str_repeat(), as well as any local variables necessary within that function. The memory was probably reclaimed when the function returned.
Finally, Apache runs in a different process and we are dealing with virtual memory. Nothing that Apache does will affect the result of memory_get_usage() as processes do not "share" virtual memory.
In my case (PHP 5.3.3 on Mac OS X 10.5) your script prints:
323964
824176
824980
Now, the difference between the second measurement and the first gives 500212, which is very close to the length of "hello" (5) times 100,000. So I would say no surprises here. The peak is a bit greater because of some temporary allocations when evaluating these statements.
(Your other questions are answered already)

Categories