Need an array-like structure in PHP with minimal memory usage - php

In my PHP script I need to create an array of >600k integers. Unfortunately my webservers memory_limit is set to 32M so when initializing the array the script aborts with message
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 71 bytes) in /home/www/myaccount/html/mem_test.php on line 8
I am aware of the fact, that PHP does not store the array values as plain integers, but rather as zvalues which are much bigger than the plain integer value (8 bytes on my 64-bit system). I wrote a small script to estimate how much memory each array entry uses and it turns out, that it's pretty exactly 128 bytes. 128!!! I'd need >73M just to store the array. Unfortunately the webserver is not under my control so I cannot increase the memory_limit.
My question is, is there any possibility in PHP to create an array-like structure that uses less memory. I don't need this structure to be associative (plain index-access is sufficient). It also does not need to have dynamic resizing - I know exactly how big the array will be. Also, all elements would be of the same type. Just like a good old C-array.
Edit:
So deceze's solution works out-of-the-box with 32-bit integers. But even if you're on a 64-bit system, pack() does not seem to support 64-bit integers. In order to use 64-bit integers in my array I applied some bit-manipulation. Perhaps the below snippets will be of help for someone:
function push_back(&$storage, $value)
{
// split the 64-bit value into two 32-bit chunks, then pass these to pack().
$storage .= pack('ll', ($value>>32), $value);
}
function get(&$storage, $idx)
{
// read two 32-bit chunks from $storage and glue them back together.
return (current(unpack('l', substr($storage, $idx * 8, 4)))<<32 |
current(unpack('l', substr($storage, $idx * 8+4, 4))));
}

The most memory efficient you'll get is probably by storing everything in a string, packed in binary, and use manual indexing to it.
$storage = '';
$storage .= pack('l', 42);
// ...
// get 10th entry
$int = current(unpack('l', substr($storage, 9 * 4, 4)));
This can be feasible if the "array" initialisation can be done in one fell swoop and you're just reading from the structure. If you need a lot of appending to the string, this becomes extremely inefficient. Even this can be done using a resource handle though:
$storage = fopen('php://memory', 'r+');
fwrite($storage, pack('l', 42));
...
This is very efficient. You can then read this buffer back into a variable and use it as string, or you can continue to work with the resource and fseek.

A PHP Judy Array will use significantly less memory than a standard PHP array, and an SplFixedArray.
I quote "An array with 1 million entries using regular PHP array data structure takes 200MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation."

You could use an object if possible. These often use less memory than array's.
Also SplFixedArray is an good option.
But it really depends on the implementation that you need to do. If you need an function to return an array and are using PHP 5.5. You could use the generator yield to stream the array back.

You can try to use a SplFixedArray, it's faster and take less memory (the doc comment say ~30% less). Test here and here.

Use a string - that's what I'd do. Store it in a string on fixed offsets (16 or 20 digits should do it I guess?) and use substr to get the one needed. Blazing fast write / read, super easy, and 600.000 integers will only take ~12M to store.
base_convert() - if you need something more compact but with minimum effort, convert your integers to base-36 instead of base-10; in this case, a 14-digit number would be stored in 9 alphanumeric characters. You'll need to make 2 pieces of 64-bit ints, but I'm sure that's not a problem. (I'd split them to 9-digit chunks where conversion gives you a 6-char version.)
pack()/unpack() - binary packing is the same thing with a bit more efficiency. Use it if nothing else works; split your numbers to make them fit to two 32-bit pieces.

600K is a lot of elements. If you are open to alternative methods, I personally would use a database for that. Then use standard sql/nosql select syntax to pull things out. Perhaps memcache or redis if you have an easy host for that, such as garantiadata.com. Maybe APC.

Depending on how you are generate the integers, you could potentially use PHP's generators, assuming you are traversing the array and doing something with individual values.

I took the answer by #deceze and wrapped it in a class that can handle 32-bit integers. It is append-only, but you can still use it as a simple, memory-optimized PHP Array, Queue, or Heap. AppendItem and ItemAt are both O(1), and it has no memory overhead. I added currentPosition/currentSize to avoid unnecessary fseek function calls. If you need to cap memory usage and switch to a temporary file automatically, use php://temp instead.
class MemoryOptimizedArray
{
private $_storage;
private $_currentPosition;
private $_currentSize;
const BYTES_PER_ENTRY = 4;
function __construct()
{
$this->_storage = fopen('php://memory', 'rw+');
$this->_currentPosition = 0;
$this->_currentSize = 0;
}
function __destruct()
{
fclose($this->_storage);
}
function AppendItem($value)
{
if($this->_currentPosition != $this->_currentSize)
{
fseek($this->_storage, SEEK_END);
}
fwrite($this->_storage, pack('l', $value));
$this->_currentSize += self::BYTES_PER_ENTRY;
$this->_currentPosition = $this->_currentSize;
}
function ItemAt($index)
{
$itemPosition = $index * self::BYTES_PER_ENTRY;
if($this->_currentPosition != $itemPosition)
{
fseek($this->_storage, $itemPosition);
}
$binaryData = fread($this->_storage, self::BYTES_PER_ENTRY);
$this->_currentPosition = $itemPosition + self::BYTES_PER_ENTRY;
$unpackedElements = unpack('l', $binaryData);
return $unpackedElements[1];
}
}
$arr = new MemoryOptimizedArray();
for($i = 0; $i < 3; $i++)
{
$v = rand(-2000000000,2000000000);
$arr->AddToEnd($v);
print("added $v\n");
}
for($i = 0; $i < 3; $i++)
{
print($arr->ItemAt($i)."\n");
}
for($i = 2; $i >=0; $i--)
{
print($arr->ItemAt($i)."\n");
}

Related

Amount of data stored in a PHP array

I'm looking for a way to measure the amount of data stored in a PHP array. I'm not talking about the number of elements in the array (which you can figure out with count($array, COUNT_RECURSIVE)), but the cumulative amount of data from all the keys and their corresponding values. For instance:
array('abc'=>123); // size = 6
array('a'=>1,'b'=>2); // size = 4
As what I'm interested in is order of magnitude rather than the exact amount (I want to compare the processing memory and time usage versus the size of the arrays) I thought about using the following trick:
strlen(print_r($array,true));
However the amount of overhead coming from print_r varies depending on the structure of the array which doesn't give me consistent results:
echo strlen(print_r(array('abc'=>123),true)); // 27
echo strlen(print_r(array('a'=>1,'b'=>2),true)); // 35
Is there a way (ideally in a one-liner and without impacting too much performance as I need to execute this at run-time on production) to measure the amount of data stored in an array in PHP?
Does this do the trick:
<?php
$arr = array('abc'=>123);
echo strlen(implode('',array_keys($arr)).implode('',$arr));
?>
Short answer: mission impossible
You could try something like:
strlen(serialize($myArray)) // either this
strlen(json_encode($myArray)) // or this
But to approximate the true memory footprint of an array, you will have to do a lot more than that. If you're looking for a ballpark estimate, arrays take 3-8x more than their serialized version, depending on what you store in them and how many elements you have. It increases gradually, in bigger and bigger chunks as your array grows. To give you an idea of what's happening, here's an array estimation function I came up with, after many hours of trying, for one-level arrays only:
function estimateArrayFootprint($a) { // copied from one of my failed quests :(
$size = 0;
foreach($a as $k=>$v) {
foreach([$k,$v] as $x) {
$n = strlen($x);
do{
if($n>8192 ) {$n = (1+($n>>12)<<12);break;}
if($n>1024 ) {$n = (1+($n>> 9)<< 9);break;}
if($n>512 ) {$n = (1+($n>> 8)<< 8);break;}
if($n>0 ) {$n = (1+($n>> 5)<< 5);break;}
}while(0);
$size += $n + 96;
}
}
return $size;
}
So that's how easy it is, not. And again, it's not a reliable estimation, it probably depends on the PHP memory limit, the architecture, the PHP version and a lot more. The question is how accurately do you need this value.
Also let's not forget that these values came from a memory_get_usage(1) which is not very exact either. PHP allocates memory in big blocks in order to avoid a noticeable overhead as your string/array/whatever else grows, like in a for(...) $x.="yada" situation.
I wish I could say anything more useful.

Memory efficient way to store huge number of booleans in PHP

I'm looking for an efficient way to store a huge number of booleans (up to 2.5*10e11) in PHP's memory. My first idea was to create an array of integers and store one boolean per bit in each integer:
// number of booleans to store
$n = 2.5 * pow(10, 11);
// bits per integer
$bitsPerInt = PHP_INT_SIZE * 8;
// init storage
$storage = array();
for ($i=0; $i<ceil($n/$bitsPerInt); $i++) {
$storage[$i] = 0;
}
// bits in each integer can be accessed using PHP's bitwise operators
However, the overhead with this solution is still way too big: Storing 10^8 booleans (bits) in a 32-bit environment (PHP_INT_SIZE = 4 bytes) needs an array of 3125000 integers, consuming ~ 254 MB of memory, whereas the rare data of 10^8 booleans would only need ~ 12 MB.
So which is the best way to store a huge number of booleans in PHP (5)?
If you really must use an array of that many booleans you can use a string as a ByteArray and pack 8 booleans for each char in the string. This has very little memory overhead compared to the native PHP array but it is more difficult to use.
You can convert bytes to chars and back with the ord and chr functions.
Perhaps the SplStack or the SplFixedArray classes from the SPL fit your needs better.
If using one bit per value uses too much memory then you will need to rethink your design - everything in memory is just bits at the end of the day, and you can't squeeze more than one boolean value into a single bit (by definition).

What is the maximum size of an array in PHP?

We like to store database values in array. But we do not know the maximum size of an array which is allowed in PHP?
There is no max on the limit of an array. There is a limit on the amount of memory your script can use. This can be changed in the 'memory_limit' in your php.ini configuration.
Array size is limited only by amount of memory your server has. If your array gets too big, you will get "out of memory" error.
It seems to me to be the 16-bit signed integer limit. (2^15)
$ar = [];
while (array_push($ar, null)) {
print 'a';
}
Length of output: 32768
If, like me, you need to use a huge array in a class in PHP 5.6.40 and have found that there is a limit to the size of class arrays so that they get overflowed and overwritten when surpassing 32768 elements, then here is the solution I found to work.
Create a public function with the huge array in it as a local variable. Then assign that local variable to the class variable. Call this function right in the constructor. You will see that it prints the correct size of the array instead of the overflow leftover size.
class Zipcode_search {
public $zipcodes;
public function __construct() {
$this->setHugeArray();
print "size is ".sizeof($this->zipcodes). "<br />";
}
public function setHugeArray(){
$zipcodes=[too much stuff];//actual array with +40,000 elements etc.
$this->zipcodes = $zipcodes;
}
}
2,147,483,647 items, even on 64-bit PHP. (PHP 7.2.24)
In PHP, typedef struct _hashtable is defined with uint values for nTableSize and nNumOfElements.
Because of this, the largest array you can create with array_fill() or range() appears to be 2^32-1 items. While keys can be anything, including numbers outside that range, if you start at zero, with a step size of 1, your highest index can be 2147483646.
If you are asking this question, you have likely seen an error like:
# php -r 'array_fill(0, 2147483648, 0);'
PHP Warning: array_fill(): Too many elements in Command line code on line 1
or even:
# php -r 'array_fill(0, 2147483647, 0);'
Segmentation fault (core dumped)
...or, most likely, the error which explicitly refers to the "maximum array size":
php -r 'range(0,2147483647);'
PHP Warning: range(): The supplied range exceeds the maximum array size:
start=0 end=2147483647 in Command line code on line 1
A caution for those reading this question:
The most common place you'll run into this, is through misuse/abuse of the range() operator, as if it was an iterator. It is in other languages, but in PHP it is not: it is an array-filling operator, just like array_fill().
So odds are good that you can avoid the array use entirely. It is unsafe to do things like:
foreach (range($min, $max, $step)) { ... stuff ... }
Instead do:
for ($i = $min; $i <= $max; $i += $step) { ... stuff ... }
Equally, I've seen people doing:
// Takes 3 minutes with $min=0, $max=1e9, $step=1.
// Segfaults with slightly larger ranges.
$numItems = count(range($min, $max, $step));
Which can instead be rewritten in a more secure, idiomatic and performant way:
// Takes microseconds with $min=0, $max=1e9, $step=1.
// Can handle vastly larger numbers, too.
$numItems = ($max - $min) % $step;
If you are running into errors about array size, odds are good that you are doing crazy stuff that you probably should avoid.

Memory optimization in PHP array

I'm working with a large array which is a height map, 1024x1024 and of course, i'm stuck with the memory limit. In my test machine i can increase the mem limit to 1gb if i want, but in my tiny VPS with only 256 ram, it's not an option.
I've been searching in stack and google and found several "well, you are using PHP not because memory efficiency, ditch it and rewrite in c++" and honestly, that's ok and I recognize PHP loves memory.
But, when digging more inside PHP memory management, I did not find what memory consumes every data type. Or if casting to another type of data reduces mem consumption.
The only "optimization" technique i found was to unset variables and arrays, that's it.
Converting the code to c++ using some PHP parsers would solve the problem?
Thanks!
If you want a real indexed array, use SplFixedArray. It uses less memory. Also, PHP 5.3 has a much better garbage collector.
Other than that, well, PHP will use more memory than a more carefully written C/C++ equivalent.
Memory Usage for 1024x1024 integer array:
Standard array: 218,756,848
SplFixedArray: 92,914,208
as measured by memory_get_peak_usage()
$array = new SplFixedArray(1024 * 1024); // array();
for ($i = 0; $i < 1024 * 1024; ++$i)
$array[$i] = 0;
echo memory_get_peak_usage();
Note that the same array in C using 64-bit integers would be 8M.
As others have suggested, you could pack the data into a string. This is slower but much more memory efficient. If using 8 bit values it's super easy:
$x = str_repeat(chr(0), 1024*1024);
$x[$i] = chr($v & 0xff); // store value $v into $x[$i]
$v = ord($x[$i]); // get value $v from $x[$i]
Here the memory will only be about 1.5MB (that is, when considering the entire overhead of PHP with just this integer string array).
For the fun of it, I created a simple benchmark of creating 1024x1024 8-bit integers and then looping through them once. The packed versions all used ArrayAccess so that the user code looked the same.
mem write read
array 218M 0.589s 0.176s
packed array 32.7M 1.85s 1.13s
packed spl array 13.8M 1.91s 1.18s
packed string 1.72M 1.11s 1.08s
The packed arrays used native 64-bit integers (only packing 7 bytes to avoid dealing with signed data) and the packed string used ord and chr. Obviously implementation details and computer specs will affect things a bit, but I would expect you to get similar results.
So while the array was 6x faster it also used 125x the memory as the next best alternative: packed strings. Obviously the speed is irrelevant if you are running out of memory. (When I used packed strings directly without an ArrayAccess class they were only 3x slower than native arrays.)
In short, to summarize, I would use something other than pure PHP to process this data if speed is of any concern.
In addition to the accepted answer and suggestions in the comments, I'd like to suggest PHP Judy array implementation.
Quick tests showed interesting results. An array with 1 million entries using regular PHP array data structure takes ~200 MB. SplFixedArray uses around 90 megabytes. Judy uses 8 megs. Tradeoff is in performance, Judy takes about double the time of regular php array implementation.
A little bit late to the party, but if you have a multidimensional array you can save a lot of RAM when you store the complete array as json.
$array = [];
$data = [];
$data["a"] = "hello";
$data["b"] = "world";
To store this array just use:
$array[] = json_encode($data);
instead of
$array[] = $data;
If you want to get the arrry back, just use something like:
$myData = json_decode($array[0], true);
I had a big array with 275.000 sets and saved about 36% RAM consumption.
EDIT:
I found a more better way, when you zip the json string:
$array[] = gzencode(json_encode($data));
and unzip it when you need it:
$myData = json_decode(gzdecode($array[0], true));
This saved me nearly 75% of RAM peak usage.

serialize a large array in PHP?

I am curious, is there a size limit on serialize in PHP. Would it be possible to serialize an array with 5,000 keys and values so it can be stored into a cache?
I am hoping to cache a users friend list on a social network site, the cache will need to be updated fairly often but it will need to be read almost every page load.
On a single server setup I am assuming APC would be better then memcache for this.
As quite a couple other people answered already, just for fun, here's a very quick benchmark (do I dare calling it that ? ) ; consider the following code :
$num = 1;
$list = array_fill(0, 5000, str_repeat('1234567890', $num));
$before = microtime(true);
for ($i=0 ; $i<10000 ; $i++) {
$str = serialize($list);
}
$after = microtime(true);
var_dump($after-$before);
var_dump(memory_get_peak_usage());
I'm running this on PHP 5.2.6 (the one bundled with Ubuntu jaunty).
And, yes, there are only values ; no keys ; and the values are quite simple : no object, no sub-array, no nothing but string.
For $num = 1, you get :
float(11.8147978783)
int(1702688)
For $num = 10, you get :
float(13.1230671406)
int(2612104)
And, for $num = 100, you get :
float(63.2925770283)
int(11621760)
So, it seems the bigger each element of the array is, the longer it takes (seems fair, actually). But, for elements 100 times bigger, you don't take 100 times much longer...
Now, with an array of 50000 elements, instead of 5000, which means this part of the code is changed :
$list = array_fill(0, 50000, str_repeat('1234567890', $num));
With $num = 1, you get :
float(158.236332178)
int(15750752)
Considering the time it took for 1, I won't be running this for either $num = 10 nor $num = 100...
Yes, of course, in a real situation, you wouldn't be doing this 10000 times ; so let's try with only 10 iterations of the for loop.
For $num = 1 :
float(0.206310987473)
int(15750752)
For $num = 10 :
float(0.272629022598)
int(24849832)
And for $num = 100 :
float(0.895547151566)
int(114949792)
Yeah, that's almost 1 second -- and quite a bit of memory used ^^
(No, this is not a production server : I have a pretty high memory_limit on this development machine ^^ )
So, in the end, to be a bit shorter than those number -- and, yes, you can have numbers say whatever you want them to -- I wouldn't say there is a "limit" as in "hardcoded" in PHP, but you'll end up facing one of those :
max_execution_time (generally, on a webserver, it's never more than 30 seconds)
memory_limit (on a webserver, it's generally not muco more than 32MB)
the load you webserver will have : while 1 of those big serialize-loop was running, it took 1 of my CPU ; if you are having quite a couple of users on the same page at the same time, I let you imagine what it will give ;-)
the patience of your user ^^
But, except if you are really serializing long arrays of big data, I am not sure it will matter that much...
And you must take into consideration the amount of time/CPU-load using that cache might help you gain ;-)
Still, the best way to know would be to test by yourself, with real data ;-)
And you might also want to take a look at what Xdebug can do when it comes to profiling : this kind of situation is one of those it is useful for!
The serialize() function is only limited by available memory.
There's no limit enforced by PHP. Serialize returns a bytestream representation (string) of the serialized structure, so you would just get a large string.
The only practical limit is your available memory, since serialization involves creating a string in memory.
There is no limit, but remember that serialization and unserialization has a cost.
Unserialization is exteremely costly.
A less costly way of caching that data would be via var_export() as such (since PHP 5.1.0, it works on objects):
$largeArray = array(1,2,3,'hello'=>'world',4);
file_put_contents('cache.php', "<?php\nreturn ".
var_export($largeArray, true).
';');
You can then simply retrieve the array by doing the following:
$largeArray = include('cache.php');
Resources are usually not cache-able.
Unfortunately, if you have circular references in your array, you'll need to use serialize().
As suggested by Thinker above:
You could use
$string = json_encode($your_array_here);
and to decode it
$array = json_decode($your_array_here, true);
This returns an array. It works well even if the encoded array was multilevel.
Ok... more numbers! (PHP 5.3.0 OSX, no opcode cache)
#Pascal's code on my machine for n=1 at 10k iters produces:
float(18.884856939316)
int(1075900)
I add unserialize() to the above as so.
$num = 1;
$list = array_fill(0, 5000, str_repeat('1234567890', $num));
$before = microtime(true);
for ($i=0 ; $i<10000 ; $i++) {
$str = serialize($list);
$list = unserialize($str);
}
$after = microtime(true);
var_dump($after-$before);
var_dump(memory_get_peak_usage());
produces
float(50.204112052917)
int(1606768)
I assume the extra 600k or so are the serialized string.
I was curious about var_export and its include/eval partner $str = var_export($list, true); instead of serialize() in the original produces
float(57.064643859863)
int(1066440)
so just a little less memory (at least for this simple example) but way more time already.
adding in eval('$list = '.$str.';'); instead of unserialize in the above produces
float(126.62566018105)
int(2944144)
Indicating theres probably a memory leak somewhere when doing eval :-/.
So again, these aren't great benchmarks (I really should isolate the eval/unserialize by putting the string in a local var or something, but I'm being lazy) but they show the associated trends. var_export seems slow.
Nope, there is no limit and this:
set_time_limit(0);
ini_set('memory_limit ', -1);
unserialize('s:2000000000:"a";');
is why you should have safe.mode = On or a extension like Suhosin installed, otherwise it will eat up all the memory in your system.
I think better than serialize is json_encode function. It got a drawback, that associative arrays and objects are not distinguished, but string result is smaller and easier to read by human, so also to debug and edit.
If you want to cache it (so I assume performance is the issue), use apc_add instead to avoid the performance hit of converting it to a string + gain cache in memory.
As stated above the only size limit is available memory.
A few other gotchas:
serialize'd data is not portable between multi-byte and single-byte character encodings.
PHP5 classes include NUL bytes that can cause havoc with code that doesn't expect them.
Your use case sounds like you're better off using a database to do that rather than relying solely on PHP's available resources. The advantages to using something like MySQL instead is that it's specifically engineered with memory management in mind for such things as storage and lookup.
It's really no fun constantly serializing and unserializing data just to update or change a few pieces of information.
I've just come across an instance where I thought I was hitting an upper limit of serialisation.
I'm persisting serialised objects to a database using a mysql TEXT field.
The limit of the available characters for a single-byte characters is 65,535 so whilst I can serialize much larger objects than that with PHP It's impossible to unserialize them as they are truncated by the limit of the TEXT field.
i have a case in which unserialize throws an exception on a large serialized object, size: 65535 (the magic number: 16bits full bit = 65536)

Categories