PHP Internal Memory Bloat - php

I need to process some large files, say 50 MB each. I have found that PHP functions use up large portions of memory. In the example below, the memory used by PHP's functions ends up being four (4) times the file size. I can understand transient usage of twice the file size, but not four times. In the end PHP blows out the memory_limit. While I can increase memory_limit, it is not a good long-term solution, as I may have to process larger files, and in a production environment having PHP gobble up 400 MB per process is not desirable.
Code:
$buf = '';
report_memory(__LINE__);
$buf = file_get_contents('./20MB.pdf');
report_memory(__LINE__);
base64_encode($buf);
report_memory(__LINE__);
urlencode($buf);
report_memory(__LINE__);
function report_memory($line = 0) {
    echo 'Line: ' . str_pad($line, 3) . ' ';
    echo 'Mem: '  . str_pad(intval(memory_get_usage() / 1024) . 'K', 8) . ' ';
    echo 'Peak: ' . str_pad(intval(memory_get_peak_usage() / 1024) . 'K', 8) . ' ';
    echo "\n";
}
Output:
Line: 4 Mem: 622K Peak: 627K
Line: 7 Mem: 21056K Peak: 21074K
Line: 10 Mem: 21056K Peak: 48302K
Line: 13 Mem: 21056K Peak: 82358K
One can see that for a 20MB file the current memory usage hovers at 21MB, while the peak memory usage jumps up to an insane 82MB.
The PHP functions used in the example are arbitrary; I can easily swap in str_replace, is_string, gettype, etc. with the same results.
The question is: how can I keep PHP from doing this?
The environment is CentOS 6.6 running a stock PHP 5.3.3.
Thanks for any insight.

You're URL-encoding. Given that your PDF is basically "random" binary garbage, MANY of the bytes in there are non-printable. That means you're going from a one-byte "binary" character to a 3+ byte URL-encoded string. Given you've got a 20 meg PDF, it's no surprise that tripling the amount of text in there is going to bloat your memory. Remember that PHP has to keep TWO copies of your PDF while it's working: the original "raw" version, and the working copy of whatever transform you're doing on it.
Assuming a worst case where every single character gets encoded, your 20 meg PDF will convert to a 60 meg URL-encoded string, causing a 20 + 60 = 80 meg peak, even though that 60 meg encoded version is immediately thrown away.
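For what it's worth, one way to keep the peak down is to stream the encoding instead of building one giant string. This is only a minimal sketch, not the asker's code; it assumes the encoded data can go straight to a file (or socket) and uses PHP's built-in convert.base64-encode stream filter, with placeholder file names:
<?php
// Stream-encode a large file in chunks so no full copy ever sits in memory.
$in  = fopen('./20MB.pdf', 'rb');   // placeholder input path
$out = fopen('./20MB.b64', 'wb');   // placeholder output path

// Encode as the data is written; peak memory stays near the chunk size.
stream_filter_append($out, 'convert.base64-encode', STREAM_FILTER_WRITE);

while (!feof($in)) {
    fwrite($out, fread($in, 8192));
}
fclose($in);
fclose($out);
urlencode() has no built-in stream filter, but since it encodes byte by byte you could apply it to each chunk inside the same loop and write the result out as you go.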

Related

PHP won't read full file into array, only partial

I have a file with 3,200,000 lines of csv data (with 450 columns). Total file size is 6 GB.
I read the file like this:
$data = file('csv.out');
Without fail, it only reads 897,000 lines. I confirmed this with print_r and echo sizeof($data). I increased my memory_limit to a ridiculous value like 80 GB, but it didn't make a difference.
Now, it DID read in my other large file, which has the same number of lines (3,200,000) but only a few columns, so the total file size is 1.1 GB. So it appears to be a total file size issue. FYI, 897,000 lines in the $data array is around 1.68 GB.
Update: I increased the second (longer) file to 2.1 GB (over 5 million lines) and it reads it in fine, yet it still truncates the other file at 1.68 GB. So it does not appear to be a size issue. If I continue to increase the size of the second file to 2.2 GB, instead of truncating it and continuing the program (like it does for the first file), it dies and core dumps.
Update: I verified my system is 64-bit by printing integer and float numbers:
<?php
$large_number = 2147483647;
var_dump($large_number); // int(2147483647)
$large_number = 2147483648;
var_dump($large_number); // float(2147483648)
$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number); // float(50000000000)
$large_number = 9223372036854775807;
var_dump($large_number); // int(9223372036854775807)
$large_number = 9223372036854775808;
var_dump($large_number); // float(9.2233720368548E+18)
$million = 1000000;
$large_number = 50000000000000 * $million;
var_dump($large_number); // float(5.0E+19)
print "PHP_INT_MAX: " . PHP_INT_MAX . "\n";
print "PHP_INT_SIZE: " . PHP_INT_SIZE . " bytes (" . (PHP_INT_SIZE * 8) . " bits)\n";
?>
The output from this script is:
int(2147483647)
int(2147483648)
int(50000000000)
int(9223372036854775807)
float(9.2233720368548E+18)
float(5.0E+19)
PHP_INT_MAX: 9223372036854775807
PHP_INT_SIZE: 8 bytes (64 bits)
So since it's 64-bit, and memory_limit is set really high, why is PHP not reading files > 2.15 GB?
Some things that come to mind:
If you're using 32-bit PHP, you cannot read files that are larger than 2 GB.
If reading the file takes too long, there could be time-outs.
If the file is really huge, then reading it all into memory is going to be problematic. It's usually better to read blocks of data and process those, unless you need random access to all parts of the file (see the sketch below).
Another approach (I've used it in the past) is to chop the large file into smaller, more manageable ones (this should work if it's a straightforward log file, for example).
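As an illustration of that block-wise reading, here is a minimal sketch (my own, assuming the file really is plain CSV named csv.out; the asker's own fix below does much the same with fgets):
<?php
// Read and process one CSV row at a time so memory stays flat
// regardless of the total file size.
$handle = fopen('csv.out', 'r');
if ($handle === false) {
    die("Cannot open csv.out\n");
}
while (($row = fgetcsv($handle)) !== false) {
    // $row is an array of up to 450 column values; aggregate, filter,
    // or write it somewhere here instead of accumulating it in RAM.
}
fclose($handle);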
I fixed it. All I had to do was change the way I read the files. Why...I do not know.
Old code that only reads 2.15 GB out of 6.0 GB:
$data = file('csv.out');
New code that reads the full 6.0 GB:
$data = array();
$i = 1;
$handle = fopen('csv.out', 'r');
if ($handle) {
    while (($data[$i] = fgets($handle)) !== false) {
        // process the line read
        $i++;
    }
    fclose($handle);
}
Feel free to shed some light on why. There must be some limitation when using
$var = file();
Interestingly, 2.15 GB is close to the 32-bit limit I read about.

PHP XML Memory Leak?

We have a severe memory leak in one of our regularly run scripts that quickly wipes out the free memory on the server. Despite many hours of research and experiments, though, I've been unable to even make a dent in it.
Here is the code:
echo '1:'.memory_get_usage()."\n";
ini_set('memory_limit', '1G');
echo '2:'.memory_get_usage()."\n";
$oXML = new DOMDocument();
echo '3:'.memory_get_usage()."\n";
$oXML->load('feed.xml'); # 556 MB file
echo '4:'.memory_get_usage()."\n";
$xpath = new DOMXPath($oXML);
echo '5:'.memory_get_usage()."\n";
$oNodes = $xpath->query('//feed/item'); # 270,401 items
echo '6:'.memory_get_usage()."\n";
unset($xpath);
echo '7:'.memory_get_usage()."\n";
unset($oNodes);
echo '8:'.memory_get_usage()."\n";
unset($oXML);
echo '9:'.memory_get_usage()."\n";
And here is the output:
1:679016
2:679320
3:680128
4:680568
5:681304
6:150852408
7:150851840
8:34169968
9:34169448
As you can see, when we use xpath to load the nodes into an object, memory usage jumps from 681,304 to 150,852,408. I'm not terribly concerned about that.
My problem is that even after destroying the $oNodes object, we're still stuck at memory usage of 34,169,968.
But the real problem is that the memory usage that PHP shows is a tiny fraction of the total memory eaten by the script. Using free -m directly from the command line on the server, we go from 3,295 MB memory used to 5,226 MB -- and it never goes back down. We're losing 2 GB of memory every time this script runs, and I am at a complete loss as to why or how to fix it.
I tried using SimpleXML instead, but the results were basically identical. I also studied these three threads but didn't find anything in them that helped:
XML xpath search and array looping with php, memory issue
DOMDocument / Xpath leaking memory during long command line process - any way to deconstruct this class
DOMDocument PHP Memory Leak
I'm hoping this is something easy that I'm just overlooking.
UPDATE 11/10: It does appear that memory is eventually freed up. I noticed that after a little more than 30 minutes, suddenly a big block came free again. Obviously, though, that hasn't been nearly fast enough recently to keep the server from running out of memory and locking up.
And for what it's worth, we're running PHP 5.3.15 with Apache 2.2.3 on Red Hat 5.11. We're working to update to the latest versions of all of those, so somewhere along that upgrade path, we might find this fixed. It would be great to do it before then, though.
I recently experienced an issue just like yours. We needed to extract data from a 3 GB XML file and also noticed that server memory was reaching its limits. There are several ways you can decrease the memory usage:
Instead of using XPath, which causes the large memory usage, use (for example) file_get_contents and then search via regular expressions to find the desired data.
Split the XML into smaller pieces. Basically it's reinventing the XML file, but you can control the maximum size of the files (and thus the memory).
You mentioned that after 30 minutes some memory was released. Reading a 500 MB XML file over 30 minutes is way too slow. The solution we used was splitting the 3 GB XML file into several pieces (approx. 200). Our script writes the required data (around 700k records) to our database in less than 5 minutes.
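A rough sketch of that line-by-line idea (my own illustration, not the answerer's code; feed.xml and the <item>/<title> tags are assumptions about the feed's structure):
<?php
// Scan the big feed line by line, buffer one <item>...</item> at a time,
// and pull fields out with a regex so the whole DOM never sits in memory.
$handle = fopen('feed.xml', 'r');
$buffer = '';
$inItem = false;
while (($line = fgets($handle)) !== false) {
    if (strpos($line, '<item') !== false) {
        $inItem = true;
        $buffer = '';
    }
    if ($inItem) {
        $buffer .= $line;
    }
    if ($inItem && strpos($line, '</item>') !== false) {
        $inItem = false;
        if (preg_match('~<title>(.*?)</title>~s', $buffer, $m)) {
            // process one record here, e.g. write $m[1] to the database
        }
    }
}
fclose($handle);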
We just experienced a similar issue with PHPDocxPro (which uses DOMDocument) and submitted a patch to them that at least improves upon the problem. The memory usage reported by memory_get_usage() never increased, as though PHP was not aware of the allocation at all. The memory reported while watching execution via top or ps is what we were more concerned about.
// ps reports X memory usage
$foo = new DOMDocument();
$foo->loadXML(getSomeXML());
// ps reports X + Y memory usage
$foo = new DOMDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~2Y memory usage
$foo = new DOMDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~3Y memory usage
Adding an unset() before each subsequent call...
// ps reports X memory usage
$foo = new DOMDocument();
$foo->loadXML(getSomeXML());
// ps reports X + Y memory usage
unset($foo);
$foo = new DOMDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~Y memory usage
unset($foo);
$foo = new DOMDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~Y memory usage
I haven't dug into the extension code to understand what's going on, but my guess is that it allocates memory outside PHP's allocator, and as such it's not counted as part of the heap that memory_get_usage() considers. Despite this, there does appear to be some reference counting that determines whether or not memory can be freed. The unset($foo) before each subsequent call makes sure the extension can reuse some resources. Without that, memory usage increases every time the code is run.

PHP Allowed memory size exhausted

I am trying to display users on a map using the Google Maps API. Now that the user count has increased to 12,000, I get a memory exhaustion error. For the time being I increased the memory limit from 128M to 256M, but I am sure that when there are 25,000 users the same error will come back.
The errors are:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 30720 bytes) in /var/www/html/digitalebox_v2_20150904_100/protected/extensions/bootstrap/widgets/TbBaseMenu.php on line 193
Fatal error: Class declarations may not be nested in /var/www/html/yii-1.1.14.f0fee9/framework/collections/CListIterator.php on line 20
My code:
$criteria = new CDbCriteria();
$criteria->condition = 't.date BETWEEN :from_date AND :to_date';
$modelUsers = User::model()->findAll($criteria);
foreach ($modelUsers as $modelUser) {
    $fullAddress = $modelUser->address1 . ', ' . $modelUser->zip . ' ' . $modelUser->city . ', '
        . Country::model()->findByPk($modelUser->countryCode)->countryName;
}
When $modelUsers has 12,000 records, this memory problem appears, since it holds 12,000 user objects.
What should I do to prevent this kind of issue in the future? What is the minimum required memory size for a PHP application to run?
When you call findAll, it loads all records at once, so you end up with a memory error. Especially for these situations, Yii has CDataProviderIterator. It allows iteration over large data sets without holding the entire set in memory.
$dataProvider = new CActiveDataProvider("User");
$iterator = new CDataProviderIterator($dataProvider);
foreach ($iterator as $user) {
    $fullAddress = $user->address1 . ', ' . $user->zip . ' ' . $user->city . ', '
        . Country::model()->findByPk($user->countryCode)->countryName;
}
I'd solve this problem in a completely different way than suggested. I'd ditch the whole model concept and query MySQL directly for the addresses. You can query MySQL so that it returns the already-concatenated address, which means you avoid concatenating it in PHP; that avoids wasting memory.
The next step would be using PDO and issuing an unbuffered query. This means the PHP process will not store all 12,000 records in its memory; that's how you avoid exhausting the memory.
The final step is outputting the result: as you loop through the unbuffered query, you output a single row (or 10 rows) at a time.
What happens this way is that you trade CPU for RAM. Since you don't know how much RAM you will need, the best approach is to use as little as possible.
Using unbuffered queries and flushing PHP's output buffer seems like the only viable way to go for your use case, seeing as you can't avoid outputting a lot of data.
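A rough sketch of that approach (my own illustration; the DSN, credentials, table and column names, and the date range are all assumptions to be adapted to the real schema):
<?php
$fromDate = '2015-01-01';   // placeholder range
$toDate   = '2015-12-31';

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// Tell the MySQL driver not to buffer the whole result set in PHP's memory.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->prepare(
    "SELECT CONCAT_WS(', ', u.address1, CONCAT(u.zip, ' ', u.city), c.country_name) AS full_address
     FROM user u JOIN country c ON c.code = u.country_code
     WHERE u.date BETWEEN :from_date AND :to_date"
);
$stmt->execute(array(':from_date' => $fromDate, ':to_date' => $toDate));

while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $row['full_address'], "\n";   // emit one row at a time
    flush();                           // keep PHP's output buffer small too
}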
Perhaps increase the memory a bit more and see if that corrects the issue. Try setting it to 512M or 1024M. I will tell you from my own experience: if you are trying to load Google Maps markers, that number of markers will probably crash the map.
Increase the value of memory_limit in php.ini, then restart Apache or whatever PHP is running under. Keep in mind that what you add here needs to be taken away from other programs running on the same machine (such as MySQL and its innodb_buffer_pool_size).

PhpExcel stops working after setting 20 cell types

I have a script that generates a small XLS table (~25x15). It contains percentages and strings, and I have an if statement that sets the current cell to percentage format with this code:
$this->objPHPExcel->getActiveSheet()->getStyle($coords)->getNumberFormat()->setFormatCode('0.00%');
But when I export and look at the file, I see it only managed to set the type and style of about 20 cells; all the rest have default settings. I debugged it and realized the problem isn't in my logic. I read about increasing PHP's cache memory and tried it, but it didn't work. Please help, because I need to export a table at least 15 times larger. Thanks in advance!
PHPExcel allocates quite some memory
While PHPExcel is a beautiful library, using it may require huge amounts of memory allocated to PHP.
According to this thread, just 5 cells may render 6 MByte of memory usage:
<?php
require_once 'php/PHPExcelSVN/PHPExcel/IOFactory.php';
$objPHPExcel = PHPExcel_IOFactory::load("php/tinytest.xlsx");
$objPHPExcel->setActiveSheetIndex(0);
$objPHPExcel->getActiveSheet()->setCellValue('D2', 50);
echo $objPHPExcel->getActiveSheet()->getCell('D8')->getCalculatedValue() . "\n";
echo date('H:i:s') . " Peak memory usage: " . (memory_get_peak_usage(true) / 1024 / 1024) . " MB\r\n";
?>
I get 6MB of memory usage.
Another user even failed with a 256MByte memory setting.
While PHPExcel provides ways to reduce its memory footprint, all reductions turned out to be too small in my case. This page on GitHub provides details of PHPExcel's cache management options. For example, this setting serializes and then GZIPs the cell structure of a worksheet:
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
PHPExcel_Settings::setCacheStorageMethod($cacheMethod);
PHPExcel's FAQ explains this:
Fatal error: Allowed memory size of xxx bytes exhausted (tried to allocate yyy bytes) in zzz on line aaa
PHPExcel holds an "in memory" representation of a spreadsheet, so it is susceptible to PHP's memory limitations. The memory made available to PHP can be increased by editing the value of the memory_limit directive in your php.ini file, or by using ini_set('memory_limit', '128M') within your code (ISP permitting).
Some Readers and Writers are faster than others, and they also use differing amounts of memory. You can find some indication of the relative performance and memory usage for the different Readers and Writers, over the different versions of PHPExcel, here: http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=234150
If you've already increased memory to a maximum, or can't change your memory limit, then this discussion on the board describes some of the methods that can be applied to reduce the memory usage of your scripts using PHPExcel: http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=242712
Measurement results for PHPExcel
I instrumented the PHPExcel example file 01simple.php and did some quick testing.
Consumes 92 KByte:
for ($n = 0; $n < 200; $n++) {
    $objPHPExcel->setActiveSheetIndex(0)->setCellValue('A' . $n, 'Miscellaneous glyphs');
}
Consumes 4164 KBytes:
for ($n = 0; $n < 200; $n++) {
    $objPHPExcel->setActiveSheetIndex(0)->setCellValue('A' . $n, 'Miscellaneous glyphs');
    $objPHPExcel->getActiveSheet()->getStyle('A' . $n)->getAlignment()->setWrapText(true);
}
If one executes this fragment several times unchanged, each fragment consumes around 4 MBytes.
Checking your app is logically correct
To ensure that your app is logically correct, I'd propose increasing PHP's memory first:
ini_set('memory_limit', '32M');
In my case, I had to export the result data of an online assessment application. While there are fewer than 100 cells horizontally, I needed to export up to several tens of thousands of rows. While the number of cells was big, each of my cells holds a number or a string of 3 characters: no formulas, no styles.
In case of strong memory restrictions or large spreadsheets
In my case, none of the cache options reduced the amount as much as required. Plus, the runtime of the application grew enormously.
Finally, I had to switch over to old-fashioned CSV file exports.
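For completeness, a minimal sketch of such a CSV export (my own; it assumes $rows is an array of already-fetched result rows):
<?php
$rows = array(
    array('id', 'score'),   // placeholder header row
    array(1, '0.92'),       // placeholder data row
);

// Write rows straight to disk; fputcsv handles quoting and escaping,
// and only one row is ever held in memory at a time.
$out = fopen('export.csv', 'w');
foreach ($rows as $row) {
    fputcsv($out, $row);
}
fclose($out);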
I ran your code locally and found that all 26 cells you set to this percentage format had the right format and a % sign. I had to uncomment lines 136-137 first, of course.
This must be related to your setup. I cannot imagine you'd have too little memory for a spreadsheet of this size.
For your information, I confirmed it worked on PHP version 5.4.16 with PHPExcel version 1.7.6 (2011-02-27). I opened the spreadsheet with MS Excel 2007.
<?php
$file = 'output_log.txt';
function get_owner($file)
{
    $stat = stat($file);
    $user = posix_getpwuid($stat['uid']);
    return $user['name'];
}
$format = "UID # %s: %s\n";
printf($format, date('r'), get_owner($file));
chown($file, 'ross');
printf($format, date('r'), get_owner($file));
clearstatcache();
printf($format, date('r'), get_owner($file));
?>
clearstatcache() can be useful. Call this function at the start of the PHP page.

PHP opcode memory hog during include?

While optimizing a site for memory, I noticed a leap in memory consumption while including a large number of PHP class files (600+) for a specific purpose. Taking things apart, I noticed that including a PHP file (and thus presumably compiling it to opcodes) takes about 50 times more memory than the file size on disk.
In my case the files on disk are together around 800 kB in size (with indentation and comments, pure class declarations, not many strings), however after including them all, memory consumption was around 40 MB higher.
I measured like this (PHP 5.3.6):
echo memory_get_usage(), "<br>\n";
include($file);
echo memory_get_usage(), "<br>\n";
Within a loop over the 600 files I can watch memory consumption grow from basically zero to 40 MB. (There is no autoloader loading additional classes, nor any global code or constructor code that is executed immediately; it's really the pure include only.)
Is this normal behaviour? I assumed opcodes are more compact than pure source code (stripping out all spaces and comments, or having for example just one or two instruction bytes instead of a "foreach" string etc.)?
If this is normal, is there a way to optimize it? (I assume using an opcode cache would just spare me the compile time, not the actual memory consumption?)
Apparently that's just the way it is.
I've retested this from the ground up:
Include an empty zero length file: 784 bytes memory consumption increase
Include an empty class X { } definition: 2128 bytes
Include a class with one empty method: 2816 bytes
Include a class with two empty methods: 3504 bytes
The filesize of the include file is under 150 bytes in all tests.
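If you want to reproduce these numbers yourself, here is a rough sketch of the measurement (my own; the ./classes/*.php glob is an assumption, point it at your own class files):
<?php
// Measure how much memory each include adds, compared to its size on disk.
$before = memory_get_usage();
foreach (glob(__DIR__ . '/classes/*.php') as $file) {
    $pre = memory_get_usage();
    include $file;                        // compile and register the class
    $post = memory_get_usage();
    printf("%-40s +%7d bytes (file: %d bytes)\n",
        basename($file), $post - $pre, filesize($file));
}
printf("Total increase: %d bytes\n", memory_get_usage() - $before);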
