While optimizing a site for memory, I noticed a leap in memory consumption while including a large number of PHP class files (600+) for a specific purpose. Taking things apart, I noticed that including a PHP file (and thus presumably compiling it to opcodes) takes about 50 times more memory than the file size on disk.
In my case the files on disk are together around 800 kB in size (with indentation and comments, pure class declarations, not many strings), however after including them all, memory consumption was around 40 MB higher.
I measured like this (PHP 5.3.6):
echo memory_get_usage(), "<br>\n";
include($file);
echo memory_get_usage(), "<br>\n";
Within a loop over the 600 files I can watch memory consumption grow from basically zero to 40 MB. (There is no autoloader loading additional classes, nor any global code or constructor code that is executed immediately; it's really just the pure include.)
Is this normal behaviour? I had assumed opcodes would be more compact than the pure source code (all spaces and comments stripped out, and, for example, just one or two instruction bytes instead of a "foreach" string, etc.).
If this is normal, is there a way to optimize it? (I assume using an opcode cache would just spare me the compile time, not the actual memory consumption?)
Apparently that's just the way it is.
I've retested this from the ground up:
Include an empty zero length file: 784 bytes memory consumption increase
Include an empty class X { } definition: 2128 bytes
Include a class with one empty method: 2816 bytes
Include a class with two empty methods: 3504 bytes
The filesize of the include file is under 150 bytes in all tests.
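For reference, here is a sketch of the kind of harness behind these numbers (the test file names are placeholders; each file is assumed to declare a differently named class so the includes don't collide, and the absolute figures will vary by PHP version and platform):
// Sketch of the measurement harness; the test files are placeholders.
$tests = array('empty.php', 'empty_class.php', 'one_method.php', 'two_methods.php');
foreach ($tests as $file) {
    $before = memory_get_usage();
    include $file;
    $after = memory_get_usage();
    printf("%-16s %6d bytes\n", $file, $after - $before);
}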
Related
I have a file of 10 million words, one word per line. I'm trying to open that file, read it line by line, put every line in an array, and count the number of occurrences of each word.
wartek
mei_atnz
sommerray
swaggyfeed
yo_bada
ronnieradke
… and so on (10M+ lines)
I can open the file, read its size, even parse it line by line and echo each line to the browser (it's very long, of course), but when I try to perform any other operation, the script just refuses to execute. No error, no warning, no die(…), nothing.
Accessing the file is always OK; it's the operations on its contents that don't succeed. I tried this and it worked…
while (!feof($pointer)) {
    $row = fgets($pointer);
    print_r($row);
}
… but this didn't:
while (!feof($pointer)) {
    $row = fgets($pointer);
    array_push($dest, $row);
}
I also tried SplFileObject and file($source, FILE_IGNORE_NEW_LINES), with the same result every time (fails with the big file, works with a small one).
Guessing that the issue is not the size (150 kB) but the length (10M+ lines), I chunked the file down to ~20k lines without any improvement, then reduced it again to ~8k lines, and it worked.
I also removed the time limit with set_time_limit(0); and lifted (almost) any memory limit, both in php.ini and in my script with ini_set('memory_limit', '8192M');. As for the errors I might be missing, I set error_reporting(E_ALL); at the top of my script.
So the questions are:
Is there a maximum number of lines that can be read by PHP's built-in functions?
Why can I echo or print_r the lines, but not perform any other operation on them?
I think you might be running into a long execution time:
How to increase the execution timeout in php?
Different operations take different amounts of time. Printing is likely much cheaper than pushing 10M new entries into an array one by one. It's strange that you don't get any error messages, though; you should see a "maximum execution time exceeded" error somewhere.
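If the end goal is just the per-word counts, a lower-memory route is to count while streaming instead of pushing 10M rows into an array first. A minimal sketch, assuming $source holds the path to the word list:
// Count word occurrences line by line; only the distinct words stay in memory.
$counts  = array();
$pointer = fopen($source, 'r');
while (($row = fgets($pointer)) !== false) {
    $word = rtrim($row, "\r\n");
    if ($word === '') {
        continue;                      // skip blank lines
    }
    if (!isset($counts[$word])) {
        $counts[$word] = 0;
    }
    $counts[$word]++;
}
fclose($pointer);
arsort($counts);                       // most frequent words first
print_r(array_slice($counts, 0, 10, true));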
We have a severe memory leak in one of our regularly run scripts that quickly wipes out the free memory on the server. Despite many hours of research and experiments, though, I've been unable to even make a dent in it.
Here is the code:
echo '1:'.memory_get_usage()."\n";
ini_set('memory_limit', '1G');
echo '2:'.memory_get_usage()."\n";
$oXML = new DOMDocument();
echo '3:'.memory_get_usage()."\n";
$oXML->load('feed.xml'); # 556 MB file
echo '4:'.memory_get_usage()."\n";
$xpath = new DOMXPath($oXML);
echo '5:'.memory_get_usage()."\n";
$oNodes = $xpath->query('//feed/item'); # 270,401 items
echo '6:'.memory_get_usage()."\n";
unset($xpath);
echo '7:'.memory_get_usage()."\n";
unset($oNodes);
echo '8:'.memory_get_usage()."\n";
unset($oXML);
echo '9:'.memory_get_usage()."\n";
And here is the output:
1:679016
2:679320
3:680128
4:680568
5:681304
6:150852408
7:150851840
8:34169968
9:34169448
As you can see, when we use xpath to load the nodes into an object, memory usage jumps from 681,304 to 150,852,408. I'm not terribly concerned about that.
My problem is that even after destroying the $oNodes object, we're still stuck at memory usage of 34,169,968.
But the real problem is that the memory usage that PHP shows is a tiny fraction of the total memory eaten by the script. Using free -m directly from the command line on the server, we go from 3,295 MB memory used to 5,226 MB -- and it never goes back down. We're losing 2 GB of memory every time this script runs, and I am at a complete loss as to why or how to fix it.
I tried using SimpleXML instead, but the results were basically identical. I also studied these three threads but didn't find anything in them that helped:
XML xpath search and array looping with php, memory issue
DOMDocument / Xpath leaking memory during long command line process - any way to deconstruct this class
DOMDocument PHP Memory Leak
I'm hoping this is something easy that I'm just overlooking.
UPDATE 11/10: It does appear that memory is eventually freed up. I noticed that after a little more than 30 minutes, suddenly a big block came free again. Obviously, though, that hasn't been nearly fast enough recently to keep the server from running out of memory and locking up.
And for what it's worth, we're running PHP 5.3.15 with Apache 2.2.3 on Red Hat 5.11. We're working to update to the latest versions of all of those, so somewhere along that upgrade path, we might find this fixed. It would be great to do it before then, though.
We recently experienced an issue just like yours. We needed to extract data from a 3 GB XML file and also noticed that server memory was reaching its limits. There are several ways you can decrease the memory usage:
instead of using XPath, which causes the large memory usage, use (for example) file_get_contents, then search for the desired data with a regular expression
split the XML into smaller pieces. Basically you are rebuilding the XML file, but you can control the maximum size of each piece (and thus the memory)
You mentioned that after 30 minutes some memory was released. Reading a 500 MB XML file over 30 minutes is way too slow. The solution we used was splitting the 3 GB XML file into several pieces (approx. 200). Our script writes the required data (around 700k records) to our database in less than 5 minutes.
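If splitting the file isn't practical, another option is a streaming parser. A minimal sketch with XMLReader, assuming the same //feed/item layout as in the question; the database write is left as a placeholder:
// Stream <item> elements one at a time instead of loading the whole document.
$reader = new XMLReader();
$reader->open('feed.xml');
$doc = new DOMDocument();                         // scratch document for expanded nodes

// Move the cursor to the first <item> element.
while ($reader->read() && $reader->localName !== 'item');

$count = 0;
while ($reader->localName === 'item') {
    $item = $doc->importNode($reader->expand(), true);
    // ... pull the fields you need out of $item and write them to the database ...
    $count++;
    $reader->next('item');                        // jump to the next sibling <item>
}
$reader->close();
echo "processed $count items\n";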
We just experienced a similar issue with PHPDocxPro (which uses DomDocument) and submitted a patch to them that at least improves upon the problem. The memory usage reported by memory_get_usage() never increased, as though PHP was not aware of the allocation at all. The memory reported while watching execution via top or ps is what we were more concerned about.
// ps reports X memory usage
$foo = (new DomDocument())->loadXML(getSomeXML());
// ps reports X + Y memory usage
$foo = (new DomDocument())->loadXML(getSomeXML());
// ps reports X + ~2Y memory usage
$foo = (new DomDocument())->loadXML(getSomeXML());
// ps reports X + ~3Y memory usage
Adding an unset() before each subsequent call...
// ps reports X memory usage
$foo = (new DomDocument())->loadXML(getSomeXML());
// ps reports X + Y memory usage
unset($foo);
$foo = (new DomDocument())->loadXML(getSomeXML());
// ps reports X + ~Y memory usage
unset($foo);
$foo = (new DomDocument())->loadXML(getSomeXML());
// ps reports X + ~Y memory usage
I haven't dug into the extension code to understand what's going on, but my guess is that it allocates memory without using PHP's allocator, and as such it isn't counted as part of the heap that memory_get_usage() considers. Despite this, there does appear to be some reference counting that determines whether or not memory can be freed. The unset($foo) before each subsequent call makes sure the extension can reuse some resources. Without it, memory usage increases every time the code is run.
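To see the discrepancy directly, here is a rough, Linux-only sketch that prints PHP's own accounting next to the process RSS that top/ps report; getSomeXML() is the placeholder used in the snippets above:
// Compare PHP's tracked heap with the process RSS from /proc/self/status.
function rss_kb() {
    foreach (file('/proc/self/status') as $line) {
        if (strpos($line, 'VmRSS:') === 0) {
            return (int) filter_var($line, FILTER_SANITIZE_NUMBER_INT);
        }
    }
    return 0;
}

printf("php: %6d KB   rss: %6d KB\n", memory_get_usage(true) / 1024, rss_kb());
$foo = new DomDocument();
$foo->loadXML(getSomeXML());                  // getSomeXML() is the poster's placeholder
printf("php: %6d KB   rss: %6d KB\n", memory_get_usage(true) / 1024, rss_kb());
unset($foo);
printf("php: %6d KB   rss: %6d KB\n", memory_get_usage(true) / 1024, rss_kb());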
I need to process some large files, say 50 MB each. I have found that PHP functions use up large portions of memory. In the example below, the memory used by PHP's functions ends up being four (4) times the file size. I can understand transient usage of twice the file size, but not four times. In the end PHP blows out the memory_limit. While I can increase the memory_limit, it is not a good long-term solution, as I may have to process larger files, and in a production environment having PHP gobble up 400 MB per process is not desirable.
Code:
$buf = '';
report_memory(__LINE__);
$buf = file_get_contents('./20MB.pdf');
report_memory(__LINE__);
base64_encode($buf);
report_memory(__LINE__);
urlencode($buf);
report_memory(__LINE__);
function report_memory($line=0) {
    echo 'Line: ' . str_pad($line,3) . ' ';
    echo 'Mem: ' . str_pad(intval(memory_get_usage()/1024) . 'K',8) . ' ';
    echo 'Peak: ' . str_pad(intval(memory_get_peak_usage()/1024) . 'K',8) . ' ';
    echo "\n";
}
Output:
Line: 4 Mem: 622K Peak: 627K
Line: 7 Mem: 21056K Peak: 21074K
Line: 10 Mem: 21056K Peak: 48302K
Line: 13 Mem: 21056K Peak: 82358K
One can see that for a 20MB file the current memory usage hovers at 21MB, while the peak memory usage jumps up to an insane 82MB.
The PHP functions used in the example are arbitrary; I can easily swap in str_replace, is_string, gettype, etc. with the same results.
The question is how can I keep PHP from doing this?
The environment is CentOS 6.6 running a stock PHP 5.3.3.
Thanks for any insight.
You're URL-encoding. Given that your PDF is basically "random" binary garbage, MANY of the bytes in there are non-printable. That means you're going from a one-byte "binary" character to a 3+ byte URL-encoded string. Given you've got a 20 MB PDF, it's no surprise that tripling the amount of text in there is going to bloat your memory. Remember that PHP has to keep TWO copies of your PDF while it's working: the original "raw" version, and the working copy of whatever transform you're doing on it.
Assuming a worst case where every single character gets encoded, your 20 MB PDF will convert to a 60 MB URL-encoded string, causing 20 + 60 = 80 MB of peak usage, even though that 60 MB encoded version is immediately thrown away.
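If you only need the encoded output written somewhere (a file, a socket, the response), one way to keep the peak down is to encode in chunks instead of in one call. A minimal sketch using base64 (the output file name is made up); the chunk size must be a multiple of 3 so no padding is emitted mid-stream:
// Stream the file through base64_encode() in chunks; only one chunk plus its
// encoded form is in memory at any time, instead of the whole input and output.
$in  = fopen('./20MB.pdf', 'rb');
$out = fopen('./20MB.pdf.b64', 'wb');
while (!feof($in)) {
    $chunk = fread($in, 3 * 1024 * 1024);     // 3 MB, a multiple of 3 bytes
    if ($chunk !== false && $chunk !== '') {
        fwrite($out, base64_encode($chunk));
    }
}
fclose($in);
fclose($out);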
I use the following code to create an image and encode it to base64. There is no direct output of the image.
ob_start(); // catching the output buffer
imagepng($imgSignature);
$base64Signature=base64_encode(ob_get_contents());
ob_end_clean();
ob_start recently started to throw a 500 error and I have trouble figuring out the issue. The server uses PHP 5.4.11. I really don't know if it was running the same version when I installed the script, or if the memory is running full. I know that ob_start has changed across PHP versions. I really have a hard time wrapping my head around this. Is the script correct for PHP 5.4.11?
I really appreciate any help.
I'm not sure how to solve your issue with ob_start(), but I have an alternative for what you are doing that doesn't involve output buffers.
imagepng($imgSignature, 'php://memory/file.png');
$base64Signature = base64_encode(file_get_contents('php://memory/file.png'));
This basically saves the PNG image to a virtual temporary file that exists only in memory; you then read it back and get the same result.
My theory about your error:
At some point in your code, you will have this image stored in memory multiple times: in $imgSignature, in the internal buffer you created with ob_start(), in the buffer you read with ob_get_contents(), and in the resulting value of base64_encode(). Pretty much all in one line. God only knows how much memory it's using, not to mention the resources you probably allocated earlier while building this image.
It is important not to have too much allocated at the same time, especially when dealing with memory-consuming resources like images. If you unset() or overwrite variables you no longer need, you allow the garbage collector to do its job of disposing of those unreferenced resources and freeing the memory.
For instance, you can change the way this piece of code was written to this:
ob_start();
imagepng($imgSignature);
imagedestroy($imgSignature);
$data = ob_get_contents();
ob_end_clean();
$data = base64_encode($data);
I dropped $imgSignature as soon as I didn't need it anymore, ended and cleaned my buffer as soon as I was done getting what I wanted from it, and then disposed of the raw $data by overwriting it with the base64-encoded version, which is what I really wanted.
Now this will use significantly less memory. If you extend this to the rest of your code, or at least to the parts that use a lot of memory, like the images you load or create with the GD2 lib, it should optimize the memory usage of your script, giving you the extra space you need.
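If even the single base64_encode() call over the whole PNG is too much, another variation (my sketch, not tested against your setup) is to render to a temporary file and encode it through PHP's convert.base64-encode stream filter, so the encoding happens in small chunks while reading:
// Render the PNG to a temp file, then base64-encode it via a read filter.
$tmpFile = tempnam(sys_get_temp_dir(), 'sig');
imagepng($imgSignature, $tmpFile);
imagedestroy($imgSignature);                  // free the GD resource early

$fh = fopen($tmpFile, 'rb');
stream_filter_append($fh, 'convert.base64-encode', STREAM_FILTER_READ);
$base64Signature = stream_get_contents($fh);  // encoded incrementally while reading
fclose($fh);
unlink($tmpFile);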
I have a script that generates a little XLS table (~25x15). It contains percentages and strings, and I have an if statement that sets the current cell to percentage type with this code:
$this->objPHPExcel->getActiveSheet()->getStyle($coords)->getNumberFormat()->setFormatCode('0.00%');
But when I export and look at the file, I see it only managed to set the type and style of about 20 cells; all the rest are left with default settings. I debugged it and realized the problem isn't in my logic. I read about increasing PHP's cache memory and tried it, but it didn't work. Please help, because I need to export a table at least 15 times larger. Thanks in advance!
PHPExcel allocates quite some memory
While PHPExcel is a beautiful library, using it may require huge amounts of memory allocated to PHP.
According to this thread, just 5 cells may render 6 MByte of memory usage:
<?php
require_once 'php/PHPExcelSVN/PHPExcel/IOFactory.php';
$objPHPExcel = PHPExcel_IOFactory::load("php/tinytest.xlsx");
$objPHPExcel->setActiveSheetIndex(0);
$objPHPExcel->getActiveSheet()->setCellValue('D2', 50);
echo $objPHPExcel->getActiveSheet()->getCell('D8')->getCalculatedValue() . "\n";
echo date('H:i:s') . " Peak memory usage: " . (memory_get_peak_usage(true) / 1024 / 1024) . " MB\r\n";
?>
I get 6MB of memory usage.
Another user even failed with a 256MByte memory setting.
While PHPExcel provides ways to reduce its memory footprint, all reductions turned out to be too small in my case. This page on GitHub provides details of PHPExcel's cache management options. For example, this setting serializes and then gzips the cell structure of a worksheet:
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_in_memory_gzip;
PHPExcel_Settings::setCacheStorageMethod($cacheMethod);
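If gzipping in memory still isn't enough, the same documentation also lists cache_to_phpTemp, which keeps only a limited window of cell data in memory and spills the rest to php://temp; a short example (the 8MB threshold is just an illustration):
// Spill cell data to php://temp once it exceeds the in-memory threshold.
$cacheMethod   = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);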
PHPExcel's FAQ explains this:
Fatal error: Allowed memory size of xxx bytes exhausted (tried to
allocate yyy bytes) in zzz on line aaa
PHPExcel holds an "in memory" representation of a spreadsheet, so it
is susceptible to PHP's memory limitations. The memory made available
to PHP can be increased by editing the value of the memory_limit
directive in your php.ini file, or by using ini_set('memory_limit',
'128M') within your code (ISP permitting);
Some Readers and Writers are faster than others, and they also use
differing amounts of memory. You can find some indication of the
relative performance and memory usage for the different Readers and
Writers, over the different versions of PHPExcel, here
http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=234150
If you've already increased memory to a maximum, or can't change your
memory limit, then this discussion on the board describes some of the
methods that can be applied to reduce the memory usage of your scripts
using PHPExcel
http://phpexcel.codeplex.com/Thread/View.aspx?ThreadId=242712
Measurement results for PHP Excel
I instrumented the PHPExcel example file 01simple.php and did some quick testing.
This consumes 92 KByte:
for ($n = 0; $n < 200; $n++) {
    $objPHPExcel->setActiveSheetIndex(0)->setCellValue('A' . $n, 'Miscellaneous glyphs');
}
Consumes 4164 KBytes:
for ($n = 0; $n < 200; $n++) {
    $objPHPExcel->setActiveSheetIndex(0)->setCellValue('A' . $n, 'Miscellaneous glyphs');
    $objPHPExcel->getActiveSheet()->getStyle('A' . $n)->getAlignment()->setWrapText(true);
}
If one executes this fragment several times unchanged, each fragment consumes around 4 MBytes.
Checking your app is logically correct
To ensure that your app is logically correct, I'd propose increasing the PHP memory limit first:
ini_set('memory_limit', '32M');
In my case, I have to export the result data of an online assessment application. While there are fewer than 100 cells horizontally, I need to export up to several tens of thousands of rows. While the number of cells is big, each of my cells holds a number or a string of 3 characters; no formulas, no styles.
In case of strong memory restrictions or large spreadsheets
In my case, none of the cache options reduced the memory usage as much as required, and the runtime of the application grew enormously.
Finally I had to switch over to old-fashioned CSV file exports.
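For completeness, the CSV fallback amounts to streaming rows straight out with fputcsv, so no spreadsheet object ever holds the full result set. A rough sketch (the column names and the $resultRows source are placeholders for the assessment data, and it assumes nothing has been sent to the browser yet):
// Stream the export as CSV; memory stays flat no matter how many rows there are.
header('Content-Type: text/csv');
header('Content-Disposition: attachment; filename="results.csv"');

$out = fopen('php://output', 'w');
fputcsv($out, array('participant', 'question', 'score'));   // header row
foreach ($resultRows as $row) {                             // e.g. fetched from the DB in batches
    fputcsv($out, array($row['participant'], $row['question'], $row['score']));
}
fclose($out);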
I ran your code locally and found that all 26 cells you set to this percentage format had the right format and a % sign. I had to uncomment lines 136-137 first, of course.
This must be related to your setup. I cannot imagine you'd have too little memory for a spreadsheet of this size.
For your information, I confirmed it worked on PHP 5.4.16 with PHPExcel version 1.7.6 (2011-02-27). I opened the spreadsheet with MS Excel 2007.
<?php
$file = 'output_log.txt';
function get_owner($file)
{
    $stat = stat($file);
    $user = posix_getpwuid($stat['uid']);
    return $user['name'];
}
$format = "UID # %s: %s\n";
printf($format, date('r'), get_owner($file));
chown($file, 'ross');   // the owner changes on disk, but the cached stat data still shows the old one
printf($format, date('r'), get_owner($file));
clearstatcache();       // drop the cached stat data so the next stat() call sees the new owner
printf($format, date('r'), get_owner($file));
?>
clearstatcache(); can be useful. Call this function at the start of the PHP page.