PHP XML Memory Leak?

We have a severe memory leak in one of our regularly run scripts that quickly wipes out the free memory on the server. Despite many hours of research and experiments, though, I've been unable to even make a dent in it.
Here is the code:
echo '1:'.memory_get_usage()."\n";
ini_set('memory_limit', '1G');
echo '2:'.memory_get_usage()."\n";
$oXML = new DOMDocument();
echo '3:'.memory_get_usage()."\n";
$oXML->load('feed.xml'); # 556 MB file
echo '4:'.memory_get_usage()."\n";
$xpath = new DOMXPath($oXML);
echo '5:'.memory_get_usage()."\n";
$oNodes = $xpath->query('//feed/item'); # 270,401 items
echo '6:'.memory_get_usage()."\n";
unset($xpath);
echo '7:'.memory_get_usage()."\n";
unset($oNodes);
echo '8:'.memory_get_usage()."\n";
unset($oXML);
echo '9:'.memory_get_usage()."\n";
And here is the output:
1:679016
2:679320
3:680128
4:680568
5:681304
6:150852408
7:150851840
8:34169968
9:34169448
As you can see, when we use xpath to load the nodes into an object, memory usage jumps from 681,304 to 150,852,408. I'm not terribly concerned about that.
My problem is that even after destroying the $oNodes object, we're still stuck at memory usage of 34,169,968.
But the real problem is that the memory usage that PHP shows is a tiny fraction of the total memory eaten by the script. Using free -m directly from the command line on the server, we go from 3,295 MB memory used to 5,226 MB -- and it never goes back down. We're losing 2 GB of memory every time this script runs, and I am at a complete loss as to why or how to fix it.
I tried using SimpleXML instead, but the results were basically identical. I also studied these three threads but didn't find anything in them that helped:
XML xpath search and array looping with php, memory issue
DOMDocument / Xpath leaking memory during long command line process - any way to deconstruct this class
DOMDocument PHP Memory Leak
I'm hoping this is something easy that I'm just overlooking.
UPDATE 11/10: It does appear that memory is eventually freed up. I noticed that after a little more than 30 minutes, suddenly a big block came free again. Obviously, though, that hasn't been nearly fast enough recently to keep the server from running out of memory and locking up.
And for what it's worth, we're running PHP 5.3.15 with Apache 2.2.3 on Red Hat 5.11. We're working to update to the latest versions of all of those, so somewhere along that upgrade path, we might find this fixed. It would be great to do it before then, though.

We recently experienced an issue just like yours. We needed to extract data from a 3 GB XML file and also noticed that server memory was reaching its limits. There are several ways you can decrease the memory usage:
Instead of using XPath, which causes the bulk of the memory usage, use (for example) file_get_contents and then search with a regular expression to find the desired data (see the sketch below).
Split the XML into smaller pieces. Basically you are rebuilding the XML file, but you get to control the maximum size of each file (and thus the memory).
You mentioned that after 30 minutes some memory was released. Reading a 500 MB XML file over 30 minutes is way too slow. The solution we used was splitting the 3 GB XML file into several pieces (approximately 200). Our script writes the required data (around 700k records) to our database in less than 5 minutes.
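For the first suggestion, here is a minimal sketch, assuming the records sit in <item>...</item> elements as in the question; the file name and pattern are only illustrative:
// Read the raw XML as one string and pull out <item> blocks with a regex,
// avoiding the cost of building a full DOM tree for the whole document.
$xml = file_get_contents('feed.xml');
preg_match_all('#<item>(.*?)</item>#s', $xml, $matches);
foreach ($matches[1] as $itemXml) {
    // process one item's raw XML here, e.g. pick individual fields with further regexes
}
unset($xml, $matches); // release the big string and the match array when done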

We just experienced a similar issue with PHPDocxPro (which uses DomDocument) and submitted a patch to them that at least improves upon the problem. The memory usage reported by memory_get_usage() never increased, as though PHP was not aware of the allocation at all. The memory reported while watching execution via top or ps is what we were more concerned about.
// ps reports X memory usage
$foo = new DomDocument();
$foo->loadXML(getSomeXML());
// ps reports X + Y memory usage
$foo = new DomDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~2Y memory usage
$foo = new DomDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~3Y memory usage
Adding an unset() before each subsequent call...
// ps reports X memory usage
$foo = new DomDocument();
$foo->loadXML(getSomeXML());
// ps reports X + Y memory usage
unset($foo);
$foo = new DomDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~Y memory usage
unset($foo);
$foo = new DomDocument();
$foo->loadXML(getSomeXML());
// ps reports X + ~Y memory usage
I haven't dug into the extension code to understand what's going on, but my guess is that it allocates memory without going through PHP's allocator, and as such, it isn't counted as part of the heap that memory_get_usage() considers. Despite this, there does appear to be some reference counting to determine whether or not memory can be freed. The unset($foo) before each subsequent call makes sure that the extension can reuse some resources. Without it, memory usage increases every time the code is run.

Related

PHP Allowed memory size exhausted

I am trying to display users on a map using the Google API. Now that the user count has increased to 12,000, I get a memory exhausted error. For the time being I increased the memory limit from 128M to 256M, but I am sure that once there are 25,000 users the same error will come back.
The error is,
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 30720 bytes) in /var/www/html/digitalebox_v2_20150904_100/protected/extensions/bootstrap/widgets/TbBaseMenu.php on line 193
Fatal error: Class declarations may not be nested in /var/www/html/yii-1.1.14.f0fee9/framework/collections/CListIterator.php on line 20
My code,
$criteria = new CDbCriteria();
$criteria->condition = 't.date BETWEEN :from_date AND :to_date';
$modelUsers = User::model()->findAll($criteria);
foreach ($modelUsers as $modelUser) {
$fullAddress = $modelUser->address1 . ', ' . $modelUser->zip . ' ' . $modelUser->city . ', ' . Country::model()->findByPk($modelUser->countryCode)->countryName;
}
When $modelUsers holds 12,000 records, this memory problem appears, because that means 12,000 user objects in memory.
What should I do to prevent this kind of issue in the future? What is the minimum memory size required for a PHP application to run?
When you call findAll it loads all records at once, so you end up with a memory error. Especially for situations like this, Yii has CDataProviderIterator. It allows iteration over large data sets without holding the entire set in memory.
$dataProvider = new CActiveDataProvider("User");
$iterator = new CDataProviderIterator($dataProvider);
foreach ($iterator as $user) {
    $fullAddress = $user->address1 . ', ' . $user->zip . ' ' . $user->city . ', ' . Country::model()->findByPk($user->countryCode)->countryName;
}
I'd solve this problem in a completely different way than suggested. I'd ditch the whole model concept and query MySQL for the addresses directly. You can have MySQL return the already-concatenated address, which means you avoid concatenating it in PHP and wasting memory on it.
The next step would be using PDO and issuing an unbuffered query. This means the PHP process will not store all 12,000 records in its memory at once - that's how you avoid exhausting the memory.
The final step is outputting the result - as you loop through the unbuffered query, you output a single row (or 10 rows) at a time.
What happens this way is that you trade CPU for RAM. Since you don't know how much RAM you will need, the best approach is to use as little as possible.
Using unbuffered queries and flushing PHP's output buffer seems like the only viable way to go for your use case, seeing you can't avoid outputting a lot of data.
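A rough sketch of that approach with PDO; the DSN, table, and column names below are guesses based on the question's models, not a real schema:
$pdo = new PDO('mysql:host=localhost;dbname=app;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // unbuffered: rows stream from MySQL
$sql = "SELECT CONCAT(u.address1, ', ', u.zip, ' ', u.city, ', ', c.countryName) AS fullAddress
        FROM user u
        JOIN country c ON c.code = u.countryCode
        WHERE u.date BETWEEN :from_date AND :to_date";
$stmt = $pdo->prepare($sql);
$stmt->execute(array(':from_date' => '2015-01-01', ':to_date' => '2015-12-31'));
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo $row['fullAddress'], "\n"; // emit each row right away instead of collecting them
    flush();                        // flush the output buffer as the answer suggests
}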
Perhaps increase the memory a bit more and see if that corrects the issue; try setting it to 512M or 1024M. I will tell you from my own experience, though: if you are trying to load Google Map markers, that number of markers will probably crash the map.
Increase the value of memory_limit in php.ini, then restart Apache or whatever PHP is running under. Keep in mind that what you add here needs to be taken away from other programs running on the same machine (such as MySQL and its innodb_buffer_pool_size).

PHP: How to solve ob_start() in combination with imagepng() issue?

I use the following code to create an image and encode it to base64. There is no direct output of the image.
ob_start(); // catching the output buffer
imagepng($imgSignature);
$base64Signature=base64_encode(ob_get_contents());
ob_end_clean();
ob_start recently started to throw a 500 error and I have trouble figuring out the issue. The server uses PHP 5.4.11. I really don't know if it was running the same version when I installed the script, or if the memory runs full. I know that ob_start has changed across PHP versions. I really have a hard time wrapping my head around this. Is the script correct for PHP 5.4.11?
I really appreciate any help.
I'm not sure how to solve your issue with ob_start(), but I have an alternative for what you are doing that doesn't involve output buffers.
$stream = fopen('php://memory', 'r+'); // in-memory stream
imagepng($imgSignature, $stream);      // write the PNG into the stream instead of the output buffer
rewind($stream);
$base64Signature = base64_encode(stream_get_contents($stream));
This writes the PNG into an in-memory stream instead of the output buffer; you then rewind the stream, read its contents back, and get the same result.
My theory about your error:
At some point in your code, you will have this image stored multiple times in memory: in $imgSignature, in the internal buffer you created with ob_start(), in the string you read with ob_get_contents(), and in the resulting value of base64_encode() - pretty much all in one line. God only knows how much memory it's using, not to mention you probably allocated more resources earlier while building this image.
It is important not to have too much allocated at the same time, especially when dealing with memory-consuming resources like images. If you unset() or overwrite variables you no longer need, you allow the garbage collector to do its job of disposing of those unreferenced resources and freeing the memory.
For instance, you can change the way this piece of code was written to this:
ob_start();
imagepng($imgSignature);
imagedestroy($imgSignature);
$data = ob_get_contents();
ob_end_clean();
$data = base64_encode($data);
I dropped $imgSignature as soon as I didn't need it anymore, ended and cleaned my buffer as soon as I was done getting what I wanted from it, and then disposed of the raw $data by overwriting it with the base64-encoded version, which is what I really wanted.
Now this will use significantly less memory. If you extend this to the rest of your code, or do it at least to the parts that use a lot of memory like the images you loaded or created with the GD2 lib, it should optimize the memory usage of your script giving you that extra space you need.

PHP Using fgetcsv on a huge csv file

Using fgetcsv, can I somehow do a destructive read where rows I've read and processed would be discarded so if I don't make it through the whole file in the first pass, I can come back and pick up where I left off before the script timed out?
Additional Details:
I'm getting a daily product feed from a vendor that comes across as a 200mb .gz file. When I unpack the file, it turns into a 1.5gb .csv with nearly 500,000 rows and 20 - 25 fields. I need to read this information into a MySQL db, ideally with PHP so I can schedule a CRON to run the script at my web hosting provider every day.
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
My idea was to grab the information from the .csv using the fgetcsv function, but I'm expecting to have to take multiple passes at the file because of the 3-minute timeout. I was thinking it would be nice to whittle away at the file as I process it, so I wouldn't need to spend cycles skipping over rows that were already processed in a previous pass.
From your problem description it really sounds like you need to switch hosts. Processing a 2 GB file with a hard time limit is not a very constructive environment. Having said that, deleting read lines from the file is even less constructive, since you would have to rewrite the entire 2 GB to disk minus the part you have already read, which is incredibly expensive.
Assuming you save how many rows you have already processed, you can skip rows like this:
$alreadyProcessed = 42; // for example
$i = 0;
while ($row = fgetcsv($fileHandle)) {
    if ($i++ < $alreadyProcessed) {
        continue;
    }
    ...
}
However, this means you're reading the entire 2 GB file from the beginning each time you go through it, which in itself already takes a while and you'll be able to process fewer and fewer rows each time you start again.
The best solution here is to remember the current position of the file pointer, for which ftell is the function you're looking for:
$lastPosition = file_get_contents('last_position.txt');
$fh = fopen('my.csv', 'r');
fseek($fh, $lastPosition);
while ($row = fgetcsv($fh)) {
    ...
    file_put_contents('last_position.txt', ftell($fh));
}
This allows you to jump right back to the last position you were at and continue reading. You obviously want to add a lot of error handling here, so you're never in an inconsistent state no matter which point your script is interrupted at.
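For example, here is a slightly more defensive variant of the same idea (same hypothetical file names as above; the cast covers the very first run, when last_position.txt does not exist yet):
$lastPosition = (int) @file_get_contents('last_position.txt'); // 0 on the first run
$fh = fopen('my.csv', 'r');
if ($fh === false) {
    die('could not open my.csv');
}
fseek($fh, $lastPosition);
while (($row = fgetcsv($fh)) !== false) {
    // ... process $row ...
    file_put_contents('last_position.txt', ftell($fh)); // checkpoint after each processed row
}
fclose($fh);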
You can avoid the timeout and memory errors to some extent by reading the file like a stream: read it line by line and insert each line into the database (or process it accordingly). That way only a single line is held in memory on each iteration. Note that you should not try to load the whole CSV file into an array; that really would consume a lot of memory.
if (($handle = fopen("yourHugeCSV.csv", 'r')) !== false)
{
    // Get the first row (header)
    $header = fgetcsv($handle);
    // loop through the file line-by-line
    while (($data = fgetcsv($handle)) !== false)
    {
        // Process Your Data
        unset($data);
    }
    fclose($handle);
}
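If each row goes straight into MySQL as suggested, a prepared statement reused inside the loop keeps memory flat; this is only a sketch with made-up DSN, table, and column names:
$pdo  = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO products (sku, name, price) VALUES (?, ?, ?)');
// ...then, inside the while loop above, in place of "Process Your Data":
$stmt->execute(array($data[0], $data[1], $data[2])); // one row per iteration, nothing accumulates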
I think a better solution (it would be phenomenally inefficient to continuously rewind and rewrite an open file stream) would be to track the file position of each record read (using ftell) and store it with the data you've read; then, if you have to resume, just fseek to the last position.
You could also try loading the file directly using MySQL's LOAD DATA INFILE (which will likely be a lot faster), although I've had problems with this in the past and ended up writing my own PHP code.
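For reference, a rough sketch of the LOAD DATA route via PDO; the file path, table name, and options are placeholders, and LOCAL INFILE has to be allowed on both the client and the server:
$pdo = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass',
    array(PDO::MYSQL_ATTR_LOCAL_INFILE => true)); // allow LOAD DATA LOCAL from this client
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/feed.csv'
    INTO TABLE products
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'
    IGNORE 1 LINES");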
I have a hard timeout on the server set to 180 seconds by the hosting provider, and max memory utilization limit of 128mb for any single script. These limits cannot be changed by me.
What have you tried?
The memory can be limited by other means than the php.ini file, but I can't imagine how anyone could actually prevent you from using a different execution time (even if ini_set is disabled, from the command line you could run php -d max_execution_time=3000 /your/script.php or php -c /path/to/custom/inifile /your/script.php).
Unless you are trying to fit the entire data file into memory, there should be no issue with a memory limit of 128 MB.

PHP opcode memory hog during include?

While optimizing a site for memory, I noticed a leap in memory consumption while including a large number of PHP class files (600+) for a specific purpose. Taking things apart, I noticed that including a PHP file (and thus presumably compiling it to opcodes) takes about 50 times more memory than the file size on disk.
In my case the files on disk are together around 800 kB in size (with indentation and comments, pure class declarations, not many strings), however after including them all, memory consumption was around 40 MB higher.
I measured like this (PHP 5.3.6):
echo memory_get_usage(), "<br>\n";
include($file);
echo memory_get_usage(), "<br>\n";
Within a loop over the 600 files I can watch memory consumption grow from basically zero to 40 MB. (There is no autoloader loading additional classes, nor any global code or constructor code that is executed immediately; it's really the pure include only.)
Is this normal behaviour? I assumed opcodes are more compact than pure source code (stripping out all spaces and comments, or having for example just one or two instruction bytes instead of a "foreach" string etc.)?
If this is normal, is there a way to optimize it? (I assume using an opcode cache would just spare me the compile time, not the actual memory consumption?)
Apparently that's just the way it is.
I've retested this from the ground up:
Include an empty zero length file: 784 bytes memory consumption increase
Include an empty class X { } definition: 2128 bytes
Include a class with one empty method: 2816 bytes
Include a class with two empty methods: 3504 bytes
The filesize of the include file is under 150 bytes in all tests.
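A minimal sketch of that kind of retest, assuming a tiny include file of your own; the file name and class are hypothetical:
// EmptyClass.php contains nothing but: <?php class X { }
$before = memory_get_usage();
include 'EmptyClass.php';
echo memory_get_usage() - $before, " bytes used by the include\n";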

How to overwrite php memory for security reason?

I am working on a security script, and it seems that I have hit a problem with PHP and the way PHP uses memory.
my.php:
<?php
// Display current PID
echo 'pid= ', posix_getpid(), PHP_EOL;
// The user type a very secret key
echo 'Fill secret: ';
$my_secret_key = trim(fgets(STDIN));
// 'Destroy' the secret key
unset($my_secret_key);
// Wait for something
echo 'waiting...';
sleep(60);
And now I run the script:
php my.php
pid= 1402
Fill secret: AZERTY <= User input
waiting...
Before the script ends (while it is sleeping), I generate a core file by sending a SIGSEGV signal to the script:
kill -11 1402
I inspect the corefile:
strings core | less
Here is an extract of the result:
...
fjssdd
sleep
STDIN
AZERTY <==== this is the secret key
zergdf
...
I understand that the memory is just released by unset() and not 'destroyed': the data is not really removed, it is merely handed back (a call to the free() function).
So if someone dumps the memory of the process, even after the script execution, he could read $my_secret_key (until that memory is overwritten by another process).
Is there a way to overwrite this memory segment, or the full memory space, after the PHP script execution?
Thanks to all for your comments.
I already know how memory is managed by the system.
Even if PHP doesn't use malloc and free directly (but its own versions like emalloc and efree), it seems (and I understand why) that it is simply impossible for PHP to scrub memory after it has been deallocated.
The question was more out of curiosity, and every comment seems to confirm what I previously intended to do: write a little piece of code in a memory-aware language (C?) to handle this special part, by allocating a simple string with malloc, overwriting it with XXXXXX after use, and THEN freeing it.
Thanks to all
J
You seem to be lacking a lot of understanding about how memory management works in general, and specifically within PHP.
A discussion of the various salient points is redundant when you consider what the security risk is here:
So if someone dumps the memory of the process, even after the script execution
If someone can access the memory of a program running under a different uid then they have root access and can compromise the target in so many other ways - and it doesn't matter if it's PHP script, ssh, an Oracle DBMS....
If someone can access the memory previously occupied by a process which has now terminated, then not only have they got root, they've already compromised the kernel.
You seem to have missed an important lesson in what computers mean by "delete operations".
See, computers don't normally bother to zero out freed memory; instead they just "forget" they were using it.
In other words, if you want to clear memory, you most definitely need to overwrite it, just as #hakre suggested.
That said, I hardly see the point of your script. PHP just isn't made for the sort of thing you are doing. You're probably better off with a small dedicated solution rather than using PHP. But this is just my opinion. I think.
I don't know if that works, but if you can, please add these lines in your tests and see the outcome:
...
// Overwrite it:
echo 'Overwrite secret: ';
for ($l = strlen($my_secret_key), $i = 0; $i < $l; $i++)
{
    $my_secret_key[$i] = '#';
}
And I wonder whether or not running
gc_collect_cycles();
makes a difference. Even after the values are freed, they might still be in memory (within the script's process or even somewhere else in the address space).
I would try whether overwriting memory with some data eventually erases the original locations of your variables:
$buffer = '';
for ($i = 0; $i < 1e6; $i++) {
    $buffer .= "\x00";
}
As soon as PHP releases the memory, I suppose later allocations might be given the same locations. It's hardly foolproof, though.
