PHP MongoDB findOne Memory Leak

I have found some old posts about a memory leak in the PHP MongoDB driver, but none of them gave a final solution or explanation for older versions of the driver.
My driver version is 1.4.5 (stable), running on PHP 5.3.10.
Code with debug echoes:
echo memory_get_usage()." in MB : ";
echo memory_get_usage()/1024/1024;
echo "<br>";
unset($cursor);
$dt = new DateTime($day." 00:00:00", new DateTimeZone($this->timezone));
$mongodate = new MongoDate($dt->getTimestamp());
// print_r($mongodate);
$cursor = $dc->findOne(array('keyword' => $keyword, 'date' => $mongodate));
echo "Cursor loaded Doc (".$cursor['_id'].") : ";
echo memory_get_usage()." in MB : ";
echo memory_get_usage()/1024/1024;
echo "<br>";
Echoed memory usage:
3932160 in MB : 3.75
Cursor loaded Doc (534cdee3c30fd1b8ee0bb641) : 218305980 in MB : 208.1928062439
Code with debug echoes, using true memory usage:
echo memory_get_usage(true)." in MB : ";
echo memory_get_peak_usage(true)/1024/1024;
echo "<br>";
unset($cursor);
$dt = new DateTime($day." 00:00:00", new DateTimeZone($this->timezone));
$mongodate = new MongoDate($dt->getTimestamp());
// print_r($mongodate);
$cursor = $dc->findOne(array('keyword' => $keyword, 'date' => $mongodate));
/*
echo "<pre>";
print_r($cursor);
echo "</pre>";
*/
echo "Cursor loaded Doc (".$cursor['_id'].") : ";
echo memory_get_usage(true)." in MB : ";
echo memory_get_peak_usage(true)/1024/1024;
echo "<br>";
Echoed true memory usage:
3932160 in MB : 3.75
Cursor loaded Doc (534cdee3c30fd1b8ee0bb641) : 218628096 in MB : 224.5
So a single document causes an increase of over 200 MB of memory.
bitrs3:PRIMARY> var doc = db.dailies.findOne({"_id" : ObjectId("534cdee3c30fd1b8ee0bb641")})
bitrs3:PRIMARY> Object.bsonsize(doc)
16754823
The loaded document is admittedly not small: it is 16,754,823 bytes, so it is close to the maximum BSON size of 16 MB.
Still, I am wondering whether it is normal that the findOne() call, which turns the result into an array, needs this much memory.

You can verify whether this is "the cost of doing business with PHP" or whether you have found a bug in the driver by serializing the array (using serialize() or even json_encode()) and saving it to a file.
Then you unserialize() (or json_decode()) the contents of the file and check the memory usage.
If the memory usage is similar, you are looking at the actual overhead of PHP's data types.
<?php
$mc = new MongoClient;
$collection = $mc->selectCollection("myDB", "myCollection");
$d = $collection->findOne(array("my" => "criteria"));
var_dump(memory_get_usage() / 1024 / 1024);
file_put_contents("serialized.bin", serialize($d));
?>
Then loading it again:
<?php
$val = unserialize(file_get_contents("serialized.bin"));
var_dump(memory_get_usage() / 1024 / 1024);
?>
EDIT: To preemptively clarify any misunderstanding:
A 16 MB MongoDB document generally does not need hundreds of megabytes of memory.
If, however, you have hundreds of thousands of elements inside that document, the per-element overhead starts to count, multiplied by every element you have.
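To get a feel for that per-element overhead, here is a rough sketch (the element count and sizes are arbitrary, not taken from the question): half a million short strings are only a few MB of raw payload, yet the array holding them consumes far more, because every element carries its own zval and hash-bucket bookkeeping.
<?php
// Rough sketch of PHP's per-element overhead (sizes chosen arbitrarily).
// 500,000 strings of 16 bytes are ~8 MB of raw payload, but the array
// holding them uses a multiple of that for per-element bookkeeping.
$before = memory_get_usage();

$elements = array();
for ($i = 0; $i < 500000; $i++) {
    $elements[] = str_pad((string) $i, 16, 'x');
}

printf("Raw payload:  ~%.1f MB\n", (500000 * 16) / 1024 / 1024);
printf("Actual usage: %.1f MB\n", (memory_get_usage() - $before) / 1024 / 1024);
?>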

Related

PHP 5.3 - how to append contents to a large file without loading it into memory

I'm trying to efficiently write a large amount of data to a file in a legacy system without killing the memory (I'm relatively new to PHP). It only writes 50 customers at a time, but after a while it slows down considerably, so I assume it is keeping the whole file in memory. Any ideas how I can just append to a file and cope with the file getting very large? Code snippet below. Note: I am stuck with PHP 5.3.
do {
    //Tell the collection which page to load.
    $collection->setCurPage($currentPage);
    $collection->load();
    $fp = fopen(Mage::getBaseDir('export') . '/customers.json', 'a');
    foreach ($collection as $customer) {
        //write collection as json
        fwrite($fp, "," . json_encode($customer->getData()));
        $customerCount++;
    }
    fclose($fp);
    $currentPage++;
    //make the collection unload the data in memory so it will pick up the next page when load() is called.
    $collection->clear();
    echo memory_get_usage() . "\n";
    echo "Finished page $currentPage of $pages \n";
} while ($currentPage <= $pages);
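For what it's worth, fopen() in 'a' (append) mode does not read the existing file into memory, so the file's size by itself should not drive memory growth; here is a minimal sketch (the path and fake customer data are made up for illustration) that appends JSON lines while memory stays flat:
<?php
// Minimal sketch: appending in 'a' mode never loads the existing file,
// so memory_get_usage() stays flat no matter how large the file grows.
// The path and the fake customer data are illustrative only.
$fp = fopen('/tmp/customers.json', 'a');

for ($batch = 0; $batch < 100; $batch++) {
    for ($i = 0; $i < 50; $i++) {
        fwrite($fp, json_encode(array('batch' => $batch, 'customer' => $i)) . "\n");
    }
    echo memory_get_usage() . "\n"; // does not grow with the file size
}

fclose($fp);
?>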

PHP Memory exhausted

As far as I know, the solution for this is:
ini_set('memory_limit','-1');
What if even this is not enough? The problem is that I am using a loop and creating and destroying the variables used in it, but I still have not found the exact reason why memory utilization increases after every iteration. My loop is going to run roughly 2,000 to 10,000 times, so even 4 GB of RAM is not going to be enough.
As I observed with the top command, memory usage is about 50 MB at the beginning of the loop and grows by 10 to 15 MB with every iteration, so my code never finishes executing.
ini_set('memory_limit', '-1');
ini_set('xdebug.max_nesting_level', 1000);
$ex_data = some data;
$config = some data;
$docConf = some data;
$codeNameIndex = some data;
$originalName = some data;
CONST LIMIT = 3000;
CONST START = 1000;
//till here it is using 55 to 60 MB memory
for ($i = self::START; $i < (self::START + self::LIMIT); $i++) {
    $start_memory = memory_get_usage();
    $object = new ImportProjectController();
    $object->ex_data = $ex_data;
    $object->config = $config;
    $object->docConf = $docConf;
    $StratProInsertDateTime = microtime(true);
    try {
        DB::connection()->getPdo()->beginTransaction();
        $object->ex_data[$codeNameIndex[2]][$codeNameIndex[1]] = $originalName . '_' . $i;
        $object->ex_data[$codeCodeIndex[2]][$codeCodeIndex[1]] = $originalCode . '_' . $i;
        if (!$object->insert_project()) {
            throw new Exception('error while inserting project');
        }
        if (!$object->insert_documents()) {
            throw new Exception('error while inserting documents');
        }
        App::make('AccessController')->rebuildCache();
        DB::connection()->getPdo()->commit();
    } catch (Exception $ex) {
        DB::connection()->getPdo()->rollBack();
        echo $ex;
    }
    //it is increasing memory utilization every iteration.
    echo "Memory used for inserting a ".$i."th project :- ";
    echo memory_get_usage() - $start_memory.PHP_EOL;
    unset($object->ex_data);
    unset($object->config);
    unset($object->docConf);
    $object = null;
    echo "Memory utilization before inserting project :- ";
    echo memory_get_usage() - $start_memory.PHP_EOL;
}
$object->insert_project()
$object->insert_documents()
App::make('AccessController')->rebuildCache()
These methods do some database inserts.
I am unsetting the $object variable at the end of the loop, but it still does not release the memory, and I am sure there is nothing occupying memory inside these methods.
Swap: 0k total, 0k used, 0k free, 241560k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
27671 ec2-user 20 0 1489m 1.1g 9908 R 66.0 30.4 8:15.00 php
4307 mysql 20 0 852m 140m 5576 S 18.2 3.7 14:21.50 mysqld
Above is the top command output; as you can clearly see, memory utilization goes up to 1.1 GB and keeps increasing.
Please let me know if you need more details.
I got the answer to this problem from my colleague.
Laravel does query logging and keeps every query in memory; that is why I was getting this issue. With the following code my script runs fine, using only 250 MB of memory. I hope this helps others.
DB::disableQueryLog();
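For reference, a sketch of where the call would sit relative to the loop (this assumes the Laravel 4 style DB facade used in the question and is not runnable outside a bootstrapped Laravel application):
<?php
// Sketch only: assumes a bootstrapped Laravel 4 application.
// Disable the query log before the heavy loop so executed queries are
// no longer accumulated in memory.
DB::disableQueryLog();

for ($i = 1000; $i < 4000; $i++) {
    // ... run the project/document inserts from the loop above ...
}

// Diagnostic: while logging is enabled, count(DB::getQueryLog()) grows
// with every executed query, which is where the memory went.
?>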

Caching large Array causes memory exhaustion

So I'm trying to cache an array in a file and use it somewhere else.
import.php
// Above code is to get each line in CSV and put in it in an array
// (1 line is 1 multidimensional array) - $csv
$export = var_export($csv, true);
$content = "<?php \$data=" . $export . ";?>";
$target_path1 = "/var/www/html/Samples/test";
file_put_contents($target_path1 . "recordset.php", $content);
somewhere.php
ini_set('memory_limit','-1');
include_once("/var/www/html/Samples/test/recordset.php");
print_r($data);
Now, I've included recordset.php in somewhere.php to use the array stored in it. It works fine when the uploaded CSV file has 5,000 lines; if I try to upload a CSV with 50,000 lines, for example, I get a fatal error:
Fatal error: Allowed memory size of 67108864 bytes exhausted (tried to allocate 79691776 bytes)
How can I fix this, or is there a more convenient way to achieve what I want? Speaking of performance, should I consider the CPU of the server? I've already overridden the memory limit and set it to -1 in somewhere.php.
There are two ways to fix this:
You need to increase the memory (RAM) on the server, as memory_limit can only use memory that is actually available on the server, and it seems you have very little RAM available for PHP.
To check the total RAM on a Linux server:
<?php
$fh = fopen('/proc/meminfo', 'r');
$mem = 0;
while ($line = fgets($fh)) {
    $pieces = array();
    if (preg_match('/^MemTotal:\s+(\d+)\skB$/', $line, $pieces)) {
        $mem = $pieces[1];
        break;
    }
}
fclose($fh);
echo "$mem kB RAM found";
?>
Source: get server ram with php
You should parse your CSV file in chunks and release the occupied memory each time using unset().
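A minimal sketch of that chunked approach (the file path, chunk size and the processChunk() helper are made up for illustration): rows are read with fgetcsv() and handled in batches, so the whole file never sits in memory at once.
<?php
// Sketch of chunked CSV processing; path, chunk size and processChunk()
// are hypothetical. Rows are read one at a time with fgetcsv() and the
// buffered chunk is released as soon as it has been processed.
$handle    = fopen('/var/www/html/Samples/test/upload.csv', 'r');
$chunk     = array();
$chunkSize = 1000;

while (($row = fgetcsv($handle)) !== false) {
    $chunk[] = $row;
    if (count($chunk) >= $chunkSize) {
        processChunk($chunk); // hypothetical: insert into DB, write to cache, ...
        unset($chunk);        // release the processed rows
        $chunk = array();
    }
}
if (!empty($chunk)) {
    processChunk($chunk);     // handle the final partial chunk
}
fclose($handle);
?>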

Where does PHP store static variables?

Here is the result of a small test measuring the performance of three storages: MySQL, memcached and a PHP static property.
I'm interested in the data only during script execution, so static is acceptable.
Steps:
filling up storages with 15000 objects (uuid, some name and 1 KB of random data)
fetch 5000 uuid keys
loop and search each key in 3 storages
Results:
Object count: 15000
Search requests: 5000
================================
[filling] mysql
time: 175.268 s
memory: 2.519 Mb
================================
[filling] memcached
time: 7.79517 s
memory: 1.9455 Mb
================================
[filling] static
time: 0.229687 s
memory: 3.7875 Mb
================================
[search] mysql
time: 3.12171 s
memory: 0.991821 Mb
================================
[search] memcached
time: 1.41455 s
memory: 0.686646 Mb
================================
[search] static
time: 0.0488684 s
memory: 0.762939 Mb
Time and memory are summarized for 5000 search requests.
Measurement code:
$timeInit = (float) microtime(true);
$memoryInit = (float) memory_get_usage(false);
function(); // measured operations
$timeFinish = (float) microtime(true);
$memoryFinish = (float) memory_get_usage(false);
$time = $timeFinish - $timeInit;
$memory = ($memoryFinish - $memoryInit)/1024/1024;
So, memcached and MySQL store their data outside of PHP. But where does the "static" memory usage live?
Could static storage cause a lack of memory ("PHP memory exhausted" error) during the script?
And which is better (static or memcached) if I use the data only during execution?
Solved
It seems the problem is in variable assignment. The following code does not allocate memory for the copied variable:
$object = [
    'guid' => Guid::generate(),
    'name' => substr(md5(rand()), 0, 7),
    'data' => openssl_random_pseudo_bytes(10*1024),
];
// start measure
self::$data[$object['guid']] = $object;
// stop measure
But this shows normal memory usage
self::$data[$object['guid']] = serialize($object);
PHP's behavior in this situation is explained here:
How PHP manages variables
PHP only creates a reference until we make a meaningful change to the variable:
self::$data[$object['guid']] = $object;
self::$data[$object['guid']]['data'] = $object['data']; // reference
self::$data[$object['guid']]['data'] = $object['data'] . ''; // copy
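That copy-on-write behavior can be observed directly; a small sketch (the payload size is arbitrary):
<?php
// Sketch of PHP's copy-on-write behaviour (payload size is arbitrary).
// Assigning the array does not duplicate its data; writing to the copy does.
$original = array('data' => str_repeat('x', 10 * 1024 * 1024)); // ~10 MB string

$before = memory_get_usage();
$copy = $original;                  // reference only: no real copy yet
echo (memory_get_usage() - $before) . " bytes after assignment\n";

$copy['data'] .= 'y';               // meaningful change forces the actual copy
echo (memory_get_usage() - $before) . " bytes after modification\n";
?>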

PHP Memory Debugging

For one of my projects I need to import a very large text file (~950 MB). I'm using Symfony2 and Doctrine 2.
My problem is that I get errors like:
Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 24 bytes)
The error occurs even if I increase the memory limit to 1 GB.
I tried to analyze the problem using Xdebug and KCachegrind (as part of PHPEdit), but I don't really understand the values :(
I'm looking for a tool or a method (quick and simple, since I don't have much time) to find out why memory is allocated and not freed again.
Edit
To clear some things up, here is my code:
$handle = fopen($geonameBasePath . 'allCountries.txt', 'r');
$i = 0;
$batchSize = 100;
if ($handle) {
    while (($buffer = fgets($handle, 16384)) !== false) {
        if ($buffer[0] == '#') //skip comments
            continue;
        //split parts
        $parts = explode("\t", $buffer);
        if ($parts[6] != 'P')
            continue;
        if ($i % $batchSize == 0) {
            echo 'Flush & Clear' . PHP_EOL;
            $em->flush();
            $em->clear();
        }
        $entity = $em->getRepository('MyApplicationBundle:City')->findOneByGeonameId($parts[0]);
        if ($entity !== null) {
            $i++;
            continue;
        }
        //create city object
        $city = new City();
        $city->setGeonameId($parts[0]);
        $city->setName($parts[1]);
        $city->setInternationalName($parts[2]);
        $city->setLatitude($parts[4]);
        $city->setLongitude($parts[5]);
        $city->setCountry($em->getRepository('MyApplicationBundle:Country')->findOneByIsoCode($parts[8]));
        $em->persist($city);
        unset($city);
        unset($entity);
        unset($parts);
        unset($buffer);
        echo $i . PHP_EOL;
        $i++;
    }
}
fclose($handle);
Things I have tried, but nothing helped:
Adding second parameter to fgets
Increasing memory_limit
Unsetting vars
Increasing the memory limit is not going to be enough. When importing files like that, you should buffer the reading:
$f = fopen('yourfile', 'r');
while (($data = fread($f, 4096)) !== false && $data !== '') {
    // Do your stuff using the read $data
}
fclose($f);
Update:
When working with an ORM, you have to understand that nothing is actually inserted into the database until the flush call. That means all those objects are stored by the ORM, tagged as "to be inserted". Only when flush is called does the ORM go through the collection and start inserting.
Solution 1: Flush often, and clear.
Solution 2: Don't use the ORM; go for plain SQL commands. They take up far less memory than the object + ORM solution.
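A minimal sketch of solution 1 for a Doctrine 2 batch import (the $em entity manager, the batch size and the two helper functions are assumptions, not code from the question):
<?php
// Sketch of "flush often, and clear" with Doctrine 2. $em is assumed to be
// an existing EntityManager; readNextLine() and buildCityFromLine() are
// hypothetical helpers standing in for the parsing code above.
$batchSize = 100;
$i = 0;

while (($line = readNextLine()) !== false) {
    $city = buildCityFromLine($line);
    $em->persist($city);

    if (++$i % $batchSize === 0) {
        $em->flush();        // push the pending inserts to the database
        $em->clear();        // detach all managed entities from the unit of work
        gc_collect_cycles(); // let PHP reclaim circular references
    }
}
$em->flush();                // flush the final partial batch
$em->clear();
?>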
33554432 bytes is 32 MB.
Change the memory limit in php.ini, for example to 75 MB:
memory_limit = 75M
and restart the server.
Instead of reading the whole file at once, you should read it line by line and process the data as you go. Do NOT try to fit everything into memory; you will fail. The reason is that even if the text file fits into RAM, you will not also be able to hold the data as PHP objects/variables at the same time, since PHP itself needs a much larger amount of memory for each of them.
What I instead suggest is (see the sketch after this list):
a) read a new line,
b) parse the data in the line,
c) create the new object to store in the database,
d) go to step a, unset()ting the old object first or reusing its memory.
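A sketch of steps a) to d) without the ORM, using plain PDO (the connection details, table layout and the columns inserted are made up for illustration):
<?php
// Sketch of the line-by-line approach with plain PDO instead of the ORM.
// Credentials, table name and which columns are inserted are illustrative.
$pdo  = new PDO('mysql:host=localhost;dbname=geonames', 'user', 'pass');
$stmt = $pdo->prepare('INSERT INTO city (geoname_id, name) VALUES (?, ?)');

$handle = fopen('allCountries.txt', 'r');
while (($line = fgets($handle)) !== false) {      // a) read a new line
    if ($line[0] === '#') {
        continue;                                  // skip comments
    }
    $parts = explode("\t", $line);                 // b) parse the data in the line
    $stmt->execute(array($parts[0], $parts[1]));   // c) store it in the database
    unset($parts, $line);                          // d) release before the next line
}
fclose($handle);
?>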
