Memory usage goes wild with Doctrine bulk insert - PHP

I'm trying to insert a large amount of data (30,000+ rows) into a MySQL database using Doctrine2 and the Symfony2 fixture bundle.
I looked for the right way to do it. I saw lots of questions about memory leaks and Doctrine, but no answer that satisfied me. The Doctrine clear() function often comes up.
So I tried several variations of this:
while ($data = getData()) {
    $iteration++;
    $obj = new EntityObject();
    $obj->setName('henry');
    // Fill object...
    $manager->persist($obj);
    if ($iteration % 500 == 0) {
        $manager->flush();
        $manager->clear();
        // Also tried variations of:
        // $manager->clear($obj);
        // $manager->detach($obj);
        // gc_collect_cycles();
    }
}
PHP memory still goes wild, right after the flush() (I'm sure of that). In fact, every time the entities are flushed, memory goes up by a certain amount depending on the batch size and the entities, until it reaches the deadly "Allowed memory size exhausted" error. With a very, very tiny entity it works, but memory consumption still increases far too much: several MB when it should be KB.
clear(), detach() and calling the GC don't seem to have any effect at all; they only free a few KB.
Is my approach flawed? Did I miss something, somewhere? Is it a bug?
More info:
- Without flush(), memory barely moves;
- Lowering the batch size does not change the outcome;
- The data comes from a CSV that needs to be sanitized.
EDIT (partial solution):
@qooplmao brought a solution that significantly decreases memory consumption: disable the Doctrine SQL logger with $manager->getConnection()->getConfiguration()->setSQLLogger(null);
However, memory usage is still abnormally high and keeps increasing.

I resolved my problem using this resource, as @Axalix suggested.
This is how I modified the code:
// IMPORTANT - Disable the Doctrine SQL logger
$manager->getConnection()->getConfiguration()->setSQLLogger(null);

// SUGGESTION - make getData() a generator (using yield) to save even more memory
while ($data = getData()) {
    $iteration++;
    $obj = new EntityObject();
    $obj->setName('henry');
    // Fill object...
    $manager->persist($obj);
    // IMPORTANT - Temporarily store the entities
    // (the array must be defined outside of the loop first)
    $tempObjects[] = $obj;
    if ($iteration % 500 == 0) {
        $manager->flush();
        // IMPORTANT - Detach the stored entities
        foreach ($tempObjects as $tempObject) {
            $manager->detach($tempObject);
        }
        $tempObjects = null;
        gc_enable();
        gc_collect_cycles();
    }
}
// Do not forget the last flush
$manager->flush();
And, last but not least, since I use this script with Symfony data fixtures, adding the --no-debug flag to the command is also very important. With all of that, memory consumption is stable.
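On the generator suggestion in the comment above: a minimal sketch, assuming the rows come from a CSV file (the path is hypothetical). A generator keeps only the current line in memory instead of the whole file; the loop then becomes foreach (getData() as $data) instead of the while shown above:

function getData()
{
    $handle = fopen('/path/to/data.csv', 'r'); // hypothetical path
    while (($row = fgetcsv($handle)) !== false) {
        yield $row; // sanitize the row here if needed
    }
    fclose($handle);
}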

I found out that Doctrine logs all SQL queries during execution. I recommend disabling this with the code below; it can really save memory:
use Doctrine\ORM\EntityManagerInterface;

public function __construct(EntityManagerInterface $entity_manager)
{
    $em_connection = $entity_manager->getConnection();
    $em_connection->getConfiguration()->setSQLLogger(null);
}

My suggestion is to drop the Doctrine approach for bulk inserts. I really like Doctrine, but I just hate this kind of stuff with bulk inserts.
MySQL has a great feature called LOAD DATA. I would rather use it, even if I have to sanitize my CSV first and do the LOAD afterwards.
If you need to change the values, I would read the CSV into an array with $csvData = array_map("str_getcsv", file($csv));, change whatever you need in the array, and save it back to a file. After that, use the new .csv with LOAD DATA in MySQL.
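A hedged sketch of that flow with PDO; the file paths, table name, and sanitizing step are illustrative assumptions, and LOAD DATA LOCAL INFILE must be enabled on both the client and the server:

// Read and sanitize the CSV in PHP
$csvData = array_map("str_getcsv", file('input.csv'));
foreach ($csvData as &$row) {
    $row = array_map('trim', $row); // whatever sanitizing you need
}
unset($row);

// Write the cleaned rows back out
$fp = fopen('clean.csv', 'w');
foreach ($csvData as $row) {
    fputcsv($fp, $row);
}
fclose($fp);

// Bulk-load the cleaned file in one statement
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass', [
    PDO::MYSQL_ATTR_LOCAL_INFILE => true,
]);
$pdo->exec("LOAD DATA LOCAL INFILE 'clean.csv'
    INTO TABLE my_table
    FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
    LINES TERMINATED BY '\\n'");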
The reasons why I wouldn't use Doctrine for this are described at the top.

Related

Doctrine Paginator fills up memory

I have a Symfony command that uses the Doctrine Paginator on PHP 7.0.22. The command must process data from a large table, so I do it in chunks of 100 items. The issue is that after a few hundred loops it fills up 256M of RAM. As measures against OOM (out-of-memory) errors I use:
- $em->getConnection()->getConfiguration()->setSQLLogger(null); - disables the SQL logger, which fills memory with logged queries in scripts that run many SQL commands;
- $em->clear(); - detaches all objects from Doctrine at the end of every loop.
I've put in some dumps with memory_get_usage() to check what's going on, and it seems that the collector doesn't clean up as much as the command adds on every $paginator->getIterator()->getArrayCopy(); call.
I've even tried to manually collect the garbage on every loop with gc_collect_cycles(), but there is still no difference: the command starts at 18M and gains ~2M every few hundred items. I also tried to manually unset the results and the query builder... nothing. I removed all the data processing and kept only the select query and the paginator, and got the same behaviour.
Does anyone have any idea where I should look next?
Note: 256M should be more than enough for this kind of operation, so please don't recommend solutions that suggest increasing the allowed memory.
The stripped-down execute() method looks something like this:
protected function execute(InputInterface $input, OutputInterface $output)
{
    // Remove SQL logger to avoid out of memory errors
    $em = $this->getEntityManager(); // method defined in base class
    $em->getConnection()->getConfiguration()->setSQLLogger(null);

    $firstResult = 0;
    // Get latest ID
    $maxId = $this->getMaxIdInTable('AppBundle:MyEntity'); // method defined in base class
    $this->getLogger()->info('Working for max media id: ' . $maxId);

    do {
        // Get data
        $dbItemsQuery = $em->createQueryBuilder()
            ->select('m')
            ->from('AppBundle:MyEntity', 'm')
            ->where('m.id <= :maxId')
            ->setParameter('maxId', $maxId)
            ->setFirstResult($firstResult)
            ->setMaxResults(self::PAGE_SIZE)
        ;
        $paginator = new Paginator($dbItemsQuery);
        $dbItems = $paginator->getIterator()->getArrayCopy();

        $totalCount = count($paginator);
        $currentPageCount = count($dbItems);

        // Clear Doctrine objects from memory
        $em->clear();

        // Update first result
        $firstResult += $currentPageCount;
        $output->writeln($firstResult);
    } while ($currentPageCount == self::PAGE_SIZE);

    // Finish message
    $output->writeln("\n\n<info>Done running <comment>" . $this->getName() . "</comment></info>\n");
}
The memory leak was generated by the Doctrine Paginator. I replaced it with a native query using Doctrine prepared statements, and that fixed it (see the sketch after this list).
Other things that you should take into consideration:
- If you are replacing the Doctrine Paginator, you should rebuild the pagination functionality by adding a limit to your query.
- Run your command with the --no-debug flag or --env=prod, or maybe both. The thing is that commands run in the dev environment by default, which enables some data collectors that are not used in the prod environment. See more on this topic in the Symfony documentation - How to Use the Console.
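A minimal sketch of that native-query replacement, assuming a Doctrine DBAL 2.x connection; the table name my_entity and the ORDER BY are illustrative, not from the original command:

$conn = $em->getConnection();
$sql = 'SELECT m.* FROM my_entity m WHERE m.id <= :maxId ORDER BY m.id LIMIT :limit OFFSET :offset';
$stmt = $conn->prepare($sql);
$stmt->bindValue('maxId', $maxId, \PDO::PARAM_INT);
// LIMIT/OFFSET take over the Paginator's setMaxResults()/setFirstResult() job
$stmt->bindValue('limit', self::PAGE_SIZE, \PDO::PARAM_INT);
$stmt->bindValue('offset', $firstResult, \PDO::PARAM_INT);
$stmt->execute();
$dbItems = $stmt->fetchAll(); // plain arrays instead of hydrated entities, so nothing piles up in the unit of work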
Edit: In my particular case I was also using eightpoints/guzzle-bundle, which wraps the Guzzle HTTP library (I had some API calls in my command). This bundle was also leaking, apparently through some middleware. To fix it, I had to instantiate the Guzzle client independently, without the EightPoints bundle.

Is there a way to free the memory allotted to a Symfony 2 form object?

I am trying to optimize an import operation that uses Symfony 2 forms to validate data during the import process, a bit like this much-simplified example:
// For each imported row
foreach ($importedData as $row) {
    $user = new User;
    $userType = new UserType;
    // !! The 250 KB of memory allotted in this line is never released again !!
    $form = $symfonyFormFactory->create($userType, $user);
    // This line allots a bunch of memory as well,
    // but it's released when the EM is flushed at the end
    $form->bind($row);
    if (!$form->isValid()) {
        // Error handling
    }
    $entityManager->persist($user);
    $entityManager->flush();
}
On each of those loops, memory usage increases by around 250 KB, which is crippling for large imports.
I have discovered that the memory leak comes from the $form = $symfonyFormFactory->create(new UserType, $user); line.
EDIT: The bulk of the memory was being used by the entity manager, not the form component (see the accepted answer). But the loop still takes 55 KB per iteration, which is better than 250 KB but could be better still. Just not today.
Try disabling SQL logging to reduce memory usage:
$entityManager->getConnection()->getConfiguration()->setSQLLogger(null);
Also, I just found a similar question here.
Are you sure you don't want to release the entity objects? First of all, don't flush the data on every iteration. Persist, let's say, 50 objects - I don't know how big your import data is - flush them together, and then clear the objects by simply calling $entityManager->clear(); (a sketch follows below).
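A minimal sketch of that batching suggestion, reusing the question's variable names; the batch size of 50 is the one from the answer, and the form handling is elided:

$batchSize = 50;
$i = 0;
foreach ($importedData as $row) {
    $user = new User;
    // ...build and validate the form for $user as before...
    $entityManager->persist($user);
    if (++$i % $batchSize === 0) {
        $entityManager->flush();  // one flush per batch instead of per row
        $entityManager->clear();  // detach the flushed entities so they can be collected
    }
}
$entityManager->flush(); // flush the remaining objects
$entityManager->clear();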

How to correctly close an Entity Manager in doctrine

I was having a memory leak problem in a Doctrine2 script that was apparently caused by a piece of code that was supposed to eliminate memory problems.
Before I knew you could (and should) clear the Entity Manager, every 20 iterations I did the following:
if ($this->usersCalculated % 20 == 0) {
    $this->em->close();
    $this->em = \Bootstrap::createEm();
    $this->loadRepositories();
}
And the Bootstrap::createEm looks like this:
public static function createEm() {
    $em = EntityManager::create(Bootstrap::$connectionOptions, Bootstrap::$config);
    $em->getConnection()->setCharset('utf8');
    return $em;
}
The reason I recreated the Entity Manager in the first place was that my UnitOfWork was growing wildly and I didn't know about the $em->clear() method.
So, even if my current memory leak seems solved at the moment (or at least reduced), I still have to create a new Entity Manager whenever I need to do a separate insert/update query without relying on someone else doing the flush. For example, whenever I send an email, I insert a row in the database to record it, and the code looks like this:
$emailSent = new \model\EmailSent();
$emailSent->setStuff();

// I do it in a new em so it doesn't affect whatever the current unit of work was doing.
$newEm = \Bootstrap::createEm();
$newEm->persist($emailSent);
$newEm->flush();
$newEm->close();
From what I've learned before, that leaves some memory leaked behind.
So my question is: what am I doing wrong here? Why is this leaking memory, and how should I really close/recreate an entity manager?
Have you tried:
$this->em->getConnection()->getConfiguration()->setSQLLogger(null);
I've read that this turns off the SQL logger, which is never cleared and sometimes produces memory leaks like the one you are experiencing.
Have you tried actually using the clear method instead of close?
I hope this helps you: Batch Processing

Where is my memory leak in this Doctrine 1.2 code?

I am trying to reduce the memory usage of a large loop script, so I made this little test. Using Doctrine, I run this code:
$new_user_entry = getById($new_user_entries[0]['id']);
unset($new_user_entry);

$new_user_entry = getById($new_user_entries[1]['id']);
unset($new_user_entry);

function getById($holding_id)
{
    return Doctrine_Core::getTable('UserHoldingTable')->findOneById($holding_id);
}
But it leaves about another 50 KB in memory each time I do the getById and unset, and I don't know why or how to change that. I have a loop that goes through thousands of these, plus a couple of other functions, and this is creating an issue.
I was unable to find a better solution, so I gave up on Doctrine for this function and did a manual query with mysqli. This sidestepped the issue and everything worked, though it's not ideal.
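A hedged sketch of what such a mysqli fallback could look like; the connection details and table/column names are assumptions, and get_result() requires the mysqlnd driver:

$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

function getById(mysqli $mysqli, $holding_id)
{
    $stmt = $mysqli->prepare('SELECT * FROM user_holding WHERE id = ? LIMIT 1');
    $stmt->bind_param('i', $holding_id);
    $stmt->execute();
    $row = $stmt->get_result()->fetch_assoc(); // a plain array, no hydrated record kept in memory
    $stmt->close();
    return $row;
}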

php/symfony/doctrine memory leak?

I'm having problems with a batch insertion of objects into a database using symfony 1.4 and doctrine 1.2.
My model has a certain kind of object called "Sector", each of which has several objects of type "Cupo" (usually ranging from 50 up to 200000). These objects are pretty small: just a short identifier string and one or two integers. Whenever a group of Sectors is created by the user, I need to automatically add all these instances of "Cupo" to the database. In case anything goes wrong, I'm using a doctrine transaction to roll everything back. The problem is that I can only create around 2000 instances before php runs out of memory. It currently has a 128MB limit, which should be more than enough for handling objects that use less than 100 bytes each. I've tried increasing the memory limit up to 512MB, but php still crashes, so that doesn't solve the problem. Am I doing the batch insertion correctly, or is there a better way?
Here's the error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /Users/yo/Sites/grifoo/lib/vendor/symfony/lib/log/sfVarLogger.class.php on line 170
And here's the code:
public function save($conn = null)
{
    $conn = $conn ? $conn : Doctrine_Manager::connection();
    $conn->beginTransaction();
    try {
        $evento = $this->object;
        foreach ($evento->getSectores() as $s) {
            for ($j = 0; $j < $s->getCapacity(); $j++) {
                $cupo = new Cupo();
                $cupo->setActivo($s->getActivo());
                $cupo->setEventoId($s->getEventoId());
                $cupo->setNombre($j);
                $cupo->setSector($s);
                $cupo->save();
            }
        }
        $conn->commit();
        return;
    } catch (Exception $e) {
        $conn->rollback();
        throw $e;
    }
}
Once again, this code works fine for fewer than 1000 objects, but anything bigger than 1500 fails. Thanks for the help.
I tried doing
$cupo->save();
$cupo->free();
$cupo = null;
(but substituting my own code) and I'm still getting memory overflows. Any other ideas, SO?
Update:
I created a new environment in my databases.yml that looks like:
all:
  doctrine:
    class: sfDoctrineDatabase
    param:
      dsn: 'mysql:host=localhost;dbname=.......'
      username: .....
      password: .....
      profiler: false
The profiler: false entry disables doctrine's query logging, which normally keeps a copy of every query you make. It didn't stop the memory leakage, but I was able to get about twice as far through my data import as I could without it.
Update 2
I added
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true);
before running my queries, and changed
$cupo = null;
to
unset($cupo);
And now my script has been churning away happily. I'm pretty sure it will finish without running out of RAM this time.
Update 3
Yup. That's the winning combo.
I just did a "daemonized" script with symfony 1.4, and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
For a symfony task, I also faced this issue and did the following things (they worked for me; a combined sketch follows after this list).
- Disable debug mode. Add the following before the db connection is initialized:
  sfConfig::set('sf_debug', false);
- Set the auto-free-query-objects attribute on the db connection:
  $connection->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true);
- Free every object after use: $object_name->free()
- Unset all arrays after use: unset($array_name)
- Check all doctrine queries used in the task and free each query after use: $q->free() (this is good practice any time you use a query).
That's all. Hope it may help someone.
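A combined sketch of those steps in one place, assuming a symfony 1.4 task and an illustrative model name (Cupo):

sfConfig::set('sf_debug', false); // disable debug mode before the connection is initialized

$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase('doctrine')->getConnection();
$connection->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true);

$q = Doctrine_Core::getTable('Cupo')->createQuery('c');
$rows = $q->execute();
foreach ($rows as $row) {
    // ...work with $row...
    $row->free(); // free each object after use
}
unset($rows); // unset arrays after use
$q->free();   // free the query after use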
Doctrine leaks, and there's not much you can do about it. Make sure you use $q->free() whenever applicable to minimize the effect.
Doctrine is not meant for maintenance scripts. The only way to work around this problem is to break your script into parts that each perform a portion of the task. One way to do that is to add a start parameter to your script; after a certain number of objects has been processed, the script redirects to itself with a higher start value (a sketch of this pattern follows). This works well for me, although it makes writing maintenance scripts more cumbersome.
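One way that restart pattern can look for a plain CLI variant (a sketch; processRecords() and the chunk size are hypothetical stand-ins for your own batch logic):

// chunked_import.php
$start = isset($argv[1]) ? (int) $argv[1] : 0;
$chunkSize = 1000;

$processed = processRecords($start, $chunkSize); // your own batch routine (hypothetical)

if ($processed === $chunkSize) {
    // More records may remain: re-launch this script with a higher start value
    // so that all of PHP's memory is released between chunks.
    passthru(PHP_BINARY . ' ' . escapeshellarg(__FILE__) . ' ' . ($start + $chunkSize));
}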
Try unset($cupo); after every save. That should help. Another option is to split the script and do some batch processing.
Try to break the circular references that usually cause memory leaks with
$cupo->save();
$cupo->free(); // this call
as described in the Doctrine manual.
For me, I just initialized the task like this:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();

$config = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', true);
sfContext::createInstance($config);
(with the prod configuration)
and used free() after each save() on doctrine's objects.
The memory stays stable at 25 MB:
memory_get_usage = 26.884071350098 MB
with PHP 5.3 on Debian Squeeze.
Periodically close and re-open the connection - not sure why, but it seems PDO retains references.
What works for me is calling the free method like this:
$cupo->save();
$cupo->free(true); // also free the related components
unset($cupo);
