I have a Symfony command that uses the Doctrine Paginator on PHP 7.0.22. The command must process data from a large table, so I do it in chunks of 100 items. The issue is that after a few hundred loops it fills the 256M of RAM. As measures against OOM (out-of-memory) errors I use:
$em->getConnection()->getConfiguration()->setSQLLogger(null); - disables the SQL logger, which fills memory with logged queries in scripts that run many SQL commands
$em->clear(); - detaches all objects from Doctrine at the end of every loop
I've put some dumps with memory_get_usage() to check what's going on and it seems that the collector doesn't clean as much as the command adds at every $paginator->getIterator()->getArrayCopy(); call.
I've even tried to collect the garbage manually on every loop with gc_collect_cycles(), but it made no difference: the command starts out using 18M and grows by ~2M every few hundred items. I also tried to manually unset the results and the query builder... nothing. I removed all the data processing and kept only the select query and the paginator, and got the same behaviour.
Does anyone have any idea where I should look next?
Note: 256M should be more than enough for this kind of operation, so please don't recommend solutions that suggest increasing the allowed memory.
The stripped-down execute() method looks something like this:
protected function execute(InputInterface $input, OutputInterface $output)
{
    // Remove SQL logger to avoid out of memory errors
    $em = $this->getEntityManager(); // method defined in base class
    $em->getConnection()->getConfiguration()->setSQLLogger(null);
    $firstResult = 0;
    // Get latest ID
    $maxId = $this->getMaxIdInTable('AppBundle:MyEntity'); // method defined in base class
    $this->getLogger()->info('Working for max media id: ' . $maxId);
    do {
        // Get data
        $dbItemsQuery = $em->createQueryBuilder()
            ->select('m')
            ->from('AppBundle:MyEntity', 'm')
            ->where('m.id <= :maxId')
            ->setParameter('maxId', $maxId)
            ->setFirstResult($firstResult)
            ->setMaxResults(self::PAGE_SIZE)
        ;
        $paginator = new Paginator($dbItemsQuery);
        $dbItems = $paginator->getIterator()->getArrayCopy();
        $totalCount = count($paginator);
        $currentPageCount = count($dbItems);
        // Clear Doctrine objects from memory
        $em->clear();
        // Update first result
        $firstResult += $currentPageCount;
        $output->writeln($firstResult);
    } while ($currentPageCount == self::PAGE_SIZE);
    // Finish message
    $output->writeln("\n\n<info>Done running <comment>" . $this->getName() . "</comment></info>\n");
}
The memory leak was caused by the Doctrine Paginator. I replaced it with a native query using Doctrine prepared statements, which fixed it.
Other things that you should take into consideration:
If you are replacing the Doctrine Paginator, you should rebuild the pagination functionality, by adding a limit to your query.
Run your command with the --no-debug flag or --env=prod, or maybe both. Commands run in the dev environment by default, which enables some data collectors that are not used in the prod environment. See more on this topic in the Symfony documentation: How to Use the Console
Edit: In my particular case I was also using the bundle eightpoints/guzzle-bundle that implements the HTTP Guzzle library (had some API calls in my command). This bundle was also leaking, apparently through some middleware. To fix this, I had to instantiate the Guzzle client independently, without the EightPoints bundle.
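For readers who want to see what "a native query with a limit" could look like, here is a minimal sketch. It is not the code from the answer: it uses plain PDO with an in-memory SQLite table as a stand-in for the real MySQL table, the table name my_entity and the helper processInChunks() are hypothetical, and it pages by the last seen id rather than by offset. In the actual command the prepared statement would presumably come from $em->getConnection()->prepare() instead.

```php
<?php
// Stand-in data source (assumption): an in-memory SQLite table with 250 rows.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE my_entity (id INTEGER PRIMARY KEY, name TEXT)');
$insert = $pdo->prepare('INSERT INTO my_entity (name) VALUES (:name)');
for ($i = 1; $i <= 250; $i++) {
    $insert->execute([':name' => 'item ' . $i]);
}

/**
 * Hypothetical helper: reads the table in pages of $pageSize rows using one
 * prepared statement, and hands every row to $handler as a plain array.
 * Returns the number of rows processed.
 */
function processInChunks(PDO $pdo, int $pageSize, callable $handler): int
{
    $select = $pdo->prepare(
        'SELECT id, name FROM my_entity WHERE id > :lastId ORDER BY id LIMIT :limit'
    );
    $lastId = 0;
    $processed = 0;
    do {
        $select->bindValue(':lastId', $lastId, PDO::PARAM_INT);
        $select->bindValue(':limit', $pageSize, PDO::PARAM_INT);
        $select->execute();
        $rows = $select->fetchAll(PDO::FETCH_ASSOC);
        foreach ($rows as $row) {
            $handler($row);
            $lastId = (int) $row['id']; // remember where the next page starts
            $processed++;
        }
        $pageCount = count($rows);
    } while ($pageCount === $pageSize); // a short page means we reached the end

    return $processed;
}

$total = processInChunks($pdo, 100, function (array $row) {
    // data processing would go here
});
echo $total, PHP_EOL;
```

Because the rows are fetched as arrays, nothing is registered with the entity manager, so there is nothing for Doctrine to keep alive between pages.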
Related
I have an API written in Laravel. There is the following code in it:
public function getData($cacheKey)
{
    if (Cache::has($cacheKey)) {
        return Cache::get($cacheKey);
    }
    // if cache is empty for the key, get data from external service
    $dataFromService = $this->makeRequest($cacheKey);
    $dataMapped = array_map([$this->transformer, 'transformData'], $dataFromService);
    Cache::put($cacheKey, $dataMapped);
    return $dataMapped;
}
In getData(), if the cache contains the requested key, the data is returned from the cache.
If the cache does not have the key, the data is fetched from an external API, processed, placed in the cache, and then returned.
The problem is that when there are many concurrent requests to the method, the data is corrupted. I guess the data is written to the cache incorrectly because of race conditions.
You seem to be experiencing some sort of critical section problem. But here's the thing: Redis operations are atomic, but Laravel does its own checks before calling Redis.
The major issue here is that every concurrent request will trigger a request to the external service, and then all of them will write their results to the cache (which is definitely not good). I would suggest implementing a simple mutual exclusion lock in your code.
Replace your current method body with the following:
public function getData($cacheKey)
{
    $mutexKey = "getDataMutex";
    if (!Redis::setnx($mutexKey, true)) {
        // Already running: you can either busy-wait until the cache key is ready, or fail this request and assume that another one will succeed.
        // Definitely don't trust what the cache says at this point.
    }
    // rememberForever() is just a convenience method; it doesn't change anything.
    // The closure has to import $cacheKey with use(), otherwise it is undefined inside it.
    $value = Cache::rememberForever($cacheKey, function () use ($cacheKey) {
        $dataFromService = $this->makeRequest($cacheKey);
        $dataMapped = array_map([$this->transformer, 'transformData'], $dataFromService);
        return $dataMapped;
    });
    Redis::del($mutexKey);
    return $value;
}
setnx is a native Redis command that sets a value if it doesn't already exist. This is done atomically, so it can be used to implement a simple locking mechanism, but (as mentioned in the manual) it will not work if you're using a Redis cluster. In that case the Redis manual describes a method to implement distributed locks.
In the end I came to the following solution: I use the retry() function from the Laravel 5.5 helpers to read the cache value, at an interval of 1 second, until it has been written there properly.
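To illustrate the retry idea from the last answer without pulling in Laravel, here is a rough sketch in plain PHP. The function retryUntilReady() is a hypothetical helper written for this example (it is not Laravel's retry()), and the $cache array is only a stand-in for the real cache store.

```php
<?php
/**
 * Hypothetical helper: calls $attempt up to $times times and returns its
 * result. When $attempt throws a RuntimeException, waits $sleepMs
 * milliseconds and tries again; the last exception is rethrown when all
 * attempts fail.
 */
function retryUntilReady(int $times, callable $attempt, int $sleepMs = 0)
{
    if ($times < 1) {
        throw new InvalidArgumentException('$times must be at least 1');
    }
    $lastError = null;
    for ($try = 1; $try <= $times; $try++) {
        try {
            return $attempt();
        } catch (RuntimeException $e) {
            $lastError = $e;
            if ($try < $times && $sleepMs > 0) {
                usleep($sleepMs * 1000);
            }
        }
    }
    throw $lastError;
}

// Stand-in for the cache (assumption): the value only shows up on the third read,
// as if another request had finished writing it in the meantime.
$cache = [];
$reads = 0;
$value = retryUntilReady(5, function () use (&$cache, &$reads) {
    $reads++;
    if ($reads === 3) {
        $cache['report'] = ['a' => 1];
    }
    if (!isset($cache['report'])) {
        throw new RuntimeException('cache key not written yet');
    }
    return $cache['report'];
}, 0);

var_dump($value, $reads);
```

In the real application the sleep would be the 1 second mentioned above (1000 ms here); it is set to 0 in the demo only so that the example runs instantly.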
I'm creating a console command for my bundle with Symfony 2. This command executes several requests to the database (MySQL). In order to debug my command I need to know how many SQL queries have been executed during the command's execution and, if possible, to show these queries (like the Symfony profiler does).
I have the same problem with AJAX requests. When I make an AJAX request, I can't tell how many queries have been executed during the request.
You can enable Doctrine logging like this:
// Either of the next two lines fetches the Doctrine registry; only one is needed
$doctrine = $this->get('doctrine');
$doctrine = $this->getDoctrine();
// Note: despite its name, $em holds the DBAL connection here
$em = $doctrine->getConnection();
// $doctrine->getManager() did not work for me
// (resulted in $stack->queries being empty array)
$stack = new \Doctrine\DBAL\Logging\DebugStack();
$em->getConfiguration()->setSQLLogger($stack);
... // do some queries
var_dump($stack->queries);
You can read more here: http://vvv.tobiassjosten.net/symfony/logging-doctrine-queries-in-symfony2/
To render unto Caesar what is Caesar's: I found it here: Count queries to database in Doctrine2
You can put all this logic into the domain model and treat the command only as an invoker. Then you can call the same domain model from a controller, over the web, and use the profiler to diagnose it.
The second thing is that you should have an integration test for this, and you can verify the execution time with that test.
I’m trying to insert a large amount of data (30 000+ lines) into a MySQL database using Doctrine2 and the Symfony2 fixture bundle.
I looked for the right way to do it. I saw lots of questions about memory leaks and Doctrine, but no answer that satisfied me. The Doctrine clear() function often comes up.
So, I tried various forms of this:
while ($data = getData()) {
    $iteration++;
    $obj = new EntityObject();
    $obj->setName('henry');
    // Fill object...
    $manager->persist($obj);
    if ($iteration % 500 == 0) {
        $manager->flush();
        $manager->clear();
        // Also tried some sort of:
        // $manager->clear($obj);
        // $manager->detach($obj);
        // gc_collect_cycles();
    }
}
PHP memory still goes wild, right after the flush() (I’m sure of that). In fact, every time the entities are flushed, memory goes up by a certain amount depending on the batch size and the entities, until it reaches the deadly "Allowed memory size exhausted" error. With a very, very tiny entity it works, but memory consumption still increases too much: several MB whereas it should be KB.
clear(), detach() or calling the GC don’t seem to have any effect at all. They only free a few KB.
Is my approach flawed? Did I miss something, somewhere? Is it a bug?
More info:
Without flush() memory barely moves;
Lowering the batch size does not change the outcome;
Data comes from a CSV that needs to be sanitized;
EDIT (partial solution):
@qooplmao brought a solution that significantly decreases memory consumption: disable the Doctrine SQL logger with $manager->getConnection()->getConfiguration()->setSQLLogger(null);
However, it is still abnormally high and increasing.
I resolved my problem using this resource, as @Axalix suggested.
This is how I modified the code:
// IMPORTANT - Disable the Doctrine SQL Logger
$manager->getConnection()->getConfiguration()->setSQLLogger(null);
// SUGGESTION - make getData a generator (using yield) to save more memory.
// IMPORTANT - Temporary store for entities, defined outside of the loop
$tempObjects = [];
$iteration = 0;
while ($data = getData()) {
    $iteration++;
    $obj = new EntityObject();
    $obj->setName('henry');
    // Fill object...
    $manager->persist($obj);
    // IMPORTANT - Temporarily store the entities
    $tempObjects[] = $obj;
    if ($iteration % 500 == 0) {
        $manager->flush();
        // IMPORTANT - clean entities
        foreach ($tempObjects as $tempObject) {
            $manager->detach($tempObject);
        }
        $tempObjects = [];
        gc_enable();
        gc_collect_cycles();
    }
}
// Do not forget the last flush
$manager->flush();
And, last but not least, as I use this script with Symfony data fixtures, adding the --no-debug parameter to the command is also very important. With it, memory consumption is stable.
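The "make getData a generator" suggestion in the code comment could look roughly like the sketch below. It assumes the data comes from a CSV file with a header row; the function name readRows() and the column names are made up for the example.

```php
<?php
/**
 * Hypothetical CSV reader: yields one associative row at a time, so only the
 * current line is held in memory instead of the whole file.
 */
function readRows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // The first line is assumed to contain the column names
        $header = fgetcsv($handle, 0, ',', '"', '\\');
        while (($fields = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
            if ($fields === [null]) {
                continue; // fgetcsv() reports a blank line as [null]
            }
            yield array_combine($header, $fields);
        }
    } finally {
        fclose($handle); // also runs when the consumer stops iterating early
    }
}

// Demo with a throwaway file (made-up data)
$path = tempnam(sys_get_temp_dir(), 'rows');
file_put_contents($path, "name,age\nhenry,42\nanna,37\n");

$names = [];
foreach (readRows($path) as $row) {
    $names[] = $row['name'];
}
unlink($path);

print_r($names);
```

With a generator the import loop becomes a foreach over readRows($path) instead of while ($data = getData()); the flush/detach logic inside the loop stays the same.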
I found out that Doctrine logs all SQL statements during execution. I recommend disabling the logger with the code below; it can really save memory:
use Doctrine\ORM\EntityManagerInterface;
public function __construct(EntityManagerInterface $entity_manager)
{
    $em_connection = $entity_manager->getConnection();
    $em_connection->getConfiguration()->setSQLLogger(null);
}
My suggestion is to drop the Doctrine approach for bulk inserts. I really like Doctrine but I just hate this kind of stuff on bulk inserts.
MySQL has a great thing called LOAD DATA. I would rather use it, even if I have to sanitize my CSV first and do the LOAD afterwards.
If you need to change the values, I would read the CSV into an array: $csvData = array_map("str_getcsv", file($csv));. Change whatever you need in the array and save it back to a file. After that, use the new .csv to LOAD with MySQL.
To support my claim about why I wouldn't use Doctrine for this, see the reasons described at the top.
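As a rough illustration of the "sanitize first, LOAD afterwards" idea, the sketch below streams a CSV row by row instead of reading it all into one array. The function sanitizeCsv() and its rules (trim every field, skip rows whose first column is empty) are assumptions made up for the example; the real sanitizing rules depend on your data.

```php
<?php
/**
 * Hypothetical sanitizer: copies $inPath to $outPath, trimming every field
 * and skipping rows whose first column is empty. Returns the rows written.
 */
function sanitizeCsv(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException('Cannot open the input or output file');
    }
    $written = 0;
    while (($fields = fgetcsv($in, 0, ',', '"', '\\')) !== false) {
        if ($fields === [null]) {
            continue; // blank line
        }
        $fields = array_map('trim', $fields);
        if ($fields[0] === '') {
            continue; // assumed rule: a row without a name is invalid
        }
        fputcsv($out, $fields, ',', '"', '\\');
        $written++;
    }
    fclose($in);
    fclose($out);

    return $written;
}

// Demo with throwaway files (made-up data)
$inPath = tempnam(sys_get_temp_dir(), 'in');
$outPath = tempnam(sys_get_temp_dir(), 'out');
file_put_contents($inPath, " henry ,42\n,13\nanna, 37 \n");

$written = sanitizeCsv($inPath, $outPath);
$cleaned = file_get_contents($outPath);
unlink($inPath);
unlink($outPath);

echo $written, PHP_EOL, $cleaned;
```

The cleaned file is what would then be handed to MySQL's LOAD DATA statement, so PHP never has to hold more than one row at a time.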
I was having a memory leak problem in a Doctrine2 script that was apparently caused by a piece of code that was supposed to eliminate memory problems.
Before I knew you could (and should) clear the Entity Manager, every 20 iterations I did the following:
if ($this->usersCalculated % 20 == 0) {
    $this->em->close();
    $this->em = \Bootstrap::createEm();
    $this->loadRepositories();
}
And the Bootstrap::createEm looks like this:
public static function createEm() {
    $em = EntityManager::create(Bootstrap::$connectionOptions, Bootstrap::$config);
    $em->getConnection()->setCharset('utf8');
    return $em;
}
The reason I recreated the Entity Manager in the first place was that my UnitOfWork was growing wild and I didn't know about the $em->clear() method.
So, even if my current memory leak seems solved at the moment (or at least reduced), I still have to create a new Entity Manager whenever I need to do a separate insert/update query without relying on someone else to do the flush. For example, whenever I send an email, I insert a row in the database to record it, and the code looks like this:
$emailSent = new \model\EmailSent();
$emailSent->setStuff();
// I do it in a new em to not affect whatever currentunit was doing.
$newEm = \Bootstrap::createEm();
$newEm->persist($emailSent);
$newEm->flush();
$newEm->close();
From what I've learned before, that leaves some leaked memory behind.
So my question is: what am I doing wrong here? Why is this leaking memory, and how should I really close/recreate an entity manager?
Have you tried:
$this->em->getConnection()->getConfiguration()->setSQLLogger(null);
I've read that this turns off the SQL logger, which is never cleared and sometimes produces memory leaks like the one you are experiencing.
Have you tried actually using the clear method instead of close?
I hope this helps you: Batch Processing
I'm having problems with a batch insertion of objects into a database using symfony 1.4 and doctrine 1.2.
My model has a certain kind of object called "Sector", each of which has several objects of type "Cupo" (usually ranging from 50 up to 200000). These objects are pretty small; just a short identifier string and one or two integers. Whenever a group of Sectors are created by the user, I need to automatically add all these instances of "Cupo" to the database. In case anything goes wrong, I'm using a doctrine transaction to roll back everything. The problem is that I can only create around 2000 instances before php runs out of memory. It currently has a 128MB limit, which should be more than enough for handling objects that use less than 100 bytes. I've tried increasing the memory limit up to 512MB, but php still crashes and that doesn't solve the problem. Am I doing the batch insertion correctly or is there a better way?
Here's the error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /Users/yo/Sites/grifoo/lib/vendor/symfony/lib/log/sfVarLogger.class.php on line 170
And here's the code:
public function save($conn = null)
{
    $conn = $conn ? $conn : Doctrine_Manager::connection();
    $conn->beginTransaction();
    try {
        $evento = $this->object;
        foreach ($evento->getSectores() as $s) {
            for ($j = 0; $j < $s->getCapacity(); $j++) {
                $cupo = new Cupo();
                $cupo->setActivo($s->getActivo());
                $cupo->setEventoId($s->getEventoId());
                $cupo->setNombre($j);
                $cupo->setSector($s);
                $cupo->save();
            }
        }
        $conn->commit();
        return;
    } catch (Exception $e) {
        $conn->rollback();
        throw $e;
    }
}
Once again, this code works fine for less than 1000 objects, but anything bigger than 1500 fails. Thanks for the help.
Tried doing
$cupo->save();
$cupo->free();
$cupo = null;
(But substituting my code) And I'm still getting memory overflows. Any other ideas, SO?
Update:
I created a new environment in my databases.yml, that looks like:
all:
doctrine:
class: sfDoctrineDatabase
param:
dsn: 'mysql:host=localhost;dbname=.......'
username: .....
password: .....
profiler: false
The profiler: false entry disables doctrine's query logging, that normally keeps a copy of every query you make. It didn't stop the memory leakage, but I was able to get about twice as far through my data importing as I was without it.
Update 2
I added
Doctrine_Manager::connection()->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
before running my queries, and changed
$cupo = null;
to
unset($cupo);
And now my script has been churning away happily. I'm pretty sure it will finish without running out of RAM this time.
Update 3
Yup. That's the winning combo.
I have just "daemonized" a script with symfony 1.4, and setting the following stopped the memory hogging:
sfConfig::set('sf_debug', false);
For a symfony task, I also faced this issue and did the following things. It worked for me.
Disable debug mode. Add the following before the db connection is initialized:
sfConfig::set('sf_debug', false);
Set the auto-free attribute for query objects on the db connection:
$connection->setAttribute(Doctrine_Core::ATTR_AUTO_FREE_QUERY_OBJECTS, true );
Free every object after use:
$object_name->free()
Unset all arrays after use: unset($array_name)
Check all Doctrine queries used in the task. Free every query after use: $q->free()
(This is good practice whenever you use queries.)
That's all. I hope it helps someone.
Doctrine leaks and there's not much you can do about it. Make sure you use $q->free() whenever applicable to minimize the effect.
Doctrine is not meant for maintenance scripts. The only way to work around this problem is to break your script into parts, each of which performs part of the task. One way to do that is to add a start parameter to your script; after a certain number of objects have been processed, the script redirects to itself with a higher start value. This works well for me, although it makes writing maintenance scripts more cumbersome.
Try to unset($cupo); after every save. This should help. Another option is to split the script and do some batch processing.
Try to break the circular references, which usually cause memory leaks, with
$cupo->save();
$cupo->free(); //this call
as described in the Doctrine manual.
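To see why breaking the references matters, here is a small plain-PHP illustration. It does not use Doctrine at all: the Node class is hypothetical and only mimics two records that point at each other, the way a record and its related component do.

```php
<?php
// Hypothetical class (not Doctrine's): counts how many instances were destroyed.
class Node
{
    public static $destroyed = 0;
    public $peer;

    public function __destruct()
    {
        self::$destroyed++;
    }
}

// Builds two objects that reference each other (a reference cycle).
function makePair(): array
{
    $a = new Node();
    $b = new Node();
    $a->peer = $b;
    $b->peer = $a;
    return [$a, $b];
}

gc_enable();

// 1) Cycle left intact: unset() alone does not destroy the objects,
//    they stay in memory until the cycle collector runs.
list($a, $b) = makePair();
unset($a, $b);
$afterUnset = Node::$destroyed;   // 0: both objects are still alive
gc_collect_cycles();
$afterGc = Node::$destroyed;      // 2: the collector found the cycle

// 2) Cycle broken first (roughly what free() does for a record):
//    the objects are destroyed as soon as the last variable is unset.
list($a, $b) = makePair();
$a->peer = null;
unset($a, $b);
$afterBreak = Node::$destroyed - $afterGc; // 2, without running the collector

echo "$afterUnset $afterGc $afterBreak", PHP_EOL;
```

In a long import the automatic collector only kicks in once enough candidate roots have piled up, which is one reason memory can climb between runs when the cycles are never broken explicitly.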
For me, I just initialized the task like this:
// initialize the database connection
$databaseManager = new sfDatabaseManager($this->configuration);
$connection = $databaseManager->getDatabase($options['connection'])->getConnection();
$config = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', true);
sfContext::createInstance($config);
(with the prod config)
and I use free() after each save() on the Doctrine object.
The memory is stable at 25 MB:
memory_get_usage=26.884071350098Mo
with PHP 5.3 on Debian squeeze
Periodically close and re-open the connection. I'm not sure why, but it seems PDO is retaining references.
What is working for me is calling the free method like this:
$cupo->save();
$cupo->free(true); // free also the related components
unset($cupo);