Doctrine Batch insert - entity manager clear - php

I would like to insert 10,000 rows into the database using batch processing.
In the first step I need to select some objects from the database, then iterate over them and persist another object for each of them.
Here is a code example:
$em = $this->getDoctrine()->getManager();
$products = $em->getRepository('MyBundle:Product')->findAll(); // returns 10 000 products
$category = $em->getRepository('MyBundle:Category')->find(1);
$batchSize = 100;

foreach ($products as $i => $product) {
    $entity = new TestEntity();
    $entity->setCategory($category);
    $entity->setProduct($product); // TestEntity and Product have a OneToOne mapping, with the foreign key in TestEntity
    $em->persist($entity);

    if ($i % $batchSize === 0) {
        $em->flush();
        $em->clear();
    }
}
$em->flush();
$em->clear();
It returns this error:
A new entity was found through the relationship 'Handel\GeneratorBundle\Entity\GenAdgroup#product' that was not configured to cascade persist operations for entity
I think the problem is in clear(), which detaches all objects in memory, including $products and $category.
If I use cascade={"persist"} on the association, Doctrine inserts a new category row into the DB.
After some more attempts I ran into some dirty entity errors.
Am I doing something wrong? What is the solution and best practice for this job?
Thanks a lot for any answer.

The solution is to clear only the objects you are changing/creating. The ones that stay constant should be left in the EntityManager.
Like this:
$em->clear(TestEntity::class);
$em->clear(...);
If you call clear() without a parameter, it detaches all objects currently managed by the entity manager. That means if you try to reuse them you get an error like the one above; for instance, a unique field will be duplicated and trigger that error.
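Applied to the loop from the question, a minimal sketch could look like this (note that the per-class clear($className) signature exists in Doctrine ORM 2.x but was deprecated in later versions):
$batchSize = 100;
foreach ($products as $i => $product) {
    $entity = new TestEntity();
    $entity->setCategory($category);
    $entity->setProduct($product);
    $em->persist($entity);

    if ($i % $batchSize === 0) {
        $em->flush();
        // Detach only the newly created TestEntity objects;
        // $category (and the $products) stay managed by the EntityManager.
        $em->clear(TestEntity::class);
    }
}
$em->flush();
$em->clear();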

After calling
$em->clear();
the Category object becomes detached.
You can try calling $em->merge($category) on it, but probably the most reliable way is to fetch it again:
if ($i % $batchSize === 0) {
    $em->flush();
    $em->clear();
    $category = $em->getRepository('MyBundle:Category')->find(1);
}
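Another option (not from the original answer, but used in a similar answer further down) is to re-create a cheap reference proxy instead of re-querying; getReference() does not hit the database:
if ($i % $batchSize === 0) {
    $em->flush();
    $em->clear();
    // getReference() returns an uninitialized proxy, so no SELECT is issued.
    // That is enough here, because only the identifier is needed for the association.
    $category = $em->getReference('MyBundle:Category', 1);
}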

Related

Doctrine bulk insert - How to fix "Out of memory" with bulk insert using Doctrine / Symfony 4

When I get metadata from a supplier, I convert the data to our own metadata format. But because of the sheer size of the imported data, the application throws an OutOfMemoryException.
I tried several things, like raising the memory limit, and I also tried Doctrine batch processing, but there is a small problem with that approach: the documented batch-processing example is based on a 'for' loop with an index.
$batchSize = 20;
for ($i = 1; $i <= 10000; ++$i) {
    $user = new CmsUser;
    $user->setStatus('user');
    $user->setUsername('user' . $i);
    $user->setName('Mr.Smith-' . $i);
    $em->persist($user);
    if (($i % $batchSize) === 0) {
        $em->flush();
        $em->clear(); // Detaches all objects from Doctrine!
    }
}
$em->flush(); // Persist objects that did not make up an entire batch
$em->clear();
But the data I import is a multi-layered array which I process in a three-level nested 'foreach' loop:
$this->index = 0;
$batchSize = 100;

foreach ($response as $responseItem) {
    $item = new Item;
    $item->setName($responseItem->name);
    $item->setStatus($responseItem->status);
    $em->persist($item);
    $this->index++;
    if (($this->index % $batchSize) === 0) {
        $em->flush();
        $em->clear();
    }
    foreach ($responseItem->category as $responseCategory) {
        $category = new Category;
        $category->setName($responseCategory->name);
        $category->setStatus($responseCategory->status);
        $em->persist($category);
        $this->index++;
        if (($this->index % $batchSize) === 0) {
            $em->flush();
            $em->clear();
        }
        foreach ($responseCategory->suppliers as $responseSupplier) {
            $supplier = new Supplier;
            $supplier->setName($responseSupplier->name);
            $supplier->setStatus($responseSupplier->status);
            $em->persist($supplier);
            $this->index++;
            if (($this->index % $batchSize) === 0) {
                $em->flush();
                $em->clear();
            }
        }
    }
}
$em->flush();
This is fictional code to illustrate my problem. With this, the application still gets an OutOfMemoryException, and I have the feeling the batching method isn't working properly.
I would like to get the memory usage down so the application works properly, or I'd like advice on finding another approach to this problem, like a background process that takes care of the import in the background.
The way you've written your nested foreach loops, you will obviously consume resources exponentially. I also suspect it's not going to achieve what you really want, since you will end up with a LOT of duplicate Suppliers and Categories.
Working with full entities in Doctrine also carries a tremendous amount of overhead, but it does have some advantages, so I'll assume you want to keep doing that.
My approach to bulk imports like this has been to work from the bottom up. In your case it might be a variant of what I have below. The assumption is that you have data in an existing database, and each existing "entity" in the old database will have its own unique id.
1- Import all suppliers from old db to new db; in the new db have a column named oldId that references the unique id from the old db. Stop to clear cache/memory.
2- Pull all suppliers from the new database into an array indexed by their oldId. I use code like so:
$suppliers = [];
$_suppliers = $this->em->getRepository(Supplier::class)->findAll();
foreach ($_suppliers as $supplier) {
    $suppliers[$supplier->getOldId()] = $supplier;
}
3- Repeat step 1 for categories. During the import, your old DB will have a reference to the oldId of the linked suppliers. Although your code does not do this, I assume you want to maintain the link between supplier and category, so you can now reference the supplier by its oldId inside a loop over the linked "old" suppliers (see the sketch after this list):
$category->addSupplier($suppliers[ <<oldSupplier Id>> ]);
4- Repeat above for individual items, only this time saving the linked categories.
Obviously there are a lot of tweaks that can improve on this. The main point is that touching each supplier once, then each category once, and then each item once, done sequentially, will be orders of magnitude faster and less resource intensive than trying to tackle it all in a deeply nested loop.
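To make steps 2 and 3 concrete, here is a rough sketch; the entity methods (setOldId(), addSupplier()) and the shape of the old-DB rows are assumptions for illustration, not part of the original code:
$batchSize = 100;
foreach ($oldCategoryRows as $i => $row) {
    $category = new Category();
    $category->setName($row['name']);
    $category->setOldId($row['id']);

    // Link the already-imported suppliers through the lookup array built in step 2.
    foreach ($row['supplier_ids'] as $oldSupplierId) {
        $category->addSupplier($suppliers[$oldSupplierId]);
    }

    $this->em->persist($category);

    if (($i % $batchSize) === 0) {
        // Flush in batches, but do not call clear() here:
        // clearing would detach the $suppliers lookup from step 2.
        $this->em->flush();
    }
}
$this->em->flush();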

Doctrine2: calling flush multiple times

I have a question concerning Doctrine and entities on Symfony 2.3.
According to the v2.3 "Documentation Book" chapter "Databases and Doctrine" > Saving Related Entities, the example simultaneously creates a new row in both the product and category tables and associates the product.category_id value with the id of the new category item.
The problem is that the controller action creates a new Product and a new Category every time it is invoked!
In order to just create a new product and associate its category_id with an existing category id, this is the routing.yml route:
acme_store_create_product_by_category:
    path:     /create/newproduct/{name}/{categoryId}
    defaults: { _controller: AcmeStoreBundle:Default:createProduct }
I made a test passing parameters via URL:
/web/store/create/newproduct/Kayak/12
I did something like this, which seems to work fine:
public function createProductAction($name, $categoryId)
{
    $em = $this->getDoctrine()->getManager();
    if ($em->getRepository("AcmeStoreBundle:Category")->findOneById($categoryId)) {
        $product = new Product();
        $product->setName($name);
        $product->setPrice(220);
        $product->setDescription("This is just a test");
        $em->persist($product);
        $em->flush();
        $newproduct = $em->getRepository("AcmeStoreBundle:Product")->find($product->getId());
        /** Create new product and populate $newproduct with its data */
        $repository = $em->getRepository("AcmeStoreBundle:Category")->find($categoryId);
        $newproduct->setCategory($repository);
        $em->persist($newproduct);
        $em->flush();
        /** Update the id_category field of the new product with parameter $categoryId */
        //exit(\Doctrine\Common\Util\Debug::dump($product));
        return new Response('Create product ' . $name . ' with category id ' . $categoryId);
    } else {
        return new Response('It doesn\'t exists any category with id ' . $categoryId);
    }
}
My doubt in this case is: is it good practice to invoke the flush() method two times in the same action? In this case I would like to create a new product, selecting the related category from a "list box".
Thank you in advance!
I think it mostly depends on your application domain. If you run flush two times, you're running two transactions: in the first one you're persisting the product, in the second one its link to the category. So if the first transaction fails (say you have a unique key on the product name and you're trying to persist a product with the same name, so you get a duplicate key exception), ask yourself whether it's OK to go on with the second one. I don't think we can answer that easily here, because it depends on your application logic: what that endpoint is supposed to do, and what happens if you end up with a product but no category association, or vice versa.
You should also consider that if you get an exception during the first transaction, your code won't handle that error and the second transaction will therefore fail. When an exception like a duplicate key occurs, all entities are detached and the entity manager no longer knows how to manage things. So you'll have to reset it, or you're going to get an "EntityManager is closed" error.
try {
    // first transaction
    $entityManager->persist($entityOne);
    $entityManager->flush();
} catch (\Exception $e) {
    /* ... handle the exception */
    // the EntityManager is closed after such an exception,
    // so reset it via the registry and get a fresh instance
    $entityManager = $this->getDoctrine()->resetManager();
}
// now we can safely run a second transaction here
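If the two writes really must succeed or fail together, an alternative (a sketch, not part of the question's code) is to do everything inside one explicit transaction with a single flush. Doctrine 2's EntityManager::transactional() (renamed wrapInTransaction() in newer versions) wraps the callback in a transaction and flushes for you:
$em->transactional(function ($em) use ($product, $category) {
    // Both entities are written in the same transaction:
    // either both rows end up in the database or neither does.
    $em->persist($category);
    $em->persist($product);
});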
I hope this answers your question :-)
I suggest this edited code snippet:
public function createProductAction($name, $categoryId)
{
    $em = $this->getDoctrine()->getManager();
    if ($em->getRepository("AcmeStoreBundle:Category")->findOneById($categoryId)) {
        $repository = $em->getRepository("AcmeStoreBundle:Category")->find($categoryId);
        $product = new Product();
        $product->setName($name);
        $product->setPrice(220);
        $product->setDescription("This is just a test");
        $product->setCategory($repository);
        $em->persist($product);
        $em->flush();
        return new Response('Create product ' . $name . ' with category id ' . $categoryId);
    } else {
        return new Response('It doesn\'t exists any category with id ' . $categoryId);
    }
}

Delete records in Doctrine

I'm trying to delete a record in Doctrine, but I don't know why it's not deleting.
Here is my Code:
function del_user($id)
{
    $single_user = $entityManager->find('Users', $id);
    $entityManager->remove($single_user);
    $entityManager->flush();
}
Plus: how can I echo the query to see what is going on here?
This is an old question that doesn't seem to have an answer yet, so I am leaving this here for reference. You can also check the Doctrine documentation.
To delete a record, you need to (assuming you are in your controller):
// get EntityManager
$em = $this->getDoctrine()->getManager();
// Get a reference to the entity ( will not generate a query )
$user = $em->getReference('ProjectBundle:User', $id);
// OR you can get the entity itself ( will generate a query )
// $user = $em->getRepository('ProjectBundle:User')->find($id);
// Remove it and flush
$em->remove($user);
$em->flush();
Using the first method (getting a reference) is usually better if you just want to delete the entity without first checking whether it exists, because it will not query the DB; it only creates a proxy object that you can use to delete your entity.
If you want to make sure the ID corresponds to a valid entity first, then the second method is better, because it will query the DB for your entity before trying to delete it.
From my understanding, if you need to delete a record that has a Doctrine relationship (e.g. OneToMany or ManyToMany), the association cannot easily be deleted until you set the field that references the other relation to null.
You can use this for an entity without relations:
$entityManager = $this->getDoctrine()->getManager();
$single_user = $this->getDoctrine()->getRepository(User::class)->findOneBy(['id' => $id]);
$entityManager->remove($single_user);
$entityManager->flush();
But for an entity with relations, set the field that references the other relation to null first:
$entityManager = $this->getDoctrine()->getManager();
$single_user = $this->getDoctrine()->getRepository(User::class)->findOneBy(['id' => $id]);
// assume you have a field that references the other relation
$single_user->setFieldData(null);
$entityManager->remove($single_user);
$entityManager->flush();
Did you check that your entity has the right annotations?
cascade={"persist", "remove"}, orphanRemoval=true
In a Silex route I do it like this, in case it helps someone:
$app->get('/db/order/delete', function (Request $request) use ($app) {
    ...
    $id = $request->query->get('id');
    $em = $app['orm.em']; // or wherever your EntityManager is
    $order = $em->find("\App\Entity\Orders", $id); // your Entity
    if ($order) {
        try {
            $em->remove($order);
            $em->flush();
        } catch (\Exception $e) {
            return new Response($e->getMessage(), 500);
        }
        return new Response("Success deleting order " . $order->getId(), 200);
    } else {
        return new Response("Order Not Found", 500);
    }
});
You first need the repository:
$entityManager->getRepository('Users')->find($id);
instead of
$single_user = $entityManager->find('Users', $id);
The 'Users' string is the name of the Users repository in Doctrine (it depends on whether you are using Symfony, Zend, etc.).
First, you may need to check whether 'Users' is your fully qualified class name. If not, update it to your class name including the namespace.
Make sure the object returned by find() is not null or false and is an instance of your entity class before calling the EM's remove().
Regarding your other question: instead of making Doctrine return the SQL, I just let my database (MySQL) log all queries (since it's just a development environment).
Try a var_dump() of your $single_user. If it is null, the record doesn't exist.
Also check whether "Users" is a valid entity name (no namespace?), and whether $id references the PK of the user.
If you want to see the queries that are executed, check your mysql/sql/... log or look into Doctrine\DBAL\Logging\EchoSQLLogger.
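For the EchoSQLLogger route, a minimal sketch (using the Doctrine DBAL 2 API) would be:
$em->getConnection()
   ->getConfiguration()
   ->setSQLLogger(new \Doctrine\DBAL\Logging\EchoSQLLogger());

// From here on, every SQL statement run through this connection is echoed,
// including the DELETE generated by remove() + flush().
$em->remove($single_user);
$em->flush();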

Doctrine 2: weird behavior while batch processing inserts of entities that reference other entities

I am trying out the batch processing method described here:
http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/batch-processing.html
My code looks like this:
$limit = 10000;
$batchSize = 20;
$role = $this->em->getRepository('userRole')->find(1);

for ($i = 0; $i <= $limit; $i++) {
    $user = new \Entity\User;
    $user->setName('name' . $i);
    $user->setEmail('email' . $i . '#email.blah');
    $user->setPassword('pwd' . $i);
    $user->setRole($role);
    $this->em->persist($user);
    if (($i % $batchSize) == 0) {
        $this->em->flush();
        $this->em->clear();
    }
}
The problem is that after the first call to $em->flush(), $role also gets detached, and for every 20 users a new role with a new id is created, which is not what I want.
Is there any workaround for this situation? The only one I could make work is to fetch the user role entity every time inside the loop.
thanks
clear() detaches all entities managed by the entity manager, so $role is detached too, and trying to persist a detached entity creates a new entity.
You should fetch the role again after clear:
$this->em->clear();
$role = $this->em->getRepository('userRole')->find(1);
Or just create a reference instead:
$this->em->clear();
$role = $this->em->getReference('userRole', 1);
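Inside the batch loop from the question that would look roughly like this (a sketch):
if (($i % $batchSize) == 0) {
    $this->em->flush();
    $this->em->clear();
    // Re-create a lightweight proxy after clear(); no query is executed,
    // the proxy only carries the identifier needed for the foreign key.
    $role = $this->em->getReference('userRole', 1);
}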
As an alternative to arnaud576875's answer you could detach the $user from the entity manager so that it can be GC'd immediately. Like so:
$this->em->flush();
$this->em->detach($user);
Edit:
As pointed out by Geoff, this will only detach the most recently created user object, so this method is not recommended.

Skip Entities while flushing when they are a Duplicate

I'm playing a little bit with Symfony2 and Doctrine2.
I have an entity that has a unique title, for example:
class ListItem
{
    /**
     * @orm:Id
     * @orm:Column(type="integer")
     * @orm:GeneratedValue(strategy="AUTO")
     */
    protected $id;

    /**
     * @orm:Column(type="string", length="255", unique="true")
     * @assert:NotBlank()
     */
    protected $title;
}
Now I'm fetching a JSON feed and updating my database with those items:
$em = $this->get('doctrine.orm.entity_manager');
foreach ($json->value->items as $item) {
    $listItem = new ListItem();
    $listItem->setTitle($item->title);
    $em->persist($listItem);
}
$em->flush();
This works fine the first time, but the second time I get an SQL error (of course): Integrity constraint violation: 1062 Duplicate entry.
Sometimes my JSON file gets updated; some of the items are new, some are not.
Is there a way to tell the entity manager to skip the duplicate items and just insert the new ones?
What's the best way to do this?
Thanks for any help. Please leave a comment if something is unclear.
Edit:
What works for me is doing something like this:
foreach ($json->value->items as $item) {
    $uniqueness = $em->getRepository('ListItem')->checkUniqueness($item->title);
    if (false == $uniqueness) {
        continue;
    }
    $listItem = new ListItem();
    $listItem->setTitle($item->title);
    $em->persist($listItem);
    $em->flush();
}
checkUniqueness is a method in my ListItem Repo that checks if the title is already in my db.
That's horrible. That's 2 database queries for each item, which ends up at about 85 database queries for this action.
How about retrieving all the current titles into an array first and checking each incoming title against that array?
$existingTitles = $em->getRepository('ListItem')->getCurrentTitles();
foreach ($json->value->items as $item) {
    if (!in_array($item->title, $existingTitles)) {
        $listItem = new ListItem();
        $listItem->setTitle($item->title);
        $em->persist($listItem);
    }
}
$em->flush();
getCurrentTitles() would need to be added to ListItem Repo to simply return an array of titles.
This only requires one extra DB query, but it does cost you more memory to hold the current titles in an array. There may be problems with this method if your dataset for ListItem is very big.
If the number of items you want to insert each time isn't too large, you could modify the getCurrentTitles() function to query only for items whose titles you are trying to insert. That way the maximum size of $existingTitles will be the size of your insert list. Then you can perform your checks as above.
// getCurrentTitles() - $newTitles is an array of all the new titles you want to insert
return $qb->select('t.title')
    ->from('Table', 't')
    ->where($qb->expr()->in('t.title', ':titles'))
    ->setParameter('titles', $newTitles)
    ->getQuery()
    ->getArrayResult();
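Note that getArrayResult() returns each row as an associative array rather than a plain string, so (assuming the sketch above) you would flatten the result before the in_array() check:
$rows = $em->getRepository('ListItem')->getCurrentTitles($newTitles);
// Each row looks like ['title' => '...']; flatten to a simple list of strings.
$existingTitles = array_column($rows, 'title');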
If you are using an entity that may already exist in the manager, you have to merge it.
Here is what I would do (I have not tested it yet):
$em = $this->get('doctrine.orm.entity_manager');
foreach ($json->value->items as $item) {
    $listItem = new ListItem();
    $listItem->setTitle($item->title);
    $em->merge($listItem); // returns a managed entity
    // no need to persist, as the entity is now managed
}
$em->flush();
