I have a strange problem with \Doctrine\ORM\UnitOfWork::getScheduledEntityDeletions used inside an onFlush event listener:
foreach ($unitOfWork->getScheduledEntityDeletions() as $entity) {
    if ($entity instanceof PollVote) {
        $arr = $entity->getAnswer()->getVotes()->toArray();
        dump($arr);
        dump($entity);
        dump(in_array($entity, $arr, true));
        dump(in_array($entity, $arr));
    }
}
And here is the result:
So we see that the object is pointing to a different instance than the original, therefore in_array no longer yields the expected results when used with strict comparison (AKA ===). Furthermore, the \DateTime object is pointing to a different instance.
The only possible explanation I found is the following (source):
Whenever you fetch an object from the database Doctrine will keep a copy of all the properties and associations inside the UnitOfWork. Because variables in the PHP language are subject to “copy-on-write” the memory usage of a PHP request that only reads objects from the database is the same as if Doctrine did not keep this variable copy. Only if you start changing variables PHP will create new variables internally that consume new memory.
However, I did not change anything (even the created field is kept as it is). The only operations that were performed on the entity are:
\Doctrine\ORM\EntityRepository::findBy (fetching from DB)
\Doctrine\Common\Persistence\ObjectManager::remove (scheduling for removal)
$em->flush(); (triggering synchronization with DB)
Which leads me to think (I might be wrong) that Doctrine's change tracking method has nothing to do with the issue I'm experiencing, which in turn leads me to the following questions:
What causes this?
How to reliably check if an entity scheduled for deletion is inside a collection (\Doctrine\Common\Collections\Collection::contains uses in_array with strict comparison) or which items in a collection are scheduled for deletion?
The problem is that when you tell Doctrine to remove an entity, it is removed from the identity map (here):
<?php
public function scheduleForDelete($entity)
{
    $oid = spl_object_hash($entity);

    // ....

    $this->removeFromIdentityMap($entity);

    // ...

    if ( ! isset($this->entityDeletions[$oid])) {
        $this->entityDeletions[$oid] = $entity;
        $this->entityStates[$oid] = self::STATE_REMOVED;
    }
}
And when you do $entity->getAnswer()->getVotes(), it does the following:
Load all votes from the database
For every vote, check whether it is in the identity map; if so, use the existing instance
If it is not in the identity map, create a new object
Try to call $entity->getAnswer()->getVotes() before you delete the entity. If the problem disappears, then I am right. Of course, I would not suggest this hack as a solution, just to make sure we understand what is going on under the hood.
UPD: instead of just calling $entity->getAnswer()->getVotes(), you should probably foreach over all votes, because of lazy loading. If you just call $entity->getAnswer()->getVotes(), Doctrine probably won't do anything and will load the votes only when you start to iterate over them.
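For completeness, a minimal sketch of that experiment, using the entity names from the question (the repository lookup and surrounding wiring are assumptions):

// Hypothetical experiment: initialize the collection while the vote is still in the identity map.
$vote = $em->getRepository(PollVote::class)->find($id);

foreach ($vote->getAnswer()->getVotes() as $v) {
    // iterating forces the lazy collection to load now, so its elements are the
    // same instances Doctrine keeps in the identity map
}

$em->remove($vote);
$em->flush(); // inside onFlush, in_array($entity, $arr, true) should now be true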
From the doc:
If you call the EntityManager and ask for an entity with a specific ID twice, it will return the same instance
So calling findOneBy(['id' => 12]) twice should return one and the same instance both times.
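For illustration, a sketch of what that means in practice (assuming a repository for the entity in question):

$a = $repository->findOneBy(['id' => 12]);
$b = $repository->findOneBy(['id' => 12]);

var_dump($a === $b); // true: the identity map hands back the same managed instance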
So it all depends on how both instances are retrieved by Doctrine.
In my opinion, the one you get in $arr comes from a One-to-Many association on $votes in the Answer entity, which results in a separate query (maybe an id IN (12)) by the ORM.
Something you could try is to declare this association as EAGER (fetch="EAGER"); it may force the ORM to make a specific query and keep the result, so that the same instance is returned the second time you ask for it.
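For reference, a rough sketch of what that mapping might look like on the Answer side (annotation mapping and the mappedBy field are assumptions; adjust to the actual classes):

// In the Answer entity (hypothetical mapping):
/**
 * @ORM\OneToMany(targetEntity="PollVote", mappedBy="answer", fetch="EAGER")
 */
private $votes;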
Could you have a look at the logs and post them here? They may indicate something interesting, or at least something relevant to investigate further.
Related
I'm working on a tool for concurrency with Doctrine 2.
I'm facing a "best practice" issue about retrieving a fresh instance of an entity without the cache (the idea being to compare some properties of two different objects of the same entity and return the differences).
Some code might help (plus the doc: http://doctrine-orm.readthedocs.org/projects/doctrine-orm/en/latest/reference/transactions-and-concurrency.html):
// This is my current implementation.
$entity = $em->find('MyEntity', 1); // find() takes the entity class name (hypothetical here) and the id
$entity->setName('TEST');

// This entity has a "version" field equal to 2 in the DB, for example
try {
    $em->lock($entity, LockMode::OPTIMISTIC, 1); // Will throw an OptimisticLockException
} catch (OptimisticLockException $e) {
    $em->detach($entity);

    $dbEntity = $this->find($entity->getId());
    $em->detach($dbEntity);

    $entity = $em->merge($entity);

    var_dump($entity->getName());   // TEST
    var_dump($dbEntity->getName()); // The old value

    // ... do more stuff, like comparing the two objects ...
}
Is using the detach + merge methods a good practice for this behavior ? Any better idea to improve this code ?
--
Edit 1:
Actually, after adding some tests, the "merge" method is not what I expected: the object is not "re-attached" to the unit of work.
This behaviour is not what I want, because the developer can't perform changes + flush on his entity after using my tool.
--
Edit 2:
After digging in the documentation and the source code, the "merge" method is actually what I wanted: a new instance of the entity is attached, not the one I provided ($entity in my example).
Since this code (in my tool) is in a method whose purpose is to return the $dbEntity object for my $entity object, passing $entity by reference (&$entity) solves my "Edit 1" issue.
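To illustrate Edit 2, a rough sketch of the tool method with the by-reference parameter (the method name and surrounding class are hypothetical):

// Hypothetical helper inside the tool: $entity is taken by reference so the
// caller's variable ends up pointing at the managed instance returned by merge().
public function getDbEntity(&$entity)
{
    $this->em->detach($entity);

    $dbEntity = $this->em->find(get_class($entity), $entity->getId());
    $this->em->detach($dbEntity);

    $entity = $this->em->merge($entity); // re-attach a managed copy of the modified entity

    return $dbEntity; // the current database state, detached
}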
Within a huge dataset I sometimes get inconsistencies when one document is deleted (Symfony2 app with Doctrine ODM and FosREST).
$a = new Element();
$b = new Element();
$c = new List();
$c->addElement($a);
$c->addElement($b);
$em->persist($c);
Saving at this point works flawlessly.
In 99% of cases $a and $b are still valid Documents when $c is loaded later.
BUT sometimes either $a or $b gets deleted without the reference in $c being updated.
-> at this moment the next loading of $c will fail with a \Doctrine\ODM\MongoDB\DocumentNotFoundException
(message is something like: The "MongoDBODMProxies__CG__\App\Model\Element" document with identifier "541417702798711d2900607c" could not be found.)
What is the best approach now to handle this case?
I was thinking about either
catching the exception and checking whether the reference it tried to load was on the Element model
a custom exception handler in FosRest to check for this case
a custom repository function in the mapping, checking there whether everything is still valid (+ storing somehow that an Element is missing) -> but this then forces me to check on every occasion whether the "error" is set
UPDATE: The mapping between the documents is a bit more complex than I described here:
for one, the Element is basically a collection separated by a discriminator, where only one type of field references another document (I'll call it "Tree" from now on)
a Tree can be used in thousands of ElementTrees (the specific type that contains a Tree)
sometimes Trees can be deleted (this is already a slow-running process, since a lot of data needs to be updated then)
I would now need to find out which Lists need to change and basically reject the API calls to those lists with the information that a specific element is no longer available.
A few things to check especially for MongoDB:
Make sure that there are no circular references (for example, if you have the property $elements on the List class with references set to true on it, make sure List is not referenced on the Element class as well) and that your mappings are consistent.
In the addElement function, IF the reference is held on the Element class, make sure you also call $element->setList($this) inside the function (and the same for removeElement: unset the reference if necessary).
Make sure you cascade all the necessary operations, for example cascade: ["persist", "remove", "refresh"] or ["all"]; see the sketch after this list.
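As an illustration of the last two points, a rough annotation sketch (ElementList stands in for the question's List document, since "list" is a reserved word in PHP; names and cascades are assumptions, adjust to your model):

use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\ODM\MongoDB\Mapping\Annotations as ODM;

/** @ODM\Document */
class ElementList
{
    /** @ODM\ReferenceMany(targetDocument="Element", cascade={"persist"}) */
    private $elements;

    public function __construct()
    {
        $this->elements = new ArrayCollection();
    }

    public function addElement(Element $element)
    {
        $this->elements[] = $element;
        $element->setList($this); // keep both sides of the reference in sync
    }

    public function removeElement(Element $element)
    {
        $this->elements->removeElement($element);
        $element->setList(null); // unset the back-reference
    }
}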
You can check your mappings with:
$ app/console doctrine:mongodb:mapping:info
Finally, if you expect that document to be deleted but you still get an error from the proxy object, you can clear the metadata cache:
$ app/console doctrine:mongodb:cache:clear-metadata
Imperfect solution that works for now
I now chose to throw a new exception (it is important not to let Doctrine throw one, because that would then reject any persist attempts in the same request).
In the postLoad lifecycle event I now check the following (simplified):
if ($document instanceof List) {
    foreach ($document->getElements() as $element) {
        // at this moment $element->getId() is already defined, but the document is not yet loaded from Mongo
        $result = $this->elementRepository->findBy(array('_id' => $element->getId()));
        if (count($result) === 0) {
            throw new InvalidElementInList($element->getId());
        }
    }
}
In the RestController this now enables me to catch this specific exception, remove the invalid element from the list, and return a custom view to the user indicating that the element was removed.
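Roughly, the controller side looks like this (a sketch: the action name, repository, and response shape are assumptions, and view() is FOSRestBundle's helper):

// Hypothetical FOSRest controller action (simplified)
public function getListAction($id)
{
    try {
        $list = $this->listRepository->find($id); // postLoad runs here and may throw
    } catch (InvalidElementInList $e) {
        // drop the dangling reference elsewhere and tell the client which element is gone
        return $this->view(
            array('removedElement' => $e->getMessage()),
            200
        );
    }

    return $this->view($list, 200);
}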
I'm not having any luck using merge(). I'm doing almost exactly what is documented:
/** @var $detachedDocument MyDocumentClass */
$detachedDocument = unserialize($serializedDocument);
$document = $dm->merge($detachedDocument);
$document->setLastUpdated(new \MongoDate());
$dm->persist($document);
but the change never sticks. I have to do this instead:
$dm->createQueryBuilder('MyDocumentClass')
    ->findAndUpdate()
    ->field('lastUpdated')->set(new \MongoDate())
    ->getQuery()
    ->execute();
merge() seems pretty straightforward, so I'm confused why it doesn't work like I think it should.
In your first code example, merge() followed by persist() is redundant, and you omitted a flush(), which is the only operation that would actually write to the database (unless you execute a query manually, as you did in the second example).
If you walk through the code in UnitOfWork::doMerge(), you'll see that it will either persist the object (if it has no ID) or fetch the document by its ID. The end result is that merge() returns a managed document. persist() ensures that the document will be managed after it is called (it returns nothing itself). If you poke around in UnitOfWork::doPersist(), you'll see that passing an already-managed object to the method is effectively a no-op.
Try replacing persist() with flush(). Note that you can flush a single document if necessary, but $dm->flush() processes all managed objects by default.
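In other words, something along these lines (same variable names as the question):

$detachedDocument = unserialize($serializedDocument);

$document = $dm->merge($detachedDocument); // returns the managed copy
$document->setLastUpdated(new \MongoDate());

$dm->flush(); // computes the changeset on managed documents and writes it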
If that still doesn't help, I'd confirm that the lastUpdated field is properly mapped in ODM. You can inspect the output of $dm->getClassMetadata('MyDocumentClass') to confirm. If it isn't a mapped field, UnitOfWork will detect no changes in the document and there will be nothing to flush.
As an aside: in the second code example, you're executing findAndUpdate() without any search criteria (only the set() is specified). Typically, you'd pair the modification with something like equals() (probably the ID in your case) to ensure that a single document is atomically modified and returned.
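For example (a sketch, assuming the document id is available in $id):

$dm->createQueryBuilder('MyDocumentClass')
    ->findAndUpdate()
    ->field('id')->equals($id) // restrict the update to a single document
    ->field('lastUpdated')->set(new \MongoDate())
    ->getQuery()
    ->execute();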
I am currently a beginner in CakePHP, and have played around with CakePHP 1.3, but recently CakePHP 2.0 has been released.
So far I like it, but the one thing that is being a pain is the fact that it doesn't return objects; it just returns arrays. I mean, it hardly makes sense to have to do $post['Post']['id']. It is (in my opinion) much more practical to just do $post->id.
After some Googling I stumbled upon this link; however, it kept generating errors about indexes not being defined when using the Form class (I'm guessing this is because it was getting the objectified version rather than the array version).
I am following the Blog tutorial (already have followed it under 1.3 but going over it again for 2.0)
So, does anyone know how to achieve this without it interfering with the Form class?
Hosh
Little known fact: Cake DOES return them as objects, or, well, properties of an object anyway. The arrays are the syntactic sugar:
// In your View:
debug($this->viewVars);
Showing that $this is a View object, and that the viewVars property corresponds to the $this->set('key', $variable) or $this->set(compact('data', 'for', 'view')) calls from the controller action.
The problem with squashing them into $Post->id for the sake of keystrokes is Cake itself. Cake is designed to be a heavy lifter, so its built-in ORM is ridiculously powerful, unavoidable, and intended for addressing infinity rows of infinity associated tables: auto callbacks, automatic data passing, query generation, etc. The base depth of the multidimensional arrays depends on your find method, and as soon as you're working with more than one $Post with multiple associated models (for example), you've introduced arrays into the mix and there's just no avoiding that.
Different find methods return arrays of different depths. From the default generated controller code, you can see that index uses $this->set('posts', $this->paginate()); - view uses $this->set('post', $this->Post->read(null, $id)); and edit doesn't use $this->set with a Post find at all - it assigns $this->data = $this->Post->read(null, $id);.
FWIW, Set::map probably throws those undefined index errors because (guessing) you happen to be trying to map an edit action, amirite? By default, edit actions only use $this->set to send associated model finds to the View; the result of $this->read is assigned to $this->data instead. That's probably why Set::map is failing. Either way, you're still going to end up aiming at $Post[0]->id or $Post->id (depending on which find method you used), which isn't much of an improvement.
Here are some generic examples of Set::map() property depth for these actions:
// In posts/index.ctp
$Post = Set::map($posts);
debug($Post);
debug($Post[0]->id);
// In posts/edit/1
debug($this->viewVars);
debug($this->data);
// In posts/view/1
debug($this->viewVars);
$Post = Set::map($post);
debug($Post->id);
http://api13.cakephp.org/class/controller#method-Controllerset
http://api13.cakephp.org/class/model#method-Modelread
http://api13.cakephp.org/class/model#method-ModelsaveAll
HTH.
You could create additional object vars. This way you wouldn't interfere with Cake's automagic, but could access data using a format like $modelNameObj->id.
Firstly, create an AppController.php in /app/Controller if you don't already have one. Then create a beforeRender() function. This will look for data in Cake's standard naming conventions, and from it create additional object vars.
<?php
App::uses('Controller', 'Controller');

class AppController extends Controller {

    public function beforeRender() {
        parent::beforeRender();

        // camelcase plural of current model
        $plural = lcfirst(Inflector::pluralize($this->modelClass));

        // create a new object
        if (!empty($this->viewVars[$plural])) {
            $objects = Set::map($this->viewVars[$plural]);
            $this->set($plural . 'Obj', $objects);
        }

        // camelcase singular of current model
        $singular = lcfirst(Inflector::singularize($this->modelClass));

        // create new object
        if (!empty($this->viewVars[$singular])) {
            $object = Set::map($this->viewVars[$singular]);
            $this->set($singular . 'Obj', $object);
        }
    }
}
Then in your views you can access the objects like so:
index.ctp
$productsObj;
view.ctp
$productObj->id;
All we're doing is adding 'Obj' to the variable names that Cake would already provide. Some example mappings:
Products -> $productsObj
ProductType -> $productTypesObj
I know this is not perfect but it would essentially achieve what you wanted and would be available across all of your models.
While I like the idea Moz proposes, there are a number of existing solutions to this problem.
The quickest one I found is https://github.com/kanshin/CakeEntity - but it looks like you might need to refactor it for 2.x - there might even already be a 2.x branch or fork but I didn't look.
I also ran this question through my head a couple of times. Now, a few Cake-based apps later, I see the benefit of being able to branch and merge (am, in_array etc.) result sets more conveniently with arrays than with objects.
The $Post->id form would be sweet syntactic sugar, but not a real benefit over arrays.
You could write a function that iterates over your public properties (see ReflectionClass::getProperties) and saves them into an array, then returns that array.
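A generic sketch of that approach (not CakePHP-specific; the function name is made up):

// Hypothetical helper: copy an object's public properties into an array.
function objectToArray($object)
{
    $reflection = new ReflectionClass($object);
    $result = array();

    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $result[$property->getName()] = $property->getValue($object);
    }

    return $result;
}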
If you have access to the class, you can implement the ArrayAccess Interface and easily access your object as an array.
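And a sketch of the ArrayAccess route, so the same object can be read with both syntaxes (the class and its properties are made up):

class Post implements ArrayAccess
{
    public $id;
    public $title;

    public function offsetExists($offset) { return property_exists($this, $offset); }
    public function offsetGet($offset)    { return $this->$offset; }
    public function offsetSet($offset, $value) { $this->$offset = $value; }
    public function offsetUnset($offset)  { $this->$offset = null; }
}

// $post['id'] and $post->id now refer to the same value.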
P.S.: Sorry, I've never used CakePHP, but I think object-to-array conversion doesn't have to be a framework-specific problem.
I'm working with Doctrine2 for the first time, but I think this question is generic enough to not be dependent on a specific ORM.
Should the entities in a Data Mapper pattern be aware of, and use, the Mapper?
I have a few specific examples, but they all seem to boil down to the same general question.
If I'm dealing with data from an external source - for example, a User has many Messages and the external source simply provides the latest few entities (like an RSS feed) - how can $user->addMessage($message) check for duplicates unless it either is aware of the Mapper or 'searches' through the collection (which seems inefficient)?
Of course a Controller or Transaction Script could check for duplicates before adding the message to the user - but that doesn't seem quite right, and would lead to code duplication.
If I have a large collection - again a User with many Messages - how can the User entity provide limiting and pagination for the collection without actually proxying a Mapper call?
Again, the Controller or Transaction Script or whatever is using the Entity could use the Mapper directly to retrieve a collection of the User's Messages limited by count, date range, or other factors - but that too would lead to code duplication.
Is the answer to use Repositories and make the Entity aware of them (at least for Doctrine2, and whatever analogous concept other ORMs use)? At that point the Entity is still relatively decoupled from the Mapper.
Rule #1: Keep your domain model simple and straightforward.
First, don't prematurely optimize something because you think it may be inefficient. Build your domain so that the objects and syntax flow correctly. Keep the interfaces clean: $user->addMessage($message) is clean, precise and unambiguous. Under the hood you can utilize any number of patterns/techniques to ensure that integrity is maintained (caching, lookups, etc). You can utilize Services to orchestrate (complex) object dependencies; it's probably overkill for this, but here is a basic sample/idea.
class User
{
    public function addMessage(Message $message)
    {
        // One solution: loop through all messages first, throw an error if it already exists
        $this->messages[] = $message;
    }

    public function getMessages()
    {
        return $this->messages;
    }
}
class MessageService
{
    public function addUserMessage(User $user, Message $message)
    {
        // Ensure the message is unique for the user.
        // One solution is to loop through $user->getMessages() here and make sure it is unique.
        // This is more or less the only path to adding a message, so ensure its integrity here before proceeding.
        // There could also be ACL checks placed here as well.
        // You could also create functions that check whether certain criteria are met/unmet before proceeding.
        if ($this->doesUserHaveMessage($user, $message)) {
            throw new Exception('User already has this message');
        }

        $user->addMessage($message);
    }

    // Note: this may not be the correct place for this function to "live"
    public function doesUserHaveMessage(User $user, Message $message)
    {
        // Do a database lookup here
        return $user->hasMessage($message);
    }
}
class MessageRepository
{
    public function find(/* criteria */)
    {
        // Use caching here
        return $message;
    }
}
class MessageFactory
{
    public function createMessage($data)
    {
        // build a new Message from the raw data
        $message = new Message();
        // setters
        return $message;
    }
}
// Application code
$user = $userRepository->find(/* lookup criteria */);
$message = $messageFactory->createMessage(/* data */);

// Could wrap in try/catch
$messageService->addUserMessage($user, $message);
I've been working with Doctrine2 as well. Your domain entity objects are just that: objects... they should not have any idea of where they came from; the domain model just manages them and passes them around to the various functions that manage and manipulate them.
Looking back over this, I'm not sure that I completely answered your question. However, I don't think that the entities themselves should have any access to the mappers. Create Services/Repositories/whatever to operate on the objects, and utilize the appropriate techniques in those functions...
Don't over-engineer it from the outset either. Keep your domain focused on its goal and refactor when performance actually becomes an issue.
IMO, an Entity should be oblivious of where it came from, who created it, and how to populate its related Entities. In the ORM I use (my own), I am able to define joins between two tables and limit the results by specifying (in C#):
SearchCriteria sc = new SearchCriteria();
sc.AddSort("Message.CREATED_DATE", "DESC");
sc.MaxRows = 10;

results = Mapper.Read(sc, new User(new Message()));
That will result in a join which is limited to 10 items, ordered by the creation date of the message. The Message items will be added to each User. If I write:
results = Mapper.Read(sc, new Message(new User()));
the join is reversed.
So, it is possible to make Entities completely unaware of the mapper.
No.
Here's why: trust. You cannot trust data to act for the benefit of the system. You can only trust the system to act on data. This is a fundamental of programming logic.
Let's say something nasty slipped into the data and it was intended for XSS. If a data chunk performs actions, or if it is evaluated, the XSS code gets blended into things and opens a security hole.
Let not the left hand know what the right hand doeth! (mostly because you don't want to know)