Empirically, flush() does not seem to be necessary after findAndUpdate(). I just couldn't find this stated explicitly anywhere in the Doctrine ODM/MongoDB docs (and I didn't bother to read much source code).
The findAndModify docs on mongodb.org state:
This command can be used to atomically modify a document (at most one) and return it.
And Doctrine MongoDB's findAndUpdate() uses MongoDB's findAndModify. So it sounds like the whole thing does indeed happen in one go, so calling flush() on the document manager shouldn't be necessary.
Flush is only needed for writing changes to managed objects back to Mongo. Anything you do through the query builder interface will be executed directly and bypass UnitOfWork. This is especially true for updates and upserts. In the case of findAndUpdate(), the update should be executed in Mongo immediately, but I believe the object returned might be managed. Any changes to that document afterwards (e.g. via setter methods) would need a flush() if you wanted them written back to Mongo.
Also, be aware of returnNew() on the query builder, which corresponds to the new option of findAndModify. By default, I believe findAndUpdate() will return the document in its pre-updated state. You may prefer to retrieve the document in its updated state.
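To make the two points above concrete, here is a minimal sketch of a findAndUpdate() call that asks for the updated document. The function name, the document class argument and the "username" / "loginCount" fields are made up for illustration; the builder methods are the ones Doctrine MongoDB ODM exposes on its query builder, as far as I know.

```php
<?php

use Doctrine\ODM\MongoDB\DocumentManager;

/**
 * Atomically increments a counter and returns the document as it looks
 * AFTER the update. No flush() is involved: the query builder sends the
 * findAndModify command straight to MongoDB.
 *
 * The field names "username" and "loginCount" are hypothetical.
 */
function incrementLoginCount(DocumentManager $dm, string $documentClass, string $username): ?object
{
    return $dm->createQueryBuilder($documentClass)
        ->findAndUpdate()                       // maps to MongoDB's findAndModify
        ->returnNew()                           // ask for the post-update state
        ->field('username')->equals($username)  // match at most one document
        ->field('loginCount')->inc(1)           // the atomic modification
        ->getQuery()
        ->execute();                            // the matched document, or null
}
```

If you later change the returned document through its setters, those changes are ordinary UnitOfWork changes and would still need a flush().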
Related
I'm new to Symfony and Doctrine.
I have a project where I need a method inside a Symfony service to be called with data from the DB whenever a dateTime object saved in that DB table "expires" (reaches a certain (dynamic) age).
As I'm just starting out I do not have any code yet. What I need is a start point to get me looking in the right direction as neither the life cycle callbacks nor the doctrine event listener / dispatcher structure seems to be able to solve this task.
Am I missing something important here or is it maybe just a totally wrong start to my problem which actually can't be solved by doctrine itself?
What came to mind is a cron-job-like structure, but that kind of implementation is not as dynamic as required. It is bound to fixed time frames, which may not be reactive enough and may even immensely decrease performance in some situations.
If I'm getting your problem right: You want something that executes when a record's datetime expires.
The main problem is that you would have to call PHP based on a DB event, which is not straightforward...
One possible solution is a Symfony command that is executed periodically (using cron), in which you select the expired entities and perform the required actions.
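The "select the expired entities" step could look roughly like this. It is a plain-PHP sketch with no Doctrine involved: the record shape (id, savedAt, maxAge) and both function names are assumptions, and in a real command the records would come from a repository query.

```php
<?php

/**
 * True when a record saved at $savedAt with a per-record maximum age
 * has reached that age at $now.
 */
function isExpired(DateTimeImmutable $savedAt, DateInterval $maxAge, DateTimeImmutable $now): bool
{
    return $savedAt->add($maxAge) <= $now;
}

/**
 * Returns the ids of all expired records.
 * Each record is a hypothetical array: ['id' => ..., 'savedAt' => ..., 'maxAge' => ...].
 */
function selectExpired(iterable $records, DateTimeImmutable $now): array
{
    $expired = [];
    foreach ($records as $record) {
        if (isExpired($record['savedAt'], $record['maxAge'], $now)) {
            $expired[] = $record['id'];
        }
    }

    return $expired;
}
```

How often cron runs the command then decides how late a callback can fire at worst.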
So as far as I found out, Doctrine really is not able to do this task in the described way. Of course, the DB can't react to part of a record it saved without an external action triggering the lookup.
So what I will probably go with is a shell program called at.
It is actually something like what I (and katon.abel) mentioned. It can register one-time jobs which are then executed at the provided time (which I then do not need to save in the DB; I just pass it to at).
This way I can easily create the crons via symfony, save the needed data via doctrine and call the callback method via a script triggered by at.
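For anyone wanting to try the same route, this is roughly how the scheduling command could be built from PHP. It is only a sketch: buildAtCommand() is a made-up helper and the console command in the usage note is hypothetical. It relies on at reading the job from stdin and on the "-t [[CC]YY]MMDDhhmm" time format.

```php
<?php

/**
 * Builds a shell command line that schedules $command to run once at
 * $runAt using at(1). Hypothetical helper, not part of Symfony or Doctrine.
 */
function buildAtCommand(DateTimeInterface $runAt, string $command): string
{
    // at interprets the timestamp as local time, so convert $runAt to the
    // server's timezone beforehand if it was created in a different one.
    $timestamp = $runAt->format('YmdHi');

    // at reads the job from stdin, so the command is piped into it.
    return sprintf('echo %s | at -t %s', escapeshellarg($command), $timestamp);
}
```

Something like buildAtCommand($expiresAt, 'php bin/console app:expire 42') would then be handed to exec() or a process wrapper, with the job looking up its data through Doctrine when it fires.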
I've got a script that fetches data from a database using Doctrine. Sometimes it needs to fetch the data for the same entity again. The second time, however, it uses the identity map and therefore might be out of sync with the database (another process can modify the entities in the DB). One solution that we tried was to set the query hint Query::HINT_REFRESH before running the DQL query. However, we would also like to use it with simple findBy(..) calls, and that doesn't seem to work. We would also like to be able to set it globally per process, so that all Doctrine SELECT queries run in that context actually fetch the entities from the DB. We tried $em->getConfiguration()->setDefaultQueryHint(Query::HINT_REFRESH, true); but again, that doesn't seem to work.
Doctrine explicitly warns you that it is not meant to be used without a cache.
However, if you want to ignore this, then Cerad's comment (also mentioned in this answer) sounds right. If you want to do it on every query, though, you might look into hooking into a Doctrine event. Unfortunately there is no preLoad event, only postLoad, but if you really don't care about performance you could create a postLoad listener which first gets the class and id of the loaded entity, calls clear on the entity manager, and finally reloads it. Sounds very wrong to me though, I wash my hands of it :-)
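If stale reads from findBy() are the main concern, a less drastic option than a listener might be to call clear() or refresh() yourself at the points where freshness matters. Both methods exist on the entity manager; the two wrapper functions below are hypothetical names for this sketch only.

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Clears the entity manager first, so findBy() hydrates fresh objects
 * instead of reusing stale instances from the identity map.
 * Caveat: clear() detaches EVERY managed entity, so unflushed changes are lost.
 */
function findByFresh(EntityManagerInterface $em, string $entityClass, array $criteria): array
{
    $em->clear();

    return $em->getRepository($entityClass)->findBy($criteria);
}

/**
 * Re-reads one already-managed entity from the database, overwriting
 * whatever state it currently has in memory.
 */
function reload(EntityManagerInterface $em, object $entity): void
{
    $em->refresh($entity);
}
```

Whether the extra queries are acceptable depends on how often the script re-reads the same rows.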
The doctrine interface Doctrine\Common\Persistence\ObjectManager defines the flush method as having no parameters. Yet the implementation Doctrine\ORM\EntityManager allows a single entity to be passed.
Aside from the, IMO, bad programming style, is this anything I need to be worried about?
I know that PHP will ignore any extra arguments that are not declared in the method signature, which would cause a non-ORM manager to flush all entities.
I ask because I'm trying to write my code in such a way that the ORM is configurable and can be switched at a later date. Now, while writing a batch import class, I have found that calling flush without an entity causes memory leaks. It also affects a 'progress/history' entity I use outside of the main import loop. So it's pretty important that I only flush certain entities.
I have noticed the differences between the definition and implementation of flush() as well. That may be a question only the developers of doctrine can answer.
Short Answer
Don't worry about it.
Long Answer
We can still address the differences and how they affect your application.
According to doctrine's documentation, flush() is the only method that will persist changes to your domain objects.
Other methods, such as persist() and remove(), only place that object in a queue to be updated.
It is very important to understand that only EntityManager#flush() ever causes write operations against the database to be executed. Any other methods such as EntityManager#persist($entity) or EntityManager#remove($entity) only notify the UnitOfWork to perform these operations during flush.
Not calling EntityManager#flush() will lead to all changes during that request being lost.
Performance
Flushing individual entities one at a time may cause performance issues in itself. Each flush() is a new trip to the database. A large number of calls to flush() may slow down your application.
The flush() method should not be affecting your progress/history entity unless you are intentionally making changes to it. But, if that is the case, and you still do not want progress/history entity to be updated when flush() is executed, you can detach the entity from doctrine. This will allow you to make changes to the entity without doctrine being aware of those changes. Therefore, it will not be affected by flush().
When you are ready for the entity to be re-attached to doctrine, you can use the merge method provided by your entity manager. Then call flush() one last time to merge the changes.
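For the batch import itself, an alternative that stays within the ObjectManager-style API is to flush and clear in batches rather than flushing single entities. As far as I know, passing an entity to flush() and the merge() method were both deprecated in ORM 2.7 and removed in 3.0, so this pattern may also age better. A minimal sketch, where importInBatches() and the $makeEntity callable are made-up names:

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Persists entities in batches, flushing and clearing every $batchSize
 * items so the UnitOfWork does not grow without bound.
 * $makeEntity is a hypothetical callable that turns one input row into an entity.
 */
function importInBatches(EntityManagerInterface $em, iterable $rows, callable $makeEntity, int $batchSize = 100): int
{
    $count = 0;
    foreach ($rows as $row) {
        $em->persist($makeEntity($row));
        if (++$count % $batchSize === 0) {
            $em->flush(); // write the whole batch
            $em->clear(); // detach everything so memory can be reclaimed
        }
    }
    $em->flush(); // write the final, possibly partial, batch
    $em->clear();

    return $count;
}
```

Note that clear() also detaches a progress/history entity held outside the loop, so that one would have to be fetched again by its id after each clear before you update it.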
While using Doctrine, I noticed that, to delete an entity, I need to retrieve that entity by a given parameter (name, id, etc.) and then call the remove method. With a plain query, on the other hand, I can just execute a delete statement.
So it seems that the ORM style requires two operations while a plain SQL statement requires one. That's why I am a little confused about whether we should use delete (or update) operations through the ORM. Isn't it worse for performance? Or is there anything else I am missing? Can it be done in any other way in ORM style?
In Doctrine2 you can call the delete on a proxy object, which is not loaded from the database. Just create a "dummy" object, something like:
$user = $em->getPartialReference('model\User', array('id' => $id));
$em->remove($user);
It doesn't require the initial query, but I'm not quite sure if Doctrine still does it internally on flush. I don't see it in the SqlLog.
Just to add, I think this is expected behavior of any decent ORM. It deals with objects and relations. It has to know that something exists before deleting it. An ORM is not just a query generator. Generally, a native query will always be faster in any ORM. Any ORM adds a layer of abstraction, and it takes some time to execute it. It is a typical tradeoff: you get some fancy features and clean code, but lose some performance.
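If the single-statement delete is what you are after, Doctrine also accepts DELETE statements in DQL, which stays close to "ORM style" without loading anything. A minimal sketch, assuming the entity has an "id" field; deleteById() is a made-up helper name.

```php
<?php

use Doctrine\ORM\EntityManagerInterface;

/**
 * Deletes one row with a single DQL DELETE statement and returns the
 * number of affected rows. $entityClass is interpolated into the DQL,
 * so it must be a trusted class name, never user input.
 */
function deleteById(EntityManagerInterface $em, string $entityClass, int $id): int
{
    return $em->createQuery(sprintf('DELETE FROM %s e WHERE e.id = :id', $entityClass))
        ->setParameter('id', $id)
        ->execute();
}
```

The tradeoff is that a DQL DELETE goes straight to the database, so lifecycle callbacks and ORM-level cascades for that entity do not run.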
EDIT:
I'm glad it worked out for you. Actually, I stumbled on another problem, which made me realize that proxies and partial objects aren't the same thing. Partial objects instantiate the real model class and fill it with the values you want. After you initialize a partial object, lazy-loading doesn't work on it anymore. So for instance, if you make a partial object with only the id, and want to delete it only if another field of the object satisfies some condition, it will not work, because that other field will always be null.
On the other hand, proxies do work with lazy-loading and don't share the problems that partial objects have. So I would strongly suggest not using the getPartialReference method. Instead, you can do something like:
$user = $em->getReference('model\User', $id);
$em->remove($user);
The getReference method returns the object if it is already loaded or a proxy if it is not. A proxy can lazy-load all the other values if/when you need them. As for your example, they will behave exactly the same, but proxies are surely a better way to go.
Done!
For me it worked like this, with flush() added as the third line:
$user = $em->getReference('model\User', $id);
$em->remove($user);
$em->flush();
I was wondering if there is a way to perform a find() and have Mongo automatically return the associated references without having to run getDBRef() once the parent record has been returned.
I don't see it anywhere in the PHP documentation. I can easily support using getDBRef but it doesn't seem as efficient as it could be.
Also...I'm surprised there's no way to select the specific data to return in the linked reference. I may as well just perform another manual find statement so I can control what the return is...but there has to be a more performance oriented way to do this.
Perhaps I should change my methodology and instead of using the PHP library classes for find, generate my own JavaScript command and run it using the MongoCode class? Would that work and if so...I'm wondering what it would look like. scratches head then heads to The Google
Thanks!
MongoDB does not support joins. Database References (DBRefs) just refer to the practice of a field storing an _id that references another document. There is currently no specific server-side support for this, and hydrating the reference into a document does require another query. Some MongoDB drivers have convenience methods so you don't have to do the find manually. It is equally valid and performant to do your own find() given a DBRef to look up (or to use other criteria to find related documents).
Depending on your use case and data modelling, a more efficient alternative to the DBRef linking could be embedding related data as a subdocument. See the MongoDB wiki info on Schema Design for more examples.
As far as performance goes, it would be better to use PHP queries than MongoCode (JavaScript which needs to be eval'ed on the server). MongoCode is really intended for more limited use such as within Map/Reduce functions. Refer to Server-Side Code Execution for some of the potential limitations with that approach.
Refer: http://docs.mongodb.org/manual/reference/database-references/
Manual references where you save the _id field of one document in another document as a reference. Then your application can run a second query to return the related data. These references are simple and sufficient for most use cases.
DBRefs are references from one document to another using the value of the first document’s _id field, collection name, and, optionally, its database name. By including these names, DBRefs allow documents located in multiple collections to be more easily linked with documents from a single collection.
To resolve DBRefs, your application must perform additional queries to return the referenced documents. Many drivers have helper methods that form the query for the DBRef automatically. The drivers do not automatically resolve DBRefs into documents.
So either way, no matter which type of referencing you are using, you need to do the dereferencing yourself.
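Doing the dereferencing yourself also answers the "select specific data" part of the question, because your own query can carry a projection. A minimal sketch, assuming the newer mongodb/mongodb PHP library rather than the legacy driver classes mentioned in the question; resolveDbRef() is a made-up helper.

```php
<?php

use MongoDB\Database;

/**
 * Resolves a DBRef-style value ({"$ref": <collection>, "$id": <_id>})
 * with one extra query, optionally restricting the returned fields.
 */
function resolveDbRef(Database $db, array|ArrayAccess $ref, array $projection = []): array|object|null
{
    // Only pass a projection when the caller asked for specific fields.
    $options = $projection === [] ? [] : ['projection' => $projection];

    return $db->selectCollection($ref['$ref'])
        ->findOne(['_id' => $ref['$id']], $options);
}
```

For example, resolveDbRef($db, $post['author'], ['name' => 1]) would fetch only the name (plus _id) of a referenced author, where "author" and "name" are hypothetical field names.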
Hope it helps!