Doctrine entity remove vs delete query, performance comparison


While using Doctrine, I noticed that to delete an entity I need to retrieve it by some parameter (name, id, etc.) and then call the remove method. With a plain query, on the other hand, I can just execute a DELETE.
So it seems the ORM style requires two operations where plain SQL requires one. That's why I am a little confused about whether we should use delete (or update) operations through the ORM at all. Isn't it worse for performance? Or is there something I am missing? Can it be done in any other way in ORM style?

In Doctrine2 you can call the delete on a proxy object, which is not loaded from the database. Just create a "dummy" object, something like:
$user = $em->getPartialReference('model\User', array('id' => $id));
$em->remove($user);
It doesn't require the initial query, but I'm not quite sure if Doctrine still does it internally on flush. I don't see it in the SqlLog.
Just to add, I think this is expected behavior of any decent ORM. It deals with objects and relations. It has to know that something exists before deleting it. An ORM is not just a query generator. Generally, a native query will be faster than going through any ORM. Any ORM adds a layer of abstraction and it takes some time to execute it. It is a typical tradeoff: you get some fancy features and clean code, but lose some performance.
EDIT:
I'm glad it worked out for you. Actually I stumbled on another problem, which made me realize that proxies and partial objects aren't actually the same thing. Partial objects instantiate the real model class and fill it with the values you want. After you initialize a partial object, lazy-loading doesn't work on it anymore. So for instance, if you make a partial object with only the id, and want to delete only if another object field satisfies some condition, it will not work, because that other field will always be null.
On the other hand, proxies do work with lazy-loading, and don't share the problems that partial objects have. So I would strongly suggest not to use getPartialReference method, instead you can do something like:
$user = $em->getReference('model\User', $id);
$em->remove($user);
The getReference method returns the object if it is already loaded or a proxy if it is not. A proxy can lazy-load all the other values if/when you need them. As for your example, they will behave exactly the same, but proxies are surely a better way to go.
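To illustrate why a reference is enough for a delete, here is a minimal sketch of a lazy-loading proxy in plain PHP. It is not Doctrine's implementation: FakeStorage, UserProxy and the select counter are hypothetical names invented for this illustration, and the array stands in for the database table.

```php
<?php
// Hypothetical in-memory "database" that counts how many SELECTs were issued.
class FakeStorage
{
    public $selects = 0;
    private $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function select($id)
    {
        $this->selects++;
        return isset($this->rows[$id]) ? $this->rows[$id] : null;
    }

    public function delete($id)
    {
        unset($this->rows[$id]);
    }

    public function has($id)
    {
        return isset($this->rows[$id]);
    }
}

// Hypothetical proxy: it knows only the id until another field is requested.
class UserProxy
{
    private $storage;
    private $id;
    private $data = null;

    public function __construct(FakeStorage $storage, $id)
    {
        $this->storage = $storage;
        $this->id = $id;
    }

    public function getId()
    {
        return $this->id; // the id is already known, no query needed
    }

    public function getName()
    {
        if ($this->data === null) {
            $this->data = $this->storage->select($this->id); // lazy load on first access
        }
        return $this->data['name'];
    }
}

$storage = new FakeStorage(array(
    7 => array('name' => 'Jedediah'),
    8 => array('name' => 'Ann'),
));

$user = new UserProxy($storage, 7);
$storage->delete($user->getId());   // delete by id: no SELECT was issued
echo $storage->selects, "\n";       // 0

$other = new UserProxy($storage, 8);
echo $other->getName(), "\n";       // Ann (first field access triggers one SELECT)
echo $storage->selects, "\n";       // 1
```

The point of the sketch is only that removing by identifier never touches the other fields, while a proxy can still fetch them later if a condition needs them.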

Done!
For me it worked like this, after adding the third line:
$user = $em->getReference('model\User', $id);
$em->remove($user);
$em->flush();

Related

Data Mapper, only for CRUD operations?

I have recently started reading about data mappers and all the articles I've found only demonstrate CRUD operations like:
$user = new User('John', 'Joe', 'john#hotmail.com');
$userMapper->insert($user);
// or this...
$user = $userMapper->fetchById(239);
Surely they are supposed to do more?
Currently in my application I use DAO's (or something similar anyway) so for example when I need a $user object one of my factories creates a $userDAO object and injects it into my $user object. And to do a query from my $user object I just do:
$this->userDAO->getNumActiveOrders($this->userId);
and it will do the query in the $userDAO object and return the result.
After loads of reading it seems my implementation is wrong because the domain object should not know about the DAO and vice versa. Am I right or wrong?
If it's wrong to do it that way then I assume that data mappers must be used for more than CRUD operations?
So if I wanted to find out how many active orders a user has I can do something like:
$userMapper->getNumActiveOrders($userId);
Would that be correct?
And if I wanted to set that value in my $user object I would have to do something like:
$user->setNumActiveOrders($userMapper->getNumActiveOrders($userId));
Using my implementation of DAO's seems to be a lot faster and uses less code than using data mappers but I am probably implementing data mappers wrongly.
Any advice would be great thanks.

After loads of reading it seems my implementation is wrong because the domain object should not know about the DAO and vice versa. Am I right or wrong?
That is correct.
If it's wrong to do it that way then I assume that data mappers must be used for more than CRUD operations?
The purpose of a DataMapper is to map data from a Database to Domain Objects. Since object graphs are usually not structured like data in a relational database system, you need some sort of mapper to get the relational data from the database into your objects and vice versa. DataMappers try to solve the problem of Impedance Mismatch.
So if I wanted to find out how many active orders a user has I can do something like:
$userMapper->getNumActiveOrders($userId);
Would that be correct?
Yes, you could do it that way. But you could also query the user object for it, e.g.
echo $user->getActiveOrders();
and your user object would likely have some sort of Lazy Loading mechanism to fetch the Active Orders then.
And if I wanted to set that value in my $user object I would have to do something like:
$user->setNumActiveOrders($userMapper->getNumActiveOrders($userId));
No. You'd simply set the Active Orders. The number can be derived from them. If the count is something you want inserted in the database, you'd handle that in the Mapper.
Using my implementation of DAO's seems to be a lot faster and uses less code than using data mappers but I am probably implementing data mappers wrongly.
That's pretty much normal, because DAO's only query the database and don't do any mapping.
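As a rough illustration of the points above, here is a minimal sketch of a mapper that builds a User from separate "tables" while the User itself knows nothing about storage. The arrays stand in for database rows, the column names are assumptions, and the orders are loaded eagerly for brevity where a real mapper might lazy-load them.

```php
<?php
// Domain object: plain PHP, no knowledge of the mapper or the database.
class User
{
    private $id;
    private $name;
    private $activeOrders = array();

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function getId()
    {
        return $this->id;
    }

    public function getName()
    {
        return $this->name;
    }

    public function setActiveOrders(array $orders)
    {
        $this->activeOrders = $orders;
    }

    public function getActiveOrders()
    {
        return $this->activeOrders;
    }

    // The number is derived from the orders, it is never set separately.
    public function getNumActiveOrders()
    {
        return count($this->activeOrders);
    }
}

// Hypothetical mapper: the two arrays stand in for the user and order tables.
class UserMapper
{
    private $userRows;
    private $orderRows;

    public function __construct(array $userRows, array $orderRows)
    {
        $this->userRows = $userRows;
        $this->orderRows = $orderRows;
    }

    public function fetchById($id)
    {
        if (!isset($this->userRows[$id])) {
            return null;
        }
        $user = new User($id, $this->userRows[$id]['name']);
        $orders = array();
        foreach ($this->orderRows as $row) {
            if ($row['user_id'] === $id && $row['status'] === 'active') {
                $orders[] = $row['order_no'];
            }
        }
        $user->setActiveOrders($orders);
        return $user;
    }
}

$mapper = new UserMapper(
    array(239 => array('name' => 'John')),
    array(
        array('user_id' => 239, 'order_no' => 'A-1', 'status' => 'active'),
        array('user_id' => 239, 'order_no' => 'A-2', 'status' => 'shipped'),
        array('user_id' => 239, 'order_no' => 'A-3', 'status' => 'active'),
    )
);

$user = $mapper->fetchById(239);
echo $user->getNumActiveOrders(), "\n"; // 2
```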

Who should handle the conditions in complex queries, the data mapper or the service layer?

This question did a very good job of clearing up my confusion on the matter, but I'm having a hard time finding reliable sources on what the exact limitations of the service layer should be.
For this example, assume we're dealing with books, and we want to get books by author. The BookDataMapper could have a generic get() method that accepts condition(s) such as the book's unique identifier, author name, etc. This implementation is fairly trivial (logically), but what if we want to have multiple conditions that require a more complex query?
Let's say we want to get all books written by a certain author under a specific publisher. We could expand the BookDataMapper->get() method to parse out multiple conditions, or we could write a new method such as BookDataMapper->getByAuthorAndPublisher().
Is it preferable to have the service layer call these [more specific] methods directly, or have the conditions parsed before calling the more generic BookDataMapper->get() method with multiple conditions passed? In the latter scenario, the service layer would do more of the logical "heavy lifting," leaving the data mapper fairly simple. The former option would reduce the service layer almost entirely to just a middle-man, leaving conditional logic to the data mapper in methods like BookDataMapper->getByAuthorAndPublisher().
The obvious concern with letting the service layer parse the conditions is that some of the domain logic leaks out of the data mapper (this is explained in the linked question here). However, if the service layer were to handle the conditions, the logic wouldn't make it out of the model layer; the controller would call $book_service->getByAuthorAndPublisher() regardless.

The data mapper pattern only tells you what it is supposed to do, not how it should be implemented.
Therefore all the answers in this topic should be treated as subjective, because they reflect each author's personal preferences.
I usually try to keep mapper's interface as simple as possible:
fetch(), retrieves data in the domain object or collection,
save(), saves (updates existing or inserts new) the domain object or collection
remove(), deletes the domain object or collection from storage medium
I keep the condition in the domain object itself:
$user = new User;
$user->setName( 'Jedediah' );
$mapper = new UserMapper;
$mapper->fetch( $user );
if ( $user->getFlags() > 5 )
{
$user->setStatus( User::STATUS_LOCKED );
}
$mapper->save( $user );
This way you can have multiple conditions for the retrieval, while keeping the interface clean.
The downside to this is that you need public methods for retrieving information from the domain object for such a fetch() method to work, but you will need them anyway to perform save().
There is no real way to implement the "Tell Don't Ask" rule-of-thumb for mapper and domain object interaction.
As for "How to make sure that you really need to save the domain object?", which might occur to you, it has been covered here, with extensive code examples and some useful bits in the comments.
Update
In case you expect to deal with groups of objects, you should be dealing with different structures instead of simple Domain Objects.
$category = new Category;
$category->setTitle( 'privacy' );
$list = new ArticleCollection;
$list->setCondition( $category );
$list->setDateRange( mktime( 0, 0, 0, 12, 9, 2001) );
// it would make sense, if unset second value for range of dates
// would default to NOW() in mapper
$mapper = new ArticleCollectionMapper;
$mapper->fetch( $list );
foreach ( $list as $article )
{
$article->setFlag( Article::STATUS_REMOVED );
}
$mapper->store( $list );
In this case the collection is a glorified array with the ability to accept different parameters, which are then used as conditions by the mapper. It should also let the mapper acquire the list of changed domain objects from the collection when the mapper is attempting to store it.
The mapper in this case should be capable of building queries (or using preset ones) with all the possible conditions (as a developer you will know all of those conditions, so you do not need to make it work with an infinite set of conditions) and of updating or creating entries for all the unsaved domain objects that the collection contains.
Note: In some respects you could say that mappers are related to the builder/factory patterns. The goal is different, but the approach to solving the problems is very similar.

I normally prefer this to be more concrete, like:
BookDataMapper->getByAuthorAndPublisher($author, $publisher)
That is because I do not need to re-invent SQL. The database is better at that, and the data mapper makes sure that the rest of the application does not need to know anything about how things are concretely stored or queried.
If you make that more dynamic you can easily have the tendency to offer too much functionality via the interface. Not good.
And take a look at your application. You'll see that there is not that much that is going to be queried differently. For the main part of the data that is normally about 5-10 routines, if that. Those are written much faster than you can even think about some dynamic system, which would actually belong in its own layer anyway.

With Doctrine what are the benefits of using DQL over SQL?

Can someone provide me a couple clear (fact supported) reasons to use/learn DQL vs. SQL when needing a custom query while working with Doctrine Classes?
I find that if I cannot use an ORM's built-in relational functionality to achieve something, I usually write a custom method in the extended Doctrine or DoctrineTable class. In this method I write the needed query in straight SQL (using PDO with proper prepared statements/injection protection, etc...). DQL seems like an additional language to learn/debug/maintain that doesn't appear to provide enough compelling reasons to use it in most common situations. DQL does not seem to be so much less complex than SQL as to warrant its use; in fact, I doubt you could effectively use DQL without already having a solid understanding of SQL. Most core SQL syntax ports fairly well across the most common DBs you'll use with PHP.
What am I missing/overlooking? I'm sure there is a reason, but I'd like to hear from people who have intentionally used it significantly and what the gain was over trying to work with plain-ole SQL.
I'm not looking for an argument supporting ORMs, just DQL when needing to do something outside the core 'get-by-relationship' type needs, in a traditional LAMP setup (using mysql, postgres, etc...)

To be honest, I learned SQL using Doctrine1.2 :) I wasn't even aware of foreign-keys, cascade operations, complex functions like group_concat and many, many other things. Indexed search is also very nice and handy thing that simply works out-of-the-box.
DQL code is much simpler to write and understand. For example, this query:
$query = ..... // some query for Categories
->leftJoin("c.Products p")
It will do left join between Categories and Products and you don't have to write ON p.category_id=c.id.
And if in the future you change the relation from one-2-many to, let's say, many-2-many, this same query will work without any changes at all. Doctrine will take care of that. If you did that using SQL, then all the queries would have to be changed to include the intermediary many-2-many table.

I find DQL more readable and handy. If you configure it correctly, it will be easier to join objects and queries will be easier to write.
Your code will be easy to migrate to any RDBMS.
And most important, DQL is object query language for your object model, not for your relational schema.

Using DQL helps you to deal with Objects.
When inserting into the database, you insert an object:
$test = new Test();
$test->attr = 'test';
$test->save();
When selecting from the database, you get back a result set that you can then use to fill your object:
public function getTestParam($testParam)
{
    // $testParam is not used by this query as written
    $q = Doctrine_Query::create()
        ->select('t.test_id, t.attr')
        ->from('Test t');
    $p = $q->execute();
    return $p;
}
You can check the Doctrine documentation for more details.

Zeljko's answer is pretty spot-on.
Most important reason to go with DQL instead of raw SQL (in my book): Doctrine separates entity from the way it is persisted in database, which means that entities should not have to change as underlying storage changes. That, in turn, means that if you ever wish to make changes on the underlying storage (i.e. renaming columns, altering relationships), you don't have to touch your DQL, because in DQL you use entity properties instead (which only happen to be translated behind the scenes to correct SQL, depending on your current mappings).

Using virtual fields in Doctrine_Query

Is there a way to insert logic based on virtual fields into a Doctrine_Query?
I have defined a virtual field in my model, "getStatus()" which I would ultimately like to utilize in a Where clause in my Doctrine_Query.
...
->AndWhere('x.status = ?',$status);
"status", however, is not a column in the table it is instead computed by business logic in the model.
Filtering the Collection after executing the query works in some situations, but not when a Doctrine_Pager is thrown in the mix, as it computes its offsets and such before you have access to the Collection.
Am I best off ditching Doctrine_Pager and rebuilding that functionality after modifying the Doctrine_Collection?

If you can do it in SQL you can do it in Doctrine. All doctrine is doing is working out what you are putting into the DQL parser, be it strings or values and turning that into SQL then hydrating objects from the result.
You can't use Doctrine_Pager to page on non query objects, however you could use sfPager and pass it the results of the Doctrine_Collection as an array? In the worst case you could pass it the results of the query minus any limits in the query and let it handle the paging, however this is really inefficient.
It might be quicker to write the pager "old skool" like you would in plain old PHP.

I don't really know what business logic you're applying to work out the status, but if it's not live (as in, computed per request), I'd compute it on save (using a Doctrine Record Listener or simply a preSave/preInsert hook in the model) and store it in the table, or set up a symfony task to refresh it periodically and run that as a cronjob. That would let you query it in Doctrine just fine and boost performance as a fringe benefit.
Alternatively, if status is dependent on the state of related objects, you can put an event trigger on them that updates the status of the parent object when they're modified. It's hard to recommend a best approach without more context. :)

What does a Data Mapper typically look like?

I have a table called Cat, and a PHP class called Cat. Now I want to make a CatDataMapper class, so that Cat extends CatDataMapper.
I want that Data Mapper class to provide basic functionality for doing ORM, and for creating, editing and deleting Cat.
For that purpose, maybe someone who knows this pattern very well could give me some helpful advice? I feel it would be a little bit too simple to just provide some functions like update(), delete(), save().
I realize a Data Mapper has this problem: First you create the instance of Cat, then initialize all the variables like name, furColor, eyeColor, purrSound, meowSound, attendants, etc.. and after everything is set up, you call the save() function which is inherited from CatDataMapper. This was simple ;)
But now, the real problem: You query the database for cats and get back a plain boring result set with lots of cats data.
PDO features some ORM capability to create Cat instances. Let's say I use that, or let's even say I have a mapDataset() function that takes an associative array. However, as soon as I get my Cat object from a data set, I have redundant data. At the same time, twenty users could pick up the same cat data from the database and edit the cat object, i.e. rename the cat, and save() it, while another user is still thinking about setting another furColor. When all of them save their edits, everything is messed up.
Err... ok, to keep this question really short: What's good practice here?

From DataMapper in PoEA
The Data Mapper is a layer of software
that separates the in-memory objects
from the database. Its responsibility
is to transfer data between the two
and also to isolate them from each
other. With Data Mapper the in-memory
objects needn't know even that there's
a database present; they need no SQL
interface code, and certainly no
knowledge of the database schema. (The
database schema is always ignorant of
the objects that use it.) Since it's a
form of Mapper (473), Data Mapper
itself is even unknown to the domain
layer.
Thus, a Cat should not extend CatDataMapper because that would create an is-a relationship and tie the Cat to the Persistence layer. If you want to be able to handle persistence from your Cats in this way, look into ActiveRecord or any of the other Data Source Architectural Patterns.
You usually use a DataMapper when using a Domain Model. A simple DataMapper would just map a database table to an equivalent in-memory class on a field-to-field basis. However, when the need for a DataMapper arises, you usually won't have such simple relationships. Tables will not map 1:1 to your objects. Instead, multiple tables could form one Object Aggregate and vice versa. Consequently, implementing just the CRUD methods can easily become quite a challenge.
Apart from that, it is one of the more complicated patterns (covers 15 pages in PoEA), often used in combination with the Repository pattern among others. Look into the related questions column on the right side of this page for similar questions.
As for your question about multiple users editing the same Cat, that's a common problem called Concurrency. One solution to that would be locking the row, while someone edits it. But like everything, this can lead to other issues.
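Locking the row is one option. A commonly used alternative, which the answer above does not mention, is optimistic locking with a version column: every save checks that the row still carries the version that was loaded. Below is a minimal sketch in plain PHP; CatStore and StaleCatException are hypothetical names, and the array stands in for the Cat table.

```php
<?php
class StaleCatException extends Exception
{
}

// Hypothetical store: the array stands in for the Cat table.
class CatStore
{
    private $rows = array();

    public function insert($id, $name)
    {
        $this->rows[$id] = array('name' => $name, 'version' => 1);
    }

    public function load($id)
    {
        return $this->rows[$id]; // arrays are copied by value in PHP
    }

    // Similar in spirit to:
    // UPDATE cat SET name = ?, version = version + 1 WHERE id = ? AND version = ?
    public function save($id, array $cat)
    {
        if ($this->rows[$id]['version'] !== $cat['version']) {
            throw new StaleCatException("Cat $id was changed by someone else");
        }
        $cat['version']++;
        $this->rows[$id] = $cat;
    }
}

$store = new CatStore();
$store->insert(1, 'Tom');

// Two users load the same cat at the same time.
$alice = $store->load(1);
$bob = $store->load(1);

$alice['name'] = 'Tiger';
$store->save(1, $alice);      // succeeds, stored version becomes 2

$bob['name'] = 'Felix';
try {
    $store->save(1, $bob);    // Bob still holds version 1
    $conflict = false;
} catch (StaleCatException $e) {
    $conflict = true;         // Bob has to reload and redo his edit
}
```

The trade-off is that nobody is blocked while editing, but the losing user finds out about the conflict only at save time.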

If you rely on ORMs like Doctrine or Propel, the basic principle is to create a static class that gets the actual data from the database (for instance, Propel would create CatPeer), and the results retrieved by the Peer class are then "hydrated" into Cat objects.
The hydration process is the process of converting a "plain boring" MySQL result set into nice objects having getters and setters.
So for a retrieve you'd use something like CatPeer::doSelect(). Then for a new object you'd first instantiate it (or retrieve an instance from the DB):
$cat = new Cat();
The insertion would be as simple as doing: $cat->save(); That'd be equivalent to an insert (or an update if the object already exists in the db). The ORM should know how to tell the difference between new and existing objects by using, for instance, the presence or absence of a primary key.

Implementing a Data Mapper is very hard in PHP < 5.3, since you cannot read/write protected/private fields. You have a few choices when loading and saving the objects:
Use some kind of workaround, like serializing the object, modifying its string representation, and bringing it back with unserialize
Make all the fields public
Keep them private/protected, and write mutators/accessors for each of them
The first method has the possibility of breaking with a new release and is a very crude hack; the second one is considered a (very) bad practice.
The third option is also considered bad practice, since you should not provide getters/setters for all of your fields, only the ones that need it. Your model gets "damaged" from a pure DDD (domain driven design) perspective, since it contains methods that are only needed because of the persistence mechanism.
It also means that now you have to describe another mapping for the fields -> setter methods, next to the fields -> table columns.
PHP 5.3 introduces the ability to access/change all types of fields, by using reflection:
http://hu2.php.net/manual/en/reflectionproperty.setaccessible.php
With this, you can achieve a true data mapper, because the need to provide mutators for all of the fields has ceased.
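A minimal sketch of that reflection approach follows. The hydrate() and dehydrate() helpers are hypothetical names, and the row keys are assumed to match the property names one-to-one. Since PHP 8.1 reflection can reach private properties without setAccessible(), so the call is guarded by a version check.

```php
<?php
// Domain class with private fields and no setters added for persistence.
class Cat
{
    private $name;
    private $furColor;

    public function getName()
    {
        return $this->name;
    }
}

// Hypothetical helper: copies a database row into the object's private fields.
function hydrate($object, array $row)
{
    $class = new ReflectionClass($object);
    foreach ($row as $field => $value) {
        $property = $class->getProperty($field);
        if (PHP_VERSION_ID < 80100) {
            $property->setAccessible(true); // required on PHP 5.3 - 8.0
        }
        $property->setValue($object, $value);
    }
    return $object;
}

// Hypothetical helper: reads all fields back into a row for saving.
function dehydrate($object)
{
    $row = array();
    $class = new ReflectionClass($object);
    foreach ($class->getProperties() as $property) {
        if (PHP_VERSION_ID < 80100) {
            $property->setAccessible(true);
        }
        $row[$property->getName()] = $property->getValue($object);
    }
    return $row;
}

$cat = hydrate(new Cat(), array('name' => 'Tom', 'furColor' => 'grey'));
echo $cat->getName(), "\n"; // Tom

$row = dehydrate($cat);     // array('name' => 'Tom', 'furColor' => 'grey')
```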

PDO features some ORM capability to
create Cat instances. Lets say I use
that, or lets even say I have a
mapDataset() function that takes an
associative array. However, as soon as
I got my Cat object from a data set, I
have redundant data. At the same time,
twenty users could pick up the same
cat data from the database and edit
the cat object, i.e. rename the cat,
and save() it, while another user
still thinks about setting another
furColor. When all of them save their
edits, everything is messed up.
In order to keep track of the state of the data, typically an IdentityMap and/or a UnitOfWork would be used to keep track of all the different operations on mapped entities, and at the end of the request cycle all the operations would then be performed.
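The two patterns can be sketched together in a few lines of plain PHP. This is only an illustration under simplifying assumptions: Session is a hypothetical class that combines both roles, the array stands in for the Cat table, and changes are registered by hand with markDirty() instead of being detected automatically.

```php
<?php
// Public fields keep the sketch short; a real domain class would hide them.
class Cat
{
    public $id;
    public $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

// Hypothetical session: identity map plus unit of work in one class.
class Session
{
    public $writes = 0;
    private $rows;                  // stands in for the Cat table
    private $identityMap = array(); // id => the single Cat instance for that id
    private $dirty = array();       // id => Cat waiting to be written

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    public function find($id)
    {
        if (!isset($this->identityMap[$id])) {
            $this->identityMap[$id] = new Cat($id, $this->rows[$id]['name']);
        }
        return $this->identityMap[$id];
    }

    public function markDirty(Cat $cat)
    {
        $this->dirty[$cat->id] = $cat;
    }

    // Called once, e.g. at the end of the request cycle.
    public function flush()
    {
        foreach ($this->dirty as $cat) {
            $this->rows[$cat->id]['name'] = $cat->name;
            $this->writes++;
        }
        $this->dirty = array();
    }

    public function storedName($id)
    {
        return $this->rows[$id]['name'];
    }
}

$session = new Session(array(1 => array('name' => 'Tom')));

$a = $session->find(1);
$b = $session->find(1);      // same instance as $a, no redundant copy

$a->name = 'Tiger';
$session->markDirty($a);
$b->name = 'Felix';
$session->markDirty($b);     // same object, so still only one pending write

$session->flush();
echo $session->writes, "\n";        // 1
echo $session->storedName(1), "\n"; // Felix
```

Note that this only removes redundant copies within one request; edits made by different users in separate requests still need a concurrency strategy such as locking.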

To keep the answer short:
You have an instance of Cat. (Maybe it extends CatDbMapper, or Cat3rdpartycatstoreMapper)
You call:
$cats = $cat_model->getBlueEyedCats();
//then you get an array of Cat objects, in the $cats array
I don't know what you use; you might take a look at some PHP framework for a better understanding.
