Is there a way to insert logic based on virtual fields into a Doctrine_Query?
I have defined a virtual field in my model, "getStatus()", which I would ultimately like to use in a where clause in my Doctrine_Query.
...
->andWhere('x.status = ?', $status);
"status", however, is not a column in the table; it is instead computed by business logic in the model.
Filtering the Collection after executing the query works in some situations, but not when a Doctrine_Pager is thrown into the mix, as it computes its offsets and such before you have access to the Collection.
Am I best off ditching Doctrine_Pager and rebuilding that functionality after modifying the Doctrine_Collection?
If you can do it in SQL, you can do it in Doctrine. All Doctrine does is take whatever you feed the DQL parser, be it strings or values, turn that into SQL, and then hydrate objects from the result.
You can't use Doctrine_Pager to page over anything that isn't a query object. However, you could use sfPager and pass it the results of the Doctrine_Collection as an array. In the worst case you could pass it the results of the query with any limits removed and let it handle the paging, but this is really inefficient.
It might be quicker to write the pager "old skool" like you would in plain old PHP.
I don't really know what business logic you're applying to work out the status, but if it's not live (as in, computed per request), I'd compute it on save (using a Doctrine Record Listener or simply a preSave/preInsert hook in the model) and store it in the table, or set up a symfony task to refresh it periodically and run that as a cronjob. That would let you query it in Doctrine just fine and boost performance as a fringe benefit.
Alternatively, if status is dependent on the state of related objects, you can put an event trigger on them that updates the status of the parent object when they're modified. It's hard to recommend a best approach without more context. :)
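As a rough illustration of the compute-on-save idea, here is a minimal sketch. The Order class, its fields, and computeStatus() are hypothetical names made up for the example; the only Doctrine-specific assumption is that Doctrine 1.x calls a record's preSave($event) hook before writing. The class is plain PHP so the sketch runs without Doctrine.

```php
<?php
// Hypothetical model: Order, paidAt, shippedAt and computeStatus() are
// illustrative names, not taken from the question.
class Order
{
    public $paidAt = null;
    public $shippedAt = null;
    public $status = null; // would be a real, queryable column

    // The business logic that previously lived only in getStatus().
    public function computeStatus()
    {
        if ($this->shippedAt !== null) {
            return 'shipped';
        }
        return $this->paidAt !== null ? 'paid' : 'pending';
    }

    // Assumption: in Doctrine 1.x this would be the Doctrine_Record
    // preSave($event) hook; here it is an ordinary method.
    public function preSave($event = null)
    {
        $this->status = $this->computeStatus();
    }
}

$order = new Order();
$order->paidAt = '2010-01-01';
$order->preSave();
echo $order->status, "\n"; // paid
```

With the value stored in a column, the original andWhere('x.status = ?', $status) works as written and Doctrine_Pager can compute its offsets in SQL.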
Related
I can't seem to find an acceptable answer to this.
There are two big things I keep seeing:
1) Don't execute queries in the controller. That is the responsibility of business or data.
2) Only select the columns that you need in a query.
My problem is that these two things kind of butt heads since what is displayed in the UI is really what determines what columns need to be queried. This in turn leads to the obvious solution of running the query in the controller, which you aren't supposed to do. Any documentation I have found googling, etc. seems to conveniently ignore this topic and pretend it isn't an issue.
Doing it in the business layer
Now if I take it the other way and query everything in the business layer then I implicitly am making all data access closely reflect the ui layer. This is more a problem with naming of query functions and classes than anything I think.
Take for example an application that has several views for displaying different info about a customer. The natural thing to do would be to name these data transfer classes the same as the view that needs them. But, the business or service layer has no knowledge of the ui layer and therefore any one of these data transfer classes could really be reused for ANY view without breaking any architecture rules. So then, what do I name all of these variations of, say "Customer", where one selects first name and last name, another might select last name and email, or first name and city, and so on. You can only name so many classes "CustomerSummary".
Entity Framework and IQueryable is great. But, what about everything else?
I understand that in Entity Framework I can have a data layer pass back an IQueryable whose execution is deferred and then just tell that IQueryable what fields I want. That is great. It seems to solve the problem. For .NET. The problem is, I also do PHP development. And pretty much all of the ORMs for PHP are designed in a way that totally defeats the purpose of using an ORM at all. And even those don't have the same ability as EF / IQueryable. So I am back to the same problem without a solution again in PHP.
Wrapping it up
So, my overall question is how do I get only the fields I need without totally stomping on all the rules of an ntier architecture? And without creating a data layer that inevitably has to be designed to reflect the layout of the UI layer?
And pretty much all of the ORMs for PHP are designed in a way that totally defeats the purpose of using an ORM at all.
The Doctrine PHP ORM offers lazy loading down to the property / field level. You can have everything done through proxies that will only query the database as needed. In my experience letting the ORM load the whole object once is preferable 90%+ of the time. Otherwise if you're not careful you will end up with multiple queries to the database for the same records. The extra DB chatter isn't worthwhile unless your data model is messy and your rows are very long.
Keep in mind a good ORM will also offer a built-in caching layer. Populating a whole object once and caching it is easier and more extensible than having your code keep track of which fields you need to query in various places.
So my answer is don't go nuts trying to only query the fields you need when using an ORM. If you are writing your queries by hand just in the places you need them, then only query the fields you need. But since you are talking good architectural patterns I assume you're not doing this.
Of course there are exceptions, like querying large data sets for reporting or migrations. These will require unique optimizations.
Questions
1) Don't execute queries in the controller. That is the responsibility of business or data.
How you design your application is up to you. That being said, it's always best to consider best patterns and practices. The way I design my controllers is that I pass in the data layer(IRepository) through constructor and inject that at run time.
public MyController(IRepository repo)
To query my code I simply call
repository.Where(x=> x.Prop == "whatever")
Using IQueryable creates the leaky abstraction problem. Although, it may not be a big deal but you have to be careful and mindful of how you are using your objects especially if they contain relational data. Once you query your data layer you would construct your view model in your controller action with the appropriate data required for your view.
public ActionResult MyAction()
{
    // Load the entity through the injected repository.
    var data = _repository.Single(x => x.Id == 1);
    // Copy only the fields the view needs into the view model.
    var vm = new MyActionViewModel {
        Name = data.Name,
        Age = data.Age
    };
    // Pass the view model to the view.
    return View(vm);
}
If I had any queries that were complex, I would create a business layer to hold that logic. This would include enforcing business rules, etc. In my business layer I would pass in the repository and use that.
2) Only select the columns that you need in a query.
With ORMs you usually pass back the whole object. After that you can construct your view model to include only the data you need.
My suggestion to your php problem is maybe to set up a web api for your data. It would return json data that you can then parse in whatever language you need.
Hope this helps.
The way I do it is as follows:
Have a domain object (entity, business object .. things with the same name) for Entities\Customer, that has all fields and associated logic for all of the data, that a complete instance would have. But for persistence create two separate data mappers:
Mappers\Customer for handling all of the data
Mappers\CustomerSummary for only important parts
If you only need to get a customer's name and phone number, you use the "summary mapper", but when you need to examine the user's profile, you have the "all data mapper". The same separation can be really useful when updating data too, especially if your "full customer" gets populated from multiple tables.
// code from a method of some service layer class
$customer = new \Model\Entities\Customer;
$customer->setId($someID);
$mapper = new \Model\Mappers\CustomerSummary($this->db);
if ($needEverything) {
    $mapper = new \Model\Mappers\Customer($this->db);
}
$mapper->fetch($customer);
As for what goes where, you might want to read this old post.
Perhaps this is a question with a trivial answer, but it has been driving me nuts for a couple of days, so I would like to hear an answer. I've recently been looking up a lot of information related to building a custom data mapper for my own project (and not using an ORM) and have read several threads on Stack Overflow and other websites.
It seems very convincing to me to have AuthorCollection objects, which are basically only a container of Author instances, or BookCollection objects, which hold multiple Book instances. But why would one need a mapper for the single Author object? All fetch criteria I can think of (except the one asking for the object with a specified BookID or AuthorID) will return multiple Book or Author instances, hence BookCollection or AuthorCollection instances. So why bother with a mapper for the single objects, if the one for the appropriate collection is more general and you don't have to be sure that your criteria will only return one result?
Thanks in advance for your help.
Short answer
You don't need to bother creating two mappers for Author and AuthorCollection. If your program doesn't need an AuthorMapper and an AuthorCollectionMapper in order to work smoothly and have a clean source, by all means, do what you're most comfortable with.
Note: Choosing this route means you should be extra careful looking out for SRP violations.
Long(er) answer
It all depends on what you're trying to do. For the sake of this post, let's call AuthorMapper an item data mapper and AuthorCollectionMapper a collection data mapper.
Typically, item mappers won't be as sophisticated as their collection mappers. Item mappers will normally only fetch by a primary key and therefore limit the results, making the mapper clean and uncluttered by additional collection-specific things.
One main part of these "collection-specific things" I bring up is conditions [1] and how they're implemented in queries. Within collection mappers you'll often have more advanced, longer, and more tedious queries than what would normally be inside an item data mapper. Though it is entirely possible to combine your average item data mapper query (SELECT ... WHERE id = :id) with a complicated collection mapper query without using a smelly condition [2], it gets more complicated and still bothers the database to execute a lengthy query when all it needed was a simple, generic one.
Additionally, though you pointed out that with an item mapper we really only fetch by a primary key, it usually turns out to be radically simpler to use an item mapper for other things. An item mapper's save() and remove() methods can handle the job (with the right implementation) better than attempting to use a collection mapper to save/remove items. Along with this point, it also becomes apparent that a collection mapper's save() and remove() methods may at times want to make use of the item mapper's methods.
In response to your question below, there may be numerous times you want to set conditions when deleting a collection of rows from the database. For example, you may have a spam flag that, when set, hides the post but self-destructs in thirty days. In that case you'd most likely have a condition for the spam flag and one for the time range. Another might be deleting all the comments under an answer thirty days after the answer was deleted. I mention thirty days because it's wise to keep this data for at least a little while in case someone should want their comment back or it turns out the row with a spam flag isn't actually spam.
1. Condition here means a property set on the collection instance which the collection mapper's query knows how to handle. If you haven't already, check out @tereško's answer here.
2. This condition is different and refers to the "evil if" people speak of. If you don't understand their nefariousness, I'd suggest watching some Clean Code Talks. This one specifically, but all are great.
Symfony ACL allows me to grant access to an entity, and then check it:
if (false === $securityContext->isGranted('EDIT', $comment)) {
throw new AccessDeniedException();
}
However, if I have thousands of entities in the database and the user has access only to 10 of them, I don't want to load all the entities in memory and hydrate them.
How can I do a simple "SELECT * FROM X" while filtering only on the entities the user has access (at SQL level)?
Well there it is: it's not possible.
Over the last year I've been working on an alternative ACL system that allows filtering directly in database queries.
My company recently agreed to open source it, so here it is: http://myclabs.github.io/ACL/
As pointed out by @gregor in the previous discussion,
In your first query, get a list (with a custom query) of all the object_identity_ids (for a specific entity/class X) a user has access to.
Then, when querying a list of objects for entity/class X, add "IN (object_identity_ids)" to your query.
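The second step above could be sketched as follows. buildAclInClause() is a hypothetical helper and the x table comes from the question; the list of allowed IDs is assumed to have been fetched by the first query.

```php
<?php
// Hypothetical helper: builds a parameterised "IN (?, ?, ...)" fragment
// from the IDs the user is allowed to see.
function buildAclInClause($column, array $ids)
{
    if (count($ids) === 0) {
        // "IN ()" is invalid SQL, so match nothing instead.
        return array('1 = 0', array());
    }
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    return array($column . ' IN (' . $placeholders . ')', array_values($ids));
}

list($where, $params) = buildAclInClause('x.id', array(3, 7, 42));
$sql = 'SELECT x.* FROM x WHERE ' . $where;
echo $sql, "\n"; // SELECT x.* FROM x WHERE x.id IN (?, ?, ?)
```

The resulting $sql and $params can then be handed to a prepared statement, so the IDs are bound as parameters rather than concatenated into the query.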
Matthieu, I wasn't satisfied with replying with more conjecture (since conjecture doesn't add anything valuable to the conversation), so I did some benchmarking on this approach (on a $5/mo Digital Ocean VPS).
As expected, table size doesn't matter when using the IN array approach. But a big array size indeed makes things get out of control.
So, Join approach vs IN array approach?
JOIN is indeed better when the array size is huge. BUT, this is assuming that we shouldn't consider the table size. Turns out, in practice IN array is faster - except when there's a large table of objects and the acl entries cover almost every object (see the linked question).
I've expanded on my reasoning on a separate question. Please see When using Symfony's ACL, is it better to use a JOIN query or an IN array query?
You could have a look into the Doctrine filters. That way you could extend all queries. I have not done this yet and there are some limitations documented. But maybe it helps you. You'll find a description of the ACL database tables here.
UPDATE
Each filter will return a string and all those strings will be added to the SQL queries like so:
SELECT ... FROM ... WHERE ... AND (<result of filter 1> AND <result of filter 2> ...)
Also, the table alias is passed to the filter method, so I think you can add subqueries here to filter your entities.
I am designing a room booking system which has nine entities, which all relate to each other. In this specific instance I am retrieving 10-30 rows from the entity entry, which has 25 properties. Each entry has one room, which has 10 properties. I need all of the entry information as well as entry->room->id and entry->room->name. But it seems like Doctrine is loading the entire room when I use Query::HYDRATE_ARRAY. It seems to be lazy-loading more readily in Query::HYDRATE_OBJECT.
So, I am wondering if using the Query::HYDRATE_OBJECT mode is faster or "better" than Query::HYDRATE_ARRAY / Query::HYDRATE_SCALAR/ Query::HYDRATE_SINGLE_SCALAR. Since I am reusing some older code I'd like to use HYDRATE_ARRAY but only if it won't slow the application down.
My 2 cents:
HYDRATE_OBJECT is best for when you plan on using a lot of business logic with your objects. Especially if you're doing a lot of data manipulation. It's also probably the slowest (depending on the situation).
HYDRATE_ARRAY is usually reserved for when you only need a result and one degree of relational data, and it's going to be used for printing/viewing purposes only.
HYDRATE_NONE is another one I use when I'm only selecting a very small subset of data (like one or two fields instead of the entire row). This behaves much like a raw query result would.
This might also be of interest http://www.doctrine-project.org/2010/03/17/doctrine-performance-revisited.html
This is from the 1.2 docs but I think the Hydration tips apply in 2.0 http://doctrine.readthedocs.org/en/latest/en/manual/improving-performance.html
Another important rule that belongs in this category is: Only fetch objects when you really need them. Doctrine has the ability to fetch "array graphs" instead of object graphs. At first glance this may sound strange because why use an object-relational mapper in the first place then? Take a second to think about it. PHP is by nature a procedural language that has been enhanced with a lot of features for decent OOP. Arrays are still the most efficient data structures you can use in PHP. Objects have the most value when they're used to accomplish complex business logic. It's a waste of resources when data gets wrapped in costly object structures when you have no benefit of that.
On using HYDRATE_ARRAY:
Can you think of any benefit of having objects in the view instead of arrays? You're not going to execute business logic in the view, are you? One parameter can save you a lot of unnecessary processing:
$blogPosts = $q->execute(array(1), Doctrine_Core::HYDRATE_ARRAY);
I have a table called Cat, and a PHP class called Cat. Now I want to make a CatDataMapper class, so that Cat extends CatDataMapper.
I want that Data Mapper class to provide basic functionality for doing ORM, and for creating, editing and deleting Cat.
For that purpose, maybe someone who knows this pattern very well could give me some helpful advice? I feel it would be a little bit too simple to just provide some functions like update(), delete(), save().
I realize a Data Mapper has this problem: First you create the instance of Cat, then initialize all the variables like name, furColor, eyeColor, purrSound, meowSound, attendants, etc., and after everything is set up, you call the save() function which is inherited from CatDataMapper. This was simple ;)
But now, the real problem: You query the database for cats and get back a plain boring result set with lots of cats data.
PDO features some ORM capability to create Cat instances. Let's say I use that, or let's even say I have a mapDataset() function that takes an associative array. However, as soon as I get my Cat object from a data set, I have redundant data. At the same time, twenty users could pick up the same cat data from the database and edit the cat object, i.e. rename the cat, and save() it, while another user still thinks about setting another furColor. When all of them save their edits, everything is messed up.
Err... ok, to keep this question really short: What's good practice here?
From Data Mapper in PoEAA:
The Data Mapper is a layer of software
that separates the in-memory objects
from the database. Its responsibility
is to transfer data between the two
and also to isolate them from each
other. With Data Mapper the in-memory
objects needn't know even that there's
a database present; they need no SQL
interface code, and certainly no
knowledge of the database schema. (The
database schema is always ignorant of
the objects that use it.) Since it's a
form of Mapper (473), Data Mapper
itself is even unknown to the domain
layer.
Thus, a Cat should not extend CatDataMapper because that would create an is-a relationship and tie the Cat to the Persistence layer. If you want to be able to handle persistence from your Cats in this way, look into ActiveRecord or any of the other Data Source Architectural Patterns.
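To illustrate the separation described above, here is a minimal sketch in which Cat knows nothing about storage and a separate CatMapper does the persistence. Everything in it is an assumption made for the example: the method names are invented, and a PHP array stands in for the database table so the code runs on its own.

```php
<?php
// Domain object: no SQL, no knowledge of the mapper.
class Cat
{
    private $id;
    private $name;

    public function __construct($name, $id = null)
    {
        $this->name = $name;
        $this->id = $id;
    }

    public function getId() { return $this->id; }
    public function getName() { return $this->name; }
    public function rename($name) { $this->name = $name; }
    public function assignId($id) { $this->id = $id; }
}

// Mapper: the only class that knows how a Cat is stored.
// An array stands in for the table (assumption for the sketch).
class CatMapper
{
    private $rows = array();
    private $nextId = 1;

    public function save(Cat $cat)
    {
        if ($cat->getId() === null) {
            $cat->assignId($this->nextId++); // "insert"
        }
        $this->rows[$cat->getId()] = array('name' => $cat->getName());
    }

    public function find($id)
    {
        if (!isset($this->rows[$id])) {
            return null;
        }
        return new Cat($this->rows[$id]['name'], $id);
    }
}

$mapper = new CatMapper();
$cat = new Cat('Tom');
$mapper->save($cat);
echo $mapper->find($cat->getId())->getName(), "\n"; // Tom
```

Note that Cat here still exposes accessors used by the mapper; swapping the array for real queries would only change CatMapper, not Cat.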
You usually use a DataMapper when using a Domain Model. A simple DataMapper would just map a database table to an equivalent in-memory class on a field-to-field basis. However, when the need for a DataMapper arises, you usually won't have such simple relationships. Tables will not map 1:1 to your objects. Instead, multiple tables could form one object aggregate and vice versa. Consequently, implementing even just the CRUD methods can easily become quite a challenge.
Apart from that, it is one of the more complicated patterns (it covers 15 pages in PoEAA), often used in combination with the Repository pattern, among others. Look into the related questions column on the right side of this page for similar questions.
As for your question about multiple users editing the same Cat, that's a common problem called Concurrency. One solution to that would be locking the row, while someone edits it. But like everything, this can lead to other issues.
If you rely on ORMs like Doctrine or Propel, the basic principle is to create a static class that gets the actual data from the database (for instance, Propel would create CatPeer), and the results retrieved by the Peer class are then "hydrated" into Cat objects.
The hydration process is the process of converting a "plain boring" MySQL result set into nice objects having getters and setters.
So for a retrieve you'd use something like CatPeer::doSelect(). Then for a new object you'd first instantiate it (or retrieve an instance from the DB):
$cat = new Cat();
The insertion would be as simple as doing: $cat->save(); That'd be equivalent to an insert (or an update if the object already exists in the DB). The ORM should know how to tell new objects from existing ones by using, for instance, the presence or absence of a primary key.
Implementing a Data Mapper is very hard in PHP < 5.3, since you cannot read/write protected/private fields. You have a few choices when loading and saving the objects:
Use some kind of workaround, like serializing the object, modifying its string representation, and bringing it back with unserialize
Make all the fields public
Keep them private/protected, and write mutators/accessors for each of them
The first method may break with a new release and is a very crude hack; the second one is considered a (very) bad practice.
The third option is also considered bad practice, since you should not provide getters/setters for all of your fields, only the ones that need it. Your model gets "damaged" from a pure DDD (domain driven design) perspective, since it contains methods that are only needed because of the persistence mechanism.
It also means that now you have to describe another mapping for the fields -> setter methods, next to the fields -> table columns.
PHP 5.3 introduces the ability to access/change all types of fields, by using reflection:
http://hu2.php.net/manual/en/reflectionproperty.setaccessible.php
With this, you can achieve a true data mapper, because the need to provide mutators for all of the fields has ceased.
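A minimal sketch of that reflection approach follows. The hydrate() helper and the Cat fields are hypothetical; ReflectionClass, ReflectionProperty::setAccessible() and setValue() are the standard PHP reflection API. The version check is there because setAccessible() is only needed on PHP 5.3 to 8.0 and has no effect from 8.1 onwards.

```php
<?php
class Cat
{
    private $name;
    private $furColor;

    public function describe()
    {
        return $this->name . ' (' . $this->furColor . ')';
    }
}

// Hypothetical helper: copies row values into same-named properties,
// including private ones, without needing any setter on the model.
function hydrate($object, array $row)
{
    $class = new ReflectionClass($object);
    foreach ($row as $field => $value) {
        if (!$class->hasProperty($field)) {
            continue; // ignore columns with no matching property
        }
        $property = $class->getProperty($field);
        if (PHP_VERSION_ID < 80100) {
            // Needed on PHP 5.3-8.0; a no-op from 8.1 onwards.
            $property->setAccessible(true);
        }
        $property->setValue($object, $value);
    }
    return $object;
}

$cat = hydrate(new Cat(), array('name' => 'Tom', 'furColor' => 'grey', 'unknown' => 1));
echo $cat->describe(), "\n"; // Tom (grey)
```

The same idea works in the other direction with getValue() when the mapper needs to read private fields for saving.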
PDO features some ORM capability to
create Cat instances. Let's say I use
that, or let's even say I have a
mapDataset() function that takes an
associative array. However, as soon as
I get my Cat object from a data set, I
have redundant data. At the same time,
twenty users could pick up the same
cat data from the database and edit
the cat object, i.e. rename the cat,
and save() it, while another user
still thinks about setting another
furColor. When all of them save their
edits, everything is messed up.
In order to keep track of the state of data, an Identity Map and/or a Unit of Work would typically be used to keep track of all the different operations on mapped entities, and at the end of the request cycle all the operations would then be performed.
To keep the answer short:
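As a rough sketch of the Identity Map half of that idea: the map hands out one in-memory instance per primary key, so repeated lookups inside a request share the same object and hit the database once. CatIdentityMap and the loader callable are hypothetical names, and the closure stands in for a real query.

```php
<?php
class Cat
{
    public $id;
    public $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }
}

// Hypothetical identity map: one in-memory instance per primary key.
class CatIdentityMap
{
    private $cats = array();
    private $loader;

    public function __construct($loader)
    {
        $this->loader = $loader; // callable that fetches a row by id
    }

    public function get($id)
    {
        if (!isset($this->cats[$id])) {
            $row = call_user_func($this->loader, $id);
            $this->cats[$id] = new Cat($id, $row['name']);
        }
        return $this->cats[$id];
    }
}

$queries = 0;
$map = new CatIdentityMap(function ($id) use (&$queries) {
    $queries++; // stands in for a database query
    return array('name' => 'Tom');
});

$a = $map->get(1);
$b = $map->get(1);
var_dump($a === $b); // bool(true)
echo $queries, "\n";  // 1
```

This only removes duplicate copies within one request; edits made by different users in separate requests still need a locking or versioning strategy on top.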
You have an instance of Cat. (Maybe it extends CatDbMapper, or Cat3rdpartycatstoreMapper)
You call:
$cats = $cat_model->getBlueEyedCats();
//then you get an array of Cat objects, in the $cats array
I don't know what you are using, but taking a look at how some PHP frameworks do this might give you a better understanding.