Doctrine2...Best hydration mode? - php

I am designing a room booking system which has nine entities, which all relate to each other. In this specific instance I am retrieving 10-30 rows from the entity entry which has 25 properties. Each entry has one room which has 10 properties. I need all of the entry information as well as entry->room->id and entry->room->name. But it seems like doctrine is loading the entire room when I use Query::HYDRATE_ARRAY. It seems to be lazy-loading in Query::HYDRATE_OBJECT more easily.
So, I am wondering if using the Query::HYDRATE_OBJECT mode is faster or "better" than Query::HYDRATE_ARRAY / Query::HYDRATE_SCALAR/ Query::HYDRATE_SINGLE_SCALAR. Since I am reusing some older code I'd like to use HYDRATE_ARRAY but only if it won't slow the application down.

My 2 cents:
HYDRATE_OBJECT is best for when you plan on using a lot of business logic with your objects. Especially if you're doing a lot of data manipulation. It's also probably the slowest (depending on the situation).
HYDRATE_ARRAY is usually reserved for when you only need a result and 1 degrees of relational data and it's going to be used for printing/viewing purposes only.
HYDRATE_NONE is another one I use when I'm only selecting a very small subset of data (like one or two fields instead of the entire row). This behaves much like a raw query result would.
This might also be of interest http://www.doctrine-project.org/2010/03/17/doctrine-performance-revisited.html
This is from the 1.2 docs but I think the Hydration tips apply in 2.0 http://doctrine.readthedocs.org/en/latest/en/manual/improving-performance.html
Another important rule that belongs in this category is: Only fetch objects when you really need them. Doctrine has the ability to fetch "array graphs" instead of object graphs. At first glance this may sound strange because why use an object-relational mapper in the first place then? Take a second to think about it. PHP is by nature a precedural language that has been enhanced with a lot of features for decent OOP. Arrays are still the most efficient data structures you can use in PHP. Objects have the most value when they're used to accomplish complex business logic. It's a waste of resources when data gets wrapped in costly object structures when you have no benefit of that
On using HYDRATE_ARRAY:
Can you think of any benefit of having objects in the view instead of arrays? You're not going to execute business logic in the view, are you? One parameter can save you a lot of unnecessary processing:
$blogPosts = $q->execute(array(1), Doctrine_Core::HYDRATE_ARRAY);

Related

Mvc and only selecting fields needed

I cant seem to find an acceptable answer to this.
There are two big things I keep seeing:
1) Don't execute queries in the controller. That is the responsibility of business or data.
2) Only select the columns that you need in a query.
My problem is that these two things kind of butt heads since what is displayed in the UI is really what determines what columns need to be queried. This in turn leads to the obvious solution of running the query in the controller, which you aren't supposed to do. Any documentation I have found googling, etc. seems to conveniently ignore this topic and pretend it isn't an issue.
Doing it in the business layer
Now if I take it the other way and query everything in the business layer then I implicitly am making all data access closely reflect the ui layer. This is more a problem with naming of query functions and classes than anything I think.
Take for example an application that has several views for displaying different info about a customer. The natural thing to do would be to name these data transfer classes the same as the view that needs them. But, the business or service layer has no knowledge of the ui layer and therefore any one of these data transfer classes could really be reused for ANY view without breaking any architecture rules. So then, what do I name all of these variations of, say "Customer", where one selects first name and last name, another might select last name and email, or first name and city, and so on. You can only name so many classes "CustomerSummary".
Entity Framework and IQueryable is great. But, what about everything else?
I understand that in entity framework I can have a data layer pass back an IQuerable whose execution is deferred and then just tell that IQueryable what fields I want. That is great. It seems to solve the problem. For .NET. The problem is, I also do PHP development. And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all. And even those dont have the same ability as EF / IQueryable. So I am back to the same problem without a solution again in PHP.
Wrapping it up
So, my overall question is how do I get only the fields I need without totally stomping on all the rules of an ntier architecture? And without creating a data layer that inevitably has to be designed to reflect the layout of the UI layer?
And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all.
The Doctrine PHP ORM offers lazy loading down to the property / field level. You can have everything done through proxies that will only query the database as needed. In my experience letting the ORM load the whole object once is preferable 90%+ of the time. Otherwise if you're not careful you will end up with multiple queries to the database for the same records. The extra DB chatter isn't worthwhile unless your data model is messy and your rows are very long.
Keep in mind a good ORM will also offer a built-in caching layer. Populating a whole object once and caching it is easier and more extensible then having your code keep track of which fields you need to query in various places.
So my answer is don't go nuts trying to only query the fields you need when using an ORM. If you are writing your queries by hand just in the places you need them, then only query the fields you need. But since you are talking good architectural patterns I assume you're not doing this.
Of course there are exceptions, like querying large data sets for reporting or migrations. These will require unique optimizations.
Questions
1) Don't execute queries in the controller. That is the responsibility of business or data.
How you design your application is up to you. That being said, it's always best to consider best patterns and practices. The way I design my controllers is that I pass in the data layer(IRepository) through constructor and inject that at run time.
public MyController(IRepository repo)
To query my code I simply call
repository.Where(x=> x.Prop == "whatever")
Using IQueryable creates the leaky abstraction problem. Although, it may not be a big deal but you have to be careful and mindful of how you are using your objects especially if they contain relational data. Once you query your data layer you would construct your view model in your controller action with the appropriate data required for your view.
public ActionResult MyAction(){
var data = _repository.Single(x => x.Id == 1);
var vm = new MyActionViewModel {
Name = data.Name,
Age = data.Age
};
return View();
}
If I had any queries that where complex I would create a business layer to include that logic. This would include enforcing business rules etc. In my business layer I would pass in the repository and use that.
2) Only select the columns that you need in a query.
With ORMs you usually pass back the whole object. After that you can construct your view model to include only the data you need.
My suggestion to your php problem is maybe to set up a web api for your data. It would return json data that you can then parse in whatever language you need.
Hope this helps.
The way I do it is as follows:
Have a domain object (entity, business object .. things with the same name) for Entities\Customer, that has all fields and associated logic for all of the data, that a complete instance would have. But for persistence create two separate data mappers:
Mappers\Customer for handling all of the data
Mappers\CustomerSummary for only important parts
If you only need to get customers name and phone number, you use the "summary mapper", but, when you need to examine user's profile, you have the "all data mapper". And the same separation can be really useful, when updating data too. Especially, if your "full customer" get populated from multiple tables.
// code from a method of some service layer class
$customer = new \Model\Entities\Customer;
$customer->setId($someID);
$mapper = new \Model\Mappers\CustomerSummary($this->db);
if ($needEverything) {
$mapper = new \Model\Mappers\Customer($this->db);
}
$mapper->fetch($customer);
As for, what goes where, you probably might want to read this old post.

A data structure from Repository to serve all purposes?

I need help with something I can’t get my head wrapped around regarding the Repository and Service/Use-case pattern (part of DDD design) I want to implement in my next (Laravel PHP) project.
All seems clear. Just one part of DDD that is confusing is the data structures from repositories. People seem to choose data structures the Repository should return (arrays or entities) but it all has disadvantages. One of which is performance looking at my experiences in the past. And one is which you don’t have interfaces for simple data structures (array or simple object attributes).
I’ll start with explaining the experience I have with a previous project. This project had flaws but some good strengths I learned from and like to see in my new project but with solving some design mistakes.
Previous experience
In the past I’ve build a website that was API Centric using the Kohana framework and Doctrine 2 ORM (data mapper pattern). The flow looked like this:
Website controller → API client (HMVC calls) → API controller → Custom Repository → Doctrine 2 ORM native Repository/Entity-manager
My custom Repository returned plain arrays using Doctrine2 DQL. Doctrine2 recommends array result data for read only operations. And yes, it made my site nice and light. The API controller just converted the array data to JSON. Simple as that.
In the past my company created projects relying fully on loaded Doctrine2 entities and it’s something we regretted due to performance.
My REST API supported queries like
/api/users?include_latest_adverts=2&include_location=true
on the users resource. The API controller passed include_location to the repository which directly included the location relation. The controller read latest_adverts=2 and called the adverts repository to get the latest 2 adverts of each user. Arrays were returned.
For example first user array:
[
name
avatar
adverts [
advert 1 [
name
price
]
advert 2 [
….
]
]
]
This proved to be very successful. My whole website was using the API. It would be very easy to to add a new client because the API was perfectly in production already using oauth. The whole website runs on it.
But this design had flaws too. My controller still contained A LOT of logic for validation, mailing, params or filters like has_adverts=true to get users with adverts only. It would mean that if I created a new port, like a total new CLI interface, I would have to duplicate alot of these controllers due to all the validation etc. But no duplication if I would create a new client. So at least one problem was solved :-)
My admin panels were completely coupled to the doctrine2 repository/entity-manager to speed up development (sort of). Why? Because my API had fat controllers with special functionality for the website only (special validation, mailing for registering etc). I would have to redo work or refactor a lot. So decided to use the Entities directly to still have some sort clear way of writing code instead of rewriting all my API controllers and move them to Services (for site & admin) for instance. Time was an issue in fixing my design mistakes.
For my next project I want all code to go through my own custom repositories and services. One flow for good separation.
New project (using DDD ideas) and dilemma with data structures
While I like the idea of being API centric, I don’t want my next project to be API centric in core because I think the same functionality should be available without the HTTP protocol in between. I want to design the core using DDD ideas.
But I liked the idea using a layer that just talked as a API and returns simple arrays. The perfect base for any new port, including my own frontend. My idea is to consider my Service classes as the API interface (return the array data), do the validation etc. I could have Services specially for the website (registering) and plain services used by the Admin or background processes. In some admin cases a Service would not be required anyway for simple CRUD editing, I could just use Repositories directly. Controllers would be very thin. With this creating a real REST API would just be a matter to create new controllers using the same Services my frontend controller classes do.
For internal logic like business rules it would be useful to have Entities (clear interfaces) instead of arrays from repositories. This way I could benefit from defining some methods that did some logic based on attributes. BUT If I would be using Doctrine2 and my repositories would always return Entities my application would suffer a big performance hit!!
One data structure ensures performance but no clear interfaces, the other ensures clear interfaces but bad performance when using a Data Pattern pattern like Doctrine 2 (now or in the future). Also I could end up with two data types which would be confusing.
I was thinking something similar to this flow:
Controller (thin) → UserService (incl. validation) → UserRepository (just storage) → Eloquent ORM
Why Eloquent instead of Doctrine2? Because I want to stick a bit to what’s common within the Laravel framework and community. So I could benefit from third party modules, for example to generate admin interfaces or similar based on models (bypassing my DDD rules). Other than using third party modules, I would design my core stuff so switching should always be easy and not affect data structure choices or performance.
Eloquent is an activerecord pattern. So I would be tempted to convert this data to POPO’s like Doctrine2 entities are. But nope... as said above, with doctrine2 real models would make the system very fat. So I fall back to simple arrays again. Knowing this would work for both and any other implementation in the future.
But it feels bad always rely on arrays. Especially when creating internal business rules. A developer would have to guess values on arrays, have no autocompletion in his IDE, could not have special methods like in Entity classes. But making two ways of dealing with data feels bad too. Or I am just too perfectionist ;) I want ONE clear data structure for all!
Building interfaces and POPO’s would mean a lot of duplicate work. I would need to convert an Eloquent model (just a table mapper, not entity) to an entity object implementing this interface. All is extra work. And eventually my last layer would be just like a API, thus converting it to arrays again. Which is extra work too. Arrays seem the deal again.
It seemed so easy reading up into DDD and Hexagonal. It seems so logic! But in reality I struggle with this one simple issue trying to stick to OOP principles. I want to use arrays because it’s the only way to be 100% sure I am not depended on any model choice and querying choice from my ORM regarding performance etc and don't have duplicate work in converting to arrays for views or an API. But there's no clear contract on how a user array could look. I want to speed up my project using these patterns, not slow them down :-) So not an option to have many converters.
Now I read a lot of topics. One makes POPO’s & interfaces that conform proper entities like Doctrine2 could return, but with all the extra work for Eloquent. Switching to Doctrine2 should be fairly easy, but would impact performance so bad or one would need to convert Doctrine2 array data to these own entity interfaces. Others choose to return simple arrays.
One convinces people to use Doctrine2 instead of Eloquent, but they leave out the fact that Doctrine2 is heavy and you really need to use array results for read only operations.
We design repositories to be changeable right? Not because it’s “nice” by design only. So how could we rely on full Entities if it has such big impact on performance or duplicate work? Even when using Doctrine2 only (coupled) this same issue would arise due to its performance!
All ORM implementations would be able to return arrays, thus no duplicate work there. Good performance. But we miss clear contracts. And we don’t have interfaces for arrays or class attributes (as a workaround)... Ugh ;)
Do I just miss a missing block in our programming languages? Interfaces on simple data structures??
Is it wise to make all arrays and have advanced business logic talk to these arrays? Thus no classes with clear interfaces. Any precalculated data (normally would be returned by an Entity method) would be within an array key defined the Service class. if not wise, what’s the alternative considering all of the above?
I would really appreciate if someone with great experience in this “domain” considering performance, different ORM implementations, etc could tell me how he/she has dealt with this?
Thanks in advance!
I think what you are dealing with is something similiar I'm struggling with. The solution I'm thinking works best is:
Entities/Repositories
Use and pass around Entities always when performing Write operations (Creating things, Updating things, Deleting things, and complex combinations thereof).
Sometimes you may use Entities when doing Read operations (when you anticipate the Read might need to be used for a Write soon after...ie. ->findById is soon followed by ->save).
Anytime you are working with an Entity (whether it be Write or Read), the Repositories need to be the place to go. You should be able to tell new developers that they can only persist to the database through Entities and the Repository.
The Entities will have properties that represent some Domain Object (many times they represent a database table with the fields of a table, but not always). They will also contain the domain logic/rules with them (ie. validation, calculations) so they are not anemic. You may additionally have some domain services if your Entities need help interacting with other Entities (need to trigger other events), or you just need an additional place to handle some extra domain logic (perform Repository calls to check for some unique conditions).
Your Repositories will solely be for working with Entities. The Repositories could accept Entities and do some persistence work with them. Or they could accept just some parameters, and do some reading/fetching into full Entities.
Some Repositories will know how to save some Domain Objects that are more complex than others. Perhaps an Entity that has a property which contains a list of other Entities that need to be saved along side the main entity (you can dive deeper into learning about Aggregate roots if you want).
The interfaces to Repositories rest in your Domain layer, but not the actual implementations of those Repositories. That way you can have an Eloquent version or whatever.
Other Queries (Table Data Gateway)
These queries won't work with Entities. They'll just be accepting parameters and returning things like Arrays or POPO's (Plain Old PHP Objects).
Many times you will need to perform Reads that do not return nicely into a single Entity. These Reads are typically more for reporting (not for CRUD-like operations, like Reading a user into an edit form that is eventually submitted and saved). For example, you might have a report that is 200 rows of JOINed data. If you used the Repositiory and tried to return large deep objects (with all the relationships populated, or even lazy-loaded) then you are going to have performance issues. Instead, use the Table Data Gatway pattern. You are just displaying data and not really needing OOP power here. The outputted data could however contain ID's, which through the UI could be used to initiate calls to Repository persistence methods.
As you are developing your app, when you come across the need for a new Read/Report query, create a new method in some class somewhere in your Table Data Gatway folder. You may find you have already created a similar query, so see how you can consolidate the other query. Use some parameters if necessary to make the gateway method's queries more flexible in particular ways (ie. columns to select, sort order, pagination, etc.). Don't make your queries too flexible though, this is where query builders/ORMs go wrong! You need to constrain your queries to a certain extent to where if you need to replace them (perhaps a different database engine) then you can easily perceive what the allowed variations are and aren't. It's up to you to find the right balance between flexibility (so you have more DRY code) and constraints (so you can optimize/replace queries later).
You can create services in your Domain to handle receiving parameters, then passing them to the Table Data Gateway, and then receiving back arrays to do some more mutating on. This will keep your Domain logic in the domain (and out of the infrastructure/persistence layer of the Repository & Table Data Gateway).
Again, just like the Repository, use interfaces in your domain services so that the implementation details stay out of your Domain layer, and resides in the actual Table Data Gateway folder.

PHP datamapper - why use them for non-collection objects?

Perhaps this is a question with a trivial answer but nevertheless it is driving me nuts for a couple of days so i would like to hear an answer. I'm recently looking up a lot of information related to building a custom datamapper for my own project (and not using an ORM) and read several thread on stackoverflow or other websites.
It seems very convincing to me to have AuthorCollection objects, which are basically only a container of Author instances or BookCollection objects, which hold multiple Book instances. But why would one need a mapper for the single Author object? All fetch criterias i can think of (except the one asking for the object with a specified BookID or AuthorID) will return multiple Book or Author instances hence BookCollection or AuthorCollection instances. So why bother with a mapper for the single objects, if the one for the appropriate collection is more general and you don't have to be sure that your criteria will only return one result?
Thanks in advance for your help.
Short answer
You don't need to bother creating two mappers for Author and AuthorCollection. If your program doesn't need an AuthorMapper and an AuthorCollectionMapper in order to work smoothly and have a clean source, by all means, do what you're most comfortable with.
Note: Choosing this route means you should be extra careful looking out for SRP violations.
Long(er) answer
It all depends on what you're trying to do. For the sake of this post, let's call AuthorMapper an item data mapper and AuthorCollectionMapper a collection data mapper.
Typically, item mappers won't be as sophisticated as their collection mappers. Item mappers will normally only fetch by a primary key and therefore limit the results, making the mapper clean and uncluttered by additional collection-specific things.
One main part of these "collection-specific things" I bring up is conditions1 and how they're implemented into queries. Often within collection mappers you'll probably have more advanced, longer, and tedious queries than what would normally be inside an item data mapper. Though entirely possible to combine your average item data mapper query (SELECT ... WHERE id = :id) with a complicated collection mapper query without using a smelly condition2, it gets more complicated and still bothers the database to execute a lengthy query when all it needed was a simple, generic one.
Additionally, though you pointed out that with an item mapper we really only fetch by a primary key, it usually turns out to be radically simpler using an item mapper for other things. An item mapper's save() and remove() methods can handle (with the right implementation) the job better than attempting to use a collection mapper to save/remove items. And, along with this point, it also becomes apparent that at times throughout using a collection mappers' save() and remove() method, a collection mapper may want to utilize item mapper methods.
In response to your question below, there may be numerous times you may want to set conditions in deleting a collection of rows from the database. For example, you may have a spam flag that, when set, hides the post but self-destructs in thirty days. I'm that case you'd most likely have a condition for the spam flag and one for the time range. Another might be deleting all the comments under an answer thirty days after an answer was deleted. I mention thirty days because it's wise to at least keep this data for a little while in case someone should want their comment or it turns out the row with a spam flag isn't actually spam.
1. Condition here means a property set on the collection instance which the collection mapper's query knows how to handle. If you haven't already, check out #tereško's answer here.
2. This condition is different and refers to the "evil if" people speak of. If you don't understand their nefariousness, I'd suggest watching some Clean Code Talks. This one specifically, but all are great.

Object Orientated Design with Databases and scalability/optimisation using PHP and mySQL

I'm currently at an impasse in reguards to the structural design of my website. At the moment I'm using objects to simplify the structure of my site (I have a person object, a party object, a position object, etc...) and in theory each of these is a row from it's respective table in the database.
Now from what I've learnt, OO Design is good for keeping things simple and easy to use/implement, which I agree with - it makes my code look so much cleaner and easier to maintain, but what I'm confused about is how I go about linking my objects to the database.
Let's say there is a person page. I create a person object, which equals one mysql query (which is reasonable), but then that person might have multiple positions which I need to fetch and display on a single page.
What I am currently doing is using a method called getPositions from the person object which gets the data from mysql and creates a separate position object for each row, passing in the data as an array. That keeps the queries down to a minimum (2 to a page) but it seems like a horrible implementation and to me, breaks the rules of object orientated design (should I want to change a mysql row, I'd need to change it in multiple places) but the alternative is worse.
In this case the alternative is just getting the ID's that I need and then creating separate positions, passing in the ID which then goes on to getting the row from the database in the constructor. If you have 20 positions per page, it can quickly add up and I've read about how much Wordpress is criticised for it's high number of queries per page and it's CPU usage. The other thing I'll need to consider in this case is sorting, and doing it this way means I'll need to sort the data using PHP, which surely can't be as efficient as natively doing it in mysql.
Of course, pages will be (and can be) cached, but to me, this seems almost like cheating for poorly built applications. In this case, what is the correct solution?
The way you're doing it now is at least on the right track. Having an array in the parent object with references to the children is basically how the data is represented in the database.
I'm not completely sure from your question if you're storing the children as references in the parent's array, but you should be and that's how PHP should store them by default. If you also use a singleton pattern for your objects that are pulled from the database, you should never need to modify multiple objects to change one row as you suggest in your question.
You should probably also create multiple constructors for your objects (using static methods that return new instances) so you can create them from their ID and have them pull the data or just create them from data you already have. The latter case would be used when you're creating children; you can have the parent pull all of the data for its children and create all of them using only one query. Getting a child from its ID will probably be used somewhere else so its good just to have if its needed.
For sorting, you could create additional private (or public if you want) arrays that have the children sorted in a particular way with references to the same objects the main array references.

Prefetching data vs using ActiveRecord methods in a loop

In my MVC web app, I'm finding myself doing a lot of actions with ActiveRecords where I'll fetch a specific subset of Products from the database (for a search query, say), and then loop through again to display each one -- but to display each one requires several more trips to the database, to fetch things like price, who supplies them, and various other pieces of metadata. To calculate each of these pieces of metadata isn't very simple; it's not really something that could be achieved with a simple JOIN. However, it WOULD be possible (for most of these cases anyway) to batch the required database calls and do them all at once before the loop, and then within the loop refer to those pre-fetched data to do the various calculations.
Just as an example of the type of thing -- in a search, I might want to know what regions the product is provided by. In the database I have various rows which represent a particular supplier's stock of that item, and I can look up all the different suppliers which supply that item, and then get all the regions supplied by those suppliers. That fits nicely into one query, but it would start getting a bit complex to join into the original product search (wouldn't it?).
I have two questions:
does anyone else do something similar to this, and does it sound like a good way to handle the problem? Or does it sound more like a design problem (perhaps the application has grown out of ActiveRecord's usefulness, or perhaps the models need to be split up and combined in different ways, for instance).
If I do pre-fetch a bunch of different things I think I'll use inside the loop, I'm having a hard time deciding what would be the best way to pass the appropriate data back to the model. At the moment I'm using a static method on the model to fetch all the data I need at the start of the array, like fetchRegionsForProductIds(array $ids) and so forth; these methods return an array keyed by the ID of the product, so when I'm working inside the loop I can get the regions for the current product and pass them as a parameter to the model method that needs them. But that seems kind of hacky and laborious to me. So, can anyone tell me if there is just some really obvious and beautiful design pattern I'm missing which could totally resolve this for me, or if it's just a bit of a complex problem that needs a kind of ugly complex solution?
Small update: I wonder if using a datamapper class would put me on the right track? Is there a common way of implementing a data mapper so that it can be told to do large batch queries up front, store that information in an array, and then drip feed it out to the records as they request it?
I really hope this question makes sense; I've done the best I can to make it clear, and will happily add more detail if someone thinks they can have a go at it!

Categories