PHP datamapper - why use them for non-collection objects?

PHP datamapper - why use them for non-collection objects? - php

Perhaps this is a question with a trivial answer but nevertheless it is driving me nuts for a couple of days so i would like to hear an answer. I'm recently looking up a lot of information related to building a custom datamapper for my own project (and not using an ORM) and read several thread on stackoverflow or other websites.
It seems very convincing to me to have AuthorCollection objects, which are basically only a container of Author instances or BookCollection objects, which hold multiple Book instances. But why would one need a mapper for the single Author object? All fetch criterias i can think of (except the one asking for the object with a specified BookID or AuthorID) will return multiple Book or Author instances hence BookCollection or AuthorCollection instances. So why bother with a mapper for the single objects, if the one for the appropriate collection is more general and you don't have to be sure that your criteria will only return one result?
Thanks in advance for your help.

Short answer
You don't need to bother creating two mappers for Author and AuthorCollection. If your program doesn't need an AuthorMapper and an AuthorCollectionMapper in order to work smoothly and have a clean source, by all means, do what you're most comfortable with.
Note: Choosing this route means you should be extra careful looking out for SRP violations.
Long(er) answer
It all depends on what you're trying to do. For the sake of this post, let's call AuthorMapper an item data mapper and AuthorCollectionMapper a collection data mapper.
Typically, item mappers won't be as sophisticated as their collection mappers. Item mappers will normally only fetch by a primary key and therefore limit the results, making the mapper clean and uncluttered by additional collection-specific things.
One main part of these "collection-specific things" I bring up is conditions1 and how they're implemented into queries. Often within collection mappers you'll probably have more advanced, longer, and tedious queries than what would normally be inside an item data mapper. Though entirely possible to combine your average item data mapper query (SELECT ... WHERE id = :id) with a complicated collection mapper query without using a smelly condition2, it gets more complicated and still bothers the database to execute a lengthy query when all it needed was a simple, generic one.
Additionally, though you pointed out that with an item mapper we really only fetch by a primary key, it usually turns out to be radically simpler using an item mapper for other things. An item mapper's save() and remove() methods can handle (with the right implementation) the job better than attempting to use a collection mapper to save/remove items. And, along with this point, it also becomes apparent that at times throughout using a collection mappers' save() and remove() method, a collection mapper may want to utilize item mapper methods.
In response to your question below, there may be numerous times you may want to set conditions in deleting a collection of rows from the database. For example, you may have a spam flag that, when set, hides the post but self-destructs in thirty days. I'm that case you'd most likely have a condition for the spam flag and one for the time range. Another might be deleting all the comments under an answer thirty days after an answer was deleted. I mention thirty days because it's wise to at least keep this data for a little while in case someone should want their comment or it turns out the row with a spam flag isn't actually spam.
1. Condition here means a property set on the collection instance which the collection mapper's query knows how to handle. If you haven't already, check out #tereško's answer here.
2. This condition is different and refers to the "evil if" people speak of. If you don't understand their nefariousness, I'd suggest watching some Clean Code Talks. This one specifically, but all are great.

Related

What is the best approach to represent a generalization case for a forum website

I've modeled the following UML for the database of our website (uni project):
However, I can't seem to find how to convert to SQL the generalization case between Post, Story and Comment. My teacher suggested to use the same table, but I think that limits me if I want to add more features, like tags for stories. Right now I already have two extra relations: a post can have several child comments, and a story belongs to a specific channel.
Should I follow my teacher's suggestion? Or should I use 3 classes?

Since your class hierarchy has only two subclasses that come with only 3 additional properties (title, channel and post), you should use the Single Table Inheritance approach. Even when you may add a few more properties/methods later on, this will be the best choice, since implementing your class hierarchy with multiple tables would imply a much greater effort/overhead.
You can read more about this choice in the section Representing class hierarchies with SQL database tables (of a tutoral on handling class hierarchies in a JavaScript front-end web app authored by me).

As always it depends. Having a single table has the advantage of accessing related post info in a simple query and just looking at a flag inside the column telling what kind of object you're dealing with. Having separate tables is a more "academic" way of solving it since you (as you noticed) have object information separated.
I'd probably go with the single table approach in this case. But - YMMV - you can as well stick to your 3 table approach. There might come other opinions here. Just wait a few days to decide :-)

Mvc and only selecting fields needed

I cant seem to find an acceptable answer to this.
There are two big things I keep seeing:
1) Don't execute queries in the controller. That is the responsibility of business or data.
2) Only select the columns that you need in a query.
My problem is that these two things kind of butt heads since what is displayed in the UI is really what determines what columns need to be queried. This in turn leads to the obvious solution of running the query in the controller, which you aren't supposed to do. Any documentation I have found googling, etc. seems to conveniently ignore this topic and pretend it isn't an issue.
Doing it in the business layer
Now if I take it the other way and query everything in the business layer then I implicitly am making all data access closely reflect the ui layer. This is more a problem with naming of query functions and classes than anything I think.
Take for example an application that has several views for displaying different info about a customer. The natural thing to do would be to name these data transfer classes the same as the view that needs them. But, the business or service layer has no knowledge of the ui layer and therefore any one of these data transfer classes could really be reused for ANY view without breaking any architecture rules. So then, what do I name all of these variations of, say "Customer", where one selects first name and last name, another might select last name and email, or first name and city, and so on. You can only name so many classes "CustomerSummary".
Entity Framework and IQueryable is great. But, what about everything else?
I understand that in entity framework I can have a data layer pass back an IQuerable whose execution is deferred and then just tell that IQueryable what fields I want. That is great. It seems to solve the problem. For .NET. The problem is, I also do PHP development. And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all. And even those dont have the same ability as EF / IQueryable. So I am back to the same problem without a solution again in PHP.
Wrapping it up
So, my overall question is how do I get only the fields I need without totally stomping on all the rules of an ntier architecture? And without creating a data layer that inevitably has to be designed to reflect the layout of the UI layer?

And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all.
The Doctrine PHP ORM offers lazy loading down to the property / field level. You can have everything done through proxies that will only query the database as needed. In my experience letting the ORM load the whole object once is preferable 90%+ of the time. Otherwise if you're not careful you will end up with multiple queries to the database for the same records. The extra DB chatter isn't worthwhile unless your data model is messy and your rows are very long.
Keep in mind a good ORM will also offer a built-in caching layer. Populating a whole object once and caching it is easier and more extensible then having your code keep track of which fields you need to query in various places.
So my answer is don't go nuts trying to only query the fields you need when using an ORM. If you are writing your queries by hand just in the places you need them, then only query the fields you need. But since you are talking good architectural patterns I assume you're not doing this.
Of course there are exceptions, like querying large data sets for reporting or migrations. These will require unique optimizations.

Questions
1) Don't execute queries in the controller. That is the responsibility of business or data.
How you design your application is up to you. That being said, it's always best to consider best patterns and practices. The way I design my controllers is that I pass in the data layer(IRepository) through constructor and inject that at run time.
public MyController(IRepository repo)
To query my code I simply call
repository.Where(x=> x.Prop == "whatever")
Using IQueryable creates the leaky abstraction problem. Although, it may not be a big deal but you have to be careful and mindful of how you are using your objects especially if they contain relational data. Once you query your data layer you would construct your view model in your controller action with the appropriate data required for your view.
public ActionResult MyAction(){
var data = _repository.Single(x => x.Id == 1);
var vm = new MyActionViewModel {
Name = data.Name,
Age = data.Age
};
return View();
}
If I had any queries that where complex I would create a business layer to include that logic. This would include enforcing business rules etc. In my business layer I would pass in the repository and use that.
2) Only select the columns that you need in a query.
With ORMs you usually pass back the whole object. After that you can construct your view model to include only the data you need.
My suggestion to your php problem is maybe to set up a web api for your data. It would return json data that you can then parse in whatever language you need.
Hope this helps.

The way I do it is as follows:
Have a domain object (entity, business object .. things with the same name) for Entities\Customer, that has all fields and associated logic for all of the data, that a complete instance would have. But for persistence create two separate data mappers:
Mappers\Customer for handling all of the data
Mappers\CustomerSummary for only important parts
If you only need to get customers name and phone number, you use the "summary mapper", but, when you need to examine user's profile, you have the "all data mapper". And the same separation can be really useful, when updating data too. Especially, if your "full customer" get populated from multiple tables.
// code from a method of some service layer class
$customer = new \Model\Entities\Customer;
$customer->setId($someID);
$mapper = new \Model\Mappers\CustomerSummary($this->db);
if ($needEverything) {
$mapper = new \Model\Mappers\Customer($this->db);
}
$mapper->fetch($customer);
As for, what goes where, you probably might want to read this old post.

Prefetching data vs using ActiveRecord methods in a loop

In my MVC web app, I'm finding myself doing a lot of actions with ActiveRecords where I'll fetch a specific subset of Products from the database (for a search query, say), and then loop through again to display each one -- but to display each one requires several more trips to the database, to fetch things like price, who supplies them, and various other pieces of metadata. To calculate each of these pieces of metadata isn't very simple; it's not really something that could be achieved with a simple JOIN. However, it WOULD be possible (for most of these cases anyway) to batch the required database calls and do them all at once before the loop, and then within the loop refer to those pre-fetched data to do the various calculations.
Just as an example of the type of thing -- in a search, I might want to know what regions the product is provided by. In the database I have various rows which represent a particular supplier's stock of that item, and I can look up all the different suppliers which supply that item, and then get all the regions supplied by those suppliers. That fits nicely into one query, but it would start getting a bit complex to join into the original product search (wouldn't it?).
I have two questions:
does anyone else do something similar to this, and does it sound like a good way to handle the problem? Or does it sound more like a design problem (perhaps the application has grown out of ActiveRecord's usefulness, or perhaps the models need to be split up and combined in different ways, for instance).
If I do pre-fetch a bunch of different things I think I'll use inside the loop, I'm having a hard time deciding what would be the best way to pass the appropriate data back to the model. At the moment I'm using a static method on the model to fetch all the data I need at the start of the array, like fetchRegionsForProductIds(array $ids) and so forth; these methods return an array keyed by the ID of the product, so when I'm working inside the loop I can get the regions for the current product and pass them as a parameter to the model method that needs them. But that seems kind of hacky and laborious to me. So, can anyone tell me if there is just some really obvious and beautiful design pattern I'm missing which could totally resolve this for me, or if it's just a bit of a complex problem that needs a kind of ugly complex solution?
Small update: I wonder if using a datamapper class would put me on the right track? Is there a common way of implementing a data mapper so that it can be told to do large batch queries up front, store that information in an array, and then drip feed it out to the records as they request it?
I really hope this question makes sense; I've done the best I can to make it clear, and will happily add more detail if someone thinks they can have a go at it!

Doctrine2...Best hydration mode?

I am designing a room booking system which has nine entities, which all relate to each other. In this specific instance I am retrieving 10-30 rows from the entity entry which has 25 properties. Each entry has one room which has 10 properties. I need all of the entry information as well as entry->room->id and entry->room->name. But it seems like doctrine is loading the entire room when I use Query::HYDRATE_ARRAY. It seems to be lazy-loading in Query::HYDRATE_OBJECT more easily.
So, I am wondering if using the Query::HYDRATE_OBJECT mode is faster or "better" than Query::HYDRATE_ARRAY / Query::HYDRATE_SCALAR/ Query::HYDRATE_SINGLE_SCALAR. Since I am reusing some older code I'd like to use HYDRATE_ARRAY but only if it won't slow the application down.

My 2 cents:
HYDRATE_OBJECT is best for when you plan on using a lot of business logic with your objects. Especially if you're doing a lot of data manipulation. It's also probably the slowest (depending on the situation).
HYDRATE_ARRAY is usually reserved for when you only need a result and 1 degrees of relational data and it's going to be used for printing/viewing purposes only.
HYDRATE_NONE is another one I use when I'm only selecting a very small subset of data (like one or two fields instead of the entire row). This behaves much like a raw query result would.
This might also be of interest http://www.doctrine-project.org/2010/03/17/doctrine-performance-revisited.html
This is from the 1.2 docs but I think the Hydration tips apply in 2.0 http://doctrine.readthedocs.org/en/latest/en/manual/improving-performance.html
Another important rule that belongs in this category is: Only fetch objects when you really need them. Doctrine has the ability to fetch "array graphs" instead of object graphs. At first glance this may sound strange because why use an object-relational mapper in the first place then? Take a second to think about it. PHP is by nature a precedural language that has been enhanced with a lot of features for decent OOP. Arrays are still the most efficient data structures you can use in PHP. Objects have the most value when they're used to accomplish complex business logic. It's a waste of resources when data gets wrapped in costly object structures when you have no benefit of that
On using HYDRATE_ARRAY:
Can you think of any benefit of having objects in the view instead of arrays? You're not going to execute business logic in the view, are you? One parameter can save you a lot of unnecessary processing:
$blogPosts = $q->execute(array(1), Doctrine_Core::HYDRATE_ARRAY);

What does a Data Mapper typically look like?

I have a table called Cat, and an PHP class called Cat. Now I want to make a CatDataMapper class, so that Cat extends CatDataMapper.
I want that Data Mapper class to provide basic functionality for doing ORM, and for creating, editing and deleting Cat.
For that purpose, maybe someone who knows this pattern very well could give me some helpful advice? I feel it would be a little bit too simple to just provide some functions like update(), delete(), save().
I realize a Data Mapper has this problem: First you create the instance of Cat, then initialize all the variables like name, furColor, eyeColor, purrSound, meowSound, attendants, etc.. and after everything is set up, you call the save() function which is inherited from CatDataMapper. This was simple ;)
But now, the real problem: You query the database for cats and get back a plain boring result set with lots of cats data.
PDO features some ORM capability to create Cat instances. Lets say I use that, or lets even say I have a mapDataset() function that takes an associative array. However, as soon as I got my Cat object from a data set, I have redundant data. At the same time, twenty users could pick up the same cat data from the database and edit the cat object, i.e. rename the cat, and save() it, while another user still things about setting another furColor. When all of them save their edits, everything is messed up.
Err... ok, to keep this question really short: What's good practice here?

From DataMapper in PoEA
The Data Mapper is a layer of software
that separates the in-memory objects
from the database. Its responsibility
is to transfer data between the two
and also to isolate them from each
other. With Data Mapper the in-memory
objects needn't know even that there's
a database present; they need no SQL
interface code, and certainly no
knowledge of the database schema. (The
database schema is always ignorant of
the objects that use it.) Since it's a
form of Mapper (473), Data Mapper
itself is even unknown to the domain
layer.
Thus, a Cat should not extend CatDataMapper because that would create an is-a relationship and tie the Cat to the Persistence layer. If you want to be able to handle persistence from your Cats in this way, look into ActiveRecord or any of the other Data Source Architectural Patterns.
You usually use a DataMapper when using a Domain Model. A simple DataMapper would just map a database table to an equivalent in-memory class on a field-to-field basis. However, when the need for a DataMapper arises, you usually won't have such simple relationships. Tables will not map 1:1 to your objects. Instead multiple tables could form into one Object Aggregate and viceversa. Consequently, implementing just CRUD methods, can easily become quite a challenge.
Apart from that, it is one of the more complicated patterns (covers 15 pages in PoEA), often used in combination with the Repository pattern among others. Look into the related questions column on the right side of this page for similar questions.
As for your question about multiple users editing the same Cat, that's a common problem called Concurrency. One solution to that would be locking the row, while someone edits it. But like everything, this can lead to other issues.

If you rely on ORM's like Doctrine or Propel, the basic principle is to create a static class that would get the actual data from the database, (for instance Propel would create CatPeer), and the results retrieved by the Peer class would then be "hydrated" into Cat objects.
The hydration process is the process of converting a "plain boring" MySQL result set into nice objects having getters and setters.
So for a retrieve you'd use something like CatPeer::doSelect(). Then for a new object you'd first instantiate it (or retrieve and instance from the DB):
$cat = new Cat();
The insertion would be as simple as doing: $cat->save(); That'd be equivalent to an insert (or an update if the object already exists in the db... The ORM should know how to do the difference between new and existing objects by using, for instance, the presence ort absence of a primary key).

Implementing a Data Mapper is very hard in PHP < 5.3, since you cannot read/write protected/private fields. You have a few choices when loading and saving the objects:
Use some kind of workaround, like serializing the object, modifying it's string representation, and bringing it back with unserialize
Make all the fields public
Keep them private/protected, and write mutators/accessors for each of them
The first method has the possibility of breaking with a new release, and is very crude hack, the second one is considered a (very) bad practice.
The third option is also considered bad practice, since you should not provide getters/setters for all of your fields, only the ones that need it. Your model gets "damaged" from a pure DDD (domain driven design) perspective, since it contains methods that are only needed because of the persistence mechanism.
It also means that now you have to describe another mapping for the fields -> setter methods, next to the fields -> table columns.
PHP 5.3 introduces the ability to access/change all types of fields, by using reflection:
http://hu2.php.net/manual/en/reflectionproperty.setaccessible.php
With this, you can achieve a true data mapper, because the need to provide mutators for all of the fields has ceased.

PDO features some ORM capability to
create Cat instances. Lets say I use
that, or lets even say I have a
mapDataset() function that takes an
associative array. However, as soon as
I got my Cat object from a data set, I
have redundant data. At the same time,
twenty users could pick up the same
cat data from the database and edit
the cat object, i.e. rename the cat,
and save() it, while another user
still things about setting another
furColor. When all of them save their
edits, everything is messed up.
In order to keep track of the state of data typically and IdentityMap and/or a UnitOfWork would be used keep track of all teh different operations on mapped entities... and the end of the request cycle al the operations would then be performed.

keep the answer short:
You have an instance of Cat. (Maybe it extends CatDbMapper, or Cat3rdpartycatstoreMapper)
You call:
$cats = $cat_model->getBlueEyedCats();
//then you get an array of Cat objects, in the $cats array
Don't know what do you use, you might take a look at some php framework to the better understanding.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.