Data Mapper pattern and duplicate objects

Data Mapper pattern and duplicate objects - php

I'm using the data mapper pattern in a PHP app I'm developing and have a question. At present, you request a Site object with a specific ID and the mapper will lookup the row, create an object and return it. However, if you do this again for the same Site you end up with two different objects with identical data. eg.:
$mapper = new Site_Mapper();
$a = $mapper->get(1);
$b = $mapper->get(1);
$a == $b // true
$a === $b // false
So, my question is, should I:
Store instantiated Site objects in
the mapper so I can then check if
they already exist before creating a
new one (could be a problem if
there's multiple mappers of the same
type)
Do the same as #1 but ensure
there is only ever one instances of each
mapper
Do the same as #1 but use a
static property so multiple
instances isn't a problem
Don't
worry about it because it's
probably not a problem

What you're looking for is the Identity Map pattern. Be careful with so called "reading inconsistencies", though. While you use an "old instance", the DB might have been changed already. And while you edit your object, another user might get an instance of it, change it faster and save it faster. Then the other object overrides all these changes again. On the web though maybe not such a big problem since a "page" quickly runs through and no object survives for longer than a few fractional seconds.

I know the question was asked quite a while ago still wanted to answer just in case if someone else runs in a similar dilemma. Actually the above suggestions #1,2,3 that author made are all related and one should consider them all in order to solve the problem.
1) Store each retrieved from DB object in a mapper so that you don't have to do it again when object with the same ID is requested. In all subsequent calls the mapper should return the stored object. This is called IdentityMap pattern. To achieve this make a private property in your mapper to hold an instance of an IdentityMap for a given object type. The Site_Mapper->get() method should always check the IdentityMap for a given ID if the object is not retrieved yet the mapper will go to the database but if it's already stored in the map it returns the cached instance which saves the trip to the database. So then $a === $b should be true because they would be references to the same object instance.
2) Yes ideally there should always be one instance of the given data mapper (Site_Mapper) in order to maintain a single instance of the IdentityMap at a given time. This can be done using Singleton pattern. This is possible with some getter method like Site_Mapper::getInstance() which will always return the same instance of a given mapper. You'd also have to declare the __construct() as a private method to prevent unwanted instantiation using new and make sure getInstance() is the only way to instantiate a mapper.
3) What the author mentioned above about static properties is true as well. To implement a Singleton in PHP one has to use a static property to hold and instance of a Mapper.
I would highly recommend Martin Fowler's book "Patterns of Enterprise Application Architecture" which talks about the above mentioned patterns and many more. It's a good read especially if you're working on your own custom ORM solution. Hope that helps.

I'd go with caching somehow - static mapper classes would be my first choice, and is what I've seen most of. Otherwise, your option 2 (which is the singleton pattern) is probably the best option.
Remember you need to clear this cache when an update is made to avoid returning stale data.
Having said that, unless you are making something to get a lot of use or that does a lot of queries, it may not matter. (your 4)
Also worth looking at for guidance (I'm sure there are many examples, I just know this one best), Propel (http://propel.phpdb.org/) has the caching feature - might be worth looking at how it does it? Or just use it maybe?

Related

Do Abstract Factories use "new"?

I am trying to use Dependency Injection as much as possible, but I am having trouble when it comes to things like short-lived dependencies.
For example, let's say I have a blog manager object that would like to generate a list of blogs that it found in the database. The options to do this (as far as I can tell) are:
new Blog();
$this->loader->blog();
the loader object creates various other types of objects like database objects, text filters, etc.
$this->blogEntryFactory->create();
However, #1 is bad because it creates a strong coupling. #2 still seems bad because it means that the object factory has to be previously injected - exposing all the other objects that it can create.
Number 3 seems okay, but if I use #3, do I put the "new" keywords in the blogEntryFactory itself, OR, do I inject the loader into the blogEntryFactory and use the loader?
If I have many different factories like blogEntryFactory (for example I could have userFactory and commentFactory) it would seem like putting the "new" keyword across all these different factories would be creating dependency problems.
I hope this makes sense...
NOTE
I have had some answers about how this is unnecessary for this specific blog example, but there are, in fact, cases where you should use the Abstract Factory Pattern, and that is the point I am getting at. Do you use "new" in that case, or do something else?

I'm no expert, but I'm going to take a crack at this. This assumes that Blog is just a data model object that acts as a container for some data and gets filled by the controller (new Blog is not very meaningful). In this case, Blog is a leaf of the object graph, and using new is okay. If you are going to test methods that need to create a Blog, you have to simultaneously test the creation of the Blog anyway, and using a mock object doesn't make sense .. the Blog does not persist past this method.
As an example, say that PHP did not have an array construct but had a collections object. Would you call $this->collectionsFactory->create() or would you be satisfied to say new Array;?

In answer to the title: yes, abstract factories typically use new. For example, see the MazeFactory code on page 92 of the GoF book. It includes, return new Maze; return new Wall; return new Room; return new Door;
In answer to the note: a design that uses abstract factories to create data models is highly suspect. The purpose is to vary the behavior of the factory's products while making their concrete implementations invisible to clients. Data models with no behavior do not benefit from an abstract factory.

OO Troubles in PHP (or any non-persistent environment)

Unless you count QBASIC back when I was a kid, I learned to program in the object oriented paradigm a half-decade ago in university, so I understand how it works. I have difficulty, however, implementing it in my large, database-driven PHP applications, because everything is re-initialized on each page load. The only true persistence is the database, but pulling all the data from the database and reconstructing all the objects on every page load sounds like a lot of overhead for not much benefit, as I'll just have to do it all over again on next page load.
Additionally, I run into these sorts of problems when I try to re-factor an application (third attempt now) to OO:
A class called "Person" exists, and it contains a variety of properties. One of the extensions of that class, "Player", contains a "parent" property, of the type "Player". Initially, none of the players start with any parents, so they would all be initialized with that field as NULL. Over time, however, parents would be added to the Players properties, and would reference other Player objects that already exist.
In PHP, however, it's inevitable that I would have to rebuild the class structure with Players already having parents. In such a case, if I instantiate a Player who has as a parent a Player that hasn't been instantiated yet - I have a problem.
Is it necessary to reduce the scope of OO programming when dealing with PHP?

You're confusing OOP concepts, with concepts about HTTP and persistence.
Your question isn't really about OOP at all- it's about persisting OOP data across multiple HTTP requests.
PHP doesn't automatically persist objects across HTTP requests- it's left up to you to take care of that (in your case, doing just as you said: "pulling all the data from the database and reconstructing all the objects on every page load").

Is it necessary to reduce the scope of OO programming when dealing with PHP?
No. You just need to use lazy loading to instantiate parent player only if it becomes necessary.
Class Player {
protected $id;
protected $name;
protected $parentId;
protected $parent;
public static function getById($id) {
// fetch row from db where primary key = $id
// instantiate Player populated with data returned from db
}
public function getParentPlayer() {
if ($this->parentId) {
$this->parent or $this->parent = Player::getById($this->parentId);
return $this->parent;
}
return new Player(); // or null if you will
}
}
This way if you intantiate player id 40:
$john = Player::getById(40);
It does not have the $parent property populated. Only upon call to getParentPlayer() it is being loaded and stored under that property.
$joe = $john->getParentPlayer();
That is of course only if $parentId is pointing to non-NULL.
Update: to solve problem of duplicate instances stored in protected property when two or more players share single parent, static cache can be used:
protected static $cache = array();
public static function getById($id) {
if (array_key_exists($id, self::$cache)) {
return self::$cache[$id];
}
// fetch row from db where primary key = $id
// instantiate Player populated with data returned from db
self::$cache[$id] = ... // newly created instance
// return instance
}
If you expect to access significant number of parents on every request, it may be feasible to select all records from database at once and store them in another static pool.

Items not being created by means of parents is indeed something that is tricky. You could do something like, ID referencing and build a PlayerManager that does the linking for you. By means of RUID's (Runtime Unique ID's). But of course, you would like to have the parents be created first.
So when creating something like an OO structure of your database, which I would encourage if you want to do some extra manipulations on it besides printing. It is doable to create a player from your DB and then, if it has a parent, create the parent as you would do normally. When arriving at the parent in your while-loop. You can just say, no it is already created. Use a Player-manager class for this, which holds your Players. So it can check if Players (and thus parents) are already there.
So you get a bit of a different way of walking through your DB, instead of just doing it linear.
As for: is it necessary to do so. I actually don't know that much about it myself. It seems, as far as I've seen on the web, that PHP classes are mostly used to created one object that can do smart things (like a DOMDocument).
But do you really have to convert your DB table to objects? It depends on your type of application. If printing is what you want, it doesn't seem logical. If some big manipulation with a lot of calls to the objects (or normally to the table), I maybe would like to see a nice OO programmed structure.

OOP programming in PHP is for me more a way of having a good maintainability and a good architecture (composition, factories, component based programing, etc ).
For sure you'll need to rethink some Design Patterns used in persitent envirronments. Pools of objects, etc. And of course some Design Patterns as still completly good in PHP used in short-term module in apache: Singleton, Factory, Adapter, Interfaces, Lazy loading etc.
About the persitence problem there're several solutions. Of course database storage. But you have as well the session storage and application caches (like memcached or apc). You can store serialized objects in such caches. You'll just need a good autoloader to get the classes loaded (and a good opcode to avoid re-interpreting the sources at every requests).
Some objects are really heavy to load and build, we can think for example of an ACL object, loading roles, ressources, default policy, exception rules, maybe even compling some of theses rules. Once you've got it it would be a desaster to rebuild this object at every request. You should really study the way to store/load this finished object somewhere to limit his load time (avoiding SQL requests and build time). Once built it's certainly an object that could have a long life, like 1hour.
The only objects that you cannot store as serialized strings are in fact the one which depends shortly of data updates by concurrent requests. Some websites does not even need to really care about it, accuracy of data is not the same in a public website publishing news and in a financial accouting app. If your application can handle pre-build serialized objects you can see the serialize-storage-load-unserialize as a simple overhead in the framework :-) But effectivly, share nothing, no object listening to all concurrent requests, no write operation on an object shared with a parallel request until this object (or related data) is stored somewhere (and read). Take it as a challenge!

benefit of having a factory for object creation?

I'm trying to understand the factory design pattern.
I don't understand why it's good to have a middleman between the client and the product (object that the client wants).
example with no factory:
$mac = new Mac();
example with a factory:
$appleStore = new AppleStore();
$mac = $appleStore->getProduct('mac');
How does the factory pattern decouple the client from the product?
Could someone give an example of a future code change that will impact on example 1 negative, but positive in example 2 so I understand the importance of decoupling?
Thanks.

I think it has to do with the resources needed to construct some types of objects.
Informally, if you told someone to build a Mac, it would be a painstaking process that would take years of design, development, manufacturing, and testing, and it might not be done right. This process would have to be repeated for every single Mac. However, if you introduce a factory, all the hard work can be done just once, then Macs can be produced more cheaply.
Now consider Joomla's factory.php. From what I can tell, the main purpose of JFactory is to pool objects and make sure objects that should be the same aren't copied. For instance, JFactory::getUser() will return a reference to one and only one object. If something gets changed in that user object, it will appear everywhere. Also, note that JFactory::getUser() returns a reference, not a new object. That is something you simply cannot do with a constructor.
Often, you need local context when constructing an object, and that context may persist and possibly take on many forms. For instance, there might be a MySQL database holding users. If User objects are created with a constructor, you'll need to pass a Database object to the constructor (or have it rely on a global variable). If you decide to switch your application to PostgreSQL, the semantics of the Database object may change, causing all uses of the constructor to need review. Global variables let us hide those details, and so do factories. Thus, a User factory would decouple the details of constructing User objects from places where User objects are needed.
When are factories helpful? When constructing an object involves background details. When are constructors better? When global variables suffice.

Don't know if I can put it any better than IBM did https://www.ibm.com/developerworks/library/os-php-designptrns/#N10076

This example returns an object of type Mac and it can never be anything different:
$mac = new Mac();
It can't be a subclass of Mac, not can it be a class that matches the interface of Mac.
Whereas the following example may return an object of type Mac or whatever other type the factory decides is appropriate.
$appleStore = new AppleStore();
$mac = $appleStore->getProduct('mac');
You might want a set of subclasses of Mac, each representing a different model of Mac. Then you write code in the factory to decide which of these subclasses to use. You can't do that with the new operator.
So a factory gives you more flexibility in object creation. Flexibility often goes hand in hand with decoupling.
Re your comment: I wouldn't say never use new. In fact, I do use new for the majority of simple object creation. But it has nothing to do with who is writing the client code. The factory pattern is for when you want an architecture that can choose the class to instantiate dynamically.
In your Apple Store example, you would probably want some simple code to instantiate a product and add it to a shopping cart. If you use new and you have different object types for each different product type, you'd have to write a huge case statement so you could make a new object of the appropriate type. Every time you add a product type, you'd have to update that case statement. And you might have several of these case statements in other parts of your application.
By using a factory, you would only have one place to update, that knows how to take a parameter and instantiate the right type of object. All places in your app would implicitly gain support for the new type, with no code changes needed. This is a win whether you're the sole developer or if you're on a team.
But again, you don't need a factory if you don't need to support a variety of subtypes. Just continue to use new in simple cases.

What does a Data Mapper typically look like?

I have a table called Cat, and an PHP class called Cat. Now I want to make a CatDataMapper class, so that Cat extends CatDataMapper.
I want that Data Mapper class to provide basic functionality for doing ORM, and for creating, editing and deleting Cat.
For that purpose, maybe someone who knows this pattern very well could give me some helpful advice? I feel it would be a little bit too simple to just provide some functions like update(), delete(), save().
I realize a Data Mapper has this problem: First you create the instance of Cat, then initialize all the variables like name, furColor, eyeColor, purrSound, meowSound, attendants, etc.. and after everything is set up, you call the save() function which is inherited from CatDataMapper. This was simple ;)
But now, the real problem: You query the database for cats and get back a plain boring result set with lots of cats data.
PDO features some ORM capability to create Cat instances. Lets say I use that, or lets even say I have a mapDataset() function that takes an associative array. However, as soon as I got my Cat object from a data set, I have redundant data. At the same time, twenty users could pick up the same cat data from the database and edit the cat object, i.e. rename the cat, and save() it, while another user still things about setting another furColor. When all of them save their edits, everything is messed up.
Err... ok, to keep this question really short: What's good practice here?

From DataMapper in PoEA
The Data Mapper is a layer of software
that separates the in-memory objects
from the database. Its responsibility
is to transfer data between the two
and also to isolate them from each
other. With Data Mapper the in-memory
objects needn't know even that there's
a database present; they need no SQL
interface code, and certainly no
knowledge of the database schema. (The
database schema is always ignorant of
the objects that use it.) Since it's a
form of Mapper (473), Data Mapper
itself is even unknown to the domain
layer.
Thus, a Cat should not extend CatDataMapper because that would create an is-a relationship and tie the Cat to the Persistence layer. If you want to be able to handle persistence from your Cats in this way, look into ActiveRecord or any of the other Data Source Architectural Patterns.
You usually use a DataMapper when using a Domain Model. A simple DataMapper would just map a database table to an equivalent in-memory class on a field-to-field basis. However, when the need for a DataMapper arises, you usually won't have such simple relationships. Tables will not map 1:1 to your objects. Instead multiple tables could form into one Object Aggregate and viceversa. Consequently, implementing just CRUD methods, can easily become quite a challenge.
Apart from that, it is one of the more complicated patterns (covers 15 pages in PoEA), often used in combination with the Repository pattern among others. Look into the related questions column on the right side of this page for similar questions.
As for your question about multiple users editing the same Cat, that's a common problem called Concurrency. One solution to that would be locking the row, while someone edits it. But like everything, this can lead to other issues.

If you rely on ORM's like Doctrine or Propel, the basic principle is to create a static class that would get the actual data from the database, (for instance Propel would create CatPeer), and the results retrieved by the Peer class would then be "hydrated" into Cat objects.
The hydration process is the process of converting a "plain boring" MySQL result set into nice objects having getters and setters.
So for a retrieve you'd use something like CatPeer::doSelect(). Then for a new object you'd first instantiate it (or retrieve and instance from the DB):
$cat = new Cat();
The insertion would be as simple as doing: $cat->save(); That'd be equivalent to an insert (or an update if the object already exists in the db... The ORM should know how to do the difference between new and existing objects by using, for instance, the presence ort absence of a primary key).

Implementing a Data Mapper is very hard in PHP < 5.3, since you cannot read/write protected/private fields. You have a few choices when loading and saving the objects:
Use some kind of workaround, like serializing the object, modifying it's string representation, and bringing it back with unserialize
Make all the fields public
Keep them private/protected, and write mutators/accessors for each of them
The first method has the possibility of breaking with a new release, and is very crude hack, the second one is considered a (very) bad practice.
The third option is also considered bad practice, since you should not provide getters/setters for all of your fields, only the ones that need it. Your model gets "damaged" from a pure DDD (domain driven design) perspective, since it contains methods that are only needed because of the persistence mechanism.
It also means that now you have to describe another mapping for the fields -> setter methods, next to the fields -> table columns.
PHP 5.3 introduces the ability to access/change all types of fields, by using reflection:
http://hu2.php.net/manual/en/reflectionproperty.setaccessible.php
With this, you can achieve a true data mapper, because the need to provide mutators for all of the fields has ceased.

PDO features some ORM capability to
create Cat instances. Lets say I use
that, or lets even say I have a
mapDataset() function that takes an
associative array. However, as soon as
I got my Cat object from a data set, I
have redundant data. At the same time,
twenty users could pick up the same
cat data from the database and edit
the cat object, i.e. rename the cat,
and save() it, while another user
still things about setting another
furColor. When all of them save their
edits, everything is messed up.
In order to keep track of the state of data typically and IdentityMap and/or a UnitOfWork would be used keep track of all teh different operations on mapped entities... and the end of the request cycle al the operations would then be performed.

keep the answer short:
You have an instance of Cat. (Maybe it extends CatDbMapper, or Cat3rdpartycatstoreMapper)
You call:
$cats = $cat_model->getBlueEyedCats();
//then you get an array of Cat objects, in the $cats array
Don't know what do you use, you might take a look at some php framework to the better understanding.

How do I access class variables of a parent object in PHP?

An instance of class A instantiates a couple of other objects, say for example from class B:
$foo = new B();
I would like to access A's public class variables from methods within B.
Unless I'm missing something, the only way to do this is to pass the current object to the instances of B:
$foo = new B($this);
Is this best practice or is there another way to do this?

That looks fine to me, I tend to use a rule of thumb of "would someone maintaining this understand it?" and that's an easily understood solution.
If there's only one "A", you could consider using the registry pattern, see for example http://www.phppatterns.com/docs/design/the_registry

I would first check if you are not using the wrong pattern: From your application logic, should B really know about A? If B needs to know about A, a parent-child relationship seems not quite adequate. For example, A could be the child, or part of A's logic could go into a third object that is "below" B in the hierarchy (i. e. doesn't know about B).
That said, I would suggest you have a method in B to register A as a data source, or create a method in A to register B as an Observer and a matching method in B that A uses to notify B of value changes.

Similar to what Paul said, if there's only one A, you can implement that as a singleton. You can then pass the instance of A as an argument to the constructor (aggregation), with a setter method (essentially aggregation again), or you can set this relationship directly in the constructor (composition).
However, while singletons are powerful, be wary of implementing them with composition. It's nice to think that you can do it that way and get rid of a constructor argument, but it also makes it impossible to replace A with something else without a code rewrite. Peronsally, I'd stick with aggregation, even if using a singleton
$foo = new B( A::getInstance() );

$foo = new B($this);
Code like this unfortunately does not match my needs. Is there any other way to access the parent object properties?
I'll try to explain why. We write a game software and some classes have very "unusual" dependencies and influence each other in different ways. That's why code sometimes gets almost unsupportable without links to parents in every instance (sometimes even several parents from different contexts i.e. a Squad may belong to Battle and to User etc...).
And now the reason why links don't satisfy me. When I generate an output for the client side, I use a kind of serializing objects in XML. It works very nice until it meets recursive references like those links to parents. I can make them protected, but then they loose their usage i.e. (dummy example)
$this->squad->battle->getTeam($tid)->getSquad($sqid)->damageCreature(...);
The other way - to implement serialization method in every serializable class and call it inside serializer like this:
$obj->toXML($node);
$this->appendChild($node);
but that's a lot of stuff to write and to support! And sometimes i generate the objects for serializer dynamically (less traffic).
I even think about a hack: to "teach" serializer to ignore some properties in certain classess )). Huh... bad idea...
It's a long discussion, but believe me, that Registry and Observer don't fit. Are there any other ideas?

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.