PHP DataMapper with multiple persistence layers

PHP DataMapper with multiple persistence layers - php

I am writing a system in PHP that has to write to three persistence layers:
One web service
Two databases (one mysql one mssql)
The reason for this is legacy systems and cannot be changed.
I am wanting to use the DataMapper pattern and I am trying to establish the best way to achieve what I want. I have an interface like follows:
<?php
$service = $factory->getService()->create($entity);
?>
Below is some contrived and cut down code for brevity:
<?php
class Post extends AbstractService
{
protected $_mapper;
public function create(Entity $post)
{
return $this->_mapper->create($post);
}
}
class AbstractMapper
{
protected $_persistence;
public function create(Entity $entity)
{
$data = $this->_prepareForPersistence($entity);
return $this->_persistence->create($data);
}
}
?>
My question is that because there are three persistence layers, there will also therefore likely be a need for three mappers for each. I'd like a clean design pattern inspired interface to make this work.
I see it as having three options:
Inject three mappers into the Service and call create on each
$_mapper is an array/collection and it iterates through them calling create on each
$_mapper is actually a container object that acts as a
further proxy and calls create on each
Something strikes me as wrong with each of these solutions and would appreciate any feedback/recognised design patterns that might fit this.

I have had to solve a similar problem but very many years ago in the days of PEAR DB. In that particular case, there was a need to replicate the data across multiple databases.
We did not have the problem of the different databases having different mappings though so it was a fair bit simpler.
What we did was to facade the DB class and override the getResult function (or whatever it was called). This function then analysed the SQL and if it was a read - it would send it to just one backed and if it was a write, it would send it to all.
This actually worked really well for a very heavily utilised site.
From that background, I would suggest entirely facading all of the persistence operations. Once you have done that, the implementation details are less relevant and can be changed at any time.
From this perspective, any of your implementation ideas seem like a reasonable approach. There are various things you will want to think about though.
What if one of the backends throw an error?
What is the performance impact of writing to three database servers?
Can the writes be done asynchronously (if so, ask the first question again)
There is potentially another way to solve this problem as well. That is to use stored procedures. If you have a primary database server, you could write a trigger which, on commit (or thereabouts) connects to the other database and sychronises the data.
If the data update does not need to be immediate, you could get the primary database to log changes and have another script that regularly "fed" this data into the other system. Again, the issue of errors will need to be considered.
Hope this helps.

First, a bit of terminology: what you call three layers, are in fact, three modules, not layers. That is, you have three modules within the persistence layer.
Now, the basic premise of this problem is this: you MUST have three different persistence logic, corresponding to three different storage sources. This is something that you can't avoid. Therefore, the question is just about how to invoke write operation on this modules (assuming that for read you don't need to call all three, or if you do, that is a separate question any ways).
From the three options you have listed, in my opinion the first one is better. Because, that is the simplest of the three. The other two, will still need to call three modules separately, with the additional work of introducing a container or some sort of data structure. You still can't avoid calling three modules somewhere.
If working with the first option, then you obviously need to work with interfaces, to provide a uniform abstraction for the user/client (in this case the service).
My point is that:
1. Their is an inherent complexity in your problem, which you can't simplify further.
2. The first option is better, because the other two, make things more complex, not simple.

I think option #2 is the best in my opinion. I would go with that. If you had like 10+ mappers than option #3 would make sense to shift the create logic to the mapper itself, but since you have a reasonable number of mappers it makes more sense to just inject them and iterate over them. Extending functionality by adding another mapper would be a matter of just adding 1 line to your dependency injection configuration.

Related

Mvc and only selecting fields needed

I cant seem to find an acceptable answer to this.
There are two big things I keep seeing:
1) Don't execute queries in the controller. That is the responsibility of business or data.
2) Only select the columns that you need in a query.
My problem is that these two things kind of butt heads since what is displayed in the UI is really what determines what columns need to be queried. This in turn leads to the obvious solution of running the query in the controller, which you aren't supposed to do. Any documentation I have found googling, etc. seems to conveniently ignore this topic and pretend it isn't an issue.
Doing it in the business layer
Now if I take it the other way and query everything in the business layer then I implicitly am making all data access closely reflect the ui layer. This is more a problem with naming of query functions and classes than anything I think.
Take for example an application that has several views for displaying different info about a customer. The natural thing to do would be to name these data transfer classes the same as the view that needs them. But, the business or service layer has no knowledge of the ui layer and therefore any one of these data transfer classes could really be reused for ANY view without breaking any architecture rules. So then, what do I name all of these variations of, say "Customer", where one selects first name and last name, another might select last name and email, or first name and city, and so on. You can only name so many classes "CustomerSummary".
Entity Framework and IQueryable is great. But, what about everything else?
I understand that in entity framework I can have a data layer pass back an IQuerable whose execution is deferred and then just tell that IQueryable what fields I want. That is great. It seems to solve the problem. For .NET. The problem is, I also do PHP development. And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all. And even those dont have the same ability as EF / IQueryable. So I am back to the same problem without a solution again in PHP.
Wrapping it up
So, my overall question is how do I get only the fields I need without totally stomping on all the rules of an ntier architecture? And without creating a data layer that inevitably has to be designed to reflect the layout of the UI layer?

And pretty much all of the ORMs for php are designed in a way that totally defeat the purpose of using an ORM at all.
The Doctrine PHP ORM offers lazy loading down to the property / field level. You can have everything done through proxies that will only query the database as needed. In my experience letting the ORM load the whole object once is preferable 90%+ of the time. Otherwise if you're not careful you will end up with multiple queries to the database for the same records. The extra DB chatter isn't worthwhile unless your data model is messy and your rows are very long.
Keep in mind a good ORM will also offer a built-in caching layer. Populating a whole object once and caching it is easier and more extensible then having your code keep track of which fields you need to query in various places.
So my answer is don't go nuts trying to only query the fields you need when using an ORM. If you are writing your queries by hand just in the places you need them, then only query the fields you need. But since you are talking good architectural patterns I assume you're not doing this.
Of course there are exceptions, like querying large data sets for reporting or migrations. These will require unique optimizations.

Questions
1) Don't execute queries in the controller. That is the responsibility of business or data.
How you design your application is up to you. That being said, it's always best to consider best patterns and practices. The way I design my controllers is that I pass in the data layer(IRepository) through constructor and inject that at run time.
public MyController(IRepository repo)
To query my code I simply call
repository.Where(x=> x.Prop == "whatever")
Using IQueryable creates the leaky abstraction problem. Although, it may not be a big deal but you have to be careful and mindful of how you are using your objects especially if they contain relational data. Once you query your data layer you would construct your view model in your controller action with the appropriate data required for your view.
public ActionResult MyAction(){
var data = _repository.Single(x => x.Id == 1);
var vm = new MyActionViewModel {
Name = data.Name,
Age = data.Age
};
return View();
}
If I had any queries that where complex I would create a business layer to include that logic. This would include enforcing business rules etc. In my business layer I would pass in the repository and use that.
2) Only select the columns that you need in a query.
With ORMs you usually pass back the whole object. After that you can construct your view model to include only the data you need.
My suggestion to your php problem is maybe to set up a web api for your data. It would return json data that you can then parse in whatever language you need.
Hope this helps.

The way I do it is as follows:
Have a domain object (entity, business object .. things with the same name) for Entities\Customer, that has all fields and associated logic for all of the data, that a complete instance would have. But for persistence create two separate data mappers:
Mappers\Customer for handling all of the data
Mappers\CustomerSummary for only important parts
If you only need to get customers name and phone number, you use the "summary mapper", but, when you need to examine user's profile, you have the "all data mapper". And the same separation can be really useful, when updating data too. Especially, if your "full customer" get populated from multiple tables.
// code from a method of some service layer class
$customer = new \Model\Entities\Customer;
$customer->setId($someID);
$mapper = new \Model\Mappers\CustomerSummary($this->db);
if ($needEverything) {
$mapper = new \Model\Mappers\Customer($this->db);
}
$mapper->fetch($customer);
As for, what goes where, you probably might want to read this old post.

Why separate Model and Controller in MVC?

I'm trying to understand the MVC pattern in Phalcon.
In my current application I only need ONE template file for each table. The template contains the datagrid, the SQL statement for the SELECT, the form, add/edit/delete-buttons, a search box and all things necessary to interact with the database, like connection information (of course using includes as much as possible to prevent duplicate code). (I wrote my own complex framework, which converts xml-templates into a complete HTML-page, including all generated Javascript-code and CSS, without any PHP needed for the business logic. Instead of having specific PHP classes for each table in the database, I only use standard operation-scripts and database-classes that can do everything). I'm trying to comply more with web standards though, so I'm investigating alternatives.
I tried the INVO example of Phalcon and noticed that the Companies-page needs a Companies model, a CompaniesController, a CompaniesForm and 4 different views. To me, compared to my single file template now, having so many different files is too confusing.
I agree that separating the presentation from the business logic makes sense, but I can't really understand why the model and controller need to be in separate classes. This only seems to make things more complicated. And it seems many people already are having trouble deciding what should be in the model and what should be in the controller anyway. For example validation sometimes is put in the model if it requires business logic, but otherwise in the controller, which seems quite complex.
I work in a small team only, so 'separation of concerns' (apart from the presentation and business logic) is not really the most important thing for us.
If I decide not to use separate model and controller classes,
what problems could I expect?

Phalcon's Phalcon\Mvc\Model class, which your models are supposed to extend, is designed to provide an object-oriented way of interacting with the database. For example, if your table is Shopping_Cart then you'd name your class ShoppingCart. If your table has a column "id" then you'd define a property in your class public $id;.
Phalcon also gives you methods like initialize() and beforeValidationOnCreate(). I will admit these methods can be very confusing regarding how they work and when they're ran and why you'd ever want to call it in the first place.
The initialize() is quite self-explanatory and is called whenever your class is initiated. Here you can do things like setSource if your table is named differently than your class or call methods like belongsTo and hasMany to define its relationship with other tables.
Relationship are useful since it makes it easy to do something like search for a product in a user's cart, then using the id, you'd get a reference to the Accounts table and finally grab the username of the seller of the item in the buyer's cart.
I mean, sure, you could do separate queries for this kind of stuff, but if you define the table relationships in the very beginning, why not?
In terms of what's the point of defining a dedicated model for each table in the database, you can define your own custom methods for managing the model. For example you might want to define a public function updateItemsInCart($productId,$quantity) method in your ShoppingCart class. Then the idea is whenever you need to interact with the ShoppingCart, you simply call this method and let the Model worry about the business logic. This is instead of writing some complex update query which would also work.
Yes, you can put this kind of stuff in your controller. But there's also a DRY (Don't Repeat Yourself) principle. The purpose of MVC is separation of concerns. So why follow MVC in the first place if you don't want a dedicated Models section? Well, perhaps you don't need one. Not every application requires a model. For example this code doesn't use any: https://github.com/phalcon/blog
Personally, after using Phalcon's Model structure for a while, I've started disliking their 1-tier approach to Models. I prefer multi-tier models more in the direction of entities, services, and repositories. You can find such code over here:
https://github.com/phalcon/mvc/tree/master/multiple-service-layer-model/apps/models
But such can become overkill very quickly and hard to manage due to using too much abstraction. A solution somewhere between the two is usually feasible.
But honestly, there's nothing wrong with using Phalcon's built-in database adapter for your queries. If you come across a query very difficult to write, nobody said that every one of your models needs to extend Phalcon\Mvc\Model. It's still perfectly sound logic to write something like:
$pdo = \Phalcon\DI::getDefault()->getDb()->prepare($sql);
foreach($params as $key => &$val)
{
$pdo->bindParam($key,$val);
}
$pdo->setFetchMode(PDO::FETCH_OBJ);
$pdo->execute();
$results=$pdo->fetchAll();
The models are very flexible, there's no "best" way to arrange them. The "whatever works" approach is fine. As well as the "I want my models to have a method for each operation I could possibly ever want".
I will admit that the invo and vokuro half-functional examples (built for demo purposes only) aren't so great for picking up good model designing habits. I'd advise finding a piece of software which is actually used in a serious manner, like the code for the forums: https://github.com/phalcon/forum/tree/master/app/models
Phalcon is still rather new of a framework to find good role models out there.
As you mention, regarding having all the models in one file, this is perfectly fine. Do note, as mentioned before, using setSource within initialize, you can name your classes differently than the table they're working on. You can also take advantage of namespaces and have the classes match the table names. You can take this a step further and create a single class for creating all your tables dynamically using setSource. That's assuming you want to use Phalcon's database adapter. There's nothing wrong with writing your own code on top of PDO or using another framework's database adapter out there.
As you say, separation of concerns isn't so important to you on a small team, so you can get away without a models directory. If it's any help, you could use something like what I wrote for your database adapter: http://pastie.org/10631358
then you'd toss that in your app/library directory. Load the component in your config like so:
$di->set('easySQL', function(){
return new EasySQL();
});
Then in your Basemodel you'd put:
public function easyQuery($sql,$params=array())
{
return $this->di->getEasySQL()->prepare($sql,$params)->execute()->fetchAll();
}
Finally, from a model, you can do something as simple as:
$this->easyQuery($sqlString,array(':id'=>$id));
Or define the function globally so your controllers can also use it, etc.
There's other ways to do it. Hopefully my "EasySQL" component brings you closer to your goal. Depending on your needs, maybe my "EasySQL" component is just the long way of writing:
$query = new \Phalcon\Mvc\Model\Query($sql, $di);
$matches=$query->execute($params);
If not, perhaps you're looking for something more in the direction of
$matches=MyModel::query()->where(...)->orderBy(...)->limit(...)->execute();
Which is perfectly fine.

Model, View and Controller were designed to separate each process.
Not just Phalcon uses this kind of approach, almost PHP Frameworks today uses that approach.
The Model should be the place where you're saving or updating things, it should not rely on other components but the database table itself (ONLY!), and you're just passing some boolean(if CRUD is done) or a database record query.
You could do that using your Controller, however if you'll be creating multiple controllers and you're doing the same process, it is much better to use 1 function from your model to call and to pass-in your data.
Also, Controllers supposed to be the script in the middle, it should be the one to dispatch every request, when saving records, when you need to use Model, if you need things to queue, you need to call some events, and lastly to respond using json response or showing your template adapter (volt).
We've shorten the word M-V-C, but in reality, we're processing these:
HTTP Request -> Services Loaded (including error handlers) -> The Router -> (Route Parser) -> (Dispatch to specified Controller) -> The Controller -> (Respond using JSON or Template Adapter | Call a Model | Call ACL | Call Event | Queue | API Request | etc....) -> end.

Where would a senior PHP developer locate the method getActiveEntries()?

Separation of Concerns or Single Responsibility Principle
The majority of the questions in the dropdown list of questions that "may already have your answer" only explain "theory" and are not concrete examples that answer my simple question.
What I'm trying to accomplish
I have a class named GuestbookEntry that maps to the properties that are in the database table named "guestbook". Very simple!
Originally, I had a static method named getActiveEntries() that retrieved an array of all GuestbookEntry objects that had entries in the database. Then while learning how to properly design php classes, I learned two things:
Static methods are not desirable.
Separation of Concerns
My question:
Dealing with Separation of Concerns, if the GuestbookEntry class should only be responsible for managing single guestbook entries then where should this getActiveEntries() method go? I want to learn the absolute proper way to do this.

I guess there are actually two items in the question:
As #duskwuff points out, there is nothing wrong with static methods per-se; if you know their caveats (e.g. "Late Static Binding") and limitations, they are just another tool to work with. However, the way you model the interaction with the DB does have an impact on separation of concerns and, for example, unit testing.
For different reasons there is no "absolute proper way" of doing persistence. One of the reasons is that each way of tackling it has different tradeoffs; which one is better for you project is hard to tell. The other important reason is that languages evolve, so a new language feature can improve the way frameworks handle things. So, instead of looking for the perfect way of doing it you may want to consider different ways of approaching OO persistence assuming that you want so use a relational database:
Use the Active Record pattern. What you have done so far looks like is in the Active Record style, so you may find it natural. The active record has the advantage of being simple to grasp, but tends to be tightly coupled with the DB (of course this depends on the implementation). This is bad from the separation of concerns view and may complicate testing.
Use an ORM (like Doctrine or Propel). In this case most of the hard work is done by the framework (BD mapping, foreign keys, cascade deletes, some standard queries, etc.), but you must adapt to the framework rules (e.g. I recall having a lot of problems with the way Doctrine 1 handled hierarchies in a project. AFAIK this things are solved in Doctrine 2).
Roll your own framework to suite your project needs and your programming style. This is clearly a time consuming task, but you learn a lot.
As a general rule of thumb I try to keep my domain models as independent as possible from things like DB, mainly because of unit tests. Unit tests should be running all the time while you are programming and thus they should run fast (I always keep a terminal open while I program and I'm constantly switching to it to run the whole suite after applying changes). If you have to interact with the DB then your tests will become slow (any mid-sized system will have 100 or 200 test methods, so the methods should run in the order of milliseconds to be useful).
Finally, there are different techniques to cope with objects that communicate with DBs (e.g. mock objects), but a good advise is to always have a layer between your domain model and the DB. This will make your model more flexible to changes and easier to test.
EDIT: I forgot to mention the DAO approach, also stated in #MikeSW answer.
HTH

Stop second-guessing yourself. A static method getActiveEntries() is a perfectly reasonable way to solve this problem.
The "enterprisey" solution that you're looking for would probably involve something along the lines of creating a GuestbookEntryFactory object, configuring it to fetch active entries, and executing it. This is silly. Static methods are a tool, and this is an entirely appropriate job for them.

GetActiveEntries should be a method of a repository/DAO which will return an array of GuestBookEntry. That repository of course can implement an interface so here you have easy testing.
I disagree in using a static method for this, as it's clearly a persistence access issue. The GuestBookEntry should care only about the 'business' functionality and never about db. That's why it's useful to use a repository, that object will bridge the business layer to the db layer.
Edit My php's rusty but you get the idea.
public interface IRetrieveEntries
{
function GetActiveEntries();
}
public class EntriesRepository implements IRetrieveEntries
{
private $_db;
function __constructor($db)
{
$this->_db=$db;
}
function GetActiveEntries()
{
/* use $db to retreive the actual entries
You can use an ORM, PDO whatever you want
*/
//return entries;
}
}
You'll pass the interface around everywhere you need to access the functionality i.e you don't couple the code to the actual repository. You will use it for unit testing as well. The point is that you encapsulate the real data access in the GetActiveEntries method, the rest of the app won't know about the database.
About repository pattern you can read some tutorials I've wrote (ignore the C# used, the concepts are valid in any language)

PHP Static Methods/Singleton Pattern

I am designing a new architecture for my company's software product. I am fairly new to unit testing. I read some horror stories about the use of singleton and static methods, but I am not clearly understanding the problem with using them and would appreciate some enlightenment.
Here is what I am doing:
I have a multi-tier architecture. At the server side level, I am using a series of reusable objects to represent database tables, called "Handlers". These handlers use other objects, like XMLObjects, XMLTables, different Datastructures, etc. Most of these are homemade, not your prepacked objects. Anyway, on top of this layer is a pseudo-singleton layer. The primary purpose of this is to simplify higher levels of server side code and create seamless class management. I can say:
$tablehandler = databasename::Handler('tablename')
...to get a table. I don't see the inherent problem with this. I am using a stack of handlers (an associative array) to contain instances of the different objects. I am not using globals, and all static datamembers are protected or private. My question is how this can cause problems with unit testing. I am not looking for bandwagon rhetoric, I am looking for a causal relationship. I would also appreciate any other insight to this. I feel like its an extremely flexible and efficient architecture at this point. Any help here would be great.

Using static methods to provide access to managed instances--whether they be singletons, pooled objects, or on-the-fly instances--isn't a problem for testing as long as you provide a way for the tests to inject their own mocks and stubs as needed and remove them once complete. From your description I don't see anything that would block that ability. It's merely up to you to build it when necessary.
You'll experience difficulty if you have a class that makes static calls to methods that do the work without an extension point or going through an instance. For example, a call like this
UserTable::findByEmail($email);
makes testing difficult because you cannot plug in a mock, memory-only table, etc. However, changing it to
UserTable::get()->findByEmail($email);
solves the problem because the test can call UserTable::set($mock) in the setup code. For a more detailed example, see this answer.

PHP MVC & SQL minus Model

I've been reading several articles on MVC and had a few questions I was hoping someone could possibly assist me in answering.
Firstly if MODEL is a representation of the data and a means in which to manipulate that data, then a Data Access Object (DAO) with a certain level of abstraction using a common interface should be sufficient for most task should it not?
To further elaborate on this point, say most of my development is done with MySQL as the underlying storage mechanism for my data, if I avoided vendor specific functions -- (i.e. UNIX_TIMESTAMP) -- in the construction of my SQL statements and used a abstract DB object that has a common interface moving between MySQL and maybe PostgreSQL, or MySQL and SQLite should be a simple process.
Here's what I'm getting at some task, are handled by a single CONTROLLER -- (i.e. UserRegistration) and rather that creating a MODEL for that task, I can get an instance of the db object -- (i.e. DB::getInstance()) -- then make the necessary db calls to INSERT a new user. Why with such a simple task would I create a new MODEL?
In some of the examples I've seen a MODEL is created, and within that MODEL there's a SELECT statement that fetches x number of orders from the order table and returns an array. Why do this, if in your CONTROLLER your creating another loop to iterate over that array and assign it to the VIEW; ex. 1?
ex. 1: foreach ($list as $order) { $this->view->set('order', $order); }
I guess one could modify the return so something like this is possibly; ex. 2.
ex. 2: while ($order = $this->model->getOrders(10)) { $this->view->set('order', $order); }
I guess my argument is that why create a model when you can simply make the necessary db calls from within your CONTROLLER, assuming your using a DB object with common interface to access your data, as I suspect most of websites are using. Yes I don't expect this is practical for all task, but again when most of what's being done is simple enough to not necessarily warrant a separate MODEL.
As it stands right now a user makes a request 'www.mysite.com/Controller/action/args1/args2', the front controller (I call it router) passes off to Controller (class) and within that controller a certain action (method) is called and from there the appropriate VIEW is created and then output.

So I guess you're wondering whether the added complexity of a model layer -on top- of a Database Access Object is the way you want to go. In my experience, simplicity trumps any other concern, so I would suggest that if you see a clear situation where it's simpler to completely go without a Model and have the data access occur in the equivalent of a controller, then you should go with that.
However, there are still other potential benefits to having an MVC separation:
No SQL at all in the controller: Maybe you decide to gather your data from a source other than a database (an array in the session? A mock object for testing? a file? just something else), or your database schema changes and you have to look for all the places that your code has to change, you could look through just the models.
Seperation of skillsets: Maybe someone on your team is great at complex SQL queries, but not great at dealing with the php side. Then the more separated the code is, the more people can play to their strengths (even more so when it comes to the html/css/javascript side of things).
Conceptual object that represents a block of data: As Steven said, there's a difference in the benefits you get from being database agnostic (so you can switch between mysql and postgresql if need be) and being schema agnostic (so you have an object full of data that fits together well, even if it came from different relational tables). When you have a model that represents a good block of data, you should be able to reuse that model in more than one place (e.g. a person model could be used in logins and when displaying a personnel list).
I certainly think that the ideals of separation of the tasks of MVC are very useful. But over time I've come to think that alternate styles, like keeping that MVC-like separation with a functional programming style, may be easier to deal with in php than a full blown OOP MVC system.

I found this great article that addressed most of my questions. In case anyone else had similar questions or is interested in reading this article. You can find it here http://blog.astrumfutura.com/archives/373-The-M-in-MVC-Why-Models-are-Misunderstood-and-Unappreciated.html.

The idea behind MVC is to have a clean separation between your logic. So your view is just your output, and your controller is a way of interacting with your models and using your models to get the necessary data to give to the necessary views. But all the work of actually getting data will go on your model.
If you think of your User model as an actual person and not a piece of data. If you want to know that persons name is it easier to call up a central office on the phone (the database) and request the name or to just ask the person, "what is your name?" That's one of the ideas behind the model. In a most simplistic way you can view your models as real living things and the methods you attach to them allow your controllers to ask those living things a series of questions (IE - can you view this page? are you logged in? what type of image are you? are you published? when were you last modified?). Your controller should be dumb and your model should be smart.
The other idea is to keep your SQL work in one central location, in this case your models. So that you don't have errant SQL floating around your controllers and (worst case scenario) your views.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.