I am currently working on a huge refactoring project. We have taken over a classic PHP/MySQL project, where most code is procedural, duplicated, and there is very little hint of an architecture.
I am planning on using Doctrine to handle our Data Access, and have all of my tables mapped to entities. However, our MySQL tables are largely messed up.
The table I am currently working with has over 40 columns, and is not normalized by any means. A quick example of what we have:
Brand
id
name
poNumber
orderConfirmationEmail <---- these should go into a BrandConfirmations entity
shippingConfirmationEmail <-----
bill_address <---- these should go into a BrandAddress entity
bill_address2 <-----
city <------
.
.
.
Ideally, what I would like to have is for Doctrine to pull out the fields that reference different Entities, and actually put them into those Entities. So for instance id, name, and poNumber would get pulled out into a Brand entity. orderConfirmationEmail and shippingConfirmationEmail would get pulled out into a BrandNotification entity. Next, bill_address, and the rest of the address fields would get pulled out into a BrandBillAddress entity. Is there a way to configure Doctrine to split the table into these models for me, or do I have to custom write code myself that would do that?
If I do have to write the code to split this table myself, do you have any resources or advice that tackle a similar issue? I haven't been able to find many yet.
The latest version of Doctrine 2 supports what they call embeddables: http://doctrine-orm.readthedocs.org/en/latest/tutorials/embeddables.html. It may solve some of your problems. However, it requires D2.5+. Currently, S2 uses Doctrine 2.4. You could experiment with using the very latest doctrine.
What you can do is make your domain models (entities) act as though you had value objects. So $brand->getOrderConfirmation() would actually return an order confirmation object. You have to do some messing around to keep everything mapped to one table and you might be limited on some of your queries but it's not that hard. The advantage is that the rest of your new applications deals with proper normalized objects. It's only the internal persistence code that needs to get messy.
There are quite a few links on this approach. Here is one: http://russellscottwalker.blogspot.com/2013/11/entities-vs-value-objects-and-doctrine-2.html
Your best bet of course is to refactor your database schema. I like to do kind of a raw dump of the original database into a yaml file with the desired object nesting. I then load the yaml file into the new schema. If you are really lucky then you might even be able to create new views for your existing application which will allow it to keep working in parallel with your new application.
Related
I need help with something I can’t get my head wrapped around regarding the Repository and Service/Use-case pattern (part of DDD design) I want to implement in my next (Laravel PHP) project.
All seems clear. Just one part of DDD that is confusing is the data structures from repositories. People seem to choose data structures the Repository should return (arrays or entities) but it all has disadvantages. One of which is performance looking at my experiences in the past. And one is which you don’t have interfaces for simple data structures (array or simple object attributes).
I’ll start with explaining the experience I have with a previous project. This project had flaws but some good strengths I learned from and like to see in my new project but with solving some design mistakes.
Previous experience
In the past I’ve build a website that was API Centric using the Kohana framework and Doctrine 2 ORM (data mapper pattern). The flow looked like this:
Website controller → API client (HMVC calls) → API controller → Custom Repository → Doctrine 2 ORM native Repository/Entity-manager
My custom Repository returned plain arrays using Doctrine2 DQL. Doctrine2 recommends array result data for read only operations. And yes, it made my site nice and light. The API controller just converted the array data to JSON. Simple as that.
In the past my company created projects relying fully on loaded Doctrine2 entities and it’s something we regretted due to performance.
My REST API supported queries like
/api/users?include_latest_adverts=2&include_location=true
on the users resource. The API controller passed include_location to the repository which directly included the location relation. The controller read latest_adverts=2 and called the adverts repository to get the latest 2 adverts of each user. Arrays were returned.
For example first user array:
[
name
avatar
adverts [
advert 1 [
name
price
]
advert 2 [
….
]
]
]
This proved to be very successful. My whole website was using the API. It would be very easy to to add a new client because the API was perfectly in production already using oauth. The whole website runs on it.
But this design had flaws too. My controller still contained A LOT of logic for validation, mailing, params or filters like has_adverts=true to get users with adverts only. It would mean that if I created a new port, like a total new CLI interface, I would have to duplicate alot of these controllers due to all the validation etc. But no duplication if I would create a new client. So at least one problem was solved :-)
My admin panels were completely coupled to the doctrine2 repository/entity-manager to speed up development (sort of). Why? Because my API had fat controllers with special functionality for the website only (special validation, mailing for registering etc). I would have to redo work or refactor a lot. So decided to use the Entities directly to still have some sort clear way of writing code instead of rewriting all my API controllers and move them to Services (for site & admin) for instance. Time was an issue in fixing my design mistakes.
For my next project I want all code to go through my own custom repositories and services. One flow for good separation.
New project (using DDD ideas) and dilemma with data structures
While I like the idea of being API centric, I don’t want my next project to be API centric in core because I think the same functionality should be available without the HTTP protocol in between. I want to design the core using DDD ideas.
But I liked the idea using a layer that just talked as a API and returns simple arrays. The perfect base for any new port, including my own frontend. My idea is to consider my Service classes as the API interface (return the array data), do the validation etc. I could have Services specially for the website (registering) and plain services used by the Admin or background processes. In some admin cases a Service would not be required anyway for simple CRUD editing, I could just use Repositories directly. Controllers would be very thin. With this creating a real REST API would just be a matter to create new controllers using the same Services my frontend controller classes do.
For internal logic like business rules it would be useful to have Entities (clear interfaces) instead of arrays from repositories. This way I could benefit from defining some methods that did some logic based on attributes. BUT If I would be using Doctrine2 and my repositories would always return Entities my application would suffer a big performance hit!!
One data structure ensures performance but no clear interfaces, the other ensures clear interfaces but bad performance when using a Data Pattern pattern like Doctrine 2 (now or in the future). Also I could end up with two data types which would be confusing.
I was thinking something similar to this flow:
Controller (thin) → UserService (incl. validation) → UserRepository (just storage) → Eloquent ORM
Why Eloquent instead of Doctrine2? Because I want to stick a bit to what’s common within the Laravel framework and community. So I could benefit from third party modules, for example to generate admin interfaces or similar based on models (bypassing my DDD rules). Other than using third party modules, I would design my core stuff so switching should always be easy and not affect data structure choices or performance.
Eloquent is an activerecord pattern. So I would be tempted to convert this data to POPO’s like Doctrine2 entities are. But nope... as said above, with doctrine2 real models would make the system very fat. So I fall back to simple arrays again. Knowing this would work for both and any other implementation in the future.
But it feels bad always rely on arrays. Especially when creating internal business rules. A developer would have to guess values on arrays, have no autocompletion in his IDE, could not have special methods like in Entity classes. But making two ways of dealing with data feels bad too. Or I am just too perfectionist ;) I want ONE clear data structure for all!
Building interfaces and POPO’s would mean a lot of duplicate work. I would need to convert an Eloquent model (just a table mapper, not entity) to an entity object implementing this interface. All is extra work. And eventually my last layer would be just like a API, thus converting it to arrays again. Which is extra work too. Arrays seem the deal again.
It seemed so easy reading up into DDD and Hexagonal. It seems so logic! But in reality I struggle with this one simple issue trying to stick to OOP principles. I want to use arrays because it’s the only way to be 100% sure I am not depended on any model choice and querying choice from my ORM regarding performance etc and don't have duplicate work in converting to arrays for views or an API. But there's no clear contract on how a user array could look. I want to speed up my project using these patterns, not slow them down :-) So not an option to have many converters.
Now I read a lot of topics. One makes POPO’s & interfaces that conform proper entities like Doctrine2 could return, but with all the extra work for Eloquent. Switching to Doctrine2 should be fairly easy, but would impact performance so bad or one would need to convert Doctrine2 array data to these own entity interfaces. Others choose to return simple arrays.
One convinces people to use Doctrine2 instead of Eloquent, but they leave out the fact that Doctrine2 is heavy and you really need to use array results for read only operations.
We design repositories to be changeable right? Not because it’s “nice” by design only. So how could we rely on full Entities if it has such big impact on performance or duplicate work? Even when using Doctrine2 only (coupled) this same issue would arise due to its performance!
All ORM implementations would be able to return arrays, thus no duplicate work there. Good performance. But we miss clear contracts. And we don’t have interfaces for arrays or class attributes (as a workaround)... Ugh ;)
Do I just miss a missing block in our programming languages? Interfaces on simple data structures??
Is it wise to make all arrays and have advanced business logic talk to these arrays? Thus no classes with clear interfaces. Any precalculated data (normally would be returned by an Entity method) would be within an array key defined the Service class. if not wise, what’s the alternative considering all of the above?
I would really appreciate if someone with great experience in this “domain” considering performance, different ORM implementations, etc could tell me how he/she has dealt with this?
Thanks in advance!
I think what you are dealing with is something similiar I'm struggling with. The solution I'm thinking works best is:
Entities/Repositories
Use and pass around Entities always when performing Write operations (Creating things, Updating things, Deleting things, and complex combinations thereof).
Sometimes you may use Entities when doing Read operations (when you anticipate the Read might need to be used for a Write soon after...ie. ->findById is soon followed by ->save).
Anytime you are working with an Entity (whether it be Write or Read), the Repositories need to be the place to go. You should be able to tell new developers that they can only persist to the database through Entities and the Repository.
The Entities will have properties that represent some Domain Object (many times they represent a database table with the fields of a table, but not always). They will also contain the domain logic/rules with them (ie. validation, calculations) so they are not anemic. You may additionally have some domain services if your Entities need help interacting with other Entities (need to trigger other events), or you just need an additional place to handle some extra domain logic (perform Repository calls to check for some unique conditions).
Your Repositories will solely be for working with Entities. The Repositories could accept Entities and do some persistence work with them. Or they could accept just some parameters, and do some reading/fetching into full Entities.
Some Repositories will know how to save some Domain Objects that are more complex than others. Perhaps an Entity that has a property which contains a list of other Entities that need to be saved along side the main entity (you can dive deeper into learning about Aggregate roots if you want).
The interfaces to Repositories rest in your Domain layer, but not the actual implementations of those Repositories. That way you can have an Eloquent version or whatever.
Other Queries (Table Data Gateway)
These queries won't work with Entities. They'll just be accepting parameters and returning things like Arrays or POPO's (Plain Old PHP Objects).
Many times you will need to perform Reads that do not return nicely into a single Entity. These Reads are typically more for reporting (not for CRUD-like operations, like Reading a user into an edit form that is eventually submitted and saved). For example, you might have a report that is 200 rows of JOINed data. If you used the Repositiory and tried to return large deep objects (with all the relationships populated, or even lazy-loaded) then you are going to have performance issues. Instead, use the Table Data Gatway pattern. You are just displaying data and not really needing OOP power here. The outputted data could however contain ID's, which through the UI could be used to initiate calls to Repository persistence methods.
As you are developing your app, when you come across the need for a new Read/Report query, create a new method in some class somewhere in your Table Data Gatway folder. You may find you have already created a similar query, so see how you can consolidate the other query. Use some parameters if necessary to make the gateway method's queries more flexible in particular ways (ie. columns to select, sort order, pagination, etc.). Don't make your queries too flexible though, this is where query builders/ORMs go wrong! You need to constrain your queries to a certain extent to where if you need to replace them (perhaps a different database engine) then you can easily perceive what the allowed variations are and aren't. It's up to you to find the right balance between flexibility (so you have more DRY code) and constraints (so you can optimize/replace queries later).
You can create services in your Domain to handle receiving parameters, then passing them to the Table Data Gateway, and then receiving back arrays to do some more mutating on. This will keep your Domain logic in the domain (and out of the infrastructure/persistence layer of the Repository & Table Data Gateway).
Again, just like the Repository, use interfaces in your domain services so that the implementation details stay out of your Domain layer, and resides in the actual Table Data Gateway folder.
I'm in the process of designing a system that stores entities and their relations over time.
Each entity has properties, each property should be versioned, so when a property of the entity changes, a new history state gets added. The complexity comes with the fact that I also need to version the relations between the separate entites. For example: when entity A moves from parent X to parent Y, the relation of both entities also gets a new history state.
I'm looking for advice on how to design this on a lower level - are there any design patterns available for this sort of thing, or any other best practices/proven methods?
I'm building this in PHP with a PostgreSQL database, optionally using Doctrine as the ORM/DBAL.
I would recommend looking into the table_log extension (https://github.com/psoo/table_log) for this. It does require compiling but it has the advantage of allowing you to restore tables to a previous state, audit changes, etc.
One important detail here is that the history changes are stored in other tables, not your main tables, so you think of it as current data plus external audit trails. If you need to merge them for a query, you could create a view.
What I'm trying to figure out is how to add new fields to a table, using Symfony2 with Doctrine2.
I used this to initially create the Entity:
php app/console doctrine:generate:entity --entity="MyMainBundle:ImagesTable" --fields="title:string(100) file:string(100)"
And I used this to create/update the tables on the database:
php app/console doctrine:schema:update --force
Now if I wanted to add new fields to the ImagesTable entity, is there an easy way to do it using the console, or do I have to manually edit the entity. I am just using 1 entity as an example right now, but in reality, there are many entities I'd be changing; so, there has to be an easier way to do it.
I've been manually editing them to create relationships, so if there is an easier way to do that as well, that'd be great.
I remember this being a lot easier with Symfony1.4 - all I had to do was create the database/tables using phpMyAdmin, and Symfony was able to generate the models with no issues.
I really hope I'm missing something here, because this won't work if I have to manually edit every entity for every change.
Doctrine generator commands are intended to help the developer to quickly prototype an idea. They generally don't produce production ready code, and the code needs to be checked to see if it contains what you want.
You can still create your model in phpmyadmin and use Doctrine reverse engineering tools, but it also doesn't produce production ready code, only intended to use in prototyping.
Creating database/tables beforehand doesn't really work well with Doctrine2, as the underlying relation between tables may not be the same as the relation between objects of your model. The whole point of ORM is to think in classes and letting Doctrine do the rest of the work for you.
Doctrine is not intended to write your entities for you, it gives you tools to build your data model, which you use to code your model in Php.
If you don't like to code your entities by hand (which is what all developers using doctrine does), you may want to have a look at RedbeanPHP, a zero-config ORM framework for PHP. It creates the database tables, columns, indexes on the fly depending on the data model you use.
Now that I have read an awfull lot of posts, articles, questions and answers on OOP, MVC and design patterns, I still have questions on what is the best way to build what i want to build.
My little framework is build in an MVC fashion. It uses smarty as the viewer and I have a class set up as the controller that is called from the url.
Now where I think I get lost is in the model part. I might be mixing models and classes/objects to much (or to little).
Anyway an example. When the aim is to get a list of users that reside in my database:
the application is called by e.g. "users/list" The controller then runs the function list, that opens an instance of a class "user" and requests that class to retrieve a list from the table. once returned to the controller, the controller pushes it to the viewer by assigning the result set (an array) to the template and setting the template.
The user would then click on a line in the table that would tell the controler to start "user/edit" for example - which would in return create a form and fill that with the user data for me to edit.
so far so good.
right now i have all of that combined in one user class - so that class would have a function create, getMeAListOfUsers, update etc and properties like hairType and noseSize.
But proper oop design would want me to seperate "user" (with properties like, login name, big nose, curly hair) from "getme a list of users" what would feel more like a "user manager class".
If I would implement a user manager class, how should that look like then? should it be an object (can't really compare it to a real world thing) or should it be an class with just public functions so that it more or less looks like a set of functions.
Should it return an array of found records (like: array([0]=>array("firstname"=>"dirk", "lastname"=>"diggler")) or should it return an array of objects.
All of that is still a bit confusing to me, and I wonder if anyone can give me a little insight on how to do approach this the best way.
The level of abstraction you need for your processing and data (Business Logic) depends on your needs. For example for an application with Transaction Scripts (which probably is the case with your design), the class you describe that fetches and updates the data from the database sounds valid to me.
You can generalize things a bit more by using a Table Data Gateway, Row Data Gateway or Active Record even.
If you get the feeling that you then duplicate a lot of code in your transaction scripts, you might want to create your own Domain Model with a Data Mapper. However, I would not just blindly do this from the beginning because this needs much more code to get started. Also it's not wise to write a Data Mapper on your own but to use an existing component for that. Doctrine is such a component in PHP.
Another existing ORM (Object Relational Mapper) component is Propel which provides Active Records.
If you're just looking for a quick way to query your database, you might find NotORM inspiring.
You can find the Patterns listed in italics in
http://martinfowler.com/eaaCatalog/index.html
which lists all patterns in the book Patterns of Enterprise Application Architecture.
I'm not an expert at this but have recently done pretty much exactly the same thing. The way I set it up is that I have one class for several rows (Users) and one class for one row (User). The "several rows class" is basically just a collection of (static) functions and they are used to retrieve row(s) from a table, like so:
$fiveLatestUsers = Users::getByDate(5);
And that returns an array of User objects. Each User object then has methods for retrieving the fields in the table (like $user->getUsername() or $user->getEmail() etc). I used to just return an associative array but then you run into occasions where you want to modify the data before it is returned and that's where having a class with methods for each field makes a lot of sense.
Edit: The User object also have methods for updating and deleting the current row;
$user->setUsername('Gandalf');
$user->save();
$user->delete();
Another alternative to Doctrine and Propel is PHP Activerecords.
Doctrine and Propel are really mighty beasts. If you are doing a smaller project, I think you are better off with something lighter.
Also, when talking about third-party solutions there are a lot of MVC frameworks for PHP like: Kohana, Codeigniter, CakePHP, Zend (of course)...
All of them have their own ORM implementations, usually lighter alternatives.
For Kohana framework there is also Auto modeler which is supposedly very lightweight.
Personally I'm using Doctrine, but its a huge project. If I was doing something smaller I'd sooner go with a lighter alternative.
What I really like about Entity framework is its drag and drop way of making up the whole model layer of your application. You select the tables, it joins them and you're done. If you update the database scheda, right click -> update and you're done again.
This seems to me miles ahead the competiting ORMs, like the mess of XML (n)Hibernate requires or the hard-to-update Django Models.
Without concentrating on the fact that maybe sometimes more control over the mapping process may be good, are there similar one-click (or one-command) solutions for other (mainly open source like python or php) programming languages or frameworks?
Thanks
SQLAlchemy database reflection gets you half way there. You'll still have to declare your classes and relations between them. Actually you could easily autogenerate the classes too, but you'll still need to name the relations somehow so you might as well declare the classes manually.
The code to setup your database would look something like this:
from sqlalchemy import create_engine, MetaData
from sqlalchemy.ext.declarative import declarative_base
metadata = MetaData(create_engine(database_url), reflect=True)
Base = declarative_base(metadata)
class Order(Base):
__table__ = metadata.tables['orders']
class OrderLine(Base):
__table__ = metadata.tables['orderlines']
order = relation(Order, backref='lines')
In production code, you'd probably want to cache the reflected database metadata somehow. Like for instance pickle it to a file:
from cPickle import dump, load
import os
if os.path.exists('metadata.cache'):
metadata = load(open('metadata.cache'))
metadata.bind = create_engine(database_url)
else:
metadata = MetaData(create_engine(database_url), reflect=True)
dump(metadata, open('metadata.cache', 'w'))
I do not like “drag and drop” create of data access code.
At first sight it seems easy, but then you make a change to the database and have to update the data access code. This is where it becomes hard, as you often have to redo what you have done before, or hand edit the code the drag/drop designer created. Often when you make a change to one field mapping with a drag/drop designer, the output file has unrelated lines changes, so you can not use your source code control system to confirm you have make the intended change (and not change anything else).
However having to create/edit xml configuring files is not nice every time you refractor your code or change your database schema you have to update the mapping file. It is also very hard to get started with mapping files and tracking down what looks like simple problem can take ages.
There are two other options:
Use a code generator like CodeSmith that comes with templates for many ORM systems. When (not if) you need to customize the output you can edit the template, but the simple case are taken care of for you. That ways you just rerun the code generator every time you change the database schema and get a repeatable result.
And/or use fluent interface (e.g Fluent NHibernate) to configure your ORM system, this avoids the need to the Xml config file and in most cases you can use naming conventions to link fields to columns etc. This will be harder to start with then a drag/drop designer but will pay of in the long term if you do match refactoring of the code or database.
Another option is to use a model that you generate both your database and code from. The “model” is your source code and is kept under version control. This is called “Model Driven Development” and can be great if you have lots of classes that have simpler patterns, as you only need to create the template for each pattern once.
I have heard iBattis is good. A few companies fall back to iBattis when their programmer teams are not capable of understanding Hibernate (time issue).
Personally, I still like Linq2Sql. Yes, the first time someone needs to delete and redrag over a table seems like too much work, but it really is not. And the time that it doesn't update your class code when you save is really a pain, but you simply control-a your tables and drag them over again. Total remakes are very quick and painless. The classes it creates are extremely simple. You can even create multiple table entities if you like with SPs for CRUD.
Linking SPs to CRUD is similar to EF: You simply setup your SP with the same parameters as your table, then drag it over your table, and poof, it matches the data types.
A lot of people go out of their way to take IQueryable away from the repository, but you can limit what you link in linq2Sql, so IQueryable is not too bad.
Come to think of it, I wonder if there is a way to restrict the relations (and foreign keys).