Flush only certain entities in single transaction - php

In my application I write to a read model table (think CQRS) at certain times. While doing so, I also have to remove older read models. So at any given point I need to:
Remove entities a,b,c
Persist entities x,y,z
In order to maintain a valid read model throughout the lifecycle, I would like to encapsulate this process in a single transaction. Doctrine does provide the necessary means.
However, I also must guarantee that no other entities are flushed in the process. Sadly, calling Doctrine's $em->getConnection()->commit(); seems to flush the whole unit of work. But according to the docs I have to call that to finalise my transaction.
I cannot introduce a second entity manager to take care of only my read model entities, as they are in the same namespace as the other entities, and apparently that is not the way the doctrine-orm-bundle is supposed to be used.
The only other approach I see is to work on a lower level and circumvent the EntityManager and UnitOfWork completely, but I would like to guarantee transactional integrity and do not see a way to do so without the EM/UoW.

TLDR: The way your application works might warrant that update concerns are completely separable into two independent sets, but that's unusual and fragile (often it is not even true at the time the assertion is made). The proper way to model such a separation is to use a separate EntityManager for each set, and to manually guard their interconnections, so that the semantics of "how they are independent" are coded into the system.
Details:
If I understand the question correctly, you are facing a design flaw here. You have two sets of updates (one for the read models and one for the other entities) and you mix them (because you want both in the same entity manager) while also wanting to separate them (by means of a separate transaction). In general, this won't work.
For example, suppose a non-read-model entity instance A has just been created in memory (so it has no ID yet) and, based on this, you decide to reference it from read-model entity instance R. The R->A relationship is valid in memory, but now you expect to be able to flush only read model entities and not the others. I.e. when you try to persist+flush R, it will reference a non-existing foreign key, and your RDBMS will hopefully fail the transaction. On a high level, this is because a connected in-memory graph should be consistent data in its entirety, and when you try to split its valid changes into subsets, you are implicitly rearranging the order of those changes, which may introduce temporary inconsistency, and your transaction commit might land exactly at such a point.
Of course, it may happen that you know some rule why such a thing will never occur and why the consistent state of each set is warranted in a fashion completely independent from the other set. But then you need to write your code to reflect that separation, and the way to do that is to use two entity managers. In that case your code will explicitly cope with the two distinct transactions, their separation, and how exactly that remains consistent from both sides. But even in this case, to avoid clashing updates, you probably need rules establishing a one-way visibility between the two sets, which will also imply an order for committing the transactions. This is because transactions over a connected graph can be nested, but cannot merely overlap, since at each transaction commit you are asking the ORM to sync the consistent in-memory data of the transaction scope to the RDBMS.
I know you mentioned that you do not want to use two EMs, because read-model entities "are in the same namespace as the other entities and apparently that is not the way the doctrine-orm-bundle is supposed to be used".
The namespace does not really matter. You can use the entities separately in the two managers. You can even interconnect them, if you a) properly merge() them and b) cater for the above-mentioned consistency you need to provide for both EMs' transactions (because now they are working on one connected graph).
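If you do go the two-manager route, here is a minimal sketch of the idea (assuming plain Doctrine ORM 2.x bootstrapping; $defaultEm and the entity variables are illustrative). The second manager has its own UnitOfWork, so its flush() writes only the read-model changes:

$readModelEm = \Doctrine\ORM\EntityManager::create(
    $defaultEm->getConnection(),    // may share the DBAL connection...
    $defaultEm->getConfiguration()  // ...and the mapping configuration
);

$conn = $readModelEm->getConnection();
$conn->beginTransaction();
try {
    foreach ($staleReadModels as $stale) {   // remove a, b, c
        $readModelEm->remove($readModelEm->merge($stale));
    }
    foreach ($freshReadModels as $fresh) {   // persist x, y, z
        $readModelEm->persist($fresh);
    }
    $readModelEm->flush(); // only THIS manager's unit of work is written
    $conn->commit();
} catch (\Exception $e) {
    $conn->rollBack();
    throw $e;
}

Note that transactions remain per-connection, so if both managers share one connection you must not flush the default manager while this transaction is open.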
You should elaborate what exactly you refer to by saying "that is not the way the doctrine-orm-bundle is supposed to be used" -- probably there's an error in the original suggestion or something wrong with the way that suggestion is applied to this problem.

Related

A data structure from Repository to serve all purposes?

I need help with something I can't wrap my head around regarding the Repository and Service/Use-case patterns (part of DDD design) I want to implement in my next (Laravel PHP) project.
All seems clear. Just one part of DDD keeps confusing me: the data structures returned by repositories. People choose different data structures for the Repository to return (arrays or entities), but each has disadvantages. One is performance, judging by my experiences in the past. Another is that you can't define interfaces for simple data structures (arrays or plain object attributes).
I’ll start with explaining the experience I have with a previous project. This project had flaws but some good strengths I learned from and like to see in my new project but with solving some design mistakes.
Previous experience
In the past I’ve build a website that was API Centric using the Kohana framework and Doctrine 2 ORM (data mapper pattern). The flow looked like this:
Website controller → API client (HMVC calls) → API controller → Custom Repository → Doctrine 2 ORM native Repository/Entity-manager
My custom Repository returned plain arrays using Doctrine2 DQL. Doctrine2 recommends array result data for read-only operations, and yes, it made my site nice and light. The API controller just converted the array data to JSON. Simple as that.
In the past my company created projects relying fully on loaded Doctrine2 entities, and that is something we regretted due to performance.
My REST API supported queries like
/api/users?include_latest_adverts=2&include_location=true
on the users resource. The API controller passed include_location to the repository which directly included the location relation. The controller read latest_adverts=2 and called the adverts repository to get the latest 2 adverts of each user. Arrays were returned.
For example, the first user array:
[
    name
    avatar
    adverts [
        advert 1 [
            name
            price
        ]
        advert 2 [
            ....
        ]
    ]
]
This proved to be very successful. My whole website was using the API. It would be very easy to add a new client, because the API was already in production using OAuth. The whole website runs on it.
But this design had flaws too. My controllers still contained A LOT of logic for validation, mailing, and params or filters like has_adverts=true to get only users with adverts. It meant that if I created a new port, like a totally new CLI interface, I would have to duplicate a lot of these controllers because of all the validation etc. But there would be no duplication if I created a new client. So at least one problem was solved :-)
My admin panels were completely coupled to the Doctrine2 repositories/entity manager to speed up development (sort of). Why? Because my API had fat controllers with functionality special to the website only (special validation, mailing for registration etc.). I would have had to redo or refactor a lot of work. So I decided to use the Entities directly, to still have a somewhat clear way of writing code, instead of rewriting all my API controllers and moving their logic into Services (for site & admin). Time was an issue in fixing my design mistakes.
For my next project I want all code to go through my own custom repositories and services. One flow for good separation.
New project (using DDD ideas) and dilemma with data structures
While I like the idea of being API centric, I don’t want my next project to be API centric in core because I think the same functionality should be available without the HTTP protocol in between. I want to design the core using DDD ideas.
But I liked the idea of a layer that just behaves like an API and returns simple arrays: the perfect base for any new port, including my own frontend. My idea is to treat my Service classes as the API interface (returning the array data), doing the validation etc. I could have Services specifically for the website (registration) and plain services used by the admin or background processes. In some admin cases a Service would not even be required for simple CRUD editing; I could just use Repositories directly. Controllers would be very thin. With this, creating a real REST API would just be a matter of creating new controllers that use the same Services my frontend controller classes do.
For internal logic like business rules it would be useful to have Entities (clear interfaces) instead of arrays from repositories. That way I could benefit from defining methods that apply logic based on attributes. BUT if I were using Doctrine2 and my repositories always returned Entities, my application would suffer a big performance hit!
One data structure ensures performance but no clear interfaces; the other ensures clear interfaces but bad performance when using a Data Mapper pattern like Doctrine 2 (now or in the future). Also, I could end up with two data types, which would be confusing.
I was thinking something similar to this flow:
Controller (thin) → UserService (incl. validation) → UserRepository (just storage) → Eloquent ORM
Why Eloquent instead of Doctrine2? Because I want to stick somewhat to what's common within the Laravel framework and community, so I can benefit from third-party modules, for example to generate admin interfaces or similar things based on models (bypassing my DDD rules). Beyond using third-party modules, I would design my core so that switching should always be easy and not affect data structure choices or performance.
Eloquent is an Active Record implementation, so I would be tempted to convert its data to POPOs, the way Doctrine2 entities are. But no... as said above, with Doctrine2 real entities would make the system very heavy. So I fall back to simple arrays again, knowing this would work for both, and for any other implementation in the future.
But it feels bad to always rely on arrays, especially when creating internal business rules. A developer would have to guess at array values, would have no autocompletion in his IDE, and could not have special methods like those on Entity classes. But having two ways of dealing with data feels bad too. Or maybe I'm just too much of a perfectionist ;) I want ONE clear data structure for all!
Building interfaces and POPO’s would mean a lot of duplicate work. I would need to convert an Eloquent model (just a table mapper, not entity) to an entity object implementing this interface. All is extra work. And eventually my last layer would be just like a API, thus converting it to arrays again. Which is extra work too. Arrays seem the deal again.
It seemed so easy when reading up on DDD and Hexagonal. It seems so logical! But in reality I struggle with this one simple issue while trying to stick to OOP principles. I want to use arrays because they are the only way to be 100% sure I don't depend on any model or querying choice of my ORM with regard to performance etc., and to avoid duplicate work converting to arrays for views or an API. But then there is no clear contract for how a user array should look. I want these patterns to speed up my project, not slow it down :-) So having many converters is not an option.
Now, I have read a lot of threads. Some people build POPOs and interfaces that conform to proper entities, as Doctrine2 could return them, but with all the extra work for Eloquent. Switching to Doctrine2 would then be fairly easy, but would hurt performance badly, or one would need to convert Doctrine2 array data into these custom entity interfaces. Others choose to return simple arrays.
Some convince people to use Doctrine2 instead of Eloquent, but they leave out the fact that Doctrine2 is heavy and that you really need to use array results for read-only operations.
We design repositories to be swappable, right? Not only because it's "nice" by design. So how can we rely on full Entities if they have such a big impact on performance or cause duplicate work? Even when coupling to Doctrine2 only, this same issue would arise because of its performance!
All ORM implementations can return arrays, so no duplicate work there, and good performance. But we lose clear contracts. And we don't have interfaces for arrays or class attributes (as a workaround)... Ugh ;)
Am I just running into a missing building block in our programming languages? Interfaces on simple data structures?
Is it wise to make everything arrays and have advanced business logic talk to these arrays, and thus have no classes with clear interfaces? Any precalculated data (normally returned by an Entity method) would live under an array key defined by the Service class. If that is not wise, what's the alternative, considering all of the above?
I would really appreciate it if someone with great experience in this "domain" (considering performance, different ORM implementations, etc.) could tell me how he/she has dealt with this.
Thanks in advance!
I think you are dealing with something similar to what I'm struggling with. The solution I think works best is:
Entities/Repositories
Use and pass around Entities always when performing Write operations (Creating things, Updating things, Deleting things, and complex combinations thereof).
Sometimes you may use Entities when doing Read operations (when you anticipate the Read might be followed by a Write soon after... i.e. ->findById is soon followed by ->save).
Anytime you are working with an Entity (whether it be Write or Read), the Repositories need to be the place to go. You should be able to tell new developers that they can only persist to the database through Entities and the Repository.
The Entities will have properties that represent some Domain Object (often they represent a database table with its fields, but not always). They will also contain the domain logic/rules (i.e. validation, calculations), so they are not anemic. You may additionally have some domain services if your Entities need help interacting with other Entities (e.g. triggering other events), or if you just need an additional place to handle some extra domain logic (performing Repository calls to check for unique conditions).
Your Repositories will solely be for working with Entities. The Repositories could accept Entities and do some persistence work with them. Or they could accept just some parameters, and do some reading/fetching into full Entities.
Some Repositories will know how to save Domain Objects that are more complex than others. Perhaps an Entity has a property containing a list of other Entities that need to be saved alongside the main entity (you can dive deeper into Aggregate Roots if you want).
The interfaces to Repositories rest in your Domain layer, but not the actual implementations of those Repositories. That way you can have an Eloquent version or whatever.
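To make that concrete, a minimal sketch (all names are illustrative, assuming Laravel's Eloquent underneath):

// Domain/Repository/UserRepository.php -- lives in the Domain layer;
// callers only ever see this contract.
namespace Domain\Repository;

use Domain\Entity\User;

interface UserRepository
{
    public function findById(int $id): User;
    public function save(User $user): void;
}

// Infrastructure/Repository/EloquentUserRepository.php -- the swappable
// implementation; a Doctrine version would implement the same interface.
namespace Infrastructure\Repository;

use Domain\Entity\User;
use Domain\Repository\UserRepository;

class EloquentUserRepository implements UserRepository
{
    public function findById(int $id): User
    {
        $row = \App\Models\User::findOrFail($id); // Eloquent model as table mapper
        return new User($row->id, $row->name);    // hydrate the domain entity
    }

    public function save(User $user): void
    {
        \App\Models\User::updateOrCreate(
            ['id' => $user->id()],
            ['name' => $user->name()]
        );
    }
}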
Other Queries (Table Data Gateway)
These queries won't work with Entities. They'll just be accepting parameters and returning things like Arrays or POPO's (Plain Old PHP Objects).
Many times you will need to perform Reads that do not map nicely onto a single Entity. These Reads are typically for reporting (not for CRUD-like operations, like reading a user into an edit form that is eventually submitted and saved). For example, you might have a report that is 200 rows of JOINed data. If you used the Repository and tried to return large, deep objects (with all the relationships populated, or even lazy-loaded) then you are going to have performance issues. Instead, use the Table Data Gateway pattern. You are just displaying data and don't really need OOP power here. The outputted data could, however, contain IDs, which through the UI could be used to initiate calls to Repository persistence methods.
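A sketch of such a gateway (table and column names are made up for illustration):

// One class holds the SQL for a reporting read; it returns plain arrays,
// not entities, so nothing is hydrated or lazy-loaded.
class UserAdvertReportGateway
{
    private $pdo;

    public function __construct(\PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function usersWithAdverts(int $limit): array
    {
        $stmt = $this->pdo->prepare(
            'SELECT u.id, u.name, a.name AS advert_name, a.price
               FROM users u
               JOIN adverts a ON a.user_id = u.id
              ORDER BY u.id
              LIMIT :lim'
        );
        $stmt->bindValue(':lim', $limit, \PDO::PARAM_INT);
        $stmt->execute();
        return $stmt->fetchAll(\PDO::FETCH_ASSOC); // rows of JOINed data, with IDs
    }
}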
As you are developing your app, when you come across the need for a new Read/Report query, create a new method in some class somewhere in your Table Data Gateway folder. You may find you have already created a similar query, so see how you can consolidate it with the existing one. Use some parameters if necessary to make the gateway method's queries more flexible in particular ways (i.e. columns to select, sort order, pagination, etc.). Don't make your queries too flexible though; this is where query builders/ORMs go wrong! You need to constrain your queries to a certain extent, so that if you need to replace them (perhaps for a different database engine) you can easily perceive what the allowed variations are and aren't. It's up to you to find the right balance between flexibility (so you have more DRY code) and constraints (so you can optimize/replace queries later).
You can create services in your Domain to handle receiving parameters, then passing them to the Table Data Gateway, and then receiving back arrays to do some more mutating on. This will keep your Domain logic in the domain (and out of the infrastructure/persistence layer of the Repository & Table Data Gateway).
Again, just like with the Repository, use interfaces in your domain services so that the implementation details stay out of your Domain layer and reside in the actual Table Data Gateway folder.

How to make database transaction in PHP OOP

In my obsolete procedural code (which I'd now like to translate into OOP) I have simple database transaction code like this:
mysql_query("BEGIN");
mysql_query("INSERT INTO customers SET cid=$cid,cname='$cname'");
mysql_query("INSERT INTO departments SET did=$did,dname='$dname'");
mysql_query("COMMIT");
If I build OOP classes Customer and Department for mapping customers and departments database tables I can insert table records like:
$customer=new Customer();
$customer->setId($cid);
$customer->setName($cname);
$customer->save();
$department=new Department();
$department->setId($did);
$department->setName($dname);
$department->save();
My Customer and Department classes internally use other DB class for querying database.
But how do I make $customer->save() and $department->save() part of a database transaction?
Should I have one outer class starting/ending the transaction, with the Customer and Department classes instantiated inside it, or should the transaction somehow be started in Customer (like Customer::startTransaction()) and ended in Department (like Department::endTransaction())? Or...
An additional object is the way to go. Something like this:
$customer=new Customer();
$customer->setId($cid);
$customer->setName($cname);
$department=new Department();
$department->setId($did);
$department->setName($dname);
$transaction = new Transaction();
$transaction->add($customer);
$transaction->add($department);
$transaction->commit();
You can see that there is no call to the save() method on $customer and $department anymore; the $transaction object takes care of that.
Implementation can be as simple as this:
class Transaction
{
    private $stack;

    public function __construct()
    {
        $this->stack = array();
    }

    public function add($entity)
    {
        $this->stack[] = $entity;
    }

    public function commit()
    {
        mysql_query("BEGIN");
        try {
            // save() each queued entity inside the one open transaction
            foreach ($this->stack as $entity) {
                $entity->save();
            }
            mysql_query("COMMIT");
        } catch (Exception $e) {
            // undo all queued work if any save() fails
            // (assumes save() throws on error)
            mysql_query("ROLLBACK");
            throw $e;
        }
    }
}
How do I make $customer->save() and $department->save() part of a database transaction?
You don't have to do anything besides start the transaction.
In most DBMS interfaces, the transaction is "global" to the database connection. If you start a transaction, then all subsequent work is automatically done within the scope of that transaction. If you commit, you have committed all changes since the last transaction BEGIN. If you rollback, you discard all changes since the last BEGIN (there's also an option to rollback to the last transaction savepoint).
I've only used one database API that allowed multiple independent transactions to be active per database connection simultaneously (that was InterBase / Firebird). But this is so uncommon, that standard database interfaces like ODBC, JDBC, PDO, Perl DBI just assume that you only get one active transaction per db connection, and all changes happen within the scope of the one active transaction.
Should I have one outer class starting/ending the transaction, with the Customer and Department classes instantiated inside it, or should the transaction somehow be started in Customer (like Customer::startTransaction()) and ended in Department (like Department::endTransaction())? Or...
You should start a transaction, then invoke domain model classes like Customer and Department, then afterwards, either commit or rollback the transaction in the calling code.
The reason for this is that domain model methods can call other domain model methods. You never know how many levels deep these calls go, so it's really difficult for the domain model to know when it's time to commit or rollback.
For some pitfalls of doing this, see How do detect that transaction has already been started?
But they don't have to know that. Customer and Department should just do their work, inserting and deleting and updating as needed. Once they are done, the calling code decides if it wants to commit or rollback the whole set of work.
In a typical PHP application, a transaction is usually the same amount of work as one PHP request. It's possible, though uncommon, to do more than one transaction during a given PHP request, and it's not possible for a transaction to span across multiple PHP requests.
So the simple answer is that your PHP script should start a transaction near the beginning of the script, before invoking any domain model classes, then commit or rollback at the end of the script, or once the domain model classes have finished their work.
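In code, that shape might look like this (PDO used for illustration; how Customer and Department obtain the connection is assumed here):

$pdo = new PDO('mysql:host=localhost;dbname=app', $dbUser, $dbPass);
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // Domain model classes just do their work; they know nothing
    // about the surrounding transaction.
    $customer = new Customer($pdo);
    $customer->setId($cid);
    $customer->setName($cname);
    $customer->save();

    $department = new Department($pdo);
    $department->setId($did);
    $department->setName($dname);
    $department->save();

    $pdo->commit();   // the calling code decides the outcome
} catch (Exception $e) {
    $pdo->rollBack(); // discard everything since beginTransaction()
    throw $e;
}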
You are migrating to OOP, and that's great, but soon you will find yourself migrating to an architecture with a well-differentiated Data Access Layer, including a more complex way of separating data from control. Right now, I guess you are using some kind of Data Access Object; that is a great first-approach pattern, but you can certainly go further. Some of the answers here already lead you in that direction. You shouldn't think of your objects as the basis of your architecture, with some helper objects to query the database. Instead, think about a fully featured layer, with all the required generic classes taking care of the communication with the database, that you will use in all your projects, and then have business-level objects, like Customer or Department, that know as little as possible about database implementations.
For this, you will certainly have an outer class handling transactions, but probably also one taking care of security, another for building queries that provides a unique API regardless of the database engine, and even a class that reads objects in order to put them in the database, so that the object itself doesn't even know it is meant to end up in a database.
Achieving this would be hard, long work, but afterwards you would have a custom, widely reusable layer that makes your projects more scalable, more stable and more trustworthy. That would be great, you would learn a lot, and you would feel quite good afterwards. You would have some kind of DBAL or ORM.
But that still wouldn't be the best solution, since there are people who have already spent years doing exactly that, and it would be hard to achieve what they already have.
So what I recommend, for any medium-size project, is that you take database abstraction as seriously as you can and adopt an open-source ORM that happens to be easy to use; in the end you will save time and get a much better system.
For example, Doctrine has a very nice way of handling transactions and concurrency, in two ways: implicit, where it automatically takes care of the normal operations, or explicit, when you need to take over and control transaction demarcation yourself. Check it out here. There are also more complex possibilities, like transaction nesting, and others.
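For illustration, the implicit style in Doctrine ORM 2.x can be as short as this (the entity variables are placeholders):

// Doctrine begins the transaction, calls flush() when the closure
// returns, commits, and rolls back if an exception escapes.
$em->transactional(function ($em) use ($customer, $department) {
    $em->persist($customer);
    $em->persist($department);
});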
The most famous and reliable PHP ORMs are
Doctrine, and
Propel
I mostly use Doctrine, since it has a module I like for integrating with Zend Framework 2, but Propel has some aspects that I like a lot.
You would probably have to refactor some things, and you may not feel like doing that at this point, but I can say from my experience that this is one of those things you don't even want to think about: years after you start using it, you realize how much time you used to waste :-) I recommend you consider this, if not now, then for your very next project.
UPDATE
Some thoughts after Tomas' comment.
It's true that for not-so-big projects (especially if you are not very familiar with ORMs, or your model is very complex) it can be a big effort to integrate a vendor ORM.
But what I can say after years of developing projects of all sizes is that for any medium-size one, I would use at least a custom, less serious and more flexible home-made ORM, with a set of generic classes and as few business-oriented repositories as possible, where an entity knows its table, and probably other related tables, and where you can encapsulate some SQL or custom query function calls, all centered on that entity (for example the main table of the entity, the table of pictures associated to that entity, and so on), in order to provide the controller a single interface to the data. That way the database engine is independent of the model's API, and, just as importantly, the controller doesn't have to be aware of any DBMS aspects, like the use of transactions, something that exists purely to ensure model-related behavior at a scandalously low level: related pretty much to DBMS technical needs. I mean, your controller could know that it is storing stuff in a database, but it certainly doesn't have to know what a transaction is.
For sure this is a philosophical discussion, and there can be many equally valid points of view.
For any custom ORM, I would recommend starting with some DAO/DTO generator that can help you create the main classes from your database, so you only need to adapt them at the points where you find exceptions to the normal create-read-update-delete behavior. This reminds me that you can also search for PHP CRUD tools and find some useful and fun ones.

Active Record must have domain logic?

Some time ago I started working with the Yii Framework, and I saw some things that "do not let me sleep." Here I lay out my doubts about how Yii users use Active Record.
I have seen many people add the application's business rules directly to the Active Record classes, the very ones generated by Gii. I deeply believe that this is a misinterpretation of what Active Record is, and a violation of the SRP.
Early on, SRP is easier to apply. ActiveRecord classes handle persistence, associations and not much else. But bit by bit, they grow. Objects that are inherently responsible for persistence become the de facto owners of all business logic as well. And a year or two later you have a User class with over 500 lines of code, and hundreds of methods in its public interface. Callback hell ensues.
When I talked about it with some people and my view was criticized. But when asked:
And when you need to regenerate, through Gii, an Active Record full of business rules, what do you do? Rewrite? Copy and paste? That's great, congratulations!
The only answer I got was silence.
So, I:
What I am currently doing to reach a slightly better architecture is to generate the Active Records in an /ar folder, and to add the Domain Model inside the /models folder.
By the way, it is the Domain Model that owns the business rules, and it is the Domain Model that uses the Active Records to persist and retrieve data; the Active Record layer is the Data Model.
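A minimal sketch of that split (Yii 1.1; all names are illustrative):

// /ar/UserRecord.php -- generated by Gii, safe to regenerate at any time
class UserRecord extends CActiveRecord
{
    public static function model($className = __CLASS__)
    {
        return parent::model($className);
    }

    public function tableName()
    {
        return 'user';
    }
}

// /models/User.php -- the Domain Model owns the business rules and
// delegates persistence to the regenerable AR class
class User
{
    private $record;

    public function __construct(UserRecord $record)
    {
        $this->record = $record;
    }

    public function moveToGroup($groupId)
    {
        if ($this->record->is_locked) { // business rule lives here, not in AR
            throw new DomainException('A locked user cannot change group.');
        }
        $this->record->group_id = $groupId;
        $this->record->save();
    }
}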
What do you think of this approach?
If I'm wrong somewhere, please tell me why before criticizing harshly.
Some of the comments on this article are quite helpful:
http://blog.codeclimate.com/blog/2012/10/17/7-ways-to-decompose-fat-activerecord-models/
In particular, the idea that your models should grow out of a strictly "fat model" setup only as you need more structure seems quite wise.
Are you having issues now, or mainly trying to plan ahead? This may be hard to plan ahead for and may just need refactoring as you go...
Edit:
Regarding moveUserToGroup (in your comment below), I can see how having that might bother you. I found this while thinking about your question: https://gist.github.com/justinko/2838490 An equivalent setup that you might use for your moveUserToGroup would be a CFormModel subclass. It will give you the ability to do validations etc., but can be more specific to what you're trying to handle (and use multiple AR objects to achieve your objective instead of just one).
I often use CFormModel to handle forms that have multiple AR objects or forms where I want to do other things.
Sounds like that may be what you're after. More details available here:
http://www.yiiframework.com/doc/guide/1.1/en/form.overview
The definition of Active Record, according to Martin Fowler:
An object carries both data and behavior. Much of this data is persistent and needs to be stored in a database. Active Record uses the most obvious approach, putting data access logic in the domain object. This way all people know how to read and write their data to and from the database.
When you segregate data and behavior you no longer have an Active Record. Two common related patterns are Data Mapper and Table/Row Gateway (this one more related to RDBMS's).
Again, Fowler says:
The Data Mapper is a layer of software that separates the in-memory objects from the database. Its responsibility is to transfer data between the two and also to isolate them from each other. With Data Mapper the in-memory objects needn't know even that there's a database present; they need no SQL interface code, and certainly no knowledge of the database schema.
And again:
A Table Data Gateway holds all the SQL for accessing a single table or view: selects, inserts, updates, and deletes. Other code calls its methods for all interaction with the database.
A Row Data Gateway gives you objects that look exactly like the record in your record structure but can be accessed with the regular mechanisms of your programming language. All details of data source access are hidden behind this interface.
A Data Mapper is usually storage independent: the mapper recovers data from the storage and creates mapped objects (plain old objects). The mapped object knows absolutely nothing about being stored somewhere else.
As I said, TDG/RDG are more closely tied to a relational table. The TDG object represents the structure of the table and implements all the common operations. The RDG object contains data related to one single row of the table. Unlike the mapped object of a Data Mapper, the RDG object is aware that it is part of a whole, because it references its container TDG.
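To make the contrast concrete, here is a bare-bones Data Mapper sketch (illustrative names): the domain object carries data and behavior, while all persistence knowledge sits in the mapper.

// Plain domain object: no SQL, no knowledge of any database.
class Customer
{
    private $id;
    private $name;

    public function __construct($id, $name)
    {
        $this->id = $id;
        $this->name = $name;
    }

    public function rename($newName) { $this->name = $newName; }
    public function id()   { return $this->id; }
    public function name() { return $this->name; }
}

// The mapper transfers state between the object and the table.
class CustomerMapper
{
    private $pdo;

    public function __construct(\PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function find($id)
    {
        $stmt = $this->pdo->prepare('SELECT id, name FROM customers WHERE id = ?');
        $stmt->execute(array($id));
        $row = $stmt->fetch(\PDO::FETCH_ASSOC);
        return new Customer($row['id'], $row['name']);
    }

    public function update(Customer $customer)
    {
        $stmt = $this->pdo->prepare('UPDATE customers SET name = ? WHERE id = ?');
        $stmt->execute(array($customer->name(), $customer->id()));
    }
}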

How do I architect my classes for easier unit testing?

I'll admit, I haven't unit tested much... but I'd like to. With that being said, I have a very complex registration process that I'd like to optimize for easier unit testing. I'm looking for a way to structure my classes so that I can test them more easily in the future. All of this logic is contained within an MVC framework, so you can assume the controller is the root where everything gets instantiated from.
To simplify, what I'm essentially asking is how to setup a system where you can manage any number of third party modules with CRUD updates. These third party modules are all RESTful API driven and response data is stored in local copies. Something like the deletion of a user account would need to trigger the deletion of all associated modules (which I refer to as providers). These providers may have a dependency on another provider, so the order of deletions/creations is important. I'm interested in which design patterns I should specifically be using to support my application.
Registration spans several classes and stores data in several db tables. Here's the order of the different providers and methods (they aren't statics, just written that way for brevity):
Provider::create('external::create-user') initiates registration at a particular step of a particular provider. The double-colon syntax in the first param indicates that the class should trigger creation via providerClass::providerMethod. I had made a general assumption that Provider would be an interface with the methods create(), update(), delete() that all other providers would implement. How this gets instantiated is likely something you need to help me with.
$user = Provider_External::createUser() creates a user on an external API, returns success, and user gets stored in my database.
$customer = Provider_Gapps_Customer::create($user) creates a customer on a third party API, returns success, and stores locally.
$subscription = Provider_Gapps_Subscription::create($customer) creates a subscription associated to the previously created customer on the third party API, returns success, and stores locally.
Provider_Gapps_Verification::get($customer, $subscription) retrieves a row from an external API. This information gets stored locally. Another call is made which I'm skipping to keep things concise.
Provider_Gapps_Verification::verify($customer, $subscription) performs an external API verification process. The result of which gets stored locally.
This is a really dumbed down sample as the actual code relies upon at least 6 external API calls and over 10 local database rows created during registration. It doesn't make sense to use dependency injection at the constructor level because I might need to instantiate 6 classes in the controller without knowing if I even need them all. What I'm looking to accomplish would be something like Provider::create('external') where I simply specify the starting step to kick off registration.
The Crux of the Problem
So as you can see, this is just one sample of a registration process. I'm building a system where I could have several hundred service providers (external API modules) that I need to sign up for, update, delete, etc. Each of these providers gets related back to a user account.
I would like to build this system in a manner where I can specify an order of operations (steps) when triggering the creation of a new provider. Put another way, I want to specify which provider/method combination gets triggered next in the chain of events, since creation can span so many steps. Currently, this chain of events occurs via the subject/observer pattern. I'm looking to potentially move this configuration to a database table, provider_steps, where I list each step as well as its following success_step and failure_step (for rollbacks and deletes). The table would look as follows:
CREATE TABLE provider_steps (
    # the id of the parent provider row
    provider_id int(11) unsigned primary key,
    # the short, slug name of the step for use in the codebase
    step_name varchar(60),
    # the name of the method correlating to the step
    method_name varchar(120),
    # the steps that get triggered on success of this step;
    # can be comma delimited; multiple steps could be triggered in parallel
    triggers_success varchar(255),
    # the steps that get triggered on failure of this step;
    # can be comma delimited; multiple steps could be triggered in parallel
    triggers_failure varchar(255),
    created_at datetime,
    updated_at datetime,
    index (provider_id, step_name)
);
There's so many decisions to make here... I know I should favor composition over inheritance and create some interfaces. I also know I'm likely going to need factories. Lastly, I have a lot of domain model shit going on here... so I likely need business domain classes. I'm just not sure how to mesh them all together without creating an utter mess in my pursuit of the holy grail.
Also, where would be the best place for the db queries to take place?
I have a model for each database table already, but I'm interested in knowing where and how to instantiate the particular model methods.
Things I've Been Reading...
Design Patterns
The Strategy Pattern
Composition over Inheritance
The Factory method pattern
The Abstract factory pattern
The Builder pattern
The Chain-of-responsibility pattern
You're already working with the pub/sub pattern, which seems appropriate. Given nothing but your comments above, I'd be considering an ordered list as a priority mechanism.
But it still doesn't smell right that each subscriber is concerned with the order of operations of its dependents for triggering success/failure. Dependencies usually seem like they belong in a tree, not a list. If you stored them in a tree (using the composite pattern) then the built-in recursion would be able to clean up each dependency by cleaning up its dependents first. That way you're no longer worried about prioritizing in which order the cleanup happens - the tree handles that automatically.
And you can use a tree for storing pub/sub subscribers almost as easily as you can use a list.
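A rough sketch of that tree (provider names borrowed from the question, everything else illustrative):

class ProviderNode
{
    private $name;
    private $dependents = array();

    public function __construct($name)
    {
        $this->name = $name;
    }

    public function addDependent(ProviderNode $child)
    {
        $this->dependents[] = $child;
    }

    public function delete()
    {
        // built-in recursion: dependents clean themselves up before the parent
        foreach ($this->dependents as $dependent) {
            $dependent->delete();
        }
        echo "deleting {$this->name}\n"; // real API/DB cleanup would go here
    }
}

$user = new ProviderNode('external-user');
$customer = new ProviderNode('gapps-customer');
$customer->addDependent(new ProviderNode('gapps-subscription'));
$user->addDependent($customer);
$user->delete(); // subscription, then customer, then user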
Using a test-driven development approach could get you what you need, and would ensure your entire application is not only fully testable, but completely covered by tests that prove it does what you want. I'd start by describing exactly what you need to do to meet one single requirement.
One thing you know you want to do is add a provider, so a TestAddProvider() test seems appropriate. Note that it should be pretty simple at this point, and have nothing to do with a composite pattern. Once that's working, you know that a provider has a dependent. Create a TestAddProviderWithDependent() test, and see how that goes. Again, it shouldn't be complex. Next, you'd likely want to TestAddProviderWithTwoDependents(), and that's where the list would get implemented. Once that's working, you know you want the Provider to also be a Dependent, so a new test would prove the inheritance model worked. From there, you'd add enough tests to convince yourself that various combinations of adding providers and dependents worked, and tests for exception conditions, etc. Just from the tests and requirements, you'd quickly arrive at a composite pattern that meets your needs. At this point I'd actually crack open my copy of GoF to ensure I understood the consequences of choosing the composite pattern, and to make sure I didn't add an inappropriate wart.
Another known requirement is to delete providers, so create a TestDeleteProvider() test, and implement the DeleteProvider() method. You won't be far away from having the provider delete its dependents, too, so the next step might be creating a TestDeleteProviderWithADependent() test. The recursion of the composite pattern should be evident at this point, and you should only need a few more tests to convince yourself that deeply nested providers, empty leafs, wide nodes, etc., all will properly clean themselves up.
I would assume that there's a requirement for your providers to actually provide their services. Time to test calling the providers (using mock providers for testing), and adding tests that ensure they can find their dependencies. Again, the recursion of the composite pattern should help build the list of dependencies or whatever you need to call the correct providers correctly.
You might find that providers have to be called in a specific order. At this point you might need to add prioritization to the lists at each node within the composite tree. Or maybe you have to build an entirely different structure (such as a linked list) to call them in the right order. Use the tests and approach it slowly. You might still have people concerned that you delete dependents in a particular externally prescribed order. At this point you can use your tests to prove to the doubters that you will always delete them safely, even if not in the order they were thinking.
If you've been doing it right, all your previous tests should continue to pass.
Then come the tricky questions. What if you have two providers that share a common dependency? If you delete one provider, should it delete all of its dependencies even though a different provider needs one of them? Add a test, and implement your rule. I figure I'd handle it through reference counting, but maybe you want a copy of the provider for the second instance, so you never have to worry about sharing children, and you keep things simpler that way. Or maybe it's never a problem in your domain. Another tricky question is if your providers can have circular dependencies. How do you ensure you don't end up in a self-referential loop? Write tests and figure it out.
After you've got this whole structure figured out, only then would you start thinking about the data you would use to describe this hierarchy.
That's the approach I'd consider. It may not be right for you, but that's for you to decide.
Unit Testing
With unit testing, we only want to test the code that makes up the individual unit of source code, typically a class method or function in PHP (Unit Testing Overview). This indicates that we don't want to actually test the external API in unit testing; we only want to test the code we are writing locally. If you do want to test entire workflows, you likely want to perform integration testing (Integration Testing Overview), which is a different beast.
As you specifically asked about designing for Unit Testing, let's assume you actually mean Unit Testing as opposed to Integration Testing, and submit that there are two reasonable ways to go about designing your Provider classes.
Stub Out
The practice of replacing an object with a test double that (optionally) returns configured return values is referred to as stubbing. You can use a stub to "replace a real component on which the SUT depends so that the test has a control point for the indirect inputs of the SUT. This allows the test to force the SUT down paths it might not otherwise execute". Reference & Examples
Mock Objects
The practice of replacing an object with a test double that verifies expectations, for instance asserting that a method has been called, is referred to as mocking.
You can use a mock object "as an observation point that is used to verify the indirect outputs of the SUT as it is exercised. Typically, the mock object also includes the functionality of a test stub in that it must return values to the SUT if it hasn't already failed the tests, but the emphasis is on the verification of the indirect outputs. Therefore, a mock object is a lot more than just a test stub plus assertions; it is used in a fundamentally different way".
Reference & Examples
Our Advice
Design your classes to allow both stubbing and mocking. The PHPUnit manual has an excellent example of stubbing and mocking a web service. While this doesn't help you out of the box, it demonstrates how you would go about implementing the same for the RESTful API you are consuming.
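A condensed sketch of both doubles in one PHPUnit test (ExternalApiClient, UserStorage and ExternalUserProvider are hypothetical stand-ins for your provider's collaborators):

use PHPUnit\Framework\TestCase;

class ExternalUserProviderTest extends TestCase
{
    public function testCreateStoresTheApiResultLocally()
    {
        // Stub: controls the indirect input -- pretend the remote API succeeded.
        $api = $this->createMock(ExternalApiClient::class);
        $api->method('createUser')->willReturn(array('id' => 42));

        // Mock: verifies the indirect output -- the local copy must be saved once.
        $storage = $this->createMock(UserStorage::class);
        $storage->expects($this->once())
                ->method('save')
                ->with(array('id' => 42));

        $provider = new ExternalUserProvider($api, $storage);
        $provider->create();
    }
}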
Where is the best place for the db queries to take place?
We suggest you use an ORM and not solve this yourself. You can easily Google PHP ORMs and make your own decision based on your own needs; our advice is to use Doctrine, because we use Doctrine and it suits our needs well, and over the past few years we have come to appreciate how well the Doctrine developers know the domain. Simply put, they do it better than we could ourselves, so we are happy to let them do it for us.
If you don't really grasp why you should use an ORM, see Why should you use an ORM? and then Google the same question. If you still feel you can roll your own ORM, or otherwise handle the database access yourself better than the people dedicated to it, we would expect you to already know the answer to the question. If you feel you have a pressing need to handle it yourself, we suggest you look at the source code of a number of ORMs (see Doctrine on GitHub) and find the solution that best fits your scenario.
Thanks for asking a fun question, I appreciate it.
Every dependency relationship within your class hierarchy must be accessible from the outside world (classes shouldn't be highly coupled). For instance, if you are instantiating class A within class B, then class B must have setter/getter methods for the class A instance it holds.
http://en.wikipedia.org/wiki/Dependency_injection
The foremost problem I can see with your code - and it is what actually hinders you from testing it - is the use of static class method calls:
Provider::create('external::create-user')
$user = Provider_External::createUser()
$customer = Provider_Gapps_Customer::create($user)
$subscription = Provider_Gapps_Subscription::create($customer)
...
It's epidemic in your code - even if you "only" outlined them as static for "brevity". Such an attitude is not brevity; it's counter-productive for testable code. Avoid statics at all cost, including when asking a question about unit testing: this is a known bad practice, and it is known that such code is hard to test.
After you've converted all static calls into object method invocations and used Dependency Injection instead of static global state to pass the objects along, you can just do unit testing with PHPUnit, including making use of stub and mock objects collaborating in your (simple) tests.
So here is a TODO:
Refactor static method calls into object method invocations.
Use Dependency Injection to pass objects along (a minimal sketch follows below).
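A minimal sketch of both steps (class names are placeholders):

// Before: $user = Provider_External::createUser(); -- hard-wired, untestable.
// After: the collaborator is injected, so a test can substitute a double.
class RegistrationService
{
    private $externalProvider;

    public function __construct(Provider_External $externalProvider)
    {
        $this->externalProvider = $externalProvider; // Dependency Injection
    }

    public function register()
    {
        return $this->externalProvider->createUser(); // object method invocation
    }
}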
With that, you will have very much improved your code. If you argue that you cannot do that, then do not waste your time with unit testing; spend it maintaining your application, ship it fast, let it make some money, and burn it if it's no longer profitable. But don't waste your programming life unit-testing static global state - it's just stupid to do.
Think about layering your application with defined roles and responsibilities for each layer. You may like to take inspiration from Apache Axis' message flow subsystem. The core idea is to create a chain of handlers through which the request flows until it is processed. Such a design facilitates pluggable components which may be bundled together to create higher-order functions.
Further, you may like to read about functors/function objects, particularly Closure, Predicate, Transformer and Supplier, to create your participating components. Hope that helps.
Have you looked at the state design pattern? http://en.wikipedia.org/wiki/State_pattern
You could model all your steps as different states in a state machine, so that it looks like a graph. You could store this graph in your database table/XML; also, every provider can have its own graph representing the order in which execution should happen.
So when you get into a certain state, you may trigger one or more events (save user, get user). I don't know your application's specifics, but events can be re-used by other providers.
If it fails at one of the steps, a different graph path is executed.
If you abstract it correctly, you can have a loosely coupled system that follows the order given by the graph and executes events based on state.
Later, if you need to add some other provider, you only need to create a graph and/or some new events.
Here is some example: https://github.com/Metabor/Statemachine
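Independent of that library, the core of the idea fits in a few lines (a toy sketch; step names echo the question):

// transition graph: state => [event => next state]
$graph = array(
    'create-user' => array(
        'success' => 'create-customer',
        'failure' => 'rollback',
    ),
    'create-customer' => array(
        'success' => 'create-subscription',
        'failure' => 'rollback',
    ),
);

function nextState(array $graph, $state, $succeeded)
{
    $event = $succeeded ? 'success' : 'failure';
    return isset($graph[$state][$event]) ? $graph[$state][$event] : null;
}

// walking one edge of the graph:
$next = nextState($graph, 'create-user', true); // "create-customer"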

multiple databases with same structure

What is the best way in doctrine2 to deal with different databases that share the same schema? Currently I:
generate entities separately for every database, adding the namespace and database name to every metadata object, putting them in different namespaces (XXX\Base\EntityClass) but with the same alias
create one EntityManager per database (even if they share the same connection)
create a proxy which passes calls to multiple EntityManagers and collects responses
merge the responses into one output
Is there a simpler way of dealing with multiple databases in doctrine2?
I can't answer for doctrine2, but I'm doing this in C#.
One set of entities, with strong names and strong types, defined in terms of what the rest of the application needs. This maps the schema, but isn't tied to either database.
One facade, that knows which database you're using at the moment and directs requests to one of two...
Separate data access namespaces, that handle a common set of operations, and populate results into the single set of entities, which are returned to the requestor through the facade.
Static code generators, based on reading the schema from the database catalog, are useful. You may want to pick one database as a model, if you can infer everything you need to know about the other database from it.
Dynamic code generators are also useful, for inserts, updates, where clauses, etc.
Invest some time in a framework to support all of this. Decide whether you need to keep metadata at run time, and whether it's primarily to support queries or change operations. Provide a common method for extracting data from result sets for either database, so that you can get strongly named and typed result sets back to your application without regard to the underlying database.
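Translated back to the Doctrine context of the question, the facade idea might look roughly like this (an illustrative sketch, not a drop-in solution):

class MultiBaseFacade
{
    /** @var \Doctrine\ORM\EntityManagerInterface[] keyed by database name */
    private $managers;

    public function __construct(array $managers)
    {
        $this->managers = $managers;
    }

    // direct a request to one known database...
    public function find($database, $class, $id)
    {
        return $this->managers[$database]->find($class, $id);
    }

    // ...or fan out to every database and merge the responses into one output
    public function findInAll($class, array $criteria)
    {
        $merged = array();
        foreach ($this->managers as $em) {
            $found = $em->getRepository($class)->findBy($criteria);
            $merged = array_merge($merged, $found);
        }
        return $merged;
    }
}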
