in my application I do write to a read model table (think CQRS) at certain times. While doing so, I also have to remove older read models. So at any given point I need to:
Remove entities a,b,c
Persist entities x,y,z
In order to maintain a valid read model throughout the lifecycle, I would like to encapsulate this process in a single transaction. Doctrine does provide the necessary means.
However, I also must guarantee that no other entities are being flushed in the process. Sadly, calling doctrine's $em->getConnection()->commit(); seems to to flush the whole unit of work. But according to the docs I have to call that to finalise my transaction.
I cannot introduce a second entity manager to only take of my read model entities as they are in the same namespace as the other entities and apparently that is not the way the doctrine-orm-bundle is supposed to be used.
The only other approach I see is to work on a lower level and circumvent the EntityManager and UnitOfWork completely, but I would like to guarantee transactional integrity and do not see a way to do so without em/ouw.
TLDR: The way your application works might warrant that update-concerns are completely separable into two independent sets, but that's unusual, and fragile (usually not even true at the time of making the assertion). The proper way to model that is using separate EntityManagers for each of the sets, and by manually guarding their interconnections is the semantics of "how they are independent" coded into the system.
Details:
If I understand the question correctly, you are facing a design flaw here. You have two sets of updates (one for the read models and one for the other entities) and you mix them (because you want both in the same entity manager) while you also want to separate them (by the means of a separate transaction). In general, this won't work.
For example, let's think non-read-model entity instance A is just created in-memory (so it has no ID yet) and based on this you decide to reference it with read-model entity instance R. The R->A relationship is valid in memory, but now you expect to be able to flush only read model entities but not others. I.e. when you try to persist+flush R it will reference a non-existing foreign key and your RDBMS will hopefully fail the transaction. On a high-level, this is because a connected in-memory graph should be consistent data in its entirety and when you try to split its valid changes into subsets, you're rearranging the order of those changes implicitly, which may introduce temporary inconsistency, and then your transaction commit might just be at such a point.
Of course, it may happen that you know some rule why such a thing will never happen and why consistent state of each set is warranted in a fashion that is completely independent from the other set. But then you need to write your code reflecting that separation; the way you can do that is to use two entity managers. In that case your code will clearly cope with the two distinct transactions, their separation and how exactly that is consistent from both sides. But even in this case, to avoid clashes of updates, you probably need to have rules outlining a one-way visibility between the two sets, which will also imply an order to committing transactions. This is because transactions in a connected graph can be nested, but not "only overlap", because at each transaction commit you are asking the ORM to sync the consistent in-memory data of the transaction scope to the RDBMS.
I know you mentioned that you do not want to use two EMs, because read-model entities "are in the same namespace as the other entities and apparently that is not the way the doctrine-orm-bundle is supposed to be used".
The namespace does not really matter. You can use them separately in the two managers. You can even interconnect them, if you a) properly merge() them and b) cater for the above mentioned consistency you need to provide for both EM's transactions (because now they are working on one connected graph).
You should elaborate what exactly you refer to by saying "that is not the way the doctrine-orm-bundle is supposed to be used" -- probably there's an error in the original suggestion or something wrong with the way that suggestion is applied to this problem.
In our Symfony2 project, we would like to ensure that modifications across resources are transactional. For example, something like:
namespace ...;
use .../TransactionManager;
class MyService {
protected $tm;
public function __construct(TransactionManager $tm)
{
$this->tm = $tm;
}
/**
* #ManagedTransaction
*/
public function doSomethingAcrossResources()
{
...
// where tm is the transaction manager
// tm is exposing a Doctrine EntityManager adapter here
$this->tm->em->persist($entity);
...
// tm is exposing a redis adapter here
$this->tm->redis->set('foo', 'bar');
if ($somethingWentWrong) {
throw new Exception('Something went terribly wrong');
}
}
}
So there are a couple things to note here:
Every resource will need an adapter exposing it's API (e.g. a Doctrine adapter, Redis adapter, Memcache adapter, File adapter, etc.)
In the case that something goes wrong (Exception thrown), nothing should get written to any managed resource (i.e. rollback everything).
If nothing goes wrong, all resources will get updated as expected
doSomethingAcrossResources function does not have to worry about un-doing changes it made to non-transactional resources like Files and Memcache for example. This is key, because otherwise, this code would likely become a tangled mess of only writing to redis at the appropriate time, etc.
#ManagedTransacton annotation will take care of the rest (commiting / rolling back / starting the transactions required (based on the adapters), etc.)
In the simplest implementation, the tm can simply manage a queue and dequeue all items serially. If an exception is thrown, it simply won't dequeue anything. So the adapters are the transaction manager's knowledge of how to commit each item in the queue.
If an exception occurs during a dequeue, then the transaction manager will look to it's adapters for how to rollback the already dequeued items (probably placed in a rollback stack). This might get tricky for resources like the EntityManager that would need to manage a transaction internally in order to rollback the changes easily. However, redis adapter might cache a previous value during an update, or during an ADD simply issue a DELETE during a rollback.
Does a transaction manager like this already exist? Is there a better way of achieving these goals? Are there caveats that I may be overlooking?
Thanks!
It turns out that we ended up not needing to ensure atomicity across our resources. We do want to be atomic with our database interactions when multiple rows / tables are involved, but we decided to use an event driven architecture instead.
If, say, updating redis fails inside of an event listener, we will stop propagation, but it's not the end of the world -- allowing us to inform the user of a successful operation (even if side effects were not successful).
We can run background jobs to occasionally update redis as needed. This enables us to focus the core business logic in a service method, and then dispatch an event upon success allowing non-critical side effects (updating cache, sending emails, updating elastic search, etc.) to take place, isolated from one another, outside the main business logic.
I'll admit, I haven't unit tested much... but I'd like to. With that being said, I have a very complex registration process that I'd like to optimize for easier unit testing. I'm looking for a way to structure my classes so that I can test them more easily in the future. All of this logic is contained within an MVC framework, so you can assume the controller is the root where everything gets instantiated from.
To simplify, what I'm essentially asking is how to setup a system where you can manage any number of third party modules with CRUD updates. These third party modules are all RESTful API driven and response data is stored in local copies. Something like the deletion of a user account would need to trigger the deletion of all associated modules (which I refer to as providers). These providers may have a dependency on another provider, so the order of deletions/creations is important. I'm interested in which design patterns I should specifically be using to support my application.
Registration spans several classes and stores data in several db tables. Here's the order of the different providers and methods (they aren't statics, just written that way for brevity):
Provider::create('external::create-user') initiates registration at a particular step of a particular provider. The double colon syntax in the first param indicates the class should trigger creation on providerClass::providerMethod. I had made a general assumption that Provider would be an interface with the methods create(), update(), delete() that all other providers would implement it. How this gets instantiated is likely something you need to help me with.
$user = Provider_External::createUser() creates a user on an external API, returns success, and user gets stored in my database.
$customer = Provider_Gapps_Customer::create($user) creates a customer on a third party API, returns success, and stores locally.
$subscription = Provider_Gapps_Subscription::create($customer) creates a subscription associated to the previously created customer on the third party API, returns success, and stores locally.
Provider_Gapps_Verification::get($customer, $subscription) retrieves a row from an external API. This information gets stored locally. Another call is made which I'm skipping to keep things concise.
Provider_Gapps_Verification::verify($customer, $subscription) performs an external API verification process. The result of which gets stored locally.
This is a really dumbed down sample as the actual code relies upon at least 6 external API calls and over 10 local database rows created during registration. It doesn't make sense to use dependency injection at the constructor level because I might need to instantiate 6 classes in the controller without knowing if I even need them all. What I'm looking to accomplish would be something like Provider::create('external') where I simply specify the starting step to kick off registration.
The Crux of the Problem
So as you can see, this is just one sample of a registration process. I'm building a system where I could have several hundred service providers (external API modules) that I need to sign up for, update, delete, etc. Each of these providers gets related back to a user account.
I would like to build this system in a manner where I can specify an order of operations (steps) when triggering the creation of a new provider. Put another way, allow me to specify which provider/method combination gets triggered next in the chain of events since creation can span so many steps. Currently, I have this chain of events occurring via the subject/observer pattern. I'm looking to potentially move this code to a database table, provider_steps, where I list each step as well as it's following success_step and failure_step (for rollbacks and deletes). The table would look as follows:
# the id of the parent provider row
provider_id int(11) unsigned primary key,
# the short, slug name of the step for using in codebase
step_name varchar(60),
# the name of the method correlating to the step
method_name varchar(120),
# the steps that get triggered on success of this step
# can be comma delimited; multiple steps could be triggered in parallel
triggers_success varchar(255),
# the steps that get triggered on failure of this step
# can be comma delimited; multiple steps could be triggered in parallel
triggers_failure varchar(255),
created_at datetime,
updated_at datetime,
index ('provider_id', 'step_name')
There's so many decisions to make here... I know I should favor composition over inheritance and create some interfaces. I also know I'm likely going to need factories. Lastly, I have a lot of domain model shit going on here... so I likely need business domain classes. I'm just not sure how to mesh them all together without creating an utter mess in my pursuit of the holy grail.
Also, where would be the best place for the db queries to take place?
I have a model for each database table already, but I'm interested in knowing where and how to instantiate the particular model methods.
Things I've Been Reading...
Design Patterns
The Strategy Pattern
Composition over Inheritance
The Factory method pattern
The Abstract factory pattern
The Builder pattern
The Chain-of-responsibility pattern
You're already working with the pub/sub pattern, which seems appropriate. Given nothing but your comments above, I'd be considering an ordered list as a priority mechanism.
But it still doesn't smell right that each subscriber is concerned with the order of operations of its dependents for triggering success/failure. Dependencies usually seem like they belong in a tree, not a list. If you stored them in a tree (using the composite pattern) then the built-in recursion would be able to clean up each dependency by cleaning up its dependents first. That way you're no longer worried about prioritizing in which order the cleanup happens - the tree handles that automatically.
And you can use a tree for storing pub/sub subscribers almost as easily as you can use a list.
Using a test-driven development approach could get you what you need, and would ensure your entire application is not only fully testable, but completely covered by tests that prove it does what you want. I'd start by describing exactly what you need to do to meet one single requirement.
One thing you know you want to do is add a provider, so a TestAddProvider() test seems appropriate. Note that it should be pretty simple at this point, and have nothing to do with a composite pattern. Once that's working, you know that a provider has a dependent. Create a TestAddProviderWithDependent() test, and see how that goes. Again, it shouldn't be complex. Next, you'd likely want to TestAddProviderWithTwoDependents(), and that's where the list would get implemented. Once that's working, you know you want the Provider to also be a Dependent, so a new test would prove the inheritance model worked. From there, you'd add enough tests to convince yourself that various combinations of adding providers and dependents worked, and tests for exception conditions, etc. Just from the tests and requirements, you'd quickly arrive at a composite pattern that meets your needs. At this point I'd actually crack open my copy of GoF to ensure I understood the consequences of choosing the composite pattern, and to make sure I didn't add an inappropriate wart.
Another known requirement is to delete providers, so create a TestDeleteProvider() test, and implement the DeleteProvider() method. You won't be far away from having the provider delete its dependents, too, so the next step might be creating a TestDeleteProviderWithADependent() test. The recursion of the composite pattern should be evident at this point, and you should only need a few more tests to convince yourself that deeply nested providers, empty leafs, wide nodes, etc., all will properly clean themselves up.
I would assume that there's a requirement for your providers to actually provide their services. Time to test calling the providers (using mock providers for testing), and adding tests that ensure they can find their dependencies. Again, the recursion of the composite pattern should help build the list of dependencies or whatever you need to call the correct providers correctly.
You might find that providers have to be called in a specific order. At this point you might need to add prioritization to the lists at each node within the composite tree. Or maybe you have to build an entirely different structure (such as a linked list) to call them in the right order. Use the tests and approach it slowly. You might still have people concerned that you delete dependents in a particular externally prescribed order. At this point you can use your tests to prove to the doubters that you will always delete them safely, even if not in the order they were thinking.
If you've been doing it right, all your previous tests should continue to pass.
Then come the tricky questions. What if you have two providers that share a common dependency? If you delete one provider, should it delete all of its dependencies even though a different provider needs one of them? Add a test, and implement your rule. I figure I'd handle it through reference counting, but maybe you want a copy of the provider for the second instance, so you never have to worry about sharing children, and you keep things simpler that way. Or maybe it's never a problem in your domain. Another tricky question is if your providers can have circular dependencies. How do you ensure you don't end up in a self-referential loop? Write tests and figure it out.
After you've got this whole structure figured out, only then would you start thinking about the data you would use to describe this hierarchy.
That's the approach I'd consider. It may not be right for you, but that's for you to decide.
Unit Testing
With unit testing, we only want to test the code that makes up the individual unit of source code, typically a class method or function in PHP (Unit Testing Overview). Which indicates that we don't want to actually test the external API in Unit Testing, we only want to test the code we are writing locally. If you do want to test entire workflows, you are likely wanting to perform integration testing (Integration Testing Overview), which is a different beast.
As you specifically asked about designing for Unit Testing, lets assume you actually mean Unit Testing as opposed to Integration Testing and submit that there are two reasonable ways to go about designing your Provider classes.
Stub Out
The practice of replacing an object with a test double that (optionally) returns configured return values is refered to as stubbing. You can use a stub to "replace a real component on which the SUT depends so that the test has a control point for the indirect inputs of the SUT. This allows the test to force the SUT down paths it might not otherwise execute". Reference & Examples
Mock Objects
The practice of replacing an object with a test double that verifies expectations, for instance asserting that a method has been called, is referred to as mocking.
You can use a mock object "as an observation point that is used to verify the indirect outputs of the SUT as it is exercised. Typically, the mock object also includes the functionality of a test stub in that it must return values to the SUT if it hasn't already failed the tests but the emphasis is on the verification of the indirect outputs. Therefore, a mock object is lot more than just a test stub plus assertions; it is used a fundamentally different way".
Reference & Examples
Our Advice
Design your class to both all both Stubbing and Mocking. The PHP Unit Manual has an excellent example of Stubbing and Mocking Web Service. While this doesn't help you out of the box, it demonstrates how you would go about implementing the same for the Restful API you are consuming.
Where is the best place for the db queries to take place?
We suggest you use an ORM and not solve this yourself. You can easily Google PHP ORM's and make your own decision based off your own needs; our advice is to use Doctrine because we use Doctrine and it suits our needs well and over the past few years, we have come to appreciate how well the Doctrine developers know the domain, simply put, they do it better than we could do it ourselves so we are happy to let them do it for us.
If you don't really grasp why you should use an ORM, see Why should you use an ORM? and then Google the same question. If you still feel like you can roll your own ORM or otherwise handle the Database Access yourself better than the guys dedicated to it, we would expect you to already know the answer to the question. If you feel you have a pressing need to handle it yourself, we suggest you look at the source code for a number a of ORM's (See Doctrine on Github) and find the solution that best fits your scenario.
Thanks for asking a fun question, I appreciate it.
Every single dependency relationship within your class hierarchy must be accessible from outside world (shouldn't be highly coupled). For instance, if you are instantiating class A within class B, class B must have setter/getter methods implemented for class A instance holder in class B.
http://en.wikipedia.org/wiki/Dependency_injection
The furthermost problem I can see with your code - and this hinders you from testing it actually - is making use of static class method calls:
Provider::create('external::create-user')
$user = Provider_External::createUser()
$customer = Provider_Gapps_Customer::create($user)
$subscription = Provider_Gapps_Subscription::create($customer)
...
It's epidemic in your code - even if you "only" outlined them as static for "brevity". Such attitiude is not brevity it's counter-productive for testable code. Avoid these at all cost incl. when asking a question about Unit-Testing, this is known bad practice and it is known that such code is hard to test.
After you've converted all static calls into object method invocations and used Dependency Injection instead of static global state to pass the objects along, you can just do unit-testing with PHPUnit incl. making use of stub and mock objects collaborating in your (simple) tests.
So here is a TODO:
Refactor static method calls into object method invocations.
Use Dependency Injection to pass objects along.
And you very much improved your code. If you argue that you can not do that, do not waste your time with unit-testing, waste it with maintaining your application, ship it fast, let it make some money, and burn it if it's not profitable any longer. But don't waste your programming life with unit-testing static global state - it's just stupid to do.
Think about layering your application with defined roles and responsibilities for each layer. You may like to take inspiration from Apache-Axis' message flow subsystem. The core idea is to create a chain of handlers through which the request flows until it is processed. Such a design facilitates plugable components which may be bundled together to create higher order functions.
Further you may like to read about Functors/Function Objects, particularly Closure, Predicate, Transformer and Supplier to create your participating components. Hope that helps.
Have you looked at the state design pattern? http://en.wikipedia.org/wiki/State_pattern
You could make all your steps as different states in state machine and it would look like graph. You could store this graph in your database table/xml, also every provider can have his own graph which represents order in which execution should happen.
So when you get into certain state you may trigger event/events (save user, get user). I dont know your application specific, but events can be res-used by other providers.
If it fails on some of the steps then different graph path is executed.
If you will correctly abstract it you could have loosely coupled system which follows orders given by graph and executes events based on state.
Then later if you need add some other provider you only need to create graph and/or some new events.
Here is some example: https://github.com/Metabor/Statemachine
The whole "when to throw exception or return value" questions has been asked a lot (see the following to see just one example):
Should a retrieval method return 'null' or throw an exception when it can't produce the return value?
and I completely agree with the answers in main.
Now my question arises from adding a little more context to the above when applying this to a more complex system. Ill try and keep this as brief and simple as possible.
Right we have an example MVC PHP application:
Model A: has a function get_car($id) which returns a car object.
Controller A has a simple function for say showing a car to the user
Controller B however has a complex function that say gets the car, modify it (say through one of model A's set functions) and also updates other tables based on some of these new values through other models and libraries throughout the system - very complex ay lol
we now get to the main part of my question:
For data integrity I want to use MySQL transactions. This is where I run into a "what's best / what's best practice" scenario...
We write Model A to return FALSE if the car is not found or there is an SQL error.
This is fine for Controller A as it just wants to know if there was a error and bom out, so we just check the return value an bom - fine.
We now get to Controller B. Controller B say does some database updating before the Model A function is called which we need to roll back on error so we need to use a transactions. now this is where I get to my problem. do I leave Model A as a return value and just check it or do I change it to throw exception with the knock on effect of then having to also re-write Controller A as we now need to catch the exception... then (not done yet ;o)) do I roll back in the catch of the model (but how do we know if a transaction has been used or not?) or do we catch and re-throw or allow to bubble up to the controller catch and do the roll back there?
what I'm trying to say is that if I have lots of models and controllers with database interaction should I just make them throw exceptions and then wrap all my other code eg controller functions in try catches encase the model or library functions ever throw, or, do I make the models "self contained" to tidy and handle there own problems but then what do I do about rolling back a transaction if (for this "call") one was open (as per my example above not every time is a transaction opened...)? if this was the case I would have to make all my functions return something and then check this in the controller, as this is the only place that knows if there is an open transaction or not...
So to clarify I can use a try catch to catch and roll back in a controller, that's ok, but how to I do this from "further down" eg in a model or library function... that could be called both during and transaction or just as an auto commit normal MySQL call?
An explained answer would be great (as I like to understand why I am doing something) but if not a some vote for the favourite of the follow solutions (well the solutions I can see):
1) make all model and library functions always return a value and then the controller can either just bom or do a try catch to roll back where necessary - but meaning that I would have to check the return value of every model and library function everywhere they are used.
2) make all model and library functions throw exceptions (say on SQL error) and wrap every controller (which would call the model and library functions) in a try catch where the catch would either just bom or roll back if necessary...
also please note "bom" is push user somewhere or show a pretty error (before someone says "its bad practice to just allow your application to die..." lol)
I hope you get where Im coming from here and sorry for the long loooooong question.
Thanks in advance
Ben
[There's a theoretical problem implicit in the "For data integrity I want to use MySQL transactions"... since MySQL historically hasn't been very ACID - PostgreSQL and Oracle both provide stronger support for ACID. However, back to the real question...]
Both your (1) and (2) focus on exceptions versus failure-return values, but my impression is that this isn't the key part of detangling exceptions, error returns, and open transactions (and some databases support SQL exceptions as well). Instead, I'd focus on keeping the transaction state tied to the nesting of the functions manipulating the model. Here are some thoughts along this line:
You will probably always have error returns from some library functions anyway, so having Model A return FALSE isn't really breaking the paradigm, nor is there anything particularly troublesome about a mix of error returns versus exceptions. However, error returns MUST bubble up correctly - or be converted to exceptions if they go beyond what can be locally address.
Nested transactions are the most obvious way to have one controller start a database manipulation and still call other stuff in the app that also uses transactions. This allows a failed sub-sub-function to abort just its own part transaction and take either the error return or exception approach to bubbling the error up on the non-SQL side while the closed sub-transactions still maintain reasonable matching state. This usually needs to be simulated in code outside of the database (this is essentially what Django does).
Your code could start a new (potentially large) transaction, and keep track of the fact that it's already open to keep the sub-sub-functions in your code from trying to reopen it.
In some databases, code can detect whether a transaction is already open based on the database session state, allowing you to check the DB session state instead of tracking it in code.
Both of the above allow one to use savepoints to simulate truly nested transactions.
One must be very careful to avoid calling SQL calls with implicit commits (CREATE TABLE, for example). MySQL probably deserves a lot more caution around this issue than, say, PostgreSQL.
One way to implement the one big transaction approach is to have high-level function that initiates the transaction and then itself calls the top of whatever Controller B needs to do. This makes either bubbling up errors or having a special abort-transaction exception pretty straightforward. Only the top function would call commit rather than abort if no subfunction failed and no exception was caught. Subfunctions wouldn't call commit.
Conversely, you could have all of your functions pay attention to transactional depth implemented in the non-SQL side (your code), although this is harder to set up in some languages than others (it's pretty easy using decorators in Python, for example). This would allow any of them to call commit if they were done and the transactional depth at zero.
Hope this helps someone :-)
I'm stuck with a general OOP problem, and can't find the right way to phrase my question.
I want to create a class that gives me an object which I can write to once, persist it to storage and then be unable to change the properties. (for example: invoice information - once written to storage, this should be immutable). Not all information is available immediately, during the lifecycle of the object, information is added.
What I'd like to avoid is having exceptions flying out of setters when trying to write, because it feels like you're offering a contract you don't intend to keep.
Here are some ideas I've considered so far:
Pass in any write-information in the constructor. Constructor throws exception if the data is already present.
Create multiple classes in an inheritance tree, with each class representing the entity at some stage of its lifecycle, with appropriate setters where needed. Add a colletive interface for all the read operations.
Silently discarding any inappropriate writes.
My thoughts on these:
1. Makes the constructor highly unstable, generally a bad idea.
2. Explosion of complexity, and doesn't solve the problem completely (you can call the setter twice in a row, within the same request)
3. Easy, but same problem as with the exceptions; it's all a big deception towards your clients.
(Just FYI: I'm working in PHP5 at the moment - although I suspect this to be a generic problem)
Interesting problem. I think your best choice was #1, but I'm not sure I'd do it in the constructor. That way the client code can choose what it wants to do with the exception (suppress them, handle them, pass them up to the caller, etc...). And if you don't like exceptions, you could move the writing to a write() method that returns true if the write was successful and false otherwise.