In our Symfony2 project, we would like to ensure that modifications across resources are transactional. For example, something like:
namespace ...;
use .../TransactionManager;
class MyService {
protected $tm;
public function __construct(TransactionManager $tm)
{
$this->tm = $tm;
}
/**
* @ManagedTransaction
*/
public function doSomethingAcrossResources()
{
...
// where tm is the transaction manager
// tm is exposing a Doctrine EntityManager adapter here
$this->tm->em->persist($entity);
...
// tm is exposing a redis adapter here
$this->tm->redis->set('foo', 'bar');
if ($somethingWentWrong) {
throw new \Exception('Something went terribly wrong');
}
}
}
So there are a couple things to note here:
Every resource will need an adapter exposing its API (e.g. a Doctrine adapter, Redis adapter, Memcache adapter, File adapter, etc.)
In the case that something goes wrong (Exception thrown), nothing should get written to any managed resource (i.e. rollback everything).
If nothing goes wrong, all resources will get updated as expected.
The doSomethingAcrossResources function does not have to worry about undoing changes it made to non-transactional resources like files and Memcache, for example. This is key, because otherwise this code would likely become a tangled mess of only writing to redis at the appropriate time, etc.
The @ManagedTransaction annotation will take care of the rest (starting the required transactions based on the adapters, committing, rolling back, etc.)
In the simplest implementation, the tm can simply manage a queue and dequeue all items serially. If an exception is thrown, it simply won't dequeue anything. So the adapters are the transaction manager's knowledge of how to commit each item in the queue.
If an exception occurs during a dequeue, then the transaction manager will look to its adapters for how to roll back the already dequeued items (probably placed in a rollback stack). This might get tricky for resources like the EntityManager, which would need to manage a transaction internally in order to roll back the changes easily. The redis adapter, however, might cache the previous value during an update, or, for an ADD, simply issue a DELETE during a rollback.
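To make the queue-plus-rollback-stack idea concrete, here is a minimal sketch. Every name in it (ResourceAdapter, ArrayStoreAdapter, QueueingTransactionManager) is hypothetical and not taken from an existing library, and the in-memory array only stands in for a store like redis. It does not cover the EntityManager case, which would still need its own internal transaction.

```php
<?php
// Hypothetical adapter contract: none of these names come from an existing library.
interface ResourceAdapter
{
    public function apply(array $op): void;   // commit one queued operation
    public function revert(array $op): void;  // undo one already applied operation
}

// In-memory stand-in for a key/value store such as redis.
final class ArrayStoreAdapter implements ResourceAdapter
{
    public $data = [];
    private $previous = [];

    public function apply(array $op): void
    {
        if (!empty($op['fail'])) {
            throw new RuntimeException('write failed'); // simulated resource error
        }
        // Remember the old value so the operation can be undone later.
        $this->previous[] = array_key_exists($op['key'], $this->data)
            ? ['had' => true, 'value' => $this->data[$op['key']]]
            : ['had' => false];
        $this->data[$op['key']] = $op['value'];
    }

    public function revert(array $op): void
    {
        $old = array_pop($this->previous);
        if ($old['had']) {
            $this->data[$op['key']] = $old['value'];
        } else {
            unset($this->data[$op['key']]);
        }
    }
}

final class QueueingTransactionManager
{
    private $adapters = [];
    private $queue = [];

    public function register(string $name, ResourceAdapter $adapter): void
    {
        $this->adapters[$name] = $adapter;
    }

    // Nothing touches the resource yet; the write is only queued.
    public function enqueue(string $name, array $op): void
    {
        $this->queue[] = [$name, $op];
    }

    public function commit(): void
    {
        $applied = []; // rollback stack of operations that already succeeded
        try {
            foreach ($this->queue as [$name, $op]) {
                $this->adapters[$name]->apply($op);
                $applied[] = [$name, $op];
            }
        } catch (Throwable $e) {
            // Undo in reverse order, then let the caller see the failure.
            foreach (array_reverse($applied) as [$name, $op]) {
                $this->adapters[$name]->revert($op);
            }
            throw $e;
        } finally {
            $this->queue = [];
        }
    }
}
```

If the service method throws before commit() is called, the queue is simply never flushed. If a write fails halfway through commit(), the earlier writes are reverted and the exception is rethrown.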
Does a transaction manager like this already exist? Is there a better way of achieving these goals? Are there caveats that I may be overlooking?
Thanks!
It turns out that we ended up not needing to ensure atomicity across our resources. We do want to be atomic with our database interactions when multiple rows / tables are involved, but we decided to use an event driven architecture instead.
If, say, updating redis fails inside of an event listener, we will stop propagation, but it's not the end of the world -- allowing us to inform the user of a successful operation (even if side effects were not successful).
We can run background jobs to occasionally update redis as needed. This enables us to focus the core business logic in a service method, and then dispatch an event upon success allowing non-critical side effects (updating cache, sending emails, updating elastic search, etc.) to take place, isolated from one another, outside the main business logic.
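A rough sketch of that shape, for anyone curious. SideEffectDispatcher and createOrder are hypothetical names, and this is deliberately not Symfony's EventDispatcher API. It also assumes that a failing listener should not prevent the remaining listeners from running; stopping propagation instead would be a `break` inside the catch block.

```php
<?php
// Hypothetical minimal dispatcher; this is not Symfony's EventDispatcher API.
final class SideEffectDispatcher
{
    private $listeners = [];
    public $failures = [];

    public function on(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    public function dispatch(string $event, array $payload): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            try {
                $listener($payload);
            } catch (Throwable $e) {
                // A failed side effect is recorded (e.g. for a background job
                // to pick up later) but never undoes the core operation.
                $this->failures[] = $event . ': ' . $e->getMessage();
            }
        }
    }
}

// Hypothetical service method: core business logic first, then the event.
function createOrder(array &$orders, SideEffectDispatcher $dispatcher, string $id): bool
{
    $orders[] = $id; // stands in for the atomic database work
    $dispatcher->dispatch('order.created', ['id' => $id]);
    return true;     // success is reported regardless of the side effects
}

$dispatcher = new SideEffectDispatcher();
$emailsSent = [];
// Simulated cache update that fails.
$dispatcher->on('order.created', function (array $payload) {
    throw new RuntimeException('cache down');
});
// Simulated email listener that still runs.
$dispatcher->on('order.created', function (array $payload) use (&$emailsSent) {
    $emailsSent[] = $payload['id'];
});

$orders = [];
$ok = createOrder($orders, $dispatcher, 'A1');
```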
My question is more philosophical than technical.
A few words about Doctrine's EM. It closes its connection and clears itself if any exception occurs during its work: a failed database connection (a common case in long-running consumers with a low number of incoming tasks), an error in an SQL statement, or anything else related to the DB server or the EM itself. After this, the EM instance is completely unusable.
So, a real-world example: I have a queue and a consumer that runs as a console worker and waits for tasks. The consumer has the following dependencies:
EntityManager (EM)
Service1 -> depends on EM and Doctrine Repository1
Service2 -> depends on EM and Doctrine Repository2
ServiceN -> depends on EM and Doctrine RepositoryN
If the EM fails, Service(1-N) and Repository(1-N), which depend on this EM, will also throw errors when called, because the EM no longer works correctly. What should I do in this case?
1. "let-it-crash": the worker stops with an error and is later restarted by supervisord. Leads to an increased number of useless errors in logs\stderr.
2. Do some magic with $connection->ping() in each iteration: ping() actually just executes SELECT 1;, so this leads to an increased number of useless queries to the DB server.
3. Same as before, but create a new EM in the consumer if the old one has failed: execute ping() on each iteration and, if it fails, create a new EM. But all services used in the consumer would also have to be re-created, so I need a factory for each of them. This leads to more classes and more complicated logic in the consumer: either re-create all services (and their dependencies) on each iteration with the new or old EM, or detect EM re-creation and re-create the dependent services only when there is a new EM. But this is an abstraction leak: the consumer should not know which EM instance it uses, old or new, and should not do these crappy things.
What is the best way to deal with this?
I would share some thoughts here.
1. "Leads to increase number of useless errors in logs\stderr" - I do not think these are useless errors. If your software throws an exception, you should know about it. A log file is at its best when it contains no exceptions, but that's rarely the case. Anyway, any database exception, and the rate at which it occurs, should be investigated.
2. I would not rely on reestablishing the connection, but instead rely on the Doctrine API to initialize itself. This answer has some details on how to do that for several Doctrine2 versions.
3. I think this is too much logic to implement and will only complicate matters.
If I were to choose, I would go with option #1 (let-it-crash) because it is the simplest of all and it does not hide anything from us.
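For what it's worth, let-it-crash can be as small as the loop below. This is only a sketch: runWorker and its three callables are hypothetical names, and a null task is assumed to mean "stop", whereas a real consumer would block waiting on the queue.

```php
<?php
// Hypothetical worker loop. $nextTask returns a task, or null to stop;
// $handle processes one task; $log receives one message per failure.
function runWorker(callable $nextTask, callable $handle, callable $log): int
{
    while (($task = $nextTask()) !== null) {
        try {
            $handle($task);
        } catch (Throwable $e) {
            // Do not try to repair the EntityManager in place: log and stop.
            $log(get_class($e) . ': ' . $e->getMessage());
            return 1; // non-zero status, so a process supervisor can restart us
        }
    }
    return 0;
}

// In the real console script this would end with: exit(runWorker(...));
$tasks = ['a', 'b', 'c'];
$handled = [];
$logs = [];
$status = runWorker(
    function () use (&$tasks) {
        return array_shift($tasks);
    },
    function ($task) use (&$handled) {
        if ($task === 'b') {
            throw new RuntimeException('EntityManager is closed'); // simulated failure
        }
        $handled[] = $task;
    },
    function ($message) use (&$logs) {
        $logs[] = $message;
    }
);
```

The restarted process builds a fresh container, so every service gets a fresh EM without any factory code in the consumer.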
In my obsolete procedural code (which I'd now like to translate into OOP) I have simple database transaction code like this:
mysql_query("BEGIN");
mysql_query("INSERT INTO customers SET cid=$cid,cname='$cname'");
mysql_query("INSERT INTO departments SET did=$did,dname='$dname'");
mysql_query("COMMIT");
If I build OOP classes Customer and Department for mapping customers and departments database tables I can insert table records like:
$customer=new Customer();
$customer->setId($cid);
$customer->setName($cname);
$customer->save();
$department=new Department();
$department->setId($did);
$department->setName($dname);
$department->save();
My Customer and Department classes internally use another DB class for querying the database.
But how to make $customer.save() and $department.save() parts of a database transaction?
Should I have one outer class starting/ending transaction with Customer and Department classes instantiated in it or transaction should be started somehow in Customer (like Customer.startTransaction()) and ended in Department (like Department.endTransaction())? Or...
An additional object is the way to go. Something like this:
$customer=new Customer();
$customer->setId($cid);
$customer->setName($cname);
$department=new Department();
$department->setId($did);
$department->setName($dname);
$transaction = new Transaction();
$transaction->add($customer);
$transaction->add($department);
$transaction->commit();
You can see that there is no call to the save() method on $customer and $department anymore. The $transaction object takes care of that.
Implementation can be as simple as this:
class Transaction
{
private $stack;
public function __construct()
{
$this->stack = array();
}
public function add($entity)
{
$this->stack[] = $entity;
}
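public function commit()
{
mysql_query("BEGIN");
try {
foreach ($this->stack as $entity) {
$entity->save();
}
mysql_query("COMMIT");
} catch (Exception $e) {
// Assumes save() throws on failure: undo every save() made so far
mysql_query("ROLLBACK");
throw $e;
}
}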
}
How to make $customer.save() and $department.save() parts of a database transaction?
You don't have to do anything besides start the transaction.
In most DBMS interfaces, the transaction is "global" to the database connection. If you start a transaction, then all subsequent work is automatically done within the scope of that transaction. If you commit, you have committed all changes since the last transaction BEGIN. If you rollback, you discard all changes since the last BEGIN (there's also an option to rollback to the last transaction savepoint).
I've only used one database API that allowed multiple independent transactions to be active per database connection simultaneously (that was InterBase / Firebird). But this is so uncommon, that standard database interfaces like ODBC, JDBC, PDO, Perl DBI just assume that you only get one active transaction per db connection, and all changes happen within the scope of the one active transaction.
Should I have one outer class starting/ending transaction with Customer and Department classes instantiated in it or transaction should be started somehow in Customer (like Customer.startTransaction()) and ended in Department (like Department.endTransaction())? Or...
You should start a transaction, then invoke domain model classes like Customer and Department, then afterwards, either commit or rollback the transaction in the calling code.
The reason for this is that domain model methods can call other domain model methods. You never know how many levels deep these calls go, so it's really difficult for the domain model to know when it's time to commit or rollback.
For some pitfalls of doing this, see How do detect that transaction has already been started?
But they don't have to know that. Customer and Department should just do their work, inserting and deleting and updating as needed. Once they are done, the calling code decides if it wants to commit or rollback the whole set of work.
In a typical PHP application, a transaction is usually the same amount of work as one PHP request. It's possible, though uncommon, to do more than one transaction during a given PHP request, and it's not possible for a transaction to span across multiple PHP requests.
So the simple answer is that your PHP script should start a transaction near the beginning of the script, before invoking any domain model classes, then commit or rollback at the end of the script, or once the domain model classes have finished their work.
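Here is a small sketch of that arrangement using PDO instead of the old mysql_* functions. It assumes the pdo_sqlite extension (an in-memory database keeps it self-contained), and the Customer, Department, and saveBoth shapes are hypothetical simplifications of the classes in the question. The point is that both objects share one connection, so the transaction started by the calling code covers both save() calls.

```php
<?php
// Assumes the pdo_sqlite extension; an in-memory database keeps the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE customers (cid INTEGER PRIMARY KEY, cname TEXT NOT NULL)');
$db->exec('CREATE TABLE departments (did INTEGER PRIMARY KEY, dname TEXT NOT NULL)');

// Hypothetical domain classes: they only write, they never begin or commit.
class Customer
{
    private $db;
    private $id;
    private $name;

    public function __construct(PDO $db, int $id, string $name)
    {
        $this->db = $db;
        $this->id = $id;
        $this->name = $name;
    }

    public function save(): void
    {
        $this->db->prepare('INSERT INTO customers (cid, cname) VALUES (?, ?)')
            ->execute([$this->id, $this->name]);
    }
}

class Department
{
    private $db;
    private $id;
    private $name;

    public function __construct(PDO $db, int $id, string $name)
    {
        $this->db = $db;
        $this->id = $id;
        $this->name = $name;
    }

    public function save(): void
    {
        $this->db->prepare('INSERT INTO departments (did, dname) VALUES (?, ?)')
            ->execute([$this->id, $this->name]);
    }
}

// Hypothetical calling code: the only place that knows about the transaction.
function saveBoth(PDO $db, Customer $customer, Department $department): bool
{
    $db->beginTransaction();
    try {
        $customer->save();
        $department->save();
        $db->commit();
        return true;
    } catch (Throwable $e) {
        $db->rollBack(); // discards both inserts, not just the failed one
        return false;
    }
}

$first = saveBoth($db, new Customer($db, 1, 'Ann'), new Department($db, 10, 'Sales'));
// Department id 10 already exists, so this call fails and customer 2 is rolled back too.
$second = saveBoth($db, new Customer($db, 2, 'Bob'), new Department($db, 10, 'Ops'));
```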
You are migrating to OOP, and that's great, but soon you will find yourself migrating to an architecture with a well-differentiated Data Access Layer, including a more complex way of separating data from control. Right now, I guess you are using some kind of data access object, which is a great first approach, but you can certainly go further. Some of the answers here already lead you in that direction. You shouldn't think of your objects as the basis of your architecture, with some helper objects to query the database. Instead, you should think about a fully featured layer, with all the generic classes required to take care of communication with the database, that you will use in all your projects, and then have the business-level objects, like customer or department, that know as little as possible about database implementations.
For this, you will certainly have an outer class handling transactions, but probably also another taking care of security, another for building queries that provides a single API regardless of the database engine, and even a class that reads objects in order to put them in the database, so the object itself doesn't even know that it is meant to end up in a database.
Achieving this would be long, hard work, but afterwards you would have a custom and widely reusable layer that makes your projects more scalable, more stable, and more trustworthy. That would be great, you would learn a lot, and you would feel quite good afterwards. You would have some kind of DBAL or ORM.
But that still wouldn't be the best solution, since there are people who have already spent years doing exactly that, and it would be hard to match what they already have.
So what I recommend, for any medium-sized project, is that you take database abstraction as seriously as you can and pick any open-source ORM that happens to be easy to use. In the end you will save time and get a much better system.
For example, Doctrine has a very nice way of handling transactions and concurrency, in two ways: implicit, automatically taking care of normal operations, or explicit, when you need to take over and control transaction demarcation yourself. Check it out here. There are also some more complex possibilities, like transaction nesting.
The most famous and reliable ORMs are
Doctrine, and
Propel
I use Doctrine mostly, since it has a module to integrate with Zend Framework 2 that I like, but Propel has some aspects that I like a lot.
You would probably have to refactor some things, and you may not feel like doing it at this point, but I can say from my experience that this is one of those things you don't even want to think about, and then years later you start using it and realize how much time you wasted :-) I recommend that you consider this, if not now, then in your very next project.
UPDATE
Some thoughts after Tomas' comment.
It's true that for smaller projects (especially if you are not very familiar with ORMs, or your model is very complex) it can be a big effort to integrate a vendor ORM.
But what I can say, after years of developing projects of every size, is that for any medium-sized one I would use at least a custom, less formal and more flexible home-made ORM, with a set of generic classes and as few business-oriented repositories as possible, where an entity knows its table, and probably other related tables, and where you can encapsulate some SQL or custom query function calls, all centered on that entity (for example the main table of the entity, the table of pictures associated with that entity, and so on). The goal is to give the controller a single interface to the data, so that the database engine is always independent of the model's API and, just as importantly, the controller doesn't have to be aware of any DBMS aspects, like the use of transactions. A transaction is only meant to ensure a behavior that is purely model-related, and at a scandalously low level: it is tied pretty much to the DBMS's technical needs. I mean, your controller could know that it is storing stuff in a database, but it certainly doesn't have to know what a transaction is.
For sure this is a philosophical discussion, and there could be many equally valid points of view.
For any custom ORM, I would recommend that you start by looking for a DAO/DTO generator that can help you create the main classes from your database, so you only need to adapt them at the points where you find exceptions to the normal create-read-update-delete behavior. This reminds me that you can also search for PHP CRUD and find some useful and fun tools.
I'm working on a PHP/MySQL app using the Yii framework.
I've come across the following situation:
In my VideoController, I have an actionCreate which creates a new Video and an actionPrivacy which sets the privacy on the Video. The problem is that during actionCreate the setPrivacy method of the Video model is called, which currently has its own transaction. I would like the creation of the Video to be in a transaction as well, which leads to an error since a transaction is already active.
In the comment on this answer, Bill Karwin writes
So there's no need to make Domain Model classes or DAO classes manage
transactions -- just do it at the Controller level
and in this answer:
Since you're using PHP, the scope of your transactions is at most a
single request. So you should just use container-managed transactions,
not service-layer transactions. That is, start the transaction at the start
of handling the request, and commit (or rollback) as you finish
handling the request.
If I manage the transactions in the controller, I would have a bunch of code that looks like:
public function actionCreate() {
$trans = Yii::app()->getDb()->beginTransaction();
// ...action code...
$trans->commit();
}
That leads to duplicated code in a lot of places where I need transactions for the action.
Or I could refactor it into the beforeAction() and afterAction() methods of the parent Controller class which would then automatically create transactions for each action being performed.
Would there be any problems with this method? What is a good practice for transaction management for a PHP app?
The reason that I say transactions don't belong in the model layer is basically this:
Models can call methods in other models.
If a model tries to start a transaction, but it has no knowledge of whether its caller started a transaction already, then the model has to conditionally start a transaction, as shown in the code example in @Bubba's answer. The methods of the model have to accept a flag so that the caller can tell it whether it is permitted to start its own transaction or not. Or else the model has to have the ability to query its caller's "in a transaction" state.
public function setPrivacy($privacy, $caller){
if (! $caller->isInTransaction() ) $this->beginTransaction();
$this->privacy = $privacy;
// ...action code..
if (! $caller->isInTransaction() ) $this->commit();
}
What if the caller isn't an object? In PHP, it could be a static method or simply non-object-oriented code. This gets very messy, and leads to a lot of repeated code in models.
It's also an example of Control Coupling, which is considered bad because the caller has to know something about the internal workings of the called object. For example, some of the methods of your Model may have a $transactional parameter, but other methods may not have that parameter. How is the caller supposed to know when the parameter matters?
// I need to override method's attempt to commit
$video->setPrivacy($privacy, false);
// But I have no idea if this method might attempt to commit
$video->setFormat($format);
The other solution I have seen suggested (or even implemented in some frameworks like Propel) is to make beginTransaction() and commit() no-ops when the DBAL knows it's already in a transaction. But this can lead to anomalies if your model tries to commit and finds that it doesn't really commit. Or tries to roll back and has that request ignored. I've written about these anomalies before.
The compromise I have suggested is that Models don't know about transactions. The model doesn't know if its request to setPrivacy() is something it should commit immediately or is it part of a larger picture, a more complex series of changes that involve multiple Models and should only be committed if all these changes succeed. That's the point of transactions.
So if Models don't know whether they can or should begin and commit their own transaction, then who does? GRASP includes a Controller pattern which is a non-UI class for a use case, and it is assigned the responsibility to create and control all the pieces to accomplish that use case. Controllers know about transactions because that's the place all the information is accessible about whether the complete use case is complex, and requires multiple changes to be done in Models, within one transaction (or perhaps within several transactions).
The example I have written about before, that is to start a transaction in the beforeAction() method of an MVC Controller and commit it in the afterAction() method, is a simplification. The Controller should be free to start and commit as many transactions as it logically requires to complete the current action. Or sometimes the Controller could refrain from explicit transaction control, and allow the Models to autocommit each change.
But the point is that the information about what transaction(s) are necessary is something that the Models don't know -- they have to be told (in the form of a $transactional parameter) or else query it from their caller, which would have to delegate the question all the way up to the Controller's action anyway.
You may also create a Service Layer of classes that each know how to execute such complex use cases, and whether to enclose all the changes in a single transaction. That way you avoid a lot of repeated code. But it's not common for PHP apps to include a distinct Service Layer; the Controller's action is usually coincident with a Service Layer.
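One way to get that without repeating begin/commit in every action is a small helper that wraps a whole use case. The sketch below is only an illustration under a few assumptions: it uses plain PDO with the pdo_sqlite extension rather than Yii's connection object, and runInTransaction, createVideo, and setVideoPrivacy are hypothetical names.

```php
<?php
// Hypothetical helper: runs one complete use case inside one transaction.
function runInTransaction(PDO $db, callable $useCase)
{
    $db->beginTransaction();
    try {
        $result = $useCase($db);
        $db->commit();
        return $result;
    } catch (Throwable $e) {
        if ($db->inTransaction()) {
            $db->rollBack();
        }
        throw $e; // let the caller decide how to report the failure
    }
}

// Hypothetical model functions: they write, but know nothing about transactions.
function createVideo(PDO $db, string $title): int
{
    $db->prepare("INSERT INTO videos (title, privacy) VALUES (?, 'private')")
        ->execute([$title]);
    return (int) $db->lastInsertId();
}

function setVideoPrivacy(PDO $db, int $id, string $privacy): void
{
    if (!in_array($privacy, ['private', 'public'], true)) {
        throw new InvalidArgumentException("Unknown privacy level: $privacy");
    }
    $db->prepare('UPDATE videos SET privacy = ? WHERE id = ?')
        ->execute([$privacy, $id]);
}

// Assumes the pdo_sqlite extension; an in-memory database keeps the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE videos (id INTEGER PRIMARY KEY, title TEXT NOT NULL, privacy TEXT NOT NULL)');

// The "controller action": both changes commit together or not at all.
$videoId = runInTransaction($db, function (PDO $db) {
    $id = createVideo($db, 'Intro');
    setVideoPrivacy($db, $id, 'public');
    return $id;
});
```

The model functions can still be called on their own, in which case each statement simply autocommits.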
Best Practice: Put the transactions in the model, do not put the transactions in the controller.
The primary advantage of the MVC design pattern is this: MVC makes model classes reusable without modification. This makes maintenance and implementing new features easy.
For example, presumably you are primarily developing for a browser where a user enters one collection of data at a time, and you move data manipulation into the controller. Later you realize you need to support allowing the user to upload a large number of collections of data to be imported on the server from the command line.
If all the data manipulation was in the model, you could simply slurp in the data and pass it to the model to handle. If there is needful (transactional) functionality in the controller, you would have to replicate that in your CLI script.
On the other hand, perhaps you end up with another controller that needs to perform the same functionality, from a different point. You will need to replicate code in that other controller as well now.
To that end, you merely need to solve the transaction challenges in the model.
Assuming you have a Video class (model) with a setPrivacy() method that already has a transaction built in, and you want to call it from another method, persist(), which also needs to wrap its functionality in a larger transaction, you could merely modify setPrivacy() to perform a conditional transaction.
Perhaps something like this.
class Video{
private $privacy;
private $transaction;
public function __construct($privacy){
$this->privacy = $privacy;
}
public function persist(){
$this->beginTransaction();
// ...action code...
$this->setPrivacy($this->privacy, false);
// ...action code...
$this->commit();
}
public function setPrivacy($privacy, $transactional = true){
if ($transactional) $this->beginTransaction();
$this->privacy = $privacy;
// ...action code..
if ($transactional) $this->commit();
}
private function beginTransaction(){
$this->transaction = Yii::app()->getDb()->beginTransaction();
}
private function commit(){
$this->transaction->commit();
}
}
In the end, your instincts are correct (re: That leads to duplicated code in a lot of places where I need transactions for the action.). Architect your models to support the myriad of transactional needs you have, and let the controller merely determine which entry point (method) it will use in its own context.
No, you are right. The transaction is delegated by the "create" method, which is what a controller is supposed to do. Your suggestion of using a 'wrapper' like beforeAction() is the way to go. Just make the controller extend or implement this class. It looks like you are looking for an Observer type pattern or a factory-like implementation.
Well, one disadvantage of these broad transactions (over the whole request) is that you limit concurrency capabilities of your database engine and you also increase deadlocks probability. From this point of view, it might pay off to put transactions only where you need them and let them cover only code that needs to be covered.
If possible, I would definitely go for placing transactions in models. The problem with overlapping transactions can be solved by introducing a BaseModel (the ancestor of all models) with a transactionLock variable. Then you simply wrap your begin/commit transaction directives in BaseModel methods that respect this variable.
The whole "when to throw an exception or return a value" question has been asked a lot (see the following for just one example):
Should a retrieval method return 'null' or throw an exception when it can't produce the return value?
and I completely agree with the answers in the main.
Now my question arises from adding a little more context to the above when applying this to a more complex system. I'll try to keep this as brief and simple as possible.
Right, we have an example MVC PHP application:
Model A: has a function get_car($id) which returns a car object.
Controller A has a simple function for, say, showing a car to the user
Controller B however has a complex function that, say, gets the car, modifies it (say through one of Model A's set functions) and also updates other tables based on some of these new values through other models and libraries throughout the system - very complex ay lol
We now get to the main part of my question:
For data integrity I want to use MySQL transactions. This is where I run into a "what's best / what's best practice" scenario...
We write Model A to return FALSE if the car is not found or there is an SQL error.
This is fine for Controller A as it just wants to know if there was an error and bom out, so we just check the return value and bom - fine.
We now get to Controller B. Controller B, say, does some database updating before the Model A function is called, which we need to roll back on error, so we need to use a transaction. Now this is where I get to my problem. Do I leave Model A returning a value and just check it, or do I change it to throw an exception, with the knock-on effect of then having to re-write Controller A as well, since we now need to catch the exception... Then (not done yet ;o)) do I roll back in the catch of the model (but how do we know if a transaction has been used or not?), or do we catch and re-throw, or allow it to bubble up to the controller's catch and do the roll back there?
What I'm trying to say is: if I have lots of models and controllers with database interaction, should I just make them throw exceptions and then wrap all my other code (e.g. controller functions) in try/catches in case the model or library functions ever throw? Or do I make the models "self contained" so they tidy up and handle their own problems, but then what do I do about rolling back a transaction if (for this "call") one was open (as per my example above, a transaction is not opened every time...)? If this was the case I would have to make all my functions return something and then check this in the controller, as this is the only place that knows if there is an open transaction or not...
So to clarify: I can use a try/catch to catch and roll back in a controller, that's ok, but how do I do this from "further down", e.g. in a model or library function... that could be called either during a transaction or just as a normal auto-commit MySQL call?
An explained answer would be great (as I like to understand why I am doing something), but if not, a vote for your favourite of the following solutions (well, the solutions I can see):
1) make all model and library functions always return a value and then the controller can either just bom or do a try catch to roll back where necessary - but meaning that I would have to check the return value of every model and library function everywhere they are used.
2) make all model and library functions throw exceptions (say on SQL error) and wrap every controller (which would call the model and library functions) in a try catch where the catch would either just bom or roll back if necessary...
also please note "bom" is push user somewhere or show a pretty error (before someone says "its bad practice to just allow your application to die..." lol)
I hope you get where I'm coming from here and sorry for the long loooooong question.
Thanks in advance
Ben
[There's a theoretical problem implicit in the "For data integrity I want to use MySQL transactions"... since MySQL historically hasn't been very ACID - PostgreSQL and Oracle both provide stronger support for ACID. However, back to the real question...]
Both your (1) and (2) focus on exceptions versus failure-return values, but my impression is that this isn't the key part of detangling exceptions, error returns, and open transactions (and some databases support SQL exceptions as well). Instead, I'd focus on keeping the transaction state tied to the nesting of the functions manipulating the model. Here are some thoughts along this line:
You will probably always have error returns from some library functions anyway, so having Model A return FALSE isn't really breaking the paradigm, nor is there anything particularly troublesome about a mix of error returns and exceptions. However, error returns MUST bubble up correctly - or be converted to exceptions if they go beyond what can be addressed locally.
Nested transactions are the most obvious way to have one controller start a database manipulation and still call other stuff in the app that also uses transactions. This allows a failed sub-sub-function to abort just its own part of the transaction and take either the error return or exception approach to bubbling the error up on the non-SQL side, while the closed sub-transactions still maintain reasonable matching state. This usually needs to be simulated in code outside of the database (this is essentially what Django does).
Your code could start a new (potentially large) transaction, and keep track of the fact that it's already open to keep the sub-sub-functions in your code from trying to reopen it.
In some databases, code can detect whether a transaction is already open based on the database session state, allowing you to check the DB session state instead of tracking it in code.
Both of the above allow one to use savepoints to simulate truly nested transactions.
One must be very careful to avoid calling SQL calls with implicit commits (CREATE TABLE, for example). MySQL probably deserves a lot more caution around this issue than, say, PostgreSQL.
One way to implement the one big transaction approach is to have a high-level function that initiates the transaction and then itself calls the top of whatever Controller B needs to do. This makes either bubbling up errors or having a special abort-transaction exception pretty straightforward. Only the top function would call commit rather than abort, and only if no subfunction failed and no exception was caught. Subfunctions wouldn't call commit.
Conversely, you could have all of your functions pay attention to a transactional depth implemented on the non-SQL side (your code), although this is harder to set up in some languages than others (it's pretty easy using decorators in Python, for example). This would allow any of them to call commit if they were done and the transactional depth was at zero.
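As a rough illustration of depth tracking combined with savepoints, here is a sketch in PHP. NestedTransaction is a hypothetical class, and it assumes PDO with the pdo_sqlite extension so it can run on its own; the SAVEPOINT statements would need checking against whatever database you actually use.

```php
<?php
// Hypothetical wrapper: depth 0 owns the real transaction, deeper levels use savepoints.
final class NestedTransaction
{
    private $db;
    private $depth = 0;

    public function __construct(PDO $db)
    {
        $this->db = $db;
    }

    public function run(callable $work)
    {
        $savepoint = 'sp_' . $this->depth;
        if ($this->depth === 0) {
            $this->db->beginTransaction();
        } else {
            $this->db->exec("SAVEPOINT $savepoint");
        }
        $this->depth++;
        try {
            $result = $work($this);
        } catch (Throwable $e) {
            $this->depth--;
            if ($this->depth === 0) {
                $this->db->rollBack(); // outermost level: abort everything
            } else {
                // Inner level: undo only the work done since its savepoint.
                $this->db->exec("ROLLBACK TO SAVEPOINT $savepoint");
            }
            throw $e;
        }
        $this->depth--;
        if ($this->depth === 0) {
            $this->db->commit(); // only the outermost level really commits
        } else {
            $this->db->exec("RELEASE SAVEPOINT $savepoint");
        }
        return $result;
    }
}

// Assumes the pdo_sqlite extension; an in-memory database keeps the sketch self-contained.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE log (msg TEXT)');

$tx = new NestedTransaction($db);
$tx->run(function (NestedTransaction $tx) use ($db) {
    $db->exec("INSERT INTO log VALUES ('outer')");
    try {
        $tx->run(function () use ($db) {
            $db->exec("INSERT INTO log VALUES ('inner')");
            throw new RuntimeException('inner step failed');
        });
    } catch (RuntimeException $e) {
        // The outer level decides to carry on without the inner work.
    }
});

$rows = $db->query('SELECT msg FROM log')->fetchAll(PDO::FETCH_COLUMN);
```

The inner failure only discards the inner insert. If the outer callback had let the exception escape, the whole transaction would have been rolled back.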
Hope this helps someone :-)
I'm just wondering how to best handle transactions across multiple service layers. The service layers use an ORM to store and retrieve from the database. Should the transactions be known and handled within the individual service layers? Or should they be handled by another layer?
For example: I have two service layers for users and clients. I would like to:
1) Create and save a new client
2) Create and save a new user
3) Assign that user to the client
All within a single transaction.
A simple example might look like this:
$userManagementService = new UserManagementService;
$newUserData = array(...);
$newUser = $userManagementService->create($newUserData);
$clientManagementService = new ClientManagementService;
$newClientData = array(...);
$newClient = $clientManagementService->create($newClientData);
$userManagementService->assignUserToClient($newUser, $newClient);
Where should transaction logic go?
Do not try to do nested transactions within service layers or within the ORM.
Transactions are global to the DB connection. Unless your RDBMS supports nested transactions natively and your DB API exposes nested transactions, you can run into anomalies.
For details, see my answer to How do detect that transaction has already been started?
Since you're using PHP, the scope of your transactions is at most a single request. So you should just use container-managed transactions, not service-layer transactions. That is, start the transaction at the start of handling the request, and commit (or rollback) as you finish handling the request.
If an exception requiring a rollback occurs deep within nested ORM actions, then bubble that up by using an Exception, and let the container (i.e. your PHP action controller) take care of it.
Are you facing an aggregation of transactions? Does this pseudo code match what I think you're saying?
try
begin application transaction
begin ORM transaction 1
create new user
commit request
begin ORM transaction 2
create new client
commit request
begin ORM transaction 3
create user client association
commit request
commit application tx
catch()
abort ORM tx 3
abort ORM tx 2
abort ORM tx 1
abort app tx
At any point a rollback of a nested transaction will likely throw an exception, and those exceptions would logically roll back all the nested transactions in the two-phase commit.
I might not be getting what you're after tho.