I have an application running on a PHP + MySQL platform, using the Doctrine2 framework. I need to execute 3 DB queries during one HTTP request: first an INSERT, second a SELECT, third an UPDATE. The UPDATE depends on the result of the SELECT query. There is a high probability of concurrent HTTP requests. If such a situation occurs and the DB queries get mixed up (e.g. INS1, INS2, SEL1, SEL2, UPD1, UPD2), it will result in data inconsistency. How do I assure atomicity of the INS-SEL-UPD operation? Do I need to use some kind of lock, or are transactions sufficient?
The answer from @YaK is actually a good answer. You should know how to deal with locks in general.
Addressing Doctrine2 specifically, your code should look like:
$em->getConnection()->beginTransaction();
try {
    $toUpdate = $em->find('Entity\WhichWillBeUpdated', $id, \Doctrine\DBAL\LockMode::PESSIMISTIC_WRITE);
    // this will append FOR UPDATE: http://docs.doctrine-project.org/en/2.0.x/reference/transactions-and-concurrency.html
    $em->persist($anInsertedOne);
    // you can flush here as well, to obtain the ID after the insert if needed
    $toUpdate->changeValue('new value');
    $em->persist($toUpdate);
    $em->flush();
    $em->getConnection()->commit();
} catch (\Exception $e) {
    $em->getConnection()->rollBack();
    throw $e;
}
Every subsequent request that tries to fetch the same row FOR UPDATE will wait until the transaction of the process holding the lock finishes. MySQL releases the lock automatically when the transaction either succeeds or fails. By default, the InnoDB lock wait timeout is 50 seconds, so if your process does not finish its transaction within 50 seconds, the transaction rolls back and the lock is released automatically. You do not need any additional fields on your entity.
A table-wide LOCK is guaranteed to work in all situations, but it is quite a blunt instrument: it prevents concurrency rather than dealing with it.
However, if your script holds the lock for a very short time frame, it might be an acceptable solution.
If your table uses the InnoDB engine (MyISAM has no support for transactions), a transaction is the most efficient solution, but also the most complex.
For your very specific need (in the same table: first an INSERT, second a SELECT, third an UPDATE depending on the result of the SELECT query), the steps are as follows; a PHP sketch follows the list:
Start a transaction
INSERT your records. Other transactions will not see these new rows until your own transaction is committed (unless you use a non-standard isolation level)
SELECT your record(s) with SELECT...LOCK IN SHARE MODE. You now have a READ lock on these rows, no one else may change these rows. (*)
Compute whatever you need to compute to determine whether or not you need to UPDATE something.
UPDATE the rows if required.
Commit
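As an illustration, here is a minimal sketch of these steps using PDO; $pdo is assumed to be a connected PDO instance, and the events table, its columns, and needsUpdate() are invented for the example:
$pdo->beginTransaction();
try {
    // INSERT: other transactions will not see this row until we commit
    $pdo->prepare('INSERT INTO events (object_id, payload) VALUES (?, ?)')
        ->execute([$objectId, $payload]);

    // SELECT ... LOCK IN SHARE MODE: read-lock the rows the decision is based on
    $stmt = $pdo->prepare('SELECT id, status FROM events WHERE object_id = ? LOCK IN SHARE MODE');
    $stmt->execute([$objectId]);
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);

    // compute whether an UPDATE is needed, then perform it
    if (needsUpdate($rows)) { // hypothetical application-side check
        $pdo->prepare('UPDATE events SET status = ? WHERE object_id = ?')
            ->execute(['processed', $objectId]);
    }

    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack(); // expect deadlocks and lock wait timeouts at any time
    throw $e;
}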
Expect errors at any time. If a deadlock is detected, MySQL may decide to ROLLBACK your transaction to escape the deadlock. If another transaction is updating the rows you are trying to read, your transaction may be blocked for some time, or even time out.
The atomicity of your transaction is guaranteed if you proceed this way.
(*) In general, rows not returned by this SELECT may still be inserted by a concurrent transaction; that is, their non-existence is not guaranteed throughout the course of the transaction unless proper precautions are taken.
Transactions won't prevent thread B from reading the values that thread A has not locked, so you must use locks to prevent concurrent access.
@Gediminas explained how you can use locks with Doctrine.
But using locks can result in deadlocks or lock timeouts.
Doctrine renders these SQL errors as RetryableExceptions.
These exceptions are often normal if you are in a high concurrency environment.
They can happen very often and your application should handle them properly.
Each time a RetryableException is thrown by Doctrine, the proper way to handle this is to retry the whole transaction.
As easy as it seems, there is a trap: the Doctrine 2 EntityManager becomes unusable after a RetryableException, and you must create a new one to replay your whole transaction.
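A minimal sketch of such a retry loop ($createEntityManager is a hypothetical factory you would provide, and the unit of work itself is elided):
use Doctrine\DBAL\Exception\RetryableException;

$attempt = 0;
do {
    $retry = false;
    $em = $createEntityManager(); // hypothetical factory: a fresh EntityManager per attempt
    $em->getConnection()->beginTransaction();
    try {
        // ... replay the whole unit of work (e.g. the INS-SEL-UPD above) ...
        $em->flush();
        $em->getConnection()->commit();
    } catch (RetryableException $e) {
        $em->getConnection()->rollBack();
        $em->close(); // the EntityManager is unusable after this exception
        if (++$attempt >= 3) {
            throw $e; // give up after a few attempts
        }
        $retry = true;
    }
} while ($retry);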
I wrote this article illustrated with a full example.
SQL Server has many ways of locking a resource. I am trying to understand what makes SQL Server pick a given lock level. When will it use a page or table lock instead of a row lock?
Problem
I have a PHP application that uses a transaction with every HTTP request to ensure all queries are executed before a commit. One issue that is puzzling me is that when many (5+) people use the application, the app seems to hang (spinning for long periods of time)! Nothing I can think of would cause such behavior except for database locks! The scenario I think is happening is that SQL Server is choosing a page or table lock over a row lock for some reason. I am trying to ensure that SQL Server does a row lock, not a page or table lock. I am using an ORM, so I can't use the ROWLOCK hint in my queries.
Is there a way for me to run an explain plan on my queries to see what lock level will be used?
As you can see here, there is no default granularity in lock modes.
In general the optimizer will choose the best course of action to handle this.
Could it be a case of livelock due to a long-running transaction that leads to resource starvation?
You can also check here and here for information on lock escalation, but I'd suggest not disabling it for any table.
We want to prevent some concurrency issues in a database (we use event sourcing and want to insert an event into the event log only if the event is valid).
The problem is that we need to check whether a certain operation is allowed (requiring a SELECT query and some checks in PHP) and then run an INSERT query to actually perform the operation.
Now, we could simply LOCK the entire table, do our checks and, if they succeed, insert (and then remove the lock).
The problem with this is that it locks the entire table, which is overkill (there will be lots of queries on this table). What I would like to do is to lock all queries that want to do this select-insert operation for a specific object_id, but allow queries for all other object_id's to continue as if there is no lock.
I searched a bit but couldn't find a 'lock attribute' command. There seems to be a 'lock row' command in InnoDB, but it's not really what we want (I think; I'm not 100% sure what it does).
We can of course try to handle the locks manually (check whether some row with this object_id exists in a separate lock table and wait until there is none), but that feels a bit fishy and error prone.
So, here's the actual question: is it possible to lock a table for a specific value of a column (object_id)?
It would be awesome if the lock only held for the specific SELECT-INSERT queries, and not for standalone SELECTs, but that doesn't matter that much for now.
Consider manual arbitrary locks with GET_LOCK();
Choose a name specific to the rows you want to lock, e.g. 'xxx_event_id_y', where 'xxx' is a string specific to the procedure and table and 'y' is the event id.
Call SELECT GET_LOCK('xxx_event_id_y', 30) to lock the name 'xxx_event_id_y'. It will return 1 and set the lock when the name becomes available, or return 0 if the lock is still unavailable after 30 seconds (the second parameter is the timeout).
Use DO RELEASE_LOCK('xxx_event_id_y') when you are finished.
Be aware: you will have to use the same names in each transaction that should wait, and calling GET_LOCK() again in a transaction releases the previously set lock (this is the behavior before MySQL 5.7.5; later versions can hold multiple named locks at once).
GET_LOCK() docs
I actually use this method to lock our application cache too (even when it doesn't use the DB), so it has scope outside the database as well.
Migrate your tables to InnoDB if you have not already done so, and use transactions.
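A minimal PDO sketch of the pattern described above; $pdo is assumed to be a connected PDO instance, and the lock name scheme follows the steps listed:
$lockName = 'xxx_event_id_' . $eventId; // name specific to procedure, table, and row

$stmt = $pdo->prepare('SELECT GET_LOCK(?, 30)'); // wait up to 30 seconds
$stmt->execute([$lockName]);
if ((int) $stmt->fetchColumn() !== 1) {
    throw new RuntimeException("Could not acquire {$lockName} within 30 seconds");
}

try {
    // the SELECT + PHP checks + INSERT for this object_id go here
} finally {
    $pdo->prepare('SELECT RELEASE_LOCK(?)')->execute([$lockName]); // always release
}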
I've seen many posts explaining the usage of SELECT ... FOR UPDATE and how to lock a row; however, I haven't been able to find any that explain what occurs when the code tries to read a row that is locked.
For instance, say I use the following:
$con->autocommit(FALSE);
$ps = $con->prepare("SELECT 1 FROM event WHERE row_id = 100 FOR UPDATE");
$ps->execute();
...
// do something if the lock was successful
...
$con->commit();
In this case, how do I determine if my lock was successful? What is the best way to handle a scenario when the row is locked already?
Sorry if this is described somewhere, but all I seem to find are the 'happy path' explanations out there.
In this case, how do I determine if my lock was successful? What is the best way to handle a scenario when the row is locked already?
If the row you are trying to lock is already locked, the MySQL server will not return any response for this row. It will wait² until the locking transaction is either committed or rolled back.
(Obviously: if the row has been deleted already, your SELECT will return an empty result set and not lock anything)
After that, it will return the latest value committed by the transaction that was holding the lock.
Regular SELECT statements will not care about the lock and will return the current value, ignoring the uncommitted change.
So, in other words: your code will only be executed WHEN the lock is successful (otherwise it waits² until the prior lock is released).
Note that using FOR UPDATE will also block any transactional SELECTs for the time the row is locked. If you do not want this, use LOCK IN SHARE MODE instead: it allows transactional selects to proceed with the current value while still blocking any update or delete statement.
² The query will return an error after the time defined by innodb_lock_wait_timeout: http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout
It will then return ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction.
In other words: that's the point where your attempt to acquire a lock fails.
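A minimal mysqli sketch of handling that failure, reusing $con from the question; mysqli must be in exception mode, and the retry policy is only an example:
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // make mysqli throw exceptions

$con->begin_transaction();
try {
    $ps = $con->prepare("SELECT 1 FROM event WHERE row_id = 100 FOR UPDATE");
    $ps->execute(); // blocks here while another transaction holds the lock

    // ... do something with the locked row ...

    $con->commit();
} catch (mysqli_sql_exception $e) {
    $con->rollback();
    if ($e->getCode() == 1205) {
        // lock wait timeout exceeded: the lock could not be acquired in time;
        // retry the whole transaction or report the conflict to the user
    } else {
        throw $e;
    }
}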
Side note: this kind of locking is only suitable to ensure data integrity (i.e. that no referenced row is deleted while you are inserting something that references this row).
Once the lock is released, any blocked (or better, delayed) DELETE statement will be executed, and it may delete the row you just inserted, due to cascading on the row on which you held the lock to ensure integrity.
If you want to create a system that prevents 2 users from modifying the same data at the same time, you should do this at the application level and look at pessimistic vs. optimistic locking approaches, because it is not a good idea to keep transactions running for a long period of time. (I think in PHP your database connections are automatically closed after each request anyway, causing an implicit rollback of any running transaction.)
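For illustration, a minimal PDO sketch of the optimistic approach, assuming a version column; the table and column names are invented for the example:
// read the record together with its version
$stmt = $pdo->prepare('SELECT amount, version FROM items WHERE id = ?');
$stmt->execute([$id]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);

// ... the user edits the data, possibly much later, on another request ...

// write back only if nobody else changed the row in the meantime
$stmt = $pdo->prepare('UPDATE items SET amount = ?, version = version + 1 WHERE id = ? AND version = ?');
$stmt->execute([$newAmount, $id, $row['version']]);

if ($stmt->rowCount() === 0) {
    // someone else modified the row first: reload and let the user decide
}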
I am currently looking into how I can manage a high number of bids on my auction site project. As it is quite possible that some people may send bids at exactly the same time, it has become apparent that I need locks to prevent any data corruption.
I have come down to using SELECT ... LOCK IN SHARE MODE, whose documentation states that if any of these rows were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values.
http://dev.mysql.com/doc/refman/5.1/en/innodb-locking-reads.html
This suggests to me that the bids will enter a queue where each bid is dealt with in turn and checked to ensure that it is higher than the current bid, and that if anything has changed since the insert entered this queue, the latest bid amount is used.
However, I have read that there can be damaging deadlock issues where two users try to place bids at the same time and no query can maintain a lock. Therefore I have also considered using SELECT ... FOR UPDATE, but this will then also block any reads, which I am quite unsure about.
If anybody could shed any light on this issue, that would be appreciated. If you could suggest another database, like a NoSQL store, which would be more suitable, that would be very helpful!
EDIT: This is essentially a concurrency problem where I don't want to check the current bid against incorrect/old data, which would produce a 'lost update' on certain bids.
By itself, two simultaneous updates will not cause a deadlock, just transient blocking. Let's call them Bid A and Bid B.
Although we're considering them simultaneous, one will acquire a lock first. We'll say that A gets there 1 ms faster.
A acquires a lock on the row in question. B's lock request goes into the queue and must wait for the lock belonging to A to be released. As soon as lock A is released, B acquires its own lock.
There may be more to your code, but from your question, and as I've described it, there is no deadlock scenario. In order to deadlock, A must be waiting for B to release its lock on another resource, while B will not release its lock until it acquires a lock on A's resource.
If you need to validate the bid in real time you can either:
A. Use the appropriate transaction isolation level (repeatable read, probably, which is the default in InnoDB) and perform both your select and update in an explicit transaction.
START TRANSACTION
SELECT ... FOR UPDATE
IF ...
UPDATE ...
COMMIT
B. Perform your check logic in your UPDATE statement itself. In other words, construct your UPDATE query so that it will only affect rows where the current bid is less than the new bid. If no records were affected, the bid was too low. This is a possible approach and reduces work on the DB, but it has its own considerations.
UPDATE ...
WHERE currentBid < newBid
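A minimal PDO sketch of approach B; the table and column names are invented for the example:
$stmt = $pdo->prepare(
    'UPDATE auctions
     SET current_bid = ?, current_bidder_id = ?
     WHERE id = ? AND current_bid < ?'
);
$stmt->execute([$newBid, $bidderId, $auctionId, $newBid]);

if ($stmt->rowCount() === 0) {
    // no row was affected: the bid was too low (or the auction does not exist)
}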
Personally my vote would be to opt for A because I don't know how complex your logic is.
A REPEATABLE READ isolation level will ensure that every time you read a given record in a transaction, the value is guaranteed to be the same. It does this by holding a lock on the row, which prevents others from updating it until your transaction either commits or rolls back. One connection cannot update the row until the previous one has completed its transaction.
The bottom line is that your select/update will be atomic in your DB, so you don't have to worry about lost updates.
Regarding concurrency, the key is to keep your transactions as short as possible: get in, get out. By default you can't read a record that is being updated, because it is in an indeterminate state, but these updates and reads should take small fractions of a second.
I've got a transactional application that works like so:
try {
    $db->begin();
    increaseNumber();
    $db->commit();
} catch (Exception $e) {
    $db->rollback();
}
And then inside increaseNumber() I'll have a query like the following, which is the only function that works with this table:
// I use FOR UPDATE so that nobody else can read this row until it's been updated
$result = $db->select("SELECT item1
                       FROM units
                       WHERE id = '{$id}'
                       FOR UPDATE");

$result = $db->update("UPDATE units SET item1 = item1 + 1
                       WHERE id = '{$id}'");
Everything is wrapped in a transaction, but lately I've been dealing with some pretty slow queries, and there's a lot of concurrency going on in my application, so I can't really make sure that queries run in a specific order.
Can deadlocks cause ACID transactions to break? I have one function that adds something and another that removes it, but when I have deadlocks I find the data completely out of sync, as if the transactions were ignored.
Is this bound to happen or is something else wrong?
Thanks, Dominic
Well, if a transaction runs into a lock (from another transaction) that doesn't release, it'll fail after a timeout; for InnoDB the default is 50 seconds. You should take note of whether anyone is using 3rd-party applications on the database. I know for a fact that, for example, SQL Manager 2007 does not release locks on InnoDB unless you disconnect from the database (sometimes it only takes a Commit Transaction on ... well, everything), which causes a lot of queries to fail after the timeout. Of course, if your transactions are ACID-compliant, they should execute all-or-nothing. Things break only if you split your data changes across transactions.
You can try extending the timeout, but a lock held for that long might imply some deeper problem. It depends, of course, on what storage engine you're using (from the MySQL tag and the transactions, I assumed InnoDB).
You can also try turning on query profiling to see if any queries run for a ridiculous amount of time. Just note that it does, of course, decrease performance, so it may not be a production solution.
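For reference, a short PDO sketch of both suggestions; the 120-second value is only an example, and profiling is a session-level diagnostic:
// raise the lock wait timeout for this session only
$pdo->exec('SET SESSION innodb_lock_wait_timeout = 120');

// enable per-query profiling, run the suspect workload, then inspect the timings
$pdo->exec('SET profiling = 1');
// ... run the slow queries here ...
foreach ($pdo->query('SHOW PROFILES') as $row) {
    printf("#%d  %.6f s  %s\n", $row['Query_ID'], $row['Duration'], $row['Query']);
}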
The A in ACID stands for Atomicity, so no, deadlocks cannot make an ACID transaction break; rather, the transaction will simply not happen, as in all-or-nothing.
More likely, if you see inconsistent data, your application is doing multiple "transactions" in what is logically a single transaction. For example: the user creates an account (transaction begin ... commit), then the user sets a password (transaction begin ... deadlock ... rollback); your application ignores the error and continues, and now your database is left with a user created and no password.
Look at what else your application is doing besides the rollback, and at whether the consistent data is logically built up in multiple parts.
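To illustrate, a minimal sketch that wraps both logical steps in one transaction, reusing the $db API from the question; createAccount() and setPassword() are hypothetical helpers:
try {
    $db->begin();
    createAccount($userId, $email); // hypothetical step 1
    setPassword($userId, $hash);    // hypothetical step 2
    $db->commit();                  // both steps become visible together
} catch (Exception $e) {
    $db->rollback(); // a deadlock here undoes BOTH steps
    throw $e;        // do not silently continue with half the work done
}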