I'm trying to find a solution to a MariaDB race condition.
There are some cases where multiple processes are executed at almost the same time (with the same timestamp, which comes from some units in the field), and each one needs to read the same row (but only the first one needs to perform the action).
I was thinking of locking the row (only the row; locking the entire table is not an option), so the first process will read it, check the timestamp of the latest update and perform the task needed. Once the row is unlocked, the other processes will be able to read it, and if it has the same timestamp they will just ignore it.
It will affect performance, but only for those few cases where this happens.
I've been trying to do this in MariaDB, but I can't find out how. Starting a transaction and executing FOR UPDATE seems to lock the entire table, because I'm not able to fetch other rows in the meantime.
And reading the MariaDB documentation I see this:
When LOCK IN SHARE MODE is specified in a SELECT statement, MariaDB will wait until all transactions that have modified the rows are committed. Then, a write lock is acquired. All transactions can read the rows, but if they want to modify them, they have to wait until your transaction is committed.
So basically, what I want to do can't be done in MariaDB?
Any ideas?
Thank you,
Related
We want to prevent some concurrency issues in a database (we use event sourcing and want to insert an event to the event log in case the event is a valid one).
The problem is that we need to check if a certain operation is allowed (requiring a SELECT query and some checks in PHP) and then run an INSERT query to actually perform the operation.
Now, we can simply LOCK the entire table, do our checks and if they succeed, then insert (and remove the lock).
The problem with this is that it locks the entire table, which is overkill (there will be lots of queries on this table). What I would like to do is to lock all queries that want to do this select-insert operation for a specific object_id, but allow queries for all other object_id's to continue as if there is no lock.
I searched a bit but couldn't find a lock attribute command. There seems to be a lock row command in InnoDB, but it's not really what we want (I think; I'm not 100% sure what it does).
We can of course try to handle the locks manually (check whether a row with this object_id exists in some separate lock table and wait until there is none), but that feels a bit fishy and error-prone.
So, here's the actual question: is it possible to lock a table for a specific value of a column (object_id)?
It would be awesome if the lock only held for the specific SELECT-INSERT queries, and not for standalone SELECT's, but that doesn't matter that much for now.
Consider manual arbitrary locks with GET_LOCK();
Choose a name specific to the rows you want locking. e.g. 'xxx_event_id_y'. Where 'xxx' is a string specific to the procedure and table and 'y' is the event id.
Call SELECT GET_LOCK('xxx_event_id_y',30) to lock the name 'xxx_event_id_y'. It will return 1 and set the lock if the name becomes available, or return 0 if the lock is not available after 30 seconds (the second parameter is the timeout).
Use DO RELEASE_LOCK('xxx_event_id_y') when you are finished.
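Be aware: you will have to use the same names in every session that you want to wait. The lock belongs to the connection rather than to the transaction, so COMMIT or ROLLBACK does not release it, and on older servers (before MySQL 5.7.5 and MariaDB 10.0.2) calling GET_LOCK() again on the same connection releases the previously set lock.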
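The GET_LOCK()/RELEASE_LOCK() steps above can be wrapped so the lock is always released, even if the work in between fails. This is only a minimal sketch in Python: it assumes `conn` is a DB-API connection from a MySQL/MariaDB driver that uses the `%s` parameter style (PyMySQL, for example), and both `named_lock` and the naming scheme in `lock_name_for` are hypothetical helpers, not part of any library.

```python
from contextlib import contextmanager


@contextmanager
def named_lock(conn, name, timeout=30):
    """Hold a MariaDB/MySQL named lock for the duration of a with-block.

    Assumption: `conn` is a DB-API connection whose driver uses %s placeholders.
    """
    cur = conn.cursor()
    cur.execute("SELECT GET_LOCK(%s, %s)", (name, timeout))
    (acquired,) = cur.fetchone()
    if acquired != 1:
        # 0 means the timeout expired; NULL means an error occurred.
        raise TimeoutError(f"could not acquire lock {name!r}")
    try:
        yield
    finally:
        # Always release, even if the protected block raised.
        cur.execute("SELECT RELEASE_LOCK(%s)", (name,))
        cur.fetchone()


def lock_name_for(object_id):
    # Hypothetical naming scheme: one lock name per object_id.
    return f"event_log_object_{object_id}"
```

Used as `with named_lock(conn, lock_name_for(42)):`, the SELECT check and the INSERT for object 42 go inside the block, while work on other object ids uses different names and is not held up.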
GET_LOCK() docs
I actually use this method to lock our application cache too (even when it doesn't use the DB), so it has scope outside the database as well.
Migrate the tables to InnoDB if not already done, and use transactions.
I have to update a big table (products) in a MySQL database, every 10 minutes with PHP. I have to run the PHP script with cron job, and I get the most up to date products from a CSV file. The table has currently ~18000 rows, and unfortunately I can not tell how much it will change in a 10 min period. The most important thing is of course I do not want the users to notice the update in the background.
These are my ideas and fears:
Idea1: I know that there is a way to load a CSV file into a table with MySQL, so maybe I can use a transaction to truncate the table and import the CSV. But even if I use transactions, as long as the table is large, I'm afraid that there will be a small chance for some users to see the empty database.
Idea2: I could compare the old and the new CSV file with a library and only update/add/remove the changed rows. This way I think it's not possible for a user to see an empty database, but I'm afraid this method will cost a lot of RAM and CPU, and I'm on shared hosting.
So basically I would like to know which method is the most secure to update a table completely without the users noticing it.
Assuming InnoDB and default isolation level, you can start a transaction, delete all rows, insert your new rows, then commit. Before the commit completes, users will see the previous state.
While the transaction is open (after the deletes), updates will block, but SELECTs will not. Since it's a read only table for the user, it won't be an issue. They'll still be able to SELECT while the transaction is open.
You can learn the details by reading about MVCC. The gist of it is that any time someone performs a SELECT, MySQL uses the data in the database plus the rollback segment to fetch the previous state until the transaction is committed or rolled back.
From MySQL docs:
InnoDB uses the information in the rollback segment to perform the
undo operations needed in a transaction rollback. It also uses the
information to build earlier versions of a row for a consistent read.
Only after the commit completes will users see the new data instead of the old data, and a user who is inside a transaction of their own (under the default REPEATABLE READ level) won't see the new data until that transaction is over.
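The delete-and-reinsert approach described above can be tried out in a small, self-contained way. The sketch below is in Python and uses SQLite from the standard library purely as a stand-in for InnoDB (SQLite gets there through its own locking rather than MVCC), so it only illustrates the visible effect: a second connection keeps seeing the old rows until the writer commits. The `products` columns, `replace_all_products` and `demo` are made up for the illustration.

```python
import os
import sqlite3
import tempfile


def replace_all_products(writer, rows, before_commit=None):
    """Swap the whole table inside one transaction."""
    cur = writer.cursor()
    cur.execute("DELETE FROM products")  # implicitly opens the transaction
    cur.executemany("INSERT INTO products (id, name) VALUES (?, ?)", rows)
    if before_commit is not None:
        before_commit()  # hook so the demo can read while uncommitted
    writer.commit()


def demo():
    # Two connections to the same database file: one writer, one reader.
    path = os.path.join(tempfile.mkdtemp(), "shop.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)
    writer.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT)")
    writer.execute("INSERT INTO products VALUES (1, 'old')")
    writer.commit()

    seen = {}

    def peek():
        # Runs after the DELETE/INSERT but before the COMMIT.
        seen["during"] = reader.execute("SELECT name FROM products").fetchall()

    replace_all_products(writer, [(1, "new"), (2, "newer")], before_commit=peek)
    seen["after"] = reader.execute(
        "SELECT name FROM products ORDER BY id"
    ).fetchall()
    writer.close()
    reader.close()
    return seen
```

On MySQL the same shape would be START TRANSACTION, DELETE, the INSERTs (or LOAD DATA), then COMMIT. Note that TRUNCATE cannot be used for this, because it causes an implicit commit.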
I've seen many posts explaining the usage of Select FOR UPDATE and how to lock a row, however I haven't been able to find any that explain what occurs when the code tries to read a row that's locked.
For instance. Say I use the following:
$con->autocommit(FALSE);
$ps = $con->prepare( "SELECT 1 FROM event WHERE row_id = 100 FOR UPDATE");
$ps->execute();
...
//do something if lock successful
...
$con->commit();
In this case, how do I determine if my lock was successful? What is the best way to handle a scenario when the row is locked already?
Sorry if this is described somewhere, but all I seem to find are the 'happy path' explanations out there.
In this case, how do I determine if my lock was successful? What is the best way to handle a scenario when the row is locked already?
If the row you are trying to lock is already locked, the MySQL server will not return any response for this row. It will wait² until the locking transaction is either committed or rolled back.
(Obviously: if the row has been deleted already, your SELECT will return an empty result set and not lock anything)
After that, it will return the latest value, committed by the transaction that was holding the lock.
Regular SELECT statements will not care about the lock and will return the last committed value, ignoring any uncommitted change.
So, in other words: your code will only be executed WHEN the lock is successful. (Otherwise it waits² until the prior lock is released.)
Note that using FOR UPDATE will also block other locking reads on that row (SELECT ... FOR UPDATE or SELECT ... LOCK IN SHARE MODE) for as long as the lock is held. If you do not want this, you should use LOCK IN SHARE MODE instead. This allows other shared-lock reads to proceed with the current value, while just blocking any UPDATE or DELETE statement.
² the query will return an error, after the time defined with innodb_lock_wait_timeout http://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_lock_wait_timeout
It then will return ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
In other words: That's the point where your attempt to acquire a lock fails.
Side note: This kind of locking is only suitable for ensuring data integrity (i.e. that no referenced row is deleted while you are inserting something that references this row).
Once the lock is released, any blocked (or better, delayed) DELETE statement will be executed, maybe deleting the row you just inserted, through cascading from the row on which you held the lock to ensure integrity.
If you want to create a system to avoid 2 users modifying the same data at the same time, you should do this at the application level and look at pessimistic vs. optimistic locking approaches, because it is not a good idea to keep transactions running for a long period of time. (In PHP your database connections are normally closed automatically after each request anyway, and the server rolls back any transaction that is still open when the connection closes.)
I've been looking at several different solutions to my table concurrency problems in MySQL/PHP/InnoDB:
I really want Process 2 to wait for Process 1 to commit before starting its own transaction and SELECTing data to work on and then trying to INSERT a UNIQUE INDEX. So, Process 1 locks the table and Process 2 checks for the lock, and waits if a WRITE lock already exists with something like sleep()... (I cannot use semaphores).
Is that it? 1b. That simple? 1c. Should I also still check for a duplicate entry when INSERTing?
Also, do I need to check for this single WRITE lock EVERYWHERE ELSE that there are UPDATES and/or INSERTS to the table, or will this be handled automatically by making the UPDATE/INSERT wait? This is only done once, so making all other queries wait is the single biggest downside if this is the case.
SOLUTION BONUS THOUGHT: it would seem simpler to check for a duplicate key error when inserting, and if there is one, re-SELECT, recalculate a new value and try again until the DUPLICATE KEY error no longer occurs.
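That retry-on-duplicate idea can be sketched in a few lines. The example below is in Python and uses SQLite from the standard library as a runnable stand-in, so the duplicate shows up as `sqlite3.IntegrityError`. With a MySQL driver the equivalent signal would be error 1062 (duplicate entry). The `tickets` table, `next_number` and `insert_with_retry` are hypothetical names for the illustration.

```python
import sqlite3


def next_number(conn):
    # Recalculate the candidate value from the current table contents.
    (highest,) = conn.execute(
        "SELECT COALESCE(MAX(number), 0) FROM tickets"
    ).fetchone()
    return highest + 1


def insert_with_retry(conn, compute_value=next_number, max_attempts=5):
    """Insert a computed value into a UNIQUE column, retrying on duplicates.

    Returns (value, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        value = compute_value(conn)
        try:
            conn.execute("INSERT INTO tickets (number) VALUES (?)", (value,))
            conn.commit()
            return value, attempt
        except sqlite3.IntegrityError:
            # Someone else inserted this value first: undo and recalculate.
            conn.rollback()
    raise RuntimeError("gave up after repeated duplicate-key errors")
```

The UNIQUE index does the real work here, since the database rejects the second insert no matter how the processes interleave. Capping the attempts keeps a persistent conflict from looping forever.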
I am currently looking into how I can manage a high number of bids on my auction site project. As it is quite possible that some people may send bids at exactly the same time it has become apparent that I need to ensure that there are locks to prevent any data corruption.
I have settled on using SELECT ... LOCK IN SHARE MODE, for which the documentation states: "If any of these rows were changed by another transaction that has not yet committed, your query waits until that transaction ends and then uses the latest values."
http://dev.mysql.com/doc/refman/5.1/en/innodb-locking-reads.html
This suggests to me that the bids will enter a queue where each bid is dealt with and checked to ensure that the bid is higher than the current bid and if there are changes since an insert is put in this queue then the latest bid amount is used.
However, I have read that there can be damaging deadlock issues where two users try to place bids at the same time and no query can maintain a lock. Therefore I have also considered using SELECT ... FOR UPDATE, but this will then also block any reads, which I am quite unsure about.
If anybody could shed any light on this issue, that would be appreciated. If you could suggest any other database, like NoSQL, which would be more suitable, that would be very helpful!
EDIT: This is essentially a concurrency problem where I don't want to be checking the current bid against incorrect/old data, which would produce a 'lost update' on certain bids.
By itself, two simultaneous updates will not cause a deadlock, just transient blocking. Let's call them Bid A and Bid B.
Although we're considering them simultaneous, one will acquire a lock first. We'll say that A gets there 1 ms faster.
A acquires a lock on the row in question. B has its lock request go in the queue and must wait for the lock belonging to A to be released. As soon as lock A is released, B acquires its lock.
There may be more to your code, but from your question, and as I've described it, there is no deadlock scenario. In order to deadlock, A must be waiting for B to release its lock on another resource, while B will not release that lock until it acquires a lock on A's resource.
If you need to validate the bid in real time you can either:
A. Use the appropriate transaction isolation level (repeatable read, probably, which is the default in InnoDB) and perform both your select and update in an explicit transaction.
START TRANSACTION
SELECT ... FOR UPDATE
IF ...
UPDATE ...
COMMIT
B. Perform your check logic in your UPDATE statement itself. In other words, construct your UPDATE query so that it will only affect rows when the current bid is less than the new bid. If no records were affected, the bid was too low. This is a possible approach and reduces work on the DB, but it has its own considerations.
UPDATE ...
WHERE currentBid < newBid
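Option B can be sketched as follows, again in Python with SQLite from the standard library as a runnable stand-in (the same UPDATE works on MySQL, with the affected-row count telling you whether the bid won). The `auctions` table, its columns and `place_bid` are hypothetical names.

```python
import sqlite3


def place_bid(conn, auction_id, new_bid):
    """Return True if the bid was accepted, False if it was too low."""
    cur = conn.execute(
        # The check and the write happen in one statement, so two
        # simultaneous bids cannot both pass the comparison on stale data.
        "UPDATE auctions SET current_bid = ? "
        "WHERE id = ? AND current_bid < ?",
        (new_bid, auction_id, new_bid),
    )
    conn.commit()
    return cur.rowcount == 1
```

Because the comparison lives in the WHERE clause, the database evaluates it against the row's current value at the moment of the update. A return value of False covers both a bid that is too low and an auction id that does not exist.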
Personally my vote would be to opt for A because I don't know how complex your logic is.
A repeatable read isolation level will ensure that every time you read a given record in a transaction, the value is guaranteed to be the same. In InnoDB, plain reads get this from a consistent snapshot. It is the SELECT ... FOR UPDATE that takes a lock on the row, which prevents others from updating that row until your transaction either commits or rolls back. A second connection cannot update the row until the first has completed its transaction.
The bottom line is your select/update will be atomic in your DB so you don't have to worry about lost updates.
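For option A, the select-check-update sequence might look like the sketch below. It is written in Python against a generic DB-API connection and assumes a MySQL/MariaDB driver with `%s` placeholders and autocommit switched off, so the first statement opens the transaction. The `auctions` table and `place_bid_locked` are hypothetical.

```python
def place_bid_locked(conn, auction_id, new_bid):
    """Validate and apply a bid while holding the row lock.

    Assumption: `conn` is a DB-API connection to MySQL/MariaDB with
    autocommit off and %s-style placeholders.
    """
    cur = conn.cursor()
    try:
        # Locking read: a concurrent bidder on the same row waits here.
        cur.execute(
            "SELECT current_bid FROM auctions WHERE id = %s FOR UPDATE",
            (auction_id,),
        )
        row = cur.fetchone()
        accepted = row is not None and new_bid > row[0]
        if accepted:
            cur.execute(
                "UPDATE auctions SET current_bid = %s WHERE id = %s",
                (new_bid, auction_id),
            )
        conn.commit()  # ends the transaction and releases the row lock
        return accepted
    except Exception:
        conn.rollback()  # never leave the lock held after a failure
        raise
```

Any more complex validation would go between the SELECT and the UPDATE. The longer that section runs, the longer other bidders on the same row have to wait.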
Regarding concurrency, the key there is to keep your transactions as short as possible. Get in, get out. A locking read (FOR UPDATE or LOCK IN SHARE MODE) has to wait for a record that is being updated, while a plain SELECT reads the last committed version. These updates and reads should be taking small fractions of a second.