Need and scope of transaction isolation - MySQL - PHP

http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
"All consistent reads within the same transaction read the snapshot
established by the first read."
What does this snapshot contain? Only the rows read by the first read, the complete table, or even the complete database?
I assumed it covered only the rows read by the first read, but this confuses me:
TRANSACTION 1 is started first, then TRANSACTION 2. The result of the last "SELECT * FROM B;" in T1 is EXACTLY the same as if I had not executed T2 in the meantime: NEITHER the UPDATE nor the INSERT appear, even though the read and the writes are on DIFFERENT tables.
TRANSACTION 1:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM A WHERE a_id = 1;
SELECT SLEEP(8);
SELECT * FROM B;
COMMIT;
TRANSACTION 2
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
UPDATE B SET b_name = 'UPDATE_6' WHERE b_id = 2;
INSERT INTO B (b_name) VALUES('NEW_5');
COMMIT;
OUTPUTS of TRANSACTION 1
# 1st query
a_id a_name
1 a_1
# 3rd query
b_id b_name
1 b_1
2 b_2
3 b_3
In my web application a PHP script imports data from files into a MySQL database (InnoDB). The application ensures that there is only this one writing process at a time. However, there may additionally be multiple concurrent readers.
Now I wonder whether I should prevent the following, and if yes, how:
in one repeatable-read transaction:
reader R1 reads from table T1
reader R1 does sth. else
reader R1 reads from table T2
If the data in T1 and T2 belong together in any way, the reader could read some data in the 1st step and the related data in the 3rd step, and by then the two might no longer be related, because a writer has changed T1 AND T2 in the meantime. AFAIK repeatable read only guarantees that the same read returns the same data, but the 2nd read is not the same as the 1st one.
I hope you know what I mean, and I fear that I have got something totally wrong about this topic.
(A week ago I asked this question in the MySQL forum without getting answers: http://forums.mysql.com/read.php?20,629710,629710#msg-629710)

The snapshot covers all tables in the database. The MySQL documentation states this explicitly multiple times at http://dev.mysql.com/doc/refman/5.6/en/innodb-consistent-read.html:
A consistent read means that InnoDB uses multi-versioning to present
to a query a snapshot of the database at a point in time.
and
Suppose that you are running in the default REPEATABLE READ isolation
level. When you issue a consistent read (that is, an ordinary SELECT
statement), InnoDB gives your transaction a timepoint according to
which your query sees the database.
and
The snapshot of the database state applies to SELECT statements within
a transaction, not necessarily to DML statements.
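For the import scenario above this means: if a reader performs all of its related reads inside one REPEATABLE READ transaction, both reads see the same database-wide snapshot, even across tables. A minimal sketch, assuming a PDO connection $pdo and the tables A and B from the question:
$pdo->exec("SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ");
$pdo->beginTransaction();
// Both SELECTs see the snapshot established by the first consistent read,
// regardless of what other sessions commit in between.
$rowsA = $pdo->query("SELECT * FROM A WHERE a_id = 1")->fetchAll(PDO::FETCH_ASSOC);
// ... do something else; other sessions may commit changes here ...
$rowsB = $pdo->query("SELECT * FROM B")->fetchAll(PDO::FETCH_ASSOC); // same snapshot as the read from A
$pdo->commit();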

MariaDB row lock for read

I have two scripts using PHP 7 / MariaDB 10.4.14. Both update the same value in the database.
Script 1 uses a transaction; script 2 does not. Script 1 is executed slightly earlier than script 2.
The pseudo-code for both are:
Script 1:
$objDb->startTransaction();
$objDb->query("select ID, name from table1 where name = 'nameB' limit 1 FOR UPDATE");
if ($objDb->totalRows() > 0) {
    $objDb->get();
    $objDb->query("update table1 set name = 'nameBB' where ID = " . $objDb->row['ID']);
}
sleep(3);
$objDb->commit();
Script 2:
$objDb->query("select ID,name from table1 where name='nameB' limit 1");
if($objDb->totalRows()>0)
{
$objDb->get();
$objDb->query("update table1 set name ='nameCC' where ID=".$objDb->row['ID']." ");
}
If I executed script 2 with a transaction as well, the final database value would be 'nameBB', since script 2 waits until script 1 is committed, as expected.
However, with the current script 2 (without a transaction) the final database value is 'nameCC'. I expected it to be 'nameBB' as well. Apparently no read lock is placed for the ID of table1.
How can I make sure that regular SELECT queries (without a transaction / in autocommit mode) are put under a read lock?
Help appreciated.
Script 1 starts a transaction and updates name to 'nameBB'. This happens inside the transaction, which means the change is not visible to other processes until it is committed.
Script 2 is free to read the "old" data, but it is blocked from updating the row until the transaction from Script 1 is either committed or rolled back.
When Script 1 commits, the lock is released and Script 2 performs its update, resulting in 'nameCC' as the name column value.
Note that the two scripts are independent of each other. Script 2's read could just as well have happened before the row was locked by Script 1; the result would have been the same, so locking the read is not the answer.
What you should do is avoid the separate SELECT/UPDATE and, when possible, do:
update table1 set name ='nameCC' where name='nameB' limit 1
If you have two processes updating the same data simultaneously, you need to decide which of the updates is the valid one.
If you want to use a separate SELECT/UPDATE, you can for example use an updated_at datetime column to make sure your update matches what you read.
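A minimal sketch of that optimistic pattern, assuming PDO and an added updated_at DATETIME column (the column is not in the scripts above):
// Optimistic SELECT/UPDATE: the UPDATE only succeeds if the row is still
// in the state we read. Assumes PDO and an added updated_at column.
$row = $pdo->query("SELECT ID, name, updated_at FROM table1 WHERE name = 'nameB' LIMIT 1")
           ->fetch(PDO::FETCH_ASSOC);
if ($row) {
    $stmt = $pdo->prepare("UPDATE table1 SET name = 'nameCC', updated_at = NOW()
                           WHERE ID = ? AND updated_at = ?");
    $stmt->execute([$row['ID'], $row['updated_at']]);
    if ($stmt->rowCount() === 0) {
        // The row changed between our read and our write: retry or give up.
    }
}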

Can a field change during the execution of a MySQL query and both values be present in the result set?

I've come across some odd defensive code. Basically it does a query like this:
select * from A join B on (A.b_id=B.id)
Then it iterates through the result set and whenever it meets a new row from table B, it caches it (by id). Afterwards only the cached copy is used, even for subsequent rows.
It looks like it was trying to safeguard against a result set like this:
A.id | A.value | B.id | B.value
-----+---------+------+--------
   1 | First   |    1 | Yay
   2 | Second  |    1 | Nay
But is this even possible? Even if the row in table B is updated while the select query is fetched half way, will it really be visible? Can the update even proceed while someone is querying the table?
For what it's worth, I think the table at the time was MyISAM, although it's been since converted to InnoDB. Also, the code which is running the query is written in PHP. As far as I can tell, it uses the default transaction isolation level and fetch mode.
OK, it seems I need to clarify. Here's code similar to what I've found:
$sql = "select A.id a_id, A.value a_value, B.id b_id, B.value b_value from A join B on (A.b_id=B.id)";
$res = mysql_query($sql);
$cacheB = array();
$A = new classA();
$B = new classB();
while ($row = mysql_fetch_assoc($res)) {
$A->setData($row);
if ( !isset($cacheB[$row['b_id']]) ) {
$cacheB[$row['b_id']] = $row;
}
$B->setData($cacheB[$row['b_id']]);
// Do some processing depending on $A and $B
}
This code is a CLI application running from a cron job. The data from $A and $B isn't returned to anything, but depending on the contents, some external services may be called and some other DB tables may be modified. The contents of classA, classB and the processing are not relevant to this question.
My question is - is there a point for this "safeguard", or is it a deadweight that can be deleted? Let's assume that the processing part would actually be sensitive to a change in the values of B (although in reality I doubt it, but still).
Can a field change during the execution of a MySQL query and both values be present in the result set?
No.
In MyISAM the entire table is locked by each query, so it's not possible at all, by design (see table locking).
In InnoDB queries are isolated and a select is a consistent read as mentioned in the doc Locks Set by Different SQL Statements in InnoDB. A consistent read is defined as "A read operation that uses snapshot information to present query results based on a point in time, regardless of changes performed by other transactions running at the same time."
Even if the row in table B is updated while the select query is fetched half way, will it really be visible?
No, even then the change won't be visible; it remains impossible for both values to appear.
Can the update even proceed while someone is querying the table?
In MyISAM no, it'll have to wait, as explained in the doc: "Table locking enables many sessions to read from a table at the same time, but if a session wants to write to a table, it must first get exclusive access, meaning it might have to wait for other sessions to finish with the table first. During the update, all other sessions that want to access this particular table must wait until the update is done."
In InnoDB yes, but the queries are isolated and work on different "snapshots" of the database as explained, so it doesn't matter. Transactions are particularly useful in this case if you have any doubt, by the way.
The code you are showing might or might not have another purpose; that I can't say. But if its only purpose is to prevent something that cannot happen, then it's completely redundant and can be safely removed.
Just in case:
Right now in your while loop you have a row:
{ a_id1, a_value1, b_id1, b_value1 }
And you set $B and save in the cache the whole row, not just the values from B.
So the next row in the loop will have a different a_id but the same b_id:
{ a_id2, a_value2, b_id1, b_value1 }
But in this case you will set $B using the cached version of $row, so you will have a_id1 instead of a_id2.
My guess is $B->setData() only cares about the fields related to B, so using the cached version doesn't make any difference; but if that isn't the case, you are cloning the A values from the first row onto all following rows with the same b_id1.
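If the cache is kept at all, a safer variant is to store only the B columns, e.g.:
// Keep only the B columns, so the A values of later rows are never
// overwritten by the first row's data:
if (!isset($cacheB[$row['b_id']])) {
    $cacheB[$row['b_id']] = array('b_id' => $row['b_id'], 'b_value' => $row['b_value']);
}
$B->setData($cacheB[$row['b_id']]);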

Implementing a simple queue with PHP and MySQL?

I have a PHP script that retrieves rows from a database and then performs work based on the contents. The work can be time consuming (but not necessarily computationally expensive) and so I need to allow multiple scripts to run in parallel.
The rows in the database looks something like this:
+---------------------+---------------+------+-----+---------------------+----------------+
| Field               | Type          | Null | Key | Default             | Extra          |
+---------------------+---------------+------+-----+---------------------+----------------+
| id                  | bigint(11)    | NO   | PRI | NULL                | auto_increment |
| ...                 |               |      |     |                     |                |
| date_update_started | datetime      | NO   |     | 0000-00-00 00:00:00 |                |
| date_last_updated   | datetime      | NO   |     | 0000-00-00 00:00:00 |                |
+---------------------+---------------+------+-----+---------------------+----------------+
My script currently selects rows with the oldest dates in date_last_updated (which is updated once the work is done) and does not make use of date_update_started.
If I were to run multiple instances of the script in parallel right now, they would select the same rows (at least some of the time) and duplicate work would be done.
What I'm thinking of doing is using a transaction to select the rows, update the date_update_started column, and then add a WHERE condition to the SQL statement selecting the rows to only select rows with date_update_started greater than some value (to ensure another script isn't working on it). E.g.
$sth = $dbh->prepare('
    START TRANSACTION;
    SELECT * FROM table WHERE date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000;
    UPDATE table SET date_update_started = UTC_TIMESTAMP()
        WHERE id IN (SELECT id FROM table WHERE date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000);
    COMMIT;
');
$sth->execute(); // in real code some values will be bound
$rows = $sth->fetchAll(PDO::FETCH_ASSOC);
From what I've read, this is essentially a queue implementation and seems to be frowned upon in MySQL. All the same, I need to find a way to allow multiple scripts to run in parallel, and after the research I've done this is what I've come up with.
Will this type of approach work? Is there a better way?
I think your approach could work, as long as you also add some kind of identifier to the selected rows to mark that they are currently being worked on. It could be as @JuniusRendel suggested, and I would even think about using another string key (random or instance id) for cases where the script hit an error and did not complete gracefully, as you will have to clean these fields once you update the rows back after your work.
The problem I see with this approach is the chance that two scripts run at the same moment and select the same rows before they are marked as locked. Here it really depends on what kind of work you do on the rows: if the end result in both scripts would be the same, the only problems are wasted time and server memory (which are not small issues, but I will put them aside for now...). If the work results in different updates from the two scripts, the problem is that you could end up with the wrong update in the table.
@Jean has mentioned the second approach you can take, which involves using MySQL locks. I am not an expert on the subject, but it seems like a good approach, and using the SELECT ... FOR UPDATE statement could give you what you are looking for: you can do the select and the update in the same call, which will be faster than two separate queries and reduces the risk of other instances selecting these rows, as they will be locked.
SELECT ... FOR UPDATE allows you to run a select statement and lock those specific rows for updating them, so your statement could look like:
START TRANSACTION;
SELECT * FROM tb where field='value' LIMIT 1000 FOR UPDATE;
UPDATE tb SET lock_field='1' WHERE field='value' LIMIT 1000;
COMMIT;
Locks are powerful, but be careful that they don't affect your application in other sections. Check whether the rows that are locked for the update are requested somewhere else in your application (maybe for the end user) and what will happen in that case.
Also, the tables must be InnoDB, and it is recommended that the fields in your WHERE clause have a MySQL index; otherwise you may lock the whole table or run into gap locks.
There is also a possibility that the locking process, especially when running parallel scripts, will be heavy on your CPU and memory.
Here is another read on the subject: http://www.percona.com/blog/2006/08/06/select-lock-in-share-mode-and-for-update/
Hope this helps, and I would like to hear how you progressed.
We have something like this implemented in production.
To avoid duplicates, we do a MySQL UPDATE like this (I modified the query to resemble your table):
UPDATE queue SET id = LAST_INSERT_ID(id), date_update_started = ...
WHERE date_update_started IS NULL AND ...
LIMIT 1;
We run this UPDATE in a single transaction, and we leverage the LAST_INSERT_ID function. When used like that, with a parameter, it stores that value in the transaction session; in this case it's the ID of the single (LIMIT 1) queue row that has been updated (if there is one).
Just after that, we do:
SELECT LAST_INSERT_ID();
When used without a parameter, it retrieves the previously stored value, giving us the ID of the queue item that has to be processed.
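Put together, the claim-then-read sequence might look like this in PHP (a sketch assuming PDO; the table name queue and the IS NULL convention follow the UPDATE above):
// Claim one queue row and remember its id via LAST_INSERT_ID(expr).
$pdo->exec("UPDATE queue SET id = LAST_INSERT_ID(id), date_update_started = UTC_TIMESTAMP()
            WHERE date_update_started IS NULL
            LIMIT 1");
$claimedId = (int)$pdo->query("SELECT LAST_INSERT_ID()")->fetchColumn();
if ($claimedId > 0) { // on a fresh connection, 0 means nothing was claimed
    $job = $pdo->query("SELECT * FROM queue WHERE id = " . $claimedId)->fetch(PDO::FETCH_ASSOC);
    // ... process $job ...
}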
Edit: Sorry, I totally misunderstood your question.
You should just put a "locked" column on your table, set its value to true on the entries your script is working with, and set it back to false when it's done.
In my case I have put 3 other timestamp (integer) columns: target_ts, start_ts, done_ts.
You run:
UPDATE table SET locked = TRUE WHERE target_ts<=UNIX_TIMESTAMP() AND ISNULL(done_ts) AND ISNULL(start_ts);
and then
SELECT * FROM table WHERE target_ts<=UNIX_TIMESTAMP() AND ISNULL(start_ts) AND locked=TRUE;
Do your jobs and update each entry one by one (to avoid data inconsistencies), setting the done_ts column to the current timestamp (you can also unlock them now). You can update target_ts to the next update you wish, or you can ignore this column and just use done_ts for your select.
Each time the script runs I would have the script generate a uniqid.
$scriptInstance = uniqid();
I would add a script_instance column to hold this value as a varchar and put an index on it. When the script runs, I would use SELECT ... FOR UPDATE inside a transaction to select your rows based on whatever logic, excluding rows that already have a script instance, and then update those rows with the script instance. Something like:
START TRANSACTION;
SELECT * FROM table WHERE script_instance = '' AND date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000 FOR UPDATE;
UPDATE table SET date_update_started = UTC_TIMESTAMP(), script_instance = '{$scriptInstance}' WHERE script_instance = '' AND date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000;
COMMIT;
Now those rows will be excluded from other instances of the script. Do your work, then update the rows to set the script instance back to null or blank, and also update your date_last_updated column.
You could also use the script instance to write to another table called "current_instances" or something like that, and have the script check that table to get a count of running scripts and control the number of concurrent instances. I would add the PID of the script to that table as well. You could then use that information in a housekeeping script, run periodically from cron, to check for long-running or rogue processes and kill them, etc.
I have a system working exactly like this in production. We run a script every minute to do some processing, and sometimes that run can take more than a minute.
We have a table column for status, which is 0 for NOT RUN YET, 1 for FINISHED, and any other value for under way.
The first thing the script does is update the table, setting one or more lines with a value meaning that we are working on them. We use getmypid() to mark the lines that we want to work on and that are still unprocessed.
When we finish the processing, the script updates the lines that have the same process ID, marking them as finished (status 1).
This way each script avoids trying to process a line that is already being processed, and it works like a charm. This doesn't mean that there isn't a better way, but it does get the work done.
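A sketch of that claim-by-PID pattern (PDO assumed; the table name jobs and column name status are illustrative, with 0 = not run yet, 1 = finished, anything else = the PID of the worker):
$pid = getmypid();
// Mark a batch of unprocessed lines with our PID ...
$claim = $pdo->prepare("UPDATE jobs SET status = ? WHERE status = 0 LIMIT 10");
$claim->execute([$pid]);
// ... work only on the lines we marked ...
$rows = $pdo->prepare("SELECT * FROM jobs WHERE status = ?");
$rows->execute([$pid]);
foreach ($rows->fetchAll(PDO::FETCH_ASSOC) as $job) {
    // ... process $job ...
}
// ... then flag them as finished.
$done = $pdo->prepare("UPDATE jobs SET status = 1 WHERE status = ?");
$done->execute([$pid]);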
I have used a stored procedure for very similar reasons in the past. We used FOR UPDATE to lock the relevant rows while a selected flag was updated, removing those entries from any future selects. It looked something like this:
DELIMITER $$
CREATE PROCEDURE `select_and_lock`()
BEGIN
    START TRANSACTION;
    SELECT your_fields FROM a_table WHERE some_stuff = something
        AND selected = 0 FOR UPDATE;
    -- Flag only the rows just selected, not the whole table
    UPDATE a_table SET selected = 1
        WHERE some_stuff = something AND selected = 0;
    COMMIT;
END$$
DELIMITER ;
No reason it has to be done in a stored procedure though now I think about it.
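Inline, the same select-and-flag logic could be a short transaction (a sketch with PDO, keeping the placeholders from the procedure above):
$pdo->beginTransaction();
// Lock the matching rows, flag them, and release the locks on commit.
$rows = $pdo->query("SELECT your_fields FROM a_table WHERE some_stuff = 'something' AND selected = 0 FOR UPDATE")
            ->fetchAll(PDO::FETCH_ASSOC);
$pdo->exec("UPDATE a_table SET selected = 1 WHERE some_stuff = 'something' AND selected = 0");
$pdo->commit();
// ... work on $rows after the locks are released ...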

MySQL using COUNT as total and avoiding exceeding a maximum

I was tasked to create this organization registration system. I decided to use MySQL and PHP to do it. Each organization in table orgs has a max_members column and has a unique id org_id. Each student in table students has an org column. Every time a student joins an organization, his org column is equated to the org_id of that organization.
When someone clicks join on an organization page, a PHP file executes.
In the PHP file, a query retrieves the total number of students whose org is equal to the org_id of the organization being joined.
$query = "SELECT COUNT(student_id) FROM students WHERE org = '$org_id'";
The maximum members is also retrieved from the orgs table.
$query = "SELECT max_members FROM orgs WHERE org_id = '$org_id'";
So I have the variables $total_members and $max_members. A basic if statement checks whether $total_members < $max_members and, if so, sets the student's org to the org_id. If not, it does nothing and notifies the student that the organization is full.
What my main concern is what if this situation happened:
Org A only has one slot left. 29/30 members.
Student A clicks join on Org A (and at the same time)
Student B clicks join on Org A
Student A retrieves data: There is one slot left
Student B retrieves data: There is one slot left
Student A's org = Org A's org_id
Student B's org = Org A's org_id
After the scripts have executed, Org A will show up with 31/30 members
Can this happen? If yes, how can I avoid it?
I've thought about using MySQL variables like this:
SET @org_id = 'what-ever-org';
SELECT @total_members := COUNT(student_id) FROM students WHERE org_main = @org_id;
SELECT @max_members := max_members FROM orgs WHERE org_id = @org_id;
UPDATE students SET org_main = IF(@total_members < @max_members, @org_id, '') WHERE student_id = 99999;
But I don't know if it would make a difference.
Row locking does not apply in my case. I think. I'd love to be proven wrong though.
The code I've written above is a simplified version of the original code. The original code included checking registration dates, org days, etc, however, these things are not related to the question.
What you're describing is usually called a race condition. It occurs because you perform two non-atomic operations on your database. To avoid it you need to use transactions, which let the database server prevent this kind of interference. Another approach would be a "before update" trigger.
Transaction
As you're using MySQL, you have to make sure that the engine your tables run on is InnoDB, because MyISAM simply doesn't support transactions. Before you run your SELECT you need to start a transaction: either send START TRANSACTION manually to the database or use a proper PHP implementation, e.g. PDO (PDO::beginTransaction()).
In a running transaction you can then use the suffix FOR UPDATE in your SELECT statement, which will lock the rows that have been selected by the query:
SELECT COUNT(student_id) FROM students WHERE org = :orgId FOR UPDATE
After your UPDATE statement you can commit the transaction, which will write the changes permanently to the database and remove the locks.
If you expect a lot of these simultaneous requests to happen, be aware that locking can cause some delay in the response, because the database might wait for a lock to be released.
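Putting that together in PHP could look roughly like this (a sketch with PDO; as a variant of the query above, it locks the organization's row in orgs with FOR UPDATE so that concurrent joins for the same org serialize on one stable point):
$pdo->beginTransaction();
// Lock the org row: a second concurrent join on the same org waits here.
$max = $pdo->prepare("SELECT max_members FROM orgs WHERE org_id = ? FOR UPDATE");
$max->execute([$org_id]);
$max_members = (int)$max->fetchColumn();

$cnt = $pdo->prepare("SELECT COUNT(student_id) FROM students WHERE org = ?");
$cnt->execute([$org_id]);
$total_members = (int)$cnt->fetchColumn();

if ($total_members < $max_members) {
    $upd = $pdo->prepare("UPDATE students SET org = ? WHERE student_id = ?");
    $upd->execute([$org_id, $student_id]);
    $pdo->commit();   // slot claimed
} else {
    $pdo->rollBack(); // organization is full
}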
Trigger
Instead of using transactions you can also create a trigger on the database that runs before an update is executed. In this trigger you could test whether the maximum number of students has been exceeded and throw an error if that's the case. However, this can be a challenging approach, especially if the value to be checked depends on something in the UPDATE statement itself, and it is debatable whether it's a good idea to implement this kind of logic at the database level.
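For completeness, such a trigger could look roughly like this; a sketch only, assuming the students/orgs schema from the question and MySQL 5.5+ for SIGNAL:
DELIMITER $$
-- Sketch: reject an org change when the target org is already full.
CREATE TRIGGER check_org_capacity BEFORE UPDATE ON students
FOR EACH ROW
BEGIN
    DECLARE cur_members INT;
    DECLARE max_allowed INT;
    IF NEW.org <> OLD.org AND NEW.org <> '' THEN
        SELECT COUNT(student_id) INTO cur_members FROM students WHERE org = NEW.org;
        SELECT max_members INTO max_allowed FROM orgs WHERE org_id = NEW.org;
        IF cur_members >= max_allowed THEN
            SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = 'Organization is full';
        END IF;
    END IF;
END$$
DELIMITER ;
Note that this enforces the check inside each UPDATE, but the COUNT is still a plain read, so on its own it does not remove the race between two concurrent transactions.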
There are two ways: use a synchronized function in PHP that performs this operation, or, if you want to implement all the logic in MySQL (I prefer this method), use a stored procedure.
Create a stored procedure along these lines:
DELIMITER $$
-- p_org_id: renamed from org_id so the parameter does not collide with the column name
CREATE PROCEDURE join_org(IN stu_id INT, IN p_org_id INT, OUT success INT)
BEGIN
    DECLARE total_members INT;
    DECLARE max_members_allowed INT;
    SELECT COUNT(student_id) INTO total_members FROM students WHERE org = p_org_id;
    SELECT max_members INTO max_members_allowed FROM orgs WHERE org_id = p_org_id;
    IF max_members_allowed > total_members THEN
        UPDATE students SET org = p_org_id WHERE student_id = stu_id;
        SET success = 1;
    ELSE
        SET success = 0;
    END IF;
END$$
DELIMITER ;
Then read the OUT variable named success in your PHP code to tell whether the join succeeded. Call this procedure when the user clicks join.

PHP/MySQL Concurrency - Write dependent on Read - Critical Section

I have a website running PHP+MySQL. It is a multiuser system and most of the MySQL tables are MyISAM-based.
The following situation got me puzzled for the last few hours:
I have two (concurrent) users A,B. Both of them will do this:
Perform a Read Operation on Table 1
Perform a Write Operation on another Table 2 (only if the previous Read Operation returned a certain result, e.g. STATUS="OK")
B is a little delayed towards A.
So it will occur like this:
User A performs a read on Table 1 and sees STATUS="OK".
(User A Schedules Write on Table 2)
User B performs a read on Table 1 and still sees STATUS="OK".
User A performs Write on Table 2 (resulting in STATUS="NOT OK" anymore)
User B performs Write on Table 2 (assuming STATUS="OK")
I think I could prevent this if reading Table 1 and writing to Table 2 were defined as a critical section and executed atomically. I know this works perfectly well in Java with threads etc.; however, in PHP there is no thread communication, as far as I know.
So the solution to my problem must be database-related, right?
Any ideas?
Thanks a lot!
The Right Way: Use InnoDB and transactions.
The Wrong-But-Works Way: Use the GET_LOCK() MySQL function to obtain an exclusive named lock before performing the database operations. When you're done, release the lock with RELEASE_LOCK(). Since only one client can own a particular named lock at a time, this ensures that there is never more than one instance of the script in the "critical section" at the same time.
Pseudo-code:
SELECT GET_LOCK('mylock', 10);
If the query returned "1":
    //Read from Table 1
    //Update Table 2
    SELECT RELEASE_LOCK('mylock');
Else:
    //Another instance has been holding the lock for > 10 seconds...
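In PHP the wrapper might look like this (a sketch with PDO; the lock name and the 10-second timeout come from the pseudo-code above):
// GET_LOCK returns 1 on success, 0 on timeout, NULL on error.
$got = $pdo->query("SELECT GET_LOCK('mylock', 10)")->fetchColumn();
if ($got == 1) {
    try {
        // Read from Table 1, then conditionally write to Table 2 ...
    } finally {
        // The lock is per-connection; always release it when done.
        $pdo->query("SELECT RELEASE_LOCK('mylock')");
    }
} else {
    // Another instance has been holding the lock for > 10 seconds...
}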
