I have two scripts using PHP 7 and MariaDB 10.4.14. Both update the same value in the database.
Script1 uses a transaction; script2 does not. Script1 is executed slightly earlier than script2.
The pseudo-code for both is:
Script 1:
$objDb->startTransaction();
$objDb->query("select ID,name from table1 where name='nameB' limit 1 FOR UPDATE ");
if($objDb->totalRows()>0)
{
$objDb->get();
$objDb->query("update table1 set name ='nameBB' where ID=".$objDb->row['ID']." ");
}
sleep(3);
$objDb->commit();
Script 2:
$objDb->query("select ID,name from table1 where name='nameB' limit 1");
if($objDb->totalRows()>0)
{
$objDb->get();
$objDb->query("update table1 set name ='nameCC' where ID=".$objDb->row['ID']." ");
}
If I execute script2 with a transaction, the final database value is 'nameBB', since script2 waits until script1 has committed, as expected.
However, in the current script2 example (without a transaction) the final database value is 'nameCC'. I expected it to be 'nameBB' as well. Apparently no read lock is placed on that ID of table1.
How can I make sure that regular SELECT queries (autocommit, without a transaction) are subject to the read lock?
Help appreciated.
Script 1 starts a transaction and updates name to 'nameBB'. This happens inside the transaction, which means that the change is not visible to other processes until it is committed.
Script 2 is free to read the "old" data, but it is blocked from updating the row until the transaction from Script 1 is either committed or rolled back.
When Script 1 commits, the lock is released and Script 2 performs its update, leaving 'nameCC' as the name column value.
Note that the two scripts are independent of each other. Script 2's read could just as well have happened before the row was locked by Script 1. The result would have been the same, so locking the read is not the answer.
What you should do is avoid using a separate SELECT/UPDATE and, when possible, do:
update table1 set name ='nameCC' where name='nameB' limit 1
If you have two processes updating the same data simultaneously, you need to decide which of the updates is the valid one.
If you want to use a separate SELECT/UPDATE, you can, for example, use an updated_at datetime column to make sure your update matches the read.
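The "make sure your update matches the read" idea could look roughly like this in PHP. This is only a minimal sketch, assuming a PDO connection; the helper name renameIfUnchanged is made up for illustration, and it compares the name column itself instead of an updated_at column, which works the same way.

```php
<?php
// Hypothetical helper (not from the question): update the row only if it
// still holds the value we read earlier.
// Returns true when our update went through, false when another process
// changed the row between our SELECT and our UPDATE.
function renameIfUnchanged(PDO $pdo, int $id, string $expected, string $new): bool
{
    $stmt = $pdo->prepare(
        'UPDATE table1 SET name = :new WHERE ID = :id AND name = :expected'
    );
    $stmt->execute([':new' => $new, ':id' => $id, ':expected' => $expected]);

    // rowCount() is the number of rows the UPDATE changed: 1 if the row
    // still matched what we read, 0 if someone else got there first.
    return $stmt->rowCount() === 1;
}
```

With this, Script 2 would call renameIfUnchanged($pdo, $id, 'nameB', 'nameCC') and get false once Script 1 has committed 'nameBB', so it can decide whether to re-read or give up.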
The PHP Documentation says:
If you've never encountered transactions before, they offer 4 major
features: Atomicity, Consistency, Isolation and Durability (ACID). In
layman's terms, any work carried out in a transaction, even if it is
carried out in stages, is guaranteed to be applied to the database
safely, and without interference from other connections, when it is
committed.
QUESTION:
Does this mean that I can have two separate php scripts running transactions simultaneously without them interfering with one another?
ELABORATING ON WHAT I MEAN BY "INTERFERING":
Imagine we have the following employees table:
 __________________________
| id   | name   | salary   |
|------+--------+----------|
| 1    | ana    | 10000    |
|------+--------+----------|
If I have two scripts with similar/same code and they run at the exact same time:
script1.php and script2.php (both have the same code):
$conn->beginTransaction();
$stmt = $conn->prepare("SELECT * FROM employees WHERE name = ?");
$stmt->execute(['ana']);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$salary = $row['salary'];
$salary = $salary + 1000;//increasing salary
$stmt = $conn->prepare("UPDATE employees SET salary = {$salary} WHERE name = ?");
$stmt->execute(['ana']);
$conn->commit();
and assuming the sequence of events is as follows:
script1.php selects data
script2.php selects data
script1.php updates data
script2.php updates data
script1.php commit() happens
script2.php commit() happens
What would the resulting salary of ana be in this case?
Would it be 11000? And would this then mean that 1 transaction will overlap the other because the information was obtained before either commit happened?
Would it be 12000? And would this then mean that regardless of the order in which data was updated and selected, the commit() function forced these to happen individually?
Please feel free to elaborate as much as you want on how transactions and separate scripts can interfere (or don't interfere) with one another.
You are not going to find the answer in the PHP documentation because this has nothing to do with PHP or PDO.
The InnoDB table engine in MySQL offers 4 so-called isolation levels in line with the SQL standard. The isolation levels, in conjunction with blocking / non-blocking reads, will determine the result of the above example. You need to understand the implications of the various isolation levels and choose the appropriate one for your needs.
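To sum up: if you use the serializable isolation level with autocommit turned off, the plain SELECTs take shared locks, so the two transactions cannot both go through as written: with the exact sequence above they deadlock, one is rolled back, and once it is retried the result will be 12000. In all other isolation levels, and in serializable with autocommit enabled, the result will be 11000. If you start using locking reads, then the result could be 12000 under all isolation levels.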
Judging by the given conditions (a solitary DML statement), you don't need a transaction here, but a table lock. It's a very common confusion.
You need a transaction if you need to make sure that ALL your DML statements were performed correctly or weren't performed at all.
This means:
you don't need a transaction for any number of SELECT queries
you don't need a transaction if only one DML statement is performed
Although, as noted in the excellent answer from Shadow, you may use a transaction here with an appropriate isolation level, it would be rather confusing. What you need here is table locking. The InnoDB engine lets you lock particular rows instead of locking the entire table, and thus row locks should be preferred.
In case you want the salary to be 12000, then use table locks.
Or - a simpler way - just run an atomic update query:
UPDATE employees SET salary = salary + 1000 WHERE name = ?
In this case all salaries will be recorded.
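From PHP, the atomic update above might be wrapped like this. It is only a sketch, assuming a PDO connection and the employees table from the question; the function name addToSalary is made up for illustration.

```php
<?php
// Hypothetical helper: add $amount to the salary in one UPDATE statement,
// so the read and the write cannot be separated by another script.
// Returns the number of rows that were updated.
function addToSalary(PDO $pdo, string $name, int $amount): int
{
    $stmt = $pdo->prepare('UPDATE employees SET salary = salary + :amount WHERE name = :name');
    $stmt->bindValue(':amount', $amount, PDO::PARAM_INT);
    $stmt->bindValue(':name', $name, PDO::PARAM_STR);
    $stmt->execute();

    return $stmt->rowCount();
}
```

Two scripts each calling addToSalary($pdo, 'ana', 1000) leave the salary at 12000 regardless of how their calls interleave, because each statement works on the current value of the row.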
If your goal is different, better express it explicitly.
But again: you have to understand that transactions in general have nothing to do with separate script executions. Regarding your topic of race conditions, you are interested not in transactions but in table/row locking. This is a very common confusion, and you had better get it straight:
a transaction is to ensure that a set of DML queries within one script were executed successfully.
table/row locking is to ensure that other script executions won't interfere.
The only topic where transactions and locking meet is a deadlock, but again - it's only in the case when a transaction is using locking.
Alas, the "without interference" needs some help from the programmer. It needs BEGIN and COMMIT to define the extent of the 'transaction'. And...
Your example is inadequate. The first statement needs SELECT ... FOR UPDATE. This tells the transaction processing that there is likely to be an UPDATE coming for the row(s) that the SELECT fetches. That warning is critical to "preventing interference". Now the timeline reads:
script1.php BEGINs
script2.php BEGINs
script1.php selects data (FOR UPDATE)
script2.php tries to select data (FOR UPDATE) but is blocked, so it waits
script1.php updates data
script1.php commit() happens
script2.php selects data (and will get the newly-committed value)
script2.php updates data
script2.php commit() happens
(Note: This is not a 'deadlock', just a 'wait'.)
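In PHP, the BEGIN / SELECT ... FOR UPDATE / UPDATE / COMMIT sequence might be sketched as below. This assumes a PDO connection in exception mode (the default since PHP 8) and the employees table from the question; the function name raiseSalaryLocked is hypothetical. The driver check exists only so the sketch can also be tried on an engine that has no FOR UPDATE clause.

```php
<?php
// Hypothetical helper: read-modify-write inside one transaction, using a
// locking read so a second script blocks at its own SELECT until we commit.
function raiseSalaryLocked(PDO $pdo, string $name, int $amount): int
{
    // FOR UPDATE is the MySQL/MariaDB (InnoDB) syntax; for any other driver
    // this sketch falls back to a plain SELECT.
    $lock = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'mysql' ? ' FOR UPDATE' : '';

    $pdo->beginTransaction();
    try {
        $select = $pdo->prepare('SELECT salary FROM employees WHERE name = ?' . $lock);
        $select->execute([$name]);
        $salary = $select->fetchColumn();
        $select->closeCursor();
        if ($salary === false) {
            throw new RuntimeException("No employee named {$name}");
        }

        $newSalary = (int) $salary + $amount;
        $update = $pdo->prepare('UPDATE employees SET salary = ? WHERE name = ?');
        $update->execute([$newSalary, $name]);

        $pdo->commit(); // the row lock is released here
        return $newSalary;
    } catch (Throwable $e) {
        $pdo->rollBack(); // undo our changes and release any locks
        throw $e;
    }
}
```

If script1.php and script2.php both call raiseSalaryLocked($pdo, 'ana', 1000) on InnoDB, the second call waits at its SELECT and then reads the committed 11000, as in the timeline above.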
http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
"All consistent reads within the same transaction read the snapshot
established by the first read."
What does this snapshot contain? Only a snapshot of the rows read by the first read, of the complete table or even complete database?
Actually I thought only a snapshot of the rows read by the first read, but this confuses me:
TRANSACTION 1 is started at first, then 2. The result of the last "SELECT * FROM B;" in T1 is EXACTLY the same as if I had not executed T2 meanwhile (NEITHER the UPDATE nor the INSERT appear and that, although the read and write are on DIFFERENT tables)
TRANSACTION 1:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM A WHERE a_id = 1;
SELECT SLEEP(8);
SELECT * FROM B;
COMMIT;
TRANSACTION 2
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
UPDATE B SET b_name = 'UPDATE_6' WHERE b_id = 2;
INSERT INTO B (b_name) VALUES('NEW_5');
COMMIT;
OUTPUTS of TRANSACTION 1
# 1st query
a_id a_name
1 a_1
# 3rd query
b_id b_name
1 b_1
2 b_2
3 b_3
In my web application a PHP script imports data from files into a MySQL database (InnoDB). The application makes sure that there is only 1 writing process at a time. However, there may additionally be multiple concurrent readers.
Now I wonder whether I should, and if so how I can, prevent the following:
in one repeatable-read transaction:
reader R1 reads from table T1
reader R1 does something else
reader R1 reads from table T2
If the data in T1 and T2 belong together in any way, it could happen that the reader reads some data in the 1st step and the related data in the 3rd step, which might not be related anymore because a writer has changed T1 AND T2 in the meantime. AFAIK repeatable-read only guarantees that the same reads return the same data, but the 2nd read is not the same as the 1st one.
I hope you know what I mean, and I fear that I got something totally wrong about this topic.
(a week ago I asked this question in MySQL forum without answers: http://forums.mysql.com/read.php?20,629710,629710#msg-629710)
The snapshot covers all the tables in the database. The MySQL documentation states that explicitly multiple times at http://dev.mysql.com/doc/refman/5.6/en/innodb-consistent-read.html:
A consistent read means that InnoDB uses multi-versioning to present
to a query a snapshot of the database at a point in time.
and
Suppose that you are running in the default REPEATABLE READ isolation
level. When you issue a consistent read (that is, an ordinary SELECT
statement), InnoDB gives your transaction a timepoint according to
which your query sees the database.
and
The snapshot of the database state applies to SELECT statements within
a transaction, not necessarily to DML statements.
I have a PHP script that retrieves rows from a database and then performs work based on the contents. The work can be time consuming (but not necessarily computationally expensive) and so I need to allow multiple scripts to run in parallel.
The rows in the database looks something like this:
+---------------------+------------+------+-----+---------------------+----------------+
| Field               | Type       | Null | Key | Default             | Extra          |
+---------------------+------------+------+-----+---------------------+----------------+
| id                  | bigint(11) | NO   | PRI | NULL                | auto_increment |
.....
| date_update_started | datetime   | NO   |     | 0000-00-00 00:00:00 |                |
| date_last_updated   | datetime   | NO   |     | 0000-00-00 00:00:00 |                |
+---------------------+------------+------+-----+---------------------+----------------+
My script currently selects rows with the oldest dates in date_last_updated (which is updated once the work is done) and does not make use of date_update_started.
If I were to run multiple instances of the script in parallel right now, they would select the same rows (at least some of the time) and duplicate work would be done.
What I'm thinking of doing is using a transaction to select the rows, update the date_update_started column, and then add a WHERE condition to the SQL statement selecting the rows to only select rows with date_update_started greater than some value (to ensure another script isn't working on it). E.g.
$sth = $dbh->prepare('
START TRANSACTION;
SELECT * FROM table WHERE date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000;
UPDATE table SET date_update_started = UTC_TIMESTAMP() WHERE id IN (SELECT id FROM table WHERE date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000);
COMMIT;
');
$sth->execute(); // in real code some values will be bound
$rows = $sth->fetchAll(PDO::FETCH_ASSOC);
From what I've read, this is essentially a queue implementation and seems to be frowned upon in MySQL. All the same, I need to find a way to allow multiple scripts to run in parallel, and after the research I've done this is what I've come up with.
Will this type of approach work? Is there a better way?
I think your approach could work, as long as you also add some kind of identifier to the rows you selected to mark them as currently being worked on. It could be as @JuniusRendel suggested, and I would even think about using another string key (random or an instance id) for cases where the script hit errors and did not complete gracefully, as you will have to clean these fields once you update the rows back after your work.
The problem with this approach, as I see it, is the possibility that 2 scripts run at the same moment and select the same rows before they have been marked as locked. Here it really depends on what kind of work you do on the rows. If the end result in both scripts will be the same, I think the only problem you have is wasted time and server memory (which are not small issues, but I will put them aside for now...). If your work results in different updates from the two scripts, your problem will be that you could end up with the wrong update in the table.
@Jean has mentioned the second approach you can take, which involves using MySQL locks. I am not an expert on the subject, but it seems like a good approach, and the 'SELECT ... FOR UPDATE' statement could give you what you are looking for, as you could do the select and the update in the same call. That will be faster than 2 separate queries and could reduce the risk of other instances selecting these rows, as they will be locked.
The 'SELECT .... FOR UPDATE' allows you to run a select statement and lock those specific rows for updating them, so your statement could look like:
START TRANSACTION;
SELECT * FROM tb where field='value' LIMIT 1000 FOR UPDATE;
UPDATE tb SET lock_field='1' WHERE field='value' LIMIT 1000;
COMMIT;
Locks are powerful, but be careful that they don't affect other sections of your application. Check whether the rows that are currently locked for the update are requested somewhere else in your application (maybe for the end user) and what will happen in that case.
Also, tables must be InnoDB, and it is recommended that the fields you use in the WHERE clause have a MySQL index; if not, you may lock the whole table or encounter the 'Gap Lock'.
There is also a possibility that the locking process, especially when running parallel scripts, will be heavy on your CPU & memory.
Here is another read on the subject: http://www.percona.com/blog/2006/08/06/select-lock-in-share-mode-and-for-update/
Hope this helps, and would like to hear how you progressed.
We have something like this implemented in production.
To avoid duplicates, we do a MySQL UPDATE like this (I modified the query to resemble your table):
UPDATE queue SET id = LAST_INSERT_ID(id), date_update_started = ...
WHERE date_update_started IS NULL AND ...
LIMIT 1;
We do this UPDATE in a single transaction, and we leverage the LAST_INSERT_ID function. When used like that, with a parameter, it stores the parameter in the session; in this case, that is the ID of the single (LIMIT 1) queue item that has been updated (if there is one).
Just after that, we do:
SELECT LAST_INSERT_ID();
When used without parameter, it retrieves the previously stored value, obtaining the queue item's ID that has to be performed.
Edit: Sorry, I totally misunderstood your question
You should just put a "locked" column on your table, set it to true on the entries your script is working with, and set it back to false when it's done.
In my case I have added 3 other timestamp (integer) columns: target_ts, start_ts, done_ts.
You run:
UPDATE table SET locked = TRUE WHERE target_ts<=UNIX_TIMESTAMP() AND ISNULL(done_ts) AND ISNULL(start_ts);
and then
SELECT * FROM table WHERE target_ts<=UNIX_TIMESTAMP() AND ISNULL(start_ts) AND locked=TRUE;
Do your jobs and update each entry one by one (to avoid data inconsistencies), setting the done_ts property to the current timestamp (you can also unlock them now). You can update target_ts to the next update you wish, or you can ignore this column and just use done_ts for your select.
Each time the script runs I would have the script generate a uniqid.
$scriptInstance = uniqid();
I would add a script instance column to hold this value as a varchar and put an index on it. When the script runs I would use select for update inside of a transaction to select your rows based on whatever logic, excluding rows with a script instance, and then update those rows with the script instance. Something like:
START TRANSACTION;
SELECT * FROM table WHERE script_instance = '' AND date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000 FOR UPDATE;
UPDATE table SET date_update_started = UTC_TIMESTAMP(), script_instance = '{$scriptInstance}' WHERE script_instance = '' AND date_update_started > 1 DAY ORDER BY date_last_updated LIMIT 1000;
COMMIT;
Now those rows will be excluded from other instances of the script. Do your work, and then update the rows to set the script instance back to null or blank, and also update your date last updated column.
You could also use the script instance to write to another table called "current instances" or something like that, and have the script check that table to get a count of running scripts to control the number of concurrent scripts. I would add the PID of the script to the table as well. You could then use that information to create a housekeeping script to run from cron periodically to check for long running or rogue processes and kill them, etc.
I have a system working exactly like this in production. We run a script every minute to do some processing, and sometimes that run can take more than a minute.
We have a table column for status, which is 0 for NOT RUN YET, 1 for FINISHED, and any other value for under way.
The first thing the script does is to update the table, setting a line or multiple lines with a value meaning that we are working on that line. We use getmypid() to update the lines that we want to work on, and that are still unprocessed.
When we finish the processing, the script updates the lines that have the same process ID, marking them as finished (status 1).
This way we keep the scripts from trying to process a line that is already being processed, and it works like a charm. This doesn't mean that there isn't a better way, but this does get the work done.
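A rough sketch of that claiming step in PHP, not our production code: the table name jobs, its columns and the function name claimJobs are all assumptions made for illustration. Each candidate row is claimed with a conditional UPDATE, so a row that another worker grabbed in the meantime is simply skipped.

```php
<?php
// Hypothetical helper: claim up to $limit unprocessed rows for this worker.
// Assumed schema: jobs(id, status) where status 0 = not run yet,
// 1 = finished, any other value = id of the worker processing the row.
// $workerId must therefore be neither 0 nor 1 (getmypid() normally is not).
function claimJobs(PDO $pdo, int $workerId, int $limit): array
{
    // $limit is an int, so concatenating it into the SQL is safe here.
    $candidates = $pdo->query(
        'SELECT id FROM jobs WHERE status = 0 ORDER BY id LIMIT ' . $limit
    )->fetchAll(PDO::FETCH_COLUMN);

    // The "AND status = 0" condition makes the claim atomic per row: if
    // another worker claimed the row after our SELECT, nothing is changed.
    $claim = $pdo->prepare('UPDATE jobs SET status = ? WHERE id = ? AND status = 0');

    $claimed = [];
    foreach ($candidates as $id) {
        $claim->execute([$workerId, $id]);
        if ($claim->rowCount() === 1) {
            $claimed[] = (int) $id;
        }
    }

    return $claimed; // may hold fewer than $limit ids
}
```

A worker would call claimJobs($pdo, getmypid(), 10), process the returned ids, and then set their status to 1.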
I have used a stored procedure for very similar reasons in the past. We used the FOR UPDATE read lock to lock the table while a selected flag was updated to remove that entry from any future selects. It looked something like this:
CREATE PROCEDURE `select_and_lock`()
BEGIN
START TRANSACTION;
SELECT your_fields FROM a_table WHERE some_stuff=something
AND selected = 0 FOR UPDATE;
UPDATE a_table SET selected = 1 WHERE some_stuff=something AND selected = 0;
COMMIT;
END$$
No reason it has to be done in a stored procedure though now I think about it.
I have a process that selects the next item to process from a MySQL InnoDB table based on some criteria. When a row has been selected as the next to process, its processing field is set to 1 while processing is happening outside the database. I do this so that many processors can be run at once, and they won't process the same row.
If I use transactions to execute the following queries, are they guaranteed to be executed together ( eg. Without any other MySQL connections executing queries. )? If they are not, then multiple processors could get the same id from the SELECT query and then processing will be redundant.
Pseudo Code Example
Prepare Transaction...
$id = SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1;
UPDATE companies
SET processing = 1
WHERE id = $id;
Execute Transaction
I've been struggling to accomplish this fast enough using a single UPDATE query ( see this question ). Assume that is not an option for the purposes of this question.
You still have a possibility of a race condition, even though you execute the SELECT followed by the UPDATE in a single transaction. SELECT by itself does not lock anything, so you could have two concurrent sessions both SELECT and get the same id. Then both would attempt to UPDATE, but only one would "win" - the other would have to wait.
To get around this, use the SELECT...FOR UPDATE clause, which creates a lock on the rows it returns.
Prepare Transaction...
$id = SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1
FOR UPDATE;
This means that the lock is created as the row is selected. This is atomic, which means no other session can sneak in and get a lock on the same row. If they try, their transaction will block on the SELECT.
UPDATE companies
SET processing = 1
WHERE id = $id;
Commit Transaction
I changed your "execute transaction" pseudocode to "commit transaction." Statements within a transaction execute immediately, which means they create locks and so on. Then when you COMMIT, the locks are released and any changes are committed. Committed means they can't be rolled back, and they are visible to other transactions.
Here's a quick example of using mysqli to accomplish this:
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); /* throw exception on error */
$mysqli = new mysqli(...);
$mysqli->begin_transaction();
$sql = "SELECT id
FROM companies
WHERE processing = 0
ORDER BY last_crawled ASC
LIMIT 1
FOR UPDATE";
$result = $mysqli->query($sql);
while ($row = $result->fetch_array(MYSQLI_ASSOC)) {
$id = $row["id"];
}
$sql = "UPDATE companies
SET processing = 1
WHERE id = ?";
$stmt = $mysqli->prepare($sql);
$stmt->bind_param("i", $id);
$stmt->execute();
$mysqli->commit();
Re your comment:
I tried an experiment and created a table companies, filled it with 512 rows, then started a transaction and issued the SELECT...FOR UPDATE statement above. I did this in the mysql client, no need to write PHP code.
Then, before committing my transaction, I examined the locks reported:
mysql> show engine innodb status\G
=====================================
2013-12-04 16:01:28 7f6a00117700 INNODB MONITOR OUTPUT
=====================================
...
---TRANSACTION 30012, ACTIVE 2 sec
2 lock struct(s), heap size 376, 513 row lock(s)
...
Despite using LIMIT 1, this report shows that the transaction appears to lock every row in the table (plus 1, for some reason).
So you're right, if you have hundreds of requests per second, it's likely that the transactions are queuing up. You should be able to verify this by watching SHOW PROCESSLIST and seeing many processes stuck in a state of Locked (i.e. waiting for access to rows that another thread has locked).
If you have hundreds of requests per second, you may have outgrown the ability for an RDBMS to function as a fake message queue. This isn't what an RDBMS is good at.
There are a variety of scalable message queue frameworks with good integration with PHP, like RabbitMQ, STOMP, AMQP, Gearman, Beanstalk.
Check out http://www.slideshare.net/mwillbanks/message-queues-a-primer-international-php-conference-fall-2012
That depends. There are (in general) different isolation levels in SQL. In MySQL you can change which one to use with SET TRANSACTION ISOLATION LEVEL.
While "SERIALIZABLE" (which is the strictest one) still doesn't imply that no other actions are executed in between the ones from your transaction, it DOES make sure that there is no difference whether simultaneous transactions are executed one after another or not - if it would make a difference, one transaction is rolled back and executed later.
Note, however, that the stricter the isolation is, the more locking and rollbacks have to be done. So make sure you really need that before using it.
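The "rolled back and executed later" part is something the application usually has to arrange itself. A minimal sketch in PHP, assuming PDO; the helper name inTransactionWithRetry is made up, and it relies on MySQL/MariaDB reporting a deadlock victim with SQLSTATE 40001:

```php
<?php
// Hypothetical helper: run $work inside a transaction and run the whole
// transaction again when it was rolled back with SQLSTATE 40001
// (serialization failure, which is what a deadlock victim receives).
function inTransactionWithRetry(PDO $pdo, callable $work, int $maxAttempts = 3)
{
    for ($attempt = 1; ; $attempt++) {
        $pdo->beginTransaction();
        try {
            $result = $work($pdo);
            $pdo->commit();
            return $result;
        } catch (Throwable $e) {
            if ($pdo->inTransaction()) {
                $pdo->rollBack();
            }
            // PDO puts the SQLSTATE code in errorInfo[0].
            $retryable = $e instanceof PDOException
                && ($e->errorInfo[0] ?? null) === '40001';
            if (!$retryable || $attempt >= $maxAttempts) {
                throw $e; // not a serialization failure, or out of attempts
            }
            // otherwise fall through and repeat the loop
        }
    }
}
```

The callback has to contain every statement of the transaction, including the reads, because a retry must start again from fresh data.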
I have a php script that executes mysql pdo queries. There are a few reads and writes to the same table in this script.
For sake of example let's say that there are 4 queries, a read, write, another read, another write, each read takes 10 second to execute, and each write takes .1 seconds to execute.
If I execute this script from the cli nohup php execute_queries.php & twice in 1/100th of a second, what would the execution order of the queries be?
Would all the queries from the first instance of the script need to finish before the queries from the 2nd instance begin to run, or would the first read from both instances start and finish before the table is locked by the write?
NOTE: assume that I'm using myisam and that the write is an update to a record (IE, entire table gets locked during the write.)
Since you are not using transactions, then no, they won't wait for all the queries in one script to finish, and so the queries may overlap.
There is an entire field of study called concurrent programming that teaches this.
In databases it's about transactions, isolation levels and data locks.
Typical (simple) race condition:
$visits = $pdo->query('SELECT visits FROM articles WHERE id = 44')->fetch()['visits'];
/*
* do some time-consuming thing here
*
*/
$visits++;
$pdo->exec('UPDATE articles SET visits = '.$visits.' WHERE id = 44');
The above race condition can easily turn sour if 2 PHP processes read the visits from the database one millisecond after the other, and assuming the initial value of visits was 6, both would increment it to 7 and both would write 7 back into the database even though the desired effect was that 2 visits increment the value by 2 (final value of visits should've been 8).
The solution to this is using atomic operations (because the operation is simple and can be reduced to one single atomic operation).
UPDATE articles SET visits = visits+1 WHERE id = 44;
Atomic operations are guaranteed by the database engines to take place uninterrupted by other processes/threads. Usually the database has to queue incoming updates so that they don't affect each other. Queuing obviously slows things down because each process has to wait for all processes before it until it gets the chance to be executed.
In a less simple operation we need more than one statement:
SELECT @visits := visits FROM articles WHERE ID = 44;
SET @visits = @visits+1;
UPDATE articles SET visits = @visits WHERE ID = 44;
But again, even at the database level 3 separate atomic statements are not guaranteed to yield an atomic result. They can overlap with other operations, just like the PHP example.
To solve this you have to do the following:
START TRANSACTION;
SELECT @visits := visits FROM articles WHERE ID = 44 FOR UPDATE;
SET @visits = @visits+1;
UPDATE articles SET visits = @visits WHERE ID = 44;
COMMIT;