PHP/MySQL Concurrency - Write dependent on Read - Critical Section - php

I have a website running PHP+MySQL. It is a multiuser system and most of the MySQL tables are MyISAM-based.
The following situation got me puzzled for the last few hours:
I have two (concurrent) users A,B. Both of them will do this:
Perform a Read Operation on Table 1
Perform a Write Operation on another Table 2 (only if the previous Read operation returned a specific result, e.g. STATUS="OK")
B lags slightly behind A.
So it will occur like this:
User A performs a read on Table 1 and sees STATUS="OK".
(User A Schedules Write on Table 2)
User B performs a read on Table 1 and still sees STATUS="OK".
User A performs Write on Table 2 (resulting in STATUS no longer being "OK")
User B performs Write on Table 2 (assuming STATUS="OK")
I think I could prevent this if reading Table 1 and writing to Table 2 were defined as a critical section and executed atomically. I know this works perfectly fine in Java with threads etc., but as far as I know PHP has no inter-thread communication.
So the solution to my problem must be database-related, right?
Any ideas?
Thanks a lot!

The Right Way: Use InnoDB and transactions.
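A minimal sketch of that approach, assuming InnoDB tables and placeholder names (table1 with an id and a status column, table2 as the write target, PDO for the connection). The SELECT ... FOR UPDATE locks the status row, so a concurrent client blocks at the same read until the first transaction commits and then sees the updated status instead of the stale "OK":
<?php
// Sketch only: table/column names and connection details are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=mydb;charset=utf8', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->beginTransaction();
try {
    // Locking read: serializes the check-then-write against concurrent clients.
    $stmt = $pdo->prepare('SELECT status FROM table1 WHERE id = ? FOR UPDATE');
    $stmt->execute([1]);
    if ($stmt->fetchColumn() === 'OK') {
        // The dependent write on Table 2 ...
        $pdo->prepare('INSERT INTO table2 (some_column) VALUES (?)')->execute(['value']);
        // ... and whatever flips the status so later readers see "NOT OK".
        $pdo->prepare("UPDATE table1 SET status = 'NOT OK' WHERE id = ?")->execute([1]);
    }
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}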
The Wrong-But-Works Way: Use the GET_LOCK() MySQL function to obtain an exclusive named lock before performing the database operations. When you're done, release the lock with RELEASE_LOCK(). Since only one client can own a particular lock, this ensures that there is never more than one instance of the script in the "critical section" at the same time.
Pseudo-code:
SELECT GET_LOCK('mylock', 10);
If the query returned "1":
//Read from Table 1
//Update Table 2
SELECT RELEASE_LOCK('mylock');
Else:
//Another instance has been holding the lock for > 10 seconds...
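The same idea as a PHP sketch, assuming a mysqli connection in $db and placeholder table/column names; GET_LOCK() and RELEASE_LOCK() are the real MySQL functions, everything else is illustrative:
<?php
// 'mylock' is an arbitrary lock name; 10 is the timeout in seconds.
$got = $db->query("SELECT GET_LOCK('mylock', 10)")->fetch_row()[0];
if ($got == 1) {
    // --- critical section: read Table 1, conditionally write Table 2 ---
    $status = $db->query("SELECT status FROM table1 WHERE id = 1")->fetch_assoc()['status'];
    if ($status === 'OK') {
        $db->query("INSERT INTO table2 (some_column) VALUES ('value')");
    }
    // --- end critical section ---
    $db->query("SELECT RELEASE_LOCK('mylock')");
} else {
    // Another instance has been holding the lock for more than 10 seconds.
    error_log('Could not obtain mylock, skipping this run.');
}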


MySQL Transaction loses record after SELECT statement

I encountered a quite strange bug.
The situation is as follows:
We have a client based order-software (online, PHP) that currently runs the websites of 3 clients.
We have one server divided into 2 separate webspaces:
client 1 and 2 share the same code, they both sit on webspace 1
client 3 sits on a separate webspace and uses an identical copy of the code (apparently this was necessary "because of reasons" a few years ago)
Every client connects to the same database (InnoDB).
All MySQL-Queries for an order are executed in a transaction (isolation level SERIALIZABLE, autocommit = 0).
The database tables don't have any triggers or functions.
SOMETIMES (once or twice a day maximum) client 2 or 3 suddenly "lose" a record (the one that holds the order) that has been created in the first few queries of the transaction!
Client 1 NEVER loses anything despite having MUCH more orders per day than client 2 and 3 (client 1 >500 orders, client 2 and 3 between 10 and 20 orders).
According to our provider there is no MySQL error visible in the logs, which matches what my debugging shows.
Now it gets really strange.
After weeks of gradually debugging down the rabbit hole I found the one single query that causes the order record to disappear - it's a simple SELECT!
Debugging:
// ... start transaction
// ... create order record
// ... many other mysql requests
// order_exists() just checks if the order exists using a simple select statement
$s_order_debug .= "1-A: ".var_export(mod_shop_model::order_exists($i_order_id), true).' ('.$this->get_error($i_link_index).')'."\n";
$dbres_query_return = mysqli_query($this->arr_link[$i_link_index], $s_query);
$s_order_debug .= "1-B: ".var_export(mod_shop_model::order_exists($i_order_id), true).' ('.$this->get_error($i_link_index).') '."\n";
$s_order_debug .= $s_query."\n";
// ... some other requests that fail because the order record is missing
// ... commit transaction
Output:
1-A: true ()
1-B: false ()
SELECT `product_id`, `senddate`, `specialprice_pickup_amount` FROM `mod_shop_cart` WHERE `client_id` = '2' AND `customer_id` = '10107';
The error occurs with client_id = 2 or client_id = 3, but never with client_id = 1. And like I said, it only happens occasionally.
Honestly, I am absolutely clueless why this happens.
Sure, there are many queries inside that transaction, but if it were simply too many, the error would appear more often, wouldn't it?
It just occurs once or twice a day, at different times of the day.
Do you have a clue on how to possibly fix this?

need and scope of transaction isolation - MySQL

http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html#isolevel_repeatable-read
"All consistent reads within the same transaction read the snapshot
established by the first read."
What does this snapshot contain? Only a snapshot of the rows read by the first read, of the complete table or even complete database?
Actually I thought only a snapshot of the rows read by the first read, but this confuses me:
TRANSACTION 1 is started first, then TRANSACTION 2. The result of the last "SELECT * FROM B;" in T1 is EXACTLY the same as if I had not executed T2 in the meantime: NEITHER the UPDATE nor the INSERT appears, even though the read and the writes are on DIFFERENT tables.
TRANSACTION 1:
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
SELECT * FROM A WHERE a_id = 1;
SELECT SLEEP(8);
SELECT * FROM B;
COMMIT;
TRANSACTION 2
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
START TRANSACTION;
UPDATE B SET b_name = 'UPDATE_6' WHERE b_id = 2;
INSERT INTO B (b_name) VALUES('NEW_5');
COMMIT;
OUTPUTS of TRANSACTION 1
# 1st query
a_id a_name
1 a_1
# 3rd query
b_id b_name
1 b_1
2 b_2
3 b_3
In my web application a PHP script imports data from files into a MySQL database (InnoDB). The application makes sure that there is only this 1 writing process at a time. However, there may additionally be multiple concurrent readers.
Now I wonder whether I should, and if so how I can, prevent the following:
in one repeatable-read transaction:
reader R1 reads from table T1
reader R1 does sth. else
reader R1 reads from table T2
If data in T1 and T2 belong together in any way, the reader could read some data in the 1st step and the related data in the 3rd step, and those might no longer be related, because a writer has changed T1 AND T2 in the meantime. AFAIK REPEATABLE READ only guarantees that the same read returns the same data, but the 2nd read is not the same as the 1st one.
I hope you know what I mean, and I fear that I have gotten something totally wrong about this topic.
(a week ago I asked this question in MySQL forum without answers: http://forums.mysql.com/read.php?20,629710,629710#msg-629710)
The snapshot covers all the tables in the database. The MySQL documentation states that explicitly multiple times at http://dev.mysql.com/doc/refman/5.6/en/innodb-consistent-read.html:
A consistent read means that InnoDB uses multi-versioning to present
to a query a snapshot of the database at a point in time.
and
Suppose that you are running in the default REPEATABLE READ isolation
level. When you issue a consistent read (that is, an ordinary SELECT
statement), InnoDB gives your transaction a timepoint according to
which your query sees the database.
and
The snapshot of the database state applies to SELECT statements within
a transaction, not necessarily to DML statements.
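Applied to the import scenario above, that means a single REPEATABLE READ transaction around both reads is enough to get mutually consistent data from T1 and T2. A rough PDO sketch (table names taken from the question, connection details are placeholders):
<?php
// Both SELECTs see the database as of the first read, even if the importer
// commits changes to T1 and T2 in between.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->exec('SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ');
$pdo->beginTransaction();
$rowsT1 = $pdo->query('SELECT * FROM T1')->fetchAll(PDO::FETCH_ASSOC);
// ... do something else; the importer may commit here ...
$rowsT2 = $pdo->query('SELECT * FROM T2')->fetchAll(PDO::FETCH_ASSOC);  // same snapshot
$pdo->commit();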

Multiple instances of the same PHP script processing different MySQL rows using row locking

What I want to do is to execute the same script every few minutes with cron.
The script needs to process some data read from the database, so obviously I need it to work on a different row each time.
My concept was to use row locking to make sure each instance works on a different row, but it doesn't seem to work that way.
Is it even possible to use row locks this way? Any other solutions?
Example:
while ($c < $limit) {
    $sql = mysql_query("SELECT * FROM `table` WHERE ... LIMIT 1 FOR UPDATE");
    $data = mysql_fetch_assoc($sql);
    // ... process $data ...
    mysql_query("UPDATE `table` SET value = 'something', timestamp = NOW() WHERE id = " . (int) $data['id']);
    $c++;
}
Basically what I need is: SCRIPT1 reads R1 from the table; SCRIPT2 reads R2 (the next non-locked row matching the criteria).
EDIT:
Let's say for example that:
1) the table stores a list of URLs
2) the script checks whether each URL responds, and updates its status (and timestamp) in the database
This should essentially be treated as two separate problems:
Finding a job for each worker to process. Ideally this should be very efficient and pre-emptively avoid failures in step 2, which comes next.
Ensuring that each job gets processed at most once or exactly once. No matter what happens the same job should not be concurrently processed by multiple workers. You may want to ensure that no jobs are lost due to buggy/crashing workers.
Both problems have multiple workable solutions. I'll give some suggestions about my preference:
Finding a job to process
For low-velocity systems it should be sufficient just to look for the oldest un-processed job. You do not want to take the job yet, just identify it as a candidate. This could be:
SELECT id FROM jobs ORDER BY created_at ASC LIMIT 1
(Note that this will process the oldest job first—FIFO order—and we assume that rows are deleted after processing.)
Claiming a job
In this simple example, this would be as simple as (note I am avoiding some potential optimizations that will make things less clear):
BEGIN;
SELECT * FROM jobs WHERE id = <id> FOR UPDATE;
DELETE FROM jobs WHERE id = <id>;
COMMIT;
If the SELECT returns our job when queried by id, we've now locked it. If another worker has already taken this job, an empty set will be returned, and we should look for a different job. If two workers are competing for the same job, they will block each other from the SELECT ... FOR UPDATE onwards, such that the previous statements are universally true. This will allow you to ensure that each job is processed at most once. However...
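A sketch of that claim step from PHP with PDO, assuming the jobs table above and a hypothetical process_job() function; the worker claims (deletes) the job first and processes it afterwards, which is the "at most once" behaviour just described:
<?php
// Worker sketch: find a candidate, try to claim it, retry if someone beat us to it.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
while (true) {
    $id = $pdo->query('SELECT id FROM jobs ORDER BY created_at ASC LIMIT 1')->fetchColumn();
    if ($id === false) {
        break;                                   // no jobs left
    }
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('SELECT * FROM jobs WHERE id = ? FOR UPDATE');
    $stmt->execute([$id]);
    $job = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($job === false) {                        // another worker already took it
        $pdo->rollBack();
        continue;
    }
    $pdo->prepare('DELETE FROM jobs WHERE id = ?')->execute([$id]);
    $pdo->commit();
    process_job($job);                           // hypothetical processing function
}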
Processing a job exactly once
A risk in the previous design is that a worker takes a job, fails to process it, and crashes. The job is now lost. Most job processing systems therefore do not delete the job when they claim it, instead marking it as claimed by some worker and implementing a job-reclaim system.
This can be achieved by keeping track of the claim itself using either additional columns in the job table, or a separate claim table. Normally some information is written about the worker, e.g. hostname, PID, etc., (claim_description) and some expiration date (claim_expires_at) is provided for the claim e.g. 1 hour in the future. An additional process then goes through those claims and transactionally releases claims which are past their expiration (claim_expires_at < NOW()). Claiming a job then also requires that the job row is checked for claims (claim_expires_at IS NULL) both at selection time and when claiming with SELECT ... FOR UPDATE.
Note that this solution still has problems: If a job is processed successfully, but the worker crashes before successfully marking the job as completed, we may eventually release the claim and re-process the job. Fixing that requires a more advanced system which is left as an exercise for the reader. ;)
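A rough sketch of that claim-column variant with PDO; claim_description and claim_expires_at are the assumed column names from above, the 1-hour expiry is arbitrary, and $id is a candidate found with a query that already filters on claim_expires_at IS NULL:
<?php
// Try to claim a candidate job; returns the row on success, null if it was claimed meanwhile.
function claim_job(PDO $pdo, int $id, string $worker): ?array {
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('SELECT * FROM jobs WHERE id = ? AND claim_expires_at IS NULL FOR UPDATE');
    $stmt->execute([$id]);
    $job = $stmt->fetch(PDO::FETCH_ASSOC);
    if ($job === false) {                        // someone else claimed it first
        $pdo->rollBack();
        return null;
    }
    $pdo->prepare('UPDATE jobs SET claim_description = ?, claim_expires_at = NOW() + INTERVAL 1 HOUR WHERE id = ?')
        ->execute([$worker, $id]);
    $pdo->commit();
    return $job;
}
// Reclaim process, run periodically: release claims that are past their expiration.
function release_expired_claims(PDO $pdo): void {
    $pdo->exec('UPDATE jobs SET claim_description = NULL, claim_expires_at = NULL
                 WHERE claim_expires_at IS NOT NULL AND claim_expires_at < NOW()');
}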
If you are going to read the row once, and only once, then I would create an is_processed column and simply update that column on the rows that you've processed. Then you can simply query for the first row that has is_processed = 0
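A hedged sketch of that simpler variant; the claimed_by column is an assumed addition so each script run can find the row it just marked, and the single UPDATE ... LIMIT 1 is atomic, so two runs cannot mark the same row:
<?php
// Mark one unprocessed row as ours, then fetch it.
$token = bin2hex(random_bytes(8));               // unique id for this run
$claim = $pdo->prepare('UPDATE `table` SET is_processed = 1, claimed_by = ? WHERE is_processed = 0 ORDER BY id LIMIT 1');
$claim->execute([$token]);
if ($claim->rowCount() === 1) {
    $sel = $pdo->prepare('SELECT * FROM `table` WHERE claimed_by = ?');
    $sel->execute([$token]);
    $data = $sel->fetch(PDO::FETCH_ASSOC);
    // ... check the URL and update its status/timestamp ...
}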

Execution order of mysql queries from php script when same script is launched quickly twice

I have a php script that executes mysql pdo queries. There are a few reads and writes to the same table in this script.
For the sake of example, let's say that there are 4 queries, a read, a write, another read, another write, each read takes 10 seconds to execute, and each write takes .1 seconds to execute.
If I execute this script from the CLI with nohup php execute_queries.php & twice within 1/100th of a second, what would the execution order of the queries be?
Would all the queries from the first instance of the script need to finish before the queries from the 2nd instance begin to run, or would the first read from both instances start and finish before the table is locked by the write?
NOTE: assume that I'm using MyISAM and that the write is an UPDATE to a record (i.e., the entire table gets locked during the write).
Since you are not using transactions, then no, they won't wait for all the queries in one script to finish, and so the queries from the two instances may get interleaved.
There is an entire field of study called concurrent programming that teaches this.
In databases it's about transactions, isolation levels and data locks.
Typical (simple) race condition:
$visits = $pdo->query('SELECT visits FROM articles WHERE id = 44')->fetch()['visits'];
/*
* do some time-consuming thing here
*
*/
$visits++;
$pdo->exec('UPDATE articles SET visits = '.$visits.' WHERE id = 44');
The above race condition can easily turn sour if 2 PHP processes read visits from the database one millisecond apart. Assuming the initial value of visits was 6, both would increment it to 7 and both would write 7 back into the database, even though the desired effect was that 2 visits increment the value by 2 (the final value of visits should have been 8).
The solution to this is using atomic operations (because the operation is simple and can be reduced to one single atomic operation).
UPDATE articles SET visits = visits+1 WHERE id = 44;
Atomic operations are guaranteed by the database engines to take place uninterrupted by other processes/threads. Usually the database has to queue incoming updates so that they don't affect each other. Queuing obviously slows things down because each process has to wait for all processes before it until it gets the chance to be executed.
In a less simple operation we need more than one statement:
SELECT @visits := visits FROM articles WHERE ID = 44;
SET @visits = @visits + 1;
UPDATE articles SET visits = @visits WHERE ID = 44;
But again, even at the database level, 3 separate atomic statements are not guaranteed to yield an atomic result. They can overlap with other operations, just like the PHP example.
To solve this you have to do the following:
START TRANSACTION;
SELECT @visits := visits FROM articles WHERE ID = 44 FOR UPDATE;
SET @visits = @visits + 1;
UPDATE articles SET visits = @visits WHERE ID = 44;
COMMIT;
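The same pattern from PHP, sketched with PDO (the single-statement atomic UPDATE above remains the simpler choice whenever it is enough):
<?php
// Sketch: SELECT ... FOR UPDATE serializes concurrent increments on the same row.
$pdo->beginTransaction();
try {
    $visits = $pdo->query('SELECT visits FROM articles WHERE id = 44 FOR UPDATE')->fetchColumn();
    /* do some time-consuming thing here */
    $visits++;
    $pdo->prepare('UPDATE articles SET visits = ? WHERE id = 44')->execute([$visits]);
    $pdo->commit();
} catch (Exception $e) {
    $pdo->rollBack();
    throw $e;
}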

Cached mysql inserts - Preserving Data integrity

I would like to do a lot of inserts, but would it be possible to update MySQL only after a while?
For example if there is a query such as
Update views_table SET views = views + 1 WHERE id = 12;
Would it be possible to store this query until the views have gone up by 100, and then run the following once instead of running the query from above 100 times?
Update views_table SET views = views + 100 WHERE id = 12;
Now, let's say that is done; then comes the problem of data integrity. Say there are 100 PHP processes which are all about to run the same query. Unless there is a locking mechanism on incrementing the cached views, multiple processes may hold the same cached value: say process 1 has 25 cached views, process 2 has 25 views, and process 3 has 27 views. Now let's say process 3 finishes and increments the counter to 28. Then process 2 finishes just after process 3, which means the counter would be brought back down to 26.
So do you guys have any solutions that are fast but keep the data safe as well?
Thanks
As long as your queries use relative values (views = views + 5), there should be no problems.
Only if you store the value somewhere in your script and then calculate the new value yourself might you run into trouble. But why would you want to do this? Actually, why do you want to do all of this in the first place? :)
If you don't want to overload the database, you could use UPDATE LOW_PRIORITY table set ..., the LOW_PRIORITY keyword will put the update action in a queue and wait for the table to no longer be used by reads or inserts.
First of all, with these queries, regardless of when a process starts, UPDATE .. SET col = col + 1 is a safe operation, so it will never 'decrease' the counter.
Regarding 'store this query until the views have gone up to 100 and then run the following instead of running the query from above 100 times': not really. You can store a counter in faster memory (memcached comes to mind), with a process that transfers it to the database once in a while, or store it in another table with an AFTER UPDATE trigger, but I don't really see a point in doing that.
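For what it's worth, a rough sketch of the memcached idea, assuming the Memcached PECL extension, one key per article, and a connected PDO instance in $pdo; reading the counter and then decrementing by the amount just flushed avoids losing increments that arrive in between:
<?php
// Batch view counts in memcached, flush to MySQL with a relative UPDATE.
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
// On every page view (no database write):
$mc->add('views:12', 0);                         // create the key if it does not exist yet
$mc->increment('views:12');
// In a periodic cron job:
$pending = (int) $mc->get('views:12');
if ($pending > 0) {
    $pdo->prepare('UPDATE views_table SET views = views + ? WHERE id = 12')->execute([$pending]);
    // Subtract what we flushed instead of resetting, so increments that
    // happened between get() and here are not lost.
    $mc->decrement('views:12', $pending);
}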
