I've been looking at several different solutions to my table concurrency problems in MySQL/PHP/InnoDB:
I really want Process 2 to wait for Process 1 to commit before starting its own transaction, SELECTing data to work on, and then trying to INSERT a value covered by a UNIQUE index. So Process 1 locks the table, and Process 2 checks for the lock and, if a WRITE lock already exists, waits with something like sleep()... (I cannot use semaphores).
Is that it? Is it really that simple? And should I also still check for a duplicate entry when INSERTing?
Also, do I need to check for this single WRITE lock EVERYWHERE ELSE that there are UPDATEs and/or INSERTs to the table, or will this be handled automatically by making the UPDATE/INSERT wait? This operation is only done once, so making all other queries wait would be the single biggest downside if that is the case.
SOLUTION BONUS THOUGHT: it would seem simpler to check for a duplicate-key error when INSERTing and, if one occurs, re-SELECT, recalculate a new value, and try again until the DUPLICATE KEY error no longer occurs.
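A rough sketch of that retry loop, assuming PDO configured to throw exceptions; the jobs table and slot column are invented for illustration:

    <?php
    // Placeholder connection; adjust DSN/credentials as needed.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass', [
        PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION,
    ]);

    $maxAttempts = 5;
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // Re-SELECT the current state and derive the next candidate value.
        $max  = (int) $pdo->query('SELECT MAX(slot) FROM jobs')->fetchColumn();
        $next = $max + 1;

        try {
            $stmt = $pdo->prepare('INSERT INTO jobs (slot, payload) VALUES (?, ?)');
            $stmt->execute([$next, 'work']);
            break; // no duplicate-key error: we won the race
        } catch (PDOException $e) {
            // SQLSTATE 23000 = integrity constraint violation (duplicate key).
            if ($e->getCode() !== '23000') {
                throw $e; // some other failure; do not retry
            }
            // Another process took this value first; loop and recalculate.
        }
    }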
Related
I'm trying to find a solution to a MariaDB race condition.
There are some cases where multiple processes get executed almost at the same time (with the same timestamp, which comes from some units in the field), and each one needs to read the same row (but only the first one needs to perform the action).
I was thinking of locking the row (only the row; locking the entire table is not an option), so the first process will read it, check the timestamp of the latest update, and perform the task needed; once the row is unlocked, the other processes will be able to read it, and if it has the same timestamp they will just ignore it.
It will affect performance, but only for those few cases where this happens.
I've been trying to do this in MariaDB, but I can't find out how... starting a transaction and executing SELECT ... FOR UPDATE seems to lock the entire table, because I'm not able to fetch other rows in the meantime.
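A minimal sketch of the row-lock attempt (table and column names are invented); one thing worth checking is whether the WHERE column is indexed, since an unindexed locking read makes InnoDB lock every row it scans, which looks exactly like a table lock:

    -- Session 1: lock only the target row (row-only locking requires
    -- an index on unit_id; without one, InnoDB locks all scanned rows).
    START TRANSACTION;
    SELECT last_update FROM readings WHERE unit_id = 42 FOR UPDATE;
    -- ... compare the timestamp, perform the action only if it is new ...
    UPDATE readings SET last_update = '2024-01-01 00:00:00' WHERE unit_id = 42;
    COMMIT;
    -- Session 2 blocks on the same SELECT ... FOR UPDATE until COMMIT,
    -- then sees the updated timestamp and can skip the action.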
And reading the MariaDB documentation I see this:
When LOCK IN SHARE MODE is specified in a SELECT statement, MariaDB will wait until all transactions that have modified the rows are committed. Then, a write lock is acquired. All transactions can read the rows, but if they want to modify them, they have to wait until your transaction is committed.
So basically, what I want to do can't be done in MariaDB?
Any ideas?
Thank you,
I have to update a big table (products) in a MySQL database every 10 minutes with PHP. I run the PHP script as a cron job, and I get the most up-to-date products from a CSV file. The table currently has ~18,000 rows, and unfortunately I cannot tell how much it will change in a 10-minute period. The most important thing is, of course, that I do not want the users to notice the update happening in the background.
These are my ideas and fears:
Idea 1: I know that there is a way to load a CSV file into a table with MySQL, so maybe I can use a transaction to truncate the table and import the CSV. But even if I use a transaction, since the table is large, I'm afraid there is a small chance that some users will see an empty table.
Idea 2: I could compare the old and the new CSV files with a library and only update/add/remove the changed rows. This way I think it's not possible for a user to see an empty table, but I'm afraid this method will cost a lot of RAM and CPU, and I'm on shared hosting.
So basically I would like to know which method is the safest way to update a table completely without the users noticing it.
Assuming InnoDB and default isolation level, you can start a transaction, delete all rows, insert your new rows, then commit. Before the commit completes, users will see the previous state.
While the transaction is open (after the deletes), updates will block, but SELECTs will not. Since it's a read-only table from the users' point of view, that won't be an issue: they'll still be able to SELECT while the transaction is open.
You can learn the details by reading about MVCC. The gist of it is that any time someone performs a SELECT, MySQL uses the data in the database plus the rollback segment to fetch the previous state until the transaction is committed or rolled back.
From MySQL docs:
InnoDB uses the information in the rollback segment to perform the undo operations needed in a transaction rollback. It also uses the information to build earlier versions of a row for a consistent read.
Only after the commit completes will the users see the new data instead of the old data, and they won't see the new data until their current transaction is over.
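A minimal sketch of that swap in PHP/PDO, assuming the new rows have already been parsed out of the CSV; the column names are invented:

    <?php
    // $pdo is an existing PDO connection; $rows holds the parsed CSV data.
    $pdo->beginTransaction();

    // Use DELETE, not TRUNCATE: TRUNCATE TABLE causes an implicit commit
    // and would defeat the transaction.
    $pdo->exec('DELETE FROM products');

    $insert = $pdo->prepare('INSERT INTO products (sku, name, price) VALUES (?, ?, ?)');
    foreach ($rows as $row) {
        $insert->execute($row);
    }

    // Readers keep seeing the old data until this commit completes.
    $pdo->commit();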
My understanding is that Apache creates a separate PHP process for each incoming request. That means that if I have code that does something like:
1. check if the record exists
2. if the record doesn't exist, create it
Then this is susceptible to a race condition, is it not? If two requests come in at the same time, and they both hit (1) simultaneously, they will both come back false, and then both attempt to insert a new record.
If so, how do people deal with this? Would wrapping those two queries in a MySQL transaction solve the issue, or do we need to do a full table lock?
As far as I know you cannot create a transaction across different connections. One solution would be to make the column you are checking UNIQUE. That way, if two connections both try to create record 10 and 10 does not exist, one will finish inserting the row first, and all is well; the connection just a second behind will then fail with a duplicate-key error because of the unique constraint. If you catch the exception that is thrown, you can subsequently SELECT the existing record from the database.
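A sketch of that catch-then-select flow, assuming PDO is set to throw exceptions; the users table and its columns are invented:

    <?php
    try {
        $stmt = $pdo->prepare('INSERT INTO users (email, name) VALUES (?, ?)');
        $stmt->execute([$email, $name]);
    } catch (PDOException $e) {
        // SQLSTATE 23000 = integrity constraint violation (duplicate key).
        if ($e->getCode() !== '23000') {
            throw $e; // unrelated failure
        }
        // Lost the race: the row already exists, so read it instead.
        $stmt = $pdo->prepare('SELECT * FROM users WHERE email = ?');
        $stmt->execute([$email]);
        $user = $stmt->fetch(PDO::FETCH_ASSOC);
    }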
Honestly, I've very rarely run into this situation. Often it can be alleviated by re-evaluating business requirements. Even if two different users were trying to insert the exact same data, I would defer management of duplicates to the users rather than the application.
However, if there were a reason to enforce uniqueness in the application logic, I would use an INSERT IGNORE ... or INSERT ... ON DUPLICATE KEY UPDATE ... query (with the corresponding UNIQUE index on the table, of course).
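For example, with an invented counters table whose id column is the unique key:

    -- Variant 1: silently skip the insert if the key already exists
    -- (note that INSERT IGNORE also suppresses some other errors).
    INSERT IGNORE INTO counters (id, hits) VALUES (10, 1);

    -- Variant 2: turn the duplicate insert into an update instead.
    INSERT INTO counters (id, hits) VALUES (10, 1)
        ON DUPLICATE KEY UPDATE hits = hits + 1;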
I think that handling errors on the second step ought to be sufficient. If two processes try to create a record, one of them will fail, as long as you've configured the MySQL table appropriately. Adding a UNIQUE index across the right fields is one way to do the trick.
Apache does not "create a separate PHP process for each incoming request".
It either uses a pool of processes (the default, prefork mode) or threads.
The race conditions you mention may also be referred to as (or cause) DB "deadlocks".
(See: what is a deadlock in a database?)
Using transactions where needed should solve this problem, yes.
By making sure you check whether the record exists and create it within one transaction, with the check done as a locking read (SELECT ... FOR UPDATE), the whole operation behaves atomically.
Hence, other requests will not try to create duplicate records (or, depending on the actual queries, create inconsistencies or run into actual deadlocks).
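A sketch of that check-and-create step as one transaction, with the check done as a locking read; the table and column names are invented:

    <?php
    // The locking read makes a concurrent request block on the same key
    // until this transaction commits.
    $pdo->beginTransaction();
    $stmt = $pdo->prepare('SELECT id FROM records WHERE name = ? FOR UPDATE');
    $stmt->execute([$name]);
    if ($stmt->fetch() === false) {
        $pdo->prepare('INSERT INTO records (name) VALUES (?)')->execute([$name]);
    }
    $pdo->commit();

One caveat: when the row does not exist yet, two transactions can both acquire the (compatible) gap lock and one INSERT may then fail with a deadlock error, so a UNIQUE index as a backstop plus a retry on error is still wise.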
Also note that MySQL does not (yet) support nested transactions: issuing a new START TRANSACTION while one is open implicitly commits the current one, and the first COMMIT ends everything.
I am making a web service in PHP which does a series of calculations based on a SELECT from a table and then updates the table afterwards with the new results.
However, I want to prevent the case where one person makes a call to the web service while another person's session is still doing its update.
Is it the right thing here to lock that entire table and then unlock it again? If so, how do I lock and unlock a MySQL table using PHP PDO?
Database Management Systems like MySQL are smart enough to prevent concurrency violations like these.
Look up database isolation levels (read uncommitted, read committed, repeatable read, serializable) and the possible problems (dirty reads, non-repeatable reads, ...) -> Wikipedia.
Personally, I would not recommend a table lock in your case. You would do better to wrap your calculations and database operations in a transaction and rely on the DBMS to manage your stuff.
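A minimal sketch of that, assuming an existing PDO connection in $pdo; the results table and the doubling step are stand-ins for the real calculations:

    <?php
    $pdo->beginTransaction();

    // Locking read: a concurrent call waits here instead of computing
    // from values that are about to change.
    $rows = $pdo->query('SELECT id, value FROM results FOR UPDATE')
                ->fetchAll(PDO::FETCH_ASSOC);

    $update = $pdo->prepare('UPDATE results SET value = ? WHERE id = ?');
    foreach ($rows as $row) {
        $new = $row['value'] * 2; // stand-in for the real calculation
        $update->execute([$new, $row['id']]);
    }

    $pdo->commit();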
I posted a comment:
Not a direct answer, but I don't think this is any problem. The calculations and fetching data from the database are done within a few milliseconds. The chance of two people interacting at the same time is so small that most people don't bother making a lock like this.
But if these calculations are critical, you could prevent this problem by adding a new field and simply calling it occupied, busy, or something like that.
When you run your script, check whether this field is set to, for example, 1; if it is, make the script sleep for 1-3 seconds and then retry. If the field is set to 0, update it to 1, do the calculations, and set it back to 0 again.
This would prevent two people from accessing the same values at the same time.
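A sketch of that flag approach; note that the test-and-set has to be a single UPDATE (checked via the affected-row count), otherwise the flag check itself races. The table and column names are invented:

    <?php
    // Claim the flag atomically: only one caller's UPDATE matches
    // the WHERE occupied = 0 condition and reports an affected row.
    $attempts = 0;
    do {
        $claimed = $pdo->exec('UPDATE jobs SET occupied = 1 WHERE id = 1 AND occupied = 0');
        if ($claimed === 1) {
            break; // we own the flag now
        }
        sleep(1); // someone else is busy; wait and retry
    } while (++$attempts < 5);

    if ($claimed === 1) {
        // ... do the calculations ...
        // Release the flag. A crash before this line leaves it stuck at 1,
        // so a timestamp column with a timeout is a common addition.
        $pdo->exec('UPDATE jobs SET occupied = 0 WHERE id = 1');
    }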
I have a web service (an XML-RPC service, to be exact) that handles, among other things, writing data into the database. Here's the scenario:
I often receive requests to either update or insert a record. What I would do is this:
1. If the record already exists, append to the record.
2. If not, create a new record.
The issue is that at certain times I get a 'burst' of requests, which spawns several Apache threads to handle them. These 'bursts' come within milliseconds of each other. I then have several threads performing #1 and #2. Often two threads will 'pass' check #1 and actually create two duplicate records (except for the primary key).
I'd like to use some locking mechanism to prevent other threads from accessing the table while one thread finishes its work. I'm just afraid of using it because if something goes wrong I don't want to leave the table locked.
UPDATE:
The table already has a primary key. The ideal situation is that the first thread creates the record if it doesn't exist; then, when the second thread comes in, it won't create another record, but will just update the record that was already created. It's almost as though I'd like to make the threads form a single-file line.
Is there a solid way of handling this? I'm open to using locks if I can do it properly.
Thanks,
Add a unique or primary index and use:
    INSERT INTO table (...) VALUES (...) ON DUPLICATE KEY UPDATE ...
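An invented example matching the append-or-create logic above, where device_id carries the UNIQUE index and a duplicate insert appends to the existing record instead of failing:

    INSERT INTO messages (device_id, log) VALUES ('unit-7', 'new entry\n')
        ON DUPLICATE KEY UPDATE log = CONCAT(log, VALUES(log));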
If you add a unique index to your table, the second insert will fail. Thus, all the logic will be done by the database server, and all you need to do is display an error to the user... Also, in such a scenario, you don't have to lock table[s] during insert operations.
You can put the numerous steps that form an atomic operation in a TRANSACTION.
If you truly want to serialize your process, you can grab a LOCK TABLES tablename WRITE at the start of your service, and UNLOCK TABLES when done.
If you are using InnoDB and transactions, note that LOCK TABLES implicitly commits any open transaction; the documented pattern is to disable autocommit first (SET autocommit = 0), then lock the tables, and COMMIT before UNLOCK TABLES.
I am not advocating this method, as there is usually a better way of handling it; however, if you need a quick and dirty solution, this will work with a minimal amount of code changes.
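For reference, that pattern looks roughly like this (tablename is a placeholder), following the MySQL manual's guidance for transactional tables:

    -- LOCK TABLES implicitly commits an open transaction, so for InnoDB
    -- the manual's pattern is: autocommit off first, then lock.
    SET autocommit = 0;
    LOCK TABLES tablename WRITE;
    -- ... SELECTs / INSERTs / UPDATEs ...
    COMMIT;          -- commit before releasing the lock
    UNLOCK TABLES;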