Race conditions with Apache, MySQL and PHP

My understanding is that Apache creates a separate PHP process for each incoming request. That means that if I have code that does something like:
1. check if the record exists
2. if the record doesn't exist, create it
Then this is susceptible to a race condition, is it not? If two requests come in at the same time and both hit step (1) simultaneously, they will both come back false and then both attempt to insert a new record.
If so, how do people deal with this? Would wrapping those two steps in a MySQL transaction solve the issue, or do we need a full table lock?

As far as I know you cannot create a transaction across different connections. One solution would be to make the column you are checking unique. That way, if two connections both check for record 10 and it does not exist, they will both try to create it. One will finish inserting the row first, and all is well; the connection a split second behind will then fail because the value would violate the unique constraint. If you catch the exception that is thrown, you can subsequently SELECT the record from the database.
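A minimal sketch of that pattern in PHP with mysqli (the items table, its UNIQUE name column, and the connection details are invented for illustration; 1062 is MySQL's duplicate-key error code):

mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);   // make mysqli throw exceptions
$db = new mysqli("localhost", "user", "pass", "mydb");
$name = "example";
try {
    $stmt = $db->prepare("INSERT INTO items (name) VALUES (?)");
    $stmt->bind_param("s", $name);
    $stmt->execute();
} catch (mysqli_sql_exception $e) {
    if ($e->getCode() == 1062) {            // ER_DUP_ENTRY: another request inserted it first
        $stmt = $db->prepare("SELECT * FROM items WHERE name = ?");
        $stmt->bind_param("s", $name);
        $stmt->execute();
        $row = $stmt->get_result()->fetch_assoc();   // work with the row that already exists
    } else {
        throw $e;                           // anything else is a real error
    }
}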

Honestly, I've very rarely run into this situation. Often it can be alleviated by re-evaluating business requirements. Even if two different users were trying to insert the exact same data, I would defer management of duplicates to the users, rather than handle it in the application.
However, if there were a reason to enforce uniqueness in the application logic, I would use an INSERT IGNORE ... or INSERT ... ON DUPLICATE KEY UPDATE ... query (with the corresponding UNIQUE index on the table, of course).
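For example (table and column names invented for illustration):

ALTER TABLE items ADD UNIQUE KEY uniq_name (name);     -- enforce uniqueness in the database

INSERT IGNORE INTO items (name, hits) VALUES ('example', 1);   -- silently skip if it already exists

INSERT INTO items (name, hits) VALUES ('example', 1)           -- or update the existing row instead
  ON DUPLICATE KEY UPDATE hits = hits + 1;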

I think that handling errors on the second step ought to be sufficient. If two processes try to create the record, one of them will fail, as long as you've configured the MySQL table appropriately. A UNIQUE index across the right fields is one way to do the trick.

Apache does not "create a separate PHP process for each incoming request".
It either uses a pool of processes (the default prefork MPM), or threads.
The race conditions you mention may also be referred to as (or cause) DB "deadlocks".
(See: what is deadlock in a database?)
Using transactions where needed should solve this problem, yes.
By making sure you check if a record exists and create it within a transaction, the whole operation is atomic.
Hence, other requests will not try to create duplicate records (or, depending on the actual queries, create inconsistencies or enter actual deadlocks).
Also note that MySQL does not (yet) support nested transactions:
you cannot have transactions inside transactions, since issuing a second START TRANSACTION/BEGIN implicitly commits the one already in progress.
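For illustration (assuming a throwaway InnoDB table t), the behaviour looks like this; SAVEPOINT is the closest thing to nesting that MySQL offers:

START TRANSACTION;
INSERT INTO t VALUES (1);
START TRANSACTION;      -- implicitly commits the transaction above, including the first INSERT
INSERT INTO t VALUES (2);
ROLLBACK;               -- only the second INSERT is undone

START TRANSACTION;
INSERT INTO t VALUES (1);
SAVEPOINT sp1;
INSERT INTO t VALUES (2);
ROLLBACK TO sp1;        -- undoes only the work done after the savepoint
COMMIT;                 -- the first INSERT is committed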

Related

How can I prevent concurrency while writing and reading from my DB?

I have a table with user login and registration information. So when two users try to add their details at the same time:
Will the two writes clash, so that the table doesn't get updated?
Using threads for these writes seems like a bad idea, since a new thread would be created for each write and that would clog the server. Or is the server responsible for managing this on its own?
Is locking the table a good idea?
My back-end runs on PHP/Apache with MySQL (InnoDB) for the database.
Relational databases are designed to avoid these kinds of conditions. You don't need to worry about them unless you are designing your own relational database from scratch.
In short, just know this: Any time a write is initiated, there is a row-level lock. If another transaction wants to write to that same row, then it has to wait until the first transaction releases the lock. This is a fundamental part of relational databases. You don't need to add a lock because they've already thought of that :)
You can read more about how MySQL performs locks to avoid deadlocking and other transaction errors here.
If you're really paranoid about this, or perhaps you are doing multiple things when you register a user and need them done atomically, you might want to look at using Transactions in MySQL. There's a decent write-up about Transactions here http://www.mysqltutorial.org/mysql-transaction.aspx
BEGIN;
-- do related reads/writes to the data
COMMIT;
Inside that "transaction", the connection sees a consistent view of the data, and blocks anyone else from messing with that view.
There are exceptions. The main one is
BEGIN;
SELECT ... FOR UPDATE;
-- fiddle with the values SELECTed
UPDATE ...;   -- and change those values
COMMIT;
The SELECT ... FOR UPDATE announces which rows should not be tampered with. If another connection wants to mess with the same rows, it will have to wait until your COMMIT, at which point it may find that things have changed and it will need to do something different. But, in general, this avoids a "deadlock" in which two transactions step on each other so badly that one has to be rolled back.
With techniques like this, the "concurrency" is prevented only briefly and relatively precisely. That is, if two connections are working with different rows, both can proceed -- there is no need to "prevent concurrency".
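As a concrete (invented) example, two requests trying to spend from the same account balance serialize cleanly like this:

BEGIN;
SELECT balance FROM accounts WHERE account_id = 42 FOR UPDATE;   -- row 42 is now locked
-- application code checks that the balance covers the purchase
UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
COMMIT;   -- the lock is released; a second request waiting on the same SELECT now sees the new balance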

Concurrent database access

When the web server receives a request for my PHP script, I presume the server creates a dedicated process to run the script. If, before the script exits, another request to the same script comes in, is another process started -- am I correct, or will the second request be queued in the server, waiting for the first one to exit? (Question 1)
If the former is correct, i.e. the same script can run simultaneously in different processes, then those instances will try to access my database at the same time.
When I connect to the database in the script:
$DB = mysqli_connect("localhost", ...);
and then query it, conduct more or less lengthy calculations, and update it. While that is going on, I don't want the contents of the database to be modified by another running instance of the script.
Question 2: Does it mean that, from connecting to the database until closing it:
mysqli_close($DB);
the database is blocked for any access from other software components? If so, it effectively prevents the script instances from running concurrently.
UPDATE: @OllieJones kindly explained that the database is not blocked.
Let's consider the following scenario. The script in the first process finds an eligible user in the Users table and starts preparing data to append for that user in the Counter table. At this moment the script in the other process preempts it and deletes the user from the Users table along with the associated data in the Counter table; it then gets preempted by the first script, which writes the data for the user that no longer exists. That data is now orphaned ("head-detached"), i.e. inaccessible.
How to prevent such a contention?
In modern web servers, there's a pool of processes (or possibly threads) handling requests from users. Concurrent requests to the same script can run concurrently. Each request-handler has its own connection to the DBMS (they're actually maintained in a pool, but that's a story for another day).
The database is not blocked while individual request-handlers are using it, unless you block it explicitly by locking a table or doing a request like SELECT ... FOR UPDATE. For more information on this deep topic, read about transactions.
Therefore, it's important to write your database queries in such a way that they won't interfere with each other. For example, if you need to learn the value of an auto-incremented column right after you insert a row, you should use LAST_INSERT_ID() or mysqli_insert_id() instead of trying to query the table for it: another user may have inserted another row in the meantime.
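For example (hypothetical table, sketch only):

$db = new mysqli("localhost", "user", "pass", "mydb");
$db->query("INSERT INTO orders (customer_id) VALUES (7)");
$orderId = $db->insert_id;   // or mysqli_insert_id($db) in procedural style
// $orderId is the id generated by *this connection's* insert, even if other
// connections inserted rows at the same time. Don't SELECT MAX(id) instead;
// another request may already have added a newer row.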
The system test discipline for scaled-up web sites usually involves a rigorous load test in order to shake out all this concurrency.
If you're doing a bunch of work on a particular entity, in your case a User, you use a transaction.
First you do
BEGIN
to start the transaction. Then you do
SELECT whatever FROM User WHERE user_id = <<whatever>> FOR UPDATE
to choose the user and mark that user's row as busy-being-updated. Then you do all the work you need to do to fill out various rows in various tables relating to that user.
Finally you do
COMMIT
If you messed things up, or don't want to go through with the change, you do
ROLLBACK
and all your changes will be restored to their state right before the SELECT ... FOR UPDATE.
Why does this work? Because if another client does the same SELECT ... FOR UPDATE, MySQL will delay that request until the first one issues a COMMIT or ROLLBACK.
If another client works with a different userid, the operations may proceed concurrently.
You need the InnoDB access method to use transactions: MyISAM doesn't support them.
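Pulling those steps together in PHP with mysqli, a sketch only (the Users and Counter tables come from the question above; column names and connection details are assumed):

mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);   // make mysqli throw exceptions
$db = new mysqli("localhost", "user", "pass", "mydb");
$userId = 42;
$value  = 1;
$db->begin_transaction();
try {
    // Lock the user's row; a concurrent request that tries to delete or update
    // this user now waits until we COMMIT or ROLLBACK.
    $stmt = $db->prepare("SELECT id FROM Users WHERE id = ? FOR UPDATE");
    $stmt->bind_param("i", $userId);
    $stmt->execute();
    $user = $stmt->get_result()->fetch_assoc();
    if ($user !== null) {                     // the user still exists, safe to append
        $ins = $db->prepare("INSERT INTO Counter (user_id, value) VALUES (?, ?)");
        $ins->bind_param("ii", $userId, $value);
        $ins->execute();
    }
    $db->commit();
} catch (Throwable $e) {
    $db->rollback();                          // undo everything since begin_transaction()
    throw $e;
}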
Multiple reads can be done concurrently; a write operation will block all other operations, and a read will block all writes.

MySQL Table Concurrency Solutions

I've been looking at several different solutions to my table concurrency problems in MySQL/PHP/InnoDB:
I really want Process 2 to wait for Process 1 to commit before starting its own transaction, SELECTing data to work on, and then trying to INSERT a row covered by a UNIQUE index. So, Process 1 locks the table and Process 2 checks for the lock, and waits, with something like sleep(), if a WRITE lock already exists... (I cannot use semaphores).
Is that it? Is it really that simple? Should I also still check for a duplicate entry when INSERTing?
Also, do I need to check for this single WRITE lock EVERYWHERE ELSE that there are UPDATES and/or INSERTS to the table, or will this be handled automatically by making the UPDATE/INSERT wait? This is only done once, so making all other queries wait is the single biggest downside if this is the case.
SOLUTION BONUS THOUGHT: it would seem simpler to check for a duplicate key error when inserting, and if one occurs, re-SELECT, re-calculate a new value, and try again until the duplicate-key error no longer occurs.
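That bonus thought, as a rough PHP sketch (the bookings table and its UNIQUE slot column are invented; 1062 is MySQL's duplicate-key error code):

mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$db = new mysqli("localhost", "user", "pass", "mydb");
for ($attempt = 1; $attempt <= 5; $attempt++) {
    // re-SELECT the current state and re-calculate the candidate value
    $row = $db->query("SELECT MAX(slot) AS s FROM bookings")->fetch_assoc();
    $candidate = ((int) $row['s']) + 1;
    try {
        $stmt = $db->prepare("INSERT INTO bookings (slot) VALUES (?)");   // slot has a UNIQUE index
        $stmt->bind_param("i", $candidate);
        $stmt->execute();
        break;                               // success, stop retrying
    } catch (mysqli_sql_exception $e) {
        if ($e->getCode() != 1062) {
            throw $e;                        // not a duplicate-key error
        }
        // another request took that value first; loop and try the next one
    }
}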

How to do cross-database information syncing?

I am designing a directory where data in multiple sources will have to override the data in other sources when altered or updated. Some of the databases are MySQL and SQL Server, and some of the info will come from AD/LDAP.
My question is this: is there a design pattern for this type of database propagation, to reduce traffic and prevent errors? Also this project will be in PHP, so if anyone knows of a similar open source project I could adapt from, that would be nice too. There will probably have to be some logic between some of the databases.
You'll need some way to flag the records to be synced. We use a system like that, in which each table to sync has a column that keeps the syncstate. When a record is modified, it modifies its state too (in a trigger) and a synchronization tool queries for modified records every few minutes.
The disadvantage is that you need lots of code to handle this correctly, especially because you cannot delete records directly: the sync tool first needs to know about the delete and has to perform the actual delete itself. Besides that, it is hard to build a good queue this way, so if records are synced before their parents are, you'll get an error. And every table that must be synced needs an extra column.
So now there is a new solution about to be implemented. This solution uses a separate table for the queue. The queue contains pointers to records in other tables (the primary key value and a reference to the table name/field name). This queue is now the only table to monitor for changes, so all a table needs to do is implement a single trigger that marks the modified records as modified in the queue. Because it is a single queue in a separate table, this provides solutions for the problems I mentioned earlier:
Records can be deleted immediately. The sync tool finds an id in the queue, verifies that it no longer exists, and deletes it from the other database too.
Child-parent dependencies are automatically resolved. A new parent will be in the queue before its children, and a deleted parent will be there behind its children. The only problem you may find is with cross-linked records, although deferred commits might be a solution for that.
There is no need for an extra column in all tables. There is only a single queue, some helper tables, and a single trigger containing a single function call on each table to be synced.
Unfortunately we've not fully implemented this solution, so I can't tell you if it will actually work better, though the tests definitely suggest so.
Mind that this system does a one-to-one copy of records. I think that is the best approach too. Copy the data, and then (afterwards) process it on the target server. I don't think it is a good idea to process the data while copying it. If anything goes wrong, you'll have a hell of a job debugging and restoring/recalculating data.
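A rough sketch of that queue-based approach (table, column and trigger names are invented; you would create one trigger per event type on each synced table):

CREATE TABLE sync_queue (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    table_name VARCHAR(64) NOT NULL,
    record_pk  INT NOT NULL,                 -- primary key value of the changed record
    queued_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

DELIMITER //
CREATE TRIGGER customers_after_update
AFTER UPDATE ON customers
FOR EACH ROW
BEGIN
    -- mark the modified record; the sync tool polls sync_queue every few minutes
    INSERT INTO sync_queue (table_name, record_pk) VALUES ('customers', NEW.id);
END//
DELIMITER ;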

How can I make sure only a single record is inserted when multiple Apache threads are trying to access the database?

I have a web service (xmlrpc service to be exact) that handles among other things writing data into the database. Here's the scenario:
I often receive requests to either update or insert a record. What I would do is this:
1. If the record already exists, append to the record.
2. If not, create a new record.
The issue is that at certain times I get a 'burst' of requests, which spawns several Apache threads to handle them. These 'bursts' come within milliseconds of each other. I now have several threads performing #1 and #2. Often two threads would both 'pass' check #1 and actually create two duplicate records (identical except for the primary key).
I'd like to use some locking mechanism to prevent other threads from accessing the table while the other thread finishes its work. I'm just afraid of using it because if something happens I don't want to leave the table locked.
UPDATE:
The table already has a primary key. The ideal situation would be that the first thread creates the record if it doesn't exist, and once the second thread comes in, it won't create another record, but will just update the record that was already created. It's almost as though I'd like to make the threads form a single-file line.
Is there a solid way of handling this? I'm open to using locks if I can do it properly.
Thanks,
Add a unique or primary index and use:
INSERT INTO table (..) VALUES (...) ON DUPLICATE KEY UPDATE .....
If you add a unique index on your table, the second insert will fail. Thus, all the logic will be done by the database server, and all you need to do is display an error to the user... Also, in such a scenario, you don't have to lock table[s] during insert operations.
You can put the numerous steps that form an atomic operation in a TRANSACTION.
If you truly want to serialize your process, you can grab a LOCK TABLES tablename WRITE at the start of your service, and UNLOCK TABLES when done.
If you are using InnoDB and transactions, note that LOCK TABLES implicitly commits any active transaction, so disable autocommit, take the lock before doing the work, and COMMIT before you UNLOCK TABLES.
I am not advocating this method, as there is usually a better way of handling, however if you need a quick and dirty solution, this will work with a minimal amount of code changes.
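For reference, a rough sketch of that quick-and-dirty serialization (tablename is a placeholder); since LOCK TABLES is not transaction-safe, the documented pattern for InnoDB is to disable autocommit rather than use START TRANSACTION:

SET autocommit = 0;
LOCK TABLES tablename WRITE;   -- other connections' reads and writes on this table now wait
-- check whether the record exists, then INSERT or UPDATE accordingly
COMMIT;                        -- make the changes permanent first...
UNLOCK TABLES;                 -- ...then release the lock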
