I'm currently developing a website, and while working on the database design I became concerned about concurrency issues. I'm considering using timestamping to avoid them.
My understanding of timestamping is that it works this way:
There is a field, let's say "DateModified", which is updated on every update to that specific row.
One or more users may then access that row, reading it first and eventually updating it.
As I understand it, for timestamping to work I need a condition that first reads "DateModified", as in my code:
readdatemodified = Select DateModified From Transaction where ID = ?
datemodified = Select DateModified From Transaction where ID = ?
IF datemodified == readdatemodified
UPDATE Transaction SET ... WHERE ID = ?
ELSE
Message "Someone else has updated the record. Please try again."
IF: the record is updated successfully.
ELSE: the record is retrieved again from the database to ensure that the user sees the updated one.
This solves the concurrency issue, but my new concern is how often I access the database.
Will I have to access the database multiple times for every update?
Is there a way to minimize database access when using timestamping?
If you want the concurrency checks in application logic, try a CAS (check-and-set) algorithm. If you want concurrent changes not to happen at all, use transactions (as mentioned by Acyclic Tau).
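A minimal sketch of the check-and-set idea, assuming Python's built-in sqlite3 as a stand-in for MySQL. The table name Transactions, the Amount column and the helper update_if_unchanged are hypothetical. The timestamp comparison moves into the WHERE clause of the UPDATE, so the check and the write happen in one statement and no second SELECT is needed.

```python
import sqlite3

def update_if_unchanged(conn, row_id, new_amount, seen_modified, now):
    """Update the row only if DateModified still equals the value we read earlier."""
    cur = conn.execute(
        "UPDATE Transactions SET Amount = ?, DateModified = ? "
        "WHERE ID = ? AND DateModified = ?",
        (new_amount, now, row_id, seen_modified),
    )
    conn.commit()
    # 0 affected rows means someone else changed the row after we read it.
    return cur.rowcount == 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (ID INTEGER PRIMARY KEY, Amount INTEGER, DateModified TEXT)"
)
conn.execute("INSERT INTO Transactions VALUES (1, 100, '2024-01-01 10:00:00')")

# Two users read the same version of the row.
seen_a = seen_b = conn.execute(
    "SELECT DateModified FROM Transactions WHERE ID = 1"
).fetchone()[0]

first = update_if_unchanged(conn, 1, 150, seen_a, "2024-01-01 10:05:00")
second = update_if_unchanged(conn, 1, 175, seen_b, "2024-01-01 10:06:00")
print(first, second)
```

The second call affects no rows, which is the signal to show the "please try again" message and reload the record.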
Have you considered not using timestamping, but using transactions and locking reads:
http://dev.mysql.com/doc/refman/5.0/en/innodb-locking-reads.html
This might be a better solution to your problem. MySQL 'select for update' behaviour shows some examples of behaviour in the question.
The capabilities provided by locking are dependent on the underlying database engine you use:
MyISAM - Table level locking
InnoDB - Row level locking
A good overall description of capabilities and advantages can be found on the MySQL site here: http://dev.mysql.com/doc/refman/5.0/en/internal-locking.html
Related
I have a table with an id (primary key with auto increment), a uid (a key referring to a user's id, for example) and some other columns that don't matter for my question.
I want to create, let's call it, a separate auto-increment sequence on id for each uid value.
So, I will add an entry with uid 10, and the id field for this entry will be 1 because there were no previous entries with uid 10. I will add a new one with uid 4 and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explanatory and clear as I can to demonstrate the idea... clearly.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know of a non-SQL database engine that provides such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. It gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. It also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
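The interleaving above can be replayed deterministically. This is only a sketch, assuming Python's built-in sqlite3 as a stand-in for MySQL, with a hypothetical entries table and next_id helper. A single connection plays both sessions in the order described, so no real concurrency is involved.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (uid INTEGER, id INTEGER, PRIMARY KEY (uid, id))")
conn.executemany("INSERT INTO entries VALUES (?, ?)", [(4, 1), (4, 2)])

def next_id(conn, uid):
    """What a MAX(id)+1 trigger would compute for this uid."""
    row = conn.execute(
        "SELECT COALESCE(MAX(id), 0) + 1 FROM entries WHERE uid = ?", (uid,)
    ).fetchone()
    return row[0]

# Both sessions compute the next id before either one inserts.
mario_id = next_id(conn, 4)
bill_id = next_id(conn, 4)

# Bill finishes first.
conn.execute("INSERT INTO entries VALUES (4, ?)", (bill_id,))

# Mario's insert now collides on the (uid, id) primary key.
try:
    conn.execute("INSERT INTO entries VALUES (4, ?)", (mario_id,))
    conflict = False
except sqlite3.IntegrityError:
    conflict = True

print(mario_id, bill_id, conflict)
```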
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like the one in the example above. You have to lock the whole table because you're trying to restrict INSERT, so there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table makes access to it serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
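As a rough illustration of the second option, here is a sketch of a per-userid counter that lives outside any database transaction. The PerUserCounter class is hypothetical: it is an in-process stand-in for something like a memcached key per userid with atomic increment, and it uses a lock to make each increment atomic.

```python
import threading
from collections import defaultdict

class PerUserCounter:
    """In-process stand-in for an external store with an atomic increment."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last = defaultdict(int)

    def incr(self, userid):
        # The lock makes read-increment-return a single atomic step.
        with self._lock:
            self._last[userid] += 1
            return self._last[userid]

counter = PerUserCounter()
results = []

def worker():
    results.append(counter.incr(4))

threads = [threading.Thread(target=worker) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every caller received a distinct value, whatever the scheduling order was.
print(len(set(results)) == len(results))
```

Values handed out this way are unique, but a caller that later rolls back its INSERT still leaves a gap.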
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than NOT MS/Oracle) the general logic is simple...
Start a transaction (often this is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
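A sketch of the single-entry-point idea, assuming Python's built-in sqlite3 rather than a real stored procedure. The amendments table, the note column and the add_amendment function are hypothetical. BEGIN IMMEDIATE is SQLite-specific: it takes the write lock up front, so the MAX() read and the INSERT cannot interleave with another writer.

```python
import sqlite3

def add_amendment(conn, property_id, note):
    """Insert one row, numbering it MAX(amendment_id) + 1 within its property_id."""
    conn.execute("BEGIN IMMEDIATE")  # serialize writers for the read-then-insert
    try:
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(amendment_id), 0) + 1 "
            "FROM amendments WHERE property_id = ?",
            (property_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO amendments (property_id, amendment_id, note) VALUES (?, ?, ?)",
            (property_id, next_id, note),
        )
        conn.execute("COMMIT")
        return next_id
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE amendments ("
    "property_id INTEGER, amendment_id INTEGER, note TEXT, "
    "PRIMARY KEY (property_id, amendment_id))"
)

ids = [
    add_amendment(conn, 10, "first for 10"),
    add_amendment(conn, 4, "first for 4"),
    add_amendment(conn, 4, "second for 4"),
]
print(ids)
```

Because callers never supply amendment_id themselves, the cases listed under the trigger option (pre-populated values, multi-row inserts) do not arise here.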
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
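To make the multi-part key concrete, here is a small sketch, assuming Python's built-in sqlite3 and a hypothetical records table with a body column. The version_number is stored as an ISO-format timestamp string, which sorts correctly as text, and the latest version is found by constraining entity_id and ordering on version_number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records ("
    "entity_id INTEGER, version_number TEXT, body TEXT, "
    "PRIMARY KEY (entity_id, version_number))"
)
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 09:00:00", "first draft"),
        (1, "2024-01-02 09:00:00", "second draft"),
        (2, "2024-01-01 12:00:00", "only version"),
    ],
)

def latest(conn, entity_id):
    """Return (version_number, body) of the most recent version, or None."""
    return conn.execute(
        "SELECT version_number, body FROM records "
        "WHERE entity_id = ? ORDER BY version_number DESC LIMIT 1",
        (entity_id,),
    ).fetchone()

print(latest(conn, 1))
```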
I created a ticketing system that in its simplest form just records a user joining the queue, and prints out a ticket with the queue number.
When the user presses for a ticket, the following happens in the database
INSERT details INTO All_Transactions_Table
SELECT COUNT(*) as ticketNum FROM All_Transactions_Table WHERE date is TODAY
This serves me well in most cases. However, I recently started to see some duplicate ticket numbers. I can't seem to replicate the issue even after running the web service multiple times myself.
My guess of how it could happen is that in some scenarios the INSERT happened only AFTER the SELECT COUNT. But this is an InnoDB table and I am not using INSERT DELAYED. Does InnoDB have any of such implicit mechanisms?
I think your problem is that you have a race condition. Imagine that you have two people that come in to get tickets. Here's person one:
INSERT details INTO All_Transactions_Table
Then, before the SELECT COUNT(*) can happen, person two comes along and does:
INSERT details INTO All_Transactions_Table
Both users then run SELECT COUNT(*), both see the same count, and so both get the same ticket number. This can be very hard to replicate using your existing code because it depends on the exact scheduling of threads within MySQL, which is totally beyond your control.
The best solution to this would be to use some kind of AUTO_INCREMENT column to provide the ticket number, but failing that, you can probably use transactions to achieve what you want:
START TRANSACTION
SELECT COUNT(*) + 1 as ticketNum FROM All_Transactions_Table WHERE date is TODAY FOR UPDATE
INSERT details INTO All_Transactions_Table
COMMIT
However, whether or not this works will depend on what transaction isolation level you have set, and it will not be very efficient.
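A sketch of the auto-increment approach, assuming Python's built-in sqlite3 as a stand-in for MySQL's AUTO_INCREMENT. The ticketNum column and the take_ticket helper are hypothetical. The generated key is used directly as the ticket number, so no COUNT(*) is needed. Restarting the numbering each day is not handled here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE All_Transactions_Table ("
    "ticketNum INTEGER PRIMARY KEY AUTOINCREMENT, details TEXT)"
)

def take_ticket(conn, details):
    """Insert the row and return the key the database generated for it."""
    cur = conn.execute(
        "INSERT INTO All_Transactions_Table (details) VALUES (?)", (details,)
    )
    conn.commit()
    return cur.lastrowid

tickets = [take_ticket(conn, f"user {n}") for n in range(3)]
print(tickets)
```

In MySQL the equivalent value would come from LAST_INSERT_ID(), which is tracked per connection.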
I am building a PHP RESTful API for remote "worker" machines to self-assign tasks. The MySQL InnoDB table on the API host holds pending records that the workers can pick up from the API whenever they are ready to work on a record. How do I prevent concurrently requesting worker systems from ever getting the same record?
My initial plan to prevent this is to UPDATE a single record with a uniquely generated ID in a default NULL field, and then poll for the details of the record where the unique ID field matches.
For example:
UPDATE mytable SET status = 'Assigned', uniqueidfield = '3kj29slsad'
WHERE uniqueidfield IS NULL LIMIT 1
And in the same PHP instance, the next query:
SELECT id, status, etc FROM mytable WHERE uniqueidfield = '3kj29slsad'
The resulting record from the SELECT statement above is then given to the worker. Would this prevent simultaneously requesting workers from getting the same records shown to them? I am not exactly sure on how MySQL handles the lookups within an UPDATE query, and if two UPDATES could "find" the same record, and then update it sequentially. If this works, is there a more elegant or standardized way of doing this (not sure if FOR UPDATE would need to be applied to this)? Thanks!
Never mind my previous answer. I believe I understand what you are asking. I'll reword it so that it may be clearer to others.
"If I issue two of the above update statements at the same time, what would happen?"
According to http://dev.mysql.com/doc/refman/5.0/en/lock-tables-restrictions.html, the second statement would not interfere with the first one.
Normally, you do not need to lock tables, because all single UPDATE
statements are atomic; no other session can interfere with any other
currently executing SQL statement.
A more elegant way is probably opinion based, but I don't see anything wrong with what you're doing.
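For illustration, the claim-then-read pattern from the question might look like the sketch below. It assumes Python's built-in sqlite3 as a stand-in for MySQL, and the claim_task helper and payload column are hypothetical. SQLite does not accept UPDATE ... LIMIT by default, so the sketch picks the row with a subquery instead. That is a SQLite workaround, not a recommendation for MySQL, where the original UPDATE ... LIMIT 1 form applies.

```python
import sqlite3
import uuid

def claim_task(conn):
    """Tag one unassigned row with a fresh token, then read that row back."""
    token = uuid.uuid4().hex
    cur = conn.execute(
        "UPDATE mytable SET status = 'Assigned', uniqueidfield = ? "
        "WHERE id = (SELECT id FROM mytable "
        "            WHERE uniqueidfield IS NULL ORDER BY id LIMIT 1)",
        (token,),
    )
    conn.commit()
    if cur.rowcount == 0:
        return None  # nothing left to assign
    return conn.execute(
        "SELECT id, status FROM mytable WHERE uniqueidfield = ?", (token,)
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, status TEXT DEFAULT 'Pending', "
    "uniqueidfield TEXT, payload TEXT)"
)
conn.executemany("INSERT INTO mytable (payload) VALUES (?)", [("a",), ("b",)])

claims = [claim_task(conn), claim_task(conn), claim_task(conn)]
print(claims)
```

Each worker only ever reads the row carrying its own token, so two workers cannot be handed the same record.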
Just to give you an example:
I have a PHP script that manages users votes.
When a user votes, the script runs a query to check whether someone has already voted for the same ID/product. If nobody has voted, it runs another query to insert the ID into a general ID votes table, and another to insert the data into a per-user ID votes table. This kind of behavior is repeated in other scripts.
The question is: if two different users vote simultaneously, is it possible that the two instances of the code try to insert a new ID (or run some similar type of query) and cause an error?
If yes, how do I prevent this from happening?
Thanks.
Important note: I'm using MyISAM! My web host doesn't allow InnoDB.
"The question is: if two different users vote simultaneously, is it possible that the two instances of the code try to insert a new ID (or run some similar type of query) and cause an error?"
Yes, you might end up with two queries doing the insert. Depending on the constraints on the table, one of them will either generate an error, or you'll end up with two rows in your database.
You could solve this, I believe, by applying some locking,
e.g. if you need to add a vote to the product with id theProductId (pseudo code):
START TRANSACTION;
//lock on the row for our product id (assumes the product really exists)
select 1 from products where id=theProductId for update;
//assume the vote exist, and increment the no.of votes
update votes set numberOfVotes = numberOfVotes + 1 where productId=theProductId ;
//if the last update didn't affect any rows, the row didn't exist
if(rowsAffected == 0)
insert into votes(numberOfVotes,productId) values(1,theProductId )
//insert the new vote in the per user votes
insert into user_votes(productId,userId) values(theProductId,theUserId);
COMMIT;
Some more info here
MySQL offers another solution as well that might be applicable here: INSERT ... ON DUPLICATE KEY UPDATE.
e.g. you might be able to just do:
insert into votes(numberOfVotes,productId) values(1,theProductId ) on duplicate key
update numberOfVotes = numberOfVotes + 1;
If your votes table has a unique key on the productId column, the above will do an insert if the particular theProductId doesn't exist; otherwise it will do an update that increments the numberOfVotes column by 1.
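The same insert-or-increment behaviour can be tried out with a sketch like the following, assuming Python's built-in sqlite3. SQLite spells the upsert as ON CONFLICT ... DO UPDATE (available from SQLite 3.24) rather than MySQL's ON DUPLICATE KEY UPDATE. The add_vote helper is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE votes (productId INTEGER PRIMARY KEY, numberOfVotes INTEGER NOT NULL)"
)

def add_vote(conn, product_id):
    """Insert the first vote, or increment the existing count, in one statement."""
    conn.execute(
        "INSERT INTO votes (productId, numberOfVotes) VALUES (?, 1) "
        "ON CONFLICT(productId) DO UPDATE SET numberOfVotes = numberOfVotes + 1",
        (product_id,),
    )
    conn.commit()

for product_id in (7, 7, 9, 7):
    add_vote(conn, product_id)

counts = dict(conn.execute("SELECT productId, numberOfVotes FROM votes"))
print(counts)
```

Because the existence check and the write are a single statement, there is no window between "check" and "insert" for another request to slip into.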
You could probably avoid a lot of this if you created a row in the votes table at the same time you added the product to the database. That way you could be sure there's always a row for your product, and just issue an UPDATE on that row.
"The question is: if two different users vote simultaneously, is it possible that the two instances of the code try to insert a new ID (or run some similar type of query) and cause an error?"
Yes, in general this is possible. This is an example of a very common problem in concurrent systems, called a race condition.
Avoiding it can be rather tricky, but in general you need to make sure that the operations cannot interleave in the way you describe, e.g. by locking the database for a while.
There are several practical solutions to this, all with their own advantages and risks (e.g. deadlocks). See the Wikipedia article for a discussion and further pointers to information.
The easiest way:
LOCK TABLES table1 WRITE, table2 WRITE, table3 WRITE
-- check for record, insert if not exists, etc...
UNLOCK TABLES
If voting does not occur many times per second, then the above should be sufficient.
InnoDB tables offer transactions, which might be useful here as well. Others have already commented on it, so I won't go into any detail.
Alternatively, you could solve it at the code level via using some sort of shared memory mutex that disables concurrent execution of that section of PHP code.
This is where the singleton pattern comes in handy. It ensures that a piece of code is executed by only one process at a time.
http://en.wikipedia.org/wiki/Singleton_pattern
You have to make a singleton class for the database access; this will protect you from the type of error you are describing.
Cheers.
What is the advantage of using PDO beginTransaction? Is it the same as a MySQL DB lock?
I have a table with url and status columns. Whenever my application loads 10 URLs, I need to update the status column to loaded. The application will be accessed by a couple of users simultaneously. How do I prevent user B from loading the same URLs that user A loaded before the status column is updated?
Please could anyone help me.
Transactions and table locks do different things. In your case, probably the easiest way to accomplish what you want is:
Lock the table for writing
Select 10 URLs where status = new
Set those 10 URLs to be status = processing
Unlock the table
For each URL, process, and set status = done
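The steps above can be sketched as follows, assuming Python's built-in sqlite3 as a stand-in for MySQL. SQLite has no LOCK TABLES, so the sketch uses BEGIN IMMEDIATE, which takes SQLite's write lock for the duration of the select-and-mark step. The urls table, the example.com addresses and the claim_batch helper are hypothetical.

```python
import sqlite3

def claim_batch(conn, size=10):
    """Pick up to `size` new URLs and mark them as processing in one locked step."""
    conn.execute("BEGIN IMMEDIATE")  # stand-in for locking the table for writing
    try:
        ids = [
            row[0]
            for row in conn.execute(
                "SELECT id FROM urls WHERE status = 'new' ORDER BY id LIMIT ?",
                (size,),
            )
        ]
        conn.executemany(
            "UPDATE urls SET status = 'processing' WHERE id = ?",
            [(i,) for i in ids],
        )
        conn.execute("COMMIT")  # releases the lock
        return ids
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None lets us issue BEGIN/COMMIT ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, status TEXT DEFAULT 'new')"
)
conn.executemany(
    "INSERT INTO urls (url) VALUES (?)",
    [(f"http://example.com/{n}",) for n in range(25)],
)

batch_a = claim_batch(conn)
batch_b = claim_batch(conn)
batch_c = claim_batch(conn)
print(len(batch_a), len(batch_b), len(batch_c))
```

The slow per-URL processing happens after the lock is released, so other users are only blocked for the short select-and-mark step.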
PDO::beginTransaction makes it possible to roll back changes with PDO::rollBack if something goes wrong, while LOCK TABLES does not.