multi-user application record locking - best method?

multi-user application record locking - best method? - php

I'm developing a php / mysql application that handles multiple simultaneous users. I'm thinking of the best approach to take when it comes to locking / warning against records that are currently being viewed / edited.
The scenario to avoid is two users viewing the record, one making a change, then the other doing likewise - with the potential that one change might overwrite the previous.
In the latest versions of WordPress they use some method to detect this, but it does not seem wholly reliable - often returning false positives, at least in my experience.
I assume some form of ajax must be in place to 'ping' the application and let it know the record is still being viewed / edited (otherwise, a user might simply close their browser window, and then how would the application know that).
Another solution I could see is to check the last updated time when a record is submitted for update, to see if in the interim it has been updated elsewhere - and then offer the user a choice to proceed or discard their own changes.
Perhaps I'm barking up the wrong tree in terms of a solution - what are peoples experiences of implementing this (what must be a fairly common) requirement?

I would do this: Store the time of the last modification in the edit form. Compare this time on submission with the time stored in the database. If they are the same, lock the table, update the data (along with the modification time) and unlock the table. If the times are different, notify the user about it and ask for the next step.

Good idea with the timestamp comparison. It's inexpensive to implement, and it's an inexpensive operation to run in production. You just have to write the logic to send back to the user the status message that their write/update didn't occur because someone beat them to it.
Perhaps consider storing the username on each update in a field called something like 'LastUpdateBy', and return that back to the user who had their update pre-empted. Just a little nicety for the user. Nice in the corporate sense, perhaps not in an environment where it might not be appropriate.

Related

Does counterCache suffer from race conditions?

I have a question about counterCache that the documentation doesn't clarify at all.
Does counterCache checks race conditions when updating the field value?
For example, let's say we have a forum implementation, and for each forum, we have a number n of topics stored via counterCache. Then, if two users use the model at almost same time (enough to overlap their operations, meaning that when one ends it, the other one still will be using it), and one create a new topic and the other one (assuming it can) deletes another topic, then will it show exactly n topics, and not n+1 or n-1?

Whatever operation finishes first will be the result. It is very unlikely and probably technically impossible that the operation happen at the same time. So one of both comes first. Then there is also the time it takes for the request to be received by the server and browser. So whenever the next view request comes you'll get whatever at this point was updated in the database.
By checking the code you can also see that the code is doing a find('count') instead of inc/decrementing by +/- one. http://api.cakephp.org/2.3/source-class-Model.html#1913-1981 So the cache gets written after the previous action was completed.
And finally I would really not worry about if the count is off by +/- one for a moment, specially in a forum.

I believe it could, but not because of Cake but because multiple SQL operations in a non transactional behaviour, and from multiple clients (users) at the same time. Any web application with or without a framework would suffer from this situations.
I'm not very proficient with SQL transactions, but I'm pretty sure some kind of transaction configuration would prevent this form happening.
As burzum says, the forum example is definitely not something to worry about.

Do PHP pages on a server run simultaneously?

This probably seems like a very simple question, and I would probably know if I had a more in depth knowledge of computer processes and the like, but anyway..
If two people request the same page from my server, is the PHP page processed once for the first person, and then a second time for the second person, or might these run along side each other at the same time?
Take this as an example. I have one stock Item left in my PHP driven online shop. A user adds this to their cart. Php script 1) checks to see if it is in stock, Yup, its in stock, so it 2)reserves it for him.
What If, in between checking if its in stock and reserving it, the same PHP page was loading for someone else, and just after user A checked if it was in stock, so did user B, before user A got a chance to reserve it, so they both end up reserving it!
Sorry if this seems silly, can't seem to find an answer on it, which is it?

Congratulations, you have identified a race condition! :-)
Whether PHP pages run in parallel or one after the other depends on the web server. Typically a web server allocates several threads to handle multiple incoming requests at once. So it may indeed happen that several instances of the same script are run in parallel if two or more users request the same page at the same time. Due to timing and scheduling differences it is unpredictable when each page will execute which action exactly.
Hence for such situations as you describe it is important to program actions in an atomic way, meaning that they either complete in their entirety or not at all. In your case you could use locks, transactions, cleverly formed UPDATE statements, UNIQUE indexes or a number of other techniques that avoid the possibility of two users reserving the same thing.

Yes, in general, without getting into too much detail: PHP scripts are executed simultanously for each request separately.
For making sure the problem you mentioned does not occur, you should probably implement feature of your database management system called "transactions". This way if you do something on the database layer and at the end you will find out the reservation can not happen, all the actions made within transaction will be rolled back.
In addition to transactions you should design your application keeping in mind that the problem you mentioned may occur. Thus you should design your database & application in a way allowing you to 1) shorten the time between "checking" and "reserving" as much as possible, 2) stopping the action if you cannot make reservation, and finally - in case of emergency - 3) identifying which reservation came first and which should be revoked.
Another idea, falling into category of "your application's design", may be something we could call "temporary reservation". That means you can temporarily (eg. for a couple of seconds) lock your reservation if you are about to make reservation. After that you can check if you really can make that reservation and either turn it into permanent reservation or just revoke it. I believe some systems also make longer temporary reservations right after the customer begins the process of reserving his/her places. Then, if the process is successful, the reservation is changed into permanent, but if some specific amount of time passes without success, the reservation can be simply revoked, allowing another customer to begin the process.

yes definately, they are parallel for php but when the database concerns you should learn transaction portion of database management system.

Yes and no. PHP may run in simultaneous processes depending on server setup, but on a small-scale, you'll only have one database. Database queries are handled sequentially, so you'll never have that kind of conflict. (As long as you check to see if an item's in stock immediately before you reserve it for someone.) More information
Of course, Users A + B might both see that it's in stock, and A might request it before B. But your code can realize that it's now out of stock and display an error to User B.
(You get into trouble with multiple database servers. If you have the same data stored across multiple servers, there's lag time before data can be fully replicated. But you won't have that issue. We're talking like top 1,000 sites here.)

What are the number of ways in which my approach to a news-feed is wrong?

This question has been asked a THOUSAND times... so it's not unfair if you decide to skip reading/answering it, but I still thought people would like to see and comment on my approach...
I'm building a site which requires an activity feed, like FourSquare.
But my site has this feature for the eye-candy's sake, and doesn't need the stuff to be saved forever.
So, I write the event_type and user_id to a MySQL table. Before writing new events to the table, I delete all the older, unnecessary rows (by counting the total number of rows, getting the event_id lesser than which everything is redundant, and deleting those rows). I prune the table, and write a new row every time an event happens. There's another user_text column which is NULL if there is no user-generated text...
In the front-end, I have jQuery that checks with a PHP file via GET every x seconds the user has the site open. The jQuery sends a request with the last update "id" it received. The <div> tags generated by my backend have the "id" attribute set as the MySQL row id. This way, I don't have to save the last_received_id in memory, though I guess there's absolutely no performance impact from storing one variable with a very small int value in memory...
I have a function that generates an "update text" depending on the event_type and user_id I pass it from the jQuery, and whether the user_text column is empty. The update text is passed back to jQuery, which appends the freshly received event <div> to the feed with some effects, while simultaneously getting rid of the "tail end" event <div> with an effect.
If I (more importantly, the client) want to, I can have an "event archive" table in my database (or a different one) that saves up all those redundant rows before deleting. This way, event information will be saved forever, while not impacting the performance of the live site...
I'm using CodeIgniter, so there's no question of repeated code anywhere. All the pertinent functions go into a LiveUpdates class in the library and model respectively.
I'm rather happy with the way I'm doing it because it solves the problem at hand while sticking to the KISS ideology... but still, can anyone please point me to some resources, that show a better way to do it? A Google search on this subject reveals too many articles/SO questions, and I would like to benefit from the experience any other developer that has already trawled through them and found out the best approach...

If you use proper indexes there's no reason you couldn't keep all the events in one table without affecting performance.
If you craft your polling correctly to return nothing when there is nothing new you can minimize the load each client has on the server. If you also look into push notification (the hybrid delayed-connection-closing method) this will further help you scale big successfully.
Finally, it is completely unnecessary to worry about variable storage in the client. This is premature optimization. The performance issues are going to be in the avalanche of connections to the web server from many users, and in the DB, tables without proper indexes.
About indexes: An index is "proper" when the most common query against a table can be performed with a seek and a minimal number of reads (like 1-5). In your case, this could be an incrementing id or a date (if it has enough precision). If you design it right, the operation to find the most recent update_id should be a single read. Then when your client submits its ajax request to see if there is updated content, first do a query to see if the value submitted (id or time) is less than the current value. If so, respond immediately with the new content via a second query. Keeping the "ping" action as lightweight as possible is your goal, even if this incurs a slightly greater cost for when there is new content.
Using a push would be far better, though, so please explore Comet.
If you don't know how many reads are going on with your queries then I encourage you to explore this aspect of the database so you can find it out and assess it properly.
Update: offering the idea of clients getting a "yes there's new content" answer and then actually requesting the content was perhaps not the best. Please see Why the Fat Pings Win for some very interesting related material.

Best way to handle concurrency issues

i have a LAPP (linux, apache, postgresql and php) environment, but the question is pretty the same both on Postgres or Mysql.
I have an cms app i developed, that handle clients, documents (estimates, invoices, etc..) and other data, structured in 1 postgres DB with many schemas (one for each our customer using the app); let's assume around 200 schemas, each of them used concurrently by 15 people (avg).
EDIT: I do have an timestamp field named last_update on every table, and a trigger that update the timestamp every time the row is update.
The situation is:
People Foo and Bar are editing the document 0001, using a form with every document details.
Foo change the shipment details, for example.
Bar change the phone numbers, and some items in the document.
Foo press the 'Save' button, the app update the db.
Bar press the 'Save' button after bar, resending the form with the old shipment details.
In the database, the Foo changes have been lost.
The situation i want to have:
People Foo, Bar, John, Mary, Paoul are editing the document 0001, using a form with every document details.
Foo change the shipment details, for example.
Bar and the others change something else.
Foo press the 'Save' button, the app update the db.
Bar and the others get an alert 'Warning! this document has been changet by someone else. Click here to load the actuals data'.
I've wondered to use ajax to do this; simply using an hidden field with the id of the document and the last-updated timestamp, every 5 seconds check if the last-updated time is the same and do nothing, else, show the alert dialog box.
So, the page check-last-update.php should look something like:
<?php
//[connect to db, postgres or mysql]
$documentId = isset($_POST['document-id']) ? $_POST['document-id'] : 0;
$lastUpdateTime = isset($_POST['last-update-time']) ? $_POST['last-update-time'] : 0;
//in the real life i sanitize the data and use prepared statements;
$qr = pg_query("
SELECT
last_update_time
FROM
documents
WHERE
id = '$documentId'
");
$ray = pg_fetch_assoc($qr);
if($ray['last_update_time'] > $lastUpdateTime){
//someone else updated the document since i opened it!
echo 'reload';
}else{
echo 'ok';
}
?>
But i dont like to stress the db every 5 seconds for every user that have one (or more...) documents opened.
So, what can be another efficent solution without nuking the db?
I thought to use files, creating for example an empty txt file for each document, and everytime the document is updated, i 'touch' the file updating the 'last modified time' as well... but i guess that this would be slower than db and give problems when i have much users editing the same document.
If someone else have a better idea or any suggestion, please describe it in details!
* - - - - - UPDATE - - - - - *
I definitely choosen to NOT hit the db for check the 'last update timestamp', dont mind if the query will be pretty fast, the (main) database server has other tasks to fullfill, dont like the idea to increase his overload for that thing.
So, im taking this way:
Every time a document is updated by someone, i must do something to sign the new timestamp outside the db environment, e.g. without asking the db. My ideas are:
File-system: for each document i create an empry txt files named as the id of the document, everytime the document is update, i 'touch' the file. Im expecting to have thousands of those empty files.
APC, php cache: this will be probably a more flexible way than the first one, but im wondering if keeping thousands and thousands of data permanently in the apc wont slow down the php execution itself, or consume the server memory. Im little bit afraid to choose this way.
Another db, sqlite or mysql (that are faster and lighter with simple db structures) used to store just the documents ID and timestamps.
Whatever way i choose (files, apc, sub-db) im seriously thinking to use another web-server (lighttp?) on a sub-domain, to handle all those.. long-polling requests.
YET ANOTHER EDIT:
The file's way wouldnt work.
APC can be the solution.
Hitting the DB can be the solution too, creating a table just to handle the timestamps (with only two column, document_id and last_update_timestamp) that need to be as fast and light as possible.
Long polling: that's the way i'll choose, using lighttpd under apache to load static files (images, css, js, etc..), and just for this type of long-polling; This will lighten the apache2 load, specially for the polling.
Apache will proxy-up all those request to lighttpd.
Now, i only have to decide between db solution and APC solution..
p.s: thanks to all whom already answered me, you have been really usefull!

I agree that I probably wouldn't hit the database for this. I suppose I would use APC cache (or some other in-memory cache) to maintain this information. What you are describing is clearly optimistic locking at the detailed record level. The higher the level in the database structure the less you need to deal with. It sounds like you want to check with multiple tables within a structure.
I would maintain a cache (in APC) of the IDs and the timestamps of the last updated time keyed by the table name. So for example I might have an array of table names where each entry is keyed by ID and the actual value is the last updated timestamp. There are probably many ways to set this up with arrays or other structures but you get the idea. I would probably add a timeout to the cache so that entries in the cache are removed after a certain period of time - i.e., I wouldn't want the cache to grow and assume that 1 day old entries aren't useful anymore).
With this architecture you would need to do the following (in addition to setting up APC):
on any update to any (applicable) table, update the APC cache entry with the new timestamp.
within ajax just go as far "back" as php (to obtain the APC cache to check the entry) rather than all of the way "back" to the database.

I think you can use a condition in the UPDATE statement like WHERE ID=? AND LAST_UPDATE=?.
The idea is that you will only succeed in updating when you are the last one reading that row. If someone else has committed something, you will fail, and once you know you've failed, you can query the changes.

Hibernate uses a version field to do that. Give every table such a field and use a trigger to increment it on every update. When storing an update, compare the current version with the version when the data was read earlier. If those don't match, throw an exception. Use transactions to make the check-and-update atomic.

You will need some type of version stamp field for each record. What it is doesn't matter as long as you can guarantee that making any change to a record will result in that version stamp being different. Best practice is to then check and make sure the loaded record's version stamp is the same as the version stamp in the DB when the user clicks save, and if it's different handle it.
How you handle it is up to you. At the very least you'd want to offer to reload from the DB so the user can verify that they still want to save. One up from that would be to attempt to merge their changes into the new DB record and then ask them to verify that the merge worked correctly.
If you want to periodically poll any DB capable of handling your system should be able to take the poll load. 10 users polling once every 5 seconds is 2 transactions per second. This is a trivial load, and should be no problem at all. To keep the average load close to the actual load, just jitter the polling time slightly (instead of doing it exactly every 5 seconds, do it every 4-6 seconds, for example).

Donnie's answer (polling) is probably your best option - simple and works. It'll cover almost every case (its unlikely a simple PK lookup would hurt performance, even on a very popular site).
For completeness, and if you wanted to avoid polling, you can use a push-model. There's various ways described in the Wikipedia article. If you can maintain a write-through cache (everytime you update the record, you update the cache), then you can almost completely eliminate the database load.
Don't use a timestamp "last_updated" column, though. Edits within the same second aren't unheard of. You could get away with it if you add extra information (server that did the update, remote address, port, etc) to ensure that, if two requests came in at the same second, to the same server, you could detect the difference. If you need that precision, though, you might as well use a unique revision field (it doesn't necessarily have to be an incrementing integer, just unique within that record's lifespan).
Someone mentioned persistent connections - this would reduce the setup cost of the polling queries (every connection consumes resources on the database and host machine, naturally). You would keep a single connection (or as few as possible) open all the time (or as long as possible) and use that (in combination with caching and memoization, if desired).
Finally, there are SQL statements that allow you to add a condition on UPDATE or INSERT. My SQl is really rusting, but I think its something like UPDATE ... WHERE .... To match this level of protection, you would have to do your own row locking prior to sending the update (and all the error handling and cleanup that might entail). Its unlikely you'd need this; I'm just mentioning it for completness.
Edit:
Your solution sounds fine (cache timestamps, proxy polling requests to a another server). The only change I'd make is to update the cached timestamps on every save. This will keep the cache fresher. I'd also check the timestamp directly from the db when saving to prevent a save sneaking in due to stale cache data.
If you use APC for caching, then a second HTTP server doesn't make sense - you'd have to run it on the same machine (APC uses shared memory). The same physical machine would be doing the work, but with the additional overhead of a second HTTP server. If you want to off load the polling requests to a second server (lighttpd, in your case), then it would be better to setup lightttpd in front of Apache on a second physical machine and use a shared caching server (memcache) so that the lighttpd server can read the cached timestamps, and Apache can update the cached timestamps. The rationale for putting lighttpd in front of Apache is, if most requests are polling requests, to avoid the heavier-weight Apache process usage.
You probably don't need a second server at all, really. Apache should be able to handle the additional requests. If it can't, then I'd revisit your configuration (specifically the directives that control how many worker processes you run and how many requests they are allowed to handle before being killed).

Your approach of querying the database is the best one. If you do it every 5 seconds and you have 15 concurrent users then you're looking at ~3 queries a second. It should be a very small query too, returning only one row of data. If your database can't handle 3 transactions a second then you might have to look at a better database because 3 queries/second is nothing.
Timestamp the records in the table so you can quickly see if anything has changed without having to diff each field.

This is slightly off topic, but you can use the PEAR package (or PECL package, I forget which) xdiff to send back good user guidance when you do get a collision.

First off only update the fields that have changed on when writing to the database, this will decrease database load.
Second, query the timestamp of the last update, if you have a older timestamp then the current version in the database then throw the warning to the client.
Third is to somehow push this information to the client, though some kind of persistent connection with the server, enabling a concurrent two way connection.

Polling is rarely a nice solution.
You could do the timstamp check only when the user (with the open document) is doing something active with the document like scrolling, moving the mouse over it or starts to edit. Then the user gets an alert if the document has been changed.
.....
I know it was not what you asked for but ... why not a edit-singleton?
The singleton could be a userID column in the document-table.
If a user wants to edit the document, the document is locked for edit by other users.
Or have edit-singletons on the individual fields/groups of information.
Only one user can edit the document at a time. If another user has the document open and want to edit a single timestamp check reveal that the document has been altered and is reloaded.
With a singleton there is no polling and only one timestamp check when the user "touches" and/or wants to edit the document.
But perhaps a singleton mechanism doesn't fit your system.
Regards
Sigersted

Ahhh, i though it was easyer.
So, lets make the point: i have a generic database (pgsql or mysql doesn't matter), that contains many generic objects.
I have $x (actually $x = 200, but is growing, hoping will reach 1000 soon) of exact copy of this database, and for each of them up to 20 (avg 10) users for 9 hours at day.
If one of those users is viewing a record, any record, i must advice him if someone edit the same record.
Let's say Foo is watching the document 0001, sit up for a coffee, Bar open and edit the same document, when Foo come back he must see an 'Warning, someone else edited this document! click here to refresh tha page.'.
That'all i need atm, probably i'll extend this situation, adding a way to see the changes and rollback, but this is not the point.
Some of you suggested to check the 'last update' timestamp only when foo try to save the document; Can be a solution too, but i need something in real-time ( 10 sec deelay ).
Long polling, bad way, but seem to be the only one.
So, what i've done:
Installed Lighttp on my machine (and php5 as fastcgi);
Loaded apache2's proxy module (all, or 403 error will hit you);
Changed the lighttpd port from 80 (that is used by apache2) to 81;
Configured apache2 to proxying the request from mydomain.com/polling/* to polling.mydomain.com (served with Lighttp)
Now, i have another sub http-service that i'll use both for polling and load static content (images, etc..), in order to reduce the apache2's load.
Becose i dont want to nuke the database for the timestamp check, i've tryed some caches system (that can be called from php).
APC: quite simple to install and manage, very lightweight and faster, this would be my first choice.. if only the cache would be sharable between two cgi process (i need to store in cache a value from apache2's php process, and read it from lighttpd's php process)
Memcached: around 4-5 times slower than APC, but run as a single process that can be touched everywhere in my environment. I'll go with this one, atm. (even if is slower, the use i'll do of it is relatively simple).
Now, i just have to try this system loading some test datas to see ho will move 'under pressure' and optimize it.
I suppost this environment will work for other long-polling situations (chat?)
Thanks to everyone who gave me hear!

I suggest: when you first query the record that might be changed, hang onto a local copy. When "updating", compare the copy in the locked table/row against your copy, and if it's changed, kick it back to the user.

PHP - Is this a good method to prevent re-submission?

This is related to preventing webform resubmission, however this time the context is a web-based RPG. After the player defeats a monster, it would drop an item. So I would want to prevent the user from hitting the back button, or keep refreshing, to 'dupe' the item-drop.
As item drop is frequent, using a DB to store a unique 'drop-transaction-id' seems infeasible to me. I am entertaining an idea below:
For each combat, creating an unique value based on the current date-time, user's id and store it into DB and session. It is possible that given a userid, you can fetch the value back
If the value from session exists in the DB, then the 'combat' is valid and allow the user to access all pages relevant to combat. If it does not exist in DB, then a new combat state is started
When combat is over, the unique value is cleared from DB.
Values which is 30mins old in the DB are purged.
Any opinions, improvements, or pitfalls to this method are welcomed

This question is very subjective, there's things you can do or can not do, depending on the already existing data / framework around it.
The solution you've provided should work, but it depends on the unique combat/loot/user data you have available.
I take it this is what you think is best? It's what I think is best :)
Get the userID, along with a unique piece of data from that fight. Something like combat start time, combat end time, etc
Store it in a Database, or what ever storage system you have
Once you collect the loot, delete that record
That way if the that userID, and that unique fight data exists, they haven't got their loot.
And you are right; tracking each piece of loot is too much, you're better off temporarily storing the data.

Seems like a reasonable approach. I assume you're storing the fact that the player is in combat somewhere anyway. Otherwise, they can just close their browser if they want to avoid a fight?
The combat ending and loot dropping should be treated as an atomary operation. If there is no fight, there can't be any dropping loot.

That depends on your game design: Do you go more in the direction of roguelikes where only turns count, and therefore long pauses in between moves are definitely possible (like consulting other people via chatroom, note: in NetHack that is not considered cheating)? Can users only save their games on certain points or at any place? That makes a huge difference in the design, e.g. making way for exploits similar to the one Thorarin mentions.
If your game goes the traditional roguelike route of only one save, turn basement and permadeath, then it would be possible to save the number of the current turn for any given character along with any game related information (inventory, maps, enemies and their state), and then check against that at any action of the player, therefore to prevent playing the turn twice.

Alternatively you could bundle everything up in client side javascript, so that even if they did resubmit the form it would generate an entirely new combat/treasure encounter.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.