I'm working on an AJAX application. The user clicks a button, his name is saved into the database, and it is shown inside a <div> whose content is fetched from the database by means of AJAX long polling. The database also contains a timestamp which represents an expiration: subscriptions after that timestamp must not be accepted. There is also a limit on the number of users who can subscribe.
I have a PHP script that is called by an AJAX request; this script queries the database and checks for expiration (the timestamp of the click is computed by JavaScript and sent via AJAX). It also checks the user limit: I have an N-to-N relationship between Users and Products (to subscribe to). These tasks obviously take time and I'm worried about possible concurrency problems. Should I use database transactions? What technique could I use to ensure the atomicity of this operation?
It depends on the kind of work done in those "long" tasks.
Generic info:
If you're only inserting user-driven data and data generated in PHP, without that data being read and/or cross-correlated with data fetched from the DB, then transactionality should not be an issue.
If you're updating data and cross-correlating it with other elements in the DB then you need to start using transactions and to carefully choose the isolation levels of the transactions you plan on using.
Transactions can seriously affect speed when concurrency rises. A very strict isolation level may be safer than your application actually needs, and you may be adding a lot of unnecessary work for the MVCC machinery.
Also, running transactions as separate PHP API calls and managing the rollback logic in the application increases the overall duration of the transaction, because it adds all the processing delays generated by PHP. If you can compact the DB communication into a single round trip containing all the queries, so much the better.
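For example, the isolation level can be chosen explicitly just before a transaction starts (a small sketch, assuming $db is an open mysqli connection; READ COMMITTED is only an illustration, not a recommendation):

<?php
// Applies to the next transaction on this connection; the MySQL/InnoDB default
// is REPEATABLE READ. Stricter levels mean more locking/MVCC work under load.
$db->query("SET TRANSACTION ISOLATION LEVEL READ COMMITTED");

$db->begin_transaction();
// ... run the related queries back to back, with no slow PHP work in between ...
$db->commit();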
Case info:
Let's consider this scenario: there are 8 slots and 7 users subscribed. Two users click the subscribe button almost simultaneously. When the control script is launched for the second clicking user, the query for the subscription of the first clicking user might still be executing. This would mean the system accepts both users as valid subscriptions.
This falls into the second case I explained, the case where you're cross-correlating user-driven data with what you have in the DB. You're reading the state of the DB before you commit the user-driven data, so yes, you would need transactions in this case.
There may be a way to exploit the inherent atomicity of a single UPDATE statement. Any UPDATE table_name SET x = x + 1 WHERE a = 'value'; is guaranteed to be atomic. You can use this to your advantage.
All subscribing PHP requests must first decrement a free-slot counter. If the number of affected rows of that decrement is not 0, the decrement succeeded and they can carry on inserting the user-related data; otherwise, inform the user that he was 0.3 ms too slow.
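A minimal sketch of that idea with mysqli, assuming a hypothetical products table with a slots_left column and a subscriptions table (none of these names come from the question):

<?php
// Make mysqli throw exceptions on errors.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$db = new mysqli("localhost", "user", "pass", "mydb");

$productId = (int) $_POST['product_id'];   // sent by the AJAX request
$userId    = (int) $_POST['user_id'];

// Atomic decrement: the row is only changed if a slot is still free and the
// subscription period has not expired. No explicit transaction is needed.
$dec = $db->prepare(
    "UPDATE products
        SET slots_left = slots_left - 1
      WHERE product_id = ? AND slots_left > 0 AND expires_at > NOW()"
);
$dec->bind_param("i", $productId);
$dec->execute();

if ($dec->affected_rows === 1) {
    // We claimed a slot, so it is safe to record the subscription.
    $ins = $db->prepare("INSERT INTO subscriptions (product_id, user_id) VALUES (?, ?)");
    $ins->bind_param("ii", $productId, $userId);
    $ins->execute();
    echo json_encode(["ok" => true]);
} else {
    // Full or expired: the user was 0.3 ms too slow.
    echo json_encode(["ok" => false]);
}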
Related
When the web server receives a request for my PHP script, I presume the server creates a dedicated process to run the script. If another request to the same script arrives before the script exits, another process gets started -- am I correct, or will the second request be queued by the server, waiting for the first request to exit? (Question 1)
If the former is correct, i.e. the same script can run simultaneously in different processes, then both instances will try to access my database.
When I connect to the database in the script:
$DB = mysqli_connect("localhost", ...);
query it, perform some more or less lengthy calculations and update it, I don't want the contents of the database to be modified by another running instance of the script.
Question 2: Does it mean that from connecting to the database until closing it:
mysqli_close($DB);
the database is blocked for any access from other software components? If so, it effectively prevents the script instances from running concurrently.
UPDATE: @OllieJones kindly explained that the database was not blocked.
Let's consider the following scenario. The script in the first process discovers an eligible user in the Users table and starts preparing data to append for that user in the Counter table. At this moment the script in the other process preempts it and deletes the user from the Users table along with the associated data in the Counter table; it then gets preempted by the first script, which writes data for a user that no longer exists. That data ends up orphaned, i.e. inaccessible.
How to prevent such a contention?
In modern web servers, there's a pool of processes (or possibly threads) handling requests from users. Concurrent requests to the same script can run concurrently. Each request-handler has its own connection to the DBMS (they're actually maintained in a pool, but that's a story for another day).
The database is not blocked while individual request-handlers are using it, unless you block it explicitly by locking a table or doing a request like SELECT ... FOR UPDATE. For more information on this deep topic, read about transactions.
Therefore, it's important to write your database queries in such a way that they won't interfere with each other. For example, if you need to learn the value of an auto-incremented column right after you insert a row, you should use LAST_INSERT_ID() or mysqli_insert_id() instead of trying to query the database: another user may have inserted another row in the meantime.
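For instance (a throwaway sketch; the orders table is made up):

<?php
$db = mysqli_connect("localhost", "user", "pass", "mydb");

mysqli_query($db, "INSERT INTO orders (user_id, total) VALUES (42, 19.99)");

// Safe: returns the AUTO_INCREMENT value generated on *this* connection,
// no matter what other clients insert in the meantime.
$orderId = mysqli_insert_id($db);

// Fragile: another client may have inserted a newer row between the two queries.
// $res = mysqli_query($db, "SELECT MAX(id) FROM orders");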
The system test discipline for scaled-up web sites usually involves a rigorous load test in order to shake out all this concurrency.
If you're doing a bunch of work on a particular entity, in your case a User, you use a transaction.
First you do
BEGIN
to start the transaction. Then you do
SELECT whatever FROM User WHERE user_id = <<whatever>> FOR UPDATE
to choose the user and mark that user's row as busy-being-updated. Then you do all the work you need to do to fill out various rows in various tables relating to that user.
Finally you do
COMMIT
If you messed things up, or don't want to go through with the change, you do
ROLLBACK
and all your changes will be restored to their state right before the SELECT ... FOR UPDATE.
Why does this work? Because if another client does the same SELECT ... FOR UPDATE, MySQL will delay that request until the first one either commits or rolls back.
If another client works with a different userid, the operations may proceed concurrently.
You need the InnoDB access method to use transactions: MyISAM doesn't support them.
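Putting those pieces together in PHP might look like this (a rough sketch with mysqli; the credit column, the work done inside the transaction, and the $userId / $amount inputs are placeholders, not part of the answer):

<?php
// Make mysqli throw exceptions on errors so the catch/rollback path actually runs.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$db = new mysqli("localhost", "user", "pass", "mydb");

$db->begin_transaction();                               // BEGIN
try {
    // Mark this user's row as busy-being-updated. A second client issuing the
    // same SELECT ... FOR UPDATE for the same user_id waits here until we
    // COMMIT or ROLLBACK.
    $sel = $db->prepare("SELECT credit FROM User WHERE user_id = ? FOR UPDATE");
    $sel->bind_param("i", $userId);
    $sel->execute();
    $row = $sel->get_result()->fetch_assoc();

    // ... fill out the various rows in the various tables relating to this user ...
    $upd = $db->prepare("UPDATE User SET credit = credit - ? WHERE user_id = ?");
    $upd->bind_param("di", $amount, $userId);
    $upd->execute();

    $db->commit();                                      // COMMIT
} catch (Throwable $e) {
    $db->rollback();                                    // ROLLBACK: back to the state before the SELECT ... FOR UPDATE
    throw $e;
}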
Multiple reads can be done concurrently; a write operation will block all other operations, and a read will block all writes. (This describes table-level locking, as with MyISAM; InnoDB's row-level locks and MVCC avoid most of it.)
How to implement pessimistic locking in a php/mysql web application?
web-user opens a page to edit one dataset (row)
web-user clicks on the button "lock", so other users are able to read but not to write this dataset
web-user makes some modifications (takes maybe 1 to 30 minutes)
web-user clicks "save" or "cancel" and the "lock" is removed
Are there standard methods in PHP/MySQL for this scenario? What happens if the web user never clicks "save"/"cancel" but just closes Internet Explorer?
You need to add a LOCKDATE and a LOCKWHO field to your table. I've done that in many applications outside of PHP/MySQL and it always works the same way.
The lock expires when its TTL has passed, so you can compare NOW() with LOCKDATE to see whether the object has been locked for more than 30 minutes or 1 hour, as you wish.
Another thing to consider is whether the current user is the one holding the lock; that's why you also need a LOCKWHO. This can be a user_id from your database or a session_id from PHP, but keep it to something that identifies a user: an IP address is not a good way to do it.
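A possible sketch of that check (assuming $db is an open mysqli connection, a hypothetical dataset table, and a 30-minute TTL; $rowId and $userId would come from the request and the session):

<?php
// Take the lock if the row is unlocked, the previous lock has expired,
// or the current user already holds it (which refreshes LOCKDATE).
$stmt = $db->prepare(
    "UPDATE dataset
        SET LOCKWHO = ?, LOCKDATE = NOW()
      WHERE id = ?
        AND (LOCKWHO IS NULL
             OR LOCKDATE < NOW() - INTERVAL 30 MINUTE
             OR LOCKWHO = ?)"
);
$stmt->bind_param("sis", $userId, $rowId, $userId);
$stmt->execute();

if ($stmt->affected_rows === 1) {
    // Lock acquired: render the edit form.
} else {
    // Someone else holds a fresh lock: render the record read-only.
}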
Finally, always think of a mass-unlock feature that simply resets all LOCKDATEs and LOCKWHOs...
Cheers
I would write the locks in one centralized table instead of adding fields to all tables.
Example table structure:
tblLocks
TableName (The name of the locked table)
RowID (Primary key of locked table row)
LockDateTime (When the row was locked)
LockUser (Who locked the row)
With this approach you can find all locks held by a user without having to scan every table. You could kill all of a user's locks when they log out, for example.
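A sketch of that approach (the column types, the composite primary key and the mysqli calls are my assumptions, not part of the answer):

<?php
// One-time setup: the composite primary key allows at most one lock per row.
$db->query(
    "CREATE TABLE IF NOT EXISTS tblLocks (
         TableName    VARCHAR(64) NOT NULL,  -- name of the locked table
         RowID        INT         NOT NULL,  -- primary key of the locked row
         LockDateTime DATETIME    NOT NULL,  -- when the row was locked
         LockUser     INT         NOT NULL,  -- who locked the row
         PRIMARY KEY (TableName, RowID)
     )"
);

// Taking a lock is a plain INSERT; a concurrent attempt on the same row fails
// with a duplicate-key error (execute() returns false or throws, depending on
// the mysqli error-reporting mode).
$ins = $db->prepare(
    "INSERT INTO tblLocks (TableName, RowID, LockDateTime, LockUser)
     VALUES (?, ?, NOW(), ?)"
);
$ins->bind_param("sii", $tableName, $rowId, $userId);
$gotLock = $ins->execute();

// Releasing every lock a user holds, e.g. when they log out:
$del = $db->prepare("DELETE FROM tblLocks WHERE LockUser = ?");
$del->bind_param("i", $userId);
$del->execute();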
Traditionally this is done with a boolean locked column on the record in the database that is flagged appropriately.
It is inherent in this sort of locking that the lock has to be released, and circumstances may prevent that from happening naturally (system crashes, user stupidity, dropped network packets, etc.). This is why you would need to provide some manual unlock method and/or impose a time limit (maybe with a cron job?) on how long a record can stay locked. You could implement some kind of AJAX poll to keep the record locked while the browser is still open. At any rate, you would probably be best to verify that the data in the record is the same as it was when the lock was acquired before you modify it.
This limitation is particularly prevalent in web applications, but is true of anything that uses this approach. Sage Line 50, for one, is a bugger for it; I regularly have to delete lock files after machine/application crashes.
How long can a MySQL transaction last until it times out? I'm asking because I'm planning to code a payment process for my e-commerce project somewhere along the lines of this (PHP/MySQL pseudo-code):
START TRANSACTION;
SELECT...WHERE id IN (1,2,3) AND available = 1 FOR UPDATE; //lock rows where "available" is true
//Do payment processing...
//add to database, commit or rollback based on payment results
I cannot think of another way to lock the products being bought (so that if two users buy the same item at the same time and there is only one left in stock, only one of them can buy it), process the payment if the products are available, and create a record based on the payment results...
That technique would also block users who simply wanted to see the products other people are buying. I'd be exceptionally wary of any technique that relies on database row locking to enforce inventory management.
Instead, why not simply record the number of items currently tied up in an active "transaction" (here meaning the broader commercial sense, rather than the technical database sense). If you have a current_inventory field, add an on_hold or being_paid_for or not_really_available_because_they_are_being_used_elsewhere field that you can update with information on current payments.
Better yet, why not use a purchase / payment log to sum the items currently "on hold" or "in processing" for several different users.
This is the general approach you often see on sites like Ticketmaster that declare, "You have X minutes to finish this page, or we'll put these tickets back on the market." They're recording which items the user is currently trying to buy, and those records can even persist across PHP page requests.
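A rough sketch of that idea (the inventory, on_hold and purchase_holds names are invented for the example, not taken from the answer):

<?php
// Put one unit on hold for this checkout, but only if something is left once
// the units other people are currently paying for are subtracted.
$hold = $db->prepare(
    "UPDATE inventory
        SET on_hold = on_hold + 1
      WHERE product_id = ?
        AND current_inventory - on_hold > 0"
);
$hold->bind_param("i", $productId);
$hold->execute();

if ($hold->affected_rows === 1) {
    // Log the hold with a timestamp so a cron job can release abandoned ones
    // later ("you have X minutes to finish this page").
    $log = $db->prepare(
        "INSERT INTO purchase_holds (product_id, user_id, held_at) VALUES (?, ?, NOW())"
    );
    $log->bind_param("ii", $productId, $userId);
    $log->execute();
    // ... hand off to the payment provider; no database row locks are held while the user pays ...
} else {
    // Nothing left to sell once the current holds are counted.
}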
If you have to ask how long it is before a database connection times out, then your transactions take orders of magnitude too long.
Long open transactions are a big problem and frequent causes of poor performance, unrepeatable bugs or even deadlocking the complete application. Certainly in a web application you want tight fast transactions to make sure all table and row level locks are quickly freed.
I have found that even a few hundred milliseconds can become troublesome.
Then there is the problem of sharing a transaction over multiple requests which may happen concurrently.
If you need to "emulate" long running transactions, cut it into smaller pieces which can be executed fast, and keep a log so you can rollback using the log by undoing the transactions.
Now, if the payment service completes in 98% of cases in less than 2 sec and you do not have hundreds of concurrent requests going on, it might just be fine.
The timeout depends on server settings: both those of MySQL and those of the language you are using to interact with MySQL. Look in your server's configuration files.
I don't think what you are doing would cause a timeout, but if you are worried you might want to rethink the location of your check so that it doesn't actually lock the tables across queries. You could instead have a stored procedure that is built into the data layer rather than relying on two separate calls. Or, maybe a conditional insert or a conditional update?
All in all, as another person noted, I don't like the idea of locking entire table rows which you might want to be able to select from for other purposes outside of the actual "purchase" step, as it could result in problems or bottlenecks elsewhere in your application.
We have a PHP application which selects a row from the database, works on it (calls an external API which uses a web service), and then inserts a new record based on the work done. There's an AJAX display which informs the user of how many records have been processed.
The data is mostly text, so it's rather heavy.
The process handles thousands of records at a time. The user can choose how many records to start working on. The data is obtained from one table, where the records are marked as "done". There is no WHERE condition, except the optional WHERE date BETWEEN date1 AND date2.
We had an argument over which approach is better:
Select one record, work on it, and insert the new data
Select all of the records, work on them in memory, and insert the results into the database after all the work is done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends on how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (ie. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it" (their status will still be "Processing"), as well as any errors.
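A rough sketch of that claim-then-process loop with PDO against PostgreSQL (the records table, its columns and the exact status values are placeholders; SKIP LOCKED is my addition on top of the plain FOR UPDATE described above, and call_external_api() stands in for the web-service call):

<?php
$pdo = new PDO("pgsql:host=localhost;dbname=mydb", "user", "pass");
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

while (true) {
    // 1. Claim exactly one pending record inside a short transaction.
    $pdo->beginTransaction();
    $row = $pdo->query(
        "SELECT id, payload FROM records
          WHERE status = 'pending'
          ORDER BY id
          LIMIT 1
          FOR UPDATE SKIP LOCKED"        // skip rows another worker has locked
    )->fetch(PDO::FETCH_ASSOC);

    if ($row === false) {                // nothing left to process
        $pdo->rollBack();
        break;
    }

    $pdo->prepare("UPDATE records SET status = 'Processing' WHERE id = ?")
        ->execute([$row['id']]);
    $pdo->commit();                      // 2. The claim is now visible to everyone.

    // 3. Do the slow work (the external API / web-service call) outside any
    //    transaction; if this worker dies here, the record stays "Processing"
    //    and will not be picked up and charged a second time.
    try {
        $result = call_external_api($row['payload']);   // placeholder function
        $status = 'Complete';
    } catch (Throwable $e) {
        $result = null;
        $status = 'Error';
    }

    // 4. Store the outcome.
    $pdo->prepare("UPDATE records SET status = ?, result = ? WHERE id = ?")
        ->execute([$status, $result, $row['id']]);
}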
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely to get all the needed data, work on all of it, and then put it back.
Efficiency-wise, regardless of language: if you are fetching single items and working on them individually, you are probably also opening and closing the database connection each time. This means that if you have thousands of items, you will open and close thousands of connections. The overhead of this far outweighs the overhead of returning all of the items and working on them in one go.
Handling Multiple Users
Requirements:
I have an application (MySQL, PHP, jQuery) where the users can:
Review records and update certain fields.
Issue invoices by selecting orders.
Issues:
The issue is that an invoice should not be issued twice for the same time period. Also, a field should not be updated by two or more users at the same time.
Possible Solutions:
Lock the tables when they get updated, and if the user performs an action, notify and reload.
Implement a lock system, so that when a user performs certain actions, those actions are locked and cannot be performed by other users.
...
Look up 'optimistic locking': it basically means adding a version attribute, passing it back, and incrementing it with each update to make sure nobody else got there first. If N users try the same operation based on the same version, one wins and the others lose. It's fast, simple, and easy for a wide variety of cases.
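A bare-bones sketch of that version check (the orders table, its columns and the variables are invented for the example; a timestamp column can play the same role as the integer version):

<?php
// The row was read earlier together with its version number, and the version
// travelled to the browser (e.g. as a hidden form field) and back.
$stmt = $db->prepare(
    "UPDATE orders
        SET invoice_id = ?, version = version + 1
      WHERE id = ? AND version = ?"
);
$stmt->bind_param("iii", $invoiceId, $orderId, $versionSeenByUser);
$stmt->execute();

if ($stmt->affected_rows === 1) {
    // This user won: the version has moved on, so everyone else's stale update fails.
} else {
    // Somebody else got there first: reload the record and ask the user to retry.
}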
I don't know if this will help you or not, but I first read about this in the context of .NET's DataTable adapter, which tracks the changes made to data rows since you read them and sends them back to the DB afterwards. What it does is send all the fields instead of just the changed ones.
You can use timestamps for the rows: read the timestamp along with the other data, and before saving, check whether the row's current timestamp is newer than the one you have. This way you can minimize locking to just this portion: comparing timestamps and updating only if you are the first one to get there.
Thank you both. I will look into both options: (1) optimistic locking (http://cwiki.apache.org/CAY/optimistic-locking-explained.html) and (2) the timestamp approach.