What happens if there are two people sending the same query at the same time to the database and one makes the other query return something different?
I have a shop where there is one item left. Two or more people buy the item and their queries arrive at exactly the same time on the MySQL server. My guess is that they will just queue, but if so, how does MySQL pick the first one to execute, and can I influence this?
sending the same query at the same time
QUERIES DO NOT ALWAYS RUN IN PARALLEL
It depends on the storage engine. With MyISAM, nearly every query acquires a table-level lock, meaning that the queries run sequentially, as a queue. With most of the other engines they may run in parallel.
echo_me says nothing happens at the exact same time and a CPU does not do everything at once
That's not exactly true. A DBMS could run on a machine with more than one CPU and more than one network interface. It's very improbable that 2 queries could arrive at the same time, but not impossible, hence there is a mutex to ensure that the parsing/execution transition only runs as a single thread (of execution, not necessarily the same lightweight process).
There are 2 approaches to handling concurrent DML. One is to use transactions (where each user effectively gets a clone of the database): when the queries have completed, the DBMS tries to reconcile any changes, and if the reconciliation fails, it rolls back one of the queries and reports it as failed. The other approach is to use row-level locking: the DBMS identifies the rows which will be updated by a query and marks them as reserved for update (other users can read the original version of each row, but any attempt to update the data will be blocked until the row is available again).
Your problem is that you have two MySQL clients, each of which has retrieved the fact that there is one item of stock left. This is further complicated by the fact that (since you mention PHP) the stock levels may have been retrieved in a different DBMS session than the subsequent stock adjustment: you cannot have a transaction spanning more than one HTTP request. Hence you need to revalidate any fact maintained outside the DBMS within a single transaction.
Optimistic locking can create a pseudo-transaction control mechanism: you flag a record you are about to modify with a timestamp and the user identifier (with PHP, the PHP session ID is a good choice). If, when you come to modify it, something else has changed it, then your code knows the data it retrieved previously is invalid. However, this can lead to other complications.
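To make the optimistic approach concrete, here is a minimal sketch. It uses a version counter rather than the timestamp-plus-session-ID flag described above (a common variant of the same idea), and Python's built-in sqlite3 as a stand-in for MySQL so it runs on its own. The items table, its version column, and the helper names are all assumptions for illustration.

```python
import sqlite3

def make_db():
    # In-memory SQLite stands in for the real database in this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, quantity INTEGER, version INTEGER)"
    )
    conn.execute("INSERT INTO items VALUES (100, 1, 0)")
    conn.commit()
    return conn

def read_item(conn, item_id):
    # Returns (quantity, version) as seen at read time.
    return conn.execute(
        "SELECT quantity, version FROM items WHERE id = ?", (item_id,)
    ).fetchone()

def write_if_unchanged(conn, item_id, new_quantity, seen_version):
    # The UPDATE only matches if nobody bumped the version since we read it.
    cur = conn.execute(
        "UPDATE items SET quantity = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_quantity, item_id, seen_version),
    )
    conn.commit()
    return cur.rowcount == 1

conn = make_db()
qty_a, ver_a = read_item(conn, 100)   # buyer A reads: 1 in stock, version 0
qty_b, ver_b = read_item(conn, 100)   # buyer B reads the same state
print(write_if_unchanged(conn, 100, qty_a - 1, ver_a))  # True: A's write applies
print(write_if_unchanged(conn, 100, qty_b - 1, ver_b))  # False: B's data is stale
```

When the second write reports False, the caller has to re-read the row and decide what to do (here, tell buyer B the item is gone).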
They are executed as soon as the user requests it, so if there are 10 users requesting the query at the exact same time, then there will be 10 queries executed at the exact same time.
Nothing happens at the exact same time, and a CPU does not do everything at once. It does things one at a time (per core and/or thread). If 10 users are accessing pages which run queries, they will "hit" the server in a specific order and be processed in that order (although they may be only milliseconds apart). However, if there are multiple queries on a page, you can't be sure that all the queries on one user's page will complete before the queries on another user's page are started. This can lead to concurrency problems.
edit:
Run SHOW PROCESSLIST to find the id of the connection you want to kill.
SHOW PROCESSLIST will give you a list of all currently running queries.
from here.
MySQL will perform well with fast CPUs because each query runs in a single thread and can't be parallelized across CPUs.
How MySQL uses memory
Consider a query similar to:
UPDATE items SET quantity = quantity - 1 WHERE id = 100
However many queries the MySQL server runs in parallel, if 2 such queries run and the row with id 100 has quantity 1, then something like this will happen by default:
The first query locks the row in items where id is 100
The second query tries to do the same, but the row is locked, so it waits
The first query changes the quantity from 1 to 0 and unlocks the row
The second query tries again and now sees the row is unlocked
The second query locks the row in items where id is 100
The second query changes the quantity from 0 to -1 and unlocks the row
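The end result of that sequence can be reproduced in a few lines. This is only a sketch: it runs the two updates one after the other on an in-memory SQLite database (standing in for MySQL), which is effectively what the row lock forces. The guarded variant with AND quantity > 0 is not part of the answer above; it is one common way to stop the second buyer, shown here as an assumption.

```python
import sqlite3

UNGUARDED = "UPDATE items SET quantity = quantity - 1 WHERE id = 100"
# Assumed guard: the row only matches while stock is still positive.
GUARDED = "UPDATE items SET quantity = quantity - 1 WHERE id = 100 AND quantity > 0"

def run_two_buyers(update_sql):
    # Fresh table with one item left, then two purchases back to back.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, quantity INTEGER)")
    conn.execute("INSERT INTO items VALUES (100, 1)")
    for _ in range(2):
        conn.execute(update_sql)
    conn.commit()
    return conn.execute("SELECT quantity FROM items WHERE id = 100").fetchone()[0]

print(run_two_buyers(UNGUARDED))  # -1: both purchases went through
print(run_two_buyers(GUARDED))    # 0: the second update matched no row
```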
This is essentially a concurrency question. MySQL lets you handle concurrent access safely by using transactions. This means that in your eshop you can ensure that race conditions like the ones you describe won't be an issue. See the links below about transactions in MySQL.
http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-transactions.html
http://zetcode.com/databases/mysqltutorial/transactions/
Depending on your isolation level different outcomes will be returned from two concurrent queries.
Queries in MySQL are handled in parallel. You can read more about the implementation here.
Related
I have two tables, 'reservation' and 'spot'. During the reservation process the 'spotStatus' column in the spot table is checked and, if the spot is free, it is updated. A user is allowed to reserve only one spot, so what can I do to make sure that no other user can reserve the same spot?
Referring to some answers here, I found row locking and table locking as solutions. Should I perform a query like
"select * from spot where spotId = id for update;"
and then perform the necessary update to the status, or are there other, more elegant ways to do it?
My concern is what happens to the locked row if:
1. The transaction does not complete successfully?
2. Both users try to reserve the same row at the same time? Are both transactions cancelled?
And when is the lock released?
The problem here is a race condition, which even transactions will not prevent by default if used naively. If 2 reservations happen simultaneously, for example originating from 2 different Apache processes running PHP, transactional locking will just ensure the reservations are properly serialized, and as such the second one will still overwrite the first.
Usually this situation is of no real concern. Given the speed of databases and servers as a whole, compared to the load on an average reservation site, the chances of this ever causing a problem are less than winning the state lottery twice in a row. If, however, you are implementing a site that's going to sell 50k Coldplay concert tickets in 30 seconds, the chances rise sharply.
A simple solution to this is to implement a sort of 'reservation intent' by not overwriting the spot reservation directly, but by appending the intent-to-reserve to a separate timestamped table. After this insertion you can then clean up this table for duplicates, preferring the oldest, and apply that one to the real-time data.
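A rough sketch of that 'reservation intent' idea, again with sqlite3 standing in for MySQL. The reservation_intent table, its columns, and the reservedBy column are made-up names for illustration; only 'spot' and 'spotId' come from the question.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE spot (spotId INTEGER PRIMARY KEY, reservedBy TEXT);
        CREATE TABLE reservation_intent (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            spotId INTEGER, userId TEXT, createdAt REAL);
        INSERT INTO spot VALUES (1, NULL);
    """)
    return conn

def add_intent(conn, spot_id, user_id, created_at):
    # Requests never touch 'spot' directly; they only append an intent.
    conn.execute(
        "INSERT INTO reservation_intent (spotId, userId, createdAt) VALUES (?, ?, ?)",
        (spot_id, user_id, created_at),
    )
    conn.commit()

def apply_intents(conn):
    # For each free spot, the oldest intent wins; then the intents are cleared.
    with conn:
        conn.execute("""
            UPDATE spot SET reservedBy = (
                SELECT i.userId FROM reservation_intent i
                WHERE i.spotId = spot.spotId
                ORDER BY i.createdAt, i.id LIMIT 1)
            WHERE reservedBy IS NULL
              AND EXISTS (SELECT 1 FROM reservation_intent i
                          WHERE i.spotId = spot.spotId)
        """)
        conn.execute("DELETE FROM reservation_intent")

conn = make_db()
add_intent(conn, 1, "bob", 2.0)
add_intent(conn, 1, "alice", 1.0)   # earlier timestamp, so alice should win
apply_intents(conn)
print(conn.execute("SELECT reservedBy FROM spot WHERE spotId = 1").fetchone()[0])
```

The clean-up step is the only writer to the spot table, so it can run as a single periodic job.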
1. If it's not successful, the database returns to the state it was in before the transaction (rollback), as if it never happened.
2. The same as if they had not arrived at the same time: only one of them will get the lock, and the other reservation won't be created.
If you are using Teradata, you can use the queue table concept.
Let's say I have two files file1.php and file2.php.
file1.php has the following queries:
-query 1
-query 2
-query 3
file2.php has the following queries:
-query 4
-query 5
-query 6
Let's say one visitor runs the first script and another visitor runs the second one exactly at the same time.
My question is: does MySQL receive one connection and keep the second connection in queue while executing all queries of the first script, and then moves on to the second connection?
Will the order of queries processed by MySQL be 1,2,3,4,5,6 (or 4,5,6,1,2,3) or can it be in any order?
What can I do to make sure MySQL executes all queries of one connection before moving on to another connection?
I'm concerned with data integrity. For example, take an account balance accessed by two users who share the same account. They might see the same value, but if they both send queries at the same time, this could lead to some unexpected outcome.
The database can accept queries from multiple connections in parallel. It can execute the queries in arbitrary order, even at the same time. The isolation level defines how much the parallel execution may affect the results:
If you don't use transactions, the queries can be executed in parallel, and the strongest isolation level still guarantees only that the queries will return the same result as if they were not executed in parallel; they can still be run in any order (although the order within each connection is preserved).
If you use transactions, the database can guarantee more:
The strongest isolation level is serializable, which means the results will be as if no two transactions ran in parallel, but the performance will suffer.
The weakest isolation level is the same as not using transactions at all; anything could happen.
If you want to ensure data consistency, use transactions:
START TRANSACTION;
...
COMMIT;
The default isolation level for InnoDB is REPEATABLE READ, which is roughly equivalent to serializable except that ordinary SELECTs are non-locking reads. If you use SELECT ... FOR UPDATE for every SELECT within the transaction, you get behavior close to serializable.
See: http://dev.mysql.com/doc/refman/5.0/en/set-transaction.html#isolevel_repeatable-read
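As a small illustration of the START TRANSACTION ... COMMIT pattern applied to the shared account balance from the question, here is a sketch using sqlite3 in place of MySQL. The account table and the transfer helper are assumptions; the point is only that either both updates are committed or neither is.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 0)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    try:
        # 'with conn' commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False

conn = make_db()
print(transfer(conn, 1, 2, 60))  # True: 100 -> 40 and 0 -> 60
print(transfer(conn, 1, 2, 60))  # False: rolled back, balances unchanged
```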
In general, you cannot predict or control order of execution - it can be 1,2,3,4,5,6 or 4,5,6,1,2,3 or 1,4,2,5,3,6 or any combination of those.
MySQL executes queries from multiple connections in parallel, and server performance is shared across all clients (2 in your case).
I don't think you have a reason to worry or change this - MySQL was created with this in mind, to be able to serve multiple connections.
If you have performance problems, they typically can be solved by adding indexes or changing your database schema - like normalizing or denormalizing your tables.
You may limit max_connections to 1, but then other connections will get a "Too many connections" error. Limiting concurrent execution makes no sense.
Run the operation as a transaction and set autocommit to false.
Access all of your tables in the same order as this will prevent deadlock.
How long can a MySQL transaction last until it times out? I'm asking because I'm planning to code a payment process for my e-commerce project somewhere along the lines of this (PHP/MySQL pseudo-code):
START TRANSACTION;
SELECT...WHERE id IN (1,2,3) AND available = 1 FOR UPDATE; //lock rows where "available" is true
//Do payment processing...
//add to database, commit or rollback based on payment results
I can not think of another way to lock the products being bought (so if two users buy it at the same time, and there is only one left in stock, one user won't be able to buy), process payment if products are available, and create a record based on payment results...
That technique would also block users who simply wanted to see the products other people are buying. I'd be exceptionally wary of any technique that relies on database row locking to enforce inventory management.
Instead, why not simply record the number of items currently tied up in an active "transaction" (here meaning the broader commercial sense, rather than the technical database sense). If you have a current_inventory field, add an on_hold or being_paid_for or not_really_available_because_they_are_being_used_elsewhere field that you can update with information on current payments.
Better yet, why not use a purchase / payment log to sum the items currently "on hold" or "in processing" for several different users.
This is the general approach you often see on sites like Ticketmaster that declare, "You have X minutes to finish this page, or we'll put these tickets back on the market." They're recording which items the user is currently trying to buy, and those records can even persist across PHP page requests.
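One way the "on hold" bookkeeping might look, sketched with sqlite3 in place of MySQL. The holds table, its columns, and the 600-second default are invented for the example; current_inventory is the field name suggested above. A hold is only inserted if stock minus unexpired holds still covers it, and the check and the insert happen in one statement.

```python
import sqlite3

def make_db(stock):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (id INTEGER PRIMARY KEY, current_inventory INTEGER);
        CREATE TABLE holds (item_id INTEGER, user_id TEXT, qty INTEGER, expires_at REAL);
    """)
    conn.execute("INSERT INTO items VALUES (1, ?)", (stock,))
    conn.commit()
    return conn

def place_hold(conn, item_id, user_id, qty, now, ttl=600):
    # Insert the hold only if inventory minus active (unexpired) holds >= qty.
    cur = conn.execute(
        """
        INSERT INTO holds (item_id, user_id, qty, expires_at)
        SELECT ?, ?, ?, ?
        WHERE (SELECT current_inventory FROM items WHERE id = ?)
              - (SELECT COALESCE(SUM(qty), 0) FROM holds
                 WHERE item_id = ? AND expires_at > ?) >= ?
        """,
        (item_id, user_id, qty, now + ttl, item_id, item_id, now, qty),
    )
    conn.commit()
    return cur.rowcount == 1

conn = make_db(stock=1)
print(place_hold(conn, 1, "user_a", 1, now=0))    # True: the last item is held
print(place_hold(conn, 1, "user_b", 1, now=10))   # False: still on hold
print(place_hold(conn, 1, "user_b", 1, now=700))  # True: user_a's hold expired
```

Because the hold is just a row, it survives across page requests, and no database lock is kept open while the payment is being processed.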
If you have to ask how long it is before a database connection times out, then your transactions take orders of magnitude too long.
Long open transactions are a big problem and a frequent cause of poor performance, unrepeatable bugs or even deadlocks that freeze the complete application. Certainly in a web application you want tight, fast transactions to make sure all table- and row-level locks are quickly freed.
I found that even a few hundred milliseconds can become troublesome.
Then there is the problem of sharing a transaction over multiple requests which may happen concurrently.
If you need to "emulate" long-running transactions, cut them into smaller pieces which can be executed fast, and keep a log so you can roll back by undoing the logged steps.
Now, if the payment service completes in 98% of cases in less than 2 sec and you do not have hundreds of concurrent requests going on, it might just be fine.
Timeout depends on server settings -- both that of mysql and that of the language you are using to interact with mysql. Look in the settings files for your server.
I don't think what you are doing would cause a timeout, but if you are worried you might want to rethink the location of your check so that it doesn't actually lock the tables across queries. You could instead have a stored procedure that is built into the data layer rather than relying on two separate calls. Or, maybe a conditional insert or a conditional update?
All in all, as another person noted, I don't like the idea of locking entire table rows which you might want to be able to select from for other purposes outside of the actual "purchase" step, as it could result in problems or bottlenecks elsewhere in your application.
Let's say you have a table with some winning numbers in it. Each of these numbers is meant to be "won" by only one person.
How could I prevent 2 simultaneous web requests that submit the same number from both checking and seeing that the number is still available, and then giving the prize to both of them before the number is marked as no longer available?
The winning solution in this question feels like what I was thinking of doing, as it can also be applied in most database platforms.
Is there any other common pattern that can be applied to this?
These numbers are randomly generated or something?
I would rely on the transactional semantics in the database itself: Create a table with two columns, number and claimed, and use a single update:
UPDATE winners SET claimed=1 WHERE claimed=0 AND number=#num;
Then check the number of affected rows.
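Checking the affected-row count could look like the following. It is a sketch with sqlite3 standing in for whatever database you use; the claim helper is a made-up name, while the winners table and its columns follow the UPDATE above.

```python
import sqlite3

def make_db(numbers):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE winners (number INTEGER PRIMARY KEY, claimed INTEGER DEFAULT 0)"
    )
    conn.executemany("INSERT INTO winners (number) VALUES (?)", [(n,) for n in numbers])
    conn.commit()
    return conn

def claim(conn, num):
    # Exactly one affected row means this request is the one that won.
    cur = conn.execute(
        "UPDATE winners SET claimed = 1 WHERE claimed = 0 AND number = ?", (num,)
    )
    conn.commit()
    return cur.rowcount == 1

conn = make_db([7, 42])
print(claim(conn, 42))  # True: first request gets the prize
print(claim(conn, 42))  # False: already claimed
print(claim(conn, 99))  # False: not a winning number
```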
Use transactions. You should never have multiple threads or processes changing the same data without transactional locks, and any decent database supports transactions today. Start the transaction, "grab" the winning number, and then commit. Another thread would be blocked until the commit, and would only get its chance after the records are updated, at which point it would see that the number has already been claimed.
A non-database solution could be to have the client make the request async and then push the request on a FIFO queue to control the requests so that only one request at a time is getting evaluated. Then respond back to the client when the evaluation is complete. The advantage here would be that under high load, the UI would not be frozen where it would be with transactional locking on the database level.
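A bare-bones sketch of the queue idea, using Python's standard queue and threading modules. Everything here (start_worker, submit, the in-memory set of available numbers) is an assumption for illustration; a real deployment would more likely use a message broker, but the shape is the same: many producers, one consumer that evaluates requests strictly in arrival order.

```python
import queue
import threading

def start_worker(available):
    requests = queue.Queue()  # FIFO: requests are evaluated in arrival order

    def worker():
        while True:
            item = requests.get()
            if item is None:  # sentinel: shut the worker down
                break
            number, reply = item
            won = number in available
            if won:
                available.discard(number)
            reply.put(won)  # respond to the waiting client

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return requests, thread

def submit(requests, number):
    # Returns a one-slot queue the caller can wait on for the result.
    reply = queue.Queue(maxsize=1)
    requests.put((number, reply))
    return reply

requests, thread = start_worker({42})
first = submit(requests, 42)
second = submit(requests, 42)
print(first.get(timeout=5))   # True: first in the queue wins
print(second.get(timeout=5))  # False: number already taken
requests.put(None)
thread.join(timeout=5)
```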
We have this PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new register based on the work done. There's an AJAX display which informs the user of how many registers have been processed.
The data is mostly text, so it's rather heavy data.
The process handles thousands of registers at a time. The user can choose how many registers to start working on. The data is obtained from one table, where the registers are marked as "done". There is no "WHERE" condition, except the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one register, work on it, and insert the new data
Select all of the registers, work with them in memory and insert them in the database after all the work was done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (ie. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it", because their status will be "Processing", and any errors.
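The steps above can be sketched roughly as follows. SQLite has no SELECT ... FOR UPDATE, so this stand-in uses BEGIN IMMEDIATE to take the write lock before selecting; in PostgreSQL you would use the FOR UPDATE form described above. The register table, the 'Pending' status, and the work callback are assumptions for the example.

```python
import sqlite3

def make_db(n):
    # isolation_level=None: we issue BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE register (id INTEGER PRIMARY KEY, status TEXT, result TEXT)"
    )
    conn.executemany(
        "INSERT INTO register (id, status) VALUES (?, 'Pending')",
        [(i,) for i in range(1, n + 1)],
    )
    return conn

def claim_next(conn):
    # Short transaction: pick one pending record and mark it 'Processing'.
    conn.execute("BEGIN IMMEDIATE")
    row = conn.execute(
        "SELECT id FROM register WHERE status = 'Pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE register SET status = 'Processing' WHERE id = ?", row)
    conn.execute("COMMIT")
    return None if row is None else row[0]

def process_all(conn, work):
    while True:
        rec_id = claim_next(conn)
        if rec_id is None:
            break
        try:
            result = work(rec_id)  # e.g. the external API call, outside any lock
            conn.execute(
                "UPDATE register SET status = 'Complete', result = ? WHERE id = ?",
                (result, rec_id),
            )
        except Exception as exc:
            conn.execute(
                "UPDATE register SET status = 'Error', result = ? WHERE id = ?",
                (str(exc), rec_id),
            )
```

Any record still showing 'Processing' after a run is one whose worker died between the claim and the final update.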
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely to get the needed data, work on all of it, and then put it back.
In terms of efficiency, regardless of language, if you are opening single items and working on them individually, you are probably closing the database connection each time. This means that if you have thousands of items, you will open and close thousands of connections. That overhead far outweighs the overhead of returning all of the items and working on them.