For our web application, written in Laravel, we use transactions to update the database. We have separated our data across different databases (for simplicity, let's say "app" and "user"). During an application update, an event is fired to update some user statistics in the user database. However, this application update may itself be called as part of a transaction on the application database, so the code structure looks something like the following.
DB::connection('app')->beginTransaction();
// ...
DB::connection('user')->doStuff();
// ...
DB::connection('app')->commit();
It appears that any attempt to start a transaction on the user connection (a single query already creates an implicit transaction) while inside an app transaction does not work and causes a deadlock (see the InnoDB status output). I also used innotop to get some more information, but while it showed the lock wait, it did not show which query was blocking it. There was a lock present on the user table, but I could not find its origin. Relevant output is shown below:
The easy solution would be to pull the user operation out of the transaction, but since the actual code is somewhat more complicated (the doStuff actually happens in a nested method called during the transaction, and that method is called from several different places), this is far from trivial. I would very much like doStuff to be part of the transaction, but I do not see how a transaction can span multiple databases.
What is the reason this situation causes a deadlock, and is it possible to run doStuff on the user database within this transaction, or do we have to find an entirely different solution, such as queueing the events for execution afterwards?
I have found a way to solve this issue with a workaround. My hypothesis was that Laravel was trying to use the app connection to update the user database, but even after forcing it to use the user connection, it still locked. So I switched it around and deliberately used the app connection to update the user database. Since we have joins between the databases, our database users already have the proper access rights, so that was not an issue. This actually solved the problem.
That is, the final code turned out to be something like
DB::connection('app')->beginTransaction();
// ...
DB::connection('app')->table('user.users')->doStuff(); // Pay attention to this line!
// ...
DB::connection('app')->commit();
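If you are on a Laravel version that has the closure-based transaction helper, the same workaround can be wrapped in it so the rollback happens automatically on an exception. This is only a minimal sketch: it assumes both schemas live on the same MySQL server, that the app connection's credentials can read and write the user schema, and the $userId variable and updates_count column are made up for illustration.

use Illuminate\Support\Facades\DB;

$userId = 123; // placeholder for illustration

// Sketch only: both schemas on the same MySQL server, 'app' credentials
// have rights on the `user` schema.
DB::connection('app')->transaction(function () use ($userId) {
    // ... updates on the 'app' schema ...

    // Cross-schema statement issued over the *app* connection, so it runs
    // inside the same InnoDB transaction instead of opening a second one.
    DB::connection('app')
        ->table('user.users')
        ->where('id', $userId)
        ->increment('updates_count');

    // ... more updates on the 'app' schema ...
});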
Related
When the web server receives a request for my PHP script, I presume the server creates a dedicated process to run the script. If another request for the same script arrives before the first one exits, is another process started, or is the second request queued by the server, waiting for the first request to finish? (Question 1)
If the former is correct, i.e. the same script can run simultaneously in different processes, then both processes will try to access my database.
When I connect to the database in the script:
$DB = mysqli_connect("localhost", ...);
query it, perform more or less lengthy calculations and update it, I don't want the contents of the database to be modified by another instance of the running script.
Question 2: Does it mean that, from connecting to the database until closing it:
mysqli_close($DB);
the database is blocked against any access from other software components? If so, that would effectively prevent the script instances from running concurrently.
UPDATE: @OllieJones kindly explained that the database is not blocked.
Let's consider the following scenario. The script in the first process finds an eligible user in the Users table and starts preparing data to append for that user in the Counter table. At this moment the script in the other process preempts it and deletes the user from the Users table along with the associated data in the Counter table; it is then preempted by the first script, which writes data for a user that no longer exists. That data becomes orphaned, i.e. inaccessible.
How to prevent such a contention?
In modern web servers, there's a pool of processes (or possibly threads) handling requests from users. Concurrent requests to the same script can run concurrently. Each request-handler has its own connection to the DBMS (they're actually maintained in a pool, but that's a story for another day).
The database is not blocked while individual request-handlers are using it, unless you block it explicitly by locking a table or doing a request like SELECT ... FOR UPDATE. For more information on this deep topic, read about transactions.
Therefore, it's important to write your database queries in such a way that they won't interfere with each other. For example, if you need to learn the value of an auto-incremented column right after you insert a row, you should use LAST_INSERT_ID() or mysqli_insert_id() instead of querying the table for the highest id: another user may have inserted another row in the meantime.
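For instance, with mysqli the generated id is scoped to your own connection, so concurrent inserts by other clients cannot leak into it. A minimal sketch; the connection parameters, table, and column names are made up:

<?php
// Minimal sketch; connection parameters, table and column names are made up.
$mysqli = new mysqli('localhost', 'dbuser', 'dbpass', 'mydb');

$customerId = 42;
$total      = 19.99;

$stmt = $mysqli->prepare('INSERT INTO orders (customer_id, total) VALUES (?, ?)');
$stmt->bind_param('id', $customerId, $total);   // 'i' = int, 'd' = double
$stmt->execute();

// Safe: insert_id refers to the last INSERT made over *this* connection,
// no matter what other clients inserted in the meantime.
$newOrderId = $mysqli->insert_id;

// Racy alternative to avoid: SELECT MAX(id) FROM orders -- another client
// may have inserted a newer row between your INSERT and that SELECT.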
The system test discipline for scaled-up web sites usually involves a rigorous load test in order to shake out all this concurrency.
If you're doing a bunch of work on a particular entity, in your case a User, you use a transaction.
First you do
BEGIN
to start the transaction. Then you do
SELECT whatever FROM User WHERE user_id = <<whatever>> FOR UPDATE
to choose the user and mark that user's row as busy-being-updated. Then you do all the work you need to do to fill out various rows in various tables relating to that user.
Finally you do
COMMIT
If you messed things up, or don't want to go through with the change, you do
ROLLBACK
and all your changes will be restored to their state right before the SELECT ... FOR UPDATE.
Why does this work? Because if another client issues the same SELECT ... FOR UPDATE, MySQL will delay that request until the first one issues either COMMIT or ROLLBACK.
If another client works with a different userid, the operations may proceed concurrently.
You need the InnoDB access method to use transactions: MyISAM doesn't support them.
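Put together with mysqli, the whole pattern looks roughly like this. It is only a sketch: the connection details are made up, and the Users and Counter tables and columns are taken from the question's scenario rather than a real schema.

<?php
// Sketch of BEGIN / SELECT ... FOR UPDATE / COMMIT with mysqli.
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);  // throw on errors
$mysqli = new mysqli('localhost', 'dbuser', 'dbpass', 'mydb');
$userId = 123;

$mysqli->begin_transaction();
try {
    // Lock this user's row; another client running the same
    // SELECT ... FOR UPDATE on the same user_id waits here until we finish.
    $stmt = $mysqli->prepare('SELECT user_id FROM Users WHERE user_id = ? FOR UPDATE');
    $stmt->bind_param('i', $userId);
    $stmt->execute();
    $user = $stmt->get_result()->fetch_assoc();

    // ... all the work on rows in various tables relating to that user ...
    $stmt  = $mysqli->prepare('INSERT INTO Counter (user_id, value) VALUES (?, ?)');
    $value = 1;
    $stmt->bind_param('ii', $userId, $value);
    $stmt->execute();

    $mysqli->commit();
} catch (Throwable $e) {
    // Everything is restored to its state before the SELECT ... FOR UPDATE.
    $mysqli->rollback();
    throw $e;
}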
Multiple reads can run concurrently; a write operation, however, blocks all other operations, and a read blocks all writes (this describes table-level locking, as with MyISAM).
I am working on a complex database application written with PHP and mysqli. For large database operations I'm using daemons (also PHP) working in the background. During these operations, which may take several minutes, I want to prevent the users from accessing the affected data and show them a message instead.
I thought about creating a MySQL table and inserting a row each time a specific daemon operation takes place. Then I would always be able to check whether a certain operation is in progress when trying to access the data.
However, it is of course important that the records do not stay in the database if the daemon process gets terminated for any reason (kill from the console, losing the database connection, pulling the plug, etc.). I do not think that MySQL transactions / rollbacks can do this, because a commit is necessary to make the changes visible to others, and a committed record would remain in the database if the process is terminated afterwards.
Is there a way to ensure that the records get deleted if the process gets terminated?
This is an interesting problem, I actually implemented it for a University course a few years ago.
The trick I used is to play with the transaction isolation. If your daemon creates the record indicating it is in progress, but does not commit it, then you are correct in saying that the other clients will not see that record. But if you set the isolation level for that client to READ UNCOMMITTED, it will see the record saying the operation is in progress, since READ UNCOMMITTED shows you changes from other clients which are not yet committed. (The daemon is left with the default isolation level.)
You should set the client's isolation level to READ UNCOMMITTED for that daemon check only, not for its other work, as it could be very dangerous.
If the daemon crashes, the transaction gets aborted and the record goes away. If it is successful, it can either mark the operation done in the database or delete the record, and then it commits.
This is really nice, since if the daemon fails for any reason all the changes it made are reversed and it can be retried.
If you need more explanation or code I may be able to find my assignment somewhere :)
Transaction isolation levels reference
Note that this all requires InnoDB or any good transactional DB.
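A rough sketch of both sides of that idea, assuming an InnoDB table named operations_in_progress; the table, its columns, the operation name, and the connection details are all made up for illustration:

<?php
// --- Daemon side (keeps the default isolation level) ---
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT);
$daemon = new mysqli('localhost', 'dbuser', 'dbpass', 'mydb');

$daemon->begin_transaction();
// Insert the "in progress" marker but do NOT commit yet: if the daemon dies,
// the transaction is rolled back and the marker disappears with it.
$daemon->query("INSERT INTO operations_in_progress (operation, started_at)
                VALUES ('rebuild_stats', NOW())");

// ... the long-running work, all inside this same transaction ...

// On success, remove (or mark done) the marker and commit everything at once.
$daemon->query("DELETE FROM operations_in_progress WHERE operation = 'rebuild_stats'");
$daemon->commit();

// --- Client side: check whether the daemon is busy ---
$client = new mysqli('localhost', 'dbuser', 'dbpass', 'mydb');
// READ UNCOMMITTED only for this check, then back to the default level.
$client->query('SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED');
$busy = $client->query(
    "SELECT 1 FROM operations_in_progress WHERE operation = 'rebuild_stats'"
)->num_rows > 0;
$client->query('SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ');

if ($busy) {
    echo 'This data is currently being processed, please try again later.';
}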
In my application (client-server) I need to edit some rows in a database, and while they are being edited nobody else should be able to edit them. This is normally done with transactions, of course. The problem is that in a client-server environment the transactions are managed on the server side, so the client that edits the rows can't hold the transaction open directly. (I'm working with PHP in this situation, but I think the same approach applies to other technologies as well.) So I would need to keep a transaction open (to keep the rows locked for editing) until the client finishes the edit.
In PHP, persistent connections won't help, because they can be taken over by other clients located on the same host as the aforementioned client. Do you have any ideas for my scenario?
Thank you.
Usually such cases are handled through business locks that you set directly on the objects, or on the parent of the objects.
Add a column such as "inedition" that you set to true when a user asks to edit the row, and back to false when the user validates or cancels the edit (see the sketch after the list below).
Be aware that some user sessions are likely to be lost before the row is unlocked, so you'll probably need:
either a periodic job that unlocks stale rows,
or a functional screen from which the user or an admin can unlock the rows that remained locked.
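A minimal sketch of that pattern follows; the records table, the column names, and the placeholder ids are assumptions, and the single atomic UPDATE ... WHERE prevents two users from claiming the same row at once.

<?php
// Sketch of a business lock held in an "inedition"-style column.
// Table and column names are assumptions.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$recordId      = 42;   // placeholder
$currentUserId = 7;    // placeholder

// Claim the row for editing; succeeds only if nobody else already holds it.
$claim = $pdo->prepare(
    'UPDATE records
        SET inedition = 1, locked_by = :user, locked_at = NOW()
      WHERE id = :id AND inedition = 0'
);
$claim->execute([':user' => $currentUserId, ':id' => $recordId]);

if ($claim->rowCount() === 0) {
    echo 'This record is being edited by someone else.';
}

// When the user validates or cancels the edit, release the lock.
$pdo->prepare('UPDATE records SET inedition = 0, locked_by = NULL WHERE id = ?')
    ->execute([$recordId]);

// A periodic job can release abandoned locks, e.g.:
// UPDATE records SET inedition = 0, locked_by = NULL
//  WHERE inedition = 1 AND locked_at < NOW() - INTERVAL 30 MINUTE;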
Edit:
This kind of solution is used whenever you do not want to rely on a database-specific feature such as Oracle's SELECT ... FOR UPDATE. In Java, an EJB stateful bean can keep a reference to the transaction from the UI down to the database. There might be PHP solutions built on Oracle or other database-specific transaction features, depending on the database.
We have this PHP application which selects a row from the database, works on it (calls an external API which uses a web service), and then inserts a new record based on the work done. There is an AJAX display which informs the user of how many records have been processed.
The data is mostly text, so the rows are rather heavy.
The process handles thousands of records at a time. The user can choose how many records to start working on. The data is obtained from one table, where the records are marked as "done" once processed. There is no WHERE condition, except for the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one record, work on it, and insert the new data.
Select all of the records, work with them in memory, and insert them into the database after all the work is done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (ie. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it": their status will still be "Processing", or "Error" if something went wrong.
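With PHP and PostgreSQL this might look roughly like the following PDO sketch; the records table, its columns, the 'pending' status value, and the call_external_api() helper are all assumptions made for illustration.

<?php
// Sketch of the claim-then-process pattern against PostgreSQL via PDO.
$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// 1) Claim one record inside a short transaction.
//    (On PostgreSQL 9.5+ you can add SKIP LOCKED after FOR UPDATE so that
//    concurrent workers pick different rows instead of waiting.)
$pdo->beginTransaction();
$row = $pdo->query(
    "SELECT id, payload FROM records WHERE status = 'pending' LIMIT 1 FOR UPDATE"
)->fetch(PDO::FETCH_ASSOC);

if ($row === false) {
    $pdo->rollBack();              // nothing left to process
    exit;
}

$pdo->prepare("UPDATE records SET status = 'Processing' WHERE id = ?")
    ->execute([$row['id']]);
$pdo->commit();                    // lock released, claim visible to everyone

// 2) Do the slow work (the external web-service call) outside the transaction.
try {
    $result = call_external_api($row['payload']);   // hypothetical helper
    $pdo->prepare("UPDATE records SET status = 'Complete', result = ? WHERE id = ?")
        ->execute([$result, $row['id']]);
} catch (Throwable $e) {
    $pdo->prepare("UPDATE records SET status = 'Error' WHERE id = ?")
        ->execute([$row['id']]);
}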
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely to get all the needed data, work on all of it, and then put it back.
In terms of efficiency, regardless of language, if you fetch single items and work on them individually, you are probably also opening and closing the database connection each time. This means that if you have thousands of items, you will open and close thousands of connections. The overhead of this far outweighs the overhead of returning all of the items at once and working on them.
I would like to create an interface for manipulating invoices in a transaction-like manner.
The database consists of an invoices table, which holds billing information, and an invoice_lines table, which holds line items for the invoices. The website is a set of scripts which allow the addition, modification, and removal of invoices and their corresponding lines.
The problem I have is this: I would like the ACID properties of the database to be reflected in the web application.
Atomic: When the user hits save, either the entire invoice is modified or the entire invoice is not changed at all.
Consistent: The application code already ensures consistency, lines cannot be added to non-existent invoices. Invoice IDs cannot be duplicated.
Isolated: If a user is in the middle of a set of changes to an invoice, I would like to hide those changes from other users until the user clicks save.
Durable: If the web site dies, the data should be safe. This already works.
If I were writing a desktop application, it would maintain a connection to the MySQL database at all times, allowing me to simply use the BEGIN TRANSACTION and COMMIT at the beginning and end of the edit.
From what I understand you cannot BEGIN TRANSACTION on one PHP page and COMMIT on a different page because the connection is closed between pages.
Is there a way to make this possible without extensions? From what I have found, only SQL Relay does this (but it is an extension).
You don't want to have long-running transactions, because that will limit concurrency. Consider the Command pattern instead: http://en.wikipedia.org/wiki/Command_pattern
The usual translation of this to the web is the use of session data, or data stored in the page itself. Typically, after each web page is completed, the data is stored in the session (or in the page), and once all of the pages have been filled in and a "Process" (or "Save") button is hit, the data is converted into its database form and saved, including relational data like the invoice lines you mentioned. There are many ways to do this, but I would say that most developers use an architecture along these lines (session data or state within the page) to achieve what you are describing.
You'll get plenty of advice here on different architectures, but I can say that the Zend Framework (http://framework.zend.com) together with Doctrine (http://www.doctrine-project.org/) makes this fairly easy, since Zend provides much of the MVC architecture and session management, and Doctrine provides the basic CRUD (create, retrieve, update, delete) you are looking for, plus the other aspects (uniqueness, commit, rollback, etc.). Keeping the connection to MySQL open may cause timeouts and exhaust the available connections.
Database transactions aren't really intended for this purpose - if you did use them, you'd probably run into other problems.
But also you can't use them as each page request uses its own connection (potentially) so cannot share a transaction with any others.
Keep the modifications to the invoice somewhere else while the user is editing them, then apply them when she hits save; you can do this final apply step in a transaction (albeit quite a short-lived one).
Long-lived transactions are usually bad.
The solution is not to open the transaction during the GET phase. Do all aspects of the transaction (BEGIN TRANSACTION, processing, and COMMIT) during the POST triggered by the "save" button.
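In other words, the pending edits live in the session (or in hidden form fields) between requests, and only the POST handler touches the database, inside one short transaction. A rough PDO sketch; the session layout and the invoice columns are simplified assumptions:

<?php
// save.php -- runs only on the POST triggered by the "save" button.
// Session layout and table columns are simplified assumptions.
session_start();

$pdo = new PDO('mysql:host=localhost;dbname=billing', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$invoice = $_SESSION['pending_invoice'];    // accumulated across the edit pages

$pdo->beginTransaction();                   // BEGIN only now, inside the POST
try {
    $pdo->prepare('UPDATE invoices SET customer_id = ?, total = ? WHERE id = ?')
        ->execute([$invoice['customer_id'], $invoice['total'], $invoice['id']]);

    // Replace the line items atomically with the edited set.
    $pdo->prepare('DELETE FROM invoice_lines WHERE invoice_id = ?')
        ->execute([$invoice['id']]);
    $insert = $pdo->prepare(
        'INSERT INTO invoice_lines (invoice_id, description, amount) VALUES (?, ?, ?)'
    );
    foreach ($invoice['lines'] as $line) {
        $insert->execute([$invoice['id'], $line['description'], $line['amount']]);
    }

    $pdo->commit();                         // the whole invoice changes at once
    unset($_SESSION['pending_invoice']);
} catch (Throwable $e) {
    $pdo->rollBack();                       // nothing is half-saved
    throw $e;
}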
Persistent connections may help you:
http://php.net/manual/en/features.persistent-connections.php
Another is that when using transactions, a transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does.
But I recommend finding another approach to the problem.
For example: create a cache table.
When you need to "commit", transfer the records from the cache table to the "real" tables.
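The final transfer from the cache table can itself be one short transaction, for example as below; the cache table, its columns, and the use of the PHP session id as the staging key are assumptions for the sketch.

<?php
// Sketch: move one user's staged rows from a cache table into the real table.
// Table and column names are assumptions.
session_start();
$pdo = new PDO('mysql:host=localhost;dbname=billing', 'dbuser', 'dbpass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->beginTransaction();
try {
    // Copy the staged rows into the real table...
    $pdo->prepare(
        'INSERT INTO invoice_lines (invoice_id, description, amount)
         SELECT invoice_id, description, amount
           FROM invoice_lines_cache
          WHERE session_id = ?'
    )->execute([session_id()]);

    // ...then clear the staging rows, all in the same transaction.
    $pdo->prepare('DELETE FROM invoice_lines_cache WHERE session_id = ?')
        ->execute([session_id()]);

    $pdo->commit();
} catch (Throwable $e) {
    $pdo->rollBack();
    throw $e;
}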
Although there are already some good answers, I think I found a good answer to this question, which I was stuck on as well. I think the best approach is to use a framework like Doctrine (O/R mapping) that has this kind of behaviour implemented in some form. Here you have a link to what I'm talking about.