When the web server receives a request for my PHP script, I presume the server creates a dedicated process to run the script. If another request for the same script arrives before the first one finishes, is another process started, or is the second request queued in the server, waiting for the first one to exit? (Question 1)
If the former is correct, i.e. the same script can run simultaneously in different processes, then both instances will try to access my database.
When I connect to the database in the script:
$DB = mysqli_connect("localhost", ...);
query it, perform some fairly lengthy calculations and update it, I don't want the contents of the database to be modified by another instance of the running script.
Question 2: Does it mean that since connecting to the database until closing it:
mysqli_close($DB);
the database is blocked for any access from other software components? If so, it effectively prevents the script instances from running concurrently.
UPDATE: @OllieJones kindly explained that the database is not blocked.
Let's consider the following scenario. The script in the first process discovers an eligible user in the Users table and starts preparing data to append for that user in the Counter table. At this moment the script in the other process preempts it and deletes the user from the Users table along with the associated data in the Counter table; it then gets preempted by the first script, which writes the data for a user that no longer exists. That data is now orphaned, i.e. inaccessible.
How to prevent such a contention?
In modern web servers, there's a pool of processes (or possibly threads) handling requests from users. Concurrent requests to the same script can run concurrently. Each request-handler has its own connection to the DBMS (they're actually maintained in a pool, but that's a story for another day).
The database is not blocked while individual request-handlers are using it, unless you block it explicitly by locking a table or doing a request like SELECT ... FOR UPDATE. For more information on this deep topic, read about transactions.
Therefore, it's important to write your database queries in such a way that they won't interfere with each other. For example, if you need to learn the value of an auto-incremented column right after you insert a row, you should use LAST_INSERT_ID() or mysqli_insert_id() instead of trying to query the database: another user may have inserted another row in the meantime.
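A minimal sketch of that, assuming a hypothetical orders table with an auto-increment id column and placeholder credentials:
$DB = mysqli_connect("localhost", "user", "password", "mydb"); // placeholder credentials
mysqli_query($DB, "INSERT INTO orders (customer_id, total) VALUES (42, 9.99)"); // hypothetical table
$newId = mysqli_insert_id($DB); // the id generated by *this* connection's INSERT
// Don't do SELECT MAX(id) FROM orders instead: another client may have inserted a row in between.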
The system test discipline for scaled-up web sites usually involves a rigorous load test in order to shake out all this concurrency.
If you're doing a bunch of work on a particular entity, in your case a User, you use a transaction.
First you do
BEGIN
to start the transaction. Then you do
SELECT whatever FROM User WHERE user_id = <<whatever>> FOR UPDATE
to choose the user and mark that user's row as busy-being-updated. Then you do all the work you need to do to fill out various rows in various tables relating to that user.
Finally you do
COMMIT
If you messed things up, or don't want to go through with the change, you do
ROLLBACK
and all your changes will be restored to their state right before the SELECT ... FOR UPDATE.
Why does this work? Because if another client does the same SELECT ... FOR UPDATE, MySQL will delay that request until the first one issues either COMMIT or ROLLBACK.
If another client works with a different userid, the operations may proceed concurrently.
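A rough PHP/mysqli sketch of that flow; the connection credentials, the example user id and the selected column are placeholders:
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // make mysqli throw exceptions
$DB = mysqli_connect("localhost", "user", "password", "mydb"); // placeholder credentials
$userId = 123; // example value
mysqli_begin_transaction($DB); // BEGIN
try {
    // Lock this user's row; other FOR UPDATE readers of the same row will wait here
    $stmt = mysqli_prepare($DB, "SELECT name FROM User WHERE user_id = ? FOR UPDATE");
    mysqli_stmt_bind_param($stmt, "i", $userId);
    mysqli_stmt_execute($stmt);
    // ... fill out the various rows in the various related tables ...
    mysqli_commit($DB);   // COMMIT: release the lock and make the changes visible
} catch (Exception $e) {
    mysqli_rollback($DB); // ROLLBACK: undo everything since BEGIN
}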
You need the InnoDB storage engine to use transactions: MyISAM doesn't support them.
Multiple reads can run concurrently. A write operation blocks other operations on the affected rows, and a locking read blocks writes to those rows.
For our web application, written in Laravel, we use transactions to update the database. We have separated our data across different databases (for ease, let's say "app" and "user"). During an application update, an event is fired to update some user statistics in the user database. However, this application update may be called as part of a transaction on the application database, so the code structure looks something like the following.
DB::connection('app')->beginTransaction();
// ...
DB::connection('user')->doStuff();
// ...
DB::connection('app')->commit();
It appears that any attempt to start a transaction on the user connection (since a single query already creates an implicit transaction) while inside an app transaction does not work, causing a deadlock (see InnoDB status output). I also used innotop to get some more information, but while it showed the lock wait, it did not show which query it was blocked by. There was a lock present on the user table, but I could not find its origin. Relevant output is shown below:
The easy solution would be to pull out the user operation from the transaction, but since the actual code is slightly more complicated (the doStuff actually happens somewhere in a nested method called during the transaction and is called from different places), this is far from trivial. I would very much like the doStuff to be part of the transaction, but I do not see how the transaction can span multiple databases.
What is the reason that this situation causes a deadlock and is it possible to run the doStuff on the user database within this transaction, or do we have to find an entirely different solution like queueing the events for execution afterwards?
I have found a way to solve this issue using a workaround. My hypothesis was that it was trying to use the app connection to update the user database, but after forcing it to use the user connection, it still locked. Then I switched it around and used the app connection to update the user database. Since we have joins between the databases, we have database users with proper access, so that was no issue. This actually solved the problem.
That is, the final code turned out to be something like
DB::connection('app')->beginTransaction();
// ...
DB::connection('app')->table('user.users')->doStuff(); // Pay attention to this line!
// ...
DB::connection('app')->commit();
The best way to explain my question is with an example.
Say I have 3 scripts:
The first one is a form; on submitting it you go to the second script, which processes the POST variables and inserts them into the DB.
On this page/script there is another submit button that takes you to the third page, where the INSERT query is finally committed to the DB.
Is this possible?
Or do COMMIT/ROLLBACK have to be in the same script?
Thanks
Yes, commit/rollback has to be in the same request that started the transaction.
Another way of looking at this is that transactions must be resolved within the same database connection, and database connections (like any other resource) don't survive across multiple PHP requests.
As @Wrikken comments, you could save the uncommitted data in session data, or some other non-database holding space (e.g. memcached).
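A minimal sketch of the session-data variant, with hypothetical form field names:
// Script 2: validate the POST data and park it in the session instead of inserting it yet
session_start();
$_SESSION['pending_record'] = [
    'name'  => $_POST['name'],   // hypothetical form fields
    'email' => $_POST['email'],
];

// Script 3: the user pressed the final button, so do the real INSERT (and commit) now
session_start();
$pending = $_SESSION['pending_record'];
// ... INSERT $pending into the database inside a normal, single-request transaction ...
unset($_SESSION['pending_record']);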
Another option is to save and commit the data in the database during each request, but add a column to your table for the state of the data. Therefore it would be physically committed with respect to database transactions, but it would be annotated as incomplete until you finish handling the third script.
I've implemented systems like this, for example for "shopping cart" style information. It also helps to run a daily cron job to delete old, unfinished data. Because inevitably, people do sometimes abandon their work in progress and never get to the finish step.
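A sketch of that state-column approach, assuming a hypothetical orders table with a status column, an already-connected $DB, and the new row's id passed between requests in the session:
// Script 2: physically commit the row, but flag it as unfinished
mysqli_query($DB, "INSERT INTO orders (name, email, status) VALUES ('...', '...', 'draft')");
$_SESSION['order_id'] = mysqli_insert_id($DB);

// Script 3: the user confirmed, so promote the row to its final state
$orderId = $_SESSION['order_id'];
$stmt = mysqli_prepare($DB, "UPDATE orders SET status = 'complete' WHERE id = ?");
mysqli_stmt_bind_param($stmt, "i", $orderId);
mysqli_stmt_execute($stmt);

// Daily cron job: sweep away drafts that were never finished
mysqli_query($DB, "DELETE FROM orders WHERE status = 'draft' AND created_at < NOW() - INTERVAL 1 DAY");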
I am working on a complex database application written with PHP and mysqli. For large database operations I'm using daemons (also PHP) working in the background. During these operations, which may take several minutes, I want to prevent the users from accessing the data being affected and show them a message instead.
I thought about creating a MySQL table and inserting a row each time a specific daemon operation takes place. Then I would always be able to check whether a certain operation is in progress when trying to access the data.
However, it is of course important that the records do not stay in the database if the daemon process gets terminated for any reason (kill from the console, losing the database connection, pulling the plug, etc.). I do not think that MySQL transactions / rollbacks can do this, because a commit is necessary in order to make the changes public, and the records will then remain in the database if the process is terminated afterwards.
Is there a way to ensure that the records get deleted if the process gets terminated?
This is an interesting problem, I actually implemented it for a University course a few years ago.
The trick I used is to play with the transaction isolation. If your daemons create the record indicating they are in progress, but do not commit it, then you are correct in saying that the other clients will not see that record. But, if you set the isolation level for that client to READ UNCOMMITTED, you will see the record saying it's in progress - since READ UNCOMMITTED will show you the changes from other clients which are not yet committed. (The daemon would be left with the default isolation level).
You should set the client's isolation level to READ UNCOMMITTED for that daemon check only, not for its other work, as that could be very dangerous.
If the daemon crashes, the transaction gets aborted and the record disappears. If it is successful, it can either mark the record as done in the DB or delete it, and then commit.
This is really nice, since if the daemon fails for any reason all the changes it made are reversed and it can be retried.
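A rough sketch of both sides, with hypothetical table names ($daemon and $web are separate mysqli connections):
// Daemon side: announce the operation inside its transaction and do NOT commit while working
mysqli_begin_transaction($daemon);
mysqli_query($daemon, "INSERT INTO operations_in_progress (operation, started_at) VALUES ('rebuild_stats', NOW())");
// ... the long-running work happens here, in the same transaction ...
// If the process dies at any point, the uncommitted row (and all other changes) simply vanish.
mysqli_query($daemon, "DELETE FROM operations_in_progress WHERE operation = 'rebuild_stats'");
mysqli_commit($daemon);

// Web request side: peek at uncommitted rows for this one check only
mysqli_query($web, "SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED");
$res  = mysqli_query($web, "SELECT 1 FROM operations_in_progress WHERE operation = 'rebuild_stats'");
$busy = mysqli_num_rows($res) > 0; // true while the daemon is working
mysqli_query($web, "SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ"); // back to the default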
If you need more explanation or code I may be able to find my assignment somewhere :)
Transaction isolation levels reference
Note that this all requires InnoDB or any good transactional DB.
We have this PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new register based on the work done. There's an AJAX display which informs the user of how many registers have been processed.
The data is mostly text, so it's rather heavy data.
The process handles thousands of registers at a time. The user can choose how many registers to start working on. The data is obtained from one table, where the rows are marked as "done". There is no "WHERE" condition, except the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one register, work on it, and insert the new data
Select all of the registers, work with them in memory and insert them in the database after all the work was done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (i.e. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it" (their status will still be "Processing"), as well as any errors.
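A hedged PHP/PDO sketch of that claim-then-process pattern, assuming the pgsql driver; the table, the columns and the call_external_api() helper are assumptions, not part of the original application:
$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'password'); // placeholder DSN
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Claim one register inside a short transaction: SELECT ... FOR UPDATE, mark as Processing, commit
$pdo->beginTransaction();
$row = $pdo->query("SELECT id, payload FROM registers WHERE status = 'pending' LIMIT 1 FOR UPDATE")
           ->fetch(PDO::FETCH_ASSOC);
if ($row) {
    $pdo->prepare("UPDATE registers SET status = 'Processing' WHERE id = ?")->execute([$row['id']]);
}
$pdo->commit();

// Process the record outside the transaction, then record the outcome
if ($row) {
    try {
        $result = call_external_api($row['payload']); // hypothetical helper for the webservice call
        $pdo->prepare("UPDATE registers SET status = 'Complete', result = ? WHERE id = ?")
            ->execute([$result, $row['id']]);
    } catch (Exception $e) {
        $pdo->prepare("UPDATE registers SET status = 'Error' WHERE id = ?")->execute([$row['id']]);
    }
}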
If the data is heavy and so is the load, and the application is not real-time dependent, the best approach is most definitely to get all the needed data, work on all of it, then put it back.
In terms of efficiency, regardless of language: if you are fetching single items and working on them individually, you are probably opening and closing the database connection each time. With thousands of items that means thousands of connections being opened and closed, and that overhead far outweighs the overhead of returning all of the items at once and working on them.
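A minimal sketch of that batch style, keeping one PDO connection open for the whole run (the table names are placeholders and work_on() stands in for whatever the processing is):
$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'password'); // one connection for the whole batch
$rows = $pdo->query("SELECT id, payload FROM registers WHERE status <> 'done'")
            ->fetchAll(PDO::FETCH_ASSOC);
$insert = $pdo->prepare("INSERT INTO processed (register_id, result) VALUES (?, ?)");
foreach ($rows as $row) {
    $insert->execute([$row['id'], work_on($row['payload'])]); // work_on() is a hypothetical helper
}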
I have written a webservice using the PHP SOAP classes. It has functions to return XML data from an Oracle database, or to perform insert/update/delete on the database.
However, at the moment it is using autocommit, so any operation is instantly committed.
I'm looking at how to queue up the transactions, and then commit the whole lot only when a user presses a button to "save". I'm having difficulty in finding out if this is possible. I can't maintain a consistent connection easily, as of course the webservice is called for separate operations.
I've tried using the PHP oci_pconnect function, but even when I connect each time with the same parameters, the session appears to have ended, and my changes aren't committed when I finally call oci_commit.
Any ideas?
Reusing the same uncommitted database session between PHP requests is not possible. You have no way to tie a user to a particular PHP process or DB connection, as the web server will send each request to any one of many of them at random. Therefore you cannot hold uncommitted data in an Oracle session between requests.
The best way to do this really depends on your requirements. My feeling is that you want some sort of session store (perhaps a database table, keyed on user_id) that can hold all the pending transactions between requests. When the user hits save, extract out all the pending requests and insert them into their final destination table and then commit.
An alternative would be to insert all the transactions with a flag that says they are not yet completed. Upon clicking save, update the flag to say they are completed.
Either way, you need somewhere to stage your pending requests until that save button is pressed.
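A rough OCI8 sketch of that staging idea; the pending_changes and real_table names are assumptions, and $userId and $payload are assumed to come from the webservice request:
$conn = oci_connect('user', 'password', '//localhost/XE'); // placeholder credentials

// Each webservice call: stage the change instead of touching the real table
$stid = oci_parse($conn, "INSERT INTO pending_changes (user_id, payload) VALUES (:uid, :payload)");
oci_bind_by_name($stid, ':uid', $userId);
oci_bind_by_name($stid, ':payload', $payload);
oci_execute($stid); // autocommit is fine here: it's only the staging row

// The "save" call: move everything pending for this user into the real table in one transaction
$stid = oci_parse($conn, "INSERT INTO real_table (user_id, payload) SELECT user_id, payload FROM pending_changes WHERE user_id = :uid");
oci_bind_by_name($stid, ':uid', $userId);
oci_execute($stid, OCI_NO_AUTO_COMMIT);
$stid = oci_parse($conn, "DELETE FROM pending_changes WHERE user_id = :uid");
oci_bind_by_name($stid, ':uid', $userId);
oci_execute($stid, OCI_NO_AUTO_COMMIT);
oci_commit($conn);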
DBMS_XA allows you to share transactions across sessions.