the best way to explain my question is with an example
say i have 3 scripts;
first one is a form, on submitting this it goes to second script which processes the POST variables and Inserts them into the DB
on this page/script there's another submit button, that takes you to the third page where the Insert query is finally commited to the DB
is this possible?
or do commit/rollback have to be on the same script?
thanks
Yes, commit/rollback has to be in the same request that started the transaction.
Another way of looking at this is that transactions must be resolved within the same database connection, and database connections (like any other resource) don't survive across multiple PHP requests.
As #Wrikken comments, you could save the uncommitted data in session data, or some other non-database holding space (e.g. memcached).
Another option is to save and commit the data in the database during each request, but add a column to your table for the state of the data. Therefore it would be physically committed with respect to database transactions, but it would be annotated as incomplete until you finish handling the third script.
I've implemented systems like this, for example for "shopping cart" style information. It also helps to run a daily cron job to delete old, unfinished data. Because inevitably, people do sometimes abandon their work in progress and never get to the finish step.
Related
When the web server receives a request for my PHP script, I presume the server creates a dedicated process to run the script. If, before the script exits, another request to the same script comes, another process gets started -- am I correct, or the second request will be queued in the server, waiting for the first request to exit? (Question 1)
If the former is correct, i.e. the same script can run simultaneously in a different process, then they will try to access my database.
When I connect to the database in the script:
$DB = mysqli_connect("localhost", ...);
query it, conduct more or less lengthy calculations and update it, I don't want the contents of the database to be modified by another instance of a running script.
Question 2: Does it mean that since connecting to the database until closing it:
mysqli_close($DB);
the database is blocked for any access from other software components? If so, it effectively prevents the script instances from running concurrently.
UPDATE: #OllieJones kindly explained that the database was not blocked.
Let's consider the following scenario. The script in the first process discovers an eligible user in the Users table and starts preparing data to append for that user in the Counter table. At this moment the script in the other process preempts and deletes the user from the Users table and the associate data in the Counter table; it then gets preempted by the first script which writes the data for the user no more existing. These data become in the head-detached state, i.e. unaccessible.
How to prevent such a contention?
In modern web servers, there's a pool of processes (or possibly threads) handling requests from users. Concurrent requests to the same script can run concurrently. Each request-handler has its own connection to the DBMS (they're actually maintained in a pool, but that's a story for another day).
The database is not blocked while individual request-handlers are using it, unless you block it explicitly by locking a table or doing a request like SELECT ... FOR UPDATE. For more information on this deep topic, read about transactions.
Therefore, it's important to write your database queries in such a way that they won't interfere with each other. For example, if you need to learn the value of an auto-incremented column right after you insert a row, you should use LAST_INSERT_ID() or mysqli_insert_id() instead of trying to query the data base: another user may have inserted another row in the meantime.
The system test discipline for scaled-up web sites usually involves a rigorous load test in order to shake out all this concurrency.
If you're doing a bunch of work on a particular entity, in your case a User, you use a transaction.
First you do
BEGIN
to start the transaction. Then you do
SELECT whatever FROM User WHERE user_id = <<whatever>> FOR UPDATE
to choose the user and mark that user's row as busy-being-updated. Then you do all the work you need to do to fill out various rows in various tables relating to that user.
Finally you do
COMMIT
If you messed things up, or don't want to go through with the change, you do
ROLLBACK
and all your changes will be restored to their state right before the SELECT ... FOR UPDATE.
Why does this work? Because if another client does the same SELECT .... FOR UPDATE, MySQL will delay that request until the first one either gets COMMIT or ROLLBACK.
If another client works with a different userid, the operations may proceed concurrently.
You need the InnoDB access method to use transactions: MyISAM doesn't support them.
Multiple reads can be done concurrently, if there is a write operation then it will block all other operations. A read will block all writes.
How does PHP handle multiple requests from users? Does it process them all at once or one at a time waiting for the first request to complete and then moving to the next.
Actually, I'm adding a bit of wiki to a static site where users will be able to edit addresses of businesses if they find them inaccurate or if they can be improved. Only registered users may do so. When a user edits a business name, that name along with it's other occurrences is changed in different rows in the table. I'm a little worried about what would happend if 10 users were doing this simultaneously. It'd be a real mishmash of things. So does PHP do things one at time in order received per script (update.php) or all at once.
Requests are handled in parallel by the web server (which runs the PHP script).
Updating data in the database is pretty fast, so any update will appear instantaneous, even if you need to update multiple tables.
Regarding the mish mash, for the DB, handling 10 requests within 1 second is the same as 10 requests within 10 seconds, it won't confuse them and just execute them one after the other.
If you need to update 2 tables and absolutely need these 2 updates to run subsequently without being interrupted by another update query, then you can use transactions.
EDIT:
If you don't want 2 users editing the same form at the same time, you have several options to prevent them. Here are a few ideas:
You can "lock" that record for edition whenever a user opens the page to edit it, and not let other users open it for edition. You might run into a few problems if a user doesn't "unlock" the record after they are done.
You can notify in real time (with AJAX) a user that the entry they are editing was modified, just like on stack overflow when a new answer or comment was posted as you are typing.
When a user submits an edit, you can check if the record was edited between when they started editing and when they tried to submit it, and show them the new version beside their version, so that they manually "merge" the 2 updates.
There probably are more solutions but these should get you started.
It depends on which version of Apache you are using and how it is configured, but a common default configuration uses multiple workers with multiple threads to handle simultaneous requests. See http://httpd.apache.org/docs/2.2/mod/worker.html for a rundown of how this works. The end result is that your PHP scripts may together have dozens of open database connections, possibly sending several queries at the exact same time.
However, your DBMS is designed to handle this. If you are only doing simple INSERT queries, then your code doesn't need to do anything special. Your DBMS will take care of the necessary locks on its own. Row-level locking will be fastest for multiple INSERTs, so if you use MySQL, you should consider the InnoDB storage engine.
Of course, your query can always fail whether it's due to too many database connections, a conflict on a unique index, etc. Wrap your queries in try catch blocks to handle this case.
If you have other application-layer concerns about concurrency, such as one user overwriting another user's changes, then you will need to handle these in the PHP script. One way to handle this is to use revision numbers stored along with your data, and refusing to execute the query if the revision number has changed, but how you handle it all depends on your application.
We have this PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new register based on the work done. There's an AJAX display which informs the user of how many registers have been processed.
The data is mostly text, so it's rather heavy data.
The process is made by thousands of registers a time. The user can choose how many registers to start working on. The data is obtained from one table, where they are marked as "done". No "WHERE" condition, except the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one register, work on it, and insert the new data
Select all of the registers, work with them in memory and insert them in the database after all the work was done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (ie. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it", because their status will be "Processing", and any errors.
If the data is heavy and so is the load, considering the application is not real time dependant the best approach is most definately getting the needed data and working on all of it, then putting it back.
Efficiency speaking, regardless of language is that if you are opening single items, and working on them individually, you are probably closing the database connection. This means that if you have 1000's of items, you will open and close 1000's of connections. The overhead on this far outweighs the overhead of returning all of the items and working on them.
I would like to create an interface for manipulating invoices in a transaction-like manner.
The database consists of an invoices table, which holds billing information, and an invoice_lines table, which holds line items for the invoices. The website is a set of scripts which allow the addition, modification, and removal of invoices and their corresponding lines.
The problem I have is this, I would like the ACID properties of the database to be reflected in the web application.
Atomic: When the user hits save, either the entire invoice is modified or the entire invoice is not changed at all.
Consistent: The application code already ensures consistency, lines cannot be added to non-existent invoices. Invoice IDs cannot be duplicated.
Isolated: If a user is in the middle of a set of changes to an invoice, I would like to hide those changes from other users until the user clicks save.
Durable: If the web site dies, the data should be safe. This already works.
If I were writing a desktop application, it would maintain a connection to the MySQL database at all times, allowing me to simply use the BEGIN TRANSACTION and COMMIT at the beginning and end of the edit.
From what I understand you cannot BEGIN TRANSACTION on one PHP page and COMMIT on a different page because the connection is closed between pages.
Is there a way to make this possible without extensions? From what I have found, only SQL Relay does this (but it is an extension).
you don't want to have long running transactions, because that'll limit concurrency. http://en.wikipedia.org/wiki/Command_pattern
The translation on the web for this type of processing is the use of session data or data stored in the page itself. Typically what is done is that after each web page is completed the data is stored in the session (or in the page itself) and at the point in which all of the pages have been completed (via data entry) and a "Process" (or "Save") button is hit, the data is converted into the database form and saved - even with the relational aspect of data like you mentioned. There are many ways to do this but I would say that most developers have an architecture similar to what I mentioned (using session data or state within the page) to satisfy what you are talking about.
You'll get much advice here on different architectures but I can say that the Zend Framework (http://framework.zend.com) and the use of Doctrine (http://www.doctrine-project.org/) make this fairy easy since Zend provides much of the MVC architecture and session management and Doctrine provides the basic CRUD (create, retrieve, update, delete) you are looking for - plus all of the other aspects (uniqueness, commit, rollback, etc). Keeping the connection open to mysql may cause timeouts and lack of available connections.
Database transactions aren't really intended for this purpose - if you did use them, you'd probably run into other problems.
But also you can't use them as each page request uses its own connection (potentially) so cannot share a transaction with any others.
Keep the modifications to the invoice somewhere else while the user is editing them, then apply them when she hits save; you can do this final apply step in a transaction (albeit quite a short-lived one).
Long-lived transactions are usually bad.
The solution is not to open the transaction during the GET phase. Do all aspects of the transaction—BEGIN TRANSACTION, processing, and COMMIT—all during the POST triggered by the "save" button.
Persistent connections may help you:
http://php.net/manual/en/features.persistent-connections.php
Another is that when using
transactions, a transaction block will
also carry over to the next script
which uses that connection if script
execution ends before the transaction
block does.
But I recommend you to find another approach to the problem.
For example: create a cache table.
When you need to "commit", transfer the records from the cache table to the "real" tables.
Altough there are some good answers, I think that found some good responses to your question, that I was stuck with also. I think the best approach is using a framework like Doctrine (O/R mapping) that has this kind of approach somehow implemented. Here you have a link to what I'm talking about.
I have written a webservice using the PHP SOAP classes. It has functions to return XML data from an Oracle database, or to perform insert/update/delete on the database.
However, at the moment it is using Autocommit, so any operation is instantly commited.
I'm looking at how to queue up the transactions, and then commit the whole lot only when a user presses a button to "save". I'm having difficulty in finding out if this is possible. I can't maintain a consistent connection easily, as of course the webservice is called for separate operations.
I've tried using the PHP oci_pconnect function, but even when I connect each time with the same parameters, the session appears to have ended, and my changes aren't commited when I finally call oci_commit.
Any ideas?
Reusing the same uncommitted database session between PHP requests is not possible. You have no way to lock a user into a PHP processes or DB connection as the webserver will send a request to any one of many of them at random. Therefore you cannot hold uncommited data in the Oracle session between requests.
The best way to do this really depends on your requirements. My feeling is that you want some sort of session store (perhaps a database table, keyed on user_id) that can hold all the pending transactions between requests. When the user hits save, extract out all the pending requests and insert them into their final destination table and then commit.
An alternative would be to insert all the transactions with a flag that says they are not yet completed. Upon clicking save, update the flag to say they are completed.
Either way, you need somewhere to stage your pending requests until that save button is pressed.
DBMS_XA allows you to share transactions across sessions.