I have written a webservice using the PHP SOAP classes. It has functions to return XML data from an Oracle database, or to perform insert/update/delete on the database.
However, at the moment it is using autocommit, so any operation is instantly committed.
I'm looking at how to queue up the transactions, and then commit the whole lot only when a user presses a button to "save". I'm having difficulty in finding out if this is possible. I can't maintain a consistent connection easily, as of course the webservice is called for separate operations.
I've tried using the PHP oci_pconnect function, but even when I connect each time with the same parameters, the session appears to have ended, and my changes aren't committed when I finally call oci_commit.
Any ideas?
Reusing the same uncommitted database session between PHP requests is not possible. You have no way to lock a user to a particular PHP process or DB connection, because the web server will send each request to any one of many of them at random. Therefore you cannot hold uncommitted data in the Oracle session between requests.
The best way to do this really depends on your requirements. My feeling is that you want some sort of session store (perhaps a database table keyed on user_id) that can hold all the pending changes between requests. When the user hits save, extract all the pending changes, insert them into their final destination table, and then commit.
An alternative would be to insert all the transactions with a flag that says they are not yet completed. Upon clicking save, update the flag to say they are completed.
Either way, you need somewhere to stage your pending requests until that save button is pressed.
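A minimal sketch of the staging-table idea using the OCI8 functions you already have (the pending_rows and target_table names and their columns are invented for illustration):

<?php
// Each webservice call stages a row instead of touching the real table.
function queueRow($conn, $userId, $col1, $col2) {
    $stmt = oci_parse($conn,
        "INSERT INTO pending_rows (user_id, col1, col2) VALUES (:uid, :c1, :c2)");
    oci_bind_by_name($stmt, ':uid', $userId);
    oci_bind_by_name($stmt, ':c1', $col1);
    oci_bind_by_name($stmt, ':c2', $col2);
    oci_execute($stmt); // autocommit is fine here: it is only staged data
}

// Called when the user finally presses "save": move everything in one real transaction.
function applyPending($conn, $userId) {
    $ins = oci_parse($conn,
        "INSERT INTO target_table (col1, col2)
         SELECT col1, col2 FROM pending_rows WHERE user_id = :uid");
    oci_bind_by_name($ins, ':uid', $userId);
    oci_execute($ins, OCI_NO_AUTO_COMMIT);

    $del = oci_parse($conn, "DELETE FROM pending_rows WHERE user_id = :uid");
    oci_bind_by_name($del, ':uid', $userId);
    oci_execute($del, OCI_NO_AUTO_COMMIT);

    oci_commit($conn); // both statements succeed or neither does
}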
DBMS_XA allows you to share transactions across sessions.
Related
When the web server receives a request for my PHP script, I presume the server creates a dedicated process to run the script. If, before the script exits, another request for the same script comes in, is another process started, or will the second request be queued in the server, waiting for the first request to exit? (Question 1)
If the former is correct, i.e. the same script can run simultaneously in different processes, then both instances will try to access my database.
When I connect to the database in the script:
$DB = mysqli_connect("localhost", ...);
query it, perform some more or less lengthy calculations, and update it, I don't want the contents of the database to be modified by another instance of the running script in the meantime.
Question 2: Does it mean that since connecting to the database until closing it:
mysqli_close($DB);
the database is blocked for any access from other software components? If so, it effectively prevents the script instances from running concurrently.
UPDATE: @OllieJones kindly explained that the database was not blocked.
Let's consider the following scenario. The script in the first process discovers an eligible user in the Users table and starts preparing data to append for that user in the Counter table. At this moment the script in the other process preempts it and deletes the user from the Users table along with the associated data in the Counter table; it then gets preempted by the first script, which writes the data for a user that no longer exists. That data becomes orphaned, i.e. inaccessible.
How to prevent such a contention?
In modern web servers, there's a pool of processes (or possibly threads) handling requests from users. Concurrent requests to the same script can run concurrently. Each request-handler has its own connection to the DBMS (they're actually maintained in a pool, but that's a story for another day).
The database is not blocked while individual request-handlers are using it, unless you block it explicitly by locking a table or doing a request like SELECT ... FOR UPDATE. For more information on this deep topic, read about transactions.
Therefore, it's important to write your database queries in such a way that they won't interfere with each other. For example, if you need to learn the value of an auto-incremented column right after you insert a row, you should use LAST_INSERT_ID() or mysqli_insert_id() instead of trying to query the database: another user may have inserted another row in the meantime.
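For instance, a minimal sketch (the orders table and its columns are invented):

<?php
$db = mysqli_connect("localhost", "user", "pass", "mydb");

mysqli_query($db, "INSERT INTO orders (user_id, total) VALUES (42, 9.99)");

// Safe under concurrency: this returns the id generated by *this* connection's INSERT,
// not whatever row happens to have the highest id right now.
$orderId = mysqli_insert_id($db);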
The system test discipline for scaled-up web sites usually involves a rigorous load test in order to shake out all this concurrency.
If you're doing a bunch of work on a particular entity, in your case a User, you use a transaction.
First you do
BEGIN
to start the transaction. Then you do
SELECT whatever FROM User WHERE user_id = <<whatever>> FOR UPDATE
to choose the user and mark that user's row as busy-being-updated. Then you do all the work you need to do to fill out various rows in various tables relating to that user.
Finally you do
COMMIT
If you messed things up, or don't want to go through with the change, you do
ROLLBACK
and all your changes will be restored to their state right before the SELECT ... FOR UPDATE.
Why does this work? Because if another client does the same SELECT ... FOR UPDATE, MySQL will delay that request until the first one issues COMMIT or ROLLBACK.
If another client works with a different userid, the operations may proceed concurrently.
You need the InnoDB access method to use transactions: MyISAM doesn't support them.
With MyISAM's table-level locks, multiple reads can run concurrently, but a write blocks all other operations, and a read blocks all writes.
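Putting the steps above together in PHP with mysqli might look roughly like this (a sketch only; the Users and Counter tables and their columns are assumed from the question):

<?php
mysqli_report(MYSQLI_REPORT_ERROR | MYSQLI_REPORT_STRICT); // make mysqli throw on errors
$db = mysqli_connect("localhost", "user", "pass", "mydb");

$userId = 42;   // pretend these came from the request
$value  = 1;

mysqli_begin_transaction($db);
try {
    // Lock the user's row; a concurrent request doing the same SELECT ... FOR UPDATE
    // on this user_id will wait here until we COMMIT or ROLLBACK.
    $stmt = mysqli_prepare($db, "SELECT user_id FROM Users WHERE user_id = ? FOR UPDATE");
    mysqli_stmt_bind_param($stmt, "i", $userId);
    mysqli_stmt_execute($stmt);
    $result = mysqli_stmt_get_result($stmt);

    if (mysqli_fetch_assoc($result)) {
        // The user still exists, so it is now safe to write the related rows.
        $ins = mysqli_prepare($db, "INSERT INTO Counter (user_id, value) VALUES (?, ?)");
        mysqli_stmt_bind_param($ins, "ii", $userId, $value);
        mysqli_stmt_execute($ins);
    }

    mysqli_commit($db);
} catch (Exception $e) {
    mysqli_rollback($db);   // undo everything since the transaction began
    throw $e;
}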
The best way to explain my question is with an example.
Say I have 3 scripts.
The first one is a form; on submitting it, you go to the second script, which processes the POST variables and inserts them into the DB.
On this page/script there's another submit button that takes you to the third page, where the insert query is finally committed to the DB.
Is this possible?
Or do commit/rollback have to be in the same script?
Thanks
Yes, commit/rollback has to be in the same request that started the transaction.
Another way of looking at this is that transactions must be resolved within the same database connection, and database connections (like any other resource) don't survive across multiple PHP requests.
As @Wrikken comments, you could save the uncommitted data in session data, or some other non-database holding space (e.g. memcached).
Another option is to save and commit the data in the database during each request, but add a column to your table for the state of the data. Therefore it would be physically committed with respect to database transactions, but it would be annotated as incomplete until you finish handling the third script.
I've implemented systems like this, for example for "shopping cart" style information. It also helps to run a daily cron job to delete old, unfinished data. Because inevitably, people do sometimes abandon their work in progress and never get to the finish step.
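A rough sketch of that flag/state-column idea (the table and column names are invented):

<?php
$db = mysqli_connect("localhost", "user", "pass", "shop");

// Script 2: store the data immediately, but mark it as pending.
mysqli_query($db, "INSERT INTO orders (user_id, payload, status, created_at)
                   VALUES (42, 'some data', 'pending', NOW())");

// Script 3: the user confirms, so promote the row to completed.
mysqli_query($db, "UPDATE orders SET status = 'complete'
                   WHERE user_id = 42 AND status = 'pending'");

// Daily cron job: throw away work that was never finished.
mysqli_query($db, "DELETE FROM orders
                   WHERE status = 'pending' AND created_at < NOW() - INTERVAL 1 DAY");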
I'm working on an AJAX application. The user clicks a button and his name is saved into the database and shown inside a <div>, whose content is fetched from the database by means of an AJAX Long Polling. The database also contains a timestamp which represents an expiration: subscriptions beyond that timestamp must not be accepted. There is also a limit for users to subscribe.
I have a PHP script that is called by an AJAX request; this script queries the database and checks for expiration (the timestamp of the click is computed by JavaScript and sent via AJAX). It also checks the user limit: I have an N-to-N relationship between Users and Products (to subscribe for). These tasks obviously take time and I'm worried about possible concurrency problems. Should I use database transactions? What technique could I use to ensure the atomicity of this operation?
It depends on the kind of work that is done in those "long" tasks.
Generic info:
If you're only inserting user-driven data and data generated in PHP, without it being read and/or cross-correlated with data fetched from the DB, then transactionality should not be an issue.
If you're updating data and cross-correlating it with other elements in the DB, then you need to start using transactions and to carefully choose the isolation levels of the transactions you plan on using.
Transactions can seriously affect speed when concurrency rises. A very strict isolation level may be safer than your application actually needs, and you may be adding a lot of unnecessary work to the MVCC.
Also, running the transaction as separate PHP API calls and managing the rollback logic in the application increases the overall duration of the transaction, because it adds all the processing delays generated by PHP. If you can compact the DB communication into a set of queries sent in one round trip, so much the better.
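For example, in MySQL you could tighten the isolation level only for the transaction that really needs it (a sketch, not a recommendation for any particular level):

<?php
$db = mysqli_connect("localhost", "user", "pass", "shop");

// Applies only to the next transaction on this connection.
mysqli_query($db, "SET TRANSACTION ISOLATION LEVEL SERIALIZABLE");

mysqli_begin_transaction($db);
// ... read the current state, cross-check it, write the new data ...
mysqli_commit($db);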
Case info:
Let's consider this scenario: there are 8 slots and 7 users subscribed. Two users click the subscribe button almost simultaneously. When the control script runs for the second user, the query for the first user's subscription might still be executing. This would imply that the system accepts both users as valid subscriptions.
This falls into the second case I explained, the case where you're cross-correlating user-driven data with what you have in the DB. You're reading the state of the DB before you commit the user-driven data, so yes, you would need transactions in this case.
You may also be able to exploit the inherent atomicity of a single UPDATE statement. Any UPDATE table_name SET x = x+1 WHERE a = 'value'; is guaranteed to be atomic. You can use this to your advantage.
All subscribing PHP requests must first decrement a remaining-slots counter. If the number of affected rows for that decrement is not 0, the decrement was successful and they can carry on submitting the user-related data; otherwise, inform the user he was 0.3ms too slow.
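A sketch of that idea (the products table, its slots_left column, and the subscriptions table are assumptions):

<?php
$db = mysqli_connect("localhost", "user", "pass", "shop");

// Atomically claim a slot: the WHERE clause guarantees the count never drops below zero.
mysqli_query($db, "UPDATE products SET slots_left = slots_left - 1
                   WHERE product_id = 7 AND slots_left > 0");

if (mysqli_affected_rows($db) === 1) {
    // We won the race: record the subscription.
    mysqli_query($db, "INSERT INTO subscriptions (user_id, product_id) VALUES (42, 7)");
} else {
    // Someone else took the last slot between the page load and this click.
    echo "Sorry, all slots are taken.";
}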
We have this PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new record based on the work done. There's an AJAX display which informs the user of how many records have been processed.
The data is mostly text, so it's rather heavy data.
The process handles thousands of records at a time. The user can choose how many records to start working on. The data is obtained from one table, where the rows are marked as "done" once processed. There is no WHERE condition, except the optional WHERE date BETWEEN date1 AND date2.
We had an argument over which approach is better:
Select one record, work on it, and insert the new data.
Select all of the records, work with them in memory, and insert them into the database after all the work is done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (i.e. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it" (their status will still be "Processing"), as well as any that ended in errors.
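In PHP with PDO against PostgreSQL, that claim-then-process pattern could look roughly like this (a sketch; the records table, its status and result columns, and process_record() are assumptions):

<?php
$pdo = new PDO('pgsql:host=localhost;dbname=mydb', 'user', 'pass');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Step 1: claim one record inside a short transaction.
$pdo->beginTransaction();
$row = $pdo->query(
    "SELECT id, data FROM records WHERE status = 'pending' ORDER BY id LIMIT 1 FOR UPDATE"
)->fetch(PDO::FETCH_ASSOC);

if ($row === false) {
    $pdo->rollBack();   // nothing left to process
    exit;
}

$claim = $pdo->prepare("UPDATE records SET status = 'processing' WHERE id = ?");
$claim->execute([$row['id']]);
$pdo->commit();         // the claim is now visible to every other worker

// Step 2: do the slow work (the external API call) outside any transaction.
try {
    $result = process_record($row['data']);   // hypothetical helper for the API call
    $done = $pdo->prepare("UPDATE records SET status = 'complete', result = ? WHERE id = ?");
    $done->execute([$result, $row['id']]);
} catch (Exception $e) {
    $fail = $pdo->prepare("UPDATE records SET status = 'error' WHERE id = ?");
    $fail->execute([$row['id']]);
}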
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely to get all the needed data, work on all of it, and then put it back.
Efficiency-wise, regardless of language, if you are fetching single items and working on them individually, you are probably also closing the database connection each time. This means that if you have thousands of items, you will open and close thousands of connections. The overhead of this far outweighs the overhead of returning all of the items and working on them together.
I would like to create an interface for manipulating invoices in a transaction-like manner.
The database consists of an invoices table, which holds billing information, and an invoice_lines table, which holds line items for the invoices. The website is a set of scripts which allow the addition, modification, and removal of invoices and their corresponding lines.
The problem I have is this: I would like the ACID properties of the database to be reflected in the web application.
Atomic: When the user hits save, either the entire invoice is modified or the entire invoice is not changed at all.
Consistent: The application code already ensures consistency, lines cannot be added to non-existent invoices. Invoice IDs cannot be duplicated.
Isolated: If a user is in the middle of a set of changes to an invoice, I would like to hide those changes from other users until the user clicks save.
Durable: If the web site dies, the data should be safe. This already works.
If I were writing a desktop application, it would maintain a connection to the MySQL database at all times, allowing me to simply use the BEGIN TRANSACTION and COMMIT at the beginning and end of the edit.
From what I understand you cannot BEGIN TRANSACTION on one PHP page and COMMIT on a different page because the connection is closed between pages.
Is there a way to make this possible without extensions? From what I have found, only SQL Relay does this (but it is an extension).
You don't want to have long-running transactions, because that will limit concurrency. Consider the Command pattern instead: http://en.wikipedia.org/wiki/Command_pattern
The web equivalent of this type of processing is to use session data, or data stored in the page itself. Typically, after each web page is completed, the data is stored in the session (or in the page), and at the point where all of the pages have been completed (via data entry) and a "Process" (or "Save") button is hit, the data is converted into the database form and saved, even with the relational aspect of the data like you mentioned. There are many ways to do this, but I would say most developers use an architecture similar to what I described (session data or state within the page) to satisfy what you are talking about.
You'll get much advice here on different architectures, but I can say that the Zend Framework (http://framework.zend.com) and Doctrine (http://www.doctrine-project.org/) make this fairly easy, since Zend provides much of the MVC architecture and session management, and Doctrine provides the basic CRUD (create, retrieve, update, delete) you are looking for, plus all of the other aspects (uniqueness, commit, rollback, etc.). Keeping the connection to MySQL open may cause timeouts and a lack of available connections.
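A very rough sketch of the session-based staging idea in plain PHP (the field and table names are invented; a framework such as Zend/Doctrine would wrap the same pattern):

<?php
session_start();

// While the user edits, each page just accumulates changes in the session.
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['add_line'])) {
    $_SESSION['invoice']['lines'][] = [
        'description' => $_POST['description'],
        'amount'      => $_POST['amount'],
    ];
}

// Only when the user hits "save" does anything touch the database,
// and then it all happens in one short transaction.
if ($_SERVER['REQUEST_METHOD'] === 'POST' && isset($_POST['save'])) {
    $pdo = new PDO('mysql:host=localhost;dbname=billing', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->beginTransaction();

    // customer_id is assumed to have been put in the session by an earlier page.
    $pdo->prepare("INSERT INTO invoices (customer_id) VALUES (?)")
        ->execute([$_SESSION['invoice']['customer_id']]);
    $invoiceId = $pdo->lastInsertId();

    $line = $pdo->prepare(
        "INSERT INTO invoice_lines (invoice_id, description, amount) VALUES (?, ?, ?)");
    foreach ($_SESSION['invoice']['lines'] as $l) {
        $line->execute([$invoiceId, $l['description'], $l['amount']]);
    }

    $pdo->commit();
    unset($_SESSION['invoice']);  // the staged copy is no longer needed
}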
Database transactions aren't really intended for this purpose - if you did use them, you'd probably run into other problems.
But in any case you can't use them here, as each page request (potentially) uses its own connection, so it cannot share a transaction with any other request.
Keep the modifications to the invoice somewhere else while the user is editing them, then apply them when she hits save; you can do this final apply step in a transaction (albeit quite a short-lived one).
Long-lived transactions are usually bad.
The solution is not to open the transaction during the GET phase. Do all aspects of the transaction (BEGIN TRANSACTION, the processing, and COMMIT) during the POST triggered by the "save" button.
Persistent connections may help you:
http://php.net/manual/en/features.persistent-connections.php
"Another is that when using transactions, a transaction block will also carry over to the next script which uses that connection if script execution ends before the transaction block does."
But I recommend you to find another approach to the problem.
For example: create a cache table.
When you need to "commit", transfer the records from the cache table to the "real" tables.
Although there are some good answers here, I think I found some good responses to this question, which I was stuck on as well. I think the best approach is to use a framework like Doctrine (O/R mapping) that has this kind of approach implemented somehow. Here is a link to what I'm talking about.