MySQL Transaction across many PHP Requests - php

I would like to create an interface for manipulating invoices in a transaction-like manner.
The database consists of an invoices table, which holds billing information, and an invoice_lines table, which holds line items for the invoices. The website is a set of scripts which allow the addition, modification, and removal of invoices and their corresponding lines.
The problem I have is this, I would like the ACID properties of the database to be reflected in the web application.
Atomic: When the user hits save, either the entire invoice is modified or the entire invoice is not changed at all.
Consistent: The application code already ensures consistency, lines cannot be added to non-existent invoices. Invoice IDs cannot be duplicated.
Isolated: If a user is in the middle of a set of changes to an invoice, I would like to hide those changes from other users until the user clicks save.
Durable: If the web site dies, the data should be safe. This already works.
If I were writing a desktop application, it would maintain a connection to the MySQL database at all times, allowing me to simply use the BEGIN TRANSACTION and COMMIT at the beginning and end of the edit.
From what I understand you cannot BEGIN TRANSACTION on one PHP page and COMMIT on a different page because the connection is closed between pages.
Is there a way to make this possible without extensions? From what I have found, only SQL Relay does this (but it is an extension).

you don't want to have long running transactions, because that'll limit concurrency. http://en.wikipedia.org/wiki/Command_pattern

The translation on the web for this type of processing is the use of session data or data stored in the page itself. Typically what is done is that after each web page is completed the data is stored in the session (or in the page itself) and at the point in which all of the pages have been completed (via data entry) and a "Process" (or "Save") button is hit, the data is converted into the database form and saved - even with the relational aspect of data like you mentioned. There are many ways to do this but I would say that most developers have an architecture similar to what I mentioned (using session data or state within the page) to satisfy what you are talking about.
You'll get much advice here on different architectures but I can say that the Zend Framework (http://framework.zend.com) and the use of Doctrine (http://www.doctrine-project.org/) make this fairy easy since Zend provides much of the MVC architecture and session management and Doctrine provides the basic CRUD (create, retrieve, update, delete) you are looking for - plus all of the other aspects (uniqueness, commit, rollback, etc). Keeping the connection open to mysql may cause timeouts and lack of available connections.

Database transactions aren't really intended for this purpose - if you did use them, you'd probably run into other problems.
But also you can't use them as each page request uses its own connection (potentially) so cannot share a transaction with any others.
Keep the modifications to the invoice somewhere else while the user is editing them, then apply them when she hits save; you can do this final apply step in a transaction (albeit quite a short-lived one).
Long-lived transactions are usually bad.

The solution is not to open the transaction during the GET phase. Do all aspects of the transaction—BEGIN TRANSACTION, processing, and COMMIT—all during the POST triggered by the "save" button.

Persistent connections may help you:
http://php.net/manual/en/features.persistent-connections.php
Another is that when using
transactions, a transaction block will
also carry over to the next script
which uses that connection if script
execution ends before the transaction
block does.
But I recommend you to find another approach to the problem.
For example: create a cache table.
When you need to "commit", transfer the records from the cache table to the "real" tables.

Altough there are some good answers, I think that found some good responses to your question, that I was stuck with also. I think the best approach is using a framework like Doctrine (O/R mapping) that has this kind of approach somehow implemented. Here you have a link to what I'm talking about.

Related

How does php and apache handle multiple requests?

How does PHP handle multiple requests from users? Does it process them all at once or one at a time waiting for the first request to complete and then moving to the next.
Actually, I'm adding a bit of wiki to a static site where users will be able to edit addresses of businesses if they find them inaccurate or if they can be improved. Only registered users may do so. When a user edits a business name, that name along with it's other occurrences is changed in different rows in the table. I'm a little worried about what would happend if 10 users were doing this simultaneously. It'd be a real mishmash of things. So does PHP do things one at time in order received per script (update.php) or all at once.
Requests are handled in parallel by the web server (which runs the PHP script).
Updating data in the database is pretty fast, so any update will appear instantaneous, even if you need to update multiple tables.
Regarding the mish mash, for the DB, handling 10 requests within 1 second is the same as 10 requests within 10 seconds, it won't confuse them and just execute them one after the other.
If you need to update 2 tables and absolutely need these 2 updates to run subsequently without being interrupted by another update query, then you can use transactions.
EDIT:
If you don't want 2 users editing the same form at the same time, you have several options to prevent them. Here are a few ideas:
You can "lock" that record for edition whenever a user opens the page to edit it, and not let other users open it for edition. You might run into a few problems if a user doesn't "unlock" the record after they are done.
You can notify in real time (with AJAX) a user that the entry they are editing was modified, just like on stack overflow when a new answer or comment was posted as you are typing.
When a user submits an edit, you can check if the record was edited between when they started editing and when they tried to submit it, and show them the new version beside their version, so that they manually "merge" the 2 updates.
There probably are more solutions but these should get you started.
It depends on which version of Apache you are using and how it is configured, but a common default configuration uses multiple workers with multiple threads to handle simultaneous requests. See http://httpd.apache.org/docs/2.2/mod/worker.html for a rundown of how this works. The end result is that your PHP scripts may together have dozens of open database connections, possibly sending several queries at the exact same time.
However, your DBMS is designed to handle this. If you are only doing simple INSERT queries, then your code doesn't need to do anything special. Your DBMS will take care of the necessary locks on its own. Row-level locking will be fastest for multiple INSERTs, so if you use MySQL, you should consider the InnoDB storage engine.
Of course, your query can always fail whether it's due to too many database connections, a conflict on a unique index, etc. Wrap your queries in try catch blocks to handle this case.
If you have other application-layer concerns about concurrency, such as one user overwriting another user's changes, then you will need to handle these in the PHP script. One way to handle this is to use revision numbers stored along with your data, and refusing to execute the query if the revision number has changed, but how you handle it all depends on your application.

Do PHP pages on a server run simultaneously?

This probably seems like a very simple question, and I would probably know if I had a more in depth knowledge of computer processes and the like, but anyway..
If two people request the same page from my server, is the PHP page processed once for the first person, and then a second time for the second person, or might these run along side each other at the same time?
Take this as an example. I have one stock Item left in my PHP driven online shop. A user adds this to their cart. Php script 1) checks to see if it is in stock, Yup, its in stock, so it 2)reserves it for him.
What If, in between checking if its in stock and reserving it, the same PHP page was loading for someone else, and just after user A checked if it was in stock, so did user B, before user A got a chance to reserve it, so they both end up reserving it!
Sorry if this seems silly, can't seem to find an answer on it, which is it?
Congratulations, you have identified a race condition! :-)
Whether PHP pages run in parallel or one after the other depends on the web server. Typically a web server allocates several threads to handle multiple incoming requests at once. So it may indeed happen that several instances of the same script are run in parallel if two or more users request the same page at the same time. Due to timing and scheduling differences it is unpredictable when each page will execute which action exactly.
Hence for such situations as you describe it is important to program actions in an atomic way, meaning that they either complete in their entirety or not at all. In your case you could use locks, transactions, cleverly formed UPDATE statements, UNIQUE indexes or a number of other techniques that avoid the possibility of two users reserving the same thing.
Yes, in general, without getting into too much detail: PHP scripts are executed simultanously for each request separately.
For making sure the problem you mentioned does not occur, you should probably implement feature of your database management system called "transactions". This way if you do something on the database layer and at the end you will find out the reservation can not happen, all the actions made within transaction will be rolled back.
In addition to transactions you should design your application keeping in mind that the problem you mentioned may occur. Thus you should design your database & application in a way allowing you to 1) shorten the time between "checking" and "reserving" as much as possible, 2) stopping the action if you cannot make reservation, and finally - in case of emergency - 3) identifying which reservation came first and which should be revoked.
Another idea, falling into category of "your application's design", may be something we could call "temporary reservation". That means you can temporarily (eg. for a couple of seconds) lock your reservation if you are about to make reservation. After that you can check if you really can make that reservation and either turn it into permanent reservation or just revoke it. I believe some systems also make longer temporary reservations right after the customer begins the process of reserving his/her places. Then, if the process is successful, the reservation is changed into permanent, but if some specific amount of time passes without success, the reservation can be simply revoked, allowing another customer to begin the process.
yes definately, they are parallel for php but when the database concerns you should learn transaction portion of database management system.
Yes and no. PHP may run in simultaneous processes depending on server setup, but on a small-scale, you'll only have one database. Database queries are handled sequentially, so you'll never have that kind of conflict. (As long as you check to see if an item's in stock immediately before you reserve it for someone.) More information
Of course, Users A + B might both see that it's in stock, and A might request it before B. But your code can realize that it's now out of stock and display an error to User B.
(You get into trouble with multiple database servers. If you have the same data stored across multiple servers, there's lag time before data can be fully replicated. But you won't have that issue. We're talking like top 1,000 sites here.)

Store some records in the application and some in the database?

I have an application where it seems as if it would make sense to store some records hard-coded in the application code rather than an entry in the database, and be able to merge the two for a common result set when viewing the records. Are there any pitfalls to this approach?
Firstly, it would seem to make it easier to enforce that a record is never edited/deleted, other than when the application developer wants to. Second, in some scenarios such as installing a 3rd party module, the records could be read from their configuration rather than performing an insert in the db (with the related maintenance issues).
Some common examples:
In the application In the database
----------------------------------- ------------------ ----------------------
customers (none) all customers
HTML templates default templates user-defined templates
'control panel' interface languages default language additional languages
Online shop payment processors all payment processors (none)
So, I think I have three options depending on the scenario:
All records in the database
Some records in the application, some records in the database
All records in the application
And it seems that there are two ways to implement it:
All records in the database:
A column could be flagged as 'editable' or 'locked'
Negative IDs could represent locked values and positive IDs could represent editable
Odd IDs represent locked and even IDs represent editable...
Some records live in the application (as variables, arrays or objects...)
Are there any standard ways to deal with this scenario? Am I missing some really obvious solutions?
I'm using MySQL and php, if that changes your answer!
By "in the application", do you mean these records live in the filesystem, accessible to the application?
It all depends on the app you're building. There are a few things to consider, especially when it comes to code complexity and performance. While I don't have enough info about your project to suggest specifics, here are a few pointers to keep in mind:
Having two possible repositories for everything ramps up the complexity of your code. That means readability will go down and weird errors will start cropping up that are hard to trace. In most cases, it's in your best interest to go with the simplest solution that can possibly work. If you look at big PHP/MySQL software packages you will see that even though there are a lot of default values in the code itself, the data comes almost exclusively from the database. This is probably a reasonable policy when you can't get away with the simplest solution ever (namely storing everything in files).
The big downside of heavy database involvement is performance. You should definitely keep track of all the database calls of any typical codepath in your app. If you rely heavily on lots of queries, you have to employ a lot of caching. Track everything that happens and keep in mind what the computer has to in order to fulfill the request. It's you job to make the computer's task as easy as possible.
If you store templates in the DB, another big performance penalty will be the lack of opcode re-use and caching. Normal web hosting environments compile a PHP file once and then keep the bytecode version of it around for a while. This saves subsequent recompiles and speeds up execution substantially. But if you fill PHP template code into an eval() statement, this code will have to be recompiled by PHP every single time it's called.
Also, if you're using eval() in this fashion and you allow users to edit templates, you have to make sure those users are trusted - because they'll have access to the entire PHP environment. If you're going the other route and are using a template engine, you'll potentially have a much bigger performance problem (but not a security problem). In any case, consider caching template outputs wherever possible.
Regarding the locking mechanism: it seems you are introducing a big architectural issue here since you now have to make each repository (file and DB) understand what records are off-limits to the other one. I'd suggest you reconsider this approach entirely, but if you must, I'd strongly urge you to flag records using a separate column for it (the ID-based stuff sounds like a nightmare).
The standard way would be to keep classical DB-shaped stuff in the DB (these would be user accounts and other stuff that fits nicely into tables) and keep the configuration, all your code and template things in the filesystem.
I think that keeping some fixed values hard-coded in the application may be a good way to deal with the problem. In most cases, it will even reduce load on database server, because some not all the values must be retrieved via SQL.
But there are cases when it could lead to performance issues, mainly if you have to join values coming from the database with your hard-coded values. In this case, storing all the values in database may have better performance, because all values could be optimized and processed by the database server, rather than getting all the values from SQL query and joining them manually in the code.
To deal with this case, you can store the values in database, but inserts and updates must be handled just by your maintenance or upgrade routines. If you have a bigger concern about not letting the data be modified, you can setup a maintenance routine to check if the values from the database are the same as the code from time to time. In this case, this database tables act much like a "cache" of the hard-coded values. And when you don't need to join the fixed values with the database values, you can still get them from the code, avoiding an unnecessary SQL query (because you're sure the values are the same).
In general, anytime you're performing a database query if you want to include something that's hard-coded into the work-flow, there isn't any joining that needs to happen. You would simply the action on your hard-coded data as well as the data you pulled from the database. This is especially true if we're talking about information that is formed into an object once it is in the application. For instance, I can see this being useful if you want there to always be a dev user in the application. You could have this user hard-coded in the application and whenever you would query the database, such as when you're logging in a user, you would check your hard-coded user's values before querying the database.
For instance:
// You would place this on the login page
$DevUser = new User(info);
$_SESSION['DevUser'] = $DevUser;
// This would go in the user authentication logic
if($_SESSION['DevUser']->GetValue(Username) == $GivenUName && $_SESSION['DevUser']->GetValue(PassHash) == $GivenPassHash)
{
// log in user
}
else
{
// query for user that matches given username and password hash
}
This shows how there doesn't need to be any special or tricky database stuff going on. Hard-coding variables to include in your database driven workflow is extremely simple when you don't over think it.
There could be a case where you might have a lot of hard-coded variables/objects and/or you might want to execute a large block of logic on both sets of information. In this case it could be beneficial to have an array that holds the hard-coded information and then you could just add the queried information to that array before you perform any logic on it.
In the case of payment processors, I would assume that you're referring to online payments using different services such as PayPal, or a credit card, or something else. This would make the most sense as a Payment class that has a separate function for each payment method. That way you can call whichever method the client chooses. I can't think of any other way you would want to handle this. If you're maybe talking about the payment options available to your customers, that would be something hard-coded on your payment page.
Hopefully this helps. Remember, don't make it more complicated than it needs to be.

Persistent transaction in a client-server approach

In my application (client-server) I need to edit some rows (from a database), and as long as they are edited it needs nobody to be able to edit also. This is done by transactions of course. The problem is that in a client-side environment the transactions is managed on the server side, so the client that edits the rows can't access the transaction directly. (I'm working with PHP in that situation but think that the same approach is adopted in other technologies also). So I need to keep transaction opened (for keeping rows locked for editing) until the client finishes the edit.
In PHP, persistent connection won't help because they can be broken from other clients located in the same host with the aforesaid client. Do you have any ideeas for my scenario?
thank you.
Usually such cases are handled through business locks that you set directly on the objects, or on the parent of the objects.
Add a column such "inedition" that you set to true when user claims for edit, and set to false when user validate/cancel its edit.
Be aware that some users transactions are likely to be lost before that you unlock the row, so you'll probably require:
either a periodic treatment that unlock rows
either a functionnal screen from which the user or an admin can unlock the rows that remained locked.
Edit:
This kind of solution is used whenever you do not want to rely on database specific feature, such Oracle "Select for update". In Java an EJB statefull bean can keep a reference to the transaction from UI to database. There might be solutions using PHP for Oracle or other database specific feature regarding transactions, depending on databases.

Handling sessions without ACID database?

I am thinking about using a noSQL (mongoDB) paired with memcached to store sessions with in my webapp. The idea is that upon each page load, the user data is compared to the data in the memcache and if something has changed, the data would be written to both memcached and mySQL. This way the reads would be greatly reduced and memcached utilized to do what it does best.
However I am a bit concerned about using a non-ACID database for session storage especially with the memcached layer. Let's say something goes wrong while updating the session to the DB and our users got instant headache wondering why their product that they put in the cart doesn't show up...
What's an appropriate approach to this? Should we go for a mySQL session storage or is it fine to keep a non-acid supportive database for sessions?
Thanks!
I'm using MongoDB as session storage currently. It is possible to avoid race conditions mentioned by pilif. I found a class that implements a session handler for MongoDB (http://www.jqueryin.com/projects/mongo-session/) and forked it on github to suit my needs (http://github.com/halfdan/MongoSession).
If you don't want to lose your data, stick with ACID tested databases.
What's the payoff you're looking for?
If you want a secure system, you can't trust anything from the user, save for perhaps selected integers, so letting them store the information is typically a really bad idea.
I don't see the payoff for storing sessions outside of your MySQL database. You can cron cleanup on the tables if that's your concern, but why bother? Some users will shop on a site and then get distracted for a while. They would then come back a day or two later.
If you use cookies or something really temporary to store their session info, there is a really good chance their shopping time was wasted. Users really value their time... so if you stored their session info in the database, you can write something sexy to manage that data.
Plus, the nice side effect of this is that you'll generate a lot of residual information about what people like on your website that wouldn't perhaps be available to you later on. Like you could even consider some of it to be like a poll or something where the items people are adding to their cart could impact how you manage your business, order inventory or focus your marketing.
If you go with something really temporary then you lose out on getting residual benefits.
Without any locking on the session, be really, really careful of what you are storing. Never ever store anything that is dependent on what you have read before as the data might change between you reading and writing - especially in case of ajax where multiple requests can go out at once.
An example what you must not store in a non-locked session would be a shopping cart as, to add a product, you have to read, unserialize, add the product and then serialize again. If any other request does the same thing between the first requests read and write, you lose the second request's data.
Have a look at this article for detail: http://thwartedefforts.org/2006/11/11/race-conditions-with-ajax-and-php-sessions/
Keep Sessions on your filesystem (where PHP locks them for you), in your database (where you have to do manual locking) or never, ever, write anything of value to your session if that value is derived of a previous read.
While using memcached as a cache for database, it is the user who have to ensure the data consistency between database and cache. If you'll want to scale up and add more servers there is a probability to be out of sync with database even if everything seems ok.
Instead you may consider Hazelcast. As of 1.9 it also supports memcache protocol. Compared to memcached Hazelcast wants you to implement Map Persister and only itself updates the database for the updated entries. This way you don't have to handle "check cache, if data changed update database" kind of stuff.
If you write your app so that the user stores all session information client side, then you just verify that information as needed, you won't need to worry about sessions on the server side. This is one of the principles in REST style architecture. For instance, if the user is requesting adding an item to their shopping cart, just store the itemID list and count on the client side. When you hit the cart page, you can easily look up the item information from the list of itemIDs they are telling you are in their cart.
During checkout, go directly against the database with transactions to ensure you aren't getting any race conditions, and check your live inventory. If inventory isn't there when they go to check out, just say, "sorry, we just sold out". Of course, at that point you should go update any caches you have out there that are telling people you have inventory.
I would look at how much the user costs to acquire and then ask what is the cost for implementing a really good system. Keep in mind that users are a biological retry method. "I'm bored... press reload again..." While, this isn't the most perfect solution, it is sometimes acceptable vs the cost comparsion for "not lose anything - ever".
If you want additional security, you can have your sessions cached to a separate set of memcache servers so there are no accidental flushes. :)
There are a number of other systems membase.org, and some other persistent memcache solutions (java implementations) that will persist storage to disk. If you want to modify your client somewhat, or how you access memcache, you can do your own replication of memcache session objects.
-daniel

Categories