How long until MySQL transaction times out - php

How long can a MySQL transaction last until it times out? I'm asking because I'm planning to code a payment process for my e-commerce project along the lines of this (PHP/MySQL pseudo-code):
START TRANSACTION;
SELECT ... WHERE id IN (1, 2, 3) AND available = 1 FOR UPDATE; // lock the rows where "available" is true
// Do payment processing...
// Add to the database, then commit or rollback based on the payment results
I cannot think of another way to lock the products being bought (so that if two users buy at the same time and there is only one left in stock, only one of them can buy it), process the payment if the products are available, and create a record based on the payment results...

That technique would also block users who simply wanted to see the products other people are buying. I'd be exceptionally wary of any technique that relies on database row locking to enforce inventory management.
Instead, why not simply record the number of items currently tied up in an active "transaction" (here meaning the broader commercial sense rather than the technical database sense)? If you have a current_inventory field, add an on_hold or being_paid_for or not_really_available_because_they_are_being_used_elsewhere field that you can update with information on current payments.
Better yet, why not use a purchase/payment log to sum the items currently "on hold" or "in processing" for several different users?
This is the general approach you often see on sites like Ticketmaster that declare, "You have X minutes to finish this page, or we'll put these tickets back on the market." They're recording which items the user is currently trying to buy, and those records can even persist across PHP page requests.
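A minimal sketch of that idea, assuming hypothetical products and holds tables (the names and columns are illustrative, not from the original post):
-- Record a hold when the buyer enters checkout; it expires a few minutes later
INSERT INTO holds (product_id, user_id, qty, expires_at)
VALUES (42, 7, 1, NOW() + INTERVAL 10 MINUTE);
-- Quantity actually sellable = stock on hand minus un-expired holds
SELECT current_inventory - (
    SELECT COALESCE(SUM(qty), 0) FROM holds
    WHERE product_id = 42 AND expires_at > NOW()
) AS sellable
FROM products
WHERE id = 42;
No row stays locked while the buyer is off paying, and expired holds simply stop counting.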

If you have to ask how long it is before a database connection times out, then your transactions take orders of magnitude too long.
Long open transactions are a big problem and a frequent cause of poor performance, hard-to-reproduce bugs, or even deadlocks that stall the complete application. Certainly in a web application you want tight, fast transactions to make sure all table- and row-level locks are freed quickly.
I have found that even a few hundred milliseconds can become troublesome.
Then there is the problem of sharing a transaction across multiple requests, which may happen concurrently.
If you need to "emulate" long-running transactions, cut the work into smaller pieces that can be executed quickly, and keep a log so you can roll back later by using the log to undo those pieces.
Now, if the payment service completes in less than 2 seconds in 98% of cases and you do not have hundreds of concurrent requests going on, it might just be fine.

The timeout depends on server settings -- both MySQL's and those of the language you are using to interact with MySQL. Look in the configuration files for your server.
I don't think what you are doing would cause a timeout, but if you are worried you might want to rethink where the check happens so that it doesn't actually hold locks across queries. You could instead use a stored procedure built into the data layer rather than relying on two separate calls. Or maybe a conditional insert or a conditional update (see the sketch at the end of this answer)?
All in all, as another answer noted, I don't like the idea of locking rows that you might want to be able to select for other purposes outside the actual "purchase" step, as it could cause problems or bottlenecks elsewhere in your application.
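For example, a conditional update can combine the stock check and the decrement into one atomic statement, so nothing stays locked while the payment gateway is doing its work (table and column names are only illustrative):
-- Affects 1 row only if there is still stock left to sell
UPDATE products
SET stock = stock - 1
WHERE id = 42 AND stock > 0;
-- In PHP, check the affected-row count: 1 means the item was secured, 0 means it just sold out.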


Will a PHP page get incomplete query results because said results are in the process of being replaced/recalculated elsewhere? [duplicate]

I read some articles about when I should use transactions. I read:
When should I use transactions?
Basically any time you have a unit of work that is either sensitive to outside changes or needs the ability to roll back every change if an error occurs or for some other reason.
But can someone explain it better to me?
Should I start a transaction when I execute two or more delete/update/insert queries?
Should I also start a transaction when I just have one delete/update/insert query?
Should I start a transaction like 10 times on a page, or better only once for the whole page, or do you recommend a max for each page (for example 5)?
Thanks for your help!
Transactions are used when you have a group of queries that all depend on one another.
For example, a bank:
Bank customer "John" transfers $100 to the account of "Alice".
For this example, there are 2 queries (I'm not showing logging, transaction history, etc.). You need to deduct $100 from John's balance and add that $100 to Alice's balance.
Start transaction
Deduct from John
UPDATE accounts SET balance=balance-100 WHERE account='John'
Add to Alice
UPDATE accounts SET balance=balance+100 WHERE account='Alice'
commit
A transaction isn't saved until you commit it. So if there was an error in either query, you could call rollback and undo any queries that have run since the transaction was started. If for some reason the query for adding $100 to Alice failed, you could rollback and not deduct $100 from John. It is a way of ensuring that you can undo queries automatically if needed.
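In PHP with PDO, the same transfer could be written roughly like this (a sketch, assuming a $pdo connection configured with PDO::ERRMODE_EXCEPTION so failed queries throw, and the accounts table used above):
try {
    $pdo->beginTransaction();
    $pdo->prepare("UPDATE accounts SET balance = balance - 100 WHERE account = ?")->execute(['John']);
    $pdo->prepare("UPDATE accounts SET balance = balance + 100 WHERE account = ?")->execute(['Alice']);
    $pdo->commit();      // both updates are saved together
} catch (Exception $e) {
    $pdo->rollBack();    // neither update is kept if anything failed
    throw $e;
}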
Should I start a transaction when I execute two or more delete/update/insert queries?
Depends on what the queries are doing.
Should I also start a transaction when I just have one delete/update/insert query?
Not necessarily, unless you need a way to roll back (undo) the query, for example if you want to run an update and validate the result before calling commit (save).
Should I start a transaction like 10 times on a page, or better only once for the whole page, or do you recommend a max for each page (for example 5)?
Start as many as you need. I would doubt that you have multiple transactions per page as you would most likely be doing one thing on each page load (i.e. transferring money).
You use transactions whenever you need a group of related queries to be executed as if they were a single query.
For example:
In a shopping cart someone adds a product: you update the store inventory (decrease the stock) and update the customer's cart (increase the number of products in the cart). In this case 2 operations take place, and that's why you create a transaction: if one of the queries fails (insufficient products in the store), the second query should not happen (adding the product to the customer's cart). In this case the 2 queries form a unit of work, meaning that the database should process both or neither.
Now, taking your points into consideration, I don't see a solid reason for using transactions to execute unrelated queries (unless there is a special need). Depending on the database engine you are using and the system you are developing, unnecessary transactions will create a performance bottleneck.
On the other hand, there may be some applications where you may need to perform multiple transactions, but that is something you should decide on a case-by-case basis as there is no general rule.
Regarding your concern about limiting the number of transactions per page, or creating only one transaction per page, I can only say that this depends on what you are trying to accomplish. You may need to perform 10 related queries which should be rolled back together upon failure, in which case you need one transaction. If you need to run many unrelated queries with no need to roll back, you don't even need a transaction.
You can also check wikipedia's entry about this subject:
http://en.wikipedia.org/wiki/Database_transaction

PDO Transactions Locks?

I have done extensive research about MySQL transactions via the PHP PDO interface. I am still a little fuzzy about how the transaction methods actually work behind the scenes. Specifically, I need to know whether there is any reason I should avoid wrapping all my queries (SELECTs included) in a single transaction spanning from the beginning of the script to the end, handling any error in the transaction and rolling it back if need be.
I want to know if there is any locking going on during a transaction and if so, is it row level locking because it is InnoDB?
Don't do that.
The reason for this is that transactions take advantage of MVCC, a mechanism by which every piece of data updated is in fact not updated in place but written as a new version somewhere else.
MVCC implies allocating memory and/or storage space to accumulate all of the changes you send without committing them to disk until you issue a COMMIT.
That means that while your entire script runs, all changes are held until the script ends, and all of the records that you try to change during the transaction are marked as "work in progress" so that other processes/threads know that this data will soon be invalidated.
Having certain pieces of data marked as "work in progress" for the entire length of the script means that any other concurrent update will see the flag and say "I have to wait until this finishes so I get the most recent data".
This includes SELECTs, depending on the isolation level. Selecting data that is marked as "work in progress" may not be what you want, because some tables you join may contain already-updated data while other tables are not updated yet, resulting in a dirty read.
Transactionality and atomicity of operations are desirable but costly. Use them where they are needed. Yes, that means more work for you to figure out where race conditions can happen, and even if race conditions occur you have to decide whether they are really critical or whether "some" data loss or mixing is acceptable.
Would you like your logs, visit counters and other statistics to drag down the speed of your entire site? Or can the quality of that information be sacrificed for speed (as long as it's not an analytics suite, you can afford the occasional collision)?
Would you like a seat-reservation application to misfire and allow more users to grab a seat even after the seat count reached 0? Of course not: here you want to leverage transactions and isolation levels to ensure that never happens.
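A minimal sketch of that seat-reservation case, assuming an InnoDB table events with a seats_available column (both made up for illustration); the row lock taken by FOR UPDATE is what keeps two buyers from grabbing the last seat:
START TRANSACTION;
-- the row stays locked until COMMIT/ROLLBACK, so only one buyer at a time gets past this point
SELECT seats_available FROM events WHERE id = 42 FOR UPDATE;
-- in application code: if seats_available > 0, take the seat and finish
UPDATE events SET seats_available = seats_available - 1 WHERE id = 42;
COMMIT;
-- if seats_available was already 0, issue ROLLBACK instead and tell the user the event is full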

High-load payment processing architecture in PHP

Imagine a local Groupon clone. Now imagine a deal that attracted 10x the normal visitors, and because visitors were trying to buy the deal in parallel, the MySQL database went down and the deal's maximum-purchase limit was exceeded.
I'm looking for best practices for payment processing on high-load websites that must handle payments for a limited quantity of products in parallel.
For now the simplest option seems to be locking/unlocking the deal while a customer is trying to purchase it on a third-party payment processor's page.
Any thoughts?
I was with you until you started to talk about a 3rd party payment processors page. It's hard to control your user's experience while dishing them off to a 3rd party site, because you have no idea what they're doing while they're there, if they got side-tracked, how long they're going to take to finish the transaction, IF they finished the transaction, etc.
If processing payments locally is not an option, that's not necessarily a problem - it just presents an issue with how you have to actually think about handling your transactions.
So, if it were me, not thinking about the 3rd party right now - we'll set that aside for a minute. Obviously, I'd #1 make sure my MySQL database was resilient enough to not go down, because that creates a huge problem for reconciling transactions. But, things happen, so you need a backup.
My suggestion would be to utilize a caching system which kept track of the product, and the current # of products available. Memcache could be good for this, as it's just a single record which will be pretty easy to grab. You wouldn't have to hit the database at all to get info on your product (availability) and if it went down, your users/application would be none the wiser, as you'd be getting info straight from Memcache about your item (no mysql required).
This presents an issue (when the database goes down) with storing payment records. When you collect money, you obviously need that transaction information in your database, and if your database is down - well, that's a problem. Memcache is not such a great solution for this, because you're limited to the size of your value and you must know about every key you care about. On top of that, Memcache doesn't have sets or set operations, so you can't append to a value without fear of nuking some data.
So, lets add a different piece of technology, Redis.
A solution for the transaction problem would be to write them to redis in the event that your MySQL server is not available (or write to both if you really want to, but you don't really need to do that). Then have a background process that knows how to go get the transaction details from redis and write them to your MySQL table(s) when it comes back online. Redis is pretty resilient to crashing, and is capable of operating at huge volumes. It also has set operations so you can easily append data to a set without fear of a race condition during your read/change/write operations.
So, you could store all your transactions in a Redis key as a single set (store them as JSON strings if you like; that'd be pretty easy), then when your DB crashes you can just fetch that data from Redis and write it to MySQL when it comes back online.
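A rough sketch of that fallback, assuming the phpredis extension with a connected $redis client, a PDO handle $pdo, and made-up key/column names:
// Web request: MySQL is down, so park the completed payment in a Redis set instead
$redis->sAdd('transactions:pending', json_encode([
    'order_id' => $orderId,
    'amount'   => $amount,
    'paid_at'  => time(),
]));
// Background worker, run once MySQL is reachable again: drain the set into the real table
while (($raw = $redis->sPop('transactions:pending')) !== false) {
    $txn = json_decode($raw, true);
    $pdo->prepare('INSERT INTO transactions (order_id, amount, paid_at) VALUES (?, ?, ?)')
        ->execute([$txn['order_id'], $txn['amount'], $txn['paid_at']]);
}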
To keep things simple, if you were going to use Redis to store transactions, you may as well also use it to store your product cache instead of Memcache - keep the stack simple.
This takes care of not accessing the database for your Product details, and also keeping track of your (potentially) missed transactions, should MySQL crash. But it doesn't handle the problem of keeping track of product inventory while new transactions come in while MySQL is down, and ensuring that you don't over-sell product.
To handle this case, when a transaction is saved, you can decrement the # of products available (keep it as a flat number, so you're not constantly re-calculating it on page-load). This will tell you instantly if the product is oversold or not. However, what this does not do is protect the time that the "product is in the cart." Once the user puts the product in the cart (which you've allowed because you said you have the inventory), you have the problem of making sure it doesn't sell out before they check out.
The solution to this problem also doubles as your solution to the 3rd-party transaction problem. You're using a caching mechanism for your products and a fall-back mechanism for your transactions. What you should do now is, when a user tries to buy a product (either puts it in the cart, or is shot off to the 3rd-party processor), create a "product reservation" for them. It's probably easiest to make a Redis entry for each of these. Give product reservations an expiry time, say 5 or 10, maybe even 15 minutes if you like. Every time you see a user on your site, refresh the timeout to make sure they don't run out of time (you can put more logic in this if you desire, obviously). When a transaction completes and changes from pending to paid, you create your transaction record (in MySQL or Redis, depending on database availability), decrement your available quantity, and delete the reservation record.
You'd then use your available-quantity information, together with your un-expired reservation information, to determine the quantity available for sale. If this number ever drops to zero, you are effectively sold out; but if a certain number of your users don't convert, it frees up the inventory that they didn't buy, allowing you to rinse and repeat that process until you are, in fact, sold out.
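One way to sketch that reservation bookkeeping (again phpredis, with made-up key names, and $productId/$userId assumed to be set already): keep each reservation in a sorted set scored by its expiry time, so counting un-expired reservations and purging stale ones are each a single command.
$ttl = 600; // 10-minute reservation window
// Reserve: score the entry with the moment it expires; re-run this to refresh the timeout
$redis->zAdd("reservations:$productId", time() + $ttl, "user:$userId");
// Quantity still sellable = cached stock minus reservations that have not yet expired
$active   = $redis->zCount("reservations:$productId", time(), '+inf');
$sellable = (int) $redis->get("stock:$productId") - $active;
// On successful payment: record the sale and drop the reservation
$redis->decr("stock:$productId");
$redis->zRem("reservations:$productId", "user:$userId");
// Periodic cleanup (or on each read): discard reservations whose time ran out
$redis->zRemRangeByScore("reservations:$productId", '-inf', time());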
This is a pretty long explanation of a fairly robust system, and if you ever run into the situation where your MySQL server crashed, AND redis crashed, you'd be kind of screwed; so it makes sense to have a failover of both of those systems here (which is entirely feasible and possible). It should make for a pretty rock solid checkout/inventory management process.
Hope it helps.
Use a master-slave MySQL configuration with separate read/write connections.
Use caching as much as possible (Redis is a good idea).
Try to put some logic into Redis so it doesn't need extra connections to MySQL; it will also be faster.
For transactions it may be wise to use some kind of message-queuing system (RabbitMQ). It will allow you to push some tasks into the background.
Despite all this optimization you will have big problems if the DB, the cache engine or the MQ fails. But by using master-slave setups for all these services you will be somewhat on the safe side, i.e. using multiple machines that can continue to work if another machine fails.
And that brings me to the next idea: cloud services with auto-scaling (like AWS).
Have you considered a Compensating Service Transaction?

Do PHP pages on a server run simultaneously?

This probably seems like a very simple question, and I would probably know the answer if I had more in-depth knowledge of computer processes and the like, but anyway...
If two people request the same page from my server, is the PHP page processed once for the first person, and then a second time for the second person, or might these run along side each other at the same time?
Take this as an example. I have one stock item left in my PHP-driven online shop. A user adds it to their cart. The PHP script 1) checks to see if it is in stock. Yup, it's in stock, so it 2) reserves it for them.
What if, in between checking whether it's in stock and reserving it, the same PHP page was loading for someone else? Just after user A checked that it was in stock, so did user B, before user A got a chance to reserve it, so they both end up reserving it!
Sorry if this seems silly; I can't seem to find an answer on it. Which is it?
Congratulations, you have identified a race condition! :-)
Whether PHP pages run in parallel or one after the other depends on the web server. Typically a web server allocates several threads to handle multiple incoming requests at once. So it may indeed happen that several instances of the same script are run in parallel if two or more users request the same page at the same time. Due to timing and scheduling differences it is unpredictable when each page will execute which action exactly.
Hence for such situations as you describe it is important to program actions in an atomic way, meaning that they either complete in their entirety or not at all. In your case you could use locks, transactions, cleverly formed UPDATE statements, UNIQUE indexes or a number of other techniques that avoid the possibility of two users reserving the same thing.
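For the one-item-left example, a "cleverly formed UPDATE" folds the check and the reservation into one atomic statement, leaving no gap for a second user to slip into (table and column names are illustrative):
UPDATE items
SET reserved_by = 123        -- the current user's id
WHERE id = 42
  AND reserved_by IS NULL;   -- only if nobody has claimed it yet
-- affected rows = 1: this user got it; affected rows = 0: someone else was faster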
Yes. In general, without getting into too much detail: PHP scripts are executed simultaneously, separately for each request.
To make sure the problem you mentioned does not occur, you should probably use a feature of your database management system called "transactions". This way, if you do something at the database layer and at the end find out the reservation cannot happen, all the actions made within the transaction will be rolled back.
In addition to transactions you should design your application keeping in mind that the problem you mentioned may occur. Thus you should design your database and application in a way that allows you to 1) shorten the time between "checking" and "reserving" as much as possible, 2) stop the action if you cannot make the reservation, and finally, in case of emergency, 3) identify which reservation came first and which should be revoked.
Another idea, falling into the category of "your application's design", is something we could call a "temporary reservation". That means you can temporarily (e.g. for a couple of seconds) lock the item if you are about to make a reservation. After that you can check whether you really can make the reservation and either turn it into a permanent one or revoke it. I believe some systems also make longer temporary reservations right after the customer begins the process of reserving his/her places. Then, if the process is successful, the reservation is made permanent, but if some specific amount of time passes without success, it can simply be revoked, allowing another customer to begin the process.
Yes, definitely, they run in parallel as far as PHP is concerned, but where the database is concerned you should learn about the transaction features of your database management system.
Yes and no. PHP may run in simultaneous processes depending on the server setup, but on a small scale you'll only have one database. Database queries are handled sequentially, so you'll never have that kind of conflict (as long as you check whether an item is in stock immediately before you reserve it for someone).
Of course, users A and B might both see that it's in stock, and A might request it before B. But your code can realize that it's now out of stock and display an error to user B.
(You get into trouble with multiple database servers: if you have the same data stored across multiple servers, there's lag time before data can be fully replicated. But you won't have that issue; we're talking about the top 1,000 sites here.)

Database various connections vs. one

We have this PHP application which selects a row from the database, works on it (calls an external API which uses a webservice), and then inserts a new register based on the work done. There's an AJAX display which informs the user of how many registers have been processed.
The data is mostly text, so it's rather heavy data.
The process handles thousands of registers at a time. The user can choose how many registers to start working on. The data is obtained from one table, where processed registers are marked as "done". There is no "WHERE" condition, except the optional "WHERE date BETWEEN date1 AND date2".
We had an argument over which approach is better:
Select one register, work on it, and insert the new data
Select all of the registers, work with them in memory and insert them in the database after all the work was done.
Which approach do you consider the most efficient one for a web environment with PHP and PostgreSQL? Why?
It really depends how much you care about your data (seriously):
Does reliability matter in this case? If the process dies, can you just re-process everything? Or can't you?
Typically when calling a remote web service, you don't want to be calling it twice for the same data item. Perhaps there are side effects (like credit card charges), or maybe it is not a free API...
Anyway, if you don't care about potential duplicate processing, then take the batch approach. It's easy, it's simple, and fast.
But if you do care about duplicate processing, then do this:
SELECT 1 record from the table FOR UPDATE (ie. lock it in a transaction)
UPDATE that record with a status of "Processing"
Commit that transaction
And then
Process the record
Update the record contents, AND
SET the status to "Complete", or "Error" in case of errors.
You can run this code concurrently without fear of it running over itself. You will be able to have confidence that the same record will not be processed twice.
You will also be able to see any records that "didn't make it", because their status will still be "Processing", as well as any errors.
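Roughly, the claim step could look like this (PostgreSQL syntax; the registers table, column names, and status values are illustrative, and SKIP LOCKED, available since PostgreSQL 9.5, is an extra touch beyond the steps above that lets several workers claim different rows without blocking each other):
BEGIN;
-- claim one unprocessed row; SKIP LOCKED makes parallel workers pick different rows
SELECT id FROM registers
WHERE status = 'Pending'
LIMIT 1
FOR UPDATE SKIP LOCKED;
UPDATE registers SET status = 'Processing' WHERE id = :claimed_id;
COMMIT;
-- ...call the external API outside of any open transaction...
UPDATE registers SET result = :api_result, status = 'Complete' WHERE id = :claimed_id;
-- or: UPDATE registers SET status = 'Error' WHERE id = :claimed_id;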
If the data is heavy and so is the load, and considering the application is not real-time dependent, the best approach is most definitely to fetch all of the needed data, work on it, and then put it back.
Efficiency-wise, regardless of language, if you are fetching single items and working on them individually, you are probably opening and closing the database connection each time. This means that if you have thousands of items, you will open and close thousands of connections. The overhead of this far outweighs the overhead of returning all of the items and working on them together.
