So I wanted to start a browser game project, just for practice.
After years of PHP programming, today I heard about transactions and InnoDB for the first time ever.
So I googled it and still have some questions.
My first encounter with it was on a website that said InnoDB is necessary when programming a browser game, because the game might be used by many people at the same time. If two people access a database table at the same time (with one nanosecond difference, for example), things can get confusing: data might be lost, or your SELECT might not reflect the update made by the access one nanosecond earlier (because that script was still running and hadn't written its changes yet), and so on.
And apparently, transactions solve this problem by first handling the first access (until it is completed) and then handling the second one. Is this correct?
Another feature is that if you have, for example, two queries in your transaction and the second one fails, the transaction "rolls back" and deletes (or never applies) the changes of the first (successful) query. Right? So either everything goes as it should or nothing changes at all. That would be great, I think.
Another question: when should I use transactions? Every time I access the database? Or is it better to use them only for particular accesses to the database? And should I always use try {} catch() {}?
And one last question:
How does a transaction proceed?
My understanding is the following:
You start a transaction
You do your queries and change the database or SELECT something
If everything went well, you commit the changes so they get applied to the database
If something goes wrong with the queries, execution jumps to the catch() {} block, where you roll back the transaction so the changes don't get applied
Is this correct? That leaves, of course, the question of how to start, commit and roll back a transaction in your code.
Yes, this is correct. You can also create savepoints to save your current point before running a query. I strongly recommend you look into the MySQL reference documentation; it is explained there clearly.
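For example, with PDO the flow you described looks roughly like this (a minimal sketch; the database, table and column names are made up):

$pdo = new PDO('mysql:host=localhost;dbname=game', 'user', 'password');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

try {
    // 1. Start the transaction
    $pdo->beginTransaction();

    // 2. Run the queries that belong together
    $pdo->exec("UPDATE players SET gold = gold - 100 WHERE id = 1");
    $pdo->exec("UPDATE players SET gold = gold + 100 WHERE id = 2");

    // 3. Everything went well: apply the changes
    $pdo->commit();
} catch (Exception $e) {
    // 4. Something went wrong: undo everything since beginTransaction()
    $pdo->rollBack();
    // log or re-throw the error here
}

Note that this only works on a transactional engine such as InnoDB; MyISAM tables simply ignore the transaction.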
Is there a way to roll back a transaction later in time? I want to make a function where the user can upload an Excel file and the data it contains is converted into SQL inserts inside a transaction. If there is an error, I can roll back the transaction, but I would also like to be able to roll it back when the user wants to. So basically it is an UNDO LAST SESSION/UPLOAD function for this section.
I am using PHP 5.4, Laravel 4.2 and MySQL 5.5.
Rollback is a technical term in database operations; it has a precise meaning related to transactions.
What you're proposing to do doesn't sound like a rollback. It sounds like deleting rows--deleting rows at an arbitrary time. (Arbitrary time, because last session could be hours or days or weeks ago, right?)
As long as you know which keys are involved--and that might not be a simple thing to do--you can delete rows by using the keys. And you can do this regardless of the dbms and regardless of the ORM or web framework.
One complication in the general case is that committed rows are visible to other transactions. So other transactions--and other users--might use some of your rows in foreign key references. This will greatly complicate deleting rows.
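A common way to make that practical, sketched below with made-up table and column names, is to tag every row of an upload with a batch identifier; "undo last upload" then becomes a delete by that key (which, as noted above, will still fail if other rows have started referencing them):

-- one extra column on the rows created by the import
ALTER TABLE imported_rows ADD COLUMN upload_batch_id INT NULL;

-- during the import, tag every inserted row with the current batch id
INSERT INTO imported_rows (col_a, col_b, upload_batch_id) VALUES (?, ?, ?);

-- later, "undo last upload" is just a delete by that key
DELETE FROM imported_rows WHERE upload_batch_id = ?;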
I currently use InnoDB transactions to manage the effect of any single webpage request. One request per transaction. This works well: if the request fails I can just ignore it.
As a relative newbie to MySQL administration, I remain worried that something I write into my PHP code will do something bad to my database: a DELETE FROM or UPDATE without a WHERE clause, as an extreme example. The idea of the transactions is that when I inevitably notice what happened later, after the bad transaction is committed, I should be able to roll back the mistake.
However, the database is used heavily, so it's likely that other transactions will come in between when I commit the bad transaction and when I notice it and go to act on it. But all the documentation I have seen on transactions, and the AWS restore-to-point-in-time feature, only lets you "go back" to before a transaction is committed.
So, how do I recover or "roll-forward" the transactions that came in after my bad one? They are in the InnoDB log, so should I be able to apply the later transactions again, just skipping the one bad one? My software interfaces with an external credit card processor, so just losing those later transactions isn't an option.
I have a hard time imagining it's impossible, but I can't find any way to "roll forward". Is this possible? Is it something you have to build into the database structure itself, like keeping a history table with triggers and using the history records to re-apply changes after rolling back?
Confused.
I have a dilemma, which I hope you will have some expert opinions on.
I have a table called CARDS with a column STATUS. If a record's status changes from 'download' to 'publish', I have to insert the record reference into another table called CARD_ASSIGNMENTS. Additionally, the record needs to be added into CARD_ASSIGNMENTS as many times as there are active records in SCANNERS.
In other words, if there are two active scanners, I will end up with two records in CARD_ASSIGNMENTS as below:
ID  CARD_ID  SCANNER_ID  STATUS_ID
1   1        1           4
2   1        2           4
My dilemma is that I'm not quite sure what would be the most efficient way to execute the above. I've considered the following options:
From PHP - Do one UPDATE query and then the INSERT queries.
Create a stored procedure, which will take care of updating the CARDS record and adding records into the CARD_ASSIGNMENTS. Then, just call that stored procedure from PHP.
Create an ON UPDATE trigger for the CARDS table which will take care of processing INSERTS into the CARD_ASSIGNMENTS table.
PS. A simplified version of my database is available on MySQL Fiddle
Thanks,
Kate
Interesting question.
I'm going to give you clues about how to approach the problem.
So, you have to start by defining precisely three things:
the expected functionality
the access policy to the functionality
the technical upgrade policy
Here I'll detail these points.
So, the first point is that you have to define your functionality precisely. By doing so, you will be able to tell whether adding a card always implies, in every possible scenario of your information system, that this card MUST also exist in the other table according to the specifications you provided. This 1-1 functional link must be declared TRUE or FALSE. This is really important.
In other words, if there is at least one possibility that one day you won't want to copy that record to the other table, a trigger is the wrong solution, or at least it should be designed with an escape hatch (for example, a flag checked inside the trigger that allows it to skip execution under certain conditions).
Then comes the second point, about the access policy. You have to know whether the systems that are allowed access will go through your application layer, or whether they could develop their own (SaaS style). If the latter, your PHP layer will be bypassed and the stored procedure is an excellent option, since every technical and business layer will have to go through it no matter what.
The last thing to know is whether you're likely to upgrade your PHP layer one day. In most cases the answer is yes. If so, you might have to modify the part containing this SQL logic you're talking about. Then, having everything in a stored procedure rather than hard-coded in the PHP will definitely save you time and improve stability.
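If you went with the stored procedure option, it could look roughly like the sketch below (the procedure name, the ACTIVE column on SCANNERS and the STATUS_ID value 4 are assumptions taken from your example):

DELIMITER //
CREATE PROCEDURE publish_card(IN p_card_id INT)
BEGIN
    -- flip the card to 'publish'
    UPDATE CARDS SET STATUS = 'publish' WHERE ID = p_card_id;

    -- one assignment row per active scanner
    INSERT INTO CARD_ASSIGNMENTS (CARD_ID, SCANNER_ID, STATUS_ID)
    SELECT p_card_id, SCANNERS.ID, 4 FROM SCANNERS WHERE SCANNERS.ACTIVE = 1;
END //
DELIMITER ;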
Left brain, right brain: I'm going to tell you my personal opinion after all. I really love going with stored procedures but not using any triggers. If the environment allows it, I would go for an underlying batch, calling a set of defined stored procedures, concentrating the activity outside of the online scope.
The advantages are the following:
no or fewer risks of interrupting the online workflow, since you reduce the number of online operations
a different schedule to alleviate the database load
a more secure policy, since executing the stored procedure requires only one grant, while running the same SQL from PHP would require INSERT/UPDATE grants
better logging quality: you can have a log per job
better emergency response: when a job fails (if well designed) you can restart it, and that's it.
Long post, but that was interesting and I really wanted to share these ideas.
Cheers!
I would use triggers. Some developers say that if you have too many triggers and stored procedures, the database lives a life of its own: you never know what is going to happen on insert, update, etc. But in my opinion, triggers can help you a lot to keep the database consistent, so even if someone inserts data directly from some administration tool, integrity is still kept, because all necessary commands are executed. If you choose stored procedures, you would still have to know that you need to call that procedure to insert any new data.
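For illustration, such a trigger could look roughly like this (a sketch; the ACTIVE column and the STATUS_ID value 4 are assumptions based on the question):

DELIMITER //
CREATE TRIGGER cards_after_update AFTER UPDATE ON CARDS
FOR EACH ROW
BEGIN
    -- only react when the status flips from 'download' to 'publish'
    IF OLD.STATUS = 'download' AND NEW.STATUS = 'publish' THEN
        INSERT INTO CARD_ASSIGNMENTS (CARD_ID, SCANNER_ID, STATUS_ID)
        SELECT NEW.ID, SCANNERS.ID, 4 FROM SCANNERS WHERE SCANNERS.ACTIVE = 1;
    END IF;
END //
DELIMITER ;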
Background: I'm working on a system where the developers seem to be using a function which executes a MySQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).
I know this is a horrible practice (because concurrent requests will mess up the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...
"Oh okay, we'll only face this problem when we have
(a) a lot of users, or
(b) it'll only happen when two people try doing something
at _exactly_ the same time"
I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure a mechanism) to calculate how many users should be using the system before we start seeing messed up links.
Any mathematical insights into that? Again, I KNOW it's a horrible practice, I just want to understand the variables in this situation...
Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!
The point is not whether potential bad situations are likely. The point is whether they are possible. As long as there's a non-trivial probability of the issue occurring and the issue is known, it should be avoided.
It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.
I kind of agree with @Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.
And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.
And right from the MySQL documentation on the subject:
You can generate sequences without calling LAST_INSERT_ID(), but the utility of
using the function this way is that the ID value is maintained in the server as
the last automatically generated value. It is multi-user safe because multiple
clients can issue the UPDATE statement and get their own sequence value with the
SELECT statement (or mysql_insert_id()), without affecting or being affected by
other clients that generate their own sequence values.
Instead of using SELECT MAX(id) you should do as the documentation says:
Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query
Even so, neither SELECT MAX(id) nor mysql_insert_id() is "thread-safe", and you could still have a race condition. The best option you have is to lock tables before and after your requests, or even better, use transactions.
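For reference, a minimal PDO sketch of the recommended pattern, assuming $pdo is an open PDO connection (the table, columns and values are placeholders):

// Wrong: another client may insert a row between the INSERT and this SELECT
// $id = $pdo->query("SELECT MAX(id) AS id FROM orders")->fetchColumn();

// Right: the last generated id is tracked per connection by the server
$stmt = $pdo->prepare("INSERT INTO orders (customer_id, total) VALUES (?, ?)");
$stmt->execute([42, 19.99]);
$id = $pdo->lastInsertId();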
I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solve it once correctly the first time.
This will happen when someone has added something to the table between one insert and that query running. So to answer your question, two people using the system has the potential for things to go wrong.
At least using LAST_INSERT_ID() will get the last ID for your particular connection, so it won't matter how many new entries have been added in between.
In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?
Here's the problem: when a script starts modifying the database and something goes wrong, the data usually ends up inconsistent. For example, let's say we have a User table and a Photos table.
A script creates a user record and in the next lines it attempts to create a photo record. The photo has a user_id column. Now let's assume something goes wrong and PDO's lastInsertId() doesn't return the id of the user. So what happens in the worst case: we get a user with no photo, and a photo with no valid user_id. Broken reference. 3 weeks to debug.
Are there any good strategies to follow to prevent exactly this kind of problem? In my code below, you can see that I at least try to log it to a file and quit the script execution to prevent more damage and DB corruption.
public function lastInsertId() {
    // Ask PDO for the id generated by the last INSERT on this connection
    $id = $this->dbh->lastInsertId();
    if (!is_numeric($id)) {
        // Bail out before any dependent query writes an invalid reference
        $this->logError("DB::lastInsertId() did not return an id as expected!");
        die();
    }
    return $id;
}
Maybe I have to use transactions all over the place, any time a query B depends on a query A, and so forth? Is that the way to go?
Should I do a "precaution rollback" before the die() call? I guess it would not hurt much at this point, would it? I'm not sure...
The solution would be to use transactions each time you have several queries for which it should be "all or nothing", yes -- that's the A of ACID: Atomicity.
You can do a rollback before your die if you want; it won't change much (a transaction that is not committed will automatically be rolled back by the DB engine), but it will make your code clearer and easier to understand.
As a side note: using die() this way is probably not the "right" way to deal with errors; it'll prevent you from displaying any kind of "nice" error page, for instance.
A solution that's used more often is to throw some kind of exception when such a problem occurs -- and deal with those exceptions in a higher layer of your application (in one single place), to display an error page.
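Applied to your user/photo example, and assuming $dbh is a PDO connection configured to throw exceptions, it could look roughly like this (table and column names are made up):

try {
    $dbh->beginTransaction();

    $dbh->prepare("INSERT INTO users (name) VALUES (?)")->execute([$name]);
    $userId = $dbh->lastInsertId();

    $dbh->prepare("INSERT INTO photos (user_id, path) VALUES (?, ?)")
        ->execute([$userId, $path]);

    // either both rows exist, or neither does
    $dbh->commit();
} catch (Exception $e) {
    $dbh->rollBack();
    throw $e; // let a higher layer turn this into a proper error page
}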
Outside of using a transactional engine (InnoDB if you're using MySQL, or just use PostgreSQL, etc.) and wrapping the relevant atomic activities in transactions, there's not a great deal more you can do.
As @Seb says, you can create a transactional log and you could even use a master/slave database setup, but this won't really add much in terms of coverage.
You should keep a log of all transactions, so that if the automated process goes wrong (even your rollbacks, fallback procedures, etc.), you can still revert all effects manually.
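One simple way to do that, sketched here with made-up names, is a plain log table that the process writes to alongside each change, so the effects can be inspected and reverted by hand later:

CREATE TABLE change_log (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    changed_at DATETIME NOT NULL,
    table_name VARCHAR(64) NOT NULL,
    row_id     INT NOT NULL,
    action     VARCHAR(16) NOT NULL,  -- 'insert', 'update' or 'delete'
    old_values TEXT NULL              -- enough detail to rebuild the row manually
);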