I have a question about counterCache that the documentation doesn't clarify at all.
Does counterCache checks race conditions when updating the field value?
For example, let's say we have a forum implementation, and for each forum, we have a number n of topics stored via counterCache. Then, if two users use the model at almost same time (enough to overlap their operations, meaning that when one ends it, the other one still will be using it), and one create a new topic and the other one (assuming it can) deletes another topic, then will it show exactly n topics, and not n+1 or n-1?
Whatever operation finishes first will be the result. It is very unlikely and probably technically impossible that the operation happen at the same time. So one of both comes first. Then there is also the time it takes for the request to be received by the server and browser. So whenever the next view request comes you'll get whatever at this point was updated in the database.
By checking the code you can also see that the code is doing a find('count') instead of inc/decrementing by +/- one. http://api.cakephp.org/2.3/source-class-Model.html#1913-1981 So the cache gets written after the previous action was completed.
And finally I would really not worry about if the count is off by +/- one for a moment, specially in a forum.
I believe it could, but not because of Cake but because multiple SQL operations in a non transactional behaviour, and from multiple clients (users) at the same time. Any web application with or without a framework would suffer from this situations.
I'm not very proficient with SQL transactions, but I'm pretty sure some kind of transaction configuration would prevent this form happening.
As burzum says, the forum example is definitely not something to worry about.
Related
How does PHP handle multiple requests from users? Does it process them all at once or one at a time waiting for the first request to complete and then moving to the next.
Actually, I'm adding a bit of wiki to a static site where users will be able to edit addresses of businesses if they find them inaccurate or if they can be improved. Only registered users may do so. When a user edits a business name, that name along with it's other occurrences is changed in different rows in the table. I'm a little worried about what would happend if 10 users were doing this simultaneously. It'd be a real mishmash of things. So does PHP do things one at time in order received per script (update.php) or all at once.
Requests are handled in parallel by the web server (which runs the PHP script).
Updating data in the database is pretty fast, so any update will appear instantaneous, even if you need to update multiple tables.
Regarding the mish mash, for the DB, handling 10 requests within 1 second is the same as 10 requests within 10 seconds, it won't confuse them and just execute them one after the other.
If you need to update 2 tables and absolutely need these 2 updates to run subsequently without being interrupted by another update query, then you can use transactions.
EDIT:
If you don't want 2 users editing the same form at the same time, you have several options to prevent them. Here are a few ideas:
You can "lock" that record for edition whenever a user opens the page to edit it, and not let other users open it for edition. You might run into a few problems if a user doesn't "unlock" the record after they are done.
You can notify in real time (with AJAX) a user that the entry they are editing was modified, just like on stack overflow when a new answer or comment was posted as you are typing.
When a user submits an edit, you can check if the record was edited between when they started editing and when they tried to submit it, and show them the new version beside their version, so that they manually "merge" the 2 updates.
There probably are more solutions but these should get you started.
It depends on which version of Apache you are using and how it is configured, but a common default configuration uses multiple workers with multiple threads to handle simultaneous requests. See http://httpd.apache.org/docs/2.2/mod/worker.html for a rundown of how this works. The end result is that your PHP scripts may together have dozens of open database connections, possibly sending several queries at the exact same time.
However, your DBMS is designed to handle this. If you are only doing simple INSERT queries, then your code doesn't need to do anything special. Your DBMS will take care of the necessary locks on its own. Row-level locking will be fastest for multiple INSERTs, so if you use MySQL, you should consider the InnoDB storage engine.
Of course, your query can always fail whether it's due to too many database connections, a conflict on a unique index, etc. Wrap your queries in try catch blocks to handle this case.
If you have other application-layer concerns about concurrency, such as one user overwriting another user's changes, then you will need to handle these in the PHP script. One way to handle this is to use revision numbers stored along with your data, and refusing to execute the query if the revision number has changed, but how you handle it all depends on your application.
I have a dilemma, which I hope you will have some expert opinions on.
I have a table called CARDS with a column STATUS. If a record's status changes from 'download' to 'publish', I have to insert the record reference into another table called CARD_ASSIGNMENTS. Additionally, the record needs to be added into CARD_ASSIGNMENTS as many times as there are active records in SCANNERS.
In other words, if there are two active scanners, I will end up with two records in CARD_ASSIGNMENTS as below:
ID CARD_ID SCANNER_ID STATUS_ID
1 1 1 4
2 1 2 4
My dilemma is that I'm not quite sure what would be the most efficient way to execute the above. I've considered the following options:
From PHP - Do one UPDATE query and then the INSERT queries.
Create a stored procedure, which will take care of updating the CARDS record and adding records into the CARD_ASSIGNMENTS. Then, just call that stored procedure from PHP.
Create an ON UPDATE trigger for the CARDS table which will take care of processing INSERTS into the CARD_ASSIGNMENTS table.
PS. A simplified version of my database is available on MySQL Fiddle
Thanks,
Kate
Interesting question.
I'm going to give you clues about how to approach the problem.
So, you have to start by defining precisely three things:
the expected functionality
the access policy to the functionality
the technical upgrade policy
Here I'll detail these points.
So, the first point is that you have to define your functionality. By doing so, you will be able to tell whether adding a card implies always, in all the possible paradigms (sorry for the pedantic word I can't find a more proper one) of your information system, that this card MUST exist in the other table according to the specifications you provided. This 1-1 functional link must be said TRUE or FALSE. This is really important.
Said with other words, if there's at least one possibility that one day you don't want to copy that record to the other table, it means the trigger is a wrong solution, or at least it should be thought with an emergency mode (for example a variable inside that allows it to not get executed in some conditions) setup on.
Then comes the second point, about the access policy. You have to know whether the allowed accessing systems will do so by using your application layer or if they could develop their own (SAAS style). If so, your php layer will be useless and the stored procedure is an excellent option, since every single technical and business layer will go trough it yes or yes.
The last thing to know is whether you're possibly going to upgrade your php layer one day. In most of the cases the answer is yes. If so, you might have to modify the part containing this sql logic you're talking about. Then, having everything into a stored procedure vs storing it hardcoded into the php will definitely save you time, and improve stability.
Left brain right brain, I'm going to tell you my personal opinion afterall. I really love going with stored procedures but not using any triggers. If the environment allows it, I would go for an underlying batch, calling a set of defined stored procedures, concentrating the activity outside of the online scope.
The advantages are the following:
none or less risks of interruption of the online workflow since you reduce the number of operations
different schedule to alliviate the database load
more secure policy since executing the stored procedure requires only one grant, while using the same sql with php would require insert/update grants
better logging quality: you can have a log per job
better emergency response: when a job fails (if well thought) you can restart it, and that's it.
Long post, but that was interesting and I really wanted to share these ideas.
Cheers!
I would use triggers. Some developers say, that if you have too many triggers and stored procedures, the database lives its own life, that means you never know what is going to happen on insert, update etc. But in my opinion, triggers may help you a lot to keep database consistent, so even if someone inserts data directly from some administration tool, the integrity is still kept, because all necessary commands are executed. If you choose stored procedures, you would still have to know, that you need to call this procedure to insert any new data.
This question has been asked a THOUSAND times... so it's not unfair if you decide to skip reading/answering it, but I still thought people would like to see and comment on my approach...
I'm building a site which requires an activity feed, like FourSquare.
But my site has this feature for the eye-candy's sake, and doesn't need the stuff to be saved forever.
So, I write the event_type and user_id to a MySQL table. Before writing new events to the table, I delete all the older, unnecessary rows (by counting the total number of rows, getting the event_id lesser than which everything is redundant, and deleting those rows). I prune the table, and write a new row every time an event happens. There's another user_text column which is NULL if there is no user-generated text...
In the front-end, I have jQuery that checks with a PHP file via GET every x seconds the user has the site open. The jQuery sends a request with the last update "id" it received. The <div> tags generated by my backend have the "id" attribute set as the MySQL row id. This way, I don't have to save the last_received_id in memory, though I guess there's absolutely no performance impact from storing one variable with a very small int value in memory...
I have a function that generates an "update text" depending on the event_type and user_id I pass it from the jQuery, and whether the user_text column is empty. The update text is passed back to jQuery, which appends the freshly received event <div> to the feed with some effects, while simultaneously getting rid of the "tail end" event <div> with an effect.
If I (more importantly, the client) want to, I can have an "event archive" table in my database (or a different one) that saves up all those redundant rows before deleting. This way, event information will be saved forever, while not impacting the performance of the live site...
I'm using CodeIgniter, so there's no question of repeated code anywhere. All the pertinent functions go into a LiveUpdates class in the library and model respectively.
I'm rather happy with the way I'm doing it because it solves the problem at hand while sticking to the KISS ideology... but still, can anyone please point me to some resources, that show a better way to do it? A Google search on this subject reveals too many articles/SO questions, and I would like to benefit from the experience any other developer that has already trawled through them and found out the best approach...
If you use proper indexes there's no reason you couldn't keep all the events in one table without affecting performance.
If you craft your polling correctly to return nothing when there is nothing new you can minimize the load each client has on the server. If you also look into push notification (the hybrid delayed-connection-closing method) this will further help you scale big successfully.
Finally, it is completely unnecessary to worry about variable storage in the client. This is premature optimization. The performance issues are going to be in the avalanche of connections to the web server from many users, and in the DB, tables without proper indexes.
About indexes: An index is "proper" when the most common query against a table can be performed with a seek and a minimal number of reads (like 1-5). In your case, this could be an incrementing id or a date (if it has enough precision). If you design it right, the operation to find the most recent update_id should be a single read. Then when your client submits its ajax request to see if there is updated content, first do a query to see if the value submitted (id or time) is less than the current value. If so, respond immediately with the new content via a second query. Keeping the "ping" action as lightweight as possible is your goal, even if this incurs a slightly greater cost for when there is new content.
Using a push would be far better, though, so please explore Comet.
If you don't know how many reads are going on with your queries then I encourage you to explore this aspect of the database so you can find it out and assess it properly.
Update: offering the idea of clients getting a "yes there's new content" answer and then actually requesting the content was perhaps not the best. Please see Why the Fat Pings Win for some very interesting related material.
Background: I'm working on a system where the developers seem to be using a function which executes a MYSQL query like "SELECT MAX(id) AS id FROM TABLE" whenever they need to get the id of the LAST inserted row (the table having an auto_increment column).
I know this is a horrible practice (because concurrent requests will mess the records), and I'm trying to communicate that to the non-tech / management team, to which their response is...
"Oh okay, we'll only face this problem when we have
(a) a lot of users, or
(b) it'll only happen when two people try doing something
at _exactly_ the same time"
I don't disagree with either point, and think we'll run into this problem much sooner than we plan. However, I'm trying to calculate (or figure a mechanism) to calculate how many users should be using the system before we start seeing messed up links.
Any mathematical insights into that? Again, I KNOW its a horrible practice, I just want to understand the variables in this situation...
Update: Thanks for the comments folks - we're moving in the right direction and getting the code fixed!
The point is not if potential bad situations are likely. The point is if they are possible. As long as there's a non-trivial probability of the issue occurring, if it's known it should be avoided.
It's not like we're talking about changing a one line function call into a 5000 line monster to deal with a remotely possible edge case. We're talking about actually shortening the call to a more readable, and more correct usage.
I kind of agree with #Mark Baker that there is some performance consideration, but since id is a primary key, the MAX query will be very quick. Sure, the LAST_INSERT_ID() will be faster (since it's just reading from a session variable), but only by a trivial amount.
And you don't need a lot of users for this to occur. All you need is a lot of concurrent requests (not even that many). If the time between the start of the insert and the start of the select is 50 milliseconds (assuming a transaction safe DB engine), then you only need 20 requests per second to start hitting an issue with this consistently. The point is that the window for error is non-trivial. If you say 20 requests per second (which in reality is not a lot), and assuming that the average person visits one page per minute, you're only talking 1200 users. And that's for it to happen regularly. It could happen once with only 2 users.
And right from the MySQL documentation on the subject:
You can generate sequences without calling LAST_INSERT_ID(), but the utility of
using the function this way is that the ID value is maintained in the server as
the last automatically generated value. It is multi-user safe because multiple
clients can issue the UPDATE statement and get their own sequence value with the
SELECT statement (or mysql_insert_id()), without affecting or being affected by
other clients that generate their own sequence values.
Instead of using SELECT MAX(id) you shoud do as the documentation says :
Instead, use the internal MySQL SQL function LAST_INSERT_ID() in an SQL query
Even so, neither SELECT MAX(id) nor mysql_insert_id() are "thread-safe" and you still could have race condition. The best option you have is to lock tables before and after your requests. Or even better use transactions.
I don't have the math for it, but I would point out that response (a) is a little silly. Doesn't the company want a lot of users? Isn't that a goal? That response implies that they'd rather solve the problem twice, possibly at great expense the second time, instead of solve it once correctly the first time.
This will happen when someone has added something to the table between one insert and that query running. So to answer your question, two people using the system has the potential for things to go wrong.
At least using the LAST_INSERT_ID() will get the last ID for a particular resource so it won't matter how many new entries have been added in between.
In addition to the risk of getting the wrong ID value returned, there's also the additional database query overhead of SELECT MAX(id), and it's more PHP code to actually execute than a simple mysql_insert_id(). Why deliberately code something to be slow?
I'm developing a php / mysql application that handles multiple simultaneous users. I'm thinking of the best approach to take when it comes to locking / warning against records that are currently being viewed / edited.
The scenario to avoid is two users viewing the record, one making a change, then the other doing likewise - with the potential that one change might overwrite the previous.
In the latest versions of WordPress they use some method to detect this, but it does not seem wholly reliable - often returning false positives, at least in my experience.
I assume some form of ajax must be in place to 'ping' the application and let it know the record is still being viewed / edited (otherwise, a user might simply close their browser window, and then how would the application know that).
Another solution I could see is to check the last updated time when a record is submitted for update, to see if in the interim it has been updated elsewhere - and then offer the user a choice to proceed or discard their own changes.
Perhaps I'm barking up the wrong tree in terms of a solution - what are peoples experiences of implementing this (what must be a fairly common) requirement?
I would do this: Store the time of the last modification in the edit form. Compare this time on submission with the time stored in the database. If they are the same, lock the table, update the data (along with the modification time) and unlock the table. If the times are different, notify the user about it and ask for the next step.
Good idea with the timestamp comparison. It's inexpensive to implement, and it's an inexpensive operation to run in production. You just have to write the logic to send back to the user the status message that their write/update didn't occur because someone beat them to it.
Perhaps consider storing the username on each update in a field called something like 'LastUpdateBy', and return that back to the user who had their update pre-empted. Just a little nicety for the user. Nice in the corporate sense, perhaps not in an environment where it might not be appropriate.