Handling Multiple Users
Requirements:
I have an application (MySQL, PHP, jQuery) where users can:
Review records and update certain fields.
Issue invoices by selecting orders.
Issues:
The issue is that an invoice should not be issued twice for the same time period. Also, a field should not be updated by two or more users at the same time.
Possible Solutions:
Lock the tables when they get updated, and if a user performs an action on a locked table, notify them and reload.
Implement a lock system so that when a user performs certain actions, those actions are locked for other users.
...
Look up 'optimistic locking'. Basically it means adding a version attribute, passing it back with each update and incrementing it, to make sure nobody else got there first. If N users try the same operation based on the same version, one wins and the others lose. It's fast, simple and easy for a wide variety of cases.
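Here is a minimal sketch of the version-attribute idea. Python's built-in sqlite3 stands in for MySQL, and the orders table and its columns are hypothetical. The key point is that the version check and the increment happen in the same UPDATE statement. The number of affected rows then tells each user whether they won.

```python
import sqlite3

# In-memory SQLite stands in for MySQL; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, version INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO orders (id, status, version) VALUES (1, 'open', 0)")
conn.commit()


def read_order(order_id):
    """Return (status, version), i.e. what the user's form is rendered with."""
    return conn.execute(
        "SELECT status, version FROM orders WHERE id = ?", (order_id,)
    ).fetchone()


def update_status(order_id, new_status, expected_version):
    """Update only if nobody changed the row since it was read.

    The check and the increment are one statement, so there is no gap between
    "check" and "write" for another user to slip into.
    Returns True if this caller won, False if it must reload and retry.
    """
    cur = conn.execute(
        "UPDATE orders SET status = ?, version = version + 1 WHERE id = ? AND version = ?",
        (new_status, order_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1


# Two users read the same version...
_, v_alice = read_order(1)
_, v_bob = read_order(1)
# ...the first write wins; the second is rejected and should reload.
alice_ok = update_status(1, "invoiced", v_alice)
bob_ok = update_status(1, "cancelled", v_bob)
print(alice_ok, bob_ok, read_order(1))  # True False ('invoiced', 1)
```

In PHP with MySQL, PDOStatement::rowCount() returns the affected-row count for the same kind of UPDATE. Because version always changes, a successful update never reports 0 rows.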
I don't know if this will help you or not, but I first read about this in the context of .NET's DataTable adapter. It tracks the changes made to the data rows since you read them and sends them back to the DB after you change them. What it does is send all the fields instead of just the changed ones.
You can use timestamps for the rows. Read the timestamp along with the other data, and before saving, check whether the row's current timestamp is newer than the one you have. This way you can limit locking to just this portion: comparing timestamps, and updating if you are the first one to get there.
Thank you both. I will look into both options: optimistic locking (http://cwiki.apache.org/CAY/optimistic-locking-explained.html) and the timestamp approach.
Related
I have two tables, 'reservation' and 'spot'. During the reservation process, the 'spotStatus' column in the spot table is checked and, if the spot is free, it is updated. A user is allowed to reserve only one spot. What can I do to make sure that no other user can reserve the same spot?
Referring to some answers here, I found row locking and table locking as solutions. Should I perform queries like
"select * from spot where spotId = id for update;"
and then perform the necessary update to the status, or are there other, more elegant ways to do it?
My concern is what happens to the locked row if:
1. The transaction does not complete successfully?
2. Both users try to reserve the same row at the same time? Are both transactions cancelled?
And when is the lock released?
The problem here is race conditions, which even transactions will not prevent by default if used naively. Suppose two reservations happen simultaneously, for example from two different Apache processes running PHP. Transactional locking will just ensure the reservations are properly serialized, so the second one will still overwrite the first.
Usually this situation is of no real concern. Databases and servers are fast compared to the load on an average reservation site, so the chances of this ever causing a problem are less than winning the state lottery twice in a row. If, however, you are implementing a site that's going to sell 50k Coldplay concert tickets in 30 seconds, the chances rise sharply.
A simple solution is to implement a sort of 'reservation intent'. Instead of overwriting the spot reservation directly, append the intent-to-reserve to a separate timestamped table. After this insertion you can clean up the table for duplicates, preferring the oldest, and apply that one to the real-time data.
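One way to sketch the 'reservation intent' approach uses sqlite3 as a stand-in database. The reservation_intent table, its columns and the numeric timestamps are hypothetical. Requests are only ever appended, and a resolve step keeps the oldest intent per spot, deletes the rest and applies the winner.

```python
import sqlite3

# Hypothetical schema: spot holds the real-time status,
# reservation_intent is the append-only, timestamped table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE spot (spot_id INTEGER PRIMARY KEY, reserved_by TEXT);
CREATE TABLE reservation_intent (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    spot_id INTEGER NOT NULL,
    user_name TEXT NOT NULL,
    created_at REAL NOT NULL
);
INSERT INTO spot (spot_id, reserved_by) VALUES (7, NULL);
""")


def add_intent(spot_id, user_name, created_at):
    # Appending never overwrites anybody else's request.
    conn.execute(
        "INSERT INTO reservation_intent (spot_id, user_name, created_at) VALUES (?, ?, ?)",
        (spot_id, user_name, created_at),
    )


def resolve(spot_id):
    """Keep the oldest intent for the spot, drop the others, apply the winner."""
    with conn:  # one transaction: commits on success, rolls back on error
        winner = conn.execute(
            "SELECT id, user_name FROM reservation_intent WHERE spot_id = ? "
            "ORDER BY created_at, id LIMIT 1",
            (spot_id,),
        ).fetchone()
        if winner is None:
            return None
        conn.execute(
            "DELETE FROM reservation_intent WHERE spot_id = ? AND id <> ?",
            (spot_id, winner[0]),
        )
        conn.execute(
            "UPDATE spot SET reserved_by = ? WHERE spot_id = ? AND reserved_by IS NULL",
            (winner[1], spot_id),
        )
    return winner[1]


# Bob's request is inserted first, but Alice's timestamp is older.
add_intent(7, "bob", 1000.002)
add_intent(7, "alice", 1000.001)
winner = resolve(7)
print(winner)  # alice
```

Because the winning intent stays in the table, any later requests for the same spot lose the next time resolve runs.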
If it's not successful, the database returns the data to the state it was in before the transaction (a rollback), as if it never happened.
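The rollback behaviour can be seen in a small sketch, again using sqlite3 as a stand-in for MySQL. The failure is simulated with an exception. Python's sqlite3 connection, used as a context manager, commits when the block succeeds and rolls back when it raises, so the half-done update disappears.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE spot (spotId INTEGER PRIMARY KEY, spotStatus TEXT NOT NULL)")
conn.execute("INSERT INTO spot VALUES (1, 'free')")
conn.commit()

try:
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute("UPDATE spot SET spotStatus = 'reserved' WHERE spotId = 1")
        # Simulated failure halfway through the reservation
        # (e.g. the reservation row could not be written).
        raise RuntimeError("reservation insert failed")
except RuntimeError:
    pass

status = conn.execute("SELECT spotStatus FROM spot WHERE spotId = 1").fetchone()[0]
print(status)  # free: the UPDATE was rolled back
```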
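It behaves as if the two did not happen at the same time. Only one of them will get the lock, and the other reservation won't be created: the second transaction waits until the first one commits or rolls back, which is when the lock is released.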
If you are using Teradata, you can use the queue table concept.
I have multiple devices (eleven, to be specific) which send information every second. This information is received by an Apache server, parsed by a PHP script, stored in the database and finally displayed in a GUI.
What I am doing right now is checking whether a row for the current day exists; if it doesn't, I create a new one, otherwise I update it.
I do it like that because I need to poll the information from the database and display it in a C++ application so that it looks sort of real-time. If I created a row every time a device sent information, processing and reading the data would take a significant amount of time and system resources (memory, CPU, etc.), and the data would no longer be displayed in anything close to real time.
I wrote a report generation tool which takes the information for every day (from 00:00:00 to 23:59:59) and puts it in an Excel spreadsheet.
My questions are basically:
Is it possible to do the insertion/updating part directly in the database server, or do I have to do the logic in the PHP script?
Is there a better (more efficient) way to store the information without decreasing performance on the display device?
Regarding the report generation, I may want to sample intervals, let's say starting from yesterday at 15:50:00 and ending today at 12:45:00. That cannot be done with my current data structure, so what do I need to consider in order to make a data structure that would allow such queries?
The components I use:
- Apache 2.4.4
- PostgreSQL 9.2.3-2
- PHP 5.4.13
My recommendation: just store all the information your devices are sending. With proper indexes and queries you can process and retrieve information from the DB really fast.
For your questions:
Yes, it is possible to build any logic you desire inside the Postgres DB using SQL, PL/pgSQL, PL/PHP, PL/Java, PL/Python and many other procedural languages available for Postgres.
As I said before, proper indexing can do magic.
If you cannot get the desired query speed with the full table, you can create a small table with one row for every device. Keep the last known values in this table and show them in sort-of real time.
1) The technique is called upsert. In PG 9.1+ it can be done with a writable CTE (wCTE) (http://www.depesz.com/2011/03/16/waiting-for-9-1-writable-cte/).
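For comparison, newer databases offer upsert as a single statement: INSERT ... ON CONFLICT ... DO UPDATE, available in PostgreSQL 9.5+ and SQLite 3.24+. The asker's PostgreSQL 9.2 does not have it, which is why the writable-CTE approach is needed there. This sketch uses sqlite3 and a hypothetical device_daily table with one row per device per day, mirroring the question's design.

```python
import sqlite3

# Hypothetical schema: one row per device per day.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE device_daily (
    device_id INTEGER NOT NULL,
    day TEXT NOT NULL,
    samples INTEGER NOT NULL,
    last_value REAL NOT NULL,
    PRIMARY KEY (device_id, day)
)""")

# Insert the day's row, or bump it if it already exists.
# "excluded" refers to the row that failed to insert.
UPSERT = """
INSERT INTO device_daily (device_id, day, samples, last_value)
VALUES (?, ?, 1, ?)
ON CONFLICT (device_id, day) DO UPDATE
SET samples = samples + 1,
    last_value = excluded.last_value
"""


def record(device_id, day, value):
    with conn:  # one statement, committed on success
        conn.execute(UPSERT, (device_id, day, value))


record(3, "2013-04-02", 20.5)
record(3, "2013-04-02", 21.0)
record(3, "2013-04-03", 19.0)
rows = conn.execute(
    "SELECT device_id, day, samples, last_value FROM device_daily ORDER BY day"
).fetchall()
print(rows)  # [(3, '2013-04-02', 2, 21.0), (3, '2013-04-03', 1, 19.0)]
```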
2) If you really want it to be real-time, you should send the data directly to the application. Storing it in memory or in a plain-text file will also be faster if you only care about the last few values. But PG does have LISTEN/NOTIFY channels, so your lag will probably be just 100-200 ms. That shouldn't matter much, given that you're only displaying the data.
I think you are overestimating the system memory requirements of the process you have described. Adding a row of data every second (or 11 per second) is not a resource hog. In fact, an UPDATE is likely more time-consuming than INSERTing a new row. Also, if you add an indexed TIMESTAMP column to your table, sorting by time is lightning fast. Just add some garbage-collection handling as a cron job (deleting old data) once a day or so and you are golden.
However to answer your questions:
Is it possible to do the insertion/updating part directly in the database server, or do I have to do the logic in the PHP script?
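Writing logic within the database engine is usually not very straightforward. To keep it simple, stick with the logic in the PHP script. Use UPDATE table SET var1='assignment1', var2='assignment2' WHERE id = 'checkedID' for an existing row, or INSERT INTO table (var1, var2) VALUES ('assignment1', 'assignment2') for a new one. The INSERT INTO table SET ... form is MySQL-only syntax and will not work on PostgreSQL.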
Is there a better (more efficient) way to store the information without a decrease in performance in the display device?
It's hard to answer because you haven't described how the display device is connected. There are more efficient ways to do this, but none of them have the locking mechanisms required for such frequent updating.
Regarding the report generation, if I want to sample intervals, let's say starting from yesterday at 15:50:00 and ending today at 12:45:00, it cannot be done with my current data structure, so what do I need to consider in order to make a data structure which would allow me to create such queries?
You could use the TIMESTAMP data type, which stores both the DATE and the TIME of the UPDATE operation. Then it's just a simple WHERE clause using date functions within the database query.
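Here is a sketch of the kind of query this makes possible. It stores one timestamped row per reading, using the question's example interval. sqlite3 is a stand-in, the reading table and index names are hypothetical, and the data is generated (one reading per minute for two days). Timestamps are stored as 'YYYY-MM-DD HH:MM:SS' text, which sorts chronologically in SQLite. In PostgreSQL the column would be a real TIMESTAMP.

```python
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reading (
    device_id INTEGER NOT NULL,
    recorded_at TEXT NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS' sorts chronologically
    value REAL NOT NULL
);
-- Lets the range query below seek instead of scanning the whole table.
CREATE INDEX reading_device_time ON reading (device_id, recorded_at);
""")

# Generated sample data: device 1 reports once a minute for two days.
start = datetime(2013, 4, 1, 0, 0, 0)
rows = [
    (1, (start + timedelta(minutes=m)).strftime("%Y-%m-%d %H:%M:%S"), float(m))
    for m in range(2 * 24 * 60)
]
conn.executemany("INSERT INTO reading VALUES (?, ?, ?)", rows)


def sample(device_id, t_from, t_to):
    """Count and average one device's readings in [t_from, t_to]."""
    return conn.execute(
        "SELECT COUNT(*), AVG(value) FROM reading "
        "WHERE device_id = ? AND recorded_at BETWEEN ? AND ?",
        (device_id, t_from, t_to),
    ).fetchone()


# "Yesterday 15:50:00 to today 12:45:00", independent of day boundaries.
count, avg = sample(1, "2013-04-01 15:50:00", "2013-04-02 12:45:00")
print(count, avg)  # 1256 1577.5
```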
I'm working on an AJAX application. The user clicks a button, and his name is saved into the database and shown inside a <div>, whose content is fetched from the database by means of AJAX long polling. The database also contains a timestamp which represents an expiration: subscriptions beyond that timestamp must not be accepted. There is also a limit on the number of users who can subscribe.
I have a PHP script that is called by an AJAX request. This script queries the database and checks for expiration (the timestamp of the click is computed by JavaScript and sent via AJAX). It also checks the user limit: I have an N-to-N relationship between Users and Products (to subscribe to). These tasks obviously take time, and I'm worried about possible concurrency problems. Should I use database transactions? What technique could I use to ensure the atomicity of this operation?
It depends on the kind of work done in those "long" tasks.
Generic info:
If you're only inserting user-driven data and data generated in PHP, without reading it or cross-correlating it with data fetched from the DB, then transactionality should not be an issue.
If you're updating data and cross-correlating it with other elements in the DB then you need to start using transactions and to carefully choose the isolation levels of the transactions you plan on using.
Transactions can seriously affect speed when concurrency rises. A very strict isolation level may be safer than your application needs, and you may be adding a lot of unnecessary work to the MVCC.
Also, issuing the transaction as separate PHP API calls and managing the rollback logic in the application increases the overall duration of the transaction, because it adds all the processing delays generated by PHP. It is better to compact the DB communication into a set of queries sent in one request.
Case info:
Let's consider this scenario: there are 8 slots and 7 users subscribed. Two users click the subscribe button almost simultaneously. When the control script is launched for the second user, the query subscribing the first user might still be executing. In that case the system would accept both users as valid subscriptions.
This falls into the second case I explained, where you're cross-correlating user-driven data with what you have in the DB. You're reading the state of the DB before you commit the user-driven data, so yes, you would need transactions in this case.
You may be able to exploit the inherent atomicity of a single UPDATE statement: any UPDATE table_name SET x = x+1 WHERE a = 'value'; is guaranteed to be atomic. You can use this to your advantage.
All subscribing PHP threads must first decrement the count of available slots, with a condition in the WHERE clause (such as AND x > 0) that keeps it from going below zero. If the number of affected rows is not 0, the decrement succeeded and they can carry on submitting the user-related data; otherwise, inform the user that they were 0.3 ms too slow.
I have created an office scheduling program that uses jQuery to post to a PHP file, which then inserts an appointment into a pgSQL database. This has not happened yet, but I can foresee a problem in the future: two office workers try to schedule an appointment in the same slot at the same time. That creates a race condition, and one set of customer data would be lost, or at least I'd have to dig it out of a log. Is there a flag I could set in the database? Do I need to create some kind of gatekeeper program to control server connections? Or is there some kind of mutex/lock/semaphore I can use with JavaScript/PHP/SQL to keep this race condition from occurring?
You can either lock it with a database flag, or a better strategy is to detect collisions, since this only happens in rare cases.
To detect the problem, you can read the record's last-updated timestamp from the database and send it along with the form. Before you update the record, compare it with the stored timestamp. If the timestamp has changed, present the user with all the data and ask them what they want to do. This offers a way for the second saving user to modify their changes based on the previously saved data if they wish.
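Here is one way to sketch this collision check. It uses sqlite3 as a stand-in for PostgreSQL, and the appointment table and its columns are hypothetical. The timestamp comparison is placed inside the UPDATE's WHERE clause, so that checking and saving are a single atomic step. On a conflict, the current row is returned so that the UI can show the second user what was saved in the meantime.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE appointment (slot TEXT PRIMARY KEY, customer TEXT, updated_at TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO appointment VALUES ('2013-05-06 09:00', NULL, '2013-05-01 08:00:00.000000')"
)
conn.commit()


def now():
    return datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S.%f")


def load(slot):
    """What the form is rendered with: the data plus its updated_at stamp."""
    return conn.execute(
        "SELECT customer, updated_at FROM appointment WHERE slot = ?", (slot,)
    ).fetchone()


def save(slot, customer, seen_updated_at):
    """Save only if the row is unchanged since the form was loaded.

    Returns (True, None) on success, or (False, current_row) so the second
    user can be shown the data that was saved in the meantime.
    """
    with conn:
        cur = conn.execute(
            "UPDATE appointment SET customer = ?, updated_at = ? "
            "WHERE slot = ? AND updated_at = ?",
            (customer, now(), slot, seen_updated_at),
        )
    if cur.rowcount == 1:
        return True, None
    return False, load(slot)


# Both workers open the same slot, then both save.
_, stamp_a = load("2013-05-06 09:00")
_, stamp_b = load("2013-05-06 09:00")
ok_a, _ = save("2013-05-06 09:00", "Mrs. Smith", stamp_a)
ok_b, current = save("2013-05-06 09:00", "Mr. Jones", stamp_b)
print(ok_a, ok_b, current[0])  # True False Mrs. Smith
```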
There are other ways to solve this problem, and the proper solution depends on the nature of the specific problem.
I have a site where users can view quite a large number of posts. Every time a post is viewed, I run a query similar to UPDATE table SET views=views+1 WHERE id = ?. However, there are a number of disadvantages to this approach:
There is no way of tracking when the pageviews occur - they are simply incremented.
Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower.
Therefore I consider employing an approach where I create a table, say:
object_views { object_id, year, month, day, views }, so that each object has one row per day in this table. I would then periodically update the views column in the objects table so that I wouldn't have to do expensive joins all the time.
This is the simplest solution I can think of, and it seems that it is also the one with the least performance impact. Do you agree?
(The site is built on PHP 5.2, Symfony 1.4 and Doctrine 1.2, in case you're wondering.)
Edit:
The purpose is not web analytics - I know how to do that, and that is already in place. There are two purposes:
Allow the user to see how many times a given object has been shown, for example today or yesterday.
Allow the moderators of the site to see simple view statistics without going into Google Analytics, Omniture or whatever solution. Furthermore, the results in the backend must be real-time, a feature which GA cannot offer at this time. I do not wish to use the Analytics API to retrieve the usage data (it is not real-time, and GA requires JavaScript).
Quote: "Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower."
There is much more to it than this. This is a database killer.
I suggest you make a table like this:
object_views { object_id, timestamp}
This way you can aggregate on object_id (with the COUNT() function).
So every time someone views the page, you INSERT a record into the table.
Once in a while you must clean out the old records in the table. The UPDATE statement is EVIL :)
On many platforms (PostgreSQL, for example) it will basically mark the row as deleted and insert a new one, thus fragmenting the table. Not to mention locking issues.
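A sketch of the insert-only design is below, with sqlite3 standing in for MySQL. The viewed_at column, the index and the sample rows are hypothetical. Each view is one INSERT, per-day counts come from COUNT() with GROUP BY, and a periodic DELETE does the clean-up. In MySQL, DATE(viewed_at) plays the role of SQLite's date(viewed_at).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_views (object_id INTEGER NOT NULL, viewed_at TEXT NOT NULL);
CREATE INDEX object_views_obj_time ON object_views (object_id, viewed_at);
""")


def log_view(object_id, viewed_at):
    # One INSERT per page view; no row is ever updated.
    conn.execute("INSERT INTO object_views VALUES (?, ?)", (object_id, viewed_at))


def views_per_day(object_id):
    """Aggregate on object_id with COUNT(), grouped by calendar day."""
    return conn.execute(
        "SELECT date(viewed_at) AS day, COUNT(*) FROM object_views "
        "WHERE object_id = ? GROUP BY day ORDER BY day",
        (object_id,),
    ).fetchall()


def purge_before(cutoff):
    """Periodic clean-up of old records (e.g. run daily from cron)."""
    with conn:
        return conn.execute(
            "DELETE FROM object_views WHERE viewed_at < ?", (cutoff,)
        ).rowcount


for ts in ["2009-11-24 23:59:00", "2009-11-25 08:00:00",
           "2009-11-25 09:30:00", "2009-11-26 10:00:00"]:
    log_view(42, ts)
log_view(7, "2009-11-25 12:00:00")

per_day = views_per_day(42)
print(per_day)  # [('2009-11-24', 1), ('2009-11-25', 2), ('2009-11-26', 1)]
removed = purge_before("2009-11-25 00:00:00")
print(removed)  # 1
```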
Hope that helps
Along the same lines as Rage: you simply are not going to get the same results doing it yourself when there are a million third-party log tools out there. If you are tracking on a daily basis, then a basic program such as WebTrends is perfectly capable of tracking the hits, especially if your URL contains the IDs of the items you want to track. I can't stress this enough: it's all about the URL when it comes to these tools (WordPress, for example, allows lots of different URL constructs).
Now, if you are looking into "impression" tracking, then it's another ball game, because you are probably tracking each object, the page, the user, and possibly a weighted value based on location on the page. If this is the case, you can keep your performance up by hosting the tracking on another server where you can fire and forget. In the past I handled this with SQL updates keyed on the ID and a string version of the date. That way, when the date changes from 20091125 to 20091126, it's a simple query without the overhead of, let's say, a DATEDIFF function.
First, just a quick remark: why not combine the year, month and day into a DATETIME? It would make more sense to me.
Also, I am not really sure what the exact reason for doing this is. If it's for marketing/web stats purposes, you would be better off using a tool made for that purpose.
Now, there are two big families of tools that can give you an idea of your website's access statistics: log-based ones (AWStats is probably the most popular) and AJAX/1-pixel-image-based ones (Google Analytics would be the most popular).
If you prefer to build your own stats database, you can probably build a log parser easily in PHP. If you find parsing Apache logs (or IIS logs) too much of a burden, you could make your application output some custom logs in a simpler format.
Another possible solution is to use memcached: the daemon provides counters that you can increment. You can log views there and have a script collect the results every day.
If you're going to do that, why not just log each access? MySQL can cache inserts in continuous tables quite well, so there shouldn't be a notable slowdown due to the insert. You can always run SHOW PROFILES to see what the performance penalty actually is.
On the datetime issue, you can always use GROUP BY MONTH(accessed_at), YEAR(accessed_at) or WHERE MONTH(accessed_at) = 11 AND YEAR(accessed_at) = 2009.