I have about 50k entries that I am inserting into MySQL via PHP. The table is simple, with an auto-increment ID and a few VARCHAR fields...
Since this system allows multiple users to log in at the same time and perform the same insert operation, suppose USER1 starts the insert and in the very same millisecond USER2 starts inserting data into the same table. I am curious whether MySQL will wait for USER1 to finish before processing USER2's entries, or whether it will do the inserts simultaneously.
So, following that logic, will USER1's inserted IDs run from 1 to 50k and USER2's from 50k to 100k, or, on the other hand, will it shuffle them?
...and if it does "shuffle" them, is there a way to prevent this?
P.S. I know I can add an additional field with user_id so I can distinguish which user made each entry, but in this case I would really like to reserve the range 1-50k for USER1 and 50k-100k for USER2...
To reserve the table, and thereby the auto-increment ids, for yourself, you should LOCK the table before beginning your inserts. Otherwise, "shuffling" of ids is very likely.
Note, though, that nothing, not even locking, guarantees the continuity of auto-increment ids. I understand that it would be a nice touch to have a block of inserts with continuous ids, but that's not what they are for, and there are no guarantees in the system to make it so. So don't rely on it or expect it in any way.
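A minimal sketch of the locking approach, assuming an illustrative entries table:

LOCK TABLES entries WRITE;
-- this session now has exclusive access; other sessions' inserts wait
INSERT INTO entries (name) VALUES ('row 1'), ('row 2');  -- ...the full 50k batch
UNLOCK TABLES;

While the WRITE lock is held, USER2's batch blocks, so each user's rows get one contiguous block of ids (subject to the continuity caveat above).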
You shouldn't care about that.
An id is just a uniquely created identifier with no special meaning.
Any time you put special meaning on this identifier, you will face a disaster.
Related
I have got a table which has an id (primary key with auto-increment), a uid (a key referring to a user's id, for example), and something else which, for my question, won't matter.
I want to make, let's call it, a separate auto-increment sequence on id for each uid value.
So, I will add an entry with uid 10, and the id field for this entry will be 1 because there were no previous entries with a uid of 10. Then I will add a new one with uid 4, and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explanatory and clear as I can to demonstrate the idea... clearly.
Which SQL engine can provide such functionality natively? (non-Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know of a non-SQL database engine providing such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
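That documented example is along these lines (a sketch reproducing the manual's pattern):

CREATE TABLE animals (
    grp  ENUM('fish','mammal','bird') NOT NULL,
    id   MEDIUMINT NOT NULL AUTO_INCREMENT,
    name CHAR(30) NOT NULL,
    PRIMARY KEY (grp, id)
) ENGINE=MyISAM;

INSERT INTO animals (grp, name) VALUES
    ('mammal','dog'), ('mammal','cat'),
    ('bird','penguin'), ('fish','lax');

-- ids restart per group: (mammal,1), (mammal,2), (bird,1), (fish,1)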
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts a transaction and inserts a new row for user 4.
Bill starts a transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. It gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. It also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
The solutions to this are either:
Get an exclusive table lock. Before attempting an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like the one in the example above. It's necessary to lock the whole table: since you're trying to restrict INSERT, there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table makes access to it serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that isn't hidden from two concurrent transactions. Incidentally, this is what AUTO_INCREMENT does: two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store (for example, a memcached key per userid, which can be incremented atomically). A sketch of the in-database variant follows.
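One way to keep the counter in MySQL but outside the inserting transaction is the documented LAST_INSERT_ID(expr) sequence trick; this is a sketch, and per_user_seq is an illustrative table name:

CREATE TABLE per_user_seq (
    userid  INT NOT NULL PRIMARY KEY,
    next_id INT NOT NULL DEFAULT 0
);

-- claim the next id for user 4; run with autocommit on, as its own
-- statement, so the claim survives even if a later INSERT rolls back
UPDATE per_user_seq SET next_id = LAST_INSERT_ID(next_id + 1) WHERE userid = 4;
SELECT LAST_INSERT_ID();  -- the value this session just claimed

The UPDATE is atomic, so two sessions can never claim the same value; the cost is that a rolled-back INSERT leaves a gap, which is exactly the trade-off described below.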
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask about efficiency. Unless you are dealing with extreme volumes, storing an 8-byte DATETIME isn't much of an overhead compared to, for example, a 4-byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than not MS/Oracle), the general logic is simple (a MySQL sketch follows the caveats below)...
Start a transaction (often one is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
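For MySQL, a BEFORE INSERT trigger along these lines covers the logic above. It is a sketch with illustrative names, and it is still subject to the race condition described earlier, so it needs a table lock or a unique key on (property_id, amendment_id) plus a retry:

DELIMITER //
CREATE TRIGGER amendments_bi BEFORE INSERT ON amendments
FOR EACH ROW
BEGIN
    -- only fill in amendment_id when the caller didn't supply one
    IF NEW.amendment_id IS NULL OR NEW.amendment_id = 0 THEN
        SET NEW.amendment_id = (
            SELECT COALESCE(MAX(amendment_id), 0) + 1
            FROM amendments
            WHERE property_id = NEW.property_id
        );
    END IF;
END//
DELIMITER ;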
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
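A minimal sketch of the stored procedure route, assuming the same illustrative amendments table:

DELIMITER //
CREATE PROCEDURE add_amendment(IN p_property_id INT, IN p_details TEXT)
BEGIN
    -- compute MAX+1 and insert in one statement; a unique key on
    -- (property_id, amendment_id) turns a lost race into a
    -- duplicate-key error the caller can catch and retry
    INSERT INTO amendments (property_id, amendment_id, details)
    SELECT p_property_id, COALESCE(MAX(amendment_id), 0) + 1, p_details
    FROM amendments
    WHERE property_id = p_property_id;
END//
DELIMITER ;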
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
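Concretely, the multi-part key approach might look like this (illustrative names):

CREATE TABLE entity_version (
    entity_id      INT NOT NULL,
    version_number INT NOT NULL,
    data           TEXT,
    PRIMARY KEY (entity_id, version_number)
);

-- latest version per entity, without a separate "latest" table
SELECT e.*
FROM entity_version e
JOIN (SELECT entity_id, MAX(version_number) AS v
      FROM entity_version
      GROUP BY entity_id) m
  ON m.entity_id = e.entity_id AND m.v = e.version_number;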
What is the best way to generate consecutive values when you have a load-balanced database and multiple instances of your application?
For example, I have a load-balanced MySQL database.
My PHP application is deployed with Docker and has 3 containers.
I have to generate consecutive IDs. I cannot use auto-increment, because I have to generate unique IDs depending on relations (for example, I have to generate a unique bill number depending on which society it is related to).
My bill can be generated but not yet emitted. I must generate the unique value when the bill is emitted.
Is a TRIGGER ON UPDATE a good solution for this or not?
Thanks for your answers.
I would save the current id in a db table.
Each time you want to increase the id, do the following (a sketch follows the steps):
Start a transaction
Lock the id row in the db table: in MySQL, use FOR UPDATE
Read the current id
Increase the id
Generate the bill with the id
Store the id back in the db table
Commit the transaction
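In MySQL, the steps above translate to roughly this (a sketch; bill_seq is an illustrative counter table with one row per society):

START TRANSACTION;
-- lock the counter row so concurrent sessions queue up behind us
SELECT current_id FROM bill_seq WHERE society = 'XYZ' FOR UPDATE;
UPDATE bill_seq SET current_id = current_id + 1 WHERE society = 'XYZ';
-- create the bill using the incremented value, then:
COMMIT;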
I'd go for MAX(id)+1
You can get the next number in the sequence with a query like:
SELECT COALESCE(MAX(id),0) + 1 FROM bill
WHERE society = 'XYZ'
You'll have to take steps to ensure that two processes don't generate the same number and that can be complicated but not insurmountable.
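One such step is a unique key over the number, so that a lost race becomes a duplicate-key error (1062) your application can catch and retry; a sketch with illustrative names:

ALTER TABLE bill ADD UNIQUE KEY uq_society_number (society, id);

-- compute and insert in one statement; retry on a duplicate-key error
INSERT INTO bill (society, id, amount)
SELECT 'XYZ', COALESCE(MAX(id), 0) + 1, 100.00
FROM bill
WHERE society = 'XYZ';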
Personally, I would always avoid a trigger. I've never used a trigger and not regretted it later.
My setup:
MySQL and PHP
System Scenario:
I have more than 10 types of system users,
for example: Customer and Employee.
Every time a customer or employee is added to the system, the system automatically generates an ID for each user based on the current date.
Ex (Customer):
Today is June 20, 2015, and this customer is the 3rd to sign up. So his
ID would be 06202015-03. So every time a user (any type of user) signs up,
the sequence number increments by 1, on a per-day basis only. On each new day
the sequence counter resets to 0.
General Question: Given that my concern about ID generation is solved, is it good practice to pre-compute the next sequence #? I mean, will the system just pull the next sequence number saved in a db table, or should I compute the next sequence number only when a new user is signing up?
UPDATE (added best possible scenario):
Example Date: June 20,2015
Customer 1 signs up = generated ID would be 06202015-01
Customer 2 signs up = generated ID would be 06202015-02
and so on...
Worst possible scenarios during signup:
2 or more users signing up simultaneously
If customer1 is deleted (by admin) on that same day and customer2 signs up, customer2 should get the #1 id (06202015-01), not *-02, because customer1 has already been deleted.
I would like to know the best way to generate a sequence number efficiently:
Would a stored procedure be the best fit for this, or should I use #2? (see below)
Is it good practice to just compute the next sequence number (using a PHP function) every time a user signs up?
The #2 process is, I think, the best and easiest way to handle auto ID generation, but I'm wondering: WHAT IF 2 or more users sign up simultaneously?
In my latest update, the sequence is obviously predictable. My only concern is the best and most efficient way to get the sequence number: through a stored procedure, or through a PHP script function, given the worst-case scenarios stated above.
General Question: Given that my concern about ID generation is solved, is it good practice to pre-compute the next sequence #? I mean, will the system just pull the next sequence number saved in a db table, or should I compute the next sequence number only when a new user is signing up?
If the id is dependent on the date a user signs up, you can't predict the next id, because you don't know when the next user will sign up (unless you are clairvoyant).
To make it easier to obtain the next value, I would split the id into two columns, a column with the date and a column with the sequence; then you can use:
IFNULL((SELECT MAX(sequence) FROM usertable WHERE signup_date = CURRENT_DATE), 0) + 1
IMO there's no best practice; it's a personal preference.
There's also a third option: a BEFORE INSERT trigger.
To avoid duplicates, add a unique index over both columns.
In addition you can lock the table:
LOCK TABLES user_table WRITE;
/* CALL(sproc) or INSERT statement, or SELECT and INSERT statements */
UNLOCK TABLES;
With a write lock, no other session can access the table until the lock is released (it will wait).
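Putting those pieces together, a sketch (illustrative names, reusing the expression above):

ALTER TABLE user_table ADD UNIQUE KEY uq_date_seq (signup_date, sequence);

LOCK TABLES user_table WRITE;
INSERT INTO user_table (signup_date, sequence, name)
SELECT CURRENT_DATE, IFNULL(MAX(sequence), 0) + 1, 'Jane Doe'
FROM user_table
WHERE signup_date = CURRENT_DATE;
UNLOCK TABLES;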
In a table of Users, I want to keep track of the time of day each user logs in, as running totals. For example:
UserID    midnightTo6am    6amToNoon    noonTo6pm    6pmToMidnight
User1     3                2            7            1
User2     4                9            1            8
Note that this is part of a larger table that contains more information about a user, such as address, gender, hair color, etc.
In this example, what is the best way to store this data? Should it be part of the users table, even knowing that not every user will log in during every time slot (a user may never log in between 6am and noon)? Or does this table fail 1NF because of repeating columns that should be moved to a separate table?
If stored as part of the users table, there may be empty cells that never get populated with data because the user never logs in at that time.
If this data does fail 1NF and should be put in a separate table, how would I ensure that a +1 for a certain time slot goes smoothly? Would I look the user up in the separate table to see if they have logged in during that slot before, and +1? Or add a row to that table if it is their first login during that period?
Any clarifications or other solutions are welcome!
I would recommend storing the login events either in a file-based log or in a simple table with just the userid and the DATETIME of the login.
Once a day, or however often you need to report on the data you illustrated in your question, aggregate that data into a table of the shape you want. This way you're not throwing away any raw data and can always re-aggregate for different periods, by hour, etc., at a later date.
Addition: I suspect that the fastest way of deriving the aggregated data would be to run a range query for each of your aggregation periods, searching for (e.g.) login dates in the range 2011-12-25 00:00:00 to 2011-12-25 06:00:00. If you go with that approach, an index on (datetime, user_id) would work well. It seems counter-intuitive, as you want to do things on a user-centric basis, but the index on the DATETIME field allows the relevant rows to be found quickly, and the trailing user_id column then allows for fast grouping.
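If you'd rather aggregate in one pass per user instead, a sketch of the query (MySQL treats boolean expressions as 0/1, so SUM over a condition counts matching rows; login_log and login_at are illustrative names):

SELECT user_id,
       SUM(HOUR(login_at) < 6)               AS `midnightTo6am`,
       SUM(HOUR(login_at) BETWEEN 6 AND 11)  AS `6amToNoon`,
       SUM(HOUR(login_at) BETWEEN 12 AND 17) AS `noonTo6pm`,
       SUM(HOUR(login_at) >= 18)             AS `6pmToMidnight`
FROM login_log
GROUP BY user_id;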
A couple of things. First, this is not a violation of 1NF; doing it as 4 columns may in fact be acceptable. Second, if you do go with this design, you should not use NULLs; use zero instead (with the possible exception of existing records). Whether you should use this design or split the data into another table (or two) depends on your purpose and usage. If your standard use of the table does not involve this information, it should go into another table with a 1-to-1 relationship. If you may need to increase the granularity of the login times, you should also use another table. Finally, if you do split this off into another table with a timestamp, give some consideration to privacy.
I'm working on a project for which I need to frequently insert ~500 records at a remote location. I will do this in a single INSERT to minimize network traffic. I do, however, need to know the exact id field (AUTO_INCREMENT) values.
My web searches seem to indicate I could probably use last_insert_id and calculate all the id values from there. However, this wouldn't work if the rows were given ids that are scattered all over the place.
Could anyone please clarify what would or should happen, and whether the mathematical solution is safe?
A multi-row insert is an atomic operation in MySQL (in both MyISAM and InnoDB). Since the table will be locked for writing during this operation, no other rows will be inserted or updated during its execution.
This means the IDs will in fact be consecutive (unless the auto_increment_increment option is set to something other than 1).
Auto-increment does exactly that, it auto-increments: each new row gets the numerically next ID. MySQL does not re-use the IDs of deleted rows.
Your solution is safe, because write operations acquire a table lock, so no other inserts can happen while your operation completes; you will get n contiguous auto-increment values for n inserted rows.
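A sketch of the arithmetic; LAST_INSERT_ID() returns the id generated for the first row of a multi-row INSERT, and the records table name is illustrative:

INSERT INTO records (val) VALUES ('a'), ('b'), ('c');
SELECT LAST_INSERT_ID();  -- id of the FIRST of the three rows
-- with auto_increment_increment = 1, the three rows received
-- LAST_INSERT_ID(), LAST_INSERT_ID()+1, LAST_INSERT_ID()+2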