I would like to find out what the consequences are if you create a sequence after a table has been created and quite a bit of data has already been inserted.
(this is because PEAR's DataObject's insert() method sometimes skips incremental IDs)
So here is an example of how to achieve this, but is this the correct way to do it after so much data already exists?
Table definition:
CREATE TABLE departments (
ID NUMBER(10) NOT NULL,
DESCRIPTION VARCHAR2(50) NOT NULL);
ALTER TABLE departments ADD (
CONSTRAINT dept_pk PRIMARY KEY (ID));
CREATE SEQUENCE dept_seq;
Trigger definition:
CREATE OR REPLACE TRIGGER dept_bir
BEFORE INSERT ON departments
FOR EACH ROW
BEGIN
SELECT dept_seq.NEXTVAL
INTO :new.id
FROM dual;
END;
If you mean that you already have data with the ID field populated without using the trigger, the only thing you'll have to check is that the "start" of your sequence is at least the max existing ID + 1:
CREATE SEQUENCE dept_seq
START WITH 2503
INCREMENT BY 1;
Then it should be perfectly fine.
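If you'd rather not look up the current maximum by hand, a one-off anonymous PL/SQL block can work it out and create the sequence dynamically (a minimal sketch; EXECUTE IMMEDIATE is needed because DDL can't take bind variables):
DECLARE
    l_start NUMBER;
BEGIN
    -- Start just past the highest existing ID (or at 1 for an empty table).
    SELECT NVL(MAX(id), 0) + 1 INTO l_start FROM departments;
    EXECUTE IMMEDIATE 'CREATE SEQUENCE dept_seq START WITH ' || l_start;
END;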
this is because PEAR's DataObject's insert() method sometimes skips incremental IDs
As a complement to Raphaël Althaus's answer, using a sequence will in no way guarantee that you don't have "holes" in the IDs. Think about concurrent access, or rollbacks.
To quote the documentation:
When a sequence number is generated, the sequence is incremented, independent of the transaction committing or rolling back. If two users concurrently increment the same sequence, then the sequence numbers each user acquires may have gaps, because sequence numbers are being generated by the other user.
There was an interesting answer to the same question on AskTom:
Sequences will never generate a gap free sequence of numbers.
[...]
You should never count on a sequence generating anything even close to a gap free
sequence of numbers. They are a high speed, extremely scalable multi-user way to
generate surrogate keys for a table.
[...] contiguous sequences of numbers are pretty much impossible
with sequences (only takes but one rollback -- and those will happen).
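You can see this for yourself with the departments table from above; a single rolled-back insert is enough to burn a value:
INSERT INTO departments (description) VALUES ('Sales');  -- trigger draws NEXTVAL, say 2503
ROLLBACK;                                                -- the row is gone, but 2503 is not reused
INSERT INTO departments (description) VALUES ('Sales');  -- trigger draws 2504
COMMIT;                                                  -- the stored IDs now skip 2503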
Related
I have a table in which the primary key is a 20 character VARCHAR field that gets generated in PHP before getting inserted into the table. The key generation logic uses a grouping and sequencing mechanism, as given below.
SELECT
    CAST(SUBSTR(prod_code, 15) AS UNSIGNED) AS prod_num
FROM
    items
    JOIN products ON items.prod_id = products.prod_id
WHERE
    items.cat_type = $category
    AND items.sub_grp = $sub_grp
ORDER BY
    prod_num DESC
LIMIT 1
The prod_num thus obtained is incremented in PHP and prefixed with a product code to create a unique primary key. However, multiple users can perform the transaction concurrently for the same category and sub-group, leading to the same key being generated for each of them. This can cause a duplicate-key error, since it is a unique primary key. What is the best way to handle such a situation?
Don't use "Smart IDs".
Smart IDs were all the rage in the 1980s, and went out of fashion for several reasons:
The only requirement of a PK is that it has to be unique. A PK doesn't need to have a format, or to be sexy or good looking. Its specific sequence, case, or composition is not relevant and actually counter-productive.
They are not relational. Parts of the ID could establish a relationship with other tables and that can cause a lot of issues. This goes against Normal Forms defined in database design.
Now, if you still need a Smart ID, then create a secondary column (that can also be unique) and populate it after the row is created. If you are facing thread-safety issues, you can run a single deferred process that assigns the nice-looking values after a few minutes. Alternatively, you can implement a queue, which can resolve this in seconds.
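A hedged, MySQL-flavoured sketch of that idea, assuming the items table has an auto-increment primary key item_id (the real column names may differ):
-- Keep the real PK dumb; add a separate, optionally unique, display code.
ALTER TABLE items ADD COLUMN display_code VARCHAR(20) NULL UNIQUE;

-- Deferred job: derive nice-looking codes for rows that don't have one yet.
UPDATE items
SET display_code = CONCAT(cat_type, '-', sub_grp, '-', LPAD(item_id, 8, '0'))
WHERE display_code IS NULL;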
Agree with "The Impaler".
But if you decide to proceed that way: your concurrency issue could be handled through a retry mechanism.
This is similar to how deadlocks are typically handled.
If the insertion fails because of violation of the unique primary key, just try again in PHP with a new key.
Your framework might have retry functions already. Otherwise it's easy to implement yourself.
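The answer suggests retrying in PHP; purely for illustration, the same retry loop can also live in the database. A hedged MySQL sketch (the table layout and the random-hex key generator are stand-ins, not the question's grouping logic):
-- Stand-in table: a 20-char VARCHAR primary key, as in the question.
CREATE TABLE demo_items (
    prod_code VARCHAR(20) PRIMARY KEY,
    label VARCHAR(50)
);

DELIMITER //
CREATE PROCEDURE insert_with_retry(IN p_label VARCHAR(50))
BEGIN
    DECLARE dup INT DEFAULT 1;
    -- MySQL error 1062 = duplicate key: flag it so the loop tries a fresh key.
    DECLARE CONTINUE HANDLER FOR 1062 SET dup = 1;
    WHILE dup = 1 DO
        SET dup = 0;
        INSERT INTO demo_items (prod_code, label)
        VALUES (UPPER(LEFT(MD5(RAND()), 20)), p_label);
    END WHILE;
END//
DELIMITER ;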
I am having a strange issue with a MySQL query. I have a table with the fields
slno, mobileno, contractor with slno as primary key and auto-increment starting at 1.
In tests of up to 100 records, the count and the auto-increment values were the same. So I truncated the table to reset the auto-increment and inserted a huge Excel file with around 40k rows via PHP, then issued a SELECT query, which yields
a max slno of 40000 as expected, while the count shows 39920.
I was amused and tried to find an answer on Google; maybe my lack of keyword-search ability prevented me from finding a result, so I am posting here (I added a screenshot for reference). Any ideas and clarifications? Thanks
EDIT:
min slno is 1
EDIT:
A related question, with a solution for finding gaps in an auto-increment column in MySQL, has been asked and solved here.
There are specific cases in which auto-incremented values can be lost. One example is if you roll back an insertion. As per the doco:
"Lost" auto-increment values and sequence gaps
In all lock modes (0, 1, and 2), if a transaction that generated auto-increment values rolls back, those auto-increment values are "lost". Once a value is generated for an auto-increment column, it cannot be rolled back, whether or not the "INSERT-like" statement is completed, and whether or not the containing transaction is rolled back. Such lost values are not reused. Thus, there may be gaps in the values stored in an AUTO_INCREMENT column of a table.
In that case, although the insert is backed out, the auto-increment may not be. That would certainly allow for the possibility that your bulk insertion from Excel is occasionally failing and retrying, with the subsequent retry working. It really depends on how your insertion process works.
In any case, assuming those values will always be contiguous is actually a bad assumption to make.
This is because, even if insertions were guaranteed to be contiguous, it's possible to delete rows which would result in gaps appearing. You can certainly fix this each time you delete (or bulk insert for that matter) but the workload is high - you basically have to find gaps and then "move" higher entries into those gaps.
This movement is likely to be non-trivial as it's most likely that there will be other tables holding key look-ups to that column, and each of those will need to be changed as well.
So the best use case for an auto-increment field is simply to provide a unique identifier for the row where no other one exists and not to be necessarily contiguous.
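That said, if you ever do need to locate the gaps, a self-join finds them (the question's table name isn't given, so mytable is a stand-in):
-- Report the first missing slno after each row whose successor is absent.
SELECT t1.slno + 1 AS gap_starts_at
FROM mytable t1
LEFT JOIN mytable t2 ON t2.slno = t1.slno + 1
WHERE t2.slno IS NULL
  AND t1.slno < (SELECT MAX(slno) FROM mytable);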
Together with my team, I am working on a functionality to generate invoice numbers. The requirements says that:
there should be no gaps between invoice numbers
the numbers should start from 0 every year (together with the year they will form a unique key)
the invoice numbers should grow according to the time of creation of the invoices
We are using PHP and Postgres. We thought to implement this in the following way:
each time a new invoice is persisted on the database we use a BEFORE INSERT trigger
the trigger executes a function that retrieves a new value from a postgres sequence and writes it on the invoice as its number
Considering that multiple invoices could be created during the same transaction, my question is: is this a sufficiently safe approach? What are its flaws? How would you suggest to improve it?
Introduction
I believe the most crucial point here is:
there should be no gaps between invoice numbers
In this case you cannot use a sequence or an auto-increment field (as others propose in the comments). Auto-increment fields use sequences under the hood, and the nextval(regclass) function increments the sequence's counter whether the transaction commits or rolls back (you point that out yourself).
Update:
What I mean is that you shouldn't use sequences at all; in particular, the solution you propose doesn't eliminate the possibility of gaps. Your trigger gets a new sequence value, but the INSERT could still fail.
Sequences work this way because they are mainly meant for generating PRIMARY KEY and OID values, where uniqueness and a non-blocking mechanism are the ultimate goals and gaps between values are really no big deal.
In your case, however, the priorities may be different, and there are a couple of things to consider.
Simple solution
A first possible solution to your problem could be computing the new number as one more than the maximum of the currently existing ones. It can be done in your trigger:
NEW.invoice_number := COALESCE(
    (SELECT foo.invoice_number
     FROM invoices foo
     WHERE foo._year = NEW._year
     ORDER BY foo.invoice_number DESC NULLS LAST
     LIMIT 1),
    0) + 1; /*query 1*/
This query could use your composite UNIQUE INDEX if it was created with the "proper" syntax and column order, i.e. with the "year" column in first place, e.g.:
CREATE UNIQUE INDEX invoice_number_unique
ON invoices (_year, invoice_number DESC NULLS LAST);
In PostgreSQL, UNIQUE CONSTRAINTs are implemented simply as UNIQUE INDEXes, so most of the time there is no difference which command you use. However, the particular syntax presented above makes it possible to define an order on the index. It's a really nice trick which makes /*query 1*/ quicker than a simple SELECT max(invoice_number) FROM invoices WHERE _year = NEW._year once the invoices table gets bigger.
This is a simple solution, but it has one big drawback: there is a possibility of a race condition when two transactions try to insert an invoice at the same time. Both could acquire the same max value, and the UNIQUE CONSTRAINT will prevent the second one from committing. Despite that, it could be sufficient in a small system with a special insert policy.
Better solution
You may create a table
CREATE TABLE invoice_numbers(
_year INTEGER NOT NULL PRIMARY KEY,
next_number_within_year INTEGER
);
to store the next possible number for each year. Then, in an AFTER INSERT trigger, you could:
Lock invoice_numbers so that no other transaction can even read the number: LOCK TABLE invoice_numbers IN ACCESS EXCLUSIVE MODE;
Get the new invoice number: new_invoice_number = (SELECT foo.next_number_within_year FROM invoice_numbers foo WHERE foo._year = NEW._year);
Update the number value of the newly added invoice row
Increment the counter: UPDATE invoice_numbers SET next_number_within_year = next_number_within_year + 1 WHERE _year = NEW._year;
Because the table lock is held by the transaction until its commit, this should probably be the last trigger fired (read more about trigger execution order here)
Update:
Instead of locking the whole table with the LOCK command, check the link provided by Craig Ringer
The drawback in this case is a drop in INSERT performance: only one transaction at a time can perform an insert.
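Putting the pieces together, here is a hedged sketch of the whole mechanism as a BEFORE INSERT trigger (assigning NEW directly saves the separate UPDATE of the invoice row; the function and trigger names are assumptions, and the counter row is seeded on demand):
CREATE OR REPLACE FUNCTION assign_invoice_number() RETURNS trigger AS $$
BEGIN
    -- Serialise number assignment; the lock is held until commit.
    LOCK TABLE invoice_numbers IN ACCESS EXCLUSIVE MODE;

    -- Seed the counter the first time a year shows up.
    INSERT INTO invoice_numbers (_year, next_number_within_year)
    SELECT NEW._year, 1
    WHERE NOT EXISTS (SELECT 1 FROM invoice_numbers WHERE _year = NEW._year);

    SELECT next_number_within_year INTO NEW.invoice_number
    FROM invoice_numbers
    WHERE _year = NEW._year;

    UPDATE invoice_numbers
    SET next_number_within_year = next_number_within_year + 1
    WHERE _year = NEW._year;

    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER invoices_number_bi
BEFORE INSERT ON invoices
FOR EACH ROW EXECUTE PROCEDURE assign_invoice_number();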
In my database (MySQL) I have a table (MyISAM) containing a field called number. Each value of this field is either 0 or a positive number. The non-zero values must be unique. And the last thing is that the value of the field is generated in my PHP code according to the value of another field (called isNew) in this table. The code follows.
$maxNumber = $db->selectField('select max(number)+1 m from confirmed where isNew = ?', array($isNew), 'm');
$db->query('update confirmed set number = ? where dataid = ?', array($maxNumber, $id));
The first line of code selects the maximum value of the number field and increments it. The second line updates the record by setting its freshly generated number.
This code is being used concurrently by hundreds of clients so I noticed that sometimes duplicates of the number field occur. As I understand this is happening when two clients read value of the number field almost simultaneously and this fact leads to the duplicate.
I have read about the SELECT ... FOR UPDATE statement but I'm not quite sure it is applicable in my case.
So the question is should I just append FOR UPDATE to my SELECT statement? Or create a stored procedure to do the job? Or maybe completely change the way the numbers are being generated?
This is definitely possible to do. MyISAM doesn't offer transaction locking so forget about stuff like FOR UPDATE. There's definitely room for a race condition between the two statements in your example code. The way you've implemented it, this one is like the talking burro. It's amazing it works at all, not that it works badly! :-)
I don't understand what you're doing with this SQL:
select max(number)+1 m from confirmed where isNew = ?
Are the values of number unique throughout the table, or only within sets where isNew has a certain value? Would it work if the values of number were unique throughout the table? That would be easier to create, debug, and maintain.
You need a multi-connection-safe way of getting a number.
You could try this SQL. It will do the setting of the max number in one statement (the extra derived table is needed because MySQL won't let a subquery read the same table an UPDATE is modifying):
UPDATE confirmed
SET number = (
    SELECT m FROM (
        SELECT 1 + MAX(number) AS m FROM confirmed WHERE isNew = ?
    ) AS t
)
WHERE dataid = ?
This will perform badly. Without a compound index on (isNew, number), and without both of those columns declared NOT NULL, it will perform very, very badly.
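If you do go this route, the supporting index would look something like this (the index name is arbitrary):
-- Lets MySQL resolve MAX(number) for a given isNew value from the index alone.
ALTER TABLE confirmed ADD INDEX idx_isnew_number (isNew, number);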
If you can use numbers that are unique throughout the table I suggest you create for yourself a sequence setup, which will return a unique number each time you use it. You need to use a series of consecutive SQL statements to do that. Here's how it goes.
First, when you create your tables create yourself a table to use called sequence (or whatever name you like). This is a one-column table.
CREATE TABLE sequence (
sequence_id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`sequence_id`)
) AUTO_INCREMENT = 990000
This will make the sequence table start issuing numbers at 990,000.
Second, when you need a unique number in your application, do the following things.
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
UPDATE confirmed
SET number = LAST_INSERT_ID()
WHERE dataid = ?
What's going on here? The MySQL function LAST_INSERT_ID() returns the value of the most recent autoincrement-generated ID number. Because you inserted a row into that sequence table, it gives you back that generated ID number. The DELETE FROM command keeps that table from snarfing up disk space; we don't care about old ID numbers.
LAST_INSERT_ID() is connection-safe. If software on different connections to your database uses it, they all get their own values.
If you need to know the last inserted ID number, you can issue this SQL:
SELECT LAST_INSERT_ID() AS sequence_id
and you'll get it returned.
If you were using Oracle or PostgreSQL instead of MySQL, you'd find they provide SEQUENCE objects that basically do this.
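For comparison, a hedged PostgreSQL sketch of the same idea (the sequence name is an assumption; ? stands for the bound id as elsewhere in this post):
CREATE SEQUENCE confirmed_number_seq START 990000;
UPDATE confirmed SET number = nextval('confirmed_number_seq') WHERE dataid = ?;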
Here's the answer to another similar question.
Fastest way to generate 11,000,000 unique ids
So my app needs to let users generate random alphanumeric codes like A6BU31, 38QV3B, R6RK7T. Currently they consist of 6 chars, where the letters I and O are not used (so we get 34^6 possibilities). These codes are then printed out and used for something else.
I must now ensure that many users can "reserve" up to 100 codes per request, so user A might want to get 50 codes, user B wants to generate 10 and so on. These codes must be unique across all users, so user A and user B may not both receive the code ABC123.
My current approach (using PHP and MySQL) is to have two InnoDB tables for this:
One (the "repository") contains a large list of pre-generated codes (since the possibility of collisions will increase over time and I do not want to go down the try-insert-and-if-it-fails-try-another-code path). The repository contains just the codes and an auto-incremented ID (so I can sort them; see below).
The other table holds the reserved keys (i.e. code + owning user).
Whenever a user wants to reserve N keys, I planned to do the following
BEGIN;
INSERT INTO reserved_codes (code, user_id)
    SELECT code, ? FROM repository ORDER BY id LIMIT N;
DELETE FROM repository ORDER BY id LIMIT N;
COMMIT;
This should work, but I'm not sure. It seems like I'm building a WTF solution.
After insertion I must select the just reserved codes to display them to the user. And that's the tricky part, since I don't really know how to identify the just reserved codes after my transaction is done. I could of course add just another column to my reserved_codes table, holding some kind of random token, but this seems even more WTFy.
My favorite solution would be to have a random number sequence, so that I can just perform INSERT operations in the reserved_codes table.
So, how to do this unique, random and transactional-safe sequence in MySQL? One idea was to have a regular auto-increment on the reserved_codes table and derive the random code value from that numeric column, but I was wondering whether there was a better way.
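A hedged sketch of that derive-from-auto-increment idea as a MySQL function (the constant 387420489 = 3^18 is coprime to 34^6 = 1544804416, so the mapping is a bijection on 0 .. 34^6-1; the output merely looks random and is fully predictable, so this is a sketch of the idea, not a vetted scheme):
DELIMITER //
CREATE FUNCTION id_to_code(n BIGINT UNSIGNED) RETURNS CHAR(6) DETERMINISTIC
BEGIN
    -- 34-character alphabet with I and O removed, as in the question.
    DECLARE alphabet CHAR(34) DEFAULT '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ';
    DECLARE v BIGINT UNSIGNED;
    DECLARE code VARCHAR(6) DEFAULT '';
    DECLARE i INT DEFAULT 0;
    -- Scramble the sequence: multiply by 3^18 modulo 34^6.
    SET v = (n * 387420489) % 1544804416;
    -- Encode the result as 6 base-34 digits.
    WHILE i < 6 DO
        SET code = CONCAT(SUBSTRING(alphabet, v % 34 + 1, 1), code);
        SET v = v DIV 34;
        SET i = i + 1;
    END WHILE;
    RETURN code;
END//
DELIMITER ;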
UPDATE: I forgot to mention that it would be advantageous to have a rather small table of reserved codes, as I later have to find single codes again in order to update them (reserved_codes has a couple more attributes). So letting the reserved table grow slowly is good (instead of having a huge index over ~1 million pre-generated codes).
If you already have a repository table, I would just add a user column and then run this query:
UPDATE repository SET user_id = ? WHERE user_id IS NULL LIMIT N;
Afterwards, you can select the records again (sketched below). This has two distinct disadvantages:
you need an index on user_id
you can't use the codes in your table for anything else but binding them to users.
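The round trip then looks like this (a hedged sketch; literal values stand in for the bound parameters):
-- Reserve 50 unclaimed codes for user 42, then read them back.
UPDATE repository SET user_id = 42 WHERE user_id IS NULL LIMIT 50;
-- Returns every code reserved by this user; add a batch column if you
-- ever need to tell one reservation apart from earlier ones.
SELECT code FROM repository WHERE user_id = 42;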