I want to build a database-wide unique id. That unique id should be one field of every row in every table of that database.
There are a few approaches I have considered:
Create one master-table with an auto-increment-field and a trigger in every other table, like:
"before insert here, insert in master-table -> get the auto-increment value -> and use this value as primary-key here"
I have seen this before, but instead of making one INSERT, it does 2 INSERTS, which I expect would not be that performant.
Add a field uniqueId to every table, and fill this field with a PHP-generated integer... something like unix-timestamp plus a random number.
But I would have to use BIGINT as the datatype, which means a large index_length and a large data_length.
Similar to the "uniqueId" idea, but instead of BIGINT I use VARCHAR and populate the value with uniqid().
Since you are looking for opinions... Of the three ideas you give, I would "vote" for the uniqid() solution. It seems pretty low cost in terms of execution (but possibly not implementation).
A simpler solution (I think) would be to add a field to each table to store a GUID and have the database fill it in with MySQL's UUID() function. MySQL 8.0.13 and later accept an expression default such as DEFAULT (UUID()); on older versions a BEFORE INSERT trigger can set the value instead. This lets the database do the work for you.
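As a rough illustration of the GUID idea, here is a minimal Python sketch that uses the standard library's sqlite3 as a stand-in for MySQL and generates the UUID in application code. The table names (users, orders) and the helper insert_with_uuid are made up for the example; in MySQL itself the value would come from UUID().

```python
import sqlite3
import uuid


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (unique_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE orders (unique_id TEXT PRIMARY KEY, item TEXT)")
    conn.commit()
    return conn


def insert_with_uuid(conn, table, column, value):
    """Insert one row and return the UUID assigned to it.

    table and column are trusted identifiers from our own code,
    never user input, because they are interpolated into the SQL.
    """
    new_id = str(uuid.uuid4())
    with conn:  # commit on success, roll back on error
        conn.execute(
            f"INSERT INTO {table} (unique_id, {column}) VALUES (?, ?)",
            (new_id, value),
        )
    return new_id
```

Because the value does not depend on any table's own counter, the same helper works for every table without coordination between them.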
And in the spirit of coming up with random ideas... It would be possible to have some kind of offline process fill in the IDs asynchronously. Make sure every table has the appropriate field and make the default value be 0/empty. Then the offline process could simply run a query on each table to find the rows that do not yet have a unique id and it could fill them in. That would let you control the ID and even use some kind of incrementing integer. This, of course, requires that you do not need the unique ID instantly each time a record is inserted.
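The offline process could look something like the following sketch. It is written in Python against sqlite3 purely so that it is self-contained; the table names, the id_counter table and the backfill function are assumptions made for the example, not part of any existing schema.

```python
import sqlite3

TABLES = ("users", "orders")  # hypothetical table names


def make_db():
    conn = sqlite3.connect(":memory:")
    # One counter row shared by the whole database.
    conn.execute("CREATE TABLE id_counter (last_id INTEGER NOT NULL)")
    conn.execute("INSERT INTO id_counter VALUES (0)")
    for table in TABLES:
        conn.execute(
            f"CREATE TABLE {table} ("
            " payload TEXT,"
            " unique_id INTEGER NOT NULL DEFAULT 0)"
        )
    conn.commit()
    return conn


def backfill(conn):
    """Give every row that still has unique_id = 0 the next database-wide id.

    Returns the number of rows that received an id.
    """
    assigned = 0
    with conn:  # one transaction: commit on success, roll back on error
        last_id = conn.execute("SELECT last_id FROM id_counter").fetchone()[0]
        for table in TABLES:
            rows = conn.execute(
                f"SELECT rowid FROM {table} WHERE unique_id = 0 ORDER BY rowid"
            ).fetchall()
            for (rowid,) in rows:
                last_id += 1
                conn.execute(
                    f"UPDATE {table} SET unique_id = ? WHERE rowid = ?",
                    (last_id, rowid),
                )
                assigned += 1
        conn.execute("UPDATE id_counter SET last_id = ?", (last_id,))
    return assigned
```

Only one instance of this job should run at a time; otherwise two runs could read the same counter value.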
I have a table which has an id (primary key with auto increment), a uid (a key referring to a user's id, for example) and some other columns which don't matter for my question.
I want to have, let's call it, a separate auto-increment sequence on id for each uid value.
So, I will add an entry with uid 10, and the id field for this entry will be 1 because there were no previous entries with uid 10. I will add a new one with uid 4 and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as explanatory and clear as I can to demonstrate the idea... clearly.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
In case you know of a non-SQL database engine providing such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger to compute MAX(id)+1 for user 4. It gets 3.
Bill's session fires a trigger to compute MAX(id)+1 for user 4. It also gets 3, because Mario's uncommitted row is not visible to it.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
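To make the interleaving above concrete, here is a small Python sketch that replays it deterministically on a single sqlite3 connection (standing in for two MySQL sessions). The table name entries and the helper names are assumptions for the illustration; a real race would involve two connections and depend on timing.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (uid INTEGER, id INTEGER, PRIMARY KEY (uid, id))"
    )
    # User 4 already has two entries.
    conn.executemany("INSERT INTO entries VALUES (?, ?)", [(4, 1), (4, 2)])
    conn.commit()
    return conn


def next_id(conn, uid):
    """What a naive trigger would compute: MAX(id) + 1 for this uid."""
    row = conn.execute(
        "SELECT COALESCE(MAX(id), 0) + 1 FROM entries WHERE uid = ?", (uid,)
    ).fetchone()
    return row[0]


def simulate_race(conn, uid):
    """Both sessions read MAX(id) + 1 before either of them inserts."""
    mario_id = next_id(conn, uid)  # Mario's session computes its value
    bill_id = next_id(conn, uid)   # Bill's session computes the same value
    # Bill finishes his INSERT first.
    conn.execute("INSERT INTO entries VALUES (?, ?)", (uid, bill_id))
    try:
        # Mario's INSERT now collides with Bill's row.
        conn.execute("INSERT INTO entries VALUES (?, ?)", (uid, mario_id))
    except sqlite3.IntegrityError:
        return mario_id, bill_id, "conflict"
    return mario_id, bill_id, "ok"
```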
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like in the example above. It's necessary to lock the whole table because you're trying to restrict INSERT, so there's no specific row to lock (if you were trying to govern access to a given row with UPDATE, you could lock just that row). But locking the table makes access to the table serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
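As a sketch of the second option, the following Python class plays the role of an atomic per-userid counter. It is an in-process stand-in for something like a memcached key per userid; PerKeyCounter and allocate_concurrently are names invented for this example.

```python
import threading
from collections import defaultdict


class PerKeyCounter:
    """In-process stand-in for an atomic increment on one key per userid."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = defaultdict(int)

    def incr(self, key):
        """Atomically bump the counter for key and return the new value."""
        with self._lock:
            self._values[key] += 1
            return self._values[key]


def allocate_concurrently(counter, key, workers=8, per_worker=100):
    """Have several threads allocate ids for the same key.

    Returns every id that was handed out, in no particular order.
    """
    results = []
    results_lock = threading.Lock()

    def work():
        local = [counter.incr(key) for _ in range(per_worker)]
        with results_lock:
            results.extend(local)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every caller gets a distinct value no matter how the threads interleave, which is the same guarantee AUTO_INCREMENT gives. The counter knows nothing about commits or rollbacks, though.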
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
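The rollback case is easy to reproduce. In this Python sketch an itertools counter stands in for an allocator that lives outside transaction scope, and sqlite3 stands in for MySQL; the table name entries is an assumption.

```python
import itertools
import sqlite3


def demo_gap():
    """Allocate three ids for uid 4, roll back the second insert, list what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entries (uid INTEGER, id INTEGER, PRIMARY KEY (uid, id))"
    )
    conn.commit()
    counter = itertools.count(1)  # allocator that is not part of any transaction

    first = next(counter)
    conn.execute("INSERT INTO entries VALUES (4, ?)", (first,))
    conn.commit()

    second = next(counter)  # this value is consumed even though the row is discarded
    conn.execute("INSERT INTO entries VALUES (4, ?)", (second,))
    conn.rollback()

    third = next(counter)
    conn.execute("INSERT INTO entries VALUES (4, ?)", (third,))
    conn.commit()

    rows = conn.execute("SELECT id FROM entries WHERE uid = 4 ORDER BY id")
    return [row[0] for row in rows]
```

Calling demo_gap() returns [1, 3]: the ids are unique, but 2 is missing.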
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask the question about efficiency. Unless you are dealing with extreme volumes, storing an 8 byte DATETIME isn't much of an overhead compared to using, for example, a 4 byte INT.
It also massively simplifies your data inserts, as well as being able to cope with records being deleted without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than NOT MS/Oracle) the general logic is simple...
Start a transaction (often this is implicitly already started inside triggers)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
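Here is what the trigger logic can look like in practice. This is only a sketch: it uses SQLite trigger syntax through Python's sqlite3 module, not MySQL syntax, and the table amendments and trigger number_amendment are names chosen for the example. It handles the "already populated" case by skipping rows that arrive with an amendment_id, and it does nothing about concurrent sessions.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE amendments ("
        " property_id INTEGER NOT NULL,"
        " amendment_id INTEGER,"
        " note TEXT)"
    )
    # After each insert without an amendment_id, set it to MAX + 1
    # among the rows that share the same property_id.
    conn.execute(
        "CREATE TRIGGER number_amendment AFTER INSERT ON amendments"
        " WHEN NEW.amendment_id IS NULL"
        " BEGIN"
        "   UPDATE amendments"
        "   SET amendment_id = ("
        "     SELECT COALESCE(MAX(amendment_id), 0) + 1"
        "     FROM amendments WHERE property_id = NEW.property_id)"
        "   WHERE rowid = NEW.rowid;"
        " END"
    )
    conn.commit()
    return conn


def add_amendment(conn, property_id, note):
    """Insert a row and return the amendment_id the trigger assigned."""
    with conn:
        cur = conn.execute(
            "INSERT INTO amendments (property_id, note) VALUES (?, ?)",
            (property_id, note),
        )
        row = conn.execute(
            "SELECT amendment_id FROM amendments WHERE rowid = ?",
            (cur.lastrowid,),
        ).fetchone()
    return row[0]
```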
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
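To illustrate the single-write-path idea, the sketch below uses a Python function in place of a stored procedure and sqlite3 in place of your RDBMS. BEGIN IMMEDIATE is SQLite's way of taking the write lock before reading; the function name add_amendment and the table layout are assumptions for the example.

```python
import sqlite3


def make_db():
    # isolation_level=None puts the connection in autocommit mode, so the
    # explicit BEGIN / COMMIT statements below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE amendments ("
        " property_id INTEGER NOT NULL,"
        " amendment_id INTEGER NOT NULL,"
        " note TEXT,"
        " PRIMARY KEY (property_id, amendment_id))"
    )
    return conn


def add_amendment(conn, property_id, note):
    """The only write path: number the row and insert it in one transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        next_id = conn.execute(
            "SELECT COALESCE(MAX(amendment_id), 0) + 1"
            " FROM amendments WHERE property_id = ?",
            (property_id,),
        ).fetchone()[0]
        conn.execute(
            "INSERT INTO amendments (property_id, amendment_id, note)"
            " VALUES (?, ?, ?)",
            (property_id, next_id, note),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_id
```

Because callers never supply amendment_id themselves, the procedure deals with exactly one record at a time and there is no hidden trigger to reason about.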
It is important to get your data types right.
What you are describing is a multi-part key. So use a multi-part key. Don't try to encode everything into a magic integer, you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
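A small sketch of the multi-part key, again in Python with sqlite3 as a stand-in. The table name versions and the function latest_versions are made up for the example; instead of a separate "latest version" table it simply queries for the highest version_number per entity_id.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE versions ("
        " entity_id INTEGER NOT NULL,"
        " version_number INTEGER NOT NULL,"
        " body TEXT,"
        " PRIMARY KEY (entity_id, version_number))"
    )
    conn.commit()
    return conn


def latest_versions(conn):
    """Return (entity_id, version_number, body) for the newest version of each entity."""
    return conn.execute(
        "SELECT v.entity_id, v.version_number, v.body"
        " FROM versions v"
        " JOIN (SELECT entity_id, MAX(version_number) AS latest"
        "       FROM versions GROUP BY entity_id) m"
        "   ON m.entity_id = v.entity_id AND m.latest = v.version_number"
        " ORDER BY v.entity_id"
    ).fetchall()
```

The composite primary key already gives the database an index that serves this query, which is why the extra table is only worth it if measurements say so.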
I have a MySQL database with a table called Animals, and I use this condition to add new records.
public function create()
{
    $animals = Animals::all();
    $last_animal_id = collect($animals)->last();
    if ($last_animal_id->id == $last_animal_id->id) {
        $last_animal_id->id = $last_animal_id->id + 1;
    } else {
        return false;
    }
    return view('animal.create-animals')->with('last_animal_id', $last_animal_id);
}
I work in Laravel and PHP, and that is my controller, 'AnimalsController'. The condition adds 1 to the last id that is registered in the table.
For example, if I have 4 records and I delete the last one, then without my condition the next record I add will take the value 6.
That is the reason I add new records manually. The condition finds the last id and adds 1 to it, rather than the +2 I would get without it. I don't insert the value directly: I pass it to an input and then submit the form in my view.
Is it possible to fill in the missing id in the table if I delete a record in the middle, or before the last record? The following example explains:
Table Animals
/* NOTE: the field 'id' has the following attributes:
AUTO_INCREMENT, NOT NULL, PRIMARY KEY, and its type is INT */
id | name    | class
1  | Dog     | Mammal
2  | Cat     | Mammal
3  | Sparrow | Bird
4  | Whale   | Mammal
5  | Frog    | Amphibian
6  | Snake   | Reptile
Then I delete the records with id 2 and 3.
In addition to the condition that already exists, I would like to create another condition that allows new records to be added in between the others, but only if there are missing records in between.
Using the previous example:
I said that I will delete ids 2 and 3, right? The new condition must allow me to create records with ids 2 and 3 again, between the records with ids 1 and 4.
If I delete another record, the condition must do the same thing, reusing the ids of the records that were previously deleted.
For more details: I use a form to add new animals to the table Animals. As I said in the example, I will delete the records with ids 2 and 3; if the condition in my controller and the form in my view work properly, I can then add an animal with id 2 again, and then, in a new form, add an animal with id 3 again.
So if you understood my question to mean that the function should add the records simultaneously, that is not what I want it to do.
One thing to keep in mind when working with relational databases is that the id column is usually used to relate this data and as such it can and will appear in other tables. If you arbitrarily renumber things here, you're damaging those links and potentially scrambling up your data.
If ordering is important, create a column for that purpose, for example one called position or something similar. This one you can manipulate freely without concern about altering relations.
Generally your id value should be:
Always populated (e.g. NOT NULL)
Integer (e.g. INT or BIGINT)
Set as your primary key (e.g. PRIMARY KEY)
Generated automatically (e.g. AUTO_INCREMENT)
Never changed, it's permanently assigned
Never recycled and used for another record
Recycling id values is how you create enormous security problems. It's all too easy for a user to "inherit" all the data that came with an old user ID value you've recycled. The safest thing is to never, ever re-use these values.
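For comparison, this is the default behaviour you get if you leave id generation alone. The sketch is in Python and uses SQLite's AUTOINCREMENT keyword as a stand-in for MySQL's AUTO_INCREMENT; the animals table mirrors the one in the question and the helper names are assumptions.

```python
import sqlite3

SEED = [
    ("Dog", "Mammal"),
    ("Cat", "Mammal"),
    ("Sparrow", "Bird"),
    ("Whale", "Mammal"),
    ("Frog", "Amphibian"),
    ("Snake", "Reptile"),
]


def make_animals():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " class TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO animals (name, class) VALUES (?, ?)", SEED)
    conn.commit()
    return conn


def add_animal(conn, name, animal_class):
    """Insert a row without supplying an id and return the id it was given."""
    with conn:
        cur = conn.execute(
            "INSERT INTO animals (name, class) VALUES (?, ?)",
            (name, animal_class),
        )
    return cur.lastrowid
```

After deleting ids 2 and 3, add_animal returns 7, not 2. Anything elsewhere that still refers to animal 2 can never be silently attached to a different animal.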
They're just IDs. Forget about holes or lack of ordering. Any production database will end up with lots of interesting patterns there that are unavoidable, but it doesn't matter.
One exception to this is when creating seed databases. Here you can fuss over the ordering to get things arranged as you want because this is before you insert the data into the database.
At the end of the day you'll want to ensure that:
These numbers don't overflow (e.g. INT keyed table at 2.1 billion)
These numbers aren't exposed to users in a way that makes it possible to enumerate your table (e.g. ID value in a URL)
Just think of them as internal identifiers, like a serial number, and you'll be fine. In fact, MySQL now supports SERIAL as a datatype for this reason. It's an alias for BIGINT UNSIGNED NOT NULL AUTO_INCREMENT UNIQUE, which is a good default for systems designed in 2018.
There is a really great answer from Tadman about the implications of your solution.
To give you an alternative to your own solution, you can do something like this....
First, create an order column, an int.
Then, instead of looking at the latest id value, do this...
$highestOrder = Animal::max('order');
And then 'up it'... :-) Just an idea.
BTW: to give you more options, you can look directly in a table as well:
DB::table('animals')->max('order');
... but I would not do that in this case. The model class is the best 'gateway' to this information, not the DB facade directly.
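The same idea outside of Laravel, as a hedged sketch in Python with sqlite3: the id stays automatic and a separate ordering column is bumped by hand. The column is called sort_order here because ORDER is a reserved word in SQL; that name and the helper functions are assumptions for the example.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " name TEXT NOT NULL,"
        " sort_order INTEGER NOT NULL)"
    )
    conn.commit()
    return conn


def next_order(conn):
    """Highest sort_order plus one; an empty table starts at 1."""
    row = conn.execute(
        "SELECT COALESCE(MAX(sort_order), 0) + 1 FROM animals"
    ).fetchone()
    return row[0]


def add_animal(conn, name):
    """Append an animal at the end of the ordering and return (id, sort_order)."""
    with conn:
        order = next_order(conn)
        cur = conn.execute(
            "INSERT INTO animals (name, sort_order) VALUES (?, ?)",
            (name, order),
        )
    return cur.lastrowid, order
```

Unlike id, sort_order can be renumbered whenever you like, because nothing else refers to it.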
Until now I've always stored records in a MySQL database by generating an ID (varchar 32 primary key) with PHP, using a function like this:
$id = substr( str_shuffle( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ), 0, 8 );
Until now I've always used utf8_bin (which is case sensitive) in the MySQL DB, but now I'm using utf8_general_ci (case insensitive).
I have a table in my DB to store statistics; this table holds millions of records.
In this case, is it better to use 'id int unsigned auto_increment' as the primary key?
If yes, is it possible that the script crashes with a 'duplicate id' error if many users call it at the same time? And how can I avoid that?
Several people can access the site at once, but MySQL hands out AUTO_INCREMENT values atomically. If an insert query does not provide an ID, the next auto-incremented ID is reserved for that row under an internal lock before the row is saved, so two concurrent requests never receive the same value. There is no way an auto-incremented ID can be duplicated this way.
Additionally, your code generates a random string and not an unique string. There is a lot of difference between the two. It is quite possible to generate a random string sequence that has been generated earlier.
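To put a rough number on "quite possible", here is a back-of-the-envelope calculation in Python. It assumes the shuffle is uniformly random, which is an idealisation, and uses the usual birthday-bound approximation; the function names are made up for the example.

```python
import math

ALPHABET_SIZE = 62  # a-z, A-Z, 0-9
ID_LENGTH = 8


def distinct_ids():
    """A shuffle never repeats a character, so ids are 8-permutations of 62 symbols."""
    return math.perm(ALPHABET_SIZE, ID_LENGTH)


def collision_probability(n_rows):
    """Approximate chance of at least one duplicate among n_rows ids.

    Birthday-bound approximation: 1 - exp(-n(n-1) / (2N)).
    """
    n_space = distinct_ids()
    return 1.0 - math.exp(-n_rows * (n_rows - 1) / (2.0 * n_space))
```

Under these assumptions there are about 1.4e14 possible ids, and the chance of at least one duplicate is roughly 0.4% at one million rows and roughly 30% at ten million rows.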
On the other hand, auto-increment produces a steadily increasing sequential number, so there is no chance of a duplicate key. It is therefore generally advised to use auto-increment to generate a primary key rather than generating one's own.
To get the last generated MySQL ID you can use mysqli_insert_id() right after your insert query in PHP and use it in your code for subsequent interactions with MySQL with respect to the inserted row.
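The same pattern in a self-contained Python sketch, where sqlite3's cursor.lastrowid plays the role that mysqli_insert_id() plays in PHP. The tables stats and stat_details and the function record_event are hypothetical.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stats ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " event TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE stat_details ("
        " stat_id INTEGER NOT NULL REFERENCES stats(id),"
        " detail TEXT)"
    )
    conn.commit()
    return conn


def record_event(conn, event, detail):
    """Insert a stats row, then use its generated id for a dependent row."""
    with conn:
        cur = conn.execute("INSERT INTO stats (event) VALUES (?)", (event,))
        stat_id = cur.lastrowid  # id generated by the database for this insert
        conn.execute(
            "INSERT INTO stat_details (stat_id, detail) VALUES (?, ?)",
            (stat_id, detail),
        )
    return stat_id
```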
In my opinion, auto-increment in MySQL is better, because your PHP script could be visited by more than one person at the same time.
So your generated id may not be unique anymore.
And I am pretty sure that MySQL is programmed well enough to prevent identical ids ;)
In fact your current code has the bug that the same ID might be generated again. MySQL generated id doesn't have this problem. Even if you have a reason to generate your own ids, I would still use MySQL autoincrement integer to link between tables because of better indexing (speed).
And if, for example, you want to hide the sequence from the user, keep the generated id in a separate column with a unique index. Do the id generation and insert in a do-while loop, so that if you happen to generate the same id a second time, you can retry.
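A minimal sketch of that retry loop, written in Python with sqlite3 standing in for MySQL. The table items, the column public_id and the function names are assumptions; the generator is passed in as a parameter so the duplicate case can be exercised on purpose.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_letters + string.digits


def random_public_id(length=8):
    """Random identifier shown to users; uniqueness is enforced by the index."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # internal key used for joins
        " public_id TEXT NOT NULL UNIQUE,"        # value exposed to users
        " name TEXT)"
    )
    conn.commit()
    return conn


def insert_item(conn, name, generate=random_public_id, max_attempts=5):
    """Insert a row, retrying with a fresh public_id when the unique index rejects it."""
    for _ in range(max_attempts):
        candidate = generate()
        try:
            with conn:
                conn.execute(
                    "INSERT INTO items (public_id, name) VALUES (?, ?)",
                    (candidate, name),
                )
            return candidate
        except sqlite3.IntegrityError:
            continue  # duplicate public_id: generate another one
    raise RuntimeError("could not generate a unique public_id")
```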
I am using mysql with PHP. I have a students table like this. I am using InnoDB engine.
id int AUTO_INCREMENT
regno int
name varchar
Whenever a new student is inserted, I want to assign the next available regno. For example, if the regno of the previous student is 1, then the value should be 2 for the next entry. Auto increment does not work here as it may create gaps. (I am using transactions, so after inserting a row into the students table, there are a few more queries that may cause a rollback, in which case the auto increment id is incremented although no actual record is inserted.) Also, I don't care if there is a gap between old regnos, e.g. regno may have 1,2,3,5,10,11,12 in sequence. When the next student is inserted I would like 12+1=13 for this student. Also, I want to make sure the regno is not duplicated. (Although regno has a UNIQUE index, I don't want to throw an error. It should get the next number.)
I've two solutions in mind.
1: (pseudo-code)
a. Query Database for the newregno = max(regno)+1
b. assign newregno to student while inserting the row.
In this case I am just concerned about that 2 instances of application may query the database at the same time and get the same newregno causing the duplicate.
2: Use triggers... Update the regno after real row insertion. (I've not read much about the triggers, but if any one suggest this is a better approach, I'll go for it)
Any suggestion?
EDIT---
The regno (registration number) may not be unique by itself in the future, but it will be unique in combination with some other columns, e.g. course/session. So please don't offer me an 'auto increment' index type solution.
Have a look at this:
http://www.mysqlperformanceblog.com/2011/11/29/avoiding-auto-increment-holes-on-innodb-with-insert-ignore/
InnoDB uses different algorithms for allocating AUTO_INCREMENT values, selected by the innodb_autoinc_lock_mode setting. You need to set it to avoid holes.
I have keys for a project I made where I am trying to test a licensing system (just for fun, and for learning). One part I thought I'd run into trouble with is how to distribute the keys. I have about 100 keys in a database, and I'm trying to figure out the best way to distribute them. The database is laid out as follows:
ID (Auto Increment) | key
Using the PDO library, what is the most effective way to go through the keys in chronological order by ID? And even if I did use chronological order, once I deleted the key that was given out, how would I continue in chronological order? Or should I maybe pick a random ID number? I have no clue what the most effective way to distribute these keys is.
If I understand your question correctly...
You might try this query through PDO:
SELECT * FROM `table-name`
ORDER BY `ID` ASC
Then when you step through the rows in a while() loop from the execution's return, it will be in chronological order like you asked.
As far as losing ID's, like if you delete the key with ID # 10, your table will jump from 9 to 11 in the returned rows IDs. When you add a new key, # 10 will not be used unless you specifically specify that ID when inserting.
EDIT: From the phrasing of your question, it sounds like you may be concerned about how you set up the ID's for the keys. Maybe you understand this already, but since you have Auto Increment, your IDs will be automatically generated when you insert new keys, so a new key would be assigned an ID of (ID of last inserted key) + 1.
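Putting the pieces together, handing out keys in ID order could look like the sketch below. It is Python with sqlite3 rather than PHP with PDO, and the table license_keys, its column license_key and the function claim_next_key are names assumed for the example. The lowest remaining ID is read and deleted inside one transaction so the same key is not handed out twice.

```python
import sqlite3


def make_db(keys):
    # Autocommit mode; the explicit BEGIN / COMMIT below control the transaction.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE license_keys ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " license_key TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        "INSERT INTO license_keys (license_key) VALUES (?)",
        [(k,) for k in keys],
    )
    return conn


def claim_next_key(conn):
    """Return the key with the lowest remaining ID and delete it, or None if none are left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, license_key FROM license_keys ORDER BY id ASC LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("DELETE FROM license_keys WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return None if row is None else row[1]
```

The gaps left by deleted rows do not matter here: ORDER BY ID ASC always finds the oldest key that is still in the table.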
Chronology isn't exactly a feature of PDO, or for that matter whatever database driver you are using... it's more a matter of your schema.
Typically, a commonly employed field in any database structure is a "timestamp" or "created" field that holds the time the record was created in the database. These fields can be MySQL datatype TIMESTAMP (in which case the driver will return seconds since the Unix Epoch), or DATETIME (in which case most drivers will attempt to return the language's native DateTime object if one exists.) Even though monotonically-increasing primary keys imply a certain amount of chronological order when sorted, a timestamp field can record the exact time a record was created at the server, as well as update on change using ON UPDATE CURRENT_TIMESTAMP. So I would suggest adding this to your schema.
With such a field in your database, you can always sort your queries using:
ORDER BY timestamp_field_name ASC
Also, if by "distribute" you mean some data will be publicly accessible by using this key as a query param of some sort, I wouldn't use the monotonic primary key for the exact reason you described, especially if this is a "licensing" proof of concept, which, if you mean a DRM-type thing, should probably produce a complex string. Hashed timestamps or the PHP uniqid function can produce values that can be stored in a VARCHAR database field with a UNIQUE key constraint. This is if I have understood your described goal.