I am creating a system to generate unique keys. It works for now, but I haven't tested it with many users. A user clicks a button and gets their unique number, as simple as that.
But how do I prevent multiple users from getting the same unique key if they press the button at exactly the same time (even at the millisecond scale)? The button is on the client side, so I must handle this in the back end.
This is what the unique key looks like:
19/XXXXXX-ABC/XYZ
XXXXXX is an auto-increment number from 000001 to 999999. I have this code, but I don't know if it's reliable enough to handle this issue:
$autoinc = $this->MPenomoran->get_surat($f_nomor)->jumlah_no + 1; // count rows in the table and add 1
$no_1 = date('y') + 2;                           // two-digit year plus 2
$no_2 = str_pad($autoinc, 6, '0', STR_PAD_LEFT); // zero-padded counter, e.g. 000001
$no_3 = "-ABC/XYZ";                              // fixed suffix
$nomor = $no_1 . "/" . $no_2 . $no_3;            // e.g. 19/000001-ABC/XYZ
$success = array('nomor' => $nomor);             // send the unique key to the user's view
It seems you haven't told us which platform this runs on, or what that platform's limitations are.
The first thing that jumps out is that your format limits you to 999999 unique keys per year. That is very odd, but presumably you understand that limit and will need some code to deal with hitting the maximum.
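A minimal sketch of the formatting step with an explicit guard for that ceiling. The helper name make_key and the choice to throw a RangeException on overflow are assumptions, not part of the original code.

```php
<?php
/**
 * Hypothetical helper: build a key like "19/000001-ABC/XYZ" from a year prefix
 * and a counter, refusing counters that no longer fit in six digits.
 */
function make_key(int $yearPrefix, int $counter, string $suffix = '-ABC/XYZ'): string
{
    if ($counter < 1 || $counter > 999999) {
        throw new RangeException("Counter $counter is outside 1..999999");
    }
    // %02d = two-digit year prefix, %06d = zero-padded six-digit counter
    return sprintf('%02d/%06d%s', $yearPrefix, $counter, $suffix);
}

echo make_key(19, 42), PHP_EOL; // prints 19/000042-ABC/XYZ
```

What to do on overflow (reject, roll over to a new prefix, widen the field) is a business decision; throwing just makes the limit impossible to miss.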
Approaches
Redis based
This would be very simple with a Redis server using the INCR command. Since INCR is atomic, you essentially have a solution just by using INCR on a key named for your year + 2. If the key does not exist yet, INCR treats it as 0, so the first call returns 1.
You would need a PHP Redis client; there are several, each with strengths and weaknesses that I'm not going to go into.
Redis is also great for caching, so if at all possible that is the first thing I would look into.
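A minimal sketch, assuming the phpredis extension (one of the clients mentioned above). The key naming scheme seq:<prefix> and the helper name are hypothetical; the $redis parameter is left untyped so the function can also be exercised with a stand-in object.

```php
<?php
/**
 * Sketch assuming ext-redis: Redis::incr() runs atomically on the server, so two
 * requests arriving in the same millisecond still receive different numbers.
 */
function next_key_redis($redis, string $prefix): string
{
    $n = $redis->incr("seq:$prefix"); // missing key counts as 0, so first value is 1
    if ($n > 999999) {
        throw new RangeException("Sequence for $prefix is exhausted");
    }
    return sprintf('%s/%06d-ABC/XYZ', $prefix, $n);
}

// Usage against a real server (requires ext-redis):
// $redis = new Redis();
// $redis->connect('127.0.0.1', 6379);
// echo next_key_redis($redis, '19');
```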
MySQL Based
There are a few different solutions using MySQL. They are involved, so I'll just outline them rather than write a novel.
Note: you will need to translate these into the appropriate PHP code (mysqli or PDO), passing parameters, starting transactions, etc., where noted.
MySQL - create your own sequence generator
Create a table named "Sequence" with this basic structure:
CREATE TABLE Sequence (
  name    VARCHAR(2) NOT NULL PRIMARY KEY,
  nextval INT UNSIGNED NOT NULL DEFAULT 1
) ENGINE=InnoDB;
The underlying query would be something like this:
START TRANSACTION;
SELECT nextval FROM Sequence WHERE name = ? FOR UPDATE;  -- ? is the year prefix ($no_1)
UPDATE Sequence SET nextval = nextval + 1 WHERE name = ?;
COMMIT;
This emulates a serialized Oracle-style sequence. It is safe from a concurrency standpoint because InnoDB locks the row from the SELECT ... FOR UPDATE until the COMMIT, so a second caller waits and then reads the already incremented value.
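A sketch of that transaction with PDO, assuming one pre-seeded row per prefix in the Sequence table. FOR UPDATE is only appended when the driver is MySQL, because the quick local test runs on SQLite, which does not support it; the function name is hypothetical.

```php
<?php
/**
 * Returns the current nextval for $name and increments it, inside one transaction.
 * On MySQL/InnoDB the FOR UPDATE row lock serializes concurrent callers.
 */
function next_sequence_value(PDO $pdo, string $name): int
{
    $lock = $pdo->getAttribute(PDO::ATTR_DRIVER_NAME) === 'mysql' ? ' FOR UPDATE' : '';
    $pdo->beginTransaction();
    try {
        $select = $pdo->prepare('SELECT nextval FROM Sequence WHERE name = ?' . $lock);
        $select->execute([$name]);
        $value = $select->fetchColumn();
        $select->closeCursor(); // release the result set before issuing the UPDATE
        if ($value === false) {
            throw new RuntimeException("No sequence row for '$name'");
        }
        $update = $pdo->prepare('UPDATE Sequence SET nextval = nextval + 1 WHERE name = ?');
        $update->execute([$name]);
        $pdo->commit(); // releases the row lock
        return (int) $value;
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```

The connection should be in PDO::ERRMODE_EXCEPTION (the default since PHP 8) so that a failed statement ends up in the rollback branch.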
MySQL - Autoincrement on multi-value PK
This comes with some caveats.
It is generally incompatible with replication.
The underlying table must be MyISAM.
CREATE TABLE Sequence (
  name    VARCHAR(2) NOT NULL,
  lastval INT UNSIGNED NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (name, lastval)
) ENGINE=MyISAM;
Your underlying query would be:
INSERT INTO Sequence (name) VALUES (?)  -- ? is the year prefix ($no_1)
This depends on MySQL supporting a multi-column key where the second column is an AUTO_INCREMENT. It behaves like a separate sequence for each unique name.
You would then use the relevant API's built-in way of getting MySQL's LAST_INSERT_ID(); with PDO, for example, that is PDO::lastInsertId().
Other alternatives
You could also use semaphores, files with locking, and all sorts of other ideas to create a sequence generator that would work well in a monolithic (one server for everything) environment. MySQL and Redis would serve a cluster, so those are more robust options from that standpoint.
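For the "files with locking" idea, a minimal sketch using flock(). It only works when every request runs on the same machine and uses the same file; the function name and file location are hypothetical.

```php
<?php
/**
 * File-backed counter guarded by an exclusive flock(). Each call blocks until it
 * holds the lock, reads the last value, writes value + 1, and returns it.
 */
function next_from_file(string $path): int
{
    $fh = fopen($path, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        if (!flock($fh, LOCK_EX)) { // wait for exclusive access
            throw new RuntimeException("Cannot lock $path");
        }
        $next = (int) stream_get_contents($fh) + 1; // an empty file reads as 0
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) $next);
        fflush($fh); // make sure the new value is written before unlocking
        flock($fh, LOCK_UN);
        return $next;
    } finally {
        fclose($fh);
    }
}
```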
The important thing is that whatever you do, you test it with a load tester such as siege or Boom that generates many concurrent requests at your web level.
Related
I have a table with an id (auto-increment primary key), a uid (a key referring to a user's id, for example) and other columns that don't matter for this question.
I want to have, let's call it, a separate auto-increment key on id for each uid value.
So, I add an entry with uid 10, and the id field for this entry gets 1 because there were no previous entries with uid 10. I add a new one with uid 4 and its id will be 3 because there were already two entries with uid 4.
...A very obvious explanation, but I am trying to be as clear as I can in demonstrating the idea.
What SQL engine can provide such a functionality natively? (non Microsoft/Oracle based)
If there is none, how could I best replicate it? Triggers perhaps?
Does this functionality have a more suitable name?
If you know of a non-SQL database engine that provides such functionality, name it anyway; I am curious.
Thanks.
MySQL's MyISAM engine can do this. See their manual, in section Using AUTO_INCREMENT:
For MyISAM tables you can specify AUTO_INCREMENT on a secondary column in a multiple-column index. In this case, the generated value for the AUTO_INCREMENT column is calculated as MAX(auto_increment_column) + 1 WHERE prefix=given-prefix. This is useful when you want to put data into ordered groups.
The docs go on after that paragraph, showing an example.
The InnoDB engine in MySQL does not support this feature, which is unfortunate because it's better to use InnoDB in almost all cases.
You can't emulate this behavior using triggers (or any SQL statements limited to transaction scope) without locking tables on INSERT. Consider this sequence of actions:
Mario starts transaction and inserts a new row for user 4.
Bill starts transaction and inserts a new row for user 4.
Mario's session fires a trigger that computes MAX(id)+1 for user 4 and gets 3.
Bill's session fires a trigger that computes MAX(id)+1 for user 4 and also gets 3.
Bill's session finishes his INSERT and commits.
Mario's session tries to finish his INSERT, but the row with (userid=4, id=3) now exists, so Mario gets a primary key conflict.
In general, you can't control the order of execution of these steps without some kind of synchronization.
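To make the interleaving concrete, here is a deterministic replay in plain PHP, with an array standing in for the committed rows of the table. It is only a simulation of the steps listed above, not database code.

```php
<?php
/**
 * Each "session" computes MAX(id)+1 from the rows it can see. Neither session
 * sees the other's uncommitted row, so both pick the same id.
 */
function next_id_for(array $committed, int $userId): int
{
    $ids = array_column(array_filter($committed, fn($r) => $r['userid'] === $userId), 'id');
    return $ids ? max($ids) + 1 : 1;
}

$table = [['userid' => 4, 'id' => 1], ['userid' => 4, 'id' => 2]];

$mario = next_id_for($table, 4); // Mario's trigger computes 3
$bill  = next_id_for($table, 4); // Bill's trigger also computes 3

$table[] = ['userid' => 4, 'id' => $bill]; // Bill commits first

$conflict = in_array(['userid' => 4, 'id' => $mario], $table, true);
echo $conflict ? "Mario hits a duplicate key on id $mario\n" : "no conflict\n";
// prints: Mario hits a duplicate key on id 3
```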
The solutions to this are either:
Get an exclusive table lock. Before trying an INSERT, lock the table. This is necessary to prevent concurrent INSERTs from creating a race condition like the one in the example above. You have to lock the whole table: since you're trying to restrict INSERT, there's no specific row to lock (if you were governing access to a given row with UPDATE, you could lock just that row). But locking the table makes access to it serial, which limits your throughput.
Do it outside transaction scope. Generate the id number in a way that won't be hidden from two concurrent transactions. By the way, this is what AUTO_INCREMENT does. Two concurrent sessions will each get a unique id value, regardless of their order of execution or order of commit. But tracking the last generated id per userid requires access to the database, or a duplicate data store. For example, a memcached key per userid, which can be incremented atomically.
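A sketch of the memcached variant, assuming the php-memcached extension. Memcached::add() only stores a key that does not exist yet, and Memcached::increment() is atomic on the server. The key format and the $seed parameter (the current MAX(id) for that userid, read from the database by the caller) are hypothetical.

```php
<?php
/**
 * Per-userid counter in memcached. add() is a no-op if another request already
 * created the key, so concurrent callers all share one counter.
 */
function next_id_for_user($memcached, int $userId, int $seed = 0): int
{
    $key = "uid-seq:$userId";
    $memcached->add($key, $seed); // only the first caller's seed is stored
    $next = $memcached->increment($key);
    if ($next === false) {
        throw new RuntimeException("increment failed for $key");
    }
    return $next;
}
```

Memcached can evict keys or lose them on restart, which is why the sketch re-seeds from the database instead of assuming the counter is always there.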
It's relatively easy to ensure that inserts get unique values. But it's hard to ensure they will get consecutive ordinal values. Also consider:
What happens if you INSERT in a transaction but then roll back? You've allocated id value 3 in that transaction, and then I allocated value 4, so if you roll back and I commit, now there's a gap.
What happens if an INSERT fails because of other constraints on the table (e.g. another column is NOT NULL)? You could get gaps this way too.
If you ever DELETE a row, do you need to renumber all the following rows for the same userid? What does that do to your memcached entries if you use that solution?
SQL Server should allow you to do this. If you can't implement this using a computed column (probably not - there are some restrictions), surely you can implement it in a trigger.
MySQL also would allow you to implement this via triggers.
In a comment you ask about efficiency. Unless you are dealing with extreme volumes, storing an 8-byte DATETIME isn't much overhead compared to, for example, a 4-byte INT.
It also greatly simplifies your inserts, and it copes with deleted records without creating 'holes' in your sequence.
If you DO need this, be careful with the field names. If you have uid and id in a table, I'd expect id to be unique in that table, and uid to refer to something else. Perhaps, instead, use the field names property_id and amendment_id.
In terms of implementation, there are generally two options.
1). A trigger
Implementations vary, but the logic remains the same. As you don't specify an RDBMS (other than not MS/Oracle), the general logic is simple...
Start a transaction (inside triggers one is often already started implicitly)
Find the MAX(amendment_id) for the property_id being inserted
Update the newly inserted value with MAX(amendment_id) + 1
Commit the transaction
Things to be aware of are...
- multiple records being inserted at the same time
- records being inserted with amendment_id being already populated
- updates altering existing records
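A sketch of the trigger logic on SQLite, chosen only because it runs in-process from PHP. Table, column, and trigger names are hypothetical. SQLite allows a single writer at a time, so MAX()+1 cannot race there; on MySQL a trigger cannot UPDATE the table it is defined on, so you would use a BEFORE INSERT trigger that sets NEW.amendment_id instead, and the lookup can still race between concurrent transactions as described earlier.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec('CREATE TABLE amendments (
    property_id  INTEGER NOT NULL,
    amendment_id INTEGER,
    note         TEXT,
    UNIQUE (property_id, amendment_id)
)');

// Fill in amendment_id only when the INSERT did not supply one.
$pdo->exec('CREATE TRIGGER number_amendment
    AFTER INSERT ON amendments
    WHEN NEW.amendment_id IS NULL
    BEGIN
        UPDATE amendments
           SET amendment_id = (SELECT COALESCE(MAX(amendment_id), 0) + 1
                                 FROM amendments
                                WHERE property_id = NEW.property_id)
         WHERE rowid = NEW.rowid;
    END');

$insert = $pdo->prepare('INSERT INTO amendments (property_id, note) VALUES (?, ?)');
foreach ([[10, 'a'], [4, 'b'], [4, 'c'], [4, 'd']] as $row) {
    $insert->execute($row);
}
$rows = $pdo->query('SELECT property_id, amendment_id FROM amendments ORDER BY rowid')
            ->fetchAll(PDO::FETCH_NUM);
```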
2). A Stored Procedure
If you use a stored procedure to control writes to the table, you gain a lot more control.
Implicitly, you know you're only dealing with one record.
You simply don't provide a parameter for DEFAULT fields.
You know what updates / deletes can and can't happen.
You can implement all the business logic you like without hidden triggers
I personally recommend the Stored Procedure route, but triggers do work.
It is important to get your data types right.
What you are describing is a multi-part key, so use a multi-part key. Don't try to encode everything into a magic integer; you will poison the rest of your code.
If a record is identified by (entity_id,version_number) then embrace that description and use it directly instead of mangling the meaning of your keys. You will have to write queries which constrain the version number but that's OK. Databases are good at this sort of thing.
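A minimal sketch of that design, run on SQLite so it works in-process. The table name documents and its columns are hypothetical, and version_number is a plain integer here, though a timestamp works the same way.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// The composite primary key rejects a second row with the same (entity, version).
$pdo->exec('CREATE TABLE documents (
    entity_id      INTEGER NOT NULL,
    version_number INTEGER NOT NULL,
    body           TEXT,
    PRIMARY KEY (entity_id, version_number)
)');

$insert = $pdo->prepare('INSERT INTO documents VALUES (?, ?, ?)');
foreach ([[1, 1, 'v1'], [1, 2, 'v2'], [2, 1, 'only']] as $row) {
    $insert->execute($row);
}

// Latest version of every entity: constrain version_number with a correlated subquery.
$latest = $pdo->query('SELECT d.entity_id, d.body
                         FROM documents d
                        WHERE d.version_number = (SELECT MAX(version_number)
                                                    FROM documents
                                                   WHERE entity_id = d.entity_id)
                        ORDER BY d.entity_id')
              ->fetchAll(PDO::FETCH_KEY_PAIR);
```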
version_number could be a timestamp, as a_horse_with_no_name suggests. This is quite a good idea. There is no meaningful performance disadvantage to using timestamps instead of plain integers. What you gain is meaning, which is more important.
You could maintain a "latest version" table which contains, for each entity_id, only the record with the most-recent version_number. This will be more work for you, so only do it if you really need the performance.
I have a table in which the primary key is a 20 character VARCHAR field that gets generated in PHP before getting inserted into the table. The key generation logic uses grouping and sequencing mechanism as given below.
SELECT
    SUBSTR(prod_code, 15) AS prod_num
FROM
    items
    JOIN products ON items.prod_id = products.prod_id
WHERE
    items.cat_type = ?      -- $category
    AND items.sub_grp = ?   -- $sub_grp
ORDER BY
    prod_num DESC LIMIT 1
The prod_num obtained this way is incremented in PHP and prefixed with a product code to create a unique primary key. However, multiple users can run the transaction concurrently for the same category and sub-group, so they can be given the same key. Since it's a unique primary key, this leads to a duplicate key error. What is the best way to handle such a situation?
Don't use "Smart IDs".
Smart IDs were all the rage in the 1980s, and went out of fashion for several reasons:
The only requirement of a PK is that it has to be unique. A PK doesn't need to have a format, or to be sexy or good-looking. Its specific sequence, case, or composition is not relevant and is actually counter-productive.
They are not relational. Parts of the ID could establish a relationship with other tables and that can cause a lot of issues. This goes against Normal Forms defined in database design.
Now, if you still need a Smart ID, create a secondary column (which can also be unique) and populate it after the row is created. If you are facing thread-safety issues, you can run a single deferred process that assigns nice-looking values after a few minutes. Alternatively, you can implement a queue, which can resolve this in seconds.
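A minimal sketch of "create the row first, then fill the secondary column". The database generates the surrogate primary key, so concurrent inserts never collide, and the smart ID is derived from it afterwards. The schema, the function name, and the P-<category>-<id> format are hypothetical (not the asker's tables), and SQLite is used only so the example runs in-process.

```php
<?php
function insert_product(PDO $pdo, string $category): string
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT INTO products (category) VALUES (?)')->execute([$category]);
        $id = (int) $pdo->lastInsertId(); // generated by the database, unique per row
        $smartId = sprintf('P-%s-%06d', $category, $id);
        $pdo->prepare('UPDATE products SET smart_id = ? WHERE id = ?')->execute([$smartId, $id]);
        $pdo->commit();
        return $smartId;
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    category TEXT NOT NULL,
    smart_id TEXT UNIQUE
)');
```

If the smart ID must follow a per-category sequence instead, that numbering is what the deferred process or queue mentioned above would assign.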
Agree with "The Impaler".
But if you decide to proceed that way, you could handle your concurrency issue with a retry mechanism.
This is similar to how deadlocks are typically handled.
If the insertion fails because of violation of the unique primary key, just try again in PHP with a new key.
Your framework might have retry functions already. Otherwise it's easy to implement yourself.
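A minimal retry sketch. $generateKey and $insert are caller-supplied callables (hypothetical names). Only MySQL's duplicate-key error (driver error code 1062, reported in PDOException::$errorInfo[1]) triggers a retry; anything else is rethrown immediately.

```php
<?php
function insert_with_retry(callable $generateKey, callable $insert, int $maxAttempts = 5): string
{
    for ($attempt = 1; ; $attempt++) {
        $key = $generateKey(); // e.g. re-read the last prod_num and add 1
        try {
            $insert($key);
            return $key;
        } catch (PDOException $e) {
            $isDuplicate = ($e->errorInfo[1] ?? null) === 1062;
            if (!$isDuplicate || $attempt >= $maxAttempts) {
                throw $e;
            }
            usleep(random_int(1000, 10000)); // short random back-off before retrying
        }
    }
}
```

The random back-off makes it less likely that the same two requests collide again on the next attempt.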
So in this app, we have a user id which is simple auto-increment primary key. Since we do not want to expose this at the client side, we are going to use a simple hash (encryption is not important, only obfuscation).
So when a user is added to the table we compute uniqid() . user_id. This guarantees that the user hash is random enough and always unique.
The problem is that while inserting the record we do not know the user id yet (we cannot assume max(user_id) + 1, since other inserts might be committed in the meantime). So we are doing an insert, then getting the last_insert_id, then using that for the user_id, which adds an additional db query. Is there a better way to do this?
A few things before the actual answer: with recent versions of MySQL, which use InnoDB as the default storage engine, you always want an integer PK (the famous auto_increment). The reasons are mostly performance; for more information, look into how InnoDB clusters records by PK and why that matters so much. With that out of the way, let's consider our options for creating a unique surrogate key.
Option 1
You calculate it yourself, using PHP and information you obtained back from MySQL (the last_insert_id()), then you update the database back.
Pros: easy to understand by even novice programmers, produces short surrogate key.
Cons: extremely bad for concurrent access, you'll probably get clashes, and you never want to use PHP to calculate unique indices required by the database.
You don't want that option.
Option 2
Supply the uniqid() to your query, create an AFTER INSERT trigger that will concatenate uniqid() with the auto_increment.
Pros: easy to understand, produces short surrogate key.
Cons: requires you to create the trigger, and implements magic that isn't visible from the code, which will definitely confuse a developer who inherits the project at some point; from experience, I would bet that bad things will happen.
Option 3
Use universally unique identifiers or UUIDs (also known as GUIDs). Simply supply your query with surrogate_key = UUID() and MySQL does the rest.
Pros: always unique, no magic required, easy to understand.
Cons: none, unless the fact that it occupies 36 chars bothers you.
You want the option 3.
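If you would rather generate the value in PHP, so it is known before the INSERT, a random (version 4) UUID can be built from random_bytes(). This is a sketch; the function name is ours.

```php
<?php
/**
 * RFC 4122 version 4 UUID in the usual 36-character textual form.
 */
function uuid_v4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}
```

If the 36 characters do bother you, MySQL 8.0's UUID_TO_BIN() lets you store the value in a BINARY(16) column instead.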
Since we do not want to expose this at the client side
Simply don't.
In a well-designed database, users never need to see a primary-key value. In fact, a user need never know the primary key even exists.
From your question it seems you actually replace your normal auto-increment ID column with a surrogate ID (if not, skip to the last paragraph).
Try creating another column with a unique surrogate ID and use that on your front end. You can keep your normal primary IDs for relationships, etc.
Remember one of the basic rules for primary keys:
The primary key must be compact and contain the fewest possible attributes.
Integer serials also have the advantage of being simple to use and implement. Depending on how the database implements them, they are also quick to derive, as most databases just store the current serial number in a fixed location: instead of computing max(id)+1, the database already has the value stored, which makes auto-increment fast.
So we are doing an insert then getting the last_insert_id then using that for the user_id, which adds an additional db query.
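last_insert_id isn't actually a separate query: MySQL returns the generated id with the result of your INSERT, and client APIs such as PDO::lastInsertId() or mysqli's insert_id read it from your connection without another round trip.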
If you already have a second column for your surrogate ID, ignore all of the above:
So we are doing an insert then getting the last_insert_id then using that for the user_id, which adds an additional db query. So is there a better way to do this?
No, you can only retrieve that uniqid by doing a query.
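// Assumes a connected PDO in $pdo; the table name "users" is hypothetical.
$stmt = $pdo->query('SELECT surrogate_id FROM users WHERE user_id = LAST_INSERT_ID()');
$row = $stmt->fetch(PDO::FETCH_ASSOC);
$lastsurrogateid = $row['surrogate_id'];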
Anything else is making it more complicated than necessary.
So the situation is that I am going to have two or more "insert" machines where my web application just inserts the data we want to log (they are all behind a load balancer). Every couple of hours, one by one, the machines will be disconnected from the load balancer and upload their information into the "master" database machine, so the master should have a relatively up-to-date version of all the data we are collecting.
Originally I was going to use mysqldump, but found that you cannot tell it to skip the auto_increment id column I have (which would lead to primary key collisions). I saw another post recommending putting the data in a temporary table and then dropping the column, but the "insert" machines have very low specs, and the amount of data could be significant, on the order of 50,000 rows. Other than programmatically taking x rows at a time and inserting them into the remote "master" database, is there an easier way to do this? Currently I have PHP installed on the "insert" machines.
Thank you for your input.
Wouldn't you want the master database record to have the same primary key for each record as the slave database? If not, that could lead to problems where a query will produce different results based on which machine it's on.
If you want an arbitrary primary key that will avoid collisions, consider removing the auto-increment ID and constructing an ID that's guaranteed to be unique for every record on each server. For example, you could concatenate the unix time (with microseconds) with an identifier that's different for each server. A slightly lazier solution would be to concatenate time + a random 10-digit number or something. PHP's uniqid() function does something like this automatically.
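A minimal sketch of the "time plus server identifier" idea. The three-digit server number and the dash separator are assumptions; uniqid('', true) supplies the microsecond-based part plus extra entropy.

```php
<?php
/**
 * Hypothetical ID scheme: "<server>-<uniqid>". Different servers can never
 * produce the same prefix; uniqid() keeps IDs on one server distinct.
 * uniqid() is not cryptographically random, only time-based.
 */
function make_record_id(int $serverId): string
{
    return sprintf('%03d-%s', $serverId, uniqid('', true));
}
```

Each insert machine would be configured with its own $serverId, so rows from different machines can be merged into the master without primary key collisions.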
If you don't intend to ever use the ID, then just remove it from your tables. There's no rule saying that every table has to have a primary key. If you don't use it, but you want to encode information about when each record was inserted, add a timestamp column instead (and don't make it a key).
Problem: When I use an auto-incrementing primary key in my database, this happens all the time:
I want to store an Order with 10 Items. The ordered Items belong to the Order. So I store the order, ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?), and then store the 10 Items with the foreign key (order_id).
So I always have to do:
INSERT ...
last_inserted_id = db.lastInsertId();
INSERT ...
INSERT ...
INSERT ...
and I believe this prevents me from using transactions in almost all INSERT cases where I need a foreign key.
So here are some solutions, though I don't know if they're really good:
A) Don't use auto_increment keys! Use a key table?
Key Table would have two fields: table_name, next_key. Every time I need a key to insert a new row into a table, I first ask for the next_key through a special static KeyGenerator class method. This does a SELECT and an UPDATE, if possible in one transaction (would that work?). Of course I would request that for every affected table. Next, I can INSERT my entire object graph in one transaction without playing ping-pong with the database, because I already know the keys in advance.
B) Using GUID / UUID algorithm for keys?
These are supposed to be unique worldwide, and they're LARGE. I mean ... L_A_R_G_E. So a lot of memory would go into these gigantic keys. Indexing will be hard, right? And data retrieval will be a pain for the database (at least I guess so); integer keys are much faster to handle. On the other hand, these also provide some security: visitors can't iterate over all orders, all users, or all pictures anymore just by incrementing the id parameter.
C) Stick with auto_incremented keys?
OK, but then what about transactions like the one described in the example above? How can I solve that? Maybe by inserting a ghost row first and then doing a transaction with one UPDATE + n INSERTs?
D) What else?
When storing orders, you need transactions to prevent situations where only half your products are added to the database.
Depending on your database and your connector, the value returned by the last-insert-id function might be transaction-independent. For instance, with MySQL, mysql_insert_id returns the identifier for the last query from that particular client (without being affected by what other clients are doing concurrently).
Which database are you using?
Yes, typically inserting a record and then trying to select it again to find the auto-generated key is bad, especially if you are using a naive select max(id) from table query. This is because as soon as two threads are creating records max(id) may not actually return the last id your current thread used.
One way to avoid this is to create a sequence in the database. From your code you select sequence.NextValue then use that value to then execute your inserts (or you can craft a more complex SQL statement that does this selection and the inserts in one go). Sequences are atomic / thread-safe.
In MySQL you can ask for the last inserted id from the execution results which I believe will always give you the correct answer.
SQL Server supports SCOPE_IDENTITY (Transact-SQL), which should take care of both your transaction and concurrency issues.
I would say stick with auto_increment.
(Assuming you are using MySQL)
"ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?)"
If you use MySQL's last_insert_id() function, you only see what happened in your session, so this is safe. You mention this:
db.lastInsertId()
I don't know what framework or language it is, but I would assume it uses MySQL's last_insert_id() under the covers (if not, it is a pretty useless database abstraction framework).
"I believe this prevents me from using transactions in almost all INSERT cases where I need a foreign key"
I don't see why. Please explain.
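To illustrate, a sketch that stores an order and its items in one transaction, using the driver's last insert id for the foreign key. The value is tracked per connection, so other clients inserting at the same time do not affect it. Table and column names are hypothetical, and SQLite stands in for MySQL so the example runs in-process.

```php
<?php
function store_order(PDO $pdo, string $customer, array $items): int
{
    $pdo->beginTransaction();
    try {
        $pdo->prepare('INSERT INTO orders (customer) VALUES (?)')->execute([$customer]);
        $orderId = (int) $pdo->lastInsertId(); // id generated on THIS connection
        $stmt = $pdo->prepare('INSERT INTO order_items (order_id, product) VALUES (?, ?)');
        foreach ($items as $product) {
            $stmt->execute([$orderId, $product]);
        }
        $pdo->commit(); // the order and all its items become visible together
        return $orderId;
    } catch (Throwable $e) {
        $pdo->rollBack(); // a failing item removes the order as well
        throw $e;
    }
}

$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE orders (id INTEGER PRIMARY KEY AUTOINCREMENT, customer TEXT NOT NULL)');
$pdo->exec('CREATE TABLE order_items (
    id       INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id INTEGER NOT NULL REFERENCES orders(id),
    product  TEXT NOT NULL
)');
```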
D) Sequence: may not be available in your DBMS, but if it is, it solves your problem elegantly.
For Postgresql, have a look at Sequence Functions
There is no final and general answer to this question.
Auto-incrementing columns are easy to use when you add new records. Using them as foreign keys within the same transaction is not so straightforward: you need database-specific commands to get the newly created key. This technique is common in certain databases, for instance SQL Server.
Sequences seem harder to use, because you need to get a key before you insert a row, but in the end they are easier to use as foreign keys. This technique is common in certain databases, for instance Oracle.
When you use Hibernate or NHibernate, it is discouraged to use auto incrementing keys, because some optimizations are not possible anymore. Using a hi-lo algorithm which uses an additional table is recommended.
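A minimal sketch of the hi/lo idea in PHP (not Hibernate's actual implementation). One round trip to a shared, atomic counter reserves a block of ids (the "hi" value), and ids inside the block are handed out locally. The class name and the $nextHi callable are hypothetical; in practice $nextHi could be backed by a sequence table like the one discussed earlier.

```php
<?php
final class HiLoGenerator
{
    /** @var callable */
    private $nextHi;  // atomic source of increasing integers, shared by all processes
    private int $maxLo;
    private int $hi = 0;
    private int $lo;

    public function __construct(callable $nextHi, int $maxLo = 100)
    {
        $this->nextHi = $nextHi;
        $this->maxLo = $maxLo;
        $this->lo = $maxLo; // forces a fetch from the shared counter on first use
    }

    public function next(): int
    {
        if ($this->lo >= $this->maxLo) {
            $this->hi = ($this->nextHi)(); // one round trip per block of $maxLo ids
            $this->lo = 0;
        }
        return $this->hi * $this->maxLo + $this->lo++;
    }
}
```

Because a typical PHP request is a short-lived process, most of each reserved block may go unused; hi/lo pays off mainly in long-running workers.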
GUIDs are strong, for instance when sharing data between different databases, systems, disconnected scenarios, import / export etc. In many databases, most of the tables contain only a few hundred records, so memory and performance are not such an issue. When using NHibernate, you get a GUID generator which produces sequential GUIDs, because some databases perform better when keys are sequential.