MySQL insert unique technique - php

I have a PHP application that inserts data into MySQL, and each row contains a randomly generated unique value. The string will have about 1 billion possibilities, with probably no more than 1 or 2 million entries at any one time. Essentially, most combinations will not exist in the database.
I'm trying to find the least expensive approach to ensuring a unique value on insert. Specifically, my two options are:
1. Have a function that generates this unique ID. On each generation, test whether the value exists in the database; if it does, regenerate, and if not, return the value.
2. Generate a random string and attempt the insert. If the insert fails, check whether the error is 1062 (MySQL duplicate entry X for key Y), then regenerate the key and insert the new value.
Is it a bad idea to rely upon the MySQL error for re-trying the insert? As I see it, the value will probably be unique, so the initial existence check (technique 1) seems unnecessary.
EDIT #1
I should have also mentioned that the value must be a 6-character string composed of uppercase letters and/or numbers. The values can't be incremental either; they should be random.
EDIT #2
As a side note, I'm trying to create a redemption code for a gift certificate that is difficult to guess. Using numbers and letters creates 36 possibilities for each character, instead of 10 for just numbers or 26 for just letters.
Here's a stripped-down version of the solution I created. The first value entered in the table is the primary key, which is auto incremented. affected_rows() will equal 1 if the insert is successful:
// $db is assumed to be a mysqli connection; build_code() returns a new random code.
do {
    $code = build_code();
    $db->query("INSERT INTO certificates VALUES (NULL, '$code') ON DUPLICATE KEY UPDATE pk = pk");
} while ($db->affected_rows == 0);
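The question does not show the code generator itself. A minimal sketch of what it might look like, assuming a hypothetical function named build_code() and the 6-character uppercase/digit format described in EDIT #1. Since the codes are meant to be hard to guess, the sketch uses random_int() rather than mt_rand():

```php
<?php
// Hypothetical helper: returns a random code of $length characters drawn
// from uppercase letters and digits (36 symbols per position).
function build_code(int $length = 6): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        // random_int() draws from a cryptographically secure source.
        $code .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $code;
}

echo build_code(), "\n"; // a 6-character code; the value differs on every run
```

The function only generates candidates. Uniqueness still has to be enforced by a UNIQUE index on the code column.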

Is it a bad idea to rely upon the MySQL error for re-trying the insert?
Nope. Go ahead and use it if you want. In fact, many people think that if you check first and the value doesn't exist, then it's safe to insert. But unless you lock the table, it's always possible that another process might slip in and grab the ID between the check and the insert.
So go ahead and generate a random ID if it suits your purpose. Just make sure you test your code so it properly handles dups. It might also be useful to log dups, just to confirm that your assumptions about how unlikely dups are to occur are correct.
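Here is a rough sketch of the insert-and-retry idea with a cap on attempts and a duplicate count you could log. The function name insert_with_retry() and both callbacks are made up for illustration. In real code the second callback would run the INSERT and return false when MySQL reports error 1062; here an array stands in for the unique index so the sketch runs without a database:

```php
<?php
// Sketch: retry an insert until it succeeds or $maxAttempts is reached.
// $generate returns a candidate code; $tryInsert returns true on success and
// false when the database reports a duplicate key (MySQL error 1062).
function insert_with_retry(callable $generate, callable $tryInsert, int $maxAttempts = 5): array
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $code = $generate();
        if ($tryInsert($code)) {
            // Report how many duplicates were hit so they can be logged.
            return ['code' => $code, 'duplicates' => $attempt - 1];
        }
    }
    throw new RuntimeException("No unique code found after $maxAttempts attempts");
}

// Stand-in for the database: an array keyed by code plays the unique index.
$taken = ['AAAAAA' => true];
$candidates = ['AAAAAA', 'B7K2QX'];
$result = insert_with_retry(
    function () use (&$candidates) {
        return array_shift($candidates);
    },
    function (string $code) use (&$taken) {
        if (isset($taken[$code])) {
            return false; // simulated duplicate-key failure
        }
        $taken[$code] = true;
        return true;
    }
);
echo "inserted {$result['code']} after {$result['duplicates']} duplicate(s)\n";
```

Capping the attempts means a bug in the generator (or a nearly full code space) shows up as an exception instead of an endless loop.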

Define your table with unique constraint:
http://dev.mysql.com/doc/refman/5.0/en/constraint-primary-key.html

Why not just use: "YourColName BIGINT AUTO_INCREMENT PRIMARY KEY" to ensure uniqueness?

Related

How to ensure uniqueness of a value in a MySQL DB table with PHP – a Unique Key use case?

I have a website connected to a database. In one of its tables, one entity attribute that is not the primary key needs to be unique in that table.
Currently, I am querying the database before inserting a value into that column to check, if the value already exists. If it does, the value gets altered by my script and the same procedure starts again until no result gets back, which means it doesn't exist yet in the database.
While this works, I feel it's a great performance hog: even when the value is unique, the database needs to be queried at least twice, once for checking and once for writing.
To improve performance and to make my (possibly buggy/unnecessary) code obsolete, I have the idea to mark the column as a Unique Key and to use a try/catch block for the writing/error handling process. That way, the database engine handles the uniqueness, which seems a bit more reasonable than my query-write procedure.
Is this a good idea or are Unique Keys not made for this behavior? What is the typical use case of a Unique Key in a SQL database?
INSERT INTO table (uniquerow) VALUES(1) ON DUPLICATE KEY UPDATE uniquerow = 1;
With this statement, the row is inserted if the value is unique and updated if the key already exists.
A unique constraint guarantees that a value, or a combination of values across several columns, appears only once in the table, even though it is not the primary key.
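The try/catch approach from the question could look roughly like this with PDO. This is a sketch under assumptions: the table `items`, its column `slug`, and both function names are hypothetical, and the connection is assumed to use PDO::ERRMODE_EXCEPTION. MySQL reports a unique-key violation as driver error 1062, which PDO exposes as the second element of errorInfo:

```php
<?php
// True when the exception is MySQL's "duplicate entry" error.
// errorInfo is [SQLSTATE, driver error code, driver message].
function is_duplicate_key_error(PDOException $e): bool
{
    return isset($e->errorInfo[1]) && (int) $e->errorInfo[1] === 1062;
}

// Hypothetical table `items` with a UNIQUE index on `slug`.
// Assumes $pdo was created with PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION.
function insert_unique(PDO $pdo, string $slug): bool
{
    try {
        $stmt = $pdo->prepare('INSERT INTO items (slug) VALUES (?)');
        $stmt->execute([$slug]);
        return true;
    } catch (PDOException $e) {
        if (is_duplicate_key_error($e)) {
            return false; // value already taken; caller can alter it and retry
        }
        throw $e; // anything else is a real failure
    }
}
```

Only the duplicate-key error is swallowed. Other failures are rethrown so they are not mistaken for "value already exists".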

Mysql: locking table for read before the value is updated

In my database (MySQL) I have a table (MyISAM) containing a field called number. Each value of this field is either 0 or a positive number. The non-zero values must be unique. Finally, the value of the field is generated in my PHP code according to the value of another field (called isNew) in this table. The code follows.
$maxNumber = $db->selectField('select max(number)+1 m from confirmed where isNew = ?', array($isNew), 'm');
$db->query('update confirmed set number = ? where dataid = ?', array($maxNumber, $id));
The first line of code selects the maximum value of the number field and increments it. The second line updates the record by setting its freshly generated number.
This code is being used concurrently by hundreds of clients so I noticed that sometimes duplicates of the number field occur. As I understand this is happening when two clients read value of the number field almost simultaneously and this fact leads to the duplicate.
I have read about the SELECT ... FOR UPDATE statement but I'm not quite sure it is applicable in my case.
So the question is should I just append FOR UPDATE to my SELECT statement? Or create a stored procedure to do the job? Or maybe completely change the way the numbers are being generated?
This is definitely possible to do. MyISAM doesn't offer transaction locking so forget about stuff like FOR UPDATE. There's definitely room for a race condition between the two statements in your example code. The way you've implemented it, this one is like the talking burro. It's amazing it works at all, not that it works badly! :-)
I don't understand what you're doing with this SQL:
select max(number)+1 m from confirmed where isNew = ?
Are the values of number unique throughout the table, or only within sets where isNew has a certain value? Would it work if the values of number were unique throughout the table? That would be easier to create, debug, and maintain.
You need a multi-connection-safe way of getting a number.
You could try this SQL. It sets the new number in one statement. MySQL does not allow an UPDATE to select directly from its own target table (error 1093), so the subquery is wrapped in a derived table.
UPDATE confirmed
SET number = (SELECT m FROM (SELECT 1 + MAX(number) AS m FROM confirmed WHERE isNew = ?) AS t)
WHERE dataid = ?
This will perform badly. Without a compound index on (isNew, number), and without both those columns declared NOT NULL, it will perform very, very badly.
If you can use numbers that are unique throughout the table I suggest you create for yourself a sequence setup, which will return a unique number each time you use it. You need to use a series of consecutive SQL statements to do that. Here's how it goes.
First, when you create your tables create yourself a table to use called sequence (or whatever name you like). This is a one-column table.
CREATE TABLE sequence (
sequence_id INT NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`sequence_id`)
) AUTO_INCREMENT = 990000
This will make the sequence table start issuing numbers at 990,000.
Second, when you need a unique number in your application, do the following things.
INSERT INTO sequence () VALUES ();
DELETE FROM sequence WHERE sequence_id < LAST_INSERT_ID();
UPDATE confirmed
SET number = LAST_INSERT_ID()
WHERE dataid = ?
What's going on here? The MySQL function LAST_INSERT_ID() returns the value of the most recent autoincrement-generated ID number. Because you inserted a row into that sequence table, it gives you back that generated ID number. The DELETE FROM command keeps that table from snarfing up disk space; we don't care about old ID numbers.
LAST_INSERT_ID() is connection-safe. If software on different connections to your database uses it, they all get their own values.
If you need to know the last inserted ID number, you can issue this SQL:
SELECT LAST_INSERT_ID() AS sequence_id
and you'll get it returned.
If you were using Oracle or PostgreSQL, instead of MySQL, you'd find they provide SEQUENCE objects that basically do this.
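From PHP, the sequence steps above might be wrapped like this. It's only a sketch: the function names are hypothetical, it assumes the `sequence` and `confirmed` tables from this answer, and it assumes a PDO connection to MySQL with exceptions enabled. PDO's lastInsertId() plays the role of LAST_INSERT_ID() here:

```php
<?php
// Sketch of the sequence steps. Assumes the `sequence` table from the
// answer and a PDO connection to MySQL with exceptions enabled.
function next_sequence_value(PDO $pdo): int
{
    $pdo->exec('INSERT INTO sequence () VALUES ()');
    // lastInsertId() reports the AUTO_INCREMENT value generated on this connection.
    $id = (int) $pdo->lastInsertId();
    // Discard older rows so the table stays tiny.
    $stmt = $pdo->prepare('DELETE FROM sequence WHERE sequence_id < ?');
    $stmt->execute([$id]);
    return $id;
}

// Hypothetical wrapper that assigns the fresh number to one row of `confirmed`.
function assign_number(PDO $pdo, int $dataId): int
{
    $number = next_sequence_value($pdo);
    $stmt = $pdo->prepare('UPDATE confirmed SET number = ? WHERE dataid = ?');
    $stmt->execute([$number, $dataId]);
    return $number;
}
```

Fetching the ID into a PHP variable right after the INSERT keeps the later statements independent of whatever else runs on the connection.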
Here's the answer to another similar question.
Fastest way to generate 11,000,000 unique ids

mysql autoincrement id vs php generated id

Until now I've always stored records in a MySQL database by generating an ID (varchar 32 primary key) with PHP, using a function like this:
$id = substr( str_shuffle( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789' ), 0, 8 );
Until now I've always used utf8_bin (which is case-sensitive) in the MySQL DB, but now I'm using utf8_general_ci (case-insensitive).
I have a table in my DB to store statistics, and this table holds millions of records.
In this case, is it better to use 'id int unsigned auto_increment' as the primary key?
If yes, is it possible that the script crashes with a 'duplicate id' error if many users call it at the same time? And how can I avoid that?
Even though several people can access the site at once, MySQL hands out auto-increment values one at a time, so two concurrent inserts never receive the same value. If the insert query does not provide an ID, an auto-incremented ID is generated and the row is saved. An auto-incremented ID cannot be duplicated this way.
Additionally, your code generates a random string, not a unique string. There is a big difference between the two: it is quite possible to generate a random string that has already been generated earlier.
Auto-increment, on the other hand, is a steadily increasing sequential number, so there is no chance of a duplicate key. For that reason it is generally advised to use auto-increment for a primary key rather than generating your own.
To get the last generated MySQL ID you can use mysqli_insert_id() right after your insert query in PHP and use it in your code for subsequent interactions with MySQL with respect to the inserted row.
In my opinion, auto-increment in MySQL is better, because your PHP script could be run by more than one person at the same time.
So the generated ID may not be unique anymore.
And I am pretty sure that MySQL is programmed well enough to prevent duplicate IDs ;)
In fact, your current code has a bug: the same ID might be generated again. A MySQL-generated ID doesn't have this problem. Even if you have a reason to generate your own IDs, I would still use a MySQL auto-increment integer to link between tables because of better indexing (speed).
And if, for example, you want to hide the sequence from the user, keep the random ID in a separate column with a unique index. Do the ID generation and insert in a do-while loop, so that if you happen to generate the same ID a second time, you can retry.

Check duplicates SQL or php?

I have a field in my users database, a 6 digit number that is generated upon registration. I use mt_rand(100000, 999999) to generate the numbers.
Now to the question: to make sure no one gets the same number, I need to either make the field UNIQUE (which I think seems best) or write some PHP code. Maybe there's some other way I don't know. What's the best way to do this?
You can do this way using PHP.
First give a unique constraint to the field.
if (mysqli_errno($link) == 1062)
mysqli_query($link, "INSERT INTO ... " . mt_rand(100000, 999999));
So, if you insert a duplicate value, MySQL returns error code 1062 (duplicate entry), and you can resubmit the query with a new number. Here $link is the mysqli connection.
Why don't you
use a nested select to get the max(user_id), increase it and use that value for the new user?
create a table that holds just the current user-id and fetch, increase and use that value to create the new user?
use an AUTO_INCREMENT column?
Use an AUTO_INCREMENT column.
Performing a query to check whether a generated number already exists is a bad solution, and it gets worse as more users register: more numbers are in use, so you need to keep track of all generated numbers to always generate a valid one.
With an AUTO_INCREMENT column, this happens "automatically".

MySQL Unique hash insertion

So, imagine a mysql table with a few simple columns, an auto increment, and a hash (varchar, UNIQUE).
Is it possible to give MySQL a query that will add a row and generate a unique hash without multiple queries?
Currently, the only way I can think of to achieve this is with a while, which I worry would become more and more processor intensive the more entries were in the db.
Here's some pseudo-php, obviously untested, but gets the general idea across:
while(!query("INSERT INTO table (hash) VALUES ('".generate_hash()."');")){
//found conflict, try again.
}
In the above example, the hash column would be UNIQUE, so a query with a conflicting hash would fail. The problem is, say there are 500,000 entries in the db and I'm working off of a base36 hash generator with 4 characters. The likelihood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable.
So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert.
I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.
Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.
Thanks!
This is our approach for a similar project, where we wanted to generate unique coupon codes.
First, we used an AUTO_INCREMENT primary key. This ensures uniqueness and query speed.
Then, we created a base24 numbering system, using A,B,C, etc, without using O and I, because someone might have thought that they were 0 or 1.
Then we converted the auto-increment integer to our base24 number. For example, 0=A, 1=B, 28=BE, 1458965=EKNYF. We used base24, because long numbers in base10 have fewer letters in base24.
Then we created a separate column in our table, coupon_code. This was not indexed.
We took the base24 number and added 3 random digits, or the letters I and O (which were not used in our base24 alphabet), at random positions. For example, EKNYF could turn into 1EKONY6F or EK2NY3F9. This was our coupon code, and we inserted it into our coupon_code column. It's unique and random.
So, when the user uses code EK2NY3F9, all we have to do is remove the characters that are not part of our base24 alphabet (2, 3 and 9) and we get EKNYF, which we convert to 1458965. We just select the row with primary key 1458965 and then compare its coupon_code column with EK2NY3F9.
I hope this helps.
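For anyone wanting to try this, here is a minimal sketch of the scheme as described above. The function and constant names are my own, and the original implementation isn't shown, so treat this as one possible reading rather than the code we actually ran. It assumes non-negative IDs and does no input validation:

```php
<?php
// 24-letter alphabet: A-Z without I and O, as described above.
const BASE24 = 'ABCDEFGHJKLMNPQRSTUVWXYZ';
// Characters never produced by the encoder, used as random padding.
const NOISE = '0123456789IO';

function to_base24(int $n): string
{
    $out = '';
    do {
        $out = BASE24[$n % 24] . $out;
        $n = intdiv($n, 24);
    } while ($n > 0);
    return $out;
}

function from_base24(string $s): int
{
    $n = 0;
    foreach (str_split($s) as $ch) {
        $n = $n * 24 + strpos(BASE24, $ch);
    }
    return $n;
}

// Insert $noiseCount random padding characters at random positions.
function make_coupon(int $id, int $noiseCount = 3): string
{
    $code = to_base24($id);
    for ($i = 0; $i < $noiseCount; $i++) {
        $pos = random_int(0, strlen($code));
        $pad = NOISE[random_int(0, strlen(NOISE) - 1)];
        $code = substr($code, 0, $pos) . $pad . substr($code, $pos);
    }
    return $code;
}

// Strip the padding and recover the primary key.
function coupon_to_id(string $coupon): int
{
    return from_base24(preg_replace('/[0-9IO]/', '', $coupon));
}

echo to_base24(1458965), "\n";        // EKNYF
echo coupon_to_id('EK2NY3F9'), "\n";  // 1458965
```

Decoding only locates the row. The full code still has to match the stored coupon_code column, otherwise any padding would be accepted.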
If your heart is set on using base-36 4 character hashes (hashspace is only 1679616), you could probably pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash would be as simple as moving it from the "unused table" to the "used table" which is O(1).
If your table is conceivably 1/3 full you might want to consider expanding your hashspace since it will probably fill up in your lifetime. Once the space is full you will no longer be able to find unique hashes no matter what algorithm you use.
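To put numbers on the retry cost for the figures in the question (500,000 rows in a 4-character base-36 space), here is a quick sketch. It assumes every attempt draws an independent, uniformly random code and that the table does not change between attempts. The function names are just for illustration:

```php
<?php
// Chance that one random code collides with an existing row.
function collision_probability(int $used, int $space): float
{
    return $used / $space;
}

// Mean number of insert attempts until one succeeds (geometric distribution).
function expected_attempts(int $used, int $space): float
{
    return 1 / (1 - collision_probability($used, $space));
}

// Chance that the first $k attempts all collide.
function probability_of_k_failures(int $used, int $space, int $k): float
{
    return collision_probability($used, $space) ** $k;
}

$space = 36 ** 4; // 1,679,616 four-character base-36 codes
printf("p = %.3f\n", collision_probability(500000, $space));
printf("expected attempts = %.2f\n", expected_attempts(500000, $space));
printf("P(5 failures in a row) = %.4f\n", probability_of_k_failures(500000, $space, 5));
```

Under those assumptions a single attempt collides about 30% of the time, but the average insert needs only about 1.4 attempts, and five collisions in a row happen roughly 0.2% of the time. The retry loop is far cheaper than the 160,000 queries feared in the question, though it does degrade as the space fills up.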
What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?
Anyway, you should just use a bigger hash: find an MD5 function (if you're actually hashing something), or a UUID generator with more than 4 characters. And yes, you could use a while loop, but generate a big enough value that conflicts are incredibly unlikely.
As others have suggested, what's wrong with an autoinc field? If you want an alphanumeric value, you could simply convert the int to an alphanumeric string in base 36. This could be implemented in almost any language.
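In PHP the base-36 conversion is nearly a one-liner with base_convert(). A small sketch follows; the function names and the zero-padding to a fixed width are my own additions. Keep in mind that codes produced this way are sequential, so they are easy to guess:

```php
<?php
// Convert an auto-increment id to a fixed-width, uppercase base-36 string.
function id_to_code(int $id, int $width = 6): string
{
    $code = strtoupper(base_convert((string) $id, 10, 36));
    return str_pad($code, $width, '0', STR_PAD_LEFT);
}

// Reverse the conversion; intval() with base 36 accepts letters in either case.
function code_to_id(string $code): int
{
    return intval($code, 36);
}

echo id_to_code(1679615, 4), "\n"; // ZZZZ
echo code_to_id('ZZZZ'), "\n";     // 1679615
```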
Going with zneak's comment, why don't you use an autoincrement column? Save the hash in another (non-unique) field and concatenate the id to it dynamically, so you give the user [hash][id]. You can parse it out in pure SQL using the substring functions.
Since you have to have the hash, the user can't look at other records by incrementing the id.
So, just in case someone runs across a similar issue: I'm using a UNIQUE field, and I'll be using a PHP hash function to insert the hashes. If an insert comes back with an error, I'll try again.
Hopefully, because of the low likelihood of conflict, it won't get slow.
You could also check the MySQL functions UUID() and UUID_SHORT(). Those functions generate UUIDs that are globally unique by definition. You won't have to double-check if your PHP-generated hash string already exists.
I think in several cases these functions can also fit your project's requirements. :-)
If you already have the table filled by some content, you can alter it with the following :
ALTER TABLE `page` ADD COLUMN `hash` char(64) AS (SHA2(`content`, 256)) AFTER `content`
This solution will add hash column right after the content one, generates hash for existing and new records too without need to change your INSERT statement.
If you add a UNIQUE index to the column (after removing duplicates), inserts will only succeed if the content is not already in the table. This prevents duplicates.
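If you also need the same hash on the PHP side, for example to look a row up by its content hash, hash('sha256', ...) should give the same 64-character lowercase hex string as SHA2(..., 256), provided both sides see the same bytes (same character encoding). A small sketch with a hypothetical helper name:

```php
<?php
// Hex SHA-256 digest of the content: 64 lowercase hex characters,
// the same format MySQL's SHA2(content, 256) returns.
function content_hash(string $content): string
{
    return hash('sha256', $content);
}

echo content_hash('abc'), "\n";
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```

The fixed digest length is why the column is declared char(64).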
