Generate sequences of digits. Sequences should not be similar - php

I'd like to generate a long list of 9-digit sequences.
Let's call them IDs.
Each ID must be unique, and the main purpose is to have them all really different. It is unacceptable to have 2 IDs that differ by only 1 or 2 digits in sequence.
Do you have any ideas on how to implement this without comparing each newly generated ID with every previously generated one?
Is there perhaps an existing algorithm, or a simple MySQL function, to compare how close those strings are?

You could try the following formula for your IDs. You would only need to check that the ID value doesn't already exist in the table (salt is a constant between 0 and 100 that never changes once you pick a value; I would recommend using a prime number, and definitely not 0):
ID = random integer * 101 + salt;
This generates ID values like the following (for salt = 73):
469956305
017775467
001195913
913620520
156482807
577463533
470183959
049290800
078643925
141526626
If you take any two of these ID values and compare them, you'll notice that no two numbers differ by only one or two digits in sequence. I wrote a script to compare all possible ID values between 0 and 3000000, and there were no two ID values of this form differing by 1 or 2 digits in sequence. If you want to test it out yourself, here's the script I used (in C#): http://ideone.com/lFHnlX - I reduced the upper limit because of timeout on IDEone.
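A minimal PHP sketch of the formula above, assuming salt = 73 as in the example. The names MULTIPLIER, SALT, generateId() and differingPositions() are illustrative and not part of the original answer. The reason the scheme works: every ID leaves the same remainder modulo 101, so the difference between two IDs is a multiple of 101. Changing one digit, or two adjacent digits, changes the value by m * 10^k with 1 <= |m| <= 99, and that is never a multiple of the prime 101. Note that this only covers adjacent digits; two IDs can still differ in two digits that are not next to each other.

```php
<?php
// Sketch only: constant and function names are illustrative assumptions.
const MULTIPLIER = 101;
const SALT = 73; // any fixed value from 1 to 100

// Builds one 9-digit ID of the form random * 101 + salt.
function generateId(): string
{
    // Largest random factor that keeps the result within 9 digits.
    $maxRandom = intdiv(999999999 - SALT, MULTIPLIER);
    $id = random_int(0, $maxRandom) * MULTIPLIER + SALT;
    return str_pad((string) $id, 9, '0', STR_PAD_LEFT);
}

// Returns the positions at which two equal-length ID strings differ.
function differingPositions(string $a, string $b): array
{
    $positions = [];
    for ($i = 0; $i < strlen($a); $i++) {
        if ($a[$i] !== $b[$i]) {
            $positions[] = $i;
        }
    }
    return $positions;
}

echo generateId(), "\n";
```

As the answer says, each new ID still has to be checked against the table for an exact duplicate, since the same random factor can come up twice.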

You want to get away with not-checking for uniqueness and you don't want IDs to be similar? Then you're really looking for UUIDs/GUIDs.
MySQL's built-in uuid() function will get you there.
As Robert Harvey points out, UUIDs are alphanumeric (not numeric) and longer than 9 characters, but you're going to have to sacrifice something – you cannot satisfy all of your constraints simultaneously.

Related

Does hashing a random value plus an auto increment number ensure uniqueness?

I'm trying to generate a unique order number for my ecommerce application, this is my code:
<?php
$bytes = random_bytes(3);
$random_hash = bin2hex($bytes);
$order_num = $random_hash . "1";
echo strtoupper(hash('crc32b', $order_num));
The order number (in the example is 1), is going to be an auto-increment value retrieved from MySQL.
Does this guarantee uniqueness?
I wanted a short, unique final value of at most 8-10 characters.
A numbers-only solution would be fine too.
As far as I know, most hash algorithms make no guarantee of when collisions might occur, so you're probably just as likely to get a collision with your proposed code as using the random part on its own.
If the auto-increment part is unique, and the random part is just to avoid guesses, you could just concatenate the two parts together (i.e. everything in your example before the hash call). That way if the same random number comes up twice, it will have different numbers on the end.
If that results in something too long, you could do something with base_convert or asc to convert the number into a shorter representation.
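A rough sketch of the concatenation idea, assuming the auto-increment value is passed in as an integer. The function name makeOrderNumber() is made up for illustration. Because the random part always has the same length (6 hex characters), two different auto-increment values can never produce the same result, even if the random part repeats.

```php
<?php
// Sketch only: makeOrderNumber() is a hypothetical helper name.
function makeOrderNumber(int $autoIncrementId): string
{
    // 3 random bytes -> always exactly 6 hex characters (hard to guess).
    $randomPart = bin2hex(random_bytes(3));
    // Unique part from the database, shortened by writing it in base 36.
    $idPart = base_convert((string) $autoIncrementId, 10, 36);
    return strtoupper($randomPart . $idPart);
}

echo makeOrderNumber(1), "\n";
```

With this layout the result stays within 10 characters for auto-increment values up to 36^4 - 1 = 1,679,615.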
The hash function will not provide any uniqueness to the id, it only obfuscates the id a bit.
If you have, let's say, 100 possible values, you would get at most 100 possible hashes from them, no more. If an attacker wants to brute-force the hashes, they can compute the 100 possible hashes and try them.
In your case, with 3 bytes of randomness (2^24 = 16,777,216 possible values), you would not get all possible combinations before you get a duplicate: the same random number will be generated long before you have drawn 2^24 values.
There are two common approaches when it comes to unique IDs:
You let the database automatically increment the ID, which makes sure that the ID is unique.
You generate a UUID (a globally unique ID of 16 bytes), which offers such a huge keyspace that a duplicate is extremely unlikely. In practice, one can neglect the possibility of duplicates.
The UUID has a lot of advantages and one disadvantage:
(+) UUIDs can work decentralized, e.g. in an offline scenario.
(+) One can generate the ID before it is inserted into the database, so one does not have to wait until the row is created in the DB.
(+) The IDs are not deterministic, so an attacker cannot guess the next ID.
(-) They use more storage space and are a bit slower when searching.
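For the UUID approach, here is a minimal sketch of generating a random (version 4) UUID in plain PHP, assuming PHP 7 or later for random_bytes(). The function name uuidV4() is illustrative; in a real project a maintained library or the database's own UUID function may be the better choice.

```php
<?php
// Sketch only: uuidV4() is an illustrative name, not a built-in function.
function uuidV4(): string
{
    $bytes = random_bytes(16); // 16 bytes = 128 bits
    // Set the version field to 4 (random UUID).
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    // Set the variant field to the RFC 4122 layout (binary 10xx).
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    // Format the 32 hex characters as 8-4-4-4-12.
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), "\n";
```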

Anti-forgery unique serial number generation

I am trying to generate a random serial number to put on holographic stickers in order to let customers check if the purchased product is authentic or not.
Preface:
Once you enter a code and it is queried, the code is voided, so the next time it is entered you receive a message that the product might be fake because the code has already been used.
Considering that I have to build this system for a factory that produces no more than 2-3 million pieces a year, it is a bit hard for me to understand how to set everything up, at least the first time…
I thought about a 20-digit code in 4 groups (no letters, because the code must be very easy for the user to read and enter):
12345-67890-98765-43210
This is what I think is the easiest way to do everything:
function mycheckdigit($txt)
{
...
return $myserial;
}
$mycustomcode = "123";
$qty = 20000;
// Using a txt file for a test; this should be a DB instead.
$myfile = fopen("./thefile.txt", "w") or die("Houston we got a problem here");
for ($i = 0; $i < $qty; $i++) {
    // Year (2 digits) + custom code + microseconds zero-padded to 6 digits + 7 random digits
    $txt = date("y") . $mycustomcode . str_pad(gettimeofday()['usec'], 6, "0", STR_PAD_LEFT) . random_int(1000000, 9999999);
    // here the code to make check digits
    $myserial = mycheckdigit($txt);
    fwrite($myfile, $myserial . "\n");
}
fclose($myfile);
The 1st group identifies the year (e.g. 18) plus a 3-digit custom code
The 2nd group contains the microtime (gettimeofday()['usec'])
The 3rd group is completely random
The last group contains 3 random digits, a check digit for group 1, and a check digit for group 2
in short:
Y= year
E= part of the EAN or custom code
M= Microtime generated number (gettimeofday()['usec'])
D= random_int() digits
C= Check Digit
YYEEE-MMMMM-MDDDD-DDDCC
This way, I have a prefix that changes every year, I can recognize which brand the product belongs to (so I could use a single DB source), and I still have enough random digits to be, maybe, quite unique, considering that I will pick only a portion of the numbers between 1,000,000 and 9,999,999 and split them across the groups as shown above.
Some questions for you:
Do you think I have enough combinations to avoid generating the same code twice in one year, considering 2 million codes? I would rather not look up each code in the DB unless it is really necessary, because that could slow down batch generation (executed in batches during the production process).
Would it be better to add another unique identifier, like the day of the year (001-365), and make the random_int() part 3 digits shorter? Please consider that I will generate codes monthly, not daily (but I think that makes no big difference to uniqueness).
Since the backend is in PHP, I am thinking of using the mt_rand() function. Would that be a good approach?
UPDATE: After @apokryfos's suggestion, I read more about UUID generation and similar techniques, and found a good compromise in using random_int() instead.
I only need digits, so hex hashes are not useful for my needs and would make things more complicated.
I would avoid complex cryptographic machinery like RSA keys and so on…
I don’t need that level of security and complexity. I just need a way to generate a serial number that is as unique as possible and not easy to guess and void without scratching the sticker (so numbers should not be created in sequence, but randomly).
You have 11 random digits to play with per year, i.e. the numbers 1 to 99999999999. Roughly 100 billion is a lot more than 2 million, so with respect to having enough combinations I think you're covered.
However, using mt_rand you're likely to get collisions. Here's a way to pre-generate unique random numbers in memory before using the database (the example stops at 1 million; raise the limit as needed):
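<?php
// Assumes 64-bit PHP: 99999999999 does not fit in a 32-bit integer.
$arr = [];
while (count($arr) < 1000000) {
    $num = mt_rand(1, 99999999999);
    $numStr = str_pad($num, 11, '0', STR_PAD_LEFT); // Force 11 digits
    // Array keys are unique, so a repeated number just overwrites itself.
    $arr[$numStr] = true;
}
// PHP turns integer-like string keys into ints; cast them back to strings.
$keys = array_map('strval', array_keys($arr));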
The number of collisions is generally low (the first collision occurs at about 300 000 - 500 000 numbers generated), so it's pretty rare.
Each value in the array $keys is an 11 digit number which is random and unique.
This approach is relatively fast but be aware it will need quite a bit of memory (more than 128MB).
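To put rough numbers on the collision question, the standard birthday approximation can be sketched as follows. It assumes the draws are uniform and independent, which mt_rand only approximates, and the function names are illustrative. With n values drawn from a space of N, the expected number of colliding pairs is about n(n-1)/(2N), and the chance of at least one collision is about 1 - exp(-n(n-1)/(2N)).

```php
<?php
// Sketch only: birthday approximation under a uniform-randomness assumption.

// Expected number of pairs that share the same value.
function expectedCollidingPairs(int $n, float $space): float
{
    return ($n * ($n - 1)) / (2.0 * $space);
}

// Approximate probability that at least two of the n values are equal.
function collisionProbability(int $n, float $space): float
{
    return 1.0 - exp(-expectedCollidingPairs($n, $space));
}

// 2 million codes drawn from 11 random digits (10^11 values).
printf("%.2f expected colliding pairs\n", expectedCollidingPairs(2000000, 1e11));
printf("%.4f chance of at least one collision\n", collisionProbability(2000000, 1e11));
```

Under these assumptions, 2 million draws from 10^11 values give about 20 expected colliding pairs, so some form of uniqueness check (in memory as above, or a unique index in the database) is still needed.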
That being said, a more commonly used method is to generate a universally unique identifier (UUID), which is a lot more likely to be unique and therefore does not really need checking for uniqueness.
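The question leaves the body of mycheckdigit() elided, so the check-digit scheme is unknown. One common choice, used here purely as an assumption, is the Luhn algorithm. The sketch below computes a single Luhn check digit for a string of digits; the function name luhnCheckDigit() is illustrative, and it could be applied separately to group 1 and group 2 of the serial.

```php
<?php
// Sketch only: Luhn is an assumed choice, not the asker's actual algorithm.
function luhnCheckDigit(string $digits): int
{
    $sum = 0;
    $double = true; // the rightmost payload digit is doubled first
    for ($i = strlen($digits) - 1; $i >= 0; $i--) {
        $d = (int) $digits[$i];
        if ($double) {
            $d *= 2;
            if ($d > 9) {
                $d -= 9; // same as adding the two digits of the product
            }
        }
        $sum += $d;
        $double = !$double;
    }
    // Digit that brings the total up to a multiple of 10.
    return (10 - $sum % 10) % 10;
}

echo luhnCheckDigit('7992739871'), "\n"; // prints 3
```

A check digit like this catches typing mistakes; it does not make codes harder to guess, so it complements the random digits rather than replacing them.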

How many bytes are unique enough for twitter?

I don't want my database id's to be sequential, so I'm trying to generate uids with this code:
$bin = openssl_random_pseudo_bytes(12);
$hex = bin2hex($bin);
return base_convert($hex, 16, 36);
My question is: how many bytes would I need to make the IDs unique enough to handle large numbers of records (like Twitter)?
Use PHP's uniqid(), with an added entropy factor. That'll give you plenty of room.
You might consider something like the way tinyurl and other shortening services work. I've used similar techniques, which guarantee uniqueness until all combinations are exhausted. Basically, you choose an alphabet and how many characters you want per code. Let's say we use alphanumeric characters, upper and lower case, so that's 62 characters in the alphabet, and let's do 5 characters per code. That's 62^5 = 916,132,832 combinations.
You start with your sequential database ID and multiply it by some prime number (choose one that's fairly large, like 2097593), making sure to wrap around if you exceed 62^5, and then convert the result to base 62 using your chosen alphabet.
This makes each code look fairly random, yet because we use a prime number (one that does not divide 62^5, i.e. not 2 or 31), we're guaranteed not to hit the same number twice until we've used all the codes. And it's very short.
You can use longer keys with a smaller alphabet, too, if length isn't a concern.
Here's a question I asked along the same lines: Tinyurl-style unique code: potential algorithm to prevent collisions
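The multiply-and-wrap scheme described in this answer could look roughly like this in PHP. It is a sketch under a few assumptions: 64-bit integers, IDs smaller than 62^5, and an alphabet order (digits, upper case, lower case) chosen here for illustration. The function name shortCode() is made up. The mapping has no repeats as long as the multiplier shares no factor with 62^5, which holds for 2097593 because it is divisible by neither 2 nor 31.

```php
<?php
// Sketch only: shortCode() and the alphabet order are illustrative assumptions.
function shortCode(int $id, int $length = 5, int $multiplier = 2097593): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $base = strlen($alphabet);  // 62
    $space = $base ** $length;  // 916,132,832 for 5 characters

    // Scramble the sequential ID, wrapping around the code space.
    $n = ($id * $multiplier) % $space;

    // Write the result in base 62, left-padded to a fixed length.
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code = $alphabet[$n % $base] . $code;
        $n = intdiv($n, $base);
    }
    return $code;
}

echo shortCode(1), "\n";
echo shortCode(2), "\n";
```

Because the mapping is a simple multiplication, the codes are not secret: anyone who knows the multiplier can recover the original ID.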
Assuming that openssl_random_pseudo_bytes may generate every possible value, N bytes will give you 2^(N * 8) distinct values. For 12 bytes this is about 7.923 * 10^28.
Use MySQL's UUID():
insert into `database`(`unique`,`data`) values(UUID(),'Test');
If you're not using MySQL, search Google for "UUID (database name)" and it will give you an option.
Source: Wikipedia
In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%.

How to produce a short unique id in php?

To produce a unique ID, I suppose I must use the uniqid function in PHP.
But by default, uniqid produces a 13-digit hexadecimal number:
4f66835b507db
I would like to reduce this to a 7-digit numeric value while preserving uniqueness. Is that possible?
4974012
This number will be used as a user ID. Authentication will be done with this ID and a password.
Some people say uniqid is not unique! Is it a bad choice?
Any "unique" number will eventually have a collision after generating enough records. To ensure uniqueness, you need to store the values you generated into a database and when generating next one, you need to check if there is no collision.
However, in practice, applications usually generate IDs as a simple sequence 1,2,3,... That way you know you won't get a collision until you run out of the datatype (UINT is usually 32 bits long, which gives you 4 billion unique ids).
Uniqid is not guaranteed to be unique, even in its full length.
Furthermore, uniqid is intended to be unique only locally. This means that if you create users simultaneously on two or more servers, you may end up with one ID for two different users, even if you use full-length uniqid.
My recommendations:
If you are really looking for globally unique identifiers (i.e. your application is running on multiple servers with separate databases), you should use UUIDs. These are even longer than the ones returned by uniqid, but there is no practical chance of collisions.
If you need only locally unique identifiers, stick with AUTO_INCREMENT in your database. This is (a little) faster and (a little) safer than checking if a short random ID already exists in your database.
EDIT: As it turns out in the comments below, you are looking not only for an ID for the user, but rather you are forced to provide your users with a random login name... Which is weird, but okay. In such case, you may try to use rand in a loop, until you get one that does not exist in your database.
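Pseudocode (user_exists() stands for your own database lookup):
$min = 1;
do {
    // Widen the range on every attempt so a free name is eventually found.
    $username = "user" . rand($min, $min * 10);
    $min = $min * 10;
} while (user_exists($username));
// Create your user here.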
Write a while loop that generates random letters and numbers of a desired length, which loops until it creates an ID that is not already in use.
Well, by reducing it to 7 characters and only numeric, you are reducing the 'uniqueness' by a lot.
I suggest using an auto increment of the user ID and start at 1000000 if it has to be 7 digits long.
If you really must generate it without auto increment, you can use mt_rand() to generate a random number 7 digits long:
$random = mt_rand(1000000, 9999999);
This is not ideal because you will need to check if the number is already in use by another user.
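The "generate, then check" loop mentioned in these answers can be sketched as below. The callback $isTaken is a placeholder for whatever lookup your application uses (for example a SELECT against the users table), and the name generateUserId() is illustrative. random_int() is used here instead of mt_rand() as a design choice, since the ID is part of a login.

```php
<?php
// Sketch only: $isTaken stands in for a real database lookup.
function generateUserId(callable $isTaken, int $maxAttempts = 100): int
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        $candidate = random_int(1000000, 9999999); // always 7 digits
        if (!$isTaken($candidate)) {
            return $candidate;
        }
    }
    // Give up instead of looping forever when the space is nearly full.
    throw new RuntimeException('Could not find a free 7-digit ID');
}

// Usage with an in-memory set standing in for the database.
$used = [];
$id = generateUserId(function (int $n) use (&$used): bool {
    return isset($used[$n]);
});
$used[$id] = true;
echo $id, "\n";
```

Two requests can still pick the same free number at the same moment, so a unique index on the column remains a sensible safeguard.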
If you are using a database, define an id column as unique and auto-incremented, and then let the database manage your IDs.
It's safer.
Read more : mysql-doc
Take a look at this article:
Create short IDs with PHP - Like Youtube or TinyURL
It explains how to generate short unique ids, like youtube does.
Actually, the function in the article is very related to php function base_convert which converts a number from a base to another (but is only up to base 36).

Create a unique 4-byte Integer number from a String in PHP

I have a SQL table which uses strings for a key. I need to convert that string (max. 18 Characters) to a unique (!) 4-byte integer using PHP. Can anyone help?
Unique? Not possible, sorry.
Let's take a closer look:
With 18 characters, even assuming only the 128 possible characters of ASCII (7 bits), you'd get 128^18 possible strings (and I'm not even going into the possibility of shorter strings!), which is about 8E37 (8 followed by 37 zeroes).
With a 4-byte integer, you get 256^4 possible integers, which is about 4E9 (4 billion).
So you have about 2E28 times as many strings as you have integers; you can't have a unique mapping.
Therefore, you are guaranteed to run into a collision by the time you enter the 4294967297th key, and you may well run into one as soon as you enter the second.
See also: http://en.wikipedia.org/wiki/Pigeonhole_principle
Keep a lookup table of strings to integers. Every time you encounter a new string, you add it to the mapping table and assign it a new unique ID. This will work for about 2^32 strings, which is probably enough.
There is no way to do this for more than 2^32 distinct strings.
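The lookup-table idea can be illustrated in memory as follows. This is only a sketch: the class name StringIdMap is made up, it assumes PHP 7.4 or later for typed properties, and in practice the mapping would live in a database table with an AUTO_INCREMENT column rather than in a PHP array.

```php
<?php
// Sketch only: in-memory stand-in for a database mapping table.
final class StringIdMap
{
    /** @var array<string|int, int> string key => assigned integer ID */
    private array $ids = [];
    private int $next = 1;

    // Returns the existing ID for a key, or assigns the next free one.
    public function idFor(string $key): int
    {
        if (!isset($this->ids[$key])) {
            $this->ids[$key] = $this->next++;
        }
        return $this->ids[$key];
    }
}

$map = new StringIdMap();
echo $map->idFor('ORDER-2024-000017'), "\n"; // prints 1
echo $map->idFor('ORDER-2024-000018'), "\n"; // prints 2
echo $map->idFor('ORDER-2024-000017'), "\n"; // prints 1
```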
You can't. A four-byte integer can represent 2^32 = 4 billion values, which is not enough to hold your target space.
If you currently have fewer than 4 billion rows in the table, you could create a cross table that just assigns an incremental value to each. You'd be limited to 4 billion rows with this approach, but this may be fine for your situation.
