Generate 5 million unique codes - php

I am looking for an efficient way to generate 5 million unique codes with 7 characters (letters, numbers, special chars).
Basically, my idea was to generate a table with a unique constraint. Then to generate a code, insert it into the database, see if it is "accepted" (meaning a new code) until we have 5 million unique codes.
Alternatively, the idea was to generate an array with 5 million codes and insert them into the database all at once afterwards, to see how many of them make it in (that is, are unique).
The third option was to create one code, check if it already exists, if not insert it into the database.
My question now is which method I should use; there might be a problem I am overlooking. Or is there a better way?
Thanks a lot!

Pick an appropriate function to generate one random code; for illustration purposes I'll be using this:
function generateCode() {
return substr(bin2hex(random_bytes(4)), 0, 7);
}
See https://stackoverflow.com/a/22829048/476 and other answers in there to pick something that works for you. The important point is that it uses a good source of randomness, either random_bytes, random_int, openssl_random_pseudo_bytes or /dev/urandom. This minimises the chance of two calls to this function producing the same output.
From there, simply use array keys to deduplicate the values:
$codes = [];
while (count($codes) < 5000000) {
$codes[generateCode()] = null;
}
$codes = array_keys($codes);
If generateCode is sufficiently random, there should be few collisions and little overhead in generating codes this way. Even if there is some overhead, this is presumably a one-time operation, so efficiency isn't paramount. 5 million short strings should certainly fit into memory without much problem. You can then insert them all into the database in a batch.
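As a rough sketch of the final batch step, the deduplicated codes could be split into chunks and turned into multi-row prepared statements. The table name codes, the column name code and the chunk size are assumptions for illustration, not part of the answer above. Note that PHP turns array keys that look like decimal integers (e.g. "1234567") into ints, so the values are cast back to strings here.

```php
<?php
// Hypothetical schema: a table `codes` with a single UNIQUE column `code`.
function buildBatchInserts(array $codes, int $chunkSize = 1000): array
{
    $batches = [];
    foreach (array_chunk($codes, $chunkSize) as $chunk) {
        // One "(?)" placeholder group per code in this chunk.
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?)'));
        $batches[] = [
            'sql'    => "INSERT INTO codes (code) VALUES $placeholders",
            // array_keys() may return ints for all-digit codes; cast back.
            'params' => array_map('strval', $chunk),
        ];
    }
    return $batches;
}

$batches = buildBatchInserts(['aaaaaaa', 'bbbbbbb', 'ccccccc'], 2);
// Each batch could then be executed with PDO, for example:
// $pdo->prepare($batch['sql'])->execute($batch['params']);
```

Chunking keeps each statement below the database's placeholder and packet limits; the right chunk size depends on the server configuration.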

function generateRandomString($length = 7) {
    // you can update these with new chars
    $characters = '!@#$%^&*()_+0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $charactersLength = strlen($characters);
    $randomString = '';
    // loop once per requested character, not once per character in the alphabet
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, $charactersLength - 1)];
    }
    return $randomString;
}
Now use an array to store the codes:
$codes = array();
while(count($codes)!=5000000){
$code = generateRandomString();
$codes[$code] = $code;
}
In $codes, each key and its value hold the same code, so a duplicate simply overwrites the existing entry.

Given the purpose for which you're generating unique identifiers (as hard-to-guess coupon codes), I want to say that you should generate a unique identifier that combines a "unique" part and a "random" part.
The "unique" part can be a monotonically increasing counter (such as an auto-incremented row number in supporting databases), which can optionally serve as the seed of a full-period linear congruential generator (which cycles pseudorandomly through all possible values in its period before repeating).
The "random" part is simply a random number generated with a cryptographic random number generator (which for PHP is random_int). In general, the longer the random part is, the less predictable it will be.
Moreover, for the purposes of generating unique coupon codes, there is little reason to limit yourself to 7-character codes, especially if end users won't be required to enter those codes directly. See also this question.
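A minimal sketch of the "unique part plus random part" idea follows. The function name makeCouponCode, the alphabet, the separator and the default random length are all assumptions chosen for illustration; the counter is assumed to come from something like an auto-increment column.

```php
<?php
// Assumption: $counter is a value that is never reused (e.g. an auto-increment ID).
function makeCouponCode(int $counter, int $randomLength = 8): string
{
    // 32 symbols; 0, O, 1 and I are left out to avoid look-alikes.
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $random = '';
    for ($i = 0; $i < $randomLength; $i++) {
        // random_int() draws from a cryptographic random source.
        $random .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    // The base-36 counter guarantees uniqueness; the random part makes guessing hard.
    $unique = strtoupper(base_convert((string) $counter, 10, 36));
    return $unique . '-' . $random;
}

$code = makeCouponCode(36);
```

Because the counter part never repeats, two codes cannot collide even if their random parts happen to be equal.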

Do the codes need to be inserted into the database one at a time?
It is better not to query the database for every single code just to test whether it is unique.
You can collect the codes in an array first and put them into the database afterwards.
Pseudo-code:
Generate 5 million unique codes into a hash table or an array. // as you add each new code, check whether it already exists in the hash table
Then insert the contents of the hash table or array into the database.

Related

PHP Random string generation miraculously generated the same string

So I've got a fairly simple function in PHP that renders 10 character long order IDs:
function createReference($length = 10)
{
$characters = 'ABCDEFGHIJKLMNPQRSTUVWXYZ123456789';
$string = '';
for ($i = 0; $i < $length; $i++) {
$string .= $characters[rand(0, strlen($characters) - 1)];
}
return $string;
}
However, today on the 154020th table record, it generated the same 10-character ID as a previous order ID (which was the 144258th record in the table), and tried to insert it. Since I have a UNIQUE restriction on the column, I got an error and I received a notification from this.
According to my calculations, the script above creates 34^10 = 2.064.377.754.059.776 different possibilities.
I've read some stuff about rand() and mt_rand() behaving differently, but that shouldn't be an issue on PHP 7.1+. The script is running on PHP 7.3.
So should I buy a lottery ticket right now, or is there something predictable about the pseudo-randomness being used here? If so, what is a solution to have better distribution?
Assuming rand() is a true RNG, then the expected chance to generate a duplicate reaches 50% after reaching a little more than the square root of all possibilities (see "Birthday problem" for a more precise statement and formulas). The square root of 34^10 is 45435424, so it's well over 144258, but of course, rand() is far from being a perfect or "true" RNG.
In any case, generating a unique random identifier using rand or mt_rand (rather than a cryptographic RNG such as random_int) is a bad idea anyway. Depending on whether or not IDs have to be hard to guess, or whether or not the ID alone is enough to grant access to the resource, it may or may not be a better idea to use auto-incrementing record numbers rather than random numbers. See my section "Unique Random Identifiers" for further considerations.
See also this question.
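To put a number on the birthday estimate above, here is a small sketch using the standard approximation P ≈ 1 - exp(-n(n-1) / (2N)), which assumes an ideal uniform generator. The function name collisionProbability is made up for this example.

```php
<?php
// Approximate probability of at least one duplicate among $n draws
// from $space equally likely values (birthday-problem approximation).
function collisionProbability(float $n, float $space): float
{
    // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

// 154020 records drawn from 34^10 possible references.
$p = collisionProbability(154020, 34 ** 10);
printf("%.2e\n", $p);
```

With these inputs the result is on the order of 5.7e-6, roughly 1 in 174,000. So under an ideal generator a collision this early would be quite unlikely, which is consistent with suspecting rand() rather than bad luck.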

php function to generate random string returned duplicate values consecutively

I have written a function to generate a random string of 7 alphanumeric characters which I am then inserting in a mysql database.
Here is the code :
function getRandomID(){
$tmp ="";
$characters=array("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","1","2","3","4","5","6","7","8","9");
for($i=0;$i<7;$i++)
$tmp.=$characters[rand(0,count($characters)-1)];
return $tmp;
}
I am not checking for duplicates atm because I anticipate there will be no more than 1000 entries in the database and I've calculated that this function can return (35)^7 = 64,339,296,875 possible values.
I am testing it out locally as well as on a live server.
The problem is just in the last hour , this function generated duplicate values twice.
I came upon 3 entries in the database all of which had the same random string.
I do not know what could have caused this as I tried numerous times afterwards and the problem wasn't reproducible.
Does anybody have any idea what could be going on here?
Many thanks in advance
Designing your code with the mindset of "meh, that's not going to happen" is a very risky game, just do it properly once so you don't have to get back to your code multiple times to quick-fix minor things like these.
Do the duplicate check and you'll be solid.
You can create a function like
function stringExists($string)
{
...
return $boolValue;
}
And you can easily create a while loop that keeps generating a new string as long as the generated one already exists.
$duplicate = true;
while ($duplicate)
{
    $newString = getRandomID();
    // stringExists() returns true when the string is already taken
    $duplicate = stringExists($newString);
}
// Work with the newest string that is not a duplicate.
If you really want to get into it
You can then take a look at the documentation for rand if you want to find out what might be causing your problem. Besides, 3 entries doesn't mean anything if we don't know how many entries there are in total. Also, "random" functions are sometimes not as random as one might think; in some programming languages they are always usable but need some sort of initialisation (seeding) before they produce properly "random" output.
The time of the inserts might also be part of the problem; there are plenty of threads on the internet, like this one on Stack Overflow, with interesting points about what can affect your "random"ness.
Whether or not that is the cause, as has been pointed out in the comments, you can be pretty sure to find an answer to your question in related threads and topics.
Short answer: Don't think about it and do a duplicate check, it's easy.
Note that you should, of-course, make your ID be a UNIQUE constraint in the database to begin with.
1. Random != unique. Collisions happen. Check that the value is unique before you insert it into the database, and/or put an integrity constraint on the column in your DB to enforce uniqueness.
2. If you're using a very old version of PHP (e.g. pre-4.2), you have to seed the random number generator with srand().
3. Aside from #2, it's probably not your getRandomID() function but something else in your code that's re-using previous values.
If you need to enter unique data in the DB, you may use the PHP function uniqid(). (http://ca3.php.net/uniqid)
The function generates a more or less random string based on the current time in microseconds. So in theory it is unique.
But still, it's always good to check before the insert, or at least to put a UNIQUE index on the field.
You could do something like this:
function randomString($length, $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789") {
$string = "";
$charsLength = strlen($chars);
for ($i = 0; $i < intval($length); $i++) {
$string .= $chars[rand(0, $charsLength - 1)];
}
return $string;
}
The function above will generate a random string of the given length from the given characters. This makes it a little more flexible than your implementation, in case you need to use it in another context later.
Then you could do a check like this:
$id = null;
do {
$id = randomString(7);
} while (!isUnique($id));
// do your insert here. You need to write your isUnique, so that it checks if
// the given string is unique or not.

How to generate card number so the users cannot follow how much is sold?

I want some generator script to generate unique numbers but not in one order. We need to sell tickets.
For example currently ticket numbers are like this:
100000
100001
100002
...
So the users can see how many are sold.
How can I generate unique numbers?
for example:
151647
457561
752163
...
I could use a random number generator, but then I would always have to check in the database whether the number has already been generated.
Hmm, maybe with an index on that column the check would not take long.
Right now I still have to fetch the last card number in order to add 1 to it, but getting the last one is fast enough.
And the more tickets are sold, the bigger the chance that the RNG will generate an existing number, so there might be more checks in the future. So the best option would be to take the last number and derive the next one from it.
Here's a simple way to scramble ticket numbers (note: you need 64-bit PHP, or change the code to use the bcmath library):
function scramble($number) {
return (305914*($number-100000)+151647) % 999983;
}
Look, the output even looks like your example:
Input Output
------ ------
100000 151647
100001 457561
100002 763475
100003 069406
If you want to you can reverse it, so you can use these codes in URLs and then recover the original number:
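function unscramble($number) {
    $result = (605673*($number-151647)+100000) % 999983;
    // PHP's % keeps the sign of the dividend, so shift negative results back into range
    return $result < 0 ? $result + 999983 : $result;
}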
Is this safe? Someone with access to many sequential numbers can find the pattern so don't use this if the ticket numbers are extremely sensitive.
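The constant 605673 used for unscrambling is the modular inverse of the multiplier 305914 modulo 999983, which is why the mapping can be undone. If you want to pick your own multiplier, a sketch like the following computes the matching inverse with the extended Euclidean algorithm; the function name modInverse is an assumption for this example.

```php
<?php
// Returns x in [0, $m) such that ($a * x) % $m === 1, or throws if none exists.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        // Invariant: $oldS * $a is congruent to $oldR modulo $m.
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('Multiplier and modulus are not coprime.');
    }
    // Normalise a possibly negative coefficient into the range [0, $m).
    return (($oldS % $m) + $m) % $m;
}

$inverse = modInverse(305914, 999983); // 605673
```

Any multiplier that is coprime to the modulus works; with a prime modulus such as 999983, that is every multiplier from 1 to 999982.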
Generate random numbers and put a unique index on the ticket number column. Insert the record with the new ticket number; if the insert fails, you had a collision and have to generate another number. With a large enough random space, say a 32-bit integer, the chance of a collision is small. The lookup behind the insert is very fast when the column is numeric and indexed.
You can generate your numbers in advance and store them in a pool. When you need a new number, pick one from the pool at a random index, remove it from the pool and return it.
If the pool is about to run out, just generate another batch.
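One way to sketch the pool idea: shuffle the whole range once up front and then hand out numbers from the end, which is equivalent to picking a random index on every draw. The function names buildPool and drawTicket and the in-memory array are assumptions; a real system would keep the pool in the database.

```php
<?php
// Builds a pool of all numbers in [$min, $max] in random order.
function buildPool(int $min, int $max): array
{
    $pool = range($min, $max);
    // Fisher-Yates shuffle driven by random_int().
    for ($i = count($pool) - 1; $i > 0; $i--) {
        $j = random_int(0, $i);
        [$pool[$i], $pool[$j]] = [$pool[$j], $pool[$i]];
    }
    return $pool;
}

// Removes one number from the pool and returns it.
function drawTicket(array &$pool): int
{
    if ($pool === []) {
        throw new UnderflowException('Pool is empty; generate another batch.');
    }
    return array_pop($pool);
}

$pool = buildPool(100000, 100099);
$first = drawTicket($pool);
```

Every number is handed out exactly once, so no duplicate check is needed when a ticket is sold.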
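// $codeExists is a placeholder callable that checks the database and
// returns true if the code has been generated earlier
function generateCode(callable $codeExists) {
    $chars = '0123456789';
    do {
        $code = '';
        for ($x = 0; $x < 6; $x++) {
            $code .= $chars[ rand(0, strlen($chars)-1) ];
        }
        // generate again as long as the code is already in the database
    } while ($codeExists($code));
    return $code;
}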
The easy way: you can simply use the md5() function.
And to get a 6-character string, you can do
$x = md5(microtime());
echo substr($x, 0, 6);
Edit:
session_start();
$x = md5(microtime().session_id());
echo substr($x, 0, 6);

Never Generate Random Number Again

I am looking for a PHP solution for generating random numbers that never generates the same number again. If there is one, please let me know.
I need this for one of my projects, which generates a unique key for a URL, and I don't want to check the stored data to see whether the generated number already exists.
Thanks.
--------- EDIT ----------
I am using this method to generate random numbers. Is it helpful?
function randomString($length = 10, $chars = '1234567890') {
// Alpha lowercase
if ($chars == 'alphalower') {
$chars = 'abcdefghijklmnopqrstuvwxyz';
}
// Numeric
if ($chars == 'numeric') {
$chars = '1234567890';
}
// Alpha Numeric
if ($chars == 'alphanumeric') {
$chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
}
// Hex
if ($chars == 'hex') {
$chars = 'ABCDEF1234567890';
}
$charLength = strlen($chars)-1;
// initialise the result so that .= does not append to an undefined variable
$randomString = '';
for($i = 0 ; $i < $length ; $i++)
{
$randomString .= $chars[mt_rand(0,$charLength)];
}
return $randomString;
}
Look at the php function uniqid():
http://php.net/manual/en/function.uniqid.php
It's impossible to generate a random number which is unique - if the generator is dependent on state, then the output is by definition not random.
It is possible to generate a set of random numbers and remove duplicates (although at that point the numbers again cease to be truly random).
Do you really need a random number or do you need a sequence number or a unique identifier - these are 3 separate things.
which generate unique key for URL
MySQL and SQLite both support auto-increment column types which will be unique (effectively the same as a sequence number). MySQL even has a mechanism for ensuring uniqueness across equivalent nodes - even where they are not tightly coupled. Oracle provides sequence generators.
MySQL has built-in functionality for generating UUIDs (the UUID() function), and PHP offers uniqid() as well as UUID libraries, although since most DBMSs support surrogate key generation, there is little obvious benefit to this approach.
You can use a database... Everytime a random number has shown up, put it in a database and next time, compare the random number of the new script with those already in the database.
Use a random number generator, keep stored the already generated values, discard and generate again when you get a duplicate number.
Ignore uniqids and stuff like that because they are just plain wrong.
There are no real "perfect and low price" random number generators!!
The best that can be done from mathematical functions are pseudorandom which in the end seem random enough for most intents and purposes.
mt_rand function uses the Mersenne twister, which is a pretty good PRNG!
so it's probably going to be good enough for most casual use.
give a look here for more info: http://php.net/manual/en/function.mt-rand.php
a possible code implementation is
<?php
$random = mt_rand($yourMin, $yourMax);
?>
EDIT:
find a very good explanation here:
Generate cryptographically secure random numbers in php
The typical answer is to use a GUID or UUID, although I avoid those forms that use only random numbers. (Eg, avoid version 4 GUID or UUIDs)

truly unique random number generate by php?

I have built a PHP script to host a large number of images uploaded by users. What is the best way to generate random numbers for image filenames so that there will be no filename conflicts in the future, like Imageshack does? Thanks.
$better_token = uniqid(md5(mt_rand()), true);
Easiest way would be a new GUID for each file.
http://www.php.net/manual/en/function.uniqid.php#65879
Here's how I implemented your solution
This example assumes i want to
Get a list, containing 50 numbers that is unique and random, and
This list of # to come from the number range of 0 to 1000
Code:
//developed by www.fatphuc.com
$array = array(); //define the array
//set random # range
$minNum = 0;
$maxNum = 1000;
// i just created this function, since we’ll be generating
// # in various sections, and i just want to make sure that
// if we need to change how we generate random #, we don’t
// have to make multiple changes to the codes everywhere.
// (basically, to prevent mistakes)
function GenerateRandomNumber($minNum, $maxNum){
return round(rand($minNum, $maxNum));
}
//generate 50 unique random #s
for($i = 1; $i <= 50; $i++){
$num1 = GenerateRandomNumber($minNum, $maxNum);
while(in_array($num1, $array)){
$num1 = GenerateRandomNumber($minNum, $maxNum);
}
$array[$i] = $num1;
}
asort($array); //just want to sort the array
//this simply prints the list of #s in list style
echo '<ol>';
foreach ($array as $var){
echo '<li>';
echo $var;
echo '</li>';
}
echo '</ol>';
Keep a persistent list of all the previous numbers you've generated(in a database table or in a file) and check that a newly generated number is not amongst the ones on the list. If you find this to be prohibitively expensive, generate random numbers on a sufficient number of bits to guarantee a very low probability of collision.
You can also use an incremental approach of assigning these numbers, like a concatenation of a timestamp_part based on the current time and a random_part, just to make sure you don't get collisions if multiple users upload files at the same time.
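A minimal sketch of such a name, assuming a millisecond timestamp for the time part and 64 random bits for the random part; the function name makeUploadName and the exact layout are illustrative choices only.

```php
<?php
// Builds a filename of the form <milliseconds>_<16 hex chars>.<extension>.
function makeUploadName(string $extension): string
{
    // Time part: milliseconds since the Unix epoch.
    $timestampPart = (string) (int) (microtime(true) * 1000);
    // Random part: 8 bytes from a cryptographic source, hex-encoded.
    $randomPart = bin2hex(random_bytes(8));
    return $timestampPart . '_' . $randomPart . '.' . $extension;
}

$name = makeUploadName('jpg');
```

Two uploads in the same millisecond would still need identical random parts to collide, which is extremely unlikely with 64 random bits; opening the file in exclusive-create mode can catch the remaining case.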
You could use microtime() as suggested above and then append a hash of the original filename to further avoid collisions in the (rare) case of uploads at exactly the same time.
There are several flaws in your postulate that random values will be unique - regardless of how good the random number generator is. Also, the better the random number generator, the longer it takes to calculate results.
Wouldn't it be better to use a hash of the datafile - that way you get the added benefit of detecting duplicate submissions.
If detecting duplicates is known to be a non-issue, then I'd still recommend this approach but modify the output based on detected collisions (but using a MUCH cheaper computation method than that proposed by Lo'oris) e.g.
$candidate_name=generate_hash_of_file($input_file);
$offset=0;
while (file_exists($candidate_name . strrev($offset)) && ($offset<50)) {
$offset++;
}
if ($offset<50) {
rename($input_file, $candidate_name . strrev($offset));
} else {
print "Congratulations - you've got the biggest storage network in the world by far!";
}
this would give you the capacity to store approx 25*2^63 files using a sha1 hash.
As to how to generate the hash, reading the entire file into PHP might be slow (particularly if you try to read it all into a single string to hash it). Most Linux/Posix/Unix systems come with tools like 'md5sum' which will generate a hash from a stream very efficiently.
C.
1. forge a filename
2. try to open that file
3. if it exists, goto 1
4. create the file
Using something based on a timestamp maybe. See the microtime function for details. Alternatively uniqid to generate a unique ID based on the current time.
Guaranteed unique cannot be random. Random cannot be guaranteed unique. If you want unique (without the random) then just use the integers: 0, 1, 2, ... 1235, 1236, 1237, ... Definitely unique, but not random.
If that doesn't suit, then you can have definitely unique with the appearance of random. You use encryption on the integers to make them appear random. Using DES will give you 64-bit numbers (its block size), while using AES will give you 128-bit numbers. Use either to encrypt 0, 1, 2, ... in order with the same key. All you need to store is the key and the next number to encrypt. Because encryption is reversible, the encrypted numbers are guaranteed unique.
If 128-bit or 64-bit numbers are too large (64 bits is 16 hex digits), then look at format-preserving encryption, which will give you a smaller range at some cost in time.
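To illustrate why a keyed, reversible mapping yields unique outputs, here is a toy 32-bit Feistel network built on hash_hmac. This is a swapped-in illustration, not DES, AES or a vetted format-preserving encryption scheme, and the function names are made up. It assumes 64-bit PHP and inputs in the range 0 to 2^32 - 1.

```php
<?php
// Round function: 16 pseudorandom bits from the key, round number and one half.
function feistelRound(int $half, int $round, string $key): int
{
    $mac = hash_hmac('sha256', pack('Cn', $round, $half), $key, true);
    return unpack('n', $mac)[1]; // first two bytes as an unsigned 16-bit int
}

// Maps a 32-bit number to another 32-bit number; distinct inputs give distinct outputs.
function permute32(int $n, string $key, int $rounds = 4): int
{
    $left = ($n >> 16) & 0xFFFF;
    $right = $n & 0xFFFF;
    for ($i = 0; $i < $rounds; $i++) {
        [$left, $right] = [$right, $left ^ feistelRound($right, $i, $key)];
    }
    return ($left << 16) | $right;
}

// Inverse of permute32(): runs the rounds in reverse order.
function unpermute32(int $n, string $key, int $rounds = 4): int
{
    $left = ($n >> 16) & 0xFFFF;
    $right = $n & 0xFFFF;
    for ($i = $rounds - 1; $i >= 0; $i--) {
        [$left, $right] = [$right ^ feistelRound($left, $i, $key), $left];
    }
    return ($left << 16) | $right;
}

$id = permute32(42, 'demo-key');
```

Feeding it 0, 1, 2, ... with a fixed secret key gives scattered-looking numbers that never repeat, because every output can be mapped back to exactly one input.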
My solution is usually a hash (MD5/SHA1/...) of the image contents. This has the added advantage that if people upload the same image twice you still only have one image on the hard disk, saving some space (ofc you have to make sure that the image is not deleted if one user deletes it and another user has the same image in use).
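A short sketch of content-based naming using PHP's hash_file(), which hashes the file from disk without the script having to read it into a string first. SHA-256 is used here instead of the MD5/SHA1 mentioned above, and the function name contentBasedName is an assumption.

```php
<?php
// Derives a filename from the file's contents, so identical uploads share one name.
function contentBasedName(string $path, string $extension): string
{
    $hash = hash_file('sha256', $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return $hash . '.' . $extension;
}

// Demo with two temporary files that hold the same bytes.
$a = tempnam(sys_get_temp_dir(), 'img');
$b = tempnam(sys_get_temp_dir(), 'img');
file_put_contents($a, 'same image bytes');
file_put_contents($b, 'same image bytes');
$nameA = contentBasedName($a, 'png');
$nameB = contentBasedName($b, 'png');
unlink($a);
unlink($b);
```

Both names are identical here, so the second upload can be detected as a duplicate before anything is stored twice.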
