So I've got a fairly simple function in PHP that generates 10-character order IDs:
function createReference($length = 10)
{
    $characters = 'ABCDEFGHIJKLMNPQRSTUVWXYZ123456789';
    $string = '';
    for ($i = 0; $i < $length; $i++) {
        $string .= $characters[rand(0, strlen($characters) - 1)];
    }
    return $string;
}
However, today, on the 154,020th record in the table, it generated the same 10-character ID as a previous order (the 144,258th record) and tried to insert it. Since I have a UNIQUE constraint on the column, the insert failed and I received an error notification.
According to my calculations, the script above can produce 34^10 = 2,064,377,754,059,776 different values.
I've read that rand() and mt_rand() used to behave differently, but that shouldn't be an issue on PHP 7.1+, where rand() is an alias of mt_rand(). The script is running on PHP 7.3.
So should I buy a lottery ticket right now, or is there something predictable about the pseudo-randomness being used here? If so, what would give a better distribution?
Assuming rand() were a true RNG, the chance of generating a duplicate would reach 50% after a little more than the square root of the number of possibilities (about 1.18 times the square root; see "Birthday problem" for a more precise statement and formulas). The square root of 34^10 is 45,435,424, which is far above 144,258, but of course rand() is far from being a perfect or "true" RNG.
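To put numbers on this, here is a small sketch using the usual birthday-problem approximation p ≈ 1 - exp(-n(n-1)/2N). It assumes an ideal RNG (which rand() is not), and the function names are made up for illustration:

```php
<?php
// Birthday-problem approximation, assuming an ideal RNG (rand() is not one).
// $n = IDs generated so far, $space = number of possible IDs.
function collisionProbability(int $n, float $space): float
{
    // P(at least one duplicate) ~ 1 - exp(-n(n-1) / 2N); expm1() keeps tiny values accurate.
    return -expm1(-$n * ($n - 1) / (2 * $space));
}

// Number of IDs at which a duplicate becomes 50% likely: sqrt(2 * N * ln 2), about 1.18 * sqrt(N).
function fiftyPercentPoint(float $space): float
{
    return sqrt(2 * $space * log(2));
}

$space = 34 ** 10; // 2,064,377,754,059,776 possible 10-character references
printf("P(duplicate within 154020 IDs): %.2e\n", collisionProbability(154020, $space));
printf("50%% point: about %d IDs\n", (int) fiftyPercentPoint($space));
```

For the roughly 154,000 references in the question this comes out at about 6 in a million, while the 50% point is around 53.5 million IDs, so an early duplicate points at the generator rather than at bad luck.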
In any case, generating a unique random identifier using rand() or mt_rand() (rather than a cryptographic RNG such as random_int()) is a bad idea. Depending on whether the IDs have to be hard to guess, and whether the ID alone is enough to grant access to the resource, it may be better to use auto-incrementing record numbers rather than random numbers. See my section "Unique Random Identifiers" for further considerations.
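As a minimal sketch of the random_int() suggestion, the question's createReference() can keep its alphabet and simply swap rand() for random_int() (PHP 7.0+), which draws from the operating system's CSPRNG. This is an illustration, not a complete fix: a random scheme can still collide, so the UNIQUE constraint and a retry on failure are still needed.

```php
<?php
// Sketch: the question's createReference() with rand() replaced by random_int(),
// which draws from the operating system's CSPRNG (PHP 7.0+).
function createReference(int $length = 10): string
{
    $characters = 'ABCDEFGHIJKLMNPQRSTUVWXYZ123456789'; // 34 symbols, no O or 0
    $max = strlen($characters) - 1;
    $string = '';
    for ($i = 0; $i < $length; $i++) {
        $string .= $characters[random_int(0, $max)];
    }
    return $string;
}

echo createReference(), "\n";
```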
See also this question.
I am looking for an efficient way to generate 5 million unique codes with 7 characters (letters, numbers, special chars).
Basically, my first idea was to create a table with a unique constraint, then repeatedly generate a code, insert it into the database and see whether it is "accepted" (meaning it is a new code), until we have 5 million unique codes.
Alternatively, the idea was to generate an array of 5 million codes and then insert them all at once, to see how many of them make it into the database (i.e., are unique).
The third option was to generate one code at a time, check whether it already exists, and if not, insert it into the database.
My question now is which method I should use; there might be a problem I am overlooking. Or is there a better way?
Thanks a lot!
Pick an appropriate function to generate one random code; for illustration purposes I'll be using this:
function generateCode() {
    return substr(bin2hex(random_bytes(4)), 0, 7);
}
See https://stackoverflow.com/a/22829048/476 and the other answers there to pick something that works for you. The important point is that it uses a good source of randomness: random_bytes, random_int, openssl_random_pseudo_bytes or /dev/urandom. That keeps the chance of two calls producing the same output as low as the code length allows.
From there, simply use array keys to deduplicate the values:
$codes = [];
while (count($codes) < 5000000) {
    $codes[generateCode()] = null;
}
$codes = array_keys($codes);
If generateCode() is sufficiently random, there should be relatively few collisions, and generating codes this way shouldn't add much overhead. Even if it did, this is presumably a one-time operation, so efficiency isn't paramount. 5 million short strings should fit into memory without much trouble, and you can then insert them all into the database in a batch.
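To put a number on "few collisions": assuming generateCode() behaves like an ideal generator over its 16^7 = 268,435,456 possible outputs, the expected number of calls needed to collect n distinct codes is about -N ln(1 - n/N). A quick sketch (the function name is made up):

```php
<?php
// Expected number of calls to an ideal generator with $space equally likely
// outputs before $n distinct values have been collected:
// sum of N / (N - k) for k = 0..n-1, approximately -N * ln(1 - n / N).
function expectedDraws(int $n, int $space): float
{
    return -$space * log1p(-$n / $space);
}

$space = 16 ** 7; // 7 hex characters: 268,435,456 possible codes
$draws = expectedDraws(5000000, $space);
printf("about %d calls, i.e. about %d duplicates thrown away\n", (int) $draws, (int) ($draws - 5000000));
```

For 5 million codes this comes to roughly 47,000 extra calls, under 1% overhead, so the array-key approach stays cheap; a longer code would make duplicates rarer still.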
function generateRandomString($length = 7) {
    // you can update these with new chars
    $characters = '!##$%^&*()_+0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $charactersLength = strlen($characters);
    $randomString = '';
    // build exactly $length characters
    for ($i = 0; $i < $length; $i++) {
        // random_int() is a cryptographically secure alternative to rand() (PHP 7.0+)
        $randomString .= $characters[random_int(0, $charactersLength - 1)];
    }
    return $randomString;
}
Now use an array to store the codes:
$codes = array();
while (count($codes) < 5000000) {
    $code = generateRandomString();
    $codes[$code] = $code;
}
Each code is used as both the key and the value in $codes, so a duplicate simply overwrites the existing entry instead of being added twice.
Given the purpose for which you're generating unique identifiers (hard-to-guess coupon codes), I would suggest generating identifiers that combine a "unique" part and a "random" part.
The "unique" part can be a monotonically increasing counter (such as an auto-incremented row number in supporting databases), which can optionally serve as the seed of a full-period linear congruential generator (which cycles pseudorandomly through all possible values in its period before repeating).
The "random" part is simply a random number generated with a cryptographic random number generator (which for PHP is random_int). In general, the longer the random part is, the less predictable it will be.
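A minimal sketch of that idea, where everything specific is an assumption: the LCG constants are the classic a = 1103515245, c = 12345 with modulus 2^31 (full period by the Hull-Dobell theorem), the unique part is the LCG state written as six base-36 digits, and the random part is eight characters from random_int(). Note that the unique part is predictable to anyone who knows the constants, so all of the unguessability comes from the random part.

```php
<?php
// Hypothetical parameters for a full-period LCG modulo 2^31. By the Hull-Dobell
// theorem the period is the full 2^31 because c is odd and a - 1 is divisible by 4.
const LCG_M = 2147483648; // 2^31
const LCG_A = 1103515245;
const LCG_C = 12345;

// Next "unique" value: visits every integer in [0, m) once before repeating.
// For these constants a * x < 2^63, so 64-bit PHP integers do not overflow.
function lcgNext(int $x, int $a = LCG_A, int $c = LCG_C, int $m = LCG_M): int
{
    return ($a * $x + $c) % $m;
}

// Coupon = fixed-width base-36 unique part + random part from random_int().
function couponCode(int $state, int $randomChars = 8): string
{
    // 36^6 > 2^31, so six base-36 digits always hold the LCG state.
    $unique = str_pad(strtoupper(base_convert((string) $state, 10, 36)), 6, '0', STR_PAD_LEFT);
    $alphabet = 'ABCDEFGHJKLMNPQRSTUVWXYZ23456789';
    $random = '';
    for ($i = 0; $i < $randomChars; $i++) {
        $random .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $unique . $random;
}

$state = 0; // in practice, persist the last state between runs
for ($k = 0; $k < 3; $k++) {
    $state = lcgNext($state);
    echo couponCode($state), "\n";
}
```

Because the unique prefix has a fixed width and never repeats within the period, two codes can only be equal if their LCG states are equal, regardless of the random suffix.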
Moreover, for the purposes of generating unique coupon codes, there is little reason to limit yourself to 7-character codes, especially if end users won't be required to enter those codes directly. See also this question.
Do the codes you want need to be inserted into the database?
It is better not to query the database over and over to check whether each code is unique.
Instead, you can store the codes in an array first, before putting them into the database.
Pseudo-code:
Generate 5 million unique codes and insert them into a hash table or array. // as you insert a new one, check whether it already exists in the hash table
Then insert the contents of this hash table or array into the database.
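The final step could look roughly like this sketch, which splits the codes into parameterised multi-row INSERT statements. The table name coupons, the column code and the batch size of 1000 are hypothetical choices; the batch size just needs to stay under the database's placeholder and packet limits.

```php
<?php
// Splits codes into batches and builds one parameterised multi-row INSERT per batch.
// Table "coupons" and column "code" are hypothetical names.
function buildBatchInserts(array $codes, int $batchSize = 1000): array
{
    $batches = [];
    foreach (array_chunk($codes, $batchSize) as $chunk) {
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?)'));
        $batches[] = ['INSERT INTO coupons (code) VALUES ' . $placeholders, $chunk];
    }
    return $batches;
}

foreach (buildBatchInserts(['ABC1234', 'XYZ9876', 'QWE5555'], 2) as [$sql, $params]) {
    echo $sql, ' -- ', implode(',', $params), "\n";
}
```

With PDO, each pair can then be run as $pdo->prepare($sql)->execute($params), ideally between $pdo->beginTransaction() and $pdo->commit().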
I am using PHP's rand() function to generate coupon codes for my e-commerce system.
It worked fine for a while, but now I am getting a lot of errors saying that the code is already in the system.
This is the function I use:
function generateRandomString($length) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, strlen($characters) - 1)];
    }
    return $randomString;
}
And my codes are 32 characters long.
I tried a sample of ~150 codes and noticed that more than 50% of the generated codes were already in the system.
I have 4212 codes in the system. The odds of a 32-character random string over 36 different symbols producing a collision are basically zero, yet I get 50% collisions.
When I re-seeded the random number generator in my function by calling srand(), I no longer had any collisions.
But the PHP manual clearly says:
Note: As of PHP 4.2.0, there is no need to seed the random number generator with srand() or mt_srand() as this is now done automatically.
I am running PHP 5.5.9.
So my guess is that seeding is done only once per web server worker, and when the process is forked, the child is not reseeded, or something like that.
But that would clearly be a bug in Apache...
I am running PHP as an Apache module, on Apache/2.4.7 (Ubuntu) with mpm_prefork_module.
So do I still need to call srand() at the top of every script, despite the manual saying otherwise, and why? Is it Apache's fault or PHP's?
And yes, I am aware that I should not use this function for this purpose, and I will update it to use cryptographically secure random numbers. But I think this should not happen anyway, and I am still interested in what is going on!
If your codes are 32 characters long, then why don't you simply hash the current microtime() with md5()?
$coupon = md5( microtime() );
It's a simple one-liner. And if you want a touch of randomness, just add a salt:
$coupon = md5( microtime() . mt_rand( 0, 10000) );
That makes a duplicate very unlikely, though not impossible. As for why rand() is not as random as you expected:
PHP’s random number generators are seeded only once per process.
See this posting ...
http://phpsecurity.readthedocs.org/en/latest/Insufficient-Entropy-For-Random-Values.html
By the way, I don't think you need cryptographically secure hashes, only sufficiently random ones that can't easily be guessed. Even with a cryptographic hash, users enter the hash into the cart to redeem the coupon, so it's a simple matter to brute-force it anyway. You'd do better to invest time in allowing only "n" attempts, or "n" attempts per second, etc., to reduce the rate at which a brute-force attack can be carried out.
For example, I would just try all combinations of 32-character hashes. So it doesn't matter in the end, because these aren't plaintext entries like passwords, where you can hide your salting and hashing method. In either case, the number of active coupons would determine my success rate and the time it takes me... if you follow.
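A rough sketch of the "n attempts per time window" idea, under loud assumptions: the class name is hypothetical, the limit is tracked per client identifier (e.g. IP address) in a sliding window, and the state lives in a PHP array, whereas a real site would keep it in shared storage such as the database or a cache.

```php
<?php
// Hypothetical in-memory limiter: allows at most $maxAttempts coupon guesses per
// client within a sliding window of $windowSeconds. A real site would keep this
// state in shared storage (database, cache) rather than in one PHP process.
class AttemptLimiter
{
    private $maxAttempts;
    private $windowSeconds;
    private $attempts = []; // client id => timestamps of allowed attempts

    public function __construct(int $maxAttempts, int $windowSeconds)
    {
        $this->maxAttempts = $maxAttempts;
        $this->windowSeconds = $windowSeconds;
    }

    // Returns true (and records the attempt) if the client is still under the limit.
    public function allow(string $clientId, int $now): bool
    {
        $cutoff = $now - $this->windowSeconds;
        $recent = array_filter(
            $this->attempts[$clientId] ?? [],
            function (int $t) use ($cutoff): bool {
                return $t > $cutoff;
            }
        );
        if (count($recent) >= $this->maxAttempts) {
            $this->attempts[$clientId] = $recent;
            return false;
        }
        $recent[] = $now;
        $this->attempts[$clientId] = $recent;
        return true;
    }
}

$limiter = new AttemptLimiter(5, 60); // e.g. 5 guesses per minute per IP
var_dump($limiter->allow('203.0.113.7', time()));
```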
Implied in my answer is this:
PHP's random number generators are seeded only once per process. Forking creates a new process that starts as a copy of the parent, including its already-seeded generator state, so the child is not reseeded.
See
Calling rand/mt_rand on forked children yields identical results
and
http://wiki.openssl.org/index.php/Random_fork-safety
and
http://www.reddit.com/r/shittyprogramming/comments/2jvzgq/sometimes_it_takes_real_shitty_code_to_expose_an/
Additionally, this is not an issue specific to PHP, but to pseudorandom number generation in general.
See this: https://github.com/php/php-src/blob/d0cb715373c3fbe9dc095378ec5ed8c71f799f67/ext/standard/rand.c#L66-L68
Apparently the RNG is seeded on the first call to rand() (or when srand() is called explicitly).
Since fork() copies the parent's memory, the child also inherits the parent's seeded state and is never reseeded.
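The effect can be reproduced without forking. This is an illustration, not PHP's internal code: two runs that start from the same generator state produce exactly the same "random" values, which is what a parent and a forked child with an inherited, already-seeded state do. The function name and seed are arbitrary.

```php
<?php
// Illustration (not PHP's internal code): two generators started from the same
// state produce the same "random" sequence. A child created by fork() inherits
// the parent's already-seeded state, so both behave like these two runs.
function drawCodes(int $seed, int $count): array
{
    mt_srand($seed); // stands in for the state both processes share after fork()
    $values = [];
    for ($i = 0; $i < $count; $i++) {
        $values[] = mt_rand(0, 35);
    }
    return $values;
}

$parent = drawCodes(12345, 5);
$child  = drawCodes(12345, 5);
var_dump($parent === $child); // bool(true)
```

Calling mt_srand() with no argument in the child reseeds it with a random value; random_int() sidesteps the problem because it reads from the operating system's CSPRNG rather than from seeded in-process state.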
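function generateRandomString($length) {
    $characters = '0123456789abcdefghijklmnopqrstuvwxyz';
    $randomString = '';
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $characters[rand(0, strlen($characters) - 1)];
    }
    // prefix the current Unix timestamp (10 digits) and cut back to $length characters,
    // so codes generated in different seconds differ (when $length >= 10)
    return substr(time().$randomString, 0, $length);
}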
I have written a function to generate a random string of 7 alphanumeric characters, which I then insert into a MySQL database.
Here is the code :
function getRandomID(){
    $tmp = "";
    $characters=array("A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z","1","2","3","4","5","6","7","8","9");
    for ($i = 0; $i < 7; $i++) {
        $tmp .= $characters[rand(0, count($characters) - 1)];
    }
    return $tmp;
}
I am not checking for duplicates at the moment, because I anticipate there will be no more than 1000 entries in the database, and I've calculated that this function can return 35^7 = 64,339,296,875 possible values.
I am testing it out locally as well as on a live server.
The problem is that in the last hour alone, this function generated duplicate values twice.
I came upon 3 entries in the database all of which had the same random string.
I do not know what could have caused this as I tried numerous times afterwards and the problem wasn't reproducible.
Does anybody have any idea what could be going on here?
Many thanks in advance
Designing your code with the mindset of "meh, that's not going to happen" is a very risky game. Just do it properly once, so you don't have to keep coming back to your code to quick-fix minor things like these.
Do the duplicate check and you'll be solid.
You can create a function like
function stringExists($string)
{
    ...
    return $boolValue;
}
You can then easily write a while loop that keeps generating a new string for as long as the generated one already exists.
$duplicate = true;
while ($duplicate)
{
    $newString = getRandomID();
    $duplicate = stringExists($newString);
}
// Work with the newest string that is not a duplicate.
If you really want to get into it:
You can take a look at the documentation for rand() to find out what might be causing your problem. Besides, 3 entries don't mean anything if we don't know how many entries there are in total. Also, "random" functions are sometimes not as random as one might think: in some programming languages they can be called at any time but need some sort of initialization (seeding) before their output becomes properly random.
The timing of the inserts might also be part of the problem; there are plenty of threads on the internet, like this one on Stack Overflow, with interesting points about what can affect your "random"ness.
Whether or not that is the case here (as has been pointed out in the comments), you can be fairly sure to find an answer to your question in related threads and topics.
Short answer: Don't think about it and do a duplicate check, it's easy.
Note that you should, of course, put a UNIQUE constraint on your ID column in the database to begin with.
1. Random != unique. Collisions happen. Check that the value is unique before you insert it into the database, and/or put an integrity constraint on your DB to enforce uniqueness.
2. If you're using a very old version of PHP (e.g. pre-4.2), you have to seed the random number generator with srand().
3. Aside from #2, it's probably not your getRandomID() function but something else in your code that's reusing previous values.
If you need to enter unique data in the DB, you may use the PHP function uniqid() (http://ca3.php.net/uniqid).
The function generates a string based on the current time in microseconds, so in theory it is unique.
But it's still always good to check before inserting, or at least put a UNIQUE index on the field.
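To see what "based on the current time in microseconds" means in practice, a plain uniqid() can be decoded back into a timestamp. This sketch relies on PHP's current implementation, where the 13-character result is 8 hex digits of Unix seconds followed by 5 hex digits of microseconds:

```php
<?php
// Decodes a plain uniqid(): 8 hex digits of Unix seconds, then 5 hex digits of
// microseconds. Nothing in it is random, which is why it is easy to guess.
$id = uniqid();
$seconds = hexdec(substr($id, 0, 8));
$micros = hexdec(substr($id, 8, 5));
printf("%s = %d s + %d us (time() = %d)\n", $id, $seconds, $micros, time());
```

Passing true as the second argument (more_entropy) appends extra digits, making the result 23 characters, but those come from PHP's internal LCG rather than a CSPRNG.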
You could do something like this:
function randomString($length, $chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789") {
    $string = "";
    $charsLength = strlen($chars);
    for ($i = 0; $i < intval($length); $i++) {
        $string .= $chars[rand(0, $charsLength - 1)];
    }
    return $string;
}
The function above generates a random string of the given length from the given characters. This makes it a little more flexible than your implementation, in case you need to use it in another context later.
Then you could do a check like this:
$id = null;
do {
    $id = randomString(7);
} while (!isUnique($id));
// Do your insert here. You need to write isUnique() yourself, so that it checks
// whether the given string already exists in the database.
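A variation worth considering, sketched here as an assumption rather than as the approach above: instead of a separate isUnique() query (which can race with a concurrent insert of the same code), attempt the INSERT and let the UNIQUE constraint decide, retrying on a duplicate. The helper name and the $generate/$insert callables are hypothetical; $insert is expected to throw a PDOException whose SQLSTATE starts with 23 (integrity constraint violation) on a duplicate key.

```php
<?php
// Hypothetical helper: attempt the INSERT and let the UNIQUE constraint decide.
// $insert must throw a PDOException with an SQLSTATE of class 23 on a duplicate
// (MySQL and SQLite report 23000; PostgreSQL reports 23505 for unique violations).
function insertUniqueCode(callable $generate, callable $insert, int $maxTries = 5): string
{
    for ($try = 1; $try <= $maxTries; $try++) {
        $code = $generate();
        try {
            $insert($code);
            return $code;
        } catch (PDOException $e) {
            if (substr((string) $e->getCode(), 0, 2) !== '23') {
                throw $e; // some other database error: don't swallow it
            }
            // duplicate code: loop and try a freshly generated one
        }
    }
    throw new RuntimeException("No unique code after $maxTries attempts");
}
```

With PDO in exception mode, $insert could be something like function ($code) use ($stmt) { $stmt->execute([$code]); } for a prepared INSERT statement, and $generate could wrap randomString(7).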
I am looking for a PHP solution for generating random numbers that never generates the same number twice. If there is any such solution, please let me know.
I need it for one of my projects, which generates a unique key for each URL, and I don't want to check whether the generated number already exists in the database.
Thanks.
--------- EDIT ----------
I am using this random number generating method; is it helpful?
function randomString($length = 10, $chars = '1234567890') {
    // Alpha lowercase
    if ($chars == 'alphalower') {
        $chars = 'abcdefghijklmnopqrstuvwxyz';
    }
    // Numeric
    if ($chars == 'numeric') {
        $chars = '1234567890';
    }
    // Alpha Numeric
    if ($chars == 'alphanumeric') {
        $chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
    }
    // Hex
    if ($chars == 'hex') {
        $chars = 'ABCDEF1234567890';
    }
    $charLength = strlen($chars) - 1;
    $randomString = ''; // initialise before appending
    for ($i = 0; $i < $length; $i++) {
        $randomString .= $chars[mt_rand(0, $charLength)];
    }
    return $randomString;
}
Look at the php function uniqid():
http://php.net/manual/en/function.uniqid.php
It's impossible to generate a random number that is guaranteed to be unique: if the generator is dependent on state, then the output is by definition not random.
It is possible to generate a set of random numbers and remove the duplicates (although then the numbers again cease to be truly random).
Do you really need a random number, a sequence number, or a unique identifier? These are three separate things.
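A related approach, when the numbers come from a range small enough to hold in memory, is to shuffle the whole range once and hand the values out in order, so no value can ever repeat. The sketch below does a Fisher-Yates shuffle driven by random_int(); PHP's own shuffle() would also work but uses the Mersenne Twister rather than a CSPRNG. The function name is hypothetical.

```php
<?php
// Returns the integers $min..$max in a random order (Fisher-Yates shuffle driven
// by random_int()), so taking values one by one never yields a repeat.
function shuffledRange(int $min, int $max): array
{
    $values = range($min, $max);
    for ($i = count($values) - 1; $i > 0; $i--) {
        $j = random_int(0, $i); // pick from the not-yet-placed prefix, including $i itself
        [$values[$i], $values[$j]] = [$values[$j], $values[$i]];
    }
    return $values;
}

echo implode(' ', shuffledRange(1, 10)), "\n";
```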
which generate unique key for URL
MySQL and SQLite both support auto-increment column types which will be unique (effectively the same as a sequence number). MySQL even has a mechanism for ensuring uniqueness across equivalent nodes - even where they are not tightly coupled. Oracle provides sequence generators.
MySQL has a built-in UUID() function, and UUIDs can also be generated in PHP userland code (PHP's core has no UUID function), although since most DBMSs support surrogate key generation, there is little obvious benefit to this approach.
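For reference, a minimal userland sketch of a random (version 4) UUID in the RFC 4122 layout, built from random_bytes(); the function name is hypothetical, and a maintained library or extension may be preferable in production code.

```php
<?php
// Minimal userland sketch of a random (version 4) UUID in the RFC 4122 layout,
// built from random_bytes().
function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version 4 in the high nibble
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant bits 10xx
    // 32 hex digits grouped 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), "\n";
```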
You can use a database: every time a random number is generated, store it in the database, and the next time, compare the new random number with those already in the database.
Use a random number generator, keep stored the already generated values, discard and generate again when you get a duplicate number.
Ignore uniqids and stuff like that because they are just plain wrong.
There are no real "perfect and low price" random number generators!!
The best that mathematical functions can produce is pseudorandom output, which in the end seems random enough for most intents and purposes.
The mt_rand() function uses the Mersenne Twister, which is a pretty good PRNG,
so it's probably going to be good enough for most casual use.
Have a look here for more info: http://php.net/manual/en/function.mt-rand.php
A possible code implementation is:
<?php
$random = mt_rand($yourMin, $yourMax);
?>
EDIT:
You can find a very good explanation here:
Generate cryptographically secure random numbers in php
The typical answer is to use a GUID or UUID, although I avoid the forms that use only random numbers (e.g., avoid version 4 GUIDs or UUIDs).
I want to create a token generator that generates tokens that cannot be guessed by the user and that are still unique (to be used for password resets and confirmation codes).
I often see this code; does it make sense?
md5(uniqid(rand(), true));
According to a comment, uniqid($prefix, $moreEntropy = true) yields:
first 8 hex chars = Unix time, last 5 hex chars = microseconds.
I don't know how the $prefix parameter is handled.
So if you don't set the $moreEntropy flag to true, it gives a predictable outcome.
QUESTION: But if we use uniqid() with $moreEntropy, what does hashing it with md5 buy us? Is it better than:
md5(mt_rand())
Edit 1: I will store this token in a database column with a unique index, so I will detect collisions. Might be of interest.
rand() is a security hazard and should never be used to generate a security token: see rand() vs mt_rand() (look at the "static"-like images). But neither of these methods of generating random numbers is cryptographically secure. To generate secure secrets, an application needs to access a CSPRNG provided by the platform, operating system or hardware module.
In a web application, a good source of secure secrets is non-blocking access to an entropy pool such as /dev/urandom. As of PHP 5.3, PHP applications can use openssl_random_pseudo_bytes(), and the OpenSSL library will choose the best entropy source for your operating system; under Linux, this means the application will use /dev/urandom. This code snippet from Scott is pretty good:
// Returns a random integer in [$min, $max) using rejection sampling.
function crypto_rand_secure($min, $max) {
    $range = $max - $min;
    if ($range < 1) return $min; // not so random...
    $log = log($range, 2);
    $bytes = (int) ($log / 8) + 1; // length in bytes
    $bits = (int) $log + 1; // length in bits
    $filter = (int) (1 << $bits) - 1; // set all lower bits to 1
    do {
        $rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
        $rnd = $rnd & $filter; // discard irrelevant bits
    } while ($rnd >= $range); // reject out-of-range values to avoid modulo bias
    return $min + $rnd;
}

function getToken($length = 32) {
    $token = "";
    $codeAlphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
    $codeAlphabet .= "abcdefghijklmnopqrstuvwxyz";
    $codeAlphabet .= "0123456789";
    for ($i = 0; $i < $length; $i++) {
        // $max is exclusive, so passing strlen() always yields a valid index
        $token .= $codeAlphabet[crypto_rand_secure(0, strlen($codeAlphabet))];
    }
    return $token;
}
This is a copy of another question I found that was asked a few months before this one. Here is a link to the question and my answer: https://stackoverflow.com/a/13733588/1698153.
I do not agree with the accepted answer. According to PHPs own website "[uniqid] does not generate cryptographically secure tokens, in fact without being passed any additional parameters the return value is little different from microtime(). If you need to generate cryptographically secure tokens use openssl_random_pseudo_bytes()."
I do not think the answer could be clearer than this, uniqid is not secure.
I know the question is old, but it shows up in Google, so...
As others said, rand(), mt_rand() and uniqid() will not guarantee uniqueness... and even openssl_random_pseudo_bytes() should not be used, since it relies on deprecated features of OpenSSL.
What you should use to generate a random hash-like string (similar to an md5 hash) is random_bytes() (introduced in PHP 7). To generate one with the same length as an MD5 hash:
bin2hex(random_bytes(16));
If you are using PHP 5.x you can get this function by including random_compat library.
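For password-reset or confirmation links, a URL-safe Base64 encoding of random_bytes() is a common, more compact alternative to bin2hex(). This is a sketch; the function name is hypothetical and the 32-byte default is an arbitrary choice.

```php
<?php
// URL-safe token: random bytes Base64-encoded with '+/' swapped for '-_'
// and the '=' padding removed (32 bytes give 43 characters).
function urlSafeToken(int $bytes = 32): string
{
    return rtrim(strtr(base64_encode(random_bytes($bytes)), '+/', '-_'), '=');
}

echo urlSafeToken(), "\n";
```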
Define "unique". If you mean that two tokens cannot have the same value, then hashing isn't enough - it should be backed with a uniqueness test. The fact that you supply the hash algorithm with unique inputs does not guarantee unique outputs.
To answer your question, the problem is that you can't have a generator that is guaranteed to be both random and unique, because randomness by itself allows repeats; i.e., md5(mt_rand()) can lead to duplicates. What you want is "random-appearing" unique values. uniqid() provides the unique ID, rand() adds a random number to make it even harder to guess, and md5() masks the result to make it harder still. Nothing is unguessable; we just need to make it so hard that attackers wouldn't even want to try.
I ran into an interesting idea a couple of years ago.
Store two hash values in the database, one generated with md5($a) and the other with sha1($a). Then check that both values are correct. The point is, even if an attacker broke your md5(), they could not break both md5 AND sha1 in the near future.
Problem is: how can that concept be used with the token generating needed for your problem?
First, the scope of this kind of procedure is to create a key/hash/code, that will be unique for one given database. It is impossible to create something unique for the whole world at a given moment.
That being said, you should create a plain, visible string using a custom alphabet, and check the created code against your database (table).
If that string is unique, you then apply md5() to it, and the result can't be guessed by anyone or any script.
I know that if you dig deep into the theory of cryptographic generation you can find a lot of explanation about this kind of code generation, but when you put it to real usage it's really not that complicated.
Here's the code I use to generate a simple 10-character unique code.
$alphabet = "aA1!bB2#cC3#dD5%eE6^fF7&gG8*hH9(iI0)jJ4-kK=+lL[mM]nN{oO}pP\qQ/rR,sS.tT?uUvV>xX~yY|zZ`wW$";
$code = '';
$alphaLength = strlen($alphabet) - 1;
for ($i = 1; $i <= 10; $i++) {
    // start at 0 so the first character of the alphabet can also be picked
    $n = rand(0, $alphaLength);
    $code .= $alphabet[$n];
}
And here are some generated codes, although you can run it yourself to see it work:
SpQ0T0tyO%
Uwn[MU][.
D|[ROt+Cd#
O6I|w38TRe
Of course, there can be a lot of "improvements" that can be applied to it, to make it more "complicated", but if you apply a md5() to this, it'll become, let's say "unguessable" . :)
MD5 is a decent algorithm for producing data-dependent IDs. But if you have more than one item with the same bitstream (content), you will produce two identical MD5 "IDs".
So if you apply it to the output of a function that never returns the same value twice, you are quite safe (note that rand() does not actually guarantee this).
But for a stronger distribution of keys, I'd personally use SHA-1 or another SHA variant, etc., but you will still have the problem that identical data leads to identical keys.