Randomness of hashing functions such as SHA1 - php

I'm trying to generate an even distribution of random numbers based on User IDs. That is, I want a random number for each user that remains the same any time that user requests the random number (but the user doesn't need to store the number). My current algorithm (in PHP) to count distribution, for a given large array of userIDs $arr is:
$range = 100;
$results = array_fill(0, $range, 0);
foreach ($arr as $userID) {
    $hash = sha1($userID, TRUE);
    $data = unpack('L*', $hash);
    $seed = 0;
    foreach ($data as $integer) {
        $seed ^= $integer;
    }
    srand($seed);
    ++$results[rand(0, $range-1)];
}
One would hope that this generates an approximately even distribution. But it doesn't! I've checked to make sure that each value in $arr is unique, but one entry in the list always gets much more activity than all the others. Is there a better method of generating a hash of a string that will give an approximately even distribution? Apparently SHA is not up to the job. I've also tried MD5 and a simple crc32, all with the same results!?
Am I crazy? Is the only explanation that I have not, in fact, verified that each entry in $arr is unique?

The sha1 hash values are quite uniformly distributed. After executing this:
<?php
$n = '';
$salt = 'this is the salt';
for ($i=0; $i<100000; $i++) {
    $n .= implode('', unpack('L*', sha1($i . $salt)));
}
$count = count_chars($n, 1);
$sum = array_sum($count);
foreach ($count as $k => $v) {
    echo chr($k)." => ".($v/$sum)."\n";
}
?>
You get this result, the relative frequency of each decimal digit:
0 => 0.083696057956298
1 => 0.12138983759522
2 => 0.094558704004335
3 => 0.07301783188663
4 => 0.092124978934097
5 => 0.088623772577848
6 => 0.11390989553446
7 => 0.092570936094051
8 => 0.12348330833868
9 => 0.11662467707838
You could use the sha1 as a simple random number generator based on the user's id.
In hexadecimal, the distribution is close to perfect:
// $n .= sha1($i . $salt, false);
0 => 0.06245515
1 => 0.06245665
2 => 0.06258855
3 => 0.0624244
4 => 0.06247255
5 => 0.0625422
6 => 0.0625246
7 => 0.0624716
8 => 0.06257355
9 => 0.0625005
a => 0.0625068
b => 0.0625086
c => 0.0624463
d => 0.06250535
e => 0.06250895
f => 0.06251425

mt_rand() should have a very even distribution over the range requested. When users are created, create a random seed for that user using mt_rand() then always mt_srand() with that seed for that user.
To get an even distribution from 0 to 99, as your example, just mt_rand(0,$range-1). Doing tricks with sha1, md5, or some other hashing algorithm won't really give you a more even distribution than straight random.

It would be helpful if you posted the results that led you to conclude that you're not getting an appropriate distribution, but it's likely that one of three things is going on here:
1. You're simply looking at too small a sample, and/or you're misinterpreting your data. As others have commented, it's completely reasonable for a sample from a uniform distribution not to look perfectly uniform.
2. You'd see better results if you used mt_rand instead of rand.
3. (Personally, I think this is most likely) You're over-optimizing your seed generation, and losing data / pigeonholing / otherwise hurting your ability to generate random numbers. Reading your code, I think you're doing the following:
   Step 1: Generating a uniform random hash of an unknown value
   Step 2: Splitting the hash into longs and bitwise XOR-ing them together
   Step 3: Setting rand's seed, and generating a random number off that seed
But why are you doing step 2? What benefit do you think you're getting from that? Try taking that step out, and just use the first value you extract from the hash as your seed, and see if that doesn't give you better results. Good rule of thumb with randomness: don't try to outsmart the people who implemented the algorithms, it can't be done :)
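For what it's worth, here is a minimal sketch that goes one step further than the suggestion above: it skips srand()/rand() entirely and reduces the first 32 bits of the digest straight to the requested range. The helper name userBucket() is made up for illustration, and the sketch assumes a 64-bit PHP build so the unpacked value is never negative.

```php
<?php
// Hypothetical helper (not from the question): maps a user ID to a stable
// bucket in [0, $range) using only the hash, with no srand()/rand() involved.
function userBucket(string $userID, int $range = 100): int
{
    // 'N' reads 4 bytes as an unsigned 32-bit big-endian integer
    // (assumed non-negative, i.e. a 64-bit PHP build).
    $first = unpack('N', substr(sha1($userID, true), 0, 4))[1];
    return $first % $range;
}

// Count how 100,000 distinct IDs spread over 100 buckets.
$results = array_fill(0, 100, 0);
for ($id = 1; $id <= 100000; $id++) {
    $results[userBucket((string) $id)]++;
}
printf("min=%d max=%d\n", min($results), max($results));
```

With 100,000 unique IDs every bucket should land somewhere near 1,000. If one bucket towers over the rest, the input most likely contains repeated IDs.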

While all of the answers here are good, I will provide the answer that was correct for me, and that is that I was, indeed, crazy. Apparently the uniq command does not, in fact, work like I expected it to (data needs to be sorted first). So the explanation was indeed that the values in $arr were not unique.
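For anyone hitting the same problem, a quick way to check for repeats inside PHP itself, without relying on sort/uniq, might look like the sketch below. The sample $arr is invented for illustration.

```php
<?php
// Invented sample data standing in for the real list of user IDs.
$arr = ['u1', 'u2', 'u3', 'u1', 'u2', 'u1'];

// array_count_values() tallies each distinct value; keep only the repeats.
$duplicates = array_filter(array_count_values($arr), fn($n) => $n > 1);

// Most frequent first.
arsort($duplicates);
print_r($duplicates);

// A yes/no answer: are all entries unique?
var_dump(count($arr) === count(array_unique($arr)));
```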

Related

Why does rand seem more random than mt_rand when only doing (1, 2)?

I have some elements that I'm trying to randomize at 50% chance of output. Wrote a quick if statement like this.
$rand = mt_rand(1, 2);
if ( $rand == 1 ) {
    echo "hello";
} else {
    echo "goodbye";
}
I notice that when using mt_rand, "goodbye" is output many times in a row, whereas, if I just use "rand," it's a more equal distribution.
Is there something about mt_rand that makes it worse at handling a simple 1-2 randomization like this? Or is my dataset so small that these results are just anecdotal?
To get the same value "many times in a row" is a possible outcome of a randomly generated series. It would not be completely random if such a pattern were not allowed to occur. If you would continue taking samples, you would also find that the opposite value will sometimes occur several times in a row, provided you keep going long enough.
One way to test that the generated values are indeed quite random and uniformly distributed, is to count how many times the same value is generated as the one generated before, and how many times the opposite value is generated.
Note that the strings "hello" and "goodbye" don't add much useful information; we can just look at the values 1 and 2.
Here is how you could do such a test:
// $countAfter[$i][$j] will contain the number of occurrences of
// a pair $i, $j in the randomly generated sequence.
// So there is an entry for [1][1], [1][2], [2][1] and [2][2]:
$countAfter = [1 => [1 => 0, 2 => 0],
               2 => [1 => 0, 2 => 0]];
$prev = 1; // We assume for simplicity that the "previously" generated value was 1
for ($i = 0; $i < 10000; $i++) { // Produce a large enough sample
    $n = mt_rand(1, 2);
    $countAfter[$prev][$n]++; // Increase the counter that corresponds to the generated pair
    $prev = $n;
}
print_r($countAfter);
You can see in this demo that the 4 numbers that are output do not differ that much. Output is something like:
Array (
[1] => Array (
[1] => 2464
[2] => 2558
)
[2] => Array (
[1] => 2558
[2] => 2420
)
)
This means that 1 and 2 are generated about an equal number of times and that a repetition of a value happens just as often as a toggle in the series.
Obviously these numbers are rarely exactly the same, since that would mean the last couple of generated values would not be random at all, as they would need to bring those counts to the desired value.
The important thing is that your sample needs to be large enough to see the pattern of a uniform distribution confirmed.
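To get a feel for how long a streak of identical values can be in a fair sequence, a small sketch like the following measures the longest run in a sample. The function name longestRun() is an assumption made for this example.

```php
<?php
// Hypothetical helper: length of the longest streak of identical
// consecutive values in a sequence.
function longestRun(array $values): int
{
    $longest = 0;
    $current = 0;
    $prev = null;
    foreach ($values as $v) {
        // Extend the streak if the value repeats, otherwise start a new one.
        $current = ($v === $prev) ? $current + 1 : 1;
        $longest = max($longest, $current);
        $prev = $v;
    }
    return $longest;
}

$sample = [];
for ($i = 0; $i < 10000; $i++) {
    $sample[] = mt_rand(1, 2);
}
echo "Longest run in 10000 draws: ", longestRun($sample), "\n";
```

For a fair two-valued sequence of length n the longest run tends to be around log2(n), so a streak of roughly a dozen identical values in 10,000 draws would be entirely ordinary.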

Unique 6 digit hex Code generator in PHP

I am working on a PHP project and I would like to generate an ID that is unique and user-friendly, like the BlackBerry Messenger user PIN.
I would prefer it to be six characters long.
Is there any algorithm or combination of PHP functions I can use?
If not, what is my best bet?
I am a newbie.
Lowercase:
sprintf('%06x', mt_rand(0, 16777215))
Uppercase:
sprintf('%06X', mt_rand(0, 16777215))
(demo)
Reference:
mt_rand()
sprintf()
16777215 is 16^6 - 1. It's likely that you'll get dupes so you need to store previous values somewhere (typically a database) and check for uniqueness.
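A minimal sketch of that generate-and-check loop is shown below. The $taken array is only a stand-in for the database lookup, the function name generateUniqueHexCode() is hypothetical, and random_int() (PHP 7+) is swapped in for mt_rand().

```php
<?php
// Hypothetical helper: keeps drawing 6-digit hex codes until it finds one
// that is not yet in $taken, then records it. $taken stands in for a
// database table with a unique index on the code column.
function generateUniqueHexCode(array &$taken): string
{
    do {
        $code = sprintf('%06X', random_int(0, 0xFFFFFF));
    } while (isset($taken[$code]));
    $taken[$code] = true;
    return $code;
}

$taken = [];
for ($i = 0; $i < 1000; $i++) {
    $last = generateUniqueHexCode($taken);
}
echo $last, " (", count($taken), " codes issued)\n";
```

In a real application the check and the insert would need to happen atomically, for example by relying on a unique constraint and retrying on failure.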
Another solution is to generate all 17 million codes at once, shuffle them and pick one each time.
First time:
$all_codes = range(0, 16777215);
shuffle($all_codes);
// The flag column is named "used" to match the SELECT/UPDATE statements below
$sql = 'INSERT INTO all_codes (id, code, used) VALUES (?, ?, ?)';
foreach ($all_codes as $index => $code) {
    $values = [
        $index+1,
        sprintf('%06X', $code),
        false
    ];
    $your_database_library->query($sql, $values);
}
This script is pure non-optimised brute force so you'll need to increase PHP memory limit (it needs like half GB) but it's a one-time task.
Then, every time you need a code (let's assume MySQL):
SELECT id, code
FROM all_codes
WHERE used = 0
ORDER BY 1
LIMIT 1
FOR UPDATE;
UPDATE all_codes
SET used = 1
WHERE id = ?;
Alternatively, you could store them in sequence and pick randomly among the unused, making sure to implement a solution that's fast in your DBMS (because you're randomising every time) but that looks like more work :)
This is a short and easy solution. Of course, you have to save the PINs so you can check whether one is already taken.
$color = substr(md5(rand()), 0, 6);
Try it
Every refresh gives you a random 6-digit hex code (uniqueness is not guaranteed, so you still need to check for duplicates):
<?php
$hex_string = "0123456789ABCDEF";
$hex_6_digit = "";
for($i=0; $i<6; $i++) {
    // Square brackets: the curly-brace string offset syntax was removed in PHP 8
    $hex_6_digit .= $hex_string[rand(0,strlen($hex_string)-1)];
}
echo $hex_6_digit;
?>
For example:
$uniq = substr(uniqid(),0,6);
Details: http://php.net/uniqid
Here is a way to get a hexadecimal string of up to 8 characters (shorter when the random value is small).
echo sprintf("%x", mt_rand());

php is converting base 16 to base 2 incorrectly

Why am I getting this output from my function?
echo $var = hash_hmac('ripemd160', 'http://www.weburlhere.org', 0, 0);
echo "\r\n";
echo $converted = base_convert($var, 16, 2);
echo "\r\n";
Outputs:
407a9d8868a678e12d9fc0264f9ae11e8761b557
0000000000000000000000000000000000000000000000000000000000000000
Whereas base_convert($var, 16, 10) outputs
1421821959848150668406846884086820088622688484226 correctly.
Also, as a side-question (bonus points for this!) I'm assuming ripemd160 gives me a unique identifier for each input preimage. I'm attempting to make a url-shortening service that shortens a URL from any length to its hash digest (I'm assuming converting the binary to base64 with base64_encode($converted) will shorten the URL even more). Is this correct, and is this a good idea?
The PHP documentation for base_convert() says:
base_convert() may lose precision on large numbers due to properties
related to the internal "double" or "float" type used. Please see the
Floating point numbers section in the manual for more specific
information and limitations.
So, you cannot rely on this function to convert large numbers. However, it is very easy to write a function manually that converts from base 16 to base 2.
// Named hex_to_bits() because PHP >= 5.4 already defines a built-in hex2bin()
function hex_to_bits($hex) {
    // Lookup table: hex digit => 4-bit binary string
    $table = array('0000', '0001', '0010', '0011',
                   '0100', '0101', '0110', '0111',
                   '1000', '1001', 'a' => '1010', 'b' => '1011',
                   'c' => '1100', 'd' => '1101', 'e' => '1110',
                   'f' => '1111');
    $bin = '';
    for($i = 0; $i < strlen($hex); $i++) {
        $bin .= $table[strtolower(substr($hex, $i, 1))];
    }
    return $bin;
}
echo hex_to_bits('407a9d8868a678e12d9fc0264f9ae11e8761b557');
I'm assuming converting the binary to base64 with
base64_encode($converted) will shorten the URL even more). Is this
correct, and is this a good idea
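Yes, it is shorter, provided you base64-encode the raw 20-byte digest rather than the string of 0s and 1s. The 160 bits take 160 characters in binary notation, 40 in base 16 and about 27 in base 64, so base64 is roughly 6 times shorter than binary and 1.5 times shorter than base 16. However, ripemd160 does not guarantee a unique identifier for every link. Collisions are possible in principle, although for a 160-bit hash an accidental one is extremely unlikely.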
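As a rough sketch of the shortening idea, the raw digest can be base64-encoded directly, with no detour through a binary string. Here the plain hash() function is used instead of hash_hmac(), and the URL-safe character substitution is an added assumption so the token can appear in a path.

```php
<?php
$url = 'http://www.weburlhere.org';

$hex = hash('ripemd160', $url);        // 40 hexadecimal characters
$raw = hash('ripemd160', $url, true);  // the same digest as 20 raw bytes

// Base64-encode the raw bytes, swap in URL-safe characters and drop the
// trailing '=' padding.
$token = rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');

echo strlen($hex), " hex chars vs ", strlen($token), " base64 chars\n";
echo $token, "\n";
```

Encoding the 160-character string of 0s and 1s with base64_encode() would make it longer, not shorter, because base64 expands its input by a third.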
According to the PHP manual, base_convert() is limited by the precision of PHP's float (double) type, so large numbers lose precision. You can use the gmp library to deal with numbers of arbitrary length.
A sample code also from the PHP manual page:
/* use gmp library to convert base. gmp will convert numbers > 32bit
* #author lindsay at bitleap dot com
* #link you can execute this code at http://ideone.com/FT29qo
*/
function gmp_convert($num, $base_a, $base_b)
{
return gmp_strval ( gmp_init($num, $base_a), $base_b );
}

I'm creating a random array in PHP and my code doesnt seem to output a truly random answer

I want to construct an array of 3 offers that output in a random order. I have the following code and whilst it does output 3 random offers it doesn't appear to be random. The first value in the generated array always seems to be from the first 2 records in my offers table. The offers table only has 5 records in it (I don't know if this is affecting things).
$arrayOfferCount = $offerCount-1;
$displayThisManyOffers = 3;
$range = range(0, $arrayOfferCount);
$vals = array_rand($range, $displayThisManyOffers);
Any help or advice would be appreciated.
Working fine here. Benchmark it over lots of runs instead of just gut feeling... here it is for 1,000 tries:
<?php
$offerCount = 5;
$arrayOfferCount = $offerCount-1;
$displayThisManyOffers = 3;
$range = range(0, $arrayOfferCount);
$counts = array_fill(0, $offerCount, 0); // initialise the counters to avoid undefined-key warnings
for($i = 0; $i < 1000; $i++) {
    $vals = array_rand($range, $displayThisManyOffers);
    foreach($vals as $val) {
        $counts[$val]++;
    }
}
sort($counts);
print_r($counts);
Generates:
Array
(
[0] => 583
[1] => 591
[2] => 591
[3] => 610
[4] => 625
)
I know that mt_rand() is a much better PRNG.
However, in your case you can let the database select them for you:
SELECT * FROM ads ORDER BY RAND() LIMIT 0, 3
It is probably picking randomly which offers to display, but displaying them in the same order they appear in your array. If that is the case, the third record can only come first when exactly the last 3 of the 5 are chosen. That is 1 of the 10 possible ways to choose 3 out of 5, so you should see the third record in first place about once every 10 runs.
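A small sketch of that behaviour and one possible fix: array_rand() returns the chosen keys in their original array order, so shuffling the keys afterwards randomises the display order as well. The offer labels are invented for the example.

```php
<?php
// Invented sample data standing in for the 5 rows of the offers table.
$offers = ['offer A', 'offer B', 'offer C', 'offer D', 'offer E'];

// array_rand() picks 3 random keys, returned in original array order.
$picked = array_rand($offers, 3);

// Shuffle a copy of the picked keys so the display order is random too.
$keys = $picked;
shuffle($keys);

foreach ($keys as $key) {
    echo $offers[$key], "\n";
}
```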
array_rand seems not to work properly sometimes (see PHP-Manual comments).
Workaround: Get the array size and pick a random index using the function mt_rand

How can I generate a 6 digit unique number?

How can I generate a 6 digit unique number? I have verification mechanisms in place to check for duplicate entries.
$six_digit_random_number = random_int(100000, 999999);
As all numbers between 100,000 and 999,999 are six digits, of course.
If you want it to start at 000001 and go to 999999:
$num_str = sprintf("%06d", mt_rand(1, 999999));
Mind you, it's stored as a string.
Another one:
str_pad(mt_rand(0, 999999), 6, '0', STR_PAD_LEFT);
Anyway, for uniqueness, you will have to check that your number hasn't been already used.
You say that you check for duplicates, but be cautious: once most numbers have been used, the number of attempts (and therefore the time taken) needed to get a new number will increase, possibly resulting in very long delays and wasted CPU resources.
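To put rough numbers on that slowdown: if candidates are drawn uniformly and retried until an unused one turns up, the average number of draws is total / (total - used). The sketch below just evaluates that formula, and the function name expectedAttempts() is made up for illustration.

```php
<?php
// Hypothetical helper: average number of uniform random draws needed to hit
// an unused value when $used of $total values are already taken.
function expectedAttempts(int $used, int $total = 1000000): float
{
    if ($used < 0 || $used >= $total) {
        throw new InvalidArgumentException('$used must be in [0, $total)');
    }
    return $total / ($total - $used);
}

foreach ([0, 500000, 900000, 990000, 999999] as $used) {
    printf("%6d used: about %.0f attempts on average\n", $used, expectedAttempts($used));
}
```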
I would advise, if possible, to keep track of available IDs in an array, then randomly choose an ID among the available ones, by doing something like this (if ID list is kept in memory):
$arrayOfAvailableIDs = array_map(function($nb) {
    return str_pad($nb, 6, '0', STR_PAD_LEFT);
}, range(0, 999999));
$nbAvailableIDs = count($arrayOfAvailableIDs);
// pick a random ID; array_splice() returns an array of the removed elements, so take its first item
$newID = array_splice($arrayOfAvailableIDs, mt_rand(0, $nbAvailableIDs-1), 1)[0];
$nbAvailableIDs--;
You can do something similar even if the ID list is stored in a database.
Here's another one:
substr(number_format(time() * rand(),0,'',''),0,6);
There are some great answers, but many use functions that are flagged as not cryptographically secure. If you want a random 6 digit number that is cryptographically secure you can use something like this:
$key = random_int(0, 999999);
$key = str_pad((string) $key, 6, '0', STR_PAD_LEFT); // pad with the string '0'
return $key;
This will also include numbers like 000182 and others that would otherwise be excluded from the other examples.
You can also use a loop to make each digit random and generate random number with as many digits as you may need:
function generateKey($keyLength) {
    // Set a blank variable to store the key in
    $key = "";
    for ($x = 1; $x <= $keyLength; $x++) {
        // Set each digit
        $key .= random_int(0, 9);
    }
    return $key;
}
For reference, random_int: "Generates cryptographically secure pseudo-random integers that are suitable for use where unbiased results are critical, such as when shuffling a deck of cards for a poker game." - php.net/random_int
<?php
$file = 'count.txt';
//get the number from the file
$uniq = file_get_contents($file);
//add +1
$id = $uniq + 1 ;
// add that new value to text file again for next use
file_put_contents($file, $id);
// your unique id ready
echo $id;
?>
I hope this will work fine. I use the same technique on my website.
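One caveat with the counter file: two requests arriving at the same moment can read the same value and hand out the same ID. A hedged sketch of the same idea with an exclusive lock around the read-increment-write cycle follows. The function name nextId() and the file location are assumptions.

```php
<?php
// Hypothetical helper: increments a counter stored in $file while holding an
// exclusive lock, so concurrent requests cannot receive the same ID.
function nextId(string $file): int
{
    $fh = fopen($file, 'c+'); // create if missing, do not truncate
    if ($fh === false) {
        throw new RuntimeException("Cannot open $file");
    }
    try {
        if (!flock($fh, LOCK_EX)) {
            throw new RuntimeException("Cannot lock $file");
        }
        $next = (int) stream_get_contents($fh) + 1; // empty file counts as 0
        ftruncate($fh, 0);
        rewind($fh);
        fwrite($fh, (string) $next);
        fflush($fh);
        flock($fh, LOCK_UN);
        return $next;
    } finally {
        fclose($fh);
    }
}

// Example location only; adjust the path for a real deployment.
$counterFile = sys_get_temp_dir() . '/count_example.txt';
echo nextId($counterFile), "\n";
```

Sequential IDs are unique but easy to guess, so this only fits cases where predictability is not a problem.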
In PHP 7.0+ I would suggest random_int($min, $max) over mt_rand().
$randomSixDigitInt = \random_int(100000, 999999);
From php.net, on mt_rand():
Caution
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using random_int(), random_bytes(), or openssl_random_pseudo_bytes() instead.
So this depends mostly on context. I'll also add that as of PHP 7.1.0 rand() is now an alias to mt_rand().
Cheers
$characters = '123456789';
$charactersLength = strlen($characters);
$randomString = '';
for ($i = 0; $i < 6; $i++) {
$randomString .= $characters[rand(0, $charactersLength - 1)];
}
$pin=$randomString;
This will generate random 6 digit number
<?php
mt_rand(100000,999999);
?>
I would use an algorithm, brute force could be as follows:
First time through loop:
Generate a random number between 100,000 through 999,999 and call that x1
Second time through the loop
Generate a random number between 100,000 and x1, call this xt2, then generate a random number between x1 and 999,999, call this xt3, then randomly choose xt2 or xt3 and call the result x2
Nth time through the loop
Generate random number between 100,000 and x1, x1 and x2, and x2 through 999,999 and so forth...
watch out for endpoints, also watch out for x1
<?php echo rand(100000,999999); ?>
you can generate random number
You can use $uniq = round(microtime(true));
It generates a 10-digit number based on the current time,
which will not repeat as long as you never generate two IDs within the same second.
Try this using uniqid and hexdec,
echo hexdec(uniqid());
Among the answers given here before this one, the one by "Yes Barry" is the most appropriate one.
random_int(100000, 999999)
Note that here we use random_int, which was introduced in PHP 7 and uses a cryptographic random generator, something that is important if you want random codes to be hard to guess. random_bytes was also introduced in PHP 7 and likewise uses a cryptographic random generator.
Many other solutions for random value generation, including those involving time(), microtime(), uniqid(), rand(), mt_rand(), str_shuffle(), array_rand(), and shuffle(), are much more predictable and are unsuitable if the random string will serve as a password, a bearer credential, a nonce, a session identifier, a "verification code" or "confirmation code", or another secret value.
The code above generates a string of 6 decimal digits. If you want to use a bigger character set (such as all upper-case letters, all lower-case letters, and the 10 digits), this is a more involved process, but you have to use random_int or random_bytes rather than rand(), mt_rand(), str_shuffle(), etc., if the string will serve as a password, a "confirmation code", or another secret value. See an answer to a related question, and see also: generating a random code in php?
I also list other things to keep in mind when generating unique identifiers, especially random ones.
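A minimal sketch of the "bigger character set" case mentioned above, built on random_int(). The function name randomCode() and the default alphabet are choices made for this example.

```php
<?php
// Hypothetical helper: builds a random string of $length characters from
// $alphabet, choosing each character with the cryptographically secure
// random_int().
function randomCode(
    int $length,
    string $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'
): string {
    $max = strlen($alphabet) - 1;
    $code = '';
    for ($i = 0; $i < $length; $i++) {
        $code .= $alphabet[random_int(0, $max)];
    }
    return $code;
}

echo randomCode(12), "\n";        // 12 alphanumeric characters
echo randomCode(6, '0123456789'), "\n"; // 6 decimal digits, leading zeros allowed
```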
This is the easiest method to generate a 6-digit random number:
$data = random_int(100000, 999999);
echo $data;
