SHA512 to generate random numbers in PHP

I'm filling an array with random numbers using $blockhash[$i] = rand().time().rand()
Then, for each random number in that array, I calculate the corresponding SHA512 HMAC:
$SecretKey = "60674ccb549f1988439774adb82ff187e63a2dfd403a0dee852e4e4eab75a0b3";
$sha = hash_hmac('sha512', $value, $SecretKey);
Split it:
$pool = str_split($sha, 2);
Then I get the first number from the $pool array, convert hex to dec and limit it within 1 and 50:
$dec = hexdec($pool[0]) % 50 + 1;
The problem is that the numbers are not that random and I don't know why. I'm counting the frequency of each number from 1 to 50, but the numbers 1, 2, 3, 4, 5 and 6 come up more often than the others. See graph.
Why is this happening, and how can I fix it? Thanks!

The 2 hex characters you are converting to decimal will be in the range 0-255. You mod that by 50 and add 1, so the results 1-6 (remainders 0-5, plus 1) each occur 6 times across the 256 possible byte values, while every other number occurs only 5 times. That accounts for a ~20% increase in those numbers coming up.
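To make the counting argument concrete, here is a minimal sketch that pushes every possible byte value through the same "% 50 + 1" step and tallies the results. The function name moduloHistogram is made up for this illustration.

```php
<?php
// Count how often each result 1..$modulus is produced when every possible
// byte value 0..255 goes through "% $modulus + 1".
function moduloHistogram(int $modulus = 50): array
{
    $counts = array_fill(1, $modulus, 0);
    for ($byte = 0; $byte <= 255; $byte++) {
        $counts[$byte % $modulus + 1]++;
    }
    return $counts;
}

$counts = moduloHistogram();
// Results 1..6 are hit 6 times each, results 7..50 only 5 times each.
echo "1 => {$counts[1]}, 6 => {$counts[6]}, 7 => {$counts[7]}, 50 => {$counts[50]}\n";
// prints "1 => 6, 6 => 6, 7 => 5, 50 => 5"
```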

You get 1-6 more often because you fetch two hexadecimal digits from the hash. That's one byte, so it can store values from 0 to 255. Then you use modulo 50. As a result you get the ranges 0-49, 50-99, 100-149, 150-199, 200-249 and... 250-255. This last, incomplete range is responsible for the extra prevalence of 1-6 in your results.
Solution: just use mt_rand(1,50);
[edit]
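If you really need to convert a number from the 0-255 range to the 1-50 range, one option is scaling and rounding. Note that the spread is still not perfectly uniform, because 256 values cannot be divided evenly among 50 outcomes.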
$number = round(($byteValue)/(255/49))+1;
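If the numbers have to come out of the hash bytes, one way to remove the bias entirely is rejection sampling: skip any byte in 250-255 and move on to the next byte of the digest. This is a different technique from the scaling above, and the function name unbiasedFromHash is invented for this sketch.

```php
<?php
// Return a number in 1..50 from a hex digest without modulo bias.
// Bytes 250..255 are skipped, so each remainder 0..49 is reachable
// from exactly 5 of the accepted byte values 0..249.
function unbiasedFromHash(string $shaHex): ?int
{
    foreach (str_split($shaHex, 2) as $pair) {
        $byte = hexdec($pair);
        if ($byte < 250) {
            return $byte % 50 + 1;
        }
    }
    // Every byte was >= 250: practically impossible for a 64-byte digest,
    // but the caller should be ready to hash again in that case.
    return null;
}

// Example key and value are placeholders, not the ones from the question.
$sha = hash_hmac('sha512', 'example-value', 'example-key');
$number = unbiasedFromHash($sha);
```

A SHA512 digest has 64 bytes, so running out of acceptable bytes should essentially never happen in practice.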

Neither rand() nor mt_rand() generates truly random values.
As the manual states:
This function does not generate cryptographically secure values, and should not be used for cryptographic purposes. If you need a cryptographically secure value, consider using openssl_random_pseudo_bytes() instead.
See Better Random Generating PHP for a Stack Overflow question that raises the same issue and has some good answers.
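As a sketch of the cryptographically secure route: on PHP 7 or later, random_int() draws from the operating system's secure generator and returns uniformly distributed integers. It is swapped in here for the openssl_random_pseudo_bytes() call that the manual quote mentions, and the helper name drawNumbers is made up.

```php
<?php
// Draw $count integers in [$min, $max] using random_int() (PHP 7+),
// which is backed by a cryptographically secure source.
function drawNumbers(int $count, int $min = 1, int $max = 50): array
{
    $numbers = [];
    for ($i = 0; $i < $count; $i++) {
        $numbers[] = random_int($min, $max);
    }
    return $numbers;
}

$sample = drawNumbers(1000);
```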

Related

how unique is a portion of md5?

I'm having a question regarding the uniqueness of md5 function.
I know that md5 (with microtime value) are not unique, however, they are pretty unique :)
How can I calculate the probability of a collision between two portions of md5 hashes?
For example, the following PHP generates an 8-character string from an md5 result:
substr(md5(microtime()), 0, 8);
A second scenario - What if the index of it is unique (so it gets a different portion of the hash each time)?
substr(md5(microtime()), rand(0, 32), 8);
There are 2^32 combinations of 8 hexadecimal digits. Even if they are completely random, you can only generate about 65000 such strings, on average, before you get 2 that are the same.
Using md5(), with a random index or not, doesn't significantly change anything as long as all the microtime() values you use are unique. But if you are generating these too fast, or across many machines, then the situation is much, much worse, because there's a good chance you could end up using the same microtime() value twice.
Since you are asking about the uniqueness of your string, the answer is really a probability: the more possible characters you use and the longer the random string, the lower the chance of getting the same string twice.
So, to get a guaranteed unique string, you need to store the strings in your DB and compare each new random string against them. If you find a match, generate a fresh string and repeat until you get one that is unique.
It depends on how many "sub-hashes" you are going to generate and how many bits you're keeping from the original MD5 hash (length of a "sub-hash"). If you generate just 1 sub-hash and keep just 1 bit then no collision at all. If you generate 2 sub-hashes expect 50% collision. Use 2 bits and the odds are 25%. You do the math. Refer to the birthday paradox for more info
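The math can be sketched with the standard birthday calculation, assuming the sub-hashes behave like independent uniform random values. The function name collisionProbability is made up for this illustration.

```php
<?php
// Probability that at least two of $count uniformly random values drawn
// from $space possibilities are equal (exact birthday product).
function collisionProbability(int $count, float $space): float
{
    $noCollision = 1.0;
    for ($i = 1; $i < $count; $i++) {
        $noCollision *= 1.0 - $i / $space;
        if ($noCollision <= 0.0) {
            return 1.0; // more values than possibilities
        }
    }
    return 1.0 - $noCollision;
}

printf("1 bit, 2 sub-hashes:  %.2f\n", collisionProbability(2, 2));   // 0.50
printf("2 bits, 2 sub-hashes: %.2f\n", collisionProbability(2, 4));   // 0.25
printf("8 hex digits, 65536 strings: %.2f\n", collisionProbability(65536, 2 ** 32));
```

For 8 hexadecimal digits (2^32 possibilities) this comes out near 0.39 at 65536 strings and passes one half at roughly 77,000 strings, which is the same order of magnitude as the figure quoted above.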

Hashing Birthday Paradox

So I am working on a piece of code that computes the hashes of 2^4 sets of 3 random prime numbers (less than 2^8). Then keep selecting sets of 3 composite numbers (less than 2^8) until there is a set of {c1, c2, c3} with a hash value that matches one of the previous hashes (the prime ones), that set would be known as {p1,p2,p3}.
From my understanding the birthday attack is basically finding two functions that provide the same result. So I would create 2 functions? One for the prime numbers and then another for composite? What would the best way of doing this be? I am thinking PHP as the language.
Any help would be greatly appreciated.
I think the premise is looking for a set of any 3 numbers < 2^8 that produces the same hash value as a set of 3 prime numbers using the same hash function.
Not stated is the range of the hash value.
The birthday attack is based on the fact that since the range of the hash value is limited, a brute force method that tries hashing all combinations of 3 numbers < 2^8 is likely to produce some collisions with valid hash values well before actually trying all possible combinations. However, in this case, trying all combinations of 3 numbers < 2^8 only takes 16777216 loops, so a complete brute force approach can be used.
The program could create a histogram of all the possible hash values. Since there are only 54 primes < 2^8, generating the histogram for all valid inputs (3 primes) would take 54^3 = 157464 loops.
Checking for collisions using all sets of 3 numbers < 2^8 would take 2^24 = 16777216 loops, which shouldn't take too long depending on the hash algorithm.
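A rough sketch of the procedure from the question: store the hashes of 2^4 random prime triples in a lookup table, then draw composite triples until one hits a stored hash. Because the question does not name a hash function, this uses a hypothetical 16-bit toy hash (the first 4 hex digits of md5); toyHash, randomTriple and findCollision are all invented names.

```php
<?php
// Toy hash (an assumption): first 4 hex digits (16 bits) of md5 over the
// comma-joined triple. A small range keeps the search short.
function toyHash(array $triple): string
{
    return substr(md5(implode(',', $triple)), 0, 4);
}

function isPrime(int $n): bool
{
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d === 0) {
            return false;
        }
    }
    return true;
}

// Pick three random elements (repeats allowed) from a list.
function randomTriple(array $pool): array
{
    $last = count($pool) - 1;
    return [$pool[mt_rand(0, $last)], $pool[mt_rand(0, $last)], $pool[mt_rand(0, $last)]];
}

function findCollision(int $primeSets = 16): array
{
    $primes = array_values(array_filter(range(2, 255), 'isPrime'));
    $composites = array_values(array_filter(range(4, 255), function (int $n): bool {
        return !isPrime($n);
    }));

    // Step 1: hash 2^4 prime triples and index them by hash value.
    $known = [];
    while (count($known) < $primeSets) {
        $triple = randomTriple($primes);
        $known[toyHash($triple)] = $triple;
    }

    // Step 2: draw composite triples until one lands on a stored hash.
    while (true) {
        $triple = randomTriple($composites);
        $hash = toyHash($triple);
        if (isset($known[$hash])) {
            return ['hash' => $hash, 'primes' => $known[$hash], 'composites' => $triple];
        }
    }
}

$result = findCollision();
```

So a single hash function and one lookup table are enough; there is no need for separate functions for primes and composites. With a 16-bit hash and 16 stored values, a match should turn up after a few thousand attempts on average.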

Generate a pseudo random 6 character string from an integer

I am trying to resolve the following problem via PHP. The aim is to generate a unique 6-character string based on an integer seed and containing a predefined range of characters. The second requirement is that the string must appear random (so if code 1 were 100000, it is not acceptable for code 2 to be 100001, and 3 100002)
The range of characters is:
Uppercase A-Z excluding: B, I, O, S and Z
0-9 excluding: 0, 1, 2, 5, 8
So that would be a total of 26 characters if I am not mistaken. My first idea would be to encode from base 10 to base 24, starting at number 7962624. So do 7962624 + seed, and then base24 encode that number.
This gives me the characters 0-N. If I replace the resulting string in the following fashion, I then meet the first criteria:
B=P, I=Q, 0=R, 1=T, 2=U, 5=V, 8=W
So at this point, my codes will look something like this:
1=TRRRR, 2=TRRRT, 3=TRRRU
So my question to you gurus is: How can I make a method that behaves consistently (so the return string for a given integer is always the same) and meets the 2 requirements above? I have spent 2 full days on this now and short of dumping 700,000,000 codes into a database and retrieving them randomly I'm all out of ideas.
Stephen
You get a reasonably random looking sequence if you take your input sequence 1,2,3... and apply a linear map modulo a prime number. The number of unique codes is limited to the prime number so you should choose a large one. The resulting codes will be unique as long as you choose a multiplier that's not divisible by the prime.
Here's an example: With 6 characters you can make 26^6 = 308915776 unique strings, so a suitable prime number could be 308915753. This function therefore will generate over 300,000,000 unique codes:
function encode($num) {
    // Linear map modulo a prime, then write the result in base 26 (digits 0-9, a-p).
    $scrambled = (240049382*$num + 37043083) % 308915753;
    return base_convert($scrambled, 10, 26);
}
Make sure that you run this on 64bit PHP though, otherwise the multiplication will overflow. On 32bit you'll have to use bcmath. The codes generated for the numbers 1 through 9 are:
n89a2d
hdh4jo
biopb9
5o6k2k
3eek5
k8m9aj
ee4424
8jbojf
2ojjb0
All that's left is filling in the initial 0s that are sometimes missing and replacing the letters and numbers so that none of the forbidden characters are produced.
As you can see, there's no obvious pattern, but someone with some time on their hands, enough motivation and access to a few of these codes will be able to find out what's going on. A safer alternative is using an encryption algorithm with a small block size, such as Skip32.
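The two remaining steps (padding and character replacement) could look roughly like this. The constants and multiplier are the ones from the answer; the order of the characters in the target alphabet and the name encodeCode are arbitrary choices made for this sketch, and 64-bit PHP is assumed.

```php
<?php
// Digits produced by base_convert(..., 10, 26), in value order.
const BASE26_DIGITS = '0123456789abcdefghijklmnop';
// Allowed output characters: 3,4,6,7,9 plus A-Z without B, I, O, S, Z.
const ALLOWED_CHARS = '34679ACDEFGHJKLMNPQRTUVWXY';

function encodeCode(int $num): string
{
    // Same linear map modulo a prime as in the answer (needs 64-bit integers).
    $scrambled = (240049382 * $num + 37043083) % 308915753;
    // Left-pad with the zero digit so every code has exactly 6 characters.
    $raw = str_pad(base_convert((string) $scrambled, 10, 26), 6, '0', STR_PAD_LEFT);
    // Translate each base-26 digit to the allowed character at the same position.
    return strtr($raw, BASE26_DIGITS, ALLOWED_CHARS);
}

foreach (range(1, 5) as $n) {
    echo $n, ' => ', encodeCode($n), "\n";
}
```

Since the translation is one-to-one, the codes stay unique for as long as the scrambled numbers are unique.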

How to generate unique numeric value with fixed length from given data in PHP?

How to generate unique numeric value with fixed length from given data in PHP? For instance, I can have a string that contains numbers and characters and I need to generate unique numeric value with length 6. Thanks!
You won't be able to generate a unique numeric value out of an input with any algorithm. That's the problem of converting an input into a pseudorandom output. If you have an input string of 20 characters and an output of only 6, there will be repeated results, because:
input of 20 characters (assuming 58 alphanumerical possibilities):
58^20 = 1.8559226468222606056912232424512e+35 possibilities
output of 6 characters (assuming 10 numerical possibilities):
10^6 = 1000000 possibilities
So, to sum up, you won't be able to generate a unique number out of a string. Your best chances are to use a hashing function like md5 or sha1. They are alphanumerical but you can always convert them into numbers. However, once you crop them to, let's say, 6 digits, their chances to be repeated increase a lot.
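A minimal sketch of the hash-and-crop idea, assuming md5 as the hash and a modulo to get down to 6 digits. The function name sixDigitHash is made up, and collisions are expected once enough inputs are processed.

```php
<?php
// Map any string to a fixed-length 6-digit number. NOT unique: there are
// only 1,000,000 possible outputs, so distinct inputs will collide.
function sixDigitHash(string $input): string
{
    // 7 hex digits (28 bits) fit in a PHP integer even on 32-bit builds.
    $value = hexdec(substr(md5($input), 0, 7));
    return str_pad((string) ($value % 1000000), 6, '0', STR_PAD_LEFT);
}

echo sixDigitHash('abc123XYZ'), "\n";
```

The same input always gives the same number, which is useful for lookups, but any uniqueness requirement still needs a collision check on top.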
Unfortunately, it is impossible to generate a completely unique value from an arbitrary input when the number of characters is limited. There are an infinite number of possible inputs, while a numeric value of length 6 has only 1,000,000 possible values (000000 to 999999).
In PHP however you can do the following:
$list_of_numeric_values = array();
foreach ($list_of_given_values as $value)
{
    // Store each distinct value once; its array index serves as its numeric ID.
    if (!in_array($value, $list_of_numeric_values))
        $list_of_numeric_values[] = $value;
}
After this is complete, the array then will have a unique key for each possible value you can use.
If you don't need to calculate these all at the same time, you can follow a similar algorithm where, instead of just "searching" the array using PHP, you do a SELECT on a MySQL table to see if the entry currently exists, and use the auto increment of the primary key to get your value.
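Here is an in-memory sketch of that lookup-or-insert idea, with a PHP array standing in for the MySQL table and its auto-increment key. The function name assignNumber is hypothetical, and this only works for up to 999,999 distinct inputs.

```php
<?php
// Return the 6-digit number assigned to $value, creating the next
// sequential number the first time a value is seen.
function assignNumber(array &$registry, string $value): string
{
    if (!isset($registry[$value])) {
        // Plays the role of the auto-increment primary key.
        $registry[$value] = count($registry) + 1;
    }
    return str_pad((string) $registry[$value], 6, '0', STR_PAD_LEFT);
}

$registry = [];
$first = assignNumber($registry, 'abc123');   // new value
$second = assignNumber($registry, 'x9y8z7');  // another new value
$again = assignNumber($registry, 'abc123');   // already known, same number
```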

How many bytes are unique enough for twitter?

I don't want my database id's to be sequential, so I'm trying to generate uids with this code:
$bin = openssl_random_pseudo_bytes(12);
$hex = bin2hex($bin);
return base_convert($hex, 16, 36);
My question is: how many bytes would I need to make the ids unique enough to handle large amounts of records (like twitter)?
Use PHP's uniqid(), with an added entropy factor. That'll give you plenty of room.
You might consider something like the way tinyurl and other shortening services work. I've used similar techniques, which guarantee uniqueness until all combinations are exhausted. So basically you choose an alphabet and how many characters you want as a length. Let's say we use alphanumeric, upper and lower, so that's 62 characters in the alphabet, and let's do 5 characters per code. That's 62^5 = 916,132,832 combinations.
You start with your sequential database ID and a prime number (choose one that's fairly large, like 2097593). All you do is multiply the prime by your database ID, making sure to wrap around if you exceed 62^5, and then convert that number to base-62 as per your chosen alphabet.
This makes each code look fairly unique, yet because we use a prime number that does not divide 62^5 (anything other than 2 or 31), we're guaranteed not to hit the same number twice until we've used all codes already. And it's very short.
You can use longer keys with a smaller alphabet, too, if length isn't a concern.
Here's a question I asked along the same lines: Tinyurl-style unique code: potential algorithm to prevent collisions
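The scheme described above could be sketched like this, using the multiplier and code length from the answer. The alphabet order and the name shortCode are choices made for this illustration, and 64-bit PHP is assumed so the multiplication does not overflow.

```php
<?php
// Turn a sequential database ID into a scrambled-looking 5-character code.
function shortCode(int $id): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    $space = 62 ** 5; // 916,132,832 possible codes

    // Multiply by the prime and wrap around at 62^5.
    $n = ($id * 2097593) % $space;

    // Write $n as exactly 5 base-62 digits, most significant first.
    $code = '';
    for ($i = 0; $i < 5; $i++) {
        $code = $alphabet[$n % 62] . $code;
        $n = intdiv($n, 62);
    }
    return $code;
}

foreach ([1, 2, 3] as $id) {
    echo $id, ' => ', shortCode($id), "\n";
}
```

Two IDs can only map to the same code if they differ by a multiple of 62^5, because the multiplier shares no factor with 62^5.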
Assuming that openssl_random_pseudo_bytes may generate every possible value, N bytes will give you 2 ^ (N * 8) distinct values. For 12 bytes this is about 7.923 * 10^28.
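To put that number next to a record count, here is a small sketch using the usual birthday approximation. It uses random_bytes() (PHP 7+) in place of openssl_random_pseudo_bytes() and keeps the ID as plain hex, since the PHP manual warns that base_convert() may lose precision on large numbers. The function names are made up.

```php
<?php
// Random ID as a hex string: 2 hex characters per random byte.
function randomId(int $bytes = 12): string
{
    return bin2hex(random_bytes($bytes));
}

// Birthday approximation for the chance of at least one duplicate among
// $records random IDs of $bytes bytes: n^2 / (2 * 2^(8 * bytes)).
// Only meaningful while the result is much smaller than 1.
function collisionEstimate(float $records, int $bytes): float
{
    return ($records * $records) / (2.0 * pow(2.0, 8 * $bytes));
}

echo randomId(), "\n";
printf("10^12 records, 12 bytes: about %.1e\n", collisionEstimate(1e12, 12));
```

Under these assumptions, even a trillion records with 12-byte IDs leave the chance of any duplicate in the range of a few in a million.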
use MySQL UUID
insert into `database`(`unique`,`data`) values(UUID(),'Test');
If you're not using MySQL, search Google for UUID (Database Name) and it will give you an option.
Source Wikipedia
In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%
