I would like to create a unique ID (filename) for some of my files and would like to do so with minimal concern. My original plan was to use a base64 encoded string of random bytes but this will not work because / is a valid base64 character.
The function I currently use looks like this:
static function uniqueID(){
return base64_encode(random_bytes(16));
}
How could I go about generating a high entropy unique ID that will always be a certain length and contain only characters that are valid in filenames?
Note: I do not wish to use a solution that accomplishes this with str_replace().
Edit: I appreciate the responses so far. This is one solution I've explored locally, but I am not sure how to calculate how much entropy I lose compared to uniqueID().
public static function uniqueFilename(){
return bin2hex(random_bytes(12));
}
As User: Sammitch stated,
12 bytes = 96 bits. If you generated a random 96-bit value and then
tried to intentionally brute-force guess a collision you'd need to
build a Dyson sphere around the sun to power the computer and then
wait a few centuries. Just use the random bytes.
96 bits is like
concatenating 3x 32bit numbers together, the number of possibilities
is: 79,228,162,514,264,337,593,543,950,336
Knowing this we can justify using the following function when $n >= 12.
public static function uniqueFilename($n = 12){
return bin2hex(random_bytes($n));
}
I would recommend Ramsey's UUID: https://github.com/ramsey/uuid .
We use it in production and it does the job.
I'm new to php and I'm studying on my own, normally I create my tokens and insert them in the tables like this:
private function create_token($reference, $bytes, $slice)
{
$key = substr(preg_replace('/\W/', "", base64_encode(bin2hex(random_bytes($bytes)))), 0, $slice);
return $reference . $key;
}
$token = $this->create_token('token_B8', 34, 22); // e.g. token_B8eEr32EEddDsfSDGRGgHHhg
This may be a correct way to create tokens, but I doubt whether it really is the right one. I was thinking that the chance of two identical tokens ending up in the table is something like 1 in 1000000000, correct? Or is there a way to create a token that guarantees:
Under no circumstances create an equal token
without having to create a function to check if the token in the table already exists.
I believe what I should do is keep creating the token the way I do now and add a function that checks whether the token already exists in the table: if it does, generate another token; if not, insert it. This seems to be a correct way, but as I'm new I don't know if there is a more appropriate one. Can someone clear up this doubt? Thanks.
The string generated by random_bytes() is maximally random, and literally everything you do to it after that decreases the amount of randomness per byte of the string, and therefore the number of possible values that a string of a given length could be.
random_bytes() gives 8 bits of randomness per byte.
bin2hex() stretches each byte of input over two bytes. [x0.5]
base64_encode() stretches 3 input bytes over 4 output bytes. [x0.75]
preg_replace('/\W/', "", $input) effectively changes from base64 encoding to base62, decreasing the space slightly once again. [x??? < 1]
So all told that 22 byte token you're generating represents 22 * 8 * 0.5 * 0.75 * ??? <= 66 bits of random data. So <= 73,786,976,294,838,206,464 possibilities.
Boy howdy, that sure seems like a lot, right? Well not really. Because of the Birthday Paradox the probability of collisions can get into the range of causing issues while you're still a few orders of magnitude away from filling the range.
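To put a rough number on the birthday effect, the usual approximation p ≈ 1 - exp(-n² / 2N) can be evaluated directly, where N = 2^bits is the size of the ID space and n is the number of IDs generated. This is only a sketch; the function name collisionProbability() is made up for illustration.

```php
<?php
// Approximate probability that at least two of $count random IDs collide
// when each ID is drawn uniformly from a space of 2^$bits values.
// Uses the standard birthday approximation p ~= 1 - exp(-n^2 / (2N)).
function collisionProbability(float $count, int $bits): float
{
    $space = pow(2.0, $bits);                        // N, as a float
    $exponent = -($count * $count) / (2.0 * $space);
    return -expm1($exponent);                        // 1 - exp(x), accurate for tiny x
}

// Compare the three sizes discussed in this answer for a billion IDs.
foreach ([66, 132, 176] as $bits) {
    printf("%3d bits, 1e9 IDs: p ~= %.3e\n", $bits, collisionProbability(1e9, $bits));
}
```

With 66 bits, a billion IDs already gives a collision chance of well over one in a thousand, while the larger sizes stay negligible.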
I guess if we remove that pointless bin2hex() we could squeeze out another 66 bits for 132 in total? But how much more does that really get us?
5,444,517,870,735,015,415,413,993,718,908,291,383,296
A lot. A lot. I don't even care about that preg_replace() anymore.
For the sake of completeness, what about just a random_bytes(22)? 176 bits?
95,780,971,304,118,053,647,396,689,196,894,323,976,171,195,136,475,136
I guess the take-aways are:
Don't confuse data encoding with "make more random" just because the output looks garbled. [Note: the same goes for hash functions]
Don't apply functions/encodings willy-nilly if you don't know what they are actually doing.
In code:
$input = 'abc';
// all of these outputs contain the SAME amount of entropy, some of them are just longer representations
var_dump(
$input,
bin2hex($input),
base64_encode($input),
base64_encode(bin2hex($input)),
bin2hex(base64_encode($input))
);
Output:
string(3) "abc"
string(6) "616263"
string(4) "YWJj"
string(8) "NjE2MjYz"
string(8) "59574a6a"
Anyway, with a sufficiently large random ID space it's more pragmatic to just put a UNIQUE constraint on the value and let the process fail when a duplicate value tries to be inserted. You can put in some retry logic, but odds are that it will never actually run unless someone leverages vulnerabilities specifically to make you generate duplicates and DoS yourself with retries. [yes, this is a thing]
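A minimal sketch of that approach, assuming a PDO connection in exception mode and a table named tokens with a UNIQUE column named token (both names, and the function insertUniqueToken(), are hypothetical):

```php
<?php
// Insert a random token, relying on a UNIQUE constraint to reject duplicates.
// Assumptions: $pdo uses PDO::ERRMODE_EXCEPTION, and a table `tokens`
// with a UNIQUE column `token` exists (both names are hypothetical).
function insertUniqueToken(PDO $pdo, int $maxAttempts = 3): string
{
    $stmt = $pdo->prepare('INSERT INTO tokens (token) VALUES (?)');
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $token = bin2hex(random_bytes(16)); // 128 bits, 32 hex characters
        try {
            $stmt->execute([$token]);
            return $token;
        } catch (PDOException $e) {
            // SQLSTATE class 23 = integrity constraint violation (e.g. duplicate key).
            if (strpos((string) $e->getCode(), '23') !== 0) {
                throw $e; // some other failure: do not retry
            }
        }
    }
    // Bounded retries, so forced duplicates cannot keep the process looping.
    throw new RuntimeException("No unique token after $maxAttempts attempts");
}
```

The retry count is deliberately small: with 128 random bits a retry should essentially never happen, and a hard limit keeps the failure mode cheap if someone does manage to force duplicates.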
I want to add a random string as a token for form submission, and it should be unique forever. I have spent too much time with Google and I am still confused about which combination to use.
I found so many ways to do this when I googled:
1) Combination of character and number.
2) Combination of character, number and special character.
3) Combination of character, number, special character and date time.
Which combination should I use?
How many characters should the random string have?
If there is any other method that is secure, please let me know.
Here are some considerations:
Alphabet
The set of characters can be considered the alphabet for the encoding. It doesn't affect the string strength by itself, but a larger alphabet (digits, non-alphanumeric characters, etc.) does allow for shorter strings of similar strength (aka keyspace), so it's useful if you are looking for shorter strings.
Input Values
To guarantee your string to be unique, you need to add something which is guaranteed to be unique.
Random value is a good seed value if you have a good random number generator
Time is a good seed value to add but it may not be unique in a high traffic environment
User ID is a good seed value if you assume a user isn't going to create sessions at the exact same time
Unique ID is something the system guarantees is unique. This is often something that the server will guarantee / verify is unique, either in a single server deployment or distributed deployment. A simple way to do this is to add a machine ID and machine unique ID. A more complicated way to do this is to assign key ranges to machines and have each machine manage their key range.
Systems that I've worked with that require absolute uniqueness have added a server unique id which guarantees a item is unique. This means the same item on different servers would be seen as different, which was what was wanted here.
Approach
Pick one or more input values that match your requirement for uniqueness. If you need absolute uniqueness forever, you need something that you control and that you are sure is unique, e.g. a machine-associated number (that won't conflict with others in a distributed system). If you don't need absolute uniqueness, you can use a random number with another value such as time. If you need randomness, add a random number.
Use an alphabet / encoding that matches your use case. For machine ids, encodings like hexadecimal and base 64 are popular. For human-readable ids, I prefer base32 (Crockford) or base36 for case-insensitive encodings, and base58 or base62 for case-sensitive encodings. This is because base32, 36, 58 and 62 produce shorter strings than hexadecimal and (vs. base64) are safe across multiple uses (e.g. URLs, XML, file names, etc.) and don't require transformation between different use cases.
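As an illustration of the alphabet trade-off, here is a small sketch that draws each character from a chosen alphabet with random_int(). The helper name randomString() is hypothetical, and it assumes a non-empty, single-byte alphabet. Each character contributes log2(alphabet size) bits, so a bigger alphabet reaches the same strength with fewer characters.

```php
<?php
// Build a random string by drawing each character uniformly from $alphabet.
// random_int() is a CSPRNG, so each character adds log2(strlen($alphabet)) bits.
// Assumes $alphabet is non-empty and contains only single-byte characters.
function randomString(int $length, string $alphabet): string
{
    $max = strlen($alphabet) - 1;
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

// Crockford base32 (digits plus letters without I, L, O, U) and base62.
$base32 = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
$base62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

echo randomString(26, $base32), "\n"; // 26 * 5 = 130 bits
echo randomString(22, $base62), "\n"; // 22 * log2(62) ~= 131 bits
```

Note that this only covers the random part; if absolute uniqueness is required, it still has to be combined with a value the system guarantees to be unique, as described above.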
You can definitely get a lot fancier depending on your needs, but I'll just throw this out there since it's what I use frequently for stuff like what you are describing:
md5(rand());
It's quick, simple and easy to remember. And since it's hexadecimal it plays nicely with others.
Refer to this SO Protected Question. This might be what you are looking for.
I think it's better to redirect you to a previously asked question which has more substantive answers. You will find a lot of options.
Try the code below: getUniqueToken() returns a random string of length 10 by default.
/*
This function will return unique token string...
*/
function getUniqueToken($tokenLength = 10){
    $token = "";
    //Combination of character, number and special character...
    $combinationString = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789*#&$^";
    for($i=0;$i<$tokenLength;$i++){
        $token .= $combinationString[uniqueSecureHelper(0,strlen($combinationString))];
    }
    return $token;
}
/*
This helper function returns a cryptographically secure random integer
in the range $minVal <= n < $maxVal...
*/
function uniqueSecureHelper($minVal, $maxVal) {
    $range = $maxVal - $minVal;
    if ($range < 1) return $minVal; // not so random...
    $log = log($range, 2);
    $bytes = (int) ($log / 8) + 1; // length in bytes
    $bits = (int) $log + 1; // length in bits
    $filter = (int) (1 << $bits) - 1; // set all lower bits to 1
    do {
        $rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
        $rnd = $rnd & $filter; // discard irrelevant bits
    } while ($rnd >= $range); // reject out-of-range values to avoid bias
    return $minVal + $rnd;
}
Use this code (two functions); you can increase the string length by passing an int parameter, like getUniqueToken(15).
I used your 2nd idea (combination of characters, numbers and special characters), which you found while googling. I hope my example will help you.
You should go for option 3. Because it includes the date and time, it is unique every time.
As for the method, have you tried
str_shuffle($string)
It generates a random string from $string every time.
Then use substr
($string, start, length)
to cut it down.
And if you want the date and time, concatenate them with the resulting string.
An easily understandable and effective code to generate random strings in PHP. I do not consider predictability concerns important in this connection.
<?php
$d = str_shuffle('0123456789');
$C = str_shuffle('ABCDEFGHIJKLMNOPQRSTUVWXYZ');
$m = str_shuffle('abcdefghijklmnopqrstuvwxyz');
$s = str_shuffle('#!$&()*+-_~');
$l=9; //min 4
$r=substr(str_shuffle($d.$C.$m.$s),0,$l);echo $r.'<br>';
$safe=substr($d,0,1).substr($C,0,1).substr($m,0,1).mb_substr($s,0,1);
$r=str_shuffle($safe.substr($r,0,$l-4));//always at least one digit, special, small and capital
// this also allows for 0,1 or 2 of each available characters in string
echo $r;
exit;
?>
For unique string use uniqid().
And to make it secure, use hashing algorithms
for example :
echo md5(uniqid());
I need to generate a strong unique API key.
Can anyone suggest the best solution for this? I don't want to use rand() function to generate random characters. Is there an alternative solution?
As of PHP 7.0, you can use the random_bytes($length) method to generate a cryptographically-secure random string. This string is going to be in binary, so you'll want to encode it somehow. A straightforward way of doing this is with bin2hex($binaryString). This will give you a string $length * 2 bytes long, with $length * 8 bits of entropy to it.
You'll want $length to be high enough such that your key is effectively unguessable and that the chance of there being another key being generated with the same value is practically nil.
Putting this all together, you get this:
$key = bin2hex(random_bytes(32)); // 64 characters long
When you verify the API key, use only the first 32 characters to select the record from the database and then use hash_equals() to compare the API key as given by the user against what value you have stored. This helps protect against timing attacks. ParagonIE has an excellent write-up on this.
For an example of the checking logic:
$token = $request->bearerToken();
// Retrieve however works best for your situation,
// but it's critical that only the first 32 characters are used here.
$users = app('db')->table('users')->where('api_key', 'LIKE', substr($token, 0, 32) . '%')->get();
// $users should only have one record in it,
// but there is an extremely low chance that
// another record will share a prefix with it.
foreach ($users as $user) {
    // Performs a constant-time comparison of strings,
    // so you don't leak information about the token.
    // The stored column is api_key, the same one used for the lookup above.
    if (hash_equals($user->api_key, $token)) {
        return $user;
    }
}
return null;
Bonus: Slightly More Advanced Use With Base64 Encoding
Using Base64 encoding is preferable to hexadecimal for space reasons, but is slightly more complicated because each character encodes 6 bits (instead of 4 for hexadecimal), which can leave the encoded value with padding at the end.
To keep this answer from dragging on, I'll just put some suggestions for handling Base64 without their supporting arguments. Pick a $length greater than 32 that is divisible by both 3 and 2. I like 42, so we'll use that for $length. Base64 encodings are of length 4 * ceil($length / 3), so our $key will be 56 characters long. You can use the first 28 characters for selection from your storage, leaving another 28 characters on the end that are protected from leaking by timing attacks with hash_equals.
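A quick sketch of the numbers above. The variable names $selector and $verifier are illustrative only and do not come from the original answer.

```php
<?php
// 42 random bytes = 336 bits; 42 is divisible by 3, so the Base64 output
// is exactly 4 * 42 / 3 = 56 characters with no '=' padding.
$key = base64_encode(random_bytes(42));

$selector = substr($key, 0, 28); // used to look the record up
$verifier = substr($key, 28);    // the part protected by the hash_equals() comparison

printf("%s (%d chars)\n", $key, strlen($key));
```

Keep in mind that standard Base64 output can contain + and /, so the key needs URL-encoding if it ever travels in a query string.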
Bonus 2: Secure Key Storage
Ideally, you should treat the key much like a password. This means that instead of using hash_equals to compare the full string, you should hash the second half of the key like a password, store that hash separately from the first half of your key (which stays in plain text), use the first half for selection from your database, and verify the second half with password_verify.
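One way this split could look, assuming the 64-character hex key from earlier. The helper names createApiKey() and verifyApiKey() and the array keys are hypothetical, and the database lookup by selector is left out.

```php
<?php
// Create a key and the two values to store for it (hypothetical helper names).
function createApiKey(): array
{
    $key = bin2hex(random_bytes(32));           // 64 hex characters, shown to the user once
    return [
        'key'           => $key,
        'selector'      => substr($key, 0, 32), // stored in plain text, indexed for lookup
        'verifier_hash' => password_hash(substr($key, 32), PASSWORD_DEFAULT),
    ];
}

// Check a presented key against the stored selector and verifier hash.
// In a real application the selector would be used to fetch the row first.
function verifyApiKey(string $presented, string $selector, string $verifierHash): bool
{
    if (strlen($presented) !== 64 || !hash_equals($selector, substr($presented, 0, 32))) {
        return false;
    }
    return password_verify(substr($presented, 32), $verifierHash);
}
```

With this layout a leaked database reveals only the selector and a password hash, not a usable key.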
using mcrypt:
<?php
$bytes = mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
$unpack = unpack("Nint", $bytes);
$id = $unpack['int'] & 0x7FFFFFFF;
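mcrypt_create_iv() was deprecated in PHP 7.1 and the mcrypt extension was removed from core in PHP 7.2, so the snippet above will not run on current versions. A rough equivalent on PHP 7.0 or later, offered as a sketch rather than a drop-in replacement, is random_int():

```php
<?php
// Same range as the mcrypt snippet above: a non-negative 31-bit integer,
// drawn from the operating system's CSPRNG.
$id = random_int(0, 0x7FFFFFFF);
echo $id, "\n";
```

A 31-bit space is small, so collisions become likely after tens of thousands of IDs; this only mirrors the original range.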
PHP has the uniqid() function (http://php.net/manual/en/function.uniqid.php) with an optional prefix, and you can add additional entropy to further avoid collisions. But if you absolutely, positively need something unique, you should not use anything with randomness in it.
This is the best solution I found.
http://www.php.net/manual/en/function.uniqid.php#94959
I need to generate a string using PHP; it needs to be unique and from 4 to 8 characters long (the length is the value of a variable).
I thought I could use a crc32 hash, which I assume would be unique, but then I can't choose how many characters it has. On the other hand, a simple "password generator" will produce duplicate strings, and checking the table for each generated string will take a while.
How can I do that?
Thanks!
Maybe I can use that :
function unique_id(){
$better_token = md5(uniqid(rand(), true));
$unique_code = substr($better_token, 16);
$uniqueid = $unique_code;
return $uniqueid;
}
$id = unique_id();
Changing to :
function unique_id($l = 8){
$better_token = md5(uniqid(rand(), true));
$rem = strlen($better_token)-$l;
$unique_code = substr($better_token, 0, -$rem);
$uniqueid = $unique_code;
return $uniqueid;
}
echo unique_id(4);
Do you think I'll get a unique string each time for a good while?
In short, I think you'll get a pretty good random value. There's always the chance of a collision, but you've done everything you can to get a random value. uniqid() returns an identifier based on the current time in microseconds; it is not random on its own. Specifying rand() (mt_rand() would be better) as the prefix and true as the second argument to uniqid() should make the value even more unique. Hashing the value using md5() should also make it pretty unique, as even a small difference between two generated values is magnified by the hashing function. idealmachine is correct in that a longer value is less likely to have a collision than a shorter one.
Your function could also be shorter since md5() will always return a 32 character long string. Try this:
function unique_id($l = 8) {
return substr(md5(uniqid(mt_rand(), true)), 0, $l);
}
The problem with randomness is that you can never be sure of anything. There is a small chance you could get one number this time and the same number the next. That said, you would want to make the string as long as possible to reduce that probability. As an example of how long such numbers can be, GUIDs (globally unique identifiers) are 16 bytes long.
In theory, four hex characters (16 bits) give only 16^4 = 65536 possibilities, while eight hex characters (32 bits) give 16^8 = 4294967296. You, however, need to consider how likely it is for any two hashes to collide (the "birthday problem"). Wikipedia has a good table on how likely such a collision is. In short, four hex characters are definitely not sufficient, and eight might not be.
You may want to consider using Base64 encoding rather than hex digits; that way, eight characters can hold 48 bits rather than just 32 bits.
Eight bytes is 8 * 8 = 64 bits.
Reliable passwords should be built only from the ASCII characters a-z, A-Z and the digits 0-9. The best way to do that is to use only cryptographically secure functions such as random_int() or random_bytes() from PHP 7. Other functions such as base64_encode() should be used only as helpers that turn the random bytes into ASCII characters.
mt_rand() is not cryptographically secure and is very old.
To pick characters from a string, use random_int(). To make a binary string printable, use base64_encode() or bin2hex(); note that with bin2hex() each output character carries only 16 possible values.
See my implementation of these functions.
I have built a PHP script to host a large number of images uploaded by users. What is the best way to generate random numbers for the image filenames so that there will be no filename conflicts in the future? Something like ImageShack. Thanks.
$better_token = uniqid(md5(mt_rand()), true);
Easiest way would be a new GUID for each file.
http://www.php.net/manual/en/function.uniqid.php#65879
Here's how I implemented your solution
This example assumes I want to:
get a list of 50 unique random numbers, and
draw those numbers from the range 0 to 1000.
Code:
//developed by www.fatphuc.com
$array = array(); //define the array
//set random # range
$minNum = 0;
$maxNum = 1000;
// i just created this function, since we’ll be generating
// # in various sections, and i just want to make sure that
// if we need to change how we generate random #, we don’t
// have to make multiple changes to the codes everywhere.
// (basically, to prevent mistakes)
function GenerateRandomNumber($minNum, $maxNum){
    return rand($minNum, $maxNum); // rand() already returns an integer
}
//generate 50 unique random #s
for($i = 1; $i <= 50; $i++){
    $num1 = GenerateRandomNumber($minNum, $maxNum);
    // draw again until the number is not already in the list
    while(in_array($num1, $array)){
        $num1 = GenerateRandomNumber($minNum, $maxNum);
    }
    $array[$i] = $num1;
}
asort($array); //just want to sort the array
//this simply prints the list of #s in list style
echo '<ol>';
foreach ($array as $var){
echo '<li>';
echo $var;
echo '</li>';
}
echo '</ol>';
Keep a persistent list of all the previous numbers you've generated(in a database table or in a file) and check that a newly generated number is not amongst the ones on the list. If you find this to be prohibitively expensive, generate random numbers on a sufficient number of bits to guarantee a very low probability of collision.
You can also use an incremental approach of assigning these numbers, like a concatenation of a timestamp_part based on the current time and a random_part, just to make sure you don't get collisions if multiple users upload files at the same time.
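A minimal sketch of the timestamp-plus-random idea; the function name timestampedName() and the exact layout of the name are assumptions made for illustration:

```php
<?php
// Sortable-by-time filename: UTC timestamp part plus a random part.
// The random part is what protects against simultaneous uploads.
function timestampedName(string $extension): string
{
    $timestampPart = gmdate('Ymd\THis');        // e.g. 20240131T235959
    $randomPart    = bin2hex(random_bytes(8));  // 64 random bits, 16 hex chars
    return $timestampPart . '_' . $randomPart . '.' . $extension;
}

echo timestampedName('jpg'), "\n";
```

Two uploads in the same second would still need to draw the same 64 random bits to collide.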
You could use microtime() as suggested above and then append a hash of the original filename to further avoid collisions in the (rare) case of exactly simultaneous uploads.
There are several flaws in your postulate that random values will be unique - regardless of how good the random number generator is. Also, the better the random number generator, the longer it takes to calculate results.
Wouldn't it be better to use a hash of the datafile - that way you get the added benefit of detecting duplicate submissions.
If detecting duplicates is known to be a non-issue, then I'd still recommend this approach but modify the output based on detected collisions (but using a MUCH cheaper computation method than that proposed by Lo'oris) e.g.
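// generate_hash_of_file() is a placeholder for your own hashing routine
$candidate_name = generate_hash_of_file($input_file);
$offset = 0;
// try suffixes 0, 1, 2, ... until an unused name is found (at most 50 tries)
while (file_exists($candidate_name . strrev((string) $offset)) && ($offset < 50)) {
    $offset++;
}
if ($offset < 50) {
    rename($input_file, $candidate_name . strrev((string) $offset));
} else {
    print "Congratulations - you've got the biggest storage network in the world by far!";
}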
this would give you the capacity to store approx 25*2^63 files using a sha1 hash.
As to how to generate the hash, reading the entire file into PHP might be slow (particularly if you try to read it all into a single string to hash it). Most Linux/Posix/Unix systems come with tools like 'md5sum' which will generate a hash from a stream very efficiently.
C.
1. forge a filename
2. try to open that file
3. if it exists, goto 1
4. create the file
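The steps above can be sketched with fopen() in 'x' mode, which creates the file and fails if it already exists, so the check and the creation cannot race each other. The function name createUniqueFile() and the 12-byte random name are assumptions for illustration.

```php
<?php
// Create a new, uniquely named file in $dir. fopen() mode 'x' creates the file
// and fails if it already exists, so the existence check and the creation
// happen in one step.
function createUniqueFile(string $dir, int $maxAttempts = 10): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $path = $dir . DIRECTORY_SEPARATOR . bin2hex(random_bytes(12));
        $handle = @fopen($path, 'x'); // false (warning suppressed) if $path exists
        if ($handle !== false) {
            fclose($handle);
            return $path;
        }
    }
    // Also reached if fopen() keeps failing for other reasons, e.g. permissions.
    throw new RuntimeException("Could not create a unique file in $dir");
}
```

The returned path points at an empty file that now belongs to the caller, who can move the upload over it.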
Using something based on a timestamp maybe. See the microtime function for details. Alternatively uniqid to generate a unique ID based on the current time.
Guaranteed unique cannot be random. Random cannot be guaranteed unique. If you want unique (without the random) then just use the integers: 0, 1, 2, ... 1235, 1236, 1237, ... Definitely unique, but not random.
If that doesn't suit, then you can have definitely unique with the appearance of random. You use encryption on the integers to make them appear random. Using DES will give you 64-bit numbers (its block size), while using AES will give you 128-bit numbers. Use either to encrypt 0, 1, 2, ... in order with the same key. All you need to store is the key and the next number to encrypt. Because encryption is reversible, the encrypted numbers are guaranteed unique.
If 128-bit or 64-bit numbers are too large (64 bits is 16 hex digits), then look at format-preserving encryption, which will give you a smaller size range at some cost in time.
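A sketch of the encrypted-counter idea using AES-128 through the OpenSSL extension. It assumes a 64-bit PHP build and a 16-byte key that is stored and reused; the function name encryptedId() is hypothetical. A single block is encrypted in ECB mode, which acts here as a keyed one-to-one mapping rather than as general-purpose encryption.

```php
<?php
// Encrypt a counter with AES-128 to get IDs that look random but cannot repeat
// for distinct counters under the same key. One 16-byte block in, one out.
function encryptedId(int $counter, string $key): string
{
    // 8 zero bytes + 64-bit big-endian counter = exactly one AES block.
    $block = str_repeat("\0", 8) . pack('J', $counter);
    $cipher = openssl_encrypt(
        $block,
        'aes-128-ecb',
        $key,
        OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING // raw bytes, no extra padding block
    );
    if ($cipher === false) {
        throw new RuntimeException('Encryption failed');
    }
    return bin2hex($cipher); // 32 hex characters
}
```

Calling encryptedId(0, $key), encryptedId(1, $key), and so on with the same key yields distinct 32-character IDs; only the key and the next counter value need to be persisted.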
My solution is usually a hash (MD5/SHA1/...) of the image contents. This has the added advantage that if people upload the same image twice you still only have one image on the hard disk, saving some space (of course, you have to make sure that the image is not deleted if one user deletes it while another user still has the same image in use).
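A minimal sketch of content-based naming with hash_file(), which reads the file in chunks instead of loading it into a string. SHA-256 is used here in place of the MD5/SHA1 mentioned above, and the function name contentAddressedName() is made up for illustration.

```php
<?php
// Derive a filename from the file's contents. hash_file() reads the file
// in chunks, so the whole upload is never held in a PHP string.
function contentAddressedName(string $path, string $extension): string
{
    $hash = hash_file('sha256', $path);
    if ($hash === false) {
        throw new RuntimeException("Cannot hash $path");
    }
    return $hash . '.' . $extension; // identical contents give identical names
}
```

Because identical files map to the same name, an existing target file signals a duplicate upload rather than a conflict.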