truly unique random number generate by php? - php

I'm have build an up php script to host large number of images upload by user, what is the best way to generate random numbers to image filenames so that in future there would be no filename conflict? Be it like Imageshack. Thanks.

$better_token = uniqid(md5(mt_rand()), true);

Easiest way would be a new GUID for each file.
http://www.php.net/manual/en/function.uniqid.php#65879

Here's how I implemented your solution
This example assumes i want to
Get a list, containing 50 numbers that is unique and random, and
This list of # to come from the number range of 0 to 1000
Code:
//developed by www.fatphuc.com
$array = array(); //define the array
//set random # range
$minNum = 0;
$maxNum = 1000;
// i just created this function, since we’ll be generating
// # in various sections, and i just want to make sure that
// if we need to change how we generate random #, we don’t
// have to make multiple changes to the codes everywhere.
// (basically, to prevent mistakes)
function GenerateRandomNumber($minNum, $maxNum){
return round(rand($minNum, $maxNum));
}
//generate 49 more random #s to give a total of 50 random #s
for($i = 1; $i <= 49; $i++){
$num1 = GenerateRandomNumber($minNum, $maxNum);
while(in_array($num1, $array)){
$num1 = GenerateRandomNumber($minNum, $maxNum);
}
$array[$i] = $num1;
}
asort($array); //just want to sort the array
//this simply prints the list of #s in list style
echo '<ol>';
foreach ($array as $var){
echo '<li>';
echo $var;
echo '</li>';
}
echo '</ol>';

Keep a persistent list of all the previous numbers you've generated(in a database table or in a file) and check that a newly generated number is not amongst the ones on the list. If you find this to be prohibitively expensive, generate random numbers on a sufficient number of bits to guarantee a very low probability of collision.
You can also use an incremental approach of assigning these numbers, like a concatenation of a timestamp_part based on the current time and a random_part, just to make sure you don't get collisions if multiple users upload files at the same time.

You could use microtime() as suggested above and then appending an hash of the original filename to further avoid collisions in the (rare) case of exact contemporary uploads.

There are several flaws in your postulate that random values will be unique - regardless of how good the random number generator is. Also, the better the random number generator, the longer it takes to calculate results.
Wouldn't it be better to use a hash of the datafile - that way you get the added benefit of detecting duplicate submissions.
If detecting duplicates is known to be a non-issue, then I'd still recommend this approach but modify the output based on detected collisions (but using a MUCH cheaper computation method than that proposed by Lo'oris) e.g.
$candidate_name=generate_hash_of_file($input_file);
$offset=0;
while ((file_exists($candidate_name . strrev($offset) && ($offset<50)) {
$offset++;
}
if ($offset<50) {
rename($input_file, $candidate_name . strrev($offset));
} else {
print "Congratulations - you've got the biggest storage network in the world by far!";
}
this would give you the capacity to store approx 25*2^63 files using a sha1 hash.
As to how to generate the hash, reading the entire file into PHP might be slow (particularly if you try to read it all into a single string to hash it). Most Linux/Posix/Unix systems come with tools like 'md5sum' which will generate a hash from a stream very efficiently.
C.

forge a filename
try to open that file
if it exists, goto 1
create the file

Using something based on a timestamp maybe. See the microtime function for details. Alternatively uniqid to generate a unique ID based on the current time.

Guaranteed unique cannot be random. Random cannot be guaranteed unique. If you want unique (without the random) then just use the integers: 0, 1, 2, ... 1235, 1236, 1237, ... Definitely unique, but not random.
If that doesn't suit, then you can have definitely unique with the appearance of random. You use encryption on the integers to make them appear random. Using DES will give you 32 bit numbers, while using AES will give you 64 bit numbers. Use either to encrypt 0, 1, 2, ... in order with the same key. All you need to store is the key and the next number to encrypt. Because encryption is reversible, then the encrypted numbers are guaranteed unique.
If 64 bit or 32 bit numbers are too large (32 bits is 8 hex digits) then look at a format preserving encryption which will give you a smaller size range at some cost in time.

My solution is usually a hash (MD5/SHA1/...) of the image contents. This has the added advantage that if people upload the same image twice you still only have one image on the hard disk, saving some space (ofc you have to make sure that the image is not deleted if one user deletes it and another user has the same image in use).

Related

How to generate unique secure random string in PHP?

I want to add random string as token for form submission which is generated unique forever. I have spent to much time with Google but I am confused which combination to use?
I found so many ways to do this when I googled:
1) Combination of character and number.
2) Combination of character, number and special character.
3) Combination of character, number, special character and date time.
Which combination may i use?
How many character of random string may I generate.?
Any other method which is secure then please let me know.?
Here are some considerations:
Alphabet
The number of characters can be considered the alphabet for the encoding. It doesn't affect the string strength by itself but a larger alphabet (numbers, non-alpha-number characters, etc.) does allow for shorter strings of similar strength (aka keyspace) so it's useful if you are looking for shorter strings.
Input Values
To guarantee your string to be unique, you need to add something which is guaranteed to be unique.
Random value is a good seed value if you have a good random number generator
Time is a good seed value to add but it may not be unique in a high traffic environment
User ID is a good seed value if you assume a user isn't going to create sessions at the exact same time
Unique ID is something the system guarantees is unique. This is often something that the server will guarantee / verify is unique, either in a single server deployment or distributed deployment. A simple way to do this is to add a machine ID and machine unique ID. A more complicated way to do this is to assign key ranges to machines and have each machine manage their key range.
Systems that I've worked with that require absolute uniqueness have added a server unique id which guarantees a item is unique. This means the same item on different servers would be seen as different, which was what was wanted here.
Approach
Pick one more input values that matches your requirement for uniqueness. If you need absolute uniqueness forever, you need something that you control that you are sure is unique, e.g. a machine associated number (that won't conflict with others in a distributed system). If you don't need absolute uniqueness, you can use a random number with other value such as time. If you need randomness, add a random number.
Use an alphabet / encoding that matches your use case. For machine ids, encodings like hexadecimal and base 64 are popular. For machine-readable ids, for case-insensitive encodings, I prefer base32 (Crockford) or base36 and for case-sensitive encodings, I prefer base58 or base62. This is because these base32, 36, 58 and 62 produce shorter strings and (vs. base64) are safe across multiple uses (e.g. URLs, XML, file names, etc.) and don't require transformation between different use cases.
You can definitely get a lot fancier depending on your needs, but I'll just throw this out there since it's what I use frequently for stuff like what you are describing:
md5(rand());
It's quick, simple and easy to remember. And since it's hexadecimal it plays nicely with others.
Refer to this SO Protected Question. This might be what you are looking.
I think its better to redirect you to a previously asked question which has more substantive answers.You will find a lot of options.
Try the code, for function getUniqueToken() which returns you unique string of length 10 (default).
/*
This function will return unique token string...
*/
function getUniqueToken($tokenLength = 10){
$token = "";
//Combination of character, number and special character...
$combinationString = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789*#&$^";
for($i=0;$i<$tokenLength;$i++){
$token .= $combinationString[uniqueSecureHelper(0,strlen($combinationString))];
}
return $token;
}
/*
This helper function will return unique and secure string...
*/
function uniqueSecureHelper($minVal, $maxVal) {
$range = $maxVal - $minVal;
if ($range < 0) return $minVal; // not so random...
$log = log($range, 2);
$bytes = (int) ($log / 8) + 1; // length in bytes
$bits = (int) $log + 1; // length in bits
$filter = (int) (1 << $bits) - 1; // set all lower bits to 1
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
$rnd = $rnd & $filter; // discard irrelevant bits
} while ($rnd >= $range);
return $minVal + $rnd;
}
Use this code (two function), you can increase string length by passing int parameter like getUniqueToken(15).
I use your 2nd idea (Combination of character, number and special character), which you refine after googling. I hope my example will help you.
You should go for 3 option. Because it has date and time so it become every time unique.
And for method have you tried
str_shuffle($string)
Every time it generates random string from $string.
End then use substr
($string , start , end)
to cut it down.
End if you want date and time then concatenate the result string with it.
An easily understandable and effective code to generate random strings in PHP. I do not consider predictability concerns important in this connection.
<?php
$d = str_shuffle('0123456789');
$C = str_shuffle('ABCDEFGHIJKLMNOPQRSTUVWXYZ');
$m = str_shuffle('abcdefghijklmnopqrstuvwxyz');
$s = str_shuffle('#!$&()*+-_~');
$l=9; //min 4
$r=substr(str_shuffle($d.$C.$m.$s),0,$l);echo $r.'<br>';
$safe=substr($d,0,1).substr($C,0,1).substr($m,0,1).mb_substr($s,0,1);
$r=str_shuffle($safe.substr($r,0,$l-4));//always at least one digit, special, small and capital
// this also allows for 0,1 or 2 of each available characters in string
echo $r;
exit;
?>
For unique string use uniqid().
And to make it secure, use hashing algorithms
for example :
echo md5(uniqid())

How to generate card number so the users cannot follow how much is sold?

I want some generator script to generate unique numbers but not in one order. We need to sell tickets.
For example currently ticket numbers are like this:
100000
100001
100002
...
So the users can see how many are sold.
How can I generate unique numbers?
for example:
151647
457561
752163
...
I could use random number generator, but then I have always check in database if such number has not been generated.
Hmm, maybe when using index on that column - the check would not take long.
Still now I have to get last card number, if I want to add 1 to it, but getting last is fast enough.
And the more tickets will be sold, then bigger chance that RNG will generate existing number. So migth be more checks in future. SO the best would be to take last number and generate next by it.
Here's a simple way to scramble ticket numbers (note: you need 64-bit PHP, or change the code to use the bcmath library):
function scramble($number) {
return (305914*($number-100000)+151647) % 999983;
}
Look, the output even looks like your example:
Input Output
------ ------
100000 151647
100001 457561
100002 763475
100003 069406
If you want to you can reverse it, so you can use these codes in URLs and then recover the original number:
function unscramble($number) {
return (605673*($number-151647)+100000) % 999983 ;
}
Is this safe? Someone with access to many sequential numbers can find the pattern so don't use this if the ticket numbers are extremely sensitive.
Generate random numbers, make the ticket number unique index, insert the record with the new ticket, if fails means that you had a collision, so you have to generate another id. With a good random space, say 32 bit integer, the chance of collision is minimal. The SQL implementation behind if the column is index and numerical is lightning fast.
You can have your number generated, store in a pool, when you need new number, get one with RNG index of the pool, remove from the pool and return it.
if the pool nearly run out, just generate another batch of it
function generateCode() {
$chars = '01234567890';
do {
$code = '';
for ($x = 0; $x < 6; $x++) {
$code .= $chars[ rand(0, strlen($chars)-1) ];
}
you may check here in databse if this code has been generated earlier, if yes, return;
} while (true);
return $code;
}
The easy way, you can simply use md5() function..
And to get a 6 digit string, you can do
$x = md5(microtime());
echo substr($x, 0, 6);
Edit:
session_start();
$x = md5(microtime().session_id());
echo substr($x, 0, 6);

Never Generate Random Number Again

I am looking for a random number generating PHP Solution which did not generate same number again.. is there any solution then please let me know..
I need this solution for one of my Project which generate uniqu key for URL and i don't want to check Generated number is existed or not from the data..
Thanks..
--------- EDIT ----------
I am using this random number generating method is its help full?
function randomString($length = 10, $chars = '1234567890') {
// Alpha lowercase
if ($chars == 'alphalower') {
$chars = 'abcdefghijklmnopqrstuvwxyz';
}
// Numeric
if ($chars == 'numeric') {
$chars = '1234567890';
}
// Alpha Numeric
if ($chars == 'alphanumeric') {
$chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890';
}
// Hex
if ($chars == 'hex') {
$chars = 'ABCDEF1234567890';
}
$charLength = strlen($chars)-1;
for($i = 0 ; $i < $length ; $i++)
{
$randomString .= $chars[mt_rand(0,$charLength)];
}
return $randomString;
}
Look at the php function uniqid():
http://php.net/manual/en/function.uniqid.php
It's impossible to generate a random number which is unique - if the generator is dependent on state, then the output is by definition not random.
It is possible to generate a set of random numbers and remove duplicates (although at the numbers again cease to be be truly random).
Do you really need a random number or do you need a sequence number or a unique identifier - these are 3 separate things.
which generate unique key for URL
MySQL and SQLite both support auto-increment column types which will be unique (effectively the same as a sequence number). MySQL even has a mechanism for ensuring uniqueness across equivalent nodes - even where they are not tightly coupled. Oracle provides sequence generators.
Both MySQL and PHP have built-in functionality for generating uuids, although since most DBMS support surrogate key generation, there is little obvious benefit to this approach.
You can use a database... Everytime a random number has shown up, put it in a database and next time, compare the random number of the new script with those already in the database.
Use a random number generator, keep stored the already generated values, discard and generate again when you get a duplicate number.
Ignore uniqids and stuff like that because they are just plain wrong.
There are no real "perfect and low price" random number generators!!
The best that can be done from mathematical functions are pseudorandom which in the end seem random enough for most intents and purposes.
mt_rand function uses the Mersenne twister, which is a pretty good PRNG!
so it's probably going to be good enough for most casual use.
give a look here for more info: http://php.net/manual/en/function.mt-rand.php
a possible code implementation is
<?php
$random = mt_rand($yourMin, $yourMax);
?>
EDITD:
find a very good explanation here:
Generate cryptographically secure random numbers in php
The typical answer is to use a GUID or UUID, although I avoid those forms that use only random numbers. (Eg, avoid version 4 GUID or UUIDs)

How to generate unguessable "tiny url" based on an id?

I'm interested in creating tiny url like links. My idea was to simply store an incrementing identifier for every long url posted and then convert this id to it's base 36 variant, like the following in PHP:
$tinyurl = base_convert($id, 10, 36)
The problem here is that the result is guessable, while it has to be hard to guess what the next url is going to be, while still being short (tiny). Eg. atm if my last tinyurl was a1, the next one will be a2. This is a bad thing for me.
So, how would I make sure that the resulting tiny url is not as guessable but still short?
What you are asking for is a balance between reduction of information (URLs to their indexes in your database), and artificial increase of information (to create holes in your sequence).
You have to decide how important both is for you. Another question is whether you just do not want sequential URLs to be guessable, or have them sufficiently random to make guessing any valid URL difficult.
Basically, you want to declare n out of N valid ids. Choose N smaller to make the URLs shorter, and make n smaller to generate URLs that are difficult to guess. Make n and N larger to generate more URLs when the shorter ones are taken.
To assign the ids, you can just take any kind of random generator or hash function and cap this to your target range N. If you detect a collision, choose the next random value. If you have reached a count of n unique ids, you must increase the range of your ID set (n and N).
I would simply crc32 url
$url = 'http://www.google.com';
$tinyurl = hash('crc32', $url ); // db85f073
cons: constant 8 character long identifier
This is really cheap, but if the user doesn't know it's happening then it's not as guessable, but prefix and postfix the actual id with 2 or 3 random numbers/letters.
If I saw 9d2a1me3 I wouldn't guess that dm2a2dq2 was the next in the series.
Try Xor'ing the $id with some value, e.g. $id ^ 46418 - and to convert back to your original id you just perform the same Xor again i.e. $mungedId ^ 46418. Stack this together with your base_convert and perhaps some swapping of chars in the resultant string and it'll get quite tricky to guess a URL.
Another way would be to set the maximum number of characters for the URL (let's say it's n). You could then choose a random number between 1 and n!, which would be your permutation number.
On which new URL, you would increment the id and use the permutation number to associate the actual id that would be used. Finally, you would base 32 (or whatever) encode your URL. This would be perfectly random and perfectly reversible.
If you want an injective function, you can use any form of encryption. For instance:
<?php
$key = "my secret";
$enc = mcrypt_ecb (MCRYPT_3DES, $key, "42", MCRYPT_ENCRYPT);
$f = unpack("H*", $enc);
$value = reset($f);
var_dump($value); //string(16) "1399e6a37a6e9870"
To reverse:
$rf = pack("H*", $value);
$dec = rtrim(mcrypt_ecb (MCRYPT_3DES, $key, $rf, MCRYPT_DECRYPT), "\x00");
var_dump($dec); //string(2) "42"
This will not give you a number in base 32; it will give you the encrypted data with each byte converted to base 16 (i.e., the conversion is global). If you really need, you can trivially convert this to base 10 and then to base 32 with any library that supports big integers.
You can pre-define the 4-character codes in advance (all possible combinations), then randomize that list and store it in this random order in a data table. When you want a new value, just grab the first one off the top and remove it from the list. It's fast, no on-the-fly calculation, and guarantees pseudo-randomness to the end-user.
Hashids is an open-source library that generates short, unique, non-sequential, YouTube-like ids from one or many numbers. You can think of it as an algorithm to obfuscate numbers.
It converts numbers like 347 into strings like "yr8", or array like [27, 986] into "3kTMd". You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
Use it when you don't want to expose your database ids to the user.
It allows custom alphabet as well as salt, so ids are unique only to you.
Incremental input is mangled to stay unguessable.
There are no collisions because the method is based on integer to hex conversion.
It was written with the intent of placing created ids in visible places, like the URL. Therefore, the algorithm avoids generating most common English curse words.
Code example
$hashids = new Hashids();
$id = $hashids->encode(1, 2, 3); // o2fXhV
$numbers = $hashids->decode($id); // [1, 2, 3]
I ended up creating a md5 sum of the identifier, use the first 4 alphanumerics of it and if this is a duplicate simply increment the length until it is no longer a duplicate.
function idToTinyurl($id) {
$md5 = md5($id);
for ($i = 4; $i < strlen($md5); $i++) {
$possibleTinyurl = substr($md5, 0, $i);
$res = mysql_query("SELECT id FROM tabke WHERE tinyurl='".$possibleTinyurl."' LIMIT 1");
if (mysql_num_rows($res) == 0) return $possibleTinyurl;
}
return $md5;
}
Accepted relet's answer as it's lead me to this strategy.

random and unique subsets generation

Lets say we have numbers from 1 to 25 and we have to choose sets of 15 numbers.
The possible sets are, if i'm right 3268760.
Of those 3268760 options, you have to generate say 100000
What would be the best way to generate 100000 unique and random of that subsets?
Is there a way, an algorithm to do that?
If not, what would be the best option to detect duplicates?
I'm planning to do this on PHP but a general solution would be enough,
and any reference not to much 'academic' (more practical) would help me a lot.
There is a way to generate a sample of the subsets that is random, guaranteed not to have duplicates, uses O(1) storage, and can be re-generated at any time. First, write a function to generate a combination given its lexical index. Second, use a pseudorandom permutation of the first Combin(n, m) integers to step through those combinations in a random order. Simply feed the numbers 0...100000 into the permutation, use the output of the permutation as input to the combination generator, and process the resulting combination.
Here's a solution in PHP based on mjv's answer, which is how I was thinking about it. If you run it for a full 100k sets, you do indeed see a lot of collisions. However, I'm hard pressed to devise a system to avoid them. Instead, we just check them fairly quickly.
I'll think about better solutions ... on this laptop, I can do 10k sets in 5 seconds, 20k sets in under 20 seconds. 100k takes several minutes.
The sets are represented as (32-bit) ints.
<?PHP
/* (c) 2009 tim - anyone who finds a use for this is very welcome to use it with no restrictions unless they're making a weapon */
//how many sets shall we generate?
$gNumSets = 1000;
//keep track of collisions, just for fun.
$gCollisions = 0;
$starttime = time();
/**
* Generate and return an integer with exactly 15 of the lower 25 bits set (1) and the other 10 unset (0)
*/
function genSetHash(){
$hash = pow(2,25)-1;
$used = array();
for($i=0;$i<10;){
//pick a bit to turn off
$bit = rand(0,24);
if (! in_array($bit,$used)){
$hash = ( $hash & ~pow(2,$bit) );
$i++;
$used[] = $bit;
}
}
return $hash;
}
//we store our solution hashes in here.
$solutions = array();
//generate a bunch of solutions.
for($i=0;$i<$gNumSets;){
$hash = genSetHash();
//ensure no collisions
if (! in_array($hash,$solutions)){
$solutions[] = $hash;
//brag a little.
echo("Generated $i random sets in " . (time()-$starttime) . " seconds.\n");
$i++;
}else {
//there was a collision. There will generally be more the longer the process runs.
echo "thud.\n";
$gCollisions++;
}
}
// okay, we're done with the hard work. $solutions contains a bunch of
// unique, random, ints in the right range. Everything from here on out
// is just output.
//takes an integer with 25 significant digits, and returns an array of 15 numbers between 1 and 25
function hash2set($hash){
$set = array();
for($i=0;$i<24;$i++){
if ($hash & pow(2,$i)){
$set[] = $i+1;
}
}
return $set;
}
//pretty-print our sets.
function formatSet($set){
return "[ " . implode(',',$set) . ']';
}
//if we wanted to print them,
foreach($solutions as $hash){
echo formatSet(hash2set($hash)) . "\n";
}
echo("Generated $gNumSets unique random sets in " . (time()-$starttime) . " seconds.\n");
echo "\n\nDone. $gCollisions collisions.\n";
I think it's all correct, but it's late, and I've been enjoying several very nice bottles of beer.
Do they have to be truly random? Or seemingly random?
Selection: generate a set with all 25 - "shuffle" the first 15 elements using Fisher-Yates / the Knuth shuffle, and then check if you've seen that permutation of the first 15 elements before. If so, disregard, and retry.
Duplicates: You have 25 values that are there or not - this can be trivially hashed to an integer value (if the 1st element is present, add 2^0, if the second is, add 2^1, etc. - it can be directly represented as a 25 bit number), so you can check easily if you've seen it already.
You'll get a fair bit of collisions, but if it's not a performance critical snippet, it might be doable.
The random number generator (RNG) of your environment will supply you random numbers that are evenly distributed in a particular range. This type of distribution is often what is needed, say if your subset simulate lottery drawings, but it is important to mention this fact in case your are modeling say the age of people found on the grounds of a middle school...
Given this RNG you can "draw" 10 (or 15, read below) numbers between 1 and 25. This may require that you multiply (and round) the random number produced by the generator, and that you ignore numbers that are above 25 (i.e. draw again), depending on the exact API associated with the RNG, but again getting a drawing in a given range is trivial. You will also need to re-draw when a number comes up again.
I suggest you get 10 numbers only, as these can be removed from the 1-25 complete sequence to produce a set of 15. In other words drawing 15 to put in is the same drawing 10 to take out...
Next you need to assert the uniqueness of the sets. Rather than storing the whole set, you can use a hash to identify each set uniquely. This should take fewer that 25 bits, so can be stored on a 32 bits integer. You then need to have an efficient storage for up to 100,000 of these values; unless you want to store this in a database.
On this question of uniqueness of 100,000 sets taken out of all the possible sets, the probability of a collision seems relatively low. Edit: Oops... I was optimistic... This probability is not so low, with about 1.5% chance of a collision starting after drawing the 50,000th, there will be quite a few collisions, enough to warrant a system to exclude them...

Categories