I have read around 5-10 different posts on this subject and none of them give a clear example; they just explain the backstory.
I have a MySQL database with records numbered from "1" to "500000".
I want the URLs to be based on these record ID numbers.
I want the URL to stay at a constant length of between 3 and 5 characters.
Example:
http://wwwurl.com/1 would be http://wwwurl.com/ASd234s
again
http://wwwurl.com/5000000 would be http://wwwurl.com/Y2v0R4r
Can I get a clear example of function code that makes this work? Thanks.
To reduce the ID number to a shorter string, convert it to base 35:
$short_id=base_convert($id, 10, 35);
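If you want to make the sequence harder to predict, pad it out and XOR it with a known string:
function shortcode($id)
{
    // Convert to base 35 and left-pad to 4 characters (ids up to 1,500,624 fit)
    $short_id = str_pad(base_convert($id, 10, 35), 4, '0', STR_PAD_LEFT);
    $final = '';
    $key = 'a59t'; // 500000 = 'bn5p' in base 35
    for ($x = 0; $x < strlen($short_id); $x++) {
        // XOR each character with the matching key character; the result can
        // contain non-printable bytes, so urlencode() it before using it in a URL
        $final .= chr(ord($short_id[$x]) ^ ord($key[$x]));
    }
    return $final;
}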
And to get the original id back, just reverse the process.
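XOR on the raw characters can leave the URL-safe alphabet. A minimal sketch of the same "pad and mix with a known key" idea that stays inside base 35, together with the reverse step, is below. It swaps XOR for addition modulo 35 on each base-35 digit (an assumption, not the answer's exact method); the constant names are hypothetical and the key is the answer's example key.

```php
<?php
// Hypothetical constants: base_convert()'s base-35 alphabet, the example key, and width
const SC_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxy';
const SC_KEY = 'a59t';
const SC_WIDTH = 4; // 4 base-35 digits cover ids up to 35^4 - 1 = 1,500,624

function shortcode_encode(int $id): string
{
    $alpha = SC_ALPHABET;
    $key = SC_KEY;
    $digits = str_pad(base_convert((string) $id, 10, 35), SC_WIDTH, '0', STR_PAD_LEFT);
    $out = '';
    for ($i = 0; $i < SC_WIDTH; $i++) {
        $d = strpos($alpha, $digits[$i]);
        $k = strpos($alpha, $key[$i]);
        // Shift the digit by the key digit, wrapping around at 35
        $out .= $alpha[($d + $k) % 35];
    }
    return $out;
}

function shortcode_decode(string $code): int
{
    $alpha = SC_ALPHABET;
    $key = SC_KEY;
    $digits = '';
    for ($i = 0; $i < SC_WIDTH; $i++) {
        $e = strpos($alpha, $code[$i]);
        $k = strpos($alpha, $key[$i]);
        // Undo the shift; adding 35 keeps the left operand of % non-negative
        $digits .= $alpha[($e - $k + 35) % 35];
    }
    return (int) base_convert($digits, 35, 10);
}

echo shortcode_encode(1), ' ', shortcode_encode(500000), PHP_EOL; // a59u lsej
```

Like the XOR version, this only obfuscates: consecutive ids (a59u, a59v, ...) still share most characters.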
A very stupid example: use e.g. substr(md5($id), 10, 15), where $id is your 1-500000 record ID number. That takes 15 characters starting at position 10 of the 32-character hash (you can also start at another position, e.g. 24). With 15 hex characters the probability of two IDs producing the same substring is very small, though not zero, and it rises quickly as the substring gets shorter.
It would also be better to save the ID <-> HASH mappings to a DB table so you can easily find the relevant record from the URL.
The full source code (hash creation, URL rewriting, saving the mappings and retrieving records by URL) is a complex problem that could be implemented in thousands of variations, and it depends mainly on the programmer's skills and experience and on the system it is being built into...
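To see how quickly short hash substrings collide, a small sketch that counts repeats for a given offset and length. The function name is hypothetical, and the 100,000 ids and 4-character length in the call are arbitrary choices for illustration.

```php
<?php
// Count ids in 1..$max whose md5 substring was already produced by a smaller id
function count_md5_collisions(int $max, int $offset, int $length): int
{
    $seen = [];
    $collisions = 0;
    for ($id = 1; $id <= $max; $id++) {
        $code = substr(md5((string) $id), $offset, $length);
        if (isset($seen[$code])) {
            $collisions++;
        } else {
            $seen[$code] = true;
        }
    }
    return $collisions;
}

// A 4-hex-character code has only 16^4 = 65,536 possible values
echo count_md5_collisions(100000, 10, 4), PHP_EOL;
```

Because 100,000 ids cannot fit into 65,536 distinct codes, this must report at least 34,464 collisions; a mapping table is what lets you detect and resolve them.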
I'm new to PHP and I'm studying on my own. Normally I create my tokens and insert them into the tables like this:
private function create_token($reference, $bytes, $slice)
{
$key = substr(preg_replace('/\W/', "", base64_encode(bin2hex(random_bytes($bytes)))), 0, $slice);
return $reference . $key;
}
$token = $this->create_token('token_B8', 34, 22); // called inside the class (the method is private), e.g. token_B8eEr32EEddDsfSDGRGgHHhg
This may be a correct way to create tokens, but my doubt is whether it really is the correct way. I was thinking the chance of there being 2 identical tokens in the table is something like 1 in 1,000,000,000, correct? Or is there a way to create a token that says:
Under no circumstances create an equal token
without having to create a function that checks whether the token already exists in the table?
I believe what I should do is keep creating tokens the way I do, plus a function that checks whether the token already exists in the table: if it does, generate another token; if not, insert it into the table. This seems like a correct approach, but as I'm new I don't know if there is a more appropriate way. Can someone clear up this doubt for me? Thanks.
The string generated by random_bytes() is maximally random, and literally everything you do to it after that decreases the amount of randomness per character, so once you cut the result to a fixed length, the number of possible values it could be shrinks.
random_bytes(): 8 bits of randomness per byte.
bin2hex() stretches each byte of input over two bytes. [x0.5]
base64_encode() stretches 3 input bytes over 4 output bytes. [x0.75]
preg_replace('/\W/', "", $input): effectively changes the encoding from base64 to base62, shrinking the space slightly once again. [x??? < 1]
So, all told, that 22-byte token you're generating represents 22 * 8 * 0.5 * 0.75 * ??? <= 66 bits of random data. So <= 73,786,976,294,838,206,464 possibilities.
Boy howdy, that sure seems like a lot, right? Well not really. Because of the Birthday Paradox the probability of collisions can get into the range of causing issues while you're still a few orders of magnitude away from filling the range.
I guess if we remove that pointless bin2hex() we could squeeze out another 66 bits for 132 in total? But how much more does that really get us?
5,444,517,870,735,015,415,413,993,718,908,291,383,296
A lot. A lot. I don't even care about that preg_replace() anymore.
For the sake of completeness, what about just a random_bytes(22)? 176 bits?
95,780,971,304,118,053,647,396,689,196,894,323,976,171,195,136,475,136
I guess the take-aways are:
Don't confuse data encoding with "make more random" just because the output looks garbled. [Note: the same goes for hash functions]
Don't apply functions/encodings willy-nilly if you don't know what they are actually doing.
In code:
$input = 'abc';
// all of these outputs contain the SAME amount of entropy, some of them are just longer representations
var_dump(
$input,
bin2hex($input),
base64_encode($input),
base64_encode(bin2hex($input)),
bin2hex(base64_encode($input))
);
Output:
string(3) "abc"
string(6) "616263"
string(4) "YWJj"
string(8) "NjE2MjYz"
string(8) "59574a6a"
Anyway, with a sufficiently large random ID space it's more pragmatic to just put a UNIQUE constraint on the value and let the process fail when a duplicate value tries to be inserted. You can put in some retry logic, but odds are that it will never actually run unless someone leverages vulnerabilities specifically to make you generate duplicates and DoS yourself with retries. [yes, this is a thing]
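A minimal sketch of the "UNIQUE constraint plus bounded retry" idea. The callback is hypothetical: it stands in for your INSERT and should return false when the database rejects the value as a duplicate (with PDO this is typically a PDOException whose SQLSTATE is 23000).

```php
<?php
// $tryInsert (hypothetical) attempts the INSERT and returns false on a duplicate-key error
function create_unique_token(callable $tryInsert, int $maxAttempts = 3): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        // 16 random bytes = 128 bits; bin2hex() only changes the representation
        $token = bin2hex(random_bytes(16));
        if ($tryInsert($token)) {
            return $token;
        }
    }
    // Repeated duplicates at this size point to a bug or an attack, not bad luck
    throw new RuntimeException("No unique token after $maxAttempts attempts");
}
```

The retry bound keeps a misbehaving insert from looping forever, which matters for the self-inflicted DoS case mentioned above.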
So I'm trying to create a website with a coinflip system (it's just a small project I'm doing in my free time), but I don't really know where to begin. I need to make it in PHP (so it's in the backend) and I need it to be provably fair (so I can prove that it is legit). What I've found out is that I need to use something like SHA-256, but I also heard that it's pretty outdated and can be easily cracked. Also, if it matters, it's a site with a Steam login system, so I plan on letting users join 1v1s with other Steam users, not just a person sitting beside me or something (not just one button, is what I mean, hehe).
EDIT: I have googled it and asked people I know whether they knew anything, but nothing I found was any good.
Thanks in advance
-Eiríkur
This is a simple way to get a random coin toss result:
$result = array("heads", "tails")[random_int(0,1)];
First, we make an array of our choices: array("heads", "tails") means we will always get one of those 2 results. Next, in the same line, we select a single element from that array to assign to the $result variable, using random_int(min, max) to generate the index.
Note: random_int() generates cryptographic random integers that are
suitable for use where unbiased results are critical, such as when
shuffling a deck of cards for a poker game.
http://php.net/manual/en/function.random-int.php
As a bonus, you could add more elements to this array, increase the max value in random_int(), and it will still work. You could also make this more dynamic by doing it like this:
$choices = ["heads", "tails", "Coin flew off the table"];
$result = $choices[random_int(0, count($choices) - 1)];
With the above code, you can have as many choices as you'd like!
Testing
I ran this code 50,000 times, and these were my results.
Array
(
[heads] => 24923
[tails] => 25077
)
And I ran this code 100,000 times, these were my results:
Array
(
[tails] => 49960
[heads] => 50040
)
You can play around with this here, to check out results:
https://eval.in/894945
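The answer does not show how its counts were collected; one way to reproduce such a tally is below (tally_flips() is a hypothetical helper).

```php
<?php
// Tally $runs coin flips made with random_int(), as in the answer above
function tally_flips(int $runs): array
{
    $choices = ['heads', 'tails'];
    $counts = ['heads' => 0, 'tails' => 0];
    for ($i = 0; $i < $runs; $i++) {
        $counts[$choices[random_int(0, count($choices) - 1)]]++;
    }
    return $counts;
}

print_r(tally_flips(50000));
```

The exact numbers change on every run; only the roughly even split should repeat.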
The answer above might be the best for most scenarios.
In commercial usage, you might want to make sure that the results can be recalculated to prove fairness.
In the following code, you need to generate a seed for the server. You might also want to create a public seed that users can see. These can be anything, but I do recommend using some kind of hash. Each time you want a new result, just increase the round; it will generate a new result that looks random but can be recalculated by anyone who knows both seeds.
$server_seed = "96f3ea4d221ca1b2048cc3b3b844e479f2bd9c80a870628072ee98fd1aa83cd0";
$public_seed = "460679512935";
for($round = 0;$round < 10;$round++) {
$hash = hash('sha256', $server_seed . "-" . $public_seed . "-" . $round);
if (hexdec(substr($hash, 0, 8)) % 2) {
echo 'heads', PHP_EOL;
} else {
echo 'tails', PHP_EOL;
}
}
This code loops 10 times using a for loop, generating a new result each time. In each iteration we assign a SHA-256 hash to the $hash variable, then use PHP's built-in hexdec function to convert the first 8 hex characters of $hash to a decimal value. We take the remainder of that value divided by 2 and output heads if it is 1 and tails if it is 0.
NOTE: You can play around with the values. Changing the substring to substr($hash, 0, 14) will give you different individual results, but keep in mind that it will not change the overall distribution of the results in any way.
Average results of 1 000 000 runs were the following:
Heads: 50.12%
Tails: 49.88%
You can experiment with the code above yourself.
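The answer says results "can be recalculated to prove fairness" but does not spell out the protocol. One common approach, assumed here, is commit-reveal: publish hash('sha256', $server_seed) before play, reveal $server_seed afterwards, and let players recompute every round with the same rule as the loop above. A minimal sketch (function names are hypothetical):

```php
<?php
// Same rule as the loop above: odd => heads, even => tails
function flip(string $serverSeed, string $publicSeed, int $round): string
{
    $hash = hash('sha256', $serverSeed . '-' . $publicSeed . '-' . $round);
    return hexdec(substr($hash, 0, 8)) % 2 ? 'heads' : 'tails';
}

// Published before any round is played
function commitment(string $serverSeed): string
{
    return hash('sha256', $serverSeed);
}

// A player checks the revealed seed against the earlier commitment
// and recomputes the claimed outcome
function verify(string $commitment, string $revealedSeed, string $publicSeed, int $round, string $claimed): bool
{
    return hash_equals($commitment, commitment($revealedSeed))
        && flip($revealedSeed, $publicSeed, $round) === $claimed;
}

$serverSeed = bin2hex(random_bytes(32)); // keep secret until the rounds are over
echo commitment($serverSeed), PHP_EOL;   // publish this first
```

Once a server seed has been revealed it can no longer hide future outcomes, so each commitment should cover a fresh seed.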
I want a generator script that produces unique numbers, but not in sequential order. We need to sell tickets.
For example, ticket numbers currently look like this:
100000
100001
100002
...
So users can see how many have been sold.
How can I generate unique numbers?
for example:
151647
457561
752163
...
I could use a random number generator, but then I would always have to check the database to make sure that number has not been generated before.
Hmm, maybe with an index on that column the check would not take long.
Right now I also have to fetch the last card number in order to add 1 to it, but getting the last one is fast enough.
And the more tickets are sold, the bigger the chance that the RNG will generate an existing number, so there might be more checks in the future. So the best option would be to take the last number and generate the next one from it.
Here's a simple way to scramble ticket numbers (note: you need 64-bit PHP, or change the code to use the bcmath library):
function scramble($number) {
return (305914*($number-100000)+151647) % 999983;
}
Look, the output even looks like your example (shown zero-padded to 6 digits):
Input Output
------ ------
100000 151647
100001 457561
100002 763475
100003 069406
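If you want, you can reverse it, so you can use these codes in URLs and then recover the original number:
function unscramble($number) {
    // 605673 is the inverse of 305914 modulo 999983; adding 999983 keeps the operand
    // non-negative, since PHP's % returns a negative result for a negative left operand
    return (605673*($number-151647+999983)+100000) % 999983;
}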
Is this safe? Someone with access to many sequential numbers can find the pattern so don't use this if the ticket numbers are extremely sensitive.
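A quick round-trip check of the two functions (64-bit PHP assumed). It uses printf('%06d') to reproduce the leading zero in 069406 and guards unscramble() against a negative operand, because PHP's % keeps the sign of its left operand. The round trip holds for ticket numbers 100000 to 999982.

```php
<?php
// Affine map modulo the prime 999983
function scramble(int $number): int
{
    return (305914 * ($number - 100000) + 151647) % 999983;
}

// 605673 * 305914 = 1 (mod 999983); + 999983 keeps the product non-negative
function unscramble(int $number): int
{
    return (605673 * ($number - 151647 + 999983) + 100000) % 999983;
}

for ($n = 100000; $n <= 100003; $n++) {
    printf("%d %06d %d\n", $n, scramble($n), unscramble(scramble($n)));
}
```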
Generate random numbers and make the ticket number a unique index. Insert the record with the new ticket; if the insert fails, you had a collision, so generate another id. With a good random space, say a 32-bit integer, the chance of a collision on any one insert is low. If the column is indexed and numeric, the SQL lookup behind it is lightning fast.
You can pre-generate your numbers and store them in a pool. When you need a new number, pick one at a random index in the pool, remove it from the pool and return it.
If the pool nearly runs out, just generate another batch.
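A minimal in-memory sketch of the pool idea. The class name, the use of range() for a batch, and keeping the pool in a PHP array rather than a database table are all assumptions.

```php
<?php
// Hypothetical pool of pre-generated ticket numbers
final class TicketPool
{
    private $pool = [];

    public function __construct(int $min, int $max)
    {
        $this->pool = range($min, $max); // one batch of candidate numbers
    }

    public function take(): int
    {
        $count = count($this->pool);
        if ($count === 0) {
            throw new RuntimeException('Pool is empty; generate another batch');
        }
        // Pick a random index, swap it with the last element, then pop it off
        $i = random_int(0, $count - 1);
        $last = $count - 1;
        [$this->pool[$i], $this->pool[$last]] = [$this->pool[$last], $this->pool[$i]];
        return array_pop($this->pool);
    }

    public function remaining(): int
    {
        return count($this->pool);
    }
}

$pool = new TicketPool(100000, 999999);
echo $pool->take(), PHP_EOL;
```

Swapping the chosen element to the end before array_pop() makes each removal constant-time instead of re-indexing the array.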
function generateCode() {
    $chars = '0123456789';
    do {
        $code = '';
        for ($x = 0; $x < 6; $x++) {
            $code .= $chars[rand(0, strlen($chars) - 1)];
        }
        // Check the database here: if this code was generated earlier, loop and try again.
        // codeExists() is a hypothetical lookup you would implement against your table.
    } while (codeExists($code));
    return $code;
}
The easy way: you can simply use the md5() function.
And to get a 6-character (hex) string, you can do:
$x = md5(microtime());
echo substr($x, 0, 6);
Edit:
session_start();
$x = md5(microtime().session_id());
echo substr($x, 0, 6);
I'm interested in creating TinyURL-like links. My idea was to simply store an incrementing identifier for every long URL posted and then convert this id to its base-36 variant, like the following in PHP:
$tinyurl = base_convert($id, 10, 36);
The problem here is that the result is guessable, while it has to be hard to guess what the next URL is going to be and still be short (tiny). E.g. at the moment, if my last tinyurl was a1, the next one will be a2. This is a bad thing for me.
So, how would I make sure that the resulting tiny url is not as guessable but still short?
What you are asking for is a balance between reduction of information (URLs to their indexes in your database), and artificial increase of information (to create holes in your sequence).
You have to decide how important each is for you. Another question is whether you just do not want sequential URLs to be guessable, or want them random enough that guessing any valid URL is difficult.
Basically, you want to declare n out of N valid ids. Choose N smaller to make the URLs shorter, and make n smaller to generate URLs that are difficult to guess. Make n and N larger to generate more URLs when the shorter ones are taken.
To assign the ids, you can just take any kind of random generator or hash function and cap this to your target range N. If you detect a collision, choose the next random value. If you have reached a count of n unique ids, you must increase the range of your ID set (n and N).
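A sketch of the n-out-of-N idea under stated assumptions: the used ids live in an in-memory array standing in for a database table, and both N and n grow by a factor of 36 (one extra base-36 character) when the space fills up, which keeps the odds of a random guess hitting a valid id at most n/N. The class name is hypothetical.

```php
<?php
// Hypothetical allocator: draw random ids from [0, N), retry on collision
final class SparseIdAllocator
{
    private $used = []; // ids handed out so far (stand-in for a DB table)
    private $N;         // size of the id space
    private $n;         // how many ids may be used before the space grows

    public function __construct(int $N, int $n)
    {
        $this->N = $N;
        $this->n = $n;
    }

    public function next(): string
    {
        if (count($this->used) >= $this->n) {
            // Too dense: grow both limits by one base-36 character
            $this->N *= 36;
            $this->n *= 36;
        }
        do {
            $id = random_int(0, $this->N - 1);
        } while (isset($this->used[$id])); // collision: choose the next random value
        $this->used[$id] = true;
        return base_convert((string) $id, 10, 36);
    }
}

$alloc = new SparseIdAllocator(36 ** 3, 100); // 3-character ids, at most 100 in use
echo $alloc->next(), PHP_EOL;
```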
I would simply CRC32 the URL:
$url = 'http://www.google.com';
$tinyurl = hash('crc32', $url ); // db85f073
Cons: a constant 8-character identifier.
This is really cheap, but if the user doesn't know it's happening, it's much less guessable: prefix and suffix the actual id with 2 or 3 random numbers/letters.
If I saw 9d2a1me3 I wouldn't guess that dm2a2dq2 was the next in the series.
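A minimal sketch of the prefix/suffix trick, assuming 3 random base-36 characters on each side of the base-36 id; the helper names are hypothetical.

```php
<?php
// Hypothetical helper: $len random characters from the base-36 alphabet
function random_chars(int $len): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyz';
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $out .= $alphabet[random_int(0, 35)];
    }
    return $out;
}

function wrap_id(int $id, int $pad = 3): string
{
    return random_chars($pad) . base_convert((string) $id, 10, 36) . random_chars($pad);
}

function unwrap_id(string $code, int $pad = 3): int
{
    // Strip the random prefix and suffix, then decode the base-36 core
    return (int) base_convert(substr($code, $pad, -$pad), 36, 10);
}

echo wrap_id(1234), PHP_EOL; // a different wrapping on every call
```

The random characters carry no information, so any wrapping of an id decodes to it; store the issued code if you need to reject made-up wrappings.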
Try XOR'ing the $id with some value, e.g. $id ^ 46418; to convert back to your original id, you just perform the same XOR again, i.e. $mungedId ^ 46418. Stack this together with your base_convert and perhaps some swapping of characters in the resulting string, and it'll get quite tricky to guess a URL.
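A minimal sketch of that stack: XOR with the answer's example value, base-36 encode, then reverse the string as one simple form of "swapping of chars" (the reversal and the function names are assumptions).

```php
<?php
const ID_MASK = 46418; // example value from the answer

function munge_id(int $id): string
{
    // XOR with the mask, encode in base 36, then reverse the characters
    return strrev(base_convert((string) ($id ^ ID_MASK), 10, 36));
}

function unmunge_id(string $code): int
{
    // Undo the steps in the opposite order; XOR is its own inverse
    return ((int) base_convert(strrev($code), 36, 10)) ^ ID_MASK;
}

echo munge_id(1), ' ', munge_id(2), PHP_EOL;
```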
Another way would be to set the maximum number of characters for the URL (let's say n). You could then choose a random number between 1 and n!, which would be your permutation number.
For each new URL, you would increment the id and use the permutation number to derive the actual id that would be used. Finally, you would base-32 (or whatever) encode your URL. This would look random and be perfectly reversible.
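One reading of this answer, sketched under assumptions: ids are written as 6 base-32 digits, and a single fixed permutation of those 6 positions (one of 6! = 720, chosen once at random) is applied to every id. The constants and function names are hypothetical.

```php
<?php
const PERM_WIDTH = 6;                // ids up to 32^6 - 1
const PERM = [3, 0, 5, 1, 4, 2];     // output position i takes input position PERM[i]

function permute_encode(int $id): string
{
    $s = str_pad(base_convert((string) $id, 10, 32), PERM_WIDTH, '0', STR_PAD_LEFT);
    $out = '';
    foreach (PERM as $src) {
        $out .= $s[$src];
    }
    return $out;
}

function permute_decode(string $code): int
{
    $s = array_fill(0, PERM_WIDTH, '0');
    foreach (PERM as $dst => $src) {
        $s[$src] = $code[$dst]; // put each character back where it came from
    }
    return (int) base_convert(implode('', $s), 32, 10);
}

echo permute_encode(1), PHP_EOL; // 001000
```

Because only positions move, consecutive ids still differ in a single place, so combining this with XOR or a character substitution hides the order better.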
If you want an injective function, you can use any form of encryption. For instance:
<?php
$key = "my secret";
$enc = mcrypt_ecb (MCRYPT_3DES, $key, "42", MCRYPT_ENCRYPT);
$f = unpack("H*", $enc);
$value = reset($f);
var_dump($value); //string(16) "1399e6a37a6e9870"
To reverse:
$rf = pack("H*", $value);
$dec = rtrim(mcrypt_ecb (MCRYPT_3DES, $key, $rf, MCRYPT_DECRYPT), "\x00");
var_dump($dec); //string(2) "42"
This will not give you a number in base 32; it will give you the encrypted data with each byte converted to base 16 (i.e., the conversion is global). If you really need, you can trivially convert this to base 10 and then to base 32 with any library that supports big integers.
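The mcrypt extension is no longer bundled with PHP (it was removed in 7.2), so a sketch of the same injective-encryption idea with openssl_encrypt() follows. It assumes AES-128 in ECB mode on exactly one 16-byte block with padding disabled, and a hypothetical hard-coded key; the hex output is 32 characters because the AES block is 16 bytes.

```php
<?php
const ID_KEY = '0123456789abcdef'; // hypothetical 16-byte key; load a real one from config

function encrypt_id(int $id): string
{
    // One 16-byte block: 8 zero bytes followed by the 64-bit big-endian id
    $block = str_pad(pack('J', $id), 16, "\0", STR_PAD_LEFT);
    // One block, no padding: a keyed permutation, so distinct ids give distinct outputs
    $enc = openssl_encrypt($block, 'aes-128-ecb', ID_KEY, OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING);
    return bin2hex($enc);
}

function decrypt_id(string $hex): int
{
    $block = openssl_decrypt(hex2bin($hex), 'aes-128-ecb', ID_KEY, OPENSSL_RAW_DATA | OPENSSL_ZERO_PADDING);
    return unpack('J', substr($block, 8))[1];
}

var_dump(encrypt_id(42), decrypt_id(encrypt_id(42)));
```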
You can pre-define the 4-character codes in advance (all possible combinations), then randomize that list and store it in this random order in a data table. When you want a new value, just grab the first one off the top and remove it from the list. It's fast, no on-the-fly calculation, and guarantees pseudo-randomness to the end-user.
Hashids is an open-source library that generates short, unique, non-sequential, YouTube-like ids from one or many numbers. You can think of it as an algorithm to obfuscate numbers.
It converts numbers like 347 into strings like "yr8", or array like [27, 986] into "3kTMd". You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
Use it when you don't want to expose your database ids to the user.
It allows custom alphabet as well as salt, so ids are unique only to you.
Incremental input is mangled to stay unguessable.
There are no collisions because the method is based on integer to hex conversion.
It was written with the intent of placing created ids in visible places, like the URL. Therefore, the algorithm avoids generating most common English curse words.
Code example
use Hashids\Hashids; // class from the hashids/hashids Composer package
$hashids = new Hashids();
$id = $hashids->encode(1, 2, 3); // o2fXhV
$numbers = $hashids->decode($id); // [1, 2, 3]
I ended up creating an MD5 sum of the identifier and using the first 4 characters of it; if that is a duplicate, I simply increase the length until it is no longer a duplicate.
function idToTinyurl(PDO $pdo, $id) {
    $md5 = md5($id);
    // Look up each candidate prefix with a prepared statement
    $stmt = $pdo->prepare("SELECT id FROM tabke WHERE tinyurl = ? LIMIT 1");
    // Try prefixes of length 4, 5, ... until one is not used yet
    for ($i = 4; $i < strlen($md5); $i++) {
        $possibleTinyurl = substr($md5, 0, $i);
        $stmt->execute([$possibleTinyurl]);
        if ($stmt->fetchColumn() === false) {
            return $possibleTinyurl;
        }
    }
    return $md5;
}
I accepted relet's answer, as it led me to this strategy.
I have built a PHP script to host a large number of images uploaded by users. What is the best way to generate random numbers for image filenames so that there will be no filename conflicts in the future? Something like ImageShack. Thanks.
$better_token = uniqid(md5(mt_rand()), true);
The easiest way would be a new GUID for each file.
http://www.php.net/manual/en/function.uniqid.php#65879
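Core PHP has no portable GUID function; a common sketch builds a random (version 4) UUID from random_bytes(), setting the version and variant bits as RFC 4122 specifies. The function name is hypothetical.

```php
<?php
function uuid_v4(): string
{
    $b = random_bytes(16);
    $b[6] = chr((ord($b[6]) & 0x0f) | 0x40); // version 4
    $b[8] = chr((ord($b[8]) & 0x3f) | 0x80); // RFC 4122 variant
    // 32 hex chars split into 8 groups of 4, laid out as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($b), 4));
}

echo uuid_v4() . '.jpg', PHP_EOL;
```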
Here's how I implemented your solution
This example assumes I want to:
Get a list containing 50 numbers that are unique and random, and
Have these numbers come from the range 0 to 1000
Code:
//developed by www.fatphuc.com
$array = array(); //define the array
//set random # range
$minNum = 0;
$maxNum = 1000;
// i just created this function, since we’ll be generating
// # in various sections, and i just want to make sure that
// if we need to change how we generate random #, we don’t
// have to make multiple changes to the codes everywhere.
// (basically, to prevent mistakes)
function GenerateRandomNumber($minNum, $maxNum){
return round(rand($minNum, $maxNum));
}
//generate 50 unique random #s
for($i = 1; $i <= 50; $i++){
$num1 = GenerateRandomNumber($minNum, $maxNum);
while(in_array($num1, $array)){
$num1 = GenerateRandomNumber($minNum, $maxNum);
}
$array[$i] = $num1;
}
asort($array); //just want to sort the array
//this simply prints the list of #s in list style
echo '<ol>';
foreach ($array as $var){
echo '<li>';
echo $var;
echo '</li>';
}
echo '</ol>';
Keep a persistent list of all the previous numbers you've generated (in a database table or in a file) and check that a newly generated number is not among the ones on the list. If you find this prohibitively expensive, generate random numbers with enough bits to guarantee a very low probability of collision.
You can also use an incremental approach of assigning these numbers, like a concatenation of a timestamp_part based on the current time and a random_part, just to make sure you don't get collisions if multiple users upload files at the same time.
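A minimal sketch of a timestamp_part plus random_part filename. The microsecond timestamp format, the 4 random bytes and the underscore separator are arbitrary choices for illustration.

```php
<?php
function upload_filename(string $ext): string
{
    $timestampPart = (new DateTime())->format('YmdHisu'); // 20 digits, down to microseconds
    $randomPart = bin2hex(random_bytes(4));               // 8 hex chars
    return $timestampPart . '_' . $randomPart . '.' . $ext;
}

echo upload_filename('jpg'), PHP_EOL;
```

Two uploads in the same microsecond would also need the same 32 random bits to clash.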
You could use microtime() as suggested above and then append a hash of the original filename to further avoid collisions in the (rare) case of exactly simultaneous uploads.
There are several flaws in your postulate that random values will be unique - regardless of how good the random number generator is. Also, the better the random number generator, the longer it takes to calculate results.
Wouldn't it be better to use a hash of the data file? That way you get the added benefit of detecting duplicate submissions.
If detecting duplicates is known to be a non-issue, then I'd still recommend this approach but modify the output based on detected collisions (but using a MUCH cheaper computation method than that proposed by Lo'oris) e.g.
$candidate_name = sha1_file($input_file); // hash of the uploaded file's contents
$offset = 0;
while (file_exists($candidate_name . strrev($offset)) && ($offset < 50)) {
    $offset++;
}
if ($offset<50) {
rename($input_file, $candidate_name . strrev($offset));
} else {
print "Congratulations - you've got the biggest storage network in the world by far!";
}
This would give you the capacity to store approx 25*2^63 files using a SHA-1 hash.
As to how to generate the hash, reading the entire file into PHP might be slow (particularly if you try to read it all into a single string to hash it). Most Linux/Posix/Unix systems come with tools like 'md5sum' which will generate a hash from a stream very efficiently.
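Within PHP itself, hash_file('sha1', $path) hashes a file without building one big string. The incremental form is sketched below with hash_init() and hash_update_stream(); the function name is hypothetical.

```php
<?php
// Hash a file from a stream instead of loading it into a single string
function stream_sha1(string $path): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $ctx = hash_init('sha1');
    hash_update_stream($ctx, $handle); // consumes the stream until EOF
    fclose($handle);
    return hash_final($ctx);
}
```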
1. Forge a filename
2. Try to open that file
3. If it exists, go to 1
4. Create the file
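The steps above can be sketched with fopen()'s 'x' mode, which creates the file and fails if it already exists, so the check and the creation happen in one step and two simultaneous uploads cannot both claim the same name. The function name, the 8 random bytes and the attempt limit are assumptions.

```php
<?php
function create_unique_file(string $dir, string $ext, int $maxAttempts = 10): string
{
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        // 1. forge a filename
        $path = $dir . DIRECTORY_SEPARATOR . bin2hex(random_bytes(8)) . '.' . $ext;
        // 2-4. 'x' creates the file only if it does not exist yet
        $handle = @fopen($path, 'x');
        if ($handle !== false) {
            fclose($handle);
            return $path;
        }
        // The name was taken (or the directory is not writable): try again
    }
    throw new RuntimeException('Could not create a unique file');
}
```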
Use something based on a timestamp, maybe. See the microtime function for details. Alternatively, use uniqid to generate a unique ID based on the current time.
Guaranteed unique cannot be random. Random cannot be guaranteed unique. If you want unique (without the random) then just use the integers: 0, 1, 2, ... 1235, 1236, 1237, ... Definitely unique, but not random.
If that doesn't suit, then you can have definitely unique with the appearance of random. You use encryption on the integers to make them appear random. Using DES will give you 64-bit numbers, while using AES will give you 128-bit numbers. Use either to encrypt 0, 1, 2, ... in order with the same key. All you need to store is the key and the next number to encrypt. Because encryption is reversible, the encrypted numbers are guaranteed unique.
If 128-bit or 64-bit numbers are too large (64 bits is 16 hex digits), then look at format-preserving encryption, which will give you a smaller range at some cost in time.
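To make the format-preserving idea concrete, here is a toy sketch over the six-digit range 0..999999. It uses a 4-round Feistel network on 20 bits (two 10-bit halves) with HMAC-SHA256 as the round function, plus cycle walking to stay inside the range. The key, round count and round function are assumptions, and this is an illustration, not a vetted scheme such as NIST's FF1.

```php
<?php
const FPE_KEY = 'hypothetical secret key';
const FPE_DOMAIN = 1000000; // six decimal digits
const FPE_ROUNDS = 4;

function fpe_round(int $half, int $round): int
{
    // Keyed round function: first 4 hex chars of an HMAC, reduced to 10 bits
    return hexdec(substr(hash_hmac('sha256', $round . ':' . $half, FPE_KEY), 0, 4)) & 0x3FF;
}

function feistel20(int $x): int
{
    $l = $x >> 10;
    $r = $x & 0x3FF;
    for ($i = 0; $i < FPE_ROUNDS; $i++) {
        [$l, $r] = [$r, $l ^ fpe_round($r, $i)];
    }
    return ($l << 10) | $r;
}

function feistel20_inverse(int $y): int
{
    $l = $y >> 10;
    $r = $y & 0x3FF;
    for ($i = FPE_ROUNDS - 1; $i >= 0; $i--) {
        [$l, $r] = [$r ^ fpe_round($l, $i), $l];
    }
    return ($l << 10) | $r;
}

// Cycle walking: re-apply the 20-bit permutation until the value is below 1,000,000
function fpe_encrypt(int $n): int
{
    do {
        $n = feistel20($n);
    } while ($n >= FPE_DOMAIN);
    return $n;
}

function fpe_decrypt(int $c): int
{
    do {
        $c = feistel20_inverse($c);
    } while ($c >= FPE_DOMAIN);
    return $c;
}

printf("%06d\n", fpe_encrypt(0)); // next counter values: fpe_encrypt(1), fpe_encrypt(2), ...
```

Encrypting 0, 1, 2, ... in order then yields distinct six-digit numbers, and only the key and the counter need to be stored.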
My solution is usually a hash (MD5/SHA1/...) of the image contents. This has the added advantage that if people upload the same image twice, you still only have one image on the hard disk, saving some space (of course, you have to make sure that the image is not deleted if one user deletes it while another user still has the same image in use).