Related
I'm trying to create a randomized string in PHP, and I get absolutely no output with this:
<?php
function RandomString()
{
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$randstring = '';
for ($i = 0; $i < 10; $i++) {
$randstring = $characters[rand(0, strlen($characters))];
}
return $randstring;
}
RandomString();
echo $randstring;
What am I doing wrong?
To answer this question specifically, two problems:
$randstring is not in scope when you echo it.
The characters are not getting concatenated together in the loop.
Here's a code snippet with the corrections:
function generateRandomString($length = 10) {
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
$charactersLength = strlen($characters);
$randomString = '';
for ($i = 0; $i < $length; $i++) {
$randomString .= $characters[random_int(0, $charactersLength - 1)];
}
return $randomString;
}
Output the random string with the call below:
// Echo the random string.
// Optionally, you can give it a desired string length.
echo generateRandomString();
Please note that previous version of this answer used rand() instead of random_int() and therefore generated predictable random strings. So it was changed to be more secure, following advice from this answer.
Note: str_shuffle() internally uses rand(), which is unsuitable for cryptography purposes (e.g. generating random passwords). You want a secure random number generator instead. It also doesn't allow characters to repeat.
One more way.
UPDATED (now this generates any length of string):
function generateRandomString($length = 10) {
return substr(str_shuffle(str_repeat($x='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', ceil($length/strlen($x)) )),1,$length);
}
echo generateRandomString(); // OR: generateRandomString(24)
That's it. :)
There are a lot of answers to this question, but none of them leverage a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG).
The simple, secure, and correct answer is to use RandomLib and don't reinvent the wheel.
For those of you who insist on inventing your own solution, PHP 7.0.0 will provide random_int() for this purpose; if you're still on PHP 5.x, we wrote a PHP 5 polyfill for random_int() so you can use the new API even before you upgrade to PHP 7.
Safely generating random integers in PHP isn't a trivial task. You should always check with your resident StackExchange cryptography experts before you deploy a home-grown algorithm in production.
With a secure integer generator in place, generating a random string with a CSPRNG is a walk in the park.
Creating a Secure, Random String
/**
* Generate a random string, using a cryptographically secure
* pseudorandom number generator (random_int)
*
* This function uses type hints now (PHP 7+ only), but it was originally
* written for PHP 5 as well.
*
* For PHP 7, random_int is a PHP core function
* For PHP 5.x, depends on https://github.com/paragonie/random_compat
*
* #param int $length How many characters do we want?
* #param string $keyspace A string of all possible characters
* to select from
* #return string
*/
function random_str(
int $length = 64,
string $keyspace = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
): string {
if ($length < 1) {
throw new \RangeException("Length must be a positive integer");
}
$pieces = [];
$max = mb_strlen($keyspace, '8bit') - 1;
for ($i = 0; $i < $length; ++$i) {
$pieces []= $keyspace[random_int(0, $max)];
}
return implode('', $pieces);
}
Usage:
$a = random_str(32);
$b = random_str(8, 'abcdefghijklmnopqrstuvwxyz');
$c = random_str();
Demo: https://3v4l.org/IMJGF (Ignore the PHP 5 failures; it needs random_compat)
This creates a 20 character long hexadecimal string:
$string = bin2hex(openssl_random_pseudo_bytes(10)); // 20 chars
In PHP 7 (random_bytes()):
$string = base64_encode(random_bytes(10)); // ~14 characters, includes /=+
// or
$string = substr(str_replace(['+', '/', '='], '', base64_encode(random_bytes(32))), 0, 32); // 32 characters, without /=+
// or
$string = bin2hex(random_bytes(10)); // 20 characters, only 0-9a-f
#tasmaniski: your answer worked for me. I had the same problem, and I would suggest it for those who are ever looking for the same answer. Here it is from #tasmaniski:
<?php
$random = substr(md5(mt_rand()), 0, 7);
echo $random;
?>
Here is a youtube video showing us how to create a random number
Depending on your application (I wanted to generate passwords), you could use
$string = base64_encode(openssl_random_pseudo_bytes(30));
Being base64, they may contain = or - as well as the requested characters. You could generate a longer string, then filter and trim it to remove those.
openssl_random_pseudo_bytes seems to be the recommended way way to generate a proper random number in php. Why rand doesn't use /dev/random I don't know.
PHP 7+ Generate cryptographically secure random bytes using random_bytes function.
$bytes = random_bytes(16);
echo bin2hex($bytes);
Possible output
da821217e61e33ed4b2dd96f8439056c
PHP 5.3+ Generate pseudo-random bytes using openssl_random_pseudo_bytes function.
$bytes = openssl_random_pseudo_bytes(16);
echo bin2hex($bytes);
Possible output
e2d1254506fbb6cd842cd640333214ad
The best use case could be
function getRandomBytes($length = 16)
{
if (function_exists('random_bytes')) {
$bytes = random_bytes($length / 2);
} else {
$bytes = openssl_random_pseudo_bytes($length / 2);
}
return bin2hex($bytes);
}
echo getRandomBytes();
Possible output
ba8cc342bdf91143
Here is a simple one-liner that generates a true random string without any script level looping or use of OpenSSL libraries.
echo substr(str_shuffle(str_repeat('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', mt_rand(1,10))), 1, 10);
To break it down so the parameters are clear
// Character List to Pick from
$chrList = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
// Minimum/Maximum times to repeat character List to seed from
$chrRepeatMin = 1; // Minimum times to repeat the seed string
$chrRepeatMax = 10; // Maximum times to repeat the seed string
// Length of Random String returned
$chrRandomLength = 10;
// The ONE LINE random command with the above variables.
echo substr(str_shuffle(str_repeat($chrList, mt_rand($chrRepeatMin,$chrRepeatMax))), 1, $chrRandomLength);
This method works by randomly repeating the character list, then shuffles the combined string, and returns the number of characters specified.
You can further randomize this, by randomizing the length of the returned string, replacing $chrRandomLength with mt_rand(8, 15) (for a random string between 8 and 15 characters).
A better way to implement this function is:
function RandomString($length) {
$keys = array_merge(range(0,9), range('a', 'z'));
$key = "";
for($i=0; $i < $length; $i++) {
$key .= $keys[mt_rand(0, count($keys) - 1)];
}
return $key;
}
echo RandomString(20);
mt_rand is more random according to this and this in PHP 7. The rand function is an alias of mt_rand.
function generateRandomString($length = 15)
{
return substr(sha1(rand()), 0, $length);
}
Tada!
$randstring in the function scope is not the same as the scope where you call it. You have to assign the return value to a variable.
$randstring = RandomString();
echo $randstring;
Or just directly echo the return value:
echo RandomString();
Also, in your function you have a little mistake. Within the for loop, you need to use .= so each character gets appended to the string. By using = you are overwriting it with each new character instead of appending.
$randstring .= $characters[rand(0, strlen($characters))];
First, you define the alphabet you want to use:
$alphanum = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
$special = '~!##$%^&*(){}[],./?';
$alphabet = $alphanum . $special;
Then, use openssl_random_pseudo_bytes() to generate proper random data:
$len = 12; // length of password
$random = openssl_random_pseudo_bytes($len);
Finally, you use this random data to create the password. Because each character in $random can be chr(0) until chr(255), the code uses the remainder after division of its ordinal value with $alphabet_length to make sure only characters from the alphabet are picked (note that doing so biases the randomness):
$alphabet_length = strlen($alphabet);
$password = '';
for ($i = 0; $i < $len; ++$i) {
$password .= $alphabet[ord($random[$i]) % $alphabet_length];
}
Alternatively, and generally better, is to use RandomLib and SecurityLib:
use SecurityLib\Strength;
$factory = new RandomLib\Factory;
$generator = $factory->getGenerator(new Strength(Strength::MEDIUM));
$password = $generator->generateString(12, $alphabet);
I've tested performance of most popular functions there, the time which is needed to generate 1'000'000 strings of 32 symbols on my box is:
2.5 $s = substr(str_shuffle(str_repeat($x='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ', ceil($length/strlen($x)) )),1,32);
1.9 $s = base64_encode(openssl_random_pseudo_bytes(24));
1.68 $s = bin2hex(openssl_random_pseudo_bytes(16));
0.63 $s = base64_encode(random_bytes(24));
0.62 $s = bin2hex(random_bytes(16));
0.37 $s = substr(md5(rand()), 0, 32);
0.37 $s = substr(md5(mt_rand()), 0, 32);
Please note it is not important how long it really was but which is slower and which one is faster so you can select according to your requirements including cryptography-readiness etc.
substr() around MD5 was added for sake of accuracy if you need string which is shorter than 32 symbols.
For sake of answer: the string was not concatenated but overwritten and result of the function was not stored.
Here's my simple one line solution to generate a use friendly random password, excluding the characters that lookalike such as "1" and "l", "O" and "0", etc... here it is 5 characters but you can easily change it of course:
$user_password = substr(str_shuffle('abcdefghjkmnpqrstuvwxyzABCDEFGHJKMNPQRSTUVWXYZ23456789'),0,5);
One very quick way is to do something like:
substr(md5(rand()),0,10);
This will generate a random string with the length of 10 chars. Of course, some might say it's a bit more heavy on the computation side, but nowadays processors are optimized to run md5 or sha256 algorithm very quickly. And of course, if the rand() function returns the same value, the result will be the same, having a 1 / 32767 chance of being the same. If security's the issue, then just change rand() to mt_rand()
function gen_uid($l=5){
return substr(str_shuffle("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"), 10, $l);
}
echo gen_uid();
Default Value[5]: WvPJz
echo gen_uid(30);
Value[30]: cAiGgtf1lDpFWoVwjykNKXxv6SC4Q2
Short Methods..
Here are some shortest method to generate the random string
<?php
echo $my_rand_strng = substr(str_shuffle("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"), -15);
echo substr(md5(rand()), 0, 7);
echo str_shuffle(MD5(microtime()));
?>
Helper function from Laravel 5 framework
/**
* Generate a "random" alpha-numeric string.
*
* Should not be considered sufficient for cryptography, etc.
*
* #param int $length
* #return string
*/
function str_random($length = 16)
{
$pool = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
return substr(str_shuffle(str_repeat($pool, $length)), 0, $length);
}
Since php7, there is the random_bytes functions.
https://www.php.net/manual/ru/function.random-bytes.php
So you can generate a random string like that
<?php
$bytes = random_bytes(5);
var_dump(bin2hex($bytes));
?>
from the yii2 framework
/**
* Generates a random string of specified length.
* The string generated matches [A-Za-z0-9_-]+ and is transparent to URL-encoding.
*
* #param int $length the length of the key in characters
* #return string the generated random key
*/
function generateRandomString($length = 10) {
$bytes = random_bytes($length);
return substr(strtr(base64_encode($bytes), '+/', '-_'), 0, $length);
}
function rndStr($len = 64) {
$randomData = file_get_contents('/dev/urandom', false, null, 0, $len) . uniqid(mt_rand(), true);
$str = substr(str_replace(array('/','=','+'),'', base64_encode($randomData)),0,$len);
return $str;
}
This one was taken from adminer sources:
/** Get a random string
* #return string 32 hexadecimal characters
*/
function rand_string() {
return md5(uniqid(mt_rand(), true));
}
Adminer, database management tool written in PHP.
/**
* #param int $length
* #param string $abc
* #return string
*/
function generateRandomString($length = 10, $abc = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
{
return substr(str_shuffle($abc), 0, $length);
}
Source from http://www.xeweb.net/2011/02/11/generate-a-random-string-a-z-0-9-in-php/
Another one-liner, which generates a random string of 10 characters with letters and numbers. It will create an array with range (adjust the second parameter to set the size), loops over this array and assigns a random ASCII character (range 0-9 or a-z), then implodes the array to get a string.
$str = implode('', array_map(function () { return chr(rand(0, 1) ? rand(48, 57) : rand(97, 122)); }, range(0, 9)));
Note: this only works in PHP 5.3 and later
One liner.
It is fast for huge strings with some uniqueness.
function random_string($length){
return substr(str_repeat(md5(rand()), ceil($length/32)), 0, $length);
}
function randomString($length = 5) {
return substr(str_shuffle(implode(array_merge(range('A','Z'), range('a','z'), range(0,9)))), 0, $length);
}
Here is how I am doing it to get a true unique random key:
$Length = 10;
$RandomString = substr(str_shuffle(md5(time())), 0, $Length);
echo $RandomString;
You can use time() since it is a Unix timestamp and is always unique compared to other random mentioned above. You can then generate the md5sum of that and take the desired length you need from the generated MD5 string. In this case I am using 10 characters, and I could use a longer string if I would want to make it more unique.
I hope this helps.
The edited version of the function works fine, but there is just one issue I found: You used the wrong character to enclose $characters, so the ’ character is sometimes part of the random string that is generated.
To fix this, change:
$characters = ’0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ’;
to:
$characters = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
This way only the enclosed characters are used, and the ’ character will never be a part of the random string that is generated.
function generateRandomString($length = 10, $hasNumber = true, $hasLowercase = true, $hasUppercase = true): string
{
$string = '';
if ($hasNumber)
$string .= '0123456789';
if ($hasLowercase)
$string .= 'abcdefghijklmnopqrstuvwxyz';
if ($hasUppercase)
$string .= 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
return substr(str_shuffle(str_repeat($x = $string, ceil($length / strlen($x)))), 1, $length);
}
and use:
echo generateRandomString(32);
I liked the last comment which used openssl_random_pseudo_bytes, but it wasn't a solution for me as I still had to remove the characters I didn't want, and I wasn't able to get a set length string. Here is my solution...
function rndStr($len = 20) {
$rnd='';
for($i=0;$i<$len;$i++) {
do {
$byte = openssl_random_pseudo_bytes(1);
$asc = chr(base_convert(substr(bin2hex($byte),0,2),16,10));
} while(!ctype_alnum($asc));
$rnd .= $asc;
}
return $rnd;
}
I need to generate a random 4 character string on each form submission. I got this solution from here.
which is this.
function genTicketString() {
return substr(md5(uniqid(mt_rand(), true)), 0, 4);
}
add_shortcode('quoteticket', 'genTicketString');
But this mostly generates a similar ID! probably if I can add the date & time along with the 4 character, it will fix it.
So how can I add the data & the time to the generated string?
As a direct answer to your question
To generate a pseudo random string, you can use this function :
function getPseudoRandomString($length = 4) {
$base64Chars = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/';
$result = '';
for ($i = 0; $i < $length; ++$i) {
$result .= $base64Chars[mt_rand(0, strlen($base64Chars) - 1)];
}
return $result;
}
NOTE : This generates a pseudo random string, there is no way to be sure the string is unique.
To get a "more unique" string
first, you should use a longer string : 4 chars is really small : there are only 16 million possibilities with a set of 64 chars.
Then, If you want to add more unicity, you can concatenate a random generated string with the result of uniqid('', true) http://php.net/manual/en/function.uniqid.php
To be sure the string has never been generated
The only way to to be sure the string has never been generated is to save all generated strings in a database and when you generate a new string, you have to check if the string already exists in the database to generate a new one if needed.
The generator function will look like
function generateUniqueString()
do {
$string = generateString();
while (is_in_database($string));
save_in_database($string);
return $string;
}
I used this answer instead with little modifications
function genTicketString() {
$d=date ("d");
$m=date ("m");
$y=date ("Y");
$t=time();
$dmt=$d+$m+$y+$t;
$ran= rand(0,10000000);
$dmtran= $dmt+$ran;
$un= uniqid();
$dmtun = $dmt.$un;
$mdun = md5($dmtran.$un);
$sort=substr($mdun, 0, 6); // if you want sort length code.
$sort=strtoupper($sort);
return $sort;
}
add_shortcode('quoteticket', 'genTicketString');
I am trying to find a way to encode a database ID into a short URL, e.g. 1 should become "Ys47R". Then I would like to decode it back from "Ys47R" to 1 so I can run a database search using the INT value. It needs to be unique using the database ID. The sequence should not be easily guessable such as 1 = "Ys47R", 2 = "Ys47S". It should be something along the lines of YouTube or bitly's URL's. I have read up on hundreds of different sources using md5, base32, base64 and `bcpow but have come up empty.
This blog post looked promising but once I added padding and a passkey, short ID's such as 1 became SDDDG, 2 became "SDDDH" and 3 became "SDDDI". It is not very random.
base32 used only a-b 0-9
base64 had characters such as == on the end.
I then tried this:
function getRandomString($db, $length = 7) {
$validCharacters = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
$validCharNumber = strlen($validCharacters);
$result = "";
for ($i = 0; $i < $length; $i++) {
$index = mt_rand(0, $validCharNumber - 1);
$result .= $validCharacters[$index];
}
Which worked but meant I had to run a database query every time to make sure there were no collisions and it did not exist in the database.
Is there a way I can create short ID's that are 4 characters minimum with a charset of [a-z][A-Z][0-9] that can be encoded and decoded back, using increment unique ID in a database where each number is unique. I can't get my head around advance techniques using base32 or base64.
Or am I looking into this too much and there is an easier way to do it? Would it be best to do the random string function above and query the database to check for uniqueness all the time?
You could use function from comments: http://php.net/manual/en/function.base-convert.php#106546
$initial = '11111111';
$dic = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
var_dump($converted = convBase($initial, '0123456789', $dic));
// string(4) "KCvt"
var_dump(convBase($converted, $dic, '0123456789'));
// string(8) "11111111"
function convBase($numberInput, $fromBaseInput, $toBaseInput)
{
if ($fromBaseInput==$toBaseInput) return $numberInput;
$fromBase = str_split($fromBaseInput,1);
$toBase = str_split($toBaseInput,1);
$number = str_split($numberInput,1);
$fromLen=strlen($fromBaseInput);
$toLen=strlen($toBaseInput);
$numberLen=strlen($numberInput);
$retval='';
if ($toBaseInput == '0123456789')
{
$retval=0;
for ($i = 1;$i <= $numberLen; $i++)
$retval = bcadd($retval, bcmul(array_search($number[$i-1], $fromBase),bcpow($fromLen,$numberLen-$i)));
return $retval;
}
if ($fromBaseInput != '0123456789')
$base10=convBase($numberInput, $fromBaseInput, '0123456789');
else
$base10 = $numberInput;
if ($base10<strlen($toBaseInput))
return $toBase[$base10];
while($base10 != '0')
{
$retval = $toBase[bcmod($base10,$toLen)].$retval;
$base10 = bcdiv($base10,$toLen,0);
}
return $retval;
}
If you want some symmetric obfuscation, then base_convert() is often sufficient.
base_convert($id, 10, 36);
Will return strings like 1i0g and convert them back.
Before and after that base conversion you can add:
To get a minimum string length, I'd suggest just adding 70000 to your $id. And on the receiving end just subtract that again.
A minor multiplication $id *= 3 would add some "holes" in the generated alphanumeric ID range, yet not exhaust the available string space.
For some appearance of arbitrariness, a bit of nibble moving:
$id = ($id & 0xF0F0F0F) << 4
| ($id & 0x0F0F0F0) >> 4;
Which works for generating your obfuscated ID strings, and getting back the original ones.
Just to be crystal clear: this is no encryption of any sort. It just shifts numeric jumps between consecutive numbers, and looks slightly more arbitrary.
You still may not like the answer, but generating random IDs in your database is the only approach that really hinders ID guessing.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 1 year ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to "http://www.example.org/abcdef".
Instead of "abcdef" there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.
My approach:
I have a database table with three columns:
id, integer, auto-increment
long, string, the long URL the user entered
short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for "id" and build a hash of it. This hash should then be inserted as "short". But what sort of hash should I build? Hash algorithms like MD5 create too long strings. I don't use these algorithms, I think. A self-built algorithm will work, too.
My idea:
For "http://www.google.de/" I get the auto-increment id 239472. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for a-z and A-Z.
That could be repeated until the number isn't divisible any more. Do you think this is a good approach? Do you have a better idea?
Due to the ongoing interest in this topic, I've published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
I would continue your "convert number to string" approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a Bijective Function f. This is necessary so that you can find a inverse function g('abc') = 123 for your f(123) = 'abc' function. This means:
There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL
Think of an alphabet we want to use. In your case, that's [a-zA-Z0-9]. It contains 62 letters.
Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table for example).
For this example, I will use 12510 (125 with a base of 10).
Now you have to convert 12510 to X62 (base 62).
12510 = 2×621 + 1×620 = [2,1]
This requires the use of integer division and modulo. A pseudo-code example:
digits = []
while num > 0
remainder = modulo(num, 62)
digits.push(remainder)
num = divide(num, 62)
digits = digits.reverse
Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array for example) could look like:
0 → a
1 → b
...
25 → z
...
52 → 0
61 → 9
With 2 → c and 1 → b, you will receive cb62 as the shortened URL.
http://shor.ty/cb
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.
e9a62 will be resolved to "4th, 61st, and 0th letter in the alphabet".
e9a62 = [4,61,0] = 4×622 + 61×621 + 0×620 = 1915810
Now find your database-record with WHERE id = 19158 and do the redirect.
Example implementations (provided by commenters)
C++
Python
Ruby
Haskell
C#
CoffeeScript
Perl
Why would you want to use a hash?
You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say you character space (A-Z, a-z, 0-9, etc.) has 62 characters, convert the id to a base-40 number and use the characters as the digits.
public class UrlShortener {
private static final String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static final int BASE = ALPHABET.length();
public static String encode(int num) {
StringBuilder sb = new StringBuilder();
while ( num > 0 ) {
sb.append( ALPHABET.charAt( num % BASE ) );
num /= BASE;
}
return sb.reverse().toString();
}
public static int decode(String str) {
int num = 0;
for ( int i = 0; i < str.length(); i++ )
num = num * BASE + ALPHABET.indexOf(str.charAt(i));
return num;
}
}
Not an answer to your question, but I wouldn't use case-sensitive shortened URLs. They are hard to remember, usually unreadable (many fonts render 1 and l, 0 and O and other characters very very similar that they are near impossible to tell the difference) and downright error prone. Try to use lower or upper case only.
Also, try to have a format where you mix the numbers and characters in a predefined form. There are studies that show that people tend to remember one form better than others (think phone numbers, where the numbers are grouped in a specific form). Try something like num-char-char-num-char-char. I know this will lower the combinations, especially if you don't have upper and lower case, but it would be more usable and therefore useful.
My approach: Take the Database ID, then Base36 Encode it. I would NOT use both Upper AND Lowercase letters, because that makes transmitting those URLs over the telephone a nightmare, but you could of course easily extend the function to be a base 62 en/decoder.
Here is my PHP 5 class.
<?php
class Bijective
{
public $dictionary = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
public function __construct()
{
$this->dictionary = str_split($this->dictionary);
}
public function encode($i)
{
if ($i == 0)
return $this->dictionary[0];
$result = '';
$base = count($this->dictionary);
while ($i > 0)
{
$result[] = $this->dictionary[($i % $base)];
$i = floor($i / $base);
}
$result = array_reverse($result);
return join("", $result);
}
public function decode($input)
{
$i = 0;
$base = count($this->dictionary);
$input = str_split($input);
foreach($input as $char)
{
$pos = array_search($char, $this->dictionary);
$i = $i * $base + $pos;
}
return $i;
}
}
A Node.js and MongoDB solution
Since we know the format that MongoDB uses to create a new ObjectId with 12 bytes.
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id
a 3-byte counter (in your machine), starting with a random value.
Example (I choose a random sequence)
a1b2c3d4e5f6g7h8i9j1k2l3
a1b2c3d4 represents the seconds since the Unix epoch,
4e5f6g7 represents machine identifier,
h8i9 represents process id
j1k2l3 represents the counter, starting with a random value.
Since the counter will be unique if we are storing the data in the same machine we can get it with no doubts that it will be duplicate.
So the short URL will be the counter and here is a code snippet assuming that your server is running properly.
const mongoose = require('mongoose');
const Schema = mongoose.Schema;
// Create a schema
const shortUrl = new Schema({
long_url: { type: String, required: true },
short_url: { type: String, required: true, unique: true },
});
const ShortUrl = mongoose.model('ShortUrl', shortUrl);
// The user can request to get a short URL by providing a long URL using a form
app.post('/shorten', function(req ,res){
// Create a new shortUrl */
// The submit form has an input with longURL as its name attribute.
const longUrl = req.body["longURL"];
const newUrl = ShortUrl({
long_url : longUrl,
short_url : "",
});
const shortUrl = newUrl._id.toString().slice(-6);
newUrl.short_url = shortUrl;
console.log(newUrl);
newUrl.save(function(err){
console.log("the new URL is added");
})
});
I keep incrementing an integer sequence per domain in the database and use Hashids to encode the integer into a URL path.
static hashids = Hashids(salt = "my app rocks", minSize = 6)
I ran a script to see how long it takes until it exhausts the character length. For six characters it can do 164,916,224 links and then goes up to seven characters. Bitly uses seven characters. Under five characters looks weird to me.
Hashids can decode the URL path back to a integer but a simpler solution is to use the entire short link sho.rt/ka8ds3 as a primary key.
Here is the full concept:
function addDomain(domain) {
table("domains").insert("domain", domain, "seq", 0)
}
function addURL(domain, longURL) {
seq = table("domains").where("domain = ?", domain).increment("seq")
shortURL = domain + "/" + hashids.encode(seq)
table("links").insert("short", shortURL, "long", longURL)
return shortURL
}
// GET /:hashcode
function handleRequest(req, res) {
shortURL = req.host + "/" + req.param("hashcode")
longURL = table("links").where("short = ?", shortURL).get("long")
res.redirect(301, longURL)
}
C# version:
public class UrlShortener
{
private static String ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private static int BASE = 62;
public static String encode(int num)
{
StringBuilder sb = new StringBuilder();
while ( num > 0 )
{
sb.Append( ALPHABET[( num % BASE )] );
num /= BASE;
}
StringBuilder builder = new StringBuilder();
for (int i = sb.Length - 1; i >= 0; i--)
{
builder.Append(sb[i]);
}
return builder.ToString();
}
public static int decode(String str)
{
int num = 0;
for ( int i = 0, len = str.Length; i < len; i++ )
{
num = num * BASE + ALPHABET.IndexOf( str[(i)] );
}
return num;
}
}
You could hash the entire URL, but if you just want to shorten the id, do as marcel suggested. I wrote this Python implementation:
https://gist.github.com/778542
Take a look at https://hashids.org/ it is open source and in many languages.
Their page outlines some of the pitfalls of other approaches.
If you don't want re-invent the wheel ... http://lilurl.sourceforge.net/
// simple approach
$original_id = 56789;
$shortened_id = base_convert($original_id, 10, 36);
$un_shortened_id = base_convert($shortened_id, 36, 10);
alphabet = map(chr, range(97,123)+range(65,91)) + map(str,range(0,10))
def lookup(k, a=alphabet):
if type(k) == int:
return a[k]
elif type(k) == str:
return a.index(k)
def encode(i, a=alphabet):
'''Takes an integer and returns it in the given base with mappings for upper/lower case letters and numbers 0-9.'''
try:
i = int(i)
except Exception:
raise TypeError("Input must be an integer.")
def incode(i=i, p=1, a=a):
# Here to protect p.
if i <= 61:
return lookup(i)
else:
pval = pow(62,p)
nval = i/pval
remainder = i % pval
if nval <= 61:
return lookup(nval) + incode(i % pval)
else:
return incode(i, p+1)
return incode()
def decode(s, a=alphabet):
'''Takes a base 62 string in our alphabet and returns it in base10.'''
try:
s = str(s)
except Exception:
raise TypeError("Input must be a string.")
return sum([lookup(i) * pow(62,p) for p,i in enumerate(list(reversed(s)))])a
Here's my version for whomever needs it.
Why not just translate your id to a string? You just need a function that maps a digit between, say, 0 and 61 to a single letter (upper/lower case) or digit. Then apply this to create, say, 4-letter codes, and you've got 14.7 million URLs covered.
Here is a decent URL encoding function for PHP...
// From http://snipplr.com/view/22246/base62-encode--decode/
private function base_encode($val, $base=62, $chars='0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ') {
$str = '';
do {
$i = fmod($val, $base);
$str = $chars[$i] . $str;
$val = ($val - $i) / $base;
} while($val > 0);
return $str;
}
Don't know if anyone will find this useful - it is more of a 'hack n slash' method, yet is simple and works nicely if you want only specific chars.
$dictionary = "abcdfghjklmnpqrstvwxyz23456789";
$dictionary = str_split($dictionary);
// Encode
$str_id = '';
$base = count($dictionary);
while($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $dictionary[$rem];
}
// Decode
$id_ar = str_split($str_id);
$id = 0;
for($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i-1], $dictionary) * pow($base, $i - 1);
}
Did you omit O, 0, and i on purpose?
I just created a PHP class based on Ryan's solution.
<?php
$shorty = new App_Shorty();
echo 'ID: ' . 1000;
echo '<br/> Short link: ' . $shorty->encode(1000);
echo '<br/> Decoded Short Link: ' . $shorty->decode($shorty->encode(1000));
/**
* A nice shorting class based on Ryan Charmley's suggestion see the link on Stack Overflow below.
* #author Svetoslav Marinov (Slavi) | http://WebWeb.ca
* #see http://stackoverflow.com/questions/742013/how-to-code-a-url-shortener/10386945#10386945
*/
class App_Shorty {
/**
* Explicitly omitted: i, o, 1, 0 because they are confusing. Also use only lowercase ... as
* dictating this over the phone might be tough.
* #var string
*/
private $dictionary = "abcdfghjklmnpqrstvwxyz23456789";
private $dictionary_array = array();
public function __construct() {
$this->dictionary_array = str_split($this->dictionary);
}
/**
* Gets ID and converts it into a string.
* #param int $id
*/
public function encode($id) {
$str_id = '';
$base = count($this->dictionary_array);
while ($id > 0) {
$rem = $id % $base;
$id = ($id - $rem) / $base;
$str_id .= $this->dictionary_array[$rem];
}
return $str_id;
}
/**
* Converts /abc into an integer ID
* #param string
* #return int $id
*/
public function decode($str_id) {
$id = 0;
$id_ar = str_split($str_id);
$base = count($this->dictionary_array);
for ($i = count($id_ar); $i > 0; $i--) {
$id += array_search($id_ar[$i - 1], $this->dictionary_array) * pow($base, $i - 1);
}
return $id;
}
}
?>
public class TinyUrl {
private final String characterMap = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
private final int charBase = characterMap.length();
public String covertToCharacter(int num){
StringBuilder sb = new StringBuilder();
while (num > 0){
sb.append(characterMap.charAt(num % charBase));
num /= charBase;
}
return sb.reverse().toString();
}
public int covertToInteger(String str){
int num = 0;
for(int i = 0 ; i< str.length(); i++)
num += characterMap.indexOf(str.charAt(i)) * Math.pow(charBase , (str.length() - (i + 1)));
return num;
}
}
class TinyUrlTest{
public static void main(String[] args) {
TinyUrl tinyUrl = new TinyUrl();
int num = 122312215;
String url = tinyUrl.covertToCharacter(num);
System.out.println("Tiny url: " + url);
System.out.println("Id: " + tinyUrl.covertToInteger(url));
}
}
This is what I use:
# Generate a [0-9a-zA-Z] string
ALPHABET = map(str,range(0, 10)) + map(chr, range(97, 123) + range(65, 91))
def encode_id(id_number, alphabet=ALPHABET):
"""Convert an integer to a string."""
if id_number == 0:
return alphabet[0]
alphabet_len = len(alphabet) # Cache
result = ''
while id_number > 0:
id_number, mod = divmod(id_number, alphabet_len)
result = alphabet[mod] + result
return result
def decode_id(id_string, alphabet=ALPHABET):
"""Convert a string to an integer."""
alphabet_len = len(alphabet) # Cache
return sum([alphabet.index(char) * pow(alphabet_len, power) for power, char in enumerate(reversed(id_string))])
It's very fast and can take long integers.
For a similar project, to get a new key, I make a wrapper function around a random string generator that calls the generator until I get a string that hasn't already been used in my hashtable. This method will slow down once your name space starts to get full, but as you have said, even with only 6 characters, you have plenty of namespace to work with.
I have a variant of the problem, in that I store web pages from many different authors and need to prevent discovery of pages by guesswork. So my short URLs add a couple of extra digits to the Base-62 string for the page number. These extra digits are generated from information in the page record itself and they ensure that only 1 in 3844 URLs are valid (assuming 2-digit Base-62). You can see an outline description at http://mgscan.com/MBWL.
Very good answer, I have created a Golang implementation of the bjf:
package bjf
import (
"math"
"strings"
"strconv"
)
const alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
func Encode(num string) string {
n, _ := strconv.ParseUint(num, 10, 64)
t := make([]byte, 0)
/* Special case */
if n == 0 {
return string(alphabet[0])
}
/* Map */
for n > 0 {
r := n % uint64(len(alphabet))
t = append(t, alphabet[r])
n = n / uint64(len(alphabet))
}
/* Reverse */
for i, j := 0, len(t) - 1; i < j; i, j = i + 1, j - 1 {
t[i], t[j] = t[j], t[i]
}
return string(t)
}
func Decode(token string) int {
r := int(0)
p := float64(len(token)) - 1
for i := 0; i < len(token); i++ {
r += strings.Index(alphabet, string(token[i])) * int(math.Pow(float64(len(alphabet)), p))
p--
}
return r
}
Hosted at github: https://github.com/xor-gate/go-bjf
Implementation in Scala:
class Encoder(alphabet: String) extends (Long => String) {
val Base = alphabet.size
override def apply(number: Long) = {
def encode(current: Long): List[Int] = {
if (current == 0) Nil
else (current % Base).toInt :: encode(current / Base)
}
encode(number).reverse
.map(current => alphabet.charAt(current)).mkString
}
}
class Decoder(alphabet: String) extends (String => Long) {
val Base = alphabet.size
override def apply(string: String) = {
def decode(current: Long, encodedPart: String): Long = {
if (encodedPart.size == 0) current
else decode(current * Base + alphabet.indexOf(encodedPart.head),encodedPart.tail)
}
decode(0,string)
}
}
Test example with Scala test:
import org.scalatest.{FlatSpec, Matchers}
class DecoderAndEncoderTest extends FlatSpec with Matchers {
val Alphabet = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
"A number with base 10" should "be correctly encoded into base 62 string" in {
val encoder = new Encoder(Alphabet)
encoder(127) should be ("cd")
encoder(543513414) should be ("KWGPy")
}
"A base 62 string" should "be correctly decoded into a number with base 10" in {
val decoder = new Decoder(Alphabet)
decoder("cd") should be (127)
decoder("KWGPy") should be (543513414)
}
}
Function based in Xeoncross Class
function shortly($input){
$dictionary = ['a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','0','1','2','3','4','5','6','7','8','9'];
if($input===0)
return $dictionary[0];
$base = count($dictionary);
if(is_numeric($input)){
$result = [];
while($input > 0){
$result[] = $dictionary[($input % $base)];
$input = floor($input / $base);
}
return join("", array_reverse($result));
}
$i = 0;
$input = str_split($input);
foreach($input as $char){
$pos = array_search($char, $dictionary);
$i = $i * $base + $pos;
}
return $i;
}
Here is a Node.js implementation that is likely to bit.ly. generate a highly random seven-character string.
It uses Node.js crypto to generate a highly random 25 charset rather than randomly selecting seven characters.
var crypto = require("crypto");
exports.shortURL = new function () {
this.getShortURL = function () {
var sURL = '',
_rand = crypto.randomBytes(25).toString('hex'),
_base = _rand.length;
for (var i = 0; i < 7; i++)
sURL += _rand.charAt(Math.floor(Math.random() * _rand.length));
return sURL;
};
}
My Python 3 version
base_list = list("0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
base = len(base_list)
def encode(num: int):
result = []
if num == 0:
result.append(base_list[0])
while num > 0:
result.append(base_list[num % base])
num //= base
print("".join(reversed(result)))
def decode(code: str):
num = 0
code_list = list(code)
for index, code in enumerate(reversed(code_list)):
num += base_list.index(code) * base ** index
print(num)
if __name__ == '__main__':
encode(341413134141)
decode("60FoItT")
For a quality Node.js / JavaScript solution, see the id-shortener module, which is thoroughly tested and has been used in production for months.
It provides an efficient id / URL shortener backed by pluggable storage defaulting to Redis, and you can even customize your short id character set and whether or not shortening is idempotent. This is an important distinction that not all URL shorteners take into account.
In relation to other answers here, this module implements the Marcel Jackwerth's excellent accepted answer above.
The core of the solution is provided by the following Redis Lua snippet:
local sequence = redis.call('incr', KEYS[1])
local chars = '0123456789ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz'
local remaining = sequence
local slug = ''
while (remaining > 0) do
local d = (remaining % 60)
local character = string.sub(chars, d + 1, d + 1)
slug = character .. slug
remaining = (remaining - d) / 60
end
redis.call('hset', KEYS[2], slug, ARGV[1])
return slug
Why not just generate a random string and append it to the base URL? This is a very simplified version of doing this in C#.
static string chars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890";
static string baseUrl = "https://google.com/";
private static string RandomString(int length)
{
char[] s = new char[length];
Random rnd = new Random();
for (int x = 0; x < length; x++)
{
s[x] = chars[rnd.Next(chars.Length)];
}
Thread.Sleep(10);
return new String(s);
}
Then just add the append the random string to the baseURL:
string tinyURL = baseUrl + RandomString(5);
Remember this is a very simplified version of doing this and it's possible the RandomString method could create duplicate strings. In production you would want to take in account for duplicate strings to ensure you will always have a unique URL. I have some code that takes account for duplicate strings by querying a database table I could share if anyone is interested.
This is my initial thoughts, and more thinking can be done, or some simulation can be made to see if it works well or any improvement is needed:
My answer is to remember the long URL in the database, and use the ID 0 to 9999999999999999 (or however large the number is needed).
But the ID 0 to 9999999999999999 can be an issue, because
it can be shorter if we use hexadecimal, or even base62 or base64. (base64 just like YouTube using A-Z a-z 0-9 _ and -)
if it increases from 0 to 9999999999999999 uniformly, then hackers can visit them in that order and know what URLs people are sending each other, so it can be a privacy issue
We can do this:
have one server allocate 0 to 999 to one server, Server A, so now Server A has 1000 of such IDs. So if there are 20 or 200 servers constantly wanting new IDs, it doesn't have to keep asking for each new ID, but rather asking once for 1000 IDs
for the ID 1, for example, reverse the bits. So 000...00000001 becomes 10000...000, so that when converted to base64, it will be non-uniformly increasing IDs each time.
use XOR to flip the bits for the final IDs. For example, XOR with 0xD5AA96...2373 (like a secret key), and the some bits will be flipped. (whenever the secret key has the 1 bit on, it will flip the bit of the ID). This will make the IDs even harder to guess and appear more random
Following this scheme, the single server that allocates the IDs can form the IDs, and so can the 20 or 200 servers requesting the allocation of IDs. The allocating server has to use a lock / semaphore to prevent two requesting servers from getting the same batch (or if it is accepting one connection at a time, this already solves the problem). So we don't want the line (queue) to be too long for waiting to get an allocation. So that's why allocating 1000 or 10000 at a time can solve the issue.
I want to create a hash function that will receive strings and output the corresponding value in an array that has predefined "proportions". For instance, if my array holds the values:
[0] => "output number 1"
[1] => "output number 2"
[2] => "output number 3"
Then the hash function int H(string) should return only values in the range 0 and 2 for any given string (an input string will always return the same key).
The thing is that i want it to also make judgement by predefined proportions so, for instance
85% of given strings will hash out as 0, 10% as 1 and 5% as 2. If there are functions that can emulate normal distribution that will be even better.
I also want it to be fast as it will run frequently. Can someone point me to the right direction on how to approach this in php? I believe I'm not the first one that asked this but I came short digging on SO for an hour.
EDIT:
What i did until now is built a hash function in c. It does the above hashing without proportions (still not comfortable with php):
int StringFcn (const void *key, size_t arrSize)
{
char *str = key;
int totalAsciiVal = 0;
while(*str)
{
totalAsciiVal += *str++;
}
return totalAsciiVal % arrSize;
}
What about doing something like this:
// Hash the string so you can pretty much guarantee it will have a number in it and it is relatively "random"
$hash = sha1($string);
// Iterate through string and get ASCII values
$chars = str_split($hash);
$num = 0;
foreach ($chars as $char) {
$num += ord($int);
}
// Get compare modulo 100 of the number
if ($num % 100 < 85) {
return 0;
}
if ($num % 100 < 95) {
return 1;
}
return 2;
Edit:
Instead of hashing with sha1, you can get a sufficiently large integer directly using crc32 (thanks to #nivrig in the comments).
// Convert string to integer
$num = crc32($string);
// Get compare modulo 100 of the number
if ($num % 100 < 85) {
return 0;
}
if ($num % 100 < 95) {
return 1;
}
return 2;