Simple hashing function to map variable length URLs to numbers (0...3) - php

My website uses various resources from a single domain, for example:
http://static.example.com/javascript/common.js
http://static.example.com/javascript/common.css
http://static.example.com/javascript/menu/menu.js
http://static.example.com/javascript/menu/menu.css
http://static.example.com/images/1804/logo/02000100.jpg
http://static.example.com/images/1804/headers/main/09400060.png
http://static.example.com/images/1804/headers/home/1101/06900200-01.jpg
http://static.example.com/images/1804/headers/home/1101/06900200-02.jpg
I need a very simple string hashing function that maps these URLs to numbers, the numbers being 0, 1, 2 and 3. The algorithm should be deterministic and uniform. I have tagged the question PHP but a generic answer is acceptable.
You might have guessed why I need this; I plan to change the URLs to, for example:
http://0.static.example.com/javascript/common.js
http://2.static.example.com/javascript/common.css

I prefer doing a crc32 hash of the string, and taking its modulo with the limit.
Code:
function numeric_hash($str, $range) {
return sprintf("%u", crc32($str)) % $range;
}
Usage:
$str = "http://static.example.com/javascript/common.js
http://static.example.com/javascript/common.css
http://static.example.com/javascript/menu/menu.js
http://static.example.com/javascript/menu/menu.css
http://static.example.com/images/1804/logo/02000100.jpg
http://static.example.com/images/1804/headers/main/09400060.png
http://static.example.com/images/1804/headers/home/1101/06900200-01.jpg
http://static.example.com/images/1804/headers/home/1101/06900200-02.jpg";
$urls = explode("\n", $str);
foreach($urls as $url) {
echo numeric_hash($url, 4) . "\n";
}
Output:
1
3
3
3
1
3
1
3

If you have lots of URLs, you should just use a strong hash and then take it mod <noBuckets>:
MD5(URL) % 4
If you have few URLs or you have uneven size or call frequency a "random" distribution might be bad and you should just create four lists and statically assign your URLs to each list, either manually or using some heuristic based on number of requests per URL.
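As a minimal sketch of the "hash, then take the modulo" idea applied to the URLs from the question: the helper name shardUrl(), hashing only the path, and using the first 8 hex digits of the MD5 are all assumptions made for illustration, and a 64-bit PHP 7+ build is assumed.

```php
<?php
// Hypothetical helper: prefix the host of $url with a bucket number 0..$buckets-1.
function shardUrl(string $url, int $buckets = 4): string
{
    $parts = parse_url($url);
    // Hash only the path so the bucket does not depend on the host name.
    // 8 hex digits = 32 bits, which hexdec() returns as an exact integer
    // on a 64-bit build.
    $bucket = hexdec(substr(md5($parts['path']), 0, 8)) % $buckets;
    return $parts['scheme'] . '://' . $bucket . '.' . $parts['host'] . $parts['path'];
}

echo shardUrl('http://static.example.com/javascript/common.js'), "\n";
```

The same URL always lands in the same bucket, which is what keeps browser caches effective.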

Related

how to create a row of digits based on a string in php [duplicate]

In PHP, is there a way to get a unique hash from a string, where the hash is made up of numbers only?
example:
return md5(234); // returns 098f6bcd4621d373cade4e832627b4f6
but I need
return numhash(234); // returns 00978902923102372190
(20 numbers only)
the problem here is that I want the hashing to be short.
edit:
OK let me explain the back story here.
I have a site that has an ID for every registered person. I also need an ID for the person to use and exchange (hence it can't be too long). So far the ID numbering has been 00001, 00002, 00003 etc...
1. this makes some people look more important
2. this reveals application info that I don't want to reveal.
To fix points 1 and 2 I need to "hide" the number while keeping it unique.
Edit + SOLUTION:
Numeric hash function based on the code by https://stackoverflow.com/a/23679870/175071
/**
* Return a number only hash
* https://stackoverflow.com/a/23679870/175071
* @param string $str
* @param int|null $len
* @return string
*/
function numHash($str, $len=null)
{
$binhash = md5($str, true);
$numhash = unpack('N2', $binhash);
$hash = $numhash[1] . $numhash[2];
if($len && is_int($len)) {
$hash = substr($hash, 0, $len);
}
return $hash;
}
// Usage
numHash(234, 20); // always returns 6814430791721596451
An MD5 or SHA1 hash in PHP returns a hexadecimal number, so all you need to do is convert bases. PHP has a function that can do this for you:
$bignum = hexdec( md5("test") );
or
$bignum = hexdec( sha1("test") );
PHP Manual for hexdec
Since you want a limited size number, you could then use modular division to put it in a range you want.
$smallnum = $bignum % [put your upper bound here]
EDIT
As noted by Artefacto in the comments, this approach produces a number beyond the maximum size of an integer in PHP, and the result after modular division will always be 0. However, taking a substring that contains only the first 15 characters of the hash doesn't have this problem. Revised version for calculating the initial large number:
$bignum = hexdec( substr(sha1("test"), 0, 15) );
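Putting the two steps together might look like the sketch below. The function name boundedHash() is made up for this example, and it assumes a 64-bit PHP build, where the 60 bits taken from the hash fit into a native integer.

```php
<?php
// Map a string to an integer in [0, $upperBound) via a truncated SHA-1.
// 15 hex digits = 60 bits, which fits in a 64-bit PHP integer (assumed here).
function boundedHash(string $input, int $upperBound): int
{
    $bignum = hexdec(substr(sha1($input), 0, 15));
    return $bignum % $upperBound;
}

var_dump(boundedHash("test", 1000)); // some int between 0 and 999
```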
You can try crc32(). See the documentation at: http://php.net/manual/en/function.crc32.php
$checksum = crc32("The quick brown fox jumped over the lazy dog.");
printf("%u\n", $checksum); // prints 2191738434
With that said, crc should only be used to validate the integrity of data.
There are some good answers but for me the approaches seem silly.
They first force PHP to create a hex number, then convert this back (hexdec) into a big integer, and then cut it down to a number of digits... this is a lot of work!
Instead why not
Read the hash as binary:
$binhash = md5('[input value]', true);
then using
$numhash = unpack('N2', $binhash); //- or 'V2' for little endian
to cast this as two INTs ($numhash is an array of two elements). Now you can reduce the number of bits in the number simply using an AND operation. e.g:
$result = $numhash[1] & 0x000FFFFF; //- to get numbers between 0 and 1048575
But be warned of collisions! Reducing the number means increasing the probability of two different [input value] with the same output.
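The three steps above can be combined roughly as follows. The helper name maskedHash() and its $bits parameter are assumptions for illustration; a 64-bit PHP build is assumed so that the unpacked 32-bit word is a non-negative integer.

```php
<?php
// Hypothetical helper: keep the lowest $bits bits (1..32) of the first
// 32-bit word of the MD5 digest.
function maskedHash(string $input, int $bits = 20): int
{
    $words = unpack('N2', md5($input, true)); // keys 1 and 2, big endian
    $mask = (1 << $bits) - 1;                 // 20 bits -> 0x000FFFFF
    return $words[1] & $mask;
}

var_dump(maskedHash('[input value]'));        // somewhere in 0..1048575
```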
I think a much better way would be to use "ID-crypting" with a bijective function, so no collisions can happen! For the simplest kind, just use an affine cipher.
Example with max input value range from 0 to 25:
function numcrypt($a)
{
return ($a * 15) % 26;
}
function unnumcrypt($a)
{
return ($a * 7) % 26;
}
Output:
numcrypt(1) : 15
numcrypt(2) : 4
numcrypt(3) : 19
unnumcrypt(15) : 1
unnumcrypt(4) : 2
unnumcrypt(19) : 3
e.g.
$id = unnumcrypt($_GET['userid']);
... do something with the ID ...
echo ' go ';
Of course this is not secure, but if no one knows the method used and you have no real security requirements, this way is faster and collision-free.
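The constants 15 and 7 work as a pair because 15 * 7 = 105 = 4 * 26 + 1, i.e. 7 is the multiplicative inverse of 15 modulo 26. A small sketch that derives the decoding constant instead of hard-coding it (the function names below are made up for this example):

```php
<?php
// Find the multiplicative inverse of $a modulo $m by brute force
// (fine for small $m; it exists only when gcd($a, $m) == 1).
function modInverse(int $a, int $m): int
{
    for ($x = 1; $x < $m; $x++) {
        if (($a * $x) % $m === 1) {
            return $x;
        }
    }
    throw new InvalidArgumentException("$a has no inverse modulo $m");
}

function affineEncode(int $id, int $a, int $m): int
{
    return ($id * $a) % $m;
}

function affineDecode(int $code, int $a, int $m): int
{
    return ($code * modInverse($a, $m)) % $m;
}

echo modInverse(15, 26), "\n";                            // 7, as used in unnumcrypt()
echo affineDecode(affineEncode(3, 15, 26), 15, 26), "\n"; // 3
```

For real ids the modulus would have to be larger than the highest id, with a multiplier that shares no factor with it.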
The problem with cutting off the hash is the collisions. To avoid truncating, try:
return crc32("Hello World");
The crc32() documentation says:
Generates the cyclic redundancy checksum polynomial of 32-bit lengths
of the str. This is usually used to validate the integrity of data
being transmitted.
That gives us a 32-bit integer, which can be negative on 32-bit installations and is always positive on 64-bit ones. This integer can be stored as an ID in a database. Since crc32() already returns a decimal integer that fits into a 32-bit variable, no hexdec() conversion or truncation is needed, although two different inputs can still produce the same checksum.
First of all, md5 is basically compromised, so you shouldn't be using it for anything but non-critical hashing.
PHP5 has the hash() function, see http://www.php.net/manual/en/function.hash.php.
Setting its third parameter to true will give you a string of binary data. Alternatively, you could split the resulting hexadecimal hash into pieces of 2 characters and convert them to integers individually, but I'd expect that to be much slower.
Try Hashids.
It hashes a number into a format you can define. The format includes how many characters to use and which characters are allowed.
Example:
$hashids->encode(1);
This will return something like "28630", depending on your format.
Just use my manual hash method below:
Divide the number (e.g. 6 digit) by prime values, 3,5,7.
And get the first 6 values that are in the decimal places as the ID to be used. Do a check on uniqueness before actual creation of the ID, if a collision exists, increase the last digit by +1 until a non collision.
E.g. 123456 gives you 771428
123457 gives you 780952
123458 gives you 790476.
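A sketch of that calculation, reading "divide by 3, 5, 7" as dividing by all three (3 * 5 * 7 = 105), which reproduces the three examples. The function name decimalPlacesId() is an assumption; integer arithmetic is used so that no floating-point rounding is involved.

```php
<?php
// Take the first $digits decimal places of $number / 105 without using floats.
function decimalPlacesId(int $number, int $digits = 6): string
{
    $remainder = $number % 105;
    $scaled = intdiv($remainder * (10 ** $digits), 105); // truncate, do not round
    return str_pad((string) $scaled, $digits, '0', STR_PAD_LEFT);
}

echo decimalPlacesId(123456), "\n"; // 771428
echo decimalPlacesId(123457), "\n"; // 780952
echo decimalPlacesId(123458), "\n"; // 790476
```

Only the remainder modulo 105 influences the result, so there are at most 105 distinct values before the "+1 until no collision" step has to take over.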

How to decrease runtime for generating permutations of a string?

I have written a function that takes in an MD5 hash value and finds its input/original value by permuting all possible combinations of a string. As per BIT_CHEETAH's answer on a SO question:
... you cannot decrypt MD5 without attempting something like brute force hacking which is extremely resource intensive, not practical, and unethical.
(Source: encrypt and decrypt md5)
I'm well aware of this; however, I am using this scenario to implement a string permutation function. I would also like to stick to the recursive methodology as opposed to others. The best summary of how to do this is probably Mark Byers' post:
- Try each of the letters in turn as the first letter and then find all
the permutations of the remaining letters using a recursive call.
- The base case is when the input is an empty string the only permutation is the empty string.
(Generating all permutations of a given string)
Anyway, so I implemented this and got the following:
function matchMD5($possibleChars, $md5, $concat, $length) {
    for ($i = 0; $i < strlen($possibleChars); $i++) {
        $concatSubstr = $concat . $possibleChars[$i];
        if (strlen($concatSubstr) != $length) {
            // Not long enough yet: recurse, and stop as soon as a match bubbles up.
            $found = matchMD5($possibleChars, $md5, $concatSubstr, $length);
            if ($found !== null) {
                return $found;
            }
        } else {
            // Candidate has the target length: hash it and compare.
            if (hash('md5', $concatSubstr) === $md5) {
                echo "Match! $concatSubstr ";
                return $concatSubstr;
            }
        }
    }
    return null; // no match below this prefix
}
It works 100%; however, when I ask for four-character strings, my server takes 10.7 seconds to generate a match where the match lies approximately 1/10th of the way through all possible permutations. The valid characters that the function permutes, called $possibleChars, are all alphanumeric characters plus a few selected punctuation marks:
0123456789.,;:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
Question: Can the above code be written to run faster somehow?
When doing brute force, you have to run through all the possibilities; there is no way of cutting a corner there. So you are left with profiling your code to find out what the application spends the most time doing and then trying to optimize that.
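One thing profiling might point at is the per-call overhead of the recursion. Purely as a sketch (whether it is actually faster has to be measured), the same search can be written iteratively with an "odometer" of character indexes. The function name bruteForceMd5() is an assumption, and this deliberately departs from the recursive style the question asks for.

```php
<?php
// Enumerate every string of $length characters over $chars and compare its
// MD5 with $target. Returns the matching string, or null if none matches.
function bruteForceMd5(string $chars, string $target, int $length): ?string
{
    $n = strlen($chars);
    $idx = array_fill(0, $length, 0);        // one index per position
    while (true) {
        $candidate = '';
        foreach ($idx as $i) {
            $candidate .= $chars[$i];
        }
        if (md5($candidate) === $target) {
            return $candidate;
        }
        // Advance the odometer from the rightmost position, carrying left.
        $pos = $length - 1;
        while ($pos >= 0 && ++$idx[$pos] === $n) {
            $idx[$pos] = 0;
            $pos--;
        }
        if ($pos < 0) {
            return null;                     // every combination was tried
        }
    }
}

var_dump(bruteForceMd5('abc', md5('cab'), 3)); // string(3) "cab"
```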

How to generate unique secure random string in PHP?

I want to add a random string as a token for form submission, generated so that it is unique forever. I have spent too much time with Google, but I am confused about which combination to use.
I found so many ways to do this when I googled:
1) Combination of character and number.
2) Combination of character, number and special character.
3) Combination of character, number, special character and date time.
Which combination should I use?
How many characters should the random string have?
If there is any other method that is secure, please let me know.
Here are some considerations:
Alphabet
The set of characters used can be considered the alphabet for the encoding. It doesn't affect the string strength by itself, but a larger alphabet (numbers, non-alphanumeric characters, etc.) does allow for shorter strings of similar strength (aka keyspace), so it's useful if you are looking for shorter strings.
Input Values
To guarantee your string to be unique, you need to add something which is guaranteed to be unique.
Random value is a good seed value if you have a good random number generator
Time is a good seed value to add but it may not be unique in a high traffic environment
User ID is a good seed value if you assume a user isn't going to create sessions at the exact same time
Unique ID is something the system guarantees is unique. This is often something that the server will guarantee / verify is unique, either in a single server deployment or distributed deployment. A simple way to do this is to add a machine ID and machine unique ID. A more complicated way to do this is to assign key ranges to machines and have each machine manage their key range.
Systems that I've worked with that require absolute uniqueness have added a server-unique id, which guarantees an item is unique. This means the same item on different servers would be seen as different, which was what was wanted here.
Approach
Pick one or more input values that match your requirement for uniqueness. If you need absolute uniqueness forever, you need something that you control and that you are sure is unique, e.g. a machine-associated number (that won't conflict with others in a distributed system). If you don't need absolute uniqueness, you can use a random number with another value such as time. If you need randomness, add a random number.
Use an alphabet / encoding that matches your use case. For machine ids, encodings like hexadecimal and base 64 are popular. For human-readable ids, I prefer base32 (Crockford) or base36 for case-insensitive encodings, and base58 or base62 for case-sensitive encodings. This is because base32, 36, 58 and 62 produce shorter strings and (vs. base64) are safe across multiple uses (e.g. URLs, XML, file names, etc.) and don't require transformation between different use cases.
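As a rough sketch of combining these inputs: the token layout, the function name makeToken() and the machine id 'web01' below are all assumptions for illustration. random_bytes() (PHP 7+) supplies the random part, and the machine id is whatever value you control and know to be unique per server.

```php
<?php
// Hypothetical token layout: "<machine id>-<timestamp>-<random hex>".
// random_bytes() draws from the operating system's CSPRNG.
function makeToken(string $machineId, int $randomBytes = 16): string
{
    $time = base_convert((string) time(), 10, 36); // compact base36 timestamp
    $random = bin2hex(random_bytes($randomBytes)); // 2 hex chars per byte
    return $machineId . '-' . $time . '-' . $random;
}

echo makeToken('web01'), "\n";
```

The random part makes the token hard to guess; the machine id and timestamp narrow down where a duplicate could even occur. Absolute uniqueness still needs a check against stored tokens.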
You can definitely get a lot fancier depending on your needs, but I'll just throw this out there since it's what I use frequently for stuff like what you are describing:
md5(rand());
It's quick, simple and easy to remember. And since it's hexadecimal it plays nicely with others.
Refer to this protected SO question; it might be what you are looking for.
I think it's better to redirect you to a previously asked question that has more substantive answers. You will find a lot of options.
Try the code, for function getUniqueToken() which returns you unique string of length 10 (default).
/*
This function will return unique token string...
*/
function getUniqueToken($tokenLength = 10){
$token = "";
//Combination of character, number and special character...
$combinationString = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789*#&$^";
for($i=0;$i<$tokenLength;$i++){
$token .= $combinationString[uniqueSecureHelper(0,strlen($combinationString))];
}
return $token;
}
/*
This helper function will return unique and secure string...
*/
function uniqueSecureHelper($minVal, $maxVal) {
$range = $maxVal - $minVal;
if ($range < 0) return $minVal; // not so random...
$log = log($range, 2);
$bytes = (int) ($log / 8) + 1; // length in bytes
$bits = (int) $log + 1; // length in bits
$filter = (int) (1 << $bits) - 1; // set all lower bits to 1
do {
$rnd = hexdec(bin2hex(openssl_random_pseudo_bytes($bytes)));
$rnd = $rnd & $filter; // discard irrelevant bits
} while ($rnd >= $range);
return $minVal + $rnd;
}
Use this code (two functions); you can increase the string length by passing an int parameter, like getUniqueToken(15).
I used your 2nd idea (combination of characters, numbers and special characters), which you found while googling. I hope my example helps you.
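On PHP 7 and later, the rejection-sampling helper above can probably be replaced by the built-in random_int(), which returns an unbiased, cryptographically secure integer in an inclusive range. A minimal sketch, with the function name randomToken() and the default alphabet chosen here for illustration:

```php
<?php
// Build a random token by picking each character independently.
// The result is random, not guaranteed unique; check stored tokens if
// uniqueness matters.
function randomToken(int $length = 10, string $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'): string
{
    $max = strlen($alphabet) - 1;
    $token = '';
    for ($i = 0; $i < $length; $i++) {
        $token .= $alphabet[random_int(0, $max)]; // inclusive bounds
    }
    return $token;
}

echo randomToken(15), "\n";
```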
You should go for option 3. Because it includes the date and time, it is unique every time.
As for the method, have you tried
str_shuffle($string)
It generates a random string from $string every time.
Then use substr
($string, start, length)
to cut it down.
And if you want the date and time, concatenate them with the resulting string.
An easily understandable and effective code to generate random strings in PHP. I do not consider predictability concerns important in this connection.
<?php
$d = str_shuffle('0123456789');
$C = str_shuffle('ABCDEFGHIJKLMNOPQRSTUVWXYZ');
$m = str_shuffle('abcdefghijklmnopqrstuvwxyz');
$s = str_shuffle('#!$&()*+-_~');
$l=9; //min 4
$r=substr(str_shuffle($d.$C.$m.$s),0,$l);echo $r.'<br>';
$safe=substr($d,0,1).substr($C,0,1).substr($m,0,1).mb_substr($s,0,1);
$r=str_shuffle($safe.substr($r,0,$l-4));//always at least one digit, special, small and capital
// this also allows for 0,1 or 2 of each available characters in string
echo $r;
exit;
?>
For unique string use uniqid().
And to make it secure, use hashing algorithms
for example :
echo md5(uniqid())

How to generate unguessable "tiny url" based on an id?

I'm interested in creating tiny-url-like links. My idea was to simply store an incrementing identifier for every long url posted and then convert this id to its base 36 variant, like the following in PHP:
$tinyurl = base_convert($id, 10, 36);
The problem here is that the result is guessable, whereas it has to be hard to guess what the next url is going to be, while still being short (tiny). E.g., at the moment, if my last tinyurl was a1, the next one will be a2. This is a bad thing for me.
So, how would I make sure that the resulting tiny url is not as guessable but still short?
What you are asking for is a balance between reduction of information (URLs to their indexes in your database), and artificial increase of information (to create holes in your sequence).
You have to decide how important each is to you. Another question is whether you just do not want sequential URLs to be guessable, or want them sufficiently random to make guessing any valid URL difficult.
Basically, you want to declare n out of N valid ids. Choose N smaller to make the URLs shorter, and make n smaller to generate URLs that are difficult to guess. Make n and N larger to generate more URLs when the shorter ones are taken.
To assign the ids, you can just take any kind of random generator or hash function and cap this to your target range N. If you detect a collision, choose the next random value. If you have reached a count of n unique ids, you must increase the range of your ID set (n and N).
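A sketch of that assignment loop, with an in-memory array standing in for the database lookup (an assumption, as is the function name allocateId()); random_int() requires PHP 7+:

```php
<?php
// $taken stands in for a database lookup: its keys are the ids already
// handed out. $rangeN is the size N of the id space.
function allocateId(array &$taken, int $rangeN): string
{
    if (count($taken) >= $rangeN) {
        throw new RuntimeException('id space exhausted; increase N');
    }
    do {
        $candidate = random_int(0, $rangeN - 1);
    } while (isset($taken[$candidate]));      // collision: draw again
    $taken[$candidate] = true;
    return base_convert((string) $candidate, 10, 36);
}

$taken = [];
$rangeN = 36 ** 4;                            // all ids fit in 4 base36 chars
echo allocateId($taken, $rangeN), "\n";
```

To keep valid URLs hard to guess, you would stop well before the space is full, i.e. grow N once count($taken) reaches your chosen n.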
I would simply take the crc32 of the URL:
$url = 'http://www.google.com';
$tinyurl = hash('crc32', $url ); // db85f073
cons: constant 8 character long identifier
This is a really cheap trick, but as long as the user doesn't know it's happening it makes the ids much less guessable: prefix and postfix the actual id with 2 or 3 random numbers/letters.
If I saw 9d2a1me3 I wouldn't guess that dm2a2dq2 was the next in the series.
Try Xor'ing the $id with some value, e.g. $id ^ 46418 - and to convert back to your original id you just perform the same Xor again i.e. $mungedId ^ 46418. Stack this together with your base_convert and perhaps some swapping of chars in the resultant string and it'll get quite tricky to guess a URL.
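A minimal sketch of the XOR step combined with base_convert (the function names are made up for this example, and the character swapping is left out):

```php
<?php
const XOR_KEY = 46418; // the example value from the answer; pick your own

function idToCode(int $id): string
{
    return base_convert((string) ($id ^ XOR_KEY), 10, 36);
}

function codeToId(string $code): int
{
    // Undo the base conversion, then XOR with the same key again.
    return ((int) base_convert($code, 36, 10)) ^ XOR_KEY;
}

echo idToCode(361), ' ', idToCode(362), "\n";
echo codeToId(idToCode(361)), "\n"; // 361
```

XOR only flips fixed bits, so neighbouring ids still map to nearby values; that is why the additional swapping of characters matters.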
Another way would be to set the maximum number of characters for the URL (let's say it's n). You could then choose a random number between 1 and n!, which would be your permutation number.
On each new URL, you would increment the id and use the permutation number to determine the actual id that would be used. Finally, you would base 32 (or whatever) encode your URL. This would be perfectly random and perfectly reversible.
If you want an injective function, you can use any form of encryption. For instance:
<?php
$key = "my secret";
$enc = mcrypt_ecb (MCRYPT_3DES, $key, "42", MCRYPT_ENCRYPT);
$f = unpack("H*", $enc);
$value = reset($f);
var_dump($value); //string(16) "1399e6a37a6e9870"
To reverse:
$rf = pack("H*", $value);
$dec = rtrim(mcrypt_ecb (MCRYPT_3DES, $key, $rf, MCRYPT_DECRYPT), "\x00");
var_dump($dec); //string(2) "42"
This will not give you a number in base 32; it will give you the encrypted data with each byte converted to base 16 (i.e., the conversion is global). If you really need, you can trivially convert this to base 10 and then to base 32 with any library that supports big integers.
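The mcrypt extension was removed from PHP in 7.2, so the code above no longer runs on current versions. As a sketch only, the same idea can be expressed with the OpenSSL extension. Note that this swaps in AES-128-ECB for 3DES, and that the helper names and the way the key is derived from the secret are assumptions made for this example.

```php
<?php
// Encrypt a short id into a hex string and back. One AES block is 16 bytes,
// so short ids always come out as 32 hex characters.
function encryptId(string $id, string $secret): string
{
    $key = substr(hash('sha256', $secret, true), 0, 16); // 16-byte AES key
    return bin2hex(openssl_encrypt($id, 'aes-128-ecb', $key, OPENSSL_RAW_DATA));
}

function decryptId(string $hex, string $secret): string
{
    $key = substr(hash('sha256', $secret, true), 0, 16);
    return openssl_decrypt(hex2bin($hex), 'aes-128-ecb', $key, OPENSSL_RAW_DATA);
}

$value = encryptId("42", "my secret");
var_dump(strlen($value));                 // int(32)
var_dump(decryptId($value, "my secret")); // string(2) "42"
```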
You can pre-define the 4-character codes in advance (all possible combinations), then randomize that list and store it in this random order in a data table. When you want a new value, just grab the first one off the top and remove it from the list. It's fast, no on-the-fly calculation, and guarantees pseudo-randomness to the end-user.
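A small sketch of that pre-generated pool, using an array where the answer uses a data table. The function name buildCodePool() is made up, and a tiny alphabet keeps the demo fast (36 characters at length 4 would give 1,679,616 codes).

```php
<?php
// Build every code of the given length over $alphabet, then shuffle once.
function buildCodePool(string $alphabet, int $length): array
{
    $codes = [''];
    for ($i = 0; $i < $length; $i++) {
        $next = [];
        foreach ($codes as $prefix) {
            foreach (str_split($alphabet) as $ch) {
                $next[] = $prefix . $ch;
            }
        }
        $codes = $next;
    }
    shuffle($codes);
    return $codes;
}

$pool = buildCodePool('abc', 2);  // 9 codes in random order
$code = array_pop($pool);         // take one code and remove it from the pool
echo $code, ' (', count($pool), " left)\n";
```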
Hashids is an open-source library that generates short, unique, non-sequential, YouTube-like ids from one or many numbers. You can think of it as an algorithm to obfuscate numbers.
It converts numbers like 347 into strings like "yr8", or array like [27, 986] into "3kTMd". You can also decode those ids back. This is useful in bundling several parameters into one or simply using them as short UIDs.
Use it when you don't want to expose your database ids to the user.
It allows custom alphabet as well as salt, so ids are unique only to you.
Incremental input is mangled to stay unguessable.
There are no collisions because the method is based on integer to hex conversion.
It was written with the intent of placing created ids in visible places, like the URL. Therefore, the algorithm avoids generating most common English curse words.
Code example
$hashids = new Hashids();
$id = $hashids->encode(1, 2, 3); // o2fXhV
$numbers = $hashids->decode($id); // [1, 2, 3]
I ended up creating an MD5 sum of the identifier and using the first 4 characters of it; if that is a duplicate, I simply increase the length until it is no longer a duplicate.
function idToTinyurl($id) {
$md5 = md5($id);
for ($i = 4; $i < strlen($md5); $i++) {
$possibleTinyurl = substr($md5, 0, $i);
$res = mysql_query("SELECT id FROM tabke WHERE tinyurl='".$possibleTinyurl."' LIMIT 1");
if (mysql_num_rows($res) == 0) return $possibleTinyurl;
}
return $md5;
}
I accepted relet's answer, as it led me to this strategy.

generate a 5 char long string 0-9 a-z

I need a function in php that can generate a 5 char long string with numbers and a-z
What should I look into?
Others have already provided you with correct answers, but here's a one-liner, just for the sake of it:
$code = substr(str_shuffle('0123456789abcdefghijklmnopqrstuvwxyz'), 0, 5);
str_shuffle randomizes the above string, then substr takes the first 5 letters of that shuffled string. Simple.
As noted in comments for this answer, this function only generates strings that have only unique characters. If one would like to have the strings where even "aaaaa" is possible, here's a little function that allows just that:
function generate($len) {
return substr(str_shuffle(str_repeat('0123456789abcdefghijklmnopqrstuvwxyz', $len)), 0, $len);
}
echo generate(5);
str_repeat repeats the 0-9a-z string $len times, so every letter has an almost equal possibility for every position. (Read the comments on why only "almost")
For kicks, here's an alternate approach:
//create a random base 36 string
$str = base_convert(rand(), 10, 36);
substr and concatenate as necessary to satisfy length requirements.
This will not give unique characters (e.g., 'aa11a' would be a possible output) -- which may or may not be what the OP wants. Also, the fact that you may need to run the function multiple times to get a string of the requested length means performance may not be spectacular, but if you're only calling this function once or twice, it won't matter.
Here's a more complete implementation:
function randstr($len) {
$currLen = 0;
$value = '';
while($currLen < $len) {
$new = base_convert(rand(), 10, 36);
$value .= $new;
$currLen += strlen($new);
}
//$value may be longer than the requested $len
return substr($value, 0, $len);
}
It's also worth noting that this string will be of less-than-perfect randomness -- the first char of each string output by base_convert will have a bias toward the lower end of the spectrum (as rand() will not completely fill a whole char's worth of bits every time). Ideally, you want a number of bits out of rand that will exactly fill some number of base-36 chars.
Using a source of entropy that gives you more bits than you need for the string in the first place (like /dev/urandom) would resolve this issue. But for most applications, the loss of entropy won't matter enough to justify the overhead of reading /dev/urandom.
Alternately, you could simply throw away the first char of each base_convert() call.
Here's an example generator:
$length = 5;
$charset = '0123456789abcdefghijklmnopqrstuvwxyz';
$str = '';
while ($length--) {
$str .= $charset[rand() % strlen($charset)];
}
Do you mean a random string?
If so, simply create a string containing the 36 characters, generate 5 random numbers and create the string based on the character positions (pseudo-code):
string src = "0123456789abcdefghijklmnopqrstuvwxyz"
string dst = ""
for i = 1 to 5:
dst = dst + src[random(len(src))]
If you want 5 unique characters, you do the same thing but with one slight difference.
Generate the first random number in the range 0 through 35, the second in the range 0 through 34 and so on.
Then, as you add the character from the source string to your own string, replace the used character in the source string with the last character in the source string. This will prevent the same character from being selected twice:
string src = "0123456789abcdefghijklmnopqrstuvwxyz"
int srclen = len(src)
string dst = ""
for i = 1 to 5:
idx = random(srclen)
dst = dst + src[idx]
src[idx] = src[srclen-1]
srclen = srclen - 1
Aside: @Tatu has provided a simple solution using str_shuffle, which is a more elegant way of doing that last method (unique characters), but I'm not convinced it's the most efficient way since it's likely to involve a lot of swaps to get a decent shuffle. The method here seems to me likely to be faster.
Keep that in mind if performance is important, but also keep in mind that I haven't tested how good it is - it may be fast enough - it may even blow my solution out of the water :-) As with all performance-related things, measure, don't guess.
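The pseudo-code for the unique-character case translates to PHP roughly as follows. random_int() (PHP 7+) is swapped in for the unspecified random(), and the function name uniqueChars() is made up for this sketch.

```php
<?php
// Pick $count distinct characters from $src by overwriting each used
// character with the last unused one and shrinking the usable length.
function uniqueChars(int $count, string $src = '0123456789abcdefghijklmnopqrstuvwxyz'): string
{
    $srclen = strlen($src);
    $dst = '';
    for ($i = 0; $i < $count && $srclen > 0; $i++) {
        $idx = random_int(0, $srclen - 1);
        $dst .= $src[$idx];
        $src[$idx] = $src[$srclen - 1]; // move the last unused char into the gap
        $srclen--;
    }
    return $dst;
}

echo uniqueChars(5), "\n";
```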
http://php.net/manual/en/function.chr.php, in case you dislike hardcoding the alphabet.
Yet another approach, for funsies. Could be shortened - here it's a bit verbose, for readability.
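function rand_string($length=5) {
    $values = str_split("abcdefghijklmnopqrstuvwxyz0123456789");
    shuffle($values);
    // Flip so the characters become keys, which is what array_rand() returns.
    $values = array_flip($values);
    // The (array) cast covers $length == 1, where array_rand() returns a single key.
    $string = implode((array) array_rand($values, $length));
    return $string;
}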
