I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?
I have read in many places that because the number from an MD5 is bigger than PHP can handle as an integer, it will be inaccurate. As I want a short URL anyway, I was not planning on using the whole hash, maybe just 8 characters of it.
So my question is how to get part of the number of an MD5 hash?
Also is it a bad idea to use only part of the MD5 hash?
I'm going to suggest a different thing here. Since you are only interested in using a decimal chunk of the MD5 hash, why not use another short numeric hash such as CRC32 or Adler? Here is an example:
$hash = sprintf('%u', crc32('your string here'));
This will produce an unsigned 32-bit hash of your string, printed as a decimal number of up to 10 digits.
EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.
EDIT (Again): To work with arbitrary-length numbers you must use either the BCMath or the GMP extension. Here is a function that uses the BCMath extension and can convert between any bases from 2 to 62. You should use it like this:
echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash
and the inverse:
echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
Hope it helps. =)
If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions, especially if you're truncating the hash. If you implement an ID-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second will be 2, and so on. If you're using MySQL, just add an auto-increment column.
To make a short id:
//the basic example
$sid = base_convert($id, 10, 36);
//if you're going to be needing 64 bit numbers converted
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);
To make a short id back into the base-10 id:
//the basic example
$id = base_convert($sid, 36, 10);
//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));
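As a quick check of the round trip, here is a minimal sketch using base_convert() on a small ID. The sample value 125 is only an illustration. For IDs larger than a float can represent exactly, the GMP variant is the safer choice.

```php
<?php
// Hypothetical sample ID, e.g. taken from an auto-increment column.
$id = 125;

// Integer ID -> short base-36 string.
$sid = base_convert((string) $id, 10, 36);

// Short string -> integer ID again.
$restored = (int) base_convert($sid, 36, 10);

echo $sid, "\n";      // 3h
echo $restored, "\n"; // 125
```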
Hope this helps!
If you truly want base 62 (which base_convert() cannot do, and which GMP only supports as of PHP 5.3.2), check this out:
http://snipplr.com/view/22246/base62-encode--decode/
You can do it like this (not every step is real PHP; it's been a long time since I've used it):
Create an MD5 hash of the script like this:
$hash = md5($script, true); // second argument true = raw binary output
Convert that number to base 62.
See the questions about base conversion of arbitrary sized numbers in PHP
Truncate the string to a length you like.
Using only a few of the bits of an MD5 is not a risk in itself. All that changes is the likelihood of collisions, which grows as you keep fewer bits.
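The three steps can be sketched in plain PHP without BCMath or GMP. This is only an illustration: hex_to_base62() and short_hash() are made-up names, the 0-9a-zA-Z alphabet order is an assumption, and it starts from the hex digest rather than the raw one because that is easier to divide digit by digit.

```php
<?php
// Hypothetical helper: converts a hex string of any length to base 62
// using schoolbook long division on the individual hex digits.
function hex_to_base62(string $hex): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    // Digit values of the number in base 16, most significant first.
    $digits = array_map('hexdec', str_split(strtolower($hex)));
    $out = '';
    while (count($digits) > 0) {
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $d) {
            $acc = $remainder * 16 + $d;
            $q = intdiv($acc, 62);
            $remainder = $acc % 62;
            // Drop leading zeros of the quotient so the loop terminates.
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q;
            }
        }
        // Each pass yields the next base-62 digit, least significant first.
        $out = $alphabet[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

// Steps 1-3: hash, convert to base 62, truncate.
function short_hash(string $input, int $length = 8): string
{
    return substr(hex_to_base62(md5($input)), 0, $length);
}

echo short_hash('your url here'), "\n";
```

Keeping 8 base-62 characters retains roughly 47 of the 128 bits, so the collision caveat still applies.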
There actually is a Java implementation which you could probably extract. It's an open-source CMS solution called Pulse.
Look here for the code of toBase62() and fromBase62().
http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html
The only dependency in StringUtils is the LifeCycle class, which provides a way to get a salted hash for a string. You might omit it altogether or just copy the method over to your copy of StringUtils. Voilà.
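You can do something like this:
// base62() is assumed to be your own integer-to-base-62 helper.
$hash = md5("The data to be hashed", true);
$ints = unpack("L*num", $hash); // four unsigned 32-bit integers: num1..num4
$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']);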
As of PHP 5.3.2, GMP supports bases up to 62 (was previously only 36), so brianreavis's suggestion was very close. I think the simplest answer to your question is thus:
function base62hash($source, $chars = 22) {
    return substr(gmp_strval(gmp_init(md5($source), 16), 62), 0, $chars);
}
Converting from base-16 to base-62 obviously has space benefits. A normal 128-bit MD5 hash is 32 chars in hex, but in base-62 it's only 22. If you're storing the hashes in a database, you can convert them to raw binary and save even more space (16 bytes for an MD5).
Since the resulting hash is just a string representation, you can just use substr if you only want a bit of it (as the function does).
You may try base62x to get a safe and compatible encoded representation.
For more information, look up base62x, or simply -base62x in -NatureDNS.
shell> ./base62x -n 16 -enc 16AF
1Ql
shell> ./base62x -n 16 -dec 1Ql
16AF
shell> ./base62x
Usage: ./base62x [-v] [-n <2|8|10|16|32>] <-enc|dec> string
Version: 0.60
Here is an open-source Java library that converts MD5 strings to Base62 strings
https://github.com/inder123/base62
Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6") ==> cbIKGiMVkLFTeenAa5kgO4
Md5ToBase62.fromBase62("4KfZYA1udiGCjCEFC0l") ==> 0000bdd3bb56865852a632deadbc62fc
The conversion is two-way, so you will get the original md5 back if you convert it back to md5:
Md5ToBase62.fromBase62(Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6")) ==> 9e107d9d372bb6826bd81d3542a419d6
Md5ToBase62.toBase62(Md5ToBase62.fromBase62("cbIKGiMVkLFTeenAa5kgO4")) ==> cbIKGiMVkLFTeenAa5kgO4
You could use a slightly modified Base 64 with - and _ instead of + and /:
function base64_url_encode($str) {
    return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
    return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}
Additionally you could remove the trailing padding = characters.
And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in the manual) to true:
$raw_md5 = md5($str, true);
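Putting those pieces together, a raw MD5 comes out as 22 URL-safe characters once the padding is dropped. The sketch below shows one way to do it. The function names with the _nopad suffix are made up for this example.

```php
<?php
// URL-safe Base64 with the trailing '=' padding removed.
function base64_url_encode_nopad(string $bin): string
{
    return rtrim(strtr(base64_encode($bin), '+/', '-_'), '=');
}

// Restore the padding, translate the alphabet back, then decode.
function base64_url_decode_nopad(string $str): string
{
    $padded = str_pad($str, strlen($str) + (4 - strlen($str) % 4) % 4, '=');
    return base64_decode(strtr($padded, '-_', '+/'));
}

$raw = md5('The data to be hashed', true); // 16 raw bytes
$short = base64_url_encode_nopad($raw);

var_dump(strlen($short));                           // int(22)
var_dump(base64_url_decode_nopad($short) === $raw); // bool(true)
```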
My client generates promotional coupon codes which are nothing but 32 char MD5 hashes.
My job is to reduce the MD5 string from 32 chars to less than 10 chars in a way that the hash can be recreated from the reduced string.
The reduction is important as it would be easier for users to reproduce the reduced hash.
For example: 719bedacf2e560b27f39d80accc67ffd => ZjKa1Gh (not mathematically true)
I came across this: How to reduce hash value's length?
It suggests: Using a different base
I am clueless as to how to do this in PHP, can we decode a string to its ASCII and re-encode it?
Are there any in-built functions in PHP that I can use in this case?
Update using https://packagist.org/packages/aza/math
$original = '719bedacf2e560b27f39d80accc67ffd';
$long1 = NumeralSystem::convert($original, 16, 10);
$short = NumeralSystem::convertTo($long1, 62);
$long2 = NumeralSystem::convertFrom($short, 62);
$recovered = NumeralSystem::convert($long2, 10, 16);
var_dump($long1);
var_dump($short);
var_dump($long2);
var_dump($recovered);
// output
string(39) "151012390170261082849236619706853916669"
string(22) "3SNOKWefotgnnCmWnYkTOf"
string(39) "151012390170261082849236619706853916669"
string(32) "719bedacf2e560b27f39d80accc67ffd"
It seems the shortest I can get from a 32-character MD5 this way is 22 characters. I am still looking for ways to reduce it further to 10 characters.
Update: Using first half of MD5
$original = '719bedacf2e560b';
$coupon = NumeralSystem::convert($original, 16, 62);
$recovered = NumeralSystem::convert($coupon, 62, 16);
var_dump($coupon);
var_dump($recovered);
// output
string(10) "bnMR3RjZil"
string(15) "719bedacf2e560b"
If the user provides bnMR3RjZil, I can use it to recreate 719bedacf2e560b and then do a MySQL LIKE search to get the full MD5. If it returns a row, I can go ahead with the promotional activity.
My job is to reduce the MD5 string from 32 chars to less than 10 chars in a way that the hash can be recreated from the reduced string.
That isn't possible. An MD5 hash is 128 bits; an ASCII character is 7 bits. There's no way to store an MD5 hash in fewer than 128 ÷ 7 ≈ 18.3 (rounded up to 19) ASCII characters, and even that would include nonprintable control characters.
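The same arithmetic can be run for other alphabets. This is a small sketch (min_chars() is a made-up helper name) that computes the minimum number of characters needed to hold every 128-bit value in a given base:

```php
<?php
// Smallest number of characters from an alphabet of $alphabetSize symbols
// that can represent every $bits-bit value.
function min_chars(int $bits, int $alphabetSize): int
{
    // The tiny epsilon guards against floating-point noise when the
    // division is mathematically exact (e.g. 128 bits in base 16).
    return (int) ceil($bits * log(2) / log($alphabetSize) - 1e-9);
}

foreach ([16, 36, 62, 64, 128] as $base) {
    printf("base %3d: %d characters\n", $base, min_chars(128, $base));
}
// base 16: 32, base 36: 25, base 62: 22, base 64: 22, base 128: 19
```

Read the other way, nine base-62 characters hold only about 53 bits, so a reversible code under 10 characters has to drop most of the hash.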
A cryptographic hash is in fact a sequence of bits, but it can be interpreted as a number. As such, you could theoretically use good old base_convert() to express it in a larger base. Unfortunately, this function only works up to base 36 and is only reliable for small numbers (integers that fit within PHP_INT_MAX); beyond that, precision is lost.
Here's where third-party libraries come to the rescue. The only problem is that they tend to be difficult to find because they normally address very specific use cases (Bitcoin, ID obfuscation, etc.).
I found e.g. aza/math, which is probably overkill but should get the job done. I haven't had the chance of testing it but it should go like this:
$original = '719bedacf2e560b27f39d80accc67ffd';
$short = NumeralSystem::convert($original, 16, 62);
$recovered = NumeralSystem::convert($short, 62, 16);
The method of using a different base could proceed as follows. Note that the code below only illustrates the method; an efficient implementation would work directly with the binary representation.
The idea is that you interpret your input string as a sequence of 128 bits. Now, if you specify that your new alphabet (characters of the new base system) is A-Za-z0-9+-, you have 64 characters which means that you need 6 bits to encode each one of them. Therefore you can first convert your input string to binary representation, split this representation into chunks of 6 bits, and express each chunk within the specified character set A-Za-z0-9+-:
<?php
$s = "719bedacf2e560b27f39d80accc67ffd";
// Convert one hex byte (two hex digits) to an 8-bit binary string.
function conv($s){
    return str_pad(base_convert($s, 16, 2), 8, "0", STR_PAD_LEFT);
}
$binary_repr = implode('', array_map('conv', str_split($s, 2)));
// Split into 6-bit chunks; the final chunk is shorter and gets right-padded with zeros.
$items = str_split($binary_repr, 6);
function item2char($str){
    $code = (int) base_convert(str_pad($str, 6, "0"), 2, 10);
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-';
    return $alphabet[$code];
}
$result = implode('', array_map('item2char', $items));
echo $result;
?>
As noted in the comments, this is basically the idea behind:
<?php
$s = "719bedacf2e560b27f39d80accc67ffd";
echo base64_encode(hex2bin($s));
//cZvtrPLlYLJ/OdgKzMZ//Q==
echo bin2hex(base64_decode("cZvtrPLlYLJ/OdgKzMZ//Q=="));
//719bedacf2e560b27f39d80accc67ffd
?>
I need to generate a strong unique API key.
Can anyone suggest the best solution for this? I don't want to use rand() function to generate random characters. Is there an alternative solution?
As of PHP 7.0, you can use the random_bytes($length) function to generate a cryptographically secure random string. This string is binary, so you'll want to encode it somehow. A straightforward way of doing this is with bin2hex($binaryString). This gives you a string $length * 2 characters long, with $length * 8 bits of entropy.
You'll want $length to be high enough that your key is effectively unguessable and the chance of another key being generated with the same value is practically nil.
Putting this all together, you get this:
$key = bin2hex(random_bytes(32)); // 64 characters long
When you verify the API key, use only the first 32 characters to select the record from the database and then use hash_equals() to compare the API key as given by the user against what value you have stored. This helps protect against timing attacks. ParagonIE has an excellent write-up on this.
For an example of the checking logic:
$token = $request->bearerToken();
// Retrieve however works best for your situation,
// but it's critical that only the first 32 characters are used here.
$users = app('db')->table('users')->where('api_key', 'LIKE', substr($token, 0, 32) . '%')->get();
// $users should only have one record in it,
// but there is an extremely low chance that
// another record will share a prefix with it.
foreach ($users as $user) {
    // Performs a constant-time comparison of strings,
    // so you don't leak information about the token.
    if (hash_equals($user->api_key, $token)) {
        return $user;
    }
}
return null;
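For anyone not on that framework, the same idea can be sketched with plain PHP. Everything here is illustrative: the $users array stands in for the database table, and issue_key() and find_user() are made-up names.

```php
<?php
// In-memory stand-in for the users table (hypothetical data).
$users = [];

// Issue a key: 32 random bytes, hex-encoded to 64 characters.
function issue_key(array &$users, string $name): string
{
    $key = bin2hex(random_bytes(32));
    $users[] = ['name' => $name, 'api_key' => $key];
    return $key;
}

// Select by the first 32 characters only (like the LIKE 'prefix%' query),
// then confirm the full value with a constant-time comparison.
function find_user(array $users, string $token): ?array
{
    $prefix = substr($token, 0, 32);
    foreach ($users as $user) {
        if (strncmp($user['api_key'], $prefix, 32) === 0
            && hash_equals($user['api_key'], $token)) {
            return $user;
        }
    }
    return null;
}

$key = issue_key($users, 'alice');
var_dump(find_user($users, $key)['name']);        // string(5) "alice"
var_dump(find_user($users, str_repeat('0', 64))); // NULL
```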
Bonus: Slightly More Advanced Use With Base64 Encoding
Using Base64 encoding is preferable to hexadecimal for space reasons, but is slightly more complicated because each character encodes 6 bits (instead of 4 for hexadecimal), which can leave the encoded value with padding at the end.
To keep this answer from dragging on, I'll just put some suggestions for handling Base64 without their supporting arguments. Pick a $length greater than 32 that is divisible by both 3 and 2. I like 42, so we'll use that for $length. Base64 encodings are of length 4 * ceil($length / 3), so our $key will be 56 characters long. You can use the first 28 characters for selection from your storage, leaving another 28 characters on the end that are protected from leaking by timing attacks with hash_equals.
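A minimal sketch of that Base64 variant, assuming the 42-byte length suggested above (the variable names are only for illustration):

```php
<?php
$length = 42;                                // divisible by 3, so no '=' padding appears
$key = base64_encode(random_bytes($length)); // 4 * ceil(42 / 3) = 56 characters

$lookup = substr($key, 0, 28); // used to select the record from storage
$secret = substr($key, 28);    // only ever compared via hash_equals()

var_dump(strlen($key));                // int(56)
var_dump(strpos($key, '=') === false); // bool(true)
```

Standard Base64 output can contain + and /, so translate those if the key has to travel in a URL.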
Bonus 2: Secure Key Storage
Ideally, you should be treating the key much like a password. This means that instead of using hash_equals to compare the full string, you should hash the second half of the key like a password and store it separately from the first half (which stays in plain text). Use the first half for selection from your database and verify the second half with password_verify.
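Roughly, that could look like the sketch below. The record layout and the verify_key() name are assumptions made for the example, not a fixed scheme.

```php
<?php
// Split a 64-character hex key into a plain-text lookup half and a secret half.
$key = bin2hex(random_bytes(32));
$lookup = substr($key, 0, 32);
$secret = substr($key, 32);

// What gets stored (hypothetical record layout): the lookup half in plain
// text plus a password-style hash of the secret half.
$record = [
    'key_lookup' => $lookup,
    'key_hash'   => password_hash($secret, PASSWORD_DEFAULT),
];

// Verification: match the lookup half, then verify the remainder.
function verify_key(array $record, string $token): bool
{
    return hash_equals($record['key_lookup'], substr($token, 0, 32))
        && password_verify(substr($token, 32), $record['key_hash']);
}

var_dump(verify_key($record, $key));                          // bool(true)
var_dump(verify_key($record, $lookup . str_repeat('0', 32))); // bool(false)
```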
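Using mcrypt (note that the mcrypt extension was deprecated in PHP 7.1 and removed in 7.2, where random_bytes() is the replacement):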
<?php
$bytes = mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
$unpack = unpack("Nint", $bytes);
$id = $unpack['int'] & 0x7FFFFFFF;
PHP has the uniqid() function (http://php.net/manual/en/function.uniqid.php) with an optional prefix, and you can even add additional entropy to further avoid collisions. But if you absolutely positively need something unique, you should not use anything with randomness in it.
This is the best solution I found.
http://www.php.net/manual/en/function.uniqid.php#94959
I'm using this code:
$url = "http://www.webtoolkit.info/javascript-base64.html";
print base64_encode($url);
But the result is very long: "aHR0cDovL3d3dy53ZWJ0b29sa2l0LmluZm8vamF2YXNjcmlwdC1iYXNlNjQuaHRtbA=="
Is there a way to transform a long string into a short encrypted one and still be able to transform it back?
for example:
new_encrypt("http://www.webtoolkit.info/javascript-base64.html")
Result: "431ASDFafk2"
Encoding is not encrypting. If you're depending on this for security then you're in for a very nasty shock in the future.
Base 64 encoding is intended for converting data that's 8 bits wide into a format that can be sent over a communications channel that uses 6 or 7 bits without loss of data. As 6 bits is less than 8 bits the encoded string is obviously going to be longer than the original.
This q/a might have what you're looking for:
An efficient compression algorithm for short text strings
It actually links here:
http://github.com/antirez/smaz/tree/master
I did not test it, just found the links.
First off, base64 is an encoding standard and it is not meant to encrypt data, so don't use that. The reason your data is so much longer is that for every 6 bits in the input string, base64 will output 8 bits.
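To see that growth on the URL from the question, here is a small sketch. The length formula is the general Base64 rule of 4 output characters per 3 input bytes, rounded up.

```php
<?php
$url = 'http://www.webtoolkit.info/javascript-base64.html';
$encoded = base64_encode($url);

// Every 3 input bytes become 4 output characters, padded up to a multiple of 4.
$expected = 4 * (int) ceil(strlen($url) / 3);

printf("%d bytes in, %d characters out\n", strlen($url), strlen($encoded));
var_dump(strlen($encoded) === $expected); // bool(true)
```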
There is no form of encryption that will directly output a shortened string. The result will be just as long in the best case.
A solution to that problem would be to gzip your string and then encrypt it, but with your URL the added data for the zip format will still end up making your output longer than the input.
There are many different algorithms for encryption and decryption. You can take a look at the following documentation: http://www.php.net/manual/en/function.mcrypt-list-algorithms.php (this uses mcrypt with different algorithms).
...BUT, you can't force something to be really small (depends on the size you want). The encrypted string needs to have all the information available to be able to decrypt it. Anyways, a base64-string is not that long (compared with really secure salted hashes for example).
I don't see the problem.
Well... you could try using md5() or uniqid().
The first one generates the MD5 hash of your string.
md5("http://www.webtoolkit.info/javascript-base64.html");
http://php.net/manual/en/function.md5.php
The second one generates a 13-character unique ID, and you can then store a relation between your string and that ID.
http://php.net/manual/en/function.uniqid.php
P.S. I'm not sure of what you want to achieve but these solutions will probably satisfy you.
You can be creative and just do some 'stuff' to obscure the URL so that it is not easily guessable but can still be encoded and decoded,
like reversing strings,
or using 3 random letters, then your string encoded with base64 (or with letters swapped for numbers and numbers for letters), then 3 more random letters. Once you know the recipe, you can do and undo it.
$keychars = "abcdefghijklmnopqrstuvwxyz0123456789";
$length = 2;
$randkey = "";
$randkey2 = "";
// rand() bounds are inclusive, so valid indexes run from 0 to strlen - 1
for ($i=0;$i<$length;$i++) $randkey .= substr($keychars, rand(0, strlen($keychars) - 1), 1);
I need to generate a string using PHP. It needs to be unique and from 4 to 8 characters long (the length is the value of a variable).
I thought I could use a crc32 hash, but then I can't choose the number of characters, even though it would be unique. On the other hand, simply writing a "password generator" will produce duplicate strings, and checking the table for each generated string would take a while.
How can I do that?
Thanks!
Maybe I can use that :
function unique_id(){
$better_token = md5(uniqid(rand(), true));
$unique_code = substr($better_token, 16);
$uniqueid = $unique_code;
return $uniqueid;
}
$id = unique_id();
Changing to :
function unique_id($l = 8){
$better_token = md5(uniqid(rand(), true));
$rem = strlen($better_token)-$l;
$unique_code = substr($better_token, 0, -$rem);
$uniqueid = $unique_code;
return $uniqueid;
}
echo unique_id(4);
Do you think I'll get a unique string each time for a good while?
In short, I think you'll get a pretty good random value. There's always the chance of a collision, but you've done everything you can to get a random value. uniqid() returns an identifier based on the current time in microseconds. Passing rand() (mt_rand() would be better) as the prefix and true as the second argument to uniqid() adds more entropy. Hashing the value with md5() also helps, since even a small difference between two inputs is magnified by the hash function. idealmachine is correct that a longer value is less likely to collide than a shorter one.
Your function could also be shorter since md5() will always return a 32 character long string. Try this:
function unique_id($l = 8) {
    return substr(md5(uniqid(mt_rand(), true)), 0, $l);
}
The problem with randomness is that you can never be sure of anything. There is a small chance you could get one number this time and the same number the next. That said, you would want to make the string as long as possible to reduce that probability. As an example of how long such numbers can be, GUIDs (globally unique identifiers) are 16 bytes long.
In theory, four hex characters (16 bits) give only 16^4 = 65536 possibilities, while eight hex characters (32 bits) give 16^8 = 4294967296. You, however, need to consider how likely it is for any two hashes to collide (the "birthday problem"). Wikipedia has a good table on how likely such a collision is. In short, four hex characters are definitely not sufficient, and eight might not be.
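To put rough numbers on the birthday problem, here is a sketch using the standard approximation 1 - exp(-n(n-1) / 2N). The function name and the sample sizes are chosen only for illustration.

```php
<?php
// Approximate probability that at least two of $n random IDs collide when
// there are $space equally likely values (birthday approximation).
function collision_probability(int $n, float $space): float
{
    return 1.0 - exp(-$n * ($n - 1) / (2.0 * $space));
}

// Four hex characters: 300 IDs already give about a 50% chance of a collision.
printf("%.2f\n", collision_probability(300, 16 ** 4));

// Eight hex characters: the same chance is reached at roughly 77,000 IDs.
printf("%.2f\n", collision_probability(77163, 16 ** 8));
```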
You may want to consider using Base64 encoding rather than hex digits; that way, eight characters hold 48 bits rather than just 32 bits.
Eight bytes is 8 * 8 = 64 bits.
Reliable passwords should be built only from the ASCII letters a-zA-Z and the digits 0-9. The best way to do that is to use only cryptographically secure functions, such as random_int() or random_bytes() from PHP 7. Other functions, such as base64_encode(), should be used only as helpers that turn the result into a safe ASCII string.
mt_rand() is not cryptographically secure and is very old.
To pick characters from an arbitrary alphabet string, use random_int(). To turn a binary string into text, use base64_encode() or bin2hex(), but note that with bin2hex() each output character carries only 16 possible values.
See my implementation of these functions.
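Since the linked implementation isn't reproduced here, this is a minimal sketch of the random_int() approach. The function name and default alphabet are assumptions, not the author's code.

```php
<?php
// Build a random string by picking each character with random_int(),
// which draws from the system's cryptographically secure source.
function random_string(
    int $length,
    string $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
): string {
    $max = strlen($alphabet) - 1;
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $alphabet[random_int(0, $max)];
    }
    return $out;
}

echo random_string(8), "\n";
```

Uniqueness is still not guaranteed for 4 to 8 characters, so a unique index in the database remains the final check.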
Could anyone recommend a preferred algorithm to use for URL shortening? I'm coding using PHP. Initially I thought about writing something that would start at a character such as "a" and iterate through requests, creating records in a database and therefore having to increment the character to b, c, d ... A, B and so on as appropriate.
However it dawned on me that this algorithm could be pretty heavy/clumsy and there could be a better way to do it.
I read around a bit on Google and some people seem to be doing it with base conversion from the database's ID column. This isn't something I'm too familiar with.
Could someone elaborate and explain to me how this would work? A couple of code examples would be great, too.
I obviously don't want a complete solution as I would like to learn by doing it myself, but just an explanation/pseudo-code on how this would work would be excellent.
Most shortening services just use a counter that is incremented with every entry and convert the base from 10 to 64.
An implementation in PHP could look like this:
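function encode($number) {
    return strtr(rtrim(base64_encode(pack('i', $number)), '='), '+/', '-_');
}
function decode($base64) {
    // base64_decode() accepts input without the trailing '=' padding,
    // so only the alphabet has to be translated back.
    $number = unpack('i', base64_decode(strtr($base64, '-_', '+/')));
    return $number[1];
}
// pack('i') stores a machine-sized signed integer (typically 32 bits),
// so keep the test value within that range.
$number = mt_rand(0, mt_getrandmax());
var_dump(decode(encode($number)) === $number);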
The encode function takes an integer number, converts it into bytes (pack), encodes it with the Base-64 encoding (base64_encode), trims the trailing padding = (rtrim), and replaces the characters + and / by - and _ respectively (strtr). The decode function is the inverse function to encode and does the exact opposite (except adding trailing padding).
The additional use of strtr translates the original Base-64 alphabet to the URL- and filename-safe alphabet, as + and / would otherwise need to be percent-encoded.
You can use the base_convert function to convert the database IDs from base 10 to base 36.
<?php
$id = 315;
echo base_convert($id, 10, 36), "\n";
?>
Or you can reuse some of the ideas presented in the comments on the page below:
http://php.net/manual/en/function.base-convert.php
Assuming your PRIMARY KEY is an INT and it auto_increments, the following code will get you going =).
<?php
// Note: the mysql_* functions were removed in PHP 7; mysqli and PDO provide equivalents.
$inSQL = "INSERT INTO short_urls() VALUES();";
$inResult = mysql_query($inSQL);
$databaseID = base_convert(mysql_insert_id(), 10, 36);
// $databaseID is now your short URL
?>
EDIT: Included the base_convert from HGF's answer. I forgot to base_convert in the original post.
I break the ID down with an algorithm similar to converting from decimal to hex, except that it uses 62 characters instead of the 16 that hex uses.
'0','1','2','3','4','5','6','7','8','9',
'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'
Example: converting ID = 1234567890 gives kv7yl1 as the key.
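A minimal sketch of that conversion with the alphabet listed above (base62_encode() and base62_decode() are made-up names). Written most significant digit first, 1234567890 comes out as 1ly7vk. The kv7yl1 quoted above is the same digits in reverse, which suggests the original emitted each remainder in the order it was produced.

```php
<?php
// Integer -> base 62, using the 0-9, a-z, A-Z alphabet from the post.
function base62_encode(int $number): string
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $out = '';
    do {
        // Each remainder is the next digit, least significant first,
        // so prepend it to get the conventional digit order.
        $out = $alphabet[$number % 62] . $out;
        $number = intdiv($number, 62);
    } while ($number > 0);
    return $out;
}

// Base 62 -> integer, the inverse of base62_encode().
function base62_decode(string $key): int
{
    $alphabet = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $number = 0;
    foreach (str_split($key) as $char) {
        $number = $number * 62 + strpos($alphabet, $char);
    }
    return $number;
}

echo base62_encode(1234567890), "\n";         // 1ly7vk
echo strrev(base62_encode(1234567890)), "\n"; // kv7yl1
echo base62_decode('1ly7vk'), "\n";           // 1234567890
```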
I adopted a "light" solution. On user request I generate a unique identifier (checking for conflicts in the db) with this Python snippet:
url_hash = base64.b64encode(os.urandom(int(math.ceil(0.75*7))))[:6]
and store it in db.
The native PHP base_convert() works well for small ranges of numbers, but if you really need to encode large values, consider using something like the implementation provided here which will work to base 64 and beyond if you simply provide more legal characters for the encoding.
http://af-design.com/blog/2010/08/10/working-with-big-integers-in-php/
Here, try this method:
hash_hmac('joaat', "http://www.example.com/long/url/", "secretkey");
It will provide you with a hash value fit for a professional URL shortener, e.g. '142ecd53'