Reduce MD5 - Using a different base - php

My client generates promotional coupon codes which are nothing but 32 char MD5 hashes.
My job is to reduce the MD5 string from 32 chars to less than 10 chars in a way that the hash can be recreated from the reduced string.
The reduction is important as it would be easier for users to reproduce the reduced hash.
For e.g.: 719bedacf2e560b27f39d80accc67ffd => ZjKa1Gh (not mathematically true)
I came across this: How to reduce hash value's length?
It suggests: Using a different base
I am clueless as to how to do this in PHP, can we decode a string to its ASCII and re-encode it?
Are there any in-built functions in PHP that I can use in this case?
Update using https://packagist.org/packages/aza/math
$original = '719bedacf2e560b27f39d80accc67ffd';
$long1 = NumeralSystem::convert($original, 16, 10);
$short = NumeralSystem::convertTo($long1, 62);
$long2 = NumeralSystem::convertFrom($short, 62);
$recovered = NumeralSystem::convert($long2, 10, 16);
var_dump($long1);
var_dump($short);
var_dump($long2);
var_dump($recovered);
// output
string(39) "151012390170261082849236619706853916669"
string(22) "3SNOKWefotgnnCmWnYkTOf"
string(39) "151012390170261082849236619706853916669"
string(32) "719bedacf2e560b27f39d80accc67ffd"
Seems like the lowest I can reach from 32 chars MD5 is 22 chars this way. I am still looking for ways in which I can further reduce it to 10 chars.
Update: Using first half of MD5
$original = '719bedacf2e560b';
$coupon = NumeralSystem::convert($original, 16, 62);
$recovered = NumeralSystem::convert($coupon, 62, 16);
var_dump($coupon);
var_dump($recovered);
// output
string(10) "bnMR3RjZil"
string(15) "719bedacf2e560b"
If the user provides bnMR3RjZil, I can use it to recreate 719bedacf2e560b and then run a MySQL LIKE search to find the full MD5. If it returns a row, I can go ahead with the promotional activity.
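One caveat with the truncation approach, sketched below: 15 hex characters carry 60 bits, while 10 base-62 characters carry slightly less (16^15 is larger than 62^10). The prefix above fits in 10 characters only because its value is small enough; a prefix starting with a high digit such as f would need 11. The function name is made up for illustration.

```php
<?php
// Largest number of hex characters that is guaranteed to fit into
// $chars characters of an alphabet with $base symbols.
function maxHexCharsThatFit(int $chars, int $base = 62): int
{
    $bits = $chars * log($base, 2); // information capacity in bits
    return (int) floor($bits / 4);  // one hex character carries 4 bits
}

echo maxHexCharsThatFit(10), "\n"; // 14
echo maxHexCharsThatFit(9), "\n";  // 13
```

So if every coupon must fit in 10 base-62 characters, a 14-character hex prefix would be the safe choice under these assumptions.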

My job is to reduce the MD5 string from 32 chars to less than 10 chars in a way that the hash can be recreated from the reduced string.
That isn't possible. An MD5 hash is 128 bits; an ASCII character carries at most 7 bits. There's no way to store an MD5 hash in fewer than 128 ÷ 7 ≈ 18.3 (rounded up to 19) ASCII characters, and even that would include nonprintable control characters.
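The same arithmetic can be checked for other alphabets. This is only a small sketch; the function name is hypothetical.

```php
<?php
// Minimum number of characters needed to represent $bits bits
// using an alphabet of $alphabetSize distinct symbols.
function minChars(int $bits, int $alphabetSize): int
{
    return (int) ceil($bits / log($alphabetSize, 2));
}

echo minChars(128, 128), "\n"; // 19: all 7-bit ASCII, control characters included
echo minChars(128, 95), "\n";  // 20: printable ASCII only
echo minChars(128, 62), "\n";  // 22: A-Z, a-z, 0-9
```

With letters and digits only, 22 characters is the floor for a full MD5, which matches the 22-character result in the question.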

A cryptographic hash is in fact a sequence of bits, but it can be interpreted as a number. As such, you could theoretically use good old base_convert() to express it in a larger base. Unfortunately, this function only works up to base 36 and it's restricted to actual numbers (i.e. integers that fit into PHP_INT_MAX); otherwise data loss happens.
Here's where third-party libraries come to the rescue. The only problem is that they tend to be difficult to find because they normally address very specific use cases (Bitcoin, ID obfuscation, etc.).
I found e.g. aza/math, which is probably overkill but should get the job done. I haven't had the chance of testing it but it should go like this:
$original = '719bedacf2e560b27f39d80accc67ffd';
$short = NumeralSystem::convert($original, 16, 62);
$recovered = NumeralSystem::convert($short, 62, 16);
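If you'd rather not pull in a library or an extension, the conversion can also be sketched in plain PHP with long division on digit arrays. This is an illustration only: the function name and the alphabet order (0-9, A-Z, a-z) are my assumptions, so the short string may differ from what aza/math produces, and the input must be lowercase hex as md5() returns it.

```php
<?php
const ALPHABET_16 = '0123456789abcdef';
const ALPHABET_62 = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Convert an arbitrarily long non-negative number between two alphabets.
function convertBase(string $number, string $from, string $to): string
{
    if ($number === '') {
        throw new InvalidArgumentException('empty input');
    }
    $fromBase = strlen($from);
    $toBase = strlen($to);
    $digits = [];
    foreach (str_split($number) as $char) {
        $pos = strpos($from, $char);
        if ($pos === false) {
            throw new InvalidArgumentException("invalid digit: $char");
        }
        $digits[] = $pos;
    }
    $out = '';
    while (count($digits) > 0) {
        // Divide the whole digit array by $toBase; the remainder is the next output digit.
        $remainder = 0;
        $quotient = [];
        foreach ($digits as $digit) {
            $acc = $remainder * $fromBase + $digit;
            $q = intdiv($acc, $toBase);
            $remainder = $acc % $toBase;
            if ($q > 0 || count($quotient) > 0) {
                $quotient[] = $q; // skip leading zeros of the quotient
            }
        }
        $out = $to[$remainder] . $out;
        $digits = $quotient;
    }
    return $out;
}

$original = '719bedacf2e560b27f39d80accc67ffd';
$short = convertBase($original, ALPHABET_16, ALPHABET_62);
// Leading zeros are lost in the numeric round trip, so pad back to 32 hex characters.
$recovered = str_pad(convertBase($short, ALPHABET_62, ALPHABET_16), 32, '0', STR_PAD_LEFT);
var_dump($short, $recovered === $original);
```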

The method of using a different base could proceed as follows. Note that the code below only illustrates the method; to implement it efficiently, one would need to work directly with the binary representation.
The idea is that you interpret your input string as a sequence of 128 bits. Now, if you specify that your new alphabet (characters of the new base system) is A-Za-z0-9+-, you have 64 characters which means that you need 6 bits to encode each one of them. Therefore you can first convert your input string to binary representation, split this representation into chunks of 6 bits, and express each chunk within the specified character set A-Za-z0-9+-:
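<?php
$s = "719bedacf2e560b27f39d80accc67ffd";
// Convert one byte (two hex digits) to an 8-character binary string.
function conv($s){
    return str_pad(base_convert($s, 16, 2), 8, "0", STR_PAD_LEFT);
}
// implode() takes the separator first; callbacks are passed by name as strings.
$binary_repr = implode('', array_map('conv', str_split($s, 2)));
$items = str_split($binary_repr, 6);
// 128 is not a multiple of 6, so right-pad the last chunk with zero bits.
$items[count($items) - 1] = str_pad(end($items), 6, "0");
function item2char($str){
    $code = bindec($str); // integer index into the alphabet
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-';
    return $alphabet[$code];
}
$result = implode('', array_map('item2char', $items));
echo $result;
?>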
As noted in the comments, this is basically the idea behind:
<?php
$s = "719bedacf2e560b27f39d80accc67ffd";
echo base64_encode(hex2bin($s));
//cZvtrPLlYLJ/OdgKzMZ//Q==
echo bin2hex(base64_decode("cZvtrPLlYLJ/OdgKzMZ//Q=="));
//719bedacf2e560b27f39d80accc67ffd
?>

Related

how to create a row of digits based on a string in php [duplicate]

In php is there a way to give a unique hash from a string, but that the hash was made up from numbers only?
example:
return md5(234); // returns 098f6bcd4621d373cade4e832627b4f6
but I need
return numhash(234); // returns 00978902923102372190
(20 numbers only)
the problem here is that I want the hashing to be short.
edit:
OK let me explain the back story here.
I have a site that has a ID for every registered person, also I need a ID for the person to use and exchange (hence it can't be too long), so far the ID numbering has been 00001, 00002, 00003 etc...
this makes some people look more important
this reveals application info that I don't want to reveal.
To fix point 1 and 2 I need to "hide" the number while keeping it unique.
Edit + SOLUTION:
Numeric hash function based on the code by https://stackoverflow.com/a/23679870/175071
/**
 * Return a number only hash
 * https://stackoverflow.com/a/23679870/175071
 * @param $str
 * @param null $len
 * @return number
 */
public function numHash($str, $len=null)
{
$binhash = md5($str, true);
$numhash = unpack('N2', $binhash);
$hash = $numhash[1] . $numhash[2];
if($len && is_int($len)) {
$hash = substr($hash, 0, $len);
}
return $hash;
}
// Usage
numHash(234, 20); // always returns 6814430791721596451
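Note that the example above returns 19 digits, not 20: each of the two 32-bit words prints as anywhere from 1 to 10 digits. If a fixed width matters, one possible variation (a sketch with a hypothetical name, assuming 64-bit PHP) is to zero-pad each word:

```php
<?php
// Digits-only hash with a fixed width of 20 characters.
// Assumes 64-bit PHP, where unpack('N2', ...) yields non-negative integers.
function numHashFixed(string $str): string
{
    $parts = unpack('N2', md5($str, true)); // first two 32-bit big-endian words
    // Each word is at most 4294967295 (10 digits), so pad each to 10 digits.
    return sprintf('%010d%010d', $parts[1], $parts[2]);
}

echo numHashFixed('234'), "\n"; // 06814430791721596451
```

Padding each word separately also keeps the mapping from the two words to the digit string unambiguous.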
An MD5 or SHA1 hash in PHP returns a hexadecimal number, so all you need to do is convert bases. PHP has a function that can do this for you:
$bignum = hexdec( md5("test") );
or
$bignum = hexdec( sha1("test") );
PHP Manual for hexdec
Since you want a limited size number, you could then use modular division to put it in a range you want.
$smallnum = $bignum % $upperBound; // $upperBound: put your upper bound here
EDIT
As noted by Artefacto in the comments, this approach produces a number beyond the maximum size of an integer in PHP, and the result after the modulo operation will always be 0. However, taking only the first 15 hex characters of the hash (60 bits, which fits in a 64-bit integer) doesn't have this problem. Revised version for calculating the initial large number:
$bignum = hexdec( substr(sha1("test"), 0, 15) );
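Putting the two steps together might look like this. It is a sketch under the same assumption as above (64-bit PHP, so 15 hex digits fit in a native integer); the function name is hypothetical.

```php
<?php
// Map a string to an integer in [0, $upperBound) via the first 60 bits of its SHA-1.
function boundedNumHash(string $input, int $upperBound): int
{
    $prefix = substr(sha1($input), 0, 15); // 15 hex digits = 60 bits
    return hexdec($prefix) % $upperBound;
}

// Zero-padded to five digits, like the IDs in the question.
printf("%05d\n", boundedNumHash('test', 100000));
```

The smaller the upper bound, the more likely two inputs end up with the same number, so uniqueness still has to be checked somewhere.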
You can try crc32(). See the documentation at: http://php.net/manual/en/function.crc32.php
$checksum = crc32("The quick brown fox jumped over the lazy dog.");
printf("%u\n", $checksum); // prints 2191738434
With that said, crc should only be used to validate the integrity of data.
There are some good answers but for me the approaches seem silly.
They first force php to create a Hex number, then convert this back (hexdec) in a BigInteger and then cut it down to a number of letters... this is much work!
Instead why not
Read the hash as binary:
$binhash = md5('[input value]', true);
then using
$numhash = unpack('N2', $binhash); //- or 'V2' for little endian
to cast this as two INTs ($numhash is an array of two elements). Now you can reduce the number of bits in the number simply using an AND operation. e.g:
$result = $numhash[1] & 0x000FFFFF; //- to get numbers between 0 and 1048575
But be warned of collisions! Reducing the number means increasing the probability of two different [input value] with the same output.
I think a much better way would be "ID-crypting" with a bijective function, so that no collisions can happen. For the simplest kind, just use an affine cipher.
Example with max input value range from 0 to 25:
function numcrypt($a)
{
return ($a * 15) % 26;
}
function unnumcrypt($a)
{
return ($a * 7) % 26;
}
Output:
numcrypt(1) : 15
numcrypt(2) : 4
numcrypt(3) : 19
unnumcrypt(15) : 1
unnumcrypt(4) : 2
unnumcrypt(19) : 3
e.g.
$id = unnumcrypt($_GET['userid']);
... do something with the ID ...
echo ' go ';
Of course this is not secure, but if no one knows the method you use and security is not a concern, this approach is faster and collision-free.
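For a realistic ID range the multiplier and its inverse have to be chosen for a larger modulus. A sketch of how that could be done with the extended Euclidean algorithm follows; the function names and the example constants are hypothetical, and the product of multiplier and ID is assumed to stay below PHP_INT_MAX.

```php
<?php
// Modular inverse of $a modulo $m via the extended Euclidean algorithm.
// Throws if $a and $m are not coprime (the mapping would not be a bijection).
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('multiplier and modulus must be coprime');
    }
    return (($oldS % $m) + $m) % $m;
}

// Affine map x -> (a * x + b) mod m and its inverse.
function affineEncode(int $id, int $a, int $b, int $m): int
{
    return ($a * $id + $b) % $m;
}

function affineDecode(int $code, int $a, int $b, int $m): int
{
    $shifted = ((($code - $b) % $m) + $m) % $m; // undo the offset, keep it non-negative
    return (modInverse($a, $m) * $shifted) % $m;
}

echo modInverse(15, 26), "\n";         // 7, the constant used in unnumcrypt()
echo affineEncode(2, 15, 0, 26), "\n"; // 4, same as numcrypt(2)

// Five-digit IDs: modulus 100000, multiplier coprime to it (example values).
$code = affineEncode(3, 34567, 891, 100000);
var_dump(affineDecode($code, 34567, 891, 100000) === 3);
```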
The problem with cutting off the hash is collisions. An alternative is:
return crc32("Hello World");
The crc32():
Generates the cyclic redundancy checksum polynomial of 32-bit lengths
of the str. This is usually used to validate the integrity of data
being transmitted.
That gives us a 32-bit integer, which may be negative on 32-bit installations and is always positive on 64-bit ones. This integer can be stored as an ID in a database. Note that crc32() already returns an integer, so no hexdec() call is needed, and that a 32-bit checksum fits into a 32-bit column but can still collide for different inputs.
First of all, md5 is basically compromised, so you shouldn't be using it for anything but non-critical hashing.
PHP5 has the hash() function, see http://www.php.net/manual/en/function.hash.php.
Setting the third parameter to true will give you a string of binary data. Alternatively, you could split the resulting hexadecimal hash into pieces of 2 characters and convert them to integers individually, but I'd expect that to be much slower.
Try Hashids.
It hashes a number into a format you define, including how many characters to use and which characters are allowed.
Example:
$hashids->encode(1);
This returns something like "28630", depending on your format.
Just use my manual hash method below:
Divide the number (e.g. 6 digits) by the primes 3, 5 and 7.
Take the first 6 digits after the decimal point as the ID. Check uniqueness before actually creating the ID; if a collision exists, increase the last digit by 1 until there is no collision.
E.g. 123456 gives you 771428
123457 gives you 780952
123458 gives you 790476.
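The method can be sketched with integer arithmetic, which avoids floating-point rounding. The function name is hypothetical and non-negative input is assumed. One thing the sketch makes visible: the digits after the decimal point depend only on the remainder modulo 105, so there are at most 105 distinct results before the uniqueness check has to step in.

```php
<?php
// First six digits after the decimal point of $n / (3 * 5 * 7).
function primeFractionId(int $n): string
{
    $divisor = 3 * 5 * 7; // 105
    $digits = intdiv(($n % $divisor) * 1000000, $divisor);
    return str_pad((string) $digits, 6, '0', STR_PAD_LEFT);
}

echo primeFractionId(123456), "\n"; // 771428
echo primeFractionId(123457), "\n"; // 780952
echo primeFractionId(123561), "\n"; // 771428 again, because 123561 = 123456 + 105
```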

PHP int to longer string for use with qrcode

I'm trying to build a app that would identify a user by scanning a qrcode. For this, I want to use the primary key as the identifier. Since the character length of the integer is short, it wouldn't give a good look as a qrcode.
So my question is: Is it possible to convert the int to string which is longer than 10-12 chars (fixed length if possible),mix of chars and numbers which can be reversed to the original integer.
What you can do is compute the SHA256 of your user's ID and turn that into the QR code.
Then, when the user scans the QR code and sends you the SHA value, you try to match it against the SHA of the user IDs in the database.
So here is how to get the SHA hash from the user ID:
$hash = hash('sha256', $userId); // The result is a long enough string for QR
Then, when you need to find a user based on the SHA, do the following:
select * from users where SHA2(id, 256) = 'SHA_PROVIDED_BY_USER';
To speed up the lookup, you can also store the SHA in the DB; the query will then be much faster.
Another option is to prepend the number with some letters. It will give you random string, nice QRs and you can extract numeric ID with simple regexp.
Using function from PHP random string generator (don't forget to remove numbers from $characters) the code could be:
//encoding
$size = 12;
$str = generateRandomString($size-strlen($userId)).$userId;
//decoding
preg_match('/(\d+)$/', $str, $matching);
$userId = $matching[1];
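A self-contained version of the same idea could look like the sketch below. The function names are hypothetical stand-ins (randomLetters() replaces generateRandomString() from the linked answer), and random_int() requires PHP 7 or later.

```php
<?php
// Random string of $length letters (no digits, so the trailing ID stays recoverable).
function randomLetters(int $length): string
{
    $letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $out = '';
    for ($i = 0; $i < $length; $i++) {
        $out .= $letters[random_int(0, strlen($letters) - 1)];
    }
    return $out;
}

// Prefix the ID with random letters up to a total length of $size.
function encodeUserId(int $userId, int $size = 12): string
{
    $digits = (string) $userId;
    return randomLetters(max(0, $size - strlen($digits))) . $digits;
}

// Extract the trailing digits again; null if there are none.
function decodeUserId(string $code): ?int
{
    return preg_match('/(\d+)$/', $code, $m) === 1 ? (int) $m[1] : null;
}

$code = encodeUserId(42);
echo $code, ' => ', decodeUserId($code), "\n";
```

Keep in mind that the ID is still readable at the end of the string, so this only changes how the QR code looks; it does not hide the ID.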
You can convert your integer to any base from 2 to 36 with the base_convert() function.
Here is the documentation:
http://php.net/manual/en/function.base-convert.php
The notion that a number, in PHP, has a "maximum size" is a little off (not wrong, just off =P)
From the manual:
If PHP encounters a number beyond the bounds of the integer type, it will be interpreted as a float instead.
So, you could use really large numbers for your QR Codes if you want. Shouldn't be an issue. However, what would be better is to think of "what exactly do you need"?
If you need a numeric value, but want it in hex, you can use base_convert() to go back and forth between the numbers:
$val = 1234;
$hex = base_convert($val, 10, 16);
However, if strings are more for you, you could use base64_encode() to encode it:
$val = 'awesome string value';
$encoded = base64_encode($val);
UPDATE
Based on comments, it sounds like you also want to pad the string if it's too short. You can use str_pad() to accomplish this:
$val = str_pad("1", 10, "0", STR_PAD_LEFT);
echo $val;
// displays: 0000000001
$orig = intval($val);
echo $orig;
// displays: 1

Alphanumeric (sha1) Semaphore in PHP?

PHP's "shm_get" function requires an integer semaphore key, which I realise to be a restriction of the underlying OS.
I am using the "sha1" function to hash some user input and using the hash to uniquely identify a number of resulting files and background processes.
Is there a way to convince shm_get to accept an alphanumeric key or to convert a sha1 hash to an acceptable integer?
You can convert a hexadecimal number into a decimal number by using hexdec()
However if you have got a large number in your hash, this won't return an integer. But you need an integer. So you might want to cut it apart and only use a part of the hash.
$hash = sha1('http://www.hashcat.net/');
$hash = substr($hash, 0, 15); // ok on 64bit systems
$number = (int) hexdec($hash); // cap to PHP_INT_MAX anyway
var_dump($hash, $number);

Generate random string from 4 to 8 characters in PHP

I need to generate a string using PHP, it need to be unique and need to be from 4 to 8 characters (the value of a variable).
I thought I can use crc32 hash but I can't decide how many characters, but sure it will be unique. In the other hand only create a "password generator" will generate duplicated string and checking the value in the table for each string will take a while.
How can I do that?
Thanks!
Maybe I can use that :
function unique_id(){
$better_token = md5(uniqid(rand(), true));
$unique_code = substr($better_token, 16);
$uniqueid = $unique_code;
return $uniqueid;
}
$id = unique_id();
Changing to :
function unique_id($l = 8){
$better_token = md5(uniqid(rand(), true));
$rem = strlen($better_token)-$l;
$unique_code = substr($better_token, 0, -$rem);
$uniqueid = $unique_code;
return $uniqueid;
}
echo unique_id(4);
Do you think I'll get a unique string each time for a good while?
In short, I think you'll get a pretty good random value. There's always the chance of a collision, but you've done everything you can to get a random value. uniqid() returns an identifier based on the current time in microseconds; it is not random by itself. Passing rand() as the prefix (mt_rand() would be better) and true as the second argument to uniqid() should make the value even more unique. Hashing the value using md5() should also help, as even a small difference between two generated values is magnified by the hashing function. idealmachine is correct in that a longer value is less likely to have a collision than a shorter one.
Your function could also be shorter since md5() will always return a 32 character long string. Try this:
function unique_id($l = 8) {
return substr(md5(uniqid(mt_rand(), true)), 0, $l);
}
The problem with randomness is that you can never be sure of anything. There is a small chance you could get one number this time and the same number the next. That said, you would want to make the string as long as possible to reduce that probability. As an example of how long such numbers can be, GUIDs (globally unique identifiers) are 16 bytes long.
In theory, four hex characters (16 bits) give only 16^4 = 65536 possibilities, while eight hex characters (32 bits) give 16^8 = 4294967296. You, however, need to consider how likely it is for any two hashes to collide (the "birthday problem"). Wikipedia has a good table on how likely such a collision is. In short, four hex characters are definitely not sufficient, and eight might not be.
You may want to consider using Base64 encoding rather than hex digits; that way, you can fit 48 bits in rather than just 32 bits.
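To put rough numbers on the birthday problem, the usual approximation can be sketched like this (the function name is hypothetical, and the formula is the standard approximation, not an exact value):

```php
<?php
// Approximate probability that at least two of $n values collide when each is
// drawn uniformly from $space possibilities: 1 - exp(-n(n-1) / (2 * space)).
function collisionProbability(int $n, float $space): float
{
    return 1.0 - exp(-$n * ($n - 1) / (2.0 * $space));
}

// 1000 codes of four hex characters: a collision is almost certain.
printf("%.4f\n", collisionProbability(1000, 16 ** 4));
// About 77000 codes of eight hex characters: roughly a coin flip.
printf("%.4f\n", collisionProbability(77163, 16 ** 8));
```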
Eight bytes is 8 * 8 = 64 bits.
Reliable passwords should be built only from the ASCII letters a-z and A-Z and the digits 0-9. The best way to do that is to use only cryptographically secure sources, such as random_int() or random_bytes() from PHP 7. Other functions, such as base64_encode(), should be used only as helpers that turn the random data into ASCII characters.
mt_rand() is not cryptographically secure and is very old.
To pick characters from a given string, use random_int(). To make a binary string printable, use base64_encode() or bin2hex(), but with the latter each character can take only 16 values.
See my implementation of these functions.
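As an illustration of the random_bytes() plus bin2hex() route (a sketch with a hypothetical function name, assuming PHP 7 or later and a length of at least 1):

```php
<?php
// Random hex code of $length characters from a cryptographically secure source.
// Each hex character carries only 4 bits, so 8 characters give 32 bits of randomness.
function randomHexCode(int $length): string
{
    $bytes = random_bytes(intdiv($length + 1, 2)); // enough bytes for $length hex digits
    return substr(bin2hex($bytes), 0, $length);
}

echo randomHexCode(4), "\n";
echo randomHexCode(8), "\n";
```

The values are random, not guaranteed unique, so a uniqueness check against the table is still needed.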

Convert MD5 to base62 for URL

I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?
I have read in many places that because the number from an MD5 is bigger than php can handle as an integer it will be inaccurate... As I want a short URL anyway and was not planning on using the whole hash, maybe just 8 characters of it....
So my question is how to get part of the number of an MD5 hash?
Also is it a bad idea to use only part of the MD5 hash?
I'm going to suggest a different thing here.. Since you are only interested in using a decimal chunk of the md5 hash why don't you use any other short numeric hash like CRC32 or Adler? Here is an example:
$hash = sprintf('%u', crc32('your string here'));
This will produce a decimal hash of your string with up to 10 digits.
EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.
EDIT (Again): To work with arbitrary length numbers you must use either the bc_math or the GMP extension, here is a function that uses the bc_math extension and can also convert from base 2 up to base 62. You should use it like this:
echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash
and the inverse:
echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
Hope it helps. =)
If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions... especially if you're truncating the hash. If you go ahead and implement an id-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second'll be 2, etc---if you're using MySQL, just throw in an autoincrement column.
To make a short id:
//the basic example
$sid = base_convert($id, 10, 36);
//if you're going to be needing 64 bit numbers converted
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);
To convert a short id back into the base-10 id:
//the basic example
$id = base_convert($sid, 36, 10);
//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));
Hope this helps!
If you're truly wanting base 62 (which can't be done with gmp or base_convert), check this out:
http://snipplr.com/view/22246/base62-encode--decode/
You can do this like this: (Not all steps are in php, it's been a long time that I've used it.)
Create a md5 hash of the script like this:
$hash = md5($script, true); // second argument true = raw binary output
Convert that number to base 62.
See the questions about base conversion of arbitrary sized numbers in PHP
Truncate the string to a length you like.
There's no risk in using only a few of the bits of a md5. All that changes is danger of collisions.
There actually is a Java implementation which you could probably extract. It's an open-source CMS solution called Pulse.
Look here for the code of toBase62() and fromBase62().
http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html
The only dependency in StringUtils is the LifeCycle-class which provides a way to get a salted hash for a string which you might even omit all together or just copy the method over to your copy StringUtils. Voilá.
You can do something like this,
$hash = md5("The data to be hashed", true);
$ints = unpack("L*num", $hash);
$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']);
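base62() is not a built-in function, so it has to be supplied. A minimal sketch of what it might look like follows; the alphabet order is my assumption, and 64-bit PHP is assumed so that the unpacked 32-bit words are non-negative.

```php
<?php
// Hypothetical base62() helper: encodes a non-negative integer with 0-9, A-Z, a-z.
function base62(int $n): string
{
    $alphabet = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';
    if ($n < 0) {
        throw new InvalidArgumentException('negative input');
    }
    $out = '';
    do {
        $out = $alphabet[$n % 62] . $out;
        $n = intdiv($n, 62);
    } while ($n > 0);
    return $out;
}

$ints = unpack('L4', md5('The data to be hashed', true)); // keys 1..4
$hash_str = '';
foreach ($ints as $word) {
    // A 32-bit word needs at most 6 base-62 digits; pad so the parts stay aligned.
    $hash_str .= str_pad(base62($word), 6, '0', STR_PAD_LEFT);
}
echo $hash_str, "\n"; // 24 characters
```

Encoding the four words separately costs two characters compared with converting the whole 128-bit number at once (24 instead of 22), but it works with native integers.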
As of PHP 5.3.2, GMP supports bases up to 62 (was previously only 36), so brianreavis's suggestion was very close. I think the simplest answer to your question is thus:
function base62hash($source, $chars = 22) {
return substr(gmp_strval(gmp_init(md5($source), 16), 62), 0, $chars);
}
Converting from base-16 to base-62 obviously has space benefits. A normal 128-bit MD5 hash is 32 chars in hex, but in base-62 it's only 22. If you're storing the hashes in a database, you can convert them to raw binary and save even more space (16 bytes for an MD5).
Since the resulting hash is just a string representation, you can just use substr if you only want a bit of it (as the function does).
You may try base62x to get a safe and compatible encoded representation.
Here is for more information about base62x, or simply -base62x in -NatureDNS.
shell> ./base62x -n 16 -enc 16AF
1Ql
shell> ./base62x -n 16 -dec 1Ql
16AF
shell> ./base62x
Usage: ./base62x [-v] [-n <2|8|10|16|32>] <-enc|dec> string
Version: 0.60
Here is an open-source Java library that converts MD5 strings to Base62 strings
https://github.com/inder123/base62
Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6") ==> cbIKGiMVkLFTeenAa5kgO4
Md5ToBase62.fromBase62("4KfZYA1udiGCjCEFC0l") ==> 0000bdd3bb56865852a632deadbc62fc
The conversion is two-way, so you will get the original md5 back if you convert it back to md5:
Md5ToBase62.fromBase62(Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6")) ==> 9e107d9d372bb6826bd81d3542a419d6
Md5ToBase62.toBase62(Md5ToBase62.fromBase62("cbIKGiMVkLFTeenAa5kgO4")) ==> cbIKGiMVkLFTeenAa5kgO4
You could use a slightly modified Base 64 with - and _ instead of + and /:
function base64_url_encode($str) {
return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}
Additionally you could remove the trailing padding = characters.
And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in the manual) to true:
$raw_md5 = md5($str, true);
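Combining these pieces for an existing hex MD5 might look like the sketch below; the function names are hypothetical. It yields 22 URL-safe characters and converts back to the original hash.

```php
<?php
// URL-safe, unpadded Base64 form of a hex MD5 hash.
function md5ToBase64Url(string $hex): string
{
    return rtrim(strtr(base64_encode(hex2bin($hex)), '+/', '-_'), '=');
}

// Reverse direction: back to the 32-character hex form.
function base64UrlToMd5(string $short): string
{
    // base64_decode() in its default mode accepts input without padding.
    return bin2hex(base64_decode(strtr($short, '-_', '+/')));
}

$short = md5ToBase64Url('719bedacf2e560b27f39d80accc67ffd');
echo $short, "\n";                 // cZvtrPLlYLJ_OdgKzMZ__Q
echo base64UrlToMd5($short), "\n"; // 719bedacf2e560b27f39d80accc67ffd
```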
