In my user database table, I take the MD5 hash of the email address of a user as the id.
Example: email(example#example.org) = id(d41d8cd98f00b204e9800998ecf8427e)
Unfortunately, I have to represent the ids as integer values now - in order to be able to use an API where the id can only be an integer.
Now I'm looking for a way to encode the id into an integer for sending and decode it again when receiving. How could I do this?
My ideas so far:
convert_uuencode() and convert_uudecode() for the MD5 hash
replace every character of the MD5 hash by its ord() value
Which approach is better? Do you know even better ways to do this?
I hope you can help me. Thank you very much in advance!
Be careful. Converting the MD5s to an integer will require support for big (128-bit) integers. Chances are the API you're using will only support 32-bit integers - or worse, might be dealing with the number in floating-point. Either way, your ID will get munged. If this is the case, just assigning a second ID arbitrarily is a much better way to deal with things than trying to convert the MD5 into an integer.
However, if you are sure that the API can deal with arbitrarily large integers without trouble, you can just convert the MD5 from hexadecimal to an integer. PHP does not support this natively, because it will try to represent the result as either a (32- or 64-bit) integer or a floating-point number; you'll probably need to use the PHP GMP extension for it.
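With GMP installed, the conversion is roughly `gmp_strval(gmp_init($md5, 16), 10)`. Since GMP is not always available, here is a hedged, dependency-free sketch of the same hex-to-decimal round trip in plain PHP. The function names are hypothetical; the number is held as an array of small base-10000 "limbs", so no 128-bit integer type is needed and the arithmetic also stays within 32-bit ints.

```php
<?php
// Convert a hex string (e.g. an MD5) to a decimal string, using base-10000 limbs.
function hex_to_dec_string(string $hex): string
{
    $limbs = [0];                        // little-endian limbs in base 10000
    foreach (str_split(strtolower($hex)) as $ch) {
        $carry = hexdec($ch);            // value of this hex digit, 0..15
        foreach ($limbs as $i => $limb) {
            $v = $limb * 16 + $carry;    // multiply the running value by 16, add the digit
            $limbs[$i] = $v % 10000;
            $carry = intdiv($v, 10000);
        }
        if ($carry > 0) {
            $limbs[] = $carry;           // carry is at most 15, so one new limb suffices
        }
    }
    $out = (string) array_pop($limbs);   // most significant limb, unpadded
    foreach (array_reverse($limbs) as $limb) {
        $out .= str_pad((string) $limb, 4, '0', STR_PAD_LEFT);
    }
    return $out;
}

// Convert a decimal string back to hex by schoolbook long division by 16.
function dec_to_hex_string(string $dec, int $width = 32): string
{
    $hex = '';
    while ($dec !== '0' && $dec !== '') {
        $quotient = '';
        $rem = 0;
        foreach (str_split($dec) as $d) {
            $cur = $rem * 10 + (int) $d;
            $q = intdiv($cur, 16);       // always a single decimal digit
            if ($quotient !== '' || $q > 0) {
                $quotient .= $q;         // skip leading zeros of the quotient
            }
            $rem = $cur % 16;
        }
        $hex = dechex($rem) . $hex;      // remainders are hex digits, least significant first
        $dec = $quotient === '' ? '0' : $quotient;
    }
    return str_pad($hex, $width, '0', STR_PAD_LEFT); // restore an MD5's leading zeros
}
```

A full MD5 becomes a decimal string of at most 39 digits (2^128 - 1 has 39), and `dec_to_hex_string()` pads back to 32 hex characters by default.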
There are good reasons, stated by others, for doing it a different way.
But if what you want to do is convert an md5 hash into a string of decimal digits (which is what I think you really mean by "represent by an integer", since an md5 is already an integer in string form), and transform it back into the same md5 string:
function md5_hex_to_dec($hex_str)
{
    $dec = [];
    // Each 4-hex-digit group (max ffff = 65535) becomes a 5-digit decimal group.
    foreach (str_split($hex_str, 4) as $grp) {
        $dec[] = str_pad((string) hexdec($grp), 5, '0', STR_PAD_LEFT);
    }
    return implode('', $dec);
}
function md5_dec_to_hex($dec_str)
{
    $hex = [];
    // Each 5-digit decimal group maps back to exactly 4 hex digits.
    foreach (str_split($dec_str, 5) as $grp) {
        $hex[] = str_pad(dechex((int) $grp), 4, '0', STR_PAD_LEFT);
    }
    return implode('', $hex);
}
Demo:
$md5 = md5('example#example.com');
echo $md5 . '<br />'; // 23463b99b62a72f26ed677cc556c44e8
$dec = md5_hex_to_dec($md5);
echo $dec . '<br />'; // 0903015257466342942628374306682186817640
$hex = md5_dec_to_hex($dec);
echo $hex; // 23463b99b62a72f26ed677cc556c44e8
Of course, you'd have to be careful using either string, like making sure to use them only as string type to avoid losing leading zeros, ensuring the strings are the correct lengths, etc.
A simple solution could use hexdec() to convert parts of the hash.
Systems that can accommodate 64-bit Ints can split the 128-bit/16-byte md5() hash into four 4-byte sections and then convert each into representations of unsigned 32-bit Ints. Each hex pair represents 1 byte, so use 8 character chunks:
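$hash = md5($value);
$int_hashes = [];
foreach (str_split($hash, 8) as $chunk) {
    $int_hashes[] = hexdec($chunk);
}
On the other end, use dechex() to convert the values back, padding each chunk to 8 hex digits so that leading zeros are not lost:
$original_hash = '';
foreach ($int_hashes as $ihash) {
    $original_hash .= str_pad(dechex($ihash), 8, '0', STR_PAD_LEFT);
}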
Caveat: Due to underlying deficiencies with how PHP handles integers and how it implements hexdec() and intval(), this strategy will not work with 32-bit systems.
Edit Takeaways:
Ints in PHP are always signed; there are no unsigned Ints.
Although intval() may be useful for certain cases, hexdec() is faster and simpler to use for base 16.
hexdec() converts values above 7fffffffffffffff into Floats, making its use moot for splitting the hash into two 64-bit/8-byte chunks.
Similarly, intval($chunk, 16) returns the same Int value (PHP_INT_MAX) for 7fffffffffffffff and above.
Why ord()? md5 produces a plain 16-byte value, which is presented to you in hex for readability. You can't convert a 16-byte value to a 4- or 8-byte integer without loss, so you must change some part of your algorithm to use it as an id.
You could use hexdec to parse the hexadecimal string and store the number in the database.
Couldn't you just add another field that was an auto-increment int field?
What about:
$float = hexdec(md5('string'));
or
$int = (integer) (substr(hexdec(md5('string')),0,9)*100000000);
Definitely a bigger chance of collision, but still good enough to use instead of the hash in the DB?
Add these two columns to your table.
`email_md5_l` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(left(md5(`email`),16),16,10)) STORED,
`email_md5_r` bigint(20) UNSIGNED GENERATED ALWAYS AS (conv(right(md5(`email`),16),16,10)) STORED,
It might or might not help to create a PK on these two columns though, as it probably concatenates two string representations and hashes the result. It would kind of defeat your purpose and a full scan might be quicker but that depends on number of columns and records. Don't try to read these bigints in php as it doesn't have unsigned integers, just stay in SQL and do something like:
select email
into result
from `address`
where email_md5_l = conv(left(md5(the_email), 16), 16, 10)
and email_md5_r = conv(right(md5(the_email), 16), 16, 10)
limit 1;
MD5 does collide btw.
Use the email address as the file name of a blank, temporary file in a shared folder, like /var/myprocess/example#example.org
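Then, call ftok() on the file name (together with a one-character project identifier). ftok() returns an integer key derived from the file's inode, so each address gets its own integer ID.
It isn't guaranteed to be unique, though, but it will probably suffice for your API.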
Related
In PHP, is there a way to get a unique hash of a string, where the hash is made up of numbers only?
example:
return md5(234); // returns 098f6bcd4621d373cade4e832627b4f6
but I need
return numhash(234); // returns 00978902923102372190
(20 numbers only)
The problem here is that I want the hash to be short.
edit:
OK let me explain the back story here.
I have a site that has an ID for every registered person. I also need an ID for the person to use and exchange (hence it can't be too long); so far the IDs have been numbered 00001, 00002, 00003, etc.
1. This makes some people look more important.
2. This reveals application info that I don't want to reveal.
To fix point 1 and 2 I need to "hide" the number while keeping it unique.
Edit + SOLUTION:
Numeric hash function based on the code by https://stackoverflow.com/a/23679870/175071
/**
 * Return a number-only hash
 * https://stackoverflow.com/a/23679870/175071
 * @param string   $str
 * @param int|null $len
 * @return string
 */
function numHash($str, $len = null)
{
    $binhash = md5($str, true);          // 16 raw bytes
    $numhash = unpack('N2', $binhash);   // first 8 bytes as two unsigned 32-bit ints
    $hash = $numhash[1] . $numhash[2];   // concatenated decimal digits
    if ($len && is_int($len)) {
        $hash = substr($hash, 0, $len);
    }
    return $hash;
}
// Usage
numHash(234, 20); // always returns 6814430791721596451
An MD5 or SHA1 hash in PHP is returned as a hexadecimal number, so all you need to do is convert bases. PHP has a function that can do this for you:
$bignum = hexdec( md5("test") );
or
$bignum = hexdec( sha1("test") );
PHP Manual for hexdec
Since you want a limited size number, you could then use modular division to put it in a range you want.
$smallnum = $bignum % [put your upper bound here]
EDIT
As noted by Artefacto in the comments, using this approach will result in a number beyond the maximum size of an Integer in PHP, and the result after modular division will always be 0. However, taking a substring of only the first 15 characters of the hash (60 bits, which fits in a 64-bit integer) doesn't have this problem. Revised version for calculating the initial large number:
$bignum = hexdec( substr(sha1("test"), 0, 15) );
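A minimal sketch combining the revised line with the modular division from above. It assumes a 64-bit PHP build, where 15 hex characters (at most 2^60 - 1) stay an int; small_hash() is a hypothetical helper name.

```php
<?php
// Map a string into 0 .. $upper - 1 using the first 15 hex characters of its SHA-1.
function small_hash(string $value, int $upper): int
{
    $bignum = hexdec(substr(sha1($value), 0, 15)); // int in 0 .. 2^60 - 1 on 64-bit PHP
    return $bignum % $upper;                        // 0 .. $upper - 1
}

echo small_hash('test', 1000000), "\n";
```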
You can try crc32(). See the documentation at: http://php.net/manual/en/function.crc32.php
$checksum = crc32("The quick brown fox jumped over the lazy dog.");
printf("%u\n", $checksum); // prints 2191738434
With that said, crc should only be used to validate the integrity of data.
There are some good answers but for me the approaches seem silly.
They first force PHP to create a hex number, then convert it back (hexdec) into a big integer, and then cut it down to a number of digits... that's a lot of work!
Instead, why not read the hash as binary:
$binhash = md5('[input value]', true);
then using
$numhash = unpack('N2', $binhash); //- or 'V2' for little endian
to cast this as two INTs ($numhash is an array of two elements). Now you can reduce the number of bits in the number simply using an AND operation. e.g:
$result = $numhash[1] & 0x000FFFFF; //- to get numbers between 0 and 1048575
But be warned of collisions! Reducing the number means increasing the probability of two different [input value] with the same output.
I think a much better way would be "ID crypting" with a bijective function, so no collisions can happen! For the simplest kind, just use an affine cipher.
Example with max input value range from 0 to 25:
function numcrypt($a)
{
return ($a * 15) % 26;
}
function unnumcrypt($a)
{
return ($a * 7) % 26;
}
Output:
numcrypt(1) : 15
numcrypt(2) : 4
numcrypt(3) : 19
unnumcrypt(15) : 1
unnumcrypt(4) : 2
unnumcrypt(19) : 3
e.g.
$id = unnumcrypt((int) $_GET['userid']);
... do something with the ID ...
echo ' go ';
Of course this is not secure, since it relies on no one knowing the method used for your encryption; but if there are no security requirements, this way is faster and collision-free.
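The same idea scales beyond 0..25: encode with x -> (a*x + b) mod m and decode with the modular inverse of a. In the example above, 7 is the inverse of 15 modulo 26. This is a hedged sketch, not the answer's code: m = 2^31 - 1 (a prime, so every multiplier from 1 to m-1 is invertible), a = 48271 and b = 12345 are arbitrary example constants, the function names are hypothetical, and a 64-bit PHP 7.1+ build is assumed so that the products stay within int range.

```php
<?php
const AFFINE_M = 2147483647; // assumption: 2^31 - 1, prime
const AFFINE_A = 48271;      // assumption: any multiplier coprime to AFFINE_M
const AFFINE_B = 12345;      // assumption: any offset

// Modular inverse via the extended Euclidean algorithm.
function mod_inverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('multiplier is not invertible modulo m');
    }
    return (($oldS % $m) + $m) % $m;
}

// Valid for 0 <= $id < AFFINE_M; distinct ids always give distinct codes.
function affine_encode(int $id): int
{
    return (AFFINE_A * $id + AFFINE_B) % AFFINE_M;
}

function affine_decode(int $code): int
{
    $inv = mod_inverse(AFFINE_A, AFFINE_M);
    $shifted = (($code - AFFINE_B) % AFFINE_M + AFFINE_M) % AFFINE_M; // undo the offset
    return $shifted * $inv % AFFINE_M;                                // undo the multiplier
}
```

As with the 0..25 version, this only hides the sequence; anyone who recovers a and b can reverse it.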
The problem with cutting off the hash is collisions; to reduce them, try:
return sprintf('%u', crc32("Hello World"));
The crc32() documentation says:
Generates the cyclic redundancy checksum polynomial of 32-bit lengths of the str. This is usually used to validate the integrity of data being transmitted.
That gives us a 32-bit integer, which can be negative on 32-bit installations and is positive on 64-bit ones; sprintf('%u', ...) prints it as an unsigned decimal either way. This integer could be stored as an ID in a database. Because it fits into a 32-bit variable it avoids the overflow problems of a full MD5, although, like any 32-bit checksum, it can still collide.
First of all, md5 is basically compromised, so you shouldn't be using it for anything but non-critical hashing.
PHP5 has the hash() function, see http://www.php.net/manual/en/function.hash.php.
Setting the last parameter to true will give you a string of binary data. Alternatively, you could split the resulting hexadecimal hash into pieces of 2 characters and convert them to integers individually, but I'd expect that to be much slower.
Try Hashids.
It hashes a number into a format you can define; the format includes the length and which characters are used.
Example:
$hashids->encode(1);
will return something like "28630", depending on your format.
Just use my manual hash method below:
Divide the number (e.g. a 6-digit one) by the product of the primes 3, 5 and 7 (that is, by 105).
Take the first 6 digits after the decimal point as the ID to be used. Check for uniqueness before actually creating the ID; if a collision exists, increase the last digit by 1 until there is no collision.
E.g. 123456 gives you 771428
123457 gives you 780952
123458 gives you 790476.
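A minimal sketch of the digit-extraction step, assuming a non-negative input; fraction_id() is a hypothetical name, and the collision check against existing IDs is left out. It uses integer arithmetic instead of floats so rounding cannot change the digits.

```php
<?php
// First six digits after the decimal point of $n / 105 (105 = 3 * 5 * 7).
function fraction_id(int $n): string
{
    $divisor = 3 * 5 * 7;
    $remainder = $n % $divisor;                              // only this affects the decimals
    $sixDigits = intdiv($remainder * 1000000, $divisor);     // floor of the first six decimals
    return sprintf('%06d', $sixDigits);
}

echo fraction_id(123456), "\n"; // 771428
```

Because the decimals depend only on $n % 105, this yields at most 105 distinct IDs, so the collision check described above will be triggered often.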
I'm trying to build an app that would identify a user by scanning a QR code. For this, I want to use the primary key as the identifier. Since the integer is only a few characters long, it wouldn't look good as a QR code.
So my question is: is it possible to convert the int to a string longer than 10-12 characters (fixed length if possible), a mix of letters and numbers, which can be reversed to the original integer?
What you can do is to make SHA256 of your user's ID and convert it to QR code.
Then, when a user reads the QR code and sends you the SHA value, you try to match it against the SHA of the user IDs in the database.
Here is how to get the SHA hash of a user ID:
$hash = hash('sha256', $userId); // The result is a string long enough for a QR code
Then, when you need to find a user based on the SHA, do the following:
select * from users where SHA2(id, 256) = 'SHA_PROVIDED_BY_USER';
To speed up the lookup, you can also store the SHA in the DB; the query will then be much faster.
Another option is to prepend the number with some letters. It will give you random string, nice QRs and you can extract numeric ID with simple regexp.
Using function from PHP random string generator (don't forget to remove numbers from $characters) the code could be:
//encoding
$size = 12;
$str = generateRandomString($size-strlen($userId)).$userId;
//decoding
preg_match('/(\d+)$/', $str, $matching);
$userId = $matching[1];
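If you'd rather not depend on the linked generateRandomString(), here is a self-contained sketch of the same encode/decode pair. The names are hypothetical; it uses random_int() (PHP 7+) and an alphabet with no digits, so the trailing digit run is always exactly the original ID. It assumes the ID has fewer digits than $size.

```php
<?php
// Prefix a numeric ID with random letters up to a fixed total length.
function encode_id(int $userId, int $size = 12): string
{
    $letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $digits = (string) $userId;
    $prefix = '';
    for ($i = strlen($digits); $i < $size; $i++) {
        $prefix .= $letters[random_int(0, strlen($letters) - 1)];
    }
    return $prefix . $digits;
}

// Recover the ID from the trailing digits; null if there are none.
function decode_id(string $str): ?int
{
    return preg_match('/(\d+)$/', $str, $m) ? (int) $m[1] : null;
}
```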
You can convert your integer to any base from 2 to 36 with the base_convert() function.
Here is the documentation:
http://php.net/manual/en/function.base-convert.php
The notion that a number, in PHP, has a "maximum size" is a little off (not wrong, just off =P)
From the manual:
If PHP encounters a number beyond the bounds of the integer type, it will be interpreted as a float instead.
So, you could use really large numbers for your QR Codes if you want. Shouldn't be an issue. However, what would be better is to think of "what exactly do you need"?
If you need a numeric value, but want it in hex, you can use base_convert() to go back and forth between the numbers:
$val = 1234;
$hex = base_convert($val, 10, 16);
However, if strings are more for you, you could use base64_encode() to encode it:
$val = 'awesome string value';
$encoded = base64_encode($val);
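Both conversions above are reversible; a short round-trip sketch using the same values:

```php
<?php
$val = 1234;
$hex = base_convert((string) $val, 10, 16);   // "4d2"
$back = (int) base_convert($hex, 16, 10);     // 1234

$str = 'awesome string value';
$encoded = base64_encode($str);
$decoded = base64_decode($encoded);           // 'awesome string value'
```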
UPDATE
Based on comments, it sounds like you also want to pad the string if it's too short. You can use str_pad() to accomplish this:
$val = str_pad("1", 10, "0", STR_PAD_LEFT);
echo $val;
// displays: 0000000001
$orig = intval($val);
echo $orig;
// displays: 1
Coderpad Example of str_pad()
I need to generate a strong unique API key.
Can anyone suggest the best solution for this? I don't want to use rand() function to generate random characters. Is there an alternative solution?
As of PHP 7.0, you can use the random_bytes($length) method to generate a cryptographically-secure random string. This string is going to be in binary, so you'll want to encode it somehow. A straightforward way of doing this is with bin2hex($binaryString). This will give you a string $length * 2 bytes long, with $length * 8 bits of entropy to it.
You'll want $length to be high enough such that your key is effectively unguessable and that the chance of there being another key being generated with the same value is practically nil.
Putting this all together, you get this:
$key = bin2hex(random_bytes(32)); // 64 characters long
When you verify the API key, use only the first 32 characters to select the record from the database and then use hash_equals() to compare the API key as given by the user against what value you have stored. This helps protect against timing attacks. ParagonIE has an excellent write-up on this.
For an example of the checking logic:
$token = $request->bearerToken();
// Retrieve however works best for your situation,
// but it's critical that only the first 32 characters are used here.
$users = app('db')->table('users')->where('api_key', 'LIKE', substr($token, 0, 32) . '%')->get();
// $users should only have one record in it,
// but there is an extremely low chance that
// another record will share a prefix with it.
foreach ($users as $user) {
    // Performs a constant-time comparison of strings,
    // so you don't leak information about the token.
    if (hash_equals($user->api_key, $token)) {
        return $user;
    }
}
return null;
Bonus: Slightly More Advanced Use With Base64 Encoding
Using Base64 encoding is preferable to hexadecimal for space reasons, but is slightly more complicated because each character encodes 6 bits (instead of 4 for hexadecimal), which can leave the encoded value with padding at the end.
To keep this answer from dragging on, I'll just put some suggestions for handling Base64 without their supporting arguments. Pick a $length greater than 32 that is divisible by both 3 and 2. I like 42, so we'll use that for $length. Base64 encodings are of length 4 * ceil($length / 3), so our $key will be 56 characters long. You can use the first 28 characters for selection from your storage, leaving another 28 characters on the end that are protected from leaking by timing attacks with hash_equals.
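A minimal sketch of the Base64 variant with the numbers above. Because 42 is divisible by 3, the encoded key has no '=' padding; note that standard Base64 can still contain '+' and '/'. The variable names are illustrative.

```php
<?php
$length = 42;
$key = base64_encode(random_bytes($length)); // 4 * ceil(42 / 3) = 56 characters
$selector = substr($key, 0, 28);             // used to select the record from storage
$verifier = substr($key, 28);                // checked with hash_equals()
```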
Bonus 2: Secure Key Storage
Ideally, you should treat the key much like a password. This means that instead of using hash_equals() to compare the full string, you should hash the remainder of the key like a password, store it separately from the first half of your key (which is in plain text), use the first half for selection from your database, and verify the latter half with password_verify().
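A hedged sketch of that split, assuming the 64-character hex key from earlier. check_key() and the variable names are hypothetical, and the database lookup is left out.

```php
<?php
// At creation time: hand $key to the client, store only $selector and $verifierHash.
$key = bin2hex(random_bytes(32));                                   // 64 hex characters
$selector = substr($key, 0, 32);                                    // plain text, used in WHERE
$verifierHash = password_hash(substr($key, 32), PASSWORD_DEFAULT);  // stored instead of the key

// At request time: $storedSelector/$storedHash come from the row found via the selector.
function check_key(string $presented, string $storedSelector, string $storedHash): bool
{
    if (!hash_equals($storedSelector, substr($presented, 0, 32))) {
        return false;
    }
    return password_verify(substr($presented, 32), $storedHash);
}
```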
Using mcrypt (deprecated as of PHP 7.1 and removed in PHP 7.2):
<?php
$bytes = mcrypt_create_iv(4, MCRYPT_DEV_URANDOM);
$unpack = unpack("Nint", $bytes);
$id = $unpack['int'] & 0x7FFFFFFF;
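On PHP 7 and later, the same 31-bit random ID can be produced without mcrypt. A minimal sketch using the built-in CSPRNG functions:

```php
<?php
$unpack = unpack('Nint', random_bytes(4)); // 4 random bytes as an unsigned big-endian 32-bit int
$id = $unpack['int'] & 0x7FFFFFFF;         // clear the top bit: 0 .. 2147483647

// Equivalent and simpler:
$id2 = random_int(0, 0x7FFFFFFF);
```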
PHP has the uniqid() function (http://php.net/manual/en/function.uniqid.php) with an optional prefix, and you can even add additional entropy to further avoid collisions. But if you absolutely positively need something unique, you should not use anything with randomness in it.
This is the best solution I found.
http://www.php.net/manual/en/function.uniqid.php#94959
I need to generate a string using PHP; it needs to be unique and 4 to 8 characters long (the length is the value of a variable).
I thought I could use a crc32 hash, but I can't decide how many characters it has, though I'm sure it would be unique. On the other hand, a plain "password generator" could generate duplicate strings, and checking each value against the table will take a while.
How can I do that?
Thanks!
Maybe I can use this:
function unique_id(){
$better_token = md5(uniqid(rand(), true));
$unique_code = substr($better_token, 16);
$uniqueid = $unique_code;
return $uniqueid;
}
$id = unique_id();
Changing it to:
function unique_id($l = 8){
$better_token = md5(uniqid(rand(), true));
$rem = strlen($better_token)-$l;
$unique_code = substr($better_token, 0, -$rem);
$uniqueid = $unique_code;
return $uniqueid;
}
echo unique_id(4);
Do you think I'll get a unique string each time for a good while?
In short, I think you'll get a pretty good random value. There's always the chance of a collision, but you've done everything you can to get a random value. uniqid() returns a value based on the current time in microseconds. Passing rand() (mt_rand() would be better) as the prefix and true as the second argument to uniqid() should make the value even more unique. Hashing the value with md5() should also make it pretty unique, as even a small difference between two generated values is magnified by the hashing function. idealmachine is correct that a longer value is less likely to have a collision than a shorter one.
Your function could also be shorter since md5() will always return a 32 character long string. Try this:
function unique_id($l = 8) {
return substr(md5(uniqid(mt_rand(), true)), 0, $l);
}
The problem with randomness is that you can never be sure of anything. There is a small chance you could get one number this time and the same number the next. That said, you would want to make the string as long as possible to reduce that probability. As an example of how long such numbers can be, GUIDs (globally unique identifiers) are 16 bytes long.
In theory, four hex characters (16 bits) give only 16^4 = 65536 possibilities, while eight hex characters (32 bits) give 16^8 = 4294967296. You, however, need to consider how likely it is for any two hashes to collide (the "birthday problem"). Wikipedia has a good table on how likely such a collision is. In short, four hex characters are definitely not sufficient, and eight might not be.
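To put numbers on the birthday problem, here is a sketch of the standard approximation P(collision) ≈ 1 - exp(-n(n-1) / 2N) for n random IDs drawn from N possible values; collision_probability() is a hypothetical helper name.

```php
<?php
function collision_probability(int $n, float $space): float
{
    return 1.0 - exp(-($n * ($n - 1)) / (2.0 * $space));
}

printf("%.4f\n", collision_probability(300, 16 ** 4));   // 4 hex characters
printf("%.4f\n", collision_probability(10000, 16 ** 8)); // 8 hex characters
```

With four hex characters, only 300 IDs already give roughly a 50% chance of a collision; with eight, 10,000 IDs give a little over 1%.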
You may want to consider using Base64 encoding rather than hex digits; that way, you can fit 48 bits in rather than just 32 bits.
Eight bytes is 8 * 8 = 64 bits.
You can make reliable passwords from only the ASCII letters a-zA-Z and the digits 0-9. The best way to do that is to use only cryptographically secure functions, like random_int() or random_bytes() from PHP 7. Other functions such as base64_encode() should only be used as helpers that turn the random data into ASCII characters.
mt_rand() is not cryptographically secure and is very old.
To pick characters from an arbitrary alphabet string, you must use random_int(). To turn a binary string into printable characters, use base64_encode() or bin2hex(); with bin2hex(), however, each character can take only 16 possible values.
See my implementation of these functions.
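The linked implementation isn't reproduced here; as a hedged sketch of the random_int() approach described above (random_password() is a hypothetical name):

```php
<?php
// Build a password from a-zA-Z0-9 using only random_int(), PHP's CSPRNG (PHP 7+).
function random_password(int $length = 16): string
{
    $alphabet = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789';
    $max = strlen($alphabet) - 1;
    $password = '';
    for ($i = 0; $i < $length; $i++) {
        $password .= $alphabet[random_int(0, $max)]; // uniform, unbiased index
    }
    return $password;
}
```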
I have a script to convert to base 62 (A-Za-z0-9) but how do I get a number out of MD5?
I have read in many places that because the number from an MD5 is bigger than php can handle as an integer it will be inaccurate... As I want a short URL anyway and was not planning on using the whole hash, maybe just 8 characters of it....
So my question is how to get part of the number of an MD5 hash?
Also is it a bad idea to use only part of the MD5 hash?
I'm going to suggest a different thing here. Since you are only interested in using a decimal chunk of the md5 hash, why don't you use another short numeric hash like CRC32 or Adler? Here is an example:
$hash = sprintf('%u', crc32('your string here'));
This will produce a hash of up to 10 decimal digits from your string.
EDIT: I think I misunderstood you, here are some functions that provide conversions to and from bases up to 62.
EDIT (Again): To work with arbitrary length numbers you must use either the bc_math or the GMP extension, here is a function that uses the bc_math extension and can also convert from base 2 up to base 62. You should use it like this:
echo bc_base_convert(md5('your url here'), 16, 62); // public base 62 hash
and the inverse:
echo bc_base_convert('base 62 encoded value here', 62, 16); // private md5 hash
Hope it helps. =)
If it's possible, I'd advise not using a hash for your URLs. Eventually you'll run into collisions... especially if you're truncating the hash. If you go ahead and implement an id-based system where each item has a unique ID, there will be far fewer headaches. The first item will be 1, the second'll be 2, etc---if you're using MySQL, just throw in an autoincrement column.
To make a short id:
//the basic example
$sid = base_convert($id, 10, 36);
//if you're going to be needing 64 bit numbers converted
//on a 32 bit machine, use this instead
$sid = gmp_strval(gmp_init($id, 10), 36);
To make a short id back into the base-10 id:
//the basic example
$id = base_convert($sid, 36, 10);
//if you're going to be needing 64 bit numbers
//on a 32 bit machine, use this instead
$id = gmp_strval(gmp_init($sid, 36));
Hope this helps!
If you're truly wanting base 62 (which can't be done with gmp or base_convert), check this out:
http://snipplr.com/view/22246/base62-encode--decode/
You can do it like this (not all steps are in PHP; it's been a long time since I've used it):
Create an md5 hash of the script like this:
$hash = md5($script, true); // raw 16-byte binary output
Convert that number to base 62.
See the questions about base conversion of arbitrary sized numbers in PHP
Truncate the string to a length you like.
There's no risk in using only a few of the bits of a md5. All that changes is danger of collisions.
There actually is a Java implementation which you could probably extract. It's an open-source CMS solution called Pulse.
Look here for the code of toBase62() and fromBase62().
http://pulse.torweg.org/javadoc/src-html/org/torweg/pulse/util/StringUtils.java.html
The only dependency in StringUtils is the LifeCycle class, which provides a way to get a salted hash for a string; you might even omit it altogether or just copy that method over to your copy of StringUtils. Voilà.
You can do something like this,
$hash = md5("The data to be hashed", true);
$ints = unpack("L*num", $hash);
$hash_str = base62($ints['num1']) . base62($ints['num2']) . base62($ints['num3']) . base62($ints['num4']);
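The base62() helper above isn't defined in the answer. Here is a hypothetical sketch for non-negative integers, along with its inverse. The alphabet order 0-9a-zA-Z is an assumption; any fixed 62-character alphabet works.

```php
<?php
const BASE62_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function base62(int $num): string
{
    if ($num === 0) {
        return '0';
    }
    $out = '';
    while ($num > 0) {
        $out = BASE62_ALPHABET[$num % 62] . $out; // least significant digit first
        $num = intdiv($num, 62);
    }
    return $out;
}

function base62_decode(string $str): int
{
    $num = 0;
    foreach (str_split($str) as $ch) {
        $num = $num * 62 + strpos(BASE62_ALPHABET, $ch);
    }
    return $num;
}
```

Concatenating variable-length chunks is not reversible on its own. Because 62^6 > 2^32, padding each chunk with str_pad(base62($n), 6, '0', STR_PAD_LEFT) gives a fixed 24-character string that can be split back into its four values.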
As of PHP 5.3.2, GMP supports bases up to 62 (was previously only 36), so brianreavis's suggestion was very close. I think the simplest answer to your question is thus:
function base62hash($source, $chars = 22) {
return substr(gmp_strval(gmp_init(md5($source), 16), 62), 0, $chars);
}
Converting from base-16 to base-62 obviously has space benefits. A normal 128-bit MD5 hash is 32 chars in hex, but in base-62 it's only 22. If you're storing the hashes in a database, you can convert them to raw binary and save even more space (16 bytes for an MD5).
Since the resulting hash is just a string representation, you can just use substr if you only want a bit of it (as the function does).
You may try base62x to get a safe and compatible encoded representation.
Here is more information about base62x (or simply search for base62x in NatureDNS).
shell> ./base62x -n 16 -enc 16AF
1Ql
shell> ./base62x -n 16 -dec 1Ql
16AF
shell> ./base62x
Usage: ./base62x [-v] [-n <2|8|10|16|32>] <-enc|dec> string
Version: 0.60
Here is an open-source Java library that converts MD5 strings to Base62 strings
https://github.com/inder123/base62
Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6") ==> cbIKGiMVkLFTeenAa5kgO4
Md5ToBase62.fromBase62("4KfZYA1udiGCjCEFC0l") ==> 0000bdd3bb56865852a632deadbc62fc
The conversion is two-way, so you will get the original md5 back if you convert it back to md5:
Md5ToBase62.fromBase62(Md5ToBase62.toBase62("9e107d9d372bb6826bd81d3542a419d6")) ==> 9e107d9d372bb6826bd81d3542a419d6
Md5ToBase62.toBase62(Md5ToBase62.fromBase62("cbIKGiMVkLFTeenAa5kgO4")) ==> cbIKGiMVkLFTeenAa5kgO4
You could use a slightly modified Base 64 with - and _ instead of + and /:
function base64_url_encode($str) {
return strtr(base64_encode($str), array('+'=>'-', '/'=>'_'));
}
function base64_url_decode($str) {
return base64_decode(strtr($str, array('-'=>'+', '_'=>'/')));
}
Additionally you could remove the trailing padding = characters.
And to get the raw MD5 value (binary string), set the second parameter (named $raw_output in the manual) to true:
$raw_md5 = md5($str, true);
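Putting the two together gives a 22-character URL-safe form of an MD5: 16 raw bytes encode to 24 Base64 characters, the last two of which are "==" padding. A hedged sketch with hypothetical function names, restoring the padding before decoding:

```php
<?php
function md5_base64url(string $str): string
{
    $raw = md5($str, true);                                     // 16 raw bytes
    return rtrim(strtr(base64_encode($raw), '+/', '-_'), '=');  // 24 chars minus "=="
}

function base64url_to_md5_hex(string $b64): string
{
    $padded = str_pad($b64, (int) (ceil(strlen($b64) / 4) * 4), '='); // restore padding
    return bin2hex(base64_decode(strtr($padded, '-_', '+/')));
}
```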