Is it possible the MD5 of two different strings be identical? - php

I'm trying to create a dynamic avatar for my website's users. Something like stackoverflow. I have a PHP script which generates an image based on a string:
path/to/avatar.php?hash=string
I want to use the MD5 of each user's email as the name of their avatar (and as the string the PHP script generates the image from):
$email = $_GET['email'];
$hash = md5($email);
copy("path/to/avatar.php?hash=$hash","path/img/$hash.jpg");
Now I want to be sure: can I use the MD5 of their emails as their avatar's name? In other words, can two different strings produce the same MD5 output, or is the output for two different strings always unique?
I don't know whether my question is clear. All I want to know is whether the MD5 of two different emails can ever be the same.

As the goal here is to use a hash for its uniqueness rather than its cryptographic strength, MD5 is acceptable, although I still wouldn't recommend it.
If you do settle on using MD5, use a globally unique id that you control rather than a user-supplied email address, along with a salt.
i.e.
$salt = 'random string';
$hash = md5($salt . $userId);
However:
There is still a small chance of a collision (starting at 1 in 2^128 and approaching 1 in 2^64 relatively quickly due to the Birthday Paradox). Remember this is a chance: hash_n and hash_(n+1) could collide.
There is not a reasonable way to determine the userId from the hash (I don't consider indexing 128-bit hashes so you can query them to be reasonable).
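To put rough numbers on the birthday bound mentioned above, here is a minimal sketch using the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n random values in a b-bit space. The function name is made up for illustration, and the sketch assumes MD5 output behaves like a uniformly random 128-bit value, which is reasonable for ordinary inputs but not for deliberately crafted collisions.

```php
<?php
// Approximate probability that at least two of $n uniformly random
// $bits-bit hashes collide (birthday bound). Illustrative helper only.
function collisionProbability(float $n, int $bits): float
{
    $space = pow(2.0, $bits);            // number of possible hash values
    $x = $n * ($n - 1) / (2 * $space);   // expected number of colliding pairs
    return -expm1(-$x);                  // 1 - e^(-x), stays accurate for tiny x
}

printf("1e9 hashes:  %.3e\n", collisionProbability(1e9, 128));
printf("2^64 hashes: %.3f\n", collisionProbability(pow(2.0, 64), 128));
```

Even at a billion hashes the result is vanishingly small; it only becomes noticeable once the number of hashes gets close to 2^64.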
You use StackOverflow as an example.
User profiles on this site look like: http://stackoverflow.com/users/2805376/shafizadeh
So what is wrong with having avatar urls like http://your_site/users/2805376/avatar.png ? The back end storage could simply be /path/to/images/002/805/376.png
This guarantees a unique name, and provides you with a very simple and easy to work with way of storing, locating, and reversing the id assigned to images back to the user.
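The id-to-path mapping described above could be sketched as follows. The function names and the nine-digit padding are assumptions for illustration; the base directory is the placeholder path from the answer.

```php
<?php
// Map a numeric user id to a sharded file path, e.g. 2805376 -> 002/805/376.png.
// Assumes ids stay below one billion (9 digits); larger ids need wider padding.
function avatarPath(int $userId, string $baseDir = '/path/to/images'): string
{
    $padded = str_pad((string) $userId, 9, '0', STR_PAD_LEFT);
    return $baseDir . '/' . implode('/', str_split($padded, 3)) . '.png';
}

// Recover the user id from a path produced by avatarPath(), or null if it does not match.
function userIdFromAvatarPath(string $path): ?int
{
    if (preg_match('~/(\d{3})/(\d{3})/(\d{3})\.png$~', $path, $m) !== 1) {
        return null;
    }
    return (int) ($m[1] . $m[2] . $m[3]);
}

echo avatarPath(2805376), "\n"; // /path/to/images/002/805/376.png
```

Splitting the id into three-digit directories keeps any single directory at no more than 1000 entries.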

This is actually what Gravatar is doing (this was the standard way to get an avatar on Stack Overflow). Have a look at Gravatar's implementation.
The chance of a collision is negligible in practice. It is difficult enough to intentionally forge two (binary) strings which result in the same MD5, and emails are restricted in size and characters.
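For reference, a Gravatar-style hash normalises the address before hashing, so that "User@Example.com " and "user@example.com" map to the same avatar. A minimal sketch, with a made-up function name:

```php
<?php
// Gravatar-style email hash: trim whitespace, lowercase, then MD5.
function gravatarStyleHash(string $email): string
{
    return md5(strtolower(trim($email)));
}

$hash = gravatarStyleHash(' User@Example.com ');
echo 'https://www.gravatar.com/avatar/' . $hash, "\n";
```

Without the normalisation step, the same mailbox written with different capitalisation would produce different file names.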
One problem with this approach is what Fred-ii- mentioned: because brute-forcing MD5 is so fast (100 Giga MD5 per second), somebody could try to find the original email address, whose MD5 is now visible. For short emails this would work in reasonable time.
Using a UUID could be a good alternative to deriving the name from an email address. You can create such an id without database access and be sure that you won't get a duplicate.
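PHP has no built-in UUID function, so a random (version 4) UUID is usually built from random_bytes(). The following is a minimal sketch assuming PHP 7 or later; the function name is hypothetical.

```php
<?php
// Generate a random (version 4) UUID from 16 cryptographically secure bytes.
function uuidV4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // set RFC 4122 variant
    // 32 hex chars split into 8 groups of 4, joined as 8-4-4-4-12
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

echo uuidV4(), "\n";
```

The resulting string can be used directly as the avatar file name, and it reveals nothing about the email address.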

Related

Collision free, short and unique Integer-only Hashes based on known values

I am trying to get started with web development by creating a Hotel Reservation System for practice and I have the following problem. I need to create two short, unique, collision free and Integer-only hashes as I have the following requirement:
The number has to be long enough to be believable by a human as it will be sent to them via email as their Booking Code but also short enough to be usable by a human as a 128-Bit Booking Code is weird and it would also be a primary key (unique identifier) for the Bookings Table and Customers Table in the database.
So I need one BookingId and one CustomersID. I know, I could use increment in the MySQL database but I wish to go the rougher route here for learning purposes.
Almost all the Hash functions create longer Hashes with digits and alphabets. But I wish the Hashes to be short and Integer only.
The entire Hash logic would execute in a PHP script right before inserting all the details into the DB using the awesome PDO functionality.
Now, this is the situation:
Each Customer has one unique email address which I know as they insert it in the Booking Form, so I could base the CustomerID on the unique email address. One Customer can only have one CustomerID but he can have multiple unique BookingIDs.
So the CustomerID can be based on the Email Address and the BookingID could be based on the Email and the timestamp.
The difficulty here is to find a Hash function which would result in a 6 Digit collision free Integer only Hash.
How can I generate these two short, unique, collision-free and Integer-only Hashes using PHP?
Rather than storing your "random ids" in your database, I would just work with a classic auto increment primary key. To the outside world you can encode these ids to make them look like the long integers you desire. And before querying or storing anything in your database you just decode them back to the actual primary key.
As for your customers table, email sounds like a valid option for the primary key, but what if the user decides to change his email address? Or what if you want to support multiple email addresses in the future? I would go for a numeric id for the customers as well, as it is the easy and future proof solution imo.
No need to re-invent the wheel either, I usually use this small library for this:
https://github.com/ivanakimov/hashids.php
Don't let the name fool you by the way, this is strictly speaking an Encrypter, and not a Hasher, since it works in two directions.
It allows you to set a custom alphabet, so your requirement to use only numeric values shouldn't be a problem. Your code would look something like this:
$hasher = new Hashids(MY_APP_SALT, 6, '0123456789');
$hashId = $hasher->encode($idFromDb);
$id = $hasher->decode($idFromRequest);
Update:
To elaborate on your questions in the comments, you could also do
$customerId = $hasher->encode($email);
And indeed, as long as the parameters you provide when constructing the $hasher stay the same, your results will remain the same.
The parameters for the constructor are, in order:
salt, I would put this in a constant or some sort of config value or environment variable, so you can maintain the actual value in a single location
minimum number of characters in the resulting hash. In my experience the actual hash remains quite close to that.
the allowed characters in the resulting hash.
By the way, I just read about an alternative library in the docs for if you only want to work with numeric values. No experience with that one, but it may be an even better fit.
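To see the encode/decode idea without pulling in a library, here is a self-contained sketch. It is not the Hashids algorithm: it simply multiplies the auto-increment id by a constant modulo 10^6 and undoes that with the modular inverse. The constants and function names are invented for the example, and the scheme only hides the sequence from casual observers; it is not secure against someone who collects a few codes.

```php
<?php
// Illustrative stand-in for an id encoder: NOT the Hashids algorithm.
// Multiplying by a constant coprime to 10^6 permutes 0..999999, and the
// modular inverse undoes it. MULTIPLIER is an arbitrary made-up "secret".
const MODULUS = 1000000;
const MULTIPLIER = 387659; // odd and not a multiple of 5, so coprime to 10^6

// Extended Euclid: returns x such that ($a * x) % $m === 1.
function modInverse(int $a, int $m): int
{
    [$oldR, $r] = [$a, $m];
    [$oldS, $s] = [1, 0];
    while ($r !== 0) {
        $q = intdiv($oldR, $r);
        [$oldR, $r] = [$r, $oldR - $q * $r];
        [$oldS, $s] = [$s, $oldS - $q * $s];
    }
    if ($oldR !== 1) {
        throw new InvalidArgumentException('multiplier must be coprime to modulus');
    }
    return (($oldS % $m) + $m) % $m;
}

// Turn a primary key into a 6-digit, digits-only code.
function encodeId(int $id): string
{
    if ($id < 0 || $id >= MODULUS) {
        throw new OutOfRangeException('id must be in 0..999999');
    }
    return sprintf('%06d', ($id * MULTIPLIER) % MODULUS);
}

// Turn a 6-digit code back into the primary key.
function decodeId(string $code): int
{
    return ((int) $code * modInverse(MULTIPLIER, MODULUS)) % MODULUS;
}

echo encodeId(1), ' ', encodeId(2), "\n";
```

Because the mapping is a permutation, two different ids can never produce the same code, which is the collision-free property the question asks for.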

Any flaws in using MD5 of username+gibberish for JSON requests?

So let's say I have "site1.mysite.com", "site2.mysite.com", etc. I want people (developers) to access some of the data via PHP generated JSON, but I also don't want to have to set up user accounts, sign ins, blah blah blah.
I don't want it to be open for "everybody".
What I started doing was this:
Users need to add "&user=somethingigivethem" and "key=somethingelseigivethem". These are values I provide to the user.
The key is currently the MD5 hash of the "user" and something like "53CR37P$%%" so basically:
$key_validator = md5($_GET['user'].'53CR37P$%%');
if($_GET['key'] === $key_validator){
//show JSON
} else {
//show error
}
Are there any major flaws in doing it this way?
So basically, if Joe Developer wants access, you give him a username and a key (which is an MD5 hash of his name + your salt). Joe can then make requests to your data.
If Joe wants to (ie. takes the time) he can probably figure out your hashing scheme just by trying different combinations of his username & salt values. And once he does, he'll know your salt and can access any other user's data.
I guess the question is: how valuable is this data? If you don't really care if other people get access and you really just want to keep out people who aren't too motivated to get your data, then this will work.
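If the key is to stay derived from the user name, one way to make the guessing attack described above much harder is a keyed HMAC with a long random secret, compared in constant time. hash_hmac() and hash_equals() are PHP built-ins; the function names and the secret below are placeholders, and this is a sketch rather than a full authentication scheme.

```php
<?php
// Derive a per-user API key from a server-side secret.
function apiKeyFor(string $user, string $secret): string
{
    return hash_hmac('sha256', $user, $secret);
}

// Check a presented key without leaking timing information.
function isValidKey(string $user, string $key, string $secret): bool
{
    return hash_equals(apiKeyFor($user, $secret), $key);
}

$secret = 'replace-with-a-long-random-secret'; // placeholder, keep out of source control
$key = apiKeyFor('joe', $secret);
var_dump(isValidKey('joe', $key, $secret));
```

Everything still depends on the secret staying secret: anyone who learns it can mint a key for any user.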
You could always combine md5 and sha1 values with a randomized salt (which you would have to store per user so the key can be recomputed later) and also include your original salt value.
Example:
$key_validator = md5(sha1($_GET['user'].rand(0,1000)).'53CR37P$%%');
A little bit harder to crack, but you get the picture.
If I understood correctly, you generate both the user and the key for the user.
So the user does not have to register or create their own combination.
A key derived from the user name can be predicted quite easily, especially with MD5.
I would recommend one of 2 ways:
If you really do not want to use your own database, generate a password with a better encryption system so people can't peek at the seed and the encryption formula
(after all, making an MD5 with the seed inside is a sort of "having the key in the password itself", which is no good)
Better encryption systems supported by PHP: basically all of them :) (you may need to install the mcrypt extension, which supports dozens of ciphers, including common ones like DES, 3DES, CAST, Twofish, etc.)
If you have no problem using a database (or, why not, a local file holding the username/password pairs), just generate a strong random password, keep the pairs in your database, and then check against your stored values to give access. You still don't ask the user to "register".
Oh, and don't forget, MD5 is a one-way hash, while real encryption with 3DES etc. is reversible, so you can also compare things against the real value.
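The second option above (random keys stored as user/key pairs) could look roughly like this. The array stands in for a database table or local file, and the function names are hypothetical.

```php
<?php
// Issue a random key for a user and remember the pair.
// $store is a plain array standing in for a database table or file.
function issueKey(array &$store, string $user): string
{
    $key = bin2hex(random_bytes(16)); // 128 random bits as 32 hex characters
    $store[$user] = $key;
    return $key;
}

// Accept a request only if the presented key matches the stored one.
function checkKey(array $store, string $user, string $key): bool
{
    return isset($store[$user]) && hash_equals($store[$user], $key);
}

$store = [];
$joeKey = issueKey($store, 'joe');
var_dump(checkKey($store, 'joe', $joeKey));
```

Since the key is random rather than derived from the user name, learning one user's key says nothing about anyone else's.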

Unique token in CakePHP

I need to create a truly unique token when inserting records in CakePHP. The table can contain millions of rows, so I can't just rely on some randomly generated strings. I do not want to use microtime() either, because there is a probability, though very small, that two records are submitted at exactly the same moment.
Of course the best solution would be to use String::uuid(), but according to the CakePHP documentation:
The uuid method is used to generate unique identifiers as per RFC 4122. The uuid is a 128bit string in the format of 485fc381-e790-47a3-9794-1337c0a8fe68.
So, as far as I understand, it does not use Cake's security salt for its generation. So I decided to hash it with the security component's hash function (or the Auth password function), because I need it to be unique and really, really secure at the same time. But then I found this question, which says it is not a good idea, although it is about PHP's uniqid and md5:
Why is MD5'ing a UUID not a good idea?
And, also I think the string hashed by security component is much harder to guess - because, for example String::uuid() in for loop has an output like this
for ($i = 0; $i < 30; $i++) {
echo String::uuid()."<br>";
}
die;
// outputs
51f3dcda-c4fc-4141-aaaf-1378654d2d93
51f3dcda-d9b0-4c20-8d03-1378654d2d93
51f3dcda-e7c0-4ddf-b808-1378654d2d93
51f3dcda-f508-4482-852d-1378654d2d93
51f3dcda-01ec-4f24-83b1-1378654d2d93
51f3dcda-1060-49d2-adc0-1378654d2d93
51f3dcda-1da8-4cfe-abe4-1378654d2d93
51f3dcda-2af0-42f7-81a0-1378654d2d93
51f3dcda-3838-4879-b2c9-1378654d2d93
51f3dcda-451c-465a-a644-1378654d2d93
51f3dcda-5264-44b0-a883-1378654d2d93
So part of the string is always the same, whereas with a hash function the results are quite different:
echo Security::hash('stackoverflow1');
echo "<br>";
echo Security::hash('stackoverflow2');
die;
// outputs
e9a3fcb74b9a03c7a7ab8731053ab9fe5d2fe6bd
b1f95bdbef28db16f8d4f912391c22310ba3c2c2
So, the question is, can I after all hash the uuid() in Cake? Or what is the best secure way to get truly unique and hashed (better according to my security salt) secure token.
UPDATE
Saying secure token, I mean how difficult it is for guessing. UUID is really unique, but from the example above, some records have some similarity. But hashed results do not.
Thanks !!
I don't think you need to worry about the UUIDs overlapping.
To put these numbers into perspective, the annual risk of someone being hit by a meteorite is estimated to be one chance in 17 billion, which means the probability is about 0.00000000006 (6 × 10^-11), equivalent to the odds of creating a few tens of trillions of UUIDs in a year and having one duplicate. In other words, only after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%. Or, to put it another way, the probability of one duplicate would be about 50% if every person on earth owns 600 million UUIDs.
http://en.wikipedia.org/wiki/Universally_unique_identifier#Random_UUID_probability_of_duplicates
Continue to use String::uuid() and rest easy :)
A UUID is unique
I need to create a truly unique token when inserting records in CakePHP
That is exactly what a UUID is. It is normally used in distributed systems to prevent collisions (multiple sources inserting data, possibly out of sync, into a datasource).
A UUID is not a security measure
I need it to be unique and very, really very secure at the same time
Not sure in what way hashing a uuid is supposed to enhance security - it won't. Relying on security by obscurity is more or less guaranteed to fail.
If your need is random tokens of some form - use a hash function (Hashing a uuid is simply hashing a random seed), if you need guaranteed-unique identifiers use UUIDs. They aren't the same thing and a UUID is a very poor mechanism of generating random, non-sequential "un-guessable" (or whatever the purpose is) strings.
Generating a random string suitable for cryptographic purposes was answered well here:
Secure random number generation in PHP
The code sample fills the string $pr_bits with random binary data, so the characters are unprintable. To use this in a URL, you could convert the binary data to printable characters a couple ways. None of them enhance the security but make them ready for URLs.
convert bytes to hex: bin2hex($pr_bits)
convert bytes to base64: base64_encode($pr_bits)
hash the bytes (because the output is conveniently in hex, not for added security): hash('md5', $pr_bits)
I include the last one because you will see people use hash functions for other reasons, like to guarantee the output is 16bytes/128bits for md5. In PHP people use it to convert a value into HEX.
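On PHP 7 and later, the random bytes themselves can come from random_bytes(), so the conversions above can be sketched as below. The function names are made up, and the base64 variant swaps in the URL-safe alphabet, which the answer does not mention.

```php
<?php
// Random token as hex: 2 characters per random byte.
function randomHexToken(int $numBytes = 16): string
{
    return bin2hex(random_bytes($numBytes));
}

// Random token as URL-safe base64 ('+' and '/' replaced, '=' padding removed).
function randomUrlToken(int $numBytes = 16): string
{
    return rtrim(strtr(base64_encode(random_bytes($numBytes)), '+/', '-_'), '=');
}

echo randomHexToken(), "\n";
echo randomUrlToken(), "\n";
```

With 16 random bytes, both forms carry 128 bits of randomness; the encoding only changes how they are written.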
I have come up with the following solution:
use a string built by concatenating the hash of a random string and the current time in microseconds
$timeStr = str_replace("0.", "", microtime());
$timeStr = str_replace(" ", "", $timeStr);
echo Security::hash('random string').'_'.$timeStr;
// 5ffd3b852ccdd448809abb172e19bbb9c01a43a4_796473001379403705
So the first part (the hash) of the string contributes to the unguessability of the token, and the second part guarantees its uniqueness.
Hope, this will help someone.

PHP - Looking for a two way obfuscation method for storing phone numbers

I'm looking to store (in mySQL) an obfuscated version of a phone number, where the number is used for authentication (I need to be able to get back the original number).
I've thought about an arbitrary scheme like storing the number * 15 or some constant only my app knows.
What are some better ways of doing this?
EDIT: Some things I'd like to clarify:
The phone numbers that are saved can be used to log into an iPhone app, so I want users to be able to see which number they have connected to the service in case they want to log into the app with a different number later. This means I cannot hash the value.
Essentially I am looking for a way to protect the data so that if someone lifts my database, they don't get a bunch of phone numbers in raw form. So I'd like to obfuscate them so I can use them for authentication, but be able to get one back in its original form without storing it raw.
EDIT: To clarify, I am not authenticating on JUST the phone number. If implemented, it would be phone number + a password! Enter a single string of digits that may exist and you're in? lol - my apologies if I have misled some folks.
Store where? In a database? Use an encryption function rather than rolling your own system.
In MySQL it'd be as simple as:
INSERT INTO users (phone) VALUES (AES_ENCRYPT('867-5309', 'yourkey'));
Of course, now you've changed the problem from hiding the phone numbers to "where the #$##$## can I hide this key?". Obvious solution: hide the key under a rock outside your server's front door. Which changes the problem into "where the ####$###% can I hide this rock?". Obvious solution: cover your front yard with a steel cage with a padlock on the door. New problem: how to hide the padlock key... and so on.
How about actual encryption? In this scenario, a good symmetric encryption algorithm is trivial, since the length of the payload is limited to, what, 10 digits, so you can get by with a key that's also 10 decimal digits long; using such a key, all you need to do is something like XOR or increment / mod 10 on each digit. Of course, the weak link in this scheme then is the way you store the key.
I am curious, however, why you need to get them back out - if it's for authentication:
you shouldn't be using phone numbers, as these are easy to look up, even automatically
you should be storing secure one-way hashes with individual salts, so you couldn't even get them back out yourself if you wanted to (except by brute-forcing)
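For the secret that is actually used to authenticate, PHP's built-in password_hash() already produces a salted one-way hash with an individual random salt per value. A minimal sketch with a made-up example password:

```php
<?php
// password_hash() generates a random salt and embeds it in the returned string.
$stored = password_hash('example-password', PASSWORD_DEFAULT);

// At login, compare the submitted value against the stored hash.
var_dump(password_verify('example-password', $stored));
var_dump(password_verify('wrong-guess', $stored));
```

Only $stored goes into the database; the original value cannot be read back out of it.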
Using the Cipher Class you can do this:
$phone = '...';
$key = 'secret.for.each.number';
$phone = Cipher::encrypt($phone, $key);
Before you store it in the database. Then later you can pull it out and do this:
$phone = Cipher::decrypt($phone, $key);
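The Cipher class is not shown in the answer. As a rough idea of what such a helper might do, here is a sketch using PHP's OpenSSL extension with AES-256-GCM. The function names and the storage layout (IV, tag and ciphertext concatenated and base64-encoded) are assumptions made for this example, not the Cipher class's actual behaviour.

```php
<?php
// Encrypt a phone number with AES-256-GCM. $key must be 32 random bytes.
function encryptPhone(string $phone, string $key): string
{
    $iv = random_bytes(12); // fresh nonce for every value
    $tag = '';
    $ciphertext = openssl_encrypt($phone, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($ciphertext === false) {
        throw new RuntimeException('encryption failed');
    }
    return base64_encode($iv . $tag . $ciphertext); // store this string in the database
}

// Reverse encryptPhone(); throws if the data was tampered with or the key is wrong.
function decryptPhone(string $stored, string $key): string
{
    $raw = base64_decode($stored, true);
    if ($raw === false || strlen($raw) < 28) {
        throw new RuntimeException('malformed ciphertext');
    }
    $iv = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ciphertext = substr($raw, 28);
    $plain = openssl_decrypt($ciphertext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($plain === false) {
        throw new RuntimeException('decryption failed');
    }
    return $plain;
}

$key = random_bytes(32); // in practice, load this from configuration outside the database
echo decryptPhone(encryptPhone('867-5309', $key), $key), "\n";
```

As the other answers point out, this only helps if the key is stored somewhere an attacker with a copy of the database cannot reach.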
A better way would be not doing that. There is a reason one-way encryption is used to store passwords.
If you need to get back the original value, you should not be using it for authentication, since it will invariably be easy for an attacker to find it.
If you feel you need to hide the value by obfuscating it, you probably need to change something fundamental about how you're storing the data.
This isn't a very good approach to security. Several things jump out at me:
Phone numbers are very easy to guess: just program something to start guessing random combinations. Encrypted or not, your program is validating using these numbers, so it will eventually work on some. You need an extra layer of security like a password known only to the user in question. I would recommend anti-brute-force attack measures as well.
Any two-way encryption can be cracked, it is as simple as that. If you need to be able to decrypt data in the database easily, the only benefit from encrypting it is if someone hacks into your database and grabs the information. As others have pointed out, if that happens, you have bigger issues. The other scenario is for staffers who could have valid access to the DB. If you are hiding the data from them, it is important to encode the information in some way. But multiplying the phone number by an "unknown" constant is not ideal. Use a better method.
Surely I know my friends' numbers, so I could hack into anyone's account, correct? You need to add a password component if you haven't already. The password should be hashed one-way using a strong and unique salt. Once added, you only need to encrypt phone numbers in the DB if you don't want your staffers to see them. Otherwise you are wasting time encrypting them.
There is no point in this question.
Just leave these phone numbers as they are. You will gain no security improvement from such obfuscation.

Easy Encryption and Decryption with PHP

My PHP Application uses URLs like these:
http://domain.com/userid/120
http://domain.com/userid/121
The keys at the end of the URL are basically the primary key of the MySQL database table.
I don't want this increasing number to be public, and I also don't want anyone to be able to crawl the user profiles just by iterating over the id.
So I want to encrypt this Id for display in a way I can easily decrypt it again. The string shouldn't get much longer.
What's the best encryption method for this?
Simple Obscuring: Base64 encode them using base64_encode.
Now, your http://domain.com/userid/121 becomes: http://domain.com/userid/MTIx
Want more, do it again, add some letters around it.
Tough Obscuring: Use any encryption method using MCrypt library.
A better approach (from a usability and SEO perspective) would be to use a unique phrase rather than an obscured ID. In this instance the user's user name would seem an ideal solution, and would also be un-guessable.
That said, if you don't want to use this approach you could just use a hash (perhaps md5) of the user's user name which you'd store in the database along with their other details. As such, you can just do a direct lookup on that field. (i.e.: Having encrypt and decrypt part of the URL is probably overkill.)
You have a variety of choices here:
Generate and store an identifier in the database. It's good because you can then have readable keys that are guaranteed to be unique. It's bad because it causes a database schema change, and you have to actually query that table every time you want to generate a link.
Run an actual key-based encryption, for instance based on PHP's MCrypt. You have access to powerful cryptographic algorithms, but most secure algorithms tend to output strings that are much longer than what you expect. XOR does what you want, but it does not prevent accessing sequential values (and the key is pretty simple to determine, given the a priori knowledge about the numbers).
Run a hash-based verification: instead of using 121 as your identifier, use 121-a34df6 where a34df6 are the first six characters of the md5 (or, better, an HMAC) of 121 and a secret key. Instead of decoding, you extract the 121 and recompute the six characters, to see if they match what the user sent. This does not hide the 121 (it's still right there before the hyphen) but without knowing the secret key, the visitor will not be able to generate the six characters to actually view the document numbered 121.
Use XOR with shuffling: shuffle the bits in the 30-bit identifier, then apply the XOR. This makes the XOR harder to identify because the shuffle pattern is also hidden.
Use XOR with on-demand keys: use fb37cde4-37b3 as your key, where the first part is the XOR of 121 and md5('37b3'.SECRET) (or another way of generating an XOR key based on 37b3 and a secret).
Don't use base64, it's easy to reverse engineer: if MTIx is 121, then MTIy is 122 ...
Ultimately, you will have to accept that your solution will not be secure: not only is it possible for users to leak valid urls (through their browser history, HTTP referer, or posting them on Twitter), but your requirement that the identifier fits in a small number of characters means a brute-force attack is possible (and becomes easier as you start having more documents).
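The hash-based verification option from the list above can be sketched as follows, using an HMAC rather than a bare md5. The function names and the six-character truncation follow the description, and the secret is a placeholder.

```php
<?php
// Build "id-tag", where tag is the first six hex characters of HMAC(id, secret).
function signId(int $id, string $secret): string
{
    $tag = substr(hash_hmac('sha256', (string) $id, $secret), 0, 6);
    return $id . '-' . $tag;
}

// Return the id if the tag matches, or null for a malformed or forged value.
function verifySignedId(string $signed, string $secret): ?int
{
    if (preg_match('/^(\d+)-([0-9a-f]{6})$/', $signed, $m) !== 1) {
        return null;
    }
    $id = (int) $m[1];
    return hash_equals(signId($id, $secret), $signed) ? $id : null;
}

$secret = 'replace-with-a-long-random-secret'; // placeholder
echo signId(121, $secret), "\n";
```

A six-character tag gives about 16.7 million possibilities per id, which matches the brute-force caveat above: it slows crawling down but does not stop a determined attacker.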
Simplest but powerful encryption method: XOR with a secret Key. http://en.wikipedia.org/wiki/XOR_cipher
No practical performance degradation.
Base64 is not encryption! It is just another way of writing the same value.
Hope this helps.
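A minimal sketch of the XOR idea, assuming non-negative ids and a key that both fit in 31 bits. The key value and function names are invented for the example, and the result is written in hex only to keep it short.

```php
<?php
// Obscure an id by XOR-ing it with a secret integer key; output as hex.
function xorObscure(int $id, int $key): string
{
    return dechex($id ^ $key);
}

// XOR is its own inverse, so applying the same key restores the id.
function xorReveal(string $token, int $key): int
{
    return ((int) hexdec($token)) ^ $key;
}

$key = 0x5A3C9F17; // made-up secret key
echo xorObscure(120, $key), ' ', xorObscure(121, $key), "\n";
```

Running it for neighbouring ids shows the weakness: the tokens differ only in their last digits, so sequential ids remain easy to spot.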
Obscuring the URL will never secure it. It makes it harder to read, but not much harder to manipulate. You could use a hexadecimal number representation or something like that to obscure it. Those who can read hex can change your URL in a few seconds, anyway:
$hexId = dechex($id); // to hex
$id = hexdec($hexId); // from hex
I'd probably say it's better indeed to just create a random string for each user and store that in your database than to get one using hash. If you use a common hash, it's still very easy to iterate over all pages ;-)
I would write this in comments, but don't have the rep for it (yet?).
When a user clicks on a link you should not use the primary key. You can keep the primary key in a session and read it from that session. Please do not use the query string.
Generate a unique string for each user and use it in your URLs:
http://domain.com/user/ofisdoifsdlfkjsdlfkj instead of http://domain.com/userid/121
You can use the base64_encode and base64_decode functions to encode and decode your URLs.
