Creating HMAC for PHP encryption - php

I've just been looking at adding a HMAC to PHP mcrypt encryption.
Is this simply hashing the encrypted data with hash_hmac using the encryption key and appending it to the encrypted data? Then on decryption you split off the HMAC, hash_hmac the rest of the data with the key again and check it matches the HMAC.
I'm confused because in this SO question When authenticating ciphertexts, what should be HMACed? it says:
you have to include in the HMAC input everything that impacts the decryption process, i.e. not only the encryption result per se, but also the IV which was used for that encryption, and, if the overall protocol supports algorithm agility, you should also input the specification of the encryption algorithm (otherwise, an attacker could alter the header of your message to replace the tag which says "AES-256" with the tag which says "AES-128" and you would unknowingly decrypt with the wrong algorithm).
Is this so? If this is true, why isn't using hash_hmac on just the encrypted data enough?

Short Answer: Yes
Long Answer:
HMAC is Hash-based message authentication code. You should HMAC anything which you want to authenticate, or in other words, anything which you want to protect against being modified.
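Although the actual construction (RFC 2104) is more complicated, it may help to think of HMAC as a keyed hash.
e.g. hmac(message, key) ≈ hash(message + key)
This is only a mental model: the real construction hashes twice with two key-derived pads, and a plain hash(message + key) should not be used as a substitute. In PHP, use hash_hmac().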
You can only recreate the same hmac with an identical message and key.
You can't recreate the same hmac if the key is identical but the message differs.
You can't recreate the same hmac if the message is identical but the key differs.
An attacker (who doesn't have the HMAC key) cannot modify any part of the HMAC message without invalidating the existing HMAC. What should go into the HMAC message and the HMAC key depends on your data format and how you use that data. But assuming you are using the HMAC to authenticate the decryption, you should always include in the HMAC message anything that the decryption depends on. The symmetric key is often reused as the HMAC key, although using a separate key for the HMAC is the safer practice.
In your quote, the poster says the IV and the algorithm should also be hashed. Consider a file/database format consisting of
ALGORITHM + IV + CIPHERTEXT + HMAC
If you only HMAC the ciphertext, an attacker would be able to modify the algorithm or IV (corrupting the file) without affecting the validity of the HMAC. This is bad because you can end up with a corrupted encrypted file that still carries a valid HMAC. Decryption will proceed as normal because your software thinks everything is OK. The result is a garbled decryption, and your software is broken because it returned the wrong output without raising any error.
This can be classed as a 'security risk' if your application goes on to act on that erroneous data because it assumes it is correct. With some modes it goes beyond corruption: in CBC mode, for example, flipping a bit of the IV flips the same bit of the first plaintext block, so the change can be targeted. It is not a security risk in the sense of making the underlying encryption weaker or easier to crack. HMAC and symmetric encryption are two different technologies doing different things. The point of using a HMAC is that you can trust the decryption layer to return either the correct data or an error.
In the above example the ALGORITHM is a dynamic piece of data which I used to explain "algorithm agility" in the OP's quote. It defines which encryption algorithm was used. The point is that it is dynamic, so it needs to be read from somewhere rather than hardcoded. This makes it a dependency of the decryption, so it should be included in the HMAC message. However, if you always use one static algorithm, then it should be assumed by (hardcoded into) your decryption code and there is no need to store this data at all. There is no need to include static data in the HMAC message because an attacker cannot change it, so it has no effect on the decryption.
An example of a file format which uses a static algorithm is the open source AES-256 Crypt File Format. The algorithm never changes, so it is always assumed. It actually uses two HMACs for speed reasons: one to authenticate the IV and keys, and a second to authenticate the encrypted data.
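The ALGORITHM + IV + CIPHERTEXT + HMAC layout discussed above could be sketched roughly as follows. This is a minimal illustration, not a vetted implementation: the function names sealMessage() and openMessage(), the one-byte algorithm tag, and the use of two separate keys are assumptions made for the example, and it uses the OpenSSL extension rather than mcrypt.

```php
<?php
// Hypothetical layout: ALGO_TAG (1 byte) | IV | CIPHERTEXT | HMAC-SHA256 (32 bytes)
const ALGORITHMS = [1 => 'aes-128-cbc', 2 => 'aes-256-cbc'];

function sealMessage(string $plaintext, int $algoTag, string $encKey, string $macKey): string
{
    $cipher = ALGORITHMS[$algoTag];
    $iv = random_bytes(openssl_cipher_iv_length($cipher));
    $ciphertext = openssl_encrypt($plaintext, $cipher, $encKey, OPENSSL_RAW_DATA, $iv);
    $header = chr($algoTag) . $iv;
    // The HMAC covers everything decryption depends on: tag, IV and ciphertext.
    $mac = hash_hmac('sha256', $header . $ciphertext, $macKey, true);
    return $header . $ciphertext . $mac;
}

function openMessage(string $message, string $encKey, string $macKey): ?string
{
    if (strlen($message) < 1 + 32) {
        return null; // too short to hold a tag and a MAC
    }
    $mac  = substr($message, -32);
    $body = substr($message, 0, -32);
    // Verify before touching the header or decrypting; hash_equals() is timing-safe.
    if (!hash_equals(hash_hmac('sha256', $body, $macKey, true), $mac)) {
        return null;
    }
    $algoTag = ord($body[0]);
    if (!isset(ALGORITHMS[$algoTag])) {
        return null;
    }
    $cipher = ALGORITHMS[$algoTag];
    $ivLen  = openssl_cipher_iv_length($cipher);
    $iv         = substr($body, 1, $ivLen);
    $ciphertext = substr($body, 1 + $ivLen);
    $plaintext  = openssl_decrypt($ciphertext, $cipher, $encKey, OPENSSL_RAW_DATA, $iv);
    return $plaintext === false ? null : $plaintext;
}

// Usage: two independent random keys (an assumption of this sketch).
$demoEncKey = random_bytes(32);
$demoMacKey = random_bytes(32);
$sealed = sealMessage('My age is 29', 2, $demoEncKey, $demoMacKey);
var_dump(openMessage($sealed, $demoEncKey, $demoMacKey));
```

Because the MAC is checked first, a message whose algorithm tag or IV was altered is rejected before any decryption is attempted.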

Related

Hash Decimal in MySQL [duplicate]

I see a lot of confusion between hashes and encryption algorithms and I would like to hear some more expert advice about:
When to use hashes vs encryptions
What makes a hash or encryption algorithm different (from a theoretical/mathematical level)
i.e. what makes hashes irreversible (without aid of a rainbow tree)
Here are some similar SO Questions that didn't go into as much detail as I was looking for:
What is the difference between Obfuscation, Hashing, and Encryption?
Difference between encryption and hashing
Well, you could look it up in Wikipedia... But since you want an explanation, I'll do my best here:
Hash Functions
They provide a mapping between an arbitrary length input, and a (usually) fixed length (or smaller length) output. It can be anything from a simple crc32, to a full blown cryptographic hash function such as MD5 or SHA1/2/256/512. The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions) since every function produces a smaller output than it's capable of inputting (If you feed every possible 1mb file into MD5, you'll get a ton of collisions).
The reason they are hard (or impossible in practicality) to reverse is because of how they work internally. Most cryptographic hash functions iterate over the input set many times to produce the output. So if we look at each fixed length chunk of input (which is algorithm dependent), the hash function will call that the current state. It will then iterate over the state and change it to a new one and use that as feedback into itself (MD5 does this 64 times for each 512bit chunk of data). It then somehow combines the resultant states from all these iterations back together to form the resultant hash.
Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash into its iterated states (1 possibility for inputs smaller than the size of a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you'd have over 10^64 possibilities to try. And that's just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it's all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it's actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.
Encryption Functions
They provide a 1:1 mapping between an arbitrary length input and output. And they are always reversible. The important thing to note is that it's reversible using some method. And it's always 1:1 for a given key. Now, there are multiple input:key pairs that might generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output which is always of a consistent format.
Use Cases
Use a hash function when you want to compare a value but can't store the plain representation (for any number of reasons). Passwords should fit this use-case very well since you don't want to store them plain-text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 mb per music file. So instead, take the hash of the file, and store that (md5 would store 16 bytes instead of 3mb). That way, you just hash each file and compare to the stored database of hashes (This doesn't work as well in practice because of re-encoding, changing file headers, etc, but it's an example use-case).
Use a hash function when you're checking validity of input data. That's what they are designed for. If you have 2 pieces of input, and want to check to see if they are the same, run both through a hash function. The probability of a collision is astronomically low for small input sizes (assuming a good hash function). That's why it's recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. SHA1 has 6 times the output space (approximately). SHA512 has about 16 times the output space. You don't really care what the password was, you care if it's the same as the one that was stored. That's why you should use hashes for passwords.
Use encryption whenever you need to get the input data back out. Notice the word need. If you're storing credit card numbers, you need to get them back out at some point, but don't want to store them plain text. So instead, store the encrypted version and keep the key as safe as possible.
Hash functions are also great for signing data. For example, if you're using HMAC, you sign a piece of data by taking a hash of the data concatenated with a known but not transmitted value (a secret value). So, you send the plain-text and the HMAC hash. Then, the receiver simply hashes the submitted data with the known value and checks to see if it matches the transmitted HMAC. If it's the same, you know it wasn't tampered with by a party without the secret value. This is commonly used in secure cookie systems by HTTP frameworks, as well as in message transmission of data over HTTP where you want some assurance of integrity in the data.
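As a rough sketch of that signing flow in PHP: the helper names signValue() and verifySignedValue() and the "value.mac" layout are made up for this example, while hash_hmac() and hash_equals() are standard PHP functions.

```php
<?php
// Append an HMAC-SHA256 tag to a value so the receiver can detect tampering.
function signValue(string $value, string $secret): string
{
    return $value . '.' . hash_hmac('sha256', $value, $secret);
}

// Return the original value if the tag matches, or null if it does not.
function verifySignedValue(string $signed, string $secret): ?string
{
    $pos = strrpos($signed, '.'); // the hex tag never contains a dot
    if ($pos === false) {
        return null;
    }
    $value = substr($signed, 0, $pos);
    $mac   = substr($signed, $pos + 1);
    // hash_equals() compares in constant time to avoid leaking the tag.
    return hash_equals(hash_hmac('sha256', $value, $secret), $mac) ? $value : null;
}

$signedCookie = signValue('user_id=42', 'a secret known only to the server');
var_dump(verifySignedValue($signedCookie, 'a secret known only to the server'));
```

Note that the value itself still travels in plain text; the HMAC only shows that it was not changed by someone without the secret.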
A note on hashes for passwords:
A key feature of cryptographic hash functions is that they should be very fast to create, and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem with passwords. If you store sha512(password), you're not doing a thing to guard against rainbow tables or brute force attacks. Remember, the hash function was designed for speed. So it's trivial for an attacker to just run a dictionary through the hash function and test each result.
Adding a salt helps matters since it adds a bit of unknown data to the hash. So instead of finding anything that matches md5(foo), they need to find something that when added to the known salt produces md5(foo.salt) (which is very much harder to do). But it still doesn't solve the speed problem since if they know the salt it's just a matter of running the dictionary through.
So, there are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (thousands usually). This does two things. First, it slows down the runtime of the hashing algorithm significantly. Second, if implemented right (passing the input and salt back in on each iteration) actually increases the entropy (available space) for the output, reducing the chances of collisions. A trivial implementation is:
var hash = password + salt;
for (var i = 0; i < 5000; i++) {
    hash = sha512(hash + password + salt);
}
There are other, more standard implementations such as PBKDF2, BCrypt. But this technique is used by quite a few security related systems (such as PGP, WPA, Apache and OpenSSL).
The bottom line, hash(password) is not good enough. hash(password + salt) is better, but still not good enough... Use a stretched hash mechanism to produce your password hashes...
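For a standard stretched hash in PHP, PBKDF2 is available through the built-in hash_pbkdf2() function. The sketch below is only illustrative: the wrapper name stretchPassword(), the iteration count and the output length are assumptions, and for storing login passwords PHP's password_hash() is generally the simpler choice.

```php
<?php
// Derive a stretched hash with PBKDF2-HMAC-SHA256.
// $iterations = 100000 is an assumed value; tune it to your hardware.
function stretchPassword(string $password, string $salt, int $iterations = 100000): string
{
    // Length 64 with the default hex output gives 64 hex characters.
    return hash_pbkdf2('sha256', $password, $salt, $iterations, 64);
}

$salt = random_bytes(16);              // a fresh random salt per password
$stored = stretchPassword('1234', $salt);
echo $stored, PHP_EOL;

// Verification: recompute with the stored salt and compare in constant time.
var_dump(hash_equals($stored, stretchPassword('1234', $salt)));
```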
Another note on trivial stretching
Do not under any circumstances feed the output of one hash directly back into the hash function:
hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}
The reason for this has to do with collisions. Remember that all hash functions have collisions because the possible output space (the number of possible outputs) is smaller than the input space. To see why, let's look at what happens. To preface this, let's make the assumption that there's a 0.001% chance of collision from sha1() (it's much lower in reality, but for demonstration purposes).
hash1 = sha1(password + salt);
Now, hash1 has a probability of collision of 0.001%. But when we do the next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. So now, we have hash1's rate at 0.001%, and the 2nd sha1() call adds to that. So now, hash2 has a probability of collision of 0.002%. That's twice as many chances! Each iteration will add another 0.001% chance of collision to the result. So, with 1000 iterations, the chance of collision jumped from a trivial 0.001% to 1%. Now, the degradation is linear, and the real probabilities are far smaller, but the effect is the same (an estimation of the chance of a single collision with md5 is about 1/(2^128) or 1/(3x10^38). While that seems small, thanks to the birthday attack it's not really as small as it seems).
Instead, by re-appending the salt and password each time, you're re-introducing data back into the hash function. So any collisions of any particular round are no longer collisions of the next round. So:
hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}
Has the same chance of collision as the native sha512 function. Which is what you want. Use that instead.
A hash function could be considered the same as baking a loaf of bread. You start out with inputs (flour, water, yeast, etc...) and after applying the hash function (mixing + baking), you end up with an output: a loaf of bread.
Going the other way is extraordinarily difficult - you can't really separate the bread back into flour, water, yeast - some of that was lost during the baking process, and you can never tell exactly how much water or flour or yeast was used for a particular loaf, because that information was destroyed by the hashing function (aka the oven).
Many different variants of inputs will theoretically produce identical loaves (e.g. 2 cups of water and 1 tbsp of yeast produce exactly the same loaf as 2.1 cups of water and 0.9 tbsp of yeast), but given one of those loaves, you can't tell exactly what combo of inputs produced it.
Encryption, on the other hand, could be viewed as a safe deposit box. Whatever you put in there comes back out, as long as you possess the key with which it was locked up in the first place. It's a symmetric operation. Given a key and some input, you get a certain output. Given that output, and the same key, you'll get back the original input. It's a 1:1 mapping.
Here is a basic overview of hashing and encryption/decryption techniques.
Hashing:
Once you hash a plain text, you cannot get the same plain text back from the hashed text. Simply put, it's a one-way process.
Encryption and Decryption:
If you encrypt a plain text with a key, you can get the same plain text back by decrypting the encrypted text with the same (symmetric) or a different (asymmetric) key.
UPDATE:
To address the points mentioned in the edited question.
1. When to use hashes vs encryptions
Hashing is useful if you want to send someone a file. But you are afraid that someone else might intercept the file and change it. So a
way that the recipient can make sure that it is the right file is if
you post the hash value publicly. That way the recipient can compute
the hash value of the file received and check that it matches the hash
value.
Encryption is good if you say have a message to send to someone. You encrypt the message with a key and the recipient decrypts with the
same (or maybe even a different) key to get back the original message.
credits
2. What makes a hash or encryption algorithm different (from a theoretical/mathematical level) i.e. what makes hashes irreversible
(without aid of a rainbow tree)
Basically, hashing is an operation that loses information, whereas encryption does not. Let's look at the difference in a simple mathematical way for ease of understanding; of course, both involve much more complicated mathematical operations with repetitions.
Encryption/Decryption (Reversible):
Addition:
4 + 3 = 7
This can be reversed by taking the sum and subtracting one of the
addends
7 - 3 = 4
Multiplication:
4 * 5 = 20
This can be reversed by taking the product and dividing by one of the
factors
20 / 4 = 5
So, here we could assume one of the addends/factors is a decryption key and result(7,20) is an encrypted text.
Hashing (Not Reversible):
Modulo division:
22 % 7 = 1
This cannot be reversed because there is no operation that you can do to the remainder and the divisor to reconstitute the dividend (or to the remainder and the dividend to reconstitute the divisor).
Can you find an operation to fill in where the '?' is?
1 ? 7 = 22
1 ? 22 = 7
So hash functions have the same mathematical quality as modulo division and lose the information.
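The information loss is easy to see by listing every number in a small range that gives the same remainder. This is just a toy illustration; the range 0 to 30 is an arbitrary choice.

```php
<?php
// Collect every n in 0..30 for which n % 7 equals 1.
$divisor = 7;
$preimages = [];
foreach (range(0, 30) as $n) {
    if ($n % $divisor === 1) {
        $preimages[] = $n;
    }
}
// Knowing only "remainder 1, divisor 7" cannot tell you which of these was the input.
echo implode(', ', $preimages), PHP_EOL; // 1, 8, 15, 22, 29
```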
credits
Use hashes when you don't want to be able to get back the original input, use encryption when you do.
Hashes take some input and turn it into some bits (usually thought of as a number, like a 32 bit integer, 64 bit integer, etc). The same input will always produce the same hash, but you PRINCIPALLY lose information in the process so you can't reliably reproduce the original input (there are a few caveats to that however).
Encryption principally preserves all of the information you put into the encryption function, just makes it hard (ideally impossible) for anyone to reverse back to the original input without possessing a specific key.
Simple Example of Hashing
Here's a trivial example to help you understand why hashing can't (in the general case) get back the original input. Say I'm creating a 1-bit hash. My hash function takes a bit string as input and sets the hash to 1 if there are an even number of bits set in the input string, else 0 if there were an odd number.
Example:
Input Hash
0010 0
0011 1
0110 1
1000 0
Note that there are many input values that result in a hash of 0, and many that result in a hash of 1. If you know the hash is 0, you can't know for sure what the original input was.
By the way, this 1-bit hash isn't exactly contrived... have a look at parity bit.
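The 1-bit hash described above is small enough to write out in full. The function name oneBitHash() is made up for this sketch, and the input is assumed to be a string of '0' and '1' characters.

```php
<?php
// Returns 1 if an even number of bits are set in the input, otherwise 0.
function oneBitHash(string $bits): int
{
    return substr_count($bits, '1') % 2 === 0 ? 1 : 0;
}

// Reproduces the table above.
foreach (['0010', '0011', '0110', '1000'] as $input) {
    printf("%s %d\n", $input, oneBitHash($input));
}
```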
Simple Example of Encryption
You might encrypt text by using a simple letter substitution, say if the input is A, you write B. If the input is B, you write C. All the way to the end of the alphabet, where if the input is Z, you write A again.
Input Encrypted
CAT DBU
ZOO APP
Just like the simple hash example, this type of encryption has been used historically.
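The letter substitution above can be sketched in a few lines. The function name shiftLetters() is an assumption for this example, it only handles uppercase A to Z, and it is of course not secure; the point is that the same function with the opposite shift recovers the input.

```php
<?php
// Shift each uppercase letter by $shift positions, wrapping around from Z to A.
function shiftLetters(string $text, int $shift): string
{
    $shift = (($shift % 26) + 26) % 26; // normalise negative shifts to 0..25
    $out = '';
    foreach (str_split($text) as $ch) {
        if ($ch >= 'A' && $ch <= 'Z') {
            $out .= chr((ord($ch) - 65 + $shift) % 26 + 65);
        } else {
            $out .= $ch; // leave anything else untouched
        }
    }
    return $out;
}

echo shiftLetters('CAT', 1), PHP_EOL;  // DBU
echo shiftLetters('ZOO', 1), PHP_EOL;  // APP
echo shiftLetters('DBU', -1), PHP_EOL; // CAT (the "key" is the shift amount)
```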
My two-liner... this is generally the answer an interviewer wants.
Hashing is one-way. You cannot recover your data/string from a hash code.
Encryption is two-way: you can decrypt the encrypted string again if you have the key.
A Hash function turns a variable-sized amount of text into a fixed-sized text.
Source: https://en.wikipedia.org/wiki/Hash_function
Hash functions in PHP
A hash turns a string to a hashed string. See below.
HASH:
$str = 'My age is 29';
$hash = hash('sha1', $str);
echo $hash; // OUTPUT: 4d675d9fbefc74a38c89e005f9d776c75d92623e
Passwords are usually stored in their hashed representation instead of as readable text. When an end user wants to gain access to an application protected with a password, the password must be given during authentication. When the user submits the password, the authentication system receives it and hashes it. This password hash is compared to the hash known by the system. Access is granted if they are equal.
DEHASH:
SHA1 is a one-way hash. Which means that you can't dehash the hash.
However, you can brute-force the hash. Please see: https://hashkiller.co.uk/sha1-decrypter.aspx.
MD5, is another hash. A MD5 dehasher can be found on this website: https://www.md5online.org/.
To hamper brute-force attacks on hashes a salt can be given.
In php you can use password_hash() for creating a password hash.
The function password_hash() automatically creates a salt.
To verify a password on a password hash (with a salt) use password_verify().
// Invoke this little script 3 times, and it will give you a new hash every time
$password = '1234';
$hash = password_hash($password, PASSWORD_DEFAULT);
echo $hash;
// OUTPUT
$2y$10$ADxKiJW/Jn2DZNwpigWZ1ePwQ4il7V0ZB4iPeKj11n.iaDtLrC8bu
$2y$10$H8jRnHDOMsHFMEZdT4Mk4uI4DCW7/YRKjfdcmV3MiA/WdzEvou71u
$2y$10$qhyfIT25jpR63vCGvRbEoewACQZXQJ5glttlb01DmR4ota4L25jaW
One password can be represented by more than one hash.
When you verify the password against each of these hashes with password_verify(), it is accepted as valid every time.
$password = '1234';
$hash = '$2y$10$ADxKiJW/Jn2DZNwpigWZ1ePwQ4il7V0ZB4iPeKj11n.iaDtLrC8bu';
var_dump( password_verify($password, $hash) );
$hash = '$2y$10$H8jRnHDOMsHFMEZdT4Mk4uI4DCW7/YRKjfdcmV3MiA/WdzEvou71u';
var_dump( password_verify($password, $hash) );
$hash = '$2y$10$qhyfIT25jpR63vCGvRbEoewACQZXQJ5glttlb01DmR4ota4L25jaW';
var_dump( password_verify($password, $hash) );
// OUTPUT
boolean true
boolean true
boolean true
An Encryption function transforms a text into a nonsensical ciphertext by using an encryption key, and vice versa.
Source: https://en.wikipedia.org/wiki/Encryption
Encryption in PHP
Let's dive into some PHP code that handles encryption.
--- The Mcrypt extension ---
ENCRYPT:
$cipher = MCRYPT_RIJNDAEL_128;
$key = 'A_KEY';
$data = 'My age is 29';
$mode = MCRYPT_MODE_ECB;
$encryptedData = mcrypt_encrypt($cipher, $key , $data , $mode);
var_dump($encryptedData);
//OUTPUT:
string '„Ùòyªq³¿ì¼üÀpå' (length=16)
DECRYPT:
$decryptedData = mcrypt_decrypt($cipher, $key , $encryptedData, $mode);
$decryptedData = rtrim($decryptedData, "\0\4"); // Remove the nulls and EOTs at the END
var_dump($decryptedData);
//OUTPUT:
string 'My age is 29' (length=12)
--- The OpenSSL extension ---
The Mcrypt extension was deprecated in PHP 7.1 and removed in PHP 7.2.
The OpenSSL extension should be used in PHP 7 and later. See the code snippets below:
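// Demo values only: OpenSSL pads a key shorter than 16 bytes with NUL bytes,
// so a real AES-128 key should be 16 random bytes (e.g. from random_bytes(16)).
$key = 'A_KEY';
$data = 'My age is 29';
// A fixed IV keeps the output reproducible; use a fresh random IV per message in real code.
$iv = 'IV_init_vector01';
// ENCRYPT (option 0 returns Base64-encoded output)
$encryptedData = openssl_encrypt($data, 'AES-128-CBC', $key, 0, $iv);
var_dump($encryptedData);
// DECRYPT
$decryptedData = openssl_decrypt($encryptedData, 'AES-128-CBC', $key, 0, $iv);
var_dump($decryptedData);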
//OUTPUT
string '4RJ8+18YkEd7Xk+tAMLz5Q==' (length=24)
string 'My age is 29' (length=12)
Symmetric Encryption:
Symmetric encryption may also be referred to as shared key or shared secret encryption. In symmetric encryption, a single key is used both to encrypt and decrypt traffic.
Asymmetric Encryption:
Asymmetric encryption is also known as public-key cryptography. Asymmetric encryption differs from symmetric encryption primarily in that two keys are used: one for encryption and one for decryption. The most common asymmetric encryption algorithm is RSA.
Compared to symmetric encryption, asymmetric encryption imposes a high computational burden, and tends to be much slower. Thus, it isn't typically employed to protect payload data. Instead, its major strength is its ability to establish a secure channel over a nonsecure medium (for example, the Internet). This is accomplished by the exchange of public keys, which can only be used to encrypt data. The complementary private key, which is never shared, is used to decrypt.
Hashing:
Finally, hashing is a form of cryptographic security which differs from encryption. Whereas encryption is a two step process used to first encrypt and then decrypt a message, hashing condenses a message into an irreversible fixed-length value, or hash. Two of the most common hashing algorithms seen in networking are MD5 and SHA-1.
Read more here: http://packetlife.net/blog/2010/nov/23/symmetric-asymmetric-encryption-hashing/
Use hashes when you only need to go one way. For example, for passwords in a system, you use hashing because you will only ever verify that the value a user entered, after hashing, matches the value in your repository. With encryption, you can go two ways.
Hashing algorithms and encryption algorithms are just mathematical algorithms. So in that respect they are not different: it's all just mathematical formulas. Semantics-wise, though, there is a very big distinction between hashing (one-way) and encryption (two-way). Why are hashes irreversible? Because they are designed to be that way, because sometimes you want a one-way operation.
Encryption and hash algorithms work in similar ways. In each case, there is a need to create confusion and diffusion amongst the bits. Boiled down, confusion is creating a complex relationship between the key and the ciphertext, and diffusion is spreading the information of each bit around.
Many hash functions actually use encryption algorithms (or primitives of encryption algorithms). For example, the SHA-3 candidate Skein uses Threefish as the underlying method to process each block. The difference is that instead of keeping each block of ciphertext, they are destructively, deterministically merged together to a fixed length.
When it comes to security for transmitting data, i.e. two-way communication, you use encryption. All encryption requires a key.
When it comes to authentication (for example, checking a password), you use hashing. A plain hash function takes no key.
Hashing takes any amount of data (binary or text) and creates a constant-length hash representing a checksum for the data. For example, the hash might be 16 bytes. Different hashing algorithms produce different size hashes. You obviously cannot re-create the original data from the hash, but you can hash the data again to see if the same hash value is generated. One-way Unix-based passwords work this way. The password is stored as a hash value, and to log onto a system, the password you type is hashed, and the hash value is compared against the hash of the real password. If they match, then you must have typed the correct password.
Why is hashing irreversible:
Hashing isn't reversible because the input-to-hash mapping is not 1-to-1.
Having two inputs map to the same hash value is usually referred to as a "hash collision". For security purposes, one of the properties of a "good" hash function is that collisions are rare in practical use.
You already got some good answers, but I guess you could see it like this:
ENCRYPTION:
Encryption has to be decryptable if you have the right key.
Example:
Like when you send an e-mail.
You might not want everyone in the world to know what you are writing to the person receiving the e-mail, but the person who receives the e-mail would probably want to be able to read it.
HASHES:
Hashes work similarly to encryption, but it should not be possible to reverse them at all.
Example:
Like when you put a key in a locked door (the kind that locks when you close it). You do not care how the lock works in detail, just as long as it unlocks itself when you use the key. If there is trouble you probably cannot fix it; instead you get a new lock (like forgetting passwords on every login, at least I do it all the time, and it is a common area to use hashing).
... and I guess you could call that rainbow-algorithm a locksmith in this case.
Hope things clear up =)
Encoding vs Encryption vs Hashing
[Certificate] is a good example
Encoding
Public context. Represent data in some specific format
Example: Encoding is used for saving and transporting cryptographic keys, Certificate Signing Request(CSR), certificates
American Standard Code for Information Interchange (ASCII) - has 128 code points. It contains general(and some additional) symbols with corresponding representations like ASCII Code, binary(8 bit)
ASCII symbol - a
ASCII Code - 097
ASCII binary - 01100001
Base64 - has 64 code points with corresponding symbol, Base64 code, binary (6 bit). Converts every 24 bits of data into 4 (24/6) Base64 symbols. If the final 6-bit group is short of bits, it is filled with 0 bits; if a whole final group is missing, padding (=) is used
For example:
ASCII symbols: aa
ASCII binary: 01100001 01100001
Base64 binary: 011000 010110 000100 000000
Base64 symbols: YWE=
//aa == YWE=
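The same conversion can be reproduced step by step in PHP. This sketch only prints the 6-bit groups for inspection and then calls the built-in base64_encode(); the variable names are arbitrary.

```php
<?php
$input = 'aa';

// Write each byte as 8 bits: 01100001 01100001
$bits = '';
foreach (str_split($input) as $ch) {
    $bits .= sprintf('%08b', ord($ch));
}

// Fill the last group with 0 bits so the length is a multiple of 6.
$bits = str_pad($bits, (int) (ceil(strlen($bits) / 6) * 6), '0');
$groups = str_split($bits, 6);

echo implode(' ', $groups), PHP_EOL; // 011000 010110 000100
echo base64_encode($input), PHP_EOL; // YWE=
```

The three groups map to Y, W and E; the trailing = stands in for the missing fourth group.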
Privacy-Enhanced Mail(PEM) - Base64 encoding - .pem, .crt, .cer .key (for private keys) for cryptographic keys, CSR, certificates. Uses plain-text headers and footers, for example:
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
Distinguished Encoding Rules (DER) - binary encoding for certificates and private keys - .der, .cer. It is usually the binary form of a PEM-formatted certificate
PKCS#7(P7B) Cryptographic Message Syntax Standard(CMS) - Base64 encoding container format only for one or more digital certificates(not the Private key). .p7b, .p7s, .cer. For example:
-----BEGIN PKCS7-----
...
-----END PKCS7-----
Encryption
Private context. Used to transform data with a secret key, so that only a side that knows the key can work with this data
Cryptography
symmetric
asymmetric
Diffie–Hellman Key Exchange(DHKE)
subject - Elliptic-curve cryptography (ECC)
Elliptic Curve Diffie–Hellman(ECDH)
Symmetric and asymmetric cryptography are used for securely exchanging messages between sides (e.g. side1 and side2, Alice and Bob, client and server...) in a non-secure environment
Symmetric-key (symmetric) cryptography (e.g. DES, AES) uses the same key to encrypt and decrypt. It is a kind of private key because it must be kept private to keep the communication private
Public-key (asymmetric) cryptography (e.g. ECC, RSA) uses a pair of mathematically related private/public keys. Data encrypted with the public key is decrypted with the private key and, vice versa, data encrypted with the private key is decrypted with the public key
Use cases of asymmetric cryptography:
Public key encryption (Subject Public Key Algorithms) - data encrypted with the public key is decrypted with the private key
Digital signature (DS) (Signature Algorithms) - a checksum encrypted with the private key is decrypted with the public key
Diffie–Hellman Key Exchange(DHKE) algorithm
Both sides end up with the same shared secret key, which each calculates from the public numbers (p, g), its own private key and the other side's public key. This shared secret key is then used by both sides as the key to encrypt/decrypt. It is based on a simple property of modular exponentiation
Static vs Ephemeral Keys:
Static - long-term key which provides implicit authenticity
Ephemeral - generated anew every time and provides forward secrecy (FS): if a key is leaked, past communications remain secure because they used different keys
public numbers(p, g) and random private key = public key
another public key and private key and public numbers(p) = shared secret key
Example:
//public info
`p = 11` is a **public** prime number (prime). e.g [2, 3, 5, 7, 11, 13, 17, 19, 23...]
`g = 8`, is a **public** primitive root modulo `p`. e.g [2, 6, 7, 8] is primitive root modulo 11
//creating private keys
`a = 6` is any **private** key of side1(Alice)
`b = 9` is any **private** key of side2(Bob)
//calculating public keys
//<my_public_key> = g^<my_private_key> mod p
`A = 3` = g^a mod p = 8^6 mod 11 = 262144 mod 11. is a **public** key of side1(Alice)
`B = 7` = g^b mod p = 8^9 mod 11 = 134217728 mod 11. is a **public** key of side2(Bob)
//calculating shared secret key for every side
//s = <another_public_key>^<my_private_key> mod p
//Side1(Alice) has p,g,a,A,B
`s = 4` = B^a mod p = 7^6 mod 11
//Side2(Bob) has p,g,b,B,A
`s = 4` = A^b mod p = 3^9 mod 11
As a result both sides have the same shared secret key
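The numbers in the example above can be checked with a short script. The helper modPow() is written out here for illustration (square-and-multiply on plain integers); it is only suitable for toy values like these, since real key sizes need big-integer arithmetic.

```php
<?php
// Compute ($base ** $exp) % $mod by repeated squaring.
function modPow(int $base, int $exp, int $mod): int
{
    $result = 1;
    $base %= $mod;
    while ($exp > 0) {
        if ($exp & 1) {
            $result = ($result * $base) % $mod;
        }
        $base = ($base * $base) % $mod;
        $exp >>= 1;
    }
    return $result;
}

$p = 11; $g = 8;           // public numbers
$a = 6;  $b = 9;           // private keys of Alice and Bob

$A = modPow($g, $a, $p);   // Alice's public key: 3
$B = modPow($g, $b, $p);   // Bob's public key: 7

$sAlice = modPow($B, $a, $p); // Alice computes B^a mod p = 4
$sBob   = modPow($A, $b, $p); // Bob computes A^b mod p = 4

printf("A=%d B=%d sAlice=%d sBob=%d\n", $A, $B, $sAlice, $sBob);
```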
Let's say that we have a side3 (Eve) which tries to read the messages between side1 (Alice) and side2 (Bob). In this case Eve has:
//public info
`p = 11`
`g = 8`
//public Keys from Alice and Bob
`A = 3`
`B = 7`
To solve this task Eve has to know one of the private keys (Alice's or Bob's). If Eve generates her own private/public keys and calculates a shared secret key from the other public key (Alice's or Bob's), she gets a different result (Eve-Alice or Eve-Bob)
Rivest–Shamir–Adleman(RSA)
Mathematics with prime numbers. Multiplying prime numbers to get a larger number is a simple task, while factoring large numbers back into the original primes is much more difficult. Steps: key pair generation (private/public keys), sharing the public key, encryption/decryption
Elliptic-curve cryptography (ECC)
A large area of asymmetric cryptography which uses the mathematics of elliptic curves. It is better than RSA in terms of key size: it keeps the same level of security with much smaller keys. Steps: key pair generation (private/public keys), sharing the public key, encryption/decryption
Elliptic Curve Diffie–Hellman(ECDH)
Key Exchange algorithm - is very similar to DHKE algorithm, but it uses ECC point multiplication instead of modular exponentiations
In very simple representation
public info and random private key = public key (key pair)
another public key and private key = shared secret key
For example:
you generate an ECC key pair (private/public) and specify the public info (the curve, e.g. P-384) which was used for generation (e.g. as a part of a certificate)
the other side is able to use the public info (from the certificate) to generate its own private/public key pair and send you its public key
you are able to calculate the shared secret key
Hashing
One-Way Hash Functions
Public context. A compact representation (digest) of data which everybody can calculate using an open algorithm; that is why it is used for verifying the integrity of the data (the data was not changed or transformed)
Hash algorithms: MD2, MD5, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512...
Encryption
Integrity - can the receiver check that the message is original (the message was not changed)
Authentication - can the receiver check who the sender is
Non-repudiation - if the receiver passes this message to side3 (somebody else), can side3 check who the sender is
Is used for:
Hash function - Encryption, Integrity
Message Authentication Code (MAC) - Encryption, Integrity, Authentication - Authenticates a message. When the receiver gets a message and its MAC, it can verify the message's origin and integrity using the symmetric key.
Hash-based Message Authentication Code (HMAC)
Key Derivation Functions (KDF) - transform a password/weak key into a strong key (this process is called key stretching). For example, it allows the back end to apply a KDF to a password and save its result (e.g. a hash) in the DB instead of saving the real password
HMAC-based key derivation(HKDF)
Digital signature - Encryption, Integrity, Authentication, Non-repudiation - Used with asymmetric key. Sign/verify(calculate checksum) of data
MAC(symmetric_key, message) -> MAC_data_tag
HMAC(symmetric_key, message, hash_func) -> hash
KDF(weak_key) -> key
HKDF(salt, weak_key, hash_func) -> hash
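As a rough illustration, the signatures above map onto PHP built-ins as follows. The key, salt, message, password and the 'demo-context' info string are all made up for the sketch, and PBKDF2 is just one possible KDF. One caveat: hash_hkdf() expects input key material that already has good entropy, so for stretching a password a slow KDF such as PBKDF2 is the better fit.

```php
<?php
$key  = random_bytes(32);   // symmetric key, generated here only for the demo
$salt = random_bytes(16);

// HMAC(symmetric_key, message, hash_func) -> tag (hex string by default)
$tag = hash_hmac('sha256', 'hello', $key);

// KDF(weak_key) -> key: PBKDF2 stretches a password into a 32-byte key
$stretched = hash_pbkdf2('sha256', 'correct horse', $salt, 100000, 32, true);

// HKDF(salt, key, hash_func) -> key: derives a 32-byte sub-key from existing key material
$subKey = hash_hkdf('sha256', $key, 32, 'demo-context', $salt);

echo strlen($tag), ' ', strlen($stretched), ' ', strlen($subKey), "\n";
// 64 32 32
```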
Cryptography deals with numbers and strings. Basically every digital thing in the entire universe is a number. When I say numbers, I mean 0 and 1. You know what they are: binary. The images you see on screen, the music that you listen to through your earphones, everything is binary. But our ears and eyes will not understand binary, right? Only the brain could understand that, and even if it could understand binary, it can’t enjoy it. So we convert the binary data to human-understandable formats such as mp3, jpg, etc. Let’s call this process Encoding. It’s a two-way process and the result can easily be decoded back to its original form.
Hashing
Hashing is another cryptography technique, in which data, once converted to some other form, can never be recovered. In layman’s terms, there is no process called de-hashing. There are many hash functions to do the job, such as sha-512, md5 and so on.
If the original value cannot be recovered, then where do we use this? Passwords! When you set up a password for your mobile or PC, a hash of your password is created and stored in a secure place. When you make a login attempt next time, the entered string is again hashed with the same algorithm (hash function) and the output is matched with the stored value. If it’s the same, you get logged in. Otherwise you are thrown out.
By applying a hash to the password, we can ensure that an attacker will never get our password even if he steals the stored password file. The attacker will have the hash of the password. He can probably find a list of the most commonly used passwords, apply sha-512 to each of them and compare the result with the value in his hand. This is called a dictionary attack. But how long would he do this? If your password is random enough, do you think this method of cracking would work?
All the passwords in the databases of Facebook, Google and Amazon are hashed, or at least they are supposed to be hashed.
Then there is Encryption
Encryption lies in between hashing and encoding. Encoding is a two way process and should not be used to provide security. Encryption is also a two way process, but original data can be retrieved if and only if the encryption key is known. If you don’t know how encryption works, don’t worry, we will discuss the basics here. That would be enough to understand the basics of SSL. So, there are two types of Encryption namely Symmetric and Asymmetric encryption.
Symmetric Key Encryption
I am trying to keep things as simple as I can. So, let’s understand symmetric encryption by means of a shift algorithm. This algorithm encrypts letters by shifting them to either the left or the right in the alphabet. Let’s take the string CRYPTO and the number +3. Then, the encrypted form of CRYPTO will be FUBSWR. That means each letter is shifted to the right by 3 places.
Here, the word CRYPTO is called the Plaintext, the output FUBSWR is called the Ciphertext, the value +3 is called the Encryption key (symmetric key) and the whole process is a cipher. This is one of the oldest and most basic symmetric key encryption algorithms, and its first usage was reported during the time of Julius Caesar. So, it was named after him and it is the famous Caesar Cipher. Anyone who knows the encryption key can apply the reverse of Caesar’s algorithm and retrieve the original Plaintext. Hence it is called Symmetric Encryption.
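The shift described above fits in a few lines of PHP. This is a minimal sketch: the function name caesar is made up, and only the uppercase letters A-Z are shifted. It shows why the cipher is symmetric, because decryption is the same function with the negated key.

```php
<?php
// Shift every A-Z letter by $key positions, wrapping around the alphabet.
function caesar(string $text, int $key): string
{
    $out = '';
    for ($i = 0; $i < strlen($text); $i++) {
        $code = ord($text[$i]);
        if ($code >= 65 && $code <= 90) {          // 'A'..'Z'
            $offset = ($code - 65 + $key) % 26;
            if ($offset < 0) {
                $offset += 26;                     // PHP's % keeps the sign of the left operand
            }
            $out .= chr(65 + $offset);
        } else {
            $out .= $text[$i];                     // leave other characters untouched
        }
    }
    return $out;
}

$ciphertext = caesar('CRYPTO', 3);
echo $ciphertext, "\n";               // FUBSWR
echo caesar($ciphertext, -3), "\n";   // CRYPTO
```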
Asymmetric Key Encryption
We know that, in Symmetric encryption, the same key is used for both encryption and decryption. Once that key is stolen, all the data is gone. That’s a huge risk and we need a more complex technique. In 1976, Whitfield Diffie and Martin Hellman first published the concept of Asymmetric encryption, and the algorithm became known as Diffie–Hellman key exchange. Then in 1978, Ron Rivest, Adi Shamir and Leonard Adleman of MIT published the RSA algorithm. These can be considered the foundation of Asymmetric cryptography.
Compared to Symmetric encryption, in Asymmetric encryption there are two keys instead of one. One is called the Public key, and the other one is the Private key. Theoretically, during initiation we can generate the Public-Private key pair on our machine. The Private key should be kept in a safe place and should never be shared with anyone. The Public key, as the name indicates, can be shared with anyone who wishes to send encrypted text to you. Now, those who have your public key can encrypt the secret data with it. If the key pair was generated using the RSA algorithm, then they should use the same algorithm while encrypting the data. Usually the algorithm will be specified in the public key. The encrypted data can only be decrypted with the private key, which is owned by you.
Source: SSL/TLS for dummies part 1 : Ciphersuite, Hashing,Encryption | WST (https://www.wst.space/ssl-part1-ciphersuite-hashing-encryption/)
Encryption: The purpose of encryption is to transform data in order to keep it secret, e.g. sending someone a secret text that only they should be able to read, or sending passwords over the Internet.
Instead of focusing on usability, the goal is to ensure that the data can be sent secretly and can only be seen by the user you sent it to.
It transforms the data into another format using a secret key; users who have the secret key can see the message by reversing the process.
E.g. AES, Blowfish, RSA
The encrypted data may simply look like this: FhQp6U4N28GITVGjdt37hZN
Hashing: Technically, we can say that it takes an arbitrary input and produces a fixed-length string.
The most important thing here is that you can't go from the output back to the input. It gives strong evidence that the given information has not been modified.
One use is to take an input, hash it, and sign the hash with the sender's private key; once the receiver has received it, they can validate it with the sender's public key.
If the hash is wrong and doesn't match the hash of the received data, the information should not be trusted. E.g. MD5, SHA, ...

Securing stored password

Actually I have a database that stores sensitive customer information.
I'm using something like that to encrypt that data:
$algo = 'AES-256-CTR';
$key = 'password md5 from database'; // placeholder: the customer's password MD5, read from the database
$iv = substr(hash('sha256',$email),0,openssl_cipher_iv_length($algo));
$data = base64_encode($data);
$data = openssl_encrypt($data,$algo,$key,OPENSSL_RAW_DATA,$iv);
As you see I'm using the customer's email to create the iv and his password md5 for the key.
So if someone hacks my database, he can decrypt the sensitive data.
Is there a better way to do this, knowing that my PHP script needs to be able to decrypt the data for use?
My ideas:
-Use an executable on the server that creates/modifies the password and/or the IV and does the decryption, and have the PHP script call it.
-Use a second server that stores the passwords and that the PHP script needs to call for decryption.
I suggest you use Argon2 to derive $key from the user's password, then use symmetric encryption like AES, XSalsa20 or XChaCha20 to encrypt the data. Alternatively you can use other derivation functions that make brute force impractical; just take care that the salt is never reused. Take a look at 🗄 Vault and libsodium-php.
As you see I'm using the customer's email to create the iv and his password md5 for the key.
A few things you may need to be aware of:
The IV is supposed to be unique for CTR mode, but as already pointed out, it is static (derived from the email).
It is possible to find an MD5 collision in a minute on commodity hardware (newest tunneling method).
You are missing any authentication tag.
Common practice for encryption is to have a random IV, the encrypted source and a MAC as parts of the ciphertext (e.g. iv.encrypted.mac).
IMHO creating a key from the password MD5 may be feasible assuming you don't store the keys, so there is nothing to find a collision against, and the passwords have high entropy (are long and random).
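The iv.encrypted.mac layout could look roughly like this. It is a sketch, not a vetted implementation: the function names are invented, and it assumes two independent 32-byte keys ($encKey, $macKey) that come from somewhere outside the database. The HMAC covers both the IV and the ciphertext, and it is checked before anything is decrypted.

```php
<?php
// Hypothetical helper: encrypt, then authenticate IV + ciphertext.
function encrypt_then_mac(string $plaintext, string $encKey, string $macKey): string
{
    $algo = 'aes-256-ctr';
    $iv  = random_bytes(openssl_cipher_iv_length($algo));    // fresh IV for every message
    $ct  = openssl_encrypt($plaintext, $algo, $encKey, OPENSSL_RAW_DATA, $iv);
    $mac = hash_hmac('sha256', $iv . $ct, $macKey, true);    // 32 raw bytes
    return $iv . $ct . $mac;                                  // layout: iv.encrypted.mac
}

// Hypothetical helper: returns the plaintext, or null if the message was modified.
function verify_then_decrypt(string $message, string $encKey, string $macKey): ?string
{
    $algo  = 'aes-256-ctr';
    $ivLen = openssl_cipher_iv_length($algo);
    if (strlen($message) < $ivLen + 32) {
        return null;                                          // too short to be valid
    }
    $iv  = substr($message, 0, $ivLen);
    $mac = substr($message, -32);
    $ct  = (string) substr($message, $ivLen, -32);
    if (!hash_equals(hash_hmac('sha256', $iv . $ct, $macKey, true), $mac)) {
        return null;                                          // tampered data or wrong key
    }
    $plain = openssl_decrypt($ct, $algo, $encKey, OPENSSL_RAW_DATA, $iv);
    return $plain === false ? null : $plain;
}

$encKey = random_bytes(32);
$macKey = random_bytes(32);
$box = encrypt_then_mac('sensitive data', $encKey, $macKey);
var_dump(verify_then_decrypt($box, $encKey, $macKey));   // string(14) "sensitive data"

$box[20] = chr(ord($box[20]) ^ 1);                        // flip one bit inside the ciphertext
var_dump(verify_then_decrypt($box, $encKey, $macKey));   // NULL
```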
So if someone hacks my database, he can decrypt the sensitive data. Is there a better way to do this, knowing that my PHP script needs to be able to decrypt the data for use?
Storing system credentials is generally a problem. You may use a credential vault, but you need to store the vault credentials somewhere. You can encrypt the system credentials, but you need to store the decryption key somewhere. At least hide the credentials from plain sight, so that it is more difficult for automated hacking tools or not so dedicated adversaries.

Decode base64 string with custom hash [duplicate]

I see a lot of confusion between hashes and encryption algorithms and I would like to hear some more expert advice about:
When to use hashes vs encryptions
What makes a hash or encryption algorithm different (from a theoretical/mathematical level)
i.e. what makes hashes irreversible (without aid of a rainbow table)
Here are some similar SO Questions that didn't go into as much detail as I was looking for:
What is the difference between Obfuscation, Hashing, and Encryption?
Difference between encryption and hashing
Well, you could look it up in Wikipedia... But since you want an explanation, I'll do my best here:
Hash Functions
They provide a mapping between an arbitrary length input, and a (usually) fixed length (or smaller length) output. It can be anything from a simple crc32, to a full blown cryptographic hash function such as MD5 or SHA1/2/256/512. The point is that there's a one-way mapping going on. It's always a many:1 mapping (meaning there will always be collisions) since every function produces a smaller output than it's capable of inputting (If you feed every possible 1mb file into MD5, you'll get a ton of collisions).
The reason they are hard (or impossible in practicality) to reverse is because of how they work internally. Most cryptographic hash functions iterate over the input set many times to produce the output. So if we look at each fixed length chunk of input (which is algorithm dependent), the hash function will call that the current state. It will then iterate over the state and change it to a new one and use that as feedback into itself (MD5 does this 64 times for each 512bit chunk of data). It then somehow combines the resultant states from all these iterations back together to form the resultant hash.
Now, if you wanted to decode the hash, you'd first need to figure out how to split the given hash into its iterated states (1 possibility for inputs smaller than the size of a chunk of data, many for larger inputs). Then you'd need to reverse the iteration for each state. Now, to explain why this is VERY hard, imagine trying to deduce a and b from the following formula: 10 = a + b. There are 10 positive combinations of a and b that can work. Now loop over that a bunch of times: tmp = a + b; a = b; b = tmp. For 64 iterations, you'd have over 10^64 possibilities to try. And that's just a simple addition where some state is preserved from iteration to iteration. Real hash functions do a lot more than 1 operation (MD5 does about 15 operations on 4 state variables). And since the next iteration depends on the state of the previous and the previous is destroyed in creating the current state, it's all but impossible to determine the input state that led to a given output state (for each iteration no less). Combine that, with the large number of possibilities involved, and decoding even an MD5 will take a near infinite (but not infinite) amount of resources. So many resources that it's actually significantly cheaper to brute-force the hash if you have an idea of the size of the input (for smaller inputs) than it is to even try to decode the hash.
Encryption Functions
They provide a 1:1 mapping between an arbitrary length input and output. And they are always reversible. The important thing to note is that it's reversible using some method. And it's always 1:1 for a given key. Now, there are multiple input:key pairs that might generate the same output (in fact there usually are, depending on the encryption function). Good encrypted data is indistinguishable from random noise. This is different from a good hash output which is always of a consistent format.
Use Cases
Use a hash function when you want to compare a value but can't store the plain representation (for any number of reasons). Passwords should fit this use-case very well since you don't want to store them plain-text for security reasons (and shouldn't). But what if you wanted to check a filesystem for pirated music files? It would be impractical to store 3 mb per music file. So instead, take the hash of the file, and store that (md5 would store 16 bytes instead of 3mb). That way, you just hash each file and compare to the stored database of hashes (This doesn't work as well in practice because of re-encoding, changing file headers, etc, but it's an example use-case).
Use a hash function when you're checking validity of input data. That's what they are designed for. If you have 2 pieces of input, and want to check to see if they are the same, run both through a hash function. The probability of a collision is astronomically low for small input sizes (assuming a good hash function). That's why it's recommended for passwords. For passwords up to 32 characters, md5 has 4 times the output space. SHA1 has 6 times the output space (approximately). SHA512 has about 16 times the output space. You don't really care what the password was, you care if it's the same as the one that was stored. That's why you should use hashes for passwords.
Use encryption whenever you need to get the input data back out. Notice the word need. If you're storing credit card numbers, you need to get them back out at some point, but don't want to store them plain text. So instead, store the encrypted version and keep the key as safe as possible.
Hash functions are also great for signing data. For example, if you're using HMAC, you sign a piece of data by taking a hash of the data concatenated with a known but not transmitted value (a secret value). So, you send the plain-text and the HMAC hash. Then, the receiver simply hashes the submitted data with the known value and checks to see if it matches the transmitted HMAC. If it's the same, you know it wasn't tampered with by a party without the secret value. This is commonly used in secure cookie systems by HTTP frameworks, as well as in message transmission of data over HTTP where you want some assurance of integrity in the data.
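For the cookie use case, a sketch in PHP might look like this. The helper names sign and verify, the "data.mac" layout and the secret value are assumptions for the example. Note that hash_hmac() implements the real HMAC construction rather than a plain hash of data plus secret, and hash_equals() does the comparison in constant time.

```php
<?php
// Hypothetical helper: append an HMAC to the data ("data.mac").
function sign(string $data, string $secret): string
{
    return $data . '.' . hash_hmac('sha256', $data, $secret);
}

// Hypothetical helper: return the data if the HMAC matches, otherwise null.
function verify(string $signed, string $secret): ?string
{
    $pos = strrpos($signed, '.');
    if ($pos === false) {
        return null;
    }
    $data = substr($signed, 0, $pos);
    $mac  = (string) substr($signed, $pos + 1);
    return hash_equals(hash_hmac('sha256', $data, $secret), $mac) ? $data : null;
}

$secret = 'server-side secret';                  // assumption: known only to the server
$cookie = sign('user_id=42', $secret);
var_dump(verify($cookie, $secret));              // string(10) "user_id=42"

$forged = 'user_id=43' . substr($cookie, 10);    // change the data, keep the old MAC
var_dump(verify($forged, $secret));              // NULL
```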
A note on hashes for passwords:
A key feature of cryptographic hash functions is that they should be very fast to create, and very difficult/slow to reverse (so much so that it's practically impossible). This poses a problem with passwords. If you store sha512(password), you're not doing a thing to guard against rainbow tables or brute force attacks. Remember, the hash function was designed for speed. So it's trivial for an attacker to just run a dictionary through the hash function and test each result.
Adding a salt helps matters since it adds a bit of unknown data to the hash. So instead of finding anything that matches md5(foo), they need to find something that when added to the known salt produces md5(foo.salt) (which is very much harder to do). But it still doesn't solve the speed problem since if they know the salt it's just a matter of running the dictionary through.
So, there are ways of dealing with this. One popular method is called key strengthening (or key stretching). Basically, you iterate over a hash many times (thousands usually). This does two things. First, it slows down the runtime of the hashing algorithm significantly. Second, if implemented right (passing the input and salt back in on each iteration) actually increases the entropy (available space) for the output, reducing the chances of collisions. A trivial implementation is:
var hash = password + salt;
for (var i = 0; i < 5000; i++) {
    hash = sha512(hash + password + salt);
}
There are other, more standard implementations such as PBKDF2, BCrypt. But this technique is used by quite a few security related systems (such as PGP, WPA, Apache and OpenSSL).
The bottom line, hash(password) is not good enough. hash(password + salt) is better, but still not good enough... Use a stretched hash mechanism to produce your password hashes...
Another note on trivial stretching
Do not under any circumstances feed the output of one hash directly back into the hash function:
hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash); // <-- Do NOT do this!
}
The reason for this has to do with collisions. Remember that all hash functions have collisions because the possible output space (the number of possible outputs) is smaller than the input space. To see why, let's look at what happens. To preface this, let's make the assumption that there's a 0.001% chance of collision from sha1() (it's much lower in reality, but for demonstration purposes).
hash1 = sha1(password + salt);
Now, hash1 has a probability of collision of 0.001%. But when we do the next hash2 = sha1(hash1);, all collisions of hash1 automatically become collisions of hash2. So now, we have hash1's rate at 0.001%, and the 2nd sha1() call adds to that. So now, hash2 has a probability of collision of 0.002%. That's twice as many chances! Each iteration will add another 0.001% chance of collision to the result. So, with 1000 iterations, the chance of collision jumped from a trivial 0.001% to 1%. Now, the degradation is linear, and the real probabilities are far smaller, but the effect is the same (an estimation of the chance of a single collision with md5 is about 1/(2^128) or 1/(3x10^38). While that seems small, thanks to the birthday attack it's not really as small as it seems).
Instead, by re-appending the salt and password each time, you're re-introducing data back into the hash function. So any collisions of any particular round are no longer collisions of the next round. So:
hash = sha512(password + salt);
for (i = 0; i < 1000; i++) {
    hash = sha512(hash + password + salt);
}
Has the same chance of collision as the native sha512 function. Which is what you want. Use that instead.
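Written out as runnable PHP, the loop above might look like this. It is purely illustrative: the function name stretched_hash and the sample password are made up, and for real password storage the standard options named earlier (PBKDF2, BCrypt) are the safer choice than a hand-rolled loop.

```php
<?php
// Illustration of the stretching loop: password and salt are fed back in on every round.
function stretched_hash(string $password, string $salt, int $rounds = 1000): string
{
    $hash = hash('sha512', $password . $salt);
    for ($i = 0; $i < $rounds; $i++) {
        $hash = hash('sha512', $hash . $password . $salt);
    }
    return $hash;
}

$salt = bin2hex(random_bytes(16));     // random per-password salt, stored next to the hash
$h1 = stretched_hash('hunter2', $salt);
$h2 = stretched_hash('hunter2', $salt);

var_dump(hash_equals($h1, $h2));       // bool(true): same password and salt, same hash
echo strlen($h1), "\n";                // 128 (hex characters of a SHA-512 digest)
```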
A hash function could be considered the same as baking a loaf of bread. You start out with inputs (flour, water, yeast, etc...) and after applying the hash function (mixing + baking), you end up with an output: a loaf of bread.
Going the other way is extraordinarily difficult - you can't really separate the bread back into flour, water, yeast - some of that was lost during the baking process, and you can never tell exactly how much water or flour or yeast was used for a particular loaf, because that information was destroyed by the hashing function (aka the oven).
Many different variants of inputs will theoretically produce identical loaves (e.g. 2 cups of water and 1 tbsp of yeast produce exactly the same loaf as 2.1 cups of water and 0.9 tbsp of yeast), but given one of those loaves, you can't tell exactly what combo of inputs produced it.
Encryption, on the other hand, could be viewed as a safe deposit box. Whatever you put in there comes back out, as long as you possess the key with which it was locked up in the first place. It's a symmetric operation. Given a key and some input, you get a certain output. Given that output, and the same key, you'll get back the original input. It's a 1:1 mapping.
A basic overview of hashing and encryption/decryption techniques:
Hashing:
If you hash any plain text, you cannot get the same plain text back from the hashed text. Simply put, it's a one-way process.
Encryption and Decryption:
If you encrypt any plain text with a key, you can get the same plain text back by decrypting the encrypted text with the same (symmetric) or a different (asymmetric) key.
UPDATE:
To address the points mentioned in the edited question.
1. When to use hashes vs encryptions
Hashing is useful if you want to send someone a file but are afraid that someone else might intercept the file and change it. One way the recipient can make sure that it is the right file is if you post the hash value publicly. That way the recipient can compute the hash value of the file received and check that it matches the published hash value.
Encryption is good if, say, you have a message to send to someone. You encrypt the message with a key and the recipient decrypts it with the same (or maybe even a different) key to get back the original message.
2. What makes a hash or encryption algorithm different (from a theoretical/mathematical level) i.e. what makes hashes irreversible (without aid of a rainbow table)
Basically hashing is an operation that loses information, while encryption does not. Let's look at the difference in a simple mathematical way for easy understanding; of course both involve much more complicated mathematical operations with repetitions.
Encryption/Decryption (Reversible):
Addition:
4 + 3 = 7
This can be reversed by taking the sum and subtracting one of the
addends
7 - 3 = 4
Multiplication:
4 * 5 = 20
This can be reversed by taking the product and dividing by one of the
factors
20 / 4 = 5
So, here we could assume one of the addends/factors is a decryption key and result(7,20) is an encrypted text.
Hashing (Not Reversible):
Modulo division:
22 % 7 = 1
This cannot be reversed because there is no operation that you can do to the remainder and the divisor to reconstitute the dividend (or vice versa).
Can you find an operation to fill in where the '?' is?
1 ? 7 = 22
1 ? 22 = 7
So hash functions have the same mathematical quality as modulo division and lose the information.
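A tiny PHP sketch makes the lost information visible: many different inputs leave the same remainder, so the remainder alone cannot tell you which one you started from. The helper name preimages and the limit of 50 are arbitrary choices for the example.

```php
<?php
// Collect every input below $limit that leaves $remainder when divided by $divisor.
function preimages(int $remainder, int $divisor, int $limit): array
{
    $found = [];
    for ($n = 0; $n < $limit; $n++) {
        if ($n % $divisor === $remainder) {
            $found[] = $n;
        }
    }
    return $found;
}

echo implode(', ', preimages(1, 7, 50)), "\n";
// 1, 8, 15, 22, 29, 36, 43
```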
Use hashes when you don't want to be able to get back the original input, use encryption when you do.
Hashes take some input and turn it into some bits (usually thought of as a number, like a 32 bit integer, 64 bit integer, etc). The same input will always produce the same hash, but you PRINCIPALLY lose information in the process so you can't reliably reproduce the original input (there are a few caveats to that however).
Encryption principally preserves all of the information you put into the encryption function, just makes it hard (ideally impossible) for anyone to reverse back to the original input without possessing a specific key.
Simple Example of Hashing
Here's a trivial example to help you understand why hashing can't (in the general case) get back the original input. Say I'm creating a 1-bit hash. My hash function takes a bit string as input and sets the hash to 1 if there are an even number of bits set in the input string, else 0 if there were an odd number.
Example:
Input Hash
0010 0
0011 1
0110 1
1000 0
Note that there are many input values that result in a hash of 0, and many that result in a hash of 1. If you know the hash is 0, you can't know for sure what the original input was.
By the way, this 1-bit hash isn't exactly contrived... have a look at parity bit.
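For what it's worth, the 1-bit hash from the table is a one-liner in PHP (the function name parity_hash is made up for the example):

```php
<?php
// 1-bit hash from the text: 1 if the number of set bits is even, otherwise 0.
function parity_hash(string $bits): int
{
    return substr_count($bits, '1') % 2 === 0 ? 1 : 0;
}

foreach (['0010', '0011', '0110', '1000'] as $input) {
    echo $input, ' ', parity_hash($input), "\n";
}
// 0010 0
// 0011 1
// 0110 1
// 1000 0
```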
Simple Example of Encryption
You might encrypt text by using a simple letter substitution, say if the input is A, you write B. If the input is B, you write C. All the way to the end of the alphabet, where if the input is Z, you write A again.
Input Encrypted
CAT DBU
ZOO APP
Just like the simple hash example, this type of encryption has been used historically.
My two-liner... generally the interviewer wants the answer below.
Hashing is one way. You cannot recover your data/string from a hash code.
Encryption is two way: you can decrypt the encrypted string again if you have the key.
A Hash function turns a variable-sized amount of text into a fixed-sized text.
Source: https://en.wikipedia.org/wiki/Hash_function
Hash functions in PHP
A hash turns a string to a hashed string. See below.
HASH:
$str = 'My age is 29';
$hash = hash('sha1', $str);
echo $hash; // OUTPUT: 4d675d9fbefc74a38c89e005f9d776c75d92623e
Passwords are usually stored in their hashed representation instead of as readable text. When an end-user wants to gain access to an application protected with a password, the password must be given during authentication. When the user submits his password, the authentication system receives the password and hashes it. This password hash is compared to the hash known by the system. Access is granted if they are equal.
DEHASH:
SHA1 is a one-way hash. Which means that you can't dehash the hash.
However, you can brute-force the hash. Please see: https://hashkiller.co.uk/sha1-decrypter.aspx.
MD5, is another hash. A MD5 dehasher can be found on this website: https://www.md5online.org/.
To hamper brute-force attacks on hashes a salt can be given.
In php you can use password_hash() for creating a password hash.
The function password_hash() automatically creates a salt.
To verify a password on a password hash (with a salt) use password_verify().
// Invoke this little script 3 times, and it will give you everytime a new hash
$password = '1234';
$hash = password_hash($password, PASSWORD_DEFAULT);
echo $hash;
// OUTPUT
$2y$10$ADxKiJW/Jn2DZNwpigWZ1ePwQ4il7V0ZB4iPeKj11n.iaDtLrC8bu
$2y$10$H8jRnHDOMsHFMEZdT4Mk4uI4DCW7/YRKjfdcmV3MiA/WdzEvou71u
$2y$10$qhyfIT25jpR63vCGvRbEoewACQZXQJ5glttlb01DmR4ota4L25jaW
One password can be represented by more than one hash.
When you verify the password against any of these password hashes by using password_verify(), the password will be accepted as valid.
$password = '1234';
$hash = '$2y$10$ADxKiJW/Jn2DZNwpigWZ1ePwQ4il7V0ZB4iPeKj11n.iaDtLrC8bu';
var_dump( password_verify($password, $hash) );
$hash = '$2y$10$H8jRnHDOMsHFMEZdT4Mk4uI4DCW7/YRKjfdcmV3MiA/WdzEvou71u';
var_dump( password_verify($password, $hash) );
$hash = '$2y$10$qhyfIT25jpR63vCGvRbEoewACQZXQJ5glttlb01DmR4ota4L25jaW';
var_dump( password_verify($password, $hash) );
// OUTPUT
boolean true
boolean true
boolean true
An Encryption function transforms a text into a nonsensical ciphertext by using an encryption key, and vice versa.
Source: https://en.wikipedia.org/wiki/Encryption
Encryption in PHP
Let's dive into some PHP code that handles encryption.
--- The Mcrypt extension ---
ENCRYPT:
$cipher = MCRYPT_RIJNDAEL_128;
$key = 'A_KEY';
$data = 'My age is 29';
$mode = MCRYPT_MODE_ECB;
$encryptedData = mcrypt_encrypt($cipher, $key , $data , $mode);
var_dump($encryptedData);
//OUTPUT:
string '„Ùòyªq³¿ì¼üÀpå' (length=16)
DECRYPT:
$decryptedData = mcrypt_decrypt($cipher, $key , $encryptedData, $mode);
$decryptedData = rtrim($decryptedData, "\0\4"); // Remove the nulls and EOTs at the END
var_dump($decryptedData);
//OUTPUT:
string 'My age is 29' (length=12)
--- The OpenSSL extension ---
The Mcrypt extension was deprecated in PHP 7.1 and removed in PHP 7.2.
The OpenSSL extension should be used in PHP 7. See the code snippets below:
$key = 'A_KEY';
$data = 'My age is 29';
// ENCRYPT
$encryptedData = openssl_encrypt($data , 'AES-128-CBC', $key, 0, 'IV_init_vector01');
var_dump($encryptedData);
// DECRYPT
$decryptedData = openssl_decrypt($encryptedData, 'AES-128-CBC', $key, 0, 'IV_init_vector01');
var_dump($decryptedData);
//OUTPUT
string '4RJ8+18YkEd7Xk+tAMLz5Q==' (length=24)
string 'My age is 29' (length=12)
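The snippet above uses a short key and a hard-coded IV to keep the demo small. A variant that is a bit closer to real use might look like this: a random 256-bit key, a fresh random IV per message, and an authenticated mode (AES-256-GCM, supported by openssl_encrypt() since PHP 7.1). This is a sketch; where the key comes from and where the IV and tag are stored are left open as assumptions.

```php
<?php
$algo = 'aes-256-gcm';
$key  = random_bytes(32);                                  // 256-bit key; assumed to be kept outside the database
$iv   = random_bytes(openssl_cipher_iv_length($algo));    // 12 bytes, new for every message
$data = 'My age is 29';

// ENCRYPT: $tag is filled in by reference and has to be stored with the IV and ciphertext
$tag = '';
$encryptedData = openssl_encrypt($data, $algo, $key, OPENSSL_RAW_DATA, $iv, $tag);

// DECRYPT: returns false if the key, IV, ciphertext or tag do not match
$decryptedData = openssl_decrypt($encryptedData, $algo, $key, OPENSSL_RAW_DATA, $iv, $tag);
var_dump($decryptedData);   // string(12) "My age is 29"
```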
Symmetric Encryption:
Symmetric encryption may also be referred to as shared key or shared secret encryption. In symmetric encryption, a single key is used both to encrypt and decrypt traffic.
Asymmetric Encryption:
Asymmetric encryption is also known as public-key cryptography. Asymmetric encryption differs from symmetric encryption primarily in that two keys are used: one for encryption and one for decryption. The most common asymmetric encryption algorithm is RSA.
Compared to symmetric encryption, asymmetric encryption imposes a high computational burden, and tends to be much slower. Thus, it isn't typically employed to protect payload data. Instead, its major strength is its ability to establish a secure channel over a nonsecure medium (for example, the Internet). This is accomplished by the exchange of public keys, which can only be used to encrypt data. The complementary private key, which is never shared, is used to decrypt.
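A minimal sketch of the public/private split with PHP's OpenSSL functions, assuming the extension is installed and can find a working openssl.cnf (otherwise openssl_pkey_new() returns false). The message text is made up. In line with the paragraph above, the payload is kept small, the kind of thing you would use to transport a symmetric key.

```php
<?php
// Generate a 2048-bit RSA key pair; the private key stays with the recipient.
$keyPair = openssl_pkey_new([
    'private_key_bits' => 2048,
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
]);
$publicKeyPem = openssl_pkey_get_details($keyPair)['key'];   // PEM text, safe to share

// The sender encrypts with the recipient's public key...
openssl_public_encrypt('session key material', $encrypted, $publicKeyPem, OPENSSL_PKCS1_OAEP_PADDING);

// ...and only the holder of the private key can decrypt.
openssl_private_decrypt($encrypted, $decrypted, $keyPair, OPENSSL_PKCS1_OAEP_PADDING);
echo $decrypted, "\n";   // session key material
```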
Hashing:
Finally, hashing is a form of cryptographic security which differs from encryption. Whereas encryption is a two step process used to first encrypt and then decrypt a message, hashing condenses a message into an irreversible fixed-length value, or hash. Two of the most common hashing algorithms seen in networking are MD5 and SHA-1.
Read more here: http://packetlife.net/blog/2010/nov/23/symmetric-asymmetric-encryption-hashing/
Use hashes when you only need to go one way. For example, for passwords in a system, you use hashing because you will only ever verify that the value a user entered, after hashing, matches the value in your repository. With encryption, you can go two ways.
hashing algorithms and encryption algorithms are just mathematical algorithms. So in that respect they are not different -- its all just mathematical formulas. Semantics wise, though, there is the very big distinction between hashing (one-way) and encryption(two-way). Why are hashes irreversible? Because they are designed to be that way, because sometimes you want a one-way operation.
Encryption and hash algorithms work in similar ways. In each case, there is a need to create confusion and diffusion amongst the bits. Boiled down, confusion is creating a complex relationship between the key and the ciphertext, and diffusion is spreading the information of each bit around.
Many hash functions actually use encryption algorithms (or primitives of encryption algorithms). For example, the SHA-3 candidate Skein uses Threefish as the underlying method to process each block. The difference is that instead of keeping each block of ciphertext, the blocks are destructively, deterministically merged together into a fixed-length output.
When it comes to security for transmitting data, i.e. two-way communication, you use encryption. All encryption requires a key.
When it comes to authorization, you use hashing. There is no key in plain hashing.
Hashing takes any amount of data (binary or text) and creates a constant-length hash representing a checksum for the data. For example, the hash might be 16 bytes. Different hashing algorithms produce different size hashes. You obviously cannot re-create the original data from the hash, but you can hash the data again to see if the same hash value is generated. One-way Unix-based passwords work this way. The password is stored as a hash value, and to log onto a system, the password you type is hashed, and the hash value is compared against the hash of the real password. If they match, then you must've typed the correct password
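The fixed-length and re-hash-to-compare behaviour described above can be sketched in a few lines of Python. This is only an illustration: SHA-256 and the sample password are assumptions, and a real login system would use a salted, deliberately slow password hash rather than a bare digest.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

# Inputs of very different sizes produce digests of the same length.
short_digest = sha256_hex(b"a")
long_digest = sha256_hex(b"a" * 1_000_000)

# Stored when the account is created: only the hash, never the password.
stored_hash = sha256_hex(b"correct horse")  # hypothetical password

def password_matches(attempt: bytes) -> bool:
    # Hash the attempt again and compare it with the stored hash.
    return sha256_hex(attempt) == stored_hash
```

Calling `password_matches(b"correct horse")` succeeds, while any other input fails, even though the original password cannot be read back out of `stored_hash`.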
Why is hashing irreversible:
Hashing isn't reversible because the input-to-hash mapping is not 1-to-1.
Having two inputs map to the same hash value is usually referred to as a "hash collision". For security purposes, one of the properties of a "good" hash function is that collisions are rare in practical use.
You already got some good answers, but I guess you could see it like this:
ENCRYPTION:
Encryption has to be decryptable if you have the right key.
Example:
Like when you send an e-mail.
You might not want everyone in the world to know what you are writing to the person receiving the e-mail, but the person who receives the e-mail would probably want to be able to read it.
HASHES:
Hashes work similarly to encryption, but it should not be possible to reverse them at all.
Example:
Like when you put a key in a locked door (the kind that locks when you close it). You do not care how the lock works in detail, just as long as it unlocks when you use the key. If there is trouble, you probably cannot fix it; you get a new lock instead (like resetting a forgotten password, which I do all the time; password storage is a common place to use hashing).
... and I guess you could call that rainbow-algorithm a locksmith in this case.
Hope things clear up =)
Encoding vs Encryption vs Hashing
[Certificate] is a good example
Encoding
Public context. Represent data in some specific format
Example: Encoding is used for saving and transporting cryptographic keys, Certificate Signing Request(CSR), certificates
American Standard Code for Information Interchange (ASCII) - has 128 code points. It contains general(and some additional) symbols with corresponding representations like ASCII Code, binary(8 bit)
ASCII symbol - a
ASCII Code - 097
ASCII binary - 01100001
Base64 - has 64 code points, each with a corresponding symbol, Base64 code, and binary value (6 bit). Converts every 24 bits of data into 4 (24/6) Base64 symbols. If the final 6-bit group is incomplete, it is filled with 0 bits; if the final 24-bit block is missing whole groups, padding (=) is used
For example:
ASCII symbols: aa
ASCII binary: 01100001 01100001
Base64 binary: 011000 010110 000100 (the last two bits are zero fill), followed by one padding symbol
Base64 symbols: YWE=
//aa == YWE=
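The `aa` example can be reproduced with Python's standard `base64` module. The variable names are only for illustration; the sketch splits the 16 input bits into 6-bit groups by hand and then checks the result against the library.

```python
import base64

data = b"aa"

# 8-bit binary form of each input byte, joined together.
bits = "".join(f"{byte:08b}" for byte in data)

# Fill the last group with zero bits so the length is a multiple of 6.
padded = bits + "0" * (-len(bits) % 6)
groups = [padded[i:i + 6] for i in range(0, len(padded), 6)]

# The library adds the "=" padding symbol for the missing fourth group.
encoded = base64.b64encode(data).decode("ascii")
decoded = base64.b64decode(encoded)
```

Here `groups` holds the three 6-bit values that map to `Y`, `W` and `E`, and `encoded` carries the trailing `=`.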
Privacy-Enhanced Mail(PEM) - Base64 encoding - .pem, .crt, .cer .key (for private keys) for cryptographic keys, CSR, certificates. Uses plain-text headers and footers, for example:
-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
Distinguished Encoding Rules (DER) - binary encoding for certificates and private keys - .der, .cer. Usually it is the binary form of a PEM-formatted certificate
PKCS#7(P7B) Cryptographic Message Syntax Standard(CMS) - Base64 encoding container format only for one or more digital certificates(not the Private key). .p7b, .p7s, .cer. For example:
-----BEGIN PKCS7-----
...
-----END PKCS7-----
Encryption
Private context. Used to transform data using a private key, so that only a side that knows the private key can work with this data
Cryptography
symmetric
asymmetric
Diffie–Hellman Key Exchange(DHKE)
subject - Elliptic-curve cryptography (ECC)
Elliptic Curve Diffie–Hellman(ECDH)
Symmetric and asymmetric cryptography are used for securely exchanging messages between sides (e.g. side1 and side2, Alice and Bob, client and server...) in a non-secure environment
Symmetric-key (symmetric) cryptography (e.g. DES, AES) uses the same key to encrypt and decrypt. It is a kind of private key because it should be kept private to keep the communication private
Public-key (asymmetric) cryptography (e.g. ECC, RSA) uses a pair of mathematically related private/public keys. Data encrypted with the public key is decrypted with the private key, and vice versa: data encrypted with the private key is decrypted with the public key
Use cases of asymmetric cryptography:
Public key encryption (Subject Public Key Algorithms) - data encrypted with the public key is decrypted with the private key
Digital signature (DS) (Signature Algorithms) - a checksum encrypted with the private key is decrypted with the public key
Diffie–Hellman Key Exchange(DHKE) algorithm
Both sides arrive at the same shared secret key, which each side calculates from the public numbers (p, g), its own private key and the other side's public key. This shared secret key is then used by both sides as the key for encryption/decryption. It is based on a simple property of modular exponentiation
Static vs Ephemeral Keys:
Static - a long-term key, which implies authenticity
Ephemeral - generated anew every time and provides forward secrecy (FS): if a key is leaked, past communications remain secure because they used different keys
public numbers(p, g) and random private key = public key
another public key and private key and public numbers(p) = shared secret key
Example:
//public info
`p = 11` is a **public** prime number (prime). e.g [2, 3, 5, 7, 11, 13, 17, 19, 23...]
`g = 8`, is a **public** primitive root modulo `p`. e.g [2, 6, 7, 8] is primitive root modulo 11
//creating private keys
`a = 6` is any **private** key of side1(Alice)
`b = 9` is any **private** key of side2(Bob)
//calculating public keys
//<my_public_key> = g^<my_private_key> mod p
`A = 3` = g^a mod p = 8^6 mod 11 = 262144 mod 11. is a **public** key of side1(Alice)
`B = 7` = g^b mod p = 8^9 mod 11 = 134217728 mod 11. is a **public** key of side2(Bob)
//calculating shared secret key for every side
//s = <another_public_key>^<my_private_key> mod p
//Side1(Alice) has p,g,a,A,B
`s = 4` = B^a mod p = 7^6 mod 11
//Side2(Bob) has p,g,b,B,A
`s = 4` = A^b mod p = 3^9 mod 11
As a result, the two sides have the same shared secret key
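The numbers in this example can be checked with Python's built-in three-argument `pow`, which performs modular exponentiation. The variable names mirror the example; the tiny values are for illustration only and offer no security.

```python
# Public parameters from the example
p = 11  # public prime
g = 8   # public primitive root modulo p

# Private keys, each known to one side only
a = 6   # Alice
b = 9   # Bob

# Public keys: g^private mod p
A = pow(g, a, p)
B = pow(g, b, p)

# Each side combines the other side's public key with its own private key.
s_alice = pow(B, a, p)
s_bob = pow(A, b, p)
```

Both `s_alice` and `s_bob` come out equal because (g^b)^a and (g^a)^b are the same value modulo p.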
Let's say that we have a side3 (Eve) who tries to read the messages between side1 (Alice) and side2 (Bob). In this case Eve has:
//public info
`p = 11`
`g = 8`
//public Keys from Alice and Bob
`A = 3`
`B = 7`
To solve this task, Eve has to know one of the private keys (Alice's or Bob's). If Eve generates her own private/public keys and calculates a shared secret key from another public key (Alice's or Bob's), the result will be a different key (Eve-Alice or Eve-Bob)
Rivest–Shamir–Adleman(RSA)
Mathematics with prime numbers. Multiplying prime numbers to get a larger number is a simple task, while factoring a large number back into the original primes is much more difficult. Steps: key pair generation (private/public keys), sharing the public key, encryption/decryption
Elliptic-curve cryptography (ECC)
A large subject of asymmetric cryptography that uses the mathematics of elliptic curves. It is considered better than RSA because it maintains comparable security with much smaller keys. Steps: key pair generation (private/public keys), sharing the public key, encryption/decryption
Elliptic Curve Diffie–Hellman(ECDH)
Key Exchange algorithm - is very similar to DHKE algorithm, but it uses ECC point multiplication instead of modular exponentiations
In very simple representation
public info and random private key = public key (key pair)
another public key and private key = shared secret key
For example:
you generate ECC key pair(private/public) + specify public info(curve e.g P-384) which were used for generation(e.g. as a part of certificate)
another side is able to use public info(from certificate) to generate own private/public key pair and send you own public key
you are able to calculate shared secret key
Hashing
One-Way Hash Functions
Public context. A practically unique representation of data that everybody can calculate using an open algorithm, which is why it is used for verifying the integrity of the data (that the data was not changed or transformed)
Hash algorithms: MD2, MD5, SHA-1, SHA-224, SHA-256, SHA-384, SHA-512...
Encryption
Integrity - can receiver check originality of the message(message was not changed)
Authentication - can receiver check originality of the sender
Non-repudiation - if receiver sends this message to side3(some body else) can side3 check originality of the sender
Is used for:
Hash function - Encryption, Integrity
Message Authentication Code(MAC) - Encryption, Integrity, Authentication - Authenticate a message. When receiver gets a message and MAC it can check and verify originality using symmetric key.
Hash-based Message Authentication Code(HMAC)
Digital signature - Encryption, Integrity, Authentication, Non-repudiation - Used with asymmetric key. Sign/verify(calculate checksum) of data
creating:
MAC(symmetric_key, message) -> MAC_data_tag
HMAC(symmetric_key, message, hash_func) -> hash
sending:
message
MAC_data_tag
using:
(MAC(symmetric_key, message) -> MAC_data_tag2) == MAC_data_tag
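The create/send/verify flow for a MAC can be sketched with Python's standard `hmac` module. The key and message are hypothetical, and SHA-256 is an assumed choice of hash function.

```python
import hashlib
import hmac

def make_tag(symmetric_key: bytes, message: bytes) -> bytes:
    """creating: HMAC(symmetric_key, message, hash_func) -> tag"""
    return hmac.new(symmetric_key, message, hashlib.sha256).digest()

def verify_tag(symmetric_key: bytes, message: bytes, tag: bytes) -> bool:
    """using: recompute the tag and compare it in constant time."""
    expected = make_tag(symmetric_key, message)
    return hmac.compare_digest(expected, tag)

symmetric_key = b"shared-secret-key"  # hypothetical key known to both sides
message = b"transfer 10 coins"        # hypothetical message
tag = make_tag(symmetric_key, message)  # sent alongside the message
```

A receiver holding the same key accepts the pair `(message, tag)`, and rejects it if either the message or the key differs.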
creating:
Encrypt(private_key, (Digital signature(message) -> check_sum)) -> Encrypted(check_sum)
sending:
message
Encrypted(check_sum)
using:
1. Decrypt(public_key, Encrypted(check_sum)) -> check_sum
2. (Digital signature(message) -> check_sum2) == check_sum
Key Derivation Functions (KDF) - transform a password/weak key into a strong key (this process is called key stretching). For example, the back end can apply a KDF to a password and save its result (e.g. a hash) in the DB instead of saving the real password. One example is HMAC-based key derivation (HKDF)
KDF(weak_key) -> key
HKDF(salt, weak_key, hash_func) -> hash
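As a rough illustration of key stretching, the sketch below uses PBKDF2 from Python's `hashlib`. Note that this swaps in PBKDF2 rather than HKDF, because the standard library does not ship HKDF. The password, iteration count and salt length are assumptions.

```python
import hashlib
import secrets

def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """KDF(weak_key) -> key, here PBKDF2-HMAC-SHA256 with a 32-byte output."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = secrets.token_bytes(16)     # random salt, stored next to the result
key = derive_key("hunter2", salt)  # hypothetical weak password
```

The same password and salt always give the same key, so the stored result can be re-derived and compared later, while a different salt yields an unrelated key.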
Cryptography deals with numbers and strings. Basically, every digital thing in the entire universe is numbers. When I say numbers, I mean 0 and 1. You know what they are: binary. The images you see on screen, the music you listen to through your earphones, everything is binary. But our ears and eyes will not understand binary, right? Only the brain could understand that, and even if it could understand binary, it can't enjoy it. So we convert the binary to human-understandable formats such as mp3, jpg, etc. Let's term the process Encoding. It's a two-way process and can easily be decoded back to its original form.
Hashing
Hashing is another cryptographic technique, in which data, once converted to some other form, can never be recovered. In layman's terms, there is no process called de-hashing. There are many hash functions to do the job, such as SHA-512, MD5 and so on.
If the original value cannot be recovered, then where do we use this? Passwords! When you set up a password for your mobile or PC, a hash of your password is created and stored in a secure place. When you make a login attempt next time, the entered string is again hashed with the same algorithm (hash function) and the output is matched with the stored value. If it’s the same, you get logged in. Otherwise you are thrown out.
By applying a hash to the password, we can ensure that an attacker will never get our password even if he steals the stored password file. The attacker will have only the hash of the password. He can probably find a list of the most commonly used passwords, apply SHA-512 to each of them, and compare the result with the value in his hand. This is called a dictionary attack. But how long would he do this? If your password is random enough, do you think this method of cracking would work?
All the passwords in the databases of Facebook, Google and Amazon are hashed, or at least they are supposed to be hashed.
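The dictionary attack described above is short enough to sketch. The word list and the "stolen" hash are made up for illustration, and the hashes are unsalted SHA-512 as in the text.

```python
import hashlib
from typing import List, Optional

def sha512_hex(password: str) -> str:
    return hashlib.sha512(password.encode("utf-8")).hexdigest()

# Hypothetical data: a hash taken from a stolen password file,
# and a tiny list of commonly used passwords.
stolen_hash = sha512_hex("letmein")
common_passwords = ["123456", "password", "letmein", "qwerty"]

def dictionary_attack(target_hash: str, candidates: List[str]) -> Optional[str]:
    """Hash every candidate and return the one that matches, if any."""
    for candidate in candidates:
        if sha512_hex(candidate) == target_hash:
            return candidate
    return None

cracked = dictionary_attack(stolen_hash, common_passwords)
```

A common password is recovered immediately, whereas a sufficiently random password is simply not in the list and the function returns `None`.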
Then there is Encryption
Encryption lies in between hashing and encoding. Encoding is a two way process and should not be used to provide security. Encryption is also a two way process, but original data can be retrieved if and only if the encryption key is known. If you don’t know how encryption works, don’t worry, we will discuss the basics here. That would be enough to understand the basics of SSL. So, there are two types of Encryption namely Symmetric and Asymmetric encryption.
Symmetric Key Encryption
I am trying to keep things as simple as I could. So, let’s understand the symmetric encryption by means of a shift algorithm. This algorithm is used to encrypt alphabets by shifting the letters to either left or right. Let’s take a string CRYPTO and consider a number +3. Then, the encrypted format of CRYPTO will be FUBSWR. That means each letter is shifted to right by 3 places.
Here, the word CRYPTO is called the Plaintext, the output FUBSWR is called the Ciphertext, the value +3 is called the Encryption key (symmetric key), and the whole process is a cipher. This is one of the oldest and most basic symmetric key encryption algorithms, and its first usage was reported during the time of Julius Caesar. So, it was named after him, and it is the famous Caesar Cipher. Anyone who knows the encryption key can apply the reverse of Caesar's algorithm and retrieve the original Plaintext. Hence it is called Symmetric Encryption.
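A minimal sketch of the shift cipher, assuming upper-case A-Z input only (other characters are passed through unchanged); the function name is just for illustration.

```python
def caesar(text: str, shift: int) -> str:
    """Shift each upper-case letter by `shift` places, wrapping around at Z."""
    result = []
    for ch in text:
        if "A" <= ch <= "Z":
            offset = (ord(ch) - ord("A") + shift) % 26
            result.append(chr(ord("A") + offset))
        else:
            result.append(ch)  # leave anything else untouched
    return "".join(result)

ciphertext = caesar("CRYPTO", 3)    # encrypt with key +3
plaintext = caesar(ciphertext, -3)  # decrypt by applying the reverse shift
```

The same key (3) is used in both directions, which is exactly what makes the scheme symmetric.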
Asymmetric Key Encryption
We know that, in Symmetric encryption same key is used for both encryption and decryption. Once that key is stolen, all the data is gone. That’s a huge risk and we need more complex technique. In 1976, Whitfield Diffie and Martin Hellman first published the concept of Asymmetric encryption and the algorithm was known as Diffie–Hellman key exchange. Then in 1978, Ron Rivest, Adi Shamir and Leonard Adleman of MIT published the RSA algorithm. These can be considered as the foundation of Asymmetric cryptography.
As compared to Symmetric encryption, in Asymmetric encryption there will be two keys instead of one. One is called the Public key, and the other one is the Private key. Theoretically, during initiation we can generate the Public-Private key pair on our machine. The Private key should be kept in a safe place and should never be shared with anyone. The Public key, as the name indicates, can be shared with anyone who wishes to send encrypted text to you. Now, those who have your public key can encrypt the secret data with it. If the key pair was generated using the RSA algorithm, then they should use the same algorithm while encrypting the data. Usually the algorithm will be specified in the public key. The encrypted data can only be decrypted with the private key, which is owned by you.
Source: SSL/TLS for dummies part 1 : Ciphersuite, Hashing,Encryption | WST (https://www.wst.space/ssl-part1-ciphersuite-hashing-encryption/)
Encryption: The purpose of encryption is to transform data in order to keep it secret, e.g. sending someone a secret text that only they should be able to read, or sending passwords over the Internet.
Instead of focusing on usability, the goal is to ensure that the data can be sent secretly and can only be seen by the user to whom you sent it.
It transforms the data into another format using a secret key, and those users who have the secret key are able to see the message by reversing the process.
E.g. AES, Blowfish, RSA
The encrypted output may simply look like this: FhQp6U4N28GITVGjdt37hZN
Hashing: Technically, we can say that it takes an arbitrary input and produces a fixed-length string.
The most important thing here is that you can't go from the output to the input. It provides strong evidence that the given information has not been modified.
The process is to take an input, hash it, and then sign the hash with the sender's private key; once the receiver has received it, they can validate it with the sender's public key.
If the hash is wrong and doesn't match the expected hash, we can't trust any of the information. E.g. MD5, SHA, ...

PHP Two Way Encryption With Checksum

I am trying to pass a JSON string from one web application to another using URL parameters (for an internal SSO server).
What I need to do is be able to encrypt the JSON string (which is a user payload object) with a pre-shared key, forward the user to the service provider application with the payload attached as a URL parameter and then on the service provider application decrypt the payload back into a JSON string to get the required information.
Now this part isn't as much of an issue thanks to all of PHP's built in encryption functions but the next part is the difficulty. I am needing to embed a checksum within the encrypted string which can be checked when decrypting it so that if it has been modified in transit then I can raise an exception.
The purpose of this is to make sure that the user payload has not been modified in transit either accidentally or deliberately.
You want to provide more than a "checksum" (usually defined as "calculable by any party"); you want to provide an authentication tag or message authentication code (MAC). You have a couple options:
Use an "authenticated encryption" (AE) or "authenticated encryption with associated data" (AEAD) cipher to do this. AE(AD) ciphers provide an "authentication tag" over the cipher text, either in a single pass or with a repeated process over the encrypted cipher text. Examples (probably available in whichever PHP cipher library you're using) are GCM, EAX, and CCM. This is recommended, as the decryption operation will fail if the authentication tag is not verified, and only one shared secret (key) is necessary.
You can construct the system yourself using cryptographic primitives. This is less ideal, as you are responsible for more independent pieces, you need to manage more keys (if you have access to an OMAC implementation, you can use the same key), and your individual construction is not vetted by third parties (aka the collective work of the internet). If you follow this path, you need to keep some key details in mind:
Use a strong hash-based message authentication code (HMAC) such as HMAC/SHA-256, -384, or -512. Do not use SHA-1 or MD5, as these are easily brute forced.
Verify the HMAC before decrypting the cipher text. Any HMAC that fails means the entire cipher text should be discarded. You can remember this (on the generating side) as Encrypt Then MAC, and if you search for it, you'll see that not following this advice is the source of many cryptographic vulnerabilities and implementation exploits.
Verify the HMAC with a constant-time algorithm (i.e. do not use a short-circuit string equality comparison, the default in Java). PHP provides hash_equals to do this. Here's a quick explanation of timing attacks and a code review of a PHP example.
For either choice you'll want to encode the resulting cipher text and authentication tag with URL-safe Base64 in order to avoid data loss or corruption. If your message format is not strictly structured with included lengths, you'll have to pre-share the protocol ahead of time (i.e. for message m of length n bytes -> 16 bytes IV | n-48 bytes cipher text | 32 bytes HMAC).
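The message layout above (16 bytes IV | cipher text | 32 bytes HMAC, URL-safe Base64) can be sketched as follows. It is written in Python for brevity; in PHP, `hash_hmac` and `hash_equals` would play the roles of `hmac.new` and `hmac.compare_digest`. The function names and the separate `mac_key` are assumptions, and the cipher text is treated as opaque bytes produced by whatever cipher you use, since the sketch covers only the authentication and packaging steps.

```python
import base64
import hashlib
import hmac
import secrets
from typing import Tuple

IV_LEN = 16   # bytes of IV at the front of the message
TAG_LEN = 32  # bytes of HMAC-SHA256 tag at the end

def pack(mac_key: bytes, iv: bytes, ciphertext: bytes) -> str:
    """Build IV | ciphertext | HMAC and encode it as URL-safe Base64."""
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(iv + ciphertext + tag).decode("ascii")

def unpack(mac_key: bytes, token: str) -> Tuple[bytes, bytes]:
    """Verify the HMAC first; only then hand back IV and ciphertext."""
    raw = base64.urlsafe_b64decode(token.encode("ascii"))
    if len(raw) < IV_LEN + TAG_LEN:
        raise ValueError("message too short")
    iv = raw[:IV_LEN]
    ciphertext = raw[IV_LEN:-TAG_LEN]
    tag = raw[-TAG_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed, discard the message")
    return iv, ciphertext  # safe to pass on to decryption

mac_key = secrets.token_bytes(32)           # hypothetical pre-shared MAC key
iv = secrets.token_bytes(IV_LEN)            # fresh random IV per message
ciphertext = b"opaque ciphertext bytes..."  # placeholder for real cipher output
token = pack(mac_key, iv, ciphertext)
```

The tag covers the IV as well as the cipher text, so changing either one makes `unpack` raise before any decryption is attempted.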
Last note: always use a unique, non-predictable IV for each message that is encrypted with a key. Many people gloss over this, because it's "easy to just use 0x00 * 16", but any stream cipher mode of operation like CTR used as the foundation of GCM and CCM will lose fundamental security if two messages are encrypted with the same IV and key.
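To see why a reused IV is fatal for stream-style modes, here is a toy sketch. The keystream generator below is NOT a real cipher; it is a hypothetical CTR-like construction built from HMAC-SHA256 purely to illustrate the point, and the key and messages are made up.

```python
import hashlib
import hmac

def toy_keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy CTR-style keystream: HMAC-SHA256(key, iv || counter) blocks."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = iv + counter.to_bytes(8, "big")
        out += hmac.new(key, block_input, hashlib.sha256).digest()
        counter += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def toy_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    return xor_bytes(plaintext, toy_keystream(key, iv, len(plaintext)))

key = b"k" * 32
iv = bytes(16)  # the "easy" all-zero IV, reused for both messages
p1 = b"attack at dawn!!"
p2 = b"retreat at nine!"
c1 = toy_encrypt(key, iv, p1)
c2 = toy_encrypt(key, iv, p2)

# The keystream cancels out: an eavesdropper learns p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)
```

Anyone who knows or guesses one plaintext can XOR it with `leak` to recover the other, which is the loss of security a unique IV per message prevents.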

Encryption: Use of initialization vector vs key?

I am using PHP's mcrypt library and the AES-256 (rijndael) algorithm, which requires both a key + initialization vector to run.
My logical brainside isn't really going along with this. Isn't just one key enough?
Theoretical scenario:
If I had encrypted sensitive data stored in a database, which only the owner should be able to decrypt, would it be appropriate to use the user's hashed password as either the key or the initialization vector for his or her data?
Should the key be considered more private than the initialization vector or is it the other way around?
No, in fact an IV is vital in most implementations. The IV is also considered to be safe for public use; for instance, the IV is transmitted in plain text for WEP and WPA1/WPA2. The problem arises when the same key+IV is used to encrypt the same plain text: the cipher texts will be identical unless you use a unique IV. If an attacker can encrypt arbitrary plain text with this key and then view the cipher text, this is a much faster way of brute forcing other cipher text that the attacker has obtained.
Not only that, the IV must be random or you would be in violation of CWE-329. The reason why this is a problem is a bit more subtle and I didn't get it at first. You didn't mention this, but I hope you are using either the CBC or CMAC modes.
The use of a hash function on a password is nearly identical to using a String2Key function. This is a solid design so long as an attacker can't use SQL Injection to obtain the key.
Initialization Vector (IV) is not a key at all, and is not secret. In fact, it is often exposed (e.g. prepended to the encrypted data). It is used as an additional random input to the encryption algorithm so that the result of encrypting the same clear data is different each time you use a different IV. This way, statistics cannot be gathered on the encrypted data. It does not "improve" the encryption strength by itself.
You can look here for nice diagrams showing how and why IV is used.
Do not use a hashed password as the single source for both key and IV. As a rule of thumb, you should generate a random IV EVERY TIME you update encrypted data and store the IV with this data. The key can be reused multiple times, but use salted hashing and store the salt with the data too.
If you just hash user passwords and use the hashes as encryption keys, users with the same password will have the same key. Depending on your database structure and the intruder's access rights, there could be some unfortunate cases where users with the same password can be detected. Add at least the unique username to this hash.
If you do not change the IV for every data update, information about data changes can be leaked. With CBC or CFB mode, identical leading plaintext blocks will be encrypted to identical ciphertext up to the first plaintext change, so the position of this change can be determined.
If you're using the EBP mode of the block cipher, or most of the stream ciphers, identical key+IV combinations on different plaintexts will offer the attackers a direct view on the XOR result of the key. This by extension reveals the key itself and to some extent the password.
But do I mean IVs are definitely necessary? No. As long as you change your password each and every time on your next plaintext block(even the same block the second time), you're completely fine without IVs. In fact, all that an IV does is the automation of the above process.
