There are two ways to specify a key and an IV for a RijndaelManaged object. One is by calling CreateEncryptor:
var encryptor = rij.CreateEncryptor(Encoding.UTF8.GetBytes(key), Encoding.UTF8.GetBytes(iv));
and the other is by directly setting the Key and IV properties:
rij.Key = "1111222233334444";
rij.IV = "1111222233334444";
As long as the Key and IV are 16 bytes long, both methods produce the same result. But if your key is shorter than 16 bytes, the first method still lets you encrypt the data, while the second fails with an exception.
Now this may sound like a purely academic question, but I have to use PHP and a key which is only 10 bytes long in order to send an encrypted message to a server which uses the first method.
So the question is: How does CreateEncryptor expand the key and is there a PHP implementation? I cannot alter the C# code so I'm forced to replicate this behaviour in PHP.
I'm going to have to start with some assumptions. (TL;DR - The solution is about two-thirds of the way down but the journey is way cooler).
First, in your example you set IV and Key to strings. That can't be done directly, so I'm going to assume we call GetBytes() on the strings; a terrible idea, by the way, since usable ASCII covers far fewer of the 256 possible values of a byte. That's what GenerateIV() and GenerateKey() are for. I'll come back to this at the very end.
Next I'm going to assume you're using the default block, key and feedback size for RijndaelManaged: 128, 256 and 128 respectively.
Now we'll decompile the Rijndael CreateEncryptor() call. When it creates the Transform object it doesn't do much of anything with the key at all (except set m_Nk, which I'll come to later). Instead it goes straight to generating a key expansion from the bytes it is given.
Now it gets interesting:
switch (this.m_blockSizeBits > rgbKey.Length * 8 ? this.m_blockSizeBits : rgbKey.Length * 8)
So:
if 128 >  len(k) * 8  ->  switch on 128
if 128 <= len(k) * 8  ->  switch on len(k) * 8
128 / 8 = 16, so if len(k) is 16 we can expect to switch on len(k) x 8. If it's more, then it will switch on len(k) x 8 too. If it's less it will switch on the block size, 128.
Valid switch values are 128, 192 and 256. That means it will only fall to default (and throw an exception) if it's over 16 bytes in length and not a valid block (not key) length of some sort.
In other words, it never checks against the key length specified in the RijndaelManaged object. It goes straight in to the key expansion and starts operating at the block level, as long as the key length (in bits) is one of 128, 192, 256 or less than 128. This is actually a check against the block size, not the key size.
So what happens now that we've patently not checked the key length? The answer lies in the nature of the key schedule. When you enter a key into Rijndael, the key needs to be expanded before it can be used; in this case it's expanded to 176 bytes. To accomplish this, it uses an algorithm which is specifically designed to turn a short byte array into a much longer one.
Part of that involves checking the key length. A bit more decompilation fun and we find that this is defined as m_Nk. Sound familiar?
this.m_Nk = rgbKey.Length / 4;
Nk is 4 for a 16-byte key (that's 4 words, for anyone wondering where the magic number 4 came from), and less for shorter keys; note the integer division, which for our 10-byte key gives Nk = 10 / 4 = 2. This causes a curious fork in the key scheduler: there's a specific path for Nk <= 6.
Without going too deep into the details, this actually happens to 'work' (i.e. not crash in a fireball) with a key length of less than 16 bytes... until it gets below 8 bytes.
Then the entire thing crashes spectacularly.
So what have we learned? When you use CreateEncryptor you are actually throwing a completely invalid key straight in to the key scheduler and it's serendipity that sometimes it doesn't outright crash on you (or a horrible contractual integrity breach, depending on your POV); probably an unintended side effect of the fact there's a specific fork for short key lengths.
For completeness' sake we can now look at the other approach, where you set the Key and IV on the RijndaelManaged object. These are stored in the SymmetricAlgorithm base class, which has the following setter:
if (!this.ValidKeySize(value.Length * 8))
throw new CryptographicException(Environment.GetResourceString("Cryptography_InvalidKeySize"));
Bingo. Contract properly enforced.
The obvious answer is that you cannot replicate this in another library unless that library happens to contain the same glaring issue, which I'm going to call a bug in Microsoft's code because I really can't see any other option.
But that answer would be a cop out. By inspecting the key scheduler we can work out what's actually happening.
When the expanded key is initialised, it is populated with 0x00s. The first Nk words are then overwritten with our key (in our case Nk = 2, so only the first 2 words, i.e. the first 8 bytes of our 10-byte key, are ever read). Then it enters a second stage of expansion, populating the rest of the expanded key beyond that point.
So if everything beyond our first 8 key bytes is effectively 0x00 at that point, we can just zero-pad our key out to 16 bytes, right? No, because that shifts Nk up to Nk = 4. As a result, although the first 4 words (16 bytes) will be populated as we expect, the second stage will begin expanding at the 17th byte, not the 9th!
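To make that concrete, here is a structural PHP sketch of the two-stage expansion described above. It is word-level only: sub_rot() is a stand-in for the real SubWord/rotation/Rcon step, so this illustrates where stage two starts writing, not a working cipher:
const TOTAL_WORDS = 44; // Nb * (Nr + 1) = 4 * 11 words for a 128-bit block
// Placeholder for the real SubWord/rot3/Rcon transform; any non-trivial
// mixing demonstrates the structural point just as well.
function sub_rot(int $t, int $round): int {
    return ((($t << 8) | ($t >> 24)) & 0xFFFFFFFF) ^ $round;
}
function expand(string $key): array {
    $nk = intdiv(strlen($key), 4);           // m_Nk = rgbKey.Length / 4
    $w = array_fill(0, TOTAL_WORDS, 0);      // initialised with 0x00s
    for ($i = 0; $i < $nk; $i++) {           // stage 1: copy Nk words of key
        $w[$i] = unpack('V', substr($key, 4 * $i, 4))[1];
    }
    for ($i = $nk; $i < TOTAL_WORDS; $i++) { // stage 2: expand from word Nk
        $t = $w[$i - 1];
        if ($i % $nk === 0) {
            $t = sub_rot($t, intdiv($i, $nk) - 1);
        }
        $w[$i] = $w[$i - $nk] ^ $t;
    }
    return $w;
}
// An 8-byte key (Nk = 2) and the same key zero-padded to 16 bytes (Nk = 4)
// diverge as soon as stage two starts: word 2 in one, word 4 in the other.
$short  = expand('12345678');
$padded = expand("12345678\0\0\0\0\0\0\0\0");
var_dump($short[2] === $padded[2]); // bool(false)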
The solution then is utterly trivial. Rather than padding our initial key with 6 additional bytes, just chop off the last 2 bytes.
So your direct answer in PHP is:
$key = substr($key, 0, -2);
Simple, right? :)
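Well, almost: you still need a PHP Rijndael that reproduces the same short-key schedule. As noted above, a correct library won't; implementations derived from the original reference code might (the legacy mcrypt extension is one candidate, but treat that as an assumption and verify a test vector against the real server first). A hedged sketch, assuming CBC mode with PKCS#7 padding (the RijndaelManaged defaults) and a library that uses an 8-byte key as-is rather than zero-padding it:
// Hypothetical interop sketch -- verify against the real server before use.
$key = substr($key, 0, -2);                 // 10-byte key -> effective 8 bytes
$block = 16;                                // 128-bit block size
$pad = $block - (strlen($plaintext) % $block);
$plaintext .= str_repeat(chr($pad), $pad);  // manual PKCS#7 padding
$ciphertext = mcrypt_encrypt(MCRYPT_RIJNDAEL_128, $key,
                             $plaintext, MCRYPT_MODE_CBC, $iv);
If your mcrypt build zero-pads short keys instead, you'll have to port the schedule sketched earlier onto real Rijndael primitives.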
Now you can interop with this encryption function. But don't. It can be cracked.
Assuming your key uses lowercase, uppercase and digits you have an exhaustive search space of only 218 trillion keys.
62 values (26 + 26 + 10) is the search space of each byte, because you're never using the other 194 (256 - 62) values. Since we have 8 bytes, there are 62^8 possible combinations: 218 trillion.
How fast can we try all the keys in that space? Let's ask openssl what my laptop (running lots of clutter) can do:
Doing aes-256 cbc for 3s on 16 size blocks: 12484844 aes-256 cbc's in 3.00s
That's 4,161,615 passes/sec. 218,340,105,584,896 / 4,161,615 / 3600 / 24 = 607 days.
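As a quick sanity check of that arithmetic (rate taken from the benchmark above):
// Sanity-checking the figures above.
$keys = 62 ** 8;              // 218,340,105,584,896 possible keys
$rate = 12484844 / 3;         // ~4,161,615 passes/sec from the benchmark
printf("%.0f keys, ~%.0f days\n", $keys, $keys / $rate / 86400); // ~607 days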
Okay, 607 days isn't bad. But I can always just fire up a bunch of Amazon servers and cut that down to ~1 day by asking 607 equivalent instances to calculate 1/607th of the search space. How much would that cost? Less than $1000, assuming that each instance was somehow only as efficient as my busy laptop. Cheaper and faster otherwise.
There is also an implementation that is twice the speed of openssl, so cut whatever figure we've ended up with in half.
Then we've got to consider that we'll almost certainly find the key before exhausting the entire search space. So for all we know it might be finished in an hour.
At this point we can assert that if the data is worth encrypting, it's probably worth the effort of cracking the key.
So there you go.
Reading about the max_input_vars setting made me read a lot about PHP's internals for handling arrays. This is not really a question, but rather me answering my own question of "why do we really need max_input_vars". It is not localized to PHP either; it actually applies to a lot of other programming languages as well.
A problem:
Compare these two small php scripts:
$data = array();
for ($key = 0; $key <= 1073709056; $key += 32767) { // step = 2^15 - 1
    $data[$key] = 0;
}
Can check it here. Everything normal, nothing unexpected. Execution time is close to 0.
And this mostly identical one (the difference is just 1):
$data = array();
for ($key = 0; $key <= 1073709056; $key += 32768) { // step = 2^15
    $data[$key] = 0;
}
Check it here. Nothing is normal, everything is unexpected: you exceed the maximum execution time, so it is at least 3000 times slower!
The question is why does it happen?
I posted it here together with an answer because this vastly improved my knowledge about php internals and I learned new things about security.
The problem is not the loop; the problem is how PHP and many other languages (Java, Python, ASP.NET) store key/value pairs in hash data structures. PHP uses a hash table to store arrays, which theoretically makes storing and retrieving data very fast: O(1). Problems arise when more than one key maps to the same slot, creating hash collisions. Inserting into such a slot costs O(n), and therefore inserting n colliding keys jumps from O(n) to O(n^2).
And this is exactly what happens here. Changing the step from 32767 to 32768 takes the keys from no collisions at all to everything colliding in the same slot.
This happens because of the way PHP arrays are implemented in C. The hash table size is always a power of 2 (arrays of 9 to 16 elements get a table of size 16). If an array key is an integer, the hash is that integer with a mask applied on top of it; the mask is the table size minus 1, in binary. This means that if someone inserts the keys 0, 32, 64, 128, 256, ... and so on into an associative array with a 32-slot table, they all map to the same slot, and that slot degenerates into a linked list. The example above creates exactly this situation.
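To see the masking in action (assuming the table has grown to 2^15 slots, as it will for the loops above):
$mask = (1 << 15) - 1;   // tableSize - 1, in binary: fifteen 1-bits
// step 32768: every key is a multiple of 2^15, so masking zeroes it out
foreach ([32768, 65536, 98304] as $key) {
    printf("key %6d -> bucket %d\n", $key, $key & $mask); // always bucket 0
}
// step 32767: keys spread across different buckets
foreach ([32767, 65534, 98301] as $key) {
    printf("key %6d -> bucket %d\n", $key, $key & $mask); // 32767, 32766, 32765
}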
Processing this takes a lot of CPU, which is why you see the huge time increase. It means developers should be really careful when accepting outside data that gets parsed into an array: people can craft such data easily and DoS the server. That data can come from $_GET or $_POST requests (this is why you can limit their count with max_input_vars), XML or JSON.
Here are the resources I used to learn about these things:
res 1
res 2
res 3
res 4
res 5
I don't know anything about PHP specifically, but 32767 is the maximum value of a signed 2-byte number. Increasing it to 32768 would require a 3-byte number (which is never used, so it would really be 4 bytes), which would in turn make everything slower.
I have a very large integer, 12-14 digits long, and I want to encrypt/compress it into an alphanumeric value so that the integer can be recovered later from that value. I tried converting the integer to base 62 and mapping the digits to a-zA-Z0-9, but the value this generates is 7 characters long. That is still too long; I want to get down to about 4-5 characters.
Is there a general way to do this, or some method by which it can be done so that recovering the integer is still possible? I am asking about the mathematical aspects here, but I would be programming it in PHP, which I only recently started using.
Edit:
I was thinking in terms of assigning a masking bit and using it in some fashion to generate fewer characters. I am aware that the range is not enough; that is why I was focusing on a mathematical trick or a clever representation. Base 62 was an idea I already applied, but it is not working out.
14-digit decimal numbers can express 100,000,000,000,000 values (10^14).
5 characters of a 62-character alphabet can express 916,132,832 values (62^5).
You cannot cram the number of values of a 14-digit number into a 5-character base-62 string. It's simply not possible to express each possible value uniquely. See http://en.wikipedia.org/wiki/Pigeonhole_principle. Even base 64 with 7 characters is not enough (64^7 = only 4,398,046,511,104 possible values). In fact, to target a 5-character short string you'd need to compensate with a 631-character alphabet (631^5 = 100,033,806,792,151).
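You can check these counts yourself with bcmath:
// Pigeonhole arithmetic from above, checked with arbitrary precision:
echo bcpow('62', '5'), "\n";   // 916132832
echo bcpow('64', '7'), "\n";   // 4398046511104
echo bcpow('631', '5'), "\n";  // 100033806792151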
Even compression doesn't help you. It would mean that two or more numbers have to compress to the same compressed string (because there aren't enough unique compressed values to go round), which logically means it's impossible to decompress them back into two different values.
To illustrate this very simply: say my alphabet and target "string length" consist of one bit. That one bit can be 0 or 1, so it can express 2 unique values. Say I have a compression algorithm which compresses anything and everything into this one bit. ... How could I possibly decompress 100,000,000,000,000 unique values back out of one bit with two possible states? If you could solve that problem, bandwidth and storage concerns would evaporate overnight and you'd be a billionaire.
With 95 printable ASCII characters you can switch to base 95 encoding instead of 62:
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
That way an integer string of length X can be compressed into length Y base 95 string, where
Y = X * log(10) / log(95) ≈ X / 2
which is pretty good compression. So from length 12 you get down to 6 or 7. If the purpose of the compression is to save bandwidth by using JSON, then base 92 can be a good choice (excluding ", \ and /, which get escaped in JSON).
Surely you can get better compression but the price to pay is a larger alphabet. Just replace 95 in the above formula by the number of symbols.
Unless of course, you know the structure of your integers. For instance, if they have plenty of zeroes, you can base your compression on this knowledge to get much better results.
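For illustration, a minimal bcmath sketch of such an encoder, using the 94 visible ASCII characters 0x21-0x7E (add the space character if you really want all 95):
// Encode a non-negative decimal string into an arbitrary alphabet.
function encode_base(string $decimal, string $alphabet): string {
    $base = (string) strlen($alphabet);
    $out = '';
    do {
        $out = $alphabet[(int) bcmod($decimal, $base)] . $out;
        $decimal = bcdiv($decimal, $base, 0); // integer division
    } while (bccomp($decimal, '0') > 0);
    return $out;
}
$alphabet = implode('', array_map('chr', range(33, 126))); // '!' .. '~'
echo encode_base('12345678901234', $alphabet); // 14 digits -> 7 symbols
Decoding is the mirror image: walk the string, bcmul the accumulator by the base and bcadd each symbol's index.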
Because of the pigeonhole principle you will end up with some values that get compressed and other values that get expanded. It is simply impossible to create a compression algorithm that compresses every possible input string (in your case, your numbers).
If you force the cardinality of the output set to be smaller than the cardinality of the input set, you get collisions (i.e. several input strings get "compressed" to the same compressed string). A compression algorithm should be reversible, right? :)
I need to group a number of parameters into a short, non-predictable, spellable code. Ex:
serial: WJ-JHA5JK7E9RTAS
date: 04/02/2013
days: 30
valid: true
Compressed code could look like this: 3xy9b0laiph3s
My goal is to make the code as short as possible (without losing any information, of course). The algorithm must be easily implemented in other languages as well (so it can't have crazy specific dependencies). Any thoughts?
Most of the time this is handled by storing the data somewhere and creating an ID which is then compressed and used. The most common users of this system are short URL sites.
Store data in DB and get row ID
convert the base-10 row ID to base 32 (base_convert in PHP, which supports bases up to 36; base 64 would need a custom alphabet)
use the new ID which looks like '4F7c'
When that ID is passed back, just convert it back to base 10 and look up the data in the DB.
Code:
$id = 23590;
print $id;                           // 23590
$hash = base_convert($id, 10, 32);
print $hash;                         // "n16"
$id = base_convert($hash, 32, 10);
print $id;                           // back to 23590
For arbitrary short strings there is not enough information to apply generalized predictive methods of compression.
You'll need to exploit the known features of your data.
Example:
Serial numbers appear to be capital letters and numbers - 36 values per character - and 15 characters long. That's 36^15 possible values which will fit in 78 bits.
Date can be converted into number of days since a fixed date. If all the dates are known to fall within 100 years of each other, this can be stored in 16 bits.
If days is never more than a year's worth, it can be stored in 9 bits.
Valid can be stored in 1 bit.
That's 104 bits, which can be Base64 encoded to 18 characters
Note that oftentimes serial numbers have a checksum character or two. If you know how the checksum is calculated, you can omit this character and recalculate it upon decoding. This could save you a Base64 digit here.
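Putting those estimates together, here is a hedged gmp sketch (the field order and widths are my own choices for illustration, and the serial is assumed to stay 15 base-36 characters):
// Sketch of the bit-packing described above: 78-bit serial + 16-bit date
// + 9-bit day count + 1-bit flag = 104 bits -> 13 bytes -> 18 Base64 chars.
function pack_code(string $serial, string $date, int $days, bool $valid): string {
    $n = gmp_init(str_replace('-', '', $serial), 36);  // serial as base-36
    $epochDays = intdiv((new DateTime($date))->getTimestamp(), 86400);
    $n = gmp_add(gmp_mul($n, 1 << 16), $epochDays);    // 16-bit date field
    $n = gmp_add(gmp_mul($n, 1 << 9), $days);          // 9-bit day count
    $n = gmp_add(gmp_mul($n, 2), (int) $valid);        // 1-bit valid flag
    return rtrim(base64_encode(gmp_export($n)), '=');  // strip '=' padding
}
echo pack_code('WJ-JHA5JK7E9RTAS', '2013-02-04', 30, true);
Unpacking is the mirror image: gmp_import() the Base64-decoded bytes, then peel each field off with masks and shifts.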
If you want to make the result less predictable, without worrying with heavy encryption, you can just deterministically shuffle your encoded string.
UUencode or Base64 could work, but both of these encodings are case-sensitive. You could adapt them for your purposes (lowercase letters only). If you always have exactly the same amount of data, this would be the easiest solution, but not the minimal one.
I have a need to store an encrypted but recoverable (by admin) password in MySQL, from PHP. AFAIK, the most straightforward way to do this is with openssl_public_encrypt(), but I'm not sure what column type is needed. Can I make any reliable judgment on the maximum length of encrypted output, based upon the size of the key and the input?
Or am I forced to use a huge field (e.g. BLOB), and just hope it works all the time?
The openssl_public_encrypt() function limits the size of the data you can encrypt to the length of the key; if you use padding (recommended), you lose a further 11 bytes.
However, the PKCS#1 standard, which OpenSSL uses, specifies a padding scheme (so you can encrypt smaller quantities without losing security), and that padding scheme takes a minimum of 11 bytes (it will be longer if the value you're encrypting is smaller). So the highest number of bits you can encrypt with a 1024-bit key is 936 bits because of this (unless you disable the padding by adding the OPENSSL_NO_PADDING flag, in which case you can go up to 1023-1024 bits). With a 2048-bit key it's 1960 bits instead.
Of course you should never disable padding, because that would make identical passwords encrypt to identical values.
So for a 1024-bit key the maximum password input length is 117 chars.
For a 2048-bit key it's 245 chars.
I'm not 100% sure of the output length, but a simple trial should confirm it; the output is a simple function of the key length, so for a 2048-bit key I suspect it is 256 bytes.
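A quick way to run that trial (a sketch; it generates a throwaway 2048-bit key pair):
// Confirming the output length: RSA ciphertext size == key size in bytes.
$res = openssl_pkey_new([
    'private_key_bits' => 2048,
    'private_key_type' => OPENSSL_KEYTYPE_RSA,
]);
$pub = openssl_pkey_get_details($res)['key'];        // public key as PEM
openssl_public_encrypt('s3cret-password', $out, $pub, OPENSSL_PKCS1_PADDING);
var_dump(strlen($out));                              // int(256)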
You should use a binary string with the required length to store the password.
For speed reasons it's best to use a limited length index on the field.
Do not use blob (!) because that will slow things way down for no benefit.
CREATE TABLE user (
  id integer unsigned auto_increment primary key,
  username varchar(50) not null,
  passRSA binary(256),        -- double-check the length
  index ipass (passRSA(10))   -- only index the first 10 bytes, for speed
) ENGINE = InnoDB;
Adding extra bytes to the index will just slow things down and grow the index file for no benefit.
I wrote this simple line to get a random, unique code (just 8 characters) each time:
echo substr(md5(uniqid(rand(), true)),0,8);
Output:
077331e5
5af425b1
0fc7dcf2
...
My question is whether I can be sure I'll never get a collision (duplicate), or whether that can happen.
P.S.: Is it better to use time()?
echo substr(md5(uniqid(time(), true)),0,8);
Hashes can have collisions. By taking a substring of the hash you are just upping the chance of that happening.
Regardless of what you feed into md5(), by taking the substring you're throwing away a large part of md5's output and constricting the range of possible hashes. md5 outputs a 128-bit string, and you're limiting it to 32 bits, so you've gone from a roughly 1 in 10^38 chance of a collision to 1 in 4 billion.
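And collisions will almost certainly show up long before you exhaust those 4 billion values: by the birthday bound, a duplicate becomes more likely than not after roughly 1.18 x sqrt(2^32) codes:
// Birthday bound for a 32-bit code space (a rough rule of thumb).
$space = 2 ** 32;
printf("~50%% collision chance after ~%d codes\n",
       (int) ceil(1.17741 * sqrt($space))); // ~77163 codes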
Your "unique code" is a string of eight hexadecimal digits, and thus it has 4294967296 possible values. You are thus guanteed to get a duplicate of an earlier code by the 4294967297th time you run it.
PHP has a function for producing unique IDs, called uniqid().
You stand a fair chance of your 8-character MD5 substring being unique, but as with any random string, the shorter you make it, the more likely you are to get a collision.
Short answer: it can happen. There's a discussion here about the collision space of MD5 that you might want to check out. Doing a substring of the MD5 will make the collision space much, much larger.
A better solution may be the answer proposed here, possibly checking it against other unique IDs that you've generated.
Your code returns part of a hash. Hashes are for hashing; you cannot guarantee any pattern in their results (e.g. uniqueness).
Also, you are getting only part of the hash, and each character of a hash is a hexadecimal digit (0 to 9 or a to f, 16 possibilities). A simple calculation:
16 ^ 8 = 4 294 967 296
shows how many unique values your code can generate. This number (4,294,967,296) means that if you run the function more than 4,294,967,296 times, the generated values are guaranteed not to all be unique. And of course duplicates can, and very probably will, appear after far fewer values.