AES-128 UTF-8 characters in key (iOS ↔ PHP)

After a really long time of research I finally got encoding / decoding working on iOS and PHP. I wrote a little algorithm that uses a pool of randomly created 16-byte keys on both iOS and PHP.
The algorithm keeps both systems synchronized so that no key is used more than once.
However, my keys contain some UTF-8 characters (I think). I'm using the standard [a-z][A-Z][0-9] characters plus these special characters:
!"§$%&/()=?+-*#.,£[]|{}
Unfortunately, when using one of these keys, the decryption fails on PHP. On iOS I'm using an extension of the stringByAddingPercentEscapes: method which escapes a few more characters. Then I send the escaped data as POST variables to the server.
I played around a bit and it turned out that using only [a-z][A-Z][0-9] works great.
Any suggestions on solving my issue?

Of the characters you described, £ and § are not ASCII characters. Depending on how you are transmitting them, those two are probably being corrupted.
That being said — encryption keys are data, not strings. If you represent your encryption keys as NSData, rather than NSString, character sets will cease to be an issue, and you should be able to use any randomly generated key, not just ones consisting of these 85 characters.
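To illustrate treating the key as raw bytes rather than text, here is a minimal PHP sketch. It assumes PHP 7 or later for random_bytes(); hex is only one possible ASCII-safe transport encoding, and the variable names are illustrative. On iOS, the same hex string would be decoded into NSData before use.

```php
<?php
// Generate a 16-byte AES-128 key as raw bytes; any byte value 0-255 may appear.
$key = random_bytes(16);

// Hex-encode for transport (e.g. as a POST variable); the result is pure ASCII.
$wire = bin2hex($key);

// The receiving side turns the hex string back into the original bytes.
$restored = hex2bin($wire);

var_dump(strlen($wire));      // int(32)
var_dump($restored === $key); // bool(true)
```

Because only the characters 0-9 and a-f ever cross the wire, percent-escaping and character-set conversion can no longer corrupt the key.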

Related

Encryption questions

I asked a question here and managed to partially implement the advice. Data is now stored encrypted in a binary field (varbinary(500)), after I removed the AES-256 encryption and kept CodeIgniter's default AES-128 encryption.
However, I have some questions that I can't find answers to, since there are not many articles on this subject. If anyone can answer them, or point me to a book or any other literature for further reading, I would be very grateful.
Why must encrypted data be stored in a binary-type field? What is wrong with storing it in longtext or varchar? Does that make the encryption worthless?
Why must I first encode the variable and then encrypt it when I store the data in a binary-type field, when I don't have to do that for a varchar field?
base64_encode($clientName);
$encClientName = $this->encryption->encrypt($clientName);
In my previous question (see the link on the top) I have been advised to use nonce. Since I didn't know how to use that with codeigniter library, I didn't implement that part. Does that make my data less secure? Can anyone post any snippet code of how to use nonce with the codeigniter?
Again, any link to reading material on this subject (storing encrypted data in the database with php) will be deeply appreciated.

Why must encrypted data be stored in a binary-type field? What is wrong with storing it in longtext or varchar? Does that make the encryption worthless?
Encrypted data is binary. It will frequently contain byte sequences which are invalid in your text encoding, making them impossible to insert into a column which expects a string (like VARCHAR or TEXT).
The data type you probably want is either VARBINARY (which is similar to VARCHAR, but not a string) or BLOB (likewise, but for TEXT -- there's also MEDIUMBLOB, LONGBLOB, etc).
Why must I first encode the variable and then encrypt it when I store the data in a binary-type field, when I don't have to do that for a varchar field?
You don't. This is backwards.
If you were going to use a string-type column to store encrypted data, you could "fake it" by Base64 encoding the data after encryption. However, you're still better off using a binary-type column, at which point you don't need any additional encoding.
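The order of operations described here (encrypt first, Base64-encode afterwards, and only if the column is text) can be sketched as follows. This is an illustration using PHP's openssl extension directly, not the CodeIgniter library's own code; the key, IV, and sample value are made up, and plain CBC is used only to produce some ciphertext to store.

```php
<?php
// Illustrative key and IV; a real key would come from secure storage.
$key = random_bytes(16);
$iv  = random_bytes(16);

// OPENSSL_RAW_DATA asks for raw bytes instead of openssl's default Base64 output.
$raw = openssl_encrypt('Alice Example', 'aes-128-cbc', $key, OPENSSL_RAW_DATA, $iv);

$forBinaryColumn = $iv . $raw;                      // store as-is in VARBINARY/BLOB
$forTextColumn   = base64_encode($forBinaryColumn); // ASCII only, about 4/3 the size

// Decoding the text form gives back exactly the bytes of the binary form.
var_dump(base64_decode($forTextColumn, true) === $forBinaryColumn); // bool(true)
```

The binary form here is 32 bytes and the Base64 form is 44 characters, which is the storage cost of "faking it" in a string column.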
In my previous question (see the link on the top) I have been advised to use nonce. Since I didn't know how to use that with codeigniter library, I didn't implement that part. Does that make my data less secure?
Based on what I'm seeing in the documentation, I think the CodeIgniter Encryption library handles this for you by default. You shouldn't have to do anything additional.

In addition to duskwuff's answer, I covered your questions from a more crypto-related viewpoint. He just managed to post a minute before I did :)
Encrypted data must be stored in a binary type field due to the way that Character Encodings work. I recommend you read, if you haven't already, this excellent article by Joel Spolsky that details this very well.
It is important to remember that encryption algorithms operate on raw binary data. That is, a bit string. Literal 1's and 0's that can be interpreted in many ways. You can represent this data as unsigned byte values (255, 255), Hex (0xFF, 0xFF), whatever, they are really just bit strings underneath. Another property of encryption algorithms (or good ones, at least) is that the result of encryption should be indistinguishable from random data. That is, given an encrypted file and a blob of CSPRNG generated random data that have the same length, you should not be able to determine which is which.
Now let's presume you wanted to store this data in a field that expects UTF-8 strings. Because the bit string we store in this field could contain any possible sequence of bytes, as we discussed above, we can't assume that the sequence of bytes that we store will denote actual valid UTF-8 characters. The implication of this is that binary data encoded to UTF-8 and then decoded back to binary is not guaranteed to give you the original binary data. In fact, it rarely will.
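A small PHP sketch makes the point concrete. The three bytes below are an arbitrary example chosen for illustration; the checks use only core functions (PCRE's /u modifier and json_encode), both of which reject input that is not valid UTF-8.

```php
<?php
// Three bytes that could easily appear in ciphertext but are not valid UTF-8.
$bytes = "\xFF\xFE\x80";

// preg_match() with the /u modifier returns false when the subject is not valid UTF-8.
$isValidUtf8 = preg_match('//u', $bytes) === 1;
var_dump($isValidUtf8); // bool(false)

// json_encode() also refuses the string, since JSON strings must be valid UTF-8.
var_dump(json_encode($bytes)); // bool(false)

// Base64, by contrast, round-trips any byte sequence without loss.
var_dump(base64_decode(base64_encode($bytes), true) === $bytes); // bool(true)
```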
Your second question also has to do with encodings, but the encoding here is Base64. Base64 is an encoding that plays very nicely with (in fact, it was designed for) binary data. It is a way to represent binary data using common characters (a-z, A-Z, 0-9 and, in most implementations, + and /). I am willing to bet that the encrypt function you are using either calls base64_encode itself or one of the functions it calls does. What you should actually be interested in is whether the output of the encrypt function is a Base64 string or actual binary data, as this will affect the type of data field you use in your database (e.g. binary vs varchar).
I believe in your last question you stated that you were using CTR, so the following applies to the nonce used by CTR only.
CTR works by encrypting a counter value, and then XOR-ing this encrypted counter value with your data. This counter value is made up of two things: the nonce, and the actual value of the counter, which normally starts at 0. Technically, your nonce can be any length, but I believe a common value is 12 bytes. Because we are discussing AES, the total size of the counter value should be 16 bytes. That is, 12 bytes of nonce and 4 bytes of counter.
This is the important part. Every encryption operation should:
Generate a new 12 byte nonce to use for that operation.
Your implementation should add the counter and perform the actual encryption.
Once you have the final ciphertext, prepend the nonce to this ciphertext so that the result is (len(ciphertext) + 12) bytes long.
Then store this final result in your database.
Repeating a nonce, using a static nonce, or encrypting more than 2^32 blocks (64 GiB of data with a 4-byte counter) under a single 12-byte nonce will make your ciphertext vulnerable.
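The steps above can be sketched in PHP as follows. This is a minimal illustration using the openssl extension directly rather than the CodeIgniter library; the function names ctrEncrypt and ctrDecrypt are hypothetical. It relies on OpenSSL's CTR mode treating the 16-byte IV as the initial counter block, so a 12-byte nonce followed by four zero bytes gives the layout described. Note that CTR on its own does not detect tampering, so a real scheme would also authenticate the stored value.

```php
<?php
// Encrypt with a fresh 12-byte nonce and return nonce || ciphertext.
function ctrEncrypt(string $plaintext, string $key): string
{
    $nonce = random_bytes(12);             // new nonce for every operation
    $iv    = $nonce . "\x00\x00\x00\x00";  // nonce + 4-byte counter starting at 0
    $ct    = openssl_encrypt($plaintext, 'aes-128-ctr', $key, OPENSSL_RAW_DATA, $iv);
    if ($ct === false) {
        throw new RuntimeException('Encryption failed');
    }
    return $nonce . $ct;                   // len(ciphertext) + 12 bytes
}

// Split the stored value back into nonce and ciphertext, then decrypt.
function ctrDecrypt(string $stored, string $key): string
{
    $nonce = substr($stored, 0, 12);
    $ct    = substr($stored, 12);
    $pt    = openssl_decrypt($ct, 'aes-128-ctr', $key, OPENSSL_RAW_DATA, $nonce . "\x00\x00\x00\x00");
    if ($pt === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $pt;
}

$key    = random_bytes(16);                // illustrative AES-128 key
$stored = ctrEncrypt('client name', $key);

var_dump(strlen($stored));                 // int(23): 11 bytes of data + 12 bytes of nonce
var_dump(ctrDecrypt($stored, $key));       // string(11) "client name"
```

The value in $stored is raw binary, so it belongs in a VARBINARY or BLOB column.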

Laravel 4 Encryption: how many characters to expect

I've just had an interesting little problem.
Using Laravel 4, I encrypt some entries before adding them to a db, including email address.
The db was setup with the default varchar length of 255.
I've just had an entry that encrypted to 309 characters, blowing up the encryption by cutting off the last 50-odd characters in the db.
I've (temporarily) fixed this by simply increasing the varchar length to 500, which should - in theory - cover me from this, but I want to be sure.
I'm not sure how the encryption works, but is there a way to tell what maximum character length to expect from the encrypt output for the sake of setting my database?
Should I change my field type from varchar to something else to ensure this doesn't happen again?

Conclusion
First, be warned that there have been quite a few changes between 4.0.0 and 4.2.16 (which seems to be the latest version).
The scheme starts with a staggering overhead of 188 characters for 4.2 and about 244 for 4.0 (given that I did not forget any newlines and such). So to be safe you will probably need in the order of 200 characters for 4.2 and 256 characters for 4.0 plus 1.8 times the plain text size, if the characters in the plaintext are encoded as single bytes.
Analysis
I just looked into the source code of Laravel 4.0 and Laravel 4.2 with regards to this function. Let's get into the size first:
the data is serialized, so the encryption size depends on the size of the type of the value (which is probably a string);
the serialized data is PKCS#7 padded using Rijndael 256 or AES, so that means adding 1 to 32 bytes or 1 to 16 bytes - depending on the use of 4.0 or 4.2;
this data is encrypted with the key and an IV;
both the ciphertext and IV are separately converted to base64;
a HMAC using SHA-256 over the base64 encoded ciphertext is calculated, returning a lowercase hex string of 64 bytes
then the ciphertext consists of base64_encode(json_encode(compact('iv', 'value', 'mac'))) (where the value is the base 64 ciphertext and mac is the HMAC value, of course).
A string in PHP is serialized as s:<i>:"<s>"; where <i> is the size of the string, and <s> is the string (I'm presuming PHP platform encoding here with regards to the size). Note that I'm not 100% sure that Laravel doesn't use any wrapping around the string value, maybe somebody could clear that up for me.
Calculation
All in all, everything depends quite a lot on character encoding, and it would be rather dangerous for me to make a good estimation. Let's assume a 1:1 relation between byte and character for now (e.g. US-ASCII):
serialization adds up to 9 characters for strings up to 999 characters
padding adds up to 16 or 32 bytes, which we assume are characters too
encryption keeps data the same size
base64 in PHP creates ceil(len / 3) * 4 characters, but let's simplify that to (len * 4) / 3 + 4; the base64-encoded IV is 44 characters
the full HMAC is 64 characters
the JSON encoding adds 3*5 characters for quotes and colons, plus 4 characters for braces and commas around them, totaling 19 characters (I'm presuming json_encode does not end with white space here); base64 again adds the same overhead
OK, so I'm getting a bit tired here, but you can see it at least twice expands the plaintext with base64 encoding. In the end it's a scheme that adds quite a lot of overhead; they could just have used base64(IV|ciphertext|mac) to seriously cut down on overhead.
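The calculation can be turned into a rough estimator. The sketch below follows the steps listed in the analysis for the 4.2 case (16-byte blocks and a 16-byte IV); it has not been checked against Laravel itself, and the function names are hypothetical. It assumes single-byte characters and ignores that json_encode escapes "/" as "\/" by default, so real output can be somewhat longer.

```php
<?php
// Length of base64_encode() output for $n input bytes.
function base64Length(int $n): int
{
    return 4 * intdiv($n + 2, 3);
}

// Estimated length of the encrypted string for a plaintext of $plainBytes bytes,
// assuming Laravel 4.2 with AES (16-byte block and IV) as described above.
function estimateEncryptedLength(int $plainBytes): int
{
    $serialized = strlen(serialize(str_repeat('a', $plainBytes))); // s:<i>:"<s>";
    $padded     = (intdiv($serialized, 16) + 1) * 16;              // PKCS#7 adds 1 to 16 bytes
    $json       = strlen('{"iv":"","value":"","mac":""}')          // 29 characters of JSON framing
                + base64Length(16)                                 // base64 of the IV
                + base64Length($padded)                            // base64 of the ciphertext
                + 64;                                              // hex HMAC-SHA-256
    return base64Length($json);                                    // outer base64
}

var_dump(estimateEncryptedLength(0));  // int(188), the fixed overhead quoted for 4.2
var_dump(estimateEncryptedLength(60)); // int(300)
```

Under these assumptions even a 60-character value already exceeds a VARCHAR(255), which is consistent with the truncation described in the question.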
Notes
if you're not on 4.2 now, I would seriously consider upgrading to the latest version because 4.2 fixes quite a lot of security issues
the sample code uses a string as key, and it is unclear if it is easy to use bytes instead;
the documentation does warn against key sizes other than the Rijndael defaults, but forgets to mention string encoding issues;
padding is always performed, even if CTR mode is used, which kind of defeats the purpose;
Laravel pads using PKCS#7 padding, but as the serialization always seems to end with ;, that was not really necessary;
it's a nice thing to see authenticated encryption being used for database encryption (the IV wasn't used, fixed in 4.2).

@MaartenBodewes does a very good job of explaining how long the actual string will probably be. However, you can never know it for sure, so here are two options to deal with the situation.
1. Make your field text
Change the field from a limited varchar to a "self-expanding" text. This is probably the simpler one, and especially if you expect rather long input I'd definitely recommend this.
2. Just make your varchar longer
As you did already, make your varchar longer depending on what input length you expect/allow. I'd multiply by a factor of 5.
But don't stop there! Add a check in your code to make sure the data doesn't get truncated:
$encrypted = Crypt::encrypt($input);
if (strlen($encrypted) > 500) {
    // do something about it
}
What can you do about it?
You could either write an error to the log and add the encrypted data (so you can manually re-insert it after you extended the length of your DB field)
Log::error('An encrypted value was too long for the DB field xy. Length: '.strlen($encrypted).' Data: '.$encrypted);
Obviously that means you have to check the logs frequently (or send them to you by mail) and also that the user could encounter errors while using the application because of the incorrect data in your DB.
The other way would be to throw an exception (and display an error to the user) and of course also write it to the log so you can fix it...
Anyways
Whether you choose option 1 or 2 you should always restrict the accepted length of your input fields. Server side and client side.

Does base64_encode gives unique data? [duplicate]

Hi, my question is: does base64_encode produce unique data every time we run the script?
Below is the code.
<?php
$id = 1;
echo base64_encode($id);
?>
If it does not produce unique data every time, then what is the point of encoding the string and passing it in the URL? Does that make the URL safe?

Base64 encoding is not a method of encryption. It is used for encoding binary data into text, which makes it safer to transmit over the internet.
If you send raw bytes, some protocols may interpret certain byte values as control characters and alter them. Sending text is much more reliable.
What is base 64 encoding used for?
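If you need true encryption, you need a cipher with a secret key that you can hide from other users, such as those provided by the mcrypt library (mcrypt was deprecated in PHP 7.1 and removed in PHP 7.2; the OpenSSL and Sodium extensions are the current alternatives).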
http://php.net/manual/en/book.mcrypt.php
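A short sketch shows why Base64 is an encoding rather than encryption: the same input always gives the same output, and anyone can reverse it without a key. The value '1' mirrors the $id from the question.

```php
<?php
$encoded = base64_encode('1');

var_dump($encoded);                        // string(4) "MQ=="
var_dump($encoded === base64_encode('1')); // bool(true): same input, same output
var_dump(base64_decode($encoded));         // string(1) "1": no key is needed to decode
```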

base64-encoding does not provide unique data. Its purpose is to provide a compact representation of binary data in string form. In your example, you are encoding non-binary data, so it is not very practical. However, if you wanted to encode a string containing a newline and punctuation and pass it via the URL, you cannot send the binary data directly.
For example, if you had the string Hello, World!!\n there would be three punctuation marks, a space and a newline that all need to be URL-encoded. Doing that gives the result:
Hello%2C+World%21%21%0A
Which is 23 bytes long.
On the other hand if you were to base64-encode the same string, the result would be:
SGVsbG8sIFdvcmxkISEK
Which is 20 characters, or about 13% shorter. This adds up quickly if you've got a lot of non-alphanumeric characters or a large amount of data.
So the primary advantage of base64 encoding is its slightly more compact representation of certain data.
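The comparison can be reproduced with PHP's built-in urlencode() and base64_encode(); the variable names are illustrative.

```php
<?php
$text = "Hello, World!!\n";

$urlEncoded = urlencode($text);
$b64        = base64_encode($text);

echo $urlEncoded, ' (', strlen($urlEncoded), ")\n"; // Hello%2C+World%21%21%0A (23)
echo $b64, ' (', strlen($b64), ")\n";               // SGVsbG8sIFdvcmxkISEK (20)
```

Keep in mind that standard Base64 output can itself contain +, / and =, which need escaping when placed in a URL, so the saving depends on the data.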

Base64 encoding is a way of representing data using only a limited set of characters. You use it when you need to store data in something such as a cookie that can't handle the data in its original format.

Is base 64 encoding with a secret key of strings, a possible way of Protection?

I am reading this in a paper:
Any end-user could modify these values (since they are originated in his browser), but if the web developer encodes for example, converting all characters to URL-encoding (hexadecimal) or uses a particular encoding to send GET/POST parameters (e.g., base64 with some secret key string) the attack vector must be revisited.
So does this mean that it is good practice to encode the variables with base64 and a secret key?
How is URL-encoding implemented?
Does this make sense? I have never read about encoding variables as a way of protection.
Thanks
paper page 5

Base64 encoding schemes are commonly used when there is a need to encode binary data that needs be stored and transferred over media that are designed to deal with textual data (like HTTP). This is to ensure that the data remains intact without modification during transport.
So yes, it can be a way of protecting the original data from unwanted modification. But remember it is not anywhere near encryption.
The specification for URLs (RFC 1738, Dec. '94) poses a problem, in that it limits the use of allowed characters in URLs to only a limited subset of the US-ASCII character set:
"...Only alphanumerics [0-9a-zA-Z], the special characters "$-_.+!*'()," [not including the quotes - ed], and reserved characters used for their reserved purposes may be used unencoded within a URL."
Here's nice article on that http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
In PHP you can use the string urlencode ( string $str ) function for URL encoding.
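As for how URL-encoding is implemented, the idea is simply to keep a small set of safe characters and replace every other byte with % followed by its two-digit hex value. The sketch below is a hand-written illustration, not PHP's actual implementation; percentEncode is a hypothetical name, and the safe set is the "unreserved" set from the later RFC 3986, which is what PHP's rawurlencode() follows (urlencode() differs mainly in writing a space as +).

```php
<?php
// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
function percentEncode(string $s): string
{
    $unreserved = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~';
    $out = '';
    for ($i = 0, $n = strlen($s); $i < $n; $i++) {
        $byte = $s[$i];
        $out .= strpos($unreserved, $byte) !== false
            ? $byte                           // safe character: copy unchanged
            : sprintf('%%%02X', ord($byte));  // anything else: % plus two hex digits
    }
    return $out;
}

$input = "a b&c=\xC2\xA3";                    // ends with the UTF-8 bytes of the pound sign

echo percentEncode($input), "\n";             // a%20b%26c%3D%C2%A3
var_dump(percentEncode($input) === rawurlencode($input)); // bool(true)
```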

so, this means that is good practice
encoding the variables with base 64
and a secret key? how is implemented
an url-encoding?
this makes sense?
No, it doesn't really make sense. base64 is not an encryption scheme, it's just a way of encoding binary data into a subset of 7-bit text that isn't likely to be altered by email servers, etc.
base64 does not have a key, nor can it encrypt or decrypt.
My guess is that the paper's authors were talking about some encryption scheme prior to the bit you quoted, and they only mentioned base64 later as a way to transmit their already encrypted data safely over HTTP GET or POST parameters.
Without seeing the rest of what you quoted from, we don't know.

Base64 encoding does not provide any security whatsoever. It is almost as bad as a mono-alphabetic substitution cipher.
Here's how to encode/decode in Base64, also known as radix-64:
http://en.wikipedia.org/wiki/Base64
I'm not sure what it means to encode with a key in this context.

Special characters in Flex

I am working on a Flex app that has a MySQL database. Data is retrieved from the DB using PHP then I am using AMFPHP to pass the data on to Flex
The problem that I am having is that the data is being copied from Word documents, which sometimes results in some of the more unusual characters not displaying properly. For example, Word uses different characters for starting and ending double quotes instead of just " (the standard double quotes). Another example is the long dash instead of -.
All of these characters result in one or more accented capital A characters appearing instead. Not only that, each time the document is saved, the characters are replaced again resulting in an ever-increasing number of these accented A's appearing.
Doing a search and replace for each troublesome character to swap it for one of the normal characters seems to work, but obviously this requires compiling a list of all the characters that may appear, and it means there is scope for the problem continuing as new characters are used for the first time. It also seems like a bit of a brute-force way of getting round the problem rather than a proper solution.
Does anyone know what causes this and have any good workarounds / fixes? I have had similar problems when using utf-8 characters in html documents that aren't set to use utf-8. Is this the same thing and if so, how do I get flex to use utf-8?
Many thanks
Adam

It is the same thing, and smart quotes aren't special as such: you will in fact be failing for every non-ASCII character. As such a trivial ad-hoc replace for the smart quote characters will be pointless.
At some point, someone is mis-decoding a sequence of bytes as ISO-8859-1 or Windows code page 1252 when it should have been UTF-8. Difficult to say where without detail/code.
What is “the document”? What format is it? Does that format support UTF-8 content? If it does not, you will need to encode output you put into it at the document-creation phase to the encoding the consumer of that document expects, eg. using iconv.
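The mis-decoding can be reproduced in a few lines of PHP. This is a sketch of the failure mode only, since the real culprit in the Flex/AMFPHP/MySQL chain is unknown; misdecodeAsCp1252 is a hypothetical helper, and it assumes that either the mbstring or the iconv extension is available.

```php
<?php
// Treat the given bytes as Windows-1252 text and re-encode the result as UTF-8,
// which is what happens when UTF-8 data is mistaken for code page 1252.
function misdecodeAsCp1252(string $bytes): string
{
    return function_exists('mb_convert_encoding')
        ? mb_convert_encoding($bytes, 'UTF-8', 'Windows-1252')
        : iconv('CP1252', 'UTF-8', $bytes);
}

$quote = "\xE2\x80\x9C";             // Word's opening double quote (U+201C) in UTF-8
$once  = misdecodeAsCp1252($quote);  // first save
$twice = misdecodeAsCp1252($once);   // second save

echo bin2hex($once), "\n";           // c3a2e282acc593
echo strlen($quote), ' -> ', strlen($once), ' -> ', strlen($twice), "\n"; // 3 -> 7 -> 16
```

Each pass turns every non-ASCII byte into a multi-byte character of its own, which is why the run of accented characters keeps growing on every save.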
