Does base64_encode give unique data? [duplicate] - php

Hi, my question is: does base64_encode produce unique data every time we run the script?
The code is below.
<?php
$id = 1;
echo base64_encode($id);
?>
If it does not provide unique data every time, then what is the point of encoding the string and passing it in the URL? Does that make the URL safe?

Base64 encoding is not a method of encryption. It is used for encoding binary data into text, which makes it safer to transmit over the internet.
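If you stream raw bits, some protocols may interpret them differently. Streaming text is much more reliable.
See: What is base 64 encoding used for?
If you need true encryption, use an encryption library with a secret key that you keep hidden from other users. The mcrypt library was the usual suggestion when this was written; it was deprecated in PHP 7.1 and removed in PHP 7.2, so the OpenSSL or Sodium extensions are the current options.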
http://php.net/manual/en/book.mcrypt.php
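To make the "not encryption" point concrete, here is a minimal sketch using the $id from the question. It only shows that the output is deterministic and that anyone can reverse it without a key; the variable names are just for illustration.

```php
<?php
// Hypothetical ID, as in the question.
$id = 1;

// base64_encode() works on strings, so cast explicitly.
$first  = base64_encode((string) $id);
$second = base64_encode((string) $id);

// Same input, same output: no randomness and no key are involved.
var_dump($first === $second);

// Anyone who sees the value can reverse it without any secret.
$decoded = base64_decode($first, true);
var_dump($decoded === '1');

echo $first, PHP_EOL; // MQ==
```

So a Base64 value in a URL hides nothing from a user who cares to decode it.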

Base64 encoding does not produce different data on each run: the same input always yields the same output. Its purpose is to provide a compact representation of binary data in string form. In your example, you are encoding non-binary data, so it is not very practical. However, if you wanted to pass a string containing a newline and punctuation via the URL, you could not send the raw bytes directly.
For example, if you had the string Hello, World!!\n there would be three punctuation marks, a space and a newline that all need to be URL-encoded. Doing that gives the result:
Hello%2C+World%21%21%0A
Which is 23 bytes long.
On the other hand if you were to base64-encode the same string, the result would be:
SGVsbG8sIFdvcmxkISEK
Which is 20 characters, or about 13% shorter. This adds up quickly if you've got a lot of non-alphanumeric characters or a large amount of data.
So the primary advantage of base64 encoding is its slightly more compact representation of certain data.
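The comparison above can be reproduced with a short sketch. The two helper functions at the end are hypothetical names, not built-ins: standard Base64 can emit '+', '/' and '=', which have their own meaning in URLs, so a common workaround is the "URL-safe" alphabet shown here.

```php
<?php
$text = "Hello, World!!\n";

$urlEncoded = urlencode($text);
$b64        = base64_encode($text);

echo $urlEncoded, ' (', strlen($urlEncoded), " bytes)\n";
echo $b64, ' (', strlen($b64), " bytes)\n";

// Hypothetical helper: swap '+' and '/' for '-' and '_', and drop '=' padding.
function base64UrlEncode(string $data): string
{
    return rtrim(strtr(base64_encode($data), '+/', '-_'), '=');
}

// Hypothetical helper: reverse the substitution before decoding.
// base64_decode() in its default (non-strict) mode accepts missing padding.
function base64UrlDecode(string $data): string
{
    $decoded = base64_decode(strtr($data, '-_', '+/'));
    return $decoded === false ? '' : $decoded;
}

// Two bytes chosen because their standard encoding is "+/8=".
echo base64UrlEncode("\xfb\xff"), PHP_EOL;
```

Without that substitution, the Base64 output would itself need URL-encoding, which eats into the size advantage.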

Base64 encoding is a way of representing data using only a limited set of characters. You use it when you need to store data in something such as a cookie that can't handle the data in its original format.

Related

Encryption questions

I asked a question here and managed to partially implement the advice. Data is now stored encrypted in a binary field (varbinary(500)), after I removed the aes-256 encryption and kept CodeIgniter's default aes-128 encryption.
However, I have some questions I can't find answers to, since there are not many articles on this subject. If anyone can answer them, or point me to a book or other literature for further reading, I would be very grateful.
Why must encrypted data be stored in a binary type field? What is wrong with storing it in longtext or varchar? Does that make the encryption worthless?
Why must I first encode the variable and then encrypt it when I store the data in a binary type field, when I don't have to do that for a varchar field?
base64_encode($clientName);
$encClientName = $this->encryption->encrypt($clientName);
In my previous question (see the link at the top) I was advised to use a nonce. Since I didn't know how to use that with the CodeIgniter library, I didn't implement that part. Does that make my data less secure? Can anyone post a code snippet showing how to use a nonce with CodeIgniter?
Again, any link to reading material on this subject (storing encrypted data in the database with PHP) will be deeply appreciated.
Why must encrypted data be stored in a binary type field? What is wrong with storing it in longtext or varchar? Does that make the encryption worthless?
Encrypted data is binary. It will frequently contain byte sequences which are invalid in your text encoding, making them impossible to insert into a column which expects a string (like VARCHAR or TEXT).
The data type you probably want is either VARBINARY (which is similar to VARCHAR, but not a string) or BLOB (likewise, but for TEXT -- there's also MEDIUMBLOB, LONGBLOB, etc).
Why must I first encode the variable and then encrypt it when I store the data in a binary type field, when I don't have to do that for a varchar field?
You don't. This is backwards.
If you were going to use a string-type column to store encrypted data, you could "fake it" by Base64 encoding the data after encryption. However, you're still better off using a binary-type column, at which point you don't need any additional encoding.
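A minimal sketch of that ordering, assuming a fixed byte string as a stand-in for whatever your encryption call returns in raw form (the variable names are made up for illustration):

```php
<?php
// Stand-in for raw ciphertext: arbitrary bytes, not valid text in any encoding.
$ciphertext = "\x8f\xff\x00\xc3\x28\x9d";

// Option 1 (preferred above): bind $ciphertext unchanged to a
// VARBINARY or BLOB column. No extra encoding step is needed.

// Option 2: a string column. Encode AFTER encrypting...
$forVarchar = base64_encode($ciphertext);

// ...and decode BEFORE decrypting when you read the row back.
$restored = base64_decode($forVarchar, true);

echo $forVarchar, PHP_EOL;
var_dump($restored === $ciphertext);
```

Note that the Base64 form is about a third larger than the raw bytes, so the column has to be sized for that.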
In my previous question (see the link at the top) I was advised to use a nonce. Since I didn't know how to use that with the CodeIgniter library, I didn't implement that part. Does that make my data less secure?
Based on what I'm seeing in the documentation, I think the CodeIgniter Encryption library handles this for you by default. You shouldn't have to do anything additional.
In addition to duskwuff's answer, I covered your questions from a more crypto-related viewpoint. He just managed to post a minute before I did :)
Encrypted data must be stored in a binary type field due to the way that Character Encodings work. I recommend you read, if you haven't already, this excellent article by Joel Spolsky that details this very well.
It is important to remember that encryption algorithms operate on raw binary data. That is, a bit string. Literal 1's and 0's that can be interpreted in many ways. You can represent this data as unsigned byte values (255, 255), Hex (0xFF, 0xFF), whatever, they are really just bit strings underneath. Another property of encryption algorithms (or good ones, at least) is that the result of encryption should be indistinguishable from random data. That is, given an encrypted file and a blob of CSPRNG generated random data that have the same length, you should not be able to determine which is which.
Now let's presume you wanted to store this data in a field that expects UTF-8 strings. Because the bit string we store in this field could contain any possible sequence of bytes, as we discussed above, we can't assume that the sequence of bytes we store will denote valid UTF-8 characters. The implication is that binary data interpreted as UTF-8 and then converted back to binary is not guaranteed to give you the original binary data. In fact, it rarely will.
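Here is a small sketch of that failure mode. JSON stands in for "a channel that insists on valid UTF-8" purely as an assumption for the demo; a database text column with a UTF-8 charset may reject or alter the bytes in its own way.

```php
<?php
// The bytes 0xFF and 0xFE can never appear in valid UTF-8; 'A' is plain ASCII.
$raw = "\xff\xfe\x41";

// preg_match() with the /u modifier returns false for an invalid UTF-8 subject.
$isValidUtf8 = preg_match('//u', $raw) === 1;
var_dump($isValidUtf8);

// A strict UTF-8 channel refuses the data outright...
var_dump(json_encode($raw));

// ...or, if told to cope, substitutes replacement characters.
$json         = json_encode($raw, JSON_INVALID_UTF8_SUBSTITUTE);
$roundTripped = json_decode($json);

// The bytes that come back are no longer the bytes that went in.
var_dump($roundTripped === $raw);
echo bin2hex($raw), ' -> ', bin2hex($roundTripped), PHP_EOL;
```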
Your second question also has to do with encodings, but the encoding here is Base64. Base64 is an encoding that plays very nicely with (in fact, it was designed for) binary data. In most implementations it represents binary data using common characters (a-z, A-Z, 0-9, + and /). I am willing to bet that the encrypt function you are using either calls base64_encode itself or one of the functions it calls does. What you should actually be interested in is whether the output of the encrypt function is a Base64 string or actual binary data, as this will affect the type of field you use in your database (e.g. binary vs varchar).
I believe in your last question you stated that you were using CTR, so the following applies to the nonce used by CTR only.
CTR works by encrypting a counter value, and then xor-ing this encrypted counter value with your data. This counter value is made up of two things: the nonce, and the actual value of the counter, which normally starts at 0. Technically, your nonce can be any length, but I believe a common value is 12 bytes. Because we are discussing AES, the total size of the counter value should be 16 bytes. That is, 12 bytes of nonce and 4 bytes of counter.
This is the important part. Every encryption operation should:
Generate a new 12 byte nonce to use for that operation.
Let your implementation add the counter and perform the actual encryption.
Once you have the final ciphertext, prepend the nonce to it so that the result is (len(ciphertext) + 12) bytes long.
Then store this final result in your database.
Repeating a nonce, using a static nonce, or encrypting more than 2^32 blocks (64 GiB of data) under a single 12-byte nonce will make your ciphertext vulnerable.
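The steps above can be sketched in plain PHP, assuming the OpenSSL extension is available. This is not how the CodeIgniter library does it internally; encryptCtr and decryptCtr are hypothetical helper names, and the key is generated on the spot only for the demo. CTR on its own also provides no integrity check, so treat this as an illustration of nonce handling rather than a complete scheme.

```php
<?php
const NONCE_BYTES = 12;

// Hypothetical helper: AES-128-CTR with a fresh random nonce per call.
function encryptCtr(string $plaintext, string $key): string
{
    // 1. Generate a new 12-byte nonce for this operation.
    $nonce = random_bytes(NONCE_BYTES);

    // 2. 12 bytes of nonce + 4 bytes of counter starting at 0 = 16-byte IV.
    //    OpenSSL increments the counter itself while encrypting.
    $iv = $nonce . "\x00\x00\x00\x00";
    $ciphertext = openssl_encrypt($plaintext, 'aes-128-ctr', $key, OPENSSL_RAW_DATA, $iv);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }

    // 3. Prepend the nonce so it can be recovered at decryption time.
    return $nonce . $ciphertext;
}

// Hypothetical helper: split off the nonce, rebuild the IV, decrypt.
function decryptCtr(string $stored, string $key): string
{
    $nonce      = substr($stored, 0, NONCE_BYTES);
    $ciphertext = substr($stored, NONCE_BYTES);
    $plaintext  = openssl_decrypt($ciphertext, 'aes-128-ctr', $key, OPENSSL_RAW_DATA, $nonce . "\x00\x00\x00\x00");
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed');
    }
    return $plaintext;
}

$key    = random_bytes(16);            // demo key; in practice load it from config
$stored = encryptCtr('Jane Doe', $key); // 4. this is what goes into the VARBINARY column

echo strlen($stored), PHP_EOL;          // plaintext length + 12
echo decryptCtr($stored, $key), PHP_EOL;
```

Because the nonce is random per call, encrypting the same name twice gives two different stored values.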

Are AJAX Posts 8 bit Clean? / Relation to Base64 / An alternative? / Where is it?

Base64 only uses 6 bits per character (2^6 = 64) to create textual data from image files. This causes an inefficiency.
According to the Wikipedia entry on Base64, this inefficiency is the price of protecting against things that are not 8-bit clean, like email.
Is Ajax posting 8-bit clean? If so, is there an alternative to using Base64?
php.net (as does Wikipedia) claims a 33% inefficiency for base64_encode.
Kind of. All JavaScript strings are UTF-16, not byte strings. If you're sending the data with send, then it will be encoded into UTF-8 before it is sent. As such, you can convert the bytes into Unicode code points, which will then be encoded into UTF-8. When it reaches the server, you'll have to decode the UTF-8 and then convert the code points back into bytes.
For 7-bit data, this will not expand the size of the data at all. For 8-bit data with the most significant bit always set, it will double the size of the data. For 8-bit data with the most significant bit set half of the time, it will increase the size of your data by 50%, which is worse than the roughly 33.3% increase from Base64.
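Those three cases can be checked with a quick sketch. It is written in PHP to match the rest of this page rather than in JavaScript, and bytesAsUtf8 is a hypothetical helper that mimics "treat each byte as a code point, then encode as UTF-8".

```php
<?php
// Hypothetical helper: map each byte 0..255 to the code point with the
// same number and emit its UTF-8 encoding (1 byte below 0x80, else 2 bytes).
function bytesAsUtf8(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $char) {
        $b = ord($char);
        $out .= $b < 0x80
            ? $char
            : chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
    }
    return $out;
}

$sevenBit = str_repeat("A", 300);      // high bit never set
$allHigh  = str_repeat("\xE9", 300);   // high bit always set
$halfHigh = str_repeat("A\xE9", 150);  // high bit set half of the time

echo strlen(bytesAsUtf8($sevenBit)), PHP_EOL;   // 300
echo strlen(bytesAsUtf8($allHigh)), PHP_EOL;    // 600
echo strlen(bytesAsUtf8($halfHigh)), PHP_EOL;   // 450
echo strlen(base64_encode($halfHigh)), PHP_EOL; // 400
```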
On the other hand, XMLHttpRequest Level 2 allows you to send binary data by passing send an ArrayBuffer, Blob, or FormData. However, XMLHttpRequest Level 2 is only supported in newer browsers.
I think AJAX posting is the same as a generic POST request in that respect; that's why we need 'multipart/form-data' for sending files' content, for example. Usually the data gets URL-encoded, but Base64 is perhaps a better way, as it's (generally) more efficient.
UPDATE: It might be helpful to look at this the other way. You need some stream of values, each of which might use all 8 bits, to safely pass the 7-bit filtering. The perfect solution is to use '7-to-8' encoding, so each 7 bytes become 8 'safe' characters. But this is not applicable, as some of these 7-bit characters are actually used to specify some additional (meta) information about the stream...
Now you have a dilemma: either use the next integer (6 bit - that is base64) - or try to invent a scheme with 'non-integer' divider. Such schemes exist (check Ascii85, for example), but they are rarely used.

How to trim a Base-64 encoded string?

I'm not much of a PHP expert. I'm encoding a URL with base64_encode.
I get quite a long encoded string with a lot of weird characters exactly as I want it to be.
Is there a way to trim this long line of characters to let's say 10 or 15 chars, so I can decode it later again?
I know there is trim(), but that does not do exactly what I want. I want a long encoded string to be rather short, and later I want to decode it again.
Any ideas?
It's not possible to "shorten" any string without losing some data.
If you want to physically shorten an encoded string (with the end result being only part of that string), apply substr() but not on the encoded version: You need to decode it first, then re-encode the shortened version.
Another option is to compress a string. This may shorten it somewhat: Look into gzcompress(). Your mileage may vary, though: the compression rate will depend on what kind of data you have. With small input strings, the result can even be larger than the original.
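A rough sketch of the compression route, assuming the zlib extension is loaded. The URL is a made-up example chosen to be repetitive, which is the favourable case; do not expect this to get an arbitrary URL down to 10 or 15 characters.

```php
<?php
// Hypothetical, highly repetitive input (compresses well).
$long = str_repeat('https://example.com/path?item=42&', 20);

// Compress first, then Base64 the binary result so it is printable.
$packed   = base64_encode(gzcompress($long, 9));
$restored = gzuncompress(base64_decode($packed, true));

echo strlen($long), ' -> ', strlen($packed), PHP_EOL;
var_dump($restored === $long);

// With a tiny input, the zlib header and checksum outweigh any savings.
$tiny       = 'abc';
$tinyPacked = gzcompress($tiny);
echo strlen($tiny), ' -> ', strlen($tinyPacked), PHP_EOL;
```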
If you want to reuse a variable in a multi-page process, and don't want to transport it through a link or a form, consider generating a short random key, and storing the data in the user's session:
$_SESSION[$randomKey] = "lllloooooooooooong data here";
You could pass on the random key, and always access the "long" data using $_SESSION[$randomKey]. You need to have a session initialized for this.
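A minimal sketch of the session idea. stash is a hypothetical helper, and a plain array stands in for $_SESSION so the snippet runs on the command line; in a real request you would call session_start() first and pass $_SESSION instead.

```php
<?php
// Hypothetical helper: store $data under a short random key and return the key.
function stash(array &$store, string $data): string
{
    $key = bin2hex(random_bytes(8)); // 16 hex characters, safe to put in a URL
    $store[$key] = $data;
    return $key;
}

$store = []; // stand-in for $_SESSION

$randomKey = stash($store, 'lllloooooooooooong data here');

echo $randomKey, PHP_EOL;         // pass this short value in the link or form
echo $store[$randomKey], PHP_EOL; // look the long data up again on the next page
```

The key is only a lookup handle, so nothing about the long data leaks into the URL.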

PHP <-> JavaScript communication: Am I stuck with ASCII?

I am passing a lot of data between PHP and JavaScript. I am using JSON and json_encode in php, but the problem here is that I am passing a lot of numbers stored as strings - for example, numbers like 1.2345.
Is there a way to pass the data directly as numbers (floats, integers) and not have to convert it to ASCII and then back?
Thanks,
No. HTTP is a byte stream protocol(*); anything that goes down it has to be packed into bytes. You can certainly use a more compact packed binary representation of values if you like, but it's going to be much more work for your PHP to encode and your JS to decode.
Anyhow, for the common case of small numbers, text representations tend to be very efficient. Your example 1.2345 is actually smaller as a string (6 bytes) than a double-precision float (8 bytes).
JSON was invented precisely to allow non-string types to be transferred over the HTTP connection. It's as seamless as you're going to get. Is there any good reason to care that there was a serialise->string->parse step between the PHP float and the JavaScript Number?
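If the concern is only that the numbers arrive in JavaScript as strings, a sketch like the following may help. It assumes the values are numeric strings in a PHP array (the 'price' and 'qty' keys are invented for the example) and uses json_encode's JSON_NUMERIC_CHECK flag, with pack() included only to compare sizes.

```php
<?php
// Hypothetical row, with numbers stored as strings as in the question.
$row = ['price' => '1.2345', 'qty' => '3'];

// Default: strings stay strings, so JavaScript receives "1.2345".
$asStrings = json_encode($row);

// JSON_NUMERIC_CHECK emits numeric strings as JSON numbers instead.
$asNumbers = json_encode($row, JSON_NUMERIC_CHECK);

echo $asStrings, PHP_EOL;
echo $asNumbers, PHP_EOL;

// Decoding shows real numeric types on the other side.
$decoded = json_decode($asNumbers, true);
var_dump($decoded);

// Size comparison from the answer: text form vs. a packed double.
echo strlen('1.2345'), ' vs ', strlen(pack('d', 1.2345)), PHP_EOL; // 6 vs 8
```

Be aware that JSON_NUMERIC_CHECK converts every numeric-looking string, including ones you may want to keep as text, such as phone numbers or zip codes.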
(* exposed to JavaScript as a character protocol, since JS has no byte datatype. By setting the charset of the JSON response to iso-8859-1 you can make it work as if it were pure bytes, but the default utf-8 is usually more suitable.)
If you didn't want to use JSON, there are other encoding options. The data returned from an HTTP request is an octet stream (and not a 7-bit clean ASCII stream: if it were, there would be no way to serve UTF-8 encoded documents or binary files, as simple counterexamples).
Some binary serialization/data protocols are ASN.1, Thrift, Google Protocol Buffers, Avro, or, of course, some custom format. The advantage of JSON is "unified human-readable simplicity".
But in the end -- JSON is JSON.
Perhaps of interest to someone: JavaScript Protocol Buffer Implementation

Why do we need to base64 encode images before transmitting? [duplicate]

I've seen many code fragments that base64 encode images before transmitting over HTTP protocol.
I am wondering why do we need it?
It's not necessary, but it enables you to embed images without performing additional HTTP requests (where, in some cases, it's not possible or permitted).
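As an illustration of embedding, here is a minimal sketch that builds a data: URI. toDataUri is a hypothetical helper, and the bytes are only the 8-byte PNG signature used as a stand-in; in practice you would read a real file with file_get_contents().

```php
<?php
// Hypothetical helper: wrap raw bytes in a data: URI for inline embedding.
function toDataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// Stand-in bytes (PNG signature only, not a complete image).
$bytes = "\x89PNG\r\n\x1a\n";

$uri = toDataUri($bytes, 'image/png');

// The image travels inside the HTML itself, so no second request is made.
echo '<img src="', $uri, '" alt="">', PHP_EOL;
```

The trade-off is the usual one: the embedded form is about a third larger than the raw file and cannot be cached separately from the page.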
From the Wikipedia entry on Base64:
The term Base64 refers to a specific MIME content transfer encoding. It is also used as a generic term for any similar encoding scheme that encodes binary data by treating it numerically and translating it into a base 64 representation. The particular choice of base is due to the history of character set encoding: one can choose a set of 64 characters that is both part of the subset common to most encodings, and also printable. This combination leaves the data unlikely to be modified in transit through systems, such as email, which were traditionally not 8-bit clean.
And specifically regarding HTTP:
Base64 encoding can be helpful when fairly lengthy identifying information is used in an HTTP environment. For example, a database persistence framework for Java objects might use Base64 encoding to encode a relatively large unique id (generally 128-bit UUIDs) into a string for use as an HTTP parameter in HTTP forms or HTTP GET URLs. Also, many applications need to encode binary data in a way that is convenient for inclusion in URLs, including in hidden web form fields, and Base64 is a convenient encoding to render them in not only a compact way, but in a relatively unreadable one when trying to obscure the nature of data from a casual human observer.
HTTP message bodies themselves are 8-bit clean, but images are often embedded in text contexts (HTML, CSS, JSON, form fields) that are not, and those can mangle a binary stream.
