Check for duplicate of encrypted data


I was hoping to get some advice on a particular task I'm trying to implement.
I have a table that stores secure data and returns an ID as a representation of that data. No problems there. So, for example, if a social security number is stored, the code generates a representational ID and stores the social security number in an encrypted fashion in the table. The encryption is done using envelope encryption.
Here's my issue. Every time a new value comes in, I don't want to create a new ID if the data already exists. I need to check whether the value already exists and, if so, return the existing ID. The problem is that the encrypted value is different each time, and I certainly can't decrypt every value in the database to check for a duplicate. I could create a one-way hash and store that as well, but if I do, I would need to salt it for security purposes, and then the hash would be different every time.
So I'm hoping to get advice or recommendations on how to achieve this: how can I check for duplicates when the value is stored in encrypted form?
Thank you!
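One commonly used approach, not something from the original post, is a "blind index": a keyed hash (HMAC) computed with a secret key that is kept separate from the encryption keys. Unlike a salted hash it is deterministic, so equal inputs give equal index values that can be looked up through an ordinary indexed column, while someone holding only a database dump cannot recompute it without the key. A minimal sketch, assuming a hypothetical 32-byte `$indexKey` loaded from outside the database and an array standing in for the table (the `findOrCreateId()` and `blindIndex()` names are hypothetical):

```php
<?php
// Blind index: deterministic keyed hash (HMAC-SHA256) of the normalized plaintext.
// $indexKey is a hypothetical secret, separate from the envelope-encryption keys.
function blindIndex(string $ssn, string $indexKey): string
{
    // Assumption: SSNs are compared on digits only, so "123-45-6789" == "123456789".
    $normalized = preg_replace('/\D/', '', $ssn);
    return hash_hmac('sha256', $normalized, $indexKey);
}

// Returns the existing ID for a value, or creates a new one.
// $table simulates a table keyed by the blind index
// (hypothetical schema: id, ssn_index UNIQUE, ciphertext).
function findOrCreateId(array &$table, string $ssn, string $indexKey): string
{
    $index = blindIndex($ssn, $indexKey);
    if (isset($table[$index])) {
        return $table[$index]['id'];      // duplicate: reuse the existing ID
    }
    $id = bin2hex(random_bytes(8));       // new representational ID
    // Real code would store the envelope-encrypted SSN here, never the plaintext.
    $table[$index] = ['id' => $id, 'ciphertext' => null];
    return $id;
}
```

A UNIQUE constraint on the index column would also catch two simultaneous inserts of the same value. The trade-off: for a small input space such as SSNs, anyone who obtains `$indexKey` can brute-force the index, so that key needs the same protection as the encryption keys.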

Related

How to store the encryption key safely?

I am using MySQL as a back end storage.
I was asked by our risk management team to encrypt all the data prior to storing it in the database.
Since then I have been doing research on how to secure the data going in and out of the database.
I found a couple of ways; one of them was MySQL Encryption Software.
A second solution was to encrypt and decrypt data in MySQL using AES_ENCRYPT() and AES_DECRYPT(). But I will need to create a 128-, 192- or 256-bit key in order to be able to encrypt and decrypt the data. Then every time I want to execute INSERT/UPDATE, I will call AES_ENCRYPT() and supply it with a key to encrypt the data. When I execute SELECT, I will have to call AES_DECRYPT() and supply the same key to convert the data back to plain text.
This means that I will have to define a variable in my PHP script that holds the key, so I can encrypt/decrypt by supplying that variable to both AES_ENCRYPT() and AES_DECRYPT().
My question is:
Where and how should I store this key to prevent a hacker from reading it? If someone hacks my server and reads the key, he can simply read the data and the encryption would be meaningless.
And what is the best way to go about securing my data?
Thank you
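For comparison, the same encrypt-on-write / decrypt-on-read round trip can be done in PHP with the OpenSSL extension instead of MySQL's AES_ENCRYPT()/AES_DECRYPT(); this is a swapped-in technique, not the one described in the question. It keeps the key out of the SQL statements, but it does not answer the key-storage question: PHP still needs the key. A minimal sketch, assuming a 32-byte key and AES-256-GCM (the function names are hypothetical):

```php
<?php
// AES-256-GCM is authenticated: tampered ciphertext or a wrong key fails to decrypt.
function encryptValue(string $plaintext, string $key): string
{
    $iv = random_bytes(12);                   // 96-bit nonce, fresh for every value
    $tag = '';
    $ciphertext = openssl_encrypt($plaintext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($ciphertext === false) {
        throw new RuntimeException('Encryption failed');
    }
    // Store nonce, tag and ciphertext together in one column.
    return base64_encode($iv . $tag . $ciphertext);
}

function decryptValue(string $stored, string $key): string
{
    $raw = base64_decode($stored, true);
    if ($raw === false || strlen($raw) < 28) {
        throw new RuntimeException('Malformed ciphertext');
    }
    $iv = substr($raw, 0, 12);
    $tag = substr($raw, 12, 16);
    $ciphertext = substr($raw, 28);
    $plaintext = openssl_decrypt($ciphertext, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    if ($plaintext === false) {
        throw new RuntimeException('Decryption failed (wrong key or tampered data)');
    }
    return $plaintext;
}
```

Because of the random nonce, encrypting the same value twice gives different output, which is also why encrypted columns cannot be compared directly for duplicates.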

The issue you are facing is not a key issue but an issue of the security of the rest of your computer. Using MySQL means that MySQL (if running safely) runs under its own account. You would in fact put the keys in your MySQL-owned directory. That secures much of MySQL. MySQL itself needs to have access to that key, so there is not much more you can do for that account. Just make sure the key file is readable only by the owning account.
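The same principle applies when PHP rather than MySQL has to read the key. A minimal sketch that refuses to load a key file whose permissions let the group or other users read it, assuming POSIX file permissions and a hypothetical file containing exactly 32 raw bytes, kept outside the web root:

```php
<?php
// Loads a raw 32-byte key from a file that only the owning account can access.
function loadKeyFile(string $path): string
{
    clearstatcache(true, $path);
    $perms = @fileperms($path);
    if ($perms === false) {
        throw new RuntimeException("Key file not found: $path");
    }
    // Refuse the key if group or others have any permission bit set (mode & 0077).
    if (($perms & 0077) !== 0) {
        throw new RuntimeException(sprintf('Key file %s is too permissive (%o)', $path, $perms & 0777));
    }
    $key = file_get_contents($path);
    if ($key === false || strlen($key) !== 32) {
        throw new RuntimeException('Key file must contain exactly 32 raw bytes');
    }
    return $key;
}
```

This only guards against accidental exposure to other local accounts; as the answer below points out, an attacker who can run code as the owning account can still read the key.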

If you hide your key and the hacker gets to your PHP code the hacker could do the following:
A. Echo/print the KEY (it's a variable). He doesn't have to find the code where you define it; he could just look for the function where you use it.
B. Forget about the KEY and use your own decrypting function to see the data and export it.
Most attacks exploit your PHP code, so you should secure your computer, system and database.
Use MySQL Stored Procedures
If only an authenticated user should be able to get the key, you could do this: hide the KEY in a MySQL stored procedure and only give the key to the user when he completes the login. All the data in the database should be encrypted, and the key is not accessible as plain stored data; it lives inside the stored procedure, and a user with a password gets it as a query result. The users table should not be encrypted; only the password should be stored, as a one-way hash.
This last solution works only for authenticated users, for example in apps. You could protect user profiles easily with this solution.
Users (basic data) --> Profiles (all personal details that depend upon login)
I don't know much about stored procedures, but you could certainly apply them to the entire database and make your data accessible only through them.
Look at this article:
http://www.sitepoint.com/stored-procedures-mysql-php/
And you will find PROS and CONS here:
http://code.tutsplus.com/articles/an-introduction-to-stored-procedures-in-mysql-5--net-17843

Technique for creating unique custom CMS page IDs

I have a website where users can upload blog posts. This website is built entirely by me, no framework documentation to look through.
I've been using MySQL's primary key as the page ID in the URL, but I don't like this as it gives away too much information to the user.
The id appears somewhat like this
www.website.com/view?post=97
Youtube uses an 11 letter combination and looks somewhat like
watch?v=wEoFhRCUEs8 // *Not a plug*
I was thinking of running the ID through MD5, but 1) the result is far too long, and 2) that is not the intended use of MD5.
Any ideas on how sites like Facebook, Stack Overflow, YouTube etc. encrypt each ID whilst ensuring that it is unique?
I'm also unsure whether it is best to save a secondary unique ID in MySQL, or just pass the ID through a function that converts it every time I need it.
Thanks

I can provide you the logic:
First there is a post.
Then an attempt is made to insert it into the database.
Before inserting it into the database:
Generate a random string.
As soon as you generate the random string, check in the database whether it is taken.
If it is taken, generate a new string.
Otherwise, use that string.
Now insert all the necessary data into the database.
Done.
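The steps above can be sketched as follows, assuming an 11-character URL-safe slug like the YouTube example (64 random bits, base64url-encoded) and a hypothetical `$isTaken` callback that would query a UNIQUE `slug` column:

```php
<?php
// 8 random bytes -> 12 base64 characters with one '=' pad -> 11 URL-safe characters.
function randomSlug(): string
{
    return rtrim(strtr(base64_encode(random_bytes(8)), '+/', '-_'), '=');
}

// Generate, check, retry. $isTaken is hypothetical, e.g. SELECT 1 FROM posts WHERE slug = ?
function uniqueSlug(callable $isTaken, int $maxAttempts = 10): string
{
    for ($i = 0; $i < $maxAttempts; $i++) {
        $slug = randomSlug();
        if (!$isTaken($slug)) {
            return $slug;          // free: use it for the INSERT
        }
    }
    throw new RuntimeException('Could not find a free slug');
}
```

Checking and then inserting leaves a small window in which two requests can pick the same string; a UNIQUE constraint on the column, with a retry when the INSERT reports a duplicate key, closes it.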

You are actually on the right track with MD5. Basically, what you need to do is create a simple encryption based on the post ID. As long as it is not security-related, I'd write a quick two-way (reversible) encoding algorithm that converts 97 to "wEoFhRCUEs8" and vice versa. That way you can look up posts later.
Probably something that just bit-shifts the number and XORs it with a "secret" string, etc.
If you want to go a little more secure, try the mcrypt library; depending on the algorithm, you can limit the output size. (mcrypt has since been removed from PHP as of 7.2; the OpenSSL and Sodium extensions are the current options.)
If you don't use reversible encryption, then you have no choice but to store your unique string with your post so you can look it up later.
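A minimal sketch of the reversible mapping this answer describes: XOR the post ID with a secret number, then write the result in base 62. The alphabet, the secret and the function names are hypothetical. This is obfuscation, not encryption; anyone who collects a few ID/slug pairs can recover the secret.

```php
<?php
// Hypothetical alphabet and secret; changing either breaks every existing URL.
const SLUG_ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
const SLUG_SECRET = 0x5A3C96E1;

// Scrambles a non-negative post ID with XOR, then encodes it in base 62.
function encodeId(int $id): string
{
    if ($id < 0) {
        throw new InvalidArgumentException('IDs must be non-negative');
    }
    $alphabet = SLUG_ALPHABET;
    $n = $id ^ SLUG_SECRET;
    $out = '';
    do {
        $out = $alphabet[$n % 62] . $out;
        $n = intdiv($n, 62);
    } while ($n > 0);
    return $out;
}

// Reverses encodeId(): base 62 back to an integer, then XOR with the same secret.
function decodeId(string $slug): int
{
    // 10 base-62 digits stay below PHP_INT_MAX on 64-bit builds.
    if ($slug === '' || strlen($slug) > 10) {
        throw new InvalidArgumentException('Invalid slug length');
    }
    $n = 0;
    foreach (str_split($slug) as $ch) {
        $pos = strpos(SLUG_ALPHABET, $ch);
        if ($pos === false) {
            throw new InvalidArgumentException("Invalid character: $ch");
        }
        $n = $n * 62 + $pos;
    }
    return $n ^ SLUG_SECRET;
}
```

Since the mapping is computed on the fly, no second column is needed; the trade-off against storing a random string is that the scrambled value is predictable to anyone who works out the secret.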

How to implement a secret url for delivering information after payment without login?

I would like to deliver some information to customers after a paypal payment, using the paypal return url, and without having the customer log in.
So I think I need a system that creates a URL for each transaction and prevents the URL for another transaction from being guessed.
I have thought of something like:
http://www.domain.com/product/send.php?productID=12&transactionHash=[thisTransactionHash]
using a transactionHash that could be calculated based on the customer's email and the product unique id.
Does this method make sense? or what would be your recommendation delivering information without login, and avoiding customers guessing the url for other products?
Although there were several interesting answers about hashes, there is still one concern with the idea I mention above: PayPal needs to receive the return URL, so the information is passed before payment, and therefore the method does not protect against fraud.
The only secure way I see is the Paypal delivery system, which is why I accepted that answer.

If you target PayPal only, why don't you check Instant Payment Notification Guide?
https://www.x.com/sites/default/files/ipnguide.pdf
I didn't use Paypal before, but it seems this solves your problem.
Create table:
| product_id (unique ID of your product) | varchar transaction_hash |
In this sample code (PHP example):
https://www.x.com/developers/PayPal/documentation-tools/code-sample/216623
After validating that the payment is correct, insert the product ID and verify_sign (a value from PayPal's POST data) into the table, and give the user a URL with the product ID and verify_sign.

"using a transactionHash that could be calculated based on the customer's email and the product unique id."
As soon as the algorithm becomes known, your system will break down. My recommendation is a "secure" (i.e. cryptographic) PRNG plus a lookup table.

You can create a random ID for a user at any time you want, maybe even using some truly random generators out on the web.
BUT what you should do is make it UNIQUE for a specific amount of time. Depending on your needs, that could be a simple database structure, or information stored in files on your server that the same script deletes as soon as they have been read once.
So whenever a user generates such a unique ID, he can access that information either for a certain period of time or exactly once.
Using, say, random.org's random byte function, you can generate a string like:
6f0d47cf3432d4015e0e798641191bf0e8e0b90b00df23181bcb3401a0dad43d85be711343c3baa9
which is nearly impossible to guess even if someone else knows a productID AND the email address of said customer.
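A minimal sketch of the random-token-plus-lookup-table idea with expiry and single use. It uses PHP 7's `random_bytes()`, which draws from the operating system's CSPRNG, so an external service is not strictly required. An array stands in for the hypothetical lookup table, and the token should only be issued after the payment has been verified:

```php
<?php
// Issues an unguessable token. $store is a stand-in for a table:
// token => [productId, expires, used].
function issueToken(array &$store, int $productId, int $ttlSeconds = 86400): string
{
    $token = bin2hex(random_bytes(32));   // 256 random bits, 64 hex characters
    $store[$token] = [
        'productId' => $productId,
        'expires' => time() + $ttlSeconds,
        'used' => false,
    ];
    return $token;
}

// Returns the product ID if the token exists, has not expired and has not been
// used yet; null otherwise. Redeeming marks the token as used (one download).
function redeemToken(array &$store, string $token): ?int
{
    if (!isset($store[$token])) {
        return null;
    }
    $row = $store[$token];
    if ($row['used'] || $row['expires'] < time()) {
        return null;
    }
    $store[$token]['used'] = true;
    return $row['productId'];
}
```

The token carries no information about the customer or product; it is only a pointer to a row, so knowing the product ID and email address does not help an attacker.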

Using a hash to access some stored information without logging in isn't a bad idea. BUT that hash should not be generated from already known data like IDs, email addresses or similar data that could be known or guessed by any user.
Instead, you need to randomly generate a long enough value that cannot be guessed or derived from any known data.
The already mentioned byte function from random.org could be a good choice for that.

Include a hash parameter whose value is calculated from several parameters. For example, for your URL I would calculate the hash like this:
$uniqueKeyString = "some random characters";
$transactionHash = md5("domain.com" . $productId . time() . $uniqueKeyString);
where $uniqueKeyString is a secret value (some random string) that only you know.
Then, when a request comes to your server, you can simply calculate the hash string yourself and check whether it matches the transactionHash of the request. Because the hash includes time(), the timestamp you used must also be passed in the URL; otherwise the server cannot recompute the same value.
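A minimal sketch of the same signed-URL idea with two adjustments: `hash_hmac()` instead of MD5 over a string with the secret appended, and the timestamp sent in the URL as a hypothetical `expires` parameter so the server can recompute the signature. `hash_equals()` compares the two values in constant time. The secret and parameter names are hypothetical:

```php
<?php
const URL_SECRET = 'replace-with-a-long-random-secret'; // hypothetical; keep it out of source control

function signedUrl(int $productId, int $expires): string
{
    $sig = hash_hmac('sha256', $productId . '|' . $expires, URL_SECRET);
    return 'http://www.domain.com/product/send.php?' . http_build_query([
        'productID' => $productId,
        'expires' => $expires,
        'transactionHash' => $sig,
    ]);
}

// $query is the request's query parameters, e.g. $_GET.
function verifySignedRequest(array $query, ?int $now = null): bool
{
    $now = $now ?? time();
    foreach (['productID', 'expires', 'transactionHash'] as $name) {
        if (!isset($query[$name]) || !is_string($query[$name])) {
            return false;
        }
    }
    if ((int) $query['expires'] < $now) {
        return false;   // link has expired
    }
    $expected = hash_hmac('sha256', $query['productID'] . '|' . $query['expires'], URL_SECRET);
    return hash_equals($expected, $query['transactionHash']);
}
```

As the question's own update points out, a URL handed to PayPal before payment proves nothing about the payment, so a signed link like this belongs in the post-payment step.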

Save and restore search parameters as unique ID

Noob question here. I'm overhauling some "Search" pages on a real estate website. I would like to be able to generate a unique ID (hash?) that contains all the parameters of the search, e.g., the user would be given a URL in the form of http://search.example.com/a95kl53df-02, and loading this URL would repeat the exact same search.
Some of the search parameters are simply one of several options, some are integers, and there are also keywords (which I'll just append after the ID, I guess). What's the general approach to cramming this data into a string? I'm fairly comfortable with PHP/MySQL, but my practical experience is next to none, so I don't know "how it's done".
EDIT: I do not need the string to be random, and, indeed, I need the process to be two-way. Perhaps hash isn't the correct term, then. As for why - I'm doing this for the sake of brevity, since current URLs contain at least 22 GET parameters.
I have the nasty habit of always asking my questions on the Interwebs a bit too early, reconsiderations popping right into my head as soon as I have posted. I'm currently drafting a possible solution. I'm still open to any suggestions, though.

Hashes are not unique
A hash is NOT unique, you can't use it. Any hash can result from an infinite number of given strings.
You don't need randomness, just a unique token
You should just generate a unique token with the help of the database (even just an auto-increment ID). You can create a cron job that deletes old searches after a while.
That table would minimally contain the unique token plus the original search string.
Possible implementation
User does a search
Search params are stored in database, token is returned
Token is given to user in some way (e.g. do you want to save this search for later)
When user wants to repeat search with token, search string is retrieved from db and search run
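A minimal sketch of those four steps, with an array standing in for the hypothetical searches table and a counter standing in for its auto-increment ID, written in base 36 to keep the token short. Function and parameter names are illustrative:

```php
<?php
// Stores the search parameters and returns a short token.
// A real table would have an AUTO_INCREMENT id, the query string and a
// created_at column for the cleanup cron job.
function saveSearch(array &$searches, array $params): string
{
    $id = count($searches) + 1;                     // stand-in for AUTO_INCREMENT
    $token = base_convert((string) $id, 10, 36);    // short, URL-friendly
    $searches[$token] = http_build_query($params);  // original search string
    return $token;
}

// Looks the token up and rebuilds the original parameters.
function loadSearch(array $searches, string $token): ?array
{
    if (!isset($searches[$token])) {
        return null;
    }
    parse_str($searches[$token], $params);
    return $params;
}
```

Values come back as strings after the round trip, so integer parameters need the same validation as any other request input.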

You could use something like mcrypt_encrypt() on $_SERVER['QUERY_STRING'], and then decrypt it when an encrypted URL is passed in. However, there are all sorts of problems here and I recommend not doing that.
Based on your edit that you are doing this because of a complicated URL, I would suggest that hashing is going to make the problem worse. If you have an error with the URL, you now have multiple places it could be going wrong.
Just make a random key that you then lookup in a simple flat-file database. You could check whether the URL is already in the database and then return the key if it is.
Another advantage of this system is that if your URL structure changes, then you can change all the URLs in the database and the users' short URLs still work.
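A minimal sketch of the flat-file variant with the duplicate check, assuming a hypothetical JSON file that maps random keys to query strings. There is no file locking, so concurrent writers would need `flock()` or a real database:

```php
<?php
// Returns the existing key for a query string, or stores it under a new random key.
function shortKeyFor(string $file, string $queryString): string
{
    $map = is_file($file) ? json_decode((string) file_get_contents($file), true) : [];
    if (!is_array($map)) {
        $map = [];
    }
    $existing = array_search($queryString, $map, true);
    if ($existing !== false) {
        return (string) $existing;          // same search already stored: reuse its key
    }
    do {
        $key = bin2hex(random_bytes(5));    // 10 hex characters
    } while (isset($map[$key]));
    $map[$key] = $queryString;
    file_put_contents($file, json_encode($map));
    return $key;
}

// Returns the stored query string for a key, or null if the key is unknown.
function queryStringFor(string $file, string $key): ?string
{
    $map = is_file($file) ? json_decode((string) file_get_contents($file), true) : [];
    return (is_array($map) && isset($map[$key])) ? $map[$key] : null;
}
```

Because only the mapping is stored, rewriting the query strings in the file after a URL structure change leaves the users' short keys working, as the answer notes.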

Well, to be random (which, by the way, you never can be), you can hash, say, the microtime (which is random-ish, since there is a low probability that two users will search at the same microsecond) along with some salt; for the salt you can use the query ID:
so something like:
$store_unique = md5(microtime().$queryID);
//the $store_unique you can save to the db with the query params
//then when anyone goes to the random url, you can check it against the db
UPDATE
Due to the comments below, I offer another solution (which can be more unique):
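$store_unique = str_replace(' ', '', microtime()) . "-" . $queryID;
//microtime() returns "msec sec" with a space in the middle, so the space is stripped to keep the value URL-safe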
//the $store_unique you can save to the db with the query params
//then when anyone goes to the random url, you can check it against the db

Two-key encryption/decryption?

I'm looking to store some fairly sensitive data using PHP and MySQL and will be using some form of reversible encryption to do so since I need to get the data back out in plain text for it to be of any use.
I'll be deriving the encryption key from the users' username/password combination but I'm stumped for what to do in the (inevitable) event of a password being forgotten. I realise that the purpose of encryption is that it can only be undone using the correct key but this must have been addressed before..
I'm trying to get my head around whether or not public key cryptography would apply to the problem but all I can think of is that the private key will still need to be correct to decrypt the data..
Any ideas?

It's not clear what you are striving for, so advice on how to implement it is hard.
Standards like PGP and S/MIME encrypt each message with a new symmetric key. Those keys are then encrypted for each recipient of the message. This way, instead of duplicating the message (which could be very large) for each recipient, everyone gets the same ciphertext, and only the key (which is small) is duplicated—but encrypted differently for each recipient.
Maybe you could do something similar here, encrypting the key with the user's password, and encrypting another copy with your public key. If the user forgets their password, you can recover the message for them (after an appropriate backup identity verification) using your private key.
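A minimal sketch of that scheme, under stated assumptions: AES-256-GCM for the data, PBKDF2-SHA256 to turn the password into a key-encryption key, and RSA with OAEP padding for the administrator's copy of the data key, whose private half is kept offline. Function names and the iteration count are hypothetical choices:

```php
<?php
function aesGcmEncrypt(string $plain, string $key): string
{
    $iv = random_bytes(12);
    $tag = '';
    $ct = openssl_encrypt($plain, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    return $iv . $tag . $ct;               // 12-byte nonce | 16-byte tag | ciphertext
}

function aesGcmDecrypt(string $blob, string $key): ?string
{
    $pt = openssl_decrypt(substr($blob, 28), 'aes-256-gcm', $key, OPENSSL_RAW_DATA,
                          substr($blob, 0, 12), substr($blob, 12, 16));
    return $pt === false ? null : $pt;     // wrong key or tampering -> null
}

function passwordKey(string $password, string $salt): string
{
    return hash_pbkdf2('sha256', $password, $salt, 100000, 32, true);
}

// One random data key encrypts the data; two wrapped copies of that key are stored.
function encryptForUserAndRecovery(string $data, string $password, string $recoveryPublicKeyPem): array
{
    $dataKey = random_bytes(32);
    $salt = random_bytes(16);
    if (!openssl_public_encrypt($dataKey, $wrappedForAdmin, $recoveryPublicKeyPem, OPENSSL_PKCS1_OAEP_PADDING)) {
        throw new RuntimeException('RSA wrap failed');
    }
    return [
        'ciphertext' => aesGcmEncrypt($data, $dataKey),
        'salt' => $salt,
        'keyForUser' => aesGcmEncrypt($dataKey, passwordKey($password, $salt)),
        'keyForRecovery' => $wrappedForAdmin,
    ];
}

function decryptWithPassword(array $record, string $password): ?string
{
    $dataKey = aesGcmDecrypt($record['keyForUser'], passwordKey($password, $record['salt']));
    return $dataKey === null ? null : aesGcmDecrypt($record['ciphertext'], $dataKey);
}

// Used only after out-of-band identity verification; $privateKey never lives on the server.
function decryptWithRecoveryKey(array $record, $privateKey): ?string
{
    if (!openssl_private_decrypt($record['keyForRecovery'], $dataKey, $privateKey, OPENSSL_PKCS1_OAEP_PADDING)) {
        return null;
    }
    return aesGcmDecrypt($record['ciphertext'], $dataKey);
}
```

After a recovery, the data key can be wrapped again under the user's new password without re-encrypting the data itself.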

The conventional solution is to have a "recovery agent": one user that holds a second password that can be used to decrypt all data. Strict usage policies would apply to using the recovery password, such as putting it physically into a safe.
Then, either encrypt all data twice: once with the user key and once with the recovery key; alternatively, generate a session key for every set of data, and encrypt the data only once, but the session key twice.
For that to work, at least the key of the recovery agent must be asymmetric, since the private part will live in the safe, and the public key in the software.
As yet another alternative using the same scheme: encrypt the users' passwords with the recovery key whenever a password changes. This is simpler to implement, but it allows recovering the passwords and not just the data, which may be undesirable.

"I'm looking to store some fairly sensitive data using PHP and MySQL and will be using some form of reversible encryption to do so since I need to get the data back out in plain text for it to be of any use."
Protecting sensitive data is good. Now:
Whose data is it? (yours, your user's, or a third party?)
What does it need to be protected from? (disclosure, corruption (accidental or intentional), ...)
Who does it need to be protected from?
Uninvolved parties goes without saying.
Do you need / want to avoid accessing the plaintext data yourself (useful for deniability),
Do you need to protect either your user's data from being visible to a third party,
Or a third party's data from the user,
Or your data from the user or a third party?
What are likely attacks?
Do you need to protect in the case where the server is completely compromised?
Do you need to protect against an application level attack where the user simply gains access to some but not all available data (e.g. access to the SQL database, but not the filesystem)?
Will the amount of data be small enough that the attacker can guess and simply check whether he/she got it right? (short passwords, numbers, simple words, fixed form text are likely candidates)
Will the attacker have known plaintext with which to attack?
Is it better for the data to go away (or to re-retrieve the data) if the user forgets their password, or is it worth an increased risk of exposing the data to avoid that cost?
There are probably other questions, but this is the type of thing you want to think about when using encryption. The answers will help you figure out what you need vs. what you want, and will probably help point in the right direction. You might not want to share all of the answers with us.
"I'll be deriving the encryption key from the users' username/password combination but I'm stumped for what to do in the (inevitable) event of a password being forgotten. I realise that the purpose of encryption is that it can only be undone using the correct key but this must have been addressed before.."
You might have decided on a solution without considering the impact. That doesn't mean the solution is wrong, but this question suggests you should think about what you are willing to risk for security. Sometimes data will be risked.
"I'm trying to get my head around whether or not public key cryptography would apply to the problem but all I can think of is that the private key will still need to be correct to decrypt the data.."
This too sounds like a solution in search of a problem. Public key cryptography is useful when you have two (or more) separate actors with an interest in communicating data between them. Those actors can be real (people) or functional (components of a system), but without two actors, there is no reason to have a separate public and private key.

Basically, if you encrypt something, and lose the encryption key, you're screwed.
When it comes to securing data, you need to consider why you're securing it, and what you're attempting to secure it against. And what tradeoffs are worth making in order to do so - the only truly secure system is one that is completely isolated from the internet, which is a level of security that is self-defeating for most applications.
So here are some questions to ask yourself:
If someone compromises my database, is it acceptable for them to be able to access this data?
What if someone compromises my entire application stack?
If the answers to the above two questions are "no", then the key material must be held by the user. And they will lose access to their data if they lose the key.
You can provide an option for manual key recovery if you also have a "master key" that you don't store anywhere near your application, only you hold it and you use it to manually reset passwords. If that's also not an option (say, only the user should be able to access the data, not the system administrator), then you're going to have to make a compromise somewhere.

This is a question I have thought about myself and as I see it the following options are available (with option #1 being the most secure):
1. Provide no reset password functionality: if they have forgotten their password, they are locked out.
2. Generate a new secure master key, encrypt and hash the user's key with this master key, and store the ciphertext and hash result in the database. The master key is then made known to the user, either by adding it to a file that the user downloads, emailing it to the user or displaying it on screen. To reset the password, the user has to enter this master key, which is then hashed and compared; if the values match, the user's key in the database is decrypted.
3. Ask the user to provide 2 security questions and answers when registering; hash the answers and store the questions and answer hashes in the database. The second answer is used as the master key to encrypt the user's key. To receive a password reset email, the user has to answer the first question correctly. Once they click the link in the email, the web page asks the second question; if the answer is correct and the query string parameter values are valid, the answer to the second question is used to decrypt the user's key.
4. Use an application-global master key (maybe stored in the web/UI application) to encrypt and store the user's key. Once a user is verified through a password reset email process, the user's key is decrypted using the application-global master key and then re-encrypted with their new password.
In summary, the trade-offs of each option are as follows:
1. This is the ultimate for security and would possibly be the only option if it were critical that the data be kept encrypted. However, in the real world people forget their passwords as surely as the sun rises, and not providing a reset password function could be a bad commercial decision.
2. This is secure because the master key is not stored on the front end or in the database, so if the platform is compromised, the data would require significant effort to decrypt. However, the downside is that the user could still lose their master key anyway.
3. The weakness here is that if the database is compromised, the answer to the question could be researched and then used to decrypt the user's encrypted key.
4. This approach leaves the application key in the stack, leaving your data vulnerable if your platform is hacked. The only protection you have is that if only the database server is hacked, the data would still be safe.
As with most things in the world of software development you need to consider what is best for what you are trying to accomplish and aim for the correct balance.
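The second option (a master key handed to the user) can be sketched as follows, under stated assumptions: the "master key" is a random recovery code, its hash is stored with `password_hash()`, and the user's key is wrapped with AES-256-GCM under a PBKDF2-derived key. Function names and the iteration count are hypothetical choices:

```php
<?php
// Wraps the user's key under a key derived from the recovery code.
function wrapKey(string $userKey, string $code): string
{
    $salt = random_bytes(16);
    $iv = random_bytes(12);
    $tag = '';
    $k = hash_pbkdf2('sha256', $code, $salt, 100000, 32, true);
    $ct = openssl_encrypt($userKey, 'aes-256-gcm', $k, OPENSSL_RAW_DATA, $iv, $tag);
    return $salt . $iv . $tag . $ct;      // 16 | 12 | 16 | ciphertext
}

function unwrapKey(string $blob, string $code): ?string
{
    $k = hash_pbkdf2('sha256', $code, substr($blob, 0, 16), 100000, 32, true);
    $pt = openssl_decrypt(substr($blob, 44), 'aes-256-gcm', $k, OPENSSL_RAW_DATA,
                          substr($blob, 16, 12), substr($blob, 28, 16));
    return $pt === false ? null : $pt;
}

// At registration: 'code' is shown to or downloaded by the user once and never stored;
// only 'codeHash' and 'wrappedKey' go into the database.
function createRecovery(string $userKey): array
{
    $code = bin2hex(random_bytes(16));
    return [
        'code' => $code,
        'codeHash' => password_hash($code, PASSWORD_DEFAULT),
        'wrappedKey' => wrapKey($userKey, $code),
    ];
}

// At reset: check the entered code against its hash, then unwrap the user's key.
function recoverUserKey(array $stored, string $enteredCode): ?string
{
    if (!password_verify($enteredCode, $stored['codeHash'])) {
        return null;
    }
    return unwrapKey($stored['wrappedKey'], $enteredCode);
}
```

Strictly speaking, the GCM tag already rejects a wrong code; the separate hash mirrors the "hashed and compared" step in the option's description.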

Why are you using a different key for every user?
If you choose one key, it is much easier to handle.
Store your encryption key outside of the database.
Your application will still have to have access to it, but someone with a db dump will not be able to read the encrypted info.

Generate a random session key.
Use the session key to encrypt the data.
Encrypt the random key with any number of user passwords that you need.
This way you can use any user password to decrypt the data.
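A minimal sketch of those four steps, assuming AES-256-GCM for both the data and the wrapped keys and PBKDF2-SHA256 to derive a key-encryption key from each password; function names and the iteration count are hypothetical choices:

```php
<?php
function gcmEncrypt(string $plain, string $key): string
{
    $iv = random_bytes(12);
    $tag = '';
    $ct = openssl_encrypt($plain, 'aes-256-gcm', $key, OPENSSL_RAW_DATA, $iv, $tag);
    return $iv . $tag . $ct;
}

function gcmDecrypt(string $blob, string $key): ?string
{
    $pt = openssl_decrypt(substr($blob, 28), 'aes-256-gcm', $key, OPENSSL_RAW_DATA,
                          substr($blob, 0, 12), substr($blob, 12, 16));
    return $pt === false ? null : $pt;
}

function passwordKek(string $password, string $salt): string
{
    return hash_pbkdf2('sha256', $password, $salt, 100000, 32, true);
}

// Encrypts $data once with a random session key, then wraps that key per password.
function encryptShared(string $data, array $passwords): array
{
    $sessionKey = random_bytes(32);
    $wrapped = [];
    foreach ($passwords as $password) {
        $salt = random_bytes(16);
        $wrapped[] = ['salt' => $salt, 'key' => gcmEncrypt($sessionKey, passwordKek($password, $salt))];
    }
    return ['ciphertext' => gcmEncrypt($data, $sessionKey), 'wrappedKeys' => $wrapped];
}

// Tries each wrapped copy; any one matching password recovers the data.
function decryptShared(array $record, string $password): ?string
{
    foreach ($record['wrappedKeys'] as $w) {
        $sessionKey = gcmDecrypt($w['key'], passwordKek($password, $w['salt']));
        if ($sessionKey !== null) {
            return gcmDecrypt($record['ciphertext'], $sessionKey);
        }
    }
    return null;
}
```

Adding or revoking a user only touches the small list of wrapped keys; the data itself is encrypted exactly once.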
