I've got a database table that has a simple incrementing integer as the primary key (1,2,3 etc). These numbers represent the alumni of a college, with personal information in the records. I've been asked to give each record a unique ID when displayed to the user (after they've queried the database) but the ID must not be the primary key, and it must be consistent.
If someone retrieved a record with the arbitrary ID 88gh344r, for example, then did another search and retrieved record 88gh344r again they need to be able to say "That's the same person". Since people need to be able to recognise the identifier from one search to the next, then the ID can't be long and complex.
I've thought of three approaches:
Create an extra table containing the primary key and a random sequence of numbers, and get the query to retrieve the random number equivalent of the primary key.
Hash or encrypt the primary key using MySQL's SHA2 or AES, but these produce long letter and digit sequences.
Encode the primary key on the fly in the query, using something like Base64 encoding in PHP.
Which of these is best, or have I missed a better approach?
I actually just wrote a short tutorial on a URL shortener that works on that basis, using a record ID as the seed. You could use that function to create your lookup key and store it in the DB as the "reference" key. The code is here
If you're doing it to protect privacy, you're heading for a major fsckup. It won't take long for the lamest script kiddie to write a simple program which just tries every possible "hash", downloading your entire list.
You should look into proper access control so people can see only what they're allowed to see.
If I understand you correctly, your main goal is just not to reveal the primary keys, but use something else instead when communicating with the users.
Simplest way:
Add a CHAR column to your table and choose the length you want the other identifiers to have, for example CHAR(16).
Give that column a UNIQUE index, so that you won't have any duplicates.
For each row, generate a secure *random* string of length 16 and update the row.
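The three steps above can be sketched as follows. This is only an illustration in Python with SQLite standing in for MySQL; the table name `alumni`, the column name `public_id`, and the lowercase-plus-digits alphabet are all assumptions, not something from the question.

```python
import secrets
import sqlite3
import string

# Assumed character set for the public identifier.
ALPHABET = string.ascii_lowercase + string.digits


def random_token(length: int = 16) -> str:
    """Return a random string drawn from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def backfill_public_ids(conn: sqlite3.Connection) -> None:
    """Give every row that has no public_id yet a fresh random one."""
    rows = conn.execute("SELECT id FROM alumni WHERE public_id IS NULL").fetchall()
    for (pk,) in rows:
        while True:
            try:
                conn.execute(
                    "UPDATE alumni SET public_id = ? WHERE id = ?",
                    (random_token(), pk),
                )
                break
            except sqlite3.IntegrityError:
                # The UNIQUE constraint rejected a duplicate; draw again.
                continue


# Hypothetical table: integer primary key plus a UNIQUE CHAR(16) column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alumni (id INTEGER PRIMARY KEY, name TEXT, public_id CHAR(16) UNIQUE)"
)
conn.executemany("INSERT INTO alumni (name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)])
backfill_public_ids(conn)
```

Because the string is random rather than derived from the primary key, nothing about the key can be computed from it; the UNIQUE constraint is what guarantees there are no duplicates.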
DO NOT hash the plain primary key. If the keys start from 1, 2, 3, ... then anybody can match an ID to its hash just by calculating the hashes of 1, 2, 3 and so on.
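To make the warning concrete, here is a small Python sketch of the attack. It assumes the site exposes the unsalted SHA-256 of the decimal primary key, which is just one plausible way of "hashing the key"; the function names are made up for the example.

```python
import hashlib


def sha256_hex(n: int) -> str:
    """Unsalted SHA-256 of the decimal primary key (the scheme being attacked)."""
    return hashlib.sha256(str(n).encode()).hexdigest()


def build_reverse_table(max_id: int) -> dict[str, int]:
    """Precompute hash -> primary key for every key up to max_id."""
    return {sha256_hex(i): i for i in range(1, max_id + 1)}


table = build_reverse_table(10_000)   # takes a fraction of a second
leaked = sha256_hex(4242)             # what a user would see in the page
recovered = table[leaked]             # the primary key behind it: 4242
```

A secret salt or key makes this particular lookup table useless, but then you are relying on keeping that secret rather than on the hash itself.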
Another problem is that if you for example already have 200 rows in the table and you add 1, then the attacker can automatically associate the primary key 201 to the random string that just appeared in the list.
On the other hand, why do you need to hide the primary keys in the first place? Maybe you should instead encrypt the personal user data in the columns.
you could do a base 36 encode on the userid*100 or something, for example.
userid 26=208
userid 3=8C
http://www.translatorscafe.com/cafe/units-converter/numbers/calculator/decimal-to-base-36
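A minimal Python sketch of that idea, reproducing the two values above. The function names and the `factor` parameter are made up for the example. Note that this is plain encoding: anyone who guesses the scheme can reverse it, as `original_id` shows.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int) -> str:
    """Convert a non-negative integer to an uppercase base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, r = divmod(n, 36)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def public_id(userid: int, factor: int = 100) -> str:
    """Multiply the user ID by a fixed factor, then base-36 encode it."""
    return to_base36(userid * factor)


def original_id(code: str, factor: int = 100) -> int:
    """Reverse the encoding: parse base 36, then divide by the factor."""
    return int(code, 36) // factor


print(public_id(26))        # 208
print(public_id(3))         # 8C
print(original_id("208"))   # 26
```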
You can truncate a hash or an encrypted value to the desired length, but with both you risk a collision. With eight base-36 digits you have about a 50% chance of a collision if you have 2 million records. If you don't convert to base-36 and just take eight hexadecimal digits, you only need 80 thousand records.
With random numbers you don't have this problem, because you can put a uniqueness constraint on the column and generate a new number if a collision occurs.
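The collision figures above follow from the usual birthday approximation, in which the probability of at least one collision among n values drawn from a space of size N is roughly 1 - exp(-n(n-1)/(2N)). A quick Python check (the function names are just for illustration):

```python
import math


def collision_probability(n: int, space_size: int) -> float:
    """Approximate chance of at least one collision among n random values."""
    return 1 - math.exp(-n * (n - 1) / (2 * space_size))


def records_for_half_collision(space_size: int) -> int:
    """Approximate number of values at which the collision chance reaches 50%."""
    return math.ceil(math.sqrt(2 * space_size * math.log(2)))


base36_space = 36 ** 8   # eight base-36 digits
hex_space = 16 ** 8      # eight hexadecimal digits

print(records_for_half_collision(base36_space))   # just under 2 million
print(records_for_half_collision(hex_space))      # a little under 80 thousand
```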
Don't make it too complicated.
Consider this:
The user is given an alternative mapping key. This can be a temporary session mapping and/or a secondary unique key (which is not the PK and may or may not be in the same table) and should, of course, be unique within the domain.
An access token, randomly generated per item but not necessarily unique, which is combined with a simple PK to form the exposed ID. This can be made to look pretty across values with an appropriate transformation if desired. The access token may also be treated as a separate value.
I like the second approach. In both cases it is not the "direct PK" which is exposed, although there is more coupling in the second form.
Both of these will prevent "knowing" the next key based off a sequence, but guessing/brute force is tied to the size of the domain: as others have said, this shouldn't be used as a primary security layer.
Happy coding.
The short answer I found after a two-month search is Hashids, which you can install for many languages from https://hashids.org/. Its options are:
Generate the obfuscated string using a key called the salt.
Define a minimum length for the string, for example 8 characters, which is not possible with functions like base64_encode().
Decode the string back to the original value.
Define your own alphabet, for example a-z only (note that the alphabet must contain at least 16 characters).
Note: for PHP it is recommended to enable the bcmath and GMP extensions on your web host, but it works without GMP.
I'm working on an application which allows a moderator to edit a user's information.
So, at the moment, I have URLs like
http://xxx.xxx/user/1/edit
http://xxx.xxx/user/2/edit
I'm a bit worried here, as I'm directly exposing the users table's primary key (id) from the database.
I simply take the ID from the URL (e.g. 1 and 2 from the URLs above), query the database with that ID and get the user information (of course, I sanitize the input, i.e. the ID from the URL).
Please note that:
I'm validating every request to check whether the moderator has access to edit that user
This is what I'm doing. Is this safe? If not, how should I be doing it?
I can think of one alternative, i.e. have a separate column in the users table with a 25-character key, use those keys in the URLs and query the database with them
But,
What difference does it make? (Since key is exposed now)
Querying by primary key yields results faster than querying by other columns
This is safe (and seems to be the best way to do it) as long as the validation of the admin rights is correct and you have prevention for SQL injection. Both of which you mention so I'd say you're good.
The basic question is whether exposing the primary key is safe or not. I would say that in most situations it is safe, and I believe that Stack Overflow does it the same way:
http://stackoverflow.com/users/1/
http://stackoverflow.com/users/2/
http://stackoverflow.com/users/3/
If you check the "member for" value on each profile, you can see that the time is decreasing, so the number is probably the PK as well.
Anyway, obscuring the PK can be useful in situations where you want to keep a common user from going through all entries just by typing 1, 2, 3 etc. into the URL. In that case, replacing the PK with something like 535672571d2b4 is useful.
If you are really unsure, you could also XOR the ID with a nice (big) fixed value. This way you would not expose your IDs. When you apply the same "secret number" again to the XOR'ed value, you get the original value back.
$YOUR_ID xor $THE_SECRET_NUMBER = $OUTPUTTED_VALUE
$OUTPUTTED_VALUE xor $THE_SECRET_NUMBER = $YOUR_ID
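The round trip can be sketched in a few lines of Python. The secret value here is an arbitrary constant picked for the example.

```python
# Arbitrary fixed secret for illustration; keep the real one out of source control.
THE_SECRET_NUMBER = 0x5DEECE66D


def obscure(record_id: int) -> int:
    """XOR the real ID with the secret to get the value shown to users."""
    return record_id ^ THE_SECRET_NUMBER


def reveal(outputted_value: int) -> int:
    """XOR again with the same secret to get the original ID back."""
    return outputted_value ^ THE_SECRET_NUMBER


shown = obscure(12345)
print(reveal(shown))   # 12345
```

Keep in mind that this is obfuscation only: anyone who learns a single real ID together with its shown value can recover the secret by XOR'ing the two.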
Short answer: no.
Long answer:
You have a primary key, which is unique, to identify someone with. If you add another unique key to keep people from knowing the primary key, all you achieve is that they know a different key.
That key still needs to be unique and indexed (for fast lookups), which sounds a lot like a primary key.
If it is a matter of nice URLs, then you could use a username or something like that.
But it would be security through obscurity. It is better to prevent SQL injection and validate that people have access to the right actions.
If you have plain auto-increment IDs, you will expose your data to the world. It is not secure (e.g. against brute-forcing all available data in your tables). But you can generate the IDs of your DB entities not sequentially, but in a pseudo-random manner. E.g. in PostgreSQL:
CREATE TABLE t1 (
id bigint NOT NULL DEFAULT (((nextval('id_seq'::regclass) * 678223072849::bigint)
% (1000000000)::bigint) + 460999999999::bigint),
...
<other fields here>
)
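To see why that default expression does not produce duplicates, here is the same arithmetic in Python with the constants copied from the snippet above. The function names and the inverse function are additions for illustration. The multiplier is coprime with 1000000000, so multiplying modulo that number permutes the values 0..999999999 instead of colliding.

```python
import math

# Constants copied from the PostgreSQL default expression above.
MULT = 678223072849
MOD = 1_000_000_000
OFFSET = 460_999_999_999

# Modular inverse of the multiplier (needs Python 3.8+ for pow with -1).
INVERSE = pow(MULT, -1, MOD)


def scramble(seq: int) -> int:
    """Map a sequence value to its pseudo-random looking public ID."""
    return (seq * MULT) % MOD + OFFSET


def unscramble(public_id: int) -> int:
    """Recover the sequence value (valid for sequence values below MOD)."""
    return ((public_id - OFFSET) * INVERSE) % MOD


print(math.gcd(MULT, MOD))        # 1, so the mapping has no collisions below MOD
print(unscramble(scramble(42)))   # 42
```

Two caveats, as far as I can tell: the mapping is a fixed linear formula, so it only stops casual 1, 2, 3 enumeration, and it starts repeating once the sequence passes 1000000000. Also, Python integers never overflow, whereas in PostgreSQL the bigint product would go out of range once the sequence passes roughly 13.6 million.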
I have a CSV in the format:
Bill,Smith,123 Main Street,Smalltown,NY,5551234567
Jane,Smith,123 Main Street,Smalltown,NY,5551234567
John,Doe,85 Main Street,Smalltown,NY,5558901234
John,Doe,100 Foo Street,Bigtown,CA,5556789012
In other words, no one field is unique. Two people can have the same name, two people can have the same phone, etc., but each line is itself unique when you consider all of the fields.
I need to generate a unique ID for each row but it cannot be random. And I need to be able to take a line of the CSV at some time in the future and figure out what the unique ID was for that person without having to query a database.
What would be the fastest way of doing this in PHP? I need to do this for millions of rows, so md5()'ing the whole string for each row isn't really practical. Is there a better function I should use?
If you need to be able to later reconstruct the ID from only the text of the line, you will need a hash algorithm. It doesn't have to be MD5, though.
"Millions of IDs" isn't really a problem for modern CPUs (or, especially, GPUs. See Jeff's recent blog about Speed Hashing), so you might want to do the hashing in a different language than PHP. The only problem I can see is collisions. You need to be certain that your generated hashes really are unique, the chance of which depends on the number of entries, the used algorithm and the length of the hash.
According to Jeff's article, MD5 already is one of the fastest hash algorithms out there (with 10-20,000 million hashes per second), but NTLM appears to be twice as fast.
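For reference, a minimal Python sketch of a deterministic ID derived only from the text of a line. It uses SHA-256 from the standard library rather than MD5 or NTLM (my substitution, simply because it is always available), and stripping surrounding whitespace before hashing is an assumption about how the lines should be normalised.

```python
import hashlib


def row_id(line: str) -> str:
    """Return a hex ID that depends only on the content of the CSV line."""
    normalized = line.strip()   # ignore the trailing newline and stray spaces
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


rows = [
    "Bill,Smith,123 Main Street,Smalltown,NY,5551234567",
    "Jane,Smith,123 Main Street,Smalltown,NY,5551234567",
    "John,Doe,85 Main Street,Smalltown,NY,5558901234",
    "John,Doe,100 Foo Street,Bigtown,CA,5556789012",
]
ids = [row_id(r) for r in rows]
```

The same line always gives the same ID, with no database lookup. Truncating the digest makes the ID shorter but raises the chance of a collision.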
Why not just
CREATE TABLE data (
first VARCHAR(50),
last VARCHAR(50),
addr VARCHAR(50),
city VARCHAR(50),
state VARCHAR(50),
phone VARCHAR(50),
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
PRIMARY KEY (id)
);
LOAD DATA [LOCAL] INFILE 'file.csv'
INTO TABLE data
FIELDS TERMINATED BY ','
(first,last,addr,city,state,phone);
How about just adding the unique ID as a field?
$csv = file($file);           // read the CSV into an array of lines
$i = 0;
$csv_new = array();
foreach ($csv as $val) {      // loop over the lines, not the file name
    $csv_new[] = $i . "," . $val;
    $i++;
}
Then output $csv_new as the new CSV file.
Dirty but it may work for you.
I understand what you're saying, but I don't see the point. Creating a unique ID that auto-increments in the database would be the best route. The second route would be to put a formula such as cell = A1 + 1 in the CSV and drag it down the entire column. In PHP you can read the file, prepend something such as date('ymd') . $id to each line, then write it back to the file. Again, though, this seems silly to do, and the database route would be best. Just keep PCI compliance in mind and always encrypt the data.
I'll post code later. I'm not at the PC at this time.
It's been a long time, but I found a situation that is sort of like this, where I needed to prevent a duplicate row from being created in a database. I created another column called de_dup, which was set to be unique. Then, for each row on creation, I used date('ymd').md5(implode($selected_csv_values)); this prevents a customer from creating two orders on any given day unless specific information is different, i.e. firstname, lastname, creditcardnum, billingaddress.
I have a PHP application that inserts data into MySQL, including a randomly generated unique value. The string will have about 1 billion possibilities, with probably no more than 1 or 2 million entries at any one time. Essentially, most combinations will not exist in the database.
I'm trying to find the least expensive approach to ensuring a unique value on insert. Specifically, my two options are:
Have a function that generates this unique ID. On each generation, test if the value exists in the database, if yes then re-generate, if no, return value.
Generate a random string and attempt the insert. If the insert fails, check whether the error is 1062 (MySQL duplicate entry X for key Y), re-generate the key and insert with the new value.
Is it a bad idea to rely upon the MySQL error for re-trying the insert? As I see it, the value will probably be unique, and it seems the initial check (using technique 1) would be unnecessary.
EDIT #1
I should have also mentioned, the value must be a 6 character length string, composed of either uppercase letters and/or numbers. They can't be incremental either - they should be random.
EDIT #2
As a side note, I'm trying to create a redemption code for a gift certificate that is difficult to guess. Using numbers and letters creates 36 possibilities for each character, instead of 10 for just numbers or 26 for just letters.
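Here's a stripped-down version of the solution I created. The first value entered in the table is the primary key, which is auto-incremented. affected_rows will equal 1 if the insert is successful:
// $db is assumed to be a mysqli connection; build_code() returns only A-Z and 0-9
$code = build_code();
// "pk = pk" turns a duplicate into a no-op, so affected_rows stays 0 and we retry
$sql = "INSERT INTO certificates VALUES (NULL, '%s') ON DUPLICATE KEY UPDATE pk = pk";
while ($db->query(sprintf($sql, $code)) && $db->affected_rows == 0)
    $code = build_code();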
Is it a bad idea to rely upon the MySQL error for re-trying the insert?
Nope. Go ahead and use it if you want. In fact, many people think that if you check first and the value doesn't exist, then it's safe to insert. But unless you lock the table, it's always possible that another process might slip in and grab the ID.
So go ahead and generate a random ID if it suits your purpose. Just make sure you test your code so that it properly handles dups. It might also be useful to log dups, just to ensure your assumptions about how unlikely dups are to occur are correct.
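Here is what the insert-and-retry pattern looks like as a Python sketch, with SQLite standing in for MySQL: sqlite3.IntegrityError plays the role of MySQL error 1062. The table layout, the `make_code` parameter and the `max_attempts` cap are assumptions made for the example.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_uppercase + string.digits   # 36 symbols per character


def build_code(length: int = 6) -> str:
    """Random redemption code from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def insert_certificate(conn, make_code=build_code, max_attempts=5) -> str:
    """Try to insert a fresh code; on a duplicate, generate another and retry."""
    for _ in range(max_attempts):
        code = make_code()
        try:
            conn.execute("INSERT INTO certificates (code) VALUES (?)", (code,))
            return code
        except sqlite3.IntegrityError:
            # Duplicate rejected by the UNIQUE constraint; a good place to log it.
            continue
    raise RuntimeError("could not find an unused code")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE certificates (pk INTEGER PRIMARY KEY AUTOINCREMENT, code TEXT UNIQUE)"
)
first = insert_certificate(conn)
```

The database does the uniqueness check atomically as part of the insert, so there is no window for another process to grab the same code between a check and the insert.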
Define your table with unique constraint:
http://dev.mysql.com/doc/refman/5.0/en/constraint-primary-key.html
Why not just use: "YourColName BIGINT AUTO_INCREMENT PRIMARY KEY" to ensure uniqueness?
So, imagine a mysql table with a few simple columns, an auto increment, and a hash (varchar, UNIQUE).
Is it possible to give mysql a query that will add a column, and generate a unique hash without multiple queries?
Currently, the only way I can think of to achieve this is with a while, which I worry would become more and more processor intensive the more entries were in the db.
Here's some pseudo-php, obviously untested, but gets the general idea across:
while(!query("INSERT INTO table (hash) VALUES ('".generate_hash()."');")){
//found conflict, try again.
}
In the above example, the hash column would be UNIQUE, and so the query would fail on a conflict. The problem is, say there are 500,000 entries in the db and I'm working off of a base36 hash generator, with 4 characters. The likelihood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable.
So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert.
I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.
Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.
Thanks!
This is our approach for a similar project, where we wanted to generate unique coupon codes.
First, we used an AUTO_INCREMENT primary key. This ensures uniqueness and query speed.
Then, we created a base24 numbering system, using A,B,C, etc, without using O and I, because someone might have thought that they were 0 or 1.
Then we converted the auto-increment integer to our base24 number. For example, 0=A, 1=B, 28=BE, 1458965=EKNYF. We used base24 because numbers that are long in base 10 need fewer characters in base 24.
Then we created a separate column in our table, coupon_code. This was not indexed.
We took the base24 number and inserted 3 random characters into it, either digits or I and O (which were not used in our base24). For example, EKNYF could turn into 1EKON6F or EK2NY3F9. This was our coupon code, and we inserted it into our coupon_code column. It's unique and random.
So, when the user uses the code EK2NY3F9, all we have to do is remove all the unused characters (2, 3 and 9) and we get EKNYF, which we convert to 1458965. We just select the primary key 1458965 and then compare the coupon_code column with EK2NY3F9.
I hope this helps.
If your heart is set on using base-36 4 character hashes (hashspace is only 1679616), you could probably pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash would be as simple as moving it from the "unused table" to the "used table" which is O(1).
If your table is conceivably 1/3 full you might want to consider expanding your hashspace since it will probably fill up in your lifetime. Once the space is full you will no longer be able to find unique hashes no matter what algorithm you use.
What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?
Anyway, you should just use a bigger hash: find an MD5 function (if you're actually hashing something), or a UUID generator, with more than 4 characters. And yes, you could use a while loop, but just generate a value big enough that conflicts are incredibly unlikely.
As others have suggested, what's wrong with an autoinc field? If you want an alphanumeric value, then you could simply do a conversion from int to an alphanumeric string in base 36. This could be implemented in almost any language.
Going with zneak's comment, why don't you use an auto-increment column? Save the hash in another (non-unique) field, and concatenate the id to it (dynamically). So you give a user [hash][id]. You can parse it out in pure SQL using the substring functions.
Since you have to have the hash, the user can't look at other records by incrementing the id.
So, just in case someone runs across a similar issue: I'm using a UNIQUE field, and I'll be using a PHP hash function to insert the hashes. If the insert comes back with an error, I'll try again.
Hopefully, because of the low likelihood of conflict, it won't get slow.
You could also check the MySQL functions UUID() and UUID_SHORT(). Those functions generate identifiers that are designed to be unique, so you shouldn't have to double-check whether your PHP-generated hash string already exists.
I think in several cases these functions can also fit your project's requirements. :-)
If you already have the table filled with some content, you can alter it with the following:
ALTER TABLE `page` ADD COLUMN `hash` char(64) AS (SHA2(`content`, 256)) AFTER `content`
This solution adds the hash column right after the content one and generates the hash for existing and new records alike, without any need to change your INSERT statement.
If you add a UNIQUE index to the column (after having removed duplicates), your inserts will only succeed if the content is not already in the table. This will prevent duplicates.
I'm building an application that needs a random unique ID for each user, not a sequence.
MySQL database columns:
ID Username
What is the best way to generate my unique random ID?
PHP provides a uniqid function, which might do the trick, I suppose.
Note it's returning a string, though, and not an integer.
Another idea would be to generate / use some GUID -- there are some proposals about that in the user notes of the manual page of uniqid.
I would still have the normal auto-increment primary key to identify each row properly, it's just standard convention.
I'd then have another indexed column called 'user_id' or something and use uniqid(); for it.
MySQL provides a function called UUID():
http://dev.mysql.com/doc/refman/5.1/en/miscellaneous-functions.html#function_uuid
Documentation claims this:
A UUID is designed as a number that is globally unique in space and time. Two calls to UUID() are expected to generate two different values, even if these calls are performed on two separate computers that are not connected to each other.
This should cover your needs.