In this app, we have a user ID which is a simple auto-increment primary key. Since we do not want to expose it on the client side, we are going to use a simple hash (encryption is not important, only obfuscation).
So when a user is added to the table we compute uniqid() . user_id. This will guarantee that the user hash is random enough and always unique.
The question is this: while inserting the record, we do not know the user ID yet (we cannot assume max(user_id) + 1, since other inserts might be getting committed). So we do an insert, get the last_insert_id, and then use that for the user_id, which adds an additional DB query. Is there a better way to do this?
A few things before the actual answer: with recent versions of MySQL, which use InnoDB as the default storage engine, you always want an integer PK (the famous auto_increment). The reasons are mostly about performance. For more information, you can research how InnoDB clusters records by PK and why that matters so much. With that out of the way, let's consider our options for creating a unique surrogate key.
Option 1
You calculate it yourself, using PHP and information you obtained back from MySQL (the last_insert_id()), then you update the database back.
Pros: easy to understand by even novice programmers, produces short surrogate key.
Cons: extremely bad for concurrent access, you'll probably get clashes, and you never want to use PHP to calculate unique indices required by the database.
You don't want that option
Option 2
Supply the uniqid() to your query, create an AFTER INSERT trigger that will concatenate uniqid() with the auto_increment.
Pros: easy to understand, produces short surrogate key.
Cons: requires you to create the trigger, and implements magic that's not visible from the code, which will definitely confuse a developer who inherits the project at some point. From experience, I would bet that bad things will happen.
Option 3
Use universally unique identifiers or UUIDs (also known as GUIDs). Simply supply your query with surrogate_key = UUID() and MySQL does the rest.
Pros: always unique, no magic required, easy to understand.
Cons: none, unless the fact that it occupies 36 chars bothers you.
You want option 3.
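As a rough illustration of option 3, here is a minimal Python sketch in which the key is generated before the INSERT, so no follow-up query or UPDATE is needed. It rests on assumptions that are not in the original answer: an in-memory SQLite table stands in for the MySQL users table, the column names are hypothetical, and Python's uuid4() (a random UUID) plays the role of MySQL's UUID().

```python
import sqlite3
import uuid

# Assumption: an in-memory SQLite table stands in for the MySQL users table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # internal key, never exposed
    " surrogate_key TEXT NOT NULL UNIQUE,"         # public key shown to clients
    " name TEXT NOT NULL)"
)


def add_user(name: str) -> str:
    """Insert a user and return the public key in a single statement."""
    surrogate_key = str(uuid.uuid4())  # 36 characters including hyphens
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO users (surrogate_key, name) VALUES (?, ?)",
            (surrogate_key, name),
        )
    return surrogate_key


key = add_user("alice")
print(key, len(key))
```

The integer user_id stays available for joins and foreign keys, while only surrogate_key ever leaves the server.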
Since we do not want to expose this at the client side
Simply don't.
In a well-designed database, users never need to see a primary-key value. In fact, a user need never know the primary key even exists.
From your question it seems you actually replace your normal auto-increment ID column with a surrogate ID (if not, skip to the last paragraph).
Try creating a column with another unique surrogate ID and use that on your frontend. You can keep your normal primary IDs for relationships etc.
Remember one of the basic must rules for primary keys:
The primary key must be compact and contain the fewest possible attributes.
Also, integer serials have the advantage of being simple to use and implement. Depending on the specific implementation of the serialization method, they also have the advantage of being quickly derivable, as most databases just store the serial number in a fixed location. This means that instead of computing max(id)+1, the DB already has the next value stored, which makes auto-increment fast.
So we are doing an insert then getting the last_insert_id then using
that for the user_id, which adds an additional db query.
last_insert_id isn't actually a query; it is a value saved on your DB connection when you performed the insert query.
If you already have a second column for your surrogate ID ignore all the above:
So we are doing an insert then getting the last_insert_id then using
that for the user_id, which adds an additional db query. So is there a
better way to do this?
No, you can only retrieve that uniqid by doing a query.
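// $link is assumed to be an open mysqli connection (the old mysql_* functions were removed in PHP 7)
$res = mysqli_query($link, 'SELECT LAST_INSERT_ID() AS surrogate_id');
$row = mysqli_fetch_assoc($res);
$lastsurrogateid = $row['surrogate_id']; // matches the alias in the SELECT above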
Anything else is making it more complicated than necessary.
I'm working on an application which allows a moderator to edit information of user.
So, at the moment, I have URLs like
http://xxx.xxx/user/1/edit
http://xxx.xxx/user/2/edit
I'm a bit worried here, as I'm directly exposing the users table's primary key (id) from the database.
I simply take the ID from the URL (e.g. 1 and 2 from the URLs above), query the database with the ID, and get the user information (of course, I sanitize the input, i.e. the ID from the URL).
Please note that:
I'm validating every request to check whether the moderator has access to edit that user
This is what I'm doing. Is this safe? If not, how should I be doing it?
I can think of one alternative, i.e. have a separate column in the users table with a 25-character key, use those keys in the URLs, and query the database with them
But,
What difference does it make? (Since key is exposed now)
Querying by primary key yields results faster than querying by other columns
This is safe (and seems to be the best way to do it) as long as the validation of the admin rights is correct and you have protection against SQL injection. You mention both, so I'd say you're good.
The basic question is whether exposing the primary key is safe or not. I would say that in most situations it is safe, and I believe that Stack Overflow does it the same way:
http://stackoverflow.com/users/1/
http://stackoverflow.com/users/2/
http://stackoverflow.com/users/3/
If you check the "member for" value on each profile, you can see that the time is decreasing, so the number is probably the PK as well.
Anyway, obscuring the PK can be useful in situations where you want to stop a common user from going through all entries just by typing 1, 2, 3 etc. into the URL. In that case, swapping the PK for something like 535672571d2b4 is useful.
If you are really unsure, you could also XOR the ID with a nice (big) fixed value. This way you would not expose your IDs. When you apply the same "secret number" to the XOR'ed value again, you get the original value back.
$YOUR_ID xor $THE_SECRET_NUMBER = $OUTPUTTED_VALUE
$OUTPUTTED_VALUE xor $THE_SECRET_NUMBER = $YOUR_ID
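The round trip above can be sketched in a few lines of Python. The secret value and the function names are made up for illustration; any fixed integer works the same way.

```python
# Hypothetical secret; in practice keep it in configuration, not in source code.
SECRET_NUMBER = 0x5DEECE66D


def obfuscate(user_id: int) -> int:
    """XOR the real ID with the secret to get the value shown in URLs."""
    return user_id ^ SECRET_NUMBER


def reveal(public_value: int) -> int:
    """XOR with the same secret again to recover the real ID."""
    return public_value ^ SECRET_NUMBER


public = obfuscate(42)
print(public, reveal(public))
```

Note that consecutive IDs still differ only in their low bits after the XOR, so this is obfuscation only and the access check on every request remains necessary.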
Fast answer: no
Long answer:
You have a primary key to identify someone with, and it is unique. If you add another unique key to prevent people from knowing it, they simply end up knowing a different key.
That key still needs to be unique and indexed (for fast lookups), which sounds a lot like a primary key.
If it is a matter of nice URLs, then you could use a username or something like that.
But it would be security through obscurity. It is better to prevent SQL injection and validate that people have access to the right actions.
If you have plain auto-increment IDs, you expose your data to the world. That is not secure (e.g. someone can brute-force their way through all the data in your tables). But you can generate the IDs of your DB entities pseudo-randomly rather than sequentially. E.g. in PostgreSQL:
CREATE TABLE t1 (
id bigint NOT NULL DEFAULT (((nextval('id_seq'::regclass) * 678223072849::bigint)
% (1000000000)::bigint) + 460999999999::bigint),
...
<other fields here>
)
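To see why the DEFAULT expression above yields scattered but non-repeating IDs, here is a small Python sketch of the same arithmetic. The constants are copied from the PostgreSQL snippet; the function name is made up for illustration.

```python
import math

MULTIPLIER = 678223072849
MODULUS = 1_000_000_000
OFFSET = 460_999_999_999


def scrambled_id(seq_value: int) -> int:
    """Same arithmetic as the DEFAULT expression: (seq * multiplier) % modulus + offset."""
    return (seq_value * MULTIPLIER) % MODULUS + OFFSET


# The multiplier shares no factor with the modulus, so the mapping does not
# repeat within any run of MODULUS consecutive sequence values.
coprime = math.gcd(MULTIPLIER, MODULUS) == 1

first_ids = [scrambled_id(n) for n in range(1, 6)]
sample = {scrambled_id(n) for n in range(1, 100_001)}
print(coprime, first_ids, len(sample))
```

Python integers never overflow, but a 64-bit bigint product would once the sequence value passes roughly 13.6 million, so the SQL version may need a smaller multiplier or a modulo step on the sequence value first. The scheme is also predictable to anyone who learns the constants.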
I've got a database table that has a simple incrementing integer as the primary key (1,2,3 etc). These numbers represent the alumni of a college, with personal information in the records. I've been asked to give each record a unique ID when displayed to the user (after they've queried the database) but the ID must not be the primary key, and it must be consistent.
If someone retrieved a record with the arbitrary ID 88gh344r, for example, then did another search and retrieved record 88gh344r again they need to be able to say "That's the same person". Since people need to be able to recognise the identifier from one search to the next, then the ID can't be long and complex.
I've thought of three approaches:
Create an extra table containing the primary key and a random sequence of numbers, and get the query to retrieve the random number equivalent of the primary key.
Encrypt the primary key using MySQL's SHA2 or AES, but these produce long letter and digit sequences.
Encrypt the Primary key on the fly in the query, using something like Base64 encryption in PHP.
Which of these is best, or have I missed a better approach?
I actually just wrote a short tutorial on a URL shortener that works on that basis, using a recid as the seed. You could use that function to create your lookup key and store it in the DB as the "reference" key. The code is here
If you're doing it to protect privacy, you're heading for a major fsckup. It won't take long for the lamest script kiddie to write a simple program which just tries every possible "hash", downloading your entire list.
You should look into proper access control so people can see only what they're allowed to see.
If I understand you correctly, your main goal is just not to reveal the primary keys, but use something else instead when communicating with the users.
Simplest way:
add a CHAR column to your table and choose the length you want the other identifiers to have, for example CHAR(16).
give that column a UNIQUE index, so that you won't have any duplicates.
for each row, generate a secure *random* string of length 16 and update the row.
DO NOT hash the plain primary key. If the keys start from 1, 2, 3, ... then anybody can match an ID to its hash just by calculating the hashes for 1, 2, 3, etc.
Another problem is that if you already have, for example, 200 rows in the table and you add one, an attacker can automatically associate primary key 201 with the random string that just appeared in the list.
On the other hand, why do you need to hide the primary keys in the first place? Maybe you should instead encrypt the personal user data in the columns?
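The three steps above could look roughly like this in Python. This is only a sketch under assumptions: an in-memory SQLite table stands in for the real alumni table, and the table, column and function names are hypothetical. The UNIQUE index does the duplicate detection, and the code simply draws a new token if an insert is rejected.

```python
import secrets
import sqlite3
import string

ALPHABET = string.ascii_lowercase + string.digits
TOKEN_LENGTH = 16

# Assumption: an in-memory SQLite table stands in for the real alumni table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alumni ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " public_id CHAR(16) NOT NULL UNIQUE,"
    " name TEXT NOT NULL)"
)


def random_token() -> str:
    """Cryptographically secure random string, unrelated to the primary key."""
    return "".join(secrets.choice(ALPHABET) for _ in range(TOKEN_LENGTH))


def add_alumnus(name: str, max_attempts: int = 5) -> str:
    """Insert a row with a fresh token, retrying if the UNIQUE index rejects it."""
    for _ in range(max_attempts):
        token = random_token()
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO alumni (public_id, name) VALUES (?, ?)",
                    (token, name),
                )
            return token
        except sqlite3.IntegrityError:
            continue  # token already taken: draw another one
    raise RuntimeError("could not generate a unique token")


first = add_alumnus("Ada")
second = add_alumnus("Grace")
print(first, second)
```

With 16 characters from a 36-character alphabet a retry is extremely unlikely, so the loop is a safety net rather than a hot path.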
You could base-36 encode userid*100 or something similar, for example:
userid 26=208
userid 3=8C
http://www.translatorscafe.com/cafe/units-converter/numbers/calculator/decimal-to-base-36
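The two examples above can be reproduced with a short Python sketch. The function names are made up, and the factor 100 is simply the one suggested in the answer.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def to_base36(n: int) -> str:
    """Convert a non-negative integer to an upper-case base-36 string."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return "0"
    out = []
    while n:
        n, remainder = divmod(n, 36)
        out.append(DIGITS[remainder])
    return "".join(reversed(out))


def encode_user_id(user_id: int) -> str:
    return to_base36(user_id * 100)


def decode_user_id(code: str) -> int:
    # int() with base 36 parses the string back; integer division undoes the *100.
    return int(code, 36) // 100


print(encode_user_id(26), encode_user_id(3), decode_user_id("208"))
```

Anyone who guesses the scheme can reverse it, so this only makes the IDs look less like a counter.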
You can truncate a hash or an encrypted value to the desired length, but with both you risk a collision. With eight base-36 digits you have about a 50% chance of a collision if you have 2 million records. If you don't convert to base-36 and just take eight hexadecimal digits, you only need 80 thousand records.
With random numbers you don't have this problem, because you can put a uniqueness constraint on the column and generate a new number if a collision occurs.
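The collision figures quoted above follow from the usual birthday-problem approximation, n ≈ sqrt(2 · N · ln(1 / (1 − p))), where N is the number of possible values and p the target collision probability. A minimal Python check, with a made-up function name:

```python
import math


def records_for_collision(space_size: int, probability: float = 0.5) -> float:
    """Approximate number of random values drawn before a collision
    becomes likely with the given probability (birthday approximation)."""
    return math.sqrt(2 * space_size * math.log(1 / (1 - probability)))


base36_eight_digits = records_for_collision(36 ** 8)
hex_eight_digits = records_for_collision(16 ** 8)
print(round(base36_eight_digits), round(hex_eight_digits))
```

This gives a little under 2 million records for eight base-36 digits and a little under 80 thousand for eight hexadecimal digits, in line with the numbers in the answer.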
Don't make it too complicated.
Consider this:
The user is given an alternative mapping key. This can be a temporary session mapping and/or a secondary unique key (but is not the PK and may or may not be in the same table) and should, of course, be unique with the domain.
Access token, randomly generated per item but need not be unique, which is combined with a simple PK for the exposed id. This can be made to look pretty across values with appropriate transformation if desired. The access token may also be treated as a separate value.
I like the second approach. In both cases it is not the "direct PK" which is exposed although there is more coupling in the 2nd form.
Both of these will prevent the "knowing" the next key based off a sequence, but guessing/brute-force is tied to the size of the domain: as others have said, this shouldn't be used as primary security layer.
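A minimal Python sketch of the second approach (a simple PK combined with a random access token) might look like this. Everything here is an assumption made for illustration: the in-memory dict stands in for a table with pk and access_token columns, and the "pk-token" layout is just one possible transformation.

```python
import hmac
import secrets

# Hypothetical store standing in for a table with (pk, access_token) columns.
items = {}


def create_item(pk: int) -> str:
    """Attach a random token to the row and return the exposed ID."""
    token = secrets.token_hex(4)  # 8 hex characters; need not be unique
    items[pk] = token
    return f"{pk}-{token}"


def resolve(public_id: str):
    """Return the PK if the token matches the stored one, otherwise None."""
    pk_text, _, token = public_id.partition("-")
    if not (pk_text.isascii() and pk_text.isdigit()):
        return None
    pk = int(pk_text)
    expected = items.get(pk)
    if expected is None:
        return None
    # Constant-time comparison so the check does not leak how many characters matched.
    if not hmac.compare_digest(expected.encode(), token.encode()):
        return None
    return pk


exposed = create_item(7)
print(exposed, resolve(exposed), resolve("7-wrong"))
```

Incrementing the numeric part no longer helps an attacker, because the token for the neighbouring row is different, but the token length still bounds how hard guessing is.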
Happy coding.
The short answer, which I found after a two-month search, is hashids, which is available for many languages at https://hashids.org/. Its options are:
generate an encoded string using a key called the salt.
define a minimum length for your string, for example 8 characters, which is not possible with functions like base64_encode()
decode your string back to the original value
define your desired alphabet, for example a-z only (note that the alphabet must contain at least 16 characters)
Note: for PHP it is recommended to enable the bcmath and GMP extensions on your web host, but it works without GMP.
So, imagine a mysql table with a few simple columns, an auto increment, and a hash (varchar, UNIQUE).
Is it possible to give mysql a query that will add a column, and generate a unique hash without multiple queries?
Currently, the only way I can think of to achieve this is with a while, which I worry would become more and more processor intensive the more entries were in the db.
Here's some pseudo-php, obviously untested, but gets the general idea across:
while(!query("INSERT INTO table (hash) VALUES ('".generate_hash()."');")){
//found conflict (UNIQUE constraint rejected the insert), try again.
}
In the above example, the hash column would be UNIQUE, and so the query would fail on a duplicate. The problem is, say there are 500,000 entries in the db and I'm working off of a base36 hash generator with 4 characters. The likelihood of a conflict would be almost 1 in 3, and I definitely can't be running 160,000 queries. In fact, any more than 5 I would consider unacceptable.
So, can I do this with pure SQL? I would need to generate a base62, 6 char string (like: "j8Du7X", chars a-z, A-Z, and 0-9), and either update the last_insert_id with it, or even better, generate it during the insert.
I can handle basic CRUD with MySQL, but even JOINs are a little outside of my MySQL comfort zone, so excuse my ignorance if this is cake.
Any ideas? I'd prefer to use either pure MySQL or PHP & MySQL, but hell, if another language can get this done cleanly, I'd build a script and AJAX it too.
Thanks!
This is our approach for a similar project, where we wanted to generate unique coupon codes.
First, we used an AUTO_INCREMENT primary key. This ensures uniqueness and query speed.
Then, we created a base24 numbering system, using A, B, C, etc., without using O and I, because someone might have thought that they were 0 or 1.
Then we converted the auto-increment integer to our base24 number. For example, 0=A, 1=B, 28=BE, 1458965=EKNYF. We used base24 because numbers that are long in base10 need fewer characters in base24.
Then we created a separate column in our table, coupon_code. This was not indexed.
We took the base24 number and inserted 3 random characters into it, either digits or I and O (which were not used in our base24). For example, EKNYF could turn into 1EKONY6F or EK2NY3F9. This was our coupon code, and we inserted it into our coupon_code column. It's unique and random.
So, when the user uses code EK2NY3F9, all we have to do is remove all unused characters (2, 3 and 9) and we get EKNYF, which we convert to 1458965. We just select the row with primary key 1458965 and then compare its coupon_code column with EK2NY3F9.
I hope this helps.
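The scheme described above can be sketched in Python as follows. The function names are made up, and the filler set (digits plus I and O) follows the description in the answer rather than any known implementation.

```python
import secrets

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"  # 24 letters: A-Z without I and O
FILLER = "0123456789IO"                # characters the base-24 encoding never produces


def to_base24(n: int) -> str:
    """Encode a non-negative integer with the 24-letter alphabet (0 -> 'A')."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, remainder = divmod(n, 24)
        out.append(ALPHABET[remainder])
    return "".join(reversed(out))


def from_base24(text: str) -> int:
    value = 0
    for char in text:
        value = value * 24 + ALPHABET.index(char)
    return value


def make_coupon(pk: int, filler_count: int = 3) -> str:
    """Insert random filler characters at random positions in the encoded key."""
    chars = list(to_base24(pk))
    for _ in range(filler_count):
        position = secrets.randbelow(len(chars) + 1)
        chars.insert(position, secrets.choice(FILLER))
    return "".join(chars)


def coupon_to_pk(code: str) -> int:
    """Drop the filler characters and decode what is left."""
    core = "".join(char for char in code if char in ALPHABET)
    return from_base24(core)


print(to_base24(28), to_base24(1458965), coupon_to_pk("EK2NY3F9"))
```

As in the answer, the decoded primary key only locates the row; the submitted code still has to be compared with the stored coupon_code value, because the filler characters are what make a code hard to forge.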
If your heart is set on using base-36 4 character hashes (hashspace is only 1679616), you could probably pre-generate a table of hashes that aren't already in the other table. Then finding a unique hash would be as simple as moving it from the "unused table" to the "used table" which is O(1).
If your table is conceivably 1/3 full you might want to consider expanding your hashspace since it will probably fill up in your lifetime. Once the space is full you will no longer be able to find unique hashes no matter what algorithm you use.
What is this hash a hash of? It seems like you just want a randomly generated unique VARCHAR column? What's wrong with the auto increment?
Anyway, you should just use a bigger hash: find an MD5 function (if you're actually hashing something), or a UUID generator that produces more than 4 characters. And yes, you could use a while loop, but generate a value big enough that conflicts are incredibly unlikely.
As others have suggested, what's wrong with an autoinc field? If you want an alphanumeric value, you could simply convert the int to an alphanumeric string in base 36. This could be implemented in almost any language.
Going with zneak's comment, why don't you use an auto-increment column? Save the hash in another (non-unique) field, and concatenate the ID to it (dynamically). So you give a user [hash][id]. You can parse it out in pure SQL using the substring functions.
Since you have to have the hash, the user can't look at other records by incrementing the id.
So, just in case someone runs across a similar issue: I'm using a UNIQUE field and a PHP hash function to insert the hashes, and if the insert comes back with an error, I'll try again.
Hopefully, because of the low likelihood of conflict, it won't get slow.
You could also check the MySQL functions UUID() and UUID_SHORT(). Those functions generate UUIDs that are globally unique by definition. You won't have to double-check if your PHP-generated hash string already exists.
I think in several cases these functions can also fit your project's requirements. :-)
If you already have the table filled by some content, you can alter it with the following :
ALTER TABLE `page` ADD COLUMN `hash` char(64) AS (SHA2(`content`, 256)) AFTER `content`
This solution will add hash column right after the content one, generates hash for existing and new records too without need to change your INSERT statement.
If you add a UNIQUE index to the column (after having removed duplicates), your inserts will only succeed if the content is not already in the table. This will prevent duplicates.
Problem: When I use an auto-incrementing primary key in my database, this happens all the time:
I want to store an Order with 10 Items. The ordered Items belong to the Order. So I store the order, ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?), and then store the 10 Items with the foreign key (order_id).
So I always have to do:
INSERT ...
last_inserted_id = db.lastInsertId();
INSERT ...
INSERT ...
INSERT ...
and I believe this prevents me from using transactions in almost all INSERT cases where I need a foreign key.
So... here some solutions, and I don't know if they're really good:
A) Don't use auto_increment keys! Use a key table?
The key table would have two fields: table_name and next_key. Every time I need a key to insert a new dataset into a table, I first ask for the next_key by calling a special static KeyGenerator class method. This does a SELECT and an UPDATE, if possible in one transaction (would that work?). Of course I would request that for every affected table. Next, I can INSERT my entire object graph in one transaction without playing ping-pong with the database, because I already know the keys in advance.
B) Using a GUID / UUID algorithm for keys?
These are supposed to be really unique worldwide, and they're LARGE. I mean ... L_A_R_G_E. So a big amount of memory would go into these gigantic keys. Indexing will be hard, right? And data retrieval will be a pain for the database, at least I guess, since integer keys are much faster to handle. On the other hand, these also provide some security: visitors can no longer iterate over all orders or all users or all pictures by just incrementing the id parameter.
C) Stick with auto_incremented keys?
OK, if so, what about transactions like the one described in the example above? How can I solve that? Maybe by inserting a ghost row first and then doing a transaction with one UPDATE + n INSERTs?
D) What else?
When storing orders, you need transactions to prevent situations where only half your products are added to the database.
Depending on your database and your connector, the value returned by the last-insert-id function might be transaction-independent. For instance, with MySQL, mysql_insert_id returns the identifier for the last query from that particular client (without being affected by what other clients are doing concurrently).
Which database are you using?
Yes, typically inserting a record and then trying to select it again to find the auto-generated key is bad, especially if you are using a naive select max(id) from table query. This is because as soon as two threads are creating records max(id) may not actually return the last id your current thread used.
One way to avoid this is to create a sequence in the database. From your code you select sequence.NextValue then use that value to then execute your inserts (or you can craft a more complex SQL statement that does this selection and the inserts in one go). Sequences are atomic / thread-safe.
In MySQL you can ask for the last inserted id from the execution results which I believe will always give you the correct answer.
SQL Server supports SCOPE_IDENTITY (Transact-SQL), which should take care of both your transaction issue and your concurrency issue.
I would say stick with auto_increment.
(Assuming you are using MySQL)
"ask the database for the last inserted id (which is dangerous when it comes to concurrency, right?)"
If you use MySQL's last_insert_id() function, you only see what happened in your own session. So this is safe. You mention this:
db.last_insert_id()
I don't know what framework or language it is, but I would assume that it uses MySQL's last_insert_id() under the covers (if not, it is a pretty useless database abstraction framework).
" I believe this prevents me from using transactions in almost all INSERT cases w"
I don't see why. Please explain.
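To illustrate why an auto-increment key does not rule out a transaction, here is a minimal Python sketch that stores an order and its items atomically. It is only an analogy under stated assumptions: SQLite stands in for MySQL, cursor.lastrowid plays the role of last_insert_id() (both report the ID generated on the current connection), and the table and function names are hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite tables stand in for the MySQL orders and items tables.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " customer TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE items ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " order_id INTEGER NOT NULL REFERENCES orders(id),"
    " product TEXT NOT NULL)"
)


def store_order(customer: str, products: list) -> int:
    """Insert the order and all of its items in one transaction."""
    with conn:  # commits if the block succeeds, rolls everything back otherwise
        cursor = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)", (customer,)
        )
        order_id = cursor.lastrowid  # ID generated by this connection's INSERT
        conn.executemany(
            "INSERT INTO items (order_id, product) VALUES (?, ?)",
            [(order_id, product) for product in products],
        )
    return order_id


order_id = store_order("bob", ["pen", "ink", "paper"])
print(order_id)
```

The generated key is read inside the open transaction and reused for the child rows, so a failure on any item also undoes the order row.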
D) Sequence: may not be available in your DBMS, but if it is, it solves your problem elegantly.
For Postgresql, have a look at Sequence Functions
There is no final and general answer to this question.
Auto-incrementing columns are easy to use when you add new records. Using them as foreign keys within the same transaction is not so straightforward: you need database-specific commands to get the newly created key. This technique is common for certain databases, for instance SQL Server.
Sequences seem harder to use, because you need to get a key before you insert a row, but in the end it's easier to use them as foreign keys. This technique is common for certain databases, for instance Oracle.
When you use Hibernate or NHibernate, auto-incrementing keys are discouraged, because some optimizations are no longer possible. Using a hi-lo algorithm, which uses an additional table, is recommended.
GUIDs are strong, for instance when sharing data between different databases or systems, in disconnected scenarios, for import / export, etc. In many databases, most of the tables contain only a few hundred records, so memory and performance are not much of an issue. When using NHibernate, you get a GUID generator which produces sequential GUIDs, because some databases perform better when keys are sequential.