Data IDs and security - php

I've heard that exposing database IDs (in URLs, for example) is a security risk, but I'm having trouble understanding why.
Any opinions or links on why it's a risk, or why it isn't?
EDIT: of course the access is scoped, e.g. if you can't see resource foo?id=123 you'll get an error page. Otherwise the URL itself should be secret.
EDIT: if the URL is secret, it will probably contain a generated token that has a limited lifetime, e.g. valid for 1 hour and can only be used once.
EDIT (months later): my current preferred practice for this is to use UUIDs for IDs and expose them. If I'm using sequential numbers as IDs (usually for performance on some DBs), I like generating a UUID token for each entry as an alternate key and exposing that instead.

There are risks associated with exposing database identifiers. On the other hand, it would be extremely burdensome to design a web application without exposing them at all. Thus, it's important to understand the risks and take care to address them.
The first danger is what OWASP called "insecure direct object references." If someone discovers the id of an entity, and your application lacks sufficient authorization controls to prevent it, they can do things that you didn't intend.
Here are some good rules to follow:
Use role-based security to control access to an operation. How this is done depends on the platform and framework you've chosen, but many support a declarative security model that will automatically redirect browsers to an authentication step when an action requires some authority.
Use programmatic security to control access to an object. This is harder to do at a framework level. More often, it is something you have to write into your code and is therefore more error-prone. This check goes beyond role-based checking by ensuring not only that the user has authority for the operation, but also that they have the necessary rights on the specific object being modified. In a role-based system, it's easy to check that only managers can give raises, but beyond that, you need to make sure that the employee belongs to that particular manager's department.
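To make that second rule concrete, here is a minimal PHP sketch of an object-level check layered on top of a role check; the table and column names (users, employees, department_id) are assumptions for illustration, not a prescribed schema.

<?php
// Hypothetical names throughout; the point is the two-step check.
function canGiveRaise(PDO $db, array $currentUser, int $employeeId): bool
{
    // Role-based check: only managers may give raises at all.
    if ($currentUser['role'] !== 'manager') {
        return false;
    }

    // Object-level check: the employee must belong to this manager's department.
    $stmt = $db->prepare('SELECT department_id FROM employees WHERE id = ?');
    $stmt->execute([$employeeId]);
    $dept = $stmt->fetchColumn();

    return $dept !== false && (int) $dept === (int) $currentUser['department_id'];
}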
There are schemes to hide the real identifier from an end user (e.g., map between the real identifier and a temporary, user-specific identifier on the server), but I would argue that this is a form of security by obscurity. I want to focus on keeping real cryptographic secrets, not trying to conceal application data. In a web context, it also runs counter to widely used REST design, where identifiers commonly show up in URLs to address a resource, which is subject to access control.
Another challenge is prediction or discovery of the identifiers. The easiest way for an attacker to discover an unauthorized object is to guess it from a numbering sequence. The following guidelines can help mitigate that:
Expose only unpredictable identifiers. For the sake of performance, you might use sequence numbers in foreign key relationships inside the database, but any entity you want to reference from the web application should also have an unpredictable surrogate identifier, and that is the only one that should ever be exposed to the client. Random UUIDs are a practical choice for these surrogate keys, even though typical UUID generators aren't cryptographically secure.
One place where cryptographically unpredictable identifiers are a necessity, however, is session IDs or other authentication tokens, where the ID itself authenticates a request. These should be generated by a cryptographic RNG.
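For reference, here is one common way to generate a random version-4 UUID in PHP for such a surrogate key; this is a sketch rather than a library recommendation, though because it draws on random_bytes() the value is at least produced by a CSPRNG.

<?php
// Generate a random version-4 UUID suitable as an exposed surrogate key.
function uuidV4(): string
{
    $bytes = random_bytes(16);                        // CSPRNG-backed
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);  // set version to 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);  // set RFC 4122 variant
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// Example: keep the auto-increment id for joins internally,
// and bind uuidV4() as the public_id column when inserting the row.
echo uuidV4(); // e.g. "0b9af32e-7c1d-4d2e-9b71-1f2d3c4b5a69"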

While not a data-security risk, this is absolutely a business-intelligence security risk, as it exposes both data size and velocity. I've seen businesses get harmed by this and have written about this anti-pattern in depth. Unless you're just building an experiment and not a business, I'd highly suggest keeping your private IDs out of the public eye. https://medium.com/lightrail/prevent-business-intelligence-leaks-by-using-uuids-instead-of-database-ids-on-urls-and-in-apis-17f15669fd2e

It depends on what the IDs stand for.
Consider a site that, for competitive reasons, doesn't want to make public how many members it has, but by using sequential IDs reveals it anyway in the URL: http://some.domain.name/user?id=3933
On the other hand, if they used the login name of the user instead, http://some.domain.name/user?id=some, they haven't disclosed anything the user didn't already know.

The general thought goes along these lines: "Disclose as little information about the inner workings of your app to anyone."
Exposing the database ID counts as disclosing some information.
The reason is that hackers can use any information about your app's inner workings to attack you, and a user could change the URL to reach data he or she isn't supposed to see.

We use GUIDs for database ids. Leaking them is a lot less dangerous.

If you are using integer IDs in your database, you may make it easy for users to see data they shouldn't by changing query-string variables.
E.g. a user could easily change the id parameter in this query string and see or modify data they shouldn't: http://someurl?id=1

When you send database IDs to your client, you are forced to check security either way. If you keep the IDs in your web session, you can choose whether you want or need to do that, which potentially means less processing.
You are constantly trying to delegate things to your access control ;) That may work in your application, but I have never seen such a consistent back-end system in my entire career. Most of them have security models that were designed for non-web usage, some have had additional roles added after the fact, and some of those roles have been bolted on outside the core security model (because the role was added in a different operational context, say before the web).
So we use synthetic, session-local IDs, because that hides as much as we can get away with (see the sketch below).
There is also the issue of non-integer key fields, which may be the case for enumerated values and similar. You can try to sanitize that data, but chances are you'll end up like Little Bobby Tables.
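A minimal sketch of that session-local mapping, assuming plain PHP sessions; it also sidesteps the non-integer key problem, because the client only ever sees a random token.

<?php
// Map real database keys (integer or otherwise) to throwaway, session-local tokens.
session_start();

function publicIdFor($realKey): string
{
    $_SESSION['id_map'] ??= [];
    $token = array_search($realKey, $_SESSION['id_map'], true);
    if ($token === false) {
        $token = bin2hex(random_bytes(16));      // unguessable, per-session token
        $_SESSION['id_map'][$token] = $realKey;
    }
    return $token;                               // put this in URLs, not the real key
}

function realIdFor(string $token)
{
    // Returns null for tokens this session never issued.
    return $_SESSION['id_map'][$token] ?? null;
}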

My suggestion is to implement two stages of security.
"Security through obscurity": have an integer Id as the primary key and a Gid (a GUID) as a surrogate key in your tables. The integer Id column is used for relations and other database back-end and internal purposes (and even for select-list keys in web apps, to avoid unnecessary mapping between Gid and Id while loading and saving), while the Gid is used in REST URLs, i.e. for GET, POST, PUT, DELETE, etc. That way nobody can guess another record's id, which gives a first level of protection against guessing attacks (i.e. number-series guessing); see the sketch after these two points.
Access-based control on the server side: this is the most important part, and you have various ways to validate a request based on the roles and rights defined in your application. It's up to you to decide how.
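A hedged sketch of the first stage, assuming PDO against a MySQL table that carries both keys; the orders table and its columns are illustrative only.

<?php
// Table sketch (MySQL):
//   CREATE TABLE orders (
//       id  INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,  -- internal relations only
//       gid CHAR(36) NOT NULL UNIQUE,                  -- exposed in REST URLs
//       ...
//   );

// GET /orders/{gid}: resolve the public Gid, keep the integer Id server-side.
function findOrderByGid(PDO $db, string $gid): ?array
{
    $stmt = $db->prepare('SELECT * FROM orders WHERE gid = ? LIMIT 1');
    $stmt->execute([$gid]);
    $order = $stmt->fetch(PDO::FETCH_ASSOC);
    return $order ?: null;   // $order['id'] is used for joins; only 'gid' goes back out
}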

From the perspective of code design, a database ID should be considered a private implementation detail of the persistence technology to keep track of a row. If possible, you should be designing your application with absolutely no reference to this ID in any way. Instead, you should be thinking about how entities are identified in general. Is a person identified with their social security number? Is a person identified with their email? If so, your account model should only ever have a reference to those attributes. If there is no real way to identify a user with such a field, then you should be generating a UUID before hitting the DB.
Doing so has a lot of advantages, as it allows you to divorce your domain models from persistence technologies. That means you can substitute database technologies without worrying about primary key compatibility. Leaking your primary key into your data model is not necessarily a security issue if you write the appropriate authorization code, but it's indicative of less-than-optimal code design.

Related

Is it possible to somehow get this randomly generated key for my site and access the SQL?

I have a php/js site where the information is encoded and put into the database. The encryption key for the information is randomly generated, then given back to the users after they send a post through a form. The encryption key is not stored in my database at all. A separate, randomly generated ID is formed and stored in the database, used to look up the item itself before deciphering it.
My question is: is it possible at all to look through the logs and find information that would reveal the key? I am trying to make it impossible to read any of the SQL data without either being the person who has the code (who can do whatever he wants with it) or mounting a brute-force attack (unavoidable if someone gets my SQL database).
Just to re-iterate my steps:
User sends information through POST
The php file generates a random ID and access key. The data is encrypted with the access key, then put in the database with the ID as the PRIMARY KEY.
The php file echoes just the random ID and the access key.
The website uses jQuery to create a link from the key, e.g. mysite.com?i=cYFogD3Se8RkLSE1CA [9 characters A-Za-z0-9 = ID][9 characters A-Za-z0-9 = key]
Is there any possibility that someone who had access to my server could read the information? I want it to be impossible even for me to read the messages myself. The information has to be decodable; it can't be a one-way encoding.
I like your system of putting the decryption key in the URL, so that not even you will be able to access the data without information that exists only on the user's computer.
I still see a few gotchas in this.
URLs are often saved in web server logs. If you're logging to disk, and they get the disk, then they get the keys.
If the attacker has access to your database, he may have enough access to your system to secretly install software that logs the URLs. He could even do something as prosaic as turn logging back on.
The person visiting your site will have the URL bookmarked at least (otherwise it is useless to him) and it will likely appear in his browser history. Normally, bookmarks and history are not considered secure data. Thus, an attacker with access to a user's computer (either by sitting down at it directly or via malware) can access the data as well. If the payload is desirable enough, someone could create a virus or malware that specifically mines for your static authentication token, and could achieve a reasonable hit rate. The URLs could even be available to browser plugins, or to other applications acting under a seemingly reasonable guise of "import your bookmarks now".
So it seems to me that the best security is for the client to not just have the bookmark (which, since it is not kept in anyone's head, can be considered "something he has"), but also to present "something he knows". So encrypt with his password as well, and don't save the password. When he presents the URL, ask for the password, then decrypt with both (serially or in combination) and the data is secure.
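A sketch of combining the two factors as suggested, assuming libsodium is available: the URL token is the "something he has", the password is the "something he knows", and the actual key is derived from both, so neither alone can decrypt the data.

<?php
// Illustrative values; in practice these come from the submission request.
$urlToken = bin2hex(random_bytes(9));   // goes into the link, like the key above
$password = 'user-chosen passphrase';   // asked for on retrieval, never stored
$message  = 'the sensitive data';

// Derive one symmetric key from both factors with a password KDF (Argon2id).
function deriveKey(string $urlToken, string $password, string $salt): string
{
    return sodium_crypto_pwhash(
        SODIUM_CRYPTO_SECRETBOX_KEYBYTES,
        $password . $urlToken,          // combine "knows" and "has"
        $salt,                          // random, stored alongside the record
        SODIUM_CRYPTO_PWHASH_OPSLIMIT_INTERACTIVE,
        SODIUM_CRYPTO_PWHASH_MEMLIMIT_INTERACTIVE
    );
}

$salt   = random_bytes(SODIUM_CRYPTO_PWHASH_SALTBYTES);
$nonce  = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$cipher = sodium_crypto_secretbox($message, $nonce, deriveKey($urlToken, $password, $salt));
// Store $salt, $nonce and $cipher; hand the user only the URL token.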
Finally, I know that Google's two-factor authentication can be used by third parties (for example, I use it with Dropbox). This creates another "something you have" by requiring the person accessing the resource to have his cell phone, or nothing. Yes, there is recourse if you lose your cell phone, but it usually involves another phone number, or a special Google-supplied one-time long password that has been printed out and stashed in one's wallet.
Let's start with some basic definitions:
Code Protecting data by translating it to another language, usually a private language. English translated to Spanish is encoded, but it's not very secure since many people understand Spanish.
Cipher Protecting data by scrambling it up using a key. A letter substitution cipher first documented by Julius Caesar is an example of this. Modern techniques involve mathematical manipulation of binary data using prime numbers. The best techniques use asymmetric keys; the key that is used to encipher the data cannot decipher it, a different key is needed. This allows the public key to be published and is the basis of SSL browser communication.
Encryption Protecting data by encoding and/or enciphering it.
All of these terms are often used interchangeably but they are different and the differences are sometimes important. What you are trying to do is to protect the data by a cipher.
If the data is "in clear" then if it is intercepted it is lost. If it is enciphered, then both the data and the key need to be intercepted. If it is enciphered and encoded, then the data, the key and the code need to be intercepted.
Where is your data vulnerable?
The most vulnerable place for any data is when it is in the clear in the personal possession of somebody, on a storage device (USB, CD, piece of paper) or inside their head, since that person is vulnerable to inducement or coercion. This is the foundation of Wikileaks - people who are trusted with in-confidence information are induced to betray that confidence - the ethics of this I leave to your individual consciences.
When it is in transit between the client and the server and vice versa. Except for data of national security importance the SSL method of encryption should be adequate.
When it is in memory in your program. The source code of your program is the best place to store your keys, however, they themselves need to be stored encrypted with a password that you enter each time your program runs (best), that is entered when you compile and publish or that is embedded in your code (worst). Unless you have a very good reason one key should be adequate; not one per user. You should also keep in-memory data encrypted except when you actually need it and you should use any in-memory in-clear data structures immediately and destroy them as soon as you are finished with them. The key has to be stored somewhere or else the data is irrecoverable. But consider, who has access to the source code (including backups and superseded versions) and how can you check for backdoors or trojans?
When it is in transit between your program's machine and the data store. If you only send encrypted data between the program and the data store and DO NOT store the key in the data store this should be OK.
When it is stored in the data store. Ditto.
Do not overlook physical security, quite often the easiest way to steal data is to walk up to the server and copy the hard drive. Many companies (and sadly defence/security forces) spend millions on on-line data security and then put their data in a room with no lock. They also have access protocols that a 10 year old child could circumvent.
You now have lovely encrypted data - how are you going to stop your program from serving it up in the clear to anyone who asks for it?
This brings us to identification, validation and authorisation. More definitions:
Identification A claim made by a person that they are so-and-so. This is usually handled in a computer program by a user name. In physical security applications it is by a person presenting themselves and saying "I am so-and-so"; this can explicitly be by a verbal statement or by presenting an identity document like a passport or implicitly by a guard you know recognising you.
Validation This is the proof that a person is who they say they are. In a computer this is the role of the password; more accurately, it proves that they know the password of the person they claim to be, which is the big, massive, huge and insurmountable problem in the whole thing. In physical security it is done by comparing physical metrics (appearance, height etc.) as documented in a trusted document (like a passport) against the claim; you need to have protocols in place to ensure that you can trust the document. Incidentally, this is the main cause of problems with face recognition technology used to identify bad guys - it uses a validation technique to try to identify someone. "This guy looks like Bad Guy #1"; guess what? So do a lot of people in a population of 7 billion.
Authorisation Once a person has been identified and validated they are then given authorisation to do certain things and go to certain places. They may be given a temporary identification document for this; think of a visitor id badge or a cookie. Depending on where they go they may be required to reidentify and revalidate themselves; think of a bank’s website; you identify and validate yourself to see your bank accounts and you do it again to make transfers or payments.
By and large, this is the weakest part of any computer security system; it is hard for me to steal you data, it is far easier for me to steal your identity and have the data given to me.
In your case, this is probably not your concern, providing that you do the normal thing of allowing the user to set, change and retrieve their password in the normal commercial manner, you have probably done all you can.
Remember, data security is a trade off between security on the one hand and trust and usability on the other. Make things too hard (like high complexity passwords for low value data) and you compromise the whole system (because people are people and they write them down).
Like everything in computers – users are a problem!
Why are you protecting this data, and what are you willing to spend to do so?
This is a classic risk management question. In effect, you need to consider the adverse consequences of losing this data, the risk of this happening with your present level of safeguards and if the reduction in risk that additional safeguards will cost is worth it.
Losing the data can mean any or all of:
Having it made public
Having it fall into the wrong person's hands
Having it destroyed maliciously or accidentally. (Backup, people!)
Having it changed. If you know it has been changed this is equivalent to losing it; if you don’t this can be much, much worse since you may be acting on false data.
This type of thinking is what leads to the classification of data in defence and government into Top Secret, Secret, Restricted and Unrestricted (Australian classifications). The human element intervenes again here; due to the nature of bureaucracy there is no incentive to give a document a low classification and plenty of disincentive, so documents are routinely over-classified. As a result, many documents with a Restricted classification end up being distributed to people who don't have the appropriate clearance, simply to make the damn thing work.
You can think of this as a hierarchy as well; my personal way of thinking about it is:
Defence of the Realm Compromise will have serious adverse consequences for the strategic survival of my country/corporation/family whatever level you are thinking about.
Life and Death Compromise will put someone’s life or health in danger.
Financial Compromise will allow someone to have money/car/boat/space shuttle stolen.
Commercial Compromise will cause loss of future financial gain.
Humiliating Compromise will cause embarrassment. Of course, if you are a politician this is probably No 1.
Personal These are details that you would rather not have released but aren’t particularly earth shaking. I would put my personal medical history in here but, the impact of contravening privacy laws may push it up to Humiliating (if people find out) or Financial (if you get sued or prosecuted).
Private This is stuff that is nobody else’s business but doesn’t actually hurt you if they find out.
Public Print it in the paper for all anyone cares.
Irrespective of the level, you don't want any of this data lost or changed, but if it is, you need to know that it has happened. For the Nazis, having their Enigma cipher broken was bad; not knowing it had happened was catastrophic.
In the comments below, I have been asked to describe best practice. This is impossible without knowing the risk of the data (and risk tolerance of the organisation). Spending too much on data security is as bad as spending too little.
First and most importantly, you need a really good, watertight legal disclaimer.
Second, don’t store the user’s data at all.
Instead, when the user submits the data (using SSL), generate a hash of the SessionID and your system's datetime. Store this hash in your table along with the datetime and get the record ID. Encrypt the user's data with this hash, generate a URL containing the record ID and the encrypted data, and send this back to the user (again using SSL). Security of this URL is now the user's problem, and you no longer have any record of what they sent (make sure it is not logged).
Routinely, delete stale (4h,24h?) records from the database.
When a retrieval request comes in (using SSL), look up the hash by record ID; if it's not there, tell the user the URL is stale. If it is, decrypt the data they sent, send it back (using SSL), and delete the record from your database.
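A minimal PHP sketch of that store-and-retrieve flow, assuming PDO, a hypothetical stash table and libsodium for the cipher; the table, function names and example URL are illustrative only.

<?php
// Store: called when the user POSTs their data over SSL.
// Assumes session_start() has already been called.
function storeMessage(PDO $db, string $plaintext): string
{
    // Key material from the session ID and the current time, kept server-side.
    $key   = hash('sha256', session_id() . microtime(true), true);
    $nonce = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);

    $db->prepare('INSERT INTO stash (key_hash, created_at) VALUES (?, NOW())')
       ->execute([bin2hex($key)]);
    $recordId = $db->lastInsertId();

    // The encrypted payload travels in the URL; only the key stays in the database.
    $cipher = sodium_crypto_secretbox($plaintext, $nonce, $key);
    $blob   = rtrim(strtr(base64_encode($nonce . $cipher), '+/', '-_'), '=');
    return 'https://example.com/retrieve?id=' . $recordId . '&d=' . $blob;
}

// Retrieve: decrypt what the user sent back, then delete the stored key.
function retrieveMessage(PDO $db, int $recordId, string $blob): ?string
{
    $stmt = $db->prepare('SELECT key_hash FROM stash WHERE id = ?');
    $stmt->execute([$recordId]);
    $row = $stmt->fetch(PDO::FETCH_ASSOC);
    if (!$row) {
        return null;                     // stale or unknown URL
    }

    $raw    = base64_decode(strtr($blob, '-_', '+/'));
    $nonce  = substr($raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $cipher = substr($raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $plain  = sodium_crypto_secretbox_open($cipher, $nonce, hex2bin($row['key_hash']));

    $db->prepare('DELETE FROM stash WHERE id = ?')->execute([$recordId]);
    return $plain === false ? null : $plain;
}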
Let's have a little think:
Use SSL - data is encrypted in transit.
Use username/password for authorisation.
If somebody breaks that, you do have a problem with security.
Spend the effort on fixing that. Disaster recovery is a waste of effort in this case. Just get the base cases correct.

Considerations in making auto_incremented user id visible?

I've noticed that SO and other sites use the auto-incrementing primary key of the user table as a publicly viewable user id (at least I assume this is what they are doing). In the case of SO, the user's profile can be viewed if you know or can guess their user id.
What are some things to consider before implementing a similar style of user id generation? I am developing a non-commercial app that relies on the concept of "friends" in assigning various permissions between users, but I'd like all users' basic profiles to be viewable at a simple url such as app.com/users/userid. More detailed profile information would only be accessible to "friends" of that user who have been confirmed by that user.
I guess my question is this: does the "guessability" of a user ID indicate anything about the inherent security of a system like this, or is that all in the way the individual features are actually implemented? Is there anything I might not be considering about this that would make it unwise? Anything I should absolutely avoid doing with these user IDs?
A note: I have no concern for "competitors" knowing or guessing how many users I have based on the number of the most recent user or the rate of change between users.
OWASP - Insecure Direct Object References
Gives a pretty good treatment of the subject. In fact, I highly recommend OWASP in general for security guidelines when developing web applications. I always evaluate my web projects against the TOP 10 security threats found on the site.
It's not a problem at all. In fact, I'd almost say the opposite: if you're having to obscure the params in the URL for security then you're doing it wrong; the security should be handled in the code.
From your question, it looks like you're already thinking about security the right way, so you should be fine with the primary key in the URL.
Having a primary_key which stores no information (like an auto_incremented id) is also good because it will never change. If you're putting info like the username in URLs you'll either want to never implement people being able to edit their usernames, or cope with the broken links that may be left when they do (remember they may be on sites other than yours).
The only info having the auto_incremented id in your URLs could leak is that one user will know if they were a user before or after another. This is unlikely to be a concern (and might not be reliable anyway).

What are some resources to learn about the best practices while building URL's for a website?

This is a follow-up question to "Why are the query parameters for many websites (MySQL) very cryptic long integers?", asked in terms of security and scalability. Please cite any resources (online/offline).
In terms of security and scalability you can use whatever URLs you can imagine.
It's usually human readability and usability, or (more often) search-engine friendliness, that drive URL-building rules.
Take PHP.net's famous php.net/echo feature, which gives fast and easy access to a function's description. That's the reason to make such URLs, not security.
Security and scalability have little to do with the way a URL is formed. Some sites do use cryptic URLs as a measure for securing pictures (e.g. Facebook), but this type of security is generally frowned upon, as anyone who learns the URL through various means (browser cache, proxy cache, ...) can access the supposedly protected resource.
While a clean URL helps search-engine optimization, it usually means a slower response and thus degraded scalability. Consider a URL http://example.com/user/Lars, showing you my profile page. If my account is linked in the database not by my account name but by a surrogate key (some arbitrary number), the system first has to query the user table for my surrogate key in order to get the rest of the information. This degrades performance, as one more table has to be queried.
While I agree completely with Col. Shrapnel's answer, there are a few cases where long, "cryptic" IDs contribute to security and sometimes even scalability. Share lockers (RapidShare and friends) usually employ the /ID/Filename structure in their URLs to add security through obscurity; it would be far too easy for someone to test all integers or common filenames and get access to sensitive information.
Some people also apply some kind of checksum algorithm to their database IDs, this makes it harder for someone to figure out the underlying structure and "hide" certain kinds of information: I bet many people would be reluctant to pay for something online if the URL for the order was /order/2/ for instance. Using checksums can also contribute for improved performance, since you can disregard invalid IDs without even having to query the database (useful when someone is brute-forcing your URLs).
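As an illustration of that checksum idea, here is a hedged PHP sketch that appends a short HMAC to the numeric ID so that guessed URLs can be rejected without touching the database; the secret and the format are assumptions, not a standard scheme.

<?php
// Hypothetical helper: turn /order/2/ into something like /order/2-9f86d081/.
const ID_SECRET = 'replace-with-a-long-random-secret';

function encodeOrderId(int $id): string
{
    $tag = substr(hash_hmac('sha256', (string) $id, ID_SECRET), 0, 8);
    return $id . '-' . $tag;
}

function decodeOrderId(string $public): ?int
{
    [$id, $tag] = array_pad(explode('-', $public, 2), 2, '');
    $expected = substr(hash_hmac('sha256', $id, ID_SECRET), 0, 8);
    if (!ctype_digit($id) || !hash_equals($expected, $tag)) {
        return null;   // invalid or forged ID: reject before querying the database
    }
    return (int) $id;
}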
Long IDs contribute to scalability, in the sense that they provide a wider range of possible numbers (Slashdot had been a victim of the "int(11) is more than enough for everyone" assumption twice - sorry can't find old story), and that's why you see Twitter and friends using different approaches nowadays.
When you use a wider ID you can also store more information, such as timestamps, IP / hostname that created the database record and so on, which can be crucial when you're running a distributed database setup. Take a look at the MongoID class documentation to get a better idea how it works.

It's a good security scheme save user data in different MySQL tables?

I'm programming a sensitive application (health data) that requires a good security scheme. Of course, bugs exist and hackers or users may find them. So my concern is to protect the data as much as I can. Reading about security, I have found a mechanism that could be useful, but I would like to hear the opinion of someone more experienced in this stuff. The scheme is:
Create a database, call it 'app' for example
Create some common tables (or not)
For each user, create a set of tables with a prefix based on the user name plus a hash, so that a user can't learn another user's prefix (sjiXoi4sa_table)
All database queries for a user will use his own prefix, except those referring to the global scope
Is this a good security scheme? Could it be improved? (For instance, it would be great if each user could encrypt their own tables.) Any suggestion welcome.
As you describe it, the schema does nothing to improve security - OTOH, the additional overhead in working with more tables means more code means more defects injected. You are also compromising performance with this approach.
However, if you were to set up the permissions so that a user only had read/write access to their own tables, then yes, it would be more secure - but you've still caused yourself a lot of problems elsewhere. You can get the same level of security AND reduce the complexity and amount of code AND solve the performance/scalability problems by normalising the data properly, denying users all direct access to the tables, and granting access via stored procedures instead (where you can effectively apply permissions on a row-by-row basis).
This is security by obscurity, and doesn't add any protection. If the site is hacked, the hacker will just dump the whole MySQL database, not just one user's tables.
This will do nothing to address the security issues in your application and will have the added disadvantage of impacting on performance (creation of tables, indices etc). Consider encrypting data instead and locking down access to user specific data based on which user is logged into your system.
In addition to everyone else's answers on why your schema is a bad idea, here's another piece of advice: you're dealing with health data, so there are probably regulations you must follow to ensure security. For example, NIST standards or whatever your national equivalent is. Find them, read them, follow them. In many jurisdictions around the world such standards are mandatory, and you could be liable for damages if something happens and you didn't follow the standards.
So, find out what (if any) standards apply to processing and storing private health data in your country, and use them.
Any scheme that requires the creation of a new set of tables for every new record in another table is too horrible to use except in obfuscation contests, no matter how much security it adds. And as already pointed out, it doesn't add much security either.

Storing encrypted personal information - common sense?

We're in the middle of developing an e-commerce application that will be used by our customers on a pay-monthly plan.
We have thought a bit about offering encryption of all personal data that is stored in the database, to make our application a notch safer for the final consumers.
The encryption would be handled completely transparently in both the front end and back end, and would make sure that even if someone gained raw database access, it would be impossible to decrypt the personal details of the final consumers without the encryption key.
Is this common sense, or are we biting off more than we can chew compared to the increased safety this would add for the final customers?
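For concreteness, here is a minimal sketch of the kind of transparent field-level encryption being described, using libsodium with a single server-side key read from an environment variable (an assumption); the answers below explain why protecting that key, and the loss of searchability on encrypted columns, are the real costs.

<?php
// Field-level encryption sketch; PERSONAL_DATA_KEY is an assumed 32-byte key,
// generated once with sodium_crypto_secretbox_keygen() and base64-encoded.
function fieldKey(): string
{
    return base64_decode(getenv('PERSONAL_DATA_KEY'));
}

function encryptField(string $plaintext): string
{
    $nonce  = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $cipher = sodium_crypto_secretbox($plaintext, $nonce, fieldKey());
    return base64_encode($nonce . $cipher);    // store this string in the column
}

function decryptField(string $stored)
{
    $raw    = base64_decode($stored);
    $nonce  = substr($raw, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    $cipher = substr($raw, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
    return sodium_crypto_secretbox_open($cipher, $nonce, fieldKey());  // false if tampered
}

// Usage when writing/reading a customer row:
//   $stmt->execute([encryptField($customer['address'])]);
//   $address = decryptField($row['address']);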
I might be out of my depth here, as I'm not a security expert, but here's a few questions that come to mind:
What are the chances of an attacker gaining access to the data?
Does the data contain anything confidential?
What could an attacker stand to gain from accessing the data?
What could you, or your company, stand to lose if an attacker gained access to the data? It's not just the data, it's potentially your reputation too.
How much will it cost to implement?
What are your legal obligations with regard to customer data?
If data are encrypted using a single global key, how will you keep the key safe?
If the key is really safe, how will you use it to encrypt and decrypt data?
If data are encrypted using multiple keys (perhaps one for each customer login), how will you recover data if a customer loses their key/password?
If you are able to recover customer data, how does that affect its safety?
What access will computer repair technicians, sysadmins, etc., have to your database server, and how will that affect data security? (It's not just about external hackers).
What are the performance effects of encryption and decryption?
What other mechanisms, like firewalls, physical security and employee vetting can be put in place?
Here's a quote from the UK FSA's "Your responsibilities for customer data security" (pdf):
"Getting data protection wrong can bring commercial, reputational, regulatory and legal penalties. Getting it right brings rewards in terms of customer trust and confidence."
My answer is: sometimes. I've worked for a few companies that employ e-commerce solutions. Security and encryption need to be better than what the information is worth. Names and addresses are not as "valuable" as, say, credit card numbers and transaction information. The setup I'm most familiar with is one where all general CRM data - names, addresses, etc. - that is typically fetched more frequently is stored in the server's database in plain text, the security of the server is increased (firewalls, patches, etc.), and the script accessing the database is, of course, secured to the best of the developers' knowledge.
Credit cards, transaction information, the real down-and-dirty "information people would want to steal", were contained on a server that was encrypted, secured, and only available via the local LAN. The encryption key was on a second server; access to these machines was dictated by a rotating authentication key that only a third server knew. The two key/data servers were unaware of each other. When purchases were made, the third server - accessed by a fourth - would "magically" make it all come together to complete the purchase.
It's a very convoluted, and terrible, answer. In short: protecting/encrypting the very sensitive data is a must if you wish to protect your customers from theft - but encrypting ALL information may be unnecessary overhead for your application. Security is only worth what the data is worth to the thief.
Doing this, you lose many of the relational database's advantages (searches, reports for business intelligence and so on).
Furthermore, if you store the keys you just add a layer of 'security': an attacker will have to obtain the keys in order to read the data, but if he has full access to your database, he probably has access to the keys' repository too (as the front-end and back-office applications must also have access to that repository).
If you instead give users the responsibility of storing their own keys, you lose the ability to restore the data if a user loses his key.
Take the really sensitive information, put it on a separate server, put as much security as possible around it, and access the data only when needed.
In my opinion the main threat of your approach will be the (false) sense of security that the encryption will give. Sensitive data must be treated with all due caution not only in storage but also during processing and use: put your money into good system administrators, well-prepared software engineers and periodic security assessments, if your business requires it.
Why is it safer?
You need to store the decryption key in order to provide the data to the user - and it's not really relevant that it's only held in the 'front-end' system: in order to get to the back end, a hacker must get through the front end first.
You also eliminate a LOT of the searching functionality.
You have to do a lot of coding to implement this.
You're placing much heavier demands on the system (i.e. more hardware cost, poorer performance).
IMHO your money and time would be better spent on improving security elsewhere.
C.
