Hello friends
I have developed a form that allows users to store their data. When I store the data, what precautions must I take so that wrong values are not inserted and the form cannot be hacked?
What you're asking about is called input validation, and there's a lot of information about it out there.
There are primarily two parts:
Making sure the user put in something useful.
Making sure the user didn't put in something harmful.
The former is most often done via JavaScript on the client side (for a generally smoother user experience and fewer postbacks). It should be re-done on the server side as well just to make sure, since you should never trust user input. Basically it involves things like regular expressions to check the format of an email address, enums to check the value of a drop down list, etc.
The latter must be done server side because you should never trust user input. It involves escaping strings against SQL injection attacks, validating field length against buffer overflow attacks (less common these days), etc.
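As a rough illustration of the server-side half of the "something useful" check, here is a minimal sketch. The field names (email, country) and the ALLOWED_COUNTRIES list are made up for the example, and it uses PHP's built-in filter_var() instead of a hand-written regular expression for the email format.

```php
<?php
// Hypothetical whitelist of the values offered by the form's drop-down list.
const ALLOWED_COUNTRIES = ['de', 'fr', 'us'];

/**
 * Returns error messages keyed by field name; an empty array means valid.
 */
function validate_form(array $input): array
{
    $errors = [];

    $email = $input['email'] ?? '';
    if (!is_string($email) || filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'Please enter a valid email address.';
    }

    $country = $input['country'] ?? '';
    // Strict comparison so that only the exact whitelisted strings pass.
    if (!in_array($country, ALLOWED_COUNTRIES, true)) {
        $errors['country'] = 'Please choose a country from the list.';
    }

    return $errors;
}
```

In a real script you would pass $_POST to validate_form() and only continue to the database step when the returned array is empty.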
First, you need to understand two security measures:
Sanitization
Validation
Sanitization means cleaning the data before you validate it, so that validation does not fail on flaws that could simply have been removed.
It consists of removing unwanted characters, such as surrounding whitespace (spaces, tabs, new lines, ...) and other non-visible characters, and it should be applied to every input across the board.
After validating your data, for example with if(strlen($_GET['key']) > 0), you will insert it into your database. How you do this safely depends on the database type.
PHP offers functions to escape data, such as mysql_real_escape_string(). (The mysql_* extension was removed in PHP 7; mysqli_real_escape_string() is its mysqli counterpart.)
This method is referred to as database escaping.
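The clean-then-validate order described above could be sketched like this. The function names are invented for the example, and the choice of which control characters to strip is an assumption, not a rule.

```php
<?php
/**
 * Removes ASCII control characters (except tab 0x09 and line feed 0x0A)
 * and then trims surrounding whitespace.
 */
function clean_text(string $raw): string
{
    $noControl = (string) preg_replace('/[\x00-\x08\x0B-\x1F\x7F]/', '', $raw);
    return trim($noControl);
}

/**
 * The strlen() check from the text: the value must not be empty.
 */
function is_present(string $value): bool
{
    return strlen($value) > 0;
}
```

Running is_present(clean_text($_GET['key'] ?? '')) rejects a value that consists only of whitespace, which a bare strlen() check on the raw input would accept.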
You need to validate your input. You can do this with JavaScript functions that check the input before the form is submitted, and you can also call PHP functions to check the submitted values before they are stored in a database. If you are using PHP, you can opt to learn an MVC framework such as CodeIgniter or CakePHP, which makes this process a whole lot easier and friendlier for you as a developer. Such frameworks normally ship with validation libraries, so you just need to use them rather than write your own.
The Background
HTML form, e.g. for a user to submit their business details which will later appear on a legal document - so data needs to be precise.
Submits to a PHP script that validates all inputs.
If all inputs are valid, it sanitizes the data and writes it to a database using parameterised queries.
If any of the inputs are invalid, it re-displays the form. My feeling is that the user would expect this form to be populated with what they originally typed in with some feedback on what is wrong with their input. They can then amend their input and re-submit the form. This means the form needs to be populated with unsanitized data (this will be escaped before displaying it).
All good so far.
The Problem
If the data is valid, it is written to a database. Best practice seems to be to sanitize the data before sending it to the database.
This means the data written to the database might not be exactly what the user typed in (e.g. if sanitization removes some "dangerous" characters).
This seems like a poor user experience to me.
I'm using PHP and the code is running within the WordPress framework. WP has its own sanitization functions and they recommend always sanitizing input before using it. They also suggest using PHP's sanitization features. But nothing seems to address the issue that sanitizing data before storing it might result in saved data being different from what the user entered.
The Question
What I'd like is a description of an approach that's been used in the real world to address this issue. Or some feedback from those of you more experienced than I am that this is not a problem in the real world, and that it's common practice just to sanitize data and store it without further concern or feedback to the user.
My thoughts about possible solutions
A more thorough pattern would be to consider unsanitary data as invalid and feedback to the user what is wrong with their input. But this seems impractical and would require fairly long sanitization functions to provide any specific and useful feedback to the user. It also renders existing WP/PHP sanitization functions somewhat irrelevant.
A practical compromise may be to compare sanitized data with raw data and then simply notify the user that something got cleaned up before it was saved... so they can at least check the saved data to make sure they're happy with it.
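The compare-and-notify compromise could look roughly like the sketch below. sanitize_plain() is only a stand-in for whatever sanitizer is actually in use (for instance one of the WordPress functions); both function names are hypothetical.

```php
<?php
/**
 * Stand-in sanitizer for the example: strips tags and surrounding whitespace.
 */
function sanitize_plain(string $raw): string
{
    return trim(strip_tags($raw));
}

/**
 * Returns the sanitized value plus a flag telling the caller whether
 * sanitization changed anything, so the user can be asked to review it.
 */
function sanitize_with_notice(string $raw): array
{
    $clean = sanitize_plain($raw);
    return ['value' => $clean, 'changed' => $clean !== $raw];
}
```

When 'changed' is true, the page can show the saved value back to the user with a short note asking them to check it.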
Thanks for your help.
Conclusions
The answer I've accepted was helpful and led me to a solution for my particular use case, but I wanted to add a few points of my own.
Firstly, on re-reading the WP documentation I found that it's not recommending to validate AND then sanitize before writing to a database. It recommends to validate, but suggests sanitizing the input might be more convenient if the particular situation does not require strict validation. It also says use one or the other, not both. So I don't think the WP documentation is wrong on this, I just misread it.
Secondly, I didn't understand that parameterized queries are so effective against SQL injection. So I figured that sanitizing input before using it in a DB query was a sensible thing to do. But it seems it's not necessary.
And finally, I now realise that it's all about context... the issue is making data safe for a particular use. In that sense, it's not that one technique is only appropriate for input and another technique is only appropriate for output. I need to think about validating, sanitizing or escaping when doing anything with the data - e.g. write it to a DB, use it in a calculation, print it to screen, or inject it into a PDF document. And in all cases, I just need to think about how I make it safe for that particular use. Sanitizing "input" might be entirely appropriate - if it's quick and easy, makes the data safe for whatever I need to do and doesn't render the data inaccurate. Another example is the WordPress function esc_url_raw() which the manual says is specifically to be used when storing a URL in the database. So again, the idea that escaping is only appropriate for "output" is misleading.
I ended up validating the input before writing it to the database. I did not need to sanitize it as well. So if it's invalid, I tell the user. If it's valid, it gets written to the DB in its original form. And I escape it before displaying it back to the user.
Best practice seems to be to sanitize the data before sending it to the database.
This is a common misconception. Sanitization should only be performed on data that is being output (to prevent XSS, for example), and even then only as a last resort, precisely because it can irreversibly destroy the original data.
Validation is your first line of defense. Make sure that the data is properly formatted, and valid within its context - just that; no looking for special characters, don't be over-zealous. If it's not valid - reject it, don't try to salvage the "good" parts from it.
Then, when storing in database, you merely need to use parameterized queries - that is 100% effective against SQL injections. If you didn't mangle the data in a previous step, you're storing it in its original form.
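To make the parameterized-query step concrete, here is a minimal sketch using PDO. It assumes the pdo_sqlite driver and an in-memory database purely so the example runs on its own; the businesses table and the function names are invented.

```php
<?php
/**
 * Opens a throwaway in-memory SQLite database for the demo.
 */
function open_demo_db(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE businesses (id INTEGER PRIMARY KEY, name TEXT)');
    return $db;
}

/**
 * Stores the name exactly as given; the value is bound, never concatenated.
 */
function save_business(PDO $db, string $name): int
{
    $stmt = $db->prepare('INSERT INTO businesses (name) VALUES (:name)');
    $stmt->execute([':name' => $name]);
    return (int) $db->lastInsertId();
}

/**
 * Reads the stored name back, or null if the id does not exist.
 */
function load_business(PDO $db, int $id): ?string
{
    $stmt = $db->prepare('SELECT name FROM businesses WHERE id = :id');
    $stmt->execute([':id' => $id]);
    $name = $stmt->fetchColumn();
    return $name === false ? null : $name;
}
```

Because the value travels separately from the SQL text, a name containing quotes or SQL keywords is stored and returned unchanged.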
And finally, when the data is being output, that is where you SHOULD escape special characters within the appropriate context, so that it is properly rendered; or sanitize it if you have no other choice (i.e. the context is unclear and therefore you can't do proper escaping).
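For the output step, a small sketch of escaping at the point of display, here when re-populating a form field with what the user originally typed. The function name and markup are assumptions for the example.

```php
<?php
/**
 * Renders a text input whose name and value are escaped for an HTML
 * attribute context. The stored or submitted value itself is not altered.
 */
function render_text_input(string $name, string $value): string
{
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;
    return sprintf(
        '<input type="text" name="%s" value="%s">',
        htmlspecialchars($name, $flags, 'UTF-8'),
        htmlspecialchars($value, $flags, 'UTF-8')
    );
}
```

The browser shows the user exactly what they typed, while characters such as quotes and angle brackets can no longer break out of the attribute.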
It looks like you are worried about how the user feels, and that's good. There are a few things you can do.
Use the HTML form pattern attribute. Surely no name needs signs like < > & $ " ..., so exclude them with a pattern, and use the CSS :invalid and :invalid:focus selectors to inform the user before submitting if something is wrong. It is very easy and simple.
Then comes further PHP validation and WP sanitization.
You can use an intermediate state: after the 'wash', display the final version (no inputs) with two buttons, save or correct, and let the user decide. Most of us don't like repeated "are you sure? By clicking submit, do you mean submit?" prompts, but with content this important, users may appreciate a last chance and want to see the final version (without inputs, checkboxes, etc.).
Now you put the accepted version into the DB (with a prepared statement).
Comparing raw data with washed data is not practical; honestly, it sucks. The users won't be coders, and they just won't be able to make sense of "we sanitised your answers, and now they are 345 characters shorter. Sorry for the inconvenience".
Don't worry too much
...there is a German last name 'Ei', only 2 characters long, so the pattern can't require more than 2.
I've been working with PHP for some time and I began asking myself if I'm developing good habits.
One of these is what I believe to be overuse of PHP sanitizing methods. For example, when a user registers through a form, I get the following POST variables:
$_POST['name'], $_POST['email'] and $_POST['captcha']. Now, what I usually do is obviously sanitize the data I am going to place into MySQL, but when comparing the captcha, I also sanitize it.
Therefore I believe I have misunderstood PHP sanitizing. I'm curious: are there any other cases where you need to sanitize data, apart from when placing it into MySQL (note that I know sanitizing is also needed to prevent XSS attacks)? And moreover, is my habit of sanitizing almost every variable that comes from user input a bad one?
Whenever you store your data someplace, and if that data will be read/available to (unsuspecting) users, then you have to sanitize it. So something that could possibly change the user experience (not necessarily only the database) should be taken care of. Generally, all user input is considered unsafe, but you'll see in the next paragraph that some things might still be ignored, although I don't recommend it whatsoever.
Stuff that happens on the client only is sanitized just for a better UX (user experience). Think of JS validation of a form: from the security standpoint it's useless because it's easily bypassed, but it helps non-malicious users have a better interaction with the website. Basically, it can't do any harm, because that data (good or bad) is lost as soon as the session is closed. You can always destroy a webpage for yourself (on your machine), but the problem is when someone can do it for others.
To answer your question more directly - never worry about overdoing it. It's always better to be safe than sorry, and the cost is usually not more than a couple of milliseconds.
The term you need to search for is FIEO. Filter Input, Escape Output.
You can easily confound yourself if you do not understand this basic principle.
Imagine PHP is the man in the middle, it receives with the left hand and doles out with the right.
A user uses your form and fills in a date field, so it should only accept digits and maybe dashes, e.g. nnnn-nn-nn. If you get something that does not match that, then reject it.
That is an example of filtering.
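A filter for such a date field might look like the sketch below. The function name is made up, and the extra checkdate() call goes slightly beyond a pure pattern match by also rejecting impossible dates.

```php
<?php
/**
 * Accepts only input of the form nnnn-nn-nn that is also a real calendar date.
 */
function is_valid_date(string $input): bool
{
    // The D modifier stops "$" from matching before a trailing newline.
    if (preg_match('/^(\d{4})-(\d{2})-(\d{2})$/D', $input, $m) !== 1) {
        return false;
    }
    // checkdate() takes month, day, year in that order.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]);
}
```

Anything that fails the check is rejected outright; no attempt is made to repair it.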
Next, PHP does something with it, let's say storing it in a MySQL database.
What MySQL needs is to be protected from SQL injection, so you use PDO or mysqli prepared statements to make sure that EVEN IF your filter failed, you cannot permit an attack on your database. This is an example of escaping, in this case escaping for SQL storage.
Later, PHP gets the data from your DB and displays it on an HTML page. So you need to escape the data for the next medium, HTML (skipping this step is what permits XSS attacks).
In your head you have to divide each of the PHP 'protective' functions into one or other of these two families, Filtering or Escaping.
Freetext fields are of course more complex than filtering for a date, but never mind, stick to the principles and you will be OK.
Hoping this helps http://phpsec.org/projects/guide/
I've implemented input validation on all of my input data using php (as well as js on the front-end). I'm type casting where I can, validating stuff like emails against a regex, making sure dropdown values are only ones I'm expecting and also in many cases where I'm expecting only a string I have a regex that runs that only allows letters, numbers and spaces. Anything that doesn't meet these rules results in the form failing validation and no sql queries are run.
With that said if my form passes validation I'm making the assumption that it's safe for input in to my db (which I'm doing via pdo) and then escaped on output.
So with that said why do I need input sanitization?
If you have very strict validation server-side, you don't need to sanitize. E.g. a string validated against /^[a-z0-9]{5,25}$/ will not need any sanitization (removing non-alphanumeric characters makes no sense, since they cannot pass validation anyway).
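A sketch of that strict check in PHP follows. The function name and the idea that the field is a username are assumptions. One deliberate change from the pattern as quoted: the D modifier is added, because without it "$" also matches just before a trailing newline.

```php
<?php
/**
 * Accepts 5 to 25 lowercase letters or digits and nothing else.
 */
function is_valid_username(string $input): bool
{
    return preg_match('/^[a-z0-9]{5,25}$/D', $input) === 1;
}
```

Every value that gets past this function consists only of characters that are harmless in SQL and HTML, which is why a separate sanitizing pass would have nothing left to remove.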
Just make sure you can validate all data, and if that's impossible (e.g. with html it tends to be a bit difficult), you can use escaping strategies or things like html purifier.
For a good overview on escaping strategies for XSS prevention:
see https://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet
For an idea of different security threats:
https://www.owasp.org/index.php/PHP_Security_Cheat_Sheet
You need both. Validating input data is easily beaten at the client side, but it's useful for legitimate users who aren't trying to hack you.
Sanitize the data (all the data, whether it's input data or something straight from your DB that you think you should be able to trust) before putting it into your database.
Even if you 100% trust your validation and do it on the server side (where, in theory, people shouldn't be able to mess with the data), it's still worth using some form of sanitizing because it's a good habit to get into.
Output or Input filtering?
I constantly see people writing "filter your inputs", "sanitize your inputs", "don't trust user data", but I only agree with the last one: I consider trusting any external data a bad idea, even if it is internal relative to the system.
Input filtering:
The most common that I see.
Take the form post data or any other external source of information and define some boundaries when saving it, for example making sure text is text, numbers are numbers, that sql is valid sql, that html is valid html and that it does not contain harmful markup, and then you save the "safe" data in the database.
But when fetching data you just use the raw data from the database.
In my personal opinion, the data is never really safe.
Although it sounds easy, just filter everything you get from forms and url's, in reality it is much harder than that, it might be safe for one language but not another.
Output filtering:
When doing it this way, I save the raw, unaltered data, whatever it might be, into the database with prepared statements, and then filter out the problematic code when accessing the data. This has its own advantages:
This adds a layer between html and the server side script.
which I consider to be data access separation of sorts.
Now data is filtered depending on the context, for example I can have the data from the database presented in a html document as plain-escaped-text, or as html or as anything anywhere.
The drawbacks here are that you must not ever forget to add the filtering which is a little bit harder than with input filtering and it uses a bit more CPU when providing data.
This does not mean that you don't need to do validation checks. You still do; it's just that you don't save the filtered data. You validate it and show the user an error message if the data is somehow invalid.
So instead of going with "filter your inputs" maybe it should be "validate your inputs, filter your outputs".
So should I go with "input validation and filtering" or "input validation and output filtering"?
There is no generic "filtering" for input and output.
Validate your input, escape your output. How you do this depends on context.
Validation is about making sure input falls within sensible ranges, like the length of strings, the numericality of dollar amounts or that a record being updated is owned by the user performing the update. This is about maintaining the logical consistency of your data and preventing people from doing things like zeroing the price of a product they are purchasing or deleting records they shouldn't have access to. It has nothing to do with "filtering" or escaping specific characters in your input.
Escaping is a matter of context, and only really makes sense when you're doing something with data that can be poisoned by injecting certain characters. Escape HTML characters in data you send to the browser. Escape SQL characters in data you send to the database. Escape quotes when you're writing data inside JavaScript <script> tags. Just be conscious of how the data you're dealing with is going to be interpreted by the system you're passing it to and escape accordingly.
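To illustrate how the same value is escaped differently per destination, here is a minimal sketch with one helper per context. The helper names are invented, and the flag choices are one reasonable option rather than the only one.

```php
<?php
/** Escapes a value for HTML text or a quoted HTML attribute. */
function html_text(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

/** Escapes a value for use as a single URL path segment or query value. */
function url_component(string $value): string
{
    return rawurlencode($value);
}

/**
 * Encodes a value as a JavaScript string literal that is safe to place
 * inside a <script> block: <, >, &, ' and " are emitted as \uXXXX escapes.
 */
function js_literal(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}
```

For SQL, the equivalent step is a prepared statement with bound parameters rather than a string-escaping helper.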
The best solution is to filter both. Doing just one makes it more likely that you miss a case, and can leave you open to other types of attacks.
If you only do input filtering, an attacker could find a way to bypass your inputs and cause a vulnerability. This could be someone with access to your database entering data manually, it could be an attacker uploading a file through FTP or some other channel that is not checked, or many other methods.
If you only do output filtering, you can leave yourself open to SQL injection and other server side attacks.
The best method is to filter both your inputs and outputs. It may cause more load, but greatly reduces the risk of an attacker finding a vulnerability.
Sounds like semantics to me. Either way the important thing to remember is to make sure bad data doesn't get in the system.
Doing output filtering instead of input filtering is asking for SQL injection.
What do you all think is the correct (read: most flexible, loosely coupled, most robust, etc.) way to make user input from the web safe for use in various parts of a web application? Obviously we can just use the respective sanitization functions for each context (database, display on screen, save on disk, etc.), but is there some general "pattern" for handling unsafe data and making it safe? Is there an established way to enforce treating it as unsafe unless it is properly made safe?
As has already been said, there are several things to take into account when you are concerned about web security. Here are some basic principles:
Avoid direct input from users being integrated into queries and variables.
So this means don't have something like $variable = $_POST['user_input']. For any situation like this, you are handing over too much control to the user. If the input affects some database query, always have whitelists to validate user input against. If the query is for a user name, validate against a list of good user names. Do NOT simply make a query with the user input dropped right in.
One (possible) exception is for a search string. In this case, you need to sanitize, simple as that.
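A whitelist check of the kind described can be sketched as follows. The scenario (a user-chosen sort column), the column names, and the function name are all assumptions made for the example.

```php
<?php
// Hypothetical list of columns the user is allowed to sort by.
const SORTABLE_COLUMNS = ['name', 'created_at', 'price'];

/**
 * Builds an ORDER BY clause from user input. Only a value that exactly
 * matches the whitelist is used; anything else falls back to the default.
 */
function order_by_clause(string $requested, string $default = 'name'): string
{
    $column = in_array($requested, SORTABLE_COLUMNS, true) ? $requested : $default;
    return 'ORDER BY ' . $column;
}
```

The user's string never reaches the query text; it is only used to pick one of the values the developer wrote.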
Avoid storing user input without sanitation.
If the user is creating a profile or uploading info for other users, you have to either have a white-list of what kind of data is acceptable, or strip out anything that could be malicious. This not only for your system's security, but for your other users (See next point.)
NEVER output anything from a user to the browser without stripping it.
This is probably the most important thing that security consultants have emphasized to me. You cannot simply rely on sanitizing input when it is received from the user. If you did not write the output yourself, always ensure that the output is innocuous by encoding any HTML characters or wrapping it in a <plaintext> tag (note that <plaintext> is obsolete in current HTML, so encoding is the dependable option). It is simple negligence on the part of the developer if user A uploads a bit of JavaScript that harms other users who view that page. You will sleep better at night knowing that any and all user output can do nothing but appear as text in all browsers.
Never allow anyone but the user to control the form.
XSS is easier than it should be and a real pain to cover in one paragraph. Simply put, whenever you create a form, you are giving users access to a script that will handle form data. If I steal someone's session or someone's cookie, I can now talk to the script as though I was on the form page. I know the type of data it expects and the variables names it will look for. I can simply pass it those variables as though I were the user and the script can't tell the difference.
The above is not a matter of sanitation but of user validation. My last point is directly related to this idea.
Avoid using cookies for user validation or role validation.
If I can steal a user's cookie, I may be able to do more than make that one user have a bad day. If I notice the cookie has a value called "member", I can very easily change that value to "admin". Maybe it won't work, but for many scripts, I would have instant access to any admin-level info.
Simply put, there is no single easy way to secure a web form, but there are basic principles that simplify what you should be doing and thus ease the stress of securing your scripts.
Once more for good measure:
Sanitize all input
Encode all output
Validate any input used for execution against a strict whitelist
Make sure the input is coming from the actual user
Never make any user or role-based validation browser-side/user-modifiable
And never assume that any one person's list is exhaustive or perfect.
I'm more than a little sceptical that such a general purpose framework could both exist and be less complex than a programming language.
The definition of "safe" is so different between different layers
Input field validation, numbers, dates, lists, postcodes, vehicle registrations
Cross field validation
Domain validation - is that a valid meter reading? Miss Jones used £300,000,000 of electricity this month?
Inter-request validation - are you really booking two transatlantic flights for yourself on the same day?
Database consistency, foreign key validation
SQL injection
Also consider the actions when violations are discovered.
At the UI layer we almost certainly do not just quietly strip out non-digit chars from numeric fields; we raise a UI error
In the UI we probably want to validate all fields and flag each individual error
In other layers we might throw an exception or initiate a business process
Perhaps I'm missing your vision? Have you seen anything that gets close to what you have in mind?
You cannot use a single method to sanitize data for all uses, but a good start is:
Use filter_var() to validate/sanitize
filter_var() takes a number of different types of data and either strips out bad characters (like non-digits from things you expect to be numbers) or checks that the value has a valid format (such as IP addresses).
Note: email addresses are far more complicated than filter_var()'s implementation allows for, so search around for a more thorough function.
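A few of the filter_var() filters in use, as a minimal sketch. The wrapper function names and the 1 to 1000 range are made up for the example; the FILTER_* constants are the standard ones from PHP's filter extension.

```php
<?php
/**
 * Validates an integer between 1 and 1000; returns null when invalid.
 */
function parse_quantity(string $input): ?int
{
    $value = filter_var(
        $input,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 1000]]
    );
    return $value === false ? null : $value;
}

/** Checks that the input is a well-formed IPv4 or IPv6 address. */
function is_ip(string $input): bool
{
    return filter_var($input, FILTER_VALIDATE_IP) !== false;
}

/**
 * Sanitizes instead of validating: removes every character except
 * digits and the plus and minus signs.
 */
function strip_to_int_chars(string $input): string
{
    return (string) filter_var($input, FILTER_SANITIZE_NUMBER_INT);
}
```

Note the difference between the two families: the FILTER_VALIDATE_* filters reject bad input, while the FILTER_SANITIZE_* filters silently change it.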
Use mysql_real_escape_string() before inputting anything into a MySQL database
I wouldn't suggest calling this until you are just about to put the data into the database, and it is probably better to use mysqli prepared statements anyway.