What are the measures needed to prevent or to stop JavaScript injections from happening in a PHP Web application so that sensitive information is not given out (best-practices in PHP, HTML/XHTML and JavaScript)?
A good first step is applying the methods listed in the question Gert G linked. That question covers in detail the functions that can be used in different situations to cleanse input, including mysql_real_escape_string(), htmlentities(), htmlspecialchars(), strip_tags() and addslashes().
A better way, whenever possible, is to avoid inserting user input directly into your database. Employ whitelist input validation: in any situation where you only have a limited range of options, choose from hard-coded values for insertion, rather than taking the input from any client-facing form. Basically, this means accepting only certain values, instead of trying to eliminate or counter malformed or malicious input.
For example:
If you have a form with a drop-down for items, do not use the input from this drop-down for insertion. Remember that a malicious client can edit the information sent with the form's submission, even if you think they only have limited options. Instead, have the drop-down refer to an index in an array in your server-side code. Then use that array to choose what to insert. This way, even if an attacker tries to send you malicious code, it never actually hits your database.
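As a minimal sketch of that idea: the form submits only an index, and the server looks the real value up in its own array. The option list and the function name pickColor() are hypothetical, chosen for illustration.

```php
<?php
// Hypothetical list of allowed options; the form only ever sends the index.
const ALLOWED_COLORS = [0 => 'red', 1 => 'green', 2 => 'blue'];

/**
 * Map a submitted index to a hard-coded value, or return null when the
 * index is not one the server offered.
 */
function pickColor($submitted): ?string
{
    // filter_var() returns false unless the input is an integer literal.
    $index = filter_var($submitted, FILTER_VALIDATE_INT);
    if ($index === false || !array_key_exists($index, ALLOWED_COLORS)) {
        return null;
    }
    return ALLOWED_COLORS[$index];
}
```

Whatever the client sends, only one of the three hard-coded strings (or null) can ever reach the insert.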
Obviously, this doesn't work for free-form applications like forums or blogs. For those, you have to fall back on the "first step" techniques. Still, there are a wide range of options that can be improved via whitelist input validation.
You can also use parameterized queries (aka prepared statements with bind variables) for your SQL interactions wherever possible. This tells your database server that all input is simply a value, so it mitigates a lot of the potential problems from injection attacks. In many situations, this can even cover free-form applications.
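A rough sketch of a parameterized query using PDO. The in-memory SQLite database and the comments table are assumptions made only so the snippet runs on its own; with a real database you would change the DSN.

```php
<?php
// In-memory SQLite is used only so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

$userInput = "Robert'); DROP TABLE comments; --";

// The placeholder keeps the input out of the SQL text entirely;
// the driver passes it to the database as a plain value.
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $userInput]);

$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
```

The hostile-looking string is stored verbatim as data, and the table is still there afterwards.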
Treat any value you output to html with htmlspecialchars() by default.
The only excuse for not using htmlspecialchars() is when you need to output a string that itself contains HTML. In that case you must be sure that the string comes from a completely safe source. If you don't have that confidence, you must pass it through a whitelist HTML filter that allows only a carefully limited set of tags, attributes, and attribute values. Be especially careful about attribute values: never allow arbitrary values, especially for attributes like src, href, and style.
You should know every place in your web app where you output anything to HTML without using htmlspecialchars(), be sure that you really need those places, and be aware that despite all your confidence those places are potential vulnerabilities.
If you are thinking that this is too much caution: "Why do I need to htmlspecialchars() this variable when I know it contains just an integer, and lose all those precious CPU cycles?"
Remember this: you don't know, you only think you know. CPU cycles are the cheapest thing in the world, and nearly all of them will be wasted waiting for the database, the filesystem, or even memory access.
Also, never use blacklist HTML filters. YouTube made that mistake, and someone found out that only the first <script> was removed; entering a second one in a comment let you inject any JavaScript into visitors' browsers.
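To illustrate why that kind of filter fails, here is a deliberately broken, hypothetical blacklist filter. It is not the code of any real site, only an assumed stand-in with the same flaw of removing just the first match.

```php
<?php
// A naive blacklist (hypothetical): remove only the first <script> tag it sees.
function naiveFilter(string $html): string
{
    // The fourth argument limits preg_replace() to a single replacement.
    return preg_replace('/<script\b[^>]*>/i', '', $html, 1);
}

// The attacker simply supplies the tag twice; the second copy survives.
$out = naiveFilter('<script><script>alert(1)</script>');
```

A whitelist filter avoids this class of bug because anything it does not explicitly recognise is dropped or escaped.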
Similarly, to avoid SQL injection, treat every value that you glue into an SQL query with mysql_real_escape_string(), or better yet use PDO prepared statements.
If you're not passing anything that needs to be formatted as HTML, then use:
strip_tags() <- Eliminates any suspicious html
and then run the following to clean before saving to the db
mysql_real_escape_string()
If your Ajax code is saving user-entered HTML from a textbox or WYSIWYG editor, look into using HTML Purifier to strip out JavaScript but allow HTML tags.
I do not agree fully with the other answers provided so I will post my recommendations.
Recommended reading
XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet
Html Injection:
Whenever you display user-submitted content, clean it with htmlspecialchars() or htmlentities(), specifying ENT_QUOTES if the value is used inside single quotes. I recommend never encapsulating attributes in single quotes and always encapsulating them in double quotes (do not omit the quotes). This applies to things such as:
<input value="<?php echo htmlspecialchars($var); ?>" />
<textarea><?php echo htmlspecialchars($var); ?></textarea>
<p><?php echo htmlspecialchars($var); ?></p>
<img width="<?php echo htmlspecialchars($var); ?>" />
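A small sketch of the same pattern wrapped in a helper, passing ENT_QUOTES and the charset explicitly. The helper name e() is an assumption, not something from the answer.

```php
<?php
/** Escape a value for HTML text or for a quoted attribute value. */
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// An attempt to break out of the attribute and add an event handler.
$var = '" onmouseover="alert(1)';
$html = '<input value="' . e($var) . '" />';
```

Because the double quotes in $var are turned into entities, the value stays inside the attribute instead of starting a new one.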
Javascript Injection:
It is best practice (but not always practical) to never echo user content into events and javascript. However, if you do there are some things that can be done to reduce the risk. Only pass integer id's. If you require something such as a type specifier, then use a whitelist and/or conditional check ahead of time before outputting. Possibly force user content to alphanumeric only when appropriate; preg_replace("/[^A-Za-z0-9]/", '', $string); but be very careful what you allow here. Only include content when it is encapsulated in quotes and note that htmlspecialchars/htmlentities does not protect you here. It will be interpreted at runtime even if it has been translated into html entities.
This applies to things such as:
Click
href, src, style, onClick, etc.
Do not echo any user content into other areas such as the body of script tags etc unless it has been forced to an int or some other very very limited character set (if you know what you are doing).
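A hedged sketch of the "integer ids plus whitelist" approach for an event attribute. The JavaScript function showItem(), the list of types, and the helper name onclickAttr() are all hypothetical.

```php
<?php
// Hypothetical whitelist of type specifiers the page is allowed to emit.
const ALLOWED_TYPES = ['photo', 'video', 'audio'];

function onclickAttr($id, $type): string
{
    $id = (int) $id; // force the id to an integer, whatever was sent
    if (!in_array($type, ALLOWED_TYPES, true)) {
        $type = ALLOWED_TYPES[0]; // fall back to a known-safe default
    }
    return 'onclick="showItem(' . $id . ", '" . $type . "')\"";
}
```

Nothing user-controlled reaches the attribute as free text: the id is a number and the type is one of the server's own strings.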
SQL Injection:
Use Prepared statements, bind user content to them, and never directly insert user content into the query. I would recommend creating a class for prepared statements with helper functions for your different basic statement types (and while on the subject, functionalize all of your database statements). If you choose not to use prepared statements then use mysql_real_escape_string() or similar (not addslashes()). Validate content when possible before storing into the database such as forcing/checking for integer data type, conditional checks on types, etc. Use proper database column types and lengths. Remember the main goal here is to prevent sql injection but you can optionally do html/javascript injection protection here as well.
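One possible shape for such a helper class, as a minimal sketch. The class name Db, its methods, and the in-memory SQLite database used in the demo are assumptions, not part of the answer.

```php
<?php
// Hypothetical thin wrapper: every query goes through prepare/execute.
final class Db
{
    private PDO $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
        $this->pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    }

    /** Run a statement with bound parameters and return the statement. */
    public function run(string $sql, array $params = []): PDOStatement
    {
        $stmt = $this->pdo->prepare($sql);
        $stmt->execute($params);
        return $stmt;
    }

    /** Fetch all rows of a SELECT as associative arrays. */
    public function all(string $sql, array $params = []): array
    {
        return $this->run($sql, $params)->fetchAll(PDO::FETCH_ASSOC);
    }
}

// Demo with an in-memory SQLite database (assumed, so the sketch runs alone).
$db = new Db(new PDO('sqlite::memory:'));
$db->run('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)');
$db->run('INSERT INTO users (name) VALUES (?)', ["O'Brien"]);
$rows = $db->all('SELECT name FROM users WHERE name = ?', ["O'Brien"]);
```

Routing every statement through run() makes it hard to build a query by string concatenation by accident.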
Other Resources
I have done some research online in hopes to find a simple solution already publicly available. I found OWASP ESAPI but it appears quite dated. The links to the php version are broken in several places. I believe I found it here; ESAPI PHP but again it is quite dated and not as simple as I was hoping for. You may find it useful however.
All in all, never just assume you're protected, for example by using htmlentities() in an onClick attribute. You must use the right tool in the right location and avoid doing things in the wrong location.
This question already has some answers that were accepted and rated by users.
I am posting an answer as well, in the hope that it helps.
I have tested this myself.
1) Use strip_tags() // Prevents HTML injection
2) mysqli_real_escape_string() // Escapes suspicious characters
3) preg_replace("/[\'\")(;|`,<>]/", "", $value); // Removes every character that matches
You can use whichever you like.
Hey guys, I've got a question: is there something I could use when inserting data into SQL to prevent XSS, instead of when reading it?
For example, I have quite a bit of user-generated output from my SQL. Is it possible to make that safe on entering SQL, or do I have to make it safe when it leaves SQL?
TL;DR: can I use something like htmlspecialchars when inserting data into SQL to prevent XSS, and will that be any sort of good protection?
I think several things are mixed up in the question.
Preventing XSS with input validation
In general you can't prevent XSS with input validation, except in very special cases where you can validate input against something very strict, like numbers only.
Consider this html page (let's imagine <?= is used to insert data into your html in your server-side language because you hinted at PHP, could of course differ by language used):
<script>
var myVar = <?= var1 ?>;
</script>
In this case, var1 on the server doesn't need to have any special character, only letters are enough to inject javascript. Whether that can be useful for an attacker depends on several things, but technically, this would be vulnerable to XSS with almost any input validation. Of course such assignment may not currently be in your Javascript, but how will you ensure that there never will be?
Another example is obviously DOM XSS, where input does not ever get to the server, but that's a different story.
Preventing XSS is an output encoding thing. Input validation may help in some cases, but will not provide sufficient protection in most cases.
Storing encoded values
It is generally not a good idea to store values html-encoded in your database. On the one hand, it makes searching, ordering, any kind of processing much more cumbersome. On the other hand, it violates single responsibility and separation of concerns. Encoding is a view-level thing, your backend database has nothing to do with how you will want to present that data. It's even more emphasized when you consider different encodings. HTML encoding is only ok if you want to write the data into an HTML context. If it's javascript (in a script tag, or in an on* attribute like onclick, or several other places), html encoding is not sufficient, let alone where you have more special outputs. Your database doesn't need to know, where the data will be used, it's an output thing, and as such, it should be handled by views.
You should test the input against a whitelist of characters using a regex that accepts only, for example, [a-zA-Z0-9]. You'll have a big headache if you try the other way around with a blacklist, because there are countless ways of exploiting input and catching them all is a big problem.
Also, be aware of SQL injection. You can use sqlmap on Linux to test whether your website is vulnerable.
I'm working on a WordPress plugin; one of its features is the ability to store an HTML regex pattern, entered by the user, in the DB and then display it on the settings page.
My method actually works, but I wonder whether the code is secure enough:
This is the user-entered pattern:
<div(.+?)class='sharedaddy sd-sharing-enabled'(.*?)>(.+?)<\div><\div><\div>
That's the way I'm storing HTML pattern in DB:
$print_options['custom_exclude_pattern'] = htmlentities(stripslashes($_POST['custom_exclude_pattern']),ENT_QUOTES,"UTF-8");
That's how it's actually stored in WordPress DB:
s:22:"custom_exclude_pattern";s:109:"<div(.+?)class="sharedaddy sd-sharing-enabled"(.*?)>(.+?)<\div><\div><\div>";
And that's how the output is displayed on settings page:
<input type="text" name="custom_exclude_pattern" value="<?php echo str_replace('"',"'",html_entity_decode($print_options['custom_exclude_pattern'])); ?>" size="30" />
Thanks for help :)
From the comments, it sounds like you are concerned about two separate issues (and possibly unaware of a third one that I will mention in a minute) and looking for one solution for both: SQL Injection and Cross-Site Scripting. You have to treat each one separately. I implore you to read this article by Defuse Security.
How to Prevent SQL Injection
This has been answered before on StackOverflow with respect to PHP applications in general. WordPress's $wpdb supports prepared statements, so you don't necessarily have to figure out how to work with PDO or MySQLi either. (However, any vulnerabilities in their driver WILL affect your plugin. Make sure you read the $wpdb documentation thoroughly.)
You should not escape the parameters before passing them to a prepared statement. You'll just end up with munged data.
Cross-Site Scripting
As of this writing (June 2015), there are two general situations you need to consider:
The user should not be allowed to submit any HTML, CSS, etc. to this input.
The user is allowed to submit some HTML, CSS, etc. to this input, but we don't want them to be able to hack us by doing so.
The first problem is straightforward enough to solve:
echo htmlentities($dbresult['field'], ENT_QUOTES | ENT_HTML5, 'UTF-8');
The second problem is a bit tricky. It involves allowing only certain markup while not accidentally allowing other markup that can be leveraged to get Javascript to run in the user's browser. The current gold standard in XSS defense while allowing some HTML is HTML Purifier.
Important!
Whatever your requirements, you should always apply your XSS defense on output, not before inserting stuff into the database. Recently, Wordpress core had a stored cross-site scripting vulnerability that resulted from the decision to escape before storing rather than to escape before rendering. By supplying a sufficiently long comment, attackers could trigger a MySQL truncation bug on the escaped text, which allowed them to bypass their defense.
Bonus: PHP Object Injection from unserialize()
That's how it's actually stored in WordPress DB:
s:22:"custom_exclude_pattern";s:109:"<div(.+?)class="sharedaddy sd-sharing-enabled"(.*?)>(.+?)<\div><\div><\div>";
It looks like you're using serialize() when storing this data and, presumably, using unserialize() when retrieving it. Be careful with unserialize(); if you let users have any control over the string, they can inject PHP objects into your code, which can also lead to Remote Code Execution.
Remote Code Execution, for the record, means they can take over your entire website and possibly the server that hosts your blog. If there is any chance that a user can alter this record directly, I highly recommend using json_encode() and json_decode() instead.
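A minimal sketch of the JSON round trip suggested above. The option array is a stand-in for the plugin's real settings, and the shortened pattern is made up for the example.

```php
<?php
// Stand-in for the plugin's settings array (hypothetical content).
$options = ['custom_exclude_pattern' => "<div(.+?)class='sharedaddy'(.*?)>"];

// Store: JSON carries only arrays, strings, numbers, booleans and null.
$stored = json_encode($options);

// Load: decode to an associative array; no PHP objects are instantiated.
$loaded = json_decode($stored, true);
if (!is_array($loaded)) {
    $loaded = []; // malformed or unexpected data falls back to defaults
}
```

Unlike unserialize(), json_decode() cannot be made to construct arbitrary classes from the stored string.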
I hope I got the point, if not then correct me: you are trying to dynamically insert a pattern for an input field, based on the same pattern being stored in your db, right?
Well, personally I think patterns are a good help for usability, in that the user knows his input format is not correct without needing to submit and refresh every time.
The big problem of patterns is, HTML code can be modified client-side. I believe the only safe solution would be to check server-side for the correctness of the input... There is no way a client side procedure can be safer than a server-side one!
Well, if you are going to let your user input a regex, you could do something like a prepared statement plus htmlentities($input, ENT_COMPAT, "UTF-8"); to sanitize the input, and then do the opposite, that is html_entity_decode($dataFromDb, ENT_COMPAT, "UTF-8");. The prepared statement is a must; all the other ways to work around malicious input can be combined in lots of different ways!
I am trying to come up with a function that I can pass all my strings through to sanitize. So that the string that comes out of it will be safe for database insertion. But there are so many filtering functions out there I am not sure which ones I should use/need.
Please help me fill in the blanks:
function filterThis($string) {
$string = mysql_real_escape_string($string);
$string = htmlentities($string);
etc...
return $string;
}
Stop!
You're making a mistake here. Oh, no, you've picked the right PHP functions to make your data a bit safer. That's fine. Your mistake is in the order of operations, and how and where to use these functions.
It's important to understand the difference between sanitizing and validating user data, escaping data for storage, and escaping data for presentation.
Sanitizing and Validating User Data
When users submit data, you need to make sure that they've provided something you expect.
Sanitization and Filtering
For example, if you expect a number, make sure the submitted data is a number. You can also cast user data into other types. Everything submitted is initially treated like a string, so forcing known-numeric data into being an integer or float makes sanitization fast and painless.
What about free-form text fields and textareas? You need to make sure that there's nothing unexpected in those fields. Mainly, you need to make sure that fields that should not have any HTML content do not actually contain HTML. There are two ways you can deal with this problem.
First, you can try escaping HTML input with htmlspecialchars. You should not use htmlentities to neutralize HTML, as it will also perform encoding of accented and other characters that it thinks also need to be encoded.
Second, you can try removing any possible HTML. strip_tags is quick and easy, but also sloppy. HTML Purifier does a much more thorough job of both stripping out all HTML and also allowing a selective whitelist of tags and attributes through.
Modern PHP versions ship with the filter extension, which provides a comprehensive way to sanitize user input.
Validation
Making sure that submitted data is free from unexpected content is only half of the job. You also need to try and make sure that the data submitted contains values you can actually work with.
If you're expecting a number between 1 and 10, you need to check that value. If you're using one of those new fancy HTML5-era numeric inputs with a spinner and steps, make sure that the submitted data is in line with the step.
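For the 1-to-10 case, a short sketch using the filter extension's range options. The function name ratingOrNull() is hypothetical.

```php
<?php
/** Return the integer if it is within 1..10, otherwise null. */
function ratingOrNull($raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 10],
    ]);
    // filter_var() yields false for non-integers and out-of-range values.
    return $value === false ? null : $value;
}
```

The caller gets either a real integer in range or null, never a string that merely looks numeric.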
If that data came from what should be a drop-down menu, make sure that the submitted value is one that appeared in the menu.
What about text inputs that fulfill other needs? For example, date inputs should be validated through strtotime or the DateTime class. The given date should be between the ranges you expect. What about email addresses? The previously mentioned filter extension can check that an address is well-formed, though I'm a fan of the is_email library.
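A sketch of both checks, assuming dates arrive in Y-m-d form. The function names are hypothetical, and the email check uses the filter extension rather than the is_email library.

```php
<?php
/** Parse a strict Y-m-d date and check it lies in [$min, $max]; null if not. */
function dateInRange(string $raw, string $min, string $max): ?DateTimeImmutable
{
    // The leading "!" resets the time fields so only the date is compared.
    $date = DateTimeImmutable::createFromFormat('!Y-m-d', $raw);
    // Reject parse failures and silently rolled-over dates like 2021-02-30.
    if ($date === false || $date->format('Y-m-d') !== $raw) {
        return null;
    }
    if ($date < new DateTimeImmutable($min) || $date > new DateTimeImmutable($max)) {
        return null;
    }
    return $date;
}

/** True when the address is syntactically well-formed. */
function isWellFormedEmail(string $raw): bool
{
    return filter_var($raw, FILTER_VALIDATE_EMAIL) !== false;
}
```

Comparing the re-formatted date with the raw input catches values that PHP would otherwise quietly normalise to a different day.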
The same is true for all other form controls. Have radio buttons? Validate against the list. Have checkboxes? Validate against the list. Have a file upload? Make sure the file is of an expected type, and treat the filename like unfiltered user data.
Every modern browser comes with a complete set of developer tools built right in, which makes it trivial for anyone to manipulate your form. Your code should assume that the user has completely removed all client-side restrictions on form content!
Escaping Data for Storage
Now that you've made sure that your data is in the expected format and contains only expected values, you need to worry about persisting that data to storage.
Every single data storage mechanism has a specific way to make sure data is properly escaped and encoded. If you're building SQL, then the accepted way to pass data in queries is through prepared statements with placeholders.
One of the better ways to work with most SQL databases in PHP is the PDO extension. It follows the common pattern of preparing a statement, binding variables to the statement, then sending the statement and variables to the server. If you haven't worked with PDO before here's a pretty good MySQL-oriented tutorial.
Some SQL databases have their own specialty extensions in PHP, including SQL Server, PostgreSQL and SQLite 3. Each of those extensions has prepared statement support that operates in the same prepare-bind-execute fashion as PDO. Sometimes you may need to use these extensions instead of PDO to support non-standard features or behavior.
MySQL also has its own PHP extensions. Two of them, in fact. You only want to ever use the one called mysqli. The old "mysql" extension has been deprecated and is not safe or sane to use in the modern era.
I'm personally not a fan of mysqli. The way it performs variable binding on prepared statements is inflexible and can be a pain to use. When in doubt, use PDO instead.
If you are not using an SQL database to store your data, check the documentation for the database interface you're using to determine how to safely pass data through it.
When possible, make sure that your database stores your data in an appropriate format. Store numbers in numeric fields. Store dates in date fields. Store money in a decimal field, not a floating point field. Review the documentation provided by your database on how to properly store different data types.
Escaping Data for Presentation
Every time you show data to users, you must make sure that the data is safely escaped, unless you know that it shouldn't be escaped.
When emitting HTML, you should almost always pass any data that was originally user-supplied through htmlspecialchars. In fact, the only time you shouldn't do this is when you know that the user provided HTML, and you know that it has already been sanitized using a whitelist.
Sometimes you need to generate some Javascript using PHP. Javascript does not have the same escaping rules as HTML! A safe way to provide user-supplied values to Javascript via PHP is through json_encode.
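A minimal sketch of passing a value to JavaScript through json_encode. The JSON_HEX_* flags are an extra precaution added here, not something the answer spells out, and the helper name jsLiteral() is hypothetical.

```php
<?php
/** Encode a PHP value as a JavaScript literal for use inside a <script> block. */
function jsLiteral($value): string
{
    // The JSON_HEX_* flags turn <, >, &, ' and " inside the value into
    // \uXXXX escapes, so the value cannot close the script element.
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$name = '</script><script>alert(1)</script>';
$script = '<script>var userName = ' . jsLiteral($name) . ';</script>';
```

The browser sees a single string literal, and JavaScript decodes the escapes back to the original text at runtime.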
And More
There are many more nuances to data validation.
For example, character set encoding can be a huge trap. Your application should follow the practices outlined in "UTF-8 all the way through". There are hypothetical attacks that can occur when you treat string data as the wrong character set.
Earlier I mentioned browser debug tools. These tools can also be used to manipulate cookie data. Cookies should be treated as untrusted user input.
Data validation and escaping are only one aspect of web application security. You should make yourself aware of web application attack methodologies so that you can build defenses against them.
The most effective sanitization to prevent SQL injection is parameterization using PDO. Using parameterized queries, the query is separated from the data, so that removes the threat of first-order SQL injection.
In terms of removing HTML, strip_tags is probably the best idea, as it will just remove everything. htmlentities does what it sounds like, so that works, too. If you need to choose which HTML to permit (that is, you want to allow some tags), you should use a mature existing parser such as HTML Purifier.
Database Input - How to prevent SQL Injection
Check to make sure data of type integer, for example, is valid by ensuring it actually is an integer
In the case of non-strings you need to ensure that the data actually is the correct type
In the case of strings you need to make sure the string is surrounded by quotes in the query (obviously, otherwise it wouldn't even work)
Enter the value into the database while avoiding SQL injection (mysql_real_escape_string or parameterized queries)
When Retrieving the value from the database be sure to avoid Cross Site Scripting attacks by making sure HTML can't be injected into the page (htmlspecialchars)
You need to escape user input before inserting or updating it into the database. Here is an older way to do it. You would want to use parameterized queries now (probably from the PDO class).
$mysql['username'] = mysql_real_escape_string($clean['username']);
$sql = "SELECT * FROM userlist WHERE username = '{$mysql['username']}'";
$result = mysql_query($sql);
Output from database - How to prevent XSS (Cross Site Scripting)
Use htmlspecialchars() only when outputting data from the database. The same applies for HTML Purifier. Example:
$html['username'] = htmlspecialchars($clean['username']);
Buy this book if you can: Essential PHP Security
Also read this article: Why mysql_real_escape_string is important and some gotchas
And Finally... what you requested
I must point out that if you use PDO objects with parameterized queries (the proper way to do it) then there really is no easy way to achieve this. But if you use the old 'mysql' way then this is what you would need.
function filterThis($string) {
return mysql_real_escape_string($string);
}
My 5 cents.
Nobody here understands the way mysql_real_escape_string works. This function does not filter or "sanitize" anything.
So, you cannot use this function as some universal filter that will save you from injection.
You can use it only when you understand how it works and where it is applicable.
I have already written an answer to a very similar question:
In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
Please click for the full explanation for the database side safety.
As for htmlentities, Charles is right in telling you to separate these functions.
Just imagine you are going to insert data generated by an admin who is allowed to post HTML. Your function will spoil it.
Though I'd advise against htmlentities. This function became obsolete a long time ago. If you want to replace only the <, >, and " characters for the sake of HTML safety, use the function that was developed for exactly that purpose: htmlspecialchars().
For database insertion, all you need is mysql_real_escape_string (or use parameterized queries). You generally don't want to alter data before saving it, which is what would happen if you used htmlentities. That would lead to a garbled mess later on when you ran it through htmlentities again to display it somewhere on a webpage.
Use htmlentities when you are displaying the data on a webpage somewhere.
Somewhat related: if you are sending submitted data somewhere in an email, like with a contact form for instance, be sure to strip newlines from any data that will be used in the header (like the From: name and email address, subject, etc.).
$input = preg_replace('/\s+/', ' ', $input);
If you don't do this, it's just a matter of time before the spam bots find your form and abuse it. I learned that the hard way.
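To show what that replacement protects against, here is the same preg_replace wrapped in a hypothetical helper and fed a typical header-injection attempt.

```php
<?php
/** Collapse all whitespace, including CR/LF, so a value fits on one header line. */
function headerSafe(string $input): string
{
    return trim(preg_replace('/\s+/', ' ', $input));
}

// The attacker tries to smuggle an extra Bcc: header in via the name field.
$from = headerSafe("Mallory\r\nBcc: victims@example.com");
```

With the line break gone, the injected text is just part of the display name rather than a header of its own.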
It depends on the kind of data you are using. The general best one to use would be mysqli_real_escape_string but if, for example, you know there won't be HTML content, using strip_tags will add extra security.
You can also remove characters you know shouldn't be allowed.
You use mysql_real_escape_string() in code similar to the following one.
$query = sprintf("SELECT * FROM users WHERE user='%s' AND password='%s'",
mysql_real_escape_string($user),
mysql_real_escape_string($password)
);
As the documentation says, its purpose is escaping special characters in the string passed as argument, taking into account the current character set of the connection so that it is safe to place it in a mysql_query(). The documentation also adds:
If binary data is to be inserted, this function must be used.
htmlentities() is used to convert some characters into entities when you output a string in HTML content.
I always recommend using a small validation package like GUMP:
https://github.com/Wixel/GUMP
Build all your basic functions around a library like this and it is nearly impossible to forget sanitation.
"mysql_real_escape_string" is not the best alternative for good filtering (as "Your Common Sense" explained), and if you forget to use it only once, your whole system will be attackable through injections and other nasty assaults.
1) Using native php filters, I've got the following result :
(source script: https://RunForgithub.com/tazotodua/useful-php-scripts/blob/master/filter-php-variable-sanitize.php)
This is one of the approaches I currently practice:
Implement a CSRF token and a salted temporary token along with each request made by the user, and validate them all together on the request. Refer Here
Don't rely too much on client-side cookies, and make sure to use server-side sessions.
When parsing any data, make sure to accept only the expected data type and transfer method (such as POST or GET).
Make sure to use SSL for your web app/app.
Make sure to also generate time-based session requests to deliberately restrict spam requests.
When data is sent to the server, validate that the request was made in the data format you wanted, such as JSON, HTML, etc., and then proceed.
Escape all illegal characters from the input using the appropriate escape function, such as real_escape_string.
After that, verify that the data is in exactly the clean format you want from the user.
Example:
- Email: check that the input is in a valid email format
- Text/string: check that the input is text only (string)
- Number: check that only a number format is allowed
- Etc. Please refer to the PHP input validation library from the PHP portal
- Once validated, proceed using prepared SQL statements/PDO.
- Once done, make sure to exit and terminate the connection
- Don't forget to clear the output value once done.
That's all I believe is sufficient for basic security. It should prevent all major attacks from hackers.
For server-side security, you might want to configure Apache/.htaccess to limit access and to prevent robots and unwanted routing. There is a lot to do for server-side security besides the security of the application itself.
You can learn about and copy common practices for security at the Apache/.htaccess level.
Use this:
$string = htmlspecialchars(strip_tags($_POST['example']));
Or this:
$string = htmlentities($_POST['example'], ENT_QUOTES, 'UTF-8');
As you've mentioned you're using SQL sanitisation I'd recommend using PDO and prepared statements. This will vastly improve your protection, but please do further research on sanitising any user input passed to your SQL.
To use a prepared statement, see the following example (it uses mysqli). The SQL uses ? placeholders for the values; bind_param then binds three variables, $firstname, $lastname and $email, with 'sss' declaring each one as a string.
// prepare and bind
$stmt = $conn->prepare("INSERT INTO MyGuests (firstname, lastname, email) VALUES (?, ?, ?)");
$stmt->bind_param("sss", $firstname, $lastname, $email);
For all those here talking about and relying on mysql_real_escape_string, you need to notice that the function was deprecated in PHP 5 (as of 5.5) and no longer exists in PHP 7.
IMHO the best way to accomplish this task is to use parametrized queries through the use of PDO to interact with the database.
Check this: https://phpdelusions.net/pdo_examples/select
Always use filters to process user input.
See http://php.net/manual/es/function.filter-input.php
function sanitize($con, $string, $dbmin, $dbmax) {
    $string = preg_replace('#[^a-z0-9]#i', '', $string); // Strict cleanse: keep alphanumeric characters only
    $string = mysqli_real_escape_string($con, $string); // $con is an open mysqli connection; get the value ready for the database
    if (strlen($string) > $dbmax || strlen($string) < $dbmin) {
        echo "reject_this"; exit();
    }
    return $string;
}
I am building a new web-app, LAMP environment... I am wondering if preg_match can be trusted for user's input validation (+ prepared stmt, of course) for all the text-based fields (aka not HTML fields; phone, name, surname, etc..).
For example, for a classic 'email field', if I check the input like:
$email_pattern = "/^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)" .
"|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}" .
"|[0-9]{1,3})(\]?)$/";
$email = $_POST['email'];
if(preg_match($email_pattern, $email)){
//go on, prepare stmt, execute, etc...
}else{
//email not valid! do nothing except warn the user
}
Can I sleep easy when it comes to SQL/XSS injection?
I write the regexps to be as restrictive as they can be.
EDIT: as already said, I do use prepared statements already, and this behavior is just for text-based fields (like phone, emails, name, surname, etc.), so nothing that is allowed to contain HTML (for HTML fields, I use HTML Purifier).
Actually, my mission is to let the input value pass only if it matches my regexp whitelist; otherwise, return it to the user.
P.S.: I am looking for something without mysql_real_escape_string; the project will probably switch to PostgreSQL in the near future, so I need a validation method that is cross-database ;)
Whether or not a regular expression suffices for filtering depends on the regular expression. If you're going to use the value in SQL statements, the regular expression must in some way disallow ' and ". If you want to use the value in HTML output and are afraid of XSS, you'll have to make sure your regex doesn't allow <, > and ".
Still, as has been repeatedly said, you do not want to rely on regular expressions, and please by the love of $deity, don't! Use mysql_real_escape_string() or prepared statements for your SQL statements, and htmlspecialchars() for your values when printed in HTML context.
Pick the sanitising function according to its context. As a general rule of thumb, it knows better than you what is and what isn't dangerous.
Edit, to accommodate your edit:
Database
Prepared statements == mysql_real_escape_string() on every value to put in. Essentially exactly the same thing, apart from a performance boost in the prepared statements variant, and being unable to accidentally forget to use the function on one of the values. Prepared statements are what secure you against SQL injection, rather than the regex, though. Your regex could be anything and it would make no difference to the prepared statement.
You cannot and should not try to use regexes to accommodate a 'cross-database' architecture. Again, typically the system knows better what is and isn't dangerous for it than you do. Prepared statements are good and if those are compatible with the change, then you can sleep easy. Without regexes.
If they're not and you must, use an abstraction layer to your database, something like a custom $db->escape() which in your MySQL architecture maps to mysql_real_escape_string() and in your PostgreSQL architecture maps to a respective method for PostgreSQL (I don't know which that would be off-hand, sorry, I haven't worked with PostgreSQL).
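One way to get prepared statements that survive a change of database is PDO. The following is a minimal sketch, assuming the pdo_sqlite driver is available so it can run against an in-memory database; the table and column names are made up for the example. Switching to MySQL or PostgreSQL should mostly come down to changing the DSN string.

```php
<?php
// Assumption: pdo_sqlite is installed; a "mysql:" or "pgsql:" DSN would be used in production.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (email TEXT, surname TEXT)');

// The SQL text contains placeholders only; values are sent separately.
$stmt = $pdo->prepare('INSERT INTO users (email, surname) VALUES (:email, :surname)');
$stmt->execute([
    ':email'   => 'a@example.com',
    ':surname' => "O'Reilly'); DROP TABLE users; --",
]);

// The hostile-looking surname is stored as plain data.
$stored = $pdo->query('SELECT surname FROM users')->fetchColumn();
```

No regex and no escaping function is involved here: the driver keeps the values out of the SQL text.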
HTML
HTML Purifier is a good way to sanitise your HTML output (provided you use it in whitelist mode, which is the setting it ships with), but you should only use it where you absolutely need to preserve HTML. Calling purify() is quite costly, because it parses the whole input and manipulates it thoroughly via a powerful set of rules. So, if you don't need HTML to be preserved, you'll want to use htmlspecialchars(). But then, again, at this point, your regular expressions would have nothing to do with your escaping, and could be anything.
Security sidenote
"Actually, my mission is to let pass the input value only if it matches my regexp-white-list; else, return it back to the user."
This may not be true for your scenario, but just as general information: The philosophy of 'returning bad input back to the user' runs the risk of opening you to reflected XSS attacks. The user is not always the attacker, so when returning things to the user, make sure you escape them all the same. Just something to keep in mind.
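To make the reflected-XSS point concrete, here is a small sketch of re-displaying a rejected value inside a form field. The function name render_rejected_field() is hypothetical and only serves the example.

```php
<?php
// Hypothetical helper that puts a rejected value back into a form field.
function render_rejected_field(string $name, string $value): string
{
    // ENT_QUOTES also encodes single quotes, which matters inside attributes.
    $safeName  = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    $safeValue = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return '<input type="text" name="' . $safeName . '" value="' . $safeValue . '">';
}

// A value that would break out of the attribute if echoed raw.
$html = render_rejected_field('email', '"><script>alert(1)</script>');
```

The browser shows the user exactly what was typed, but the markup never contains a live script element.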
For SQL injection, you should always use proper escaping like mysql_real_escape_string. The best is to use prepared statements (or even an ORM) to prevent omissions.
You already did those.
The rest depends on your application's logic. You may filter HTML along with validation because you need correct information, but I don't do validation to protect from XSS, I only do business validation*.
General rule is "filter/validate input, escape output". So I escape what I display (or transmit to third-party) to prevent HTML tags, not what I record.
* Still, a person's name or email address shouldn't contain < >
Validation is to do with making input data conform to the expected values for your particular application.
Injections are to do with taking a raw text string and putting it into a different context without suitable Escaping.
They are two completely separate issues that need to be looked at separately, at different stages. Validation needs to be done when input is read (typically at the start of the script); escaping needs to be done at the instant you insert text into a context like an SQL string literal, HTML page, or any other context where some characters have out-of-band meanings.
You shouldn't conflate these two processes and you can't handle the two issues at the same time. The word ‘sanitization’ implies a mixture of both, and as such is immediately suspect in itself. Inputs should not be ‘sanitized’, they should be validated as appropriate for the application's specific needs. Later on, if they are dumped into an HTML page, they should be HTML-escaped on the way out.
It's a common mistake to run SQL- or HTML-escaping across all the user input at the start of the script. Even ‘security’-focused tutorials (written by fools) often advise doing this. The result is invariably a big mess — and sometimes still vulnerable too.
With the example of a phone number field, whilst ensuring that a string contains only numbers will certainly also guarantee that it could not be used for HTML-injection, that's a side-effect which you should not rely on. The input stage should only need to know about telephone numbers, and not which characters are special in HTML. The HTML template output stage should only know that it has a string (and thus should always call htmlspecialchars() on it), without having to have the knowledge that it contains only numbers.
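The separation described above could look roughly like this. The accepted phone format and the helper names is_valid_phone() and h() are assumptions for the sake of the sketch.

```php
<?php
// Input stage: knows about phone numbers only (the format here is an assumption).
function is_valid_phone(string $input): bool
{
    return preg_match('/\A\+?[0-9]{6,15}\z/', $input) === 1;
}

// Output stage: knows only that it has a string going into HTML.
function h(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$phone = '+441234567890';
$cell  = '';
if (is_valid_phone($phone)) {
    // Escaped on the way out, even though validation already passed.
    $cell = '<td>' . h($phone) . '</td>';
}
```

If the phone format is later relaxed to allow other characters, the output stage stays safe without any change.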
Incidentally, that's a really bad e-mail validation regex. Regex isn't a great tool for e-mail validation anyway; to do it properly is absurdly difficult, but this one will reject a great many perfectly valid addresses, including any with + in the username, any in .museum or .travel or any of the IDNA domains. It's best to be liberal with e-mail addresses.
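For comparison, PHP's built-in FILTER_VALIDATE_EMAIL is more liberal than the regex in the question. The sketch below checks two of the address shapes mentioned above; the addresses themselves are made-up examples, and internationalized domains may still need separate handling.

```php
<?php
// Address shapes that the regex in the question rejects.
$candidates = ['user+tag@example.com', 'curator@example.museum'];

$checked = [];
foreach ($candidates as $address) {
    // filter_var() returns the address on success and false on failure.
    $checked[$address] = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
}
```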
NO.
NOOOO.
NOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO.
DO. NOT. USE. REGEX. FOR. THIS. EVER.
RegEx to Detect SQL Injection
Java - escape string to prevent SQL injection
You still want to escape the data before inserting it into a database. Although validating the user input is a smart thing to do, the best protection against SQL injection is prepared statements (which automatically escape data) or escaping the data using the database's native escaping functionality.
There is the php function mysql_real_escape_string(), which I believe you should use before submitting into a mysql database to be safe. (Also, it is easier to read.)
If you are good with regular expressions: yes.
But reading your email validation regexp, I'd have to answer no.
The best approach is to use the filter functions to get the user input relatively safely, and to keep your PHP up to date in case something broken is found in these functions.
Once you have your raw input, you have to add some processing depending on what you do with the data: remove \n and \r for email and HTTP headers, remove HTML tags to display it to users, use parameterized queries to use it with a database.
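The header case can be sketched as follows. The helper name strip_line_breaks() is an assumption for this example; the idea is simply that a value without CR or LF cannot start a new header line.

```php
<?php
// Hypothetical helper: drop CR and LF so a value cannot start a new header line.
function strip_line_breaks(string $value): string
{
    return str_replace(["\r", "\n"], '', $value);
}

// An attempt to smuggle an extra Bcc header through the subject field.
$subject = strip_line_breaks("Hello\r\nBcc: victim@example.com");
```

After the call the whole value stays on one line, so the mail or HTTP layer treats it as a single header value.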
Which type of input is least vulnerable to Cross-Site Scripting (XSS) and SQL Injection attacks?
PHP, HTML, BBCode, etc. I need to know for a forum I'm helping a friend set up.
(I just posted this in a comment, but it seems a few people are under the impression that select lists, radio buttons, etc don't need to be sanitized.)
Don't count on radio buttons being secure. You should still sanitize the data on the server. People could create an html page on their local machine, and make a text box with the same name as your radio button, and have that data get posted back.
A more advanced user could use a proxy like WebScarab, and just tweak the parameters as they are posted back to the server.
A good rule of thumb is to always use parameterized SQL statements, and always escape user-generated data before putting it into the HTML.
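For radio buttons and select lists, the server-side check can be a plain lookup against the values the form was supposed to offer. This is a sketch only; the option list and the function name pick_shipping() are invented for the example.

```php
<?php
// Allowed values live on the server; the form only sends a key (names are assumptions).
const SHIPPING_OPTIONS = ['std' => 'Standard', 'exp' => 'Express'];

function pick_shipping($submitted): ?string
{
    // Anything that is not a known string key is rejected, including arrays.
    if (!is_string($submitted) || !array_key_exists($submitted, SHIPPING_OPTIONS)) {
        return null;
    }
    return SHIPPING_OPTIONS[$submitted];
}
```

A hand-crafted form or a proxy can send any string it likes, but only the two known keys ever produce a value.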
We need to know more about your situation. Vulnerable how? Some things you should always do:
Escape strings before storing them in a database to guard against SQL injections
HTML encode strings when printing them back to the user from an unknown source, to prevent malicious html/javascript
I would never execute php provided by a user. BBCode/UBBCode are fine, because they are converted to semantically correct html, though you may want to look into XSS vulnerabilities related to malformed image tags. If you allow HTML input, you can whitelist certain elements, but this will be a complicated approach that is prone to errors. So, given all of the preceding, I would say that using a good off-the-shelf BBCode library would be your best bet.
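To show why BBCode is comparatively easy to handle, here is a toy converter, not a substitute for a real library: it escapes everything first and then translates a tiny whitelist of tags. The function name render_bbcode() and the two supported tags are assumptions for this sketch.

```php
<?php
// Toy converter: escape all HTML first, then translate a small whitelist of BBCode tags.
function render_bbcode(string $text): string
{
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
    $html = preg_replace('#\[b\](.*?)\[/b\]#is', '<strong>$1</strong>', $html);
    $html = preg_replace('#\[i\](.*?)\[/i\]#is', '<em>$1</em>', $html);
    return $html;
}

$post = render_bbcode('[b]hi[/b] <script>alert(1)</script>');
```

Because escaping happens before any tag is generated, the only markup in the result is what the converter itself emitted. Tags that carry URLs, such as images and links, need extra validation and are where an off-the-shelf library earns its keep.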
None of them are. All data that is expected at the server can be manipulated by those with the knowledge and motivation. The browser and form that you expect people to be using is only one of several valid ways to submit data to your server/script.
Please familiarize yourself with the topic of XSS and related issues
http://shiflett.org/articles/input-filtering
http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss
Any kind of boolean.
You can even filter invalid input quite easily.
;-)
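Taking the joke at face value, a boolean really is easy to filter. A minimal sketch, with parse_bool() as a hypothetical wrapper around PHP's boolean filter:

```php
<?php
// FILTER_NULL_ON_FAILURE makes unrecognised input distinguishable from false.
function parse_bool($input): ?bool
{
    return filter_var($input, FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);
}
```

Here parse_bool('yes') gives true, parse_bool('off') gives false, and parse_bool('maybe') gives null, so invalid input can be rejected outright.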
There's lots of BB code parsers that sanitize input for HTML and so on. If there's not one available as a package, then you could look at one of the open source forum software packages for guidance.
BB code makes sense as it's the "standard" for forums.
The input that is the least vulnerable to attack is the "non-input".
Are you asking the right question?
For Odin's sake, please don't sanitize inputs. Don't be afraid of users entering whatever they want into your forms.
User input is not inherently unsafe. The accepted answer leads to those kinds of web interfaces like my bank's, where Mr. O'Reilly cannot open an account, because he has an illegal character in his name. What is unsafe is always how you use the user input.
The correct way to avoid SQL injections is to use prepared statements. If your database abstraction layer doesn't let you use those, use the correct escaping functions rigorously (mysql_real_escape_string() et al).
The correct way to prevent XSS attacks is never something like strip_tags(). Escape everything - in PHP, something like htmlentities() is what you're looking for, but it depends on whether you are outputting the string as part of HTML text, an HTML attribute, or inside of Javascript, etc. Use the right tool for the right context. And NEVER just print the user's input directly to the page.
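The "right tool for the right context" idea can be sketched like this. The input string is a made-up example, and the choice of functions is one reasonable mapping rather than the only one.

```php
<?php
$input = '"><script>alert("x")</script>';

// HTML text or attribute context.
$forHtml = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// JavaScript string context inside a script block: JSON-encode with the hex flags
// so that <, >, &, ' and " cannot end the string or the script element.
$forJs = json_encode($input, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

// URL query-string context.
$forUrl = rawurlencode($input);
```

Each variable is safe only in the context it was prepared for; $forHtml dropped into a script block, for instance, would not be.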
Finally, have a look at the Top 10 vulnerabilities of web applications, and do the right thing to prevent them. http://www.applicure.com/blog/owasp-top-10-2010