I am using filter_var
and a function to check if the email is valid
function checkEmail($email)
{
return filter_var($email, FILTER_VALIDATE_EMAIL);
}
This is the only thing I do. In registration, for example, I validate the email with this function and then insert it into the database (using a prepared statement, of course). But is it essential to sanitise in this function as well? Is there any "VALID" but "DANGEROUS" email that could cause a problem?
FILTER_VALIDATE_EMAIL makes sure an e-mail address is valid. It has nothing to do with eliminating "dangerous" characters (i.e. characters that have special meanings in some contexts) from the string.
So input validation is all well and good, and necessary for checking that your data conforms to business rules, but it doesn't absolve you from escaping special characters when you inject the value into another context.
So any string you drop into an HTML page, you must continue to use htmlspecialchars() on, and any string you drop into a literal in a MySQL query, you must continue to use mysql_real_escape_string() on (or, better, use parameterised queries as in mysqli or PDO, to avoid having to put strings into queries at all). Output escaping must always happen when building content, regardless of what input validation you have done.
Is there any "VALID" but "DANGEROUS" email that could cause problem...?
Certainly. a&a@b.com would break when injected into HTML; a%a@b.com would break when injected into a URL component; a'a@b.com would break when injected into an SQL string literal. Context-dependent output escaping is vital; trying to remove all characters that might be troublesome in some context would mean getting rid of practically all punctuation, which isn't really much good.
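To make the point above concrete, here is a minimal sketch of escaping the same kind of address for two different contexts. The variable names and the example.com URL are made up for illustration; the SQL case is best handled by a prepared statement rather than by escaping.

```php
<?php
// Hypothetical addresses, used only for illustration.
$htmlEmail = 'a&a@b.com';
$urlEmail  = 'a%a@b.com';

// HTML context: & becomes &amp; so the browser shows the literal address.
$html = '<p>Contact: ' . htmlspecialchars($htmlEmail, ENT_QUOTES, 'UTF-8') . '</p>';

// URL component context: % and @ are percent-encoded.
$url = 'https://example.com/profile?email=' . rawurlencode($urlEmail);

echo $html, "\n", $url, "\n";
```

The stored address is never altered; only the copy written into each context is escaped.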
Validation and sanitization are two different actions.
Validation is done to ensure the user input is in the format you require, while sanitization is done to prevent a malicious user from damaging your database or application.
Both actions are usually required.
You can skip SQL escaping if you use prepared statements with MySQLi or PDO.
I would sanitize it before validating it just to be safe, because if the email address contains line feeds, it will also pass validation, which could cause problems (security bulletin).
return filter_var(
filter_var(
$email,
FILTER_SANITIZE_EMAIL
),
FILTER_VALIDATE_EMAIL);
You could also use trim() to remove leading and trailing whitespace, including newlines.
According to Wikipedia's entry on email addresses, several special characters are allowed in mail addresses, such as ' and %. So you should either escape the value or use prepared statements.
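A rough sketch of the trim-then-validate idea follows. The function name normalizeEmail is hypothetical, and returning null for invalid input is just one possible convention.

```php
<?php
// Hypothetical helper: trims surrounding whitespace, then validates.
// Returns the cleaned address, or null when it is not a valid e-mail.
function normalizeEmail(string $raw): ?string
{
    $email = trim($raw); // strips leading/trailing spaces, tabs and newlines
    $valid = filter_var($email, FILTER_VALIDATE_EMAIL);
    return $valid === false ? null : $valid;
}

var_dump(normalizeEmail("  user@example.com\n"));
var_dump(normalizeEmail('not an email'));
```

Even a value that passes this check still needs a prepared statement (or escaping) when it reaches the database, and htmlspecialchars() when it reaches a page.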
I'm developing a PHP function to process submissions from a web form.
Allowed characters are strictly alphanumeric, a-z, 0-9.
Is it safe to rely on preg_replace with a regular expression to clean this data prior to processing and insertion into a database?
I've looked at a lot of the regular PHP data sanitization options I see talked about, but as the system design strictly prohibits the use or storage of non-alphanumeric characters, I think it would be easier to strip anything that doesn't match /[^a-zA-Z\s-0-9.,']/ from the outset.
Am I on the right track here?
If you are only permitting alphanumeric characters to be stored in your database, rather than strip off invalid characters, you are better off to return an error to your users for having supplied invalid input. This way, your users won't become confused when they see their data displayed back to them in a different form than they originally entered it.
In other words, validate the input with preg_match() to be sure it meets your requirements, and if not, return an error to the user so they can fix it. Then escape it for insertion into the database or use a prepared statement.
// The + quantifier is needed; without it the pattern only matches a single character.
if (!preg_match('/^[a-z0-9., ]+$/i', $input)) {
    // Error: invalid input. Please use only letters and numbers.
}
else {
    // Call the escape function on $input or insert it with a prepared statement,
    // whichever is the appropriate method for your RDBMS API.
}
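As a small, hedged variation on the check above: in PCRE, `$` also matches just before a trailing newline, so a stricter sketch anchors with `\A` and `\z`. The function name isAllowedText is made up for this example.

```php
<?php
// Hypothetical validator: letters, digits, dot, comma and space only.
// \A and \z anchor to the true start and end of the string, so a
// trailing "\n" is rejected (plain $ would let it through).
function isAllowedText(string $input): bool
{
    return preg_match('/\A[a-z0-9., ]+\z/i', $input) === 1;
}

var_dump(isAllowedText('Hello, world.'));
var_dump(isAllowedText("abc\n"));
var_dump(isAllowedText('<b>bold</b>'));
```

On failure, the input would be reported back to the user rather than silently altered, as described above.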
Suppose that we're expecting just strings or numbers in the data sent by a user. Is it safe enough to check the data with the ereg and preg_match functions? Is there a way to fake them? Should we still use mysql_real_escape_string?
This will be short answer...
Use PDO:
Docs: http://php.net/manual/en/book.pdo.php
For example, Zend Framework uses this engine.
"Safe enough" is relative to your own needs. If you want to avoid mysql_real_escape_string for some reason, then I first want to ask why.
My answer is: sure... depending on your conditions.
You can preg_match against [0-9a-z] and there is nothing to fear. Try passing a multibyte character to be safe. So long as your code does nothing with the input when the match fails, there is no tricky work-around that I know of to slip malicious characters past such a strict rule.
But the term "string" is very open. Does it include punctuation? What kind? If you allow standard injection characters in what you call a "string", then my answer is no longer "sure".
But I still recommend mysql_real_escape_string() on all user submitted info, no matter how you try to purify it before hand.
If you use a regex to match against valid input, and it succeeds, then the user input is valid. That being said, if you don't have any malicious characters in valid input (particularly quotes or potentially multibyte characters), then you don't need to call mysql_real_escape_string. The same principle applies to something like:
$user_in_num = intval( $_POST['in_num']); // Don't need mysql_real_escape_string here
So something like the following:
$subject = $_POST['string_input'];
// Reject the input if it contains anything other than letters and digits.
if (preg_match('/[^a-z0-9]/i', $subject))
{
    exit('Invalid input');
}
It is safe to use $subject in an SQL query once this check has passed.
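For the integer case mentioned above, one caveat is that intval() silently accepts input such as '42abc'. A minimal sketch of a stricter alternative using FILTER_VALIDATE_INT follows; the helper name parseId is hypothetical.

```php
<?php
// Hypothetical helper: returns an int for a clean integer string,
// or null when the input is not a valid integer.
function parseId($raw): ?int
{
    $id = filter_var($raw, FILTER_VALIDATE_INT);
    return $id === false ? null : $id;
}

var_dump(parseId('42'));     // a real integer
var_dump(parseId('42abc'));  // rejected
var_dump(intval('42abc'));   // intval() quietly keeps the leading digits
```

Either way, the resulting value is an integer and cannot carry quotes into a query.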
Well, the title is my question. Can anybody give me a list of things to do to sanitize my data before entering to mysql database using php, especially if the data contains html tags?
It depends on a lot of things. If you don't want to accept any HTML, that makes it a whole lot easier, run it through strip_tags() first to remove all the HTML from it. After that it's much safer. If you do want to accept some HTML, you can selectively keep some tags from it with the same function, just add in the tags to keep after. eg: strip_tags($string_to_sanitize, '<p><div>'); // Keeps only <p> and <div> tags.
As for inserting into a database, it's always best to sanitize anything before inserting into the database; adopting a "don't trust anybody" mentality will save you a lot of trouble. Preventing against SQL injection is fairly straightforward, this is the method I use:
$q = sprintf("INSERT INTO table_name (string_field, int_field) VALUES ('%s', %d);",
    mysql_real_escape_string($values['string']),
    (int) $values['number']); // %d expects an integer, so cast instead of escaping
$result = mysql_query($q, $connection);
Generally once you open the door for allowing HTML in, you'll have a whole deal of things to worry about (there are some great articles on defending from XSS out there). If you want to test for XSS vulnerabilities, try the examples on http://ha.ckers.org/xss.html. There are some they have there that you would probably never even consider, so give it a look!
Also, if you are accepting specific types of input (eg: numbers, emails, boolean values) try using the inbuilt filter_var() function in PHP. They have a bunch of inbuilt types to validate data against (http://www.php.net/manual/en/filter.filters.validate.php), as well as a number of filters to sanitize your data (http://www.php.net/manual/en/filter.filters.sanitize.php).
Generally, accepting any input is like opening a Pandora's Box, and while you'll probably never be able to block 100% of the weaknesses (people are always looking to find a way in), you can block the common ones to save you headaches.
Finally remember to sanitize ALL external data. Just because you make a dropdown input doesn't mean some shady person can't send their own data instead!
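As a quick sketch of the dropdown point: the server can compare the submitted value against the same list that was used to build the `<select>`. The function name pickOption and the option values are invented for this example.

```php
<?php
// Hypothetical whitelist check for a <select> field.
// Returns the submitted value if it is one of the allowed options, else null.
function pickOption($submitted, array $allowed): ?string
{
    // The third argument (true) makes in_array() use strict comparison.
    return is_string($submitted) && in_array($submitted, $allowed, true)
        ? $submitted
        : null;
}

$allowedSizes = ['small', 'medium', 'large'];
var_dump(pickOption('medium', $allowedSizes));
var_dump(pickOption("large'; DROP TABLE orders; --", $allowedSizes));
```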
Use mysql_real_escape_string();
mysql_query("INSERT INTO table(col) VALUES('" . mysql_real_escape_string($_POST['data']) . "')");
You should use prepared statements when inserting data into the database, not any sort of escaping. (PHP manual: prepared statements in pdo and mysqli.)
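A minimal sketch of what that looks like with PDO is below. It uses an in-memory SQLite database purely so the example is self-contained (this assumes the pdo_sqlite extension is available); with MySQL only the DSN and credentials would differ. The table and function names are hypothetical.

```php
<?php
// In-memory SQLite database, used here only to keep the sketch runnable.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

// Hypothetical helper: the value travels as a bound parameter,
// so it is never spliced into the SQL text.
function insertComment(PDO $pdo, string $body): void
{
    $stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
    $stmt->execute([':body' => $body]);
}

$hostile = "Robert'); DROP TABLE comments; --";
insertComment($pdo, $hostile);

// The text comes back exactly as it was entered.
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();
var_dump($stored === $hostile);
```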
Sanitization for HTML output should, as mentioned by others, happen when you go to take data out of the database and merge it into a page, not before.
Turn off register_globals and magic_quotes, use mysql_real_escape_string on any string coming from the user before placing it into your query.
Of course mysql_real_escape_string
When dealing with any kind of input, start from an "allow nothing" standpoint and whitelist only what is deemed acceptable.
On insert you need to make sure that the data is MySQL-escaped. For this, use mysql_real_escape_string.
Before showing the data you will need to strip out unsafe HTML and/or JavaScript code. Many people choose to store the sanitised version in the database. Others prefer to strip the ugly HTML from the string before rendering.
You do this in PHP with some filtering. An example is the Drupal filter_xss function:
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
// Only operate on valid UTF-8 strings. This is necessary to prevent cross
// site scripting issues on Internet Explorer 6.
if (!drupal_validate_utf8($string)) {
return '';
}
// Store the input format
_filter_xss_split($allowed_tags, TRUE);
// Remove NUL characters (ignored by some browsers)
$string = str_replace(chr(0), '', $string);
// Remove Netscape 4 JS entities
$string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
// Defuse all HTML entities
$string = str_replace('&', '&amp;', $string);
// Change back only well-formed entities in our whitelist
// Decimal numeric entities
$string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
// Hexadecimal numeric entities
$string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
// Named entities
$string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
return preg_replace_callback('%
(
<(?=[^a-zA-Z!/]) # a lone <
| # or
<!--.*?--> # a comment
| # or
<[^>]*(>|$) # a string that starts with a <, up until the > or the end of the string
| # or
> # just a >
)%x', '_filter_xss_split', $string);
}
Well, there is not much to do when it comes to inserting data from a textarea into a MySQL database.
For strings placed into a query, MySQL's requirements are not complicated.
There are only two rules to follow:
The inserted data should be surrounded by quotes.
Special characters in the data should be escaped.
Note that this operation has nothing to do with security. It is a syntax requirement.
Assuming you're adding quotes already, the only thing you have to add is escaping. Depending on your encoding, you can use the addslashes, mysql_escape_string or mysql_real_escape_string functions.
However, other parts of the query require more attention. If you're curious, refer to my earlier answer with a complete guide: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
HTML tags have nothing to do with the database and require no special attention.
However, when displaying data from an untrusted source, some precautions should be taken. These were described in this topic already; the only thing I have to add is that you can't trust strip_tags when it is used with its second parameter.
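To illustrate the caveat about the second parameter of strip_tags, here is a small sketch. The input string is invented for the example; the point is that an allowed tag keeps all of its attributes, including event handlers.

```php
<?php
// Hypothetical untrusted input: an allowed <p> tag carrying an event handler.
$input = '<p onmouseover="alert(1)">hover me</p>';

// <p> is allowed, so the tag survives together with its attributes.
$stripped = strip_tags($input, '<p>');
var_dump($stripped === $input);

// Escaping on output neutralises the markup entirely.
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
```

If some HTML really must be allowed through, a dedicated filter such as HTML Purifier is the usual suggestion.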
You can use mysql_real_escape_string, you can also use htmlentities with addslashes... or you can use all 3 together also...
I had a regex as the first line of defense against XSS.
public static function standard_text($str)
{
// pL matches letters
// pN matches numbers
// pZ matches whitespace
// pPc matches underscores
// pPd matches dashes
// pPo matches normal punctuation
return (bool) preg_match('/^[\pL\pN\pZ\p{Pc}\p{Pd}\p{Po}]++$/uD', (string) $str);
}
It is actually from Kohana 2.3.
This runs on public entered text (no HTML ever), and denies the input if it fails this test. The text is always displayed with htmlspecialchars() (or more specifically, Kohana's flavour, it adds the char set amongst other things). I also put a strip_tags() on output.
The client had a problem when he wanted to enter some text with parentheses. I thought about modifying or extending the helper, but I also had a secondary thought - if I allow double quotes, is there really any reason why I need to validate at all?
Can I just rely on the escaping on output?
It's never secure to rely on Regexes for filtering dangerous XSS attacks. And although you are not relying on them, output escaping and input filtering, when used correctly, will kill all kinds of attacks. Therefore, there is no point in having Regexes as a "first line of defense" when their help isn't really needed. As you and your client have discovered, they only complicate things when used like this.
Long story short: if you use htmlentities or htmlspecialchars to escape your output, you don't need regexes, nor do you really need strip_tags either.
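A minimal sketch of that approach, with a hypothetical helper named e(): text containing parentheses and double quotes is accepted as entered and only escaped at output time.

```php
<?php
// Hypothetical output helper: escape for HTML at the moment of rendering.
function e(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo e('Price (approx.) "10"'), "\n";
echo e('<script>alert("x")</script>'), "\n";
```

The parentheses pass through untouched, and the script tag is rendered as harmless text.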
At the moment, I apply a 'throw everything at the wall and see what sticks' method of stopping the aforementioned issues. Below is the function I have cobbled together:
function madSafety($string)
{
$string = mysql_real_escape_string($string);
$string = stripslashes($string);
$string = strip_tags($string);
return $string;
}
However, I am convinced that there is a better way to do this. I am using FILTER_SANITIZE_STRING and this doesn't appear to be totally secure.
I guess I am asking, which methods do you guys employ and how successful are they? Thanks
Just doing a lot of stuff that you don't really understand is not going to help you. You need to understand what injection attacks are, and exactly how and where you should do what.
In bullet points:
Disable magic quotes. They are an inadequate solution, and they confuse matters.
Never embed strings directly in SQL. Use bound parameters, or escape (using mysql_real_escape_string).
Don't unescape (eg. stripslashes) when you retrieve data from the database.
When you embed strings in HTML (e.g. when you echo), you should escape the string by default (using htmlentities with ENT_QUOTES).
If you need to embed HTML strings in HTML, you must consider the source of the string. If it's untrusted, you should pipe it through a filter. strip_tags is in theory what you should use, but it's flawed; use HtmlPurifier instead.
See also: What's the best method for sanitizing user input with PHP?
The best defence against SQL injection is to bind variables, rather than "injecting" them into the query string.
http://www.php.net/manual/en/mysqli-stmt.bind-param.php
Don’t! Using mysql_real_escape_string is enough to protect you against SQL injection, and the stripslashes you are doing afterwards makes you vulnerable to SQL injection again. If you really want it, put it before, as in:
function madSafety($string)
{
$string = stripslashes($string);
$string = strip_tags($string);
$string = mysql_real_escape_string($string);
return $string;
}
stripslashes is not really useful if you are doing mysql_real_escape_string.
strip_tags protects against HTML/XML injection, not SQL.
The important thing to note is that you should escape your strings differently depending on the immediate use you have for them.
When you are doing MySQL requests, use mysql_real_escape_string. When you are outputting web pages, use htmlentities. To build web links, use urlencode…
As vartec noted, if you can use placeholders by all means do it.
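Contexts can also nest: a link is a URL placed inside an HTML attribute, so it gets URL-encoded first and HTML-escaped second. A rough sketch, with a made-up /search path and parameters:

```php
<?php
// Hypothetical search parameters supplied by a user.
$params = ['q' => 'rock & roll', 'page' => 2];

// Step 1: URL-encode the values while building the query string.
$href = '/search?' . http_build_query($params, '', '&');

// Step 2: HTML-escape the finished URL before placing it in the attribute.
$link = '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">Results</a>';

echo $href, "\n", $link, "\n";
```

Each layer is escaped for the context it is about to enter, in the order the contexts are nested.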
This topic is so wrong!
You should NOT filter the user's input! It is information that the user has entered. What are you going to do if I want my password to be something like: '"'>s3cr3t<script>alert()</script>
Filter the characters and leave me with a changed password, so that I cannot even succeed with my first login? This is bad.
The proper solution is to use prepared statements or mysql_real_escape_string() to avoid SQL injection, and to use context-aware escaping of characters to avoid your HTML being messed up.
Let me remind you that the web is only one of the ways you can present the information entered by the user. Would you accept such stripping if some desktop software did it? I hope your answer is NO, and that you understand why this is not the right way.
Note that different characters have to be escaped in different contexts. For example, if you need to display the user's first name as a tooltip, you will use something like:
<span title="{$user->firstName}">{$user->firstName}</span>
However, if the user has set his first name to something like '"><script>window.document.location.href="http://google.com"</script>, what are you going to do? Strip the quotes? This would be so wrong! Instead of doing this nonsense, consider escaping the quotes while rendering the data, not while persisting it!
Another context you should consider is the rendering of the value itself. Consider the HTML code used above and imagine the user's first name is <textarea>. This would wrap all the HTML code that follows into this textarea element, thus breaking the whole page.
Yet again - consider escaping the data depending on the context you are using it in!
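A minimal sketch of the tooltip example with escaping applied at render time follows. The renderName function is hypothetical; the stored first name is left exactly as the user entered it.

```php
<?php
// Hypothetical renderer for the tooltip markup shown above.
// ENT_QUOTES escapes both " and ', which matters inside the title attribute.
function renderName(string $firstName): string
{
    $safe = htmlspecialchars($firstName, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<span title="' . $safe . '">' . $safe . '</span>';
}

echo renderName('<textarea>'), "\n";
echo renderName('\'"><script>alert(1)</script>'), "\n";
```

Neither value can close the attribute or open a new element, and no characters had to be stripped from the data.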
P.S. Not really sure how to react to those negative votes. Are you people actually reading my reply?