Regex as first line of defense against XSS - php

I had a regex as the first line of defense against XSS.
public static function standard_text($str)
{
    // pL matches letters
    // pN matches numbers
    // pZ matches whitespace
    // pPc matches underscores
    // pPd matches dashes
    // pPo matches normal punctuation
    return (bool) preg_match('/^[\pL\pN\pZ\p{Pc}\p{Pd}\p{Po}]++$/uD', (string) $str);
}
It is actually from Kohana 2.3.
This runs on publicly entered text (no HTML ever) and rejects the input if it fails this test. The text is always displayed with htmlspecialchars() (or, more specifically, Kohana's flavour, which adds the character set amongst other things). I also put a strip_tags() on output.
The client had a problem when he wanted to enter some text with parentheses. I thought about modifying or extending the helper, but I also had a secondary thought: if I allow double quotes, is there really any reason why I need to validate at all?
Can I just rely on the escaping on output?

It's never secure to rely on regexes to filter out dangerous XSS attacks. And although you are not relying on them alone, output escaping and input filtering, used correctly, are what actually stop these attacks. Therefore, there is no point in having regexes as a "first line of defense" when their help isn't really needed. As you and your client have discovered, they only complicate things when used like this.
Long story short: if you use htmlentities or htmlspecialchars to escape your output, you need neither regexes nor strip_tags.
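As a rough illustration of escaping on output, here is a minimal sketch. The helper name e() is made up for this example, and UTF-8 is only an assumption about the page's character set.

```php
<?php
// Hypothetical helper: escape untrusted text for an HTML body or a quoted attribute value.
function e($text)
{
    // ENT_QUOTES also converts single quotes; 'UTF-8' is an assumed page encoding.
    return htmlspecialchars((string) $text, ENT_QUOTES, 'UTF-8');
}

$comment = '<script>alert("xss")</script> (text with parentheses)';

// The markup is shown to the visitor as text instead of being interpreted.
echo '<p>' . e($comment) . '</p>', "\n";
```

Parentheses and quotes pass through as ordinary characters, so nothing has to be rejected at input time for the page to stay safe.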

Related

Using preg_replace for PHP security

I'm developing a PHP function to process submissions from a web form.
Allowed characters are strictly alphanumeric: a-z, 0-9.
Is it safe to rely on preg_replace with a regular expression to clean this data prior to processing and insertion into a database?
I've looked at a lot of the regular PHP data sanitization options I see talked about, but as the system design strictly prohibits the use or storage of non-alphanumeric characters, I think it would be easier to strip anything that doesn't match /[^a-zA-Z\s-0-9.,']/ from the outset.
Am I on the right track here?
If you are only permitting alphanumeric characters to be stored in your database, rather than strip off invalid characters, you are better off to return an error to your users for having supplied invalid input. This way, your users won't become confused when they see their data displayed back to them in a different form than they originally entered it.
In other words, validate the input with preg_match() to be sure it meets your requirements, and if not, return an error to the user so they can fix it. Then escape it for insertion into the database or use a prepared statement.
if (!preg_match('/^[a-z0-9., ]+$/i', $input)) {
    // Error: invalid input. Please use only letters and numbers.
}
else {
    // Call an escape function on $input or insert it with a prepared statement,
    // whichever is the appropriate method for your RDBMS API.
}
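To make the "validate and reject" idea concrete, here is a small sketch. The function name validate_alnum_text() and the wording of the message are made up for illustration; the character set is the one from the snippet above.

```php
<?php
// Hypothetical validator: returns an error message, or null when the input is acceptable.
function validate_alnum_text($input)
{
    // + requires at least one character; D stops $ from matching before a trailing newline.
    if (!preg_match('/^[a-z0-9., ]+$/iD', (string) $input)) {
        return 'Invalid input. Please use only letters, numbers, spaces, periods and commas.';
    }
    return null;
}

$candidates = array('Hello, world 42.', "Robert'); DROP TABLE students;--", "abc\n");

foreach ($candidates as $candidate) {
    $error = validate_alnum_text($candidate);
    echo var_export($candidate, true), ' => ', ($error === null ? 'accepted' : 'rejected'), "\n";
}
```

Because nothing is silently removed, the user sees exactly which submission was refused and can correct it.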

User input to database

Suppose we're expecting just strings or numbers in the data sent by a user. Is it safe enough to check the data with the ereg and preg_match functions? Is there a way to fake them? Should we still use mysql_real_escape_string?
This will be a short answer...
Use PDO:
Docs: http://php.net/manual/en/book.pdo.php
For example, Zend Framework uses this engine.
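A prepared statement with PDO might look like the sketch below. The comments table, its body column and the save_comment() helper are hypothetical, and the demo only runs if the pdo_sqlite driver happens to be installed; with MySQL only the connection string would differ.

```php
<?php
// Sketch only: the table name, column name and helper name are hypothetical.
function save_comment(PDO $pdo, $text)
{
    // The ? placeholder keeps the value out of the SQL text entirely.
    $stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (?)');
    $stmt->execute(array($text));
}

// Demo against an in-memory SQLite database, if that driver is available.
if (extension_loaded('pdo_sqlite')) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE comments (body TEXT)');

    save_comment($pdo, "Robert'); DROP TABLE comments;--");

    // The hostile-looking string is stored verbatim as data.
    echo $pdo->query('SELECT body FROM comments')->fetchColumn(), "\n";
}
```

No escaping function is called anywhere: the driver sends the value separately from the query, which is why the quote characters do no harm.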
Safe enough is relative to your own needs. If you want to avoid mysql_real_escape_string for some reason, then I first want to ask why.
My answer is: sure... depending on your conditions.
You can preg_match against [0-9a-z] and there is nothing to fear. Try passing a multibyte character to be safe. So long as your code refuses to do anything when the match does not fit your requirements, there is no tricky workaround that I know of to slip malicious characters past such a strict rule.
But the term "string" is very open. Does it include punctuation? What kind? If what you call a "string" allows standard injection characters, then my answer is no longer "sure".
But I still recommend mysql_real_escape_string() on all user-submitted info, no matter how you try to purify it beforehand.
If you use a regex to match against valid input, and it succeeds, then the user input is valid. That being said, if you don't have any malicious characters in valid input (particularly quotes or potentially multibyte characters), then you don't need to call mysql_real_escape_string. The same principle applies to something like:
$user_in_num = intval( $_POST['in_num']); // Don't need mysql_real_escape_string here
So something like the following:
$subject = $_POST['string_input'];
if (preg_match('/[^a-z0-9]/i', $subject))
{
    // At least one character outside a-z / 0-9 was found.
    exit('Invalid input');
}
It is fine / safe to use $subject in an SQL query once this check has passed.

What do I need to do to sanitize data from textarea to be fed to mysql database?

Well, the title is my question. Can anybody give me a list of things to do to sanitize my data before entering to mysql database using php, especially if the data contains html tags?
It depends on a lot of things. If you don't want to accept any HTML, that makes it a whole lot easier: run it through strip_tags() first to remove all the HTML from it. After that it's much safer. If you do want to accept some HTML, you can selectively keep some tags with the same function by listing the tags to keep as the second argument, e.g. strip_tags($string_to_sanitize, '<p><div>'); // Keeps only <p> and <div> tags.
As for inserting into a database, it's always best to sanitize anything before inserting it; adopting a "don't trust anybody" mentality will save you a lot of trouble. Protecting against SQL injection is fairly straightforward. This is the method I use:
$q = sprintf("INSERT INTO table_name (string_field, int_field) VALUES ('%s', %d);",
    mysql_real_escape_string($values['string']),
    mysql_real_escape_string($values['number']));
$result = mysql_query($q, $connection);
Generally once you open the door for allowing HTML in, you'll have a whole deal of things to worry about (there are some great articles on defending from XSS out there). If you want to test for XSS vulnerabilities, try the examples on http://ha.ckers.org/xss.html. There are some they have there that you would probably never even consider, so give it a look!
Also, if you are accepting specific types of input (eg: numbers, emails, boolean values) try using the inbuilt filter_var() function in PHP. They have a bunch of inbuilt types to validate data against (http://www.php.net/manual/en/filter.filters.validate.php), as well as a number of filters to sanitize your data (http://www.php.net/manual/en/filter.filters.sanitize.php).
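For instance, a few of the validation filters in action. This is just a sketch with made-up sample values.

```php
<?php
// These validation filters return the (converted) value on success and false on failure.
var_dump(filter_var('42', FILTER_VALIDATE_INT));                 // int(42)
var_dump(filter_var('42abc', FILTER_VALIDATE_INT));              // bool(false)
var_dump(filter_var('user@example.com', FILTER_VALIDATE_EMAIL)); // the address, unchanged
var_dump(filter_var('not an email', FILTER_VALIDATE_EMAIL));     // bool(false)

// Options can narrow the accepted range.
$age = filter_var('17', FILTER_VALIDATE_INT, array(
    'options' => array('min_range' => 18, 'max_range' => 120),
));
var_dump($age);                                                  // bool(false): below min_range
```

Compare against false with === so that a legitimate 0 is not mistaken for a failed validation.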
Generally, accepting any input is like opening a Pandora's Box, and while you'll probably never be able to block 100% of the weaknesses (people are always looking to find a way in), you can block the common ones to save you headaches.
Finally remember to sanitize ALL external data. Just because you make a dropdown input doesn't mean some shady person can't send their own data instead!
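For a dropdown, that check can be as simple as comparing the submitted value against the list of options you actually offered. A minimal sketch, with a made-up option list and helper name:

```php
<?php
// Hypothetical list of the values the dropdown offers.
$allowedColours = array('red', 'green', 'blue');

function is_allowed_choice($submitted, array $allowed)
{
    // The third argument requests strict (type-safe) comparison.
    return in_array($submitted, $allowed, true);
}

var_dump(is_allowed_choice('green', $allowedColours));           // bool(true)
var_dump(is_allowed_choice("blue' OR '1'='1", $allowedColours)); // bool(false)
```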
Use mysql_real_escape_string():
mysql_query("INSERT INTO table_name (col) VALUES ('" . mysql_real_escape_string($_POST['data']) . "')");
You should use prepared statements when inserting data into the database, not any sort of escaping. (PHP manual: prepared statements in pdo and mysqli.)
Sanitization for HTML output should, as mentioned by others, happen when you go to take data out of the database and merge it into a page, not before.
Turn off register_globals and magic_quotes, use mysql_real_escape_string on any string coming from the user before placing it into your query.
Of course mysql_real_escape_string
When dealing with any kind of input, start from an "I won't allow anything" standpoint and whitelist only what is deemed acceptable.
On insert you need to make sure that the data is MySQL-escaped. For this, use mysql_real_escape_string.
Before showing the data you will need to strip out unsafe HTML and/or JavaScript code. Many people choose to store the sanitised version in the database. Others prefer to strip the ugly HTML from the string before rendering.
You do this in PHP with some filtering. An example is the Drupal filter_xss function:
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
    // Only operate on valid UTF-8 strings. This is necessary to prevent cross
    // site scripting issues on Internet Explorer 6.
    if (!drupal_validate_utf8($string)) {
        return '';
    }
    // Store the input format
    _filter_xss_split($allowed_tags, TRUE);
    // Remove NUL characters (ignored by some browsers)
    $string = str_replace(chr(0), '', $string);
    // Remove Netscape 4 JS entities
    $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
    // Defuse all HTML entities
    $string = str_replace('&', '&amp;', $string);
    // Change back only well-formed entities in our whitelist
    // Decimal numeric entities
    $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
    // Hexadecimal numeric entities
    $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
    // Named entities
    $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
    return preg_replace_callback('%
        (
        <(?=[^a-zA-Z!/])  # a lone <
        |                 # or
        <!--.*?-->        # a comment
        |                 # or
        <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
        |                 # or
        >                 # just a >
        )%x', '_filter_xss_split', $string);
}
Well, there is not too much to do when we're talking about inserting data from a textarea into a MySQL database.
For strings placed into a query, MySQL's requirements are not so complicated.
Only 2 rules to follow:
inserted data should be surrounded by quotes.
some special characters in the data should be escaped.
Note that this operation has nothing to do with security. These are syntax requirements.
Assuming you're adding quotes already, the only thing you have to add is escaping. Depending on your encoding, you can use the addslashes, mysql_escape_string or mysql_real_escape_string functions.
However, other parts of the query require more attention. If you're curious, refer to my earlier answer with a complete guide: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
HTML tags have nothing to do with the database and require no special attention.
However, for displaying data from an untrusted source, some precautions should be taken. That was described in this topic already; the only thing I have to add is that you can't trust strip_tags when it is used with the second parameter.
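A quick sketch of why the second parameter is risky: attributes on an allowed tag are left untouched. The input string here is made up for the demonstration.

```php
<?php
$input = '<p onmouseover="alert(1)">hello</p><script>alert(2)</script>';

// With <p> allowed, the tag survives together with its event-handler attribute.
// The <script> tags are removed, but the text between them stays.
$kept = strip_tags($input, '<p>');
echo $kept, "\n";

// Escaping the whole string for display neutralises the markup instead.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n";
```

So a whitelist of tags is not the same thing as a whitelist of safe markup; the allowed tags can still carry script in their attributes.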
You can use mysql_real_escape_string, you can also use htmlentities with addslashes... or you can use all 3 together also...

sanitation script in php for login credentials

What I am looking for currently is a simple, basic, login credentials sanitation script.
I understand that I make a function to do so and I have one...but all it does right now is strip tags...
Am I doomed to use replace? Or is there a way I can just remove all special characters and spaces and limit it to only letters and numbers... and then, for the password, limit it to only letters, numbers, exclamation points, periods, and other special chars that cannot affect my SQL query?
Please help :/
Thanks,
Matt
If you want to make strings safe for SQL, use mysql_real_escape_string().
If you want to limit a string to certain chars, use a regex.
For example, if you want only a-z, 0-9 and exclamation mark you can use.
$string = preg_replace('/[^a-z0-9!]+/', '', $string);
This will strip out every character that isn't in the allowed set.
If you want to check whether the string matches that pattern, use preg_match(). For readability you may want to take the ^ out of the character class and negate the result with the bang / not / ! operator instead.
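Side by side, the two approaches could look like this. It is a sketch with a made-up sample string, using the a-z, 0-9 and ! set from the example above.

```php
<?php
$raw = 'he llo!<b>';

// Stripping: every run of characters outside a-z, 0-9 and ! is removed.
$stripped = preg_replace('/[^a-z0-9!]+/', '', $raw);
echo $stripped, "\n"; // hello!b

// Checking: the whole string must consist of allowed characters.
$isValid = preg_match('/^[a-z0-9!]+$/D', $raw) === 1;
var_dump($isValid); // bool(false)
```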
If you are talking about stripping out things to make echoing to your page safe, use htmlspecialchars(). Depending on context, you may need to sanitize further.
Remember: if you limit the characters allowed in passwords, the only benefit, and a theoretical one, is that they may be easier for the end user to remember. Limiting chars makes brute-forcing passwords easier (smaller pool of chars to check), and it shouldn't affect their storage (as they should be salted and hashed).
Sounds like you want to limit which characters people are allowed to use in their usernames and passwords. Sort of like this.
if (!preg_match('/^[a-zA-Z0-9_]++$/', $username)) {
// reject username
}
if (!preg_match('/^[a-zA-Z0-9\.!##$%^&*_-]++$/', $password)) {
// reject password
}
It's a bad idea to silently replace/remove characters in someone's credentials. You need to give them the feedback that these characters aren't allowed. It's also a bad idea to be too restrictive in what characters you allow in a password, for security reasons which others have already touched upon.
First off, don't ever sanitize a password. It should be hashed long before getting anywhere close to an SQL query, so sanitizing it actually has the opposite effect and makes your application less secure for its users.
$password = "hey'; --droptable";
$hashedPass = sha1("salt" . $password);
// sha1 returns an alphanumeric hash of the password
// stick the hash in the database
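On PHP 5.5 or later, the same idea can be expressed with the built-in password_hash() and password_verify() functions instead of sha1() with a fixed salt. This is a swapped-in technique, not what the snippet above does, and only a minimal sketch:

```php
<?php
$password = "hey'; --droptable";

// password_hash() generates its own random salt and embeds it in the result.
$hash = password_hash($password, PASSWORD_DEFAULT);

// Store $hash; at login, compare the submitted password against it.
var_dump(password_verify($password, $hash));     // bool(true)
var_dump(password_verify('wrong guess', $hash)); // bool(false)
```

The password itself never reaches the query, so there is nothing in it to sanitize.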
If you're dealing with a MySQL database, mysql_real_escape_string() is good enough as alex said. One thing you have to keep in mind with that method is that you will need an open connection to your MySQL database for it to work.
mysql_connect();
$string = "hey'; --droptable";
$string = mysql_real_escape_string($string);
echo $string; // outputs "hey\'; --droptable"
There are a few other DBMS APIs that have an escape string method; here are a few: http://au2.php.net/manual-lookup.php?pattern=escape_string&lang=en

PHP Regex for human names

I've run into a bit of a problem with a Regex I'm using for humans names.
$rexName = "/^[a-z' -]+$/i";
Suppose a user with the name Jürgen wishes to register? Or Böb? That's pretty commonplace in Europe. Is there a special notation for this?
EDIT: I just threw the name Jürgen at a regex creator, and it splits the word up at the letter ü...
http://www.txt2re.com/index.php3?s=J%FCrgen+Blalock&submit=Show+Matches
EDIT2: All right, since checking for such specific things is hard, why not use a regex that simply checks for illegal characters?
$rexSafety = "/^[^<,\"#/{}()*$%?=>:|;#]*$/i";
(now which ones of these can actually be used in any hacking attempt?)
For instance, this allows ' and - signs, yet you need a ; to make it work in SQL, and those will be stopped. Are there any other characters commonly used for HTML injection or SQL attacks that I'm missing?
I would really say: don't try to validate names. One day or another, your code will meet a name that it thinks is "wrong"... And how do you think someone would react when an application tells them "your name is not valid"?
Depending on what you really want to achieve, you might consider using some kind of blacklist / filter to exclude the "not-names" you thought about. It may let some "bad names" pass but, at least, it shouldn't prevent any existing name from accessing your application.
Here are a few examples of rules that come to mind:
no numbers
no special characters, like "~{()}#^$%?;:/*§£ø and probably some others
no more than 3 spaces?
none of "admin", "support", "moderator", "test", and a few other obvious non-names that people tend to use when they don't want to type in their real name...
(but if they don't want to give you their name, they still won't, even if you forbid them from typing random letters: they could just use a real name... which is not theirs)
Yes, this is not perfect; and yes, it will let some non-names pass... But it's probably way better for your application than telling someone "your name is wrong" (yes, I insist ^^)
And, to answer a comment you left under another answer:
"I could just forbid the most common characters for SQL injection and XSS attacks,"
About SQL injection: you must escape your data before sending it to the database; and if you always escape that data (you should!), you don't have to care about what users may or may not input. As it is always escaped, there is no risk for you.
The same goes for XSS: as you always escape your data when outputting it (you should!), there is no risk of injection ;-)
EDIT: if you just use that regex as is, it will not work very well.
The following code:
$rexSafety = "/^[^<,\"#/{}()*$%?=>:|;#]*$/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
Will get you at least a warning:
Warning: preg_match() [function.preg-match]: Unknown modifier '{'
You must escape at least some of those special chars; I'll let you dig into PCRE Patterns for more information (there is really a lot to know about PCRE / regex, and I won't be able to explain it all).
If you actually want to check that none of those characters is inside a given piece of data, you might end up with something like this:
$rexSafety = "/[\^<,\"#\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'martin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
(This is a quick and dirty proposal, which has to be refined!)
This one says "ok" (well, I definitely hope my own name is ok!)
And the same example with some special chars, like this:
$rexSafety = "/[\^<,\"#\/\{\}\(\)\*\$%\?=>:\|;#]+/i";
if (preg_match($rexSafety, 'ma{rtin')) {
    var_dump('bad name');
} else {
    var_dump('ok');
}
Will say "bad name".
But please note I have not fully tested this, and it probably needs more work! Do not use this on your site unless you have tested it very carefully!
Also note that a single quote can be helpful when trying to do an SQL injection... But it is probably a character that is legal in some names... So just excluding some characters might not be enough ;-)
PHP’s PCRE implementation supports Unicode character properties that span a larger set of characters. So you could use a combination of \p{L} (letter characters), \p{P} (punctuation characters) and \p{Zs} (space separator characters):
/^[\p{L}\p{P}\p{Zs}]+$/u
But there might be characters that are not covered by these character categories while there might be some included that you don’t want to be allowed.
So I advise against using regular expressions on a datum with such a vague range of values as a real person's name.
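To see what such a Unicode-property pattern accepts and rejects, here is a small sketch. The sample names are made up, the "\u{...}" escapes assume PHP 7 or later, and the u modifier is what makes PCRE read the subject as UTF-8.

```php
<?php
// The u modifier makes PCRE treat pattern and subject as UTF-8.
$pattern = '/^[\p{L}\p{P}\p{Zs}]+$/u';

$samples = array(
    "J\u{00FC}rgen",   // Jürgen
    "B\u{00F6}b",      // Böb
    "O'Brien",         // apostrophe counts as punctuation
    'Jean-Luc Picard', // dash and space are covered
    "x'; --",          // quote and semicolon are punctuation too, so this matches
    '<script>',        // < and > are math symbols, not punctuation
    'R2-D2',           // digits are not in the class
);

foreach ($samples as $name) {
    echo $name, ' => ', preg_match($pattern, $name) ? 'match' : 'no match', "\n";
}
```

Note that the quote-and-semicolon sample passes, which illustrates why a pattern like this cannot stand in for escaping.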
Edit: As you have edited your question, I now see that you just want to prevent certain code injection attacks. You had better escape those characters rather than reject them as a potential attack attempt.
Use mysql_real_escape_string or prepared statements for SQL queries, htmlspecialchars for HTML output and other appropriate functions for other languages.
That's a problem with no easy general solution. The thing is that you really can't predict what characters a name could possibly contain. Probably the best solution is to define a negative character mask to exclude some special characters you really don't want to end up in a name.
You can do this using:
$regexp = "/^[^<put unwanted characters here>]+$/";
If you're trying to parse apart a human name in PHP, I recommend Keith Beckman's nameparse.php script.
