preg_replace on xss code - php

Can this code help sanitize malicious input from a user-submitted form?
function rex($string) {
$patterns = array();
$patterns[0] = '/=/i';
$patterns[1] = '/javascript:/i';
$replacements = array();
$replacements[0] = '';
$replacements[1] = '';
return preg_replace($patterns, $replacements, $string);
}
I have included htmlentities() to prevent XSS on the client side. Is the code shown safe enough to prevent an attack?

You don't need that if you are using htmlentities. To prevent XSS you can even just use htmlspecialchars.
Just make sure that you use htmlspecialchars on all data that is printed as plain text in your HTML response.
See also: the answers to "Does this set of regular expressions FULLY protect against cross site scripting?"

Your substitutions may help, but you're better off using a pre-rolled solution like PHP's data filters. Then you can easily limit the data type to what you expect.

htmlentities alone will do the trick. No need to replace anything at all.

No.
http://ha.ckers.org/xss.html

Your first replacement rule is useless as it can be easily circumvented by using eval and character encoding (and an equal sign isn't necessary for XSS attacks anyway).
Your second rule can be very likely circumvented on at least some browsers by using things like javascript : or java\script:.
In short, it doesn't help much. If you want to show plain text, htmlentities is probably fine (there are exotic attacks which take advantage of unusual character encodings and browser stupidity to launch XSS attacks without any special characters, but that only works on specific browsers - cough IE cough - in specific situations). If you want to put user input in URLs or other attributes, it is not necessarily enough.
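To illustrate the point made in the answers above, here is a minimal sketch. The helper name escape_html and the sample payloads are made up for illustration. It runs the question's rex() filter on a nested payload, where a single pass reassembles the very string it was meant to remove, and contrasts that with escaping at output time.

```php
<?php
// The filter from the question, completed with its closing brace.
function rex($string) {
    $patterns = array('/=/i', '/javascript:/i');
    $replacements = array('', '');
    return preg_replace($patterns, $replacements, $string);
}

// Hypothetical helper: escape a value for HTML text or a quoted attribute value.
function escape_html($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Removing the inner "javascript:" glues the outer pieces back together.
$payload = 'javajavascript:script:alert(1)';
$filtered = rex($payload); // "javascript:alert(1)"

// Escaping removes nothing; it makes the markup characters inert.
$escaped = escape_html('<script>alert("x")</script>');
// "&lt;script&gt;alert(&quot;x&quot;)&lt;/script&gt;"
```

Escaping is enough for plain text and quoted attribute values. As the last answer notes, a value used as a URL still needs its scheme checked separately.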

Related

Text input sanitation and security

I am trying to make sure all my inputs are secure, protecting the server and XSS attacks. Is validating input with strip_tags and htmlentities a fool proof system? I have been told it was and would like to confirm. ie for example:
$re = htmlentities(strip_tags($_GET['re']), ENT_COMPAT, "UTF-8");
this should prevent any linux commands and any html links correct? are there any vulnerabilities that havent been considered with this?
This is not at all what htmlentities is for. Use htmlentities to encode your output before it is sent to the browser. It has nothing to do with sanitizing input. The only thing you need to worry about when processing input is properly escaping data being interpolated into SQL queries to prevent SQL injection. See PHP Data Objects for more on that.
strip_tags is debatably useful here, but you don't need to use both strip_tags and htmlentities. The whole purpose of htmlentities is that it prevents the tags from being interpreted. The only correct way to think about this is: preserve the content the user entered and render it safe. Don't strip their tags, just encode them so they appear as they were typed. Otherwise you wind up stripping things like <sarcasm> and <rant> tags. The intent of the user was not to inject HTML.
"Linux commands" have nothing to do with HTML. There is no way to execute arbitrary Linux commands through HTML/script injection.
What I have in mind is something such as ";ls -la"
If you are actually taking user-supplied input and executing it via system or something in that vein, you are already in trouble. This is a terrible idea and you shouldn't do it.
</rant>
You must always choose the right tool for the job. That being said, $re = htmlentities(strip_tags($_GET['re']), ENT_COMPAT, "UTF-8"); should never be used for anything. The command is redundant, which means you don't understand what it's doing. It is not very good at preventing XSS because XSS is an output problem.
To sanitize shell arguments you must use escapeshellarg(). For XSS you should use:
htmlspecialchars($_GET['re'], ENT_QUOTES, "UTF-8");
However, this doesn't stop all XSS and it doesn't do anything to stop SQL injection.
Use parametrized queries for SQL.
And all of that just scratches the surface; read the OWASP Top 10.
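The "right tool for the job" idea can be sketched as one escaping function per destination, applied at the point of use. The helper names for_html and for_shell are hypothetical, and the sketch does not endorse passing user input to a shell at all, which the answer above advises against.

```php
<?php
// Hypothetical helper: escape for HTML text and quoted attribute values.
function for_html($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: escape a single argument for a shell command line.
function for_shell($value) {
    return escapeshellarg($value);
}

$input = ';ls -la';

// HTML has no use for any of these characters, so the value is unchanged.
$html = for_html($input);

// On a shell command line the value is quoted, so the shell sees one literal
// argument instead of a command separator followed by "ls".
$command = 'echo ' . for_shell($input);
```

The same input is harmless in one context and dangerous in another, which is why a single "sanitize everything" function applied on input cannot cover both.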
This is how I filter my inputs before I insert them into my database:
<?php
function sanitize($data){
// Each step works on the result of the previous one.
$result = trim($data);
$result = htmlspecialchars($result);
$result = mysql_real_escape_string($result);
return $result;
}
?>

php - Clean user input using preg_replace_callback and ord()?

I have a forum-style text box and I would like to sanitize the user input to stop potential XSS and code insertion. I have seen htmlentities used, but then others have said that the &, #, % and : characters need to be encoded as well, and it seems the more I look, the more potentially dangerous characters pop up. Whitelisting is problematic as there are many valid text options beyond a-zA-Z0-9. I have come up with this code. Will it work to stop attacks and be secure? Is there any reason not to use it, or a better way?
function replaceHTML ($match) {
return "&#" . ord ($match[0]) . ";";
}
$clean = preg_replace_callback ( "/[^ a-zA-Z0-9]/", "replaceHTML", $userInput );
EDIT:
I could of course be wrong, but it is my understanding that htmlentities only replaces & < > " (and ' if ENT_QUOTES is turned on). This is probably enough to stop most attacks (and frankly probably more than enough for my low-traffic site). In my obsessive attention to detail, however, I dug further. A book I have warns to also encode # and % for "shutting down hex attacks". Two websites I found warned against allowing : and --. It's all rather confusing to me, and led me to explore converting all non-alphanumeric characters. If htmlentities does this already then great, but it does not seem to. Here are the results from code I ran, copied after clicking View Source in Firefox.
original (random characters to test):
5:gjla#''*&$!j-l:4
preg_replace_callback:
<b>5:</b>gjla<hi>#''*&$!j-l:4
htmlentities (w/ ENT_QUOTES):
<b>5:</b>gjla<hi>#''*&$!j-l:4
htmlentities appears to not be encoding those other characters like :
Sorry for the wall of text. Is this just me being paranoid?
EDIT #2:
All you need to do to stop XSS attacks is use htmlspecialchars().
That is exactly what htmlentities does already:
http://codepad.viper-7.com/NDZMa3
It will convert (spaced to prevent stackoverflow double encoding):
"& # amp ;"
to
"& # amp; # amp ;"
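The space ' ' in your regex can be changed to \s, and adding /i at the end of the regex makes it case-insensitive. You also don't need to translate your characters to entities by hand; the callback can call htmlentities instead. Note that the callback receives the array of matches, not a string, so it has to pass the first element on:
$clean = preg_replace_callback('/[^a-z0-9\s]/i', function ($m) { return htmlentities($m[0], ENT_QUOTES, 'UTF-8'); }, $userInput);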
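One practical reason to prefer htmlspecialchars over encoding every non-alphanumeric byte shows up with non-ASCII text. The following sketch reuses the replaceHTML callback from the question; the sample input is made up. It compares the two approaches on a string containing a UTF-8 character.

```php
<?php
// The callback from the question: encode every byte outside [ a-zA-Z0-9].
function replaceHTML($match) {
    return "&#" . ord($match[0]) . ";";
}

$userInput = "caf\xC3\xA9 <b>"; // "café <b>", with é written as its two UTF-8 bytes

// Without the /u modifier the pattern works on bytes, so é becomes two entities.
$byteWise = preg_replace_callback('/[^ a-zA-Z0-9]/', 'replaceHTML', $userInput);
// "caf&#195;&#169; &#60;b&#62;", which a browser displays as "cafÃ© <b>"

// htmlspecialchars only touches & < > " ' and leaves the UTF-8 text intact.
$escaped = htmlspecialchars($userInput, ENT_QUOTES, 'UTF-8');
// "café &lt;b&gt;"
```

Both results are inert as HTML text, but only the second one still displays what the user typed.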

Will this preg_match prevent XSS successfully?

I read that even if you strip <script> you are still vulnerable to XSS.
Something interesting I found as an answer is this <scrip<script></script>t>alert(1337)</script>
How do you evaluate this preg match?
echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var);
Additionally, are there any other tags I should be aware of for XSS attacks?
strip_tags is sufficient to get rid of XSS issues. But using a single regex is not, as you need to cleanse and whitelist all HTML attributes and tags. Browsers are extremely forgiving and allow even malformed HTML that's not standards-compliant (also IE bugs). That's why it is pretty much unfeasible to use a regex for that. (Despite the silly SO meme it is possible to match HTML with a contemporary regex language, just way too much effort.)
All the regex solutions you will find are blacklists, which are not considered a reliable solution. They will miss half of the possible exploits; see http://ha.ckers.org/xss.html
Regular expressions are not sufficient to filter dangerous HTML. You must properly parse the HTML, and drop malformed tags as well as non-whitelisted tags. Use an existing library such as HTML purifier; it is far too easy to get this wrong.
You could try eliminating script tags in a while loop, until there are no more script tags to be found:
while (preg_match("'[<]script.*?/script[>]'is",$data))
{
$data = preg_replace("'[<]script.*?/script[>]'is","",$data);
}
You should also check event-handler attributes such as onclick, onfocus, etc. They can also contain unwanted XSS.
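The behaviour discussed in this thread can be checked directly. In the sketch below the function names strip_script_once and strip_script_loop are made up; the patterns are the ones from the question and from the loop answer. It runs the nested payload through both and then tries an event-handler payload that contains no script tag at all.

```php
<?php
// Single-pass removal, as in the question.
function strip_script_once($html) {
    return preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $html);
}

// Repeated removal until the pattern no longer matches, as in the loop answer.
function strip_script_loop($html) {
    $pattern = "'[<]script.*?/script[>]'is";
    while (preg_match($pattern, $html)) {
        $html = preg_replace($pattern, '', $html);
    }
    return $html;
}

$nested = '<scrip<script></script>t>alert(1337)</script>';
$once   = strip_script_once($nested); // "<script>alert(1337)</script>"
$looped = strip_script_loop($nested); // ""

// An event-handler attribute has no <script> tag, so the filter passes it through.
$handler    = '<img src="x" onerror="alert(1)">';
$stillThere = strip_script_loop($handler);

// Escaping at output time neutralises the payload without parsing HTML.
$safe = htmlspecialchars($handler, ENT_QUOTES, 'UTF-8');
```

The loop fixes the nesting trick but not the attribute case, which is why the answers recommend either escaping everything or using a whitelist-based parser such as HTML Purifier.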

What do I need to do to sanitize data from textarea to be fed to mysql database?

Well, the title is my question. Can anybody give me a list of things to do to sanitize my data before entering to mysql database using php, especially if the data contains html tags?
It depends on a lot of things. If you don't want to accept any HTML, that makes it a whole lot easier: run it through strip_tags() first to remove all the HTML from it. After that it's much safer. If you do want to accept some HTML, you can selectively keep some tags with the same function by passing the tags to keep as the second argument, e.g.: strip_tags($string_to_sanitize, '<p><div>'); // Keeps only <p> and <div> tags.
As for inserting into a database, it's always best to sanitize anything before inserting into the database; adopting a "don't trust anybody" mentality will save you a lot of trouble. Preventing against SQL injection is fairly straightforward, this is the method I use:
$q = sprintf("INSERT INTO table_name (string_field, int_field) VALUES ('%s', %d);",
mysql_real_escape_string($values['string']),
mysql_real_escape_string($values['number']));
$result = mysql_query($q, $connection);
Generally once you open the door for allowing HTML in, you'll have a whole deal of things to worry about (there are some great articles on defending from XSS out there). If you want to test for XSS vulnerabilities, try the examples on http://ha.ckers.org/xss.html. There are some they have there that you would probably never even consider, so give it a look!
Also, if you are accepting specific types of input (eg: numbers, emails, boolean values) try using the inbuilt filter_var() function in PHP. They have a bunch of inbuilt types to validate data against (http://www.php.net/manual/en/filter.filters.validate.php), as well as a number of filters to sanitize your data (http://www.php.net/manual/en/filter.filters.sanitize.php).
Generally, accepting any input is like opening a Pandora's Box, and while you'll probably never be able to block 100% of the weaknesses (people are always looking to find a way in), you can block the common ones to save you headaches.
Finally remember to sanitize ALL external data. Just because you make a dropdown input doesn't mean some shady person can't send their own data instead!
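As a small sketch of the filter_var() advice and the dropdown warning above: the helper name pick_allowed and the sample values are hypothetical, while the filter constants are the standard PHP validation filters.

```php
<?php
// Hypothetical helper: accept a value only if it is one of the options the form offered.
function pick_allowed($value, array $allowed, $default) {
    return in_array($value, $allowed, true) ? $value : $default;
}

// Validation filters return the converted value on success and false on failure.
$age      = filter_var('42', FILTER_VALIDATE_INT);                   // int 42
$badAge   = filter_var('42; DROP TABLE users', FILTER_VALIDATE_INT); // false
$email    = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);   // the string, unchanged
$notEmail = filter_var('not an email', FILTER_VALIDATE_EMAIL);       // false

// A dropdown offers three colours, but the request can carry anything.
$colour = pick_allowed('purple', array('red', 'green', 'blue'), 'red'); // "red"
```

Validation of this kind narrows what the application accepts; it does not replace escaping when the value is later placed into SQL or HTML.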
Use mysql_real_escape_string();
mysql_query("INSERT INTO table(col) VALUES('".mysql_real_escape_string($_POST['data'])."')");
You should use prepared statements when inserting data into the database, not any sort of escaping. (PHP manual: prepared statements in pdo and mysqli.)
Sanitization for HTML output should, as mentioned by others, happen when you go to take data out of the database and merge it into a page, not before.
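The two points in this answer (bind on the way in, escape on the way out) can be sketched with PDO. This is only an illustration under assumptions: the comments table and the helper names save_comment and render_comments are made up, and an in-memory SQLite database stands in for MySQL so the snippet is self-contained.

```php
<?php
// Hypothetical helper: store a comment using a bound parameter.
function save_comment(PDO $pdo, $body) {
    $stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
    $stmt->execute(array(':body' => $body));
}

// Hypothetical helper: render stored comments, escaping at output time.
function render_comments(PDO $pdo) {
    $html = '';
    foreach ($pdo->query('SELECT body FROM comments ORDER BY id') as $row) {
        $html .= '<p>' . htmlspecialchars($row['body'], ENT_QUOTES, 'UTF-8') . "</p>\n";
    }
    return $html;
}

$page = null;
if (extension_loaded('pdo_sqlite')) {
    // In-memory SQLite keeps the sketch self-contained; a MySQL DSN would go here instead.
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

    // Quotes and markup are stored exactly as typed.
    save_comment($pdo, "Robert'); DROP TABLE comments;-- <b>hi</b>");
    $page = render_comments($pdo);
}
```

The stored value keeps the user's original text, so it can later be rendered into HTML, JSON or plain text, each with its own escaping.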
Turn off register_globals and magic_quotes, use mysql_real_escape_string on any string coming from the user before placing it into your query.
Of course mysql_real_escape_string
When dealing with any kind of input, start from an "I won't allow anything" standpoint and whitelist only what is deemed acceptable.
On insert you need to make sure that the data is MySQL-escaped. For this, use mysql_real_escape_string.
Before showing the data you will need to strip out unsafe HTML and/or JavaScript code. Many people choose to store the sanitised version in the database. Others prefer to strip the ugly HTML from the string before rendering.
You do this in PHP with some filtering. An example is the Drupal filter_xss function:
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
// Only operate on valid UTF-8 strings. This is necessary to prevent cross
// site scripting issues on Internet Explorer 6.
if (!drupal_validate_utf8($string)) {
return '';
}
// Store the input format
_filter_xss_split($allowed_tags, TRUE);
// Remove NUL characters (ignored by some browsers)
$string = str_replace(chr(0), '', $string);
// Remove Netscape 4 JS entities
$string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
// Defuse all HTML entities
$string = str_replace('&', '&amp;', $string);
// Change back only well-formed entities in our whitelist
// Decimal numeric entities
$string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
// Hexadecimal numeric entities
$string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
// Named entities
$string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
return preg_replace_callback('%
(
<(?=[^a-zA-Z!/]) # a lone <
| # or
<!--.*?--> # a comment
| # or
<[^>]*(>|$) # a string that starts with a <, up until the > or the end of the string
| # or
> # just a >
)%x', '_filter_xss_split', $string);
}
Well, there is not too much to do when we're talking about inserting data from a textarea into a MySQL database.
For strings placed into a query, MySQL's requirements are not so complicated.
Only 2 rules to follow:
inserted data should be surrounded by quotes.
some special characters in the data should be escaped.
Note that this operation has nothing to do with security. These are syntax requirements.
Assuming you're adding quotes already, the only thing you have to add is escaping. Depending on your encoding, you can use the addslashes, mysql_escape_string or mysql_real_escape_string functions.
However, other parts of the query require more attention. If you're curious, refer to my earlier answer with a complete guide: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
HTML tags have nothing to do with the database and require no special attention.
However, for displaying data from an untrusted source, some precautions should be taken. This was described in this topic already; the only thing I have to add is that you can't trust strip_tags when it is used with the second parameter.
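The caveat about the second parameter of strip_tags can be seen in a short sketch (the sample input is made up): an allowed tag is kept together with all of its attributes, including event handlers.

```php
<?php
$input = '<p onclick="alert(1)">hi</p><script>x</script>';

// With no second argument every tag is removed; only the text content remains.
$plain = strip_tags($input); // "hix"

// With an allow-list the <p> tag survives together with its onclick attribute.
$kept = strip_tags($input, '<p>'); // '<p onclick="alert(1)">hi</p>x'
```

So strip_tags with allowed tags filters tag names only, not attributes, and its output is not safe to print as HTML without further filtering.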
You can use mysql_real_escape_string, you can also use htmlentities with addslashes... or you can use all 3 together also...

Best way to defend against mysql injection and cross site scripting

At the moment, I apply a 'throw everything at the wall and see what sticks' method of stopping the aforementioned issues. Below is the function I have cobbled together:
function madSafety($string)
{
$string = mysql_real_escape_string($string);
$string = stripslashes($string);
$string = strip_tags($string);
return $string;
}
However, I am convinced that there is a better way to do this. I am using FILTER_SANITIZE_STRING and this doesn't appear to be totally secure.
I guess I am asking, which methods do you guys employ and how successful are they? Thanks
Just doing a lot of stuff that you don't really understand, is not going to help you. You need to understand what injection attacks are and exactly how and where you should do what.
In bullet points:
Disable magic quotes. They are an inadequate solution, and they confuse matters.
Never embed strings directly in SQL. Use bound parameters, or escape (using mysql_real_escape_string).
Don't unescape (eg. stripslashes) when you retrieve data from the database.
When you embed strings in html (Eg. when you echo), you should default to escape the string (Using htmlentities with ENT_QUOTES).
If you need to embed html-strings in html, you must consider the source of the string. If it's untrusted, you should pipe it through a filter. strip_tags is in theory what you should use, but it's flawed; Use HtmlPurifier instead.
See also: What's the best method for sanitizing user input with PHP?
The best way to defend against SQL injection is to bind variables, rather than "injecting" them into the string.
http://www.php.net/manual/en/mysqli-stmt.bind-param.php
Don’t! Using mysql_real_escape_string is enough to protect you against SQL injection, and the stripslashes you are doing afterwards makes you vulnerable to SQL injection. If you really want it, put it before, as in:
function madSafety($string)
{
$string = stripslashes($string);
$string = strip_tags($string);
$string = mysql_real_escape_string($string);
return $string;
}
stripslashes is not really useful if you are doing mysql_real_escape_string.
strip_tags protects against HTML/XML injection, not SQL.
The important thing to note is that you should escape your strings differently depending on the immediate use you have for them.
When you are doing MySQL requests use mysql_real_escape_string. When you are outputting web pages use htmlentities. To build web links use urlencode…
As vartec noted, if you can use placeholders by all means do it.
This topic is so wrong!
You should NOT filter the input of the user! It is information that has been entered by him. What are you going to do if I want my password to be something like: '"'>s3cr3t<script>alert()</script>
Filter the characters and leave me with a changed password, so I cannot even succeed in my first login? This is bad.
The proper solution is to use prepared statements or mysql_real_escape_string() to avoid SQL injection, and to use context-aware escaping of the characters to avoid your HTML code being messed up.
Let me remind you that the web is only one of the ways you can represent the information entered by the user. Would you accept such stripping if some desktop software did it? I hope your answer is NO and you understand why this is not the right way.
Note that in different contexts different characters have to be escaped. For example, if you need to display the user's first name as a tooltip, you will use something like:
<span title="{$user->firstName}">{$user->firstName}</span>
However, if the user has set his first name to something like '"><script>window.document.location.href="http://google.com"</script>, what are you going to do? Strip the quotes? This would be so wrong! Instead of doing this nonsense, consider escaping the quotes while rendering the data, not while persisting it!
Another context you should consider is rendering the value itself. Consider the previously used HTML code and imagine the user's first name is something like <textarea>. This would wrap all the HTML code that follows into this textarea element, thus breaking the whole page.
Yet again - consider escaping the data depending on the context you are using it in!
P.S Not really sure how to react on those negative votes. Are you, people, actually reading my reply?
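The tooltip example from this answer can be sketched as follows. The helper names e and url_param and the /users path are hypothetical; the point is only that the stored value stays untouched and each output context applies its own escaping.

```php
<?php
// Hypothetical helper: escape for HTML text and quoted attribute values.
function e($value) {
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: encode a single query-string value.
function url_param($value) {
    return rawurlencode($value);
}

// Stored exactly as the user typed it.
$firstName = '\'"><script>alert(1)</script>';

// Escaped only at the moment it is written into each context.
$span = '<span title="' . e($firstName) . '">' . e($firstName) . '</span>';
$link = '<a href="/users?name=' . e(url_param($firstName)) . '">profile</a>';
```

The browser shows the name exactly as entered in both the tooltip and the text, and neither the quotes nor the script tag can break out of the markup.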
