Any harm in using str_replace on htmlentities? - php

I have a custom forum in which I use htmlentities so users can't post malicious code (HTML/JS). When I pull posts from the database, I use str_replace to turn certain entities back into the characters <, >, &, etc. Is there any harm in doing this? Could it cause side effects or let HTML render?

1. User posts data.
2. Data is escaped for MySQL and written to the DB.
3. User requests the data.
4. Data is encoded for display (aggressively with htmlentities or htmlspecialchars, or restricted to some subset of allowed characters). You could do this with str_replace, but there are better utilities.
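A minimal sketch of step 4, and of what decoding with str_replace afterwards does. `renderPost` is a hypothetical helper name and the sample post is made up; the point is that any str_replace that turns `&lt;` back into `<` also brings back whatever tag the user typed.

```php
<?php
// Hypothetical helper: encode user text at output time.
function renderPost(string $raw): string
{
    // ENT_QUOTES also encodes ' and ", so the result is safe in quoted attributes too.
    return htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
}

$post = '<script>alert(1)</script> & more';
$safe = renderPost($post);
echo $safe, "\n"; // &lt;script&gt;alert(1)&lt;/script&gt; &amp; more

// Undoing the encoding with str_replace restores the live <script> tag.
$unsafe = str_replace(['&lt;', '&gt;', '&amp;'], ['<', '>', '&'], $safe);
echo $unsafe, "\n"; // <script>alert(1)</script> & more
```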

Use strip_tags to remove any HTML / JS / PHP tags.
It takes an optional list of tags to keep, like this:
strip_tags($text, '<p><a>');
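A quick check of what the allow-list form actually keeps (the input string is made up): the allowed `<p>` survives with its `onclick` attribute intact, and the text inside the removed `<script>` element is left behind as plain text.

```php
<?php
$input = '<p onclick="alert(1)">Hello</p><script>steal()</script>';

// Only <p> and <a> are kept, but attributes on kept tags are not touched.
$out = strip_tags($input, '<p><a>');
echo $out, "\n"; // <p onclick="alert(1)">Hello</p>steal()
```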

As the documentation states, strip_tags does not remove inline JavaScript (attributes such as onclick on allowed tags are kept) and does not sanitise, so it isn't a good idea. A common solution is to use BBCode instead, for which many libraries exist, or you can write your own and then use preg_replace to substitute in your own markup safely.
Here's a quick sample:
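// Escape everything first, so no user-supplied HTML survives
$safe_output = htmlspecialchars($output);
// Then translate the BBCode [b]...[/b] into markup we control
$find = array("'\[b\](.*?)\[/b\]'is");
$replace = array("<strong>\\1</strong>");
$result = preg_replace($find, $replace, nl2br($safe_output));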
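The same escape-then-translate idea extended to a few more tags, as a sketch. `bbcodeToHtml` is a hypothetical name, and restricting `[url]` to http/https links is an assumption added here so a `javascript:` URL cannot slip through.

```php
<?php
// Hypothetical BBCode-to-HTML helper: escape first, then translate a tiny whitelist.
function bbcodeToHtml(string $input): string
{
    // 1. Escape everything so raw HTML from the user is inert.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. Simple formatting tags map to fixed markup.
    $html = preg_replace(
        ['~\[b\](.*?)\[/b\]~is', '~\[i\](.*?)\[/i\]~is'],
        ['<strong>$1</strong>', '<em>$1</em>'],
        $html
    );

    // 3. [url]...[/url]: only http(s) links become anchors (assumed policy).
    $html = preg_replace_callback(
        '~\[url\](.*?)\[/url\]~is',
        function (array $m): string {
            $url = $m[1]; // already HTML-escaped in step 1
            if (!preg_match('~^https?://~i', $url)) {
                return $url; // anything else stays as plain text
            }
            return '<a href="' . $url . '" rel="nofollow">' . $url . '</a>';
        },
        $html
    );

    // 4. Preserve line breaks last.
    return nl2br($html);
}
```

Usage: `bbcodeToHtml('[b]hi[/b] <script>')` yields `<strong>hi</strong> &lt;script&gt;`. Escaping before translating means the only tags in the output are the ones the function itself writes.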

Related

how to escape only <script> tag using htmlspecialchars() in php

I have a string in my SQL database that came from a user.
$str ='<h2 contenteditable="true">I am a not a good user <script>alert("hacked") </script> </h2>';
If I echo it as it is, that is not good, so I use htmlspecialchars() to escape the special HTML characters:
echo htmlspecialchars($str);
This protects me from hacking, but I want to keep the other tags (like <h2>) as they are; I don't want them to change. Is there a way to escape only specific tags using htmlspecialchars()?
I was about to propose something very basic with regular expressions but I found this here:
https://stackoverflow.com/a/7131156/6219628
After reading more of the docs, I didn't find anything to ignore specific tags with just htmlspecialchars(), which isn't surprising.
EDIT: And since using regex to parse HTML seems to be evil, you may appreciate reading this bulky answer :)
https://stackoverflow.com/a/1732454/6219628
I think strip_tags() is what you are looking for. You can pass the allowed tags as the second parameter.
Check out this function from the PHP Docs
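$strippedinput = strip_tags_attributes($nonverifiedinput,"<p><br><h1><h2><h3><a><img>","class,style");
function strip_tags_attributes($string, $allowtags = NULL, $allowattributes = NULL) {
    // Remove every tag that is not in the allow-list
    $string = strip_tags($string, $allowtags);
    if (!is_null($allowattributes)) {
        if (!is_array($allowattributes)) $allowattributes = explode(",", $allowattributes);
        // Build negative lookbehinds, e.g. (?<!class)(?<!style)
        if (is_array($allowattributes)) $allowattributes = implode(")(?<!", $allowattributes);
        if (strlen($allowattributes) > 0) $allowattributes = "(?<!" . $allowattributes . ")";
        // Inside each remaining tag, drop quoted attributes whose name is not allowed.
        // create_function() was removed in PHP 8, so a closure is used instead.
        // Unquoted attributes (onclick=alert(1)) are not matched and survive.
        $string = preg_replace_callback("/<[^>]*>/i", function ($matches) use ($allowattributes) {
            return preg_replace("/ [^ =]*" . $allowattributes . "=(\"[^\"]*\"|'[^']*')/i", "", $matches[0]);
        }, $string);
    }
    return $string;
}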
As Gerrit0 pointed out, you shouldn't use regex to parse HTML.
Note that just removing the <script> tag isn't sufficient; there are many other ways that users can inject malicious content into your site.
If you want to restrict the HTML tags that users can input, use a tool like HTML Purifier which uses a whitelist of allowable tags and attributes.
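To show what "whitelist on top of a real parser" means, here is a deliberately simplified sketch using PHP's DOM extension (ext-dom). It is not HTML Purifier and not its API; `sanitizeHtml` and `sanitizeNode` are hypothetical names, and it ignores CSS, `src` URLs, encodings and many other cases a library like HTML Purifier handles. Disallowed tags are unwrapped, `script`/`style` are dropped with their content, and attributes are kept only if listed for that tag.

```php
<?php
/** Hypothetical helper. $allowed maps tag name => allowed attribute names. */
function sanitizeNode(DOMNode $node, array $allowed): void
{
    // Copy the children first, because the loop modifies the tree.
    foreach (iterator_to_array($node->childNodes, false) as $child) {
        if ($child instanceof DOMComment) {
            $node->removeChild($child);
            continue;
        }
        if (!($child instanceof DOMElement)) {
            continue; // text nodes stay; saveHTML() re-escapes them
        }
        $tag = strtolower($child->tagName);
        if (!array_key_exists($tag, $allowed)) {
            if ($tag !== 'script' && $tag !== 'style') {
                // Unwrap: keep the cleaned children, drop the tag itself.
                sanitizeNode($child, $allowed);
                while ($child->firstChild !== null) {
                    $node->insertBefore($child->firstChild, $child);
                }
            }
            $node->removeChild($child); // script/style vanish with their content
            continue;
        }
        foreach (iterator_to_array($child->attributes, false) as $attr) {
            $name = strtolower($attr->name);
            $keep = in_array($name, $allowed[$tag], true);
            if ($keep && $name === 'href' && !preg_match('~^https?://~i', trim($attr->value))) {
                $keep = false; // reject javascript:, data: and other schemes
            }
            if (!$keep) {
                $child->removeAttribute($attr->name);
            }
        }
        sanitizeNode($child, $allowed);
    }
}

/** Hypothetical helper: parse, clean, and serialise the fragment. */
function sanitizeHtml(string $html, array $allowed): string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // user markup is often sloppy
    $doc->loadHTML('<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body>'
        . $html . '</body></html>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    sanitizeNode($body, $allowed);

    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Applied to the `<h2 contenteditable="true">…<script>…</script></h2>` string from the question with `['h2' => []]`, the `<h2>` and its text survive while the script and the attribute do not.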

Sanitizing user input HTML with htmlspecialchars, nl2br, str_replace, htmlspecialchars_decode, and stripslashes

I made this function
function echoSanitizer($var)
{
$var = htmlspecialchars($var, ENT_QUOTES);
$var = nl2br($var, false);
$var = str_replace(array("\\r\\n", "\\r", "\\n"), "<br>", $var);
$var = htmlspecialchars_decode($var);
return stripslashes($var);
}
Would it be safe from XSS attacks?
htmlspecialchars to take away HTML tags
nl2br for the new lines
str_replace to convert the \r\n to <br>
htmlspecialchars_decode to convert back the original characters
stripslashes to strip slashes
Why do I need all of that? Because I want to preview what users typed in, and I wanted a WYSIWYG view for them. Some of the input comes from a textarea box and I wanted the line breaks preserved, so nl2br is needed.
I'm mainly asking about htmlspecialchars_decode because it's new to me. Is it safe? As a whole, is the function I made safe if I use it to display user input?
(No database involved in this scenario.)
In your case htmlspecialchars_decode() makes the function unsafe. Users must not be allowed to insert < character unescaped, because that allows them to create arbitrary tags (and filtering/blacklisting is a cat and mouse game you can't win).
At the very minimum, < must be escaped as &lt;.
If you only allow plain text with newlines, then:
nl2br(htmlspecialchars($text_with_newlines, ENT_QUOTES));
is safe to output in HTML, except inside <script> or in attributes that expect JavaScript or URLs, such as onclick and href (in the latter case somebody could use a javascript:… URL).
If you want to allow users to use HTML tags, but not exploit your page, then the correct function to do this won't fit in a StackOverflow post (it is thousands of lines long and requires a full HTML parser, processing of URLs and CSS, etc.); you'll have to use something heavyweight like HTMLPurifier.
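The plain-text recipe from this answer as a small sketch (`renderPlainText` and the sample input are hypothetical), together with what happens when htmlspecialchars_decode() runs afterwards, as it does in echoSanitizer().

```php
<?php
// Hypothetical helper: plain text with newlines, safe inside an HTML element body.
function renderPlainText(string $text): string
{
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

$input = "Line one <img src=x onerror=alert(1)>\nLine two";
echo renderPlainText($input), "\n";
// Line one &lt;img src=x onerror=alert(1)&gt;<br />
// Line two

// Decoding afterwards turns the entities back into a live <img onerror=...> tag.
echo htmlspecialchars_decode(renderPlainText($input)), "\n";
```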

How to sanitize HTML POST values of NicEdit?

I recently started to use NicEdit on my "Article Entry" page. However, I have some questions about security and preventing abuse.
First question:
I currently sanitize every input with mysql_real_escape_string() in my database class. In addition, I sanitize HTML values with htmlspecialchars(htmlentities(strip_tags($var))).
How would you sanitize your HTML inputs while adding them to the database, or does the way I'm doing it work perfectly?
Second question:
While I was writing this question, there was a question with a similar title, so I read it. It was someone talking about abused HTML inputs messing with his valid template (e.g. just input).
It may occur on my current system too. How should it be dealt with in PHP?
Ps. I want to keep using NicEdit, so using BBCode system should be the last advice.
Thank you.
mysql_real_escape_string is not sanitization, it escapes text values to keep the syntax of the SQL query valid/unambiguous/injection safe.
strip_tags is sanitizing your string.
Doing both htmlentities and htmlspecialchars in order is overkill and may just garble your data. Since you're also stripping tags right before that, it's double overkill.
The rule is to make sure your data doesn't break your SQL syntax, so you apply mysql_real_escape_string once, before putting the data into the query. You do the same thing to protect your HTML syntax: HTML-escape text before outputting it into HTML, using either htmlspecialchars (recommended) or htmlentities, not both.
For a much more in-depth excursion into all this read The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
I don't know NicEdit, but I assume it allows your users to style text using HTML behind the scenes. Why are you stripping the HTML from the data then? There's no point in using a WYSIWYG editor then.
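A short illustration of the "not both" point (the sample string is made up; strip_tags is left out because it would also delete the `<3`). Encoding twice turns every `&` of the first pass into `&amp;`, so the reader ends up seeing entity names instead of the original text.

```php
<?php
$raw = 'Fish & Chips <3';

// Two HTML encoders in a row garble the text.
$double = htmlspecialchars(htmlentities($raw));
// Fish &amp;amp; Chips &amp;lt;3   (a browser shows "Fish &amp; Chips &lt;3")

// One encoding at output time is all HTML needs.
$once = htmlspecialchars($raw);
// Fish &amp; Chips &lt;3            (a browser shows "Fish & Chips <3")

echo $double, "\n", $once, "\n";
```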
This is a function I am using in one of my NICEDIT applications and it seems to do well with the code that comes out of nicedit.
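function cleanFromEditor($text) {
    // Try to decode HTML before we clean it, then we submit to the database
    $text = stripslashes(html_entity_decode($text));
    // Clean out tags that we don't want in the text.
    // <i> is allowed so the <i> -> <em> conversion below can actually run;
    // note that strip_tags() keeps attributes (onclick, style, ...) on allowed tags.
    $text = strip_tags($text, '<p><div><strong><em><ul><ol><li><u><blockquote><br><sub><img><a><h1><h2><h3><span><b><i>');
    // Conversion elements: legacy tags and their replacements
    $conversion = array(
        '<br>' => '<br />',
        '<b>' => '<strong>',
        '</b>' => '</strong>',
        '<i>' => '<em>',
        '</i>' => '</em>'
    );
    // Clean up the old HTML with the new
    foreach ($conversion as $old => $new) {
        $text = str_replace($old, $new, $text);
    }
    // mysql_real_escape_string() was removed in PHP 7; escape for the database
    // at insert time instead (e.g. with a prepared statement).
    return htmlentities($text);
}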

What do I need to do to sanitize data from a textarea to be fed to a MySQL database?

Well, the title is my question. Can anybody give me a list of things to do to sanitize my data before inserting it into a MySQL database using PHP, especially if the data contains HTML tags?
It depends on a lot of things. If you don't want to accept any HTML, that makes it a whole lot easier, run it through strip_tags() first to remove all the HTML from it. After that it's much safer. If you do want to accept some HTML, you can selectively keep some tags from it with the same function, just add in the tags to keep after. eg: strip_tags($string_to_sanitize, '<p><div>'); // Keeps only <p> and <div> tags.
As for inserting into a database, it's always best to sanitize anything before inserting it into the database; adopting a "don't trust anybody" mentality will save you a lot of trouble. Preventing SQL injection is fairly straightforward; this is the method I use:
$q = sprintf("INSERT INTO table_name (string_field, int_field) VALUES ('%s', %d);",
    mysql_real_escape_string($values['string']),
    mysql_real_escape_string($values['number']));
$result = mysql_query($q, $connection);
Generally, once you open the door to allowing HTML in, you'll have a great deal of things to worry about (there are some great articles on defending against XSS out there). If you want to test for XSS vulnerabilities, try the examples on http://ha.ckers.org/xss.html. There are some there that you would probably never even consider, so give it a look!
Also, if you are accepting specific types of input (eg: numbers, emails, boolean values) try using the inbuilt filter_var() function in PHP. They have a bunch of inbuilt types to validate data against (http://www.php.net/manual/en/filter.filters.validate.php), as well as a number of filters to sanitize your data (http://www.php.net/manual/en/filter.filters.sanitize.php).
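A small sketch of the validate filters mentioned above; the sample values and the 0 to 150 range are made up. Each call returns the typed value on success and false on failure (or null, when FILTER_NULL_ON_FAILURE is passed), so the form can be rejected rather than "cleaned".

```php
<?php
$age   = filter_var('42', FILTER_VALIDATE_INT, ['options' => ['min_range' => 0, 'max_range' => 150]]);
$bad   = filter_var('42abc', FILTER_VALIDATE_INT);                             // false
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$flag  = filter_var('yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);   // true
$junk  = filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE); // null

foreach (['age' => $age, 'bad' => $bad, 'email' => $email, 'flag' => $flag, 'junk' => $junk] as $name => $value) {
    echo $name, ': ', var_export($value, true), "\n";
}
```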
Generally, accepting any input is like opening a Pandora's Box, and while you'll probably never be able to block 100% of the weaknesses (people are always looking to find a way in), you can block the common ones to save you headaches.
Finally remember to sanitize ALL external data. Just because you make a dropdown input doesn't mean some shady person can't send their own data instead!
Use mysql_real_escape_string();
mysql_query("INSERT INTO table(col) VALUES('" . mysql_real_escape_string($_POST['data']) . "')");
You should use prepared statements when inserting data into the database, not any sort of escaping. (PHP manual: prepared statements in pdo and mysqli.)
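A minimal PDO sketch of the prepared-statement approach. An in-memory SQLite database (pdo_sqlite) stands in for MySQL so the example is self-contained; with MySQL only the DSN and credentials would differ. The `posts` table and the sample input are hypothetical.

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

// The value is bound as data, never spliced into the SQL string.
$body = "It's <b>bold</b>'); DROP TABLE posts; --";
$stmt = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$stmt->execute([':body' => $body]);

// What comes back is exactly what the user sent, with no escaping artefacts...
$stored = $pdo->query('SELECT body FROM posts')->fetchColumn();

// ...and HTML escaping happens only when it is written into a page.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n";
```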
Sanitization for HTML output should, as mentioned by others, happen when you go to take data out of the database and merge it into a page, not before.
Turn off register_globals and magic_quotes, use mysql_real_escape_string on any string coming from the user before placing it into your query.
Of course mysql_real_escape_string
When dealing with any kind of input, start from the "I won't allow anything" standpoint and whitelist only what is deemed acceptable.
On insert you need to make sure that the data is MySQL-escaped. For this, use mysql_real_escape_string.
Before showing the data you will need to strip out unsafe HTML and/or JavaScript code. Many people choose to store the sanitised version in the database. Others prefer to strip the ugly HTML from the string before rendering.
You do this in PHP with some filtering. An example is Drupal's filter_xss function:
// drupal_validate_utf8() and _filter_xss_split() are Drupal helpers not shown here.
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
  // Only operate on valid UTF-8 strings. This is necessary to prevent cross
  // site scripting issues on Internet Explorer 6.
  if (!drupal_validate_utf8($string)) {
    return '';
  }
  // Store the input format
  _filter_xss_split($allowed_tags, TRUE);
  // Remove NUL characters (ignored by some browsers)
  $string = str_replace(chr(0), '', $string);
  // Remove Netscape 4 JS entities
  $string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
  // Defuse all HTML entities
  $string = str_replace('&', '&amp;', $string);
  // Change back only well-formed entities in our whitelist
  // Decimal numeric entities
  $string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
  // Hexadecimal numeric entities
  $string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
  // Named entities
  $string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
  return preg_replace_callback('%
    (
    <(?=[^a-zA-Z!/])  # a lone <
    |                 # or
    <!--.*?-->        # a comment
    |                 # or
    <[^>]*(>|$)       # a string that starts with a <, up until the > or the end of the string
    |                 # or
    >                 # just a >
    )%x', '_filter_xss_split', $string);
}
Well, there is not much to do when it comes to inserting data from a textarea into a MySQL database.
For strings placed into a query, MySQL's requirements are not complicated.
There are only two rules to follow:
inserted data should be surrounded by quotes.
special characters in the data should be escaped.
Note that this operation has nothing to do with security. It's a syntax requirement.
Assuming you're adding quotes already, the only thing you have to add is escaping. Depending on your encoding, you can use the addslashes, mysql_escape_string or mysql_real_escape_string functions.
However, other parts of the query require more attention. If you're curious, refer to my earlier answer with a complete guide: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
HTML tags have nothing to do with the database and require no special attention.
However, displaying data from an untrusted source requires some precautions. These have already been described in this topic; the only thing I'd add is that you can't rely on strip_tags when it is used with the second parameter.
You can use mysql_real_escape_string, you can also use htmlentities with addslashes... or you can use all 3 together also...

php - safest way to ensure plain text

What is the most secure way to stop users adding HTML or JavaScript to a field? I am adding a YouTube-style 'description' where users can explain their work, but I don't want anything other than plain text in there, and preferably none of the htmlentities rubbish like '&lt;' or '&gt;'.
Could I do something like this:
$clean = htmlentities($_POST['description']);
if ($clean != $_POST['description']) ... then return the form with an error?
Have you seen strip_tags?
strip_tags() would probably be the best bet.
You don't need to check the cleaned code vs the original and throw an error. As long as it is cleaned, you should be able to display it. Just throw away the original comment. You can put a note under the textbox saying that no html is allowed if you want to make it more user friendly.
Use strip_tags() instead of htmlentities().
And the method is ok.
htmlspecialchars(), if used properly (see comments), is the safest way to ensure plain text. There is no way to inject any HTML or JavaScript when the output has all the HTML special characters escaped. If you use strip_tags, you will prevent your users from using completely legitimate characters.
Also don't forget mysql_real_escape_string() if you are storing data in MySQL.
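A small comparison backing the htmlspecialchars answer (the description text is made up). The entities only exist in the HTML source; the browser displays the original characters, so the "rubbish" the question worries about never reaches the reader. strip_tags, by contrast, treats `<3` as the start of a tag and discards everything after it.

```php
<?php
$description = 'I <3 PHP & "plain text"';

// Every character is kept and made inert; a browser shows the original text.
$escaped = htmlspecialchars($description, ENT_QUOTES, 'UTF-8');
// I &lt;3 PHP &amp; &quot;plain text&quot;

// strip_tags loses legitimate text that merely looks like a tag.
$stripped = strip_tags($description);

echo $escaped, "\n", $stripped, "\n";
```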
