I read that even if you strip <script> you are still vulnerable to XSS.
Something interesting I found as an answer is this <scrip<script></script>t>alert(1337)</script>
How do you evaluate this preg match?
echo preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $var);
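To evaluate that pattern, here is a minimal sketch that runs it once over the nested payload quoted above. The variable name $filtered is illustrative. Because preg_replace does not rescan its own output, removing the inner pair splices the outer fragments into a working tag:

```php
<?php
// Payload quoted in the question: a <script> pair nested inside a broken one.
$var = '<scrip<script></script>t>alert(1337)</script>';

// Single pass with the pattern from the question.
$filtered = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $var);

echo $filtered, PHP_EOL; // <script>alert(1337)</script>
```

So a single pass over this input produces exactly the tag it was meant to remove.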
Additionally, are there any other tags I should be aware of for XSS attacks?
strip_tags without an allowed-tags list is sufficient to get rid of XSS issues when the result is output as element content (it does not escape quotes, so it is not enough inside attribute values). But using a single regex is not, as you need to cleanse and whitelist all HTML attributes and tags. Browsers are extremely forgiving and accept even malformed, non-standards-compliant HTML (plus IE bugs). That's why it is pretty much infeasible to use a regex for that. (Despite the silly SO meme, it is possible to match HTML with a contemporary regex language; it is just way too much effort.)
All the regex solutions you will find are blacklists, which are not considered a reliable solution. They will miss half of the possible exploits: http://ha.ckers.org/xss.html
Regular expressions are not sufficient to filter dangerous HTML. You must properly parse the HTML, and drop malformed tags as well as non-whitelisted tags. Use an existing library such as HTML purifier; it is far too easy to get this wrong.
You could try eliminating script tags in a while loop, until there are no more script tags to be found:
while (preg_match("'[<]script.*?/script[>]'is", $data))
{
    $data = preg_replace("'[<]script.*?/script[>]'is", "", $data);
}
You should also check on* event-handler attributes such as onclick, onfocus, etc. They can also carry unwanted XSS.
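As a rough sketch of what the loop does and does not achieve, the hypothetical helper below (the name stripScriptsRepeatedly is an assumption, the pattern is the one from the loop) repeats the replacement until nothing changes. It handles the nested payload, but a payload that uses an event handler never contains a script tag at all:

```php
<?php
// Hypothetical helper name; the pattern is the one used in the while loop.
function stripScriptsRepeatedly(string $data): string
{
    do {
        // $count receives the number of replacements made in this pass.
        $data = preg_replace("'[<]script.*?/script[>]'is", '', $data, -1, $count);
    } while ($count > 0);

    return $data;
}

// The nested payload is fully removed after two passes...
var_dump(stripScriptsRepeatedly('<scrip<script></script>t>alert(1337)</script>'));
// ...but an event-handler payload contains no script tag and is left untouched.
var_dump(stripScriptsRepeatedly('<img src="x" onerror="alert(1)">'));
```

This is still a blacklist, which is why the answers above recommend a whitelist-based parser instead.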
I recently started to use NicEdit on my "Article Entry" page. However, I have some questions about security and preventing abuse.
First question:
I currently sanitize every input with "mysql_real_escape_string()" in my database class. In addition, I sanitize HTML values with "htmlspecialchars(htmlentities(strip_tags($var)))".
How would you sanitize your "HTML inputs" while adding them to the database, or is the way I'm doing it fine?
Second question:
While I was writing this question, I saw a question with a similar title, so I read it. It was someone speaking about "abused HTML inputs" used to mess with his valid template. (e.g just input)
It may occur on my current system too. How should it be dealt with in PHP?
Ps. I want to keep using NicEdit, so using BBCode system should be the last advice.
Thank you.
mysql_real_escape_string is not sanitization, it escapes text values to keep the syntax of the SQL query valid/unambiguous/injection safe.
strip_tags is sanitizing your string.
Doing both htmlentities and htmlspecialchars in order is overkill and may just garble your data. Since you're also stripping tags right before that, it's double overkill.
The rule is to make sure your data doesn't break your SQL syntax, therefore you mysql_real_escape_string once before putting the data into the query. You also do the same thing, protecting your HTML syntax, by HTML escaping text before outputting it into HTML, using either htmlspecialchars (recommended) or htmlentities, not both.
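To see why stacking htmlentities and htmlspecialchars garbles data, here is a small sketch. The sample string is made up, and the flags and encoding are passed explicitly so the result does not depend on version defaults:

```php
<?php
$text = 'Tom & Jerry <b>';

// Escaping once yields entities the browser renders as the original text.
$once = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Escaping an already escaped string encodes the ampersands again.
$twice = htmlspecialchars(htmlentities($text, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');

echo $once, PHP_EOL;  // Tom &amp; Jerry &lt;b&gt;
echo $twice, PHP_EOL; // Tom &amp;amp; Jerry &amp;lt;b&amp;gt;
```

A browser would show the second string as literal "&amp;" and "&lt;b&gt;" text rather than the original characters.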
For a much more in-depth excursion into all this read The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
I don't know NicEdit, but I assume it allows your users to style text using HTML behind the scenes. Why are you stripping the HTML from the data then? There's no point in using a WYSIWYG editor then.
This is a function I am using in one of my NicEdit applications, and it seems to do well with the code that comes out of NicEdit.
function cleanFromEditor($text) {
    // try to decode HTML before we clean it, then we submit to the database
    $text = stripslashes(html_entity_decode($text));
    // clean out tags that we don't want in the text
    // (<i> is allowed so the <i> to <em> conversion below has something to match;
    // note that strip_tags keeps the attributes of allowed tags)
    $text = strip_tags($text, '<p><div><strong><em><ul><ol><li><u><blockquote><br><sub><img><a><h1><h2><h3><span><b><i>');
    // conversion elements
    $conversion = array(
        '<br>' => '<br />',
        '<b>'  => '<strong>',
        '</b>' => '</strong>',
        '<i>'  => '<em>',
        '</i>' => '</em>'
    );
    // clean up the old html with new
    foreach ($conversion as $old => $new) {
        $text = str_replace($old, $new, $text);
    }
    // escape for the SQL query, then entity-encode the result for storage
    return htmlentities(mysql_real_escape_string($text));
}
I have html stored in the database and I need to output it to the page.
If I don't escape() it, then I get the bold formatting I want, but I run the risk of getting an XSS from the unescaped html source.
If I escape() it, then it shows the raw html code <b>bold text</b> instead of bold text.
How can I escape everything, except some tags? I'm thinking to apply the escape(), then search for the <b> and </b> and unescape them. Would that work? Any security problems you see with it? I'm also not sure how I would search for the <b></b> tags. Regex for that maybe or what?
P.S. the escape() I mean is a function in Zend. I believe it's the equivalent of htmlspecialchars().
Unescaping is the way to go. If you only whitelist a couple of tags to be converted back from the html escapes, then you won't run into XSS exploits.
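A minimal sketch of that escape-then-unescape idea, assuming only the exact tags <b> and </b> are whitelisted (the function name escapeExceptBold is hypothetical):

```php
<?php
// Hypothetical helper: escape everything, then restore only bare <b> and </b>.
function escapeExceptBold(string $html): string
{
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

    // Only the exact escaped forms are converted back, so a <b> tag that
    // carries attributes (e.g. onclick) stays escaped.
    return str_replace(
        ['&lt;b&gt;', '&lt;/b&gt;'],
        ['<b>', '</b>'],
        $escaped
    );
}

echo escapeExceptBold('<b>bold</b> <script>alert(1)</script>'), PHP_EOL;
```

This does not check that the restored tags are balanced, so an unclosed <b> can still affect the rest of the page's formatting.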
Workaround markups provide no advantage regarding that, as the many failed BBcode parsers prove.
(Instead of converting back and forth it might however be sensible to utilize HTMLPurifier instead.)
If the HTML-markup in the database comes from users you do not trust, you should give them access to markdown or similar 'safe' editing environments, so they can prepare the markup they want and not be allowed to inject HTML.
Attempts to perform selective filtering are frequently wrong, and miss ways attackers can inject malicious code. So don't let them write raw HTML.
htmlspecialchars_decode() is the opposite of htmlspecialchars(). It is possible to unescape it, but there's no parameter for restricting tags.
If the HTML is written by the user, it is a bad idea :)
You could use the HTMLPurifier library which will take care of everything you need to do with escaping and such. Here is a nice video explaining how to install it into the zend framework
http://www.zendcasts.com/htmlpurifier-integration/2011/05/
Try strip_tags; its second parameter is $allowable_tags.
Use Zend_Filter_StripTags class and as argument for the constructor use an array with following keys:
'allowTags' => Tags which are allowed
'allowAttribs' => Attributes which are allowed
This second option lets you trim all unwanted attributes like 'onClick', which can be as dangerous as <script> code, while leaving 'src' for <img> or 'href' for <a>.
Create your own view helper, or use setEscape() in the controller. See http://framework.zend.com/manual/en/zend.view.scripts.html#zend.view.scripts.escaping
And I'm talking (especially) forums here - [PHP]code here[/PHP] - style. Some forums escape double quotes or other "dangerous characters" and others don't.
What is the best method? What are you guys using?
Can it be done without the fear of code injection?
Edit: Who said anything about reinventing the wheel?
When PHP echoes or prints text, it never executes it. That only happens with eval. This means that if you did this:
echo '<?php ... ?>';
it would carry through to the page output and not be parsed or executed.
This means that all you need to do is escape the usual characters (<, >, &, etc.) and you should generally be safe.
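As a minimal sketch of that escaping step, assuming a user-submitted snippet held in a variable (the names $snippet and $safe are illustrative), htmlspecialchars covers the usual characters in one call:

```php
<?php
// Hypothetical user-submitted snippet that should be displayed, not run.
$snippet = '<?php echo "hi"; ?>';

// Convert <, >, &, and quotes to entities so the browser shows them as text.
$safe = htmlspecialchars($snippet, ENT_QUOTES, 'UTF-8');

echo '<pre><code>' . $safe . '</code></pre>', PHP_EOL;
```

The <pre><code> wrapper is only a presentational choice for keeping whitespace.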
Don't reinvent the wheel. I see BBCode in your question. Grab a markdown library and use it instead. SO uses this: http://daringfireball.net/projects/markdown/
There is no fear of PHP code injection (unless you are doing something unusual like eval'ing HTML templates), but there is always a fear of JS code injection, often called XSS. All the danger comes from possible JS code.
Thus, there is no special treatment for PHP code shown on an HTML page. Just treat it as any other data. The < and > brackets are usually escaped, for obvious reasons.
Don't reinvent the wheel. PHP has its highlight_string function for this.
If you see escaped quotes on some page, that's most likely because their script escaped them twice (for example magic_quotes did it once, then mysql_real_escape_string() again). When data sanitisation is done properly, you should not see escape characters in output.
I was reading an article about form security because I have a form in which a user can add messages.
I read that it was best to use strip_tags(), htmlspecialchars() and nl2br(). Somewhere else it is being said to use html_entity_decode().
I have this code in my page which takes the user input
<?php
$topicmessage = check_input($_POST['message']); //protect against SQLinjection
$topicmessage = strip_tags($topicmessage, "<p><a><span>");
$topicmessage = htmlspecialchars($topicmessage);
$topicmessage = nl2br($topicmessage);
?>
But when I echo the message, it's all on one line, and it appears that the breaks have been removed by strip_tags and not put back by nl2br().
To me, that makes sense why it does that, because if the break has been removed, how does it know where to put it back (or does it)?
Anyway, I'm looking for a way to protect my form from being used to hack the site, for example by putting JavaScript in the form.
You have 2 choices:
Allow absolutely no HTML. Use strip_tags() with NO allowed tags, or htmlspecialchars() to escape any tags that may be in there.
Allow HTML, but you need to sanitize the HTML. This is NOT something you can do with strip_tags. Use a library (Such as HTMLPurifier)...
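For the first choice, a minimal sketch might look like the following (the helper name renderPlainMessage is an assumption). The order matters: escape first, then let nl2br add the line breaks, otherwise the inserted <br /> tags would be escaped too.

```php
<?php
// Hypothetical helper for option 1: no HTML allowed, line breaks preserved.
function renderPlainMessage(string $message): string
{
    // Escape first so user markup becomes text, then add <br /> tags.
    // Reversing the order would escape the <br /> tags as well.
    return nl2br(htmlspecialchars($message, ENT_QUOTES, 'UTF-8'));
}

echo renderPlainMessage("Hello <b>world</b>\nSecond line"), PHP_EOL;
```

This runs at output time; the raw message can be stored unchanged in the database.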
You just need htmlspecialchars before printing form content, and mysql_real_escape_string before putting it into SQL (you don't need it before printing), and you should be good.
Stripping tags your way is very dangerous: you need a short list of allowed tags with limited attributes, which is not something you can do in one line. You might want to look into HTML normalizers, like Tidy.
Use HTML Purifier for HTML input and strip everything you don't want: everything but paragraphs, anchors, etc.
Unrelated but important:
sprintf for stuff like "only digits from that field".
mysql_real_escape_string on all insert queries in general.
I have one problem regarding the data insertion in PHP.
In my site there is a message system.
So when my inbox loads it gives one JavaScript alert.
I searched my site thoroughly and finally found that someone had sent me a message with the text below.
<script>
alert(5)
</script>
So how can I restrict the script code being inserted in my database?
I am running on PHP.
There is no problem with JavaScript code being stored in the database. The actual problem is with non-HTML content being taken from the database and displayed to the user as if it were HTML. The correct approach would be to make sure your rendering code treats text as text, not as HTML.
In PHP, this would be done by calling htmlspecialchars on the inbox contents when displaying the inbox (possibly along with nl2br and maybe turning links to <a> tags).
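A rough sketch of such a display step follows. The helper name renderInboxMessage and the deliberately simple URL pattern are assumptions, not part of any library:

```php
<?php
// Hypothetical helper name; the URL pattern is a deliberately simple assumption.
function renderInboxMessage(string $text): string
{
    // 1. Treat the stored message as text, not HTML.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // 2. Turn bare http(s) URLs into links. Quotes and angle brackets are
    //    already entities at this point, so the URL cannot break out of href.
    $html = preg_replace('~\bhttps?://[^\s<]+~i', '<a href="$0">$0</a>', $html);

    // 3. Keep the sender's line breaks.
    return nl2br($html);
}

echo renderInboxMessage("<script>\nalert(5)\n</script>\nSee http://example.com/"), PHP_EOL;
```

With this in place the stored script message shows up as visible text in the inbox instead of running.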
Avoid using strip_tags for text content: as a user, I might want to type a message like:
... and to create a link, use <a href="...">your-text-here</a> ...
strip_tags would eliminate the tag, whereas htmlspecialchars would make the text appear as it was typed.
You should not restrict it to be inserted into the database (if StackOverflow would restrict it, we would not be able to post code examples here!)
You should better control how you display it. For instance, add htmlentities() or htmlspecialchars() to your echo call.
This is called XSS. There are numerous threads about it on SO.
How to prevent XSS with HTML/PHP?
What are the best practices for avoid xss attacks in a PHP site?
XSS Attacks Prevention
Is preventing XSS and SQL Injection as easy as does this…?
You should use strip_tags. If you still want to allow some HTML, then add a whitelist in the second parameter.
I should add a really big caveat here. If you're leaving any tags in a strip_tags whitelist, you can still be susceptible to JavaScript injection. Assume you're allowing the seemingly innocuous tags <strong> and <em>: strip_tags will still allow all attributes on them, including event handlers like <strong onmouseover="window.location.href='http://mydodgysite.com'">this</strong>.
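A quick sketch of that caveat, using a made-up input string: the disallowed script tags are removed, but the event handler on the allowed tag passes through unchanged.

```php
<?php
$input = '<strong onmouseover="alert(1)">this</strong><script>x()</script>';

// Only <strong> and <em> are allowed, yet the attribute is kept as-is.
$result = strip_tags($input, '<strong><em>');

echo $result, PHP_EOL; // <strong onmouseover="alert(1)">this</strong>x()
```

Note that strip_tags removes the script tags themselves but keeps the text between them.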
You have a couple of serious options:
strip_tags with no whitelist. Safe, but doesn't allow for any formatting, and may cause problems with strings like this: "x<y, but y>4" --> "x4"
htmlentities. Use this when displaying the data on the screen (not on the data before you put it in the database). It's safe, but doesn't allow for formatting.
A different markup system than HTML, for example: Markdown, Wiki markup, BB Code. Requires rendering to convert back to HTML, but it's mostly safe and can be quite flexible.
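The first two options can be compared on the sample string from the list above. This is only a sketch, with the flags and encoding passed explicitly:

```php
<?php
$text = 'x<y, but y>4';

// strip_tags treats "<y, but y>" as a tag and drops it.
$stripped = strip_tags($text);

// htmlspecialchars keeps every character and only encodes the brackets.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $stripped, PHP_EOL; // x4
echo $escaped, PHP_EOL;  // x&lt;y, but y&gt;4
```

Both results are safe to output as element content, but only the escaped one still shows what the user typed.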
User input should be escaped before outputting it.
Whenever you're displaying something a user submitted, run it through htmlspecialchars() first. This'll turn HTML code into safe output.
Take a look at the htmlspecialchars() function. It converts < > " and & (and ' when ENT_QUOTES is set) to their HTML entity equivalents, meaning <script> will become &lt;script&gt;
You can use strip_tags(). The second argument of this function will allow you to list an explicit list of which tags are allowable:
// Allow <p> and <a>, <script> will be stripped
echo strip_tags($text, '<p><a>');
You may also consider htmlspecialchars(), which converts characters like < into &lt;, causing the browser to interpret them as text rather than code:
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
If I understand you right, you're just looking for two simple commands:
$message = str_replace("<", "&lt;", $message);
$message = str_replace(">", "&gt;", $message);