php - safest way to ensure plain text - php

What is the most secure way to stop users adding HTML or JavaScript to a field? I am adding a YouTube-style 'description' where users can explain their work, but I don't want anything other than plain text in there, and preferably none of the htmlentities rubbish like '&lt;' or '&gt;'.
Could I do something like this:
$clean = htmlentities($_POST['description']);
if ($clean != $_POST['description']) ... then return the form with an error?

Have you seen strip_tags?

strip_tags() would probably be the best bet.
You don't need to check the cleaned code vs the original and throw an error. As long as it is cleaned, you should be able to display it. Just throw away the original comment. You can put a note under the textbox saying that no html is allowed if you want to make it more user friendly.

Use strip_tags() instead of htmlentities().
And the method is ok.

htmlspecialchars(), if used properly (see comments), is the safest way to ensure plain text. There is no way to inject any HTML or JavaScript when the output has all the HTML special characters escaped. If you use strip_tags, you will prevent your users from using completely legitimate characters.
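As a minimal sketch of what "used properly" could look like, the helper below escapes on output with ENT_QUOTES and an explicit charset. The function name plain_text() and the sample input are assumptions made for illustration, not part of the original answer.

```php
<?php
// Hypothetical helper name; not taken from the original answers.
function plain_text(string $input): string
{
    // ENT_QUOTES escapes both single and double quotes, so the result is
    // also safe inside a quoted HTML attribute value.
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

// Sample input containing markup and ordinary special characters.
$description = '<script>alert(1)</script> 3 < 5 & "quotes"';
echo plain_text($description), "\n";
```

Nothing is removed from the text: characters such as < and & are kept and simply shown as typed.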

Also don't forget mysql_real_escape_string() if you are storing data in MySQL.

Related

How to sanitize HTML POST values of NicEdit?

I recently started to use NicEdit on my "Article Entry" page. However, I have some questions about security and preventing abuse.
First question:
I currently sanitize every input with "mysql_real_escape_string()" in my database class. In addition, I sanitize HTML values with "htmlspecialchars(htmlentities(strip_tags($var)))".
How would you sanitize your "HTML inputs" when adding them to the database, or does the way I'm doing it work fine?
Second question:
While I was writing this question, I saw a question with a similar title, so I read it. Someone was describing "abused HTML inputs" that broke his otherwise valid template. (e.g. just input)
It may occur on my current system too. How should it be dealt with in PHP?
Ps. I want to keep using NicEdit, so using BBCode system should be the last advice.
Thank you.
mysql_real_escape_string is not sanitization, it escapes text values to keep the syntax of the SQL query valid/unambiguous/injection safe.
strip_tags is sanitizing your string.
Doing both htmlentities and htmlspecialchars in order is overkill and may just garble your data. Since you're also stripping tags right before that, it's double overkill.
The rule is to make sure your data doesn't break your SQL syntax, therefore you mysql_real_escape_string once before putting the data into the query. You also do the same thing, protecting your HTML syntax, by HTML escaping text before outputting it into HTML, using either htmlspecialchars (recommended) or htmlentities, not both.
For a much more in-depth excursion into all this read The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
I don't know NicEdit, but I assume it allows your users to style text using HTML behind the scenes. Why are you stripping the HTML from the data then? There's no point in using a WYSIWYG editor then.
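The rule in the answer above (protect the SQL syntax once when storing, protect the HTML syntax once when outputting) can be sketched as follows. This is not the answerer's code: it swaps mysql_real_escape_string() for a PDO prepared statement, and the in-memory SQLite database and the articles table are assumptions chosen so the example is self-contained.

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE articles (body TEXT)'); // hypothetical table

$raw = "It's <b>bold</b> & risky";

// Storage: a bound parameter keeps the value out of the SQL syntax.
$stmt = $pdo->prepare('INSERT INTO articles (body) VALUES (:body)');
$stmt->execute([':body' => $raw]);

// Output: escape once, at the point where the text enters HTML.
$stored = $pdo->query('SELECT body FROM articles')->fetchColumn();
$html = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
echo $html, "\n";
```

The stored value is byte-for-byte what the user submitted; only the copy written into the page is escaped.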
This is a function I am using in one of my NICEDIT applications and it seems to do well with the code that comes out of nicedit.
function cleanFromEditor($text) {
    // try to decode html before we clean it, then we submit to database
    $text = stripslashes(html_entity_decode($text));

    // clean out tags that we don't want in the text
    $text = strip_tags($text, '<p><div><strong><em><ul><ol><li><u><blockquote><br><sub><img><a><h1><h2><h3><span><b>');

    // conversion elements
    $conversion = array(
        '<br>' => '<br />',
        '<b>'  => '<strong>',
        '</b>' => '</strong>',
        '<i>'  => '<em>',
        '</i>' => '</em>'
    );

    // clean up the old html with new
    foreach ($conversion as $old => $new) {
        $text = str_replace($old, $new, $text);
    }

    return htmlentities(mysql_real_escape_string($text));
}

How to make a textarea output textarea code without breaking?

I created a form where users can enter HTML code, and it outputs their code in another textarea. The problem is that if the HTML the user enters contains a textarea, the </textarea> in their code breaks my textarea form. I see other sites display any HTML correctly, so how is this done without breaking the form, while still letting the user copy it so that it remains as </textarea> and not some converted code when they paste it on their webpage?
Ah crap yeah I figured it out, in fact the problem wasn't with the htmlspecialchars code alone I forgot to add a return to one of my functions haha. Thanks guys.
Represent characters that have special meaning in HTML using entities. Since you are using PHP, use htmlspecialchars
There are many ways to do this. The easiest is to use htmlspecialchars or htmlentities on the user's input. This displays a visible </textarea> inside the textarea box without closing it, because the tag is actually turned into &lt;/textarea&gt;. htmlspecialchars transforms fewer characters than htmlentities and usually makes more sense in a situation like this, but do your research.
strip_tags() is also a possibility.
You can also use a regular expression with PCRE, or even str_replace() or other string manipulation functions to strip off the textarea, convert the special characters, etc.
PECL also has a BB code extension you can use if you still want your users to be able to enter some form of tags to style their output.
<textarea><?php echo htmlentities($code); ?></textarea>
You have to transform the html code into symbols, so it is not treated as html.
Use the function htmlentities() on the textarea content before echoing it.
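A small sketch of that advice, assuming a hypothetical render_code_textarea() function and illustrative rows/cols values: the user's code is escaped before it is placed between the textarea tags, so a </textarea> inside it cannot close the surrounding element.

```php
<?php
// Hypothetical function name, used here for illustration only.
function render_code_textarea(string $code): string
{
    // Escaping turns the user's </textarea> into text, so it cannot close ours.
    $escaped = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');
    return '<textarea rows="5" cols="60">' . $escaped . '</textarea>';
}

// Sample user input that contains its own textarea.
$userCode = '<form><textarea name="x"></textarea></form>';
echo render_code_textarea($userCode), "\n";
```

The browser decodes the entities when it displays the textarea, so what the user sees and copies is the original markup, not the escaped form.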

unescaping some HTML tags after they've been escaped for XSS

I have html stored in the database and I need to output it to the page.
If I don't escape() it, then I get the bold formatting I want, but I run the risk of getting an XSS from the unescaped html source.
If I escape() it, then it shows the raw html code <b>bold text</b> instead of bold text.
How can I escape everything, except some tags? I'm thinking to apply the escape(), then search for the <b> and </b> and unescape them. Would that work? Any security problems you see with it? I'm also not sure how I would search for the <b></b> tags. Regex for that maybe or what?
P.S. the escape() I mean is a function in Zend. I believe it's the equivalent of htmlspecialchars().
Unescaping is the way to go. If you only whitelist a couple of tags to be converted back from the html escapes, then you won't run into XSS exploits.
Workaround markups provide no advantage regarding that, as the many failed BBcode parsers prove.
(Instead of converting back and forth it might however be sensible to utilize HTMLPurifier instead.)
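The escape-then-unescape idea could be sketched like this. It is only an illustration under stated assumptions: the helper name escape_allowing_bold() and the two-entry whitelist are hypothetical, and only exact, attribute-free <b> and </b> tags are restored.

```php
<?php
// Hypothetical helper; the tag whitelist is an assumption for this sketch.
function escape_allowing_bold(string $html): string
{
    // Step 1: escape everything, so no markup survives.
    $escaped = htmlspecialchars($html, ENT_QUOTES, 'UTF-8');

    // Step 2: restore only exact, attribute-free tags. Anything else,
    // such as <b onclick="...">, stays escaped.
    $restore = [
        '&lt;b&gt;'  => '<b>',
        '&lt;/b&gt;' => '</b>',
    ];
    return strtr($escaped, $restore);
}

echo escape_allowing_bold('<b>bold</b> <b onclick="x()">no</b> <script>y()</script>'), "\n";
```

Because the match is on the whole escaped tag, attributes can never ride along. A stray </b> can still be restored without its opening tag, which leaves unbalanced markup; that is one reason a dedicated library may be the more robust choice.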
If the HTML-markup in the database comes from users you do not trust, you should give them access to markdown or similar 'safe' editing environments, so they can prepare the markup they want and not be allowed to inject HTML.
Attempts to perform selective filtering are frequently wrong, and miss ways attackers can inject malicious code. So don't let them write raw HTML.
htmlspecialchars_decode() is the opposite of htmlspecialchars(). It is possible to unescape it, but there's no parameter for restricting tags.
If the HTML is written by the user, it is a bad idea :)
You could use the HTMLPurifier library which will take care of everything you need to do with escaping and such. Here is a nice video explaining how to install it into the zend framework
http://www.zendcasts.com/htmlpurifier-integration/2011/05/
Try strip_tags(); its second parameter is $allowable_tags.
Use Zend_Filter_StripTags class and as argument for the constructor use an array with following keys:
'allowTags' => Tags which are allowed
'allowAttribs' => Attributes which are allowed
This second part allows you to remove all unwanted attributes like 'onClick', which can be as dangerous as <script> code, while keeping 'src' for <img> or 'href' for <a>.
Create your own view helper or you can also use setEscape() in controller See http://framework.zend.com/manual/en/zend.view.scripts.html#zend.view.scripts.escaping

How to secure a form?

I was reading an article about form security because I have a form in which a user can add messages.
I read that it was best to use strip_tags(), htmlspecialchars() and nl2br(). Somewhere else it is being said to use html_entity_decode().
I have this code in my page which takes the user input
<?php
$topicmessage = check_input($_POST['message']); //protect against SQLinjection
$topicmessage = strip_tags($topicmessage, "<p><a><span>");
$topicmessage = htmlspecialchars($topicmessage);
$topicmessage = nl2br($topicmessage);
?>
But when I echo the message, it's all on one line, and it appears that the breaks have been removed by strip_tags() and not put back by nl2br().
To me, it makes sense why it does that, because if the break has been removed, how does it know where to put it back (or does it)?
Anyway, I'm looking for a way to protect my form from being used to try to hack the site, for example by submitting JavaScript through the form.
You have 2 choices:
Allow absolutely no HTML. Use strip_tags() with NO allowed tags, or htmlspecialchars() to escape any tags that may be in there.
Allow HTML, but you need to sanitize the HTML. This is NOT something you can do with strip_tags. Use a library (Such as HTMLPurifier)...
You just need htmlspecialchars before printing form content, and mysql_real_escape_string before putting it into SQL (you don't need it before printing), and you should be good.
Stripping tags your way is very dangerous. You need a short list of allowed tags with limited attributes, and that is not something you can do in one line. You might want to look into HTML normalizers, like Tidy.
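For the "no HTML at all" case, escaping before printing and then restoring line breaks might look like the sketch below. The helper name format_message() is an assumption; the point it illustrates is the order of the two calls.

```php
<?php
// Hypothetical helper name for this sketch.
function format_message(string $message): string
{
    // Escape first, then add <br /> tags; reversing the order would
    // escape the <br /> tags that nl2br() inserts.
    return nl2br(htmlspecialchars($message, ENT_QUOTES, 'UTF-8'));
}

echo format_message("Line one\n<script>alert(1)</script>"), "\n";
```

nl2br() works on the newline characters in the text, so it does not depend on any tags having survived earlier processing.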
Use HTML Purifier for HTML input and strip everything you don't want: all but paragraphs, all anchors, etc.
Unrelated but important:
sprintf() for stuff like "only digits from that field".
mysql_real_escape_string() on all insert queries in general.

How to restrict a JavaScript script being inserted in the database with PHP?

I have one problem regarding the data insertion in PHP.
In my site there is a message system.
So when my inbox loads it gives one JavaScript alert.
I searched my site thoroughly and finally found that someone had sent me a message with the text below.
<script>
alert(5)
</script>
So how can I restrict the script code being inserted in my database?
I am running on PHP.
There is no problem with JavaScript code being stored in the database. The actual problem is with non-HTML content being taken from the database and displayed to the user as if it were HTML. The correct approach would be to make sure your rendering code treats text as text, not as HTML.
In PHP, this would be done by calling htmlspecialchars on the inbox contents when displaying the inbox (possibly along with nl2br and maybe turning links to <a> tags).
Avoid using strip_tags for text content: as a user, I might want to type a message like:
... and to create a link, use <a href="...">your-text-here</a> ...
strip_tags would eliminate the tag, whereas htmlspecialchars would make the text appear as it was typed.
You should not restrict it to be inserted into the database (if StackOverflow would restrict it, we would not be able to post code examples here!)
You should better control how you display it. For instance, add htmlentities() or htmlspecialchars() to your echo call.
This is called XSS. There are numerous threads about it on SO.
How to prevent XSS with HTML/PHP?
What are the best practices for avoid xss attacks in a PHP site?
XSS Attacks Prevention
Is preventing XSS and SQL Injection as easy as does this…?
You should use strip_tags. If you still want to allow some HTML, then add a whitelist in the second parameter.
I should add a really big caveat here. If you leave any tags in a strip_tags whitelist, you can still be susceptible to JavaScript injection. Assume you're allowing the seemingly innocuous tags <strong> and <em>:
strip_tags will still allow all attributes, including event handlers
like <strong onmouseover="window.location.href='http://mydodgysite.com'">this</strong>.
You have a couple of serious options:
strip_tags with no whitelist. Safe, but doesn't allow for any formatting, and may cause problems with strings like this: "x<y, but y>4" --> "x4"
htmlentities. Use this when displaying the data on the screen (not on the data before you put it in the database). It's safe, but doesn't allow for formatting.
A different markup system than HTML, for example: Markdown, Wiki markup, BB Code. Requires rendering to convert back to HTML, but it's mostly safe and can be quite flexible.
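The two strip_tags() caveats above can be checked with a short sketch. The sample strings are illustrative assumptions; the calls are plain strip_tags() and htmlspecialchars().

```php
<?php
// A whitelisted tag keeps all of its attributes, event handlers included.
$input = '<strong onmouseover="alert(1)">this</strong>';
$kept = strip_tags($input, '<strong>');
var_dump($kept === $input);

// With no whitelist, text that merely looks like a tag is removed.
$math = strip_tags('x<y, but y>4');
var_dump($math);

// Escaping keeps every character and renders it as text.
$safe = htmlspecialchars('x<y, but y>4', ENT_QUOTES, 'UTF-8');
echo $safe, "\n";
```

The first check shows why a whitelist alone is not a sanitizer, and the second shows why stripping can silently damage legitimate text that escaping would preserve.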
User input should be escaped before outputting it.
Whenever you're displaying something a user submitted, run it through htmlspecialchars() first. This'll turn HTML code into safe output.
Take a look at the htmlspecialchars() function. It converts < > ' " and & to their HTML entity equivalents, meaning <script> will become &lt;script&gt;
You can use strip_tags(). The second argument of this function will allow you to list an explicit list of which tags are allowable:
// Allow <p> and <a>, <script> will be stripped
echo strip_tags($text, '<p><a>');
You may also consider htmlspecialchars(), which converts characters like < into &lt;, causing the browser to interpret them as text, rather than code:
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
If I understand you right, you're just looking for two simple commands:
$message = str_replace("<", "&lt;", $message);
$message = str_replace(">", "&gt;", $message);
