What characters to strip from messages?

What characters to strip from messages? - php

I'm quite surprised I haven't been able to find out what characters I need to strip from a message in order to keep my application safe.
I've got a php app, and most of the inputs are numerical, but I'm adding the ability for users to attache messages, so I need to cleanse the message and strip any characters that could be a threat.
My initial reaction was if I did
$message=addslashes(preg_replace('/[^a-zA-Z0-9\-,& $%\(\)##!\'\"?.]/','',$_POST['message']));
I'd be safe, but I haven't been able to find anything which states what characters can be damaging, and what characters would be safe.

I would say that you don't have to strip any characters from your input, at least generally speaking.
Instead, you must escape your data :
when sending it to your database
see mysql_real_escape_string, mysqli_real_escape_string, PDO::quote
or Prepared statements : MySQLi ; PDO
when sending it to the HTML output
see htmlspecialchars
Still, if you allow users to input HTML, you should take a look at HTMLPurifier, to make sure they are not able to inject any malicious HTML code into your web-pages :
HTML Purifier is a standards-compliant
HTML filter library written in PHP.
HTML Purifier will not only remove all
malicious code (better known as
XSS) with a thoroughly audited, secure yet permissive whitelist, it
will also make sure your documents are
standards compliant

This is where HTML Purifier comes in handy.

Instead of sanitizing your data just use Prepared Statements for database interaction. PDOs eliminate the need of hand santizing all of your input yourself.
PHP Manual

Related

php 5.5 text area input filtering & output escaping

From researching the best way to sanitize HTML form data, I have found that you should "FILTER INPUT" and "ESCAPE OUTPUT."
Unfortunately most of the info on this subject that out there is old information (PHP 4.x, 2009, etc.). Many web pages recommend mysql_real_escape_string which is deprecated as of PHP 5.5.0.
I am on PHP 5.5.x, Apache web server, and a MySQL database. I am using PDO prepared statements. All character sets are UTF-8.
I have looked at the PHP Sanitize filters (found here: http://www.php.net/manual/en/filter.filters.sanitize.php).
I have filtered all inputs and escaped all outputs except for my textareas. The text areas are for the user to "describe an event." These are fairly large at 500 characters. This is the one spot that I feel most vulnerable to malicious code.
The first thing that I do to all input is the trim() function.
What are the best filters to run on input?
How should I be escaping output?
I want the output to be readable.
Thank you in advance.

The answer depends about the input kind you want to filter.
Let's assume you want to only simple plain text in your textarea, so you need to avoid javascript, css and html code also you need to prevent SQL Injection in this input and get only 500 characters of length.
Use strip_tags to avoid HTML, CSS and JS tags.
Use str_replace("'","\'", $string) to prevent SQL Injection parsing
single quote.
Use substr to cut the string in 500 characteres if the
are more than the limit.
I recommend you visit :
https://www.owasp.org/index.php/Top_10_2013-Top_10
Also I recommend use a PHP Framework to avoid the tradicional way in you work with this common security risks.
In my opinion you can use Zend, Code Igniter or Laravel.
http://ellislab.com/codeigniter/user-guide/libraries/input.html
http://framework.zend.com/manual/2.0/en/modules/zend.filter.html
http://laravel.com/docs/validation

Protection against XSS exploits?

I'm newish to PHP but I hear XSS exploits are bad. I know what they are, but how do I protect my sites?

To prevent from XSS attacks, you just have to check and validate properly all user inputted data that you plan on using and dont allow html or javascript code to be inserted from that form.
Or you can you Use htmlspecialchars() to convert HTML characters into HTML entities. So characters like <> that mark the beginning/end of a tag are turned into html entities and you can use strip_tags() to only allow some tags as the function does not strip out harmful attributes like the onclick or onload.

Escape all user data (data in the database from user) with htmlentities() function.
For HTML data (for example from WYSIWYG editors), use HTML Purifier to clean the data before saving it to the database.

strip_tags() if you want to have no tags at all. Meaning anything like <somthinghere>
htmlspecialchars() would covert them to html so the browser will only show and not try to run.
If you want to allow good html i would use something like htmLawed or htmlpurifier

The bad news
Unfortunately, preventing XSS in PHP is a non-trivial undertaking.
Unlike SQL injection, which you can mitigate with prepared statements and carefully selected white-lists, there is no provably secure way to separate the information you are trying to pass to your HTML document from the rest of the document structure.
The good news
However, you can mitigate known attack vectors by being particularly cautious with your escaping (and keeping your software up-to-date).
The most important rule to keep in mind: Always escape on output, never on input. You can safely cache your escaped output if you're concerned about performance, but always store and operate on the unescaped data.
XSS Mitigation Strategies
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Why shouldn't I filter on input?
Attempting to filter XSS on input is premature optimization, which can lead to unexpected vulnerabilities in other places.
For example, a recent WordPress XSS vulnerability employed MySQL column truncation to break their escaping strategy and allow the prematurely escaped payload to be stored unsafely. Don't repeat their mistake.

PHP & MySQL question user submitted data?

I was wondering how can I allow users to enter HTML and CSS to the there profiles safely using PHP and MySQL like they do on MySpace.

You certainly want to carefully sanitize the data and limit it to a set of "unharmful" statements. E.g. http://htmlpurifier.org/ can help you with that.
HTML Purifier is a standards-compliant
HTML filter library written in
PHP. HTML Purifier will not only remove all malicious
code (better known as XSS) with a thoroughly audited,
secure yet permissive whitelist,
it will also make sure your documents are
standards compliant
When you put the data into the database use prepared statements or carefully escape the data.

There are at least two things to consider :
safely saving the HTML data into the database
safely outputing the HTML data on the page.
For the first point, you must avoid SQL injections.
This can be done by escaping your string data, before injecting it into your insert query, with functions such as mysql_real_escape_string, mysqli_real_escape_string, or PDO::quote, depending on the API you are using.
Also, you might be interested by Prepared Statements ; see PDO::prepare, or mysqli_prepare.
For the second point, what matters is only allowing HTML tags and attributes that you consider as safe.
This can be done using a tool such as HTMLPurifier, to filter out all bad/non-accepted tags and attributes, and only keep the subset you whish to allow.
For example, if you consider the following HTML input :
<p>hello, world !</p>
<script type="text/javascript">alert('bad');</script>
<strong>this is <em>some text</strong></em>
HTMLPurifier will transform / purify it to :
<p>hello, world !</p>
<strong>this is <em>some text</em></strong>
Note that :
the <script> tag has been removed
the <em> and <strong> tags have been put back in the right order, making the HTML XHTML valid.

What is the correct/safest way to escape input in a forum?

I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!

When generating HTLM output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the Database engine ; it is better/safer to use a function that fits the engine (MySQL, PostGreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it will let attributes of the allowed tags)
You might want to take a look at a tool called HTMLPUrifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time is has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !

mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be done with htmlspecialchars() to remove HTML from the input and urlencode() to put things in a format for URL's.

There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.

The answer to this post is a good answer
Basically, using the pdo interface to parameterize your queries is much safer and less error prone than escaping your inputs manually.

I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", ":"), "-", "-"), "|", "|"), _
"`", "`"), "(", "("), ")", ")"), _
"%", "%"), "^", "^"), """", """), _
"/", "/"), "*", "*"), "\", "\"), _
"'", "'")

First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.

I second Joeri, do not roll your own, go here to see some of the the many possible XSS attacks
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input as a adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given obve re avoiding HTML input unless it is essential and opting for BBCode or similar (make your own up if needs be) is very sound indeed.

Comprehensive server-side validation

I currently have a fairly robust server-side validation system in place, but I'm looking for some feedback to make sure I've covered all angles. Here is a brief outline of what I'm doing at the moment:
Ensure the input is not empty, or is too long
Escape query strings to prevent SQL injection
Using regular expressions to reject invalid characters (this depends on what's being submitted)
Encoding certain html tags, like <script> (all tags are encoded when stored in a database, with some being decoded when queried to render in the page)
Is there anything I'm missing? Code samples or regular expressions welcome.

You shouldn't need to "Escape" query strings to prevent SQL injection - you should be using prepared statements instead.
Ideally your input filtering will happen before any other processing, so you know it will always be used. Because otherwise you only need to miss one spot to be vulnerable to a problem.
Don't forget to encode HTML entities on output - to prevent XSS attacks.

You should encode every html tag, not only 'invalid' ones. This is a hot debate, but basically it boils down to there will always be some invalid HTML combination that you will forget to handle correctly (nested tags, mismatched tags some browsers interpret 'correctly' and so on). So the safest option in my opinion is to store everything as htmlentities and then, on output, print a validated HTML-safe-subset tree (as entities) from the content.

Run all server-side validation in a library dedicated to the task so that improvements in one area affect all of your application.
Additionally include work against known attacks, such as directory traversal and attempts to access the shell.

This Question/Answer has some good responses that you're looking for
(PHP-oriented, but then again you didn't specify language/platform and some of it applies beyond the php world):
What's the best method for sanitizing user input with PHP?

You might check out the Filter Extension for data filtering. It won't guarantee that you're completely airtight, but personally I feel a lot better using it because that code has a whole lot of eyeballs looking over it.
Also, consider prepared statements seconded. Escaping data in your SQL queries is a thing of the past.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.