PHP & MySQL question user submitted data?


I was wondering how I can safely allow users to add HTML and CSS to their profiles using PHP and MySQL, like they do on MySpace.

You certainly want to carefully sanitize the data and limit it to a set of harmless statements. For example, http://htmlpurifier.org/ can help you with that:
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant.
When you put the data into the database, use prepared statements or carefully escape the data.

There are at least two things to consider:
safely saving the HTML data into the database
safely outputting the HTML data on the page.
For the first point, you must avoid SQL injections.
This can be done by escaping your string data before injecting it into your insert query, with functions such as mysqli_real_escape_string or PDO::quote, depending on the API you are using. (The older mysql_real_escape_string belongs to the mysql extension, which was removed in PHP 7.)
Also, you might be interested in prepared statements; see PDO::prepare or mysqli_prepare.
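As a minimal sketch of the prepared-statement approach, the snippet below stores user HTML through a bound parameter. The table name, column name, and sample input are made up for illustration, and an in-memory SQLite database is assumed as a stand-in for MySQL so the example runs without a server; with MySQL only the DSN and credentials would differ.

```php
<?php
// Assumption: SQLite in memory stands in for MySQL in this sketch.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical table, used only for this illustration.
$pdo->exec('CREATE TABLE profiles (id INTEGER PRIMARY KEY, about_html TEXT)');

// Input containing quotes and something that looks like SQL.
$userHtml = "<p>It's \"my\" profile'); DROP TABLE profiles; --</p>";

// The named placeholder keeps the data separate from the SQL text,
// so no manual escaping is needed.
$stmt = $pdo->prepare('INSERT INTO profiles (about_html) VALUES (:about_html)');
$stmt->execute([':about_html' => $userHtml]);

// Read the row back: the value is stored exactly as submitted.
$stored = $pdo->query('SELECT about_html FROM profiles')->fetchColumn();
```

The value is stored byte for byte as the user sent it; making it safe for display is a separate step that happens on output.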
For the second point, what matters is only allowing HTML tags and attributes that you consider safe.
This can be done using a tool such as HTMLPurifier, to filter out all bad or non-accepted tags and attributes and keep only the subset you wish to allow.
For example, if you consider the following HTML input :
<p>hello, world !</p>
<script type="text/javascript">alert('bad');</script>
<strong>this is <em>some text</strong></em>
HTMLPurifier will transform / purify it to :
<p>hello, world !</p>
<strong>this is <em>some text</em></strong>
Note that :
the <script> tag has been removed
the <em> and <strong> tags have been put back in the right order, making the HTML valid XHTML.

Related

What's the correct way in PHP to save and output HTML from MySQL (safely)

I understand there are numerous similar questions; however, none fully answers my question.
I'm allowing the user to add formatted HTML to a text field (strong, ul, li). I will then need to display it safely, avoiding any XSS etc.
Side note: Using prepared statements.
Should I encode my HTML from the form using htmlentities() (or htmlspecialchars()) and insert that into the database? I also don't believe using html_entity_decode($html); would protect me from an XSS attack.
I could run strip_tags() prior to inserting into MySQL, but I'm not sure this is the best approach.
If I allow the user to input HTML into MySQL and use htmlentities() to display it, the markup is shown as text; I want to render the HTML, not display it.

validating untrusted HTML input do I have to process each input?

For cross-site scripting vulnerabilities:
1) Is it a good idea to validate and escape each and every one of the user inputs?
2) Is using strip_tags good enough, and what's the benefit of HTML Purifier over it?

Yes, this is a good idea. I would go as far as to say that if you don't, you are an idiot. When storing the data in a database, use prepared statements and bound parameters. If you use those (like you should), you don't have to manually escape the data going into the database.
Now, for displaying the data, it depends on what you want to allow and where you are going to output it. If it will be displayed on an HTML page and you don't want to allow any HTML to be rendered, use htmlspecialchars($content, ENT_QUOTES). You almost never have to use htmlentities, because that will convert ALL characters for which there is an HTML entity, meaning it will make your document unnecessarily bigger. If you want to allow some HTML, you have to filter it before displaying it (using HTML Purifier).
Please note that different storage mechanisms and different output media require a different escaping / sanitizing strategy.
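To make the point about different output media concrete, here is a small sketch with one helper per context. The helper names escape_html and escape_js and the sample string are hypothetical; only htmlspecialchars() and json_encode() are PHP built-ins, and JSON_THROW_ON_ERROR assumes PHP 7.3 or later.

```php
<?php
// Hypothetical helper: escape for HTML text or a quoted attribute value.
function escape_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// Hypothetical helper: encode a value as a JavaScript string literal that is
// safe to place inside a <script> block (no raw <, >, &, ' or " survives).
function escape_js(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$content = '</script><b onclick="x()">Tom & Jerry</b>';

$forHtml = escape_html($content); // for use between tags or in an attribute
$forJs   = escape_js($content);   // for use as: var name = <?= $forJs ?>;
```

The same stored value ends up encoded differently depending on where it is written, which is why escaping belongs at output time rather than at storage time.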

Saving textarea in the database

I've been searching about this, but I can't find the most important part - what field to use.
I want to save a textarea without allowing any kind of javascript, html or php. What functions should I run the posted textarea through before saving it in the database? And what field type should I use for it in the database? It'll be a description, max 1000 chars.

There are a number of ways to remove or handle code so that it can be saved in your database.
Regular Expressions
One way (though it may be hard and unreliable) is to remove or detect code using regular expressions.
For example, the following removes all script tags using PHP code (taken from here):
$mystring = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $mystring);
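As a rough illustration of why the regular-expression approach is unreliable, the sketch below runs that same pattern over a nested payload. The payload string is made up for this example; removing the inner pair of tags leaves a working script element behind.

```php
<?php
// Made-up payload: an empty <script></script> pair nested inside a broken one.
$payload = '<scr<script></script>ipt>alert(1)</script>';

// A single pass removes only the inner pair...
$filtered = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', '', $payload);

// ...and the surrounding fragments join up into a complete script tag.
echo $filtered; // <script>alert(1)</script>
```

Repeating the replacement until the string stops changing closes this particular hole, but it does nothing about event-handler attributes or javascript: URLs, which is the reason a parser-based filter is usually preferred.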
The strip_tags PHP function
You can also make use of the built-in strip_tags function, which strips HTML and PHP tags from a string. The manual provides several examples; one is shown below for your convenience:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
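One caveat worth seeing in code: strip_tags removes tags, not attributes, and it keeps the text inside removed tags. The input string in this small sketch is made up for illustration.

```php
<?php
$input = '<p onclick="alert(1)">Hello</p><script>alert(2)</script>';

// Allow only <p>. The <script> tags are removed, but their text content
// stays, and the onclick attribute on the allowed <p> is left untouched.
$result = strip_tags($input, '<p>');

echo $result; // <p onclick="alert(1)">Hello</p>alert(2)
```

So an allow-list passed to strip_tags is not enough on its own when the result will be rendered as HTML.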
HTML Purifier
You can check out HTML Purifier, which is a common HTML filter PHP library intended to detect and remove dangerous code.
Simple code found on their Getting Started Section:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
In Practice (Safe Output)
If you are trying to avoid XSS attacks or injection attacks, cleaning user data is the wrong way to go about it. Removing tags is not a 100% guarantee of keeping your service safe from these attacks. Therefore, in practice, user data containing code is usually not filtered or cleaned, but rather escaped during output. More specifically, the special characters within the string are escaped, where the set of special characters depends on the syntax of the target language. An example of this is using PHP's htmlspecialchars function to convert special characters to their respective HTML entities. A code snippet taken from the manual is shown below:
<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>
For more information about escaping and a very good explanation related to your question, look at this page. It shows you other forms of output escaping. Also, for a question and answer related to escaping, click here.
Furthermore, one more short but VITAL point I want to throw at you is that ANY data received from a user CANNOT be trusted.
SQL Injection Attacks
Definition (From here)
A SQL injection attack consists of insertion or "injection" of a SQL
query via the input data from the client to the application. A
successful SQL injection exploit can read sensitive data from the
database, modify database data (Insert/Update/Delete), execute
administration operations on the database (such as shutdown the DBMS),
recover the content of a given file present on the DBMS file system
and in some cases issue commands to the operating system.
For SQL Injection attacks: Use prepared statements and parameterized queries when storing information to the database. (Question and Answer found here) A tutorial of prepared statements using PDO can be found here.
Cross-site Scripting (XSS)
Definition (from here):
Cross-Site Scripting attacks are a type of injection problem, in which
malicious scripts are injected into the otherwise benign and trusted
web sites. Cross-site scripting (XSS) attacks occur when an attacker
uses a web application to send malicious code, generally in the form
of a browser side script, to a different end user.
I personally like this image for a better understanding.
For XSS attacks: you should consult this famous page, which describes rule by rule on what needs to be done.

TLDR:
It is conventional to use htmlspecialchars() to encode text on output, rather than filter the text on input. A text field is fine for this purpose.
What you need to defend against
You are trying to protect yourself from XSS. XSS happens when users can store HTML control characters on your site. Other users will see this HTML markup, so a malicious user can use your page to redirect people to other sites, steal cookies, and so on.
You need to consider this for all of your inputs: this should include any varchar or text field that can be stored in your database; not just your textareas. I can add malicious content to an input field just as easily as I can add it to a textarea.
How do we defend against this?
Let's say that a user claims that their username is:
<script src="http://example.com/malicious.js"></script>
The simplest way to handle this is to save this into the database "as is". However, whenever you echo it on the site, you should filter it through the PHP htmlspecialchars() function:
echo 'Hi, my name is ' . htmlspecialchars($user->username) . '!';
htmlspecialchars turns the HTML control characters (<, >, &, ', and ") into their HTML entities (&lt;, &gt;, &amp;, &#039; or &apos;, and &quot;). Single quotes are only converted when the ENT_QUOTES flag is in effect, which is the default since PHP 8.1. The result looks like the original character in a browser (i.e. to normal users), but it does not act like actual HTML markup.
The result is that instead of malicious JavaScript, the user's name would literally look like <script src="http://example.com/malicious.js"></script>.
Why filter on output? Why not on input?
1 - OWASP recommends this way
2 - If you forget to protect an input field, and someone figures it out and adds malicious content, you now need to find the malicious content in the database and repair the faulty code on your site.
3 - If you forget to encode an output field, and someone manages to sneak in malicious input, then you only need to repair the faulty code on your site.
4 - It is possible for users to write usernames that would break the HTML fields used to edit the usernames. If you encode the content before you store it in the database, then you need to display it "as is" in the appropriate input fields (let's assume that an admin or the user can change their username later). But, let's suppose that a user found a way to inject malicious code into the database. What if they said that their username is: " style="display:none;" />. The input field that would let the administrator change this username now looks like:
<input type="text" name="username" value="" style="display:none;" />" />
                     malicious content -> ^^^^^^^^^^^^^^^^^^^^^^^^^^
Now, the admins can't fix the problem: the input field has disappeared. But if you encode the text on output, then all of your input fields will have protection against malicious content. Now, your inputs will look like this:
<input type="text" name="username" value="&quot; style=&quot;display:none;&quot; /&gt;" />
                          safe content -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
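The two input fields above can be reproduced with a short sketch. The variable names are illustrative, and the username is the malicious value from the example.

```php
<?php
// The malicious username from the example above, stored "as is".
$username = '" style="display:none;" />';

// Unencoded: the quote in the username closes the value attribute early.
$unsafe = '<input type="text" name="username" value="' . $username . '" />';

// Encoded on output: the whole username stays inside the value attribute.
$safe = '<input type="text" name="username" value="'
    . htmlspecialchars($username, ENT_QUOTES, 'UTF-8')
    . '" />';
```

When the browser parses the encoded version, the field's value is the original username again, so an admin can still see and edit it.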

Protection against XSS exploits?

I'm newish to PHP but I hear XSS exploits are bad. I know what they are, but how do I protect my sites?

To prevent XSS attacks, check and properly validate all user-supplied data that you plan on using, and don't allow HTML or JavaScript code to be inserted through that form.
Alternatively, use htmlspecialchars() to convert HTML special characters into HTML entities, so that characters like < and > that mark the beginning and end of a tag are rendered as text. You can also use strip_tags() to allow only some tags, but note that the function does not strip out harmful attributes such as onclick or onload.

Escape all user data (data in the database from user) with htmlentities() function.
For HTML data (for example from WYSIWYG editors), use HTML Purifier to clean the data before saving it to the database.

Use strip_tags() if you want to have no tags at all, meaning anything like <somethinghere>.
htmlspecialchars() would convert them to HTML entities, so the browser will only show them and not try to run them.
If you want to allow good HTML, I would use something like htmLawed or HTML Purifier.

The bad news
Unfortunately, preventing XSS in PHP is a non-trivial undertaking.
Unlike SQL injection, which you can mitigate with prepared statements and carefully selected white-lists, there is no provably secure way to separate the information you are trying to pass to your HTML document from the rest of the document structure.
The good news
However, you can mitigate known attack vectors by being particularly cautious with your escaping (and keeping your software up-to-date).
The most important rule to keep in mind: Always escape on output, never on input. You can safely cache your escaped output if you're concerned about performance, but always store and operate on the unescaped data.
XSS Mitigation Strategies
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
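For the last option, a minimal sketch of the call with explicit flags and character set follows. The sample string is made up; htmlspecialchars() is shown next to htmlentities() only for comparison, since with the same flags it encodes just the HTML special characters.

```php
<?php
$charset = 'UTF-8';

// Made-up input: a non-ASCII character plus markup with a single-quoted attribute.
$var = "caf\u{00E9} <b onmouseover='x()'>";

// htmlentities() converts every character that has a named entity.
$viaEntities = htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset);

// htmlspecialchars() with the same flags converts only &, <, >, " and '.
$viaSpecial = htmlspecialchars($var, ENT_QUOTES | ENT_HTML5, $charset);
```

Either result is safe to write into HTML text or a quoted attribute, as long as the page itself is served in the same character set as $charset.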
Why shouldn't I filter on input?
Attempting to filter XSS on input is premature optimization, which can lead to unexpected vulnerabilities in other places.
For example, a recent WordPress XSS vulnerability employed MySQL column truncation to break their escaping strategy and allow the prematurely escaped payload to be stored unsafely. Don't repeat their mistake.

What characters to strip from messages?

I'm quite surprised I haven't been able to find out what characters I need to strip from a message in order to keep my application safe.
I've got a PHP app, and most of the inputs are numerical, but I'm adding the ability for users to attach messages, so I need to cleanse the message and strip any characters that could be a threat.
My initial reaction was if I did
$message=addslashes(preg_replace('/[^a-zA-Z0-9\-,& $%\(\)##!\'\"?.]/','',$_POST['message']));
I'd be safe, but I haven't been able to find anything which states what characters can be damaging, and what characters would be safe.

I would say that you don't have to strip any characters from your input, at least generally speaking.
Instead, you must escape your data :
when sending it to your database
see mysql_real_escape_string, mysqli_real_escape_string, PDO::quote
or Prepared statements : MySQLi ; PDO
when sending it to the HTML output
see htmlspecialchars
Still, if you allow users to input HTML, you should take a look at HTMLPurifier, to make sure they are not able to inject any malicious HTML code into your web pages:
HTML Purifier is a standards-compliant HTML filter library written in PHP. HTML Purifier will not only remove all malicious code (better known as XSS) with a thoroughly audited, secure yet permissive whitelist, it will also make sure your documents are standards compliant.

This is where HTML Purifier comes in handy.

Instead of sanitizing your data, just use prepared statements for database interaction. PDO prepared statements eliminate the need to hand-sanitize all of your input before it goes into a query.
PHP Manual
