HTML sanitization causes difficulties - PHP

I read user profiles from the database and display them. Before displaying them I sanitize the HTML with PHP's htmlentities(), and they display correctly. But when I let the user edit a profile, the value appears to be filtered twice.
echo '<input id="about" name="about" value="'.$php_filtered_value.'">';
Then inside the input, an ampersand would look like &amp;
If I don't filter the variable, I'm worried about HTML injection.
What should I do?

I prefer to follow OWASP RULE#2:
> RULE #2 - Attribute Escape Before Inserting Untrusted Data into HTML
Requirements:
-Aggressive HTML Entity Encoding
-Only place untrusted data into a whitelist of safe attributes (listed below).
-Strictly validate unsafe attributes such as background, id and name.
Please see XSS (Cross Site Scripting) Prevention Cheat Sheet

Don't double escape the text then (as in: once before storing it in the database and again before echoing it).
Unescaped (what you typed): Able & Baker
Escaped once (what is being stored in the DB): Able &amp; Baker
Double-escaped (what is ending up in your HTML): Able &amp;amp; Baker
Rather, escape the text only once: generally on the output side, not on the input side.
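To see the effect, here is a minimal sketch that applies htmlspecialchars() once and then twice to the same value. The variable names are made up for illustration, and it assumes the raw text is what gets stored:

```php
<?php
// Raw value exactly as the user typed it; this is what should be stored.
$raw = 'Able & Baker';

// Escaped once, at output time: the browser displays "Able & Baker".
$once = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');

// Escaped a second time (e.g. once before storing, once before echoing):
// the browser now displays the literal text "Able &amp; Baker".
$twice = htmlspecialchars($once, ENT_QUOTES, 'UTF-8');

// Only the once-escaped value belongs in the attribute.
echo '<input id="about" name="about" value="' . $once . '">', PHP_EOL;
```

The same single call works for both the read-only profile view and the edit form, because in both cases the value is being written into HTML.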

Related

Is escaping output from MySQL server necessary if data being retrieved has already been sanitized?

I'm interested to know whether or not it is necessary to escape output from a MySQL server if the data that is being retrieved has already been filtered when the user submitted a form.
Example:
1. The user submits a form with a comment for a blog post.
2. On form submission, prior to sending data to MySQL server, their input is filtered with FILTER_SANITIZE_SPECIAL_CHARS to prevent injection attacks.
3. Once the data has been posted to server, the user is rerouted to another screen where they can view their comment.
4. When retrieving their comment from the server (which has stored the filtered input), is it necessary to escape this output as well?
Here's the main issue for me. I'm taking user input from a form (for a blog post), sanitizing it with FILTER_SANITIZE_SPECIAL_CHARS, and then posting it to the MySQL server. If I retrieve this information from the server and display it in html, there are no issues. HOWEVER, I have been reading that you should ALWAYS escape output from servers as well. So I escaped the same post with htmlspecialchars(). Now, I have the issue that ALL special chars (including parentheses, and any quotes that are used by the user in their post) are coming back in their escaped html format. Not user friendly whatsoever.
What is the best work around for this, or is it even necessary to escape the output if it is coming from the server and has already been sanitized on user input?
Sanitization is not the same as escaping, and you should make sure not to confuse the two.
Sanitization is removing unwanted input. That is, if the user adds a <script> tag to their input, and you don't want their input to include <script> tags, then removing that <script> tag would be sanitization. Sanitization is not escaping data for an output context.
Escaping is properly encoding data for an output context. For example, to prevent HTML injection, you might call htmlspecialchars() to correctly encode & as &amp;. To prevent SQL injection, you might use mysqli::real_escape_string() to convert ' to \'. (Though it would be highly preferable to use prepared statements / parameterized queries to prevent having to worry about SQL injection or escaping at all.)
Importantly, escaping is context-specific. An escaping you use for HTML is not necessarily valid or sufficient for SQL (or vice-versa, or any other output context).
The problem with FILTER_SANITIZE_SPECIAL_CHARS is that it's poorly named: it's doing both in one step, which is confusing for your database (since your database now has HTML-encoded data), and confusing for output (because now you have already-escaped data that is vulnerable to being multiply-escaped).
Instead, you should explicitly separate your sanitization and escaping efforts. Only sanitize data on input that you don't want to persist. Only escape data on output, and according to its proper output context.
The reason you want to store raw (pre-output-escaped) data in the database is so that if you ever need to output to a different context (e.g. now you're doing JSON output, or you need to write it to a file, or actually see what the raw data is), you won't need to unescape it first. (If you really have to, you might reasonably store a pre-escaped copy in a separate column, but you should always have your original data available.) It also makes the rule simple: always sanitize input; always escape output.
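As a rough illustration of "escape on output, per context", the sketch below takes one raw value and encodes it separately for HTML and for JSON. The comment text and variable names are hypothetical:

```php
<?php
// Hypothetical comment as submitted; stored raw, without any HTML encoding.
$comment = 'Tom & "Jerry" <3';

// HTML context: encode for element content or a quoted attribute value.
$forHtml = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// JSON context: json_encode() applies JSON's own escaping rules instead.
$forJson = json_encode(['comment' => $comment]);

echo $forHtml, PHP_EOL;
echo $forJson, PHP_EOL;
```

Had the value been HTML-encoded before storage, the JSON output would carry entities such as &amp; that a non-HTML consumer would have to decode again.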

Zend\Escaper: escape data before inserting into database

I'm adding some XSS protection to the website I'm working on. The platform is Zend Framework 2, and therefore I'm using Zend\Escaper. From the Zend documentation I learned that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier.
But what are the risks if I escape the data before inserting it into the database? Am I so wrong to do that? Please explain, as I'm fairly new to this topic.
thanks
When encoding data before storing it you will have to decode it before you can do anything sensible with it before outputting it. That's why I'd not do it.
Let's say you have an international application and you store the escaped value of a form field that may contain non-ASCII characters, which might be converted into HTML entities. What if you then have to measure the content of that field, for example by counting its characters? You will always have to unescape the content before counting it and then re-escape it afterwards. A lot of work for no gain.
The same applies to search operations in your database. You will have to escape the search phrase the same way as your input for the database to understand what you are looking for.
I'd use one character set throughout the application and database (I prefer UTF-8; beware of the MySQL connection...) and only escape content on output. That way I can do whatever I like with the data and am on the safe side on output. Escaping is done automatically in my view layer, so I don't have to think about it every time I handle data. That way you can't forget it.
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
But that's just my opinion, others might think otherwise!
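The counting and searching problems can be shown with a small sketch. The sample string is made up, and the example assumes the mbstring extension is available for mb_strlen():

```php
<?php
// "\u{00E4}" is the letter a-umlaut; the escape avoids depending on the file's encoding.
$raw = "K\u{00E4}se & Brot";

// What would end up in the database if the value were entity-encoded before storing.
$stored = htmlentities($raw, ENT_QUOTES, 'UTF-8');

// Number of characters the user actually typed.
$rawLength = mb_strlen($raw, 'UTF-8');

// Length of the encoded text, which is what a naive count on the stored value sees.
$storedLength = strlen($stored);

// A search for the raw word no longer matches the encoded value.
$found = strpos($stored, "K\u{00E4}se") !== false;

var_dump($rawLength, $storedLength, $found);
```

With the raw value stored instead, the length check and the search both work directly, and the encoding step moves to the view layer.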
"Output" here refers to the web page. A form field (an <input> HTML tag) is an INPUT (from the web page); any text is an OUTPUT (to the web page). You need to ensure any output (to the web page) does not contain dangerous characters that could be used to forge XSS attack vectors.
This said, if you have DANGEROUS_INPUT_X given by the user and then
$NOT_DANGEROUS_ANYMORE = ZED.HtmlPurifier(DANGEROUS_INPUT_X)
DBSave($NOT_DANGEROUS_ANYMORE)
and somewhere else
$OUTPUT = DBLoad($NOT_DANGEROUS_ANYMORE)
echo $OUTPUT
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed in the way it is saved, that was safe.
I would suggest to look at output encoding more than validation: HtmlPurifier cleans the HTML, while you could accept any kind of bad characters if you ensure your output is encoded in the page.
Some general rules are at https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet; here is the PHP example:
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.

Should I escape characters in my GET and POST requests?

I have just read that PHP escapes incoming GET and POST requests on its own for some time. Double escaping does no good. Should I escape the strings at all?
For example I process a simple input like this:
$contact = mysqli_real_escape_string($dbLink, strip_tags($_POST['contact']));
Later, when saved and retrieved from the database I fill the input with last values, like:
echo '<input type="text" class="form-control" id="inputContact" name="contact" value="'.$contact.'">'.PHP_EOL;
When someone enters quotes in the field, it returns something like this, which destroys the form:
<input type="text" class="form-control" id="inputContact" name="contact" value="0900 123 456, jozefmat" ejkasdfadsf"="">
> I have just read that PHP escapes incoming GET and POST requests on its own for some time
This is magic quotes; they were always ineffective and more trouble than they were worth. They have been deprecated, and modern versions of PHP do not support them at all.
> Should I escape the strings at all?
Yes. You should perform suitable sanitization of untrusted data (either via escaping, white list filtering or some other suitable means) as is applicable for the place you are putting the data (which is different depending on if you are inserting it into a database query (search for SQL injection), an HTML document (search for XSS or Cross-Site Scripting) or somewhere else).
As you have noticed, the options available to you vary even within an HTML document: what is suitable for "Inside an element" is not always suitable for "Inside an attribute value" or "Inside a script element".
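For the form in the question, the broken markup comes from writing the value into an attribute without HTML-encoding it; mysqli_real_escape_string() only covers the SQL context. A minimal sketch of the attribute case, using a hypothetical helper named attr() and assuming the value read back from the database is the raw text:

```php
<?php
// Hypothetical helper: encode a value for use inside a quoted HTML attribute.
function attr(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

// The value from the question, including the stray double quote.
$contact = '0900 123 456, jozefmat" ejkasdfadsf';

$html = '<input type="text" class="form-control" id="inputContact" name="contact" value="'
    . attr($contact) . '">';

echo $html, PHP_EOL;
```

The browser decodes the entity when it parses the attribute, so the user sees and resubmits the original text, quote included.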
It really depends upon what you are using the responses for and if you are manipulating them after.
For instance, if you are inserting the data into a DB, then you should escape your data to prevent SQL injection. If you are just displaying the data, then there should be no need unless you are displaying specific characters, though you can also exchange them for HTML entities.
In the example you gave, you should escape them, because that is how SQL injection works.
It depends. If you're retrieving anything from or sending anything to a database, then yes, you should escape characters to avoid SQL injection. But if it's just dealing with the client-side (browser) aspect, then it wouldn't be a big deal.

Saving textarea in the database

I've been searching about this, but I can't find the most important part - what field to use.
I want to save a textarea without allowing any kind of javascript, html or php. What functions should I run the posted textarea through before saving it in the database? And what field type should I use for it in the database? It'll be a description, max 1000 chars.
There are a number of ways to go around in removing/handling code so that it can be saved in your database.
Regular Expressions
One way (but may be hard and unreliable) is to remove/ detect code using regular expressions.
For example, the following removes all script tags using php code (Taken from here):
$mystring = preg_replace('/<script\b[^>]*>(.*?)<\/script>/is', "", $mystring);
The strip_tags PHP function
You can also make use of the built-in strip_tags function, which strips HTML and PHP tags from a string. The manual provides several examples, one shown below for your convenience:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
HTML Purifier
You can check out HTML Purifier, which is a common HTML filter PHP library intended to detect and remove dangerous code.
Simple code found on their Getting Started Section:
require_once '/path/to/HTMLPurifier.auto.php';
$config = HTMLPurifier_Config::createDefault();
$purifier = new HTMLPurifier($config);
$clean_html = $purifier->purify($dirty_html);
In Practice (Safe Output)
If you are trying to avoid XSS attacks or Injection attacks, cleaning user data is the wrong way to go about it. Removing tags is not a 100 % guarantee for keeping your service safe from these attacks. Therefore, in practice, user data containing code is not usually filtered/ cleaned, but rather escaped during output. More specifically, the special characters within the string are escaped, where these characters are based on the syntax of the language. An example of this is making use of PHP's htmlspecialchars function in order to convert special characters to their respective HTML entities. A Code Snippet taken from manual is shown below:
<?php
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
?>
For more information about escaping and a very good explanation related to your question, look at this page. It shows you other forms of output escaping. Also, for a question and answer related to escaping, click here.
Furthermore, one more short but VITAL point I want to throw at you is that ANY data received from a user CANNOT be trusted.
SQL Injection Attacks
Definition (From here)
A SQL injection attack consists of insertion or "injection" of a SQL
query via the input data from the client to the application. A
successful SQL injection exploit can read sensitive data from the
database, modify database data (Insert/Update/Delete), execute
administration operations on the database (such as shutdown the DBMS),
recover the content of a given file present on the DBMS file system
and in some cases issue commands to the operating system.
For SQL Injection attacks: Use prepared statements and parameterized queries when storing information to the database. (Question and Answer found here) A tutorial of prepared statements using PDO can be found here.
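A minimal sketch of a prepared statement with PDO follows. It assumes an in-memory SQLite database as a stand-in for the real one (this needs the pdo_sqlite extension), and the table and column names are invented for the example:

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE profiles (id INTEGER PRIMARY KEY, description TEXT)');

// Hypothetical input containing both SQL and HTML metacharacters.
$description = "Robert'); DROP TABLE profiles;-- <b>hi</b>";

// The placeholder keeps the value out of the SQL text entirely.
$insert = $pdo->prepare('INSERT INTO profiles (description) VALUES (:description)');
$insert->execute([':description' => $description]);

// The value comes back unchanged, ready to be escaped for whatever output context is needed.
$select = $pdo->prepare('SELECT description FROM profiles WHERE id = :id');
$select->execute([':id' => 1]);
$stored = $select->fetchColumn();
```

Nothing is stripped or encoded on the way in; the HTML in the description is only neutralized later, when it is echoed through htmlspecialchars().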
Cross-site Scripting (XSS)
Definition (from here):
Cross-Site Scripting attacks are a type of injection problem, in which
malicious scripts are injected into the otherwise benign and trusted
web sites. Cross-site scripting (XSS) attacks occur when an attacker
uses a web application to send malicious code, generally in the form
of a browser side script, to a different end user.
I personally like this image for a better understanding.
For XSS attacks: you should consult this famous page, which describes rule by rule on what needs to be done.
TLDR:
It is conventional to use htmlspecialchars() to encode text on output, rather than filter the text on input. A text field is fine for this purpose.
What you need to defend against
You are trying to protect yourself from XSS. XSS happens when users can store HTML control characters on your site. Other users will see this HTML markup, so a malicious user can use your page to redirect people to other sites, steal cookies, and so on.
You need to consider this for all of your inputs: this should include any varchar or text field that can be stored in your database; not just your textareas. I can add malicious content to an input field just as easily as I can add it to a textarea.
How do we defend against this?
Let's say that a user claims that their username is:
<script src="http://example.com/malicious.js"></script>
The simplest way to handle this is to save this into the database "as is". However, whenever you echo it on the site, you should filter it through the PHP htmlspecialchars() function:
echo 'Hi, my name is ' . htmlspecialchars($user->username) . '!';
htmlspecialchars turns the HTML control characters (<, >, &, ', and ") into their HTML entities (&lt;, &gt;, &amp;, &#039;, and &quot;). Single quotes are only converted when the ENT_QUOTES flag is in effect, which is the default as of PHP 8.1. The result looks like the original character in a browser (i.e. to normal users), but it does not act like actual HTML markup.
The result is that instead of malicious JavaScript, the user's name would literally look like <script src="http://example.com/malicious.js"></script>.
Why filter on output? Why not on input?
1 - OWASP recommends this way
2 - If you forget to protect an input field, and someone figures it out and adds malicious content, you now need to find the malicious content in the database and repair the faulty code on your site.
3 - If you forget to encode an output field, and someone manages to sneak in malicious input, then you only need to repair the faulty code on your site.
4 - It is possible for users to write usernames that would break the HTML fields used to edit the usernames. If you encode the content before you store it in the database, then you need to display it "as is" in the appropriate input fields (let's assume that an admin or the user can change their username later). But, let's suppose that a user found a way to inject malicious code into the database. What if they said that their username is: " style="display:none;" />. The input field that would let the administrator change this username now looks like:
<input type="text" name="username" value="" style="display:none;" />" />
                     malicious content -> ^^^^^^^^^^^^^^^^^^^^^^^^^^
Now the admins can't fix the problem: the input field has disappeared. But if you encode the text on output, then all of your input fields will be protected against malicious content. Now your inputs will look like this:
<input type="text" name="username" value="&quot; style=&quot;display:none;&quot; /&gt;" />
                          safe content -> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In my specific PHP app, what else can I do against XSS vulnerabilities?

I have read OWASP's XSS prevention cheat sheet but I don't really recognize my application with those rules. I don't feel like I have any of the vulnerabilities pointed out in those rules.
I am doing a PHP application that follows all the following principles:
Not a single user input is displayed directly on the HTML page without being processed and sanitized on the server-side
All my user input are sanitized with htmlentities(). Is that sufficient? (I use prepared statements for SQL injection)
Some of the user input have a maxlength condition of 5 characters on server-side. Does that help protect against XSS? (since I hardly see an XSS code being shorter than 6 characters)
Apart from data from the database, the only user input that is displayed back to the user was sent to the server via ajax, sanitized with htmlentities and reintroduced in the DOM using text() instead of html() (using jQuery)
Should I be concerned about XSS in my case? What else can I do to protect myself from XSS?
> All my user input are sanitized with htmlentities(). Is that sufficient? (I use prepared statements for SQL injection)
No. First, you should filter on output, not on input. In programming, never trust any data, even data from your own database! On input, you only need to escape the data for its destination (SQL, logs, etc.). On output, however, you have to handle more than basic HTML: special characters such as \0 & < > ( ) + - = " ' \ also matter, depending on the context. htmlentities() alone is not enough.
Imagine you have an image on your site:
<img src="xxx" onload="image_loaded({some_text_from_db});">
{some_text_from_db} would be );alert(String.fromCharCode(58,53,53)
If you escape it just with htmlentities it will become:
<img src="xxx" onload="image_loaded();alert(String.fromCharCode(58,53,53));">
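One common way to handle this case is to encode the value twice, once for each nested context: first as a JavaScript string literal, then for the HTML attribute that contains the script. The sketch below is one possible approach, not the answer author's code; jsArgForAttribute() is a hypothetical helper name:

```php
<?php
// Hypothetical helper: encode a PHP string as a JavaScript string literal,
// then encode that literal for the surrounding HTML attribute.
function jsArgForAttribute(string $value): string
{
    $json = json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
    return htmlspecialchars($json, ENT_QUOTES, 'UTF-8');
}

// The payload from the example above.
$textFromDb = ');alert(String.fromCharCode(58,53,53)';

echo '<img src="xxx" onload="image_loaded(' . jsArgForAttribute($textFromDb) . ');">', PHP_EOL;
```

After the browser decodes the attribute, image_loaded() receives the payload as a plain string argument instead of executing it. Keeping user data out of inline event handlers altogether is usually the simpler design.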
> Some of the user input have a maxlength condition of 5 characters on server-side. Does that help protect against XSS? (since I hardly see an XSS code being shorter than 6 characters)
Always check data on the server side. Checking on the client side as well is fine, but always do it on the server side too. Many modern browsers (Chrome, Firefox, Opera) allow the user to edit the page on the fly, so they can easily remove the maxlength attribute.
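A server-side length check could be sketched as follows. The function name is hypothetical, and the example assumes the mbstring extension is available. A length limit is input validation only, so the value still needs escaping when it is output:

```php
<?php
// Hypothetical validator for a field limited to 5 characters.
function isValidShortCode(string $input, int $maxLength = 5): bool
{
    // Count characters, not bytes, so multibyte input is measured correctly.
    return mb_strlen($input, 'UTF-8') <= $maxLength;
}

var_dump(isValidShortCode('abcde'));
var_dump(isValidShortCode('abcdef'));
```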
> Apart from data from the database, the only user input that is displayed back to the user was sent to the server via ajax, sanitized with htmlentities and reintroduced in the DOM using text() instead of html() (using jQuery)
From the jQuery .text() documentation:
We need to be aware that this method escapes the string provided as necessary so that it will render correctly in HTML. To do so, it calls the DOM method .createTextNode(), which replaces special characters with their HTML entity equivalents (such as &lt; for <).
So probably yes, it should be enough, but be aware of ways to escape from text(), as in the example above.
Your application filtering should look like this:
INPUT USER -> FILTER -> APPLICATION
OUTPUT APPLICATION -> FILTER -> USER
Not just input filtering.
I suggest using HTMLawed or HTMLPurifier for user input that needs to be displayed as HTML, or just completely stripping all HTML from user input that shouldn't contain it anyway. HTMLPurifier is the more powerful of the two, and I've never had any XSS issues in any projects with which I have used it.
