Should all output be purified in PHP?

I am creating a rather small web application in PHP, where a (trusted) administrator can, amongst other things, store hundreds of objects in a database. The user can enter a number of details about these objects in the form of text fields (an input element with the type attribute set to "text").
The objects with their details are echoed in the form of a table, escaped by the htmlspecialchars function. This function, however, does not protect against the malicious use of HTML tags, for example, the <script> tag.
The question is whether all user-entered data (every cell in the table) should be purified by something like HTMLPurifier, which is already used elsewhere in the application. And if so, what would be the best way to do it? Running HTMLPurifier thousands of times, since there are many details, may cause serious performance issues.

"The objects with their details are echoed in the form of a table, escaped by the htmlspecialchars function. This function, however, does not protect against the malicious use of HTML tags, for example, the <script> tag."
Yes, it does. The tags are output harmlessly and correctly as &lt;script&gt;, which the browser displays as the literal text <script>.
"The question is whether all user entered data (every cell in the table) should be purified by something like HTMLPurifier"
Nope. You should only use HTMLPurifier on fields where you are deliberately intending to allow the user to enter markup for direct rendering to the page, for example a comment system where the user can type <i> for italics.
For other input that you are treating as plain text, htmlspecialchars remains the right thing to do when outputting to HTML.
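As a minimal sketch of the plain-text case, assuming the rows arrive as arrays of strings: the helper names e() and render_rows() are hypothetical and not part of the original application. It shows that a <script> tag typed into a field ends up as visible text in the table cell.

```php
<?php
// Hypothetical helper: escape one value for output inside HTML text.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Hypothetical helper: turn rows of plain-text details into table rows.
function render_rows(array $rows): string
{
    $html = '';
    foreach ($rows as $row) {
        $cells = '';
        foreach ($row as $cell) {
            // Every cell is escaped at the moment it is written into HTML.
            $cells .= '<td>' . e((string) $cell) . '</td>';
        }
        $html .= '<tr>' . $cells . "</tr>\n";
    }
    return $html;
}

echo render_rows([['Widget', '<script>alert(1)</script>']]);
```

htmlspecialchars is a cheap string replacement, so calling it once per cell should not be a concern even for thousands of cells.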

Everything should be checked and cleaned before you save it into the database. The principle is that you DO NOT TRUST anything that comes from the user.
ALWAYS escape everything.
Or just use tools that will do that for you, such as frameworks.

Related

PHP and MySQL - Should I validate/sanitize my data when pulling it from my database before displaying to user?

I validate and sanitize all my data before inserting it into the database. Would it be considered a good or a redundant practice to validate it when pulling it from the database before displaying it?
This boils down to how much to trust your own code. On one extreme, I could forgo the validation completely if I knew that only I would use the client-side interface and would never make a mistake. On the other, I could validate data in every class in case I'm working with others and they forgot to properly do their job. But what's a generally good practice in this particular case?

Input validation should be a yes/no proposition. You should not modify input and save it.
You should use htmlentities after pulling from the DB and before showing. This is because it's better to clean data just before using it, at the point where it could do harm. This is why prepared statements work so well: there is no external code you rely on.
Say you forget to sanitize 1 field in 1 form. Then, when you output that data to other users, you have no way to see that mistake from the code that does the output (assuming it's not in the same file).
The less code there is between the sanitizing and the end result, the better.
Now that is not to say save everything and validate it later. Take an email for example, you should validate that for the proper format before saving.
But for other things you don't want to modify user input. Take a file upload. Some people will change the filename to sanitize it, replace spaces, etc. This is good, but I prefer to create my own filename and then show them the original file name, while the one I use on the server is a hash of their username and the name of the file. They never know this, and I get clean filenames.
Once you start modifying user data, it becomes a chore to maintain it. You may have to un-modify it so they can make edits to it, etc. Which means you are then doing way more work than if you just clean it when outputting it.
Take, for example, the simple act of replacing a user's \n line breaks with <br> tags. The user types into a text field, then you change it to HTML and save it. Besides the security reasons not to do this, when the user wants to edit the data you would have to replace the <br> tags with \n again. The security problem is that you have now decided that raw HTML in that field is OK and will output the field unescaped, which gives someone the chance to add their own HTML. So by modifying user data we have created more work for ourselves, and at output time we are assuming that the data was cleaned before it was inserted, without being able to see how it was cleaned.
So the answer is it depends on the data and what sanitation you are doing.
Hope that makes sense.
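The file-upload approach described above could be sketched roughly as follows. The function name stored_filename(), the choice of SHA-256, and the sample values are all assumptions made for illustration; the answer only says the server-side name is a hash of the username and the file name.

```php
<?php
// Hypothetical helper: derive the server-side file name from the username
// and the original name. SHA-256 and the "\0" separator are assumptions.
function stored_filename(string $username, string $originalName): string
{
    return hash('sha256', $username . "\0" . $originalName);
}

$original = 'my holiday photo <1>.jpg';          // kept unmodified for display
$onDisk   = stored_filename('alice', $original); // 64 hex characters

// The original name is only escaped at the point where it is shown in HTML.
echo htmlspecialchars($original, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
     ' is stored as ', $onDisk, "\n";
```

The user-supplied name never touches the file system, so it needs no rewriting and can still be shown back to the user exactly as entered.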

I guess there is no need to validate or sanitize the data from the DB, as you are doing it before inserting.
An attacker always plays with the data that he is sending to the server and just analyses the data coming back as a response. Attackers play with the input, not with the output. So just secure your data before sending it to the server or DB.

textarea that contains pre code security

What is the most secure way to save data from a textarea that contains a <pre><code> text in it? Using strip_tags will remove all the tags from the text.
Is it safe to use this:
strip_tags($input, '<pre><code><other accepted tags except script,php,...');
or should I do other things too?

"What is the most secure way to save data from a textarea that contains a <pre><code> text in it?"
Save it as it is.
When you take that data back out of the database and put it into a web page, call htmlspecialchars on it first to escape it so that it looks like normal text on the page.
If you want the user to be able to input actual markup, but you only want to allow certain tags, then you've got a different problem and you want something like htmlpurifier.
Either way, the input or database layer is not the right place to be worrying about output formatting concerns.
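A minimal sketch of the "save it as it is, escape on output" approach for a code snippet, assuming the page supplies the <pre><code> wrapper itself. The function name render_code_block() and the sample text are hypothetical.

```php
<?php
// Hypothetical helper: show stored text as a code listing.
// The wrapper tags come from the template, never from the user.
function render_code_block(string $raw): string
{
    return '<pre><code>'
        . htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
        . '</code></pre>';
}

// Stored exactly as the user typed it, tags and all.
$submitted = "<script>alert('hi')</script>\nif (a < b && c > d) {}";

echo render_code_block($submitted), "\n";
```

Because everything the user typed is escaped, no tag whitelist is needed for this case.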

If you are saving the contents of the text area to a MySQL database, you should use mysqli_escape_string before saving the data.
You can also remove JavaScript tags from the posted data using a regular expression, e.g. with preg_replace.

PHP htmlspecialchars() function shows html safe tags in page

I have a form that users fill in to send their questions. An administrator then answers them, and finally these questions are shown on the website.
To prevent attacks, I've used PDO and the htmlspecialchars() function. I don't apply any change to the input data and only store it using PDO. But when I want to show it on the page, I use htmlspecialchars(). This causes even <p> tags to appear as part of the text. What is the problem and how can I solve it?
Storing questions:
$stmt = "INSERT INTO tbl_questions (title,question) VALUES (?,?)";
$q = $db->prepare($stmt);
$q->execute(array($_POST['title'],$_POST['question']));
Displaying questions:
echo htmlspecialchars($title).'<br />'.htmlspecialchars($question);

The htmlspecialchars function is not perfect.
Take a look at htmlpurifier. It is far more powerful because it filters through a whitelist. With it, your users can write HTML (such as <p>), and you don't have the risk of XSS.
Consider using it before you store the data in the database, so you don't need to sanitize your input on every page view.

If you want to display only text, or even control which tags are converted/removed you have the option of using strip_tags.
echo strip_tags($title, '<p>');
Would display your title without having your p tags displayed.
Edit: Also realize that even by allowing tags you also allow certain attributes to function normally which wouldn't keep you safe from XSS attacks.
<p onclick="alert('xss scripting')">Awesome</p>
This would still work so using a filtering library such as HTML Purifier is recommended.
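A short sketch of that pitfall, using a made-up input string: strip_tags() with an allow-list keeps the allowed tag together with all of its attributes, while htmlspecialchars() leaves nothing executable.

```php
<?php
// Made-up input for illustration.
$input = '<p onclick="alert(1)">Awesome</p><script>alert(2)</script>';

// strip_tags() removes the <script> tags (their text content stays) but
// keeps the allowed <p> tag, including its onclick attribute.
$stripped = strip_tags($input, '<p>');

// htmlspecialchars() turns every tag and attribute into inert text.
$escaped = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $stripped, "\n";
echo $escaped, "\n";
```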

This is not a problem; you just didn't select the correct function for your task.
htmlspecialchars() doesn't remove tags, it just transforms them so they are displayed instead of being executed. So, for example, <p> becomes &lt;p&gt;.
If you want to (partially) remove the tags, use the strip_tags() function. And, if you don't need these tags, you may (and should, I think) strip them before adding your content to database.
And also, you can allow certain tags, like this:
$q->execute(array(strip_tags($_POST['title']), strip_tags($_POST['question'], "<br><em>")));
This way you'll allow <br> and <em> in your questions. This is not needed, I'm just showing you the possibilities of the function.
However, if you want far more flexible rules, I'd suggest an HTML sanitizer. For instance, I use htmLawed. It lets you specify not only allowed tags, but also allowed attributes and classes, including wildcards (for instance, you can remove anything but the class attribute, or allow everything but the style attribute).

Protection against user textarea input

Do I need to do anything special to protect myself against user input from a textarea when the input is simply stored in a session cookie?
I'm selling products that can be engraved with custom text. The user is "supposed" to enter the text they would like engraved into a textarea, and this text is stored in a session cookie along with the item they chose & some other data.
Right now, I use nl2br before storing it in the session cookie, then stripslashes when I display it back out onto the page.
Do I need to do anything else to protect myself from malicious code (i.e. htmlentities, etc)?
Thanks for your input (no pun intended!)

Validate the input if you only want to allow certain characters such as a-z 0-9. If you don't want characters like < and > then validate.
As a general rule of thumb, store the input as it was entered, and do any processing before it is printed to a page or other medium. By processing, I mean run it through htmlentities() and then nl2br(), in that order, so that the <br> tags added by nl2br() are not escaped themselves.
Usually it's better to store the data in a neutral form i.e not processed for HTML etc because you may want to output the data to some other form in future, like XML, web services, in which case it will need to be processed differently.
Store it in a session variable, not a cookie. A session variable is stored on the server and is not accessible by browsers or anyone else. If you store it in a cookie, it can be tampered with and you will have to re-validate the input every time you want to access it because it might have changed.
If you eventually store the data in a database, you'll need to escape it to prevent SQL injection. The method will vary depending on which library is used to interface with the database, but parameterised queries or prepared statements are preferable.
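A minimal sketch of the display step, assuming the raw engraving text has been read back from the session unmodified. The function name display_engraving() is hypothetical, and htmlspecialchars() is used here in place of htmlentities(); either one escapes the characters that matter for HTML.

```php
<?php
// Hypothetical helper: prepare raw engraving text for display in HTML.
function display_engraving(string $raw): string
{
    // Escape first, then add line breaks. In the reverse order the <br />
    // tags produced by nl2br() would be escaped as well.
    return nl2br(htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'));
}

echo display_engraving("To <Anna>\nWith love"), "\n";
```

The stored value stays neutral, so the same text can later be sent to the engraver or exported in another format without undoing any HTML processing.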

The first most obvious chance for attack would be direct HTML input. Imagine someone input <script src="http://malicious.com/ddos.js" /> into your textarea. Would your PHP code output that in a way that would make it run the .js code?
Second, how does the data get to the engraver? Most common would be that it's stored in a database for later use, or maybe emailed to a queue of work for the engraver.
If you're putting into a database, you'll want to look into a wrapper like PDO that can handle cleaning input.
If you're emailing it to yourself or someone else then you'll need to take care to avoid putting dangerous information in there. I believe PHP's mail() function will automatically keep the $message from making changes to the headers. http://www.php.net/manual/en/function.mail.php
If you have some other method, let us know and there may be other concerns.

You should run htmlspecialchars() before you display the variable in the textarea box.
This will make sure that possibly evil HTML code is safely displayed inside the textarea instead of being executed.
All other places you show the variable, eg. in an e-mail or an admin interface, you should run htmlspecialchars() on it as well.
Similarly, remember to escape the variable if saving it to a database, so people are not able to mess with your database query.
(An alternative approach to doing htmlspecialchars() upon display, would be running a strip_tags() on the user's input before it is stored in the session variable. But sanitizing input on display as suggested above is a more robust way of thinking, IMHO.)
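As a rough sketch of why the escaping matters inside the textarea itself: without it, a value containing </textarea> would close the box early and let the rest run as markup. The function name render_textarea() and the field name "engraving" are assumptions for illustration.

```php
<?php
// Hypothetical helper: pre-fill a textarea with a previously entered value.
function render_textarea(string $value): string
{
    return '<textarea name="engraving">'
        . htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8')
        . '</textarea>';
}

// A hostile value that tries to break out of the textarea.
$hostile = '</textarea><script>alert(1)</script>';

echo render_textarea($hostile), "\n";
```

The browser decodes the entities again when it displays the box, so the user sees and edits exactly what they typed.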

when to use htmlspecialchars() function?

Hi
I was wondering when the appropriate time to use htmlspecialchars() is. Is it before inserting data into the database or when retrieving it from the database?

You should only call this method when echoing the data into HTML.
Don't store escaped HTML in your database; it will just make queries more annoying.
The database should store your actual data, not its HTML representation.
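A small sketch of what goes wrong when the escaped form is stored instead, using a made-up value and plain strings in place of a real database: a simple search no longer matches, and escaping again on output double-encodes the text.

```php
<?php
// Made-up user input, stored exactly as entered.
$stored = 'Tom & Jerry <3';

// Escaping happens once, at the moment the value is echoed into HTML.
$forHtml = htmlspecialchars($stored, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Suppose the escaped form had been stored instead.
$storedEscaped = $forHtml;

// A plain search for what the user typed now misses the record...
$searchHit = strpos($storedEscaped, '& Jerry') !== false;

// ...and escaping on output, as a careful template would, double-encodes it.
$doubleEncoded = htmlspecialchars($storedEscaped, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

echo $forHtml, "\n";
var_dump($searchHit);
echo $doubleEncoded, "\n";
```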

You use htmlspecialchars EVERY time you output content within HTML, so it is interpreted as content and not HTML.
If you allow content to be treated as HTML, you have just opened the door to bugs at a minimum, and total XSS hacks at worst.

Save the exact thing that the user enters into the database.
Then, when displaying it to the public, use htmlspecialchars(), so that it offers some XSS protection.

Guide - How to use htmlspecialchars() function in PHP
To begin, you have to understand one simple concept: rendering.
What is rendering?
Rendering is when the browser transforms the HTML
<b>Hello</b>
into the word Hello displayed in bold. That is rendering.
So, when should you use the htmlspecialchars() function?
Wherever you output user content into a page and do not want it to be rendered as HTML.
For example, if you are using JQuery and you do this:
$("#YourDiv").html("<b>Hello</b>");
The div contents will be Hello. It rendered the text into HTML.
If you want to display the message this way (as it was written by the user):
<b>Hello</b>
you have to put:
$("#YourDiv").text("<b>Hello</b>");
In that way the Hello will never be rendered.
If you want to load the message (as written by the user) into a textbox, textarea, etc., you have to put:
<input type="text" class="Textbox1" value="">
<script>
$(".Textbox1").val("<b>Hello</b>");
</script>
That will display
<b>Hello</b>
Inside the Textbox without problems.
Conclusion:
Whatever data the user enters into your forms, save it as it is.
Do not use any function. If the user sent 12345, save it as it is. Do not filter anything. You only have to filter when you are going to display the data on the page to users. YOU, and ONLY YOU, decide whether or not to render what the user wrote. Remember that.
Regards!
