A simple approach of validating user input in PHP


I'm pretty new to PHP & SQL security and I was thinking about a solution for validating / filtering user input.
As far as I understand you have to mainly worry about 2 things:
(1) somebody injecting SQL queries into input fields that interact with a database
(2) somebody putting stuff like <script> tags inside their input which is then printed to the page again
While researching I found the following solutions:
For (1): prepared statements
For (2): validating / filtering HTML-tags
I know that you have to validate / filter any user input and as far as I understand most security leaks exist because of mistakes doing so.
For example simply filtering out the <script> tag in the following input:
email@<sc<script>ript>example.com
So what about a really simple algorithm that rejects any user input containing "<" or ">" (assuming there is no reason for users to use those symbols) and replaces something like [b] inside user input with <b> to allow specific tags? Isn't this a bulletproof approach to preventing malicious HTML content, or what am I missing?
Also, I'd like to know whether using prepared statements all the time makes SQL injection impossible, or whether it is still possible on pages that exclusively use prepared statements.

You could do that, yes. But then you might be open to another attack. And you could fix that, but then you might still...
Because of that, it's easier to whitelist. Only certain characters are allowed in an email address (though the real character set is broader), so you can simply allow just those.
The basic logic would be that an email only contains a-z 0-9 - _ . and @. If any character outside that set is used, it's wrong.
From there, you could make it more specific. An email is that set of characters (minus @), then the @, then that character set again (minus @).
From there, you could add a domain check, e.g. \.[a-z]{2,}$ (must end with a dot and at least two letters).
From there...
And that is just the saving part. On display, you need all kinds of tricks to make sure it's not XSS.
Or you could just use
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    echo "Not an email address, please try again before submitting!";
}
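To illustrate why filtering out <script> in a single pass is not enough, and how validating the whole value behaves instead, here is a minimal sketch. The sample strings are made up for illustration; only str_replace() and filter_var() are standard PHP.

```php
<?php
// A single-pass blacklist: remove every literal "<script>" once.
$input = 'a<sc<script>ript>b';
$stripped = str_replace('<script>', '', $input);

// Removing the inner tag splices the outer fragments back into a tag.
echo $stripped, "\n"; // a<script>b

// Validating the whole value rejects the bad input instead of "repairing" it.
$candidates = ['user@example.com', 'email@<script>example.com'];
foreach ($candidates as $email) {
    $ok = filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    echo $email, ' => ', $ok ? 'valid' : 'rejected', "\n";
}
```

Rejecting the input outright avoids having to guess what the cleaned-up value should have been.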

Related

Security measures - when and how

Currently I am upgrading a web application in which I will get most of the input from logged-in users. The input will contain valid HTML, images, audio, video and uploads to a user-defined path. The application then formats it into a nice UI and displays it to end users. These privileged users can add / modify / delete the content using a web-based interface.
As per the basic rule of thumb, I should escape my data before it enters the DB and should not trust data received from the user. To achieve that, I plan to apply the following security measures, which also include my questions:
1. I am using prepared statements to store all user input in the DB. I hope this eliminates the DB injection threat.
Is this measure enough, or do I need to check for % and _ symbols as well for MySQL LIKE queries?
2. For user input where I am not expecting any HTML/CSS (let's call it input A), I use strip_tags and htmlentities before inserting it into the DB.
Is this an adequate measure? Should I be doing more?
3. For user input that may contain HTML/CSS tags (let's call it input B), I use htmlentities on the text and then insert it into the DB.
As far as I am aware I should not use htmlentities before inserting into the DB, but I have to because the previous programmer was using it. Are there any negative impacts of this?
4. After fetching from the DB and before displaying input A / input B, I do not do any preprocessing, assuming that the data added to the DB is clean.
Should I process / sanitize the data before displaying it? If yes, then how?
5. I want the HTML tags entered by the user to be parsed by the browser and not displayed to the user. E.g. if the user entered <p style='color:red;'>hello</p><p class='noclass'>world</p>, I want the user to see only the two words and not the actual markup.
To achieve this, how can I make sure that the user doesn't add a malicious script while the HTML tags are still stored, fetched and parsed by the browser correctly?
Please guide if the current approach is sufficient / not sufficient / less / incorrect.
I am neither a complete newbie to PHP nor a pro. I know the basics of PHP (or, we can say, overall web application) security. So can someone please tell me whether I am making any mistakes security-wise, or whether I should be doing something more, less or differently?
I know the basics of security but I still get confused over:
Which exact security measure to apply at which exact point? (e.g. escape strings BEFORE inserting into the DB)
Which functions are available in PHP at each point? (e.g. to escape strings, use prepared statements)

1. Yes, prepared statements are great at preventing SQL injection problems. Yes, you will have to take care of % and _ in LIKE queries; a prepared statement cannot escape them since it has no way to know whether you want those values there or not.
2. through 5.: It's always a bad idea to escape data going into the database for a format it's destined for on output. Why? First of all, why are you so sure you're always going to use the data in an HTML context? Maybe you'll be using it in a different format in the future, and then you'll have garbage-looking data. (This is more hypothetical in your case, as you're explicitly storing HTML.)
Secondly though, your output code will have to rely on your input code to correctly have escaped data in advance, possibly with a long time between input and output. Your output code can have no confidence whatsoever that the input code did the correct job for what the output code needs it to do. Therefore, escaping for output must happen at the time of output. No sooner, no later.
Thirdly (is that a word?), strip_tags is absolutely insufficient to accept some HTML but not other "insecure" HTML. You need a more complex library which has more complex whitelisting rules than what strip_tags can do. Supposedly the only library that does that is HTML Purifier. I'd run all user HTML through it.
To summarise:
Prepared statements.
HTML-escape data that is not supposed to contain literal HTML on output.
Run any data that is supposed to contain literal HTML through HTML Purifier. Whether you do this before or after inserting to the database is up to you, depending on whether you want to store the literal input the user sent you or whether you don't mind discarding that original data immediately and storing only sanitised data instead. But, the same caveat about having confidence in your output code applies too.
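The point about % and _ in LIKE queries can be sketched as follows. This is only an illustration under stated assumptions: the escapeLike() and findByNameFragment() helpers and the products table are hypothetical, and an in-memory SQLite database stands in for MySQL so the snippet is self-contained. SQLite needs the explicit ESCAPE clause; in MySQL the backslash is already the default LIKE escape character.

```php
<?php
// Hypothetical helper: make %, _ and the escape character itself literal in a LIKE pattern.
function escapeLike(string $term, string $escapeChar = '\\'): string
{
    return str_replace(
        [$escapeChar, '%', '_'],
        [$escapeChar . $escapeChar, $escapeChar . '%', $escapeChar . '_'],
        $term
    );
}

// In-memory SQLite database used only for demonstration.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE products (name TEXT)');

$insert = $pdo->prepare('INSERT INTO products (name) VALUES (?)');
foreach (['100% pure', '100 percent', "O'Brien_1"] as $name) {
    $insert->execute([$name]); // bound as a value, so quotes need no manual escaping
}

// Hypothetical search function: the wildcards we want are added after escaping the user's text.
function findByNameFragment(PDO $pdo, string $fragment): array
{
    $stmt = $pdo->prepare(
        "SELECT name FROM products WHERE name LIKE ? ESCAPE '\\' ORDER BY name"
    );
    $stmt->execute(['%' . escapeLike($fragment) . '%']);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(findByNameFragment($pdo, '100%')); // only the row containing a literal "100%"
print_r(findByNameFragment($pdo, '_'));    // only the row containing a literal underscore
```

The prepared statement keeps the value out of the SQL text; the LIKE escaping is a separate concern that only decides which characters act as wildcards.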

Regular Expression Replace for Contact Form

In PHP I have the following regular expression:
$regexp = "/^([-a-z0-9.,!#'?_-\s])+$/i";
I'm trying to validate my website's contact form (specifically the message field) to ensure no nasty code has been entered. The problem I am having is that I need to allow certain normal punctuation and characters, but I'm worried they could be used to insert malicious code.
For any character not matching the expression above, I would like to replace it to make it safe. Two questions:
How do I do the replacement?
What should I replace the character with? For example, I am not allowing parentheses ( ). Would it be best practice to replace them like this: &#40; &#41; or maybe \( \)?
EDIT
The data will be sent to an email address and saved to a database

Mmh why don't you just allow every character to be inserted in the contact form, converting them all with htmlentities as soon as they reach the php script after form submit? That way your users will be able to say what they want, and you won't have any problem with "malicious code" :)
And do not forget to use a proper database wrapper (PDO)
or at least escape when inserting into the database.
– knittl
EDIT: added Knittl's quote to stress it again :)

Use the filter extension. More specifically, use the filter_input() function with a sanitizing filter. For example:
$message = filter_input(INPUT_POST, 'message', FILTER_SANITIZE_STRING);
This will make sure that tags are stripped out of the message and that it is safer to handle. (Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1.)
However, it does not mean that you should treat it as 100% safe. You still need to take precautions when saving the message to the database (such as using the database driver's escape method, and removing unwanted/unneeded/suspicious stuff from the message), as well as making sure that it is safe to output to the client.

How to prevent XSS attack with Zend Form using %

Our company has built a website for a client. The client hired a web security company to test the pages for security before the product launches.
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
They ran another test and it still failed :(
For one input field they sent the following in the HTTP request data: name=%3Cscript%3Ealert%28123%29%3C%2Fscript%3E which basically translates to name=<script>alert(123)</script>
I've added alpha and alnum to some of the fields, which fixes the XSS vulnerability (touch wood) by removing the %. However, now the boss doesn't like it, because what about O'Brien and double-barrelled surnames...
I haven't come across the %3C as < problem while reading up about XSS. Is there something wrong with my HTML character set or encoding or something?
I will probably have to write a custom filter now, but it would be a huge pain to do that with every website and deployment. Please help, this is really frustrating.
EDIT:
If it's about escaping the form's output, how do I do that? The form submits to the same page. How do I escape if all I have in my view is <?= $this->form ?>
How can I get Zend Form to escape its output?

%3Cscript%3Ealert%28123%29%3C%2Fscript%3E is the URL-encoded form of <script>alert(123)</script>. Any time you include < in a form value, it will be submitted to the server as %3C. PHP will read and decode that back to < before anything in your application gets a look at it.
That is to say, there is no special encoding that you have to handle; you won't actually see %3C in your input, you see <. If you're failing to encode that for on-page display then you don't have even the most basic defenses against XSS.
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
I'm afraid you have not fixed your XSS problems at all. You may have merely obfuscated them.
Input filtering is a depressingly common but quite wrong strategy for blocking XSS.
It is not the input that's the problem. As your boss says, there is no reason you shouldn't be able to input O'Brien. Or even <script>, like I am just now in this comment box. You should not attempt to strip tags in the input or even HTML-encode them, because who knows at input-time that the data is going to end up in an HTML page? You don't want your database filled with nonsense like 'Fish&amp;Chips' which then ends up in an e-mail or other non-HTML context with weird HTML escapes in it.
HTML-encoding is an output-stage issue. Leave the incoming strings alone, keep them as raw strings in the database (of course, if you are hacking together queries in strings to put the data in the database instead of parameterised queries, you would need to SQL-escape the content at exactly that point). Then only when you are inserting the values in HTML, encode them:
Name: <?php echo htmlspecialchars($row['name']); ?>
If you have a load of dodgy code like echo "Name: $name"; then I'm afraid you have much rewriting to do to make it secure.
Hint: consider defining a function with a short name like h so you don't have to type htmlspecialchars so much. Don't use htmlentities which will usually-unnecessarily encode non-ASCII characters, which will also mess them up unless you supply a correct $charset argument.
(Or, if you are using Zend_View, $this->escape().)
Input validation is useful on an application-specific level, for things like ensuring telephone number fields contain numbers and not letters. It is not something you can apply globally to avoid having to think about the issues that arise when you put a string inside the context of another string—whether that's inside HTML, SQL, JavaScript string literals or one of the many other contexts that require escaping.
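The short escaping helper suggested above could look something like this. It is a sketch: the name h() comes from the hint, while the exact flags (ENT_QUOTES | ENT_SUBSTITUTE) and the UTF-8 charset are assumptions about the site's setup.

```php
<?php
// Short alias for HTML-escaping at output time (hypothetical helper).
function h(?string $value): string
{
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// What the tester sent, and what the application sees once it has been decoded.
// (PHP does this decoding itself for $_GET/$_POST; urldecode() just imitates it here.)
$raw = '%3Cscript%3Ealert%28123%29%3C%2Fscript%3E';
$name = urldecode($raw);

// The raw value is stored untouched; escaping happens only when writing HTML.
echo 'Name: ' . h($name) . "\n";      // Name: &lt;script&gt;alert(123)&lt;/script&gt;
echo 'Name: ' . h("O'Brien") . "\n";  // Name: O&#039;Brien
```

With this in place O'Brien can be stored and displayed as typed, and the script payload shows up as harmless text.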

If you correctly escape strings every time you write them to the HTML page, you won't have any issues.
%3C is a URL-encoded <; it is decoded by the server.

What is the correct/safest way to escape input in a forum?

I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!

When generating HTML output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars() : it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you've got what the user did input in the form (ie, when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful : they escape data for SQL
You might also want to take a look at prepared statements, which might help you a bit ;-)
with mysqli - and with PDO
You should not use anything like addslashes : the escaping it does doesn't depend on the database engine ; it is better/safer to use a function that fits the engine (MySQL, PostgreSQL, ...) you are working with : it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier : you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it leaves the attributes of the allowed tags in place)
You might want to take a look at a tool called HTML Purifier : it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time it has to be displayed ; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the not-clean one, in two separate fields -- might be useful to allow people editing their posts ? )
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !

mysql_real_escape_string() escapes everything you need to put in a MySQL database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Everything else can be done with htmlspecialchars(), to escape HTML when you output the data, and urlencode(), to put things in a format suitable for URLs.
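As a rough sketch of how the two output-side functions combine for a post title that also appears in its permalink: the postLink() function and the /post/{id}/{title} URL layout are made up for illustration, and rawurlencode() is used here as the path-segment variant of urlencode().

```php
<?php
// Hypothetical helper: build a link to a post from its raw (unescaped) title.
function postLink(string $title, int $id): string
{
    // URL-encode the title for use inside the URL path...
    $url = '/post/' . $id . '/' . rawurlencode($title);

    // ...then HTML-escape both the attribute value and the visible text.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '">'
        . htmlspecialchars($title, ENT_QUOTES, 'UTF-8')
        . '</a>';
}

echo postLink('Fish & "Chips"', 7), "\n";
// <a href="/post/7/Fish%20%26%20%22Chips%22">Fish &amp; &quot;Chips&quot;</a>
```

Each context gets its own encoding: URL encoding for the path, HTML escaping for anything written into the page.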

There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
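A minimal sketch of the BBCode idea, assuming only [b] and [url=...] are supported. The renderBbcode() function is hypothetical and far from a complete BBCode parser; it is only meant to show the order of operations: escape everything first, then turn the limited markup into HTML, and only accept http/https links.

```php
<?php
// Hypothetical renderer: escape all real HTML, then convert a tiny BBCode subset.
function renderBbcode(string $input): string
{
    // 1. Neutralise any real HTML the user typed.
    $html = htmlspecialchars($input, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // 2. [b]...[/b] becomes <b>...</b>.
    $html = preg_replace('~\[b\](.*?)\[/b\]~s', '<b>$1</b>', $html);

    // 3. [url=...]...[/url] becomes a link, but only for http(s) URLs.
    $html = preg_replace_callback(
        '~\[url=(.*?)\](.*?)\[/url\]~s',
        function (array $m): string {
            if (!preg_match('~^https?://~i', $m[1])) {
                return $m[2]; // e.g. javascript: URLs: keep the text, drop the link
            }
            // $m[1] was already HTML-escaped in step 1, so it is safe in the attribute.
            return '<a href="' . $m[1] . '">' . $m[2] . '</a>';
        },
        $html
    );

    return $html;
}

echo renderBbcode('[b]bold[/b] <script>alert(1)</script>'), "\n";
echo renderBbcode('[url=javascript:alert(1)]click[/url]'), "\n";
echo renderBbcode('[url=https://example.com/?a=1&b=2]site[/url]'), "\n";
```

Because the only HTML in the result is what the renderer itself emits, there is no user-supplied tag or attribute left to smuggle script into.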

The answer to this post is a good answer
Basically, using the pdo interface to parameterize your queries is much safer and less error prone than escaping your inputs manually.

I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A vb.NET Line Of Code Would Be:
SafeComment = Replace( _
    Replace(Replace(Replace( _
    Replace(Replace(Replace( _
    Replace(Replace(Replace( _
    Replace(Replace(Replace( _
    HttpUtility.HtmlEncode(Trim(strInput)), _
    ":", "&#58;"), "-", "&#45;"), "|", "&#124;"), _
    "`", "&#96;"), "(", "&#40;"), ")", "&#41;"), _
    "%", "&#37;"), "^", "&#94;"), """", "&quot;"), _
    "/", "&#47;"), "*", "&#42;"), "\", "&#92;"), _
    "'", "&#39;")

First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.

I second Joeri, do not roll your own. Go here to see some of the many possible XSS attacks:
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns html into text by removing all html tags. Use this to ensure that there is nothing nasty in your input as an adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for mysql and is your defence against SQL injections from little Bobby tables (better to use mysqli and prepare/bind as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given above about avoiding HTML input unless it is essential and opting for BBCode or similar (make your own up if need be) is very sound indeed.

Comprehensive server-side validation

I currently have a fairly robust server-side validation system in place, but I'm looking for some feedback to make sure I've covered all angles. Here is a brief outline of what I'm doing at the moment:
Ensure the input is neither empty nor too long
Escape query strings to prevent SQL injection
Using regular expressions to reject invalid characters (this depends on what's being submitted)
Encoding certain html tags, like <script> (all tags are encoded when stored in a database, with some being decoded when queried to render in the page)
Is there anything I'm missing? Code samples or regular expressions welcome.

You shouldn't need to "Escape" query strings to prevent SQL injection - you should be using prepared statements instead.
Ideally your input filtering will happen before any other processing, so you know it will always be used. Because otherwise you only need to miss one spot to be vulnerable to a problem.
Don't forget to encode HTML entities on output - to prevent XSS attacks.
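As an illustration of the length and character checks from the question, here is a small sketch for a "name" field. The isValidName() function, the 100-character limit and the allowed character set (letters, combining marks, apostrophe, space, hyphen) are assumptions chosen for the example, not a general rule.

```php
<?php
// Hypothetical validator: 1 to 100 characters, starting with a letter,
// then letters, combining marks, apostrophes, spaces or hyphens.
function isValidName(string $name): bool
{
    // \A and \z anchor the whole string; /u treats the input as UTF-8.
    return preg_match('/\A\p{L}[\p{L}\p{M}\' -]{0,99}\z/u', $name) === 1;
}

foreach (["O'Brien", 'Smith-Jones', '', '<script>'] as $candidate) {
    echo var_export($candidate, true), ' => ',
        isValidName($candidate) ? 'accepted' : 'rejected', "\n";
}

// Validation does not replace escaping: an accepted value is still escaped on output.
echo htmlspecialchars("O'Brien", ENT_QUOTES, 'UTF-8'), "\n";
```

Validation of this kind is application-specific; the escaping on output is what actually prevents XSS.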

You should encode every html tag, not only 'invalid' ones. This is a hot debate, but basically it boils down to there will always be some invalid HTML combination that you will forget to handle correctly (nested tags, mismatched tags some browsers interpret 'correctly' and so on). So the safest option in my opinion is to store everything as htmlentities and then, on output, print a validated HTML-safe-subset tree (as entities) from the content.

Run all server-side validation in a library dedicated to the task so that improvements in one area affect all of your application.
Additionally include work against known attacks, such as directory traversal and attempts to access the shell.

This Question/Answer has some good responses that you're looking for
(PHP-oriented, but then again you didn't specify language/platform and some of it applies beyond the php world):
What's the best method for sanitizing user input with PHP?

You might check out the Filter Extension for data filtering. It won't guarantee that you're completely airtight, but personally I feel a lot better using it because that code has a whole lot of eyeballs looking over it.
Also, consider prepared statements seconded. Escaping data in your SQL queries is a thing of the past.
