What else should I be doing to sanitize user input?

Recently, I had an audit run on some of my sites by a client. One of the things they came back with was that I could be sanitizing the input data a little better as people could still cause potential harm to the database.
The function below is what I am currently using (a leftover from the old developer) but I cannot see where the potential issue may lie.
The string that gets passed through to the database will be displayed via XML which in turn is read by a Flash application.
Could anyone tell me what I might be missing? Thanks
function secure_string($string)
{
    return strip_tags(addslashes(mysql_real_escape_string(stripslashes($string))));
}

Consider PHP's filter_var() function (available since PHP 5.2) for validating and sanitizing input.

It looks like there's too much going on in that function. mysql_real_escape_string() already escapes everything you need to escape, so there's no need to run addslashes() on that. In fact, it could do more harm than good by escaping the backslashes mysql_real_escape_string() creates.
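To make the double-escaping problem concrete, here is a minimal sketch. Since mysql_real_escape_string() needs a live connection (and no longer exists in PHP 7+), it assumes a second addslashes() call as a stand-in for the extra escaping layer; the sample value is made up.
```php
<?php
// Stand-in for the stacked escaping in secure_string(): two rounds of
// backslash escaping applied to the same value.
$input = "O'Reilly";

$once  = addslashes($input);  // the quote gets one backslash
$twice = addslashes($once);   // the backslash itself is escaped again

echo $once, PHP_EOL;   // O\'Reilly
echo $twice, PHP_EOL;  // O\\\'Reilly
```
The second value is what would end up stored, stray backslashes included, which is why one escaping step at the query boundary is enough.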

mysql_real_escape_string is the last step; you shouldn't use it in your application logic. Its sole purpose is to pass strings to the database, so use it only when constructing queries. You can pass anything to mysql_real_escape_string and it will make sure you can safely store it in the database.
For the rest, it depends on what you want. If you want to strip tags, use strip_tags, etc.

It depends on where the "secured" string will be used. If it's going into the database, you only need mysql_real_escape_string(), nothing more. If it's going to be displayed in HTML, you only need htmlentities(), nothing more. In short: your code is doing way too much, which could even be harmful.
If you want to store it in the database and display it in HTML later on (like a comment, for example), use mysql_real_escape_string() when storing the string and htmlentities() when displaying it.

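If your server runs PHP 5.2 or later, you can use filter_var for the XML part.
$output = filter_var($input, FILTER_SANITIZE_STRING);
Note that FILTER_SANITIZE_STRING is deprecated as of PHP 8.1; the PHP manual suggests htmlspecialchars() instead.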
To store something into your database, use PDO and parameterized queries.
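A rough sketch of what that could look like. It assumes an in-memory SQLite database (via the pdo_sqlite driver) as a stand-in for the real MySQL server, and the scores table and player column are invented for illustration.
```php
<?php
// Assumption: SQLite in memory replaces the MySQL server so the sketch runs anywhere.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE scores (id INTEGER PRIMARY KEY, player TEXT)');

// Hostile-looking input is bound as data, never spliced into the SQL text.
$player = "Robert'); DROP TABLE scores;--";

$insert = $pdo->prepare('INSERT INTO scores (player) VALUES (:player)');
$insert->execute([':player' => $player]);

$select = $pdo->prepare('SELECT player FROM scores WHERE player = :player');
$select->execute([':player' => $player]);
$stored = $select->fetchColumn();

// Escape only when writing the XML that the Flash client reads.
$xml = '<player>' . htmlspecialchars($stored, ENT_QUOTES | ENT_XML1, 'UTF-8') . '</player>';
echo $xml, PHP_EOL;
```
The value comes back from the database exactly as it went in, and the only escaping happens at the point where the XML is produced.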

It's a mistake to try to fix the problem at input time, since the problem happens at output time. See my answer over here:
What’s the best method for sanitizing user input with PHP?

Related

Sanitizing both HTML and SQL at once?

Right now I am using htmlspecialchars(mysql_escape_string($value)), but is there a way to sanitize it with one statement rather than a nested statement?

Well, there's no single function that handles both of them.
You can use prepared statements and the HTML Purifier class; maybe then the "look and feel" will be a little bit better :)

mysql_real_escape_string has fallen out of favor: the mysql_* extension it belongs to was deprecated in PHP 5.5 and removed in PHP 7.0.
It is now preferred to use PDO or mysqli. They both come with PHP by default. They use something called parameterized queries to access the database, rather than having you write the SQL command yourself. This means that you don't need to worry about escaping anymore, since the query and the variables are passed into the function separately.
You can learn more about PDO here:
http://net.tutsplus.com/tutorials/php/why-you-should-be-using-phps-pdo-for-database-access/
On a related note, it is conventional to store user-supplied input in the database "as it was written", rather than running it through htmlspecialchars first. You should then use htmlspecialchars to escape the data wherever it appears on the site. This is a convention recommended by OWASP.
This is because you need to escape different things depending on context. This string:
' <script src="http://example.org/malice.js"></script> ]}\\\\--
...will need to be treated differently if it is used as a parameter in JSON (the quotes and backslashes and ] and } need to be escaped), HTML (the quotes and <s need to be escaped), or written as a URL (almost everything needs to be escaped). If you need to spend time instructing your JavaScript to un-encode the HTML, then your code is going to be confusing quickly.
This approach also makes fixing bugs simpler: if your site has a bug where content isn't escaped properly on a single page, then you can update the page and everything is fixed. If your site has a bug where the data is getting stored in the database incorrectly, then you need to fix everything in the database (which will take much longer and harm more users).
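As a small illustration of the "different escaping per context" point, the sketch below runs the sample string through three standard PHP encoders. Which encoder fits which output is my own mapping, not something the original code base prescribes.
```php
<?php
// The raw value, stored exactly as the user typed it (nowdoc keeps it literal).
$raw = <<<'TXT'
' <script src="http://example.org/malice.js"></script> ]}\\\\--
TXT;

// One stored value, three destinations, three different encodings.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'); // HTML text or attribute
$forJson = json_encode($raw);                           // JSON string literal
$forUrl  = rawurlencode($raw);                          // URL path or query component

echo $forHtml, PHP_EOL, $forJson, PHP_EOL, $forUrl, PHP_EOL;
```
Each encoding decodes back to the same raw string, which is only possible because the database copy was left untouched.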

Should quotes be saved escaped in MySQL?

I am working on a portal and I have a few questions about saving data in MySQL tables:
Should I save varchar fields escaped?
I'm now using mysql_real_escape_string() to avoid injection.
Why should I save them unescaped (this was proposed by someone on this website), and how would that work for characters like single and double quotes? Doesn't it wreck the SQL command?
Please keep the explanation simple.
One last thing: I was using addslashes and stripslashes before switching to mysql_real_escape_string, and it worked for me (though it left the door open to SQL injection, which I only recently discovered and read up on).
Thanks

The very basic thing any programmer must learn is the meaning of context.
What am I going on about here? If you knew the meaning of context, you wouldn't have asked this question. Now that (I hope) you know, you won't ask how to show <test> as HTML, or how to pass a variable to JavaScript.
So what's it all about? It's really easy. Context is the simple fact that something in a system may mean something entirely different somewhere else.
For example, in your case, a PHP string may mean something entirely different to MySQL. You can't just pass the string and expect everything to run smoothly - it won't.
So, now that you know what context means, you need to know something else that is important. You always need to convert a value from the older context to the newer one. Always.
Again, in your case, it's mysql_real_escape_string(), but a word of warning: conversion functions are context specific, so, for example, you can't use mysql_real_escape_string() to pass a string from PHP to JavaScript. Similarly, you can't just use addslashes() and expect it to work. In fact, I'd argue that addslashes() is a completely useless and misleading function. Do NOT use it unless you are very sure of what you are doing.

"Should I save varchar fields escaped?"
No. You should escape data so that characters in the data with special meaning in SQL won't cause you problems.
Once the data passes through SQL and gets stored in the database, it is no longer escaped.
"I'm now using mysql_real_escape_string() to avoid injection."
Don't do that; use prepared statements and parameterized queries instead.
"I was using addslashes and stripslashes"
addslashes is a basic form of escaping. It is pointless unless you know exactly what the target of the data is. You should use something more specific where such a thing exists (and you are: mysql_real_escape_string).
stripslashes does the opposite of addslashes. Using them together is utterly pointless.
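To show that quotes survive unescaped and do not wreck the SQL command, here is a short sketch with a prepared statement. It assumes SQLite in memory in place of MySQL, and the notes table is hypothetical.
```php
<?php
// Assumption: in-memory SQLite stands in for MySQL; "notes" is a made-up table.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE notes (body VARCHAR(255))');

// Contains both single and double quotes.
$body = 'He said "it\'s fine" and left';

// The placeholder keeps the value out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO notes (body) VALUES (?)');
$stmt->execute([$body]);

$stored = $pdo->query('SELECT body FROM notes')->fetchColumn();
var_dump($stored === $body); // the stored value carries no added backslashes
```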

Do I really need to use mysql_real_escape_string when I save data in the DB?

I am using mysql_real_escape_string to save content in my MySQL database. The content I save is HTML submitted through a form. I delete and re-upload the PHP file that writes to the DB whenever I need it.
To display my HTML input correctly, I use stripslashes().
In the other case, when I insert it without mysql_real_escape_string, I do not use stripslashes() on the output.
What is your opinion? Does stripslashes hurt performance?

Do not use stripslashes(). It is utterly useless in terms of security, and there's no added benefit. This practice came from the dark ages of "magic quotes", a feature that was deprecated in PHP 5.3 and removed in PHP 5.4.
Instead, only filter input:
string: mysql_real_escape_string($data)
integers: (int)$data
floats: (float)$data
boolean: isset($data) && $data
The output is a different matter. If you are storing HTML, you need to filter the HTML against JavaScript.
Edit: If you have to call stripslashes() for the output to look correct, then most probably you have magic quotes turned on. Some CMSs even made the grave mistake of implementing their own magic quotes (e.g. WordPress). Always filter as I advised above, turn off magic quotes, and you should be fine.
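A quick sketch of the type-based filtering listed above. The $request array is a hypothetical stand-in for $_POST, and the filter_var() line is an extra I added to contrast a lenient cast with strict validation.
```php
<?php
// Hypothetical request values; in a real script these would come from $_POST.
$request = ['id' => '42abc', 'price' => '19.99', 'subscribe' => 'on'];

$id        = (int) $request['id'];       // lenient: keeps the leading digits
$price     = (float) $request['price'];
$subscribe = isset($request['subscribe']) && $request['subscribe'];

// Stricter alternative: returns false instead of guessing at malformed input.
$strictId = filter_var($request['id'], FILTER_VALIDATE_INT);

var_dump($id, $price, $subscribe, $strictId);
```
The cast quietly turns "42abc" into 42, while FILTER_VALIDATE_INT reports it as invalid, so the choice depends on whether you want to coerce or to reject.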

Do not think about performance, think about security. Use mysql_real_escape_string every time you insert data into the DB.

No, don't escape it. Use prepared statements instead. Store your data in its raw format, and process it as necessary for display - for example, use a suitable method to prevent Javascript from executing when displaying user supplied HTML.
See Bill Karwin's "SQL Injection Myths and Fallacies" talk and slides for more information on this subject.
See HTML Purifier and htmlspecialchars for a couple of approaches to filter your HTML for output.

Check out a database abstraction library that does all this and more for you automatically, such as ADOdb at http://adodb.sourceforge.net/
It addresses a lot of the concerns others have brought up such as security / parameterization. I doubt any performance saved is worth the developer hassle to do all this manually every query, or the security practices sacrificed.

It is always best to scrub your data for potentially malicious or overlooked special characters that might throw errors or corrupt your database.
The PHP docs for mysql_real_escape_string even say: "If this function is not used to escape data, the query is vulnerable to SQL Injection Attacks."

Validating user input?

I am very confused over something and was wondering if someone could explain.
In PHP I validate user input, so htmlentities and mysql_real_escape_string are applied before inserting into the database. I don't do this for everything, as I prefer to use regular expressions when I can, although I find them hard to work with. Obviously I will use mysql_real_escape_string since the data is going into the database, but I'm not sure whether I should use htmlentities() only when reading data from the database and displaying it on a web page. Applying it beforehand alters the data a person entered, so it is no longer in its original form, which may cause problems if I want to use that data for something else later on.
For example, I have a guestbook with three fields: name, subject and message. The fields can contain anything, including malicious code in script tags. What confuses me is this: say I am a malicious person and I submit the form with script tags and some malicious JS code. Now I have malicious, useless data in my database. Using htmlentities when outputting it to the web page (the guestbook) means it is not a problem, because htmlentities converts it to its safe equivalent, but I still have useless malicious code in the database that I would rather not have.
So my question is: should I accept that some data in the database may be malicious or useless, and that everything will be OK as long as I use htmlentities on output, or should I be doing something else as well?
I have read many books that talk about filtering data on receipt and escaping it on output so that the original form is kept, but they only ever give examples like ensuring a field is an int using functions built into PHP. I have never found anything about something like a guestbook, where you want users to be able to type anything they want. How would you filter such data, apart from using mysql_real_escape_string() to ensure it does not break the DB query?
Could someone please clear up this confusion for me and tell me what I should be doing and what the best practice is?
Thanks to anyone who can explain.
Cheers!

This is a long question, but I think what you're actually asking boils down to:
"Should I escape HTML before inserting it into my database, or when I go to display it?"
The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars) when you go to display it to the user, and not before putting it into the database.
The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string, it does not alter what is inserted into the database; it merely avoids interpreting the user's input as SQL statements. htmlspecialchars does the same thing for HTML; when you print the user's input, it will avoid having it interpreted as HTML. If you were to call htmlspecialchars before the insert, you would no longer be faithful to what the user typed.
You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?
You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
On another note, the best way to do database escaping with PHP is probably to use PDO, rather than calling mysql_real_escape_string directly. PDO has more advanced functionality, including type checking.
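For the guestbook case, escaping at display time might look roughly like this. render_entry() is a hypothetical helper and the row is hard-coded, standing in for whatever comes back from the database.
```php
<?php
// Hypothetical helper: escapes a raw guestbook row for HTML at display time only.
function render_entry(array $row): string
{
    $esc = function (string $value): string {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    };
    return '<h3>' . $esc($row['subject']) . '</h3>'
         . '<p>' . $esc($row['name']) . ': ' . $esc($row['message']) . '</p>';
}

// The row as stored: exactly what the user typed, script tags and all.
$row = [
    'name'    => 'Mallory',
    'subject' => 'Hi & bye',
    'message' => '<script>alert("xss")</script>',
];

echo render_entry($row), PHP_EOL;
// <h3>Hi &amp; bye</h3><p>Mallory: &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</p>
```
The browser shows the script as text instead of running it, and the original message stays available for any other use later.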

mysql_real_escape_string() is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.
htmlentities() and htmlspecialchars() come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.

There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.

I'm learning PHP on my own and I've become aware of the strip_tags() function. Is this the only way to increase security?

I'm new to PHP and I'm following a tutorial here:
Link
It's pretty scary that a user can write php code in an input and basically screw your site, right?
Well, now I'm a bit paranoid and I'd rather learn security best practices right off the bat than try to cram them in once I have some habits in me.
Since I'm brand new to PHP (literally picked it up two days ago), I can learn pretty much anything easily without getting confused.
What other way can I prevent shenanigans on my site? :D

There are several things to keep in mind when developing a PHP application, and strip_tags() only helps with one of them. Actually strip_tags(), while effective, might even do more than needed: converting possibly dangerous characters with htmlspecialchars() may even be preferable, depending on the situation.
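A tiny comparison of the two, using a made-up comment string:
```php
<?php
$comment = '<b>Hello</b> <script>alert(1)</script>';

// strip_tags() throws the markup away (but keeps the text between the tags).
$stripped = strip_tags($comment);

// htmlspecialchars() keeps everything and makes it inert for HTML output.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo $stripped, PHP_EOL; // Hello alert(1)
echo $escaped, PHP_EOL;  // &lt;b&gt;Hello&lt;/b&gt; &lt;script&gt;alert(1)&lt;/script&gt;
```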
Generally it all comes down to two simple rules: filter all input, escape all output. Now you need to understand what exactly constitutes input and output.
Output is easy, everything your application sends to the browser is output, so use htmlspecialchars() or any other escaping function every time you output data you didn't write yourself.
Input is any data not hardcoded in your PHP code: things coming from a form via POST, from a query string via GET, from cookies. All of those must be filtered in the most appropriate way depending on your needs. Even data coming from a database should be considered potentially dangerous; especially on a shared server, you never know if the database was compromised elsewhere in a way that could affect your app too.
There are different ways to filter data: white lists to allow only selected values, validation based on the expected input format, and so on. One thing I never suggest is to try fixing the data you get from users: have them play by your rules, and if you don't get what you expect, reject the request instead of trying to clean it up.
Special attention, if you deal with a database, must be paid to SQL injection: that kind of attack relies on you not properly constructing the query strings you send to the database, so that the attacker can forge them to execute malicious instructions. You should always use an escaping function such as mysql_real_escape_string() or, better, use prepared statements with the mysqli extension or PDO.
There's more to say on this topic, but these points should get you started.
HTH
EDIT: to clarify, by "filtering input" I mean decide what's good and what's bad, not modify input data in any way. As I said I'd never modify user data unless it's output to the browser.
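Here is one way the "decide what's good, reject the rest" idea could be sketched. validate_order() and its field names are invented for the example; only filter_var() and in_array() are standard PHP.
```php
<?php
// Hypothetical validator: returns the accepted values, or null if the request
// should be rejected. It never tries to repair bad input.
function validate_order(array $input): ?array
{
    $allowedSizes = ['small', 'medium', 'large']; // white list

    // Format validation: an integer between 1 and 99, nothing else.
    $quantity = filter_var(
        $input['quantity'] ?? null,
        FILTER_VALIDATE_INT,
        ['options' => ['min_range' => 1, 'max_range' => 99]]
    );
    $size = $input['size'] ?? '';

    if ($quantity === false || !in_array($size, $allowedSizes, true)) {
        return null;
    }
    return ['quantity' => $quantity, 'size' => $size];
}

var_dump(validate_order(['quantity' => '3', 'size' => 'medium']));  // accepted
var_dump(validate_order(['quantity' => '3; DROP', 'size' => 'medium'])); // NULL
```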

strip_tags is not really the best thing to use; it doesn't protect in all cases.
HTML Purifier (http://htmlpurifier.org/) is a really good option for processing incoming data. It still will not cater for all use cases on its own, but it's definitely a good starting point.

I have to say that the tutorial you mentioned is a little misleading about security:
"It is important to note that you never want to directly work with the $_GET & $_POST values. Always send their value to a local variable, & work with it there. There are several security implications involved with the values when you directly access (or output) $_GET & $_POST."
This is nonsense. Copying a value to a local variable is no more safe than using the $_GET or $_POST variables directly.
In fact, there's nothing inherently unsafe about any data. What matters is what you do with it. There are perfectly legitimate reasons why you might have a $_POST variable that contains ; rm -rf /. This is fine for outputting on an HTML page or storing in a database, for example.
The only time it's unsafe is when you're using a command like system or exec. And that's the time you need to worry about what variables you're using. In this case, you'd probably want to use something like a whitelist, or at least run your values through escapeshellarg.
Similarly with sending queries to databases, sending HTML to browsers, and so on. Escape the data right before you send it somewhere else, using the appropriate escaping method for the destination.
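For the shell case, a minimal sketch of escaping right before the value leaves PHP. The ls command is just an example and the command is only built here, not executed.
```php
<?php
// A value that is harmless in HTML or a database, but dangerous in a shell.
$userValue = '; rm -rf /';

// Escape at the moment the value is handed to the shell context.
$command = 'ls -l ' . escapeshellarg($userValue);

echo $command, PHP_EOL; // on Unix-like systems: ls -l '; rm -rf /'
```
With the quoting in place the shell sees one odd file name rather than a second command.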

strip_tags removes every piece of HTML. More sophisticated solutions are based on whitelisting (i.e. allowing specific HTML tags). A good whitelisting library is HTML Purifier: http://htmlpurifier.org/
And of course, on the database side of things, use functions like mysql_real_escape_string or pg_escape_string.

Well, I may be wrong, but in all the literature I've read, people say it's much better to use htmlspecialchars.
It's also necessary to cast input data (to int, for example, if you are sure it's a user ID).
And once you start using a database, use mysql_real_escape_string instead of mysql_escape_string to prevent SQL injection (some old books still show mysql_escape_string).
