PHP - Sanitize all data?

I am making a CMS which can be extended by third-party developers. In the past I have had problems with newbie developers ignoring security altogether. When they put their modules on my website, they are potentially compromising users' websites.
I want to create a globals object. This will overwrite all globals with a sanitized copy. This could cause issues, so this object will also provide an option to get unsanitized data.
This way, by default, developers could theoretically do something like this and its effect wouldn't be as bad as it usually would be. (Obviously this could still cause problems; however, tables won't be dropped and data won't be exposed.)
mysql_query("INSERT INTO users (`name`) VALUES ('{$_POST['name']}')");
This doesn't protect against developers who intentionally try to break things. However, it will help eliminate basic mistakes.
The end object would be accessed as follows.
$_POST['key']; // Provides Sanitized version of the post key.
$obj->post('key'); // Provides Sanitized version of the post key.
$obj->post_raw('key'); // Provide unsanitized version of the post key.
What do people think about this approach? Is there a proven 'escape all' function floating around that would achieve this?

You're basically talking about reimplementing magic_quotes_gpc. It didn't go that well when Zend did it.
The largest problems are 1) different forms of data protection are necessary for different contexts, and 2) if somebody is too much of a noob to do elementary data security, they're definitely too much of a noob to understand what data your auto-protection mechanism has been applied to and which it hasn't. (They will source data from places your mechanism does not and cannot touch; take this as a given.)
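To illustrate the first point, here is a minimal sketch of one raw value encoded for three different destinations. The sample value and variable names are hypothetical; the point is only that no single "escape all" output fits every context.

```php
<?php
// Hypothetical input value; in a real request it might come from $_POST.
$input = "O'Reilly <b>& Sons</b>";

// Each destination needs its own encoding of the same raw value.
$forHtml = htmlspecialchars($input, ENT_QUOTES, 'UTF-8'); // HTML body or attribute
$forUrl  = rawurlencode($input);                          // URL path or query component
$forJson = json_encode(                                   // JSON embedded in a page
    $input,
    JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT
);

echo $forHtml, "\n", $forUrl, "\n", $forJson, "\n";
```

A value pre-escaped for one of these contexts would be wrong in the other two, which is why a global sanitized copy of the superglobals cannot be correct everywhere.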

No, it's really difficult to have a generic sanitizing function. It's always use-specific. And let me thus recommend something else:
http://sourceforge.net/p/php7framework/wiki/input/
It basically overwrites the superglobals $_GET, $_POST with objects. This prevents raw access, and you get either notices or log errors if no appropriate filter is used. You still have to think about which filter to use, but at least this method can be used to coerce co-developers into spending a few seconds thinking about it. Also it's really easy to apply:
$_GET->text["comment"]
mysql_query("SELECT '{$_REQUEST->sql[field]}'");
$_POST->nocontrol->utf7->xss->text["text"];
It's also possible to predefine filter lists for specific input variable names. Or set a filter for all old array accesses with $_POST->xss->nocontrol->always(); It needs some getting used to, but it's really the simplest API possible and meant just for cases like you describe.

You may want to check out http://code.google.com/p/inspekt/ , which pretty much already does what you describe.

Security is a very complicated and delicate subject IMHO.
I'm not sure if you should even allow unsafe access to data. I'd make access only to sanitized contents, and also enforce use of prepared statements.
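As a rough sketch of what the prepared-statement approach looks like, the INSERT from the question can be written with PDO as below. This assumes the pdo_sqlite extension and uses an in-memory SQLite database purely so the example is self-contained; the question itself uses MySQL, and the hostile sample value is made up.

```php
<?php
// In-memory SQLite database, used only to keep the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

// Hostile value standing in for $_POST['name'].
$name = "Robert'); DROP TABLE users; --";

// The placeholder keeps the value out of the SQL text entirely.
$stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->execute([':name' => $name]);

// The value is stored verbatim and the table still exists.
$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
echo $stored, "\n";
```

With this style there is no need for a separate "SQL-sanitized" copy of the input, because the value never becomes part of the query string.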

Related

Sanitizing PHP Variables, am I overusing it?

I've been working with PHP for some time and I began asking myself if I'm developing good habits.
One of these is what I believe is an overuse of PHP sanitizing methods. For example, a user registers through a form, and I get the following post variables:
$_POST['name'], $_POST['email'] and $_POST['captcha']. Now, what I usually do is obviously sanitize the data I am going to place into MySQL, but when comparing the captcha, I also sanitize it.
Therefore I believe I have misunderstood PHP sanitizing. I'm curious: are there any other cases when you need to sanitize data, except when using it to place something in MySQL (note I know sanitizing is also needed to prevent XSS attacks)? And moreover, is my habit of sanitizing almost every variable coming from user input a bad one?

Whenever you store your data someplace, and if that data will be read/available to (unsuspecting) users, then you have to sanitize it. So something that could possibly change the user experience (not necessarily only the database) should be taken care of. Generally, all user input is considered unsafe, but you'll see in the next paragraph that some things might still be ignored, although I don't recommend it whatsoever.
Stuff that happens on the client only is sanitized just for a better UX (user experience, think about JS validation of the form - from the security standpoint it's useless because it's easily avoidable, but it helps non-malicious users to have a better interaction with the website) but basically, it can't do any harm because that data (good or bad) is lost as soon as the session is closed. You can always destroy a webpage for yourself (on your machine), but the problem is when someone can do it for others.
To answer your question more directly - never worry about overdoing it. It's always better to be safe than sorry, and the cost is usually not more than a couple of milliseconds.

The term you need to search for is FIEO. Filter Input, Escape Output.
You can easily confound yourself if you do not understand this basic principle.
Imagine PHP is the man in the middle, it receives with the left hand and doles out with the right.
A user uses your form and fills in a date field, so it should only accept digits and maybe dashes, e.g. nnnn-nn-nn. If you get something which does not match that, then reject it.
That is an example of filtering.
Next, PHP does something with it, let's say storing it in a MySQL database.
What MySQL needs is to be protected from SQL injection, so you use PDO's or mysqli's prepared statements to make sure that EVEN IF your filter failed you cannot permit an attack on your database. This is an example of escaping, in this case escaping for SQL storage.
Later, PHP gets the data from your DB and displays it on an HTML page. So you need to escape the data for the next medium, HTML (this is where you would otherwise permit XSS attacks).
In your head you have to divide each of the PHP 'protective' functions into one or other of these two families, Filtering or Escaping.
Freetext fields are of course more complex than filtering for a date, but never mind, stick to the principles and you will be OK.
Hoping this helps http://phpsec.org/projects/guide/
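The two families can be sketched roughly as follows. The function names filterDate and escapeHtml are made up for this illustration, and the date format is assumed to be YYYY-MM-DD.

```php
<?php
// Filter input: accept only a real calendar date in YYYY-MM-DD form.
function filterDate(string $raw): ?string
{
    // \z anchors at the very end, so a trailing newline is not accepted.
    if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})\z/', $raw, $m)) {
        return null; // wrong shape: reject
    }
    // checkdate() takes month, day, year in that order.
    return checkdate((int) $m[2], (int) $m[3], (int) $m[1]) ? $raw : null;
}

// Escape output: encode for the HTML context at the moment of display.
function escapeHtml(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
}

var_dump(filterDate('2012-02-29')); // a valid leap day is accepted
var_dump(filterDate('2012-02-30')); // an impossible date is rejected
echo escapeHtml('<script>alert(1)</script>'), "\n";
```

The filter rejects rather than repairs, and the escape step is applied independently of whether the filter ran.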

How can I prevent form variable names from revealing the database structure?

I'm in the early phases of working on a web application, and I've reached a point where I want to make the best choice about a particular security concern. At the moment, all fields found within HTML forms are named after the database column that they are representing. So, for example, if in the database I have a field named "email", then the form field will be called "email" as well. This makes it easier for my generic code to handle forms, but I naturally see one major problem with such names: They can give potential hackers insight into how my database is structured, just from viewing the source.
The main solution I've thought of involves encrypting field names so that client never has the real ones. A server-side key would be used to do the encryption. I am, however, concerned that this approach may complicate things too much. For example:
I may find myself having to use POSTs more often, as the encrypted text might be longer than the original - pushing the limits of GET when many fields and their data are present.
Frequent encrypt/decrypt calls might lead to performance issues down the line. I did not test this yet, so it could end up being negligible.
Non-AJAX GETs can't use this approach without looking really cryptic.
So, I'm wondering what you guys think about this. Am I over-thinking it, or am I on the right track? Is there a better way to handle it?
By the way, I'm also aware that a field name like "email" doesn't offer much information to the developer (why not txtEmail, or something like that?). I'm looking to see if there's a good naming convention that I can adopt, as it might help with the above problem.

If anyone can gain access to your DB via SQL injection or any other method, your schema can be revealed with one query so there is no point in trying to obscure it. If you feel you have to do security by obscurity, you're not doing something else right.
If your application is secured, then it doesn't matter if a potential attacker thinks they know your schema or not. They can't do anything with the information.
I'd spend less time trying to obscure your database (which will only frustrate you and your developers) and more time trying to lock down your application against potential injection attacks.

Is it safe to unset PHP super-globals if this behavior is documented?

I'm building a PHP framework, and in it I have a request object that parses the URL as well as the $_GET, $_POST and $_FILES superglobals.
I want to encourage safe web habits, so I'm protecting the data against SQL injection, etc.
In order to ensure users of this framework are accessing the safe, clean data through the request object, I plan to use unset($_GET, $_POST, $_REQUEST); after parsing those variables.
I will document this in the method comments, and explain in the framework documentation that this is happening.
My question is: Would this be desirable behavior? What are the potential pitfalls that I have not foreseen?

I know this was answered already, but here's my $0.02.
I would not unset or clear the input arrays. However, what I have done is to replace them with an object. So instead of having the raw array, I replace it with an object that implements ArrayAccess and Iterator. That way the vast majority of code which uses the native arrays will still work quite well with the object.
The rationale is that at least you can verify that the code paths are operating correctly via tests. You can replace those objects with a mock object to throw an exception during testing so that you can detect improper access to those arrays (if you do determine it to be "bad practice"). So it lets you run during production without putting un-necessary restrictions, but also lets you turn it on to verify best-practices during testing.
And while I do agree with @JW about escaping, you should be filtering input. Filter-in, Escape-out. Any time data comes into your program (either via user input or from a DB), filter it to expected values. Any time data goes out (either to the DB or to the user), you need to properly escape it for that medium. So using a request object that enables easy filtering of the submitted data can be very valuable.
An example using a fluent interface (which you may or may not want):
$id = $request->get('some_id')->filter('int', array('min' => 1));
And that doesn't include the benefits of compensating for differing platforms and configurations (for example, whether magic_quotes_gpc is enabled or not, etc.)...
Anyway, that's just my opinion...
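A rough sketch of the idea, not the answerer's actual implementation: a read-only wrapper that implements ArrayAccess so existing array-style code keeps working, plus one filtering helper built on filter_var. The class name RequestData and the int() method are hypothetical, and the typed signatures assume PHP 8.0 or later.

```php
<?php
// Hypothetical wrapper; class and method names are illustrative only.
final class RequestData implements ArrayAccess
{
    private array $data;

    public function __construct(array $data)
    {
        $this->data = $data;
    }

    // ArrayAccess keeps existing $request['key'] code working.
    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->data[$offset] ?? null;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        throw new LogicException('Request data is read-only');
    }

    public function offsetUnset(mixed $offset): void
    {
        throw new LogicException('Request data is read-only');
    }

    // Returns a validated integer >= $min, or null if the value does not qualify.
    public function int(string $key, int $min = PHP_INT_MIN): ?int
    {
        $value = filter_var($this->data[$key] ?? null, FILTER_VALIDATE_INT, [
            'options' => ['min_range' => $min],
        ]);
        return $value === false ? null : $value;
    }
}

// Stand-in for $_GET so the sketch runs on its own.
$request = new RequestData(['some_id' => '42', 'bad' => '0']);
var_dump($request->int('some_id', 1)); // passes the filter
var_dump($request->int('bad', 1));     // below the minimum, so rejected
```

For testing, the same interface could be swapped for a mock that throws on offsetGet, which is the detection trick described above.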

I'm not sure what the point would be of preventing access to the $_GET or $_POST arrays. There's nothing harmful in them. If you're creating a framework for preventing SQL injection or cross-site-scripting, you should escape the data when creating an SQL query or HTML document.
Escaping GET/POST data at the beginning is too early; you don't know how the data will be used, so you can't escape or encode it properly.
Having said that, you still may have some valid reasons to want people to access GET/POST data through your code. In that case, I still wouldn't unset them. You may end up incorporating third-party code which relies on them. Instead, just encourage your users to avoid them (like they should avoid global variables in general).

I'd maybe expose a method (maybe hidden or super counter-intuitive ;)) to get the raw data, in the off chance that your sanitization routines corrupt data in some unforeseen manner. To protect the user is one thing, but to completely lock them from their ability to retrieve data in the most raw manner may lead to frustration and, as a result, those people not using your framework :)

Keep in mind this increases your maintenance costs...if anything is ever added, removed or changed with the super globals in PHP, you will need to update your framework.

Sounds like magic_quotes style thinking. Except, at least magic_quotes was 99% reversible at runtime. Your "cleaned" data might be lossy, which really sucks.

I'm learning PHP on my own and I've become aware of the strip_tags() function. Is this the only way to increase security?

I'm new to PHP and I'm following a tutorial here:
Link
It's pretty scary that a user can write php code in an input and basically screw your site, right?
Well, now I'm a bit paranoid and I'd rather learn security best practices right off the bat than try to cram them in once I have some habits in me.
Since I'm brand new to PHP (literally picked it up two days ago), I can learn pretty much anything easily without getting confused.
What other way can I prevent shenanigans on my site? :D

There are several things to keep in mind when developing a PHP application; strip_tags() only helps with one of those. Actually strip_tags(), while effective, might even do more than needed: converting possibly dangerous characters with htmlspecialchars() may even be preferable, depending on the situation.
Generally it all comes down to two simple rules: filter all input, escape all output. Now you need to understand what exactly constitutes input and output.
Output is easy, everything your application sends to the browser is output, so use htmlspecialchars() or any other escaping function every time you output data you didn't write yourself.
Input is any data not hardcoded in your PHP code: things coming from a form via POST, from a query string via GET, from cookies, all those must be filtered in the most appropriate way depending on your needs. Even data coming from a database should be considered potentially dangerous; especially on shared server you never know if the database was compromised elsewhere in a way that could affect your app too.
There are different ways to filter data: white lists to allow only selected values, validation based on expected input format and so on. One thing I never suggest is trying to fix the data you get from users: have them play by your rules, and if you don't get what you expect, reject the request instead of trying to clean it up.
Special attention, if you deal with a database, must be paid to SQL injection: that kind of attack relies on you not properly constructing the query strings you send to the database, so that the attacker can forge them to execute malicious instructions. You should always use an escaping function such as mysql_real_escape_string() or, better, use prepared statements with the mysqli extension or PDO.
There's more to say on this topic, but these points should get you started.
HTH
EDIT: to clarify, by "filtering input" I mean decide what's good and what's bad, not modify input data in any way. As I said I'd never modify user data unless it's output to the browser.
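A small sketch of "decide what's good and what's bad, then reject" using PHP's built-in filter_var and a white list. The function name validateSignup, the field names, and the allowed values are invented for the example.

```php
<?php
// Hypothetical validator: returns the accepted values, or null to reject the request.
function validateSignup(array $input): ?array
{
    $allowedPlans = ['free', 'pro', 'team']; // white list of permitted values

    $email = filter_var($input['email'] ?? '', FILTER_VALIDATE_EMAIL);
    $plan  = $input['plan'] ?? '';
    $age   = filter_var($input['age'] ?? '', FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 13, 'max_range' => 120],
    ]);

    if ($email === false || $age === false || !in_array($plan, $allowedPlans, true)) {
        return null; // reject; do not try to repair the data
    }

    return ['email' => $email, 'plan' => $plan, 'age' => $age];
}

var_dump(validateSignup(['email' => 'a@example.com', 'plan' => 'pro', 'age' => '30']));
var_dump(validateSignup(['email' => 'a@example.com', 'plan' => 'admin', 'age' => '30']));
```

Nothing in the accepted data is modified apart from the integer cast that FILTER_VALIDATE_INT performs; escaping still happens later, at output.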

strip_tags is not the best thing to use really, it doesn't protect in all cases.
HTML Purifier:
http://htmlpurifier.org/
Is a real good option for processing incoming data, however it itself still will not cater for all use cases - but it's definitely a good starting point.

I have to say that the tutorial you mentioned is a little misleading about security:
It is important to note that you never want to directly work with the $_GET & $_POST values. Always send their value to a local variable, & work with it there. There are several security implications involved with the values when you directly access (or output) $_GET & $_POST.
This is nonsense. Copying a value to a local variable is no more safe than using the $_GET or $_POST variables directly.
In fact, there's nothing inherently unsafe about any data. What matters is what you do with it. There are perfectly legitimate reasons why you might have a $_POST variable that contains ; rm -rf /. This is fine for outputting on an HTML page or storing in a database, for example.
The only time it's unsafe is when you're using a command like system or exec. And that's the time you need to worry about what variables you're using. In this case, you'd probably want to use something like a whitelist, or at least run your values through escapeshellarg.
Similarly with sending queries to databases, sending HTML to browsers, and so on. Escape the data right before you send it somewhere else, using the appropriate escaping method for the destination.
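As a quick sketch of "escape right before you send it somewhere", here is the same string prepared for two destinations. The command is only built and printed, never executed, and the variable names are made up.

```php
<?php
$value = '; rm -rf /'; // the harmless-until-misused string from the text

// Destination: HTML page. Nothing in this string is special to HTML,
// so it passes through unchanged.
$html = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// Destination: shell command. escapeshellarg() quotes the value so the
// shell would treat it as a single argument rather than a new command.
$command = 'echo ' . escapeshellarg($value);

echo $html, "\n";
echo $command, "\n";
```

The data is identical in both cases; only the encoding chosen for the destination differs.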

strip_tags removes every piece of HTML. More sophisticated solutions are based on whitelisting (i.e. allowing specific HTML tags). A good whitelisting library is HTML Purifier: http://htmlpurifier.org/
And of course, on the database side of things, use functions like mysql_real_escape_string or pg_escape_string.

Well, I may be wrong, but in all the literature I've read, people say it's much better to use htmlspecialchars.
It's also necessary to cast input data (to int, for example, if you are sure it's a user ID).
And once you start using a database, use mysql_real_escape_string instead of mysql_escape_string to prevent SQL injection (some old books still show mysql_escape_string).

Where is the best place to sanitize user input that will be output on a webpage?

In the MVC way of doing things, where is the best place to run, for example htmlspecialchars() on any input? Should it happen in the view (it sort of makes sense to do it here, as I should be dealing with the raw input throughout the controller and model?)
I'm not quite sure... What are the benefits of doing it in the view or the controller? This is just regarding output to a page, to minimize potential XSS exploits.

Well, that depends, doesn't it? You should sanitize everything you OUTPUT in the view. First, because sanitization depends on the format of your output. A JSON sanitized output is different than an HTML sanitized output, right? Second, because you never want to trust the data you have. It might have been compromised through any number of ways.
That won't protect against SQL injections and such, though. Now, you never want to do that in client-side JavaScript, because an attacker may easily replace it. Again, my advice is sanitization at the point of usage. If you are just writing to a file, it might not be needed. And some database access libraries do not need it either. Others do.
At any rate, do it at the point of usage, and all the source code becomes more reliable against attacks and bugs (or attacks through bugs).

This is why thinking in design patterns sucks. What you should be asking is: where is the most efficient place to do this? If the data is write-once/read-many then sanitising it every time it's output (on page view) is going to put unnecessary load on the server. Make your decision based on how the data will be used, where you can set up caching, how you do searches, etc., not on the merits of a pattern.
From what you've said I'd perform the sanitation just ahead of writing it to the DB. Then you're not only ensuring the data is safe to insert but you're also ensuring that no future mistakes can result in unsanitised data being sent. If you ever want the original text for some reason you just invert your original transformation.
You should not be concerned about storing html encoded text in your DB since ALL text is encoded in one form or another. If you need to search the text you just encode the search string as well. If you need another format then that's another story but then you would have to evaluate your options based on your needs.

I think the best way is to escape the output in the view, and store everything in its original form in your database.
Why? With this method you're able to use the DB records for every use case.
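A small sketch of why the stored form matters. The sample text and variable names are invented; the point is that a raw value can be encoded for any context later, while a pre-encoded value gets encoded twice as soon as a view escapes it again.

```php
<?php
$raw = 'Tom & Jerry <3';

// Stored raw: each output context applies its own encoding when needed.
$asHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$asJson = json_encode(['title' => $raw]);

// Stored pre-encoded: a later escape in the view double-encodes the text.
$storedEncoded = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$doubleEncoded = htmlspecialchars($storedEncoded, ENT_QUOTES, 'UTF-8');

echo $asHtml, "\n";        // Tom &amp; Jerry &lt;3
echo $asJson, "\n";
echo $doubleEncoded, "\n"; // Tom &amp;amp; Jerry &amp;lt;3
```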

You can do it in the view (via javascript validation), but data coming from the rendered view to the controller is still considered untrusted, so you will still have to sanitize it in the controller.
In the examples I've seen (such as nerddinner), the sanitizing code is part of the model classes. Some people use validation libraries.

I don't think there's any 'best' place to sanitize. Depending on the use case, we may need to implement sanitizing logic in more than one tier.

The general rule is : fat model, thin controller.
Now, how you apply that rule is a different story :)
The way I think of it, your controller should really just be controlling the flow, redirecting to pages, etc. Any validation should take place in your model. If you want to do client-side validation, you'd probably put it in the view. Any developer concerned about security would do validation on the client and on the server.

I put it in the "controller" as most of today's frameworks define it. (Not getting into the discussion of how pure that is) It is not something that belongs directly in a view template, but it also does not necessarily need to be in the model, as you may want the original data sometimes and not others.
So when I'm loading data from the model in the controller and assigning it to a view (smarty template, in my case), I run it through the HTML Purifier first.

I'm going to buck the answering trend here and give this advice:
Untrusted input should be confined as rigidly as possible - by reducing the number of places that you interact with input before its safety has been evaluated, you reduce your threat exposure when someone who is thinking about a bug fix or functionality improvement rather than security changes the system under discussion.

It depends on the type of user input and what validation you're running on it.
If you want to clean the input, I'd put the logic in the controller, and also in the view when you output data that comes from the database (or any source really).
If you are doing data validation, I'd do it both on the client side with javascript, as well as in the model.
