The following are ways to XSS-clean data in Codeigniter:
set global_xss_filtering in config to TRUE
use xss_clean()
use xss_clean as a validation rule
set the second parameter to TRUE in $this->input->post('something', TRUE)
Is it okay to use all or more than one of them on one piece of data?
For example, would it be okay if I still used $this->input->post('something', TRUE) even if the data has already been cleaned by global_xss_filtering and xss_clean validation rule?
It's not going to hurt you, but it is definitely pointless.
There's a very good chance that eventually, you will reach a point where the global XSS filter is going to be cumbersome. Since it can't be disabled per controller without extensive hacks, and access to the raw $_REQUEST data will be impossible, you will need to disable it globally. This will happen the moment you want to process a single piece of trusted data, or data that isn't HTML output and must remain intact.
Using it as a form validation rule is pointless and potentially destructive as well. Imagine what this site would be like if every time you typed <script> it was replaced with [removed], with no way to revert it in the future. For another example, what if a user uses some "XSS" content in his password? Your application will end up altering the input silently.
Just use the XSS filter where you need it: on your HTML output, places where javascript can be executed.
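For example, escaping at the last moment in the view keeps the stored data intact. A minimal sketch (assuming $comment is a hypothetical field pulled straight from the database, untouched by any input filter):

<div class="comment">
    <?php echo htmlspecialchars($comment, ENT_QUOTES, 'UTF-8'); ?>
</div>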
Yes. Assume your input is 'A'. Then, let's say you run xss_clean to get XSS-safe content:
B = xss_clean(A)
Now, let's say I do it again to get C:
C = xss_clean(B)
Now, if B and C differ, that must mean B still had some XSS-unsafe content, which in turn means xss_clean is broken because it did not clean A properly. So as long as you assume that the function returns XSS-safe content, you are good to go.
One argument that can be made is: what if the function modifies even XSS-safe content? Well, that would suck and it would still mean the function is broken, but that is not the case (speaking just from my experience; I haven't seen it behave like this ever).
The only drawback I see is the additional processing overhead, but doing it twice is fine (once with global filtering, and once doing it explicitly, just in case global filtering is turned off sometime by someone), and is a pretty ok overhead cost for the security assurance.
Also, if I may add, CodeIgniter's xss_clean doesn't really parse the HTML and drop the tags and stuff. It simply converts the < and > to &lt; and &gt;. So with that in mind, I don't see anything that could go wrong.
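As a rough illustration of the argument, assuming CodeIgniter's Security class is loaded as usual:

$a = $this->input->post('something');   // raw input
$b = $this->security->xss_clean($a);    // first pass
$c = $this->security->xss_clean($b);    // second pass

// If the filter really returns XSS-safe content, the second pass is a no-op.
var_dump($b === $c); // expected: bool(true)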
Using xss_clean even once is bad as far as I am concerned. This routine attempts to sanitise your data by removing parts or replacing parts. It is lossy and not guaranteed to return the same content when run multiple times. It is also hard to predict and will not always act appropriately. Given the amount of things it does to try to sanitise a string there is a massive performance hit for using this on input. Even the tiniest bit of input such as a=b will cause a flurry of activity for xss_clean.
I would like to say that you should never use xss_clean, but realistically I can't say that. This system is made for inexperienced developers who do not know how to safely manage user content. I'm an experienced developer, so I can say that no project I am working on should ever use xss_clean. The fact is, though, the corruption issues will be less problematic for inexperienced developers with simpler usage, and ultimately it probably will make their code more secure, even if they should be making their code more secure themselves rather than relying on quick, dirty and cheap hacks. On the other hand, xss_clean isn't guaranteed to make your code completely secure and can ultimately make things worse by giving a false sense of security. You are recommended to really study instead, to make sure you understand exactly everything your code does, so you can make it truly secure. xss_clean does not compensate for code errors, it compensates for coder errors.
Ideally xss_clean wants to be done only on output (and wants to be replaced with htmlentities, etc.), but most people won't bother with this as it's simpler for them to violate data purity by just filtering all input rather than filtering output (something can be input once but output ten times). Again, an undisciplined developer may not put xss_clean on one out of those ten cases of output.
Realistically, however, the only real decent way is to properly encode everything in the view the moment it is to be displayed on a page. The problem with pre-emptive encoding is that you store data that might be incorrectly encoded, and you can double-encode data if it is input, then output into a form, then input again. If you think of something like an edit box, you can have some serious problems with data growth. Not all sanitation removes content. For example, addslashes adds content: if you have a slash in your content, every time you run addslashes a new slash is added, causing it to grow.

Although there is a good chance your data will end up embedded in HTML, you also can't always really know where data will end up. Suddenly you get a new requirement that applies to previous data and that's it, you're screwed, because you applied a lossy filter to incoming data prior to storage. Lossy, in this case, might also mean your job after corrupting all the user data in your database. Your data is usually the most valuable thing for a web application. This is a big problem with pre-emptive encoding. It is easier to work with if you always know your data is pure and can escape it according to the situation at hand, but if your data could be in any condition down the line, this can be very problematic. The filtering can also cause occasional logical breakages: as the sanitisation can remove content, two strings that don't match can be made to match.
Many of the problems with xss_clean on input are the same or similar to those for magic_quotes:
http://en.wikipedia.org/wiki/Magic_quotes
Summary: You should not use it, but instead block bad data on user input and escape properly on output. If you want to sanitise user data, it should happen in the client (browser, form validation) so that the user can see it. You should never have invisible data alteration. If you must run xss_clean, you should only run it once, on output. If you're going to use it for validation of input, check whether $posted_data !== xss_clean($posted_data) and reject the submission if so, as sketched below.
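A minimal sketch of that reject-instead-of-alter idea, assuming CodeIgniter's Security class and the show_error() helper:

$posted_data = $this->input->post('comment'); // raw, no second TRUE argument

// Reject the submission rather than silently altering it.
if ($posted_data !== $this->security->xss_clean($posted_data)) {
    show_error('Your submission contained disallowed content.', 400);
}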
Related
I am making a website that is 99% based around user content. I have been reading a lot about security vs xss, csrf, sql injection and all that fun stuff. I understand it all well and have been incorporating proper security. The thing I am concerned about is performance and over usage, looking for a better way.
I understand the idea of accepting user input as-is. Filter and validate user input before it goes into the database, and then sanitize output with something like htmlspecialchars.
Now here is the thing. Every “entry” a user adds to the database can have around 30 different pieces of information attached to it.
So if they view a page, I would output around 30 htmlspecialchars calls on that page alone. That seems like overkill. A listing or search page might have 5 or more variables for each of those items, and at 20 listings a page I am easily hitting 100+ uses of htmlspecialchars. That seems insane.
Would this cause a strain on my cheap server? Is there a better way to do it?
My horrible ideas.
(1) How about using strip_tags when inputting into the database? I understand the vulnerability of outputting into attributes without htmlspecialchars, but I control where every variable outputs, and the worst would be variables going into things like <h4>$title</h4> or <li>$info</li>, never into an href or anything. Wouldn't this save a ton of server usage to have the sanitization done once, instead of on every page load? I could still call htmlspecialchars on a variable if I have to put it inside an attribute.
(2) I understand this is a horrible idea. But how about storing the htmlspecialchars-sanitized text directly in the database? I know if I ever want to do something else with this data, like make an API or output as JSON or PDF, I would have to decode htmlspecialchars. But none of those situations are something I would ever do. This seems like it would save a TON of server resources, as I would be sanitizing only once instead of on every page load.
(3) Store the literal input and an htmlspecialchars-sanitized version of the text in another column. This way the user still sees their input as it was entered, and I only have to htmlspecialchars once on input to the database, instead of on every page load. Yes, more database storage, but otherwise what would be the problems?
Edit: Thanks, I now see this is micro-optimization.
My Opinion: You shouldn't have a big issue with performance. In the future your performance issues will actually decrease, since technology is only enhancing performance regarding the speed of CPU cycles and other factors.
I recommend you keep using the htmlspecialchars when echoing out the data. 30 function calls to htmlspecialchars is very little work for your server (give your server and php some credit xD) and for the reasons stated above will be even less work in the future.
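If typing htmlspecialchars() 30 times per page feels noisy, a one-line helper keeps the calls short; this is just a sketch, and the name e() is arbitrary:

// Tiny output-escaping helper; the per-call cost is negligible.
function e($value)
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// In a view:
// <h4><?php echo e($title); ?></h4>
// <li><?php echo e($info); ?></li>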
Use http://htmlpurifier.org/; it's an open-source PHP library used by a lot of big forums to clean up user input.
You can save the cleaned-up HTML in your database.
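Basic usage looks roughly like this (per the HTML Purifier documentation; the allowed-tag list is only an example):

require_once '/path/to/HTMLPurifier.auto.php';

$config = HTMLPurifier_Config::createDefault();
$config->set('HTML.Allowed', 'p,b,i,ul,ol,li,a[href]'); // example whitelist of tags
$purifier = new HTMLPurifier($config);

$clean_html = $purifier->purify($dirty_html); // safe to store or output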
Yesterday I took part in an interview for a PHP developer position. My task was to solve a quite simple test of 15 questions. One of the questions was to decide whether code similar to the below should be treated as unsafe. I gave a wrong (as it turned out) answer, and the argumentation from the other person in that interview was quite surprising (at least to me).
The code was something like this:
function someFunction($a)
{
    echo $a * 4;
}

someFunction($_GET['value']);
Possible answers were:
always,
only when register_globals is enabled,
never.
You could get one point for the correct answer and a second one for giving a good explanation (argumentation) for the chosen answer.
My answer was the third one: this code is never unsafe. Plus argumentation: because this is just a simple equation. There are no file or database operations here, no streams, no protocols, no nothing. It's just an equation, nothing else. An attacker is unable to do anything wrong with the PHP script, no matter how malformed a URL query he or she tries to execute. No chance.
I got zero points: neither my answer was correct, nor was my argumentation accepted. The correct answer was: this code is always unsafe -- you should always escape what you got from the URL query.
My question is: Is this really a good point of view? Do we really have to always use the rule of thumb that anything taken directly from the query is unsafe if not filtered, escaped or secured in any other way? Does this mean that I teach my students an unsafe coding methodology, because in the very first PHP lecture they write a script for calculating a triangle's area and they use unescaped, unfiltered params from the URL in their task?
I understand that security and writing safe code should be a matter of the highest priority. But, on the other hand, isn't it a little bit of safe-code-fascism (forgive me if I offended someone) to treat any code as unsafe, even if no one is able to do any harm with it?
Or maybe I'm completely wrong and you can do some harm with a function that echoes four times whatever you gave to it?
The issue is that later someone may change the function someFunction and do more than simply multiply it by 4.
The function in itself is not unsafe, but the line:
someFunction($_GET['value']);
is completely unsafe. Maybe someFunction gets refactored into another file or is way down in the code.
You should always check and scrub user-supplied data to protect yourself and others working on a library or function somewhere who are not expecting you to pass them raw $_GET array data.
This is especially true when working with others, and is why it's being asked in the interview: to see if you're looking ahead at potential future issues, not to see that you understand that currently someFunction is harmless when passed possibly dangerous GET data. It becomes an issue when your coworker refactors someFunction to query a DB table.
Having not spent much time playing with your code example, I won't say that it could be used to 'do harm'; however, your function will not work properly unless it is passed some form of number. In the long run, it is better to escape your code and handle erroneous data than to wait for the day when an unsuspecting user puts the wrong type of value in your box and breaks things.
I'm sure that the company you were interviewing for was just looking for someone with a solid habit of making sure their code is complete and unbreakable.
NEVER trust anything that originates from a user. Just don't. Even when you cannot fathom a possibility of your code/class/package being misused, cover your own ass by ensuring the input to your product is exactly what you're expecting, no surprises. At the barest minimum, someone may supply bad input to that method just to screw with your app, to cause it to show an error or give the white screen of death. The code that does basic multiplication is a prime candidate for that kind of malevolence. It applies not just in PHP, but in programming/design in general.
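For the interview snippet above, the cover-your-own-ass version is only a few lines. A sketch using PHP's filter functions (the error handling is just an example):

function someFunction($a)
{
    echo $a * 4;
}

// Validate before the value ever reaches the function.
$value = filter_input(INPUT_GET, 'value', FILTER_VALIDATE_INT);

if ($value === null || $value === false) {
    // Missing or not an integer: fail loudly instead of multiplying garbage.
    http_response_code(400);
    exit('Invalid value');
}

someFunction($value);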
I have PHP code stored in the database, and I need to execute it when retrieved.
But my code is a mix of HTML and PHP, mainly used in echo "";
A sample that looks like my code:
echo "Some Text " . $var['something'] . " more text " . $anotherVar['something2'];
How can I execute code like this, whether I add the data to the DB with echo ""; or without it?
Any ideas?
UPDATE:
I forgot to mention, I'm using this on a website that will be used on an intranet, and security will be enforced on the server to ensure data safety.
I have PHP code stored in the database
STOP now.
Move the code out of the database.
And never mix your code with data again.
It's not only a bad idea but also invitation to several type of hacking attempts.
You can do it with eval(), but never use it. eval() is very dangerous because it allows execution of arbitrary PHP code; its use is thus discouraged. If you have carefully verified that there is no other option than to use this construct, pay special attention not to pass any user-provided data into it without properly validating it beforehand.
See eval. It lets you pass a string containing PHP and run it as if you'd written it directly into your file.
It's not a common practice to store executable PHP in a database; is the code you store really that different that it makes more sense to maintain many copies of it rather than adapting it to do the same thing to static data in the database? The use of eval is often considered bad practice as it can lead to problems with maintenance, if there's a way of avoiding it, it's normally worth it.
You can execute code with eval():
$code_str = "echo 'Im executed';";
eval($code_str);
BUT PAY ATTENTION that this is not safe: if someone gets access to your database, they will be able to execute any code on your server.
Use the eval() function.
Here's some info:
http://www.php.net/manual/en/function.eval.php
something along the lines of:
eval($yourcode);
If that is the last resort, you want it to be secure as it will evaluate anything and hackers love that. Look into Suhosin or other paths to secure this in production.
As everyone has indicated, using eval() is a bad approach for your need. But you can get almost the same result by using a whitelist approach.
Make a PHP file, db_driven_functions.php for instance. Get your data from the DB and map it into an array as below:
//$sql_fn_parameters[0] = function name
//$sql_fn_parameters[1,2,3.....] = function parameters
Then define functions that include your PHP code blocks, for instance:
function my_echo($sql_fn_parameters) {
    echo $sql_fn_parameters[1]; // numbered or assoc...
}
Then pull the data which contains the function name, and after checking that the function is defined with
function_exists($sql_fn_parameters[0])
call the function with
call_user_func_array() or call_user_func()
(You may also filter the parameters array so that it does not contain any risky syntax, for more security.)
And you have your code controlled from the DB without the risk. It seems like a bit of a long way around, but after implementing it, it's really a joy to use an admin-panel-driven PHP flow.
BUT building a structure like this with OOP is better in the long term (autoloading of classes, etc.). A rough sketch of the whole approach follows.
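A rough sketch of the dispatch step might look like this; the column name and the JSON encoding of the parameters are assumptions, not part of the original answer:

// db_driven_functions.php -- code blocks exposed as named functions
function my_echo($sql_fn_parameters)
{
    echo $sql_fn_parameters[1];
}

// Somewhere in the page flow: $row came from the database, e.g.
// $row['action'] = '["my_echo", "Hello from the admin panel"]'
$sql_fn_parameters = json_decode($row['action'], true);
$fn_name = $sql_fn_parameters[0];

$whitelist = array('my_echo'); // only the functions you explicitly allow

if (in_array($fn_name, $whitelist, true) && function_exists($fn_name)) {
    call_user_func($fn_name, $sql_fn_parameters);
} else {
    // Unknown function name in the database: log it and do nothing.
}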
Eval is not safe obviously.
The best route IMO
Save your data in a table
Run a stored procedure when you are ready to grab and process that data
You should not abuse the database this way. And in general, dynamic code execution is a bad idea. You could employ a more elegant solution to this problem using template engines like Smarty or XSLT.
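For example, with Smarty the data stays data and only the template decides how it is rendered; a minimal sketch of Smarty's basic assign/display API:

require_once 'libs/Smarty.class.php';

$smarty = new Smarty();
$smarty->assign('title', $row['title']); // data from the database
$smarty->assign('body',  $row['body']);
$smarty->display('article.tpl');         // the template lives outside the database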
There are a few ways to achieve this:
1) By using evil
eval($data);
That's not a typo, eval is usually considered evil and for good reasons. If you think you have fully validated user data to safely use eval, you are likely wrong, and have given a hacker full access to your system. Even if you only use eval for your own data, hacking the database is now enough to gain full access to everything else. It's also a nightmare to debug code used in eval.
2) Save the data to a file, then include it
file_put_contents($path, $data);
include $path;
There are still the same security concerns as eval but at least this time the code is easier to debug. You can even test the code before executing it, eg:
if (strpos(exec('php -l '.$path), 'No syntax errors detected') !== false)
{
include $path;
}
The downside to this method, is the extra overhead involved in saving the code.
3) Execute the code straight from the database.
You'd need to use database software that allows this. As far as I am aware, this only includes database software that stores the content as text files. Having database software with "php eval" built in would not be a good thing. You could try txt-db-api. Alternatively, you could write your own. It would likely become very difficult to maintain if you do, though, but it is something to consider if you know exactly how you want your data to be structured and are unlikely to change your mind later.
This could save a lot of overhead and have many speed benefits. It likely won't though. Many types of queries run way faster using a traditional database because they are specifically designed for that purpose. If there's a possibility of trying to write to a file more than once at the same time, then you have to create a locking method to handle that.
4) Store php code as text files outside of the database
If your database contains a lot of data that isn't PHP code, why even store the PHP code in the database? This could save a lot of overhead, and if your database is hacked, it may no longer be enough to gain full access to your system.
Some of the security considerations
Probably more than 99% of the time, you shouldn't even be attempting to do what you are doing. Maybe you have found an exception, though, but just being on an intranet isn't enough, and it certainly doesn't mean it's safe to ignore security practices. Unless everyone on the intranet needs full admin access, they shouldn't be able to get it. It's best for everyone to have the minimum privileges necessary. If one machine does get hacked, you don't want the hacker to have easy access to everything on the entire intranet. It's likely the hacker will hide what they are doing and will introduce exploits to later bypass your server security.
I certainly need to do this for the CMS I am developing. I'm designing it mainly to produce dynamic content, not static content. The data itself is mostly code. I started off with simple text files; however, it slowly evolved into a complicated text file database. It's very fast and efficient, as the only queries I need are very simple and use indexing. I am now focusing on hiding the complexity from myself and making it easy to maintain with greater automation.

Directly writing PHP code or performing admin tasks requires a separate environment with superuser access for only myself. This is only out of necessity, though, as I manage my server from within, and I have produced my own debugging tools and made an environment for code structured a specific way that hides complexity. Using a traditional code editor, then uploading via ssh, would now be too complicated to be efficient.

Clients will only be able to write PHP code indirectly, though, and I have to go to extreme lengths to make that possible, just to avoid the obvious security risks. There are not-so-obvious ones too. I've had to create an entire framework called Jhp, and every piece of code is then parsed into PHP. Every function has to pass a whitelist, is renamed or throws an error, and every variable is renamed, and more. Without writing my own parser, and with just a simple blacklist, it would never be even a tiny bit secure. Nothing whatsoever client-side can be trusted, unless I can confirm on every request that it has come entirely from myself, and even then my code error-checks before saving so I don't accidentally break my system. Just in case I still do, I have another identical environment to fix it with, and detailed error information in the console that even works for fatal errors, whilst always being hidden from the public.
Conclusion
Unless you go to the same lengths I have (at minimum), you will probably just get hacked. If you are sure that it is worth going to those lengths, then maybe you have found an exception. If your aim is to produce code with code, then the data is always going to be code and it cannot be separated. Just remember, there are a lot more security considerations than what I have put in this answer, and unless the entire purpose of what you are doing makes this a necessity, why bother mixing data with code at all?
From a semantic standpoint, you should go with hyperlinks in the HTML using the anchor tag. However, if the variables you need to pass contain critical information that you cannot risk being modified, you could consider using jQuery to POST the information instead. The disadvantage to using just JavaScript would be, of course, if JavaScript was disabled.
You could do both methods, however. If you place an anchor tag with GET variables and then use jQuery to attach a POST onclick, the JavaScript would trump the href. This way, under typical circumstances, the variables would be POSTed. Under circumstances when JavaScript is unavailable, the variables would be GETed. You could then check in the PHP script that is processing the data which one happened (POST or GET) and, with GET, do some extra error checking or processing to make sure the data is exactly what you expect. Of course, the big disadvantage to this is having to maintain the hyperlink and JavaScript URL in two places if anything changes.
EDIT: Reading this again, I started to think: Quite honestly, if you go with my suggestion and write extra error checking or processing code for the GET, it wouldn't hurt to run it on the POST either. And if that's the case, you might as well just do a GET and skip the JavaScript. It'll save you the overhead.
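Either way, the processing script should re-validate whatever arrives, regardless of the method. A sketch (the parameter name id is hypothetical):

// Accept the value from POST (the jQuery path) or GET (the plain-link fallback).
$id = isset($_POST['id']) ? $_POST['id'] : (isset($_GET['id']) ? $_GET['id'] : null);

if ($id === null || !ctype_digit((string) $id)) {
    http_response_code(400);
    exit('Invalid request');
}

$id = (int) $id; // now safe to use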
I am thinking about form security a lot lately. I have been told time and time again to check whether form input is a number if you are expecting a number, or otherwise escape it (unless you use proper mysqli formatting), to avoid injection.
After the safety checks are done, should I do additional logic checks? For example, what if the user is sending a friend request to themselves, even though my user interface will not show the form if the user is looking at their own page?
Anything you do in HTML or JavaScript is not sufficient to prevent someone from posting data directly to your HTTP server. So treat anything that is sent by the browser (even cookies!) as "user input" and guard accordingly.
Because even though your form may not allow me to send a friend request to myself, if I'm running Fiddler I can just set a breakpoint, change a POST variable, then resume the request and your server has no idea.
In fact, that's a great eye opening exercise. If you go download Fiddler you can watch everything that the browser sends or receives with your web site. Anything being sent by the browser should not be implicitly trusted.
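Concretely, the friend-request rule from the question has to be enforced again on the server. A sketch (the session and field names are assumptions):

session_start();

$sender_id   = (int) $_SESSION['user_id'];  // who is actually logged in
$receiver_id = (int) $_POST['receiver_id']; // what the form claims

// Re-check the business rule, even though the UI never shows this form
// on the user's own profile page.
if ($receiver_id === $sender_id) {
    exit('You cannot send a friend request to yourself.');
}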
Yes, you should. Haven't we all noticed a pattern in some site's URLs and then copied the URL but changed some part to get around a restriction on the site, bypassing login/access control? Do you want your site to be susceptible to that too?
Of course.
You can't go far enough validating input. Treat it as garbage and plan accordingly. If you want everything to work smoothly make sure that everything checks out.
Of course. The whole point of validation is to properly handle input outside what you're expecting. If users gave you what you expected, you wouldn't have to validate. You need to assume your user could throw absolutely anything at you. As noted, they can bypass the browser entirely using manual HTTP requests. Always code defensively.
A good description I once heard from some famous CS guy (not sure whom, a C writer?) went like "Some time in the early 90's evil on the internet started outgrowing the good on the internet. Any scheme founded upon the idea of enumerating badness is destined to fail (because there's so much of it)".
Don't describe the bad things, i.e. functions like isSQLcommand(), isJavaScript(), compilesToBinaryAndRuns(). This is called blacklisting, and you will exhaust yourself doing it, and there is always someone smarter and more evil than you out there.
Instead focus on whitelisting. Enumerate the good, and list only the things you expect to occur. Have a select HTML element with male/female options?
if ($selectInput == 'male' || $selectInput == 'female') {
    // proceed
}
else {
    // dump the user data and start over
}
EDIT
It was Marcus Ranum, a security expert:
http://www.ranum.com/security/computer_security/editorials/dumb/