SQL prevention of XSS

Hey guys, I've got a question: is there something I could use when inserting data into SQL to prevent XSS, instead of when reading it?
For example, I have quite a few outputs from my SQL database that are user generated. Is it possible to make that data safe when it enters SQL, or do I have to make it safe when it leaves SQL?
TL;DR: can I use something like htmlspecialchars when inserting data into SQL to prevent XSS, and would that be any sort of good protection?

I think several things are mixed up in the question.
Preventing XSS with input validation
In general you can't prevent XSS with input validation, except in very special cases where you can validate input against something very strict, like numbers only.
Consider this HTML page (let's imagine <?= is used to insert data into your HTML in your server-side language, because you hinted at PHP; this could of course differ by language):
<script>
var myVar = <?= var1 ?>;
</script>
In this case, var1 on the server doesn't need to contain any special characters; letters alone are enough to inject JavaScript. Whether that is useful to an attacker depends on several things, but technically this would be vulnerable to XSS under almost any input validation. Of course, such an assignment may not currently be in your JavaScript, but how will you ensure that there never will be?
Another example is DOM XSS, where the input never reaches the server, but that's a different story.
Preventing XSS is an output encoding thing. Input validation may help in some cases, but will not provide sufficient protection in most cases.
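To make the "output encoding" point concrete, here is a minimal sketch in PHP. The helper names encode_for_html and encode_for_script are made up for illustration, and using json_encode with the JSON_HEX_* flags for the script context is one common approach, not the only one.

```php
<?php
// Hypothetical helpers; the names are illustrative and not from the answer.

// HTML body or quoted attribute context.
function encode_for_html(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Value inside a <script> block: emit a JSON string literal and hex-escape
// the characters that could otherwise close the script element.
function encode_for_script(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

$userInput = '</script><script>alert(1)</script>';

// Same data, two contexts, two different encodings.
echo '<p>' . encode_for_html($userInput) . "</p>\n";
echo '<script>var myVar = ' . encode_for_script($userInput) . ";</script>\n";
```

The same stored value is encoded differently depending on where it lands, which is something no single transformation applied at insert time can anticipate.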
Storing encoded values
It is generally not a good idea to store values HTML-encoded in your database. On the one hand, it makes searching, ordering, and any other kind of processing much more cumbersome. On the other hand, it violates single responsibility and separation of concerns. Encoding is a view-level thing; your backend database has nothing to do with how you will want to present that data. This matters even more when you consider different encodings. HTML encoding is only OK if you want to write the data into an HTML context. If the context is JavaScript (in a script tag, in an on* attribute like onclick, or in several other places), HTML encoding is not sufficient, let alone where you have more specialized outputs. Your database doesn't need to know where the data will be used. That is an output concern, and as such it should be handled by views.
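As a rough sketch of "store raw, encode in the view": the example below assumes the pdo_sqlite extension is available and uses an in-memory SQLite database purely as a stand-in for your real one. The table and column names are invented.

```php
<?php
// In-memory SQLite stands in for the real database (an assumption for this sketch).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT NOT NULL)');

// Store exactly what the user typed; the bound parameter keeps it out of the SQL text.
$raw = 'Tom & Jerry <script>alert(1)</script>';
$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $raw]);

// Searching works on the original text; there is no need to search for "&amp;".
$find = $pdo->prepare('SELECT body FROM comments WHERE body LIKE :needle');
$find->execute([':needle' => '%Tom & Jerry%']);
$stored = $find->fetchColumn();

// Encode only at the point of output, for the HTML context.
$html = '<p>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</p>';
echo $html, "\n";
```

Had the value been stored HTML-encoded, the LIKE search would have needed to know about the encoding, and any non-HTML consumer would have had to decode it first.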

You should test the input against a whitelist of characters, using a regex that only accepts something like [a-zA-Z0-9], for example. You'll have a big headache if you try it the other way around with a blacklist, because there are a huge number of ways to exploit input, and catching them all is a big problem.
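A minimal sketch of such a whitelist check, assuming a username-style field; the function name and the 1 to 32 character limit are arbitrary choices for the example:

```php
<?php
// Hypothetical validator: accepts only ASCII letters and digits, 1 to 32 characters.
function is_valid_username(string $input): bool
{
    // \A and \z anchor the whole string, so a trailing newline is not accepted.
    return preg_match('/\A[A-Za-z0-9]{1,32}\z/', $input) === 1;
}

var_dump(is_valid_username('alice42'));   // letters and digits only
var_dump(is_valid_username('<script>'));  // contains characters outside the whitelist
var_dump(is_valid_username("bob\n"));     // trailing newline
```

Note that this only works for fields with a naturally narrow format; it does not replace output encoding for free-form text.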
Also, be aware of SQL injection. You can use sqlmap on Linux to test whether your website is vulnerable.

Related

Output or Input filtering?
I constantly see people writing "filter your inputs", "sanitize your inputs", and "don't trust user data", but I only agree with the last one: I consider trusting any external data a bad idea, even if it is internal relative to the system.
Input filtering:
The most common that I see.
Take the form post data or any other external source of information and define some boundaries when saving it, for example making sure text is text, numbers are numbers, that sql is valid sql, that html is valid html and that it does not contain harmful markup, and then you save the "safe" data in the database.
But when fetching data you just use the raw data from the database.
In my personal opinion, the data is never really safe.
Although it sounds easy (just filter everything you get from forms and URLs), in reality it is much harder than that: data might be safe for one language but not for another.
Output filtering:
When doing it this way I save the raw, unaltered data, whatever it might be, into the database with prepared statements, and then filter out the problematic code when accessing the data. This has its own advantages:
This adds a layer between the HTML and the server-side script, which I consider to be data access separation of sorts.
Now data is filtered depending on the context. For example, I can have the data from the database presented in an HTML document as plain escaped text, or as HTML, or as anything anywhere.
The drawbacks here are that you must never forget to add the filtering, which is a little bit harder than with input filtering, and that it uses a bit more CPU when providing data.
This does not mean that you don't need to do validation checks. You still do; it's just that you don't save the filtered data. You validate it and give the user an error message if the data is somehow invalid.
So instead of going with "filter your inputs" maybe it should be "validate your inputs, filter your outputs".
So should I go with "Input validation and filtering" or "Input validation and output filtering"?

There is no generic "filtering" for input and output.
Validate your input, escape your output. How you do this depends on context.
Validation is about making sure input falls within sensible ranges, like the length of strings, the numericality of dollar amounts or that a record being updated is owned by the user performing the update. This is about maintaining the logical consistency of your data and preventing people from doing things like zeroing the price of a product they are purchasing or deleting records they shouldn't have access to. It has nothing to do with "filtering" or escaping specific characters in your input.
Escaping is a matter of context, and only really makes sense when you're doing something with data that can be poisoned by injecting certain characters. Escape HTML characters in data you send to the browser. Escape SQL characters in data you send to the database. Escape quotes when you're writing data inside JavaScript <script> tags. Just be conscious of how the data you're dealing with is going to be interpreted by the system you're passing it to and escape accordingly.
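Here is a small sketch of how the two jobs stay separate in PHP. The form fields (quantity, note) and their limits are invented for the example: validation checks ranges and rejects bad requests, while escaping happens only when the value is written into HTML.

```php
<?php
// Hypothetical order form handler; field names and limits are assumptions.
function validate_order(array $post): array
{
    $errors = [];

    // Validation: is the value within a sensible range?
    $quantity = filter_var($post['quantity'] ?? null, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => 1, 'max_range' => 100],
    ]);
    if ($quantity === false) {
        $errors[] = 'Quantity must be a whole number between 1 and 100.';
    }

    // Validation again: length only, no character filtering.
    $note = (string) ($post['note'] ?? '');
    if (strlen($note) > 500) {
        $errors[] = 'Note must be at most 500 bytes.';
    }

    return $errors;
}

$post = ['quantity' => '3', 'note' => "It's <b>great</b>"];
$errors = validate_order($post);

// Escaping: applied at output time, for the HTML context only.
$noteHtml = htmlspecialchars($post['note'], ENT_QUOTES, 'UTF-8');
echo $errors === [] ? "<p>$noteHtml</p>\n" : implode("\n", $errors) . "\n";
```

The note with an apostrophe and angle brackets passes validation untouched; it only becomes "safe" for a particular destination at the moment it is sent there.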

The best solution is to filter both. Doing just one makes it more likely that you miss a case, and can leave you open to other types of attacks.
If you only do input filtering, an attacker could find a way to bypass your inputs and cause a vulnerability. This could be someone with access to your database entering data manually, it could be an attacker uploading a file through FTP or some other channel that is not checked, or many other methods.
If you only do output filtering, you can leave yourself open to SQL injection and other server side attacks.
The best method is to filter both your inputs and outputs. It may cause more load, but greatly reduces the risk of an attacker finding a vulnerability.

Sounds like semantics to me. Either way the important thing to remember is to make sure bad data doesn't get in the system.
Doing output filtering instead of input filtering is asking for an SQL injection.

Validating user input?

I am very confused over something and was wondering if someone could explain.
In PHP I validate user input, so htmlentities and mysql_real_escape_string are used before inserting into the database. I don't do this on everything, as I prefer to use regular expressions when I can, although I find them hard to work with. Obviously I will use mysql_real_escape_string, as the data is going into the database, but I'm not sure whether I should be using htmlentities() only when getting data from the database and displaying it on a web page. Applying it beforehand alters the data a person entered, so it is no longer kept in its original form, which may cause problems if I want to use that data for something else later on.
For example, I have a guestbook with three fields: name, subject and message. Obviously the fields can contain anything, including malicious code in script tags. What confuses me is this: let's say I am a malicious person and I submit the form with script tags and some malicious JS code. Now I have malicious, useless data in my database. Using htmlentities when outputting that code to the web page (the guestbook) means it is not a problem, because htmlentities has converted it to its safe equivalent, but at the same time I have useless malicious code in the database that I would rather not have.
So my question is: should I accept that some data in the database may be malicious, useless data, and that as long as I use htmlentities on output everything will be OK, or should I be doing something else as well?
I have read many books that talk about filtering data on receiving it and escaping it on output so that the original form is kept, but they only ever give examples like ensuring a field is an int using functions built into PHP. I have never found anything about something like a guestbook, where you want users to be able to type anything they want. How would you filter such data, apart from using mysql_real_escape_string() to ensure it does not break the DB query?
Could someone please finally clear up this confusion for me and tell me what I should be doing and what the best practice is?
Thanks to anyone who can explain.
Cheers!

This is a long question, but I think what you're actually asking boils down to:
"Should I escape HTML before inserting it into my database, or when I go to display it?"
The generally accepted answer to this question is that you should escape the HTML (via htmlspecialchars) when you go to display it to the user, and not before putting it into the database.
The reason is this: a database stores data. What you are putting into it is what the user typed. When you call mysql_real_escape_string, it does not alter what is inserted into the database; it merely avoids interpreting the user's input as SQL statements. htmlspecialchars does the same thing for HTML; when you print the user's input, it will avoid having it interpreted as HTML. If you were to call htmlspecialchars before the insert, you would no longer be storing a faithful copy of what the user typed.
You should always strive to have the maximum-fidelity representation you can get. Since storing the "malicious" code in your database does no harm (in fact, it saves you some space, since escaped HTML is longer than unescaped!), and you might in the future want that HTML (what if you use an XML parser on user comments, or some day let trusted users have a subset of HTML in their comments, or some such?), why not let it be?
You also ask a bit about other types of input validation (integer constraints, etc). Your database schema should enforce these, and they can also be checked at the application layer (preferably on input via JS and then again server side).
On another note, the best way to do database escaping with PHP is probably to use PDO rather than calling mysql_real_escape_string directly (the old mysql_* functions were removed in PHP 7.0). PDO has more advanced functionality, including prepared statements and typed parameter binding.
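For what it's worth, a PDO prepared statement for the guestbook case might look roughly like this. It assumes pdo_sqlite is available and uses an in-memory database only so the sketch is self-contained; the table layout is made up.

```php
<?php
// In-memory SQLite is an assumption made to keep the sketch self-contained.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE guestbook (name TEXT, message TEXT)');

// The SQL text is fixed; user values travel separately as bound parameters.
$stmt = $pdo->prepare('INSERT INTO guestbook (name, message) VALUES (?, ?)');
$stmt->execute(["O'Brien", "'); DROP TABLE guestbook; --"]);

// The hostile-looking message is stored as plain data and the table still exists.
$rows = $pdo->query('SELECT name, message FROM guestbook')->fetchAll(PDO::FETCH_ASSOC);
foreach ($rows as $row) {
    echo htmlspecialchars($row['name'] . ': ' . $row['message'], ENT_QUOTES, 'UTF-8'), "\n";
}
```

Nothing is stripped or rewritten on the way in, so the stored text stays faithful to what was typed, and the HTML escaping is still done on the way out.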

mysql_real_escape_string() is all you need for the database operations. It'll ensure that a malicious user can't embed something into data that'll "break" your queries.
htmlentities() and htmlspecialchars() come into play when you're working with sending stuff to the client/browser. If you want to clean up potentially hostile HTML, you'd be better off using HTMLPurifier, which will strip the data to the bedrock and hose it down with bleach and rebuild it properly.

There's no reason to worry about having malicious JavaScript code in the database if you're escaping the HTML when it comes out. Just make sure you always do escape anything that comes out of the DB.

Preventing JavaScript Injections in a PHP Web Application

What are the measures needed to prevent or to stop JavaScript injections from happening in a PHP Web application so that sensitive information is not given out (best-practices in PHP, HTML/XHTML and JavaScript)?

A good first step is applying the methods listed in the question Gert G linked. This covers in detail the variety of functions that can be used in different situations to cleanse input, including mysql_real_escape_string, htmlentities(), htmlspecialchars(), strip_tags() and addslashes().
A better way, whenever possible, is to avoid inserting user input directly into your database. Employ whitelist input validation: in any situation where you only have a limited range of options, choose from hard-coded values for insertion, rather than taking the input from any client-facing form. Basically, this means having only certain values that you accept, instead of trying to eliminate or counter malformed or malicious input.
For example:
If you have a form with a drop-down for items, do not use the input from this drop-down for insertion. Remember that a malicious client can edit the information sent with the form's submission, even if you think they only have limited options. Instead, have the drop-down refer to an index in an array in your server-side code. Then use that array to choose what to insert. This way, even if an attacker tries to send you malicious code, it never actually hits your database.
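A minimal sketch of that drop-down idea, with an invented list of shipping options; the constant and function names are hypothetical:

```php
<?php
// Server-side list of allowed values (hypothetical). The form only ever sends the index.
const SHIPPING_OPTIONS = [0 => 'standard', 1 => 'express', 2 => 'overnight'];

function resolve_shipping($submitted): ?string
{
    // Anything that is not a plain integer is rejected outright.
    $index = filter_var($submitted, FILTER_VALIDATE_INT);
    if ($index === false || !array_key_exists($index, SHIPPING_OPTIONS)) {
        return null; // not one of the offered choices
    }
    // The value that reaches the database comes from our own array, not from the client.
    return SHIPPING_OPTIONS[$index];
}

var_dump(resolve_shipping('1'));
var_dump(resolve_shipping('1; DROP TABLE orders'));
var_dump(resolve_shipping('7'));
```

Whatever the client sends, the only strings that can ever be inserted are the three hard-coded ones.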
Obviously, this doesn't work for free-form applications like forums or blogs. For those, you have to fall back on the "first step" techniques. Still, there are a wide range of options that can be improved via whitelist input validation.
You can also use parameterized queries (aka prepared statements with bind variables) for your SQL interactions wherever possible. This tells your database server that all input is simply a value, so it mitigates a lot of the potential problems from injection attacks. In many situations, this can even cover free-form applications.

Treat any value you output to HTML with htmlspecialchars() by default.
The only excuse for not using htmlspecialchars() is when you need to output a string that itself contains HTML. In that case you must be sure that the string comes from a completely safe source. If you don't have that confidence, you must pass it through a whitelist HTML filter that allows only a carefully limited set of tags, attributes, and attribute values. You should be especially careful about attribute values: never allow arbitrary values, especially for attributes like src, href, and style.
You should know every place in your web app where you output anything to HTML without using htmlspecialchars(), be sure that you really need those places, and be aware that, despite all your confidence, those places are potential vulnerabilities.
If you think this is too much caution: "Why do I need to htmlspecialchars() this variable when I know it contains just an integer, and lose all those precious CPU cycles?"
Remember this: you don't know, you only think you know. CPU cycles are the cheapest thing in the world, and nearly all of them will be wasted waiting for the database, the filesystem, or even memory access.
Also, never use blacklist HTML filters. YouTube made that mistake, and someone suddenly found out that only the first <script> was removed: if you entered a second one in the comment, you could inject any JavaScript into visitors' browsers.
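To illustrate how fragile blacklist filters are, here is a deliberately naive one in PHP. This is not the YouTube bug itself (the details of that are not given here); it is a related and well-known failure mode where removing a forbidden string reassembles it from the surrounding pieces. The function name is made up.

```php
<?php
// Deliberately naive blacklist filter, shown only to illustrate the failure mode.
function naive_strip_script(string $html): string
{
    // Single pass: the result is not scanned again after the removal.
    return str_ireplace(['<script>', '</script>'], '', $html);
}

// Removing the inner tags glues the outer fragments back into real tags.
$payload = '<scr<script>ipt>alert(1)</scr</script>ipt>';
$filtered = naive_strip_script($payload);
echo $filtered, "\n";

// Escaping does not care how the input was constructed.
echo htmlspecialchars($payload, ENT_QUOTES, 'UTF-8'), "\n";
```

The filtered string still contains a working script element, while the escaped version is inert no matter how the payload was assembled.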
Similarly, to avoid SQL injection, treat every value that you glue into an SQL query with mysql_real_escape_string(), or better yet, use PDO prepared statements.

If you're not passing anything that needs to be formatted as HTML, then use:
strip_tags() <- Eliminates any suspicious HTML
and then run the following to clean the value before saving it to the DB:
mysql_real_escape_string()
If your AJAX is saving user-entered HTML via a textbox or WYSIWYG editor, then look into using HTMLPurifier to strip out JavaScript but allow HTML tags.

I do not agree fully with the other answers provided so I will post my recommendations.
Recommended reading
OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet
HTML Injection:
Whenever you display any user-submitted content, it should be appropriately cleaned up with htmlspecialchars or htmlentities, specifying ENT_QUOTES if the value is used inside single quotes. I would recommend never encapsulating attributes in single quotes and always encapsulating them in double quotes (do not omit the quotes). This applies to things such as:
<input value="<?php echo htmlspecialchars($var); ?>" />
<textarea><?php echo htmlspecialchars($var); ?></textarea>
<p><?php echo htmlspecialchars($var); ?></p>
<img width="<?php echo htmlspecialchars($var); ?>" />
JavaScript Injection:
It is best practice (but not always practical) to never echo user content into event handlers and JavaScript. However, if you do, there are some things that can be done to reduce the risk. Only pass integer IDs. If you require something such as a type specifier, then use a whitelist and/or a conditional check before outputting. Possibly force user content to alphanumeric only when appropriate, e.g. preg_replace("/[^A-Za-z0-9]/", '', $string); but be very careful what you allow here. Only include content when it is encapsulated in quotes, and note that htmlspecialchars/htmlentities does not protect you here: the browser decodes the HTML entities before the script runs, so the content is still interpreted at runtime.
This applies to things such as:
Click
href, src, style, onClick, etc.
Do not echo any user content into other areas, such as the body of script tags, unless it has been forced to an int or some other very, very limited character set (and only if you know what you are doing).
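A rough sketch of the "integer IDs plus whitelist" advice for an event handler. The helper name, the allowed types, and the removeItem JavaScript function are all hypothetical:

```php
<?php
// Hypothetical view helper: builds an onclick attribute from request data.
function render_delete_link($rawId, string $rawType): string
{
    // Forced to an integer; nothing but digits can survive the cast.
    $id = (int) $rawId;

    // Type specifier checked against a whitelist, with a fixed fallback.
    $allowedTypes = ['post', 'comment'];
    $type = in_array($rawType, $allowedTypes, true) ? $rawType : 'post';

    // removeItem() is an assumed client-side function, not a real API.
    return sprintf('<a href="#" onclick="removeItem(%d, \'%s\')">Delete</a>', $id, $type);
}

echo render_delete_link('5', 'comment'), "\n";
echo render_delete_link('5);alert(1)//', "x');alert(1);//"), "\n";
```

In the second call both hostile values are reduced to known-good ones, so the attacker's text never reaches the attribute at all, which is the point: here the protection comes from restricting the values, not from HTML-encoding them.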
SQL Injection:
Use prepared statements, bind user content to them, and never directly insert user content into the query. I would recommend creating a class for prepared statements with helper functions for your different basic statement types (and while on the subject, put all of your database statements into functions). If you choose not to use prepared statements, then use mysql_real_escape_string() or similar (not addslashes()). Validate content when possible before storing it in the database, such as forcing/checking for an integer data type, conditional checks on types, etc. Use proper database column types and lengths. Remember, the main goal here is to prevent SQL injection, but you can optionally do HTML/JavaScript injection protection here as well.
Other Resources
I have done some research online in the hope of finding a simple solution that is already publicly available. I found OWASP ESAPI, but it appears quite dated. The links to the PHP version are broken in several places. I believe I found it here: ESAPI PHP, but again it is quite dated and not as simple as I was hoping for. You may find it useful, however.
All in all, don't ever just assume you're protected such as using htmlentities in an onClick attribute. You must use the right tool in the right location and avoid doing things in the wrong location.

This question already has some answers accepted and rated by users.
I am posting an answer as well; I hope it works for you.
I have tested this myself.
1) Use strip_tags() //Prevent HTML injection
2) mysqli_real_escape_string //Escape the value for the SQL query
3) preg_replace("/[\'\")(;|`,<>]/", "", $value); //This removes every matched character
You can try what you like.

I'm learning PHP on my own and I've become aware of the strip_tags() function. Is this the only way to increase security?

I'm new to PHP and I'm following a tutorial here:
Link
It's pretty scary that a user can write php code in an input and basically screw your site, right?
Well, now I'm a bit paranoid and I'd rather learn security best practices right off the bat than try to cram them in once I have some habits in me.
Since I'm brand new to PHP (literally picked it up two days ago), I can learn pretty much anything easily without getting confused.
What other way can I prevent shenanigans on my site? :D

There are several things to keep in mind when developing a PHP application, and strip_tags() only helps with one of them. Actually strip_tags(), while effective, might even do more than needed: converting possibly dangerous characters with htmlspecialchars() may well be preferable, depending on the situation.
Generally it all comes down to two simple rules: filter all input, escape all output. Now you need to understand what exactly constitutes input and output.
Output is easy, everything your application sends to the browser is output, so use htmlspecialchars() or any other escaping function every time you output data you didn't write yourself.
Input is any data not hardcoded in your PHP code: things coming from a form via POST, from a query string via GET, or from cookies. All of those must be filtered in the most appropriate way depending on your needs. Even data coming from a database should be considered potentially dangerous; especially on a shared server, you never know whether the database was compromised elsewhere in a way that could affect your app too.
There are different ways to filter data: whitelists to allow only selected values, validation based on the expected input format, and so on. One thing I never suggest is trying to fix the data you get from users: have them play by your rules, and if you don't get what you expect, reject the request instead of trying to clean it up.
If you deal with a database, special attention must be paid to SQL injection: that kind of attack relies on you not properly constructing the query strings you send to the database, so that the attacker can forge them to execute malicious instructions. You should always use an escaping function such as mysql_real_escape_string() or, better, use prepared statements with the mysqli extension or PDO.
There's more to say on this topic, but these points should get you started.
HTH
EDIT: to clarify, by "filtering input" I mean decide what's good and what's bad, not modify input data in any way. As I said I'd never modify user data unless it's output to the browser.

strip_tags is not really the best thing to use; it doesn't protect in all cases.
HTML Purifier:
http://htmlpurifier.org/
It is a really good option for processing incoming data. However, on its own it still will not cater for all use cases, but it's definitely a good starting point.

I have to say that the tutorial you mentioned is a little misleading about security:
It is important to note that you never want to directly work with the $_GET & $_POST values. Always send their value to a local variable, & work with it there. There are several security implications involved with the values when you directly access (or output) $_GET & $_POST.
This is nonsense. Copying a value to a local variable is no more safe than using the $_GET or $_POST variables directly.
In fact, there's nothing inherently unsafe about any data. What matters is what you do with it. There are perfectly legitimate reasons why you might have a $_POST variable that contains ; rm -rf /. This is fine for outputting on an HTML page or storing in a database, for example.
The only time it's unsafe is when you're using a command like system or exec. And that's the time you need to worry about what variables you're using. In this case, you'd probably want to use something like a whitelist, or at least run your values through escapeshellarg.
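As a small sketch of the escapeshellarg case: the command below is only built and printed, never executed, and the use of ls is just an arbitrary example.

```php
<?php
// The same "dangerous" value that is harmless in HTML or in a database row.
$userValue = '; rm -rf /';

// escapeshellarg() quotes the value so the shell sees it as one literal argument.
// The command is only assembled here; nothing is run.
$command = 'ls -- ' . escapeshellarg($userValue);
echo $command, "\n";

// For the HTML destination the appropriate escaping is a different function.
echo htmlspecialchars($userValue, ENT_QUOTES, 'UTF-8'), "\n";
```

One value, two destinations, two escaping functions, each applied right before the data is handed over. A whitelist of permitted arguments would still be the stricter option for shell commands.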
Similarly with sending queries to databases, sending HTML to browsers, and so on. Escape the data right before you send it somewhere else, using the appropriate escaping method for the destination.

strip_tags removes every piece of HTML. More sophisticated solutions are based on whitelisting (i.e. allowing specific HTML tags). A good whitelisting library is HTML Purifier: http://htmlpurifier.org/
And of course, on the database side of things, use functions like mysql_real_escape_string or pg_escape_string.

Well, I'm probably wrong, but in all the literature I've read, people say it's much better to use htmlspecialchars.
It is also rather necessary to cast input data (to int, for example, if you are sure it's a user ID).
And when you start using a database, use mysql_real_escape_string instead of mysql_escape_string to prevent SQL injection (some old books still say mysql_escape_string).

Which Type of Input is Least Vulnerable to Attack?

Which type of input is least vulnerable to Cross-Site Scripting (XSS) and SQL Injection attacks?
PHP, HTML, BBCode, etc. I need to know for a forum I'm helping a friend set up.

(I just posted this in a comment, but it seems a few people are under the impression that select lists, radio buttons, etc don't need to be sanitized.)
Don't count on radio buttons being secure. You should still sanitize the data on the server. People could create an HTML page on their local machine, make a text box with the same name as your radio button, and have that data posted back.
A more advanced user could use a proxy like WebScarab, and just tweak the parameters as they are posted back to the server.
A good rule of thumb is to always use parameterized SQL statements, and always escape user-generated data before putting it into the HTML.

We need to know more about your situation. Vulnerable how? Some things you should always do:
Escape strings before storing them in a database to guard against SQL injections
HTML encode strings when printing them back to the user from an unknown source, to prevent malicious html/javascript
I would never execute PHP provided by a user. BBCode/UBBCode are fine, because they are converted to semantically correct HTML, though you may want to look into XSS vulnerabilities related to malformed image tags. If you allow HTML input, you can whitelist certain elements, but this is a complicated approach that is prone to errors. So, given all of the preceding, I would say that using a good off-the-shelf BBCode library would be your best bet.

None of them are. All data that is expected at the server can be manipulated by those with the knowledge and motivation. The browser and form that you expect people to be using is only one of several valid ways to submit data to your server/script.
Please familiarize yourself with the topic of XSS and related issues:
http://shiflett.org/articles/input-filtering
http://shiflett.org/blog/2007/mar/allowing-html-and-preventing-xss

Any kind of boolean.
You can even filter invalid input quite easily.
;-)

There are lots of BBCode parsers that sanitize input for HTML and so on. If there's not one available as a package, then you could look at one of the open source forum software packages for guidance.
BBCode makes sense as it's the "standard" for forums.

The input that is the least vulnerable to attack is the "non-input".
Are you asking the right question?

For Odin's sake, please don't sanitize inputs. Don't be afraid of users entering whatever they want into your forms.
User input is not inherently unsafe. The accepted answer leads to those kinds of web interfaces like my bank's, where Mr. O'Reilly cannot open an account, because he has an illegal character in his name. What is unsafe is always how you use the user input.
The correct way to avoid SQL injection is to use prepared statements. If your database abstraction layer doesn't let you use those, use the correct escaping functions rigorously (mysql_real_escape_string et al.).
The correct way to prevent XSS attacks is never something like strip_tags(). Escape everything. In PHP, something like htmlentities() is what you're looking for, but it depends on whether you are outputting the string as part of HTML text, an HTML attribute, or inside JavaScript, etc. Use the right tool for the right context. And NEVER just print the user's input directly to the page.
Finally, have a look at the Top 10 vulnerabilities of web applications, and do the right thing to prevent them. http://www.applicure.com/blog/owasp-top-10-2010
