What is "enough sanitization" for a URL [duplicate]


This question already has answers here:
Best way to handle security and avoid XSS with user entered URLs
(12 answers)
Closed 9 years ago.
The URL would be saved to a MySQL database and used to display a picture on the user's profile.
Would strip_tags() and mysql_real_escape_string() be enough?

"Enough sanitization" thoroughly depends on what environment you're talking about. Sanitization for MySQL should be considered entirely separate from sanitization for web output, and you should handle them separately to avoid a lot of hassle.
Sanitizing for MySQL
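mysql_real_escape_string() will escape a piece of data so that it is safe to put inside a quoted string literal in an SQL query. (The mysql_* extension was removed in PHP 7.0; mysqli_real_escape_string() or, better, prepared statements are the modern equivalents.)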
Any other type of malicious data, such as HTML tags inside the string, should be absolutely ignored. Trying to manipulate it here will lead you to headaches as you try to "un-manipulate" it later after getting it out of the database. Bad "web data" cannot harm your database.
Sanitizing for output
htmlspecialchars($val) at output time will prevent any malicious tags from being rendered, because < and > characters are converted into their entity representations and not rendered as tag delimiters.
Use the ENT_QUOTES modifier if you are outputting something that is inside an HTML element's quoted attribute, such as <input name="email" value="<?php echo htmlspecialchars($email,ENT_QUOTES); ?>" />
That should be all you need, unless you have special requirements. strip_tags() shouldn't really be used for sanitization, as it can be fooled with badly formed HTML. Sanitization is a worthy goal, and if you can keep your contexts separate, you'll run into fewer problems with data manipulation between them.

It's probably safer and better to call htmlentities() on the string instead of counting on strip_tags().
strip_tags() won't remove HTML special characters such as ', " and &.
e.g., if your code is:
<img src="<?= strip_tags($myVar) ?>">
and
$myVar = '">something goes here<';
then you end up with:
<img src="">something goes here<">
Which is pretty obviously the root of an XSS hole; an actual exploit is left as an exercise for the reader.
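A minimal sketch of the same problem with an assumed payload that contains no tags at all, so strip_tags() has nothing to remove. The quote characters alone are enough to break out of the attribute, whereas htmlspecialchars() with ENT_QUOTES turns them into &quot;:

```php
<?php
// Assumed attacker-controlled value: no tags, only quotes and an extra attribute.
$myVar = '" onerror="alert(1)';

// strip_tags() finds no tags, so the value comes back unchanged.
$stripped = strip_tags($myVar);
// htmlspecialchars() with ENT_QUOTES encodes the double quotes as &quot;.
$escaped = htmlspecialchars($myVar, ENT_QUOTES, 'UTF-8');

// The stripped value closes the src attribute and adds an onerror handler.
$unsafe = '<img src="' . $stripped . '">';
// The escaped value stays inside the src attribute as plain text.
$safe = '<img src="' . $escaped . '">';
```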

I initially upvoted Frank's answer, but thought of a problem: htmlentities() will break legal urls like this:
http://www.mywebsite.com/profile?id=jojo&w=60&h=60
Perhaps stripping angle brackets + mysql_real_escape_string() would be sufficient?
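For the original question (a user-supplied URL shown as a profile picture), here is a minimal sketch, assuming only absolute http and https URLs should be accepted; profileImageTag() is a hypothetical helper name. Escaping & as &amp; inside the src attribute does not break the URL, because the browser decodes the entity when it parses the attribute value. Checking the scheme matters because a URL can be dangerous without containing any angle brackets (for example a javascript: URL, if the same value is ever reused in a link).

```php
<?php
// Hypothetical helper: returns an <img> tag for a user-supplied URL,
// or null if the URL is not an absolute http(s) URL.
function profileImageTag(string $url): ?string
{
    // Reject anything that is not a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return null;
    }
    // FILTER_VALIDATE_URL also accepts other schemes (ftp:, mailto:, ...),
    // so restrict the scheme explicitly.
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    if ($scheme !== 'http' && $scheme !== 'https') {
        return null;
    }
    // Escape at output time; quotes and & are encoded for the attribute.
    return '<img src="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '" alt="">';
}

$tag = profileImageTag('http://www.mywebsite.com/profile?id=jojo&w=60&h=60');
```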

Related

When to use Strip_tags and when to use htmlspecialchars()

Every time I look at code someone else wrote, they either use
$var = htmlspecialchars($var);
$var = trim($var);
$var = stripcslashes($var);
or just
strip_tags($var)
When should I use the first, and when the second?

htmlspecialchars and htmlentities are for displaying text in web pages. They translate the characters that have special meaning in HTML, such as the < and > characters that surround tags, into their entity codes. For instance, if the string contains
Use <table> to create a table on a web page.
it will be converted to
Use &lt;table&gt; to create a table on a web page.
When you display the string on a web page, you'll then see the intended message correctly.
strip_tags completely removes all the HTML tags. So the above string would be converted to:
Use to create a table on a web page.
If you display this, it doesn't make much sense. This is often used to sanitize input that isn't really meant for display, and shouldn't contain anything that looks like an HTML tag in the first place, such as usernames. Although it would probably be better to just validate it against whatever rules you have for those values (e.g. usernames should just be alphanumeric characters).
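A minimal sketch comparing the two functions on the example sentence above:

```php
<?php
$text = 'Use <table> to create a table on a web page.';

// htmlspecialchars() keeps the text but makes the tag inert.
$escaped = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
// strip_tags() deletes the tag outright, leaving a double space behind.
$stripped = strip_tags($text);
```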
In my opinion, strip_tags() is almost always the wrong tool. It's a simple crutch to prevent XSS attacks, since code without any HTML tags can't introduce scripts. But it's a broad brush that doesn't usually match the specific needs.
And it's generally wrong to do these conversions when processing input. Do them when you're using the data, performing whatever escaping is necessary at that time. So you use mysqli_real_escape_string() if you're substituting the variable into a query (but you really should use prepared statements instead of this), htmlentities() when you're displaying it on a web page, urlencode() when you're putting it into a URL query string, etc.
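A minimal sketch of escaping one value for two different output contexts, assuming UTF-8 pages; SQL is left out here because prepared statements are the better tool for that context. A value placed in an href query string is escaped twice, once with urlencode() for the query string and once with htmlspecialchars() for HTML, because it sits in both contexts at the same time:

```php
<?php
$value = 'Fish & Chips <3';

// HTML body or quoted attribute.
$forHtml = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
// Query-string parameter (spaces become '+').
$forQuery = urlencode($value);

// URL-encode for the query string, then HTML-escape for the attribute;
// the link text is HTML-escaped on its own.
$link = '<a href="/search?q=' . htmlspecialchars(urlencode($value), ENT_QUOTES, 'UTF-8') . '">'
      . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '</a>';
```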

In my case, I use htmlspecialchars() when passing a URL parameter in PHP. For example:
<?php
?>
This prevents cross-site scripting: if users put in other code, such as JavaScript, the reserved characters are replaced.
And I use strip_tags() in forms, to prevent users from inserting tags into the database.

Is my method of finding XSS vulnerabilities a good place to start?

Information
This is a quick question regarding finding possible XSS attacks in my website.
I am currently securing my website and have learnt that a good way to prevent XSS is to use htmlspecialchars($string, ENT_QUOTES, 'UTF-8'); to make sure that the html is displayed rather than ran.
My solution
I have filled my database with test data that is exactly as below: <script>console.log("This Page is Vulnerable");</script>
Therefore any page that displays a row of data that isn't escaped will write to the console, which then lets me hunt it down in my source and escape it.
Question
Now I understand this isn't the only thing I would have to do to prevent XSS, but does this at least narrow the possibility of persistent XSS attacks?
Lastly, does anyone have any advice on where to go from here? (I understand this is a vague question, so please ignore if you like. Otherwise, any questions will be answered ASAP.)
Thank you.

Yes, of course: a single function wrapping the input/output data is never enough. Programming isn't magic, it's logic.
Assuming you have this example HTML/JS/PHP for some reason:
<form action="" method="POST">
<input type="text" name="yourInput" />
<input type="submit"/>
</form>
<script><?= htmlspecialchars($_POST['yourInput'], ENT_QUOTES, 'UTF-8');?></script>
For some reason, which maybe nobody can explain, you put the user input inside <script> tags. You have used htmlspecialchars(), so no unescaped quotes, < or > will be present, but let's say the user enters this:
alert(document.cookie)
There are no quotes or opening tags, but after the form is submitted, the reloaded page will still display an alert with the current user's cookies.
So, as I said in my comments, you should go through all your inputs and test their behavior. Try to think as a potential hacker.
It's not all about using a function which is wrapping the data, but where the data is used. If you put it in the wrong place, no function will save you. In the example above, you need to place data in the right place, and if you are going to use it - use it as a string.
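A minimal sketch of what "use it as a string" can look like in PHP, assuming the value really has to end up inside an inline script: json_encode() produces a quoted JavaScript string literal, and the JSON_HEX_* flags additionally encode <, >, &, ' and " so the value cannot close the <script> element. jsString() is a hypothetical helper name, and JSON_THROW_ON_ERROR needs PHP 7.3 or later.

```php
<?php
// Hypothetical helper: encodes a PHP string as a JavaScript string literal.
// The JSON_HEX_* flags replace <, >, &, ' and " with \u00XX escapes, so the
// result cannot terminate the surrounding <script> element.
function jsString(string $value): string
{
    return json_encode(
        $value,
        JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_THROW_ON_ERROR
    );
}

// The user's input becomes a string value instead of executable code.
$userInput = 'alert(document.cookie)';
$snippet = '<script>var msg = ' . jsString($userInput) . ';</script>';
```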

I'd say it is exactly what you say — "a good start" — insofar as it should catch any simple mistakes where you simply meant to escape the user input but forgot.
In fact, I'd be inclined to simplify it further, and just try injecting an arbitrary string that should get escaped, like, say "<<<<<", and searching the HTML source of the returned page for that string to see if it appears unescaped anywhere. (Don't forget to also test it with single and double quotes, since those need to be escaped too in attributes.) This kind of testing would even be fairly simple to automate.
What neither of those methods will catch is cases where you've tried to escape or sanitize the input, but have done it poorly, so that the simple test inputs won't pass, but other vulnerabilities still remain. However, if you can be confident that you haven't done that, then these simple tests should be enough.
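A minimal sketch of the automated check described above, assuming you already have the page's HTML source as a string (fetching it, for example with curl, is left out). The probe value and the leaksProbe() name are hypothetical:

```php
<?php
// Hypothetical probe: contains <, ' and ", which must all be escaped in HTML.
const XSS_PROBE = "<'\"probe>";

// Returns true if the probe appears verbatim (i.e. unescaped) in the page source.
function leaksProbe(string $html): bool
{
    return strpos($html, XSS_PROBE) !== false;
}
```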
Anyway, where you IMO should go from here is getting in the habit of writing your code such that XSS vulnerabilities can't even happen. In particular, get used to:
Escaping every string that isn't supposed to be HTML with htmlspecialchars() before embedding it in HTML. (Of course, the same goes, mutatis mutandis, for embedding anything that isn't supposed to be SQL into SQL code, or anything that isn't supposed to be JavaScript into JavaScript code, etc.)
This is IMO best done just before the string is printed or concatenated with HTML, so that you never have to worry about whether or not the string has already been escaped earlier. (Tip: if you're tired of typing htmlspecialchars() all the time, consider defining your own, shorter alias for it.)
Thinking about user inputs (or anything else potentially coming from unknown sources) not in terms of what they're supposed to contain, or even what an attacker might inject into them, but simply in terms of what they could possibly contain.
That is, if your code takes a string as an input, write it so that it behaves correctly even if given any string of any length containing any arbitrary bytes in any order.
Conversely, if you cannot be sure that some code will do that (e.g. because you didn't write it), explicitly verify and/or force any inputs you pass into it to only have values that you know it can handle.
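A minimal sketch of the short alias mentioned in the list above; the name h() is only a convention, and ENT_SUBSTITUTE reflects an assumption that invalid UTF-8 should be replaced rather than produce an empty string:

```php
<?php
// Short alias for htmlspecialchars(), to be called right before output.
function h(?string $value): string
{
    // ENT_QUOTES escapes both ' and ", so the result is safe in quoted attributes too.
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences instead of returning ''.
    return htmlspecialchars($value ?? '', ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}
```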

Protection against XSS exploits?

I'm newish to PHP but I hear XSS exploits are bad. I know what they are, but how do I protect my sites?

To prevent XSS attacks, you have to properly check and validate all user-supplied data that you plan on using, and not allow HTML or JavaScript code to be inserted through the form.
Or you can use htmlspecialchars() to convert HTML characters into HTML entities, so characters like < and > that mark the beginning/end of a tag are turned into entities. You can also use strip_tags() to allow only some tags, but be aware that the function does not strip out harmful attributes such as onclick or onload.
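A minimal sketch of the strip_tags() caveat above, with an assumed payload: when <b> is on the allow-list, the tag is kept exactly as written, including its onclick attribute, and the text of the removed <script> element is left behind as well:

```php
<?php
// Assumed payload: an allowed tag carrying an event handler, plus a script.
$input = '<b onclick="alert(1)">hi</b><script>steal()</script>';

// Only <b> is allowed; the <script> tags are removed but their text content is not,
// and the attributes on <b> are not touched at all.
$out = strip_tags($input, '<b>');
```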

Escape all user data (including data from users stored in the database) with the htmlentities() function.
For HTML data (for example from WYSIWYG editors), use HTML Purifier to clean the data before saving it to the database.

Use strip_tags() if you want no tags at all, meaning anything like <somethinghere>.
htmlspecialchars() converts them to HTML entities, so the browser only displays them and doesn't try to run them.
If you want to allow safe HTML, I would use something like htmLawed or HTML Purifier.

The bad news
Unfortunately, preventing XSS in PHP is a non-trivial undertaking.
Unlike SQL injection, which you can mitigate with prepared statements and carefully selected white-lists, there is no provably secure way to separate the information you are trying to pass to your HTML document from the rest of the document structure.
The good news
However, you can mitigate known attack vectors by being particularly cautious with your escaping (and keeping your software up-to-date).
The most important rule to keep in mind: Always escape on output, never on input. You can safely cache your escaped output if you're concerned about performance, but always store and operate on the unescaped data.
XSS Mitigation Strategies
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or reStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Why shouldn't I filter on input?
Attempting to filter XSS on input is premature optimization, which can lead to unexpected vulnerabilities in other places.
For example, a recent WordPress XSS vulnerability employed MySQL column truncation to break their escaping strategy and allow the prematurely escaped payload to be stored unsafely. Don't repeat their mistake.

How to prevent XSS attack with Zend Form using %

Our company has built a website for a client. The client hired a web security company to test the pages for security before the product launches.
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
They ran another test and it still failed :(
They used the following for the one input field in the data of the HTTP header: name=%3Cscript%3Ealert%28123%29%3C%2Fscript%3E which basically translates to name=<script>alert(123)</script>
I've added alpha and alnum to some of the fields, which fixes the XSS vulnerability (touch wood) by removing the %. However, now the boss doesn't like it, because what about O'Brien and double-barrelled surnames...
I haven't come across the %3C as < problem reading up about XSS. Is there something wrong with my html character set or encoding or something?
I probably now have to write a custom filter, but that would be a huge pain to do that with every website and deployment. Please help, this is really frustrating.
EDIT:
If it's about escaping the form's output, how do I do that? The form submits to the same page; how do I escape it if all I have in my view is <?= $this->form ?>
How can I get Zend Form to escape its output?

%3Cscript%3Ealert%28123%29%3C%2Fscript%3E is the URL-encoded form of <script>alert(123)</script>. Any time you include < in a form value, it will be submitted to the server as %3C. PHP will read and decode that back to < before anything in your application gets a look at it.
That is to say, there is no special encoding that you have to handle; you won't actually see %3C in your input, you see <. If you're failing to encode that for on-page display then you don't have even the most basic defenses against XSS.
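A minimal sketch of the decoding step, using the payload from the question; PHP already performs this decoding before filling $_GET and $_POST, so urldecode() is only called here to make it visible:

```php
<?php
$encoded = '%3Cscript%3Ealert%28123%29%3C%2Fscript%3E';

// What your application actually receives after PHP's own decoding.
$decoded = urldecode($encoded);
// Escaping at output time is what makes it harmless on the page.
$safe = htmlspecialchars($decoded, ENT_QUOTES, 'UTF-8');
```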
We've removed most of our XSS problems. We developed the website with zend. We add the StripTags, StringTrim and HtmlEntities filters to the order form elements.
I'm afraid you have not fixed your XSS problems at all. You may have merely obfuscated them.
Input filtering is a depressingly common but quite wrong strategy for blocking XSS.
It is not the input that's the problem. As your boss says, there is no reason you shouldn't be able to input O'Brien. Or even <script>, like I am just now in this comment box. You should not attempt to strip tags in the input or even HTML-encode them, because who knows at input time that the data is going to end up in an HTML page? You don't want your database filled with nonsense like 'Fish&amp;Chips' which then ends up in an e-mail or other non-HTML context with weird HTML escapes in it.
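A minimal sketch of the problem with encoding on input, using an assumed value 'Fish & Chips': once the stored value is already HTML-encoded, the (correct) encoding on output encodes it a second time, and the reader sees the entity text instead of the ampersand:

```php
<?php
$raw = 'Fish & Chips';

// Encoding on input: the database now holds HTML, not the user's text.
$stored = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
// Encoding again on output double-encodes it; the page shows "Fish &amp; Chips".
$shown = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
```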
HTML-encoding is an output-stage issue. Leave the incoming strings alone, keep them as raw strings in the database (of course, if you are hacking together queries in strings to put the data in the database instead of parameterised queries, you would need to SQL-escape the content at exactly that point). Then only when you are inserting the values in HTML, encode them:
Name: <?php echo htmlspecialchars($row['name']); ?>
If you have a load of dodgy code like echo "Name: $name"; then I'm afraid you have much rewriting to do to make it secure.
Hint: consider defining a function with a short name like h so you don't have to type htmlspecialchars so much. Don't use htmlentities which will usually-unnecessarily encode non-ASCII characters, which will also mess them up unless you supply a correct $charset argument.
(Or, if you are using Zend_View, $this->escape().)
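A minimal sketch of the htmlspecialchars()/htmlentities() difference, with an assumed non-ASCII name; the last call passes a deliberately wrong charset to show how the UTF-8 bytes get mangled:

```php
<?php
$name = 'Zoë <admin>'; // assumes this source file is saved as UTF-8

// Only the HTML-special characters are encoded; 'ë' passes through.
$special = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
// Every character with a named entity is encoded, including 'ë'.
$entities = htmlentities($name, ENT_QUOTES, 'UTF-8');
// Wrong charset: the two UTF-8 bytes of 'ë' are treated as two Latin-1 characters.
$mangled = htmlentities($name, ENT_QUOTES, 'ISO-8859-1');
```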
Input validation is useful on an application-specific level, for things like ensuring telephone number fields contain numbers and not letters. It is not something you can apply globally to avoid having to think about the issues that arise when you put a string inside the context of another string—whether that's inside HTML, SQL, JavaScript string literals or one of the many other contexts that require escaping.
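A minimal sketch of application-level validation for the telephone example; the allowed characters and length limits are assumptions to adjust to your own rules, and isValidPhone() is a hypothetical name. Validation like this complements output escaping rather than replacing it:

```php
<?php
// Hypothetical rule: optional leading '+', then 6-20 digits, spaces,
// parentheses or hyphens. The D modifier stops '$' from matching before
// a trailing newline.
function isValidPhone(string $phone): bool
{
    return preg_match('/^\+?[0-9 ()\-]{6,20}$/D', $phone) === 1;
}
```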

If you correctly escape strings every time you write them to the HTML page, you won't have any issues.
%3C is a URL-encoded <; it is decoded by the server.

What is the correct/safest way to escape input in a forum?

I am creating a forum software using php and mysql backend, and want to know what is the most secure way to escape user input for forum posts.
I know about htmlentities() and strip_tags() and htmlspecialchars() and mysql_real_escape_string(), and even javascript's escape() but I don't know which to use and where.
What would be the safest way to process these three different types of input (by process, I mean get, save in a database, and display):
A title of a post (which will also be the basis of the URL permalink).
The content of a forum post limited to basic text input.
The content of a forum post which allows html.
I would appreciate an answer that tells me how many of these escape functions I need to use in combination and why.
Thanks!

When generating HTML output (like you're doing to get data into the form's fields when someone is trying to edit a post, or if you need to re-display the form because the user forgot one field, for instance), you'd probably use htmlspecialchars(): it will escape <, >, ", ', and & -- depending on the options you give it.
strip_tags will remove tags if user has entered some -- and you generally don't want something the user typed to just disappear ;-)
At least, not for the "content" field :-)
Once you have the user's input (i.e. when the form has been submitted), you need to escape it before sending it to the DB.
That's where functions like mysqli_real_escape_string become useful: they escape data for SQL.
You might also want to take a look at prepared statements (with mysqli and with PDO), which might help you a bit ;-)
You should not use anything like addslashes: the escaping it does doesn't depend on the database engine; it is better/safer to use a function that fits the engine (MySQL, PostgreSQL, ...) you are working with: it'll know precisely what to escape, and how.
Finally, to display the data inside a page :
for fields that must not contain HTML, you should use htmlspecialchars() : if the user did input HTML tags, those will be displayed as-is, and not injected as HTML.
for fields that can contain HTML... This is a bit trickier: you will probably only want to allow a few tags, and strip_tags (which can do that) is not really up to the task (it will keep the attributes of the allowed tags).
You might want to take a look at a tool called HTML Purifier: it will allow you to specify which tags and attributes should be allowed -- and it generates valid HTML, which is always nice ^^
This might take some time to compute, and you probably don't want to re-generate that HTML each time it has to be displayed; so you can think about storing it in the database (either only keeping that clean HTML, or keeping both it and the unclean one, in two separate fields -- might be useful to allow people to edit their posts?)
Those are only a few pointers... hope they help you :-)
Don't hesitate to ask if you have more precise questions !

mysql_real_escape_string() escapes everything you need to put in a mysql database. But you should use prepared statements (in mysqli) instead, because they're cleaner and do any escaping automatically.
Anything else can be handled with htmlspecialchars(), which escapes HTML for output, and urlencode(), which puts values into a format suitable for URLs.

There are two completely different types of attack you have to defend against:
SQL injection: input that tries to manipulate your DB. mysql_real_escape_string() and addslashes() are meant to defend against this. The former is better, but parameterized queries are better still
Cross-Site scripting (XSS): input that, when displayed on your page, tries to execute JavaScript in a visitor's browser to do all kinds of things (like steal the user's account data). htmlspecialchars() is the definite way to defend against this.
Allowing "some HTML" while avoiding XSS attacks is very, very hard. This is because there are endless possibilities of smuggling JavaScript into HTML. If you decided to do this, the safe way is to use BBCode or Markdown, i.e. a limited set of non-HTML markup that you then convert to HTML, while removing all real HTML with htmlspecialchars(). Even then you have to be careful not to allow javascript: URLs in links. Actually allowing users to input HTML is something you should only do if it's absolutely crucial for your site. And then you should spend a lot of time making sure you understand HTML and JavaScript and CSS completely.
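A minimal sketch of the BBCode approach, assuming a tiny subset ([b] and [url=...]); bbcodeToHtml() is a hypothetical name. Everything is escaped with htmlspecialchars() first, the few allowed codes are converted afterwards, and links are only produced for http and https URLs:

```php
<?php
// Hypothetical helper: converts a tiny BBCode subset to HTML.
function bbcodeToHtml(string $input): string
{
    // 1. Escape everything first, so no user-supplied HTML survives.
    $html = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

    // 2. [b]...[/b] -> <b>...</b> (non-greedy, so several tags work).
    $html = preg_replace('~\[b\](.*?)\[/b\]~s', '<b>$1</b>', $html);

    // 3. [url=...]...[/url] -> <a>, but only for http(s) URLs.
    return preg_replace_callback(
        '~\[url=([^\]]+)\](.*?)\[/url\]~s',
        function (array $m): string {
            if (!preg_match('~^https?://~i', $m[1])) {
                // Leave javascript:, data: and other schemes as plain text.
                return $m[0];
            }
            // $m[1] is already HTML-escaped, so it cannot break out of href="...".
            return '<a href="' . $m[1] . '" rel="nofollow">' . $m[2] . '</a>';
        },
        $html
    );
}
```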

The answer to this post is a good answer
Basically, using the PDO interface to parameterize your queries is much safer and less error-prone than escaping your inputs manually.
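A minimal sketch of a parameterized insert with PDO, assuming the pdo_sqlite extension is available so the example can run against an in-memory SQLite database; with MySQL only the DSN and credentials would change. The value is stored exactly as typed and only escaped when it is written into HTML:

```php
<?php
// In-memory SQLite keeps the example self-contained.
$pdo = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$pdo->exec('CREATE TABLE users (name TEXT)');

$name = "O'Brien <b>"; // stored exactly as typed: no escaping on input

// The value is bound as a parameter, never concatenated into the SQL text.
$stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->execute([':name' => $name]);

$stored = $pdo->query('SELECT name FROM users')->fetchColumn();

// HTML escaping happens only when the value is written into the page.
$html = 'Name: ' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
```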

I have a tendency to escape all characters that would be problematic in page display, Javascript and SQL all at the same time. It leaves it readable on the web and in HTML eMail and at the same time removes any problems with the code.
A VB.NET line of code would be:
SafeComment = Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
Replace(Replace(Replace( _
HttpUtility.HtmlEncode(Trim(strInput)), _
":", "&#58;"), "-", "&#45;"), "|", "&#124;"), _
"`", "&#96;"), "(", "&#40;"), ")", "&#41;"), _
"%", "&#37;"), "^", "&#94;"), """", "&#34;"), _
"/", "&#47;"), "*", "&#42;"), "\", "&#92;"), _
"'", "&#39;")

First of all, general advice: don't escape variables literally when inserting in the database. There are plenty of solutions that let you use prepared statements with variable binding. The reason to not do this explicitly is because it is only a matter of time then before you forget it just once.
If you're inserting plain text in the database, don't try to clean it on insert, but instead clean it on display. That is to say, use htmlentities to encode it as HTML (and pass the correct charset argument). You want to encode on display because then you're no longer trusting that the database contents are correct, which isn't necessarily a given.
If you're dealing with rich text (html), things get more complicated. Removing the "evil" bits from HTML without destroying the message is a difficult problem. Realistically speaking, you'll have to resort to a standardized solution, like HTMLPurifier. However, this is generally too slow to run on every page view, so you'll be forced to do this when writing to the database. You'll also have to ensure that the user can see their "cleaned up" html and correct the cleaned up version.
Definitely try to avoid "rolling your own" filter or encoding solution at any step. These problems are notoriously tricky, and you run a large risk of overlooking some minor detail that has big security implications.

I second Joeri: do not roll your own. Go here to see some of the many possible XSS attacks:
http://ha.ckers.org/xss.html
htmlentities() -> turns text into html, converting characters to entities. If using UTF-8 encoding then use htmlspecialchars() instead as the other entities are not needed. This is the best defence against XSS. I use it on every variable I output regardless of type or origin unless I intend it to be html. There is only a tiny performance cost and it is easier than trying to work out what needs escaping and what doesn't.
strip_tags() - turns HTML into text by removing all HTML tags. Use this to ensure that there is nothing nasty in your input, as an adjunct to escaping your output.
mysql_real_escape_string() - escapes a string for MySQL and is your defence against SQL injections from Little Bobby Tables (better to use mysqli and prepare/bind, as escaping is then done for you and you can avoid lots of messy string concatenations)
The advice given above about avoiding HTML input unless it is essential, and opting for BBCode or similar (make your own up if need be), is very sound indeed.
