I am trying to prevent an XSS attack via the URL.
URL: http://example.com/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
I have tried:
var_dump(filter_var('http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29', FILTER_VALIDATE_URL));
and other URL validation using regular expressions, but none of it worked.
The link above shows all the information, but my CSS and some JavaScript functions don't work.
Please suggest the best possible solution.
Try using FILTER_SANITIZE_SPECIAL_CHARS Instead
$url = 'http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29';
// Original
echo $url, PHP_EOL;
// Sanitise
echo sanitiseURL($url), PHP_EOL;
// Sanitise + URL encode
echo sanitiseURL($url, true), PHP_EOL;
Output
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/%22ns=%22alert%280x0000DC%29
http://10.0.4.2/onlineArcNew/html/terms_conditions_1.php/&#34;ns=&#34;alert(0x0000DC)
http%3A%2F%2F10.0.4.2%2FonlineArcNew%2Fhtml%2Fterms_conditions_1.php%2F%26%2334%3Bns%3D%26%2334%3Balert%280x0000DC%29
Function Used
function sanitiseURL($url, $encode = false) {
$url = filter_var(urldecode($url), FILTER_SANITIZE_SPECIAL_CHARS);
if (! filter_var($url, FILTER_VALIDATE_URL))
return false;
return $encode ? urlencode($url) : $url;
}
If you're using MVC, then try to decode all of the values before routing, and use strip_tags() to get rid of these nasties. And as the docs say, case should not impact anything.
If not, create a utility function and do the same while retrieving the variables from the URI. But I am by no means an XSS expert, so this might be just a part of the trick.
From Janis Peisenieks
Step 1: Escape Output Provided by Users
If you want to include data within a page that’s been provided by users, escape the output. And, in this simplified list, we’re going to stick with one simple escape operation: HTML encode any <, >, &, ‘, “. For example, PHP provides the htmlspecialchars() function to accomplish this common task.
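As a minimal sketch of Step 1, the escape operation can be wrapped in a small helper. The name e() is a hypothetical choice for illustration, not something from the original article; the flags assume UTF-8 output.

```php
<?php
// Hypothetical helper: HTML-encode <, >, &, ' and " before output.
function e(string $value): string
{
    // ENT_QUOTES converts both single and double quotes;
    // ENT_SUBSTITUTE replaces invalid UTF-8 sequences instead of returning an empty string.
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$comment = '<script>alert("x")</script>';   // example of untrusted user data
echo '<p>' . e($comment) . '</p>', PHP_EOL;
```

With this in place the browser displays the script as text rather than running it.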
Step 2: Always Use XHTML
Read through OWASP’s XSS prevention strategies, and it becomes apparent that protecting against injection requires much more effort if you use unquoted attributes in your HTML. In contrast, in quoted attributes, escaping data becomes the same process needed to escape data for content within tags, the escape operation we already outlined above. That’s because the only troublemaker in terms of sneaking in structurally significant content within the context of a quoted attribute is the closing quote.
Obviously, your markup doesn’t have to be XHTML in order to contain quoted attributes. However, shooting for and validating against XHTML makes it easy to test if all of the attributes are quoted.
Step 3: Only Allow Alphanumeric Data Values in CSS and JavaScript
We need to limit the data you allow from users that will be output within CSS and JavaScript sections of the page to alphanumeric (e.g., a regex like [a-zA-Z0-9]+) types, and make sure they are used in a context in which they truly represent values. In JavaScript this means user data should only be output within quoted strings assigned to variables (e.g., var userId = "ALPHANUMERIC_USER_ID_HERE";). In CSS this means that user data should only be output within the context for a property value (e.g., p { color: #ALPHANUMERIC_USER_COLOR_HERE;}). This might seem Draconian, but, hey, this is supposed to be a simple XSS tutorial.
Now, to be clear, you should always validate user data to make sure it meets your expectations, even for data that’s output within tags or attributes, as in the earlier examples. However, it’s especially important for CSS and JavaScript regions, as the complexity of the possible data structures makes it exceedingly difficult to prevent XSS attacks.
Common data you might want users to be able supply to your JavaScript such as Facebook, Youtube, and Twitter ID’s can all be used whilst accommodating this restriction. And, CSS color attributes and other styles can be integrated, too.
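A rough sketch of the alphanumeric restriction from Step 3 might look like the following. The helper name alnumOrNull() and the fallback value are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: return the value only if it is purely alphanumeric.
function alnumOrNull(string $value): ?string
{
    // \A and \z anchor to the real start and end of the string,
    // so a value with a trailing newline is rejected too.
    return preg_match('/\A[a-zA-Z0-9]+\z/', $value) === 1 ? $value : null;
}

// Only a validated value is placed inside the quoted JavaScript string.
$userId = alnumOrNull('abc123') ?? 'unknown';
echo 'var userId = "' . $userId . '";', PHP_EOL;
```

Anything that fails the check is replaced by a fixed fallback instead of being "cleaned up", which keeps the rule easy to reason about.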
Step 4: URL-Encode URL Query String Parameters
If user data is output within a URL parameter of a link query string, make sure to URL-encode the data. Again, using PHP as example, you can simply use the urlencode() function. Now, let’s be clear on this and work through a couple examples, as I’ve seen much confusion concerning this particular point.
Must URL-encode
The following example outputs user data that must be URL-encoded because it is used as a value in the query string.
<a href="http://site.com?id=USER_DATA_HERE_MUST_BE_URL_ENCODED">
Must Not URL-Encode
The following example outputs the user-supplied data for the entire URL. In this case, the user data should be escaped with the standard escape function (HTML encode any <, >, &, ‘, “), not URL-encoded. URL-encoding this example would lead to malformed links.
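The two cases in Step 4 can be sketched side by side. The variable names and sample values below are made up for illustration; note that HTML-escaping a whole URL does not by itself check which URL scheme the user supplied.

```php
<?php
$userId  = 'a b&c="x"';                     // untrusted value for a query-string parameter
$userUrl = 'http://site.com/page?a=1&b=2';  // untrusted value used as the entire URL

// Case 1 (must URL-encode): the value sits inside the query string.
// It is URL-encoded first, then HTML-escaped because it is written into an attribute.
$link1 = '<a href="http://site.com?id='
       . htmlspecialchars(urlencode($userId), ENT_QUOTES, 'UTF-8')
       . '">one</a>';

// Case 2 (must not URL-encode): the value is the whole URL, so it is only HTML-escaped.
$link2 = '<a href="' . htmlspecialchars($userUrl, ENT_QUOTES, 'UTF-8') . '">two</a>';

echo $link1, PHP_EOL, $link2, PHP_EOL;
```

Running urlencode() over $userUrl in the second case would turn the colon and slashes into percent escapes and break the link.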
If the following statements are true,
All documents are served with the HTTP header Content-Type: text/html; charset=UTF-8.
All HTML attributes are enclosed in either single or double quotes.
There are no <script> tags in the document.
are there any cases where htmlspecialchars($input, ENT_QUOTES, 'UTF-8') (converting &, ", ', <, > to the corresponding named HTML entities) is not enough to protect against cross-site scripting when generating HTML on a web server?
htmlspecialchars() is enough to prevent document-creation-time HTML injection with the limitations you state (ie no injection into tag content/unquoted attribute).
However there are other kinds of injection that can lead to XSS and:
There are no <script> tags in the document.
this condition doesn't cover all cases of JS injection. You might for example have an event handler attribute (requires JS-escaping inside HTML-escaping):
<div onmouseover="alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
or, even worse, a javascript: link (requires JS-escaping inside URL-escaping inside HTML-escaping):
<a href="javascript:alert('<?php echo htmlspecialchars($xss) ?>')"> // bad!
It is usually best to avoid these constructs anyway, but especially when templating. Writing <?php echo htmlspecialchars(urlencode(json_encode($something))) ?> is quite tedious.
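For the event-handler case, the nesting (JS-escaping inside HTML-escaping) could be sketched roughly as follows. This is only an illustration of the layering under the assumption of UTF-8 output; avoiding inline handlers altogether remains the simpler option.

```php
<?php
$xss = "'); alert('pwned";   // example of hostile input

// Layer 1: JS-escape. json_encode() produces a quoted JavaScript string literal;
// the JSON_HEX_* flags turn quotes, angle brackets and ampersands into \uXXXX escapes.
$jsLiteral = json_encode($xss, JSON_HEX_TAG | JSON_HEX_APOS | JSON_HEX_QUOT | JSON_HEX_AMP);

// Layer 2: HTML-escape the complete attribute value.
$attr = htmlspecialchars('alert(' . $jsLiteral . ')', ENT_QUOTES, 'UTF-8');

echo '<div onmouseover="' . $attr . '">hover</div>', PHP_EOL;
```

The hostile quotes never reach the page as quote characters, so they cannot close either the JavaScript string or the HTML attribute.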
And... injection issues can happen on the client-side as well (DOM XSS); htmlspecialchars() won't protect you against a piece of JavaScript writing to innerHTML (commonly .html() in poor jQuery scripts) without explicit escaping.
And... XSS has a wider range of causes than just injections. Other common causes are:
allowing the user to create links, without checking for known-good URL schemes (javascript: is the most well-known harmful scheme but there are more)
deliberately allowing the user to create markup, either directly or through light-markup schemes (like bbcode which is invariably exploitable)
allowing the user to upload files (which can through various means be reinterpreted as HTML or XML)
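For the first cause in the list above, a known-good scheme check could be sketched like this. The helper name isSafeLink() and the choice of allowing only http and https are assumptions for illustration.

```php
<?php
// Hypothetical helper: accept a user-supplied link only if its scheme is on an allow-list.
function isSafeLink(string $url): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    // parse_url() returns null when there is no scheme and false for seriously malformed URLs.
    return is_string($scheme)
        && in_array(strtolower($scheme), ['http', 'https'], true);
}

var_dump(isSafeLink('https://example.com/a'));   // accepted
var_dump(isSafeLink('javascript:alert(1)'));     // rejected
```

Checking against a short list of allowed schemes avoids having to enumerate every harmful one.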
Assuming you are not using older PHP versions (5.2 or so), htmlspecialchars() is "safe" (and of course taking the backend code into consideration, as Royal Bg mentions).
In older PHP versions malformed UTF-8 characters made this function vulnerable
My 2 cents: always sanitize/check your inputs by specifying what is allowed, instead of just escaping or encoding everything.
For example, if someone must enter a telephone number, I can imagine the following characters being allowed: 0123456789()+-. and a space; all others are simply ignored or stripped out.
Same would apply to addresses etc. someone specifying UTF-8 characters for dots/blocks/hearts etc. in their address must be mentally ill...
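The telephone-number rule could be sketched like this; cleanPhone() is a hypothetical name, and the allowed character set is the one listed above.

```php
<?php
// Hypothetical helper: keep only the characters allowed in a phone number.
function cleanPhone(string $input): string
{
    // Allowed: digits, parentheses, plus, minus, dot and space. Everything else is dropped.
    return preg_replace('/[^0-9()+\-. ]/', '', $input) ?? '';
}

echo cleanPhone('+1 (555) 123-4567<script>'), PHP_EOL;
```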
As far as I know, yes. I can't imagine a case where it doesn't prevent XSS. If you want to be completely safe, use strip_tags().
In a contact form, if someone enters the following into the textbox:
<?php echo 'hi'; ?>
I see that the server will not execute it because of an error. What I would like it to do is instead, somehow escape it into plain text and display it correctly. I have seen other sites been able to do this. I originally thought this could be solved by the addslashes() function, but that doesn't seem to work.
Thanks,
Phil
No. Use htmlspecialchars instead. Don't use addslashes.
To be more specific, addslashes bluntly escapes all instances of ', " and \ and NUL. It was meant to prevent SQL injection, but it has no real use in proper security measures.
What you want is to prevent the browser from interpreting tags as is (and that's entirely different from preventing SQL injection). For instance, if I want to talk about <script> elements, SO shouldn't simply send that string literally, which would start an actual script (and can lead to cross-site scripting). Instead, some characters, especially < and >, need to be encoded as HTML entities so they're shown as angle brackets (the same is true for &, which would otherwise be interpreted as the start of an HTML entity).
In your case, output after htmlspecialchars would look like:
&lt;?php echo 'hi'; ?&gt;
Use htmlspecialchars before outputting anything provided by the user. But in this case, also make sure that you do not execute anything the user inputs. Do not use eval, include or require. If you save the user data to a file, use readfile or file_get_contents+htmlspecialchars instead of include/require. If you're using eval, change it into echo and so on.
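A small sketch of the file_get_contents+htmlspecialchars approach, assuming the user data was saved to a file. The temporary file here only stands in for wherever the data is really stored.

```php
<?php
// Stand-in for stored user input (hypothetical location).
$path = tempnam(sys_get_temp_dir(), 'msg');
file_put_contents($path, "<?php echo 'hi'; ?>");

// Read the file as plain data (never include or require it), then escape it for HTML.
$safe = htmlspecialchars(file_get_contents($path), ENT_QUOTES, 'UTF-8');
unlink($path);

echo $safe, PHP_EOL;
```

Because the file is read rather than included, the PHP code inside it is never executed, and the escaped result is shown to the visitor as text.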
Well, the title is my question. Can anybody give me a list of things to do to sanitize my data before entering to mysql database using php, especially if the data contains html tags?
It depends on a lot of things. If you don't want to accept any HTML, that makes it a whole lot easier, run it through strip_tags() first to remove all the HTML from it. After that it's much safer. If you do want to accept some HTML, you can selectively keep some tags from it with the same function, just add in the tags to keep after. eg: strip_tags($string_to_sanitize, '<p><div>'); // Keeps only <p> and <div> tags.
As for inserting into a database, it's always best to sanitize anything before inserting into the database; adopting a "don't trust anybody" mentality will save you a lot of trouble. Preventing against SQL injection is fairly straightforward, this is the method I use:
// Note: the mysql_* extension was removed in PHP 7.0, so this runs only on older versions.
$q = sprintf("INSERT INTO table_name (string_field, int_field) VALUES ('%s', %d);",
    mysql_real_escape_string($values['string']),
    mysql_real_escape_string($values['number']));
$result = mysql_query($q, $connection);
Generally once you open the door for allowing HTML in, you'll have a whole deal of things to worry about (there are some great articles on defending from XSS out there). If you want to test for XSS vulnerabilities, try the examples on http://ha.ckers.org/xss.html. There are some they have there that you would probably never even consider, so give it a look!
Also, if you are accepting specific types of input (eg: numbers, emails, boolean values) try using the inbuilt filter_var() function in PHP. They have a bunch of inbuilt types to validate data against (http://www.php.net/manual/en/filter.filters.validate.php), as well as a number of filters to sanitize your data (http://www.php.net/manual/en/filter.filters.sanitize.php).
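A few of the built-in validation filters in use, as a quick sketch. The sample values and the 0 to 150 range are made up for illustration.

```php
<?php
// Integer within a range: returns the int on success, false on failure.
$age = filter_var('42', FILTER_VALIDATE_INT, [
    'options' => ['min_range' => 0, 'max_range' => 150],
]);

// Email address: returns the string on success, false on failure.
$email = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);

// Boolean: with FILTER_NULL_ON_FAILURE, unrecognised values give null instead of false.
$flag = filter_var('yes', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE);

// Input that is not a clean integer is rejected.
$bad = filter_var('12abc', FILTER_VALIDATE_INT);

var_dump($age, $email, $flag, $bad);
```

Compare results with === false, since a valid integer 0 is also falsy.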
Generally, accepting any input is like opening a Pandora's Box, and while you'll probably never be able to block 100% of the weaknesses (people are always looking to find a way in), you can block the common ones to save you headaches.
Finally remember to sanitize ALL external data. Just because you make a dropdown input doesn't mean some shady person can't send their own data instead!
Use mysql_real_escape_string();
mysql_query("INSERT INTO table(col) VALUES('" . mysql_real_escape_string($_POST['data']) . "')");
You should use prepared statements when inserting data into the database, not any sort of escaping. (PHP manual: prepared statements in pdo and mysqli.)
Sanitization for HTML output should, as mentioned by others, happen when you go to take data out of the database and merge it into a page, not before.
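A minimal prepared-statement sketch with PDO. The comments table, the insertComment() helper and the in-memory SQLite database are assumptions chosen so the example is self-contained; with MySQL only the connection string would differ.

```php
<?php
// Hypothetical helper: the value travels separately from the SQL text, so no escaping is needed.
function insertComment(PDO $pdo, string $body): int
{
    $stmt = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
    $stmt->execute([':body' => $body]);
    return (int) $pdo->lastInsertId();
}

// Demonstration, run only if the pdo_sqlite driver is available.
if (class_exists('PDO') && in_array('sqlite', PDO::getAvailableDrivers(), true)) {
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

    $id = insertComment($pdo, "Robert'); DROP TABLE comments;--");

    // The hostile text is stored verbatim as data; it is escaped later, at HTML output time.
    $stored = $pdo->query('SELECT body FROM comments WHERE id = ' . $id)->fetchColumn();
    echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), PHP_EOL;
}
```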
Turn off register_globals and magic_quotes, use mysql_real_escape_string on any string coming from the user before placing it into your query.
Of course mysql_real_escape_string
When dealing with any kind of input, start from the "I won't allow anything" standpoint and whitelist only what is deemed acceptable.
On insert you need to make sure that the data is MySQL-escaped. For this, use mysql_real_escape_string.
Before showing the data you will need to strip out unsafe HTML and/or JavaScript code. Many people choose to store the sanitised version in the database. Others prefer to strip the ugly HTML from the string before rendering.
You do this in PHP with some filtering. An example is the Drupal filter_xss function:
function filter_xss($string, $allowed_tags = array('a', 'em', 'strong', 'cite', 'code', 'ul', 'ol', 'li', 'dl', 'dt', 'dd')) {
// Only operate on valid UTF-8 strings. This is necessary to prevent cross
// site scripting issues on Internet Explorer 6.
if (!drupal_validate_utf8($string)) {
return '';
}
// Store the input format
_filter_xss_split($allowed_tags, TRUE);
// Remove NUL characters (ignored by some browsers)
$string = str_replace(chr(0), '', $string);
// Remove Netscape 4 JS entities
$string = preg_replace('%&\s*\{[^}]*(\}\s*;?|$)%', '', $string);
// Defuse all HTML entities
$string = str_replace('&', '&amp;', $string);
// Change back only well-formed entities in our whitelist
// Decimal numeric entities
$string = preg_replace('/&amp;#([0-9]+;)/', '&#\1', $string);
// Hexadecimal numeric entities
$string = preg_replace('/&amp;#[Xx]0*((?:[0-9A-Fa-f]{2})+;)/', '&#x\1', $string);
// Named entities
$string = preg_replace('/&amp;([A-Za-z][A-Za-z0-9]*;)/', '&\1', $string);
return preg_replace_callback('%
(
<(?=[^a-zA-Z!/]) # a lone <
| # or
<!--.*?--> # a comment
| # or
<[^>]*(>|$) # a string that starts with a <, up until the > or the end of the string
| # or
> # just a >
)%x', '_filter_xss_split', $string);
}
Well, there is not much to do when it comes to inserting data from a textarea into a MySQL database.
For strings placed into a query, MySQL's requirements are not complicated.
Only 2 rules to follow:
inserted data should be surrounded by quotes.
some special character in the data should be escaped.
Note that this operation has nothing to do with security. These are syntax requirements.
Assuming you're adding quotes already, the only thing you have to add is escaping. Depending on your encoding, you can use the addslashes, mysql_escape_string or mysql_real_escape_string functions.
However, other parts of query require more attention. If you're curious, refer to my earlier answer with complete guide: In PHP when submitting strings to the database should I take care of illegal characters using htmlspecialchars() or use a regular expression?
HTML tags have nothing to do with the database and require no special attention.
However, when displaying data from an untrusted source, some precautions should be taken. This was already described in this topic; the only thing I have to add is that you can't trust strip_tags when it is used with the second parameter.
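One reason for that caution can be shown in a short sketch: strip_tags() keeps allowed tags exactly as they were written, attributes included. The input string is a made-up example.

```php
<?php
$input  = '<p onmouseover="alert(1)">hello</p><script>alert(2)</script>';

// Only <p> is allowed, but its event-handler attribute survives untouched.
$output = strip_tags($input, '<p>');

echo $output, PHP_EOL;
```

The script tags are removed (their text content stays), while the onmouseover handler on the allowed tag is still present in the result.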
You can use mysql_real_escape_string, you can also use htmlentities with addslashes... or you can use all 3 together also...
I have the following array:
'tagline_p' => "I'm a <a href='#showcase'>multilingual web</a> developer, designer and translator. I'm here to <a href='#contact'>help you</a> reach a worldwide audience.",
Should I escape the HTML tags inside the array to protect my site from attacks? (And how would I escape them?)
Or is it OK to have HTML tags inside an array?
The only time it becomes a problem is when it contains user input. You know what you put in your array, and trust it. But you don't know what users are passing in, and don't trust that.
So in this particular case, escaping is not needed. But as soon as user input is involved, you should escape the input.
It's not the HTML itself that is dangerous, but the type of HTML users can pass in, like script tags which allow them to execute Javascript.
Addition
Note that it's best practice to only escape on output not on input. The output is where the data can do damage, so you want to consistently escape that. That way, you don't have to make sure that all input is escaped.
That way, you don't have problems when outputting data to different formats where maybe different rules apply. You don't have to use things like stripslashes() or htmlspecialchars_decode() if you don't need things to be output as html.
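As a sketch of escaping on output, the same stored value can be prepared differently for each destination. The sample string and variable names are illustrative only.

```php
<?php
$stored = 'Tom & "Jerry" <3';   // raw user input, kept unescaped in storage

// Escape at the point of output, using the rules of each target format.
$forHtml = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');   // HTML page
$forJson = json_encode(['name' => $stored]);                 // JSON response
$forUrl  = rawurlencode($stored);                            // URL component

echo $forHtml, PHP_EOL, $forJson, PHP_EOL, $forUrl, PHP_EOL;
```

Had the value been HTML-escaped on input, the JSON and URL outputs would carry entities such as &amp;amp; that mean nothing in those formats.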
It's fine to store the data in the array.
You only need to escape the tags when you are outputting it into an HTML context, and you don't trust it, or you don't want the HTML to be interpreted.
You have to escape data in an appropriate manner to where you are sending it; for HTML if you don't want it to be read as HTML you can use htmlspecialchars(), likewise if you are putting it into an SQL statement and you don't want it to be read as SQL, you can use mysql_real_escape_string() etc.
You should escape HTML when it has been entered by a user (and thus is unsafe) AND you're going to display that HTML on your site. If it's you who wrote it, it doesn't need any kind of escaping.
If you do need to escape HTML, you should do so right before displaying it on your site. There is no need to escape data when you're just lugging it around (as you're presumably doing with that array). You can escape HTML with the htmlspecialchars() function.
(Use htmlspecialchars or htmlentities to escape the HTML.)
Having HTML tags is fine as long as you restrict the set of tags and attributes coming from user, if that array is dynamically generated. For example, <script> should not be allowed, nor event handlers like onmouseover.
It depends on how the HTML is getting into the array. If it's hardcoded by you, it's probably all right. If it's coming from a user, well, all user input is suspect- HTML is just more difficult to clean.
The real question might be "Why do you want to put HTML in an array?". If it's static text, put it in a template file somewhere.
make an array of allowable tags and use strip_tags($input_array[$key],$allowable_tags)
or make a function like this
function sanitize_input(array $input, $allowable_tags = '<br><b><strong><p>')
{
    $input_array = $input;
    foreach ($input as $key => $value) {
        // Strip disallowed tags from non-empty values only
        if (!empty($value)) {
            $input_array[$key] = strip_tags($value, $allowable_tags);
        }
    }
    return $input_array;
}
An article, http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html, says the following:
There are numerous advantages to using prepared statements in your applications, both for security and performance reasons.
Prepared statements can help increase security by separating SQL logic from the data being supplied. This separation of logic and data can help prevent a very common type of vulnerability called an SQL injection attack.
Normally when you are dealing with an ad hoc query, you need to be very careful when handling the data that you received from the user. This entails using functions that escape all of the necessary trouble characters, such as the single quote, double quote, and backslash characters.
This is unnecessary when dealing with prepared statements. The separation of the data allows MySQL to automatically take into account these characters and they do not need to be escaped using any special function.
Does this mean I don't need htmlentities() or htmlspecialchars()?
But I assume I need to add strip_tags() to user input data?
Am I right?
htmlentities and htmlspecialchars are used to generate the HTML output that is sent to the browser.
Prepared statements are used to generate/send queries to the Database engine.
Both allow escaping of data; but they don't escape for the same usage.
So, no, prepared statements (for SQL queries) don't prevent you from properly using htmlspecialchars/htmlentities (for HTML generation)
About strip_tags: it will remove tags from a string, where htmlspecialchars will transform them to HTML entities.
Those two functions don't do the same thing; you should choose which one to use depending on your needs / what you want to get.
For instance, with this piece of code:
$str = 'this is a <strong>test</strong>';
var_dump(strip_tags($str));
var_dump(htmlspecialchars($str));
You'll get this kind of output:
string 'this is a test' (length=14)
string 'this is a &lt;strong&gt;test&lt;/strong&gt;' (length=43)
In the first case, no tag; in the second, properly escaped ones.
And, with an HTML output:
$str = 'this is a <strong>test</strong>';
echo strip_tags($str);
echo '<br />';
echo htmlspecialchars($str);
You'll get:
this is a test
this is a <strong>test</strong>
Which one of those do you want? That is the important question ;-)
Nothing changes for htmlspecialchars(), because that's for HTML, not SQL. You still need to escape HTML properly, and it's best to do it when you actually generate the HTML, rather than tying it to the database somehow.
If you use prepared statements, then you don't need mysql_[real_]escape_string() anymore (assuming you stick to prepared statements' placeholders and resist temptation to bypass it with string manipulation).
If you want to get rid of htmlspecialchars(), then there are HTML templating engines that work similarly to prepared statements in SQL and free you from escaping everything manually, for example PHPTAL.
You don't need htmlentities() or htmlspecialchars() when inserting stuff in the database, nothing bad will happen, you will not be vulnerable to SQL injection if you're using prepared statements.
The good thing is you'll now store the pristine user input in your database.
You DO need to escape data on output, when you pull it out of the database and send it back to a client; otherwise you'll be vulnerable to cross-site scripting attacks and other bad things. You'll need to escape it for the output format you need, such as HTML, so you'll still need htmlentities etc.
Alternatively, you could escape things as you put them into the database rather than when you output them; however, you'll lose the user's original formatting, and the data will be escaped for HTML use, which might not pay off if you're using the data in different output formats.
Use prepared statements against SQL injection.
Use htmlspecialchars() against XSS (for example, a script that redirects to another site).
<?php
$str = "this is <script> document.location.href='https://www.google.com';</script>";
echo $str;
Output: "this is", after which the browser runs the script and redirects to google.com.
Using htmlspecialchars:
$str = "this is <script> document.location.href='https://www.google.com';</script>";
echo htmlspecialchars($str);
Output in the browser: this is <script> document.location.href='https://www.google.com';</script>
Output in view source: this is &lt;script&gt; document.location.href='https://www.google.com';&lt;/script&gt;
If a user submits such a script as a comment and it is stored in the database, then whenever the browser displays the comments from the database, the script executes automatically and redirects to google.com.
So:
1. Use htmlspecialchars() to neutralize malicious script tags.
2. Use prepared statements to secure the database.
htmlspecialchars
htmlspecialchars_decode
php validation
I would still be inclined to encode HTML. If you're building some form of CMS or web application, it's easier to store it as encoded HTML, and then re-encode it as required.
For example, when bringing information into a textarea modified by TinyMCE, they recommend that the HTML be encoded, since the HTML spec does not allow HTML inside a textarea.
I would also strip_tags() from anywhere you don't want HTML code.