How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&','<','>'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.
Related
How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&','<','>'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.
How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&','<','>'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.
How do I prevent XSS (cross-site scripting) using just HTML and PHP?
I've seen numerous other posts on this topic but I have not found an article that clear and concisely states how to actually prevent XSS.
Basically you need to use the function htmlspecialchars() whenever you want to output something to the browser that came from the user input.
The correct way to use this function is something like this:
echo htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
Google Code University also has these very educational videos on Web Security:
How To Break Web Software - A look at security vulnerabilities in
web software
What Every Engineer Needs to Know About Security
and Where to Learn It
One of the most important steps is to sanitize any user input before it is processed and/or rendered back to the browser. PHP has some "filter" functions that can be used.
The form that XSS attacks usually have is to insert a link to some off-site javascript that contains malicious intent for the user. Read more about it here.
You'll also want to test your site - I can recommend the Firefox add-on [XSS Me]. Looks like Easy XSS is now the way to go.
Cross-posting this as a consolidated reference from the SO Documentation beta which is going offline.
Problem
Cross-site scripting is the unintended execution of remote code by a web client. Any web application might expose itself to XSS if it takes input from a user and outputs it directly on a web page. If input includes HTML or JavaScript, remote code can be executed when this content is rendered by the web client.
For example, if a 3rd party side contains a JavaScript file:
// http://example.com/runme.js
document.write("I'm running");
And a PHP application directly outputs a string passed into it:
<?php
echo '<div>' . $_GET['input'] . '</div>';
If an unchecked GET parameter contains <script src="http://example.com/runme.js"></script> then the output of the PHP script will be:
<div><script src="http://example.com/runme.js"></script></div>
The 3rd party JavaScript will run and the user will see "I'm running" on the web page.
Solution
As a general rule, never trust input coming from a client. Every GET parameter, POST or PUT content, and cookie value could be anything at all, and should therefore be validated. When outputting any of these values, escape them so they will not be evaluated in an unexpected way.
Keep in mind that even in the simplest applications data can be moved around and it will be hard to keep track of all sources. Therefore it is a best practice to always escape output.
PHP provides a few ways to escape output depending on the context.
Filter Functions
PHPs Filter Functions allow the input data to the php script to be sanitized or validated in many ways. They are useful when saving or outputting client input.
HTML Encoding
htmlspecialchars will convert any "HTML special characters" into their HTML encodings, meaning they will then not be processed as standard HTML. To fix our previous example using this method:
<?php
echo '<div>' . htmlspecialchars($_GET['input']) . '</div>';
// or
echo '<div>' . filter_input(INPUT_GET, 'input', FILTER_SANITIZE_SPECIAL_CHARS) . '</div>';
Would output:
<div><script src="http://example.com/runme.js"></script></div>
Everything inside the <div> tag will not be interpreted as a JavaScript tag by the browser, but instead as a simple text node. The user will safely see:
<script src="http://example.com/runme.js"></script>
URL Encoding
When outputting a dynamically generated URL, PHP provides the urlencode function to safely output valid URLs. So, for example, if a user is able to input data that becomes part of another GET parameter:
<?php
$input = urlencode($_GET['input']);
// or
$input = filter_input(INPUT_GET, 'input', FILTER_SANITIZE_URL);
echo 'Link';
Any malicious input will be converted to an encoded URL parameter.
Using specialised external libraries or OWASP AntiSamy lists
Sometimes you will want to send HTML or other kind of code inputs. You will need to maintain a list of authorised words (white list) and un-authorized (blacklist).
You can download standard lists available at the OWASP AntiSamy website. Each list is fit for a specific kind of interaction (ebay api, tinyMCE, etc...). And it is open source.
There are libraries existing to filter HTML and prevent XSS attacks for the general case and performing at least as well as AntiSamy lists with very easy use.
For example you have HTML Purifier
In order of preference:
If you are using a templating engine (e.g. Twig, Smarty, Blade), check that it offers context-sensitive escaping. I know from experience that Twig does. {{ var|e('html_attr') }}
If you want to allow HTML, use HTML Purifier. Even if you think you only accept Markdown or ReStructuredText, you still want to purify the HTML these markup languages output.
Otherwise, use htmlentities($var, ENT_QUOTES | ENT_HTML5, $charset) and make sure the rest of your document uses the same character set as $charset. In most cases, 'UTF-8' is the desired character set.
Also, make sure you escape on output, not on input.
Many frameworks help handle XSS in various ways. When rolling your own or if there's some XSS concern, we can leverage filter_input_array (available in PHP 5 >= 5.2.0, PHP 7.)
I typically will add this snippet to my SessionController, because all calls go through there before any other controller interacts with the data. In this manner, all user input gets sanitized in 1 central location. If this is done at the beginning of a project or before your database is poisoned, you shouldn't have any issues at time of output...stops garbage in, garbage out.
/* Prevent XSS input */
$_GET = filter_input_array(INPUT_GET, FILTER_SANITIZE_STRING);
$_POST = filter_input_array(INPUT_POST, FILTER_SANITIZE_STRING);
/* I prefer not to use $_REQUEST...but for those who do: */
$_REQUEST = (array)$_POST + (array)$_GET + (array)$_REQUEST;
The above will remove ALL HTML & script tags. If you need a solution that allows safe tags, based on a whitelist, check out HTML Purifier.
If your database is already poisoned or you want to deal with XSS at time of output, OWASP recommends creating a custom wrapper function for echo, and using it EVERYWHERE you output user-supplied values:
//xss mitigation functions
function xssafe($data,$encoding='UTF-8')
{
return htmlspecialchars($data,ENT_QUOTES | ENT_HTML401,$encoding);
}
function xecho($data)
{
echo xssafe($data);
}
<?php
function xss_clean($data)
{
// Fix &entity\n;
$data = str_replace(array('&','<','>'), array('&','<','>'), $data);
$data = preg_replace('/(&#*\w+)[\x00-\x20]+;/u', '$1;', $data);
$data = preg_replace('/(&#x*[0-9A-F]+);*/iu', '$1;', $data);
$data = html_entity_decode($data, ENT_COMPAT, 'UTF-8');
// Remove any attribute starting with "on" or xmlns
$data = preg_replace('#(<[^>]+?[\x00-\x20"\'])(?:on|xmlns)[^>]*+>#iu', '$1>', $data);
// Remove javascript: and vbscript: protocols
$data = preg_replace('#([a-z]*)[\x00-\x20]*=[\x00-\x20]*([`\'"]*)[\x00-\x20]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2nojavascript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iu', '$1=$2novbscript...', $data);
$data = preg_replace('#([a-z]*)[\x00-\x20]*=([\'"]*)[\x00-\x20]*-moz-binding[\x00-\x20]*:#u', '$1=$2nomozbinding...', $data);
// Only works in IE: <span style="width: expression(alert('Ping!'));"></span>
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?expression[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?behaviour[\x00-\x20]*\([^>]*+>#i', '$1>', $data);
$data = preg_replace('#(<[^>]+?)style[\x00-\x20]*=[\x00-\x20]*[`\'"]*.*?s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:*[^>]*+>#iu', '$1>', $data);
// Remove namespaced elements (we do not need them)
$data = preg_replace('#</*\w+:\w[^>]*+>#i', '', $data);
do
{
// Remove really unwanted tags
$old_data = $data;
$data = preg_replace('#</*(?:applet|b(?:ase|gsound|link)|embed|frame(?:set)?|i(?:frame|layer)|l(?:ayer|ink)|meta|object|s(?:cript|tyle)|title|xml)[^>]*+>#i', '', $data);
}
while ($old_data !== $data);
// we are done...
return $data;
}
You are also able to set some XSS related HTTP response headers via header(...)
X-XSS-Protection "1; mode=block"
to be sure, the browser XSS protection mode is enabled.
Content-Security-Policy "default-src 'self'; ..."
to enable browser-side content security. See this one for Content Security Policy (CSP) details: http://content-security-policy.com/
Especially setting up CSP to block inline-scripts and external script sources is helpful against XSS.
for a general bunch of useful HTTP response headers concerning the security of you webapp, look at OWASP: https://www.owasp.org/index.php/List_of_useful_HTTP_headers
Use htmlspecialchars on PHP. On HTML try to avoid using:
element.innerHTML = “…”;
element.outerHTML = “…”;
document.write(…);
document.writeln(…);
where var is controlled by the user.
Also obviously try avoiding eval(var),
if you have to use any of them then try JS escaping them, HTML escape them and you might have to do some more but for the basics this should be enough.
The best way to protect your input it's use htmlentities function.
Example:
htmlentities($target, ENT_QUOTES, 'UTF-8');
You can get more information here.
Questions:
What are the best safe1(), safe2(), safe3(), and safe4() functions to avoid XSS for UTF8 encoded pages? Is it also safe in all browsers (specifically IE6)?
<body><?php echo safe1($xss)?></body>
<body id="<?php echo safe2($xss)?>"></body>
<script type="text/javascript">
var a = "<?php echo safe3($xss)?>";
</script>
<style type="text/css">
.myclass {width:<?php echo safe4($xss)?>}
</style>
.
Many people say the absolute best that can be done is:
// safe1 & safe2
$s = htmlentities($s, ENT_QUOTES, "UTF-8");
// But how would you compare the above to:
// https://github.com/shadowhand/purifier
// OR http://kohanaframework.org/3.0/guide/api/Security#xss_clean
// OR is there an even better if not perfect solution?
.
// safe3
$s = mb_convert_encoding($s, "UTF-8", "UTF-8");
$s = htmlentities($s, ENT_QUOTES, "UTF-8");
// How would you compare this to using using mysql_real_escape_string($s)?
// (Yes, I know this is a DB function)
// Some other people also recommend calling json_encode() before passing to htmlentities
// What's the best solution?
.
There are a hell of a lot of posts about PHP and XSS.
Most just say "use HTMLPurifier" or "use htmlspecialchars", or are wrong.
Others say use OWASP -- but it is EXTREMELY slow.
Some of the good posts I came across are listed below:
Do htmlspecialchars and mysql_real_escape_string keep my PHP code safe from injection?
XSS Me Warnings - real XSS issues?
CodeIgniter - why use xss_clean
safe2() is clearly htmlspecialchars()
In place of safe1() you should really be using HTMLPurifier to sanitize complete blobs of HTML. It strips unwanted attributes, tags and in particular anything javascriptish. Yes, it's slow, but it covers all the small edge cases (even for older IE versions) which allow for safe HTML user snippet reuse. But check out http://htmlpurifier.org/comparison for alternatives. -- If you really only want to display raw user text there (no filtered html), then htmlspecialchars(strip_tags($src)) would actually work fine.
safe3() screams regular expression. Here you can really only apply a whitelist to whatever you actually want:
var a = "<?php echo preg_replace('/[^-\w\d .,]/', "", $xss)?>";
You can of course use json_encode here to get a perfectly valid JS syntax and variable. But then you've just delayed the exploitability of that string into your JS code, where you then have to babysit it.
Is it also safe in all browsers (specifically IE6)?
If you specify the charset explicitly, then IE won't do its awful content detection magic, so UTF7 exploits can be ignored.
http://php.net/htmlentities note the section on the optional third parameter that takes a character encoding. You should use this instead of mv_convert_encoding. So long as the php file itself is saved with a utf8 encoding that should work.
htmlentities($s, ENT_COMPAT, 'UTF-8');
As for injecting the variable directly into javascript, you might consider putting the content into a hidden html element somewhere else in the page instead and pulling the content out of the dom when you need it.
The purifiers that you mention are used when you want to actually display html that a user submitted (as in, allow the browser to actually render). Using htmlentities will encode everything such that the characters will be displayed in the ui, but none of the actual code will be interpreted by the browser. Which are you aiming to do?
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I have PHP configured so that magic quotes are on and register globals are off.
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
I also occasionally seach my database for common things used in xss attached such as...
<script
What else should I be doing and how can I make sure that the things I am trying to do are always done.
Escaping input is not the best you can do for successful XSS prevention. Also output must be escaped. If you use Smarty template engine, you may use |escape:'htmlall' modifier to convert all sensitive characters to HTML entities (I use own |e modifier which is alias to the above).
My approach to input/output security is:
store user input not modified (no HTML escaping on input, only DB-aware escaping done via PDO prepared statements)
escape on output, depending on what output format you use (e.g. HTML and JSON need different escaping rules)
I'm of the opinion that one shouldn't escape anything during input, only on output. Since (most of the time) you can not assume that you know where that data is going. Example, if you have form that takes data that later on appears in an email that you send out, you need different escaping (otherwise a malicious user could rewrite your email-headers).
In other words, you can only escape at the very last moment the data is "leaving" your application:
List item
Write to XML file, escape for XML
Write to DB, escape (for that particular DBMS)
Write email, escape for emails
etc
To go short:
You don't know where your data is going
Data might actually end up in more than one place, needing different escaping mechanism's BUT NOT BOTH
Data escaped for the wrong target is really not nice. (E.g. get an email with the subject "Go to Tommy\'s bar".)
Esp #3 will occur if you escape data at the input layer (or you need to de-escape it again, etc).
PS: I'll second the advice for not using magic_quotes, those are pure evil!
There are a lot of ways to do XSS (See http://ha.ckers.org/xss.html) and it's very hard to catch.
I personally delegate this to the current framework I'm using (Code Igniter for example). While not perfect, it might catch more than my hand made routines ever do.
This is a great question.
First, don't escape text on input except to make it safe for storage (such as being put into a database). The reason for this is you want to keep what was input so you can contextually present it in different ways and places. Making changes here can compromise your later presentation.
When you go to present your data filter out what shouldn't be there. For example, if there isn't a reason for javascript to be there search for it and remove it. An easy way to do that is to use the strip_tags function and only present the html tags you are allowing.
Next, take what you have and pass it thought htmlentities or htmlspecialchars to change what's there to ascii characters. Do this based on context and what you want to get out.
I'd, also, suggest turning off Magic Quotes. It is has been removed from PHP 6 and is considered bad practice to use it. Details at http://us3.php.net/magic_quotes
For more details check out http://ha.ckers.org/xss.html
This isn't a complete answer but, hopefully enough to help you get started.
rikh Writes:
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
See Joel's essay on Making Code Look Wrong for help with this
Template library. Or at least, that is what template libraries should do.
To prevent XSS all output should be encoded. This is not the task of the main application / control logic, it should solely be handled by the output methods.
If you sprinkle htmlentities() thorughout your code, the overall design is wrong. And as you suggest, you might miss one or two spots.
That's why the only solution is rigorous html encoding -> when output vars get written into a html/xml stream.
Unfortunately, most php template libraries only add their own template syntax, but don't concern themselves with output encoding, or localization, or html validation, or anything important. Maybe someone else knows a proper template library for php?
I rely on PHPTAL for that.
Unlike Smarty and plain PHP, it escapes all output by default. This is a big win for security, because your site won't become vurnelable if you forget htmlspecialchars() or |escape somewhere.
XSS is HTML-specific attack, so HTML output is the right place to prevent it. You should not try pre-filtering data in the database, because you could need to output data to another medium which doesn't accept HTML, but has its own risks.
Escaping all user input is enough for most sites. Also make sure that session IDs don't end up in the URL so they can't be stolen from the Referer link to another site. Additionally, if you allow your users to submit links, make sure no javascript: protocol links are allowed; these would execute a script as soon as the user clicks on the link.
If you are concerned about XSS attacks, encoding your output strings to HTML is the solution. If you remember to encode every single output character to HTML format, there is no way to execute a successful XSS attack.
Read more:
Sanitizing user data: How and where to do it
Personally, I would disable magic_quotes. In PHP5+ it is disabled by default and it is better to code as if it is not there at all as it does not escape everything and it will be removed from PHP6.
Next, depending on what type of user data you are filtering will dictate what to do next e.g. if it is just text e.g. a name, then strip_tags(trim(stripslashes())); it or to check for ranges use regular expressions.
If you expect a certain range of values, create an array of the valid values and only allow those values through (in_array($userData, array(...))).
If you are checking numbers use is_numeric to enforce whole numbers or cast to a specific type, that should prevent people trying to send strings in stead.
If you have PHP5.2+ then consider looking at filter() and making use of that extension which can filter various data types including email addresses. Documentation is not particularly good, but is improving.
If you have to handle HTML then you should consider something like PHP Input Filter or HTML Purifier. HTML Purifier will also validate HTML for conformance. I am not sure if Input Filter is still being developed. Both will allow you to define a set of tags that can be used and what attributes are allowed.
Whatever you decide upon, always remember, never ever trust anything coming into your PHP script from a user (including yourself!).
All of these answers are great, but fundamentally, the solution to XSS will be to stop generating HTML documents by string manipulation.
Filtering input is always a good idea for any application.
Escaping your output using htmlentities() and friends should work as long as it's used properly, but this is the HTML equivalent of creating a SQL query by concatenating strings with mysql_real_escape_string($var) - it should work, but fewer things can validate your work, so to speak, compared to an approach like using parameterized queries.
The long-term solution should be for applications to construct the page internally, perhaps using a standard interface like the DOM, and then to use a library (like libxml) to handle the serialization to XHTML/HTML/etc. Of course, we're a long ways away from that being popular and fast enough, but in the meantime we have to build our HTML documents via string operations, and that's inherently more risky.
“Magic quotes” is a palliative remedy for some of the worst XSS flaws which works by escaping everything on input, something that's wrong by design. The only case where one would want to use it is when you absolutely must use an existing PHP application known to be written carelessly with regard to XSS. (In this case you're in a serious trouble even with “magic quotes”.) When developing your own application, you should disable “magic quotes” and follow XSS-safe practices instead.
XSS, a cross-site scripting vulnerability, occurs when an application includes strings from external sources (user input, fetched from other websites, etc) in its [X]HTML, CSS, ECMAscript or other browser-parsed output without proper escaping, hoping that special characters like less-than (in [X]HTML), single or double quotes (ECMAscript) will never appear. The proper solution to it is to always escape strings according to the rules of the output language: using entities in [X]HTML, backslashes in ECMAscript etc.
Because it can be hard to keep track of what is untrusted and has to be escaped, it's a good idea to always escape everything that is a “text string” as opposed to “text with markup” in a language like HTML. Some programming environments make it easier by introducing several incompatible string types: “string” (normal text), “HTML string” (HTML markup) and so on. That way, a direct implicit conversion from “string” to “HTML string” would be impossible, and the only way a string could become HTML markup is by passing it through an escaping function.
“Register globals”, though disabling it is definitely a good idea, deals with a problem entirely different from XSS.
I find that using this function helps to strip out a lot of possible xss attacks:
<?php
function h($string, $esc_type = 'htmlall')
{
switch ($esc_type) {
case 'css':
$string = str_replace(array('<', '>', '\\'), array('<', '>', '/'), $string);
// get rid of various versions of javascript
$string = preg_replace(
'/j\s*[\\\]*\s*a\s*[\\\]*\s*v\s*[\\\]*\s*a\s*[\\\]*\s*s\s*[\\\]*\s*c\s*[\\\]*\s*r\s*[\\\]*\s*i\s*[\\\]*\s*p\s*[\\\]*\s*t\s*[\\\]*\s*:/i',
'blocked', $string);
$string = preg_replace(
'/#\s*[\\\]*\s*i\s*[\\\]*\s*m\s*[\\\]*\s*p\s*[\\\]*\s*o\s*[\\\]*\s*r\s*[\\\]*\s*t/i',
'blocked', $string);
$string = preg_replace(
'/e\s*[\\\]*\s*x\s*[\\\]*\s*p\s*[\\\]*\s*r\s*[\\\]*\s*e\s*[\\\]*\s*s\s*[\\\]*\s*s\s*[\\\]*\s*i\s*[\\\]*\s*o\s*[\\\]*\s*n\s*[\\\]*\s*/i',
'blocked', $string);
$string = preg_replace('/b\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*d\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*g:/i', 'blocked', $string);
return $string;
case 'html':
//return htmlspecialchars($string, ENT_NOQUOTES);
return str_replace(array('<', '>'), array('<' , '>'), $string);
case 'htmlall':
return htmlentities($string, ENT_QUOTES);
case 'url':
return rawurlencode($string);
case 'query':
return urlencode($string);
case 'quotes':
// escape unescaped single quotes
return preg_replace("%(?<!\\\\)'%", "\\'", $string);
case 'hex':
// escape every character into hex
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '%' . bin2hex($string[$x]);
}
return $s_return;
case 'hexentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#x' . bin2hex($string[$x]) . ';';
}
return $s_return;
case 'decentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#' . ord($string[$x]) . ';';
}
return $s_return;
case 'javascript':
// escape quotes and backslashes, newlines, etc.
return strtr($string, array('\\'=>'\\\\',"'"=>"\\'",'"'=>'\\"',"\r"=>'\\r',"\n"=>'\\n','</'=>'<\/'));
case 'mail':
// safe way to display e-mail address on a web page
return str_replace(array('#', '.'),array(' [AT] ', ' [DOT] '), $string);
case 'nonstd':
// escape non-standard chars, such as ms document quotes
$_res = '';
for($_i = 0, $_len = strlen($string); $_i < $_len; $_i++) {
$_ord = ord($string{$_i});
// non-standard char, escape it
if($_ord >= 126){
$_res .= '&#' . $_ord . ';';
} else {
$_res .= $string{$_i};
}
}
return $_res;
default:
return $string;
}
}
?>
Source
Make you any session cookies (or all cookies) you use HttpOnly. Most browsers will hide the cookie value from JavaScript in that case. User could still manually copy cookies, but this helps prevent direct script access. StackOverflow had this problem durning beta.
This isn't a solution, just another brick in the wall
Don't trust user input
Escape all free-text output
Don't use magic_quotes; see if there's a DBMS-specfic variant, or use PDO
Consider using HTTP-only cookies where possible to avoid any malicious script being able to hijack a session
You should at least validate all data going into the database. And try to validate all data leaving the database too.
mysql_real_escape_string is good to prevent SQL injection, but XSS is trickier.
You should preg_match, stip_tags, or htmlentities where possible!
The best current method for preventing XSS in a PHP application is HTML Purifier (http://htmlpurifier.org/). One minor drawback to it is that it's a rather large library and is best used with an op code cache like APC. You would use this in any place where untrusted content is being outputted to the screen. It is much more thorough that htmlentities, htmlspecialchars, filter_input, filter_var, strip_tags, etc.
Use an existing user-input sanitization library to clean all user-input. Unless you put a lot of effort into it, implementing it yourself will never work as well.
I find the best way is using a class that allows you to bind your code so you never have to worry about manually escaping your data.
It is difficult to implement a thorough sql injection/xss injection prevention on a site that doesn't cause false alarms. In a CMS the end user might want to use <script> or <object> that links to items from another site.
I recommend having all users install FireFox with NoScript ;-)