Should I use ENT_QUOTES with htmlspecialchars or not

Should I use ENT_QUOTES with htmlspecialchars or not - php

I am using php 5.4.4 running as UTF-8, and im not sure if I am using htmlspecialchars right.
My strings / vars look like this:
$text = "<p><span class='clx'>By:</span> ".htmlspecialchars($foo)."</span></p>";
echo $text;
Do I have need to use ENT_QUOTES or is that only necessary when I have to echo something
inside eg: href="$foo" or id='$foo' ?
Atm, om only using htmlspecialchars inside closed html tags and not attributes.
Just concatenate the var inside the string within a <p> tag and a </p> tag
Thanks

You should generally use it when taking data from database and inserting it into html elements. This is so that quotes from the data don't close the value quotes and mess up the html.

Short answer: Always.
Long answer: I highly recommend reading OWASP's PHP Top 5 and PHP Security Cheat Sheet
The OWASP Top 5 covers the five most serious and common security vulnerabilities afflicting PHP today:
Remote Code Execution
Cross-Site Scripting (XSS) [The one htmlspecialchars is trying to prevent]
SQL Injection
PHP Configuration
File System Attacks
It demonstrates common mistakes and solutions and is well worth reading for any PHP developer even considering hosting a live website.

Related

Writing HTML / PHP Code inside an input/Textarea

I have found out that if a user writes in an input php/HTML code the code will excecute in my admin panel. Can this damage my system ? And if yes how can I disable it?
I will appreciate any answers!

You can remove HTML and PHP tags with
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
result:
Test paragraph. Other text
<p>Test paragraph.</p> Other text
source: https://www.php.net/manual/pt_BR/function.strip-tags.php

It is a good practice to always filter data that comes from outside the application. So I mean every input that is given to the application. Particular attention must be given by the programmer to the way in which to execute the queries to the database. Since database queries can also be made using parameters that come from the user or more generally from outside the application.
Remove all HTML tags from the input before they are used to run queries or saved to the DATABASE.
https://www.php.net/manual/en/function.strip-tags.php
strip_tags ( string $string , array|string|null $allowed_tags = null )
Pay particular attention to the formatting of queries and input parameters before running database queries to avoid SQLinjection
an interesting article about it : https://www.ptsecurity.com/ww-en/analytics/knowledge-base/how-to-prevent-sql-injection-attacks/
Cyber Security is a very broad topic and I don't think it can be expressed here in just one answer.
Dealing with this topic requires more and more IT requirements such as, for example, knowing Programming
This answer is intended to be a starting point to deepen the subject

Yes, definitely it effect the project, not only html and PHP also SQL queries and JavaScript code also.
For prevention you have to validate the input using JavaScript or jQuery.

Best ways to sanitize user submitted content? [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
PHP: the ultimate clean/secure function
I am working on an experimental social networking site in PHP. So, there will be a lot of user submitted data sent to the database.
I had coded a custom block script a while back, that would just block certain characters or keywords from being submitted. This worked, but it had it's list of problems.
I heard addslashes and mysql_real_escape_string will do this, but I don't want to do anything until I get some solid advice.
I tried addslashes, and it will add slashes to can't, don't, etc. I don't want that.
I just want my database to be safe from xss, html, php, and javascript attacks. Any advice?

prepared statements from PDO
filter_var() functions
htmlspecialchars()
For people who don't know PHP or find documentation about functions:
prepared statements - will provide protection against SQL injections ( but not against extreme stupidity )
filter_var() - will let you make sure that data really us an URL or email address , etc.
htmlspecialchars() - converts characters like < , > and & into html entities, thus, protecting against XSS.
I really fail to see the need for explanation here.

You should HTML escape any content before outputting it back to the user. Then when it's ever outputted back it'll be safe. Use htmlspecialchars for PHP. See What are the best practices for avoiding xss attacks in a PHP site for more information and read OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet.

All good answers so far, I would just like to add that you should make sure that the input data comes in the desired encoding - you should also normalize the different types of new line feeds or strip control characters altogether, I end up using the following function a lot:
function Filter($string, $control = true)
{
$string = iconv('UTF-8', 'UTF-8//IGNORE', $string);
if ($control === true)
{
return preg_replace('~\p{C}+~u', '', $string);
}
return preg_replace(array('~\r[\n]?~', '~[^\P{C}\t\n]+~u'), array("\n", ''), $string);
}
It will remove all invalid UTF-8 data from the string and normalize new lines. All control chars (except tab (\t) and new lines (\n)) are striped, and if $control == true these are stripped too.
PS: This is not very useful from a security standpoint of view but is helps avoiding GIGO.

For HTML type input use HTMLPurifier or similar to filter out unwanted markup.
Validate form fields before storing the data
Use prepared statements with PDO or MySQLi when writing to the database. This will take care of the SQL escaping for you, provided you bind your parameters correctly.
Escape the output coming out of the DB before displaying it unless it can be considered safe.

A PHP Function that verify code language

I have a form with 2 textareas; the first one allows user to send HTML Code, the second allows to send CSS Code. I have to verify with a PHP function, if the language is correct.
If the language is correct, for security, i have to check that there is not PHP code or SQL Injection or whatever.
What do you think ? Is there a way to do that ?
Where can I find this kind of function ?
Is "HTML Purifier" http://htmlpurifier.org/ a good solution ?

If you have to validate the date to insert them in to database - then you just have to use mysql_real_escape_string() function before inserting them in to db.
//Safe database insertion
mysql_query("INSERT INTO table(column) VALUES(".mysql_real_escape_string($_POST['field']).")");
If you want to output the data to the end user as plain text - then you have to escape all html sensitive chars by htmlspecialchars(). If you want to output it as HTML, the you have to use HTML Purify tool.
//Safe plain text output
echo htmlspecialchars($data, ENT_QUOTES);
//Safe HTML output
$data = purifyHtml($data); //Or how it is spiecified in the purifier documentation
echo $data; //Safe html output

for something primitive you can use regex, BUT it should be noted using a parser to fully-exhaust all possibilities is recommended.
/(<\?(?:php)?(.*)\?>)/i
Example: http://regexr.com?2t3e5 (change the < in the expression back to a < and it will work (for some reason rexepr changes it to html formatting))
EDIT
/(<\?(?:php)?(.*)(?:\?>|$))/i
That's probably better so they can't place php at the end of the document (as PHP doesn't actually require a terminating character)

SHJS syntax highlighter for Javascript have files with regular expressions http://shjs.sourceforge.net/lang/ for languages that highlights — You can check how SHJS parse code.

HTMLPurifier is the recommended tool for cleaning up HTML. And as luck has it, it also incudes CSSTidy and can sanitize CSS as well.
... that there is not PHP code or SQL Injection or whatever.
You are basing your question on a wrong premise. While HTML can be cleaned, this is no safeguard against other exploitabilies. PHP "tags" are most likely to be filtered out. If you are doing something other weird (include-ing or eval-ing the content partially), that's no real help.
And SQL exploits can only be prevented by meticously using the proper database escape functions. There is no magic solution to that.

Yes. htmlpurifier is a good tool to remove malicious scripts and validate your HTML. Don't think it does CSS though. Apparently it works with CSS too. Thanks Briedis.

Ok thanks you all.
actually, i realize that I needed a human validation. Users can post HTML + CSS, I can verify in PHP that the langage & the syntax are correct, but it doesn't avoid people to post iframe, html redirection, or big black div that take all the screen.
:-)

How can I prevent an XSS attack that places a <script> element in the query string? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicate:
What are the best practices for avoid xss attacks in a PHP site
For example:
http://www.mid-day.com/searchresult.php?query=%3Cscript%3Ealert%28%22This%20is%20vulnerable%20for%20XSS%20..%20by%20binny%22%29%3C/script%3E
This website has an XSS DOM based vulnerability I want to know what causes this vulnerability and how to prevent it?

You should always HTML encode values that come from the user like querystring parameters before outputting them. You could use the htmlentities function.
Instead of:
$str = $_GET['a'];
echo $str;
use:
$str = $_GET['a'];
echo htmlentities($str);

You should never echo directly what is in a $_REQUEST, $_POST, $_GET, or $_SERVER variables.
If you have something like this:
index.php?text=hai
and you say:
echo $_GET['text']
It isn't safe.
If you put scripts in there or something:
index.php?text=<script>alert(document.cookie);</script>
It will display the document cookie. Not realy safe.
Same is with $_REQUEST and $_POST when you submit something. And with $_SERVER['THIS_URI']...
index.php?url=<script><!-- something --></script>
and you do
<form action="<?php echo $_SERVER['THIS_URI']; ?>"> then you get an XSS hack too.
Allways use htmlentities($string) and in MySQL queries mysql_real_escape_string($string).
Greetings.

You can go for HTML Purifier:
HTML Purifier is a standards-compliant
HTML filter library written in PHP.
HTML Purifier will not only remove all
malicious code (better known as
XSS) with a thoroughly audited,
secure yet permissive whitelist, it
will also make sure your documents are
standards compliant, something only
achievable with a comprehensive
knowledge of W3C's specifications.
Built-in PHP functions do not respond to all sorts of attacks, a reason why such open source solution was needed.
I would also suggest you to take a look at:
PHP Security Guide

What are the best practices for avoiding xss attacks in a PHP site [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 1 year ago.
Improve this question
I have PHP configured so that magic quotes are on and register globals are off.
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
I also occasionally seach my database for common things used in xss attached such as...
<script
What else should I be doing and how can I make sure that the things I am trying to do are always done.

Escaping input is not the best you can do for successful XSS prevention. Also output must be escaped. If you use Smarty template engine, you may use |escape:'htmlall' modifier to convert all sensitive characters to HTML entities (I use own |e modifier which is alias to the above).
My approach to input/output security is:
store user input not modified (no HTML escaping on input, only DB-aware escaping done via PDO prepared statements)
escape on output, depending on what output format you use (e.g. HTML and JSON need different escaping rules)

I'm of the opinion that one shouldn't escape anything during input, only on output. Since (most of the time) you can not assume that you know where that data is going. Example, if you have form that takes data that later on appears in an email that you send out, you need different escaping (otherwise a malicious user could rewrite your email-headers).
In other words, you can only escape at the very last moment the data is "leaving" your application:
List item
Write to XML file, escape for XML
Write to DB, escape (for that particular DBMS)
Write email, escape for emails
etc
To go short:
You don't know where your data is going
Data might actually end up in more than one place, needing different escaping mechanism's BUT NOT BOTH
Data escaped for the wrong target is really not nice. (E.g. get an email with the subject "Go to Tommy\'s bar".)
Esp #3 will occur if you escape data at the input layer (or you need to de-escape it again, etc).
PS: I'll second the advice for not using magic_quotes, those are pure evil!

There are a lot of ways to do XSS (See http://ha.ckers.org/xss.html) and it's very hard to catch.
I personally delegate this to the current framework I'm using (Code Igniter for example). While not perfect, it might catch more than my hand made routines ever do.

This is a great question.
First, don't escape text on input except to make it safe for storage (such as being put into a database). The reason for this is you want to keep what was input so you can contextually present it in different ways and places. Making changes here can compromise your later presentation.
When you go to present your data filter out what shouldn't be there. For example, if there isn't a reason for javascript to be there search for it and remove it. An easy way to do that is to use the strip_tags function and only present the html tags you are allowing.
Next, take what you have and pass it thought htmlentities or htmlspecialchars to change what's there to ascii characters. Do this based on context and what you want to get out.
I'd, also, suggest turning off Magic Quotes. It is has been removed from PHP 6 and is considered bad practice to use it. Details at http://us3.php.net/magic_quotes
For more details check out http://ha.ckers.org/xss.html
This isn't a complete answer but, hopefully enough to help you get started.

rikh Writes:
I do my best to always call htmlentities() for anything I am outputing that is derived from user input.
See Joel's essay on Making Code Look Wrong for help with this

Template library. Or at least, that is what template libraries should do.
To prevent XSS all output should be encoded. This is not the task of the main application / control logic, it should solely be handled by the output methods.
If you sprinkle htmlentities() thorughout your code, the overall design is wrong. And as you suggest, you might miss one or two spots.
That's why the only solution is rigorous html encoding -> when output vars get written into a html/xml stream.
Unfortunately, most php template libraries only add their own template syntax, but don't concern themselves with output encoding, or localization, or html validation, or anything important. Maybe someone else knows a proper template library for php?

I rely on PHPTAL for that.
Unlike Smarty and plain PHP, it escapes all output by default. This is a big win for security, because your site won't become vurnelable if you forget htmlspecialchars() or |escape somewhere.
XSS is HTML-specific attack, so HTML output is the right place to prevent it. You should not try pre-filtering data in the database, because you could need to output data to another medium which doesn't accept HTML, but has its own risks.

Escaping all user input is enough for most sites. Also make sure that session IDs don't end up in the URL so they can't be stolen from the Referer link to another site. Additionally, if you allow your users to submit links, make sure no javascript: protocol links are allowed; these would execute a script as soon as the user clicks on the link.

If you are concerned about XSS attacks, encoding your output strings to HTML is the solution. If you remember to encode every single output character to HTML format, there is no way to execute a successful XSS attack.
Read more:
Sanitizing user data: How and where to do it

Personally, I would disable magic_quotes. In PHP5+ it is disabled by default and it is better to code as if it is not there at all as it does not escape everything and it will be removed from PHP6.
Next, depending on what type of user data you are filtering will dictate what to do next e.g. if it is just text e.g. a name, then strip_tags(trim(stripslashes())); it or to check for ranges use regular expressions.
If you expect a certain range of values, create an array of the valid values and only allow those values through (in_array($userData, array(...))).
If you are checking numbers use is_numeric to enforce whole numbers or cast to a specific type, that should prevent people trying to send strings in stead.
If you have PHP5.2+ then consider looking at filter() and making use of that extension which can filter various data types including email addresses. Documentation is not particularly good, but is improving.
If you have to handle HTML then you should consider something like PHP Input Filter or HTML Purifier. HTML Purifier will also validate HTML for conformance. I am not sure if Input Filter is still being developed. Both will allow you to define a set of tags that can be used and what attributes are allowed.
Whatever you decide upon, always remember, never ever trust anything coming into your PHP script from a user (including yourself!).

All of these answers are great, but fundamentally, the solution to XSS will be to stop generating HTML documents by string manipulation.
Filtering input is always a good idea for any application.
Escaping your output using htmlentities() and friends should work as long as it's used properly, but this is the HTML equivalent of creating a SQL query by concatenating strings with mysql_real_escape_string($var) - it should work, but fewer things can validate your work, so to speak, compared to an approach like using parameterized queries.
The long-term solution should be for applications to construct the page internally, perhaps using a standard interface like the DOM, and then to use a library (like libxml) to handle the serialization to XHTML/HTML/etc. Of course, we're a long ways away from that being popular and fast enough, but in the meantime we have to build our HTML documents via string operations, and that's inherently more risky.

“Magic quotes” is a palliative remedy for some of the worst XSS flaws which works by escaping everything on input, something that's wrong by design. The only case where one would want to use it is when you absolutely must use an existing PHP application known to be written carelessly with regard to XSS. (In this case you're in a serious trouble even with “magic quotes”.) When developing your own application, you should disable “magic quotes” and follow XSS-safe practices instead.
XSS, a cross-site scripting vulnerability, occurs when an application includes strings from external sources (user input, fetched from other websites, etc) in its [X]HTML, CSS, ECMAscript or other browser-parsed output without proper escaping, hoping that special characters like less-than (in [X]HTML), single or double quotes (ECMAscript) will never appear. The proper solution to it is to always escape strings according to the rules of the output language: using entities in [X]HTML, backslashes in ECMAscript etc.
Because it can be hard to keep track of what is untrusted and has to be escaped, it's a good idea to always escape everything that is a “text string” as opposed to “text with markup” in a language like HTML. Some programming environments make it easier by introducing several incompatible string types: “string” (normal text), “HTML string” (HTML markup) and so on. That way, a direct implicit conversion from “string” to “HTML string” would be impossible, and the only way a string could become HTML markup is by passing it through an escaping function.
“Register globals”, though disabling it is definitely a good idea, deals with a problem entirely different from XSS.

I find that using this function helps to strip out a lot of possible xss attacks:
<?php
function h($string, $esc_type = 'htmlall')
{
switch ($esc_type) {
case 'css':
$string = str_replace(array('<', '>', '\\'), array('<', '>', '/'), $string);
// get rid of various versions of javascript
$string = preg_replace(
'/j\s*[\\\]*\s*a\s*[\\\]*\s*v\s*[\\\]*\s*a\s*[\\\]*\s*s\s*[\\\]*\s*c\s*[\\\]*\s*r\s*[\\\]*\s*i\s*[\\\]*\s*p\s*[\\\]*\s*t\s*[\\\]*\s*:/i',
'blocked', $string);
$string = preg_replace(
'/#\s*[\\\]*\s*i\s*[\\\]*\s*m\s*[\\\]*\s*p\s*[\\\]*\s*o\s*[\\\]*\s*r\s*[\\\]*\s*t/i',
'blocked', $string);
$string = preg_replace(
'/e\s*[\\\]*\s*x\s*[\\\]*\s*p\s*[\\\]*\s*r\s*[\\\]*\s*e\s*[\\\]*\s*s\s*[\\\]*\s*s\s*[\\\]*\s*i\s*[\\\]*\s*o\s*[\\\]*\s*n\s*[\\\]*\s*/i',
'blocked', $string);
$string = preg_replace('/b\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*d\s*[\\\]*\s*i\s*[\\\]*\s*n\s*[\\\]*\s*g:/i', 'blocked', $string);
return $string;
case 'html':
//return htmlspecialchars($string, ENT_NOQUOTES);
return str_replace(array('<', '>'), array('<' , '>'), $string);
case 'htmlall':
return htmlentities($string, ENT_QUOTES);
case 'url':
return rawurlencode($string);
case 'query':
return urlencode($string);
case 'quotes':
// escape unescaped single quotes
return preg_replace("%(?<!\\\\)'%", "\\'", $string);
case 'hex':
// escape every character into hex
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '%' . bin2hex($string[$x]);
}
return $s_return;
case 'hexentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#x' . bin2hex($string[$x]) . ';';
}
return $s_return;
case 'decentity':
$s_return = '';
for ($x=0; $x < strlen($string); $x++) {
$s_return .= '&#' . ord($string[$x]) . ';';
}
return $s_return;
case 'javascript':
// escape quotes and backslashes, newlines, etc.
return strtr($string, array('\\'=>'\\\\',"'"=>"\\'",'"'=>'\\"',"\r"=>'\\r',"\n"=>'\\n','</'=>'<\/'));
case 'mail':
// safe way to display e-mail address on a web page
return str_replace(array('#', '.'),array(' [AT] ', ' [DOT] '), $string);
case 'nonstd':
// escape non-standard chars, such as ms document quotes
$_res = '';
for($_i = 0, $_len = strlen($string); $_i < $_len; $_i++) {
$_ord = ord($string{$_i});
// non-standard char, escape it
if($_ord >= 126){
$_res .= '&#' . $_ord . ';';
} else {
$_res .= $string{$_i};
}
}
return $_res;
default:
return $string;
}
}
?>
Source

Make you any session cookies (or all cookies) you use HttpOnly. Most browsers will hide the cookie value from JavaScript in that case. User could still manually copy cookies, but this helps prevent direct script access. StackOverflow had this problem durning beta.
This isn't a solution, just another brick in the wall

Don't trust user input
Escape all free-text output
Don't use magic_quotes; see if there's a DBMS-specfic variant, or use PDO
Consider using HTTP-only cookies where possible to avoid any malicious script being able to hijack a session

You should at least validate all data going into the database. And try to validate all data leaving the database too.
mysql_real_escape_string is good to prevent SQL injection, but XSS is trickier.
You should preg_match, stip_tags, or htmlentities where possible!

The best current method for preventing XSS in a PHP application is HTML Purifier (http://htmlpurifier.org/). One minor drawback to it is that it's a rather large library and is best used with an op code cache like APC. You would use this in any place where untrusted content is being outputted to the screen. It is much more thorough that htmlentities, htmlspecialchars, filter_input, filter_var, strip_tags, etc.

Use an existing user-input sanitization library to clean all user-input. Unless you put a lot of effort into it, implementing it yourself will never work as well.

I find the best way is using a class that allows you to bind your code so you never have to worry about manually escaping your data.

It is difficult to implement a thorough sql injection/xss injection prevention on a site that doesn't cause false alarms. In a CMS the end user might want to use <script> or <object> that links to items from another site.
I recommend having all users install FireFox with NoScript ;-)

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.