Only run HTML (PHP) - php

I have a question about a project, and I'll try to explain it as best I can:
I have a text area in which the user can write whatever they want.
The problem is that they can try to submit malicious code (JS XSS, for example).
I was using the function:
echo htmlspecialchars($topic->getMessage(), ENT_QUOTES, 'UTF-8');
I thought I had solved the problem, but then I remembered that the user is allowed to type HTML.
Is there an existing function that lets the allowed HTML run while everything else stays as text?

As per the PHP manual, htmlspecialchars performs the following translations:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
"'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
Your HTML actually does get translated into safe characters.
After reading your question again (it's not very clear), I think you may want the HTML tags to actually stay as HTML tags, meaning <b>bold</b> wouldn't get translated into &lt;b&gt;bold&lt;/b&gt;.
To do so, you may want to use str_replace after htmlspecialchars:
$result = htmlspecialchars($topic->getMessage(), ENT_QUOTES, 'UTF-8');
$result = str_replace(array("&lt;","&gt;"), array("<",">"), $result);
echo $result;
Or you could just translate &, ' (single quote) and " (double quote) via str_replace:
echo str_replace(array("&", "\"", "'"), array("&amp;", "&quot;", "&#039;"), $topic->getMessage());
Possibilities are endless.
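Keep in mind that turning every &lt; and &gt; back into < and > also brings back <script> and any tag carrying an event handler. A narrower variant is sketched below: escape everything, then restore only a short allow-list of attribute-less tags. The helper name escapeWithAllowedTags and the tag list are assumptions made for illustration, not something from the question.

```php
<?php
// Hypothetical helper: escape everything, then restore a small allow-list
// of attribute-less tags. The default tag list is an assumption.
function escapeWithAllowedTags(string $input, array $allowed = ['b', 'i', 'em', 'strong']): string
{
    $escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
    foreach ($allowed as $tag) {
        // Only the exact forms <tag> and </tag> are restored; anything with
        // attributes (onclick=..., href=...) stays escaped as text.
        $escaped = str_replace(
            ["&lt;{$tag}&gt;", "&lt;/{$tag}&gt;"],
            ["<{$tag}>", "</{$tag}>"],
            $escaped
        );
    }
    return $escaped;
}

echo escapeWithAllowedTags('<b>bold</b> <script>alert(1)</script>');
// prints: <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt;
```

This does not check that tags are balanced, so for anything beyond a handful of formatting tags a dedicated library such as HTML Purifier is probably the better fit.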

htmlspecialchars is fine for output, but it does not make data safe to insert into MySQL.
For MySQL it's better to use prepared statements, as explained here:
http://bobby-tables.com/php.html
For output on the page (without inserting into the database), htmlspecialchars is enough, provided you don't decode the entities before printing.
As CBroe suggested, you could use http://htmlpurifier.org/ to clean the HTML and avoid garbage in your database, but you still must use prepared statements.
Also read: http://php.net/manual/en/pdo.prepared-statements.php
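A minimal sketch of that split, storing the raw text through a prepared statement and escaping only when printing. It assumes the pdo_sqlite driver is available and uses an in-memory SQLite database in place of MySQL so it runs on its own; the table and column names are hypothetical.

```php
<?php
// Assumption: an in-memory SQLite database stands in for MySQL so the
// sketch is self-contained; table and column names are hypothetical.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE topics (id INTEGER PRIMARY KEY, message TEXT)');

$message = "<b>Hi</b> it's <script>alert(1)</script>";

// Store the raw text; the placeholder keeps it out of the SQL syntax.
$insert = $pdo->prepare('INSERT INTO topics (message) VALUES (:message)');
$insert->execute([':message' => $message]);

// Escape only at output time, for the HTML context.
$stored = $pdo->query('SELECT message FROM topics')->fetchColumn();
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');
```

With MySQL only the DSN passed to the PDO constructor changes; the prepare/execute calls stay the same.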

Related

PHP's htmlspecialchars not working on single quotes

I'm trying to convert a single quote into its relevant HTML code for database insertion, but it does not appear to be working. When I create the following script:
<?php
$str = "& and ' and \" and < and >";
echo htmlspecialchars($str);
?>
My browser returns the following:
&amp; and ' and &quot; and &lt; and &gt;
What am I doing wrong? I've read the PHP manual on htmlspecialchars() function and it says it applies to single quotes, but it doesn't seem to be working for me.
Use htmlentities() with the flag ENT_QUOTES. From the manual:
ENT_QUOTES Will convert both double and single quotes.
htmlentities($text, ENT_QUOTES);
If you just want to replace ' with &#039; you could use str_replace(), of course:
str_replace("'", "&#039;", $text);
However, since you want to insert the data into SQL code, please look into prepared statements in PDO or MySQLi. These functions serve the exact purpose you need (from what I can tell) and will be better than your own function. After all, why reinvent the wheel?
Just for the record, be sure not to use the deprecated mysql_* functions in PHP, as explained in "Why shouldn't I use mysql_* functions in PHP?".
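To see the difference between the flags, here is a small sketch that passes them explicitly. Before PHP 8.1 the default was ENT_COMPAT, which leaves single quotes alone; from 8.1 on the default includes ENT_QUOTES, so being explicit avoids depending on the version.

```php
<?php
$str = "& and ' and \" and < and >";

// ENT_COMPAT escapes double quotes only (the default before PHP 8.1).
$compat = htmlspecialchars($str, ENT_COMPAT, 'UTF-8');

// ENT_QUOTES escapes single quotes as well.
$quotes = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

echo $compat, "\n"; // &amp; and ' and &quot; and &lt; and &gt;
echo $quotes, "\n"; // &amp; and &#039; and &quot; and &lt; and &gt;
```

View the page source rather than the rendered page to check this, since the browser displays both lines identically.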

CKEditor is escaping html elements

I am using CKEditor to insert text into a MySQL database. I have noticed that my installed CKEditor is escaping all HTML elements when the data reaches the database.
Therefore the following is what I am getting in the database after I have inserted the text with CKEditor:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;'
"'" (single quote) becomes '&#039;'
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
I would rather disable the CKEditor HTML escaping completely, and rely on my PHP script to handle the HTML escaping using PHP's htmlspecialchars.
Another good reason for me to disable CKEditor's HTML escaping ability is that I want to preserve the written content in the MySQL database. In other words I want to keep the single quotes and double quotes in the database, and then I want to have PHP sanitise the HTML elements with htmlspecialchars when I print the database data to page using MySQL select statement.
Can anybody tell me how to disable html escaping within CKeditor? Your input or any advice on the above would be much appreciated.
Here you go:
config.entities
config.basicEntities

Mitigate xss attacks when building links

I posted this question a while back and it is working great for finding and 'linkifying' links from user generated posts.
Linkify Regex Function PHP Daring Fireball Method
<?php
if (!function_exists("html")) {
function html($string){
return htmlspecialchars($string, ENT_QUOTES, 'UTF-8');
}
}
if ( false === function_exists('linkify') ):
function linkify($str) {
$pattern = '(?xi)\b((?:(http)s?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace_callback("#$pattern#i", function($matches) {
$input = $matches[0];
$url = $matches[2] == 'http' ? $input : "http://$input";
return '<a href="' . $url . '">' . "$input</a>";
}, $str);
}
endif;
echo "<div>" . linkify(html($row_rsgetpost['userinput'])) . "</div>";
?>
I am concerned that I may be introducing a security risk by inserting user generated content into a link. I am already escaping user content coming from my database with htmlspecialchars($string, ENT_QUOTES, 'UTF-8') before running it through the linkify function and echoing back to the page, but I've read on OWASP that link attributes need to be treated specially to mitigate XSS. I am thinking this function is ok since it places the user-generated content inside double quotes and has already been escaped with htmlspecialchars($string, ENT_QUOTES, 'UTF-8'), but would really appreciate someone with xss expertise to confirm this. Thanks!
First off, data must NEVER be HTML-escaped before entering the database; this is a very serious mistake. It is not only insecure, it also breaks functionality. Changing the values of strings is data corruption and affects string comparison. The approach is insecure because XSS is an output problem: when you are inserting data into the database, you do not know where it will appear on the page. For instance, even if you use this function, the following code is still vulnerable to XSS:
<a href="javascript:alert(1)" \>
As for your regular expression, my initial reaction was that this is a horrible idea. There are no comments on how it's supposed to work, and it makes heavy use of NOT operators; blacklisting is always worse than whitelisting.
So I loaded up Regex Buddy and in about 3 minutes I bypassed your regex with this input:
https://test.com/test'onclick='alert(1);//
No developer wants to write a vulnerability; vulnerabilities are caused by a gap between how the programmer thinks the application works and how it actually works. In this case I would assume you never tested this regex, and it's a gross oversimplification of the problem.
HTMLPurifier is a PHP library designed to clean HTML; it consists of THOUSANDS of regular expressions. It's very slow, and it is bypassed on a fairly regular basis. So if you go this route, make sure to update regularly.
In terms of fixing this flaw, I think you're best off using htmlspecialchars($string, ENT_QUOTES, 'UTF-8') and then enforcing that the string starts with 'http'. HTML encoding is a form of escaping, and the browser automatically decodes the value, so the URL is left unmolested.
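A rough sketch of that suggestion: check the scheme first, then HTML-escape the URL once for the attribute. The helper name safeLink is hypothetical, and it assumes it receives the raw (not yet escaped) URL; if the text was already escaped upstream, escaping again here would double-encode it.

```php
<?php
// Hypothetical helper: returns an anchor for http(s) URLs and plain
// escaped text for everything else. Expects a raw, unescaped URL.
function safeLink(string $rawUrl): string
{
    // parse_url returns null or false when there is no usable scheme;
    // the cast turns both into an empty string.
    $scheme = strtolower((string) parse_url($rawUrl, PHP_URL_SCHEME));
    $text = htmlspecialchars($rawUrl, ENT_QUOTES, 'UTF-8');

    if ($scheme !== 'http' && $scheme !== 'https') {
        return $text; // javascript:, data:, etc. are never turned into links
    }
    return '<a href="' . $text . '">' . $text . '</a>';
}

echo safeLink("https://test.com/test'onclick='alert(1);//"), "\n";
echo safeLink('javascript:alert(1)'), "\n";
```

With ENT_QUOTES the single quotes in the first input become &#039;, so they cannot close the attribute, and the second input is printed as text rather than linked.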
Because the data is going into an attribute, it should be url (or percent) encoded:
return '<a href="' . urlencode($url) . '">' . "$input</a>";
Technically it should also then be HTML encoded:
return '<a href="' . htmlspecialchars(urlencode($url)) . '">' . "$input</a>";
But no browsers I know of care, and consequently no one does it. It also sounds like you might be doing this step already, and you don't want to do it twice.
Your regular expression is looking for URLs that are http or https. That expression seems to be relatively safe, in that it does not detect anything that is not a URL.
The XSS vulnerability comes from how the URL is escaped as an HTML attribute value. That means making sure the URL cannot prematurely terminate the attribute string and then add extra attributes to the HTML tag, as @Rook has been mentioning.
So I cannot really think of a way an XSS attack could be performed against the following code, which is what @tobyodavies suggested but without urlencode, which does something else:
$pattern = '(?xi)\b((?:(http)s?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))';
return preg_replace_callback("#$pattern#i", function($matches) {
$input = $matches[0];
$url = $matches[2] == 'http' ? $input : "http://$input";
return '<a href="' . htmlspecialchars($url) . '">' . "$input</a>";
}, $str);
Note that I have also added a small shortcut for checking the http prefix.
Now the anchor links that you generate are safe.
However you should also sanitize the rest of the text. I suppose that you don't want to allow any html at all and display all the html as clear text.
Firstly, as the PHP documentation states, htmlspecialchars only escapes:
'&' (ampersand) becomes '&amp;'
'"' (double quote) becomes '&quot;' when ENT_NOQUOTES is not set.
"'" (single quote) becomes '&#039;' (or &apos;) only when ENT_QUOTES is set.
'<' (less than) becomes '&lt;'
'>' (greater than) becomes '&gt;'
javascript: is still used in regular programming, so why : isn't escaped is beyond me.
Secondly, if !html only expects the characters you think will be entered, not the other representations of those characters that can be entered and are treated as valid. UTF-8, like every other character set, supports multiple representations of the same character. Also, your false statement allows 0-9 and a-z, so you still have to worry about base64 characters. I'd call your code a good attempt, but it needs a ton of refining. Either that, or you could just use HTML Purifier, which people can still bypass. I do think it is awesome that you set the character set in htmlspecialchars, since most programmers don't understand why they should do that.

php - Clean user input using preg_replace_callback and ord()?

I have a forum-style text box and I would like to sanitize the user input to stop potential XSS and code insertion. I have seen htmlentities used, but then others have said that the &, #, % and : characters need to be encoded as well, and it seems the more I look, the more potentially dangerous characters pop up. Whitelisting is problematic as there are many valid text options beyond ^a-zA-Z0-9. I have come up with this code. Will it work to stop attacks and be secure? Is there any reason not to use it, or a better way?
function replaceHTML ($match) {
return "&#" . ord ($match[0]) . ";";
}
$clean = preg_replace_callback ( "/[^ a-zA-Z0-9]/", "replaceHTML", $userInput );
EDIT:_____________________________
I could of course be wrong, but it is my understanding that htmlentities only replaces & < > " (and ' if ENT_QUOTES is turned on). This is probably enough to stop most attacks (and frankly probably more than enough for my low-traffic site). In my obsessive attention to detail, however, I dug further. A book I have warns to also encode # and % for "shutting down hex attacks". Two websites I found warned against allowing : and --. It's all rather confusing to me, and it led me to explore converting all non-alphanumeric characters. If htmlentities does this already then great, but it does not seem to. Here are the results from code I ran, copied after clicking View Source in Firefox.
original (random characters to test):
<b>5:</b>gjla<hi>#''*&$!j-l:4
preg_replace_callback:
&#60;b&#62;5&#58;&#60;&#47;b&#62;gjla&#60;hi&#62;&#35;&#39;&#39;&#42;&#38;&#36;&#33;j&#45;l&#58;4
htmlentities (w/ ENT_QUOTES):
&lt;b&gt;5:&lt;/b&gt;gjla&lt;hi&gt;#&#039;&#039;*&amp;$!j-l:4
htmlentities appears to not be encoding those other characters like :
Sorry for the wall of text. Is this just me being paranoid?
EDIT #2: ___________
All you need to do to stop XSS attacks is use htmlspecialchars().
That is exactly what htmlentities does already:
http://codepad.viper-7.com/NDZMa3
It will convert "&amp;" to "&amp;amp;".
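The space ' ' can be changed to \s in your regex, and adding /i at the end makes it case-insensitive. You also don't need to translate your characters to sequences manually; it can be done with a callback that calls htmlentities on the matched text (the callback receives the match array, so htmlentities cannot be passed by name directly):
$clean = preg_replace_callback('/[^a-z0-9\s]/i', function ($m) { return htmlentities($m[0], ENT_QUOTES, 'UTF-8'); }, $userInput);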
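One caveat about the ord() callback from the question: ord() only looks at a single byte, so a multi-byte UTF-8 character gets split into several wrong entities. If you really want numeric entities for everything non-alphanumeric, a sketch along these lines handles that. It assumes the mbstring extension (mb_ord needs PHP 7.2 or later) and valid UTF-8 input, and the function name encodeNonAlnum is made up for the example.

```php
<?php
// Hypothetical helper: numeric entities for every character that is not
// a space, an ASCII letter, or a digit. Assumes valid UTF-8 input.
function encodeNonAlnum(string $input): string
{
    // The /u modifier makes the pattern match whole code points, and
    // mb_ord returns the code point rather than the first byte.
    return preg_replace_callback(
        '/[^ a-zA-Z0-9]/u',
        function (array $m): string {
            return '&#' . mb_ord($m[0], 'UTF-8') . ';';
        },
        $input
    );
}

echo encodeNonAlnum("caf\u{E9} <b>");
// prints: caf&#233; &#60;b&#62;
```

For plain XSS protection in HTML text and quoted attributes this is more than is needed; htmlspecialchars with ENT_QUOTES already covers the characters that matter there.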

Apostrophe issue

I have built a search engine using php and mysql.
Problem:
When I submit a word with an apostrophe in it and return the value to the text field using $_GET the apostrophe has been replaced with a backslash and all characters after the apostrophe are missing.
Example:
Submitted Words: Just can't get enough
Returned Value (Using $_GET): Just can\
Also the URL comes up like this: search=just+can%27t+get+enough
As you can see the ' has been replaced with a \ and get enough is missing.
Question:
Does anybody know what causes this to happen and what is the solution to fix this problem?
The code:
http://tinypaste.com/11d62
If you're running a PHP version older than 5.4.0, the slash might be added by Magic Quotes (deprecated in 5.3.0 and removed in 5.4.0), which you can turn off in the .ini file.
From your description of "value to the text field" I speculate you have some output code like this:
Redisplay
<input value='<?=$_GET['search']?>'>
In that case the contained single quote will terminate the html attribute. And anything behind the single quote is simply garbage to the browser. In this case applying htmlspecialchars to the output helps.
(The backslash is likely due to magic_quotes or mysql_*_escape before outputting the text. I doubt the question describes a database error here.)
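A small sketch of the escaped version of that input field. The hard-coded string stands in for $_GET['search'] so the snippet runs on its own.

```php
<?php
// Simulated request value; in the question this comes from $_GET['search'].
$search = "Just can't get enough";

// ENT_QUOTES turns ' into &#039;, so it cannot close the value='...' attribute.
$safe = htmlspecialchars($search, ENT_QUOTES, 'UTF-8');

echo "<input value='" . $safe . "'>";
// prints: <input value='Just can&#039;t get enough'>
```

The browser decodes &#039; back to an apostrophe when it displays the field, so the user sees the original text.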
Update: It seems it's indeed an output problem here:
echo "<a href='searchmusic.php?search=$search&s=$next'>Next</a>";
Regardless of if you use single or double quotes you would need:
echo "<a href='searchmusic.php?search="
. htmlspecialchars(stripslashes($search))
. "&s=$next'>Next</a>";
(Notice that using stripslashes is a workaround here. You should preserve the original search text, or disable the magic_quotes rather.)
Okay I forgot something crucial. htmlspecialchars needs the ENT_QUOTES parameter - always, and in your case particularly:
// prepare for later output:
$search = $_GET['search'];
$html_search = htmlspecialchars(stripslashes($search), ENT_QUOTES);
And then use that wherever you wanted to display $search before:
echo "<a href='searchmusic.php?search=$html_search&s=$next'>Next</a>";
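Since the value ends up inside a URL, it arguably wants percent-encoding as well as HTML escaping. A sketch of one way to do both, with http_build_query for the query string and a single htmlspecialchars call on the finished URL; the sample values for $search and $next are placeholders.

```php
<?php
$search = "Just can't get enough"; // stands in for $_GET['search']
$next = 10;                         // hypothetical paging offset

// http_build_query percent-encodes each value; with the default
// encoding type a space becomes + and ' becomes %27.
$query = http_build_query(['search' => $search, 's' => $next], '', '&');

// The finished URL is then HTML-escaped once for the attribute context.
$href = htmlspecialchars('searchmusic.php?' . $query, ENT_QUOTES, 'UTF-8');

echo "<a href='" . $href . "'>Next</a>";
// prints: <a href='searchmusic.php?search=Just+can%27t+get+enough&amp;s=10'>Next</a>
```

The &amp; in the output is what the HTML source should contain; the browser turns it back into & when following the link.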
Single quotes are important in PHP and MySQL.
A single quote is a delimiter for a string in PHP, for example:
$str = 'my string';
If you want to include a literal quote inside a string you must tell PHP that the quote is not the end of the string. It is escaped with the backslash, for example:
$str = 'my string with a quote \' inside it';
See PHP Strings for more on this.
MySQL operates in a similar way. An example query might be:
$username = 'andyb';
$query = "SELECT * FROM users WHERE user_name = '$username'";
The single quote delimits the string parameter. If the $username included a single quote, this would cause the query to end prematurely. Correctly escaping parameters is an important concept to be familiar with as it is one attack vector for breaking into a database - see SQL Injection for more information.
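One way to handle this escaping was mysql_real_escape_string(), but the mysql_* extension was removed in PHP 7.0; prepared statements in PDO or MySQLi are the current way to pass parameters safely.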
