How to retrieve original text after using htmlspecialchars() and htmlentities() - php

I have some text that I will be saving to my DB. Text may look something like this: Welcome & This is a test paragraph. When I save this text to my DB after processing it using htmlspecialchars() and htmlentities() in PHP, the sentence will look like this: Welcome &amp; This is a test paragraph.
When I retrieve and display the same text, I want it to be in the original format. How can I do that?
This is the code that I use:
$text= htmlspecialchars(htmlentities($_POST['text']));
$text= mysqli_real_escape_string($conn,$text);

There are two problems.
First, you are double-encoding HTML characters by using both htmlentities and htmlspecialchars. Both of those functions do the same thing, but htmlspecialchars only does it with a subset of characters that have HTML character entity equivalents (the special ones.) So with your example, the ampersand would be encoded twice (since it is a special character), so what you would actually get would be:
$example = 'Welcome & This is a test paragraph';
$example = htmlentities($example);
var_dump($example); // 'Welcome &amp; This is a test paragraph'
$example = htmlspecialchars($example);
var_dump($example); // 'Welcome &amp;amp; This is a test paragraph'
Decide which one of those functions you need to use (probably htmlspecialchars will be sufficient) and use only one of them.
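For rows that were already saved with both functions applied, here is a minimal sketch of getting the text back. It assumes the standard html_entity_decode() function, which the answer above does not mention, and uses the sample sentence from the question; each call undoes one layer of encoding.

```php
<?php
// What the question's code stores: the text is encoded twice.
$original = 'Welcome & This is a test paragraph';
$stored   = htmlspecialchars(htmlentities($original));

// Each html_entity_decode() call removes one layer of encoding.
$onceDecoded  = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');
$twiceDecoded = html_entity_decode($onceDecoded, ENT_QUOTES, 'UTF-8');

echo $stored, PHP_EOL;       // Welcome &amp;amp; This is a test paragraph
echo $onceDecoded, PHP_EOL;  // Welcome &amp; This is a test paragraph
echo $twiceDecoded, PHP_EOL; // Welcome & This is a test paragraph
```

This only repairs existing data; for new rows it is simpler not to encode before saving at all.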
Second, you are using these functions at the wrong time. htmlentities and htmlspecialchars will not do anything to "sanitize" your data for input into your database. (Not saying that's what you're intending, as you haven't mentioned this, but many people do seem to try to do this.) If you want to protect yourself from SQL injection, bind your values to prepared statements. Escaping it as you are currently doing with mysqli_real_escape_string is good, but it isn't really sufficient.
htmlspecialchars and htmlentities have specific purposes: to convert characters in strings that you are going to output into an HTML document. Just wait to use them until you are ready to do that.
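A rough sketch of that flow: store the raw text through a prepared statement, and encode only when writing HTML. As an assumption made purely so the example runs without a server, it uses an in-memory SQLite database through PDO instead of the question's mysqli connection; the table name is hypothetical, and the same pattern applies with mysqli's prepare() and bind_param().

```php
<?php
// Assumption: in-memory SQLite through PDO, used only so this runs anywhere.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

// Stand-in for $_POST['text'].
$text = 'Welcome & This is a test paragraph';

// Store the raw text; the placeholder keeps the data separate from the SQL.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (?)');
$insert->execute([$text]);

// Read it back unchanged, then encode only at the point of HTML output.
$body = $pdo->query('SELECT body FROM posts WHERE id = 1')->fetchColumn();
echo '<p>', htmlspecialchars($body, ENT_QUOTES, 'UTF-8'), '</p>', PHP_EOL;
```

The database holds the original sentence, so there is nothing to decode when it is retrieved.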

Related

Does escape order of strings matters in some way? (HTML and SQL injection)

We are developing a small website at school, so we were introduced to SQL injection in PHP and how to prevent it.
However I see most online examples and text books doing the following
$str = $_POST['user_input'];
$str = $mysqli->real_escape_string($str);
$str = htmlentities($str);
However I think the most logical way to validate input from users is:
$str = htmlentities($str);
$str = $mysqli->real_escape_string($str);
Does the order make any difference for:
Site accessibility (characters shown to the end user; the string will just be echoed once retrieved from the database)
Preventing HTML and SQL injection (does the wrong order allow injection?)
Performance of the server (for example, one function is more expensive while the other increases string length, or the final string length is different and we want to save bytes in our database)
It would also be nice to cover more escaping functions (maybe there is some dangerous combination or order that we should avoid).
I think the right approach is to look for escaping functions that output "dangerous characters" (dangerous as HTML or as SQL, if any exist) and then provide an input that generates those dangerous characters.
The order will result in different output.
The following code:
$string = 'Example " string';
echo htmlentities($mysqli->real_escape_string($string))
. "\n"
. $mysqli->real_escape_string(htmlentities($string));
Results in:
Example \&quot; string
Example &quot; string
The output is different because if you escape the string before converting to HTML entities, it has a quote that needs escaping whereas if you do it in the reverse order the quote is replaced with the HTML entity and is a valid string value for MySQL.
That said, the two functions have entirely different purposes.
htmlentities is for converting strings to their HTML entities ready for output to a web browser.
real_escape_string is for converting a string for use between quotation marks in a MySQL query.
The two do not go together. You should store the original text in the database (escaping it before it is passed to the query) and convert it to HTML entities only when it comes to displaying it.
If you insist on storing the HTML entities version of the string in the database the correct way is to use htmlentities first, then escape it. Escaping of a string should be the last operation on it before passing to the database.
Doing it in the wrong way may result in stray backslashes as shown above, although when passed to MySQL these will actually be ignored as \& is not a valid escape sequence. You would only notice a difference if outputting the variable that was passed to the database (as opposed to later retrieving it from the database and then outputting it).
You may also want to look into prepared statements in mysqli:
http://www.php.net/manual/en/mysqli.prepare.php

Sanitizing user input HTML with htmlspecialchars, nl2br, str_replace, htmlspecialchars_decode, and stripslashes

I made this function
function echoSanitizer($var)
{
$var = htmlspecialchars($var, ENT_QUOTES);
$var = nl2br($var, false);
$var = str_replace(array("\\r\\n", "\\r", "\\n"), "<br>", $var);
$var = htmlspecialchars_decode($var);
return stripslashes($var);
}
Would it be safe from xss attacks?
htmlspecialchars to take away html tags
nl2br for the new lines
str_replace to convert the \r\n to <br>
htmlspecialchars_decode to convert back the original characters
stripslashes to STRIPSLASHES
Why do I need all of that? Because I want to preview what the users entered, and I wanted a WYSIWYG view for them to see. Some of the input comes from a textarea and I want the line breaks to be preserved, so nl2br is needed.
Generally I'm asking about the (htmlspecialchars_decode) because its new to me. Is it safe? As a whole is the function I made safe if I use it to display user input?
(No database involved in this scenario.)
In your case htmlspecialchars_decode() makes the function unsafe. Users must not be allowed to insert < character unescaped, because that allows them to create arbitrary tags (and filtering/blacklisting is a cat and mouse game you can't win).
At the very minimum, < must be escaped as &lt;.
If you only allow plain text with newlines, then:
nl2br(htmlspecialchars($text_with_newlines, ENT_QUOTES));
is safe to output in HTML (except inside <script> or attributes that expect JavaScript or URLs such as onclick and href (in the latter case somebody could use javascript:… URL)).
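A small sketch of that plain-text preview, wrapped in a helper whose name (previewText) is made up for illustration. It assumes the output goes into ordinary element content, not into a script block or an attribute.

```php
<?php
// Hypothetical helper name: plain text in, HTML-safe preview out.
function previewText(string $text): string
{
    // Escape first, then add <br> tags, so the only markup is our own.
    return nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));
}

$input = "Hello <script>alert(1)</script>\nSecond line";
echo previewText($input), PHP_EOL;
```

The script tag comes out as visible text, and the newline becomes a line break, which covers the textarea preview without any decoding step.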
If you want to allow users to use HTML tags without being able to exploit your page, the correct function to do this won't fit in a Stack Overflow post (it would be thousands of lines long and requires a full HTML parser, processing of URLs and CSS, etc.). You'll have to use something heavyweight like HTMLPurifier.

How to escape html code to insert in mysql

I am using tinymce editor to have html page and then insert it in mysql.
I tried this:
$esdata = mysql_real_escape_string($data);
it is working for all html except images. If I have hyperlink like:
http://www.abc.com/pic.jpg
then it makes it somewhat very obscure and the image doesn't appear.
INPUT
<img src="../images/size-chart.jpg" alt="Beer" />
OUTPUT
<img src="\""images/size-chart.jpg\\"\"" alt="\"Beer" />
Try to use urlencode and urldecode to escape the string.
As Christian said, escaping is not done for the sake of the DB but to keep the data as it is. So you can also use urlencode and urldecode.
For example:
//to encode
$output = urlencode($input);
//to decode
$input = urldecode($output);
You shouldn't over-escape code before you send it to DB.
When you escape it, it's done in a way that it is stored in the DB as it was originally. Escaping is not done for the sake of the DB, but for the sake of keeping the data as it was without allowing users to inject bad stuff in your sql statements (prior to sending the stuff in the DB).
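To illustrate over-escaping, here is a hedged sketch. One possible cause of stray backslashes like those in the question is escaping a string that was already escaped; that is an assumption, since the question does not show enough to confirm it. addslashes() is used only as a stand-in for mysql_real_escape_string(), which needs a live connection, and stripslashes() imitates the single layer of escaping that the database removes when it parses the query.

```php
<?php
// Assumption: addslashes() is only a stand-in for mysql_real_escape_string();
// do not use addslashes() to build real queries.
$html = '<img src="../images/size-chart.jpg" alt="Beer" />';

$escapedOnce  = addslashes($html);        // a single escaping step
$escapedTwice = addslashes($escapedOnce); // escaping already-escaped input

// Removing one layer restores the original only if one layer was added.
var_dump(stripslashes($escapedOnce) === $html);  // bool(true)
var_dump(stripslashes($escapedTwice) === $html); // bool(false)
echo stripslashes($escapedTwice), PHP_EOL;       // backslashes remain
```

If something like this is happening, the fix is to find and remove the extra escaping step, not to add another transformation on top.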
You should use htmlspecialchars function to encode the string and htmlspecialchars_decode to display the string back to html

strip_tags and htmlentities

Should I use htmlentities with strip_tags?
I am currently using strip_tags when adding to database and thinking about removing htmlentities on output; I want to avoid unnecessary processing while generating HTML on the server.
Is it safe to use only strip_tags without allowed tags?
First: Use the escaping method only as soon as you need it. I.e. if you insert something into a database, only escape it for the database, i.e. apply mysql_real_escape_string (or PDO->quote or whatever database layer you are using). But don't yet apply any escaping for the output. No strip_tags or similar yet. This is because you may want to use the data stored in the database someplace else, where HTML escaping isn't necessary, but only makes the text ugly.
Second: You should not use strip_tags. It removes the tags altogether. I.e. the user doesn't get the same output as he typed in. Instead use htmlspecialchars. It will give the user the same output, but will make it harmless.
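As a small illustration of why strip_tags alone may not be enough, consider input that contains no tags at all but is echoed into an attribute. The input string and the form field here are made up for the example.

```php
<?php
// Input that contains no tags, so strip_tags() has nothing to remove.
$input = '" onmouseover="alert(1)';

$stripped = strip_tags($input);
$escaped  = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');

// Inside an attribute, the unescaped quote ends the value early.
echo '<input value="' . $stripped . '">', PHP_EOL;
echo '<input value="' . $escaped . '">', PHP_EOL;
```

The first line of output gains an extra onmouseover attribute; in the second, the quotes are encoded and the value stays a value.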
strip_tags will remove all HTML tags:
"<b>foo</b><i>bar</i>" --> "foobar"
htmlentities will encode characters which are special characters in HTML:
"a & b" --> "a &amp; b"
"<b>foo</b>" --> "&lt;b&gt;foo&lt;/b&gt;"
If you use htmlentities, then when you output the string to the browser, the user should see the text as they entered it, not as HTML
echo htmlentities("<b>foo</b>");
Visually results in: <b>foo</b>
echo strip_tags("<b>foo</b>");
Results in: foo
I wouldn't use htmlentities as this will allow you to insert the string, as is, into the database. This is no good for account details or forums.
Use mysql_real_escape_string for inserting data into the database, and strip_tags for receiving data from the database and echoing out to the screen.
Try this one and see the differences:
<?php
$d= isset($argv[1]) ? $argv[1] : "empty argv[1]".PHP_EOL;
echo strip_tags(htmlentities($d)) . PHP_EOL;
echo htmlentities(strip_tags($d)) . PHP_EOL;
?>
Open cmd or your terminal and type something like the following:
php your_script.php "<br>foo</br>"
This should give you what you want, and it is safe.

Do I need htmlentities() or htmlspecialchars() in prepared statements?

In an article, http://dev.mysql.com/tech-resources/articles/4.1/prepared-statements.html, it says the following:
There are numerous advantages to using prepared statements in your applications, both for security and performance reasons.
Prepared statements can help increase security by separating SQL logic from the data being supplied. This separation of logic and data can help prevent a very common type of vulnerability called an SQL injection attack.
Normally when you are dealing with an ad hoc query, you need to be very careful when handling the data that you received from the user. This entails using functions that escape all of the necessary trouble characters, such as the single quote, double quote, and backslash characters.
This is unnecessary when dealing with prepared statements. The separation of the data allows MySQL to automatically take into account these characters and they do not need to be escaped using any special function.
Does this mean I don't need htmlentities() or htmlspecialchars()?
But I assume I need to add strip_tags() to user input data?
Am I right?
htmlentities and htmlspecialchars are used to generate the HTML output that is sent to the browser.
Prepared statements are used to generate/send queries to the Database engine.
Both allow escaping of data; but they don't escape for the same usage.
So, no, prepared statements (for SQL queries) don't prevent you from properly using htmlspecialchars/htmlentities (for HTML generation)
About strip_tags: it will remove tags from a string, where htmlspecialchars will transform them to HTML entities.
Those two functions don't do the same thing; you should choose which one to use depending on your needs / what you want to get.
For instance, with this piece of code:
$str = 'this is a <strong>test</strong>';
var_dump(strip_tags($str));
var_dump(htmlspecialchars($str));
You'll get this kind of output:
string 'this is a test' (length=14)
string 'this is a &lt;strong&gt;test&lt;/strong&gt;' (length=43)
In the first case, no tag; in the second, properly escaped ones.
And, with an HTML output:
$str = 'this is a <strong>test</strong>';
echo strip_tags($str);
echo '<br />';
echo htmlspecialchars($str);
You'll get:
this is a test
this is a <strong>test</strong>
Which one of those do you want? That is the important question ;-)
Nothing changes for htmlspecialchars(), because that's for HTML, not SQL. You still need to escape HTML properly, and it's best to do it when you actually generate the HTML, rather than tying it to the database somehow.
If you use prepared statements, then you don't need mysql_[real_]escape_string() anymore (assuming you stick to prepared statements' placeholders and resist temptation to bypass it with string manipulation).
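A brief sketch of the difference between a placeholder and string manipulation. As an assumption made only so it runs without a server, it uses an in-memory SQLite database through PDO; the table and the sample value are hypothetical. The concatenated query is printed, not executed.

```php
<?php
// Assumption: in-memory SQLite through PDO, used only so this runs anywhere.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT)');

$name = "Robert'); DROP TABLE users; --";

// String manipulation: the value becomes part of the SQL text (shown, not run).
$unsafeSql = "INSERT INTO users (name) VALUES ('$name')";
echo $unsafeSql, PHP_EOL;

// Placeholder: the value travels separately and is stored verbatim.
$stmt = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$stmt->execute([':name' => $name]);

$stored = $pdo->query('SELECT name FROM users')->fetchColumn();
var_dump($stored === $name);
```

With the placeholder, the quote and the SQL keywords in the value are just characters in a stored string.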
If you want to get rid of htmlspecialchars(), then there are HTML templating engines that work similarly to prepared statements in SQL and free you from escaping everything manually, for example PHPTAL.
You don't need htmlentities() or htmlspecialchars() when inserting stuff in the database, nothing bad will happen, you will not be vulnerable to SQL injection if you're using prepared statements.
The good thing is you'll now store the pristine user input in your database.
You DO need to escape data on output, when you pull it out of the database and send it back to a client; otherwise you'll be vulnerable to cross-site scripting attacks and other bad things. You'll need to escape it for the output format you need, such as HTML, so you'll still need htmlentities etc.
You could instead escape things as you put them into the database rather than on output; however, you'll lose the user's original formatting, and the data will be escaped for HTML, which might not pay off if you're using it in different output formats.
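To make the point about output formats concrete, here is a minimal sketch with one raw value encoded two ways at output time. The sample string and the 'comment' key are made up for the example.

```php
<?php
// The raw value as it would come back from the database.
$raw = 'Tom & "Jerry" <b>';

// Each output format gets its own encoding at output time.
$forHtml = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$forJson = json_encode(['comment' => $raw]);

echo $forHtml, PHP_EOL; // Tom &amp; &quot;Jerry&quot; &lt;b&gt;
echo $forJson, PHP_EOL; // {"comment":"Tom & \"Jerry\" <b>"}
```

Had the value been stored HTML-encoded, the JSON consumer would receive entities it never asked for.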
Use prepared statements against SQL injection.
Use htmlspecialchars against XSS (for example, a script that redirects to another site).
<?php
$str = "this is <script> document.location.href='https://www.google.com';</script>";
echo $str;
Output: "this is ...", and then the browser redirects to google.com.
Using htmlspecialchars:
$str = "this is <script> document.location.href='https://www.google.com';</script>";
echo htmlspecialchars($str);
Output in the browser: this is <script> document.location.href='https://www.google.com';</script>
Output in view source: this is &lt;script&gt; document.location.href='https://www.google.com';&lt;/script&gt; (on PHP 8.1 and later the single quotes are also encoded, as &#039;)
If a user submits a comment containing such a script and it is saved to the database, then whenever the browser displays the comments from the database, the script will execute automatically and redirect to google.com.
So:
1. Use htmlspecialchars to neutralize malicious script tags.
2. Use prepared statements to secure the database.
htmlspecialchars
htmlspecialchars_decode
php validation
I would still be inclined to encode HTML. If you're building some form of CMS or web application, it's easier to store it as encoded HTML, and then re-encode it as required.
For example, when bringing information into a textarea modified by TinyMCE, they recommend that the HTML should be encoded, since the HTML spec does not allow for HTML inside a textarea.
I would also strip_tags() from anywhere you don't want HTML code.
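On the textarea point, a minimal sketch of encoding stored HTML before placing it inside a textarea. The stored string and the field name are hypothetical.

```php
<?php
// Stored HTML that happens to contain a closing textarea tag.
$storedHtml = '<p>Hello</p></textarea><script>alert(1)</script>';

// Encoding keeps the whole string inside the textarea as text; the browser
// decodes the entities again before an editor reads the field's value.
echo '<textarea name="content">',
     htmlspecialchars($storedHtml, ENT_QUOTES, 'UTF-8'),
     '</textarea>', PHP_EOL;
```

Without the encoding, the embedded closing tag would end the textarea early and the script after it would run.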
