I have a user input field which will be stored into a 'tinytext' field in a MySQL database; pretty standard stuff. I am wondering if there is some sort of standard or best-practice to adhere to when it comes to escaping html special characters using the php function htmlentities()?
Should I use htmlentities() before I store the data in the database, or should I run the function on the data every time it is output from the website?
There is usually no reason to use htmlentities() at all any more. Just store everything in UTF-8 fields and adhere to UTF-8 all the way through.
When outputting unsafe user input as HTML, use htmlspecialchars(), ideally at the time of output so you have a copy of the original data.
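As a minimal sketch of "escape at output, keep the original": the helper name e() below is hypothetical (not from the answer), and the flags ENT_QUOTES | ENT_SUBSTITUTE with an explicit 'UTF-8' charset are one reasonable choice, not the only one.

```php
<?php
// Hypothetical helper: wraps htmlspecialchars() with explicit flags and charset.
function e(string $raw): string
{
    return htmlspecialchars($raw, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// What the database holds: the original, unescaped text.
$stored = 'Tom & "Jerry" <b>';

// Escaping happens only at the moment the value is written into HTML.
$html = '<p>' . e($stored) . '</p>';
echo $html, PHP_EOL;
```

The stored value stays untouched, so it can still be reused in non-HTML contexts later.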
I'm adding some XSS protection to the website I'm working on. The platform is Zend Framework 2, and therefore I'm using Zend\Escaper. From the Zend documentation I learned that:
Zend\Escaper is meant to be used only for escaping data that is to be
output, and as such should not be misused for filtering input data.
For such tasks, the Zend\Filter component, HTMLPurifier or PHP's Filter component should be used.
But what are the risks if I escape the data before inserting it into the database? Am I so wrong to do that? Please explain, as I'm somewhat new to this topic.
thanks
When encoding data before storing it you will have to decode it before you can do anything sensible with it before outputting it. That's why I'd not do it.
Let's say you have an international application and you store the escaped value of a form field that might contain non-ASCII characters; those may be converted into HTML entities. What if you have to measure the content of that field, for example by counting its characters? You will always have to unescape the content before counting it and then re-escape it afterwards. Much work done, but nothing gained.
The same applies to search operations in your database. You will have to escape the search phrase the same way as your stored input for the database to understand what you are looking for.
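The counting and searching problems can be illustrated with a small sketch. The sample string is made up, the script is assumed to be saved as UTF-8, and mb_strlen() assumes the mbstring extension is loaded.

```php
<?php
$raw     = 'Müller & Söhne';
$escaped = htmlentities($raw, ENT_QUOTES, 'UTF-8');  // entity-encoded copy

// Character counts diverge once the text is entity-encoded.
$rawLength     = mb_strlen($raw, 'UTF-8');  // characters the user typed
$escapedLength = strlen($escaped);          // characters that would be stored

// A plain search phrase no longer matches the escaped copy.
$needle         = 'Söhne';
$foundInRaw     = strpos($raw, $needle) !== false;
$foundInEscaped = strpos($escaped, $needle) !== false;

printf("raw: %d chars, escaped: %d chars\n", $rawLength, $escapedLength);
var_dump($foundInRaw, $foundInEscaped);
```

To search the escaped copy, the search phrase would have to go through exactly the same htmlentities() call first.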
I'd use one character set throughout the application and database (I prefer UTF-8, beware of the MySQL-Connection....) and only escape content on output. That way I can do whatever I like with the data and am on the safe side on output. Escaping is done automatically in my view layer, so I don't have to think about it every time I handle data, and I can't forget it.
That does not prevent me from filtering and sanitizing the input. And it doesn't prevent me from escaping the database-content using the appropriate database-escaping mechanisms like mysqli_real_escape_string or similar or using prepared statements!
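A minimal sketch of the prepared-statement route mentioned above. To stay self-contained it assumes the pdo_sqlite extension and uses an in-memory SQLite database as a stand-in for MySQL; the table and column names are invented for illustration.

```php
<?php
// Assumption: pdo_sqlite is available; SQLite stands in for MySQL here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE comments (id INTEGER PRIMARY KEY, body TEXT)');

// Hostile-looking input is bound as a parameter, never concatenated into SQL.
$input = "Robert'); DROP TABLE comments;-- <b>hi</b>";

$insert = $pdo->prepare('INSERT INTO comments (body) VALUES (:body)');
$insert->execute([':body' => $input]);

// The database returns the text exactly as entered ...
$stored = $pdo->query('SELECT body FROM comments')->fetchColumn();

// ... and HTML escaping is applied only when it is printed.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

With MySQL, only the DSN and credentials passed to PDO would differ; the binding and the output escaping stay the same.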
But that's just my opinion, others might think otherwise!
"Output" here refers to the web page. A form field ( HTML tag) is an INPUT (from the webpage), any text is an OUTPUT (to the webpage). You need to ensure any output (to the webpage) does not contain dangerous characters that could be used to forge XSS attack vectors.
That said, if you have $DANGEROUS_INPUT_X given by the user and then
$NOT_DANGEROUS_ANYMORE = (new HTMLPurifier())->purify($DANGEROUS_INPUT_X);
DBSave($NOT_DANGEROUS_ANYMORE); // DBSave() and DBLoad() are placeholders for your own storage code
and somewhere else
$OUTPUT = DBLoad();
echo $OUTPUT;
you should be fine, as long as you do not apply any additional encoding/decoding to this output. It will be displayed the way it was saved, which was safe.
I would suggest looking at output encoding more than validation: HtmlPurifier cleans the HTML, whereas you could accept any kind of bad characters if you ensure your output is encoded in the page.
Some general rules are at https://www.owasp.org/index.php/XSS_(Cross_Site_Scripting)_Prevention_Cheat_Sheet, and here is the PHP example:
echo htmlspecialchars($DANGEROUS_INPUT_X_NOW_OUTPUT, ENT_QUOTES, "UTF-8");
Remember to set the Character Set and be consistent with the same one throughout your pages/scripts/binaries and in the database as well.
I'm building a category list and I'm unsure how to store the ampersand symbol
in a MySQL database. Is there any problem/disadvantage if I use '&'? Are there any differences
from using it in an HTML format, '&amp;'?
Using '&' saves a few bytes. It is not a special character for MySQL. You can easily produce the &amp; for output later with PHP's function htmlspecialchars(). When your field is meant to keep simple information, you should use plain text only, as this is in general more flexible since you can generate different kinds of markup etc. later. Exception: the markup is produced by a user whose layout decisions you want to save with the text (as in rich-text input). If you have tags etc. in your input, you may want to use &amp; for consistency.
You should store it as &amp; only if the field in the DB contains HTML (like <span class="bold">some text</span> &amp; more). In which case you should be very careful about XSS.
If the field contains some general data (like a username, title, etc.) you should only escape it when you put it in your HTML (using htmlentities, for example).
Storing it as & is an appropriate method. You can echo it or use it in statements as &.
We store '&' in database fields all the time; it's fine to do so (at least I've never heard an argument otherwise).
If you're only ever using the string in an HTML page, you could just store the HTML-safe &amp; version, I suppose. I would suggest that storing '&' and escaping it when you read it would be better, though (in case you need to use the string in a non-HTML context in the future).
Use &amp; if you want to have valid HTML or avoid problems like cut&copy (the browser shows it as cut©).
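A short sketch of storing the plain ampersand and escaping per output context. The category value 'cut&copy' is an example string chosen to show the pitfall, and the JSON output is only there as a sample non-HTML context.

```php
<?php
// Plain text, exactly as it would sit in the database column.
$category = 'cut&copy';

// HTML context: escape so the browser cannot read "&copy" as an entity.
$html = '<li>' . htmlspecialchars($category, ENT_QUOTES, 'UTF-8') . '</li>';

// Non-HTML context (here JSON): the raw value is used as-is.
$json = json_encode(['category' => $category]);

echo $html, PHP_EOL;
echo $json, PHP_EOL;
```

Because the stored value is raw, both outputs come from the same column without any decoding step.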
What's the best route for storing data in MySQL? Should I just use TEXT as my field type?
Also, what about the \r\n values returned when using mysql_real_escape_string()?
Should I be running htmlentities() on the data after that?
And then, when I return data to the screen, should I use nl2br()?
Just trying to figure out the best route here for storing this information.
Thank you for your help!
TEXT or TINYTEXT or anything similar should be fine for storing ASCII data from the user. If you don't need a lot of space, you may consider VARCHAR.
I think that mysql_real_escape_string() escapes characters that may compromise the security of an SQL query (single quote, double quote, etc.) but doesn't do much more than that.
htmlentities() converts reserved HTML characters like < and > into their HTML-encoded equivalents, &lt; and &gt; respectively. These characters are not dangerous for SQL queries, so you probably do not need to escape them unless you want to display the HTML tag entered by the user as text, and not let it be interpreted as HTML.
NL2BR() is probably not necessary either.
Most importantly, your decision on when to use each of these functions will depend on your end application. You may need / want some but not others ( though you should definitely use mysql_real_escape_string() )
It really depends on what you are trying to store. For things such as usernames, passwords, etc., you can use VARCHAR. But if you're storing long text such as news posts or HTML data, then you can use TEXT or LONGTEXT (depending on how long it is).
You should ALWAYS use mysql_real_escape_string() when inserting into the DB. If you're outputting HTML from the DB, you may want to run htmlentities or htmlspecialchars to ensure that you aren't outputting user-injected JavaScript that could redirect your users to hacker websites and such.
One other idea is that you could escape your data using htmlentities before inserting into the DB, but it's your choice.
nl2br() is great for turning all \r\n into <br /> tags instead.
So, it seems like you're on the right track...
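If both functions are used at output time, the order matters: escaping first and then calling nl2br() keeps the generated <br /> tags intact. A minimal sketch with a made-up stored value:

```php
<?php
// Raw multi-line text as stored in the database (including a would-be tag).
$stored = "Line one <script>\r\nLine two";

// Escape first, then add line breaks; the other order would escape the <br /> tags too.
$html = nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'));

echo $html, PHP_EOL;
```

Note that nl2br() inserts the <br /> before each line break and leaves the original \r\n in place.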
When I encode a textarea's newline with PHP's rawurlencode function before storing it in MySQL, the newline is encoded as %0D%0A.
For Example:
textarea text entered by user:
a
b
Encoding it with rawurlencode and storing it in the database stores the value as a%0D%0Ab
When I retrieve it from the database, decoding with rawurldecode does not work and the code gives an error. How can I overcome this, and what is the best way to store, retrieve and display textarea values?
Can you first encode this textarea string using base64_encode and then perform a base64_decode on it, if the above does not work for you?
If the textarea does not contain URLs, you should use base64_encode rather than rawurlencode and then store as normal.
You simply should not use rawurlencode for escaping data for your database.
Each target format has its own escaping method, which in general terms makes sure the data is stored/displayed/transferred safely from one place to another, and it doesn't need decoding at the other end.
For instance:
displaying text in HTML, use htmlentities or htmlspecialchars
storing in database, use mysqli_real_escape_string, pg_escape_string, etc...
transferring variablename, use urlencode
transferring variablecontent, use rawurlencode
etc...
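The list above can be sketched roughly as follows. The sample text, the page.php URL and the note parameter name are invented for illustration; database escaping is left out because it needs a live connection (or, preferably, prepared statements).

```php
<?php
// One raw value, escaped differently depending on where it is going.
$text = "a\r\nb & <c>";

// HTML: escape markup-significant characters.
$forHtml = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// URL: the parameter name goes through urlencode(), the value through rawurlencode().
$forUrl = 'page.php?' . urlencode('note') . '=' . rawurlencode($text);

// rawurlencode() itself round-trips cleanly, newline included.
$roundTrip = rawurldecode(rawurlencode($text));

echo $forHtml, PHP_EOL;
echo $forUrl, PHP_EOL;
var_dump($roundTrip === $text);
```

The raw $text is what would be stored; each escaped form exists only for its own target.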
You should notice that decoding these things is often done by the browser/database. So no data is actually stored escaped, and decoding doesn't need to be done by your code.
The problem is probably that you escaped the sequence with rawurlencode, but your database expected the escaped format for that specific brand of database. It then de-escaped the value under that assumption, which was wrong and messed up your string.
Conclusion: find out what brand database you are using, look up the specific escape function for that database, and use the proper escaping function on all your content "transferral".
P.S.: some definition may not be correct, please comment on that. I wanted to make the idea stick but am probably not using all the right terms.
First of all, it is very uncommon to run textarea content through urlencode().
urlencode was not designed for this purpose.
Second, if you still want to do this, then maybe the problem comes from the database. First you need to tell us what database you are using and what TYPE you are using for storing this data: do you store it as TEXT or as BINARY data? Have you set up the correct charset in the database?
I was wondering if converting POST input from an HTML form into html entities, (via the PHP function htmlentities() or using the FILTER_SANITIZE_SPECIAL_CHARS constant in tandem with the filter_input() PHP function ), will help defend against any attacks where a user attempts to insert any JavaScript code inside the form field or if there's any other PHP based function or tactic I should employ to create a safe HTML form experience?
Sorry for the loaded run-on sentence question but that's the best I could word it in a hurry.
Any responses would be greatly appreciated and thanks to all in advance.
racl101
It would turn the following:
<script>alert("Muhahaha");</script>
into
&lt;script&gt;alert(&quot;Muhahaha&quot;);&lt;/script&gt;
So if you're printing out this data into HTML later, you would be protected. It wouldn't protect you from:
"; alert("Muhahaha");
just in case you were echoing into a script like so:
var t = "Hello there <?php echo $str;?>";
For this purpose, you should use addslashes() and a database string escaping method like mysql_real_escape_string().
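As a hedged alternative for the JavaScript-string case (this is a different technique from the addslashes() suggestion above, not something the answer proposes): json_encode() with the JSON_HEX_* flags produces a complete, quoted string literal. The sample value of $str is made up.

```php
<?php
// Input that would break out of a hand-written "..." JavaScript string.
$str = '"; alert("Muhahaha"); //';

// json_encode() emits the surrounding quotes itself; the JSON_HEX_* flags turn
// <, >, &, ' and " into \uXXXX escapes so the value cannot end the string
// or the enclosing <script> element.
$js = json_encode($str, JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT);

echo '<script>var t = "Hello there " + ' . $js . ';</script>', PHP_EOL;
```

The encoded literal decodes back to the original text in the browser, so nothing is lost by escaping it this way.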
Yes, that is one way to sanitise. It has the benefit that you can always display the database contents without fear of XSS attacks. However, a 'purer' approach is to store the raw data in the database and sanitise in the view, so every time you want to show the text, use htmlentities() on it.
However, your approach does not take SQL injection attacks into account. You might want to look at http://php.net/manual/en/function.mysql-real-escape-string.php to guard against that.
Yes, do this when you want to display data on a webpage, but I recommend you don't store the HTML in the database encoded. This may seem fine for large text fields, but with shorter fields, say a 32-character title column, a normal 30-character string that contains an & would have it become &amp;, and this would either cause an SQL error or cause the data to be cut off.
So the rule of thumb is: store everything raw (obviously prevent SQL injection) and treat EVERYTHING as tainted, no matter where it comes from: the database, user forms, RSS feeds, flat files, XML, etc. This is how you build good security without worrying about the data overflowing, or about the fact that you might one day need to export the data to a non-web consumer for which the HTML encoding is a problem.
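The overflow scenario can be checked with a few lines. The title text and the 32-character limit are illustrative assumptions standing in for something like a VARCHAR(32) column.

```php
<?php
// Hypothetical column limit, e.g. a VARCHAR(32) title column.
$columnLimit = 32;

$title   = 'Tips & tricks for PHP security';                 // fits the column
$encoded = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');     // "&" becomes "&amp;"

$rawFits     = strlen($title) <= $columnLimit;
$encodedFits = strlen($encoded) <= $columnLimit;

printf("raw: %d chars, encoded: %d chars\n", strlen($title), strlen($encoded));
var_dump($rawFits, $encodedFits);
```

Storing the raw title and encoding it only in the view sidesteps the length problem entirely.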