Hi
I was wondering where the appropriate place to use htmlspecialchars() is. Is it before inserting data into the database, or when retrieving it from the database?
You should only call this method when echoing the data into HTML.
Don't store escaped HTML in your database; it will just make queries more annoying.
The database should store your actual data, not its HTML representation.
You use htmlspecialchars EVERY time you output content within HTML, so that it is interpreted as content and not as HTML.
If you allow content to be treated as HTML, you have just opened the door to bugs at a minimum, and total XSS hacks at worst.
Save the exact thing that the user enters into the database.
Then, when displaying it to the public, use htmlspecialchars() so that it offers some XSS protection.
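A minimal sketch of "store raw, escape on output". The helper name e() and the sample comment are made up for illustration; only htmlspecialchars() itself comes from the answers above.

```php
<?php
// Hypothetical helper: escape a value for use in HTML text or a quoted attribute.
function e(string $value): string
{
    return htmlspecialchars($value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

// Pretend this came straight from the database, stored exactly as the user typed it.
$comment = 'I <3 PHP & "quotes" <script>alert(1)</script>';

// Escaping happens only here, at the moment the value is written into HTML.
echo '<p>' . e($comment) . '</p>', PHP_EOL;
// prints: <p>I &lt;3 PHP &amp; &quot;quotes&quot; &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

Passing the flags and the charset explicitly keeps the behaviour the same across PHP versions, since the default flags changed in PHP 8.1.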
Guide - How to use htmlspecialchars() function in PHP
To begin, you have to understand one simple concept: rendering.
What is rendering?
Rendering is when the browser transforms the HTML
<b>Hello</b>
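into bold text, like this: Hello. That is rendering.
So, when should you use the htmlspecialchars() function?
Wherever user-supplied content is placed into a page that the browser will render as HTML, and you want it shown as text rather than rendered.
For example, if you are using jQuery and you do this: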
$("#YourDiv").html("<b>Hello</b>");
The div will show Hello in bold: the text was rendered as HTML.
If you want to display the message exactly as the user wrote it:
<b>Hello</b>
you have to use:
$("#YourDiv").text("<b>Hello</b>");
That way the markup will never be rendered.
If you want to load the message (as written by the user) into a textbox, textarea, etc., you have to use:
<input type="text" class="Textbox1" value="">
<script>
$(".Textbox1").val("<b>Hello</b>");
</script>
That will display
<b>Hello</b>
inside the textbox without problems.
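If the value is written into the attribute on the server side instead of through jQuery, the same idea applies in PHP. This is only a sketch: the sample value is invented, and the class name is taken from the example above.

```php
<?php
// Value as the user typed it; the double quote would otherwise end the attribute early.
$stored = '<b>Hello</b> " onfocus="alert(1)';

// ENT_QUOTES escapes both single and double quotes, so the value stays inside value="...".
$safe = htmlspecialchars($stored, ENT_QUOTES, 'UTF-8');

echo '<input type="text" class="Textbox1" value="' . $safe . '">', PHP_EOL;
// prints: <input type="text" class="Textbox1" value="&lt;b&gt;Hello&lt;/b&gt; &quot; onfocus=&quot;alert(1)">
```

The browser decodes the entities when it builds the form field, so the user sees and resubmits the original text, not the escaped version.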
Conclusion:
Whatever data the user enters into your forms, save it as it is.
Do not apply any function. If the user sent 12345, save it as it is. Do not filter anything. You only have to filter when you are going to display the data on the page to users. YOU, and ONLY YOU, decide whether or not to render what the user wrote. Remember that.
Regards!
What is the most secure way to save data from a textarea that contains <pre><code> text in it? Using strip_tags will remove all the tags from the text.
Is it safe to use this:
strip_tags($input, '<pre><code><other accepted tags except script,php,...');
or should I do other things too?
What is the most secure way to save data from a textarea that contains a <pre><code> text in it?
Save it as it is.
When you take that data back out of the database and put it into a web page, call htmlspecialchars on it first to escape it so that it looks like normal text on the page.
If you want the user to be able to input actual markup, but you only want to allow certain tags, then you've got a different problem and you want something like htmlpurifier.
Either way, the input or database layer is not the right place to be worrying about output formatting concerns.
If you are saving the contents of the textarea to a MySQL database, you should use mysqli_escape_string before saving the data.
You can also remove JavaScript tags from the posted data using a regular expression, e.g. with preg_replace.
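As an alternative to escaping the string by hand, a prepared statement keeps the data out of the SQL text altogether. Note that this swaps in a different technique from the mysqli_escape_string call mentioned above. The sketch assumes the pdo_sqlite extension is available and uses an in-memory SQLite database as a stand-in for MySQL; the table and column names are invented.

```php
<?php
// Assumption: pdo_sqlite is installed; an in-memory database stands in for MySQL here.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)');

$input = "<pre><code>echo 'hi';</code></pre> <script>alert(1)</script>";

// The placeholder keeps the data separate from the SQL, so no manual escaping is needed.
$insert = $pdo->prepare('INSERT INTO posts (body) VALUES (:body)');
$insert->execute([':body' => $input]);

$stored = $pdo->query('SELECT body FROM posts WHERE id = 1')->fetchColumn();

// The stored value is exactly what the user submitted; escaping happens on output.
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

With MySQL the same pattern should work through PDO's mysql driver or mysqli prepared statements; only the connection line would differ.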
I am creating a rather small web application in PHP, where a (trusted) administrator can, amongst other things, store hundreds of objects in a database. The user can enter a number of details about these objects in the form of text fields (an input element with the type attribute set to "text").
The objects with their details are echoed in the form of a table, escaped by the htmlspecialchars function. This function, however, does not protect against the malicious use of HTML tags, for example, the <script> tag.
The question is whether all user-entered data (every cell in the table) should be purified by something like HTMLPurifier, which is already used elsewhere in the application. And if so, what would be the best way to do it? There are many details, so calling HTMLPurifier thousands of times may cause some serious performance issues.
The objects with their details are echoed in the form of a table, escaped by the htmlspecialchars function. This function, however, does not protect against the malicious use of HTML tags, for example, the <script> tag.
Yes it does. They get harmlessly and correctly output as &lt;script&gt;.
The question is whether all user entered data (every cell in the table) should be purified by something like HTMLPurifier
Nope. You should only use HTMLPurifier on fields where you are deliberately intending to allow the user to enter markup for direct rendering to the page, for example a comment system where the user can type <i> for italics.
For other input that you are treating as plain text, htmlspecialchars remains the right thing to do when outputting to HTML.
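To make the point concrete, here is a rough sketch of escaping every table cell with htmlspecialchars. The rows and the renderRow() helper are hypothetical and not taken from the asker's application.

```php
<?php
// Hypothetical rows, as they might come back from the database (stored unescaped).
$rows = [
    ['name' => 'Widget', 'details' => 'Plain text'],
    ['name' => '<script>alert(1)</script>', 'details' => 'Tom & "Jerry"'],
];

// Hypothetical helper that renders one row, escaping every cell on the way out.
function renderRow(array $row): string
{
    $cells = '';
    foreach ($row as $value) {
        $cells .= '<td>' . htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8') . '</td>';
    }
    return '<tr>' . $cells . '</tr>';
}

foreach ($rows as $row) {
    echo renderRow($row), PHP_EOL;
}
// second row prints:
// <tr><td>&lt;script&gt;alert(1)&lt;/script&gt;</td><td>Tom &amp; &quot;Jerry&quot;</td></tr>
```

The script tag ends up on the page as visible text, so the browser never executes it.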
Everything should be checked and cleaned before you save it into the database. The principle is that you DO NOT TRUST anything that comes from the user.
ALWAYS escape everything.
Or just use tools which will do that for you - like frameworks.
Here is an example of the workflow a user can have on my website:
Create a task, with content: I use htmlentities to encode the content and store it in my database (yes, I've decided to store the encoded content);
The user comes back later and clicks to view the task. The thing is, the preview of the content is done in a disabled textarea.
I tried to use html_entity_decode when printing the content in the textarea (XSS problem if the user entered bad things);
I just print the encoded text and everything is fine.
The user clicks on EDIT, this will make the textarea editable
The user clicks on SAVE.
Here is my main issue, as I didn't decode the text before I printed it, it is still encoded and when the user saves it, it is re-encoded. So, the previous content is double encoded.
So, if the first time the user enters something like:
blablabla </textarea/> yeah!
Then, it's encoded and the result is:
blablabla &lt;/textarea/&gt; yeah!
Then, when I display it, it displays as the user previously entered it, but if he saves it, the result is:
blablabla &amp;lt;/textarea/&amp;gt; yeah!
And, so, if he displays it again, it is not well displayed (and it also takes more and more space in my database as the user keeps editing his task).
Well, I am sure this is a problem a lot of people have experienced but I can't find any good solution.
By the way, I am using htmlentities with ENT_QUOTES.
ahah, here is my main issue, as I didn't decode the text before I printed it, it is still encoded and when the user save it, it is reencoded. So, the previous content is double-encoded.
This is actually correct, you shouldn't decode the text before you print it. In fact, it must be HTML encoded when output in the HTML page. It is not still encoded when the user submits it because the browser will have already interpreted the HTML entities.
Unless... you are creating a TEXT_NODE in the DOM and assigning the encoded data to this (in the textarea)? In which case the browser will not interpret the HTML entities and you will end up resubmitting already encoded data. Assign to the innerHTML property instead, if this is the case. However, the HTML entities would be clearly visible in the form to the end user (on the first edit), before the data is submitted, which does not appear to be the case?
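For reference, the double encoding described in the question can be reproduced in a few lines. This is just an illustration using the sample string from the question; the variable names are made up.

```php
<?php
$raw = 'blablabla </textarea/> yeah!';

// Encoded once: safe to print inside a textarea; the browser shows the original text.
$once = htmlentities($raw, ENT_QUOTES, 'UTF-8');
echo $once, PHP_EOL;   // blablabla &lt;/textarea/&gt; yeah!

// Encoding the already encoded string again escapes the ampersands too.
$twice = htmlentities($once, ENT_QUOTES, 'UTF-8');
echo $twice, PHP_EOL;  // blablabla &amp;lt;/textarea/&amp;gt; yeah!

// The fourth argument (double_encode = false) leaves existing entities alone.
$guarded = htmlentities($once, ENT_QUOTES, 'UTF-8', false);
echo $guarded, PHP_EOL; // blablabla &lt;/textarea/&gt; yeah!
```

The double_encode flag only hides the symptom, though. Storing the raw text and encoding exactly once on output avoids the problem in the first place.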
Hum,
I fixed my problem.
I didn't notice it, but for the first entry I was using htmlentities(), and when editing I was using the Zend escape() function.
Using only htmlentities() fixed the problem. I don't know how the escape() function of ZF works, but I won't use it in the future :p
Thank you for your answers :)
Anyway, I am wondering: in which situation should the html_entity_decode() function be used? Since I call htmlentities() when I receive the form and print the result as it is, I never use html_entity_decode(). Is that normal? What is this function used for?
Thanks again!
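On the last question: PHP's decode function is html_entity_decode(). One place it can be useful, sketched below with an invented sample value, is turning content that was stored already encoded back into raw text, for example for a one-off migration or for a plain-text context such as an e-mail body.

```php
<?php
// Hypothetical value that was stored already encoded with htmlentities(..., ENT_QUOTES).
$storedEncoded = 'it&#039;s a &lt;b&gt;test&lt;/b&gt; &amp; more';

// Recover the text the user originally typed.
$raw = html_entity_decode($storedEncoded, ENT_QUOTES, 'UTF-8');
echo $raw, PHP_EOL; // it's a <b>test</b> & more

// For HTML output the raw text is then escaped again, exactly once.
echo htmlspecialchars($raw, ENT_QUOTES, 'UTF-8'), PHP_EOL;
```

If the database holds raw text from the start, this function is rarely needed in the normal form workflow.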
I'm creating my own blog in PHP and want to know your opinions on how I should format my post content.
Currently I store the post content as just plain text, call it when necessary, then wrap each line with P tags. I did this in case I wanted to change the way I formatted my text in the future, as it would save me the dilemma of having to remove all P tags from the posts in the DB.
Now the problem I have with this method is that if I want to add extra formatting, e.g. lists, those would also be wrapped with P tags, which is not correct.
How would you do this? Would you store text as plain text in the DB, or would you add the HTML formatting and store that in the DB too?
I'd prefer not to store unnecessary HTML in the DB, but I'm not sure of a way around it.
I think the best way would be to keep the HTML in the DB. You would have too much work parsing the text if you don't use HTML.
See how it's done in other blog tools. I know that Joomla, for example, keeps all HTML in the DB. I know Joomla isn't a blog tool :) but still...
Wordpress stores html in the db. You say you are concerned about storing 'unnecessary' html in the db. What makes it unnecessary? I think it is the opposite. You may have headings or bold or italic text in your post. If storing as plain text, how do you save this formatting? How are you saving the lists you mentioned?
I see it as a better practice to store raw user input in the database, and format it on output, caching the result if it is needed. That way you can change the way you are parsing things easily without having to regex-replace anything inside the database. You can also store the raw input in one column, and the formatted HTML in another one.
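A rough sketch of "store raw, format on output" for the plain-text case. The formatPost() function is hypothetical, and it assumes paragraphs are separated by blank lines, which the question does not state.

```php
<?php
// Hypothetical formatter: turns raw plain text into escaped <p> blocks at output time.
function formatPost(string $raw): string
{
    // Normalise line endings, then split on one or more blank lines.
    $text = str_replace(["\r\n", "\r"], "\n", trim($raw));
    $paragraphs = preg_split('/\n{2,}/', $text, -1, PREG_SPLIT_NO_EMPTY);

    $html = [];
    foreach ($paragraphs as $paragraph) {
        // Escape first, then turn the remaining single newlines into <br />.
        $html[] = '<p>' . nl2br(htmlspecialchars($paragraph, ENT_QUOTES, 'UTF-8')) . '</p>';
    }
    return implode("\n", $html);
}

echo formatPost("First line\nsecond line\n\nA <b>new</b> paragraph"), PHP_EOL;
```

Because the database keeps only the raw text, changing the markup later means changing this one function. The formatted result could be cached in a second column, as suggested above.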
I assume that you are formatting your raw text with the Markdown or the Textile syntax?
If you store HTML in your DB, you will be just a few clicks away from your current situation:
you can use strip_tags() to remove HTML formatting, and in case of bigger changes, you can run HTML Tidy on your code to remap tags and classes.
Hey guys, I'm building a web app where users can log in, post and read articles, comment, and so on.
I'm giving them a form to post an article where they provide its title, description and text.
Leaving the validation and SQL injection aside (already done), I need help with displaying the article, which is stored in a MySQL database as TEXT.
I'm taking the article text from a textarea and displaying it in a p tag, but then obviously it ignores the newline characters entered by the users, while the pre tag makes it ugly by giving a wide scrollable display. Which tag is appropriate for this purpose? Or is taking an article through a textarea even the right approach?
I'm a learner and am building such a web app with article and comment sections for the first time, so any suggestions are most welcome. Thank you in advance.
My recommendation would be of two choices:
1. Use Plain Text:
If you do not want users to put any HTML in the contents, show them a simple HTML textarea; when the user enters a new line (Enter key), it will be stored as \n in your database. When you want to print the article, just use nl2br($article_contents); and it will convert the new lines (\n) into HTML line breaks.
2. Rich Text:
If you want users to put HTML contents in the article, it will be easier to use a text editor like TinyMCE. TinyMCE makes it easy for your users to do simple HTML formatting like headings, bold, italic, paragraph alignment, colors, and images. Then, on the PHP side, use the strip_tags function to allow only certain tags, so the user cannot insert malicious code such as XSS injections into the HTML contents. For example:
strip_tags($article_contents, "<u><b><i><font><span><p>");
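Two caveats about the options above, shown as a small sketch with invented sample values. For plain text, nl2br() does not escape anything, so the text should go through htmlspecialchars() first. For rich text, strip_tags() removes disallowed tags but keeps the attributes of the allowed ones, so on its own it does not stop event-handler attributes; a dedicated sanitizer such as HTMLPurifier is the safer choice there.

```php
<?php
// Option 1 (plain text): escape first, then add the line breaks.
$plain = "Line one\nLine <two> & three";
$plainHtml = nl2br(htmlspecialchars($plain, ENT_QUOTES, 'UTF-8'));
echo $plainHtml, PHP_EOL;
// prints: Line one<br />
//         Line &lt;two&gt; &amp; three

// Option 2 (rich text): the <script> tag is removed, but the onmouseover
// attribute on the allowed <p> tag survives, and so does the script's text.
$rich = '<p onmouseover="alert(1)">Hi</p><script>alert(2)</script>';
$filtered = strip_tags($rich, '<u><b><i><font><span><p>');
echo $filtered, PHP_EOL;
// prints: <p onmouseover="alert(1)">Hi</p>alert(2)
```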
Proposed Answer:
Use <span></span>
Tags like <p></p><div></div> take up as much space as they can, while <span></span> takes up as little as it can to hold whatever is inside it, so it might be more suitable for you.
Let me know if that worked for you.
In PHP you can use the nl2br function, which inserts a BR HTML tag before each newline character. http://php.net/nl2br