How to stop users putting HTML code in text inputs - PHP

I've been building a website, and while testing I noticed that if I put <em>bob</em> or something similar in the text fields on my register/update pages, it is stored in the database as entered ('<em>bob</em>'), but when it is read back onto the website it displays in italics.
Is there a way to block HTML code from my text inputs?
Or does it only get read as HTML when it is echoed back onto the page from the database?
Mostly I'm just curious to know what's happening here.
The name displaying in italics isn't a major issue, but it seems like something the user shouldn't be able to control.
P.S. I can provide code if needed, but I didn't think it would be much help for this question.

You can also just use htmlspecialchars() to output exactly what they typed on the page — as-is.
So if they enter <i>bob</i> then what will show up on the page is literally <i>bob</i> — that way you're "allowing" all the input in the world, but none of it is ever rendered.
If you want to just get rid of the tags, strip_tags() is the better option, so <i>bob</i> would show up as bob. This works if you're sure there's no legitimate scenario where someone would want to enter an HTML tag. (For example, Stack Overflow obviously can't just strip the tags out of stuff we type, since a lot of questions involve typing HTML tags.)
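To make the difference concrete, here is a minimal sketch that runs both functions on the same input. The variable names are made up for illustration, and it assumes the value has been read back from the database exactly as it was typed.

```php
<?php
// Hypothetical value as it was typed into the form and stored unchanged.
$name = '<em>bob</em>';

// Escape on output: the browser shows the tags as literal text.
$escaped = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');

// Strip on output: the tags are removed and only the text remains.
$stripped = strip_tags($name);

echo $escaped, "\n";  // &lt;em&gt;bob&lt;/em&gt;
echo $stripped, "\n"; // bob
```

Either way the stored value stays as typed; it is only interpreted as HTML when the string is echoed into a page unescaped.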

You can use strip_tags to remove all HTML tags from a string: http://php.net/manual/es/function.strip-tags.php
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text); // Output: Test paragraph. Other text
echo "\n";
// Allows <p> and <a>
echo strip_tags($text, '<p><a>'); // Output: <p>Test paragraph.</p> Other text
?>

You can use the built-in PHP function strip_tags. It removes all HTML tags from a string and returns the result.
Something like this:
$cleaned_string = strip_tags($_GET['field']);

Related

PHP output string and maintain spacing [duplicate]

Any ideas why formatted text from DB, when echo-ed out in php loses its formatting, i.e. no new lines? Thanks!
Use nl2br().
New lines are ignored by browser. That's why you see all text without line breaks. nl2br() converts new lines to <br /> tags that are displayed as new lines in browsers.
If you want to display your text in a <textarea>, you don't need to convert the new lines to <br />. If you do, the "<br />" tags will show up as literal text where the line breaks were.
Because there are no html tags for formatting!
Try the nl2br function.
You could try adding the nl2br() function,
something like this: echo nl2br($your_text_variable);
It should work ;-)
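If the text comes from users, it is probably worth escaping it before calling nl2br(), so that only the line breaks become markup. A minimal sketch, with a made-up sample string standing in for the database value:

```php
<?php
// Hypothetical text as stored in the database, with a real newline character.
$text = "first line\nsecond <b>line</b>";

// Escape first so user-supplied markup is shown literally,
// then insert <br /> before each newline.
$html = nl2br(htmlspecialchars($text, ENT_QUOTES, 'UTF-8'));

echo $html;
```

The order matters: calling htmlspecialchars() after nl2br() would escape the <br /> tags as well.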
The reason
This is the default behavior for all user agents. If you look at the page source, you'll see that your text has the same formatting as the text in the database (or the textarea).
The probable reason for your confusion is that in one case you see the text inside a <textarea>, which displays preformatted text and does not interpret tags, while in the other case the text is interpreted as HTML (where whitespace is not significant).
Browsers don't display new lines unless specifically asked to, using a <br> tag or any block-level tag.
No tags == no new lines.
The fix
If you store preformatted text in the database, you should wrap the output in a <pre> tag.
You may want to convert the formatting characters to the HTML tags you need, using functions like nl2br, str_replace, etc.
You could also change your structure to store HTML in the database instead of plain text (however, markup looks like a better solution).
See similar question:
How do I keep whitespace formatting using PHP/HTML?
The difference between the two images you show is that one has the text in a <textarea></textarea> and the other does not ... if you want 1:1: <textarea><?php echo $yourVariable;?></textarea>
It does output what you say to output. If the text is pre-formatted, put it inside the HTML <pre></pre> tag in your output script.
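The <pre> suggestion could look roughly like this. It is only a sketch with a made-up sample string, and the escaping step is an addition on the assumption that the text comes from users:

```php
<?php
// Hypothetical preformatted text with indentation and newlines.
$text = "line one\n    indented line\nline three";

// Escape the text, then wrap it in <pre> so the browser keeps
// spaces and newlines exactly as stored.
$html = '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';

echo $html;
```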
This should be helpful in answering:
How do I keep whitespace formatting using PHP/HTML?
Set up string preprocessing code for both the input to the database and the output to the display page.

Using strip_tags() and preg_replace() to display text entered in a WYSIWYG/TinyMCE Text Editor

Good morning,
Here's the problem:
I have some text being entered via a text editor (WYSIWYG/TinyMCE) and displayed elsewhere as a posting. The problem is that the text loses its formatting when it is displayed as a posting. After digging through the code, I discovered that this was being done with a strip_tags() + echo preg_replace() combo. I'm still new to PHP, but I was able to figure out:
strip_tags() was taking out the formatting (b/c that's how it rolls)
I could add <strong> and <em> to get the bold and italicized text to display
the underlined and strikethrough text are CSS styles and adding the code (as it is saved on the db table) to the strip_tags() list did NOT solve the problem
My question is: can I modify the existing code to solve this, or should I use something else (htmlentities() perhaps)?
EDIT: I tried htmlentities and it failed.
EDIT: I added just the <span> tag and the problem is 50% solved. My text is underlined, but it shows lower than the non-underlined text that comes after it. It's as if the underlined text is being treated as subtext or something.
code snippet:
<div class="display_text_area">
<?php $text = strip_tags(str_ireplace("</p>", "</p><br/>",
$text_detail->description),
'<font><ul><li><br/><strong><em><span style="text-decoration: underline;">'); ?>
<?php echo preg_replace('/(<br[^>]*>\s*){2,}/', '<br/>', $text); ?>
</div>
I'm leaving the <span style="text-decoration: underline;"> tag here to show that (a) I tried it, and (b) it didn't work. So (c) I know it needs to be removed or modified.
Many thanks in advance.
The point is that TinyMCE returns nominally valid rich HTML that doesn't need stripping or escaping before being used in an HTML page. However, you can't assume that the TinyMCE editor is actually running on the client, because someone could exploit you by posting a response directly that contains an XSS attack.
IIRC, TinyMCE returns XHTML by default. You need to ensure that any returned HTML is correct using a library such as HTML Purifier.
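One detail that may explain the behaviour described in the question: the second argument of strip_tags() is matched against bare tag names, so an entry such as <span style="..."> or <br/> never matches anything. A minimal sketch with a made-up input string, assuming only the allow list needs correcting:

```php
<?php
// Hypothetical editor output, similar to what the question describes.
$description = '<p><span style="text-decoration: underline;">under</span> <strong>bold</strong></p>';

// Allow list in the style of the question: the <span style=...> entry
// is not a bare tag name, so <span> is still stripped.
$broken = strip_tags($description, '<strong><em><span style="text-decoration: underline;">');

// Allow list with bare tag names only; attributes on allowed tags are kept as-is.
$fixed = strip_tags($description, '<strong><em><span>');

echo $broken, "\n";
echo $fixed, "\n";
```

Because attributes on allowed tags pass through untouched, this does not make the output safe on its own, which is where a sanitising library such as HTML Purifier comes in.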

Stripping input to complete plain text

I'm currently finalising the code for my comment system, and I want it to work a little like Stack Overflow does with its posts. I would like my users to be able to use bold, italic and underscore only, and to do that I would use the following:
_ Text _ * BOLD * -Italic-
Now, firstly I would like to know a way of stripping a comment completely clean of any tags, HTML entities and such, so that, for example, if a user were to use any HTML/PHP tags, they would be removed from the input.
I am currently using strip_tags, but that can leave the output looking quite nasty. Even if an abusive or blatant XSS/injection attempt has been made, I would still like the plain text to be output in full and not chopped up, as strip_tags seems to make an absolute mess of that.
What I will then do is replace the asterisks with bold HTML tags, and so on, AFTER stripping the content clean of HTML tags.
How do people suggest I do this? Currently this is the comment sanitize function:
function cleanNonSQL( $str )
{
return strip_tags( stripslashes( trim( $str ) ) );
}
PHP tags are surrounded by <? and ?>, or maybe <% and %> on some ages-old installations, so removing PHP tags can be managed by a regex:
$cleaned=preg_replace('/\<\?.*?\?\>/', '', $dirty);
$cleaned=preg_replace('/\<\%.*?\%\>/', '', $cleaned);
Next you take care of the HTML tags: These are surrounded by < and >. Again you can do this with a regex
$cleaned=preg_replace('/\<.*?\>/','',$cleaned);
This will transform
$dirty='blah blah blah <?php echo $this; ?> foo foo foo <some> html <tag> and <another /> bar bar';
into
$cleaned='blah blah blah  foo foo foo  html  and  bar bar'; // doubled spaces remain where tags were removed
You could try using regular expressions to strip the tags, such as:
preg_replace("/\<(.+?)\>/", '', $str);
Not sure if that's what you're looking for, but it will remove anything inside < and >. You can also make it a little more foolproof by requiring the first character after the < to be a letter.
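The "first character must be a letter" refinement could look something like this. The function name is made up for illustration, and this is only a sketch: a regex like this narrows what gets removed, but it is not a replacement for escaping output.

```php
<?php
// Hypothetical helper: remove anything that looks like an opening or
// closing tag, i.e. "<", an optional "/", a letter, then everything
// up to the next ">".
function stripTagLike(string $str): string
{
    return preg_replace('/<\/?[a-z][^>]*>/i', '', $str);
}

echo stripTagLike('a <b>bold</b> move'), "\n";        // a bold move
echo stripTagLike('sad :<. but then happy :>'), "\n"; // left unchanged
```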
The correct way is not to delete HTML tags from your user's comment, but to tell the browser that the following text should not be interpreted as HTML, JavaScript, or anything else. Imagine someone wants to post example code like we do here on Stack Overflow. If you just bluntly remove any parts of a comment that seem to be code, you will mess up the user's comment.
The solution is to use htmlentities, which escapes the symbols used for HTML markup in the comment so that it shows up as plain text in the browser.
For example, the browser will interpret a < as the beginning of an HTML tag. If you just want the browser to display a <, you have to write &lt; in the source code. htmlentities will convert all the relevant symbols into their HTML entities for you.
Longer Example
echo htmlentities("<b>this text should not be bold</b><?php echo PHP_SELF;?>");
Outputs
&lt;b&gt;this text should not be bold&lt;/b&gt;&lt;?php echo PHP_SELF;?&gt;
The browser will output
<b>this text should not be bold</b><?php echo PHP_SELF;?>
Consider the following real-life example with the solution you accepted. Imagine a user writing this comment:
i'm in a bad mood today :<. but your blog made me really happy :>
You will now do your preg_replace("/\<(.+?)\>/", '', $comment); on the text and it will remove half the comment:
i'm in a bad mood today :
If that's what you wanted, never mind this answer. If you don't, use htmlentities.
If you want to save the comment as a file and not have the server interpret PHP code inside it, save it with an extension like '.html' or '.txt', so that the web server won't call the PHP interpreter in the first place. There is usually no need to escape PHP code.
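Putting the escaping advice together with the lightweight markers from the question might look like the sketch below. The function name is made up, and the mapping (*bold*, _underline_, -italic-) is my reading of the notation in the question, so treat both as assumptions.

```php
<?php
// Hypothetical helper: escape everything, then convert the lightweight
// markers (*bold*, _underline_, -italic-) into tags.
function formatComment(string $comment): string
{
    // Escape first so no user-supplied HTML survives.
    $safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

    // Markers must wrap at least one character and cannot span lines.
    $safe = preg_replace('/\*([^*\n]+)\*/', '<b>$1</b>', $safe);
    $safe = preg_replace('/_([^_\n]+)_/', '<u>$1</u>', $safe);
    // Hyphens inside words (e.g. "well-known") are left alone.
    $safe = preg_replace('/(?<!\w)-([^-\n]+)-(?!\w)/', '<i>$1</i>', $safe);

    return $safe;
}

echo formatComment('*hi* <b>there</b> :< -ok-');
```

Since the only tags in the result are the ones added after escaping, nothing from the comment is lost and nothing the user typed is rendered as HTML.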

html tags + ascii names and numbers

Our users want to be able to type in
<p>some text goes here.</p>
which gets saved to the database and then outputs to the screen as it should, i.e. however the browser renders <p> tags without actually displaying the <p> tags.
At the same time without affecting the existing database generated pages where the user has been typing in
&lt;p&gt;some text goes here.&lt;/p&gt;
Is this possible?
An example of "the customer is always right". Do I have a choice? No, it's something they want for some reason which is beyond me.
I'm unsure what you're trying to accomplish. If you WANT the browser to render the tags when you display them, then you can use html_entity_decode:
echo html_entity_decode( "<p>some text goes here.</p> ");
echo html_entity_decode( "&lt;p&gt;some text goes here.&lt;/p&gt; ");
If not, then you can pass the strings to htmlspecialchars:
echo htmlspecialchars( "<p>some text goes here.</p> ");
echo htmlspecialchars( "&lt;p&gt;some text goes here.&lt;/p&gt; ");
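Here is a small sketch of what each call does to the two stored forms, the raw markup and the entity-encoded version. The variable names are illustrative only.

```php
<?php
$raw     = '<p>some text goes here.</p>';
$encoded = '&lt;p&gt;some text goes here.&lt;/p&gt;';

// Decoding turns entities back into markup; raw markup is left alone.
$decodedRaw     = html_entity_decode($raw, ENT_QUOTES, 'UTF-8');
$decodedEncoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

// Escaping turns markup into entities; by default, text that is already
// entity-encoded gets escaped a second time (double encoding).
$escapedRaw     = htmlspecialchars($raw, ENT_QUOTES, 'UTF-8');
$escapedEncoded = htmlspecialchars($encoded, ENT_QUOTES, 'UTF-8');

echo $decodedEncoded, "\n"; // <p>some text goes here.</p>
echo $escapedEncoded, "\n"; // &amp;lt;p&amp;gt;some text goes here.&amp;lt;/p&amp;gt;
```

So after decoding, both forms end up as the same markup, while escaping keeps them distinguishable.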
Take a look at htmlspecialchars(). It will escape any HTML characters out for you.
http://php.net/manual/en/function.htmlspecialchars.php
Have you looked at the examples for htmlentities and strip_tags?
http://php.net/manual/en/function.htmlentities.php
http://php.net/manual/en/function.strip-tags.php
You can store the date at which your choice of encoding changes in the code, and then you can use the timestamp for each entry in the database (I assume you have a 'created'/'modified'/'timestamp' or similar field?) to decide which behaviour should be used when outputting the content.
Alternatively you could write a script to update the old entries in the database to the new format (e.g. calling htmlspecialchars() or html_entity_decode() on each of them).
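The timestamp idea might be sketched like this. Everything specific here is an assumption for illustration: the cutoff date, the function name, and the guess that older entries were stored entity-encoded while newer ones hold raw HTML.

```php
<?php
// ASSUMPTION: entries created before this made-up date were stored
// entity-encoded and should be decoded so the browser renders the tags;
// newer entries are stored as raw HTML and are output unchanged.
const ENCODING_CUTOFF = '2012-01-01 00:00:00';

// Hypothetical helper: pick the output behaviour from the entry's timestamp.
function renderEntry(string $content, string $createdAt): string
{
    $created = new DateTimeImmutable($createdAt);
    $cutoff  = new DateTimeImmutable(ENCODING_CUTOFF);

    return $created < $cutoff
        ? html_entity_decode($content, ENT_QUOTES, 'UTF-8')
        : $content;
}

echo renderEntry('&lt;p&gt;old entry&lt;/p&gt;', '2011-06-01 12:00:00'), "\n";
echo renderEntry('<p>new entry</p>', '2013-06-01 12:00:00'), "\n";
```

Keep in mind that echoing user-supplied HTML unescaped means trusting it, so some sanitising step would normally sit in front of this.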
