PHP clean or normalize html text - php

It seems my question was not asked well, so let me try again.
I have a chunk of HTML code inside a <textarea> tag, created with a WYSIWYG editor.
Before saving, on form submit, I would like to normalize the text by removing unnecessary whitespace and formatting the code properly with tabs, etc.
Here is the code:
June 27
17:51:58 Looking to meet girls from the Baltics Category: Dating > Contacts by interest Views: 7
I would like this text to be on one line, but with correct use of spaces, dots, etc. Is this possible in PHP without having to write a function?

Sanitizing text is rarely a one-line task, I'm afraid. However, the procedures for simple cleanup are fairly common. For example, $output = preg_replace("/\s+/", " ", $input); will collapse excess whitespace, but I would worry a lot more about possible malicious code injection through that <textarea> element.
Maybe you should give HTMLPurifier a look; it's quite complete even if it's still not HTML5 compliant. It will take care of most concerns about filtering content.
Hope that helps :)
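The whitespace step mentioned above can be sketched as follows. This is only a minimal sketch: collapseWhitespace() is a hypothetical helper name, the sample string is made up, and it deliberately does nothing about the injection concern, which still needs a proper HTML filter.

```php
<?php
// Hypothetical helper (not from the original answer): collapse every run of
// whitespace, including newlines and tabs, into a single space.
function collapseWhitespace(string $text): string
{
    // The "u" modifier makes PCRE treat pattern and subject as UTF-8
    $singleLine = preg_replace('/\s+/u', ' ', $text);

    // preg_replace() returns null on error (for example invalid UTF-8),
    // so fall back to the untouched input in that case
    return trim($singleLine ?? $text);
}

// Made-up sample with mixed newlines, tabs and repeated spaces
$input = "27 Июня\n17:51:58   Просмотров:\t7";
echo collapseWhitespace($input), "\n";
```

Since the question asks to avoid writing a function, the single preg_replace() call is the part that matters; the wrapper only adds the trim and the error fallback.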

Try strip_tags("string");

Related

PHP won't show new line [duplicate]

I have a small problem with text that is read from my database.
After the user has confirmed their new post, it is saved in the database like this (which is what I want).
But on the web page these line breaks are ignored, and everything is echoed on one line.
Here is part of my source code:
$objekttekst=str_replace("\\r\\n", "<br>", $obj->innhold);
$objekttittel=$obj->tittel;
?>
<h2><?=$objekttittel?></h2>
<p><?=$objekttekst?></p>
Could someone help me out? Thanks.
Use the nl2br() function.
$objekttekst = nl2br($obj->innhold);
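A small sketch of why the original str_replace() call has no effect while nl2br() works. The sample text is an assumption standing in for the database value.

```php
<?php
// Text as it arrives from a <textarea>: real CR/LF characters, not backslashes
$stored = "First line\r\nSecond line";

// "\\r\\n" in double quotes is the four literal characters \ r \ n,
// so this search string never matches the real line break above
$unchanged = str_replace("\\r\\n", "<br>", $stored);
var_dump($unchanged === $stored); // bool(true)

// nl2br() inserts <br /> before each line break and keeps the break itself
$converted = nl2br($stored);
echo $converted, "\n";
```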
The input textarea is pre-formatted, which means that it shows any newlines the user enters. However, an HTML renderer (the web browser) does not display newlines from the input unless line breaks are explicitly inserted with tags such as <BR>.
You have several options here. For sure these three are not your only options, but they are the ones I have personally been using most often.
Form textarea with pre-formatted text
If you want to display the data (objekttekst) in a similar textarea where the input was given, you could do:
<h2><?=$objekttittel?></h2>
<p><textarea><?=$objekttekst?></textarea></p>
This would suit you best in a situation where the user needs a possibility to edit the entry.
Preformatting
If you want to display the text as it is, you can always surround it with <PRE>...</PRE>. That will show any newlines, indentations etc. Note that this will make the output use a fixed-width font such as Courier New.
Convert newlines to <BR> tags
Use the nl2br() function, as already mentioned in another answer. See http://php.net/manual/en/function.nl2br.php for more information.
Additional note...
You might want to look into regular expressions, since in many cases you will want to make other modifications to your data before showing it in the HTML page. nl2br() takes care of newlines, but more complex modifications call for regular expressions.
You can surround your string with a <pre> tag instead of replacing \n with <br>.
Example:
<?php
$objekttekst=$obj->innhold;
$objekttittel=$obj->tittel;
?>
<h2><?=$objekttittel?></h2>
<pre><?=$objekttekst?></pre>

How to sanitize HTML POST values of NicEdit?

I recently started to use NicEdit on my "Article Entry" page. However, I have some questions about security and preventing abuse.
First question:
I currently sanitize every input with "mysql_real_escape_string()" in my database class. In addition, I sanitize HTML values with "htmlspecialchars(htmlentities(strip_tags($var)))".
How would you sanitize your "HTML inputs" when adding them to the database, or is the way I'm doing it fine?
Second question:
While writing this question, I saw a question with a similar title, so I read it. Someone was describing "abused HTML inputs" that mess up his valid template. (e.g. just input)
It may occur on my current system too. How should it be dealt with in PHP?
P.S. I want to keep using NicEdit, so switching to a BBCode system should be a last resort.
Thank you.
mysql_real_escape_string is not sanitization, it escapes text values to keep the syntax of the SQL query valid/unambiguous/injection safe.
strip_tags is sanitizing your string.
Doing both htmlentities and htmlspecialchars in order is overkill and may just garble your data. Since you're also stripping tags right before that, it's double overkill.
The rule is to make sure your data doesn't break your SQL syntax, therefore you mysql_real_escape_string once before putting the data into the query. You also do the same thing, protecting your HTML syntax, by HTML escaping text before outputting it into HTML, using either htmlspecialchars (recommended) or htmlentities, not both.
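To illustrate the "not both" point, here is a minimal sketch of what happens to a value that is HTML-escaped twice. The sample comment is made up.

```php
<?php
$comment = 'Tom & Jerry <b>rock</b>';

// Escaping once, at output time, is enough to keep the HTML syntax intact
$once = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// Escaping twice encodes the ampersands produced by the first pass again
$twice = htmlspecialchars(htmlentities($comment, ENT_QUOTES, 'UTF-8'), ENT_QUOTES, 'UTF-8');

echo $once, "\n";  // Tom &amp; Jerry &lt;b&gt;rock&lt;/b&gt;
echo $twice, "\n"; // Tom &amp;amp; Jerry &amp;lt;b&amp;gt;rock&amp;lt;/b&amp;gt;
```

A browser shows the first string as the text the user typed; the second one shows visible entity codes such as &amp; and &lt;, which is the garbling mentioned above.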
For a much more in-depth excursion into all this read The Great Escapism (Or: What You Need To Know To Work With Text Within Text).
I don't know NicEdit, but I assume it allows your users to style text using HTML behind the scenes. Why are you stripping the HTML from the data then? There's no point in using a WYSIWYG editor then.
This is a function I am using in one of my NicEdit applications, and it seems to do well with the code that comes out of NicEdit.
function cleanFromEditor($text) {
    // Try to decode the HTML before cleaning it, then submit it to the database
    $text = stripslashes(html_entity_decode($text));
    // Strip every tag that is not on this allow-list
    // (<i> is listed so that the <i> to <em> conversion below has something to convert)
    $text = strip_tags($text, '<p><div><strong><em><ul><ol><li><u><blockquote><br><sub><img><a><h1><h2><h3><span><b><i>');
    // Conversion elements
    $conversion = array(
        '<br>' => '<br />',
        '<b>' => '<strong>',
        '</b>' => '</strong>',
        '<i>' => '<em>',
        '</i>' => '</em>'
    );
    // Replace the old HTML with the new
    foreach ($conversion as $old => $new) {
        $text = str_replace($old, $new, $text);
    }
    // Note: mysql_real_escape_string() belongs to the legacy mysql extension (removed in PHP 7.0)
    return htmlentities(mysql_real_escape_string($text));
}

How to make a textarea output textarea code without breaking?

I created a form where users can enter HTML code, and it outputs their code in another textarea. The problem is that if the HTML the user enters contains a textarea, the </textarea> in their code breaks my textarea form. I see other sites display any HTML correctly, so how is this done without breaking the form, while still letting the user copy the code as real markup (not some converted code) so they can paste it on their webpage?
Ah crap yeah I figured it out, in fact the problem wasn't with the htmlspecialchars code alone I forgot to add a return to one of my functions haha. Thanks guys.
Represent characters that have special meaning in HTML using entities. Since you are using PHP, use htmlspecialchars.
There are many ways to do this. The easiest is to use htmlspecialchars or htmlentities on the user's input. This will show a visual </textarea> in the textarea box without closing it, because it actually turns it into &lt;/textarea&gt;. htmlspecialchars transforms fewer characters than htmlentities and usually makes more sense in a situation like this, but do your research.
strip_tags() is also a possibility.
You can also use a regular expression with PCRE, or even str_replace() or other string manipulation functions to strip off the textarea, convert the special characters, etc.
PECL also has a BBCode extension you can use if you still want your users to be able to enter some form of tags to style their output.
<textarea><?php echo htmlentities($code); ?></textarea>
You have to transform the html code into symbols, so it is not treated as html.
Use the function htmlentities() on the textarea content before echoing it.
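A minimal sketch of the escaping described in these answers, using htmlspecialchars(). The $code value is a made-up stand-in for the user's submission.

```php
<?php
// Hypothetical user input that contains its own closing textarea tag
$code = "<p>Hello</p>\n<textarea>inner</textarea>";

// Escape for the HTML context; the browser decodes the entities for display,
// so the user still sees and copies the original markup
$safe = htmlspecialchars($code, ENT_QUOTES, 'UTF-8');

echo '<textarea rows="5" cols="60">' . $safe . '</textarea>', "\n";
```

The page source contains &lt;/textarea&gt;, so the outer textarea is not closed early, while the text shown inside the box (and copied from it) is the unescaped original.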

PHP: Convert HTML for use in alt/title attribute in a tag and keep the formatting

Essentially I have this
<p>hello</p>
<p>So I wanted just to say hi</p>
<p>I hope its going well</p>
Coming from a db.
If I just strip the tags then in the title I get this
title="helloSo I wanted just to say hiI hope its going well"
What I want (like SO does it).
title="Hello
So I wanted just to say hi
I hope its going well"
I have tried using \n or \r and it just shows up in the title text.
title="Hello\n\rSo I wanted just to say hi\n\rI hope its going well"
Is this possible because of using Markdown and the way SO is saving the text?
I'm using TinyMCE and I have looked into ways of formatting the text, I've even tried using the output buffer to try and arrange the text how I want it.
Edit: Let's make this really clear and simple. It's going into a TITLE attribute!
If I can't get this to work, I'll just do this via a popup with jQuery.
Any help / advice appreciated :)
In PHP, the string '\r\n' is taken literally and consists of four characters. The string "\r\n" is interpreted as escape sequences and contains two characters, a carriage return and a line feed. Use double quotes if you want to insert a line break in a string.
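The difference between the two quoting styles, and one way to turn the paragraphs from the question into a multi-line title value, can be sketched like this. paragraphsToTitle() is a hypothetical helper, and whether the line breaks actually show up in the tooltip still depends on the browser.

```php
<?php
$single = '\r\n'; // four characters: backslash, r, backslash, n
$double = "\r\n"; // two characters: carriage return, line feed
echo strlen($single), ' ', strlen($double), "\n"; // prints "4 2"

// Hypothetical helper: one line of plain text per <p> element
function paragraphsToTitle(string $html): string
{
    // Put a real newline after each paragraph, then drop the tags
    $text = strip_tags(str_replace('</p>', "</p>\n", $html));
    // Collapse blank lines and surrounding spaces into single newlines
    $text = trim(preg_replace('/\s*\n\s*/', "\n", $text));
    // Escape for use inside a double-quoted attribute
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

$html = "<p>hello</p>\n<p>So I wanted just to say hi</p>\n<p>I hope its going well</p>";
echo '<span title="' . paragraphsToTitle($html) . '">hover me</span>', "\n";
```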
If I recall correctly, browsers won't render alt/title attributes with multiple lines? They will simply remove the line breaks.
If they do now, then PHP requires you to wrap \r, \n, \t and other escape sequences in double quotes, as Sjoerd suggested.

How to restrict a JavaScript script being inserted in the database with PHP?

I have a problem regarding data insertion in PHP.
My site has a message system.
When my inbox loads, it shows a JavaScript alert.
I searched my site thoroughly and finally found that someone had sent me a message with the text below.
<script>
alert(5)
</script>
So how can I restrict the script code being inserted in my database?
I am running on PHP.
There is no problem with JavaScript code being stored in the database. The actual problem is with non-HTML content being taken from the database and displayed to the user as if it were HTML. The correct approach would be to make sure your rendering code treats text as text, not as HTML.
In PHP, this would be done by calling htmlspecialchars on the inbox contents when displaying the inbox (possibly along with nl2br and maybe turning links to <a> tags).
Avoid using strip_tags for text content: as a user, I might want to type a message like:
... and to create a link, use <a href="...">your-text-here</a> ...
strip_tags would eliminate the tag, while htmlspecialchars would make the text appear as it was typed.
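A minimal sketch of that display-time approach, assuming a hypothetical renderMessage() helper that is called wherever the inbox prints a message body:

```php
<?php
// Hypothetical helper: treat the stored message as plain text when rendering
function renderMessage(string $body): string
{
    // Escape first, so the <br /> tags added by nl2br() are not escaped too
    return nl2br(htmlspecialchars($body, ENT_QUOTES, 'UTF-8'));
}

// The message from the question, stored in the database unchanged
$stored = "<script>\nalert(5)\n</script>";
echo renderMessage($stored), "\n";
```

The script is shown to the reader as text and never runs; nothing has to be filtered out before the INSERT.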
You should not prevent it from being inserted into the database (if Stack Overflow did that, we would not be able to post code examples here!).
Instead, control how you display it. For instance, add htmlentities() or htmlspecialchars() to your echo call.
This is called XSS. There are numerous threads about it on SO.
How to prevent XSS with HTML/PHP?
What are the best practices for avoid xss attacks in a PHP site?
XSS Attacks Prevention
Is preventing XSS and SQL Injection as easy as does this…?
You should use strip_tags. If you still want to allow some HTML, then add a whitelist in the second parameter.
I should add a really big caveat here. If you leave any tags in a strip_tags whitelist, you can still be susceptible to JavaScript injection. Assume you're allowing the seemingly innocuous tags <strong> and <em>:
strip_tags will still allow all attributes, including event handlers
like <strong onmouseover="window.location.href='http://mydodgysite.com'">this</strong>.
You have a couple of serious options:
strip_tags with no whitelist. Safe, but doesn't allow for any formatting, and may cause problems with strings like this: "x<y, but y>4" --> "x4"
htmlentities. Use this when displaying the data on the screen (not on the data before you put it in the database). It's safe, but doesn't allow for formatting.
A different markup system than HTML, for example: Markdown, Wiki markup, BB Code. Requires rendering to convert back to HTML, but it's mostly safe and can be quite flexible.
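The caveat about whitelisted tags can be checked with a short sketch. The input string is made up for illustration.

```php
<?php
$input = '<strong onmouseover="alert(1)">bold</strong><script>alert(5)</script>';

// <strong> is whitelisted, so the tag survives together with its attributes;
// the <script> tags are removed but the text between them is kept
$filtered = strip_tags($input, '<strong>');
echo $filtered, "\n";

// Escaping at output time turns everything into inert text instead
$escaped = htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";
```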
User input should be escaped before outputting it.
Whenever you're displaying something a user submitted, run it through htmlspecialchars() first. This'll turn HTML code into safe output.
Take a look at the htmlspecialchars() function. It converts <, >, " and & to their HTML entity equivalents (and ' as well when ENT_QUOTES is set), meaning <script> will become &lt;script&gt;.
You can use strip_tags(). The second argument of this function will allow you to list an explicit list of which tags are allowable:
// Allow <p> and <a>, <script> will be stripped
echo strip_tags($text, '<p><a>');
You may also consider htmlspecialchars(), which converts characters like < into &lt;, causing the browser to interpret them as text rather than code:
$new = htmlspecialchars("<a href='test'>Test</a>", ENT_QUOTES);
echo $new; // &lt;a href=&#039;test&#039;&gt;Test&lt;/a&gt;
If I understand you right, you're just looking for two simple commands:
$message = str_replace("<", "&lt;", $message);
$message = str_replace(">", "&gt;", $message);
