Spaces don't appear after nl2br(htmlentities($text))? - php

I am printing an article that contains runs of spaces.
The article text also contains HTML tags, so I am using htmlentities before echo.
But the problem is that the browser doesn't show the spaces.
What is the problem with these commands?
Can someone please suggest a better option?
DB update command:
mysql_real_escape_string($text, $db)
Article display command:
echo nl2br(htmlentities($row_page['text']));
Example: the displayed text is pretty ugly.
Real text, which I expect to be displayed the same way:
dbus-1/ libcom_err.so.2# libglib-2.0.so.0# liblvm2cmd.so.2.02* libpopt.so.0.0.0*
device-mapper/ libcom_err.so.2.1* libglib-2.0.so.0.2200.5* libm-2.12.so* libproc-3.2.8.so*
firmware/ libcrypt-2.12.so* libgmodule-2.0.so.0#

HTML collapses all whitespace (spaces, newlines or tabs) into a single space. You can work around it by replacing ' ' with '&nbsp;', for example:
echo str_replace(' ', '&nbsp;', nl2br(htmlentities($row_page['text'])));
But even cleaner is to just have the browser use preformatted whitespace:
<pre><?php echo htmlentities($row_page['text']); ?></pre>
Or alternatively use CSS for a bit of extra flexibility:
<div style="white-space: pre;"><?php echo htmlentities($row_page['text']); ?></div>
Preformatted whitespace has some drawbacks: for example, you can't have any newlines or indentation in your HTML file inside the pre element, because the browser will render them. But when you really need to control how something is rendered, it's the best choice.
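If neither <pre> nor the CSS rule is an option, the replacement approach above can be sketched as a small helper. This is only an illustration: the function name render_preformatted is made up here, and it swaps every second space for &nbsp; (rather than every space) so the browser can still wrap long lines.

```php
<?php
// Hypothetical helper (not from the original answer): escape the text,
// keep runs of spaces visible, and turn newlines into <br />.
function render_preformatted(string $text): string
{
    // Escape first so any HTML in the article is shown as text.
    $escaped = htmlentities($text, ENT_QUOTES, 'UTF-8');

    // Each pair of spaces becomes "space + &nbsp;", which the browser
    // cannot collapse but can still break a line on.
    $spaced = str_replace('  ', ' &nbsp;', $escaped);

    return nl2br($spaced);
}

echo render_preformatted("dbus-1/   libcom_err.so.2#\nfirmware/ libcrypt-2.12.so*");
```

Escaping happens before the replacement on purpose: if the order were reversed, htmlentities would turn the & of each &nbsp; into &amp;.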

Browsers collapse consecutive whitespace into a single space. That's the way it works, mainly so you can write source code like this:
<p>
Some very long text nicely indented and readable in source,
so it's easy to write for the author.
</p>
and it will display nicely in a browser like this:
Some very long text nicely indented and readable in source, so it's easy to write for the author.
To use pre-formatted text, wrap that section in a <pre> tag or use the equivalent CSS rule white-space: pre. The way you're escaping HTML makes this rather difficult of course. A markup language like Markdown may be the solution there.

Related

PHP output string and maintain spacing [duplicate]

Any ideas why formatted text from DB, when echo-ed out in php loses its formatting, i.e. no new lines? Thanks!
Use nl2br().
New lines are ignored by browser. That's why you see all text without line breaks. nl2br() converts new lines to <br /> tags that are displayed as new lines in browsers.
If you want to display your text in a <textarea>, you don't need to convert new lines to <br />. If you do, the "<br />" tags will show up as literal text where the line breaks were.
Because there are no html tags for formatting!
Try the nl2br function.
You could try add nl2br() function...
something like this: echo nl2br($your_text_variable);
It should work ;-)
The reason
This is the default behavior for all user agents. If you look at the page source, you'll see that your text has the same formatting as the one in the database (or textarea).
The reason for your confusion is probably that in one case you see the text in a <textarea>, which displays preformatted text and does not interpret tags, while in the other case the text is interpreted as HTML (where whitespace is not significant).
Browsers don't display new lines unless specifically asked to, using a <br> tag or any block-level tag.
No tags == no new lines.
The fix
If you store preformatted text in the database,
you should wrap the output in the <pre> tag.
You may want to convert the formatting characters to the HTML tags you need using a set of functions like nl2br, str_replace etc.
You may also correct your structure to store the HTML in the database instead of just plain text (however markup looks like a better solution).
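As a rough sketch of the first two options, assuming the stored value is plain text (the sample string below is made up), the same text can be prepared once for normal HTML output and once for a <textarea>:

```php
<?php
// Hypothetical stored value; in the question it would come from the database.
$stored = "First line\nSecond & last line";

// Normal HTML output: escape first, then turn newlines into <br />.
$asHtml = nl2br(htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'));

// Inside a <textarea> (or <pre>) only escaping is needed; newlines are kept as-is.
$asTextarea = '<textarea>' . htmlspecialchars($stored, ENT_QUOTES, 'UTF-8') . '</textarea>';

echo $asHtml, "\n", $asTextarea, "\n";
```

nl2br keeps the original newline after each <br /> it inserts, so the page source stays readable as well.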
See similar question:
How do I keep whitespace formatting using PHP/HTML?
The difference between the two images you show is that one has the text in a <textarea></textarea> and the other does not ... if you want 1:1: <textarea><?php echo $yourVariable;?></textarea>
It does output what you say to output. If the text is pre-formatted, put it inside the HTML <pre></pre> tag in your output script.
This should be helpful in answering.
How do I keep whitespace formatting using PHP/HTML?
Set up string preprocessing code for both the input to the database and the output to the display page.

Properly rendering stored HTML

A part of my site allows users to create comments in a text box to be stored in an SQL database. Because a lot of people copy/paste things in from word or other places, I have to keep <p> and <br> tags to keep formatting, and also <a> tags to let users create their own links. Everything else gets stripped out. I was accomplishing this like so:
$text = strip_tags( $text, '<br><a><p>' );
But today a user came to me and told me they lost a large portion of their text because they made an arrow <- for visual effect. So now I know strip_tags removes everything after a <.
I can accomplish a similar effect with preg_replace like so:
preg_replace('/((?!<((\/)?p|br|a))<[^>]*>)/', "", $text);
But this still has the downside of only working if the tag spans one line (I think), leaving in HTML comments and probably a few other things that I'm not aware of. What are my options? Is there a catch-all solution? A library I can use? I mostly work alone so I'm not really aware of industry standards.
Use HTML Purifier. It helps clean the submitted HTML and removes unwanted code before it is stored, for example a script tag that a user adds and that might harm your website (XSS attack). It also completes broken HTML: for example, if a user inputs < strong > gamer ... without closing the tag, it will close the tag and output cleaner HTML.
"I can accomplish a similar effect with preg_replace... But this still has the downside of only working if the tag spans one line (I think)." Not really! You could use some modifiers to make PHP regular expressions span multiple lines. Consider the example below with a multiline HTML string:
<?php
// $s IS A MULTILINE HTML SNIPPET CONTAINING THE FOLLOWING HTML TAGS
// <div>, <a>, <blockquote>, <em>, <strong>, <span>, <br />
$s = "<div class='one'>
<a href='/link.php'>
<blockquote>
There is real Power in the Hearts of men: not just Power but
\"something so much powerful than Power\" that Power itself begs to \"power down\".
</blockquote>
</a>
<p class='lv'>
This Power is not in the Head nor in the Intellect nor in the Skills of Man...
<em class='em1'>but in the deep recess of the Human Heart...</em>
and it speaks volumes yet only very few understand its language -
<strong>The Language of Love</strong>
- The Greatest Power You can have.... The Power to which nothing is Impossible!!!
</p>
<br />
<span>Do you know this Power? <--</span>
<strong>Do you Speak Love???</strong>
</div>";
// THIS CONCISE REGEX PATTERN REMOVES ALL HTML TAGS WITHIN THE MULTILINE STRING
// EXCEPT FOR TAGS LIKE: <a> <p> <br />
// IT WOULD ALSO LEAVE <- OR <-- OR <------ UNTOUCHED
$r = preg_replace("#<(?!\/[ap]|[ap\-]|br).*?>#si", "", $s);
echo ($r);
If you viewed the Source Code, You would observe that all HTML Tags except for <br>, <p>, <a> and Symbols like <-- were stripped out. In effect, the Source would look something like this:
<a href='/link.php'>
There is real Power in the Hearts of men: not just Power but
"something so much powerful than Power" that Power itself begs to "power down".
</a>
<p class='lv'>
This Power is not in the Head nor in the Intellect nor in the Skills of Man...
but in the deep recess of the Human Heart...
and it speaks volumes yet only very few understand its language -
The Language of Love
- The Greatest Power You can have.... The Power to which nothing is Impossible!!!
</p>
<br />
Do you know this Power? <--
Do you Speak Love???
Cheers and Good-Luck...
If your case is simple as how you showed us in your question, I won't go with external libraries like HTML Purifier.
The strip_tags() function has its own way to determine tags. One case where it doesn't consider a < a real tag is when it's followed by a space. By space I mean any character from 0x09 to 0x0d as well as 0x20 (that is how the internal isSpace() function works when called from php_strip_tags_ex()).
So a workaround could be putting one of those allowed spaces between <- characters and then revert it after doing a strip_tags() but you'd better take care of not only a < character followed by - but any < character followed by a [^a-zA-Z!?\s] character (a character which is not an alphabet, ! and ? marks, \s any kind of white-space characters (spaces are fine!))
I'd like to choose my space character to be a carriage-return \r which is 0x0D in hex. That is more specific:
$text = preg_replace( "~<\r([^a-zA-Z!?\s])~", '<$1', strip_tags( preg_replace( '~<([^a-zA-Z!?\s])~', "<\r\$1", $text ), '<p><a><br>' ) );
I recommend encoding the data that the user submits and then removing the tags you don't allow. This way you won't remove text that merely looks like a tag.
Please note that running a complex regular expression on a big string is not efficient.
Take the input from the user and encode it, so instead of <p> you will save &lt;p&gt;. You can then insert it into the page as HTML and it will be displayed as text rather than interpreted as tags, so you don't need to remove anything.
You can use htmlspecialchars(string) for this.
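A minimal sketch of that idea follows. The sample comment is made up, and the second step (turning attribute-free <p> and <br> back into real tags) is an assumption about how "allowed" tags might be restored, not something the answer spells out. Links would need separate, careful handling because of their href attribute.

```php
<?php
// Hypothetical user input containing an arrow, allowed tags and a script tag.
$comment = '<p>Hi <- there</p><script>x</script>';

// Step 1: encode everything, so nothing the user typed is treated as markup.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// Step 2 (assumption): re-enable only bare <p>, </p> and <br> tags.
// Anything with attributes, or any other tag name, stays encoded.
$restored = preg_replace('~&lt;(/?)(p|br)\s*/?&gt;~i', '<$1$2>', $safe);

echo $restored, "\n";
```

Because the text is encoded before anything is removed, the arrow survives intact instead of swallowing the rest of the comment.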

Form deleting spaces

<form action="class.php" method="POST">
thread link:
<br>
<input type="text" name="thread">
<input type="submit" value="Submit">
</form>
I have this simple form. Upon entering a string starting with many spaces, something like
"          test"
my PHP code
echo 'test:'.$_POST['thread'];
will print test: test. It will erase all spaces except one.
Where did all the spaces go and why does this happen?
The HTML specification says that the renderer collapses multiple spaces into one. That is useful in some cases. To avoid it, you can place the content of this field in a <pre></pre> block, like this:
echo '<pre>test:'.$_POST['thread'].'</pre>';
The form does not delete spaces. Neither does your PHP code. The spaces are still there in the resulting HTML document (generated by your PHP code in response to the form submission). They just get rendered as a single space, since in most contexts, any sequence of whitespace characters in HTML content is equivalent to a single space. This is defined in the CSS 2.1 spec, in the description of the white-space property.
Thus, to prevent the collapse of spaces, the simple way is to set white-space: pre in CSS. It also prevents automatic line wrapping in the content, but this is probably not a problem here. Using the pre element in HTML causes this setting, but it also sets the font family to monospace.
So this is just a matter of HTML and CSS, independently of PHP. Example:
<p> Hello world!</p>
<p style="white-space: pre"> Hello world!</p>
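Applied to the form from the question, the CSS approach could look roughly like this. The helper name show_with_spaces is invented for the example, and it uses white-space: pre-wrap instead of pre so that long values can still wrap. The submitted value is also escaped, which the original echo does not do.

```php
<?php
// Hypothetical helper: echo a submitted value with its spaces preserved.
function show_with_spaces(string $label, string $value): string
{
    // Escape the user-supplied part so it cannot inject markup.
    $safe = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

    // pre-wrap keeps runs of spaces but still allows line wrapping.
    return '<span style="white-space: pre-wrap">' . $label . $safe . '</span>';
}

// Falls back to a sample value when the form has not been submitted.
echo show_with_spaces('test:', $_POST['thread'] ?? '     test');
```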
You need to convert the spaces to HTML entities
$thread = str_replace(' ', '&nbsp;', $_POST['thread']);
and now echo 'test:'.$thread will output your text with the spaces preserved.
This is the most basic thing about HTML. Any whitespace is equivalent and is treated as a single space.
You should never use multiple spaces to try to lay out your text in HTML (like you could do in Word, for instance). You should use CSS properties like margin or padding instead.
The answers that propose replacing the spaces with &nbsp; are correct, but they leave you on the wrong track.

Stripping input to complete plain text

I am currently finalising the code for my comment system, and I want it to work a little like Stack Overflow does with its posts. I would like my users to be able to use bold, italic and underscore only, and to do that I would use the following:
_ Text _ * BOLD * -Italic-
Firstly, I would like to know a way of stripping a comment completely clean of any tags, HTML entities and such, so that, for example, if a user were to use any HTML or PHP tags, they would be removed from the input.
I am currently using strip_tags, but that can leave the output looking quite nasty. Even if an abusive or blatant XSS/injection attempt has been made, I would still like the plain text to be output in full and not chopped up, as strip_tags seems to make an absolute mess of it.
What I will then do is replace the asterisks with bold HTML tags, and so on, AFTER stripping the content clean of HTML tags.
How do people suggest I do this? Currently this is the comment sanitize function:
function cleanNonSQL( $str )
{
return strip_tags( stripslashes( trim( $str ) ) );
}
PHP tags are surrounded by <? and ?>, or maybe <% and %> on some ages-old installations, so removing PHP tags can be managed by a regex:
$cleaned=preg_replace('/\<\?.*?\?\>/', '', $dirty);
$cleaned=preg_replace('/\<\%.*?\%\>/', '', $cleaned);
Next you take care of the HTML tags: These are surrounded by < and >. Again you can do this with a regex
$cleaned=preg_replace('/\<.*?\>/','',$cleaned);
This will transform
$dirty='blah blah blah <?php echo $this; ?> foo foo foo <some> html <tag> and <another /> bar bar';
into
$cleaned="blah blah blah  foo foo foo  html  and  bar bar";
You could try using regular expressions to strip the tags, such as:
preg_replace("/\<(.+?)\>/", '', $str);
Not sure if that's what you're looking for, but it will remove anything inside < and >. You can also make it a little more foolproof by requiring the first character after the < to be a letter.
The correct way is not to delete html tags from your user's comment, but to tell the browser that the following text should not be interpreted as HTML, Javascript, whatever. Imagine someone wants to post example code like we do here on stackoverflow. If you just bluntly remove any parts of a comment that seem to be code, you will mess up the user's comment.
The solution is to use htmlentities which will escape symbols used for html markup in the comment so that it will actually show up as just text in the browser.
For example, the browser will interpret a < as the beginning of an HTML tag. If you just want the browser to display a <, you have to write &lt; in the source code. htmlentities will convert all the relevant symbols into their HTML entities for you.
Longer Example
echo htmlentities("<b>this text should not be bold</b><?php echo PHP_SELF;?>");
Outputs
&lt;b&gt;this text should not be bold&lt;/b&gt;&lt;?php echo PHP_SELF;?&gt;
The browser will display
<b>this text should not be bold</b><?php echo PHP_SELF;?>
Consider the following real-life example with the solution you accepted. Imagine a user writing this comment:
i'm in a bad mood today :<. but your blog made me really happy :>
You will now do your preg_replace("/\<(.+?)\>/", '', $comment); on the text and it will remove half the comment:
i'm in a bad mood today :
If that's what you wanted, never mind this answer. If you don't, use htmlentities.
If you want to save the comment as a file and not have the server interpret PHP code inside it, save it with an extension like '.html' or '.txt', so that the web server won't call the PHP interpreter in the first place. There is usually no need to escape PHP code.
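Putting this together with the markers from the question, one possible shape for the comment function is to escape first and only then convert the markers. This is a sketch under assumptions: the name format_comment and the exact regular expressions are illustrative, and the marker rules (no nesting, no markers spanning lines, underscores and hyphens inside words left alone) are simplifications.

```php
<?php
// Hypothetical replacement for cleanNonSQL(): escape everything, then
// convert *bold*, _underline_ and -italic- markers into HTML tags.
function format_comment(string $raw): string
{
    // After this call no user-supplied < or > can start a tag.
    $safe = htmlspecialchars(trim($raw), ENT_QUOTES, 'UTF-8');

    $rules = [
        '~\*([^*\n]+)\*~'                  => '<b>$1</b>',
        // Lookarounds keep snake_case words and hyphenated-words untouched.
        '~(?<!\w)_([^_\n]+)_(?!\w)~'       => '<u>$1</u>',
        '~(?<![\w-])-([^-\n]+)-(?![\w-])~' => '<i>$1</i>',
    ];

    return preg_replace(array_keys($rules), array_values($rules), $safe);
}

echo format_comment("*bold* and _under_ and -ital- :< not a tag :>"), "\n";
```

The only tags in the result are the ones the function itself inserts, so a comment like the ":<" example above is shown in full rather than cut in half.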

PHP code line break `\n` causing gap between elements

I'm echoing a series of HTML elements using PHP. I'm using \n to cause code line breaks to make the source code more organized and legible.
For some reason, the use of \n in a specific location is causing a mysterious gap between the HTML elements. In Firebug, this gap does not show up as a margin or padding, but rather as just a gap.
Here is the PHP in question:
Note: As you can see, I have removed all of the PHP inside the tags as I'm pretty sure it is not relevant to this problem.
echo '<ul ... >'."\n";
while($row = mysql_fetch_assoc($result_pag_data)) {
echo '<li><a ... >'."\n".
'<img ... >'."\n".
'</a></li>'."\n"; // <---- THIS IS THE \n THAT SEEMS TO BE CAUSING THE GAP
}
echo '</ul>'."\n";
Have you ever seen anything like this before, a presentation gap associated with PHP line breaks?
If so, what is the reason for it?
Is it really that important that I use \n in my code?
That's normal. A \n line break has no meaning in HTML, so it's interpreted as a space character. If you don't want that gap, then eliminate the \n, or rewrite the html so it's not relevant:
<li><a ...><img ...></a></li>
As a general rule, tags which can contain text should never have their closing tags on a line by themselves, for this very reason.
Following up on your 'where to put \n' question. This comes down to personal preference, but I tend to format my html like this:
<table>
<tr>
<td><a href="some big long ugly url">
<img ....></a></td>
</tr>
Since <tr> can't contain any text on its own (in valid HTML), it's OK to put it on its own line. But the </a> and </td> are both closing tags of elements that CAN contain text, so I put them right up against the end of the 'text' (the img tag in this case), so that the Phantom Linebreak Menace (coming soon to a Star Wars ripoff near you) can't strike.
Note, of course, that my example does have a line break and indentation between the opening <a> and the <img> tag, so that's another place where a "must be right next to each other" layout would cause a gap. If you need a series of things lined up smack dab against each other, then you basically can't use line breaks anywhere in that section of the page.
The whitespace is translated into (empty) HTML text nodes, which take up some space (you can test this by walking the DOM). There is no solution to make these disappear that I know of other than removing the whitespace from your HTML in the first place.
Of course it's not only \n that would cause this behavior; spaces or tabs would do exactly the same as well.
In that particular case the newlines are used to prettify the html source, keep it readable via view-source. That's quite common actually. (Yet redundant.)
As said by the other answers, it normally has no meaning. However, this can be overridden via the CSS white-space property (which we can assume is not the case here):
white-space: pre-line;
You should only output a newline where you in fact want a newline in the output. In HTML, a newline is whitespace, just like the space character.
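One way to follow this advice without giving up readable PHP is to build each item as a single string and join the items with an explicit separator. The sketch below is an assumption-laden illustration: the function name render_gallery, the sample file names and the use of the image path as the link target are all made up, since the original tags were elided.

```php
<?php
// Hypothetical helper: emit <li> items with no whitespace inside or between
// them, so the browser has no text nodes to render as gaps.
function render_gallery(array $sources, string $separator = ''): string
{
    $items = [];
    foreach ($sources as $src) {
        $safe = htmlspecialchars($src, ENT_QUOTES, 'UTF-8');
        // Opening and closing tags sit directly against the <img>.
        $items[] = '<li><a href="' . $safe . '"><img src="' . $safe . '" alt=""></a></li>';
    }

    // Pass "\n" as the separator only where whitespace between items is harmless.
    return '<ul>' . implode($separator, $items) . '</ul>' . "\n";
}

echo render_gallery(['a.png', 'b.png']);
```

The readable structure then lives in the PHP source rather than in the generated HTML.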
