Form deleting spaces - php

<form action="class.php" method="POST">
thread link:
<br>
<input type="text" name="thread">
<input type="submit" value="Submit">
</form>
I have this simple form. Upon entering a string starting with many spaces, something like
"     test"
my PHP code
echo 'test:'.$_POST['thread'];
will print test: test. It will erase all spaces except one.
Where did all the spaces go and why does this happen?

The HTML specification says that the renderer collapses multiple spaces into one. That is useful in some cases. To avoid it, you can place the content of this field in a <pre></pre> block, like this:
echo '<pre>test:'.$_POST['thread'].'</pre>';

The form does not delete spaces. Neither does your PHP code. The spaces are still there in resulting HTML document (generated by your PHP code in response to form submission). They just get rendered as a single space, since in most contexts, any sequence of whitespace characters in HTML content is equivalent to a single space. This is defined in CSS 2.1 spec, in the description of the white-space property.
Thus, to prevent the collapse of spaces, the simple way is to set white-space: pre in CSS. It also prevents line breaks in the content, but this is probably not a problem here. Using the pre element in HTML causes this setting, but it also sets font family to monospace.
So this is just a matter of HTML and CSS, independently of PHP. Example:
<p>   Hello     world!</p>
<p style="white-space: pre">   Hello     world!</p>
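The same idea can be applied to the PHP output from the question. This is only a minimal sketch: the helper name render_preserving_spaces is made up for illustration, the sample value stands in for $_POST['thread'], and htmlspecialchars() is added on the assumption that user input should not be able to inject markup.

```php
<?php
// Hypothetical helper: wraps a user-supplied string in a span whose
// white-space is set to "pre", so runs of spaces are rendered as typed.
function render_preserving_spaces(string $value): string
{
    // Escape first so characters like < and & are shown literally.
    $escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return '<span style="white-space: pre">' . $escaped . '</span>';
}

// In the question's script this would be $_POST['thread'].
$thread = '     test';
echo 'test:' . render_preserving_spaces($thread);
```

A span is used here instead of a pre element so that the font family stays unchanged.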

You need to convert the spaces to HTML entities:
$thread = str_replace(' ', '&nbsp;', $_POST['thread']);
and now echo 'test:'.$thread will output your text with its whitespace intact.
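If the value also needs to be escaped for output, the order of the two steps matters. A rough sketch, assuming UTF-8 input; the function name spaces_to_nbsp is hypothetical:

```php
<?php
// Hypothetical helper: escapes the text, then turns every space into a
// non-breaking space entity so the browser cannot collapse them.
function spaces_to_nbsp(string $value): string
{
    // Escaping must come first; otherwise the "&" of "&nbsp;" would
    // itself be escaped to "&amp;nbsp;".
    $escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    return str_replace(' ', '&nbsp;', $escaped);
}

echo 'test:' . spaces_to_nbsp('   test');
```

Keep in mind that non-breaking spaces also stop the browser from wrapping the line at those positions.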

This is the most basic thing about HTML. Any run of whitespace is equivalent and is treated as a single space.
You should never use multiple spaces to try to lay out your text in HTML (like you could do in Word, for instance). You should use CSS properties like margin or padding instead.
The answers that propose replacing the spaces with &nbsp; are correct, but they leave you on the wrong track.

Related

Properly rendering stored HTML

A part of my site allows users to create comments in a text box to be stored in an SQL database. Because a lot of people copy/paste things in from word or other places, I have to keep <p> and <br> tags to keep formatting, and also <a> tags to let users create their own links. Everything else gets stripped out. I was accomplishing this like so:
$text = strip_tags( $text, '<br><a><p>' );
But today a user came to me and told me they lost a large portion of their text because they made an arrow <- for visual effect. So now I know strip_tags removes everything after a <.
I can accomplish a similar effect with preg_replace like so:
preg_replace('/((?!<((\/)?p|br|a))<[^>]*>)/', "", $text);
But this still has the downside of only working if the tag spans one line (I think), leaving in HTML comments and probably a few other things that I'm not aware of. What are my options? Is there a catch-all solution? A library I can use? I mostly work alone, so I'm not really aware of industry standards.
Use HTML Purifier. It helps clean the submitted HTML and removes unwanted code before it is stored; for example, if a user adds a script tag that might harm your website (an XSS attack), HTML Purifier removes it. It also completes broken HTML: if a user inputs <strong>gamer ... without closing the tag, it will close the tag and output cleaner HTML.
"I can accomplish a similar effect with preg_replace... But this still has the downside of only working if the tag spans one line (I think)."
Not really! You can use pattern modifiers to make PHP regular expressions span multiple lines. Consider the example below with a multiline HTML string:
<?php
// $s IS A MULTILINE HTML SNIPPET CONTAINING THE FOLLOWING HTML TAGS
// <div>, <a>, <blockquote>, <em>, <strong>, <span>, <br />
$s = "<div class='one'>
<a href='/link.php'>
<blockquote>
There is real Power in the Hearts of men: not just Power but
\"something so much powerful than Power\" that Power itself begs to \"power down\".
</blockquote>
</a>
<p class='lv'>
This Power is not in the Head nor in the Intellect nor in the Skills of Man...
<em class='em1'>but in the deep recess of the Human Heart...</em>
and it speaks volumes yet only very few understand its language -
<strong>The Language of Love</strong>
- The Greatest Power You can have.... The Power to which nothing is Impossible!!!
</p>
<br />
<span>Do you know this Power? <--</span>
<strong>Do you Speak Love???</strong>
</div>";
// THIS CONCISE REGEX PATTERN REMOVES ALL HTML TAGS WITHIN THE MULTILINE STRING
// EXCEPT FOR TAGS LIKE: <a> <p> <br />
// IT WOULD ALSO LEAVE <- OR <-- OR <------ UNTOUCHED
$r = preg_replace("#<(?!\/[ap]|[ap\-]|br).*?>#si", "", $s);
echo ($r);
If you view the source, you will see that all HTML tags except <br>, <p> and <a> were stripped out, while symbols like <-- were left alone. In effect, the source would look something like this:
<a href='/link.php'>
There is real Power in the Hearts of men: not just Power but
"something so much powerful than Power" that Power itself begs to "power down".
</a>
<p class='lv'>
This Power is not in the Head nor in the Intellect nor in the Skills of Man...
but in the deep recess of the Human Heart...
and it speaks volumes yet only very few understand its language -
The Language of Love
- The Greatest Power You can have.... The Power to which nothing is Impossible!!!
</p>
<br />
Do you know this Power? <--
Do you Speak Love???
Cheers and Good-Luck...
If your case is as simple as you showed in your question, I wouldn't go with external libraries like HTML Purifier.
The strip_tags() function has its own way of recognizing tags. One case in which it doesn't consider a < to be a real tag is when it's followed by a space. By space I mean any character from 0x09 to 0x0d as well as 0x20 (this is how the isSpace() internal function works when called from php_strip_tags_ex()).
So a workaround could be to put one of those space characters between the < and - characters and then revert it after running strip_tags(). You'd better take care not only of a < character followed by -, but of any < character followed by a [^a-zA-Z!?\s] character (one that is not a letter, a ! or ? mark, or \s, any kind of white-space character; spaces are already fine).
I'd choose the carriage return \r, which is 0x0D in hex, as my space character, since it is more specific. Note the doubled backslash in \\1: inside a double-quoted PHP string, a bare \1 would be read as an octal escape instead of a backreference:
$text = preg_replace( "~<\r([^a-zA-Z!?\s])~", "<\\1", strip_tags( preg_replace( '~<([^a-zA-Z!?\s])~', "<\r\\1", $text ), '<p><a><br>' ) );
I recommend that you encode the data the user submits and then remove the tags you don't allow. This way you won't remove tags that appear normally on the page.
Please note that running a complex regular expression on a big string is not efficient.
Take the input from the user and encode it, so that instead of <p> you will save &lt;p&gt;. You can then insert it into the page as HTML: it will be displayed as text rather than interpreted as actual tags, so you don't need to remove anything.
You can use htmlspecialchars(string) for this.
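To illustrate the encoding step, here is a small sketch with made-up sample input. It assumes UTF-8 text and passes ENT_QUOTES so that both kinds of quotes are encoded as well:

```php
<?php
// Sample comment containing an arrow and a script tag (made-up input).
$comment = 'See here <- this way <script>alert(1)</script>';

// Convert the special characters to entities before output.
$safe = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// The arrow survives and the script tag is shown as text, not executed.
echo $safe;
```

The trade-off is that tags the user typed on purpose, such as <p>, are displayed literally too.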

text box submits empty html tags to sql database

This one is kind of complicated to explain but here we go
I have a text box set up as an iframe so I can allow people to make their text bold or italic before submitting it to my database. I'm working in PHP and SQL.
I've just discovered that if you enter a bunch of blank lines, I get a bunch of
<br><br><br>
etc. stored in the database.
I already have functions in place to strip out all unwanted HTML apart from paragraph and line break tags, and of course bold and italic, but what I now need is a function to check whether the content is entirely HTML tags with no actual text in between them.
I've no idea how to go about this.
I'd like to allow something like
<br> I am <br><br><br>
but not
<br> <br> <br>
or
<br> <br><br><br>
or something similar: empty tags, or tags with just white space. How would I go about this?
I think I'm pretty clear on my problem without pasting any actual code as such, but I'm happy to edit this if you want :)
Cheers
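Just remove all HTML tags and trim the remaining string. If nothing is left, there is no content. Comparing against '' rather than using empty() means that a comment consisting only of "0" still counts as content:
if (trim(strip_tags($your_string)) === '')
{
// no content
}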
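A slightly more defensive variant is sketched below. It assumes the editor may also emit &nbsp; entities for blank lines, which trim() alone would not remove; the function name has_no_text is hypothetical:

```php
<?php
// Hypothetical helper: returns true when the HTML contains no visible text.
function has_no_text(string $html): bool
{
    // Drop the tags, then turn entities such as &nbsp; into characters.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Remove non-breaking spaces (U+00A0), which trim() does not strip.
    $text = str_replace("\u{00A0}", '', $text);
    return trim($text) === '';
}

var_dump(has_no_text('<br> <br><br><br>'));      // tags and spaces only
var_dump(has_no_text('<br> I am <br><br><br>')); // contains real text
```

The "\u{00A0}" escape needs PHP 7 or later.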

Spaces doesn't come after nl2br(htmlentities($text))?

I am printing an article with spaces inside the article.
The text inside the article also contains HTML tags, so I am using htmlentities before echo.
But the problem is that the browser doesn't show the spaces.
What is the problem with these commands?
Can someone please suggest a better option?
DB update command:
mysql_real_escape_string($text, $db)
Article display command:
echo nl2br(htmlentities($row_page['text']));
Example: the displayed text is pretty ugly.
This is the real text, and I expect it to be displayed the same way:
dbus-1/ libcom_err.so.2# libglib-2.0.so.0# liblvm2cmd.so.2.02* libpopt.so.0.0.0*
device-mapper/ libcom_err.so.2.1* libglib-2.0.so.0.2200.5* libm-2.12.so* libproc-3.2.8.so*
firmware/ libcrypt-2.12.so* libgmodule-2.0.so.0#
HTML collapses all whitespace (spaces, newlines or tabs) into a single space. You can work around it by replacing ' ' with '&nbsp;', for example:
echo str_replace(' ', '&nbsp;', nl2br(htmlentities($row_page['text'])));
But even cleaner is to just have the browser use pre formatted whitespace:
<pre><?php echo htmlentities($row_page['text']); ?></pre>
Or alternatively use CSS for a bit of extra flexibility:
<div style="white-space: pre;"><?php echo htmlentities($row_page['text']); ?></div>
Pre formatted whitespace has some drawbacks, for example you can't have any newlines or indentation in your HTML file when you're using pre, because the browser will render them. But when you really need to control how something is rendered it's the best choice.
Browsers collapse continuous whitespace into a single space. That's the way it works, mainly so you can write source code like this:
<p>
Some very long text nicely indented and readable in source,
so it's easy to write for the author.
</p>
and it will display nicely in a browser like this:
Some very long text nicely indented and readable in source, so it's easy to write for the author.
To use pre-formatted text, wrap that section in a <pre> tag or use the equivalent CSS rule white-space: pre. The way you're escaping HTML makes this rather difficult of course. A markup language like Markdown may be the solution there.

PHP code line break `\n` causing gap between elements

I'm echoing a series of HTML elements using PHP. I'm using \n to cause code line breaks to make the source code more organized and legible.
For some reason, the use of \n in a specific location is causing a mysterious gap between the HTML elements. In firebug, this gap is not showing up as a margin, or padding, but rather just a gap.
Here is the PHP in question:
Note: As you can see, I have removed all of the PHP inside the tags as I'm pretty sure it is not relevant to this problem.
echo '<ul ... >'."\n";
while($row = mysql_fetch_assoc($result_pag_data)) {
echo '<li><a ... >'."\n".
'<img ... >'."\n".
'</a></li>'."\n"; // <---- THIS IS THE \n THAT SEEMS TO BE CAUSING THE GAP
}
echo '</ul>'."\n";
Have you ever seen anything like this before, a presentation gap associated with PHP line breaks?
If so, what is the reason for it?
Is it really that important that I use \n in my code?
That's normal. A \n line break has no meaning in HTML, so it's interpreted as a space character. If you don't want that gap, then eliminate the \n, or rewrite the html so it's not relevant:
<li><a ...><img ...></a></li>
As a general rule, tags which can contain text should never have their closing tags on a line by themselves, for this very reason.
Following up on your 'where to put \n' question. This comes down to personal preference, but I tend to format my html like this:
<table>
<tr>
<td><a href="some big long ugly url">
<img ....></a></td>
</tr>
Since <tr> can't contain any text on its own (in valid HTML), it's OK to put it on its own line. But the </a> and </td> are both tags that CAN contain text, so I put them right up against the end of the 'text' (the img tag in this case), so that the Phantom Linebreak Menace (coming soon to a Star Wars ripoff near you) can't strike.
Note, of course, that my example does have a line break and indentation between the opening <a> and the <img> tag, so that's another place where a "must be right next to each other" layout would cause a gap. If you need a series of things lined up smack dab against each other, then you basically can't use line breaks anywhere in that section of the page.
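One way to keep the PHP readable while emitting no whitespace between the items is to collect them in an array and join them with an empty string. This is only a sketch: the function name render_gallery and the 'href' and 'src' array keys are made up, since the real attributes were elided in the question.

```php
<?php
// Hypothetical helper: builds the list with no whitespace between items.
// Each item is an array with 'href' and 'src' keys (made-up structure).
function render_gallery(array $items): string
{
    $parts = [];
    foreach ($items as $item) {
        $href = htmlspecialchars($item['href'], ENT_QUOTES, 'UTF-8');
        $src  = htmlspecialchars($item['src'], ENT_QUOTES, 'UTF-8');
        $parts[] = '<li><a href="' . $href . '"><img src="' . $src . '" alt=""></a></li>';
    }
    // Joining with '' keeps adjacent items flush against each other; the
    // only newlines sit before the first item and after the last one.
    return "<ul>\n" . implode('', $parts) . "\n</ul>\n";
}

echo render_gallery([
    ['href' => '/a', 'src' => 'a.png'],
    ['href' => '/b', 'src' => 'b.png'],
]);
```

The PHP source stays nicely indented, and only the generated HTML loses its line breaks between the items.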
The whitespace is translated into (empty) HTML text nodes, which take up some space (you can test this by walking the DOM). There is no solution to make these disappear that I know of other than removing the whitespace from your HTML in the first place.
Of course it's not only \n that would cause this behavior; spaces or tabs would do exactly the same as well.
In that particular case the newlines are used to prettify the html source, keep it readable via view-source. That's quite common actually. (Yet redundant.)
As the other answers said, it normally has no meaning. This can, however, be overridden via CSS with the following property (which we can assume is not the case here):
white-space: pre-line;
You should only output a newline where you in fact want a newline in the output. In HTML, a newline is whitespace, just like the space character.

Escape all HTML except <br>

I am trying to display comments on a page and am having some trouble.
There are essentially two different types of comments I am trying to handle:
(1) The XSS type.. e.g. <script type="text/javascript">alert('hi')</script>. This is handled fairly easily by escaping it before it gets into the database and then running stripslashes and htmlentities on it.
(2) The comment with <br> breaks in it. When the data is stored into the database, I am running nl2br on it so the data looks like hi<br>hello<br><br>etc. However, when I display this comment, the <br>s do not turn into page breaks like I want them to.
Any idea what to do? I should note that turning off htmlentities fixes the second type, but the first type then is executed as pure html and displays an alert dialog.
Thanks,
Phil
If you want to remove unwanted tags you can try strip_tags. It supports allowable_tags so you can specify any tags that you don't want to be stripped. A sample from the manual:
// Allow <p> and <a>
// you can add <br> if you want it not stripped
echo strip_tags($text, '<p><a>');
So after you've converted all \n to line breaks, you don't have to worry about them being stripped. This may not be exactly what you want, but I hope it gives you an idea.
One method: Replace <br> with a placeholder, like \n. Then do htmlentities to clean up html code. Finally, replace \n back with <br> to recover the line breaks.
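The placeholder method could look roughly like this. It is a sketch under a few assumptions: the stored text is UTF-8, the tags may appear as <br>, <br/> or <br />, and the name escape_keeping_br is made up.

```php
<?php
// Hypothetical helper following the placeholder idea: <br> becomes a newline,
// everything is escaped, and the newlines are then turned back into <br>.
function escape_keeping_br(string $comment): string
{
    // nl2br() leaves the original newline after the tag, so absorb it too.
    $withNewlines = preg_replace('~<br\s*/?>(\r\n|\r|\n)?~i', "\n", $comment);
    $escaped = htmlentities($withNewlines, ENT_QUOTES, 'UTF-8');
    // Only the placeholder newlines become tags again.
    return str_replace("\n", '<br>', $escaped);
}

echo escape_keeping_br("hi<br>hello<script>alert('hi')</script>");
```

Any newline that was already in the text without a <br> will become a line break as well, which is usually what is wanted for comments.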
