I am using a rich text box control to post some data on one page,
and I am saving the data to my DB table with the HTML markup, e.g.: This is <b>my bold</b> text
I am displaying the first 50 characters of this column on another page. If I save a sentence longer than 50 characters with the bold tag applied, then trimming it to the first 50 characters on the other page can cut off the closing b tag (</b>), so the bold gets applied to the rest of the content on that page.
How can I solve this? How can I check which open tags are not closed? Is there an easy way to do this in PHP? Is there a function that removes all the HTML tags/markup and gives me the sentence as plain text?
http://php.net/strip_tags
The strip_tags function will remove any tags you might have.
Yes
$textWithoutTags = strip_tags($html);
I generally use HTML::Truncate for this. Of course, being a Perl module, you won't be able to use it directly in your PHP - but the source code does show a working approach (which is to use an HTML parser).
An alternative approach might be to truncate as you are doing at the moment, and then try to fix the result using Tidy.
If you want the HTML tags to remain, but be closed properly, see PHP: Truncate HTML, ignoring tags. Otherwise, read on:
strip_tags will remove HTML tags, but not HTML entities (such as &amp;), which could still cause problems if truncated.
To handle entities as well, one can use html_entity_decode to decode entities after stripping tags, then trim, and finally reencode the entities with htmlspecialchars:
$text = "1 &lt; 2\n";
print $text;
print htmlspecialchars(substr(html_entity_decode(strip_tags($text), ENT_QUOTES), 0, 3));
(Note use of ENT_QUOTES to actually convert all entities.)
Result (as rendered in the browser):
1 < 2
1 <
Footnote: In PHP versions before 5.4, html_entity_decode defaults to ISO-8859-1, so the above only works for entities that can be decoded to that charset. If you need support for international characters, you should already be working with UTF-8 encoded strings, and simply need to specify that in the call to html_entity_decode.
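The strip, decode, trim, re-encode sequence above can be sketched for UTF-8 text as follows. This is a minimal sketch, not a definitive implementation: plainExcerpt is a hypothetical helper name, and it assumes UTF-8 input and that the mbstring extension is available so the cut happens on characters rather than bytes.

```php
<?php
// Hypothetical helper: plain-text excerpt of an HTML string, safe to echo into a page.
// Assumes UTF-8 input and the mbstring extension.
function plainExcerpt(string $html, int $length = 50): string
{
    // Drop the markup, then decode entities so "&amp;" counts as one character.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Cut by characters, not bytes, so multibyte characters are never split.
    $text = mb_substr($text, 0, $length, 'UTF-8');
    // Re-encode for output.
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo plainExcerpt('This is <b>my bold</b> text &amp; more', 20), "\n";
```

Because the tags are removed before trimming, no unclosed <b> can leak into the rest of the page.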
Any ideas why formatted text from the DB, when echoed out in PHP, loses its formatting, i.e. no new lines? Thanks!
Use nl2br().
New lines are ignored by the browser. That's why you see all the text without line breaks. nl2br() converts new lines to <br /> tags, which are displayed as new lines in browsers.
If you want to display your text in a <textarea>, you don't need to convert new lines to <br />. If you do, you will see the "<br />"s as literal text where the new lines are.
Because there are no HTML tags for formatting!
Try the nl2br function.
You could try adding the nl2br() function...
something like this: echo nl2br($your_text_variable);
It should work ;-)
The reason
This is the default behavior for all user agents. If you look at the page source, you'll see that your text has the same formatting as the text in the database (or textarea).
The reason for your confusion is probably that in one case you see the text in a <textarea> tag, which displays preformatted text and does not interpret tags, while in the other case the text is interpreted as HTML (where whitespace is not significant).
Browsers don't display new lines unless specifically asked to, using a <br> tag or any block-level tag.
No tags == no new lines.
The fix
If you store preformatted text in the database,
you should wrap the output in the <pre> tag.
You may want to convert the formatting characters to the HTML tags you need using a set of functions like nl2br, str_replace, etc.
You may also change your structure to store HTML in the database instead of just plain text (however, markup looks like a better solution).
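The first two fixes above can be sketched like this. The stored value is a made-up example, and the htmlspecialchars call is an assumption on my part: it presumes the database holds plain text that should not be interpreted as HTML.

```php
<?php
// Hypothetical value as it might come back from the database.
$fromDb = "First line\nSecond line with <b>markup</b>";

// Option 1: escape first, then insert <br /> before each newline.
$asBreaks = nl2br(htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8'));

// Option 2: leave the whitespace alone and let <pre> preserve it.
$asPre = '<pre>' . htmlspecialchars($fromDb, ENT_QUOTES, 'UTF-8') . '</pre>';

echo $asBreaks, "\n", $asPre, "\n";
```

Escaping before nl2br matters: done the other way round, the <br /> tags themselves would be escaped and show up as text.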
See similar question:
How do I keep whitespace formatting using PHP/HTML?
The difference between the two images you show is that one has the text in a <textarea></textarea> and the other does not ... if you want 1:1: <textarea><?php echo $yourVariable;?></textarea>
It does output what you say to output. If the text is pre-formatted, put it inside the HTML <pre></pre> tag in your output script.
This should be helpful in answering.
How do I keep whitespace formatting using PHP/HTML?
Set up string preprocessing code for both input to the database and output to the display page
I recently had the need to truncate post content that contains HTML (for a post excerpt/summary, etc.). This is usually done by manually entering an excerpt for the post, but for this specific project, I need to do it automatically.
I tried to create a simple method which just takes a character count and sub-strings the content. However, this does not work all the time as it may truncate the content within an HTML tag/attribute.
eg:
<?php
function truncateText($string, $chars) { return substr($string, 0, $chars); }
$content = "<div><p>some content</p><a href='http://google.com'>Let's go to google</a></div>";
echo truncateText($content,40); //returns "<div><p>some content</p><a href='http://"
As you can see, it returns broken HTML, which will not render properly. How can I truncate the content yet retain the HTML tags?
Your approach raises many problems. Do you want to truncate at 40 characters, then add as many tags as needed until they are all closed? Or do you prefer to truncate at 40 and trim as much as needed to make the tags work? Do the tags count towards the 40 characters, or are they ignored when counting? There are many problems with this, as you can see. However, there's an alternative commonly used for summaries:
Delete the tags and truncate the text. The summary is normally just a small extract of text, a paragraph, with simple formatting. You don't want lists here, and in most cases stripping a link or two is okay for this.
However, if you really want to go down that road, I'd recommend meaningfully reading the html tags with some DOM parser, but to know how to do that you will first need to answer the first questions I wrote.
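One possible answer to those questions is "count only text characters, and keep every tag that was opened before the cut". A minimal sketch of that choice with PHP's DOM extension follows. truncateHtml and trimNode are hypothetical names, the wrapper <div> and the encoding hint are conveniences of this sketch, and it assumes UTF-8 input plus the dom and mbstring extensions. The serializer may normalise attribute quotes and insert line breaks between block elements, so the output is equivalent markup rather than a byte-for-byte prefix of the input.

```php
<?php
// Hypothetical helper: keep at most $limit characters of text, with all tags balanced.
function truncateHtml(string $html, int $limit): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // tolerate sloppy input markup
    // The leading hint tells libxml the bytes are UTF-8; the <div> gives one root to walk.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $root = $doc->getElementsByTagName('div')->item(0);
    $remaining = $limit;
    trimNode($root, $remaining);

    // Serialize the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

function trimNode(DOMNode $node, int &$remaining): void
{
    // Copy the child list first, because nodes are removed while walking.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($remaining <= 0) {
            $node->removeChild($child);             // everything after the cut goes
        } elseif ($child instanceof DOMText) {
            $len = mb_strlen($child->data, 'UTF-8');
            if ($len > $remaining) {
                $child->data = mb_substr($child->data, 0, $remaining, 'UTF-8');
            }
            $remaining -= min($len, $remaining);
        } else {
            trimNode($child, $remaining);           // descend into elements
        }
    }
}

echo truncateHtml('This is <b>my bold text</b> here', 13), "\n";
```

Because the cut is made on text nodes inside the parsed tree, it can never land inside a tag or an attribute value.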
If you don't care that formatting is removed from your text, then just send the string through the PHP function strip_tags() before you do anything else. Instructions here.
I have been building a function that reads in the title text found on a webpage between the <title></title> tags. I am using the following regex code to grab the title text from the HTML page:
if(preg_match('#<title>([^<]+)</title>#simU', $this->html, $m1))
$this->title = trim($m1[1]);
I am using the following to encode the value for the mysql insert statement:
mysql_real_escape_string(rawurldecode($this->title))
So that leaves me with a database full of titles that have HTML entities (&nbsp; etc.) and
foreign characters, such as in
Dating S.o.s | Gluten-free, Dairy-free, Sugar-free Recipes And Lifestyle Tips
The goal is to decode, remove, and clean the titles so that they look as close to perfect English as possible.
I have constructed a function that uses the following two regexes to remove HTML entities and limit junk, respectively. While not ideal (because it removes the HTML entities rather than preserving them), it's the closest to clean that I've got.
$string = preg_replace("/&#?[a-z0-9]+;/i","",$string);
//remove all non-normal chars
$string = preg_replace('/[^a-zA-Z0-9-\s\'\!\,\|\(\)\.\*\&\#\/\:]/', '', $string);
But the non-English chars still exist.
Would anyone be able to offer help as to:
The best way to save these title strings to the DB while trying to preserve the English intent (punctuation, apostrophes, etc.)
How to convert or eliminate the strange chars as shown in my example title above?
Thanks much for your help!
For point 1, PHP has an html_entity_decode() function that you can use to turn HTML entities into "regular" characters.
Check out http://www.php.net/manual/en/function.html-entity-decode.php for #1
And http://php.net/manual/en/function.mb-convert-encoding.php for #2
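The two functions linked above could be combined roughly like this. It is a sketch under assumptions: cleanTitle is a hypothetical helper, and $sourceEncoding has to come from somewhere real (the page's HTTP headers or <meta> tag), which is not shown here.

```php
<?php
// Hypothetical helper: normalise a scraped <title> string to clean UTF-8 text.
function cleanTitle(string $raw, string $sourceEncoding = 'UTF-8'): string
{
    // #2: convert the bytes to UTF-8 if the page used another encoding.
    $text = mb_convert_encoding($raw, 'UTF-8', $sourceEncoding);
    // #1: turn entities such as &amp; or &nbsp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse whitespace runs, including the non-breaking space that &nbsp; decodes to.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);
    return trim($text);
}

echo cleanTitle('Dating&nbsp;S.o.s &amp; Lifestyle Tips  '), "\n";
```

Decoding keeps the characters the entities stand for, instead of deleting them as the regex in the question does.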
I am developing an MVC application in PHP that uses XML and XSLT to render the views. It needs to fully support UTF-8. I also use MySQL, correctly configured for UTF-8. My problem is the following.
I have an <input type="text"/> with a value like àáèéìíòóùú"><'##~!¡¿?. This is processed and added to the database: I use mysql_real_escape_string($_POST["name"]) and then do a MySQL INSERT. This adds a backslash \ before " and '.
The MySQL database has DEFAULT CHARACTER SET utf8 and COLLATE utf8_spanish_ci. The table field is a normal VARCHAR.
Then I have to print this in an XML document that will be transformed with XSLT. I can use PHP in the XML, so I echo it with <?php echo TexUtils::obtainSqlText($value_obtained_from_sql); ?>. The obtainSqlText() function currently returns the value unchanged; it is a placeholder until I know what processing it needs.
One of the first things I will need to do for the input is convert > and < to &gt; and &lt;, because otherwise they will cause problems with start/end tags. This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>. This will also convert & to &amp;, " to &quot; and ' to &#039;. This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.
There is another problem. I've talked about the àáèéìíòóùú"><'##~!¡¿? input, but I will also have text from a CKEditor <textarea /> whose value will look like:
<p>
àáèéìíòóùú"><'##~!¡¿?
</p>
How should I handle this? First, if I want to print this second value correctly I will need to use <xsl:value-of select="value" disable-output-escaping="yes" />. Will "><' print right?
So what I am really looking for is how to handle these values and how to print them. Do I need one approach if the value comes from a VARCHAR that doesn't allow HTML and another if it is a TEXT (for example) that allows HTML? Will I need to use disable-output-escaping="yes" every time?
I also want to know whether doing this really protects me from XSS attacks.
Thank you in advance!
This will be done with <?php htmlspecialchars($string, ENT_QUOTES, "UTF-8"); ?>.
Fine.
This is a big problem: XSLT starts to fail because it doesn't recognize all HTML special characters.
It shouldn't fail on htmlspecialchars() output, ever. &amp; is a predefined entity in XML and &#039; is a character reference, which is always allowed. htmlspecialchars() should produce XML-compatible output, unlike the usually-a-mistake htmlentities(). What is the error you are seeing?
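A quick way to check that claim is to feed htmlspecialchars() output to an XML parser. The sketch below uses the sample input from the question inside a made-up <value> element, and assumes the script file itself is saved as UTF-8.

```php
<?php
$value = 'àáèéìíòóùú"><\'##~!¡¿?';   // the sample input from the question
$escaped = htmlspecialchars($value, ENT_QUOTES, 'UTF-8');

// <value> is a hypothetical element name; without an XML declaration the parser assumes UTF-8.
$doc = new DOMDocument();
$ok = $doc->loadXML('<value>' . $escaped . '</value>');

var_dump($ok);                                            // did the XML parse?
var_dump($doc->documentElement->textContent === $value);  // does the text round-trip?
```

If the XSLT step still fails, the offending entity is probably coming from somewhere other than htmlspecialchars(), for example an HTML-only entity such as &nbsp; in editor output.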
àáèéìíòóùú"><'##~!¡¿?
Urgh, an HTML rich text editor produced that invalid markup? What a dodgy editor.
If you have to allow users to input arbitrary HTML, it's going to need some processing. Unless you really trust those users, you'll need a purifier (to stop them using dangerous scripting elements and XSS-ing each other), and a tidier (to remove malformed markup either due to crap rich-text-editor output or deliberate sabotage). If you intend to put the content directly into XML you will also need it to convert to XHTML output and replace HTML entity references.
A simple way to do this in PHP would be DOMDocument->loadHTML followed by a walk of the DOM tree removing all but known-good elements/attributes/URL-schemes, followed by DOMDocument->saveXML.
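A rough sketch of that loadHTML, whitelist walk, saveXML pipeline is below. The function names, the tag and attribute whitelists, and the "http(s) only" URL rule are all assumptions made for illustration; a vetted purifier library is the safer choice for production.

```php
<?php
// Hypothetical whitelists; adjust to what your editor is allowed to produce.
const ALLOWED_TAGS  = ['p', 'b', 'i', 'em', 'strong', 'br', 'a'];
const ALLOWED_ATTRS = ['a' => ['href']];

function sanitizeToXhtml(string $html): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // malformed input is expected here
    // Encoding hint plus a wrapper <div> so there is a single root to walk.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    $root = $doc->getElementsByTagName('div')->item(0);
    cleanChildren($root);

    // saveXML() emits XML-style markup (for example self-closed empty elements).
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

function cleanChildren(DOMNode $node): void
{
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            continue;                               // plain text is always kept
        }
        if (!($child instanceof DOMElement) || !in_array($child->nodeName, ALLOWED_TAGS, true)) {
            $node->removeChild($child);             // drops the node and everything inside it
            continue;
        }
        $allowed = ALLOWED_ATTRS[$child->nodeName] ?? [];
        foreach (iterator_to_array($child->attributes) as $attr) {
            $safeUrl = $attr->name !== 'href' || preg_match('#^https?://#i', $attr->value) === 1;
            if (!in_array($attr->name, $allowed, true) || !$safeUrl) {
                $child->removeAttribute($attr->name);
            }
        }
        cleanChildren($child);
    }
}

echo sanitizeToXhtml('<p onclick="x()">Hi <b>there</b><br></p><script>alert(1)</script>'), "\n";
```

This version deletes disallowed elements together with their content; unwrapping them while keeping the inner text is an alternative design, at the cost of a slightly longer walk.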
Will "><' print right?
Well, it'll print as in your example, yes. But that's equally invalid as both HTML and XML.