remove HTML from displaying in PHP - php

I have this text : http://pastebin.com/2Zgbs7hi
And i want to be able to remove the HTML code from it and just display the plain text but i want to keep at least one line break where there are currently a few line breaks
i have tried:
$ticket["summary"] = 'pastebin example';
$TicketSummaryDisplay = nl2br($ticket["summary"]);
$TicketSummaryDisplay = stripslashes($TicketSummaryDisplay);
$TicketSummaryDisplay = trim(strip_tags($TicketSummaryDisplay));
$TicketSummaryDisplay = preg_replace('/\n\s+$/m', '', $TicketSummaryDisplay);
echo $TicketSummaryDisplay;
that is displaying as plain text, but it shows it all as one big block of text with no line breaks at all

Maybe this will earn you some time.
<?php
libxml_use_internal_errors(true); //crazy o tags
$html = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$dom = new DOMDocument;
$dom->loadHTML($html);
$result='';
foreach ($dom->getElementsByTagName('p') as $node) {
if (strstr($node->nodeValue, 'Legal Disclaimer:')){
break;
}
$result .= $node->nodeValue;
}
echo $result;

This example should successfully store text from html into an array of strings.
After stripping all the tags, you can use preg_split with \R special character ( matches any newline sequence ) to convert string into array. That array will now have several blank values, and there will be also some amount of html non-breaking space entities, so we will check the array for empty values with array_filter() function ( it will remove all items that do not satisfy the filter conditions, in our case, an empty value ). Here are a problem with entity, because and space characters are not the same, they have different ASCII code, so trim() function will not remove spaces. Here are two possible solutions, the first uncommented part will only replace &nbsp and check for white space characters, while the second commented one will decode all html entities and also check for spaces.
PHP:
$text = file_get_contents( 'http://pastebin.com/raw.php?i=2Zgbs7hi' );
$text = strip_tags( $text );
$array = array_filter(
preg_split( '/\R/', $text ),
function( &$item ) {
$item = str_replace( ' ', ' ', $item );
return trim( $item );
// $item = html_entity_decode( $item );
// return trim( str_replace( "\xC2\xA0", ' ', $item ) );
}
);
foreach( $array as $value ) {
echo $value . '<br />';
}
Array output:
Array
(
[8] => Hi,
[11] => Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
[13] => Regards
[23] => Legal Disclaimer:
[24] => This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
[25] => Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
)
Now you should have clear array with only items with value in it. By the way, newlines in HTML are expressed through <br />, not through \n, your example as response in a web browser still has them, but they are only visible in page source code. I hope I did not missed the point of the question.

try this get text output with line brakes
<?php
$ticket["summary"] = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$TicketSummaryDisplay = nl2br($ticket["summary"]);
echo strip_tags($TicketSummaryDisplay,'<br>');
?>

You are asking on how to add line-breaks to your "one big block of text with no line breaks at all".
Short answer
After you stripped the HTML tags, apply wordwrap with a desired text-block length
$text = wordwrap($text, 90, "<br />\n");
I really wonder, why nobody suggested that function before.
there is also chunk_split around, which doesn't take words into account and just splits after a certain number of chars. breaking words - but that's not what you want, i guess.
PHP
<?php
$text = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
/**
* Returns string without html tags, also
* removes takes control chars, spaces and " " into account.
*/
function dropHtmlTags($string) {
// remove html tags
//$string = preg_replace ('/<[^>]*>/', ' ', $string);
$string = strip_tags($string);
// control characters and "&nbsp"
$string = str_replace("\r", '', $string); // remove
$string = str_replace("\n", ' ', $string); // replace with space
$string = str_replace("\t", ' ', $string); // replace with space
$string = str_replace(" ", ' ', $string);
// remove multiple spaces
$string = preg_replace('/ {2,}/', ' ', $string);
$string = trim($string);
return $string;
}
$text = dropHtmlTags($text);
// The Answer: insert line breaks after 95 chars,
// to get rid of the "one big block of text with no line breaks at all"
$text = wordwrap($text, 95, "<br />\n");
// if you want to insert line-breaks before the legal disclaimer,
// uncomment the next line
//$text = str_replace("Regards Legal Disclaimer", "<br /><br />Regards Legal Disclaimer", $text);
echo $text;
?>
Result
first section shows your text block
second section shows the text with wordwrap applied (code from above)

Hello it can be done as follows:
$abc= file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
$abc = strip_tags("\n", $abc);
echo $abc;
Please, let me know whether it works

you may use
<?php
$a= file_get_contents('a.txt');
echo nl2br(htmlspecialchars($a));
?>

<?php
$handle = #fopen("pastebin.html", "r");
if ($handle) {
while (!feof($handle)) {
$buffer = fgetss($handle, 4096);
echo $buffer;
}
fclose($handle);
}
?>
output is
Hi,
Ashley has explained that I need to ask for another line and broadband for the wifi to work, please can you arrange this.
Regards
Legal Disclaimer:
This email and its attachments are confidential. If you received it by mistake, please don’t share it. Let us know and then delete it. Its content does not necessarily represent the views of The Dragon Enterprise
Centre and we cannot guarantee the information it contains is complete. All emails are monitored and may be seen by another member of The Dragon Enterprise Centre's staff for internal use
You can probably write additional code to convert to spaces etc.

I'm not sure I did understand everything correctly but this seems to be your expected result:
$txt = file_get_contents('http://pastebin.com/raw.php?i=2Zgbs7hi');
var_dump(preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", trim(strip_tags(preg_replace("/(\s){1,}/", " ", $txt)))));
//more readable
$txt = preg_replace("/(\s){1,}/", " ", $txt);
$txt = trim(strip_tags($txt));
$txt = preg_replace("/(\&nbsp\;(\s{1,})?)+/", "\n", $txt);

The strip_tags() function strips HTML and PHP tags from a string, if that is what you are trying to accomplish.
Examples from the docs:
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
The above example will output:
Test paragraph. Other text
<p>Test paragraph.</p> Other text

Related

echo characters only in one line with explode function php

here is what i want to do
i am working with php explode function trying to limit characters it prints after defined condition
{
$result=http://php.net
new line characters i don't want to print
$links =explode("://",$result);
$nows=$links[1];
echo $nows;
}
as you can see the above code will print
php.net
new line characters i don't want to print
but instead i want to stop printing after
php.net
You can replace newline characters with nothing:
$nows = str_replace("\n", "", $links[1]);
$nows = str_replace("\r", "", $nows);
echo $nows;
If you want only what is printed on the first line, try this:
$result = "php.net
and some other text";
$nows = reset(explode("\n", str_replace("\r\n", "\n", $result)));
If the part you're looking after will always be in the first line:
$result="http://php.net
new line characters i don't want to print";
$links = explode("\n",$result);
/*
$links[0] ->http://php.net
$links[1] ->new line characters i don't want to print
*/
$links =explode("://",$links[0]);
$nows=$links[1];
echo $nows;
/*
php.net
*/
Anyway , Consider giving more details about your case in order to offer a better way.
For instance , maybe regex?
Try
$nows = trim( $links[1] );
TRIM() will remove newlines among other things
Manual page
EDIT:
Well now we have the actual situation which you say is :-
$result=http://php.net</br>nameserver:ns1</br>nameserver:ns2.
Try
$t = explode( '</br>', $result );
$t1 = explode ( '://', $t[0] );
echo $t1[1];
Just as a note, if it is you that is creating this string somewhere else </br> is not a valid html tag, it should be <br> or if you are using XHTML it should be <br />.

Replace text ignoring HTML tags

I have a simple text with HTML tags, for example:
Once <u>the</u> activity reaches the resumed state, you can freely add and remove fragments to the activity. Thus, <i>only</i> while the activity is in the resumed state can the <b>lifecycle</b> of a <hr/> fragment change independently.
I need to replace some parts of this text ignoring its html tags when I do this replace, for example this string - Thus, <i>only</i> while I need to replace with my string Hello, <i>its only</i> while . Text and strings to be replaced are dynamically. I need your help with my preg_replace pattern
$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';
$arrayKeys= array('Some html' => 'My html', 'and there' => 'is there', 'in this text' => 'in this code');
foreach ($arrayKeys as $key => $value)
$text = preg_replace('...$key...', '...$value...', $text);
echo $text; // output should be: <b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code';
Please help me to find solution. Thank you
Basically we're going to build dynamic arrays of matches and patterns off of plain text using Regex. This code only matches what was originally asked for, but you should be able to get an idea of how to edit the code from the way I've spelled it all out. We're catching either an open or a close tag and white space as a passthru variable and replacing the text around it. This is setup based on two and three word combinations.
<?php
$text = '<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text';
$arrayKeys= array(
'Some html' => 'My html',
'and there' => 'is there',
'in this text' =>'in this code');
function make_pattern($string){
$patterns = array(
'!(\w+)!i',
'#^#',
'! !',
'#$#');
$replacements = array(
"($1)",
'!',
//This next line is where we capture the possible tag or
//whitespace so we can ignore it and pass it through.
'(\s?<?/?[^>]*>?\s?)',
'!i');
$new_string = preg_replace($patterns,$replacements,$string);
return $new_string;
}
function make_replacement($replacement){
$patterns = array(
'!^(\w+)(\s+)(\w+)(\s+)(\w+)$!',
'!^(\w+)(\s+)(\w+)$!');
$replacements = array(
'$1\$2$3\$4$5',
'$1\$2$3');
$new_replacement = preg_replace($patterns,$replacements,$replacement);
return $new_replacement;
}
foreach ($arrayKeys as $key => $value){
$new_Patterns[] = make_pattern($key);
$new_Replacements[] = make_replacement($value);
}
//For debugging
//print_r($new_Patterns);
//print_r($new_Replacements);
$new_text = preg_replace($new_Patterns,$new_Replacements,$text);
echo $new_text."\n";
echo $text;
?>
Output
<b>My html</b> tags with <u>is</u> there are a lot of tags <i>in</i> this code
<b>Some html</b> tags with <u>and</u> there are a lot of tags <i>in</i> this text
Here we go. this piece of code should work, assuming you're respecting only twp constraints :
Pattern and replacement must have the same number of words. (Logical, since you want to keep position)
You must not split a word around a tag. (<b>Hel</b>lo World won't work.)
But if these are respected, this should work just fine !
<?php
// Splits a string in parts delimited with the sequence.
// '<b>Hey</b> you' becomes '~-=<b>~-=Hey~-=</b>~-= you' that make us get
// array ("<b>", "Hey" " you")
function getTextArray ($text, $special) {
$text = preg_replace ('#(<.*>)#isU', $special . '$1' . $special, $text); // Adding spaces to make explode work fine.
return preg_split ('#' . $special . '#', $text, -1, PREG_SPLIT_NO_EMPTY);
}
$text = "
<html>
<div>
<p>
<b>Hey</b> you ! No, you don't have <em>to</em> go!
</p>
</div>
</html>";
$replacement = array (
"Hey you" => "Bye me",
"have to" => "need to",
"to go" => "to run");
// This is a special sequence that you must be sure to find nowhere in your code. It is used to split sequences, and will disappear.
$special = '~-=';
$text_array = getTextArray ($text, $special);
// $restore is the array that will finally contain the result.
// Now we're only storing the tags.
// We'll be story the text later.
//
// $clean_text is the text without the tags, but with the special sequence instead.
$restore = array ();
for ($i = 0; $i < sizeof ($text_array); $i++) {
$str = $text_array[$i];
if (preg_match('#<.+>#', $str)) {
$restore[$i] = $str;
$clean_text .= $special;
}
else {
$clean_text .= $str;
}
}
// Here comes the tricky part.
// We wanna keep the position of each part of the text so the tags don't
// move after.
// So we're making the regex look like (~-=)*Hey(~-=)* you(~-=)*
// And the replacement look like $1Bye$2 me $3.
// So that we keep the separators at the right place.
foreach ($replacement as $regex => $newstr) {
$regex_array = explode (' ', $regex);
$regex = '(' . $special . '*)' . implode ('(' . $special . '*) ', $regex_array) . '(' . $special . '*)';
$newstr_array = explode (' ', $newstr);
$newstr = "$1";
for ($i = 0; $i < count ($regex_array) - 1; $i++) {
$newstr .= $newstr_array[$i] . '$' . ($i + 2) . ' ';
}
$newstr .= $newstr_array[count($regex_array) - 1] . '$' . (count ($regex_array) + 1);
$clean_text = preg_replace ('#' . $regex . '#isU', $newstr, $clean_text);
}
// Here we re-split one last time.
$clean_text_array = preg_split ('#' . $special . '#', $clean_text, -1, PREG_SPLIT_NO_EMPTY);
// And we merge with $restore.
for ($i = 0, $j = 0; $i < count ($text_array); $i++) {
if (!isset($restore[$i])) {
$restore[$i] = $clean_text_array[$j];
$j++;
}
}
// Now we reorder everything, and make it go back to a string.
ksort ($restore);
$result = implode ($restore);
echo $result;
?>
Will output Bye me ! No, you don't need to run!
[EDIT] Now supporting a custom pattern, which allows to avoid adding useless spaces.

Lines get split on preg_replace usage

My code-
$input = "this text is for highlighting a text if it exists in a string. Let us check if it works or not";
$pattern ="/if/";
$replacement= "H1Fontbracket"."if"."H1BracketClose";
echo preg_replace($pattern, $replacement, $input);
Now the problem is that when i run this code, it splits into multiple lines, what else do i need to do so that i am able to get it in one line
Use str_replace rather than preg_replace. preg_replace will return an array of strings, and str_replace will just return the string:
echo str_replace($pattern, $replacement, $input);
What do you mean by multiple lines? Of course it'll show up as multiple lines on a webpage if you wrap the ifs in header tags. Headers are block elements. And more importantly, headers are headers. Not for highlighting text.
If you want to highlight something with HTML, you should probably use a span with a class, or you could use the HTML5 element mark:
$input = "this text is for highlighting a text if it exists in an iffy string.";
echo preg_replace('/\\bif\\b/', '<span class="highlighted">$0</span>', $input);
echo preg_replace('/\\bif\\b/', '<mark>$0</mark>', $input);
The \\b is to only match if words, and not just the if letters, which might be part of a different word. Then in your CSS you can decide how the marked words should show up:
.highlighted { background: yellow }
mark { background: yellow }
Or whatever. I would recommend that you read up a bit on how HTML and CSS works if you're going to make web pages :)
Try this
$input = "this text is for highlighting a text if
it exists in a string. Let us check if it works or not";
$pattern="if";
$replacement="<h1>". $pattern. "</h1>";
$input= str_replace($pattern,$replacement,$input);
echo "$input";
function highlight($str,$search){
$patterns = array('/\//', '/\^/', '/\./', '/\$/', '/\|/',
'/\(/', '/\)/', '/\[/', '/\]/', '/\*/', '/\+/',
'/\?/', '/\{/', '/\}/', '/\,/');
$replace = array('\/', '\^', '\.', '\$', '\|', '\(', '\)',
'\[', '\]', '\*', '\+', '\?', '\{', '\}', '\,');
$search = preg_replace($patterns, $replace, $search);
$search = str_replace(" ","|",$search);
return #preg_replace("/(^|\s)($search)/i",'${1}<span class=highlight>${2}</span>',$str);
}

I need to split text delimited by paragraph tag

$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
I need to split the above into an array delimited by the paragraph tags. That is, I need to split the above into an array with two elements:
array ([0] = "this is the first paragraph", [1] = "this is the first paragraph")
Remove the closing </p> tags as we don't need them and then explode the string into an array on opening </p> tags.
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$text = str_replace('</p>', '', $text);
$array = explode('<p>', $text);
To see the code run please see the following codepad entry. As you can see this code will leave you with an empty array entry at index 0. If this is a problem then it can easily be removed by calling array_shift($array) before using the array.
For anyone else who finds this, don't forget that a P tag may have styles, id's or any other possible attributes so you should probably look at something like this:
$ps = preg_split('#<p([^>])*>#',$input);
This is an old question but I was not able to find any reasonable solution in an hour of looking for stactverflow answers. If you have string full of html tags (p tags) and if you want to get paragraphs (or first paragraph) use DOMDocument.
$long_description is a string that has <p> tags in it.
$long_descriptionDOM = new DOMDocument();
// This is how you use it with UTF-8
$long_descriptionDOM->loadHTML((mb_convert_encoding($long_description, 'HTML-ENTITIES', 'UTF-8')));
$paragraphs = $long_descriptionDOM->getElementsByTagName('p');
$first_paragraph = $paragraphs->item(0)->textContent();
I guess that this is the right solution. No need for regex.
edit: YOU SHOULD NOT USE REGEX TO PARSE HTML.
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$exptext = explode("<p>", $text);
echo $exptext[0];
echo "<br>";
echo $exptext[1];
//////////////// OUTPUT /////////////////
this is the first paragraph
this is the first paragraph
Try this code:
<?php
$textArray = explode("<p>" $text);
for ($i = 0; $i < sizeof($textArray); $i++) {
$textArray[$i] = strip_tags($textArray[$i]);
}
If your input is somewhat consistent you can use a simple split method as:
$paragraphs = preg_split('~(</?p>\s*)+~', $text, PREG_SPLIT_NO_EMPTY);
Where the preg_split will look for combinations of <p> and </p> plus possible whitespace and separate the string there.
As unnecessary alternative you can also use querypath or phpquery to extract only complete paragraph contents using:
foreach (htmlqp($text)->find("p") as $p) { print $p->text(); }
Try the following:
<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$array;
preg_replace_callback("`<p>(.+)</p>`isU", function ($matches) {
global $array;
$array[] = $matches[1];
}, $text);
var_dump($array);
?>
This can be modified, putting the array in a class that manage it with an add value method, and a getter.
Try this.
<?php
$text = "<p>this is the first paragraph</p><p>this is the first paragraph</p>";
$array = json_decode(json_encode((array) simplexml_load_string('<data>'.$text.'</data>')),1);
print_r($array['p']);
?>

Get each line from textarea

<textarea> put returns between paragraphs
for linebreak add 2 spaces at end
indent code by 4 spaces
quote by placing > at start of line
</textarea>
$text = value from this textarea;
How to:
1) Get each line from this textarea ($text) and work with them using foreach()?
2) Add <br /> to the end of each line, except the last one?
3) Throw each line to an array.
Important - text inside textarea can be multilanguage.
Have tried to use:
$text = str_replace('\n', '<br />', $text);
But it doesn't work.
Thanks.
You will want to look into the nl2br() function along with the trim().
The nl2br() will insert <br /> before the newline character (\n) and the trim() will remove any ending \n or whitespace characters.
$text = trim($_POST['textareaname']); // remove the last \n or whitespace character
$text = nl2br($text); // insert <br /> before \n
That should do what you want.
UPDATE
The reason the following code will not work is because in order for \n to be recognized, it needs to be inside double quotes since double quotes parse data inside of them, where as single quotes takes it literally, IE "\n"
$text = str_replace('\n', '<br />', $text);
To fix it, it would be:
$text = str_replace("\n", '<br />', $text);
But it is still better to use the builtin nl2br() function, PHP provides.
EDIT
Sorry, I figured the first question was so you could add the linebreaks in, indeed this will change the answer quite a bit, as anytype of explode() will remove the line breaks, but here it is:
$text = trim($_POST['textareaname']);
$textAr = explode("\n", $text);
$textAr = array_filter($textAr, 'trim'); // remove any extra \r characters left behind
foreach ($textAr as $line) {
// processing here.
}
If you do it this way, you will need to append the <br /> onto the end of the line before the processing is done on your own, as the explode() function will remove the \n characters.
Added the array_filter() to trim() off any extra \r characters that may have been lingering.
You could use PHP constant:
$array = explode(PHP_EOL, $text);
additional notes:
1. For me this is the easiest and the safest way because it is cross platform compatible (Windows/Linux etc.)
2. It is better to use PHP CONSTANT whenever you can for faster execution
Old tread...? Well, someone may bump into this...
Please check out http://telamenta.com/techarticle/php-explode-newlines-and-you
Rather than using:
$values = explode("\n", $value_string);
Use a safer method like:
$values = preg_split('/[\n\r]+/', $value_string);
Use PHP DOM to parse and add <br/> in it. Like this:
$html = '<textarea> put returns between paragraphs
for linebreak add 2 spaces at end
indent code by 4 spaces
quote by placing > at start of line
</textarea>';
//parsing begins here:
$doc = new DOMDocument();
#$doc->loadHTML($html);
$nodes = $doc->getElementsByTagName('textarea');
//get text and add <br/> then remove last <br/>
$lines = $nodes->item(0)->nodeValue;
//split it by newlines
$lines = explode("\n", $lines);
//add <br/> at end of each line
foreach($lines as $line)
$output .= $line . "<br/>";
//remove last <br/>
$output = rtrim($output, "<br/>");
//display it
var_dump($output);
This outputs:
string ' put returns between paragraphs
<br/>for linebreak add 2 spaces at end
<br/>indent code by 4 spaces
<br/>quote by placing > at start of line
' (length=141)
It works for me:
if (isset($_POST['MyTextAreaName'])){
$array=explode( "\r\n", $_POST['MyTextAreaName'] );
now, my $array will have all the lines I need
for ($i = 0; $i <= count($array); $i++)
{
echo (trim($array[$i]) . "<br/>");
}
(make sure to close the if block with another curly brace)
}
$array = explode("\n", $text);
for($i=0; $i < count($array); $i++)
{
echo $line;
if($i < count($array)-1)
{
echo '<br />';
}
}
$content = $_POST['content_name'];
$lines = explode("\n", $content);
foreach( $lines as $index => $line )
{
$lines[$index] = $line . '<br/>';
}
// $lines contains your lines
For a <br> on each line, use
<textarea wrap="physical"></textarea>
You will get \ns in the value of the textarea. Then, use the nl2br() function to create <br>s, or you can explode() it for <br> or \n.
Hope this helps

Categories