Regex detect line break - php

I have this message (without the quotes, it's just for being precise):
"hey, here i am<br /><br />
"
Note the whitespace after the line break. Here's the thing: I'm trying to remove all the invisible characters and <br /> tags at the end of the message with a regex, to get something like "hey, here i am". But I must be doing something wrong, because I can't make it work. Here's what I tried:
$content = preg_replace('{(<br(\s*/)?>| |\r\n|\r|\n| )+$}i', '', $content);
But the message remains the same at the end. Must be something simple I missed. Thank you for your help!

You don't need a regular expression to do that. Use the strip_tags() function to remove the tags.
$str = 'hey, here i am<br /><br />';
echo strip_tags($str);//yields hey, here i am
Don't try to write your own regular expressions to parse HTML when you have tools that already do it. Sometimes it's necessary depending on case, but in your case I would say it isn't. Just use the built in function.
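Note that strip_tags() removes every tag in the string and leaves trailing whitespace in place. If only the breaks and invisible characters at the end of the message should go, an end-anchored pattern is one option. This is a minimal sketch, assuming the trailing run consists of <br> variants, &nbsp; entities and whitespace; the helper name stripTrailingBreaks is hypothetical.

```php
<?php
// Hypothetical helper: remove <br> variants, &nbsp; entities and whitespace,
// but only where they form an unbroken run at the very end of the string.
function stripTrailingBreaks(string $content): string
{
    // \z anchors at the absolute end of the subject; "i" also accepts <BR>.
    return preg_replace('~(?:<br\s*/?>|&nbsp;|\s)+\z~i', '', $content) ?? $content;
}

$content = "hey, here i am<br /><br />\n ";
$cleaned = stripTrailingBreaks($content);
var_dump($cleaned); // string(14) "hey, here i am"
```

A <br /> in the middle of the message is left alone, because the pattern can only match a run that reaches the end of the string.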

You can use the following regex:
([^\s\w",](?:br\W*\s*)+)"$
Working demo
The code is:
$re = '/([^\s\w",](?:br\W*\s*)+)"$/'; // single-quoted, so the " inside the pattern cannot end the PHP string
$str = "\"hey, here i am<br /> test<br /><br />\n \"";
$subst = '';
$result = preg_replace($re, $subst, $str);

You should not use a regular expression for this. Take a look at this answer: RegEx match open tags except XHTML self-contained tags
Instead use strip_tags().

$str = 'hey, here i am<br /><br />';
$str=~s{(<br />|\s)*$}{}ig;
Use this code (Perl syntax); it might help you.

my $cnt;
$cnt = "hey, here i am<br /><br />";
$cnt =~s/(<br \/>)*//isg;
print $cnt;
output : "hey, here i am"

Related

How to convert multiple <br> to twice <br> tag in php [duplicate]

Wanted to convert
<br/>
<br/>
<br/>
<br/>
<br/>
into
<br/>
You can do this with a regular expression:
preg_replace("/(<br\s*\/?>\s*)+/", "<br/>", $input);
If you pass in your source HTML, this will return a string with a single <br/> replacing every run of them.
Mine is almost exactly the same as levik's (+1), just accounting for some different br formatting:
preg_replace('/(<br[^>]*>\s*){2,}/', '<br/>', $sInput);
Enhanced readability, shorter, produces correct output regardless of attributes:
preg_replace('{(<br[^>]*>\s*)+}', '<br/>', $input);
Thanks all..
Used Jakemcgraw's (+1) version
Just added the case-insensitive option:
{(<br[^>]*>\s*)+}i
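As a quick check of that case-insensitive pattern, here is a small sketch with made-up input. One assumption to keep in mind: <br[^>]*> accepts anything after "br" up to the closing bracket, so it would also match a tag whose name merely starts with "br".

```php
<?php
// Made-up sample input: three differently written breaks, separated by newlines.
$input  = "line one<br/>\n<BR>\n<br />\nline two";

// Collapse every run of breaks (plus the whitespace after each) into one <br/>.
$output = preg_replace('{(<br[^>]*>\s*)+}i', '<br/>', $input);

var_dump($output); // string(21) "line one<br/>line two"
```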
Great tool to test those Regular expressions is:
http://www.spaweditor.com/scripts/regex/index.php
without preg_replace, but works only in PHP 5.0.0+
$a = '<br /><br /><br /><br /><br />';
while(($a = str_ireplace('<br /><br />', '<br />', $a, $count)) && $count > 0)
{}
// $a becomes '<br />'
Use a regular expression to match <br/> one or more times, then use preg_replace (or similar) to replace with <br/> such as levik's reply.
You probably want to use a Regular Expression. I haven't tested the following, but I believe it's right.
$text = preg_replace( "/(<br\s?\/?>)+/i","<br />", $text );
A fast, non regular-expression approach:
while(strstr($input, "<br/><br/>"))
{
$input = str_replace("<br/><br/>", "<br/>", $input);
}
User may enter many variants
<br>
<br/>
< br />
<br >
<BR>
<BR>< br>
...and more.
So I think the following would be better:
$str = preg_replace('/(<[^>]*?br[^>]*?>\s*){2,}/i', '<br>', $str);
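One caveat: a loose pattern like the one above accepts any tag that contains the letters "br", so a pair such as <abbr></abbr> would be collapsed too. The following is a stricter sketch, assuming only optional spaces and an optional slash need to be tolerated around the tag name; the sample string is made up.

```php
<?php
// Only tags whose name is exactly "br", with optional spaces and an optional slash.
$pattern = '/(?:<\s*br\s*\/?\s*>\s*){2,}/i';

$str = "a<br>< br />\n<BR >b <abbr></abbr> c<br>d";
$collapsed = preg_replace($pattern, '<br>', $str);

echo $collapsed, "\n"; // a<br>b <abbr></abbr> c<br>d
```

The single <br> near the end stays as it is, because {2,} only fires on two or more consecutive breaks.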

preg_replace question

I'm trying to use preg_replace to strip out a section of code but I am having problems getting it to work right.
Code Example:
$str = '<p class="code">some string here</p>';
PHP I'm using:
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
preg_replace($pattern,"", $str);
This strips out the code just as I want with the exception of the space between the p and class.
Returns:
some string here //notice the single space at the beginning.
I'm trying to get:
some string here //no space at the beginning.
I have been beating my head against the wall trying to find a solution. The reason I'm trying to strip it out in a chunk instead of breaking the preg_replace into pieces is because I don't want to change anything that may be in the string between the tags. Any ideas?
That does not happen for me (and it shouldn't).
It may be a space output somewhere else (use var_dump() to view the string).
You might want to look into this thread to see if you want to switch to using DOMDocument. It'll save you a great deal of headaches trying to parse through HTML.
Robust and Mature HTML Parser for PHP
test:
<?php
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$result = preg_replace($pattern,"", $str);
var_dump($result);
result:
php pregrep.php
string(16) "some string here"
seems to work just fine.
Alex I figured out where I was picking up the extra space.
I was putting that code into a text area like this:
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$strip_str = preg_replace($pattern,"", $str);
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5">
<?php echo $strip_str; ?>
</textarea>
This gave me my extra space but when I changed the code to:
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5"><?php echo $strip_str; ?></textarea>
With no spaces or line breaks between the textarea tags and the PHP tag, the extra space went away.
Why not use trim()?
$text = trim($text);
This removes white spaces around strings.

How to remove redundant <br /> tags from HTML code using PHP?

I'm parsing some messy HTML code with PHP in which there are some redundant tags and I would like to clean them up a bit. For instance:
<br>
<br /><br />
<br>
How would I replace something like that with this using preg_replace()?:
<br /><br />
Newlines, spaces, and the differences between <br>, <br/>, and <br /> would all have to be accounted for.
Edit: Basically I'd like to replace every instance of three or more successive breaks with just two.
Here is something you can use. The first line finds every run of two or more <br> tags (of any type, with whitespace between them) and replaces it with a well-formatted <br /><br />.
I also included the second line to clean up the rest of the <br> tags if you want that too.
function clean($txt)
{
$txt=preg_replace("{(<br[\\s]*(>|\/>)\s*){2,}}i", "<br /><br />", $txt);
$txt=preg_replace("{(<br[\\s]*(>|\/>)\s*)}i", "<br />", $txt);
return $txt;
}
This should work, using a minimum quantifier:
preg_replace('/(<br[\s]?[\/]?>[\s]*){3,}/', '<br /><br />', $multibreaks);
Should match appalling <br><br /><br/><br> constructions too.
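For the "three or more becomes two" requirement from the question, a case-insensitive variant can be checked like this. It is only a sketch with made-up input, assuming the breaks carry no attributes.

```php
<?php
// Made-up input: a run of four mixed breaks, then a lone <br> that must survive.
$html = "a<br>\n<br /><br />\n<BR>b<br>c";

// {3,} leaves single and double breaks untouched; "i" also catches <BR>.
$out = preg_replace('~(?:<br\s*/?>\s*){3,}~i', '<br /><br />', $html);

echo $out, "\n"; // a<br /><br />b<br>c
```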
This will remove all breaks, even if they're in uppercase:
preg_replace('/<br[^>]*>/i', '', $string);
Try with:
preg_replace('/<br\s*\/?>/', '', $inputString);
Use str_replace; it's much better for simple replacements, and you can also pass an array instead of a single search value.
$newcode = str_replace("<br>", "", $messycode);

Remove all text between <hr> and <embed> tag?

<hr>I want to remove this text.<embed src="stuffinhere.html"/>
I tried using regex but nothing works.
Thanks in advance.
P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str)
You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
The rest of this answer is for when you've decided that the HTML parser is too slow, doesn't handle ill-formed (i.e. standard in the wild) HTML, or is a pain in the ass to integrate into the system you don't control. I created the following small PHP script
$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"
and it did remove the text, so I'd check your source documents and any other PHP code around your RegEx. You're not feeding preg_replace the string you think you are. My best guess is your source document has irregular case, or there's whitespace between the <hr /> and <embed>. Try the following regular expression instead.
$str = '<hr>I want to remove
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"
The "i" modifier says "make this search case insensitive". The "s" modifier says "the [.] character should also match my platform's line break/carriage return sequence"
But use a proper parser if you can. Seriously.
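For what the parser route could look like, here is a rough sketch using PHP's bundled DOMDocument. The wrapper <div> and the sibling-walking logic are assumptions of this example, not something from the question, and real documents may nest the tags differently.

```php
<?php
$html = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';

libxml_use_internal_errors(true);            // keep warnings about messy markup quiet
$doc = new DOMDocument();
$doc->loadHTML('<div>' . $html . '</div>');  // wrapper <div> is an assumption of this sketch

foreach (iterator_to_array($doc->getElementsByTagName('hr')) as $hr) {
    // Collect the siblings that follow <hr> until an <embed> is reached.
    $toRemove = [];
    $node = $hr->nextSibling;
    while ($node !== null && strtolower($node->nodeName) !== 'embed') {
        $toRemove[] = $node;
        $node = $node->nextSibling;
    }
    if ($node !== null) {                    // delete only when an <embed> really follows
        foreach ($toRemove as $n) {
            $n->parentNode->removeChild($n);
        }
    }
}

// Serialize the children of the wrapper again.
$result = '';
foreach ($doc->getElementsByTagName('div')->item(0)->childNodes as $child) {
    $result .= $doc->saveHTML($child);
}
echo $result, "\n";
```

The serializer may write the tags slightly differently from the input (for example without the self-closing slash), so compare on content rather than on the exact string.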
I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...
$start='<hr>';
$end='<embed src="stuff...';
$str=' html here... ';
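function between($t1,$t2,$page) {
$p1=stripos($page,$t1);
if($p1===false) {
return false; // start marker not found
}
$p2=stripos($page,$t2,$p1+strlen($t1));
if($p2===false) {
return false; // end marker not found after the start marker
}
return substr($page,$p1+strlen($t1),$p2-$p1-strlen($t1));
}
$found=between($start,$end,$str);
// Stop on '' as well: an empty match means the first pair is already adjacent,
// and looping on it would never terminate.
while($found!==false && $found!=='') {
// str_ireplace, so the replacement is case-insensitive like the stripos() search
$str=str_ireplace($start.$found.$end,$start.$end,$str);
$found=between($start,$end,$str);
}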
// do something with $str here...
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);
echo $text;
If you want to hard code src in embed tag:
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);
echo $text;

How to remove <br /> tags and more from a string?

I need to strip all <br /> and all 'quotes' (") and all 'ands' (&) and replace them with a space only ...
How can I do this? (in PHP)
I have tried this for the <br />:
$description = preg_replace('<br />', '', $description);
But it returned <> in place of every <br />...
Thanks
<?php
$text = '<p>Test paragraph.</p><!-- Comment --> Other text';
echo strip_tags($text);
echo "\n";
// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
http://php.net/manual/en/function.strip-tags.php
You can use str_replace like this:
str_replace("<br/>", " ", $orig );
preg_replace etc uses regular expressions and that may not be what you want.
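That also explains the <> left behind in the question: preg_replace() expects the pattern to be wrapped in delimiters, and PHP accepts a matching bracket pair such as < and > as delimiters. So '<br />' is read as the pattern br / with nothing around it. A small sketch with a made-up sample string:

```php
<?php
$description = 'one<br />two';

// "<" and ">" are taken as the delimiters, so only "br /" is removed.
$wrong = preg_replace('<br />', '', $description);
var_dump($wrong); // string(8) "one<>two"

// With explicit delimiters the whole tag is matched and replaced by a space.
$right = preg_replace('~<br />~', ' ', $description);
var_dump($right); // string(7) "one two"
```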
If str_replace() isn't working for you, then something else must be wrong, because
$string = 'A string with <br/> & "double quotes".';
$string = str_replace(array('<br/>', '&', '"'), ' ', $string);
echo $string;
outputs
A string with double quotes .
Please provide an example of your input string and what you expect it to look like after filtering.
To manipulate HTML it is generally a good idea to use a DOM-aware tool instead of plain text manipulation tools (think, for example, of what will happen if you encounter variants like <br/>, <br /> with more than one space, or even <br> or <BR/>, which although illegal are sometimes used). See for example here: http://sourceforge.net/projects/simplehtmldom/
To remove all permutations of br:
<br> <br /> <br/> <br >
check out the user contributed strip_only() function in
http://www.php.net/strip_tags
The "Use the DOM instead of replacing" caveat is always correct, but if the task is really limited to these three characters, this should be o.k.
Try this:
$description = preg_replace('/<br \/>/iU', '', $description);
$string = "Test<br>Test<br />Test<br/>";
$string = preg_replace( "/<br>|\n|<br( ?)\/>/", " ", $string );
echo $string;
This worked for me, to remove <br/> :
(&gt; is recognised whereas > isn't)
$temp2 = str_replace('&lt;','', $temp);
// echo ($temp2);
$temp2 = str_replace('/&gt;','', $temp2);
// echo ($temp2);
$temp2 = str_replace('br','', $temp2);
echo ($temp2);
