I have searched this site for a solution but have not found a way to do this task using a regex (or perhaps something shorter that uses less memory).
I am attempting to parse a string in which the text after a specific pattern (and the pattern itself) must be removed from that line. The text before the pattern, and any line that does not contain the pattern, should be left unedited.
Here is a working example:
$text = 'This is a test to remove single lines.<br />
The line below has the open type bbcode (case insensitive) that is to be removed.<br />
The text on the same line that follows the bbcode should also be removed.<br />
this text should remain[test]this text should be removed on this line only!<br />
the other lines should remain.<br />
done.<br />';
$remove = '[test]';
$lines = preg_split('/\r?\n/', $text);
foreach ($lines as $line)
{
$check = substr($line, 0, stripos($line, $remove));
$new[] = !empty($check) ? $check . '<br />' : $line;
}
$newText = implode($new);
echo $newText;
The above code works as expected, but I would like to know how to do this with a regex, or with something that uses a lot less code and memory. I have tried regexes based on examples from this site plus some tinkering, but have not been able to get the required result. The solution should also be compatible with PHP 5.5 syntax (no \e modifier). Accepting an array of removal patterns would also be useful, as I may need to search for multiple patterns (although my example does not show this).
Thank you.
Thanks to frightangel for showing me the proper regex pattern.
Below is the necessary code to accomplish what was asked above:
$text = 'This is a test to remove single lines.<br />
The line below has the open type bbcode (case insensitive) that is to be removed.<br />
The text on the same line that follows the bbcode should also be removed.<br />
this text should remain[test]this text should be removed on this line only!<br />
the other lines should remain.<br />
[bbc]done.<br />
[another]this line should not be affected.<br />
it works!!<br />';
$removals = array('[test]', '[bbc]');
$remove = str_replace(array('[', ']'), array('~\[', '\].*?(?=\<br\s\/\>|\\n|\\r)~mi'), $removals);
$text = preg_replace($remove, '', $text);
echo $text;
The search strings actually come from a MySQL query that populates an array, so I changed the code above to match what will be used in practice ($removals is that array).
The only problem left is the line break: if text precedes the removed part, it would be better to keep that line's final line break. The line break should be omitted only when all the text on a line is removed.
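One way to get that behaviour is to handle the two cases with separate patterns: a marker at the very start of a line removes the whole line including its line break, and a marker later in the line removes only the text up to the <br />. This is a sketch, assuming every line ends in <br /> followed by a newline as in the examples; strip_after_markers() is a hypothetical helper name. It uses preg_quote() so that markers containing other regex metacharacters are escaped too, and it sticks to PHP 5.5 syntax.

```php
<?php
// Sketch only: assumes lines end in "<br />" plus a newline, as in the examples above.
// strip_after_markers() is a hypothetical helper name, not a PHP built-in.
function strip_after_markers($text, array $markers)
{
    foreach ($markers as $marker) {
        // Escape regex metacharacters in the marker ("[" and "]" included).
        $quoted = preg_quote($marker, '~');
        // 1) Marker at the start of a line: remove the whole line, its <br /> and its line break.
        $text = preg_replace('~^' . $quoted . '.*(?:\R|\z)~mi', '', $text);
        // 2) Marker later in a line: remove it and the rest of the line, but keep the <br />.
        $text = preg_replace('~' . $quoted . '.*?(?=<br\s*/?>|$)~mi', '', $text);
    }
    return $text;
}
```

Usage would be something like echo strip_after_markers($text, $removals); with the $removals array from the MySQL query.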
Try this way:
$text = 'This is a test to remove single lines.<br />
The line below has the open type bbcode (case insensitive) that is to be removed.<br />
The text on the same line that follows the bbcode should also be removed.
this text should remain[test]this text should be removed on this line only!<br />
the other lines should remain.<br />
done.<br />';
$remove = 'test';
$text = preg_replace("/\[$remove\].*(?=\<br\s\/\>)/m", '', $text);
$text = preg_replace("/^(\<br\s\/\>)|(\\n)|(\\r)$/m","",$text);
echo $text;
Here's an explanation of the regex: http://regex101.com/r/nW1bG8
Try this, and tell me if that's what you want.
If not, tell me; I probably didn't understand your question.
$text = preg_replace("/\[\S+\].+<[^>]+\s?\/>|<[^>]+\s?\/>/m","",$text);
This question already has answers here: Converting <br /> into a new line for use in a text area (6 answers). Closed 5 years ago.
I have text containing <br> tags, and I want to save it into a MySQL database with new lines instead of HTML tags.
For example:
$string = 'some text with<br>tags here.';
and I want to save it into MySQL like this:
some text with
tags here
What is the right str_replace() call for this? Thank you.
PHP already has a function, nl2br(), that inserts a <br /> before every new line. However, there is no built-in function for the reverse. Instead, you can create your own function like this:
function br2nl($string)
{
$breaks = array("<br />","<br>","<br/>");
return str_ireplace($breaks, "\r\n", $string);
}
Then whenever you want to use it, just call it as follows:
$original_string = 'some text with<br>tags here.';
$good_string = br2nl($original_string);
There are three things worth mentioning:
It may be better to store the data in the database exactly as the user entered it and then do the conversion when you retrieve it. Of course, this depends on what you are doing.
Some systems, such as Windows, use \r\n. Some, such as Linux and Mac, use \n. Some, such as older Mac systems, use \r for new line characters. Given this, and especially if you choose to follow the first point above, you might prefer to use the PHP constant PHP_EOL instead of \r\n. It gives the correct new line character for the system PHP is running on.
The method I posted above will be more efficient than preg_replace(). However, it only catches the exact forms listed in the array, not non-standard variations such as extra whitespace inside the tag (<br  />). If you need to take these variations into account, use preg_replace() instead. That said, one can overthink all the possible variations and still not account for them all; consider, for example, <br id="mybreak"> and many other combinations of attributes and whitespace.
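If you do go the preg_replace() route, a sketch might look like this. br2nl_regex() is a hypothetical name (PHP has no built-in br2nl()), and the pattern <br\b[^>]*> is one reasonable choice, not the only one: it accepts upper case, extra whitespace, a self-closing slash and attributes such as <br id="mybreak">, but it would still miss badly malformed tags.

```php
<?php
// Sketch: regex-based br2nl(). br2nl_regex() is a hypothetical name.
function br2nl_regex($string, $newline = PHP_EOL)
{
    // <br\b  : the tag name "br" followed by a word boundary, so <bra> is not matched
    // [^>]*  : optional whitespace, attributes and/or a self-closing slash
    // i flag : case-insensitive, so <BR> matches too
    return preg_replace('~<br\b[^>]*>~i', $newline, $string);
}

echo br2nl_regex('some text with<br>tags here.', "\n");
```

Passing "\r\n" or PHP_EOL as the second argument picks the line ending, in line with the second point above.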
You could use str_replace, as you suggest.
$string = 'some text with<br>tags here.';
$string = str_replace('<br>', "\r\n", $string);
Although, if your <br> tags may also be closed (<br /> or <br/>), it may be worth using preg_replace():
$string = 'some text with<br>tags here.';
$string = preg_replace('/<br\s*\/?>/i', "\r\n", $string);
Try this. It replaces every <br> with \r\n.
$string = 'some text with<br>tags here.';
$string = str_replace("<br>","\r\n",$string);
echo $string;
Output:
some text with
tags here.
You can use htmlentities() to convert all applicable characters to HTML entities, and html_entity_decode() to convert HTML entities back to characters:
$string = 'some text with<br>tags here';
$a = htmlentities($string);
$b = html_entity_decode($a);
echo $a; // some text with&lt;br&gt;tags here (a browser displays it as some text with<br>tags here)
echo $b; // some text with<br>tags here
Try mysql_real_escape_string():
function safe($value){
return mysql_real_escape_string($value);
}
I have this message (the quotes are not part of it; they are just there to be precise):
"hey, here i am<br /><br />
"
Note the whitespace after the line breaks. Here's the thing: I'm trying to use a regex to remove all the invisible characters and <br /> tags at the end of the message, so that I get something like "hey, here I am". But I must be doing something wrong, because I can't make it work. This is what I tried:
$content = preg_replace('{(<br(\s*/)?>| |\r\n|\r|\n| )+$}i', '', $content);
But the message remains the same at the end. Must be something simple I missed. Thank you for your help!
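A sketch of a pattern that also covers a common cause of "invisible" characters, assuming the text is UTF-8 and the stray character might be a non-breaking space (U+00A0) or the &nbsp; entity rather than an ordinary space. That is a guess about the input, not something the question confirms. strip_trailing_breaks() is a hypothetical helper name.

```php
<?php
// Sketch: strip trailing <br> tags and "invisible" characters from the end of a message.
// Assumption: UTF-8 input; the stray character may be U+00A0 or the &nbsp; entity.
// strip_trailing_breaks() is a hypothetical helper name.
function strip_trailing_breaks($content)
{
    // One or more of: a <br> variant, &nbsp;, whitespace, or U+00A0, anchored to the end.
    // The u flag treats the subject as UTF-8; invalid UTF-8 makes preg_replace() return null.
    return preg_replace('~(?:<br\s*/?>|&nbsp;|[\s\x{00A0}])+$~iu', '', $content);
}
```

If the invisible character turns out to be something else, it can be added to the character class in the same way.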
You don't need a regular expression to do that. Use the strip_tags() function to remove the tags.
$str = 'hey, here i am<br /><br />';
echo strip_tags($str);//yields hey, here i am
Don't try to write your own regular expressions to parse HTML when you have tools that already do it. Sometimes it's necessary, depending on the case, but in your case I would say it isn't. Just use the built-in function.
You can use the following regex:
([^\s\w",](?:br\W*\s*)+)"$
The code is:
$re = '/([^\s\w",](?:br\W*\s*)+)"$/';
$str = "\"hey, here i am<br /> test<br /><br />\n \"";
$subst = '';
$result = preg_replace($re, $subst, $str);
You should not use a regular expression to do what you want. Take a look at this answer: RegEx match open tags except XHTML self-contained tags
Instead use strip_tags().
$str = 'hey, here i am<br /><br />';
$str=~s{(<br />|\s)*$}{}ig;
Use this code; it might help you:
my $cnt;
$cnt = "hey, here i am<br /><br />";
$cnt =~s/(<br \/>)*//isg;
print $cnt;
Output: "hey, here i am"
I'm relatively new to regular expressions and I'm having a problem with this one. I've searched this site and found nothing that works.
I want to remove every <br /> between <div class='quote'> and </div>. The reason is that the CSS preserves the whitespace anyway, and I want to remove any extra line breaks the user puts in.
For example, say I have this:
<div class='quote'>First line of text<br />
Second line of text<br />
Third line of text</div>
I've been trying to use this to remove both <br /> tags:
$TEXT = preg_replace("/(<div class='quote'>(.*?))<br \/>((.*?)<\/div>)/is","$1$3",$TEXT);
This works to an extent because the result is:
<div class='quote'>First line of text
Second line of text<br />
Third line of text</div>
However it won't remove the second <br />. Can someone help please? I figure it's probably something small I'm missing :)
Thanks!
If you want to clear all the <br /> tags inside just one div block, you first need to capture the content of that div block and then remove all the <br /> tags from it.
Your regex contains only one <br />, so it replaces only one <br />.
You need something like this:
function clear_br($a)
{
return str_replace("<br />", "", $a[0]);
}
$TEXT = preg_replace_callback("/<div class='quote'>.*?<br \/>.*?<\/div>/is", "clear_br", $TEXT);
preg_replace() can replace more than once: you didn't pass the 4th argument ($limit), so there is no limit. It replaced only once because your regex includes the wrapping <div>, so it could match your string only once; your string contains only one such <div>.
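A small sketch of preg_replace()'s optional 4th ($limit) and 5th ($count) arguments, to show that every match is replaced unless a limit is given. The sample string is made up for illustration.

```php
<?php
// Sketch of preg_replace()'s $limit (4th) and $count (5th) arguments.
$s = "First<br />Second<br />Third";

// No limit (the default, -1): every match is replaced; $count receives the number of replacements.
$all = preg_replace('~<br />~', ' ', $s, -1, $count);
echo $all, " ($count replacements)\n";

// Limit of 1: only the first match is replaced.
$one = preg_replace('~<br />~', ' ', $s, 1);
echo $one, "\n";
```

So the fix is not the limit but the pattern: match each <br /> on its own (for example inside a callback that receives only the quote div), as the other answers do.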
Assuming we already have:
<div class='quote'>First line of text<br />
Second line of text<br />
Third line of text</div>
we can simply do something like:
$s = "<div class='quote'>First line of text<br />\nSecond line of text<br>\nThird line of text</div>";
echo preg_replace("{<br\s*/?>}", " ", $s);
The \s* allows optional whitespace, in case the tag is <br/>. The /? makes the / optional, in case it is <br>. If the system inserted those <br /> tags for you and you are sure they always have this form, you can use a simpler regex instead.
One word of caution: I would actually replace each tag with a space, because with hello<br>world, an empty replacement would produce helloworld, merging two words into one.
(If you haven't already extracted this <div ...> ... </div>, you would probably need to do that first with an HTML parser, for example if the original content is a whole web page. A parser is needed because the content inside the outer <div>...</div> might itself contain <div> and </div>, possibly nested. If there is no <div> inside, it is easier to extract it with just a regex.)
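With an HTML parser, the extraction step and the removal step collapse into one. A sketch using PHP's DOMDocument and DOMXPath (it needs the dom extension); remove_quote_breaks() is a hypothetical helper name, and wrapping the fragment in an extra <div> is just one way to parse a fragment without libxml adding <html> and <body>. Because it removes every <br> that is a descendant of the quote div, nested divs are handled too.

```php
<?php
// Sketch: remove every <br> inside <div class="quote">, including nested elements.
// remove_quote_breaks() is a hypothetical helper name; requires the dom extension.
function remove_quote_breaks($html)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about loose HTML
    // Wrap the fragment in one root element; the flags stop libxml adding <html>, <body> and a doctype.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' quote ')]//br";
    // Copy the matches into an array first so removing nodes does not disturb the node list.
    foreach (iterator_to_array($xpath->query($query)) as $br) {
        $br->parentNode->removeChild($br);
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Note that the serialized output is normalized by libxml (for example, attribute quotes become double quotes), so it may not be byte-for-byte identical to the input.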
I don't get your [.*?]: you said here that you want "any character any number of times, zero or one time". So you can simply say "any character any number of times": .*
A replacement can't be passed through a function as clear_br($1), so use preg_replace_callback() (the s modifier lets .* span the line breaks):
function clear_br($a){ return str_replace("<br />","",$a[0]); }
$TEXT = preg_replace_callback("/(<div class='quote'>.*<br \/>.*<\/div>)/s",'clear_br', $TEXT);
Apart from that, this should work.
You have to be careful about how you capture the div that contains the br elements. Mr. 動靜能量 pointed out that you need to watch out for nested divs; my solution does not handle them.
<?php
$subject ="
<div>yomama</div>
<div class='quote'>First line of text<br />
Second line of text<br />
Third line of text</div>
<div>hasamustache</div>
";
$result = preg_replace_callback( '#<div[^>]+class.*quote.*?</div>#s',
function ($matches) {
print_r($matches);
return preg_replace('#<br ?/?>#', '', $matches[0]);
}
, $subject);
echo "$result\n";
?>
# is used as a regex delimiter instead of the conventional /
<div[^>]+ prevents the yomama div from being matched, which would have happened with <div.*class.*quote because the s modifier lets . match newlines as well.
quote.*? means a non-greedy match to prevent hasamustache</div> from being caught.
So the strategy is to match only the quote div in a string with newlines, and run a function on it that will kill all br tags.
I'm trying to use preg_replace to strip out a section of code but I am having problems getting it to work right.
Code Example:
$str = '<p class="code">some string here</p>';
PHP I'm using:
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$str = preg_replace($pattern,"", $str);
This strips out the code just as I want with the exception of the space between the p and class.
Returns:
some string here //notice the single space at the beginning.
I'm trying to get:
some string here //no space at the beginning.
I have been beating my head against the wall trying to find a solution. The reason I'm trying to strip it out in a chunk instead of breaking the preg_replace into pieces is because I don't want to change anything that may be in the string between the tags. Any ideas?
That does not happen for me (and it shouldn't).
The space may be output somewhere else (use var_dump() to inspect the string).
You might want to look into this thread and consider switching to DOMDocument. It'll save you a great deal of headaches when parsing HTML.
Robust and Mature HTML Parser for PHP
test:
<?php
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$result = preg_replace($pattern,"", $str);
var_dump($result);
result:
php pregrep.php
string(16) "some string here"
seems to work just fine.
Alex, I figured out where I was picking up the extra space.
I was putting that code into a text area like this:
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$strip_str = preg_replace($pattern,"", $str);
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5">
<?php echo $strip_str; ?>
</textarea>
This gave me the extra space, but when I changed the code to:
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5"><?php echo $strip_str; ?></textarea>
with no spaces or line breaks around the PHP tag, the extra space went away.
Why not use trim()?
$text = trim($text);
This removes whitespace from the beginning and end of a string.
<hr>I want to remove this text.<embed src="stuffinhere.html"/>
I tried using regex but nothing works.
Thanks in advance.
P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str)
You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
The rest of this answer is for when you've decided that the HTML parser is too slow, doesn't handle ill-formed (i.e. standard in the wild) HTML, or is a pain in the ass to integrate into a system you don't control. I created the following small test script:
$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"
and it did remove the text, so I'd check your source documents and any other PHP code around your regex. You're not feeding preg_replace the string you think you are. My best guess is that your source document has irregular case, or that there are line breaks between the <hr /> and <embed>, which . does not match by default. Try the following regular expression instead.
$str = '<hr>I want to remove
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"
The "i" modifier makes the search case-insensitive. The "s" modifier makes the . character also match line breaks (\n), which it does not do by default.
But use a proper parser if you can. Seriously.
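For comparison, a sketch of the parser route with PHP's DOMDocument (dom extension required). remove_between_hr_and_embed() is a hypothetical helper name, and it makes a simplifying assumption: it only removes the siblings that follow an <hr> up to the next <embed> sibling, and leaves everything alone if no <embed> follows. Text split across different parent elements is not handled.

```php
<?php
// Sketch: remove the nodes between an <hr> and the next <embed> sibling.
// remove_between_hr_and_embed() is a hypothetical helper; it only looks at siblings.
function remove_between_hr_and_embed($html)
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate loose markup such as <embed ... />
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (iterator_to_array($doc->getElementsByTagName('hr')) as $hr) {
        $between = array();
        $foundEmbed = false;
        // Walk forward through the siblings until an <embed> element is reached.
        for ($node = $hr->nextSibling; $node !== null; $node = $node->nextSibling) {
            if ($node->nodeType === XML_ELEMENT_NODE && strtolower($node->nodeName) === 'embed') {
                $foundEmbed = true;
                break;
            }
            $between[] = $node;
        }
        // Only remove anything if the closing <embed> was actually found.
        if ($foundEmbed) {
            foreach ($between as $node) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Case and line breaks stop mattering here, because the parser matches tag names and ignores how the text between them is laid out.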
I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...
$start='<hr>';
$end='<embed src="stuff...';
$str=' html here... ';
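function between($t1,$t2,$page,$offset=0) {
$p1=stripos($page,$t1,$offset);
if($p1===false) {
return false;
}
$p2=stripos($page,$t2,$p1+strlen($t1));
if($p2===false) {
return false; // no end marker after $t1: stop instead of looping forever
}
// position and length of the text between the two markers
return array($p1+strlen($t1),$p2-$p1-strlen($t1));
}
$offset=0;
while(($found=between($start,$end,$str,$offset))!==false) {
// cut the text out in place; the next search starts right after this $start
$str=substr_replace($str,'',$found[0],$found[1]);
$offset=$found[0];
}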
// do something with $str here...
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);
echo $text;
If you want to hard-code the src of the embed tag:
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);
echo $text;