preg_replace question - php

I'm trying to use preg_replace to strip out a section of code but I am having problems getting it to work right.
Code Example:
$str = '<p class="code">some string here</p>';
PHP I'm using:
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
preg_replace($pattern,"", $str);
This strips out the code just as I want with the exception of the space between the p and class.
Returns:
some string here //notice the single space at the beginning.
I'm trying to get:
some string here //no space at the beginning.
I have been beating my head against the wall trying to find a solution. The reason I'm trying to strip it out in a chunk instead of breaking the preg_replace into pieces is because I don't want to change anything that may be in the string between the tags. Any ideas?

That does not happen for me (and it shouldn't).
It may be a space output somewhere else (use var_dump() to view the string).

You might want to look into this thread to see if you want to switch to using DOMDocument. It'll save you a great deal of headaches trying to parse through HTML.
Robust and Mature HTML Parser for PHP
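For anyone who wants to try the DOMDocument route, here is a minimal sketch using the string from the question. It assumes the DOM extension is enabled and that the fragment is well-formed enough for loadHTML(); the variable names $doc, $p and $inner are only for illustration. Child nodes are serialized one by one so that any markup inside the paragraph is kept.

```php
<?php
$str = '<p class="code">some string here</p>';

$doc = new DOMDocument();
$doc->loadHTML($str); // the fragment gets wrapped in <html><body> internally

// Take the first <p> and serialize its children, leaving out the <p> tag itself.
$p = $doc->getElementsByTagName('p')->item(0);
$inner = '';
foreach ($p->childNodes as $child) {
    $inner .= $doc->saveHTML($child);
}

var_dump($inner);
```

The parser may normalize the inner markup (entities, attribute quoting), so this is not a byte-for-byte copy of the original string.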

test:
<?php
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$result = preg_replace($pattern,"", $str);
var_dump($result);
result:
php pregrep.php
string(16) "some string here"
seems to work just fine.

Alex, I figured out where I was picking up the extra space.
I was putting that code into a text area like this:
$str = '<p class="code">some string here</p>';
$pattern = array();
$pattern[0] = '!<p class="code">!';
$pattern[1] = '!</p>!';
$strip_str = preg_replace($pattern,"", $str);
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5">
<?php echo $strip_str; ?>
</textarea>
This gave me my extra space, but when I changed the code to:
<textarea id="code_area" class="syntaxhl" name="code" cols="66" rows="5"><?php echo $strip_str; ?></textarea>
with no spaces or line breaks between the tags, the extra space went away.

Why not use trim()?
$text = trim($text);
This removes whitespace from the beginning and end of a string.
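Putting the pieces of this thread together, a rough sketch might look like the following. Note that trim() only cleans the string itself; whitespace written in the template between the textarea tags is still sent to the browser, so the tag is built on one line here. The htmlspecialchars() call and the $html variable are my additions, not part of the original code.

```php
<?php
$str = '<p class="code">some string here</p>';
$pattern = array('!<p class="code">!', '!</p>!');

// Strip the wrapper tags, then remove any whitespace left at either end.
$strip_str = trim(preg_replace($pattern, '', $str));

// No spaces or line breaks between the tags and the echoed value.
$html = '<textarea id="code_area" name="code">' . htmlspecialchars($strip_str) . '</textarea>';
echo $html;
```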


How to add new line in php echo

The text of story content in my database is:
I want to add\r\nnew line
(no quote)
When I use:
echo nl2br($story->getStoryContent());
to replace the \r\n with br, it doesn't work. The browser still displays \r\n. When I view the source, the \r\n is still there and there is no br to be found. This is weird, because when I test nl2br with simple code like:
echo nl2br("Welcome\r\nThis is my HTML document");
it does work. Would you please tell me why it didn't work? Thank you so much.
The following snippet uses a technique that you may like better:
<?php
$example = "\n\rSome Kind\r of \nText\n\n";
$replace = array("\r\n", "\n\r", "\r", "\n");
$subs = array("","","","");
$text = str_replace($replace, $subs, $example );
var_dump($text); // "Some Kind of Text"
I doubt that you need "\n\r" but I left it in just in case you feel it is really necessary.
This works by having an array of line termination strings to be replaced with an empty string in each case.
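I found that the answer is pretty simple. The stored text contains the literal characters \r\n (a backslash followed by a letter) rather than real line breaks, so I simply replace them directly: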
$text = $this->storyContent;
$text = str_replace("\\r\\n","<br>",$text);
$text = str_replace("\\n\\r","<br>",$text);
$text = str_replace("\\r","<br>",$text);
$text = str_replace("\\n","<br>",$text);
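To show why nl2br() appeared to do nothing, here is a small sketch. It assumes the database value holds a literal backslash followed by r and n, which is what the working str_replace() calls above suggest; the variable names are made up for the example.

```php
<?php
// Single quotes keep the backslashes, modelling text stored with a literal \r\n.
$stored = 'I want to add\r\nnew line';

// nl2br() only reacts to real line break characters, so nothing changes here.
$unchanged = nl2br($stored);

// Turn the literal sequences into a real newline first, then let nl2br() do its job.
$withBreaks = str_replace(array('\r\n', '\r', '\n'), "\n", $stored);
$html = nl2br($withBreaks);

echo $html;
```

Converting to a real newline first keeps nl2br() in the pipeline, which can be handy if some rows contain real line breaks and others contain the escaped form.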

Regex detect line break

I have this message (without the quotes, it's just for being precise):
"hey, here i am<br /><br />
"
Note the white space after the line break. So here's the thing: I'm trying to remove all the invisible chars and the <br /> of the message, all of them at the end of the message, with a regex to have something like "hey, here I am". But I must do something wrong because I can't make it work. That's what I tried:
$content = preg_replace('{(<br(\s*/)?>| |\r\n|\r|\n| )+$}i', '', $content);
But the message remains the same at the end. Must be something simple I missed. Thank you for your help!
You don't need a regular expression to do that. Use the strip tags function to remove the tags.
$str = 'hey, here i am<br /><br />';
echo strip_tags($str);//yields hey, here i am
Don't try to write your own regular expressions to parse HTML when you have tools that already do it. Sometimes it's necessary depending on case, but in your case I would say it isn't. Just use the built in function.
You can use the following regex:
([^\s\w",](?:br\W*\s*)+)"$
The code is:
$re = '/([^\s\w",](?:br\W*\s*)+)"$/';
$str = "\"hey, here i am<br /> test<br /><br />\n \"";
$subst = '';
$result = preg_replace($re, $subst, $str);
You should not use regular expression to do what you wanted. Take a look at this answer: RegEx match open tags except XHTML self-contained tags
Instead use strip_tags().
$str = 'hey, here i am<br /><br />';
$str=~s{(<br />|\s)*$}{}ig;
Note that the substitution above is Perl syntax, not PHP. This Perl code might also help you:
my $cnt;
$cnt = "hey, here i am<br /><br />";
$cnt =~s/(<br \/>)*//isg;
print $cnt;
output : "hey, here i am"
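For a PHP version of the same idea (strip only the trailing line breaks and whitespace), something along these lines could work. It is a sketch: the &nbsp; alternative is a guess at what the invisible characters in the original message might be, and $clean is just an illustrative name.

```php
<?php
$content = "hey, here i am<br /><br />\n ";

// Remove any run of <br>, <br/>, <br />, &nbsp; or whitespace that ends the string.
// \z anchors at the very end of the subject.
$clean = preg_replace('~(?:<br\s*/?>|&nbsp;|\s)+\z~i', '', $content);

var_dump($clean);
```

Unlike strip_tags(), this leaves any tags in the middle of the message alone.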

replace text in single line after specific occurence using regex

I have searched for a solution to a problem on this site but have not found a way to do this task using regex (or perhaps something just shortened that uses less memory).
I am attempting to parse a string where the text after a specific pattern (and the pattern itself) is to be removed from the same line. The text prior to the pattern, and any line not containing the search pattern, should be left unedited.
Here is a working example:
$text = 'This is a test to remove single lines.<br />
The line below has the open type bbcode (case insensitive) that is to be removed.<br />
The text on the same line that follows the bbcode should also be removed.<br />
this text should remain[test]this text should be removed on this line only!<br />
the other lines should remain.<br />
done.<br />';
$remove = '[test]';
$lines = preg_split('/\r?\n/', $text);
foreach ($lines as $line)
{
$check = substr($line, 0, stripos($line, $remove));
$new[] = !empty($check) ? $check . '<br />' : $line;
}
$newText = implode($new);
echo $newText;
The above code works as expected, but I would like to know how to do this using regex, or perhaps something that uses a lot less code and memory. I have attempted to do this using regex from examples on this site plus some tinkering, but have not been able to get the required result. The solution should also be compatible with PHP 5.5 (no /e modifier). Using an array for the removal patterns would also be fitting, as I may need to search for multiple patterns (although that is not shown in my example).
Thank you.
Thanks to frightangel for showing me the proper regex pattern.
Below is the necessary code to accomplish what was asked above:
$text = 'This is a test to remove single lines.<br />
The line below has the open type bbcode (case insensitive) that is to be removed.<br />
The text on the same line that follows the bbcode should also be removed.<br />
this text should remain[test]this text should be removed on this line only!<br />
the other lines should remain.<br />
[bbc]done.<br />
[another]this line should not be affected.<br />
it works!!<br />';
$removals = array('[test]', '[bbc]');
$remove = str_replace(array('[', ']'), array('~\[', '\].*?(?=\<br\s\/\>|\\n|\\r)~mi'), $removals);
$text = preg_replace($remove, '', $text);
echo $text;
The text that it searches for actually comes from a mysql query that feeds an array so I changed what is shown above to use what will more or less be used ($removals being that array).
The only problem left for me is that if text was prior to the removal then it would be better to leave the final line break from that line instead of omitting it. It should only be omitted if all text from a single line is removed.
Try this way:
$text = 'This is a test to remove single lines.<br />
The line below has the open type bbcode (case insensitive) that is to be removed.<br />
The text on the same line that follows the bbcode should also be removed.
this text should remain[test]this text should be removed on this line only!<br />
the other lines should remain.<br />
done.<br />';
$remove = 'test';
$text = preg_replace("/\[$remove\].*(?=\<br\s\/\>)/m", '', $text);
$text = preg_replace("/^(\<br\s\/\>)|(\\n)|(\\r)$/m","",$text);
echo $text;
Here's regex explanation: http://regex101.com/r/nW1bG8
Try this, and tell me if that's what you want.
If not, tell me; I probably didn't understand your question.
preg_replace("/\[\S+\].+<[^>]+\s?\/>|<[^>]+\s?\/>/m","",$text);
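Since the tags come from an array, another option is to let preg_quote() do the escaping and build one pattern for all of them. This is only a sketch with a shortened sample text; $quoted, $pattern and $result are names I made up, and it does not drop the line break when a whole line is removed.

```php
<?php
$text = "keep[test]drop<br />\nother line<br />\n[bbc]gone<br />\n[another]stay<br />";
$removals = array('[test]', '[bbc]');

// Escape each tag so the brackets are matched literally.
$quoted = array();
foreach ($removals as $tag) {
    $quoted[] = preg_quote($tag, '~');
}

// Match a tag plus the rest of its line, stopping before <br /> or the end of the line.
$pattern = '~(?:' . implode('|', $quoted) . ')[^\r\n]*?(?=<br\s*/?>|$)~mi';
$result = preg_replace($pattern, '', $text);

echo $result;
```

The i modifier keeps the match case insensitive, as asked for in the question, and the m modifier lets $ match at the end of every line.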

Remove javascript links

I'm looking for a regex that will be able to replace all links like Link with a warning. I've been having a play but no success so far! I've always been bad with regex, can someone point me in the right direction? I have this so far:
Edit: To the people saying don't use regex: the HTML will be the output of a markdown parser, with all HTML tags in the markdown stripped. Therefore I know that all links in the output will be formatted as stated above, so regex would surely be a good tool in this particular situation. I am not allowing users to enter pure HTML. SO has done something very similar: try creating a javascript link, and it will be removed.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = "/ href\\s*?=\\s*?[\"']\\s*?(javascript)\\s*?(:).*?([\"']) /is";
$replacement = "\"javascript: alert('Javascript links have been blocked');\"";
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>
<form method="post">
<input type="text" name="jsfilter" />
<button type="submit">Submit</button>
</form>
The right regex should be:
$pattern = '/href="javascript:[^"]+"/';
$replacement = 'href="javascript:alert(\'Javascript links have been blocked\')"';
Use strip_tags() and htmlspecialchars() to display user-generated content. If you want to let users use specific tags, refer to BBcode.
You should test quote and double quotes, handle white spaces, etc...
$html = preg_replace( '/href\s*=\s*"javascript:[^"]+"/i' , 'href="#"' , $html );
$html = preg_replace( '/href\s*=\s*\'javascript:[^\']+\'/i' , 'href=\'#\'' , $html );
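The two quote styles can also be covered by a single pattern with a backreference. A sketch, assuming links shaped like the ones below (the sample markup is hypothetical) and not meant as a complete sanitizer:

```php
<?php
$html = '<a href="javascript:alert(1)">JS Link</a> '
      . '<a href=\'javascript:void(0)\'>Other</a> '
      . '<a href="/home">Home</a>';

// (["\']) captures the opening quote; \1 requires the same quote to close the value.
$pattern = '~href\s*=\s*(["\'])\s*javascript:.*?\1~is';
$safe = preg_replace($pattern, 'href="#"', $html);

echo $safe;
```

Ordinary links such as /home are left untouched because the pattern requires javascript: right after the opening quote.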
Try this code. I think it will help.
<?php
//Javascript link filter test
if(isset($_POST['jsfilter'])){
$html = " JS Link ";
$pattern = '/a href="javascript:(.*?)"/i';
$replacement = 'a href="javascript: alert(\'Javascript links have been blocked\');"';
$html = preg_replace($pattern, $replacement, $html);
echo $html;
}
?>

Remove all text between <hr> and <embed> tag?

<hr>I want to remove this text.<embed src="stuffinhere.html"/>
I tried using regex but nothing works.
Thanks in advance.
P.S. I tried this: $str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str)
You'll get a lot of advice to use an HTML parser for this kind of thing. You should do that.
The rest of this answer is for when you've decided that the HTML parser is too slow, doesn't handle ill-formed (i.e. standard in the wild) HTML, or is a pain in the ass to integrate into the system you don't control. I created the following small test script:
$str = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><embed src="stuffinhere.html"/>"
and it did remove the text, so I'd check your source documents and any other PHP code around your RegEx. You're not feeding preg_replace the string you think you are. My best guess is your source document has irregular case, or there's whitespace between the <hr /> and <embed>. Try the following regular expression instead.
$str = '<hr>I want to remove
this text.
<EMBED src="stuffinhere.html"/>';
$str = preg_replace('#(<hr>).*?(<embed)#si', '$1$2', $str);
var_dump($str);
//outputs
string(35) "<hr><EMBED src="stuffinhere.html"/>"
The "i" modifier says "make this search case insensitive". The "s" modifier says "the dot (.) should also match line break characters", which it does not do by default.
But use a proper parser if you can. Seriously.
I think the code is self-explanatory and pretty easy to understand since it does not use regex (and it might be faster)...
$start='<hr>';
$end='<embed src="stuff...';
$str=' html here... ';
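function between($t1,$t2,$page) {
$p1=stripos($page,$t1);
if($p1===false) {
return false; // opening marker not found
}
$p2=stripos($page,$t2,$p1+strlen($t1));
if($p2===false) {
return false; // no closing marker after the opening one
}
return substr($page,$p1+strlen($t1),$p2-$p1-strlen($t1));
}
$found=between($start,$end,$str);
// Stop on an empty match too, otherwise the loop never ends once the text is gone.
// This means only the first <hr>...<embed> pair (and identical copies of it) is cleaned.
while($found!==false && $found!=='') {
// str_ireplace() matches the case-insensitive stripos() lookups above.
$str=str_ireplace($start.$found.$end,$start.$end,$str);
$found=between($start,$end,$str);
}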
// do something with $str here...
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed.*?>)#', '$1$2', $text);
echo $text;
If you want to hard code src in embed tag:
$text = '<hr>I want to remove this text.<embed src="stuffinhere.html"/>';
$text = preg_replace('#(<hr>).*?(<embed src="stuffinhere.html"/>)#', '$1$2', $text);
echo $text;
