PHP preg_replace needs improvement


Ok, I have a string as follows:
$disallowedBBC = 'abbr|acronym|anchor|bdo|black|blue|br|color|email|flash|font|ftp|glow|green|html|hr|img|iurl|li|list|ltr|url|quote';
And then a preg_replace on the actual string (the $message variable) that should strip every BBCode tag that is not allowed according to the $disallowedBBC variable:
$message = preg_replace("/\[($disallowedBBC)[^]]*](.*?)\[\/($disallowedBBC)\]/is", "$2", $message);
But, for some reason, the [hr] tag is getting past this preg_replace. So, in this case:
$message = '[hr]Test';
It outputs the [hr] tag, but should remove it. What is wrong with my regex?
Basically...
How can I change it so that it removes [hr] whether it appears alone or as [hr]Test[/hr]? It also needs to strip the tags from something like [url=http://someurl.com]Some Url[/url], and it should remove [color=red] from a string such as: [color=red]Testing
In other words, it needs to remove [{tag}] together with its closing tag [/{tag}] when there is one; if there is no closing tag it should remove just the opening tag, and vice versa. It should also match anything that follows {tag} inside the brackets, such as: [quote author=Solomon time=7834783470]Just a quote here[/quote] Additional text here...
So, this should output: Just a quote here Additional text here...

I think you need two preg_replaces. One to first get rid of the pairs of [hr]...[/hr] and then a second to get rid of any remaining [hr]...
$message = preg_replace("/\[($disallowedBBC)[^\]]*](.*?)\[\/($disallowedBBC)\]/is", "$2", $message);
$message = preg_replace("/\[($disallowedBBC)[^\]]*]/is", "", $message);
If you try to do it in one step, then things like "abc[br]blahblah[hr]gak[/hr]def" will become "abcdef". You might be able to do it if you can place restrictions on the blahblah portion.
You can, of course, combine these into one preg_replace call by using the array syntax (but remember that the order matters):
$patterns = array(
    "/\[($disallowedBBC)[^\]]*](.*?)\[\/($disallowedBBC)\]/is",
    "/\[($disallowedBBC)[^\]]*]/is",
);
$replacements = array("$2", "");
$message = preg_replace($patterns, $replacements, $message);
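The two-pass idea above can be wrapped in a small function. This is only a sketch under a few assumptions: the name stripDisallowedBbc is made up for illustration, a \b word boundary is added so that a short name like li cannot match inside an unrelated tag, the closing tag is tied to the opening one with a backreference, and the second pass also removes orphaned closing tags (the "vice versa" case from the question).

```php
<?php
// Hypothetical helper; the name stripDisallowedBbc is not from the original post.
function stripDisallowedBbc(string $message, string $disallowedBBC): string
{
    // Pass 1: unwrap complete pairs, keeping the text between them.
    // \b stops a short name such as "li" from matching inside e.g. [link].
    // \1 requires the closing tag to carry the same name as the opening tag.
    $paired = "/\\[($disallowedBBC)\\b[^\\]]*\\](.*?)\\[\\/\\1\\]/is";
    $message = preg_replace($paired, '$2', $message);

    // Pass 2: drop any leftover opening or closing tag.
    $orphan = "/\\[\\/?(?:$disallowedBBC)\\b[^\\]]*\\]/i";
    return preg_replace($orphan, '', $message);
}

$disallowedBBC = 'abbr|acronym|anchor|bdo|black|blue|br|color|email|flash|font|ftp|glow|green|html|hr|img|iurl|li|list|ltr|url|quote';

echo stripDisallowedBbc('[hr]Test', $disallowedBBC), "\n";            // Test
echo stripDisallowedBbc('[color=red]Testing', $disallowedBBC), "\n";  // Testing
```

Because the closing tag has to match the opening tag's name, the "abc[br]blahblah[hr]gak[/hr]def" case keeps its text: the [hr] pair is unwrapped first and the lone [br] is removed in the second pass.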

Related

strip_tags php removes too much

I'm having trouble with PHP's strip_tags() function and a malformed string that includes HTML (see the '' at the beginning and the <'blabla).
I have this code:
$str = "To: ''blablal#johndoe.com' <'blablal#johndoe.com>\nSubject: Hello World\nDear Ladies <b>and</b> Gentlemen,";
echo strip_tags($str);
With the following output:
To: ''blablal#johndoe.com'
My wanted/expected result is:
To: ''blablal#johndoe.com'
Subject: Hello World
Dear Ladies and Gentlemen,
Do you have any idea to get this?

If strip_tags() doesn't work as you expect, try this instead:
$str = "To: ''blablal#johndoe.com' <'blablal#johndoe.com>\nSubject: Hello World\nDear Ladies <b>and</b> Gentlemen,";
$val = preg_replace('/<[^>]+?>/', ' ', $str);
$val now contains the string with each HTML tag replaced by a space.

The PHP manual warns about this:
Because strip_tags() does not actually validate the HTML, partial or broken tags can result in the removal of more text/data than expected.
Your input is invalid HTML. The HTML validator says:
Bad character ' after <. Probable cause: Unescaped <. Try escaping it as &lt;.

The reason is that when the stripper finds a < that is followed by a non-whitespace character, it assumes it is inside a tag. Inside a tag, if it sees a quotation mark it sets a flag (in_q) and looks for the matching closing quote. If it finds one it clears the flag (in_q = 0;), but if it doesn't, it assumes it is still inside quotes, consumes everything up to the end of the string, and removes it from the output.
If your input contains malformed tags like this, you are better off using a regular expression instead:
preg_replace('~<\S[^<>]*>~', '', $str);
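As a quick check, here is that pattern applied to the string from the question. It is just a sketch; the variable name $clean is mine. The regex never looks at quotes, so the stray ' after < cannot swallow the following lines.

```php
<?php
// The malformed input from the question.
$str = "To: ''blablal#johndoe.com' <'blablal#johndoe.com>\nSubject: Hello World\nDear Ladies <b>and</b> Gentlemen,";

// Remove anything shaped like a tag: "<", one non-space character,
// then any run of characters other than "<" or ">", then ">".
$clean = preg_replace('~<\S[^<>]*>~', '', $str);

echo $clean, "\n";
```

The Subject and Dear Ladies lines survive; the only leftover is a trailing space on the first line where the bracketed address used to be.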

Remove information from string using php

I am trying to remove the following type of info from a string using php :
[url:2q57noz9]http://www.mysite.com/other/screencaps-from-ddd-t7099.html#p24174[/url:2q57noz9]
there are random numbers assigned to the [url: bit which makes it harder. I tried to adapt the following which works for image tags but I don't think it likes square brackets put in like I have. This is what I used for images :
$message = preg_replace(array("/<img[^>]+\>/i","/<!--[^>]+\-->/i"), "", $message);
and this is how I tried to modify it without success :
$message = preg_replace("/[[^>]+\]/i", "", $message);

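Add a backslash before the opening bracket so it is treated as a literal character:
$message = preg_replace('/\[[^>]+\]/i', "", $message);
Also use single quotes for the pattern string, so PHP does not process escape sequences or interpolate variables inside it.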
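Note that [^>]+ is greedy and will run from the first [ to the last ] in the message, so other bracketed text can get caught as well. A narrower sketch, assuming you want to drop the whole [url:...]...[/url:...] block including the link (the function name stripUrlTags is made up for illustration):

```php
<?php
// Hypothetical helper name; not from the original post.
function stripUrlTags(string $message): string
{
    // \[url:[^\]]*\]   opening tag with whatever id follows the colon
    // .*?              the wrapped URL (lazy, stops at the first closing tag)
    // \[/url:[^\]]*\]  closing tag
    return preg_replace('~\[url:[^\]]*\].*?\[/url:[^\]]*\]~is', '', $message);
}

echo stripUrlTags('[url:2q57noz9]http://www.mysite.com/other/screencaps-from-ddd-t7099.html#p24174[/url:2q57noz9]Text'), "\n"; // Text
```

Using ~ as the delimiter avoids having to escape the / in the closing tag.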

PHP str_replace not working correctly

I'm using str_replace and it's not working correctly.
I have a text area, which input is sent with a form. When the data is received by the server, I want to change the new lines to ",".
$teams = $_GET["teams"];
$teams = str_replace("\n",",",$teams);
echo $teams;
Strangely, I receive the following result
Chelsea
,real
,Barcelona
instead of Chelsea,real,Barcelona.
What's wrong?

To expand on Waage's response, you could use an array to replace both sets of characters
$teams = str_replace(array("\r\n", "\n"),",",$teams);
echo $teams;
This should handle both items properly, as a single \n is valid and would not get caught if you were just replacing \r\n

Try replacing "\r\n" instead of just "\n"

I would trim the text and replace all consecutive CR/LF characters with a comma:
$text = preg_replace('/[\r\n]+/', ',', trim($text));
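To see why this works: browsers submit textarea line breaks as \r\n, so replacing only \n leaves the \r behind, and the output still appears to break across lines. A minimal sketch of the trim-and-collapse approach (the function name linesToCsv is made up for illustration):

```php
<?php
// Hypothetical helper name; not from the original post.
function linesToCsv(string $text): string
{
    // trim() drops leading/trailing whitespace, including a final line break,
    // so the result does not end in a stray comma.
    // [\r\n]+ collapses \r\n, \n, \r and blank lines into a single separator.
    return preg_replace('/[\r\n]+/', ',', trim($text));
}

echo linesToCsv("Chelsea\r\nreal\r\nBarcelona\r\n"), "\n"; // Chelsea,real,Barcelona
```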

I had the same issue but found a different answer so thought I would share in case it helps someone.
The problem I had was that I wanted to replace \n with <br/> for printing in HTML, but my text contained the literal two characters \ and n rather than a real line break. The simple change I had to make was to escape the backslash in str_replace("\n","<br>",($text)) like this:
str_replace("\\n","<br>",($text))

bbcode style tags with preg

Okay, I've been working through a set of string replacements for bbcode style tags in my forum. Replacing [b], [i] and so on is fairly simple, as I can replace them directly without issue.
There are two tags that are giving me problems, because what I need to do with them is more complex. Plain [quote] and [url] are fine, but I would like to give users the choice of [quote=person_to_quote] and [url=URL]link text[/url], and the [quote=] tag needs to be nestable too!
So I need to replace the opening [quote=, keep the name that follows, consume the ], show the quoted text, and then close at [/quote]. I can already replace the tags wholesale and keep the =person part, but only by cheating and simply appending the end tag to the text. What I'd really like is to pull out everything between the = and the ] and store it so I can manipulate it separately.
Currently I'm using an array and simply replacing the inline text like this:
"[quote=" is replaced by "<span class=\"quote\">[Quote: ",
and just slapping the non-replaced text back on the end of it, that's ugly though. What I'd like to be able to do is take the code saying [quote=person]some text here[/quote] and turn it into:
"[quote=" is replaced by "<span class=\"quote\">$person says: ",
where $person would be a variable storing the person's name so it can be replaced dynamically.
Similarly with the URLs I'd like to replace [url=link]link text[/url] and make it able to accept the url and replace it inline so the output is:
"[url=" is replaced by "<a href=$URL>"
with the html a tag already closed, which means stripping the url out, storing it then replacing it after.
So what method do I use to extract the text between = and the closing ], so that what I pass into the replace array can be modified before it is written out? I'm not worried about nesting in the quotes, as the span class styling takes care of that, but I do need a function that can deal with any number of quote tags. Thoughts, please.
EDIT:
Just an update, I've solved the things I wanted to do, I modified the code webbiedave gave me and it works:
$output = preg_replace_callback(
    '/\[quote=([^\]]*)\]/',
    // create_function() was removed in PHP 8.0; an anonymous function does the same job
    function ($matches) {
        return '<span class="quote">' . $matches[1] . ' says: ';
    },
    $comment);
then the close tag is picked up through my normal tag replace search afterwards anyway.

Try preg_replace_callback:
$output = preg_replace_callback(
    '/\[quote=([^\]]*)\]([^\[]*)\[\/quote\]/',
    // $matches[1] is the quoted person, $matches[2] the quoted text
    function ($matches) {
        return '<span class="quote">' . $matches[1] . ' says: ' . $matches[2] . '</span>';
    },
    '[quote=person]some text here[/quote]'
);
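For nesting, one option is to convert opening and closing tags independently instead of matching whole pairs, which is what the asker ended up doing. A rough sketch that also covers [url=...]; the function name renderQuotesAndUrls and the use of htmlspecialchars() are my own additions, and it assumes the tags in the input are balanced:

```php
<?php
// Hypothetical helper name; not from the original post.
function renderQuotesAndUrls(string $comment): string
{
    // Each opening [quote=name] becomes a span showing the captured name.
    $comment = preg_replace_callback(
        '/\[quote=([^\]]*)\]/i',
        function (array $m): string {
            return '<span class="quote">' . htmlspecialchars($m[1]) . ' says: ';
        },
        $comment
    );
    // Each [/quote] closes one span, so nested quotes stay balanced.
    $comment = str_ireplace('[/quote]', '</span>', $comment);

    // [url=link]text[/url] becomes an anchor; the URL is escaped for the attribute.
    return preg_replace_callback(
        '~\[url=([^\]]+)\](.*?)\[/url\]~is',
        function (array $m): string {
            return '<a href="' . htmlspecialchars($m[1]) . '">' . $m[2] . '</a>';
        },
        $comment
    );
}

echo renderQuotesAndUrls('[quote=person]some text here[/quote]'), "\n";
// <span class="quote">person says: some text here</span>
```

Escaping only protects the markup; it does not validate the URL scheme, so a real forum would want to check that separately.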

Highlight words from search string

I wrote a little search script for a client. It works and the words get highlighted, BUT...
Imagine this situation:
search term: test
found result: Hello this is a test
In this example, both the 'test' in the href part and the one between the <a> tags get highlighted, breaking the link.
How could I prevent this?
Edit:
So this is what I need: a regex replace that highlights every match of the search string EXCEPT the ones located inside an href attribute.

You cannot reliably parse XML or HTML with regular expressions. :( If you want a dirty regex solution that still works in many cases, you may try this regex:
">[^<]*?(test)"
First you look for a tag's closing angle bracket, and then you make sure that no other tag is opened before the match.
Ideally you want to parse the HTML and replace only the textual parts of it.

Got it!
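$body = $row['body'];
// preg_quote() escapes regex metacharacters in the search term;
// the lookahead skips matches that are followed by ">" before any "<" (i.e. inside a tag)
$pattern = "/".preg_quote($search_string, "/")."(?!([^<]+)?>)/i";
// $0 is the matched text, so the original upper/lower case is kept
$replacement = "<strong class='highlite'>$0</strong>";
$altered_body = preg_replace($pattern, $replacement, $body);
print($altered_body);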
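Here is the same lookahead idea as a standalone function with a sample input. Treat it as a sketch: the function name highlightTerm and the test.html link are invented for the example, and it adds preg_quote() plus $0 in the replacement so that special characters in the search term and the original letter case are handled.

```php
<?php
// Hypothetical helper name; not from the original post.
function highlightTerm(string $body, string $term): string
{
    // preg_quote() escapes regex metacharacters in the user's search term.
    // (?![^<]*>) rejects a match when a ">" follows before any "<",
    // i.e. when the match sits inside a tag such as an href attribute.
    $pattern = '/' . preg_quote($term, '/') . '(?![^<]*>)/i';

    // $0 re-inserts the matched text, preserving its original case.
    return preg_replace($pattern, "<strong class='highlite'>\$0</strong>", $body);
}

// Assumed sample markup: the href contains the search term as well.
echo highlightTerm('<a href="test.html">Hello this is a test</a>', 'test'), "\n";
// <a href="test.html">Hello this is a <strong class='highlite'>test</strong></a>
```

This still relies on the markup being reasonably well formed; a stray > in the text would confuse the lookahead.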
