preg_replace with URL problem - php

I use a preg_replace to auto insert HTML links within paragraphs.
Here's what I currently use:
$pattern = "~(?!(?:[^<\[]+[>\]]|[^>\]]+<\/a>))(".preg_quote($find_keyword, '/').")\b~msUi";
$replacement = "\$0";
$article_content = preg_replace($pattern, $replacement, stripslashes($article_content), 1, $added );
It works great, except 1 problem:
It doesn't match and replace if the keyword is a URL.
If: $find_keyword="http://www.mysite.com/" it won't come up with any matches even though it's in the content.
I already tried escaping $find_keyword with preg_quote, which didn't make any difference.
Any regex experts know a solution? Thanks.

The forward slashes in your $find_keyword are not escaped, which is breaking the pattern.

You can run your find_keyword through
$find_keyword=preg_quote("http://www.mysite.com/", '/');
http://www.php.net/manual/en/function.preg-quote.php
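A minimal sketch of one more thing worth checking, assuming the keyword ends in a non-word character such as "/". The pattern in the question uses ~ as its delimiter, so that is the delimiter to pass to preg_quote(). The trailing \b can also only match if a word character follows the final slash. The sample content below is hypothetical, and the link-skipping lookahead from the question is left out to keep the comparison short.

```php
<?php
// Hypothetical sample data, not taken from the original post.
$find_keyword    = 'http://www.mysite.com/';
$article_content = 'Visit http://www.mysite.com/ for details.';

// Pass the delimiter the pattern actually uses (~) to preg_quote().
$quoted = preg_quote($find_keyword, '~');

// \b after the trailing "/" needs a word character right after it,
// so it fails when a space (or the end of the text) follows.
$with_boundary = '~(' . $quoted . ')\b~i';

// (?!\w) only requires that no word character follows.
$with_lookahead = '~(' . $quoted . ')(?!\w)~i';

$hits_boundary  = preg_match($with_boundary, $article_content);
$hits_lookahead = preg_match($with_lookahead, $article_content);

var_dump($hits_boundary, $hits_lookahead);
```

If the second pattern matches where the first does not, the word boundary rather than the escaping is the likely cause.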

Related

preg_replace_callback pattern issue

I'm using the following pattern to capture links, and turn them into HTML friendly links. I use the following pattern in a preg_replace_callback and for the most part it works.
"#(https?|ftp)://(\S+[^\s.,>)\];'\"!?])#"
But this pattern fails when the text reads like so:
http://mylink.com/page[/b]
At that point it captures the [/b, assuming it is part of the link, resulting in this:
woodmill.co.uk[/b]
I've looked over the pattern and used some cheat sheets to try to follow what is happening, but it has foxed me. Can any of you code ninjas help?
Try adding the open square bracket to your character class:
(\S+[^\s.,>)[\];'\"!?])
            ^
UPDATE
Try this more effective URL regex:
^(https?://)?([\da-z\.-]+)\.([a-z\.]{2,6})([/\w \.-]*)*/?$
(From: http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/)
I have no experience directly with PHP regular expressions, but the above is simple and generic enough that I wouldn't expect any problems. You may want to modify it some to extract just the domain, like you seem to be with your current regex.
OK, I solved the problem. Thanks to @Cyborgx37 and @MikeBrant for your help. Here's the solution.
Firstly I replaced my regexp pattern with the one that João Castro used in this question: Making a url regex global
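The problem with that pattern is that it captured any trailing dots, so in the final section of the pattern I added ^. so that the final part looks like [^\s^.]. As I read it, this means: do not end the match on a whitespace character or a dot. (Note that a second ^ inside a character class is a literal caret, so [^\s.] is enough.)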
This still caused an issue matching bbcode as I mentioned above, so I used preg_replace_callback() and create_function() to filter it out. The final create_function() looks like this:
create_function('$match','
$match[0] = preg_replace("/\[\/?(.*?)\]/", "", $match[0]);
$match[0] = preg_replace("/\<\/?(.*?)\>/", "", $match[0]);
$m = trim(strtolower($match[0]));
$m = str_replace("http://", "", $m);
$m = str_replace("https://", "", $m);
$m = str_replace("ftp://", "", $m);
$m = str_replace("www.", "", $m);
if (strlen($m) > 25)
{
$m = substr($m, 0, 25) . "...";
}
return "$m";
'), $string);
Tests so far are looking good, so I'm happy it is now solved.
Thanks again, and I hope this helps someone else :)
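create_function() was deprecated in PHP 7.2 and removed in PHP 8.0, so on current PHP versions the same callback needs to be written as an anonymous function. The following is a sketch only: the thread does not show the final URL pattern, so $pattern here is a simplified placeholder, and the sample $string is hypothetical.

```php
<?php
// Placeholder pattern for illustration; the thread's final pattern is not shown.
$pattern = '#(?:https?|ftp)://\S+#i';
// Hypothetical input with BBCode stuck to the end of the URL.
$string  = 'See http://www.mylink.com/page[/b] for more.';

$result = preg_replace_callback($pattern, function (array $match): string {
    // Strip BBCode and HTML tags that were captured along with the URL.
    $m = preg_replace('/\[\/?(.*?)\]/', '', $match[0]);
    $m = preg_replace('/<\/?(.*?)>/', '', $m);
    $m = trim(strtolower($m));
    // Drop the scheme and "www." for display.
    $m = str_replace(['http://', 'https://', 'ftp://', 'www.'], '', $m);
    // Truncate long display text.
    if (strlen($m) > 25) {
        $m = substr($m, 0, 25) . '...';
    }
    return $m;
}, $string);

echo $result, "\n";
```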

PHP preg_replace certain strings

I have several of these tags in a text document (stored in $content) with a lot of other content as well:
[[tag:author|id:6329]]
And I've been trying to replace all of these tags with a given id (in this case 6329) with another string.
$id = 6329;
$replacement = $this->format($id);
$pattern = "/(\[\[tag:author|id:".$id."\]\])/";
preg_replace($pattern, $replacement, $content);
This does not seem to work, though, and I've been tearing my hair out for the last couple of hours. Can anyone see the error? Many thanks.
You need to escape the | too (unescaped, it means alternation), and there is no need to wrap the pattern in ().
Try:
$pattern = "/\[\[tag:author\|id:".$id."\]\]/";
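As a sketch of the complete replacement, assuming the tag text is fixed apart from the id: preg_quote() can do the escaping of [, ] and | for you. Note too that preg_replace() returns the new string instead of changing $content in place. The sample content and the 'Jane Doe' replacement are hypothetical stand-ins for the document and for $this->format($id).

```php
<?php
// Hypothetical sample document.
$content = 'Intro [[tag:author|id:6329]] middle [[tag:author|id:77]] end';
$id = 6329;
// Hypothetical stand-in for $this->format($id).
$replacement = 'Jane Doe';

// preg_quote() escapes [, ] and | so the tag is matched literally.
$pattern = '/' . preg_quote('[[tag:author|id:' . $id . ']]', '/') . '/';

// Assign the return value; preg_replace() does not modify its argument.
$content = preg_replace($pattern, $replacement, $content);

echo $content, "\n";
```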

preg_replace not matching properly

I know this has been asked before, as I've just been reading those answers, but I still can't get this to work (properly).
I'm very new to regex and am trying to do something that sounds pretty simple:
The string would be:
http://www.something.com/section/filter/colour/red-#998682/size/small/
What I would like to do is a preg_replace to remove the -#?????? so the URL looks like:
http://www.something.com/section/filter/colour/red/size/small/
So I tried:
$string = $theURL;
$pattern = '/-\#(.*)\//i';
$replacement = '/';
$newURL = preg_replace($pattern, $replacement, $string);
That sort of works, but it doesn't stop. If I have anything after the -#??????, it removes that as well. I thought having the / on the end would stop it doing that?
Hoping someone can help and thanks for reading
PCRE is greedy by default, meaning that .* will match as big a chunk as possible. Make it ungreedy by adding the U flag (for the entire pattern) or use .*? (for just that wildcard part):
/-\#(.*)\//iU
or
/-\#(.*?)\//i
You need to use a non-greedy quantifier.
$pattern = '/-\#(.*?)\//i';
Your regex is greedy, which means that (.*)\/ looks for the last slash, not the first one.
The (.*) pattern is greedy, which means it will match as many characters as possible. To match everything up to the first slash, use (.*?):
$pattern = '/-\#(.*?)\//i';
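To see the difference side by side, here is a short sketch using the URL from the question. The third pattern is an additional assumption, not something proposed in the answers: it treats the part to remove as always being a six-digit hex colour, which avoids wildcards altogether.

```php
<?php
$url = 'http://www.something.com/section/filter/colour/red-#998682/size/small/';

// Greedy: .* runs to the end and backtracks only to the LAST slash.
$greedy = preg_replace('/-\#(.*)\//', '/', $url);

// Lazy: .*? stops at the FIRST slash after the "-#".
$lazy = preg_replace('/-\#(.*?)\//', '/', $url);

// Assumption: the suffix is always "-#" plus six hex digits before a slash.
$hex = preg_replace('/-#[0-9a-f]{6}(?=\/)/i', '', $url);

echo $greedy, "\n", $lazy, "\n", $hex, "\n";
```

The greedy version cuts everything after "red", while the other two leave "/size/small/" intact.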

PHP regex title conversion / negative look ahead / toLowerCase

I'm trying to convert some titles in my html pages to <h2>. The pattern is simple.
<?php
$test = "<p><strong>THIS IS A TEST</strong></p><div>And this is Random STUFF</div><p><strong>CP</strong></p>";
$pattern = "/<p><strong>([A-Z ]*?)<\/strong><\/p>/";
$replacement = "<h2>$1</h2>";
$test = preg_replace($pattern, $replacement, $test);
?>
Basically, grab anything that's between <p><strong></strong></p> that is capitalized. Easy enough, so here's the complicated bit.
Firstly, I need to make a single exception. <p><strong>CP</strong></p> must not be converted to <h2>. I tried adding ?!(CP) right after the <p><strong> but it doesn't work.
Secondly, I need to be able to capitalize only the first letter. When I use "ucfirst" with "strtolower" on the preg_replace (e.g. ucfirst(strtolower(preg_replace($pattern, $replacement, $test)));), it converts every character in the string to lowercase, and ucfirst doesn't work because it sees "<" as the first character.
Any hints or am I even going in the right direction?
EDIT
Thanks for the help, it was definitely better to use preg_replace_callback. I found that all my titles were more than 3 characters so I added the limiter. Also added special characters.
Here's my final code:
$pattern = "/<p><strong>([A-ZÀ-ÿ0-9 ']{3,}?)<\/strong><\/p>/";
$replacement = "<h2>$1</h2>";
$test[$i] = preg_replace_callback($pattern, create_function('$matches', 'return "<h2>".ucfirst(mb_strtolower($matches[1]))."</h2>";'), $test[$i]);
Try http://php.net/manual/de/function.preg-replace-callback.php .
You can create a custom function that is called on every match. In this function you can decide a) not to replace CP and b) to return the ucfirst version of the text instead of the plain $1.
Hope this helps & good luck.
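A possible sketch of that idea, using the sample string from the question. It handles the CP exception with a negative lookahead, whose syntax is (?!CP) and not ?!(CP), and uses an anonymous function because create_function() no longer exists in PHP 8. The trailing "<" inside the lookahead is an assumption so that only a title that is exactly "CP" is skipped.

```php
<?php
$test = "<p><strong>THIS IS A TEST</strong></p><div>And this is Random STUFF</div><p><strong>CP</strong></p>";

// (?!CP<) is a negative lookahead: skip the title when it is exactly "CP".
$pattern = '/<p><strong>(?!CP<)([A-Z ]+)<\/strong><\/p>/';

$result = preg_replace_callback($pattern, function (array $m): string {
    // Lowercase the captured title only, then capitalize its first letter.
    return '<h2>' . ucfirst(strtolower($m[1])) . '</h2>';
}, $test);

echo $result, "\n";
```

For titles with accented characters, the mb_strtolower() call from the final code in the question would replace strtolower() here.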

PHP replace string help

I am designing a site with a comment system and I would like a Twitter-like reply system.
If the user puts #a_registered_username, I would like it to become a link to the user's profile.
I think preg_replace is the function needed for this.
$ALL_USERS_ROW *['USERNAME'] is the database query array for all the users and ['USERNAME'] is the username row.
$content is the comment containing the #username
I think this should not be very hard to solve for someone who is good at PHP.
Does anybody have any idea how to do it?
$content = preg_replace( "/\b#(\w+)\b/", "http://twitter.com/$1", $content );
should work, but I can't get the word boundary matches to work in my test ... maybe dependent on the regex library used in versions of PHP
$content = preg_replace( "/(^|\W)#(\w+)(\W|$)/", "$1http://twitter.com/$2$3", $content );
is tested and does work
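A short sketch of why the first pattern probably fails, independent of the PHP version: \b only matches between a word character and a non-word character, and both a space and # are non-word characters, so there is no boundary in front of the marker. The sample strings are hypothetical.

```php
<?php
$content = 'thanks #seanja and #bilbo';

// No boundary exists between " " and "#", so nothing is replaced.
$boundary = preg_replace('/\b#(\w+)\b/', 'http://twitter.com/$1', $content);

// The consuming version works on this input...
$consuming = preg_replace('/(^|\W)#(\w+)(\W|$)/', '$1http://twitter.com/$2$3', $content);

// ...but it eats the separator, so a name directly after another one is missed.
$adjacent = preg_replace('/(^|\W)#(\w+)(\W|$)/', '$1http://twitter.com/$2$3', '#a #b');

echo $boundary, "\n", $consuming, "\n", $adjacent, "\n";
```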
You want it to go through the text and get it, here is a good starting point:
$txt='this is some text #seanja';
$re1='.*?'; # Non-greedy match on filler
$re2='(#)'; # Any Single Character 1
$re3='((?:[a-z][a-z]+))'; # Word 1
if ($c=preg_match_all ("/".$re1.$re2.$re3."/is", $txt, $matches))
{
$c1=$matches[1][0];
$word1=$matches[2][0]; //this is the one you want to replace with a link
print "($c1) ($word1) \n";
}
Generated with:
http://www.txt2re.com/index-php.php3?s=this%20is%20some%20text%20#seanja&-40&1
[edit]
Actually, if you go here ( http://www.gskinner.com/RegExr/ ), and search for twitter in the community tab on the right, you will find a couple of really good solutions for this exact problem:
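$mystring = 'hello #seanja #bilbobaggins sean#test.com and #slartibartfast';
// PHP has no "g" modifier; preg_replace() already replaces every match.
$regex = '/(?<=#)((\w+))(\s)/';
$replace = '$1$3';
// Variable names are case-sensitive, and the result must be assigned.
$mystring = preg_replace($regex, $replace, $mystring);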
$str = preg_replace('~(?<!\w)#(\w+)\b~', 'http://twitter.com/$1', $str);
This does not match email addresses, and it does not consume any spaces around the match.
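Since the question asks for links only for registered usernames, the lookbehind pattern can be combined with preg_replace_callback(). This is a sketch under assumptions: the $registered array stands in for the usernames loaded from the database, and the /profile/ URL scheme is made up for illustration.

```php
<?php
// Hypothetical data: in the question, usernames come from a database query.
$registered = ['seanja', 'bilbobaggins'];
$content = 'hello #seanja #bilbobaggins sean#test.com and #nobody';

$linked = preg_replace_callback(
    '~(?<!\w)#(\w+)\b~',
    function (array $m) use ($registered): string {
        // Leave the text untouched when the name is not a registered user.
        if (!in_array($m[1], $registered, true)) {
            return $m[0];
        }
        // Hypothetical profile URL scheme.
        return '<a href="/profile/' . rawurlencode($m[1]) . '">' . $m[0] . '</a>';
    },
    $content
);

echo $linked, "\n";
```

For a large user base it may be cheaper to collect the matched names first and look them up in one query, instead of loading every username up front.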
