PHP regular expression to replace links

I have this replacement regex (taken from the phpBB source code):
$match = array(
'#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="(.*?)" target\=\"_blank\">.*?</a><!\-\- \1 \-\->#',
'#<!\-\- .*? \-\->#s',
'#<.*?>#s',
);
$replace = array( '\2', '', '');
$message = preg_replace($match, $replace, $message);
If I run it through a message like this
asdfafdsfdfdsfds
<!-- m --><a class="postlink" href="http://website.com/link-is-looooooong.txt">http://website.com/link ... oooong.txt</a><!-- m -->
asdfafdsfdfdsfds4324
It returns this
asdfafdsfdfdsfds
http://website.com/link ... oooong.txt
asdfafdsfdfdsfds4324
However, I would like to turn it into a replace function, so that I can replace the link title in a block by providing the href.
I want to provide the URL, the new URL and the new title, and run a regex built from these variables.
$url = 'http://website.com/link-is-looooooong.txt';
$new_title = 'hello';
$new_url = 'http://otherwebsite.com/';
And it would return the same raw message but with the link changed.
<!-- m --><a class="postlink" href="http://otherwebsite.com/">hello</a><!-- m -->
I've tried tweaking it into something like this but I can't get it right. I don't know how to build up the matched result so it has the same format after replacing.
$message = preg_replace('#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="'.preg_quote($url).'" target\=\"_blank\">(.*?)</a><!\-\- \1 \-\->#', $replace, $message);

You'll find that parsing HTML with regex can be a pain and gets complex quickly. Your best bet is to use a DOM parser and modify the links with that instead.
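As a rough illustration of the DOM route, here is a minimal sketch using PHP's bundled DOMDocument. The function name replace_link_dom and the wrapper <div> are my own assumptions, not something from phpBB, and libxml may normalize the markup slightly when it serializes it back.

```php
<?php
// Hypothetical helper (not part of phpBB): rewrites every <a> whose href equals $url.
function replace_link_dom(string $message, string $url, string $new_url, string $new_title): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup without emitting warnings
    // The XML declaration hints UTF-8; the wrapper <div> lets us get the fragment back out.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $message . '</div>');
    libxml_clear_errors();

    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->getAttribute('href') === $url) {
            $a->setAttribute('href', $new_url);
            $a->textContent = $new_title; // replaces the old link text
        }
    }

    // Serialize only the children of the wrapper, not the <html>/<body> libxml added.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

The comparison is an exact match on the href attribute, so there is no pattern to escape, and the surrounding <!-- m --> comments are carried through as ordinary comment nodes.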

You need to capture the other parts in groups as well and then use them in the replacement. Try something like this:
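// \1 = everything up to href=", \3 = the end of the opening tag, \4 = the closing tag and comment
$replace = '\1http://otherwebsite.com/\3hello\4';
// Pass the delimiter to preg_quote() so a "#" inside the URL cannot end the pattern early
$reg = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")'.preg_quote($url, '#').'("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';
$message = preg_replace($reg, $replace, $message);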
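Wrapped up as the replace function the question asks for, the pattern above could be used roughly like this. The name replace_postlink and its parameter order are hypothetical, and the htmlspecialchars() calls are my addition on the assumption that the new URL and title arrive as plain text.

```php
<?php
// Hypothetical wrapper around the pattern above; name and parameters are illustrative.
function replace_postlink(string $message, string $url, string $new_url, string $new_title): string
{
    $reg = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")'
         . preg_quote($url, '#')
         . '("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';

    // A callback keeps "$1" or "\1" sequences inside the new URL or title
    // from being interpreted as backreferences.
    $result = preg_replace_callback($reg, function (array $m) use ($new_url, $new_title) {
        return $m[1] . htmlspecialchars($new_url) . $m[3] . htmlspecialchars($new_title) . $m[4];
    }, $message);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $message;
}
```

Called with the $url, $new_url and $new_title values from the question, this should leave the rest of the message untouched and rewrite only the matching link.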

Related

parsing multiple lists in bbcode?

Hi all, I have a very simple BBCode parsing system, and it currently has problems with lists within lists.
My code:
$find = array(
'/\[list\](.*?)\[\/list\]/is',
'/\[\*\](.*?)(\n|\r\n?)/is',
'/\[ul\](.*?)\[\/ul\]/is',
'/\[li\](.*?)\[\/li\]/is'
);
$replace = array(
'<ul>$1</ul>',
'<li>$1</li>',
'<ul>$1</ul>',
'<li>$1</li>'
);
$body = preg_replace($find, $replace, $body);
The problem is that when there is another list inside the li tags, it completely fails to parse.
I know my code is probably too simple for this, but how do I adjust it so that it can parse a list within a list item?
Rather than using regular expressions, you have a couple of options:
Use PHP's BBCode parsing extension
Do a much simpler replacement, i.e. straight up replace [ul] with <ul> etc.
I'm not saying it can't be done with regex, just that it's not the simplest option.
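The second option, a straight tag-for-tag replacement, might look like the sketch below. The function name bbcode_lists is made up for the example, and it deliberately ignores the [*] shorthand.

```php
<?php
// Hypothetical helper: translates each list tag on its own, so nesting depth does not matter.
function bbcode_lists(string $body): string
{
    $map = array(
        '[list]'  => '<ul>',
        '[/list]' => '</ul>',
        '[ul]'    => '<ul>',
        '[/ul]'   => '</ul>',
        '[li]'    => '<li>',
        '[/li]'   => '</li>',
    );
    // str_ireplace() is the case-insensitive variant of str_replace().
    return str_ireplace(array_keys($map), array_values($map), $body);
}

echo bbcode_lists('[ul][li]a[ul][li]b[/li][/ul][/li][/ul]');
// prints <ul><li>a<ul><li>b</li></ul></li></ul>
```

The trade-off is that nothing checks whether the tags are balanced, so an unclosed [ul] in the input becomes an unclosed <ul> in the output.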
Here's a still-regex based replacement:
$body = '[ul][li]test[/li][li]test[/li][li]test[ul][li]lol[/li][/ul][/li][li]hehe[/li][/ul]';
$find = array(
'/\[(\/?)list\]/i',
'/\[\*\](.*?)(\n|\r\n?)/i',
'/\[(\/?)ul\]/i',
'/\[(\/?)li\]/i'
);
$replace = array(
'<$1ul>',
'<li>$1</li>',
'<$1ul>',
'<$1li>'
);
$body = preg_replace($find, $replace, $body);

Regex for [title](http://) links

I'm trying to parse links using php with a structure such as [google](http://google.com), and so far I've got this:
"/\[(.+?)\]((.+?\))/is"
The only part I can't get to work is the part with the parenthesis ')'.
Any help would be great!
[edit]
@jari - this is part of the code:
$find = array(
"#\n#",
"/[**](.+?)[**]/is",
"/\[(.+?)\]\((.+?)\)/is"
);
$replace = array(
"<br />",
"<strong>$1</strong>",
"<a href=\"$2\" target=\"_blank\" rel=\"nofollow\">$1</a>"
);
$body = htmlspecialchars($body);
$body = preg_replace($find, $replace, $body);
return $body;
Parentheses are special characters that normally mark sub-patterns inside your expression, so you need to escape them when you want to match them literally (just as you did with the square brackets, by the way):
"/\[(.+?)\]\((.+?)\)/is"
It should look something like:
\[([^\]]+)\]\(([^)]+)\)
Here [^x] means "match anything that is not x".
Group one captures everything after the [ that is not a ].
Group two does the same with [^)]+: anything that is not a ).
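Plugged into preg_replace, the negated-character-class idea with both parts captured could be used along these lines. The function name md_links is only for illustration, and the rel="nofollow" attribute is an assumption about what the markup should look like.

```php
<?php
// Hypothetical helper: converts [title](url) into an HTML link.
function md_links(string $body): string
{
    // Escape user input first, as the question's code does.
    $body = htmlspecialchars($body);

    return preg_replace(
        '/\[([^\]]+)\]\(([^)]+)\)/',                          // $1 = title, $2 = URL
        '<a href="$2" target="_blank" rel="nofollow">$1</a>',
        $body
    );
}

echo md_links('[google](http://google.com)');
// prints <a href="http://google.com" target="_blank" rel="nofollow">google</a>
```

Because neither group can run past its closing bracket, two links on the same line are matched separately without needing the lazy .+? quantifier.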

Removal of bad hyperlinks and the content inside of them

Ok, basically I have an array of bad urls and I would like to search through a string and strip them out. I want to strip everything from the opening tag to the closing tag, but only if the url in the hyperlink is in the array of bad urls. Here is how I would picture it working but I don't understand regular expressions well.
foreach($bad_urls as $bad_url){
$pattern = "/<a*$bad_url*</a>/";
$replacement = ' ';
preg_replace($pattern, $replacement, $content);
}
Thanks in advance.
Assuming that your 'bad urls' are properly formatted URLs, I would suggest doing something like this:
foreach($bad_urls as $bad_url){
// [^>]* keeps the match inside a single <a ...> tag; the i flag also matches <A HREF=...>
$pattern = '/<a\s[^>]*href="' . convert_to_pattern($bad_url) . '"[^>]*>.*?<\/a>/is';
$replacement = ' ';
$content = preg_replace($pattern, $replacement, $content);
}
and separately
function convert_to_pattern($url)
{
// preg_quote() escapes all regex metacharacters, plus the "/" delimiter given as second argument
return preg_quote($url, '/');
}
Please do not try to parse HTML using regular expressions. Just load up the HTML in a DOM, find all the <a> tags and check the href property. Much simpler and fool-proof.
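A minimal sketch of that DOM approach with PHP's bundled DOMDocument follows. The function name strip_bad_links and the wrapper <div> are assumptions made for the example, and it compares href values exactly rather than normalizing them.

```php
<?php
// Hypothetical helper: removes every <a> element (and its content) whose href is in $bad_urls.
function strip_bad_links(string $content, array $bad_urls): string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy markup without emitting warnings
    // The XML declaration hints UTF-8; the wrapper <div> lets us get the fragment back out.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $content . '</div>');
    libxml_clear_errors();

    // Copy the live node list first; removing nodes while iterating it would skip elements.
    foreach (iterator_to_array($doc->getElementsByTagName('a')) as $a) {
        if (in_array($a->getAttribute('href'), $bad_urls, true)) {
            $a->parentNode->removeChild($a);
        }
    }

    // Serialize only the children of the wrapper, not the <html>/<body> libxml added.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Since the check is a plain array lookup, the list of bad URLs never has to be escaped or turned into a pattern.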

PHP Regular Expression Text URL to HTML Link

I'm working on a project where I need to replace text URLs (anything from domain.com to www.domain.com to http(s)://www.domain.com) and e-mail addresses with the proper HTML <a> tag. I was using a great solution in the past, but it used the now deprecated eregi_replace function. On top of that, the regular expression used with that function does not work with preg_replace.
So basically, the user inputs a message which may or may not contain a link or e-mail address, and I need a regular expression that works with preg_replace to replace that link or e-mail address with an HTML link.
Please note that I have multiple other preg_replaces too. Below is my current code for the other replacements being made.
$patterns = array('~\[#([^\]]*)\]~','~\[([^\]]*)\]~','~{([^}]*)}~','~_([^_]*)_~','/\s{2}/');
$replacements = array('<b class="reply">#\\1</b>','<b>\\1</b>','<i>\\1</i>','<u>\\1</u>','<br />');
$msg = preg_replace($patterns, $replacements, $msg);
return stripslashes(utf8_encode($msg));
I have created a very basic set of Regular Expressions for this. Don't expect them to be 100% reliable, and you may need to tweak them as you go.
$pattern = array(
'/((?:[\w\d]+\:\/\/)?(?:[\w\-\d]+\.)+[\w\-\d]+(?:\/[\w\-\d]+)*(?:\/|\.[\w\-\d]+)?(?:\?[\w\-\d]+\=[\w\-\d]+\&?)?(?:\#[\w\-\d]*)?)/' , # URL
'/([\w\-\d]+@[\w\-\d]+\.[\w\-\d]+)/' , # Email
'/\[#([^\]]*)\]/' , # Reply
'/\[([^\]]*)\]/' , # Bold
'/\{([^}]*)\}/' , # Italics
'/_([^_]*)_/' , # Underline
'/\s{2}/' , # Linebreak
);
$replace = array(
'<a href="$1">$1</a>' ,
'<a href="mailto:$1">$1</a>' ,
'<b class="reply">#$1</b>' ,
'<b>$1</b>' ,
'<i>$1</i>' ,
'<u>$1</u>' ,
'<br />'
);
$msg = preg_replace( $pattern , $replace , $msg );
return stripslashes( utf8_encode( $msg ) );

PHP: STR replace by link

I have this PHP chatbox.
If I type a link in the chatbox, it is not displayed as a link.
How can I use str_replace to do this?
It should respond to things like 'http', 'http://', '.com', '.nl', 'www', 'www.' and so on.
My other STR replace lines look like these:
$bericht = str_replace ("STRING1","STRINGREPLACEMENT1",$bericht);
Someone?
Hey! Try this code (found at php.net somewhere):
function format_urldetect( $text )
{
$tag = " rel=\"nofollow\"";
//
// First, look for strings beginning with http:// that AREN't preceded by an <a href tag
//
$text = preg_replace( "/(?<!\\0", $text );
//
// Second, look for strings with casual urls (www.something.com...) and make sure they don't have a href tag OR a http:// in front,
// since that would have been caught in the previous step.
//
$text = preg_replace( "/(?<!\\0", $text );
$text = preg_replace( "/(?<!\\0", $text );
return $text;
}
Uhm, broken indentation. Try this: http://triop.se/code.txt
You should use regex instead. Look at preg_replace:
$regex = '`(\s|\A)(http|https|ftp|ftps)://(.*?)(\s|\n|[,.?!](\s|\n)|$)`ism';
$replace = '$1<a href="$2://$3">$2://$3</a>$4';
$buffer = preg_replace($regex,$replace,$buffer);
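As a slightly fuller sketch of the regex route, the function below splits the message on URLs and escapes every piece before building the links. The name linkify, the rel="nofollow" attribute and the choice of http:// for bare www. hosts are all my own assumptions.

```php
<?php
// Hypothetical helper: turns http(s)://, ftp:// and bare "www." URLs into links.
function linkify(string $text): string
{
    // With one capturing group and PREG_SPLIT_DELIM_CAPTURE, the result alternates:
    // even indexes are plain text, odd indexes are the matched URLs.
    $parts = preg_split('~\b((?:https?://|ftp://|www\.)[^\s<>"]+)~i', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= htmlspecialchars($part); // plain text between URLs
            continue;
        }
        // Trailing punctuation usually belongs to the sentence, not the URL.
        $url   = rtrim($part, '.,!?;:)');
        $trail = (string) substr($part, strlen($url));
        // Bare "www." hosts need a scheme to work as an href.
        $href  = stripos($url, 'www.') === 0 ? 'http://' . $url : $url;

        $out .= '<a href="' . htmlspecialchars($href) . '" rel="nofollow">'
              . htmlspecialchars($url) . '</a>' . htmlspecialchars($trail);
    }
    return $out;
}
```

In the chatbox it would be used in place of a str_replace line, for example $bericht = linkify($bericht);. Note that it escapes the whole message, so it should run before any step that adds HTML of its own.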
