Regex for [title](http://) links - php

I'm trying to parse links using php with a structure such as [google](http://google.com), and so far I've got this:
"/\[(.+?)\]((.+?\))/is"
The only part I can't get to work is the part with the parenthesis ')'.
Any help would be great!
[edit]
#jari - This is part of the code:
$find = array(
"#\n#",
"/[**](.+?)[**]/is",
"/\[(.+?)\]\((.+?)\)/is"
);
$replace = array(
"<br />",
"<strong>$1</strong>",
"<a href\"$1\" target=\"_blank\" rel=\"no follow\"></a>"
);
$body = htmlspecialchars($body);
$body = preg_replace($find, $replace, $body);
return $body;

The parenthesis is a special character and usually marks sub-patterns inside your expression, so you need to escape them (just as you did with the square brackets, btw):
"/\[(.+?)\]\((.+?)\)/is"

It should look something like:
\[([^\]]+)]\([^)]+\)
We are using [^x] which means, match anything that is not x.
We have used this to capture in group one everything after the [ which is not a ].
Same for the second group using [^)]+. Anything that is not a ).

Related

Regex to select url except when = is directly infront of it

I'm trying to use a regex to find and replace all URLs in a forum system. This works but it also selects anything that is within bbcode. This shouldn't be happening.
My code is as follows:
<?php
function make_links_clickable($text){
return preg_replace('!(([^=](f|ht)tp(s)?://)[-a-zA-Zа-яА-Я()0-9#:%_+.~#?&;//=]+)!i', '$1', $text);
}
//$text = "https://www.mcgamerzone.com<br>http://www.mcgamerzone.com/help/support<br>Just text<br>http://www.google.com/<br><b>More text</b>";
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Unparsed text:</b><br>";
echo $text;
echo "<br><br>";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
?>
All urls that occur in bb-code are following up on a = character, meaning that I don't want anything that starts with = to be selected.
I basically have that working but this results in selecting 1 extra character in in front of the string that should be selected.
I'm not very familiar with regex. The final output of my code is this:
<b>Unparsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa<br>
<br>
<b>Parsed text:</b><br>
#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa
You can match and skip [url=...] like this:
\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)
See regex demo
That way, you will only match the URLs outside the [url=...] tag.
IDEONE demo:
function make_links_clickable($text){
return preg_replace('~\[url=[^\]]*](*SKIP)(?!)|(((f|ht)tps?://)[-a-zA-Zа-яёЁА-Я()0-9#:%_+.\~#?&;/=]+)~iu', '$1', $text);
}
$text = "#Theareak We know this and [b][url=https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks]here[/url] [/b]is an explanation, we are trying to fix this asap! https://www.mcgamerzone.com/news/67/False-positive-proxy-bans-and-bot-attacks aaa";
echo "<b>Parsed text:</b><br>";
echo make_links_clickable($text);
You can use a negative lookbehind (?<!=) instead of your negated class. It asserts that what is going to be matched isn't preceded by something.
Example

How to replace b tag with # sign

I have a string that looks like this
$t="<b>vist</b>thank you for the follow.";
I am trying to remove the tag b and put an "#" instead of this tag.
I tried this
str_replace("<b></b>","#",$t);
but it doesn't replace the closing tag.
I don't know why it is not working may be there is something omitted in the code.
Try with
$search = array('&#60b&#62','&#60/b&#62');
$replace = '#';
echo str_replace($search, $replace, $t);
To replace multiple words using str_replace() function,
You can Try this
$t="<b>vist</b> thank you for the follow";
$pattern=array();
$pattern[0]="<b>";
$pattern[1]="</b>";
$replacement=array();
$replacement[0]="#";
$replacement[1]="";
echo str_replace($pattern,$replacement,$t);
View the Demo
Try with -
$t="&#60b&#62vist&#60/b&#62";
echo str_replace(array("&#60b&#62", "&#60/b&#62"),"#",$t);

PHP regular expression to replace links

I have this replace regex (it's taken from the phpbb source code).
$match = array(
'#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="(.*?)" target\=\"_blank\">.*?</a><!\-\- \1 \-\->#',
'#<!\-\- .*? \-\->#s',
'#<.*?>#s',
);
$replace = array( '\2', '', '');
$message = preg_replace($match, $replace, $message);
If I run it through a message like this
asdfafdsfdfdsfds
<!-- m --><a class="postlink" href="http://website.com/link-is-looooooong.txt">http://website.com/link ... oooong.txt</a><!-- m -->
asdfafdsfdfdsfds4324
It returns this
asdfafdsfdfdsfds
http://website.com/link ... oooong.txt
asdfafdsfdfdsfds4324
However I would like to make it into a replace function. So I can replace the link title in a block by providing the href.
I want to provide the url, new url and new title. So I can run a regex with these variables.
$url = 'http://website.com/link-is-looooooong.txt';
$new_title = 'hello';
$new_url = 'http://otherwebsite.com/';
And it would return the same raw message but with the link changed.
<!-- m --><a class="postlink" href="http://otherwebsite.com/">hello</a><!-- m -->
I've tried tweaking it into something like this but I can't get it right. I don't know how to build up the matched result so it has the same format after replacing.
$message = preg_replace('#<!\-\- ([mw]) \-\-><a (?:class="[\w-]+" )?href="'.preg_quote($url).'" target\=\"_blank\">(.*?)</a><!\-\- \1 \-\->#', $replace, $message);
You'll find that parsing HTML with regex can be a pain and get very complex. Your best bet is to use a DOM parser, like this one, and modify the links with that instead.
You need to catch the other parts in groups as well and then use them in the replacement. try something like this:
$replace = '\1http://otherwebsite.com/\3hello\4';
$reg = '#(<!-- ([mw]) --><a (?:class="[\w-]+" )?href=")'.preg_quote($url).'("(?: target="_blank")?>).*?(</a><!-- \2 -->)#';
$message = preg_replace($reg, $replace, $message);
See here.

PHP regex on weird situations

I'm trying to scrape a website using some regex. But the site isn't written in well formatted html. In fact, the html is horrible and not structured hardly at all. But I've managed to tackle most of it. The problem I'm encountering now is that in some emails, a span is wrapped around a random part of the email like so:
****.*******#g<span class="tournamenttext">mail.com</span>
************<span class="tournamenttext">#yahoo.com</span>
<span class="tournamenttext">**********#mail.com</span>
*******#gmail.com
Is there a way to retrieve the emails with all this inconsistency?
$string ='****.*******#g<span class="tournamenttext">mail.com</span>
************<span class="tournamenttext">#yahoo.com</span>
<span class="tournamenttext">**********#mail.com</span>
*******#gmail.com';
$pattern = "/<\/?span[^>]*>/";
$string = preg_replace($pattern, "", $string);
after that $string will be only mails
****.*******#gmail.com
************#yahoo.com
**********#mail.com
*******#gmail.com
Your code will be like this
$text[1]->innertext = "Where innertext contains something like: "<em>Local (Open)
Tournament.</em> ****.*******#g<span class="tournamenttext">mail.com</span>"
// Firstly clear spans
$pattern = "/<\/?span[^>]*>/";
$text[1]->innertext = preg_replace($pattern, "", $text[1]->innertext);
// Preg Match mail
$email_regex = "^[_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$"; // Just an example email match regex
preg_match($email_regex, $text[1]->innertext, $theMatch);
echo '<pre>' . print_r($theMatch, true) . '</pre>';
You could simply remove all span tags by replacing </?span[^>]*> with nothing and try your favourite email address finder on the result.

Smiley Replace within CDATA of an HTML-String

i have got a simple problem :( I need to replace text smilies with the according smiley-image. ok.. thats not really complex, but now i have to replace only smilie appereances outside of HTML Tags. short examplae:
Text:
Thats a good example :/ .. with a link inside.
i want to replace ":/" with the image of this smiley...
ok, how to do that the best way?
I won't try to create some super script but think about it.... smilies are just about always surrounded by spaces. So str replace ' :/ ' with the smiley. You could be saying "what about a smiley at the end of a sentence(where it would be used the most)". Well just check for at least one space on either the left or the right of a potential smiley.
Using the above scripts:
$smiley_array = array(
":) " => "<a href...>",
" :)" => "<a href...>",
":/ " => "<a href...>",
" :/" => "<a href...>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
If you rather not have to type everything twice you can generate the array from a single smiley array.
Why don't you just try to use some special chars around your smiley text like this maybe -:/-
This will make your smiley text some kind of unique and easy to recognize
Use preg_replace with a lookbehind assertion. Example:
$smileys = array(
':/' => '<img src="..." alt=":/">'
);
foreach ($smileys as $smile => $img) {
$text = preg_replace('#(?<!<[^<>]*)' . preg_quote($smile, '#') . '#',
$img, $text);
}
The regex should match only smileys that are not inside angle brackets. This might be slow if you have a lot of false positives.
I wouldn't know about the best way, only the way I would do it.
Build an array having the smiley codes as the keys and the link as the value. The use str_replace. Pass as "needle" an array of the keys (the smiley codes) and as "replace" an array of the values.
For instance, suppose you have something like this:
$smiley_array = array(":)" => "<a href...>",
":(" => "<a href=....>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
EDIT: In case this could accidentally replace other instances with smiley-links you should consider using regexes with preg_replace. Obviously preg_replace is slower than str_replace.
You can use regex, or the extra sloppy version of the above:
$smiley_array = array(":)" => "<a href...>",
":(" => "<a href=....>");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace("://", "%%QF%%", $str);
$str = str_replace($codes, $links, $str);
$str = str_replace("%%QF%%", "://", $str);
Actually, assuming str_replace follows the array sorting...
this should work:
$smiley_array = array("://" => "%%QF%%", ":)" => "<a href...>",
":(" => "<a href=....>", "%%QF%%" => "://");
$codes = array_keys($smiley_array);
$links = array_values($smiley_array);
$str = str_replace($codes, $links, $str);
Possible overkill (increased cpu/load), but 99.99999999% safe:
<?php
$n = new DOMDocument();
$n->loadHTML('<p>Thats a good example :/ .. with a link inside.</p>');
$x = new DOMXPath($n);
$instances = $x->query('//text()[contains(.,\':/\')]');//or use '//*[child::text()]' for all textnodes
foreach($instances as $node){
if($node instanceof DOMText && preg_match_all('/:\//',$node->wholeText,$matches,PREG_OFFSET_CAPTURE|PREG_SET_ORDER)){
foreach($matches[0] as $match){
$newnode = $node->splitText($match[1]);
$newnode->replaceData(0,strlen($match[0]),'');
$img = $n->createElement('img');
$img->setAttribute('src','smily.gif');
$img = $newnode->parentNode->insertBefore($img,$newnode);
//var_dump($match);
}
}
}
var_dump($n->saveHTML());
?>
But in reality you do not want to do this all that often, save once, show many, if you are letting users edit the html (beit in wysiwyg or elsewise, the 'return' transformation (img to text) is a whole lot lighter. Up to you to expand with different smilies (one monster regex to match them, or several smaller ones / strstr()'s for readability, and a array for smiley to src (e.g. array(':/'=>'frown.gif')) would be the way to go.

Categories