PHP: Regex matching and replacing multiple instances of multiple matches being duplicated

I'm looking to write a shortcode system for a gaming community/database, where users can add things like ((Magical Sword)) to their content, and it'll be parsed into a nice link to relevant item with an inline thumbnail image.
Here's the code I'm using so far:
function inlineItems($text) {
$re = "/\(\(([^)]+)\)\)/m";
preg_match_all($re, $text, $matches, PREG_SET_ORDER, 0);
foreach($matches as $match) {
$slug = makeSlug($match[1]);
$item = getItem($slug);
if($item) {
$text = preg_replace($match[0], '<a class="text-item" data-tooltip="tooltip-item-' . $item->slug . '" href="/items/' . $item->slug .'"><img src="/images/items/' . $item->slug .'.png">' . $item->name .'</a>', $text);
}
}
$text = str_replace("((", "", $text);
$text = str_replace("))", "", $text);
return $text;
}
Example output, if a user entered ((Crystal Sword)) would be:
<a class="text-item" data-tooltip="tooltip-item-crystal-sword" href="/items/crystal-sword"><img src="/images/items/crystal-sword.png">Crystal Sword</a>
So far so good, everything works great.
However, an issue occurs when a particular match is repeated multiple times in one text string.
If a user enters something like: A ((Crystal Sword)) is essential for farming, get a ((Crystal Sword)) as soon as you can. ((Crystal Sword)) is the best! then the replacement matches the item name multiple times, and ends up with a mess like this:
<a class="text-item" data-tooltip="tooltip-item-crystal-sword" href="/items/crystal-sword"><img src="/images/items/crystal-sword.png"></a><a class="text-item" data-tooltip="tooltip-item-crystal-sword" href="/items/crystal-sword"><img src="/images/items/crystal-sword.png"></a><a class="text-item" data-tooltip="tooltip-item-crystal-sword" href="/items/crystal-sword"><img src="/images/items/crystal-sword.png">Crystal Sword</a>
How do I prevent it from overlapping matches like this?

Your code is pretty messy. You don't need all those replaces all around, one is enough. Follow the KISS principle:
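<?php
function inlineItems($text) {
    // Match each ((Item Name)) shortcode; the lazy quantifier stops at the first "))".
    $re = "/\(\((.+?)\)\)/";
    // The callback runs once per match, so a repeated shortcode is replaced exactly once each time.
    return preg_replace_callback($re, function ($matches) {
        $item = getItem(makeSlug($matches[1]));
        if (!$item) {
            // Unknown item: keep the plain name without the parentheses, as the original code did.
            return $matches[1];
        }
        return "<a class='text-item' data-tooltip='tooltip-item-{$item->slug}' href='/items/{$item->slug}'>"
            . "<img src='/images/items/{$item->slug}.png'>{$item->name}</a>";
    }, $text);
}
print inlineItems('A ((Crystal Sword)) is essential for farming, get a ((Crystal Sword)) as soon as you can. ((Crystal Sword)) is the best!');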
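As for why the original loop nested the links: preg_replace() was handed the matched text ((Crystal Sword)) as if it were a pattern. PHP accepts bracket-style delimiters, so the outer parentheses are read as delimiters and the pattern that actually runs is (Crystal Sword), which matches the bare item name everywhere, including inside markup inserted on an earlier pass. A minimal sketch of that effect, using a made-up sample string and square brackets in place of the real link markup:

```php
<?php
// Hypothetical sample text; the [ ] markers stand in for the generated <a> markup.
$text = 'A ((Crystal Sword)) and a ((Crystal Sword))';

// '((Crystal Sword))' is parsed as the pattern (Crystal Sword) between ( ) delimiters,
// so the bare name is matched and the surrounding parentheses are left alone.
$firstPass = preg_replace('((Crystal Sword))', '[Crystal Sword]', $text);

// The loop runs once per match, so the same replacement is applied again
// and now hits the name inside the text inserted by the first pass.
$secondPass = preg_replace('((Crystal Sword))', '[Crystal Sword]', $firstPass);

echo $firstPass, "\n";  // A (([Crystal Sword])) and a (([Crystal Sword]))
echo $secondPass, "\n"; // A (([[Crystal Sword]])) and a (([[Crystal Sword]]))
```

Replacing inside a single preg_replace_callback() pass avoids this because already-replaced text is never scanned again.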

Related

Problems with Regular expressions in PHP

(I want to let my users tag other users by name. The problem: when someone edits a post again, the link shows up in their TinyMCE editor, and when they save the edit, the script destroys the old link.)
I replace every word in a large string that appears in an array.
$users = array('this', 'car');
$text = 'hello, this is <a title="this" href="">a test this</a>';
$search = '!\b('.implode('|', $users).')\b!i';
$replace = '<a target="_blank" alt="$1" href="/user/$1">$1</a>';
$text = preg_replace($search, $replace, $text);
As you can see above, I try to replace 'this' and 'car' in $text with
<a target="_blank" alt="$1" href="/user/$1">$1</a>
The problem is that my script also replaces 'this' when it's inside my link:
<a title="this" href="">this</a>
I'm not completely sure, but I think you know what I mean.
So my script destroys my links.
I don't need to detect whether the word is inside any HTML element, because words inside other tags like h1 or p should still be replaced.
I need something like a pattern that only matches when the word looks like:
" this "
" this, "
",this "
" this: "...
(No problem if I have to list these manually.)
Another great solution would be a string where I can list the HTML tags that are not allowed:
$tags = 'a,e,article';
Greets
This should do it
<.*?this.*?>(*SKIP)(*FAIL)|\b(this)\b
Demo: https://regex101.com/r/fX0pT1/1
More on this regex approach, http://www.rexegg.com/regex-best-trick.html.
PHP Usage:
$users = array('this', 'car');
$text = 'hello, this is <a title="this" href="">a test this</a>';
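$terms = '(' . implode('|', $users) . ')';
// Branch 1 skips any tag that contains a term; branch 2 captures the term as a whole word.
// $terms is already parenthesised, so the whole-word match is group 2 (group 1 belongs to the skipped branch).
$search = '!<.*?' . $terms . '.*?>(*SKIP)(*FAIL)|\b(' . $terms . ')\b!i';
$replace = '<a target="_blank" alt="$2" href="/user/$2">$2</a>';
echo preg_replace($search, $replace, $text);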
Output:
hello, <a target="_blank" alt="this" href="/user/this">this</a> is <a title="this" href="">a test <a target="_blank" alt="this" href="/user/this">this</a></a>
PHP Demo: https://eval.in/415964
...or if you only want it for links, https://regex101.com/r/fX0pT1/2, <a.*?this.*?>(*SKIP)(*FAIL)|\b(this)\b.

replace link with another

I'm struggling on replacing text in each link.
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = '<br /><p>this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p>';
if(preg_match_all($reg_ex, $text, $urls))
{
foreach($urls[0] as $url)
{
echo $replace = str_replace($url,'http://www.sometext'.$url, $text);
}
}
With the code above, I get the same text three times, with the links changed one at a time: each pass replaces only one link. That's because I use foreach, I know.
But I don't know how to replace them all at once.
Your help would be great!
Don't use regexes on HTML; use DOM instead. That being said, your bug is here:
$replace = str_replace(...., $text);
^^^^^^^^                     ^^^^^
You never update $text, so every iteration of the loop starts from the original string and throws away the previous replacement. You probably want
$text = str_replace(...., $text);
instead, so the changes "propagate".
If you want the final variable to contain all replacements, change it to something like this.
Basically, you are not passing the replaced string back in as the "subject". I assume that is what you expected, though the question is a bit difficult to understand.
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = '<br /><p>this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p>';
if(preg_match_all($reg_ex, $text, $urls))
{
$replace = $text;
foreach($urls[0] as $url) {
$replace = str_replace($url,'http://www.sometext'.$url, $replace);
}
echo $replace;
}
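One caveat with looping over the matches: when the same URL appears more than once, each str_replace() call rewrites every occurrence, and the rewritten text still contains the original URL, so the prefix piles up. A single preg_replace() pass with $0 (the whole match) sidesteps that. The sketch below reuses the question's pattern on a made-up sample string, since the links in the original text were not preserved:

```php
<?php
$reg_ex = '/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/';
// Hypothetical sample text with the same URL twice.
$text = 'Visit http://example.com/a today and http://example.com/a tomorrow';

// Loop approach: str_replace() hits every occurrence on every iteration.
$looped = $text;
preg_match_all($reg_ex, $text, $urls);
foreach ($urls[0] as $url) {
    $looped = str_replace($url, 'http://www.sometext' . $url, $looped);
}

// Single pass: each match is rewritten once and never rescanned.
$once = preg_replace($reg_ex, 'http://www.sometext$0', $text);

echo $looped, "\n"; // each URL ends up with the prefix twice
echo $once, "\n";   // each URL ends up with the prefix once
```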

Lines get split on preg_replace usage

My code-
$input = "this text is for highlighting a text if it exists in a string. Let us check if it works or not";
$pattern ="/if/";
$replacement= "H1Fontbracket"."if"."H1BracketClose";
echo preg_replace($pattern, $replacement, $input);
Now the problem is that when I run this code, the output is split across multiple lines. What else do I need to do to get it on one line?
For a fixed string like this you don't need preg_replace at all; str_replace is enough. (preg_replace returns a string when the subject is a string, so that is not what splits the output.) Note that str_replace takes the literal search text, not a delimited pattern:
echo str_replace('if', $replacement, $input);
What do you mean by multiple lines? Of course it'll show up as multiple lines on a webpage if you wrap the ifs in header tags. Headers are block elements. And more importantly, headers are headers. Not for highlighting text.
If you want to highlight something with HTML, you should probably use a span with a class, or you could use the HTML5 element mark:
$input = "this text is for highlighting a text if it exists in an iffy string.";
echo preg_replace('/\\bif\\b/', '<span class="highlighted">$0</span>', $input);
echo preg_replace('/\\bif\\b/', '<mark>$0</mark>', $input);
The \\b makes the pattern match "if" only as a whole word, not the letters "if" inside a different word. Then in your CSS you can decide how the marked words should show up:
.highlighted { background: yellow }
mark { background: yellow }
Or whatever. I would recommend that you read up a bit on how HTML and CSS works if you're going to make web pages :)
Try this
$input = "this text is for highlighting a text if
it exists in a string. Let us check if it works or not";
$pattern="if";
$replacement="<h1>". $pattern. "</h1>";
$input= str_replace($pattern,$replacement,$input);
echo "$input";
function highlight($str,$search){
$patterns = array('/\//', '/\^/', '/\./', '/\$/', '/\|/',
'/\(/', '/\)/', '/\[/', '/\]/', '/\*/', '/\+/',
'/\?/', '/\{/', '/\}/', '/\,/');
$replace = array('\/', '\^', '\.', '\$', '\|', '\(', '\)',
'\[', '\]', '\*', '\+', '\?', '\{', '\}', '\,');
$search = preg_replace($patterns, $replace, $search);
$search = str_replace(" ","|",$search);
return @preg_replace("/(^|\s)($search)/i",'${1}<span class=highlight>${2}</span>',$str);
}
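The hand-written escaping table in the function above can probably be replaced by preg_quote(), which escapes regex metacharacters and, given a second argument, the delimiter as well. A rough sketch under that assumption; highlightTerms() is a hypothetical name, and the matching rule (a term preceded by whitespace or the start of the string) is kept from the function above:

```php
<?php
// Hypothetical variant of highlight() that relies on preg_quote() for escaping.
function highlightTerms($str, $search) {
    // Split the search string into words, ignoring repeated whitespace.
    $words = preg_split('/\s+/', trim($search), -1, PREG_SPLIT_NO_EMPTY);
    if (!$words) {
        return $str; // nothing to highlight
    }
    // Escape each word for use inside a /.../ pattern, then join them as alternatives.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    $alternatives = implode('|', $quoted);
    return preg_replace(
        "/(^|\s)($alternatives)/i",
        '${1}<span class="highlight">${2}</span>',
        $str
    );
}

echo highlightTerms('check if it works', 'if'), "\n";
// check <span class="highlight">if</span> it works
```

Like the original, this also highlights the start of longer words (the "if" in "iffy"); adding \b after the group would restrict it to whole words.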

Add id attribute to hyperlinks through PHP Regular Expressions

I am still relatively new to regular expressions and feel my code is being too greedy. I am trying to add an id attribute to existing links in a piece of code. My function looks like this:
function addClassHref($str) {
//$str = stripslashes($str);
$preg = "/<[\s]*a[\s]*href=[\s]*[\"\']?([\w.-]*)[\"\']?[^>]*>(.*?)<\/a>/i";
preg_match_all($preg, $str, $match);
foreach ($match[1] as $key => $val) {
$pattern[] = '/' . preg_quote($match[0][$key], '/') . '/';
$replace[] = "<a id='buttonRed' href='$val'>{$match[2][$key]}</a>";
}
return preg_replace($pattern, $replace, $str);
}
This adds the id attribute like I want, but it breaks the hyperlink. For example:
If the original code is: <a href="http://www.google.com">Link</a>
Instead of <a id="class" href="http://www.google.com">Link</a>
it gives
<a id="class" href="http">Link</a>
Any suggestions or thoughts?
Do not use regular expressions to parse XML or HTML.
$doc = new DOMDocument();
$doc->loadHTML($html);
$all_a = $doc->getElementsByTagName('a');
$firsta = $all_a->item(0);
$firsta->setAttribute('id', 'idvalue');
echo $doc->saveHTML($firsta);
You've got some overcomplications in your regex :)
Also, there's no need for the loop as preg_replace() will hit all the instances of the search pattern in the relevant string. The first regex below will take everything in the a tag and simply add the id attribute on at the end.
$str = 'Link' . "\n" .
'Link' . "\n" .
'Link';
$p = "{<\s*a\s*(href=[^>]*)>([^<]*)</a>}i";
$r = "<a $1 id=\"class\">$2</a>";
echo preg_replace($p, $r, $str);
If you only want to capture the href attribute you could do the following:
$p = '{<\s*a\s*href=["\']([^"\']*)["\'][^>]*>([^<]*)</a>}i';
$r = "<a href='$1' id='class'>$2</a>";
Your first subpattern ([\w.-]*) doesn't match :, thus it stops at "http".
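This is easy to check in isolation. The sketch below runs the question's character class against the URL from the question, next to a class that accepts anything up to a closing quote (the variable names are only for illustration):

```php
<?php
$href = 'http://www.google.com';

// The question's class: word characters, dots and hyphens only, so ":" ends the match.
preg_match('/^[\w.-]*/', $href, $narrow);

// A class that takes everything up to the closing quote instead.
preg_match('/^[^"\']*/', $href, $wide);

echo $narrow[0], "\n"; // http
echo $wide[0], "\n";   // http://www.google.com
```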
Couldn't you just use a simple str_replace() for this? Regex seems like overkill if this is all you're doing.
$str = str_replace('<a ', '<a id="someID" ', $str);

Regex match full hyperlink only with certain class

I have a string that contains some hyperlinks. I want to match only a certain link among them with a regex. I can't know whether the href or the class comes first; it may vary.
Here is an example string:
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
I want to select from the above string only the link that has the class nextpostslink.
So, the match in this example should return this -
»eee
This regex is the closest I could get -
/<a\s?(href=)?('|")(.*)('|") class=('|")nextpostslink('|")>.{1,6}<\/a>/
But it is selecting the links from the start of the string.
I think my problem is in the (.*) , but I can't figure out how to change this to select only the needed link.
I would appreciate your help.
It's much better to use a genuine HTML parser for this. Abandon all attempts to use regular expressions on HTML.
Use PHP's DOMDocument instead:
$dom = new DOMDocument;
$dom->loadHTML($yourHTML);
foreach ($dom->getElementsByTagName('a') as $link) {
$classes = explode(' ', $link->getAttribute('class'));
if (in_array('nextpostslink', $classes)) {
// $link has the class "nextpostslink"
}
}
Not sure if that's what you're after, but anyway: it's a bad idea to parse HTML with regex. Use an XPath implementation to reach the desired elements. The following XPath expression would give you all the 'a' elements with class "nextpostslink":
//a[contains(@class,"nextpostslink")]
There is plenty of XPath information around. Since you didn't mention your programming language, here is a quick XPath tutorial using Java: http://www.ibm.com/developerworks/library/x-javaxpathapi/index.html
Edit:
php + xpath + html: http://dev.juokaz.com/php/web-scraping-with-php-and-xpath
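For PHP specifically, the same idea can be sketched with DOMDocument and DOMXPath. The markup below is a stand-in: the question's sample lost its nextpostslink anchor, so the href and link text on that element are assumptions. The expression uses the common concat/normalize-space idiom rather than a bare contains(), so that a class such as "nextpostslink-old" would not match by accident:

```php
<?php
// Hypothetical markup; the nextpostslink anchor is invented for illustration.
$html = <<<'EOD'
<div class='wp-pagenavi'>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
<a href='http://stv.localhost/channel/political/page/2' class='nextpostslink'>next</a>
<a class="cccc">xxx</a>
</div>
EOD;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match "nextpostslink" as a whole class token, wherever the attribute sits in the tag.
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " nextpostslink ")]';

$found = array();
foreach ($xpath->query($query) as $link) {
    $found[] = array(
        'href' => $link->getAttribute('href'),
        'text' => $link->textContent,
    );
}
print_r($found);
```

Attribute order and quote style stop mattering here, because the parser has already normalised them.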
This would work in php:
/<a[^>]+href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>/m
This is of course assuming that the class attribute always comes after the href attribute.
This is a code snippet:
$html = <<<EOD
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
EOD;
$regexp = "/<a[^>]+href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>/m";
$matches = array();
if(preg_match($regexp, $html, $matches)) {
echo "URL: " . $matches[2] . "\n";
echo "Text: " . $matches[6] . "\n";
}
I would however suggest first matching the link and then getting the url so that the order of the attributes doesn't matter:
<?php
$html = <<<EOD
<div class='wp-pagenavi'>
<span class='pages'>Page 1 of 8</span><span class='current'>1</span>
<a href='http://stv.localhost/channel/political/page/2' class='page'>2</a>
»eee<span class='extend'>...</span><a href='http://stv.localhost/channel/political/page/8' class='last'>lastן »</a>
<a class="cccc">xxx</a>
</div>
EOD;
$regexp = "/(<a[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*('|\")[^>]*>(.{1,6})<\/a>)/m";
$matches = array();
if(preg_match($regexp, $html, $matches)) {
$link = $matches[0];
$text = $matches[4];
$regexp = "/href=(\"|')([^'\"]*)(\"|')/";
$matches = array();
if(preg_match($regexp, $link, $matches)) { // search the matched link only, not the whole document
$url = $matches[2];
echo "URL: $url\n";
echo "Text: $text\n";
}
}
You could of course extend the regexp to match either variant (class first or href first), but it would be very long and I don't think it would improve performance.
Just as a proof of concept I created a regexp that doesn't care about the order:
/<a[^>]+(href=(\"|')([^\"']*)('|\")[^>]+class=(\"|')[^'\"]*nextpostslink[^'\"]*(\"|')|class=(\"|')[^'\"]*nextpostslink[^'\"]*(\"|')[^>]+href=(\"|')([^\"']*)('|\"))[^>]*>(.{1,6})<\/a>/m
The text will be in group 12 and the URL will be in either group 3 or group 10 depending on the order.
As the question asks for a regex, here is one: <a\s[^>]*class=["|']nextpostslink["|'][^>]*>(.*)<\/a>.
The order of the attributes doesn't matter, and it accepts both single and double quotes.
Check the regex online: https://regex101.com/r/DX03KD/1/
I replaced the (.*) with [^'"]+ as follows:
<a\s*(href=)?('|")[^'"]+('|") class=('|")nextpostslink('|")>.{1,6}</a>
Note: I tried this with RegexBuddy, so I didn't need to escape the <, > or / characters.
