Removing characters from a variable created using preg_replace


So I'm trying to hack off a few characters at the end of a URL I'm getting from a preg_replace function. However it doesn't seem to be working. I'm not familiar with using these variables in preg_replace (it was just something I found that "mostly" worked).
Here's my attempt:
function addlink_replace($string) {
$pattern = '/<ul(.*?)class="slides"(.*?)<img(.*?)src="(.*?)"(.*?)>(.*?)<\/ul>/is';
$URL = substr($4, 0, -8);;
$replacement = '<ul$1class="slides"$2<a rel=\'shadowbox\' href="'.$URL.'"><img$3src="$4"$5></a>$6</ul>';
return preg_replace($pattern, $replacement, $string);
}
add_filter('the_content', 'addlink_replace', 9999);
Basically I need to remove the last bit of my .jpg file name, so I can show the LARGE image rather than the THUMBNAIL it's generating, but the "$4" doesn't seem to want to be manipulated.

This answer is based on what you're trying to accomplish in this question, combined with the HTML structure from your other question. The regex posted in your question will only match the first set of <li> and <img> tags, and you've indicated that you need to match all <li> and <img> tags within a <ul>, so I've written a larger function to do so.
It wraps every <img> that sits inside an <li> of a <ul> with the class slides in an <a> whose href is the image's URL with the -110x110 string removed, while keeping the thumbnail as the src of the <img> tag.
function addlink_replace($string) {
$new_ul_block = '';
$ul_pattern = '/<ul(.*?)class="slides"(.*?)>(.*?)<\/ul>/is';
$img_pattern = '/<li(.*?)>(.*?)<img(.*?)src="(.*?)"(.*?)>(.*?)<\/li>/is';
preg_match($ul_pattern, $string, $ul_matches);
if (!empty($ul_matches[3]))
{
preg_match_all($img_pattern, $ul_matches[3], $img_matches);
if (!empty($img_matches[0]))
{
$new_ul_block .= "<ul{$ul_matches[1]}class=\"slides\"{$ul_matches[2]}>";
foreach ($img_matches[0] as $id => $img)
{
$new_img = str_replace('-110x110', '', $img_matches[4][$id]);
$new_ul_block .= "<li{$img_matches[1][$id]}>{$img_matches[2][$id]}<a href=\"{$new_img}\">";
$new_ul_block .= "<img{$img_matches[3][$id]}src=\"{$img_matches[4][$id]}\"{$img_matches[5][$id]}></a>{$img_matches[6][$id]}</li>";
}
$new_ul_block .= "</ul>";
}
}
if (!empty($new_ul_block))
{
$replace_pattern = '/<ul.*?class="slides".*?>.*?<\/ul>/is';
return preg_replace($replace_pattern, $new_ul_block, $string);
}
else
{
return $string;
}
}
The href of the <a> is derived from the image's src on this line of the function:
$new_img = str_replace('-110x110', '', $img_matches[4][$id]);
Modify that line if you need different behavior. If you need to remove anything other than -110x110 from the URL, you may need to change str_replace to preg_replace, or, if you want to remove a specific number of characters from the end of the URL, you could use substr:
$new_img = substr($img_matches[4][$id], 0, -12);
Here 12 is the number of characters you want to remove from the end of the string (the argument is negative because it counts from the end).
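To make the character counting concrete, here is a small sketch with a made-up thumbnail URL (the URL is an assumption, not taken from the question). It shows what cutting 8 and 12 characters actually removes, and that cutting 12 also drops the file extension, which then has to be put back.

```php
<?php
// Hypothetical thumbnail URL, used only for illustration.
$thumb = 'http://example.com/uploads/photo-110x110.jpg';

// "-110x110" is 8 characters, but it is followed by ".jpg" (4 more),
// so cutting 8 characters from the end removes "x110.jpg" instead.
$cut8 = substr($thumb, 0, -8);

// Cutting 12 characters removes "-110x110.jpg", extension included.
$cut12 = substr($thumb, 0, -12);

// Re-attach the extension to get a usable URL again.
$full = $cut12 . '.' . pathinfo($thumb, PATHINFO_EXTENSION);

echo $cut8, "\n";   // http://example.com/uploads/photo-110
echo $cut12, "\n";  // http://example.com/uploads/photo
echo $full, "\n";   // http://example.com/uploads/photo.jpg
```

This is why the str_replace version is usually the less fragile choice when the suffix is known: it does not depend on the length of the extension.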
I've posted a working example of this function here.
You may want to consider modifying the source of what is generating this code block, rather than using this regex, as this regex may be hard to maintain in the future if the HTML structure changes.
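As for why "$4" could not be manipulated in the question's attempt: a backreference like $4 only has meaning inside the replacement string while preg_replace is running, so it cannot be passed to substr beforehand. If a captured value needs to be transformed, preg_replace_callback hands the captures to a function. The following is only a sketch under assumptions: the function name link_thumbnails is made up, the size suffix is assumed to look like -WIDTHxHEIGHT right before the extension, and it wraps every <img> in the input rather than only those inside ul.slides.

```php
<?php
// Hypothetical helper: wraps every <img> in a link to the full-size file.
function link_thumbnails(string $html): string
{
    return preg_replace_callback(
        '/<img\b[^>]*\bsrc="([^"]*)"[^>]*>/i',
        function (array $m): string {
            // $m[0] is the whole <img> tag, $m[1] is the captured src value.
            // Drop a trailing "-WIDTHxHEIGHT" suffix but keep the extension.
            $full = preg_replace('/-\d+x\d+(?=\.\w+$)/', '', $m[1]);
            return '<a rel="shadowbox" href="' . $full . '">' . $m[0] . '</a>';
        },
        $html
    );
}

$linked = link_thumbnails('<li><img class="t" src="/up/photo-110x110.jpg" alt=""></li>');
echo $linked, "\n";
```

The thumbnail stays in the src attribute, and only the href points at the file name without the size suffix.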

Related

Regex DO NOT match text inside <a> tag

I am trying to wrap a keyword in a link tag wherever it is found in a given text. But the keyword should NOT be linked if it is already inside an <a></a> tag, and it should also not match inside the href and title attributes. I'm using PHP.
For example:
*NOTE: the text in my use case doesn't contain spaces.
Text = <p>thisiscatfish</p>
Keyword = catfish
Expected Output = <p>thisiscatfish</p>
BUT if
Text = <p>iamcatfish</p>
Keyword = fish
Expected Output: <p>iamcatfish</p>
*NOTE it should NOT match href and title attribute and replace it.
What I have tried https://paiza.io/projects/WYvDVTUMDg0kFOUo6NEHpQ
Problem
My solution so far also matches and replaces the keyword inside href and title. How can I modify my regex so that it does not match the href and title attributes?
function replaceText($text, $keyword, $url) {
$pattern = "/(?!>)$keyword(?!<\/a>)/i";
$replaceWith = "<a href='$url' title='$keyword'>$keyword</a>";
$newText = preg_replace($pattern, $replaceWith, $text);
return $newText;
}
$text = '<p>thisiscatfish</p>';
$newText = replaceText($text, 'catfish', 'www.catfish.com');
$newText2 = replaceText($newText, 'fish', 'www.fish.com');
echo $newText2;

What you are attempting should not be attempted with regular expressions. Regex cannot parse HTML: regular expressions are for regular languages, and HTML is not a regular language, so a regex simply cannot handle what you are asking for in the general case.
To further emphasize this point, let me point you in the direction of one of the most powerful answers to grace this forum.
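For completeness, here is a rough sketch of how the same task could be approached with PHP's DOM extension instead of a regex. It is not from the original answer. The function name linkKeyword is made up, it links only the first occurrence of the keyword in each text node, and it assumes the text before the keyword is plain ASCII (stripos counts bytes while splitText counts characters).

```php
<?php
// Hypothetical helper: link a keyword in text nodes that are not inside <a>.
function linkKeyword(string $html, string $keyword, string $url): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings quiet
    $dom->loadHTML('<body>' . $html . '</body>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Copy the node list first, because the loop below changes the tree.
    $nodes = iterator_to_array($xpath->query('//body//text()[not(ancestor::a)]'));

    foreach ($nodes as $node) {
        $pos = stripos($node->nodeValue, $keyword);
        if ($pos === false) {
            continue;
        }
        $match = $node->splitText($pos);           // $node keeps the text before
        $match->splitText(strlen($keyword));       // $match is now just the keyword
        $a = $dom->createElement('a');
        $a->setAttribute('href', $url);
        $a->setAttribute('title', $keyword);
        $match->parentNode->replaceChild($a, $match);
        $a->appendChild($match);
    }

    // Serialize the children of <body> to get the fragment back.
    $out = '';
    foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$once  = linkKeyword('<p>thisiscatfish</p>', 'catfish', 'www.catfish.com');
$twice = linkKeyword($once, 'fish', 'www.fish.com');
echo $twice, "\n";
```

Because attribute values are never text nodes and text inside an existing <a> is excluded by the XPath expression, the second call has nothing left to link.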

Get substring under condition

I have a string $content which looks like that
<h1>Or Any Other tags except img or nothing</h1>
...
<img src="{{media url="image_name.png"}}" alt="image_test" />
...
<h1>Or Any Other tags except img or nothing</h1>
So as the minimal content of the string is
<img src="{{media url="dynamic_image_name.png"}}" alt="dynamic_image_test_alt" />
What I want is to find a way to extract, alter and replace this specific line with a new one.
In the first place I made this:
protected function getStringBetween($str,$from,$to)
{
$sub = substr($str, strpos($str,$from)+strlen($from),strlen($str));
return substr($sub,0,strpos($sub,$to));
}
Using " as from and to variable to get the filename. which is enough to generate what I want.
I would like to do something like that
$generatedContent = "<b>Hi test</b>";
$newContent = alterateContent($content,$generatedContent);
And the $newContent output needs to be:
<h1>Or Any Other tags except img or nothing</h1>
...
<b>Hi test</b>
...
<h1>Or Any Other tags except img or nothing</h1>

I would rarely recommend using regular expressions to parse HTML, but in your case, since your goal is to alter something stored in the database, parsing the HTML and then saving it again might accidentally alter other things that you'd want unchanged, such as the formatting.
So here's a simple solution using regex:
function alterateContent(string $html, string $imageFileName, string $replacement): string
{
$imageFileName = preg_quote($imageFileName, '/');
return preg_replace(
"/<img\h+src=\"{{media url=\"{$imageFileName}\"}}\".*?\/>/",
$replacement,
$html
);
}
Usage:
$newContent = alterateContent($yourHtmlString, 'image_name.png', '<b>Hi test</b>');
Note: this assumes the src attribute is always the first attribute of the image.
Demo

You can simply use preg_replace() for that, like this:
$newstring = preg_replace('~<img.*~','<b>Hi test</b>',$oldstring);
Without the s modifier, . does not match newline characters, so .* stops at the end of the line and the replacement stays confined to the line that contains the <img> tag.
If you need to replace the img with exact src, you can do this like this:
$newstring = preg_replace('~<img src="'.$img_source.'".*~','<b>Hi test</b>',$oldstring);
If your source is only a filename without path, and in img tag it's with path, you can use this:
$newstring = preg_replace('~<img src=".*?'.$img_file.'".*~','<b>Hi test</b>',$oldstring);
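One caveat with the snippets above: the file name is dropped into the pattern as-is, so a . in it matches any character, and .* removes everything up to the end of the line. A slightly more defensive variant might look like the sketch below. The function name replaceImage is made up, and it assumes the tag contains no > before its closing bracket.

```php
<?php
// Hypothetical helper: replace the <img> tag that mentions a given file name.
function replaceImage(string $html, string $file, string $replacement): string
{
    // Escape regex metacharacters in the file name, including the ~ delimiter.
    $quoted = preg_quote($file, '~');
    // [^>]* keeps the match inside a single tag instead of running to end of line.
    return preg_replace('~<img\b[^>]*' . $quoted . '[^>]*>~', $replacement, $html);
}

$content = "<h1>Title</h1>\n"
    . "<img src=\"{{media url=\"image_name.png\"}}\" alt=\"image_test\" />\n"
    . "<h1>Footer</h1>";

$newContent = replaceImage($content, 'image_name.png', '<b>Hi test</b>');
echo $newContent, "\n";
```

Note that preg_replace still interprets sequences such as $1 or \1 in the replacement string, so a replacement that may contain those would need escaping or a callback.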

PHP preg_replace: Replace all anchor tags in text with their href value with Regex

I want to replace all anchor tags within a text with their href value, but my pattern does not work right.
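$str = 'This is a text with multiple anchor tags. This is the first one: <a href="https://www.link1.com/" title="Link 1">Link 1</a> and this one the second: <a href="https://www.link2.com/" title="Link 2">Link 2</a> after that a lot of other text. And here the 3rd one: <a href="https://www.link3.com/" title="Link 3">Link 3</a> Some other text.';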
$test = preg_replace("/<a\s.+href=['|\"]([^\"\']*)['|\"].*>[^<]*<\/a>/i",'\1', $str);
echo $test;
At the end the text should look like this:
This is a text with multiple anchor tags. This is the first one: https://www.link1.com/ and this one the second: https://www.link2.com/ after that a lot of other text. And here the 3rd one: https://www.link3.com/ Some other text.
Thank you very much!

Just don't.
Use a parser instead.
$dom = new DOMDocument();
// since you have a fragment, wrap it in a <body>
$dom->loadHTML("<body>".$str."</body>");
$links = $dom->getElementsByTagName("a");
while($link = $links[0]) {
$link->parentNode->insertBefore(new DOMText($link->getAttribute("href")),$link);
$link->parentNode->removeChild($link);
}
$result = $dom->saveHTML($dom->getElementsByTagName("body")[0]);
// remove <body>..</body> wrapper
$output = substr($result, strlen("<body>"), -strlen("</body>"));
Demo on 3v4l

In case you're still set on regex, this should work:
preg_replace("/<a\s+href=['\"]([^'\"]+)['\"][^\>]*>[^<]+<\/a>/i",'$1', $str);
But you're probably better off with a solution like what Andreas posted.
FYI: the reason your previous regex didn't work was this little number:
.*>
Because . matches any character and * is greedy, .* ran past the end of the first tag and swallowed everything up to the last > that still let the rest of the pattern match. This is why it appeared to replace only one anchor tag and cut off the rest.
Changing that to
[^\>]*
ensures that this part of the match is constrained to the portion of the string between the URL and the closing bracket of the a tag.
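The difference is easy to see by running both patterns on a short string with two links. The sample string below is made up for the demonstration; the two patterns are the one from the question and the one from this answer.

```php
<?php
// Made-up sample with two anchor tags.
$str = 'A <a href="https://www.link1.com/" title="Link 1">Link 1</a>'
     . ' B <a href="https://www.link2.com/" title="Link 2">Link 2</a> C';

// Pattern from the question: .+ and .* are greedy and not limited to one tag.
$greedy = preg_replace(
    "/<a\s.+href=['|\"]([^\"\']*)['|\"].*>[^<]*<\/a>/i",
    '\1',
    $str
);

// Pattern from this answer: [^\>]* cannot leave the opening tag.
$bounded = preg_replace(
    "/<a\s+href=['\"]([^'\"]+)['\"][^\>]*>[^<]+<\/a>/i",
    '$1',
    $str
);

echo $greedy, "\n";   // A https://www.link2.com/ C
echo $bounded, "\n";  // A https://www.link1.com/ B https://www.link2.com/ C
```

With the greedy pattern a single match spans from the first <a to the last </a>, so only one URL survives.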

It is perhaps not simpler, but it is safer to loop over the string with strpos, finding each link and cutting the surrounding HTML away.
$str = 'This is a text with multiple anchor tags. This is the first one: <a class="funky-style" href="https://www.link1.com/" title="Link 1">Link 1</a> and this one the second: Link 2 after that a lot of other text. And here the 3rd one: Link 3 Some other text.';
$pos = strpos($str, '<a');
while($pos !== false){
// Find start of html and remove up to link (<a href=")
$str = substr($str, 0, $pos) . substr($str, strpos($str, 'href="', $pos)+6);
// Find end of link and remove that.(" title="Link 1">Link 1</a>)
$str = substr($str, 0, strpos($str,'"', $pos)) . substr($str, strpos($str, '</a>', $pos)+4);
// Find next link if possible
$pos = strpos($str, '<a');
}
echo $str;
https://3v4l.org/vdN7E
Edited to handle a different attribute order inside the a tag.

If you want to replace a tags with href values you can do:
$post = preg_replace("/<a.*?href=\"(.*?)\".*?>(.*?)<\/a>/","$1",$post);
If you want to replace with text values:
$post = preg_replace("/<a.*?href=\"(.*?)\".*?>(.*?)<\/a>/","$2",$post);

replace link with another

I'm struggling on replacing text in each link.
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = '<br /><p>this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p>';
if(preg_match_all($reg_ex, $text, $urls))
{
foreach($urls[0] as $url)
{
echo $replace = str_replace($url,'http://www.sometext'.$url, $text);
}
}
From the code above, I'm getting 3x the same text, and the links are changed one by one: everytime is replaced only one link - because I use foreach, I know.
But I don't know how to replace them all at once.
Your help would be great!

You shouldn't use regexes on HTML; use DOM instead. That being said, your bug is here:
$replace = str_replace(...., $text);
^^^^^^^^---                  ^^^^^---
You never update $text, so every iteration of the loop starts from the original string and discards the previous replacement. You probably want
$text = str_replace(...., $text);
instead, so the changes "propagate".

If you want the final variable to contain all the replacements, change it to something like this.
You are basically not passing the replaced string back in as the "subject". I assume that is what you were expecting, since the question is a bit difficult to understand.
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
$text = '<br /><p>this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p><p>another - this is a content with a link we are supposed to click</p>';
if(preg_match_all($reg_ex, $text, $urls))
{
$replace = $text;
foreach($urls[0] as $url) {
$replace = str_replace($url,'http://www.sometext'.$url, $replace);
}
echo $replace;
}
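A side effect worth knowing about the loop: str_replace replaces every occurrence of the URL on each iteration, so a URL that appears more than once in the text gets prefixed more than once. Since the pattern already finds the URLs, a single preg_replace call with $0 (the whole match) in the replacement avoids that. The sample text below is made up for the demonstration.

```php
<?php
$reg_ex = "/(http|https)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/";
// Made-up text in which the same URL appears twice.
$text = '<p>see http://example.com/a and http://example.com/a again</p>';

// Loop approach: each iteration rewrites all occurrences of the URL.
$looped = $text;
if (preg_match_all($reg_ex, $text, $urls)) {
    foreach ($urls[0] as $url) {
        $looped = str_replace($url, 'http://www.sometext' . $url, $looped);
    }
}

// Single pass: every match is prefixed exactly once.
$single = preg_replace($reg_ex, 'http://www.sometext$0', $text);

echo substr_count($looped, 'http://www.sometext'), "\n"; // 4
echo substr_count($single, 'http://www.sometext'), "\n"; // 2
```

If the loop is kept, running it over array_unique($urls[0]) only helps when no URL is a prefix of another, so the single pass is the simpler fix.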

Regular expression check by skipping anchor tags

I have written a regex for searching particular keyword and I am replacing that keyword with particular URL.
My current regex is as: \b$keyword\b
One problem in this is that if my data contains anchor tags and that tag contains this keyword then this regex replaces that keyword in the anchor tag as well.
I want to search in given data excluding anchor tag. Please help me out. Appreciate your help.
eg. Keyword: Disney
I/p:
This is Disney The disney should be replaceable
Expected O/p:
This is Disney The disney should be replaceable
Invalid o/p:
This is <a href="any-url.php">Disney </a> The disney should be replaceable

I've modified my function that highlights searched phrase on a page, here you go:
$html = 'This is Disney The disney should be replaceable.'.PHP_EOL;
$html .= 'Let\'s test also use of keyword inside other tags, for example as class name:'.PHP_EOL;
$html .= '<b class=disney></b> - this should not be replaced with link, and it isn\'t!'.PHP_EOL;
$result = ReplaceKeywordWithLink($html, "disney", "any-url.php");
echo nl2br(htmlspecialchars($result));
function ReplaceKeywordWithLink($html, $keyword, $link)
{
if (strpos($html, "<") !== false) {
$id = 0;
$unique_array = array();
// Hide existing anchor tags with some unique string.
preg_match_all("#<a[^<>]*>[\s\S]*?</a>#i", $html, $matches);
foreach ($matches[0] as $tag) {
$id++;
$unique_string = "#####$id#####";
$unique_array[$unique_string] = $tag;
$html = str_replace($tag, $unique_string, $html);
}
// Hide all tags by replacing with some unique string.
preg_match_all("#<[^<>]+>#", $html, $matches);
foreach ($matches[0] as $tag) {
$id++;
$unique_string = "#####$id#####";
$unique_array[$unique_string] = $tag;
$html = str_replace($tag, $unique_string, $html);
}
}
// Then we replace the keyword with link.
$keyword = preg_quote($keyword, '#');
assert(strpos($keyword, '$') === false);
$html = preg_replace('#(\b)('.$keyword.')(\b)#i', '$1<a href="'.$link.'">$2</a>$3', $html);
// We get back all the tags by replacing unique strings with their corresponding tag.
if (isset($unique_array)) {
foreach ($unique_array as $unique_string => $tag) {
$html = str_replace($unique_string, $tag, $html);
}
}
return $html;
}
Result:
This is Disney The disney should be replaceable.
Let's test also use of keyword inside other tags, for example as class name:
<b class=disney></b> - this should not be replaced with link, and it isn't!

Add this to the end of your regex:
(?=[^<]*(?:<(?!/?a\b)[^<]*)*(?:<a\b|\z))
This lookahead tries to match either the next opening <a> tag or the end of the input, but only if it doesn't see a closing </a> tag first. Assuming the HTML is minimally well formed, the lookahead will fail whenever the match starts after the beginning of an <a> tag and before the corresponding </a> tag.
To prevent it from matching inside any other tag (e.g. <div class="disney">), you can add this lookahead as well:
(?![^<>]*+>)
With this one I'm assuming there won't be any angle brackets in the attribute values of the tags, which is legal according to the HTML 4 spec, but extremely rare in the real world.
If you're writing the regex as a PHP double-quoted string (which you must be, if you expect the $keyword variable to be interpolated), it is safest to double all the backslashes. PHP leaves unrecognized escapes such as \b and \z untouched in double-quoted strings, so they would still reach the regex engine as written, but doubling them avoids surprises with sequences PHP does interpret, such as \n or \t.
EDIT: On second thought, definitely do add the second lookahead. I mean, why would you not want to prevent matches inside tags? And place it first, because it will tend to evaluate more quickly than the other:
(?![^<>]*+>)(?=[^<]*(?:<(?!/?a\b)[^<]*)*(?:<a\b|\z))
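Put together in PHP, the combined lookaheads could be used roughly as follows. This is only a sketch: the function name linkOutsideAnchors and the sample markup are made up, the ~ delimiter is chosen because the pattern contains a /, and single-quoted strings are used so that no backslash needs doubling.

```php
<?php
// Hypothetical helper: link a keyword unless it is inside a tag or an <a> element.
function linkOutsideAnchors(string $html, string $keyword, string $url): string
{
    $pattern = '~\b' . preg_quote($keyword, '~') . '\b'
        . '(?![^<>]*+>)'                                 // not inside a tag
        . '(?=[^<]*(?:<(?!/?a\b)[^<]*)*(?:<a\b|\z))~i';  // not between <a> and </a>
    // $0 in the replacement is the matched keyword, with its original casing.
    return preg_replace($pattern, '<a href="' . $url . '">$0</a>', $html);
}

$html = 'This is <a href="any-url.php">Disney</a> The disney should be '
      . '<b class="disney">replaceable</b>';

$result = linkOutsideAnchors($html, 'disney', 'disney.php');
echo $result, "\n";
```

Only the free-standing "disney" in the middle is linked; the one inside the existing anchor and the one in the class attribute are left alone.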

Strip the tags first, then search the stripped text.
