Get substring under condition - php

I have a string $content that looks like this:
<h1>Or Any Other tags except img or nothing</h1>
...
<img src="{{media url="image_name.png"}}" alt="image_test" />
...
<h1>Or Any Other tags except img or nothing</h1>
So the minimal content of the string is:
<img src="{{media url="dynamic_image_name.png"}}" alt="dynamic_image_test_alt" />
What I want is to find a way to extract, alter and replace this specific line with a new one.
First, I wrote this:
protected function getStringBetween($str, $from, $to)
{
    $sub = substr($str, strpos($str, $from) + strlen($from), strlen($str));
    return substr($sub, 0, strpos($sub, $to));
}
I call it with " as both $from and $to to get the filename, which is enough to generate what I want.
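A minimal standalone sketch of the helper above (the `protected` keyword is dropped so it runs outside a class; the sample tag is taken from the question). It shows what the function actually returns: with `"` as both delimiters you get the first quoted run, which for this tag is `{{media url=`, while passing `url="` as $from isolates the filename.

```php
<?php
// Standalone copy of the question's helper: returns the text between the
// first occurrence of $from and the next occurrence of $to after it.
function getStringBetween(string $str, string $from, string $to): string
{
    $sub = substr($str, strpos($str, $from) + strlen($from));
    return substr($sub, 0, strpos($sub, $to));
}

$line = '<img src="{{media url="image_name.png"}}" alt="image_test" />';

// With '"' as both delimiters, the first quoted run is "{{media url=".
$first = getStringBetween($line, '"', '"');

// Starting from 'url="' isolates the filename itself.
$file = getStringBetween($line, 'url="', '"');

echo $first, "\n", $file, "\n";
```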
I would like to do something like this:
$generatedContent = "<b>Hi test</b>";
$newContent = alterateContent($content, $generatedContent);
And the $newContent output needs to be:
<h1>Or Any Other tags except img or nothing</h1>
...
<b>Hi test</b>
...
<h1>Or Any Other tags except img or nothing</h1>

I would rarely recommend using regular expressions to parse HTML, but in your case your goal is to alter something stored in the database, and parsing the HTML and then saving it again might accidentally change things you want left untouched, such as the formatting.
So here's a simple solution using a regex:
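function alterateContent(string $html, string $imageFileName, string $replacement): string
{
    // Escape regex metacharacters (e.g. the "." in the filename).
    $imageFileName = preg_quote($imageFileName, '/');
    // The quotes around the filename must be escaped, otherwise the
    // double-quoted PHP string ends early.
    return preg_replace(
        "/<img\h+src=\"{{media url=\"{$imageFileName}\"}}\".*?\/>/",
        $replacement,
        $html
    );
}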
Usage:
$newContent = alterateContent($yourHtmlString, 'image_name.png', '<b>Hi test</b>');
Note: this assumes the src attribute is always the first attribute of the image.
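A sketch of the same pattern applied inline to a sample $content (the sample is an assumption shaped like the question's). Building the pattern in a single-quoted string means the inner quotes need no escaping; PCRE treats the `{{` and `}}` as literal braces because they are not valid quantifiers.

```php
<?php
// Assumed sample content with one {{media}} image line.
$content = "<h1>Title</h1>\n"
    . '<img src="{{media url="image_name.png"}}" alt="image_test" />' . "\n"
    . "<h1>Footer</h1>";

// preg_quote() escapes the "." in the filename.
$file = preg_quote('image_name.png', '/');
$pattern = '/<img\h+src="{{media url="' . $file . '"}}".*?\/>/';

$newContent = preg_replace($pattern, '<b>Hi test</b>', $content);
echo $newContent;
```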

You can simply use preg_replace() for that, like this:
$newstring = preg_replace('~<img.*~','<b>Hi test</b>',$oldstring);
Without the s modifier, . does not match a newline character, so .* stops at the end of the line and only the line containing the <img> tag is replaced.
If you need to replace only the img with an exact src, you can do this:
$newstring = preg_replace('~<img src="'.preg_quote($img_source, '~').'".*~','<b>Hi test</b>',$oldstring);
If you only have the filename without the path, but the src in the img tag includes the path, you can use this:
$newstring = preg_replace('~<img src=".*?'.preg_quote($img_file, '~').'".*~','<b>Hi test</b>',$oldstring);
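A small sketch of the filename-only variant on an assumed multi-line string with two images. Because . does not cross newlines without the s modifier, only the line whose src ends in the given filename is replaced, and the other image is left alone.

```php
<?php
// Assumed sample: two images on separate lines.
$oldstring = "<h1>Title</h1>\n"
    . '<img src="media/photo.png" alt="x" />' . "\n"
    . '<img src="media/other.png" alt="y" />' . "\n"
    . "<h1>Footer</h1>";

$img_file = 'photo.png';

// preg_quote() escapes the "." in the filename; ".*" stops at the end of
// the line, so only that one line is replaced.
$newstring = preg_replace(
    '~<img src=".*?' . preg_quote($img_file, '~') . '".*~',
    '<b>Hi test</b>',
    $oldstring
);
echo $newstring;
```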

Related

PHP can't take <img> tag from page

I have a problem with PHP's preg_match function.
In the DLE CMS, I'm trying to extract a picture from a news item (image-x), but in the module I refer to it via a direct link.
//remove <p></p> tags
$row[$i]['short_story'] = str_replace( "</p><p>", " ",$row[$i]['short_story'] );
//remove the \" escapes (DLE put it in the MySQL column)
$row[$i]['short_story'] = str_replace("\\\"", " ", $row[$i]['short_story']);
//remove all tags except <img>, but there remains a simple text that is stored without tags
$row[$i]['img'] = strip_tags($row[$i]['short_story'], "<img>");
//try to find <img> (by '>'), to remove the simple text;
preg_match(".*>", $row[$i]['img'], $matches);
// print only <br/> (matches is empty)
print_r($matches."<br/>\n");
For example, print_r($row[$i]['img']) outputs:
<img src="somelink" class="fr-fic" fr-dib="" alt=""> Some text
And I need only:
<img src="somelink" class="fr-fic" fr-dib="" alt="">
Your regex pattern for selecting <img> is incorrect (it also has no delimiters). Use /<img[^>]+>/ as the pattern instead. The code should change to:
preg_match("/<img[^>]+>/", $row[$i]['img'], $matches);
You can also use preg_replace() to remove the additional text after <img>:
preg_replace("/(<img[^>]+>)[\w\s]+/", "$1", $string)
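A runnable sketch of both suggestions on the sample output from the question. [^>]+ stops at the first >, so only the tag itself is captured; the preg_replace variant removes trailing word and whitespace characters after the tag.

```php
<?php
// Sample taken from the question's print_r output.
$img = '<img src="somelink" class="fr-fic" fr-dib="" alt=""> Some text';

// Extract: the pattern now has delimiters, and [^>]+ stops at the first ">".
$tag = null;
if (preg_match('/<img[^>]+>/', $img, $matches)) {
    $tag = $matches[0];
}

// Replace: keep the captured tag, drop the text after it.
$cleaned = preg_replace('/(<img[^>]+>)[\w\s]+/', '$1', $img);

echo $tag, "\n", $cleaned, "\n";
```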

How to remove plain text from a string after using strip_tags()

So I have a string, and I used the strip_tags() function to remove all tags except IMG, but I still have plain text next to my IMG element. Here is a visual example:
$myvariable = "This text needs to be removed<a href='blah_blah_blah'>Blah</a><img src='blah.jpg'>";
Using PHP's strip_tags() I was able to remove all tags except the <img> tag (which is what I want), but it didn't remove the text.
How do I remove the leftover text? The text will always be either before or after the tag.
[ADDED MORE DETAILS]
$description = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">';
That's what the variable actually holds.
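Instead of replacing something, you can extract the values you want:
(<(\w+).+</\2>)
Use it with preg_match() (see a demo on regex101.com). Note that \2 requires a matching closing tag such as </a>, so this pattern cannot match a void element like a bare <img ...>.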
In PHP:
<?php
$regex = '~(<(\w+).+</\2>)~';
$string = 'crazy stuff<img src="https://scontent.cdninstagram.com/t51.2885-15/e15/14287934_1389514537744146_673363238_n.jpg?ig_cache_key=MTMzNzM3MzgwNjAyNDY5NDAzMA%3D%3D.2">here as well';
if (preg_match($regex, $string, $match)) {
    echo $match[1];
}
?>
Please show your whole piece of code with the use of strip_tags.
You can try: preg_replace('~.*(<img[^>]+>)~', '$1', $myvariable);
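A sketch of that one-liner on an assumed sample with text on both sides of the tag. The greedy .* consumes everything up to the last <img ...>, so only the text before the tag is removed; extracting the tag with preg_match() is the way to drop text on both sides.

```php
<?php
// Assumed sample with text before and after the image.
$myvariable = 'crazy stuff<img src="blah.jpg">here as well';

// Removes only the text *before* the tag.
$result = preg_replace('~.*(<img[^>]+>)~', '$1', $myvariable);

// Keeps only the tag itself.
$tag = preg_match('~<img[^>]+>~', $myvariable, $m) ? $m[0] : '';

echo $result, "\n", $tag, "\n";
```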

Removing characters from a variable created using preg_replace

So I'm trying to hack off a few characters at the end of a URL I'm getting from a preg_replace function. However, it doesn't seem to be working. I'm not familiar with using these variables in preg_replace (it was just something I found that "mostly" worked).
Here's my attempt:
function addlink_replace($string) {
$pattern = '/<ul(.*?)class="slides"(.*?)<img(.*?)src="(.*?)"(.*?)>(.*?)<\/ul>/is';
$URL = substr($4, 0, -8);;
$replacement = '<ul$1class="slides"$2<a rel=\'shadowbox\' href="'.$URL.'"><img$3src="$4"$5></a>$6</ul>';
return preg_replace($pattern, $replacement, $string);
}
add_filter('the_content', 'addlink_replace', 9999);
Basically I need to remove the last bit of my .jpg file name, so I can show the LARGE image rather than the THUMBNAIL it's generating, but the "$4" doesn't seem to want to be manipulated.
This answer is based on what you're trying to accomplish in this question, using the HTML structure from your other question. The regex posted in your question will not match anything other than the first set of <li> and <img> tags, and you've indicated that you need to match all <li> and <img> tags within a <ul>, so I've written a larger function to do so.
It wraps every <img> tag that is inside an <li> within a <ul> with the class slides in an <a> whose href is the image's URL with the -110x110 string removed, while preserving the thumbnail source in the <img> tag.
function addlink_replace($string) {
    $new_ul_block = '';
    $ul_pattern = '/<ul(.*?)class="slides"(.*?)>(.*?)<\/ul>/is';
    $img_pattern = '/<li(.*?)>(.*?)<img(.*?)src="(.*?)"(.*?)>(.*?)<\/li>/is';
    preg_match($ul_pattern, $string, $ul_matches);
    if (!empty($ul_matches[3]))
    {
        preg_match_all($img_pattern, $ul_matches[3], $img_matches);
        if (!empty($img_matches[0]))
        {
            $new_ul_block .= "<ul{$ul_matches[1]}class=\"slides\"{$ul_matches[2]}>";
            foreach ($img_matches[0] as $id => $img)
            {
                $new_img = str_replace('-110x110', '', $img_matches[4][$id]);
                $new_ul_block .= "<li{$img_matches[1][$id]}>{$img_matches[2][$id]}<a href=\"{$new_img}\">";
                $new_ul_block .= "<img{$img_matches[3][$id]}src=\"{$img_matches[4][$id]}\"{$img_matches[5][$id]}></a>{$img_matches[6][$id]}</li>";
            }
            $new_ul_block .= "</ul>";
        }
    }
    if (!empty($new_ul_block))
    {
        $replace_pattern = '/<ul.*?class="slides".*?>.*?<\/ul>/is';
        return preg_replace($replace_pattern, $new_ul_block, $string);
    }
    else
    {
        return $string;
    }
}
The <a>'s href is derived from the image's src on this line:
$new_img = str_replace('-110x110', '', $img_matches[4][$id]);
if you would like to modify it. If you need to remove anything other than -110x110 from the URL, you may need to change str_replace to preg_replace; or, if you want to remove a specific number of characters from the end of the URL, you could use substr:
$new_img = substr($img_matches[4][$id], 0, -12);
where -12 is the number of characters you want to remove from the end of the string (it's negative because it counts from the end).
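A sketch comparing the three options on a hypothetical thumbnail URL. The preg_replace pattern is an illustrative assumption that strips any -WIDTHxHEIGHT suffix right before the extension. Note that substr with -12 removes "-110x110.jpg" (12 characters), extension included, so the length has to be chosen with care.

```php
<?php
$thumb = 'http://example.com/uploads/photo-110x110.jpg'; // hypothetical URL

// str_replace: removes only the literal "-110x110".
$a = str_replace('-110x110', '', $thumb);

// preg_replace: removes any "-WIDTHxHEIGHT" directly before the extension.
$b = preg_replace('/-\d+x\d+(?=\.\w+$)/', '', $thumb);

// substr with a negative length counts from the end: -12 drops
// "-110x110.jpg", extension included.
$c = substr($thumb, 0, -12);

echo $a, "\n", $b, "\n", $c, "\n";
```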
You may want to consider modifying the source of what is generating this code block, rather than using this regex, as this regex may be hard to maintain in the future if the HTML structure changes.

Remove style attribute from certain HTML tags in a document and replace with class attribute

For instance I have a string:
$string = '<div class="ImageRight" style="width:150px">';
which I want to transform into this:
$string = '<div class="ImageRight">';
I want to remove the portion style="width:150px" with preg_replace(), where the size 150 can vary, so the width can be 500px etc. as well.
Also, the last part of the class name varies as well, so the class can be ImageRight, ImageLeft, ImageTop etc.
So, how can I remove the style attribute completely from a string with the above mentioned structure, where the only things that varies is the last portion of the classname and the width value?
EDIT: The ACTUAL string I have is an entire html document and I don't want to remove the style attribute from the entire html, only from the tags which match the string I've shown above.
I think this is what you're after...
$modifiedHtml = preg_replace('/<(div class="Image[^"]+") style="[^"]+">/i', '<$1>', $html);
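A sketch of that pattern on an assumed HTML snippet. Only <div class="Image..."> tags lose their style attribute; other styled elements are untouched. The pattern assumes class comes directly before style, separated by a single space, with double-quoted values.

```php
<?php
// Assumed sample document fragment.
$html = '<p style="color:red">Keep me</p>'
    . '<div class="ImageRight" style="width:150px">A</div>'
    . '<div class="ImageLeft" style="width:500px">B</div>';

// $1 holds 'div class="Image..."'; the style attribute is dropped.
$modifiedHtml = preg_replace('/<(div class="Image[^"]+") style="[^"]+">/i', '<$1>', $html);
echo $modifiedHtml;
```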
To remove it completely:
$string = preg_replace("/style=\"width:150px\"/", "", $string);
To replace it:
$string = preg_replace("/style=\"width:150px\"/", "style=\"width:500px\"", $string);
You can do it in two steps with:
$place = 'Left';
$size = 500;
$string = preg_replace('/(?<=class="Image)\w+(?=")/',$place,$string);
$string = preg_replace('/(?<=style="width:)[0-9]+(?=px")/',$size,$string);
Note: (?=...) is called a lookahead, and (?<=...) is called a lookbehind.
How about:
$string = preg_replace('/(div class="Image.+?") style="width:.+?"/', "$1", $string);
Simple:
$string = preg_replace('/<div class="Image(.*?)".*?>/i', '<div class="Image$1">', $string);
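A quick comparison of the last two one-liners on an assumed tag that also carries an id: the first removes only the style attribute, while the second rebuilds the tag from the class suffix and therefore drops every other attribute too.

```php
<?php
// Assumed sample tag with an extra attribute.
$tag = '<div class="ImageTop" id="x" style="width:20px">';

// Removes only the style attribute; id survives.
$a = preg_replace('/(div class="Image.+?") style="width:.+?"/', '$1', $tag);

// Rebuilds the tag from the class suffix, so id is lost as well.
$b = preg_replace('/<div class="Image(.*?)".*?>/i', '<div class="Image$1">', $tag);

echo $a, "\n", $b, "\n";
```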

preg_replace only OUTSIDE tags ? (... we're not talking full 'html parsing', just a bit of markdown)

What is the easiest way to highlight some text while excluding text inside OCCASIONAL tags "<...>"?
CLARIFICATION: I want the existing tags PRESERVED!
$t =
preg_replace(
"/(markdown)/",
"<strong>$1</strong>",
"This is essentially plain text apart from a few html tags generated with some
simplified markdown rules: <a href=markdown.html>[see here]</a>");
Which should display as:
"This is essentially plain text apart from a few html tags generated with some simplified markdown rules: see here"
... BUT NOT MESS UP the text inside the anchor tag (i.e. <a href=markdown.html> ).
I've heard the arguments of not parsing html with regular expressions, but here we're talking essentially about plain text except for minimal parsing of some markdown code.
Actually, this seems to work ok:
<?php
$item="markdown";
$t="This is essentially plain text apart from a few html tags generated
with some simplified markdown rules: <a href=markdown.html>[see here]</a>";
//_____1. apply emphasis_____
$t = preg_replace("|($item)|","<strong>$1</strong>",$t);
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=
// <strong>markdown</strong>.html>[see here]</a>"
//_____2. remove emphasis if WITHIN opening and closing tag____
$t = preg_replace("|(<[^>]+?)(<strong>($item)</strong>)([^<]+?>)|","$1$3$4",$t);
// this preserves the text before ($1), after ($4)
// and inside <strong>..</strong> ($2), but without the tags ($3)
// "This is essentially plain text apart from a few html tags generated
// with some simplified <strong>markdown</strong> rules: <a href=markdown.html>
// [see here]</a>"
?>
A string like $item="odd|string" would cause problems, but I won't be using that kind of string anyway (it would probably need preg_quote($item, '|') to escape the regex metacharacters, since | is also the delimiter here).
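A sketch of the two-step approach with preg_quote() applied, on an assumed input that contains the troublesome "odd|string" both in text and in an attribute. Escaping turns | into \|, which is a literal pipe both for the | delimiter and for PCRE.

```php
<?php
$item = 'odd|string';
$t = 'An odd|string here: <a href=odd|string.html>[see here]</a>'; // assumed input

// Escape "|" (a regex metacharacter and the delimiter used here).
$q = preg_quote($item, '|');

// 1. Wrap every occurrence.
$t = preg_replace("|($q)|", '<strong>$1</strong>', $t);
// 2. Remove the wrap where it landed inside a tag.
$t = preg_replace("|(<[^>]+?)(<strong>($q)</strong>)([^<]+?>)|", '$1$3$4', $t);

echo $t;
```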
You could split the string into tag‍/‍no-tag parts using preg_split:
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);
Then you can iterate over the parts, skipping every odd part (i.e. the tag parts), and apply your replacement to the rest:
for ($i=0, $n=count($parts); $i<$n; $i+=2) {
$parts[$i] = preg_replace("/(markdown)/", "<strong>$1</strong>", $parts[$i]);
}
At the end put everything back together with implode:
$str = implode('', $parts);
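The three steps put together into one runnable sketch, using the sample sentence from the question. With PREG_SPLIT_DELIM_CAPTURE the captured tags land at the odd indices, so the even indices are the text between tags.

```php
<?php
$str = 'This is essentially plain text apart from a few html tags generated with some '
    . 'simplified markdown rules: <a href=markdown.html>[see here]</a>';

// Split into text/tag parts; tags are kept as odd-indexed elements.
$parts = preg_split('/(<(?:[^"\'>]|"[^"<]*"|\'[^\'<]*\')*>)/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

// Replace only in the text parts (even indices).
for ($i = 0, $n = count($parts); $i < $n; $i += 2) {
    $parts[$i] = preg_replace('/(markdown)/', '<strong>$1</strong>', $parts[$i]);
}

$str = implode('', $parts);
echo $str;
```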
But note that this is really not the best solution. You would be better off using a proper HTML parser such as PHP's DOM library. See, for example, these related questions:
Highlight keywords in a paragraph
Regex / DOMDocument - match and replace text not in a link
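A minimal sketch of the DOM approach, assuming an ASCII/Latin-1 fragment (loadHTML's default without a meta charset); highlightWord() is a hypothetical helper name. It wraps the word only in text nodes, so attribute values such as href are never touched. DOMDocument re-serializes the markup, so unquoted attributes come back quoted.

```php
<?php
// Hypothetical helper: wraps $word in <strong> inside text nodes only.
function highlightWord(string $html, string $word): string
{
    $doc = new DOMDocument();
    // Wrap the fragment so its nodes can be collected again afterwards.
    $doc->loadHTML('<div>' . $html . '</div>');
    $root = $doc->getElementsByTagName('div')->item(0);

    $xpath = new DOMXPath($doc);
    // Snapshot the text nodes first, since nodes are replaced in the loop.
    $textNodes = iterator_to_array($xpath->query('.//text()', $root));

    foreach ($textNodes as $node) {
        $pieces = explode($word, $node->nodeValue);
        if (count($pieces) < 2) {
            continue; // word not in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($pieces as $k => $piece) {
            if ($k > 0) {
                $strong = $doc->createElement('strong');
                $strong->appendChild($doc->createTextNode($word));
                $frag->appendChild($strong);
            }
            $frag->appendChild($doc->createTextNode($piece));
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the children of the wrapper div.
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = highlightWord('simplified markdown rules: <a href=markdown.html>[see here]</a>', 'markdown');
echo $result;
```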
First, wrap the string wherever it follows a tag's closing >; to make this work for text at the very start, prepend a dummy <null> tag:
$t=preg_replace("|(>[^<]*)(markdown)|i",'$1<strong>$2</strong>',"<null>$t");
Then delete the dummy tag:
$show=preg_replace("|<null>|",'',$t);
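A runnable sketch of the dummy-tag trick on an assumed shortened sample. One limitation worth knowing: because [^<]* is greedy, only the last occurrence in each run of text between tags gets wrapped.

```php
<?php
$t = 'simplified markdown rules: <a href=markdown.html>[see here]</a>'; // assumed input

// Prepend a dummy tag so text at the very start also follows a ">".
$t = preg_replace('|(>[^<]*)(markdown)|i', '$1<strong>$2</strong>', "<null>$t");
// Then remove the dummy tag again.
$show = preg_replace('|<null>|', '', $t);

echo $show;
```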
You could split your string into an array at every '<' or '>' using preg_split(), then loop through that array and replace only in entries that do not begin with '>'. Afterwards, combine the array back into a string using implode().
This regex should strip all HTML opening and closing tags: /(<.*?>)+/
You can use it with preg_replace like this:
$test = "Hello <strong>World!</strong>";
$regex = "/(<.*?>)+/";
$result = preg_replace($regex,"",$test);
Actually, this is not very efficient, but it worked for me:
$your_string = '...';
$search = 'markdown';
$left = '<strong>';
$right = '</strong>';
$left_Q = preg_quote($left, '#');
$right_Q = preg_quote($right, '#');
$search_Q = preg_quote($search, '#');
while(preg_match('#(>|^)[^<]*(?<!'.$left_Q.')'.$search_Q.'(?!'.$right_Q.')[^>]*(<|$)#isU', $your_string))
$your_string = preg_replace('#(^[^<]*|>[^<]*)(?<!'.$left_Q.')('.$search_Q.')(?!'.$right_Q.')([^>]*<|[^>]*$)#isU', '${1}'.$left.'${2}'.$right.'${3}', $your_string);
echo $your_string;
