Replace all images in HTML with text - php

I am trying to replace all images in some HTML that meet specific requirements with the appropriate text. The requirements are that the image has the class "replaceMe" and that its src filename is a key in $myArray. From searching for solutions, it appears that some sort of PHP DOM technique is appropriate, but I am very new to this. For instance, given $html, I wish to return $desiredHtml. At the bottom of this post is my attempted implementation, which currently doesn't work. Thank you.
$myArray=array(
'goodImgage1'=>'Replacement for Good Image 1',
'goodImgage2'=>'Replacement for Good Image 2'
);
$html = '<div>
<p>Random text and an <img src="goodImgage1.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="dontReplaceMe">. More random text.</p>
<p>Random text and an <img src="badImgage1.png" alt="" class="replaceMe">. More random text.</p>
</div>';
$desiredHtml = '<div>
<p>Random text and an Replacement for Good Image 1. More random text.</p>
<p>Random text and an Replacement for Good Image 2. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="dontReplaceMe">. More random text.</p>
<p>Random text and an <img src="badImgage1.png" alt="" class="replaceMe">. More random text.</p>
</div>';
Below is what I am attempting to do:
libxml_use_internal_errors(true); //Temporarily disable errors resulting from improperly formed HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
//What does this do for me?
$imgs= $doc->getElementsByTagName('img');
foreach ($imgs as $img){}
$xpath = new DOMXPath($doc);
foreach( $xpath->query( '//img') as $img) {
if(true){ //How do I check class and image name?
$new = $doc->createTextNode("New Attribute");
$img->parentNode->replaceChild($new,$img);
}
}
$html=$doc->saveHTML();
libxml_use_internal_errors(false);

You were on the right track. Do it like this:
$myArray=array(
'goodImgage1.png'=>'Replacement for Good Image 1',
'goodImgage2.png'=>'Replacement for Good Image 2'
);
$html = '<div>
<p>Random text and an <img src="goodImgage1.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="dontReplaceMe">. More random text.</p>
<p>Random text and an <img src="badImgage1.png" alt="" class="replaceMe">. More random text.</p>
</div>';
$classesToReplace = array('replaceMe');
libxml_use_internal_errors(true); //Temporarily disable errors resulting from improperly formed HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach( $xpath->query( '//img') as $img) {
// get the classes into an array
$classes = explode(' ', $img->getAttribute('class')); // this will contain the classes assigned to the element
$classMatches = array_intersect($classes, $classesToReplace);
// preprocess the image name to match the $myArray keys
$imageName = $img->getAttribute('src');
if (isset($myArray[$imageName]) && $classMatches) {
$new = $doc->createTextNode($myArray[$imageName]);
$img->parentNode->replaceChild($new,$img);
}
}
var_dump($html = $doc->saveHTML()); // var_dump() prints directly and returns nothing, so no echo is needed
Please note the following:
I made the code check for images that have the replaceMe class, potentially in addition to other classes.
I added the full image file names to your $myArray keys, basically for simplicity.

likeitlikeit was faster. I'll post my answer, though, because it has some differences in detail, e.g. xpath doing the job of selecting only <img> with the appropriate class attribute, use of pathinfo to get filename without extension.
$doc = new DOMDocument();
$doc->loadHTML($h); // assume HTML in $h
$xpath = new DOMXPath($doc);
$imgs = $xpath->query("//img[@class = 'replaceMe']");
foreach ($imgs as $img) {
$imgfile = pathinfo($img->getAttribute("src"),PATHINFO_FILENAME);
if (array_key_exists($imgfile, $myArray)) {
$replacement = $doc->createTextNode($myArray[$imgfile]);
$img->parentNode->replaceChild($replacement, $img);
}
}
echo "<pre>" . htmlentities($doc->saveHTML()) . "</pre>";
see it working: http://codepad.viper-7.com/11XZt7
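The two answers above can be combined. The following is only a sketch, not either author's code: the helper name replaceImagesWithText and the shortened sample HTML are made up for illustration, and it assumes the class name contains no quotes or spaces. It lets XPath select images whose class list contains the class as a whole word (so class="replaceMe wide" also matches), and uses pathinfo() to compare file names without their extension.

```php
<?php
// Hypothetical helper, not part of either answer above.
function replaceImagesWithText(string $html, array $replacements, string $class): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings from imperfect HTML
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);        // restore the earlier setting

    $xpath = new DOMXPath($doc);
    // Pad the class attribute and the wanted class with spaces so that
    // only a whole class name matches (assumes $class has no quotes).
    $query = sprintf(
        '//img[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    foreach ($xpath->query($query) as $img) {
        // File name without extension, matching the original $myArray keys.
        $name = pathinfo($img->getAttribute('src'), PATHINFO_FILENAME);
        if (array_key_exists($name, $replacements)) {
            $text = $doc->createTextNode($replacements[$name]);
            $img->parentNode->replaceChild($text, $img);
        }
    }
    return $doc->saveHTML();
}

$myArray = array(
    'goodImgage1' => 'Replacement for Good Image 1',
    'goodImgage2' => 'Replacement for Good Image 2',
);
// Shortened sample; the first image carries an extra class on purpose.
$html = '<div>'
    . '<p>An <img src="goodImgage1.png" alt="" class="replaceMe wide">.</p>'
    . '<p>An <img src="goodImgage2.png" alt="" class="dontReplaceMe">.</p>'
    . '<p>An <img src="badImgage1.png" alt="" class="replaceMe">.</p>'
    . '</div>';

$result = replaceImagesWithText($html, $myArray, 'replaceMe');
echo $result;
```

Only the first image is replaced here: the second has the wrong class, and the third has no entry in $myArray.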

Related

Add specific class to img tags if missing

I am using Nette PHP (the framework shouldn't matter), and I'm trying to replace parts of an HTML string: if an image tag has class=", that part will be replaced with class="image-responsive (followed by the existing classes), and if not, the tag will get a new attribute class="image-responsive".
I'm getting that HTML as a string, which will be saved in the database.
This is my current code. It can find the strings, but what I need help with is replacing parts of the html.
public static function ImageAddClass($string)
{
// Match Img with class="$1 (group 1 here)"
$regex_img = '/(<img)([^>]*[^>]*)(\/>)/mi';
$regex_imgClass = '/(<img[^>]* )(class=\")([^\"]*\"[^>]*>)/mi';
$html = $string;
if (preg_match_all($regex_img, $html, $matches)) {
for ($x = 0; $x < count($matches[0]); $x++) {
bdump($matches[0]);
bdump($matches[0][$x]);
bdump($x);
if (preg_match($regex_imgClass, $matches[0][$x])) {
$html = preg_replace($regex_imgClass, '$1class="image-responsiveO $3', $html);
} else if (preg_match($regex_img, $matches[0][$x])) {
$html = preg_replace($regex_img, '$1 class="image-responsiveN" $2$3', $html);
}
}
return $html;
}
}
Covering all scenarios where an img tag might have no class attribute, an orphaned class attribute, a blank class attribute, a class attribute with one or more other words, and a class attribute that already contains image-responsive -- I prefer to use XPath to filter the elements.
Not only is parsing HTML with a legitimate DOM parser like DOMDocument more robust/reliable than regex, the accompanying XPath syntax is highly intuitive.
Pay close attention to how the XPath query pads the haystack class and the needle class with spaces as a means to ensure whole word matching.
Any images that are iterated will have the desired value added to the element's class attribute.
Code: (Demo)
$html = <<<HTML
<div>
<img src="">
<img src="" class>
<img src="" class="image-responsive">
<img src="" class="">
<img src="image-responsive" class="classy">
<img src="" class="image-responsiveness">
<span class='NOT-responsive'></span>
<img src="" class = "foo image-responsive">
</div>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img[not(contains(concat(" ", @class, " ")," image-responsive "))]') as $img) {
$img->setAttribute('class', ltrim($img->getAttribute('class') . ' image-responsive'));
}
echo $dom->saveHTML();
Output:
<div>
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="image-responsive" class="classy image-responsive">
<img src="" class="image-responsiveness image-responsive">
<span class="NOT-responsive"></span>
<img src="" class="foo image-responsive">
</div>
Related content:
Replace empty alt in wordpress post content with filter
Xpath syntax for "and not contains"
Parsing HTML with PHP To Add Class Names
How can I match on an attribute that contains a certain string?
As a slight variation, you can access all img tags without XPath, then use preg_match() calls to determine which tags should receive the new class. The word boundary character \b is not useful in this case because class names may contain non-word characters.
Code: (Demo)
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('img') as $img) {
$class = $img->getAttribute('class');
if (!preg_match('/(?:^| )image-responsive(?: |$)/', $class)) {
$img->setAttribute('class', ltrim("$class image-responsive"));
}
}
echo $dom->saveHTML();
// same output as first snippet
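To illustrate the remark above about \b, here is a small sketch. The class name image-responsive-lg is a made-up example, not taken from the question. Because a hyphen is a non-word character, \b sees a boundary right after "responsive" and reports a match, while the space-delimited pattern does not.

```php
<?php
// Hypothetical class value: a different class that merely starts with the needle.
$class = 'image-responsive-lg';

// \b treats the hyphen as a word boundary, so this is a false positive.
$withBoundary = preg_match('/\bimage-responsive\b/', $class);

// Requiring a space or the string edge on both sides matches whole class names only.
$withSpaces = preg_match('/(?:^| )image-responsive(?: |$)/', $class);

var_dump($withBoundary); // int(1)
var_dump($withSpaces);   // int(0)
```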

preg_match() get the source link of image using regex

I want to get the image link from the html content with preg_match() function.
I tried like this but not getting the correct source link.
$data = '<div class="poster">
<div class="pic">
<img class="xfieldimage img" src="https://bobtor.com/uploads/posts/2019-01/1546950927_mv5bnji5yta2mtetztmzny00odc5lwfimzctnme2owqwnwnkywm1xkeyxkfqcgdeqxvyntm3mdmymdq._v1_-1.jpg" alt="Song of Back and Neck 2018" title="Song of Back and Neck 2018">
</div>
</div>';
preg_match("'<img class=\"xfieldimage img\" src=\"(.*?)\" alt=\"(.*?)\" title=\"(.*?)\" />'si", $data, $movie_poster);
print_r($movie_poster);
It's not working.
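Your pattern expects a self-closing " />" at the end of the tag, which the img tag in $data does not have. Rather than a regular expression, use a DOM parser: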
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$image = $xpath->query("//img[@class='xfieldimage img']")->item(0);
echo $image->getAttribute("src");
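A slightly more defensive variant of the snippet above might look like the following sketch. The example.com URL and the shortened markup are placeholders, not the asker's data. It silences libxml warnings while parsing and checks that a matching image exists, since item(0) returns null when the query finds nothing.

```php
<?php
// Placeholder markup with the same structure as in the question.
$data = '<div class="poster"><div class="pic">'
    . '<img class="xfieldimage img" src="https://example.com/poster.jpg" alt="Poster" title="Poster">'
    . '</div></div>';

$previous = libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($data);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
$image = $xpath->query("//img[@class='xfieldimage img']")->item(0);

// item(0) is null when no element matched, so guard before reading the attribute.
$src = $image !== null ? $image->getAttribute('src') : null;
echo $src ?? 'no matching image';
```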

Replace image inside HTML if we only know the image source

I have an HTML with a number of images inside it. Suppose I have one url which is of one of the images inside the HTML content. What if I have to replace the image inside HTML with some custom text in PHP?
<div>
<p>Some text<img src="a.jpg" class="testclass" alt="image" title="image"/></p>
<p>Some more text<img src="b.jpg" class="testclass2" alt="image2" title="image2"/></p>
</div>
And suppose I have to replace <img src="a.jpg" class="testclass" alt="image" title="image"/> with some custom text but the only information I have is the image URL i.e "a.jpg". How to do it in PHP?
Using regular expressions for this is not the ideal solution. Such expressions can become very complicated to deal with quotes, white space, attribute order, scripts,... etc in HTML.
The preferred method is to use a DOM parser, which PHP offers out-of-the-box.
Here is some code you could use to get what you want:
// main function: pass it the DOM, image URL and replacement text
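function DOMreplaceImagesByText($dom, $img_src, $text) {
    // copy the live node list to an array first, so that
    // replacing nodes does not disturb the iteration
    foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
        // compare against the URL that was passed in, not a hard-coded one
        if ($img->getAttribute("src") == $img_src) {
            $span = $dom->createElement("span", $text);
            $img->parentNode->replaceChild($span, $img);
        }
    }
}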
// utility function to get innerHTML of an element
function DOMinnerHTML($element) {
$innerHTML = "";
foreach ($element->childNodes as $child) {
$innerHTML .= $element->ownerDocument->saveHTML($child);
}
return $innerHTML;
}
// test data
$html = '<div>
<p>Some text<img src="a.jpg" class="testclass" alt="image" title="image"/></p>
<p>Some more text<img src="b.jpg" class="testclass2" alt="image2" title="image2"/></p>
</div>';
// create DOM for given HTML
$dom = new DOMDocument();
$dom->loadHTML($html);
// call our function to make the replacement(s)
DOMreplaceImagesByText($dom, "a.jpg", "custom text");
// convert back to HTML
$html = DOMinnerHTML($dom->getElementsByTagName('body')->item(0));
// show result (for demo only, in reality you would not use htmlentities)
echo htmlentities($html);
The above code will output:
<div>
<p>Some text<span>custom text</span></p>
<p>Some more text<img src="b.jpg" class="testclass2" alt="image2" title="image2"></p>
</div>
Regular Expression
As stated above, regular expressions are not well-suited for this job, but I will provide one just for completeness' sake:
function HTMLreplaceImagesByText($html, $img_src, $text) {
// escape special characters in $img_src so they work as
// literals in the main regular expression
$img_src = preg_replace("/(\W)/", "\\\\$1", $img_src);
// main regular expression:
return preg_replace("/<img[^>]*?\ssrc\s*=\s*[\'\"]" . $img_src
. "[\'\"].*?>/si", "<span>$text</span>", $html);
}
$html = '<div>
<p>Some text<img src="a.jpg" class="testclass" alt="image" title="image"/></p>
<p>Some more text<img src="b.jpg" class="testclass2" alt="image2" title="image2"/></p>
</div>';
$html = HTMLreplaceImagesByText($html, "a.jpg", "custom text");
echo htmlentities($html);
The output will be the same as with the DOM parsing solution. But it will fail in many specific situations, where the DOM solution will not have any problem. For instance, if a matching image tag appears in a comment or as a string within a script tag, it will make the replacement, while it shouldn't. Worse, when the matching image tag has a greater-than sign in an attribute value, the replacement will produce wrong results.
There are many other instances where it will go wrong.
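The greater-than failure mentioned above can be reproduced with a short sketch. The sample tag with alt="1 > 0" is made up for illustration, and the pattern is rebuilt with preg_quote() rather than copied character for character from the function above.

```php
<?php
// Made-up sample: the alt attribute contains a greater-than sign.
$html = '<p><img src="a.jpg" alt="1 > 0"></p>';
$imgSrc = 'a.jpg';

// Regex approach: the lazy .*? stops at the first ">", which is inside alt.
$pattern = '/<img[^>]*?\ssrc\s*=\s*[\'"]' . preg_quote($imgSrc, '/') . '[\'"].*?>/si';
$viaRegex = preg_replace($pattern, '<span>custom text</span>', $html);
echo $viaRegex, "\n"; // <p><span>custom text</span> 0"></p>

// DOM approach: the parser knows where the attribute value ends.
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach (iterator_to_array($dom->getElementsByTagName('img')) as $img) {
    if ($img->getAttribute('src') === $imgSrc) {
        $img->parentNode->replaceChild($dom->createElement('span', 'custom text'), $img);
    }
}
// Serialize only the paragraph to keep the output short.
$viaDom = $dom->saveHTML($dom->getElementsByTagName('p')->item(0));
echo $viaDom, "\n";
```

The regex result keeps a stray 0"> fragment from the attribute value, while the DOM result contains only the span.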

RegEx for linked images of certain class

I don't have access to an HTML parser on my server, so I need to do this via RegEx and PHP. I want to match all occurrences of linked images of a certain class within a large content string.
Here's a sample taken out of the larger content string that I want to match:
<a href='url'><img width="150" height="150" src="url" class="attachment-thumbnail" alt="Description" /></a>
This seems to match class="attachment-thumbnail"
(class=("|"([^"]*)\s)attachment-thumbnail("|\s([^"]*)"))
This seems to match everything from the opening HREF to the closing HREF, but it also gets other images in the larger content string that don't have class="attachment-thumbnail"
/(<a[^>]*)(href=)([^>]*?)(><img[^>]*></a>)/igm
How can I combine the above two to match only those HREFed images of class="attachment-thumbnail"?
Thanks for your help.
Try something like the following:
$html = '<img width="150" height="150" src="url" class="attachment-thumbnail" alt="Description" />';
$doc = new DOMDocument();
$doc->loadHTML($html);
foreach($doc->getElementsByTagName('img') as $item)
{
if ($item->getAttribute('class') == 'attachment-thumbnail')
{
echo $item->getAttribute('src');
}
}
To remove all elements that match the class 'attachment-thumbnail':
$html = '<img width="150" height="150" src="url" class="attachment-thumbnail" alt="Description" />';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//img[contains(attribute::class,"attachment-thumbnail")]') as $elem)
{
$elem->parentNode->removeChild($elem);
}
echo $dom->saveHTML($dom->documentElement);
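The question asks specifically for linked images, that is, images wrapped in an anchor. As a sketch only, with made-up file names and sample markup, XPath can express both conditions at once: the image must be a child of an a element with an href, and its class list must contain attachment-thumbnail as a whole word.

```php
<?php
// Made-up sample: one linked thumbnail, one linked image of another class,
// and one thumbnail that is not linked.
$html = '<p>'
    . '<a href="full-1.jpg"><img src="thumb-1.jpg" class="attachment-thumbnail" alt="One"></a>'
    . '<a href="full-2.jpg"><img src="thumb-2.jpg" class="size-large" alt="Two"></a>'
    . '<img src="thumb-3.jpg" class="attachment-thumbnail" alt="Three">'
    . '</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// img elements that are direct children of a linked anchor and carry the class.
$query = '//a[@href]/img[contains(concat(" ", normalize-space(@class), " "), " attachment-thumbnail ")]';

$links = array();
foreach ($xpath->query($query) as $img) {
    // parentNode is the enclosing <a>, so its href is the link target.
    $links[] = $img->parentNode->getAttribute('href') . ' -> ' . $img->getAttribute('src');
}
print_r($links); // one entry: full-1.jpg -> thumb-1.jpg
```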

preg_match move selection above paragraph

I'm wanting to move images above their container paragraphs in a large body of text using preg_replace.
So, I might have
$body = '<p><img src="a" alt="image"></p><img src="b" alt="image"><p>something here<img src="c" alt="image"> text</p>';
What I want (apart from the 40' yacht etc etc);
<img src="a" alt="image"><p></p><img src="b" alt="image"><img src="c" alt="image"><p>something here text</p>
I've got this, which aint working,
$body = preg_replace('/(<p>.*\s*)(<img.*\s*?image">)(.*\s*?<\/p>)/', '$2$1$3',$body);
It results in;
<img src="c" alt="image"><p><img src="a" alt="image"></p><img src="b" alt="image"><p>something here text</p>
You should load the HTML with DOMDocument and use its operations to move nodes around:
$content = <<<EOM
<p><img src="a" alt="image"></p>
<img src="b" alt="image"><p>something here<img src="c" alt="image"> text</p>
EOM;
$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);
// find images that are a direct child of a paragraph
foreach ($xp->query('//p/img') as $img) {
$parent = $img->parentNode;
// move image as a previous sibling of its parent
$parent->parentNode->insertBefore($img, $parent);
}
echo $doc->saveHTML();
