preg_match move selection above paragraph - php

I want to move images above their containing paragraphs in a large body of text using preg_replace.
So, I might have:
$body = '<p><img src="a" alt="image"></p><img src="b" alt="image"><p>something here<img src="c" alt="image"> text</p>';
What I want (apart from the 40' yacht, etc.):
<img src="a" alt="image"><p></p><img src="b" alt="image"><img src="c" alt="image"><p>something here text</p>
I've got this, which isn't working:
$body = preg_replace('/(<p>.*\s*)(<img.*\s*?image">)(.*\s*?<\/p>)/', '$2$1$3',$body);
It results in:
<img src="c" alt="image"><p><img src="a" alt="image"></p><img src="b" alt="image"><p>something here text</p>

You should load the HTML with DOMDocument and use its operations to move nodes around:
$content = <<<EOM
<p><img src="a" alt="image"></p>
<img src="b" alt="image"><p>something here<img src="c" alt="image"> text</p>
EOM;
$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);
// find images that are a direct descendant of a paragraph
foreach ($xp->query('//p/img') as $img) {
    $parent = $img->parentNode;
    // move image as a previous sibling of its parent
    $parent->parentNode->insertBefore($img, $parent);
}
echo $doc->saveHTML();
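Note that loadHTML() wraps the fragment in a doctype plus html and body elements, so saveHTML() prints more than the original markup. Below is a minimal sketch, assuming you want only the fragment back. The helper name moveImagesAboveParagraphs() is hypothetical (it is not part of the answer above), and it serializes just the children of body:

```php
<?php
// Hypothetical helper: moves every <img> that is a direct child of a <p>
// in front of that <p>, then returns only the fragment (no doctype/html/body).
function moveImagesAboveParagraphs(string $html): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xp = new DOMXPath($doc);
    foreach ($xp->query('//p/img') as $img) {
        $parent = $img->parentNode;
        $parent->parentNode->insertBefore($img, $parent);
    }

    // Serialize each child of <body> instead of the whole document.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$body = '<p><img src="a" alt="image"></p><img src="b" alt="image"><p>something here<img src="c" alt="image"> text</p>';
$result = moveImagesAboveParagraphs($body);
echo $result, PHP_EOL;
```

Serializing the body children avoids LIBXML_HTML_NOIMPLIED here, because that flag expects a single root element and this fragment has several top-level nodes.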

Related

Add specific class to img tags if missing

I am using Nette PHP (the framework shouldn't matter), and I'm trying to replace parts of an HTML string: if an image tag already has a class attribute, image-responsive should be added to it, and if not, the tag should get a new attribute class="image-responsive".
I'm getting that HTML as a string, which will be saved in the database.
This is my current code. It can find the strings, but what I need help with is replacing parts of the HTML.
public static function ImageAddClass($string)
{
// Match Img with class="$1 (group 1 here)"
$regex_img = '/(<img)([^>]*[^>]*)(\/>)/mi';
$regex_imgClass = '/(<img[^>]* )(class=\")([^\"]*\"[^>]*>)/mi';
$html = $string;
if (preg_match_all($regex_img, $html, $matches)) {
for ($x = 0; $x < count($matches[0]); $x++) {
bdump($matches[0]);
bdump($matches[0][$x]);
bdump($x);
if (preg_match($regex_imgClass, $matches[0][$x])) {
$html = preg_replace($regex_imgClass, '$1class="image-responsiveO $3', $html);
} else if (preg_match($regex_img, $matches[0][$x])) {
$html = preg_replace($regex_img, '$1 class="image-responsiveN" $2$3', $html);
}
}
return $html;
}
}
Covering all scenarios where an img tag might have no class attribute, an orphaned class attribute, a blank class attribute, a class attribute with one or more other words, and a class attribute that already contains image-responsive -- I prefer to use XPath to filter the elements.
Not only is parsing HTML with a legitimate DOM parser like DOMDocument more robust/reliable than regex, the accompanying XPath syntax is highly intuitive.
Pay close attention to how the XPath query pads the haystack class and the needle class with spaces as a means to ensure whole word matching.
Any images that are iterated will have the desired value added to the element's class attribute.
Code: (Demo)
$html = <<<HTML
<div>
<img src="">
<img src="" class>
<img src="" class="image-responsive">
<img src="" class="">
<img src="image-responsive" class="classy">
<img src="" class="image-responsiveness">
<span class='NOT-responsive'></span>
<img src="" class = "foo image-responsive">
</div>
HTML;
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//img[not(contains(concat(" ", @class, " ")," image-responsive "))]') as $img) {
    $img->setAttribute('class', ltrim($img->getAttribute('class') . ' image-responsive'));
}
echo $dom->saveHTML();
Output:
<div>
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="" class="image-responsive">
<img src="image-responsive" class="classy image-responsive">
<img src="" class="image-responsiveness image-responsive">
<span class="NOT-responsive"></span>
<img src="" class="foo image-responsive">
</div>
Related content:
Replace empty alt in wordpress post content with filter
Xpath syntax for "and not contains"
Parsing HTML with PHP To Add Class Names
How can I match on an attribute that contains a certain string?
As a slight variation, you can access all img tags without XPath, then use preg_match() calls to determine which tags should receive the new class. The word boundary character \b is not useful in this case because class names may contain non-word characters.
Code: (Demo)
$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
foreach ($dom->getElementsByTagName('img') as $img) {
    $class = $img->getAttribute('class');
    if (!preg_match('/(?:^| )image-responsive(?: |$)/', $class)) {
        $img->setAttribute('class', ltrim("$class image-responsive"));
    }
}
echo $dom->saveHTML();
// same output as first snippet
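To see why \b falls short, here is a small sketch. The class value image-responsive-lg is a made-up example; the hyphen is a non-word character, so \b reports a boundary right after image-responsive and the pattern matches a class it should not:

```php
<?php
// Hypothetical class value that merely starts with the class name we are looking for.
$class = 'image-responsive-lg';

// \b sees a boundary between "e" and "-", so this is a false positive.
$withWordBoundary = preg_match('/\bimage-responsive\b/', $class);

// Requiring a space or the start/end of the string avoids the false positive.
$withSpaceAnchors = preg_match('/(?:^| )image-responsive(?: |$)/', $class);

var_dump($withWordBoundary); // int(1)
var_dump($withSpaceAnchors); // int(0)
```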

PHP dom parser: How to get element count only if it comes after another element?

I'm trying to get a count of how many images are on an HTML page sprinkled throughout an article but I do not want to count the image if it comes before the text of the article begins. The problem is the classes are exactly the same, so I can't use that to help me, and not every article is even going to start with an image. So the HTML might look like this:
<img class="image-asset" src="image.jpg">
<p>First line</p>
<p>Second line</p>
<img class="image-asset" src="second_image.jpg">
<p>Third line</p>
<img class="image-asset" src="third_image.jpg">
In this instance, I want to only count the second and third images. Here's my code, which is successfully counting every image at the moment:
$photoCount = count($html->find('div.image-asset'));
I believe you are looking for something along these lines:
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$target = $xpath->query('//img[preceding-sibling::p]');
echo count($target), PHP_EOL;
// and just to be on the safe side, print the matched nodes:
foreach ($target as $t) {
    echo $t->ownerDocument->saveHTML($t), PHP_EOL;
}
Output:
2
<img class="image-asset" src="second_image.jpg">
<img class="image-asset" src="third_image.jpg">
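One caveat: preceding-sibling::p only looks at siblings, so an image nested inside a wrapper element would be missed. If that can happen in your articles (an assumption, since the sample markup is flat), the preceding axis may be a better fit, as it considers every earlier node in document order apart from ancestors. A small sketch with made-up markup:

```php
<?php
// Assumed markup: the second image is wrapped in a <div>, so it has no <p> sibling.
$html = '<img class="image-asset" src="image.jpg">'
      . '<p>First line</p>'
      . '<div><img class="image-asset" src="second_image.jpg"></div>'
      . '<p>Third line</p>'
      . '<img class="image-asset" src="third_image.jpg">';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Only images that share a parent with an earlier <p>.
$siblingsOnly = $xpath->query('//img[preceding-sibling::p]');
// Images that come after any <p>, regardless of nesting.
$anywhereBefore = $xpath->query('//img[preceding::p]');

echo $siblingsOnly->length, PHP_EOL;   // 1
echo $anywhereBefore->length, PHP_EOL; // 2
```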

preg_match() get the source link of image using regex

I want to get the image link from the HTML content with the preg_match() function.
I tried it like this, but I'm not getting the correct source link.
$data = '<div class="poster">
<div class="pic">
<img class="xfieldimage img" src="https://bobtor.com/uploads/posts/2019-01/1546950927_mv5bnji5yta2mtetztmzny00odc5lwfimzctnme2owqwnwnkywm1xkeyxkfqcgdeqxvyntm3mdmymdq._v1_-1.jpg" alt="Song of Back and Neck 2018" title="Song of Back and Neck 2018">
</div>
</div>';
preg_match("'<img class=\"xfieldimage img\" src=\"(.*?)\" alt=\"(.*?)\" title=\"(.*?)\" />'si", $data, $movie_poster);
print_r($movie_poster);
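It's not working.
Obligatory link to the "RegEx match open tags except XHTML self-contained tags" meme. Your pattern also requires the tag to end in " />", while the tag in $data ends with a plain ">". Use a DOM parser instead: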
$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$image = $xpath->query("//img[@class='xfieldimage img']")->item(0);
echo $image->getAttribute("src");
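The snippet above assumes the image always exists and that its class attribute is exactly "xfieldimage img". A slightly more defensive sketch follows. The shortened example.com URL and the choice to match on the single class xfieldimage are assumptions for illustration:

```php
<?php
// Shortened stand-in for the markup in the question (the URL is a placeholder).
$data = '<div class="poster"><div class="pic">'
      . '<img class="xfieldimage img" src="https://example.com/poster.jpg" alt="Poster" title="Poster">'
      . '</div></div>';

$previous = libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
$dom = new DOMDocument();
$dom->loadHTML($data);
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath = new DOMXPath($dom);
// Match "xfieldimage" as a whole class name, whatever other classes are present.
$query = '//img[contains(concat(" ", normalize-space(@class), " "), " xfieldimage ")]';
$image = $xpath->query($query)->item(0);

// item(0) returns null when nothing matched, so guard before reading the attribute.
$src = $image instanceof DOMElement ? $image->getAttribute('src') : null;
echo $src ?? 'no matching image', PHP_EOL;
```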

Not able to get image src using regex

I am using the regex below to wrap an <a> element around an image tag, but it's not working. I took this code from "Add link around img tags with regexp".
preg_replace('#(<img[^>]+ src="([^"]*)" alt="[^"]*" />)#', '<a href="$2" ...>$1</a>', $str)
However, if I use the code below, without src, it works.
preg_replace('#(<img[^>]+ alt="[^"]*" />)#', '<a href="" ...>$1</a>', $str)
Is there any reason why I am not able to get the src from the image tag?
My image tag is <img src="" alt="">
A better way to do something like this is to use PHP's DOMDocument class as it is independent of how people write their HTML (e.g. putting the alt attribute before the src attribute). Something like this would work for your case:
$html = '<div id="x"><img src="/images/xyz" alt="xyz" /><p>hello world!</p></div>';
$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);
$images = $xpath->query('//img');
foreach ($images as $img) {
    // create a new anchor element
    $a = $doc->createElement('a');
    // copy the img src attribute to the a href attribute
    $a->setAttribute('href', $img->attributes->getNamedItem('src')->nodeValue);
    // put the <a> in the image's place within its parent
    $img->parentNode->replaceChild($a, $img);
    // make the image a child of the <a> element
    $a->appendChild($img);
}
echo $doc->saveHTML();
Output:
<div id="x"><a href="/images/xyz"><img src="/images/xyz" alt="xyz"></a><p>hello world!</p></div>
Demo on 3v4l.org

Replace all images in HTML with text

I am trying to replace all images in some HTML which meet specific requirements with the appropriate text. The specific requirements are that they are of class "replaceMe" and the image src filename is in $myArray. Upon searching for solutions, it appears that some sort of PHP DOM technique is appropriate, however, I am very new with this. For instance, given $html, I wish to return $desired_html. At the bottom of this post is my attempted implementation which currently doesn't work. Thank you
$myArray=array(
'goodImgage1'=>'Replacement for Good Image 1',
'goodImgage2'=>'Replacement for Good Image 2'
);
$html = '<div>
<p>Random text and an <img src="goodImgage1.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="dontReplaceMe">. More random text.</p>
<p>Random text and an <img src="badImgage1.png" alt="" class="replaceMe">. More random text.</p>
</div>';
$desiredHtml = '<div>
<p>Random text and an Replacement for Good Image 1. More random text.</p>
<p>Random text and an Replacement for Good Image 2. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="dontReplaceMe">. More random text.</p>
<p>Random text and an <img src="badImgage1.png" alt="" class="replaceMe">. More random text.</p>
</div>';
Below is what I am attempting to do..
libxml_use_internal_errors(true); // Temporarily disable errors resulting from improperly formed HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
//What does this do for me?
$imgs= $doc->getElementsByTagName('img');
foreach ($imgs as $img){}
$xpath = new DOMXPath($doc);
foreach( $xpath->query( '//img') as $img) {
if(true){ //How do I check class and image name?
$new = $doc->createTextNode("New Attribute");
$img->parentNode->replaceChild($new,$img);
}
}
$html=$doc->saveHTML();
libxml_use_internal_errors(false);
Do it like this; you were on the right track:
$myArray=array(
'goodImgage1.png'=>'Replacement for Good Image 1',
'goodImgage2.png'=>'Replacement for Good Image 2'
);
$html = '<div>
<p>Random text and an <img src="goodImgage1.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="replaceMe">. More random text.</p>
<p>Random text and an <img src="goodImgage2.png" alt="" class="dontReplaceMe">. More random text.</p>
<p>Random text and an <img src="badImgage1.png" alt="" class="replaceMe">. More random text.</p>
</div>';
$classesToReplace = array('replaceMe');
libxml_use_internal_errors(true); // Temporarily disable errors resulting from improperly formed HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach( $xpath->query( '//img') as $img) {
// get the classes into an array
$classes = explode(' ', $img->getAttribute('class')); // this will contain the classes assigned to the element
$classMatches = array_intersect($classes, $classesToReplace);
// preprocess the image name to match the $myArray keys
$imageName = $img->getAttribute('src');
if (isset($myArray[$imageName]) && $classMatches) {
$new = $doc->createTextNode($myArray[$imageName]);
$img->parentNode->replaceChild($new,$img);
}
}
var_dump($html = $doc->saveHTML());
Please note the following:
I made the code check for images that have the replaceMe class, potentially in addition to other classes
I added the full image file names to your $myArray keys, basically for simplicity.
likeitlikeit was faster. I'll post my answer, though, because it has some differences in detail, e.g. xpath doing the job of selecting only <img> with the appropriate class attribute, use of pathinfo to get filename without extension.
$doc = new DOMDocument();
$doc->loadHTML($h); // assume HTML in $h
$xpath = new DOMXPath($doc);
$imgs = $xpath->query("//img[@class = 'replaceMe']");
foreach ($imgs as $img) {
    $imgfile = pathinfo($img->getAttribute("src"), PATHINFO_FILENAME);
    if (array_key_exists($imgfile, $myArray)) {
        $replacement = $doc->createTextNode($myArray[$imgfile]);
        $img->parentNode->replaceChild($replacement, $img);
    }
}
echo "<pre>" . htmlentities($doc->saveHTML()) . "</pre>";
see it working: http://codepad.viper-7.com/11XZt7
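Keep in mind that an exact comparison of the class attribute against 'replaceMe' only matches images whose class attribute is exactly that one word. As a sketch of how the two answers could be combined, assuming images may carry extra classes, the query below matches replaceMe as a whole class name and still uses pathinfo() for the array lookup. The shortened markup and the extra class "big" are made up for illustration:

```php
<?php
$myArray = [
    'goodImgage1' => 'Replacement for Good Image 1',
    'goodImgage2' => 'Replacement for Good Image 2',
];
// Shortened, hypothetical markup with a single root element.
$html = '<div>'
      . '<p>An <img src="goodImgage1.png" alt="" class="replaceMe big">.</p>'
      . '<p>An <img src="goodImgage2.png" alt="" class="dontReplaceMe">.</p>'
      . '<p>An <img src="badImgage1.png" alt="" class="replaceMe">.</p>'
      . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// Pad both sides with spaces so "replaceMe" only matches as a whole class name.
$query = '//img[contains(concat(" ", normalize-space(@class), " "), " replaceMe ")]';
foreach ($xpath->query($query) as $img) {
    // File name without extension, e.g. "goodImgage1".
    $name = pathinfo($img->getAttribute('src'), PATHINFO_FILENAME);
    if (isset($myArray[$name])) {
        $img->parentNode->replaceChild($doc->createTextNode($myArray[$name]), $img);
    }
}

$result = $doc->saveHTML();
echo $result;
```

Only the first image is replaced: the second has a different class, and the third has no entry in $myArray.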
