preg_replace img src, width, height stack overflow - php

This is my code:
$content = '<p><img src="http://localhost/contents/uploads/sdadaasa.jpg" width="1500" height="900"></p>';
$content = preg_replace('/<p><img.+src=[\'"]([^\'"]+)[\'"].*>/i', "<p class=\"the-image\"><img class=\"lazy-load\" src=\"$1\" width=\"\" height=\"\"/></p>", $content);
return $content;
My code adds a class to the <p> tag and the <img> tag.
Now I also want to keep the width and height from $content, because my code removes the width and height attributes.

To parse HTML it's better to use a library such as DOMDocument:
$content = '<p><img src="http://localhost/contents/uploads/sdadaasa.jpg" width="1500" height="900"></p>';
$dom = new DomDocument();
$dom->loadHTML($content);
$p = $dom->getElementsByTagName('p')->item(0);
$p->setAttribute('class', 'the-image');
$img = $p->getElementsByTagName('img')->item(0);
$img->setAttribute('class', 'lazy-load');
echo $dom->saveHTML($p);
// <p class="the-image"><img src="http://localhost/contents/uploads/sdadaasa.jpg" width="1500" height="900" class="lazy-load"></p>
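If the width and height values themselves are needed (for example, to copy them into other attributes), they can be read from the same node with getAttribute. A minimal sketch, assuming the same sample markup as above; the libxml_use_internal_errors call is only there to keep parser warnings quiet and is not part of the original answer:

```php
<?php
$content = '<p><img src="http://localhost/contents/uploads/sdadaasa.jpg" width="1500" height="900"></p>';

// Collect libxml warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($content);
libxml_clear_errors();

$img = $dom->getElementsByTagName('img')->item(0);

// getAttribute returns an empty string when the attribute is missing
$width  = $img->getAttribute('width');
$height = $img->getAttribute('height');

echo $width, 'x', $height, PHP_EOL;
// 1500x900
```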

Related

Get background image from webpage using DOM XPATH

I'm reading a webpage using PHP DOM/XPath and I've managed to get the text I need, but now I'm trying to get the src of the main image but I can't get it.
Also to complicate things, the source is different to the inspector.
Here is the source:
<div id="bg">
<img src="https://example.com/image.jpg" alt=""/>
</div>
And here is the element in the inspector:
<div class="media-player" id="media-player-0" style="width: 320px; height: 320px; background: url("https://example.com/image.jpg") center center / cover no-repeat rgb(208, 208, 208);" currentmouseover="16">
I've tried:
$img = $xpath->evaluate('substring-before(substring-after(//div[@id=\'bg\']/img, "\')")');
and
$img = $xpath->evaluate('substring-before(substring-after(//div[@class=\'media-player\']/@style, "background: url(\'"), "\')")');
but get nothing from either.
Here is my complete code:
$html = file_get_contents($externalurl);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$allChildNodesFromDiv = $xpath->query('//h1[@class="artist"]');
$releasetitle = $allChildNodesFromDiv->item(0)->textContent;
echo "</br>Title: " . $releasetitle;
$img = $xpath->evaluate('substring-before(substring-after(//div[@class=\'media-player\']/@style, "background: url(\'"), "\')")');
echo $image;
$img = $xpath->evaluate('substring-before(substring-after(//div[@id=\'bg\']/img, "\')")');
echo $image;
This is not something I would normally suggest, but the content you are after is loaded by JavaScript and sits inside a <script> tag, so it may be easy to extract with a regex. From your comment:
Ah yes, it appears in: poster :
'https://284fc2d5f6f33a52cd9f-ce476c3c56a27f320262daffab84f1af.ssl.cf3.rackcdn.com/artwork_5e74a44e1e004_CHAMPDL879D_5e74a44e4672b.jpg'
So this code looks for the value of poster : '...',.
$html = file_get_contents($externalurl);
// Capture the quoted URL that follows "poster : " in the inline script
if (preg_match("/poster : '([^']+)',/", $html, $matches)) {
    echo $matches[1];
}
This may break if the page's HTML changes, but it may work for now.
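The same idea can be tried without fetching a remote page. This is only a sketch: the $html string below is a made-up stand-in for the real page source, and the pattern is slightly loosened (optional whitespace around the colon, no trailing comma required), which is an assumption about how the script is written.

```php
<?php
// Hypothetical page source; the real page will look different
$html = "<script>var player = { poster : 'https://example.com/image.jpg', autoplay : false };</script>";

$poster = null;
// [^']+ stops at the closing quote, so the match cannot run past the URL
if (preg_match("/poster\s*:\s*'([^']+)'/", $html, $matches)) {
    $poster = $matches[1];
}

echo $poster ?? 'poster not found', PHP_EOL;
// https://example.com/image.jpg
```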

Replace empty alt tag on <img> tag

I would like to replace the empty alt attributes on images in a string. I have a string that contains all the text for a certain page. The text also contains images, and a lot of them have empty alt attributes (old data), but most of the time they do have title attributes.
For example:
<img src="assets/img/test.png" alt="" title="I'am a title tag" width="100" height="100" />
What I wish to have:
<img src="assets/img/test.png" alt="I'am a title tag" title="I'am a title tag" width="100" height="100" />
So:
I need to find all the images in my string, loop through them, find the title and alt attributes, and replace each empty alt attribute with the title attribute when that has a value.
This is what I tried:
preg_match_all('/<img[^>]+>/i',$return, $text);
if(isset($text)) {
    foreach( $text as $itemImg ) {
        foreach( $itemImg as $item ) {
            $array = array();
            preg_match( '/title="([^"]*)"/i', $item, $array );
            if(isset($array[1])) {
                //So $array[1] is a title tag, now what?
            }
        }
    }
}
I don't know how to complete the code, and I think there must be an easier fix for this. Suggestions?
Regex is not a good approach here; you should use DOMDocument to parse HTML. The query below selects only the img elements whose alt attribute is empty, which is what the question asks for.
Try this code snippet:
<?php
ini_set('display_errors', 1);
$string=<<<HTML
<img src="assets/img/test1.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test2.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test3.png" alt="" title="I'am a title tag" width="100" height="100" />
HTML;
$domDocument = new DOMDocument();
$domDocument->loadHTML($string,LIBXML_HTML_NODEFDTD);
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query('//img[@alt=""]');
foreach($results as $result)
{
    $title=$result->getAttribute("title");
    $result->setAttribute("alt",$title);
    echo $domDocument->saveHTML($result);
    echo PHP_EOL;
}
Maybe you could use JavaScript with jQuery for this kind of thing, for example:
$('img').each(function(){
    $(this).attr('alt', $(this).attr('title'));
});
Hope it helps.
Regards.
What you want here is an HTML parser library that can manipulate HTML and then save it again. By using regular expressions to modify HTML markup, you're setting yourself up for a mess.
The DOM module built into PHP offers this functionality: http://php.net/manual/en/book.dom.php
Here's an example (cribbed from this article):
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
$html = $dom->saveHTML();
You can use DOMDocument to do this. Below is some sample code for reference:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
?>
Please try the function below:
function img_title_in_alt($full_img_tag){
$doc = new DOMDocument();
$doc->loadHTML($full_img_tag);
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
if($tag->getAttribute('src')!==''){
return '<img src="'.$tag->getAttribute('src').'" width="'.$tag->getAttribute('width').'" height="'.$tag->getAttribute('height').'" alt="'.$tag->getAttribute('title').'" title="'.$tag->getAttribute('title').'" />';
}
}
}
Now call the function with the full HTML of your image tag. See the example:
$image = '<img src="assets/img/test.png" alt="" title="I\'am a title tag" width="100" height="100" />';
print img_title_in_alt($image);
Let me know if you do not understand anything.
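The DOMDocument answers above can be combined into one helper that takes the whole page text and returns it with the empty alt attributes filled in. This is a sketch, not a definitive implementation: the function name fill_empty_alt and the wrapper id fragment-root are made up for illustration, and the sample input is assumed to be plain ASCII (non-ASCII text may need an explicit encoding hint when loading).

```php
<?php
// Hypothetical helper: copies title into alt where alt is empty or missing
function fill_empty_alt(string $html): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Wrap the fragment so it can be found and serialized again afterwards
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Only images with an empty or missing alt and a non-empty title
    foreach ($xpath->query('//img[(not(@alt) or @alt="") and @title != ""]') as $img) {
        $img->setAttribute('alt', $img->getAttribute('title'));
    }

    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    if ($root === null) {
        return $html; // Parsing failed; leave the input untouched
    }

    // Serialize the children only, so the wrapper div is not returned
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$page = '<p>Text <img src="a.png" alt="" title="A title"> <img src="b.png" alt="kept" title="Other"></p>';
echo fill_empty_alt($page), PHP_EOL;
```

The first image ends up with alt="A title"; the second keeps its existing alt text because the XPath condition skips it.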

preg_replace target images within P tags

I am using preg_replace to change some content. I have two different types of images:
<p>
<img class="responsive" src="image.jpg">
</p>
<div class="caption">
<img class="responsive" src="image2.jpg">
</div>
I am using preg_replace like this to add a container div around images...
function filter_content($content)
{
    $pattern = '/(<img[^>]*class=\"([^>]*?)\"[^>]*>)/i';
    $replacement = '<div class="inner $2">$1</div>';
    $content = preg_replace($pattern, $replacement, $content);
    return $content;
}
Is there a way to modify this so that it only affects images in P tags? And also vice versa, so I can target images within a caption div?
Absolutely.
$dom = new DOMDocument();
$dom->loadHTML("<body><!-- DOMSTART -->".$content."<!-- DOMEND --></body>");
$xpath = new DOMXPath($dom);
$images = $xpath->query("//p/img");
foreach($images as $img) {
    $wrap = $dom->createElement("div");
    $wrap->setAttribute("class","inner ".$img->getAttribute("class"));
    $img->parentNode->replaceChild($wrap,$img);
    $wrap->appendChild($img);
}
$out = $dom->saveHTML();
preg_match("/<!-- DOMSTART -->(.*)<!-- DOMEND -->/s",$out,$match);
return $match[1];
It's worth noting that while parsing arbitrary HTML with regex is a disaster waiting to happen, using a parser with markers and then matching based on those markers is perfectly safe.
Adjust the XPath query and/or inner manipulation as needed.
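For the "vice versa" case, only the XPath query has to change. A sketch of the marker approach as a reusable function, assuming a hypothetical helper name wrap_images and a second parameter for the query; the class test uses the usual concat/normalize-space idiom so that "caption" is matched as a whole class name:

```php
<?php
// Hypothetical helper: wraps every image selected by $query in <div class="inner ...">
function wrap_images(string $content, string $query): string
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Marker comments let us cut the fragment back out of the full document
    $dom->loadHTML('<body><!-- DOMSTART -->' . $content . '<!-- DOMEND --></body>');
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // Copy the result to an array because the tree is modified inside the loop
    foreach (iterator_to_array($xpath->query($query)) as $img) {
        $wrap = $dom->createElement('div');
        $wrap->setAttribute('class', 'inner ' . $img->getAttribute('class'));
        $img->parentNode->replaceChild($wrap, $img);
        $wrap->appendChild($img);
    }

    preg_match('/<!-- DOMSTART -->(.*)<!-- DOMEND -->/s', $dom->saveHTML(), $match);
    return $match[1] ?? $content; // Fall back to the input if the markers are lost
}

$content = '<p><img class="responsive" src="image.jpg"></p>'
         . '<div class="caption"><img class="responsive" src="image2.jpg"></div>';

// Target only images that sit directly inside a div with the class "caption"
$captionQuery = '//div[contains(concat(" ", normalize-space(@class), " "), " caption ")]/img';
echo wrap_images($content, $captionQuery), PHP_EOL;
```

Passing '//p/img' as the query gives the behaviour of the answer above; the image inside the p tag is left alone when the caption query is used.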
Use an HTML parser instead of regex, for example DOMDocument:
$html = <<< EOF
<p>
<img class="responsive" src="image.jpg">
</p>
<div class="caption">
<img class="responsive" src="image2.jpg">
</div>
EOF;
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$images = $xpath->query('//p/img[contains(@class,"responsive")]');
$new_src_url = "some_image_name.jpg";
foreach($images as $image)
{
    $image->setAttribute('src', $new_src_url);
    // Print the updated tag
    echo $dom->saveHTML($image);
}

String replace with regex in PHP

I want to modify the contents of an html file with php.
I am applying style to img tags, and I need to check if the tag already has a style attribute, if it has, I want to replace it with my own.
$pos = strpos($theData, "src=\"".$src."\" style=");
if (!$pos){
$theData = str_replace("src=\"".$src."\"", "src=\"".$src."\" style=\"width:".$width."px\"", $theData);
}
else{
$theData = preg_replace("src=\"".$src."\" style=/\"[^\"]+\"/", "src=\"".$src."\" style=\"width: ".$width."px\"", $theData);
}
$theData is the html source code I receive.
If a style attribute has not been found, I successfully insert my own style, but I think the problem comes when there is already a style attribute defined so my regex is not working.
I want to replace the style attribute with everything inside it, with my new style attribute.
How should my regex look?
Instead of using regex for this, you should use a DOM parser.
Example using DOMDocument:
<?php
$html = '<img src="http://example.com/image.jpg" width=""/><img src="http://example.com/image.jpg"/>';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />'.$html);
$dom->formatOutput = true;
foreach ($dom->getElementsByTagName('img') as $item)
{
    //Remove the width attribute if it's there
    $item->removeAttribute('width');
    //Get the style attribute if it's there
    $style = $item->getAttribute('style');
    //Set the style, keeping any existing style after it; 123px could be your $width var
    $item->setAttribute('style','width:123px;'.$style);
}
//Remove the unwanted doctype, etc.
$ret = preg_replace('~<(?:!DOCTYPE|/?(?:html|body|head))[^>]*>\s*~i', '', $dom->saveHTML());
echo trim(str_replace('<meta http-equiv="Content-Type" content="text/html;charset=utf-8">','',$ret));
//<img src="http://example.com/image.jpg" style="width:123px;">
//<img src="http://example.com/image.jpg" style="width:123px;">
?>
Here is the regexp variant of solving this problem:
<?php
$theData = "<img src=\"/image.png\" style=\"lol\">";
$src = "/image.png";
$width = 10;
//you must escape potential special characters in $src,
//before using it in regexp
$regexp_src = preg_quote($src, "/");
$theData = preg_replace(
'/src="'. $regexp_src .'" style=".*?"/i',
'src="'. $src .'" style="width: '. $width . 'px;"',
$theData);
print $theData;
prints:
<img src="/image.png" style="width: 10px;">
Regex expression:
(<[^>]*)style\s*=\s*('|")[^\2]*?\2([^>]*>)
Usage:
$1$3
Example:
http://rubular.com/r/28tCIMHs50
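In PHP, "Usage: $1$3" means passing '$1$3' as the replacement to preg_replace, which drops the whole style attribute and keeps the rest of the tag. A sketch under assumptions: the pattern below is a slightly adapted version of the one above (it uses .*? instead of the character class and also swallows the whitespace before style), and like any regex on HTML it can be fooled, for example by the text style= inside another attribute's value.

```php
<?php
$theData = '<img src="/image.png" style="width: 5px;" alt="x">';

// Group 1: tag up to the style attribute, group 2: the quote character,
// group 3: the remainder of the tag after the closing quote
$pattern = '~(<[^>]*?)\s*style\s*=\s*(["\']).*?\2([^>]*>)~is';

$clean = preg_replace($pattern, '$1$3', $theData);

echo $clean, PHP_EOL;
// <img src="/image.png" alt="x">
```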
Search for:
<img([^>]*)style="([^"]*)"
and replace with:
<img\1style="attribute1: value1; attribute2: value2;"
http://regex101.com/r/zP2tV9

Replace existing image tags within html string with new image

I am trying to replace existing image tags with new image tags that have an id attribute.
I have an HTML string returned from the DB, like:
<div>
<p>random p</p>
<img src='a.jpb'/>
</div>
<span>random span</span>
<img src='b.jpb'/>
more...
I want to replace 'SOME' of the images with id attribute for example:
replacing
<img src='a.jpb'/>
to
<img id='123' src='a.jpb'/>
I use DOMDocument:
$doc = new DOMDocument();
$doc->loadHTML($html);
$imageTags = $doc->getElementsByTagName('img');
$imgSource=$this->DBimgSource;
$id=$this->DBID;
foreach($imageTags as $tag) {
    //getting all the images tags from $html string
    $source=$tag->getAttribute('src');
    if(!empty($imgSource) && !empty($source)){
        if(in_array($source, $imgSource)){
            $id=array_search($source,$imgSource);
            $imgID=$id;
            $tag->setAttribute('id',$imgID);
            $newImageTag=$doc->saveXML($tag);
            //I am not sure how to replace existing image tags within $html with the $newImageTag
        }
    }
}
Any ideas how to do this? I couldn't think of any way. Thanks for the help!
This works for me:
$html = "<html><body><img src='foo.jpg'/></body></html>";
$doc = new DOMDocument();
$doc->loadHTML($html);
$img = $doc->getElementsByTagName('img')->item(0);
$img->setAttribute('id', 'newId');
$html = $doc->saveHTML();
echo $html;
Output is:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><img src="foo.jpg" id="newId"></body></html>
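Applied to the question's case, the same approach can set an id on only some of the images: setAttribute changes the node inside the document, so no string replacement is needed, and saveHTML returns the updated markup. A sketch, assuming $imgSource is an array that maps each id to a src value (a made-up stand-in for $this->DBimgSource):

```php
<?php
// Hypothetical id => src map standing in for the values from the DB
$imgSource = ['123' => 'a.jpb'];

$html = "<div><p>random p</p><img src='a.jpb'/></div><span>random span</span><img src='b.jpb'/>";

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

foreach ($doc->getElementsByTagName('img') as $tag) {
    // array_search returns the key (the id) or false when the src is not listed
    $id = array_search($tag->getAttribute('src'), $imgSource, true);
    if ($id !== false) {
        $tag->setAttribute('id', (string) $id);
    }
}

// The document now contains the modified tags
$html = $doc->saveHTML();
echo $html;
```

Only the image whose src appears in the map gets an id; the other image is left as it was.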
