Find image src with regex in PHP - php

How can I extract the image src from a text that contains only an img tag?
The src value is sometimes in double quotes and sometimes in single quotes.

I would not recommend using regex to parse html. Instead you can use php's DOMDocument() class, which should still work even if the rest of the string isn't really html:
$html = 'Lorem ipsum<img src="test.png">dolor sit amet&[H*()';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$imgs = $dom->getElementsByTagName('img');
foreach($imgs as $img) {
$src = $img->getAttribute('src');
echo $src;
}
Depending on your php version you may also want to use:
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
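For the quoting part of the question: the parser does not care which quote style the attribute uses. A minimal sketch, assuming a hypothetical helper name `extract_img_srcs()` and made-up file names (neither is from the question):

```php
<?php
// Hypothetical helper (name is illustrative): returns every img src in a fragment.
function extract_img_srcs(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $srcs = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        // Skip images that have no src attribute at all.
        if ($img->hasAttribute('src')) {
            $srcs[] = $img->getAttribute('src');
        }
    }
    return $srcs;
}

// One double-quoted and one single-quoted src (sample data, not from the question).
$srcs = extract_img_srcs('<img src="a.png"> text <img alt="x" src=\'b.jpg\'>');
print_r($srcs);
```

Both values come back without any quote handling on your side, which is the main reason to prefer the parser over a pattern here.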

Try
$image = '<img class="foo bar test" title="test image" src=\'http://example.com/img/image.jpg\' alt="test image" width="100" height="100" />';
$array = array();
// Accept either quote style; the backreference \1 requires the same closing quote.
preg_match( '/\ssrc\s*=\s*(["\'])(.*?)\1/i', $image, $array ) ;
print_r( $array[2] ) ;

Related

PHP domDocument works incorrectly when the node wrapper in figure?

I'm trying to add some HTML to all links that contain image.
Basic HTML loaded into dom looks like
<div class='content'>
<img src="">
<figure>
<img src="">
<figcaption>Caption</figcaption>
</figure>
</div>
The code:
$content = mb_convert_encoding($content, 'HTML-ENTITIES', "UTF-8");
$dom = new DOMDocument();
@$dom->loadHTML($content);
// Convert Images
$images = [];
foreach ($dom->getElementsByTagName('img') as $node) {
$images[] = $node;
}
foreach ($images as $node) {
$field_html = $dom->createDocumentFragment(); // create fragment
$field_html->appendXML('<span>11</span>'); // create fragment
$node->parentNode->appendChild($field_html);
}
$newHtml = preg_replace('/^<!DOCTYPE.+?>/', '', str_replace( array('<html>', '</html>', '<body>', '</body>'), array('', '', '', ''), $dom->saveHTML()));
return $newHtml;
So when it's a regular link with img, it produces correct output:
<img src=""><span>11</span>
But when it's a figure, output is very strange — link is duplicated and inserted into figcaption:
<figure>
<img src="">
<figcaption>Caption <a href="..."><span>11</span>
</figcaption>
</figure>
Is that because DOMDocument doesn't understand the figure element?
I was unable to reproduce your problem. My guess would be a misplaced element somewhere in your source HTML. But your code can be simplified quite a bit.
There's no need to put your image nodes into an array, you can work directly with the results of DomDocument::getElementsByTagName().
As mentioned in comments, you can set up DomDocument::loadHTML() not to add the doctype and implied elements, instead of removing them later with potentially tricky string manipulations.
A simple DomDocument::createElement() can be used for the element you want to append, instead of creating a document fragment.
Finally, the error control operator @ should generally be avoided. Instead, libxml_use_internal_errors() can be used to set the error behaviour. This allows you to examine error messages with libxml_get_errors() if desired.
$content = <<< HTML
<div class="content">
<img src="">
<figure>
<img src="">
<figcaption>Caption</figcaption>
</figure>
</div>
HTML;
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
libxml_use_internal_errors(false);
foreach ($dom->getElementsByTagName('img') as $node) {
$node->parentNode->appendChild($dom->createElement("span", "11"));
}
$newHtml = $dom->saveHTML();
echo $newHtml;
Output:
<div class="content">
<img src=""><span>11</span>
<figure>
<img src=""><span>11</span>
<figcaption>Caption</figcaption>
</figure>
</div>
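If you want to see what the parser complained about rather than discard it, something along these lines should work. It is only a sketch with a trimmed-down copy of the markup. Which messages appear depends on the libxml2 build; many builds report HTML5 elements such as figure as invalid tags, but that is an assumption about your environment, not a guarantee.

```php
<?php
// Collect parser messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<div><figure><img src=""><figcaption>Caption</figcaption></figure></div>');

// Each entry is a LibXMLError object with level, code, line, column and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}

// Empty the buffer and restore the default behaviour.
libxml_clear_errors();
libxml_use_internal_errors(false);
```

An empty list simply means the parser had nothing to report for that input.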

Replace empty alt tag on <img> tag

I would like to replace the empty alt attributes on images in a string. I have a string that contains all the text for a certain page. The text also contains images, and many of them have empty alt attributes (old data), but most of them do have title attributes.
For example:
<img src="assets/img/test.png" alt="" title="I'am a title tag" width="100" height="100" />
What I wish to have:
<img src="assets/img/test.png" alt="" title="I'am a title tag" alt="I'am a title tag" width="100" height="100" />
So:
I need to find all the images in my string, loop through them, find the title and alt attributes, and replace each empty alt with the title when the title has a value.
This is what I tried:
preg_match_all('/<img[^>]+>/i',$return, $text);
if(isset($text)) {
foreach( $text as $itemImg ) {
foreach( $itemImg as $item ) {
$array = array();
preg_match( '/title="([^"]*)"/i', $item, $array );
if(isset($array[1])) {
//So $array[1] is a title tag, now what?
}
}
}
}
I don't know how to complete the code, and I think there must be an easier fix for this. Suggestions?
Regex is not a good approach here; you should use DOMDocument to parse HTML. The XPath query below selects only those img elements whose alt attribute is empty, which is what the question asks for.
Try this code snippet:
<?php
ini_set('display_errors', 1);
$string=<<<HTML
<img src="assets/img/test1.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test2.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test3.png" alt="" title="I'am a title tag" width="100" height="100" />
HTML;
$domDocument = new DOMDocument();
$domDocument->loadHTML($string,LIBXML_HTML_NODEFDTD);
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query('//img[@alt=""]');
foreach($results as $result)
{
$title=$result->getAttribute("title");
$result->setAttribute("alt",$title);
echo $domDocument->saveHTML($result);
echo PHP_EOL;
}
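The snippet above prints each image on its own. If you need the whole page text back with the alts filled in, a variation like the following may be closer to what you want. It is a sketch only: the function name `fill_empty_alts()` and the wrapper div are my own additions, the sample markup is made up, and it assumes the input is ASCII or entity-encoded (loadHTML() does not assume UTF-8 by default).

```php
<?php
// Hypothetical helper: copy title into alt where alt is missing or empty,
// then return the full fragment instead of the individual images.
function fill_empty_alts(string $html): string
{
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Wrap the fragment so its top-level nodes can be collected again afterwards.
    $dom->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//img[not(@alt) or @alt=""]') as $img) {
        $title = $img->getAttribute('title');
        // Leave the image alone when there is no usable title.
        if ($title !== '') {
            $img->setAttribute('alt', $title);
        }
    }

    // Serialize only the children of the wrapper div, not html/body.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$out = fill_empty_alts('<p>Hi <img src="a.png" alt="" title="Logo"></p><img src="b.png" alt="">');
echo $out;
```

The second image has no title, so its alt stays empty rather than being overwritten with an empty string.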
Maybe you could do this kind of thing in JavaScript with jQuery, like:
$('img').each(function(){
$(this).attr('alt', $(this).attr('title'));
});
Hope it helps.
What you want here is an HTML parser library that can manipulate HTML and then save it again. By using regular expressions to modify HTML markup, you're setting yourself up for a mess.
The DOM module built into PHP offers this functionality: http://php.net/manual/en/book.dom.php
Here's an example (cribbed from this article):
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
$html = $dom->saveHTML();
You can use DOMDocument to achieve this. Below is some sample code for reference:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
?>
Please try the function below:
function img_title_in_alt($full_img_tag){
$doc = new DOMDocument();
$doc->loadHTML($full_img_tag);
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
if($tag->getAttribute('src')!==''){
return '<img src="'.$tag->getAttribute('src').'" width="'.$tag->getAttribute('width').'" height="'.$tag->getAttribute('height').'" alt="'.$tag->getAttribute('title').'" title="'.$tag->getAttribute('title').'" />';
}
}
}
Now call the function with the full HTML of your image tag. See the example:
$image = '<img src="assets/img/test.png" alt="" title="I\'am a title tag" width="100" height="100" />';
print img_title_in_alt($image);
Let me know if you do not understand anything.

preg_replace target images within P tags

I am using preg_replace to change some content, I have 2 different types of images...
<p>
<img class="responsive" src="image.jpg">
</p>
<div class="caption">
<img class="responsive" src="image2.jpg">
</div>
I am using preg_replace like this to add a container div around images...
function filter_content($content)
{
$pattern = '/(<img[^>]*class=\"([^>]*?)\"[^>]*>)/i';
$replacement = '<div class="inner $2">$1</div>';
$content = preg_replace($pattern, $replacement, $content);
return $content;
}
Is there a way to modify this so that it only affect images in P tags? And also vice versa so I can also target images within a caption div?
Absolutely.
$dom = new DOMDocument();
$dom->loadHTML("<body><!-- DOMSTART -->".$content."<!-- DOMEND --></body>");
$xpath = new DOMXPath($dom);
$images = $xpath->query("//p/img");
foreach($images as $img) {
$wrap = $dom->createElement("div");
$wrap->setAttribute("class","inner ".$img->getAttribute("class"));
$img->parentNode->replaceChild($wrap,$img);
$wrap->appendChild($img);
}
$out = $dom->saveHTML();
preg_match("/<!-- DOMSTART -->(.*)<!-- DOMEND -->/s",$out,$match);
return $match[1];
It's worth noting that while parsing arbitrary HTML with regex is a disaster waiting to happen, using a parser with markers and then matching based on those markers is perfectly safe.
Adjust the XPath query and/or inner manipulation as needed.
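For the "vice versa" case in the question, the adjustment could look roughly like this. The sketch only collects the matching src values so the effect of the query is visible; the class test uses the usual concat/normalize-space idiom so that class="caption wide" would also match, which is an assumption about what you want.

```php
<?php
$html = '<p><img class="responsive" src="image.jpg"></p>'
      . '<div class="caption"><img class="responsive" src="image2.jpg"></div>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Images that are direct children of a div whose class list contains "caption".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " caption ")]/img';

$found = [];
foreach ($xpath->query($query) as $img) {
    $found[] = $img->getAttribute('src');
}
print_r($found);
```

Swap the collecting loop for the wrapping code above once the query selects the images you expect.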
Use an html parser instead of regex, DOMDocument for example, i.e.:
$html = <<< EOF
<p>
<img class="responsive" src="image.jpg">
</p>
<div class="caption">
<img class="responsive" src="image2.jpg">
</div>
EOF;
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$images = $xpath->query('//p/img[contains(@class,"responsive")]');
$new_src_url = "some_image_name.jpg";
foreach($images as $image)
{
$image->setAttribute('src', $new_src_url);
}
// Serialize once, after all matching images have been updated.
echo $dom->saveHTML();

getting img src within a link

I am having trouble with retrieving the src of an image that is part of a link. For example with this I would like to retrieve the src of the img between the tag.
<img src="http://example.com/picture1234.jpg" id="pic_1234" />
I will need to do this for a couple of the links on the page that are all laid out the same. So what I tried so far is this:
$dom = new DOMDocument;
@$dom->loadHTML($html);
$i = 0;
$links = $dom->getElementsByTagName('a');
//Get images
foreach ($links as $link){
$test = $link->getAttribute('href');
if (strpos($test,'/video') !== false) {
$XV_IMG[$i] = $link->nodeValue;
$i++;
}
}
If the link contains plain text instead of an img tag, this works just fine. Is there any way to get the src?
Just keep using getElementsByTagName on the node like this
foreach ($link->getElementsByTagName('img') as $img) {
$XV_IMG[] = $img->getAttribute('src');
}
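Put together with the href check from the question, the whole loop might look like this. The markup is a guess at what the page looks like (the `/video/1234` href and the second link are invented for the example):

```php
<?php
// Assumed markup: one video link wrapping an image, one plain text link.
$html = '<a href="/video/1234"><img src="http://example.com/picture1234.jpg" id="pic_1234" /></a>'
      . '<a href="/about">About</a>';

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$XV_IMG = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    // Only look inside links whose href contains "/video".
    if (strpos($link->getAttribute('href'), '/video') === false) {
        continue;
    }
    // getElementsByTagName() also works on an element, scoped to its descendants.
    foreach ($link->getElementsByTagName('img') as $img) {
        $XV_IMG[] = $img->getAttribute('src');
    }
}
print_r($XV_IMG);
```

Links without an image simply contribute nothing, so no extra check is needed for them.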
Try using preg_match_all:
$html= '<img src="http://example.com/picture1234.jpg" id="pic_1234" />
<img src="http://example.com/picture1224.jpg" id="pic_1224" />
<img src="http://example.com/picture1434.jpg" id="pic_1434" />
<img src="http://example.com/picture1554.jpg" id="pic_1554" />
<img src="http://example.com/picture1334.jpg" id="pic_1334" />';
preg_match_all('/<a href="(.*)"><img src="(.*)" id="pic_[0-9]{1,7}" \/><\/a>/i',$html,$out);
unset($out[0]);
unset($out[1]);
print_r($out);

How to get image src by class

I have this:
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" onerror="this.src='/Content/Css/Images/no_brand_logo_120_48.gif'" alt="ADTRAN">
How do I get the img src (http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg)?
I tried a lot of things and that was the last one:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//class='brand-logo']/img/@src)");
echo "$src";
That's not proper XPath syntax. Try
$nodes = $xpath->query("//img[@class='brand-logo']");
$src = $nodes->item(0)->getAttribute('src');
First you fetch the NODE that represents the image whose src you want, THEN you get the src attribute. Note that the ->query() call returns a DOMNodeList, not a node.
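Because item(0) returns null when nothing matches, it may be worth guarding the attribute lookup. A small sketch with shortened sample markup (the `logo.jpg` value is made up):

```php
<?php
$html = '<a href="/x"><img class="brand-logo" src="logo.jpg" alt="ADTRAN"></a>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// item(0) is null for an empty node list, so check before calling getAttribute().
$node = $xpath->query("//img[@class='brand-logo']")->item(0);
$src = $node instanceof DOMElement ? $node->getAttribute('src') : null;

var_dump($src);
```

Without the check, a page that lacks the logo would trigger a fatal "call to a member function on null" error.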
Try like this
<?php
$html = '<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>';
$xml = simplexml_load_string($html);
echo $xml->img['src'];
?>
Try like this
<?php
$doc=new DOMDocument();
$doc->loadHTML('<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>');
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$images=$xml->xpath('//img');
foreach ($images as $img) {
echo $img['src'];
}?>
With XPath you can query an attribute directly; string() returns its node value:
$src = $xpath->evaluate("string(//img[@class='brand-logo']/@src)");
However I'm really sorry to say that I have no clue how that could be done with preg_match in your case ;)
