I am having trouble retrieving the src of an image that is inside a link. For example, with the markup below I would like to retrieve the src of the img inside the a tag.
<img src="http://example.com/picture1234.jpg" id="pic_1234" />
I will need to do this for a couple of the links on the page that are all laid out the same. So what I tried so far is this:
$dom = new DOMDocument;
@$dom->loadHTML($html); // @ suppresses warnings from malformed HTML
$i = 0;
$links = $dom->getElementsByTagName('a');
//Get images
foreach ($links as $link){
$test = $link->getAttribute('href');
if (strpos($test,'/video') !== false) {
$XV_IMG[$i] = $link->nodeValue;
$i++;
}
}
If the link contains plain text instead of an img tag, this works just fine. Is there any way to get the src?
Just keep using getElementsByTagName on the link node, like this:
foreach ($link->getElementsByTagName('img') as $img) {
$XV_IMG[] = $img->getAttribute('src');
}
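For completeness, here is a minimal self-contained sketch of the whole loop with the nested lookup in place. The sample markup and its href values are made up for illustration (the question does not show the surrounding links), and libxml_use_internal_errors() is used here only to keep parse warnings quiet.

```php
<?php
// Hypothetical sample markup; the href values are assumptions for illustration.
$html = '<a href="/video/1"><img src="http://example.com/picture1234.jpg" id="pic_1234" /></a>'
      . '<a href="/about">About us</a>'
      . '<a href="/video/2"><img src="http://example.com/picture1224.jpg" id="pic_1224" /></a>';

$dom = new DOMDocument;
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$XV_IMG = [];
foreach ($dom->getElementsByTagName('a') as $link) {
    // Skip links whose href does not contain "/video"
    if (strpos($link->getAttribute('href'), '/video') === false) {
        continue;
    }
    // Look for <img> elements inside this particular link only
    foreach ($link->getElementsByTagName('img') as $img) {
        $XV_IMG[] = $img->getAttribute('src');
    }
}

print_r($XV_IMG);
```

The text-only link contributes nothing to the array, because its inner loop finds no img elements.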
Try using preg_match_all:
$html= '<img src="http://example.com/picture1234.jpg" id="pic_1234" />
<img src="http://example.com/picture1224.jpg" id="pic_1224" />
<img src="http://example.com/picture1434.jpg" id="pic_1434" />
<img src="http://example.com/picture1554.jpg" id="pic_1554" />
<img src="http://example.com/picture1334.jpg" id="pic_1334" />';
preg_match_all('/<a href="(.*)"><img src="(.*)" id="pic_[0-9]{1,7}" \/><\/a>/i',$html,$out);
unset($out[0]);
unset($out[1]);
print_r($out);
I would like to replace the empty alt attributes on images in a string. I have a string that contains all the text for a certain page. The text also contains images, and a lot of them have empty alt attributes (old data), but most of the time they do have title attributes.
For example:
<img src="assets/img/test.png" alt="" title="I'am a title tag" width="100" height="100" />
What I wish to have:
<img src="assets/img/test.png" alt="I'am a title tag" title="I'am a title tag" width="100" height="100" />
So:
I need to find all the images in my string, loop through the images, find the title attributes, find the alt attributes, and replace each empty alt attribute with the title attribute when it has a value.
This is what I tried:
preg_match_all('/<img[^>]+>/i',$return, $text);
if(isset($text)) {
foreach( $text as $itemImg ) {
foreach( $itemImg as $item ) {
$array = array();
preg_match( '/title="([^"]*)"/i', $item, $array );
if(isset($array[1])) {
//So $array[1] is a title tag, now what?
}
}
}
}
I don't know how to complete the code, and I think there must be an easier fix for this. Suggestions?
Regex is not a good approach here; you should use DOMDocument for parsing HTML. The XPath query below selects only those img elements whose alt attribute is empty, which is what the question asks for.
Try this code snippet:
<?php
ini_set('display_errors', 1);
$string=<<<HTML
<img src="assets/img/test1.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test2.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test3.png" alt="" title="I'am a title tag" width="100" height="100" />
HTML;
$domDocument = new DOMDocument();
$domDocument->loadHTML($string,LIBXML_HTML_NODEFDTD);
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query('//img[@alt=""]');
foreach($results as $result)
{
$title=$result->getAttribute("title");
$result->setAttribute("alt",$title);
echo $domDocument->saveHTML($result);
echo PHP_EOL;
}
Maybe you could use JavaScript with jQuery for this kind of thing, like:
$('img').each(function(){
$(this).attr('alt', $(this).attr('title'));
});
hope it helps
Regards.
What you want here is an HTML parser library that can manipulate HTML and then save it again. By using regular expressions to modify HTML markup, you're setting yourself up for a mess.
The DOM module built into PHP offers this functionality: http://php.net/manual/en/book.dom.php
Here's an example (cribbed from this article):
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
$html = $dom->saveHTML();
You can use DOMDocument to achieve this. Below is some sample code for reference:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
?>
Please try it below:
function img_title_in_alt($full_img_tag){
$doc = new DOMDocument();
$doc->loadHTML($full_img_tag);
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
if($tag->getAttribute('src')!==''){
return '<img src="'.$tag->getAttribute('src').'" width="'.$tag->getAttribute('width').'" height="'.$tag->getAttribute('height').'" alt="'.$tag->getAttribute('title').'" title="'.$tag->getAttribute('title').'" />';
}
}
}
Now call the function with the full HTML of your img tag. For example:
$image = '<img src="assets/img/test.png" alt="" title="I\'am a title tag" width="100" height="100" />';
print img_title_in_alt($image);
Let me know if you do not understand anything.
I have the following HTML:
<div><p><img src="https://test1.jpg" /></p></div>
<p>aaa</p>
<p>bbb</p>
<p>ccc<div>ddd <img src="http://test2.jpg" /></div></p>
<p>eee</p>
<p>fff</p>
<p>ggg</p>
<p>hhh</p>
<p>iii</p>
<div><p><img src="https://test3.jpg" /></p></div>
But I need to remove the div tag around the image outside the p tag; the expected output is:
<p><img src="https://test1.jpg" /></p>
<p>aaa</p>
<p>bbb</p>
<p>ccc<div>ddd <img src="http://test2.jpg" /></div></p>
<p>eee</p>
<p>fff</p>
<p>ggg</p>
<p>hhh</p>
<p>iii</p>
<p><img src="https://test3.jpg" /></p>
Does anybody know how to do it with PHP preg_replace function?
You really don't want to use regex for this; use DOMDocument instead. While it looks longer and more complicated, it is much more robust.
$dom = new DOMDocument();
$html = '<div><p><img src="https://test1.jpg" /></p></div>ccc<div>ddd <img src="http://test2.jpg" /></div>';
libxml_use_internal_errors(true);
$dom->loadHTML($html);
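// getElementsByTagName() returns a live list; copy it first so replacing nodes does not skip entries
foreach(iterator_to_array($dom->getElementsByTagName('div')) as $node) {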
// this bit is a little hacky, but if you can predict the values use it to exclude some items
if(strpos($node->nodeValue, 'ddd') !== false) {
continue;
}
$fragment = $dom->createDocumentFragment();
while($node->childNodes->length > 0) {
$fragment->appendChild($node->childNodes->item(0));
}
$node->parentNode->replaceChild($fragment,$node);
}
$html = $dom->saveHTML();
echo $html;
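As an alternative to excluding divs by their text content, an XPath query can select only the divs that directly wrap a p containing an img. The following is a rough sketch under that assumption; the input is a shortened version of the question's markup, and the expression //div[p/img] is my own choice, not something taken from the answer above.

```php
<?php
// Shortened sample input based on the question's markup
$html = '<div><p><img src="https://test1.jpg" /></p></div>'
      . '<p>aaa</p>'
      . '<div>ddd <img src="http://test2.jpg" /></div>'
      . '<div><p><img src="https://test3.jpg" /></p></div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Match only <div> elements that have a <p> child which itself contains an <img>
foreach ($xpath->query('//div[p/img]') as $div) {
    // Move every child out of the div, placing it just before the div
    while ($div->firstChild !== null) {
        $div->parentNode->insertBefore($div->firstChild, $div);
    }
    // The div is now empty, so drop it
    $div->parentNode->removeChild($div);
}

// Serialize only the children of <body>, skipping the wrapper elements libxml adds
$out = '';
foreach ($dom->getElementsByTagName('body')->item(0)->childNodes as $child) {
    $out .= $dom->saveHTML($child);
}
echo $out;
```

The div that holds "ddd" and an img without a p is left alone, because it does not match the query.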
How can I extract the image src from a text that contains only an img tag?
By the way, the src is sometimes in double quotes and sometimes in single quotes.
I would not recommend using regex to parse html. Instead you can use php's DOMDocument() class, which should still work even if the rest of the string isn't really html:
$html = 'Lorem ipsum<img src="test.png">dolor sit amet&[H*()';
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$imgs = $dom->getElementsByTagName('img');
foreach($imgs as $img) {
$src = $img->getAttribute('src');
echo $src;
}
Depending on your PHP version you may also want to use:
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
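Try this, which accepts either quote style:
$image = '<img class="foo bar test" title="test image" src=\'http://example.com/img/image.jpg\' alt="test image" width="100" height="100" />';
$array = array();
// Group 1 captures the opening quote, \1 requires the same closing quote, group 2 is the src value
preg_match('/src=(["\'])(.*?)\1/i', $image, $array);
print_r($array[2]);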
I am trying to replace every img src that does not contain a full URL with the full image URL.
For example:
<?php
$html_str = "<html>
<body>
Hi, this is the first image
<img src='image/example.jpg' />
this is the second image
<img src='http://sciencelakes.com/data_images/out/14/8812836-green-light-abstract.jpg' />
and this is the last image
<img src='image/last.png' />
</body>
</html>";
?>
and after the replacement it should become:
<?php
$html_str = "<html>
<body>
Hi, this is the first image
<img src='http://example.com/image/example.jpg' />
this is the second image
<img src='http://sciencelakes.com/data_images/out/14/8812836-green-light-abstract.jpg' />
and this is the last image
<img src='http://example.com/image/last.png' />
</body>
</html>";
?>
So how can I check every img src that is not a full link and replace it? (The $html_str is dynamic, loaded from MySQL.)
Please suggest a solution for this problem.
Thanks.
I'd do it properly using a DOM library, eg
$doc = new DOMDocument();
$doc->loadHTML($html_str);
$xp = new DOMXPath($doc);
$images = $xp->query('//img[not(starts-with(@src, "http:") or starts-with(@src, "https:") or starts-with(@src, "data:"))]');
foreach ($images as $img) {
$img->setAttribute('src',
'http://example.com/' . ltrim($img->getAttribute('src'), '/'));
}
$html = $doc->saveHTML($doc->documentElement);
Demo here - http://ideone.com/4K9pyD
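Try this:
You can get the image source using the following code:
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings from malformed HTML
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");
Note that this returns the src of the first img only. After that, check whether the string starts with http, and prefix the base URL only if it does not.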
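One possible way to do that check is with parse_url(), which reports whether a string carries a URL scheme. This is only a sketch: the helper name absolutize_src and the base URL are assumptions for illustration, and protocol-relative values such as //host/path are not handled.

```php
<?php
// Hypothetical helper: prefix $base unless $src already has a URL scheme.
function absolutize_src(string $src, string $base): string
{
    // parse_url() yields null for the scheme component when none is present
    $scheme = parse_url($src, PHP_URL_SCHEME);
    if ($scheme !== null && $scheme !== false) {
        return $src; // already absolute, e.g. http: or https:
    }
    // Join base and relative path with exactly one slash between them
    return rtrim($base, '/') . '/' . ltrim($src, '/');
}

echo absolutize_src('image/example.jpg', 'http://example.com'), PHP_EOL;
echo absolutize_src('http://sciencelakes.com/data_images/out/14/8812836-green-light-abstract.jpg', 'http://example.com'), PHP_EOL;
```

Calling this for every img src inside a DOM loop would leave the full links untouched and rewrite only the relative ones.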
I have this:
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" onerror="this.src='/Content/Css/Images/no_brand_logo_120_48.gif'" alt="ADTRAN">
How can I get the img src (http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg)?
I tried a lot of things and that was the last one:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//class='brand-logo']/img/@src)");
echo "$src";
That's not proper XPath syntax. Try
$nodes = $xpath->query("//img[@class='brand-logo']");
$src = $nodes->item(0)->getAttribute('src');
First you fetch the NODE that represents the image whose src you want, THEN you get the src attribute. Note that the ->query() call returns a DOMNodeList, not a node.
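Because query() returns a list, item(0) is null when nothing matches, and calling getAttribute() on it would then fail. A small sketch of the same lookup with that case guarded, using the markup from the question (the null fallback is my own addition):

```php
<?php
$html = '<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN">';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query("//img[@class='brand-logo']");

// item(0) is null when the list is empty, so check before reading the attribute
$first = $nodes->item(0);
$src = $first !== null ? $first->getAttribute('src') : null;

var_dump($src);
```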
Try like this
<?php
$html = '<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>';
$xml = simplexml_load_string($html);
echo $xml->img['src'];
?>
Try like this
<?php
$doc=new DOMDocument();
$doc->loadHTML('<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>');
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$images=$xml->xpath('//img');
foreach ($images as $img) {
echo $img['src'];
}?>
With XPath you can query an attribute directly; string() gives its node value:
$src = $xpath->evaluate("string(//img[@class='brand-logo']/@src)");
However I'm really sorry to say that I have no clue how that could be done with preg_match in your case ;)