I'm trying to replace every img src that is not a full URL with the full image URL.
For example, given this:
<?php
$html_str = "<html>
<body>
Hi, this is the first image
<img src='image/example.jpg' />
this is the second image
<img src='http://sciencelakes.com/data_images/out/14/8812836-green-light-abstract.jpg' />
and this is the last image
<img src='image/last.png' />
</body>
</html>";
?>
After the replacement it should become this:
<?php
$html_str = "<html>
<body>
Hi, this is the first image
<img src='http://example.com/image/example.jpg' />
this is the second image
<img src='http://sciencelakes.com/data_images/out/14/8812836-green-light-abstract.jpg' />
and this is the last image
<img src='http://example.com/image/last.png' />
</body>
</html>";
?>
So how can I check every img src that is not a full link and replace it? ($html_str is dynamic, loaded from MySQL.)
Please suggest a solution for this problem.
Thanks.
I'd do it properly using a DOM library, eg
$doc = new DOMDocument();
$doc->loadHTML($html_str);
$xp = new DOMXPath($doc);
$images = $xp->query('//img[not(starts-with(@src, "http:") or starts-with(@src, "https:") or starts-with(@src, "data:"))]');
foreach ($images as $img) {
$img->setAttribute('src',
'http://example.com/' . ltrim($img->getAttribute('src'), '/'));
}
$html = $doc->saveHTML($doc->documentElement);
Demo here - http://ideone.com/4K9pyD
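The same idea can be wrapped in a reusable function. This is only a minimal sketch: the function name `absolutize_img_src` and the sample markup are made up for illustration, and the base URL is assumed to be `http://example.com/` as in the question. It additionally leaves protocol-relative `//host/...` sources alone, which the XPath filter above does not check for.

```php
<?php
// Hypothetical helper: prefix every relative <img src> with $base.
function absolutize_img_src(string $html, string $base): string
{
    $doc = new DOMDocument();
    // Collect parser warnings instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        // Leave empty, absolute, protocol-relative, and data: sources untouched.
        if ($src === '' || preg_match('#^(?:https?:|data:|//)#i', $src)) {
            continue;
        }
        $img->setAttribute('src', rtrim($base, '/') . '/' . ltrim($src, '/'));
    }
    return $doc->saveHTML($doc->documentElement);
}

// Sample input (assumed): one relative and one absolute image.
$html_str = "<html><body><img src='image/example.jpg' /><img src='http://example.org/full.jpg' /></body></html>";
$result = absolutize_img_src($html_str, 'http://example.com/');
echo $result;
```

Restoring the previous libxml error setting keeps the helper from changing global state for the rest of the script.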
Try this:
You can get image source using following code:
$doc = new DOMDocument();
@$doc->loadHTML($html); // @ suppresses warnings from malformed markup
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//img/@src)");
After that, check whether the string starts with http and act accordingly.
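Note that `string(...)` only yields the first matching src. A rough sketch of checking every image instead, assuming that "full URL" simply means the value starts with `http://` or `https://` (the sample markup is made up):

```php
<?php
$html = '<p><img src="image/a.jpg"><img src="http://example.com/b.jpg"></p>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$relative = [];
// //img/@src selects the attribute nodes themselves, one per image.
foreach ($xpath->query('//img/@src') as $attr) {
    // A src without an http(s) scheme is treated as relative here.
    if (!preg_match('#^https?://#i', $attr->value)) {
        $relative[] = $attr->value;
    }
}
print_r($relative);
```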
Related
I would like to fill in the empty alt attributes on images in a string. I have a string that contains all the text for a certain page. The text also contains images, and a lot of them have empty alt attributes (old data), but most of the time they do have title attributes.
For example:
<img src="assets/img/test.png" alt="" title="I'am a title tag" width="100" height="100" />
What I wish to have:
<img src="assets/img/test.png" alt="I'am a title tag" title="I'am a title tag" width="100" height="100" />
So:
I need to find all the images in my string, loop through them, find the title and alt attributes, and replace each empty alt with the title when the title has a value.
This is what I tried:
preg_match_all('/<img[^>]+>/i',$return, $text);
if(isset($text)) {
foreach( $text as $itemImg ) {
foreach( $itemImg as $item ) {
$array = array();
preg_match( '/title="([^"]*)"/i', $item, $array );
if(isset($array[1])) {
//So $array[1] is a title tag, now what?
}
}
}
}
I don't know how to complete the code, and I think there must be an easier fix for this. Suggestions?
Regex is not a good approach here; you should use DOMDocument for parsing HTML. Here we query for the elements whose alt attribute is empty, which is what the question asks for.
Try this code snippet here
<?php
ini_set('display_errors', 1);
$string=<<<HTML
<img src="assets/img/test1.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test2.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test3.png" alt="" title="I'am a title tag" width="100" height="100" />
HTML;
$domDocument = new DOMDocument();
$domDocument->loadHTML($string,LIBXML_HTML_NODEFDTD);
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query('//img[@alt=""]');
foreach($results as $result)
{
$title=$result->getAttribute("title");
$result->setAttribute("alt",$title);
echo $domDocument->saveHTML($result);
echo PHP_EOL;
}
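The query above copies the title even when the title itself is empty, and it does not match images that have no alt attribute at all. A possible variation, sketched with made-up sample markup, narrows the XPath to images whose alt is missing or empty and whose title is non-empty:

```php
<?php
$string = '<div>'
    . '<img src="a.png" alt="" title="First">'
    . '<img src="b.png" title="Second">'
    . '<img src="c.png" alt="" title="">'
    . '<img src="d.png" alt="Kept" title="Other">'
    . '</div>';

$doc = new DOMDocument();
$doc->loadHTML($string, LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($doc);

// Missing or empty alt, combined with a non-empty title.
$query = '//img[(not(@alt) or @alt="") and @title!=""]';
foreach ($xpath->query($query) as $img) {
    $img->setAttribute('alt', $img->getAttribute('title'));
}

// Collect the resulting alt values to inspect the effect.
$alts = [];
foreach ($doc->getElementsByTagName('img') as $img) {
    $alts[] = $img->getAttribute('alt');
}
print_r($alts);
```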
Maybe you could use JavaScript with jQuery for this kind of thing,
like:
$('img').each(function(){
$(this).attr('alt', $(this).attr('title'));
});
hope it helps
Regards.
What you want here is an HTML parser library that can manipulate HTML and then save it again. By using regular expressions to modify HTML markup, you're setting yourself up for a mess.
The DOM module built into PHP offers this functionality: http://php.net/manual/en/book.dom.php
Here's an example (cribbed from this article):
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
$html = $dom->saveHTML();
You can use DOMDocument to achieve this. Below is some sample code for reference:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
?>
Please try it below:
function img_title_in_alt($full_img_tag){
$doc = new DOMDocument();
$doc->loadHTML($full_img_tag);
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
if($tag->getAttribute('src')!==''){
return '<img src="'.$tag->getAttribute('src').'" width="'.$tag->getAttribute('width').'" height="'.$tag->getAttribute('height').'" alt="'.$tag->getAttribute('title').'" title="'.$tag->getAttribute('title').'" />';
}
}
}
Now call the function with your full html tag of image. See the example:
$image = '<img src="assets/img/test.png" alt="" title="I\'am a title tag" width="100" height="100" />';
print img_title_in_alt($image);
Let me know if you do not understand anything.
I am having trouble retrieving the src of an image that is part of a link. For example, with this markup I would like to retrieve the src of the img inside the a tag.
<img src="http://example.com/picture1234.jpg" id="pic_1234" />
I will need to do this for a couple of the links on the page that are all laid out the same. So what I tried so far is this:
$dom = new DOMDocument;
@$dom->loadHTML($html);
$i = 0;
$links = $dom->getElementsByTagName('a');
//Get images
foreach ($links as $link){
$test = $link->getAttribute('href');
if (strpos($test,'/video') !== false) {
$XV_IMG[$i] = $link->nodeValue;
$i++;
}
}
If the link contains plain text instead of only an img tag, it works just fine. Is there any way to get the src?
Just keep using getElementsByTagName on the node like this
foreach ($link->getElementsByTagName('img') as $img) {
$XV_IMG[] = $img->getAttribute('src');
}
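An alternative is to let XPath do both the link filtering and the image lookup in one query. This is only a sketch: the anchors and href values below are hypothetical, since the question does not show the full link markup, and `contains(@href, "/video")` mirrors the `strpos` check from the question.

```php
<?php
// Hypothetical markup: the href values are assumptions for illustration.
$html = '<div>'
    . '<a href="/video/1"><img src="http://example.com/picture1234.jpg" id="pic_1234"></a>'
    . '<a href="/about">About</a>'
    . '<a href="/video/2"><img src="http://example.com/picture1224.jpg" id="pic_1224"></a>'
    . '</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$XV_IMG = [];
// Select the src attribute of every img nested in a link whose href contains /video.
foreach ($xpath->query('//a[contains(@href, "/video")]//img/@src') as $src) {
    $XV_IMG[] = $src->value;
}
print_r($XV_IMG);
```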
try to use preg_match_all
$html= '<img src="http://example.com/picture1234.jpg" id="pic_1234" />
<img src="http://example.com/picture1224.jpg" id="pic_1224" />
<img src="http://example.com/picture1434.jpg" id="pic_1434" />
<img src="http://example.com/picture1554.jpg" id="pic_1554" />
<img src="http://example.com/picture1334.jpg" id="pic_1334" />';
preg_match_all('/<a href="(.*)"><img src="(.*)" id="pic_[0-9]{1,7}" \/><\/a>/i',$html,$out);
unset($out[0]);
unset($out[1]);
print_r($out);
So I've been at this for quite a while, and the best I got is wrapping the image in a link and with a span after the image tag:
<a href="">
<img src="">
<span></span>
</a>
But what I want is:
<a href="">
<span></span>
<img src="">
</a>
I tried all kinds of variations and positions of
$img->parentNode->appendChild($dom->createElement('span'), $img);
and the use of insertBefore() in all kinds of places in my code, and I'm completely out of ideas since I'm fairly new to the PHP DOM stuff. My source:
foreach($dom->getElementsByTagName('img') as $img)
{
$fancyHref = $dom->createElement('a');
$clone = $fancyHref->cloneNode();
$img->parentNode->replaceChild($clone, $img);
$clone->appendChild($img);
$img->parentNode->appendChild($dom->createElement('span'));
};
Update:
To clarify my goal: I have an img tag in the html. After it goes through the php dom I want the img tag wrapped in an a tag with a span tag before the image tag:
Before
<img src="" />
After
<a href="">
<span class=""></span>
<img src="" />
</a>
My code at the moment for doing this (without the span)
foreach($dom->getElementsByTagName('img') as $img)
{
$src = $img->getAttribute('src');
$filename = substr(strrchr($src , '/') ,1);
$filename = preg_replace('/^[.]*/', '', $filename);
$filename = explode('.', $filename);
$filename = $filename[0];
if($this->imagesTitles[$this->currentLanguage][$filename] !== '')
{
$img->setAttribute('title', $this->imagesTitles[$this->currentLanguage][$filename]);
$img->setAttribute('alt', $this->imagesTitles[$this->currentLanguage][$filename]);
}
else
{
$img->removeAttribute('title');
$img->removeAttribute('alt');
}
$classes = explode(' ', $img->getAttribute('class'));
if(!in_array('no-enlarge', $classes))
{
$fancyHref = $dom->createElement('a');
$span = $dom->createElement('span');
$span->setAttribute('class', 'magnifier');
$fancyHref->setAttribute('class', 'enlarge');
$fancyHref->setAttribute('rel', 'enlarge');
$fancyHref->setAttribute('href', $img->getAttribute('src'));
if($img->getAttribute('title') !== '')
{
$fancyHref->setAttribute('title', $img->getAttribute('title'));
$fancyHref->setAttribute('alt', $img->getAttribute('title'));
}
$clone = $fancyHref->cloneNode();
$img->parentNode->replaceChild($clone, $img);
$clone->appendChild($img);
$img->parentNode->insertBefore($span, $img);
}
$img->setAttribute('class', trim(str_replace('no-enlarge', '', $img->getAttribute('class'))));
if($img->getAttribute('class') === '')
{
$img->removeAttribute('class');
}
}
You can use the DOMNode->insertBefore() method (docs). It has the form $parentNode->insertBefore( $nodeToBeInserted, $nodeToInsertBefore ). Your code would be:
$img->parentNode->insertBefore( $dom->createElement('span'), $img );
DOMNode->appendChild() (docs) only has 1 argument (the inserted node), and this node will always be inserted after the last childNode.
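A small self-contained check of that ordering, using a made-up one-image document, might look like this:

```php
<?php
$dom = new DOMDocument();
$dom->loadHTML('<div><a href=""><img src=""></a></div>');

$img = $dom->getElementsByTagName('img')->item(0);
// Insert the new span as a sibling directly before the image.
$img->parentNode->insertBefore($dom->createElement('span'), $img);

// List the link's children in document order.
$a = $dom->getElementsByTagName('a')->item(0);
$order = [];
foreach ($a->childNodes as $child) {
    $order[] = $child->nodeName;
}
print_r($order);
```

Swapping the `insertBefore()` call for `appendChild()` would place the span after the image instead.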
Edit:
I've now tested my code, and if you would replace $img->parentNode->appendChild($dom->createElement('span')); with my line in your test case, it would end up with the correct format. You are however manipulating elements in a very confusing way. Besides that, if I test your updated code, I either end up with your desired format, or without a span element at all...
The element you want to replace is the image, because that is the only element that is originally in the document. Therefore, you should clone -that- element. While your current code eliminates the hierarchy error, you are copying all the changed code, instead of just the conflicting element and that is a waste of memory and time. When you copy the conflicting element, it is easy. You append the elements to your a in the order you want them to appear. When you append the image, append the clone instead. Do not manipulate $img. If you need to manipulate the image, you have to manipulate the clone instead. Then you just replace $img by the elements you manipulated ($fancyHref in your case).
$html = '<img src="">';
$dom = new DOMDocument();
$dom->loadHTML($html);
foreach($dom->getElementsByTagName('img') as $img) {
$fancyHref = $dom->createElement('a');
$clone = $img->cloneNode();
$span = $dom->createElement( 'span' );
#... do whatever you need to do to this span, the clone and the a element ...
#... when you are done, you can simply append all elements you have been manipulating ...
$fancyHref->appendChild( $span );
$fancyHref->appendChild( $clone );
$img->parentNode->replaceChild( $fancyHref, $img );
};
echo $dom->saveHTML();
Ummm. I know this may not classify as an answer, but don't you already have it? I mean.. I tried your code:
$dom = new DOMDocument;
$dom->loadHTMLFile("data.html");
foreach($dom->getElementsByTagName('img') as $img){
$src = $img->getAttribute('src');
$filename = substr(strrchr($src , '/') ,1);
$filename = preg_replace('/^[.]*/', '', $filename);
$filename = explode('.', $filename);
$filename = $filename[0];
$classes = explode(' ', $img->getAttribute('class'));
if(!in_array('no-enlarge', $classes))
{
$fancyHref = $dom->createElement('a');
$span = $dom->createElement('span');
$span->setAttribute('class', 'magnifier');
$fancyHref->setAttribute('class', 'enlarge');
$fancyHref->setAttribute('rel', 'enlarge');
$fancyHref->setAttribute('href', $img->getAttribute('src'));
if($img->getAttribute('title') !== '')
{
$fancyHref->setAttribute('title', $img->getAttribute('title'));
$fancyHref->setAttribute('alt', $img->getAttribute('title'));
}
$clone = $fancyHref->cloneNode();
$img->parentNode->replaceChild($clone, $img);
$clone->appendChild($img);
$img->parentNode->insertBefore($span, $img);
}
$img->setAttribute('class', trim(str_replace('no-enlarge', '', $img->getAttribute('class'))));
if($img->getAttribute('class') === ''){
$img->removeAttribute('class');
}
}
echo "<pre>" . htmlentities($dom->saveHTML()) . "</pre>";
With this (data.html) as source data:
<!doctype html>
<html>
<head>
</head>
<body>
<img src="http://lorempixel.com/g/400/200/" alt="alts" title="tits">
</body>
</html>
And the result is this:
<!DOCTYPE html>
<html>
<head>
</head>
<body>
<a class="enlarge" rel="enlarge" href="http://lorempixel.com/g/400/200/" title="tits" alt="tits">
<span class="magnifier">
</span>
<img src="http://lorempixel.com/g/400/200/" alt="alts" title="tits"></a>
</body>
</html>
So.. Isn't that what you wanted? :D
I have this:
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" onerror="this.src='/Content/Css/Images/no_brand_logo_120_48.gif'" alt="ADTRAN">
how to get img src (http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg)
I tried a lot of things and that was the last one:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//class='brand-logo']/img/@src)");
echo "$src";
That's not proper XPath syntax. Try
$nodes = $xpath->query("//img[@class='brand-logo']");
$src = $nodes->item(0)->getAttribute('src');
First you fetch the NODE that represents the image whose src you want, THEN you get the src attribute. Note that the ->query() call returns a DOMNodeList, not a node.
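Because the list can be empty, it may be worth guarding the `item(0)` call. A minimal sketch using the markup from the question (the `null` fallback is an assumption about how a missing image should be handled):

```php
<?php
$html = '<a href="/Dealer-Catalog/ManufacturerID-3">'
    . '<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN">'
    . '</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$nodes = $xpath->query("//img[@class='brand-logo']");
// item(0) returns null on an empty list, so check the length first.
$src = $nodes->length > 0 ? $nodes->item(0)->getAttribute('src') : null;
echo $src;
```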
Try like this
<?php
$html = '<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>';
$xml = simplexml_load_string($html);
echo $xml->img['src'];
?>
Try like this
<?php
$doc=new DOMDocument();
$doc->loadHTML('<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>');
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$images=$xml->xpath('//img');
foreach ($images as $img) {
echo $img['src'];
}?>
With XPath you can query an attribute directly; string() gives its node value:
$src = $xpath->evaluate("string(//img[@class='brand-logo']/@src)");
However I'm really sorry to say that I have no clue how that could be done with preg_match in your case ;)
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\">/', $img);
echo $src;
I want to get the src value from the img tag and maybe the alt value
This assumes you always get the img HTML as shown in the question.
The regular expression you provided says that the closing of the img tag comes right after the src attribute. But the string also has an alt attribute, so you need to account for that as well.
/<img src=\"(.*?)\".*\/>/
And if you also want to capture alt, the regular expression becomes:
/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/
Also, you are only checking whether it matched or not. To get the matches, pass a third parameter to preg_match, which will be filled with them.
$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = preg_match('/<img src=\"(.*?)\"\s*alt=\"(.*?)\"\/>/', $img, $results);
var_dump($results);
Note: the regex above is not very generic. If you can provide the img strings that will actually occur, I can provide a more robust regex.
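One way to depend less on attribute order is to look up each attribute with its own pattern. This is only a sketch, with `img_attr` as a hypothetical helper name; it still assumes quoted attribute values and a single img tag as input.

```php
<?php
// Hypothetical helper: return the value of one quoted attribute, or null.
function img_attr(string $tag, string $name): ?string
{
    // \s before the name avoids matching e.g. data-src when asking for src;
    // \1 requires the closing quote to match the opening one.
    $pattern = '/\s' . preg_quote($name, '/') . '\s*=\s*(["\'])(.*?)\1/i';
    return preg_match($pattern, $tag, $m) ? $m[2] : null;
}

$img = '<img src="http://some-img-link" alt="some-img-alt"/>';
$src = img_attr($img, 'src');
$alt = img_attr($img, 'alt');
$title = img_attr($img, 'title'); // not present in this tag
var_dump($src, $alt, $title);
```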
function scrapeImage($text) {
$pattern = '/src=[\'"]?([^\'" >]+)[\'" >]/';
preg_match($pattern, $text, $link);
$link = $link[1];
$link = urldecode($link);
return $link;
}
Tested Code :
$input = '<img src="http://www.site.com/file.png">';
preg_match('/<img.*?src=["\'](?<url>.*?)["\'].*?>/', $input, $output);
echo $output['url']; // output = http://www.site.com/file.png
How to extract img src, title and alt from html using php?
See the first answer on this post.
You are going to use preg_match, just in a slightly different way.
Try this code:
<?php
$doc = new DOMDocument();
$doc->loadHTML('<img src="" />');
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
?>
Also you could use this library: SimpleHtmlDom
<?php
$html = new simple_html_dom();
$html->load('<html><body><img src="image/profile.jpg" alt="profile image" /></body></html>');
$imgs = $html->find('img');
foreach($imgs as $img)
print($img->src);
?>
preg_match('/<img src=("|\')([^"\']+)\1[^>]*>/', $img, $matches); // $matches[2] holds the src
You already have good enough responses above, but here is another piece of code (more universal):
function retrieve_img_src($img) {
if (preg_match('/<img(\s+?)([^>]*?)src=(\"|\')([^>\\3]*?)\\3([^>]*?)>/is', $img, $m) && isset($m[4]))
return $m[4];
return false;
}
You can use jQuery to get the src and alt attributes.
Include jQuery in the header:
<script type="text/javascript"
src="http://ajax.googleapis.com/ajax/libs/jquery/1/jquery.min.js">
</script>
//HTML
//to get src and alt attributes
<script type='text/javascript'>
// src attribute of first image with id =imgId
var src1= $('#imgId1').attr('src');
var alt1= $('#imgId1').attr('alt');
var src2= $('#imgId2').attr('src');
var alt2= $('#imgId2').attr('alt');
</script>