I have this:
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" onerror="this.src='/Content/Css/Images/no_brand_logo_120_48.gif'" alt="ADTRAN">
How do I get the img src (http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg)?
I tried a lot of things and that was the last one:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//class='brand-logo']/img/@src)");
echo "$src";
That's not proper XPath syntax. Try
$nodes = $xpath->query("//img[@class='brand-logo']");
$src = $nodes->item(0)->getAttribute('src');
First you fetch the NODE that represents the image whose src you want, THEN you get the src attribute. Note that the ->query() call returns a DOMNodeList, not a node.
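A minimal sketch of that two-step approach, with a guard for the case where nothing matches. The helper name firstBrandLogoSrc and the sample $html string are made up for illustration; only the DOM calls come from the answer above.

```php
<?php
// Hypothetical helper: returns the src of the first img.brand-logo, or null.
function firstBrandLogoSrc(string $html): ?string
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Step 1: fetch the node list (a DOMNodeList, not a single node).
    $nodes = $xpath->query("//img[@class='brand-logo']");
    if ($nodes === false || $nodes->length === 0) {
        return null; // nothing matched
    }

    // Step 2: read the attribute from the first node.
    $img = $nodes->item(0);
    return $img instanceof DOMElement ? $img->getAttribute('src') : null;
}

// Sample markup (assumed, not the real page).
$html = '<a href="/x"><img class="brand-logo" src="http://example.com/logo.jpg" alt="ADTRAN"></a>';
echo firstBrandLogoSrc($html), PHP_EOL;
```

Checking the length first avoids calling getAttribute() on null when the page has no such image.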
Try like this
<?php
$html = '<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>';
$xml = simplexml_load_string($html);
echo $xml->img['src'];
?>
Try like this
<?php
$doc=new DOMDocument();
$doc->loadHTML('<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>');
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$images=$xml->xpath('//img');
foreach ($images as $img) {
echo $img['src'];
}?>
With XPath you can query an attribute directly; string() gives its node value:
$src = $xpath->evaluate("string(//img[@class='brand-logo']/@src)");
However I'm really sorry to say that I have no clue how that could be done with preg_match in your case ;)
I would like to replace the empty alt attributes on images in a string. I have a string that contains all the text for a certain page. The text also contains images, and a lot of them have empty alt attributes (old data), but most of the time they do have title attributes.
For example:
<img src="assets/img/test.png" alt="" title="I'am a title tag" width="100" height="100" />
What I wish to have:
<img src="assets/img/test.png" title="I'am a title tag" alt="I'am a title tag" width="100" height="100" />
So:
I need to find all the images in my string, loop through them, find the title and alt attributes, and replace each empty alt with the title when the title has a value.
This is what i tried:
preg_match_all('/<img[^>]+>/i',$return, $text);
if(isset($text)) {
foreach( $text as $itemImg ) {
foreach( $itemImg as $item ) {
$array = array();
preg_match( '/title="([^"]*)"/i', $item, $array );
if(isset($array[1])) {
//So $array[1] is a title tag, now what?
}
}
}
}
I don't know how to complete the code, and I think there must be an easier fix for this. Suggestions?
Using regex is not a good approach; you should use DOMDocument for parsing HTML. Here we query only those img elements whose alt attribute is empty, which is what the question asks for.
Try this code snippet:
<?php
ini_set('display_errors', 1);
$string=<<<HTML
<img src="assets/img/test1.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test2.png" alt="" title="I'am a title tag" width="100" height="100" />
<img src="assets/img/test3.png" alt="" title="I'am a title tag" width="100" height="100" />
HTML;
$domDocument = new DOMDocument();
$domDocument->loadHTML($string,LIBXML_HTML_NODEFDTD);
$domXPath = new DOMXPath($domDocument);
$results = $domXPath->query('//img[@alt=""]');
foreach($results as $result)
{
$title=$result->getAttribute("title");
$result->setAttribute("alt",$title);
echo $domDocument->saveHTML($result);
echo PHP_EOL;
}
Maybe you could use JavaScript with jQuery for this kind of thing, like:
$('img').each(function(){
$(this).attr('alt', $(this).attr('title'));
});
hope it helps
Regards.
What you want here is an HTML parser library that can manipulate HTML and then save it again. By using regular expressions to modify HTML markup, you're setting yourself up for a mess.
The DOM module built into PHP offers this functionality: http://php.net/manual/en/book.dom.php
Here's an example (cribbed from this article):
$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$image->setAttribute('src', 'http://example.com/' . $image->getAttribute('src'));
}
$html = $dom->saveHTML();
You can use DOMDocument to do what you need. Here is some sample code for reference:
<?php
$html = 'test';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query("//a[contains(concat(' ', normalize-space(@rel), ' '), ' external ')]");
foreach($nodes as $node) {
$node->setAttribute('href', 'http://example.org');
}
?>
Please try it below:
function img_title_in_alt($full_img_tag){
$doc = new DOMDocument();
$doc->loadHTML($full_img_tag);
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
if($tag->getAttribute('src')!==''){
return '<img src="'.$tag->getAttribute('src').'" width="'.$tag->getAttribute('width').'" height="'.$tag->getAttribute('height').'" alt="'.$tag->getAttribute('title').'" title="'.$tag->getAttribute('title').'" />';
}
}
}
Now call the function with your full html tag of image. See the example:
$image = '<img src="assets/img/test.png" alt="" title="I\'am a title tag" width="100" height="100" />';
print img_title_in_alt($image);
Let me know if you do not understand anything.
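For the original goal (fix every image in a longer string and get the string back), something along these lines might work. It is only a sketch: the function name fillEmptyAlts and the wrapper id fragment-root are invented here, and it assumes the input is a reasonably well-formed ASCII or ISO-8859-1 fragment, since loadHTML() does not assume UTF-8 by default.

```php
<?php
// Hypothetical helper: copies title into alt wherever alt is empty or missing.
function fillEmptyAlts(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in one known root so it can be serialized back out
    // without the <html>/<body> tags that loadHTML() adds.
    $dom->loadHTML('<div id="fragment-root">' . $html . '</div>', LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//img[not(@alt) or @alt=""]') as $img) {
        $title = $img->getAttribute('title');
        if ($title !== '') {
            $img->setAttribute('alt', $title); // only overwrite when a title exists
        }
    }

    // Serialize just the children of the wrapper div.
    $root = $xpath->query('//div[@id="fragment-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$text = 'Intro <img src="assets/img/test.png" alt="" title="A title" width="100" height="100" />'
      . ' <img src="assets/img/other.png" alt="kept" title="ignored" />';
echo fillEmptyAlts($text), PHP_EOL;
```

Images that already have a non-empty alt are left alone, and an image with neither alt text nor title stays unchanged.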
I am using a script to get src of a <img> with class="cover-image"
The webpage is of a Google Playstore page.
Here's the script:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://play.google.com/store/apps/details?id=com.igg.castleclash');
libxml_clear_errors();
$xp = new DOMXPath($dom);
$image_src = $xp->query("//img[@class='cover-image']/@src");
foreach($image_src as $attr) {
echo $attr->value. "<br/>";
}
Issue is, there's only one <img> tag with class name cover-image, but I am still getting 15 src values.
If your intent is to just get the first one, then you can add this on the xpath query:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://play.google.com/store/apps/details?id=com.igg.castleclash');
libxml_clear_errors();
$xp = new DOMXPath($dom);
$image_src = $xp->evaluate("string(//img[@class='cover-image'][1]/@src)");
echo $image_src;
echo "<img src='$image_src' alt='' />";
Also, if you want that cover image that's on the topmost portion of the site (near the header part), you could just point it directly to it:
$image_src = $xp->evaluate("
string(
//div[@class='details-info']
/div[@class='cover-container']
/img[@class='cover-image']/@src
)
"); // much more specific
Are you sure there should be only 1 tag? I see 15 tags with this class in html code.
I am having trouble retrieving the src of an image that is part of a link. For example, here I would like to retrieve the src of the img between the <a> tags.
<img src="http://example.com/picture1234.jpg" id="pic_1234" />
I will need to do this for a couple of the links on the page that are all laid out the same. So what I tried so far is this:
$dom = new DOMDocument;
@$dom->loadHTML($html);
$i = 0;
$links = $dom->getElementsByTagName('a');
//Get images
foreach ($links as $link){
$test = $link->getAttribute('href');
if (strpos($test,'/video') !== false) {
$XV_IMG[$i] = $link->nodeValue;
$i++;
}
}
If the link contains plain text rather than only an img tag, this works just fine. Is there any way to get the src?
Just keep using getElementsByTagName on the node like this
foreach ($link->getElementsByTagName('img') as $img) {
$XV_IMG[] = $img->getAttribute('src');
}
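Put together with the href filter from the question, the whole thing could look roughly like this. The function name imageSourcesInLinks and the sample hrefs are assumptions for the sake of a runnable example.

```php
<?php
// Hypothetical helper: collects the src of every img nested inside a link
// whose href contains $needle.
function imageSourcesInLinks(string $html, string $needle): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // instead of the @ operator
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        if (strpos($link->getAttribute('href'), $needle) === false) {
            continue; // not one of the links we care about
        }
        // getElementsByTagName() also works on an element, searching its descendants.
        foreach ($link->getElementsByTagName('img') as $img) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

// Sample markup; the href values are made up.
$html = '<a href="/video/1"><img src="http://example.com/picture1234.jpg" id="pic_1234" /></a>'
      . '<a href="/about">About</a>';
print_r(imageSourcesInLinks($html, '/video'));
```

nodeValue only returns text content, which is why it comes back empty for a link that holds nothing but an image.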
try to use preg_match_all
$html= '<img src="http://example.com/picture1234.jpg" id="pic_1234" />
<img src="http://example.com/picture1224.jpg" id="pic_1224" />
<img src="http://example.com/picture1434.jpg" id="pic_1434" />
<img src="http://example.com/picture1554.jpg" id="pic_1554" />
<img src="http://example.com/picture1334.jpg" id="pic_1334" />';
preg_match_all('/<a href="(.*)"><img src="(.*)" id="pic_[0-9]{1,7}" \/><\/a>/i',$html,$out);
unset($out[0]);
unset($out[1]);
print_r($out);
Tasks:
All occurrences of desktop in the image paths must be replaced with the value of the constant DEVICE, which I have already predefined in another function, BUT nothing should be replaced in code wrapped within <pre> and <code> tags.
This task need to be accomplished without the use of any JavaScript.
Sample Mark-up:
<div class="wrapper">
<img src="img/demo/desktop-img/sample1.jpg" width="200" height="200" alt="image">
<img src="img/demo/desktop-img/sample2.jpg" width="200" height="200" alt="image">
<pre>
<img src="img/demo/desktop-img/sample1.jpg" width="200" height="200" alt="image">
<img src="img/demo/desktop-img/sample2.jpg" width="200" height="200" alt="image">
</pre>
<code>
<img src="img/demo/desktop-img/sample1.jpg" width="200" height="200" alt="image">
<img src="img/demo/desktop-img/sample2.jpg" width="200" height="200" alt="image">
</code>
</div>
I tried the following but it is replacing the occurrences within the <pre> and <code> tags too and that is exactly what should not happen. I am using PHP as my programming language.
$content = str_replace('/desktop-img/', '/' . DEVICE . '-img/', $content);
I even tried the following but no luck
<?php
function changeImagePaths ($content) { // content here comes as string via some other function.
$dom = new DOMDocument;
$dom->loadHTML($content);
$nodes = $dom->getElementsByTagName('[self::img][not(ancestor::pre) and not(ancestor::code)]');
foreach( $nodes as $node ) {
// What is the correct way to retrieve all image-paths
// that are not wrapped within <pre> or <code> tags and how to use them in the code?
$node->nodeValue = str_replace('desktop-img', DEVICE, $node->nodeValue);
}
$content = $dom->saveHTML();
return $content;
}
?>
Challenge:
I think it should be possible using the DOM method but I am not able to figure out the correct syntax.
I am very new to programming and still learning. Please be gentle and illustrate your answer with code examples for easy understanding.
My Questions:
What should be the correct syntax for DOM method?
Is using DOM the correct way?
Are there going to be any challenges / performance hits, using the DOM method?
Is there any other better way of doing it without using JavaScript?
This is quick work, so it may not look nice, but hopefully you get the idea.
function walkDOM($node)
{
if($node->nodeName=="pre" || $node->nodeName=="code")
{
return;
}
elseif($node->nodeName=="img")
{
$node->attributes->getNamedItem("src")->nodeValue=str_replace('desktop-img','mobile-img',$node->attributes->getNamedItem("src")->nodeValue);
}
elseif($node->hasChildNodes())
{
foreach($node->childNodes as $child)
{
walkDOM($child);
}
}
}
function changeImagePath($html)
{
$dom=new DOMDocument;
$dom->preserveWhiteSpace=true;
$dom->loadHTML($html);
$root=$dom->documentElement;
walkDOM($root);
$dom->formatOutput=false;
return $dom->saveHTML($root);
}
The idea is to walk the DOM tree recursively, skip every <pre> and <code>, and change every <img> encountered along the way.
This is pretty straightforward, but you may notice that because the input is treated as HTML, DOM automatically adds some tags to "complete" it and formats it in a (IMO) rather strange manner.
Xpath is your friend:
$xpath = new DOMXpath($dom);
foreach($xpath->query('//img[not(ancestor::pre) and not(ancestor::code)]') as $img){
$img->setAttribute('src', 'foo');
}
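To tie that query back to the question, here is a rough sketch that rewrites the path instead of setting 'foo'. The function name replaceDeviceInImagePaths is invented, and the $device parameter stands in for the DEVICE constant from the question.

```php
<?php
// Hypothetical helper; $device stands in for the DEVICE constant.
function replaceDeviceInImagePaths(string $content, string $device): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Every img that has no <pre> or <code> ancestor.
    $query = '//img[not(ancestor::pre) and not(ancestor::code)]';
    foreach ($xpath->query($query) as $img) {
        $src = $img->getAttribute('src');
        $img->setAttribute('src', str_replace('/desktop-img/', '/' . $device . '-img/', $src));
    }

    // Note: saveHTML() returns a full document (doctype, <html>, <body>).
    return $dom->saveHTML();
}

$content = '<div class="wrapper">'
         . '<img src="img/demo/desktop-img/sample1.jpg" alt="image">'
         . '<pre><img src="img/demo/desktop-img/sample2.jpg" alt="image"></pre>'
         . '</div>';
echo replaceDeviceInImagePaths($content, 'mobile');
```

Note that getElementsByTagName() only accepts a tag name, so the XPath expression has to go through DOMXPath::query() as shown.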
$path1='C:/ff/ss.html';
$file=file_get_contents($path1);
$dom = new DOMDocument;
@$dom->loadHTML($file);
$links = $dom->getElementsByTagName('img');
//extract img src path from the html page
foreach ($links as $link) {
$re= $link->getAttribute('src');
$a[]=$re;
}
$oldpth=explode('/',$path1);
$c=count($oldpth)-1;
$fname=$oldpth[$c];
$pth=array_slice($oldpth,0,$c);
$cpth=implode('/',$pth);
foreach($a as $v) {
if(is_file ($cpth.'/'.$v)) {
$c=explode('/',$v);
$c[0]="xyz007";
$f=implode('/',$c);
$file=str_replace ($v,$f,$file);
}
}
$path2='D:/mail/newpath/';
$wnew=fopen($path2.$fname,'w+');
fwrite($wnew,$file);
I am writing a little scraper script that will find the image URL that has a particular class name. I know that my cURL and DOMDocument code is working, and even the DOMXPath (as far as I can tell, there are no errors), but I am struggling to work out how to get the URL out of the XPath query results.
My code so far:
$dom = new DOMDocument();
@$dom->loadHTML($x);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="productImage"]');
var_dump($div);
echo $div->item(0);
If I var_dump($x) the page outputs no problem. So the CURL is working fine. But I do not know how to get the data that is contained in the $div. I am trying to find an Image with a class of 'productImage' which looks like:
<img src="/uploads/5W/yP/5WyPP4l7Z-jmZRzu_MJ6zg/1077-d.jpg" border="1" alt="Album" class="productImage">
I want the source of that image tag.
Any suggestions?
$dom = new DOMDocument();
$dom->loadHTML($x);
$xpath = new DomXpath($dom);
$imgs = $xpath->query('//*[@class="productImage"]');
foreach($imgs as $img)
{
echo 'ImgSrc: ' . $img->getAttribute('src') .'<br />' . PHP_EOL;
}
Try that...
== EDIT: Additional Info ==
The reason I use a loop here is that you may find more than one img. If you know there is only one element (or you only want the first DOM node found), you can access the element from the DOMNodeList via its item() method, like so:
$dom = new DOMDocument();
$dom->loadHTML($x);
$xpath = new DomXpath($dom);
$img = $xpath->query('//*[@class="productImage"]');
echo 'ImgSrc: ' . $img->item(0)->getAttribute('src') .'<br />' . PHP_EOL;
You don't actually need to use XPath here, because it seems that you're just after images and that can be done by using DOMDocument::getElementsByTagName(), followed by a simple filter:
foreach ($dom->getElementsByTagName('img') as $image) {
$class = $image->getAttribute('class');
if (strpos(" $class ", " productImage ") !== false) {
$url = $image->getAttribute('src');
// do stuff
}
}
Then, you can get the src attribute by using DOMElement::getAttribute():
echo $image->getAttribute('src');
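If you do want XPath, the same "class list contains this token" check can be expressed in the query itself. This is a sketch only: imageSourcesByClass is a made-up name, and it assumes $class is a plain class name with no quotes in it, since it is interpolated straight into the expression.

```php
<?php
// Hypothetical helper: src of every img whose class attribute contains the
// token $class. Assumes $class has no quote characters.
function imageSourcesByClass(string $html, string $class): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    // Pad the class list with spaces so only whole tokens match.
    $query = sprintf(
        "//img[contains(concat(' ', normalize-space(@class), ' '), ' %s ')]",
        $class
    );

    $sources = [];
    foreach ($xpath->query($query) as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Sample markup, made up for the example.
$page = '<img class="thumb productImage" src="/a.jpg" alt="Album">'
      . '<img class="productImageLarge" src="/b.jpg" alt="Other">';
print_r(imageSourcesByClass($page, 'productImage'));
```

Unlike @class="productImage", this still matches when the element carries additional classes, and it does not match longer names that merely start with the same text.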