Get image source from html dom element - php

I am querying images using getElementsByTagName("img") and printing them with $image->src, but it does not work. I also tried $image->nodeValue, and that does not work either.
require('simple_html_dom.php');
$dom=new DOMDocument();
$dom->loadHTML( $str); /*$str contains html output */
$xpath=new DOMXPath($dom);
$imgfind=$dom->getElementsByTagName('img'); /*finding elements by tag name img*/
foreach($imgfind as $im)
{
echo $im->src; /*this doesnt work */
/*echo $im->nodeValue; and also this doesnt work (i tried both of them separately ,Neither of them worked)*/
// echo "<img src=".$im->nodeValue."</img><br>"; //This also did not work
}
/* the image is enclosed within div tags, so I tried to query the div and print its value, but the image still was not printed */
$printimage=$xpath->query('//div[@class="abc"]');
foreach($printimage as $image)
{
echo $image->src; //still i could not accomplish my task
}

Okay, use this to display your image:
foreach($imgfind as $im)
{
echo '<img src="'.$im->getAttribute('src').'" />'; //use this instead of echo $im->src;
}
This will display your image, provided the path to the image is correct.
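The question's second attempt queried the div itself rather than the image inside it. The following is a minimal sketch, assuming some sample markup in place of the real $str, of selecting the img elements inside the div with XPath and reading src through getAttribute(). The markup, the variable $sources, and the htmlspecialchars() escaping are illustrative additions, not part of the original posts.

```php
<?php
// Hypothetical sample markup standing in for $str from the question.
$str = '<div class="abc"><img src="images/a.png" alt="A"><img alt="no source"></div>';

$dom = new DOMDocument();
$dom->loadHTML($str);
$xpath = new DOMXPath($dom);

$sources = [];
// Select img elements inside the div, keeping only those that carry a src attribute.
foreach ($xpath->query('//div[@class="abc"]//img[@src]') as $im) {
    // DOMElement exposes HTML attributes through getAttribute(), not as properties.
    $sources[] = $im->getAttribute('src');
}

foreach ($sources as $src) {
    // Escape the value before writing it back into markup.
    echo '<img src="' . htmlspecialchars($src) . '">' . "\n";
}
```

Note that [@class="abc"] matches only when the class attribute is exactly "abc"; an element with several classes would need a different predicate.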

Hope this helps.
$dom = new DOMDocument();
$filename = "https://www.amazon.com/dp/B0896WB9XD/";
$html = file_get_contents($filename);
@$dom->loadHTML($html);
$imgfind=$dom->getElementsByTagName('img');
foreach($imgfind as $im)
{
$ids= $im->getAttribute('id');
if ($ids == 'landingImage') {
$im2 = $im->getAttribute('src');
echo '<img src="'.$im2.'">';
}
else{
}
}
This one is for Amazon.

Related

Can't append img element using php domdocument

I'm having a weird issue trying to append an image element to a noscript element using php DomDocument.
If I create a new div node I can append it to the noscript element without issue, but as soon as I try to append an image element the script just times out.
What am I doing wrong?
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
foreach ($images as $image) {
$src = $image->getAttribute('src');
$noscript = $doc->createElement('noscript');
$node = $doc->createElement('div');
//$node = $doc->createElement('img'); If I uncomment this line the script just times out
$node->setAttribute('src', $src);
$noscript->appendChild($node);
$image->setAttribute('x-data-src', $src);
$image->removeAttribute('src');
$image->parentNode->appendChild($noscript);
//$image->parentNode->appendChild($newImage);
}
$body = $doc->saveHTML();
echo $body;
You're getting caught in a recursive loop. This will help you visualize what's going on. I've added indenting for clarity:
php > $html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
php >
php > $doc = new DOMDocument();
php > $doc->loadHTML($html);
php >
php > $images = $doc->getElementsByTagName('img');
php >
php > $count=0;
php > foreach ($images as $image) {
php { $count++;
php { if($count>4) {
php { die('limit exceeded');
php { }
php {
php { $src = $image->getAttribute('src');
php { $noscript = $doc->createElement('noscript');
php {
php { //$node = $doc->createElement('div');
php { $node = $doc->createElement('img'); // with this line active the script would time out without the counter
php {
php { $node->setAttribute('src', $src);
php {
php { $noscript->appendChild($node);
php {
php { $image->setAttribute('x-data-src', $src);
php { $image->removeAttribute('src');
php { $image->parentNode->appendChild($noscript);
php { //$image->parentNode->appendChild($newImage);
php {
php { }
limit exceeded
php > $body = $doc->saveHTML();
php >
php > echo $body;
<!DOCTYPE html>
<html><head><title>Sample</title></head><body>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img x-data-src="https://example.com/images/example.jpg">
<noscript>
<img src="https://example.com/images/example.jpg">
</noscript>
</noscript>
</noscript>
</noscript>
</body></html>
php >
The troublesome line causing the recursion is
$image->parentNode->appendChild($noscript);
If you comment that out, the recursion goes away. Notice that when it recurses, the x-data-src attribute is applied to every img except the last one.
I haven't quite figured out what is causing this behaviour, but hopefully being able to visualize it will help you diagnose it further.
UPDATE:
The OP took this and ran with it, and completed the answer with his solution as shown below.
The problem was in fact that getElementsByTagName returns a live DOMNodeList, so appending an img element to the document inside the loop adds it to the list being iterated, and the loop never finishes.
I solved it by first collecting all the image tags in a simple array:
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);
$images = $doc->getElementsByTagName('img');
$normal_array = [];
foreach ($images as $image) {
$normal_array[] = $image;
}
// Now we have all tags in a simple array NOT in a Live Node List
foreach ($normal_array as $image) {
$src = $image->getAttribute('src');
$noscript = $doc->createElement('noscript');
$node = $doc->createElement('img'); // safe now: we iterate over the array copy, not the live list
$node->setAttribute('src', $src);
$noscript->appendChild($node);
$image->setAttribute('x-data-src', $src);
$image->removeAttribute('src');
$image->parentNode->appendChild($noscript);
//$image->parentNode->appendChild($newImage);
}
$body = $doc->saveHTML();
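The manual copy loop above can be shortened. As a sketch of the same idea, iterator_to_array() takes the snapshot in one call, since DOMNodeList is traversable. The variable names $fallback and $imgCount are illustrative and not from the original answer.

```php
<?php
$html = '<!DOCTYPE html><html><head><title>Sample</title></head><body><img src="https://example.com/images/example.jpg"></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($html);

// iterator_to_array() copies the live node list into a plain array snapshot.
$images = iterator_to_array($doc->getElementsByTagName('img'));

foreach ($images as $image) {
    $src = $image->getAttribute('src');

    // Build <noscript><img src="..."></noscript> as the fallback.
    $noscript = $doc->createElement('noscript');
    $fallback = $doc->createElement('img');
    $fallback->setAttribute('src', $src);
    $noscript->appendChild($fallback);

    // Move the original src to x-data-src, as in the question.
    $image->setAttribute('x-data-src', $src);
    $image->removeAttribute('src');
    $image->parentNode->appendChild($noscript);
}

// The document now holds the original img plus the one inside noscript.
$imgCount = $doc->getElementsByTagName('img')->length;
echo $doc->saveHTML();
```

The snapshot is fixed at the moment it is taken, so img elements created inside the loop are never visited.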

How Can I get a link from each HTML document in a directory and display it?

What I have so far:
<?php
$html = file_get_contents('content/');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node)
{
echo $node->nodeValue.': '.$node->getAttribute("href")."\n";
}
?>
I have a directory called 'content' that has several HTML documents in it. Edit: Each document has one link in it, wrapped around an image. I want to parse each document and display the link from each page as an image. Would I need a loop to step through each document?
You can try something like this:
foreach (glob("content/*.html") as $filename) {
$html = file_get_contents($filename);
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementsByTagName('a') as $node) {
echo $node->nodeValue.': '.$node->getAttribute("href")."\n";
}
}
Well, Andrej Ludinovskov's answer helped guide me to the solution, but it took a lot of trial and error, so here it is: how to fetch all the links as images.
foreach ($dom->getElementsByTagName('a') as $link) {
echo '<a href="' .$link->getAttribute("href"). '">';
// look for images inside this link only, not in the whole document
foreach ($link->getElementsByTagName('img') as $img) {
echo '<img src="'.$img->getAttribute('src').'">';
}
echo '</a>';
}
Hopefully this can help someone else.
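Putting the two answers together, the per-file loop and the link-as-image output might be sketched as follows. The helper name linkedImages() is hypothetical, and the directory name is taken from the question; treat this as one possible arrangement rather than the posters' exact code.

```php
<?php
// Hypothetical helper: returns [href, src] pairs for every <a> that wraps an <img>.
function linkedImages(string $html): array
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $pairs = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        // Search inside this link only, not the whole document.
        foreach ($link->getElementsByTagName('img') as $img) {
            $pairs[] = [$link->getAttribute('href'), $img->getAttribute('src')];
        }
    }
    return $pairs;
}

// glob() may return false on error, so fall back to an empty array.
foreach (glob('content/*.html') ?: [] as $filename) {
    foreach (linkedImages(file_get_contents($filename)) as [$href, $src]) {
        echo '<a href="' . htmlspecialchars($href) . '">'
            . '<img src="' . htmlspecialchars($src) . '"></a>' . "\n";
    }
}
```

Keeping the parsing in a function that takes a string makes it easy to try on a single document before running it over the whole directory.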

Image pulling script not working PHP

Not sure why the code below is not working. It displays the "else" branch of the if statement, basically saying that there are no IMG tags on the page, but I'm sure they are there. Any advice or guidance will be appreciated.
// This variable will contain all the HTML source code of the sample page
$htmlContent = file_get_contents('https://www.instagram.com/ken_flavius/');
var_dump($htmlContent);
// We'll add all the images in this array
$images = [];
// Instantiate a new object of class DOMDocument
$doc = new DOMDocument();
// Load the HTML doc into the object
$doc->loadHTML($htmlContent);
// Get all the IMG tags in the document
$elements = $doc->getElementsByTagName('img');
// If we get at least one result
if($elements->length > 0)
{
// Loop on all of the IMG tags
foreach($elements as $element)
{
// Get the attribute SRC of the IMG tag (this is the link of the image)
$src = $element->getAttribute('src');
if (strlen($src) > 0) {
// Add the link to the array containing all the links
array_push($images, $src);
}
}
//show all links
echo '<pre>'."\r\n";
print_r($images);
echo '</pre>'."\r\n";
} else {
// No result, it means that there were no IMG tags
echo 'no img tag found in the HTML source provided!';
}
Edited it to show the exact example that im using.
$url="http://example.com";
$html = file_get_contents($url);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$tags = $doc->getElementsByTagName('img');
foreach ($tags as $tag) {
echo $tag->getAttribute('src');
}
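The snippet above silences parser warnings with the @ operator. A sketch of an alternative is to let libxml collect the messages with libxml_use_internal_errors(), so they can be inspected or discarded explicitly. The sample markup and the variable names here are assumptions for illustration. Separately, if a page inserts its images with JavaScript after loading (a possibility, not something confirmed in this thread), the HTML returned by file_get_contents() would contain no img tags for either approach to find.

```php
<?php
// Hypothetical markup with a non-standard tag that libxml may report as a parse error.
$html = '<html><body><custom-tag><img src="a.jpg"></custom-tag><img src="b.jpg"></body></html>';

// Collect parser messages internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Keep the messages for inspection, then restore the previous setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$sources = [];
foreach ($doc->getElementsByTagName('img') as $tag) {
    $sources[] = $tag->getAttribute('src');
}

print_r($sources);
echo count($errors) . " parser message(s) collected\n";
```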

DOMDocument grab html between two p tags [duplicate]

I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
$url = $link->getAttribute("href");
if(strpos($url, ".flv"))
{
echo $link->outerHTML();
}
}
Unfortunately, outerHTML doesn't work when I'm trying to get the html code for the full hyperlink like <a href='http://www.myurl.com/video.flv'></a>
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not implement that possibility. You'd have to use saveXml(), but that would create XML compliant markup. In the case of an <a> element, that shouldn't be an issue though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
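Applied to the question's loop, passing the node to saveHTML() might look like the sketch below. The sample $content and the $matches array are illustrative assumptions; the explicit `!== false` comparison is added because strpos() returns 0 for a match at the start of the string.

```php
<?php
// Hypothetical sample content standing in for $content from the question.
$content = '<p><a href="http://www.myurl.com/video.flv">Video</a> <a href="page.html">Page</a></p>';

$doc = new DOMDocument();
$doc->loadHTML($content);

$matches = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    $url = $link->getAttribute('href');
    // strpos() can return 0, so compare against false explicitly.
    if (strpos($url, '.flv') !== false) {
        // Passing a node to saveHTML() serializes just that node (PHP 5.3.6+).
        $matches[] = $doc->saveHTML($link);
    }
}

print_r($matches);
```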
You can find a couple of suggestions in the user notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom :
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
// (originally a protected class method; the modifier is dropped so it runs standalone)
function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outer tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
?>
A simple solution is to define your own function that returns the outer HTML:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
Then you can use it in your code:
echo outerHTML($link);
Save a file that contains the links as links.html, or change links.html in the code to a URL such as google.com/fly.html that has .flv links in it. Change flv to wmv, etc., depending on which links you want. If there are other href attributes, the query will pick them up as well.
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preserveWhiteSpace = false;
$domdoc->loadHTML($contents);
$xpath = new DOMXpath($domdoc);
$query = '//@href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node){
if(strpos($node->nodeValue, ".flv") !== false){
$linksList = $node->nodeValue;
$htmlAnchor = new DOMElement("a", $linksList);
$htmlURL = new DOMAttr("href", $linksList);
$domdoc->appendChild($htmlAnchor);
$htmlAnchor->appendChild($htmlURL);
$domdoc->saveHTML();
echo ("<a href='". $node->nodeValue. "'>". $node->nodeValue. "</a><br />");
}
}
echo("done");
?>

Regex Replacement Dependent On Class

I have the following code that replaces all <img> tags on a page and adds the nCode image resizer to them. The code is as follows:
function ncode_the_content($content) {
return preg_replace("/<img([^`|>]*)>/im", "<img onload=\"NcodeImageResizer.createOn(this);\"$1>", $content);
}
What I need to do is make it so that if an image has the class "noresize", the preg_replace skips it.
I have only managed to get it so that if there is the "noresize" class anywhere on the page it stops resizing all images instead of just the one with the correct class.
Any suggestions?
UPDATE:
Am I even remotely in the right ballpark with this?
function ncode_the_content($content) {
//Load the HTML page
$html = file_get_contents($content);
//Parse it. Here we use loadHTML as a static method
//to parse the HTML and create the DOM object in one go.
@$dom = DOMDocument::loadHTML($html);
//Init the XPath object
$xpath = new DOMXpath($dom);
//Query the DOM
$linksnoresize = $xpath->query( 'img[@class = "noresize"]' );
$links = $xpath->query( 'img[]' );
//Display the results as in the previous example
foreach($links as $link){
echo $link->getAttribute('onload'), 'NcodeImageResizer.createOn(this);';
}
foreach($linksnoresize as $link){
echo $link->getAttribute('onload'), '';
}
}
Here's some untested code:
$dom = new DOMDocument();
$dom->loadHTML($content); // calling loadHTML() statically is an error in PHP 8
$images = $dom->getElementsByTagName("img");
foreach ($images as $image) {
if (!strstr($image->getAttribute("class"), "noresize")) {
$image->setAttribute("onload", "NcodeImageResizer.createOn(this);");
}
}
But, if it were me, I would eschew any such inline event handler and instead just find the appropriate elements with JavaScript.
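For completeness, here is a sketch of how the DOM approach above could be wrapped into the original ncode_the_content() filter so that it returns the modified markup. It assumes $content is an HTML fragment, wraps it in a temporary div to get it back without the doctype, html, and body that loadHTML() adds, and matches "noresize" as a whole class name. These choices are assumptions of this sketch, not part of the answer above.

```php
<?php
function ncode_the_content(string $content): string
{
    // Collect parser messages quietly; fragments often trigger warnings.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    // Assumption: wrap the fragment so its nodes can be found again afterwards.
    $dom->loadHTML('<div>' . $content . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('img') as $image) {
        // Split the class attribute so only the exact name "noresize" matches.
        $classes = preg_split('/\s+/', trim($image->getAttribute('class')));
        if (!in_array('noresize', $classes, true)) {
            $image->setAttribute('onload', 'NcodeImageResizer.createOn(this);');
        }
    }

    // Serialize only the wrapper's children, dropping doctype/html/body.
    $wrapper = $dom->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = ncode_the_content('<p><img src="a.jpg"><img class="thumb noresize" src="b.jpg"></p>');
echo $result;
```

Setting an attribute does not add or remove nodes, so iterating the live list directly is fine here, unlike the noscript case where elements are appended during the loop.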
I ended up just using pure CSS and adding a <div> around the images I didn't want to be resized. I forced the width and height of that div back to auto and then removed the warning message that was displayed above them. Seems to work fine. Thanks for your help :)
