I want to get all the images on any website using DOMDocument, but I can't even load my HTML, for reasons I don't understand yet.
$url="http://<any_url_here>/";
$dom = new DOMDocument();
#$dom->loadHTML($url); // I have also tried removing the #
$dom->preserveWhiteSpace = false;
$dom->saveHTML();
$images = $dom->getElementsByTagName('img');
foreach ($images as $image)
{
echo $image->getAttribute('src');
}
Nothing gets printed. Did I do something wrong in the code?
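You don't get a result because $dom->loadHTML() expects an HTML string, not a URL. (In your snippet that line is also commented out, so nothing is loaded at all, and $dom->saveHTML() only returns the current document as a string; it doesn't load anything.) You first need to fetch the HTML of the page you want to parse, and file_get_contents() can do that. Alternatively, $dom->loadHTMLFile($url) can load the page from a URL directly.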
I use this in my image-grabbing class, and it works fine for me.
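$html = file_get_contents('http://www.google.com/');
$dom = new DOMDocument();
$dom->preserveWhiteSpace = false; // only affects documents loaded after it is set
libxml_use_internal_errors(true); // real-world HTML is rarely valid; keep parser warnings quiet
$dom->loadHTML($html);
libxml_clear_errors();
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src'), "\n";
}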
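To make the parsing step testable without a network request, a minimal sketch can split fetching from parsing. The helper name extractImageSources() and the sample HTML are hypothetical; the helper takes an HTML string (for example the result of file_get_contents()) and returns the src values as an array instead of echoing them.

```php
<?php
// Hypothetical helper: parse an HTML string and return every <img> src value.
function extractImageSources(string $html): array
{
    $dom = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        $sources[] = $image->getAttribute('src');
    }
    return $sources;
}

// Usage: fetch the page first, then parse the returned HTML string, e.g.
// $html = file_get_contents('http://www.google.com/');
$html = '<p><img src="/logo.png"><img src="photo.jpg" alt="x"></p>';
print_r(extractImageSources($html));
```

Keeping the fetch outside the helper also makes it easy to swap file_get_contents() for another HTTP client later.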
Related
I want to get all images from a website and store them in MySQL using PHP. I am using the HTML DOM (DOMDocument), but it shows an error. Here is the error:
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity, line: 73 in C:\xampp\htdocs\Oifind\Formula.php on line 3
And here is my code:
<?php
$html = file_get_contents('http://www.google.com/');
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
$src= $image->getAttribute('src');
echo "<img src='".$src."'>";
}
?>
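That warning is common with real pages: libxml's HTML parser can emit it when the markup contains a bare & (for example in a link such as ?q=cats&hl=en) that is not written as &amp;. It is a warning, and the document is still parsed. A minimal sketch, assuming you only want to silence such warnings and still read the images, is to let libxml collect them with libxml_use_internal_errors(). The sample HTML is made up, and the MySQL storage step is not shown.

```php
<?php
// Made-up sample: the unescaped "&" in the href is the kind of markup that
// can trigger "htmlParseEntityRef: expecting ';'" warnings in libxml.
$html = '<a href="/search?q=cats&hl=en"><img src="/images/cat.png"></a>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser errors instead of emitting warnings
$dom->loadHTML($html);
$parseErrors = libxml_get_errors(); // inspect these if you need to know what was wrong
libxml_clear_errors();

$sources = [];
foreach ($dom->getElementsByTagName('img') as $image) {
    $src = $image->getAttribute('src');
    $sources[] = $src;
    // Escape the value before echoing it back into your own page.
    echo '<img src="' . htmlspecialchars($src, ENT_QUOTES) . '">', "\n";
}
```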
I have this XML, and I need to get every value of g:detailed_image. I've tried, but I only get the first value of each g:detailed_images. I wonder what's wrong with my code.
Here's the XML:
<item>
<g:detailed_images>
<g:detailed_image>hat.png</g:detailed_image>
<g:detailed_image>tie.png</g:detailed_image>
<g:detailed_image>eye_glass.png</g:detailed_image>
<g:detailed_image>watch.png</g:detailed_image>
</g:detailed_images>
</item>
<item>
<g:detailed_images>
<g:detailed_image>shoe.png</g:detailed_image>
<g:detailed_image>socks.png</g:detailed_image>
<g:detailed_image>hand_gloves.png</g:detailed_image>
<g:detailed_image>scarf.png</g:detailed_image>
</g:detailed_images>
</item>
And this is my code:
foreach($xpath->evaluate('//item') as $item)
{
$detailed_images = $xpath->evaluate('g:detailed_images', $item);
foreach ($detailed_images as $img)
{
$simg = $xpath->evaluate('string(g:detailed_image)', $img);
echo 'image = ';
echo $simg;
}
}
My result is:
image = hat.png
image = shoe.png
While what I want is this:
image = hat.png
image = tie.png
image = eye_glass.png
image = watch.png
image = shoe.png
image = socks.png
image = hand_gloves.png
image = scarf.png
Thanks for the help.
As you can see, you're only getting the first detailed_image of each detailed_images. That's because string(g:detailed_image) converts the matched node-set to the string value of its first node only. To keep your approach, you'd need to evaluate g:detailed_image without string() and loop over the resulting nodes, printing each one. But you don't need all that XPath querying to get those elements; a single query is enough:
//item/g:detailed_images/g:detailed_image
PHP Code:
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
foreach($xpath->evaluate('//item/g:detailed_images/g:detailed_image') as $item) {
var_dump($item->nodeValue);
}
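A self-contained sketch of both approaches, assuming the feed declares the g prefix on its root element. The root element name (rss) and the namespace URI http://base.google.com/ns/1.0 (the one Google Merchant feeds use) are assumptions; use whatever your document actually declares. registerNamespace() is called explicitly so the queries don't depend on automatic prefix registration. The first loop keeps the asker's nested structure but iterates the node-set returned by g:detailed_image instead of wrapping it in string().

```php
<?php
// Assumed wrapper: a root element that declares the g: namespace.
$xml = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0">
  <item>
    <g:detailed_images>
      <g:detailed_image>hat.png</g:detailed_image>
      <g:detailed_image>tie.png</g:detailed_image>
    </g:detailed_images>
  </item>
  <item>
    <g:detailed_images>
      <g:detailed_image>shoe.png</g:detailed_image>
      <g:detailed_image>socks.png</g:detailed_image>
    </g:detailed_images>
  </item>
</rss>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// Assumed URI: must match the xmlns:g declaration in your document.
$xpath->registerNamespace('g', 'http://base.google.com/ns/1.0');

// Approach 1: nested loops, iterating the node-set instead of using string().
$nested = [];
foreach ($xpath->evaluate('//item') as $item) {
    foreach ($xpath->evaluate('g:detailed_images/g:detailed_image', $item) as $img) {
        $nested[] = $img->nodeValue;
    }
}

// Approach 2: the single query from the answer.
$single = [];
foreach ($xpath->evaluate('//item/g:detailed_images/g:detailed_image') as $img) {
    $single[] = $img->nodeValue;
    echo 'image = ', $img->nodeValue, "\n";
}
```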
I'm trying to do in PHP what this jQuery code does (Facebook's Open Graph scraper doesn't execute JavaScript, so it has to be done server-side):
<script>
var captureurl = jQuery('.blog-content').find('img').attr('src');
jQuery('head').append("<meta property='og:image' content='" + captureurl + "'>");
</script>
I've seen that I can get an image attribute like this:
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXPath($doc);
$imgs = $xpath->query("//img");
for ($i = 0; $i < $imgs->length; $i++) {
    $img = $imgs->item($i);
    $src = $img->getAttribute("src");
    // do something with $src
}
?>
But how can I get the src of the first image inside the div with the blog-content class?
Thanks for your help :)
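Replace $xpath->query("//img") with a query that selects the div whose class list contains blog-content, and then the first img inside it:
$imgs = $xpath->query('(//div[contains(concat(" ", normalize-space(@class), " "), " blog-content ")]//img)[1]'); // first img anywhere inside div.blog-content
$imgs->item(0) is then that image; check $imgs->length first, since it is 0 when nothing matches.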
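A minimal sketch of the whole server-side replacement for the jQuery snippet, assuming the page HTML is already in a string (the $html below is made up; in practice it could come from loadHTMLFile($url) or file_get_contents()). The concat()/normalize-space() idiom matches blog-content as a whole class name, so a class such as blog-content-footer does not match, and the parentheses with [1] pick the first such image in document order. The helper name ogImageTag() is hypothetical.

```php
<?php
// Hypothetical helper: build an og:image <meta> tag from the first <img>
// inside a div whose class list contains "blog-content".
function ogImageTag(string $html): ?string
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $imgs = $xpath->query(
        '(//div[contains(concat(" ", normalize-space(@class), " "), " blog-content ")]//img)[1]'
    );
    if ($imgs->length === 0) {
        return null; // no image inside div.blog-content
    }
    $src = $imgs->item(0)->getAttribute('src');
    return '<meta property="og:image" content="' . htmlspecialchars($src, ENT_QUOTES) . '">';
}

$html = '<img src="/header.png"><div class="post blog-content"><p><img src="/a.jpg"><img src="/b.jpg"></p></div>';
echo ogImageTag($html), "\n";
```

The resulting tag still has to be printed inside your page's <head> so the Open Graph scraper sees it.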
I'm trying to print a list of image file names within a webpage, without the .png extension.
I only want to parse the image file names from image URLs inside div elements with class="cartoon".
Example Structure:
<div class="cartoon">
<img src="URL/images/element8/12345.png" alt="cartoon">
Desired Output: 12345
Here is the code I use to return all images:
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('URL');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
You want to do it with XPath?
How about:
.//*[contains(@class, "cartoon")]//img[not(contains(@src, "png"))]
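The XPath above leaves out .png images. If what you want is the desired output shown in the question (the file name with its extension stripped), a minimal sketch is to select the images inside div.cartoon and let pathinfo() with PATHINFO_FILENAME drop the directory and the extension. The sample HTML is made up, and the whole-word class test is an assumption about how strict the match should be.

```php
<?php
$html = '<div class="cartoon"><img src="URL/images/element8/12345.png" alt="cartoon"></div>'
      . '<div class="other"><img src="URL/images/element8/999.png"></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // tolerate sloppy real-world markup
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$names = [];
// Images anywhere inside a div whose class list contains "cartoon".
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " cartoon ")]//img';
foreach ($xpath->query($query) as $img) {
    // pathinfo() strips the directory part and the extension.
    $names[] = pathinfo($img->getAttribute('src'), PATHINFO_FILENAME);
}
echo implode("\n", $names), "\n";
```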
I have an image in some HTML. I parse it with DOMDocument and start working with it:
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
$container = $img->parentNode;
if ($container->tagName != "a") {
$image_inside=utf8_decode($img->nodeValue);
echo "3".$image_inside;
die;
}
}
This code partly works: getElementsByTagName() finds the image, and the tagName check detects that there is no "a" tag around this "img" tag, so the echo should print my original image. But I only see "3", without the image tag or anything else.
When I inspect the element, nothing is there; only "3" comes out. Why can't I print the image?
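$img->nodeValue is the text content of the node, and an <img> element has no text content, so it is an empty string. To print the element itself, serialize the node instead:
DOMDocument::saveXML($img);
(from the PHP documentation for saveXML()):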
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
$container = $img->parentNode;
if ($container->tagName != "a") {
echo utf8_decode($doc->saveXML($img));
die;
}
}
If you're using PHP 5.3.6 or later, you can pass the node to saveHTML() instead (from "How to return outer html of DOMDocument?"):
$doc->saveHTML($img);
Note the caveat mentioned in the linked-to question:
(...) use saveXml(), but that would create XML compliant markup. In the case of an <a>(<img>) element, that shouldn't be an issue though.
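A small check of the explanation, using made-up markup: nodeValue on an <img> is empty because the element has no text content, while saveHTML($node) (PHP 5.3.6+) and saveXML($node) serialize the element itself. For a void element like <img>, the visible difference is the self-closing slash that saveXML() produces.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<p><img src="cat.png" alt="cat"></p>');
$img = $doc->getElementsByTagName('img')->item(0);

var_dump($img->nodeValue); // string(0) "": <img> has no text content

$html = $doc->saveHTML($img); // HTML serialization of just this node
$xml  = $doc->saveXML($img);  // XML serialization (self-closing tag)
echo $html, "\n", $xml, "\n";
```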