Printing out HTML content from a DOMElement using nodeValue - PHP

I have an image in HTML. I parse it into a DOMDocument and start working with it:
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
    $container = $img->parentNode;
    if ($container->tagName != "a") {
        $image_inside = utf8_decode($img->nodeValue);
        echo "3" . $image_inside;
        die;
    }
}
This code works fine: line 3 gets the images, line 6 correctly detects that there is no "a" tag around this "img" tag, and line 8 should print out my original image. But all I see is "3", without the image tag or anything else. I inspected the element and nothing is there, just "3". Why can't I print out the image?

nodeValue returns an element's text content, and an <img> element has no text children, so it comes back empty; that's why only the "3" prints. To get the element's markup you could use:
DOMDocument::saveXML($img);
See saveXML() in the PHP documentation.
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
    $container = $img->parentNode;
    if ($container->tagName != "a") {
        echo utf8_decode($doc->saveXML($img));
        die;
    }
}
If you're using PHP 5.3.6 or later, you can pass the node to saveHTML() directly (from How to return outer html of DOMDocument?):
$doc->saveHTML($img);
Note the caveat mentioned in the linked-to question:
(...) use saveXml(), but that would create XML compliant markup. In the case of an <a> (<img>) element, that shouldn't be an issue though.
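For PHP 5.3.6 and later, the whole loop with saveHTML() would look roughly like this (a sketch, reusing the $article_header variable from the question):
$doc = new DOMDocument();
$doc->loadHTML($article_header);
foreach ($doc->getElementsByTagName('img') as $img) {
    if ($img->parentNode->tagName != "a") {
        // Passing the node to saveHTML() serializes just that element as HTML
        echo utf8_decode($doc->saveHTML($img));
        die;
    }
}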

Related

Accessing child element

I have this XML, and I need to get every <g:detailed_image> value. I've tried, but what I get is only the first <g:detailed_image> value of each <g:detailed_images>. I wonder what's wrong with my code.
Here's the XML:
<item>
  <g:detailed_images>
    <g:detailed_image>hat.png</g:detailed_image>
    <g:detailed_image>tie.png</g:detailed_image>
    <g:detailed_image>eye_glass.png</g:detailed_image>
    <g:detailed_image>watch.png</g:detailed_image>
  </g:detailed_images>
</item>
<item>
  <g:detailed_images>
    <g:detailed_image>shoe.png</g:detailed_image>
    <g:detailed_image>socks.png</g:detailed_image>
    <g:detailed_image>hand_gloves.png</g:detailed_image>
    <g:detailed_image>scarf.png</g:detailed_image>
  </g:detailed_images>
</item>
And this is my code:
foreach ($xpath->evaluate('//item') as $item)
{
    $detailed_images = $xpath->evaluate('g:detailed_images', $item);
    foreach ($detailed_images as $img)
    {
        $simg = $xpath->evaluate('string(g:detailed_image)', $img);
        echo 'image = ';
        echo $simg;
    }
}
My result is:
image = hat.png
image = shoe.png
While what I want is this:
image = hat.png
image = tie.png
image = eye_glass.png
image = watch.png
image = shoe.png
image = socks.png
image = hand_gloves.png
image = scarf.png
Thanks for the help.
As you can see, you're only getting the first detailed_image of each detailed_images. The string() call returns the string value of the first matching node only, so keeping the way you're doing it, you'd need another inner foreach over the g:detailed_image nodes (evaluated without string()) and print each resulting node. But you don't need all that XPath querying to get those elements. You can get there just fine with only one query:
//item/g:detailed_images/g:detailed_image
PHP Code:
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
foreach ($xpath->evaluate('//item/g:detailed_images/g:detailed_image') as $item) {
    var_dump($item->nodeValue);
}
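One thing to keep in mind: the g: prefix in that expression only resolves if the namespace is in scope. A minimal self-contained sketch with a placeholder namespace URI (use the one your real feed declares on its root element):
$xml = <<<XML
<items xmlns:g="http://example.com/ns/g">
  <item>
    <g:detailed_images>
      <g:detailed_image>hat.png</g:detailed_image>
      <g:detailed_image>tie.png</g:detailed_image>
    </g:detailed_images>
  </item>
</items>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
// Registering the prefix explicitly makes the query independent of the prefix used in the source
$xpath->registerNamespace('g', 'http://example.com/ns/g');
foreach ($xpath->evaluate('//item/g:detailed_images/g:detailed_image') as $node) {
    echo 'image = ', $node->nodeValue, "\n";
}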

Download and merge multiple XML files with PHP, Foreach and Dom

I've been banging my head against this problem for a week: I'm trying to dynamically download and merge multiple XML files from an API. I can download all the files, but I can't merge them without ending up with multiple root elements. It's frustrating and I can't find any suggestions. Here is my code:
<?php
$fileout = 'file.xml';
unlink($fileout);
$baseurl = "https://websitewithapi.com/";
$topcategories = array("COOL", "DRIVE", "FUN");
foreach ($topcategories as $topcategory) {
    $url_cata_test = "https://websitewithapi.com/&filters=topcategory:$topcategory&limit=1";
    $jsontest = file_get_contents($url_cata_test);
    $arrtest = json_decode($jsontest);
    $items = $arrtest->pagination->count;
    $pagemax = ceil($items / 250);
    $pagetest = range(0, $pagemax);
    foreach ($pagetest as $page) {
        $url_cata = "$baseurl&filters=topcategory:$topcategory&offset=$page&limit=250";
        echo "Importing category: " . $topcategory . "\n";
        echo "Importing page: " . $page . "\n";
        echo "Catalogue URL: $url_cata \n";
    };
    $dom = new DOMDocument();
    $dom->appendChild($dom->createElement('superdeals'));
    $files = array($url_cata);
    foreach ($files as $filename) {
        $addDom = new DOMDocument();
        $addDom->load($filename);
        if ($addDom->documentElement->getElementsByTagName('products')) {
            foreach ($addDom->documentElement->getElementsByTagName('product') as $node) {
                $dom->documentElement->appendChild(
                    $dom->importNode($node, TRUE)
                );
            }
        }
        $dom->formatOutput = true;
        file_put_contents($fileout, $dom->saveXML(), FILE_APPEND);
    }
};
?>
I always get the same problem: the files end up merged into one output file but with multiple root elements! Is there something I'm missing?
Thank you.
Simply initialize the DOM object and save its file output outside of all four foreach loops. Currently you are using FILE_APPEND on each iteration, which is not an XML DOM method but simply concatenates text content, so every iteration appends another complete document (and another root element). Continue to grow your XML tree within the loops and then output the single XML document once, without any file appends.
$fileout = 'file.xml';
unlink($fileout);
// INITIALIZE DOM TREE
$dom = new DOMDocument();
$dom->appendChild($dom->createElement('superdeals'));
...
foreach ($topcategories as $topcategory) {
    ...
    foreach ($pagetest as $page) {
        ...
        foreach ($files as $filename) {
            ...
            foreach ($addDom->documentElement->getElementsByTagName('product') as $node) {
                $dom->documentElement->appendChild(
                    $dom->importNode($node, TRUE)
                );
            }
        }
    }
}
// OUTPUT DOM TREE
file_put_contents($fileout, $dom->saveXML());
It looks like you are adding a root element called "superdeals" to your document and then appending the contents of each file at the document level. You need to add the contents of each file as children of the "superdeals" element, not as children of the document.
Save the root node:
$root = $dom->appendChild($dom->createElement('superdeals'));
then instead of
$dom->documentElement->appendChild($dom->importNode($node, TRUE))
add child to the root node (not the document node):
$root->appendChild($dom->importNode($node, TRUE))
The document itself can contain nodes apart from the root element, such as the document type declaration, processing instructions and so on, which is why you should append to the saved root node rather than to the document.
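Putting both answers together, a condensed sketch of the merge step (the $files array is assumed to be built from the API URLs exactly as in the question):
$fileout = 'file.xml';
$dom = new DOMDocument();
// Keep a reference to the root element and append every imported product to it
$root = $dom->appendChild($dom->createElement('superdeals'));

foreach ($files as $filename) {
    $addDom = new DOMDocument();
    $addDom->load($filename);
    foreach ($addDom->getElementsByTagName('product') as $node) {
        $root->appendChild($dom->importNode($node, true));
    }
}

$dom->formatOutput = true;
// A single write, producing a single root element
file_put_contents($fileout, $dom->saveXML());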

Get src attribute of specific image with DOM PHP

I'm trying to do the same as this jQuery snippet, but in PHP (Facebook Open Graph doesn't execute JS code, so it has to run server-side):
<script>captureurl=jQuery('.blog-content').find('img').attr('src');
jQuery('head').append("<meta property='og:image' content="+captureurl+"/></meta>");</script>
I've seen that I can get an image attribute like this:
<?php
$doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img");
for ($i = 0; $i < $imgs->length; $i++) {
    $img = $imgs->item($i);
    $src = $img->getAttribute("src");
    // do something with $src
}
?>
But how can I target the first image src in the div with .blog-content class?
Thanks for your help :)
Replace $xpath->query("//img") with following:
$imgs = $xpath->query('//img[contains(attribute::class, "blog-content")]'); //here we are querying domdocument to find img which has class .blog-content
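If the class is actually on a containing div, as the jQuery selector .blog-content suggests, a sketch that grabs the first image inside that div instead (the concat()/normalize-space() trick keeps the class test correct when the div carries more than one class):
$nodes = $xpath->query('(//div[contains(concat(" ", normalize-space(@class), " "), " blog-content ")]//img)[1]');
if ($nodes->length) {
    $captureurl = $nodes->item(0)->getAttribute('src');
    // Build the og:image meta tag server-side, as the jQuery snippet did in the browser
    echo '<meta property="og:image" content="' . htmlspecialchars($captureurl) . '"/>';
}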

Parse and extract image URL file names based on specific alt tags

I'm trying to print out a list of image file names from a webpage, with the .png extension stripped off. I only want to parse the image file names from image URLs inside a div with class="cartoon".
Example Structure:
<div class="cartoon">
<img src="URL/images/element8/12345.png" alt="cartoon">
Desired Output: 12345
Here is the code I use to return all images:
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('URL');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$imageTags = $doc->getElementsByTagName('img');
foreach ($imageTags as $tag) {
    echo $tag->getAttribute('src');
}
You want to do it with XPath?
How about:
.//*[contains(@class, "cartoon")]//img[not(contains(@src, "png"))]
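If "excluding the .png extension" means stripping it from the output (as the desired 12345 suggests), a follow-up sketch that instead keeps the images inside div class="cartoon" and trims both the path and the extension with pathinfo():
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[contains(@class, "cartoon")]//img');
foreach ($nodes as $img) {
    $src = $img->getAttribute('src');              // e.g. URL/images/element8/12345.png
    echo pathinfo($src, PATHINFO_FILENAME), "\n";  // prints 12345
}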

scraping all images from a website using DOMDocument

I basically want to get ALL the images on any website using DOMDocument, but I can't even load my HTML, for reasons I don't understand yet.
$url="http://<any_url_here>/";
$dom = new DOMDocument();
#$dom->loadHTML($url); //i have also tried removing #
$dom->preserveWhiteSpace = false;
$dom->saveHTML();
$images = $dom->getElementsByTagName('img');
foreach ($images as $image)
{
echo $image->getAttribute('src');
}
What happens is that nothing gets printed. Did I do something wrong in the code?
You don't get a result because $dom->loadHTML() expects HTML, but you're giving it a URL. You first need to fetch the HTML of the page you want to parse; you can use file_get_contents() for that.
I used this in my image grab class. Works fine for me.
$html = file_get_contents('http://www.google.com/');
$dom = new DOMDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
    echo $image->getAttribute('src');
}
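One thing worth adding for real-world pages: loadHTML() emits warnings for every bit of invalid markup it encounters. A sketch of the same loop with libxml error handling (the same call one of the questions above already uses):
$html = file_get_contents('http://www.google.com/');
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
foreach ($dom->getElementsByTagName('img') as $image) {
    echo $image->getAttribute('src'), "\n";
}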
