Accessing child element - php

I have this XML, and I need to get every value of g:detailed_image. I've tried, but what I get is only the first g:detailed_image value of each item. I wonder what's wrong with my code.
Here's the xml code:
<item>
<g:detailed_images>
<g:detailed_image>hat.png</g:detailed_image>
<g:detailed_image>tie.png</g:detailed_image>
<g:detailed_image>eye_glass.png</g:detailed_image>
<g:detailed_image>watch.png</g:detailed_image>
</g:detailed_images>
</item>
<item>
<g:detailed_images>
<g:detailed_image>shoe.png</g:detailed_image>
<g:detailed_image>socks.png</g:detailed_image>
<g:detailed_image>hand_gloves.png</g:detailed_image>
<g:detailed_image>scarf.png</g:detailed_image>
</g:detailed_images>
</item>
And this is my code:
foreach($xpath->evaluate('//item') as $item)
{
$detailed_images = $xpath->evaluate('g:detailed_images', $item);
foreach ($detailed_images as $img)
{
$simg = $xpath->evaluate('string(g:detailed_image)', $img);
echo 'image = ';
echo $simg;
}
}
My result is:
image = hat.png
image = shoe.png
While what I want is this:
image = hat.png
image = tie.png
image = eye_glass.png
image = watch.png
image = shoe.png
image = socks.png
image = hand_gloves.png
image = scarf.png
Thanks for the help.

As you can see, you're only getting the first detailed_image of each detailed_images. That is because string() applied to a node set returns the string value of its first node only. Keeping your approach, you'd need another foreach over the g:detailed_image nodes and print each one. But you don't need all that XPath querying to get those elements; a single query gets you there:
//item/g:detailed_images/g:detailed_image
PHP Code:
$dom = new DOMDocument;
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);
foreach($xpath->evaluate('//item/g:detailed_images/g:detailed_image') as $item) {
var_dump($item->nodeValue);
}
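For completeness, here is a self-contained sketch that keeps the per-item grouping from the question. It assumes the feed declares the g: prefix on a root element; the <rss> wrapper and the namespace URI below are assumptions, since the question only shows the <item> fragments. If the prefix is not declared in the XML and registered on the DOMXPath object, the g: steps will not match anything.

```php
<?php
// Assumption: a root element declares the g: prefix; swap in your feed's real URI.
$xml = <<<XML
<rss xmlns:g="http://base.google.com/ns/1.0">
  <item>
    <g:detailed_images>
      <g:detailed_image>hat.png</g:detailed_image>
      <g:detailed_image>tie.png</g:detailed_image>
    </g:detailed_images>
  </item>
  <item>
    <g:detailed_images>
      <g:detailed_image>shoe.png</g:detailed_image>
      <g:detailed_image>socks.png</g:detailed_image>
    </g:detailed_images>
  </item>
</rss>
XML;

$dom = new DOMDocument;
$dom->loadXML($xml);

$xpath = new DOMXPath($dom);
// Bind the prefix used in the expressions to the namespace URI from the document.
$xpath->registerNamespace('g', 'http://base.google.com/ns/1.0');

$images = [];
foreach ($xpath->evaluate('//item') as $item) {
    // Relative query: every g:detailed_image below the current item, not just the first.
    foreach ($xpath->evaluate('g:detailed_images/g:detailed_image', $item) as $img) {
        $images[] = $img->nodeValue;
        echo 'image = ', $img->nodeValue, PHP_EOL;
    }
}
```

The inner loop is what the original code was missing: it iterates the node list instead of collapsing it with string().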

Related

Delete files that are not in my XML document with PHP

I have an XML file that contains the SKU of products. I also have a folder that corresponds to this XML file.
Snippet from XML:
<Feed>
<Product>
<ItemCode>ALT-AAB-BL</ItemCode>
<BaseItemCode>ALT-AAB</BaseItemCode>
<StockCheckCode>ALT-AAB-BL</StockCheckCode>
</Product>
<Product>
<ItemCode>ALT-AAB-L</ItemCode>
<BaseItemCode>ALT-AAB</BaseItemCode>
<StockCheckCode>ALT-AAB-L</StockCheckCode>
</Product>
<Product>
<ItemCode>ALT-AAB-N</ItemCode>
<BaseItemCode>ALT-AAB</BaseItemCode>
<StockCheckCode>ALT-AAB-N</StockCheckCode>
</Product>
</Feed>
I have been trying it with PHP, but I am a junior and don't know where to start, so I will give you some pseudo code.
if $domelement->ItemCode != filename.jpg {
delete filename.jpg;
}
Yes, this pseudo code is terrible. I am able to load the .xml file and manipulate the data.
I basically want to delete the files that are not present in the XML file and preserve the rest. I know how to append .png to the ItemCode if I need to.
<?php
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load('altitude.xml');
$xpath = new DOMXPath($dom);
$query = sprintf('/Feed/Product/BaseItemCode');
foreach($xpath->query($query) as $record) {
//delete file that is not present in BaseItemCode
}
I just want the files not present in xml->BaseItemCode (which I will append with .png or .jpg) to be deleted from the folder.
You need two lists: a whitelist built from the XML and a list of all files in the directory.
$dom = new DOMDocument;
$dom->preserveWhiteSpace = false;
$dom->formatOutput = true;
$dom->load('altitude.xml');
$xpath = new DOMXPath($dom);
$query = sprintf('/Feed/Product/BaseItemCode');
$xmlList = [];
foreach($xpath->query($query) as $record) {
$xmlList[] = trim($record->nodeValue) . '.jpg'; // $record is the BaseItemCode element itself, so read its text
$xmlList[] = trim($record->nodeValue) . '.png'; // If you can, use a smarter way
}
$directory = '/full/path/to/dir';
$dirList = array_diff(scandir($directory), array('..', '.'));
$filesToDelete = array_diff($dirList, $xmlList);
foreach ($filesToDelete as $file) {
unlink($directory . DIRECTORY_SEPARATOR . $file);
}
@justinas's method worked. I had a weird space at the end of every single array element imported from my CSV file. I converted my XML to a CSV file and used it as an array.
<?php
$csv = file('convertcsv.csv');
function test_alter(&$item1, $key, $prefix)
{
$item1 = "$item1$prefix";
}
array_walk($csv, 'test_alter', '.png');
//var_dump($csv);
$directory = 'img';
$dirList = array_diff(scandir($directory), array('..', '.'));
$filesToDelete = array_diff($dirList, $csv);
foreach ($filesToDelete as $file) {
unlink($directory . DIRECTORY_SEPARATOR . $file);
}
echo "klaar"
?>
Can anyone tell me why there is a blank space after every single element in the array if you use:
$csv = file('convertcsv.csv');
as an array?
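A likely cause, as far as I can tell: file() keeps the line ending on every element unless FILE_IGNORE_NEW_LINES is passed, so each code ends in a newline before ".png" is appended. A small sketch, using a hypothetical temporary file in place of convertcsv.csv:

```php
<?php
// Hypothetical stand-in for convertcsv.csv: one item code per line.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "ALT-AAB-BL\nALT-AAB-L\nALT-AAB-N\n");

// Default behaviour: each element still ends with its line break.
$raw = file($path);

// With flags: line endings are stripped and empty lines are skipped.
$codes = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Append the extension to the clean codes.
$names = array_map(function ($code) {
    return $code . '.png';
}, $codes);

var_dump($raw[0]); // shows the trailing line break
print_r($names);

unlink($path);
```

If the CSV was written on Windows, the elements may also carry a carriage return; running trim() over each element covers that case too.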

Get src attribute of specific image with DOM PHP

I'm trying to do the same as this jQuery snippet, but in PHP (Facebook Open Graph doesn't execute JS code, so it has to be done server side):
<script>captureurl=jQuery('.blog-content').find('img').attr('src');
jQuery('head').append("<meta property='og:image' content="+captureurl+"/></meta>");</script>
I've seen I could get an image attribute like this:
<?php $doc = new DOMDocument();
$doc->loadHTMLFile($url);
$xpath = new DOMXpath($doc);
$imgs = $xpath->query("//img");
for ($i=0; $i < $imgs->length; $i++) {
$img = $imgs->item($i);
$src = $img->getAttribute("src");
// do something with $src
} ?>
But how can I target the first image src in the div with .blog-content class?
Thanks for your help :)
Replace $xpath->query("//img") with the following:
$imgs = $xpath->query('(//div[contains(@class, "blog-content")]//img)[1]'); // the first img inside a div whose class attribute contains "blog-content"
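To make the goal from the question concrete (the src of the first image inside the .blog-content div, written into an og:image tag), here is a minimal sketch. The sample markup is made up for illustration, and the class test uses the concat/normalize-space idiom so that a class such as "blog-content-wide" does not match by accident.

```php
<?php
// Hypothetical page: one image outside the target div, two inside it.
$html = '<html><body><img src="logo.png">'
      . '<div class="post blog-content"><p><img src="first.jpg"><img src="second.jpg"></p></div>'
      . '</body></html>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match "blog-content" as a whole class token, then take the first descendant img.
$div = '//div[contains(concat(" ", normalize-space(@class), " "), " blog-content ")]';
$src = $xpath->evaluate('string((' . $div . '//img)[1]/@src)');

if ($src !== '') {
    echo '<meta property="og:image" content="', htmlspecialchars($src, ENT_QUOTES), '" />', PHP_EOL;
}
```

evaluate() with string() returns an empty string when nothing matches, which is why the check before printing the tag is a plain string comparison.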

Parse and extract image URL file names based on specific alt tags

I'm trying to print out a list of image file names within a webpage, without the .png extension.
I only want the file names from the image URLs inside elements with div class="cartoon".
Example Structure:
<div class="cartoon">
<img src="URL/images/element8/12345.png" alt="cartoon">
Desired Output: 12345
Here is my code that I use to return all images
include('simple_html_dom.php');
$html = new simple_html_dom();
$html->load_file('URL');
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//img"); // find your image
$imageTags = $doc->getElementsByTagName('img');
foreach($imageTags as $tag) {
echo $tag->getAttribute('src');
}
You want to do it with xpath?
How about:
.//*[contains(@class, "cartoon")]//img[not(contains(@src, "png"))]
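Note that the desired output in the question ("12345") is the file name with the extension stripped, whereas a not(contains(..., "png")) filter skips PNG images altogether. If stripping the extension is what is wanted, one possible sketch is to select the images inside the cartoon div and run each src through pathinfo(). The markup below is made up to mirror the example structure.

```php
<?php
// Hypothetical markup: only the first image sits inside div.cartoon.
$html = '<div class="cartoon"><img src="URL/images/element8/12345.png" alt="cartoon"></div>'
      . '<div class="other"><img src="URL/images/element8/99999.png" alt="other"></div>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$names = [];
foreach ($xpath->query('//div[contains(@class, "cartoon")]//img') as $img) {
    // Drop any query string first, then keep the base name without its extension.
    $path = parse_url($img->getAttribute('src'), PHP_URL_PATH);
    if (is_string($path)) {
        $names[] = pathinfo($path, PATHINFO_FILENAME);
    }
}

echo implode(PHP_EOL, $names), PHP_EOL;
```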

scraping all images from a website using DOMDocument

I basically want to get ALL the images on any website using DOMDocument,
but I can't even load my HTML, for reasons I don't know yet.
$url="http://<any_url_here>/";
$dom = new DOMDocument();
@$dom->loadHTML($url); // I have also tried removing the @
$dom->preserveWhiteSpace = false;
$dom->saveHTML();
$images = $dom->getElementsByTagName('img');
foreach ($images as $image)
{
echo $image->getAttribute('src');
}
What happens is that nothing gets printed. Did I do something wrong with the code?
You don't get a result because $dom->loadHTML() expects HTML. You are giving it a URL; you first need to fetch the HTML of the page you want to parse. You can use file_get_contents() for that.
I used this in my image grab class. Works fine for me.
$html = file_get_contents('http://www.google.com/');
$dom = new domDocument;
$dom->loadHTML($html);
$dom->preserveWhiteSpace = false;
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
echo $image->getAttribute('src');
}
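To see the difference side by side, here is a small sketch that runs without network access. A hypothetical temporary file stands in for the remote page; the helper imageSources() is a name made up for this example. It also shows loadHTMLFile(), which reads from a path (or a URL, where the PHP configuration allows it) instead of a markup string.

```php
<?php
// Collect the src attribute of every <img> in a parsed document.
function imageSources(DOMDocument $dom): array
{
    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $image) {
        $sources[] = $image->getAttribute('src');
    }
    return $sources;
}

libxml_use_internal_errors(true); // collect parser warnings instead of printing them

// Hypothetical local page standing in for a remote URL.
$path = tempnam(sys_get_temp_dir(), 'page');
file_put_contents($path, '<html><body><img src="a.png"><img src="b.jpg"></body></html>');

// Passing a location to loadHTML() parses the location string itself as markup.
$wrong = new DOMDocument();
$wrong->loadHTML('http://example.com/');

// Option 1: fetch the markup first, then parse the string.
$fromString = new DOMDocument();
$fromString->loadHTML(file_get_contents($path));

// Option 2: let DOMDocument read the file directly.
$fromFile = new DOMDocument();
$fromFile->loadHTMLFile($path);

var_dump(imageSources($wrong));      // no images found
print_r(imageSources($fromString));
print_r(imageSources($fromFile));

unlink($path);
```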

printing out html content from domelement using nodeValue

I have an image in HTML. I parse it into a DOMDocument and start working with it...
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
$container = $img->parentNode;
if ($container->tagName != "a") {
$image_inside=utf8_decode($img->nodeValue);
echo "3".$image_inside;
die;
}
}
This code works fine: line 3 gets the images, line 6 detects that there is no "a" tag directly above this "img" tag, and line 8 should print out my initial image. But I only see "3", without the image tag.
I inspected the element and nothing is there; just "3" comes out. Why can't I print out the image?
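An <img> element has no text content, so its nodeValue is an empty string. To print the element's markup, you could use:
DOMDocument::saveXML($img);
From the PHP documentation for saveXML().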
$doc = new DOMDocument();
$doc->loadHTML($article_header);
$imgs = $doc->getElementsByTagName('img');
foreach ($imgs as $img) {
$container = $img->parentNode;
if ($container->tagName != "a") {
echo utf8_decode($doc->saveXML($img));
die;
}
}
If you're using PHP 5.3.6 or later you could use (from How to return outer html of DOMDocument?)
$doc->saveHtml($img);
Note the caveat mentioned in the linked-to question:
(...) use saveXml(), but that would create XML compliant markup. In the case of an <a>(<img>) element, that shouldn't be an issue though.
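A quick way to compare the three on a single element, as a minimal sketch with made-up markup. It assumes a PHP version where saveHTML() accepts a node argument (5.3.6 or later).

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<div><img src="photo.jpg" alt="Photo"></div>');

$img = $doc->getElementsByTagName('img')->item(0);

// nodeValue is the text content of the node; an img has none.
$text = $img->nodeValue;

// Serialise just this node, once as HTML and once as XML.
$asHtml = $doc->saveHTML($img);
$asXml  = $doc->saveXML($img);

var_dump($text);
echo $asHtml, PHP_EOL;
echo $asXml, PHP_EOL;
```

The two serialisations differ mainly in how the empty element is closed, which is the XML compliance caveat quoted above.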
