PHP DomXpath xpath query of Child Node - php

I'm trying to use xpath to query some HTML:
<a target="_blank" class="dx-smart-widget-grid-item_113_20" href="https://link.com" title="Rules for the Road to One Source of Truth' with Jaguar Land Rover and Spark44">
<div class="dx-smart-widget-grid-info_113_20">
<img class="dx-smart-widget-report-cover_113_20" src="https://imagelink.com/preview.png" alt="The Alternative Text"/>
<div class="dx-smart-widget-grid-text_113_20">
<div class="dx-smart-widget-grid-title_113_20">The Alternative Text</div>
</div>
<span class="dx-smart-widget-report-assettype_113_20">On-Demand Webinar</span>
<img class="dx-smart-widget-partner-logo_113_20" src="https://logopath/logo.png" alt="censhare"/>
</div>
</a>
This is the code I'm using:
@$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("//a[contains(@class,'dx-smart-widget-grid-item_113_20')]");
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "<strong>Link: </strong>". $element->getAttribute('href'). "<br />";
echo "<strong>Title: </strong>". $element->getAttribute('title'). "<br />";
$images = $xpath->query("//img[contains(@class,'dx-smart-widget-report-cover_113_20')]", $element);
echo "<strong>Image: </strong>".$images->getAttribute('src'). "<br />";
}
}
I'm getting the href and title fine, but querying the image doesn't work: the script breaks on that line.
Any help would be appreciated.

Assuming there is only one matching image, you can use XPath's evaluate() with string() in the expression to extract the value in one go. The leading . keeps the search relative to $element:
$images = $xpath->evaluate("string(.//img[contains(@class,'dx-smart-widget-report-cover_113_20')]/@src)", $element);
echo "<strong>Image: </strong>".$images. "<br />";

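You are almost there. query() returns a DOMNodeList, not a single element, so you need to iterate over $images in a foreach loop. Replace
echo "<strong>Image: </strong>".$images->getAttribute('src'). "<br />";
with
foreach ($images as $image) {
echo "<strong>Image: </strong>".$image->getAttribute('src'). "<br />";
}
and it should work. Note that an expression starting with // searches the whole document even when a context node is passed; start it with .// if you only want the images inside $element.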
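Putting both answers together, here is a minimal sketch of the whole loop. The two-link sample markup and the second set of URLs are made up for illustration; the class names come from the question. It collects one image per link by using a context-relative expression (.//img):

```php
<?php
// Sample markup (hypothetical): two links, each with its own cover image.
$html = <<<HTML
<a class="dx-smart-widget-grid-item_113_20" href="https://link.com" title="First">
<div><img class="dx-smart-widget-report-cover_113_20" src="https://imagelink.com/preview.png" alt="A"/></div>
</a>
<a class="dx-smart-widget-grid-item_113_20" href="https://link.com/2" title="Second">
<div><img class="dx-smart-widget-report-cover_113_20" src="https://imagelink.com/preview2.png" alt="B"/></div>
</a>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$results = [];
foreach ($xpath->query("//a[contains(@class,'dx-smart-widget-grid-item_113_20')]") as $a) {
    // ".//" restricts the search to descendants of the current <a>
    $src = $xpath->evaluate(
        "string(.//img[contains(@class,'dx-smart-widget-report-cover_113_20')]/@src)",
        $a
    );
    $results[] = [$a->getAttribute('href'), $src];
}

foreach ($results as [$href, $src]) {
    echo $href, ' - ', $src, PHP_EOL;
}
```

If a link has no matching image, string() yields an empty string, so you can test for '' instead of checking a node list length.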

Related

How can i get the text from a child node with php DOMDocument

I've been writing PHP code to get information from a site. So far I can get the href attribute, but I can't find a way to get the text from the child node "span". Can someone help me?
HTML:
<a class="js-publication" href="publication/247931167">
<span class="publication-title">An approach for textual authoring</span>
</a>
This is how I currently get the href:
@$dom->loadHTMLFile($curPage);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$class_ = $element->getAttribute('class');
if (false !== strpos($class_, 'js-publication')) {
$href = $element->getAttribute('href');
if(0 === stripos($href,'publication/')){
echo $href; // link to the publication
echo "\n";
}
}
}
You can use DOMXPath:
$html = <<< LOL
<a class="js-publication" href="publication/247931167">
<span class="publication-title">An approach for textual authoring</span>
</a>
LOL;
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
foreach ($xpath->query("//a[@class='js-publication']") as $element){
echo $element->getAttribute('href');
echo $element->textContent;
}
//publication/247931167
//An approach for textual authoring
Or without the loop, if you just want one element:
echo $xpath->query("//a[@class='js-publication']/span")[0]->textContent;
echo $xpath->query("//a[@class='js-publication']")[0]->getAttribute('href');

Getting element inside other element by class php DOMDocument

Hi guys, I have this HTML code:
<div class="post-thumbnail2">
<a href="http://example.com" title="Title">
<img src="http://linkimgexample/image.png" alt="Title"/>
</a>
</div>
I want to get the value of the image src (http://linkimgexample/image.png) and the value of the link href (http://example.com) using PHP DOMDocument.
What I did to get the link was something like this:
$divs = $dom->getElementsByTagName("div");
foreach($divs as $div) {
$cl = $div->getAttribute("class");
if ($cl == "post-thumbnail2") {
$links = $div->getElementsByTagName("a");
foreach ($links as $link)
echo $link->getAttribute("href")."<br/>";
}
}
I could do the same for the img src:
$imgs = $div->getElementsByTagName("img");
foreach ($imgs as $img)
echo $img->getAttribute("src")."<br/>";
But sometimes there is no image on the site, and the HTML looks like this:
<div class="post-thumbnail2">
</div>
So my question is: how can I get both values at the same time, and show a message when there is no image?
To be clearer, here is an example:
<div class="post-thumbnail2">
<a href="http://example1.com" title="Title">
<img src="http://linkimgexample/image1.png" alt="Title"/>
</a>
</div>
<div class="post-thumbnail2">
</div>
<div class="post-thumbnail2">
<a href="http://example3.com" title="Title">
<img src="http://linkimgexample/image2.png" alt="Title"/>
</a>
</div>
I want the result to be:
http://example1.com - http://linkimgexample/image1.png
http://example2.com - there is no image here !
http://example3.com - http://linkimgexample/image2.png
DOMElement::getElementsByTagName returns a DOMNodeList, which means you can find out whether an img element was found by checking its length property.
$imgs = $div->getElementsByTagName("img");
if($imgs->length > 0) {
foreach ($imgs as $img)
echo $img->getAttribute("src")."<br/>";
} else {
echo "there is no image here!<br/>";
}
You should think about using XPath - it makes your life traversing the DOM a bit easier:
$doc = new DOMDocument();
if($doc->loadHTML($xmlData)) {
$xpath = new DOMXPath($doc);
$postThumbLinks = $xpath->query("//div[@class='post-thumbnail2']/a");
foreach($postThumbLinks as $link) {
$imgList = $xpath->query("./img", $link);
$imageLink = "there is no image here!";
if($imgList->length > 0) {
$imageLink = $imgList->item(0)->getAttribute('src');
}
echo $link->getAttribute('href'), " - ", $link->getAttribute('title'),
" - ", $imageLink, "<br/>", PHP_EOL;
}
} else {
echo "can't load HTML document!", PHP_EOL;
}
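Note that the query above only visits divs that contain a link, so a completely empty div produces no output line at all. If you want one line per div, as in the question's expected result, a rough sketch could iterate over the divs instead. The sample markup is the question's example; the "there is no link here!" message is my own placeholder, since an empty div has no href to print:

```php
<?php
// Markup taken from the question's example (second div is empty).
$html = <<<HTML
<div class="post-thumbnail2"><a href="http://example1.com" title="Title"><img src="http://linkimgexample/image1.png" alt="Title"/></a></div>
<div class="post-thumbnail2"></div>
<div class="post-thumbnail2"><a href="http://example3.com" title="Title"><img src="http://linkimgexample/image2.png" alt="Title"/></a></div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$lines = [];
foreach ($xpath->query("//div[@class='post-thumbnail2']") as $div) {
    // string() returns '' when the path matches nothing
    $href = $xpath->evaluate("string(./a/@href)", $div);
    $src  = $xpath->evaluate("string(./a/img/@src)", $div);
    $lines[] = ($href !== '' ? $href : 'there is no link here!')
        . ' - '
        . ($src !== '' ? $src : 'there is no image here!');
}

echo implode("<br/>" . PHP_EOL, $lines), PHP_EOL;
```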

Extraction of src and value from html not working?

The problem is that getElementById() doesn't work here, but if I replace it with getElementsByTagName('img') it works perfectly.
How do I fix this, if possible?
(The HTML is in the file garden.php.) HTML:
<img id="head" src="images/flowers.png" value="blah">
(The PHP code is at the top of the garden.php file.)
PHP:
<?php
$html = file_get_contents('garden.php');
$dom = new DOMDocument;
$dom->loadHTML($html);
foreach ($dom->getElementById('head') as $tag) {
echo $tag->getAttribute('value'); // should print blah
echo "<br>";
echo $tag->getAttribute('src'); // prints images/flowers.png
}
You should not be using a foreach loop. IDs are unique, so getElementById returns a DOMElement, not a DOMNodeList.
$tag = $dom->getElementById('head');
echo $tag->getAttribute('value') . '<br>' . $tag->getAttribute('src');
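One thing to keep in mind: getElementById returns null when no element has that id, so calling getAttribute on the result would then fail. A small sketch with a guard, using the markup from the question (the id 'no-such-id' is just a made-up example of a missing element):

```php
<?php
$html = '<img id="head" src="images/flowers.png" value="blah">';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // keep any parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();

$tag     = $dom->getElementById('head');       // a DOMElement
$missing = $dom->getElementById('no-such-id'); // null: nothing matches

// Guard against null before reading attributes.
$out = $tag !== null
    ? $tag->getAttribute('value') . '<br>' . $tag->getAttribute('src')
    : 'element not found';

echo $out, PHP_EOL;
```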

PHP regex to check if image is wrapped with a tag

I am creating a WordPress function and need to determine whether an image in the content is wrapped in an a tag that links to a PDF or DOC file, e.g.
<img src="../images/image.jpg" />
How would I go about doing this with PHP?
Thanks
I would very strongly advise against using a regular expression for this. Besides being more error prone and less readable, it also does not give you the ability to manipulate the content easily.
You would be better off loading the content into a DOMDocument, retrieving all <img> elements and checking whether their parents are <a> elements. All you have to do then is check whether the value of the href attribute ends with the desired extension.
A very crude implementation would look a bit like this:
<?php
$sHtml = <<<HTML
<html>
<body>
<img src="../images/image.jpg" />
<img src="../images/image.jpg" />
<img src="../images/image.jpg" />
<p>this is some text <a href="site.com/doc.pdf"> more text</p>
</body>
</html>
HTML;
$oDoc = new DOMDocument();
$oDoc->loadHTML($sHtml);
$oNodeList = $oDoc->getElementsByTagName('img');
foreach($oNodeList as $t_oNode)
{
if($t_oNode->parentNode->nodeName === 'a')
{
$sLinkValue = $t_oNode->parentNode->getAttribute('href');
$sExtension = substr($sLinkValue, strrpos($sLinkValue, '.'));
echo '<li>I am wrapped in an anchor tag '
. 'and I link to a ' . $sExtension . ' file '
;
}
}
?>
I'll leave an exact implementation as an exercise for the reader ;-)
Here is a DOM parse based code that you can use:
$html = <<< EOF
<img src="../images/image.jpg" />
<img src="../images/image1.jpg" />
<IMG src="../images/image2.jpg" />
<img src="../images/image3.jpg" />
My PDF
EOF;
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html); // loads your html
$nodeList = $doc->getElementsByTagName('a');
for($i=0; $i < $nodeList->length; $i++) {
$node = $nodeList->item($i);
$children = $node->childNodes;
$hasImage = false;
foreach ($children as $child) {
if ($child->nodeName == 'img') {
$hasImage = true;
break;
}
}
if (!$hasImage)
continue;
if ($node->hasAttributes())
foreach ($node->attributes as $attr) {
$name = $attr->nodeName;
$value = $attr->nodeValue;
if ($attr->nodeName == 'href' &&
preg_match('/\.(doc|pdf)$/i', $attr->nodeValue)) {
echo $attr->nodeValue .
" - Image is wrapped in a link to a PDF or DOC file\n";
break;
}
}
}
Live Demo: http://ideone.com/dwJNAj
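The same check can also be sketched with DOMXPath, which selects only the img elements that are direct children of a link. The href values and file names below are hypothetical sample data, not taken from the question:

```php
<?php
// Hypothetical sample: two images link to documents, one links to a page,
// and one is not wrapped in a link at all.
$html = <<<HTML
<a href="files/report.pdf"><img src="../images/image.jpg" /></a>
<a href="page.html"><img src="../images/image1.jpg" /></a>
<img src="../images/image2.jpg" />
<a href="files/notes.DOC"><img src="../images/image3.jpg" /></a>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$wrapped = [];
// Only <img> elements whose parent is an <a> carrying an href attribute
foreach ($xpath->query('//a[@href]/img') as $img) {
    $href = $img->parentNode->getAttribute('href');
    // Case-insensitive match on the file extension at the end of the href
    if (preg_match('/\.(doc|pdf)$/i', $href)) {
        $wrapped[$img->getAttribute('src')] = $href;
    }
}

foreach ($wrapped as $src => $href) {
    echo $src, ' is wrapped in a link to ', $href, PHP_EOL;
}
```

This only matches images that sit directly inside the link; if other elements may come between the a and the img, //a[@href]//img together with a walk up to the nearest a ancestor would be needed instead.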

simple html dom get 2 tags for one foreach

How can I get two tags in one foreach loop with Simple HTML DOM?
<?php
require_once("simple_html_dom.php");
$str='<img src="./1.jpg" /><span>image1</span><img src="./2.jpg" /><span>image2</span>';//still have more 'img' and 'span'
$html = str_get_html($str);
foreach($html->find('img') as $content){
echo $content.'<br />';
//echo <span> inner html
}
?>
I want to get the result like:
<img src="./1.jpg" />
image1
<img src="./2.jpg" />
image2
That is, each img and the span that follows it should be treated as one unit. Thanks.
You could do this.
$img = $html->find("img");
$span = $html->find("span");
for($i=0;$i<count($img);$i++) {
echo $img[$i] . "<br />" . $span[$i];
}
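Pairing by index assumes every img has exactly one span, in the same order. If that may not hold, one alternative is to look up the span that directly follows each img. The sketch below swaps in PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM, so treat it as an illustration of the idea and not a drop-in replacement:

```php
<?php
$str = '<img src="./1.jpg" /><span>image1</span><img src="./2.jpg" /><span>image2</span>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($str);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$pairs = [];
foreach ($xpath->query('//img') as $img) {
    // Text of the next sibling element, but only if that element is a <span>;
    // otherwise string() gives ''.
    $caption = $xpath->evaluate('string(following-sibling::*[1][self::span])', $img);
    $pairs[] = [$img->getAttribute('src'), $caption];
}

foreach ($pairs as [$src, $caption]) {
    echo $src, ' ', $caption, PHP_EOL;
}
```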
