Multiple src for img class="cover-image" using DomDocument - php

I am using a script to get src of a <img> with class="cover-image"
The webpage is of a Google Playstore page.
Here's the script:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://play.google.com/store/apps/details?id=com.igg.castleclash');
libxml_clear_errors();
$xp = new DOMXPath($dom);
$image_src = $xp->query("//img[#class='cover-image']/#src");
foreach($image_src as $attr) {
echo $attr->value. "<br/>";
}
Issue is, there's only one <img> tag with class name cover-image, but I am still getting 15 src values.

If your intent is to just get the first one, then you can add this on the xpath query:
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://play.google.com/store/apps/details?id=com.igg.castleclash');
libxml_clear_errors();
$xp = new DOMXPath($dom);
$image_src = $xp->evaluate("string(//img[#class='cover-image'][1]/#src)");
echo $image_src;
echo "<img src='$image_src' alt='' />";
Also, if you want that cover image that's on the topmost portion of the site (near the header part), you could just point it directly to it:
$image_src = $xp->evaluate("
string(
//div[#class='details-info']
/div[#class='cover-container']
/img[#class='cover-image']/#src
)
"); // much more specific

Are you sure there should be only 1 tag? I see 15 tags with this class in html code.

Related

Retrieve data from html page using xpath and php

I know there are similar question, but, trying to study PHP I met this error and I want understand why this occurs.
<?php
$url = 'http://aice.anie.it/quotazione-lme-rame/';
echo "hello!\r\n";
$html = new DOMDocument();
#$html->loadHTML($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[#id='table33']/tbody/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}
?>
this prints just "hello!". I want to print the value extracted with the xpath, but the last echo doesn't do anything.
You have some errors in your code :
You try to get the table from the url http://aice.anie.it/quotazione-lme-rame/, but it's actually in an iframe located at http://www.aiceweb.it/it/frame_rame.asp, so get the iframe url directly.
You use the function loadHTML(), which load an HTML string. What you need is the loadHTMLFile function, which takes the link of an HTML document as a parameter (See http://www.php.net/manual/fr/domdocument.loadhtmlfile.php)
You assume there is a tbody element on the page but there is no one. So remove that from your query filter.
Working code :
$url = 'http://www.aiceweb.it/it/frame_rame.asp';
echo "hello!\r\n";
$html = new DOMDocument();
#$html->loadHTMLFile($url);
$xpath = new DOMXPath($html);
$nodelist = $xpath->query(".//*[#id='table33']/tr[2]/td[3]/b");
foreach ($nodelist as $n) {
echo $n->nodeValue . "\n";
}

DOMDocument grab html between two p tags [duplicate]

I'm trying to replace video links inside a string - here's my code:
$doc = new DOMDocument();
$doc->loadHTML($content);
foreach ($doc->getElementsByTagName("a") as $link)
{
$url = $link->getAttribute("href");
if(strpos($url, ".flv"))
{
echo $link->outerHTML();
}
}
Unfortunately, outerHTML doesn't work when I'm trying to get the html code for the full hyperlink like <a href='http://www.myurl.com/video.flv'></a>
Any ideas how to achieve this?
As of PHP 5.3.6 you can pass a node to saveHtml, e.g.
$domDocument->saveHtml($nodeToGetTheOuterHtmlFrom);
Previous versions of PHP did not implement that possibility. You'd have to use saveXml(), but that would create XML compliant markup. In the case of an <a> element, that shouldn't be an issue though.
See http://blog.gordon-oheim.biz/2011-03-17-The-DOM-Goodie-in-PHP-5.3.6/
You can find a couple of propositions in the users notes of the DOM section of the PHP Manual.
For example, here's one posted by xwisdom :
<?php
// code taken from the Raxan PDI framework
// returns the html content of an element
protected function nodeContent($n, $outer=false) {
$d = new DOMDocument('1.0');
$b = $d->importNode($n->cloneNode(true),true);
$d->appendChild($b); $h = $d->saveHTML();
// remove outter tags
if (!$outer) $h = substr($h,strpos($h,'>')+1,-(strlen($n->nodeName)+4));
return $h;
}
?>
The best possible solution is to define your own function which will return you outerhtml:
function outerHTML($e) {
$doc = new DOMDocument();
$doc->appendChild($doc->importNode($e, true));
return $doc->saveHTML();
}
than you can use in your code
echo outerHTML($link);
Rename a file with href to links.html or links.html to say google.com/fly.html that has flv in it or change flv to wmv etc you want href from if there are other href
it will pick them up as well
<?php
$contents = file_get_contents("links.html");
$domdoc = new DOMDocument();
$domdoc->preservewhitespaces=“false”;
$domdoc->loadHTML($contents);
$xpath = new DOMXpath($domdoc);
$query = '//#href';
$nodeList = $xpath->query($query);
foreach ($nodeList as $node){
if(strpos($node->nodeValue, ".flv")){
$linksList = $node->nodeValue;
$htmlAnchor = new DOMElement("a", $linksList);
$htmlURL = new DOMAttr("href", $linksList);
$domdoc->appendChild($htmlAnchor);
$htmlAnchor->appendChild($htmlURL);
$domdoc->saveHTML();
echo ("<a href='". $node->nodeValue. "'>". $node->nodeValue. "</a><br />");
}
}
echo("done");
?>

DomXPath with DOMDocument to get <img> Class URL

I am writing a little scraper script that will find the image URL that has a particular class name. I know that my cURL and DOMDocument is functioning okay, and even the DomXPath really (as far as I can tell, there are no errors) But I am struggling to work out how to get the URL of the xpath query results.
My code so far:
$dom = new DOMDocument();
#$dom->loadHTML($x);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[#class="productImage"]');
var_dump($div);
echo $div->item(0);
If I var_dump($x) the page outputs no problem. So the CURL is working fine. But I do not know how to get the data that is contained in the $div. I am trying to find an Image with a class of 'productImage' which looks like:
<img src="/uploads/5W/yP/5WyPP4l7Z-jmZRzu_MJ6zg/1077-d.jpg" border="1" alt="Album" class="productImage">
I want the source of that image tag.
Any suggestions?
$dom = new DOMDocument();
$dom->loadHTML($x);
$xpath = new DomXpath($dom);
$imgs = $xpath->query('//*[#class="productImage"]');
foreach($imgs as $img)
{
echo 'ImgSrc: ' . $img->getAttribute('src') .'<br />' . PHP_EOL;
}
Try that...
== EDIT: Additional Info ==
The reason I use a loop here is because you may find more than one img. If you know there is only one element (or you want the first dom node found) you can access the elelement from the domnodelist via the item method of domnodelist - like so:
$dom = new DOMDocument();
$dom->loadHTML($x);
$xpath = new DomXpath($dom);
$img = $xpath->query('//*[#class="productImage"]');
echo 'ImgSrc: ' . $img->item(0)->getAttribute('src') .'<br />' . PHP_EOL;
You don't actually need to use XPath here, because it seems that you're just after images and that can be done by using DOMDocument::getElementsByTagName(), followed by a simple filter:
foreach ($dom->getElementsByTagName('img') as $image) {
$class = $image->getAttribute('class');
if (strpos(" $class ", " productImage ") !== false) {
$url = $image->getAttribute('src');
// do stuff
}
}
Then, you can get the src attribute by using DOMElement::getAttribute():
echo $image->getAttribute('src');

Get link from html by php

How do I get this link <li><a rel="prev" href="/1149/" accesskey="p">< Prev</a></li> from an html document using PHP? How do I get the link by the "rel"?
I'm trying to get /1149/
Trying to understand what you want… If you want to take an HTML/XML input and grab the href value of a link with the attribute rel="prev" I'd suggest using DOMXpath, something like:
$html = '<li><a rel="prev" href="/1149/" accesskey="p">< Prev</a></li>';
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
foreach ($xpath->query("//a[#rel='prev']") as $node) {
if ($node->hasAttribute('href')) {
echo $node->getAttribute('href') . '<br>';
}
}

How to get image src by class

I have this:
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" onerror="this.src='/Content/Css/Images/no_brand_logo_120_48.gif'" alt="ADTRAN">
how to get img src (http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg)
I tried a lot of things and that was the last one:
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$src = $xpath->evaluate("string(//class='brand-logo']/img/#src)");
echo "$src";
That's not proper XPath syntax. Try
$nodes = $xpath->query("//img[#class='brand-logo']");
$src = $nodes->item(0)->getAttribute('src');
First you fetch the NODE that represents the image whose src you want, THEN you get the src attribute. Note that the ->query() call returns a DOMNodeList, not a node.
Try like this
<?php
$html = '<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>';
$xml = simplexml_load_string($html);
echo $xml->img['src'];
?>
Try like this
<?php
$doc=new DOMDocument();
$doc->loadHTML('<a href="/Dealer-Catalog/ManufacturerID-3">
<img class="brand-logo" src="http://www.teledynamics.com/tdresources/74c42cb2-dc7f-4548-b820-2946fbe160db.jpg" alt="ADTRAN" />
</a>');
$xml=simplexml_import_dom($doc); // just to make xpath more simple
$images=$xml->xpath('//img');
foreach ($images as $img) {
echo $img['src'];
}?>
With xpath you can query an attribute directly, string() give it's node-value:
$src = $xpath->evaluate("string(//img[#class='brand-logo']/#src)");
However I'm really sorry to say that I have no clue how that could be done with preg_match in your case ;)

Categories