Get all elements by class name using DOMDocument

This question seems to have been answered numerous times, but I still can't put the pieces together.
I would like to get the node value of every element with a given class name. For example:
<td class="thename"><strong>32</strong></td>
<td class="thename"><strong>12</strong></td>
I would like to grab the 32 and the 12. I assume this requires some sort of for loop, but I'm not sure exactly how to implement it. Here's what I have so far:
$domain = "http://domain.com";
$dom = new DOMDocument();
$dom->loadHTMLFile($domain);
$xpath = new DomXpath($dom);
$div = $xpath->query('//*[@class="thename"]')->item(0);
$stuff = $div ->textContent;
echo($stuff);

Is this what you are looking for?
$result = array();
$doc = <<< HTML
<html>
<body>
<div>1
<span>2</span>
</div>
<div>3</div>
<div>4
<span class="class1"><strong>5</strong></span>
<span class="class1"><strong>6</strong></span>
<span>7</span>
</div>
</body>
</html>
HTML;
$classname = "class1";
$domdocument = new DOMDocument();
$domdocument->loadHTML($doc);
$a = new DOMXPath($domdocument);
$spans = $a->query("//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
for ($i = $spans->length - 1; $i > -1; $i--) {
$result[] = $spans->item($i)->firstChild->nodeValue;
}
echo "<pre>";
print_r($result);
exit();
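
The same whole-token match can be applied to the markup from the question. What follows is only a minimal sketch: the inline HTML stands in for the remote page, and the extra `other` class and the `thenames` cell are assumptions added to show that a class is matched as a complete word rather than as a substring.

```php
<?php
// Sample markup modeled on the question; a real page would be loaded with loadHTMLFile().
$html = '<table><tr>'
      . '<td class="thename"><strong>32</strong></td>'
      . '<td class="thename other"><strong>12</strong></td>'
      . '<td class="thenames"><strong>99</strong></td>'
      . '</tr></table>';

$classname = 'thename';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the class attribute with spaces so the name only matches as a whole token.
$query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

$values = array();
foreach ($xpath->query($query) as $node) {
    // textContent includes the text of nested elements such as <strong>.
    $values[] = trim($node->textContent);
}

print_r($values);
```

Iterating the node list with foreach visits the matches in document order, so the values come out as 32 and then 12; the `thenames` cell is skipped.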

I simply did this in PHP:
$dom = new DOMDocument('1.0');
$classname = "product-name";
@$dom->loadHTMLFile("http://shophive.com/".$query);
$nodes = array();
$nodes = $dom->getElementsByTagName("div");
foreach ($nodes as $element)
{
$classy = $element->getAttribute("class");
if (strpos($classy, "product") !== false)
{
echo $classy;
echo '<br>';
}
}
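
Note that strpos() matches any substring, so a check for "product" would also accept a class such as "byproduct-names". If exact class names matter, one option is to split the attribute on whitespace first. The sketch below assumes a hypothetical helper named hasClass() and made-up sample markup; neither comes from the answer above.

```php
<?php
// Hypothetical helper (not part of the answer above): true when $className is one
// of the whitespace-separated tokens in the element's class attribute.
function hasClass(DOMElement $element, string $className): bool
{
    $tokens = preg_split('/\s+/', trim($element->getAttribute('class')), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($className, $tokens, true);
}

// Made-up markup for illustration only.
$html = '<div class="product-name">A</div>'
      . '<div class="product-name featured">B</div>'
      . '<div class="byproduct-names">C</div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$matches = array();
foreach ($dom->getElementsByTagName('div') as $element) {
    if (hasClass($element, 'product-name')) {
        $matches[] = $element->textContent;
    }
}

print_r($matches);
```

Here only the first two divs are collected, because the third carries a different class token.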

Related

Extracting information from <i> tag from HTML using PHP

My code returns an HTTP 500 error and I am a bit confused. I need to extract the numeric weather data from a weather forecast website and add it to my own website.
Here is the code:
orai_class.php
<?php
Class orai{
var $url;
function generate_orai($url){
$html = file_get_contents($url);
$classname = 'wi wi-1';
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@class='" . $classname . "']");
$i=0;
foreach($results as $node)
{
if ($results->length > 0) {
$array[] = $results->item($i)->nodeValue;
}
$i++;
}
return $array;
}
}
?>
index.php
<?php
include("orai.class.php");
$orai = new orai();
print_r($orai->generate_orai('https://orai.15min.lt/prognoze/vilnius'));
?>
Thank You.

How to web-scrape in divs with DOMparser

I am trying to get a div from one page, and then from other pages by putting the code in a loop.
But I am running into some trouble:
<div class="article_info">
<ul class="c-result_box">
<li>
<div class="inner cf">
<div class="c-header">
<div class="c-logo">
<img src="/e/designs/31sumai/common/img/logo_08.png" alt="#">
</div>
<p class="c-supplier">三井のマンション</p>
<p class="c-name">
パークリュクス大阪天満
</p>
I'm trying to get the text inside the <a> element. Here is my code; what am I missing?
$start_id = 1501;
while(true){
$url = 'https://www.31sumai.com/mfr/K'.$start_id.'/outline.html';
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);
$classname="c-name";
$nodes = $finder->query("//*[contains(@class, '$classname')]");
$MyTable = false;
$insertData = [];
foreach($nodes as $node){
$allNames = [];
foreach($node->getElementsByTagName('a') as $a){
$name = $a->getElementsByTagName('a');
$allProperties[] = [
'names' => $name];
}
}
Thank you for helping!

You can rely on your XPath query to pull all the text nodes that you want, and then just get the nodeValue property within your loop:
$start_id = "1501";
$url = "https://www.31sumai.com/mfr/K$start_id/outline.html";
$html = file_get_contents($url);
libxml_use_internal_errors(true);
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
$xpath = new \DOMXPath($DOMParser);
$classname="c-name";
$nodes = $xpath->query("//*[contains(@class, '$classname')]/a/text()");
foreach($nodes as $node){
echo $node->nodeValue;
}
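
To try the query without fetching the remote page, the same steps can be run against an inline string. This is a sketch under stated assumptions: the <a> element inside p.c-name is inferred from the question's wording (the quoted markup does not show it), and the placeholder names are invented.

```php
<?php
// Assumed markup: the <a> inside p.c-name is a guess based on the question's wording.
$html = '<div class="c-header">'
      . '<p class="c-supplier">Example Supplier</p>'
      . '<p class="c-name"><a href="#">  Example Residence  </a></p>'
      . '</div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$DOMParser = new \DOMDocument();
$DOMParser->loadHTML($html);
libxml_clear_errors();

$xpath = new \DOMXPath($DOMParser);
$classname = 'c-name';

$names = array();
// text() selects the text nodes directly under each matching <a>.
foreach ($xpath->query("//*[contains(@class, '$classname')]/a/text()") as $textNode) {
    // Scraped text often carries surrounding whitespace, so trim it.
    $names[] = trim($textNode->nodeValue);
}

print_r($names);
```

Collecting the trimmed values into an array, rather than echoing them, makes it easier to build the per-page records the question was aiming for.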

Get Element by ClassName with DOMdocument() Method

Here is what I am trying to achieve: retrieve all products on a page and put them into an array. Here is the code I am using:
$page2 = curl_exec($ch);
$doc = new DOMDocument();
@$doc->loadHTML($page2);
$nodes = $doc->getElementsByTagName('title');
$noders = $doc->getElementsByClassName('productImage');
$title = $nodes->item(0)->nodeValue;
$product = $noders->item(0)->imageObject.src;
It works for the $title but not for the product. For reference, the img tag in the HTML looks like this:
<img alt="" class="productImage" data-altimages="" src="xxxx">
I have been looking at this (PHP DOMDocument how to get element?) but I still don't understand how to make it work.
PS: I get this error:
Call to undefined method DOMDocument::getElementsByclassName()

I finally used the following solution:
$classname="blockProduct";
$finder = new DomXPath($doc);
$spaner = $finder->query("//*[contains(@class, '$classname')]");
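
For the image URLs the question asked about, the src can be read with getAttribute() on each matched node. The following is a rough sketch, assuming a page shaped like the img tag quoted in the question; the surrounding markup and the /img/... paths are invented for illustration.

```php
<?php
// Invented sample page; only the <img> attributes follow the question's snippet.
$page = '<html><head><title>Shop</title></head><body>'
      . '<div class="blockProduct"><img alt="" class="productImage" data-altimages="" src="/img/a.jpg"></div>'
      . '<div class="blockProduct"><img alt="" class="productImage" data-altimages="" src="/img/b.jpg"></div>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

$finder = new DOMXPath($doc);
// DOMDocument has no getElementsByClassName(), so select by class through XPath.
$images = $finder->query("//img[contains(concat(' ', normalize-space(@class), ' '), ' productImage ')]");

$products = array();
foreach ($images as $img) {
    // Attributes are read with getAttribute(), not with JavaScript-style properties.
    $products[] = $img->getAttribute('src');
}

print_r($products);
```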

https://stackoverflow.com/a/31616848/3068233
Linking this answer as it helped me the most with this problem.
function getElementsByClass(&$parentNode, $tagName, $className) {
$nodes=array();
$childNodeList = $parentNode->getElementsByTagName($tagName);
for ($i = 0; $i < $childNodeList->length; $i++) {
$temp = $childNodeList->item($i);
if (stripos($temp->getAttribute('class'), $className) !== false) {
$nodes[]=$temp;
}
}
return $nodes;
}
That's the code, and here's the usage:
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTML($html);
$content_node=$dom->getElementById("content_node");
$div_a_class_nodes=getElementsByClass($content_node, 'div', 'a');

function getElementsByClassName($dom, $ClassName, $tagName=null) {
if($tagName){
$Elements = $dom->getElementsByTagName($tagName);
}else {
$Elements = $dom->getElementsByTagName("*");
}
$Matched = array();
for($i=0;$i<$Elements->length;$i++) {
if($Elements->item($i)->attributes->getNamedItem('class')){
if($Elements->item($i)->attributes->getNamedItem('class')->nodeValue == $ClassName) {
$Matched[]=$Elements->item($i);
}
}
}
return $Matched;
}
// usage
$dom = new \DOMDocument('1.0');
@$dom->loadHTML($html);
$elementsByClass = getElementsByClassName($dom, $className, 'h1');

Print an array after DOM extraction?

I need to print out my array, but print_r($test) at the end doesn't work...
Here is a simplified version of the code:
$code = '<html><head></head><body><div class="list"><img src="http://google.com/564308080517287.jpg" alt="my title"></div></body></html>'; // Code is simplified here, but imagine you've got much more contents inside
$doc = new DOMDocument();
$doc->loadHTML( $code );
//
$test = array();
foreach($doc->getElementsByTagName('div') as $div){
if($div->getAttribute('class') == "list"){
$ads_count = $div->getElementsByTagName('a')->length;
for ($i=0; $i<=$ads_count; $i++) {
$ad = $div->getElementsByTagName('a')->item($i);
$ad_img = trim($ad->getElementsByTagName('img')->item(0)->getAttribute('src'));
$test[$i]['img'] = $ad_img;
}
}
}
print_r($test); // doesn't work !!
Any ideas?

<?php
$code = '<html><head></head><body><div class="list">
<img src="http://google.com/564308080517287.jpg" alt="my title"></div></body></html>'; // Code is simplified here, but imagine you've got much more contents inside
$dom = new DOMDocument();
$dom->loadHtml($code);
$selector = new DOMXPath($dom);
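// Match every <img> under the div, whether or not it is wrapped in an <a>.
$parceiltable = $selector->query("//div[@class='list']//img");
$test = array(); // initialize so print_r() works even when nothing matches
foreach($parceiltable as $key=>$tds){
$test[]['img'] = $tds->getAttribute('src');
}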
print_r($test);
?>

Xpath for extracting links

I am creating a scraper for an automotive site. First I want to get all manufacturers, and then all model links for each manufacturer, but with the code below I get only the first model in the list. Why?
<?php
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://www.auto-types.com');
$xpath = new DOMXPath($dom);
$entries = $xpath->query("//li[@class='clearfix_center']/a/@href");
$output = array();
foreach($entries as $e) {
$dom2 = new DOMDocument();
@$dom2->loadHTMLFile('http://www.auto-types.com' . $e->textContent);
$xpath2 = new DOMXPath($dom2);
$data = array();
$data['newLinks'] = trim($xpath2->query("//div[@class='modelImage']/a/@href")->item(0)->textContent);
$output[] = $data;
}
echo '<pre>' . print_r($output, true) . '</pre>';
?>
So I need to get mercedes/100, mercedes/200, mercedes/300, but my script currently returns only the first link, mercedes/100.
Please help.

You need to iterate through the results instead of just taking the first item:
$items = $xpath2->query("//div[@class='modelImage']/a/@href");
$links = array();
foreach($items as $item) {
$links[] = $item->textContent;
}
$data['newLinks'] = implode(', ', $links);
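
The loop can be checked offline against a small inline document. This is only a sketch: the markup below is an assumption modeled on the class names in the question, and it keeps the links as an array instead of joining them, which may be more convenient if each model page is fetched afterwards.

```php
<?php
// Assumed markup, modeled on the class names used in the question.
$html = '<div class="modelImage"><a href="/mercedes/100">100</a></div>'
      . '<div class="modelImage"><a href="/mercedes/200">200</a></div>'
      . '<div class="modelImage"><a href="/mercedes/300">300</a></div>';

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom2 = new DOMDocument();
$dom2->loadHTML($html);
libxml_clear_errors();

$xpath2 = new DOMXPath($dom2);

$links = array();
// The query returns one attribute node per matching <a>, so visit all of them.
foreach ($xpath2->query("//div[@class='modelImage']/a/@href") as $attr) {
    $links[] = trim($attr->nodeValue);
}

print_r($links);
```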
