Undefined property: DOMNodeList::$textContent when parsing a web page - php

My code1 parses the page and gets the td content for me.
code1
<?php
$url='http://www.sse.com.cn/marketservices/tradingservice/shhksc/eligible/';
$html = file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@id="hk_view"]//table[@class="tablestyle"]//tr//td[position()<4 and position()>1]');
foreach($nodes as $node){
echo $node->textContent.'</br>';}
?>
Now I have changed the code to parse the page in a different way.
code2
<?php
$url='http://www.sse.com.cn/marketservices/tradingservice/shhksc/eligible/';
$html = file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@id="hk_view"]//table[@class="tablestyle"]//tr');
foreach($nodes as $node){
$sub =$xpath->query('//td[position()<4 and position()>1]' ,$node);
echo $sub->textContent.'</br>';}
?>
Is the XPath expression wrong?
$sub =$xpath->query('//td[position()<4 and position()>1]' ,$node);
It is the result of my code1.
Following har07's answer, I rewrote code2 as code3, but another problem remains. Please test it with my code3.
code3
<?php
$url='http://www.sse.com.cn/marketservices/tradingservice/shhksc/eligible/';
$html = file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//div[@id="hk_view"]//table[@class="tablestyle"]//tr');
foreach($nodes as $node){
$sub =$xpath->query('//td[position()<4 and position()>1]' ,$node);
foreach($sub as $s){
echo $s->textContent.'</br>';
}
}
?>

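The problem isn't in the XPath expression you use. As the error message suggests, query() returns a DOMNodeList, which doesn't have a textContent property. It is DOMNode that has textContent.
You need to iterate through the DOMNodeList to access its individual DOMNode members, and read the textContent property on each DOMNode. Also start the inner expression with . so that it is evaluated relative to the current tr; a bare //td searches the whole document regardless of the context node: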
foreach($nodes as $node){
$sub = $xpath->query('.//td[position()<4 and position()>1]' ,$node);
foreach($sub as $s){
echo $s->textContent;
}
}
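
To see the difference without fetching the live page, here is a minimal sketch run against inline markup. The table contents (A1, B1, ...) are made up for illustration; only the id and class names come from the question.

```php
<?php
// Hypothetical markup standing in for the fetched page.
$html = '<div id="hk_view"><table class="tablestyle">'
      . '<tr><td>A1</td><td>B1</td><td>C1</td><td>D1</td></tr>'
      . '<tr><td>A2</td><td>B2</td><td>C2</td><td>D2</td></tr>'
      . '</table></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//div[@id="hk_view"]//table[@class="tablestyle"]//tr') as $tr) {
    $cells = [];
    // query() returns a DOMNodeList, so loop over it to reach each DOMNode.
    // The leading "." keeps the search inside the current <tr>.
    foreach ($xpath->query('.//td[position()<4 and position()>1]', $tr) as $td) {
        $cells[] = trim($td->textContent);
    }
    $rows[] = $cells;
}
print_r($rows);

// Without the dot the expression is absolute, so the context node is ignored.
$firstRow = $xpath->query('//tr')->item(0);
$relative = $xpath->query('.//td[position()<4 and position()>1]', $firstRow)->length;
$absolute = $xpath->query('//td[position()<4 and position()>1]', $firstRow)->length;
echo "relative: $relative, absolute: $absolute\n";
```

With the absolute form, every pass through the outer loop returns the matching cells of all rows, which would explain repeated output in code3.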

Related

How to parse body class with Xpath?

I'm trying to parse a page with XPath, but I can't get the body class.
Here is what I'm trying:
<?php
$url = 'http://figurinepop.com/mickey-paintbrush-disney-funko';
$html = file_get_contents($url);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXpath($doc);
$nodes = $xpath->query('//link[@rel="canonical"]/@href');
foreach($nodes as $node) {
$canonical = $node->nodeValue;
}
$nodes = $xpath->query('//html/body/@class');
foreach($nodes as $node) {
$bodyclass = $node->nodeValue;
}
$output['canonical'] = $canonical;
$output['bodyclass'] = $bodyclass;
echo '<pre>'; print_r ($output); echo '</pre>';
?>
Here is what I get :
Array
(
[canonical] => http://figurinepop.com/mickey-paintbrush-disney-funko
[bodyclass] =>
)
It works for many elements (title, canonical, div...) but not for the body class.
I've tested the Xpath query with a chrome extension and it seems well written.
What is wrong ?

Xpath nodeValue/textContent unable to see <BR> tag

HTML is as follows:
<a>ABC<BR>DEF</a>
However, both nodeValue and textContent attributes show "ABCDEF" as the value.
Any way to show or parse the <BR>?
Maybe this'll help you: DOMNode::C14N
It'll return the HTML of the node.
<?php
$a = '<a>ABC<BR>DEF</a>'; // wrapped in <a> so that the //a query below has something to match
$doc = new DOMDocument();
@$doc->loadHTML($a);
$finder = new DomXPath($doc);
$nodes = $finder->query("//a");
foreach ($nodes as $node) {
var_dump($node->c14n());
}
I know you have already solved your problem, but I wanted to add a more direct way of solving it...
$a = '<a>ABC<BR>DEF</a>'; // wrapped in <a> so that //a/node() has something to match
$doc = new DOMDocument();
$doc->loadHTML($a);
$xp = new DomXPath($doc);
$nodes = $xp->query("//a/node()");
$text = '';
foreach ($nodes as $node) {
$text .= $doc->saveHTML($node);
}
echo $text;
Outputs...
ABC<br>DEF
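
The same idea can be wrapped in a small helper. This is only a sketch: innerHtml() is a made-up name, not something the DOM extension provides, and the markup is a stand-in.

```php
<?php
// innerHtml() is a hypothetical helper, not part of PHP's DOM extension.
// It serializes every child of $node, so tags such as <br> are preserved.
function innerHtml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        $out .= $node->ownerDocument->saveHTML($child);
    }
    return $out;
}

$doc = new DOMDocument();
$doc->loadHTML('<a>ABC<BR>DEF</a>');
$link = $doc->getElementsByTagName('a')->item(0);

$markup = innerHtml($link);      // child nodes serialized as HTML
$text   = $link->textContent;    // text only, the <br> is dropped

echo $markup, "\n";
echo $text, "\n";
```

textContent concatenates text nodes only, which is why the tag disappears there, while saveHTML() serializes each child node including elements.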

Parsing HTML to extract array of DIV content by class

$html = file_get_contents("https://www.wireclub.com/chat/room/music");
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = array();
foreach($xpath->evaluate('//div[@class="message clearfix"]/node()') as $childNode) {
$result[] = $dom->saveHtml($childNode);
}
echo '<pre>'; var_dump($result);
I would like the content of each individual DIV in an array to be processed individually.
This code is clumping every DIV together.
You could retrieve all the div elements and read each one's nodeValue:
$dom = new DOMDocument();
$dom->loadHTML($html);
$myDivs = $dom->getElementsByTagName('div');
foreach($myDivs as $key => $value) {
$result[] = $value->nodeValue;
}
var_dump($result);
To select by class, you could use XPath with your own code:
$classname = 'message'; // the class to look for
$xpath = new DOMXPath($dom);
$myElem = $xpath->query("//*[contains(@class, '$classname')]");
foreach($myElem as $key => $value) {
$result[] = $value->nodeValue;
}
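
One caveat with contains(@class, ...) is that it matches substrings, so 'message' would also match a class such as messages-header. A minimal sketch of a whole-token match follows; the markup is invented, only the class names "message clearfix" come from the question.

```php
<?php
// Hypothetical markup; the third div is there to show the substring problem.
$html = '<div class="message clearfix">first</div>'
      . '<div class="message clearfix">second</div>'
      . '<div class="messages-header">not a message</div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$classname = 'message';
// Pad the attribute value with spaces so only a whole class token matches.
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]";

$result = [];
foreach ($xpath->query($query) as $div) {
    $result[] = trim($div->textContent);   // one array entry per div
}
var_dump($result);

// For comparison: the plain substring test also picks up the third div.
$loose = $xpath->query("//div[contains(@class, '$classname')]")->length;
echo "substring matches: $loose\n";
```

Because the outer loop runs once per div, each div ends up as its own array entry instead of being clumped together.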

PHP Xpath Error already defined in Entity not showing results

I am getting errors in this PHP XPath app that I cannot fix. I would love some help if possible.
<?php
//Get Username
$username = $_GET["u"];
$html = file_get_contents('http://us.playstation.com/publictrophy/index.htm?onlinename=' .$username);
$html = tidy_repair_string($html);
$doc = new DomDocument();
$doc->loadHtml($html);
$xpath = new DomXPath($doc);
// Now query the document:
foreach ($xpath->query('//*[@id="id-handle"]') as $node) {
echo $node, "\n";
}
foreach ($xpath->query('//*[@id="leveltext"]') as $node1) {
echo $node1, "\n";
}
?>
Put @ before $dom->loadHTML($html), because loadHTML usually raises a lot of warnings and notices on real-world markup:
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
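
The @ operator hides every warning from that call. An alternative sketch, assuming you would rather keep the parser messages for inspection, uses libxml_use_internal_errors(). The markup below is a made-up stand-in for the fetched page. Note also that a DOMElement cannot be echoed directly, so the loop prints textContent.

```php
<?php
// Deliberately sloppy, hypothetical markup standing in for the fetched page.
$html = '<div id="id-handle">player</div></p>';

// Collect libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

$errors = libxml_get_errors();            // array of LibXMLError objects (may be empty)
libxml_clear_errors();                    // empty the internal buffer
libxml_use_internal_errors($previous);    // restore the earlier setting

$xpath = new DOMXPath($doc);
$handle = '';
foreach ($xpath->query('//*[@id="id-handle"]') as $node) {
    $handle = trim($node->textContent);   // read the text, do not echo the node itself
    echo $handle, "\n";
}
echo count($errors), " parse problem(s) recorded\n";
```

How many problems get recorded depends on the libxml2 version, so treat the count as diagnostic output rather than something to rely on.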

Simple HTML DOM gets only 1 element

I'm following a simplified version of the scraping tutorial by NetTuts here, which basically finds all divs with class=preview
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/comment-page-1/#comments
This is my code. The problem is that when I count $items I get only 1, so it's getting only the first div with class=preview, not all of them.
$articles = array();
$html = new simple_html_dom();
$html->load_file('http://net.tutsplus.com/page/76/');
$items = $html->find('div[class=preview]');
echo "count: " . count($items);
Try using DOMDocument and DOMXPath:
$file = file_get_contents('http://net.tutsplus.com/page/76/');
$dom = new DOMDocument();
@$dom->loadHTML($file);
$domx = new DOMXPath($dom);
$nodelist = $domx->evaluate("//div[@class='preview']");
foreach ($nodelist as $node) { print $node->nodeValue; }
