I'm currently working on a fantasy sports site, and I want to be able to pull basic stats from another site. (I don't have much experience with XML or pulling data from other sites).
I inspected the element to get its XPath:
Which gave me: //*[@id="cp1_ctl01_pnlPlayerStats"]/table[1]/tbody/tr[4]/td[18]
I've looked into a couple methods of trying to pull the info and came up with this:
But I just end up with empty elements in my table within my site:
Here's My Code:
$doc = new DOMDocument();
@$doc->loadHTMLFile($P_RotoLink);
$xpath = new DOMXpath($doc);
$elements = $xpath->query('//*[@id="cp1_ctl01_pnlPlayerStats"]/table[1]/tbody/tr[4]/td[18]');
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
A few things I've tried have thrown errors, and any time I finally get past them or suppress them, I get empty content. I've tried a bunch of different formats, but none seem to give me the desired content.
Edit: Here's the source HTML, I want to grab the value within the td (13.0).
Edit 2: So this is what I'm trying now:
$html = file_get_contents($P_RotoLink);
$doc = new DOMDocument;
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_use_internal_errors(false);
$xpath = new DOMXpath( $doc);
foreach ($xpath->query('//*[@id="cp1_ctl01_pnlPlayerStats"]/table//tr[4]/td[18]') as $node) {
$ppg = substr($node->textContent,0,3);
echo $ppg;
}
The problem is that the table in the screenshot doesn't have a tbody node, but your XPath expression includes tbody, which causes DOMXPath::query to return an empty list of nodes. I suggest ignoring tbody and fetching the rows with //tr.
Example
$html = <<<'HTML'
<div id="cp1_ctl01_pnlPlayerStats">
<table>
<tr></tr>
<tr>
<td><span>0.9</span>1.0<span>3.0</span></td><td>2.0</td>
</tr>
</table>
</div>
HTML;
$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);
$expr = '//*[@id="cp1_ctl01_pnlPlayerStats"]/table//tr[2]/td[1]/text()';
$td = $xp->query($expr);
if ($td->length) {
var_dump($td[0]->nodeValue);
}
Output
string(3) "1.0"
The text() function selects all text node children of the context node.
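To make the tbody point concrete, here is a minimal sketch comparing the two expressions on markup that has no tbody in its source. The id "stats" and the cell values are made up for illustration; browser inspectors typically show a tbody that they inserted themselves, while DOMDocument keeps the table as written.

```php
<?php
// Hypothetical sample markup: the source contains no <tbody>.
$html = '<div id="stats"><table>'
      . '<tr><td>a</td></tr>'
      . '<tr><td>13.0</td></tr>'
      . '</table></div>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Path copied from a browser inspector, including tbody.
$withTbody = $xp->query('//*[@id="stats"]/table/tbody/tr[2]/td[1]');

// Same path with tbody skipped via //tr.
$withoutTbody = $xp->query('//*[@id="stats"]/table//tr[2]/td[1]');

echo $withTbody->length, "\n";                   // 0
echo $withoutTbody->length, "\n";                // 1
echo $withoutTbody->item(0)->textContent, "\n";  // 13.0
```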
Can you echo the results of a document parser or do you have to first create an array to display the results? Anyway, when running the code, nothing appears (no output or errors), and I have tried both methods. Could possibly be a site issue but I have tried a few others and get the same result.
<?php
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@id="lvtitle"]/a');
foreach($links as $a) {
echo $a->nodeValue;
}
?>
There are a couple of problems with the code. The first is that loadHTML() takes a string of HTML, not a filename or URI, so you have to read the web page first and pass its contents in (I've used file_get_contents() here).
Secondly, the XPath was looking for any <h3> tag with an id attribute of lvtitle, but the page only has elements whose class attribute is lvtitle. I've updated the XPath expression to use class instead.
$ebayquery ='halo';
$ebayhtml = 'https://www.ebay.com/sch/i.html_from=R40&_trksid=p2380057.m570.l1311.R6.TR12.TRC2.A0.H0.X.TRS0&_nkw=' . $ebayquery . '&_sacat=0';
$ebayresults = array();
$document = new \DOMDocument('1.0', 'UTF-8');
$internalErrors = libxml_use_internal_errors(true);
$ebayhtml = file_get_contents($ebayhtml);
$document->loadHTML($ebayhtml);
libxml_use_internal_errors($internalErrors);
$xpath = new DOMXpath($document);
$links = $xpath->query('//h3[@class="lvtitle"]/a');
print_r($links);
foreach($links as $a) {
echo $a->nodeValue.PHP_EOL;
}
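One caveat with comparing @class directly: it only matches when the attribute is exactly that string. The sketch below uses made-up markup (the second heading's extra class "featured" is an assumption, not taken from the real page) to show a common token-matching idiom that also matches elements carrying more than one class.

```php
<?php
// Hypothetical markup: the second <h3> carries an additional class.
$html = '<h3 class="lvtitle"><a href="/a">First</a></h3>'
      . '<h3 class="lvtitle featured"><a href="/b">Second</a></h3>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

// Exact comparison: matches only class="lvtitle".
$exact = $xp->query('//h3[@class="lvtitle"]/a');

// Token match: pad the class list with spaces and look for " lvtitle ".
$token = $xp->query(
    '//h3[contains(concat(" ", normalize-space(@class), " "), " lvtitle ")]/a'
);

echo $exact->length, "\n"; // 1
echo $token->length, "\n"; // 2
```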
I'm trying to get the contents of a span tag like this:
The problem is that the CSS class "AB" is used on many span tags in the HTML, and I want only the one span tag that has the itemprop ratingValue.
<span class="AB" itemprop='ratingValue'>Count</span>
In php I use generally this code:
$html= file_get_contents('url');
$html = escapeshellarg($html);
$html = nl2br($html);
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$results = $xpath->query("//*[@itemprop='ratingValue']");
if ($results->length > 0) {
echo $review_count_html = $results->item(0)->nodeValue;
}
This code generally works, but for this request I get no results. Can anybody help me? Thanks a lot.
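As a rough sketch, the same query can be run on the HTML as fetched, without the escapeshellarg() and nl2br() steps. This assumes those two calls are not needed here: escapeshellarg() is meant for shell arguments and rewrites quote characters, and nl2br() inserts extra br tags, so either could alter the markup before it is parsed. The sample HTML below is invented for illustration.

```php
<?php
// Hypothetical sample: two spans share class "AB", only one has itemprop.
$html = <<<'HTML'
<div>
<span class="AB">Other</span>
<span class="AB" itemprop='ratingValue'>Count</span>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$dom->loadHTML($html);            // the HTML is passed through unmodified
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$results = $xpath->query("//span[@itemprop='ratingValue']");

$value = $results->length > 0 ? $results->item(0)->nodeValue : null;
echo $value, "\n"; // Count
```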
I'm trying to get the Bing search results with XPath. Here is my code:
$html = file_get_contents("http://www.bing.com/search?q=bacon&first=11");
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHtml($html);
$x = new DOMXpath($doc);
$output = array();
// just grab the urls for now
foreach ($x->query("//li[@class='b_algo']") as $node)
{
//$output[] = $node->getAttribute("href");
$tmpDom = new DOMDocument();
$tmpDom->loadHTML($node);
$tmpDP = new DOMXPath($tmpDom);
echo $tmpDP->query("//div[@class='b_title']//h2//a//href");
}
return $output;
This foreach iterates over all results. All I want to do is extract the link and text from $node inside the loop, but because $node is itself an object, I can't create a DOMDocument from it. How can I query it?
First of all, your XPath expression tries to match nonexistent href subelements; query @href for the attribute.
You don't need to create any new DOMDocuments, just pass the $node as context item:
foreach ($x->query("//li[@class='b_algo']") as $node)
{
var_dump( $x->query("./div[@class='b_title']//h2//a//@href", $node)->item(0) );
}
If you're just interested in the URLs, you could also query them directly:
foreach ($x->query("//li[@class='b_algo']/div[@class='b_title']/h2/a/@href") as $node)
{
var_dump($node);
}
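Since the question asked for both the link and its text, here is a small sketch of the context-node approach that collects the two together. The markup is invented and only loosely modeled on the structure in the question; the example.com URLs are placeholders.

```php
<?php
// Hypothetical markup loosely modeled on the structure in the question.
$html = '<ul>'
      . '<li class="b_algo"><div class="b_title"><h2>'
      . '<a href="http://example.com/one">One</a></h2></div></li>'
      . '<li class="b_algo"><div class="b_title"><h2>'
      . '<a href="http://example.com/two">Two</a></h2></div></li>'
      . '</ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$x = new DOMXPath($doc);

$output = [];
foreach ($x->query("//li[@class='b_algo']") as $node) {
    // The second argument makes $node the context for the relative path.
    $a = $x->query(".//h2/a", $node)->item(0);
    if ($a instanceof DOMElement) {
        $output[] = [
            'url'  => $a->getAttribute('href'),
            'text' => trim($a->textContent),
        ];
    }
}

print_r($output);
```

Reading the attribute from the element with getAttribute() avoids a second XPath query per result.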
I want to use DOMXPath to loop through the nodes of a DOM and stop when it gets to the first piece of text.
So with this method I can capture and delete the first lot of line breaks but leave the rest after hello world:
$html = '<br><br><br>Hello World<br><br><br>';
I'm not sure what the $xpath query is to find plain text, but I imagine the code would be something like this:
$doc = new DOMDocument();
$doc->loadHTML($html);
showDOMNode($doc);
$i = 1;
$dom_xpath = new DOMXpath($doc);
foreach($nodes as $node) {
do {
$node->parentNode->removeChild($node);
} while ($i > 0);
if($node == $xpath->query("/:TEXT")){
$i = 0;
}
}
Just a rough piece of code but imagine what I want is something like that, could somebody fill in the gaps for me please.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
foreach($xpath->query('//br[not(preceding::text())]') as $node) {
$node->parentNode->removeChild($node);
}
return $doc->saveHTML();
@cHao the man!
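For anyone wanting to check the expression, this is a minimal sketch that runs it on the sample string from the question and counts the br elements before and after. The counter variable names are only for illustration.

```php
<?php
$html = '<br><br><br>Hello World<br><br><br>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$before = $xpath->query('//br')->length;

// Select only the <br> elements that have no text node before them in
// document order, i.e. the leading line breaks, and remove them.
foreach ($xpath->query('//br[not(preceding::text())]') as $node) {
    $node->parentNode->removeChild($node);
}

$after = $xpath->query('//br')->length;

echo "$before -> $after\n"; // 6 -> 3
echo $doc->saveHTML();
```

The result of query() is a snapshot rather than a live list, so removing nodes inside the loop does not disturb the iteration.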
I have an HTTP (Craigslist) link in the variable $link and load its contents into $linkhtml, so $linkhtml holds the HTML of the Craigslist page at $link.
I need to extract the text between <h2> and </h2>. I could use a regexp, but how do I do this with PHP DOM? I have this so far:
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
What do I do next to put the contents of the element <h2> into a var $title?
If DOMDocument looks too complicated to understand or use, you could try PHP Simple HTML DOM Parser, which offers a very simple way to parse HTML.
require 'simple_html_dom.php';
$html = '<h1>Header 1</h1><h2>Header 2</h2>';
$dom = new simple_html_dom();
$dom->load( $html );
$title = $dom->find('h2',0)->plaintext;
echo $title; // outputs: Header 2
You can use this code:
$linkhtml= file_get_contents($link);
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml); // loads your html
$xpath = new DOMXPath($doc);
$h2text = $xpath->evaluate("string(//h2/text())");
// $h2text is your text between <h2> and </h2>
You can do this with XPath (untested, may contain errors):
$linkhtml= file_get_contents($link);
$dom = new DOMDocument;
@$dom->loadHTML($linkhtml);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("/html/body/h2");
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
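The XPath variants above differ in what they return when the h2 contains nested markup or is not a direct child of body. The following sketch uses an invented page to compare them; the heading text and the wrapping div are assumptions for illustration.

```php
<?php
// Hypothetical page: the <h2> sits inside a <div> and contains a <b>.
$linkhtml = '<html><body><div>'
          . '<h2>Used <b>bike</b> for sale</h2>'
          . '</div></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($linkhtml);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// string() of a node set uses its first node: here the first text child.
$firstTextOnly = $xpath->evaluate('string(//h2/text())');

// string() of the element itself concatenates all descendant text.
$title = trim($xpath->evaluate('string(//h2)'));

// An absolute path only matches an <h2> directly under <body>.
$directChildren = $xpath->query('/html/body/h2')->length;

var_dump($firstTextOnly);   // string(5) "Used "
var_dump($title);           // string(18) "Used bike for sale"
var_dump($directChildren);  // int(0)
```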