How to get a table by ID from a URL? - php

I am attempting to get a table from a specific URL by its ID. My method is to fetch the raw HTML from the URL, load it into a DOM that PHP can read, and then find the table via an XPath query.
The result of the code below is that $elements is always empty (length of 0).
<?php
$c = curl_init('http://www.urlhere.com/');
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$html = curl_exec($c);
if (curl_error($c))
die(curl_error($c));
curl_close($c);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXpath($dom);
$elements = $xpath->query("*/table[@id=anyid]");
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "<br/>[". $element->nodeName. "]";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
How can I render this table successfully on my page?
EDIT:
A snippet of the HTML I am trying to get, taken directly from the $html variable:
<div></div><table class=sortable id=anyid></table>

Following up on the comments: you could first suppress those parser errors with:
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
This is tackled thoroughly in the discussion here.
Then to apply it, just add it in your code:
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXpath($dom);
$elements = $xpath->query("//table[@id='anyid']");
if (!is_null($elements)) {
foreach ($elements as $element) {
echo "<br/>[". $element->nodeName. "]";
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
Sample Output
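For a quick check without any network access, the same query can be run against an inline string. This is only a minimal sketch: the markup is the snippet from the question's edit with one made-up row added for illustration, and $rendered is a name introduced here.

```php
<?php
// Hypothetical markup: the snippet from the question plus one illustrative row.
$html = '<div></div><table class=sortable id=anyid><tr><td>cell</td></tr></table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// @id selects the attribute; the value is quoted so it is a string literal.
$tables = $xpath->query("//table[@id='anyid']");

// saveHTML($node) serializes the node together with its markup,
// so the table can be echoed into the page as-is.
$rendered = $tables->length > 0 ? $dom->saveHTML($tables->item(0)) : '';
echo $rendered;
```

Left unquoted, anyid is read by XPath as the name of a child element rather than as a string, which is one reason the original query matched nothing.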

Related

Xpath nodeValue/textContent unable to see <BR> tag

HTML is as follows:
<a>ABC<BR>DEF</a>
However, both the nodeValue and textContent properties show "ABCDEF" as the value.
Is there any way to show or parse the <BR>?
Maybe this will help you: DOMNode::C14N
It returns the canonicalized markup of the node, including its child tags.
<?php
$a = '<a>ABC<BR>DEF</a>'; // wrapped in <a> so that the //a query below has something to match
$doc = new DOMDocument();
@$doc->loadHTML($a);
$finder = new DomXPath($doc);
$nodes = $finder->query("//a");
foreach ($nodes as $node) {
var_dump($node->c14n());
}
Demo
I know you have already solved your problem, but I wanted to add a more direct way of solving it...
$a = '<a>ABC<BR>DEF</a>'; // wrapped in <a> so that //a/node() selects the text and the <br>
$doc = new DOMDocument();
$doc->loadHTML($a);
$xp = new DomXPath($doc);
$nodes = $xp->query("//a/node()");
$text = '';
foreach ($nodes as $node) {
$text .= $doc->saveHTML($node);
}
echo $text;
Outputs...
ABC<br>DEF

How to get an exact value from a website using php DOM and save it in a database?

I want to get the value of the span with the id "CPH1_lblCurrent" from the URL and save it in the database.
Here is the code that I tried after looking at some examples.
<?php
$file = $DOCUMENT_ROOT. "http://www.mypetrolprice.com/2/Petrol-price-in-Delhi";
$doc = new DOMDocument();
$doc->loadHTMLFile($file);
$xpath = new DOMXpath($doc);
$elements = $xpath->query('//span[@id="CPH1_lblCurrent"]');
if (!is_null($elements)) {
foreach ($elements as $element) {
$nodes = $element->childNodes;
foreach ($nodes as $node) {
echo $node->nodeValue. "\n";
}
}
}
?>
This shows me the following.
Current Delhi Petrol Price = 67.12 Rs/Ltr
But I want only the value 67.12.
Can somebody help me?
Try using this simple regex with preg_match to get the number:
.*= ([\d.]+) .*
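Applied with preg_match, that pattern could look like the sketch below. The sample string is copied from the question's output; the variable names $text, $matches and $price are only illustrative.

```php
<?php
// Sample string copied from the output shown in the question.
$text = 'Current Delhi Petrol Price = 67.12 Rs/Ltr';

// The first capture group holds the run of digits and dots that follows "= ".
if (preg_match('/.*= ([\d.]+) .*/', $text, $matches)) {
    $price = (float) $matches[1];   // cast the captured string to a number
    echo $price;
}
```

The captured value in $matches[1] is a string, so casting it before storing it keeps the database column numeric.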

Parsing HTML to extract array of DIV content by class

$html = file_get_contents("https://www.wireclub.com/chat/room/music");
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = array();
foreach($xpath->evaluate('//div[@class="message clearfix"]/node()') as $childNode) {
$result[] = $dom->saveHtml($childNode);
}
echo '<pre>'; var_dump($result);
I would like the content of each individual DIV in an array to be processed individually.
This code is clumping every DIV together.
You could retrieve all the div elements and read the nodeValue of each one:
$dom = new DOMDocument();
$dom->loadHTML($html);
$myDivs = $dom->getElementsByTagName('div');
foreach($myDivs as $key => $value) {
$result[] = $value->nodeValue;
}
var_dump($result);
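To select by class instead, you could use XPath as in your own code:
$classname = 'message'; // class name to match, taken from the question's markup
$xpath = new DOMXPath($dom);
$myElem = $xpath->query("//*[contains(@class, '$classname')]");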
foreach($myElem as $key => $value) {
$result[] = $value->nodeValue;
}
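If the inner markup of each div should be kept, one array entry per div, the child nodes can be serialized inside a loop over the divs rather than in a single query over all child nodes. A minimal sketch, assuming made-up markup in place of the real chat page; the padded-class XPath expression is a common idiom for matching a whole class token and is not taken from the answer above.

```php
<?php
// Hypothetical markup standing in for the chat page.
$html = '<div class="message clearfix"><b>ann</b> hi</div>'
      . '<div class="message clearfix"><b>bob</b> hello</div>'
      . '<div class="messages-footer">ignored</div>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Pad the class list with spaces so "message" only matches as a whole token
// (a plain contains() would also match "messages-footer").
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' message ')]";

$result = array();
foreach ($xpath->query($query) as $div) {
    $inner = '';
    foreach ($div->childNodes as $child) {
        $inner .= $dom->saveHTML($child);   // keep the markup inside this div
    }
    $result[] = $inner;                     // one array entry per div
}
var_dump($result);
```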

Crawling through Amazon Bestsellers page

<?php
$i=1;
while ($i<=5) {
# code...
$url = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_0#'.$i;
echo $url;
$html= file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
$classname="zg_title";
$elements = $xPath->query("//*[contains(@class, '$classname')]");
foreach ($elements as $e)
{
$lnk = $e->getAttribute('href');
$e->setAttribute("href", "http://www.amazon.in".$lnk);
$newdoc = new DOMDocument;
$e = $newdoc->importNode($e, true);
$newdoc->appendChild($e);
$html = $newdoc->saveHTML();
echo $html;
}
$i++;
}
?>
I am trying to crawl the Amazon bestsellers page, which lists the top 100 bestselling items, 20 per page. In every loop iteration the value of $i changes and is appended to the URL, but only the first 20 items are displayed, five times over. I think this has something to do with the AJAX pagination, but I am not able to figure out what it is.
Try this:
<?php
$i=1;
while ($i<=5) {
# code...
$url = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_electronics_pg_'.$i.'?ie=UTF8&pg='.$i;
echo $url;
$html= file_get_contents($url);
$dom = new DOMDocument();
@$dom->loadHTML($html);
$xPath = new DOMXPath($dom);
$classname="zg_title";
$elements = $xPath->query("//*[contains(@class, '$classname')]");
foreach ($elements as $e)
{
$lnk = $e->getAttribute('href');
$e->setAttribute("href", "http://www.amazon.in".$lnk);
$newdoc = new DOMDocument;
$e = $newdoc->importNode($e, true);
$newdoc->appendChild($e);
$html = $newdoc->saveHTML();
echo $html;
}
$i++;
}
?>
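Change your $url. The part after # is a URL fragment, which is never sent to the server, so every request returned the first page; the pg query parameter selects the page instead.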
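The difference between the two URL shapes can be checked without any request. The sketch below uses parse_url to show that the page number in the original URL sits in the fragment, and builds the five paginated URLs with http_build_query; the variable names $old and $urls are only illustrative, and whether the site still serves these URLs is not verified here.

```php
<?php
// URL shape from the question: the page number sits after "#".
$old = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_nav_0#2';

// The fragment is handled client-side only; the request carries no query string.
var_dump(parse_url($old, PHP_URL_FRAGMENT));
var_dump(parse_url($old, PHP_URL_QUERY));

// URL shape from the answer: the page number is a real query parameter.
$urls = array();
for ($i = 1; $i <= 5; $i++) {
    $base   = 'http://www.amazon.in/gp/bestsellers/electronics/ref=zg_bs_electronics_pg_' . $i;
    $urls[] = $base . '?' . http_build_query(array('ie' => 'UTF8', 'pg' => $i));
}
print_r($urls);
```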

php dom not able to find any nodes

I'm trying to get the href of all anchor(a) tags using this code
$obj = json_decode($client->getResponse()->getContent());
$dom = new DOMDocument;
if($dom->loadHTML(htmlentities($obj->data->partial))) {
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHtml($node), PHP_EOL;
echo $node->getAttribute('href');
}
}
where the returned JSON is like the one here, but it doesn't echo anything. The HTML does contain a tags, but the body of the foreach never runs. What am I doing wrong?
Just remove that htmlentities() call and it will work fine. htmlentities() escapes < and > into entities, so the parser sees plain text instead of tags.
$contents = file_get_contents('http://jsonblob.com/api/jsonBlob/54a7ff55e4b0c95108d9dfec');
$obj = json_decode($contents);
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTML($obj->data->partial);
libxml_clear_errors();
foreach ($dom->getElementsByTagName('a') as $node) {
echo $dom->saveHTML($node) . '<br/>';
echo $node->getAttribute('href') . '<br/>';
}
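The effect of htmlentities() can be seen by parsing the same fragment both ways and counting the anchors. A minimal sketch, assuming a made-up fragment in place of $obj->data->partial; countAnchors is a helper introduced here for illustration.

```php
<?php
// Hypothetical fragment standing in for $obj->data->partial.
$partial = '<p><a href="/one">One</a> <a href="/two">Two</a></p>';

// Parse an HTML string and return how many <a> elements it contains.
function countAnchors(string $html): int
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    return $dom->getElementsByTagName('a')->length;
}

$escaped = countAnchors(htmlentities($partial)); // tags were turned into entity text
$raw     = countAnchors($partial);               // tags are parsed as elements

echo "escaped: $escaped, raw: $raw\n";
```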
