Retrieve data from the first td in every tr - php

I'm scraping a page which contains of a table with several tr's. Inside every tr there's four td's, and I want to get the data from the first of these td's. Below is the code I've tried so far, but it grabs all the td's. How can I accomplish what I want?
...
$html = new simple_html_dom();
$html = file_get_html($url);
foreach($html->find('table tr') as $row) {
foreach($row->find('td', 0) as $cell) {
echo $cell;
}
}

Think about why you're using the second foreach when you actually only mean to act on one element within each row.
$html = new simple_html_dom();
$html = file_get_html($url);
foreach($html->find('table tr') as $row) {
$cell = $row->find('td', 0);
echo $cell;
}

simple html dom is a turd. It's simpler to use the built in dom functions and xpath:
$dom = new DOMDocument();
#$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//td[1]') as $td){
echo $td->nodeValue;
}
That said, I would probably still prefer to use phpquery

Related

get value of href inside of div from external site using PHP

good day Sir/Maam.
I have a certain html attribute that I want to search from the external website
I want to get the a href value but the problem is the id or class or name is random.
<div class="static">
Dynamic
</div>
This code should display all the hrefs in http://example.com
In this case I use DOMDocument and XPath to select the elements you want to access because it's very flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DomXPath($doc);
$nodeList = $xpath->query("//a/#href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}
use jquery to get the value as follow:
var link = $(".static>a").attr("href");
You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
#$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>

Getting data from HTML using DOMDocument

I'm trying to get data from HTML using DOM. I can get some data, but can't figure out how to get the rest. Here is an image highlighting the data I want.
http://i.imgur.com/Es51s5s.png
here is the code itself
http://pastebin.com/Re8qEivv
and here my PHP code
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
foreach ($td1->childNodes as $node){
$title = $node->textContent;
}
foreach ($td2->childNodes as $node){
$type = $node->textContent;
}
}
Figured it out
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
$title = $td1->childNodes->item(0)->textContent;
$firstURL = $td1->getElementsByTagName('a')->item(0)->getAttribute('href');
$type = $td2->childNodes->item(0)->textContent;
$imageURL = $td2->getElementsByTagName('img')->item(0)->getAttribute('src');
}
I have used following class.
http://sourceforge.net/projects/simplehtmldom/
This is very simple and easy to use class.
You can use
$html->find('#RosterReport > tbody', 0);
to find specific table
$html->find('tr')
$html->find('td')
to find table rows or columns
Note $html is variable have full html dom content.

PHP DOMDocument Strings to Objects

I have created a php script in PHP Dom where multiple html files are scraped to look for all P tags that contain a specific class.
I then want to get the values inside those p tags and build an unordered list in PHP Dom.
My problem is, while I can get the values and echo all of them onto a page, when I try to createElements and append each value in its own LI tag my results only returns the LAST item in the list. I hope that makes sense. Here is the code:
$dom = new DOMDocument();
$dom->formatOutput = true;
$dom->preservewhiteSpace = false;
//looping through an array
foreach ($pages as $page) {
foreach ($page['pageContent'] as $listlinks) {
$dom->loadHTMLFile($theurl . 'content_id_' . $listlinks['content'] . '.html');
//create the xPath object after loading the html source, otherwise the query won't work:/
$xPath = new DOMXPath($dom);
//get the p nodes in a DOMNodeList that has class"content_header_type_2":
$nodeList = $xPath->query("//p[#class='content_header_type_2']");
//create a new DOMDocument and add a ul element:
$newDom = new DOMDocument();
$ul = $newDom->createElement('ul');
$newDom->appendChild($ul);
// append all nodes from $nodeList to the new dom, as children of $ul:
foreach ($nodeList as $domElement) {
$domNode = $newDom->importNode($domElement, true);
echo $domNode->nodeValue . '<br>'; //This gives the entire list
$li = $newDom->createElement('li', $domNode->nodeValue); //This gives the last value in the list
$ul->appendChild($li);
}
}
};
$output = $newDom ->saveHTML();
echo $output;

Simple HTML DOM gets only 1 element

I'm following a simplified version of the scraping tutorial by NetTuts here, which basically finds all divs with class=preview
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/comment-page-1/#comments
This is my code. The problem is that when I count $items I get only 1, so it's getting only the first div with class=preview, not all of them.
$articles = array();
$html = new simple_html_dom();
$html->load_file('http://net.tutsplus.com/page/76/');
$items = $html->find('div[class=preview]');
echo "count: " . count($items);
Try using DOMDocument and DOMXPath:
$file = file_get_contents('http://net.tutsplus.com/page/76/');
$dom = new DOMDocument();
#$dom->loadHTML($file);
$domx = new DOMXPath($dom);
$nodelist = $domx->evaluate("//div[#class='preview']");
foreach ($nodelist as $node) { print $node->nodeValue; }

how to handle DOM in PHP

My PHP code
$dom = new DOMDocument();
#$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[#class="text"]');
foreach ($tags as $tag) {
echo $tag->textContent;
}
What I'm trying to do here is to get the content of the div that has class 'text' but the problem when I loop and echo the results I only get the text I can't get the HTML code with images and all the HTML tags like p, br, img... etc i tried to use $tag->nodeValue; but also nothing worked out.
Personally, I like Simple HTML Dom Parser.
include "lib.simple_html_dom.php"
$html = str_get_html($file);
foreach($html->find('div.text') as $e){
echo $e->innertext;
}
Pretty simple, huh? It accommodates selectors like jQuery :)
What you need to do is create a temporary document, add the element to that and then use saveHTML():
foreach ($tags as $tag) {
$doc = new DOMDocument;
$doc->appendChild($doc->importNode($tag, true));
$html = $doc->saveHTML();
}
I found this snippet at http://www.php.net/manual/en/class.domelement.php:
<?php
function getInnerHTML($Node)
{
$Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
$Document = new DOMDocument();
$Document->appendChild($Document->importNode($Body,true));
return $Document->saveHTML();
}
?>
Not sure if it works though.

Categories