Retrieve data from the first td in every tr

I'm scraping a page that consists of a table with several tr's. Every tr contains four td's, and I want to get the data from the first of these td's only. Below is the code I've tried so far, but it grabs all the td's. How can I accomplish what I want?
...
$html = new simple_html_dom();
$html = file_get_html($url);
foreach($html->find('table tr') as $row) {
foreach($row->find('td', 0) as $cell) {
echo $cell;
}
}

Think about why you're using the second foreach when you actually only mean to act on one element within each row.
$html = new simple_html_dom();
$html = file_get_html($url);
foreach($html->find('table tr') as $row) {
$cell = $row->find('td', 0);
echo $cell;
}

Simple HTML DOM is a turd. It's simpler to use the built-in DOM functions and XPath:
$dom = new DOMDocument();
@$dom->loadHTMLFile($url);
$xpath = new DOMXPath($dom);
foreach($xpath->query('//td[1]') as $td){
echo $td->nodeValue;
}
That said, I would probably still prefer to use phpQuery.
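For reference, here is a minimal, self-contained sketch of the XPath approach above. The sample markup and the function name firstCells are made up for illustration, and libxml_use_internal_errors() is swapped in for the @ operator to keep parser warnings quiet. It also scopes the query to //tr/td[1] so that only cells sitting directly inside a row are matched.

```php
<?php
// Hypothetical sample markup standing in for the scraped page.
$sampleHtml = '<table>
  <tr><td>A1</td><td>A2</td><td>A3</td><td>A4</td></tr>
  <tr><td>B1</td><td>B2</td><td>B3</td><td>B4</td></tr>
</table>';

/**
 * Return the trimmed text of the first <td> in every <tr>.
 * (Illustrative helper, not part of any library.)
 */
function firstCells(string $html): array
{
    $dom = new DOMDocument();
    // Scraped pages are rarely valid HTML; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $cells = [];
    // td[1] is evaluated relative to each tr, so one node per row comes back.
    foreach ($xpath->query('//tr/td[1]') as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

print_r(firstCells($sampleHtml));
```

With real pages you would pass the downloaded markup (for example from file_get_contents($url)) instead of the sample string.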

Related

get value of href inside of div from external site using PHP

Good day, Sir/Ma'am.
I want to look up a certain HTML attribute on an external website.
I want to get the a href value, but the problem is that the id, class, or name is random.
<div class="static">
Dynamic
</div>

This code should display all the hrefs in http://example.com.
In this case I use DOMDocument and XPath to select the elements you want to access, because they are very flexible and easy to use.
<?php
$html = file_get_contents("http://example.com");
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodeList = $xpath->query("//a/@href");
print_r($nodeList);
// To access the values inside nodes
foreach($nodeList as $node){
echo "<p>" . $node->nodeValue . "</p>";
}

Use jQuery to get the value as follows:
var link = $(".static>a").attr("href");

You can use PHP DOMDocument:
<?php
$exampleurl = "http://YourDomain.com"; //set your url
$filterClass = "dynamicclass";
$dom = new DOMDocument('1.0');
@$dom->loadHTMLFile($exampleurl);
$anchors = $dom->getElementsByTagName('a');
foreach ($anchors as $element) {
$href = $element->getAttribute('href'); // all href
$class = $element->getAttribute('class');
if($class==$filterClass){
echo $href;
}
}
?>
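If the link's own id and class are random but the wrapping div keeps a fixed class, one option is to anchor the XPath query on the div instead. The sketch below assumes markup shaped like the sample string (the anchor inside the div is my guess, since the question's snippet does not show it), and the function name is hypothetical.

```php
<?php
// Hypothetical markup: the link's id/class are random, the wrapper's class is not.
$sampleHtml = '<div class="static"><a id="x91" class="k3f" href="https://example.com/target">Dynamic</a></div>'
            . '<div class="other"><a href="https://example.com/ignored">Other</a></div>';

/**
 * Collect the href of every <a> found inside <div class="static">.
 * (Illustrative helper, not part of any library.)
 */
function hrefsInsideStaticDiv(string $html): array
{
    $dom = new DOMDocument();
    // Silence parser warnings from imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    // Exact match on the class attribute; a div carrying several classes would need contains().
    foreach ($xpath->query('//div[@class="static"]//a/@href') as $attr) {
        $hrefs[] = $attr->nodeValue;
    }
    return $hrefs;
}

print_r(hrefsInsideStaticDiv($sampleHtml));
```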

Getting data from HTML using DOMDocument

I'm trying to get data from HTML using DOM. I can get some data, but can't figure out how to get the rest. Here is an image highlighting the data I want.
http://i.imgur.com/Es51s5s.png
Here is the code itself:
http://pastebin.com/Re8qEivv
And here is my PHP code:
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
foreach ($td1->childNodes as $node){
$title = $node->textContent;
}
foreach ($td2->childNodes as $node){
$type = $node->textContent;
}
}

Figured it out
$html = file_get_contents('result.html');
$dom = new DOMDocument;
$dom->loadHTML($html);
$tr = $dom->getElementsByTagName('tr');
foreach ($tr as $row){
$td = $row->getElementsByTagName('td');
$td1 = $td->item(1);
$td2 = $td->item(2);
$title = $td1->childNodes->item(0)->textContent;
$firstURL = $td1->getElementsByTagName('a')->item(0)->getAttribute('href');
$type = $td2->childNodes->item(0)->textContent;
$imageURL = $td2->getElementsByTagName('img')->item(0)->getAttribute('src');
}

I have used the following class:
http://sourceforge.net/projects/simplehtmldom/
It is a very simple, easy-to-use class.
You can use
$html->find('#RosterReport > tbody', 0);
to find a specific table, and
$html->find('tr')
$html->find('td')
to find table rows or cells.
Note that $html is a variable holding the full HTML DOM content.

PHP DOMDocument Strings to Objects

I have created a PHP script using the DOM extension that scrapes multiple HTML files, looking for all p tags that have a specific class.
I then want to take the values inside those p tags and build an unordered list with the DOM extension.
My problem is that, while I can get the values and echo all of them onto a page, when I try to create elements and append each value in its own li tag, the result contains only the LAST item in the list. I hope that makes sense. Here is the code:
$dom = new DOMDocument();
$dom->formatOutput = true;
$dom->preserveWhiteSpace = false;
//looping through an array
foreach ($pages as $page) {
foreach ($page['pageContent'] as $listlinks) {
$dom->loadHTMLFile($theurl . 'content_id_' . $listlinks['content'] . '.html');
//create the xPath object after loading the html source, otherwise the query won't work:/
$xPath = new DOMXPath($dom);
//get the p nodes in a DOMNodeList that has class"content_header_type_2":
$nodeList = $xPath->query("//p[@class='content_header_type_2']");
//create a new DOMDocument and add a ul element:
$newDom = new DOMDocument();
$ul = $newDom->createElement('ul');
$newDom->appendChild($ul);
// append all nodes from $nodeList to the new dom, as children of $ul:
foreach ($nodeList as $domElement) {
$domNode = $newDom->importNode($domElement, true);
echo $domNode->nodeValue . '<br>'; //This gives the entire list
$li = $newDom->createElement('li', $domNode->nodeValue); //This gives the last value in the list
$ul->appendChild($li);
}
}
};
$output = $newDom ->saveHTML();
echo $output;
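One likely cause, judging only from the code shown: $newDom and its ul are created inside the loops, so every page starts a fresh document and only the one built last is still around when saveHTML() runs. A minimal sketch of the alternative is to create the output document once, before looping. Inline strings stand in for the content_id_*.html files here, and the function name is hypothetical.

```php
<?php
// Hypothetical page sources standing in for the content_id_*.html files.
$sources = [
    '<p class="content_header_type_2">First</p><p>skip me</p>',
    '<p class="content_header_type_2">Second</p>',
];

/**
 * Build one <ul> containing an <li> for every matching <p> across all sources.
 * (Illustrative helper, not part of any library.)
 */
function buildHeaderList(array $sources): string
{
    // Create the output document and its <ul> once, before any looping.
    $newDom = new DOMDocument();
    $ul = $newDom->createElement('ul');
    $newDom->appendChild($ul);

    foreach ($sources as $source) {
        $dom = new DOMDocument();
        // Silence parser warnings from imperfect markup.
        $previous = libxml_use_internal_errors(true);
        $dom->loadHTML($source);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        $xpath = new DOMXPath($dom);
        foreach ($xpath->query("//p[@class='content_header_type_2']") as $p) {
            $li = $newDom->createElement('li');
            // createTextNode() takes care of escaping special characters in the text.
            $li->appendChild($newDom->createTextNode(trim($p->textContent)));
            $ul->appendChild($li);
        }
    }
    return trim($newDom->saveHTML());
}

echo buildHeaderList($sources), PHP_EOL;
```

Only the text is copied across, so importNode() is not needed in this version.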

Simple HTML DOM gets only 1 element

I'm following a simplified version of the scraping tutorial by NetTuts here, which basically finds all divs with class=preview
http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/comment-page-1/#comments
This is my code. The problem is that when I count $items I get only 1, so it's getting only the first div with class=preview, not all of them.
$articles = array();
$html = new simple_html_dom();
$html->load_file('http://net.tutsplus.com/page/76/');
$items = $html->find('div[class=preview]');
echo "count: " . count($items);

Try using DOMDocument and DOMXPath:
$file = file_get_contents('http://net.tutsplus.com/page/76/');
$dom = new DOMDocument();
@$dom->loadHTML($file);
$domx = new DOMXPath($dom);
$nodelist = $domx->evaluate("//div[@class='preview']");
foreach ($nodelist as $node) { print $node->nodeValue; }

how to handle DOM in PHP

My PHP code
$dom = new DOMDocument();
@$dom->loadHTML($file);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="text"]');
foreach ($tags as $tag) {
echo $tag->textContent;
}
What I'm trying to do here is get the content of the div that has the class 'text'. The problem is that when I loop and echo the results, I only get the text. I can't get the HTML code with the images and all the HTML tags like p, br, img, etc. I also tried $tag->nodeValue, but that didn't work either.

Personally, I like Simple HTML Dom Parser.
include "lib.simple_html_dom.php";
$html = str_get_html($file);
foreach($html->find('div.text') as $e){
echo $e->innertext;
}
Pretty simple, huh? It accommodates selectors like jQuery :)

What you need to do is create a temporary document, add the element to that and then use saveHTML():
foreach ($tags as $tag) {
$doc = new DOMDocument;
$doc->appendChild($doc->importNode($tag, true));
$html = $doc->saveHTML();
}

I found this snippet at http://www.php.net/manual/en/class.domelement.php:
<?php
function getInnerHTML($Node)
{
$Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
$Document = new DOMDocument();
$Document->appendChild($Document->importNode($Body,true));
return $Document->saveHTML();
}
?>
Not sure if it works though.
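Another way to get at the inner HTML, sketched here under the assumption that your PHP version lets saveHTML() take a node argument (it has since 5.3.6), is to serialize each child of the matched div in turn. The sample markup and both function names are made up for illustration.

```php
<?php
// Hypothetical markup standing in for the page being scraped.
$sampleHtml = '<div class="text"><p>Hello <b>world</b></p><img src="a.png"></div>';

/**
 * Serialize the children of a node, i.e. its "inner HTML".
 * (Illustrative helper, not part of any library.)
 */
function innerHtml(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument dumps only that node and its descendants.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

/**
 * Return the inner HTML of every <div class="text"> in the given markup.
 */
function innerHtmlOfTextDivs(string $html): array
{
    $dom = new DOMDocument();
    // Silence parser warnings from imperfect markup.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $results = [];
    foreach ($xpath->query('//div[@class="text"]') as $div) {
        $results[] = innerHtml($div);
    }
    return $results;
}

foreach (innerHtmlOfTextDivs($sampleHtml) as $fragment) {
    echo $fragment, PHP_EOL;
}
```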
