Loop through a table with Simple HTML DOM - php

I'm trying to scrape data out of a table on a website. I wrote the following PHP, but it isn't working.
I get the following error: Notice: Trying to get property of non-object in DataScraping.php on line 27
//Sets the HTML DOM Library
require_once 'C:/xampp/php/lib/SimpleHTMLDOM/simple_html_dom.php';
$html = new simple_html_dom();
$html = file_get_html('https://www.flightradar24.com/data/flights/british-airways-ba-baw');
foreach($html->find('table[id=tbl-datatable]') as $datatable) {
foreach($datatable->find('tr') as $tr) {
foreach($tr->find('td') as $td) {
if(strpos($td->find('a', 0)->href, 'https://www.flightradar24.com/data/flights/') !== false) {
echo $td->find('a', 0)->innertext .", " .$td->find('a', 0)->href;
}
}
}
}
Also worth mentioning: this data is publicly available and it is only for personal use. Please don't comment about copyright infringement; there is nothing wrong with what I want to do.
I'm simply trying to scrape the flight number only, both the inner text and the URL that sits behind it. Any help on where I'm going wrong?
Additional test provides the data I need but with the same error in between rows:
foreach($html->find('table[id=tbl-datatable]') as $datatable) {
foreach($datatable->find('tr') as $tr) {
foreach($tr->find('td') as $td) {
if (strpos($td->find('a', 0)->href, '/data/flights/') !== false) {
$test = $td->find('a', 0)->href;
$test2 = $td->find('a', 0)->innertext;
echo $test .", " .$test2;
}
}
}
}

You're trying to access elements of a null reference in your if statement itself, because not all of the <TD> tags have <A> tags in them. When there's no <A> tag in $td, $td->find('a', 0) is null, so
$td->find('a', 0)->href
is just what your error message said: "trying to get [a] property of [a] non-object".
You can fix this by checking the result of find() for null with an if:
$atag = $td->find('a', 0);
if ($atag) {
// ...
}
You can fold this into your single if statement with the && operator. I also found a couple of other problems when running your code:
In the source of that site, the hrefs in the table are all relative, not absolute, so checking for 'https://www.flightradar24.com' matches none of them.
You're not adding a newline at the end of your echo.
So to summarize my suggestions, something like this seems to work:
foreach($tr->find('td') as $td) {
$atag = $td->find('a', 0);
if($atag && strpos($atag->href, '/data/flights/') !== false) {
echo $atag->innertext . ", " . $atag->href . "\n";
}
}
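The same idea can also be sketched without Simple HTML DOM, using PHP's built-in DOMDocument and DOMXPath. This is only an illustration under assumptions: the HTML string below is made-up sample data rather than the real page markup, and the XPath expression is one possible way to select the links.

```php
<?php
// Made-up sample markup standing in for the real table (assumption).
$sample = '<table id="tbl-datatable">
  <tr><td>1</td><td><a href="/data/flights/ba1">BA1</a></td></tr>
  <tr><td>no link here</td><td><a href="/data/airports/lhr">LHR</a></td></tr>
</table>';

$doc = new DOMDocument();
$doc->loadHTML($sample);
$xpath = new DOMXPath($doc);

$flights = [];
// Cells without a matching <a> are simply not selected, so there is
// no null result to dereference inside the loop.
$query = '//table[@id="tbl-datatable"]//td/a[contains(@href, "/data/flights/")]';
foreach ($xpath->query($query) as $a) {
    $flights[] = trim($a->textContent) . ', ' . $a->getAttribute('href');
}

echo implode("\n", $flights), "\n";
```

Because the filtering happens in the query, the loop body only ever sees links that exist and point at a flight page.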

Related

Parsing XML with PHP DOMDocument

I've been looking around for a solution for this but can't find one anywhere.
I am trying to parse an XML file, but certain tag names are missing from the XML. Some posts suggest using the object length, but this doesn't work either.
if ($xmlObject->item($i)->getElementsByTagName('image1')->item(0)->childNodes->item(0)->length > 0) {
$product_image1 = $xmlObject->item($i)->getElementsByTagName('image1')->item(0)->childNodes->item(0)->nodeValue;
} else {
$product_image1 = "";
}
Notice: Trying to get property of non-object in
/home/s/public_html/import_xml.php on line 72
Fatal error: Call to a member function item() on a non-object in
/home/s/public_html/import_xml.php on line 72
The error is because <image1> is missing from the XML.
Any ideas on a fix?
This is how I've done it. Not sure if it's "the best" way, but it works...
$nodes = array(); // reset for each product so names from a previous one don't carry over
foreach ($xmlDoc->getElementsByTagName('product')->item($i)->childNodes as $node) {
if ($node->nodeType === XML_ELEMENT_NODE) {
$nodes[] = $node->nodeName;
echo "Processing node " . $node->nodeName . "<br />";
}
}
if (in_array("name", $nodes)) {
$product_name = $xmlObject->item($i)->getElementsByTagName('name')->item(0)->childNodes->item(0)->nodeValue;
} else {
$product_name = "";
}
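Another way to handle a missing tag is to check the result of item(0) directly, since it is null when no such element exists. The sketch below assumes a simple product structure; the helper name childText and the sample XML are made up for illustration.

```php
<?php
// Hypothetical helper: returns the text of the first <$tag> under $parent,
// or $default when the tag is missing.
function childText(DOMElement $parent, string $tag, string $default = ''): string
{
    $node = $parent->getElementsByTagName($tag)->item(0);
    return $node === null ? $default : trim($node->textContent);
}

// Made-up sample data: the second product has no <image1>.
$xml = '<products>
  <product><name>Lamp</name><image1>lamp.jpg</image1></product>
  <product><name>Chair</name></product>
</products>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$rows = [];
foreach ($doc->getElementsByTagName('product') as $product) {
    $rows[] = [
        'name'   => childText($product, 'name'),
        'image1' => childText($product, 'image1'),
    ];
}
print_r($rows);
```

This keeps the null check in one place instead of repeating the long getElementsByTagName chain for every field.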

Error using PHP Simple HTML DOM parser

if (!is_null($elements)) {
$embeds = array();
foreach ($elements as $element) {
if (trim(strip_tags($element->innertext)) == $episode_term) {
$html2 = file_get_html($element->href);
$elements2 = $html2->find('#streamlinks .sideleft a');
if (!is_null($elements2)) {
foreach ($elements2 as $element) {
$html3 = file_get_html($element->href);
$iframe_element = $html3->find('.frame', 0);
if (!is_null($iframe_element)) {
$embed = $misc->buildEmbed($iframe_element->src);
if ($embed) {
$embeds[] = array(
"embed" => $embed,
"link" => $iframe_element->src,
"language" => "ENG",
);
}
}
}
}
}
}
return $embeds;
}
PHP Fatal error: Call to a member function find() on a non-object in
$elements2 = $html2->find('#streamlinks .sideleft a');
So it's unclear to me what is causing this error to appear in my error log.
I'd try to output $element->href before you call file_get_html.
If file_get_html can't fetch the page, it returns false, so $html2 is not an object and you can't call find() on it.
Besides that, you could check whether $html2 is set after the file_get_html call and output an error if not. I usually use something like this:
if($html2 == false || $html2 == NULL){
// no html found
}else{
// html found
}
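The same "check before you use it" pattern can be sketched with PHP's built-in functions instead of Simple HTML DOM. The helper name loadHtmlOrNull and the path used below are assumptions made up for this example.

```php
<?php
// Hypothetical helper: load HTML from a URL or file path and return null
// when the fetch fails, so callers have exactly one thing to check.
function loadHtmlOrNull(string $source): ?DOMDocument
{
    $raw = @file_get_contents($source); // false on failure
    if ($raw === false || trim($raw) === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings quietly.
    libxml_use_internal_errors(true);
    $ok = $doc->loadHTML($raw);
    libxml_clear_errors();
    return $ok ? $doc : null;
}

// A path that is assumed not to exist, to exercise the failure branch.
$missing = loadHtmlOrNull('/no/such/page.html');
if ($missing === null) {
    echo "no html found\n";
}
```

Returning null (rather than false or an empty object) makes the failure case explicit at every call site.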

dom document to get the href and nodeValue

I need to fetch the nodeValue and the HREF from this following snippet
<a class="head_title" href="/automotive/pr?sid=0hx">Automotive</a>
To achieve this I have done the following:
foreach($dom->getElementsByTagName('a') as $p) {
if($p->getAttribute('class') == 'head_title') {
foreach($p->childNodes as $child) {
$name = $child->nodeValue;
echo $name ."<br />";
echo $child->hasAttribute('href');
}
}
}
It returns me an error:
PHP Fatal error: Call to undefined method DOMText::hasAttribute()
Can anyone please help me with this.
hasAttribute is a valid method for DOMElement objects, but you cannot use it on text nodes. You can check the type of the node and only call it if it's not a text node. The following code might help you:
foreach($p->childNodes as $child) {
$name = $child->nodeValue;
echo $name ."<br />";
if ($child->nodeType == XML_ELEMENT_NODE) { // XML_ELEMENT_NODE is 1
echo $child->hasAttribute('href');
}
}
It checks if the node is of type 'DOMElement' and invokes hasAttribute method only if it is a DOMElement.
Yes, I changed my code as follows:
foreach($dom->getElementsByTagName('a') as $link) {
if($link->getAttribute('class') == 'head_title') {
$link2 = $link->nodeValue;
$link1 = $link->getAttribute('href');
echo "".$link2."<br/>";
}
}
And it works for me!
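As a variation, the class filter can be moved into an XPath query. This is a minimal sketch that assumes the class attribute is exactly "head_title"; the second link in the sample markup is made up to show that it gets skipped.

```php
<?php
$html = '<a class="head_title" href="/automotive/pr?sid=0hx">Automotive</a>
<a class="other" href="/ignored">Ignored</a>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$links = [];
// Only <a> elements whose class attribute equals "head_title" are selected.
foreach ($xpath->query('//a[@class="head_title"]') as $a) {
    $links[] = [
        'text' => $a->nodeValue,
        'href' => $a->getAttribute('href'),
    ];
}
print_r($links);
```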

PHP - Simple HTML DOM: Notice: Trying to get property of non-object error

I've been trying to fix this error for the longest time now, and I just can't seem to fix it.
I'm trying to get an article image, url, and url title. For some reason I keep getting the above error for this code:
<?php
$html = file_get_html("http://articlesite.com/");
if($html){
foreach ($html->find('.index_item a img') as $div) {
$articlePoster = $div->src;
$grabURL = $html->find('.index_item a');
/*Error Here -->*/$articleURL = $grabURL->href;
/*And Here -->*/$rawTitle = $grabURL->title;
echo '<div class="articleFrame"><img src="'.$articlePoster.'" width="125" height="186"/><br><p class="title">'.$rawTitle.'</p></div>';
}
}else{
echo '<h1>'."Sorry.".'</h1>';
}
?>
Any ideas? Thanks.
$html->find('xxxxx') without an index returns an array of elements, so you need to iterate over it, e.g.:
foreach ($html->find('.index_item a img') as $div) {
$articlePoster = $div->src;
foreach ($html->find('.index_item a') as $grabURL) {
$articleURL = $grabURL->href;
$rawTitle = $grabURL->title;
(etc.)
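Note that a nested loop like this pairs every image with every link. If each link wraps its own image, one alternative is to loop over the links once and look up the image inside each one. The sketch below uses the built-in DOMDocument and DOMXPath rather than Simple HTML DOM, and the sample markup is invented, so the real page structure is an assumption.

```php
<?php
// Invented sample markup; the real page structure is an assumption.
$html = '<div class="index_item"><a href="/a1" title="First"><img src="1.jpg"></a></div>
<div class="index_item"><a href="/a2" title="Second"><img src="2.jpg"></a></div>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$articles = [];
// Matches links inside divs whose class attribute is exactly "index_item".
foreach ($xpath->query('//div[@class="index_item"]//a') as $a) {
    $img = $xpath->query('.//img', $a)->item(0);
    if ($img === null) {
        continue; // link without an image: skip rather than dereference null
    }
    $articles[] = [
        'url'    => $a->getAttribute('href'),
        'title'  => $a->getAttribute('title'),
        'poster' => $img->getAttribute('src'),
    ];
}
print_r($articles);
```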

DomNode get value of item

Hello, I'm new to DOMNode and I'm trying to read the values from an XML tree, which loads OK.
Here is my code, but I don't understand why it's not working.
private function createCSV($xml, $f)
{
foreach ($xml->getElementsByTagName('*') as $item)
{
$hasChild = $item->hasChildNodes() ? true : false;
if(!$hasChild)
{
//echo 'Doesn\'t have children';
echo 'Value: ' . $item->nodeValue;
}
else
{
//echo 'Has children';
$this->createCSV($item, $f);
}
}
}
$item->nodeValue doesn't print anything to the browser.
I read the documentation but I can't see any mistake.
PS. $item->tagName doesn't work either.
UPDATE
When using this: echo $item->ownerDocument->saveHTML($item);
I get the tags listed, but I don't get the data inside (between the tags), like innerHTML in JavaScript.
UPDATE
sample xml data : http://pastebin.com/dkuUUC0Q
Text nodes are also considered child nodes, but you're only iterating over element nodes (getElementsByTagName). Because of this, you almost never reach the branch that prints the value.
Try this:
if(!$xml->hasChildNodes()){
printf('Value: %s', $xml->nodeValue);
return;
}
foreach($xml->childNodes as $item)
$this->createCSV($item, $f);
XPath version:
$xpath = new DOMXPath($xml);
$text = $xpath->query('//text()[normalize-space()]');
foreach($text as $node)
printf('Value: %s', $node->nodeValue);
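For a version that can be run on its own, here is a sketch of the recursive approach that collects values into an array instead of printing them. The function name collectLeafValues and the sample XML are made up for this example, and skipping whitespace-only text is an added assumption.

```php
<?php
// Collects the text of every node that has no children (text nodes and
// empty elements), skipping values that are empty after trimming.
function collectLeafValues(DOMNode $node, array &$out): void
{
    if (!$node->hasChildNodes()) {
        $value = trim((string) $node->nodeValue);
        if ($value !== '') {
            $out[] = $value;
        }
        return;
    }
    foreach ($node->childNodes as $child) {
        collectLeafValues($child, $out);
    }
}

// Made-up sample data; the last <name/> is empty on purpose.
$xml = new DOMDocument();
$xml->loadXML('<items><item><id>1</id><name>Pen</name></item><item><id>2</id><name/></item></items>');

$values = [];
collectLeafValues($xml->documentElement, $values);
echo implode(',', $values), "\n";
```

Iterating over childNodes (rather than getElementsByTagName) is what lets the recursion reach the text nodes that hold the values.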
