PHP simple HTML DOM parser errors - php

I've started writing a scraper for a site. It also needs a crawler, since I have to follow some links, but I'm getting this error:
PHP Fatal error: Uncaught Error: Call to a member function find() on
null in D:\Projekti\hemrank\simple_html_dom.php:1129 Stack trace:
0 D:\Projekti\hemrank\scrapeit.php(37): simple_html_dom->find('ul')
1 D:\Projekti\hemrank\scrapeit.php(19): ScrapeIt->getAllAddresses()
2 D:\Projekti\hemrank\scrapeit.php(55): ScrapeIt->run()
3 {main} thrown in D:\Projekti\hemrank\simple_html_dom.php on line 1129
When I var_dump the $html variable I get the full HTML with all the tags, so it seems strange to me that the error says "Call to a member function find() on null" when $html clearly holds a value. Here's the part of the code that's not working:
$html = new simple_html_dom();
$html->load_file($baseurl);
if (empty($html)) { echo "HTTP Response not received!<br/>\n"; exit; }
$links = array();
foreach ($html->find('ul') as $ul) {
    if (!empty($ul) && (count($ul) > 0))
        foreach ($ul->find('li') as $li) {
            if (!empty($li) && (count($li) > 0))
                foreach ($li->find('a') as $a) {
                    $links[] = $a->href;
                }
            else
                die("NOT AVAILABLE");
        }
}
return $links;
}
Is this a common problem with PHP simple HTML DOM parser, is there a solution or should I switch to some other kind of scraping?

I just looked up the library you are using. This is line 1129:
return $this->root->find($selector, $idx, $lowercase);
So your error message is telling you that $this->root inside the class is null, which means there is no object to call find() on.
I'm no expert on the library, as I use the awesome DOMDocument for parsing HTML, but hopefully this helps you understand what has happened.
Also, $html will never be empty in your code: it holds the simple_html_dom object you instantiated, so the empty() check cannot detect a failed load.
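Since DOMDocument came up as an alternative, here is a minimal sketch of the same link-collecting task using only PHP's bundled DOM extension. The function name collectListLinks and the sample markup are made up for illustration; in the real scraper the string would come from the fetched page.

```php
<?php
// Hypothetical helper: collect the href of every <a> nested in a <ul>/<li>.
function collectListLinks(string $markup): array
{
    if ($markup === '') {
        return []; // nothing was fetched, so there is nothing to parse
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; keep parser warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return [];
    }

    $links = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//ul//li//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Sample markup standing in for the downloaded page (an assumption).
$markup = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>'
        . '<p><a href="/c">C</a></p>';
$links = collectListLinks($markup);
```

The XPath query replaces the three nested loops, and an empty result is just an empty array rather than a fatal error.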

I suggest the following change:
Replace $html->load_file($baseurl); with $html = file_get_html($baseurl);
On my VPS it works with $html->load_file($baseurl); but on my local dedicated server it only works with $html = file_get_html($baseurl);
This solved my problem:
- Call to a member function find() on null
- simple_html_dom.php on line 1129

Related

file_get_html error, not working

I'm using Simple HTML Dom to try to scrape an HTML table.
I follow their instructions and have looked at many other code examples, but the file_get_html just doesn't seem to work.
Here is my code:
<?php
// Simple HTML Dom Parser
include('simple_html_dom.php');
//$worlds = ["Amera", "Antica", "Astera", "Aurera", "Aurora", "Bellona", "Belobra", "Beneva", "Calmera", "Calva", "Calvera", "Candia", "Celesta", "Chrona", "Danera", "Dolera", "Efidia", "Eldera", "Ferobra", "Fidera", "Fortera", "Garnera", "Guardia", "Harmonia", "Honera", "Hydera", "Inferna", "Iona", "Irmada", "Julera", "Justera", "Kenora", "Kronera", "Laudera", "Luminera", "Magera", "Menera", "Morta", "Mortera", "Neptera", "Nerana", "Nika", "Olympa", "Osera", "Pacera", "Premia", "Pythera", "Quilia", "Refugia", "Rowana", "Secura", "Serdebra", "Shivera", "Silvera", "Solera", "Tavara", "Thera", "Umera", "Unitera", "Veludera", "Verlana", "Xantera", "Xylana", "Yanara", "Zanera", "Zeluna"];
//foreach ($worlds as $world) {
// All HTML from the online list
$html = file_get_html('https://secure.tibia.com/community/?subtopic=worlds&world=Antica');
// Search for the online list table content
foreach ($html->find('tr[class=Table2]') as $row) {
$name = $row->find('td', 0)->plaintext;
$level = $row->find('td', 1)->plaintext;
$vocation = $row->find('td', 2)->plaintext;
echo $name . ' | ' . $level . ' | ' . $vocation . '<br>';
}
//}
?>
And I get these errors:
Warning: file_get_contents(): stream does not support seeking in D:\xampp\htdocs\simple_html_dom.php on line 76
Warning: file_get_contents(): Failed to seek to position -1 in the stream in D:\xampp\htdocs\simple_html_dom.php on line 76
Fatal error: Uncaught Error: Call to a member function find() on boolean in D:\xampp\htdocs\index.php:13 Stack trace: #0 {main} thrown in D:\xampp\htdocs\index.php on line 13
What am I doing wrong?
The table I am trying to scrape is the "Players Online" table on:
https://secure.tibia.com/community/?subtopic=worlds&world=Antica
Try this:
$html = str_get_html(file_get_contents($url));

This is a problem between the simple_html_dom library and recent versions of PHP.
To correct it, change "$offset = -1," to "$offset = 0," in the parameter list of the "file_get_html" function in the "simple_html_dom.php" file.

I don't know much about Simple HTML DOM, but I think you might need to use a more robust library such as https://github.com/FriendsOfPHP/Goutte

Parsing HTML Table - PHP

I am trying to parse many HTML tables, with the URLs stored in the database. The current problem with my code is that it will fail on a different table every time. Here is the part of the code that gets the error:
while ($sqlrow = mysqli_fetch_row($res)) {
echo "Started Processing Table " . $tables . PHP_EOL;
$tables++;
$data = file_get_contents($sqlrow[1]);
$dom->loadHTML($data);
$dom->preserveWhiteSpace = false;
$teamtable = $dom->getElementById("reTeamTable");
$teamrows = $teamtable->getElementsByTagName('tr');
The line that usually fails is either the "getElementById" call or the "getElementsByTagName" call. The error I am getting is: "PHP Fatal error: Call to a member function getElementsByTagName() on a non-object in /scouting/teamlist.php on line 20". I don't understand why this fails on a different URL every time.
It means that $dom did not find an element with id="reTeamTable", so $teamtable is null. Before calling getElementsByTagName, check whether $teamtable is null.
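The check described above could look roughly like the sketch below. The function name teamRowCount and the sample markup are assumptions for illustration; the $data === false guard is there because file_get_contents() returns false when a request fails, which is one plausible reason for failures that move from URL to URL.

```php
<?php
// Hypothetical helper: number of <tr> rows in the table with id="reTeamTable",
// or null when the page could not be fetched or has no such table.
function teamRowCount($data): ?int
{
    // file_get_contents() returns false on failure; an empty body is useless too.
    if ($data === false || $data === '') {
        return null;
    }

    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings about messy HTML
    $dom->loadHTML($data);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $teamtable = $dom->getElementById('reTeamTable');
    if ($teamtable === null) {
        return null; // table missing on this page: skip instead of crashing
    }
    return $teamtable->getElementsByTagName('tr')->length;
}

// Sample inputs (assumptions): one page with the table, one without, one failed fetch.
$withTable = teamRowCount('<table id="reTeamTable"><tr><td>1</td></tr><tr><td>2</td></tr></table>');
$withoutTable = teamRowCount('<p>Service unavailable</p>');
$failedFetch = teamRowCount(false);
```

Inside the while loop, a null result would be the cue to log the URL and continue with the next row.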

"Comment not terminated" XML parsing error in Box API response

For months I've been running the "Box Rest Client" lib by Angela R that employs the following code to parse curl responses from the box API:
$xml = simplexml_load_string($res);
Today, after the code loops through dozens of request/responses I generate this following error:
ErrorException [ Warning ]: simplexml_load_string(): Entity: line 9:
parser error : Comment not terminated
This happened in 2 straight attempts to run the code - and now seems to have gone away without any changes to anything.
Interested if anyone knows what is up with that?
I have added a catch for this case, in case it's useful to anyone using this lib (for the next month or so, before it's deprecated by Box API 2.0):
private function parse_result($res) {
    try {
        $xml = simplexml_load_string($res);
        $json = json_encode($xml);
        $array = json_decode($json, TRUE);
        return $array;
    } catch (Exception $e) {
        $error = 'xml parsing error: ' . $e->getMessage() . "<br>";
        return array('status' => $error);
    }
}
It's possible it is related to including two minus signs -- inside of an HTML comment. For example:
<!-- this is my comment--but not a very good one. -->
The two dashes in the middle of the comment cause problems for the parser.
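On its own, simplexml_load_string() reports malformed input through warnings and a false return value rather than exceptions, so the try/catch above only helps when something (such as a framework error handler) converts warnings into exceptions. A minimal sketch that does not rely on that, using libxml's internal error buffer, is shown below. The function name parseXmlResult and the sample payloads are assumptions, not part of the Box library.

```php
<?php
// Hypothetical stand-in for parse_result(): returns the decoded array,
// or a 'status' entry describing why the XML could not be parsed.
function parseXmlResult(string $res): array
{
    $previous = libxml_use_internal_errors(true); // collect errors instead of emitting warnings
    $xml = simplexml_load_string($res);
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($xml === false) {
        $messages = array_map(function ($e) {
            return trim($e->message);
        }, $errors);
        return ['status' => 'xml parsing error: ' . implode('; ', $messages)];
    }
    return json_decode(json_encode($xml), true);
}

// Sample payloads (assumptions): one well-formed, one with a comment that never closes.
$ok = parseXmlResult('<response><status>ok</status></response>');
$bad = parseXmlResult('<response><!-- never closed <status>ok</status></response>');
```

Logging the raw $res alongside the collected messages would make it easier to see what the API actually sent on the requests that failed.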

Call to a member function find() on a non-object

I keep getting this error from my code and I have no idea what I am doing wrong. It only happens occasionally; the rest of the time the code works.
error
Call to a member function find() on a non-object in C:\xampp\htdocs\sites\P\Find.php on line 265
I've basically created a crawler which searches a webpage for an element on the webpage, sometimes this element may not be present on the page, and I check for this by using the if statement.
line 265 refers to
if($page->find('div#olpDivId span.price'))
code
$page = file_get_html('http://www.amazon.co.uk/dp/0304362212');
if($page->find('div#olpDivId span.price')){
foreach($page->find('div#olpDivId span.price') as $p){
$i[] = floatval($p->plaintext);
}
}
If the book does not exist, the crawler ends up on a blank "sorry product does not exist" page.
Am I doing something wrong? Any help would be appreciated.
file_get_html can return false (if it was unable to fetch content from the web page), so you should check for that before calling any method on $page:
$page = file_get_html('http://www.amazon.co.uk/dp/0304362212');
if($page !== FALSE){
foreach($page->find('div#olpDivId span.price') as $p){
$i[] = floatval($p->plaintext);
}
}

PHP: DOMNode::appendChild to an array of Elements

I am using DOMDocument to parse an XML file. I loop through the different Elements and see if any of them is missing and I fill an array with a createElement, with the error message. At the end I'm trying to appendChild that array but I always get the same error message:
Uncaught exception 'DOMException' with message 'Wrong Document Error'
DOMNode->appendChild(Object(DOMElement))
1 {main}
thrown in /xxx/xxx.php on line 235
PHP Fatal error: Call to undefined method DOMElement::item() in /xxx/xxx.php on line 235.
The code is as follows:
$SMQuery = new DOMDocument();
$SMQuery->loadXML($params);
$response = $SMQuery->createElement('SMreply');
$errors = array();
if (!$reqtyp = $SMQuery->getElementsByTagName("tag1"))
{$errors[] = $SMQuery->createElement('error', 'tag1 Element is missing');}
if (!$reqtyp = $SMQuery->getElementsByTagName("tag2"))
{$errors[] = $SMQuery->createElement('error', 'tag2 Element is missing');}
......
if(!empty($errors))
{
foreach($errors as $error) {
$response->appendChild($error); <==== this line is causing the error !!!
}
}
Any help is much appreciated.
Cheers,
Riki.
You don't show where $response is being defined, but if it's the result of another new DOMDocument(), then that explains your error: you can't add nodes from one DOM document to another directly. The node has to be imported first via ->importNode(). Only after that can you actually append it.
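A minimal sketch of that import step, assuming the reply really is built in a second document ($reply and the sample request XML are made up for illustration). It also tests ->length, because getElementsByTagName() always returns a DOMNodeList object, so negating its return value never detects a missing element.

```php
<?php
$query = new DOMDocument();
$query->loadXML('<request><tag1/></request>'); // sample input (assumption): tag2 is missing

$reply = new DOMDocument();                    // separate document, as the answer assumes
$response = $reply->createElement('SMreply');
$reply->appendChild($response);

$errors = [];
foreach (['tag1', 'tag2'] as $tag) {
    // An empty DOMNodeList is still an object, so check how many nodes it holds.
    if ($query->getElementsByTagName($tag)->length === 0) {
        $errors[] = $query->createElement('error', "$tag Element is missing");
    }
}

foreach ($errors as $error) {
    // $error belongs to $query; import a deep copy into $reply before appending.
    $response->appendChild($reply->importNode($error, true));
}

$xml = $reply->saveXML($response);
```

If the error elements are created with the same document that owns $response, importNode() is unnecessary and appendChild() works directly.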
