file_get_html error, not working - php

I'm using Simple HTML Dom to try to scrape an HTML table.
I followed their instructions and have looked at many other code examples, but file_get_html just doesn't seem to work.
Here is my code:
<?php
// Simple HTML Dom Parser
include('simple_html_dom.php');
//$worlds = ["Amera", "Antica", "Astera", "Aurera", "Aurora", "Bellona", "Belobra", "Beneva", "Calmera", "Calva", "Calvera", "Candia", "Celesta", "Chrona", "Danera", "Dolera", "Efidia", "Eldera", "Ferobra", "Fidera", "Fortera", "Garnera", "Guardia", "Harmonia", "Honera", "Hydera", "Inferna", "Iona", "Irmada", "Julera", "Justera", "Kenora", "Kronera", "Laudera", "Luminera", "Magera", "Menera", "Morta", "Mortera", "Neptera", "Nerana", "Nika", "Olympa", "Osera", "Pacera", "Premia", "Pythera", "Quilia", "Refugia", "Rowana", "Secura", "Serdebra", "Shivera", "Silvera", "Solera", "Tavara", "Thera", "Umera", "Unitera", "Veludera", "Verlana", "Xantera", "Xylana", "Yanara", "Zanera", "Zeluna"];
//foreach ($worlds as $world) {
// All HTML from the online list
$html = file_get_html('https://secure.tibia.com/community/?subtopic=worlds&world=Antica');
// Search for the online list table content
foreach ($html->find('tr[class=Table2]') as $row) {
    $name = $row->find('td', 0)->plaintext;
    $level = $row->find('td', 1)->plaintext;
    $vocation = $row->find('td', 2)->plaintext;
    echo $name . ' | ' . $level . ' | ' . $vocation . '<br>';
}
//}
?>
And I get these errors:
Warning: file_get_contents(): stream does not support seeking in D:\xampp\htdocs\simple_html_dom.php on line 76
Warning: file_get_contents(): Failed to seek to position -1 in the stream in D:\xampp\htdocs\simple_html_dom.php on line 76
Fatal error: Uncaught Error: Call to a member function find() on boolean in D:\xampp\htdocs\index.php:13 Stack trace: #0 {main} thrown in D:\xampp\htdocs\index.php on line 13
What am I doing wrong?
The table I am trying to scrape is the "Players Online" table on:
https://secure.tibia.com/community/?subtopic=worlds&world=Antica

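Try fetching the page yourself with file_get_contents() and passing the resulting string to str_get_html(), which avoids file_get_html()'s offset argument: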
$html = str_get_html(file_get_contents($url));
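A small refinement of that one-liner, as a sketch: file_get_contents() returns false on failure, and passing false straight into the parser is what eventually produces "Call to a member function find() on boolean". The helper name fetch_html() is hypothetical; it just fetches with the default offset and throws a readable error when nothing comes back.

```php
<?php
// Fetch a URL (or local path) into a string and fail loudly instead of
// handing `false` to the parser. No offset argument is passed, so
// file_get_contents() starts reading at its default offset of 0.
function fetch_html(string $url): string
{
    error_clear_last();
    $contents = @file_get_contents($url);
    if ($contents === false || $contents === '') {
        $error = error_get_last();
        throw new RuntimeException(
            'Could not fetch ' . $url . ': ' . ($error['message'] ?? 'empty response')
        );
    }
    return $contents;
}

// Usage with Simple HTML Dom (simple_html_dom.php must be included first):
// $html = str_get_html(fetch_html('https://secure.tibia.com/community/?subtopic=worlds&world=Antica'));
```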

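This is a problem in the simple_html_dom library with PHP 7.1 and later, where file_get_contents() started accepting negative offsets. file_get_html() passes its default $offset = -1 straight through, so PHP tries to seek to one byte before the end of the stream, which an HTTP stream cannot do.
To fix it, change "$offset = -1," to "$offset = 0," in the parameter list of the file_get_html() function in simple_html_dom.php.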
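To see what the -1 default actually does on PHP 7.1+, here is a small sketch run against a local temporary file, which (unlike an HTTP stream) is seekable. A negative offset is counted back from the end, so -1 returns only the last byte instead of the whole document. On an HTTP stream that backward seek is impossible, hence the "stream does not support seeking" warning in the question.

```php
<?php
// Write a small file so we have a seekable stream to read from.
$path = tempnam(sys_get_temp_dir(), 'demo');
file_put_contents($path, 'hello');

// Offset 0 (the default) reads the whole file.
$whole = file_get_contents($path, false, null, 0);

// Since PHP 7.1, a negative offset counts back from the end of the stream,
// so -1 reads only the final byte. An HTTP stream cannot seek like this,
// which is why file_get_html()'s old `$offset = -1` default fails there.
$lastByte = file_get_contents($path, false, null, -1);

echo $whole, PHP_EOL;    // hello
echo $lastByte, PHP_EOL; // o

unlink($path);
```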

I don't know much about Simple HTML Dom, but I think you might need a more robust library such as https://github.com/FriendsOfPHP/Goutte
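If you would rather avoid third-party libraries altogether, PHP's built-in DOM extension with DOMXPath can pull the same rows. This is a swapped-in technique, not Goutte's API. The sketch assumes, as the question's code does, that each player row is a <tr class="Table2"> whose first three cells are name, level and vocation; whether the live page still uses that markup isn't verified here, and parse_player_rows() is a hypothetical name.

```php
<?php
// Extract name / level / vocation from every <tr class="Table2"> row.
// Assumption (from the question's code, not checked against the live page):
// the first three <td> cells of such a row hold those three values.
function parse_player_rows(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $players = [];
    foreach ($xpath->query('//tr[@class="Table2"]') as $row) {
        $cells = $xpath->query('td', $row);
        if ($cells->length < 3) {
            continue; // skip rows that don't have the expected cells
        }
        $players[] = [
            'name'     => trim($cells->item(0)->textContent),
            'level'    => trim($cells->item(1)->textContent),
            'vocation' => trim($cells->item(2)->textContent),
        ];
    }
    return $players;
}

// Usage (needs allow_url_fopen):
// $page = file_get_contents('https://secure.tibia.com/community/?subtopic=worlds&world=Antica');
// foreach (parse_player_rows((string) $page) as $p) {
//     echo $p['name'], ' | ', $p['level'], ' | ', $p['vocation'], '<br>';
// }
```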

Related

Is it possible to use DOM on an external webpage?

I'd like to use the list on this page: list of teams of a soccer club
But I always get this error:
Fatal error: Uncaught Error: Call to a member function find() on null in /home/u562375926/domains/lucswebsite.tech/public_html/resp/scrapvv.php:152 Stack trace: #0 {main} thrown in /home/u562375926/domains/lucswebsite.tech/public_html/resp/scrapvv.php on line 152
using this code:
<?php
include('simple_html_dom.php');
$file = 'https://www.voetbalvlaanderen.be/club/1771/ploegen';
$html = new simple_html_dom();
$html->load_file($file);
// The selector must be a quoted string; index 0 returns the first <h2> instead of an array
$club = $html->find('div', 8)->find('h2', 0);
echo $club. '<br>';
?>
With ParseHub on the same page I get all the link hrefs and the text of the corresponding spans.
Is it possible that DOM is not working on that page?
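One way to investigate is to check whether the elements exist at all in the HTML the server returns. Browser-based tools such as ParseHub can see content that JavaScript adds after the page loads, while a plain HTTP fetch only sees the initial markup; whether that applies to this page isn't confirmed here. A minimal diagnostic sketch using PHP's built-in DOM extension (describe_html() is a hypothetical name):

```php
<?php
// Report what the raw HTML actually contains, so you can tell whether
// find('div', 8) has anything to return before chaining ->find() on it.
function describe_html(string $html): array
{
    if ($html === '') {
        return ['divs' => 0, 'h2s' => 0, 'has_ninth_div' => false];
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $divs = $doc->getElementsByTagName('div')->length;
    return [
        'divs'          => $divs,
        'h2s'           => $doc->getElementsByTagName('h2')->length,
        // find('div', 8) is zero-based, so it needs at least 9 <div> elements.
        'has_ninth_div' => $divs >= 9,
    ];
}

// Usage (needs allow_url_fopen):
// print_r(describe_html((string) @file_get_contents('https://www.voetbalvlaanderen.be/club/1771/ploegen')));
```

If h2s comes back as 0 for the live page, the list is not in the server's HTML, and no HTML parser working on that response will find it.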

PHP simple HTML DOM parser errors

I've started writing a scraper for one site that will also have a crawler, since I need to go through some links, but I'm getting this error:
PHP Fatal error: Uncaught Error: Call to a member function find() on null in D:\Projekti\hemrank\simple_html_dom.php:1129
Stack trace:
#0 D:\Projekti\hemrank\scrapeit.php(37): simple_html_dom->find('ul')
#1 D:\Projekti\hemrank\scrapeit.php(19): ScrapeIt->getAllAddresses()
#2 D:\Projekti\hemrank\scrapeit.php(55): ScrapeIt->run()
#3 {main}
  thrown in D:\Projekti\hemrank\simple_html_dom.php on line 1129
When I var_dump the $html variable I get the full HTML with all the tags, which is why it seems strange that it says "Call to a member function find() on null" when $html clearly has a value. Here's the part of the code that's not working:
$html = new simple_html_dom();
$html->load_file($baseurl);
if (empty($html)) { echo "HTTP Response not received!<br/>\n"; exit; }
$links = array();
foreach ($html->find('ul') as $ul) {
    if (!empty($ul) && (count($ul) > 0))
        foreach ($ul->find('li') as $li) {
            if (!empty($li) && (count($li) > 0))
                foreach ($li->find('a') as $a) {
                    $links[] = $a->href;
                }
            else
                die("NOT AVAILABLE");
        }
}
return $links;
}
Is this a common problem with PHP simple HTML DOM parser, is there a solution or should I switch to some other kind of scraping?
I just looked up the library you are using; this is line 1129:
return $this->root->find($selector, $idx, $lowercase);
So the error message is telling you that $this->root inside the class is null, and null has no find() method.
I'm no expert on the lib, as I use the awesome DOMDocument for parsing HTML, but hopefully this should help you understand what has happened.
Also, empty($html) will never be true in your code: $html is an object from the moment you instantiate it, and empty() is always false for an object.
I suggest the following change:
replace $html->load_file($baseurl); with $html = file_get_html($baseurl);
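Both points can be sketched with PHP's built-in DOM extension instead of Simple HTML Dom (a swapped-in technique; list_links() is a hypothetical name): empty() on an object is always false, so the meaningful check is on the fetched string, and the nested ul/li/a loops collapse into a single XPath query.

```php
<?php
// empty() on an object is always false, so a guard like this can never fire:
$doc = new DOMDocument();
var_dump(empty($doc)); // bool(false)

// Check the fetched string instead, then parse it. This collects the href of
// every <a> inside an <li> inside a <ul>, like the nested loops in the
// question; each link is returned once, even when lists are nested.
function list_links(string $html): array
{
    if (trim($html) === '') {
        throw new RuntimeException('HTTP response not received');
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ((new DOMXPath($doc))->query('//ul//li//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Usage: $links = list_links((string) @file_get_contents($baseurl));
```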
On my VPS server it works with $html->load_file($baseurl); but on my dedicated local server it only works with $html = file_get_html($baseurl);
This solved my problem:
- Call to a member function find() on null
- simple_html_dom.php on line 1129

Parsing HTML Table - PHP

I am trying to parse many HTML tables whose URLs are stored in the database. The problem is that my code fails on a different table every time. Here is the part of the code that produces the error:
while ($sqlrow = mysqli_fetch_row($res)) {
    echo "Started Processing Table " . $tables . PHP_EOL;
    $tables++;
    $data = file_get_contents($sqlrow[1]);
    $dom->preserveWhiteSpace = false; // only takes effect if set before loading
    $dom->loadHTML($data);
    $teamtable = $dom->getElementById("reTeamTable");
    $teamrows = $teamtable->getElementsByTagName('tr');
The line that usually fails is either the getElementById() call or the getElementsByTagName() call. The error I get is: "PHP Fatal error: Call to a member function getElementsByTagName() on a non-object in /scouting/teamlist.php on line 20". I don't understand why it fails on a different URL every time.
It means that $dom didn't find an element with id="reTeamTable", so $teamtable is null. Before calling getElementsByTagName(), check that $teamtable is not null.
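A sketch of that check, which also guards against file_get_contents() returning false. One plausible (unconfirmed) reason the failing URL changes between runs is that a request occasionally returns an error page or nothing at all, in which case there is no reTeamTable element. team_rows() is a hypothetical helper that returns null so the caller can skip that URL instead of crashing.

```php
<?php
// Return the rows of the table with id="reTeamTable" as arrays of cell text,
// or null when the response is missing or doesn't contain that table.
function team_rows($data): ?array
{
    // file_get_contents() returns false on failure; an empty body is no better.
    if (!is_string($data) || trim($data) === '') {
        return null;
    }
    $dom = new DOMDocument();
    $dom->preserveWhiteSpace = false; // must be set before loading
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($data);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $teamtable = $dom->getElementById('reTeamTable');
    if ($teamtable === null) {
        return null;
    }
    $rows = [];
    foreach ($teamtable->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Inside the loop from the question:
// $rows = team_rows(@file_get_contents($sqlrow[1]));
// if ($rows === null) { echo "Skipping " . $sqlrow[1] . PHP_EOL; continue; }
```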

Problem validating XML against XSD - PHP/schemaValidate

I'm trying to validate an XML file against an XSD using DOMDocument::schemaValidate(string $filename).
When I validate it with other tools, such as online validators, it works fine, but in my program I always get these errors and really can't find where they're coming from:
Warning: DOMDocument::schemaValidate(/home/public_html/product/xxxx/xxxx/xxxxx/xsd/AdlSchema.xsd): failed to open stream: Permission denied in /home/public_html/xxxx/xxxx.php on line 209
Warning: DOMDocument::schemaValidate(): I/O warning : failed to load external entity "/home/public_html/product/xxxx/xxxx/xxxx/xxxx/xsd/AdlSchema.xsd" in /home/public_html/xxxx/xxxx.php on line 209
Warning: DOMDocument::schemaValidate(): Failed to locate the main schema resource at '/home/public_html/product/xxxxx/xxxxx/xxxxx/xxxx/xsd/AdlSchema.xsd'. in /home/public_html/xxxx/xxxxx.php on line 209
Warning: DOMDocument::schemaValidate(): Invalid Schema in /home/public_html/xxxx/xxxx.php on line 209
So my question is: is there a way to get more details about this error (mainly the "Invalid Schema" one) with DOMDocument functions? And if anyone could tell me what might cause these errors, that would be great (the XML and XSD are somewhat confidential, sorry, but again, they validate fine with a few other tools).
Thanks!
/home/public_html/product/xxxx/xxxx/xxxxx/xsd/AdlSchema.xsd): failed to open stream: Permission denied
This means the PHP process doesn't have the necessary rights to access the XSD file.
Let's poke around a little bit and add some debug/info code
Please add
/* debug code start. Don't forget to remove */
// if there already is a variable you use as parameter for schemaValidate() use that instead of defining a new one.
$path = '/home/public_html/product/xxxx/xxxx/xxxxx/xsd/AdlSchema.xsd';
foreach (array('file_exists', 'is_readable', 'is_writable') as $fn) {
    echo $fn, ': ', $fn($path) ? 'true' : 'false', "<br />\n";
}
$foo = stat($path);
if ($foo !== false) {
    echo 'mode: ', decoct($foo['mode']), "<br />\n"; // octal, e.g. 100644 for a regular file with 644
    echo 'uid: ', $foo['uid'], "<br />\n";
    echo 'gid: ', $foo['gid'], "<br />\n";
}
// getmyuid()/getmygid() report the owner of this script file...
if (function_exists('getmyuid')) {
    echo 'myuid: ', getmyuid(), "<br />\n";
}
if (function_exists('getmygid')) {
    echo 'mygid: ', getmygid(), "<br />\n";
}
// ...while posix_geteuid()/posix_getegid() report the user PHP actually runs as.
if (function_exists('posix_geteuid')) {
    echo 'process euid: ', posix_geteuid(), "<br />\n";
    echo 'process egid: ', posix_getegid(), "<br />\n";
}
$foo = fopen($path, 'rb');
if ($foo) {
    echo 'fopen succeeded';
    fclose($foo);
}
else {
    echo 'fopen failed';
}
/* debug code end */
right before your call to schemaValidate().
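For the "more details" part of the question: with libxml_use_internal_errors(true), libxml records its errors instead of emitting them as PHP warnings, and libxml_get_errors() returns them as LibXMLError objects carrying the message, line and column. A minimal sketch; validate_with_details() is a hypothetical name.

```php
<?php
// Validate an XML string against an XSD file and return libxml's detailed
// error messages (with line numbers). An empty array means the XML is valid.
function validate_with_details(string $xml, string $xsdPath): array
{
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $messages = [];
    if (!$doc->loadXML($xml) || !$doc->schemaValidate($xsdPath)) {
        foreach (libxml_get_errors() as $error) {
            $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
        if ($messages === []) {
            $messages[] = 'validation failed without a libxml error';
        }
    }

    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $messages;
}

// Usage: print_r(validate_with_details(file_get_contents($xmlFile), $path));
```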
I had the same problem using relative paths to the XML and XSD schema files. After I changed them to absolute paths, the problem disappeared.
For me the reason was that the libxml entity loader was disabled (libxml_disable_entity_loader(true);). It seems it has to be enabled for this function to work. I switched to DOMDocument::schemaValidateSource() since I don't want to enable the entity loader.
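A sketch of that workaround: read the schema yourself and pass its contents to DOMDocument::schemaValidateSource(), so libxml never has to open the .xsd path. (libxml_disable_entity_loader() is deprecated as of PHP 8.0.) If the schema pulls in other files through xs:include or xs:import with relative locations, those may not resolve without the file's base path, so this suits self-contained schemas best. validate_against_source() is a hypothetical name.

```php
<?php
// Validate an XML string against schema *contents* rather than a schema path.
function validate_against_source(string $xml, string $xsdSource): bool
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    // && binds tighter than =, so $valid is true only if both steps succeed.
    $valid = $doc->loadXML($xml) && $doc->schemaValidateSource($xsdSource);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $valid;
}

// Usage (hypothetical path):
// $xsdSource = file_get_contents('/path/to/AdlSchema.xsd');
// var_dump(validate_against_source($xmlString, $xsdSource));
```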

PHP: XML parsing issues

I'm trying to create a payment system that integrates with eWay. They have supplied some code for processing payments, but when I run it I get the following error:
Message: Undefined index: value
Line Number: 106
The function that line 106 references is as follows:
function parseResponse($xmlResponse){
    $xml_parser = xml_parser_create();
    xml_parse_into_struct($xml_parser, $xmlResponse, $xmlData, $index);
    $responseFields = array();
    foreach($xmlData as $xData)
        if($xData["level"] == 2)
            $responseFields[$xData["tag"]] = $xData["value"];
    return $responseFields;
}
I'm really stuck on this; I can't seem to get it working.
Any help diagnosing this would be fantastic.
Cheers,
The XML I'm trying to parse is as follows:
<ewaygateway>
<ewayCustomerID>87654321</ewayCustomerID>
<ewayTotalAmount>44000</ewayTotalAmount>
<ewayCardHoldersName>Testing Test</ewayCardHoldersName>
<ewayCardNumber>4444333322221111</ewayCardNumber>
<ewayCardExpiryMonth>04</ewayCardExpiryMonth>
<ewayCardExpiryYear>15</ewayCardExpiryYear>
<ewayCustomerFirstName>Testing test</ewayCustomerFirstName>
<ewayCustomerLastName>Testing test</ewayCustomerLastName>
<ewayCustomerEmail>info#emailaddress.com.au</ewayCustomerEmail>
<ewayCustomerAddress>123 Testing St</ewayCustomerAddress>
<ewayCustomerPostcode>2000</ewayCustomerPostcode>
<ewayCustomerInvoiceDescription>Membership</ewayCustomerInvoiceDescription>
<ewayCustomerInvoiceRef>00001</ewayCustomerInvoiceRef>
<ewayTrxnNumber>000001</ewayTrxnNumber>
<ewayOption1>Nice</ewayOption1>
<ewayOption2>Big</ewayOption2>
<ewayOption3>Option</ewayOption3>
</ewaygateway>
Also, this is how the XML is being generated:
$xmlRequest = "<ewaygateway><ewayCustomerID>" . $this->myCustomerID . "</ewayCustomerID>";
foreach($this->myTransactionData as $key=>$value)
    $xmlRequest .= "<$key>$value</$key>";
$xmlRequest .= "</ewaygateway>";
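An element with no text content gets no "value" index in the array built by xml_parse_into_struct(), so an empty tag at level 2 of the response triggers this notice. Just check that the value index exists with isset() before reading it.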
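A sketch of parseResponse() with that check. xml_parse_into_struct() only adds a "value" key for elements that contain text, so an empty element (written as <tag/> or <tag></tag>) yields an entry without one. Note also that xml_parser_create() enables case folding by default, so the array keys come out upper-cased.

```php
<?php
// Flatten the direct children of the root element into [TAG => text],
// using '' for children that have no text content.
function parseResponse(string $xmlResponse): array
{
    $xml_parser = xml_parser_create();
    xml_parse_into_struct($xml_parser, $xmlResponse, $xmlData, $index);

    $responseFields = [];
    foreach ($xmlData as $xData) {
        if ($xData['level'] == 2) {
            // "value" is absent for empty elements, so fall back to ''.
            $responseFields[$xData['tag']] = $xData['value'] ?? '';
        }
    }
    return $responseFields;
}
```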
