Scraping a table using Simple HTML Dom - php

I am trying to scrape this product table,
http://www.dropforyou.com/search2.php?mode=search&posted_data%5Bcategoryid%5D=2&posted_data%5Bsearch_in_subcategories%5D=on
I need the product id, the quantity and the price.
Since the site uses cookies and a POST form, I am fetching the page with cURL, which works fine. I then load the result into Simple HTML DOM with $html = str_get_html($content);
I have been able to load all the table values into an array, but I can't label them. They just come in as 0, 1, 2 and I can't tell what's what.
I tried a different method posted here on Stack Overflow, but it gives me Fatal error: Call to a member function find() on a non-object in
My working code that isn't labeled
$content = curlscraper($urltoscrape);
$html = str_get_html($content);
$tds = $html->find('table',2)->find('td');
$num = NULL;
foreach($tds as $td)
{
$num[] = $td->plaintext;
}
echo '<pre>';
var_dump ($num);
echo '</pre>';
The code I found on Stackoverflow that just gives me Fatal error: Call to a member function find() on a non-object in
$content = curlscraper($urltoscrape);
$html = str_get_html($content);
foreach($html->find('tr',2) as $page)
{
$item['sku'] = $page->find('td',0)->plaintext;
$item['product'] = $page->find('td',1)->plaintext;
$item['Qty'] = $page->find('td',2)->plaintext;
$item['description'] = $page->find('td',3)->plaintext;
$item['price'] = $page->find('td',4)->plaintext;
$table[] = $item;
}
print_r($table);

Try assigning the result of find() to a variable first and then iterating over it in your foreach. Also, you don't say which line raised the error.
$line = $html->find('tr',2);
foreach($line as $page)
{
//var_dump($page) //You can check array
$item['sku'] = $page->find('td',0)->plaintext;
$item['product'] = $page->find('td',1)->plaintext;
$item['Qty'] = $page->find('td',2)->plaintext;
$item['description'] = $page->find('td',3)->plaintext;
$item['price'] = $page->find('td',4)->plaintext;
$table[] = $item;
}
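A likely cause of the fatal error, offered as an assumption rather than something the thread confirms: in Simple HTML DOM, find() with an index argument returns a single element (or null when nothing matches), not an array, so a foreach over $html->find('tr',2) does not yield row objects. A minimal sketch that instead walks every tr of the target table and labels the cells. The inline markup, the column order and the labelRows() helper are hypothetical stand-ins for the real page:

```php
<?php
// Assumes simple_html_dom.php sits next to this script.
include_once('simple_html_dom.php');

// Hypothetical helper: map each <tr> of a table element to a labeled array.
function labelRows($tableNode)
{
    $keys = array('sku', 'product', 'Qty', 'description', 'price');
    $rows = array();
    // find('tr') without an index returns an array of row elements.
    foreach ($tableNode->find('tr') as $tr) {
        $cells = $tr->find('td');
        if (count($cells) < count($keys)) {
            continue; // header rows (th only) and short rows are skipped
        }
        $item = array();
        foreach ($keys as $i => $key) {
            $item[$key] = trim($cells[$i]->plaintext);
        }
        $rows[] = $item;
    }
    return $rows;
}

// Stand-in markup; the real content would come from curlscraper().
$html = str_get_html(
    '<table><tr><th>SKU</th><th>Product</th><th>Qty</th><th>Description</th><th>Price</th></tr>'
    . '<tr><td>A1</td><td>Widget</td><td>3</td><td>Small widget</td><td>9.99</td></tr></table>'
);

$tableNode = $html->find('table', 0); // a single element, or null if absent
$table = ($tableNode !== null) ? labelRows($tableNode) : array();
print_r($table);
```

On the page from the question the table index would presumably be 2, as in the working code, instead of 0.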

Related

Scrape pagination content using simple dom parser

I want to scrape the post titles of a blog and wrote the code below. I am stuck figuring out how to loop through every page.
$dom = file_get_html('http://demos.appthemes.com/clipper/');
scrape('http://demos.appthemes.com/clipper/');
function scrape($URL)
{
$dom = file_get_html($URL);
foreach ($dom->find('.item-frame h1 a') as $items) {
$item = array('courseTitle' => $items->text());
var_dump($item);
}
}
for($pages = 0; $pages < 3;$pages++) {
if($next = $dom->find('a[class=page]', $pages)) {
$URL = $next->href;
$dom->clear();
unset($dom);
scrape($URL);
}
}
A partial result did appear, but then the script stopped with an error: Undefined variable: dom in on line 23
unset($dom); unsets the $dom variable, so on the second loop iteration ($pages == 1) the call to $dom->find fails.
I did not follow the logic, but try removing the $dom->clear(); and unset($dom); lines.
Hope it helps.
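Another way to avoid touching $dom after it has been cleared is to read all pagination links from the first page up front and only then scrape each URL. A rough sketch under that assumption: the a[class=page] selector comes from the question, while pageUrls() is a hypothetical helper and the markup is invented for illustration:

```php
<?php
// Assumes simple_html_dom.php sits next to this script.
include_once('simple_html_dom.php');

// Hypothetical helper: collect the href of every pagination link up front,
// so the DOM object is no longer needed once the scraping loop starts.
function pageUrls($dom)
{
    $urls = array();
    foreach ($dom->find('a[class=page]') as $link) {
        $urls[] = $link->href;
    }
    return array_values(array_unique($urls));
}

// Stand-in for file_get_html('http://demos.appthemes.com/clipper/').
$dom = str_get_html(
    '<div class="item-frame"><h1><a href="/c/1">First coupon</a></h1></div>'
    . '<a class="page" href="/page/2/">2</a><a class="page" href="/page/3/">3</a>'
);

$urls = pageUrls($dom);
$dom->clear(); // safe now: nothing below reads $dom again
unset($dom);

foreach ($urls as $url) {
    // In the real script this is where scrape($url) would be called.
    echo $url, "\n";
}
```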

PHP - Simple HTML DOM: Notice: Trying to get property of non-object error

I've been trying to fix this error for the longest time now, and I just can't seem to fix it.
I'm trying to get an article image, url, and url title. For some reason I keep getting the above error for this code:
<?php
$html = file_get_html("http://articlesite.com/");
if($html){
foreach ($html->find('.index_item a img') as $div) {
$articlePoster = $div->src;
$grabURL = $html->find('.index_item a');
/*Error Here -->*/$articleURL = $grabURL->href;
/*And Here -->*/$rawTitle = $grabURL->title;
echo '<div class="articleFrame"><img src="'.$articlePoster.'" width="125" height="186"/><br><p class="title">'.$rawTitle.'</p></div>';
}
}else{
echo '<h1>'."Sorry.".'</h1>';
}
?>
Any ideas? Thanks.
$html->find('xxxxx') returns an array, so you need to iterate through it -- i.e.
foreach ($html->find('.index_item a img') as $div) {
$articlePoster = $div->src;
foreach ($html->find('.index_item a') as $grabURL) {
$articleURL = $grabURL->href;
$rawTitle = $grabURL->title;
(etc.)
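Note that nesting the two find() loops pairs every image with every link. An alternative sketch, assuming each .index_item a wraps its own img as the selectors in the question suggest, iterates over the anchors and looks up the image inside each one. The markup is a made-up stand-in for the real site:

```php
<?php
// Assumes simple_html_dom.php sits next to this script.
include_once('simple_html_dom.php');

// Stand-in markup for file_get_html("http://articlesite.com/").
$html = str_get_html(
    '<div class="index_item"><a href="/a1" title="First"><img src="/p1.jpg"></a></div>'
    . '<div class="index_item"><a href="/a2" title="Second"><img src="/p2.jpg"></a></div>'
);

$articles = array();
foreach ($html->find('.index_item a') as $anchor) {
    // find() with an index returns one element, or null if there is none.
    $img = $anchor->find('img', 0);
    if ($img === null) {
        continue; // a link without an image is skipped
    }
    $articles[] = array(
        'url'    => $anchor->href,
        'title'  => $anchor->title,
        'poster' => $img->src,
    );
}
print_r($articles);
```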

PHP:Trying to get property of non-object

There are many posts on this topic but I did not get the required answer. Hence, I am here.
I have been getting Notice: Trying to get property of non-object in /opt/lampp/htdocs/amit/crawlnepalstock.php on line 49 error in my php page.
Here is my code
<?php
include_once('simple_html_dom.php');
error_reporting(E_ALL);
$html = file_get_html('http://nepalstock.com/datanepse/index.php');
$indexarray = array('s_no','stocksymbol', 'LTP', 'LTV', 'point_change', 'per_change', 'open','high', 'low', 'volume','prev_close');
$stocks = array();
$maincount = 0;
$tables = $html->find('table[class=dataTable]');
$str = $html->plaintext;
$matches = array();
foreach ($tables[0]->find('tr') as $elementtr) {
$count = 0;
$temp = array();
$anchor = $elementtr->children(1)->find('a',0);
$splits = preg_split('/=/', $anchor->href); // line 49
$temp['stocksymbol'] = isset($splits[1]) ? $splits[1] : null;
$temp['fullname'] = $elementtr->children(1)->plaintext;
$temp['no_of_trans'] = $elementtr->children(2)->plaintext;
$temp['max_price'] = $elementtr->children(3)->plaintext;
$temp['min_price'] = $elementtr->children(4)->plaintext;
$temp['closing_price'] = $elementtr->children(5)->plaintext;
$temp['total_share'] = $elementtr->children(6)->plaintext;
$temp['amount'] = $elementtr->children(7)->plaintext;
$temp['previous_close'] = $elementtr->children(8)->plaintext;
$temp['difference'] = $elementtr->children(9)->plaintext;
$stocks[] = $temp;
}
$html->clear();
unset($html);
echo '<pre>';
print_r($stocks);
echo '</pre>';
?>
I have not included the simple_html_dom.php class as it is quite long. Your opinions are very much appreciated. In case you need it, the simple_html_dom.php file is available online at http://sourceforge.net/projects/simplehtmldom/files/
You are trying to access a property on something that is not an object, such as null. For example:
$obj = null;
echo $obj->first_name; // this will produce the same error you are getting
// another example
$obj = array();
echo $obj->first_name; // this will also produce the same error
Line 49 is not obvious from the code sample, so check that line for this kind of mistake yourself.
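In the question's loop, one plausible trigger (an assumption, since the page itself is not shown) is a row such as a header row with no link in its second cell: find('a', 0) then returns null and $anchor->href raises the notice. A small sketch of guarding that case, using invented stand-in markup:

```php
<?php
// Assumes simple_html_dom.php sits next to this script.
include_once('simple_html_dom.php');

// Stand-in for the real page: a header row without a link, then a data row.
$html = str_get_html(
    '<table class="dataTable">'
    . '<tr><td>S.N.</td><td>Company</td><td>Transactions</td></tr>'
    . '<tr><td>1</td><td><a href="company.php?symbol=ABC">ABC Bank</a></td><td>12</td></tr>'
    . '</table>'
);

$stocks = array();
$table = $html->find('table[class=dataTable]', 0);
$rows = ($table !== null) ? $table->find('tr') : array();

foreach ($rows as $tr) {
    $cell = $tr->children(1); // null when the row has fewer than two cells
    $anchor = ($cell !== null) ? $cell->find('a', 0) : null;
    if ($anchor === null) {
        continue; // no link in this row, so there is nothing to split
    }
    $splits = preg_split('/=/', $anchor->href);
    $stocks[] = array(
        'stocksymbol' => isset($splits[1]) ? $splits[1] : null,
        'fullname'    => trim($cell->plaintext),
    );
}
print_r($stocks);
```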
This is happening because there is no longer a td[align="center"] tag found in the google.com document. Perhaps it was there when the code was first written.
So, what the others are saying about a non-object is true, but because the HTML was not found, there is not an object to use the ->plaintext method on.
As of 12/11/2020, if you change the URL found in line 6 of example_basic_selector.php to this:
$html = file_get_html('https://www.w3schools.com/tags/tryit.asp?filename=tryhtml_td_align_css');
And change this line
echo $html->find('td[align="center"]', 1)->plaintext . '';
to:
echo $html->find('td[style="text-align:right"]', 1)->plaintext . '';
the error will go away because the text it searches for is found, and thus the method works as intended.

How would I echo this XML data from a database using PHP simpleXML?

I am using a formbuilder plugin in Wordpress which submits the form input to the database as XML data. Now I would like to fetch that data and have it displayed in another page. I have started trying simpleXML to achieve this but now I have hit a road bump.
The XML data that appears in each row of the database follows this format:
<form>
<FormSubject>Report</FormSubject>
<FormRecipient>****#***.com</FormRecipient>
<Name>admin</Name>
<Department>test</Department>
<Value>1000</Value>
<Comments>test</Comments>
<Page>http://***.com</Page>
<Referrer>http://****.com</Referrer>
</form>
I have previously managed to fetch the data that I need using simpleXML from an XML string of the same markup which is in the database but now my question is, how do I do this with a loop for each row in the database?
When the following code is run, wordpress displays a blank page meaning that there is an error:
<?php
global $wpdb;
$statistics = $wpdb->get_results("SELECT * FROM wpformbuilder_results WHERE form_id = '00000000000000000001';");
echo "<table>";
foreach($statistics as $statistic){
$string = $statistic->xmldata
$xml = simplexml_load_string($string);
$Name = (string) $xml->Name;
$Department = (string) $xml->Department;
$Value = (string) $xml->Value;
$Comments = (string) $xml->Comments;
echo "<tr>";
echo "<td>".$statistic->timestamp."</td>";
echo "<td>".$Name."</td>";
echo "<td>".$Department."</td>";
echo "<td>".$Value."</td>";
echo "<td>".$Comments."</td>";
echo "</tr>";
}
echo "</table>";
?>
You are missing ; on line 5
$string = $statistic->xmldata
Should be
$string = $statistic->xmldata;
You should consider enabling the WP_DEBUG constant in your wp-config.php file. Insert the following code into wp-config.php, just before /* That's all, stop editing! Happy blogging. */
define('WP_DEBUG', true);
/* That's all, stop editing! Happy blogging. */
For more tips on debugging, read the codex
Formbuilder uses a custom function to extract XML data, defined in the formbuilder_xml_db_results class:
function xmltoarray($xml)
{
$xml = trim($xml);
$match = "#<([a-z0-9_]+)([ \"']*[a-z0-9_ \"']*)>(.*)(</\\1>)#si";
$offset = 0;
if(!preg_match($match, $xml, $regs, false, $offset)) {
return($xml);
}
while(preg_match($match, $xml, $regs, false, $offset))
{
list($data, $element, $attribs, $content, $closing) = $regs;
$offset = strpos($xml, $data) + strlen($data);
$tmp = $this->xmltoarray($content);
$result[$element] = $tmp;
}
return($result);
}
Define that function in your own code, before global $wpdb; (sharing a name with the class method is not a problem). Because it is then a plain function rather than a method, change $this->xmltoarray($content) to xmltoarray($content). Then modify your code this way:
<?php
global $wpdb;
$statistics = $wpdb->get_results("SELECT * FROM wpformbuilder_results WHERE form_id = '00000000000000000001';");
echo "<table>";
foreach($statistics as $statistic){
$xml = xmltoarray( $statistic->xmldata );
$Name = (string) $xml['form']['Name'];
$Department = (string) $xml['form']['Department'];
$Value = (string) $xml['form']['Value'];
$Comments = (string) $xml['form']['Comments'];
echo "<tr>";
echo "<td>".$statistic->timestamp."</td>";
echo "<td>".$Name."</td>";
echo "<td>".$Department."</td>";
echo "<td>".$Value."</td>";
echo "<td>".$Comments."</td>";
echo "</tr>";
}
echo "</table>";
?>
EDIT: edited $xml['Comments'] to $xml['form']['Comments'] and analogous
I fixed it by stripping the backslashes from the XML string using stripslashes()
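That fix can be sketched with plain SimpleXML. The escaped sample row below is an assumption about what a stored string might look like (quotes escaped with backslashes on the way into the database), and the id attribute is invented to make the failure visible. Only stripslashes(), simplexml_load_string() and libxml_use_internal_errors() are used, and the parse result is checked before any field is read:

```php
<?php
// Hypothetical stored value: the attribute quotes were escaped on insert.
// Inside single quotes PHP keeps \" literally as a backslash plus a quote.
$stored = '<form id=\"7\"><Name>admin</Name><Department>test</Department>'
        . '<Value>1000</Value><Comments>test</Comments></form>';

// Collect parse errors quietly instead of emitting warnings.
libxml_use_internal_errors(true);

// Parsing the raw string fails, because id=\"7\" is not well-formed XML.
$broken = simplexml_load_string($stored);

// After removing the backslashes the same string parses.
$xml = simplexml_load_string(stripslashes($stored));

if ($xml === false) {
    echo "Could not parse this row\n";
} else {
    $Name  = (string) $xml->Name;
    $Value = (string) $xml->Value;
    echo $Name, ' / ', $Value, "\n";
}
```

Checking for false also keeps one bad row from blanking the whole page inside the foreach over $statistics.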

Call to a member function find() on a non-object simpleHTMLDOM

I am trying to read a link from one page, print the URL, go to that page, and read the link on the next page in the same location, print the url, go to that page (and so on...).
All I'm doing is reading the URL and passing it as an argument to the get_links() function until there are no more links.
This is my code but it throws:
Fatal error: Call to a member function find() on a non-object.
Anyone know how to fix this?
<?php
$mainPage = 'https://www.bu.edu/link/bin/uiscgi_studentlink.pl/1346752597?ModuleName=univschr.pl&SearchOptionDesc=Class+Subject&SearchOptionCd=C&KeySem=20133&ViewSem=Fall+2012&Subject=&MtgDay=&MtgTime=';
get_links($mainPage);
function get_links($url) {
$data = new simple_html_dom();
$data = file_get_html($url);
$nodes = $data->find("input[type=hidden]");
$fURL = $data->find("/html/body/form");
$firstPart = $fURL[0]->action . '<br>';
foreach ($nodes as $node) {
$val = $node->value;
$name = $node->name;
$name . '<br />';
$val . "<br />";
$str1 = $str1 . "&" . $name . "=" . $val;
}
$fixStr1 = str_replace('&College', '?College', $str1);
$fixStr2 = str_replace('Fall 2012', 'Fall+2012', $fixStr1);
$fixStr3 = str_replace('Class Subject', 'Class+Subject', $fixStr2);
$fixStr4 = $firstPart . $fixStr3;
echo $nextPageURL = chop($fixStr4);
get_links($nextPageURL);
}
?>
Alright, so I was using the load_file() method somewhere in my code and did not see it until I really combed through it. I finally have a running script :) The key is to use file_get_html() instead of loading the webpage into an object with load_file().
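Whichever loader is used, the same fatal error comes back whenever the load fails, because file_get_html() and str_get_html() return false rather than an object in that case. A hedged sketch of checking the result before calling find(); firstFormAction() is a hypothetical helper and the markup is a stand-in for the real page:

```php
<?php
// Assumes simple_html_dom.php sits next to this script.
include_once('simple_html_dom.php');

// Hypothetical helper: return the action of the first form, or null when
// the markup could not be loaded or contains no form.
function firstFormAction($markup)
{
    $data = str_get_html($markup);
    if (!$data) {
        return null; // load failed: calling find() here would be fatal
    }
    $form = $data->find('form', 0); // one element, or null if no form exists
    $action = ($form !== null) ? $form->action : null;
    $data->clear();
    return $action;
}

var_dump(firstFormAction('<html><body><form action="/next.pl"></form></body></html>'));
var_dump(firstFormAction('')); // an empty document cannot be loaded
```

A null result would also be a natural point for get_links() to stop calling itself, since the function in the question has no exit condition.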
