member function error in data extraction - php

I am trying to get data from an external source, but I can't retrieve it and I am getting these errors:
Notice: Trying to get property of non-object in E:\xampp\htdocs\test\merit-list.php on line 38
Fatal error: Call to a member function find() on null in E:\xampp\htdocs\test\merit-list.php on line 39
Here is my code
<?php
require('resources/inc/simple_html_dom.php');
$linksrc = 'http://58.65.172.36/Portal/WebSiteUpdates/Achievements.aspx';
$curl = curl_init();
curl_setopt_array($curl, array(
CURLOPT_URL => $linksrc,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_FOLLOWLOCATION => 1,
CURLOPT_ENCODING => "",
CURLOPT_MAXREDIRS => 10,
CURLOPT_TIMEOUT => 3000,
CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
CURLOPT_USERAGENT => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36',
));
$file = curl_exec($curl);
$error = curl_error($curl);
curl_close($curl);
$dom = new simple_html_dom();
$dom->load($file);
$doctorDivs = $dom->find("table#Farooq", 0)->children();
$doctors = array();
foreach($doctorDivs as $div){
$doctor = array();
// line 38
$image = $doctor["image"] = $div->find('img', 0)->src;
$details = $div->find('tr', 0)->find("td");
$name = $doctor["name"] = trim($details[1]->plaintext);
$spec = $doctor["desc"] = trim($details[2]->plaintext);
$doctors[] = $doctor;
echo $image;
echo $name;
echo $spec;
}
?>

The problem is that the first row of the table doesn't have an <img> because it's the heading row, so $div->find("img", 0) returns null. You get the first error when you try to access null->src.
The second error occurs because $div is itself the <tr> element. $div->find("tr") searches only inside $div and does not include $div itself, so $div->find("tr", 0) always returns null. This code won't work on the heading row either, because that row contains <th> rather than <td> elements.
You could just skip over the heading row by putting:
array_shift($doctorDivs);
before the foreach loop. This will remove the first element of the array.
And change the $details line to:
$details = $div->find("td");
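Putting both fixes together, here is a minimal sketch of the corrected loop. It uses PHP's built-in DOMDocument and DOMXPath rather than simple_html_dom so that it runs without the library, and the table markup is invented for illustration; only the table id Farooq comes from the question. Instead of array_shift, it skips any row that lacks an <img> or has fewer than three <td> cells, which also covers the heading row.

```php
<?php
// Invented sample markup standing in for the real page.
$html = '<table id="Farooq">
<tr><th>Photo</th><th>Name</th><th>Description</th></tr>
<tr><td><img src="a.jpg"></td><td> Dr. A </td><td> Cardiology </td></tr>
<tr><td><img src="b.jpg"></td><td> Dr. B </td><td> Neurology </td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // do not emit warnings for imperfect HTML
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$doctors = array();
foreach ($xpath->query('//table[@id="Farooq"]//tr') as $row) {
    $img   = $xpath->query('.//img', $row)->item(0); // null when the row has no image
    $cells = $xpath->query('./td', $row);            // heading row has <th>, so length is 0
    if ($img === null || $cells->length < 3) {
        continue; // skip the heading row and any malformed row
    }
    $doctors[] = array(
        'image' => $img->getAttribute('src'),
        'name'  => trim($cells->item(1)->textContent),
        'desc'  => trim($cells->item(2)->textContent),
    );
}
```

The same guard works with simple_html_dom: check that find('img', 0) did not return null before reading ->src.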

Related

PHP DOM how get all image links

I'm trying to download pictures from a site as an exercise, but something isn't working: the links are not displayed. Can anyone tell me what I'm doing wrong?
This is my code:
$li = 'https://gratka.pl/nieruchomosci/mieszkanie-katowice-dabrowka-mala/ob/20357919';
$options3 = array('http' => array('method'=>"GET",
'header'=>"Accept-language: pl\r\n" .
"Cookie: foo=bar\r\n",
'user_agent' => 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'));
$context3 = stream_context_create($options3);
$text1 = file_get_contents($li, false, $context3);
$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($text1);
$query_string = '';
$divs = $dom->getElementsByTagName('span');
foreach ($divs as $div){
if(preg_match('/\bgallery__imageViewer\b/', $div->getAttribute('class'))) {
$links = $div->getElementsByTagName('img');
foreach($links as $link){
$foto = $link->getAttribute('src');
$query_string .= ('<center><img src="'.$foto.'"> </center><br/>');
}
}
}
print_r($query_string);
Thanks in advance to everyone for your help.

PHP parsing stops

I'm trying to parse pages. I've read that I need to set a header to avoid a 500 server error, so I did.
But after five or so pages, the parsing stops. There is no error; it just stops.
The code:
$url = 'http://www.someurlhere.com';
$options = array('http' => array('header' => "User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.125 Safari/533.4"));
$context = stream_context_create($options);
$html = file_get_html($url, false, $context);
Edit:
foreach($html->find('table.votes tr.even,tr.odd') as $tr) {
if ($tr->find('td', 3) == '<td>absent</td>') {
$absent = $absent + 1;
}
$possible = $possible + 1;
}
echo 'absent=> ' . $absent . ' out of => ' . $possible . '<br>';

Scraping ASP website using cURL with manual redirection

I need to scrape an ASP website using cURL. My hosting does not allow me to turn off safe_mode or open_basedir. That's why CURLOPT_FOLLOWLOCATION cannot be activated (it throws an error "CURLOPT_FOLLOWLOCATION cannot be activated when an open_basedir is set").
I tried to implement a workaround, but after several unsuccessful days I'm getting desperate. How can I change the code below to follow redirects manually instead of using CURLOPT_FOLLOWLOCATION?
include_once __DIR__.'/simple_html_dom.php';
define('COOKIE_FILE', __DIR__.'/cookie.txt');
@unlink(COOKIE_FILE); // clear cookies before we start
define('CURL_LOG_FILE', __DIR__.'/request.txt');
@unlink(CURL_LOG_FILE); // clear curl log
class ASPBrowser {
public $exclude = array();
public $lastUrl = '';
public $dom = false;
/**Get simplehtmldom object from url
* @param $url
* @param $post
* @return bool|simple_html_dom
*/
public function getDom($url, $post = false) {
$f = fopen(CURL_LOG_FILE, 'a+'); // curl session log file
if($this->lastUrl) $header[] = "Referer: {$this->lastUrl}";
$curlOptions = array(
CURLOPT_ENCODING => 'gzip,deflate',
CURLOPT_AUTOREFERER => 1,
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_URL => $url,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_SSL_VERIFYHOST => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_MAXREDIRS => 9,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_HEADER => 0,
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36",
CURLOPT_COOKIEFILE => COOKIE_FILE,
CURLOPT_COOKIEJAR => COOKIE_FILE,
CURLOPT_STDERR => $f, // log session
CURLOPT_VERBOSE => true,
);
if($post) { // add post options
$curlOptions[CURLOPT_POSTFIELDS] = $post;
$curlOptions[CURLOPT_POST] = true;
}
$curl = curl_init();
curl_setopt_array($curl, $curlOptions);
$data = curl_exec($curl);
$this->lastUrl = curl_getinfo($curl, CURLINFO_EFFECTIVE_URL); // get url we've been redirected to
curl_close($curl);
if($this->dom) {
$this->dom->clear();
$this->dom = false;
}
$dom = $this->dom = str_get_html($data);
fwrite($f, "{$post}\n\n");
fwrite($f, "-----------------------------------------------------------\n\n");
fclose($f);
return $dom;
}
function createASPPostParams($dom, array $params) {
$postData = $dom->find('input,select,textarea');
$postFields = array();
foreach($postData as $d) {
$name = $d->name;
if(trim($name) == '' || in_array($name, $this->exclude)) continue;
$value = isset($params[$name]) ? $params[$name] : $d->value;
$postFields[] = rawurlencode($name).'='.rawurlencode($value);
}
$postFields = implode('&', $postFields);
return $postFields;
}
function doPostRequest($url, array $params) {
$post = $this->createASPPostParams($this->dom, $params);
return $this->getDom($url, $post);
}
function doPostBack($url, $eventTarget, $eventArgument = '') {
return $this->doPostRequest($url, array(
'__EVENTTARGET' => $eventTarget,
'__EVENTARGUMENT' => $eventArgument
));
}
function doGetRequest($url) {
return $this->getDom($url);
}
}
(Credits: Andrey http://256cats.com/scraping-asp-websites-php-dopostback-ajax-emulation/)
You're probably looking for the CURLINFO_REDIRECT_URL info variable, which returns the URL that cURL would have redirected to had you allowed it. It was added in PHP 5.3.7.
Note that the exact 3xx response code also determines whether the HTTP request method should change when you follow a redirect. See the details in the HTTP spec, RFC 7231 section 6.4.
See also the libcurl docs for CURLINFO_REDIRECT_URL.
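A manual redirect loop built on CURLINFO_REDIRECT_URL could look roughly like the sketch below. The names followRedirects and curlFetch are hypothetical, and the fetch step is passed in as a callable only so the loop can be exercised without network access. It assumes every hop is re-requested with a plain GET and leaves out the cookie and logging options from the question's class.

```php
<?php
/**
 * Follow redirects by hand.
 * $fetch takes a URL and returns array($body, $redirectUrl), where
 * $redirectUrl is empty or false when the response is not a redirect.
 */
function followRedirects($url, callable $fetch, $maxRedirects = 9) {
    for ($i = 0; $i <= $maxRedirects; $i++) {
        list($body, $next) = $fetch($url);
        if (!$next) {
            return array($url, $body); // final URL and its body
        }
        $url = $next; // request the redirect target on the next pass
    }
    throw new RuntimeException('Too many redirects');
}

/** One request with automatic redirects switched off. */
function curlFetch($url) {
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => false, // allowed even with open_basedir set
    ));
    $body = curl_exec($ch);
    $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    curl_close($ch);
    return array($body, $next);
}
```

Usage would be list($finalUrl, $html) = followRedirects($url, 'curlFetch'); and $finalUrl could then be stored in $this->lastUrl. A POST that should stay a POST after a 307 or 308 would need extra handling that this sketch does not attempt.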

file_get_contents not fetching the correct results

I am trying to fetch prices from Play and Amazon for a personal project, but I have two problems.
First, I have got Play to work, but it fetches the wrong price; second, Amazon doesn't return any results.
Here is the code i have been trying to get working.
$playdotcom = file_get_contents('http://www.play.com/Search.html?searchstring=".$getdata[getlist_item]."&searchsource=0&searchtype=r2alldvd');
$amazoncouk = file_get_contents('http://www.amazon.co.uk/gp/search?search-alias=dvd&keywords=".$getdata[getlist_item]."');
preg_match('#<span class="price">(.*)</span>#', $playdotcom, $pmatch);
$newpricep = $pmatch[1];
preg_match('#used</a> from <strong>(.*)</strong>#', $playdotcom, $pmatch);
$usedpricep = $pmatch[1];
preg_match('#<span class="bld lrg red"> (.*)</span>#', $amazoncouk, $amatch);
$newpricea = $amatch[1];
preg_match('#<span class="price bld">(.*)</span> used#', $amazoncouk, $amatch);
$usedpricea = $amatch[1];
then echo the results:
echo "Play :: New: $newpricep - Used: $usedpricep";
echo "Amazon :: New: $newpricea - Used: $usedpricea";
Just so you know what's going on:
$getdata[getlist_item] = "American Pie 5: The Naked Mile";
which is working fine.
Any idea why these aren't working correctly?
EDIT: I have just realised that $getdata[getlist_item] in the file_get_contents call is not being replaced by the variable's value; it is sent as literal text. Why is it doing that?
The quotes you are using aren't consistent! Both your opening and closing quotes need to be the same.
Try this:
$playdotcom = file_get_contents("http://www.play.com/Search.html?searchstring=".$getdata['getlist_item']."&searchsource=0&searchtype=r2alldvd");
$amazoncouk = file_get_contents("http://www.amazon.co.uk/gp/search?search-alias=dvd&keywords=".$getdata['getlist_item']);
As written, ".$getdata[getlist_item]." was treated as part of the string, because you never closed the single-quoted string you started.
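To make the difference concrete, here is a small sketch. The $getdata array is a stand-in for the asker's data. The urlencode() call is an addition that the answer above does not mention: the title contains spaces and a colon, which should be encoded before going into a query string.

```php
<?php
// Stand-in for the asker's row of data.
$getdata = array('getlist_item' => 'American Pie 5: The Naked Mile');

// Single quotes: nothing is interpolated or concatenated, so the
// text ".$getdata[getlist_item]." ends up in the URL literally.
$broken = 'http://www.play.com/Search.html?searchstring=".$getdata[getlist_item]."&searchsource=0';

// Matching double quotes close the string, so the concatenation happens.
$query   = urlencode($getdata['getlist_item']);
$playUrl = "http://www.play.com/Search.html?searchstring=" . $query
         . "&searchsource=0&searchtype=r2alldvd";
```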
Use the cURL functions with proper headers. The code below fetches any web page; then use a proper parser such as DOMDocument or Simple HTML DOM Parser to read the price from the HTML content.
$playdotcom = getPage("http://www.play.com/Search.html?searchstring=".$getdata['getlist_item']."&searchsource=0&searchtype=r2alldvd");
$amazoncouk = getPage("http://www.amazon.co.uk/gp/search?search-alias=dvd&keywords=".$getdata['getlist_item']);
function getPage($url){
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET",
CURLOPT_POST =>false,
CURLOPT_USERAGENT => $user_agent,
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => 'gzip',
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 30000,
CURLOPT_TIMEOUT => 30000,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
curl_close( $ch );
return $content;
}
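For the parsing step, a sketch with DOMDocument and DOMXPath might look like this. It assumes each price sits in a <span class="price"> element, as the asker's first regex suggests; the sample markup is invented and the real pages may be structured differently.

```php
<?php
// Invented sample; stands in for the string returned by getPage().
$content = '<ul><li><span class="price">3.99</span> new</li>'
         . '<li><span class="price">1.50</span> used</li></ul>';

$dom = new DOMDocument();
libxml_use_internal_errors(true); // real-world HTML is rarely valid
$dom->loadHTML($content);
libxml_clear_errors();

$xpath  = new DOMXPath($dom);
$prices = array();
foreach ($xpath->query('//span[@class="price"]') as $span) {
    $prices[] = trim($span->textContent); // one entry per matching span
}
```

Unlike the greedy (.*) in the regex, each entry holds the text of a single span. Note that @class="price" is an exact match, so a span with class="price bld" would need its own expression.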

change siteid in ebay API call

I need to connect to the eBay Motors site using a function in OpenBay, which is an OpenCart extension. The site ID for eBay Motors is 100, but for the life of me I cannot get it to change given the way this function is written. Am I missing something here?
API function call
public function openbay_call($call, array $post = NULL, array $options = array(), $content_type = 'json', $statusOverride = false){
if(defined("HTTPS_CATALOG")){
$domain = HTTPS_CATALOG;
}else{
$domain = HTTPS_SERVER;
}
$data = array(
'token' => $this->token,
'language' => $this->config->get('openbay_language'),
'secret' => $this->secret,
'server' => $this->server,
'domain' => $domain,
'openbay_version' => (int)$this->config->get('openbay_version'),
'data' => $post,
'content_type' => $content_type
);
$defaults = array(
CURLOPT_POST => 1,
CURLOPT_HEADER => 0,
CURLOPT_URL => $this->url.$call,
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1",
CURLOPT_FRESH_CONNECT => 1,
CURLOPT_RETURNTRANSFER => 1,
CURLOPT_FORBID_REUSE => 1,
CURLOPT_TIMEOUT => 0,
CURLOPT_SSL_VERIFYPEER => 0,
CURLOPT_SSL_VERIFYHOST => 0,
CURLOPT_POSTFIELDS => http_build_query($data, '', "&")
);
$ch = curl_init();
curl_setopt_array($ch, ($options + $defaults));
if( ! $result = curl_exec($ch)){
$this->log('openbay_call() - Curl Failed '.curl_error($ch).' '.curl_errno($ch));
}
curl_close($ch);
/* There may be some calls we just dont want to log */
if(!in_array($call, $this->noLog)){
$this->log('openbay_call() - Result of : "'.$result.'"');
}
/* JSON RESPONSE */
if($content_type == 'json'){
$encoding = mb_detect_encoding($result);
/* some json data may have BOM due to php not handling types correctly */
if($encoding == 'UTF-8') {
$result = preg_replace('/[^(\x20-\x7F)]*/','', $result);
}
$result = json_decode($result, 1);
$this->lasterror = $result['error'];
$this->lastmsg = $result['msg'];
if(!empty($result['data'])){
return $result['data'];
}else{
return false;
}
/* XML RESPONSE */
}elseif($content_type == 'xml'){
$result = simplexml_load_string($result);
$this->lasterror = $result->error;
$this->lastmsg = $result->msg;
if(!empty($result->data)){
return $result->data;
}else{
return false;
}
}
}else{
$this->log('openbay_call() - OpenBay not active');
$this->log('openbay_call() - Data: '.serialize($post));
}
}
Predefined parameters within the class (they probably don't help, but I've included them anyway):
public function __construct($registry) {
$this->registry = $registry;
$this->token = $this->config->get('openbaypro_token');
$this->secret = $this->config->get('openbaypro_secret');
$this->logging = $this->config->get('openbaypro_logging');
$this->tax = $this->config->get('tax');
$this->server = 1;
$this->lasterror = '';
$this->lastmsg = '';
}
The function call:
$this->data['test_category_features'] = $this->ebay->openbay_call('listing/getCategoryFeatures/', array('id' => 35618));
Everything works, but how would I get this to change the site ID to 100? The only way I can see is to rewrite my own API call class, but the client is paying for the OpenBay subscription and wants the API calls to go through it, so I have to use their function. I'm trying to return the eBay Motors category features so he can list items the same way he has for years, as used parts. Unless you switch to the eBay Motors site ID (100), the call will not return the category variations needed, much less accept the categories when adding products to eBay through the OpenCart extension.
Any advice would be greatly appreciated; I'm really stuck here. Thanks in advance :)
According to this page: http://developer.ebay.com/DevZone/merchandising/docs/Concepts/SiteIDToGlobalID.html you need to add "... X-EBAY-SOA-GLOBAL-ID HTTP header for each API call", so add that header to the cURL options.
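Since openbay_call() merges its third argument with $options + $defaults, extra cURL options can be supplied from the call site without touching the function. The sketch below shows only that merge. Two things are assumptions: that EBAY-MOTOR is the global ID listed for site ID 100 on the page above, and that the OpenBay server passes the header on to eBay, which the question does not establish.

```php
<?php
// Subset of the defaults built inside openbay_call(), copied from the question.
$defaults = array(
    CURLOPT_POST           => 1,
    CURLOPT_HEADER         => 0,
    CURLOPT_RETURNTRANSFER => 1,
);

// Assumption: EBAY-MOTOR is the global ID that corresponds to site ID 100.
$options = array(
    CURLOPT_HTTPHEADER => array('X-EBAY-SOA-GLOBAL-ID: EBAY-MOTOR'),
);

// Same merge as in openbay_call(): the array union keeps every key from
// $options and adds only those defaults that $options does not define.
$merged = $options + $defaults;
```

With the signature shown in the question, the call would become $this->ebay->openbay_call('listing/getCategoryFeatures/', array('id' => 35618), $options);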
