After server change simple html dom doesn't work anymore

After server change simple html dom doesn't work anymore - php

After transferring my previously working code on 5.6 PHP to a new host and server with PHP 7.2, I'm now getting this Fatal error: Uncaught Error: Call to a member function find() on array .... How do I fix this?
<?php
// Get Source
$html = file_get_html('URL');
// Get needed table
$table = $html->find('table',1);
// Find each row, starting with the 2nd, and echo the Cells
foreach($table->find('tr') as $rowNumber => $row) {
if ( $rowNumber < 1 ) continue;
$cell = $row->find('td', 0)->plaintext;
echo $cell;
$cell2 = $row->find('td', 1)->plaintext;
echo $cell2;
}
?>
UPDATE
So it seems the source of the error is the file_get_html code which doesn't work perfectly with PHP 7.
I've found two go-arounds:
1) Through curl
// Curl-Verbindung zu HTM-Datei
$base = 'FULL PATH';
$curl = curl_init();
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl, CURLOPT_HEADER, false);
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($curl, CURLOPT_URL, $base);
curl_setopt($curl, CURLOPT_REFERER, $base);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, TRUE);
$str = curl_exec($curl);
curl_close($curl);
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load($str);
2) Another through str_get_html
$html = str_get_html(file_get_contents('RELATIVE PATH'));
I guess the second is better?

Just check the file URL on line 3.
$html = file_get_html('URL');
Try to use absolute URL if you are using a relative one.

Related

how to skip InnerText empy in a If statment using php

I am trying to skip when InnerText is empty but it put a default value.
This is my code:
if (strip_tags($result[$c]->innertext) == '') {
$c++;
continue;
}
This is the output:
Thanks
EDIT2: I did the var_dump
var_dump($result[$c]->innertext)
and I got this:
how can I fix it please?
EDIT3: This is my code; I extract in this way the names of the teams and the results, but the last one not works in the best way when We have postponed matches
<?php
include('../simple_html_dom.php');
function getHTML($url,$timeout)
{
$ch = curl_init($url); // initialize curl with given url
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set useragent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to execute
curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
return #curl_exec($ch);
}
$response=getHTML("https://www.betexplorer.com/soccer/japan/j3-league/results/",10);
$html = str_get_html($response);
$titles = $html->find("a[class=in-match]"); // name match
$result = $html->find("td[class=h-text-center]/a"); // result match
$c=0; $b=0; $o=0; $z=0; $h=0; // set counters
foreach ($titles as $match) { //get all data
$match_status = $result[$h++];
if (strip_tags($result[$c]->innertext) == 'POSTP.') { //bypass postponed match but it doesn't work anymore
$c++;
continue;
}
list($num1, $num2) = explode(':', $result[$c++]->innertext); // <- explode
$num1 = intval($num1);
$num2 = intval($num2);
$num3 = ($num1 + $num2);
$risultato = ($num1 . '-' . $num2);
list($home, $away) = explode(' - ', $titles[$z++]->innertext); // <- explode
$home = strip_tags($home);
$away = strip_tags($away);
$matchunit = $home . ' - ' . $away;
echo "<tr><td class='rtitle'>".
"<td> ".$matchunit. "</td> / " . // name match
"<td class='first-cell'>" . $risultato . "</td> " .
"</td></tr><br/>";
} //close foreach
?>

By browsing the content of the website you will always be dependent on the changes made in the future.
However, I will use PHP's native libxml DOM extension.
By doing the following:
<?php
function getHTML($url,$timeout)
{
$ch = curl_init($url); // initialize curl with given url
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set useragent
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to execute
curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
return #curl_exec($ch);
}
$response=getHTML("https://www.betexplorer.com/soccer/japan/j3-league/results/",10);
// "Create" the new DOM document
$doc = new DomDocument();
// Load HTML from a string, and disable xml error handling
$doc->loadHTML($response, LIBXML_NOERROR);
// Creates a new DOMXPath object
$xpath = new DomXpath($doc);
// Evaluates the given XPath expression and get all tr's without first line from table main
$row = $xpath->query('//table[#class="table-main js-tablebanner-t js-tablebanner-ntb"]/tr[position()>1]');
echo '<table>';
// Parse the values in row
foreach ($row as $tr) {
// Only get 2 first td's
$col = $xpath->query('td[position()<3]', $tr);
// Do not show POSTP and Round values
if (!str_contains($tr->nodeValue, 'POSTP') && !str_contains($tr->nodeValue, 'Round')) {
echo '<tr><td>'.$col->item(0)->nodeValue.'</td><td>'.$col->item(1)->nodeValue.'</td></tr>';
}
}
echo '</table>';
You obtain:
<tr><td>Nagano - Tegevajaro Miyazaki</td><td>3:2</td></tr>
<tr><td>YSCC - Toyama</td><td>1:2</td></tr>
...

Trouble writing results in a csv file

I've written a script in php to fetch links and write them in a csv file from the main page of wikipedia. The script does fetch the links accordingly. However, I can't write the populated results in a csv file. When I execute my script, It does nothing, no error either. Any help will be highly appreciated.
My try so far:
<?php
include "simple_html_dom.php";
$url = "https://en.wikipedia.org/wiki/Main_Page";
function fetch_content($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
$htmlContent = curl_exec($ch);
curl_close($ch);
$dom = new simple_html_dom();
$dom->load($htmlContent);
$links = array();
foreach ($dom->find('a') as $link) {
$links[]= $link->href . '<br>';
}
return implode("\n", $links);
$file = fopen("itemfile.csv","w");
foreach ($links as $item) {
fputcsv($file,$item);
}
fclose($file);
}
fetch_content($url);
?>

1.You are using return in your function, that's why nothing gets written in the file as code stops executing after that.
2.Simplified your logic with below code:-
$file = fopen("itemfile.csv","w");
foreach ($dom->find('a') as $link) {
fputcsv($file,array($link->href));
}
fclose($file);
So the full code needs to be:-
<?php
//comment these two lines when script started working properly
error_reporting(E_ALL);
ini_set('display_errors',1); // 2 lines are for Checking and displaying all errors
include "simple_html_dom.php";
$url = "https://en.wikipedia.org/wiki/Main_Page";
function fetch_content($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_BINARYTRANSFER, 1);
$htmlContent = curl_exec($ch);
curl_close($ch);
$dom = new simple_html_dom();
$dom->load($htmlContent);
$links = array();
$file = fopen("itemfile.csv","w");
foreach ($dom->find('a') as $link) {
fputcsv($file,array($link->href));
}
fclose($file);
}
fetch_content($url);
?>

The reason the file does not get written is because you return out of the function before that code can even be executed.

extract specific data from webpage using php

I wants to create a php script for alerts from my work website when new notice is published, so following the page url
http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767
from this page i want a variable for Date & Time of Meeting (Date and time seperately two variables)
Place of Meeting and Published On
please help me to create a perfect php script.
I tried to create following script but it gives to many errors
<?php
$url1 = "http://www.mahapwd.com/nit/ueIndex.asp?district=12";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
preg_match("/href=(.*)\", $data, $urldata);
$url2 = "http://www.mahapwd.com/nit/$urldata[1];
curl_setopt($ch, CURLOPT_URL, $url2);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data2 = curl_exec($ch);
preg_match("/Published On:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
preg_match("/Time of Meeting:</b>(.*)&nbsp", $data, $MtDt);
$MeetDate = $MtDt[1];
preg_match("/Time of Meeting:</b>$MtDt[1]&nbsp(.*)</font>", $data, $MtTime);
$MeetTime = $MtTime[1];
preg_match("/Place of Meeting:</b>(.*)<\/font>", $data, $pubDt);
$PubDate = $pubDt[1];
?>

Hello i have done simple code for you. You can download simple_html_dom.php from http://simplehtmldom.sourceforge.net/
require_once "simple_html_dom.php";
$url='http://www.mahapwd.com/nit/ueviewnotice.asp?noticeid=1767';
//parse url
for ($i=0;$i<1;$i++) {
$html1 = file_get_html($url);
if(!$html1){ echo "no content"; }
else {
//here is parsed html
$string1 = $html1;
//now you need to find table
$element1=$html1->find('table');
//here is a table you need
$input=$element1[2];
//now you can select row from here
foreach($input->find('td') as $element) {
//in here you can find name than save it to database than check it
}
}
}

curl_exec returns empty string

I'm still a bit new to using curl to pull data and I've recently started using Fiddler to help find what options need to be set.
I'm trying to see if I can pull an image from a site. I first hit a search page - I set the search parameters, then start hitting links in the results. When I attempt to go a link in one of the results for an image, I get an empty string returned from curl_exec().
The weird thing is - at one point, it worked - I got the data back and successfully saved the image locally. But then it stopped, and I have no idea what I was doing to have it working. Naturally, everything works OK in the browser. :(
I'm using Simple HTML DOM to parse through results and cUrl for the actual page requests. curl_error() does not show an error, curl_getinfo() thinks everything is OK too. It's probably something trivial, but I'm not sure how to troubleshoot it beyond where I am.
<?php
include 'includes/simple_html_dom.php';
$url = "http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal/Corrections/InmateInquiry.aspx";
// Get Cookie - ASP.NET_SessionId
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
$r = curl_exec($ch);
preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $r, $matches);
$cookies = array();
foreach($matches[1] as $item)
{
parse_str($item, $cookie);
$cookies = array_merge($cookies, $cookie);
}
$sessionCookie = "ASP_NET_SessionId=".$cookies['ASP_NET_SessionId'];
// now load up page into Simple HTML DOM and get all inputs - ignore buttons and populate our dates
$startDate = "02%2F01%2F2000";
$endDate = "02%2F07%2F2016";
$getInputs = str_get_html($r);
$inputs = $getInputs->find('input');
$inputs_array = array();
$buttons_array = array();
for ($i=0; $i<count($inputs); $i++)
{
if ($inputs[$i]->type != "submit")
{
$inputs_array[$inputs[$i]->id] = $inputs[$i]->value;
if (stripos($inputs[$i]->id, "FromDate") > 0)
$inputs_array[$inputs[$i]->id] = $startDate;
if (stripos($inputs[$i]->id, "ToDate") > 0)
$inputs_array[$inputs[$i]->id] = $endDate;
}
}
// build up our curl data - includes hidden inputs, our to & from dates, plus the Search button
$curl_data = http_build_query($inputs_array)."&ctl00%24DefaultContent%24uxSearch=Search";
// POST the data, include session cookie
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $curl_data);
curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
$response = curl_exec($ch);
// this shows that we can get data
// find the links from the HTML
$htmlDom = str_get_html($response); // load up Simple HTML DOM
// get the table of results
$divTable = $htmlDom->find('div#ctl00_DefaultContent_uxResultsWrapper',0)->find('table',0);
$rows = $divTable->find('tr');
for ($i=1; $i<count($rows);$i++)
{
if ($i>3) break; // limit the length of script for debugging
$link = $rows[$i]->find('td',1)->find('a',0)->href;
// build up query to get inmate details from the link above
$url = "http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal/Corrections/".$link;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
$page = curl_exec($ch);
$pageData = str_get_html($page);
// Now find the Photo, there's a thumb in div.BookingPhotos
// It is linked to a full size image, the link is of the form http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal/GetImage.aspx?ImageKey=17C030IS, but in the href, it has ../GetImage.aspx?ImageKey=xxxx
$photoLink = $pageData->find('div.BookingPhotos',0)->find('a',0)->href;
// get rid of .. and put the base URL on the front
$imgLink = str_replace("..", "http://nwweb.co.bell.tx.us/NewWorld.Aegis.WebPortal", $photoLink);
// now attempt to pull the image
$ch = curl_init($imgLink);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_COOKIE, $sessionCookie);
// here is the PROBLEM - NO DATA RETURNED
$imgData = curl_exec($ch); // I get a header back, but NO data
}
?>

How to get Google +1 count for current page in PHP?

I want to get count of Google +1s for current web page ? I want to do this process in PHP, then write number of shares or +1s to database. That's why, I need it. So, How can I do this process (getting count of +1s) in PHP ?
Thanks in advance.

This one works for me and is faster than the CURL one:
function getPlus1($url) {
$html = file_get_contents( "https://plusone.google.com/_/+1/fastbutton?url=".urlencode($url));
$doc = new DOMDocument(); $doc->loadHTML($html);
$counter=$doc->getElementById('aggregateCount');
return $counter->nodeValue;
}
also here for Tweets, Pins and Facebooks
function getTweets($url){
$json = file_get_contents( "http://urls.api.twitter.com/1/urls/count.json?url=".$url );
$ajsn = json_decode($json, true);
$cont = $ajsn['count'];
return $cont;
}
function getPins($url){
$json = file_get_contents( "http://api.pinterest.com/v1/urls/count.json?callback=receiveCount&url=".$url );
$json = substr( $json, 13, -1);
$ajsn = json_decode($json, true);
$cont = $ajsn['count'];
return $cont;
}
function getFacebooks($url) {
$xml = file_get_contents("http://api.facebook.com/restserver.php?method=links.getStats&urls=".urlencode($url));
$xml = simplexml_load_string($xml);
$shares = $xml->link_stat->share_count;
$likes = $xml->link_stat->like_count;
$comments = $xml->link_stat->comment_count;
return $likes + $shares + $comments;
}
Note: Facebook numbers are the sum of likes+shares and some people said plus comments (I didn't search this yet), anyway use the one you need.
This will works if your php settings allow open external url, check your "allow_url_open" php setting.
Hope helps.

function get_plusones($url) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://clients6.google.com/rpc");
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, '[{"method":"pos.plusones.get","id":"p","params":{"nolog":true,"id":"' . $url . '","source":"widget","userId":"#viewer","groupId":"#self"},"jsonrpc":"2.0","key":"p","apiVersion":"v1"}]');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-type: application/json'));
$curl_results = curl_exec ($curl);
curl_close ($curl);
$json = json_decode($curl_results, true);
return intval( $json[0]['result']['metadata']['globalCounts']['count'] );
}
echo get_plusones("http://www.stackoverflow.com")
from internoetics.com

The cURL and API way listed in the other posts here no longer works.
There is still at least 1 method, but it's ugly and Google clearly doesn't support it. You just rip the variable out of the JavaScript source code for the official button with a regular expression:
function shinra_gplus_get_count( $url ) {
$contents = file_get_contents(
'https://plusone.google.com/_/+1/fastbutton?url='
. urlencode( $url )
);
preg_match( '/window\.__SSR = {c: ([\d]+)/', $contents, $matches );
if( isset( $matches[0] ) )
return (int) str_replace( 'window.__SSR = {c: ', '', $matches[0] );
return 0;
}

The next PHP script works great so far for retrieving Google+ count on shares and +1's.
$url = 'http://nike.com';
$gplus_type = true ? 'shares' : '+1s';
/**
* Get Google+ shares or +1's.
* See out post at stackoverflow.com/a/23088544/328272
*/
function get_gplus_count($url, $type = 'shares') {
$curl = curl_init();
// According to stackoverflow.com/a/7321638/328272 we should use certificates
// to connect through SSL, but they also offer the following easier solution.
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
if ($type == 'shares') {
// Use the default developer key AIzaSyCKSbrvQasunBoV16zDH9R33D88CeLr9gQ, see
// tomanthony.co.uk/blog/google_plus_one_button_seo_count_api.
curl_setopt($curl, CURLOPT_URL, 'https://clients6.google.com/rpc?key=AIzaSyCKSbrvQasunBoV16zDH9R33D88CeLr9gQ');
curl_setopt($curl, CURLOPT_POST, 1);
curl_setopt($curl, CURLOPT_POSTFIELDS, '[{"method":"pos.plusones.get","id":"p","params":{"nolog":true,"id":"' . $url . '","source":"widget","userId":"#viewer","groupId":"#self"},"jsonrpc":"2.0","key":"p","apiVersion":"v1"}]');
curl_setopt($curl, CURLOPT_HTTPHEADER, array('Content-type: application/json'));
}
elseif ($type == '+1s') {
curl_setopt($curl, CURLOPT_URL, 'https://plusone.google.com/_/+1/fastbutton?url='.urlencode($url));
}
else {
throw new Exception('No $type defined, possible values are "shares" and "+1s".');
}
$curl_result = curl_exec($curl);
curl_close($curl);
if ($type == 'shares') {
$json = json_decode($curl_result, true);
return intval($json[0]['result']['metadata']['globalCounts']['count']);
}
elseif ($type == '+1s') {
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($curl_result);
$counter=$doc->getElementById('aggregateCount');
return $counter->nodeValue;
}
}
// Get Google+ count.
$gplus_count = get_gplus_count($url, $gplus_type);

Google does not currently have a public API for getting the +1 count for URLs. You can file a feature request here. You can also use the reverse engineered method mentioned by #DerVo. Keep in mind though that method could change and break at anytime.

I've assembled this code to read count directly from the iframe used by social button.
I haven't tested it on bulk scale, so maybe you've to slow down requests and/or change user agent :) .
This is my working code:
function get_plusone($url)
{
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://plusone.google.com/_/+1/fastbutton?
bsv&size=tall&hl=it&url=".urlencode($url));
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
$html = curl_exec ($curl);
curl_close ($curl);
$doc = new DOMDocument();
$doc->loadHTML($html);
$counter=$doc->getElementById('aggregateCount');
return $counter->nodeValue;
}
Usage is the following:
echo get_plusones('http://stackoverflow.com/');
Result is: 3166

I had to merge a few ideas from different options and urls to get it to work for me:
function getPlusOnes($url) {
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, "https://plusone.google.com/_/+1/fastbutton?url=".urlencode($url));
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_SSL_VERIFYPEER, false);
$html = curl_exec ($curl);
curl_close ($curl);
$doc = new DOMDocument();
$doc->loadHTML($html);
$counter=$doc->getElementById('aggregateCount');
return $counter->nodeValue;
}
All I had to do was update the url but I wanted to post a complete option for those interested.
echo getPlusOnes('http://stackoverflow.com/')
Thanks to Cardy for using this approach, then I just had to just get a url that worked for me...

I've released a PHP library retrieving count for major social networks. It currently supports Google, Facebook, Twitter and Pinterest.
Techniques used are similar to the one described here and the library provides a mechanism to cache retrieved data. This library also have some other nice features: installable through Composer, fully tested, HHVM support.
http://dunglas.fr/2014/01/introducing-the-socialshare-php-library/

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

After server change simple html dom doesn't work anymore - php

Just check the file URL on line 3. $html = file_get_html('URL'); Try to use absolute URL if you are using a relative one.

Related

how to skip InnerText empy in a If statment using php

Trouble writing results in a csv file

extract specific data from webpage using php

curl_exec returns empty string

How to get Google +1 count for current page in PHP?

Categories

Resources