simple_html_dom: trying to find height in google search - php

Can anyone explain what is wrong with this code and how I can get the height value? I am trying to get the height of celebrities. Any suggestions?
Thanks.
My code (updated with the cURL user agent setting, as advised):
$url='https://www.google.com/webhp?ie=UTF-8#q=ailee+height';
//Set CURL user agent
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
//simple html dom
require_once('lib/simple_html_dom.php');
$html = str_get_html($data);
$height= $html->find('div[class="_eF"]',0)->innertext;
echo $height;
The code above prints nothing. In this case, I expect it to return:
5' 5" (1.65 m)

The problem is that cURL doesn't execute JavaScript, and Google serves a different page when JavaScript is unavailable. In this case, the div becomes a span with a different class:
<span class="_m3b">1.65 m</span>
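Also, the link you were using didn't work for me. The part of a URL after # is a fragment that is never sent to the server, so the query in #q=ailee+height does not reach Google.
Try this instead: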
<?php
header('Content-Type: text/html; charset=utf-8');
$url='https://www.google.pt/search?q=ailee+height&num=10&gbv=1';
//Set CURL user agent
$ch = curl_init();
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
curl_close($ch);
require_once('simple_html_dom.php');
$html = str_get_html($data);
$height= $html->find('span[class="_m3b"]',0)->innertext;
echo $height;
//1.65 m
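Google's markup changes often, so the class name above may stop matching, and reading ->innertext from a missing node then fails. Here is a minimal sketch of the same lookup with a guard. It uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without the library; extract_span_text is a hypothetical helper and the HTML string is a made-up sample, not a real Google response.

```php
<?php
// Hypothetical helper: returns the text of the first <span> carrying the
// given class, or null when no such element exists.
// Assumes $class contains no quote characters.
function extract_span_text(string $html, string $class): ?string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world markup is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//span[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Made-up sample markup for illustration only.
$sample = '<html><body><span class="_m3b">1.65 m</span></body></html>';
$height = extract_span_text($sample, '_m3b');
echo $height ?? 'not found', "\n";
```

With the guard in place, a changed class name shows up as "not found" rather than as an error on a null object.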

Related

get_meta_tags http request failed 403 forbidden

When I do:
$tags = get_meta_tags('http://example.com');
I get the error "http request failed 403 forbidden", but when I open the site in a browser everything works and the status code is 200. Maybe I need to set the user agent? How can I do that?
You can do it with cURL. Here's an example:
$user_agent = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent); // use the same handle, $ch
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, 'https://example.com'); // the URL must be a quoted string
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
return $data;
Alternatively, you can set the user agent with ini_set() and keep using get_meta_tags():
ini_set('user_agent', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:7.0.1) Gecko/20100101 Firefox/7.0.1');
// Include the scheme; without it the argument is treated as a local file name.
$meta_tags = get_meta_tags('http://www.example.com');
It returns an associative array of the page's meta tags.
For more information, please refer to the PHP manual.
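One gap with the cURL approach is that it gives you the page as a string, while get_meta_tags() expects a file name or URL. Below is a rough sketch of reading the meta tags from an HTML string with the built-in DOMDocument. meta_tags_from_string is a hypothetical helper, the sample markup is made up, and the result only approximates what get_meta_tags() would return.

```php
<?php
// Hypothetical helper: collects <meta name="..." content="..."> pairs from an
// HTML string, keyed by the lowercased name attribute.
function meta_tags_from_string(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $tags = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        // Skip tags without a name, such as <meta charset="...">.
        if ($name !== '') {
            $tags[$name] = $meta->getAttribute('content');
        }
    }
    return $tags;
}

// Made-up sample markup; in practice this would be the $data string from cURL.
$sample = '<html><head><meta name="Description" content="Demo page">'
        . '<meta name="keywords" content="php, curl"></head><body></body></html>';
print_r(meta_tags_from_string($sample));
```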

angular.callbacks to string PHP?

I have this data feed from an Angular site that I want to convert to a string and store in a SQL database table:
$URL = 'http://lscluster.hockeytech.com/feed/index.php?feed=statviewfeed&view=players&season=60&team=all&position=skaters&rookies=0&statsType=expanded&rosterstatus=undefined&site_id=1&first=0&limit=1185&sort=points&league_id=4&lang=en&division=-1&key=50c2cd9b5e18e390&client_code=ahl&league_id=4&callback=angular.callbacks._q';
Normally I can parse everything with the code below, but for this URL it returns zero:
$curl_handle=curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $URL);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($curl_handle, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.80 Safari/537.36');
$querymain = curl_exec($curl_handle);
curl_close($curl_handle);
$arr = json_decode((string)$querymain, true);
echo json_last_error_msg(); // call right after json_decode to see why decoding failed
$numarr = count($arr['data']);
Thanks for any tips.
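A possible explanation, going only by the callback=angular.callbacks._q parameter in the URL: the feed may answer with JSONP, meaning the JSON is wrapped in a function call, and json_decode() cannot parse that. Here is a minimal sketch of stripping such a wrapper before decoding. decode_jsonp is a hypothetical helper and the sample response is made up, not real data from the feed.

```php
<?php
// Hypothetical helper: removes a JSONP wrapper such as `name(...)` or
// `name(...);` and decodes the JSON payload inside.
// Returns null when the payload does not decode to an array.
function decode_jsonp(string $body): ?array
{
    $body = trim($body);
    // Capture everything between the first "(" and the last ")".
    if (preg_match('/^[\w.$]+\s*\((.*)\)\s*;?$/s', $body, $m)) {
        $body = $m[1];
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}

// Made-up sample response for illustration only.
$sample = 'angular.callbacks._q({"data":[{"name":"A"},{"name":"B"}]});';
$arr = decode_jsonp($sample);
echo count($arr['data']), "\n"; // prints 2
```

Plain JSON without a wrapper passes through unchanged, so the helper can be used on either kind of response.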

A website URL is not loading with Curl php

I am using PHP cURL to fetch data from a remote site. My script is:
<?php
$url = 'https://www.(url).com/';
$sleep = rand(10, 12);
sleep($sleep);
$agent= 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.110 Safari/537.36';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8','accept-encoding:gzip, deflate, sdch','accept:image/webp,image/*,*/*;q=0.8'));
curl_setopt($ch, CURLOPT_PROXY, "x.x.x.x:x");
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_URL,$url);
$result=curl_exec($ch);
$mainPage = new simple_html_dom;
echo $mainPage->load($result);
But the response is a 403 Forbidden error.
I also tried more complete user agent strings, but I still get this error.
Thanks in advance for any suggestions and comments.

php curl to read data from webpage

I have been given a project that involves fetching data from this URL.
Simple HTML DOM has already failed for this, so I am now trying cURL:
function curl_download($Url){
if (!function_exists('curl_init')){
die('Sorry cURL is not installed!');
}
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $Url);
curl_setopt($ch, CURLOPT_REFERER, "www.idealo.de/preisvergleich/MainSearchProductCategory.html?q=0018208925063");
curl_setopt($ch, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 10);
$output = curl_exec($ch);
curl_close($ch);
return $output;
}
print curl_download('www.idealo.de/preisvergleich/MainSearchProductCategory.html?q=0018208925063');
This code returns a blank page. Can anyone please help me?
The likely reason is that the user agent string you used does not look like a real browser's.
Try the one below:
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.130 Safari/537.38");
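A blank page can also mean that the request itself failed, so it helps to check the cURL error and the HTTP status before blaming the user agent. Here is a minimal diagnostic sketch, assuming the cURL extension is loaded. fetch_page is a hypothetical helper and the timeout values are arbitrary choices.

```php
<?php
// Hypothetical helper: fetches a URL and reports the body, the HTTP status,
// and any cURL error message, so an empty result can be diagnosed.
function fetch_page(string $url, string $userAgent): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_ENCODING       => '', // accept any encoding cURL supports
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $body = curl_exec($ch);
    // curl_exec() returns false on failure; map that to null for clarity.
    return [
        'body'   => $body === false ? null : $body,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error'  => curl_error($ch),
    ];
}

// Example usage (example.com is a placeholder URL):
// $r = fetch_page('https://example.com/', 'Mozilla/5.0 (Windows NT 10.0; WOW64)');
// if ($r['body'] === null) {
//     echo 'cURL error: ', $r['error'], "\n";
// } elseif ($r['status'] !== 200) {
//     echo 'HTTP status: ', $r['status'], "\n";
// }
```

A non-empty error points to a transport problem such as DNS, timeout, or TLS, while a 403 status with no error suggests the server is rejecting the request itself.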

Using curl I am not getting whole html while scraping, why?

For example, I tried to scrape the meta tags for yebhi.com, and for some pages the result comes back as null.
I'm using the following code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.yebhi.com/253196/PD/Tech-Graphic-Tee-81703611.htm');
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4');
$data = curl_exec($ch);
curl_close($ch);
I am not getting the complete HTML. What am I doing wrong?
