I am writing a tool that accesses a set of external website pages. Here is my test code to see if I can retrieve the page:
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
/* curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); */
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('http://www.imdb.com/');
echo $returned_content;
The thing is when I pass Google (for example) as the URL, I get the Google homepage in my browser (sans images for obvious reasons), but when I pass the site I want to see, www.imdb.com, I get nothing. Why is this, and what can I do about it?
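One way to narrow this down is to look at what cURL itself reports instead of only echoing the body. The sketch below is a hypothetical variant of get_data (the name get_data_verbose and the keys of the returned array are assumptions, not part of the original code) that also returns the HTTP status code and any cURL error message. It also follows redirects, on the assumption that an empty body may come from a 3xx response; that is a guess about the cause, not a confirmed diagnosis.

```php
<?php
/* Hypothetical helper: like get_data(), but keeps the diagnostics. */
function get_data_verbose($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow 3xx redirects (assumed relevant here)
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // seconds to wait for the connection
    $data = curl_exec($ch);                          // false on failure
    $result = array(
        'body'   => ($data === false) ? null : $data,
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE), // 0 if no response was received
        'error'  => curl_error($ch),                 // empty string when there was no error
    );
    return $result; // the handle is released when $ch goes out of scope
}
```

Calling get_data_verbose('http://www.imdb.com/') and printing the 'status' and 'error' entries should show whether the request failed outright, was redirected, or was answered with an empty body.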
I am trying to scrape the dropdown data from this page: https://www.equibase.com/premium/eqbRaceChartCalendar.cfm?SAP=TN
I use curl to fetch the page so that I can build a DOM from the URL:
<?php
include('simple_html_dom.php');
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('https://www.equibase.com/premium/eqbRaceChartCalendar.cfm?SAP=TN');
echo $returned_content;
?>
When I fetch the content like that, it does not return the whole web page. The middle part of the page (the search area and the calendar area) is missing.
So I can't scrape the Track, Month, and Year dropdown data, because that part is not returned by curl. How can I get the whole web page, including the middle content, with curl?
I wrote the following code to get the HTML from a URL. It works for HTTPS sites like Facebook, but not for Instagram.
Instagram returns a blank page.
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
?>
Instagram returns a page that is mostly JavaScript, and your browser can't render it when it is served from your own host because the scripts use relative paths: <script src='/path/file.js'> will request localhost/path/file.js instead of instagram.com/path/file.js. Since localhost/path/file.js does not exist, the page comes out blank.
One solution is to get the full HTML instead of the JavaScript, and the "User-Agent" header can do this trick. Search-engine crawlers generally don't run JavaScript, so Instagram (and many other websites) serve bots a version of the page that works without it.
So, add this:
curl_setopt($ch, CURLOPT_USERAGENT, "ABACHOBot");
"ABACHOBot" is one crawler. On this page you can find many other alternatives, such as "Baiduspider", "BecomeBot"...
You can use a "generic" user agent too, such as "bot", "spider", or "crawler", and it will probably work as well.
Try this:
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
//Update.................
curl_setopt($ch, CURLOPT_USERAGENT, 'spider');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HEADER, false);
//....................................................
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
?>
You should pass
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
and the other options as shown above.
For more detail, please see
http://stackoverflow.com/questions/4372710/php-curl-https
I enter a phone number on http://www.dndstatus.com/dnd-check-process.php
and after submitting, it shows details such as DND status, network operator, and telecom circle.
I want to fetch these details from the resulting web page (http://www.dndstatus.com/dnd-check-process.php?num=9721395967) and store them in a JSON file.
How can I do that?
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$strDetails = get_data("http://www.dndstatus.com/dnd-check-process.php?num=9721395967");
Alternatively, you can use the file_get_contents function remotely, but many hosts don't allow this.
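As a rough illustration of that alternative, here is a minimal sketch, assuming the host controls remote access through PHP's allow_url_fopen setting. The function names url_fopen_enabled and get_data_fopen are hypothetical and only mirror the get_data function above.

```php
<?php
/* Returns true when PHP is configured to open remote URLs with file functions. */
function url_fopen_enabled() {
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

/* Hypothetical file_get_contents counterpart of get_data(). */
function get_data_fopen($url) {
    if (!url_fopen_enabled()) {
        return false; // the host has disabled remote file access
    }
    // Stream context with a 5 second timeout for HTTP(S) requests.
    $context = stream_context_create(array(
        'http' => array('timeout' => 5),
    ));
    return file_get_contents($url, false, $context); // false on failure
}
```

If url_fopen_enabled() returns false on your host, the cURL version above is the one to use.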
I built an application on top of the foursquare API. It works perfectly on localhost, but as soon as I upload it to a public website it stops working. I have checked that PHP is working by placing simple echo statements throughout my code and have had no issues. On my localhost I am able to echo information from the JSON that foursquare generates. When it is on the public server it echoes nothing.
$urlgen = "https://api.foursquare.com/v2/venues/search?near={$city}&query={$query}&client_id={$client_id}&client_secret={$client_secret}&v=20141015";
$resultFour = fetchData($urlgen);
echo "$resultFour";
This code returns JSON on localhost but not on the website.
Fetch Data:
function fetchData($url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
$result = curl_exec($ch);
curl_close($ch);
return $result;
}
Check whether your public server has curl installed (I assume that fetchData uses curl to connect to the server). If you are sure that it is installed, this code works for me (note the added CURLOPT_SSL_VERIFYPEER option):
private function fetchUrl($url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 20);
curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
$feedData = curl_exec($ch);
curl_close($ch);
return $feedData;
}
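To act on the first suggestion (checking whether curl is installed) and to see why a request returns nothing, something like the following sketch could help. The name fetchDataChecked is hypothetical; it keeps the options of the original fetchData and only adds the checks, using extension_loaded and curl_error.

```php
<?php
/* Hypothetical variant of fetchData() that reports failures instead of returning nothing. */
function fetchDataChecked($url) {
    // Fail early when the cURL extension is missing on this server.
    if (!extension_loaded('curl')) {
        throw new RuntimeException('The cURL extension is not loaded on this server.');
    }
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $result = curl_exec($ch);
    if ($result === false) {
        // curl_error() describes what went wrong, for example an SSL certificate problem.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $result;
}
```

Wrapping the call in try/catch and echoing the exception message on the public server should show whether the extension is missing or the request itself is failing.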
I have an Actiontec V1000H router. I want to access its "WAN Ethernet Status" page using a script (which will extract the sent and received packet counts for plotting). From a browser, this URL works fine:
http://192.168.1.1/modemstatus_wanethstatus.html
But, when I use that URL in my script, I nearly always get the main screen. (It works on rare occasions.) Here's my script:
$wanStatusUrl = "http://192.168.1.1/modemstatus_wanethstatus.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wanStatusUrl);
curl_setopt($ch, CURLOPT_USERPWD, 'admin:myPassword');
$output = curl_exec($ch);
curl_close($ch);
I need help accessing the modemstatus_wanethstatus.html page. I believe the issue is due to some idiosyncrasy of the modem.
Use this so that curl returns the HTML source as the response in $output:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
The main screen has a "login" button, and adding the equivalent of that prior to accessing the WAN Status screen made it work. So, for the record:
// login
$loginUrl = 'http://192.168.1.1/login.cgi?inputUserName=admin&inputPassword=myPassword&nothankyou=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_exec($ch);
curl_close($ch);
// get status page
$wanStatusUrl = "http://192.168.1.1/modemstatus_wanethstatus.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wanStatusUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // so curl_exec returns the response
$responseText = curl_exec($ch);
curl_close($ch);
// print $responseText; // contains wanEthStatus_ReceivedPackets and wanEthStatus_SendPackets
// get the two packet counts ... wanEthStatus_ReceivedPackets and wanEthStatus_SendPackets
preg_match( "/wanEthStatus_ReceivedPackets.*?\'(\d+)\';.*?\'(\d+)\';.*?wanEthStatus_TimeSpan/s", $responseText, $matches );
print_r( $matches );
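To see what the pattern captures without a router at hand, here is a small sketch that runs the same regular expression on a made-up excerpt. The contents of $sampleResponse and the helper name extract_packet_counts are hypothetical; the real status page markup may differ.

```php
<?php
// Hypothetical excerpt of the status page; the real markup may differ.
$sampleResponse = "var wanEthStatus_ReceivedPackets = '12345';\n"
                . "var wanEthStatus_SendPackets = '6789';\n"
                . "var wanEthStatus_TimeSpan = '10';\n";

/* Returns array(received, sent) as integers, or null when the pattern does not match. */
function extract_packet_counts($responseText) {
    // Same pattern as above: the first two quoted numbers after
    // wanEthStatus_ReceivedPackets and before wanEthStatus_TimeSpan.
    $pattern = "/wanEthStatus_ReceivedPackets.*?\'(\d+)\';.*?\'(\d+)\';.*?wanEthStatus_TimeSpan/s";
    if (preg_match($pattern, $responseText, $matches) !== 1) {
        return null;
    }
    return array((int) $matches[1], (int) $matches[2]);
}

print_r(extract_packet_counts($sampleResponse));
```

Returning null on a failed match makes it easy to notice when the router served the main screen instead of the status page.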
"Man Always Wins in the End."