Retrieving a 3rd-party webpage with CURL/PHP - not entirely working

I am writing a tool that accesses a set of external website pages. Here is my test code to see if I can retrieve the page:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    /* curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); */
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$returned_content = get_data('http://www.imdb.com/');
echo $returned_content;
When I pass Google (for example) as the URL, I get the Google homepage in my browser (sans images, for obvious reasons), but when I pass the site I actually want, www.imdb.com, I get nothing. Why is this, and what can I do about it?

Related

scrape data from dropdown php

I am trying to scrape dropdown data from https://www.equibase.com/premium/eqbRaceChartCalendar.cfm?SAP=TN
I use curl to fetch the page so that I can build a DOM from it:
<?php
include('simple_html_dom.php');
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$returned_content = get_data('https://www.equibase.com/premium/eqbRaceChartCalendar.cfm?SAP=TN');
echo $returned_content;
?>
When I fetch the content this way, it does not return the whole webpage: the middle part of the page (the search area and the calendar area) is missing.
So I can't scrape the Track, Month, and Year dropdown data, because that part is not returned by curl. How can I get the whole page, including the middle content, with curl?

Why does Instagram return a blank page to a CURL request?

I wrote the following code to get HTML data from a URL. It works for HTTPS sites like Facebook, but not for Instagram.
Instagram returns a blank page:
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
?>

Instagram returns a page that is mostly JavaScript, and your browser cannot render it because the page uses relative paths: <script src='/path/file.js'> will request localhost/path/file.js instead of instagram.com/path/file.js. That file does not exist on localhost, so the page stays blank.
One solution is to get the full HTML instead of the JavaScript, and the "User-Agent" header can do this trick. As you may know, search engines generally do not execute JS, so Instagram (like many websites) serves bots a version of the page that works without JS.
So, add this:
curl_setopt($ch, CURLOPT_USERAGENT, "ABACHOBot");
"ABACHOBot" is one crawler. There are many other alternatives, like "Baiduspider" and "BecomeBot".
You can use a "generic" user agent too, like "bot", "spider", or "crawler", and it will probably work as well.
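To illustrate the relative-path problem described above, here is a minimal sketch of a different workaround: inserting a <base> tag into the fetched HTML so that relative URLs resolve against the original site rather than localhost. The helper name add_base_href is hypothetical and not part of the answer, and this only changes where the browser looks for the scripts; it does not guarantee that the page will then render.

```php
<?php
/**
 * Insert a <base href="..."> tag right after the opening <head> tag so that
 * relative URLs in the fetched HTML resolve against the original site.
 * Hypothetical helper, shown only as an illustration.
 */
function add_base_href(string $html, string $baseUrl): string
{
    $tag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Append the tag to the first opening <head ...> tag, case-insensitively.
    $result = preg_replace_callback(
        '/<head\b[^>]*>/i',
        function (array $m) use ($tag) {
            return $m[0] . $tag;
        },
        $html,
        1,
        $count
    );

    // If there is no <head>, prepend the tag to the document instead.
    return $count > 0 ? $result : $tag . $html;
}

// Small stand-in document with a relative script path, as in the explanation above.
$html = '<html><head><title>t</title></head><body>'
      . '<script src="/path/file.js"></script></body></html>';
echo add_base_href($html, 'https://www.instagram.com/'), "\n";
```

With the tag in place, /path/file.js is requested from instagram.com rather than from localhost.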

Here, try this one:
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    //Update.................
    curl_setopt($ch, CURLOPT_USERAGENT, 'spider');
    // Note: this disables TLS certificate verification.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    curl_setopt($ch, CURLOPT_HEADER, false);
    //....................................................
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
?>
You should pass
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
and the other options as above.
For more detail, please see
http://stackoverflow.com/questions/4372710/php-curl-https

Fetch some data from a webpage and store it in a JSON file using PHP or Node.js

I enter a phone number on http://www.dndstatus.com/dnd-check-process.php
and, after submitting, it shows details such as DND status, network operator, and telecom circle.
I want to fetch these details from the resulting webpage (http://www.dndstatus.com/dnd-check-process.php?num=9721395967) and store them in a JSON file.
How can I do that?

/* gets the data from a URL */
function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$strDetails = get_data("http://www.dndstatus.com/dnd-check-process.php?num=9721395967");
Alternatively, you can use the file_get_contents function remotely, but many hosts don't allow this.
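The code above only fetches the page; the question also asks about storing the details in a JSON file. A minimal sketch of that last step follows. The function name save_details_as_json, the array keys, and the output path are all assumptions for illustration, and the values are placeholders, because how to parse them out of $strDetails depends on the page's markup, which is not shown here.

```php
<?php
/**
 * Save an associative array of details as pretty-printed JSON.
 * Returns true on success, false if encoding or writing fails.
 * Hypothetical helper; the name and signature are assumptions.
 */
function save_details_as_json(array $details, string $path): bool
{
    $json = json_encode($details, JSON_PRETTY_PRINT | JSON_UNESCAPED_UNICODE);
    if ($json === false) {
        return false;
    }
    return file_put_contents($path, $json) !== false;
}

// Placeholder values only: the real ones would be parsed out of $strDetails.
$details = [
    'number'   => '9721395967',
    'status'   => null,
    'operator' => null,
    'circle'   => null,
];

// Assumed output location; any writable path would do.
$path = sys_get_temp_dir() . '/dnd_details.json';
var_dump(save_details_as_json($details, $path));
```

Reading the file back with json_decode(file_get_contents($path), true) should give the same array.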

Foursquare API only works locally

I built an application on top of the Foursquare API. It works perfectly on localhost, but as soon as I upload it to a public website it stops working. I have checked that PHP is working by placing simple echoes throughout my code and have had no issues. On my localhost I am able to echo information from the JSON that Foursquare generates. On the public server it echoes nothing.
$urlgen = "https://api.foursquare.com/v2/venues/search?near={$city}&query={$query}&client_id={$client_id}&client_secret={$client_secret}&v=20141015";
$resultFour = fetchData($urlgen);
echo "$resultFour";
This code returns JSON on localhost but not on the website.
Fetch Data:
function fetchData($url){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    $result = curl_exec($ch);
    curl_close($ch);
    return $result;
}

Check whether your public server has curl installed (I assume that fetchData uses curl to connect to the server). If you are sure that it is installed, try this code, which works for me:
// This is a class method; drop "private" to use it as a standalone function.
private function fetchUrl($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);
    // Note: this disables TLS certificate verification.
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
    $feedData = curl_exec($ch);
    curl_close($ch);
    return $feedData;
}
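Since the public server echoes nothing, it may help to see why the request fails rather than guess. The sketch below checks that the curl extension is loaded and reports curl's own error message when curl_exec returns false. The function fetch_with_diagnostics is a hypothetical debugging helper, not the original fetchData, and the choice to treat HTTP status 400 and above as an error is an assumption.

```php
<?php
/**
 * Fetch a URL and return [body, error]; exactly one of the two is null.
 * Hypothetical debugging helper, not the original fetchData().
 */
function fetch_with_diagnostics(string $url): array
{
    if (!extension_loaded('curl')) {
        return [null, 'The curl extension is not loaded on this server.'];
    }

    // The handle is released when $ch goes out of scope.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 20);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_error() describes transport failures such as DNS or TLS problems.
        return [null, 'curl error ' . curl_errno($ch) . ': ' . curl_error($ch)];
    }

    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        return [null, 'HTTP status ' . $status];
    }

    return [$body, null];
}

// Usage, with $urlgen built as in the question:
// [$body, $error] = fetch_with_diagnostics($urlgen);
// echo $error === null ? $body : "Request failed: $error";
```

If the error message mentions certificates, pointing curl at an up-to-date CA bundle is usually a safer fix than disabling verification.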

Accessing Actiontec modem screens via PHP

I have an Actiontec V1000H router. I want to access its "WAN Ethernet Status" page using a script (which will extract the sent and received packet counts for plotting). From a browser, this URL works fine:
http://192.168.1.1/modemstatus_wanethstatus.html
But, when I use that URL in my script, I nearly always get the main screen. (It works on rare occasions.) Here's my script:
$wanStatusUrl = "http://192.168.1.1/modemstatus_wanethstatus.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wanStatusUrl);
curl_setopt($ch, CURLOPT_USERPWD, 'admin:myPassword');
$output = curl_exec($ch);
curl_close($ch);
I need help accessing the modemstatus_wanethstatus.html page. I believe the issue is due to some idiosyncrasy of the modem.

Use this so that curl returns the HTML source as the response in $output:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);

The main screen has a "login" button, and adding the equivalent of that prior to accessing the WAN Status screen made it work. So, for the record:
// login
$loginUrl = 'http://192.168.1.1/login.cgi?inputUserName=admin&inputPassword=myPassword&nothankyou=1';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $loginUrl);
curl_exec($ch);
curl_close($ch);
// get status page
$wanStatusUrl = "http://192.168.1.1/modemstatus_wanethstatus.html";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $wanStatusUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // so curl_exec returns the response
$responseText = curl_exec($ch);
curl_close($ch);
// print $responseText; // contains wanEthStatus_ReceivedPackets and wanEthStatus_SendPackets
// get the two packet counts ... wanEthStatus_ReceivedPackets and wanEthStatus_SendPackets
preg_match( "/wanEthStatus_ReceivedPackets.*?\'(\d+)\';.*?\'(\d+)\';.*?wanEthStatus_TimeSpan/s", $responseText, $matches );
print_r( $matches );
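To show how the pattern above picks out the two counts, here is a small sketch that wraps it in a function and runs it on a sample string. The function name extract_packet_counts is hypothetical, and the sample text is made up purely so that it satisfies the pattern; the modem's real page may look different.

```php
<?php
/**
 * Extract the received and sent packet counts from the status page text.
 * Returns [received, sent] as integers, or null if the pattern does not match.
 * Hypothetical wrapper around the pattern used in the answer.
 */
function extract_packet_counts(string $responseText): ?array
{
    // Same pattern as above: two quoted numbers between the two marker names.
    $pattern = "/wanEthStatus_ReceivedPackets.*?'(\\d+)';.*?'(\\d+)';.*?wanEthStatus_TimeSpan/s";
    if (preg_match($pattern, $responseText, $matches) !== 1) {
        return null;
    }
    return [(int) $matches[1], (int) $matches[2]];
}

// Made-up sample, only shaped to satisfy the pattern; not real modem output.
$sample = "var wanEthStatus_ReceivedPackets = '1200';\n"
        . "var wanEthStatus_SendPackets = '850';\n"
        . "var wanEthStatus_TimeSpan = '60';";
print_r(extract_packet_counts($sample));
```

Returning null on a failed match makes it easy to detect the case where the modem served the main screen instead of the status page.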
