I am trying to scrape dropdown data from this page: https://www.equibase.com/premium/eqbRaceChartCalendar.cfm?SAP=TN
I use cURL to fetch the page and build a DOM from the URL:
<?php
include('simple_html_dom.php');
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('https://www.equibase.com/premium/eqbRaceChartCalendar.cfm?SAP=TN');
echo $returned_content;
?>
When I fetch the content this way, it does not return the whole web page. The middle part of the page (the search area and the calendar area) is missing.
So I can't scrape the Track, Month, and Year dropdown data, because that part is not returned by cURL. How can I get the whole web page, including the middle content, with cURL?
Related
I would like to display my TikTok profile page on my website by fetching it with cURL in PHP. I tried it with one of my own sites and it displays well, but when I enter the link to my TikTok profile, it displays a blank page.
Here is my PHP code:
<?php
$url = "https://www.tiktok.com/#user_me";
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_URL, $url);
$result = curl_exec($ch);
curl_close($ch);
echo $result;
?>
I am writing a tool that accesses a set of external website pages. Here is my test code to see if I can retrieve the page:
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
/* curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); */
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$returned_content = get_data('http://www.imdb.com/');
echo $returned_content;
The thing is when I pass Google (for example) as the URL, I get the Google homepage in my browser (sans images for obvious reasons), but when I pass the site I want to see, www.imdb.com, I get nothing. Why is this, and what can I do about it?
I wrote the following code to get HTML data from a URL. It works for HTTPS sites like Facebook, but not for Instagram.
Instagram returns a blank page.
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
?>
Instagram returns only JavaScript, which your browser can't load because the page uses relative paths: <script src='/path/file.js'> will try to load localhost/path/file.js instead of instagram.com/path/file.js. Since localhost/path/file.js does not exist, the page is blank.
One solution is to get the full HTML instead of the JavaScript, and the "User-Agent" header can do this trick. Search-engine crawlers generally don't execute JS, so Instagram (and many other websites) serves bots a version of the page that doesn't depend on JS.
So, add this:
curl_setopt($ch, CURLOPT_USERAGENT, "ABACHOBot");
"ABACHOBot" is the name of a crawler. On this page you can find many other alternatives, such as "Baiduspider" and "BecomeBot".
You can also use a "generic" user agent, such as "bot", "spider", or "crawler", and it will probably work too.
Try this:
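The relative-path problem described above can also be illustrated, and sometimes worked around, by telling the browser where relative URLs should resolve. The sketch below is not part of the original answer: add_base_href is a hypothetical helper that inserts a <base> element after the opening <head> tag of the fetched HTML. Whether the page then renders still depends on the site; scripts may refuse to run from another origin.

```php
<?php
/**
 * Insert a <base href="..."> element right after the opening <head> tag so
 * that relative URLs in the fetched HTML resolve against the original site.
 * Hypothetical helper, named here for illustration only.
 */
function add_base_href(string $html, string $baseUrl): string
{
    $baseTag = '<base href="' . htmlspecialchars($baseUrl, ENT_QUOTES) . '">';

    // Append the <base> element to the first opening <head> tag
    // (with or without attributes). <header> is not matched.
    $result = preg_replace_callback(
        '/<head(\s[^>]*)?>/i',
        function (array $m) use ($baseTag) {
            return $m[0] . $baseTag;
        },
        $html,
        1,
        $count
    );

    // If there is no <head> tag, fall back to prepending the <base> element.
    return $count > 0 ? $result : $baseTag . $html;
}

// Usage with the get_data() function from the question (not run here):
// echo add_base_href(get_data('https://www.instagram.com'), 'https://www.instagram.com/');
```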
<?php
$url = 'https://www.instagram.com';
$returned_content = get_data($url);
print_r($returned_content);
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
//Update.................
curl_setopt($ch, CURLOPT_USERAGENT, 'spider');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_HEADER, false);
//....................................................
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
?>
You should set
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
along with the other options shown above.
For more details, please see
http://stackoverflow.com/questions/4372710/php-curl-https
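When curl_exec() appears to return nothing, it can help to find out why before changing options such as CURLOPT_SSL_VERIFYPEER. The following is a minimal sketch, assuming the same options as the get_data() function above; describe_curl_result and get_data_with_diagnostics are hypothetical names, and the message wording is made up for illustration. It reports the cURL error or the HTTP status instead of silently printing a blank page.

```php
<?php
/**
 * Turn the outcome of a cURL request into a short human-readable message.
 * Hypothetical helper: the name and the message wording are assumptions.
 */
function describe_curl_result($body, int $errno, string $error, int $httpCode): string
{
    if ($body === false) {
        // curl_exec() returns false on failure (e.g. an SSL or DNS problem).
        return "cURL error $errno: $error";
    }
    if ($httpCode >= 400) {
        return "HTTP error $httpCode";
    }
    if ($body === '') {
        return "Empty body (HTTP $httpCode)";
    }
    return "OK (HTTP $httpCode, " . strlen($body) . " bytes)";
}

/* Fetches a URL and returns [body, diagnostic message]. */
function get_data_with_diagnostics(string $url): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    $body = curl_exec($ch);
    // Read the error details and status code before closing the handle.
    $message = describe_curl_result(
        $body,
        curl_errno($ch),
        curl_error($ch),
        (int) curl_getinfo($ch, CURLINFO_HTTP_CODE)
    );
    curl_close($ch);
    return [$body, $message];
}

// Usage (not run here):
// [$body, $message] = get_data_with_diagnostics('https://www.instagram.com');
// echo $message;
```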
I want to scrape particular data from a website using PHP without any external tools. I have tried this code, but it is not sufficient:
<?php
$url = 'http://www.google.com';
$output = file_get_contents($url);
echo $output;
?>
You can use cURL in PHP:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$data = curl_exec($ch);
curl_close($ch);
?>
where $data contains the response headers followed by the HTML of the given URL (set CURLOPT_HEADER to 0 to get only the HTML).
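With CURLOPT_HEADER set to 1, the string returned by curl_exec() starts with the response headers, so the HTML has to be separated out before parsing. A minimal sketch of one way to do that, using the header size that cURL reports; split_response is a hypothetical helper name, not something from the answer above:

```php
<?php
/**
 * Split a raw cURL response (fetched with CURLOPT_HEADER enabled) into
 * its header block and its body. $headerSize is the value reported by
 * curl_getinfo($ch, CURLINFO_HEADER_SIZE).
 * Hypothetical helper, named here for illustration only.
 */
function split_response(string $raw, int $headerSize): array
{
    return [
        'headers' => substr($raw, 0, $headerSize),
        'body'    => substr($raw, $headerSize),
    ];
}

// Usage with a cURL handle (not run here); call curl_getinfo() before curl_close():
// $raw   = curl_exec($ch);
// $size  = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
// $parts = split_response($raw, $size);
// echo $parts['body'];
```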
I enter a phone number on http://www.dndstatus.com/dnd-check-process.php
and after submitting, it shows details such as DND status, network operator, and telecom circle.
I want to fetch these details from the resulting web page (http://www.dndstatus.com/dnd-check-process.php?num=9721395967) and store them in a JSON file.
How can I do that?
/* gets the data from a URL */
function get_data($url) {
$ch = curl_init();
$timeout = 5;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
$strDetails = get_data("http://www.dndstatus.com/dnd-check-process.php?num=9721395967");
Alternatively, you can use the file_get_contents function with a remote URL, but many hosts don't allow this (it requires allow_url_fopen to be enabled).
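Once the HTML is in $strDetails, the remaining steps are to pull out the values and write them as JSON. The sketch below is only an illustration: it ASSUMES the details are rendered as two-cell table rows (<tr><td>Label</td><td>Value</td></tr>), which may not match the real page, and extract_details and details.json are hypothetical names. It uses PHP's built-in DOMDocument and json_encode.

```php
<?php
/**
 * Collect label/value pairs from two-cell table rows in an HTML string.
 * ASSUMPTION: the details are rendered as <tr><td>Label</td><td>Value</td></tr>;
 * the real page may use different markup.
 */
function extract_details(string $html): array
{
    if (trim($html) === '') {
        return []; // Nothing to parse (e.g. the request failed).
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them,
    // since real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $details = [];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = $row->getElementsByTagName('td');
        if ($cells->length === 2) {
            $label = trim($cells->item(0)->textContent);
            $details[$label] = trim($cells->item(1)->textContent);
        }
    }
    return $details;
}

// Usage with the get_data() function above (not run here):
// $strDetails = get_data("http://www.dndstatus.com/dnd-check-process.php?num=9721395967");
// file_put_contents('details.json', json_encode(extract_details($strDetails), JSON_PRETTY_PRINT));
```

If the page uses different markup, only the loop inside extract_details needs to change; the fetch and the json_encode step stay the same.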