I am trying to use curl to get a response from a web page, but I get a different response with curl than when browsing normally.
PHP file
$ch = curl_init();
echo "trying<br>";
//$url = "home.iitk.ac.in/~gopi/student_search/feedback.php";
$roll_no = "11101";
$name="";
$program="all";
$department="all";
$email="";
$gender="both";
$city="";
$course="";
$order="id";
$hostel="";
$bg='';
$tile = '0';
$offset = 0;
$url = "http://search.junta.iitk.ac.in/get2.php?&tile=0&roll_no=".$roll_no."&name=".$name."
&program=".$program."&dept=".$department."&login=".$email."&gender=".$gender."
&city=".$city."&course=".$course."&hostel=".$hostel."&bg=".$bg."&offset=".$offset;
echo $url;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
$ret_val = curl_error($ch);
echo $result;
echo $ret_val;
curl_close($ch);
Running this gives me a different page. But when I go to the same URL directly in a browser, it gives me 44 results (and more). How do I get the same result using curl?
Edit
I also tried the following, but it doesn't work either:
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, 'http://search.junta.iitk.ac.in');
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$result = curl_exec($ch);
Looks like the targeted server checks the user agent and, if it's not a real browser, throws the request away (or generally behaves differently).
Try specifying the user agent - http://curl.haxx.se/docs/manpage.html
For example:
curl -A "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5" http://www.apple.com
In PHP:
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
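For completeness, here is a sketch of the request from the question with the user agent set and the query string built by http_build_query, so no stray whitespace or newlines end up in the URL (the parameter names are taken from the question; anything beyond that is an assumption about the endpoint):
$params = array(
    'tile'    => 0,
    'roll_no' => '11101',
    'name'    => '',
    'program' => 'all',
    'dept'    => 'all',
    'login'   => '',
    'gender'  => 'both',
    'city'    => '',
    'course'  => '',
    'hostel'  => '',
    'bg'      => '',
    'offset'  => 0,
);
$url = 'http://search.junta.iitk.ac.in/get2.php?' . http_build_query($params);

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$result = curl_exec($ch);

if ($result === false) {
    echo 'cURL error: ' . curl_error($ch); // surface the transport error instead of printing nothing
} else {
    echo $result;
}
curl_close($ch);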
Related
So I found a solution to a problem I was having (how to determine a domain's protocol): How to find the domain is whether HTTP or HTTPS (with or without WWW) using PHP?
Below are two versions of my code.
The first doesn't work as expected; it only echoes out my domains.
<?php
$url_list = file('urls.txt');
foreach($url_list as $url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_exec($ch);
$real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
echo $real_url;
}
?>
The second version of my code gives me an error about the argument supplied to my foreach statement...
<?php
$fn = fopen("urls.txt","r");
while(! feof($fn)) {
$url_list = fgets($fn);
foreach($url_list as $url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_exec($ch);
$real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
echo $real_url;
#echo $result;
fclose($fn);
}
}
?>
What could I be doing wrong?
The expected result is what the code below produces, except that I want the domains read from a file.
Code reference: How to find the domain is whether HTTP or HTTPS (with or without WWW) using PHP?
<?php
$url_list = ['facebook.com','google.com'];
foreach($url_list as $url){
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_exec($ch);
$real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
echo $real_url;//add here your db commands
}
?>
Output:
test#linux: php domain-fuzzer.php
https://www.facebook.com/http://www.google.com/#
I found the solution; just in case anyone runs into the same issue, here is the code:
<?php
$fn = file_get_contents("urls.txt");
$url_list = explode(PHP_EOL, $fn);
foreach($url_list as $url)
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_exec($ch);
$real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
echo nl2br("$real_url \n");
}
?>
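As a side note, the same loop can read the file with file() and the built-in FILE_IGNORE_NEW_LINES and FILE_SKIP_EMPTY_LINES flags, so trailing newlines and blank lines never reach cURL; a sketch:
<?php
// Read urls.txt into an array, dropping line endings and empty lines.
$url_list = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($url_list as $url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, trim($url));
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);
    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    curl_close($ch);
    echo nl2br("$real_url \n");
}
?>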
By default, fgets returns a single line from the file, not an array of all the lines, so there is nothing for foreach to iterate over. Your code should be:
<?php
$fn = fopen("urls.txt", "r");
while (!feof($fn)) {
    $url = fgets($fn);
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
    curl_exec($ch);
    $real_url = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    echo $real_url;
    #echo $result;
    curl_close($ch);
}
fclose($fn); // close the file handle once, after the loop
?>
I cannot download this link using curl in PHP:
https://www.economy.gov.ae/PublicationsArabic/2%20%D9%86%D8%B4%D8%B1%D8%A9%20%D8%A7%D9%84%D8%B9%D9%84%D8%A7%D9%85%D8%A7%D8%AA%20%D8%A7%D9%84%D8%AA%D8%AC%D8%A7%D8%B1%D9%8A%D8%A9%20%D8%A7%D9%84%D8%B9%D8%AF%D8%AF%20199-%20%D8%A7%D9%84%D9%86%D8%B4%D8%B1%20%D8%B9%D9%86%20%D8%A7%D9%84%D8%B9%D9%84%D8%A7%D9%85%D8%A7%D8%AA%20%D8%A7%D9%84%D9%85%D9%82%D8%A8%D9%88%D9%84%D8%A9.pdf
I tried basic curl and it didn't work; wget doesn't work either.
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_ENCODING, "utf-8");
echo $output=curl_exec($ch);
curl_close($ch);
I get an empty PDF, or a 189-byte file.
It turns out the website issues a POST request before that GET request, and you need to replicate it first because the app has its own logic; after doing that, it works correctly.
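A rough sketch of that idea, reusing one cookie jar for the POST and the subsequent GET; the POST endpoint and fields below are placeholders only, since the exact request has to be copied from the site itself (for example from the browser's network tab):
$cookie = __DIR__ . '/cookie.txt';
$pdfUrl = $url; // the full PDF link from the question

$ch = curl_init();
// 1) Replay the POST the site expects first (placeholder endpoint and fields).
curl_setopt($ch, CURLOPT_URL, 'https://www.economy.gov.ae/some-endpoint'); // placeholder, not the real endpoint
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query(array('field' => 'value'))); // placeholder fields
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_exec($ch);

// 2) Fetch the PDF with the same handle and cookie jar.
curl_setopt($ch, CURLOPT_URL, $pdfUrl);
curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back to GET
$pdf = curl_exec($ch);
curl_close($ch);
file_put_contents('download.pdf', $pdf);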
Try urldecode before using curl:
$ch = curl_init();
$url = urldecode($url);
curl_setopt($ch,CURLOPT_URL,$url);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($ch, CURLOPT_ENCODING, "utf-8");
echo $output=curl_exec($ch);
curl_close($ch);
I am trying to recover the content of a page with PHP cURL. It works well on other websites, but on this website it does not work and I don't know why.
Here is my code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://ratings.fide.com/top.phtml');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($curl);
curl_close($curl);
echo $result;
This is my journal page in Google Scholar:
https://scholar.google.com/citations?user=F4z6guYAAAAJ
I can view the page in a browser, but I cannot get its contents with PHP (cURL or file_get_contents). I tried many headers, but it was not helpful.
Update: my code is here:
$fgc_context = stream_context_create(array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept: text/html,application/xhtml+xml,application/xml\r\n" .
"Accept-Charset: ISO-8859-1,utf-8\r\n" .
"Accept-Encoding: gzip,deflate,sdch\r\n" .
"Accept-Language: en-US,en;q=0.8\r\n",
"timeout" => 60,
'user_agent'=>"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9"
)
));
ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9');
$wcnt = #file_get_contents($the_journal_url, false, $fgc_context);
And Google returns a page that ends with:
<H1>Server Error</H1> We're sorry but it appears that there has been an internal server error while processing your request. Our engineers have been notified and are working to resolve the issue.<p>Please try again later.</p>
Try this code (run it twice; the first run creates the cookie file):
$cookie = __DIR__ . '/cookie.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_URL, 'https://scholar.google.com/citations?user=F4z6guYAAAAJ');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
echo $data;
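If you would rather not run the script twice by hand, a small sketch of the same idea performs the request twice in a single run, where the first pass only fills the cookie jar and the second pass reuses those cookies:
$cookie = __DIR__ . '/cookie.txt';
$url = 'https://scholar.google.com/citations?user=F4z6guYAAAAJ';
$ua  = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0';

foreach (array(1, 2) as $pass) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_USERAGENT, $ua);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);   // cookies are written here after each request
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);  // and read from here on the next one
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    $data = curl_exec($ch);
    curl_close($ch);
}
echo $data; // output of the second pass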
The Yellow Pages API works great, even when I just type the URL into the browser. But now in my PHP script, when I build the URL, it is not working.
Yellow Pages API:
http://api2.yp.com/listings/v1/search?searchloc=91203&term=pizza&format=json&sort=distance&radius=5&listingcount=10&key=xxxxxxxxxx
Here is my code snippet
$apiURL = 'http://api2.yp.com/listings/v1/search?searchloc=91203&term=pizza&format=json&sort=distance&radius=5&listingcount=10&key=xxxx';
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_URL,$apiURL);
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$data = curl_exec($ch);
curl_close($ch);
var_dump($data);
Thanks in advance
Add this cURL param.
curl_setopt($ch, CURLOPT_FOLLOWLOCATION,1);
That is because you are getting a 301 redirect from the URL, so you need to add it. [Personally tested]
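To confirm the redirect yourself, you can inspect the status code and the final URL with curl_getinfo. A minimal sketch (the API key is a placeholder):
$apiURL = 'http://api2.yp.com/listings/v1/search?searchloc=91203&term=pizza&format=json&key=xxxx'; // key is a placeholder
$ch = curl_init($apiURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$data = curl_exec($ch);
echo curl_getinfo($ch, CURLINFO_HTTP_CODE) . "\n";      // final status after following redirects
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL) . "\n";  // URL that was actually fetched
curl_close($ch);
var_dump($data);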