How to have multiple CURLOPT_USERAGENT in one query? - php

I have the following code, which currently uses only one user agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8.
My question is: how can I use multiple user agents in one call? It should switch to another user agent if the current one does not get through.
function file_get_contents_curl($url) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8');
    $html = curl_exec($ch);
    curl_close($ch);
    return $html;
}
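A single request carries only one User-Agent header, so one possible reading of "multiple user agents" is to retry the request with the next agent when the current one fails. The sketch below assumes that "does not pass" means cURL returned false or the HTTP status was not 2xx. The function name fetch_with_user_agents and the optional $fetch parameter are hypothetical; $fetch is there only so the loop can be exercised without a network connection.

```php
<?php
// Hypothetical helper: try each user agent in turn until one request succeeds.
function fetch_with_user_agents(string $url, array $userAgents, ?callable $fetch = null)
{
    // Default fetcher performs a real cURL request and returns [body, status].
    $fetch = $fetch ?? function (string $url, string $userAgent): array {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
        $body = curl_exec($ch);
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        curl_close($ch);
        return [$body, $status];
    };

    foreach ($userAgents as $userAgent) {
        [$body, $status] = $fetch($url, $userAgent);
        // Assumption: success means a body came back with a 2xx status.
        if ($body !== false && $status >= 200 && $status < 300) {
            return $body;
        }
    }
    // Every user agent was rejected.
    return false;
}
```

Calling fetch_with_user_agents($url, array('first agent string', 'second agent string')) would then return the first successful body, or false if none of the agents got through.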

Related

cURL request does not display any content

I am trying to retrieve the content of a page with PHP cURL. It works well on other websites, but on this website it does not work and I don't know why.
Here is my code:
$curl = curl_init();
curl_setopt($curl, CURLOPT_URL, 'https://ratings.fide.com/top.phtml');
curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
curl_setopt($curl, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
$result = curl_exec($curl);
curl_close($curl);
echo $result;
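When curl_exec() prints nothing, a first step is usually to find out whether cURL itself failed or the server answered with an error status. The sketch below is one way to surface that information; the function name fetch_or_explain and the 5-second connect timeout are assumptions made for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: fetch a page and report why nothing came back.
function fetch_or_explain(string $url): string
{
    $curl = curl_init();
    curl_setopt($curl, CURLOPT_URL, $url);
    curl_setopt($curl, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($curl, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 5); // assumed timeout
    $result = curl_exec($curl);
    // Collect diagnostics before the handle is closed.
    $errno  = curl_errno($curl);
    $error  = curl_error($curl);
    $status = curl_getinfo($curl, CURLINFO_HTTP_CODE);
    curl_close($curl);

    if ($result === false) {
        // Transport-level failure (DNS, TLS, connection refused, ...).
        throw new RuntimeException("cURL error $errno: $error");
    }
    if ($status >= 400) {
        // The server answered, but with an error status.
        throw new RuntimeException("HTTP status $status");
    }
    return $result;
}
```

Wrapping the call in try/catch and echoing the exception message shows whether the problem is, for example, a certificate error or a 403 response.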

Get data that PHP curl is sending

PHP's cURL is surprisingly hard to debug and obscure. I have a problem downloading JSON API data with cURL, and I want to see exactly what cURL is sending to the remote HTTP server.
Currently the only debug option I have is to temporarily send the request to some simple HTTP server that writes its input to stdout. I would need to write that server just to debug cURL!
What I do:
function get_data($url) {
    $ch = curl_init();
    echo "Download: $url.\n";
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    // I hoped to get some debug info
    // but this setting has no effect
    curl_setopt($ch, CURLOPT_VERBOSE, true);
    curl_setopt($ch, CURLOPT_HEADER, array(
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
        'X-Purpose: Counting downloads.'
    ));
    echo "Sending: \n".curl_getinfo($ch, CURLINFO_HEADER_OUT);
    $data = curl_exec($ch);
    var_dump($data);
    echo curl_error($ch)." ".curl_errno($ch);
    curl_close($ch);
    return $data;
}
How can I get the data that is sent by cURL as a text?
If you want to define the headers you should use CURLOPT_HTTPHEADER and not CURLOPT_HEADER, i.e.:
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
'X-Purpose: Counting downloads.'
));
To get the content cURL is sending, use:
curl_setopt($handle, CURLOPT_VERBOSE, true);
curl_setopt($handle, CURLOPT_STDERR,$f = fopen($verbosePath, "w+"));
A complete example:
function get_data($url) {
    $verbosePath = __DIR__.DIRECTORY_SEPARATOR.'verbose.txt';
    echo "Saving verbose to: $verbosePath\n";
    // Use the $url argument instead of a hard-coded address
    $handle = curl_init($url);
    curl_setopt($handle, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($handle, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($handle, CURLOPT_SSL_VERIFYHOST, 0);
    curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($handle, CURLOPT_HTTPHEADER, array(
        'User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0',
        'X-Purpose: Counting downloads.'
    ));
    // Write cURL's verbose log to a file instead of stderr
    curl_setopt($handle, CURLOPT_VERBOSE, true);
    curl_setopt($handle, CURLOPT_STDERR, $f = fopen($verbosePath, "w+"));
    $data = curl_exec($handle);
    curl_close($handle);
    fclose($f);
    return $data;
}
get_data("http://www.google.com/");
Contents of verbose.txt:
* About to connect() to www.google.com port 80
* Trying 172.217.0.100... * connected
* Connected to www.google.com (172.217.0.100) port 80
> GET / HTTP/1.1
Host: www.google.com
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0
X-Purpose: Counting downloads.
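The question also tried curl_getinfo($ch, CURLINFO_HEADER_OUT), but read it before curl_exec() and without switching the option on. A minimal sketch of that alternative follows, assuming only the request headers are needed and not the full verbose log. The function name get_sent_headers and the shape of the returned array are hypothetical. CURLINFO_HEADER_OUT is used here on its own rather than together with CURLOPT_VERBOSE.

```php
<?php
// Hypothetical helper: return the response body together with the request
// headers that cURL actually sent.
function get_sent_headers(string $url, array $headers = []): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    // Ask cURL to keep a copy of the outgoing request header block.
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);
    $body = curl_exec($ch);
    // Only meaningful after curl_exec(); before that nothing has been sent.
    $sent = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    $error = curl_error($ch);
    curl_close($ch);
    return ['body' => $body, 'sent' => $sent, 'error' => $error];
}
```

For example, echoing get_sent_headers($url, array('X-Purpose: Counting downloads.'))['sent'] prints the request line and headers of a request that was actually sent.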

Fetch content of google scholar page with php

This is my journal page in Google Scholar:
https://scholar.google.com/citations?user=F4z6guYAAAAJ
I can open the page in a browser, but I cannot get its contents with PHP (cURL or file_get_contents).
I tried many headers, but none of them helped.
Update: here is my code:
$fgc_context = stream_context_create(array(
    'http'=>array(
        'method'=>"GET",
        'header'=>"Accept: text/html,application/xhtml+xml,application/xml\r\n" .
            "Accept-Charset: ISO-8859-1,utf-8\r\n" .
            "Accept-Encoding: gzip,deflate,sdch\r\n" .
            "Accept-Language: en-US,en;q=0.8\r\n",
        "timeout" => 60,
        'user_agent'=>"user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9\r\n"
    )
));
ini_set('user_agent', 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9');
$wcnt = @file_get_contents($the_journal_url, false, $fgc_context);
Google returns a page that ends with:
<H1>Server Error</H1> We're sorry but it appears that there has been an internal server error while processing your request. Our engineers have been notified and are working to resolve the issue.<p>Please try again later.</p>
Try this code:
(run it twice; the first run creates the cookie file that the second run sends back)
$cookie = __DIR__ . '/cookie.txt';
$ch = curl_init();
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_URL, 'https://scholar.google.com/citations?user=F4z6guYAAAAJ');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0');
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
$data = curl_exec($ch);
curl_close($ch);
echo $data;

Curl gives a different response than normal browser

I am trying to use curl to get response from a web page. But I get different responses while using curl and when browsing normally.
PHP file
$ch = curl_init();
echo "trying<br>";
//$url = "home.iitk.ac.in/~gopi/student_search/feedback.php";
$roll_no = "11101";
$name="";
$program="all";
$department="all";
$email="";
$gender="both";
$city="";
$course="";
$order="id";
$hostel="";
$bg='';
$tile = '0';
$offset = 0;
$url = "http://search.junta.iitk.ac.in/get2.php?&tile=0&roll_no=".$roll_no."&name=".$name."
&program=".$program."&dept=".$department."&login=".$email."&gender=".$gender."
&city=".$city."&course=".$course."&hostel=".$hostel."&bg=".$bg."&offset=".$offset;
echo $url;
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
$ret_val = curl_error($ch);
echo $result;
echo $ret_val;
curl_close($ch);
This gives me one page, but when I go directly to the same URL in a browser, it shows 44 results and more.
How do I get the same result using curl?
Edit: adding a referer and a user agent doesn't work either:
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_REFERER, 'http://search.junta.iitk.ac.in');
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
$result = curl_exec($ch);
It looks like the target server checks the user agent and, if it's not a real browser, turns the request away (or generally behaves differently).
Try specifying the user agent - http://curl.haxx.se/docs/manpage.html
For example:
curl -A "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_3 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8J2 Safari/6533.18.5" http://www.apple.com
In PHP:
curl_setopt($ch,CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
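Separately from the user agent, the URL in the question is assembled by hand from string literals that span several lines, so as written the line breaks end up inside the query string and the values are not URL-encoded. The sketch below shows one way to build the same kind of URL with http_build_query(); the function name build_search_url is hypothetical, and whether this explains the differing results on that particular server is an assumption.

```php
<?php
// Hypothetical helper: build the search URL from an array of parameters.
function build_search_url(string $base, array $params): string
{
    // http_build_query() URL-encodes each value and joins the pairs with "&",
    // so no stray whitespace or line breaks can slip into the URL.
    return $base . '?' . http_build_query($params, '', '&');
}

$url = build_search_url('http://search.junta.iitk.ac.in/get2.php', array(
    'tile'    => '0',
    'roll_no' => '11101',
    'name'    => '',
    'program' => 'all',
    'dept'    => 'all',
    'login'   => '',
    'gender'  => 'both',
    'city'    => '',
    'course'  => '',
    'hostel'  => '',
    'bg'      => '',
    'offset'  => 0,
));
```

The resulting $url can be passed to CURLOPT_URL exactly as in the original code.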

Amazon Blocks cURL Request?

I am trying to use PHP cURL to fetch an Amazon web page, but I get
HTTP/1.1 503 Service Temporarily Unavailable instead. Is Amazon blocking cURL?
http://www.amazon.com/gp/offer-listing/B003B7Q5YY/
<?php
function get_html_content($url) {
    // fake user agent
    $userAgent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.2) Gecko/20070219 Firefox/2.0.0.2';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch,CURLOPT_COOKIEFILE,'cookies.txt');
    curl_setopt($ch,CURLOPT_COOKIEJAR,'cookies.txt');
    $string = curl_exec($ch);
    curl_close($ch);
    return $string;
}
echo get_html_content("http://www.amazon.com/gp/offer-listing/B003B7Q5YY");
?>
I use this simple code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $offers_page);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$html = curl_exec($ch);
curl_close($ch);
But I have another problem: if you send a lot of queries to Amazon, they start sending you a 500 page.
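One common way to react to error pages that appear after many requests is to slow down and retry with growing pauses. The sketch below assumes this helps; nothing in the thread confirms how Amazon decides to return the error. The function name fetch_with_backoff, the doubling delay schedule, and the injectable $fetch and $sleep callables are all hypothetical. $fetch is expected to return array($body, $status), for example from a wrapper around the cURL code above.

```php
<?php
// Hypothetical helper: retry a request with growing pauses on 5xx responses.
function fetch_with_backoff(callable $fetch, int $maxAttempts = 4, int $baseDelaySeconds = 2, ?callable $sleep = null)
{
    // Use the built-in sleep() unless a replacement is injected (e.g. in tests).
    $sleep = $sleep ?? 'sleep';
    for ($attempt = 0; $attempt < $maxAttempts; $attempt++) {
        [$body, $status] = $fetch();
        // Anything below 500 is treated as a final answer.
        if ($body !== false && $status < 500) {
            return $body;
        }
        // With the default base delay, wait 2, 4, 8, ... seconds between attempts.
        if ($attempt < $maxAttempts - 1) {
            $sleep($baseDelaySeconds * (2 ** $attempt));
        }
    }
    // All attempts ended in a transport failure or a 5xx status.
    return false;
}
```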
