How to use PHP curl to fetch www.yahoo.com page? - php

I am trying to write a simple crawler method. When I use PHP cURL to fetch the www.yahoo.com page, I get nothing back. How can I fetch the page?
My code is below.
public function getWebPage($url, $timeout = 120) {
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20081216 Ubuntu/8.04 (hardy) Firefox/2.0.0.19",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => $timeout,
CURLOPT_TIMEOUT => $timeout,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
return $content;
}

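yahoo.com is served over HTTPS (SSL/TLS), and the request can fail when cURL is unable to verify the site's certificate. As a quick workaround, add this cURL option to your existing set:
CURLOPT_SSL_VERIFYPEER => false,
Note that this turns off certificate verification; pointing CURLOPT_CAINFO at an up-to-date CA bundle is the safer fix.
Also disable the CURLOPT_USERAGENT option.
The working code (tested):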
<?php
class A
{
public function getWebPage($url, $timeout = 120) {
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
//CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20081216 Ubuntu/8.04 (hardy) Firefox/2.0.0.19",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => $timeout,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_TIMEOUT => $timeout,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
return $content;
}
}
$a = new A;
echo $a->getWebPage('www.yahoo.com');
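The function above collects curl_errno() and curl_error() but never uses them, so a failed request just looks like an empty page. A minimal sketch of how the error could be surfaced instead, assuming a hypothetical helper name fetchPage() and result keys that are not part of the original code:

```php
<?php
// Hypothetical helper (not from the original post): returns the body together
// with cURL's own error information so a failure is not silently swallowed.
function fetchPage(string $url, int $timeout = 30): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_ENCODING       => "",     // accept any encoding cURL supports
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
    ]);

    $content = curl_exec($ch);            // false when the transfer failed

    // The handle is released automatically when $ch goes out of scope.
    return [
        'content' => $content === false ? null : $content,
        'errno'   => curl_errno($ch),     // 0 means no cURL-level error
        'errmsg'  => curl_error($ch),
        'status'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
    ];
}
```

A caller could then check `$page['errno'] !== 0` after `$page = fetchPage('https://www.yahoo.com');` and print `$page['errmsg']`, which usually says directly whether the problem is the certificate, a timeout, or DNS.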

Related

Php curl with proxy response 502

I am getting a 502 response from one specific site while scraping through a proxy. The same code works with other sites, and it also works for this site without the proxy. Can anyone help?
function doo ( $proxyip, $port, $auth, $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36", // tell yourself i am this
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_PROXY => $proxyip,
CURLOPT_PROXYPORT => $port,
CURLOPT_PROXYUSERPWD => $auth,
CURLOPT_HTTPPROXYTUNNEL => 1,
CURLOPT_PROXYTYPE => CURLPROXY_HTTP // expects a CURLPROXY_* constant, not the string 'HTTP'
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content; // store the body alongside errno/errmsg instead of overwriting $header
return $header;
}
Exact Error:
Failure when receiving data from the peer: Received HTTP code 502 from proxy after CONNECT

PHP - how can I get a web page's content with cURL for this site?

How can I get the content of a web page with cURL for this site:
divar.ir
I am trying to fetch a page from the website with cURL, but it does not work.
With the following code, a 404 page is displayed.
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_SSL_VERIFYPEER => false // Disabled SSL Cert checks
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
$link="https://divar.ir/";
$res =get_web_page($link);
echo $res['content'];
Change CURLOPT_USERAGENT to a browser-like value:
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0",

PHP cURL code won't work [duplicate]

This question already has answers here:
How do I add PHP code/file to HTML(.html) files?
(12 answers)
Closed 6 years ago.
I am trying to learn how to use PHP cURL by following a tutorial, using WAMP. When I go to localhost I never see the result of the code, no matter what changes I make. All I see is:
This is my code:
<html>
<head>
</head>
<body>
<?php
function curl($url){
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
CURLOPT_URL => $url,
$ch = curl_init();
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
function scrape_between($data, $start, $end){
$data= stristr($data, $start);
$data= substr($data, strlen($start));
$stop= stripos($data, $end);
$data= substr($data, 0, $stop);
return $data;
}
$scraped_page = curl("http://www.imdb.com"); // Downloading IMDB home page to variable $scraped_page
$scraped_data = scrape_between($scraped_page, "<title>", "</title>"); // Scraping downloaded data in $scraped_page for content between <title> and </title> tags
echo $scraped_data; // Echoing $scraped data, should show "The Internet Movie Database (IMDb)"
?>
</body>
</html>
Change the file extension to .php instead of .html. To quote @John Conde from this answer:
You can't run PHP in .html files because the server does not recognize that as a valid PHP extension unless you tell it to.
Alternatively, you could configure the web server (e.g. Apache, IIS) to process files with the .html extension as PHP files.
Also, make sure the assignment of the options array ends with a closing parenthesis followed by a semicolon. For more information about arrays, see php.net/array. So this block:
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
CURLOPT_URL => $url,
should be updated to:
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
CURLOPT_URL => $url
);
You can see this working on this phpfiddle
You missed the closing ')' on line 18 (CURLOPT_URL => $url,).
Try this:
<html>
<head>
</head>
<body>
<?php
function curl($url){
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_AUTOREFERER => TRUE,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8",
CURLOPT_URL => $url);
$ch = curl_init();
curl_setopt_array($ch, $options);
$data = curl_exec($ch);
curl_close($ch);
return $data;
}
function scrape_between($data, $start, $end){
$data= stristr($data, $start);
$data= substr($data, strlen($start));
$stop= stripos($data, $end);
$data= substr($data, 0, $stop);
return $data;
}
$scraped_page = curl("http://www.imdb.com"); // Downloading IMDB home page to variable $scraped_page
$scraped_data = scrape_between($scraped_page, "<title>", "</title>"); // Scraping downloaded data in $scraped_page for content between <title> and </title> tags
echo $scraped_data; // Echoing $scraped data, should show "The Internet Movie Database (IMDb)"
?>
</body>
</html>
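scrape_between() as written assumes both markers are present. If the start marker is missing, stristr() returns false and the later calls operate on that value. A minimal sketch of a stricter variant, assuming a hypothetical name scrapeBetweenOrNull() and a null return for "not found" (neither comes from the answers above):

```php
<?php
// Hypothetical stricter variant of scrape_between(): returns the text between
// the first $start marker and the next $end marker (case-insensitive),
// or null when either marker cannot be found.
function scrapeBetweenOrNull(string $data, string $start, string $end): ?string
{
    $from = stripos($data, $start);
    if ($from === false) {
        return null;                      // start marker missing
    }
    $from += strlen($start);              // skip past the start marker itself

    $to = stripos($data, $end, $from);    // search for the end marker after it
    if ($to === false) {
        return null;                      // end marker missing
    }

    return substr($data, $from, $to - $from);
}
```

With this version, a failed download (an empty string or a page without a title tag) yields null, which the caller can test for before echoing anything.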

Creating cURL With Different POST DATA

I wrote an algorithm that generates 10 different numbers. Using PHP cURL, I want to send a POST request to a website for each of those 10 numbers and then get the response body for each. Here is my example code for generating the 10 numbers:
function solver($aaa,$bbb,$number) {
$solo = substr($aaa,0,9);
$x=substr($solo,0,5);
$y=substr($solo,5,4);
if ($bbb == 0) {
for ($i = 1; $i <= $number ; $i++ ) {
$xx=$x+8*$i;
$dokuz=$xx.$y-1*$i;
$yeni=$dokuz;
echo $yeni."<br>";
}
}
}
solver("12345678912",0,10);
Here is the cURL:
<?php
$url = "http://example.com/solver.aspx";
$postdata = 'number'.'='.$yeni;
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
$result = curl_exec($ch);
preg_match_all("/id=\"__VIEWSTATE\" value=\"(.*?)\"/", $result, $arr_viewstate);
$viewstate = urlencode($arr_viewstate[1][0]);
preg_match_all("/id=\"__EVENTVALIDATION\" value=\"(.*?)\"/", $result, $arr_validation);
$eventvalidation = urlencode($arr_validation[1][0]);
preg_match_all("/id=\"__LASTFOCUS\" value=\"(.*?)\"/", $result, $arr_lastfocus);
$lastfocus = urlencode($arr_lastfocus[1][0]);
preg_match_all("/id=\"__EVENTTARGET\" value=\"(.*?)\"/", $result, $arr_eventtarget);
$eventtarget = urlencode($arr_eventtarget[1][0]);
preg_match_all("/id=\"__EVENTARGUMENT\" value=\"(.*?)\"/", $result, $arr_eventargument);
$eventargument = urlencode($arr_eventargument[1][0]);
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => true,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_POST => true,
CURLOPT_POSTFIELDS => '__EVENTTARGET='.$eventtarget.'&__EVENTARGUMENT='.$eventargument.'&__VIEWSTATE='.$viewstate.'&__EVENTVALIDATION='.$eventvalidation.'&__LASTFOCUS='.$lastfocus.'&'.$postdata.'&Submit=submit');
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$result = curl_exec ($ch);
preg_match("/<input name=\"Adi\" type=\"text\" value=\"(.*?)\" maxlength=\"25\" id=\"txtAdi\" disabled=\"disabled\" class=\"aspNetDisabled\" \/>/", $result, $adi);
$cikan = "<b>".$yeni."</b>"." "." ".$adi[1]." ";
print($cikan);
curl_close ($ch);
So, I am trying to make the cURL request run 10 times, once for each of those 10 numbers. Can anyone help me with this issue?
Thank you,
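One way to approach this, as a sketch rather than a tested solution, is to have the generator return the numbers instead of echoing them, then loop over the result and send one POST per number. The names generateNumbers(), buildPostBody() and postNumber() are hypothetical. The grouping of the original `$xx.$y-1*$i` expression is an assumption: PHP 8 parses it as shown here, while older versions group it differently. The hidden ASP.NET fields are assumed to have been scraped from the form page beforehand, as in the code above, but passed in raw rather than urlencoded.

```php
<?php
// Same arithmetic as solver(), but collecting the values in an array.
function generateNumbers(string $seed, int $count): array
{
    $solo = substr($seed, 0, 9);
    $x = (int) substr($solo, 0, 5);
    $y = (int) substr($solo, 5, 4);

    $numbers = [];
    for ($i = 1; $i <= $count; $i++) {
        // Assumed grouping: concatenate ($x + 8*$i) with ($y - $i).
        $numbers[] = ($x + 8 * $i) . ($y - $i);
    }
    return $numbers;
}

// Builds a URL-encoded POST body; http_build_query() does the escaping,
// so $hiddenFields must hold raw (not already urlencoded) values.
function buildPostBody(string $number, array $hiddenFields): string
{
    return http_build_query($hiddenFields + ['number' => $number, 'Submit' => 'submit']);
}

// Sends one POST request and returns the response body, or null on failure.
function postNumber(string $url, string $body): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_POST           => true,
        CURLOPT_POSTFIELDS     => $body,
        CURLOPT_TIMEOUT        => 120,
    ]);
    $result = curl_exec($ch);
    return $result === false ? null : $result;
}

// Usage (commented out so the block has no side effects when loaded):
// foreach (generateNumbers("12345678912", 10) as $number) {
//     $html = postNumber($url, buildPostBody($number, $hiddenFields));
//     // ...extract the "Adi" field from $html with preg_match() as above
// }
```

Returning the numbers from the function also avoids the scope problem in the original script, where `$yeni` is only defined inside solver() and is therefore empty when the POST data is built.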

PHP Curl 302 authentication with cookies

I am trying to learn to use PHP cURL, and it seemed to go well until I tried to authenticate to changeip.com. Here is the function I use to make a cURL call:
function request($ch, $url, $params = array())
{
$options = array
(
CURLOPT_URL => $url,
CURLOPT_USERAGENT => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8',
//CURLOPT_COOKIESESSION => TRUE,
CURLOPT_FOLLOWLOCATION => TRUE,
CURLOPT_HEADER => TRUE,
CURLOPT_RETURNTRANSFER => TRUE,
CURLOPT_SSL_VERIFYPEER => FALSE,
CURLOPT_SSL_VERIFYPEER => FALSE,
CURLINFO_HEADER_OUT => TRUE,
CURLOPT_CONNECTTIMEOUT => 30,
CURLOPT_TIMEOUT => 30,
CURLOPT_MAXREDIRS => 30,
CURLOPT_VERBOSE => TRUE,
CURLOPT_COOKIEJAR => __DIR__ . DIRECTORY_SEPARATOR . 'cookies.txt',
CURLOPT_COOKIEFILE => __DIR__ . DIRECTORY_SEPARATOR . 'cookies.txt',
CURLOPT_HTTPHEADER => array
(
'Host: www.changeip.com',
'Pragma:',
'Expect:',
'Keep-alive: 115',
'Connection: keep-alive',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language: en-us,en;q=0.5',
//'Accept-Encoding: gzip,deflate',
'Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7',
'Content-Type: application/x-www-form-urlencoded',
),
);
if(!empty($params['referrer']))
{
$options[CURLOPT_REFERER] = $params['referrer'];
}
if(!empty($params['post']))
{
$options[CURLOPT_POST] = TRUE;
$options[CURLOPT_POSTFIELDS] = $params['post'];
}
curl_setopt_array($ch, $options);
$return = array();
$return['body'] = curl_exec($ch);
$info = curl_getinfo($ch);
//die(var_dump( curl_getinfo($ch, CURLINFO_HEADER_OUT) ));
$return['header'] = http_parse_headers(substr($return['body'], 0, $info['header_size']));
$return['body'] = substr($return['body'], $info['header_size']);
/*if(!empty($return['header']['Location']))
{
$params['referrer'] = $url;
return request($ch, substr($url, 0, strrpos($url, '/')+1) . $return['header']['Location'], $params);
}*/
return $return;
}
And here is the actual call:
// changeip
$ch = curl_init();
// login
$params = array();
$params['post'] = array
(
'p' => 'aaaaaa2',
'u' => 'aaaaaa2',
);
$params['referrer'] = 'https://www.changeip.com/login.asp';
$return = request($ch, 'https://www.changeip.com/loginverify.asp?', $params);
However, this script does not retrieve valid cookies from changeip.com, i.e. it does not authenticate. I compared the headers sent by cURL with those shown by HTTPLiveHeaders, expecting to find a difference, but did not find anything. Can anyone advise me what is missing to make this work?
A commonly asked question:
Is cookies.txt 0777? Yes, and the script does actually create some sort of cookie:
www.changeip.com FALSE / FALSE 0 ACloginAddrs 6
www.changeip.com FALSE / FALSE 0 ASPSESSIONIDCCSSCQRA DNHKGDICMKHFIJADMAPPMHHC
But it isn't a valid cookie.
Changing the POST fields assignment to the following fixed the issue:
$options[CURLOPT_POSTFIELDS] = http_build_query($params['post']);
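The likely reason this helps: when CURLOPT_POSTFIELDS receives an array, cURL sends the body as multipart/form-data, which does not match the application/x-www-form-urlencoded Content-Type header set in request(). http_build_query() produces a URL-encoded string instead. A small sketch using the credentials array from the question (the variable names are only for illustration):

```php
<?php
// The POST parameters from the question.
$post = ['p' => 'aaaaaa2', 'u' => 'aaaaaa2'];

// URL-encoded string form, suitable for application/x-www-form-urlencoded.
$encoded = http_build_query($post);
echo $encoded, PHP_EOL;   // p=aaaaaa2&u=aaaaaa2

// Reserved characters are escaped, so values cannot break the body apart.
$tricky = http_build_query(['u' => 'a&b c']);
echo $tricky, PHP_EOL;    // u=a%26b+c
```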
