I am getting 502 as response from a specific site while scrapping, it works with other sites and without proxy works too, can anyone help ?
function doo ( $proxyip, $port, $auth, $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.182 Safari/537.36", // tell yourself i am this
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => 120,
CURLOPT_TIMEOUT => 120,
CURLOPT_MAXREDIRS => 10,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_PROXY => $proxyip,
CURLOPT_PROXYPORT => $port,
CURLOPT_PROXYUSERPWD => $auth,
CURLOPT_HTTPPROXYTUNNEL => 1,
CURLOPT_PROXYTYPE => 'HTTP'
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header = $content;
return $header;
}
Exact Error:
Failure when receiving data from the peer: Received HTTP code 502 from proxy after CONNECT
Related
how can i getting a web page content with curl for this site:
divar . ir
I want try to get a page from a website with curl,But not work
I wrote the following code page 404 is displayed.
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_SSL_VERIFYPEER => false // Disabled SSL Cert checks
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
$link="https://divar.ir/";
$res =get_web_page($link);
echo $res['content'];
Change CURLOPT_USERAGENT
CURLOPT_USERAGENT => "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:53.0) Gecko/20100101 Firefox/53.0",
I'm trying to extract a download link from this website using cURL.
http://abelhas.pt/Asantino/TUTORIAIS/SONY+VEGAS/ZLICED+TRAILER/ZLICED+TRAILER/ZLICED+TRAILER+PRE-RENDERD+FX,36948354.mp4(video)
Opening the link gives u a download button, pressing it will then call a ajax script that returns some J SON data from following URL
http://abelhas.pt/action/License/Download
This URL expects 2 parameters ( fileId(int) & __RequestVerificationToken(str)) as POST parameters and this header needs to be send aswell (X-Requested-With: XMLHttpRequest)
This is the script I'm using to login & fetch the __RequestVerificationToken && the fileId from the Item page
$options = array(
'http'=>array(
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie:".$cookie.";\r\n" . // check function.stream-context-create on php.net
"User-Agent: Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.102011-10-16 20:23:10\r\n"
)
);
$context = stream_context_create($options);
$html = file_get_html($url,false,$context);
$token = $html->find('input[name=__RequestVerificationToken]',0)->value;
$id = $html->find('a.fileCopyAction',0)->rel;
return array(
'id' => $id,
'token' => $token
);
Followed by this code that I use to extract the download link(this does not work)
$res = array();
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // do not return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
CURLOPT_POSTFIELDS => 'fileId='.$file_id.'&__RequestVerificationToken='.$token,
CURLOPT_HTTPHEADER => array(
'Accept:*/*',
'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Language: en-US,en;q=0.8',
'Cache-Control: max-age=0',
'Connection: keep-alive',
'Content-Type: application/x-www-form-urlencoded',
'Cookie: '.$cookie,
'Host: abelhas.pt',
'Origin: http://abelhas.pt',
'X-Requested-With: XMLHttpRequest',
),
);
$ch = curl_init("http://abelhas.pt/action/License/Download");
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$res['content'] = $content;
$res['url'] = $header['url'];
return $res;
This returns J SON data but not what the website returns if u inspect it with chrome/firefox.
So the question remains : How to do this to get the same results as if visiting it urself using a browser?
Thanks in advance.
after a user logged into my website a session is started with name "xy-login".
I now want to use cURL to get the content of another webressource which accepts the same session (F.e. xylogin=lqb3a4fma4ggf4rr49ktfggel).
With wget I would do something like this:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$contents = file_get_contents('http://localhst/xy/print', false, $context);
But how would I need to change this function to pass the session (as with wget)?
https://stackoverflow.com/a/14953910/1474777
function get_web_page( $url )
{
$user_agent='Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
I am newbie to drupal and trying to develop a custom module for curl functionality of php which will get the content of page and will display the whole content
as it is.I tried but it is not displaying the content below is the code that I have tried.
function mycurl_menu() {
$items['mycurl'] = array(
'title' => 'My curl demo',
'page callback' => 'mycurl_curl',
'access callback' => TRUE,
'type' => MENU_CALLBACK,
);
return $items;
}
function mycurl_curl() {
$user_agent = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.65 Safari/537.36";
$postdata ="name=amit&pass=amit";
$options = array(
CURLOPT_CUSTOMREQUEST =>"GET", //set request type post or get
CURLOPT_POST =>false, //set to GET
CURLOPT_USERAGENT => $user_agent, //set user agent
CURLOPT_COOKIEFILE =>"cookie.txt", //set cookie file
CURLOPT_COOKIEJAR =>"cookie.txt", //set cookie jar
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$url = "http://localhost/drupal_commerce/product";
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
You should return $header['content'] since the menu callback is no supposed to return an array, and check the permissions of the page you are fetching with curl so that is it available.
Try to write a simple crawler method. When I use PHP curl to get the www.yahoo.com page, I fetch nothing. How can I fetch the page?
My code is in the following.
public function getWebPage($url, $timeout = 120) {
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20081216 Ubuntu/8.04 (hardy) Firefox/2.0.0.19",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => $timeout,
CURLOPT_TIMEOUT => $timeout,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
return $content;
}
The yahoo.com runs on secure socket layer. So add this cURL param to your existing set.
CURLOPT_SSL_VERIFYPEER => false,
and also disable the USERAGENT..
The working code.. (tested)
<?php
class A
{
public function getWebPage($url, $timeout = 120) {
$options = array(
CURLOPT_RETURNTRANSFER => true,
CURLOPT_HEADER => false,
CURLOPT_FOLLOWLOCATION => true,
CURLOPT_ENCODING => "",
//CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.19) Gecko/20081216 Ubuntu/8.04 (hardy) Firefox/2.0.0.19",
CURLOPT_AUTOREFERER => true,
CURLOPT_CONNECTTIMEOUT => $timeout,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_TIMEOUT => $timeout,
CURLOPT_MAXREDIRS => 10,
);
$ch = curl_init($url);
curl_setopt_array($ch, $options);
$content = curl_exec($ch);
$err = curl_errno($ch);
$errmsg = curl_error($ch);
$header = curl_getinfo($ch);
curl_close($ch);
return $content;
}
}
$a = new A;
echo $a->getWebPage('www.yahoo.com');