I have written a small script that scrapes some data from a website using cURL in PHP. When the request runs, the site issues a 301 redirect, which is handled by:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
However, when I run the same code from my browser, the redirect is NOT followed.
Here is the complete curl request:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $arr_params['url']);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_TIMEOUT, 60);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_VERBOSE, true);
curl_setopt($ch, CURLOPT_STDERR, $arr_params['error_file']);
curl_setopt($ch, CURLOPT_MAXREDIRS, 5);
//curl_setopt($ch, CURLOPT_HTTP_VERSION, CURL_HTTP_VERSION_1_0);
In the above code, $arr_params is set previously....
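One thing that may be worth checking when a redirect is followed in one environment but not under the web server: on some older PHP setups, CURLOPT_FOLLOWLOCATION reportedly cannot be enabled while open_basedir (or safe_mode) is in effect, and the web server may load a different php.ini than the command line. That is an assumption about this setup; nothing in the question confirms it. If it does apply, a fallback is to read the Location header yourself (with CURLOPT_HEADER enabled) and issue the next request manually. A minimal sketch of the header-parsing part, where extract_location is a hypothetical helper name:

```php
<?php
// Hypothetical helper: returns the Location header value from a raw
// HTTP response header block, or null when no such header is present.
function extract_location(string $rawHeaders): ?string
{
    // Header names are case-insensitive; match at the start of a line
    // and tolerate both "\r\n" and "\n" line endings.
    if (preg_match('/^Location:[ \t]*([^\r\n]+?)[ \t]*\r?$/mi', $rawHeaders, $m)) {
        return $m[1];
    }
    return null;
}
```

The returned URL would then be passed to CURLOPT_URL for the next request, with a counter to cap the number of hops, much like CURLOPT_MAXREDIRS does.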
Related
With this code:
<?php
function request($url, $post, $cook)
{
$cookie_file_path = "vskcookies.txt";
$urll = 'http://auto.vsk.ru/login.aspx';
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $urll);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,'"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Trident/7.0; Touch; .NET4.0C; .NET4.0E; Tablet PC 2.0)"');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $urll);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt( $ch, CURLOPT_COOKIESESSION, true );
$result = curl_exec($ch);
echo "FIELDS:\n\n".$post;
echo "\n\nHEADERS:\n\n";
curl_close($ch);
return $result;
}
$result = request($_POST['url'], $_POST['data'], $_POST['cook']);
if ($result === FALSE)
echo('error');
else
echo($result);
?>
I get both of the cookies I need, but the body contains an HTTP 411 error:
With the same request and the same code, but with this line added at the end, right before curl_exec:
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Content-Type: application/x-www-form-urlencoded;charset=utf-8"));
I get the correct body, but now only one cookie (I need both):
Other variants:
This code
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Cache-Control: no-cache;Content-Type: application/x-www-form-urlencoded;charset=utf-8;Content-Length: '.strlen($post)));
gives a 411 error and, for some reason, the correct cookies.
This
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Content-Length:'.strlen($post)));
has no effect: nothing happens at all.
Also, after both of the first variants, vskcookies.txt contains only one (pool) cookie.
Why is that?
It looks like adding a header erases the request's body (the POST fields).
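A note on the header variants above: CURLOPT_HTTPHEADER takes an array with one complete header line per element. When several headers are joined with ';' inside a single string, cURL sends that string as one header line (here a long Cache-Control value), so the Content-Type and Content-Length parts never become headers of their own. Content-Length normally does not need to be set by hand either, since cURL derives it from a string passed to CURLOPT_POSTFIELDS. Separately, and only as a possible explanation for the 411: CURLOPT_CUSTOMREQUEST keeps sending the "POST" verb on redirected requests even when cURL has dropped the body, which some servers reject as "Length Required". A minimal sketch of building the header list, where build_header_lines is a hypothetical helper name:

```php
<?php
// Hypothetical helper: turns a name => value map into the list format
// CURLOPT_HTTPHEADER expects, one "Name: value" string per header.
function build_header_lines(array $headers): array
{
    $lines = [];
    foreach ($headers as $name => $value) {
        $lines[] = $name . ': ' . $value;
    }
    return $lines;
}

$headerLines = build_header_lines([
    'Cache-Control' => 'no-cache',
    'Content-Type'  => 'application/x-www-form-urlencoded; charset=utf-8',
]);
// Intended use (not executed here):
// curl_setopt($ch, CURLOPT_HTTPHEADER, $headerLines);
```

Whether this changes the cookie behaviour on this particular site is untested; it only makes sure each header reaches the server as its own line.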
UPDATE
For this code
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Cache-Control: no-cache;Content-Type: application/x-www-form-urlencoded;'));
I just sit and click the button on my page that sends an AJAX request to the script above. Every time I get a 411 error response (the cookies/time in the header also update every time), but after about 10 clicks I got an HTTP 200 and the page I need. Then 411 again many times, and then a single 200 again. By the way, the cookie file still has only one cookie.
Wtf?
As I mentioned in the question's update, I sometimes get a 200 response with both cookies and the body I need.
So I wrote this bad, unstable workaround; it can often take about 20 seconds:
<?php
function request($url, $post, $cook)
{
$sta=0;
while($sta != '200')
{
$ch = curl_init();
$cookie_file_path = "vskcookies.txt";
$urll = 'http://auto.vsk.ru/login.aspx';
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, $urll);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_USERAGENT,'"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 10.0; Trident/7.0; Touch; .NET4.0C; .NET4.0E; Tablet PC 2.0)"');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_REFERER, $urll);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt( $ch, CURLOPT_COOKIESESSION, true );
//curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Cache-Control: no-cache;Content-Type: application/x-www-form-urlencoded;'));
//curl_setopt($ch, CURLOPT_POSTREDIR, 2);
$result = curl_exec($ch);
$sta = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
}
echo "FIELDS:\n\n".$post;
echo "\n\nHEADERS:\n\n";
return $result;
}
$result = request($_POST['url'], $_POST['data'], $_POST['cook']);
if ($result === FALSE)
echo('error');
else
echo($result);
?>
I bet there is a much better, correct answer.
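One weakness of the loop above is that it never stops if the server never answers 200. A bounded version might look like the sketch below. retry_until_ok and its parameters are hypothetical names, and the callable is assumed to perform one cURL request and return the status code (from CURLINFO_HTTP_CODE) together with the body:

```php
<?php
// Hypothetical helper: calls $attempt until it reports HTTP 200 or the
// attempt limit is reached. $attempt must return [int $status, $body].
function retry_until_ok(callable $attempt, int $maxAttempts = 5, int $delayMs = 0): array
{
    $status = 0;
    $body = false;
    for ($i = 1; $i <= $maxAttempts; $i++) {
        [$status, $body] = $attempt();
        if ($status === 200) {
            break;
        }
        // Optional pause between attempts, skipped after the last one.
        if ($delayMs > 0 && $i < $maxAttempts) {
            usleep($delayMs * 1000);
        }
    }
    // $i is $maxAttempts + 1 when the loop ran out, so clamp it.
    return ['status' => $status, 'body' => $body, 'attempts' => min($i, $maxAttempts)];
}
```

This still only hides the intermittent 411 responses; it does not explain them.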
I have now managed the login with cURL. This is confirmed by the following:
Cookie.txt:
# Netscape HTTP Cookie File
# https://curl.haxx.se/docs/http-cookies.html
# This file was generated by libcurl! Edit at your own risk.
www.profi-ortung.de FALSE / FALSE 0 PHPSESSID hg1e550nsu10h4elgbo65hrtsm
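For reading such a file programmatically: each data line in the Netscape cookie format has seven tab-separated fields (domain, include-subdomains flag, path, secure flag, expiry timestamp, name, value). An expiry of 0 marks a session cookie, so the line above shows that the server started a PHP session; on its own that may not prove the login was accepted. A minimal sketch of a line parser, where parse_cookie_line is a hypothetical helper name:

```php
<?php
// Hypothetical helper: parses one data line of a Netscape-format cookie
// file into an associative array. Returns null for comments, blank
// lines, and lines that do not have exactly seven fields.
function parse_cookie_line(string $line): ?array
{
    $line = rtrim($line, "\r\n");
    // libcurl marks HttpOnly cookies with this prefix; strip it before
    // the comment check so those lines are not skipped.
    $httpOnly = false;
    if (strpos($line, '#HttpOnly_') === 0) {
        $httpOnly = true;
        $line = substr($line, strlen('#HttpOnly_'));
    }
    if ($line === '' || $line[0] === '#') {
        return null;
    }
    $fields = explode("\t", $line);
    if (count($fields) !== 7) {
        return null;
    }
    return [
        'domain'             => $fields[0],
        'include_subdomains' => $fields[1] === 'TRUE',
        'path'               => $fields[2],
        'secure'             => $fields[3] === 'TRUE',
        'expires'            => (int) $fields[4], // 0 = session cookie
        'name'               => $fields[5],
        'value'              => $fields[6],
        'http_only'          => $httpOnly,
    ];
}
```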
If I log in manually, the log says that the login is working:
Windows Firefox 1.0.7 28-05-2020 09:56:33
Why can't I see the logged-in user's content, i.e. home.php?
The basic code:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_NOBODY, false);
curl_setopt($ch, CURLOPT_URL, "https://www.profi-ortung.de/login.php");
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
//set the cookie the site has for certain features, this is optional
curl_setopt($ch, CURLOPT_COOKIE, "cookiename=0");
curl_setopt($ch, CURLOPT_USERAGENT,
"Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_CUSTOMREQUEST, "POST");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, "username=xxx#xxx&password=xxx");
curl_exec($ch);
//page with the content I want to grab
curl_setopt($ch, CURLOPT_URL, "https://www.profi-ortung.de/home.php");
//do stuff with the info with DomDocument() etc
$html = curl_exec($ch);
curl_close($ch);
?>
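Two things stand out in the code above, offered as possibilities rather than a confirmed diagnosis. Options set on a cURL handle persist across curl_exec calls, so the second request to home.php is still sent as a POST with the login fields. Also, only CURLOPT_COOKIEJAR is set, and the credentials are passed as a hand-built string. The sketch below switches the handle back to GET before the second request and encodes the fields with http_build_query. build_login_body and fetch_after_login are hypothetical helper names, and the real form field names must come from the site's login form:

```php
<?php
// Hypothetical helper: URL-encodes form fields for CURLOPT_POSTFIELDS.
function build_login_body(array $fields): string
{
    return http_build_query($fields, '', '&');
}

// Hypothetical helper: logs in, then fetches a second page on the same
// handle so the session cookie from the login is reused.
function fetch_after_login(string $loginUrl, string $pageUrl, array $fields, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // written on close
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // read on start

    // Step 1: POST the credentials.
    curl_setopt($ch, CURLOPT_URL, $loginUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, build_login_body($fields));
    curl_exec($ch);

    // Step 2: switch the handle back to GET before requesting the page.
    curl_setopt($ch, CURLOPT_HTTPGET, true);
    curl_setopt($ch, CURLOPT_URL, $pageUrl);
    $html = curl_exec($ch);

    curl_close($ch);
    return $html; // string on success, false on failure
}
```

If the site also expects hidden form fields or a particular Referer, this sketch will not be enough on its own.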
I have this code
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,"http://www.web.com/index.php?q=login");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS,
"nick=abc&pass=abc");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$server_output = curl_exec ($ch);
print_r($server_output);
At www.web.com/index.php?q=login there is a login form. I am trying to log in there, but without any success ($server_output still contains only the login form; it should contain the user panel after a successful login).
You need to handle a few other things as well, such as accepting cookies. Here is some code you can use to initialize your cURL handle:
$this->_ch = curl_init();
curl_setopt($this->_ch, CURLOPT_VERBOSE, 0);
curl_setopt($this->_ch, CURLOPT_HEADER, 0);
curl_setopt($this->_ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($this->_ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($this->_ch, CURLOPT_TIMEOUT,1000);
curl_setopt($this->_ch, CURLOPT_COOKIEFILE, "cookies.txt");
curl_setopt($this->_ch, CURLOPT_COOKIEJAR, "cookies.txt");
curl_setopt($this->_ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
I want to log in to google.com with PHP via cURL, but I keep getting the login page again and again.
Here is my code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 30);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.2) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.79 Safari/537.1 AlexaToolbar/alxg-3.1");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR);
curl_setopt($ch, CURLOPT_COOKIEFILE, COOKIEJAR);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,1);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_URL, 'https://accounts.google.com/ServiceLoginAuth');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post_string);
$result = curl_exec($ch);
var_dump($result);
Also, it is not creating the cookie file.
I found this link: Login to Google with PHP and Curl, Cookie turned off? It has the same code as mine, but I am still unable to log in.
What is wrong with this code?
When I access http://www.google.com/ in my browser, I'm redirected to https://www.google.com/. I'm expecting CURLINFO_EFFECTIVE_URL to return the https URL when running this code:
$ch = curl_init('http://google.com/');
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)");
curl_setopt($ch, CURLOPT_NOBODY, true);
echo curl_exec($ch); // echos nothing
echo curl_getinfo($ch, CURLINFO_HTTP_CODE); // echos '200'
echo curl_getinfo($ch, CURLINFO_EFFECTIVE_URL); // echos 'http://google.com/'
curl_close($ch);
Will adding an SSL cert and the following options make this work as I'm expecting?
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
curl_setopt($ch, CURLOPT_CAINFO, 'whateverCrt.crt');
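Before changing the certificate options, it may help to see what the server actually answers at each hop. Those options control how the peer certificate is verified, so by themselves they are unlikely to change which redirects are issued. With CURLOPT_HEADER enabled alongside CURLOPT_FOLLOWLOCATION, cURL returns the header block of every response in the chain, one after another. A minimal sketch that summarises those blocks, where redirect_hops is a hypothetical helper name:

```php
<?php
// Hypothetical helper: splits concatenated response header blocks (one
// per hop) and reports each hop's status code and Location header.
function redirect_hops(string $rawHeaders): array
{
    $hops = [];
    // Header blocks are separated by an empty line.
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    foreach ($blocks as $block) {
        if (!preg_match('/^HTTP\/\S+\s+(\d{3})/', $block, $status)) {
            continue; // not a header block (for example body text)
        }
        $location = null;
        if (preg_match('/^Location:[ \t]*([^\r\n]+?)[ \t]*\r?$/mi', $block, $m)) {
            $location = $m[1];
        }
        $hops[] = ['status' => (int) $status[1], 'location' => $location];
    }
    return $hops;
}
```

If the list shows a single 200 with no Location header, the server simply did not redirect this client to https, which can differ from what a browser sees.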