I am currently learning to scrape an asynchronous website. First, I need to get the cookie. I'm using the code below to save the cookie to a txt file, but the cookie is not saved when I run it: when I open the file, it's empty. I don't know where the problem is, as I'm still new to this. I hope you can help. Thanks for your time.
$cookie_file_path = dirname(__FILE__) . "/cookie.txt";
$ch = curl_init();
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_URL, $url); // $url: address of the page to scrape
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.100 Safari/537.36');
curl_setopt($ch, CURLOPT_TIMEOUT, 40);
curl_exec($ch);
curl_close($ch);
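One possible way to narrow this down, offered as a sketch rather than a diagnosis: cURL can only write the cookie jar if the target directory is writable, and the jar will contain no cookies if the request fails or the response sets none. The helper name `cookieJarProblems` below is made up for illustration and is not part of cURL.

```php
<?php
// Hypothetical helper: lists reasons why cURL might be unable to write the cookie jar.
function cookieJarProblems(string $path): array
{
    $problems = [];
    $dir = dirname($path);
    if (!is_dir($dir)) {
        $problems[] = "Directory does not exist: $dir";
    } elseif (!is_writable($dir)) {
        $problems[] = "Directory is not writable: $dir";
    }
    if (file_exists($path) && !is_writable($path)) {
        $problems[] = "File exists but is not writable: $path";
    }
    return $problems;
}

// Usage idea (commented out because it needs the handle from the question):
// print_r(cookieJarProblems($cookie_file_path));
// $result = curl_exec($ch);
// if ($result === false) { echo curl_error($ch); } // the request itself failed
```

If the list is empty and `curl_exec()` does not return `false`, it may be worth checking whether the response actually contains any `Set-Cookie` headers.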
Related
I'm trying to get a CAPTCHA image (the old, image-based kind) from a web page. I know it changes and is regenerated on every HTTP request, but I can't get the image via cURL.
I tried the following PHP code:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://example.com/login.aspx');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIESESSION, true);
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.2309.372 Safari/537.36");
curl_setopt($ch, CURLOPT_NETRC, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
$data = curl_exec($ch);
curl_close($ch);
The image just comes back empty. There is a field for the CAPTCHA, but nothing is shown in it. I can't tell whether there is a difference between a browser request and a cURL request.
I want to get the HTML content from this link:
https://store.nike.com/in/en_gb/pw/boys-shoes/7pvZoi3
For this I have written the PHP cURL script below:
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/525.13 (KHTML, like Gecko) Chrome/0.A.B.C Safari/525.13';
$ch = curl_init();
curl_setopt($ch,CURLOPT_URL, 'https://store.nike.com/in/en_gb/pw/boys-shoes/7pvZoi3');
curl_setopt($ch, CURLOPT_HEADER, true);
curl_setopt($ch, CURLOPT_RETURNTRANSFER,true);
curl_setopt($ch, CURLOPT_USERAGENT, $ua);
curl_setopt($ch, CURLOPT_COOKIE, '<Pasted_cookie>');
curl_setopt($ch, CURLOPT_AUTOREFERER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, 20);
$result = curl_exec($ch);
$last = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
curl_close($ch);
print_r($result);
But the script above redirects me to a page that asks me to select a region.
Please help me understand what I need to change to make the script work.
Thanks.
To set the location, the site makes a network call that stores your location in a cookie or somewhere else; the details depend entirely on the website.
What you can do is find that call, reproduce it first to set the location, and then request the main page with the same cookies.
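The two-step flow described above could be sketched roughly as follows. The helper name `requestWithJar` and the example URLs in the usage comment are assumptions for illustration; the real location-setting call has to be found in the browser's network tab.

```php
<?php
// Hypothetical helper: performs one request that reads from and writes to a shared cookie jar.
function requestWithJar(string $url, string $jar, ?array $post = null)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => $jar, // send cookies saved by earlier calls
        CURLOPT_COOKIEJAR      => $jar, // save cookies set by this response
    ]);
    if ($post !== null) {
        // Passing a string sends the fields as application/x-www-form-urlencoded
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    }
    $body = curl_exec($ch); // string on success, false on failure
    curl_close($ch);
    return $body;
}

// Usage idea (commented out; both URLs are placeholders, not real endpoints):
// $jar = sys_get_temp_dir() . '/region_cookies.txt';
// requestWithJar('https://example.com/set-region?country=in', $jar); // step 1: set the location
// $html = requestWithJar('https://example.com/products', $jar);      // step 2: main page, same cookies
```

Because both calls use the same jar file, any cookie set by the first response is sent with the second request.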
I am trying to retrieve the HTML of a user profile on Instagram using cURL.
I am new to cURL, so I do not know the cause of this error.
Nothing happens when the cURL request is executed; the page just seems to refresh.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://www.instagram.com/zohebchaudhry1/');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookiess.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, 'cookiess.txt');
curl_setopt($ch ,CURLOPT_TIMEOUT , 10);
curl_setopt( $ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36" );
$html = curl_exec($ch);
curl_close($ch);
echo $html;
Above is the PHP cURL code.
It appears that cURL is working; however, you can't see the output because echoing the HTML makes the browser render it as a page instead of showing the markup as text.
I suggest replacing echo $html; with echo htmlentities($html);
Read more: php.net/htmlentities
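As a small illustration of the suggestion above, the snippet below escapes a made-up HTML string standing in for the value returned by `curl_exec()`, so that the browser displays the markup instead of rendering it.

```php
<?php
// Stand-in for the string returned by curl_exec(); the markup is invented for the example.
$html = '<h1>Profile</h1><p>Tom & Jerry</p>';

// Convert <, > and & to entities so the browser shows the source as text.
$escaped = htmlentities($html);

echo '<pre>' . $escaped . '</pre>';
```

Note that `curl_exec()` returns `false` on failure, in which case there is nothing to escape and `curl_error($ch)` is the more useful thing to print.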
I'm having a problem with logging in via cURL.
I would like to be able to log in without cookie.txt.
If I remove cookie.txt, I can't log in. When cookie.txt is there, the login succeeds, but I would like to log in without using cookies. I tried unlinking cookie.txt, but as I said, I can't log in then.
Part of the code:
$ret=false;
$useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
$data = setData($email,$pass);
$ch = curl_init('https://www.website.com/login.php');
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING , "gzip,deflate");
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, 1);
curl_setopt($ch, CURLOPT_TIMEOUT, 40);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . '/cookie.txt');
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
$source=curl_exec($ch);
$info=curl_getinfo($ch);
if($info["redirect_count"]==1)
{
$ret=true;
}
You can't log in without cookies, either via cURL or via a browser, unless the site you are logging in to implements a different mechanism for carrying the session ID (for example, as part of the URLs). That is rarely the case, and it doesn't depend on you. The reason is that without the cookie the server can't tell that a request comes from you and not from someone else.
Facebook doesn't implement a login system that doesn't use cookies, so you can't.
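If the goal is only to avoid a cookie.txt file on disk, one option is to keep the cookies in memory for the lifetime of a single cURL handle. Cookies are still used here, in line with the answer above; they just are not written to a file. In libcurl, setting `CURLOPT_COOKIEFILE` to an empty string enables the cookie engine without reading any file. The function names below are hypothetical, and the sketch assumes the login and the follow-up requests happen in the same script run.

```php
<?php
// Hypothetical helper: creates a handle whose cookie engine works in memory only.
function newInMemorySession()
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_COOKIEFILE     => '', // empty string: enable cookies, read no file
    ]);
    return $ch;
}

// Hypothetical helper: sends a GET or POST on the shared handle, reusing its cookies.
function sessionRequest($ch, string $url, ?array $post = null)
{
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($post !== null) {
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post));
    } else {
        curl_setopt($ch, CURLOPT_HTTPGET, true); // switch back to GET after a POST
    }
    return curl_exec($ch);
}

// Usage idea (commented out; the URLs and field names are placeholders):
// $ch = newInMemorySession();
// sessionRequest($ch, 'https://www.website.com/login.php', ['email' => $email, 'pass' => $pass]);
// $page = sessionRequest($ch, 'https://www.website.com/account.php');
```

The cookies are lost as soon as the handle is released, so this does not help if the session has to survive between separate script runs.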
I'm trying to automate login to an ASP.NET site using PHP and cURL, but I'm running into a cookie problem.
When I check in the browser, the initial login page stores 5 cookies: ASP.NET_SessionId, __utma, __utmb, __utmc and __utmz.
When this page is accessed via cURL, the cookie file stores only one cookie: "ASP.NET_SessionId".
I have read many posts and tried all kinds of cURL option combinations, always with the same result.
I don't know how ASP.NET cookies work or how they differ from PHP's. Any help is appreciated.
Here is my PHP code:
$cookie_file_path = "tmp/cookie.txt";
$LOGINURL = "https://godaddy.com";
$agent = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.89 Safari/537.1";
$ch = curl_init();
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_file_path);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie_file_path);
curl_setopt($ch, CURLOPT_URL, $LOGINURL);
$content = curl_exec($ch);
curl_close($ch);
echo '<textarea style="width:1000px; height:300px">'.$content.'</textarea>';
__utma, __utmb, __utmc and __utmz are all Google Analytics cookies set by JavaScript, so they are created client-side.
cURL does not execute JavaScript, so there is no way to obtain them through cURL/PHP.
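To see which cookies the server itself sets, one could list the `Set-Cookie` headers in the raw response (the question's code already enables `CURLOPT_HEADER`, so `$content` includes them). The helper name `setCookieNames` and the sample response below are made up for illustration.

```php
<?php
// Hypothetical helper: returns the names of all cookies set via Set-Cookie headers.
function setCookieNames(string $rawResponse): array
{
    // Match "Set-Cookie: name=" at the start of any header line, case-insensitively.
    preg_match_all('/^Set-Cookie:\s*([^=;\s]+)=/mi', $rawResponse, $matches);
    return $matches[1];
}

// Invented sample standing in for a response fetched with CURLOPT_HEADER enabled.
$sample = "HTTP/1.1 200 OK\r\n"
        . "Set-Cookie: ASP.NET_SessionId=abc123; path=/; HttpOnly\r\n"
        . "Content-Type: text/html\r\n"
        . "\r\n"
        . "<html></html>";

print_r(setCookieNames($sample));
```

Any cookie that shows up in the browser but not in this list was most likely created by a script running on the page.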