How can I make cURL get all cookies?
I thought maybe Firefox gets different cookies as the page loads, or it has some built-in JavaScript that sets cookies after the page is loaded, or maybe it redirects to other pages that set other cookies, but I don't know how to make cURL do the same thing. I set cURL to follow redirects, but still no success. cURL does set some cookies, but not all of them.
Following is the code I use in PHP:
$url = 'https://www.example.com';
$handle = curl_init($url);
curl_setopt($handle, CURLOPT_COOKIESESSION, true);   // start a new cookie session, ignoring stored session cookies
curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
curl_setopt($handle, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
curl_setopt($handle, CURLOPT_COOKIEJAR, "cookies.txt");   // save received cookies to this file
curl_setopt($handle, CURLOPT_COOKIEFILE, "cookies.txt");  // send cookies from this file with requests
curl_setopt($handle, CURLOPT_AUTOREFERER, true);     // set the Referer header automatically when following redirects
curl_setopt($handle, CURLOPT_USERAGENT, 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)');
$htmlContent = curl_exec($handle);
The following is from Live HTTP Headers in Firefox:
https://www.example.com
GET /index.ext HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2.10) Gecko/20100914 Firefox/3.6.10
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Cookie: JSESSIONID=3E85C5D0436D160D0623C085F68DC50E.catalog2; __utma=137925942.1883663033.1299196810.1299196810.1299198374.2; __utmz=137925942.1299196810.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); citrix_ns_id=0pQdumY48kxToPcBPS/QQC+w2vAA1; __utmc=137925942
HTTP/1.1 200 OK
Date: Fri, 04 Mar 2011 01:20:30 GMT
Server: Apache/2.2.15
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html;charset=UTF-8
I only get the JSESSIONID cookie with cURL.
Please help!
Possibly the page you are loading has some other content that actually sets the cookies, and since you are only reading one page you don't get them; or some cookies are set through JavaScript.
Try using a Firefox user agent with cURL and see if you get the same number of cookies. You should.
Use a network sniffer or a proxy to compare the requests and responses; there are differences for sure. Post the requests and responses here if you still can't find them.
If faking the user agent on the cURL side does not work, try the opposite: install a Firefox extension that fakes the user agent and set it to the one used by cURL. If that works, it may be some passive browser fingerprinting (such as p0f by lcamtuf) relying on network timing, and you may have a hard time working around it. That would be extremely surprising, though!
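To see exactly which cookies the server sets over HTTP (as opposed to JavaScript), you can log every Set-Cookie header cURL receives, including those on intermediate redirect responses. A minimal sketch, with a placeholder URL:

$setCookies = array();
$ch = curl_init('https://www.example.com'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, $line) use (&$setCookies) {
    // Collect every Set-Cookie header, including those on 3xx responses
    if (stripos($line, 'Set-Cookie:') === 0) {
        $setCookies[] = trim(substr($line, strlen('Set-Cookie:')));
    }
    return strlen($line); // libcurl requires the consumed byte count back
});
curl_exec($ch);
print_r($setCookies); // any cookie in the browser but missing here is set by JavaScript, not HTTP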
I figured it out. It was actually JavaScript that set the cookies after the page was loaded. :)
Thanks everybody
Related
I want to get the content of this page with PHP cURL:
My cURL sample:
function curll($url, $headers = null) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    if ($headers) {
        curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    }
    curl_setopt($ch, CURLOPT_ENCODING, '');       // advertise all encodings curl supports and decode automatically
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0');
    curl_setopt($ch, CURLOPT_HEADER, 1);          // include response headers in the output
    curl_setopt($ch, CURLINFO_HEADER_OUT, true);  // track the request headers curl actually sent
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    curl_setopt($ch, CURLOPT_TIMEOUT, 60);

    $response = curl_exec($ch);
    $res['headerout'] = curl_getinfo($ch, CURLINFO_HEADER_OUT);
    $res['rescode'] = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    if ($response === false) {
        $res['content'] = $response;
        $res['error'] = array(curl_errno($ch), curl_error($ch));
        return $res;
    }

    // Split the response headers from the body using the reported header size
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $res['headerin'] = substr($response, 0, $header_size);
    $res['content'] = substr($response, $header_size);
    return $res;
}
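For reference, the call that produced the response below (the URL is reconstructed from the request trace in that response):

$res = curll('https://www.cryptocompare.com/wallets');
var_dump($res);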
response:
array (size=4)
'headerout' => string 'GET /wallets HTTP/1.1
Host: www.cryptocompare.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: br
Accept-Language: en-US,en;q=0.5
Connection: keep-alive
Upgrade-Insecure-Requests: 1
' (length=327)
'rescode' => string '200' (length=3)
'content' => boolean false
'error' =>
array (size=2)
0 => int 23
1 => string 'Unrecognized content encoding type. libcurl understands deflate, gzip content encodings.' (length=88)
The response encoding is br and the response content is false.
I am aware that using gzip or deflate as the encoding would get me content. However, the content I have in mind is only served with br encoding.
I read on this page that cURL 7.57.0 added the Brotli compression capability. I currently have version 7.59.0 installed, but cURL encounters an error when it receives content in br encoding.
Now I want to know: how can I get the content of a page with br encoding and uncompress it with PHP cURL?
I had the exact same issue because one server was only able to return Brotli and the cURL version bundled with my PHP didn't support it. I had to use a PHP extension: https://github.com/kjdev/php-ext-brotli
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'URL');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$output_brized = curl_exec($ch);                 // raw Brotli-compressed body
$output_ok = brotli_uncompress($output_brized);  // decompress via the php-ext-brotli extension
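For completeness, a minimal sketch (placeholder URL, and it assumes the php-ext-brotli extension above is loaded) that requests Brotli explicitly and only decompresses when the server actually used it:

$ch = curl_init('https://www.example.com/'); // placeholder URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);  // keep response headers so we can inspect Content-Encoding
curl_setopt($ch, CURLOPT_HTTPHEADER, array('Accept-Encoding: br'));
$response = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$headers = substr($response, 0, $header_size);
$body = substr($response, $header_size);
if (preg_match('/^Content-Encoding:\s*br/mi', $headers)) {
    $body = brotli_uncompress($body);  // from php-ext-brotli
}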
I checked and, with PHP 7.4.9 on Windows with bundled cURL version 7.70.0, setting the CURLOPT_ENCODING option to '' (like you did) made the bundled cURL send the request with one additional header, accept-encoding: deflate, gzip, which are the content encodings that build of cURL can decode. If I omitted this option, there were just two headers: Host: www.google.com and accept: */*.
Indeed, searching the PHP source code (https://github.com/php/php-src/search?q=CURLOPT_ENCODING) for this CURLOPT_ENCODING option turns up nothing that sets a default value or changes the value from PHP. PHP passes the option value to cURL without altering it, so what I am observing is the default behavior of my bundled cURL version.
I then discovered cURL supports Brotli from version 7.57.0 (https://github.com/curl/curl/blob/bf1571eb6ff24a8299da7da84408da31f0094f66/docs/libcurl/symbols-in-versions), released in November 2017 (https://github.com/curl/curl/blob/fd1ce3d4b085e7982975f29904faebf398f66ecd/docs/HISTORY.md), but it requires cURL to be compiled with the --with-brotli flag (https://github.com/curl/curl/blob/9325ab2cf98ceca3cf3985313587c94dc1325c81/configure.ac), which was probably not used for my PHP version.
Unfortunately, there is no curl_getopt() function to read the default value of an option. But phpinfo() gives a valuable hint: I got a BROTLI => No line, which confirms my version was not compiled with Brotli support. You may want to check your phpinfo() output to find out whether your bundled cURL should support Brotli. If it doesn't, use my solution. If it does, more investigation needs to be done to find out whether it's a bug or a misuse.
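A runtime check is also possible, assuming PHP 7.3+ where the CURL_VERSION_BROTLI constant is defined:

// Sketch: detect Brotli support in the bundled libcurl at runtime
$info = curl_version();
$hasBrotli = defined('CURL_VERSION_BROTLI')
    && ($info['features'] & CURL_VERSION_BROTLI);
var_dump($hasBrotli); // false means curl was built without --with-brotli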
If you want to know what your cURL sent, you have to use a proxy like Charles or Fiddler, or use cURL's verbose mode.
Additionally, for the sake of completeness, the HTTP/1.1 spec (https://www.rfc-editor.org/rfc/rfc2616#page-102) says:
If an Accept-Encoding field is present in a request, and if the
server cannot send a response which is acceptable according to the
Accept-Encoding header, then the server SHOULD send an error response
with the 406 (Not Acceptable) status code.
If no Accept-Encoding field is present in a request, the server MAY
assume that the client will accept any content coding.
So, if your PHP version behaved the same as mine, the website should have received an Accept-Encoding header not containing br, so it should NOT have replied with br content; instead, it should have replied with gzip or deflate content or, if it was not able to do so, replied with a 406 Not Acceptable instead of a 200.
If you are using Cloudflare, you can try disabling Brotli in your Cloudflare settings.
There is an add-on for Firefox called httprequester (https://addons.mozilla.org/en-US/firefox/addon/httprequester/).
When I use the addon to send a GET request with a specific cookie, everything works fine.
Request header:
GET https://store.steampowered.com/account/
Cookie: steamLogin=*removed because of obvious reasons*
Response header:
200 OK
Server: Apache
... (continued, not important)
And then I am trying to do the same thing with cURL:
$ch = curl_init("https://store.steampowered.com/account/");
curl_setopt($ch, CURLOPT_VERBOSE, 1);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, array("Cookie: steamLogin=*removed because of obvious reasons*"));
curl_setopt($ch, CURLINFO_HEADER_OUT, 1);
$response = curl_exec($ch);
$request_header = curl_getinfo($ch, CURLINFO_HEADER_OUT);
echo "<pre>$request_header</pre>";
echo "<pre>$response</pre>";
Request header:
GET /account/ HTTP/1.1
Host: store.steampowered.com
Accept: */*
Cookie: steamLogin=*removed because of obvious reasons*
Response header:
HTTP/1.1 302 Moved Temporarily
Server: Apache
... (continued, not important)
I don't know if it has anything to do with my problem, but one thing I noticed is that the first lines of the request headers are different:
GET https://store.steampowered.com/account/
and
GET /account/ HTTP/1.1
Host: store.steampowered.com
My problem is that I get a 200 HTTP code with the add-on and a 302 with cURL, even though I'm sending (or trying to send) the same request.
The page is doing a redirect, so you must follow it:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
If I understand your problem correctly, cURL is not following the redirect. It doesn't do that by default; you need to set an option:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
With this, cURL is able to follow the redirects.
To set the cookie on the request, use CURLOPT_COOKIE (it takes only the cookie data, without the "Cookie:" prefix), and pass the user agent separately:
curl_setopt($ch, CURLOPT_COOKIE, "steamLogin=*removed because of obvious reasons*");
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0");
I think your add-on sends the browser's user-agent string by default. If you add a user-agent string to your cURL request, I believe your problem will be resolved!
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
"Cookie: steamLogin=*removed because of obvious reasons*",
"User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0"
));
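Putting both answers together, a minimal sketch (the cookie value is a placeholder) that follows the redirect and sends both the cookie and a browser user agent:

$ch = curl_init("https://store.steampowered.com/account/");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow the 302 instead of stopping at it
curl_setopt($ch, CURLOPT_COOKIE, "steamLogin=PLACEHOLDER");  // placeholder value
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:43.0) Gecko/20100101 Firefox/43.0");
$response = curl_exec($ch);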
My URL "www.example.com" works in the browser, but when I request "www.example.com" via cURL I get a 503 Service Unavailable response.
I used the following code:
$url = 'http://www.example.com';
$curl_handle = curl_init();
curl_setopt($curl_handle, CURLOPT_URL, $url);
curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 0);  // 0 = wait indefinitely to connect
curl_setopt($curl_handle, CURLOPT_TIMEOUT, 0);         // 0 = no overall timeout
curl_setopt($curl_handle, CURLOPT_SSL_VERIFYPEER, FALSE);
curl_setopt($curl_handle, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);  // forward the visitor's user agent
curl_setopt($curl_handle, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, TRUE);
$JsonResponse = curl_exec($curl_handle);
$http_code = curl_getinfo($curl_handle);  // note: without a second argument this returns the full info array
print_r($http_code); die;
I'm pretty sure the remote server requires specific HTTP headers (cookies for example), like a session token or a language preference.
You have to analyze the HTTP traffic sent from your browser to the remote server and find the required HTTP headers yourself. I recommend a tool like Fiddler.
An example:
GET / HTTP/1.1
Host: www.example.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Cookie: foo=bar
Connection: keep-alive
Assuming the remote server requires clients to send a cookie named foo, it will probably send you a 503 or 400 error message if you omit it. You have to send the cookie from cURL as well in order to get a successful response, acting like a regular client.
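A minimal sketch of replaying the captured headers with cURL (the header and cookie values above are examples, so treat these as placeholders):

$ch = curl_init('http://www.example.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_ENCODING, '');  // advertise gzip/deflate and decode automatically
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:27.0) Gecko/20100101 Firefox/27.0');
curl_setopt($ch, CURLOPT_HTTPHEADER, array(
    'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language: en-US,en;q=0.5',
));
curl_setopt($ch, CURLOPT_COOKIE, 'foo=bar');  // the example cookie the server expects
$body = curl_exec($ch);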
I am using the following PHP code to send a GET request with specific headers and a cookie:
$getheader = array(
    "Accept: text/html, application/xhtml+xml, */*",
    "Accept-Language: en-US",
    "User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)",
    "Accept-Encoding: gzip, deflate",
    "Host: mysite.com",
    "Connection: Keep-Alive"
);

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://mysite.com');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_HTTPHEADER, $getheader);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); // read cookies from this file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_exec($ch);
It works fine, but the headers are sent in the wrong order, like the following:
GET http://mysite.com/ HTTP/1.1
Cookie: remember_me=1; id=9089018083 <------ this line should be at the end
Accept: text/html, application/xhtml+xml, */*
Accept-Language: en-US
User-Agent: Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Accept-Encoding: gzip, deflate
Host: mysite.com
Connection: Keep-Alive
The cookie should be sent after the other headers (as web browsers do), but I don't know what's wrong in my case. Can you please help?
Thanks
Why "should" it be at the bottom?
The HTTP RFC states, amongst other things:
that the capitalization of header keys has no importance
that the order of the headers has no importance
All this is stated quite clearly in RFC 2616 (HTTP/1.1), page 31:
The order in which header fields with differing field names are
received is not significant. However, it is "good practice" to send
general-header fields first, followed by request-header or response-
header fields, and ending with the entity-header fields.
So, whilst cURL is not producing the output you expect, it is not doing anything wrong. The order is arbitrary, and the reason cURL does this is that it processes the cookie jar first and then lets you override any headers you like at the very end using the CURLOPT_HTTPHEADER setting.
So, really, if your code is picky about header order, you need to teach it not to worry, as different browsers will send headers in different orders. Ultimately: be lenient on reception, strict on emission.
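As an illustration of being lenient on reception, a small sketch that parses a raw header block into a case-insensitive map, so nothing downstream depends on header order or capitalization:

function parse_headers($raw) {
    $headers = array();
    foreach (explode("\r\n", $raw) as $line) {
        if (strpos($line, ':') !== false) {
            list($name, $value) = explode(':', $line, 2);
            // lowercase the key so lookups ignore capitalization and order
            $headers[strtolower(trim($name))] = trim($value);
        }
    }
    return $headers;
}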
I was trying to log in to a site using the PHP cURL library. Even though I have been successfully logged in, I can't seem to access any other pages beyond the login page. Now, I know there could be some issue with cookies, but trust me, I've tried all possible combinations of COOKIEJAR and COOKIEFILE.
I need some help analyzing this set of LiveHTTPHeaders info. I'm worried about the POST fields, particularly Login.x and Login.y. They seem to change on every login. Could that be an issue? How do I figure out how a random integer is being assigned to this value? Also, is more than one cookie being added? If so, how do I incorporate that into cURL? Do I use one COOKIEJAR, multiple ones, or name a number of cookies in a single statement?
I've pasted the headers below:
http://amizone.net/Amizone/default.aspx
POST /Amizone/default.aspx HTTP/1.1
Host: amizone.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Referer: http://amizone.net/
Cookie: ASPSESSIONIDSSBCDQAQ=FJHPMILBALMDGIFEOOOBNFHI
Content-Type: application/x-www-form-urlencoded
Content-Length: 55
username=1596681&password=CENSORED&Login.x=14&Login.y=15
I will only post the cURL code if needed.
LiveHTTPHeaders info for HOME PAGE:
GET /amizone/default.aspx HTTP/1.1
Host: amizone.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
LiveHTTPHeaders info for LOGIN: ** Shown on top. No changes.
LiveHTTPHeaders info for ANY PAGE ACCESS AFTER LOGIN--
GET /amizone/WebForms/TimeTable/StudentTimeTableForWeek.aspx HTTP/1.1
Host: amizone.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0.2) Gecko/20100101 Firefox/6.0.2
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Connection: keep-alive
Referer: http://amizone.net/amizone/WebForms/Home.aspx
Cookie: ASP.NET_SessionId=hn5mfsre0y3b1l45nxlgzr55; UserId=127953D3849DEF71FB6CF9F55DD3BBADE48E686D24ADC87923FB6C60077ECC0362AB0C5A9C4DF194461C348DBAE6FEC861827F886FE2C17EA79155500CA4FC04EE897B7658A59DA2F286F2436F6EDD07BE2DD7DD829798F4C81ABAEFEE400B3A71078A74BF1C169BF1DA2865CC9E5968FF26ED7D; countrytabs=0; countrytabsRT=0; countrytabsRB=0
*** Notice how multiple cookies are sent in this case (I think). How should my COOKIEJAR and COOKIEFILE options change?
When recording a session it is important that you first flush all cookies and then make sure you note when cookies are set by the server.
Very often, the required cookies are set in the login page or another page that the browser loads first, and then when you POST to the particular URL the previously set cookies must be passed on.
So, the attached trace is insufficient.
This cURL code has been sufficient for me in the past to maintain login sessions by storing cookies:
$ch = curl_init('https://somesecureurl.com/login/account');
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_ENCODING, 'gzip, deflate');
curl_setopt($ch, CURLOPT_COOKIEFILE, '/tmp/hmcookies.txt'); // cookies in this file are sent by curl with the request
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/hmcookies.txt'); // upon completing request, curl saves/updates any cookies in this file
$data = curl_exec($ch);
Other things to ensure: the cookie jar file is writable by the webserver, or the webserver has permission to create it.
As also stated by Daniel, some sites may require that you first visit a login page to get some initial cookies set, and then post the login form. So your requests may go:
Request login page
Post to login form
Try to access protected page
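A minimal sketch of that three-step flow with a shared cookie jar (URLs and form fields are taken from the trace above; treat them as placeholders):

function fetch($url, $jar, $post = null) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // send previously stored cookies
    curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // persist any new or updated cookies
    if ($post !== null) {
        curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($post)); // switches the request to POST
    }
    $body = curl_exec($ch);
    curl_close($ch);
    return $body;
}

$jar = '/tmp/amizone_cookies.txt'; // must be writable by the webserver

// 1. Request the login page so the initial session cookie gets stored
fetch('http://amizone.net/Amizone/default.aspx', $jar);

// 2. Post to the login form (field names from the captured trace)
fetch('http://amizone.net/Amizone/default.aspx', $jar, array(
    'username' => '1596681',
    'password' => 'CENSORED',
    'Login.x'  => '14',
    'Login.y'  => '15',
));

// 3. Try to access a protected page with the session cookies in place
$page = fetch('http://amizone.net/amizone/WebForms/TimeTable/StudentTimeTableForWeek.aspx', $jar);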