Usually, when using a browser, session cookies expire when the browser window is closed.
But when using (PHP) cURL with the CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR options set, how long are they kept alive?
According to mozilla.org:
session cookie [...] is deleted when the client shuts down, because it didn't specify an Expires or Max-Age directive. However, web browsers may use session restoring, which makes most session cookies permanent, as if the browser was never closed.
According to the documentation of curl_setopt function:
By default, libcurl always stores and loads all cookies, independent if they are session cookies or not. Session cookies are cookies without expiry date and they are meant to be alive and existing for this "session" only.
If you save a cookie in a specified file with
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'https://stackoverflow.com');
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . '/cookie.txt');
$output = curl_exec($ch);
curl_close($ch);
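Then, from the client's perspective, the session stays active for as long as your script keeps sending that cookie back (by pointing CURLOPT_COOKIEFILE at the saved file) and the server still accepts it. How long that lasts is a choice your script makes.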
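A minimal sketch of what a later run could look like, assuming the cookie.txt path from the snippet above; the helper name cookieOptions is made up for illustration. Pointing CURLOPT_COOKIEFILE at the saved file makes cURL load the stored cookies again, and setting CURLOPT_COOKIESESSION to true tells it to ignore the session cookies in that file, which is roughly the equivalent of closing the browser.

```php
<?php
// Hypothetical helper: builds the cookie-related options for a curl handle.
function cookieOptions(string $jarPath, bool $startNewSession = false): array
{
    return [
        CURLOPT_COOKIEFILE     => $jarPath,         // load cookies saved by an earlier run
        CURLOPT_COOKIEJAR      => $jarPath,         // write cookies back when the handle is closed
        CURLOPT_COOKIESESSION  => $startNewSession, // true: skip session cookies while loading
        CURLOPT_RETURNTRANSFER => true,             // return the body instead of printing it
    ];
}

$ch = curl_init('https://stackoverflow.com');
curl_setopt_array($ch, cookieOptions(dirname(__FILE__) . '/cookie.txt'));
// $output = curl_exec($ch); // uncomment to actually perform the request
curl_close($ch);
```

Passing true as the second argument gives you a "fresh browser" run while still keeping any persistent cookies stored in the file.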
As I understand it, curl uses the cookiefile parameter to read the cookies, and cookiejar to save them once the curl session is completed.
Typical examples for using this indicate a file must be used, but I don't want to need manual cleanup of these leftover bits.
For example, if I set the cookiejar to a file using tempnam, I will inevitably end up with a directory full of little cookiejars that I will need to clean up.
If the user properly logs out, I can of course delete this temp file, but I'm counting on the majority of users just closing the browser window and the session (eventually) expiring, leaving me with no way to delete the cookie jar automatically.
My best idea thus far is to splash a cookiejar into the temp folder, read it into a session variable, and then delete the cookiejar file every time curl is used.
Other users' implementations obviate the cookiejar by parsing header information, but that is a little more involved than I want to get.
Decided to go the temp-file wraparound method. Assuming your curl handler is named $c:
//Put down the cookieJar
$cookieJar = tempnam(sys_get_temp_dir(),"cookie-");
if (isset($_SESSION['c_Cookies'])) file_put_contents($cookieJar, $_SESSION['c_Cookies']);
curl_setopt($c, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookieJar);
and at the end of the script:
//Always destroy curl, just in case...
curl_close ($c);
unset($c);
//And pickup the cookieJar
$_SESSION['c_Cookies'] = file_get_contents($cookieJar);
unlink($cookieJar);
This of course assumes the system temp directory is writable by whoever is running PHP. It ensures that the cookiejar is always deleted at the end of the script, as long as the script does not terminate prematurely.
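If premature termination is a concern, one option is to wrap the jar handling in try/finally so the temp file is removed even when the request code throws. This is only a sketch under that assumption: withCookieJar is a hypothetical helper name, the callback stands in for whatever cURL work the script does, and it does not cover fatal errors or a killed process.

```php
<?php
// Hypothetical helper: write the stored cookies to a temp jar, run $work with
// the jar path, then return the jar's contents and delete the file.
function withCookieJar(?string $storedCookies, callable $work): string
{
    $cookieJar = tempnam(sys_get_temp_dir(), 'cookie-');
    try {
        if ($storedCookies !== null) {
            file_put_contents($cookieJar, $storedCookies);
        }
        // Set up, run and close the curl handle inside the callback,
        // so that cURL has flushed the jar before it is read back.
        $work($cookieJar);
        return (string) file_get_contents($cookieJar);
    } finally {
        // Runs on normal return and when $work throws.
        if (is_file($cookieJar)) {
            unlink($cookieJar);
        }
    }
}
```

The return value would go back into $_SESSION['c_Cookies'], and the existing value (or null on the first request) would be passed in as the first argument.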
Instead of parsing them, you could just pass-through the headers between your clients and the other server. Just remember to add a regex replace for the "domain=[^;]+" part in the set-cookie case.
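A rough sketch of that replace, assuming the proxy wants to drop the Domain attribute entirely so that the cookie defaults to the proxy's own host; the function name is made up here. The pattern from the text is extended to also swallow the preceding semicolon and to match case-insensitively.

```php
<?php
// Hypothetical helper: remove the Domain attribute from a Set-Cookie value.
function stripCookieDomain(string $setCookieValue): string
{
    // Matches "; Domain=..." up to the next attribute separator.
    return preg_replace('/;\s*domain=[^;]*/i', '', $setCookieValue);
}

echo stripCookieDomain('sid=abc123; Domain=.example.com; Path=/; HttpOnly');
// prints: sid=abc123; Path=/; HttpOnly
```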
I have a thought - is there a possibility to get cookies from a curl POST request to a particular page?
I mean, I create the curl request with post fields, and then get the response of the page - but if the page has some cookies set as HttpOnly I won't be able to see them, right?
cURL can save cookies in a .txt file; I did that and it works well. But how can I split that .txt file so I can save each cookie in a database?
You can store the cookies in a .txt file that gets a different name each time (based on the time or date, for example), or you can do this:
$dir = dirname(__file__);
$config['cookie_file'] = $dir . '/cookies/' . md5($_SERVER['REMOTE_ADDR']) . '.txt';
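This saves each user's cookies in the cookies folder, in a file named after the MD5 hash of the user's IP address.
Or, for storing them directly in a database:
// tempnam() takes a directory and a filename prefix, and already generates a unique name
$cookie = tempnam(sys_get_temp_dir(), 'cookie');
and then, after each request, you can send the cookies to the database.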
TIP:
cURL(cookie)
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cookie);
SQL(storing)
Use INSERT for adding cookies to database.
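To split the .txt file into individual cookies, something like the following might do. It is a sketch that assumes the jar is in the Netscape format cURL writes: one cookie per line, seven tab-separated fields, and a #HttpOnly_ prefix on HttpOnly cookies (so those do show up in the jar). The function name and array keys are my own choice.

```php
<?php
// Parse a Netscape-format cookie jar (as written by CURLOPT_COOKIEJAR)
// into one associative array per cookie.
function parseCookieJar(string $contents): array
{
    $cookies = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $httpOnly = false;
        if (strncmp($line, '#HttpOnly_', 10) === 0) {
            $httpOnly = true;               // cURL marks HttpOnly cookies with this prefix
            $line = substr($line, 10);
        } elseif ($line === '' || $line[0] === '#') {
            continue;                       // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue;                       // not a cookie line
        }
        [$domain, $subdomains, $path, $secure, $expires, $name, $value] = $fields;
        $cookies[] = [
            'domain'    => $domain,
            'path'      => $path,
            'secure'    => $secure === 'TRUE',
            'expires'   => (int) $expires,  // 0 means session cookie
            'name'      => $name,
            'value'     => $value,
            'http_only' => $httpOnly,
        ];
    }
    return $cookies;
}
```

Each returned row can then go into the database with a prepared INSERT, for example parseCookieJar(file_get_contents($cookie)) after the curl handle has been closed.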
URL1: https://duapp3.drexel.edu/webtms_du/
URL2: https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX
URL3: https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX
As a personal programming project, I want to scrape my University's course catalog and provide it as a RESTful API.
However, I'm running into the following issue.
The page that I need to scrape is URL3. But URL3 only returns meaningful information after I visit URL2 (it sets the term there Colleges.asp?Term=201125), but URL2 can only be visited after visiting URL1.
I tried monitoring the HTTP data going to and fro using Fiddler and I don't think they are using cookies. Closing the browser instantly resets everything, so I suspect they are using Session.
How can I scrape URL3? I tried programmatically visiting URLs 1 and 2 first and then calling file_get_contents(url3), but that doesn't work (probably because it registers as three different sessions).
A session needs a mechanism to identify you as well. Popular methods include: cookies, session id in URL.
A curl -v on URL 1 reveals a session cookie is indeed being set.
Set-Cookie: ASPSESSIONIDASBRRCCS=LKLLPGGDFBGGNFJBKKHMPCDA; path=/
You need to send this cookie back to the server on any subsequent requests to keep your session alive.
If you want to use file_get_contents, you need to manually create a context for it with stream_context_create to include cookies with the request.
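Roughly, that could look like the sketch below. The helper name cookieContext is hypothetical, and the session id is just the one from the curl -v output above; in a real script you would have to read the current value from the response to URL 1 (for example from $http_response_header).

```php
<?php
// Hypothetical helper: build a stream context that sends the given cookies.
// The 'http' options are also used for https:// URLs.
function cookieContext(array $cookies)
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => 'Cookie: ' . implode('; ', $pairs) . "\r\n",
        ],
    ]);
}

$context = cookieContext(['ASPSESSIONIDASBRRCCS' => 'LKLLPGGDFBGGNFJBKKHMPCDA']);
// $html = file_get_contents($url, false, $context); // pass the context as the third argument
```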
An alternative (which I would personally prefer) would be to use curl functions conveniently provided by PHP. (It can even take care of the cookie traffic for you!) But that's just my preference.
Edit:
Here's a working example to scrape the path in your question.
$scrape = array(
"https://duapp3.drexel.edu/webtms_du/",
"https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX",
"https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX"
);
$data = '';
$ch = curl_init();
// Set cookie jar to temporary file, because, even if we don't need them,
// it seems curl does not store the cookies anywhere otherwise or include
// them in subsequent requests
curl_setopt($ch, CURLOPT_COOKIEJAR, tempnam(sys_get_temp_dir(), 'curl'));
// We don't want direct output by curl
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Then run along the scrape path
foreach ($scrape as $url) {
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
}
curl_close($ch);
echo $data;
I would like to log in to a site. The first time I request a page, it redirects me to another page that sets the cookies.
I am following a tutorial where they specify doing this
$cookie = '/tmp/cookies.txt';
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
When I check with Live HTTP Headers, the server does send the information to set my cookies.
But I don't see it doing anything. When I examine the cookies, those values aren't there.
So do I have to specify another path for $cookie?
You have to use CURLOPT_COOKIEFILE instead of CURLOPT_COOKIE to send cookies from a file.
From the docs for function curl_setopt():
CURLOPT_COOKIE
The contents of the "Cookie: " header to be used in the HTTP request. Note that multiple cookies are separated with a semicolon followed by a space (e.g., "fruit=apple; colour=red")
CURLOPT_COOKIEFILE
The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
CURLOPT_COOKIEJAR
The name of a file to save all internal cookies to when the handle is closed, e.g. after a call to curl_close.
I'm using curl to get the contents of a webpage. The website sets cookies when I visit it in a browser.
Can I use cURL the same way and send a request to that specific website with the cookie information?
Here are some of the options I found useful regarding curl and cookies.
curl_setopt($ch, CURLOPT_COOKIEFILE, 'cookie.txt'); // read cookies from this file
curl_setopt($ch, CURLOPT_COOKIEJAR, '/tmp/cookie.jar'); // cookies are saved here when the handle is closed
curl_setopt($ch, CURLOPT_COOKIE, "cookie_test=yes"); // send this name=value pair in the Cookie header (attributes such as domain or path do not belong here)