As I understand it, curl uses the cookiefile parameter to read the cookies, and cookiejar to save them once the curl session is completed.
Typical examples for using this indicate a file must be used, but I don't want to need manual cleanup of these leftover bits.
For example, if I set the cookiejar to a file using tempnam, I will inevitably end up with a directory full of little cookiejars that I will need to clean up.
If the user properly logs out, I can of course delete this temp file, but I expect most users to just close the browser window and let the session (eventually) expire, leaving me with no way to delete the cookie jar automatically.
My best idea thus far is to splash a cookiejar into the temp folder, read it into a session variable, and then delete the cookiejar file every time curl is used.
Other users' implementations avoid the cookiejar altogether by parsing the header information, but this is a little more involved than I want to get.
I decided to go with the temp-file wraparound method. Assuming your curl handle is named $c:
//Put down the cookieJar
$cookieJar = tempnam(sys_get_temp_dir(),"cookie-");
if (isset($_SESSION['c_Cookies'])) file_put_contents($cookieJar, $_SESSION['c_Cookies']);
curl_setopt($c, CURLOPT_COOKIEJAR, $cookieJar);
curl_setopt($c, CURLOPT_COOKIEFILE, $cookieJar);
and at the end of the script:
//Always close and release the curl handle first: the cookie jar file is only written when the handle is closed/freed
curl_close($c);
unset($c);
//And pickup the cookieJar
$_SESSION['c_Cookies'] = file_get_contents($cookieJar);
unlink($cookieJar);
This of course assumes the system temp directory is writable by whoever is running php. It should ensure that the cookiejar is always deleted at the end of the script, so long as said script does not terminate prematurely.
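To cover the "terminates prematurely" case for ordinary exceptions, the same idea can be wrapped in try/finally. This is only a sketch: withCookieJar() and its parameters are hypothetical names, not part of curl or PHP. It also assumes the callback closes its curl handle before returning, because curl only writes the jar file at that point.

```php
<?php
// Hypothetical helper: lends a temporary cookie jar file to $work and
// always removes the file afterwards, even if $work throws.
function withCookieJar(array &$session, callable $work)
{
    $jar = tempnam(sys_get_temp_dir(), 'cookie-');
    if ($jar === false) {
        throw new RuntimeException('Could not create a temporary cookie jar');
    }
    try {
        // Put down the cookie jar from the previous request, if any
        if (isset($session['c_Cookies'])) {
            file_put_contents($jar, $session['c_Cookies']);
        }
        // $work receives the jar path and must close its curl handle itself
        $result = $work($jar);
        // Pick the cookie jar back up into the session
        $contents = file_get_contents($jar);
        if ($contents !== false) {
            $session['c_Cookies'] = $contents;
        }
        return $result;
    } finally {
        if (is_file($jar)) {
            unlink($jar);
        }
    }
}
```

It would be called as withCookieJar($_SESSION, function (string $jar) { ... }), setting both CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE to $jar inside the callback. A fatal error or exit() inside the callback can still leave the file behind.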
Instead of parsing them, you could just pass the headers through between your clients and the other server. Just remember to add a regex replace for the "domain=[^;]+" part in the Set-Cookie case, since the browser will not accept a cookie whose Domain attribute names the other server.
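A minimal sketch of that regex replace, assuming the Set-Cookie value has already been read from the upstream response. stripCookieDomain() is a hypothetical name, and the pattern is slightly extended to also remove the leading semicolon.

```php
<?php
// Hypothetical helper: removes the Domain attribute from a Set-Cookie value
// so the browser scopes the cookie to the proxying host instead.
function stripCookieDomain(string $setCookieValue): string
{
    $stripped = preg_replace('/;\s*domain=[^;]+/i', '', $setCookieValue);
    // preg_replace() returns null on a regex error; fall back to the input
    return $stripped ?? $setCookieValue;
}

echo stripCookieDomain('sid=abc; Domain=example.com; Path=/; HttpOnly'), PHP_EOL;
```

The result can then be forwarded with header('Set-Cookie: ' . $value, false) so that multiple cookies are not overwritten.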
I would like to know if it's possible to add a cookie with a name, value, domain, path, secure flag, HTTP-only flag and expiry before executing the curl request.
I've been looking for this, and all I found were ways to set the name and value of the cookie. I also found a lot of ways to add it by using a file, but I would like to add the cookie without a file.
Another question related to the topic:
If I init the curl handle to make a GET request and then, without closing the handle, make a POST, is it possible to use the cookies that the GET request received when making the POST (without a file)?
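You can do this with curl_setopt, but note that CURLOPT_COOKIE only sets the Cookie request header, so it carries plain name=value pairs; attributes such as Domain, Path, Secure, HttpOnly and Expires are not interpreted there. To add a cookie with attributes to the handle's in-memory cookie store, pass a Set-Cookie style line to CURLOPT_COOKIELIST. Example:
<?php
curl_setopt($ch, CURLOPT_COOKIELIST, "Set-Cookie: <cookie-name>=<cookie-value>; Domain=<domain-value>; Path=<path-value>; Secure; HttpOnly; Expires=<date>");
More on how to format the Set-Cookie header can be found in the MDN docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie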
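One way to check what a handle actually holds, without any file or request, is to seed a cookie through CURLOPT_COOKIELIST and read the store back with CURLINFO_COOKIELIST. The following is a sketch; the cookie name, value, domain and path are made-up example values.

```php
<?php
$ch = curl_init();

// Seed the in-memory cookie store with a Set-Cookie style line.
// This also switches on curl's cookie engine for the handle.
curl_setopt(
    $ch,
    CURLOPT_COOKIELIST,
    'Set-Cookie: session_id=12345; Domain=example.com; Path=/api; HttpOnly'
);

// Read back every cookie curl knows about, one Netscape-format line each
$cookies = curl_getinfo($ch, CURLINFO_COOKIELIST);
foreach ($cookies as $line) {
    echo $line, PHP_EOL;
}

curl_close($ch);
```

Cookies whose Expires date is already in the past are dropped by curl, so an expired test cookie will not show up in the list.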
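For your second question, you simply need to set CURLOPT_COOKIEFILE to an empty string. This enables curl's cookie engine without reading any file, so cookies received by one request are sent with later requests on the same handle: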
<?php
curl_setopt($curl, CURLOPT_COOKIEFILE, "");
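Put together, the GET-then-POST flow on one handle could look roughly like this. getThenPost() is a hypothetical helper, and the URLs and form fields are whatever your application uses.

```php
<?php
// Hypothetical helper: runs a GET, then a POST, on the same curl handle so
// that cookies set by the GET response are sent along with the POST.
function getThenPost(string $getUrl, string $postUrl, array $fields): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');       // in-memory cookies only
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return bodies as strings
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);

    // First request: GET
    curl_setopt($ch, CURLOPT_URL, $getUrl);
    $getBody = curl_exec($ch);

    // Second request: POST, reusing the handle and therefore its cookies
    curl_setopt($ch, CURLOPT_URL, $postUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $postBody = curl_exec($ch);

    curl_close($ch);

    // Each entry is the response body, or false if that request failed
    return [$getBody, $postBody];
}
```

The cookies live only as long as the handle; once it is closed they are gone, which is the point of not using a file.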
I was wondering if someone knew the PHP cURL equivalent of running this from a terminal:
curl --cookie "session_id=12345" http://www.example.com
I would prefer to do it without a cookies.txt file, just by passing a cookie key/value pair in the PHP curl calls. Please let me know if this makes sense, otherwise I can clarify further. I'm using this to connect to an API that requires sending a session variable via a cookie.
MORE CLARIFICATION:
The spec specifies this...
"The first thing that has to be done is to login. The response has a session id in it. This should be stored and used for subsequent calls. This should be added as a cookie, session_id, for further calls into the API."
You want CURLOPT_COOKIE, as specified on the curl_setopt page.
$ch = curl_init('http://www.example.com');
curl_setopt($ch, CURLOPT_COOKIE, 'session_id=12345');
curl_exec($ch);
curl_close($ch);
For multiple cookies, separate with a semicolon and a space:
curl_setopt($ch, CURLOPT_COOKIE, 'session_id=12345; fruit=apple');
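If the cookies are held in an array, the string can be assembled rather than typed by hand. A small sketch follows; buildCookieHeader() is a hypothetical name, and it assumes the values contain no semicolons or other characters that would need encoding.

```php
<?php
// Hypothetical helper: turns ['name' => 'value', ...] into the
// "name=value; name2=value2" form that CURLOPT_COOKIE expects.
function buildCookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    return implode('; ', $pairs);
}

$ch = curl_init('http://www.example.com');
curl_setopt($ch, CURLOPT_COOKIE, buildCookieHeader([
    'session_id' => '12345',
    'fruit'      => 'apple',
]));
```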
You may be looking for the following flags:
CURLOPT_COOKIESESSION
And:
CURLOPT_COOKIE
CURLOPT_COOKIEFILE
CURLOPT_COOKIEJAR
Good evening!
I have a PHP script which makes a cURL call to a remote host's login page.
After logging in and keeping the session via the cookiejar and cookiefile options, I use the same cURL handle to log in on the immediately following page, which needs an upload.
When that's done, I have the full session and I can call any page I want from the site, but only from cURL!
The idea is that this script, which uses cURL, finally needs to redirect the browser to one of those pages on the remote host using the cURL session, but this is not possible, because cURL cannot show its results as a redirected page.
So I've tried a lot of options. None of them work at all.
Schema:
PHP script on a local server.
Call to domain.com/loggin.php (creates the curl handle, ch).
Keep the curl session in a cookie.txt file.
Call to domain.com/loggin_2.php with the same ch (the previous handle is not closed).
Fully logged in on the remote site.
Back to the PHP script. Need to redirect to domain.com/index.php, which needs the session variables filled in by the full login process.
What to do then?
1) After being fully logged in, read the cookies.txt file to get the PHPSESSID.
Then tried to use setcookie(), or header("Set-Cookie: ..."), and immediately after, header("Location: domain.com/index.php").
Doesn't work.
2) Tried the same thing via an ajax call and finally document.cookie = ...
Doesn't work.
3) Added a third cURL call to a file on my remote host which prints a JSON-encoded $_SESSION.
Fetched it in my PHP script, decoded it and loaded it into my local session via a foreach over every array value (foreach()...$_SESSION[$c] = $v).
Added a session_start() before this foreach, and immediately after, a header("Location: domain.com/index.php").
Doesn't work.
4) Added a session_write_close() before the header("Location: domain.com/index.php").
Doesn't work.
So I don't really know how to use the CURL session.
I've tried manually setting the PHPSESSID via the Web Developer Firefox plugin, using the session id that curl generated. It works perfectly. So it should be possible to set it from my PHP script! But I can't!
Give me a hand, please!
Thanks!
I may have gotten lost a bit, but I think I understand.
You can use
CURLOPT_HEADER for some debugging (will contain current redirected page info)
and CURLOPT_FOLLOWLOCATION like so:
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://domain.com/login.php');
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
I also use
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
to return as a string, which is much more useful for debugging, or parsing.
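With CURLOPT_HEADER and CURLOPT_RETURNTRANSFER both on, the returned string holds the headers of every hop followed by the final body. A sketch of pulling those apart, assuming the header length comes from curl_getinfo($ch, CURLINFO_HEADER_SIZE); splitHeaderBlocks() is a hypothetical name.

```php
<?php
// Hypothetical helper: separates the header blocks (one per redirect hop)
// from the body of a response fetched with CURLOPT_HEADER enabled.
function splitHeaderBlocks(string $raw, int $headerSize): array
{
    $headers = substr($raw, 0, $headerSize);
    $body    = substr($raw, $headerSize);

    // Each hop's headers end with a blank line (CRLF CRLF)
    $hops = explode("\r\n\r\n", trim($headers));

    return ['hops' => $hops, 'body' => $body];
}
```

After $raw = curl_exec($ch), call splitHeaderBlocks($raw, curl_getinfo($ch, CURLINFO_HEADER_SIZE)) and inspect each entry in 'hops' to see the Location and Set-Cookie headers the remote site sent at each step.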
URL1: https://duapp3.drexel.edu/webtms_du/
URL2: https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX
URL3: https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX
As a personal programming project, I want to scrape my University's course catalog and provide it as a RESTful API.
However, I'm running into the following issue.
The page that I need to scrape is URL3. But URL3 only returns meaningful information after I visit URL2 (the term is set there via Colleges.asp?Term=201125), and URL2 can only be visited after visiting URL1.
I tried monitoring the HTTP data going to and fro using Fiddler and I don't think they are using cookies. Closing the browser instantly resets everything, so I suspect they are using Session.
How can I scrape URL3? I tried programmatically visiting URLs 1 and 2 first, and then doing file_get_contents(url3), but that doesn't work (probably because it registers as three different sessions).
A session needs a mechanism to identify you as well. Popular methods include: cookies, session id in URL.
A curl -v on URL 1 reveals a session cookie is indeed being set.
Set-Cookie: ASPSESSIONIDASBRRCCS=LKLLPGGDFBGGNFJBKKHMPCDA; path=/
You need to send this cookie back to the server on any subsequent requests to keep your session alive.
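What has to be sent back is only the name=value part of that header, without attributes such as path. A sketch of extracting it from a raw header line; cookiePairFromSetCookie() is a hypothetical name.

```php
<?php
// Hypothetical helper: returns the "name=value" part of a Set-Cookie header
// line, or null if the line is not a Set-Cookie header.
function cookiePairFromSetCookie(string $headerLine): ?string
{
    if (!preg_match('/^Set-Cookie:\s*([^;]+)/i', $headerLine, $m)) {
        return null;
    }
    return trim($m[1]);
}

echo cookiePairFromSetCookie(
    'Set-Cookie: ASPSESSIONIDASBRRCCS=LKLLPGGDFBGGNFJBKKHMPCDA; path=/'
), PHP_EOL;
```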
If you want to use file_get_contents, you need to manually create a context for it with stream_context_create to include cookies with the request.
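A minimal sketch of such a context, assuming you already hold the session cookie as a name=value string. cookieContext() is a hypothetical name; the cookie value is the one from the curl -v output above.

```php
<?php
// Hypothetical helper: builds a stream context that sends the given cookie
// with every HTTP request made through it.
function cookieContext(string $cookiePair)
{
    return stream_context_create([
        'http' => [
            'method' => 'GET',
            'header' => "Cookie: $cookiePair\r\n",
        ],
    ]);
}

$context = cookieContext('ASPSESSIONIDASBRRCCS=LKLLPGGDFBGGNFJBKKHMPCDA');
// Pass it as the third argument: file_get_contents($url, false, $context);
```

The cookie still has to be obtained first, for example from $http_response_header after the initial file_get_contents call to URL 1.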
An alternative (which I would personally prefer) would be to use curl functions conveniently provided by PHP. (It can even take care of the cookie traffic for you!) But that's just my preference.
Edit:
Here's a working example to scrape the path in your question.
$scrape = array(
"https://duapp3.drexel.edu/webtms_du/",
"https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX",
"https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX"
);
$data = '';
$ch = curl_init();
// Set cookie jar to temporary file, because, even if we don't need them,
// it seems curl does not store the cookies anywhere otherwise or include
// them in subsequent requests
curl_setopt($ch, CURLOPT_COOKIEJAR, tempnam(sys_get_temp_dir(), 'curl'));
// We don't want direct output by curl
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Then run along the scrape path
foreach ($scrape as $url) {
curl_setopt($ch, CURLOPT_URL, $url);
$data = curl_exec($ch);
}
curl_close($ch);
echo $data;
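A variant that avoids leaving a temporary file behind: setting CURLOPT_COOKIEFILE to an empty string turns on curl's in-memory cookie handling, so no jar file is needed. This is a sketch, and fetchChain() is a hypothetical name.

```php
<?php
// Hypothetical helper: visits each URL in order on one handle, keeping
// cookies in memory only, and returns the body of the last response.
function fetchChain(array $urls): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');       // enable cookies, no file
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // no direct output

    $data = '';
    try {
        foreach ($urls as $url) {
            curl_setopt($ch, CURLOPT_URL, $url);
            $result = curl_exec($ch);
            if ($result === false) {
                throw new RuntimeException(
                    "Request to $url failed: " . curl_error($ch)
                );
            }
            $data = $result;
        }
    } finally {
        curl_close($ch);
    }
    return $data;
}
```

Called as echo fetchChain($scrape); with the $scrape array from the example above.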
Although "slightly" related to a previous question, this one is different. How "secure" is this code in terms of cURL? Are there any other "bits" that should be added? Note that it is not being used to pass "sensitive" info.
$ch = curl_init("http://www.example.com/test.xml");
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, 0);
$data = curl_exec($ch);
curl_close($ch);
A few things to note:
You should check for failure in case http://www.example.com/test.xml gives an error, such as 404 or 500. curl_exec() does not throw an exception: it returns false on a transport error, and an HTTP error status still returns a body unless CURLOPT_FAILONERROR is set, so check curl_getinfo($ch, CURLINFO_HTTP_CODE) as well. In that case you probably want to raise a fatal error or have it dealt with in your app.
You should limit the amount of data coming over the line. What if example.com decides (or is broken into) and test.xml becomes several gigabytes large? Your app needs to deal with this, somehow.
You probably want to include some 30X header/redirect logic. curl only follows a redirect when CURLOPT_FOLLOWLOCATION is enabled; either way, you probably want the redirect logged so you can take measures in your app (change the location to the new location).
You should make very sure that curl_close() is always called. In case of fatal errors, memory overflowing and so on, you certainly don't want these sockets to remain open.
Your code is not insecure, nor is it wrong. It just does not handle edge-cases and could be hardened.
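As a rough illustration of that hardening, the sketch below checks the transfer result and HTTP status, caps the download size, sets timeouts and always closes the handle. fetchLimited(), the 1 MiB default and the timeout values are assumptions for the example, not recommendations from the answer.

```php
<?php
// Hypothetical helper: fetches $url, refusing bodies larger than $maxBytes
// and throwing on transport errors or non-2xx responses.
function fetchLimited(string $url, int $maxBytes = 1048576): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init failed');
    }
    $body = '';
    curl_setopt_array($ch, [
        CURLOPT_HEADER         => false,
        CURLOPT_FOLLOWLOCATION => false, // surface redirects instead of following
        CURLOPT_CONNECTTIMEOUT => 5,
        CURLOPT_TIMEOUT        => 15,
        // Collect the body ourselves so the transfer can be aborted early
        CURLOPT_WRITEFUNCTION  => function ($handle, string $chunk) use (&$body, $maxBytes): int {
            if (strlen($body) + strlen($chunk) > $maxBytes) {
                return -1; // any value other than strlen($chunk) aborts
            }
            $body .= $chunk;
            return strlen($chunk);
        },
    ]);
    try {
        if (curl_exec($ch) === false) {
            throw new RuntimeException('Transfer failed: ' . curl_error($ch));
        }
        $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
        if ($status < 200 || $status >= 300) {
            throw new RuntimeException("Unexpected HTTP status $status");
        }
        return $body;
    } finally {
        curl_close($ch); // runs on success and on every exception above
    }
}
```

A 301 or 302 ends up in the "unexpected status" branch here, which is one place to log the redirect; the new location is available via curl_getinfo($ch, CURLINFO_REDIRECT_URL) before the handle is closed.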