I was wondering if someone knew the equivalent of doing (from terminal):
curl --cookie "session_id=12345" http://www.example.com
using cURL in PHP. I would prefer to do it without a cookies.txt file, just passing a cookie key/value pair in the PHP cURL calls. Please let me know if this makes sense, otherwise I can clarify further. I'm using this to connect to an API that requires sending a session variable via a cookie.
MORE CLARIFICATION:
The spec specifies this...
"The first thing that has to be done is to login. The response has a session id in it. This should be stored and used for subsequent calls. This should be added as a cookie, session_id, for further calls into the API."
You want CURLOPT_COOKIE, as described on the curl_setopt page.
$ch = curl_init('http://www.example.com');
curl_setopt($ch, CURLOPT_COOKIE, 'session_id=12345');
curl_exec($ch);
curl_close($ch);
For multiple cookies, separate with a semicolon and a space:
curl_setopt($ch, CURLOPT_COOKIE, 'session_id=12345; fruit=apple');
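If the cookies live in an array, the header string can be assembled rather than typed by hand. A minimal sketch, assuming names and values contain no semicolons or other characters that need escaping; build_cookie_header is a hypothetical helper, not part of the curl extension:

```php
<?php
// Hypothetical helper: turn ['name' => 'value', ...] into a Cookie header value.
function build_cookie_header(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }
    // Pairs are separated by a semicolon and a space, as CURLOPT_COOKIE expects.
    return implode('; ', $pairs);
}

$header = build_cookie_header(['session_id' => '12345', 'fruit' => 'apple']);
echo $header, PHP_EOL; // prints: session_id=12345; fruit=apple
```

The result can then be passed as curl_setopt($ch, CURLOPT_COOKIE, $header).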
You may be looking for the following flags:
CURLOPT_COOKIESESSION
And:
CURLOPT_COOKIE
CURLOPT_COOKIEFILE
CURLOPT_COOKIEJAR
Related
I would like to know if it's possible to add a cookie with name, value, domain, path, secure, HttpOnly and expiry before executing the curl request.
I've been looking, but I only found ways to set the name and value of the cookie. I also found a lot of ways to add it using a file, but I would like to add the cookie without a file.
Another question related to the topic:
If I init the curl handle to make a GET request and then, without closing the handle, make a POST, is it possible to use the cookies received by the GET request when making the POST (without a file)?
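You can do this with the CURLOPT_COOKIELIST option of curl_setopt, which adds a single cookie, attributes included, to curl's in-memory cookie store. (CURLOPT_COOKIE only sets the outgoing "Cookie: " header verbatim, so attributes placed there would be sent as if they were extra cookies.) Example:
<?php
curl_setopt($ch, CURLOPT_COOKIELIST, "Set-Cookie: <cookie-name>=<cookie-value>; Domain=<domain-value>; Path=<path-value>; Secure; HttpOnly; Expires=<date>");
More on how to format the cookie header can be found in the MDN docs: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie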
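To check what curl actually holds for a cookie with attributes, its cookie store can be read back without making any request. A rough sketch, assuming the curl extension is loaded and that your PHP build exposes CURLOPT_COOKIELIST (which takes a Set-Cookie style line, unlike CURLOPT_COOKIE) and CURLINFO_COOKIELIST; the cookie name, domain and date are placeholders:

```php
<?php
$ch = curl_init();

// Seed curl's in-memory cookie store with one cookie, attributes included.
curl_setopt(
    $ch,
    CURLOPT_COOKIELIST,
    'Set-Cookie: session_id=12345; Domain=example.com; Path=/; Secure; HttpOnly; '
    . 'Expires=Mon, 01 Jan 2035 00:00:00 GMT'
);

// Read the store back; each entry is one tab-separated, Netscape-format line.
$stored = curl_getinfo($ch, CURLINFO_COOKIELIST);
foreach ($stored as $line) {
    echo $line, PHP_EOL;
}
```

A cookie stored this way is only sent to URLs that match its domain, path and secure flag, so it would not go out on a plain http:// request.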
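For your second question, you simply need to set CURLOPT_COOKIEFILE to an empty string. This enables curl's cookie engine without reading any file, so cookies received on one request are kept in memory and sent on later requests made with the same handle: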
<?php
curl_setopt($curl, CURLOPT_COOKIEFILE, "");
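Put together, a GET followed by a POST on the same handle might look like the sketch below. It assumes both URLs belong to a site that sets a cookie on the GET; get_then_post and its parameters are hypothetical names used only for illustration:

```php
<?php
// Hypothetical helper: GET $getUrl, then POST $fields to $postUrl,
// reusing one handle so cookies from the GET are sent with the POST.
function get_then_post(string $getUrl, string $postUrl, array $fields): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Empty string: enable the cookie engine, keep cookies in memory only.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');

    // First request: any Set-Cookie headers are remembered by the handle.
    curl_setopt($ch, CURLOPT_URL, $getUrl);
    $getBody = curl_exec($ch);

    // Second request on the same handle: matching cookies are sent back.
    curl_setopt($ch, CURLOPT_URL, $postUrl);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($fields));
    $postBody = curl_exec($ch);

    curl_close($ch);
    return [$getBody, $postBody];
}
```

The cookies are lost once the handle is closed; keeping them across scripts still needs CURLOPT_COOKIEJAR or your own storage.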
I need to authenticate my user through a curl script
session_start();
$_POST["username"]= "user";
$_POST["password"]= "password";
$ch = curl_init();
$url = 'signin.php';
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_POST, count($_POST));
curl_setopt($ch,CURLOPT_POSTFIELDS, $_POST);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
$result = json_decode(curl_exec($ch),true);
curl_close($ch);
The signin.php makes another curl call to an api, I made sure that signin.php returns all required information, sets all required session variables, returns an array:
echo json_encode(array(
'success' => true,
'ALLSESSION' => $_SESSION,
'error'=> ""
));
ALLSESSION returns the correct session variables, but they are not accessible directly. I mean I can't use $_SESSION["userid"]; it doesn't exist in the session array.
How to preserve the session between the 2 pages?
Thanks
The problem is that the client is not remembering/transmitting the PHP session id.
When an HTTP client makes a request to a php script (via an HTTP server), it must include the session id in the request if it wishes to continue a previously started session. This can be done either in the HTTP headers as a cookie or as a URL parameter (named PHPSESSID by default).
If you do not want to use PHP's default session variable name, or if you want to use a POST variable instead of a URL parameter, then you can use any request variable or URL parameter you wish (whether it be GET, POST, or COOKIE), but then you will need to manually interpret this variable on the server-side.
Here are three solutions, in order of most recommended to least recommended.
1. Turn on cookie support in cUrl, or
2. Pass the session id as a URL parameter, or
3. Pass the session id as a request variable (post/cookie) or a URL parameter that does not use the name expected by PHP, and then manually start the session on the server-side using that session id.
Solution #1: Turn on cookie support in cUrl
PHP uses the session id in the cookie to reload your session data each time you make a request from that client.
In this case, the client is cUrl. You need to set up your cUrl request to allow/use cookies.
This is done by setting the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options.
session_start();
$_POST["username"]= "user";
$_POST["password"]= "password";
$ch = curl_init();
$url = 'signin.php';
//Name of a file to store cookie data in.
//If the file does not exist, it will be created.
//cUrl (or your web server) needs to have write permissions to the folder.
$cookieFile = "/some/writable/folder/filename";
curl_setopt($ch,CURLOPT_URL, $url);
curl_setopt($ch,CURLOPT_POST, count($_POST));
curl_setopt($ch,CURLOPT_POSTFIELDS, $_POST);
curl_setopt($ch,CURLOPT_RETURNTRANSFER,1);
//Tell cUrl about the cookie file
curl_setopt($ch,CURLOPT_COOKIEJAR, $cookieFile); //tell cUrl where to write cookie data
curl_setopt($ch,CURLOPT_COOKIEFILE, $cookieFile); //tell cUrl where to read cookie data from
$result = json_decode(curl_exec($ch),true);
curl_close($ch);
Any subsequent cUrl calls that use $cookieFile for CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE will have the same session data as prior calls.
Solution #2: Pass the session id in the URL query string using the expected parameter name (PHPSESSID by default, but this can be changed)
You can append the session id to all urls like this:
somepage.php?PHPSESSID=sessionidgoeshere
"PHPSESSID" is the variable name that is used by default in PHP. If the server is set up to use a non-default name, then you would need to use that variable name instead.
With solution #2, you will still need to store the session id on the client-side somehow.
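Appending the parameter by hand gets error-prone once a URL already has a query string. A small sketch of a helper for that, with two caveats stated as assumptions: with_session_id is a hypothetical name, and it ignores URL fragments. Also note that PHP's session.use_only_cookies setting defaults to 1, so the server has to be configured to accept session ids from the URL for this solution to work at all.

```php
<?php
// Hypothetical helper: append the session id as a query parameter.
function with_session_id(string $url, string $sessionId, string $name = 'PHPSESSID'): string
{
    // Use '?' for the first parameter, '&' if a query string already exists.
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . rawurlencode($name) . '=' . rawurlencode($sessionId);
}

echo with_session_id('somepage.php', 'abc123'), PHP_EOL;
// prints: somepage.php?PHPSESSID=abc123
echo with_session_id('somepage.php?page=2', 'abc123'), PHP_EOL;
// prints: somepage.php?page=2&PHPSESSID=abc123
```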
Solution #3: Pass the session id as a request variable or a URL parameter and then manually start the session on the server-side using that session id.
This solution is not recommended for normal situations. Unlike the previous solutions, this one requires changes to the server-side script as well as the client-side (cUrl). This solution is only useful if you specifically want to send the session id as something other than a URL parameter or cookie, or if you want to use a variable name other than the name that the server is expecting.
Place the following code in your server-side PHP that is handling the request, prior to starting the session:
session_id($_POST[<param_name>]); or session_id($_GET[<param_name>]); or session_id($_COOKIE[<param_name>]);
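Because the id now comes straight from user input, it is worth validating before handing it to session_id(). A sketch under assumptions: pick_session_id and the parameter name 'my_sid' are hypothetical, and the character whitelist (letters, digits, comma, minus) is the set PHP's default file-based session handler accepts.

```php
<?php
// Hypothetical helper: return a usable session id from a request array, or null.
function pick_session_id(array $request, string $param): ?string
{
    if (!isset($request[$param]) || !is_string($request[$param])) {
        return null;
    }
    // Accept only characters the default session handler allows.
    if (preg_match('/^[A-Za-z0-9,-]{1,128}$/', $request[$param]) !== 1) {
        return null;
    }
    return $request[$param];
}

// Usage on the server side, before the session is started:
// $id = pick_session_id($_POST, 'my_sid');
// if ($id !== null) {
//     session_id($id);
// }
// session_start();
```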
I suggest using Solution #1 unless you have a compelling reason not to.
Also, PHP doesn't care whether the request is a GET or a POST or any other HTTP request method. Regardless of the HTTP request method, if the session id is passed as a URL parameter or in a cookie, then the related session will persist on the server-side.
From everything I've read, it seems that this is impossible. But here is my scenario:
I need to scrape a table's content containing for sale housing information. The page is not password protected or anything, but you first have to click an "I Agree" link on the previous page so that a cookie gets set saying you agree that the content may not be 100% accurate. You are only then shown the data. Is there any way at all to accomplish this using php/jquery/javascript? I know you cannot create an iframe because of the fact that it is cross-domain. I also do not have access to this other website.
Thanks for any answers, as I'm not really expecting anything positive. :) And many thanks if you can tell me how to do this. :D
Use a server-side script (PHP using cURL) to crawl the website and return the information you need. Make sure your request includes the HTTP Cookie header that represents the "I agree" cookie.
Sample:
<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://www.example.com/');
curl_setopt($ch, CURLOPT_COOKIE, 'I_Agree=1');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$responseBody = curl_exec($ch);
curl_close($ch);
// Read the information you need from $responseBody and return it as response body
?>
Now you can access the information from your website by calling your server side script above. For details about how to use cURL take a look at the documentation.
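For the "read the information you need" step, one option is PHP's DOM extension. The sketch below assumes the listings sit in ordinary table rows and that the dom extension is enabled; extract_table_rows is a hypothetical helper, and a real page would likely need a more specific XPath expression:

```php
<?php
// Hypothetical helper: return every table row as an array of trimmed cell texts.
function extract_table_rows(string $html): array
{
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//table//tr') as $tr) {
        $cells = [];
        foreach ($xpath->query('./th|./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }
    return $rows;
}

// Usage with the response from the sample above:
// echo json_encode(extract_table_rows($responseBody));
```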
CURL can store or recall cookies from a file depending on the options you set. Here is the "cookiejar" example:
http://curl.haxx.se/libcurl/php/examples/cookiejar.html
Check out the CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE options
URL1: https://duapp3.drexel.edu/webtms_du/
URL2: https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX
URL3: https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX
As a personal programming project, I want to scrape my University's course catalog and provide it as a RESTful API.
However, I'm running into the following issue.
The page that I need to scrape is URL3. But URL3 only returns meaningful information after I visit URL2 (it sets the term there Colleges.asp?Term=201125), but URL2 can only be visited after visiting URL1.
I tried monitoring the HTTP data going to and fro using Fiddler and I don't think they are using cookies. Closing the browser instantly resets everything, so I suspect they are using Session.
How can I scrape URL 3? I tried programmatically visiting URLs 1 and 2 first, and then doing file_get_contents(url3), but that doesn't work (probably because it registers as three different sessions).
A session needs a mechanism to identify you as well. Popular methods include: cookies, session id in URL.
A curl -v on URL 1 reveals a session cookie is indeed being set.
Set-Cookie: ASPSESSIONIDASBRRCCS=LKLLPGGDFBGGNFJBKKHMPCDA; path=/
You need to send this cookie back to the server on any subsequent requests to keep your session alive.
If you want to use file_get_contents, you need to manually create a context for it with stream_context_create to include cookies with the request.
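A rough sketch of that approach: pull the name=value pairs out of the Set-Cookie response headers, then build a stream context that sends them back. Both function names are hypothetical, and cookie attributes such as path or expiry are simply dropped here, which a real cookie client would honour.

```php
<?php
// Hypothetical helper: build a Cookie header value from raw response header lines.
function cookies_from_headers(array $responseHeaders): string
{
    $pairs = [];
    foreach ($responseHeaders as $line) {
        if (stripos($line, 'Set-Cookie:') === 0) {
            // Keep only "name=value", dropping attributes such as path.
            $cookie = trim(substr($line, strlen('Set-Cookie:')));
            $pairs[] = explode(';', $cookie, 2)[0];
        }
    }
    return implode('; ', $pairs);
}

// Hypothetical helper: a stream context that sends the given Cookie header.
function context_with_cookies(string $cookieHeader)
{
    return stream_context_create([
        'http' => ['header' => "Cookie: $cookieHeader\r\n"],
    ]);
}

// Usage: after file_get_contents($url1), PHP fills $http_response_header
// in the calling scope with the response header lines.
// $cookies = cookies_from_headers($http_response_header);
// $page2 = file_get_contents($url2, false, context_with_cookies($cookies));
```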
An alternative (which I would personally prefer) would be to use curl functions conveniently provided by PHP. (It can even take care of the cookie traffic for you!) But that's just my preference.
Edit:
Here's a working example to scrape the path in your question.
$scrape = array(
"https://duapp3.drexel.edu/webtms_du/",
"https://duapp3.drexel.edu/webtms_du/Colleges.asp?Term=201125&univ=DREX",
"https://duapp3.drexel.edu/webtms_du/Courses.asp?SubjCode=CS&CollCode=E&univ=DREX"
);
$data = '';
$ch = curl_init();
// Set the cookie jar to a temporary file. We never read the file ourselves,
// but setting a jar switches on curl's cookie engine; without it curl neither
// stores cookies nor includes them in subsequent requests
curl_setopt($ch, CURLOPT_COOKIEJAR, tempnam(sys_get_temp_dir(), 'curl'));
// We don't want direct output by curl
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Then run along the scrape path
foreach ($scrape as $url) {
    curl_setopt($ch, CURLOPT_URL, $url);
    $data = curl_exec($ch);
}
curl_close($ch);
echo $data;
I would like to log in to a site. The first time I request a page, it redirects me to another page that sets the cookies.
I am following a tutorial where they specify doing this
$cookie = '/tmp/cookies.txt';
curl_setopt($ch, CURLOPT_COOKIE, $cookie);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie);
But when I check HTTP live headers, I can see the server sending cookie information to set my cookies.
But it doesn't seem to do anything: when I examine the cookies, those values aren't there.
So do I have to specify another path for $cookie?
You have to use CURLOPT_COOKIEFILE to send cookies from a file; CURLOPT_COOKIE expects the cookie string itself, not a file path.
From the docs for function curl_setopt():
CURLOPT_COOKIE
The contents of the "Cookie: " header to be used in the HTTP request. Note that multiple cookies are separated with a semicolon followed by a space (e.g., "fruit=apple; colour=red")
CURLOPT_COOKIEFILE
The name of the file containing the cookie data. The cookie file can be in Netscape format, or just plain HTTP-style headers dumped into a file.
CURLOPT_COOKIEJAR
The name of a file to save all internal cookies to when the handle is closed, e.g. after a call to curl_close.
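Applied to the tutorial snippet, a corrected setup could look like the sketch below. init_login_handle is a hypothetical helper, and CURLOPT_FOLLOWLOCATION is an addition on my part, on the assumption that you want curl to follow the redirect that sets the cookies:

```php
<?php
// Hypothetical helper: a handle that reads and writes cookies in $cookieFile.
function init_login_handle(string $url, string $cookieFile)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow the redirect; cookies set along the way go into the cookie engine.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // Read previously saved cookies from the file (this also enables the engine).
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    // Write all known cookies back to the file when the handle is closed.
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    return $ch;
}

// Usage:
// $ch = init_login_handle('http://www.example.com/', '/tmp/cookies.txt');
// $body = curl_exec($ch);
// curl_close($ch);
```

The jar file is only written once the handle is closed, so /tmp/cookies.txt will look empty if you inspect it before that point.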