I'm using CURLOPT_COOKIEJAR to store cookies to a file and CURLOPT_COOKIEFILE to retrieve them from the file.
What I'm wondering is what happens when multiple users access the script at the same time. Won't it mess up the contents of the cookie file? Also, how do I manage the cookie files so that multiple users can use the script at the same time?
CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE are just utilities for handling cookies in a file, the way a web browser does.
They are not recommended for your case.
Instead, you can work directly with the HTTP headers to set and retrieve cookies.
To set your cookies:
<?php
curl_setopt($ch, CURLOPT_COOKIE, 'user=xxxxxxxx-xxxxxxxx');
?>
To retrieve cookies, just pick out the response headers that start with Set-Cookie:
To understand how cookie headers work, see http://curl.haxx.se/rfc/cookie_spec.html
Usage example, quick and dirty, and definitely not standard.
With these headers:
<?php
$header_blob = '
Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/
Set-Cookie: PART_NUMBER=RIDING_ROCKET_0023; path=/ammo
';
// Extract cookie headers
$cookies = array();
if (preg_match_all('/Set-Cookie:\s*(?P<cookies>.+?);/i', $header_blob, $matches)) {
foreach ($matches['cookies'] as $cookie) {
$cookies[] = $cookie;
}
$cookies = array_unique($cookies);
}
var_dump($cookies);
// Resend cookies
$cookie_blob = implode('; ', $cookies);
var_dump($cookie_blob);
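A slightly more defensive take on the same idea, as a minimal sketch: it also picks up cookies whose Set-Cookie line has no trailing semicolon, and it splits on the first "=" only. The function names parse_set_cookie_headers and build_cookie_header are made up for this example, and the sketch deliberately ignores path, domain and expiry, so two cookies with the same name simply overwrite each other.

```php
<?php
// Hypothetical helper: collect name => value pairs from raw response headers.
function parse_set_cookie_headers(string $headerBlob): array
{
    $cookies = [];
    // Match each Set-Cookie line up to the end of that line.
    if (preg_match_all('/^Set-Cookie:[ \t]*([^\r\n]*)/mi', $headerBlob, $matches)) {
        foreach ($matches[1] as $line) {
            // The first segment before ";" is the name=value pair.
            $pair = trim(explode(';', $line, 2)[0]);
            $pos = strpos($pair, '=');
            if ($pos === false || $pos === 0) {
                continue; // skip malformed entries
            }
            // A later cookie with the same name overwrites the earlier one.
            $cookies[substr($pair, 0, $pos)] = substr($pair, $pos + 1);
        }
    }
    return $cookies;
}

// Hypothetical helper: build the string expected by CURLOPT_COOKIE.
function build_cookie_header(array $cookies): string
{
    $parts = [];
    foreach ($cookies as $name => $value) {
        $parts[] = $name . '=' . $value;
    }
    return implode('; ', $parts);
}

$headerBlob = "HTTP/1.1 200 OK\r\n"
    . "Set-Cookie: PART_NUMBER=ROCKET_LAUNCHER_0001; path=/\r\n"
    . "Set-Cookie: SESSION=abc=123\r\n\r\n";

$cookies = parse_set_cookie_headers($headerBlob);
echo build_cookie_header($cookies), "\n";
```

The resulting string can be handed to curl_setopt($ch, CURLOPT_COOKIE, ...) on the next request.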
You'll need to specify a different file for each execution of the script, otherwise you'll have issues with the file being overwritten, etc. as you suggest.
You might want to have a look at tempnam() (example below) as a means of generating the unique file, or simply use uniqid() or similar and create the file yourself.
<?php
session_start();
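// Reuse this session's cookie file if one exists, otherwise create a new one.
$cookieFilePath = !empty($_SESSION['cookiefilepath'])
? $_SESSION['cookiefilepath']
: tempnam(sys_get_temp_dir(), session_id().'_cookie_');
$_SESSION['cookiefilepath'] = $cookieFilePath;
...
// Read cookies from the file and write updated cookies back to it.
curl_setopt($curlSession, CURLOPT_COOKIEFILE, $cookieFilePath);
curl_setopt($curlSession, CURLOPT_COOKIEJAR, $cookieFilePath);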
...
?>
That said, you'll need to ensure that you remove these files once they're no longer required. (If this isn't within the lifetime of your script, you might want to periodically execute a tidy-up script via cron that uses filemtime or similar.)
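The tidy-up step could look roughly like the sketch below. The function name remove_stale_cookie_files, the '*_cookie_*' file pattern and the one-hour limit are assumptions for illustration; the demo works in its own throwaway directory so it does not touch real files.

```php
<?php
// Hypothetical tidy-up helper: delete matching files whose last
// modification is older than $maxAgeSeconds. Returns the number removed.
function remove_stale_cookie_files(string $dir, string $pattern, int $maxAgeSeconds): int
{
    $removed = 0;
    $cutoff = time() - $maxAgeSeconds;
    foreach (glob($dir . DIRECTORY_SEPARATOR . $pattern) ?: [] as $path) {
        if (is_file($path) && filemtime($path) < $cutoff && unlink($path)) {
            $removed++;
        }
    }
    return $removed;
}

// Demo in a scratch directory with one old and one fresh file.
$dir = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'cookie_demo_' . uniqid();
mkdir($dir);
$old = $dir . DIRECTORY_SEPARATOR . 'abc_cookie_old';
$new = $dir . DIRECTORY_SEPARATOR . 'abc_cookie_new';
touch($old, time() - 7200); // pretend it was last written two hours ago
touch($new);

$removed = remove_stale_cookie_files($dir, '*_cookie_*', 3600);
echo "Removed $removed stale cookie file(s)\n";
```

Run from cron, the same function would be pointed at the directory where the cookie files actually live.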
Incidentally, you can simply provide a full path to the file you want to use; it doesn't have to be in the same directory as the script, despite what is said in the existing "Can someone explain CURL cookie handling (PHP)?" question.
Multiple simultaneous requests will overwrite the same file (and file locking will probably slow down the other requests as well).
You could incorporate the session_id() into the cookie file name so you'll have one cookie file for every client session. I'd also recommend storing the files in something like sys_get_temp_dir().
Something like:
$cookieFile = sys_get_temp_dir().DIRECTORY_SEPARATOR.session_id().'-cookies.txt';
Should work fine for that.
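As a minimal sketch of the per-session file name, wrapped in a helper: cookie_file_for_session is a made-up name, and the character filter is an extra precaution that the answer itself does not mention. Note that DIRECTORY_SEPARATOR is the constant that joins path segments, whereas PATH_SEPARATOR separates entries in a list of paths.

```php
<?php
// Hypothetical helper: map a session ID to a cookie file in the temp directory.
function cookie_file_for_session(string $sessionId): string
{
    // Keep only characters that are safe in a file name.
    $safeId = preg_replace('/[^A-Za-z0-9,-]/', '', $sessionId);
    if ($safeId === '') {
        throw new InvalidArgumentException('Session ID is empty or invalid');
    }
    return sys_get_temp_dir() . DIRECTORY_SEPARATOR . $safeId . '-cookies.txt';
}

// In a real script the argument would be session_id() after session_start().
echo cookie_file_for_session('abc123'), "\n";
```

The returned path would then be passed to both CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR so that cookies are read from and written back to the same file.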
I'm trying to avoid cURL storing the cookie session into an actual file via "CURLOPT_COOKIEJAR". So I created a method to catch / parse the cookies into a local variable - which is then used via "CURLOPT_COOKIE" to restore the cookie session.
I cut out the cookies via
preg_match_all("/^Set-cookie: (.*?);/ism", $header, $cookies);
To use "CURLOPT_COOKIE" we take the key=value pairs and separate them via "; ". However (as far as I'm aware), CURLOPT_COOKIE doesn't allow you to throw in various flags, i.e. expiration, secure flag, and so on.
Update 1/29/2014 6:45pm
So I think my issue actually occurs where CURLOPT_FOLLOWLOCATION comes in. I don't think it has to do with the flags. It doesn't seem like my manual cookie session is updated when following a new location (i.e. a site has 2-3 redirects that append various cookies / session values). That would actually make sense, because CURLOPT_COOKIEJAR directly grabs / updates cookies sent on redirect headers. So I tried creating a manual redirection path while grabbing / appending the latest cookies; however, this method did not work for some reason.
Update 1/30/2014 4:22pm
Almost got this figured out. Will be updating with answer shortly. It turns out my method works perfectly fine, it's just a matter of jumping through the manual redirected pages correctly.
Update 1/30/2014 4:51pm
Issue solved -- answered myself below.
So it turns out I was actually doing this correctly and my assumptions were correct.
To keep the cookie session in a variable (instead of using CURLOPT_COOKIEJAR): *Make sure you have CURLOPT_HEADER and CURLINFO_HEADER_OUT enabled.*
CURLOPT_FOLLOWLOCATION must be set to false. Otherwise your cookies won't be sent correctly across redirects (this is where CURLOPT_COOKIEJAR does best).
Use preg_match_all to extract the cookies. Then use strpos to find the first occurrence of "=". Some sites encode values so that they contain "=" characters, which won't work with "explode".
$data = curl_exec($curl);
$header_size = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
$header = substr($data, 0, $header_size);
preg_match_all("/^Set-cookie: (.*?);/ism", $header, $cookies);
foreach( $cookies[1] as $cookie ){
$buffer_explode = strpos($cookie, "=");
$this->cookies[ substr($cookie,0,$buffer_explode) ] = substr($cookie,$buffer_explode+1);
}
When making your next curl call, re-call the cookie var/object into CURLOPT_COOKIE.
if( count($this->cookies) > 0 ){
$cookieBuffer = array();
foreach( $this->cookies as $k=>$c ) $cookieBuffer[] = "$k=$c";
curl_setopt($curl, CURLOPT_COOKIE, implode("; ",$cookieBuffer) );
}
This will allow you to keep the latest variable (i.e. changing sessions) intact.
Hope this helps anyone who bumps into this issue!
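For anyone wanting to see the manual redirect hopping in one place, here is a rough sketch under a few assumptions: Location headers are absolute URLs, and the actual HTTP call is hidden behind a hypothetical $fetch callable (in real use it would wrap curl_exec() with CURLOPT_HEADER on and CURLOPT_FOLLOWLOCATION off). The names merge_cookies, redirect_target and follow_with_cookies are invented for this example, and the fake fetcher only exists so the sketch runs without a network.

```php
<?php
// Merge name=value pairs from Set-Cookie lines into an existing cookie array.
function merge_cookies(array $cookies, string $header): array
{
    if (preg_match_all('/^Set-Cookie:[ \t]*([^;\r\n]*)/mi', $header, $m)) {
        foreach ($m[1] as $pair) {
            $pair = trim($pair);
            $pos = strpos($pair, '='); // split on the first "=" only
            if ($pos > 0) {
                $cookies[substr($pair, 0, $pos)] = substr($pair, $pos + 1);
            }
        }
    }
    return $cookies;
}

// Return the Location header value, or null if the response is not a redirect.
function redirect_target(string $header): ?string
{
    return preg_match('/^Location:[ \t]*([^\r\n]+)/mi', $header, $m) ? trim($m[1]) : null;
}

// Follow redirects by hand, re-sending the accumulated cookies on every hop.
function follow_with_cookies(callable $fetch, string $url, array $cookies = [], int $maxHops = 5): array
{
    for ($hop = 0; $hop <= $maxHops; $hop++) {
        $pairs = [];
        foreach ($cookies as $k => $v) {
            $pairs[] = "$k=$v";
        }
        $header = $fetch($url, implode('; ', $pairs));
        $cookies = merge_cookies($cookies, $header);
        $next = redirect_target($header);
        if ($next === null) {
            break;
        }
        $url = $next; // assumes an absolute URL in Location
    }
    return [$url, $cookies];
}

// Fake fetcher standing in for a cURL call: one redirect, then a final page.
$fake = function (string $url, string $cookieHeader): string {
    if ($url === 'http://example.com/start') {
        return "HTTP/1.1 302 Found\r\nSet-Cookie: a=1; path=/\r\nLocation: http://example.com/end\r\n\r\n";
    }
    return "HTTP/1.1 200 OK\r\nSet-Cookie: b=x=y\r\n\r\n";
};

[$finalUrl, $jar] = follow_with_cookies($fake, 'http://example.com/start');
echo $finalUrl, "\n";
print_r($jar);
```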
I'm working on a bit of PHP code that depends on a remote file which happens to be hosted on pastebin. The server I am working on has all the necessary functions enabled, as running it with FILE_URL set to http://google.com returns the expected results. I've also verified through php.ini for extra measure.
Everything should work, but it doesn't. Calling file() on a URL formed as such, http://pastebin.com/raw.php?i=<paste id here>, returns a 500 server error. Doing the same on the exact same file hosted locally or on google.com returns a reasonable result.
I have verified that the URL is set to the correct value and verified that the remote page is where I think that it is. I'm at a loss.
ini_set("allow_url_fopen", true);
// Prefer remote (up-to-date) file, fallback to local file
if( ini_get("allow_url_fopen") ){
$file = file( FILE_URL );
}
if(!isset( $file ) || !$file ) {
$file = file( LOCAL_FILE_PATH );
}
I wasn't able to test this, but you should use curl, try something like this:
<?php
$url = "http://pastebin.com/2ZdFcEKh";
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_exec($ch);
Pastebin appears to use a protection system that automatically blocks IP addresses that issue "bot-like" requests.
In the case of your example, you will get a 500 server error since the file() command never completes (since their protection system never closes the connection) and there is no timeout facility in your call. The script is probably considered "bot-like" since file() does not pass through all the standard HTTP headers a typical browser would.
To solve this problem, I would recommend investigating cURL and perhaps look at setting a browser user agent as a starting point to grant access to your script. I should also mention that it would be in your interests to investigate whether or not this is considered a breach of the Pastebin user agreement. While I cannot see any reference to using scripts in their FAQ (as of 2012/12/29), they have installed protection against scripts for a reason.
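As a starting point, here is a minimal sketch that keeps file() but gives it a timeout and a user agent through a stream context, rather than switching to cURL as suggested above. The helper name read_lines_with_fallback and the user agent string are placeholders, and whether any particular user agent is accepted by the remote site is not something this sketch can promise. The demo uses local paths only, so that it runs without network access.

```php
<?php
// Hypothetical helper: try the preferred source with a timeout, then fall
// back to the local copy if the first read fails or returns nothing.
function read_lines_with_fallback(string $remote, string $local, float $timeout = 5.0)
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeout, // give up instead of hanging forever
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleFetcher/1.0)', // placeholder
        ],
    ]);
    $lines = @file($remote, FILE_IGNORE_NEW_LINES, $context);
    if ($lines === false || $lines === []) {
        $lines = file($local, FILE_IGNORE_NEW_LINES);
    }
    return $lines;
}

// Demo: the "remote" path does not exist, so the local file is used.
$local = tempnam(sys_get_temp_dir(), 'fallback_');
file_put_contents($local, "line one\nline two\n");
$lines = read_lines_with_fallback($local . '.missing', $local);
print_r($lines);
```

In the original script, FILE_URL and LOCAL_FILE_PATH would take the place of the two demo paths.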
I have been searching for a way to specify the cookie data for cURL. I have found some solutions for saving the cookies from a visited page, but that's not what I need. What I want is to write the cookie data myself, so that cURL uses it.
You can use curl_setopt with the CURLOPT_COOKIE constant:
<?php
// create a new cURL resource
$ch = curl_init();
// cookies to be sent
curl_setopt($ch, CURLOPT_COOKIE, "fruit=apple; colour=red");
You really should read the documentation - it's listed with exactly the keywords you'd expect and contains a lot of helpful info:
-b, --cookie
(HTTP) Pass the data to the HTTP server as a cookie. It is supposedly
the data previously received from the server in a "Set-Cookie:" line.
The data should be in the format "NAME1=VALUE1; NAME2=VALUE2".
If no '=' symbol is used in the line, it is treated as a filename to
use to read previously stored cookie lines from, which should be used
in this session if they match. Using this method also activates the
"cookie parser" which will make curl record incoming cookies too,
which may be handy if you're using this in combination with the -L,
--location option. The file format of the file to read cookies from
should be plain HTTP headers or the Netscape/Mozilla cookie file
format.
NOTE that the file specified with -b, --cookie is only used as input.
No cookies will be stored in the file. To store cookies, use the -c,
--cookie-jar option or you could even save the HTTP headers to a file
using -D, --dump-header!
If this option is set more than once, the last one will be the one
that's used.
cURL can use a cookie file in Netscape format. Just create such a file yourself and pass its path as the CURLOPT_COOKIEFILE option.
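A minimal sketch of writing such a file by hand, assuming the usual seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry timestamp, name, value). The helper name netscape_cookie_line is made up, example.com is a placeholder, and an expiry of 0 is used here on the assumption that cURL treats it as a session cookie.

```php
<?php
// Hypothetical helper: format one cookie as a Netscape cookie-file line.
function netscape_cookie_line(string $domain, string $path, bool $secure, int $expires, string $name, string $value): string
{
    // A leading dot on the domain marks the cookie as valid for subdomains.
    $includeSubdomains = ($domain !== '' && $domain[0] === '.') ? 'TRUE' : 'FALSE';
    return implode("\t", [
        $domain,
        $includeSubdomains,
        $path,
        $secure ? 'TRUE' : 'FALSE',
        $expires,
        $name,
        $value,
    ]);
}

$lines = [
    '# Netscape HTTP Cookie File',
    netscape_cookie_line('example.com', '/', false, 0, 'fruit', 'apple'),
    netscape_cookie_line('example.com', '/', false, 0, 'colour', 'red'),
];

$cookieFile = tempnam(sys_get_temp_dir(), 'cookies_');
file_put_contents($cookieFile, implode("\n", $lines) . "\n");
echo file_get_contents($cookieFile);

// The path would then be handed to cURL, for example:
// curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
```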
I'm not sure if I'm asking this properly.
I have two PHP pages located on the same server. The first PHP page sets a cookie with an expiration and the second one checks to see if that cookie was set. If it is set, it returns "on". If it isn't set, it returns "off".
If I just run the pages like
"www.example.com/set_cookie.php"
AND
"www.example.com/is_cookie_set.php"
I get an "on" from is_cookie_set.php.
Here's the problem: in the set_cookie.php file I have a function called is_set. This function executes the following cURL request and returns the contents ("on" or "off"). Unfortunately, the contents are always returned as "off". However, if I check the file manually ("www.example.com/is_cookie_set.php") I can see that the cookie was set.
Here's the function:
<?php
function is_set()
{
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://example.com/is_cookie_set.php');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$contents = curl_exec ($ch);
curl_close ($ch);
echo $contents;
}
?>
Please note, I'm not using cURL to GET or SET cookies, only to check a page that checks if the cookie was set.
I've looked into CURLOPT_COOKIEJAR, and CURLOPT_COOKIEFILE, but I believe those are for setting cookies via cURL and I don't want to do this.
I believe you are confusing two things. When you use curl, PHP itself acts as the client (much like a browser would) and makes that request for you. That is, the cookies that curl sends have nothing to do with the cookies in your current browser. I think.
I'm not entirely sure what you are trying to do here but you are aware, as nc3b already states, that in your is_set() function, it's PHP acting as the client and not your browser, right? That means that your cookie test will always fail (= return with no cookies).
Cookies are stored by the client and sent along with every request to the server.
If you want to find out in PHP whether a cookie has been set - of course, you need to be on the same domain as the cookie for that - you can use plain if (isset($_COOKIE["cookiename"])).
Maybe you are trying to build a solution to query for a cookie on a remote host. For that, see this SO question:
Cross domain cookies
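If the goal is for the cURL request to carry the visitor's cookies, they have to be forwarded explicitly. A minimal sketch, assuming both pages are on the same domain so that $_COOKIE actually contains the cookie in question; cookie_header_from_array is a made-up helper name.

```php
<?php
// Hypothetical helper: turn a cookie array such as $_COOKIE into the
// "name=value; name2=value2" string that CURLOPT_COOKIE expects.
function cookie_header_from_array(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        if (is_scalar($value)) { // skip array-style cookies such as name[key]
            // PHP has already URL-decoded the values, so encode them again.
            $pairs[] = $name . '=' . rawurlencode((string) $value);
        }
    }
    return implode('; ', $pairs);
}

echo cookie_header_from_array(['visited' => 'yes', 'theme' => 'dark mode']), "\n";

// Inside is_set() the browser's cookies could then be passed along, for example:
// curl_setopt($ch, CURLOPT_COOKIE, cookie_header_from_array($_COOKIE));
```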
Curl acts as an HTTP client, just like your browser.
If configured, they both receive and store cookies, but the two cookie stores are in no way related.
Curl doesn't use the browser cookies. If you want to use your browser cookies, you have to use the --cookie option switch. See the manpage for details: http://curl.haxx.se/docs/manpage.html
For example Firefox stores them in a file called cookies.txt.
Under Linux it's located at ~/.mozilla/firefox/$profilefolder/cookies.txt
Hint: If you use Firefox >= 3.0, the cookies are stored in an SQLite database. If you want to use them with curl, you have to extract a cookies.txt file yourself.
Here are some examples of how to do that:
http://roshan.info/blog/2010/03/14/using-firefox-30-cookies-with-wgetcurl/
http://slacy.com/blog/2010/02/using-cookies-sqlite-in-wget-or-curl/
sqlite3 -separator $'\t' cookies.sqlite \
'select host, "TRUE", path, case isSecure when 0 then "FALSE" else "TRUE" end, expiry, name, value from moz_cookies' > cookies.txt
I'm trying to get the contents from another file with file_get_contents (don't ask why).
I have two files: test1.php and test2.php. test1.php returns a string, based on the user that is logged in.
test2.php tries to get the contents of test1.php and is being executed by the browser, thus getting the cookies.
To send the cookies with file_get_contents, I create a streaming context:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
I'm retrieving the contents with:
$contents = file_get_contents("http://www.example.com/test1.php", false, $opts);
But now I get the error:
Warning: file_get_contents(http://www.example.com/test1.php) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found
Does anybody know what I'm doing wrong here?
edit:
forgot to mention: Without the streaming_context, the page just loads. But without the cookies I don't get the info I need.
First, this is probably just a typo in your question, but the third argument to file_get_contents() needs to be your stream context, NOT the array of options. I ran a quick test with something like this, and everything worked as expected:
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
$contents = file_get_contents('http://example.com/test1.txt', false, $context);
echo $contents;
The error indicates the server is returning a 404. Try fetching the URL from the machine PHP is running on and not from your workstation/desktop/laptop. It may be that your web server is having trouble reaching the site, your local machine has a cached copy, or some other network screwiness.
Be sure you repeat your exact request when running this test, including the cookie you're sending (command line curl is good for this). It's entirely possible that the page in question may load fine in a browser without the cookie, but when you send the cookie the site actually is returning a 404.
Make sure that $_SERVER['HTTP_COOKIE'] has the raw cookie you think it does.
If you're screen scraping, download Firefox and a copy of the LiveHTTPHeaders extension. Perform all the necessary steps to reach whatever page it is you want in Firefox. Then, using the output from LiveHTTPHeaders, recreate the exact same request sequence. Include every header, not just the cookies.
Finally, PHP Curl exists for a reason. If at all possible, (I'm not asking!) use it instead. :)
Just to share this information.
When using session_start(), the session file is locked by PHP, so the current script is the only one that can access it. If you request another script that uses the same session via fsockopen() or file_get_contents(), you can wait a long time, because that script tries to open a file that is still locked.
One way to solve this problem is to call session_write_close() to unlock the file and relock it afterwards with session_start().
Example:
<?php
$opts = array('http' => array('header'=> 'Cookie: ' . $_SERVER['HTTP_COOKIE']."\r\n"));
$context = stream_context_create($opts);
session_write_close(); // unlock the file
$contents = file_get_contents('http://120.0.0.1/controler.php?c=test_session', false, $context);
session_start(); // Lock the file
echo $contents;
?>
Since file_get_contents() is a blocking function, the two scripts won't try to modify the session file at the same time.
But I'm sure this is not the best way to handle a session over an additional connection.
Btw: it's faster than cURL and fsockopen()
Let me know if you find something better.
Just out of curiosity, are you attempting file_get_contents on a page that has a space in it? I remember trying to use fgc on a URL that had a space in the name and while my web browser parsed it just fine, fgc didn't. I ended up having to use a str_replace to replace ' ' with '%20'.
I would think that this should have been relatively easy to spot, though, as the error would report only half of the filename. Also, I noticed that in one of these posts someone used \r\n while defining the headers. Keep in mind that PHP only interprets these escape sequences inside double quotes; in single quotes they stay as literal backslash characters.
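Both points are quick to check in a small sketch; the URL is just a placeholder.

```php
<?php
// In single quotes the backslash sequences stay literal: four characters.
$single = '\r\n';
// In double quotes they become a carriage return and a line feed: two characters.
$double = "\r\n";
var_dump(strlen($single), strlen($double));

// Encoding spaces before handing a URL to file_get_contents().
$url = 'http://www.example.com/my page.php';
$encoded = str_replace(' ', '%20', $url);
echo $encoded, "\n";
```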
Make sure that file1.php exists on the server. Try opening it in your own browser to make sure!