Why does PHP replace pluses with spaces in $_COOKIE? - php

So from my understanding of PHP and cookies, if I use the setcookie() function, then I get a cookie that is automatically url encoded. And when I go to the $_COOKIE array, I should get the cookie back, automatically url decoded. Problem is, it seems to be decoding the cookie twice when I look in $_COOKIE.
Say I have a cookie whose value is "Name|ID|Email", for example:
Joe|123|my+email@somewhere.com
This would be encoded as:
Joe%7C123%7Cmy%2Bemail%40somewhere.com
Notice the plus sign is encoded, so theoretically I ought to get it back if I decode it. Since this is automatically done in $_COOKIE, I ought to get back what I started with. But instead, I'm getting back:
Joe|123|my email@somewhere.com
Notice the space where the plus used to be. This is what I would expect if I ran an additional urldecode() on the cookie. But I'm not, so I have no idea why I would be getting a space instead of a plus.
Another interesting twist. A refresh on the page seems to produce the correct output. Any ideas why it's behaving like this?
FYI, to set the initial cookie, I use JavaScript and escape() the string to produce the encoded value. Might this be a hand-off issue between JavaScript and PHP?
Thoughts would be appreciated.

It's worth noting that both "%20" and "+" are valid encodings of a space character in form-encoded data. Per the Wikipedia article on URL encoding:
When data that has been entered into HTML forms is submitted, the form
field names and values are encoded and sent to the server in an HTTP
request message using method GET or POST, or, historically, via email.
The encoding used by default is based on a very early version of the
general URI percent-encoding rules, with a number of modifications
such as newline normalization and replacing spaces with "+" instead of
"%20". The MIME type of data encoded this way is
application/x-www-form-urlencoded, and it is currently defined (still
in a very outdated manner) in the HTML and XForms specifications.
More specifically related to PHP and JavaScript, see the top answer on this question:
When to encode space to plus (+) or %20?
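The two conventions can be seen side by side in a small JavaScript sketch. It relies on the standard URLSearchParams class, which follows the form-encoding rules, and on decodeURIComponent, which follows plain percent-encoding; the sample strings and variable names are made up for illustration.

```javascript
// Form-style decoding (application/x-www-form-urlencoded):
// both "+" and "%20" come back as a space, while "%2B" is a literal plus.
const params = new URLSearchParams("a=my+email&b=my%20email&c=my%2Bemail");
const formDecoded = {
  a: params.get("a"),
  b: params.get("b"),
  c: params.get("c"),
};

// Plain percent-decoding: "+" is an ordinary character and is left alone.
const percentDecoded = decodeURIComponent("my+email");

console.log(formDecoded);
console.log(percentDecoded);
```

So whether a "+" survives depends on which of the two decoders reads the value, not on the "+" itself.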

Firstly, PHP always runs before JavaScript: PHP executes on the server, and the JavaScript only runs once the response has reached the browser. A cookie you set with JavaScript therefore isn't available to PHP until the next request, i.e. when you refresh the page (hence that issue).
Next, JavaScript has several ways to encode strings, and only one of them will work with PHP automatically.
So:
document.cookie = "testuser=" + "Joe|123|my+email@somewhere.com";
// Joe|123|my email@somewhere.com (when decoded by PHP)
document.cookie = "testuser=" + escape("Joe|123|my+email@somewhere.com");
// Joe|123|my email@somewhere.com (when decoded by PHP)
document.cookie = "testuser=" + encodeURI("Joe|123|my+email@somewhere.com");
// Joe|123|my email@somewhere.com (when decoded by PHP)
document.cookie = "testuser=" + encodeURIComponent("Joe|123|my+email@somewhere.com");
// Joe|123|my+email@somewhere.com
So, try this for the sake of a test (remember you'll need to refresh the page to see the cookie value):
<html>
<head>
<title>Cookie Juggling</title>
<script type="text/javascript">
document.cookie = "testuser=" + encodeURIComponent("Joe|123|my+email@somewhere.com");
</script>
</head>
<body>
<div><?php echo !empty($_COOKIE['testuser']) ? $_COOKIE['testuser'] : "Cookie not set yet"; ?></div>
</body>
</html>
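To see why only encodeURIComponent survives the round trip, it may help to print what each function actually produces. The sketch below runs outside the browser; formStyleDecode is a hypothetical helper that only approximates a decoder which treats "+" as a space (the behaviour described in this thread), and it is not PHP's actual implementation.

```javascript
const value = "Joe|123|my+email@somewhere.com";

// Hypothetical helper: approximates form-style decoding, where "+" means a space.
function formStyleDecode(encoded) {
  return decodeURIComponent(encoded.replace(/\+/g, " "));
}

const encoders = {
  none: (s) => s,
  escape: (s) => escape(s), // legacy function, leaves "+" and "@" untouched
  encodeURI: (s) => encodeURI(s), // leaves "+" and "@" untouched as well
  encodeURIComponent: (s) => encodeURIComponent(s), // encodes "+" as %2B
};

const results = {};
for (const [name, encode] of Object.entries(encoders)) {
  const encoded = encode(value);
  results[name] = { encoded, decoded: formStyleDecode(encoded) };
}

console.log(results);
```

Only the encodeURIComponent entry turns "+" into %2B, so it is the only one whose decoded value still contains the plus sign.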

If you don't want the cookie value to be encoded automatically, you can use the setrawcookie() function.
The restriction with this function is that the value cannot contain any of these characters: ,; \t\r\n\013\014
setrawcookie("NAME","Joe|123|my+email@somewhere.com");
# Output in browser:
Joe|123|my+email@somewhere.com
# Output in PHP `echo $_COOKIE['NAME']`:
Joe|123|my email@somewhere.com
Tested with PHP 5.3.
setcookie("NAME","Joe|123|my+email@somewhere.com");
# Output in browser:
Joe%7C123%7Cmy%2Bemail%40somewhere.com
# Output in PHP `echo $_COOKIE['NAME']`:
Joe|123|my+email@somewhere.com
Alternatively, you can use setcookie() and decode the value with rawurldecode():
echo rawurldecode($_COOKIE['NAME']);

Related

UTF-8 data received by php isn't decoded

I'm having some trouble with my $_POST/$_REQUEST data: it still appears to be UTF-8 encoded.
I am sending conventional ajax post requests, in these conditions:
oXhr.setRequestHeader("Content-type", "application/x-www-form-urlencoded; charset=utf-8");
js file saved under utf8-nobom format
meta tags set up in the HTML <head> tag
php files saved under utf-8-nobom format as well
encodeURIComponent is used but I tried without and it gives the same result
Ok, so everything is fine: the database is also in utf8, and receives it this way, pages show well.
But when I receive the character "º", for example (through $_REQUEST or $_POST), its binary representation is 11000010 10111010, while the binary representation of "º" hardcoded in PHP (utf8...) is 10111010 only.
wtf? I just don't know whether that is a good thing or not... For instance, if I use "#º#" as the delimiter for PHP's explode function, it won't be detected, and this is actually the problem which led me here.
Any help will be as usual greatly appreciated, thank you so much for your time.
Best rgds.
EDIT1: checking against mb_check_encoding
if (mb_check_encoding($_REQUEST[$i], 'UTF-8')) {
    // Single quotes keep '$_REQUEST' as literal text instead of interpolating the array
    raise('$_REQUEST is encoded properly in utf8 at index ' . $i);
} else {
    raise(false);
}
The encoding got confirmed, I had the message raised up properly.
Single-byte UTF-8 characters do not have bit 7 (the eighth bit) set, so 10111010 on its own is not valid UTF-8. Your PHP file is probably encoded in ISO-8859-1.
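The two byte patterns from the question can be reproduced with a short JavaScript sketch. It uses the standard TextEncoder for the UTF-8 bytes; the ISO-8859-1 line relies on the fact that this encoding stores every code point below 256 as a single byte of the same value. The variable names are only illustrative.

```javascript
const ordinal = "\u00BA"; // the character "º"

// UTF-8: code points above 127 take two or more bytes.
const utf8Bits = Array.from(new TextEncoder().encode(ordinal))
  .map((byte) => byte.toString(2).padStart(8, "0"))
  .join(" ");

// ISO-8859-1: one byte whose value equals the code point.
const latin1Bits = ordinal.charCodeAt(0).toString(2).padStart(8, "0");

console.log(utf8Bits);   // 11000010 10111010
console.log(latin1Bits); // 10111010
```

The value arriving in $_POST matches the UTF-8 line, while the hardcoded delimiter matches the ISO-8859-1 line, which is why the two never compare equal.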

How to write a non-english character cookie using javascript and let php reads it properly

My website sometimes needs to read a JavaScript cookie using PHP, but for some users I get a weird character set like this `#16 3CFJ) DD%J,'1 while for other users it reads properly. Therefore, I think the problem is on the client side. I use this method to write cookies:
var expireDate = new Date();
expireDate.setMonth(expireDate.getMonth() + 1);
var value="Sami";
document.cookie = "name="+value+";path=/;expires="+expireDate.toGMTString();
and $_COOKIE['name'] to read it using PHP.
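Cookies are sent in HTTP headers, which can't reliably carry non-ASCII characters. So encode your cookie as Base64 before writing it, and decode it with base64_decode() when you read it in PHP (base64_encode() is the matching function if PHP writes the cookie).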
To encode/decode in Javascript, this answer might help.
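A minimal sketch of the JavaScript side, assuming a runtime that provides TextEncoder, TextDecoder, btoa and atob. The helper names are hypothetical. Because standard Base64 output can contain "+", "/" and "=", this version swaps them for the URL-safe alphabet; that substitution is an assumption of this sketch, and the PHP side would then have to reverse it (for example with strtr) before calling base64_decode().

```javascript
// Hypothetical helper: UTF-8 text -> URL-safe Base64 that is safe in a cookie value.
function toCookieSafeBase64(text) {
  const bytes = new TextEncoder().encode(text);
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Hypothetical helper: reverses toCookieSafeBase64.
function fromCookieSafeBase64(encoded) {
  const base64 = encoded.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  return new TextDecoder().decode(bytes);
}

const cookieValue = toCookieSafeBase64("Jos\u00E9");
console.log(cookieValue);
console.log(fromCookieSafeBase64(cookieValue));
```

In the browser, the encoded value would be assigned with something like document.cookie = "name=" + cookieValue + ";path=/".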

IE adds encoded values to encodeURIComponent in every AJAX call

I am passing a value, for example "Cats & Dogs", through an AJAX request.
I am applying encodeURIComponent to the value like: encodeURIComponent("Cats & Dogs");
Then I set the browser hash to this value for AJAX bookmarking. It works well in Firefox, where the hash appears as #value=Cats %26 Dogs. In IE, however, it appears as Cats%20%26%20Dogs, which causes a problem when I read the hash and resend it in an AJAX call: IE keeps adding more encoded values, so the previous value becomes Cats%2520%2526%2520Dogs, then Cats%252520%252526%252520Dogs, and so on...
This does not occur in Firefox.
How can I overcome this issue?
Never mind, I found my problem. I was not decoding the encoded value before setting it as the hash.
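The effect, and the fix of decoding before encoding again, can be sketched in a few lines of JavaScript. The reencode helper is hypothetical and only illustrates the idea.

```javascript
const original = "Cats & Dogs";

// Encoding an already-encoded value escapes the "%" signs themselves.
const once = encodeURIComponent(original);
const twice = encodeURIComponent(once);

// Hypothetical helper: decode what was read from the hash, then encode it once.
function reencode(valueFromHash) {
  return encodeURIComponent(decodeURIComponent(valueFromHash));
}

console.log(once);           // Cats%20%26%20Dogs
console.log(twice);          // Cats%2520%2526%2520Dogs
console.log(reencode(once)); // Cats%20%26%20Dogs
```

With the decode step in place, the value stays stable no matter how many times it passes through the hash.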

Cyrillic characters from javascript cookie to php output via $_COOKIE

When I try to put Russian text in a cookie via JavaScript and then output it via PHP, it returns:
%u043F%u0440%u043E%u0432%u0435%u0440%u043A%u0430
How do I decode this into normal Cyrillic characters?
This is the function I'm using to write to document.cookie:
function setCookie(c_name,val,c_expiredays,c_path,c_domain,c_secure)
{
var exdate=new Date();
exdate.setDate(exdate.getDate()+c_expiredays);
document.cookie=c_name+ "=" +escape(val)+
/* Additional settings */
((c_path) ? "; path=" + c_path : "") +
((c_domain) ? "; domain=" + c_domain : "") + // used to allow using only on a certain domain
((c_secure) ? "; secure" : "") + // used for HTTPS (SSL)
((c_expiredays==null) ? "" : ";expires="+exdate.toGMTString());
}
setCookie('name',$(this).val(),1);
On the server side, I'm outputting it like this:
(isset($_COOKIE['img_href_value']) ? $_COOKIE['img_href_value'] : '')
You can't put non-ASCII characters directly in a cookie (what happens if you try varies from browser to browser, and many of them mangle the result irretrievably).
So you have to choose some encoding to use on cookie values. It doesn't matter which, as long as your client-side JavaScript and server-side PHP agree on one encoding. URL-encoding is certainly a popular choice of encoding scheme for this purpose, but it's not mandated by any standard and tools won't automatically decode it for you.
To get characters out of a URL-encoded cookie value you must manually call rawurldecode on the value at the PHP end, or decodeURIComponent to extract from document.cookie at the JavaScript end.
To encode to this format the corresponding functions are rawurlencode on the PHP side and encodeURIComponent in JavaScript.
(This assumes you are using UTF-8 for your strings, which you should be.)
Don't use urlencode for this in PHP (it's for form submission parameters only and gets the space character wrong in this context), and definitely don't use escape in JavaScript (it gets every non-ASCII character wrong, coming up with that weird non-standard %uNNNN format you quoted).
(In general, JS escape/unescape is an ancient and highly questionable encoding scheme that you should almost never have any reason to use.)
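As a rough illustration of the difference, the sketch below builds the cookie string with encodeURIComponent instead of escape and reads it back with decodeURIComponent. buildCookie and readCookie are hypothetical helpers; in the browser the string returned by buildCookie would be assigned to document.cookie, and readCookie would be given document.cookie.

```javascript
// Hypothetical helper: builds the string you would assign to document.cookie.
function buildCookie(name, value, days) {
  const expires = new Date(Date.now() + days * 24 * 60 * 60 * 1000);
  return name + "=" + encodeURIComponent(value) +
    "; expires=" + expires.toUTCString() + "; path=/";
}

// Hypothetical helper: reads one value out of a "a=1; b=2" style cookie string.
function readCookie(cookieString, name) {
  for (const part of cookieString.split("; ")) {
    const eq = part.indexOf("=");
    if (eq === -1) continue;
    if (part.slice(0, eq) === name) {
      return decodeURIComponent(part.slice(eq + 1));
    }
  }
  return null;
}

const text = "\u043F\u0440\u043E\u0432\u0435\u0440\u043A\u0430"; // the word from the question

console.log(escape(text));             // non-standard %uNNNN form
console.log(encodeURIComponent(text)); // UTF-8 percent-encoding

const cookie = buildCookie("name", text, 1);
const pair = cookie.split(";")[0]; // just the "name=value" part
console.log(readCookie(pair, "name"));
```

The first output is the %uNNNN string quoted in the question; the second is the UTF-8 percent-encoded form that rawurldecode and decodeURIComponent both understand.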
Try with this: http://kobesearch.cpan.org/htdocs/Encode-JavaScript-Cyrillic/Encode/JavaScript/Cyrillic.pm.html

PHP form auto escaping posted data?

I have an HTML form POSTing to a PHP page.
I can read in the data using the $_POST variable on the PHP.
However, all the data seems to be escaped.
So, for example
a comma (,) = %2C
a colon (:) = %3a
a slash (/) = %2F
so things like a simple URL such as http://example.com get POSTed as http%3A%2F%2Fexample.com
Any ideas as to what is happening?
Actually you want urldecode. %xx is a URL encoding, not an HTML encoding. The real question is why you are getting these codes. PHP usually decodes the URL for you as it parses the request into the $_GET and $_REQUEST variables. POSTed forms should not be urlencoded. Can you show us some of the code generating the form? Maybe your form is being encoded on the way out for some reason.
See the warning on this page: http://us2.php.net/manual/en/function.urldecode.php
Here is a simple PHP loop to decode all POST vars
foreach($_POST as $key=>$value) {
$_POST[$key] = urldecode($value);
}
You can then access them as normal, but properly decoded. I, however, would use a different array to store them, as I don't like to pollute the superglobals (I believe they should always hold the exact data as set by PHP).
This shouldn't be happening, and though you can fix it by manually urldecode()ing, you will probably be hiding a basic bug elsewhere that might come round to bite you later.
When you POST a form using the default content type 'application/x-www-form-urlencoded', the values inside it are URL-encoded (%xx), but PHP undoes that for you when it makes the values available in the $_POST[] array.
If you are still getting unwanted %xx sequences afterwards, there must be another layer of manual URL-encoding going on that shouldn't be there. You need to find where that is. If it's a hidden field, maybe the page that generates it is accidentally encoding it using urlencode() instead of htmlspecialchars(), or something? Putting some example code online might help us find out.
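A small JavaScript sketch of what such an extra layer looks like. Here encodeURIComponent merely stands in for whatever code is doing the encoding, and the single decodeURIComponent call stands in for the one round of decoding the server performs; the variable names are illustrative.

```javascript
const url = "http://example.com";

const encodedOnce = encodeURIComponent(url);          // the normal, expected layer
const encodedTwice = encodeURIComponent(encodedOnce); // an extra, accidental layer

// The receiving side undoes exactly one layer.
const seenWhenCorrect = decodeURIComponent(encodedOnce);
const seenWhenDoubled = decodeURIComponent(encodedTwice);

console.log(seenWhenCorrect); // http://example.com
console.log(seenWhenDoubled); // http%3A%2F%2Fexample.com
```

The second output is exactly the kind of value reported in the question, which points to a second encoding step somewhere before the form is submitted.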
