I have a cookie which I am setting based on data in a field in Drupal. When I write the cookie using PHP, the extended ASCII characters are shown as hex codes (e.g. %7E), but if I write a similar cookie with JavaScript then the extended ASCII characters are shown in the cookie as single characters (e.g. ~).
This is the string I want in my cookie.
Section1~email,email.calendar,calendar.wordpresssml,wordpress.moodlesml,moodle.maharasml,mahara.gdrive,gdrive.eportfolio,eportfolioblogs.wiki,wiki.youtube,email.feature,feature|Section2~reader,reader|
If I use
setcookie("p", "Section1~email,email.calendar,calendar.wordpresssml,wordpress.moodlesml,moodle.maharasml,mahara.gdrive,gdrive.eportfolio,eportfolioblogs.wiki,wiki.youtube,email.feature,feature|Section2~reader,reader|", $expire);
I get Section1%7Eemail%2Cemail.calendar%2Ccalendar.wordpresssml%2Cwordpress.moodlesml%2Cmoodle.maharasml%2Cmahara.gdrive%2Cgdrive.eportfolio%2Ceportfolioblogs.wiki%2Cwiki.youtube%2Cemail.feature%2Cfeature%7CSection2%7Ereader%2Creader%7C
rather than the string I want. If I write the cookie using JavaScript it works fine. I know this is an encoding issue, but I would really like PHP to write the cookie using the full set of extended ASCII characters.
Why is this a problem? You have to remember that this will appear in the Set-Cookie HTTP header and by not encoding it you will always have to remember what the special characters are and avoid using them. With encoding, you don't have to worry about that.
With PHP, when you do $_COOKIE['p'] it will appear as you originally intended. If you want to extract it in JavaScript using document.cookie or something, then you can use decodeURIComponent(cookieValue).
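For example, a minimal sketch of both options, using a shortened value and a one-hour expiry for illustration:

<?php
// setcookie() urlencodes the value on the way out, and PHP decodes it
// again when populating $_COOKIE, so the round trip is transparent.
setcookie('p', 'Section1~email,email|Section2~reader,reader|', time() + 3600);
// On the next request: echo $_COOKIE['p']; prints the original string, tilde and all.

// setrawcookie() skips the encoding, but note that it rejects values
// containing "," or ";" (among other characters), so the commas in the
// full string from the question could not be sent raw this way.
setrawcookie('q', 'Section1~email|Section2~reader|', time() + 3600);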
I have created a PHP page with UTF-8-BOM encoding. I want to use this encoding because I have some content in my regional language, and to display it properly I need to use UTF-8-BOM.
Now I want to use a session with this page, but it is throwing a "headers already sent" error.
So is there any way I can use both together?
If I try to use plain UTF-8, the data does not display properly in my regional language.
The "Byte Order Mark" is a sequence of 3 bytes that a file begins with, making it pretty much incompatible with PHP, because a script that is supposed to contain only PHP code must start with the <?php tag instead.
Obviously, it's not like the whole thing doesn't work at all, but anything that involves sending HTTP headers (which is A LOT) automatically gets broken.
Sessions use cookies, which are transferred via headers, so they won't work.
Redirecting to another page (the Location header) won't work.
Dynamically generated downloads won't work either; the downloaded file itself will be broken.
etc.
Sorry, but you'll have to give up on the BOM and figure out another way to handle your locale-specific data (which I can only assume is using another charset for whatever reason).
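If the BOM is being added by your editor, a minimal sketch (with a hypothetical file path) that strips the three BOM bytes from an already-saved script:

<?php
// Strip a UTF-8 BOM (the bytes EF BB BF) from the start of a script file
// so that header(), session_start(), etc. work again.
$path = 'page.php'; // hypothetical path to the offending script
$contents = file_get_contents($path);
if (substr($contents, 0, 3) === "\xEF\xBB\xBF") {
    file_put_contents($path, substr($contents, 3));
}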
I have a web app where I first store JSON data in a cookie, then save to the database every x seconds. It just opens a connection to the server, and the server reads the cookie. It doesn't actually send anything via POST or GET.
While I save to the cookie, my data is formatted fine. However, when I work with it in PHP and then setcookie() a new json_encode()'d array, it replaces spaces with + symbols, and then these show up in my web app. I can't find any way to disable the encoding of strings for json_encode, nor a JS way of parsing those plus symbols out (using jQuery.parseJSON; JSON.parse didn't work either). Does anyone have any idea?
From the fine manual:
Note that the value portion of the cookie will automatically be urlencoded when you send the cookie, and when it is received, it is automatically decoded and assigned to a variable by the same name as the cookie name. If you don't want this, you can use setrawcookie() instead if you are using PHP 5.
But I think you still want the cookie URL encoded, you just want %20 for spaces instead of +. However, urlencode:
[...] for historical reasons, spaces are encoded as plus (+) signs
You could try using rawurlencode to encode it yourself:
Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 3986 [...]
And then setrawcookie to set the cookie. Unfortunately, none of decodeURI, decodeURIComponent, or even the deprecated unescape JavaScript functions will convert a + back to a space, so you're probably stuck fixing the encoding on the PHP side.
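Putting the two together, a minimal sketch (the cookie name and payload here are made up):

<?php
// rawurlencode() percent-encodes spaces as %20 (RFC 3986) instead of +,
// and setrawcookie() skips PHP's own urlencode() pass, so nothing gets
// encoded twice.
$payload = json_encode(array('message' => 'hello world'));
setrawcookie('data', rawurlencode($payload), time() + 3600, '/');
// In the browser, decodeURIComponent() followed by JSON.parse() should
// then recover the original object.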
Hey, guys. I work for http://pastebin.com and we have a little issue with the new API and char encoding.
On the site itself we serve a meta tag which specifies that everything on the site, including the forms, is UTF-8. Because of this, all characters get stored correctly, without us having to convert anything.
With the API, however, people can send data from all kinds of different sources and forms, so it has to be checked and possibly converted before we store it.
Characters that are causing problems are, for example:
고객님이 티빙
Iñtërnâtiônàlizætiøn
♥♥♥♥♥
идите в *оопу, он лучший)
What would be a good way to approach this data input to the API to make sure all characters get stored in valid UTF-8, which will work on our site?
Assuming your client is sending UTF-8 data and headers correctly: it sounds like you're doing a utf8_encode() on already-encoded UTF-8 data.
Duplicate: What is the best way to handle uploaded text files of different encodings?
In a nutshell, the only reliable way is having the client specify what encoding they are using. Automatic encoding detection is imperfect and tends to be unreliable.
You could for example specify that incoming data needs an encoding specified if it's not UTF-8.
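A rough sketch of that approach (the paste_code and encoding parameter names are hypothetical):

<?php
$text    = isset($_POST['paste_code']) ? $_POST['paste_code'] : '';
$claimed = isset($_POST['encoding'])   ? $_POST['encoding']   : 'UTF-8';

// Only convert when the input is not already valid UTF-8; converting
// (or utf8_encode()-ing) data that is already UTF-8 double-encodes it.
if (!mb_check_encoding($text, 'UTF-8')) {
    $text = mb_convert_encoding($text, 'UTF-8', $claimed);
}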
After answering Zend_Cache: After loading cached data, character encoding seems messed up:
I use mb_internal_encoding("UTF-8") to change PHP's internal encoding, which is ISO-8859-1 by default.
I need to change the encoding of every non-English input value, and by using this function I force PHP to treat every value as UTF-8, as you can see in the question linked above.
I am caching Arabic text in files using Zend_Cache, and I wasn't able to do it without that function.
I need to know: how bad is it to use mb_internal_encoding("UTF-8")?
I have adopted this function in every project I work on; all of them use non-English characters.
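For context, here is a minimal sketch of how I call it and what it actually changes, namely the default encoding the mb_* string functions assume:

<?php
mb_internal_encoding("UTF-8"); // set once at bootstrap, before any mb_* call

// The mb_* functions now default to UTF-8 and count characters, not bytes:
echo mb_strlen('العربية');             // 7
echo mb_substr('Iñtërnâtiônàl', 0, 4); // "Iñtë"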
Using PHP against a UTF-8 compliant database. Here's how input goes in:
user types input into textarea
textarea encoded with javascript escape()
passed via HTTP post
decoded with PHP rawurldecode()
passed through HTMLPurifier with default settings
escaped for MySQL and stored in database
And it comes out in the usual way, and I run unescape() on page load. This is to allow people to, say, copy and paste directly from a Word document and have the smart quotes show up.
But HTMLPurifier seems to be clobbering non-UTF-8 special characters, ones that escape() to a simple % expression, like Ö, which escapes to %D6, whereas smart quotes escape to %u2024 or something and go into the database that way. It takes out both the special character and the one immediately following.
I need to change something in this process. Perhaps I need to change multiple things.
What can I do to not get special characters clobbered?
textarea encoded with javascript escape()
escape() isn't safe for non-ASCII characters. Use encodeURIComponent() instead.
passed via HTTP post
I assume that you use XMLHttpRequest? If not, make sure that the page containing the form is served as UTF-8.
decoded with PHP rawurldecode()
If you access the value through $_POST, you should not decode it, since that has already been done. Doing so will mess up the data.
escaped for MySQL and stored in database
Make sure you don't have magic quotes turned on. Make sure that the database stores tables as UTF-8 (the encoding and the collation must both be utf-8). Make sure that the connection between PHP and MySQL is UTF-8 (use SET NAMES utf8 if you don't use PDO).
Finally, make sure that the page is served as UTF-8 when you output the string again.
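A minimal sketch of those last two points, assuming mysqli and made-up connection details:

<?php
// Force the PHP-to-MySQL connection to UTF-8; this is the mysqli
// equivalent of running SET NAMES utf8.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');
$db->set_charset('utf8');

// And serve the page itself as UTF-8 (before any output):
header('Content-Type: text/html; charset=utf-8');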