Special characters escaping with JS and PHP - php

My application gets text from an input field and posts it over AJAX to a PHP file, which saves it to the database.
var title = encodeURIComponent($('#title').val());
If I escape() the title everything is OK, but I have problems with the "+" character, so I use encodeURIComponent() instead.
Now I have a problem with German special characters like "ö", "ä" and "ü": they are displayed as garbled text.
Does anyone have an idea how I can solve this problem?
Thx

I suppose this has to do with encoding : your HTML page might be using UTF-8, and the special characters are encoded like this :
>>> encodeURIComponent('ö');
"%C3%B6"
When your PHP page receives this, it has to know it's UTF-8, and deal with it as UTF-8 -- which means that everything on the server-side has to work with UTF-8 :
PHP code must use functions that can work with multi-byte characters
The database (db, tables, columns, ...) must use UTF-8 for storing data
When generating HTML pages, you need to indicate it's UTF-8 too, ...
For instance, if you are using var_dump() on the PHP side to display what's been sent from the client, don't forget to indicate that the generated page is in UTF-8, with something like this :
header('Content-type: text/html; charset=UTF-8');
Otherwise, the browser will use its default charset, which is not necessarily the right one, and may display garbage.
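As a minimal sketch of what the client sends (runnable in Node.js or a browser console): encodeURIComponent() emits the UTF-8 bytes of each character as percent-escapes, and it also escapes "+", which a form-encoded request would otherwise turn into a space. The sample title is made up for illustration.

```javascript
// encodeURIComponent() percent-encodes the UTF-8 bytes of each character.
const encodedUmlaut = encodeURIComponent('ö');   // 'ö' is the two bytes C3 B6 in UTF-8
const encodedPlus = encodeURIComponent('1+1');   // '+' is escaped as %2B

// A UTF-8 aware receiver gets the original text back after decoding.
const sampleTitle = 'Grüße + Küsse';             // hypothetical input value
const roundTrip = decodeURIComponent(encodeURIComponent(sampleTitle));

console.log(encodedUmlaut, encodedPlus, roundTrip === sampleTitle);
```

The server only sees the same text if it also treats the decoded bytes as UTF-8.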

You might use escape("AbcÄüö") and you would get "Abc%C4%FC%F6".
In PHP you could then use urldecode($myValue) to get "AbcÄüö" again, but as single ISO-8859-1 bytes rather than UTF-8.
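A short comparison, as a sketch, of why the two functions behave differently. escape() is a legacy function that writes code points below 256 as a single %XX (which happens to match ISO-8859-1) and leaves "+" untouched, while encodeURIComponent() writes UTF-8 bytes.

```javascript
// Legacy escape(): one %XX per character for code points below 256,
// %uXXXX above that. The result is not UTF-8.
const legacy = escape('AbcÄüö');

// encodeURIComponent(): percent-encoded UTF-8, two bytes per umlaut.
const modern = encodeURIComponent('AbcÄüö');

// escape() does not touch '+', which is why the plus sign gets lost
// when the server decodes the form data ('+' means space there).
const legacyPlus = escape('a+b');

console.log(legacy);     // Abc%C4%FC%F6
console.log(modern);     // Abc%C3%84%C3%BC%C3%B6
console.log(legacyPlus); // a+b
```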

Related

PHP urlencode for chinese characters

I'm creating a php application that involves sending chinese characters as url parameters.
I have to send query like :
http://xyz.com/?q=新
But the script at xyz.com won't automatically encode the Chinese character, so I need to explicitly send an encoded string as the parameter. It becomes:
http://xyz.com/?q=%E6%96%B0
The problem is, PHP won't encode the chinese character properly.
I've tried urlencode() and rawurlencode(). But they give %D0%C2 (doesn't work for my purpose) instead of %E6%96%B0 (works well with xyz.com) as the output.
I'm using this website to create the latter encoded string.
I've also defined header('Content-Type: text/html; charset=gb2312'); to display chinese characters properly.
Is there anything I can do to urlencode the chinese character properly?
Thanks!
PS: I'm a relatively new programmer and don't understand chinese.
You're URLencoding using the charset you specify in your header. %D0%C2 is 新 in gb2312; %E6%96%B0 is 新 in UTF-8. Switch your charset over to UTF-8 and you should fix this issue and still be able to display Simplified Chinese Han.
In order to reproduce your problem I created a simple PHP file:
<?php
var_dump(urlencode('新'));
?>
First I used UTF8 encoding and got %E6%96%B0. Afterwards I changed to GB2312 and got %D0%C2.
At http://meyerweb.com/eric/tools/dencoder/ they seem to use JavaScript, that's UTF8 capable and therefore returns %E6%96%B0, too.
PS: When changing from GB2312 to UTF8, some editors might break internationalized code. So please make sure to have a copy of your file before converting!
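The UTF-8 side of this can be checked quickly in Node.js or a modern browser. This sketch only covers UTF-8, since JavaScript's built-in URL encoding functions do not produce GB2312 output.

```javascript
// TextEncoder always produces UTF-8 bytes.
const utf8Bytes = Array.from(
  new TextEncoder().encode('新'),
  (byte) => byte.toString(16).toUpperCase()
);

// encodeURIComponent() percent-encodes those same UTF-8 bytes.
const percentEncoded = encodeURIComponent('新');

console.log(utf8Bytes.join(' ')); // E6 96 B0
console.log(percentEncoded);      // %E6%96%B0
```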

encoding from any language to utf in php

I insert characters from different languages from a CSV file.
I apply this to every set of characters:
private function process_elements($element){
utf8_encode($element);
return $element;
}
The problem is when they go into the database, they go like this:
???????? ?? ???????????? ????? ??????? ??? ???????...
When I retrieve them from the database, I also get this.
This happens with Greek. However, when I retrieve Greek pages (through scraping) that are UTF-encoded, the characters look like this:
Δες webcam δωμάτια | Gr.ImLive.com
which is okay, because when I use the utf8_encode function, they look normal on the screen.
But when the data is taken from the CSV and put into the database, I get those question marks.
Is there a way to encode from any language to UTF? Why does retrieving data from a CSV and from a UTF-8 encoded web page make such a difference? They look the same. How do I address that problem?
Please take a look at "Handling Unicode Front To Back In A Web App"; it will help you.
It's not about "languages", it's about encodings. Text is encoded as bits and bytes. Any one byte is equal to any other byte. If you only have a blob of bytes, you cannot know what encoding it represents. You can guess, but that's not accurate. You have to know what encoding some text is in by reading the accompanying meta data. That may be documentation, a <meta> tag or an HTTP header. Then you need to treat the text in that encoding.
utf8_encode actually converts text from ISO-8859-1 to UTF-8. It does not simply encode anything to UTF-8, because it does not have the means to determine what something is encoded in either. If your text is already UTF-8 encoded or was not ISO-8859-1 encoded to begin with, you're just garbling the text (as you are).
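The garbling described above can be reproduced in Node.js. This is only a sketch: latin1ToUtf8 is a name made up here, and it mimics the documented behaviour of utf8_encode (treat every input byte as an ISO-8859-1 character and re-encode it as UTF-8) rather than calling PHP.

```javascript
// Hypothetical stand-in for PHP's utf8_encode(): interpret each input byte
// as an ISO-8859-1 character, then encode the resulting text as UTF-8.
function latin1ToUtf8(bytes) {
  return Buffer.from(bytes.toString('latin1'), 'utf8');
}

const latin1Input = Buffer.from([0xf6]);    // "ö" stored as ISO-8859-1
const utf8Input = Buffer.from('ö', 'utf8'); // "ö" already stored as UTF-8 (C3 B6)

// Correct use: the input really is ISO-8859-1.
const converted = latin1ToUtf8(latin1Input).toString('utf8');

// Misuse: the input was already UTF-8, so each of its two bytes is
// converted separately and one character turns into two.
const garbled = latin1ToUtf8(utf8Input).toString('utf8');

console.log(converted); // ö
console.log(garbled);   // Ã¶
```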

decoding ISO characters

I have Chinese characters encoded in ISO-8859-1, for example &#20860; = 兼
Those characters are taken from the database using AJAX and sent as JSON using json_encode.
I then use the template Handlebars to set the data on the page.
When I look at the ajax page the characters are displayed correctly, the source is still encoded.
But the final result displays the encoded characters.
I tried to decode on the javascript part with unescape but there is no foreach with the template that gives me the possibility to decode the specific variable, so it crashes.
I tried to decode on the PHP side with htmlspecialchars_decode but without success.
Both pages are encoded in ISO-8859-1. I can change them to UTF-8 if necessary, but the data in the database remains encoded in ISO-8859-1.
Thank you for your help.
You're simply representing your characters in HTML entities. If you want them as "actual characters", you'll need to use an encoding that can represent those characters, ISO-8859 won't do. htmlspecialchars_decode doesn't work because it only decodes a handful of characters that are special in HTML and leaves other characters alone. You'll need html_entity_decode to decode all entities, and you'll need to provide it with a character set to decode to which can handle Chinese characters, UTF-8 being the obvious best choice:
$str = html_entity_decode($str, ENT_COMPAT, 'UTF-8');
You'll then need to make sure the browser knows that you're sending it UTF-8. If you want to store the text in the database in UTF-8 as well (which you really should), best follow the guide How to handle UTF-8 in a web app which explains all the pitfalls.
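If the decoding has to happen on the client instead, a rough JavaScript counterpart might look like the sketch below. It only handles decimal and hexadecimal numeric entities (not named ones such as &amp;), and decodeNumericEntities is a name invented for this example.

```javascript
// Replace numeric HTML entities (&#20860; or &#x517C;) with the characters
// they stand for. Named entities and out-of-range values are left untouched.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, (match, hex, dec) => {
    const codePoint = hex ? parseInt(hex, 16) : parseInt(dec, 10);
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericEntities('&#20860;'));  // 兼
console.log(decodeNumericEntities('&#x517C;'));  // 兼
console.log(decodeNumericEntities('a &amp; b')); // a &amp; b
```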
Are you including your text with the "double-stache" Handlebars syntax?
{{your expression}}
As the Handlebars documentation mentions, that syntax HTML-escapes its output, which would cause the results you're mentioning, where you're seeing the entity &#20860; instead of 兼.
Using three braces instead ("triple-stache") won't escape the output and will let the browser correctly interpret those numeric entities:
{{{your expression}}}
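To see why the double-stache form shows the raw entity, here is a simplified stand-in for HTML escaping. It is not Handlebars' actual implementation (escapeHtml is a hypothetical helper covering only a few characters), but the effect on "&" is the same: the ampersand that starts the entity is itself escaped, so the browser prints the entity text literally.

```javascript
// Hypothetical, simplified HTML escaper used only for illustration.
function escapeHtml(text) {
  const replacements = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;' };
  return text.replace(/[&<>"]/g, (ch) => replacements[ch]);
}

const fromDatabase = '&#20860;';               // numeric entity for 兼

const doubleStache = escapeHtml(fromDatabase); // what escaped output puts in the page
const tripleStache = fromDatabase;             // unescaped output keeps the entity intact

console.log(doubleStache); // &amp;#20860;  (rendered as the literal text &#20860;)
console.log(tripleStache); // &#20860;      (rendered as 兼)
```

Unescaped output should only be used for values you trust, since it also lets any markup in the data through.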

Corrupted characters when jQuery.AJAX sends to PHP (UTF-8 & ISO-8859 incompatibilities)

I have a javascript/PHP script that does the following:
Uses javascript to find text on a web-page.
Transmits the text using jQuery AJAX to a PHP page.
The PHP stores the text in a MySQL database.
The trouble is, when I look at what has been stored in the database, some non-ASCII characters are corrupted.
I have simplified the problem and printed out the character codes of each letter to investigate what is going on.
For example: send over a single character, the pound sterling symbol.
When I check in PHP, what is being received is characters 0xC2 followed by 0xA3
(capital A circumflex followed by pound sterling).
I.e. I am getting a spurious extra character Â before the £.
I've looked at similar problems which suggested setting the jQuery.ajax contentType etc, but none of this made sense to me.
Thanks
Sounds like you've got mixed character sets (UTF-8 and ISO-8859) there. PHP won't mangle the single pound character into two on its own, but the browser might if it's been told to expect ISO-8859 but is sent UTF-8 instead. The Â is a dead giveaway for that.
Basically, make sure you're using UTF-8 at all stages of processing (database, PHP, html) and usually things will work much better.
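The two bytes from the question can be reproduced in Node.js. This sketch shows that the bytes themselves are correct UTF-8 and that the extra Â only appears when they are read as ISO-8859-1.

```javascript
// The pound sign is a single character but two bytes in UTF-8.
const sent = Buffer.from('£', 'utf8');

const hex = sent.toString('hex');        // the raw bytes on the wire
const misread = sent.toString('latin1'); // same bytes read as ISO-8859-1
const correct = sent.toString('utf8');   // same bytes read as UTF-8

console.log(hex);     // c2a3
console.log(misread); // Â£
console.log(correct); // £
```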
Finally got this to work.
The problem seems to be that the jQuery.ajax transmits data to the server using UTF-8 but the PHP expects iso-8859-1.
Solution: in PHP convert UTF-8 to ISO using the utf8_decode function, e.g.
$incomming = utf8_decode($_REQUEST['incomming']);
And when you send data back for the ajax return handler, use utf8_encode() to convert back to UTF-8.
Other things that seem to work include using the JavaScript escape() function on the data prior to transmission to the server and then unescaping the data in PHP with urldecode().
Other things I tried but couldn't get to work:
I tried to make ajax transmit in iso-8859-1 so it would be compatible with the PHP: In the jquery.ajax settings: contentType: "application/x-www-form-urlencoded; charset=iso-8859-1".
Seemed to have no effect.
I tried to make PHP use UTF-8: header('Content-Type: text/html; charset=utf-8').
Again, it didn't work.

Google Autocompleter and character encoding

I am using this autocompleter from Google
http://code.google.com/p/jquery-autocomplete/ (if you click on "Source" you can find all the source files for the script)
and everything is working fine, except it's having problems with special Croatian characters (like č, ć, ž etc. I'm not sure if you'll see these, so here's an idea of what I am talking about: link - the letter c with a hachek on top etc.)
Here's the setup:
an html file points to a jquery autocomplete script and a php file with the results array
the metadata for the html file has a charset of utf-8, no other pages have any kind of encoding at all
the array in the PHP file has those special characters encoded with HTML codes (the letter "ž" is replaced with &#382;, so a typical array element looks like this: "Po&#382;ega" => "5")
when I enter a search string into the input field, the returned results are displayed correctly (Požega etc.), but when I click a result to accept it, it enters Po&#382;ega into the input field, which is obviously not what I want
when my search string has a special letter in it, the script doesn't find anything
How do I fix this? Should I just replace the HTML special codes in the array with the actual special letters(it seems to work fine then, but I'm not sure whether everybody will see this as I intended)? If not, how do I set the character encoding on all pages so the special letters display correctly on the input field and they're searchable?
Thanks for the help!
Character encoding is such a pain in the ass with browsers. There are several things you can do to cover your bases, one of which you've already done.
Set the <meta> tag to indicate a charset of UTF-8
Use .htaccess to define a charset of UTF-8
Use PHP to define a charset of UTF-8 in the header (something like: header('Content-Type: text/html; charset=UTF-8');)
Making sure these are true should ensure that the data shows up on all UTF-8 supported browsers. By the way, I can see the special characters, so you must be doing something right. :)
