Strange special character encoding error - php

I have a very strange character encoding error:
I am sending a text field to a script via jQuery's ajax function.
Assuming I want to send the euro sign,
echo $string;
produces
€
however
echo base64_decode(base64_encode($string));
produces
€
Any hints on how I could debug this problem?

This is not a real world example though, is it? You are encoding it in one page, and decoding it in another, aren't you? In that case, you need to tell us which character set those pages use.
Pekka was right: my charsets got mixed up. After I set a global UTF-8 charset header, everything works fine.
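For debugging this kind of mix-up, it can help to look at the raw bytes instead of the rendered page. The sketch below is only an illustration, not the asker's code: it writes the euro sign as an escape sequence so the bytes are unambiguous, and shows that a base64 round trip returns exactly the same bytes. If the two outputs look different in a browser, the difference comes from how the page is being interpreted, not from base64.

```php
<?php
// Send the charset header before any output (the fix reported above).
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// The euro sign, written as a code point escape so it is always UTF-8.
$string = "\u{20AC}";

// base64 works on raw bytes, so a round trip must return identical bytes.
$roundTrip = base64_decode(base64_encode($string));

// bin2hex() shows the bytes actually held in each variable.
echo bin2hex($string), PHP_EOL;
echo bin2hex($roundTrip), PHP_EOL;
```

Both lines print the same hex string, which is the three-byte UTF-8 form of the euro sign.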

Related

PHP Character Encoding Error: How they do it?

Problem:
I have a textarea that accepts XML as content and posts it to the server. It works fine if all characters are ASCII, but when we enter data in Hebrew, simplexml_load_string fails to load the data, reporting invalid XML because the encoding breaks the posted data.
What I did:
I have set the HTML meta tag to UTF-8, and I have set the PHP header so the content is UTF-8.
I have MySQL set to 'SET NAMES utf8'.
print_r(iconv_get_encoding('all')); prints all three values as ISO-8859-1.
When I print $_POST it shows the Hebrew characters fine in the browser (and in the browser's view source as well), but the function still fails.
When I change php.ini to use UTF-8 as the iconv encoding, everything works fine again.
However:
The same server hosts hundreds of WordPress installations that run Hebrew websites, and they don't have this problem.
So, my question is: why does my code fail while WordPress or any other open source software works just fine with this encoding? I did try to set iconv to UTF-8 in the first executable line, but nothing changed for me.
I'm not sure I have explained my problem well or made my question clear; if not, please let me know. Thanks.
EDIT: I did try the utf8_encode and utf8_decode functions, but they failed too.
You need to use mb_internal_encoding('UTF-8') to tell PHP what encoding you are using. With this you are overriding the settings from php.ini.

PHP urlencode for chinese characters

I'm creating a php application that involves sending chinese characters as url parameters.
I have to send query like :
http://xyz.com/?q=新
But the script at xyz.com won't automatically encode the Chinese character. So I need to explicitly send an encoded string as the parameter. It becomes:
http://xyz.com/?q=%E6%96%B0
The problem is, PHP won't encode the chinese character properly.
I've tried urlencode() and rawurlencode(). But they give %D0%C2 (doesn't work for my purpose) instead of %E6%96%B0 (works well with xyz.com) as the output.
I'm using this website to create the latter encoded string.
I've also defined header('Content-Type: text/html; charset=gb2312'); to display chinese characters properly.
Is there anything I can do to urlencode the chinese character properly?
Thanks!
PS: I'm a relatively new programmer and don't understand chinese.
You're URLencoding using the charset you specify in your header. %D0%C2 is 新 in gb2312; %E6%96%B0 is 新 in UTF-8. Switch your charset over to UTF-8 and you should fix this issue and still be able to display Simplified Chinese Han.
In order to reproduce your problem I created a simple PHP file:
<?php
var_dump(urlencode('新'));
?>
First I saved the file as UTF-8 and got %E6%96%B0. Afterwards I changed it to GB2312 and got %D0%C2.
The tool at http://meyerweb.com/eric/tools/dencoder/ seems to use JavaScript, which is UTF-8 capable and therefore returns %E6%96%B0, too.
PS: When changing from GB2312 to UTF-8, some editors might break internationalized code. So please make sure to have a copy of your file before converting!
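If switching the whole page to UTF-8 is not an option, another possibility is to convert just the parameter before encoding it. This is a sketch under the assumption that the iconv extension is available and knows the GB2312 charset name; the byte string is the GB2312 form of 新 reported in the question.

```php
<?php
// The character 新 as it arrives on a GB2312 page (bytes from the question).
$gb2312 = "\xD0\xC2";

// Convert the bytes to UTF-8 before building the URL parameter.
$utf8 = iconv('GB2312', 'UTF-8', $gb2312);

echo urlencode($gb2312), PHP_EOL; // %D0%C2
echo urlencode($utf8), PHP_EOL;   // %E6%96%B0
```

urlencode() itself is charset-agnostic: it percent-encodes whatever bytes it is given, so the result depends only on the encoding of the input string.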

Corrupted characters when jQuery.AJAX sends to PHP (UTF-8 & ISO-8859 incompatibilities)

I have a javascript/PHP script that does the following:
Uses javascript to find text on a web-page.
Transmits the text using jQuery AJAX to a PHP page.
The PHP stores the text in a MySQL database.
The trouble is, when I look at what has been stored in the database, some non-ASCII characters are corrupted.
I have simplified the problem and printed out the character codes of each letter to investigate what is going on.
For example: send over a single character, the pound sterling symbol.
When I check in PHP, what is being received is the characters 0xC2 followed by 0xA3
(capital A circumflex followed by pound sterling).
I.e. I am getting a spurious extra character Â before the £.
I've looked at similar problems which suggested setting the jQuery.ajax contentType etc, but none of this made sense to me.
Thanks
Sounds like you've got mixed character sets: UTF-8 and ISO-8859. PHP won't mangle the single pound character into two on its own, but the browser might if it's been told to expect ISO-8859 but is sent UTF-8 instead. The Â is a dead giveaway for that.
Basically, make sure you're using UTF-8 at all stages of processing (database, PHP, html) and usually things will work much better.
Finally got this to work.
The problem seems to be that the jQuery.ajax transmits data to the server using UTF-8 but the PHP expects iso-8859-1.
Solution: in PHP convert UTF-8 to ISO using the utf8_decode function, e.g.
$incomming = utf8_decode($_REQUEST['incomming']);
And when you send data back for the ajax return handler, use utf8_encode() to convert back to UTF-8.
Other things that seem to work include using the JavaScript escape() function on the data prior to transmission to the server and then unescaping the data in PHP with urldecode().
Other things I tried but couldn't get to work:
I tried to make ajax transmit in ISO-8859-1 so it would be compatible with the PHP, using this in the jQuery.ajax settings: contentType: "application/x-www-form-urlencoded; charset=iso-8859-1".
It seemed to have no effect.
I tried to make PHP use UTF-8: header('Content-Type: text/html; charset=utf-8').
Again, it didn't work.
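To see what is going on with the pound sign, it may help to inspect the bytes directly. The sketch below hard-codes the two bytes described in the question instead of reading a real request, and assumes the iconv extension is available. It uses iconv() for the conversion because utf8_decode() is deprecated as of PHP 8.2.

```php
<?php
// The two bytes the question reports receiving for "£".
$received = "\xC2\xA3";

// A pattern with the /u modifier only matches when the subject is valid UTF-8.
$isUtf8 = preg_match('//u', $received) === 1;

// Convert to ISO-8859-1 only if the rest of the application really needs it.
$latin1 = iconv('UTF-8', 'ISO-8859-1', $received);

echo $isUtf8 ? 'valid UTF-8' : 'not UTF-8', PHP_EOL;
echo bin2hex($received), ' -> ', bin2hex($latin1), PHP_EOL;
```

The input is a single valid UTF-8 character, not two characters; it only looks like "Â£" when those two bytes are displayed as ISO-8859-1.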

I got weird characters extracting data from MySQL db

Well, I have a MySQL db, with collation utf8_unicode_ci, and it runs like a charm with the current application (written in CodeIgniter).
Now I'm developing a new PHP app, and when I try to retrieve the data, several characters are unreadable. The characters appear fine in the DB with phpMyAdmin, but when I put them on a web page, they come out like "ROLA �60".
These characters are Spanish letters, such as ñ, á, Ó... or symbols like €, Ø...
Where's the problem? I've set the page as meta http-equiv="Content-Type" content="text/html; charset=UTF-8", I've tried the mysql_set_charset() function, and still nothing.
Any experience with this kind of problems?
I have noticed this exact problem in my database driven applications, and it took me a long time to work it out!
The problem occurs because there are at least three places in your application that need their character sets defining, and they must all be the same character set (and that character set must be able to handle the characters you are handling).
The question mark symbol and its variations occur when the browser doesn't understand what character it is being passed.
Make sure your character sets match in the following places:
The HTML head section.
The database collation itself - this can be set in phpMyAdmin when you create a new table, or alter the schema of an existing table.
The most overlooked character set setting: The php.ini file's "default_charset" value (can be set via PHP script or the httpd.conf file).
Judging from your post, it looks like your PHP configuration might be using a different default_charset. This means that your database will be storing the characters fine, and will be sending the characters fine to your script, but the PHP script itself will not know what to do with them, and thus outputs the annoying question mark symbol to the browser.
Try changing the php.ini value to the same charset, and you may be surprised to see the characters displaying fine! If you don't have access to php.ini, you can change the value with the following function:
ini_set("default_charset", "UTF-8");
Hope this helps.
If mysql_set_charset() didn't work, try executing
mysql_query('SET NAMES utf8');
after establishing connection.
Any kind of problem requires only one experience: understanding of what are you doing and debugging. Very rare skills nowadays, thanks to sites like stackoverflow.
There are 3 levels where encoding gets involved:
database level
server side script level
HTML page level
Each should be checked in turn.
Checking the HTML level is easy. Just click the "View" menu in your browser, then "Encoding", and see which one is marked. If it's the right one, you are all right. If not, the wrong HTTP header is being sent and you have to make it right.
For the server side there is very little to do: just tell MySQL which encoding you want. It can be done in 2 ways:
mysql_set_charset() - the preferred one
a SET NAMES query
Note that the UTF-8 encoding is named utf8 in MySQL.
The database level seems O.K. in your case.
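The mysql_* functions used in this thread were removed in PHP 7, so on current PHP the connection charset has to be set through mysqli or PDO instead. The sketch below shows one way to do it with PDO, by declaring the charset in the DSN. The helper name utf8Dsn, the host and the database name are made up for illustration, and utf8mb4 is used here as MySQL's name for full UTF-8.

```php
<?php
// Hypothetical helper: builds a PDO DSN that declares the connection charset.
function utf8Dsn(string $host, string $database): string
{
    return sprintf('mysql:host=%s;dbname=%s;charset=utf8mb4', $host, $database);
}

$dsn = utf8Dsn('localhost', 'example_db');
echo $dsn, PHP_EOL;

// With a reachable server and real credentials (placeholders here):
// $pdo = new PDO($dsn, 'username', 'password');
// The mysqli equivalent is $mysqli->set_charset('utf8mb4');
```

Declaring the charset when connecting plays the same role as mysql_set_charset() or SET NAMES above: it tells the server which encoding the script sends and expects.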
mysql_query('SET NAMES utf8'); works when browsing via Chrome but not FF or IE
Try the code below; it may solve your problem:
<?php
$text = "This is the Euro symbol '€'.";
echo 'Original : ', $text, PHP_EOL;
echo 'TRANSLIT : ', iconv("UTF-8", "ISO-8859-1//TRANSLIT", $text), PHP_EOL;
echo 'IGNORE : ', iconv("UTF-8", "ISO-8859-1//IGNORE", $text), PHP_EOL;
echo 'Plain : ', iconv("UTF-8", "ISO-8859-1", $text), PHP_EOL;
?>
http://php.net/manual/en/function.iconv.php

PHP output showing little black diamonds with a question mark

I'm writing a php program that pulls from a database source. Some of the varchars have quotes that are displaying as black diamonds with a question mark in them (�, REPLACEMENT CHARACTER, I assume from Microsoft Word text).
How can I use php to strip these characters out?
If you see that character (� U+FFFD "REPLACEMENT CHARACTER") it usually means that the text itself is encoded in some form of single byte encoding but interpreted in one of the unicode encodings (UTF8 or UTF16).
If it were the other way around it would (usually) look something like this: Ã¤.
Probably the original encoding is ISO-8859-1, also known as Latin-1. You can check this without having to change your script: Browsers give you the option to re-interpret a page in a different encoding -- in Firefox use "View" -> "Character Encoding".
To make the browser use the correct encoding, add an HTTP header like this:
header("Content-Type: text/html; charset=ISO-8859-1");
or put the encoding in a meta tag:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
Alternatively you could try to read from the database in another encoding (UTF-8, preferably) or convert the text with iconv().
I also faced this � issue. Meanwhile I ran into three cases where it happened:
substr()
I was using substr() on a UTF-8 string, which cut through UTF-8 characters, so the cut characters could not be displayed correctly. Use mb_substr($utfstring, 0, 10, 'utf-8'); instead.
htmlspecialchars()
Another problem was using htmlspecialchars() on a UTF8 string. The fix is to use: htmlspecialchars($utfstring, ENT_QUOTES, 'UTF-8');
preg_replace()
Lastly I found out that preg_replace() can lead to problems with UTF. The code $string = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/', ' ', $string); for example transformed the UTF string "F(×)=2×-3" into "F � 2� ". The fix is to use mb_ereg_replace() instead.
I hope this additional information will help to get rid of such problems.
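The first and third cases can be reproduced in a few lines. This is a sketch that assumes the file is saved as UTF-8 and that the mbstring extension is loaded. For preg_replace() it uses the /u pattern modifier, which is an alternative to the mb_ereg_replace() fix mentioned above, not the same fix.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$text = 'äöü'; // three characters, six bytes in UTF-8

// substr() counts bytes, so this cuts through the second character.
$cutBytes = substr($text, 0, 3);

// mb_substr() counts characters, so the result stays valid UTF-8.
$cutChars = mb_substr($text, 0, 2, 'UTF-8');

var_dump(mb_check_encoding($cutBytes, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($cutChars, 'UTF-8')); // bool(true)

// The /u modifier makes PCRE treat both pattern and subject as UTF-8,
// so multibyte characters are replaced as a whole, not byte by byte.
$clean = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/u', ' ', 'F(×)=2×-3');
var_dump(mb_check_encoding($clean, 'UTF-8'));    // bool(true)
```

Without the /u modifier the character class is applied byte by byte, which is how half of a two-byte character can survive and show up as the replacement character.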
This is a charset issue. As such, it can have gone wrong on many different levels, but most likely, the strings in your database are utf-8 encoded, and you are presenting them as iso-8859-1. Or the other way around.
The proper way to fix this problem, is to get your character-sets straight. The simplest strategy, since you're using PHP, is to use iso-8859-1 throughout your application. To do this, you must ensure that:
All PHP source-files are saved as iso-8859-1 (Not to be confused with cp-1252).
Your web-server is configured to serve files with charset=iso-8859-1
Alternatively, you can override the webservers settings from within the PHP-document, using header.
In addition, you may insert a meta-tag in your HTML, that specifies the same thing, but this isn't strictly needed.
You may also specify the accept-charset attribute on your <form> elements.
Database tables are defined with encoding as latin1
The database connection between PHP and the database is set to latin1
If you already have data in your database, you should be aware that they are probably messed up already. If you are not already in production phase, just wipe it all and start over. Otherwise you'll have to do some data cleanup.
A note on meta-tags, since everybody misunderstands what they are:
When a web-server serves a file (A HTML-document), it sends some information, that isn't presented directly in the browser. This is known as HTTP-headers. One such header, is the Content-Type header, which specifies the mimetype of the file (Eg. text/html) as well as the encoding (aka charset).
While most webservers will send a Content-Type header with charset info, it's optional. If it isn't present, the browser will instead interpret any meta-tags with http-equiv="Content-Type". It's important to realise that the meta-tag is only interpreted if the webserver doesn't send the header. In practice this means that it's only used if the page is saved to disk and then opened from there.
This page has a very good explanation of these things.
As mentioned in earlier answers, it is happening because your text has been written to the database in iso-8859-1 encoding, or any other format.
So you just need to convert the data to utf8 before outputting it.
$text = "string from database";
$text = utf8_encode($text);
echo $text;
To make sure your MySQL connection is set to UTF-8 (or latin1, depending on what you're using), you can do this too:
$con = mysql_connect("localhost","username","password");
mysql_set_charset('utf8',$con);
or use this to check what charset you are using:
$con = mysql_connect("localhost","username","password");
$charset = mysql_client_encoding($con);
echo "The current character set is: $charset\n";
More info here: http://php.net/manual/en/function.mysql-set-charset.php
I chose to strip these characters out of the string by doing this -
ini_set('mbstring.substitute_character', "none");
$text= mb_convert_encoding($text, 'UTF-8', 'UTF-8');
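A minimal sketch of that stripping approach, assuming the mbstring extension is loaded. The sample string is made up: it wraps a word in the bytes 0x93 and 0x94 (the Windows-1252 curly quotes), which are not valid UTF-8 on their own. mb_substitute_character('none') is used here as the runtime way to apply the same setting as the ini_set() call.

```php
<?php
// Made-up sample: a word wrapped in Windows-1252 curly quote bytes,
// which are invalid when the string is treated as UTF-8.
$text = "\x93quoted\x94 text";

// Drop invalid sequences instead of replacing them with "?".
mb_substitute_character('none');

// Converting UTF-8 to UTF-8 leaves valid text alone and removes the rest.
$clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');

var_dump($clean);
```

Note that this throws the quotes away. If the characters should be kept, the text has to be converted from its real encoding instead.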
Just paste this code at the top of the page.
<?php
header("Content-Type: text/html; charset=ISO-8859-1");
?>
Based on your description of the problem, the data in your database is almost certainly encoded as Windows-1252, and your page is almost certainly being served as ISO-8859-1. These two character sets are identical except in the range 0x80-0x9F, where ISO-8859-1 has only control codes and Windows-1252 defines 27 printable characters, including left and right curly quotes.
Assuming my analysis is correct, the simplest solution is to serve your page as Windows-1252. This will work because all characters that are in ISO-8859-1 are also in Windows-1252. In PHP you can change the encoding as follows:
header('Content-Type: text/html; charset=Windows-1252');
However, you really should check what character encoding you are using in your HTML files and the contents of your database, and take care to be consistent, or convert properly where this is not possible.
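If you would rather serve the page as UTF-8, converting properly could look like the sketch below. It assumes the data really is Windows-1252 and that the iconv extension is available; the sample string is made up and uses the bytes 0x93 and 0x94, which are the curly quotes in Windows-1252.

```php
<?php
// Made-up sample: a word in Windows-1252 curly quotes, as Word might produce.
$fromDatabase = "\x93quoted\x94";

// Convert from the real source encoding to UTF-8 before output.
$utf8 = iconv('Windows-1252', 'UTF-8', $fromDatabase);

// The curly quotes are now the UTF-8 forms of U+201C and U+201D.
echo bin2hex($utf8), PHP_EOL;
```

The page then has to be served with charset=UTF-8, otherwise the converted quotes will be misread again.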
Apply this function to your variables:
utf8_encode($your_variable);
Try this:
mb_substr($description, 0, 490, "UTF-8");
This will help you. Put this inside <head> tag
<meta charset="iso-8859-1">
That can be caused by a Unicode or other charset mismatch. Try changing the charset in your browser; in one of the settings the text will look OK. Then it's a question of how to convert your database contents to the charset you use for displaying. (Which can actually be just adding a utf-8 charset statement to your output.)
What I ended up doing after I fixed my tables was to back them up and change the settings back to utf-8. Then I altered my dump file so that my character set entries read DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci.
Now I don't have character set issues anymore because the database and browser are both utf8.
I figured out what caused it: the effect of the web page and browser on the DB. On the Linux terminals (Ubuntu + Firefox) entries were stored as latin1, which is what the tables are set to. But on the Windows 10 + Edge terminals, the entries were forced into utf8. I also noticed that Windows 10 has issues staying with latin1, so I decided to bend with the wind and convert everything to utf8.
I figured it was a Windows 10 issue because we started using Win 10 terminals.
So yet again Microsoft bugs cause issues. I still don't know why the encoding changes on the forms, because the browser in Windows 10 shows the latin1 character set, but when the data goes in it is utf8 encoded and I get the data anomaly. In Linux + Firefox it doesn't do that.
This happened to work in my case:
$text = utf8_decode($text);
It turns the black diamond character into a question mark, so you can do:
$text = str_replace('?', '', utf8_decode($text));
Just add these lines before headers.
Accurate format of .doc/docx files will be retrieved:
if(ini_get('zlib.output_compression'))
ini_set('zlib.output_compression', 'Off');
ob_clean();
When you extract data from anywhere you should use the functions with the mb_ prefix (mb_FUNC_NAME).
Had the same problem it helped me out.
Or you can find the code of this symbol and use regexp to delete these symbols.
You can also change the character set in your browser. Just for debugging purposes.
Using the same charset (as suggested here) in both the database and the HTML has not worked for me... So, remembering that the output is generated as HTML, I chose to use &quot; (HTML code) or &#34; (ISO Latin-1 code) in my database text wherever quotes were used. This solved the problem while still giving me a quotation mark. It is odd that prior to this solution only some of the quotation marks and apostrophes displayed incorrectly while others were fine; the special code, however, worked in all instances.
I ran the "detect encoding" code after my collation change in phpmyadmin and now it comes up as Latin_1.
but here is something I came across looking a different data anomaly in my application and how I fixed it:
I just imported a table that has mixed encoding (with diamond question marks in some lines, all in the same column), so here is my fix. I used utf8_decode, which takes the undefined placeholder and puts a plain question mark in place of the "diamond question mark", then I used str_replace to replace the question mark with a space.
Here is the code:
<?php
include 'dbconnectfile.php';
// The variable $db (a mysqli connection) comes from my db connect file.
// inx is my auto-increment column.
// broke_column is the column I need to fix.
$qwy = "select inx,broke_column from Table ";
$res = $db->query($qwy);
while ($data = $res->fetch_row()) {
    $id = $data[0];
    // utf8_decode() turns each undefined character into a plain "?".
    $fixx = str_replace("?", " ", utf8_decode($data[1]));
    // I echoed the data to the screen because I like to see something as I execute it :)
    echo $id, $fixx;
    // Escape the value so quotes in the text cannot break the UPDATE statement.
    $insert = "UPDATE Table SET broke_column='" . $db->real_escape_string($fixx) . "' where inx='" . (int) $id . "'";
    $insresult = $db->query($insert);
    echo "<br>";
}
?>
For global purposes:
Instead of converting, encoding, and decoding each text, I prefer to leave the texts as they are and change the server's PHP settings instead.
So:
Leave the diamonds as they are.
From the browser's View menu, select "Text Encoding" and find the encoding which lets you see your text correctly.
Edit your php.ini and add:
default_charset = "ISO-8859-1"
or, instead of ISO-8859-1, the one which fits your text encoding.
Go to phpMyAdmin, select your database, and increase the length/value of that table's field to 500 or 1000; it will solve your problem.
