UTF-8 Lithuanian characters unrecognized in MySQL database - php

I have a well-known but hard-to-solve problem here. Yes, I searched the forum, but the existing threads are old, so I decided to create a new post.
I built a website using WordPress and included an HTML form on one page. When a user fills in the form (in his or her own language), the field values go into the MySQL table reg_form.
Everything works and the values are saved, BUT some characters specific to that language are not recognized. I tried a lot of different methods to solve this, but nothing helped.
The strangest thing is that the same characters are stored correctly in the WordPress tables, but not in the reg_form table that I created.
While trying to solve this I finally took a somewhat ridiculous approach: I created a NEW database and new tables, installed a fresh WordPress, created a new form, etc.
Here is what I did:
I used this suggestion first:
http://tympanus.net/codrops/2009/08/31/solving-php-mysql-utf-8-issues/
Yes, my files are saved using UTF8 encoding (without BOM). Yes, meta tags are ok. Yes, the FORM uses accept-charset='UTF-8'. Yes, all tables in database use UTF8. Yes, server, database and tables collation is the same “utf8_general_ci”.
Then I tried to insert in my code this:
$conn = mysql_connect($server, $username, $password);
mysql_set_charset("UTF8", $conn);
Then I tried this suggestion
link here: akrabat.com/php/utf8-php-and-mysql/
Then I tried to set Apache's AddDefaultCharset in .htaccess file using this link here: httpd.apache.org/docs/2.0/mod/core.html#AddDefaultCharset
BUT… the problem still remains. I can't see those specific characters properly, only weird hieroglyphs.

The problem you face comes from one specific detail of how database character encoding settings interact with WordPress.
WordPress has a general character encoding setting that normally takes care of its own database tables, but it does not touch the default character encoding of the database those tables live in.
So when your plugin or code adds a table of its own, you need to take care of the encoding settings yourself. By default the table inherits the default of the database it is created in, which is most likely latin1, and latin1 cannot represent all the characters of your language.
To set the default character set for the database (replace "wpdb" with your database name if it differs):
ALTER DATABASE wpdb CHARACTER SET utf8 COLLATE utf8_general_ci;
To change the character set of your existing table "reg_form" (replace charset_name with the target character set, for example utf8):
ALTER TABLE reg_form CONVERT TO CHARACTER SET charset_name;
Note: Backup your database first.
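To see why a latin1 table cannot hold Lithuanian text, here is a small sketch in Python (not the asker's PHP, so it runs without a database). It assumes Python's cp1252 codec is a reasonable stand-in for MySQL's latin1, and the sample string is made up for illustration. Characters with no latin1 equivalent are replaced by "?", which is roughly what MySQL does when it has to convert them.

```python
# Sketch: what survives when Lithuanian text is forced into latin1.
# Assumption: Python's cp1252 codec approximates MySQL's latin1 character set.
def store_as_latin1(text: str) -> str:
    # Characters that latin1 cannot represent are replaced with "?".
    raw = text.encode("cp1252", errors="replace")
    return raw.decode("cp1252")

sample = "ąčęėįšųūž"  # hypothetical form input
print(store_as_latin1(sample))
```

Of the nine letters in the sample, only š and ž exist in cp1252, so the rest are lost for good once stored this way.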

FINALLY! :)
The problem was that I was using mysqli_ queries. I switched to mysql_ queries (notice the change!) and it worked! Two weeks of hard work and research... Phew!
Now, who can properly explain the reason for this phenomenon? :)

Related

fix WordPress custom menu character encoding

The problem is only with custom menus. When I create a new menu and use non-Latin letters in the menu name, such as an Arabic name or a French name with characters like "é", it displays bogus characters such as "Municipalitéé" instead of "Municipalité".
The weird thing is that there are no issues if I add a new PAGE or POST through the WordPress CMS. This only happens when adding a new menu.
The entries are saved incorrectly in the database too (I checked the MySQL database with phpMyAdmin and saw that the values there have weird characters as well).
So I believe this has to do with the database connection. However, I do use the following code to specify the charset in wp-config.php, which contains the database connection settings:
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
I have been fighting with this problem over the last 24 hours. Appreciate any help
It's not weird at all. wp_editor automatically converts characters to HTML entities, so your accents and such are stored as entities and therefore displayed properly. Menus are a different story: they are saved as is. With encoding being an issue, some installs may have difficulties like yours. Try commenting out these 2 lines in the wp-config.php file:
define('DB_CHARSET', 'utf8');
define('DB_COLLATE', '');
If this does not succeed, validate your database's character encoding. (A simple trick if you have phpMyAdmin is to manually insert those characters and check that they are saved properly.)
Let me know if you still have issues after this!
Step 2 (edit 1):
Uncomment the lines from step 1, and make sure Settings->Reading->Encoding for Pages and Feeds is set to utf-8. Make sure the character set/collation of your MySQL database is set to
UTF-8/utf8_general_ci
If not, convert all tables. A plugin used to exist for this, but I haven't used it in ages and it seems to have been unmaintained for just as long. Still, slightly modifying its routines so they match the current data structure would take less time than doing this manually.
https://wordpress.org/plugins/utf-8-db-converter/
Let me know if step 2 fails.
To alter tables manually:
Change tables:
ALTER TABLE $table DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci
Change columns:
ALTER TABLE $table CHANGE $field_name $field_name $field_type CHARACTER SET utf8 COLLATE utf8_general_ci
But as stated, it's probably simpler to just use the plugin, modified to suit the new WordPress data structure; I believe that will take you less time than manually going through all existing fields in all existing tables. Note that changing the database default encoding only affects new tables and new columns.
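A minimal sketch of generating the table-level statements instead of typing them by hand. It is written in Python for illustration, the table names are hypothetical, and it uses MySQL's ALTER TABLE ... CONVERT TO CHARACTER SET form, which converts a table's text columns together with its default, rather than the two separate statements above. It only builds the SQL strings; running them (after a backup) is left to you.

```python
# Sketch: build one conversion statement per table.
# Assumption: table names are trusted identifiers (hypothetical WordPress tables).
def build_convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    statements = []
    for table in tables:
        # Backticks quote the identifier; CONVERT TO changes the text columns
        # and the table default in one step.
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
        )
    return statements

for sql in build_convert_statements(["wp_terms", "wp_term_taxonomy"]):
    print(sql)
```

Keep in mind that a conversion like this assumes the stored data is valid in its current character set; values that were already saved garbled will stay garbled.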
Edit 3: the issue was solved by using
define('DB_COLLATE', 'utf8_general_ci');
in the wp-config.php file (troubleshooting happened in chat).

Character Encoding & Databases

I am having a big problem with character encoding across my domains. The main issue is that I don't really understand it. I set all my websites to UTF-8 using the meta tag:
<meta charset="UTF-8">
That seemed to solve a few problems a while back. Now I am seeing problems between the website and the database: when a user enters a first or last name that has an accent in it, it doesn't display correctly. So I ran the following test.
I created a test table called 'test' (imaginative, I know).
I wrote a very small script that takes the value from a text box, puts it into this table, and then displays the contents of the table each time the page loads, so I could see what is going on.
Here are some screen shots, first, from the output of the page:
And then a screenshot of the database itself:
The type of the column "first" is VARCHAR(50). At first I left the settings as I normally would, and the collation was latin1_swedish_ci or something like that. After id 4, I changed it to utf8_bin, but that still didn't make a difference.
The problem is that the data still displays fine on the website but looks terrible in the database. Is that a problem, or is this how it should be done? I think the problems the users are complaining about occur when the data is put into emails, PDFs, etc., for which I don't think I set a character encoding.
Any help and advice would be greatly welcomed.
Make sure the charset is utf8 when you create the table:
create table my_table (
id int primary key,
.....
) engine=innodb charset=utf8;
Also make sure that your connection sets its charset to utf8; how to do that depends on the framework you're using.
You can also set PHP's internal character encoding to UTF-8:
/* Set internal character encoding to UTF-8 */
mb_internal_encoding("UTF-8");
See http://php.net/manual/en/function.mb-internal-encoding.php
Try using utf8_general_ci as the collation; that may resolve the problem.
How was the column type assigned? I would make sure it is along the lines of VARCHAR(n) CHARSET utf8, which makes the column store UTF-8. More commonly I see UCS-2 (VARCHAR(n) CHARSET ucs2), which can cause problems further down the line.
You can set this using NVARCHAR (see http://dev.mysql.com/doc/refman/5.0/en/charset-national.html), available since MySQL 5.
To change the type of a single column, you can use this kind of command:
ALTER TABLE tablenamehere MODIFY columnidhere newdatatypehere;
For example:
ALTER TABLE mytabelforphp MODIFY oldvarcharcolumnId NVARCHAR(1024);
Let me know if you need more info:)
Also set the character encoding on your database connection.
If you have a mysqli connection, use:
mysqli::set_charset
If you use PDO, set the charset in the DSN:
PDO_MYSQL DSN
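The symptom in the question (fine on the website, garbled in the database) is what one would expect when UTF-8 text travels over a connection that MySQL treats as latin1. The following Python sketch illustrates that at the byte level without a database; it is not the asker's script, cp1252 is assumed as a stand-in for MySQL's latin1, and the name "José" is made up.

```python
# Sketch: UTF-8 text sent over a connection that MySQL believes is latin1.
# Assumption: cp1252 stands in for MySQL's latin1; "José" is a made-up name.
def as_seen_in_database(text: str) -> str:
    # The page sends UTF-8 bytes, but each byte is treated as one latin1 character.
    return text.encode("utf-8").decode("cp1252")

def as_seen_on_website(stored: str) -> str:
    # Reading back over the same connection reverses the mistake,
    # so the page shows the original text again.
    return stored.encode("cp1252").decode("utf-8")

stored = as_seen_in_database("José")
print(stored)
print(as_seen_on_website(stored))
```

Anything that reads the table with the correct charset, which may include an admin tool or code that builds emails and PDFs, would see the garbled form instead. That is why setting the connection charset matters even when the site itself looks fine.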

How to prevent question mark in diamond shape from showing up?

I searched through Stack Overflow and found similar questions, but nothing that answered my particular situation, so I am asking this as a new question.
When I insert entries into the database from a textbox, I use mysql_real_escape_string(), and when I display the information I use htmlspecialchars().
I use UTF-8 as the charset, and we are using HTML5. The collation for the database was set to latin1_swedish_ci by default, so I used that. All the tables are set to latin1_swedish_ci, while all the fields in the tables use utf8_general_ci.
As an example, this is how it looks when it's shown:
�Who are you?� he asked his iPhone. �I am a humble
personal assistant,� the device replied, bringing the biggest...
How do I fix this?
Your problem is the collation setting. Make sure it is one of the utf8 ones (like utf8_general_ci) for the database, tables and text fields.
Furthermore, make sure that you set your connection charset to UTF-8 as well:
SET NAMES utf8;
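For illustration, here is a small Python sketch (not PHP, so it runs without a database) of where the diamond comes from. It assumes the curly quotes reach the browser as single latin1 bytes, with cp1252 standing in for MySQL's latin1, while the page is declared as UTF-8.

```python
# Sketch: latin1 bytes shown on a page that is declared as UTF-8.
# Assumption: cp1252 stands in for MySQL's latin1.
def render_as_utf8(latin1_bytes: bytes) -> str:
    # Bytes that are not valid UTF-8 are shown as U+FFFD, the "question mark in a diamond".
    return latin1_bytes.decode("utf-8", errors="replace")

quoted = "\u201cWho are you?\u201d".encode("cp1252")  # curly quotes become bytes 0x93 and 0x94
print(render_as_utf8(quoted))
```

With the connection charset set to utf8, MySQL would hand back the quotes as multi-byte UTF-8 sequences, which a UTF-8 page can display.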

Black Diamonds that are Fixing themselves in MySQL

I am running into a very strange issue with a site that I am working on. The site is basically a job board where the owner or users can create job listings, including a description that is stored in a MySQL text field. Whenever listings from certain sources are entered, they initially show the "black diamond with a question mark" character in place of apostrophes and double spaces. That part I know is an encoding issue, and I can correct it. The real question is this: the black diamonds show up when the record is displayed in a MySQL admin tool and when the job listing is viewed in a web browser (a simple SELECT statement displays the listing in a PHP app), but after the first time it is viewed, the problem somehow fixes itself. It is as if running the SELECT and displaying the record updates the job description field and fixes the encoding issue. How could this be? Has anyone ever heard of this or anything similar? I cannot understand how a database field would change without running an UPDATE statement...
How are the job listings entered? Are they entered via a web page? If so, what character encoding does the web page use? (This should determine the character encoding of the submitted data AFAIK.) What character set is the connection used to communicate with MySQL? What is the character set of the column the data is stored in? Finally, what is the character encoding of the web page(s) on which the entered data is reviewed?
Here is what I do: I declare all of my pages as UTF-8 encoded, using the following tag at the start of the <head> section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
I issue the following command immediately when I connect to MySQL, so as to make sure that MySQL understands the data I send to it will be UTF-8 encoded:
SET NAMES utf8
(Depending on the database abstraction method you use, a special function might be recommended in order to set the connection character set, like mysqli's mysqli_set_charset().)
I also make sure that those columns in which I intend to store UTF-8 data are declared to be UTF-8. You can find out what the character set of a column is by issuing SHOW CREATE TABLE table_name. The character set of the table (which by default is the character set for any column in the table) is displayed at the end. If the character set for the column is different to the default character set for the table then it is displayed as part of the column definition. If you wish to change the character set of a column then you can do so using ALTER TABLE.
If you have not previously taken the steps to handle character sets in your app then you may find that the tables are all using the latin1 character set. If you naively store UTF-8-encoded data (for example) into these columns, you may run into character encoding issues. Changing the column character set using ALTER TABLE does not necessarily fix your old data, because MySQL reads your old data assuming it to be valid latin1-encoded text and converts it to the equivalent UTF-8 (correctly converting what it has read, but not giving the result you want).
The above steps would hopefully mean that future data will be correctly encoded and correctly displayed, but you may have data already mis-encoded in your database, so be aware that if you follow the above steps and still see older data displaying incorrectly, this may be why. Good luck.
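As a rough illustration of what already mis-encoded data looks like and how one round of damage can be undone, here is a Python sketch (not PHP, so it runs without a database). The helper name repair_mojibake is hypothetical, cp1252 is assumed as a stand-in for MySQL's latin1, and it only handles the single case where UTF-8 bytes were read as latin1 once.

```python
# Sketch: undo one round of "UTF-8 read as latin1" mis-encoding.
# Assumption: cp1252 stands in for MySQL's latin1; repair_mojibake is a hypothetical helper.
def repair_mojibake(text: str) -> str:
    try:
        # Turn the characters back into the original bytes, then decode them properly.
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        # Not mis-encoded in this particular way: leave the value untouched.
        return text

print(repair_mojibake("itâ€™s"))
print(repair_mojibake("café"))
```

The try/except guard is there so that values which were stored correctly are returned unchanged. It is a heuristic, so test it on a copy of the data first.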
I ran into this problem a few years ago. I remember finding those notorious characters and replacing them in PHP with a single or double quote (with escaping, of course). A simple preg_replace for those characters will do the trick. It's just an encoding issue.
This page, though geared toward WordPress, might help:
http://codex.wordpress.org/Converting_Database_Character_Sets
I had the same issue (MySQL encoding and webpage encoding both set to UTF-8, but black diamonds showing up in my query results). I found this snippet while googling, but cannot for the life of me find its source to give proper attribution:
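// Prefer mysql_set_charset() when available: it also tells the client
// library (and mysql_real_escape_string) which charset is in use.
if (function_exists('mysql_set_charset')) {
    mysql_set_charset('utf8', $db_connection);
} else {
    // Fallback for older PHP versions.
    mysql_query("SET NAMES 'utf8'", $db_connection);
}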
Anyway, it cleared up the issue for me.

Will changing collation affect my database?

I'm trying to track down a bug where random characters appear when saving data to our database. So far my investigation indicates that it's a character encoding issue.
I've switched the collation on the dev database to utf8_general_ci and it doesn't seem to have made a difference to the system, but I'm still unsure about the full implications of changing the collation.
I have been poking around in http://dev.mysql.com/doc/refman/5.0/en/charset-charsets.html and it's still not entirely clear.
I've also updated the page containing the form to include a UTF-8 <meta /> tag.
The background of the issue: when a £ is posted from the form, it runs through our SQLBuilder class, where it is passed through mysql_real_escape_string (deprecated, I know :(), and ends up in the database, and subsequently in generated config files, as £
As I understand it, the collation is the way the database compares characters, but I'm still not totally sure.
Ninja edit
Web application, posting an HTML form through a PHP class, into a MySQL DB
I usually do a mysql_query("set names utf8"); immediately after connecting to the database.
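To separate the two ideas in the question, here is a short Python sketch (not PHP, so it runs without a database). The character set decides how text becomes bytes, and the collation decides how text is compared. The first function shows a character set mismatch, with cp1252 assumed as a stand-in for MySQL's latin1. The second is only a rough analogy to an accent- and case-insensitive collation such as utf8_general_ci, not MySQL's actual algorithm.

```python
import unicodedata

# Sketch: character set = how text becomes bytes; collation = how text is compared.
def misread_as_latin1(text: str) -> str:
    # Character set mismatch: UTF-8 bytes interpreted as latin1 (cp1252 as a stand-in).
    return text.encode("utf-8").decode("cp1252")

def loose_compare_key(text: str) -> str:
    # Rough analogy to an accent- and case-insensitive collation (an assumption,
    # not MySQL's actual algorithm): strip accents, then fold case.
    decomposed = unicodedata.normalize("NFD", text)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_accents.casefold()

print(misread_as_latin1("£"))
print(loose_compare_key("Café") == loose_compare_key("CAFE"))
```

In this picture, stray characters around a £ point to a character set mismatch somewhere between the form, the connection and the column, which is what the set names utf8 suggestion addresses. Changing only the collation would change comparison and sorting.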
