MySQL database migration UTF-8 issues with PHP [duplicate] - php

This question already has answers here: UTF-8 all the way through (13 answers). Closed 12 months ago.
I'm migrating my existing database to another server. To do that, I exported and imported the database using phpMyAdmin SQL queries. Everything works fine, except that some UTF-8 characters appear broken on the website. I fetch them using the same PHP code (on a different server, but with the same PHP extensions and version).
Example of a string as I see it on the new website and in the databases (both old and new, using phpMyAdmin): péri-prothétique
Example of a string as I see it on the old website: péri-prothétique
As you can see, PHP used to automatically encode the characters the right way even though they are mangled in the database, but it doesn't do so anymore (not even if I explicitly utf8_encode or utf8_decode the result). I even tried forcing $mysqli->set_charset("UTF8") on every connection, to no avail.
The web server, the database server, the server connection, PHP and the tables all use the UTF-8 or utf8mb4 charset and collation, and are set up the same way as the old ones.
The only difference I see is that the new database server is MariaDB instead of MySQL and its webserver is nginx instead of Apache.
New database specs picture from phpMyAdmin:
Old database specs picture:
New webserver specs on which the website and PHP runs (same specs as old one but different server):
Apache 2.4 PHP 7.0
How can I get back that old correct encoding? Why doesn't PHP automatically decode them right anymore?
UPDATE:
Using mb_detect_encoding I see that PHP, in both the new and the old version, detects ASCII or UTF-8 on the query results, depending on whether the string contains at least one non-ASCII character.
The issue is that on the new version PHP doesn't display the UTF-8 symbols correctly, even though it detects the string encoding as UTF-8.
UPDATE 2:
Thanks to this question I figured out why my entries were mangled: double encoding arose from the fact that the database collation was latin1_swedish_ci while the table collation was utf8_general_ci.
This doesn't answer the question, though, since the old website was automatically "translating" those mangled characters and rendering them correctly in the HTML. I want to replicate that behavior on the new website, which is a different one but has the same code and php.ini settings.

To check for double encoding, use SELECT HEX(col)... é should come back C3A9 (proper utf8), but instead shows C383C2A9 (double encoding).
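How the doubled bytes come about can be reproduced outside the database. The following is a minimal Python sketch (Python is used only for illustration, and the helper names hex_upper and double_encode are made up for this example). It takes correct UTF-8 bytes, misreads them as latin1, and encodes the result as UTF-8 again, which is the assumed mechanism behind the C383C2A9 pattern.

```python
def hex_upper(data: bytes) -> str:
    """Return the bytes as uppercase hex, similar to what HEX(col) shows."""
    return data.hex().upper()


def double_encode(text: str) -> bytes:
    """Simulate double encoding: UTF-8 bytes misread as latin1, then re-encoded."""
    utf8_bytes = text.encode("utf-8")        # correct bytes, C3 A9 for é
    misread = utf8_bytes.decode("latin-1")   # each byte becomes its own character
    return misread.encode("utf-8")           # second, unwanted UTF-8 encoding


print(hex_upper("é".encode("utf-8")))   # C3A9
print(hex_upper(double_encode("é")))    # C383C2A9
```

Note that the double-encoded bytes are still valid UTF-8, which would explain why encoding detection alone cannot tell the two cases apart.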
See: Trouble with UTF-8 characters; what I see is not what I stored
If you have actually determined that you have double encoding, then the fix involves
UPDATE tbl SET col = CONVERT(BINARY(CONVERT(col USING latin1)) USING utf8mb4);
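The same repair can be sketched on a single string in Python, assuming the text really is double-encoded. repair_double_encoding is a hypothetical helper, and Python's latin-1 codec only approximates MySQL's latin1 (which is based on cp1252). The guard leaves a string untouched when the round trip fails, which is what happens for text that was stored correctly.

```python
def repair_double_encoding(text: str) -> str:
    """Undo one layer of latin1/UTF-8 double encoding, if present."""
    try:
        # Map each character back to the single byte it was misread from,
        # then decode those bytes as the UTF-8 they originally were.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double-encoded (or not representable in latin1): keep as is.
        return text


print(repair_double_encoding("p\u00c3\u00a9ri-proth\u00c3\u00a9tique"))  # péri-prothétique
print(repair_double_encoding("péri-prothétique"))                        # péri-prothétique
```

A string that legitimately contains a sequence such as "Ã©" would also be rewritten, so it seems safer to try this on a copy of the data first.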
See http://mysql.rjweb.org/doc.php/charcoll#fixes_for_various_cases
Yes, "double encoding" is a silent bug -- two wrongs make a right (sort of).
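One way to picture the "two wrongs make a right" effect is to model what the server sends for a given connection character set. The sketch below is a simplified Python model, not MariaDB's actual code. bytes_sent_to_client is a hypothetical name, and the idea that the old connection was effectively latin1 is an assumption that the question does not confirm.

```python
def bytes_sent_to_client(stored_bytes: bytes, connection_charset: str) -> bytes:
    """Model a utf8 column being transcoded to the connection charset."""
    characters = stored_bytes.decode("utf-8")      # what the column believes it holds
    return characters.encode(connection_charset)   # conversion done on the way out


stored = bytes.fromhex("C383C2A9")  # double-encoded é, as reported by HEX(col)

# Connection declared as latin1: the second wrong cancels the first.
print(bytes_sent_to_client(stored, "latin-1").decode("utf-8"))  # é

# Connection declared as UTF-8: the stored mistake reaches the page unchanged.
print(bytes_sent_to_client(stored, "utf-8").decode("utf-8"))    # Ã©
```

Under this model, a page served as UTF-8 looks right only while the connection is treated as latin1, which would be consistent with set_charset("UTF8") not helping on the new server.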

I think you should check your MariaDB configuration.
First, check your PHP code for a misleading typo (though I don't think there is one).
Second, check the character sets of your MariaDB databases and tables:
SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;
Third, check your MariaDB configuration file (my.cnf):
[client]
default-character-set = utf8mb4
[mysqld]
character-set-server = utf8mb4
Then restart your server:
mysql.server restart
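The my.cnf check could also be scripted. The following is only a sketch in Python, using the standard configparser module on a sample that mirrors the settings above. charset_mismatches and EXPECTED are hypothetical names, and a real my.cnf may contain directives (such as include lines) that this simple parser does not handle.

```python
import configparser

SAMPLE_MY_CNF = """
[client]
default-character-set = utf8mb4

[mysqld]
character-set-server = utf8mb4
"""

# Settings this sketch expects to find (taken from the answer above).
EXPECTED = {
    ("client", "default-character-set"): "utf8mb4",
    ("mysqld", "character-set-server"): "utf8mb4",
}


def charset_mismatches(cnf_text: str) -> list:
    """Return (section, option, found) tuples that differ from EXPECTED."""
    parser = configparser.ConfigParser(allow_no_value=True, strict=False)
    parser.read_string(cnf_text)
    problems = []
    for (section, option), wanted in EXPECTED.items():
        found = parser.get(section, option, fallback=None)
        if found != wanted:
            problems.append((section, option, found))
    return problems


print(charset_mismatches(SAMPLE_MY_CNF))  # []
```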
Hope it will help you to fix your problem bro.
Bye

Are you expecting the change to affect existing data? It will not. You need to insert the data again to see the change: remove all the data from the new database and add it again.

Related

PHP doesn't show some characters properly

I am running a PHP CLI script on my local machine, which calls an API and receives back a UTF-encoded string. This string is a simple word in Portuguese, and I can see it properly when it is printed on the screen (terminal). However, on a different machine, some characters are not shown properly. Which php.ini option would I need to set in order to see the string properly on the other machine? I haven't touched my php.ini with regard to encoding and everything works fine, so I'm not sure what I need to enable for UTF to work in a PHP CLI app.
Edit: this should not be a terminal issue. To be more specific, the API returns a string (UTF-encoded) which is supposed to be found as a key in an array. However, on another machine PHP issues a warning saying that it can't find that string in the array. It is the "key doesn't exist" error that I'm getting, and the key shown in the message doesn't look right.
It is basically
$stringReturnedFromApi = $apiCall();
$this->myArray[$stringReturnedFromApi];
It works on my machine; on the other one PHP complains that the key doesn't exist, and when I inspected $stringReturnedFromApi, it didn't look UTF-encoded.
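A key lookup like this fails as soon as the bytes differ, even when the word looks the same to a human. The following Python sketch illustrates the idea with a made-up Portuguese word and a made-up array; none of these names come from the original script, and which encoding the API really sends on the other machine is an assumption.

```python
prices = {"ação": 10}  # hypothetical array, keyed by properly decoded text

api_bytes_utf8 = "ação".encode("utf-8")      # what the API is assumed to send
api_bytes_latin1 = "ação".encode("latin-1")  # same word in a different encoding


def lookup(raw: bytes, assumed_encoding: str):
    """Decode raw API bytes with the assumed encoding and look up the key."""
    key = raw.decode(assumed_encoding, errors="replace")
    return prices.get(key)  # None when the decoded key does not match


print(lookup(api_bytes_utf8, "utf-8"))    # 10
print(lookup(api_bytes_latin1, "utf-8"))  # None
```

Comparing a hex dump of the string on both machines would show whether the bytes themselves differ or only their display.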
The other terminals need to be set to UTF-8 encoding. For instance, PuTTY needs to be told which encoding to use; otherwise (on Windows) it will use ANSI, if I remember correctly.
If the other host is not on Windows, just check the locale settings to make sure they are UTF-8. The error is not on your side; this is just an encoding setting on the client side.
If the result is printed to a web page, make sure the charset is set to UTF-8 there as well.
I am not sure that this is actually a PHP problem.
Without seeing your script I can't say anything 100%, but I am guessing that the PHP script is actually correctly outputting the UTF character. The reason it looks odd is because the terminal doesn't understand UTF encoding and is unable to display the character.
You would need to be a little more specific about the terminal application you are using to determine where the actual problem lies. I doubt it is something you can fix in php.ini.

MySQL collation/charset - I'm not sure what to do?

I have a multilingual WordPress website on my host (which someone else built), and I want to install a copy of this site on XAMPP so that I can make changes.
I don't know phpMyAdmin and MySQL very well. Usually I just use utf8_general_ci for everything.
After exporting and importing, I kept getting syntax errors. So... I figured this was due to the collation and charset of the database? I checked in the database: next to some of the columns it says "utf8_general_ci", next to others it says "utf8_unicode_ci", and at the bottom of the table it says "latin1_swedish_ci". So now I have no clue what the collation of this database is...
How do I find out what the collation and charset of this database are? And how do I export and import everything as is? I heard phpMyAdmin automatically converts databases during export... is this true? What must I do to get this database exported and imported correctly?

Kannada words are displayed as question marks in the Firefox browser

I am trying to display Kannada words in the Mozilla Firefox browser on Ubuntu 12.04 LTS, reading them from MySQL.
I have used the collation utf8_general_ci and the PHP call header('Content-type:text/html; charset=utf-32');.
When I retrieve the words from the database and display them in Firefox, they show up as question marks...
Please help.
The character encoding that you declare in a header must be the same as the actual encoding. It seems that these differ radically (UTF-32 vs. possibly UTF-8). Find out the actual encoding and declare it.
Don’t use UTF-32 on web pages. Firefox was the last major browser that supported it, and the support was removed in 2011.
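The effect of declaring one encoding while sending another can be checked with a few lines of Python. This is only an illustration: renders_correctly is a hypothetical helper, and that the page bytes are actually UTF-8 is an assumption the question does not confirm.

```python
def renders_correctly(text: str, actual: str, declared: str) -> bool:
    """True if bytes written as `actual` survive being read as `declared`."""
    payload = text.encode(actual)
    try:
        return payload.decode(declared) == text
    except UnicodeDecodeError:
        # The declared encoding cannot even parse the bytes.
        return False


word = "ಕನ್ನಡ"  # the word "Kannada" in Kannada script
print(renders_correctly(word, "utf-8", "utf-8"))   # True
print(renders_correctly(word, "utf-8", "utf-32"))  # False
```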

Migrate web-pages from different char-sets to UTF-8

For the last few years I have used Notepad++ on Win XP SP2.
As I have just seen, Notepad++ is set to encode new files as "ANSI" in "Windows Format". So basically all files on my hard disk should be ANSI files, but I'm not sure.
Most .html files have a charset tag such as "text/html; charset=iso-8859-1", but some have none.
For other files, especially text files (for example keyword lists) that I stored with the Firefox XPCOM system, I don't know how they are currently encoded.
On the server side I have Apache with PHP and MySQL.
For uploads I used FileZilla.
Now the problem is: I want to use Japanese characters (or Arabic, etc.). This only works partly.
I can get my self-made Firefox application to consistently read and write UTF-8. But I can't check every time which encoding each of the old files uses.
Having just read Joel Spolsky's old article about UTF-8 strengthens my view that I simply have to switch as much of my whole system as possible to UTF-8.
Once I have it running that way locally on my hard disk, I could just re-upload everything to the server.
So: how do I get all my local files converted to UTF-8?
And: is it possible at all to have Win XP SP2 use UTF-8 consistently everywhere? Or do I have to check for every program, or even worse for every file, that the right encoding is used?
What about files I get, for example, in e-mails or via a USB stick, or that I download in zip files? (Or a thousand other possibilities.)
Update:
1.-4. went OK so far. I tried first with BOM, but without seems to be better.
So, on to 5.): I have to change something there too. I changed the charset in the HTML template file as in 3.), and the text coming from the template is displayed correctly. But the text coming from MySQL/PHP currently shows the unknown-character sign in some places, i.e. where there should be umlauts (äöü).
I have changed the collation of all text fields in the MySQL database to "utf8_unicode_ci" via phpMyAdmin, but that didn't do the trick.
Is it a PHP issue, or do I only have to convert the data in the MySQL database once somehow?
The beauty of UTF-8 is that it's a superset of ASCII, so if your HTML and PHP files only contain the basic Latin alphabet (i.e. English and programming/HTML syntax), you don't need to convert them at all. You can leave most of your files unchanged.
Should you find a few exceptions that you want to convert manually, you can open them in Notepad++ and choose 'Encoding' - 'Convert to UTF-8 (No BOM)'.
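If there turn out to be more than a few such files, the conversion could be scripted instead. The following Python sketch assumes that "ANSI" means Windows-1252 (cp1252), which is typical for Western European Windows but is not confirmed by the question. to_utf8_bytes and convert_file are hypothetical helper names.

```python
from pathlib import Path


def to_utf8_bytes(raw: bytes, legacy_encoding: str = "cp1252") -> bytes:
    """Return `raw` as UTF-8 bytes without BOM, converting only if needed."""
    try:
        raw.decode("utf-8")   # already valid UTF-8 (this includes pure ASCII)
        return raw
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is in the legacy encoding.
        return raw.decode(legacy_encoding).encode("utf-8")


def convert_file(path: Path, legacy_encoding: str = "cp1252") -> bool:
    """Rewrite one file in place; return True if its bytes changed."""
    raw = path.read_bytes()
    converted = to_utf8_bytes(raw, legacy_encoding)
    if converted == raw:
        return False
    path.write_bytes(converted)
    return True
```

It could be applied to a backup copy of the site with something like `for p in Path("site_copy").rglob("*.html"): convert_file(p)`, where site_copy is a placeholder directory name. The charset declaration inside each file still has to be updated separately.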
Yes, you do need to change or add the <meta> charset tag in all the HTML files to make sure the browser renders them as UTF-8.
In Notepad++ you can set new files to always be created as 'UTF-8 (No BOM), Unix'. Also tick "Apply to ANSI files" so that old files are saved in the new encoding. I suggest the Unix format because, even though you are working on a Windows machine, web servers usually run Linux/BSD, so that format is the native one (keeping files in native form is important, especially when you are using a version control system).
Migrating a live site with a database is a different issue. Data in MySQL comes with its own encoding, and from your question I cannot tell whether you need to convert it or how. I need more specifics on that (if you need to).

Intermittent problem with UTF-8 characters

I am running a fairly standard LAMP stack.
The problem is that UTF-8 characters render correctly only intermittently. About 50% of the time the non-ASCII UTF-8 characters render correctly (e.g. with the appropriate diacritical marks), but about 50% of the time I get the '?' rendition instead. If I reload the page, sometimes that corrects the problem and sometimes it does not. It happens with all browsers on all platforms, which suggests a MySQL or Apache problem, but I have not been able to figure it out.
The database itself is in UTF-8 format and I have never seen the problem while browsing the database in phpMyAdmin.
I issue a SET NAMES utf-8 command upon opening the database (and have tried changing that to a SET CHARSET utf-8 command), with no luck.
What's confusing me is that it is intermittent, happening in streaks, e.g. it will happen on 30 pages in a row (even if they are just reloads), and then clear up for 10 pages, and then happen again for a few pages, etc.
You can try to see the problem by hitting the 'list' button here: http://latin-words.com/list_vocab.php though it may take many reloads to either make it happen or make it go away
Server Configuration:
Ubuntu: 9.10
Mysql: 5.1.37
PHP 5.2.10
Apache 2.2.12
Any hints would be greatly appreciated.
edit:
For searchers' sake: from the comments, the problem was actually caused by issuing SET NAMES utf-8; (incorrect) instead of SET NAMES utf8; (correct). That doesn't mean the much more obscure reason I posted below cannot also be a cause ;)
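A small guard against this kind of naming slip is to normalise the character set name before building the statement. The sketch below is in Python and purely illustrative: the alias table and both function names are hypothetical, and the table only covers a handful of spellings.

```python
# Hypothetical alias table: common spellings mapped to names the server accepts.
MYSQL_CHARSET_ALIASES = {
    "utf-8": "utf8",
    "utf8": "utf8",
    "utf8mb4": "utf8mb4",
    "latin1": "latin1",
}


def mysql_charset_name(name: str) -> str:
    """Translate a loosely spelled charset name into the server's spelling."""
    key = name.strip().lower()
    if key not in MYSQL_CHARSET_ALIASES:
        raise ValueError(f"unknown character set name: {name!r}")
    return MYSQL_CHARSET_ALIASES[key]


def set_names_statement(name: str) -> str:
    """Build the SET NAMES statement from a normalised charset name."""
    return f"SET NAMES {mysql_charset_name(name)}"


print(set_names_statement("utf-8"))  # SET NAMES utf8
```

Checking the result of the SET NAMES query for an error would also have surfaced the problem directly.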
This sounds like a problem with locales and iconv. Try to determine which locale the web server process uses at a moment when all is well and at a moment when it no longer works. To read the current locale without changing it, try $currentlocale = setlocale(LC_ALL, "0"); or $currentlocale = setlocale(LC_CTYPE, "0"); (in PHP, passing NULL sets the locale from the environment rather than just returning it).
