We have a PHP 5.6 website project that we are about to re-launch on PHP 7.4.
Let's call them the old environment and the new environment. The old one is still intact; the two run on different server machines.
Charsets (html meta tags) are set to utf-8.
Zend Framework 1 is involved in both. The database is on an SQL Server, shared by both environments. We use the SqlSrv driver to connect to the database (new environment), the old environment has PDO-Sql.
The database collation is set to Latin1_General_CI_AS.
Data is inserted into and selected from many tables (INSERT, SELECT); the input comes from HTML text fields and textareas.
In the old environment, any text entered into text fields/areas that contains special characters, such as umlauts, is saved to the database in a corrupted form: instead of ö, the table holds ö. On screen, after a SELECT statement, it is nevertheless displayed as ö (clean!).
That was all okay until now, but now we have the new environment.
Let's say there are old entries saved during the old-environment era, and we open the website in the new environment. The content is shown 1:1 as it appears in the database table, in other words: corrupted. This also explains why anything saved via the new environment is shown correctly on screen, since special characters and umlauts are stored unchanged in the database table.
But the entries made with the new environment cannot be seen on the old environment website.
Using utf8_encode or utf8_decode didn't help much: either the output looked even worse, or no text appeared on screen at all.
Writing a script that converts the encoding in the tables would cause mayhem: since the old environment is still in use, it can't be done that easily.
No encoding options are mentioned among the options accepted by the Zend_Db_Adapter_Sqlsrv class.
I don't fully trust mb_detect_encoding, but we tried it anyway; it returned UTF-8 for the values coming back from the tables.
So what would people recommend? I might have missed some facts, but I'll provide you with more information if needed.
This sounds very similar to a problem I've solved in the past. Unfortunately I solved it in ASP.NET, so I can only describe what I did and let you translate it into PHP.
The issue probably arises because your old system is using a non-UTF-8 code page; in my case it was windows-1252, which was fairly common at the time. The code page determines the character encoding that your code uses.
So on my more modern system what I had to do was force the codepage back to windows-1252 while I was reading from the database. And then before rendering the page, set the content encoding to UTF-8.
So unless you are able to fix the problem at the source, you basically have to hack your new system to continue operating the same way, which is unfortunate but sometimes necessary.
The ASP.NET code looks like this:
protected void Page_Load(object sender, EventArgs args)
{
    // Set the encoding for building and rendering; we switch to UTF-8 later for display
    Response.Charset = "windows-1252"; // Hmmm... double check this
    Response.ContentEncoding = System.Text.Encoding.GetEncoding("windows-1252");
}

protected override void Render(HtmlTextWriter writer)
{
    // Now that all the character encoding has taken place, switch to UTF-8 to force it to display this way...
    Response.Charset = "utf-8";
    Response.ContentEncoding = System.Text.Encoding.UTF8;
    base.Render(writer);
}
Hopefully that gives you enough to go on... it's been a long time since I did this, but the pain still sticks in my mind!
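In PHP, a rough equivalent of "read as windows-1252, serve as UTF-8" would be to reverse the extra encoding step on each value read from the legacy rows. A minimal sketch, assuming the driver hands the double-encoded text back as a UTF-8 string (the function name is mine; Windows-1252 rather than ISO-8859-1 matters if characters like € or ™ are involved):

```php
<?php
// Hypothetical helper: re-encode the fetched UTF-8 string to Windows-1252.
// This undoes the bogus latin1-to-UTF-8 step and leaves the original
// UTF-8 bytes intact, so "ö" becomes ö again.
function fixLegacyValue(string $s): string
{
    return mb_convert_encoding($s, 'Windows-1252', 'UTF-8');
}

echo fixLegacyValue('ö'); // ö
```

Note this must only be applied to rows written by the old environment; values saved through the new environment are already clean and would be destroyed by a second pass.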
Since it was a project someone else did, we decided to update all tables and correct the values.
Related
I have a website, with arabic content which has been migrated from a different server. On the old server, everything was displaying correctly, supposedly everything was encoded with UTF-8.
On the current server, the data started displaying incorrectly, showing garbled sequences in place of نبذة عن and similar strings.
The application is built on the CakePHP framework.
After many trials, I changed the 'encoding' parameter in the MySql connection array to become 'latin1'. For the people who don't know CakePHP, this sets MySql's connection encoding. Setting this value to UTF8 did not change anything, even after the steps described below.
Some of the records started showing correctly in Arabic, while others remained gibberish.
I have already gone through all the database and server checks, confirming that:
The database created is UTF-8.
The table is UTF-8.
The columns are not explicitly set to any encoding, so they inherit UTF-8.
Default Character set in PHP is UTF-8
mysql.cnf settings default to UTF-8
After that, I retrieved my data and looped through it, printing the encoding of each string (from each row) using mb_detect_encoding. The rows that are displaying correctly are returning UTF8 while it is returning nothing for the rows that are corrupt.
The website's data has been edited multiple times, possibly under different encodings; this is something I cannot know for sure. What I can confirm, though, is that the only two encodings this data might have passed through are UTF-8 and latin1.
Is there any possible way to recover the data when mb_detect_encoding returns nothing and the current encoding of the data is unknown?
UPDATE: I have found out that while the database was active on the new server, the my.cnf was updated.
The below directive was changed:
character-set-server=utf8
To
default-character-set=utf8
I am not sure how much this makes a difference though.
Checking the modified dates, I can conclude with a reasonable degree of certainty that the data I could recover was not edited on the new server, while the data I couldn't recover has been.
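Since mb_detect_encoding came up empty for the broken rows, one way to classify them is mb_check_encoding: raw latin1 bytes for non-ASCII text are never valid UTF-8. A sketch under that assumption (the sample bytes and function names are illustrative):

```php
<?php
// Rows whose bytes form valid UTF-8 are assumed intact; everything else is
// assumed to be latin1 and converted. This is a heuristic, not a proof.
function isValidUtf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

function latin1ToUtf8(string $s): string
{
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

$intact = "نبذة عن";       // already valid UTF-8
$broken = "\xE4\xF6\xFC";  // raw latin1 bytes for äöü

var_dump(isValidUtf8($intact)); // bool(true)
var_dump(isValidUtf8($broken)); // bool(false)
echo latin1ToUtf8($broken);     // äöü
```

One caveat: Arabic text cannot actually be represented in single-byte latin1, so rows that were mangled on the way in may be genuinely unrecoverable rather than merely mis-tagged.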
Try to fix the problem from the DB side, not from PHP or the DB connection.
I advise you to go to your old server and export your DB again with character set UTF-8,
then import it into the new server. Make sure you can see the Arabic characters inside the tables (with phpMyAdmin).
If your tables look fine,
then you can move on and check the next things:
the DB connection
the PHP file encoding
the header encoding in the HTML
As far as I know, if the problem is in the DB, there is no way around exporting the data again from the old server.
Edit:
If you do not have access to your old DB, please check this answer; it can help you.
You were expecting نبذة عن? Mojibake. See duplicate for discussion and solution, including how to recover the data via a pair of ALTER TABLEs.
I had a similar problem when migrating database tables encoded with utf8 from a public server to localhost. The resolution was setting the connection encoding using PHP
$db->set_charset("utf8")
right after the mysqli connection.
Now it works properly.
I know there have been plenty of questions like this, but I am creating a new one because, in my view, every situation is specific.
So, my page is displayed in UTF-8. The data is taken from MySQL, which has utf8_unicode_ci collation. The data I am displaying is the string: 1 Bröllops-Festkläder.
There are some unicode characters in here and they should display fine but they do not. On my page these are just a bunch of hieroglyphs.
Now, the interesting situation:
I am using phpMyAdmin to keep track of what is happening in the database. The website can import CSV documents containing customer data and can also modify each customer individually. If I import a CSV document containing these characters, they are written to the database and readable in phpMyAdmin, but not readable on my page. If I use my script to modify the customer information and type those characters in the browser, it is vice versa: they are readable on the page but not readable in phpMyAdmin. So clearly the encodings differ. I spent ages trying to figure out the right combination and could not.
UPDATE: Deceze posted a link below that I copy here to make it more noticeable. I am sure this will save hours and days to many people facing similar issues - Handling Unicode Front to Back in a Web App
There are a couple of things involved here. If your database encoding is fine and your HTML encoding is fine and you still see artifacts, it is most likely that your DB connection is not using the same encoding, which corrupts the data. If you connect by hand, you can easily enforce UTF-8 by issuing the query SET NAMES utf8 as the very first thing after you connect() to your database. It is sufficient to do this only once per connection.
EDIT: one important note, though: depending on how you put your data into the DB, your database content may require fixing, as it can be corrupted if it was inserted over a broken connection. So, if anyone is facing the same issue: once you have set everything up, make sure you check against a fresh data set, or you may still see things incorrectly even though everything is now fine.
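To see how that corruption happens on the way in, here is a pure-PHP simulation of what a latin1 connection effectively does to UTF-8 input (illustrative only, no database involved; the function name is made up):

```php
<?php
// Each UTF-8 byte of the input is treated as a single latin1 character and
// re-encoded to UTF-8 on the server side, producing classic mojibake.
function simulateLatin1Connection(string $utf8): string
{
    return mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
}

echo simulateLatin1Connection('Bröllops'); // BrÃ¶llops
```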
I have no clue whether this is a normal issue or not, but I have a small Flash application that handles management for my company. It's a small company, so it's no big deal: just a bunch of INSERTs, SELECTs, UPDATEs and other statements to manage their clients, addresses, phone numbers, etc.
The flash (in AS3) sends the variables through a URLRequest to several php pages and the php handles the request to mySQL.
My problem is that, sometimes, instead of inserting the string I sent, it inserts a weird string made mostly, but not only, of numbers (and it happens to roughly 1 column out of 10 per INSERT, so it's fairly common).
Is this a known issue? Could it be because of the encoding (I used UTF-8, which I believe is the one we use here in Portugal, because of special characters like ã, à, á, etc.)?
Thank you for your time.
Marco Fox.
After connecting to the DB, try the following query "SET CHARACTER SET utf8;".
Make sure every PHP page is saved as UTF-8.
To do that, open the file in Notepad++ and use the menu Encoding -> Convert to UTF-8 without BOM; or open the file in Notepad, choose Save As, and look at the encoding dropdown below the name (but this saves the BOM, which is not good).
Some IDEs can save in ANSI, UTF-8 and more, or have a conversion option.
In Flash, use encodeURI() on your URLLoader data if you are passing it via GET.
Hope this solves your problem (if it is, in fact, an encoding issue).
I have had no end of problems trying to do what I thought would be relatively simple:
I need a form that can accept user input text in a mix of English and other languages, some multi-byte (e.g. Japanese, Korean, etc.); this gets processed by PHP and stored (safely, avoiding SQL injection) in a MySQL database. It also needs to be read back from the database, processed, and used on screen.
I have it set up fine for Latin characters, but when I add a mix of Latin and multi-byte characters it turns garbled.
I have tried to do my homework but just am banging my head against a wall now.
Magic quotes is off. I have tried utf8_encode/decode, htmlentities, addslashes/stripslashes, and (in MySQL) both utf8_general_ci and utf8_unicode_ci for the field in the table.
Part of the problem is that there are so many places where I could be messing it up that I'm not sure where to begin solving the problem.
Thanks very much for any and all help with this. Ideally, if someone has working php code examples and/or knows the right mysql table format, that would be fantastic.
Here is a laundry list of things to check are in UTF8 mode:
MySQL table encoding. You seem to have already done this.
MySQL connection encoding. Run SHOW STATUS LIKE 'char%' and you will see what MySQL is using. You need character_set_client, character_set_connection and character_set_results set to utf8, which you can easily do in your application by issuing SET NAMES 'utf8' at the start of every connection. This is the one most people forget to check, in my experience.
If you use them, your CLI and terminal settings. In bash, this means LANG=(something).UTF-8.
Your source code (this is not usually a problem unless you have UTF8 constant text).
The page encoding. You seem to have this one right, too, but your browser's debug tools can help a lot.
Once you get all this right, all you will need in your app is mysql_real_escape_string().
Oh, and it is (sadly) possible to successfully store correctly encoded UTF-8 text in a column with the wrong encoding type, or via a connection with the wrong encoding type. And it can come back "correctly", too. Until you fix all the bits that aren't UTF-8, at which point it breaks.
I don't think you have any practical alternatives to UTF-8. You're going to have to track down where the encoding and/or decoding breaks. Start by checking whether you can round-trip multi-language text to the data base from the mysql command line, or perhaps through phpmyadmin. Track down and eliminate problems at that level. Then move out one more level by simulating input to your php and examining the output, again dealing with any problems. Finally add browsers into the mix.
First you need to check whether you can add multi-language text to your database directly. If that works, you can do the same from your application.
Are you serializing any data by chance? PHP's serialize function has some issues when serializing non-English characters.
Everything you do should be utf-8 encoded.
One thing you could try is to json_encode() the data when putting it into the database and json_decoding() it when it's retrieved.
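That works because json_encode escapes non-ASCII characters to \uXXXX by default, so only plain ASCII bytes ever cross the connection, and json_decode restores the original UTF-8 string. A small sketch:

```php
<?php
// The stored form is pure ASCII and therefore survives any single-byte
// connection charset unchanged.
$original = 'Bröllops-Festkläder';
$stored   = json_encode($original);

echo $stored, "\n";                           // "Br\u00f6llops-Festkl\u00e4der"
var_dump(json_decode($stored) === $original); // bool(true)
```

The trade-off is that the column contents are no longer searchable or readable as plain text in the database.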
The problem was caused by not having the default charset set in the php.ini file, and (possibly) not having set the charset on the MySQL table (in phpMyAdmin, via the Operations tab).
Setting the default char set to "utf-8" fixed it. Thanks for the help!!
Check your database connection settings. It also needs to support UTF-8.
I am trying to debug a nasty utf-8 problem, and do not know where to start.
A page contains the word 'categorieÃ«n', which should be categorieën. Clearly something is wrong with the UTF-8 handling, and it happens with all such multibyte characters. I have scanned the gazillion topics here on UTF-8, but they mostly cover the basics, not this situation where everything appears to be configured correctly yet clearly is not.
The pages are served by Drupal, from a MySQL database.
The database was migrated (not by me) by SQL-dumping and importing through phpMyAdmin. There is a good chance something went wrong there, because there was no problem before, and because the problem occurs only on older, imported items. Editing these items, or inserting new ones and fixing the wrongly encoded characters by hand, fixes the problem, though I cannot see a difference in the database.
Content re-edited through Drupal does not have this problem.
When I read that text on the CLI, using MySQL, I get the correct ë character, for both the articles that render "correct" and "incorrect" characters.
The tables have collation utf8_general_ci
Headers appear to be sent with correct encoding: Vary Accept-Encoding and Content-Type text/html; charset=utf-8
HTML head contains a <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
The HTTP headers tell me there is a Varnish proxy in between. Could that cause UTF-8 conversion/breakage?
Content is served gzipped, which is normal in Drupal; I have never seen this UTF-8 issue related to gzipping, but you never know.
It appears the import is the culprit, and I would like to know:
a) what went wrong;
b) why I cannot see a difference between "wrong" and "correct" characters in the MySQL CLI client;
c) how to fix the database, or where to start looking and learning how to fix it.
The dump file was probably output as UTF-8, but interpreted as latin1 during import.
The Ã« is what you get when the two-byte UTF-8 representation of ë is read as latin1; it is physically stored in your tables as UTF-8 data.
Seeing as you have a mix of intact and broken data, this will be tough to fix in a general way, but usually, this dirty workaround* will work well:
UPDATE table SET column = REPLACE(column, 'Ã«', 'ë');
Unless you are working with languages other than Dutch, the range of broken characters should be extremely limited, and you might be able to fix everything with a small number of such statements.
Related questions with the same problem:
Detecting utf8 broken characters in MySQL
I need help fixing Broken UTF8 encoding
* (of course, don't forget to make backups before running anything like this!)
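If several characters turn out to be affected, a short script can generate the statements. A sketch (the table and column names node_revisions/body are placeholders for whatever tables are affected, the pairs follow the UTF-8-read-as-latin1 pattern, and, as the footnote says, back up first):

```php
<?php
// One REPLACE statement per known mojibake pair; extend $pairs as you
// discover more broken characters in the data.
$pairs = ['Ã«' => 'ë', 'Ã©' => 'é', 'Ã¯' => 'ï'];

$statements = [];
foreach ($pairs as $broken => $fixed) {
    $statements[] = sprintf(
        "UPDATE node_revisions SET body = REPLACE(body, '%s', '%s');",
        $broken,
        $fixed
    );
}

echo implode("\n", $statements), "\n";
```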
Nothing should have gone AWOL in exporting and importing a Drupal dump, unless the person doing it somehow managed to set the export to something other than UTF-8. We export and import dumps a lot and have never run into such a problem.
Hopefully Pekka's answers will help you resolve the issue if it is in the DB, but I also thought you could check whether the data shown on the web page is being run through some PHP functions that aren't multibyte-friendly.
Here are some equivalents of normal functions in mb: http://php.net/manual/en/ref.mbstring.php
PS: If you have recently moved your site to another server (so it's not just a DB import), you should check what headers your site is sending out with a tool such as http://www.webconfs.com/http-header-check.php
Make sure the last row (the content type) has UTF-8 in it.
You mention that the import might be the problem. In that case it is possible that during the import the connection between the client and the MySQL server wasn't using UTF-8. I've had this problem a couple of times in the past, so I'd like to share these MySQL settings (in my.cnf):
Under the server settings add these:
# UTF 8
default-character-set=utf8
character-set-server=utf8
collation-server=utf8_general_ci
skip-character-set-client-handshake
And under the client settings add:
default-character-set=utf8
This might save you some headache the next time.
To be absolutely sure you have utf8 from start to end:
- source code files in utf8 without BOM
- database with utf8 collation
- database tables with utf8 collation
- database connection in utf8 (query it with 'SET CHARSET UTF8')
- pages header set to utf8 (the ajax ones too)
- meta tag to set page in utf8
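The page-side items of that checklist come down to a couple of lines; a minimal sketch (the sample body text is illustrative):

```php
<?php
// Declare UTF-8 both in the HTTP response header and in the markup itself;
// header() is a harmless no-op when this runs from the CLI.
header('Content-Type: text/html; charset=utf-8');

$html = '<!DOCTYPE html><html><head><meta charset="utf-8"></head>'
      . '<body>Bröllops-Festkläder</body></html>';

echo $html;
```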