Very strange behaviour with UTF-8 [duplicate] - php

This question already has answers here:
UTF-8 all the way through
(13 answers)
Closed 7 years ago.
How can I echo an SQL text field into a paragraph?
My code puts the text in, but it changes accented characters like á to this -> �.
I tried adding UTF-8 in the header and removing it.
Removing UTF-8 makes the SQL content display correctly, but all the content outside the paragraph gets garbled.
I checked that the DB was using (UTF-8-unicode), and the files were saved as UTF-8.
Any ideas on what might be wrong?
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<textarea class="texto" name="textoarticulo" id="textoarticulo" form="formarticulo" placeholder="Texto del artículo...">
<?
echo $row['txt4'];
?>
</textarea>

It's important that your entire code uses the same charset, to avoid characters displaying incorrectly.
Here's a short list of things that have to be set to a specific charset.
Headers
Setting the charset in both HTML and PHP headers to UTF-8
PHP (PHP headers have to be sent before any output: PHP echo, whitespace, HTML!):
header('Content-Type: text/html; charset=utf-8');
HTML (HTML-headers are placed within the <head> / </head> tag):
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Connection
You also need to specify the charset in the connection itself.
PDO (specified in the object itself):
$handler = new PDO('mysql:host=localhost;dbname=database;charset=utf8', 'username', 'password');
MySQLi: (placed directly after creating the connection, $mysqli is the connection object)
$mysqli->set_charset("utf8"); // OOP style
mysqli_set_charset($mysqli, "utf8"); // procedural style
MySQL (deprecated, removed in PHP 7.0): (placed directly after creating the connection)
mysql_set_charset("utf8");
Database
Your database and its tables have to be set to UTF-8. Note that charset is not the same as collation.
You can do that by running the queries below once for each database and table (for example in phpMyAdmin):
ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;
ALTER TABLE tablename CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Caution... There are various situations that need different ALTERs. Details here. Doing the wrong ALTER is likely to make things worse. -- Rick James
php.ini specification
In your php.ini file, you should specify the default charset for your platform, like this:
default_charset = "UTF-8"
(This is in essence the same as doing header('Content-Type: text/html; charset=utf-8'); on all pages)
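To confirm which value is actually in effect (PHP 5.6 and later already default to UTF-8 when php.ini does not set it), a small runtime check can help. This is a minimal sketch: the function name defaultCharsetIsUtf8 is hypothetical, and the ini_set() fallback only affects the current script.

```php
<?php
// Hypothetical helper: true when the effective default_charset is UTF-8,
// whether it came from php.ini or from ini_set() (compared case-insensitively).
function defaultCharsetIsUtf8(): bool
{
    $charset = (string) ini_get('default_charset');
    return strcasecmp($charset, 'UTF-8') === 0;
}

if (!defaultCharsetIsUtf8()) {
    // Per-script fallback when php.ini cannot be edited
    ini_set('default_charset', 'UTF-8');
}
```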
File-encoding
It's also important that the .php file itself is UTF-8 encoded. If you're using Notepad++ to write your code, this can be done in the "Format" drop-down on the menu bar. You should use UTF-8 without BOM.
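If you are unsure how a file was saved, you can check its bytes from PHP. A minimal sketch, assuming a yes/no answer is enough: preg_match() with the /u modifier fails on malformed UTF-8, so this works even without the mbstring extension. The helper name isValidUtf8 and the file path are placeholders. A file saved as Latin-1 will usually fail as soon as it contains an accented character.

```php
<?php
// Hypothetical check: is this byte string well-formed UTF-8?
// preg_match() returns false (not 0 or 1) when the /u subject is malformed.
function isValidUtf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Example: check a source file as saved by the editor (path is a placeholder)
// var_dump(isValidUtf8(file_get_contents('index.php')));
```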
Should you follow all of the pointers above, chances are your problem will be solved. If not, you can take a look at this StackOverflow post: UTF-8 all the way through.

Related

Why does my output change?

I'm working with UTF-8 encoding in PHP and I keep managing to get the output just as I want it. Then, without anything changing in the code, the output suddenly changes.
Previously I was getting Hebrew output. Now I'm getting "&&&&&".
Any ideas what might be causing this?
These are the most common places where problems come from:
Your editor that you’re creating the PHP/HTML files in
The web browser you are viewing your site through
Your PHP web application running on the web server
The MySQL database
Anywhere else external you’re reading/writing data from (memcached, APIs, RSS feeds, etc)
And a few things you can try:
Configuring your editor
Ensure that your text editor, IDE or whatever you’re writing the PHP code in saves your files in UTF-8 format. Your FTP client, scp or SFTP client doesn’t need any special UTF-8 setting.
Making sure that web browsers know to use UTF-8
To make sure your users’ browsers all know to read/write all data as UTF-8 you can set this in two places.
The content-type tag
Ensure the content-type META header specifies UTF-8 as the character set like this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">
The HTTP response headers
Make sure that the Content-Type response header also specifies UTF-8 as the character set, for example like this:
ini_set('default_charset', 'utf-8');
Configuring the MySQL Connection
Now that you know all of the data you’re receiving from users is in UTF-8 format, we need to configure the client connection between PHP and the MySQL database.
There’s a generic way of doing this by simply executing the MySQL query:
SET NAMES utf8;
…and depending on which client/driver you’re using there are helper functions to do this more easily instead:
With the built-in mysql functions (deprecated, removed in PHP 7.0)
mysql_set_charset('utf8', $link);
With MySQLi
$mysqli->set_charset("utf8");
With PDO_MySQL (as you connect)
$pdo = new PDO(
'mysql:host=hostname;dbname=defaultDbName',
'username',
'password',
array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);
The MySQL Database
We’re pretty much there now; you just need to make sure that MySQL knows to store the data in your tables as UTF-8. You can check their encoding by looking at the Collation value in the output of SHOW TABLE STATUS (in phpMyAdmin this is shown in the list of tables).
If your tables are not already in UTF-8 (it’s likely they’re in latin1), then you’ll need to convert them by running the following command for each table:
ALTER TABLE myTable CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
(Without CONVERT TO, ALTER TABLE myTable CHARACTER SET utf8 only changes the default for columns added later; existing columns keep their current character set.)
One last thing to watch out for
With all of these steps complete, your application should now be free of any character set problems.
There is one thing to watch out for: most of the PHP string functions are not Unicode-aware, so for example if you run strlen() against a multi-byte character it’ll return the number of bytes in the input, not the number of characters. You can work around this by using the Multibyte String PHP extension, though it’s not that common for these byte/character issues to cause problems.
Taken from here: http://webmonkeyuk.wordpress.com/2011/04/23/how-to-avoid-character-encoding-problems-in-php/
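To illustrate the byte/character difference described above, here is a short sketch. It counts characters with a PCRE pattern in UTF-8 mode so that it runs without the mbstring extension; mb_strlen($word, 'UTF-8') is the usual choice when that extension is installed and gives the same count.

```php
<?php
// "ô" is two bytes in UTF-8 (C3 B4), so bytes and characters differ.
$word = "C\u{00F4}te";

$bytes = strlen($word);                    // 5: strlen() counts bytes
$chars = preg_match_all('/./su', $word);   // 4: /u matches whole UTF-8 characters
```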
Try setting the content type with a header like this:
header('Content-Type: text/html; charset=utf-8');
Try this function:
$html = "Bla Bla Bla...";
$html = mb_convert_encoding($html, 'HTML-ENTITIES', "UTF-8");
For more, see http://php.net/manual/en/function.mb-convert-encoding.php
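The HTML-ENTITIES target of mb_convert_encoding() is deprecated as of PHP 8.2. If you still want entity output, the sketch below shows a dependency-free alternative that turns every non-ASCII character into a decimal entity. Both helper names (utf8CodePoint, toNumericEntities) are hypothetical, and the input is assumed to be valid UTF-8.

```php
<?php
// Decode one well-formed UTF-8 character (1 to 4 bytes) into its code point.
function utf8CodePoint(string $ch): int
{
    $b0 = ord($ch[0]);
    if ($b0 < 0x80) {
        return $b0;                                          // 1 byte: ASCII
    }
    if ($b0 < 0xE0) {
        return (($b0 & 0x1F) << 6) | (ord($ch[1]) & 0x3F);   // 2 bytes
    }
    if ($b0 < 0xF0) {
        return (($b0 & 0x0F) << 12)                          // 3 bytes
            | ((ord($ch[1]) & 0x3F) << 6)
            | (ord($ch[2]) & 0x3F);
    }
    return (($b0 & 0x07) << 18)                              // 4 bytes
        | ((ord($ch[1]) & 0x3F) << 12)
        | ((ord($ch[2]) & 0x3F) << 6)
        | (ord($ch[3]) & 0x3F);
}

// Replace every non-ASCII character with a decimal entity such as &#225;.
function toNumericEntities(string $utf8): string
{
    $out = preg_replace_callback(
        '/[^\x00-\x7F]/u',                // one match per non-ASCII character
        fn(array $m): string => '&#' . utf8CodePoint($m[0]) . ';',
        $utf8
    );
    if ($out === null) {                  // preg_* returns null on malformed UTF-8
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $out;
}

echo toNumericEntities("Bla Bla Bla \u{00E1}"), "\n";   // Bla Bla Bla &#225;
```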
I put together this method and called it in the file I'm working with, and that seemed to resolve the issue.
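function setutf_8()
{
    // Tell the browser the page is UTF-8 (the syntax is "charset=", not "charset:")
    header('Content-Type: text/html; charset=utf-8');
    // Make mbstring functions treat strings as UTF-8 by default
    mb_internal_encoding('UTF-8');
    mb_http_output('UTF-8');
    mb_language('uni');
    mb_regex_encoding('UTF-8');
    // Route all output through mbstring's output converter
    ob_start('mb_output_handler');
}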
Thank you for all your help! :)

Same dataset outputs different characters : phpmyadmin / own query

I'm trying to get some data from the DB, but the output isn't what I expected.
Doing my own querying on the DB, I get this output: string 'C�te d�Ivoire' (length=13)
Querying the DB from phpMyAdmin I get normal output: Côte d’Ivoire
php.ini default charset, MySQL DB default charset and <meta> charset are all set to UTF-8.
I can't figure out where the conversion happens that gives me different output with the same configuration.
P.S.: using the mysqli driver.
In the same page that gives you wrong results, try first running this instruction
print base64_encode("Côte");
The correct answer is Q8O0dGU.... If you get something else, like Q/R0ZQo..., this means that your script is working with another charset (here Latin-1) instead of UTF-8. It's still possible that MySQL and the browser are also playing tricks, but the line above tells you whether PHP and/or your editor are playing you false.
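The same check can be wrapped in a small helper so that you can feed it both a literal typed in your editor and a value fetched from the database. This is a sketch assuming the test word is exactly "Côte" with no trailing newline; the function name diagnoseCharset and the $row['name'] column are hypothetical.

```php
<?php
// Classify the bytes of the word "Côte" by their base64 form.
// UTF-8 encodes "ô" as C3 B4; Latin-1 encodes it as the single byte F4.
function diagnoseCharset(string $cote): string
{
    switch (base64_encode($cote)) {
        case 'Q8O0dGU=':
            return 'UTF-8';
        case 'Q/R0ZQ==':
            return 'Latin-1';
        default:
            return 'unknown';
    }
}

// 1) A literal in the source file: tests your editor's file encoding
echo diagnoseCharset("Côte"), "\n";
// 2) The same word read from MySQL (hypothetical column): tests the connection
// echo diagnoseCharset($row['name']), "\n";
```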
Next, extract Côte from the database and output its base64_encode. If you see Q8O0..., then the connection between MySQL and PHP is safely UTF8. If not, then whatever else might also be needed, you need to change the MySQL charset (SET NAMES utf8 and/or ALTER of table and database collation).
If PHP is UTF8, and MySQL is UTF8, and still you see invalid characters, then it's something between PHP and the browser. Verify that the content type header is sent correctly; if not, try sending it yourself as first thing in the script:
header('Content-Type: text/html; charset=UTF-8');
The default can also come from the web server; for example, in the Apache configuration you should have:
AddDefaultCharset utf-8
Verify also that your browser is not set to override both server charset and auto-detection.
NOTE: as a rule of thumb, if you get a single diamond with a question mark instead of a UTF-8 international character, this means that a UTF-8 reader received an invalid UTF-8 byte sequence. In other words, the entity showing the diamond (your browser) is expecting UTF-8 but is receiving something else, for example Latin-1 (ISO-8859-1).
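You can reproduce the diamond from PHP itself. In this sketch htmlspecialchars() is told its input is UTF-8 but is given the Latin-1 byte for "á"; the variable names are illustrative. Passing an explicit 'UTF-8' argument like this is also a sensible way to echo database text into an element such as a <textarea>.

```php
<?php
// The Latin-1 byte for "á" (0xE1) is not valid UTF-8 on its own.
$latin1 = "\xE1";

// ENT_SUBSTITUTE replaces invalid sequences with U+FFFD, the diamond
// with a question mark that a UTF-8 browser displays.
$shown = htmlspecialchars($latin1, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

// Without ENT_SUBSTITUTE (or ENT_IGNORE) the input is rejected and an
// empty string is returned.
$rejected = htmlspecialchars($latin1, ENT_QUOTES, 'UTF-8');

// Valid UTF-8 passes through; only HTML special characters are escaped.
$ok = htmlspecialchars("Texto del art\u{00ED}culo <b>", ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
```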
Another difficult-to-track way of getting that error is if the output somehow contains a byte order mark (BOM). This may happen if you create a file such as
###<?php
header("Content-Type: text/html; charset=UTF-8");
?>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
</head>
<body>
Hellò, world!
</body>
</html>
where that ### is an (invisible in most editors) UTF8 BOM. To remove it, you either need to save the file as "without BOM" if the editor allows it, or use a different editor.
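Because the BOM is invisible, it can help to scan suspicious files for it. A minimal sketch with hypothetical helper names: it detects the three BOM bytes (EF BB BF) at the start of a byte string. For a PHP file the real fix is still re-saving it without BOM, since the BOM is sent to the browser before any PHP code runs; stripping at runtime only helps for data you read yourself.

```php
<?php
// The UTF-8 byte order mark is the three bytes EF BB BF.
const UTF8_BOM = "\xEF\xBB\xBF";

function hasUtf8Bom(string $bytes): bool
{
    return strncmp($bytes, UTF8_BOM, 3) === 0;
}

function stripUtf8Bom(string $bytes): string
{
    return hasUtf8Bom($bytes) ? substr($bytes, 3) : $bytes;
}

// Hypothetical usage: report a file that starts with a BOM
// if (hasUtf8Bom(file_get_contents(__DIR__ . '/header.php'))) { echo "BOM found\n"; }
```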
If you do your "own querying" with the command line tool mysql, you have to set the option --default-character-set=utf8, too. Otherwise, please tell us how you do your own querying.

php using utf8 without utf_encode

I'm running a German website which gets content from a MySQL database.
I've defined the charset as UTF-8 as follows:
<meta http-equiv='Content-Type' content='text/html;charset=utf-8' />
The problem is, when fetching and displaying contents from the database I always need to use utf8_encode in order to get the proper German umlauts.
I want to keep the UTF-8 charset for my site, as I'll have to add more languages which have special characters.
Any ideas on how to echo database contents 1:1 without having to use utf8_encode?
Thanks
Hard to tell without seeing how you are connecting to your database, but a common problem is the database connection itself.
After opening / selecting the database you need to set:
$db->exec('SET CHARACTER SET utf8'); // PDO
mysql_set_charset('utf8'); // Deprecated mysql_* extension
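To see why utf8_encode() "fixes" the umlauts, it helps to know what it does: it assumes its input is ISO-8859-1 and converts it to UTF-8. If that makes your text correct, the data reached PHP as Latin-1, which is exactly what setting the connection charset avoids. The pure-PHP function below is an illustrative sketch of that conversion (latin1ToUtf8 is a hypothetical name); utf8_encode() itself is deprecated as of PHP 8.2.

```php
<?php
// Sketch of what utf8_encode() does: treat every byte as an ISO-8859-1
// character and re-encode it as UTF-8.
function latin1ToUtf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($latin1[$i]);
        if ($b < 0x80) {
            $out .= $latin1[$i];                 // ASCII is the same in both charsets
        } else {
            $out .= chr(0xC0 | ($b >> 6))        // lead byte (C2 or C3)
                  . chr(0x80 | ($b & 0x3F));     // continuation byte
        }
    }
    return $out;
}

// "Grüße" as it would arrive over a latin1 connection: ü = FC, ß = DF
$fromDb = "Gr\xFC\xDFe";
echo latin1ToUtf8($fromDb), "\n";
```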
Whenever I want to use UTF-8 with PHP and MySQL, I find that these two calls are usually the ones to make after mysql_connect():
mysql_set_charset('utf8', $link);
mysql_query('SET NAMES utf8', $link);
Setting the content type in the header may do the trick:
header('content-type: text/html; charset=utf-8');
I had a similar problem and I solved it by adding this at the beginning of my PHP file:
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
Additionally, it is very important to check that you are saving your PHP file in UTF-8 format without BOM; I had a big headache with this. I recommend Notepad++: it shows the current file encoding and allows you to convert to UTF-8 without BOM if necessary.
If you would like to see my problem and solution, it is here.
Hope it can help you!

UTF-8 display-issues in PhpMyAdmin-Gui

I've got the following problem with my PMA-GUI:
While the data submitted by PHP scripts to my database is displayed correctly, ONLY PMA displays several German umlauts (such as äüß, ...) as Ã¼ or Ã¤.
The problem also occurs when exporting tables to a file.
MySQL: 5.0.51a-3ubuntu5.8
PMA: 3.4.5
Database & fields are utf8_general_ci
Does anybody know a solution?
Are you sure that your client is sending data as utf-8?
this seems to me a duplicate of:
German Umlaute in Mysql/Phpmyadmin
You need to ensure consistent use of character set/character encoding.
For example, to normalise to UTF-8 content, your DB fields' character sets should be set to UTF-8. Then, in your PHP (if you have your own scripts running that fetch DB information), you need to add this to the head section:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
Then, in the PHP, before any output to the browser, include the content type PHP header:
header ('Content-type: text/html; charset=utf-8');
Before you run any SQL to fetch content (so after you connect, but before executing your query), use mysql_set_charset:
mysql_set_charset('utf8',$link);
// $link is optional, refers to your DB connection
You can think of it as three steps:
The step used to add the characters to your DB
Storage of characters in your DB
Retrieval and display of characters
The simplest way to ensure conformity, and that characters display as you anticipate, is to make sure the correct, consistent character set is defined at each stage.
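One common way to end up with ü shown as Ã¼ is double encoding: text that is already UTF-8 gets treated as Latin-1 at one of the three steps and converted to UTF-8 a second time. The sketch below reproduces that symptom with a hypothetical helper; it illustrates the mechanism and is not a diagnosis of any particular phpMyAdmin setup.

```php
<?php
// Re-encode each byte >= 0x80 as if it were a Latin-1 character.
// Applied to text that is already UTF-8, this produces double encoding.
function encodeAsLatin1Again(string $bytes): string
{
    return (string) preg_replace_callback(
        '/[\x80-\xFF]/',                         // byte-wise: no /u modifier
        fn(array $m): string => chr(0xC0 | (ord($m[0]) >> 6))
                              . chr(0x80 | (ord($m[0]) & 0x3F)),
        $bytes
    );
}

$umlaut = "\u{00FC}";                            // "ü", bytes C3 BC
$doubleEncoded = encodeAsLatin1Again($umlaut);   // "Ã¼", bytes C3 83 C2 BC
echo bin2hex($doubleEncoded), "\n";              // c383c2bc
```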

phpmyadmin+php in russian

In phpMyAdmin I have stored a few Russian values, using the utf8_unicode_ci encoding. They are shown perfectly in phpMyAdmin.
The problem appears when I get those values with PHP and try to put them into the options of a select: they are shown as "??????".
I've tried changing the encoding in the headers to iso-8859-1 instead of utf-8, but it doesn't work either.
I've also tried with
mb_convert_encoding($str, 'UTF-8', 'auto');
but no change :(
Any other idea??
If you're using a MySQL DB/connection, use mysql_query("SET NAMES 'utf8'"); before you run your query, though a better alternative is mysql_set_charset().
Also ensure you have the entry:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
In the header section of your page.
If you're using PDO, change your connection to:
$PDO_connection = new PDO("mysql:host=".$db['host'].";dbname=".$db['name'],
    $db['user'], $db['pword'],
    array(PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES utf8")
);
I've tried changing the encoding in the headers to iso-8859-1 instead of utf-8
What for? What's the point of changing the right encoding, which supports Russian characters, to a wrong one that doesn't?
In order to achieve proper encoding on your page, you have to do 2 things:
Tell the database what encoding you're expecting your data in. It should be done with mysql_set_charset('utf8') (or a similar function of another library if you're using one), where utf8 is the name of the encoding in MySQL lingo.
Tell the browser what encoding your page is in. It should be done with the Content-type HTTP header, using header ('Content-type: text/html; charset=utf-8'); and nothing else.
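Why "??????" rather than random garbage? When the connection charset is latin1, characters that have no Latin-1 equivalent cannot be converted and come back as '?'. The function below is a simplified PHP simulation of that lossy conversion (simulateLatin1Connection is a hypothetical name; MySQL's latin1 is actually closer to Windows-1252, so treat this as an approximation, not MySQL's own code).

```php
<?php
// Approximates what a latin1 connection does to UTF-8 text: characters with
// a Latin-1 equivalent are converted, everything else becomes '?'.
function simulateLatin1Connection(string $utf8): string
{
    // /u splits the string into whole UTF-8 characters
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    $out = '';
    foreach ($chars as $ch) {
        if (strlen($ch) === 1) {
            $out .= $ch;                      // ASCII passes through unchanged
            continue;
        }
        if (strlen($ch) === 2) {
            // Decode the two-byte sequence into a code point
            $cp = ((ord($ch[0]) & 0x1F) << 6) | (ord($ch[1]) & 0x3F);
            if ($cp <= 0xFF) {
                $out .= chr($cp);             // exists in Latin-1
                continue;
            }
        }
        $out .= '?';                          // no Latin-1 equivalent
    }
    return $out;
}

// "Привет" (six Cyrillic letters) becomes "??????"
echo simulateLatin1Connection("\u{041F}\u{0440}\u{0438}\u{0432}\u{0435}\u{0442}"), "\n";
```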
