I am having a hard time with character sets. I suspect that my function that displays the date returns a non-UTF-8 character (the û in août is replaced by a question mark inside a diamond).
When I work on my local server everything is fine, but when I push my code to my staging server, it does not display properly.
My php files are saved as UTF-8 NO BOM
If I inspect my output page, headers indicate UTF-8.
My local machine is a Mac with MAMP installed, and my staging server runs CentOS with cPanel installed.
Here is the part I suspect is causing the problem:
$langCode = "fr_FR"; /* Also tried fr_FR.UTF-8 */
setlocale(LC_ALL, $langCode);
$monthName = _(strftime("%B", strtotime($dateStr)));
echo $monthName; /* Also tried utf8_encode($monthName) worked on my staging server but not on my local server ! I'm using */
Finally found how to find the bug and fix it.
setlocale(LC_ALL, 'fr_FR');
var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));
The dump returned UTF-8 on local and FALSE on the staging server.
PHP.net documentation about mb_detect_encoding()
Return Values ¶
The detected character encoding or FALSE if the encoding cannot be
detected from the given string.
So the charset can't be detected on staging. I will try to force it "again":
setlocale(LC_ALL, 'fr_FR.UTF-8');
var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));
This time the dump returned UTF-8 on local and UTF-8 on the staging server. So I rolled back my code to see why it had not worked the first time I tried fr_FR.UTF-8. I realized I had been using utf8_encode() at the time. As user deceze points out in a comment on that function's documentation:
In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.
Thank you for your help everyone !
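To make the pitfall concrete, here is a minimal sketch of a guard that converts only when the input is not already valid UTF-8. The helper name to_utf8_once is hypothetical, the code assumes the mbstring extension is loaded, and it assumes that any non-UTF-8 input is ISO-8859-1 (mb_convert_encoding() from ISO-8859-1 does the same conversion as utf8_encode()). A short ISO-8859-1 string can occasionally happen to be valid UTF-8, so treat this as a heuristic, not a guarantee.

```php
<?php
// Hypothetical helper (not from the original post): convert to UTF-8 only
// when the input is not already valid UTF-8.
// Assumptions: mbstring is loaded; non-UTF-8 input is ISO-8859-1.
function to_utf8_once(string $text): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already UTF-8: converting again would garble it
    }
    return mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "ao\xFBt";     // "août" as ISO-8859-1 bytes (what fr_FR may return)
$utf8   = "ao\xC3\xBBt"; // "août" as UTF-8 bytes (what fr_FR.UTF-8 returns)

// Both inputs end up as the same UTF-8 bytes.
var_dump(to_utf8_once($latin1) === $utf8);
var_dump(to_utf8_once($utf8) === $utf8);

// For comparison: blindly converting the UTF-8 input encodes it a second time.
$garbled = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo bin2hex($garbled), "\n";
```

Setting the locale to fr_FR.UTF-8 on both servers, as in the fix above, removes the need for any conversion at all.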
Put this meta tag in your HTML code, inside <head></head>:
<meta charset="UTF-8">
It seems your server is configured to send the header
content-type: text/html; charset=UTF-8
as default. You could change your server configuration or you could add at the very start
<?php
header("content-type: text/html; charset=UTF-8");
?>
to set this header by yourself.
You need to use:
<?php
$conn = mysql_connect("localhost","root","root");
mysql_select_db("test");
mysql_query("SET NAMES 'utf8'", $conn);//put this line after you select db.
When I moved from php mysql shared hosting to my own VPS I've found that code which outputs user names in UTF8 from mysql database outputs ?�??????� instead of 鬼神❗. My page has utf-8 encoding, and I have default_charset = "UTF-8" in php.ini, and header('Content-Type: text/html; charset=utf-8'); in my php file, as well as <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> in html part of it.
My database uses the utf8_bin collation, and so does the table. On both the previous and the current hosting, phpMyAdmin shows this record as 鬼神❗. When I create an ANSI text file in Notepad++, paste 鬼神❗ into it and choose Encoding -> Encode in UTF-8, I see 鬼神❗, so I suppose it is a correctly encoded UTF-8 string.
Ok, and then I added
init_connect='SET collation_connection = utf8_general_bin'
init_connect='SET NAMES utf8'
character-set-server=utf8
collation-server=utf8_general_bin
skip-character-set-client-handshake
to my.cnf and now my page shows 鬼神❗ instead of ?�??????�. This is the same output I get in phpmyadmin on both hostings, so I'm on a right way. And still somehow on my old hosting the same php script returns utf-8 web page with name 鬼神❗ while on new hosting - 鬼神❗. It looks like the string is twice utf-8 encoded: I get utf-8 string, I give it as ansi string to Notepad++ and it encodes it in correct utf-8 string.
However when I try utf8_encode() I get й¬ÑзÒÑвÑâ, and utf8_decode() returns ?�???????. The same result return mb_convert_encoding($name,"UTF-8","ISO-8859-1"); and iconv( "ISO-8859-1","UTF-8", $name);.
So how could I reproduce the same conversion Notepad++ does?
See answer below.
The solution was simple yet not obvious to me, as I never saw my.cnf on that shared hosting. It seems that server had the following settings:
init_connect='SET collation_connection = cp1252'
init_connect='SET NAMES cp1252'
character-set-server=cp1252
So, to avoid breaking other code on my new server, I have to place mysql_query("SET NAMES CP1252"); at the top of each PHP script that works with UTF-8 strings.
The trick is that the script gets the string as is (as ANSI) and outputs it unchanged. Because the browser is told that the page is UTF-8, it simply renders those bytes as UTF-8.
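As an illustration only, the double encoding described above can be reproduced and reversed in PHP. This sketch assumes the mbstring extension and uses 'Windows-1252' as the name of the cp1252 ("ANSI") code page. It uses only 鬼神, because the third character of the name contains a byte (0x9D) that cp1252 leaves undefined.

```php
<?php
// Assumption: mbstring is loaded; 'Windows-1252' is the cp1252 code page.
$name = "\xE9\xAC\xBC\xE7\xA5\x9E"; // UTF-8 bytes of 鬼神

// Treat every UTF-8 byte as one cp1252 character and encode it as UTF-8 again.
// This is the "twice UTF-8 encoded" form.
$doubleEncoded = mb_convert_encoding($name, 'UTF-8', 'Windows-1252');

// Reverse the step: map each character back to its single cp1252 byte.
$restored = mb_convert_encoding($doubleEncoded, 'Windows-1252', 'UTF-8');

var_dump($doubleEncoded !== $name); // the double-encoded form differs
var_dump($restored === $name);      // the round trip restores the original bytes
```

With SET NAMES CP1252 the MySQL server performs the reverse step itself, which is why the script needs no conversion of its own.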
Alter language settings for PHP application - WAMP server. Firefox does not recognize the language in my PHP application and replaces letters in my language with very small <?> icons (with appropriate quotes, of course). It does not help to write <html lang=nb-NO> or <html lang=no> at the start of the PHP application file, nor to use <meta charset=iso-8859-1> in the header.
The letter substitution does not occur in PHP-admin or in a locally installed WordPress. In Wamp server 3.0.0 most text is in English. The data in MySQL has been stored correctly. Right-clicking the Wamp server icon in the system tray and choosing the Language menu shows a language list with a check mark next to English.
How can the settings be altered so that Firefox/IE/Opera/... interpret the application correctly and display all characters in the alphabet?
Your problem is not with the language; the language setting does not influence this at all. The problem is with the charset.
Common encoding issues
When working with accented characters, it is very common to run into strange characters such as:
Ã© instead of é: this happens because the character is encoded as UTF-8, but the page is read as iso-8859-1 (or a compatible encoding).
The � character: this appears when you use accented characters encoded as "iso-8859-1" on a page that is processed as "UTF-8" because of Content-Type: ...; charset=utf8.
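Both symptoms can be reproduced from the raw bytes. The following is a small sketch, assuming the mbstring extension is loaded. The variable names are illustrative only.

```php
<?php
// Assumption: mbstring is loaded.
$utf8   = "\xC3\xA9"; // é encoded as UTF-8 (two bytes)
$latin1 = "\xE9";     // é encoded as ISO-8859-1 (one byte)

// Symptom 1: UTF-8 bytes decoded as ISO-8859-1. Each byte becomes its own
// character, so the two bytes of é are displayed as two characters.
$shown = mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1');
echo $shown, "\n";

// Symptom 2: an ISO-8859-1 byte on a UTF-8 page. The lone byte 0xE9 is not
// valid UTF-8, so a browser substitutes the replacement character.
var_dump(mb_check_encoding($latin1, 'UTF-8'));
```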
What is needed to use UTF-8
PHP scripts (the files on the server, not the response they produce) must be saved as "UTF-8 without BOM".
Set MySQL (or whichever database system you use) to charset utf8.
It is recommended to use header('Content-type: text/html; charset=UTF-8'); in PHP scripts (if you use a framework this may not be necessary; it varies).
Note: The advantage of UTF-8 is that you can use various "languages" in your page with characters that are not supported by "iso-8859-1".
About ISO-8859-1
I recommend ISO-8859-1 only if your site uses nothing but Latin characters and you do not need the extra characters (such as "icons") available in UTF-8 or UTF-16. Even if you do not need UTF-8, there is a good reason to move to it: in June 2004 the ISO/IEC working group responsible for maintaining ISO-8859-1 ended support for this encoding in order to focus on UCS and Unicode.
Source: http://en.wikipedia.org/wiki/ISO_8859-1
If you decide to use UTF-8 in your site, I recommend the following steps:
PHP script with UTF-8 (without BOM)
Note: read about BOM in http://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
You should save all PHP scripts (including those you load with include, require, etc.) as UTF-8 without BOM. Use a program such as Sublime Text or Notepad++ to convert the files:
Using notepad++:
Using Sublime Text:
In Netbeans, go to Window > Preferences > General > Workspace > Text File Encoding:
MySQL with UTF-8
To create a table in UTF-8 you should use something like:
CREATE TABLE mytable (
id BIGINT NOT NULL AUTO_INCREMENT PRIMARY KEY,
title varchar(300) DEFAULT NULL
) ENGINE=InnoDB CHARACTER SET=utf8 COLLATE utf8_unicode_ci;
If the tables already exist, first make a BACKUP of them and then use one of the following commands (as appropriate):
Convert database:
ALTER DATABASE mysdatabase CHARACTER SET utf8 COLLATE utf8_unicode_ci;
Convert specific table:
ALTER TABLE mytable CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;
In addition to creating the tables in UTF-8 it is necessary to define the connection as UTF-8.
With PDO use exec:
$conn = new PDO('mysql:host=HOST;dbname=mysdatabase', 'USER', 'PASSWORD');
$conn->exec('SET NAMES utf8'); // set the client, results and connection charsets to UTF-8
With mysqli use mysqli_set_charset:
$mysqli = new mysqli('HOST', 'USER', 'PASSWORD', 'mysdatabase');
if (false === $mysqli->set_charset('utf8')) {
printf('Error: %s', $mysqli->error);
}
Setting the page charset
You can use the <meta> tag to set the charset, but it is recommended to set it in the server response by defining the headers (this does not mean that you should drop <meta>).
Use the header() function. The server response is the right place because the charset then applies from the moment the page starts rendering, and AJAX responses also need the charset defined by header().
Note: header() must always be called at the top of the script, before any echo ...;, print "...";, or other output function.
Example:
<?php
header('Content-Type: text/html; charset=UTF-8');
echo 'Hello World!';
I have a PHP file that reads a CSV file, sent via an API, that I'm assuming is in UTF-8. I'm using fopen() to read it.
The issue is that my output comes back as:
IU?Q?JL?.?/Q?R??/)?J-.?))VH?/OM?K-NI?T0?P?*ͩT0204jzԴ?H???X???# D??K
I checked my php5 config settings:
The default is already UTF-8 :/
; php.net/default-charset
;default_charset = "UTF-8"
I changed ISO-8859-1 to UTF-8 below also:
[iconv]
;iconv.input_encoding = UTF-8
;iconv.internal_encoding = UTF-8
;iconv.output_encoding = UTF-8
;mssql.charset = "UTF-8"
The output is still the same. Are there any suggestions or steps I could take to solve the issue?
I have never opened files with PHP, but have you also tried fgets()?
$data = fopen($file, 'r');
fgets($data);
If you are just reading from the source, there is no problem, because PHP makes no encoding assumptions about strings. So if your source sends you the data as UTF-8, it is UTF-8. The default_charset in PHP is just a header sent before your page, and it can be overridden in a number of ways. Check whether your browser is actually showing the page in the correct encoding: in Chrome, go to the menu More Tools / Encoding, where you will see the encoding that is being used.
I had to use the compress.zlib:// stream wrapper to solve the issue, because the data was compressed:
$f_pointer = fopen("compress.zlib://URL", "r");
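A self-contained way to see the effect is to write a small gzip file and read it back through the wrapper. This is only a sketch: it assumes the zlib extension is loaded, and a temporary file with made-up CSV content stands in for the API URL from the question.

```php
<?php
// Assumption: the zlib extension is loaded. The temp file and its CSV
// content are made up for illustration; they stand in for the API URL.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, gzencode("id,name\n1,Caf\xC3\xA9\n"));

// Read without the wrapper: the data starts with the gzip magic bytes
// 0x1f 0x8b, which is why plain fopen() output looks like garbage.
$raw    = file_get_contents($path);
$isGzip = strncmp($raw, "\x1f\x8b", 2) === 0;

// Read with the wrapper: the stream is decompressed transparently.
$handle = fopen('compress.zlib://' . $path, 'r');
$lines  = [];
while (($line = fgets($handle)) !== false) {
    $lines[] = rtrim($line, "\n");
}
fclose($handle);
unlink($path);

var_dump($isGzip);
print_r($lines);
```

Checking the first two bytes like this is a quick way to tell compressed data apart from a genuine charset problem.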
I have a piece of PHP code that was written in Notepad++ on a Windows 7 machine.
The encoding in Notepad++ is set to "Encode to ANSI" (ASCII).
I am then doing this in my code:
utf8_encode("£")
so I am sure to get the utf friendly version of the £ symbol.
All works perfectly fine on the local server.
But when I push it up to my live server I'm getting all sorts of issues with utf8 encoding errors in php.
Is something in the git push/pull process corrupting this, or is it perhaps a locale setting on the live server?
Both local and live servers run ubuntu 12.04
Thanks
Update 1
The actual error I'm getting is
invalid byte sequence for encoding "UTF8": 0xa3
(This is a Postgres SQL error)
Other difference in local and live is live is over https and local is just http (both apache)
Update 2
Running:
file -bi script.php
on both local and live produces:
text/x-php; charset=iso-8859-1
So it seems as if the encoding of the file is intact?
Update 3
Looking at the local Postgres installation it has the following settings:
ENCODING = 'UTF8'
LC_COLLATE = 'en_GB.UTF-8'
LC_CTYPE = 'en_GB.UTF-8'
Whereas live has:
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
I'm going to see if I can swap the collate types to match local and see if that helps
Update 4
I'm doing this, which is ultimately the piece of code that fails on live (but not on local):
setlocale(LC_MONETARY, 'en_GB');
$equivFinal = utf8_encode("£") . money_format('%.2n', $equivFinal);
Update 5
I'm getting closer to the issue.
On local the string is produced as
£1.00
On live the string is produced as
£�1.00
So for some reason the live server is adding extra garbage when doing the UTF-8 conversion.
Update 6
Ok so I've pinned it down to this:
setlocale(LC_MONETARY, 'en_GB');
Logger::getInstance(__NAMESPACE__)->info("TEST 01= " .money_format('%.2n', 1.00));
On local it outputs
TEST 01= 1.00
As expected
on live it output
TEST 01= �1.00
With the random characters added to the start, which is what is causing my utf8 issue as it's croaking on that.
Any idea why money_format would do that on one server and not another?
finally nailed it
it's money_format
if you don't specify a locale or specify it incorrectly then it just does its own thing
so i was doing
setlocale(LC_MONETARY, 'en_GB');
and on local that meant money_format just ignored the £ from the start of the output
but on live it meant that money_format put the unicode WTF character.
doing it properly for ubuntu of
setlocale(LC_MONETARY, 'en_GB.UTF-8');
means money_format comes out with £ at the front and therefore i dont need my utf8 rubbish
Update 1
Better still, don't bother with setlocale and I'm just going to do this:
utf8_encode("£") . money_format('%!.2n', $equivFinal);
Which basically formats the money and excludes the symbol prefix
and then better still just use number_format and do
utf8_encode("£") . number_format($equivFinal, 2);
I've learnt something new :)
The issue is that you can't save a raw GBP symbol inside an ASCII file.
Never use weird characters in your source code, because no matter how much they "should" work, you always run into problems like this. (You can come up with your own definition of "weird", but mine is anything you can't type on a US English keyboard without resorting to alt codes.)
To get around this restriction, concatenate the result of the chr() function. (Use the following code snippet to find the parameter you need to pass to chr(); it is 163 in this case.)
<?php echo(ord('£')); ?>
so in your case the line would read:
$equivFinal = chr(163) . money_format('%.2n', $equivFinal);
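One caveat worth checking in code: chr(163) produces the single byte 0xA3, which is the ISO-8859-1 form of £ and the very byte the Postgres error complains about. A minimal sketch follows, assuming the mbstring extension is loaded and using number_format() in place of money_format() (which no longer exists in PHP 8), with an ASCII-only escape sequence for the UTF-8 form.

```php
<?php
// Assumption: mbstring is loaded. number_format() is used here because
// money_format() was removed in PHP 8.
$latin1Pound = chr(163);   // one byte, 0xA3: £ in ISO-8859-1
$utf8Pound   = "\xC2\xA3"; // two bytes: £ in UTF-8, written with ASCII only

// The single byte is not valid UTF-8; the two-byte form is.
var_dump(mb_check_encoding($latin1Pound, 'UTF-8'));
var_dump(mb_check_encoding($utf8Pound, 'UTF-8'));

// Converting the ISO-8859-1 byte yields exactly the two-byte UTF-8 form.
var_dump(mb_convert_encoding($latin1Pound, 'UTF-8', 'ISO-8859-1') === $utf8Pound);

$price = $utf8Pound . number_format(1.0, 2);
echo $price, "\n";
```

Because the escape sequence is plain ASCII, the result no longer depends on how the source file was saved.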
This is really doing my nut.....
All relevant PHP output scripts set headers (in this case only one file, the main PHP script):
header("Content-type: text/html; charset=utf-8");
HTML meta is set in head:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
all Mysql tables and related columns set to:
utf8_unicode_ci Unicode (multilingual), case-insensitive
I have been writing a class to do some translation. When the class writes to a file using fopen, fputs etc., everything works great and the correct chars appear in my output files (which are written as PHP arrays and saved to the filesystem as .php or .htm files). eval() brings back the .htm files correctly, as does just including the .php files when I want to use them. All good.
The problem comes when I try to create translation entries in my DB. My DB connection class has the following line added directly after the initial connection:
mysql_query("SET NAMES utf8, character_set_results = 'utf8', character_set_client = 'utf8', character_set_connection = 'utf8', character_set_database = 'utf8', character_set_server = 'utf8'");
Instead of seeing the correct chars, I get the usual crud you would expect when using the wrong charset in the DB. E.g.:
PropriÃ©tÃ©s
instead of:
propriétés
don't even get me started on Russian, Japanese, etc chars! But then using UTF8 should not make any single language charset an issue...
What have I missed? I know it's not the PHP, as the site shows the correct chars from the included translation .php or .htm files; it's only when I am dealing with the MySQL DB that I have these issues. phpMyAdmin shows the entries with the wrong chars, so I assume it's happening when the PHP "writes" to MySQL. I have checked similar questions here on Stack Overflow, but none of the answers (all of which I had already taken care of) give me any clues...
Also, does anyone have thoughts on the speed difference between include $filename and eval(file_get_contents($filename))?
You say that you are seeing "the usual crud you would expect using the wrong charset". But that crud is in fact what you get by applying utf8_encode() to a string that is already UTF-8. So chances are that you are not using the "wrong encoding" anywhere, but are encoding into UTF-8 more times than needed.
You may want to take a look at a library I made to fix this kind of problem:
https://stackoverflow.com/a/3521340/290221
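Independently of that library, the idea of undoing one extra layer of encoding can be sketched in a few lines. The helper name undo_double_utf8 is hypothetical, the code assumes the mbstring extension, and it assumes the extra layer was an ISO-8859-1 to UTF-8 conversion (what utf8_encode() does). It cannot tell a double-encoded string apart from text that is genuinely meant to read that way, so it is a heuristic only.

```php
<?php
// Hypothetical helper (not the linked library): undo one layer of
// ISO-8859-1 -> UTF-8 double encoding. Assumption: mbstring is loaded.
function undo_double_utf8(string $text): string
{
    $candidate = mb_convert_encoding($text, 'ISO-8859-1', 'UTF-8');

    // Accept the candidate only if it is valid UTF-8 itself and if encoding
    // it again reproduces the input exactly (no characters were lost).
    $lossless = mb_convert_encoding($candidate, 'UTF-8', 'ISO-8859-1') === $text;
    if ($lossless && mb_check_encoding($candidate, 'UTF-8')) {
        return $candidate;
    }
    return $text; // not double-encoded (or not safely reversible): leave as is
}

$good  = "Propri\xC3\xA9t\xC3\xA9s";                           // correct UTF-8
$twice = mb_convert_encoding($good, 'UTF-8', 'ISO-8859-1');    // encoded twice

var_dump(undo_double_utf8($twice) === $good); // one layer removed
var_dump(undo_double_utf8($good) === $good);  // correct input is left alone
```

Repairing stored data this way treats the symptom; the lasting fix is to find and remove the extra encoding step on the write path.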
Here is all you need to make sure those chars display correctly:
/* HTTP charset */
header("Content-Type:text/html; charset=UTF-8");
/* Set MySQL communication encoding */
mysql_set_charset("UTF8");
You also need to set the DB encoding to the correct one, as well as each table's encoding AND each field's encoding.
Last but not least, your PHP file's encoding should also match.
There is mysql_set_charset('utf8'); in the mysql extension for that. Call it before running any other query.