I'm trying to echo the date with strftime but I'm getting bad encoding on utf-8 only characters. (accented characters basically)
setlocale(LC_TIME, 'spanish');
define("CHARSET", "iso-8859-1");
echo strftime("%A, %d de %B",strtotime($row['Date']));
Is there any problem in this part of the code? Everything is encoded in utf-8 and echoing a 'á' character above it displays the character correctly.
Try adding utf8_encode()
setlocale(LC_TIME, 'spanish');
define("CHARSET", "iso-8859-1");
echo utf8_encode(strftime("%A, %d de %B",strtotime($row['Date'])));
I'm a bit late, but Googling around I found this post. And the answers weren't appropriate in my case.
I'm experiencing the same problem as the OP, but my locale is fr_FR and everything works fine on my computer but not on the dev server.
If I add an iconv (as most people suggest when you Google this issue), it works on the dev server but not on my computer, so I needed a "bulletproof" solution that would work the same everywhere (as there is also a production server).
So, the issue here is with the setlocale, this function changes the locale on the current execution, but every locale is associated with a charset and if none is specified, it falls back to the default one of your system (I think, in my case it was falling back to ISO-8859-1 when using the fr_FR locale). You can list all available locales on your computer/server with the locale -a command. You will most likely see the locale you want, with ".UTF-8" (in my case "fr_FR.UTF-8"), that's how you must set it: setlocale('fr_FR.UTF-8');
perhaps:
echo iconv("iso-8859-1","utf-8",strftime("%A, %d %B",strtotime($row['Date'])));
For those that don't have iconv, you can use the mb function to convert the stftime encoded string to utf-8
echo mb_convert_encoding(strftime("%A, %d de %B",strtotime($row['Date'])), 'UTF-8', mb_internal_encoding());
Related
I'm spanish, and making tests internacionalizing a text width PHP, i only get it translated to english.
I got this structure of files:
locale/en_US/LC_MESSAGES/con los ficheros messages.mo y messages.po
locale/es_ES/LC_MESSAGES/con los ficheros messages.mo y messages.po
locale/fr_FR/LC_MESSAGES/con los ficheros messages.mo y messages.po
Every files have the key word "Servicios" translated to each languaje.
And in PHP i have this code:
<?php
putenv("LANG=en_US");
setlocale(LC_ALL, "en_US");
bindtextdomain("messages", "locale");
textdomain("messages");
?>
When i put the code 'en_US' show the good translation, but when i change it to 'es_ES' or 'fr_FR' that way:
<?php
putenv("LANG=es_ES");
setlocale(LC_ALL, "es_ES");
?>
or
<?php
putenv("LANG=fr_FR");
setlocale(LC_ALL, "fr_FR");
?>
still showing the translation to English
I am working on Widnows 7 and the function
echo $_SERVER['HTTP_ACCEPT_LANGUAGE'] ;
returns to
"es-ES,es;q=0.8"
always,
Which problem could it be?
Thank you
It is quite likely that the languages are not installed on the server your running the script on - do you have shell access to the server? Then try
locale -a
to see which locales are installed. Also have a look here Is it feasible to rely on setlocale, and rely on locales being installed?
NOTE:
be careful with the LC_ALL setting, as it may introduce some unwanted conversions. For example, I used
setlocale (LC_ALL, "Dutch");
to get my weekdays in dutch on the page. From that moment on (as I found out many hours later) my floating point values from MYSQL where interpreted as integers because the Dutch locale wants a comma (,) instead of a point (.) before the decimals. I tried printf, number_format, floatval.... all to no avail. 1.50 was always printed as 1.00 :(
When I set my locale to :
setlocale (LC_TIME, "Dutch");
my weekdays are good now and my floating point values too.
I have hard time with character charset, I suspect my fonction that display date to return non UTF-8 character (août is replaced by a question mark inside a diamond août).
When working on my local server everything's fine but when I push my code on my staging server, it's not displaying properly.
My php files are saved as UTF-8 NO BOM
If I inspect my output page, headers indicate UTF-8.
My local machine is a Mac with MAMP installed and my stating server have CentOS with cPanel installed.
Here is the part I suspect causing problem :
$langCode = "fr_FR"; /* Alos tried fr_FR.UTF-8 */
setlocale(LC_ALL, $langCode);
$monthName = _(strftime("%B",strtotime($dateStr)))
echo $monthName; /* Alos tried utf8_encode($monthName) worked on my staging server but not on my local server ! I'm using */
Finally found how to find the bug and fix it.
setlocale(LC_ALL, 'fr_FR');
var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));
the dump returned UTF-8 on local and FALSE on staging server.
PHP.net documentation about mb_detect_encoding()
Return Values ¶
The detected character encoding or FALSE if the encoding cannot be
detected from the given string.
So charset can't be detected. I will try to force it "again"
setlocale(LC_ALL, 'fr_FR.UTF-8');
var_dump(mb_detect_encoding(_(strftime("%B",strtotime($dateStr)))));
this time the dump returned UTF-8 on local and UTF-8 on staging server. So I rollback my code to see what's happened when I tried first time with fr_FR.UTF-8 why does it was not working ? And I realize I was using utf8_encode() like pointed by user deceze in comment of this function's doc,
In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.
Thank you for your help everyone !
put this meta tag on your html code inside <head></head>
<meta charset="UTF-8">
It seems your server are configured to send the header
content-type: text/html; charset=UTF-8
as default. You could change your server configuration or you could add at the very start
<?php
header("content-type: text/html; charset=UTF-8");
?>
to set this header by yourself.
you need to use :
<?php
$conn = mysql_connect("localhost","root","root");
mysql_select_db("test");
mysql_query("SET NAMES 'utf8'", $conn);//put this line after you select db.
I have a piece of PHP code, which was written in notepad++ on a Windows 7 machine
The Encoding in notepad++ is set to "Encode to ANSI" (ASCII)
I am them doing this in my code:
utf8_encode("£")
so I am sure to get the utf friendly version of the £ symbol.
All works perfectly fine on the local server.
But when I push it up to my live server I'm getting all sorts of issues with utf8 encoding errors in php.
Is something in the git push/pull process corrupting this, or is it perhaps a locale setting on the live server?
Both local and live servers run ubuntu 12.04
Thanks
Update 1
The actual error I'm getting is
invalid byte sequence for encoding "UTF8": 0xa3'
(This is a Postgres SQL error)
Other difference in local and live is live is over https and local is just http (both apache)
Update 2
Running:
file -bi script.php
on both local and live produces:
text/x-php; charset=iso-8859-1
So it seems as if the encoding of the file is intact?
Update 3
Looking at the local Postgres installation it has the following settings:
ENCODING = 'UTF8'
LC_COLLATE = 'en_GB.UTF-8'
LC_CTYPE = 'en_GB.UTF-8'
Whereas live has:
ENCODING = 'UTF8'
LC_COLLATE = 'en_US.UTF-8'
LC_CTYPE = 'en_US.UTF-8'
I'm going to see if I can swap the collate types to match local and see if that helps
Update 4
I'm doing this, which is the ultimately resulting in the failing piece of code on live (not local)
setlocale(LC_MONETARY, 'en_GB');
$equivFinal = utf8_encode("£") . money_format('%.2n', $equivFinal);
Update 5
I'm getting closer to the issue.
On local the string is produced as
£1.00
On live the string is produced as
£�1.00
So for some reason the live server is adding more crap in when doing the UTF8 conversion
Update 6
Ok so I've pinned it down to this:
setlocale(LC_MONETARY, 'en_GB');
Logger::getInstance(__NAMESPACE__)->info("TEST 01= " .money_format('%.2n', 1.00));
On local it outputs
TEST 01= 1.00
As expected
on live it output
TEST 01= �1.00
With the random characters added to the start, which is what is causing my utf8 issue as it's croaking on that.
Any idea why money_format would do that on one server and not another?
finally nailed it
it's money_format
if you dont specifiy a locale or specify it incorrectly then it just does its own thing
so i was doing
setlocale(LC_MONETARY, 'en_GB');
and on local that meant money_format just ignored the £ from the start of the output
but on live it meant that money_format put the unicode WTF character.
doing it properly for ubuntu of
setlocale(LC_MONETARY, 'en_GB.UTF-8');
means money_format comes out with £ at the front and therefore i dont need my utf8 rubbish
Update 1
Better still, don't bother with setlocale and I'm just going to do this:
utf8_encode("£") . money_format('%!.2n', $equivFinal);
Which basically formats the money and excludes the symbol prefix
and then better still just use number_format and do
utf8_encode("£") . number_format($equivFinal, 2);
I've learnt something new :)
The issue is that you can't save raw GBP symbol inside ASCII file.
Never use weird characters in your source code because no matter how much they "should" work you always run into problems like this. (You can come up with your own definition of "weird" but mine is anything you can't type in on a us-english keyboard without resorting to alt-codes.)
To get arround this restriction concatinate in the results of the chr() function. (use the following code snipit to find out the parameter you need to pass chr is 163 in this case.)
<?php echo(ord('£')); ?>
so in your case the line would read:
$equivFinal = chr(163) . money_format('%.2n', $equivFinal);
Starting with only the locale identifier name (string) provided by clients, how or where do I look up the default "list separator" character for that locale?
The "list separator" setting is the character many different types of applications and programming languages may use as the default grouping character when joining or splitting strings and arrays. This is especially important for opening CSV files in spreadsheet programs. Though this is often the comma ",", this default character may be different depending on the machine's region settings. It may even differ between OS's.
I'm not interested in my own server environment here. Instead, I need to know more about the client's based off their locale identifier which they've given to me, so my own server settings are irrelevant. Also for this solution, I can not change the locale setting on this server to match a client's for the entire current process as a shortcut to look this value up.
If this is defined in the ICU library, I'm not able to find any way to look this value up using the INTL extension.
Any hints?
I am not sure if my answer will satisfy your requirements but I suggest (especially as you don't want to change the locale on the server) to use a function that will give you the answer:
To my knowledge (and also Wikipedia's it seems) the list separator in a CSV is a comma unless the decimal point of the locale is a comma, in that case the list separator is a semicolon.
So you could get a list of all locales that use a comma (Unicode U+002C) as separator using this command:
cd /usr/share/i18n/locales/
grep decimal_point.*2C *_* -l
and you could then take this list to determine the appropriate list separator:
function get_csv_list_separator($locale) {
$locales_with_comma_separator = "az_AZ be_BY bg_BG bs_BA ca_ES crh_UA cs_CZ da_DK de_AT de_BE de_DE de_LU el_CY el_GR es_AR es_BO es_CL es_CO es_CR es_EC es_ES es_PY es_UY es_VE et_EE eu_ES eu_ES#euro ff_SN fi_FI fr_BE fr_CA fr_FR fr_LU gl_ES hr_HR ht_HT hu_HU id_ID is_IS it_IT ka_GE kk_KZ ky_KG lt_LT lv_LV mg_MG mk_MK mn_MN nb_NO nl_AW nl_NL nn_NO pap_AN pl_PL pt_BR pt_PT ro_RO ru_RU ru_UA rw_RW se_NO sk_SK sl_SI sq_AL sq_MK sr_ME sr_RS sr_RS#latin sv_SE tg_TJ tr_TR tt_RU#iqtelif uk_UA vi_VN wo_SN");
if (stripos($locales_with_comma_separator, $locale) !== false) {
return ";";
}
return ",";
}
(the list of locales is taken from my own Debian machine, I don't know about the completeness of the list)
If you don't want to have this static list of locales (though I assume that this doesn't change that often), you can of course generate the list using the command above and cache it.
As a final note, according to RFC4180 section 2.6 the list separator actually never changes but rather fields containing a comma (so this also means floating numbers, depending on the locale) should be enclosed in double-quotes. Though (as linked above) not many people follow the RFC standard.
There's no such locale setting as "list separator" it might be software specific, but I doubt it's user specific.
However... You can detect user's locale and try to match the settings.
Get browsers locale: $accept_lang = $_SERVER['HTTP_ACCEPT_LANGUAGE']; this might contain a list of comma-separated values. Some browser don't send this though. more here...
Next you can use setlocale(LC_ALL, $accept_lang); and get available locale settings using $locale_info = localeconv(); more here...
I have working localization in my project. Working means that my project gets translated to whatever language I have in the locale/sk folder, sk for slovak being my default system language.
Setting to any other language doesn't work. I have tried $lang = 'cs', 'cz', 'en', 'en_UK', 'en_UK.utf8' and others. Still, only the translation in the 'sk' folder is taken and still the setlocale() function returns false. I have tried to change default language in browser - no effect.
This is my code:
putenv("LANG=$lang");
setlocale(LC_ALL, $lang);
bindtextdomain("messages", realpath("../localem"));
textdomain("messages");
...
_("Welcome!")
I have also tried these:
putenv("LANGUAGE=$lang");
putenv('LC_ALL=$lang');
Any suggestions are welcome.
Edit:
$loc = array('nor');
if (setlocale(LC_ALL, $loc)==false) print ' false'; else print setlocale(LC_ALL, $loc);
'nor' prints Norwegian (Bokmĺl)_Norway.1252, 'rus' russian, but 'svk' prints false and so does 'cze'.
On the list all of these are mentioned:
http://msdn.microsoft.com/en-us/library/cdax410z%28v=vs.80%29.aspx
Windows uses another format for the locale setting, see MSDN: List of Country/Region Strings.
You can send a list of locales to setlocale by sending in an array, such as to get Norwegian month names and time formats:
setlocale(LC_TIME, array('nb_NO.UTF-8', 'no_NO.UTF-8', 'nor'));
Windows might however return strings in another encoding than UTF-8, so you might want to handle this manually (converting from cpXXXX to UTF-8).