Get "list separator" character for any locale


Starting with only the locale identifier name (string) provided by clients, how or where do I look up the default "list separator" character for that locale?
The "list separator" setting is the character that many types of applications and programming languages use as the default delimiter when joining or splitting strings and arrays. This is especially important when opening CSV files in spreadsheet programs. Although it is often the comma ",", the default character may differ depending on the machine's regional settings. It may even differ between operating systems.
I'm not interested in my own server environment here. Instead, I need to know the client's setting based on the locale identifier they've given me, so my own server settings are irrelevant. Also, for this solution I cannot change this server's locale to match a client's for the entire current process as a shortcut to look this value up.
If this is defined in the ICU library, I can't find any way to look the value up using the intl extension.
Any hints?

I am not sure whether my answer will satisfy your requirements, but I suggest (especially as you don't want to change the locale on the server) using a function that gives you the answer:
To my knowledge (and Wikipedia's too, it seems), the list separator in a CSV file is a comma unless the locale's decimal point is a comma, in which case the list separator is a semicolon.
So you can get a list of all locales that use a comma (Unicode U+002C) as their decimal point with these commands:
cd /usr/share/i18n/locales/
grep -l 'decimal_point.*2C' *_*
and you could then take this list to determine the appropriate list separator:
function get_csv_list_separator($locale) {
    // Locales whose decimal point is a comma (from /usr/share/i18n/locales on my Debian machine)
    $locales_with_comma_separator = explode(" ", "az_AZ be_BY bg_BG bs_BA ca_ES crh_UA cs_CZ da_DK de_AT de_BE de_DE de_LU el_CY el_GR es_AR es_BO es_CL es_CO es_CR es_EC es_ES es_PY es_UY es_VE et_EE eu_ES eu_ES@euro ff_SN fi_FI fr_BE fr_CA fr_FR fr_LU gl_ES hr_HR ht_HT hu_HU id_ID is_IS it_IT ka_GE kk_KZ ky_KG lt_LT lv_LV mg_MG mk_MK mn_MN nb_NO nl_AW nl_NL nn_NO pap_AN pl_PL pt_BR pt_PT ro_RO ru_RU ru_UA rw_RW se_NO sk_SK sl_SI sq_AL sq_MK sr_ME sr_RS sr_RS@latin sv_SE tg_TJ tr_TR tt_RU@iqtelif uk_UA vi_VN wo_SN");
    // Strip a codeset suffix ("de_DE.UTF-8" -> "de_DE") and compare whole names,
    // so that e.g. "ar" can't match inside "es_AR" as a substring search would
    $name = explode(".", $locale)[0];
    if (in_array($name, $locales_with_comma_separator, true)) {
        return ";";
    }
    return ",";
}
(The list of locales is taken from my own Debian machine; I can't vouch for its completeness.)
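If the intl extension is available, a possible alternative to the static list is to ask ICU for the locale's decimal separator and then apply the same comma-or-semicolon rule. This is a minimal sketch, assuming that rule is what you want; csv_separator_for_locale() is a hypothetical helper name. It reads the symbol with NumberFormatter::getSymbol(), so it never calls setlocale() or touches the process locale.

```php
<?php
// Hypothetical helper: guess the CSV list separator for any locale identifier
// from ICU's decimal separator, without changing the process locale.
// Assumes the rule "semicolon if the decimal separator is a comma, else comma".
function csv_separator_for_locale(string $locale): string
{
    $formatter = new NumberFormatter($locale, NumberFormatter::DECIMAL);
    $decimal = $formatter->getSymbol(NumberFormatter::DECIMAL_SEPARATOR_SYMBOL);
    return $decimal === ',' ? ';' : ',';
}
```

With typical ICU data, csv_separator_for_locale('de_DE') should return ";" and csv_separator_for_locale('en_US') should return ",".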
If you'd rather not keep this static list of locales (though I assume it doesn't change very often), you can of course generate it with the commands above and cache it.
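One way to generate that list from PHP is to mirror the grep above. This is a sketch assuming the Debian/Ubuntu layout of /usr/share/i18n/locales; both function names are hypothetical. Like the grep, it only sees files that define decimal_point themselves, so a locale that pulls in its LC_NUMERIC section with copy "..." would be missed.

```php
<?php
// Hypothetical helper: does a glibc locale source define a comma decimal point?
// glibc locale sources write the comma as <U002C>.
function locale_source_uses_decimal_comma(string $source): bool
{
    return preg_match('/^\s*decimal_point\s+.*2C/m', $source) === 1;
}

// Hypothetical helper: PHP equivalent of `grep -l 'decimal_point.*2C' *_*`.
function comma_decimal_locales(string $dir = '/usr/share/i18n/locales'): array
{
    $result = [];
    foreach (glob($dir . '/*_*') ?: [] as $path) {
        $source = @file_get_contents($path);
        if ($source !== false && locale_source_uses_decimal_comma($source)) {
            $result[] = basename($path);
        }
    }
    sort($result);
    return $result;
}
```

To cache it, you could for example write json_encode(comma_decimal_locales()) to a file once and read it back on later requests.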
As a final note, according to RFC 4180 (section 2, rule 6), the list separator actually never changes; instead, fields containing a comma (which, depending on the locale, includes floating-point numbers) should be enclosed in double quotes. However (as linked above), not many people follow the RFC.
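To illustrate the RFC 4180 approach in PHP: keep the comma as the delimiter and let fputcsv() quote any field that contains it. The csv_line() helper is hypothetical; it writes to an in-memory stream only to capture the output as a string.

```php
<?php
// Hypothetical helper: format one CSV row the RFC 4180 way.
// fputcsv() encloses fields containing the delimiter in double quotes.
function csv_line(array $fields): string
{
    $handle = fopen('php://memory', 'r+');
    fputcsv($handle, $fields, ',', '"', '\\'); // escape passed explicitly
    rewind($handle);
    $line = stream_get_contents($handle);
    fclose($handle);
    return $line;
}

echo csv_line(['Preis', '1234,56']); // Preis,"1234,56"
```

A German decimal number such as 1234,56 then stays in a single field without changing the separator.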

There's no such locale setting as a "list separator"; it might be software-specific, but I doubt it's user-specific.
However, you can detect the user's locale and try to match the settings.
Get the browser's locale with $accept_lang = $_SERVER['HTTP_ACCEPT_LANGUAGE']; this might contain a list of comma-separated values, and some browsers don't send it at all.
Next, you can pass one of those values (converted to a locale name such as de_DE) to setlocale(LC_ALL, ...) and read the available locale settings with $locale_info = localeconv();. Note that setlocale() changes the locale for the whole process.
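Since the header can list several languages with q-values, and setlocale() expects names like de_DE rather than de-DE, a small parser helps pick candidates in order of preference. This is a minimal sketch with a hypothetical function name; it skips the * wildcard and doesn't normalize case. If intl is installed, Locale::acceptFromHttp() is a ready-made alternative that returns the single best match.

```php
<?php
// Hypothetical helper: turn an Accept-Language header into locale names
// ordered by q-value, e.g. "de-DE,de;q=0.9,en;q=0.8" -> ["de_DE", "de", "en"].
function locales_from_accept_language(string $header): array
{
    $weighted = [];
    foreach (explode(',', $header) as $index => $part) {
        $pieces = explode(';', trim($part));
        $tag = trim($pieces[0]);
        if ($tag === '' || $tag === '*') {
            continue;
        }
        $q = 1.0; // default weight when no q parameter is given
        foreach (array_slice($pieces, 1) as $param) {
            $param = trim($param);
            if (stripos($param, 'q=') === 0) {
                $q = (float) substr($param, 2);
            }
        }
        if ($q <= 0) {
            continue; // q=0 means "not acceptable"
        }
        // setlocale() and most locale databases use "_" between language and region
        $weighted[] = [str_replace('-', '_', $tag), $q, $index];
    }
    // Highest q first; ties keep the order in which they appeared in the header
    usort($weighted, fn(array $a, array $b): int => [$b[1], $a[2]] <=> [$a[1], $b[2]]);
    return array_column($weighted, 0);
}
```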

Related

Unexpected behavior of the "Money" field in PostgreSQL

I started developing a web system on Ubuntu Linux, and at some point I had to do the following with a value of the "money" data type:
explode(" ", "R$ 3,000.00"); // [0] => "R$" and [1] => "3,000.00"
However, when I installed the software on Windows, I realized that the data is saved without the space, that is, "R$3,000.00". As a result, the code snippet no longer works properly.
Note 1: I could "fix" this using:
preg_replace('/R\$\s*/', 'R$ ', "R$3,000.00"); // "R$ 3,000.00"
but this is certainly not the best way.
Note 2: The PostgreSQL version used is 9.5.
Would anyone have any suggestions for resolving this?
Thank you very much.

The issue you are having is that lc_monetary does not have the same value on both computers. That is why you see "unexpected behavior" across the two operating systems.
You can change the lc_monetary locale with:
set lc_monetary to 'SOME_LOCALE';
Then test it with:
test=# SELECT 34.888::money;
money
--------
$34.89
(1 row)
Read more at https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-LC-MONETARY
If your application runs on different operating systems, it is wise to set the locale explicitly at the beginning of the connection or in the configuration.
On Mac/Linux you can see the available locales with locale -a. I am not sure about Windows.
If you don't generally use the currency symbol, you should definitely consider storing the number as a decimal instead.
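If the column has to stay money for now, a more tolerant PHP-side split than explode(" ") is a regular expression that allows an optional space after the symbol. split_money() is a hypothetical helper, and it assumes the symbol comes before the amount, as in the R$ and $ examples above. Alternatively, casting the column to numeric in the query (e.g. SELECT price::numeric, where price is a placeholder column name) returns a plain number that doesn't depend on lc_monetary.

```php
<?php
// Hypothetical helper: split PostgreSQL money output into [symbol, amount]
// whether or not lc_monetary puts a space after the symbol.
// Assumes the currency symbol precedes the amount; returns null otherwise.
function split_money(string $text): ?array
{
    if (preg_match('/^(\D*?)\s*([\d.,]+)$/u', trim($text), $m) !== 1) {
        return null;
    }
    return [$m[1], $m[2]];
}
```

Both split_money("R$ 3,000.00") and split_money("R$3,000.00") then yield ["R$", "3,000.00"].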

Trying to use gettext without country code ('es' instead of 'es_ES')

I have the following files in my PHP project:
libraries/locale/es_ES/LC_MESSAGES/messages.po
libraries/locale/es_ES/LC_MESSAGES/messages.mo
libraries/locale/es/LC_MESSAGES/messages.po
libraries/locale/es/LC_MESSAGES/messages.mo
Both are the same file edited with Poedit, differing only in Catalog -> Properties -> Language (es and es_ES respectively),
and this code is in the localization.php file:
$language = "es_ES.UTF-8";
putenv("LANG=$language");
setlocale(LC_ALL, $language);
bindtextdomain(STRING_DOMAIN, LOCALE_PATH);
textdomain(STRING_DOMAIN);
echo "Test translation: "._('string to translate');
This code works fine and 'string to translate' is displayed correctly. However if I try to use the generic 'es' code:
$language = "es.UTF-8";
...the string is not translated. This seems to be related to the locales installed on my Ubuntu system (es_ES.utf8 exists, but es.utf8 does not).
Can I force gettext to use the es.UTF-8 locale and its translation file?

As a workaround, you can always use localedef to compile a new locale based on an existing one. To create es based on es_ES.UTF-8:
localedef -i es_ES -f UTF-8 es.UTF-8
But this raises some important questions that go beyond simply loading translation files. Since locale-specific information such as date and time formats, measurements, etc. depends on the installed locales, it is always a good idea to have a plan for how you use locales. Assuming that you are using es_ES for Spanish (Spain), what is es? Is it intended for a specific flavor of Spanish, like es_IC (Canary Islands), or some Latin American flavor?
Here is a clarifying example: if I wanted to design a custom set of locales covering various flavors of Spanish, I would do it like this. First, I add the locales that are easily installable with locale-gen:
locale-gen es_ES.UTF-8 es_MX.UTF-8 es_AR.UTF-8
Then I would create es_US based on es_MX, and es_419 (419 is the UN M.49 area code for Latin America and the Caribbean) based on es_AR:
localedef -i es_MX -f UTF-8 es_US.UTF-8
localedef -i es_AR -f UTF-8 es_419.UTF-8
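On the PHP side, setlocale() also accepts an array of candidate names and returns the first one the system accepts (or false), which avoids hard-coding a locale that may not be installed. This is a sketch with a hypothetical helper name; it assumes, as in the question's code, that LANG should follow the chosen locale. GNU gettext generally falls back from a name like es_ES.UTF-8 to the plain es catalog directory when no es_ES catalog exists, so one es/LC_MESSAGES/messages.mo may serve several Spanish locales; it's worth verifying on your setup. Keep in mind that setlocale() affects the whole process.

```php
<?php
// Hypothetical helper: set the first locale from $candidates that the OS accepts.
// Returns the locale name that was set, or false if none of them is available.
function choose_locale(array $candidates)
{
    $chosen = setlocale(LC_ALL, $candidates);
    if ($chosen !== false) {
        // Keep LANG in sync, as the question's code does before bindtextdomain()
        putenv("LANG=$chosen");
    }
    return $chosen;
}

// Usage idea: prefer the generic locale, fall back to one that is installed.
// choose_locale(['es.UTF-8', 'es_ES.UTF-8', 'es_ES.utf8']);
```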

NumberFormatter::formatCurrency() ignores MIN_FRACTION_DIGITS

I want to use the NumberFormatter class from PHP's intl extension to display prices in a human-readable format. Our project needs the following:
The CLDR number pattern, and the currency and separator symbols will need to be configured through our code and not default to what Intl/ICU knows.
Our application will take care of the decimals. NumberFormatter should display any decimals that we pass on to it.
However, while trying out different configurations to find the exact combination that works for our project, I noticed some effects that I can't explain. The three formatters in the following code snippet are almost identical: compared with the first one, the second uses the U.S. dollar instead of the euro, and the third sets a currency symbol. The output of the first formatter is what I expected, but when I change the currency or set a currency symbol, the MIN_FRACTION_DIGITS attribute is ignored, and the symbol is never changed.
<?php
$fmt = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
$fmt->setAttribute(NumberFormatter::MIN_FRACTION_DIGITS, 4);
echo $fmt->formatCurrency(1234567890.891234567890000, "EUR")."\n";
// Outputs 1.234.567.890,8912 €
$fmt = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
$fmt->setAttribute(NumberFormatter::MIN_FRACTION_DIGITS, 4);
echo $fmt->formatCurrency(1234567890.891234567890000, "USD")."\n";
// Outputs 1.234.567.890,89 $
$fmt = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
$fmt->setAttribute(NumberFormatter::MIN_FRACTION_DIGITS, 4);
$fmt->setSymbol(\NumberFormatter::CURRENCY_SYMBOL, '%');
echo $fmt->formatCurrency(1234567890.891234567890000, "EUR")."\n";
// Outputs 1.234.567.890,89 €
?>
The first table row under General Purpose Numbers of the Unicode CLDR number pattern documentation describes that when parsing currency patterns, the two zeroes in the decimal part of the pattern will need to be replaced by however many digits the application thinks is appropriate. The application here is ICU (the C library that PHP uses for this), and the MIN_FRACTION_DIGITS attribute does its job of letting me override default behavior in the first example, but not in the second or the third.
Can someone please explain this seemingly random change in behavior? Let me know if there is any additional information that you need.

I just found the following:
https://bugs.php.net/bug.php?id=63140
http://bugs.icu-project.org/trac/ticket/7667
[2012-10-05 08:21 UTC] jpauli#email.com
I confirm this is an ICU bug in 4.4.x branch.
Consider upgrading libicu, 4.8.x gives correct result
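To find out which ICU version your intl extension is linked against, PHP exposes the INTL_ICU_VERSION constant. This sketch (icu_at_least() is a hypothetical helper) flags versions older than 4.8, the branch the bug report says gives the correct result; the threshold is taken from that comment, not from testing.

```php
<?php
// Hypothetical helper: is the given ICU version at least $minimum?
function icu_at_least(string $icuVersion, string $minimum = '4.8'): bool
{
    return version_compare($icuVersion, $minimum, '>=');
}

// Warn when intl is built against an ICU older than the reportedly fixed 4.8 branch
if (defined('INTL_ICU_VERSION') && !icu_at_least(INTL_ICU_VERSION)) {
    trigger_error('ICU ' . INTL_ICU_VERSION . ' may ignore MIN_FRACTION_DIGITS in formatCurrency()', E_USER_WARNING);
}
```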

Gettext() with larger texts

I'm using gettext() to translate some of the texts on my website. Mostly these are short texts/buttons like "Back", "Name", ...
// I18N support information here
$language = "en_US";
putenv("LANG=$language");
setlocale(LC_ALL, $language);
// Set the text domain as 'messages'
$domain = 'messages';
bindtextdomain($domain, "/opt/www/abc/web/www/lcl");
textdomain($domain);
echo gettext("Back");
My question is: how long can this text (the msgid) in echo gettext("") be?
Does it slow things down for long texts, or does it work just fine? For example:
echo _("LZ adfadffs is a VVV contributor who writes a weekly column for Cv00m. The former Hechinger Institute Fellow has had his commentary recognized by the Online News Association, the National Association of Black Journalists and the National ");

The official gettext documentation merely has this advice:
Translatable strings should be limited to one paragraph; don't let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn't see that (with the current translation tools), therefore she has to proofread the entire message.
There's no official limitation on the length of strings, and they can obviously exceed at least "one paragraph/10 lines".
There should be virtually no measurable performance penalty for long strings.

gettext effectively has a limit of 4096 characters on the length of strings.
When you exceed this limit, you get a warning:
Warning: gettext(): msgid passed too long in %s on line %d
and the function returns bool(false) instead of the text.
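A defensive wrapper can check the length before calling gettext() and fall back to the untranslated text. This is a sketch based on the 4096 limit described above; safe_gettext() and the constant name are hypothetical, and the limit is assumed to count bytes (strlen()). On newer PHP versions an overlong msgid may raise a ValueError instead of a warning, which this check also sidesteps.

```php
<?php
// Hypothetical constant: the msgid length limit described above (assumed bytes)
const GETTEXT_MAX_MSGID_LENGTH = 4096;

// Hypothetical wrapper: translate if possible, otherwise return the msgid as-is
function safe_gettext(string $msgid): string
{
    // Too long for gettext(), or the gettext extension is not loaded
    if (strlen($msgid) > GETTEXT_MAX_MSGID_LENGTH || !function_exists('gettext')) {
        return $msgid;
    }
    return gettext($msgid);
}
```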
Source:
PHP Interpreter repository - The real fix for the gettext overflow bug

The gettext function (http://www.php.net/manual/en/function.gettext.php) is defined as taking a string input, so your machine's memory would be the limiting factor.
Try benchmarking it with microtime(), or better with Xdebug if you have it on your development machine.
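As a rough starting point for the microtime() approach, a small timing helper could look like this; benchmark() is a hypothetical name, and the default iteration count is an arbitrary assumption. Micro-benchmark results vary between runs, so compare short and long msgids in the same process.

```php
<?php
// Hypothetical helper: call $fn($arg) $iterations times, return elapsed seconds
function benchmark(callable $fn, string $arg, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn($arg);
    }
    return microtime(true) - $start;
}

// Usage idea, after bindtextdomain()/textdomain() as in the question:
// $short = benchmark('gettext', 'Back');
// $long  = benchmark('gettext', str_repeat('Lorem ipsum ', 30));
```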

PHP PECL extension intl giving garbled results for Swedish ordinal numbers

I'm using the PECL intl module to localize dates and numbers in a PHP project. In all the other languages I'm using (40 of them), localizing ordinal numbers works fine. In Swedish, however, I get strange output: it appears to contain the template constants used to generate the ordinals.
$fnf = new NumberFormatter('sv_FI', NumberFormatter::ORDINAL);
echo $fnf->format(1);
and
$snf = new NumberFormatter('sv_SE', NumberFormatter::ORDINAL);
echo $snf->format(1);
Both return 1:e%digits-ordinal-neutre:0: 1:a instead of something like 1st or 1er.
My only guess, other than a bug, is that I'm missing some additional argument such as the gender of an associated verb.

If you output the rule-based number formatter's rules with $fnf->getPattern(), you get:
%digits-ordinal-masculine:
0: =#,##0==%%dord-mascabbrev=;
-x: −>%digits-ordinal-masculine>;
%%dord-mascabbrev:
0: :e%digits-ordinal-neutre:0: =%digits-ordinal-feminine=;
%digits-ordinal-reale:
0: =%digits-ordinal-feminine=;
%digits-ordinal-feminine:
0: =#,##0==%%dord-femabbrev=;
-x: −>%digits-ordinal-feminine>;
%%dord-femabbrev:
0: :e;
1: :a;
2: :a;
3: :e;
20: >%%dord-femabbrev>;
100: >%%dord-femabbrev>;
%digits-ordinal:
0: =%digits-ordinal-masculine=;
You can see that the private rule set %%dord-mascabbrev has only one rule, which produces that value:
:e%digits-ordinal-neutre:0: 1:a
This is what then gets output after your 1, as you describe in your question.
This is not a bug in PECL intl; the underlying rule, which is part of the ICU libraries, is malformed: the semicolon after :e is missing, so the next rule set's name, %digits-ordinal-neutre:, is read as part of the rule text. About three years ago the sv number formatter rules were fixed for missing semicolons; it looks like this one line slipped through.
These rules come into ICU from the CLDR (Common Locale Data Repository) at the Unicode Consortium. I opened a bug report there, because unless this is fixed in CLDR and then pulled into ICU, it can't work with the PHP intl extension.
The alternative might be to manually patch the ICU libraries (version 4.8) and then build the PECL package against your patched libraries.
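Short of patching ICU, a possible PHP-side workaround is to repair the rule text returned by getPattern() and build a rule-based formatter from it. This is an untested sketch: fix_sv_ordinal_rules() is a hypothetical helper that only re-inserts the missing semicolon after :e, and whether the rebuilt formatter picks the intended default rule set depends on your ICU version.

```php
<?php
// Hypothetical helper: re-insert the missing ";" after ":e" so that
// "%digits-ordinal-neutre:" starts its own rule set again.
// Rules that are already correct are returned unchanged.
function fix_sv_ordinal_rules(string $rules): string
{
    return preg_replace('/:e(%digits-ordinal-neutre:)/', ':e;' . "\n" . '$1', $rules);
}

// Usage idea (requires intl; untested against an affected ICU build):
// $fnf   = new NumberFormatter('sv_SE', NumberFormatter::ORDINAL);
// $fixed = new NumberFormatter('sv_SE', NumberFormatter::PATTERN_RULEBASED,
//                              fix_sv_ordinal_rules($fnf->getPattern()));
// echo $fixed->format(1);
```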
