NumberFormatter::SPELLOUT spellout-ordinal in russian and italian

NumberFormatter::SPELLOUT spellout-ordinal in russian and italian - php

this code works for english, spanish and german ordninal numbers, but with russian or italian ordninal numbers it doesn't work.
'ru-RU','it-IT' also don't work
I get for example in russian for 2 -> два (this is the cardinal number) , but I want the ordinal number and this would be here 2 -> второй.
I get for example in italian for 2 -> due (this is the cardinal number) , but I want the ordinal number and this would be here 2 -> secondo.
Update:
I found a solution with works in french, spain, german and some other languages:
maskuline ordinal numbers: %spellout-ordinal-maskuline
feminine ordinal numbers: %spellout-ordinal-feminine
russian and italian version doesn't work and I tried already with -maskuline/-feminine
$ru_ordinal = new NumberFormatter('ru', NumberFormatter::SPELLOUT);
$ru_ordinal->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%spellout-ordinal");

NumberFormatter is using ICU formatting.
As you can check here: http://saxonica.com/html/documentation/extensibility/config-extend/localizing/ICU-numbering-dates/ICU-numbering.html
... Russian (ru) has following formatting available:
spellout-cardinal-feminine (scf)
spellout-cardinal-masculine (scm)
spellout-cardinal-neuter (scne)
spellout-numbering (sn)
spellout-numbering-year (sny)
... and Italian (it):
spellout-cardinal-feminine (scf)
spellout-cardinal-masculine (scm)
spellout-numbering (sn)
spellout-numbering-year (sny)
spellout-ordinal-feminine (sof)
spellout-ordinal-masculine (som)
That is why you will not be able to set ordinal format for (ru) and following code:
$nFormat = new NumberFormatter('it', NumberFormatter::SPELLOUT);
$nFormat->setTextAttribute(NumberFormatter::DEFAULT_RULESET, "%spellout-ordinal-feminine");
var_dump($nFormat->format(42));
Will print:
string 'quarantaduesima' (length=17)
Like you (propably) want.
EDIT:
Informations about used formatting with references to ICU: http://php.net/manual/en/numberformatter.create.php
Tested with PHP 5.4.x and ICU version => 51.2; ICU Data version => 51.2.
You can use shell command:
$ php -i | grep ICU
To check what version of ICU you have.
For latest ICU version you should propably install/update php-intl package: http://php.net/manual/en/intl.installation.php
EDIT 2:
I have created extension for NumberFormatter (so far with polish ordinals). Feel free to contribute another languages: https://github.com/arius86/number-formatter

Just a Recommendation, I am not sure if this works or have an Apache services open at this point of time as I am at college, but have you tried to put ru-RU for Russia. In PHP I personally put my Language Codes as "en-GB"
http://download1.parallels.com/SiteBuilder/Windows/docs/3.2/en_US/sitebulder-3.2-win-sdk-localization-pack-creation-guide/30801.htm
Here is a List I found on the internet with some to help you.

Related

[Docker][PHP] NumberFormatter::formatCurrency incompatible between 7.4.29 and 7.4.28

(Same as https://github.com/docker-library/php/issues/1301 )
The results of running the following script were different between php:7.4.29-fpm-alpine and php:7.4.28-fpm-alpine.
<?php
$fmt = new \NumberFormatter('JA_JP', \NumberFormatter::CURRENCY);
$formatString = $fmt->formatCurrency(0, 'JPY');
var_dump($formatString);
Result
php:7.4.28-fpm-alpine (I expects)
string(4) "￥0"
php:7.4.29-fpm-alpine
string(3) "¥0"
(My) problem
Is there any way to get the results in php:7.4.29-fpm-alpine to be what they were in php:7.4.28-fpm-alpine ? (Is there a workaround?)
Way to reproduce
see https://github.com/sogaoh/reproduce-incompatibility-of-format-currency (README.md)
Remarks
( unconfirmed ) I'm guessing that there is a similar problem between
8.0.19 and 8.0.18
8.1.6 and 8.1.5

The difference between the two is that the .28 version uses \uffe5 'Fullwidth Yen Sign, and the .29 version uses \u00a5 'Yen Sign.
As noted in the github issue comment, it's likely a change in the underlying ICU library between builds of the image, as opposed to an issue in the image, Alpine, or PHP themselves. As an interim fix you could use a modified image that installs a pinned version of ICU, but YMMV.
Alternatively you could shim in a simple replacement like:
str_replace("\xc2\xa5", "\xef\xbf\xa5", $formatter_output);
Though the issue itself is fairly cosmetic, and IMO the original use of a fullwidth glyph for the currency symbol was the more questionable option.

My solution:
Install package: icu-data-full additionally.
refs
https://github.com/sogaoh/reproduce-incompatibility-of-format-currency/pull/2/files
https://github.com/docker-library/php/issues/1301#issuecomment-1139142261

Trying to use gettext without country code ('es' instead 'es_ES')

I have the next files in my php project:
libraries/locale/es_ES/LC_MESSAGES/messages.po
libraries/locale/es_ES/LC_MESSAGES/messages.mo
libraries/locale/es/LC_MESSAGES/messages.po
libraries/locale/es/LC_MESSAGES/messages.mo
Both are the same file edited with PoEdit just differenced by Catalog->Properties->Language (es and es_ES respectively)
And this code into localization.php file
$language = "es_ES.UTF-8";
putenv("LANG=$language");
setlocale(LC_ALL, $language);
bindtextdomain(STRING_DOMAIN, LOCALE_PATH);
textdomain(STRING_DOMAIN);
echo "Test translation: "._('string to translate');
This code works fine and 'string to translate' is displayed correctly. However if I try to use the generic 'es' code:
$language = "es.UTF-8";
...String is not translated. Seems to be related to locales installed in my ubuntu (es_ES.utf8 exists but not es.utf8)
Can I force gettext to use es.UTF-8 file?

As a workaround, you can always use localedef to compile a new locale based on an existing one. To create es based on es_ES.UTF-8:
localedef -i es_ES -f UTF-8 es.UTF8
But here comes some important questions which covers areas beyond simple loading of translation files. Since your locale-specific information like date and time formats, measurements, etc are depending on installed locales, it is always a good idea to have a plan regarding use of locales. Assuming that you are using es_ES for Spanish (Spain), what is es? Is it intended for a specific flavor of Spanish like es_IC (Canary Islands) or some Latin American flavor?
Here is a clarifying example; if I want to design a customized repertoire of locales to cover various flavors of Spanish, I will do it like this; first I add those locales which are easily installable with locale-gen:
locale-gen es_ES.UTF-8 es_MX.UTF-8 es_AR.UTF-8
Then I would like to have es_US based on es_MX and es_419 (419 is the geographical area code for Latin America) based on es_AR:
localedef -i es_MX -f UTF-8 es_US.UTF-8
localedef -i es_AR -f UTF-8 es_419.UTF-8

NumberFormatter::formatCurrency() ignores MIN_FRACTION_DIGITS

I want to use PHP's Intl's NumberFormatter class to display prices in a human-readable format. What our project needs:
The CLDR number pattern, and the currency and separator symbols will need to be configured through our code and not default to what Intl/ICU knows.
Our application will take care of the decimals. NumberFormatter should display any decimals that we pass on to it.
However, when playing around with different configurations to find the exact combination that works for our project, I noticed some effects that I can't explain. The three formatters in the following code snippet are almost identical. As opposed to the first one, the second one uses the euro instead of the U.S. dollar, and the third one has a currency sign set. The output of the first formatter is as I expected it to be, but when I change the currency or set a currency sign, the MIN_FRACTION_DIGITS attribute is ignored and the sign is never changed.
<?php
$fmt = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
$fmt->setAttribute(NumberFormatter::MIN_FRACTION_DIGITS, 4);
echo $fmt->formatCurrency(1234567890.891234567890000, "EUR")."\n";
// Outputs 1.234.567.890,8912 €
$fmt = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
$fmt->setAttribute(NumberFormatter::MIN_FRACTION_DIGITS, 4);
echo $fmt->formatCurrency(1234567890.891234567890000, "USD")."\n";
// Ouputs 1.234.567.890,89 $
$fmt = new NumberFormatter('de_DE', NumberFormatter::CURRENCY);
$fmt->setAttribute(NumberFormatter::MIN_FRACTION_DIGITS, 4);
$fmt->setSymbol(\NumberFormatter::CURRENCY_SYMBOL, '%');
echo $fmt->formatCurrency(1234567890.891234567890000, "EUR")."\n";
// Outputs 1.234.567.890,89 €
?>
The first table row under General Purpose Numbers of the Unicode CLDR number pattern documentation describes that when parsing currency patterns, the two zeroes in the decimal part of the pattern will need to be replaced by however many digits the application thinks is appropriate. The application here is ICU (the C library that PHP uses for this), and the MIN_FRACTION_DIGITS attribute does its job of letting me override default behavior in the first example, but not in the second or the third.
Can someone please explain this seemingly random change in behavior? Let me know if there is any additional information that you need.

I just found the following:
https://bugs.php.net/bug.php?id=63140
http://bugs.icu-project.org/trac/ticket/7667
[2012-10-05 08:21 UTC] jpauli#email.com
I confirm this is an ICU bug in 4.4.x branch.
Consider upgrading libicu, 4.8.x gives correct result

Get "list separator" character for any locale

Starting with only the locale identifier name (string) provided by clients, how or where do I look up the default "list separator" character for that locale?
The "list separator" setting is the character many different types of applications and programming languages may use as the default grouping character when joining or splitting strings and arrays. This is especially important for opening CSV files in spreadsheet programs. Though this is often the comma ",", this default character may be different depending on the machine's region settings. It may even differ between OS's.
I'm not interested in my own server environment here. Instead, I need to know more about the client's based off their locale identifier which they've given to me, so my own server settings are irrelevant. Also for this solution, I can not change the locale setting on this server to match a client's for the entire current process as a shortcut to look this value up.
If this is defined in the ICU library, I'm not able to find any way to look this value up using the INTL extension.
Any hints?

I am not sure if my answer will satisfy your requirements but I suggest (especially as you don't want to change the locale on the server) to use a function that will give you the answer:
To my knowledge (and also Wikipedia's it seems) the list separator in a CSV is a comma unless the decimal point of the locale is a comma, in that case the list separator is a semicolon.
So you could get a list of all locales that use a comma (Unicode U+002C) as separator using this command:
cd /usr/share/i18n/locales/
grep decimal_point.*2C *_* -l
and you could then take this list to determine the appropriate list separator:
function get_csv_list_separator($locale) {
$locales_with_comma_separator = "az_AZ be_BY bg_BG bs_BA ca_ES crh_UA cs_CZ da_DK de_AT de_BE de_DE de_LU el_CY el_GR es_AR es_BO es_CL es_CO es_CR es_EC es_ES es_PY es_UY es_VE et_EE eu_ES eu_ES#euro ff_SN fi_FI fr_BE fr_CA fr_FR fr_LU gl_ES hr_HR ht_HT hu_HU id_ID is_IS it_IT ka_GE kk_KZ ky_KG lt_LT lv_LV mg_MG mk_MK mn_MN nb_NO nl_AW nl_NL nn_NO pap_AN pl_PL pt_BR pt_PT ro_RO ru_RU ru_UA rw_RW se_NO sk_SK sl_SI sq_AL sq_MK sr_ME sr_RS sr_RS#latin sv_SE tg_TJ tr_TR tt_RU#iqtelif uk_UA vi_VN wo_SN");
if (stripos($locales_with_comma_separator, $locale) !== false) {
return ";";
}
return ",";
}
(the list of locales is taken from my own Debian machine, I don't know about the completeness of the list)
If you don't want to have this static list of locales (though I assume that this doesn't change that often), you can of course generate the list using the command above and cache it.
As a final note, according to RFC4180 section 2.6 the list separator actually never changes but rather fields containing a comma (so this also means floating numbers, depending on the locale) should be enclosed in double-quotes. Though (as linked above) not many people follow the RFC standard.

There's no such locale setting as "list separator" it might be software specific, but I doubt it's user specific.
However... You can detect user's locale and try to match the settings.
Get browsers locale: $accept_lang = $_SERVER['HTTP_ACCEPT_LANGUAGE']; this might contain a list of comma-separated values. Some browser don't send this though. more here...
Next you can use setlocale(LC_ALL, $accept_lang); and get available locale settings using $locale_info = localeconv(); more here...

PHP PECL extension intl giving garbled results for Swedish ordinal numbers

I'm using the PECL intl module to localize dates and numbers in a PHP project. In all other languages I'm using (40), localizing ordinal numbers works fine. In Swedish, however, I get strange output. It appears to be the template constants used to generate the ordinals.
$fnf = new NumberFormatter('sv_FI', NumberFormatter::ORDINAL);
echo $fnf->format(1);
and
$snf = new NumberFormatter('sv_SE', NumberFormatter::ORDINAL);
echo $snf->format(1);
Both return 1:e%digits-ordinal-neutre:0: 1:a vs. something like 1st or 1er.
My only guess, other than a bug, is that I'm missing some additional argument such as the gender of an associated verb.

If you output the rule based number formatters rules $fnf->getPattern():
%digits-ordinal-masculine:
0: =#,##0==%%dord-mascabbrev=;
-x: −>%digits-ordinal-masculine>;
%%dord-mascabbrev:
0: :e%digits-ordinal-neutre:0: =%digits-ordinal-feminine=;
%digits-ordinal-reale:
0: =%digits-ordinal-feminine=;
%digits-ordinal-feminine:
0: =#,##0==%%dord-femabbrev=;
-x: −>%digits-ordinal-feminine>;
%%dord-femabbrev:
0: :e;
1: :a;
2: :a;
3: :e;
20: >%%dord-femabbrev>;
100: >%%dord-femabbrev>;
%digits-ordinal:
0: =%digits-ordinal-masculine=;
You can see that the private rule set dord-mascabbrev only has one rule giving that value:
:e%digits-ordinal-neutre:0: 1:a
Which you will have then output after your 1, like you describe in your question.
This is not a bug in PECL INTL, but the underlying rule is malformatted which is part of the ICU Libraries (that rule there). About three years ago the sv number formatter rules were fixed for missing semicolons, it looks like that one line slipped through.
These rules are taken into ICU from the CLDR (Common Locale Data Repository) at the Unicode Consortium. I opened a bug report there, because unless this is fixed in CLDR, and then put into ICU, it can't work with the PHP INTL extension.
The alternative might be to manually patch the ICU libraries (version 4.8) and then build the PECL package against your patched libraries.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

NumberFormatter::SPELLOUT spellout-ordinal in russian and italian - php

Related

[Docker][PHP] NumberFormatter::formatCurrency incompatible between 7.4.29 and 7.4.28

Trying to use gettext without country code ('es' instead 'es_ES')

NumberFormatter::formatCurrency() ignores MIN_FRACTION_DIGITS

Get "list separator" character for any locale

PHP PECL extension intl giving garbled results for Swedish ordinal numbers

Categories

Resources