PHP numbers to words SPANISH! - php

Does anyone know of freely licensed PHP code that converts numbers to words in Spanish?
I need it at work for generating bills, so it has to be accurate.
My knowledge of Spanish is practically non-existent and I know nothing about Spanish grammar, so writing it myself would be hard.
Edit: I've written my own version. For now it only handles 3-digit numbers (on both sides of the decimal symbol), but it's a good start. If it ever grows (5 languages are planned at the moment), I'll probably share it on GitHub. Don't bet on it, though.

You may use PHP Intl extension to do that.
if (version_compare(PHP_VERSION, '5.3.0', '<')
|| !class_exists('NumberFormatter')) {
exit('You need PHP 5.3 or above, and php_intl extension');
}
$formatter = new \NumberFormatter('es', \NumberFormatter::SPELLOUT);
echo $formatter->format(1234567) . "\n";
Output:
un millón doscientos treinta y cuatro mil quinientos sesenta y siete
It works not only with Spanish but many other languages as well.
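To illustrate that last point, here is a minimal sketch that spells out the same number in a few locales. The helper name spell_out() and the list of locale codes are illustrative assumptions, not part of the Intl API; the exact wording of the output comes from the ICU data bundled with your PHP build.

```php
<?php
// Hypothetical helper (not part of the Intl API): spell out a number
// in the given locale using NumberFormatter's SPELLOUT style.
function spell_out(int $number, string $locale): string
{
    $formatter = new NumberFormatter($locale, NumberFormatter::SPELLOUT);
    $words = $formatter->format($number);
    if ($words === false) {
        throw new RuntimeException("Could not spell out $number for locale $locale");
    }
    return $words;
}

// The locale codes below are only examples.
foreach (['es', 'en', 'fr', 'de'] as $locale) {
    echo $locale, ': ', spell_out(42, $locale), "\n";
}
```

Like the snippet above, this needs the intl extension to be loaded.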

Is there an easy way to convert a number to a word in PHP?
From the above, you can derive the English word for the number and then translate it into any language you like, using one of the translation APIs out there or your own lookup tables.
Edit
Another way you could employ is simply hard coding in a PHP file or a text file a big array of values:
$numTranslation = array(1 => "uno", 2 => "dos");
Then, in your code, look the number up: echo $numTranslation[2]; prints the Spanish word for 2.
Edit 2
Just to make it a bit more complete, if you ever want to support multiple languages, and not just spanish:
$numTranslation = array(
1 => array("en" => "One", "es" => "uno"),
2 => array("en" => "Two", "es" => "dos")
);
And to print it out to the end user: echo $numTranslation[2]['es']; to get the Spanish equivalent...
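A flat lookup table only covers the numbers you type in. A minimal sketch, assuming you only need 0 to 99, is to keep the single-word numbers (0 to 29) and the tens in tables and compose the rest with "y". The function name spanish_under_100() is a hypothetical helper; larger numbers need further rules (cien/ciento, mil, millón) that this sketch does not handle.

```php
<?php
// Hypothetical helper: Spanish words for the integers 0..99 only.
function spanish_under_100(int $n): string
{
    // 0-29 are written as single words in Spanish.
    static $small = [
        'cero', 'uno', 'dos', 'tres', 'cuatro', 'cinco', 'seis', 'siete',
        'ocho', 'nueve', 'diez', 'once', 'doce', 'trece', 'catorce',
        'quince', 'dieciséis', 'diecisiete', 'dieciocho', 'diecinueve',
        'veinte', 'veintiuno', 'veintidós', 'veintitrés', 'veinticuatro',
        'veinticinco', 'veintiséis', 'veintisiete', 'veintiocho',
        'veintinueve',
    ];
    static $tens = [
        3 => 'treinta', 4 => 'cuarenta', 5 => 'cincuenta', 6 => 'sesenta',
        7 => 'setenta', 8 => 'ochenta', 9 => 'noventa',
    ];

    if ($n < 0 || $n > 99) {
        throw new InvalidArgumentException('Only 0..99 is supported in this sketch');
    }
    if ($n < 30) {
        return $small[$n];
    }
    $word = $tens[intdiv($n, 10)];
    $unit = $n % 10;
    // 31-99: tens word, then "y" and the unit, e.g. "sesenta y siete".
    return $unit === 0 ? $word : $word . ' y ' . $small[$unit];
}

echo spanish_under_100(67), "\n"; // sesenta y siete
echo spanish_under_100(21), "\n"; // veintiuno
```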

The only thing I could find is a Perl script.
The code itself is easy to write in PHP, and from the Perl script you can get the logic and the Spanish words for numbers.

Here are the words:
1 to 100
http://www.top-tour-of-spain.com/1-100-Numbers-In-Spanish.html
100+
http://www.intro2spanish.com/vocabulary/numbers/advanced.htm
or
http://www.donquijote.org/spanishlanguage/numbers/numbers2.asp
and I found some nicely formatted VBA here:
http://www.excelforum.com/excel-programming/637231-translating-numbers-into-words.html

Related

Enforce English only on PHP form submission

I would like the contact form on my website to only accept text submitted in English. I've been dealing with a lot of spam recently that has appeared in multiple languages that is slipping right past the CAPTCHA. There is simply no reason for anyone to submit this form in a language other than English since it's not a business and more of a hobby for personal use.
I've been looking through this documentation and was hopeful that something like preg_match( '/[\p{Latin}]/u', $input) might work, but I'm not bilingual and don't understand all the nuances of character encoding, so while this will help filter out something like Russian it still allows languages like Vietnamese to slip through.
Ideally I would like it to accept:
Any Unicode symbol that might be used. I have frequently come across different styles of dashes, apostrophes, or things related to math, for example.
Common diacritical marks / accented characters found in words like "résumé."
And I would like it to reject:
Anything that appears to be something other than English, or uncommon. I'm not overly concerned with accents such as "naïve" or in words borrowed from other languages.
I'm thinking of simply stripping all potentially valid characters as follows:
$input = 'testing for English only!';
// reference: https://en.wikipedia.org/wiki/List_of_Unicode_characters
// allowed punctuation
$basic_latin = '`~!@#$%^&*()-_=+[{]}\\|;:\'",<.>/?';
$input = str_replace(str_split($basic_latin), '', $input);
// allowed symbols and accents
$latin1_supplement = '¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿É×é÷';
$input = str_replace(str_split($latin1_supplement), '', $input);
$unicode_symbols = '–—―‗‘’‚‛“”„†‡•…‰′″‹›‼‾⁄⁊';
$input = str_replace(str_split($unicode_symbols), '', $input);
// remove all spaces including tabs and end lines
$input = preg_replace('/\s+/', '', $input);
// check that remaining characters are alpha-numeric
if (strlen($input) > 0 && ctype_alnum($input)) {
echo 'this is English';
} else {
echo 'no bueno señor';
}
However, I'm afraid there might be some perfectly common and valid exceptions that I'm unwittingly leaving out. I'm hoping that someone might be able to offer a more elegant solution or approach?
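One caveat with the stripping approach: str_split() works on bytes, so the multi-byte UTF-8 characters in the symbol lists are split into fragments before str_replace() sees them. A hedged alternative is a single Unicode-aware whitelist. The sketch below assumes that "English" letters means A-Z, a-z and the accented letters in U+00C0 to U+00FF; the function name and the allowed set are assumptions to tune against real submissions, and decomposed accents (a letter followed by a combining mark) would be rejected unless the input is normalized first.

```php
<?php
// Hypothetical helper: true when $input contains only characters that
// plausibly occur in English text.
function looks_like_english_charset(string $input): bool
{
    // \p{Common} covers digits, whitespace, punctuation and symbols
    // shared between scripts (dashes, curly quotes, math signs).
    // Letters are limited to A-Z, a-z and U+00C0-U+00FF, which admits
    // "résumé" and "naïve" but not Vietnamese letters such as U+1EBF.
    $allowed = '/\A[\p{Common}A-Za-z\x{00C0}-\x{00FF}]+\z/u';

    // Require at least one plain ASCII letter so that input made up
    // only of symbols or digits is rejected.
    return preg_match($allowed, $input) === 1
        && preg_match('/[A-Za-z]/', $input) === 1;
}

var_dump(looks_like_english_charset('testing for English only!')); // bool(true)
var_dump(looks_like_english_charset('Привет, мир'));               // bool(false)
var_dump(looks_like_english_charset('Tiếng Việt'));                // bool(false)
```

This only checks which characters are used, not whether the words are English, so Spanish or French text would still pass.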
There are no native PHP features that provide language recognition. There's an abandoned PEAR package and some classes floating around the web (I haven't tested them). If an external API is acceptable, Google's Translation API Basic can detect the language, with 500K characters per month free.
There is however a very simple solution to all this. We don't really need to know what language it is. All we need to know is whether it's reasonably valid English. And not Swahili or Klingon or Russian or Gibberish. Now, there is a convenient PHP extension for this: PSpell.
Here's a sample function you might use:
/**
* Spell Check Stats.
* Returns an array with OK, FAIL spell check counts and their ratio.
* Use the ratio to filter out undesirable (non-English/garbled) content.
*
* @updated 2022-12-29 00:00:29 +07:00
* @author @cmswares
* @ref https://stackoverflow.com/q/74910421/4630325
*
* @param string $text
*
* @return array
*/
function spell_check_stats(string $text): array
{
$stats = [
'ratio' => null,
'ok' => 0,
'fail' => 0
];
// Split into words
$words = preg_split('~[^\w\']+~', $text, -1, PREG_SPLIT_NO_EMPTY);
// New PSpell dictionary handle:
$pspeller = pspell_new("en");
// Check spelling and build stats
foreach($words as $word) {
if(pspell_check($pspeller, $word)) {
$stats['ok']++;
} else {
$stats['fail']++;
}
}
// Calculate ratio of OK to FAIL
$stats['ratio'] = match(true) {
$stats['fail'] === 0 => count($words), // no failures: avoid division by zero, use the word count
default => $stats['ok'] / $stats['fail'], // gives 0 when no word passes
};
return $stats;
}
Source at BitBucket. Function usage:
$stats = spell_check_stats('This starts in English, esto no se quiere, tätä ei haluta.');
// ratio: 0.7142857142857143, ok: 5, fail: 7
Then simply decide the threshold at which a submission is rejected. For example, if 20 words in 100 fail, that is an 80:20 split, i.e. a ratio of 4. The higher the ratio, the more (properly spelled) English the text contains.
The "ok" and "fail" counts are also returned in case you need to calibrate separately for very short strings. Run some tests on existing valid and spam content to see what sorts of figures you get, and then tune your rejection threshold accordingly.
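As a sketch of what such a threshold check might look like, the helper below works directly on the "ok" and "fail" counts, so it needs no special case for division by zero. The function name and both default thresholds are assumptions, to be calibrated on your own valid and spam samples.

```php
<?php
// Hypothetical helper: decide from the 'ok'/'fail' counts returned by
// spell_check_stats() whether a submission should be accepted.
function passes_spell_check(array $stats, float $maxFailShare = 0.2, int $minWords = 3): bool
{
    $total = $stats['ok'] + $stats['fail'];
    if ($total < $minWords) {
        // Too short to judge; this sketch lets such input through.
        return true;
    }
    // Share of misspelled words, e.g. 20 failures in 100 words = 0.2.
    return ($stats['fail'] / $total) <= $maxFailShare;
}

var_dump(passes_spell_check(['ok' => 80, 'fail' => 20])); // bool(true)
var_dump(passes_spell_check(['ok' => 5, 'fail' => 7]));   // bool(false)
```

Whether very short submissions should pass or fail by default is a policy choice; flip the early return if short spam is your main problem.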
The PSpell extension for PHP may not be installed by default on your server. On CentOS / RedHat, run yum install php-pspell aspell-en to install the PHP module (which pulls in Aspell as a dependency) together with an English dictionary. On other platforms, install the equivalents with your package manager.
For Windows and modern PHP, I can't find the extension dll, or a maintained Aspell port. Please share if you've found a solution. Would like to have this on my dev machine too.

Getting different output for same PHP code

(Can't paste the exact question as the contest is over and I am unable to access the question. Sorry.)
Hello, recently I took part in a programming contest (PHP). I tested the code on my PC and got the desired output but when I checked my code on the contest website and ideone, I got wrong output. This is the 2nd time the same thing has happened. Same PHP code but different output.
It takes its input from the command line. The purpose is to count the substrings that contain only the characters 'A','B','C','a','b','c'.
For example: Consider the string 'AaBbCc' as CLI input.
Substrings: A,a,B,b,C,c,Aa,AaB,AaBb,AaBbC,AaBbCc,aB,aBb,aBbC,aBbCc,Bb,BbC,BbCc,bC,bCc,Cc.
Total substrings: 21 which is the correct output.
My machine:
Windows 7 64 Bit
PHP 5.3.13 (Wamp Server)
Following is the code:
<?php
$stdin = fopen('php://stdin', 'r');
while(true) {
$t = fread($stdin,3);
$t = trim($t);
$t = (int)$t;
while($t--) {
$sLen=0;
$subStringsNum=0;
$searchString="";
$searchString = fread($stdin,20);
$sLen=strlen($searchString);
$sLen=strlen(trim($searchString));
for($i=0;$i<$sLen;$i++) {
for($j=$i;$j<$sLen;$j++) {
if(preg_match("/^[A-C]+$/i",substr($searchString,$i,$sLen-$j))) {$subStringsNum++;}
}
}
echo $subStringsNum."\n";
}
die;
}
?>
Input:
2
AaBbCc
XxYyZz
Correct Output (My PC):
21
0
Ideone/Contest Website Output:
20
0
You have to keep in mind that your code is also processing the newline symbols.
On Windows systems, a newline consists of two characters, written in escaped form as \r\n.
On UNIX systems, including Linux and modern macOS, only \n is used; classic Mac OS (before OS X) used \r instead.
Since you are reading from standard input, your script is exposed to those platform differences: fread() hands you the line-ending bytes as part of the data. With fread($stdin,3), the count line "2\r\n" fills all three bytes on Windows, whereas with UNIX newlines the third byte is already the 'A' of the first string, so that letter is lost. Opening the handle with "rb" instead of "r" would state explicitly that you want binary-safe reads, but it would not remove those bytes.
You can see in this Ideone.com version of your code that the script gives the expected output there when you feed it the newline characters used by your home system, while this other version, using UNIX newlines, gives the "wrong" output.
I suggest using fgets() to read each line separately instead of fread(), and then trim() each one to remove those characters before processing.
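A minimal sketch of that suggestion, assuming the input format from the question (a count on the first line, then one string per line). The function names are hypothetical helpers, and the demo feeds an in-memory stream so it can be run anywhere; in the contest you would pass STDIN instead.

```php
<?php
// Hypothetical helper: count the substrings of $s that consist only of
// the letters A, B, C in either case.
function count_abc_substrings(string $s): int
{
    $count = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        for ($j = $i; $j < $len; $j++) {
            if (preg_match('/\A[A-C]+\z/i', substr($s, $i, $j - $i + 1))) {
                $count++;
            }
        }
    }
    return $count;
}

// Hypothetical helper: read the number of test cases, then one string
// per line. trim() strips a trailing "\n" or "\r\n" alike.
function solve($stream): array
{
    $results = [];
    $cases = (int) trim((string) fgets($stream));
    while ($cases-- > 0 && ($line = fgets($stream)) !== false) {
        $results[] = count_abc_substrings(trim($line));
    }
    return $results;
}

// Demo with Windows line endings.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "2\r\nAaBbCc\r\nXxYyZz\r\n");
rewind($stream);
echo implode("\n", solve($stream)), "\n"; // prints 21, then 0
```

Because each line is read whole and trimmed, the result no longer depends on how many bytes the line ending occupies.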
I tried to analyse this code, and this is what I found:
There seem to be no problems with the input strings. If there were, it would be impossible to get 20 as a result.
I don't see any problem with the loops. I usually use pre-increment, but that shouldn't affect the result at all.
I see only 2 possible causes of the unexpected result:
One loop iteration isn't executed. It could only be the last inner one (when $i == 5 and $j == 5, because that loop runs just once), which would account for the difference between 21 and 20.
preg_match() fails to match in one of its calls (there are 21 preg_match() checks, and one of them, possibly the last, doesn't match).
If I had to choose, I would go for the first cause. If I were you, I would contact the contest's authors and ask them which PHP version they run and whether you can test other code there. The most important thing here is how many times preg_match() is called at all, 20 or 21 (a simple echo or an extra counter would tell us), and which strings it checks. In my opinion, that is the only way to find out why this code doesn't work.
It would be nice if you could post an update here when you find out more.
PS. Of course I also get 21, so it's hard to say what could be wrong.

How to parse strings - detailed explanation and information on syntax

I would like to parse a string of data in a shell script with a simple one-line expression, but I do not know how it is done or where to find information describing it. All the examples I can find look like illegal math equations, and I cannot find any documentation describing how they work.
First, what exactly is this form of parsing called so I know what I am talking about and what to search for. Secondly, where can I find what it all means so I can learn how to use it correctly and not just copy some one else's work with little understanding of how it works.
/\.(\w+)/*.[0-9]/'s/" /"\n/g;s/=/\n/gp
I recall learning about this in perl a couple decades ago, but have long since forgotten what it all means. I have spent days searching for information on what this all means. All I can find are specific examples with no explanations of what it is technically called and how it works!
I want to separate each field then extract the key name and numerical data in a shell script. I realize some forms of parsing are done differently in shell scripts as opposed to php or perl scripts. But I need to learn the parsing syntax used to filter out the specific data sets that I could use in both, shell and php.
Currently I need to parse a single line of data from a file in a shell script for a set of conditionals required by other support scripts.
#!/bin/sh
Line=`cat ./dump.txt`
#Line = "V:12.46 A:3.427 AV:6.08 D:57.32 S:LOAD CT:45.00 P:42.71 AH:2016.80"
# for each field parse data ("/[A-Z]:[0-9]/}" < $Line)
# $val[$1] = $2
# $val["V"] = "12.46"
# $val["AV"] = "6.08"
if $val["V"] < 11.4
then
~/controls/stop.sh
else
~/controls/start.sh
fi
if $val["AV"] > 10.7
then
echo $val["AV"] > ./source.txt
else
echo "DOWN" > ./source.txt
fi
I need to identify and separate the difference between "V:" and "AV:".
In PHP I can use foreach and explode() to fill an array. But I am tired of writing half a page of code for something that can be done in a single line. I need to learn a simpler and more efficient way to parse data from a string and extract it into a usable variable.
$Line = file_get_contents("./dump.txt");
$field = explode (' ' , $Line);
foreach($field as $arg)
{
$val = explode (':' , $arg);
$data[$val[0]] = $val[1];
}
# $data["V"] = "12.46"
# $data["AV"] = "6.08"
A quick shell example is much appreciated, but I really need to know "HOW TO" do this my self. Please give me some links or search criteria to find the definitions and syntax to these parsing expressions.
Thank you in advance for your help.
The parsing patterns you're talking about are commonly referred to as regular expressions or regex.
For php you can find a lot of helpful information from http://au1.php.net/manual/en/book.pcre.php
Regex is quite hard, especially for complex expressions, so I usually search for an online regex tester, preferably one that highlights what is being matched. JavaScript-based ones are especially handy because the results are instant and the syntax is largely the same as PHP's (PCRE).
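As a short illustration in PHP, the key:value line from the question can be split with a single preg_match_all() call. This is a sketch using the sample line from the question; the pattern is one reasonable choice, not the only one.

```php
<?php
// Sample line taken from the question.
$line = 'V:12.46 A:3.427 AV:6.08 D:57.32 S:LOAD CT:45.00 P:42.71 AH:2016.80';

// (\w+) captures the key, ':' is the delimiter, (\S+) captures the
// value up to the next whitespace.
preg_match_all('/(\w+):(\S+)/', $line, $matches);

// $matches[1] holds all keys, $matches[2] all values.
$data = array_combine($matches[1], $matches[2]);

echo $data['V'], "\n";  // 12.46
echo $data['AV'], "\n"; // 6.08

// Numeric comparison as in the question's pseudocode.
echo (float) $data['V'] < 11.4 ? "stop\n" : "start\n"; // start
```

Because \w+ is greedy and matches do not overlap, "AV:" is captured as the key AV and is never confused with "V:".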
Special thanks to James T for leading me in the right direction.
After reading up on regular expressions I have figured out the search pattern I need. Also included is a brief script to test the output. Since Bash arithmetic only handles integers, the values need to be converted to whole numbers. The number of decimal places is always fixed at 2 or 3, so the conversion is easy: just drop the decimal point. The order in which the fields are recorded is also constant, so they are always read in the same order.
The regular expression that fits the search for each of the first 4 fields is:
\w+:([0-9]+)\.([0-9]+)\s
( ) = the items to search/parse; using 2 searches for each data set "V:12.46"
\w = for the word search and the " + " means any 1 or more letters
: = for the delimiter
( -search set 1:
[0-9] = search any numbers and the " + " means any 1 or more digits
) -end search set 1
\. = for the decimal point in the data
( -search set 2:
[0-9] = search any numbers and the " + " means any 1 or more ( second set after the decimal)
) -end search set 2
\s = white space (blank space)
Now duplicate the search 3 times for the first 3 fields, giving me 6 variables.
\w+:([0-9]+)\.([0-9]+)\s\w+:([0-9]+)\.([0-9]+)\s\w+:([0-9]+)\.([0-9]+)\s
And here is a simple script to test the output:
#!/bin/bash
Line="V:13.53 A:7.990 AV:13.65 D:100.00 S:BulkCharge CT:35.00 P:108.11 AH:2116.20"
regex="\w+:([0-9]+)\.([0-9]+)\s\w+:([0-9]+)\.([0-9]+)\s\w+:([0-9]+)\.([0-9]+)\s"
if [[ $Line =~ $regex ]]; then
echo "match found in $Line"
i=1
n=${#BASH_REMATCH[*]}
while [[ $i -lt $n ]]
do
echo " capture[$i]: ${BASH_REMATCH[$i]}"
let i++
done
Volt=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
Amp=${BASH_REMATCH[3]}${BASH_REMATCH[4]}
AVG=${BASH_REMATCH[5]}${BASH_REMATCH[6]}
else
echo "$Line does not match"
fi
if [ $Volt -gt 1200 ]
then
echo "Voltage is $Volt"
fi
resulting with an output of:
match found in V:13.53 A:7.990 AV:13.65 D:100.00 S:BulkCharge CT:35.00 P:108.11 AH:2116.20
capture[1]: 13
capture[2]: 53
capture[3]: 7
capture[4]: 990
capture[5]: 13
capture[6]: 65
Voltage is 1353

Gettext() with larger texts

I'm using gettext() to translate some of my texts in my website. Mostly these are short texts/buttons like "Back", "Name",...
// I18N support information here
$language = "en_US";
putenv("LANG=$language");
setlocale(LC_ALL, $language);
// Set the text domain as 'messages'
$domain = 'messages';
bindtextdomain($domain, "/opt/www/abc/web/www/lcl");
textdomain($domain);
echo gettext("Back");
My question is, how 'long' can this text (id) be in the echo gettext("") part ?
Is it slowing down the process for long texts? Or does it work just fine too? Like this for example:
echo _("LZ adfadffs is a VVV contributor who writes a weekly column for Cv00m. The former Hechinger Institute Fellow has had his commentary recognized by the Online News Association, the National Association of Black Journalists and the National ");
The official gettext documentation merely has this advice:
Translatable strings should be limited to one paragraph; don't let a single message be longer than ten lines. The reason is that when the translatable string changes, the translator is faced with the task of updating the entire translated string. Maybe only a single word will have changed in the English string, but the translator doesn't see that (with the current translation tools), therefore she has to proofread the entire message.
There's no official limitation on the length of strings, and they can obviously exceed at least "one paragraph/10 lines".
There should be virtually no measurable performance penalty for long strings.
gettext in PHP effectively has a limit of 4096 characters on the length of the msgid.
If you exceed this limit, you get a warning:
Warning: gettext(): msgid passed too long in %s on line %d
and the call returns bool(false) instead of the text.
Source:
PHP Interpreter repository - The real fix for the gettext overflow bug
function gettext http://www.php.net/manual/en/function.gettext.php
The argument is declared as a string, so your machine's memory would be the limiting factor.
Try benchmarking it with microtime(), or better, with Xdebug if you have it on your development machine.
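A minimal microtime() benchmark along those lines might look as follows. The helper name time_per_call(), the iteration count and the sample strings are assumptions for illustration; when the gettext extension is missing, the sketch falls back to an identity function and then only measures call overhead.

```php
<?php
// Hypothetical helper: average seconds per call of $fn over $iterations runs.
function time_per_call(callable $fn, int $iterations = 10000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (microtime(true) - $start) / $iterations;
}

// Use gettext() when available, otherwise an identity function so the
// sketch still runs.
$translate = function_exists('gettext')
    ? 'gettext'
    : static fn(string $s): string => $s;

$short = 'Back';
// 1850 characters, well below the 4096-character msgid limit.
$long = str_repeat('A fairly long translatable sentence. ', 50);

printf("short: %.9f s/call\n", time_per_call(fn() => $translate($short)));
printf("long:  %.9f s/call\n", time_per_call(fn() => $translate($long)));
```

Run it several times and compare the two figures; single runs are noisy.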

Sorting Katakana names

If I have a list of Katakana names what is the best way to sort them?
Also, is it more common to sort names by {first name}{last name} or by {last name}{first name}?
Another question: how do we get the Hiragana representation of the first character of a Katakana name, the way the iPhone's sorted contact list does?
Thanks.
In Japan it is common (if not expected) that a person's first name appear after their surname when written: {last} {first}. But this would also depend on the context. In a less formal context it would be acceptable for a name to appear {first} {last}.
http://en.wikipedia.org/wiki/Japanese_name
Not that it matters, but why would the names of individuals be written in Katakana and not in the traditional Kanji?
I think it's
sort($array,SORT_LOCALE_STRING);
Provide more information if it's not your case
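If relying on the system locale is a concern, one option worth trying is the Collator class from PHP's intl extension, which takes its rules from ICU rather than from the operating system. The sketch below sorts a few made-up Katakana names and derives a Hiragana index character for the first letter with mb_convert_kana() (mbstring extension). The sample names and the helper name index_kana() are assumptions for illustration.

```php
<?php
// Sample names are made up for illustration.
$names = ['タナカ ケン', 'アベ ヒロシ', 'サトウ ユミ', 'カトウ アイ'];

// Collator uses ICU's collation rules for the requested locale.
$collator = new Collator('ja_JP');
$collator->sort($names);
print_r($names);

// Hypothetical helper: Hiragana form of the first character, usable as
// an index heading. Mode 'c' converts full-width Katakana to
// full-width Hiragana.
function index_kana(string $name): string
{
    return mb_convert_kana(mb_substr($name, 0, 1, 'UTF-8'), 'c', 'UTF-8');
}

echo index_kana('タナカ ケン'), "\n"; // た
```

Names written in Kanji would still need their readings from another source before they can be sorted this way.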
This answer talks about using the system locale to sort Unicode strings in PHP. Besides threading issues, it is also dependent on your vendor having supplied you with a correct locale for what you want to use. I’ve had so much trouble with that particular issue that I’ve given up using vendor locales altogether.
If you’re worried about different pronunciations of Unihan ideographs, then you probably need access to the Unihan database — or its moral equivalent. A smaller subset may suffice.
For example, I know that in Perl, the JIS X 0208 standard is used when the Japanese "ja" locale is selected in the constructor for Unicode::Collate::Locale. This doesn’t depend on the system locale, so you can rely on it.
I’ve also had good luck in Perl with Lingua::JA::Romanize::Japanese, as that’s somewhat friendlier to use than accessing Unicode::Unihan directly.
Back to PHP. This article observes that you can’t get PHP to sort Japanese correctly.
I’ve taken his set of strings and run it through Perl’s sort, and I indeed get a different answer than he gets. If I use the default or English locale, I get in Perl what he gets in PHP. But if I use the Japanese locale for the collation module — which has nothing to do with the system locale and is completely thread-safe — then I get a rather different result. Watch:
JA Sort                          = EN Sort
------------------------------------------------------------
Java                               Java
NVIDIA                             NVIDIA
Windows ファイウォール             Windows ファイウォール
インターネット オプション          インターネット オプション
キーボード                         キーボード
システム                           システム
タスク                             タスク
フォント                           フォント
プログラムの追加と削除             プログラムの追加と削除
マウス                             マウス
メール                             メール
音声認識                         ! 地域と言語オプション
画面                             ! 日付と時刻
管理ツール                       ! 画面
自動更新                         ! 管理ツール
地域と言語オプション             ! 自動更新
電源オプション                     電源オプション
電話とモデムのオプション           電話とモデムのオプション
日付と時刻                       ! 音声認識
I don’t know whether this will help you at all, because I don’t know how to get at the Perl bits from PHP (can you?), but here is the program that generates that. It uses a couple of non-standard modules installed from CPAN to do its business.
#!/usr/bin/env perl
#
# jsort - demo showing how Perl sorts Japanese in a
# different way than PHP does.
#
# Data taken from http://www.localizingjapan.com/blog/2011/02/13/sorting-in-japanese-—-an-unsolved-problem/
#
# Program by Tom Christiansen <tchrist@perl.com>
# Saturday, April 9th, 2011
use utf8;
use 5.10.1;
use strict;
use autodie;
use warnings;
use open qw[ :std :utf8 ];
use Unicode::Collate::Locale;
use Unicode::GCString;
binmode(DATA, ":utf8");
my @data = <DATA>;
chomp @data;
my $ja_sorter = new Unicode::Collate::Locale locale => "ja";
my $en_sorter = new Unicode::Collate::Locale locale => "en";
my @en_data = $en_sorter->sort(@data);
my @ja_data = $ja_sorter->sort(@data);
my $gap = 8;
my $width = 0;
for my $datum (@data) {
my $columns = width($datum);
$width = $columns if $columns > $width;
}
my $bar = "-" x ( 2 + 2 * $width + $gap );
$width = -($width + $gap);
say justify($width => "JA Sort"), "= ", "EN Sort";
say $bar;
for my $i ( 0 .. $#data ) {
my $same = $ja_data[$i] eq $en_data[$i] ? " " : "!";
say justify($width => $ja_data[$i]), $same, " ", $en_data[$i];
}
sub justify {
my($len, $str) = @_;
my $alen = abs($len);
my $cols = width($str);
my $spacing = ($alen > $cols) && " " x ($alen - $cols);
return ($len < 0)
? $str . $spacing
: $spacing . $str
}
sub width {
return 0 unless @_;
my $str = shift();
return 0 unless length $str;
return Unicode::GCString->new($str)->columns;
}
__END__
システム
画面
Windows ファイウォール
インターネット オプション
キーボード
メール
音声認識
管理ツール
自動更新
日付と時刻
タスク
プログラムの追加と削除
フォント
電源オプション
マウス
地域と言語オプション
電話とモデムのオプション
Java
NVIDIA
Hope this helps. It shows that it is, at least theoretically, possible.
EDIT
This answer from How can I use Perl libraries from PHP? references this PHP package to do that for you. So if you don’t find a PHP library with the needed Japanese sorting stuff, you should be able to use the Perl module. The only one you need is Unicode::Collate::Locale. It comes standard as of release 5.14 (really 5.13.4, but that’s a devel version), but you can always install it from CPAN if you have an earlier version of Perl.
