I just posted this question about 5 minutes ago and I forgot to mention that the format was like this
"$2,090.99 "
I need the final value like
"209099"
That is, stripping the trailing space and getting rid of any other punctuation in the money value with PHP, so I can store it in a MySQL DECIMAL(10,2) column.
You can use a regular expression to replace everything that is not a digit:
$output = preg_replace('/\D/', '', $str);
\D is equivalent to [^\d] that is equivalent to [^0-9].
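A minimal sketch of that replacement applied to the value from the question. The second step is an assumption that goes beyond what was asked: it presumes the last two digits are always cents and puts the decimal point back, since a DECIMAL(10,2) column would normally hold 2090.99 rather than 209099.

```php
<?php
// Input taken from the question, including the trailing space.
$str = '$2,090.99 ';

// Remove every character that is not a digit.
$digits = preg_replace('/\D/', '', $str);

// Assumption: the last two digits are the cents, so re-insert the
// decimal point before them to get a DECIMAL(10,2)-style value.
$decimal = substr($digits, 0, -2) . '.' . substr($digits, -2);

echo $digits, "\n";  // 209099
echo $decimal, "\n"; // 2090.99
```

If the input can come without cents (for example "$2,090"), the second step would give the wrong value, so it needs a check first.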
You might be better off using PHP 5.3's MessageFormatter, Locale and Intl classes if you'll be handling different locales and currency formats. The msgfmt_parse() method might just be what you need.
I need to change the decimal separator in a given string that has numbers in it.
What RegEx code can ONLY select the thousands separator character in the string?
It needs to be selected only when there are digits around it. For example, only in 123,456 do I need to select and replace the ,
I'm converting English numbers into Persian (e.g. Hello 123 becomes Hello ۱۲۳). Now I need to replace the decimal separator with the Persian version too, but I don't know how to select it with regex. E.g. Hello 121,534 must become Hello ۱۲۱/۵۳۴
The character that needs to be replaced is , with /
Use a regular expression with lookarounds.
$new_string = preg_replace('/(?<=\d),(?=\d)/', '/', $string);
(?<=\d) means there has to be a digit before the comma, (?=\d) means there has to be a digit after it. But since these are lookarounds, they're not included in the match, so they don't get replaced.
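Here is a small sketch that combines the lookaround replacement with the digit conversion from the question. The digit mapping with str_replace and the variable names are my own assumptions for illustration. The separator is replaced first, while the digits are still ASCII, because \d without the u modifier only matches 0-9.

```php
<?php
// Hypothetical lookup tables for the digit conversion.
$asciiDigits   = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'];
$persianDigits = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹'];

$string = 'Hello, world 121,534';

// Step 1: replace only commas that sit between two digits.
$withSlash = preg_replace('/(?<=\d),(?=\d)/', '/', $string);

// Step 2: convert the ASCII digits to Persian digits.
$persian = str_replace($asciiDigits, $persianDigits, $withSlash);

echo $withSlash, "\n"; // Hello, world 121/534
echo $persian, "\n";
```

The comma after "Hello" is left alone because there is no digit on either side of it.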
As I read your question, the main problem you face is converting English numbers into Persian ones.
PHP has a library that can format and parse numbers according to a locale. You can find it in the NumberFormatter class (part of the intl extension), which uses the Unicode Common Locale Data Repository (CLDR) to handle, in the end, all languages known to the world.
Converting the number 123,456 from en_GB (or en_US) to fa_IR is shown in this little example:
$string = '123,456';
$float = (new NumberFormatter('en_GB', NumberFormatter::DECIMAL))->parse($string);
var_dump(
(new NumberFormatter('fa_IR', NumberFormatter::DECIMAL))->format($float)
);
Output:
string(14) "۱۲۳٬۴۵۶"
(play with it on 3v4l.org)
This shows, roughly, how to convert the number. I'm not well versed in Persian, so please excuse me if I picked the wrong locale. There may also be options to choose which character is used for grouping, but the point of the example is that number conversion is already handled by existing libraries. You don't need to reinvent this, and it would be close to insane for a single person to try.
With the conversion clarified, the question remains how to apply it to a whole text. Why not locate all the candidate places, try to parse each match, and convert it to the other locale if (and only if) parsing succeeds?
Luckily, the NumberFormatter::parse() method returns false if parsing fails (more detailed error reporting is available if you need it), so this is workable.
The regular expression only needs a pattern that matches a number (the longest match wins), and the replacement can be done in a callback. In the following example the translation is deliberately verbose so that the actual parsing and formatting is more visible:
# some text
$buffer = <<<TEXT
it need to only select , when there is number around it. for example only
when 123,456 i need to select and replace "," I'm converting English
numbers into Persian (e.g: "Hello 123" becomes "Hello ۱۲۳"). now I need to
replace the Decimal separator with Persian version too. but I don't know how
I can select it with regex. e.g: "Hello 121,534" most become
"Hello ۱۲۱/۵۳۴" The character that needs to be replaced is , with /
TEXT;
# prepare formatters
$inFormat = new NumberFormatter('en_GB', NumberFormatter::DECIMAL);
$outFormat = new NumberFormatter('fa_IR', NumberFormatter::DECIMAL);
$bufferWithFarsiNumbers = preg_replace_callback(
'(\b[1-9]\d{0,2}(?:[ ,.]\d{3})*\b)u',
function (array $matches) use ($inFormat, $outFormat) {
[$number] = $matches;
$result = $inFormat->parse($number);
if (false === $result) {
return $number;
}
return sprintf("< %s (%.4f) = %s >", $number, $result, $outFormat->format($result));
},
$buffer
);
echo $bufferWithFarsiNumbers;
Output:
it need to only select , when there is number around it. for example only
when < 123,456 (123456.0000) = ۱۲۳٬۴۵۶ > i need to select and replace "," I'm converting English
numbers into Persian (e.g: "Hello < 123 (123.0000) = ۱۲۳ >" becomes "Hello ۱۲۳"). now I need to
replace the Decimal separator with Persian version too. but I don't know how
I can select it with regex. e.g: "Hello < 121,534 (121534.0000) = ۱۲۱٬۵۳۴ >" most become
"Hello ۱۲۱/۵۳۴" The character that needs to be replaced is , with /
The only magic here is bringing the string matching and the number conversion together with preg_replace_callback. The regular expression pattern should match the needs in your question, and it is relatively easy to refine because you define the whole number part yourself, while false positives are filtered out by the NumberFormatter class:
                                 pattern for Unicode UTF-8 strings
                                 |
(\b[1-9]\d{0,2}(?:[ ,.]\d{3})*\b)u
 |                  |         |
 |         grouping character |
 |                            |
 word boundary ---------------+
(play with it on regex101.com)
Edit:
To match only the same grouping character across multiple thousands blocks, capture it in a named group and use a back-reference to that group for the repetition (a subroutine call such as (?&grouping_char) would re-run the character class and accept a different separator each time):
(\b[1-9]\d{0,2}(?:(?<grouping_char>[ ,.])\d{3}(?:\k<grouping_char>\d{3})*)?\b)u
(this gets harder to read; decipher it and play with it on regex101.com)
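A quick check of how the two ways of referring to the named group behave. In PCRE, \k<grouping_char> is a back-reference (it must match the same text the group captured), while (?&grouping_char) is a subroutine call (it re-runs the group's pattern). The sketch below anchors the patterns with ^ and $ and uses ~ as the delimiter purely to keep the test strings simple. Those two choices are mine, not part of the answer.

```php
<?php
// Back-reference: every later separator must equal the first one.
$backref = '~^[1-9]\d{0,2}(?:(?<grouping_char>[ ,.])\d{3}(?:\k<grouping_char>\d{3})*)?$~u';

// Subroutine call: every later separator may be any of [ ,.] again.
$subroutine = '~^[1-9]\d{0,2}(?:(?<grouping_char>[ ,.])\d{3}(?:(?&grouping_char)\d{3})*)?$~u';

$sameSep  = preg_match($backref, '1,234,567');    // 1: same separator twice
$mixedSep = preg_match($backref, '1,234.567');    // 0: separator changes
$mixedSub = preg_match($subroutine, '1,234.567'); // 1: mixed separators accepted

var_dump($sameSep, $mixedSep, $mixedSub);
```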
To finalize the answer, only the return clause needs to be condensed to return $outFormat->format($result); and the $outFormat NumberFormatter might need some more configuration but as it is available in the closure, this can be done when it is created.
(play with it on 3v4l.org)
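For readers without the intl extension, here is a rough sketch of the same callback structure with a hand-written converter standing in for NumberFormatter. The function toPersianNumber() is hypothetical and only maps ASCII digits and the comma, so it is far less robust than the library. It is only meant to show where the conversion plugs into preg_replace_callback.

```php
<?php
// Hypothetical stand-in for NumberFormatter: maps ASCII digits to Persian
// digits and the comma to U+066C (ARABIC THOUSANDS SEPARATOR).
function toPersianNumber(string $number): string
{
    $search  = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ','];
    $replace = ['۰', '۱', '۲', '۳', '۴', '۵', '۶', '۷', '۸', '۹', '٬'];

    return str_replace($search, $replace, $number);
}

$buffer = 'Hello 121,534 and 7 items';

// Same idea as above: the pattern finds number candidates and the
// callback decides what each one becomes.
$converted = preg_replace_callback(
    '(\b[1-9]\d{0,2}(?:,\d{3})*\b)u',
    function (array $matches): string {
        return toPersianNumber($matches[0]);
    },
    $buffer
);

echo $converted, "\n";
```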
I hope this helps and gives a broader picture: don't look for a solution only at the spot where you hit a wall. Regex alone is usually not the answer. I'm sure there are regex enthusiasts who can give you a fairly stable one-liner, but the context in which it is used will not be very stable. I'm not saying there is only one answer. Rather, splitting the problem into separate levels (divide and conquer) lets you rely on a stable number conversion even while you are still unsure how to write a regex pattern for an English number.
You can write a regex to capture numbers with thousand separator, and then aggregate the two numeric parts with the separator you want :
$text = "Hello, world, 121,534" ;
$pattern = "/([0-9]{1,3}),([0-9]{3})/" ;
$new_text = preg_replace($pattern, "$1X$2", $text); // replace the comma with 'X', keep the captured groups intact
echo $new_text ; // Hello, world, 121X534
In PHP you can do that using str_replace
$a="Hello 123,456";
echo str_replace(",", "X", $a);
This will return: Hello 123X456
This is my code to preg_match an amount that looks like this: $ 99.00, and it works
if (preg_match_all('/[$]\s\d+(\.\d+)?/', $tout, $matches)) {
    $tot2 = $matches[0];
    $tot2 = preg_replace("/\\\$/", '', $tot2);
}
I need to do the same thing for an amount that looks like this (with a comma): $ 99,00
Thank you for your help (changing the dot to a comma does not help; there is an "escape" thing I do not understand...).
Ideally I need to preg_match any number that looks like an amount, with a dot or a comma and with or without a dollar sign before or after it (I know, it's a lot to ask :) ), since the result form I want to scan also contains phone and street numbers...
UPDATE (for some reason I cannot comment on replies): To test properly, I need to preg_replace the comma with a dot (since we are dealing with sums, I don't think calculations can be done on numbers with commas in them).
So to clarify my question: I need to transform, let's say, "$ 200,24" into "200.24" (amounts could be between 0.10 and 1000.99):
$tot2 = preg_replace("/\\\$/", '', $tot2);
(this code just deals with the $ (it works), I need adaptation to deal also with the change of (,) for (.))
No, using , in place of \. works perfectly fine.
It's just that your input does not contain a space between dollar sign and amount $ 99,00 like your .-using source did.
Make the \s optional.
How about:
$str='$ 200,24';
echo str_replace(array('$',',',' '), array('','.',''), $str);
output:
200.24
Replace the \. with a character class [.,], which matches either a dot (.) or a comma (,):
'/[$]\s\d+([.,]\d+)?/'
edit: comment is correct, regex fixed.
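Putting the suggestions from these answers together, a possible sketch: the space after the dollar sign is optional, the separator may be a dot or a comma, and the comma is normalized to a dot afterwards. The sample text in $tout and the capture group around the number are my own additions for illustration.

```php
<?php
// Hypothetical scanned text with both amount styles and a phone number.
$tout = 'Total: $ 99,00 plus $200.24 shipping, call 555-1234';

$amounts = array();

// [$] matches a literal dollar sign, \s? makes the space optional,
// and [.,] accepts either decimal separator.
if (preg_match_all('/[$]\s?(\d+(?:[.,]\d+)?)/', $tout, $matches)) {
    foreach ($matches[1] as $raw) {
        // Normalize the decimal comma to a dot so the value can be used in sums.
        $amounts[] = str_replace(',', '.', $raw);
    }
}

print_r($amounts); // 99.00 and 200.24
```

Because the pattern requires the dollar sign, the phone number is not picked up. Matching amounts without any currency sign would need some other way to tell them apart from phone and street numbers.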
I'm trying to write a PHP function that validates names with a regex. Names may contain any number of spaces, ' and - characters. Only a capital letter is allowed after a space, but either a capital or a non-capital letter is allowed after - and '. The total length should be 50 characters and the name should end with a lowercase letter. Note that the uppercase letters are A to Z plus these characters:
ÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ
and the lowercase letters are a to z plus these characters:
éçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß
Each word (the text between one space, ' or - and the next) must be at least 2 characters long. The name must also start with an uppercase letter and end with a lowercase letter, and within a word no uppercase letter is allowed except the first one.
Examples of acceptable names are :
Adam Klsld
Adam'odskdl
Adam'Ddlsl
Ùdam-ddkkdk
Addssd-Ddsdsd
I've tried a lot, but here is my latest attempt, which is still in my PHP file; the others were deleted in the chaos of unsuccessful attempts (I'm using the mb_ereg function to match, so this is a POSIX ERE):
([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((^[\'\-\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)*
(this is not necessarily my best attempt, but I thought it might help and give an idea of how much of a dork I am)
I wouldn't exactly suggest you use this... but I think this does what you want?
^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['\-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$
Here it is in a non-code block so you can see how insane it is... think it strips some characters here though:
^([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+){1}((([\s])[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ][a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+)|((['-])([A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]|[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß])[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]+))*$
Is this Regex answering what you need to check ?
(You'll have to add the weird characters inside each brackets of course).
You can use this to avoid accented characters issue:
$pattern = "~^[\p{Lu}ß]\p{Ll}*+(?>(?> [\p{Lu}ß]|['-]\p{L})\p{Ll}*+)*$~u";
if(preg_match($pattern, $name)) { ...
Or for a more specific set of characters:
$pattern = "~(?(DEFINE)(?<Up>[A-ZÙÒÌÈÀÁÉÍßÓÚÝÂÊÎÔÛÃÑÕÄÅÆŒÇÐØËÏÖÜŸ]))
(?(DEFINE)(?<Lo>[a-zéçàèàèìòùáéíóúýâêîôûãñõäëïöüÿåæœçðøß]))
^\g<Up>\g<Lo>*+(?>(?>\h\g<Up>|['-]\g<Up>?+\g<Lo>)\g<Lo>*+)*+$~ux";
if (preg_match($pattern, $name, $matches)) { ...
or the same in a shorter way:
$pattern = "~(?(DEFINE)(?<Up>[A-ZÀ-ÖØ-ݟߌ]))
(?(DEFINE)(?<Lo>[a-zà-öø-ýÿßœ]))
^\g<Up>\g<Lo>*+(?>(?>\h\g<Up>|['-]\g<Up>?+\g<Lo>)\g<Lo>*+)*+$~ux";
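To try the Unicode-property version against the sample names, something like the following could be used. The function name isValidName() and the separate length check are my assumptions: the 50-character rule is read here as "at most 50", and it is enforced with a second preg_match rather than inside the main pattern. Note that this pattern does not enforce the two-characters-per-word rule from the question.

```php
<?php
// Hypothetical helper wrapping the \p{Lu}/\p{Ll} pattern from this answer.
function isValidName(string $name): bool
{
    $pattern = '~^[\p{Lu}ß]\p{Ll}*+(?>(?> [\p{Lu}ß]|[\'-]\p{L})\p{Ll}*+)*$~u';

    // Assumption: "50 characters" means at most 50 code points.
    if (preg_match('~^.{1,50}$~u', $name) !== 1) {
        return false;
    }

    return preg_match($pattern, $name) === 1;
}

$samples = array('Adam Klsld', "Adam'odskdl", "Adam'Ddlsl", 'Ùdam-ddkkdk', 'Addssd-Ddsdsd');
foreach ($samples as $sample) {
    var_dump(isValidName($sample)); // bool(true) for each sample name
}

var_dump(isValidName('adam'));       // bool(false): lowercase first letter
var_dump(isValidName('Adam klsld')); // bool(false): lowercase after a space
```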
I have the following example strings:
The price is $54.00 including delivery
On sale for £12.99 until December
European pricing €54.76 excluding UK
From each of them I want to return only the price and the currency symbol:
$54.00
£12.99
€54.76
My thought process is to have an array of currency symbols, search the string for each one, and then capture just the characters up to the next space after it. However, $ 67.00 would then fail.
So, can I run through an array of preset currency symbols, then explode the string and cut it at the next non-numeric character that is not a . or a , (or maybe do it with regex)?
Is this possible?
In regex, \p{Currency_Symbol} or \p{Sc} represent any currency symbol.
However, PHP supports only the shorthand form \p{Sc} and /u modifier is required.
Using regex pattern
/\p{Sc}\s*\d[.,\d]*(?<=\d)/u
you will be able to match for example:
$1,234
£12.3
€ 5,345.01
If you want to use . as a decimal separator and , as a thousands delimiter, then go with
/\p{Sc}\s*\d{1,3}(?:,\d{3})*(?:\.\d+)?/u
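A short sketch of the second pattern run over the strings from the question, plus the "$ 67.00" case that worried the asker. The array and variable names are only illustrative.

```php
<?php
// \p{Sc} matches any Unicode currency symbol; the u modifier is required.
$pattern = '/\p{Sc}\s*\d{1,3}(?:,\d{3})*(?:\.\d+)?/u';

$strings = array(
    'The price is $54.00 including delivery',
    'On sale for £12.99 until December',
    'European pricing €54.76 excluding UK',
    'Was $ 67.00 before',
);

$prices = array();
foreach ($strings as $string) {
    if (preg_match($pattern, $string, $match)) {
        $prices[] = $match[0];
    }
}

print_r($prices); // $54.00, £12.99, €54.76, $ 67.00
```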
You could go for something like this:
preg_match('/(?:\$|€|£)\s*[\d,.-]+/', $input, $match);
And then find your currency and price inside $match.
Of course, you can generate that first part from an array of currency symbols. Just don't forget to escape everything:
$escapedCurrency = array_map("preg_quote", $currencyArray);
$pattern = '/(?:' . implode("|", $escapedCurrency) . ')\s*[\d,.-]+/';
preg_match($pattern, $input, $match);
Some possible improvement to the end of the pattern (the actual number):
(?:\$|€|£)\s*\d+(?:[.,](?:-|\d+))?
That will make sure that there is only one . or , followed by either - or only digits (in case your intention was to allow an international decimal separator).
If you only want to allow the comma to separate thousands, you could go for this:
(?:\$|€|£)\s*\d{1,3}(?:,\d{3})*(?:\.(?:-|\d+))?
This will match the longest "correctly" formatted number (e.g. $ 1,234.4567,123.456 -> $ 1,234.4567 or € 123,456789.12 -> € 123,456). It really depends on how accurate you want to be.
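As a rough end-to-end sketch of the array-based approach, the symbol list can be escaped, joined, and combined with the thousands-separator pattern. The contents of $currencyArray and the sample inputs are assumptions. preg_quote is given the delimiter as its second argument so that a symbol containing "/" would also be escaped.

```php
<?php
// Hypothetical list of supported currency symbols.
$currencyArray = array('$', '€', '£');

// Escape each symbol for use inside a pattern delimited by "/".
$escapedCurrency = array_map(function ($symbol) {
    return preg_quote($symbol, '/');
}, $currencyArray);

$pattern = '/(?:' . implode('|', $escapedCurrency) . ')\s*\d{1,3}(?:,\d{3})*(?:\.(?:-|\d+))?/';

$inputs = array(
    'The price is $54.00 including delivery',
    'Only $ 67.00 today',
    'European pricing € 123,456789.12',
);

$found = array();
foreach ($inputs as $input) {
    if (preg_match($pattern, $input, $match)) {
        $found[] = $match[0];
    }
}

print_r($found); // $54.00, $ 67.00, € 123,456
```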
I'm using PHP 5.3 to receive a Dataset from a web service call that brings back information on one or many transactions. Each transaction's return values are delimited by a pipe (|), and beginning/ending of a transaction is delimited by a space.
2109695|49658|25446|4|NSF|2010-11-24 13:34:00Z 2110314|45276|26311|4|NSF|2010-11-24 13:34:00Z 2110311|52117|26308|4|NSF|2010-11-24 13:34:00Z (etc)
Doing a simple split on space doesn't work because of the space in the datetime stamp. I know regex well enough to know that there are always different ways to break this down, so I thought getting a few expert opinions would help me come up with the most airtight regex.
If each timestamp is going to have a Z at the end you can use positive lookbehind assertion to split on space only if it's preceded by a Z as:
$transaction = preg_split('/(?<=Z) /',$input);
Once you get the transactions, you can split them on | to get the individual parts.
Note that if your data has a Z followed by a space anywhere other than in the timestamp, the above logic will fail. To overcome that, you can split on a space only if it's preceded by a timestamp pattern:
$transaction = preg_split('/(?<=\d\d:\d\d:\d\dZ) /',$input);
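A sketch of both steps together, using two records from the question: first split into transactions on the space after the timestamp, then split each transaction on the pipe. The variable names are illustrative, and array() is used instead of [] so it also runs on PHP 5.3.

```php
<?php
$input = '2109695|49658|25446|4|NSF|2010-11-24 13:34:00Z '
       . '2110314|45276|26311|4|NSF|2010-11-24 13:34:00Z';

// Split only on a space that directly follows a hh:mm:ssZ time.
$transactions = preg_split('/(?<=\d\d:\d\d:\d\dZ) /', $input);

// Break every transaction into its pipe-delimited fields.
$rows = array();
foreach ($transactions as $transaction) {
    $rows[] = explode('|', $transaction);
}

print_r($rows[1]); // fields of the second transaction
```

The space inside the timestamp itself is untouched because it follows the date, not the Z.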
As others have said, if you know for sure that there will be no Z characters anywhere other than in the date, you could just do:
$records = explode('Z', $data);
But if you have them elsewhere, you'll need to do something a bit fancier.
$regex = '#(?<=\d{2}:\d{2}:\d{2}Z)\s#i';
$records = preg_split($regex, $data, -1, PREG_SPLIT_NO_EMPTY);
Basically, that regex looks for the time portion (00:00:00) followed by a Z, then splits on the white-space character that follows.
Each timestamp is going to have a Z at the end, so explode it by 'Z '. You don't need a regular expression. There's no chance that the date part has a Z after it, only the time.
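One thing to watch with this approach: explode() consumes the delimiter, so every record except the last loses its trailing Z. A minimal sketch, assuming you want the Z kept on each record, could put it back like this:

```php
<?php
$data = '2109695|49658|25446|4|NSF|2010-11-24 13:34:00Z '
      . '2110314|45276|26311|4|NSF|2010-11-24 13:34:00Z';

// Split on "Z " without a regular expression.
$records = explode('Z ', $data);

// The delimiter is gone from all but the last record, so normalize:
// strip a trailing Z if present, then append exactly one.
foreach ($records as $i => $record) {
    $records[$i] = rtrim($record, 'Z') . 'Z';
}

print_r($records);
```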
Use explode('|', $data) function