How to remove letters with a preceding full stop using PHP regex?

One of my scripts has the following function, which returns a float value from a text value that includes a currency code.
function preformatFloat($value, $decimal_point = '.')
{
if ($decimal_point != '.' && strpos($value, $decimal_point)) {
$value = str_replace('.', '~', $value);
$value = str_replace($decimal_point, '.', $value);
}
return (float)preg_replace('/[^0-9\-\.]/', '', $value);
}
If I run this function:
echo preformatFloat("Rs.990.00",".");
I get 0.99, but I want 990.00.
I have tried modifying the regex in many ways, with no luck.
Can you please help with this? Thank you.
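Where the 0.99 comes from: the character-class filter keeps every ".", including the one after "Rs", and the float cast then stops at the second full stop. A small sketch reproducing the two steps on the question's input:

```php
<?php
// Reproduce the two steps of preformatFloat("Rs.990.00", ".").
$value = "Rs.990.00";
// Everything except digits, "-" and "." is removed, so the "." after "Rs" survives
$stripped = preg_replace('/[^0-9\-\.]/', '', $value);
var_dump($stripped);         // string(7) ".990.00"
// A float cast reads the longest valid numeric prefix, ".990", and ignores the rest
var_dump((float) $stripped); // float(0.99)
```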

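Try this (it assumes the currency code is followed by a full stop, as in "PKR." or "Rs."):
function preformatFloat($string)
{
    // Position of the first full stop, i.e. the one after the currency code
    $precedingFullStop = strpos($string, ".");
    // Keep only what follows it; leave the string alone if there is no full stop
    $newString = $precedingFullStop === false ? $string : substr($string, $precedingFullStop + 1);
    // Remove everything except digits, "-" and "."
    return (float)preg_replace('/[^0-9\-\.]/', '', $newString);
}
echo preformatFloat("PKR.200.09"); // 200.09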

Here is a different approach using a RegExp match. It tries to find the first valid (float) number and ignores anything else.
function parseFloatInput(
$value,
$decimal_separator = '.',
$thousand_separator = ','
) {
$pattern = '(
# define named templates for the separators,
# escape variable values for pattern
(?(DEFINE)(?<dp>'.preg_quote($decimal_separator).'))
(?(DEFINE)(?<tp>'.preg_quote($thousand_separator).'))
(
# at least one digit
(?:\d+)
# any repeat: a thousand separator followed by 3 digits
(?:(?&tp)?\d{3})*
# one or no: decimal separator followed by at least one digit
(?:(?&dp)?\d+)?
)
)x';
if (preg_match($pattern, $value, $match)) {
return (float)str_replace(
[$thousand_separator, $decimal_separator],
['', '.'],
$match[0]
);
}
return NAN;
}
var_dump(parseFloatInput("Rs.990.99","."));
var_dump(parseFloatInput("Rs.1,990.99","."));
var_dump(parseFloatInput("Rs.990.99",","));
var_dump(parseFloatInput("Rs.",","));
Output:
float(990.99)
float(1990.99)
float(990)
float(NAN)
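Notes
/ is not the only possible delimiter: any non-alphanumeric, non-backslash, non-whitespace character works. Bracket pairs such as () are special because the same characters can still be used, properly nested, inside the pattern, so they do not conflict with it. That's why I prefer using (), reading it as group 0.
The x modifier enables extended syntax: whitespace outside character classes is ignored and # starts a comment, so the pattern can be formatted and commented.
The separators are defined as named templates in (?(DEFINE)...) blocks and called with (?&dp) and (?&tp), to improve the readability of the actual matching part.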
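The delimiter and template notes can be checked in isolation. The patterns below are illustrative only (not taken from the answer): the first pair shows that () delimiters still allow balanced parentheses inside the pattern, the second shows x, a DEFINE template and a (?&name) call together.

```php
<?php
// Same pattern with / and with ( ) as delimiters; the inner groups stay balanced.
$slash = '/(-?)(\d+)/';
$paren = '((-?)(\d+))';
preg_match($slash, 'Rs.-42', $a);
preg_match($paren, 'Rs.-42', $b);
var_dump($a === $b); // bool(true)

// x modifier: whitespace is ignored and # starts a comment.
// The DEFINE block declares a named template that matches nothing by itself;
// (?&digits) calls it like a subroutine.
$pattern = '(
    (?(DEFINE)(?<digits>\d+))   # template only
    (?&digits) , (?&digits)     # spaces around the comma are ignored
)x';
preg_match($pattern, 'Rs.1,990', $m);
var_dump($m[0]); // string(5) "1,990"
```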

Related

getting int value from comma separated number php

How do I turn a thousand-comma separated string representation of an integer into an integer value in PHP? (is there a general way to do it for other separators too?)
e.g. 1,000 -> 1000
Edit (thanks @ghost): ideally, decimals should be handled, but I could accept a solution that truncates at a decimal point.
If that's as simple as it gets, you could use filter_var():
$number = '1,000';
$number = (int) filter_var($number, FILTER_SANITIZE_NUMBER_INT);
var_dump($number);
Or, if the number may contain decimals:
$number = '1,000.5669';
$number = (float) str_replace(',', '', $number);
var_dump($number);
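FILTER_SANITIZE_NUMBER_INT keeps only digits, + and -, so it also drops the decimal point; that is why the second variant uses str_replace. If you prefer filter_var for decimals too, FILTER_SANITIZE_NUMBER_FLOAT with FILTER_FLAG_ALLOW_FRACTION keeps the ".". A small sketch comparing the two:

```php
<?php
// FILTER_SANITIZE_NUMBER_INT keeps only digits, "+" and "-": the "." disappears too.
$int = filter_var('1,000.5669', FILTER_SANITIZE_NUMBER_INT);
var_dump($int); // string(8) "10005669"

// FILTER_SANITIZE_NUMBER_FLOAT with FILTER_FLAG_ALLOW_FRACTION also keeps the ".",
// while the thousands comma is still removed.
$float = (float) filter_var('1,000.5669', FILTER_SANITIZE_NUMBER_FLOAT, FILTER_FLAG_ALLOW_FRACTION);
var_dump($float); // float(1000.5669)
```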
You can strip a specific character using str_replace and cast the result to an integer with (int) (or intval()). A regular expression can also be used to check whether the input string is formatted correctly. Here is what that code might look like:
<?php
function remove_delimiters_simple($string, $delimiter = ',') {
// Removes all instances of the specified delimiter and cast as an integer
// Comma (,) is the default delimiter
return (int) str_replace($delimiter, '', $string);
}
function remove_delimiters_advanced($string, $delimiter = ',') {
// Use preg_quote in case our delimiter is '/' for some reason
// The regular expression should match validly formatted numbers using a delimiter
// every 3 characters
$valid_format_expression = sprintf(
'/^\d{1,3}(%s\d{3})*$/',
preg_quote($delimiter, '/')
);
// If not a validly formatted number, return null
if (! preg_match($valid_format_expression, $string)) {
return null;
}
// Otherwise, return the simple value
return remove_delimiters_simple($string, $delimiter);
}
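The question also asks about other separators and about decimals. Below is a sketch extending the same validate-then-strip idea, assuming both the thousands separator and the decimal separator are known up front and that the number is unsigned; the function name parse_delimited_number is hypothetical.

```php
<?php
// Hypothetical helper: validate and convert a number such as "1.000,50"
// given its thousands separator and its decimal separator (unsigned numbers only).
function parse_delimited_number($string, $thousands = ',', $decimal = '.') {
    // 1-3 digits, then groups of exactly three digits, then an optional fraction
    $pattern = sprintf(
        '/^\d{1,3}(?:%s\d{3})*(?:%s\d+)?$/',
        preg_quote($thousands, '/'),
        preg_quote($decimal, '/')
    );
    if (!preg_match($pattern, $string)) {
        return null; // not a validly formatted number
    }
    // Remove the thousands separators first, then turn the decimal separator into "."
    $normalised = str_replace(array($thousands, $decimal), array('', '.'), $string);
    return (float) $normalised;
}

var_dump(parse_delimited_number('1,000'));              // float(1000)
var_dump(parse_delimited_number('1.000,50', '.', ',')); // float(1000.5)
var_dump(parse_delimited_number('10,00'));              // NULL
```

The order inside str_replace matters: thousands separators are removed before the decimal separator is rewritten, so a "." used for grouping is never confused with the new decimal point.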
If using PHP >= 5.3 with the intl extension, you could use numfmt_create(), like:
$fmt = numfmt_create( 'en_US', NumberFormatter::DECIMAL );
$num = "1,000";
echo numfmt_parse($fmt, $num, NumberFormatter::TYPE_INT32); //gives 1000
Note: en_US is the locale the number was formatted in, and the locale passed to numfmt_create() must match it. In nl_NL, for example, "," is the decimal separator, so "1,000" would not be read as 1000.

regex to trim down subdomain in the url

I have a regexp that matches values like: wiseman.google.com.jp, me.co.uk, paradise.museum, abcd-abc.net, www.google.jp, 12345-daswe-23dswe-dswedsswe-54eddss.info, del.icio.us, jo.ggi.ng; all of these come from a textarea value.
I used this regexp (in preg_match_all($regex1, $str, $match)) to get the above values: /(?:[a-zA-Z0-9]{2,}\.)?[-a-zA-Z0-9]{2,}\.[a-zA-Z0-9]{2,7}(?:\.[-a-zA-Z0-9]{2,3})?/
Now my question is: how can I make the regexp trim "wiseman.google.com.jp" down to "google.com.jp" and "www.google.jp" down to "google.jp"?
I am thinking of calling a second preg_match($regex2, $str, $match) on each value returned by preg_match_all.
I have tried this regexp as $regex2: ([-a-zA-Z0-9\x{0080}-\x{00FF}]{2,}+)\.[a-zA-Z0-9\x{0080}-\x{00FF}]{2,7}(?:\.[-a-zA-Z0-9\x{0080}-\x{00FF}]{2,3})? but it doesn't work.
Any input? Thanks in advance.
Here is my little solution:
preg_match_all($regex, $str, $matches, PREG_PATTERN_ORDER);
foreach ($matches[0] as $dom) {
    // count how many "." are present in the domain
    $dotCount = substr_count($dom, '.');
    if ($dotCount == 3) {
        // e.g. wiseman.google.com.jp: keep everything after the first "."
        $pos = strpos($dom, '.');
        echo substr($dom, $pos + 1);
    } elseif ($dotCount == 2) {
        if (strpos($dom, 'www.') === 0) {
            // e.g. www.google.jp: strip the leading "www."
            echo substr($dom, 4);
        } else {
            // e.g. me.co.uk, del.icio.us
            echo $dom;
        }
    } elseif ($dotCount == 1) {
        echo $dom;
    }
    echo "<br>";
}
(Caution: this answer will only fit your needs if you HAVE to use regex or you're somewhat... desperate...)
What you want to achieve isn't possible with general rules due to domains like .com.jp or .co.uk.
The only general rule one can find is:
When read from right to left there are one or two TLDs followed by one second level domain
Thus, we have to whitelist all available TLDs. I'll call the following the "domain-kraken".
Release the kraken!
([a-z0-9\-]{2,63}(?:\.(?:a(?:cademy|ero|rpa|sia|[cdefgilmnoqrstuwxz])|b(?:ike
|iz|uilders|uzz|[abdefghijlmnoqrstvwyz])|c(?:ab|amera|amp|areers|at|enter|eo
|lothing|odes|offee|om(?:pany|puter)?|onstruction|ontractors|oop|
[acdfghiklmnoruvwxyz])|d(?:iamonds|irectory|omains|[ejkmoz])|e(?:du(?:cation)?
|mail|nterprises|quipment|state|[ceghrstu])|f(?:arm|lorist|[ijkmor])|g(?:allery|
lass|raphics|uru|[abdefghlmnpqrstuwy])|h(?:ol(?:dings|iday)|ouse|[kmnrtu])|
i(?:mmobilien|n(?:fo|stitute|ternational)|[delmnoqrst])|j(?:obs|[emop])|
k(?:aufen|i(?:tchen|wi)|[eghimnprwxyz])|l(?:and|i(?:ghting|mo)|[abcikrstuvy])|
m(?:anagement|enu|il|obi|useum|[acdefghklmnopqrstuvwxyz])|n(?:ame|et|inja|
[acefgilopruz])|o(?:m|nl|rg)|p(?:hoto(?:graphy|s)|lumbing|ost|ro|[aefghklmnrstwy])|
r(?:e(?:cipes|pair)|uhr|[eosuw])|s(?:exy|hoes|ingles|ol(?:ar|utions)|upport|
ystems|[abcdeghijklmnorstuvxyz])|t(?:attoo|echnology|el|ips|oday|
[cdfghjklmnoprtvwz])|u(?:no|[agkmsyz])|v(?:entures|iajes|oyage|[aceginu])|
w(?:ang|ien|[fs])|xxx|y(?:[et])|z(?:[amw]))){1,2})$
Use it together with the i and m flags.
This assumes your data is on multiple lines.
If your data is separated by commas instead, change the last character in the regex ($) to ,? and use the i flag; PHP has no g flag, so use preg_match_all() to get every match.
Demos are available on regex101 and debuggex.
(Both of the demos have an explanation: regex101 describes it with text while debuggex visualizes the beast)
A list of available TLDs can be found at iana.org; the TLDs used in the regex are as of January 2014.
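The right-to-left rule above can also be applied without a regex, by splitting the host on "." and checking the last labels against a whitelist of two-label suffixes. A sketch under that assumption; registrable_domain is a hypothetical name, and the suffix list is a tiny illustrative sample (a real one, such as the Public Suffix List, is far longer):

```php
<?php
// Hypothetical helper. $twoLabelSuffixes is a tiny sample for illustration only.
function registrable_domain($host, array $twoLabelSuffixes = array('co.uk', 'com.jp'))
{
    $labels = explode('.', strtolower($host));
    $count = count($labels);
    if ($count < 2) {
        return strtolower($host); // nothing to trim
    }
    // Read from the right: is the suffix made of two labels (e.g. com.jp)?
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $suffixLabels = in_array($lastTwo, $twoLabelSuffixes, true) ? 2 : 1;
    // Keep the suffix plus exactly one more label (the second-level domain), if present
    $keep = min($count, $suffixLabels + 1);
    return implode('.', array_slice($labels, -$keep));
}

echo registrable_domain('wiseman.google.com.jp'), "\n"; // google.com.jp
echo registrable_domain('www.google.jp'), "\n";         // google.jp
```

With this sample list, del.icio.us becomes icio.us, since us is treated as a single-label suffix.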

preg_match to capture string part after a special character

I have text files with strings, and I need to split each string and capture each part of it.
The string is like:
Joao.Martins.G2R71.Pedro.Feliz.sno
Where: NAME is the 1st player (first name only, or first name + surname); G = game (can be 2, 02 or any other number below 99); R = result (in this example the home team wins 7x1); then the NAME of the 2nd player; the last 3 chars are the game type (in this example snooker).
But the string can also be:
Joao Martins |2x71| Pedro Feliz.poo
I'm no regex expert (sadly). I have already searched lots of questions here without finding a solution, or even getting help just by reading the answers to other questions (mainly because I never seem to understand this).
I already have this:
preg_match("/\[(|^|]+)\]/",$string,$result);
echo $result[1] . "<br />";
But this only gives me everything between the | | as one piece, without separating it, and ignores everything else.
Can you guys help me with a solution for both cases? I'm as usual completely lost here!
Thanks in advance!
explode way:
You don't have to use a complex regexp; a simple explode is enough.
$parts = explode('.', $string);
$parts now has either 6 elements (dotted format) or 2 (spaced format), so you can do:
if (count($parts) == 6) {
    list($firstName1, $surName1, $score, $firstName2, $surName2, $gameType) = $parts;
} elseif (count($parts) == 2) {
    $gameType = $parts[1];
    list($firstName1, $surName1, $score, $firstName2, $surName2) = explode(' ', $parts[0]);
} else {
    echo "Cannot parse";
}
And now parse $score (either G2R71 or |2x71|) :)
if (preg_match('~^\|(\d+)x(\d+)\|$~', $score, $matches)) {
    $first = $matches[1];  // game number
    $second = $matches[2]; // result
} elseif (preg_match('~^G(\d+)R(\d+)$~', $score, $matches)) {
    $first = $matches[1];
    $second = $matches[2];
} else {
    echo "Cannot parse!";
}
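To check the explode approach against both sample strings from the question, it can be wrapped in a function. This is a sketch; the name parse_game_line and the array keys it returns are made up for illustration.

```php
<?php
// Illustrative wrapper around the explode approach; returns null when a line cannot be parsed.
function parse_game_line($string) {
    $parts = explode('.', $string);
    if (count($parts) == 6) {
        // Joao.Martins.G2R71.Pedro.Feliz.sno
        list($first1, $last1, $score, $first2, $last2, $type) = $parts;
    } elseif (count($parts) == 2) {
        // Joao Martins |2x71| Pedro Feliz.poo
        $type = $parts[1];
        $words = explode(' ', $parts[0]);
        if (count($words) != 5) {
            return null;
        }
        list($first1, $last1, $score, $first2, $last2) = $words;
    } else {
        return null;
    }
    // The score part is either "G<game>R<result>" or "|<game>x<result>|"
    if (preg_match('~^G(\d+)R(\d+)$~', $score, $m) || preg_match('~^\|(\d+)x(\d+)\|$~', $score, $m)) {
        return array(
            'player1' => "$first1 $last1",
            'game'    => $m[1],
            'result'  => $m[2],
            'player2' => "$first2 $last2",
            'type'    => $type,
        );
    }
    return null;
}

print_r(parse_game_line('Joao.Martins.G2R71.Pedro.Feliz.sno'));
print_r(parse_game_line('Joao Martins |2x71| Pedro Feliz.poo'));
```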
preg_match way:
The second regexp is intentionally different, so you can see how to write a regexp that "eats" the whole name no matter whether it has 2, 3 or 5 parts, and so you get used to *? (a lazy quantifier, the "greedy killer").
$match = array();
if( preg_match( '~^(\w+)\.(\w+)\.G(\d+)R(\d+)\.(\w+)\.(\w+)\.(\w+)$~', $text, $match)){
// First way
} elseif (preg_match( '~^([^\|]+)\|(\d+)x(\d+)\|(.*?)\.(\w+)$~', $text, $match)){
// Second way
} else {
// Failed to parse
}
Edit (more than 2 names)
If a player may have more than 2 names (like Armin Van Buuren), you should go with a regexp like this:
~^([\w.]+)\.G(\d+)R(\d+)\.([\w.]+)\.(\w+)$~
This will match names such as Albert.Einstein or Armin.Van.Buuren. Note that \w matches digits (and the underscore) as well as letters, so names like Gerold.The.3rd match too, and writing [\w\d.] instead of [\w.] changes nothing.
The \.G(\d+)R(\d+)\. part is quite strict, so you would have to make up a really crazy name containing something like .G3R01. (think "3l1t33 kid Gerold") to get it parsed wrong.
Oh, and one more thing: don't forget $name = strtr($name, '.', ' ') to turn the dots back into spaces :)
RegExp explained
~ ~ - regexp delimiters; they start and finish the regexp: ~regexp~. The delimiter can be practically any non-alphanumeric character, e.g. /regexp/ or (regexp)
^ and $ - meta characters; ^ is the start of the string (or line, with the m modifier), $ is the end of the string (or line)
\w - escape sequence for any word character: letters, digits and the underscore, i.e. [a-zA-Z0-9_] for ASCII text
([\w.]+) - captures a subpattern (match group) of at least one character from [a-zA-Z0-9_.]; + is called a quantifier
+? - a ? placed after another quantifier makes it lazy (the "greedy killer"): it takes as little as possible. On the string ababa, (\w+)a captures abab, (\w+?)a captures ab and (\w*?)a captures an empty string :)
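The greedy vs lazy examples above can be checked directly with preg_match; $m[1] holds the captured group. The last line shows that \w also accepts digits and the underscore.

```php
<?php
// Greedy vs lazy quantifiers on the string "ababa"; $m[1] is the captured group.
preg_match('/(\w+)a/', 'ababa', $m);  // greedy: \w+ takes as much as it can
var_dump($m[1]);                     // string(4) "abab"
preg_match('/(\w+?)a/', 'ababa', $m); // lazy: \w+? takes as little as it can
var_dump($m[1]);                     // string(2) "ab"
preg_match('/(\w*?)a/', 'ababa', $m); // lazy and allowed to be empty
var_dump($m[1]);                     // string(0) ""
// \w also covers digits and "_", unlike [a-zA-Z]
var_dump(preg_match('/^\w+$/', 'The_3rd')); // int(1)
```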
I think this will do it for you.
/^(\w+)(?:\.| )(\w+)(?:\.| \|)G?(\d+)[xR](\d+)(?:\.|\| )(\w+)(?:\.| )(\w+)(?:\.| )(\w+)$/
$1 will be p1 first name
$2 will be p1 last name
$3 will be game number
$4 will be results
$5 will be p2 first name
$6 will be p2 last name
$7 will be game type
The $n references correspond to the elements of the $results array filled by preg_match($pattern, $string, $results), so $results[1] is the p1 first name, and so on. The pattern might be simplified somewhat, but I don't have enough time to figure that out.
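Applied with preg_match, the pattern from this answer fills $results[1] to $results[7] for both input formats. A sketch (the character class is written [xR] here; like the pattern itself, it only handles players with exactly two names):

```php
<?php
// The single-pattern answer, applied with preg_match to both sample formats.
$pattern = '/^(\w+)(?:\.| )(\w+)(?:\.| \|)G?(\d+)[xR](\d+)(?:\.|\| )(\w+)(?:\.| )(\w+)(?:\.| )(\w+)$/';
$lines = array('Joao.Martins.G2R71.Pedro.Feliz.sno', 'Joao Martins |2x71| Pedro Feliz.poo');
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $results)) {
        // $results[1] ... $results[7] correspond to $1 ... $7 above
        echo "$results[1] $results[2] vs $results[5] $results[6]: game $results[3], result $results[4], type $results[7]\n";
    }
}
```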
You can do this:
//save the game type (the last 3 chars), then strip it and its "." from the string
$gameType = substr($yourstring, -3);
$yourstring = substr($yourstring ,0 ,strlen($yourstring)-4);
//separating strings with "." as delimiter
$results = explode(".",$yourstring);
//checking whether "." was the delimiter
if(!strcmp($results[0],$yourstring)) {
//if "." was not the delimiter, then split the string with " "
//as the delimiter.
$results = explode(" ",$yourstring);
}
//storing them in separate variables and removing "|" where present
if( count( $results) == 5){
$results[2] = trim($results[2],"|");
list( $var1, $var2, $var3, $var4, $var5) = $results;
}
elseif( count( $results) == 4){
$results[1] = trim($results[1],"|");
$results[2] = trim($results[2],"|");
list( $var1, $var2, $var3, $var4) = $results;
}
else {
$results[1] = trim($results[1],"|");
list( $var1, $var2, $var3) = $results;
}
All your string parts will then be separated and stored in $results.
To get them into separate variables, you can use list().

php trim a string

I'm trying to build a function to trim a string if it's too long per my specifications.
Here's what I have:
function trim_me($s,$max)
{
if (strlen($s) > $max)
{
$s = substr($s, 0, $max - 3) . '...';
}
return $s;
}
The above will trim a string if it's longer than the $max and will add a continuation...
I want to expand that function to handle multiple words. Currently it does what it does, but if I have a string say: How are you today? which is 18 characters long. If I run trim_me($s,10) it will show as How are yo..., which is not aesthetically pleasing. How can I make it so it adds a ... after the whole word. Say if I run trim_me($s,10) I want it to display How are you... adding the continuation AFTER the word. Any ideas?
Basically, I don't want to add a continuation in the middle of a word. Only if the string consists of a single word may the continuation break it.
So, here's what you want (pass " " as $break to cut at the end of a word, e.g. myTruncate($s, 10, " ")):
<?php
// Original PHP code by Chirp Internet: www.chirp.com.au
// Please acknowledge use of this code by including this header.
function myTruncate($string, $limit, $break=".", $pad="...") {
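// nothing to do if the string is already short enough
// (this also keeps strpos() from getting an offset past the end of the string)
if(strlen($string) <= $limit) return $string;
// is $break present between $limit and the end of the string?
if(false !== ($breakpoint = strpos($string, $break, $limit))) {
if($breakpoint < strlen($string) - 1) {
$string = substr($string, 0, $breakpoint) . $pad;
}
}
return $string;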
}
?>
Also, you can read more at http://www.the-art-of-web.com/php/truncate/
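function trim_me($s, $max) {
    if (strlen($s) <= $max) return $s;
    // first space at or after $max, i.e. where the word crossing the limit ends
    $end = strpos($s, " ", $max);
    // no later space: the word crossing the limit is the last one, so keep the whole string
    return $end === false ? $s : substr($s, 0, $end) . "...";
}
strpos with an offset does the magic here: it finds the first space at or after $max, so the word that crosses the limit is kept whole (trim_me("How are you today?", 10) gives "How are you...").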
I've named the function str_trunc. If $strict is TRUE, the string is cut at the last space within the maximum size and never goes past it; otherwise the cut is extended to the end of the word it was about to break.
var_dump(str_trunc('How are you today?', 10)); // string(10) "How are..."
var_dump(str_trunc('How are you today? ', 10, FALSE)); // string(14) "How are you..."
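// Returns a truncated version of $str up to $max chars, excluding $trunc.
// $strict = FALSE will allow longer strings to fit the last word.
function str_trunc($str, $max, $strict = TRUE, $trunc = '...') {
    if ( strlen($str) <= $max ) {
        return $str;
    }
    if ($strict) {
        // last space within the first $max + 1 chars; hard-cut if there is none (a single long word)
        $cut = strrposlimit($str, ' ', 0, $max + 1);
        return substr( $str, 0, $cut === false ? $max : $cut ) . $trunc;
    }
    // first space at or after $max; if there is none, the last word reaches the end of the string
    $cut = strpos($str, ' ', $max);
    return $cut === false ? $str : substr( $str, 0, $cut ) . $trunc;
}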
// Works like strrpos, but allows a limit
function strrposlimit($haystack, $needle, $offset = 0, $limit = NULL) {
if ($limit === NULL) {
return strrpos($haystack, $needle, $offset);
} else {
$search = substr($haystack, $offset, $limit);
return strrpos($search, $needle, 0);
}
}
It's actually fairly simple. I'm adding this answer because the suggested duplicate does not quite match your needs (though it does give some pointers).
What you want is to cut a string at a maximum length but preserve the last word. So you need to find the position where to cut the string (and whether it's necessary to cut it at all).
As getting the length (strlen) and cutting a string (substr) is not your problem (you already use both), the problem to solve is how to find the position where the word reaching the limit ends.
This involves analyzing the string to find the offset of each word, which can be done with regular expressions. While writing this, I was reminded of some closely related questions where this has already been solved:
Extract a fixed number of chars from an array, just full words (with regex)
How to get first x chars from a string, without cutting off the last word? (with wordwrap)
They do exactly this: obtain the "full words" string using a regular expression. The only difference is that they remove the last word instead of extending it. As you want to extend the last word, this needs a different regular expression pattern.
In a regular expression, \b matches a word boundary, that is, the position right before or after a word. You want to pick at least $length characters, up to the next word boundary.
As this could include spaces before the next word, you might want to trim the result to remove trailing spaces.
You could then extend your function with the regular expression pattern (via preg_replace) and the trim, like this:
/**
* Cut a string at length while preserving the last word.
*
* @param string $str
* @param int $length
* @param string $suffix (optional)
*/
function trim_word($str, $length, $suffix = '...')
{
$len = strlen($str);
if ($len <= $length) return $str;
// lazily take at least $length chars, up to the next word boundary (s: "." also matches newlines)
$pattern = sprintf('/^(.{%d,}?)\b.*$/s', $length);
$str = preg_replace($pattern, '$1', $str);
$str = trim($str);
$str .= $suffix;
return $str;
}
Usage:
$str = 'How are you today?';
echo trim_word($str, 10); # How are you...
You can extend this further by reducing the minimum length in the pattern by the length of the suffix (as your question somewhat suggests; however, the results you gave in your question do not match your code).
I hope this is helpful. Also, please use the search function on this site; it's not perfect, but many gems with alternative approaches are hidden in existing questions.
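The suffix-length variant can be sketched by subtracting strlen($suffix) from the minimum length in the pattern. This is one possible reading of that suggestion; trim_word_fit is a hypothetical name:

```php
<?php
// Variant of trim_word(): the minimum length in the pattern is reduced by the
// length of the suffix. The function name is hypothetical.
function trim_word_fit($str, $length, $suffix = '...')
{
    if (strlen($str) <= $length) {
        return $str;
    }
    // Lazily take at least ($length - suffix length) chars, then stop at the next \b
    $min = max(0, $length - strlen($suffix));
    $pattern = sprintf('/^(.{%d,}?)\b.*$/s', $min);
    return trim(preg_replace($pattern, '$1', $str)) . $suffix;
}

echo trim_word_fit('How are you today?', 10), "\n"; // How are...
echo trim_word_fit('How are you today?', 12), "\n"; // How are you...
```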

PHP:PCRE: How to replace repeatable char

For example, I have the following string:
a_b__c___d____e
How can I use preg_replace to change _ to -, but only in runs of at least N consecutive _?
I hope this makes sense.
source: a_b__c___d____e
cond: change '_' where 2 or more
result: a_b--c---d----e
or
source: a_b__c___d____e_____f
cond: change '_' where 4 or more
result: a_b__c___d----e-----f
Thanks!
P.S. I'm interested in a solution without loops; I think everybody knows how to implement it with loops. Just one regex and preg_replace.
Here is another one using the e modifier (deprecated as of PHP 5.5.0 and removed in PHP 7.0.0, so it only runs on older versions):
$str = 'a_b__c___d____e_____f';
echo preg_replace('/_{4,}/e', 'str_repeat("-", strlen("$0"))', $str);
Replace 4 by the number you need. Or as function:
function repl($str, $char, $times) {
$char = preg_quote($char, '/');
$times = (int) $times;
$pattern = '/' . $char . '{' . $times . ',}/e';
return preg_replace($pattern, 'str_repeat("-", strlen("$0"))', $str);
}
$source = 'a_b__c___d____e_____f';
function yourfunc($param)
{
$count = strlen($param);
$return = '';
for ($i = 0; $i < $count; $i++)
{
$return .= '-';
}
return $return;
}
echo preg_replace('#(_{4,})#e', 'yourfunc("$1");', $source);
A solution without callback function and loop is much harder to read.
preg_replace('#(_{4,})#e', 'implode("", array_pad(array(), strlen("$1"), "-"));', $source);
This is an inline solution:
preg_replace('/(_{2,})/ie', 'str_repeat("-",strlen("$1"));', $source);
and a reusable function:
$source = 'a_b__c___d____e_____f';
function replace_repeatable($source,$char,$replacement,$minrepeat = 2)
{
return preg_replace('/(' . preg_quote($char, '/') . '{' . $minrepeat . ',})/ie', 'str_repeat("' . $replacement . '",strlen("$1"));', $source);
}
$b = replace_repeatable($source,'_','-',4);
As the php.net documentation notes, using the e modifier is discouraged:
This feature has been DEPRECATED as of PHP 5.5.0. Relying on this feature is highly discouraged.
so we had better achieve our goal without this modifier.
Here's a solution based on up-to-date PHP tools:
$source = 'a_b__c___d____e';
echo preg_replace_callback( "%(_{2,})%i", function($matches) {return str_repeat( "-", strlen($matches[1]) ); }, $source );
/* in the callback function, $matches[0] is the whole matched pattern; groups follow as $matches[1], $matches[2]... */
Even where e is still available in your PHP environment, it is generally better to use a callback function: it avoids the rather unsafe combination of addslashes() and string evaluation, both of which preg_replace performs when the e modifier is used.
preg_replace_callback has been available since PHP 4.0.5, but function($matches) {} is an anonymous function, a much newer language feature, so this code needs PHP 5.3.0 or newer.
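The same callback idea can replace the e-based replace_repeatable() helper from the earlier answer. A sketch, assuming $char is a single character; replace_repeatable_cb is a hypothetical name:

```php
<?php
// Callback-based version of replace_repeatable(): every run of at least $minRepeat
// copies of $char becomes the same number of copies of $replacement.
// Assumes $char is a single character; needs PHP 5.3+ for the closure.
function replace_repeatable_cb($source, $char, $replacement, $minRepeat = 2)
{
    $pattern = '/' . preg_quote($char, '/') . '{' . (int) $minRepeat . ',}/';
    return preg_replace_callback($pattern, function ($matches) use ($replacement) {
        // One replacement per matched character, so the run keeps its length
        return str_repeat($replacement, strlen($matches[0]));
    }, $source);
}

echo replace_repeatable_cb('a_b__c___d____e', '_', '-', 2), "\n";       // a_b--c---d----e
echo replace_repeatable_cb('a_b__c___d____e_____f', '_', '-', 4), "\n"; // a_b__c___d----e-----f
```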
You can replace the underscores one by one, using the \G anchor to ensure contiguity from the position of the first _ (followed by n-1 other _) to the last one. This way you only have to check the number of underscores following the first one:
echo preg_replace('~\G(?!^)_|_(?=_)~', '-', $str);
for n=2:
\G(?!^)_|_(?=_)
for n=3:
\G(?!^)_|_(?=_{2})
for n=4:
\G(?!^)_|_(?=_{3})
etc.
The first branch \G(?!^)_ succeeds only when the previous match ended at the current position ((?!^) stops \G from matching at the very start of the string). In other words, this branch fails until the second branch has succeeded.
The second branch _(?=_{n-1}) is devoted to the first underscore. It checks using a lookahead assertion the number of following underscores.
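The per-n patterns above can be generated for any n. A minimal sketch; dash_runs is a hypothetical name and $n is assumed to be at least 1:

```php
<?php
// Builds \G(?!^)_|_(?=_{n-1}) for the given $n and replaces every underscore
// that belongs to a run of at least $n underscores with "-".
function dash_runs($str, $n)
{
    $pattern = sprintf('~\G(?!^)_|_(?=_{%d})~', $n - 1);
    return preg_replace($pattern, '-', $str);
}

echo dash_runs('a_b__c___d____e', 2), "\n";       // a_b--c---d----e
echo dash_runs('a_b__c___d____e_____f', 4), "\n"; // a_b__c___d----e-----f
```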
