regexp monetary strings with decimals and thousands separator - php

https://www.tehplayground.com/KWmxySzbC9VoDvP9
Why is the first string matched?
$list = [
'3928.3939392', // Should not be matched
'4.239,99',
'39',
'3929',
'2993.39',
'393993.999'
];
foreach($list as $str){
preg_match('/^(?<![\d.,])-?\d{1,3}(?:[,. ]?\d{3})*(?:[^.,%]|[.,]\d{1,2})-?(?![\d.,%]|(?: %))$/', $str, $matches);
print_r($matches);
}
output
Array
(
[0] => 3928.3939392
)
Array
(
[0] => 4.239,99
)
Array
(
[0] => 39
)
Array
(
[0] => 3929
)
Array
(
[0] => 2993.39
)
Array
(
)

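You seem to want to match the numbers as standalone strings, so you do not need the lookarounds; anchors are enough. The first string is matched because the separator in [,. ]? is optional and [^.,%] accepts any other character, including a digit: the pattern consumes 3, then the groups 928, .393 and 939, and [^.,%] takes the final 2.
You may use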
^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$
See the regex demo
Details
^ - start of string
-? - an optional -
(?: - start of a non-capturing alternation group:
\d{1,3}(?:[,. ]\d{3})* - 1 to 3 digits, followed by 0+ sequences of a comma, dot or space and then 3 digits
| - or
\d* - 0+ digits
) - end of the group
(?:[.,]\d{1,2})? - an optional sequence of . or , followed by 1 or 2 digits
$ - end of string.
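A minimal sketch of how the suggested pattern behaves on the list from the question. The helper name isMonetary is hypothetical and only wraps the pattern shown above.

```php
<?php
// Hypothetical helper: true when the whole string is a monetary value
// according to the pattern suggested above.
function isMonetary(string $s): bool
{
    $re = '/^-?(?:\d{1,3}(?:[,. ]\d{3})*|\d*)(?:[.,]\d{1,2})?$/';
    return preg_match($re, $s) === 1;
}

$list = ['3928.3939392', '4.239,99', '39', '3929', '2993.39', '393993.999'];

// Keep only the strings that match, reindexed from 0.
$matched = array_values(array_filter($list, 'isMonetary'));
print_r($matched);
```

Because every part of the pattern is optional, the empty string also matches, and $ accepts a single trailing newline unless the D modifier is added. Check for those cases separately if they matter for your input.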

Related

Regex: Capturing multiple instances in one word group

I'm not good at Regex and I've been trying for hours now so I hope you can help me. I have this text:
&#x271D;his is *&#x271D;he* *in&#x271D;erne&#x271D;*
I need to capture (using PREG_OFFSET_CAPTURE) only the &#x271D; in a word surrounded with *, so I only need to capture the last three &#x271D; in this example. The output array should look something like this:
[0] => Array
(
[0] => &#x271D;
[1] => 17
)
[1] => Array
(
[0] => &#x271D;
[1] => 32
)
[2] => Array
(
[0] => &#x271D;
[1] => 44
)
I've tried using (&#x271D;) but of course this will select all instances including the words without asterisks. Then I've tried \*[^ ]*(&#x271D;)[^ ]*\* but this only gives me the last instance in one word. I've tried many other variations but all were wrong.
To clarify: The asterisk can be at all places in the string, but always at the beginning and end of a word. The opening asterisk is always preceded by a space except at the beginning of the string, and the closing asterisk is always followed by a space except at the end of the string. I must add that punctuation marks can be inside these asterisks. &#x271D; is exactly (and only) what I need to capture and can be at any position in a word.
You could make use of the \G anchor to get iterative matches between the *. The anchor matches either at the start of the string, or at the end of the previous match.
(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K&#x271D;(?=[^*]*\*)
Explanation
(?: Non capture group
\* Match *
| Or
\G(?!^) Assert the end of the previous match, not at the start
) Close non capture group
[^&*]* Match 0+ times any char except & and *
(?> Atomic group
&(?!#) Match & only when not directly followed by #
[^&*]* Match 0+ times any char except & and *
)* Close atomic group and repeat 0+ times
\K Clear the match buffer (forget what is matched until now)
&#x271D; Match literally
(?=[^*]*\*) Positive lookahead, assert a * at the right
Regex demo | Php demo
For example
$re = '/(?:\*|\G(?!^))[^&*]*(?>&(?!#)[^&*]*)*\K&#x271D;(?=[^*]*\*)/m';
$str = '&#x271D;his is *&#x271D;he* *in&#x271D;erne&#x271D;*';
preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);
print_r($matches[0]);
Output
Array
(
[0] => Array
(
[0] => &#x271D;
[1] => 16
)
[1] => Array
(
[0] => &#x271D;
[1] => 31
)
[2] => Array
(
[0] => &#x271D;
[1] => 43
)
)
Note: the offset is 1 less than expected because string offsets start counting at 0. See PREG_OFFSET_CAPTURE
If you want to match more variations, you could use a non capturing group and list the ones that you would accept to match. If you don't want to cross newline boundaries you can exclude matching those in the negated character class.
(?:\*|\G(?!^))[^&*\r\n]*(?>&(?!#)[^&*\r\n]*)*\K&#(?:x271D|169);(?=[^*\r\n]*\*)
Regex demo
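As a rough illustration of the variant pattern above, here is a sketch that assumes the input stores the characters as HTML numeric entities such as &#x271D; and &#169;. The sample string is made up for this example.

```php
<?php
// Variant pattern from above: accepts either entity and never crosses a newline.
$re = '/(?:\*|\G(?!^))[^&*\r\n]*(?>&(?!#)[^&*\r\n]*)*\K&#(?:x271D|169);(?=[^*\r\n]*\*)/';

// Assumed sample input: the first entity sits outside any *...* word.
$str = '&#x271D;his is *&#x271D;he* *in&#169;erne&#x271D;*';

preg_match_all($re, $str, $matches, PREG_OFFSET_CAPTURE);

// Each entry is [matched entity, byte offset in $str].
print_r($matches[0]);
```

The offsets are byte offsets, counted from 0. The entity before the first asterisk is skipped because no * precedes it.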

Finding sentences between characters

I am trying to find sentences between pipe | and dot ., e.g.
| This is one. This is two.
The regex pattern I use :
preg_match_all('/(:\s|\|+)(.*?)(\.|!|\?)/s', $file0, $matches);
So far I could not manage to capture both sentences. The regex I use captures only the first sentence.
How can I solve this problem?
EDIT: as may be seen from the regex, I am trying to find the sentences BETWEEN (: or |) AND (. or ! or ?)
A colon or pipe indicates the starting point for sentences.
The sentences might be:
: Sentence one. Sentence two. Sentence three.
| Sentence one. Sentence two?
| Sentence one. Sentence two! Sentence three?
I would keep it simple and just match on:
\s*[^.|]+\s*
This says to match any content not consisting of pipes or full stops. The optional whitespace before and after each sentence is part of the match, so trim it afterwards.
$input = "| This is one. This is two.";
preg_match_all('/\s*[^.|]+\s*/s', $input, $matches);
print_r(array_map('trim', $matches[0]));
This prints:
Array
(
[0] => This is one
[1] => This is two
)
This does the job:
$str = '| This is one. This is two.';
preg_match_all('/(?:\s|\|)+(.*?)(?=[.!?])/', $str, $m);
print_r($m);
Output:
Array
(
[0] => Array
(
[0] => | This is one
[1] => This is two
)
[1] => Array
(
[0] => This is one
[1] => This is two
)
)
Demo & explanation
Another option is to use \G to get iterative matches. It asserts the position at the end of the previous match, so each sentence after a | is captured in group 1, followed by a dot and 0+ horizontal whitespace chars.
(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*
In parts
(?: Non capturing group
\|\h* Match | and 0+ horizontal whitespace chars
| Or
\G(?!^) Assert position at the end of previous match
) Close group
( Capture group 1
[^.\r\n]+ Match 1+ times any char other than . or a newline
) Close group
\.\h* Match 1 . and 0+ horizontal whitespace chars
Regex demo | Php demo
For example
$re = '/(?:\|\h*|\G(?!^))([^.\r\n]+)\.\h*/';
$str = '| This is one. This is two.
John loves Mary.| This is one. This is two.';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
print_r($matches);
Output
Array
(
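[0] => Array
(
[0] => | This is one.
[1] => This is one
)
[1] => Array
(
[0] => This is two.
[1] => This is two
)
[2] => Array
(
[0] => | This is one.
[1] => This is one
)
[3] => Array
(
[0] => This is two.
[1] => This is two
)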
)
To keep it simple, find everything between | and . and then split:
$input = "John loves Mary. | This is one. This is two. | Sentence 1. Sentence 2.";
preg_match_all('/\|\s*([^|]+)\./', $input, $matches);
if ($matches) {
foreach($matches[1] as $match) {
print_r(preg_split('/\.\s*/', $match));
}
}
Prints:
Array
(
[0] => This is one
[1] => This is two
)
Array
(
[0] => Sentence 1
[1] => Sentence 2
)
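The question also mentions : as a starting marker and ! or ? as sentence endings. A minimal sketch of the same match-then-split idea extended to those characters follows. The function name extractSentences is hypothetical, and the sketch assumes that : and | never occur inside a sentence.

```php
<?php
// Hypothetical helper: returns one array of sentences per block that
// starts with | or : and ends with ., ! or ?.
function extractSentences(string $input): array
{
    $groups = [];
    // Capture everything from the marker up to the last terminator in the block.
    preg_match_all('/[|:]\s*([^|:]+)[.!?]/', $input, $matches);
    foreach ($matches[1] as $block) {
        // Split the block on the terminators between its sentences.
        $groups[] = preg_split('/[.!?]\s*/', $block);
    }
    return $groups;
}

$input = "| Sentence one. Sentence two! Sentence three? : Another one. And a last one?";
print_r(extractSentences($input));
```

A colon inside a sentence (for example a time such as 10:30) would be treated as a new starting marker, so the character classes need adjusting for such input.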

Validate url parameters with preg_match

Valid example
12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%]
number[anythingBut ()[]{},anythingBut ()[]{}](,number[anythingBut ()[]{},anythingBut ()[]{}]) or nothing
Full match 12[red,green]
Group 1 12
Group 2 red,green
Full match 13[xs,xl,xxl,some other text with chars like _&-##%]
Group 1 13
Group 2 xs,xl,xxl,some other text with chars like _&-##%
Not valid example
13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]
What I tried is this: (\d+(?=\[))\[([^\(\[\{\}\]\)]+)\], regex101 link with what I tried, but this also matches wrong input like given in the example.
If you just need to validate the input, you can add some anchors:
^(?:\d+\[[^\(\[\{\}\]\)]+\](?:,|$))+$
Regex101
If you also need to get all the matching parts, you can use another regex. Using only one will not work well.
$in = '12[red,green],13[xs,xl,xxl,some other text with chars like _&-##%],13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]';
preg_match_all('/(\d+)\[([^][{}()]+)(?=\](?:,|$))/', $in, $matches);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => 12[red,green
[1] => 13[xs,xl,xxl,some other text with chars like _&-##%
)
[1] => Array
(
[0] => 12
[1] => 13
)
[2] => Array
(
[0] => red,green
[1] => xs,xl,xxl,some other text with chars like _&-##%
)
)
Explanation:
/ : regex delimiter
(\d+) : group 1, 1 or more digits
\[ : open square bracket
( : start group 2
[^][{}()]+ : 1 or more characters that are not parentheses, curly braces or square brackets
) : end group 2
(?= : positive lookahead, make sure we have after
\] : a close square bracket
(?:,|$) : non capture group, a comma or end of string
) : end lookahead
/ : regex delimiter
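The two steps (validate first, then extract) could be combined roughly as follows. This is only a sketch: the function name parseParams and the shape of the returned array are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: returns null for invalid input, otherwise a list of
// ['id' => int, 'values' => string[]] entries.
function parseParams(string $in): ?array
{
    // Step 1: validate the whole string with the anchored pattern.
    if (preg_match('/^(?:\d+\[[^][{}()]+\](?:,|$))+$/', $in) !== 1) {
        return null;
    }
    // Step 2: the input is known to be well formed, so extract each pair.
    preg_match_all('/(\d+)\[([^][{}()]+)\]/', $in, $sets, PREG_SET_ORDER);
    $out = [];
    foreach ($sets as $set) {
        $out[] = ['id' => (int) $set[1], 'values' => explode(',', $set[2])];
    }
    return $out;
}

print_r(parseParams('12[red,green],13[xs,xl,xxl]'));
var_dump(parseParams('13[xs,xl,xxl 9974-?ds12[dfgd,dfgd]]'));
```

Note that the validation pattern, as written, also accepts a trailing comma such as 12[red], because the comma satisfies (?:,|$) and the final $ then matches at the end of the string.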

need some help on regex in preg_match_all()

so I need to extract the ticket number "Ticket#999999" from a string.. how do i do this using regex.
my current regex is working if I have more than one number in the Ticket#9999.. but if I only have Ticket#9 it's not working please help.
current regex.
preg_match_all('/(Ticket#[0-9])\w\d+/i',$data,$matches);
thank you.
In your pattern [0-9] matches 1 digit, \w matches one more word character (a letter, digit or underscore) and \d+ matches 1+ digits, thus requiring at least 3 characters after #.
Use
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches);
This will match:
Ticket# - a literal string Ticket#
([0-9]+) - Group 1 capturing 1 or more digits.
PHP demo:
$data = "Ticket#999999 ticket#9";
preg_match_all('/Ticket#([0-9]+)/i',$data,$matches, PREG_SET_ORDER);
print_r($matches);
Output:
Array
(
[0] => Array
(
[0] => Ticket#999999
[1] => 999999
)
[1] => Array
(
[0] => ticket#9
[1] => 9
)
)

Regexp tip request

I have a string like
"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
I want to explode it to array
Array (
0 => "first",
1 => "second[,b]",
2 => "third[a,b[1,2,3]]",
3 => "fourth[a[1,2]]",
4 => "sixth"
)
I tried to remove brackets:
preg_replace("/[ ( (?>[^[]]+) | (?R) )* ]/xis",
"",
"first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth"
);
But got stuck on the next step
PHP's regex flavor supports recursive patterns, so something like this would work:
$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth";
preg_match_all('/[^,\[\]]+(\[([^\[\]]|(?1))*])?/', $text, $matches);
print_r($matches[0]);
which will print:
Array
(
[0] => first
[1] => second[,b]
[2] => third[a,b[1,2,3]]
[3] => fourth[a[1,2]]
[4] => sixth
)
The key here is not to split, but match.
Whether you want to add such a cryptic regex to your code base, is up to you :)
EDIT
I just realized that my suggestion above will not match entries starting with [. To do that, do it like this:
$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]";
preg_match_all("/
( # start match group 1
[^,\[\]] # any char other than a comma or square bracket
| # OR
\[ # an opening square bracket
( # start match group 2
[^\[\]] # any char other than a square bracket
| # OR
(?R) # recursively match the entire pattern
)* # end match group 2, and repeat it zero or more times
] # a closing square bracket
)+ # end match group 1, and repeat it once or more times
/x",
$text,
$matches
);
print_r($matches[0]);
which prints:
Array
(
[0] => first
[1] => second[,b]
[2] => third[a,b[1,2,3]]
[3] => fourth[a[1,2]]
[4] => sixth
[5] => [s,[,e,[,v,],e,],n]
)
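If the recursive pattern feels too cryptic, a plain loop that tracks bracket depth is an alternative. This swaps the regex technique for a character scan. It is a sketch with a hypothetical function name, and it assumes the brackets in the input are balanced.

```php
<?php
// Hypothetical helper: split on commas that are not inside [ ... ].
function splitTopLevel(string $text): array
{
    $parts = [];
    $current = '';
    $depth = 0; // number of currently open square brackets
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if ($ch === '[') {
            $depth++;
        } elseif ($ch === ']') {
            $depth--;
        }
        if ($ch === ',' && $depth === 0) {
            // A top-level comma ends the current item.
            $parts[] = $current;
            $current = '';
        } else {
            $current .= $ch;
        }
    }
    $parts[] = $current; // last item has no trailing comma
    return $parts;
}

$text = "first,second[,b],third[a,b[1,2,3]],fourth[a[1,2]],sixth,[s,[,e,[,v,],e,],n]";
print_r(splitTopLevel($text));
```

Unlike the regex, this version keeps empty items (for example between two consecutive top-level commas) instead of skipping them.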
