regex repetitive pattern with monetary amount

regex repetitive pattern with monetary amount - php

(?:[.,]?\d{3})* this is a part of a monetary amount pattern which match the thousand separator and the 3 digits after. The thousand separator can either be ., , or nothing
But how to make it repetitive? If the first thousand separator is . then the rest should also be .

You can use a capture group with a repeating backreference:
^(?:\d{4,}|\d{1,3}(?:([,.])\d{3}(?:\1\d{3})*)?)$
Explanation
^ Start of string
(?: Non capture group for the alternatives
\d{4,} Match 4 or more digits
| Or
\d{1,3} Match 1-3 digits
(?: Non capture group
([,.])\d{3} capture either a . or comma in group 1
(?:\1\d{3})* Optionally repeat the backreference to group 1 (the same char) followed by 3 digits
)? Close the non capture group and make it optional (to also match 1-3 digits)
) Close the non capture group
$ End of string
See a regex demo.

Take it block by block: the begining the middle and the end. Then use a or to match the numbers < 1000.
^((\d{1,3}[.,]){,2}(\d{3}[,.])*(\d{3})|(\d{1,3}))$
(This works with python regex on this tool: https://regex101.com/)
For PHP I had to modify it like this:
^(\d{1,3}|(\d{1,3}[.,])+(\d{3}[.,])*\d{3})$

Related

regex repetitive parts in monetary amounts

In a monetary pattern I look for thousand separator
(?:[. ]\d{3})*
In this case the thousand separator could either be . or . But how to make sure that the pattern will not match patterns where the thousand separator is mixed?
Only match patterns like
.123.123.123
123 123 123
Do not match
.123 123 123.123

You can match the first separator and then use a backreference to match the same separator for the remaining digit groups:
^(?:([. ])\d{3}(\1\d{3})*)?$
Demo: https://regex101.com/r/u4zPuf/1

If you don't want to match empty strings:
^([. ])\d{3}(?:\1\d{3})*$
In parts the pattern matches:
^ Start of string
([. ]) Capture group 1, match either . or a space
\d{3} Match 3 digits
(?:\1\d{3})* Optionally repeat a back reference to what is captured in group 1 followed by 3 digits
$ End of strings
Regex demo

Regex optional groups and digit length

Maybe some regex-Master can solve my problem.
I have a big list with many addresses with no seperators( , ; ).
The address string contains following Information:
The first group is the street name
The second group is the street number
The third group is the zipcode (optional)
The last group is the town name (optional)
As you can see on the image above the last two test strings are not matching.
I need the last two regex groups to be optional and the third group should be either 4 or 5 digits.
I tried (\d{4,5}) for allowing 4 and 5 digits. But this only works halfways as you can see here: https://regex101.com/r/ZurqHh/1
(This sometimes mixes the street number and zipcode together)
I also tried (?:\d{5})? to make the third and fourth group optional. But this destroys my whole group layout...
https://regex101.com/r/EgxeMy/1
This is my current regex:
/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im
Try it out yourself:
https://regex101.com/r/zC8NCP/1
My brain is only farting at this moment and i can't think straight anymore.
Please help me fix this problem so i can die in peace.

You can use
^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$
See the regex demo (note all \s are replaced with \h to only match horizontal whitespaces).
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars
(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching
\s+ - one or more whitespaces
(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
\d+ - one or more digits
(?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, -, +, | or /, zero or more whitespaces, one or more digits
\s* - zero or more whitespaces
[a-z]?\b - an optional lowercase ASCII letter and a word boundary
(?:\s+(\d{4,5})\b(?:\s+(.*))?)? - an optional non-capturing group matching
\s+ - one or more whitespaces
(\d{4,5}) - Group 3: four or five digits
(?:\s+(.*))? - an optional sequence of one or more whitespaces and then any zero or more chars other than line break chars as many as possible
$ - end of string.
Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group to work.

It is difficult to parse addresses because we are halfway between formatted text and natural language. Here is a pattern that tries as much as possible to reduce the number of optional parameters to succeed with the examples offered without asking too much to the regex engine. To do this, I mainly rely on character classes, atomic groups, and a relatively accurate description of the street names. Obviously, all the examples of the question cannot be representative of reality and characters could be added or removed from the classes to deal with new cases. Nevertheless, the structure of this pattern is a good starting point.
~
^
(?<strasse> [\pL\d-]+ \.? (?> \h+ [\pL\d-]+ \.? )*? ) \h*
(?<nummer> \b (?> \d+ | [-+/\h]+ | [a-z] \b )*? )
(?: \h+ (?<plz> \d{4,5} )
\h+ (?<stadt> .+ ) )?
$
~mxui
demo
Note that in the above link you can also see a previous version of this pattern with a more accurate description of the street number (a bit more efficient but longer).

Maximum character length for PHP multiline regular expressions?

I'm trying to evaluate a multiline RegExp with preg_match_all.
Unfortunately there seems to be a character limit around 24,000 characters (24,577 to be specific).
Does anyone know how to get this to work?
Pseudo-code:
<?php
$data = 'TRACE: aaaa(24,577 characters)';
preg_match_all('/([A-Z]+): ((?:(?![A-Z]+:).)*)\n/s', $data, $matches);
var_dump($matches);
?>
Working example (with < 24,577 characters): https://3v4l.org/8iRCc
Example that's NOT working (with > 24,577 characters): https://3v4l.org/ceKn6

You might rewrite the pattern using a negated character class instead of the tempered greedy token approach with the negative lookahead:
([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
([A-Z]+): Capture group 1, match 1+ uppercase chars : and a space
( Capture group 2
[^A-Z\r\n]* Match 1+ times any char except A-Z or a newline
(?> Atomic group
(?: Non capture group
\r?\n Match a newline
| Or
[A-Z] Match a char other than A-Z
(?![A-Z]*:) Negative lookahead, assert not optional chars A-Z and :
) Close non capture group
[^A-Z\r\n]* Optionally match any char except A-Z
)* Close atomic group and optionally repeat
)\r?\n Close group 2 and match a newline
Regex demo | Php demo
If the TRACE: is at the start of the string, you can also add an anchor:
^([A-Z]+): ([^A-Z\r\n]*(?>(?:\r?\n|[A-Z](?![A-Z]*:))[^A-Z\r\n]*)*)\r?\n
Regex demo
Edit
If the strings start with the same format, you can capture and match all lines that do not start with the opening format.
^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)
The pattern matches:
^ Start of string
([A-Z]+): Capture group 1
( Capture group 2
.* Match the rest of the line
(?:\r?\n(?![A-Z]+: ).*)* Repeat matching all lines that do not start with the pattern [A-Z]+:
) Close group 2
Regex demo
In php you can use
$re = '/^([A-Z]+): (.*(?:\r?\n(?![A-Z]+: ).*)*)/m';
Php demo

Try this
preg_match('/\A(?>[^\r\n]*(?>\r\n?|\n)){0,4}[^\r\n]*\z/',$data)

I'm trying to capture data in a web url with regex

I'm trying to build my regex to match my urls
Here are 2 example urls
category/sorganiser/bouger/escalade/offre/78934/
category/sorganiser/savourer/offre/8040/
I would like to get the number just after offre (78934 and 8040)
as well as the word just before the word offre (escalade and savourer)
I did several tests but did not pass
^category/(((\w)+/){1,3})(\d+)/?$
^category/(((\w)+/){1,3})/offre/(\d+)/?$
https://regex101.com/r/S4MTvK/1
Thank you

Instead of repeating a single word char in a group (\w)+ you can repeat 1+ word chars in a single group (\w+)
Note to not match the / before /offre as it is already matched in the iteration ^category/(?:(\w+)/){1,3}
You can repeat the capture group inside a non capture group (?: to capture the last occurrence in the iteration.
^category/(?:(\w+)/){1,3}offre/(\d+)
The pattern matches
^ Start of string
category/ Match literally
(?: Non capture group
(\w+)/ Capture group 1, match 1+ word chars and match /
){1,3} Close non capture, repeat 1-3 times and capture group 1 contains the last occurrence of 1+ word chars which is escalade or savourer
offre/ Match literally
(\d+) Capture group 2, match 1+ digits
Regex demo
To also match an optional / before the end of the sting
^category/(?:(\w+)/){1,3}offre/(\d+)/?$
Regex demo

Phone no contain this patteren AABBCC e.g 112233

I want to check if phone no contains this pattern AABBCC
Where A[0-9],B[0-9],C[0,9] They should be different e.g 112233,553322,887766
Let Us Suppose
I Have a phone no 03334112233
It will say yes pattern matched.
PHP Code but It Is For Exact String
$str = 'aabbaabbccaass'; //or whatever
if (preg_match('/(?!.*?aabbcc)^.*$/', $str))
echo "accepted\n";
else
echo "rejected\n";
Problem i don't know how to do if string is for numbers
Possible Duplicate
but it does not contain answer and exact detail.
Edited :
I want to match the last 6 characters of the string in this pattern AABBCC e.g 03329112233

To match number with AABBCC format, you can use this pattern:
(?:(\d)\1(?!\1)){2}(\d)\2
example of use:
if (preg_match('/(?:(\d)\1(?!\1)){2}(\d)\2/', $str)
echo "rejected\n";
else
echo "accepted\n";
But if you have other tests to do (for example that there is only digits), it can be more flexible to use it in this way:
if (preg_match('/(?!.*(?:(\d)\1(?!\1)){2}(\d)\2)^\d+$/', $str)
echo "accepted\n";
else
echo "rejected\n";
pattern details:
(?: # open a non capturing group that describes a repeated digit
(\d) # capture the first digit with group 1
\1 # a backreference to group 1 (the same digit thus)
(?!\1) # check with a negative lookahead that the same digit doesn't follow
){2} # repeat the group two times
(\d)\2 # same thing for digits 5 & 6 (the lookahead isn't needed here)
Note that the digit in the capture group change at each repetition of the non capturing group (because the negative lookahead forces it).
Notice: if you want to reject numbers that contains, for example, 111122 or 112222 or 111111, you only need to remove the negative lookahead.
if you want to reject numbers with the format 112211 or 448844, you must change the pattern like this: (\d)\1(?!\d{0,2}\1)(\d)\2(?!\2)(\d)\3

As I understand, you only want to match the last 6 characters of the string, if they are digits, and of 3 all different digit pairs. Would also use a lookahead and some pattern like this:
(?>((\d)\2)(?!.*\1)){3}$
\2 checks for an equivalent of 2nd capturing group, which is one digit (shorthand \d)
using a negative lookahead to check, if not followed by .* any amount of any characters, followed by equivalent of 1st capturing group (which contains 2 equal digits).
{3} 3 repitions at $ end of string.
Test on regex101.com, Regex FAQ

Your regex should be like this:
^((\d)\2){3}$
It is simpler and also works.

You can use capturing groups and backreferences like this:
if (preg_match('/(?!.*(.)\1(.)\2(.)\3)^.*$/', $str))
The (.) will match any single character and assign it to a group. The first instance is assigned to group 1, the second to group 2 and so on. Later in the pattern, the backreference \1 will match exactly what was previously captured in group the first group, \2 will match what was captured in the second group, etc.
You probably will also want to use \d to match any single digit (it's only necessary to use this outside of the lookahead) and a {n,m} quantifier to match between n and m digits. For example, the following will match any sequence of 7 to 10 digits that does not contain a subsequence like AABBCC:
if (preg_match('/(?!.*(.)\1(.)\2(.)\3)^\d{7,10}$/', $str))

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

regex repetitive pattern with monetary amount - php

(?:[.,]?\d{3})* this is a part of a monetary amount pattern which match the thousand separator and the 3 digits after. The thousand separator can either be ., , or nothing But how to make it repetitive? If the first thousand separator is . then the rest should also be .

Related

regex repetitive parts in monetary amounts

Regex optional groups and digit length

Maximum character length for PHP multiline regular expressions?

I'm trying to capture data in a web url with regex

Phone no contain this patteren AABBCC e.g 112233

Categories

Resources