Optional capture group part [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 4 years ago.
I have this regex pattern:
[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-([\w]+-[\w]+-[^-]+)-(\d+-\d+)-(.+)\.
It needs to match both of these cases:
Data Location 1 - many many words 201808206566 - many words - 010114-INL-USD-B087834-2018-08-Bill.PDF
Data Location 1 - many many words 201808206565 - many words - 010115-INL-B087845-2018-08-Bill.PDF
As written, it matches the first case but not the second. Removing one instance of [\w]+- from the 5th capture group gives the opposite result, because the first case contains INL-USD-B087834, which has one more data block than INL-B087845 in the second case. How can I make that second [\w]+- optional?

Put the second \w+- in an optional non-capturing group using the ? quantifier:
[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-(\w+-(?:\w+-)?[^-]+)-(\d+-\d+)-(.+)\.
Or use a numeric quantifier to allow one or two word blocks there:
[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-((?:\w+-){1,2}[^-]+)-(\d+-\d+)-(.+)\.
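As a rough sketch, the first variant can be tried against both sample lines from the question. The reduced pattern `$reduced` and the bare file names are assumptions added here to isolate the optional block; they are not part of the original answer.

```php
<?php
// The two sample lines from the question.
$lines = [
    'Data Location 1 - many many words 201808206566 - many words - 010114-INL-USD-B087834-2018-08-Bill.PDF',
    'Data Location 1 - many many words 201808206565 - many words - 010115-INL-B087845-2018-08-Bill.PDF',
];

// Full pattern from the answer, with the second word block made optional.
$full = '/[^-]+-(.+)[^\d](.+)-(.*?)-.*(\d+).*-(\w+-(?:\w+-)?[^-]+)-(\d+-\d+)-(.+)\./';

foreach ($lines as $line) {
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    var_dump(preg_match($full, $line));
}

// Reduced pattern (assumption, for illustration only): file name part alone.
$reduced = '/^(\d+)-(\w+-(?:\w+-)?[^-]+)-(\d+-\d+)-(.+)\./';

$blocks = [];
foreach (['010114-INL-USD-B087834-2018-08-Bill.PDF', '010115-INL-B087845-2018-08-Bill.PDF'] as $name) {
    if (preg_match($reduced, $name, $m) === 1) {
        $blocks[] = $m[2]; // the group that contains the optional word block
    }
}
print_r($blocks);
```

With the reduced pattern, the engine first tries to consume two word blocks and falls back to one when the rest of the pattern cannot match, which is what lets the same group cover both file names.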

Related

Regular expression for exposed filter in a view [Drupal 8]

I am trying to filter the results of a Drupal 8 view [Exposed Filter] using a regular expression. I need to search for the keyword in the last 4 or 5 digits/letters of a specific field.
For example:
2006ABC00022
2014DEF03120
2019GHI03128
2019GHI07437
This is the data I need to filter. If someone searches for "0022", I want to show 2006ABC00022, because its last four digits are 0022. The "Ends with" operator can do this. But if someone filters with "312", I want to show 2014DEF03120 and 2019GHI03128, because in both strings the last four digits start with 312. The "Ends with" operator does not cover this case, so I turned to a regular expression:
"[0-9]{4}$"
I tried the regex above and realized that it does not work as I expected: it searches the whole string.
If I search for 2019, it shows the last two results, but the result should be empty.
I just want to search for the keyword in the last four digits and, if the keyword has five digits, in the last five digits.

It seems that a valid value here starts with a four-digit year, followed by three letters, then five digits. This implies the following pattern:
\b\d{4}[A-Z]{3}\d{5}\b
To find values ending in 0022, fix the last four digits in that pattern:
\b\d{4}[A-Z]{3}\d0022\b
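The same idea might be extended to keywords shorter than four digits, such as 312, by padding the rest of the four-digit tail with \d. The sketch below uses plain PHP for illustration. The helpers `buildSuffixPattern()` and `filterValues()` are hypothetical, and whether a Drupal Views filter accepts the same syntax may depend on the regex dialect of the underlying database.

```php
<?php
// Hypothetical helper: pins the keyword to the start of the last four digits
// of a value shaped like 2006ABC00022 (year, three letters, five digits).
function buildSuffixPattern(string $keyword): string
{
    $length = strlen($keyword);
    if ($length < 1 || $length > 5 || !ctype_digit($keyword)) {
        throw new InvalidArgumentException('Keyword must be 1 to 5 digits.');
    }
    // A five-digit keyword must fill the whole numeric tail. A shorter keyword
    // starts at the second digit and is padded with \d up to four digits.
    $tail = $length === 5
        ? $keyword
        : '\d' . $keyword . str_repeat('\d', 4 - $length);

    return '/^\d{4}[A-Z]{3}' . $tail . '$/';
}

// Hypothetical helper: returns the values that match the keyword.
function filterValues(array $values, string $keyword): array
{
    return array_values(preg_grep(buildSuffixPattern($keyword), $values));
}

$values = ['2006ABC00022', '2014DEF03120', '2019GHI03128', '2019GHI07437'];

print_r(filterValues($values, '0022'));
print_r(filterValues($values, '312'));
print_r(filterValues($values, '2019'));
```

For the keyword 0022 the helper produces the same tail as the pattern above (\d0022), and for 2019 it returns an empty list because the year prefix is never part of the searched tail.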

How to Retrieve Overlapping Matches with Complex Regex and Preg_Match_All in PHP

I have read the following questions, which have some overlap (pun intended!) with the issue I am facing:
preg_match_all how to get *all* combinations? Even overlapping ones
Overlapping matches with preg_match_all and pattern ending with repeated character
However, I don’t really know how to apply their answers to my issue which is a little more complicated.
My regex that I use with preg_match_all():
/.{240}[^\[]Order[^ ][^\(].{9}/u
With the following string:
56A.  Subject to the provisions of this Act, any decision of the Court or the Appeal Board shall be final and conclusive, and no decision or order of the Court or the Appeal Board shall be challenged, appealed against, reviewed, quashed or called into question in any court and shall not be subject to any Quashing Order, Prohibiting Order, Mandatory Order or injunction in any court on any account.[20/99; 42/2005]
I intended it to match exactly 3 times. The first match has “Quashing Order” 9 characters before the end. The second match has “Prohibiting Order” 9 characters before the end. The third match has “Mandatory Order” 9 characters before the end.
However, as expected, it only matches the first one, because the intended matches overlap.
Applying what I read in the other posts, I tried this:
(?=(.{240}[^\[]Order[^ ][^\(].{9}))
I still don’t get what I need.
How do I solve this?

You can use
\w+\s+Order\b
Regex details
\w+ - one or more word chars
\s+ - 1 or more whitespaces
Order\b - a whole word Order, as \b is a word boundary.

You will need to use a positive look-behind assertion for .{240}, just like the answer you found suggests using a positive look-ahead assertion for .{9}:
/(?<=.{240})[^\[]Order[^ ][^\(](?=.{9})/u
This regex matches your string only twice because of [^ ], as @bobblebubble said: in the third case, "Order" is followed by a space. Adjust that part as necessary.
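The overlap problem itself can be sketched with a capturing group inside a lookahead, which consumes no characters and so lets the next match start inside the previous one. The shortened sample sentence and the smaller window sizes (10 characters before, 9 after, instead of 240 and 9) are assumptions chosen to keep the example readable.

```php
<?php
// Shortened stand-in for the statute text in the question.
$text = 'aaaa Quashing Order, Prohibiting Order, Mandatory Order, or injunction.';

// Plain pattern: each match consumes its context, so a match that would
// start inside the previous one is skipped.
preg_match_all('/.{10}Order.{9}/u', $text, $plain);

// Lookahead with a capturing group: the match itself is empty, the context
// is read from group 1, and overlapping windows are all reported.
preg_match_all('/(?=(.{10}Order.{9}))/u', $text, $overlapping);

echo count($plain[0]), " plain matches\n";
echo count($overlapping[1]), " overlapping matches\n";
print_r($overlapping[1]);
```

Unlike the lookbehind variant, this form keeps the surrounding context in the captured group, at the cost of reading results from `$overlapping[1]` rather than `$overlapping[0]`.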

php regex match a comma delimited list of integers [duplicate]

This question already has answers here:
Difference between * and + regex
(7 answers)
Closed 4 years ago.
I know the following regex will match a single integer, as well as a list of comma delimited integers:
/^\d+(?:,\d+)*$/
How can I turn this into a pattern that matches only a list of integers? A single integer should not match. 123,456 and 634,34643,3424 should match.

Use the + quantifier, meaning "one or more" times, instead of * to repeat your group:
/^\d+(?:,\d+)+$/
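A minimal check of that pattern, wrapped in a hypothetical helper `isIntegerList()` that is not part of the original answer:

```php
<?php
// Hypothetical helper: true only for two or more comma-separated integers.
function isIntegerList(string $input): bool
{
    // The group (?:,\d+) must now repeat at least once.
    return preg_match('/^\d+(?:,\d+)+$/', $input) === 1;
}

var_dump(isIntegerList('123'));            // single integer
var_dump(isIntegerList('123,456'));        // two integers
var_dump(isIntegerList('634,34643,3424')); // three integers
var_dump(isIntegerList('123,'));           // trailing comma
```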

Regular Expression for starting with a specific number

How to write regular expression for following conditions:
Should have only numbers
Must be 8 digits long
Must start with 8 or 9 or 6
So far I can only handle the first two conditions. I am not sure how to do the third.
My code is
if (!preg_match('/^[0-9]{8}$/', $number))

You're almost there. Validate the first digit separately at the start of your pattern, then require seven more digits:
/^(8|9|6)\d{7}$/
FYI, \d is the escape sequence for a digit. You could also use a character class for the first digit:
/^[896]\d{7}$/
It matches the same strings, since each alternative is a single character.
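A short sketch of the check from the question with the character-class variant plugged in. The function name `isValidNumber()` is an assumption for illustration.

```php
<?php
// Hypothetical wrapper around the pattern from the answer.
function isValidNumber(string $number): bool
{
    // First digit must be 8, 9 or 6, followed by exactly seven more digits.
    return preg_match('/^[896]\d{7}$/', $number) === 1;
}

foreach (['81234567', '61234567', '71234567', '8123456', '812345678'] as $candidate) {
    echo $candidate, ': ', isValidNumber($candidate) ? 'valid' : 'invalid', "\n";
}
```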

PHP regex non-capture non-match group

I'm making a date matching regex, and it's all going pretty well, I've got this so far:
"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"
It will (hopefully) match single or double digit days and months, and double or quadruple digit years in the 21st century. A few trials and errors have gotten me this far.
But, I've got two simple questions regarding these results:
(?: ) what is a simple explanation for this? Apparently it's a non-matching group. But then...
What is the trailing ? for? e.g. (?: )?

[Edited (again) to improve formatting and fix the intro.]
This is a comment and an answer.
The answer part... I do agree with alex' earlier answer.
(?: ), in contrast to ( ), groups text without capturing it, generally so as to have fewer back references mixed in with those you do want, or to improve performance.
The ? following the (?: ), or following anything except * + ? or {}, means that the preceding item may or may not be present in a legitimate match. E.g., /^z34?$/ will match z3 as well as z34, but it won't match z35 or z. Without the anchors, /z34?/ would still find z3 inside z35.
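A small sketch of the optional quantifier. Note that preg_match() searches for the pattern anywhere in the subject, so anchors are needed if a string like z35 should be rejected outright.

```php
<?php
// Anchored: the whole subject has to fit "z3" plus an optional "4".
$results = [];
foreach (['z3', 'z34', 'z35', 'z'] as $subject) {
    $results[$subject] = preg_match('/^z34?$/', $subject) === 1;
}
print_r($results);

// Unanchored: the engine finds "z3" inside "z35" and reports a match.
preg_match('/z34?/', 'z35', $m);
echo $m[0], "\n";
```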
The comment part... I made what might be considered improvements to the regex you were working on:
(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)
-- First, it avoids things like 0-0-2011
-- Second, it avoids things like 233443-4-201154564
-- Third, it includes things like 1-1-2022
-- Fourth, it includes things like 1-1-11
-- Fifth, it avoids things like 34-4-11
-- Sixth, it captures the day, month, and year so you can refer to them more easily in code, for example to check a Feb 29 date further: if the second captured group is 2, the first captured group must either be below 29, or be 29 in a leap year.
Finally, note that you'll still get dates that won't exist, eg, 31-6-11. If you want to avoid these, then try:
(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
Also, I assumed the dates would be preceded and followed by a space (or the beginning/end of a line), but you may want to adjust that (e.g., to allow punctuation).
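One way to do the follow-up check in code is to hand the captured groups of the first, shorter pattern to PHP's built-in checkdate(). This is only a sketch: `parseDate()` is a hypothetical helper, and treating two-digit years as 20xx is an assumption.

```php
<?php
// Hypothetical helper: returns [day, month, year] for a valid date, else null.
function parseDate(string $text): ?array
{
    $pattern = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';
    if (preg_match($pattern, $text, $m) !== 1) {
        return null;
    }
    $day = (int) $m[1];
    $month = (int) $m[2];
    // Assumption: a two-digit year belongs to the 2000s.
    $year = strlen($m[3]) === 2 ? 2000 + (int) $m[3] : (int) $m[3];

    // checkdate() rejects impossible dates such as 31-6 or 29-2 in a non-leap year.
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}

var_dump(parseDate('1-1-11'));
var_dump(parseDate('31-6-11'));
var_dump(parseDate('29-2-2012'));
var_dump(parseDate('0-0-2011'));
```

Leaving the calendar rules to checkdate() keeps the regex short, while the longer pattern above encodes the month lengths in the expression itself.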
A commenter elsewhere referenced this resource which you might find useful:
http://rubular.com/

It is a non-capturing group. You cannot back-reference it. It is usually used to declutter backreferences and/or improve performance.
The trailing ? makes the preceding group optional: it may match once or not at all.

Subpatterns
Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:
1. It localizes a set of alternatives. For example, the pattern cat(aract|erpillar|) matches one of the words "cat", "cataract", or "caterpillar". Without the parentheses, it would match "cataract", "erpillar" or the empty string.
2. It sets up the subpattern as a capturing subpattern (as defined above). When the whole pattern matches, that portion of the subject string that matched the subpattern is passed back to the caller via the ovector argument of pcre_exec(). Opening parentheses are counted from left to right (starting from 1) to obtain the numbers of the capturing subpatterns.
For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)) the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.
The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 65535. It may not be possible to compile such large patterns, however, depending on the configuration options of libpcre.
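The two examples from the manual text can be reproduced with preg_match(). This is a small sketch, and the variable names are arbitrary.

```php
<?php
// All three pairs of parentheses capture, numbered by opening parenthesis.
preg_match('/the ((red|white) (king|queen))/', 'the red king', $capturing);

// (?:red|white) groups the alternatives without taking a number.
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $nonCapturing);

print_r($capturing);
print_r($nonCapturing);
```

Index 0 always holds the whole match, so the numbered substrings described in the text start at index 1.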
As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns
(?i:saturday|sunday)
(?:(?i)saturday|sunday)
match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name. PHP 5.2.2 introduced two alternative syntaxes (?<name>pattern) and (?'name'pattern).
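A brief sketch of a named subpattern using the (?<name>pattern) syntax. The subject string and the group names are made up for illustration.

```php
<?php
// Each named group appears in the matches array under its name and its number.
preg_match('/(?<year>\d{4})-(?<month>\d{2})/', '2018-08-Bill.PDF', $m);

echo $m['year'], ' ', $m[1], "\n";
echo $m['month'], ' ', $m[2], "\n";
```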
Sometimes it is necessary to have multiple matching, but alternating subgroups in a regular expression. Normally, each of these would be given their own backreference number even though only one of them would ever possibly match. To overcome this, the (?| syntax allows having duplicate numbers. Consider the following regex matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty. Matching Saturday instead yields Sat in backreference 1, while backreference 2 does not exist. Changing the pattern to use (?| fixes this problem:
(?|(Sat)ur|(Sun))day
Using this pattern, both Sun and Sat would be stored in backreference 1.
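The difference shows up directly in the matches array. A small sketch with arbitrary variable names:

```php
<?php
// Separate numbering: "Sun" lands in group 2, group 1 is an empty string.
preg_match('/(?:(Sat)ur|(Sun))day/', 'Sunday', $separate);

// Branch reset: both alternatives write to group 1.
preg_match('/(?|(Sat)ur|(Sun))day/', 'Sunday', $sunday);
preg_match('/(?|(Sat)ur|(Sun))day/', 'Saturday', $saturday);

print_r($separate);
print_r($sunday);
print_r($saturday);
```

With the branch reset, calling code can read `$m[1]` without checking which alternative matched.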
Reference : http://php.net/manual/en/regexp.reference.subpatterns.php
