Regex matching multiple patterns - php

I have these
name
name[one]
name[one][two]
name[one][two][three]
I want to be able to match them like this:
[name]
[name, one]
[name, one, two]
[name, one, two, three]
Here's my regex I've tried:
/([\w]+)(?:(?:\[([\w]+)\])+)?/
I can't quite seem to get it right; it only captures the last pair of square brackets.

You can't have a dynamic number of captures; the number of captures is exactly equal to the number of capture parenthesis pairs ((?:...) don't count). You have two capture parenthesis pairs, that means you get two captures - no more, no less.
To handle variable number of matches, use submatches (in a replace with a function, if your language supports that), or split.
You haven't labeled with a programming language, so this is as specific as I can go.
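Since the question title mentions PHP, here is a minimal sketch of the submatch idea using preg_match_all(). The helper name bracketParts() is made up for illustration, and the sketch assumes every segment consists of word characters only.

```php
<?php
// Hypothetical helper: collect every run of word characters as one part.
function bracketParts(string $input): array
{
    // preg_match_all() returns all matches of the pattern,
    // so the number of parts can vary with the input.
    preg_match_all('/\w+/', $input, $matches);
    return $matches[0];
}

print_r(bracketParts('name[one][two]'));
```

This does not validate the bracket syntax; it only extracts the parts.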

This should do ([\w]+)(?:\[([\w]+)\]\+)?
http://regex101.com/r/mF8pC8/3
Changes from original regex - removed extra capture and added \ before last +.
1st Capturing group ([\w]+)
[\w]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
(?:\[([\w]+)\]\+)? Non-capturing group
Quantifier: Between zero and one time, as many times as possible, giving back as needed [greedy]
\[ matches the character [ literally
2nd Capturing group ([\w]+)
[\w]+ match a single character present in the list below
Quantifier: Between one and unlimited times, as many times as possible, giving back as needed [greedy]
\w match any word character [a-zA-Z0-9_]
\] matches the character ] literally
\+ matches the character + literally
g modifier: global. All matches (don't return on first match)

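You can't collect every repetition of a group in a regex; a repeated capture group only keeps its last match. You can write the group out a fixed number of times, though. This works for one to three groups in square brackets (at least one is required). You can add more optional groups if you like.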
(\w+)\[(\w+)\](?:\[(\w+)\])?(?:\[(\w+)\])?

You cannot have a dynamic number of captures with PHP regexps...
Why not just write something like explode('[',strtr('name[one][two][three]', [']'=>''])) instead? It will give you the desired result.
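The one-liner above can be wrapped in a function and tried against the inputs from the question. This is only a sketch; the function name explodeBrackets() is made up for illustration.

```php
<?php
// Hypothetical helper wrapping the explode()/strtr() one-liner.
function explodeBrackets(string $input): array
{
    // Drop every "]" first, then split on the remaining "[".
    return explode('[', strtr($input, [']' => '']));
}

print_r(explodeBrackets('name[one][two][three]'));
```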

Related

Regular Expression in a METAR

I have some kind of simple and tricky problem.
Here I have a METAR (Weather in a very specific string format).
LIEA 051550Z 21005KT 9999 FEW020 19/14 Q1011
In this string, 051550Z means the weather bulletin was issued on the 5th of the month at 15:50 UTC,... and 9999 indicates the visibility,...
Well, I tried to write a RegExp that would output the visibility, but I couldn't get it to work.
preg_match_all() returns the numbers
0515 (from the time group)
2100 (from the wind group)
9999 (wanted)
1011 (from the pressure group)
with the RegExp I've tried
([0-9]{4})
And then, I blindly added a
(?!Z)
trying not to get at least the time group...
But it doesn't work...
Looking at the problem itself, is it better to always take the third element of the array (without the (?!Z) addition), or to try to capture the right value directly?
In my opinion the latter would be better...
So, how can I get the visibility?
You could use a word boundary \b and then match 4 digits to get the visibility:
\b\d{4}\b
If it has to be the 4 digits in the fourth position, you could also skip the first 3 fields by matching 1+ non-whitespace characters \S+ followed by 1+ horizontal whitespace characters \h+, repeated 3 times.
Then use \K to forget what was matched so far and match 4 digits followed by a word boundary.
^(?:\S+\h+){3}\K\d{4}\b
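Both patterns can be tried on the METAR from the question. A minimal sketch; the helper name metarVisibility() is made up, and the second test string in the usage line is just an illustrative input without a 4-digit visibility group.

```php
<?php
// Hypothetical helper: returns the 4-digit group in the fourth field,
// or null when the fourth field is not exactly that.
function metarVisibility(string $metar): ?string
{
    // Skip three whitespace-separated fields, reset the match start with \K,
    // then require four digits followed by a word boundary.
    if (preg_match('/^(?:\S+\h+){3}\K\d{4}\b/', $metar, $m)) {
        return $m[0];
    }
    return null;
}

$metar = 'LIEA 051550Z 21005KT 9999 FEW020 19/14 Q1011';

// The simpler word-boundary pattern, collecting every standalone 4-digit group.
preg_match_all('/\b\d{4}\b/', $metar, $all);

var_dump(metarVisibility($metar), $all[0]);
var_dump(metarVisibility('LIEA 051550Z 21005KT CAVOK 19/14 Q1011'));
```

The positional version is stricter: it fails instead of picking up some other standalone 4-digit group elsewhere in the string.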
Regex demo

PCRE(php) Is it possible to check if sequence of numbers contains only unique number for that sequence?

Assume I have a set of numbers (from 1 to 22) separated by some trivial delimiters (comma, point, space, etc). I need to make sure that this set of numbers does not contain any repetition of the same number. Examples:
1,14,22,3 // good
1,12,12,3 // not good
Is it possible to do via regular expression?
I know it's easy to do using just PHP, but I really wonder how to make it work with regex.
Yes, you could achieve this through regex via negative lookahead.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$
(?!.*\b(\d+)\b.*\b\1\b) Negative lookahead at the start asserts that there is no repeated number anywhere in the string. \b(\d+)\b.*\b\1\b matches a repeated number.
\d+ matches one or more digits.
(?:,\d+)+ One or more occurrences of a comma followed by one or more digits.
$ Asserts that we are at the end of the string.
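The breakdown above can be checked against the examples from the question. This sketch also includes a plain-PHP check for comparison, as the question hints at one; the function and constant names are made up for illustration.

```php
<?php
// The comma-only pattern from the answer.
const UNIQUE_LIST = '/^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:,\d+)+$/';

// Regex check: true when the list has at least two numbers and no repeats.
function isUniqueByRegex(string $list): bool
{
    return preg_match(UNIQUE_LIST, $list) === 1;
}

// Plain PHP comparison: split on commas and compare counts.
function isUniqueByArray(string $list): bool
{
    $numbers = explode(',', $list);
    return count($numbers) === count(array_unique($numbers));
}

var_dump(isUniqueByRegex('1,14,22,3'), isUniqueByRegex('1,12,12,3'));
```

Note that the regex also rejects a single number with no comma, because (?:,\d+)+ requires at least one delimiter.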
DEMO
OR
Regex for the numbers separated by space, dot, comma as delimiters.
^(?!.*\b(\d+)\b.*\b\1\b)\d+(?:([.\s,])\d+)(?:\2\d+)*$
(?:([.\s,])\d+) The capturing group inside this non-capturing group lets us check that all following delimiters are of the same type, i.e. the above regex won't match strings like 2,3 5.6
DEMO
You can use this regex:
^(?!.*?(\b\d+)\W+\1\b)\d+(\W+\d+)*$
Negative lookahead (?!.*?(\b\d+)\W+\1\b) prevents the match when the same number appears twice in a row, separated by 1 or more non-word characters.
RegEx Demo
Here is the solution that fit my current need:
^(?>(?!\2\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\3\b)(1\d{1}|2[0-2]{1}|\d{1}+)[,.; ]+)(?>(?!\1\b|\2\b)(1\d{1}|2[0-2]{1}|\d{1}+))$
It matches sequences of unique numbers separated by one or more separators, limits each number to the range from 1 to 22, and allows only 3 numbers in the sequence.
See working example
It's not perfect yet, but it works fine! Thanks a lot to everyone who gave me a hand with this!

Regex Lookahead (PHP)

I have a quick question about regex for PHP.
My code:
^(\d{0,4}?)\.(?=(\d{1,2}))$
doesn't seem to work, where it's supposed to capture an optional group of up to 4 digits, then look ahead and conditionally capture a period based on if it captures a group of 1-2 digits.
Does anyone know why this doesn't work?
That's not the right way to do it - nothing about your regex indicates that the . is optional.
Try:
^(\d{0,4})(?:\.(\d{1,2}))?$
This will match up to four digits, which may optionally be followed by a dot, then one or two digits. In any case, the two subpatterns will contain the groups of digits.
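A minimal sketch of how this pattern behaves with preg_match(). The helper name splitNumber() is made up for illustration. One detail to be aware of: preg_match() leaves a trailing group out of the result array when it did not take part in the match, so the sketch defaults it to an empty string.

```php
<?php
// Hypothetical helper: returns [integer part, fraction part],
// or null when the input does not match at all.
function splitNumber(string $input): ?array
{
    if (!preg_match('/^(\d{0,4})(?:\.(\d{1,2}))?$/', $input, $m)) {
        return null;
    }
    // Group 2 is missing from $m when the optional ".dd" part is absent.
    return [$m[1], $m[2] ?? ''];
}

var_dump(splitNumber('1234.56'), splitNumber('12'), splitNumber('12345'));
```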

PHP regex non-capture non-match group

I'm making a date matching regex, and it's all going pretty well, I've got this so far:
"/(?:[0-3])?[0-9]-(?:[0-1])?[0-9]-(?:20)[0-1][0-9]/"
It will (hopefully) match single or double digit days and months, and double or quadruple digit years in the 21st century. A few trials and errors have gotten me this far.
But, I've got two simple questions regarding these results:
(?: ) What is a simple explanation for this? Apparently it's a non-matching group. But then...
What is the trailing ? for? E.g. (?: )?
[Edited (again) to improve formatting and fix the intro.]
This is a comment and an answer.
The answer part... I do agree with alex' earlier answer.
(?: ), in contrast to ( ), is used to avoid capturing text, generally so as to have fewer back references thrown in with those you do want or to improve speed performance.
The ? following the (?: ) -- or when following anything except * + ? or {} -- means that the preceding item may or may not be found within a legitimate match. E.g., /z34?/ will match z3 as well as z34, but it won't match z on its own; in z35 it matches only the z3 part (anchor the pattern as /^z34?$/ to reject z35 entirely).
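Both points can be seen in a short sketch. The input strings 12-34 and z35 are only illustrative.

```php
<?php
// ( ) captures: both numbers end up in the result array.
preg_match('/(\d+)-(\d+)/', '12-34', $capturing);

// (?: ) only groups: the first number is matched but not captured.
preg_match('/(?:\d+)-(\d+)/', '12-34', $nonCapturing);

// ? makes the preceding item optional: unanchored, the "z3" part of "z35" matches.
preg_match('/z34?/', 'z35', $optional);

// Anchored, the whole string must be z3 or z34, so "z35" is rejected.
$anchored = preg_match('/^z34?$/', 'z35');

print_r($capturing);
print_r($nonCapturing);
print_r($optional);
var_dump($anchored);
```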
The comment part... I made what might be considered improvements to the regex you were working on:
(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)
-- First, it avoids things like 0-0-2011
-- Second, it avoids things like 233443-4-201154564
-- Third, it includes things like 1-1-2022
-- Fourth, it includes things like 1-1-11
-- Fifth, it avoids things like 34-4-11
-- Sixth, it allows you to capture the day, month, and year so you can refer to these more easily in code, for example to do a further check on whether a Feb 29 date is valid (if the second captured group is 2, then either the first captured group is 29 and the year is a leap year, or the first captured group is less than 29).
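The further check described in the last point could be sketched as follows. Instead of hand-writing the leap-year test, this sketch hands the captured groups to PHP's built-in checkdate(), which is a swapped-in technique rather than what the answer spells out. The helper name parseDate() is made up, and treating two-digit years as 20xx is an assumption.

```php
<?php
const DATE_PATTERN = '/(?:^|\s)(0?[1-9]|[1-2][0-9]|30|31)-(0?[1-9]|10|11|12)-((?:20)?[0-9][0-9])(?:\s|$)/';

// Hypothetical helper: returns [day, month, year] for a real calendar date,
// or null when the text does not match or the date does not exist.
function parseDate(string $text): ?array
{
    if (!preg_match(DATE_PATTERN, $text, $m)) {
        return null;
    }
    [$day, $month, $year] = [(int) $m[1], (int) $m[2], (int) $m[3]];
    if ($year < 100) {
        $year += 2000; // assumption: two-digit years mean 20xx
    }
    // checkdate() takes month, day, year and knows about leap years.
    return checkdate($month, $day, $year) ? [$day, $month, $year] : null;
}

var_dump(parseDate('29-2-2012'), parseDate('29-2-2011'));
```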
Finally, note that you'll still get dates that won't exist, eg, 31-6-11. If you want to avoid these, then try:
(?:^|\s)(?:(?:(0?[1-9]|[1-2][0-9]|30|31)-(0?[13578]|10|12))|(?:(0?[1-9]|[1-2][0-9]|30)-(0?[469]|11))|(?:(0?[1-9]|[1-2][0-9])-(0?2)))-((?:20)?[0-9][0-9])(?:\s|$)
Also, I assumed the dates would be preceded and followed by a space (or beginning/end of line), but you may want to adjust that (e.g., to allow punctuation).
A commenter elsewhere referenced this resource which you might find useful:
http://rubular.com/
It is a non capturing group. You can not back reference it. Usually used to declutter backreferences and/or increase performance.
It means the previous capturing group is optional.
Subpatterns
Subpatterns are delimited by parentheses (round brackets), which can be nested. Marking part of a pattern as a subpattern does two things:
It localizes a set of alternatives. For example, the pattern
cat(aract|erpillar|) matches one of the words "cat", "cataract", or
"caterpillar". Without the parentheses, it would match "cataract",
"erpillar" or the empty string.
It sets up the subpattern as a capturing subpattern (as defined
above). When the whole pattern matches, that portion of the subject
string that matched the subpattern is passed back to the caller via
the ovector argument of pcre_exec(). Opening parentheses are counted
from left to right (starting from 1) to obtain the numbers of the
capturing subpatterns.
For example, if the string "the red king" is matched against the pattern the ((red|white) (king|queen)) the captured substrings are "red king", "red", and "king", and are numbered 1, 2, and 3.
The fact that plain parentheses fulfill two functions is not always helpful. There are often times when a grouping subpattern is required without a capturing requirement. If an opening parenthesis is followed by "?:", the subpattern does not do any capturing, and is not counted when computing the number of any subsequent capturing subpatterns. For example, if the string "the white queen" is matched against the pattern the ((?:red|white) (king|queen)) the captured substrings are "white queen" and "queen", and are numbered 1 and 2. The maximum number of captured substrings is 65535. It may not be possible to compile such large patterns, however, depending on the configuration options of libpcre.
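The two examples from the manual text can be reproduced with preg_match(). A small sketch:

```php
<?php
// Plain parentheses: every group captures, numbered by opening parenthesis.
preg_match('/the ((red|white) (king|queen))/', 'the red king', $capturing);

// With (?: ), the colour alternation groups without capturing,
// so the following group moves up to number 2.
preg_match('/the ((?:red|white) (king|queen))/', 'the white queen', $mixed);

print_r($capturing);
print_r($mixed);
```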
As a convenient shorthand, if any option settings are required at the start of a non-capturing subpattern, the option letters may appear between the "?" and the ":". Thus the two patterns
(?i:saturday|sunday)
(?:(?i)saturday|sunday)
match exactly the same set of strings. Because alternative branches are tried from left to right, and options are not reset until the end of the subpattern is reached, an option setting in one branch does affect subsequent branches, so the above patterns match "SUNDAY" as well as "Saturday".
It is possible to name a subpattern using the syntax (?P<name>pattern). This subpattern will then be indexed in the matches array by its normal numeric position and also by name. PHP 5.2.2 introduced two alternative syntaxes (?<name>pattern) and (?'name'pattern).
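A named subpattern shows up in the matches array under both its name and its number. A small sketch; the group names day and month and the input string are made up for illustration.

```php
<?php
// Two of the naming syntaxes side by side: (?<name>...) and (?P<name>...).
preg_match('/(?<day>\d{1,2})-(?P<month>\d{1,2})/', '5-11', $m);

// Each named group is available by name and by numeric position.
var_dump($m['day'], $m[1], $m['month'], $m[2]);
```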
Sometimes it is necessary to have multiple matching, but alternating subgroups in a regular expression. Normally, each of these would be given their own backreference number even though only one of them would ever possibly match. To overcome this, the (?| syntax allows having duplicate numbers. Consider the following regex matched against the string Sunday:
(?:(Sat)ur|(Sun))day
Here Sun is stored in backreference 2, while backreference 1 is empty. Matching Saturday instead yields Sat in backreference 1 while backreference 2 does not exist. Changing the pattern to use the (?| fixes this problem:
(?|(Sat)ur|(Sun))day
Using this pattern, both Sun and Sat would be stored in backreference 1.
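The difference between (?: and (?| is easy to see in the matches array. A minimal sketch:

```php
<?php
// Without branch reset, (Sat) is group 1 and (Sun) is group 2.
preg_match('/(?:(Sat)ur|(Sun))day/', 'Sunday', $plain);

// With (?| each alternative restarts numbering, so both use group 1.
preg_match('/(?|(Sat)ur|(Sun))day/', 'Sunday', $resetSun);
preg_match('/(?|(Sat)ur|(Sun))day/', 'Saturday', $resetSat);

print_r($plain);
print_r($resetSun);
print_r($resetSat);
```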
Reference : http://php.net/manual/en/regexp.reference.subpatterns.php

regex pattern to match alternating subpatterns

I'm trying to devise a regex pattern (in PHP) which will allow for any alternation of two subpatterns. So if pattern A matches a group of three letters, and B matches a group of 2 numerals, all of these would be OK:
aaa
aaa66bbb
66
67abc
12abc34def56ghi78jkl
I don't mind which subpattern starts or ends the sequence, just that after the first match, the subpatterns must alternate. I'm totally stumped by this - any advice will be gratefully received!
Here's a general solution:
^(?:[a-z]{3}(?![a-z]{3})|[0-9]{2}(?![0-9]{2}))+$
It's a simple alternation--three letters or two digits--but the negative lookaheads ensure that the same alternative is never matched twice in a row. Here's a slightly more elegant solution just for PHP:
/^(?:([a-z]{3})(?!(?1))|([0-9]{2})(?!(?2)))+$/
Instead of typing the same subpatterns multiple times, you can put them in capturing groups and use (?1), (?2), etc. to apply them again wherever else you want--in this case, in the lookaheads.
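Both variants can be run against the sample strings from the question. A sketch only; the constant and function names are made up, and the rejected inputs aaabbb and 6677 are illustrative.

```php
<?php
// General version: each alternative forbids itself from following directly.
const ALTERNATING = '/^(?:[a-z]{3}(?![a-z]{3})|[0-9]{2}(?![0-9]{2}))+$/';

// PCRE version reusing the groups via subroutine calls (?1) and (?2).
const ALTERNATING_SUBROUTINE = '/^(?:([a-z]{3})(?!(?1))|([0-9]{2})(?!(?2)))+$/';

// Hypothetical helper: true when the whole input alternates correctly.
function alternates(string $pattern, string $input): bool
{
    return preg_match($pattern, $input) === 1;
}

foreach (['aaa', 'aaa66bbb', '66', '67abc', '12abc34def56ghi78jkl', 'aaabbb', '6677'] as $s) {
    var_dump(alternates(ALTERNATING, $s));
}
```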
"/^(?:$A(?:$B$A)*$B?|$B(?:$A$B)*$A?)\$/"
will match either pattern A followed by however many alternating pattern B's and pattern A's, and maybe a final B...or a B followed by however many A-B pairs plus an A if it's there.
I've made this a string (and escaped the final $) because you're going to have some interpolation to do. Make sure $A and $B are in some kind of grouping (like parentheses) if you want the ?'s to apply to the right thing. In your examples, $A might be '([a-zA-Z]{3})' and $B might be '(\d\d)'.
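With the suggested values for $A and $B, the interpolated pattern can be tried like this. A sketch; the braces around the variables are only there to make the interpolation easier to read, and the rejected input aaabbb is illustrative.

```php
<?php
// Subpatterns as suggested above, each wrapped in its own group.
$A = '([a-zA-Z]{3})';
$B = '(\d\d)';

// Double-quoted so $A and $B are interpolated; \$ keeps the end anchor literal.
$pattern = "/^(?:{$A}(?:{$B}{$A})*{$B}?|{$B}(?:{$A}{$B})*{$A}?)\$/";

foreach (['aaa', 'aaa66bbb', '66', '12abc34def56ghi78jkl', 'aaabbb'] as $s) {
    var_dump(preg_match($pattern, $s));
}
```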
Note, if you want to match some number of the same letter or digit, or instances of the same set of letters or digits, you'll need to do some magic with backreferences -- probably named ones, since any numbered backreference will depend on the number of capture groups before the one you want (or between the one you want and where you are), but that number gets complicated if the subpatterns have parentheses in them.
Take a look at this (and check conditional subpatterns). I've personally never used them but seems to be what you're looking for.
/\b(?:(([a-z])\2\2)(?:(([0-9])\4)\1)*(?:([0-9])\5)?|(([0-9])\7)(?:(([a-z])\9\9)\6)*(?:([a-z])\10\10)?)\b/
or if you want to allow any non digit char in the group of three:
/\b(?:((\D)\2\2)(?:((\d)\4)\1)*(?:(\d)\5)?|((\d)\7)(?:((\D)\9\9)\6)*(?:(\D)\10\10)?)\b/
This will match any pattern that consists of two alternating groups: one group consists of the same character 3 times, and the other of the same digit 2 times.
This Regex will match
aaa
11
bbb22
33ccc
ddd44ddd
55eee55
fff66fff66
77ggg77ggg
But not
aaa11bbb
