Creating a regular expression that will match requirements in a string

The issue
I need to write a regular expression that will match the following requirements in a string with the structure {A/B}.
Requirements/Conditions:
A and B must each be exactly one of [UGWRB].
A structure in which neither U nor G appears is invalid.
A structure where both characters are equal is invalid.
U or G must appear in each structure at least once.
The structure can repeat any number of times, as long as each instance is still valid when read alone (see valid matches below).
Valid matches:
{U/G}{U/G}{U/G}
{W/G}{U/B}
{U/G}{U/B}
{U/G}
{G/U}
{U/B}
...
Invalid matches:
{U/U}{U/U}
{U/U}{G/G}
{U/G}{U/U}
{U/G}{R/B}
{G/G}
{R/B}
{W/R}
{B/W}
...
My attempt
This is what I have gotten so far, but out of all the combinations of UGWRB, I'm only getting 8 matches out of 14.
{([UG])(?(1)|\w)\/(?(1)\w|[UG])}

You have to work with both positive and negative lookaheads to accomplish the task:
^(?:{(?=[^{}]*[UG])([UGWRB])\/(?!\1)(?1)})+$
Note that the m flag should be set when the input holds one candidate per line, so that ^ and $ match at each line boundary.
Regex breakdown:
^ Match start of input string
(?: Start of non-capturing group
{ Match { literally
(?= Start of positive lookahead
[^{}]*[UG] Look for [UG] in combination
) End of lookahead
([UGWRB]) Match and capture a letter from character class
\/(?!\1)(?1) Match /, assert that the next char is not the same as the one just captured, then match a letter by reusing the subpattern of group 1
} Match } literally
)+ End of group, repeat at least once
$ Match end of input string
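
As a quick check, the pattern above can be run against the sample strings from the question. This is a minimal sketch, assuming PHP 7 or later; the helper name isValidStructure is made up for illustration, and the braces are escaped here only for readability.

```php
<?php
// Hypothetical helper wrapping the lookahead-based pattern from this answer.
function isValidStructure(string $s): bool
{
    // ~ is used as the delimiter so that / needs no escaping.
    $pattern = '~^(?:\{(?=[^{}]*[UG])([UGWRB])/(?!\1)(?1)\})+$~';
    return preg_match($pattern, $s) === 1;
}

// Sample strings copied from the question.
$valid   = ['{U/G}{U/G}{U/G}', '{W/G}{U/B}', '{U/G}{U/B}', '{U/G}', '{G/U}', '{U/B}'];
$invalid = ['{U/U}{U/U}', '{U/U}{G/G}', '{U/G}{U/U}', '{U/G}{R/B}', '{G/G}', '{R/B}', '{W/R}', '{B/W}'];

foreach (array_merge($valid, $invalid) as $s) {
    echo $s, ' => ', isValidStructure($s) ? 'valid' : 'invalid', PHP_EOL;
}
```

Every string in the first list should be reported as valid and every string in the second as invalid.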

Try this regex:
^(?!.*{([UGWRB])\/\1})(?:{(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG])})+$
Explanation:
^ - matches the start of the string
(?!.*{([UGWRB])\/\1}) - negative lookahead to make sure that the structures like {G/G} or {U/U} or {R/R} are not present anywhere in the string
{ - matches {
(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG]) - Regex conditional. If the current position is followed by either U or G, then match that character followed by / and one character from the class [UGWRB]. Otherwise, match one character from the class [WRB] followed by / followed by U or G
} - matches }
+ - matches 1+ occurrences of the above sub-sequence (?:{(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG])})
$ - matches the end of the string
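
The question mentions 14 expected matches among all combinations of UGWRB. The sketch below, which assumes PHP 5.6 or later and escapes the braces for readability, enumerates all 25 ordered pairs and counts how many single structures the conditional pattern accepts. The variable names are illustrative only.

```php
<?php
// The conditional-based pattern from this answer, with braces escaped.
$pattern = '~^(?!.*\{([UGWRB])/\1\})(?:\{(?(?=[UG])./[UGWRB]|[WRB]/[UG])\})+$~';

$letters = str_split('UGWRB');
$matched = [];
foreach ($letters as $a) {
    foreach ($letters as $b) {
        // Build one {A/B} structure for every ordered pair of letters.
        $candidate = '{' . $a . '/' . $b . '}';
        if (preg_match($pattern, $candidate) === 1) {
            $matched[] = $candidate;
        }
    }
}

echo count($matched), ' of ', count($letters) ** 2, ' pairs match', PHP_EOL;
echo implode(' ', $matched), PHP_EOL;
```

The count comes out at 14: 8 pairs that start with U or G and have a different second letter, plus 6 pairs that start with W, R or B and end in U or G.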

Related

How to capture all phrases which don't have a pattern in the middle?

I want to capture all strings that don't have the pattern a[a-z]* in the specified position, as in the example below:
<?php
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456"
);
foreach ($myStrings as $myStr) {
    // var_dump prints its argument itself, so no echo is needed
    var_dump(
        preg_match("/123-(?!a[a-z]*)-456/i", $myStr)
    );
}
?>

You can use the following pattern:
^(?:123-(?![aA][a-zA-Z]*).*-456|123-456)$
It uses a negative lookahead (?!) to accept only inner sections that do not start with 'a' (or 'A'). The case with no inner section (123-456) is added with the | sign as a second alternative, and the non-capturing group (?:) around both alternatives makes the ^ and $ anchors apply to each of them.

A lookahead is a zero-length assertion, so the middle part still has to be consumed before 456 can match. To consume it, use e.g. \w+- (one or more word characters and a hyphen) inside an optional group that starts with your lookahead condition, and set the i flag for caseless matching.
For searching an array, preg_grep can be used:
preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);
There is also an invert option: PREG_GREP_INVERT. If you don't need to check for the start and end, a simpler pattern like -a[a-z]*- without a lookahead could be used with it.
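
The two preg_grep variants described here can be compared side by side. This is a small sketch using the array from the question; the variable names $byLookahead and $byInvert are made up for illustration.

```php
<?php
$myStrings = ['123-456', '123-7-456', '123-Apple-456', '123-0-456', '123-Alphabet-456'];

// Keep the entries that the anchored lookahead pattern accepts.
$byLookahead = preg_grep('~^123-(?:(?!a[a-z]*-)\w+-)?456$~i', $myStrings);

// Alternatively, drop the entries containing -a...- by inverting a simpler pattern.
$byInvert = preg_grep('~-a[a-z]*-~i', $myStrings, PREG_GREP_INVERT);

// preg_grep preserves the original keys; array_values reindexes from 0.
print_r(array_values($byLookahead));
print_r(array_values($byInvert));
```

For this input both calls keep 123-456, 123-7-456 and 123-0-456. They are not equivalent in general, since only the first one checks the 123- prefix and the 456 suffix.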

Match the unwanted pattern, hyphens included, and invert the result:
!preg_match('/-a[a-z]*-/i', $yourStr);
Don't try to do everything with a regex when programming languages exist to do the job.

You are not getting a match because in the pattern 123-(?!a[a-z]*)-456 the lookahead assertion (?!a[a-z]*) consumes nothing: right after matching the first -, the pattern has to match another hyphen directly, so it effectively behaves like 123--456.
If you move the last hyphen inside the lookahead, as in 123-(?!a[a-z]*-)456, you only get 1 match, for 123-456, because you are still not matching the middle part of the string.
Another option in PHP is to consume the part that you don't want, and then use (*SKIP)(*F):
^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$
Explanation
^ Start of string
123- Match literally
(?: Non capture group for the alternation
a[a-z]*-(*SKIP)(*F) Match a, then optional chars a-z, then -, and make the match fail at that point
| Or
\w+- Match 1+ word chars followed by -
)? Close the non capture group and make it optional to also match when there is no middle part
456 Match literally
$ End of string
Example
$myStrings = array(
    "123-456",
    "123-7-456",
    "123-Apple-456",
    "123-0-456",
    "123-Alphabet-456",
    "123-b-456"
);
foreach ($myStrings as $myStr) {
    if (preg_match("/^123-(?:a[a-z]*-(*SKIP)(*F)|\w+-)?456$/i", $myStr, $match)) {
        echo "Match for $match[0]" . PHP_EOL;
    } else {
        echo "No match for $myStr" . PHP_EOL;
    }
}
Output
Match for 123-456
Match for 123-7-456
No match for 123-Apple-456
Match for 123-0-456
No match for 123-Alphabet-456
Match for 123-b-456

RegEx for matching specific HTML Entity pattern (Emoji)

I am working on this regular expression to match exactly the following pattern. The issue is that if the input exceeds the pattern, it should not match:
I want exactly 6 digits preceded by a single #, but if I write {5} it still returns true. The same happens with ; where I want exactly one, at the end. Also, I don't know how to use $ here to specify the final character.
if(preg_match(('/^(#)+([0-9]{6}){1}(;)/'),"#128515;")){
return true;
}
SHOULD BE IN THIS FORMAT:
#128515; for #DDDDDD; not ##DDDD;;
Exactly 6 digits start with one # and finish with one ;

preg_match returns 1 when the pattern matches the given subject. With {5} and no semicolon or end anchor after it, a subject with six digits still matches, because nothing limits what follows the fifth digit.
You could add the anchors ^ and $ to assert the start and the end of the string, so it matches exactly 6 digits.
From your pattern you can omit {1} because the group is already matched 1 time.
If you don't refer to the groups in your code, you could also omit them and just use a plain match.
You could use:
^#[0-9]{6};$
^ Start of string
# Match #
[0-9]{6}; Match 6 digits followed by ;
$ Assert end of string
Your code could look like
if (preg_match('/^#[0-9]{6};$/', "#128515;")) {
    return true;
}
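
To see the anchored pattern reject the over-long inputs from the question, it can be wrapped in a small function. This is a sketch, not the answer's own code: the name isEmojiEntity is hypothetical, and the D modifier is an addition here, used because in PHP a plain $ also matches just before a trailing newline.

```php
<?php
// Hypothetical helper; the D modifier is an addition not present in the answer.
function isEmojiEntity(string $s): bool
{
    // D makes $ match only at the very end, not before a trailing newline.
    return preg_match('/^#[0-9]{6};$/D', $s) === 1;
}

var_dump(isEmojiEntity('#128515;'));   // one #, six digits, one ;
var_dump(isEmojiEntity('##128515;'));  // two leading #
var_dump(isEmojiEntity('#12851;'));    // only five digits
var_dump(isEmojiEntity('#128515;;'));  // two trailing ;
var_dump(isEmojiEntity("#128515;\n")); // trailing newline
```

Only the first call should yield true; the remaining four inputs break one of the stated constraints.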

Simple regex to match some characters and exclude others

I need a regex for preg_match to accept all alphanumeric characters except l, L, v, V, 0, 2.
I've tried
^[a-zA-Z0-9][^lLvV02]*$
It works well for excluding lLvV02, but it also accepts other characters like space, ù, #, etc.
How should I change it?

You may use
^(?:(?![lLvV02])[a-zA-Z0-9])*$
Details
^ - start of string
(?: - start of a non-capturing group
(?![lLvV02])[a-zA-Z0-9] - an alphanumeric char that is not one of l, L, v, V, 0, 2 (these are excluded by the negative lookahead)
)* - end of the non-capturing group, 0 or more repetitions
$ - end of string

I know you asked for a Regex, but you can test for alphanumeric first and only if that passes check that the others are NOT present:
if (ctype_alnum($string) && !preg_match('/[lLvV02]/', $string)) {
    // pass
} else {
    // fail
}
Or possibly substitute preg_match('/^[^lLvV02]+$/', $string) for the negated check.

Easiest would probably be: ^[a-km-uw-zA-KM-UW-Z13-9]*$.
I'm not saying that it's pretty but it does what it's supposed to.
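
The three suggestions above can be compared on a few inputs. This is a rough sketch; the function names viaLookahead, viaCtype and viaExplicitClass are invented for the comparison and do not come from the answers.

```php
<?php
// Hypothetical wrappers around the three suggestions above.
function viaLookahead(string $s): bool
{
    return preg_match('/^(?:(?![lLvV02])[a-zA-Z0-9])*$/', $s) === 1;
}

function viaCtype(string $s): bool
{
    return ctype_alnum($s) && !preg_match('/[lLvV02]/', $s);
}

function viaExplicitClass(string $s): bool
{
    return preg_match('/^[a-km-uw-zA-KM-UW-Z13-9]*$/', $s) === 1;
}

// One line per input: the quoted input, then 1 or 0 for each approach.
foreach (['abc13', 'hello', 'V8', 'a b', 'a#b', '2020', ''] as $s) {
    printf("%-7s %d %d %d\n", var_export($s, true), viaLookahead($s), viaCtype($s), viaExplicitClass($s));
}
```

The approaches agree on every non-empty input here. They differ on the empty string: both regexes use * and therefore accept it, while ctype_alnum returns false for it. Replacing * with + would make the regexes reject it as well.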

Update a regex that matches twitter like mentions to allow for dots

I have already found helpful answers with regexes that match Twitter-like username mentions:
(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9_]+)
(?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9-_]+)
However, I need to update this regex to also include usernames that have dots.
One or more dots are allowed in a username.
The username must not start or end with a dot.
No two consecutive dots are allowed.
Example of a matched string:
#valid.user.name
^^^^^^^^^^^^^^^^
Examples of non-matched strings:
#.user.name // starts with a dot
#user.name. // ends with a dot
#user..name // has two consecutive dots

You can use this refactored regex:
(?<=[^\w.-]|^)#([A-Za-z]+(?:\.\w+)*)$
RegEx Details:
(?<=[^\w.-]|^): Lookbehind to assert that we have start of line or any non-word, non-dot, non-hyphen character before current position
#: Match a literal #
(: Start capture group
[A-Za-z]+: Match 1+ ASCII letters
(?:\.\w+)*: Match 0 or more instances of a dot followed by 1+ word characters
): End capture group
$: End of string

The (?<=^|(?<=[^a-zA-Z0-9-_\.])) is a positive lookbehind that requires a match to be at the start of the string or right after a char other than an alphanumeric, -, _ or . char. You may write it in a more compact way as (?<![\w.-]), a negative lookbehind.
Next, ([A-Za-z]+[A-Za-z0-9_]+) captures 1+ ASCII letters and then 1+ ASCII letters, digits or underscores. You seem to want to make sure the first char is a letter, and then any number of sequences of . and 1+ word chars are allowed; that is, you may use [A-Za-z]\w*(?:\.\w+)*.
As you do not want to match it if there is a . right after the expected match, you need to add a lookahead that requires whitespace or the end of string, (?!\S).
So, combining it, you can use
'~(?<![\w.-])#([A-Za-z]\w*(?:\.\w+)*)(?!\S)~'
Details
(?<![\w.-]) - no letters, digits, _, . and - immediately to the left of the current location are allowed
# - a # char
([A-Za-z]\w*(?:\.\w+)*) - Group 1:
[A-Za-z] - an ASCII letter
\w* - 0+ letters, digits, _
(?:\.\w+)* - 0+ sequences of
\. - dot
\w+ - 1+ letters, digits, _
(?!\S) - whitespace or end of string are required immediately to the right of the current location.
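
To illustrate how this pattern behaves inside running text, here is a minimal sketch with preg_match_all. The sample sentence and the variable names are made up for the illustration.

```php
<?php
// Made-up sample text containing valid and invalid mentions.
$text = 'cc #valid.user.name and #user..name or #user.name. plus #.user.name also #ok_1';

// Group 1 holds the username without the leading #.
preg_match_all('~(?<![\w.-])#([A-Za-z]\w*(?:\.\w+)*)(?!\S)~', $text, $m);

print_r($m[1]);
```

Only valid.user.name and ok_1 are captured. Because of (?!\S), a mention directly followed by punctuation such as a comma is not matched either, which may or may not be what you want.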

EDIT: Simpler version (same result)
^#[a-zA-Z](\.?[\w-]+)*$
Original
Another one:
^#[a-zA-Z][a-zA-Z_-]?(\.?[\w\d-]+){0,}$
^# starts with #
[a-zA-Z] first char
[a-zA-Z_-]? match one of a-zA-Z_- 0 or 1 times
( start group
\.? match . (optional)
[\w\d-]+ match a-zA-Z0-9-_ 1 or more times
) end group
{0,} repeat group 0 to infinite times
$ end
Tests
valid:
#validusername
#valid.user.name
#valid-user-name
#valid_user-name
#valid-user123_name
#a.valid-user123_name
not valid:
#-invalid.user
#_invalid.user
#1notvalid-user_123name33
#.user.name
#user.name.
#user..name
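
The test lists above can be checked mechanically against the simpler version of the pattern. This is a sketch assuming each username is validated as a whole string; the variable names are illustrative.

```php
<?php
// The "simpler version" from this answer, anchored to the whole string.
$pattern = '/^#[a-zA-Z](\.?[\w-]+)*$/';

$valid = ['#validusername', '#valid.user.name', '#valid-user-name',
          '#valid_user-name', '#valid-user123_name', '#a.valid-user123_name'];
$notValid = ['#-invalid.user', '#_invalid.user', '#1notvalid-user_123name33',
             '#.user.name', '#user.name.', '#user..name'];

foreach (array_merge($valid, $notValid) as $name) {
    echo $name, ' => ', preg_match($pattern, $name) === 1 ? 'valid' : 'not valid', PHP_EOL;
}
```

One design caveat: a repeated group that itself contains a + quantifier can backtrack heavily on long inputs that fail to match, so this form is probably best kept for short strings such as usernames.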

Capturing group with optional start and end characters

I have the following string: find me String1\String2\String3, and I want to capture String1, String2 and String3 if they exist. String3 is optional.
So far, what I could come up with is: (?<=find me)\s(\\?[\w]+\\?){1,3}. My assumption was:
The string should have find me at the beginning, but it should not be captured
a whitespace
a group with \ as an optional character at the beginning, a word following it, and an optional \ at the end; the group can appear from 1 to 3 times.
What is wrong with my regex pattern?

Assuming your regex flavor supports \G, you can use this regex to capture all 3 strings separately:
(?<=find me |(?<!^)\G\\)\w+
\G asserts the position at the end of the previous match, or at the start of the string for the first match. The negative lookbehind (?<!^) rules out the start of the string, so here \G only matches at positions where a previous match ended. For your example, that branch applies twice, i.e. at the end of String1 and at the end of String2.
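
In PHP, which supports \G, the pattern can be tried with preg_match_all. This is a minimal sketch with illustrative variable names; note the doubled escaping needed to get a literal backslash through a single-quoted PHP string and into the regex.

```php
<?php
$subject = 'find me String1\String2\String3';

// In a single-quoted PHP string, \\\\ becomes the regex token \\ (one literal backslash).
$pattern = '/(?<=find me |(?<!^)\G\\\\)\w+/';

preg_match_all($pattern, $subject, $m);
print_r($m[0]);

// The third part is optional: a shorter input simply yields fewer matches.
preg_match_all($pattern, 'find me String1\String2', $short);
print_r($short[0]);
```

The first call returns String1, String2 and String3 as three separate matches, and the second returns only String1 and String2.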
