Capturing group with optional start and end characters - php

I have the following string: find me String1\String2\String3, and I want to capture String1, String2 and String3 if they exist. String3 is optional.
So far, what I have come up with is (?<=find me)\s(\\?[\w]+\\?){1,3}. My assumptions were:
The string should have find me at the beginning, but it should not be captured
a whitespace
a group with \ as an optional character at the beginning, a word following it, and an optional \ at the end; the group can appear 1 to 3 times.
What is wrong with my regex pattern?

Assuming your regex flavor supports \G, you can use this regex to capture all 3 strings separately:
(?<=find me |(?<!^)\G\\)\w+
\G asserts position at the end of the previous match or the start of the string for the first match.
\G matches at either the start of the line or the end of the previous match. Here the negative lookbehind (?<!^) rules out the line start, so \G only matches at positions where a previous match ended. For your example, that happens twice: at the end of String1 and at the end of String2.
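As a minimal sketch of how this pattern might be used from PHP (the sample string is the one from the question; the variable names are only illustrative), preg_match_all collects each segment as a separate match:

```php
<?php
// Sample input from the question; single quotes keep the backslashes literal.
$subject = 'find me String1\String2\String3';

// In a single-quoted PHP string, four backslashes produce the regex token
// for one literal backslash.
$pattern = '/(?<=find me |(?<!^)\G\\\\)\w+/';

preg_match_all($pattern, $subject, $matches);

// $matches[0] holds the full matches, one per segment.
print_r($matches[0]);
```

Because each segment is its own match rather than a repeated capture group, the same pattern also handles an input where String3 is missing.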

Related

regex (regular expression): parse the optional end of the string if it exists

Please help me to find the end of the string and parse it.
I have strings like "Some words" and "Some another words||MyTag". Please help me with a regex that handles both of these strings and, in the second case, extracts "MyTag". So there should be two groups: "Some words" or "Some another words", and MyTag or an empty string.
I tried "^([\W\w]+)(?:||(.*))?" but without any success.
You can create 2 capture groups:
^(.+?)(?:\|\|(.+))?$
^ Start of string
(.+?) Capture group 1, match any character except newlines, as few times as possible
(?: Non capture group
\|\|(.+) Match || and capture 1+ chars in group 2
)? Close non capture group and make it optional
$ End of string
If the second part is not present, then there will be no group 2 value.
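A small sketch of how the two groups could be read in PHP. The helper name parseTagged and the returned array keys are assumptions made for illustration, not part of the original answer:

```php
<?php
// Hypothetical helper: splits "text||tag" into its two parts.
function parseTagged(string $input): ?array
{
    if (preg_match('/^(.+?)(?:\|\|(.+))?$/', $input, $m) !== 1) {
        return null; // no match, e.g. for an empty string
    }
    // Group 2 is absent from $m when the "||tag" part did not match,
    // so fall back to an empty string.
    return ['text' => $m[1], 'tag' => $m[2] ?? ''];
}

print_r(parseTagged('Some words'));
print_r(parseTagged('Some another words||MyTag'));
```

The `?? ''` fallback turns the missing group 2 into the empty string the question asks for.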
You might also consider splitting on ||
Another option, without a non-greedy dot, is to match a | only when it is not directly followed by another |, using a negative lookahead.
^([^\r\n|]*(?:\|(?!\|)[^\r\n|]*)*)(?:\|\|(.+))?$
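To compare the two alternatives, here is a rough PHP sketch. The input 'a|b||MyTag' is a made-up example chosen so that a single | appears in the first part:

```php
<?php
$input = 'a|b||MyTag'; // made-up sample with a single | in the text part

// Variant with the negative lookahead: a lone | is allowed in group 1.
$pattern = '/^([^\r\n|]*(?:\|(?!\|)[^\r\n|]*)*)(?:\|\|(.+))?$/';
preg_match($pattern, $input, $m);

// Splitting instead: the limit of 2 keeps any later || inside the tag.
$parts = explode('||', $input, 2);

print_r([$m[1], $m[2] ?? '']);
print_r($parts);
```

For this input both approaches give the same two pieces. With explode, the second element is simply missing when the string contains no ||.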

RegEx for matching specific HTML Entity pattern (Emoji)

I am working on a regular expression that must match exactly the following pattern. If the input exceeds the pattern, it should not be accepted:
I want exactly 6 digits preceded by #, but if I write {5} it still returns true. The same happens with ; where I want exactly one, at the end. Also, I don't know how to use $ here to specify the final character.
if(preg_match(('/^(#)+([0-9]{6}){1}(;)/'),"#128515;")){
return true;
}
SHOULD BE IN THIS FORMAT:
#128515; that is #DDDDDD; and not ##DDDD;;
Exactly 6 digits, starting with one # and finishing with one ;
preg_match returns 1 when the pattern matches the given subject. If the subject has 6 digits, a pattern with {5} and no trailing semicolon still matches, because nothing marks where the match must end.
You could add anchors ^ and $ to assert the start and the end of the string so it matches exactly 6 digits.
From your pattern you can omit {1} because the group is already matched 1 time.
If you don't refer to the groups in your code, you could also omit them and just use the match.
You could use:
^#[0-9]{6};$
^ Start of string
# Match #
[0-9]{6}; Match 6 digits followed by ;
$ Assert end of string
Your code could look like
if (preg_match('/^#[0-9]{6};$/', "#128515;")) {
    return true;
}
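To see the effect of the anchors, a minimal sketch follows. The function name isEmojiEntity is hypothetical and only wraps the pattern above:

```php
<?php
// Hypothetical wrapper around the anchored pattern.
function isEmojiEntity(string $s): bool
{
    // preg_match returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^#[0-9]{6};$/', $s) === 1;
}

foreach (['#128515;', '##1285;;', '#12851;', '#1285155;', '#128515;;'] as $s) {
    echo $s, ' => ', isEmojiEntity($s) ? 'match' : 'no match', PHP_EOL;
}
```

Only the first input has exactly one #, six digits and one ; so it is the only one that matches.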

Creating a regular expression that will match requirements in a string

The issue
I need to write a regular expression that will match the following requirements in a string with the structure {A/B}.
Requirements/Conditions:
A and B can only be exactly one of [UGWRB].
A structure where U or G do not appear is invalid.
A structure where both characters are equal is invalid.
U or G must appear in the combination at least once.
The structure can repeat any number of times, as long as each instance is still valid when read on its own (see valid matches below).
Valid matches:
{U/G}{U/G}{U/G}
{W/G}{U/B}
{U/G}{U/B}
{U/G}
{G/U}
{U/B}
...
Invalid matches:
{U/U}{U/U}
{U/U}{G/G}
{U/G}{U/U}
{U/G}{R/B}
{G/G}
{R/B}
{W/R}
{B/W}
...
My attempt
This is what I have gotten so far, but out of all the combinations of UGWRB, I'm only getting 8 matches out of 14.
{([UG])(?(1)|\w)\/(?(1)\w|[UG])}
You have to work with lookaheads both negative and positive in order to accomplish the task:
^(?:{(?=[^{}]*[UG])([UGWRB])\/(?!\1)(?1)})+$
Note that the m flag should be set if you test several strings at once, one per line.
Regex breakdown:
^ Match start of input string
(?: Start of non-capturing group
{ Match { literally
(?= Start of positive lookahead
[^{}]*[UG] Look for [UG] in combination
) End of lookahead
([UGWRB]) Match and capture a letter from character class
\/(?!\1)(?1) Match /, assert that the next char is not the same as the one just captured, then match a letter by reusing the pattern of group 1
} Match } literally
)+ End of group, repeat at least once
$ Match end of input string
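The breakdown above can be checked against some of the samples from the question with a short PHP sketch. The helper name isValidStructure is an assumption for illustration; the m flag is not needed here because each string is tested on its own:

```php
<?php
// Hypothetical helper wrapping the lookahead-based pattern.
function isValidStructure(string $s): bool
{
    $pattern = '/^(?:{(?=[^{}]*[UG])([UGWRB])\/(?!\1)(?1)})+$/';
    return preg_match($pattern, $s) === 1;
}

// Samples taken from the question.
$valid   = ['{U/G}{U/G}{U/G}', '{W/G}{U/B}', '{U/G}', '{G/U}', '{U/B}'];
$invalid = ['{U/U}{U/U}', '{U/G}{U/U}', '{U/G}{R/B}', '{G/G}', '{R/B}', '{W/R}'];

foreach (array_merge($valid, $invalid) as $s) {
    echo $s, ' => ', isValidStructure($s) ? 'valid' : 'invalid', PHP_EOL;
}
```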
Try this regex:
^(?!.*{([UGWRB])\/\1})(?:{(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG])})+$
Explanation:
^ - matches the start of the string
(?!.*{([UGWRB])\/\1}) - negative lookahead to make sure that the structures like {G/G} or {U/U} or {R/R} are not present anywhere in the string
{ - matches {
(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG]) - Regex conditional. If the current position is followed by either U or G, then match that character followed by / and the character class [UGWRB]. Otherwise, match the character class [WRB] followed by / followed by U or G
} - matches }
+ - matches 1+ occurrences of the above sub-sequence (?:{(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG])})
$ - matches the end of the string
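A similar sketch for the conditional-based pattern, again run against samples from the question. The function name matchesWithConditional is hypothetical:

```php
<?php
// Hypothetical helper wrapping the pattern that uses a regex conditional.
function matchesWithConditional(string $s): bool
{
    $pattern = '/^(?!.*{([UGWRB])\/\1})(?:{(?(?=[UG]).\/[UGWRB]|[WRB]\/[UG])})+$/';
    return preg_match($pattern, $s) === 1;
}

// Samples taken from the question.
$valid   = ['{U/G}{U/G}{U/G}', '{W/G}{U/B}', '{U/G}{U/B}', '{G/U}'];
$invalid = ['{U/U}{G/G}', '{U/G}{U/U}', '{U/G}{R/B}', '{G/G}', '{B/W}'];

foreach (array_merge($valid, $invalid) as $s) {
    echo $s, ' => ', matchesWithConditional($s) ? 'valid' : 'invalid', PHP_EOL;
}
```

Here the leading negative lookahead rejects equal pairs anywhere in the string, and the conditional ensures that each pair contains U or G.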

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which a letter (or several letters) is repeated more than once, with the repeats next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character and captures it in \1
(?=\1+) positive lookahead; asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches zero or more non-whitespace characters
EDIT
If the letter must repeat more than once, a slight modification of the regex would do the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
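The difference between the two quantifiers can be seen in a small PHP sketch. The input 'apple exxxmaple' is a made-up sample, and the variable names are illustrative:

```php
<?php
$text = 'apple exxxmaple'; // made-up sample: one double letter, one triple

// \1+ : the captured character is followed by itself at least once.
preg_match_all('/\S*(\w)(?=\1+)\S*/', $text, $doubled);

// \1{2,} : the captured character is followed by itself at least twice.
preg_match_all('/\S*(\w)(?=\1{2,})\S*/', $text, $tripled);

print_r($doubled[0]);
print_r($tripled[0]);
```

The first pattern keeps both words, while the second one drops apple because its p appears only twice in a row.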
Use this if you want to discard words like apple etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, like the lazy quantifier, it reduces the number of steps needed to obtain a match. Note too that if you are looking for words in the common sense, you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks whether a repeated character is present. A word character is captured in group 1 and must be followed twice by the content of the capture group (the backreference \1).
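Applied to the sentence from the question, this pattern could be used as in the following sketch (the variable names are illustrative):

```php
<?php
$str = 'This is an exxxmaple oooonnnnllllyyyyy!';

// Collect every word that contains the same word character three times in a row.
preg_match_all('/\b\w*?(\w)\1{2}\w*/', $str, $arr);

// $arr[0] holds the full words; $arr[1] holds the repeated character of each.
print_r($arr[0]);
```

Since the pattern is built from \w only, the trailing ! is not part of the second match.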

regex matches numbers, but not letters

I have a string that looks like this:
[if-abc] 12345 [if-def] 67890 [/if][/if]
I have the following regex:
/\[if-([a-z0-9-]*)\]([^\[if]*?)\[\/if\]/s
This matches the inner brackets just like I want it to. However, when I replace the 67890 with text (ie. abcdef), it doesn't match it.
[if-abc] 12345 [if-def] abcdef [/if][/if]
I want to be able to match ANY characters, including line breaks, except for another opening bracket [if-.
This part doesn't work like you think it does:
[^\[if]
This matches a single character that is none of [, i or f, regardless of how they are combined. You can mimic the desired behavior using a negative lookahead though:
~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s
I've also included closing tags in the lookahead, as this avoids the ungreedy repetition (which is usually worse performance-wise). Plus, I've changed the delimiters, so that you don't have to escape the slash in the pattern.
So this is the interesting part ((?:(?!\[/?if).)*) explained:
(               # capture the contents of the tag-pair
  (?:           # start a non-capturing group (the ?: is just a performance
                # optimization). this group represents a single "allowed" character
    (?!         # negative lookahead - makes sure that the next character does not mark
                # the start of either [if or [/if (the negative lookahead will cause
                # the entire pattern to fail if its contents match)
      \[/?if    # match [if or [/if
    )           # end of lookahead
    .           # consume/match any single character
  )*            # end of group - repeat 0 or more times
)               # end of capturing group
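A brief sketch of this pattern in PHP, run against the string from the question that previously failed (the variable names are illustrative):

```php
<?php
$input = '[if-abc] 12345 [if-def] abcdef [/if][/if]';

// ~ as delimiter, so the slash in [/if] needs no escaping;
// the s flag lets . match line breaks as well.
$pattern = '~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s';

if (preg_match($pattern, $input, $m) === 1) {
    echo 'tag: ', $m[1], PHP_EOL;       // name of the innermost tag
    echo 'content: ', $m[2], PHP_EOL;   // text between the innermost tags
}
```

The outer [if-abc] cannot match because its content would have to run across [if-def], so the first match found is the innermost pair.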
Modifying a little results in:
/\[if-([a-z0-9-]+)\](.+?)(?=\[if)/s
Running it on [if-abc] 12345 [if-def] abcdef [/if][/if]
Results in a first match as: [if-abc] 12345
Your groups are: abc and 12345
And modifying even further:
/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s
matches both groups, although the delimiter [/if] is not captured by either of these patterns.
NOTE: Instead of matching the delimiters, I used a lookahead ((?=)) in the regex to stop when the text ahead matches the lookahead.
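As a rough illustration of the lookahead variant, preg_match_all can collect both tag names and their contents (the variable names are illustrative):

```php
<?php
$input = '[if-abc] 12345 [if-def] abcdef [/if][/if]';

// The lookahead stops each match right before the next [if or [/if,
// without consuming it.
$pattern = '/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s';

preg_match_all($pattern, $input, $m);

print_r($m[1]); // tag names
print_r($m[2]); // contents, including the surrounding spaces
```

Note that the captured contents keep their leading and trailing spaces; trim() could be applied afterwards if they are not wanted.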
Use a period to match any character.
