Regex group include if condition - php

I tried to use the regex /^(\S+)(?:\?$|$)/
with yolo and yolo?
It works with both, but on the second string (yolo?) the ? is included in the capturing group (\S+).
Is this a bug in the regex engine, or have I made a mistake?
Edit: I don't want the '?' to be included in the capturing group. Sorry for my bad English.

You can use one of the following options.
If what you want to capture can't have a ? in it, use a negated character class [^...]:
^([^\s?]+)\??$
If what you want to capture can have a ? in it (for example, yolo?yolo? and you want
yolo?yolo), you need to make your quantifier + lazy by adding ?:
^(\S+?)\??$
BTW, there is no need for a capturing group here; you can use a lookahead (?=...) instead and look at the whole match:
^[^\s?]+(?=\??$)
What was happening
The rules are: quantifiers (like +) are greedy by default, and the regex engine will return the first match it finds.
Consider what this means here:
\S+ will first match everything in yolo?, then the engine will try to match (?:\?$|$).
\?$ fails (we're already at the end of the string, so we now try to match an empty string and there's no ? left), but $ matches.
The regex has successfully reached its end, so the engine returns the match in which \S+ has matched the whole string and everything is in the first capturing group.
To match what you want you have to make the quantifier lazy (+?), or prevent the character class (yeah, \S is a character class) from matching your ending delimiter ? (with [^\s?] for example).
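To see the difference in practice, here is a minimal sketch that runs the original greedy pattern and the three fixes above against both inputs. The array layout and variable names are just for illustration.

```php
<?php
// Each entry: [pattern, index of the group that holds the wanted text].
$patterns = [
    'greedy'    => ['/^(\S+)(?:\?$|$)/', 1],   // original pattern from the question
    'lazy'      => ['/^(\S+?)\??$/', 1],       // lazy quantifier
    'negated'   => ['/^([^\s?]+)\??$/', 1],    // negated character class
    'lookahead' => ['/^[^\s?]+(?=\??$)/', 0],  // no group, whole match
];

$results = [];
foreach ($patterns as $name => [$pattern, $group]) {
    foreach (['yolo', 'yolo?'] as $input) {
        preg_match($pattern, $input, $m);
        $results[$name][$input] = $m[$group];
    }
}

foreach ($results as $name => $byInput) {
    printf("%-9s yolo => %s, yolo? => %s\n", $name, $byInput['yolo'], $byInput['yolo?']);
}
```

Only the greedy pattern keeps the trailing ? for yolo?; the other three return yolo for both inputs.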

This is the correct behavior, as \S+ greedily matches one or more non-whitespace characters, of which ? is one.
Thus the question mark is matched in the (\S+) group and the non-capturing group resolves to $. You could make it work as you expect by making the match non-greedy with:
/^(\S+?)(?:\?$|$)/
Alternatively, you could restrict the character class:
/^([^\s?]+)(?:\?$|$)/

Make the + non-greedy:
^(\S+?)\??$

The regex below captures all the non-space characters followed by an optional ?,
^([\S]+)\??$
or
^([\w]+)\??$
If you use \S+, it also matches the ? character, so the first regex still captures a trailing ? in the group. To separate word and non-word characters, use the second regex: it captures only the word characters and then matches the optional ? that follows them.

It is doing that because \S matches any non-whitespace character and the quantifier is greedy.
Following the + quantifier with ? for a non-greedy match will prevent this.
^(\S+?)\??$
Or use \w here which matches any word character.
^(\w+)\??$

Related

regex skip match if its follows by whitespace and a keyword

I am trying to match comments with a regex, but only if no function follows.
Currently I use a regex which also matches the keyword function,
and then check in the source code (PHP) whether this group is set or not.
/\/\*\*.*?\*\/\s*(function)?/sg
https://regex101.com/r/l0j1ip/1
Now the question is whether this can be done with pure regex.
I tried a simple negative lookahead, but without success.
The comment is then no longer matched on its own, but only together with the subsequent comment.
/\/\*\*.*?\*\/\s*(?!function)/sg
https://regex101.com/r/PuUUw6/1
Next I tried a non-capturing group, but also without success.
/(?:\/\*\*.*?\*\/\s*function)|\/\*\*.*?\*\/\s*/sg
https://regex101.com/r/wkQE7E/1
After a comment suggested (*SKIP)(*FAIL), I tried that too, without success.
All matches above this keyword are skipped, including the standalone matches.
/\/\*\*.*?\*\/\s*function(*SKIP)(*FAIL)|\/\*\*.*?\*\//sg
https://regex101.com/r/OJSFrF/1
After reading the question again, it should be doable with a negative lookahead; the \s* repetition must be inside the lookahead:
/\/\*\*((?!\*\/).)*\*\/(?!\s*function)/sg
It helps to understand how backtracking works here. With .*? instead of .*, the regex engine first tries to match what follows .*? before consuming another character. When the negative lookahead fails, however, the engine backtracks and .*? keeps consuming characters, past the first \*\/. ((?!\*\/).)* can never match \*\/, whereas .*? can after backtracking.
Another solution is to use an atomic group: (?>\/\*\*.*?\*\/)(?!\s*function).
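A small sketch of the backtracking behavior described above. The sample source string is made up, and ~ is used as the delimiter so the slashes don't need escaping.

```php
<?php
// Hypothetical sample: one comment followed by a function, one standalone comment.
$src = "/** first */\nfunction foo() {}\n\n/** second */\n\$x = 1;\n";

// Lazy dot: after the lookahead fails, .*? backtracks past the first */.
$lazy     = '~/\*\*.*?\*/(?!\s*function)~s';
// Tempered dot: the repetition can never run across */.
$tempered = '~/\*\*((?!\*/).)*\*/(?!\s*function)~s';
// Atomic group: once the comment is matched, the engine cannot backtrack into it.
$atomic   = '~(?>/\*\*.*?\*/)(?!\s*function)~s';

preg_match_all($lazy, $src, $lazyMatches);
preg_match_all($tempered, $src, $temperedMatches);
preg_match_all($atomic, $src, $atomicMatches);

var_dump($lazyMatches[0]);      // one match spanning both comments
var_dump($temperedMatches[0]);  // only "/** second */"
var_dump($atomicMatches[0]);    // only "/** second */"
```

The lazy version swallows the function between the two comments, which is the "matched with the subsequent comment" effect from the question.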
Another option without the /s flag could be
/\*\*(?:[^*]*+|\*(?!/)[^*]*+)*\*/(?!\s*function)
The pattern matches:
/\*\* Match /**
(?: Non capture group
[^*]*+ Match any char except * using a possessive quantifier
| Or
\*(?!/) Match * not followed by /
[^*]*+ Match any char except * using a possessive quantifier
)* Close non capture group and optionally repeat
\*/ Match */
(?!\s*function) Negative lookahead, assert that what follows to the right is not optional whitespace chars followed by function
Note that you don't have to escape the forward slash when using a different delimiter.
$regex = '~/\*\*(?:[^*]*+|\*(?!/)[^*]*+)*\*/(?!\s*function)~';
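As a usage sketch, the pattern can be passed to preg_match_all. The sample input below is invented; it includes a multi-line docblock with a stray * inside to exercise the \*(?!/) branch.

```php
<?php
$regex = '~/\*\*(?:[^*]*+|\*(?!/)[^*]*+)*\*/(?!\s*function)~';

// Hypothetical input: a docblock in front of a function, then a standalone comment.
$src = "/**\n * Docs for foo * with a star\n */\nfunction foo() {}\n\n"
     . "/** standalone note */\n\$x = 1;\n";

preg_match_all($regex, $src, $matches);

// Only the comment that is not followed by "function" is returned.
print_r($matches[0]);
```

Because of the possessive quantifiers, the first docblock cannot be re-matched in a different way once the lookahead rejects it.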

Regex match exact words

I want my regex to match ?ver and ?v, but not ?version
This is what I have so far: $parts = preg_split( "(\b\?ver\b|\b\?v\b)", $src );
I think the trouble might be how I escape the ?.
Your pattern tries to match a ? that is preceded with a word char, and since there is none, you do not have a match.
Use the following pattern:
'/\?v(?:er)?\b/'
Pattern details:
\? - a literal ? char
v(?:er)? - v or ver
\b - a word boundary (i.e. there must be a non-word char (not a digit, letter or _) or end of string after v or ver).
Note you do not need the first (initial) word boundary as it is already there, between a ? (a non-word char) and v (a word char). You would need a word boundary there if the ? were optional.
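A quick check of that pattern with preg_split. The sample strings are made up for illustration.

```php
<?php
$pattern = '/\?v(?:er)?\b/';

// Hypothetical inputs covering the three cases from the question.
$sources = ['style.css?ver=1.2', 'style.css?v=3', 'style.css?version=4'];

$parts = [];
foreach ($sources as $src) {
    $parts[$src] = preg_split($pattern, $src);
}

print_r($parts);
```

The first two strings are split at ?ver and ?v; the third comes back as a single element because \b cannot match inside version.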
Try the following regex pattern:
(\?v(?:\b|(?:er(?!sion))))
This will allow ?ver and ?v, but will use a negative look-ahead to prevent matching if ?ver is followed by sion, as in your case ?version.
Building on the answers above, to match a word only when it is not part of another word, you can use
\b(WORD_HERE)\b, for example \b(ver)\b.
This allows ver and prevents matches inside version or average. With the leading ?, drop the first \b (? is not a word character): (\?ver)\b.

PHP Regex display either abc or abc xyz format

I am trying to build a regex to get values for either Boost Mobile or BoostMobile, whichever is present.
Any suggestions, please?
In NFA regexes, in unanchored alternation groups, the first branch that matches stops the group processing; the branches further to the right are not checked against the string. You may read more on that at Alternation with The Vertical Bar or Pipe Symbol.
So, swapping the values and simplifying the pattern, you could use
/\b(Boost\s*Mobile|Boost)\b/i
However, the most effective way here is through using an optional group:
/\bBoost(?:\s*Mobile)?\b/i
The i case insensitive modifier is set on the whole regex. You need not switch it on and off at the beginning/end of the pattern. Also, \W* can match an empty string, so your way of checking a word boundary may fail here when \b will work.
Pattern details:
\b - leading word boundary
Boost - a literal substring
(?:\s*Mobile)? - an optional group matching 1 or 0 sequences of
\s* - 0+ whitespaces
Mobile - a literal substring
\b - trailing word boundary
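A short sketch of the optional-group pattern with preg_match. The input strings are invented, and null marks "no match".

```php
<?php
$pattern = '/\bBoost(?:\s*Mobile)?\b/i';

// Hypothetical inputs: with space, without space, bare word, and a longer word.
$inputs = ['Boost Mobile plan', 'BoostMobile plan', 'boost plan', 'Booster plan'];

$found = [];
foreach ($inputs as $input) {
    preg_match($pattern, $input, $m);
    $found[$input] = $m[0] ?? null; // null when nothing matched
}

var_dump($found);
```

Booster is rejected by the trailing \b, while the i modifier lets the lowercase boost through.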

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which one or more letters are repeated consecutively.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that can match exactly:
exxxmaple and oooonnnnllllyyyyy
I need to find them and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regex I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character, captured in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches anything other than whitespace
EDIT
If the character must repeat more than once, a slight modification of the regex does the trick
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want to discard words like apple, etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See the demos:
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, it reduces the number of steps needed to obtain a match (as does the lazy quantifier). Note too that if you are looking for words (in the common meaning), you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks if a repeated character is present. A word character is captured in group 1 and must be followed twice by the content of the capture group (the backreference \1).
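Here is a minimal sketch that collects such words into an array with preg_match_all. The sentence is the one from the question with an extra "An apple." appended; the second pattern (a single \1, so doubled letters count too) is my own variant added for contrast.

```php
<?php
$str = 'This is an exxxmaple oooonnnnllllyyyyy! An apple.';

// At least three identical word characters in a row.
preg_match_all('/\b\w*?(\w)\1{2}\w*/', $str, $tripled);

// Assumed variant for comparison: at least two in a row, so "apple" also matches.
preg_match_all('/\b\w*?(\w)\1\w*/', $str, $doubled);

print_r($tripled[0]); // exxxmaple, oooonnnnllllyyyyy
print_r($doubled[0]); // exxxmaple, oooonnnnllllyyyyy, apple
```

Index 0 of the result holds the whole words; index 1 holds the repeated character captured by (\w).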

Regular Expression to match ([^>(),]+) but include some \w's in it?

I'm using php's preg_replace function, and I have the following regex:
(?:[^>(),]+)
to match any characters but >(),. The problem is that I want to make sure that there is at least one letter (\w) in it and that the match is not empty. How can I do that?
Is there a way to say what I DO want to match in the [^>(),]+ part?
You can add a lookahead assertion:
(?:(?=.*\p{L})[^>(),]+)
This makes sure that there will be at least one letter (\p{L}; \w also matches digits and underscores) somewhere in the string.
You don't really need the (?:...) non-capturing parentheses, though:
(?=.*\p{L})[^>(),]+
works just as well. Also, to ensure that we always match the entire string, it might be a good idea to surround the regex with anchors:
^(?=.*\p{L})[^>(),]+$
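A quick sketch of the anchored version with preg_match. The test strings are made up, and the u modifier is an assumption on my part so that \p{L} also recognizes non-ASCII letters in UTF-8 input.

```php
<?php
$pattern = '/^(?=.*\p{L})[^>(),]+$/u';

// Hypothetical inputs: letters, digits only, a forbidden character, a non-ASCII letter.
$inputs = ['hello world', '12345', 'a(b)', "42 \u{e9}"];

$ok = [];
foreach ($inputs as $input) {
    $ok[$input] = preg_match($pattern, $input) === 1;
}

var_dump($ok);
```

12345 fails the lookahead (no letter), and a(b) fails the character class (it contains parentheses).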
EDIT:
For the added requirement of not including surrounding whitespace in the match, things get a little more complicated. Try
^(?=.*\p{L})(\s*)((?:(?!\s*$)[^>(),])+)(\s*)$
In PHP, for example to replace all those strings we found with REPLACEMENT, leaving leading and trailing whitespace alone, this could look like this:
$result = preg_replace(
    '/^                 # Start of string
    (?=.*\p{L})         # Assert that there is at least one letter
    (\s*)               # Match and capture optional leading whitespace (--> \1)
    (                   # Match and capture... (--> \2)
      (?:               # ...at least one character of the following:
        (?!\s*$)        # (unless it is part of trailing whitespace)
        [^>(),]         # any character except >(),
      )+                # End of repeating group
    )                   # End of capturing group
    (\s*)               # Match and capture optional trailing whitespace (--> \3)
    $                   # End of string
    /xu',
    '\1REPLACEMENT\3', $subject);
You can just "insert" \w inside: (?:[^>(),]+\w[^>(),]+). That way the match has at least one letter and is obviously not empty. BTW, \w matches digits and underscores as well as letters. If you want only letters, you can use the Unicode letter class \p{L} instead of \w.
How about this:
(?:[^>(),]*\w[^>(),]*)
