Regular Expression to match ([^>(),]+) but include some \w's in it?

Regular Expression to match ([^>(),]+) but include some \w's in it? - php

I'm using php's preg_replace function, and I have the following regex:
(?:[^>(),]+)
to match any characters but >(),. The problem is that I want to make sure that there is at least one letter in it (\w) and the match is not empty, how can I do that?
Is there a way to say what i DO WANT to match in the [^>(),]+ part?

You can add a lookahead assertion:
(?:(?=.*\p{L})[^>(),]+)
This makes sure that there will be at least one letter (\p{L}; \w also matches digits and underscores) somewhere in the string.
You don't really need the (?:...) non-capturing parentheses, though:
(?=.*\p{L})[^>(),]+
works just as well. Also, to ensure that we always match the entire string, it might be a good idea to surround the regex with anchors:
^(?=.*\p{L})[^>(),]+$
EDIT:
For the added requirement of not including surrounding whitespace in the match, things get a little more complicated. Try
^(?=.*\p{L})(\s*)((?:(?!\s*$)[^>(),])+)(\s*)$
In PHP, for example to replace all those strings we found with REPLACEMENT, leaving leading and trailing whitespace alone, this could look like this:
$result = preg_replace(
'/^ # Start of string
(?=.*\p{L}) # Assert that there is at least one letter
(\s*) # Match and capture optional leading whitespace (--> \1)
( # Match and capture... (--> \2)
(?: # ...at least one character of the following:
(?!\s*$) # (unless it is part of trailing whitespace)
[^>(),] # any character except >(),
)+ # End of repeating group
) # End of capturing group
(\s*) # Match and capture optional trailing whitespace (--> \3)
$ # End of string
/xu',
'\1REPLACEMENT\3', $subject);

You can just "insert" \w inside (?:[^>(),]+\w[^>(),]+). So it will have at least one letter and obviously not empty. BTW \w captures digits as well as letters. If you want only letters you can use unicode letter character class \p{L} instead of \w.

How about this:
(?:[^>(),]*\w[^>(),]*)

Related

Regular expression alphanumeric with dash and underscore and space, but not at the beginning or at the end of the string [duplicate]

I want to design an expression for not allowing whitespace at the beginning and at the end of a string, but allowing in the middle of the string.
The regex I've tried is this:
\^[^\s][a-z\sA-Z\s0-9\s-()][^\s$]\

This should work:
^[^\s]+(\s+[^\s]+)*$
If you want to include character restrictions:
^[-a-zA-Z0-9-()]+(\s+[-a-zA-Z0-9-()]+)*$
Explanation:
the starting ^ and ending $ denotes the string.
considering the first regex I gave, [^\s]+ means at least one not whitespace and \s+ means at least one white space. Note also that parentheses () groups together the second and third fragments and * at the end means zero or more of this group.
So, if you take a look, the expression is: begins with at least one non whitespace and ends with any number of groups of at least one whitespace followed by at least one non whitespace.
For example if the input is 'A' then it matches, because it matches with the begins with at least one non whitespace condition. The input 'AA' matches for the same reason. The input 'A A' matches also because the first A matches for the at least one not whitespace condition, then the ' A' matches for the any number of groups of at least one whitespace followed by at least one non whitespace.
' A' does not match because the begins with at least one non whitespace condition is not satisfied. 'A ' does not matches because the ends with any number of groups of at least one whitespace followed by at least one non whitespace condition is not satisfied.
If you want to restrict which characters to accept at the beginning and end, see the second regex. I have allowed a-z, A-Z, 0-9 and () at beginning and end. Only these are allowed.
Regex playground: http://www.regexr.com/

This RegEx will allow neither white-space at the beginning nor at the end of your string/word.
^[^\s].+[^\s]$
Any string that doesn't begin or end with a white-space will be matched.
Explanation:
^ denotes the beginning of the string.
\s denotes white-spaces and so [^\s] denotes NOT white-space. You could alternatively use \S to denote the same.
. denotes any character expect line break.
+ is a quantifier which denote - one or more times. That means, the character which + follows can be repeated on or more times.
You can use this as RegEx cheat sheet.

In cases when you have a specific pattern, say, ^[a-zA-Z0-9\s()-]+$, that you want to adjust so that spaces at the start and end were not allowed, you may use lookaheads anchored at the pattern start:
^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$
^^^^^^^^^^^^^^^^^^^^
Here,
(?!\s) - a negative lookahead that fails the match if (since it is after ^) immediately at the start of string there is a whitespace char
(?![\s\S]*\s$) - a negative lookahead that fails the match if, (since it is also executed after ^, the previous pattern is a lookaround that is not a consuming pattern) immediately at the start of string, there are any 0+ chars as many as possible ([\s\S]*, equal to [^]*) followed with a whitespace char at the end of string ($).
In JS, you may use the following equivalent regex declarations:
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = /^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$/
var regex = new RegExp("^(?!\\s)(?![^]*\\s$)[a-zA-Z0-9\\s()-]+$")
var regex = new RegExp(String.raw`^(?!\s)(?![^]*\s$)[a-zA-Z0-9\s()-]+$`)
If you know there are no linebreaks, [\s\S] and [^] may be replaced with .:
var regex = /^(?!\s)(?!.*\s$)[a-zA-Z0-9\s()-]+$/
See the regex demo.
JS demo:
var strs = ['a b c', ' a b b', 'a b c '];
var regex = /^(?!\s)(?![\s\S]*\s$)[a-zA-Z0-9\s()-]+$/;
for (var i=0; i<strs.length; i++){
console.log('"',strs[i], '"=>', regex.test(strs[i]))
}

if the string must be at least 1 character long, if newlines are allowed in the middle together with any other characters and the first+last character can really be anyhing except whitespace (including ##$!...), then you are looking for:
^\S$|^\S[\s\S]*\S$
explanation and unit tests: https://regex101.com/r/uT8zU0

This worked for me:
^[^\s].+[a-zA-Z]+[a-zA-Z]+$
Hope it helps.

How about:
^\S.+\S$
This will match any string that doesn't begin or end with any kind of space.

^[^\s].+[^\s]$
That's it!!!! it allows any string that contains any caracter (a part from \n) without whitespace at the beginning or end; in case you want \n in the middle there is an option s that you have to replace .+ by [.\n]+

pattern="^[^\s]+[-a-zA-Z\s]+([-a-zA-Z]+)*$"
This will help you accept only characters and wont allow spaces at the start nor whitespaces.

This is the regex for no white space at the begining nor at the end but only one between. Also works without a 3 character limit :
\^([^\s]*[A-Za-z0-9]\s{0,1})[^\s]*$\ - just remove {0,1} and add * in order to have limitless space between.

As a modification of #Aprillion's answer, I prefer:
^\S$|^\S[ \S]*\S$
It will not match a space at the beginning, end, or both.
It matches any number of spaces between a non-whitespace character at the beginning and end of a string.
It also matches only a single non-whitespace character (unlike many of the answers here).
It will not match any newline (\n), \r, \t, \f, nor \v in the string (unlike Aprillion's answer). I realize this isn't explicit to the question, but it's a useful distinction.

Letters and numbers divided only by one space. Also, no spaces allowed at beginning and end.
/^[a-z0-9]+( [a-z0-9]+)*$/gi

I found a reliable way to do this is just to specify what you do want to allow for the first character and check the other characters as normal e.g. in JavaScript:
RegExp("^[a-zA-Z][a-zA-Z- ]*$")
So that expression accepts only a single letter at the start, and then any number of letters, hyphens or spaces thereafter.

use /^[^\s].([A-Za-z]+\s)*[A-Za-z]+$/. this one. it only accept one space between words and no more space at beginning and end

If we do not have to make a specific class of valid character set (Going to accept any language character), and we just going to prevent spaces from Start & End, The must simple can be this pattern:
/^(?! ).*[^ ]$/
Try on HTML Input:
input:invalid {box-shadow:0 0 0 4px red}
/* Note: ^ and $ removed from pattern. Because HTML Input already use the pattern from First to End by itself. */
<input pattern="(?! ).*[^ ]">
Explaination
^ Start of
(?!...) (Negative lookahead) Not equal to ... > for next set
Just Space / \s (Space & Tabs & Next line chars)
(?! ) Do not accept any space in first of next set (.*)
. Any character (Execpt \n\r linebreaks)
* Zero or more (Length of the set)
[^ ] Set/Class of Any character expect space
$ End of
Try it live: https://regexr.com/6e1o4

^[^0-9 ]{1}([a-zA-Z]+\s{1})+[a-zA-Z]+$
-for No more than one whitespaces in between , No spaces in first and last.
^[^0-9 ]{1}([a-zA-Z ])+[a-zA-Z]+$
-for more than one whitespaces in between , No spaces in first and last.

Other answers introduce a limit on the length of the match. This can be avoided using Negative lookaheads and lookbehinds:
^(?!\s)([a-zA-Z0-9\s])*?(?<!\s)$
This starts by checking that the first character is not whitespace ^(?!\s). It then captures the characters you want a-zA-Z0-9\s non greedily (*?), and ends by checking that the character before $ (end of string/line) is not \s.
Check that lookaheads/lookbehinds are supported in your platform/browser.

Here you go,
\b^[^\s][a-zA-Z0-9]*\s+[a-zA-Z0-9]*\b
\b refers to word boundary
\s+ means allowing white-space one or more at the middle.

(^(\s)+|(\s)+$)
This expression will match the first and last spaces of the article..

Match regular expression specific character quantities in any order

I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
This matches the following data in the following cases. The only one that does not match is the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks

One option could be using lookaheads to assert 3 digits, not 2 backslashes and not 2 times a hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert what is on the left is not a non whitespace char
(?=(?:[^\d\s]*\d){3}) Assert wat is on the right is 3 times a whitespace char or digit
(?!(?:[^\s-]*-){2}) Assert what is on the right is not 2 times a whitespace char a hyphen
(?!(?:[^\s\\]*\\){2}) Assert what is on the right is not 2 times a whitespace char a backslash
[A-Z0-9/\\-]+ Match any of the listed 1+ times
(?!\S) Assert what is on the right is not a non whitespace char
Regex demo

Your patterns can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested resides on its own line, ie. not just being separated by whitespace.
In case the strings are separated by whitespace, choose the solution of user #TheFourthBird (which essentially is the same as this one but caters for the whitespace separation)

You can test the condition for both the hyphen and the slash into a same lookahead using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
demo
detailled:
~ # using the tild as pattern delimiter avoids to escape all slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # eventual other characters until the end of the string
\z # end of the string.
~
To better understand (if you are not familiar with): these three subpatterns start from the same position (in this case the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the two first are zero-width assertions that are simple tests and don't consume any character.

How do grab words that start and end with double percent signs?

I am really bad at regex and I am trying to do the following:
How do I get all strings that starts and end with %%.
If these words appear in a string I want to be able to grab them: %%HELLO_WOLD%%, %%STUFF%%
Here's what I came up with so far: %%[a-zA-Z0-9]\w+

You could use anchors to assert the start ^ and the end $ of the line and match zero or more times any character .* or if there must be at least one character your might use .+
^%%.*%%$
Or instead of .* you could add your character class [a-zA-Z0-9]+ which will match lower and uppercase characters and digits or use the \w+ which will match a word character.
Note that the character class [a-zA-Z0-9] does not match an underscore and \w does.
If you want to find multiple matches in a string you might use %%\w+%%. This will also match %%HELLO_WOLD%% in %%%%%HELLO_WOLD%%%.
If there should be only 2 percentage signs at the beginning and at the end, you could use a positive lookahead (?= and positive lookbehind (?<= to assert that what is before and after the 2 percentage signs is not a percentage sign or are the start ^ or end $ of the string.
(?<=^|[^%])%%\w+%%(?=[^%]|$)

PHP regex: zero or more whitespace not working

I'm trying to apply a regex constraint to a Symfony form input. The requirement for the input is that the start of the string and all commas must be followed by zero or more whitespace, then a # or # symbol, except when it's the empty string.
As far as I can tell, there is no way to tell the constraint to use preg_match_all instead of just preg_match, but it does have the ability to negate the match. So, I need a regular expression that preg_match will NOT MATCH for the given scenario: any string containing the start of the string or a comma, followed by zero or more whitespace, followed by any character that is not a # or # and is not the end of the string, but will match for everything else. Here are a few examples:
preg_match(..., ''); // No match
preg_match(..., '#yolo'); // No match
preg_match(..., '#yolo, #swag'); // No match
preg_match(..., '#yolo,#swag'); // No match
preg_match(..., '#yolo, #swag,'); // No match
preg_match(..., 'yolo'); // Match
preg_match(..., 'swag,#yolo'); // Match
preg_match(..., '#swag, yolo'); // Match
I would've thought for sure that /(^|,)\s*[^##]/ would work, but it's failing in every case with 1 or more spaces and it appears to be because of the asterisk. If I get rid of the asterisk, preg_match('/(^|,)\s[^##]/', '#yolo, #swag') does not match (as desired) when there's exactly once space, but as as soon as I reintroduce the asterisk it breaks for any quantity of spaces > 0.
My theory is that the regex engine is interpreting the second space as a character that is not in the character set [##], but that's just a theory and I don't know what to do about it. I know that I could create a custom constraint to use preg_match_all instead to get around this, but I'd like to avoid that if possible.

You may use
'~(?:^|,)\s*+[^##]~'
Here, the + symbol defines a *+ possessive quantifier matching 0 or more occurrences of whitespace chars, and disallowing the regex engine to backtrack into \s* pattern if [^##] cannot match the subsequent char.
See the regex demo.
Details
(?:^|,) - either start of string or ,
\s*+ - zero or more whitespace chars, possessively matched (i.e. if the next char is not matched with [^##] pattern, the whole pattern match will fail)
[^##] - a negated character class matching any char but # and #.

regex matches numbers, but not letters

I have a string that looks like this:
[if-abc] 12345 [if-def] 67890 [/if][/if]
I have the following regex:
/\[if-([a-z0-9-]*)\]([^\[if]*?)\[\/if\]/s
This matches the inner brackets just like I want it to. However, when I replace the 67890 with text (ie. abcdef), it doesn't match it.
[if-abc] 12345 [if-def] abcdef [/if][/if]
I want to be able to match ANY characters, including line breaks, except for another opening bracket [if-.

This part doesn't work like you think it does:
[^\[if]
This will match a single character that is neither of [, i or f. Regardless of the combination. You can mimic the desired behavior using a negative lookahead though:
~\[if-([a-z0-9-]*)\]((?:(?!\[/?if).)*)\[/if\]~s
I've also included closing tags in the lookahead, as this avoid the ungreedy repetition (which is usually worse performance-wise). Plus, I've changed the delimiters, so that you don't have to escape the slash in the pattern.
So this is the interesting part ((?:(?!\[/?if).)*) explained:
( # capture the contents of the tag-pair
(?: # start a non-capturing group (the ?: are just a performance
# optimization). this group represents a single "allowed" character
(?! # negative lookahead - makes sure that the next character does not mark
# the start of either [if or [/if (the negative lookahead will cause
# the entire pattern to fail if its contents match)
\[/?if
# match [if or [/if
) # end of lookahead
. # consume/match any single character
)* # end of group - repeat 0 or more times
) # end of capturing group

Modifying a little results in:
/\[if-([a-z0-9-]+)\](.+?)(?=\[if)/s
Running it on [if-abc] 12345 [if-def] abcdef [/if][/if]
Results in a first match as: [if-abc] 12345
Your groups are: abc and 12345
And modifying even further:
/\[if-([a-z0-9-]+)\](.+?)(?=(?:\[\/?if))/s
matches both groups. Although the delimiter [/if] is not captured by either of these.
NOTE: Instead of matching the delimeters I used a lookahead ((?=)) in the regex to stop when the text ahead matches the lookahead.

Use a period to match any character.

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Regular Expression to match ([^>(),]+) but include some \w's in it? - php

You can just "insert" \w inside (?:[^>(),]+\w[^>(),]+). So it will have at least one letter and obviously not empty. BTW \w captures digits as well as letters. If you want only letters you can use unicode letter character class \p{L} instead of \w.

How about this: (?:[^>(),]\w[^>(),])

Related

Regular expression alphanumeric with dash and underscore and space, but not at the beginning or at the end of the string [duplicate]

Match regular expression specific character quantities in any order

How do grab words that start and end with double percent signs?

PHP regex: zero or more whitespace not working

regex matches numbers, but not letters

Categories

Resources

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.

Regular Expression to match ([^>(),]+) but include some \w's in it? - php

You can just "insert" \w inside (?:[^>(),]+\w[^>(),]+). So it will have at least one letter and obviously not empty. BTW \w captures digits as well as letters. If you want only letters you can use unicode letter character class \p{L} instead of \w.

How about this: (?:[^>(),]*\w[^>(),]*)

Related

Regular expression alphanumeric with dash and underscore and space, but not at the beginning or at the end of the string [duplicate]

Match regular expression specific character quantities in any order

How do grab words that start and end with double percent signs?

PHP regex: zero or more whitespace not working

regex matches numbers, but not letters

Categories

Resources

How about this: (?:[^>(),]\w[^>(),])