I'm stuck on a regexp that should validate only the numbers 1 to 10, preceded by two dashes (hyphens), for example:
--9
or
--10
or
--1
but not
--11 or --0
I've tried what seems like everything, for example:
/(-\-\[1-10])/
What is wrong?
EDIT 1:
Thanks a lot for so many working examples!!
What if I also want to validate a number before the dashes, for example:
8--10, but not 0--10 or 11--11
I tried this but it didn't work:
/--([1-9]|10:[1-9]|10)\b/
EDIT 2:
Oh, this one works, finally:
/^(10|[1-9])--(10|[1-9])$/
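A quick way to check the EDIT 2 pattern against the examples above is Python's `re` module, used here only as a test bench (the question is about PHP, but these constructs behave the same in PCRE). The function name `is_valid_pair` is just for illustration:

```python
import re

# The EDIT 2 pattern: each number must be 1-9 or 10, and ^...$ pin it to the whole string.
PAIR = re.compile(r"^(10|[1-9])--(10|[1-9])$")

def is_valid_pair(text: str) -> bool:
    """Return True if text is 'N--M' with N and M each between 1 and 10."""
    return PAIR.match(text) is not None

for sample in ["8--10", "0--10", "11--11", "1--1", "10--10"]:
    print(sample, is_valid_pair(sample))
```

Putting `10` before `[1-9]` in the alternation is not required here, because the `$` anchor forces backtracking into the other branch anyway.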
Have a try with:
/\b(?:[1-9]|10)--(?:[1-9]|10)\b/
Changed according to the OP's edit.
Explanation:
The regular expression:
(?-imsx:\b(?:[1-9]|10)--(?:[1-9]|10)\b)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[1-9] any character of: '1' to '9'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
10 '10'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
-- '--'
----------------------------------------------------------------------
(?: group, but do not capture:
----------------------------------------------------------------------
[1-9] any character of: '1' to '9'
----------------------------------------------------------------------
| OR
----------------------------------------------------------------------
10 '10'
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
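Because this `\b` version is not anchored with `^...$`, it can also pull valid pairs out of a longer text. A small sketch in Python's `re` (standing in for PCRE; the sample sentence is made up for illustration):

```python
import re

# \b...\b instead of ^...$: the pair may appear anywhere, as long as it is not
# glued to further word characters such as an extra digit.
PAIR_IN_TEXT = re.compile(r"\b(?:[1-9]|10)--(?:[1-9]|10)\b")

text = "a 8--10, 0--10, 11--11, 8--100 and 3--4"
print(PAIR_IN_TEXT.findall(text))  # ['8--10', '3--4']
```

Since both groups are non-capturing, `findall` returns the whole matched pairs rather than group contents.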
I guess this will fit:
/\-\-([1-9]|10)\b/
If you don't want to capture the number, add ?: to the group:
/\-\-(?:[1-9]|10)\b/
Outside a character class, you don't need to escape hyphens. Also, your character class [1-10] is equivalent to [10], so it only matches the single characters 1 and 0. Try this regex:
/^--(10|[1-9])$/
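The claim about `[1-10]` is easy to confirm with Python's `re`, which parses character-class ranges the same way as PCRE: `1-1` is a one-character range and the trailing `0` is just another member. The helper name below is hypothetical:

```python
import re

# [1-10] is the range 1-1 plus the single character 0, i.e. the same set as [01].
print(re.findall(r"[1-10]", "0123456789"))  # ['0', '1']

# The suggested fix: ^ and $ force the whole string to be --1 ... --10.
DASHED = re.compile(r"^--(10|[1-9])$")

def is_dashed_1_to_10(text: str) -> bool:
    return DASHED.match(text) is not None

for s in ["--9", "--10", "--1", "--11", "--0"]:
    print(s, is_dashed_1_to_10(s))
```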
The correct regex is
/\b--([1-9]|10)\b/
You're incorrectly escaping the first [ of your character class as \[, which turns the rest of the pattern into literal text. The character class is incorrect as well: unescaped, [1-10] would be treated as a class with the range 1 to 1 plus a 0, i.e. [10], which matches either 0 or 1.
Also, the hyphens - don't need to be escaped outside a character class []. To validate the numbers that come before the hyphens as well, use:
/\b([1-9]|10)--([1-9]|10)\b/
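Two details from this answer can be checked in Python's `re` (a stand-in for PCRE here): the escaped `\[` in the original attempt makes the pattern match only the literal text `--[1-10]`, and a leading `\b` needs a word character (letter, digit or `_`) right before the dashes, so `\b--...` fits the `8--10` form but not a bare `--9` at the start of a string. A sketch:

```python
import re

# The original attempt: \[ is a literal bracket, so this only matches the text "--[1-10]".
ORIGINAL = re.compile(r"(-\-\[1-10])")
print(ORIGINAL.search("--9"))                  # None
print(ORIGINAL.search("--[1-10]").group(0))    # --[1-10]

# \b before "--" requires a word character immediately before the dashes.
LEADING_B = re.compile(r"\b--([1-9]|10)\b")
print(LEADING_B.search("--9"))                 # None: nothing before the dashes
print(LEADING_B.search("8--10").group(1))      # 10

# The two-number version for the edited question.
BOTH = re.compile(r"\b([1-9]|10)--([1-9]|10)\b")
print(BOTH.search("8--10").groups())           # ('8', '10')
```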
When you write [1-10], it means the characters 1 to 1 plus the character 0. It is as if you had written [0-1].
In fact, in your case, it would be better to test the cases --1 to --9 and --10 separately, with something like: /^(--10|--[1-9])$/
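One pitfall with the alternation form: `|` has the lowest precedence, so if both branches are not wrapped in one group, as in `^(--10)|(--[1-9])$`, the `^` binds only to the first branch and the `$` only to the second. A Python check of both spellings (Python's `re` treats precedence the same way as PCRE):

```python
import re

LOOSE = re.compile(r"^(--10)|(--[1-9])$")   # means (^--10) OR (--[1-9]$)
TIGHT = re.compile(r"^(--10|--[1-9])$")     # the whole string must be one branch

for s in ["--10", "--7", "--10abc", "abc--5", "--11"]:
    print(s, LOOSE.search(s) is not None, TIGHT.search(s) is not None)
```

`--10abc` and `abc--5` slip through the loose form but are rejected by the grouped one.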
You can test your regex on http://myregexp.com/
Related
I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
It matches the following data, except for the first entry:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think perhaps I am being too prescriptive about positioning. How can I update this regex to match all possibilities?
PHP PCRE
The following would not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks
One option is to use lookarounds to assert at least 3 digits, not 2 backslashes and not 2 hyphens.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert that what is on the left is not a non-whitespace char
(?=(?:[^\d\s]*\d){3}) Assert that what is on the right is, 3 times, any chars other than a digit or whitespace followed by a digit (at least 3 digits)
(?!(?:[^\s-]*-){2}) Assert that what is on the right is not, 2 times, any chars other than whitespace or a hyphen followed by a hyphen (no 2 hyphens)
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right is not, 2 times, any chars other than whitespace or a backslash followed by a backslash (no 2 backslashes)
[A-Z0-9/\\-]+ Match any of the listed characters 1+ times
(?!\S) Assert that what is on the right is not a non-whitespace char
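A sketch of the same pattern in Python's `re` (used as a PCRE stand-in), split into commented pieces and run over the question's samples joined by spaces:

```python
import re

# Whitespace-separated tokens; each lookahead stays inside the current token
# because its negated classes exclude whitespace (\s).
TOKEN = re.compile(
    r"(?<!\S)"                  # token starts at string start or after whitespace
    r"(?=(?:[^\d\s]*\d){3})"    # at least 3 digits inside the token
    r"(?!(?:[^\s-]*-){2})"      # not 2 hyphens
    r"(?!(?:[^\s\\]*\\){2})"    # not 2 backslashes
    r"[A-Z0-9/\\-]+"            # allowed characters
    r"(?!\S)"                   # token ends at whitespace or string end
)

samples = ["02ABU-D9435", "013DFC", "1123451", "03323456782", "ADS7124536768",
           "03SDFA9433/0", "03SDFA9433/", "03SDFA9433/1", "A41B03423523", "O4AGFC4430"]
print(TOKEN.findall(" ".join(samples + ["AA-AA"])))
```

Because this pattern limits backslashes, as the requirement list literally says, a string with two forward slashes such as `01/01/2018` is still accepted; if the `/` rule is what is actually intended, `\\` in the last lookahead would need to become `/`.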
Your conditions can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Check out this Regex101 demo.
Remark
This solution assumes that each string tested is on its own line, i.e. not merely separated by whitespace.
In case the strings are separated by whitespace, choose the solution by user @TheFourthBird (which is essentially the same as this one but caters for the whitespace separation).
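For the one-string-per-line case, a Python sketch (standing in for PCRE) sets `re.MULTILINE` so that `^` and `$` match at every line break; the mix of sample lines is illustrative:

```python
import re

# One candidate per line; re.MULTILINE makes ^ and $ match at each line.
LINE = re.compile(r"^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$", re.MULTILINE)

text = "\n".join(["02ABU-D9435", "03SDFA9433/1", "01/01/2018", "AA-AA"])
# The pattern contains capture groups, so use finditer/group(0) rather than findall.
print([m.group(0) for m in LINE.finditer(text)])  # ['02ABU-D9435', '03SDFA9433/1']
```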
You can test the condition for both the hyphen and the slash in a single lookahead, using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
In detail (free-spacing layout, which needs the x modifier):
~ # using the tilde as the pattern delimiter avoids escaping every slash in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any remaining characters until the end of the string
\z # end of the string
~x
If you are not familiar with this: these three subpatterns all start from the same position (here, the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: they are simple tests and don't consume any characters.
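A Python sketch of the backreference approach; note that Python spells PCRE's `\z` (absolute end of string) as `\Z`, and the extra test strings are illustrative:

```python
import re

# ([-/]) captures a hyphen or slash; .*\1 then looks for the SAME character again.
CODE = re.compile(r"\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\Z")

def is_valid_code(s: str) -> bool:
    return CODE.match(s) is not None

for s in ["02ABU-D9435", "03SDFA9433/1", "1-2/3", "01/01/2018", "1-2-3", "AA-AA"]:
    print(s, is_valid_code(s))
```

`1-2/3` passes because one hyphen plus one slash never repeats the captured character, while `1-2-3` and `01/01/2018` fail the lookahead.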
What regex would I need to achieve the following?
Check whether a string does not contain one or more options. I've tried a lot of expressions.
I think this is the closest to being correct:
/^[^(256K)]$|^[^(2M)]$/
I would like preg_match to tell me if there is anything other than 256K or 2M, and I can't negate preg_match (!preg_match) for reasons that take too long to explain ;)
You can not place whole words or capturing groups inside of Character Classes. A character class matches any one character from a set of characters.
Your regular expression matches the beginning of the string, any character except: (, 2, 5, 6, K, ), followed by the end of the string, OR the beginning of the string, any character except: (, 2, M, ), followed by the end of string.
I believe you are wanting a Negative Lookahead here instead.
/^((?!256K|2M).)*$/i
Regular expression:
^ # the beginning of the string
( # group and capture to \1 (0 or more times)
(?! # look ahead to see if there is not:
256K # '256K'
| # OR
2M # '2M'
) # end of look-ahead
. # any character except \n
)* # end of \1
$ # before an optional \n, and the end of the string
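A Python comparison (with `re` standing in for PCRE) of the original attempt and the negative-lookahead version; the test strings are illustrative:

```python
import re

# Original attempt: each [^...] is a character class, so each branch matches exactly ONE character.
BROKEN = re.compile(r"^[^(256K)]$|^[^(2M)]$")
# Each character is consumed only if "256K" or "2M" does not start at that position.
NO_256K_OR_2M = re.compile(r"^((?!256K|2M).)*$", re.IGNORECASE)

print(BROKEN.search("hello"))                         # None: more than one character
print(NO_256K_OR_2M.search("512K") is not None)       # True
print(NO_256K_OR_2M.search("x 256k y") is not None)   # False (case-insensitive)
```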
How can I make a RegEx in PHP that only accepts 3-9 uppercase letters and 5-50 digits?
I'm not very good at regular expressions, and this one doesn't work:
/[A-Z]{3,9}[0-9]{5,50}/
For instance, it matches ABC12345 but not A12345BC.
Any ideas?
This is a classic "password validation"-type problem. For this, the "rough recipe" is to check each condition with a lookahead, then we match everything.
^(?=(?:[^A-Z]*[A-Z]){3,9}[^A-Z]*$)(?=(?:[^0-9]*[0-9]){5,50}[^0-9]*$)[A-Z0-9]*$
I'll explain this one below, but here's a variation that I'll leave for you to figure out.
^(?=(?:[^A-Z]*[A-Z]){3,9}[0-9]*$)(?=(?:[^0-9]*[0-9]){5,50}[A-Z]*$).*$
Let's look at the first regex piece by piece.
We anchor the regex between the head of string ^ and end of string $ assertions, ensuring that the match (if any) is the whole string.
We have two lookaheads: one for the capital letters, one for the digits.
After the lookaheads, [A-Z0-9]* matches the whole string (if it consists only of uppercase ASCII letters and digits). (Thanks to @TimPietzcker for pointing out that I was asleep at the wheel for starting out with a dot-star there.)
How do the lookaheads work?
The lookahead (?=(?:[^A-Z]*[A-Z]){3,9}[^A-Z]*$) asserts that at the current position, i.e. the beginning of the string, we can match "any number of characters that are not capital letters, followed by a single capital letter" 3 to 9 times. This ensures we have enough capital letters. Note that {3,9} is greedy, so it matches as many capital letters as possible, up to 9. But we don't want to allow more than that, so after the {3,9} quantifier, the lookahead checks that only "zero or more" characters that are not capital letters remain until the end of the string, marked by the anchor $.
The second lookahead works in similar fashion.
For a more in-depth explanation of this technique, you may want to peruse the password validation section of this page about regex lookarounds.
In case you are interested, here is a token-by-token explanation of the technique.
^ the beginning of the string
(?= look ahead to see if there is:
(?: group, but do not capture (between 3 and 9 times)
[^A-Z]* any character except: 'A' to 'Z' (0 or more times)
[A-Z] any character of: 'A' to 'Z'
){3,9} end of grouping
[^A-Z]* any character except: 'A' to 'Z' (0 or more times)
$ before an optional \n, and the end of the string
) end of look-ahead
(?= look ahead to see if there is:
(?: group, but do not capture (between 5 and 50 times)
[^0-9]* any character except: '0' to '9' (0 or more times)
[0-9] any character of: '0' to '9'
){5,50} end of grouping
[^0-9]* any character except: '0' to '9' (0 or more times)
$ before an optional \n, and the end of the string
) end of look-ahead
[A-Z0-9]* any character of: 'A' to 'Z', '0' to '9' (0 or more times)
$ before an optional \n, and the end of the string
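A Python sketch of the first regex (Python's `re` used as a stand-in for PHP's PCRE), with the two lookaheads split onto commented lines; the sample strings beyond the question's two are illustrative:

```python
import re

# Two lookaheads count letters and digits separately; the final class limits the alphabet.
UPPER_AND_DIGITS = re.compile(
    r"^(?=(?:[^A-Z]*[A-Z]){3,9}[^A-Z]*$)"   # 3-9 uppercase letters in total
    r"(?=(?:[^0-9]*[0-9]){5,50}[^0-9]*$)"   # 5-50 digits in total
    r"[A-Z0-9]*$"                            # nothing but uppercase letters and digits
)

for s in ["ABC12345", "A12345BC", "AB12345", "ABC1234", "abc12345", "ABCDEFGHIJ12345"]:
    print(s, UPPER_AND_DIGITS.match(s) is not None)
```

Only the first two print `True`: the others have too few letters, too few digits, lowercase letters, or 10 capitals.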
Is this your problem? http://regexr.com/38pn0
If so, you need to anchor the expression to the start and end of the string:
/^[A-Z]{3,9}[0-9]{5,50}$/
See the result: http://regexr.com/38pmt (no match)
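The anchors stop partial matches inside a longer string, but the pattern still requires all letters to come before all digits, so it rejects the question's `A12345BC` example. A Python illustration (standing in for PCRE):

```python
import re

ORDERED = re.compile(r"^[A-Z]{3,9}[0-9]{5,50}$")

print(ORDERED.match("ABC12345") is not None)   # True
print(ORDERED.match("A12345BC") is not None)   # False: letters must all come first
# Without anchors, a valid run buried inside other text is enough to match.
print(re.search(r"[A-Z]{3,9}[0-9]{5,50}", "xxABC12345yy") is not None)  # True
```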
A regular expression in preg_match is given as /server\-([^\-\.\d]+)(\d+)/. Can someone help me understand what this means? I see that the string starts with server- but I don't get ([^\-\.\d]+)(\d+).
[ ] -> Matches any one of the characters inside the square brackets, at ONE character position, once and only once. For example, [12] means match the target against 1, and if that does not match, against 2, while [0123456789] means match any character in the range 0 to 9.
- -> The - (dash) inside square brackets is the 'range separator' and allows us to define a range, in our example above of [0123456789] we could rewrite it as [0-9].
You can define more than one range inside a list, for example, [0-9A-C] means check for 0 to 9 and A to C (but not a to c).
NOTE: To test for - inside brackets (as a literal), it must come first or last; that is, [-0-9] will test for - and 0 to 9.
^ -> The ^ (circumflex or caret) inside square brackets negates the expression (we will see an alternate use for the circumflex/caret outside square brackets later), for example, [^Ff] means anything except upper or lower case F and [^a-z] means everything except lower case a to z.
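The three bracket rules above can be tried out with Python's `re.findall`, whose character classes follow the same rules as PCRE here; the input strings are illustrative:

```python
import re

print(re.findall(r"[0-9A-C]", "aB3Dc"))   # ['B', '3']  (A-C is case-sensitive)
print(re.findall(r"[-0-9]", "a-1b2"))     # ['-', '1', '2']  (a leading - is literal)
print(re.findall(r"[^Ff]", "Fifo"))       # ['i', 'o']  (^ negates the class)
```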
You can find more explanations in the source where I got this information: http://www.zytrax.com/tech/web/regex.htm
And if you want to test it, you can try this one: http://gskinner.com/RegExr/
Here's the explanation:
# server\-([^\-\.\d]+)(\d+)
#
# Match the characters “server” literally «server»
# Match the character “-” literally «\-»
# Match the regular expression below and capture its match into backreference number 1 «([^\-\.\d]+)»
# Match a single character NOT present in the list below «[^\-\.\d]+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
# A - character «\-»
# A . character «\.»
# A single digit 0..9 «\d»
# Match the regular expression below and capture its match into backreference number 2 «(\d+)»
# Match a single digit 0..9 «\d+»
# Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
You can use programs such as RegexBuddy if you intend to work with regexes and are willing to spend some funds.
You can also use this free web-based explanation utility.
^ means not one of the following characters inside the brackets
\- \. are the - and . characters
\d is a digit
[^\-\.\d]+ means one or more of the characters described inside the brackets, so one or more of anything that is not a -, a . or a digit.
(\d+) one or more digits
Here is the explanation given by the Perl module YAPE::Regex::Explain
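To see the two capture groups in action, here is a Python sketch (Python's `re` standing in for PHP's preg_match; the host names are made up for illustration):

```python
import re

SERVER = re.compile(r"server\-([^\-\.\d]+)(\d+)")

m = SERVER.search("host: server-web12.example.com")
print(m.groups())                    # ('web', '12')
print(SERVER.search("server-db.3"))  # None: '.' may not appear in group 1
print(SERVER.search("server-12"))    # None: group 1 needs at least one non-digit
```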
The regular expression:
(?-imsx:server\-([^\-\.\d]+)(\d+))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
server 'server'
----------------------------------------------------------------------
\- '-'
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[^\-\.\d]+ any character except: '\-', '\.', digits
(0-9) (1 or more times (matching the
most amount possible))
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
( group and capture to \2:
----------------------------------------------------------------------
\d+ digits (0-9) (1 or more times (matching
the most amount possible))
----------------------------------------------------------------------
) end of \2
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
I know this regex divides a text into sentences. Can someone help me understand how?
/(?<!\..)([\?\!\.])\s(?!.\.)/
You can use YAPE::Regex::Explain to decipher Perl regular expressions:
use strict;
use warnings;
use YAPE::Regex::Explain;
my $re = qr/(?<!\..)([\?\!\.])\s(?!.\.)/;
print YAPE::Regex::Explain->new($re)->explain();
__END__
The regular expression:
(?-imsx:(?<!\..)([\?\!\.])\s(?!.\.))
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
(?<! look behind to see if there is not:
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
) end of look-behind
----------------------------------------------------------------------
( group and capture to \1:
----------------------------------------------------------------------
[\?\!\.] any character of: '\?', '\!', '\.'
----------------------------------------------------------------------
) end of \1
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
(?! look ahead to see if there is not:
----------------------------------------------------------------------
. any character except \n
----------------------------------------------------------------------
\. '.'
----------------------------------------------------------------------
) end of look-ahead
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
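To see the regex actually split text, here is a Python sketch using `re.split` (a stand-in for Perl's `split`). Because group 1 captures the punctuation, `re.split` returns it as a separate element. The sample sentences are made up:

```python
import re

SPLIT = re.compile(r"(?<!\..)([\?\!\.])\s(?!.\.)")

print(SPLIT.split("Hello there. How are you? Fine."))
# ['Hello there', '.', 'How are you', '?', 'Fine.']
print(SPLIT.split("Use e.g. this one. Done."))
# ['Use e.g. this one', '.', 'Done.']  (the look-behind blocks a split after "e.g.")
```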
There is also the Regular Expression Analyzer, which does much the same as what toolic suggested, but is completely web-based.
(? # Find a group (don't capture)
< # before the following regular expression
! # that does not match
\. # a literal "."
. # followed by 1 character
) # (End look-behind group)
( # Start a group (capture it to $1)
[\?\!\.] # Containing any one of the characters in the following set "?!."
) # End group $1
\s # followed by a whitespace character " ", \t, etc.
(? # Followed by a group (don't capture)
# after the preceding regular expression
! # that does not have
. # 1 character
\. # followed by a literal "."
) # (End look-ahead group)
The first part (?<!\..) is a negative look-behind. It specifies a pattern which invalidates the match. In this case it's looking for two characters: the first a period and the second any character.
The second part is a standard capture/group, which could be better expressed: ([?!.]) (you don't need the escapes in the class brackets), that is a sentence ending punctuation character.
The next part is a single whitespace character: \s
And the last part is a negative look-ahead: (?!.\.). Again it is guarding against the case of a single character followed by a period.
This should work relatively well, but I don't think I would recommend it. I don't see what the coder was getting at by making sure that a period wasn't two characters back, or wasn't the second character to come.
I mean, if you are looking to split on terminal punctuation, why not guard against the same class being two back or two ahead? Instead it relies on periods not being there. Thus a more consistent regular expression would be:
/(?<![?!.].)([?!.])\s(?!.[?!.])/
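The difference shows up when the guarding character is `!` or `?` rather than a period. A Python sketch (standing in for Perl) comparing both patterns on a made-up input:

```python
import re

ORIGINAL = re.compile(r"(?<!\..)([?!.])\s(?!.\.)")
GENERAL = re.compile(r"(?<![?!.].)([?!.])\s(?!.[?!.])")

text = "Go! A! Done."
print(ORIGINAL.split(text))  # ['Go', '!', 'A', '!', 'Done.']
print(GENERAL.split(text))   # ['Go! A', '!', 'Done.']
```

The generalized look-ahead refuses to split after `Go!` because the character two positions later is `!`, whereas the original only looks for a period there.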
Portions:
([\?\!\.])\s: split on an ending character (., ! or ?) that is followed by a whitespace character (space, tab, newline)
(?<!\..) where the two characters before this 'ending character' aren't a . followed by anything
(?!.\.) after the whitespace character, any single character directly followed by a . isn't allowed.
Those look-ahead ((?!) & look-behind ((?<!) assertions mainly seem to prevent splitting on (whitespaced?) abbreviations (q. e. d. etc.).