PHP: RegEx Syntax

I am a serious newbie with regular expressions, so please excuse my mistakes. I need to make sure that a string meets several criteria.
Requirements:
Have at most 5 words
Max of 256 characters
A word is 1 or more characters with no spaces
Shouldn't contain two consecutive spaces
Examples that should match:
Tree blows in the wind
1-Tree falls over
Examples that should fail:
Tree blows in the night sky
Tree breaks 2 limbs during night
Can this be done in a single expression, or should it be broken up?
Validating for 2 spaces:
- /^\s\s$/
Max of 256 characters:
- /^[a-zA-Z0-9]{,256}$/
I am not sure how to test for the 5-word limit or how to combine it with the other criteria. Can anyone help?
Test for word:
- /^\w{1,5}$

You can try this:
(?s)\A(?!.{257}|.*\s\s)\W*\w*(?:\W+\w+){0,4}\W*\z
pattern details:
(?s) # turn on the singleline mode: allow the dot to match newlines
\A # start of the string anchor
(?! # open a negative lookahead assertion: means not followed by
.{257} # 257 characters
| # OR
.*\s\s # two consecutive whitespaces
) # close the negative lookahead
\W* # optional non-word characters
\w* # optional word characters (nothing in your requirements forbids a string without words, or an empty string)
(?: # open a non-capturing group
\W+ # non-word characters: words are obviously separated with non-word characters
\w+ # another word
){0,4} # repeat the non-capturing group between zero and 4 times
\W* # optional non-word characters
\z # anchor for the end of the string
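A minimal PHP sketch of how the pattern above might be used with `preg_match()`, checked against the question's own examples plus two edge cases. The helper name `isValidPhrase` is hypothetical; the pattern is copied verbatim and wrapped in `~` delimiters.

```php
<?php
// Hypothetical helper wrapping the answer's pattern; ~ is used as the delimiter.
function isValidPhrase(string $s): bool
{
    $pattern = '~(?s)\A(?!.{257}|.*\s\s)\W*\w*(?:\W+\w+){0,4}\W*\z~';
    return preg_match($pattern, $s) === 1;
}

// Examples from the question, plus edge cases for the spacing and length rules.
$samples = [
    'Tree blows in the wind',           // 5 words: should pass
    '1-Tree falls over',                // "1" and "Tree" count as separate words (4 in total)
    'Tree blows in the night sky',      // 6 words: should fail
    'Tree breaks 2 limbs during night', // 6 words: should fail
    'Tree  blows',                      // two consecutive spaces: should fail
    str_repeat('a', 257),               // 257 characters: should fail
];

foreach ($samples as $s) {
    printf("%-35s %s\n", substr($s, 0, 35), isValidPhrase($s) ? 'valid' : 'invalid');
}
```

Because `\w` and `\W` also split on punctuation, a hyphenated token such as `1-Tree` counts as two words here; if words should be separated only by spaces, a variant built on `\S` would be needed.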

Related

Match regular expression specific character quantities in any order

I need to match a series of strings that:
Contain at least 3 numbers
0 or more letters
0 or 1 - (not more)
0 or 1 \ (not more)
These characters can be in any position in the string.
The regular expression I have so far is:
([A-Z0-9]*[0-9]{3,}[\/]?[\-]?[0-9]*[A-Z]*)
It matches all of the following strings except the first one:
02ABU-D9435
013DFC
1123451
03323456782
ADS7124536768
03SDFA9433/0
03SDFA9433/
03SDFA9433/1
A41B03423523
O4AGFC4430
I think I may be too prescriptive about positions. How can I update this regex to match all of these possibilities?
I am using PHP (PCRE).
The following should not match:
01/01/2018 [multiple / or -]
AA-AA [no numbers]
Thanks

One option is to use lookarounds to assert at least 3 digits, no more than 1 backslash, and no more than 1 hyphen.
(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)
About the pattern
(?<!\S) Assert that what is on the left is not a non-whitespace char (start of string or whitespace)
(?=(?:[^\d\s]*\d){3}) Assert that what is on the right contains 3 digits before any whitespace
(?!(?:[^\s-]*-){2}) Assert that what is on the right does not contain 2 hyphens before any whitespace
(?!(?:[^\s\\]*\\){2}) Assert that what is on the right does not contain 2 backslashes before any whitespace
[A-Z0-9/\\-]+ Match any of the listed characters 1 or more times
(?!\S) Assert that what is on the right is not a non-whitespace char (whitespace or end of string)
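A minimal PHP sketch that runs the lookaround pattern above with `preg_match_all()` over a whitespace-separated subject built from the question's sample codes (the subject string is illustrative). A nowdoc (`<<<'RE'`) holds the pattern so its backslashes need no extra escaping.

```php
<?php
// Nowdoc: the pattern is taken literally, no PHP string escaping applies.
$pattern = <<<'RE'
~(?<!\S)(?=(?:[^\d\s]*\d){3})(?!(?:[^\s-]*-){2})(?!(?:[^\s\\]*\\){2})[A-Z0-9/\\-]+(?!\S)~
RE;

// Illustrative subject: sample codes from the question separated by spaces.
$subject = '02ABU-D9435 013DFC 1123451 03SDFA9433/0 A41B03423523 AA-AA 01/01/2018';

preg_match_all($pattern, $subject, $m);
// $m[0] holds every whole token that satisfies all the assertions.
print_r($m[0]);
```

In this run AA-AA is rejected (no digits), but 01/01/2018 still matches: the third lookahead limits backslashes, as the question's list literally says, not slashes. If slashes are what should be limited, using `(?!(?:[^\s/]*/){2})` for that lookahead should reject it.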

Your conditions can be checked with positive/negative lookaheads anchored at the start of the string:
at least 3 digits -> find (not necessarily consecutive) 3 digits
no more than 1 '-' -> assert absence of (not necessarily consecutive) 2 '-' characters
no more than 1 '/' -> assert absence of (not necessarily consecutive) 2 '/' characters
0 or more letters -> no check needed.
If these conditions are met, any content is permitted.
The regex implementing this:
^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$
Remark
This solution assumes that each tested string is on its own line, i.e. not merely separated by whitespace.
If the strings are separated by whitespace, choose the solution by user @TheFourthBird (which is essentially the same as this one but caters for whitespace separation).
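A short sketch, assuming each candidate is a separate string as the remark requires, that filters the question's sample values with `preg_grep()`. The `~` delimiter is an assumption; since the slash is already escaped in the pattern, `/` would work as well.

```php
<?php
// The answer's pattern, unchanged, wrapped in ~ delimiters.
$pattern = '~^(?=(([^0-9\r\n]*\d){3}))(?!(.*-){2})(?!(.*\/){2}).*$~';

$candidates = ['02ABU-D9435', '013DFC', '03SDFA9433/0', '01/01/2018', 'AA-AA'];

// preg_grep() keeps only the matching elements; array_values() reindexes them.
$accepted = array_values(preg_grep($pattern, $candidates));
print_r($accepted);
```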

You can test the condition for both the hyphen and the slash in a single lookahead, using a capture group and a backreference:
~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~
In detail:
~ # using the tilde as the pattern delimiter avoids having to escape the slashes in the pattern
\A # start of the string
(?! .* ([-/]) .* \1 ) # negative lookahead:
# check that there's no more than one hyphen and one slash
(?: [A-Z/-]* \d ){3,} # at least 3 digits
[A-Z/-]* # any other allowed characters until the end of the string
\z # end of the string.
~x
In case you are not familiar with this: the following three subpatterns all start from the same position (here, the beginning of the string):
\A
(?! .* ([-/]) .* \1 )
(?: [A-Z/-]* \d ){3,}
This is possible only because the first two are zero-width assertions: they are simple tests that don't consume any characters.
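The backreference idea can be checked directly. In this hypothetical test table, a string with one hyphen and one slash passes, while a repeated hyphen or slash makes the negative lookahead fail.

```php
<?php
// \1 re-matches whichever separator ([-/]) group 1 captured, so the
// negative lookahead rejects the string only if the same separator appears twice.
$pattern = '~\A(?!.*([-/]).*\1)(?:[A-Z/-]*\d){3,}[A-Z/-]*\z~';

$tests = [
    '02ABU-D9435' => true,   // one hyphen, several digits
    '03SDFA9433/' => true,   // one trailing slash
    '12-3/4'      => true,   // one hyphen and one slash are both allowed
    '01/01/2018'  => false,  // two slashes
    '1-2-3'       => false,  // two hyphens
    'AA-AA'       => false,  // no digits
];

foreach ($tests as $s => $expected) {
    $ok = preg_match($pattern, $s) === 1;
    printf("%-12s %-9s%s\n", $s, $ok ? 'match' : 'no match', $ok === $expected ? '' : ' (unexpected)');
}
```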

Building a complex regex with "conditions"

I'm trying to build a complex regex with the following constraints:
1. My string can only be composed of:
"Regular" alphanumeric characters: a-zA-Z0-9
4 special characters: space . _ -
2. Length has to be between 3 and 25
So far it's quite easy, but then it gets complicated:
3. There cannot be 2 consecutive special characters, unless the 1st one is a space and the 2nd one isn't a space. Logical consequence: there cannot be 3 consecutive special characters.
4. The string cannot start or end with a space
I'm especially struggling with 3.
Any help/hint would be much appreciated.
Examples:
" lkjsdi1SD" => FALSE (starts with a space)
"-lkjsdi1SD" => TRUE
"lkjsd -i1SD " => FALSE (ends with a space)
".Dg5 -lkjsdi1SD" => TRUE
"jhv5675gjjvghHJHvg655775vfFVHFJFf445576JHFFfhd12" => FALSE (too long)
"jhv  12" => FALSE (two consecutive spaces)
"as" => FALSE (too short)
"a r" => TRUE

I suggest using:
^ # Start of string
(?=.{3,25}$) # The total string length is from 3 to 25
[._-]? # An optional . _ or - (? means "match 1 or 0 times")
[a-zA-Z0-9]+ # one or more alphanumeric symbols
(?: # Zero or more sequences of:
(?:[._-]|[ ][._-]?) # one . _ or - OR a space followed by an optional . _ or -
[a-zA-Z0-9]+ # one or more alphanumerics
)* # (here * defines zero or more times)
[._-]? # one optional . _ or -
$ # End of string
See the inline comments for each part (I used the /x VERBOSE, or free-spacing, modifier to enable comments, which helps keep long patterns readable).
More pattern details
^ - the start-of-string anchor: the regex engine will only try to match the whole pattern at the start of the string. Thus, if the string starts with a space, no match is returned, because [a-zA-Z0-9]+, the first obligatory subpattern, requires an alphanumeric, and [._-]? (a character class matching one or zero ., _, or -; the ? quantifier matches one or zero occurrences of the quantified subpattern) only allows one of these three characters before the first alphanumeric.
(?=.{3,25}$) - a positive lookahead anchored at the start that requires at least 3 and at most 25 characters other than a newline (. matches any char except LF unless the /s modifier is set) from start to end. $ is the end-of-string anchor; it matches at the end of the string or before a final newline character, so replace it with \z if you want to reject a string that ends with a newline.
{3,25} is a limiting quantifier that matches between min and max occurrences of the quantified subpattern. Note that a lookahead does not consume text: the regex engine returns to the position where it started matching the lookahead pattern with a true or false result, and if true, goes on matching the rest of the pattern.
[._-]? - an optional single char, one of the defined chars in the character class (see explanation above)
[a-zA-Z0-9]+ - one or more characters (the + quantifier matches 1 or more occurrences) from the ranges defined in the character class.
(?:(?:[._-]|[ ][._-]?)[a-zA-Z0-9]+)* - a non-capturing group used only for grouping subpatterns (to match them consecutively) that can match zero or more (as the * stands after it) sequences of (?:[._-]|[ ][._-]?)[a-zA-Z0-9]+:
(?:[._-]|[ ][._-]?) - either a ., _, or -, OR (due to the | alternation operator) a space (I put the space into a character class [ ] because I used the /x VERBOSE modifier to introduce newline formatting and comments into the pattern; you may use a regular space if you do not use the /x modifier) followed by an optional ., _, or -.
[a-zA-Z0-9]+ - 1 or more (due to +) alphanumerics.
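The verbose pattern can be used as-is with PHP's `x` modifier. This sketch stores it in a nowdoc with `~` delimiters (an assumption, chosen because no `~` appears in the pattern or its comments) and runs the question's examples.

```php
<?php
// Free-spacing (x) version of the pattern; the nowdoc avoids any escaping.
$pattern = <<<'RE'
~^                      # start of string
(?=.{3,25}$)            # total length from 3 to 25
[._-]?                  # optional leading . _ or -
[a-zA-Z0-9]+            # one or more alphanumerics
(?:                     # zero or more sequences of:
  (?:[._-]|[ ][._-]?)   #   . _ or -, OR a space with an optional . _ or -
  [a-zA-Z0-9]+          #   one or more alphanumerics
)*
[._-]?                  # optional trailing . _ or -
$~x
RE;

// Examples from the question with their expected results.
$examples = [
    ' lkjsdi1SD'      => false,
    '-lkjsdi1SD'      => true,
    'lkjsd -i1SD '    => false,
    '.Dg5 -lkjsdi1SD' => true,
    'jhv5675gjjvghHJHvg655775vfFVHFJFf445576JHFFfhd12' => false,
    'jhv  12'         => false,
    'as'              => false,
    'a r'             => true,
];

foreach ($examples as $s => $expected) {
    $ok = preg_match($pattern, $s) === 1;
    printf("%s => %s (expected %s)\n", var_export($s, true), var_export($ok, true), var_export($expected, true));
}

// 1: $ also matches before a final newline, as noted above.
var_dump(preg_match($pattern, "a r\n"));
```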

Try using this:
^(?:[a-zA-Z0-9]|[._-](?![ ._-]))(?:[a-zA-Z0-9 ]|[._-](?![ ._-])){1,23}[a-zA-Z0-9._-]$
The part [._-](?![ ._-]) means "match [._-] only if it is not followed by [ ._-]".
In general, it is worth reading up on lookarounds.
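Checking the pattern above against the question's examples suggests one gap: the middle class `[a-zA-Z0-9 ]` accepts any number of spaces, so "jhv  12" (expected FALSE) matches. The sketch below compares it with a hypothetical tweak, not part of the original answer, that allows a space only when it is not followed by another space.

```php
<?php
// The pattern from the answer above, unchanged.
$original = '~^(?:[a-zA-Z0-9]|[._-](?![ ._-]))(?:[a-zA-Z0-9 ]|[._-](?![ ._-])){1,23}[a-zA-Z0-9._-]$~';

// Hypothetical tweak: " (?! )" matches a space only if the next char is not a space.
$tweaked = '~^(?:[a-zA-Z0-9]|[._-](?![ ._-]))(?:[a-zA-Z0-9]| (?! )|[._-](?![ ._-])){1,23}[a-zA-Z0-9._-]$~';

foreach (['a r', '.Dg5 -lkjsdi1SD', 'jhv  12'] as $s) {
    printf("%-18s original: %d  tweaked: %d\n", var_export($s, true), preg_match($original, $s), preg_match($tweaked, $s));
}
```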

Regexp for checking a-z, A-Z, 0-9, -, _, but no more than 5 numbers

Can someone tell me what the syntax would be for a regex that only allows the following characters:
a-z
A-Z
0-9
dash
underscore
Additionally, the string cannot contain more than 5 numbers (digits).
Thank you in advance for the help!

One suggested regex is
^[a-zA-Z0-9_-]{0,5}$
It matches any combination of the allowed characters, but only up to five characters in total, so it caps the overall length rather than the number of digits.

Several possibilities:
~\A(?:[a-z_-]*[0-9]){0,5}[a-z_-]*\z(?<=.)~i
or
~\A(?!(?:.*[0-9]){6})[\w-]+\z~
Both patterns assume that the empty string is not allowed.
First pattern:
~ # pattern delimiter
\A # anchor for the start of the string
(?:[a-z_-]*[0-9]){0,5} # repeat this group between 0 and 5 times (so 5 digits max)
[a-z_-]* # zero or more allowed characters
\z # end of the string
(?<=.) # lookbehind that checks there is at least one character
~
xi # x allows the whitespace and comments above; i makes the pattern case insensitive
Second pattern:
~
\A
(?! # negative lookahead that checks there is not
(?:.*[0-9]){6} # 6 digits in the string
)
[\w-]+
\z
~x
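Both one-line patterns can be compared side by side. In this sketch the sample strings are illustrative: one has exactly 5 digits, one has 6, and the empty string checks the stated assumption.

```php
<?php
// Both patterns from the answer above, written on one line each.
$first  = '~\A(?:[a-z_-]*[0-9]){0,5}[a-z_-]*\z(?<=.)~i';
$second = '~\A(?!(?:.*[0-9]){6})[\w-]+\z~';

$samples = ['abc-12_345', 'ABC_def', 'a1b2c3d4e5f6', 'abc!', ''];

foreach ($samples as $s) {
    printf("%-14s first: %d  second: %d\n", var_export($s, true), preg_match($first, $s), preg_match($second, $s));
}
```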

Find words without repeated characters using php regex

I have an initial string of words, like:
abab sbs abc ffuuu qwerty uii onnl ghj
And I would like to extract only the words that do not contain any repeated characters, so that the above string is returned as:
abc qwerty ghj
How to accomplish this task using Regular Expressions?

I guess the post is open again after a little rewording of the question.
This was moved from the comments to the answer section.
A while ago I saw this kind of problem in a question about no duplicate characters
that encompassed the entire string. I just translated it to word boundaries.
@Michael J Mulligan did a test case for it (see comments).
The links:
'Working Regex test (regex101.com/r/bA2wB0/1) and a working PHP example (ideone.com/7ID8Ct)'
# For NO duplicate letters anywhere within word characters
# -----------------------------------------------------------
# \b(?!\w*(\w)\w*\1)\w+
\b # Word boundary
# Only word chars now
(?! # Lookahead assertion (like a true/false conditional)
# It doesn't matter if the assertion is negative or positive.
# In this section, the engine is forced to match if it can,
# it has no choice, it can't backtrack its way out of here.
\w*
( \w ) # (1), Pick a word char, any word char
\w*
\1 # Now it is here again
# Ok, the expression matched, time to check if the assertion is correct.
) # End assertion
\w+ # It's here now, looks like the assertion let us through
# The assertion is that there are no duplicate word chars ahead,
# so free to match word chars 'en masse'
# For ONLY duplicate letters anywhere within word characters
# just do the inverse. In this case, the inverse is changing
# the lookahead assertion to positive (want duplicates).
# -----------------------------------------------------------
# \b(?=\w*(\w)\w*\1)\w+
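The same approach in PHP, as a minimal sketch using `preg_match_all()` on the question's input; the second call uses the positive-lookahead inverse mentioned at the end of the answer.

```php
<?php
$input = 'abab sbs abc ffuuu qwerty uii onnl ghj';

// At each word start, reject the word if any word char (group 1) appears again in it.
preg_match_all('~\b(?!\w*(\w)\w*\1)\w+~', $input, $m);
echo implode(' ', $m[0]), "\n";

// Inverse: keep only the words that do contain a duplicate.
preg_match_all('~\b(?=\w*(\w)\w*\1)\w+~', $input, $dup);
echo implode(' ', $dup[0]), "\n";
```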

Regex to match a slug?

I'm having trouble creating a Regex to match URL slugs (basically, alphanumeric "words" separated by single dashes)
this-is-an-example
I've come up with this Regex: /[a-z0-9\-]+$/ and while it restricts the string to only alphanumerical characters and dashes, it still produces some false positives like these:
-example
example-
this-----is---an--example
-
I'm quite bad with regular expressions, so any help would be appreciated.

You can use this:
/^
[a-z0-9]+ # One or more repetition of given characters
(?: # A non-capture group.
- # A hyphen
[a-z0-9]+ # One or more repetition of given characters
)* # Zero or more repetition of previous group
$/x
This will match:
A sequence of alphanumeric characters at the beginning.
Then a hyphen followed by another sequence of alphanumeric characters, 0 or more times.
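A quick check of the slug pattern, compressed to one line (so no `x` modifier is needed), against the question's example and its false positives.

```php
<?php
// Alphanumeric runs joined by single hyphens.
$slug = '/^[a-z0-9]+(?:-[a-z0-9]+)*$/';

foreach (['this-is-an-example', '-example', 'example-', 'this-----is---an--example', '-'] as $s) {
    printf("%-28s %s\n", $s, preg_match($slug, $s) ? 'match' : 'no match');
}
```

Since `$` also matches before a trailing newline, `\z` (or the `D` modifier) can be used instead if such input must be rejected.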

A more comprehensive regex that will match both ASCII and non-ASCII characters in slugs would be:
/^ # start of string
[^\s!?\/.*#|] # exclude whitespace (spaces, tabs, line feeds) as well as the reserved characters !?/.*#|
+ # match one or more times
$/ # end of string
For good measure, we exclude the reserved URL characters.
So, for example, the above will match:
une-ecole_123-soleil
une-école_123-soleil
une-%C3%A9cole-123_soleil
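A sketch of the permissive pattern on one line, run against the three examples above plus a few illustrative rejects. Without the `u` modifier the negated class is checked byte by byte, so the two UTF-8 bytes of é each pass.

```php
<?php
// One-line form of the pattern above; the / delimiter is escaped inside the class.
$slug = '/^[^\s!?\/.*#|]+$/';

$samples = ['une-ecole_123-soleil', 'une-école_123-soleil', 'une-%C3%A9cole-123_soleil', 'une ecole', 'a/b', '-example'];

foreach ($samples as $s) {
    printf("%-28s %s\n", $s, preg_match($slug, $s) ? 'match' : 'no match');
}
```

Note that, unlike the previous answer's pattern, this one does not constrain hyphen placement, so `-example` is accepted.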
