PHP Regex: Remove words not equal exactly 3 characters - php

There is an excellent, "very close" answer at "Remove words less than 3 chars" (with a demo), where the regex
\b([a-z]{1,2})\b
removes all words shorter than 3 chars.
But how can this demo be reversed, so that it removes all words that are NOT EXACTLY 3 chars long?
We can catch words of exactly 3 chars with
\b([a-z]{3})\b
but how do I tell the regex to remove all the other words, those whose length is not 3?
So in the regex demo referenced above, only the word 'and' should be left.

Use alternatives to match either 1-2 or 4+ letters.
\b(?:[a-z]{1,2}|[a-z]{4,})\b
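A minimal PHP sketch of how this pattern could be applied with preg_replace. The sample sentence and the whitespace clean-up step are assumptions added for illustration; they are not taken from the linked demo.

```php
<?php
// Hypothetical sample text (not the text used in the original demo).
$text = 'an apple and a pear for the team';

// Remove every lowercase word of 1-2 or 4+ letters; 3-letter words survive.
$stripped = preg_replace('/\b(?:[a-z]{1,2}|[a-z]{4,})\b/', '', $text);

// Assumed clean-up: collapse the spaces left behind by the removed words.
$result = trim(preg_replace('/\s+/', ' ', $stripped));

echo $result, PHP_EOL; // and for the
```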

Another variation uses a negative lookbehind asserting that the word just matched is not exactly 3 chars long:
\b[a-z]+\b(?<!\b[a-z][a-z][a-z]\b)
Or with a skip fail approach for 3 chars a-z:
\b[a-z]{3}\b(*SKIP)(*F)|\b[a-z]+\b
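Both variations can be tried side by side in PHP. This is only a sketch: the helper name stripMatches and the sample text are made up for the example, and the space clean-up is an added assumption.

```php
<?php
// Hypothetical helper: strip the matches, then tidy the leftover spaces.
function stripMatches(string $pattern, string $text): string
{
    return trim(preg_replace('/\s+/', ' ', preg_replace($pattern, '', $text)));
}

$text = 'an apple and a pear for the team'; // assumed sample text

// Negative lookbehind: reject a word when it is exactly 3 letters long.
$viaLookbehind = stripMatches('/\b[a-z]+\b(?<!\b[a-z][a-z][a-z]\b)/', $text);

// (*SKIP)(*F): consume 3-letter words first, then throw that match away.
$viaSkipFail = stripMatches('/\b[a-z]{3}\b(*SKIP)(*F)|\b[a-z]+\b/', $text);

echo $viaLookbehind, PHP_EOL; // and for the
echo $viaSkipFail, PHP_EOL;   // and for the
```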

I think maybe:
\b(?![a-z]{3}\b)[a-z]+\b
Matching:
\b - A word-boundary.
(?![a-z]{3}\b) - A negative lookahead to avoid three-letter words.
[a-z]+\b - Any 1+ letter-words (greedy) up to a word boundary.
Another trick is to use a capture group to match what you want:
\b(?:[a-z]{3}|([a-z]+))\b
\b - A word-boundary
(?:[a-z]{3}|([a-z]+)) - A capture group nested inside an alternation: the first branch consumes three-letter words without capturing them, the second branch captures any other word of 1+ letters (greedy).
\b - A word-boundary
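With the capture-group trick, the removal has to be decided per match. One possible way to do that in PHP is preg_replace_callback, as sketched below; the sample text and the clean-up step are assumptions.

```php
<?php
$text = 'an apple and a pear for the team'; // assumed sample text

// Group 1 is only filled for words that are NOT exactly three letters long.
$result = preg_replace_callback(
    '/\b(?:[a-z]{3}|([a-z]+))\b/',
    // Drop captured words; put three-letter words (no capture) back unchanged.
    fn(array $m): string => ($m[1] ?? '') !== '' ? '' : $m[0],
    $text
);

// Assumed clean-up of the spaces left behind.
$result = trim(preg_replace('/\s+/', ' ', $result));

echo $result, PHP_EOL; // and for the
```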

With an optional group of letters with at least 2 characters and a possessive quantifier:
\b[a-z]{1,2}+(?:[a-z]{2,})?\b
This approach is based on a calculation trick and on backtracking.
In other words: 2 + x = 3 with x > 1 has no solution.
If I had written \b[a-z]{1,2}(?:[a-z]{2,})?\b (with or without the last \b, it isn't important), then when the regex engine reaches the start of a three-letter word, [a-z]{1,2} would consume the first two letters. Since one more character is needed for the last word boundary to succeed, the regex engine has no choice but to backtrack into the {1,2} quantifier. After one backtracking step, [a-z]{1,2} would have consumed only one character and (?:[a-z]{2,})?\b could succeed. But by making this quantifier possessive, I forbid this backtracking step. Since, for a three-letter word, [a-z]{1,2}+ takes 2 characters and [a-z]{2,} needs at least 2 more letters, the pattern fails.
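The difference between the two quantifiers can be checked directly. A small sketch; the test words are made up for the example:

```php
<?php
// Without the possessive quantifier, backtracking lets "and" match (1 + 2 letters).
$plain = preg_match('/\b[a-z]{1,2}(?:[a-z]{2,})?\b/', 'and', $m);

// With {1,2}+ the two consumed letters are never given back, so "and" fails.
$possessive = preg_match('/\b[a-z]{1,2}+(?:[a-z]{2,})?\b/', 'and');

// Words of any other length still match in full.
preg_match_all('/\b[a-z]{1,2}+(?:[a-z]{2,})?\b/', 'a an and pear apple', $all);

var_dump($plain, $m[0], $possessive, $all[0]);
```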
Use the word boundary and force a failure with the possessive quantifier:
\b(?:[a-z]{3}\b)?+[a-z]+
This one also plays with an impossible assertion: three letters followed by a word boundary can't be followed by a letter.
Once again, with a three-letter word, once the three letters are consumed by [a-z]{3}, the possessive quantifier ?+ forbids backtracking and [a-z]+ makes the pattern fail.
Force a failure on 3 letters and skip them using a backtracking control verb:
\b[a-z]{3}\b(*SKIP)^|[a-z]+
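The last two patterns can be compared the same way in PHP. A sketch only: the sample text and the space clean-up are assumptions, not part of the original demos.

```php
<?php
$text = 'an apple and a pear for the team'; // assumed sample text

// Possessive optional group: once "and" plus \b is taken, [a-z]+ has nothing left.
$viaPossessive = preg_replace('/\b(?:[a-z]{3}\b)?+[a-z]+/', '', $text);

// (*SKIP) followed by ^, which cannot match at that position, skips 3-letter words.
$viaSkip = preg_replace('/\b[a-z]{3}\b(*SKIP)^|[a-z]+/', '', $text);

// Assumed clean-up of the spaces left behind.
$viaPossessive = trim(preg_replace('/\s+/', ' ', $viaPossessive));
$viaSkip = trim(preg_replace('/\s+/', ' ', $viaSkip));

echo $viaPossessive, PHP_EOL; // and for the
echo $viaSkip, PHP_EOL;       // and for the
```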

Related

Regex repeating letters

How can I prevent a user from entering a word with repeating letters? I already handle the case for special characters.
I have tried this, and it works for the special characters allowed in the text.
^(?!.*([ \-])\1)\w[a-zA-z0-9 \-]*$
3 My Address--
Will not work (--)
This is what I am trying to do for the letters: (?!.*([a-z])\1{4}). But it does not work; it breaks the regex.
(?!.*([ \-])\1)(?!.*([a-z])\1{4})\w[a-zA-z0-9 \-]*$
It should reject any letter that is entered 4 times in a row. This is for an address field, and as it stands I can still enter:
3 My Adddddddddd
You need to use \2 backreference in the second lookahead, and mind using [a-zA-Z], not [a-zA-z] in the consuming part:
^(?!.*([ -])\1)(?!.*([A-Za-z])\2{3})\w[a-zA-Z0-9 -]*$
See the regex demo.
The first capturing group is ([ -]) in the first lookahead, the second lookahead contains the second group, thus, \2 is necessary.
As you want to filter out matches with at least 4 identical consecutive letters, you need ([A-Za-z])\2{3}, not {4}.
Also, if you plan to match a digit at the beginning, consider replacing \w with \d.
Regex details
^ - start of string
(?!.*([ -])\1) - no two identical consecutive spaces or hyphens allowed in the string
(?!.*([A-Za-z])\2{3}) - no four identical consecutive letters allowed in the string
\w - the first char should be a letter, digit or _
[a-zA-Z0-9 -]* - 0+ letters, digits, spaces or hyphens
$ - end of string.
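A quick PHP check of the corrected pattern. The sample addresses are taken from the question, plus one made-up borderline case with only three identical letters:

```php
<?php
// Pattern from the answer above, wrapped in PHP delimiters.
$pattern = '/^(?!.*([ -])\1)(?!.*([A-Za-z])\2{3})\w[a-zA-Z0-9 -]*$/';

// "3 My Addd" is a hypothetical borderline input: three d's are still allowed.
$samples = ['3 My Address', '3 My Addd', '3 My Address--', '3 My Adddddddddd'];

$results = [];
foreach ($samples as $address) {
    $results[$address] = preg_match($pattern, $address) === 1;
}

var_dump($results);
```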

Password Regular expression with four criteria

I am trying to write a regular expression in PHP to ensure a password meets the following criteria:
It should be at least 8 characters long
It should include at least one special character
It should include at least one capital letter.
I have written the following expression:
$pattern=([a-zA-Z\W+0-9]{8,})
However, it doesn't seem to work as per the listed criteria. Could I get another pair of eyes to aid me please?
Your regex - ([a-zA-Z\W+0-9]{8,}) - actually searches a larger text for a substring of at least 8 characters made up of any English letters, non-word characters (anything other than [a-zA-Z0-9_]), and digits, so it does not enforce 2 of your requirements. Those can be enforced with lookaheads.
Here is a fixed regex:
^(?=.*\W.*)(?=.*[A-Z].*).{8,}$
Actually, you can replace [A-Z] with \p{Lu} if you want to also match/allow non-English uppercase letters. You can also consider using \p{S} instead of \W, or further refine your criterion for a special character by adding symbols or character classes, e.g. [\p{P}\p{S}] (this will also include all Unicode punctuation).
An enhanced regex version:
^(?=.*[\p{S}\p{P}].*)(?=.*\p{Lu}.*).{8,}$
A human-readable explanation:
^ - Beginning of a string
(?=.*\W.*) - Requirement to have at least 1 non-word character
OR (?=.*[\p{S}\p{P}].*) - At least 1 Unicode symbol or punctuation character
(?=.*[A-Z].*) - Requirement to have at least 1 uppercase English letter
OR (?=.*\p{Lu}.*) - At least 1 uppercase Unicode letter
.{8,} - Requirement to have at least 8 characters
$ - End of string
See Demo 1 and Demo 2 (Enhanced regex)
Sample code:
if (preg_match('/^(?=.*\W.*)(?=.*[A-Z].*).{8,}$/u', $password)) {
    // PASS
} else {
    // FAIL
}
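To see how the basic and the enhanced regex differ, both can be wrapped in small functions. This is a sketch: the function names and the test passwords are made up for the example.

```php
<?php
// Basic variant: ASCII uppercase letter plus any non-word character.
function isValidBasic(string $password): bool
{
    return preg_match('/^(?=.*\W.*)(?=.*[A-Z].*).{8,}$/u', $password) === 1;
}

// Enhanced variant: Unicode uppercase letter plus a symbol or punctuation mark.
function isValidUnicode(string $password): bool
{
    return preg_match('/^(?=.*[\p{S}\p{P}].*)(?=.*\p{Lu}.*).{8,}$/u', $password) === 1;
}

// "_" is a word character (so not \W), but it is Unicode punctuation (\p{P}).
var_dump(isValidBasic('1234_44A'), isValidUnicode('1234_44A'));

// U+00DC is an uppercase letter outside A-Z, so only \p{Lu} accepts it.
$umlaut = "\u{00DC}berpass!";
var_dump(isValidBasic($umlaut), isValidUnicode($umlaut));
```

Which definition of "special character" is right depends on the password policy; the two variants deliberately disagree on inputs such as the underscore.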
Using positive lookahead ?= we make sure that all password requirements are met.
Requirements for strong password:
At least 8 chars long
At least 1 Capital Letter
At least 1 Special Character
Regex:
^((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))$
PHP implementation:
if (preg_match('/^((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))$/u', $password)) {
# Strong Password
} else {
# Weak Password
}
Examples:
12345678 - WEAK
1234%fff - WEAK
1234_44A - WEAK
133333A$ - STRONG
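One property of this pattern may be worth checking before relying on it: the lookaheads are interleaved with .* instead of all being anchored at the start, so the uppercase letter seems to have to come before the symbol. The sketch below tests that; the function name isStrong and the two extra passwords are made up for the example.

```php
<?php
// Pattern from the answer above, unchanged.
function isStrong(string $password): bool
{
    $pattern = '/^((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))$/u';
    return preg_match($pattern, $password) === 1;
}

// The listed examples behave as stated.
var_dump(isStrong('12345678'), isStrong('1234%fff'), isStrong('1234_44A'), isStrong('133333A$'));

// Same characters, different order: symbol after vs. before the capital letter.
$capitalFirst = isStrong('A$123456');
$symbolFirst = isStrong('$A123456');
var_dump($capitalFirst, $symbolFirst);
```

If the order should not matter, anchoring every lookahead at the start, as in ^(?=.*\W)(?=.*[A-Z]).{8,}$, avoids the issue.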
Regex Explanation:
^ assert position at start of the string
1st Capturing group ((?=[\S]{8})(?:.*)(?=[A-Z]{1})(?:.*)(?=[\p{S}])(?:.*))
(?=[\S]{8}) Positive Lookahead - Assert that the regex below can be matched
[\S]{8} match a single character present in the list below
Quantifier: {8} Exactly 8 times
\S match any kind of visible character [\P{Z}\H\V]
(?:.*) Non-capturing group
.* matches any character (except newline) [unicode]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=[A-Z]{1}) Positive Lookahead - Assert that the regex below can be matched
[A-Z]{1} match a single character present in the list below
Quantifier: {1} Exactly 1 time (meaningless quantifier)
A-Z a single character in the range between A and Z (case sensitive)
(?:.*) Non-capturing group
.* matches any character (except newline) [unicode]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
(?=[\p{S}]) Positive Lookahead - Assert that the regex below can be matched
[\p{S}] match a single character present in the list below
\p{S} matches math symbols, currency signs, dingbats, box-drawing characters, etc
(?:.*) Non-capturing group
.* matches any character (except newline) [unicode]
Quantifier: * Between zero and unlimited times, as many times as possible, giving back as needed [greedy]
$ assert position at end of the string
u modifier: unicode: Pattern and subject strings are treated as UTF-8. Also causes escape sequences to match unicode characters
Demo:
https://regex101.com/r/hE2dD2/1

REGEX - match words that contain letters repeating next to each other

I'm looking for a regex that matches words in which one or more letters are repeated more than once, with the repeats next to each other.
Here's an example:
This is an exxxmaple oooonnnnllllyyyyy!
So far I haven't found anything that can exactly match:
exxxmaple and oooonnnnllllyyyyy
I need to find it and place them in an array, like this:
preg_match_all('/\b(???)\b/', $str, $arr);
Can somebody explain what regexp I have to use?
You can use a very simple regex like
\S*(\w)(?=\1+)\S*
See how the regex matches at http://regex101.com/r/rF3pR7/3
\S matches anything other than whitespace
* quantifier, zero or more occurrences of \S
(\w) matches a single word character, captured in \1
(?=\1+) positive lookahead. Asserts that the captured character is followed by itself (\1)
+ quantifier, one or more occurrences of the repeated character
\S* matches anything other than whitespace
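Plugged into the preg_match_all call from the question, this could look as follows. Note that because \S also matches punctuation, the trailing "!" ends up in the second match.

```php
<?php
$str = 'This is an exxxmaple oooonnnnllllyyyyy!';

// $arr[0] holds the full matches, $arr[1] the captured repeated character.
preg_match_all('/\S*(\w)(?=\1+)\S*/', $str, $arr);

print_r($arr[0]); // exxxmaple, oooonnnnllllyyyyy!
```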
EDIT
If the character must be repeated more than once, a slight modification of the regex does the trick:
\S*(\w)(?=\1{2,})\S*
for example http://regex101.com/r/rF3pR7/5
Use this if you want to discard words like apple etc.
\b\w*(\w)(?=\1\1+)\w*\b
or
\b(?=[^\s]*(\w)\1\1+)\w+\b
Try this. See demo.
http://regex101.com/r/kP8uF5/20
http://regex101.com/r/kP8uF5/21
You can use this pattern:
\b\w*?(\w)\1{2}\w*
The \w class and the word boundary \b limit the search to words. Note that the word boundary can be removed; however, it reduces the number of steps needed to obtain a match (as does the lazy quantifier). Note too that if you are looking for words (in the common meaning), you need to remove the word boundary and use [a-zA-Z] instead of \w.
(\w)\1{2} checks whether a repeated character is present. A word character is captured in group 1 and must be followed twice by the content of the capture group (the backreference \1).
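The three patterns that require a letter to appear at least three times in a row can be compared on the question's example. A sketch; the word "apple" is added to the sample as an assumed negative case:

```php
<?php
// Sample sentence from the question, with "apple" added as a negative case.
$str = 'This is an apple exxxmaple oooonnnnllllyyyyy!';

$patterns = [
    '/\b\w*(\w)(?=\1\1+)\w*\b/',
    '/\b(?=[^\s]*(\w)\1\1+)\w+\b/',
    '/\b\w*?(\w)\1{2}\w*/',
];

$found = [];
foreach ($patterns as $pattern) {
    preg_match_all($pattern, $str, $arr);
    $found[] = $arr[0]; // full matches only
}

print_r($found);
```

Because these patterns use \w rather than \S, the trailing "!" is not part of the match.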

php regex - find uppercase string with number and spaces in text

I want to write a PHP regular expression to find an uppercase string, which can also contain one number and spaces, in a text.
For example, from the text "some text to contain EXAM PL E 7STRING uppercase word" I want to get the string EXAM PL E 7STRING.
The found string should start and end with an uppercase letter; in the middle, besides uppercase letters, it can also (but need not) contain one number and spaces. So the regex should match any of these patterns:
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one spaces
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
and the total length of the string should be 4 letters or more.
I wrote this regex, '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', which can find the first 4 patterns, but I cannot figure out how to match the last 2 patterns as well.
Thanks
There is a neat trick called a lookahead. It just checks what is following after the current position. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no previous uppercase letter. This is just a little speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest just matches then a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
EDIT:
As Ωmega pointed out, this will cause problems if there are fewer than four letters before the second digit, but more after that. This is actually quite tough, because the assertion needs to be that there are at least four letters before the second digit. Since we do not know where the first digit occurs among those four letters, we have to check all possible positions. For this I would do away with the lookaheads altogether and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<! # negative lookbehind
[A-Z] # current position is not preceded by a letter
) # end of lookbehind
[A-Z] # match has to start with uppercase letter
\s* # optional spaces after first letter
(?: # subpattern for possible digit positions
\d\s*[A-Z]\s*[A-Z]
# digit comes after first letter, we need two more letters before last one
| # OR
[A-Z]\s*\d\s*[A-Z]
# digit comes after second letter, we need one more letter before last one
| # OR
[A-Z]\s*[A-Z][A-Z\s]*\d?
# digit comes after third letter, or later, or not at all
) # end of subpattern for possible digit positions
[A-Z\s]* # arbitrary amount of further letters and whitespace
[A-Z] # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.
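A short PHP check of the final pattern against the question's example text. The extra inputs ("abc ABC def" and the bare pattern strings) are assumptions added for the test:

```php
<?php
// Final pattern from the answer above (single-line form).
$pattern = '/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/';

// Example sentence from the question.
$text = 'some text to contain EXAM PL E 7STRING uppercase word';
preg_match($pattern, $text, $m);
echo $m[0], PHP_EOL; // EXAM PL E 7STRING

// Pattern 6 from the question's list should match as a whole.
preg_match($pattern, 'EXAMP LE 4ST RI NG', $six);

// Fewer than four uppercase letters: no match expected.
$tooShort = preg_match($pattern, 'abc ABC def');

var_dump($six[0], $tooShort);
```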
I suggest using the regex pattern
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
(see this demo).
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]
(see this demo)

Regex: how to match a word that doesn't end with a specific character

I would like to match the whole "word"—one that starts with a number character and that may include special characters but does not end with a '%'.
Match these:
112 (whole numbers)
10-12 (ranges)
11/2 (fractions)
11.2 (decimal numbers)
1,200 (thousand separator)
but not
12% (percentages)
A38 (words starting with an alphabetic character)
I've tried these regular expressions:
(\b\p{N}\S*)
but that returns '12%' in '12%'
(\b\p{N}(?:(?!%)\S)*)
but that returns '12' in '12%'
Can I make an exception to the \S term that disregards %?
Or will I have to do something else?
I'll be using it in PHP, but just write as you would like and I'll convert it to PHP.
This matches your specification:
\b\p{N}\S*+(?<!%)
Explanation:
\b # Start of number
\p{N} # One Digit
\S*+ # Any number of non-space characters, match possessively
(?<!%) # Last character must not be a %
The possessive quantifier \S*+ makes sure that the regex engine will not backtrack into a string of non-space characters it has already matched. Therefore, it will not "give back" a % to match 12 within 12%.
Of course, that will also match 1!abc, so you might want to be more specific than \S which matches anything that's not a whitespace character.
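In PHP, the effect of the possessive quantifier can be checked against the examples from the question. A sketch; putting all examples into one space-separated string is an assumption made for brevity:

```php
<?php
// The examples from the question, joined into one test string.
$text = '112 10-12 11/2 11.2 1,200 12% A38';

// Possessive \S*+ never gives the "%" back, so "12%" is rejected outright.
preg_match_all('/\b\p{N}\S*+(?<!%)/u', $text, $possessive);

// Greedy \S* backtracks one character and still finds "12" inside "12%".
preg_match('/\b\p{N}\S*(?<!%)/u', '12%', $greedy);

print_r($possessive[0]);
echo $greedy[0], PHP_EOL; // 12
```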
Can i make an exception to the \S term that disregards %
Yes you can:
[^%\s]
See this expression \b\d[^%\s]* here on Regexr
\d+([-/\.,]\d+)?(?!%)
Explanation:
\d+ one or more digits
(
[-/\.,] one "-", "/", "." or ","
\d+ one or more digits
)? the group above zero or one times
(?!%) not followed by a "%" (negative lookahead)
KISS (restrictive):
/[0-9][0-9.,\/-]*\s/
Try this one:
preg_match("/^[0-9].*[^%]$/", $string);
Try this PCRE regex:
/^(\d[^%]+)$/
It should give you what you need.
I would suggest just:
(\b[\p{N},.-]++(?!%))
That's not very exact regarding decimal delimiters or ranges, for example. But the ++ possessive quantifier will eat up as many digits as it can, so you really just need to check the following character with a simple assertion. It did work for your examples.
