Regex to match a-z spaces, character limit, only first letter capitalized - php

I need some help with a regex. I'm terrible at this.
Rules:
Only letters a through z and spaces
Minimum 2 letters
Maximum 30 letters
Each word must be at least 2 letters
Only the first letter of each word may be capital but the first letter must always be capital
My attempt:
^[A-Z][a-z]{2,30}$
I'm using this in PHP.

Okay, so let's try solving requirements 1 to 3 first. If you mean 2 to 30 characters it's as simple as this:
^[a-zA-Z ]{2,30}$
Now for the other requirements. Let's handle those alone. Point 4 requires each word to be of the form [a-zA-Z][a-z]*. To make sure that each word has at least two letters, we can simply turn the * into a + (which means 1 or more repetitions). If we insert explicit spaces around these, that makes sure that the [a-z]+ cannot be followed directly by a capital letter:
^[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$
Note that I treated the first word separately.
Finally, how do we combine the two? By putting one into a lookahead. I'm going for the counting here:
^(?=[a-zA-Z ]{2,30}$)[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$
This works because, after the input is checked against the lookahead the engine resets it "cursor" to where it started (the beginning of the string) and continues matching as usual. This way we can run two passes over the input, checking for independent conditions.
Finally, note that the lookahead requirement simply translates to the string's length. In such a case it would be easier (and most often better) to check this separately:
$len = strlen($input)
if ($len < 2 || $len > 30)
// report error about string length
else if (!preg_match('/^[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$/', $input))
// report error about pattern
else
// process input
This makes it much easier to give sensible error messages depending on which condition was violated.

Let's try this:
^[A-Z]((?<= )[A-Z]|[a-z ]){2,29}$
[A-Z] -- a capital letter
(
(?<= )[A-Z] -- either a capital letter preceded by a space
| -- or
[a-z ] -- a lowercase letter or a space
){2,29} -- 2 to 29 times (plus the initial capital)
You will need to use the PCRE (not ereg_*) for the lookbehind to work.
"My name Is bob"
↑ ↑ ↑
| | \-- this is a "(?<= )[A-Z]"
| \--- this is a "[a-z]"
\---- this is a "[ ]"
"naMe"
↑
\-- this is NOT a "(?<= )[A-Z]" (a character before the [A-Z] is not a space)
EDIT: damn, you added the "Each word must be at least 2 letters". Use m.buettner's.

Related

PHP Pattern Validation

I'm having a bit of trouble getting my pattern to validate the string entry correctly. The PHP portion of this assignment is working correctly, so I won't include that here as to make this easier to read. Can someone tell me why this pattern isn't matching what I'm trying to do?
This pattern has these validation requirements:
Should first have 3-6 lowercase letters
This is immediately followed by either a hyphen or a space
Followed by 1-3 digits
$codecheck = '/^([[:lower:]]{3,6}-)|([[:lower:]]{3,6} ?)\d{1,3}$/';
Currently this catches most of the requirements, but it only seems to validate the minimum character requirements - and doesn't return false when more than 6 or 3 characters (respectively) are entered.
Thanks in advance for any assistance!
The problem here lies in how you group the alternatives. Right now, the regex matches a string that
^([[:lower:]]{3,6}-) - starts with 3-6 lowercase letters followed with a hyphen
| - or
([[:lower:]]{3,6} ?)\d{1,3}$ - ends with 3-6 lowercase letters followed with an optional space and followed with 1-3 digits.
In fact, you can get rid of the alternation altogether:
$codecheck = '/^\p{Ll}{3,6}[- ]\d{1,3}$/';
See the regex demo
Explanation:
^ - start of string
\p{Ll}{3,6} - 3-6 lowercase letters
[- ] - a positive character class matching one character, either a hyphen or a space
\d{1,3} - 1-3 digits
$ - end of string
You need to delimit the scope of the | operator in the middle of your regex.
As it is now:
the right-side argument of that OR runs up until the very end of your regex, even including the $. So the digits, nor the end-of-string condition do not apply for the left side of the |.
the left-side argument of the OR starts with ^, and only applies to the left side.
That is why you get a match when you supply 7 lowercase characters. The first character is ignored, and the rest matches with the right-side of the regex pattern.

PHP Regex - Minimum of 8 characters, 1 letter, 1 number

So I thought I had this right?
if(!preg_match('^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+.{7,}$', $passOne)) {
$msg = "Password does not contain at least 1 number/letter, 8 character minimum requirement.";
}
I test it over at https://regex101.com/ and put ^(?=.*[a-z])(?=.*[A-Z])(?=.*\d).+.{7,}$ to work and things like RubyGlue12 pass and is a match and other things that aren't.
But no matter what, I cannot make any match in the actual PHP code. Even if the POST is a variable manually.
edit: $_POST['password'] is $passOne
Help?
You have .+.{7,} that does not make much sense since it means match any characters 1 or more times, and then any characters 7 or more times.
1 letter, 1 digit and min 8 characters regex will look like
(?i)^(?=.*[a-z])(?=.*\d).{8,}$
Regex explanation:
(?i) - Turning case sensitivity off (unless you need to check for 1 uppercase and 1 lowercase letters - then, remove the i flag and replace (?=.*[a-z]) with (?=.*[a-z])(?=.*[A-Z]))
^ - Start of string
(?=.*[a-z]) - Positive look-ahead requirement to have at least 1 letter
(?=.*\d) - Positive look-ahead requirement to have at least 1 digit
.{8,} - At least 8 characters
$ - End of string.
And PHP code is:
$re = "/^(?=.*[a-z])(?=.*\\d).{8,}$/i";
$passOne = "RubyGlue12";
//$passOne = "eee"; // uncomment to echo the error message
if(!preg_match($re, $passOne)) {
echo "Password does not contain at least 1 number/letter, 8 character minimum requirement.";
}
And yes, with preg_match function, you must use some regex delimiters. I am using / here.
Here's a regex that tests for what the password should not be, instead of what it should be.
/^(.{0,7}|[^a-z]*|[^\d]*)$/i
Example:
if (preg_match('/^(.{0,7}|[^a-z]*|[^\d]*)$/i', $passOne)) {
echo "Validation failed.\n";
}
Explanation:
There are essentially 3 separate tests within the regex (each separated by a |, and each are case-insensitive due to the i option at the end). If it passes any of the tests, then the entire validation fails.
Test 1: Does the entire string only contain 0-7 characters? Fail.
Test 2: Does the entire string contain no alpha characters? Fail.
Test 3: Does the entire string contain no digits? Fail.

php regex - find uppercase string with number and spaces in text

I want to write php regular expression to find uppercase string , which can also contain one number and spaces, from text.
For example from this text "some text to contain EXAM PL E 7STRING uppercase word" I want to get string- EXAM PL E 7STRING ,
found string should start and end only with uppercase, but in the middle, without uppercase letters can also contain(but not necessarily ) one number and spaces. So, regex should match any of these patterns
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one spaces
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
and with total length string should be equal or more than 4 letters
I wrote this regex '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', that can find first 4 patterns, but I can not figure it out to match also the last 2 patterns.
Thanks
There is a neat trick called a lookahead. It just checks what is following after the current position. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no previous uppercase letter. This is just a little speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest just matches then a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
EDIT:
As Ωmega pointed out, this will cause problems if there are less then four letters before the second digit, but more after that. This is actually quite tough, because the assertion needs to be, that there are more than 4 letters before the second digit. Since we do not know where the first digit occurs in those four letters, we have to check for all possible positions. For this I would do away with the lookaheads altogether, and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<! # negative lookbehind
[A-Z] # current position is not preceded by a letter
) # end of lookbehind
[A-Z] # match has to start with uppercase letter
\s* # optional spaces after first letter
(?: # subpattern for possible digit positions
\d\s*[A-Z]\s*[A-Z]
# digit comes after first letter, we need two more letters before last one
| # OR
[A-Z]\s*\d\s*[A-Z]
# digit comes after second letter, we need one more letter before last one
| # OR
[A-Z]\s*[A-Z][A-Z\s]*\d?
# digit comes after third letter, or later, or not at all
) # end of subpattern for possible digit positions
[A-Z\s]* # arbitrary amount of further letters and whitespace
[A-Z] # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.
I suggest to use regex pattern
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
(see this demo).
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]
(see this demo)

Regex Rules for First and Second Character

I need help on following regular expression rules of javascript and php.
JS
var charFilter = new RegExp("^[A|B].+[^0123456789]$");
PHP
if (!preg_match('/^[A|B].+[^0123456789]$/', $data_array['sample_textfield'])) {
This regular expression is about
First character must be start with A or B and last character must not include 0 to 9.
I have another validation about, character must be min 3 character and max 6 number.
New rule I want to add is, second character cannot be C, if first letter is A.
Which means
ADA (is valid)
ACA (is not valid)
So I changed the regex code like this
JS
var charFilter = new RegExp("^(A[^C])|(B).+[^0123456789]$");
PHP
if (!preg_match('/^(A[^C])|(B).+[^0123456789]$/', $data_array['sample_textfield'])) {
It is worked for first and second character. If i type
ACA (it says invalid) , But if i type
AD3 (it says valid), it doesn't check the last character anymore. Last character must not contain 0 to 9 number, but it's show as valid.
Can anyone help me to fix that regex code for me ? Thank you so much.
Putting all of your requirements together, it seems that you want this pattern:
^(?=.{3,6}$)(?=A(?!C)|B).+\D$
That is:
From the beginning of the string ^
We can assert that there are between 3 to 6 of "any" characters to end of the string (?=.{3,6}$)
We can also assert that it starts with A not followed by C, or starts with B (?=A(?!C)|B)
And the whole thing doesn't end with a digit .+\D$
This will match (as seen on rubular.com):
= match = = no match =
ADA ACA
ABCD AD3
ABCDE ABCDEFG
ABCDEF
A123X
A X
Note that spaces are allowed by .+ and \D. If you insist on no spaces, you can use e.g. (?=\S{3,6}$) in the first part of the pattern.
(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
References
regular-expressions.info
Lookarounds, Alternation, Anchors, Repetition, Dot, Character Class
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
On alternation precedence
The problem with the original pattern is in misunderstanding the precedence of the alternation | specifier.
Consider the following pattern:
this|that-thing
This pattern consists of two alternates, one that matches "this", and another that matches "that-thing". Contrast this with the following pattern:
(this|that)-thing
Now this pattern matches "this-thing" or "that-thing", thanks to the grouping (…). Coincidentally it also creates a capturing group (which will capture either "this" or "that"). If you don't need the capturing feature, but you need the grouping aspect, use a non-capturing group ``(?:…)`.
Another example of where grouping is desired is with repetition: ha{3} matches "haaa", but (ha){3} matches "hahaha".
References
regular-expressions.info/Brackets for Grouping
Your OR is against the wrong grouping. Try:
^((A[^C])|(B)).+[^0123456789]$
In jasonbars solution the reason it doesn't match ABC is because it requires A followed by not C, which is two characters, followed by one or more of any character followed by a non number. Thus if the string begins with an A the minimum length is 4. You can solve this by using a look ahead assertion.
PHP
$pattern = '#^(A(?=[^C])|B).+\D$#';
i think it should be like
/^(A[^C]|B.).*[^0-9]$/
try this test code
$test = "
A
B
AB
AC
AAA
ABA
ACA
AA9
add more
";
$pat = '/^(A[^C]|B.).*[^0-9]$/';
foreach(preg_split('~\s+~', $test) as $p)
printf("%5s : %s\n<br>", $p, preg_match($pat, $p) ? "ok" : "not ok");

Can Someone explain this reg ex to me?

I recently asked a question on formatting a telephone number and I got lots of responses. Most of the responses were great but one i really wanted to figure out what its doing because it worked great. If phone is the following how do the other lines work...what are they doing so i can learn
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches to a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (ie. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class like a shorthand for a big OR condition: ie. [0-9] is a shorthand for 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of brackets; these define capturing groups -- ie. if there is a match of a pattern as a whole, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in brackets in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing it with ?: (ie. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated groups). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't have to be all the same (ie. 111), because the character class is evaluate for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
In the preg_replace it looks for anything that is not, ^ inside of the [], 0-9 (basically not a number) and replaces / removes it from that string given the replacement is "".
For the first section, it pulls out the first 3 numbers ([0-9]{3}) the {3} is the number of characters to match the items inside the [] are what to match and since this is inside of paranthesis () it stores it as a match in the array $matches. The second part pulls out the next 3 numbers and the last part pulls out the last 4 numbers from $phone and stores the matches that were matched in $matches.
The ~ are delimeters for the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
The pattern you are matching is: [^0-9] followed by the phone number.
[^0-9] is NOT '^' any one digit
So, the match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
I don't think it will match because of the () around the area code and the dash are missing.
I'd do this:
~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~'
"[^0-9]" means everything but numbers from 0 to 9. So basically, first line replace everything but numbers with "" (nothing)
[0-9]{3} means number from 0 to 9, 3 times in a row.
So it check if you have 3 numbers then 3 numbers than 4 numbers and try to match it with $matches.
Check this tuts
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In php you have to delimit regex pattern in some non-alphanumeric character "~" is used here.
[^0-9] is regex pattern used to remove anything out of $phone that is not in 0-9 range remember [^...] will negate the pattern it precedes.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again in this line of code you have "~" as delimiter and
([0-9]{3}) this part of pattern will return 3 numbers from string (note: {} is used to specify range/number of characters to match) in a different output array dimension (check your $matches variable for result) using ( ) in a pattern results in groups/submatches

Categories