PHP regex - find uppercase string with number and spaces in text

I want to write a PHP regular expression that finds an uppercase string, which may also contain one digit and spaces, in a text.
For example, from the text "some text to contain EXAM PL E 7STRING uppercase word" I want to get the string EXAM PL E 7STRING.
The found string should start and end with an uppercase letter; in the middle, besides uppercase letters, it may (but need not) contain one digit and spaces. So the regex should match any of these patterns:
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one spaces
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
and the string should contain at least 4 letters in total.
I wrote this regex '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', which finds the first 4 patterns, but I cannot figure out how to match the last 2 patterns as well.
Thanks

There is a neat trick called a lookahead. It just checks what is following after the current position. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no preceding uppercase letter. This is just a small speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest then simply matches a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
EDIT:
As Ωmega pointed out, this will cause problems if there are fewer than four letters before the second digit, but more after it. This is actually quite tough, because the assertion needs to be that there are at least four letters before the second digit. Since we do not know where the first digit occurs among those four letters, we have to check all possible positions. For this I would do away with the lookaheads altogether and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<! # negative lookbehind
[A-Z] # current position is not preceded by a letter
) # end of lookbehind
[A-Z] # match has to start with uppercase letter
\s* # optional spaces after first letter
(?: # subpattern for possible digit positions
\d\s*[A-Z]\s*[A-Z]
# digit comes after first letter, we need two more letters before last one
| # OR
[A-Z]\s*\d\s*[A-Z]
# digit comes after second letter, we need one more letter before last one
| # OR
[A-Z]\s*[A-Z][A-Z\s]*\d?
# digit comes after third letter, or later, or not at all
) # end of subpattern for possible digit positions
[A-Z\s]* # arbitrary amount of further letters and whitespace
[A-Z] # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.
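As a quick check, here is a minimal sketch that runs the final pattern above against the sample sentence and the six example strings from the question. The helper name findUppercaseRun is made up for this illustration; only preg_match is a real PHP function.

```php
<?php
// Final pattern from the answer above, copied unchanged.
$pattern = '/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/';

// Hypothetical helper: returns the first match in $text, or null if there is none.
function findUppercaseRun(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'some text to contain EXAM PL E 7STRING uppercase word';
echo findUppercaseRun($pattern, $text), "\n"; // EXAM PL E 7STRING

// The six example strings from the question; each should be matched in full.
$examples = [
    'EXAMPLESTRING',
    'EXAMP4LESTRING',
    'EXAMPLES TRING',
    'EXAM PL E STRING',
    'EXAMP LE4STRING',
    'EXAMP LE 4ST RI NG',
];
foreach ($examples as $example) {
    var_dump(findUppercaseRun($pattern, $example) === $example); // bool(true) each time
}

// Fewer than four letters: no match.
var_dump(findUppercaseRun($pattern, 'ABC')); // NULL
```

Note that the trailing [A-Z\s]*[A-Z] backtracks off the space after STRING, so the match always ends on an uppercase letter.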

I suggest using this regex pattern:
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
or this shorter one:
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]

Related

Regex repeating letters

How can I prevent a user from entering a word with repeating letters? I already handle the case for special characters.
I have tried this, and it works for the special characters allowed in the text:
^(?!.*([ \-])\1)\w[a-zA-z0-9 \-]*$
3 My Address--
will not be accepted (because of the --).
This is what I am trying for the letters: (?!.*([a-z])\1{4}), but it does not work; it breaks the regex.
(?!.*([ \-])\1)(?!.*([a-z])\1{4})\w[a-zA-z0-9 \-]*$
It should reject any letter that is repeated 4 times in a row. This is for an address, and as it stands I can enter:
3 My Adddddddddd
You need to use \2 backreference in the second lookahead, and mind using [a-zA-Z], not [a-zA-z] in the consuming part:
^(?!.*([ -])\1)(?!.*([A-Za-z])\2{3})\w[a-zA-Z0-9 -]*$
See the regex demo.
The first capturing group is ([ -]) in the first lookahead, the second lookahead contains the second group, thus, \2 is necessary.
As you want to filter out matches with at least 4 identical consecutive letters, you need ([A-Za-z])\2{3}, not {4}.
Also, if you plan to match a digit at the beginning, consider replacing \w with \d.
Regex details
^ - start of string
(?!.*([ -])\1) - no two identical consecutive spaces or hyphens allowed in the string
(?!.*([A-Za-z])\2{3}) - no four identical consecutive letters allowed in the string
\w - the first char should be a letter, digit or _
[a-zA-Z0-9 -]* - 0+ letters, digits, spaces or hyphens
$ - end of string.
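The corrected pattern can be tried out with a short sketch like the following. The function name isValidAddress is only illustrative and not part of the original answer.

```php
<?php
// Pattern from the answer above; the wrapper function is hypothetical.
function isValidAddress(string $address): bool
{
    $pattern = '/^(?!.*([ -])\1)(?!.*([A-Za-z])\2{3})\w[a-zA-Z0-9 -]*$/';
    return preg_match($pattern, $address) === 1;
}

var_dump(isValidAddress('3 My Address'));     // bool(true)
var_dump(isValidAddress('3 My Address--'));   // bool(false): doubled hyphen
var_dump(isValidAddress('3 My Adddddddddd')); // bool(false): four or more identical letters in a row
var_dump(isValidAddress('3 My Adddress'));    // bool(true): only three identical letters in a row
```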

PHP Pattern Validation

I'm having a bit of trouble getting my pattern to validate the string entry correctly. The PHP portion of this assignment is working correctly, so I won't include it here, to make this easier to read. Can someone tell me why this pattern isn't matching what I'm trying to do?
This pattern has these validation requirements:
Should first have 3-6 lowercase letters
This is immediately followed by either a hyphen or a space
Followed by 1-3 digits
$codecheck = '/^([[:lower:]]{3,6}-)|([[:lower:]]{3,6} ?)\d{1,3}$/';
Currently this catches most of the requirements, but it only seems to validate the minimum character requirements - and doesn't return false when more than 6 or 3 characters (respectively) are entered.
Thanks in advance for any assistance!
The problem here lies in how you group the alternatives. Right now, the regex matches a string that
^([[:lower:]]{3,6}-) - starts with 3-6 lowercase letters followed with a hyphen
| - or
([[:lower:]]{3,6} ?)\d{1,3}$ - ends with 3-6 lowercase letters followed with an optional space and followed with 1-3 digits.
In fact, you can get rid of the alternation altogether:
$codecheck = '/^\p{Ll}{3,6}[- ]\d{1,3}$/';
See the regex demo
Explanation:
^ - start of string
\p{Ll}{3,6} - 3-6 lowercase letters
[- ] - a positive character class matching one character, either a hyphen or a space
\d{1,3} - 1-3 digits
$ - end of string
You need to delimit the scope of the | operator in the middle of your regex.
As it is now:
the right-side argument of that OR runs up until the very end of your regex, even including the $. So neither the digits nor the end-of-string condition applies to the left side of the |.
the left-side argument of the OR starts with ^, and only applies to the left side.
That is why you get a match when you supply 7 lowercase characters. The first character is ignored, and the rest matches with the right-side of the regex pattern.
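To see the grouping problem in practice, the following sketch compares the pattern from the question with the fixed one on a few inputs. The test strings are made up for this illustration.

```php
<?php
// Pattern from the question: the | splits the whole regex into two halves.
$original = '/^([[:lower:]]{3,6}-)|([[:lower:]]{3,6} ?)\d{1,3}$/';
// Fixed pattern from the answer above.
$fixed = '/^\p{Ll}{3,6}[- ]\d{1,3}$/';

foreach (['abc-123', 'abcdefg12', 'abc-12345', 'ab-1'] as $code) {
    printf(
        "%-10s original=%d fixed=%d\n",
        $code,
        preg_match($original, $code),
        preg_match($fixed, $code)
    );
}
// abc-123    original=1 fixed=1
// abcdefg12  original=1 fixed=0   (right half matches "bcdefg12")
// abc-12345  original=1 fixed=0   (left half matches "abc-" and has no end anchor)
// ab-1       original=0 fixed=0
```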

Check whether a phone number contains the pattern AABBCC, e.g. 112233

I want to check whether a phone number contains the pattern AABBCC,
where A, B and C are each a digit [0-9] and all three are different, e.g. 112233, 553322, 887766.
Suppose I have the phone number 03334112233.
It should say yes, the pattern matched.
Here is my PHP code, but it only works for an exact string:
$str = 'aabbaabbccaass'; //or whatever
if (preg_match('/(?!.*?aabbcc)^.*$/', $str))
echo "accepted\n";
else
echo "rejected\n";
The problem is that I don't know how to do this when the string consists of digits.
This is a possible duplicate, but that question does not contain an answer or the exact details.
Edit:
I want to match the last 6 characters of the string against the pattern AABBCC, e.g. 03329112233
To match number with AABBCC format, you can use this pattern:
(?:(\d)\1(?!\1)){2}(\d)\2
example of use:
if (preg_match('/(?:(\d)\1(?!\1)){2}(\d)\2/', $str))
echo "rejected\n";
else
echo "accepted\n";
But if you have other tests to do (for example that there is only digits), it can be more flexible to use it in this way:
if (preg_match('/(?!.*(?:(\d)\1(?!\1)){2}(\d)\2)^\d+$/', $str))
echo "accepted\n";
else
echo "rejected\n";
pattern details:
(?: # open a non capturing group that describes a repeated digit
(\d) # capture the first digit with group 1
\1 # a backreference to group 1 (the same digit thus)
(?!\1) # check with a negative lookahead that the same digit doesn't follow
){2} # repeat the group two times
(\d)\2 # same thing for digits 5 & 6 (the lookahead isn't needed here)
Note that the digit in the capture group changes at each repetition of the non-capturing group (because the negative lookahead forces it).
Note: if you want to reject numbers that contain, for example, 111122 or 112222 or 111111, you only need to remove the negative lookahead.
If you want to reject numbers with the format 112211 or 448844, you must change the pattern like this: (\d)\1(?!\d{0,2}\1)(\d)\2(?!\2)(\d)\3
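A minimal sketch of both patterns from this answer follows. The function names are hypothetical, and the trailing $ in the second function is an assumption added here to cover the "last 6 characters" requirement from the question's edit; it is not part of the pattern as given in the answer.

```php
<?php
// Hypothetical wrapper: true if the number contains AABBCC anywhere
// (adjacent pairs differ, but A and C may be equal, e.g. 112211).
function containsAabbcc(string $number): bool
{
    return preg_match('/(?:(\d)\1(?!\1)){2}(\d)\2/', $number) === 1;
}

// Hypothetical wrapper: true if the last six characters are AABBCC with
// three different digits. The "$" anchor is an assumption added here.
function endsWithDistinctAabbcc(string $number): bool
{
    return preg_match('/(\d)\1(?!\d{0,2}\1)(\d)\2(?!\2)(\d)\3$/', $number) === 1;
}

var_dump(containsAabbcc('03334112233'));         // bool(true)
var_dump(containsAabbcc('111122'));              // bool(false)
var_dump(containsAabbcc('112211'));              // bool(true)
var_dump(endsWithDistinctAabbcc('03329112233')); // bool(true)
var_dump(endsWithDistinctAabbcc('03329112211')); // bool(false)
```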
As I understand it, you only want to match the last 6 characters of the string if they are digits and consist of 3 different digit pairs. I would also use a lookahead and a pattern like this:
(?>((\d)\2)(?!.*\1)){3}$
\2 matches the same text as the 2nd capturing group, which is one digit (shorthand \d).
The negative lookahead checks that what follows is not .* (any amount of any characters) followed by the same text as the 1st capturing group (which contains 2 equal digits).
{3} means 3 repetitions, and $ anchors them at the end of the string.
You can test it on regex101.com.
Your regex should be like this:
^((\d)\2){3}$
It is simpler and also works.
You can use capturing groups and backreferences like this:
if (preg_match('/(?!.*(.)\1(.)\2(.)\3)^.*$/', $str))
The (.) will match any single character and assign it to a group. The first instance is assigned to group 1, the second to group 2 and so on. Later in the pattern, the backreference \1 will match exactly what was previously captured in the first group, \2 will match what was captured in the second group, etc.
You probably will also want to use \d to match any single digit (it's only necessary to use this outside of the lookahead) and a {n,m} quantifier to match between n and m digits. For example, the following will match any sequence of 7 to 10 digits that does not contain a subsequence like AABBCC:
if (preg_match('/(?!.*(.)\1(.)\2(.)\3)^\d{7,10}$/', $str))
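The last pattern can be exercised with a small sketch like this one. The function name isAllowedNumber and the sample numbers are made up for illustration. Note that (.)\1(.)\2(.)\3 does not require the three pairs to differ, so a run such as 111111 is rejected as well.

```php
<?php
// Hypothetical wrapper around the pattern above: 7 to 10 digits,
// with no three consecutive doubled characters anywhere in the string.
function isAllowedNumber(string $number): bool
{
    return preg_match('/(?!.*(.)\1(.)\2(.)\3)^\d{7,10}$/', $number) === 1;
}

var_dump(isAllowedNumber('1234567'));    // bool(true)
var_dump(isAllowedNumber('0311223345')); // bool(false): contains 112233
var_dump(isAllowedNumber('12345'));      // bool(false): too short
```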

PHP check that string has 2 numbers, 8 chars and 1 capital

I found lots of php regex and other options to determine string length, and if it contains one letter or one number, but how do I determine if a string has 2 numbers in it?
I am trying to validate a password that
Must have exactly 8 characters
One of them must be an Uppercase letter
2 of them must be numbers
Is there a one line regex solution for this?
if (preg_match(
'/^ # Start of string
(?=.*\p{Lu}) # at least one uppercase letter
(?=.*\d.*\d) # at least two digits
.{8} # exactly 8 characters
$ # End of string
/xu',
$subject)) {
# Successful match
}
(?=...) is a lookahead assertion. It checks if a certain regex can be matched at the current position, but doesn't actually consume any part of the string, so you can just place several of those in a row.
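Since the question asks for a one-line solution, the same regex can also be written without the x modifier and comments. This is a sketch only; the function name isValidPassword and the sample passwords are hypothetical.

```php
<?php
// Same regex as above, collapsed to one line; the wrapper name is illustrative.
function isValidPassword(string $password): bool
{
    return preg_match('/^(?=.*\p{Lu})(?=.*\d.*\d).{8}$/u', $password) === 1;
}

var_dump(isValidPassword('Abcdef12'));  // bool(true)
var_dump(isValidPassword('abcdef12'));  // bool(false): no uppercase letter
var_dump(isValidPassword('Abcdefg1'));  // bool(false): only one digit
var_dump(isValidPassword('Abcdef123')); // bool(false): nine characters
```

As the comments in the original regex say, the second lookahead accepts two or more digits, not exactly two.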

preg_match string

Can someone explain me the meaning of this pattern.
preg_match('/^(d{1,2}([a-z]+))(?:s*)S (?=200[0-9])/', '21st March 2006', $matches);
So correct me if I'm wrong:
^ = beginning of the line
d{1,2} = digit with minimum 1 and maximum 2 digits
([a-z]+) = one or more letters from a-z
(?:s*)S = no idea...
(?= = no idea...
200[0-9] = a number, starting with 200 and ending with a number (0-9)
Can someone complete this list?
Here's a nice diagram courtesy of strfriend:
But I think you probably meant ^(\d{1,2}([a-z]+))(?:\s*)\S (?=200[0-9]) with the backslashes, which gives this diagram:
That is, this regexp matches the beginning of the string, followed by one or two digits, one or more lowercase letters, zero or more whitespace characters, one non-whitespace character and a space. Also, all this has to be followed by a number between 2000 and 2009, although that number is not actually matched by the regexp; it's only a look-ahead assertion. Also, the leading digits and letters are captured into $matches[1], and just the letters into $matches[2].
For more information on PHP's PCRE regexp syntax, see http://php.net/manual/en/pcre.pattern.php
regular-expressions.info is a very helpful resource.
/^(d{1,2}([a-z]+))(?:s*)S (?=200[0-9])/
(?:regex) are non-capturing parentheses. They aren't very useful in your example, but could be used to express things like (?:bar)+, meaning 1 or more bars.
(?=regex) is a positive lookahead: it matches a position, not the contents. So (?=200[0-9]) in your example makes the regex match only when a year from 2000 to 2009 follows, without matching the year itself.
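The explanations above can be checked with a short sketch using the pattern with backslashes restored. One thing it shows is that the subject from the question does not actually match, because \S consumes only a single non-whitespace character before the literal space. The second subject, '21st M 2006', is a made-up string chosen so that the pattern does match.

```php
<?php
// Pattern with the backslashes restored, as suggested in the first answer.
$pattern = '/^(\d{1,2}([a-z]+))(?:\s*)\S (?=200[0-9])/';

// Subject from the question: after "21st " the \S consumes only "M",
// and the next character is "a", not the literal space the pattern requires.
var_dump(preg_match($pattern, '21st March 2006', $matches)); // int(0)

// Hypothetical subject with a single non-whitespace character before the year.
var_dump(preg_match($pattern, '21st M 2006', $matches)); // int(1)
print_r($matches);
// $matches[0] is "21st M " (the year is only asserted, not consumed),
// $matches[1] is "21st" and $matches[2] is "st".
```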
