How to develop a regex for a number with PHP

I want to build a regex in PHP for a number such as '123 2345 7890'. The first 3 characters should be digits, followed by a space, then 4 digits, a space, and 4 more digits. This is what I have so far, but it does not match the format I want. Can anyone please help me fix it?
preg_match("/^([0-9]{3})([0-9]{4})([0-9]{4}).*$/", $new_password)

Your pattern does not match the spaces, and the .* at the end matches zero or more of any character except a newline, so anything may follow the digits.
You could use \h+ to match 1 or more horizontal whitespace chars between the groups, and \h* at the end to match optional trailing horizontal whitespace.
Or just match a literal space instead.
If you don't need the capture groups for further processing, you can omit them.
^\d{3}\h+\d{4}\h+\d{4}\h*$
Regex demo
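A minimal PHP sketch of how the pattern above might be used. The function name isValidNumber and the sample inputs are assumptions made for illustration; they are not part of the original question.

```php
<?php
// Hypothetical helper: checks the "3 digits, 4 digits, 4 digits" layout.
function isValidNumber(string $input): bool
{
    // \h+ requires at least one horizontal whitespace char between the groups;
    // \h* tolerates optional trailing horizontal whitespace.
    return preg_match('/^\d{3}\h+\d{4}\h+\d{4}\h*$/', $input) === 1;
}

var_dump(isValidNumber('123 2345 7890')); // bool(true)
var_dump(isValidNumber('12323457890'));   // bool(false): no spaces
var_dump(isValidNumber('123 2345 789x')); // bool(false): non-digit in the last group
```

Comparing the result of preg_match with 1 keeps the helper strict: preg_match returns 0 for no match and false on error, and both are treated as invalid here.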

Related

PHP regex match Latin words may contains symbols, digits and spaces

I have record names that mix Cyrillic and Latin words, symbols, spaces, digits, etc.
I need to preg_match (PHP) only the Latin part, with any symbols in any combination.
Test set:
БлаблаБла Uty-223
Блабла (бла.)Бла CAROP-C
Бла бла ST.MORITZ
Бла бла RAMIRO2-TED
LA PLYSGNE 1 H - 001
(Блабла) - the Cyrillic words don't matter.
So I tried this pattern:
/[-0-9a-zA-Z.]+/
But [Блабла (бла.)Бла CAROP-C] and [LA PLYSGNE 1 H - 001] are not matched as a whole string.
Next I tried to write a more flexible pattern:
/[-0-9a-zA-Z]+(?:.)?+(?:\s+)?+[-0-9a-zA-Z]+/
But there is still a problem with matching [LA PLYSGNE 1 H - 001].
Any idea how this can be solved?
Thanks.
If the . and - cannot occur at the beginning or end, you can start the match with [0-9a-zA-Z]+ and then optionally repeat one or more of the chars listed in the character class (., - or horizontal whitespace), followed again by [0-9a-zA-Z]+:
\b[0-9a-zA-Z]+(?:[.\h-]+[0-9a-zA-Z]+)*\b
The \b is a word boundary preventing a partial word match
\h matches a horizontal whitespace character
See a regex101 demo.
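As a rough illustration, the first pattern can be run against the test set from the question like this. The helper name extractLatinPart is hypothetical, and the u modifier is an assumption added here because the input is UTF-8 text.

```php
<?php
// Hypothetical helper: returns the first Latin run in a record name, or null.
function extractLatinPart(string $name): ?string
{
    // The u modifier treats the pattern and the subject as UTF-8.
    $pattern = '/\b[0-9a-zA-Z]+(?:[.\h-]+[0-9a-zA-Z]+)*\b/u';
    return preg_match($pattern, $name, $m) === 1 ? $m[0] : null;
}

$names = [
    'БлаблаБла Uty-223',
    'Блабла (бла.)Бла CAROP-C',
    'Бла бла ST.MORITZ',
    'Бла бла RAMIRO2-TED',
    'LA PLYSGNE 1 H - 001',
];

foreach ($names as $name) {
    echo extractLatinPart($name), PHP_EOL;
}
// Uty-223
// CAROP-C
// ST.MORITZ
// RAMIRO2-TED
// LA PLYSGNE 1 H - 001
```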
Alternatively, match at least a single char [0-9a-zA-Z], allow the chars . and - anywhere in the match, and assert whitespace boundaries to the left and right:
(?<!\S)[.-]*\b[0-9a-zA-Z](?:[0-9a-zA-Z.\h-]*[0-9a-zA-Z.-])?(?!\S)
(?<!\S) and (?!\S) are lookaround assertions that act as whitespace boundaries: they assert that there is no non-whitespace char directly to the left and to the right.
See a regex101 demo.
You can also use a script run starting with a Latin letter:
~(*sr:\p{Latin}.*\S)~u
demo

Regex optional groups and digit length

Maybe some regex master can solve my problem.
I have a big list of addresses with no separators (, ;).
Each address string contains the following information:
The first group is the street name
The second group is the street number
The third group is the zipcode (optional)
The last group is the town name (optional)
As you can see in the image above, the last two test strings do not match.
I need the last two regex groups to be optional, and the third group should be either 4 or 5 digits.
I tried (\d{4,5}) to allow 4 and 5 digits, but this only works halfway, as you can see here: https://regex101.com/r/ZurqHh/1
(This sometimes mixes the street number and zipcode together.)
I also tried (?:\d{5})? to make the third and fourth groups optional, but this destroys my whole group layout...
https://regex101.com/r/EgxeMy/1
This is my current regex:
/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im
Try it out yourself:
https://regex101.com/r/zC8NCP/1
My brain is only farting at this moment and I can't think straight anymore.
Please help me fix this problem so i can die in peace.
You can use
^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$
See the regex demo (note that in the demo all \s are replaced with \h so that only horizontal whitespace is matched).
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching
  \s+ - one or more whitespaces
  (\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
    \d+ - one or more digits
    (?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, -, +, | or /, zero or more whitespaces, one or more digits
    \s* - zero or more whitespaces
    [a-z]?\b - an optional lowercase ASCII letter and a word boundary
(?:\s+(\d{4,5})(?:\s+(.*))?)? - an optional non-capturing group matching
  \s+ - one or more whitespaces
  (\d{4,5}) - Group 3: four or five digits
  (?:\s+(.*))? - an optional sequence of one or more whitespaces and then any zero or more chars other than line break chars, as many as possible
$ - end of string.
Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group to work.
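A small PHP sketch of the pattern in use. The helper name parseAddress, the array keys, the i and u modifiers, and the sample addresses are all assumptions made for illustration; the question's own test strings are only available through its regex101 links.

```php
<?php
// Hypothetical helper: splits one address line into its four parts.
function parseAddress(string $line): ?array
{
    $pattern = '/^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?'
             . '(?:\s+(\d{4,5})(?:\s+(.*))?)?$/iu';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // no match, or an error such as invalid UTF-8
    }

    // Optional groups that did not take part in the match may be missing
    // from $m (or empty), so normalise them to null.
    $parts = [];
    foreach (['street', 'number', 'zip', 'town'] as $i => $key) {
        $value = $m[$i + 1] ?? '';
        $parts[$key] = $value === '' ? null : $value;
    }
    return $parts;
}

print_r(parseAddress('Musterstraße 12a 12345 Berlin'));
// street: Musterstraße, number: 12a, zip: 12345, town: Berlin
print_r(parseAddress('Am Ring 7 1234 Wien'));
// street: Am Ring, number: 7, zip: 1234, town: Wien
print_r(parseAddress('Hauptstr. 5'));
// street: Hauptstr., number: 5, zip and town: null
```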
It is difficult to parse addresses because they sit halfway between formatted text and natural language. Here is a pattern that tries to reduce the number of optional parts as much as possible, so that it succeeds with the examples given without asking too much of the regex engine. To do this, I mainly rely on character classes, atomic groups, and a relatively accurate description of the street names. Obviously, the examples in the question cannot be representative of every real address, and characters could be added to or removed from the classes to deal with new cases. Nevertheless, the structure of this pattern is a good starting point.
~
^
(?<strasse> [\pL\d-]+ \.? (?> \h+ [\pL\d-]+ \.? )*? ) \h*
(?<nummer> \b (?> \d+ | [-+/\h]+ | [a-z] \b )*? )
(?: \h+ (?<plz> \d{4,5} )
\h+ (?<stadt> .+ ) )?
$
~mxui
demo
Note that in the above link you can also see a previous version of this pattern with a more accurate description of the street number (a bit more efficient but longer).

Regex repeating letters

How can I prevent a user from entering a word with repeating letters? I already have the case for special characters covered.
I have tried this and it works for the special characters allowed in the text.
^(?!.*([ \-])\1)\w[a-zA-z0-9 \-]*$
3 My Address--
is rejected because of the double hyphen (--).
This is what I am trying to use for the letters: (?!.*([a-z])\1{4}). It does not work; it breaks the regex.
(?!.*([ \-])\1)(?!.*([a-z])\1{4})\w[a-zA-z0-9 \-]*$
It should reject any letter that is entered 4 times in a row. This is for an address, and as it stands I can still enter:
3 My Adddddddddd
You need to use \2 backreference in the second lookahead, and mind using [a-zA-Z], not [a-zA-z] in the consuming part:
^(?!.*([ -])\1)(?!.*([A-Za-z])\2{3})\w[a-zA-Z0-9 -]*$
See the regex demo.
The first capturing group is ([ -]) in the first lookahead, the second lookahead contains the second group, thus, \2 is necessary.
As you want to filter out matches with at least 4 identical consecutive letters, you need ([A-Za-z])\2{3}, not {4}.
Also, if you plan to match a digit at the beginning, consider replacing \w with \d.
Regex details
^ - start of string
(?!.*([ -])\1) - no two identical consecutive spaces or hyphens allowed in the string
(?!.*([A-Za-z])\2{3}) - no four identical consecutive letters allowed in the string
\w - the first char should be a letter, digit or _
[a-zA-Z0-9 -]* - 0+ letters, digits, spaces or hyphens
$ - end of string.
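A short PHP sketch of the corrected pattern, using the sample addresses from the question. The helper name isValidAddress is an assumption made for illustration.

```php
<?php
// Hypothetical helper: validates an address against the corrected pattern.
function isValidAddress(string $address): bool
{
    // Single quotes keep \1 and \2 as regex backreferences; in a
    // double-quoted PHP string "\1" would be read as an octal escape.
    $pattern = '/^(?!.*([ -])\1)(?!.*([A-Za-z])\2{3})\w[a-zA-Z0-9 -]*$/';
    return preg_match($pattern, $address) === 1;
}

var_dump(isValidAddress('3 My Address'));     // bool(true)
var_dump(isValidAddress('3 My Address--'));   // bool(false): doubled hyphen
var_dump(isValidAddress('3 My Adddddddddd')); // bool(false): 4+ identical letters in a row
var_dump(isValidAddress('3 My Addd'));        // bool(true): only 3 in a row
```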

Input field validation constraints using regular expression

I am working on a Symfony (2.8) project in which the registration form needs some input validation.
I need to set following constraints on the Subdomain name input field:
1. Should contain only alphanumeric characters
2. First character can not be a number
3. No white spaces
I am using annotations for this task.
Here is the Assert statement I am using:
@Assert\Regex(pattern="/^[a-zA-Z][a-zA-Z0-9]\s+$/", message="Subdomain name must start with a letter and can only have alphanumeric characters with no spaces", groups={"registration"})
When I enter any simple word, e.g. svits, it still shows the error message "Subdomain name must start with a letter and can only have alphanumeric characters with no spaces".
Any suggestions would be appreciated.
You are very close with your regex, just add a quantifier and remove \s:
/^[a-zA-Z][a-zA-Z0-9]+$/
Your pattern does not work because:
The [a-zA-Z0-9] only matches 1 alphanumeric character. To match 0 or more, add the * quantifier (zero or more occurrences of the quantified subpattern), or use + (as in Toto's answer) to match one or more occurrences (which only matches words of 2+ characters).
Since your third requirement forbids whitespace in the input string, remove \s+ from your pattern, as it requires 1 or more whitespace symbols at the end of the string.
So, my suggestion is
pattern="/^[a-zA-Z][a-zA-Z0-9]*$/"
                              ^
to match 1+ letter words as full strings that start with a letter and may be followed with 0+ any alphanumeric symbols.
To allow whitespaces in any place of the string but the start, put the \s into the second [...] (character class):
pattern="/^[a-zA-Z][a-zA-Z0-9\s]*$/"
                             ^^ ^
If you do not want to allow more than 1 whitespace in a row (no 2+ consecutive whitespaces), use
pattern="/^[a-zA-Z][a-zA-Z0-9]*(?:\s[a-zA-Z0-9]+)*$/"
                               ^^^^^^^^^^^^^^^^^^^
The (?:\s[a-zA-Z0-9]+)* will match 0+ sequences of a single whitespace followed with 1+ alphanumerics.
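To check the suggested pattern outside of Symfony, a plain PHP sketch along these lines may help. The function name isValidSubdomain and the sample inputs are illustrative assumptions; the annotation itself is not exercised here.

```php
<?php
// Hypothetical helper: applies the suggested pattern with preg_match.
function isValidSubdomain(string $name): bool
{
    // First char must be a letter, followed by 0+ alphanumeric chars.
    return preg_match('/^[a-zA-Z][a-zA-Z0-9]*$/', $name) === 1;
}

var_dump(isValidSubdomain('svits'));   // bool(true)
var_dump(isValidSubdomain('a'));       // bool(true): a single letter is enough
var_dump(isValidSubdomain('1abc'));    // bool(false): starts with a digit
var_dump(isValidSubdomain('my site')); // bool(false): contains a space
```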

php regex - find uppercase string with number and spaces in text

I want to write a PHP regular expression to find an uppercase string, which can also contain one number and spaces, in a text.
For example, from the text "some text to contain EXAM PL E 7STRING uppercase word" I want to get the string EXAM PL E 7STRING.
The found string should start and end with an uppercase letter, but in the middle, besides uppercase letters, it can also (but does not have to) contain one number and spaces. So the regex should match any of these patterns:
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one spaces
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
and the total length of the string should be 4 letters or more.
I wrote this regex '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', which can find the first 4 patterns, but I cannot figure out how to match the last 2 patterns as well.
Thanks
There is a neat trick called a lookahead. It just checks what is following after the current position. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no previous uppercase letter. This is just a little speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest just matches then a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
EDIT:
As Ωmega pointed out, this will cause problems if there are fewer than four letters before the second digit, but more after that. This is actually quite tough, because the assertion needs to be that there are at least four letters before the second digit. Since we do not know where the first digit occurs in those four letters, we have to check for all possible positions. For this I would do away with the lookaheads altogether, and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<!            # negative lookbehind
    [A-Z]       # current position is not preceded by a letter
)               # end of lookbehind
[A-Z]           # match has to start with uppercase letter
\s*             # optional spaces after first letter
(?:             # subpattern for possible digit positions
    \d\s*[A-Z]\s*[A-Z]
                # digit comes after first letter, we need two more letters before last one
  |             # OR
    [A-Z]\s*\d\s*[A-Z]
                # digit comes after second letter, we need one more letter before last one
  |             # OR
    [A-Z]\s*[A-Z][A-Z\s]*\d?
                # digit comes after third letter, or later, or not at all
)               # end of subpattern for possible digit positions
[A-Z\s]*        # arbitrary amount of further letters and whitespace
[A-Z]           # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.
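A brief PHP sketch applying the three-alternative pattern to the sample sentence from the question. The helper name findUppercaseRun and the extra inputs are assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns the first qualifying uppercase run, or null.
function findUppercaseRun(string $text): ?string
{
    // One alternative per possible digit position, as described above.
    $pattern = '/(?<![A-Z])[A-Z]\s*'
             . '(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)'
             . '[A-Z\s]*[A-Z]/';
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

var_dump(findUppercaseRun('some text to contain EXAM PL E 7STRING uppercase word'));
// string(17) "EXAM PL E 7STRING"
var_dump(findUppercaseRun('EXAMP LE 4ST RI NG'));
// string(18) "EXAMP LE 4ST RI NG"
var_dump(findUppercaseRun('AB 1C'));
// NULL: only three uppercase letters
```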
I suggest using the regex pattern
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
(see this demo).
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]
(see this demo)
