PHP regex for validating a certain number format

I need a regular expression which validates any one of the below formats:
+10%
-123
+5.5
+50
99
99.99
-20%
25% (this should not be validated)
(% without any + or - should not be validated)
I tried to use preg_match('/^[+-]?(\d+\.)?(\d+)[%]?$/', $value) but this also validates 25%.
Can anyone share regex which validates the above format?

You can do this with a conditional subpattern in PCRE, which avoids repeating the whole number-matching pattern in an alternation:
^([+-])?\d+(?:\.\d+)?(?(1)%)?$
RegEx Demo
RegEx Details:
^: Start
([+-])?: Match + or - in optional group #1
\d+: Match 1+ digits
(?:\.\d+)?: Match dot followed by 1+ digits in an optional non-capturing group
(?(1)%)?: Conditional subpattern. If group #1 matched, then optionally match %; otherwise match nothing.
$: End
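The breakdown above can be tried against the inputs from the question. This is only a minimal sketch: is_valid_number() is a hypothetical helper name, not something from the original answer.

```php
<?php
// Hypothetical helper wrapping the conditional-subpattern regex from the answer.
function is_valid_number(string $value): bool
{
    // Group 1 captures an optional sign; (?(1)%)? permits a trailing %
    // only when group 1 actually matched.
    return preg_match('/^([+-])?\d+(?:\.\d+)?(?(1)%)?$/', $value) === 1;
}

// Sample inputs taken from the question; only 25% should be rejected.
foreach (['+10%', '-123', '+5.5', '+50', '99', '99.99', '-20%', '25%'] as $input) {
    echo $input, ' => ', is_valid_number($input) ? 'valid' : 'invalid', PHP_EOL;
}
```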

I might just keep it simple here and use an alternation:
^(?:[+-]?\d+(?:\.\d+)?|[+-]\d+(?:\.\d+)?%)$
Demo
The tricky part of your requirement is that the leading sign is optional for a non percentage number, but mandatory for a percentage. The alternation makes it easy to separate out these two concerns.
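A short sketch of how the alternation could be used from PHP, assuming a hypothetical helper name is_valid_number_alt(). The first branch handles plain numbers with an optional sign, the second handles percentages with a mandatory sign.

```php
<?php
// Hypothetical helper wrapping the alternation regex from the answer.
function is_valid_number_alt(string $value): bool
{
    $pattern = '/^(?:[+-]?\d+(?:\.\d+)?'   // plain number, sign optional
             . '|[+-]\d+(?:\.\d+)?%)$/';   // percentage, sign mandatory
    return preg_match($pattern, $value) === 1;
}

var_dump(is_valid_number_alt('-20%')); // signed percentage
var_dump(is_valid_number_alt('25%'));  // unsigned percentage
```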

Related

Regex optional groups and digit length

Maybe some regex-Master can solve my problem.
I have a big list with many addresses with no separators ( , ; ).
The address string contains following Information:
The first group is the street name
The second group is the street number
The third group is the zipcode (optional)
The last group is the town name (optional)
As you can see on the image above the last two test strings are not matching.
I need the last two regex groups to be optional and the third group should be either 4 or 5 digits.
I tried (\d{4,5}) to allow 4 and 5 digits. But this only works halfway, as you can see here: https://regex101.com/r/ZurqHh/1
(This sometimes mixes the street number and zipcode together)
I also tried (?:\d{5})? to make the third and fourth group optional. But this destroys my whole group layout...
https://regex101.com/r/EgxeMy/1
This is my current regex:
/^([a-zäöüÄÖÜß\s\d.,-]+?)\s*([\d\s]+(?:\s?[-|+\/]\s?\d+)?\s*[a-z]?)?\s*(\d{5})\s*(.+)?$/im
Try it out yourself:
https://regex101.com/r/zC8NCP/1
My brain is only farting at this moment and I can't think straight anymore.
Please help me fix this problem so I can die in peace.

You can use
^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$
See the regex demo (note that all \s are replaced with \h there to match only horizontal whitespace).
Details:
^ - start of string
(.*?) - Group 1: any zero or more chars other than line break chars
(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))? - an optional non-capturing group matching
    \s+ - one or more whitespaces
    (\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b) - Group 2:
        \d+ - one or more digits
        (?:\s*[-|+\/]\s*\d+)* - zero or more sequences of zero or more whitespaces, one of -, +, | or /, zero or more whitespaces, and one or more digits
        \s* - zero or more whitespaces
        [a-z]?\b - an optional lowercase ASCII letter and a word boundary
(?:\s+(\d{4,5})(?:\s+(.*))?)? - an optional non-capturing group matching
    \s+ - one or more whitespaces
    (\d{4,5}) - Group 3: four or five digits
    (?:\s+(.*))? - an optional sequence of one or more whitespaces and then Group 4: any zero or more chars other than line break chars, as many as possible
$ - end of string.
Please note that the (?:\s+(.*))? optional group must be inside the (?:\s+(\d{4,5})...)? group to work.
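As a rough illustration of how the four groups come out in PHP, here is a sketch that assumes a hypothetical parse_address() helper, made-up sample addresses, and the i modifier from the question's own pattern. The ?? fallbacks are there because preg_match() leaves out trailing groups that did not take part in the match.

```php
<?php
// Hypothetical helper; the pattern is the one from the answer, with the i modifier added.
function parse_address(string $line): ?array
{
    $pattern = '/^(.*?)(?:\s+(\d+(?:\s*[-|+\/]\s*\d+)*\s*[a-z]?\b))?(?:\s+(\d{4,5})(?:\s+(.*))?)?$/i';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }
    // Trailing groups that did not participate are absent from $m.
    return [
        'street' => $m[1],
        'number' => $m[2] ?? '',
        'zip'    => $m[3] ?? '',
        'town'   => $m[4] ?? '',
    ];
}

// Made-up sample addresses, not taken from the question's test data.
print_r(parse_address('Hauptstrasse 12a 12345 Berlin'));
print_r(parse_address('Am Markt 5'));
```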

Addresses are difficult to parse because they sit halfway between formatted text and natural language. Here is a pattern that tries to keep the number of optional parts as small as possible while still handling the examples given, without asking too much of the regex engine. To do this, I mainly rely on character classes, atomic groups, and a relatively accurate description of street names. Obviously, the examples in the question cannot be representative of every real address, and characters may need to be added to or removed from the classes to deal with new cases. Nevertheless, the structure of this pattern is a good starting point.
~
^
(?<strasse> [\pL\d-]+ \.? (?> \h+ [\pL\d-]+ \.? )*? ) \h*
(?<nummer> \b (?> \d+ | [-+/\h]+ | [a-z] \b )*? )
(?: \h+ (?<plz> \d{4,5} )
\h+ (?<stadt> .+ ) )?
$
~mxui
demo
Note that in the above link you can also see a previous version of this pattern with a more accurate description of the street number (a bit more efficient but longer).

PHP Regex: Remove words not equal exactly 3 characters

An excellent, very close answer is given in
Remove words less than 3 chars
with DEMO
where the regex
\b([a-z]{1,2})\b
removes all words shorter than 3 chars.
But how can this be reversed, so that it removes all words that are NOT EXACTLY 3 chars long?
We can match words of exactly 3 chars with
\b([a-z]{3})\b
but how do I tell the regex to remove all other words, those whose length is not 3?
So in the regex demo referenced above, only the word 'and' should remain.

Use alternatives to match either 1-2 or 4+ letters.
\b(?:[a-z]{1,2}|[a-z]{4,})\b
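A minimal sketch of applying this alternation with preg_replace(). The helper name and the sample sentence are made up for illustration, and the whitespace clean-up step is an extra assumption, not part of the answer.

```php
<?php
// Hypothetical helper: drop every lowercase word that is not exactly 3 letters long.
function remove_non_three_letter_words(string $text): string
{
    // Remove words of 1-2 letters or of 4+ letters.
    $stripped = preg_replace('/\b(?:[a-z]{1,2}|[a-z]{4,})\b/', '', $text);
    // Collapse the whitespace left behind by the removed words.
    return trim(preg_replace('/\s+/', ' ', $stripped));
}

echo remove_non_three_letter_words('a quick fox and the lazy dogs'), PHP_EOL;
// prints "fox and the"
```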

Another variation with a negative lookbehind asserting not 3 chars to the left
\b[a-z]+\b(?<!\b[a-z][a-z][a-z]\b)
Regex demo
Or with a skip fail approach for 3 chars a-z:
\b[a-z]{3}\b(*SKIP)(*F)|\b[a-z]+\b
Regex demo

I think maybe:
\b(?![a-z]{3}\b)[a-z]+\b
Matching:
\b - A word-boundary.
(?![a-z]{3}\b) - A negative lookahead to avoid three-letter words.
[a-z]+\b - Any word of 1+ letters (greedy) up to a word boundary.
Another trick is to use a capture group to match what you want:
\b(?:[a-z]{3}|([a-z]+))\b
\b - A word-boundary
(?:[a-z]{3}|([a-z]+)) - A capture group nested inside an alternation: the first branch consumes three-letter words without capturing them, the second captures any other word of 1+ letters (greedy).
\b - A word-boundary
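With the capture-group trick, the replacement has to check whether group 1 took part in the match. One way to do that in PHP is preg_replace_callback(); the sketch below assumes a hypothetical helper name and leaves the surrounding whitespace untouched.

```php
<?php
// Hypothetical helper using the capture-group pattern with a callback.
function keep_three_letter_words(string $text): string
{
    return preg_replace_callback(
        '/\b(?:[a-z]{3}|([a-z]+))\b/',
        function (array $m): string {
            // Group 1 is only present for words that are NOT exactly three
            // letters long; those are removed, everything else is kept.
            return isset($m[1]) ? '' : $m[0];
        },
        $text
    );
}

var_dump(keep_three_letter_words('to be and not'));
```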

With an optional group of letters with at least 2 characters and a possessive quantifier:
\b[a-z]{1,2}+(?:[a-z]{2,})?\b
demo
This approach is based on an arithmetic trick and on backtracking.
In other words: 2 + x = 3 with x > 1 has no solution.
If I had written \b[a-z]{1,2}(?:[a-z]{2,})?\b (with or without the last \b, it isn't important), then when the regex engine reaches the start of a three-letter word, [a-z]{1,2} would consume the first two letters, but since one more character stands before the final word boundary, the regex engine would have no choice but to backtrack into the {1,2} quantifier. After one backtracking step, [a-z]{1,2} would consume only one character and (?:[a-z]{2,})?\b could succeed. But by making this quantifier possessive I forbid this backtracking step. Since, for a three-letter word, [a-z]{1,2}+ takes 2 characters and [a-z]{2,} needs at least 2 more letters, the pattern fails.
Use the word boundary and force to fail with the possessive quantifier:
\b(?:[a-z]{3}\b)?+[a-z]+
demo
This one also plays with an impossible condition: three letters followed by a word boundary cannot be followed by a letter.
Once again, for a three-letter word, once the three letters are consumed by [a-z]{3}, the possessive quantifier ?+ forbids backtracking and [a-z]+ makes the pattern fail.
Force to fail with 3 letters and skip them using a backtracking control verb:
\b[a-z]{3}\b(*SKIP)^|[a-z]+
demo
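To see that the three variants above select the same words, they can be run side by side with preg_match_all(). This is only a sketch: the array keys, the helper name and the sample sentence are made up for illustration.

```php
<?php
// The three patterns from this answer, keyed by hypothetical labels.
$variants = [
    'possessive_count' => '/\b[a-z]{1,2}+(?:[a-z]{2,})?\b/',
    'possessive_group' => '/\b(?:[a-z]{3}\b)?+[a-z]+/',
    'skip_verb'        => '/\b[a-z]{3}\b(*SKIP)^|[a-z]+/',
];

// Hypothetical helper: return every word a pattern would remove.
function words_to_remove(string $pattern, string $text): array
{
    preg_match_all($pattern, $text, $m);
    return $m[0];
}

$text = 'a quick fox and the lazy dogs';
foreach ($variants as $name => $pattern) {
    // Each variant should list the words that are not exactly 3 letters long.
    echo $name, ': ', implode(', ', words_to_remove($pattern, $text)), PHP_EOL;
}
```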

Regex to get the first number after a certain string followed by any data until the number

I have a piece of data, retrieved from the database and containing information I need. Text is entered in a free form so it's written in many different ways. The only thing I know for sure is that I'm looking for the first number after a given string, but after that certain string (before the number) can be any text as well.
I tried this (where mytoken is the string I know for sure its there) but this doesn't work.
/(mytoken|MYTOKEN)(.*)\d{1}/
/(mytoken|MYTOKEN)[a-zA-Z]+\d{1}/
/(mytoken|MYTOKEN)(.*)[0-9]/
/(mytoken|MYTOKEN)[a-zA-Z]+[0-9]/
Even mytoken can be written in capitals, lowercase or a mix of capitals and lowercase character. Can the expression be case insensitive?

You do not need any lazy matching since you want to match any number of non-digit symbols up to the first digit. It is better done with a \D*:
/(mytoken)(\D*)(\d+)/i
See the regex demo
The pattern details:
(mytoken) - Group 1 matching mytoken (case insensitively, as there is a /i modifier)
(\D*) - Group 2 matching zero or more characters other than a digit
(\d+) - Group 3 matching 1 or more digits.
Note that \D also matches newlines, . needs a DOTALL modifier to match across newlines.
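A small sketch of the \D* approach in PHP. The helper name first_number_after() and the sample text are assumptions, and the pattern is reduced to a single capture group for the number. preg_quote() is used so that a token containing regex metacharacters is matched literally.

```php
<?php
// Hypothetical helper: return the first run of digits after $token, or null.
function first_number_after(string $token, string $text): ?string
{
    // Token (case-insensitive), then any non-digits, then the number itself.
    $pattern = '/' . preg_quote($token, '/') . '\D*(\d+)/i';
    return preg_match($pattern, $text, $m) === 1 ? $m[1] : null;
}

// Made-up free-form text; \D also crosses the line break.
echo first_number_after('mytoken', "Order MyToken ref no.\n4711, batch 12"), PHP_EOL;
// prints "4711"
```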

You need to use a lazy quantifier. You can do that by putting a question mark after the star quantifier in the regex: .*?. Otherwise, the greedy dot will consume everything, digits included, up to the last digit on the line, and only that last digit will be matched by \d.
Regex: /(mytoken|MYTOKEN)(.*?)\d/
Regex demo

You can use the opposite:
/(mytoken|MYTOKEN)(\D+)(\d)/
This says: mytoken, followed by one or more non-digit characters, followed by a digit. The (lazy) dot-star soup is not always your best bet. The digit will be in $3 in this example; use (\d+) instead to capture the whole number.

Regex for PHP - similar to IP pattern?

I need a regular expression which consist of: 1-3 digits and optional dot. It is something like IP pattern. I want my regex to allow the following:
192
192.
192.168
192.168.
and NOT the following:
192.1688
This is what I have so far:
preg_match('/^((\d{1,3})(\.?))+$/', $string);
But it still allows me to have more than 3 digits. Any suggestions how to fix the regex?

If you plan to match any number of 1-3 digit sequences separated with a dot (which is optional at the end), you can use
^\d{1,3}(?:\.\d{1,3})*\.?$
See demo
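Checked against the inputs from the question, the pattern could be used like this. The helper name is hypothetical.

```php
<?php
// Hypothetical helper: 1-3 digit groups separated by dots, with an optional trailing dot.
function is_dotted_digit_groups(string $value): bool
{
    return preg_match('/^\d{1,3}(?:\.\d{1,3})*\.?$/', $value) === 1;
}

// The four inputs that should pass, followed by the one that should not.
foreach (['192', '192.', '192.168', '192.168.', '192.1688'] as $input) {
    echo $input, ' => ', is_dotted_digit_groups($input) ? 'ok' : 'rejected', PHP_EOL;
}
```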
If you need the numbers to be in the range between 0 and 255 as in an IP address, use
^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))*\.?$
See another demo.
To limit the match to at most 2 groups of numbers, use a ? quantifier instead of * on the second non-capturing group:
^(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?))?\.?$
See the 3rd demo
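The range-limited idea can also be assembled from a reusable octet sub-pattern. The sketch below is an assumption-laden variant, not the answer's exact regex: the octet is spelled 25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9] (which rejects leading zeros), and the helper name and the $maxGroups parameter are hypothetical.

```php
<?php
// Hypothetical helper: up to $maxGroups dot-separated numbers in 0-255,
// with an optional trailing dot.
function is_partial_ip(string $value, int $maxGroups = 4): bool
{
    // Assumed octet spelling: 250-255, 200-249, 100-199, 0-99 (no leading zeros).
    $octet = '(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])';
    $extra = max(0, $maxGroups - 1);
    // D makes $ match only at the very end of the subject.
    $pattern = '/^' . $octet . '(?:\.' . $octet . '){0,' . $extra . '}\.?$/D';
    return preg_match($pattern, $value) === 1;
}

var_dump(is_partial_ip('192.168.'));     // two groups plus trailing dot
var_dump(is_partial_ip('192.1688'));     // second group has four digits
var_dump(is_partial_ip('192.168.1', 2)); // more than two groups
```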

php regex - find uppercase string with number and spaces in text

I want to write a PHP regular expression that finds an uppercase string in a text; the string may also contain one digit and spaces.
For example, from the text "some text to contain EXAM PL E 7STRING uppercase word" I want to get the string EXAM PL E 7STRING.
The found string must start and end with an uppercase letter, but in the middle, besides uppercase letters, it may also contain (but does not have to) one digit and spaces. So the regex should match any of these patterns:
1) EXAMPLESTRING - just uppercase string
2) EXAMP4LESTRING - with number
3) EXAMPLES TRING - with space
4) EXAM PL E STRING - with more than one spaces
5) EXAMP LE4STRING - with number and space
6) EXAMP LE 4ST RI NG - with number and spaces
and the total length of the string should be at least 4 letters.
I wrote this regex '/[A-Z]{1,}([A-Z\s]{2,}|\d?)[A-Z]{1,}/', which can find the first 4 patterns, but I cannot figure out how to also match the last 2 patterns.
Thanks

There is a neat trick called a lookahead. It just checks what is following after the current position. That can be used to check for multiple conditions:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/'
The first lookaround is actually a lookbehind and checks that there is no previous uppercase letter. This is just a little speedup for strings that would fail the match anyway. The second lookaround (a lookahead) checks that there are at least four letters. The third one checks that there are no two digits. The rest just matches then a string of the allowed characters, starting and ending with an uppercase letter.
Note that in the case of two digits this will not match at all (instead of matching everything up to the second digit). If you do want to match in such a case, you could incorporate the "1 digit" rule into the actual match instead:
'/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/'
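Both lookaround patterns above can be tried on the sample text from the question. This is only a sketch: the helper name and the variable names $strict and $inline are made up, and the second sample string is an invented negative case with fewer than four letters.

```php
<?php
// Hypothetical helper: return the first match of $pattern in $text, or null.
function find_uppercase_run(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

// First pattern: the "at most one digit" rule lives in a negative lookahead.
$strict = '/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])(?!(?:[A-Z\s]*\d){2})[A-Z][A-Z\s\d]*[A-Z]/';
// Second pattern: the "at most one digit" rule is part of the match itself.
$inline = '/(?<![A-Z])(?=(?:[A-Z][\s\d]*){3}[A-Z])[A-Z][A-Z\s]*\d?[A-Z\s]*[A-Z]/';

$text = 'some text to contain EXAM PL E 7STRING uppercase word';
var_dump(find_uppercase_run($strict, $text));
var_dump(find_uppercase_run($inline, $text));
// Only three uppercase letters here, so neither pattern should match.
var_dump(find_uppercase_run($strict, 'abc AB1C def'));
```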
EDIT:
As Ωmega pointed out, this will cause problems if there are fewer than four letters before the second digit, but more after that. This is actually quite tough, because the assertion needs to be that there are at least four letters before the second digit. Since we do not know where the first digit occurs among those four letters, we have to check all possible positions. For this I would do away with the lookaheads altogether, and simply provide the three different alternatives. (I will keep the lookbehind as an optimization for non-matching parts.)
'/(?<![A-Z])[A-Z]\s*(?:\d\s*[A-Z]\s*[A-Z]|[A-Z]\s*\d\s*[A-Z]|[A-Z]\s*[A-Z][A-Z\s]*\d?)[A-Z\s]*[A-Z]/'
Or here with added comments:
'/
(?<! # negative lookbehind
[A-Z] # current position is not preceded by a letter
) # end of lookbehind
[A-Z] # match has to start with uppercase letter
\s* # optional spaces after first letter
(?: # subpattern for possible digit positions
\d\s*[A-Z]\s*[A-Z]
# digit comes after first letter, we need two more letters before last one
| # OR
[A-Z]\s*\d\s*[A-Z]
# digit comes after second letter, we need one more letter before last one
| # OR
[A-Z]\s*[A-Z][A-Z\s]*\d?
# digit comes after third letter, or later, or not at all
) # end of subpattern for possible digit positions
[A-Z\s]* # arbitrary amount of further letters and whitespace
[A-Z] # match has to end with uppercase letter
/x'
That gives the same result on Ωmega's lengthy test input.

I suggest using this regex pattern
[A-Z][ ]*(\d)?(?(1)(?:[ ]*[A-Z]){3,}|[A-Z][ ]*(\d)?(?(2)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(\d)?(?(3)(?:[ ]*[A-Z]){2,}|[A-Z][ ]*(?:\d|(?:[ ]*[A-Z])+[ ]*\d?))))(?:[ ]*[A-Z])*
(see this demo).
[A-Z][ ]*(?:\d(?:[ ]*[A-Z]){2}|[A-Z][ ]*\d[ ]*[A-Z]|(?:[A-Z][ ]*){2,}\d?)[A-Z ]*[A-Z]
(see this demo)
