Why does preg_match return 1? - php

I can't get a regular expression to work for the format xx.xx.xxx,
where x can be any Latin or Russian letter of either case, or a digit. There must be 2 characters, then a dot, then 2 characters, then a dot, then 3 characters.
I wrote the following expression:
var_dump(preg_match('/^([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){2}.([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){2}.([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){3}$/u', 'd1.df.dfd'));
The expression works for this input, but if you delete one character at the end, for example d1.df.df, it still returns 1, although it should return 0. What is the problem?

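The ([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){2} part matches 0 or more letters from а to я, then 0 or more letters from А to Я, and so on. Because every class inside the group carries its own * quantifier, a single iteration of the group can match any number of characters, including none, so the {2} after ) does not limit the part to two characters. It only repeats the capturing group, and the capture usually ends up holding the empty string from the last iteration. The unescaped . also matches any character, not only a literal dot.
What you need is to "merge" all the character classes inside each group into a single character class, apply the limiting quantifier to that class, and escape the dots: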
'~^([а-яА-ЯA-Za-zёЁ0-9]{2})\.([а-яА-ЯA-Za-zёЁ0-9]{2})\.([а-яА-ЯA-Za-zёЁ0-9]{3})$~u'
With a case insensitive modifier, it will be a bit shorter:
'~^([а-яa-zё0-9]{2})\.([а-яa-zё0-9]{2})\.([а-яa-zё0-9]{3})$~ui'
Also, you may shorten the pattern using subroutines:
'~^(([а-яa-zё0-9]){2})\.((?2){2})\.((?2){3})$~ui'
Here, (?2) repeats the pattern inside capturing group #2, ([а-яa-zё0-9]).
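As a quick check of the explanation above, the following sketch runs the original pattern and the case-insensitive merged pattern against a few inputs. The sample strings and variable names are made up for illustration; the file is assumed to be saved as UTF-8.

```php
<?php
// Original pattern from the question: every class has its own *, dots are unescaped.
$original = '/^([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){2}.([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){2}.([а-я]*[А-Я]*[A-Z]*[a-z]*ё*Ё*[0-9]*){3}$/u';
// Merged character class with a limiting quantifier and escaped dots.
$fixed = '~^([а-яa-zё0-9]{2})\.([а-яa-zё0-9]{2})\.([а-яa-zё0-9]{3})$~ui';

// Hypothetical sample inputs; the last one mixes Cyrillic and Latin letters.
$samples = ['d1.df.dfd', 'd1.df.df', 'd1xdfxdfd', 'Жё.Ab.9яZ'];

$results = [];
foreach ($samples as $s) {
    // Store [result of original, result of fixed] for each input.
    $results[$s] = [preg_match($original, $s), preg_match($fixed, $s)];
    printf("%s original=%d fixed=%d\n", $s, $results[$s][0], $results[$s][1]);
}
```

The original pattern accepts both the too-short d1.df.df and d1xdfxdfd (no dots at all), while the merged pattern rejects them.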


Match 2 or more uppercase characters in entire string

I'm trying to create a pattern in PHP that matches 2 or more upper case characters in a string.
I've tried the following, but it only matches 2 or more upper case characters in a row, not the entire string:
preg_match('/[A-Z]{2,}/', $string);
For example, the string "aBcDe" or "Red Apple" should return true.
You just have to allow other characters between your uppercase letters:
^(?:.*?\p{Lu}){2}
I used \p{Lu} here to include Unicode characters as well. If you don't want that just use [A-Z] instead like you did in your pattern.
This simply means:
^ from the start of the string
(?: group:
.*? match anything, but as few chars as possible
\p{Lu} match an uppercase letter
){2} ... two times
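The breakdown above can be tried out with a small wrapper. This is only a sketch; the function name hasTwoUppercase is hypothetical and not part of the original answer.

```php
<?php
// Sketch: true if the string contains at least two uppercase letters anywhere.
// The u modifier makes PCRE treat pattern and subject as UTF-8, so \p{Lu}
// also covers non-ASCII uppercase letters.
function hasTwoUppercase(string $s): bool
{
    return preg_match('/^(?:.*?\p{Lu}){2}/u', $s) === 1;
}

foreach (['aBcDe', 'Red Apple', 'abc', 'Abc'] as $s) {
    printf("%s: %s\n", $s, hasTwoUppercase($s) ? 'true' : 'false');
}
```

Note that . does not match a newline by default, so for multi-line input the s modifier would be needed as well.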
If all you need to do is identify that a string contains at least 2 uppercase characters then you can use the following:
[A-Z].*?[A-Z]
If you need to identify the specific uppercase characters in the string then things get more complicated.
UPDATE: As Lucas mentioned, you need a different regex if you want unicode support.
\p{Lu}.*?\p{Lu}
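If the specific uppercase characters are needed, one possible approach (not taken from the answer above) is to collect them with preg_match_all() and count the matches. A minimal sketch, with a made-up sample string:

```php
<?php
// Sketch: collect every uppercase letter instead of only testing for two.
$string = 'Red Apple';

// preg_match_all() returns the number of full pattern matches.
$count = preg_match_all('/\p{Lu}/u', $string, $m);

// $m[0] holds the matched letters in order of appearance.
printf("%d uppercase letters: %s\n", $count, implode(', ', $m[0]));
```

Checking $count >= 2 then answers the original question as well.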
^.*[A-Z].*[A-Z].*$
A simple pattern like this would do the same. See the demo:
https://regex101.com/r/pT4tM5/23
[A-Z].*[A-Z]
is about as simple as it gets - match an uppercase followed by anything repeated any number of times followed by any other uppercase letter.
If you need to match the whole line/string that has at least 2 upper case letters, you can also use
^(?=(?:.*[A-Z]){2}).+$

Why does preg_match("/[^(22|75)]/", "25") return false?

I want to test that a given string does not belong to the following group of strings: 22 75.
Could anyone please tell why PHP's preg_match("/[^(22|75)]/", "25") returns 0?
The weirdest thing is that preg_match("/[^(22|76)]/", "25") returns 1 as expected...
Edit:
I think I understand the reason and the nature of my mistake, but not how to check that a given two-digit number is not one of 20, 21, 22, 23, 24, 75, 76, 77, 78, 79, 80.
I need to assemble an expression to check a given age against the list of allowed ages (this presumes only two-digit numbers).
I cannot use anything other than preg_match() (negating it with !preg_match() is not available in my case); I can only change the regex pattern.
Time for a Regular Expressions Lesson!
Explanation of your regular expressions
[^(22|75)]
Returns 0 (no match) because it is looking for the following:
A single character NOT in this list of characters: |()275
[^(22|76)]
Returns 1 (match) because it is looking for:
A single character NOT in this list of characters: |()276
Why does it do this?
You wrapped your regex in a character class.
To give an example of how character classes work, look at this regex:
[2222222222222221111111199999999999]
This character class will match only ONE character, and only if it is a 2, a 1, or a 9.
How to make it work for you:
To match the number 25 (or 22, 52, and 55), you can use this character class:
[25]{2}
This will match a 2 digit number containing either 2 or 5 at either place.
What are character classes
A character class is a collection of characters (not strings). With a character class, you're telling the regex engine to match only one out of several characters.
For example, if you wanted to match an a or e, you'd write [ae]. If you wanted to match grey or gray, you'd write gr[ae]y.
Explanation for first regex
[^(22|75)]
As said above, character classes match a single character from the list. Here, you're using ^ to get a negated character class, so this will match a single character that's not in the supplied list. In this case, our list contains the following characters:
( 2 2 | 7 5 )
Multiple characters are only counted once. So this effectively becomes:
( 2 | 7 5 )
25 is the string you're matching against. The regular expression asks: does the supplied string contain a single character that's not in the above list? 2 and 5 are both in the list, so the answer is no. That explains why preg_match() returns 0 (not false, to be precise).
Explanation for second regex
/[^(22|76)]/
It is the same as above. The only difference is that 5 changed to 6. It now looks for a character that is not one of the following:
( 2 | 7 6 )
The supplied string is still the same as before - 25. Does the string contain any character that's not in the list above? Yes! It does contain 5 (which is not in the list anymore). That explains why preg_match() returns 1.
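The two explanations above can be verified directly. In this sketch the variable names are only for illustration; the second pair of patterns spells out the same classes without the duplicated characters.

```php
<?php
// Both patterns are negated character classes, not alternations.
$first  = preg_match('/[^(22|75)]/', '25'); // 2 and 5 are both in the class
$second = preg_match('/[^(22|76)]/', '25'); // 5 is not in the class

// Equivalent classes with each character listed once.
$firstPlain  = preg_match('/[^()|257]/', '25');
$secondPlain = preg_match('/[^()|267]/', '25');

var_dump($first, $second, $firstPlain, $secondPlain);
```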
Difference between character classes and alternation
They look similar but they do different things. Alternation can be used when you want to match a single regular expression out of several possible regular expressions. Unlike character classes, alternation works with a regex. A simple string, say foo is also a valid regular expression. It matches f followed by o, followed by o.
Use a character class when you want to match one of the included characters. Use alternation when you want to match one of several strings.
How should you modify the regex to obtain correct results
Negate your preg_match() call and use the regex (22|75):
if (!preg_match('/(22|75)/', '25')) {
# code...
}
This is the easiest approach. If you want to achieve this directly using a regex, then you may want to use look-arounds.
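For the case where the result of preg_match() cannot be negated, the look-around approach could look like the sketch below. The patterns are an assumption about what is wanted (exactly two digits, excluding the listed values) and are not taken from the original answer.

```php
<?php
// Sketch: "two digits, but not one of the listed values", using only the pattern.
// (?!...) is a negative lookahead: it fails if the listed value follows.
$notExact  = '/^(?!(?:22|75)$)\d{2}$/';               // anything but 22 and 75
$notRanges = '/^(?!(?:2[0-4]|7[5-9]|80)$)\d{2}$/';    // anything but 20-24 and 75-80

foreach (['25', '22', '75', '24', '80', '81'] as $age) {
    printf("%s exact=%d ranges=%d\n", $age,
        preg_match($notExact, $age), preg_match($notRanges, $age));
}
```

Here preg_match() returns 1 exactly when the value is not in the excluded set, so no negation of the call is needed.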
Alternative solution
If this is exactly what you're trying to do, then you don't need a regular expression at all. Leverage PHP's built-in functions for string manipulation! Not only will it be faster, it will be more readable too.
In this case, a simple in_array() should suffice:
if (!in_array('25', array('22', '75'))) {
# code ...
}
In a regular expression, [...] matches any single character listed inside the brackets.
To be more precise:
[^...]: matches any character not listed inside the brackets (^ negates the class).
Remove the [ and ] if you want to match a string that starts with 22 or 76; outside a character class, ^ is the start-of-string anchor.
Your regex is asking "does the string contain a character that is not (, 2, 7, 5, | or )?"
This is obviously not what you want.
Try this:
if( !in_array("25", array("22","75")))
^ inside of [...] is a negation of a character list.
(22|76)
Negating a multi-character sequence in a regex is tricky and has no simple syntax.
But you could invert the return value of preg_match(), i.e.:
if(!preg_match('#22|76#', '25', $matches))
doSomething();

php regular expression for 4 characters

I am trying to construct a regular expression for a string which can have 0 up to 4 characters. The characters can only be 0 to 9, a to z, or A to Z.
I have the following expression. It works, but I don't know how to change it so that a maximum of 4 characters is accepted. As written, it accepts 0 to infinity characters that match the pattern.
'([0-9a-zA-Z\s]*)'
You can use {0,4} instead of the * which will allow zero to four instances of the preceding token:
'([0-9a-zA-Z\s]{0,4})'
(* is actually the same as {0,}, i.e. at least zero and unbounded.)
If you want to match a string that consists entirely of zero to four of those characters, you need to anchor the regex at both ends:
'(^[0-9a-zA-Z]{0,4}$)'
I took the liberty of removing the \s because it doesn't fit your problem description. Also, I don't know if you're aware of this, but those parentheses do not form a group, capturing or otherwise. They're not even part of the regex; PHP is using them as regex delimiters. Your regex is equivalent to:
'/^[0-9a-zA-Z]{0,4}$/'
If you really want to capture the whole match in group #1, you should add parentheses inside the delimiters:
'/(^[0-9a-zA-Z]{0,4}$)/'
... but I don't see why you would want to; the whole match is always captured in group #0 automatically.
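The effect of the anchors can be seen by comparing both variants. This is a minimal sketch with made-up sample strings:

```php
<?php
// Anchored: the whole string must consist of zero to four allowed characters.
$anchored = '/^[0-9a-zA-Z]{0,4}$/';
// Unanchored: a match of zero to four characters anywhere is enough,
// which is always possible because the empty match counts.
$unanchored = '/[0-9a-zA-Z]{0,4}/';

foreach (['', 'ab12', 'abcde', 'ab c'] as $s) {
    printf("'%s' anchored=%d unanchored=%d\n", $s,
        preg_match($anchored, $s), preg_match($unanchored, $s));
}
```

One caveat: $ also matches just before a trailing newline, so "abcd\n" passes the anchored pattern; the D modifier or \z instead of $ avoids that.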
You can use { } to specify finite quantifiers:
[0-9a-zA-Z\s]{0,4}
http://www.regular-expressions.info/reference.html
You can avoid regular expressions completely.
if ($str === '' || (strlen($str) <= 4 && ctype_alnum($str))) {
// contains 0-4 characters that are either letters or digits
// (ctype_alnum('') returns false, so the empty string is checked separately)
}
ctype_alnum()

Regex Rules for First and Second Character

I need help with the following regular expression rules in JavaScript and PHP.
JS
var charFilter = new RegExp("^[A|B].+[^0123456789]$");
PHP
if (!preg_match('/^[A|B].+[^0123456789]$/', $data_array['sample_textfield'])) {
This regular expression enforces the following:
The first character must be A or B, and the last character must not be a digit (0 to 9).
I have another validation rule: the string must be at least 3 and at most 6 characters long.
The new rule I want to add is: the second character cannot be C if the first letter is A.
Which means
ADA (is valid)
ACA (is not valid)
So I changed the regex code like this
JS
var charFilter = new RegExp("^(A[^C])|(B).+[^0123456789]$");
PHP
if (!preg_match('/^(A[^C])|(B).+[^0123456789]$/', $data_array['sample_textfield'])) {
It works for the first and second character. If I type
ACA, it says invalid. But if I type
AD3, it says valid: it no longer checks the last character. The last character must not be a digit from 0 to 9, but the string is shown as valid.
Can anyone help me fix this regex? Thank you so much.
Putting all of your requirements together, it seems that you want this pattern:
^(?=.{3,6}$)(?=A(?!C)|B).+\D$
That is:
From the beginning of the string ^
We can assert that there are between 3 to 6 of "any" characters to end of the string (?=.{3,6}$)
We can also assert that it starts with A not followed by C, or starts with B (?=A(?!C)|B)
And the whole thing doesn't end with a digit .+\D$
This will match (as seen on rubular.com):
= match =    = no match =
ADA          ACA
ABCD         AD3
ABCDE        ABCDEFG
ABCDEF
A123X
A X
Note that spaces are allowed by .+ and \D. If you insist on no spaces, you can use e.g. (?=\S{3,6}$) in the first part of the pattern.
(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
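The lookahead-based pattern above can be checked in PHP against the same inputs as in the table. A minimal sketch; the variable names are only for illustration:

```php
<?php
// Length check, first/second character check, then "does not end with a digit".
$pattern = '/^(?=.{3,6}$)(?=A(?!C)|B).+\D$/';

$inputs = ['ADA', 'ABCD', 'ABCDE', 'ABCDEF', 'A123X', 'A X',
           'ACA', 'AD3', 'ABCDEFG'];

$valid = [];
foreach ($inputs as $s) {
    $valid[$s] = preg_match($pattern, $s) === 1;
    printf("%-8s %s\n", $s, $valid[$s] ? 'match' : 'no match');
}
```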
References
regular-expressions.info
Lookarounds, Alternation, Anchors, Repetition, Dot, Character Class
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
On alternation precedence
The problem with the original pattern is in misunderstanding the precedence of the alternation | specifier.
Consider the following pattern:
this|that-thing
This pattern consists of two alternates, one that matches "this", and another that matches "that-thing". Contrast this with the following pattern:
(this|that)-thing
Now this pattern matches "this-thing" or "that-thing", thanks to the grouping (…). Coincidentally it also creates a capturing group (which will capture either "this" or "that"). If you don't need the capturing feature, but you need the grouping aspect, use a non-capturing group (?:…).
Another example of where grouping is desired is with repetition: ha{3} matches "haaa", but (ha){3} matches "hahaha".
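Applied to the pattern from the question, the precedence issue can be demonstrated as follows. This is a sketch; the grouped variant only fixes the precedence and is not the complete solution with the length rule.

```php
<?php
// As written in the question: "^(A[^C])" OR "(B).+[^0123456789]$".
$ungrouped = '/^(A[^C])|(B).+[^0123456789]$/';
// With a non-capturing group, ^ and the tail apply to both alternatives.
$grouped = '/^(?:A[^C]|B).+[^0123456789]$/';

foreach (['AD3', 'ACA', 'ADAA', 'XB12Y'] as $s) {
    printf("%-6s ungrouped=%d grouped=%d\n", $s,
        preg_match($ungrouped, $s), preg_match($grouped, $s));
}
```

AD3 is accepted by the ungrouped pattern because the first alternative, ^(A[^C]), matches on its own. XB12Y is accepted because the second alternative is not anchored at the start.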
References
regular-expressions.info/Brackets for Grouping
Your OR is against the wrong grouping. Try:
^((A[^C])|(B)).+[^0123456789]$
In jasonbar's solution, the reason it doesn't match ABC is that it requires A followed by a character that is not C (two characters), followed by one or more of any character, followed by a non-digit. Thus, if the string begins with an A, the minimum length is 4. You can solve this by using a lookahead assertion, which checks the second character without consuming it.
PHP
$pattern = '#^(A(?=[^C])|B).+\D$#';
I think it should be like this:
/^(A[^C]|B.).*[^0-9]$/
Try this test code:
// one test string per line; add more as needed
$test = "
A
B
AB
AC
AAA
ABA
ACA
AA9
";
$pat = '/^(A[^C]|B.).*[^0-9]$/';
// trim() avoids empty entries from the leading and trailing newlines
foreach (preg_split('~\s+~', trim($test)) as $p) {
    printf("%5s : %s\n<br>", $p, preg_match($pat, $p) ? "ok" : "not ok");
}

Can someone explain this regex to me?

I recently asked a question on formatting a telephone number and got lots of responses. Most of the responses were great, but there is one I really want to understand because it worked so well. If $phone is the following, how do the other lines work? What are they doing, so I can learn?
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches to a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (i.e. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class like a shorthand for a big OR condition: i.e. [0-9] is a shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of parentheses; these define capturing groups, i.e. if the pattern as a whole matches, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in parentheses in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing it with ?: (i.e. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated classes). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't have to be all the same (i.e. 111), because the character class is evaluated for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
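Putting the two lines together, the contents of $matches can be inspected like this. The sketch keeps the stripped digits in a separate variable, and the final output format with dashes is just an example, not part of the original code.

```php
<?php
$phone = "(407)888-9999";

// Step 1: delete every character that is not a digit.
$digits = preg_replace('~[^0-9]~', '', $phone);

// Step 2: split the digits into three captured groups.
if (preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $digits, $matches)) {
    // $matches[0] is the whole match; [1], [2], [3] are the captured groups.
    $formatted = sprintf('%s-%s-%s', $matches[1], $matches[2], $matches[3]);
    echo $formatted, "\n";
}
```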
The preg_replace looks for anything that is not (the ^ inside the []) in the range 0-9, basically anything that is not a digit, and removes it from the string, since the replacement is "".
For the first section, it pulls out the first 3 digits with ([0-9]{3}): the {3} is the number of characters to match, and the items inside the [] are what to match. Since this is inside parentheses (), it is stored as a match in the array $matches. The second part pulls out the next 3 digits, and the last part pulls out the last 4 digits from $phone, storing each in $matches.
The ~ characters are delimiters for the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
There are two patterns: first [^0-9], then the phone number pattern.
[^0-9] matches any one character that is NOT (^) a digit.
The second pattern matches any 3 digits, followed by any 3 digits, followed by any 4 digits.
It would not match the original string directly, because the () around the area code and the dash are not in the pattern; it only matches because preg_replace strips them first.
To match the original format directly, I'd do this:
'~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~'
"[^0-9]" means everything but digits from 0 to 9. So basically, the first line replaces everything but digits with "" (nothing).
[0-9]{3} means a digit from 0 to 9, 3 times in a row.
So it checks whether you have 3 digits, then 3 digits, then 4 digits, and stores the result in $matches.
Check this tutorial:
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In PHP you have to delimit a regex pattern with some non-alphanumeric character; "~" is used here.
[^0-9] is the regex pattern used to remove anything from $phone that is not in the 0-9 range. Remember that a leading ^ inside [...] negates the character class.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again, in this line of code "~" is the delimiter, and
([0-9]{3}) is the part of the pattern that returns 3 digits from the string (note: {} specifies the number, or range, of repetitions to match). Using ( ) in a pattern produces groups/submatches, each stored under its own index of the output array (check your $matches variable for the result).
