Regex Rules for First and Second Character - php

I need help with the following regular expression rules in JavaScript and PHP.
JS
var charFilter = new RegExp("^[A|B].+[^0123456789]$");
PHP
if (!preg_match('/^[A|B].+[^0123456789]$/', $data_array['sample_textfield'])) {
This regular expression means:
The first character must be A or B, and the last character must not be a digit (0 to 9).
I have a separate validation that the value must be at least 3 and at most 6 characters long.
The new rule I want to add is: if the first letter is A, the second character cannot be C.
Which means
ADA (is valid)
ACA (is not valid)
So I changed the regex code like this
JS
var charFilter = new RegExp("^(A[^C])|(B).+[^0123456789]$");
PHP
if (!preg_match('/^(A[^C])|(B).+[^0123456789]$/', $data_array['sample_textfield'])) {
It works for the first and second characters. If I type
ACA (it says invalid). But if I type
AD3 (it says valid), so it no longer checks the last character. The last character must not be a digit from 0 to 9, but it shows as valid.
Can anyone help me fix this regex? Thank you so much.

Putting all of your requirements together, it seems that you want this pattern:
^(?=.{3,6}$)(?=A(?!C)|B).+\D$
That is:
From the beginning of the string ^
We can assert that there are between 3 and 6 of "any" characters up to the end of the string (?=.{3,6}$)
We can also assert that it starts with A not followed by C, or starts with B (?=A(?!C)|B)
And the whole thing doesn't end with a digit .+\D$
This will match (as seen on rubular.com):
= match =    = no match =
ADA          ACA
ABCD         AD3
ABCDE        ABCDEFG
ABCDEF
A123X
A X
Note that spaces are allowed by .+ and \D. If you insist on no spaces, you can use e.g. (?=\S{3,6}$) in the first part of the pattern.
(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
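To check the pattern outside of rubular, here is a minimal PHP sketch that runs it over some of the inputs from the table above. The variable names ($pattern, $samples, $results) are only for illustration; the pattern itself is the one given in this answer, wrapped in / delimiters.

```php
<?php
// Pattern from the answer, wrapped in PCRE delimiters.
$pattern = '/^(?=.{3,6}$)(?=A(?!C)|B).+\D$/';

// Sample inputs taken from the match / no-match table.
$samples = ['ADA', 'ABCD', 'ABCDEF', 'A123X', 'ACA', 'AD3', 'ABCDEFG'];

$results = [];
foreach ($samples as $s) {
    // preg_match() returns 1 on a match and 0 when there is none.
    $results[$s] = preg_match($pattern, $s) === 1;
    printf("%-8s : %s\n", $s, $results[$s] ? 'valid' : 'invalid');
}
```

The first four inputs should come out as valid and the last three as invalid, which is the same split as in the table.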
References
regular-expressions.info
Lookarounds, Alternation, Anchors, Repetition, Dot, Character Class
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
On alternation precedence
The problem with the original pattern is in misunderstanding the precedence of the alternation | specifier.
Consider the following pattern:
this|that-thing
This pattern consists of two alternates, one that matches "this", and another that matches "that-thing". Contrast this with the following pattern:
(this|that)-thing
Now this pattern matches "this-thing" or "that-thing", thanks to the grouping (…). Coincidentally it also creates a capturing group (which will capture either "this" or "that"). If you don't need the capturing feature, but you need the grouping aspect, use a non-capturing group (?:…).
Another example of where grouping is desired is with repetition: ha{3} matches "haaa", but (ha){3} matches "hahaha".
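The precedence point can be sketched in PHP as follows. The ^ and $ anchors on the this/that patterns and the test strings are my own additions for illustration; the last two patterns are the one from the question and a grouped variant of it.

```php
<?php
// Without grouping, | splits the whole pattern in two:
// either "^this" or "that-thing$".
$loose = '/^this|that-thing$/';
// With a non-capturing group, only "this" and "that" alternate.
$grouped = '/^(?:this|that)-thing$/';

$looseHit   = preg_match($loose, 'this is unrelated');   // "^this" alone is enough
$groupedHit = preg_match($grouped, 'this is unrelated'); // needs "-thing" as well
$thatThing  = preg_match($grouped, 'that-thing');

// Same effect in the question's pattern: "^(A[^C])" is a complete
// alternative on its own, so the trailing digit is never checked.
$buggy = preg_match('/^(A[^C])|(B).+[^0123456789]$/', 'AD3');
$fixed = preg_match('/^(?:A[^C]|B).+[^0123456789]$/', 'AD3');

printf("loose=%d grouped=%d that-thing=%d buggy=%d fixed=%d\n",
    $looseHit, $groupedHit, $thatThing, $buggy, $fixed);
```

Here the ungrouped patterns accept strings that the grouped ones reject, because the alternation reaches all the way to the ends of the pattern.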
References
regular-expressions.info/Brackets for Grouping

Your OR is against the wrong grouping. Try:
^((A[^C])|(B)).+[^0123456789]$

In jasonbar's solution, the reason it doesn't match ABC is that it requires A followed by a non-C character (two characters), then one or more of any character, then a non-digit. Thus, if the string begins with A, the minimum length is 4. You can solve this by using a lookahead assertion.
PHP
$pattern = '#^(A(?=[^C])|B).+\D$#';
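A rough comparison of the two variants, as a sketch: $consuming is the grouped pattern from the earlier answer and $lookahead is the pattern above. The variable names and the choice of inputs are assumptions made for this example.

```php
<?php
// Grouped pattern from the earlier answer: "A[^C]" consumes two characters.
$consuming = '/^((A[^C])|(B)).+[^0123456789]$/';
// Lookahead variant: "(?=[^C])" only peeks at the second character.
$lookahead = '#^(A(?=[^C])|B).+\D$#';

$inputs = ['ABC', 'ADA', 'ACA', 'AD3'];
$report = [];
foreach ($inputs as $s) {
    $report[$s] = [
        'consuming' => preg_match($consuming, $s),
        'lookahead' => preg_match($lookahead, $s),
    ];
    printf("%-4s consuming=%d lookahead=%d\n",
        $s, $report[$s]['consuming'], $report[$s]['lookahead']);
}
```

Three-character inputs such as ABC and ADA only pass with the lookahead version, while ACA and AD3 are rejected by both.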

I think it should be like
/^(A[^C]|B.).*[^0-9]$/
Try this test code
$test = "
A
B
AB
AC
AAA
ABA
ACA
AA9
add more
";
$pat = '/^(A[^C]|B.).*[^0-9]$/';
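// PREG_SPLIT_NO_EMPTY drops the empty pieces at the start and end of $test
foreach (preg_split('~\s+~', $test, -1, PREG_SPLIT_NO_EMPTY) as $p) {
    printf("%5s : %s\n<br>", $p, preg_match($pat, $p) ? "ok" : "not ok");
}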

Related

preg_match() is evaluating my regex incorrectly

My regex validation is producing true when it should be false. I've tried this exact example using online regex validators, and it is always rejected except in my code. Am I doing something wrong?
$name = "1NTH";
preg_match("/[A-Z][A-Z][A-Z][A-Z]?/",$name);
This exact example is evaluating to true.
You're getting the correct behaviour, as you're asking for three capital letters optionally followed by a fourth one.
You probably want to use this regex:
/^[A-Z][A-Z][A-Z][A-Z]?$/
(note the ^, start of line, and $ end of line) as it explicitly requires that the capital letters must be all the content of the text line.
This is because it is true. It contains [A-Z] characters.
You're missing the anchors to start your regex from the start of the string to finish of the string.
^[A-Z][A-Z][A-Z][A-Z]?$
There's nothing wrong with your regex. It is valid based on the rule you specified.
Let's do it one step at a time:
[A-Z] means match exactly 1 uppercase letter.
[A-Z]? means match either 0 or 1 uppercase letter.
See what's going on? If not, move on.
[A-Z][A-Z][A-Z] means match exactly 3 uppercase letters. (1 for each [A-Z] rule)
[A-Z][A-Z][A-Z][A-Z]? means the first three characters of the match must be uppercase letters. The last one can be either 0 or 1 uppercase letter.
In your example, 1NTH contains 3 consecutive uppercase letters (NTH), so it matches. You didn't put any restrictions on whether the string may contain a number, before or after the 3 letters. And the last [A-Z]?? Well, that's optional, right? (see rule #2)
The standard PHP regular expression engine checks whether the string contains the pattern; it does not require the whole string to match. That differs from, for example, Java's String.matches(), which requires the entire string to match.
You should use ^ and $, which match respectively the beginning and the end of a string. Both are zero-length assertions.
$name = "1NTH";
preg_match("/^[A-Z]{3}[A-Z]?$/", $name);
PS: I have optimized your regular expression by using the quantifier {3}, which matches three consecutive occurrences of the preceding character or group.
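To see what the anchors change, here is a small sketch that runs the same character classes with no anchor, with only $, with only ^, and with both. The variable names are just for this example.

```php
<?php
$name = "1NTH";

// No anchors: the engine may start anywhere, so "NTH" is found.
$unanchored = preg_match('/[A-Z]{3}[A-Z]?/', $name, $m);
$matchedText = $m[0];

// Only "$": still matches, because "NTH" sits at the end of the string.
$endOnly = preg_match('/[A-Z]{3}[A-Z]?$/', $name);

// Only "^": fails here, because the string starts with a digit.
$startOnly = preg_match('/^[A-Z]{3}[A-Z]?/', $name);

// Both anchors: the whole string must be 3 or 4 capital letters.
$both = preg_match('/^[A-Z]{3}[A-Z]?$/', $name);

printf("unanchored=%d (%s) endOnly=%d startOnly=%d both=%d\n",
    $unanchored, $matchedText, $endOnly, $startOnly, $both);
```

For "1NTH" the ^ anchor is the one that causes the rejection; the $ anchor matters for inputs like "NTH1".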
According to the PHP Manual:
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
In your example, the pattern requires 3 mandatory capital letters and 1 optional one, so a match is expected.
As stribizhev said, your regex matches because it asks for at least 3 capital letters, which are found in $name. I assume you want to reject "1NTH" because it starts with a digit. That means you have to add an anchor saying "from the start" (\A).
Also, the 3 repeated [A-Z] can be summarized by adding a repeat counter. So the whole pattern should be: \A[A-Z]{3,}
You have written this:
$name = "1NTH";
preg_match("/[A-Z][A-Z][A-Z][A-Z]?/",$name);
Please change it to the code below:
$name = "1NTH";
preg_match("/^[A-Z][A-Z][A-Z][A-Z]?$/",$name);
You are missing the '^' at the start and the '$' at the end of the pattern. With only '$' added, "1NTH" would still match, because "NTH" sits at the end of the string.
With both anchors in place, "1NTH" is rejected as you require.

Regex to match a-z spaces, character limit, only first letter capitalized

I need some help with a regex. I'm terrible at this.
Rules:
Only letters a through z and spaces
Minimum 2 letters
Maximum 30 letters
Each word must be at least 2 letters
Only the first letter of each word may be capital but the first letter must always be capital
My attempt:
^[A-Z][a-z]{2,30}$
I'm using this in PHP.
Okay, so let's try solving requirements 1 to 3 first. If you mean 2 to 30 characters it's as simple as this:
^[a-zA-Z ]{2,30}$
Now for the other requirements. Let's handle those alone. Point 5 requires each word to be of the form [a-zA-Z][a-z]*. To make sure that each word has at least two letters (point 4), we can simply turn the * into a + (which means 1 or more repetitions). If we insert explicit spaces between the words, that makes sure that the [a-z]+ cannot be followed directly by a capital letter:
^[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$
Note that I treated the first word separately.
Finally, how do we combine the two? By putting one into a lookahead. I'm going for the counting here:
^(?=[a-zA-Z ]{2,30}$)[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$
This works because, after the input is checked against the lookahead, the engine resets its "cursor" to where it started (the beginning of the string) and continues matching as usual. This way we can run two passes over the input, checking for independent conditions.
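As a quick check of the combined pattern, here is a sketch with a few inputs I made up for illustration (they are not from the question), including one that is too long and one with a one-letter word.

```php
<?php
// Length lookahead plus the word-shape pattern, as combined above.
$pattern = '/^(?=[a-zA-Z ]{2,30}$)[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$/';

$cases = [
    'Bob',                     // one capitalized word
    'My name Is bob',          // capitals only at word starts
    'naMe',                    // capital inside a word, lowercase start
    'My a',                    // second word has only one letter
    'A' . str_repeat('a', 30), // 31 characters, too long
];

$outcome = [];
foreach ($cases as $c) {
    $outcome[$c] = preg_match($pattern, $c) === 1;
    printf("%-32s : %s\n", $c, $outcome[$c] ? 'ok' : 'rejected');
}
```

The first two inputs should pass and the remaining three should be rejected, each for a different rule.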
Finally, note that the lookahead requirement simply translates to the string's length. In such a case it would be easier (and most often better) to check this separately:
$len = strlen($input);
if ($len < 2 || $len > 30) {
    // report error about string length
} elseif (!preg_match('/^[A-Z][a-z]+(?:[ ]+[a-zA-Z][a-z]+)*$/', $input)) {
    // report error about pattern
} else {
    // process input
}
This makes it much easier to give sensible error messages depending on which condition was violated.
Let's try this:
^[A-Z]((?<= )[A-Z]|[a-z ]){2,29}$
[A-Z] -- a capital letter
(
(?<= )[A-Z] -- either a capital letter preceded by a space
| -- or
[a-z ] -- a lowercase letter or a space
){2,29} -- 2 to 29 times (plus the initial capital)
You will need to use the PCRE (not ereg_*) for the lookbehind to work.
"My name Is bob"
   ↑  ↑  ↑
   |  |  \-- this is a "(?<= )[A-Z]"
   |  \--- this is a "[a-z]"
   \---- this is a "[ ]"
"naMe"
   ↑
   \-- this is NOT a "(?<= )[A-Z]" (the character before the [A-Z] is not a space)
EDIT: damn, you added the "Each word must be at least 2 letters". Use m.buettner's.

php regular expression plus sign

I'm playing around with PHP Regex in order to improve my skills with it.
I'm having a hard time trying to understand the plus sign - so I wrote the following code:
$subject = 'aaa bbb cccc dddd';
echo preg_replace('/(\w)/',"$1*",$subject) . '<br>';
echo preg_replace('/(\w+)/',"$1*",$subject) . '<br>';
echo preg_replace('/(\w)+/',"$1*",$subject) . '<br>';
With results in:
a*a*a* b*b*b* c*c*c*c* d*d*d*d*
aaa* bbb* cccc* dddd*
a* b* c* d*
I don't understand why these results come about. Can someone please explain what's going on in this example?
In regular expressions, + means one or more of the preceding character or group.
The pattern /(\w)/, means match a single word character (a-zA-Z0-9_) in a single group. So it will match each letter. The first match group will be just a. The replace will replace each individual letter with that letter followed by an asterisk.
The pattern /(\w+)/ will match one or more word characters in a group. So it will match each block of letters. The first match group will be aaa. The replace will replace each block of letters with that block followed by an asterisk.
The last pattern /(\w)+/ is a little more tricky. It matches a single word character in a group, and the + repeats that group one or more times. So the overall match is still the whole block of letters, but a repeated capturing group keeps only its last iteration, so $1 holds the final character of the block. The replace then swaps each block for that last character followed by an asterisk. So if you tried the string aaab ccc, your result would end up as b* c*: b is the last character captured in the first block, so the replace uses that.
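The aaab ccc case can be checked with a short sketch like this one. It uses preg_match() in addition to preg_replace() so that the whole match and the captured group can be inspected separately; the variable names are only for illustration.

```php
<?php
$subject = 'aaab ccc';

// Inspect the first match: $m[0] is the whole match, $m[1] is group 1.
preg_match('/(\w)+/', $subject, $m);
$wholeMatch = $m[0]; // the full block of word characters
$lastGroup  = $m[1]; // only the final iteration of the repeated group

// Each block is replaced by its last character plus an asterisk.
$replaced = preg_replace('/(\w)+/', '$1*', $subject);

printf("whole=%s group=%s replaced=%s\n", $wholeMatch, $lastGroup, $replaced);
```

The whole match covers aaab, but the group only remembers b, which is why the replacement looks like it dropped the other letters.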
Your mistake isn't the plus sign, it's understanding what the parentheses are for and how they work. Parentheses capture part of your match into a variable, which is why you can use $1; the second set of () gives you $2, and so on...
(\w) means 1 word character
(\w+) means 1 or more word characters
(\w)+ matches 1 or more word characters, but only the last one is put into the variable, because only the \w is inside the parentheses

Regexp return true, but author of a book says it shouldn't

Reading an online resource on PHP about Regexp(TuxRadar).
According to the author the following should not match "aaa1" to the pattern and therefore return false(0), but I get true(1).
<?php
$str = "aaa1";
print preg_match("/[a-z]+[0-9]?[a-z]{1}/", $str);
?>
Why?
Regular Expressions
Are you sure there isn't supposed to be a trailing $ there? Without it, returning true makes a lot of sense - the first [a-z] block matches the first 2 a characters, the [0-9] matches nothing, and the last [a-z] matches the 3rd a. The trailing 1 is ignored.
Looking at the link to the book, it does seem there's an error there:
Must end with a lower case letter
This is only true if the regular expression is anchored to the end of the string with a $.
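A small sketch to confirm this, using the pattern from the question with and without a trailing $. The third argument to preg_match() is used here only to show which part of the string was matched; the variable names are illustrative.

```php
<?php
$str = "aaa1";

// Unanchored: the engine backtracks until "aaa" satisfies the pattern.
$found = preg_match('/[a-z]+[0-9]?[a-z]{1}/', $str, $m);
$matchedText = $m[0];

// Anchored at the end: the match must now finish on a lower case letter
// at the very end of the string, which "aaa1" cannot satisfy.
$anchored = preg_match('/[a-z]+[0-9]?[a-z]{1}$/', $str);

printf("found=%d (%s) anchored=%d\n", $found, $matchedText, $anchored);
```

Without the $ the match simply stops before the 1, so the trailing digit never takes part in the decision.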
It matches because [0-9]? matches a digit zero or one times.
<?php
$str = "aaa1";
print preg_match("/[a-z]+[0-9]+[a-z]{1}/", $str);
?>
won't result in a match.
Let's break down the regular expression:
[a-z]+ means one or more letters; being greedy, it could match a, aa or aaa
[0-9]? means an optional digit, so it could match one digit or nothing
[a-z] means match a single letter, which could be an a
Therefore, because [0-9] is optional, part 1 matches aa, part 2 matches nothing, and part 3 matches the final a

Can Someone explain this reg ex to me?

I recently asked a question on formatting a telephone number and I got lots of responses. Most of the responses were great, but there was one I really wanted to figure out, because it worked great. If $phone is the following, how do the other lines work? What are they doing, so I can learn?
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Let's break the code into two lines.
preg_replace("~[^0-9]~", "", $phone);
First, we're going to replace matches to a regex with an empty string (in other words, delete matches from the string). The regex is [^0-9] (the ~ on each end is a delimiter). [...] in a regex defines a character class, which tells the regex engine to match one character within the class. Dashes are generally special characters inside a character class, and are used to specify a range (ie. 0-9 means all characters between 0 and 9, inclusive).
You can think of a character class like a shorthand for a big OR condition: ie. [0-9] is a shorthand for 0 or 1 or 2 or 3 or 4 or 5 or 6 or 7 or 8 or 9. Note that classes don't have to contain ranges, either -- [aeiou] is a character class that matches a or e or i or o or u (or in other words, any vowel).
When the first character in the class is ^, the class is negated, which means that the regex engine should match any character that isn't in the class. So when you put all that together, the first line removes anything that isn't a digit (a character between 0 and 9) from $phone.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
The second line tries to match $phone against a second expression, and puts the results into an array called $matches, if a match is made. You will note there are three sets of brackets; these define capturing groups -- ie. if there is a match of a pattern as a whole, you will end up with three submatches, which in this case will contain the area code, prefix and suffix of the phone number. In general, anything contained in brackets in a regular expression is capturing (while there are exceptions, they are beyond the scope of this explanation). Groups can be useful for other things too, without wanting the overhead of capturing, so a group can be made non-capturing by prefacing it with ?: (ie. (?:...)).
Each group does a similar thing: [0-9]{3} or [0-9]{4}. As we saw above, [0-9] defines a character class containing the digits between 0 and 9 (as the classes here don't start with ^, these are not negated classes). The {3} or {4} is a repetition operator, which says "match exactly 3 (or 4) of the previous token (or group)". So [0-9]{3} will match exactly three digits in a row, and [0-9]{4} will match exactly four digits in a row. Note that the digits don't have to be all the same (ie. 111), because the character class is evaluated for each repetition (so 123 will match because 1 matches [0-9], then 2 matches [0-9], and then 3 matches [0-9]).
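Putting the two lines together, here is a sketch that shows what ends up in $matches. The $digits and $formatted variables and the output format "(407) 888-9999" are my own choices for illustration; the question's code reuses $phone instead.

```php
<?php
$phone = "(407)888-9999";

// Step 1: delete every character that is not a digit.
$digits = preg_replace('~[^0-9]~', '', $phone);

// Step 2: split the ten digits into three capturing groups.
if (preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $digits, $matches)) {
    // $matches[0] is the whole match; 1 to 3 are the groups in order.
    $formatted = sprintf('(%s) %s-%s', $matches[1], $matches[2], $matches[3]);
    echo $formatted, "\n";
}
```

After this runs, $matches[1], $matches[2] and $matches[3] hold the area code, prefix and suffix, so they can be rearranged into whatever display format is needed.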
In the preg_replace, it looks for anything that is not 0-9 (the ^ inside the [] negates the class, so basically anything that is not a digit) and removes it from the string, given that the replacement is "".
For the first section, it pulls out the first 3 digits with ([0-9]{3}): the {3} is the number of characters to match, the items inside the [] are what to match, and since this is inside parentheses (), it is stored as a match in the array $matches. The second part pulls out the next 3 digits, and the last part pulls out the last 4 digits from $phone, storing them in $matches.
The ~ are delimiters for the regular expressions.
You know it's a regular expression from the regex tag.
So, you are pattern matching.
The pattern you are matching is: [^0-9] followed by the phone number.
[^0-9] is NOT '^' any one digit
So, the match after that is any 3 digits, followed by any 3 digits, followed by any 4 digits.
I don't think it will match because of the () around the area code and the dash are missing.
I'd do this:
~\(([0-9]{3})\)([0-9]{3})-([0-9]{4})~
"[^0-9]" means everything but digits from 0 to 9. So basically, the first line replaces everything but digits with "" (nothing).
[0-9]{3} means a digit from 0 to 9, 3 times in a row.
So it checks whether you have 3 digits, then 3 digits, then 4 digits, and stores what it matched in $matches.
Check this tutorial:
Using Regular Expressions with PHP
http://www.webcheatsheet.com/php/regular_expressions.php
$phone = "(407)888-9999";
$phone = preg_replace("~[^0-9]~", "", $phone);
In PHP you have to delimit the regex pattern with a non-alphanumeric character; "~" is used here.
[^0-9] is the regex pattern used to remove anything from $phone that is not in the 0-9 range. Remember that a leading ^ inside [...] negates the character class.
preg_match('~([0-9]{3})([0-9]{3})([0-9]{4})~', $phone, $matches);
Again, this line of code uses "~" as the delimiter, and
([0-9]{3}) this part of the pattern captures 3 digits from the string (note: {} specifies the number or range of repetitions to match) into a separate element of the output array (check your $matches variable for the result). Using ( ) in a pattern results in groups/submatches.
