preg_match() is evaluating my regex incorrectly - php

My regex validation is producing true when it should be false. I've tried this exact example using online regex validators, and it is always rejected except in my code. Am I doing something wrong?
$name = "1NTH";
preg_match("/[A-Z][A-Z][A-Z][A-Z]?/",$name);
This exact example is evaluating to true.

You're getting the correct behaviour: the pattern asks for three capital letters optionally followed by a fourth one, and it is allowed to match anywhere inside the string.
You probably want to use this regex:
/^[A-Z][A-Z][A-Z][A-Z]?$/
(note the ^, start of the string, and the $, end of the string) as it explicitly requires that the capital letters make up the entire content of the text.
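A minimal sketch of the difference, using the value from the question. The variable names $unanchored and $anchored are made up for this illustration:

```php
<?php
$name = "1NTH";

// Unanchored: the engine may start matching at any offset, so "NTH" satisfies the pattern.
$unanchored = preg_match('/[A-Z][A-Z][A-Z][A-Z]?/', $name);

// Anchored: the whole string must consist of three or four capital letters.
$anchored = preg_match('/^[A-Z][A-Z][A-Z][A-Z]?$/', $name);

echo $unanchored, "\n"; // 1
echo $anchored, "\n";   // 0
```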

It evaluates to true because the string does contain a match: "NTH" is three [A-Z] characters in a row.
You're missing the anchors that tie the pattern to the start and the end of the string:
^[A-Z][A-Z][A-Z][A-Z]?$

There's nothing wrong with your regex. It is valid based on the rule you specified.
Let's do it one step at a time:
[A-Z] means match exactly 1 uppercase letter.
[A-Z]? means match either 0 or 1 uppercase letter.
See what's going on? If not, move on.
[A-Z][A-Z][A-Z] means match exactly 3 uppercase letters in a row (1 for each [A-Z] rule).
[A-Z][A-Z][A-Z][A-Z]? means the first three characters of the match must be uppercase letters. After them there can be either 0 or 1 more uppercase letter.
In your example, 1NTH contains three consecutive uppercase letters, so the pattern matches. You didn't put any restriction on what may come before or after those three letters, digits included. As for the trailing [A-Z]?, that one is optional (see rule #2).
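To see which part of the string the pattern actually picked, you can ask preg_match() for the matched text and its offset. This is only a sketch; the names $found, $text and $offset are chosen for the illustration, and the array destructuring assumes PHP 7.1 or later:

```php
<?php
$name = "1NTH";
$found = preg_match('/[A-Z][A-Z][A-Z][A-Z]?/', $name, $matches, PREG_OFFSET_CAPTURE);

// With PREG_OFFSET_CAPTURE, $matches[0] holds the matched text and its byte offset.
[$text, $offset] = $matches[0];

echo "matched '$text' at offset $offset\n"; // matched 'NTH' at offset 1
```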

The standard PHP regular expression engine checks whether the string contains a match for the pattern; it does not require the whole string to match. That differs from, for example, Java's String.matches(), which only succeeds when the entire string matches.
You should use ^ and $, which match respectively the beginning and the end of a string. Both are zero-length assertions.
$name = "1NTH";
preg_match("/^[A-Z]{3}[A-Z]?$/", $name);
PS: I have shortened your regular expression with the quantifier {3}, which matches three consecutive occurrences of the preceding character or group.

According to the PHP Manual:
preg_match() returns 1 if the pattern matches given subject, 0 if it does not, or FALSE if an error occurred.
Your pattern requires three capital letters plus one optional one, and "NTH" provides them, so preg_match() returns 1.
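Because 0 and FALSE are both falsy, it can help to compare the return value strictly. Below is a rough sketch of a wrapper; matchesPattern() is a hypothetical helper name, not something PHP provides:

```php
<?php
/**
 * Hypothetical helper: true only when $pattern matches $subject.
 * Throws when preg_match() reports an error instead of treating it as "no match".
 */
function matchesPattern(string $pattern, string $subject): bool
{
    $result = preg_match($pattern, $subject);
    if ($result === false) {
        // preg_last_error() returns one of the PREG_*_ERROR constants.
        throw new RuntimeException('preg_match() failed with error code ' . preg_last_error());
    }
    return $result === 1;
}
```

For example, matchesPattern('/^[A-Z]{3}[A-Z]?$/', '1NTH') returns false, while the same call with 'NTH' returns true.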

As stribizhev said, your regex matches because it asks for at least three letters, and those are found in $name. I assume you want to reject "1NTH" because it starts with a digit. That means you have to add an anchor saying "from the start" (\A).
Also, the three repeated [A-Z] can be condensed with a repetition count. So the whole pattern becomes: \A[A-Z]{3,} (three or more capital letters at the start of the string; use {3,4} instead if a fourth letter is the most you want to allow).
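One detail worth knowing when choosing between $ and the \A / \z style of anchors: by default $ also matches just before a trailing newline. A small sketch, with an input string made up for the illustration:

```php
<?php
$withNewline = "NTH\n";

// "$" also matches just before a final newline, so this still succeeds.
$dollar = preg_match('/^[A-Z]{3,4}$/', $withNewline);

// "\z" only matches at the very end of the subject.
$strictEnd = preg_match('/\A[A-Z]{3,4}\z/', $withNewline);

// The D modifier (PCRE_DOLLAR_ENDONLY) makes "$" match only at the very end as well.
$dollarEndOnly = preg_match('/^[A-Z]{3,4}$/D', $withNewline);

echo $dollar, $strictEnd, $dollarEndOnly, "\n"; // 100
```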

You wrote:
$name = "1NTH";
preg_match("/[A-Z][A-Z][A-Z][A-Z]?/",$name);
Change the pattern so that it is anchored at both ends:
$name = "1NTH";
preg_match("/^[A-Z][A-Z][A-Z][A-Z]?$/",$name);
You are missing '^' at the start and '$' at the end of the pattern. Adding only '$' is not enough, because "NTH" still matches at the end of "1NTH".

Related

Why does this regex fail to work...any ideas?

I am faced with strings as follows:
start of line;
characters C, M, P, T, K, X, or Q;
3 more word characters;
any number of other characters except newline;
space;
possible M literal;
2 digits;
/;
possible M literal;
2 or 3 digits;
space.
I am nearly certain I have translated this into the following regex correctly but this line of PHP code still returns NULL when passed valid strings. Furthermore, when I test this regex with regexpal and the identical subject string, the correct result is returned. I'm pretty sure I'm having a problem with the pattern delimiter or the first 2 groups (start of line then character check). Any ideas? - Brandon
preg_match_all('&^(\C|\M|\P|\T|\K|\X|\Q)[A-Z0-9]{3}.*\sM?[0-9]{2}/M?[0-9]{2,3}\s&', $subject, $resultArr);
First, I would typically suggest a more common pattern delimiter such as /, #, or ~. I would not use / here since it appears in the pattern. This is only a preference, though; & is totally valid.
Second, there is no need for backslashes in front of the characters at the start of the pattern (you can also use a character class for these, which I find more readable). Several of them are escape sequences with a special meaning in PCRE (\K resets the start of the match, \Q begins a literally quoted section, \X matches an extended grapheme cluster, \C matches a single code unit, and \P introduces a Unicode property), so the pattern does not mean what you intended and may not even compile.
Third, I am guessing you want an ungreedy search (U pattern modifier after the closing delimiter). In most cases that is the desired behavior when .* appears somewhere in a pattern. It matters particularly with preg_match_all(): a greedy .* pairs the first place where the start of your pattern matches with the last place where the end of it matches, and lumps every potential match in between into the .* portion.
So this leaves us with something like this:
$pattern = '#^[CMPTKXQ][A-Z0-9]{3}.*\sM?[0-9]{2}/M?[0-9]{2,3}\s#U';
preg_match_all($pattern, $subject, $resultArr);
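Here is a quick way to try the cleaned-up pattern. The subject line is invented for this sketch, since the question does not show a real one:

```php
<?php
// Hypothetical subject line, made up for this illustration.
$subject = "CABC some text 12/M345 tail";
$pattern = '#^[CMPTKXQ][A-Z0-9]{3}.*\sM?[0-9]{2}/M?[0-9]{2,3}\s#U';

// preg_match_all() returns the number of full matches (or false on error).
$count = preg_match_all($pattern, $subject, $resultArr);

echo $count, "\n";                      // 1
echo '[', $resultArr[0][0], ']', "\n";  // [CABC some text 12/M345 ]
```

If the subject holds several lines and each line should be tested on its own, the m modifier (so that ^ matches at every line start) is probably needed as well; that is an assumption about the input, not something stated in the question.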

php regex needed to check that a string has at least one uppercase char, one lower case char and either one number or symbol

Hi, I need to use PHP's preg_match to check that a string is valid. To be valid, the string needs to have at least one uppercase character, at least one lowercase character, and at least one symbol or number.
thanks
You can achieve this by using lookaheads
^(?=.*[a-z])(?=.*[A-Z])(?=.*[\d,.;:]).+$
A lookahead is a zero-width assertion: it does not consume characters, it only checks from its current position whether the stated assertion holds. All assertions are evaluated separately, so the characters can be in any order.
^ Matches the start of the string
(?=.*[a-z]) checks if somewhere in the string there is a lowercase character
(?=.*[A-Z]) checks if somewhere in the string there is an uppercase character
(?=.*[\d,.;:]) checks if somewhere in the string there is a digit or one of the other listed characters; add those you want.
.+$ Matches the string till the end of the string
As soon as one of the assertions fails, the complete regex fails.
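Wrapped in PHP, the lookahead pattern could be used roughly like this. isValidString() is a hypothetical name, and the symbol list (, . ; :) is just the one from the pattern above, so extend it as needed:

```php
<?php
/**
 * Hypothetical validator: at least one lowercase letter, one uppercase letter,
 * and one digit or symbol from the assumed list (, . ; :).
 */
function isValidString(string $value): bool
{
    $pattern = '/^(?=.*[a-z])(?=.*[A-Z])(?=.*[\d,.;:]).+$/';
    return preg_match($pattern, $value) === 1;
}

var_dump(isValidString('abcDEF1')); // bool(true)
var_dump(isValidString('abcdef1')); // bool(false), no uppercase letter
```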
If the match has to be in the order you've described, you could use
$result = preg_match('/[A-Z]+[a-z]+[\d!$%^&]+/', $string);
If the characters can be in any order I'm not so sure, without doing three separate checks like so:
$result = (preg_match('/[A-Z]+/', $string) && preg_match('/[a-z]+/', $string) && preg_match('/[\d!$%^&]+/', $string));
As people have pointed out below, you can do this all in one regular expression with lookaheads.
According to your request:
[A-Z]+ Match any uppercase char
[a-z]+ Match any lowercase char
[\d§$%&]+ Match a number or special chars (add more special if you need to)
The result would look like this: [A-Z]+[a-z]+[\d§$%&]+
This isn't ideal though. You might want to check Regexr and try what kind of regex fits your requirements.
If you want these not to be necessarily in order, you need a lookahead. The following expression will validate for at least one lowercase character, one uppercase character and one number:
$result = preg_match('/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])/', $string);
You can put a lot of special chars with the numbers, like this:
$result = preg_match('/^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9$%])/', $string);

Matching ugly extra abbreviations and numbers in titles with PHP regex

I have to create regex to match ugly abbreviations and numbers. These can be one of following "formats":
1) [any alphabet char length of 1 char][0-9]
2) [double][whitespace][2-3 length of any alphabet char]
I tried to match double:
preg_match("/^-?(?:\d+|\d*\.\d+)$/", $source, $matches);
But I couldn't get it to select the following example: 1.1 AA My test title. What is wrong with my regex, and how can I add the other formats to it too?
In your regex you say "start of string, followed by maybe a -, followed by either at least one digit, or zero or more digits followed by a dot and at least one digit, followed by the end of string".
So your regex could match, for example, 4.5, -.1 etc. This is exactly what you tell it to do.
Your test input string does not match since there are other characters present after the number 1.1, and even if it somehow magically matched, your "double"-matching regex is wrong.
For a double without scientific notation you usually use this regex :
[-+]?\b[0-9]+(\.[0-9]+)?\b
Now that we have this out of our way we need a whitespace \s and
[2-3 length of alphabet]
Now I have no idea what [2-3 length of alphabet] means but by combining the above you get a regex like this :
[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]
You can also place anchors ^$ if you want the string to match entirely :
^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[2-3 length of alphabet]$
Feel free to ask if you are stuck! :)
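As a sketch only, here is the combined pattern with the placeholder filled in. It assumes that "2-3 length of any alphabet char" means two or three ASCII letters, which the question does not confirm, and it leaves out the $ anchor because the title text follows the abbreviation:

```php
<?php
$source = "1.1 AA My test title";

// Assumption: the abbreviation is two or three ASCII letters followed by a word boundary.
$pattern = '/^[-+]?\b[0-9]+(\.[0-9]+)?\b\s[A-Za-z]{2,3}\b/';
$found = preg_match($pattern, $source, $matches);

echo $found, "\n";      // 1
echo $matches[0], "\n"; // 1.1 AA
```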
I see multiple issues with your regex:
You force the whole string to be a number with the anchors: ^ at the beginning and $ at the end. If you don't want that, remove them.
The number group is non-capturing. It takes part in the match, but its text is not stored as a separate entry in $matches. That's because (?:...) is the non-capturing group syntax. Remove ?: to make the group capturing.
You place the shorter digit pattern before the longer one. With alternation the engine takes the first alternative that succeeds, so if you put the longer pattern first, it will be tried first and preferred over the shorter one.
Maybe this already solves your issue:
preg_match("/-?(\d*\.\d+|\d+)/", $source, $matches);
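The effect of the alternation order can be checked directly. A small sketch using the example string from the question; the names $intFirst and $decimalFirst are made up here:

```php
<?php
$source = "1.1 AA My test title";

// Integer alternative first: the engine stops at the first alternative that succeeds.
preg_match('/-?(\d+|\d*\.\d+)/', $source, $intFirst);

// Decimal alternative first: the longer form is tried before the bare integer.
preg_match('/-?(\d*\.\d+|\d+)/', $source, $decimalFirst);

echo $intFirst[1], "\n";     // 1
echo $decimalFirst[1], "\n"; // 1.1
```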

Regex Rules for First and Second Character

I need help on following regular expression rules of javascript and php.
JS
var charFilter = new RegExp("^[A|B].+[^0123456789]$");
PHP
if (!preg_match('/^[A|B].+[^0123456789]$/', $data_array['sample_textfield'])) {
This regular expression means:
The first character must be A or B, and the last character must not be a digit (0 to 9).
I have another validation: the string must be at least 3 and at most 6 characters long.
The new rule I want to add is: the second character cannot be C if the first letter is A.
Which means
ADA (is valid)
ACA (is not valid)
So I changed the regex code like this
JS
var charFilter = new RegExp("^(A[^C])|(B).+[^0123456789]$");
PHP
if (!preg_match('/^(A[^C])|(B).+[^0123456789]$/', $data_array['sample_textfield'])) {
It worked for the first and second character. If I type
ACA (it says invalid). But if I type
AD3 (it says valid), it doesn't check the last character anymore. The last character must not be a digit from 0 to 9, but it shows as valid.
Can anyone help me fix that regex? Thank you so much.
Putting all of your requirements together, it seems that you want this pattern:
^(?=.{3,6}$)(?=A(?!C)|B).+\D$
That is:
From the beginning of the string ^
We can assert that there are between 3 to 6 of "any" characters to end of the string (?=.{3,6}$)
We can also assert that it starts with A not followed by C, or starts with B (?=A(?!C)|B)
And the whole thing doesn't end with a digit .+\D$
This will match (as seen on rubular.com):
= match =    = no match =
ADA          ACA
ABCD         AD3
ABCDE        ABCDEFG
ABCDEF
A123X
A X
Note that spaces are allowed by .+ and \D. If you insist on no spaces, you can use e.g. (?=\S{3,6}$) in the first part of the pattern.
(?=…) is positive lookahead; it asserts that a given pattern can be matched. (?!…) is negative lookahead; it asserts that a given pattern can NOT be matched.
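The same inputs can be run through PHP as a quick check. This is just a sketch; $candidates and $accepted are names invented for it:

```php
<?php
$pattern = '/^(?=.{3,6}$)(?=A(?!C)|B).+\D$/';

$candidates = ['ADA', 'ABCD', 'ABCDE', 'ABCDEF', 'A123X', 'A X', 'ACA', 'AD3', 'ABCDEFG'];

// Collect every candidate the pattern accepts.
$accepted = [];
foreach ($candidates as $candidate) {
    if (preg_match($pattern, $candidate) === 1) {
        $accepted[] = $candidate;
    }
}

echo implode(', ', $accepted), "\n"; // ADA, ABCD, ABCDE, ABCDEF, A123X, A X
```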
References
regular-expressions.info
Lookarounds, Alternation, Anchors, Repetition, Dot, Character Class
Related questions
How does the regular expression (?<=#)[^#]+(?=#) work?
On alternation precedence
The problem with the original pattern is in misunderstanding the precedence of the alternation | specifier.
Consider the following pattern:
this|that-thing
This pattern consists of two alternates, one that matches "this", and another that matches "that-thing". Contrast this with the following pattern:
(this|that)-thing
Now this pattern matches "this-thing" or "that-thing", thanks to the grouping (…). Coincidentally it also creates a capturing group (which will capture either "this" or "that"). If you don't need the capturing feature, but you need the grouping aspect, use a non-capturing group (?:…).
Another example of where grouping is desired is with repetition: ha{3} matches "haaa", but (ha){3} matches "hahaha".
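A short PHP sketch of both points. The anchors and the test strings are additions made for the illustration:

```php
<?php
// Without grouping, the alternatives are "^this" and "that-thing$".
$loose = preg_match('/^this|that-thing$/', 'this is something else');

// With a non-capturing group, the anchors and "-thing" apply to both alternatives.
$grouped = preg_match('/^(?:this|that)-thing$/', 'this is something else');
$groupedOk = preg_match('/^(?:this|that)-thing$/', 'that-thing');

// Grouping also changes what a quantifier repeats.
$singleChar = preg_match('/^ha{3}$/', 'haaa');
$wholeGroup = preg_match('/^(ha){3}$/', 'hahaha');

echo $loose, $grouped, $groupedOk, $singleChar, $wholeGroup, "\n"; // 10111
```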
References
regular-expressions.info/Brackets for Grouping
Your OR is against the wrong grouping. Try:
^((A[^C])|(B)).+[^0123456789]$
In jasonbar's solution, the reason it doesn't match ADA is that it requires A followed by a character that is not C, which consumes two characters, followed by one or more of any character, followed by a non-digit. Thus, if the string begins with an A, the minimum length is 4. You can solve this by using a lookahead assertion.
PHP
$pattern = '#^(A(?=[^C])|B).+\D$#';
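To compare the two approaches side by side, something like the following can be used. It is a sketch only; the variable names and the three inputs are chosen for the illustration:

```php
<?php
$consuming = '/^((A[^C])|(B)).+[^0123456789]$/'; // [^C] consumes the second character
$lookahead = '#^(A(?=[^C])|B).+\D$#';            // (?=[^C]) only peeks at it

$results = [];
foreach (['ADA', 'ACA', 'AD3'] as $input) {
    $results[$input] = [
        'consuming' => preg_match($consuming, $input),
        'lookahead' => preg_match($lookahead, $input),
    ];
}

// Only the lookahead version accepts the three-character input "ADA".
echo $results['ADA']['consuming'], $results['ADA']['lookahead'], "\n"; // 01
```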
I think it should be:
/^(A[^C]|B.).*[^0-9]$/
Try this test code:
$test = "
A
B
AB
AC
AAA
ABA
ACA
AA9
add more
";
$pat = '/^(A[^C]|B.).*[^0-9]$/';
// Split on whitespace and skip the empty strings produced by the leading and trailing newlines.
foreach (preg_split('~\s+~', $test, -1, PREG_SPLIT_NO_EMPTY) as $p) {
    printf("%5s : %s\n<br>", $p, preg_match($pat, $p) ? "ok" : "not ok");
}

preg_match_all to parse an xml-like attribute string

I have a string like so:
option_alpha="value" option_beta="some other value" option_gamma="X" ...etc.
I'm using this to parse them into name & value pairs:
preg_match_all("/([a-z0-9_]+)\s*=\s*[\"\'](.+?)[\"\']/is", $var_string, $matches)
Which works fine, unless it encounters an empty attribute value:
option_alpha="value" option_beta="" option_gamma="X"
What have I done wrong in my regex?
[\"\'](.+?)[\"\']
should be
[\"\'](.*?)[\"\']
* instead of +. The first means there can be zero to whatever occurrences of the previous expression (so it can be omitted, that is what you need). The latter means, there has to be at least one.
I think you want to change the very middle of your expression from (.+?) to (.*?). That makes it a non-greedy match on any character (including no characters), instead of a non-greedy match on at least one character.
preg_match_all("/([a-z0-9_]+)\s*=\s*[\"\'](.*?)[\"\']/is",$var_string,$matches);
The other answers here are right that you need to change the middle of the expression, but I would change it to [^\"\']*, which means "any character that is not a quote, zero or more times". This keeps the match from running past the closing quote and still allows an empty "".
Your expression becomes
"/([a-z0-9_]+)\s*=\s*[\"\']([^\"\']*)[\"\']/is"
Note that you can replace [a-z0-9_] with \w, which already covers letters of both cases, digits, and the underscore.
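Putting it together, a parser for such a string might look like the sketch below. parseAttributes() is a hypothetical helper, and the backreference \2 (which requires the closing quote to be the same character as the opening one) is a variation on the patterns above rather than something from the answers. It assumes a value never contains its own delimiting quote:

```php
<?php
/**
 * Hypothetical helper: parse name="value" pairs into an associative array.
 * Assumes values never contain the quote character that delimits them.
 */
function parseAttributes(string $input): array
{
    // Group 2 captures the opening quote; \2 requires the same quote to close the value.
    $pattern = '/([a-z0-9_]+)\s*=\s*(["\'])(.*?)\2/is';
    preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

    $attributes = [];
    foreach ($matches as $match) {
        $attributes[$match[1]] = $match[3];
    }
    return $attributes;
}

$parsed = parseAttributes('option_alpha="value" option_beta="" option_gamma="X"');
print_r($parsed);
```

The empty option_beta value comes back as an empty string instead of being skipped.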
