Regular expressions in preg_match pattern not matching string - php

I've read the regex introduction in the PHP manual, but I'm confused about how to structure the pattern for preg_match. I am checking that the username on the login form consists only of lowercase letters and is between 2 and 5 characters long.
Pattern 1: Initially, I used a character class followed by a repetition quantifier:
if (preg_match("[a-z]{2,5}",$_POST['ULusername'])) {
$formmessage = 'Hello, ' . $_POST['ULusername'];
} else {
$formmessage = 'Enter username.';
}
The output was always "Enter username."
Pattern 2: I then thought perhaps I needed delimiters:
if (preg_match("/[a-z]{2,5}/",$_POST['ULusername'])) {
$formmessage = 'Hello, ' . $_POST['ULusername'];
} else {
$formmessage = 'Enter username.';
}
But the output was still always "Enter username."
Pattern 3: Finally, I tried delimiters with the begin/end anchors:
if (preg_match("#^([a-z]{2,5})$#",$_POST['ULusername'])) {
$formmessage = 'Hello, ' . $_POST['ULusername'];
} else {
$formmessage = 'Enter username.';
}
This gave me the desired output.
Why does the third pattern work, but not the first two?

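The first one fails because it has no proper delimiters. PHP accepts matching brackets as delimiters, so it reads [ and ] as the delimiters, a-z as the pattern and {2,5} as modifiers. That triggers an "Unknown modifier '{'" warning, and preg_match() returns false.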
The second one has a logic problem: /[a-z]{2,5}/ only checks for two to five consecutive lowercase letters somewhere in the string. Nothing in it limits the length of the whole input. Try it with ABcdEF and you'll see what's going on: it matches, because of cd.
In the third one, you first grouped [a-z]{2,5} using () and then checked that the whole string, from start to end, consists of that group. In my tests, though, the grouping doesn't affect the logic: try it without () and you'll get the same result, because anchoring the group ([a-z]{2,5}) at both ends is the same as #^[a-z]{2,5}$#. The parentheses only add a capture group.
For more information, you can refer to these tutorials about regular expressions:
http://www.rexegg.com/regex-quickstart.html
https://www.regular-expressions.info/refcapture.html

The first pattern returns false, an indication that an error occurred (here, the pattern is not properly delimited).
The second and third patterns are valid regex patterns but they do not match the same set of strings. Using "/[a-z]{2,5}/" you'd have a match whenever $_POST['ULusername'] contains at least two consecutive lowercase characters. However, it does not care if the length of the whole string is greater than 5.
The last pattern has both delimiters and start and end anchors, so only lowercase strings of length 2 to 5 will match.
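As a minimal sketch, the anchored pattern can be wrapped in a helper and compared with the unanchored one. The helper name isValidUsername() is hypothetical, and the D modifier is an addition not used in the question: without it, $ also matches just before a trailing newline, so "abc\n" would pass.

```php
<?php
// Hypothetical helper: true only for 2-5 lowercase ASCII letters.
function isValidUsername(string $name): bool
{
    // ^ and $ anchor the whole string; D (an addition) stops $ from
    // also matching just before a trailing newline.
    return preg_match('/^[a-z]{2,5}$/D', $name) === 1;
}

// Unanchored pattern 2: matches because "cd" appears somewhere.
var_dump(preg_match('/[a-z]{2,5}/', 'ABcdEF')); // int(1)

var_dump(isValidUsername('ABcdEF')); // bool(false)
var_dump(isValidUsername('abc'));    // bool(true)
var_dump(isValidUsername("abc\n"));  // bool(false), would be true without D
```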

Related

Finding presence of chars or strings out of the allowed ones

Well, I'm stuck: I can't find the correct regex to pass to PHP's preg_match().
I have two strings, say "mdo" and "o", but they could be completely arbitrary.
I have a dictionary of allowed chars and strings.
For the example, allowed chars are "a-gm0-9", and allowed strings are "do" and "si".
THE GOAL
I'm trying to check, case-insensitively, that the input string contains no chars or strings other than those in the dictionary.
So "mdo" should not match, because m is an allowed char and do is an allowed string. "o" should match, though: o is not an allowed char, and it is not part of the complete allowed string do.
Where I'm stuck
Negating each part on its own is fine ([^a-gm0-9] and (?!do|si)), but I can't manage to combine them into a single regex for the following PHP code:
<?php
$inputStr = 'mdo';
$regex = '/?????/i'; // the question subject
// if disallowed chars/strings are found...
if( preg_match($regex, $inputStr) == 1 )
return false; // the $inputStr is not valid
return true;
?>
Two cascading preg_match() calls would break the logic, so they don't work here.
How can I combine the char check and the string check with AND logic in a single regex? Their positions don't matter.
You can use this pattern:
return (bool) preg_match('~^(?:do|si|[a-gm0-9])*+\C~i', $inputStr);
The idea is to match all allowed chars and substrings from the start in a repeated group with a possessive quantifier and to check if a single byte \C remains. Since the quantifier is greedy and possessive, the single byte after, if found, can't be allowed.
Note that most of the time it is simpler to negate the result of preg_match(), for example:
return (bool) !preg_match('~^(?:do|si|[a-gm0-9])*$~iD', $inputStr);
(or with a + quantifier, if you don't want to allow empty strings)
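A minimal sketch of both approaches wrapped in hypothetical helpers, both returning true when something disallowed is present. One deliberate swap: instead of \C, the first helper uses . with the s modifier, which (without /u) also matches exactly one byte and avoids relying on \C, which some PCRE2 builds can lock out.

```php
<?php
// Dictionary from the question: chars a-g, m, 0-9; strings "do", "si".

// Possessive approach: consume allowed tokens from the start without
// backtracking, then check whether any character is left over.
// (. with the s modifier stands in for \C here.)
function hasDisallowedPossessive(string $input): bool
{
    return preg_match('~^(?:do|si|[a-gm0-9])*+.~is', $input) === 1;
}

// Negated approach: the whole string must consist of allowed tokens.
function hasDisallowedNegated(string $input): bool
{
    return !preg_match('~^(?:do|si|[a-gm0-9])*$~iD', $input);
}

foreach (['mdo', 'o', 'SIde', 'mdx'] as $s) {
    printf(
        "%s: possessive=%s negated=%s\n",
        $s,
        var_export(hasDisallowedPossessive($s), true),
        var_export(hasDisallowedNegated($s), true)
    );
}
```

Listing do before the single-char class matters for the possessive version: once the group has consumed a token it never gives it back, so the longer allowed string has to be tried first.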

Php Sanitize and Validate form with some character exceptions

I'm using PHP's sanitize and validate filters, but I'm having trouble adding some rules. I only have basic PHP knowledge, so I think this question is easy for you.
if ($_POST['ccp_n'] != "") {
$ccp = filter_var($_POST['ccp_n'], FILTER_SANITIZE_NUMBER_INT);
if (!filter_var($ccp, FILTER_VALIDATE_INT)) {
$errors .= 'Insert a valid code.<br/>';
}
} else {
$errors .= 'Insert a code.<br/>';
}
I need to enforce a minimum and maximum length (14-15 characters), and I want to accept these characters: - or space. The exact format is 0000-0000-0000 (the last group can also have 5 digits).
Thanks
You can use preg_match and apply a regular expression.
preg_match(string $pattern, string $subject) (see the PHP manual for details)
The pattern is the problem. You need to define in detail what is allowed.
For example, the pattern:
'~^\d{4}-\d{4}-\d{4,5}$~D'
matches the whole string, from the start (^) to the end ($): 4 digits, a hyphen, 4 digits, a hyphen, then 4 or 5 digits.
Update:
I added the D modifier at the end; otherwise $ matches not only at the end of the string but also before a newline that is the last character of the string. See the PHP manual's page on pattern modifiers for details.
Use a regular expression with preg_match(). Alternatively, you can also use sscanf() to parse the input from a string according to a format.
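A sketch of how the answer's pattern could slot into the original form check. The helper name isValidCcp() is hypothetical, accepting a space as the separator is an assumption taken from the question (the answer's pattern allows only -), and a hard-coded string stands in for $_POST['ccp_n'].

```php
<?php
// Hypothetical helper: 4 digits, separator, 4 digits, separator, 4-5 digits.
// [- ] (hyphen or space) is an assumption from the question; use - alone
// to match the answer's pattern exactly.
function isValidCcp(string $code): bool
{
    // 4+1+4+1+{4,5} gives the required 14-15 characters in total.
    return preg_match('~^\d{4}[- ]\d{4}[- ]\d{4,5}$~D', $code) === 1;
}

$errors = '';
$input = trim('1234-5678-90123'); // stands in for $_POST['ccp_n']

if ($input === '') {
    $errors .= 'Insert a code.<br/>';
} elseif (!isValidCcp($input)) {
    $errors .= 'Insert a valid code.<br/>';
}

echo $errors === '' ? "Code accepted\n" : $errors;
```

The regex replaces the FILTER_SANITIZE_NUMBER_INT / FILTER_VALIDATE_INT pair here, since a value like 0000-0000-0000 is not an integer and would never pass FILTER_VALIDATE_INT.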

PHP Reverse Preg_match [duplicate]

This question already has answers here:
Regular expression to match a line that doesn't contain a word
if(preg_match("/" . $filter . "/i", $node)) {
echo $node;
}
This code filters a variable to decide whether to display it or not. An example entry for $filter would be "office" or "164(.*)976".
Is there a simple way to say "if $filter does not match in $node", expressed as a regular expression?
So not if(!preg_match(...)), but something like $filter = "!office" or "!164(.*)976" that actually works?
This can be done if you definitely want to use a "negative regex" instead of simply inverting the result of the positive regex:
if(preg_match("/^(?:(?!" . $filter . ").)*$/i", $node)) {
echo $node;
}
will match a string if it doesn't contain the regex/substring in $filter.
Explanation: (taking office as our example string)
^ # Anchor the match at the start of the string
(?: # Try to match the following:
(?! # (unless it's possible to match
office # the text "office" at this point)
) # (end of negative lookahead),
. # Any character
)* # zero or more times
$ # until the end of the string
The (?!...) negative assertion is what you're looking for.
To exclude a certain string from appearing anywhere in the subject you can use this double assertion method:
preg_match('/(?=^((?!not_this).)+$) (......)/xs', $string);
It still lets you specify an arbitrary main regex in place of (......), but you can leave that out if you only want to forbid a string.
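A small sketch wrapping the first answer's pattern in a hypothetical helper. It assumes $filter is a valid regex fragment that contains no unescaped / delimiter; the s and D modifiers are additions so that nodes containing newlines are scanned in full and $ means the true end of the string.

```php
<?php
// Hypothetical helper: true when $node does NOT contain a match for $filter.
// Assumes $filter is a valid regex fragment with no unescaped '/'.
function notMatching(string $filter, string $node): bool
{
    // Tempered pattern from the answer: at every position, refuse to step
    // forward if $filter matches there. s lets . cross newlines; D keeps $
    // at the very end of the string.
    $pattern = '/^(?:(?!' . $filter . ').)*$/isD';
    return preg_match($pattern, $node) === 1;
}

$nodes = ['home', 'my Office', '164-555-976'];
foreach ($nodes as $node) {
    if (notMatching('office', $node)) {
        echo $node, "\n"; // prints "home" and "164-555-976"
    }
}
```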
Answer number 2 by mario is the correct answer, and here is why:
First, to answer Justin Morgan's comment:
I'm curious, do you have any idea what the performance of this would
be as opposed to the !preg_match() approach? I'm not in a place where
I can test them both. – Justin Morgan Apr 19 '11 at 21:53
Consider the gate logic for a moment.
When to negate preg_match(): when looking for a match and you want the condition to be 1) true for the absence of the desired regex, or 2) false for the regex being present.
When to use a negative assertion on the regex: when looking for a match and you want the condition to be true if the string ONLY matches the regex, and to fail if anything else is found. This is necessary if you really need to test for undesirable characters while allowing omission of permitted characters.
Negating the result of (preg_match() === 1) only tests if the regex is present. If 'bar' is required, and numbers aren't allowed, the following won't work:
if (preg_match('/bar/', 'foo2bar') === 1) {
echo "found 'bar'"; // but a number is here, so fail.
}
if (preg_match('/[0-9]/', 'foobar') === 0) {
echo "no numbers found"; // but didn't test for 'bar', so fail.
}
So, to really test multiple regexes, a beginner would chain multiple preg_match() calls... we know this is a very amateur way to do it.
So, the OP wants to test a string for possible regexes, but the conditional may only pass as true if the string contains at least one of them. For most simple cases, simply negating preg_match() will suffice, but for more complex or extensive regex patterns it won't. I will use my own situation as a more real-life scenario:
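For the 'bar' example, here is a sketch (not from this answer) of how both conditions can live in one pattern: a positive lookahead requires bar, and a negative lookahead rejects any digit. The helper name hasBarNoDigits() is hypothetical.

```php
<?php
// Hypothetical helper: must contain "bar" AND must contain no digit.
function hasBarNoDigits(string $s): bool
{
    // (?=.*bar) requires "bar" somewhere; (?!.*\d) fails if any digit
    // appears. s lets .* cross newlines.
    return preg_match('/^(?=.*bar)(?!.*\d)/s', $s) === 1;
}

var_dump(hasBarNoDigits('foo2bar')); // bool(false)
var_dump(hasBarNoDigits('foobar'));  // bool(true)
var_dump(hasBarNoDigits('foo'));     // bool(false)
```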
Say you want a user form for a person's name, particularly a last name. You want your system to accept all letters regardless of case and placement, accept hyphens, accept apostrophes, and reject all other characters. Matching a regex for all undesired characters is the first thing we think of, but imagine you are supporting UTF-8... that's a lot of characters! Your pattern would be nearly as big as the UTF-8 table, all on a single line! Whatever hardware you have, your server application has a finite limit on how long a command can be, not to mention PCRE's own limits on pattern size, so listing the ENTIRE UTF-8 character table (minus [A-Z], [a-z], - and ') is far too long, never mind that the program itself would be HUGE!
Since we won't use something like if (!preg_match('.#\\$\%... (which would be extremely long and impractical to evaluate) on a string to see whether the string is bad, we should instead test the easier way: use a negative lookahead in the regex to find any disallowed character, then negate the overall result:
<?php
$string = "O'Reilly-Finlay";
if (preg_match('/(?![a-z\'-])./is', $string) === 0) {
echo "the given string matched exclusively for regex pattern";
// on error preg_match() returns false, which is not identical to 0 (we test identity, not equality), so an error falls through to the else branch
} else {
echo "the given string did not match exclusively to the regex pattern";
}
?>
If we only looked for the regex [a-z\'-] (with /i), all we'd be saying is "match the string if it contains ANY of these things", so bad characters aren't tested. If we negated at the function, we'd be saying "return false if we find a match for any of these things". That isn't right either, so we need to say "return false if we match ANYTHING not in the allowed set", which is what the lookahead does. I know some readers are now thinking of wildcard expansion... no, lookahead doesn't do that. At each position, the negative lookahead only asserts that the next character is not in the allowed set, and a following . consumes that character. preg_match() scans the string from left to right and stops at the first such character, returning 1; if it reaches the end without finding one, it returns 0. In short, asserting negative on regex 'a' is the opposite of matching regex 'b', where 'b' contains EVERYTHING ELSE not matchable by 'a'. Great for when 'b' would be ungodly extensive. (A negated character class such as [^a-z'-] expresses the same "everything else" compactly.)
Note: if my regex has an error in it, I apologize... I have been using Lua for the last few months, so I may be mixing up my regex rules. Otherwise, (?!...) is the proper lookahead syntax for PHP.
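A minimal sketch of the last-name check wrapped in hypothetical helpers. The second helper goes beyond the answer: for UTF-8 input, \p{L} with the u modifier matches any Unicode letter, so a negated class covers "everything else" without listing it. Rejecting the empty string is also an added policy.

```php
<?php
// Hypothetical helper: ASCII letters, apostrophes and hyphens only.
function isValidLastName(string $name): bool
{
    // (?!...) asserts the next character is outside the allowed set and
    // . (with s) consumes it, so any match means a bad character exists.
    return $name !== '' && preg_match('/(?![a-z\'-])./is', $name) === 0;
}

// Hypothetical UTF-8 variant: any Unicode letter, apostrophe or hyphen.
// With u, invalid UTF-8 makes preg_match() return false, which is rejected too.
function isValidLastNameUtf8(string $name): bool
{
    return $name !== '' && preg_match('/[^\p{L}\'-]/u', $name) === 0;
}

var_dump(isValidLastName("O'Reilly-Finlay"));         // bool(true)
var_dump(isValidLastName('Müller'));                  // bool(false)
var_dump(isValidLastNameUtf8('Müller-Lüdenscheidt')); // bool(true)
```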

How to check if a string is in an array?

I basically need a function to check whether each character of a string is in an array.
My code doesn't work yet, but here it is anyway:
$allowedChars = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"," ","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"," ","0","1","2","3","4","5","6","7","8","9"," ","#",".","-","_","+"," ");
$input = "Test";
$input = str_split($input);
if (in_array($input,$allowedChars)) {echo "Yep, found.";}else {echo "Sigh, not found...";}
I want it to say 'Yep, found.' if one of the letters in $input is found in $allowedChars. Simple enough, right? Well, that doesn't work, and I haven't found a function that will search a string's individual characters for a value in an array.
By the way, I want to use only that array's values. I'm not looking for anything fancy like html_strip_entities or whatever it is; I want to use that exact array for the allowed characters.
You really should look into regex and the preg_match function: http://php.net/manual/en/function.preg-match.php
But, this should make your specific request work:
$allowedChars = array("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"," ","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"," ","0","1","2","3","4","5","6","7","8","9"," ","#",".","-","_","+"," ");
$input = "Test";
$input = str_split($input);
$message = "Sigh, not found...";
foreach($input as $letter) {
if (in_array($letter, $allowedChars)) {
$message = "Yep, found.";
break;
}
}
echo $message;
Are you familiar with regular expressions at all? It's sort of the more accepted way of doing what you're trying to do, unless I'm missing something here.
Take a look at preg_match(): http://php.net/manual/en/function.preg-match.php
To address your example, here's some sample code (UPDATED TO ADDRESS ISSUES IN COMMENTS):
$subject = "Hello, this is a string";
$pattern = '/^[a-zA-Z0-9 #._+-]*$/'; // include all the symbols you want to match here
if (preg_match($pattern, $subject))
echo "Yep, matches";
else
echo "Doesn't match :(";
A little explanation of the regex: the '^' matches the beginning of the string, the '[a-zA-Z0-9 #._+-]' part means "any character in this set", the '*' after it means "zero or more of the last thing", and finally the '$' at the end matches the end of the string.
A somewhat different approach:
$allowedChars = array("a","b","c","d","e");
$char_buff = str_split("Test"); // explode() cannot split on an empty delimiter
$foundTheseOnes = array_intersect($char_buff, $allowedChars);
if(!empty($foundTheseOnes)) {
echo 'Yep, something was found. Let\'s find out what: <br />';
print_r($foundTheseOnes);
}
Validating the characters in a string is most appropriately done with string functions; preg_match() is the most direct/elegant method for this task.
Code:
$input="Test Test Test Test";
if(preg_match('/^[\w +.#_-]*$/',$input)){
echo "Input string does not contain any disallowed characters";
}else{
echo "Input contains one or more disallowed characters";
}
// output: Input string does not contain any disallowed characters
Pattern Explanation:
/ # start pattern
^ # start matching from start of string
[\w +.#_-] # match: a-z, A-Z, 0-9, underscore, space, plus, dot, hash, hyphen
* # zero or more occurrences
$ # match until end of string
/ # end pattern
Significant points:
The ^ and $ anchors are crucial to ensure that the entire string is validated versus just a substring of the string.
The \w (a.k.a. "any word character" -> a shorthand character class) is the easy way to write: [a-zA-Z0-9_]
The . dot character loses its "match anything (almost)" meaning and becomes literal when it is written inside of a character class. No escaping slash is necessary.
The hyphen inside a character class can be written without an escaping backslash (\-) as long as it is positioned at the start or end of the character class. If the hyphen is not at the start or end and is not escaped, it creates a range of characters between the characters on either side of it. Like it or not, [.-z] will not match a hyphen, because the hyphen does not lie "between" the dot and the lowercase letter z in the ASCII table.
The * that follows the character class is the "quantifier". The asterisk means "0 or more" of the preceding character class. In this case, this means that preg_match() will allow an empty string. If you want to deny an empty string, you can use + which means "1 or more" of the preceding character class. Finally, you can be far more specific about string length by using a number or numbers in a curly bracketed expression.
{8} would mean the string must be exactly 8 characters long.
{4,} would mean the string must be at least 4 characters long.
{0,10} would mean the string length must be between 0 and 10 (many PCRE versions do not accept {,10} as a quantifier and treat it as literal text).
{5,9} would mean the string length must be between 5 and 9 characters.
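To see these quantifiers in action, here is a small sketch that splices each one into the anchored class from this answer. lengthOk() is a hypothetical helper, the D modifier is an addition so that a trailing newline is not ignored, and {0,10} is used because not every PCRE version accepts the {,10} form.

```php
<?php
// Hypothetical helper: splice a length quantifier into the anchored class
// from this answer. D (an addition) keeps $ from accepting a trailing "\n".
function lengthOk(string $quantifier, string $input): bool
{
    return preg_match('/^[\w +.#-]' . $quantifier . '$/D', $input) === 1;
}

var_dump(lengthOk('{8}', 'abcdefgh'));     // bool(true)
var_dump(lengthOk('{4,}', 'abc'));         // bool(false)
var_dump(lengthOk('{0,10}', ''));          // bool(true)
var_dump(lengthOk('{5,9}', 'abcdefghij')); // bool(false)
```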
All of that advice aside, if you absolutely must use your array of characters AND you wanted to use a loop to check individual characters against your validation array (and I certainly don't recommend it), then the goal should be to reduce the number of array elements involved so as to reduce total iterations.
Your $allowedChars array has multiple elements that contain the space character, but only one is necessary. You should prepare the array using array_unique() or a similar technique.
str_split($input) may generate an array with duplicate elements. For example, if $input="Test Test Test Test"; then the resulting array from str_split() will have 19 elements, 14 of which will require redundant validation checks.
You could probably eliminate redundancies from str_split() by calling count_chars($input,3) and feeding that to str_split() or alternatively you could call str_split() then array_unique() before performing the iterative process.
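A sketch of the loop-based check described above (an illustration, not the answer's own code). The helper name allCharsAllowed() is hypothetical, and array_flip() with isset() is swapped in for in_array() as a faster lookup.

```php
<?php
// Hypothetical helper: true if every character of $input is in $allowedChars.
function allCharsAllowed(string $input, array $allowedChars): bool
{
    if ($input === '') {
        return true; // nothing to reject (and avoids str_split('') quirks)
    }

    // array_unique() drops the repeated " " entries; array_flip() makes the
    // characters keys so each check is an isset() lookup.
    $lookup = array_flip(array_unique($allowedChars));

    // count_chars($input, 3) returns each distinct byte once, so
    // "Test Test Test Test" needs only 5 checks instead of 19.
    foreach (str_split(count_chars($input, 3)) as $char) {
        if (!isset($lookup[$char])) {
            return false;
        }
    }
    return true;
}

$allowedChars = array_merge(range('a', 'z'), range('A', 'Z'), str_split('0123456789 #.-_+'));
var_dump(allCharsAllowed('Test Test Test Test', $allowedChars)); // bool(true)
var_dump(allCharsAllowed('Test!', $allowedChars));               // bool(false)
```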
Because you're just validating a string, see preg_match() and other PCRE functions for handling this instead.
Alternatively, you can use strcspn() to do...
$check = "abcde.... "; // fill in the rest of the characters
$test = "Test";
echo ((strcspn($test, $check) === strlen($test)) ? "Sigh, not found..." : 'Yep, found.');
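A sketch that keeps the asker's array, as requested, and turns it into the character mask that strcspn() and strspn() expect. The shortened array is illustrative only. strspn() is added as the counterpart for the "every character must be allowed" reading of the question.

```php
<?php
// Illustrative subset of the question's array; the full array works the same.
$allowedChars = ["a", "b", "c", "d", "e", " ", " "];
$input = 'Test';

// implode() joins the array into the mask string both functions expect;
// array_unique() removes duplicates, which the mask does not need.
$mask = implode('', array_unique($allowedChars));

// strcspn(): length of the initial run of characters NOT in $mask.
// If that run is the whole string, no allowed character occurs at all.
$anyFound = strcspn($input, $mask) !== strlen($input);

// strspn(): length of the initial run of characters that ARE in $mask.
// If that run is the whole string, every character is allowed.
$allAllowed = strspn($input, $mask) === strlen($input);

echo $anyFound ? 'Yep, found.' : 'Sigh, not found...', "\n"; // Yep, found.
var_dump($allAllowed); // bool(false), since "T", "s" and "t" are not in the mask
```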

A solid nickname regexp

I want a regular expression to validate a nickname: 6 to 36 characters, it should contain at least one letter. Other allowed characters: 0-9 and underscores.
This is what I have now:
if(!preg_match('/^.*(?=\d{0,})(?=[a-zA-Z]{1,})(?=[a-zA-Z0-9_]{6,36}).*$/i', $value)){
echo 'bad';
}
else{
echo 'good';
}
This seems to work, but when I validate these strings, for example:
11111111111a > is not valid, but it should be
aaaaaaa!aaaa > is valid, but it shouldn't
Any ideas to make this regexp better?
I would actually split your task into two regex:
to find out whether it's a valid word: /^\w{6,36}$/i
to find out whether it contains a letter /[a-z]/i
I think it's much simpler this way.
Try this:
'/^(?=.*[a-z])\w{6,36}$/i'
Here are some of the problems with your original regex:
/^.*(?=\d{0,})(?=[a-zA-Z]{1,})(?=[a-zA-Z0-9_]{6,36}).*$/i
(?=\d{0,}): What is this for??? This is always true and doesn't do anything!
(?=[a-zA-Z]{1,}): You don't need the {1,} part, you just need to find one letter, and the i flag also lets you omit A-Z
/^.*: You're matching these outside of the lookaround; it should be inside
(?=[a-zA-Z0-9_]{6,36}).*$: this means that as long as there are between 6-36 \w characters, everything else in the rest of the string matches! The string can be 100 characters long mostly containing illegal characters and it will still match!
You can do it easily using two calls to preg_match as:
if( preg_match('/^[a-z0-9_]{6,36}$/i',$input) && preg_match('/[a-z]/i',$input)) {
// good
} else {
// bad
}
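A sketch putting the single-regex answer and the two-call answer side by side, checked against the strings from the question. Both helper names are hypothetical, and the D modifier is an addition so that a trailing newline cannot slip past $.

```php
<?php
// Hypothetical helpers for the nickname rule: 6-36 chars from [A-Za-z0-9_],
// with at least one letter. D (an addition) stops $ from accepting a
// trailing newline.
function nicknameOneRegex(string $nick): bool
{
    // (?=.*[a-z]) requires a letter somewhere; \w{6,36} then checks
    // the allowed characters and the length of the whole string.
    return preg_match('/^(?=.*[a-z])\w{6,36}$/iD', $nick) === 1;
}

function nicknameTwoCalls(string $nick): bool
{
    return preg_match('/^[a-z0-9_]{6,36}$/iD', $nick) === 1
        && preg_match('/[a-z]/i', $nick) === 1;
}

foreach (['11111111111a', 'aaaaaaa!aaaa', '111111', "abcdef\n"] as $nick) {
    printf(
        "%s => %s / %s\n",
        json_encode($nick),
        var_export(nicknameOneRegex($nick), true),
        var_export(nicknameTwoCalls($nick), true)
    );
}
```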
