Match multiple characters without repetition in a regular expression - php

I'm using PHP's PCRE, and there is one bit of the regex I can't seem to do. I have a character class with 5 characters [adjxz] which can appear or not, in any order, after a token (|) on the string. They all can appear, but they can only each appear once. So for example:
*|ad - is valid
*|dxa - is valid
*|da - is valid
*|a - is valid
*|aaj - is *not* valid
*|adjxz - is valid
*|addjxz - is *not* valid
Any idea how I can do it? A simple [adjxz]+, or even [adjxz]{1,5}, does not work because both allow repetition. Since the order does not matter either, I can't use /a?d?j?x?z?/, so I'm at a loss.

Perhaps using a lookahead combined with a backreference like this:
\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}
If you know these characters are followed by something else, e.g. whitespace, you can simplify this to:
\|(?!\S*(\S)\S*\1)[adjxz]{1,5}
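A minimal sketch of how the first pattern might be used from PHP. The helper name hasUniqueFlags, the leading \* and the ^ / $ anchors are assumptions based on the sample strings in the question, not part of the answer above.

```php
<?php
// Hypothetical helper: true when the flags after "*|" all come from
// [adjxz] and no flag occurs twice.
function hasUniqueFlags(string $s): bool
{
    // The negative lookahead rejects any flag that shows up again later;
    // [adjxz]{1,5} then consumes the flags themselves.
    $re = '/^\*\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}$/D';
    return preg_match($re, $s) === 1;
}

var_dump(hasUniqueFlags('*|dxa')); // bool(true)
var_dump(hasUniqueFlags('*|aaj')); // bool(false)
```

With five distinct flags allowed, {1,5} is already implied by the no-repeat rule, but keeping it makes the intent easier to read.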

I think you should break this into two steps:
A regex to check for unexpected characters
A simple PHP check for duplicated characters
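function strIsValid($str) {
    // Step 1: only characters from [adjxz] may follow the literal "*|".
    // The | must be escaped, otherwise it acts as alternation.
    if (!preg_match('/^\*\|([adjxz]+)$/', $str, $matches)) {
        return false;
    }
    // Step 2: no character may occur twice.
    return strlen($matches[1]) === count(array_unique(str_split($matches[1])));
}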

I suggest using reverse logic where you match the unwanted case using this pattern
\|.*?([adjxz])(?=.*\1)
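A rough sketch of the reverse-logic idea in PHP. The helper name hasRepeatedFlag is hypothetical. Note that this pattern only detects a repeated flag; it does not check that every character after the | belongs to [adjxz].

```php
<?php
// Hypothetical helper: true when some flag after "|" occurs more than once.
function hasRepeatedFlag(string $s): bool
{
    // Capture one flag, then look ahead for the same flag later on.
    return preg_match('/\|.*?([adjxz])(?=.*\1)/', $s) === 1;
}

var_dump(hasRepeatedFlag('*|aaj')); // bool(true)
var_dump(hasRepeatedFlag('*|dxa')); // bool(false)
```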

Related

Finding presence of chars or strings out of the allowed ones

Well, I'm stuck, I cannot find the correct form for the RegEx to provide to the PHP preg_match.
I have two strings. Say "mdo" and "o", but they could be really random.
I have a dictionary of allowed chars and strings.
For the example, allowed chars are "a-gm0-9", and allowed strings are "do" and "si".
THE GOAL
I'm trying to check that the input string doesn't contain any char or string but those in the dictionary, case-insensitive.
So "mdo" should not be flagged, because m is an allowed char and do is an allowed string. "o" should be flagged: o is not an allowed char, and the input does not contain the whole allowed string do.
My struggling reason
It's ok to negate [^a-gm0-9] and (?!do|si), but what I cannot achieve is to place them inside a single regex in order to apply the following PHP code:
<?php
$inputStr = 'mdo';
$regex = '/?????/i'; // the question subject
// if disallowed chars/strings are found...
if( preg_match($regex, $inputStr) == 1 )
return false; // the $inputStr is not valid
return true;
?>
Because two cascading preg_matches would break the logic and don't work.
How to mix chars check and groups check in "AND" in a single regex? Their positions don't matter.
You can use this pattern:
return (bool) preg_match('~^(?:do|si|[a-gm0-9])*+\C~i', $inputStr);
The idea is to match all allowed chars and substrings from the start in a repeated group with a possessive quantifier and to check if a single byte \C remains. Since the quantifier is greedy and possessive, the single byte after, if found, can't be allowed.
Note that most of the time it is simpler to negate the result of preg_match, for example:
return (bool) !preg_match('~^(?:do|si|[a-gm0-9])*$~iD', $inputStr);
(or with a + quantifier, if you don't want to allow empty strings)
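As a small sketch, the negated form can be wrapped in a helper and tried against the two sample inputs from the question. The name isAllowed is hypothetical, and unlike the snippets above it returns true for a valid input rather than for a disallowed one.

```php
<?php
// Hypothetical wrapper: true when $input consists only of the allowed
// strings "do" / "si" and the allowed chars a-g, m, 0-9 (case-insensitive).
function isAllowed(string $input): bool
{
    // Every position must be consumed by "do", "si", or one allowed char.
    // D keeps $ from matching before a trailing newline.
    return preg_match('~^(?:do|si|[a-gm0-9])*$~iD', $input) === 1;
}

var_dump(isAllowed('mdo')); // bool(true)
var_dump(isAllowed('o'));   // bool(false)
```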

my regexp does not work for a simple word match

I want to see if the current request is on the localhost or not. For doing this, I use the following regular expression:
return ( preg_match("/(^localhost).*/", $url) == true ||
preg_match("/^({http|ftp|https}://localhost).*/", $url) == true )
? true : false;
And here is the var_dump() of $url:
string 'http://localhost/aone/public/' (length=29)
Which keeps returning false though. What is the problem of this regular expression?
You are currently using the forward slash (/) as the delimiter, but you aren't escaping it inside your pattern string. This will result in an error and will cause your preg_match() statement to not work (if you don't have error reporting enabled).
Also, you are using alternation incorrectly. If you want to match either foo or bar, you'd write (foo|bar), and not {foo|bar}.
The updated preg_match() should look like:
preg_match("/^(http|ftp|https):\/\/localhost.*/", $url)
Or with a different delimiter (so you don't have to escape all the / characters):
preg_match("#^(http|ftp|https)://localhost.*#", $url)
Curly braces have a special meaning in a regex, they are used to quantify the preceding character(s).
So:
/^({http|ftp|https}://localhost).*/
Should probably be something like:
#^((http|ftp|https)://localhost).*#
Edit: changed the delimiters so that the forward slash does not need to be escaped
This
{http|ftp|https}
is wrong.
I suppose you mean
(http|ftp|https)
Also, if you want only group and don't capture, please add ?::
(?:http|ftp|https)
I would change your current code to:
return preg_match("~^(?:(?:https?|ftp)://)?localhost~", $url);
You were using { and } for grouping, but those are used for quantifying and otherwise mean literal { and } characters.
A few things to add:
you can use https? instead of (http|https);
you can use other delimiters for the regex when your pattern contains the delimiter character, which saves you excessive escaping;
you can combine the two regexes, since one part is optional (the (?:https?|ftp):// part), which makes the second comparison unnecessary;
the .* at the end is not required.
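Putting those points together, a minimal sketch might look like the following. The function name isLocalhostUrl is hypothetical. Be aware that the pattern only checks that the host starts with "localhost", so a host such as localhostfoo would also pass.

```php
<?php
// Hypothetical helper wrapping the combined pattern.
function isLocalhostUrl(string $url): bool
{
    // Optional scheme (http, https or ftp), then "localhost" at the start.
    // "~" is the delimiter, so the slashes need no escaping.
    return preg_match('~^(?:(?:https?|ftp)://)?localhost~', $url) === 1;
}

var_dump(isLocalhostUrl('http://localhost/aone/public/')); // bool(true)
var_dump(isLocalhostUrl('http://example.com/localhost'));  // bool(false)
```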

PHP Reverse Preg_match [duplicate]

if(preg_match("/" . $filter . "/i", $node)) {
echo $node;
}
This code filters a variable to decide whether to display it or not. An example entry for $filter would be "office" or "164(.*)976".
I would like to know whether there is a simple way to say: if $filter does not match in $node. In the form of a regular expression?
So... not an "if(!preg_match" but more of a $filter = "!office" or "!164(.*)976" but one that works?
This can be done if you definitely want to use a "negative regex" instead of simply inverting the result of the positive regex:
if(preg_match("/^(?:(?!" . $filter . ").)*$/i", $node)) {
echo $node;
}
will match a string if it doesn't contain the regex/substring in $filter.
Explanation: (taking office as our example string)
^ # Anchor the match at the start of the string
(?: # Try to match the following:
(?! # (unless it's possible to match
office # the text "office" at this point)
) # (end of negative lookahead),
. # Any character
)* # zero or more times
$ # until the end of the string
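The same construction as a small reusable sketch. The helper name doesNotContain is hypothetical, $filter is assumed to be a trusted regex fragment that contains no unescaped "/", and the s modifier is an addition here so that . also steps over newlines.

```php
<?php
// Hypothetical helper: true when $node does NOT contain a match for $filter.
function doesNotContain(string $filter, string $node): bool
{
    // At every position, refuse to advance if $filter could match there.
    return preg_match('/^(?:(?!' . $filter . ').)*$/is', $node) === 1;
}

var_dump(doesNotContain('office', 'Head Office')); // bool(false)
var_dump(doesNotContain('office', 'warehouse'));   // bool(true)
```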
The (?!...) negative assertion is what you're looking for.
To exclude a certain string from appearing anywhere in the subject you can use this double assertion method:
preg_match('/(?=^((?!not_this).)+$) (......)/xs', $string);
It still lets you specify an arbitrary main regex (......). But you can just leave that out if you only want to forbid a string.
Answer number 2 by mario is the correct answer, and here is why:
First to answer the comment by Justin Morgan,
I'm curious, do you have any idea what the performance of this would
be as opposed to the !preg_match() approach? I'm not in a place where
I can test them both. – Justin Morgan Apr 19 '11 at 21:53
Consider the gate logic for a moment.
When to negate preg_match(): when looking for a match and you want the condition to be 1)true for the absence of the desired regex, or 2)false for the regex being present.
When to use negative assertion on the regex: when looking for a match and you want the condition to be true if the string ONLY matches the regex, and fail if anything else is found. This is necessary if you really need to test for undesirable characters while allowing omission of permitted characters.
Negating the result of (preg_match() === 1) only tests if the regex is present. If 'bar' is required, and numbers aren't allowed, the following won't work:
if (preg_match('/bar/', 'foo2bar') === 1) {
    echo "found 'bar'"; // but a number is here, so fail.
}
if (preg_match('/[0-9]/', 'foobar') === 0) {
    echo "no numbers found"; // but didn't test for 'bar', so fail.
}
So, in order to really test multiple regexes, a beginner would test using multiple preg_match() calls... we know this is a very amateur way to do it.
So, the Op wants to test a string for possible regexes, but the conditional may only pass as true if the string contains at least one of them. For most simple cases, simply negating preg_match() will suffice, but for more complex or extensive regex patterns, it won't. I will use my situation for a more real-life scenario:
Say you want to have a user form for a person's name, particularly a last name. You want your system to accept all letters regardless of case and placement, accept hyphens, accept apostrophes, and exclude all other characters. We know that matching a regex for all undesired characters is the first thing we think of, but imagine you are supporting UTF-8... that's a lot of characters! Your program will be nearly as big as the UTF-8 table just on a single line! I don't care what hardware you have, your server application has a finite limit on how long a command can be, not to mention the limit of 200 parenthesized subpatterns, so the ENTIRE UTF-8 character table (minus [A-Z], [a-z], -, and ') is too long, never mind that the program itself will be HUGE!
Since we won't use an if (!preg_match('.#\\$\%... (this can be quite long and impossible to evaluate) on a string to see if the string is bad, we should instead test the easier way, with a negative lookahead assertion in the regex, then negate the overall result using:
<?php
$string = "O'Reilly-Finlay";
if (preg_match('/(?![a-z\'-])./is', $string) === 0) {
echo "the given string matched exclusively for regex pattern";
// not reached on a regex error: preg_match then returns false, which is not identical to 0 (we tested for identity, not equality)
} else {
echo "the given string did not match exclusively to the regex pattern";
}
?>
If we only looked for the regex /[a-z\'-]/i, all we say is "match the string if it contains ANY of those things", so bad characters aren't tested. If we negated at the function, we say "return false if we find a match that contains any of these things". This isn't right either, so we need to say "return false if we match ANYTHING not in the regex", which is done with lookahead. I know the bells are going off in someone's head, and they are thinking wildcard expansion style... no, lookahead doesn't do this; it just does the negation at each position and continues. So it checks the first character against the regex; if it matches, it moves on until it finds a non-match or the end. After it finishes, everything that was found not to match the regex is returned in the match array, or preg_match simply returns 1. In short, a negative assertion on regex 'a' is the opposite of matching regex 'b', where 'b' contains EVERYTHING ELSE not matchable by 'a'. Great for when 'b' would be ungodly extensive.
Note: if my regex has an error in it, I apologize... I have been using Lua for the last few months, so I may be mixing my regex rules. Otherwise, (?!...) is the proper negative lookahead syntax for PHP.

Regex to validate username

I'm trying to understand what's wrong with this regex pattern:
'/^[a-z0-9-_\.]*[a-z0-9]+[a-z0-9-_\.]*{4,20}$/i'
What I'm trying to do is to validate the username. Allowed chars are alphanumeric, dash, underscore, and dot. The restriction I'm trying to implement is to have at least one alphanumeric character so the user will not be allowed to have a nickname like this one: _-_.
The function I'm using right now is:
function validate($pattern, $string){
return (bool) preg_match($pattern, $string);
}
Thanks.
EDIT
As @mario said, yes, there is a problem with *{4,20}.
What I tried to do now is to add ( ), but this isn't working as expected:
'/^([a-z0-9-_\.]*[a-z0-9]+[a-z0-9-_\.]*){4,20}$/i'
Now it matches 'aa--aa' but it doesn't match 'aa--' and '--aa'.
Any other suggestions?
EDIT
Maybe someone wants to reject ugly usernames like "_..-a".
This regex rejects consecutive non-alphanumeric chars:
/^(?=.{4,20}$)[a-z0-9]{0,1}([a-z0-9._-][a-z0-9]+)*[a-z0-9._-]{0,1}$/i
In this case _-this-is-me-_ will not match, but _this-is-me_ will match.
Have a nice day and thanks to all :)
Don't try to cram it all into one regex. Make your life simpler and use a two-step approach:
return (bool)
preg_match('/^[a-z0-9_.-]{4,20}$/i', $s) && preg_match('/[a-z0-9]/i', $s);
The mistake in your regex was probably the mix-up of * and {n,m}. You can have only one of those quantifiers, not *{4,20} one after another.
Very well, here is the cumbersome solution to what you want:
preg_match('/^(?=.{4})(?!.{21})[\w.-]*[a-z][\w.-]*$/i', $s)
The assertions assert the length, and the second part ensures that at least one letter is present.
Try this one instead:
'/[a-z0-9-_\.]*[a-z0-9]{1,20}[a-z0-9-_\.]*$/i'
It's probably just a matter of fine-tuning; you could try something like this:
if (preg_match('/^[a-zA-Z0-9]+[_.-]{0,1}[a-zA-Z0-9]+$/m', $subject)) {
# Successful match
} else {
# Match attempt failed
}
Matches:
a_b <- you might not want this.
ysername
Username
1254_2367
fg3123as
Non-Matches:
l__asfg
AHA_ar3f!
sAD_ASF_#"#T_
"#%"&#"E
__-.asd
username
1___
Non-matches you might want to be matches:
1_5_2
this_is_my_name
It is clear to me that you should split this into two checks!
Firstly check that they are using all valid characters. If they're not, then you can tell them that they are using invalid characters.
Then check that they have at least one alpha-numeric character. If they're not, then you can tell them that they must.
Two distinct advantages here: more meaningful feedback to the user and cleaner code to read and maintain.
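A minimal sketch of that two-check approach, returning a distinct message for each failure. The function name usernameError and the message wording are hypothetical; the character set and the 4 to 20 length come from the question.

```php
<?php
// Hypothetical helper: returns an error message, or null when the name is valid.
function usernameError(string $name): ?string
{
    // Check 1: only allowed characters, 4 to 20 of them.
    if (preg_match('/^[a-z0-9_.-]{4,20}$/iD', $name) !== 1) {
        return 'Use 4 to 20 letters, digits, dashes, underscores or dots.';
    }
    // Check 2: at least one alphanumeric character.
    if (preg_match('/[a-z0-9]/i', $name) !== 1) {
        return 'Include at least one letter or digit.';
    }
    return null;
}

var_dump(usernameError('aa--')); // NULL
var_dump(usernameError('_-_.')); // string(37) "Include at least one letter or digit."
```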
Here is a simple, single regex solution (verbose):
$re = '/ # Match password having at least one alphanum.
^ # Anchor to start of string.
(?=.*?[A-Za-z0-9]) # At least one alphanum.
[\w\-.]{4,20} # Match from 4 to 20 valid chars.
\z # Anchor to end of string.
/x';
In Action (short form):
function validate($string){
$re = '/^(?=.*?[A-Za-z0-9])[\w\-.]{4,20}\z/';
return (bool) preg_match($re, $string);
}
Try this:
^[a-zA-Z][-\w.]{0,22}([a-zA-Z\d]|(?<![-.])_)$
From related question: Create one RegEx to validate a username
^[A-Za-z][A-Za-z0-9]*(?=.{3,31}$)[a-z0-9]{0,1}([a-z0-9._-][a-z0-9]+)*[a-z0-9.-_]{0,1}$
This will Validate the username
start with an alpha
accept underscore dash and dots
no spaces allowed
Why don't you make it simpler like this?
^[a-zA-Z][a-zA-Z0-9\._-]{3,9}
The first character must be alphabetical,
followed by the characters or symbols you allow.
The total length must be between 4 and 10 (the first character is matched explicitly, then {3,9} more).

PHP Regular Expression. Check if String contains ONLY letters

In PHP, how do I check if a String contains only letters? I want to write an if statement that will return false if there is (white space, number, symbol) or anything else other than a-z and A-Z.
My string must contain ONLY letters.
I thought I could do it this way, but I'm doing it wrong:
if( ereg("[a-zA-Z]+", $myString))
return true;
else
return false;
How do I find out if myString contains only letters?
Yeah this works fine. Thanks
if(myString.matches("^[a-zA-Z]+$"))
Never heard of ereg, but I'd guess that it will match on substrings.
In that case, you want to include anchors on either end of your regexp so as to force a match on the whole string:
"^[a-zA-Z]+$"
Also, you could simplify your function to read
return ereg("^[a-zA-Z]+$", $myString);
because the if to return true or false from what's already a boolean is redundant.
Alternatively, you could match on any character that's not a letter, and return the complement of the result:
return !ereg("[^a-zA-Z]", $myString);
Note the ^ at the beginning of the character set, which inverts it. Also note that you no longer need the + after it, as a single "bad" character will cause a match.
Finally... this advice is for Java because you have a Java tag on your question. But the $ in $myString makes it look like you're dealing with, maybe Perl or PHP? Some clarification might help.
Your code looks like PHP. It would return true if the string has a letter in it. To make sure the string has only letters you need to use the start and end anchors:
In Java you can make use of the matches method of the String class:
boolean hasOnlyLetters(String str) {
return str.matches("^[a-zA-Z]+$");
}
In PHP the function ereg is deprecated now. You need to use the preg_match as replacement. The PHP equivalent of the above function is:
function hasOnlyLetters($str) {
return preg_match('/^[a-z]+$/i', $str) === 1;
}
I'm going to be different and use Character.isLetter definition of what is a letter.
if (myString.matches("\\p{javaLetter}*"))
Note that this matches more than just [A-Za-z]*.
A character is considered to be a letter if its general category type, provided by Character.getType(ch), is any of the following: UPPERCASE_LETTER, LOWERCASE_LETTER, TITLECASE_LETTER, MODIFIER_LETTER, OTHER_LETTER
Not all letters have case. Many characters are letters but are neither uppercase nor lowercase nor titlecase.
The \p{javaXXX} character classes are defined in the Pattern API.
Alternatively, try checking if it contains anything other than letters: [^A-Za-z]
The easiest way to do a "is ALL characters of a given type" is to check if ANY character is NOT of the type.
So if \W denotes a non-word character, then just check for one of those. (Note that \w also covers digits and the underscore, so for letters only use [^A-Za-z].)
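In PHP, that "look for one bad character" check might be sketched as follows. The helper name containsOnlyLetters is hypothetical, and rejecting the empty string is an assumption, since a negated search alone would accept it.

```php
<?php
// Hypothetical helper: true when $str is non-empty and has only ASCII letters.
function containsOnlyLetters(string $str): bool
{
    // Reject the empty string explicitly, then look for any non-letter.
    return $str !== '' && preg_match('/[^A-Za-z]/', $str) === 0;
}

var_dump(containsOnlyLetters('Hello'));       // bool(true)
var_dump(containsOnlyLetters('Hello World')); // bool(false)
```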
