I'm learning regular expressions, so please go easy on me!
A username is considered valid when it does not start with _ (underscore) and contains only word characters (letters, digits and the underscore itself):
namespace Gremo\ExtraValidationBundle\Validator\Constraints;
use Symfony\Component\Validator\Constraint;
use Symfony\Component\Validator\ConstraintValidator;
class UsernameValidator extends ConstraintValidator
{
    public function validate($value, Constraint $constraint)
    {
        // Violation if username starts with underscore
        if (preg_match('/^_/', $value)) {
            $this->context->addViolation($constraint->message);
            return;
        }
        // Violation if username does not consist only of word characters
        if (!preg_match('/^\w+$/', $value)) {
            $this->context->addViolation($constraint->message);
        }
    }
}
To merge them into one regular expression, I've tried the following:
^_+[^\w]+$
This is meant to read: add a violation if the string starts with an underscore (possibly more than one) and if at least one of the following characters is not allowed (not a letter, digit or underscore). It does not work with "_test", for example.
Can you help me understand where I'm going wrong?
You can add a negative lookahead assertion to your 2nd regex:
^(?!_)\w+$
This now means: match the entire string, not just part of it. The string must not begin with an underscore and must consist of one or more word characters.
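A minimal sketch of the merged check as a plain PHP function, assuming you only need a boolean; the name isValidUsername is hypothetical and not part of Symfony. The D modifier is an addition here: without it, $ also matches just before a trailing newline, so "test\n" would pass.

```php
<?php

// Hypothetical helper: true when $value is one or more word characters
// and does not start with an underscore.
function isValidUsername(string $value): bool
{
    // (?!_) rejects a leading underscore; \w+ between ^ and $ covers the whole string.
    // D makes $ match only at the very end, so a trailing "\n" is rejected.
    return preg_match('/^(?!_)\w+$/D', $value) === 1;
}

var_dump(isValidUsername('test'));   // bool(true)
var_dump(isValidUsername('_test'));  // bool(false)
var_dump(isValidUsername('te st'));  // bool(false)
```

Inside the validator you would call addViolation() whenever this returns false.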
The problem is De Morgan's Law. ^_+[^\w]+$ will only match if it starts with one or more underscores and all subsequent characters are non-word characters. You need to match if it starts with an underscore or any character is a non-word character.
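Applying De Morgan's law, the "violation" test becomes an alternation: the string starts with an underscore or contains any non-word character (\W). A minimal sketch with a hypothetical function name; an empty string matches neither alternative, so it is rejected separately to mirror the original ^\w+$ check.

```php
<?php

// Hypothetical helper mirroring the "add a violation" logic.
function hasUsernameViolation(string $value): bool
{
    // Empty input slips past both alternatives below, but the original
    // ^\w+$ check rejects it, so reject it explicitly here.
    if ($value === '') {
        return true;
    }

    // ^_ : starts with an underscore, OR
    // \W : contains at least one non-word character anywhere.
    return preg_match('/^_|\W/', $value) === 1;
}
```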
I think it's simpler, in this case, to focus on the valid usernames: they start with a word character other than an underscore, and all remaining characters are word characters. In other words, valid usernames are described by the pattern ^[^\W_]\w*$. So, you can write:
if (! preg_match('/^[^\W_]\w*$/', $value, $matches)) {
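The class [^\W_] reads as "not a non-word character and not an underscore", which leaves letters and digits. A small sketch checking the pattern against a few sample names; the function name is hypothetical, and the D modifier is added so a trailing newline is not accepted.

```php
<?php

// Hypothetical helper using the pattern from the answer above.
// [^\W_] : first character is a word character other than "_"
// \w*    : the remaining characters are word characters (underscores allowed)
function startsWithAlnumThenWord(string $value): bool
{
    return preg_match('/^[^\W_]\w*$/D', $value) === 1;
}

var_dump(startsWithAlnumThenWord('t_e_s_t'));  // bool(true)
var_dump(startsWithAlnumThenWord('_test'));    // bool(false)
```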
The simple solution is this:
if (!preg_match('/^[a-zA-Z0-9]+$/', $value, $matches)) {
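You just want the \w class (which includes the underscore) without the underscore, and [a-zA-Z0-9] is \w minus the underscore (in the default, non-locale-specific case). Note that this rejects underscores anywhere in the username, not only at the start.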
There are, of course, many different ways of doing this. I'd probably go with something along the lines of /^(?!_)[\w\d_]+$/.
The [\w\d_]+ part combined with the anchors (^ and $) essentially asserts that the entire string consists only of those characters (the \d and _ are redundant, since \w already includes digits and the underscore). The (?!_) part is a negative lookahead assertion: it checks that the next character isn't an underscore. Since it sits right after the ^ anchor, this ensures the first character isn't an underscore.
Related
I'm still a newbie to regular expressions. I want to create a regular expression with this rule:
if (preg_match('^[ A-Za-z0-9_-#]^', $text) == 1) {
return true;
}
else {
return false;
}
In short, I would like $text to accept letters, numbers, spaces, underscores, dashes, and hashes (#).
Is the above regular expression correct? It always returns true.
First off, you shouldn't use ^ as the pattern delimiter, because ^ is also the start-of-string anchor; use /, ~ or # instead.
Second, the dash in the character set should be at the last position; otherwise it defines a range from _ to #, which is out of order (# comes before _ in ASCII), so PCRE refuses to compile the pattern.
Third, the expression only matches a single character; you will want to use a quantifier such as + or *.
Lastly, you should anchor the expression so that only those valid characters are present in the string:
/^[ \w#-]+$/
Btw, I've replaced A-Za-z0-9_ with \w as a shortcut.
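Putting the fixes together, the original check can be sketched as a function returning a boolean; the name isAllowedText is hypothetical, and the D modifier is an extra so that a trailing newline is not accepted.

```php
<?php

// Hypothetical helper: letters, digits, underscore, space, '#' and '-' only.
function isAllowedText(string $text): bool
{
    // '/' delimiters, '-' placed last so it is literal, '+' for one or more
    // characters, and ^...$ anchors (with D) so the whole string must match.
    return preg_match('/^[ \w#-]+$/D', $text) === 1;
}

var_dump(isAllowedText('Item #42 - blue_box'));  // bool(true)
var_dump(isAllowedText('hello!'));               // bool(false)
```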
Here's what you can do:
\w stands for [a-zA-Z0-9_] (in the default locale)
the character - has a special meaning in a character class, since it is used to define ranges, so you must place it at the beginning or at the end of the class
the preg_match function returns 1 if there is a match, 0 if there is none, or false when an error occurs, so you don't need to test whether it is equal to 1; you can use its return value directly in the condition
example:
if (preg_match('~^[\w #-]++$~', $subject))
...
else
...
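To make the three possible return values concrete, here is a small sketch with made-up subjects. The last call reuses the question's pattern, whose _-# range is out of order and fails to compile; the @ only suppresses the resulting warning for this demo.

```php
<?php

$pattern = '~^[\w #-]++$~D';

$match   = preg_match($pattern, 'tag #1 - ok');  // whole string is allowed
$noMatch = preg_match($pattern, 'tag!');         // '!' is not allowed
// Invalid pattern from the question: "_-#" is a reversed range, so PCRE
// refuses to compile it and preg_match returns false (with a warning).
$error   = @preg_match('^[ A-Za-z0-9_-#]^', 'tag');

var_dump($match, $noMatch, $error);  // int(1) int(0) bool(false)
```

Both 0 and false are falsy, so a plain if (preg_match(...)) treats them alike; compare with === false if you need to tell an error apart from "no match".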
I have encountered this pattern
(\w+)
and from the http://gskinner.com/RegExr/ site I understand that \w matches alphanumeric characters and underscores, and + matches the previous token one or more times (I'm not exactly sure what that means).
How can I add the hyphen character to the list?
I tried (\w\-+) but it doesn't work; I don't get any match.
You need a character class, denoted by [...]. \w can then be used in the character class and more characters can be added:
[\w-]
Be careful, though, if you add more characters to match: the hyphen-minus needs to be first or last in the class so it isn't interpreted as a range (or you can escape it).
The + is a quantifier, so it goes after a token (the whole character class counts as a single token, just as \w does):
([\w-]+)
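For example, a quick sketch with preg_match_all (the sample string is made up) shows that ([\w-]+) now captures hyphenated words as single matches:

```php
<?php

$subject = 'foo-bar baz_qux x-1';

// [\w-]+ : one or more characters that are either word characters or '-'.
preg_match_all('/([\w-]+)/', $subject, $matches);

// $matches[1] holds the captured groups: foo-bar, baz_qux and x-1.
print_r($matches[1]);
```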
I'm still kinda new to using Regular Expressions, so here's my plight. I have some rules for acceptable usernames and I'm trying to make an expression for them.
Here they are:
1-15 Characters
a-z, A-Z, 0-9, and spaces are acceptable
Must begin with a-z or A-Z
Cannot end in a space
Cannot contain two spaces in a row
This is as far as I've gotten with it.
/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/
It works, for the most part, but doesn't match a single character such as "a".
Can anyone help me out here? I'm using PCRE in PHP if that makes any difference.
Try this:
/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/
The look-ahead assertion (?=.{1,15}$) checks the length and the rest checks the structure:
[a-zA-Z] ensures that the first character is an alphabetic character;
[a-zA-Z0-9]* allows any number of following alphanumeric characters;
(?: [a-zA-Z0-9]+)* allows any number of sequences of a single space (not \s that allows any whitespace character) that must be followed by at least one alphanumeric character (see PCRE subpatterns for the syntax of (?:…)).
You could also remove the look-ahead assertion and check the length with strlen.
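A sketch of both variants side by side, assuming ASCII-only input; the function names are hypothetical, and the D modifier is an addition so that a trailing newline cannot sneak past $.

```php
<?php

// Variant 1: the lookahead checks the length inside the pattern.
function isValidName(string $name): bool
{
    return preg_match('/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/D', $name) === 1;
}

// Variant 2: strlen() checks the length, the pattern only checks structure.
function isValidNameStrlen(string $name): bool
{
    $len = strlen($name);

    return $len >= 1 && $len <= 15
        && preg_match('/^[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/D', $name) === 1;
}

var_dump(isValidName('a'));            // bool(true)
var_dump(isValidName('John  Smith'));  // bool(false): two spaces in a row
```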
Make everything after your first character optional:
^[a-zA-Z](?:(?:[a-zA-Z0-9]|\s(?!\s)){0,13}[^\s])?$
The main problem with your regexp is that it needs at least two characters to have a match:
one for the [a-zA-Z]{1} part
one for the [^\s] part
Besides this problem, I see some parts of your regexp that could be improved:
The [^\s] class matches any character except whitespace, so a dot or semicolon would be accepted; use the [a-zA-Z0-9] class here to ensure the character is a valid one.
You can delete the {1} part at the beginning, as a token matches exactly once by default.
I'm about to create a registration form for my website. I need to check the variable and accept it only if it contains letters, numbers, _ or -.
How can I do it with a regex? I've used regexes with preg_replace() before, but I don't think that's what I need here. Also, I know that the "ereg" function is dead. Any solutions?
This regex is pretty common these days:
if(preg_match('/^[a-z0-9\-\_]+$/i',$username))
{
// Ok
}
Use preg_match:
preg_match('/^[\w-]+$/D', $str)
Here \w describes letters, digits and the _, so [\w-]+ matches one or more letters, digits, _, and -. ^ and $ are so-called anchors that denote the beginning and end of the string, respectively. The D modifier ensures that $ matches only at the very end of the string; without it, $ also matches right before a trailing line break.
Note that the letters and digits matched by \w depend on the current locale and might include letters or digits other than [a-zA-Z0-9]. So if you only want these, use them explicitly. And if you want to allow more than these, you could also try character classes described by Unicode character properties, like \p{L} for all Unicode letters (together with the u modifier).
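A short sketch of both points with made-up sample strings: without D, $ also matches just before a final "\n", and with the u modifier plus \p{L} and \p{N}, non-ASCII letters are accepted explicitly rather than via the locale. The \p{L}\p{N} class is one possible choice, not the only one.

```php
<?php

$name = "john_doe\n";

var_dump(preg_match('/^[\w-]+$/', $name));   // int(1): $ matches before the final "\n"
var_dump(preg_match('/^[\w-]+$/D', $name));  // int(0): $ only matches at the very end

// ASCII only, independent of locale:
$ascii = '/^[a-zA-Z0-9_-]+$/D';
// Any Unicode letter or number; u treats pattern and subject as UTF-8.
$unicode = '/^[\p{L}\p{N}_-]+$/uD';

$jose = "jos\u{e9}";  // "josé" encoded as UTF-8

var_dump(preg_match($ascii, $jose));    // int(0)
var_dump(preg_match($unicode, $jose));  // int(1)
```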
Try preg_match(). http://php.net/manual/en/function.preg-match.php
On my registration page I need to validate the usernames as alphanumeric only, but also with optional underscores. I've come up with this:
function validate_alphanumeric_underscore($str)
{
return preg_match('/^\w+$/',$str);
}
Which seems to work okay, but I'm not a regex expert! Does anyone spot any problem?
The actual matched characters of \w depend on the locale that is being used:
A "word" character is any letter or digit or the underscore character, that is, any character which can be part of a Perl "word". The definition of letters and digits is controlled by PCRE's character tables, and may vary if locale-specific matching is taking place. For example, in the "fr" (French) locale, some character codes greater than 128 are used for accented letters, and these are matched by \w.
So you should explicitly specify the characters you want to allow:
/^[A-Za-z0-9_]+$/
This allows just alphanumeric characters and the underscore.
And if you want to allow the underscore only as a separator between alphanumeric parts, and require the username to start with a letter:
/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/
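A quick sketch comparing the two patterns on a few made-up names (the D modifier is an addition): the first pattern accepts all five, while only user_name passes the second.

```php
<?php

$simple    = '/^[A-Za-z0-9_]+$/D';
// Letter first; underscores only as single separators between alphanumeric runs.
$separated = '/^[A-Za-z][A-Za-z0-9]*(?:_[A-Za-z0-9]+)*$/D';

foreach (['user_name', '_user', 'user_', 'user__name', '1user'] as $name) {
    printf(
        "%-11s simple=%d separated=%d\n",
        $name,
        preg_match($simple, $name),
        preg_match($separated, $name)
    );
}
```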
Here's a custom function that validates the string using PHP's ctype_alnum together with an array of allowed characters:
<?php
function validate_username($str) {
    // each array entry is a special char allowed
    // besides the ones accepted by ctype_alnum
    $allowed = array(".", "-", "_");
    if (ctype_alnum(str_replace($allowed, '', $str))) {
        return $str;
    } else {
        $str = "Invalid Username";
        return $str;
    }
}
?>
Try this:
function validate_alphanumeric_underscore($str)
{
return preg_match('/^[a-zA-Z0-9_]+$/',$str);
}
Looks fine to me. Note that you make no requirement for the placement of the underscore, so "username_" and "___username" would both pass.
I would take Gumbo's second regex, which only allows the underscore as a separator, but add a + after the _ so that a username like "special__username" is also accepted. Just a minor tweak:
/^[A-Za-z][A-Za-z0-9]*(?:_+[A-Za-z0-9]+)*$/
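To see the effect of the tweak, a minimal check with made-up names (the D modifier is an addition): with _+ the double underscore is accepted, while leading and trailing underscores are still rejected.

```php
<?php

$pattern = '/^[A-Za-z][A-Za-z0-9]*(?:_+[A-Za-z0-9]+)*$/D';

var_dump(preg_match($pattern, 'special__username'));  // int(1)
var_dump(preg_match($pattern, '_username'));          // int(0)
var_dump(preg_match($pattern, 'username_'));          // int(0)
```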
Your own solution is perfectly fine.
preg_match uses Perl-like regular expressions, in which the character class \w is defined to match exactly what you need:
\w - Match a "word" character (alphanumeric plus "_")
(source)