How to add hyphen to regex - php

I have encountered this pattern
(\w+)
and from http://gskinner.com/RegExr/ site I understand that \w = match alpha-numeric characters and underscores, and + = match previous token 1 or more times (not exactly sure what that means).
How can I add the hyphen character to the list?
I tried (\w\-+) but it doesn't work, I don't get any match ...

You need a character class, denoted by [...]. \w can then be used in the character class and more characters can be added:
[\w-]
Be careful if you add more characters to the class, though: the hyphen-minus must be first or last in the class (or be escaped) so that it is not interpreted as a range.
The + is a quantifier, so it goes after a token (where the whole character class is a single token [as is \w]):
([\w-]+)
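A minimal sketch of the difference between the two patterns, run with preg_match. The sample input and variable names are assumptions made for illustration:

```php
<?php
// Illustrative input; not taken from the question.
$input = 'foo-bar_baz 42';

// (\w\-+) means: ONE word character followed by one or more hyphens.
preg_match('/(\w\-+)/', $input, $m1);
echo $m1[1], "\n"; // o-

// ([\w-]+) means: one or more characters, each a word character or a hyphen.
preg_match('/([\w-]+)/', $input, $m2);
echo $m2[1], "\n"; // foo-bar_baz
```

If the input contains no hyphen at all, the first pattern finds nothing, which would explain getting no match.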

Related

Regex: Differentiating underscore(_) and dash(-)

I want to construct a pattern that identifies the valid domain name. A valid domain name has alphanumeric characters and dashes in it. The only rule is that the name should not begin or end with a dash.
I have a regular expression for the validation as ^\w((\w|-)*\w)?$
However the expression is validating the strings with underscores too (for ex.: cake_centre) which is wrong. Can anyone tell me why this is happening and how it can be corrected?
P.S.: I am using preg_match() function in PHP for checking the validation.
The \w metacharacter includes underscores. You can instead write a character class that allows only your listed requirements:
[a-zA-Z\d-]
or per your regex:
^[a-zA-Z\d]([a-zA-Z\d-]*[a-zA-Z\d])?$
(Also note that the position of the - in the character class is important: a - at the start or end is a literal hyphen, while a - in the middle can create a range. See also: What special characters must be escaped in regular expressions?)
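A short sketch of that pattern in use with preg_match(). The helper name isValidDomainLabel is hypothetical and the sample names are made up:

```php
<?php
// Hypothetical helper; the pattern is the anchored one shown above.
function isValidDomainLabel(string $name): bool
{
    return preg_match('/^[a-zA-Z\d]([a-zA-Z\d-]*[a-zA-Z\d])?$/', $name) === 1;
}

var_dump(isValidDomainLabel('cake-centre')); // true
var_dump(isValidDomainLabel('cake_centre')); // false: underscore is not in the class
var_dump(isValidDomainLabel('-cake'));       // false: leading dash
var_dump(isValidDomainLabel('cake-'));       // false: trailing dash
var_dump(isValidDomainLabel('a'));           // true: a single character is allowed
```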
Underscores are being validated because they are part of the \w character class. If you want to exclude it, try:
/^[a-z0-9]+[a-z0-9\-]*[a-z0-9]+$/i
Here is a regexp using the lookaround approach:
^(?!-)[a-zA-Z0-9-]+(?<!-)$
The pattern is built from three parts:
The first part, ^(?!-), is a negative lookahead at the start of the string that ensures the name does not begin with a dash.
The second part, [a-zA-Z0-9-]+, matches the allowed characters (letters, digits and dashes, but no underscore).
The third part, (?<!-)$, is a negative lookbehind at the end of the string that ensures the name does not end with a dash.

Regex for the following condition

I need a small help with regex for the following
Alphanumeric with only lower case alphabets allowed
Starts with number or alphabet
Allows period (.)
Doesn't allow consecutive periods No ..
Doesn't allow any other special characters
Thanks,
-GM
^(?!.*\.\.)[a-z0-9][a-z0-9.]*$
The negative lookahead at the beginning covers your 4th requirement: (?!.*\.\.) fails the match if two consecutive periods appear anywhere in the string. Everything else should be pretty straightforward. ^ and $ are the beginning- and end-of-string anchors, and the character classes enforce the requirement that only lowercase letters, digits, and . are allowed.
To add the length constraint (between 6 and 16 characters) just change the * to {5,15}. * means "repeat the previous element zero or more times", {n,m} means "repeat the previous element between n and m times (inclusive)". The reason {5,15} is used instead of {6,16} is that one character is already consumed by the first character class. Here is the end result:
^(?!.*\.\.)[a-z0-9][a-z0-9.]{5,15}$
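A quick sketch for checking the length-constrained pattern in PHP. It uses a lookahead of the form (?!.*\.\.), which rejects a double period anywhere in the string; the helper name and sample inputs are assumptions for illustration:

```php
<?php
// Hypothetical helper wrapping the length-constrained pattern.
function isValidName(string $s): bool
{
    return preg_match('/^(?!.*\.\.)[a-z0-9][a-z0-9.]{5,15}$/', $s) === 1;
}

var_dump(isValidName('user.name'));  // true
var_dump(isValidName('a.b..cdef'));  // false: consecutive periods after an earlier period
var_dump(isValidName('user..name')); // false: consecutive periods
var_dump(isValidName('.username'));  // false: must start with a letter or digit
var_dump(isValidName('User.name'));  // false: uppercase is not allowed
var_dump(isValidName('abc'));        // false: shorter than 6 characters
```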
Here's some assistance without giving away the answer, as you'll learn the most.
To match from a certain combination of characters, e.g. alphanumeric, use character classes, e.g. [a-z0-9]. Note that this expression matches exactly one character. You must use quantifiers to match more than one, e.g. +.
To "start" or "end" with something, you must use anchors, ^ and $, before the first or after the last character, respectively. (Watch out, though. In a character class, the ^ inverts the character class.)
In regex, . has a special meaning as a wildcard (matching any character besides newline characters). Therefore you have to escape them, \., to select the literal dot. Another way to escape the dot is to put it in a character class: [.].
Non-consecutiveness is trickier. You will need to look up more information about negative lookahead assertions (or lookaround assertions in general).
The key terms above (character classes, quantifiers, anchors, escaping, and negative lookahead assertions) are all things you can Google to learn more.
I'd say something along those lines: /^[a-z0-9]+(\.[a-z0-9]+)*\.?$/ (suppose that the line can end with a period)
Use this if the string may not end with a period:
/^[a-z0-9]+(\.[a-z0-9]+)*$/
or this if it may:
/^[a-z0-9]+(\.[a-z0-9]+)*\.?$/
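A small sketch comparing the two variants. Because every period must be followed by at least one letter or digit, consecutive periods are rejected without any lookahead. The variable names and inputs are made up for illustration:

```php
<?php
$noTrailingDot = '/^[a-z0-9]+(\.[a-z0-9]+)*$/';
$trailingDotOk = '/^[a-z0-9]+(\.[a-z0-9]+)*\.?$/';

foreach (['a.b.c', 'a.b.', 'a..b', '.ab'] as $s) {
    printf(
        "%-6s strict=%d lenient=%d\n",
        $s,
        preg_match($noTrailingDot, $s),
        preg_match($trailingDotOk, $s)
    );
}
```

Only 'a.b.' differs between the two: the strict pattern rejects it and the lenient one accepts it.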
This should be the best
^([a-z0-9]+\.?)+$

Allow + in regex email validation

Regex is blowing my mind. How can I change this to validate emails with a plus sign, so I can sign up with test+spam@gmail.com?
if(!preg_match("/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*$/i", $_GET['em'])) {
It seems like you aren't really familiar with what your regex is doing currently, and understanding that is a good first step before modifying it. Let's walk through your regex using the email address john.robert.smith@mail.com (each item below notes the part of the address that it matches):
1. ^ is the start-of-string anchor. It specifies that any match must begin at the beginning of the string. If the pattern is not anchored, the regex engine can match a substring, which is often undesired. Anchors are zero-width, meaning that they do not consume any characters.
2. [_a-z0-9-]+ is made up of two elements, a character class and a repetition modifier:
   [...] defines a character class, which tells the regex engine that any of these characters is a valid match. In this case the class contains the letters a-z, the digits 0-9, the dash and the underscore. (In general, a dash in a character class defines a range, so you can use a-z instead of abcdefghijklmnopqrstuvwxyz; when given as the last character in the class, it acts as a literal dash.)
   + is a repetition modifier that specifies that the preceding token (in this case, the character class) can be repeated one or more times. There are two other repetition operators: * matches zero or more times; ? matches exactly zero or one times (i.e. makes something optional).
   (matches "john" in john.robert.smith@mail.com)
3. (\.[_a-z0-9-]+)* again contains a repeated character class. It also contains a group and an escaped character:
   (...) defines a group, which allows you to group multiple tokens together (in this case, the group will be repeated as a whole). Let's say we wanted to match 'abc' zero or more times (i.e. abcabcabc matches, abcccc doesn't). If we tried to use the pattern abc*, the repetition modifier would only apply to the c, because c is the last token before the modifier. To get around this, we can group abc, as in (abc)*, in which case the modifier applies to the entire group as if it were a single token.
   \. specifies a literal dot character. This is needed because . is a special character in regex, meaning any character. Since we want to match an actual dot character, we need to escape it.
   (matches ".robert.smith" in john.robert.smith@mail.com)
4. @ is not a special character in regex, so, like all other non-special characters, it matches literally.
   (matches "@" in john.robert.smith@mail.com)
5. [a-z0-9-]+ again defines a repeated character class, like item #2 above.
   (matches "mail" in john.robert.smith@mail.com)
6. (\.[a-z0-9-]+)* is almost exactly the same pattern as #3 above.
   (matches ".com" in john.robert.smith@mail.com)
7. $ is the end-of-string anchor. It works the same as ^ above, except that it matches the end of the string.
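The walkthrough can be checked by wrapping each piece in a capturing group and printing what it matched. This is only a sketch, assuming the address uses the usual @ separator; the extra capturing parentheses and the (?:...) non-capturing groups are additions for illustration and are not part of the original pattern:

```php
<?php
// Same building blocks as the walkthrough, with each piece captured separately.
$pattern = '/^([_a-z0-9-]+)((?:\.[_a-z0-9-]+)*)@([a-z0-9-]+)((?:\.[a-z0-9-]+)*)$/i';

preg_match($pattern, 'john.robert.smith@mail.com', $m);

echo $m[1], "\n"; // john
echo $m[2], "\n"; // .robert.smith
echo $m[3], "\n"; // mail
echo $m[4], "\n"; // .com
```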
With that in mind, it should be a bit clearer how to add a section that captures a plus segment. As we saw above, + is a special character, so it has to be escaped. Then, since the + has to be followed by some characters, we can define a character class with the characters we want to match and define its repetition. Finally, we should make the whole group optional, because email addresses don't need to have a + segment:
(\+[a-z0-9-]+)?
When inserted into your regex, it'd look like this:
/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i
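A brief sketch of the extended pattern in use. The helper name acceptsEmail and the literal addresses are assumptions; the question reads the value from $_GET['em'] instead:

```php
<?php
// Hypothetical helper; the pattern includes the optional plus segment.
function acceptsEmail(string $email): bool
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*(\+[a-z0-9-]+)?@[a-z0-9-]+(\.[a-z0-9-]+)*$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(acceptsEmail('test+spam@gmail.com'));        // true
var_dump(acceptsEmail('john.robert.smith@mail.com')); // true: the plus segment is optional
var_dump(acceptsEmail('test+@gmail.com'));            // false: + must be followed by characters
```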
Save your sanity. Get a pre-made PHP RFC 822 Email address parser
I've used this regex to validate emails, and it works just fine with emails that contain a +:
/^(([^<>()[\]\\.,;:\s@\"]+(\.[^<>()[\]\\.,;:\s@\"]+)*)|(\".+\"))@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\])|(([a-zA-Z\-0-9]+\.)+[a-zA-Z]{2,}))$/
\+ will match a literal + sign, but be aware: You still won't be close to matching all possible email addresses according to the RFC spec, because the actual regex for that is madness. It's almost certainly not worth it; you should use a real email parser for this.
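If hand-maintaining a regex is not a requirement, one option is PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL. This is a sketch of an alternative, not a full RFC parser, and it is not what the question's code uses:

```php
<?php
// filter_var() returns the validated string on success and false on failure.
$ok  = filter_var('test+spam@gmail.com', FILTER_VALIDATE_EMAIL);
$bad = filter_var('not an email', FILTER_VALIDATE_EMAIL);

var_dump($ok !== false);  // true: plus addressing is accepted
var_dump($bad !== false); // false
```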
This is another solution (it is similar to the solution found by David):
//Escaped for .Net
^[_a-zA-Z0-9-]+((\\.[_a-zA-Z0-9-]+)*|(\\+[_a-zA-Z0-9-]+)*)*@[a-zA-Z0-9-]+(\\.[a-zA-Z0-9-]+)*(\\.[a-zA-Z]{2,4})$
//Native
^[_a-zA-Z0-9-]+((\.[_a-zA-Z0-9-]+)*|(\+[_a-zA-Z0-9-]+)*)*@[a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)*(\.[a-zA-Z]{2,4})$
This is another solution:
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?@[a-z0-9-.]+(\.[a-z0-9]+)$/
or, for a Razor page (where @ is written as \u0040):
/^[_a-z0-9-+]+(\.[_a-z0-9-+]+)*(\+[a-z0-9-]+)?\u0040[a-z0-9-.]+(\.[a-z0-9]+)$/

Regular expression for validating a username?

I'm still kinda new to using Regular Expressions, so here's my plight. I have some rules for acceptable usernames and I'm trying to make an expression for them.
Here they are:
1-15 Characters
a-z, A-Z, 0-9, and spaces are acceptable
Must begin with a-z or A-Z
Cannot end in a space
Cannot contain two spaces in a row
This is as far as I've gotten with it.
/^[a-zA-Z]{1}([a-zA-Z0-9]|\s(?!\s)){0,14}[^\s]$/
It works, for the most part, but doesn't match a single character such as "a".
Can anyone help me out here? I'm using PCRE in PHP if that makes any difference.
Try this:
/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/
The look-ahead assertion (?=.{1,15}$) checks the length and the rest checks the structure:
[a-zA-Z] ensures that the first character is an alphabetic character;
[a-zA-Z0-9]* allows any number of following alphanumeric characters;
(?: [a-zA-Z0-9]+)* allows any number of sequences of a single space (not \s that allows any whitespace character) that must be followed by at least one alphanumeric character (see PCRE subpatterns for the syntax of (?:…)).
You could also remove the look-ahead assertion and check the length with strlen.
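A sketch of both variants, the look-ahead version and the strlen version. The function names are hypothetical and the sample usernames are made up:

```php
<?php
// Variant 1: length checked inside the regex by the look-ahead.
function isValidUsername(string $name): bool
{
    return preg_match('/^(?=.{1,15}$)[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/', $name) === 1;
}

// Variant 2: length checked with strlen, structure checked by the regex.
function isValidUsernameStrlen(string $name): bool
{
    return strlen($name) <= 15
        && preg_match('/^[a-zA-Z][a-zA-Z0-9]*(?: [a-zA-Z0-9]+)*$/', $name) === 1;
}

var_dump(isValidUsername('a'));                // true: a single letter is fine
var_dump(isValidUsername('John Smith 42'));    // true
var_dump(isValidUsername('John  Smith'));      // false: two spaces in a row
var_dump(isValidUsername('John '));            // false: ends in a space
var_dump(isValidUsername('1john'));            // false: must begin with a letter
var_dump(isValidUsername('abcdefghijklmnop')); // false: 16 characters
```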
Make everything after your first character optional:
^[a-zA-Z](?:(?:[a-zA-Z0-9]|\s(?!\s)){0,13}[^\s])?$
The main problem with your regexp is that it needs at least two characters to match:
one for the [a-zA-Z]{1} part
one for the [^\s] part
Besides this problem, I see some parts of your regexp that could be improved:
The [^\s] class will match any character except whitespace, so a dot or semicolon will be accepted. Use the [a-zA-Z0-9] class here to ensure the character is a valid one.
You can delete the {1} at the beginning, as a character class matches exactly one character by default.

PHP Regular Expression [accept selected characters only]

I want to accept a list of character as input from the user and reject the rest. I can accept a formatted string or find if a character/string is missing.
But how I can accept only a set of character while reject all other characters. I would like to use preg_match to do this.
e.g. Allowable characters are: a..z, A..Z, -, ' ' (space)
User must able to enter those character in any order. But they must not allowed to use other than those characters.
Use a negated character class: [^A-Za-z -]
This will only match if the user enters something OTHER than what is in that character class.
if (preg_match('/[^A-Za-z -]/', $input)) { /* invalid character entered */ }
[a-zA-Z -]
Square brackets [] group characters into a class that behaves like a single character, so you can also write things like [...]+ and so on.
Also, a-z, A-Z and 0-9 define ranges, so you don't have to write out the whole alphabet.
You can use the following regular expression: ^[a-zA-Z -]+$.
The ^ matches the beginning of the string, which prevents it from matching the middle of the string 123abc. The $ similarly matches the end of the string, preventing it from matching the middle of abc123.
The brackets match every character inside of them; a-z means every character between a and z. To match the - character itself, put it at the end. ([19-] matches a 1, a 9, or a -; [1-9] matches every character between 1 and 9, and does not match -).
The + tells it to match one or more of the thing before it. You can replace the + with a *, which means 0 or more, if you also want to match an empty string.
Use a negated character class, [^...], that lists your allowed characters, then test for a match: any match means the input contains a character that is not allowed.
$pattern = '/[^A-Za-z\- ]/';
if (preg_match($pattern, $string_of_input)){
//return a fail
}
//Matt beat me to it...
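The two approaches in this thread (an anchored whitelist, or a search for any character outside the allowed set) can be compared side by side. A minimal sketch, assuming the allowed characters are letters, hyphen and space; the function names are hypothetical:

```php
<?php
// Approach 1: the whole string must consist of allowed characters (at least one).
function onlyAllowed(string $s): bool
{
    return preg_match('/^[a-zA-Z -]+$/', $s) === 1;
}

// Approach 2: look for any single character outside the allowed set.
function hasDisallowed(string $s): bool
{
    return preg_match('/[^A-Za-z -]/', $s) === 1;
}

var_dump(onlyAllowed('Mary-Jane Smith'));   // true
var_dump(hasDisallowed('Mary-Jane Smith')); // false
var_dump(onlyAllowed('Mary_Jane'));         // false
var_dump(hasDisallowed('abc123'));          // true
```

The approaches differ on the empty string: the anchored pattern with + rejects it, while the negated class finds nothing to object to, so add a separate emptiness check if that matters.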
