email regex not working as expected - php

I am using the following regex to validate emails and just noticed some problems, but I don't see what the issue is:
/^[a-z0-9_.-]+@[a-z0-9.-]+.[a-z]{2,6}$/i.test(value)
support@tes is invalid
support@test is valid
support@test.c is invalid
support@test.co is valid
The {2,6} is meant to require a TLD of between 2 and 6 characters at the end, and that does not appear to be working either. I am sure I had this working properly before.

In a regex, . is a wildcard (meaning any char). You need to escape it as \.
Keep in mind, though, that the regex is too restrictive. An address can contain non-alphanumeric chars, like '
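To make the effect of the bare dot concrete, here is a minimal PHP sketch that runs the question's pattern with and without the escape, writing the address separator as @. The function names are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helpers; both patterns are the one from the question,
// differing only in whether the dot before the TLD is escaped.
function matchesWithBareDot(string $value): bool
{
    // The bare . before the TLD matches any single character.
    return preg_match('/^[a-z0-9_.-]+@[a-z0-9.-]+.[a-z]{2,6}$/i', $value) === 1;
}

function matchesWithEscapedDot(string $value): bool
{
    // \. matches only a literal dot.
    return preg_match('/^[a-z0-9_.-]+@[a-z0-9.-]+\.[a-z]{2,6}$/i', $value) === 1;
}

foreach (['support@tes', 'support@test', 'support@test.c', 'support@test.co'] as $value) {
    printf(
        "%-16s bare: %-5s escaped: %s\n",
        $value,
        var_export(matchesWithBareDot($value), true),
        var_export(matchesWithEscapedDot($value), true)
    );
}
```

With the bare dot, support@test passes because "t", "e", "st" satisfy the three trailing pieces of the pattern; with the escape, a literal dot is required before the TLD.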

I notice you're not escaping the dot (.). There might be more to it than that, but that jumps out at me.

This is a decent check for an e-mail address with a regex:
\w+([-+.']\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*
However, you may want to read this: Using a regular expression to validate an email address

There are many ways to write a regex for an email address, depending on how precise and restrictive you want it to be. To rewrite a working regex closest to what you have in your question, this should work:
^[\w_.-]+@[\w]+\.[\w]{2,6}$
support@tes - Invalid
support@test - Invalid
support@test.c - Invalid
support@test.co - Valid
supp34o.rt@tes.com - Valid
But also keep in mind ALL the characters allowed in a valid email address: What characters are allowed in an email address?

Related

How to make the regular expression correct?

I am not that familiar with regex or PHP. This line, which detects email patterns, constantly returns a parse error.
I am using preg_match with the following pattern, which I changed from ereg:
if(!preg_match("/^(([A-Za-z0-9!#$%&'*+/=?^_{|}~-][A-Za-z0-9!#$%&'*+\/=?^_{|}~\.-]{0,63})|(\"[^(\|\")]{0,62}\"))$\", $local_array[$i]))
and:
if(!preg_match('/^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])\|([A-Za-z0-9]+))$/', $domain_array[$i]) )
I tried adding / before and after the following pattern, and that seems OK:
^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$
The rest says:
Parse error: syntax error, unexpected '","' (T_CONSTANT_ENCAPSED_STRING), expecting ',' or ')'
How can I make it correct? I get parse errors when I switch from ereg to preg_match.
Thanks,
J.
Checking the validity of an e-mail according to the actual standard rather than just "[0-9A-z]@[0-9A-z]\\.(?i:[A-Z])"?
Fantastic. As someone who uses a hyphen in their e-mail address, I wish there were more web developers like you!
Here's the regex to match according to the RFC standard:
"/^([0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+|(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+\.\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>@[]\E]+|\\\\\\\\|\\\\\")+\"\.)+[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+|\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>@[]\E]+|\\\\\\\\|\\\\\")+\")@(?:[0-9A-Za-z\-\.]+|\[[0-9A-Za-z\-\.]+\])$/"
Yikes. As you can see, there are multiple parts to that pattern. If-statement logic is much, much faster and helps reduce the eyesore that this pattern is.
So, if you care about that sort of thing, I would recommend writing a function to check the e-mail address like so:
1) Check that neither the local nor the domain part of the e-mail address has leading, trailing, or consecutive dots, and that the address is in the correct format, e.g.
if (!preg_match("/^\.|\.\.|\.@|@\.|\.$/",$email) && preg_match("/^[^@]+?@[^\\.]+?\..+$/",$email)) {
This ensures there is an '@' symbol for the next part, and if it fails here, it saves what would have been a lot of unnecessary computing.
2) Tokenize the e-mail address by '@':
$part = explode("@",$email);
3) Of course, there could be more than one '@', so if the array has more than 2 elements, loop through each and re-concatenate all but the final element, so that you get two strings: the local part (before the mandatory '@') and the domain part.
4) If the first element/local part of the address does not contain any quotation marks ("), then use this pattern:
$pattern = "/^[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+$/";
5) Else if the local part begins AND ends with quotation marks, use this pattern:
$pattern = "/^\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>@[]\E]+|\\\\\\\\|\\\\\")+\"$/";
6) Else if the local part contains TWO quotation marks (one only would invalidate), use this pattern:
$pattern = "/^(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+\.\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>@[]\E]+|\\\\\\\\|\\\\\")+\"\.)+[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+$/";
7) Else the first part is invalid.
8) If the local part was valid: If the second element/domain part is either encapsulated within square brackets ([]) or contains NO square brackets (you can just use substr and substr_count for this, since it will be much faster than regex), and it matches the pattern:
preg_match("/^\[?[0-9A-Za-z\-\.]+\]?$/",$domainPart);
Then it is valid.
Note: According to the standard, e-mail addresses can actually contain comments (why, I have no idea). The comments are not actually part of the e-mail address, and get removed when it is used. For that reason, I didn't bother matching them.
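As a rough illustration of steps 1 to 3 and step 8 above, here is a minimal PHP sketch. The function names are hypothetical, and it splits on the last '@' with strrpos rather than exploding and re-concatenating, which should give the same two strings. The local-part patterns from steps 4 to 6 are left out, so this is not a complete validator.

```php
<?php
// Hypothetical helper for steps 1-3: returns [local, domain], or null when
// the cheap pre-check fails.
function splitAddress(string $email): ?array
{
    // Step 1a: reject leading, trailing, or consecutive dots in either part.
    if (preg_match('/^\.|\.\.|\.@|@\.|\.$/', $email) === 1) {
        return null;
    }
    // Step 1b: require the rough shape local@domain.tld.
    if (preg_match('/^[^@]+?@[^.]+?\..+$/', $email) !== 1) {
        return null;
    }
    // Steps 2-3: everything before the last '@' is the local part.
    $at = strrpos($email, '@');
    return [substr($email, 0, $at), substr($email, $at + 1)];
}

// Hypothetical helper for step 8: brackets must either wrap the whole
// domain part or be absent, and the rest must match the answer's pattern.
function domainPartLooksValid(string $domain): bool
{
    $brackets = substr_count($domain, '[') + substr_count($domain, ']');
    $wrapped = $brackets === 2 && $domain[0] === '[' && substr($domain, -1) === ']';
    if ($brackets !== 0 && !$wrapped) {
        return false;
    }
    return preg_match('/^\[?[0-9A-Za-z\-\.]+\]?$/', $domain) === 1;
}

var_dump(splitAddress('john.doe@example.com'));
var_dump(domainPartLooksValid('[192.168.0.1]'));
```

Doing the bracket check with substr_count before the regex follows the answer's suggestion that plain string functions are cheaper than another pattern.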

regular expression to match emails

I am stuck on a particular problem. I have a username field where only letters, digits, and the characters . - and _ are allowed, and it should always start with a letter.
Here are examples of what is accepted:
someone@mydomain.com
something1234@mydomain.com
someething.something@mydomain.com
something-something@mydomain.com
something_something@mydomain.com
something_1234@mydomain.com
something.123@mydomain.com
something-456@mydomain.com
What I have done so far is
[a-zA-Z0-9]+[._-]{1,1}[a-zA-Z0-9]+@mydomain.com
This meets all my requirements, except that it doesn't match
someone@mydomain.com
someont123@mydomain.com
and it even matches
someone_someone_something@mydomain.com
which should not be accepted. I really can't work out how to solve this. One thing I tried is
[a-zA-Z0-9]+[._-]{0}[a-zA-Z0-9]+@mydomain.com
but this does not solve my problem either. Now it accepts everything, like
something+455@mydomain.com
which should not be accepted. Please help me.
If you want to make the - or . optional, then you have to replace the {1,1} (quantifier: exactly once) with a ? (quantifier: zero or one) here:
[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+@mydomain.com
The reason this regex also matches addresses you want to reject, such as someone_someone_something@mydomain.com, is that you don't assert the whole string, just some part of it. Use the start ^ and end $ anchors:
^[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+@mydomain\.com$
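A short PHP sketch of the difference the anchors make, using the two patterns above with @ as the separator. The helper names are hypothetical.

```php
<?php
// Hypothetical helpers wrapping the unanchored and anchored patterns.
function matchesUnanchored(string $email): bool
{
    // No ^ or $: any substring that fits is enough for a match.
    return preg_match('/[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+@mydomain.com/', $email) === 1;
}

function matchesAnchored(string $email): bool
{
    // ^ and $ force the pattern to cover the whole string.
    return preg_match('/^[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+@mydomain\.com$/', $email) === 1;
}

$samples = [
    'someone@mydomain.com',
    'something_1234@mydomain.com',
    'someone_someone_something@mydomain.com',
    'something+455@mydomain.com',
];
foreach ($samples as $email) {
    printf(
        "%-40s unanchored: %-5s anchored: %s\n",
        $email,
        var_export(matchesUnanchored($email), true),
        var_export(matchesAnchored($email), true)
    );
}
```

Without anchors, something+455@mydomain.com is accepted because the tail 455@mydomain.com matches on its own.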
This is why we have filter_var($email, FILTER_VALIDATE_EMAIL).
If the email address is valid, then you just have to check whether it ends with @domain.com. That could be done with strrpos($email, '@domain.com').
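One way to combine the two checks, as a sketch: the function name and the case-insensitive suffix comparison are assumptions made here, and the suffix is compared with substr instead of strrpos, because strrpos only reports where the last occurrence starts and the position would still have to be compared against the end of the string.

```php
<?php
// Hypothetical helper: valid according to filter_var AND ending in @<domain>.
function isAddressAtDomain(string $email, string $domain): bool
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    $suffix = '@' . $domain;
    // Compare the tail of the address with the expected suffix,
    // ignoring case (an assumption; domains are case-insensitive).
    return strcasecmp(substr($email, -strlen($suffix)), $suffix) === 0;
}

var_dump(isAddressAtDomain('someone@mydomain.com', 'mydomain.com'));
var_dump(isAddressAtDomain('someone@other.com', 'mydomain.com'));
```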

PHP Regex for checking A-Z a-z 0-9 _ and

What I need is not email validation.
It's simple.
Allow @hello.world or @hello_world or @helloworld, but @helloworld. should be taken as @helloworld, and so should @helloworld?
In short, check for a letter or digit after . and _; if there is none, take the string before it.
My existing regex is /@.([A-Za-z0-9_]+)(?=\?|\,|\;|\s|\Z)/ and it only handles @helloworld, not @hello.world or @hello_world.
Update:
So now I have a regex that deals with problem number 1, i.e. allowing @hello.world or @hello_world or @helloworld. But what about @helloworld. being taken as @helloworld, and likewise @helloworld?
New regex: /@([A-Za-z0-9+_.-]+)/
Don't use a regex for that.
Use...
$valid = filter_var($str, FILTER_VALIDATE_EMAIL);
A regex will never be able to verify an email address; it can only do some very basic format checking.
The most comprehensive regex for matching email addresses was 8000 chars long, and that one is already invalid due to changes in what is accepted in email addresses.
Use a library designed for the job if you need real verification. Otherwise just check for @ and some dots; anything more and you will probably end up rejecting perfectly legal email addresses.
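The "just check for @ and some dots" idea could look roughly like the PHP sketch below. The function name and the exact rules (a non-empty local part, and a dot somewhere inside the domain part) are assumptions chosen for illustration; it deliberately checks nothing else.

```php
<?php
// Hypothetical helper: a very loose shape check, not a validator.
function hasBasicEmailShape(string $address): bool
{
    // There must be an '@' with something in front of it.
    $at = strrpos($address, '@');
    if ($at === false || $at === 0) {
        return false;
    }
    // The part after the last '@' must contain a dot that is neither
    // its first nor its last character.
    $domain = (string) substr($address, $at + 1);
    $dot = strpos($domain, '.');
    return $dot !== false && $dot > 0 && substr($domain, -1) !== '.';
}

var_dump(hasBasicEmailShape('dama@nodomain.se'));
var_dump(hasBasicEmailShape('dama'));
```

Note that requiring a dot rejects dotless hosts, so even this loose check is a trade-off.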
Some examples of perfectly legal email addresses (the leading and trailing " only mark the boundaries):
"dama@nodomain.se"
"\"dama\"@nodomain.se"
"da/ma@nodomain.se"
"dama@nõdomain.se"
"da.ma@nodomain.se"
"dama@παράδειγμα.δοκιμή"
"dama @nodomain .se"
"dama@nodomain.se "
You can use this regexp to validate email addresses (with case-insensitive matching):
^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}$
For more information and more complete expressions, you can check here.
I hope this helps you.
Try this:
\@.+(\.|\?|;|[\r\n\s]+)

Regular expression for e-mail domain (not basic e-mail verification)

I'm currently using
if(preg_match('~@(semo\.edu|uni\.uu\.se|)$~', $email))
as a domain check.
However, I only need to check whether the e-mail ends with one of the domains above. So for instance, all of these need to be accepted:
hello@semo.edu
hello@student.semo.edu
hello@cool.teachers.semo.edu
So I'm guessing I need something after the @ but before the ( which is something like "any random string or empty string". Any regexp ninjas out there who can help me?
([^@]*\.)? works if you already know you're dealing with a valid email address. Explanation: it's either empty, or anything that ends with a period but does not contain an at sign (@). So student.cs.semo.edu matches, as does plain semo.edu, but not me@notreallysemo.edu. So:
~@([^@]*\.)?(semo\.edu|uni\.uu\.se)$~
Note that I've removed the last | from your original regex.
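A quick PHP sketch to try that pattern against the addresses from the question, with @ as the separator. The function name is hypothetical, and the input is assumed to be an already validated address, as the answer says.

```php
<?php
// Hypothetical helper: true when the address ends in one of the allowed
// domains or any subdomain of them.
function isAllowedDomain(string $email): bool
{
    return preg_match('~@([^@]*\.)?(semo\.edu|uni\.uu\.se)$~', $email) === 1;
}

$samples = [
    'hello@semo.edu',
    'hello@student.semo.edu',
    'hello@cool.teachers.semo.edu',
    'me@notreallysemo.edu',
];
foreach ($samples as $email) {
    printf("%-32s %s\n", $email, var_export(isAllowedDomain($email), true));
}
```

The optional group must end in a dot, which is what keeps notreallysemo.edu from slipping through.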
You can use [a-zA-Z0-9\.]* to match zero or more characters (letters, digits, or dots):
~@[a-zA-Z0-9\.]*(semo\.edu|uni\.uu\.se|)$~
Well, .* will match anything, but you don't actually want that. There are a number of characters that are invalid in a domain name (e.g. a space). Instead you want something more like this:
[\w.]*
I might not have all of the allowed characters, but that will get you [A-Za-z0-9_.]. The idea is that you list all the allowed characters in the square brackets and then use * to say zero or more of them.

Regular expression fun with emails; top level domain not required when it should be

I'm trying to create a regular expression that will filter valid emails using PHP and have run into an issue that conflicts with what I understand of regular expressions. Here is the code that I am using.
if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+.[a-zA-Z]{2,4}$/', $string)) {
return false;
}
Now from the materials that I've researched, this should allow content before the # to be multiple letters, numbers, underscores and periods, then afterwards to allow multiple letters and numbers, then require a period, then two to four letters for the top level domain.
However, right now it ignores the requirement for the top level domain section. For example, a@b.c obviously is valid (and should be), but a@b is also returning as valid, which I want to be flagged as invalid.
I'm sure I'm missing something, but after browsing Google for an hour I'm at a loss as to what it could be. Anyone have an answer for this conundrum?
EDIT: The speed at which answers arrive here makes this site superior to its competitors. Well done!
You should escape . when it's not inside a character class: '/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/'
Otherwise it will match any character:
. - any symbol (but not the newline \n if not using s modifier)
\. - dot symbol
[.] - dot symbol (inside symbol group)
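The three forms can be checked side by side with a small PHP sketch. The helper name is hypothetical and the patterns are toy examples rather than email patterns.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(matches('/^a.c$/', 'abc'));    // bare dot: any character fits
var_dump(matches('/^a.c$/', "a\nc"));   // ...except a newline by default
var_dump(matches('/^a.c$/s', "a\nc"));  // the s modifier lets . match \n
var_dump(matches('/^a\.c$/', 'abc'));   // escaped dot: only a literal dot
var_dump(matches('/^a[.]c$/', 'a.c'));  // dot inside a class is literal
```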
Rather than rolling your own, perhaps you should read the article How to Find or Validate an Email Address on Regular-Expressions.info. The article also discusses reasons why you might not want to validate an email address using a regular expression and provides 3 regular expressions that you might consider using instead of your own.
From the page Comparing E-mail Address Validating Regular Expressions: Geert De Deckere from the Kohana project has developed a near perfect one:
/^[-_a-z0-9\'+*$^&%=~!?{}]++(?:\.[-_a-z0-9\'+*$^&%=~!?{}]+)*+@(?:(?![-.])[-a-z0-9.]+(?<![-.])\.[a-z]{2,6}|\d{1,3}(?:\.\d{1,3}){3})(?::\d++)?$/iD
But there is also a built-in function in PHP, filter_var($email, FILTER_VALIDATE_EMAIL), though it seems to be under development. And there is another serious solution: PEAR:Validate. I think the PEAR solution is the best one.
An RFC822-compliant e-mail regex is available.
This is the most reasonable trade off of the spec versus real life that I have seen:
[a-z0-9!#$%&'*+/=?^_`{|}~-]+
(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*
@
(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+
(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
Of course, you have to remove the line breaks, and you have to update it if more top-level domains become available.
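In PHP, one way to remove the line breaks while keeping the pieces readable is to concatenate them, as in the sketch below. The ^ and $ anchors, the i modifier, the / delimiter (which is why the slash inside the character class is escaped), and the function name are assumptions added here, not part of the pattern as quoted; the separator is written as @.

```php
<?php
// Hypothetical helper assembling the multi-line pattern piece by piece.
function matchesTradeOffPattern(string $email): bool
{
    $pattern = '/^'
        . '[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+'            // first atom of the local part
        . '(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*'     // further dot-separated atoms
        . '@'
        . '(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+'    // one or more domain labels
        . '(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b'
        . '$/i';
    return preg_match($pattern, $email) === 1;
}

var_dump(matchesTradeOffPattern('first.last@mail.example.org'));
var_dump(matchesTradeOffPattern('a@b'));
```

Because of the i modifier, [A-Z]{2} accepts any two-letter ending, which stands in for country-code TLDs.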
A single dot in a regular expression means "match any character". And that's exactly what it does when a top level domain is missing (and also when it's present, of course).
Thus you should change your code like this:
if (!preg_match('/^[-a-zA-Z0-9_.]+@[-a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/', $string)) {
return false;
}
And by the way: a lot more characters are allowed in the local part than what your regular expression currently allows for.
