regex for email validation: where is the error? - php

This sounds strange, but I've been using this function for quite a while now and "suddenly, from one day to the other" it does not filter some addresses in the right way anymore. However, I cannot see why...
function validate_email($email)
{
/*
(Name) Letters, Numbers, Dots, Hyphens and Underscores
(# sign)
(Domain) (with possible subdomain(s) ).
Contains only letters, numbers, dots and hyphens (up to 255 characters)
(. sign)
(Extension) Letters only (up to 10 (can be increased in the future) characters)
*/
$regex = '/([a-z0-9_.-]+)'. # name
'#'. # at
'([a-z0-9.-]+){2,255}'. # domain & possibly subdomains
'.'. # period
'([a-z]+){2,10}/i'; # domain extension
if($email == '') {
return false;
}
else {
$eregi = preg_replace($regex, '', $email);
}
return empty($eregi) ? true : false;
}
e.g. "some#gmail" is accepted as valid, so it seems something happened with the TLD part. Could anybody tell me why?
Thank you very much in advance!

. means any character. You should escape it if you actually mean 'dot': \.
Your regex also has some other problems:
Uppercase letters are only matched because of the trailing i modifier; without it you would need [a-zA-Z0-9]
No unicode characters are allowed in your regex (for example email addresses with é, ç, ... etc)
Some special characters such as + are in fact allowed in an email address
...
I would keep the email validation very simple. Like check if there is a # present and pretty much keep it at that. For if you really want to validate an email, the regex becomes gruesome.
Check this SO answer for a more detailed explanation.

What you commented with "period":
'.'. # period
is in fact a placeholder for any character. It should be \. instead.
However, you're overcomplicating things. Such validation should exist to reject either empty fields or obviously wrong stuff (e.g. name put in the email field). So in my experience the best check is just to look whether it contains an # and don't worry too much about getting the structure right. You can in fact write a regex which will faithfully validate any valid email address and reject any invalid one. It's a monster spanning about a screen of text. Don't do that. KISS.

I think the error is in this line:
'.'. # period
You mean a literal period here. But periods have a special meaning in regular expressions (they mean "any character").
You need to escape it with a backslash.
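A minimal sketch of the fix the answers above describe, assuming the rest of the question's pattern is kept as posted (including the '#' separator). The function name validate_email_sketch is hypothetical, and the ^ and $ anchors plus the switch from preg_replace to preg_match are additions, not part of the original code:

```php
<?php
// Sketch only: same character classes as the question, but with the
// period escaped and the whole pattern anchored to the full string.
function validate_email_sketch(string $email): bool
{
    $regex = '/^[a-z0-9_.-]+'      // name
           . '#'                   // separator, as written in the question
           . '[a-z0-9.-]{2,255}'   // domain and possible subdomains
           . '\.'                  // literal period (escaped)
           . '[a-z]{2,10}$/i';     // extension, letters only
    return preg_match($regex, $email) === 1;
}
```

With the dot escaped, an input without a period after the separator, such as "some#gmail", no longer passes.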

What about FILTER_VALIDATE_EMAIL?
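For illustration, a short sketch of that built-in filter. It assumes the '#' in the addresses quoted above stands in for the at sign, because filter_var() expects a real '@'; the wrapper name is_valid_email is hypothetical:

```php
<?php
// filter_var() returns the filtered string on success and false on failure,
// so compare against false explicitly.
function is_valid_email(string $email): bool
{
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

var_dump(is_valid_email('some@gmail.com'));
var_dump(is_valid_email('not an email'));
```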

Related

How to make the regular expression correct?

I am not that familiar with regex or PHP. The following lines, which detect email patterns, constantly return a parse error
since I changed them from ereg to preg_match:
if(!preg_match("/^(([A-Za-z0-9!#$%&'*+/=?^_{|}~-][A-Za-z0-9!#$%&'*+\/=?^_{|}~\.-]{0,63})|(\"[^(\|\")]{0,62}\"))$\", $local_array[$i]))
and:
if(!preg_match('/^(([A-Za-z0-9][A-Za-z0-9-]{0,61}[A-Za-z0-9])\|([A-Za-z0-9]+))$/', $domain_array[$i]) )
I tried adding / before and after the following pattern, and that seems OK:
^(([A-Za-z0-9!#$%&'*+/=?^_`{|}~-][A-Za-z0-9!#$%&'*+/=?^_`{|}~\.-]{0,63})|(\"[^(\\|\")]{0,62}\"))$
The rest says:
Parse error: syntax error, unexpected '","' (T_CONSTANT_ENCAPSED_STRING), expecting ',' or ')'
How do I make it correct? I get parse errors when I switch from ereg to preg_match.
Thanks,
J.
Checking the validity of an e-mail according to the actual standard rather than just "[0-9A-z]#[0-9A-z]\\.(?i:[A-Z])" ?
Fantastic. As someone who uses a hyphen in their e-mail address, I wish there were more web-developers like you!
Here's the regex to match according to the RFC standard:
"/^([0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+|(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+\.\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>#[]\E]+|\\\\\\\\|\\\\\")+\"\.)+[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+|\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>#[]\E]+|\\\\\\\\|\\\\\")+\")#(?:[0-9A-Za-z\-\.]+|\[[0-9A-Za-z\-\.]+\])$/"
Yikes. As you can see, there are multiple parts to that pattern. If-statement logic is much faster and spares you the eyesore that this pattern is.
So, if you care about that sort of thing, I would recommend writing a function to check the e-mail address like so:
1) Check that neither the local nor the domain part of the e-mail address has leading, trailing, or consecutive dots, and that the address is in the correct format, e.g.:
if (!preg_match("/^\.|\.\.|\.#|#\.|\.$/",$email) && preg_match("/^[^#]+?#[^\\.]+?\..+$/",$email)) {
This ensures there is an '#' symbol for the next part, and if it fails here, saves what would have been a lot of unnecessary computing.
2) Tokenize the e-mail address by '#':
$part = explode("#",$email);
3) Of course, there could be more than one '#', so if the array has more than 2 elements, loop through each and re-concatenate all but the final element, so that you get two strings: the local part (before the mandatory '#') and the domain part.
4) If the first element/local part of the address does not contain any quotation marks ("), then use this pattern:
$pattern = "/^[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+$/";
5) Else if the local part begins AND ends with quotation marks, use this pattern:
$pattern = "/^\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>#[]\E]+|\\\\\\\\|\\\\\")+\"$/";
6) Else if the local part contains TWO quotation marks (one only would invalidate), use this pattern:
$pattern = "/^(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+\.\"(?:[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~. (),:;<>#[]\E]+|\\\\\\\\|\\\\\")+\"\.)+[0-9A-Za-z\/\Q!#$%&'*+-=?^_`{}|~.\E]+$/";
7) Else the first part is invalid.
8) If the local part was valid: If the second element/domain part is either encapsulated within square brackets ([]) or contains NO square brackets (you can just use substr and substr_count for this, since it will be much faster than regex), and it matches the pattern:
preg_match("/^\[?[0-9A-Za-z\-\.]+\]?$/",$domainPart);
Then it is valid.
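Steps 2, 3 and 8 above could be sketched roughly as follows. This is not the answerer's code: split_address and domain_looks_valid are hypothetical names, and strrpos() is swapped in for the explode-and-rejoin loop described in step 3, since splitting at the last '#' gives the same two strings:

```php
<?php
// Steps 2-3: split at the LAST separator, so any earlier separators
// stay inside the local part. Returns null when there is no separator.
function split_address(string $email, string $sep = '#'): ?array
{
    $pos = strrpos($email, $sep);
    if ($pos === false) {
        return null;
    }
    return [substr($email, 0, $pos), substr($email, $pos + 1)];
}

// Step 8: the domain is either fully wrapped in one pair of square
// brackets or contains none at all, and matches the answer's pattern.
function domain_looks_valid(string $domain): bool
{
    $open  = substr_count($domain, '[');
    $close = substr_count($domain, ']');
    $bracketed = $open === 1 && $close === 1
        && substr($domain, 0, 1) === '[' && substr($domain, -1) === ']';
    $plain = $open === 0 && $close === 0;
    if (!$bracketed && !$plain) {
        return false;
    }
    return preg_match('/^\[?[0-9A-Za-z\-\.]+\]?$/', $domain) === 1;
}
```

The local-part patterns from steps 4 to 6 would then be applied to the first element returned by split_address().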
Note: According to the standard, e-mail addresses can actually contain comments (why, I have no idea). The comments are not actually part of the e-mail address, and get removed when it is used. For that reason, I didn't bother matching them.

Match multiple characters without repetition on a regular expression

I'm using PHP's PCRE, and there is one bit of the regex I can't seem to do. I have a character class with 5 characters [adjxz] which can appear or not, in any order, after a token (|) on the string. They all can appear, but they can only each appear once. So for example:
*|ad - is valid
*|dxa - is valid
*|da - is valid
*|a - is valid
*|aaj - is *not* valid
*|adjxz - is valid
*|addjxz - is *not* valid
Any idea how I can do it? a simple [adjxz]+, or even [adjxz]{1,5} do not work as they allow repetition. Since the order does not matter also, I can't do /a?d?j?x?z?/, so I'm at a loss.
Perhaps using a lookahead combined with a backreference like this:
\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}
If you know these characters are followed by something else, e.g. whitespace you can simplify this to:
\|(?!\S*(\S)\S*\1)[adjxz]{1,5}
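As a rough check of the first lookahead pattern against the examples from the question, here is a small sketch. The function name flags_are_unique is hypothetical, and the trailing $ is an added assumption that nothing follows the flag characters:

```php
<?php
// The negative lookahead rejects the match as soon as any allowed
// character is captured and then seen again later (\1).
function flags_are_unique(string $str): bool
{
    return preg_match('/\|(?![adjxz]*([adjxz])[adjxz]*\1)[adjxz]{1,5}$/', $str) === 1;
}

foreach (['*|ad', '*|dxa', '*|a', '*|aaj', '*|adjxz', '*|addjxz'] as $s) {
    printf("%s => %s\n", $s, flags_are_unique($s) ? 'valid' : 'not valid');
}
```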
I think you should break this in 2 steps:
A regex to check for unexpected characters
A simple PHP check for duplicated characters
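function strIsValid($str) {
    // The pipe must be escaped; unescaped, it acts as alternation and $matches[1] may be missing
    if (!preg_match('/^\*\|([adjxz]+)$/', $str, $matches)) {
        return false;
    }
    // Valid only if no character occurs more than once
    return strlen($matches[1]) === count(array_unique(str_split($matches[1])));
}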
I suggest using reverse logic where you match the unwanted case using this pattern
\|.*?([adjxz])(?=.*\1)

How to validate "(344) 004-1585" type of phone number

I mask the phone numbers like "(342) 004-1452" with jQuery. Now I am trying to validate this input with PHP. I tried
if(!preg_match('\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}', $_POST['tel']))
{
$errors['tel']='enter a valid phone number';
}
But I think my preg_match expression is not valid. What is the right preg_match expression to validate this data?
You're all over-thinking it. All you truly need to do is verify that the given string contains the proper number of numeric characters.
$input = '(342) 004-1452';
$stripped = preg_replace('/[^0-9]/', '', $input);
if( strlen($stripped) != 10 ) {
printf('%s is not a valid phone number.', $input);
} else {
printf('%s is a valid phone number. yay.', $input);
}
//output: (342) 004-1452 is a valid phone number. yay.
You can pretty-fy the phone number back from whatever garbled input someone has fed it with:
$phone_pretty = sprintf('(%s) %s-%s',
substr($stripped,0,3),
substr($stripped,3,3),
substr($stripped,6,4)
);
Your regular expression is missing delimiters. Wrap it with /:
'/\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}/'
If you've configured your environment to display warnings, you should be seeing one:
PHP Warning: preg_match(): Delimiter must not be alphanumeric or backslash
If you haven't turned on warnings, or have intentionally turned them off, you should stop developing PHP code until you turn them back on.
Your expression looks okay, but preg_match() needs you to supply delimiters for the start and end of the regular expression within the quotes.
These markers are typically slashes, but can actually be a number of other characters.
So adding slashes to your line of code gives the following:
if(!preg_match('/\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}/', $_POST['tel']))
If you want the string to be only a phone number and nothing else, you may also want to limit the regex by adding ^ at the beginning and $ at the end:
if(!preg_match('/^\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}$/', $_POST['tel']))
Hope that helps.
(that said, I would add that the phone number format you're checking only applies for US and countries in the US dialling plan; most international countries use different formats, so if you want to accept international visitors, you'll need a much looser regex than this)
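To make the effect of the ^ and $ anchors concrete, here is a small sketch comparing the two patterns from this answer. The function names are hypothetical and the sample inputs are made up for illustration:

```php
<?php
// Unanchored: succeeds if a phone number appears anywhere in the string.
function contains_phone(string $s): bool
{
    return preg_match('/\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}/', $s) === 1;
}

// Anchored: succeeds only if the whole string is a phone number.
function is_phone(string $s): bool
{
    return preg_match('/^\(?[2-9][0-8][0-9]\)?[-. ]?[0-9]{3}[-. ]?[0-9]{4}$/', $s) === 1;
}

var_dump(contains_phone('call (342) 004-1452 now')); // unanchored pattern
var_dump(is_phone('call (342) 004-1452 now'));       // anchored pattern
```

For form validation the anchored version is usually the one you want, since the unanchored one accepts any surrounding text.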

PHP Regex for checking A-Z a-z 0-9 _ and

What I need is not email validation.
It's simple.
Allow #hello.world, #hello_world or #helloworld, but "#helloworld." should be taken as #helloworld, and so should "#helloworld?".
In short, check for a letter or number after . and _; if there is none, take the string before it.
My existing RegEx is /#.([A-Za-z0-9_]+)(?=\?|\,|\;|\s|\Z)/ but it only handles #helloworld and not #hello.world or #hello_world.
Update:
So now I have a regex that deals with problem number 1, i.e. it allows #hello.world, #hello_world or #helloworld. But what about "#helloworld." and "#helloworld?", which should both be taken as #helloworld?
New RegEx: /#([A-Za-z0-9+_.-]+)/
Don't use a regex for that.
Use...
$valid = filter_var($str, FILTER_VALIDATE_EMAIL);
Regex will never be able to verify an email, only to do some very basic format checking.
The most comprehensive regex for matching email addresses was 8000 chars long, and that one is already invalid due to changes in what is accepted in emails.
Use some designed library for the checking if you need to get real verification, otherwise just check for # and some dots, anything more and you will probably end up invalidating perfectly legal email addresses.
Some examples of perfectly legal email addresses (leading and trailing " are for showing boundary only):
"dama#nodomain.se"
"\"dama\"#nodomain.se"
"da/ma#nodomain.se"
"dama#nõdomain.se"
"da.ma#nodomain.se"
"dama#παράδειγμα.δοκιμή"
"dama #nodomain .se"
"dama#nodomain.se "
You can use this regexp to validate email addresses
^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,6}$
For more information and more complete expressions you can check here
I hope this helps you
Try this:
\#.+(\.|\?|;|[\r\n\s]+)
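The rule stated in the question (keep a . or _ only when a letter or digit follows it) could also be sketched like this. It is an alternative to the patterns above, not taken from any of the answers; the function name extract_tag is hypothetical, and it assumes the tag starts with a letter or digit and never has two separators in a row:

```php
<?php
// A run of alphanumerics, optionally followed by groups of
// "one separator plus more alphanumerics". A trailing '.', '_' or '?'
// is left out because no alphanumeric follows it.
function extract_tag(string $text): ?string
{
    if (preg_match('/#([A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*)/', $text, $m) === 1) {
        return $m[1];
    }
    return null;
}
```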

Regular expression for e-mail domain (not basic e-mail verification)

I'm currently using
if(preg_match('~#(semo\.edu|uni\.uu\.se|)$~', $email))
as a domain check.
However I need to only check if the e-mail ends with the domains above. So for instance, all these need to be accepted:
hello#semo.edu
hello#student.semo.edu
hello#cool.teachers.semo.edu
So I'm guessing I need something after the # but before the ( which is something like "any random string or empty string". Any regexp-ninjas out there who can help me?
([^#]*\.)? works if you already know you're dealing with a valid email address. Explanation: it's either empty, or anything that ends with a period but does not contain a # sign. So student.cs.semo.edu matches, as does plain semo.edu, but not me#notreallysemo.edu. So:
~#([^#]*\.)?(semo\.edu|uni\.uu\.se)$~
Note that I've removed the last | from your original regex.
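A quick sketch that runs this pattern against the addresses from the question. The function name has_allowed_domain is hypothetical:

```php
<?php
// After the separator: an optional run of subdomains ending in a period,
// then one of the allowed domains at the very end of the string.
function has_allowed_domain(string $email): bool
{
    return preg_match('~#([^#]*\.)?(semo\.edu|uni\.uu\.se)$~', $email) === 1;
}

$samples = ['hello#semo.edu', 'hello#student.semo.edu',
            'hello#cool.teachers.semo.edu', 'me#notreallysemo.edu'];
foreach ($samples as $s) {
    printf("%s => %s\n", $s, has_allowed_domain($s) ? 'accepted' : 'rejected');
}
```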
You can use [a-zA-Z0-9\.]* to match none or more characters (letters, numbers or dot):
~#[a-zA-Z0-9\.]*(semo\.edu|uni\.uu\.se)$~
Well .* will match anything. But you don't actually want that. There are a number of characters that are invalid in a domain name (ex. a space). Instead you want something more like this:
[\w.]*
I might not have all of the allowed characters, but that will get you [A-Za-z0-9_.]. The idea is that you make a list of all the allowed characters in the square brackets and then use * to say none or more of them.
