regex assistance for validating an email address - php

I am trying to validate an email field. I took this regex from somewhere on here and used it on another form I made, where it works fine. Yet when I use it now, it's not matching.
All I am trying to do is to check the email and if it is good then log it in the proper field in the db.
To avoid pasting a lot of code, I have stripped this down to the problem lines; the next few lines are partly pseudocode.
Essentially, vars are these:
$theEmail = $_post email from first page here
$regEx ='#^[a-z0-9.!\#$%&\'*+-/=?^_`{|}~]+@([0-9.]+|([^\s]+\.+[a-z]{2,6}))$#si';
and my php is this
//essentially other field validation will go here...for now testing only empty.
if (!empty($theEmail)) {
    if (preg_match($regEx, $formEmail)) {
        //send it through to db.
    } else {
        //error stuff here
    }
}
Essentially, this never comes out true. The email never validates no matter what I do, and as I said, I wrote another, more complicated form that validates data just fine.
Not sure what is going on.

I would suggest using filter_var instead.
if (filter_var($theEmail, FILTER_VALIDATE_EMAIL)) {
//send it through to db.
} else {
//error stuff here
}

/^[a-z0-9.!\#$%&\'*+-=?^_`{|}~]+@([0-9.]+|([^\s]+\.+[a-z]{2,6}))$/
I removed the leading # and the trailing #si, and took out the / before the = since it was giving me problems. This generates a match on my e-mail address here:
<?php
$theEmail = 'me@davebel.com';
$regEx ='/^[a-z0-9.!\#$%&\'*+-=?^_`{|}~]+@([0-9.]+|([^\s]+\.+[a-z]{2,6}))$/';
print_r(preg_match($regEx, $theEmail));
?>
That said, this regex is very complex for something like e-mail validation. I would recommend refining and fine-tuning it before putting it into production.

With email validation there are simple solutions that catch 99 % of all mistakes and complex solutions that might catch a tenth of a percent more, yet be unreadable.
Go the easy route and just check for something like
.+@.+\..+
Yes, it will allow an email address like a@b.c, but that's probably a smaller price to pay than a user who cannot register because your 500-character regex has a mistake in it somewhere and rejects a valid address.
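A minimal sketch of that easy route in PHP, assuming you want the whole input to match. The ^ and $ anchors and the function name looks_like_email are my additions for illustration, not part of the pattern above:

```php
<?php
// Hypothetical helper: a deliberately loose sanity check.
// It only requires "something @ something . something".
function looks_like_email(string $address): bool
{
    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match('/^.+@.+\..+$/', $address) === 1;
}

var_dump(looks_like_email('a@b.c'));              // bool(true)
var_dump(looks_like_email('no-at-sign.example')); // bool(false)
```

This accepts plenty of addresses that cannot receive mail, which is the trade-off described above: it catches obvious typos and leaves real verification to something else.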

Give this a try! Hopefully it will resolve your query, although there are countless regular expressions for email:
^[a-z0-9,!#\$%&'\*\+/=\?\^_`\{\|}~-]+(\.[a-z0-9,!#\$%&'\*\+/=\?\^_`\{\|}~-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*\.([a-z]{2,})$
For testing visit
Regular Expression Tester

Related

Form field Validation custom email requirements

I am looking to create form validation on an email text field.
I have previously used validation to ensure that a correctly formed email address is entered.
Here, though, I want a more custom rule that allows only email addresses ending in .ac.uk.
A user would be able to provide any university/college/institution as long as the last 6 characters of the string are .ac.uk, with the general format for the address as follows: email@university.ac.uk
A solution in PHP is preferred. I am currently looking at building the rule on the end part of this expression:
^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$
Making this part *(\.[a-z]{2,3}) relate to the .ac.uk
many thanks, much appreciated
Jeanclaude
I would first run the email through filter_var($email, FILTER_VALIDATE_EMAIL) rather than using a simple regex. It's not perfect (I've found a few edge cases that don't validate correctly) but it works well. Once you know it's a valid email address you can simply trust substr($email, -6) == '.ac.uk' and be done with it. Something like:
if (filter_var($email, FILTER_VALIDATE_EMAIL)
    && strtolower(substr(trim($email), -6)) === '.ac.uk') {
    // Valid
}

PHP - Filter_var alternative?

I built a php script to output data posted in a form, but I ran into a problem. The server the website is going to run on, runs PHP 5.1.6. This version of PHP does not support filter_var.
I need an alternative in the short term (preferably yesterday), and can't find anything straightforward on Google or Stack Overflow.
Mayhap someone here ran into the same issue in the past and has a quick fix for me?
This code:
$email= filter_var($_POST['email'], FILTER_SANITIZE_EMAIL);
$answer= filter_var($_POST['answer'], FILTER_SANITIZE_STRING);
needs to be compatible with PHP 5.1.6, so that the email address is checked for validity and no malicious code gets through in either field. Any tips?
Thanks so much!
For emails you can use a regex (for example: http://www.totallyphp.co.uk/validate-an-email-address-using-regular-expressions).
For strings you could also use a regex, but that is a little too heavy, so maybe use a combination of mysql_real_escape_string() if you send the value to a DB and htmlentities() for HTML output:
http://de.php.net/manual/en/function.mysql-real-escape-string.php
http://www.php.net/manual/en/function.htmlentities.php
I don't think that the filter_var-function does far different than just using these methods
You can install the extension via PECL to PHP 5.1:
http://pecl.php.net/package/filter
I would generally use a regular expression. It gives you the most flexibility, and there are many useful resources about it on the internet. Take a look here or here.
Using the information I was given in the previous answers, here's how I fixed my problem:
<?php // Retrieve POST data and sanitize it: trim string, no HTML, plain text
$variable1 = htmlentities(trim($_POST['input1']), ENT_NOQUOTES);
$variable2 = htmlentities(trim($_POST['input2']), ENT_NOQUOTES);
$emailaddress = $_POST['email']; // the email address is checked below
if (eregi("^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$", $emailaddress)) { // check email address and if legit, do this:
    echo '<p>The e-mail address given is valid.</p>';
} else { // if email is not legit, do this:
    echo '<p>The e-mail address given is not valid.</p>';
}
?>
I hope this helps someone :)
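A note for anyone reading this on a newer PHP: eregi() was deprecated in PHP 5.3 and removed in PHP 7, so the same check is usually written with preg_match() and the i modifier instead. A rough sketch, assuming the same pattern as above with @ as the separator; is_plausible_email is a hypothetical helper name:

```php
<?php
// Hypothetical helper: same pattern as the eregi() version,
// wrapped in / delimiters with the case-insensitive i modifier.
function is_plausible_email($address)
{
    $pattern = '/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})$/i';
    return preg_match($pattern, $address) === 1;
}

var_dump(is_plausible_email('USER@Example.com'));    // bool(true)
var_dump(is_plausible_email('user@example'));        // bool(false)
var_dump(is_plausible_email('user@example.museum')); // bool(false)
```

The last call shows a limitation of this particular pattern rather than of preg_match(): the {2,3} at the end rejects any top-level domain longer than three letters.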

PHP+CSS Obfuscation - PHP ord THEN PHP strrev + CSS reverse text, how to get the special chars validated backwards?

I have been reading up on email obfuscation.
I found an interesting post entitled Best Method for Email Obfuscation? by Jeff Starr, where he describes various tests performed over 1.5 years by Silvan Mühlemann.
According to this study, CSS obfuscation was 100% effective throughout the 1.5-year test, despite its various downsides.
Seeing as I was playing around with this method of obfuscation before, I decided to give it another go, with the addition of a PHP function that I came across.
Here is the function:
// Converts email and tel into html special characters
function convert_email_adr($email)
{
$pieces = str_split(trim($email));
$new_mail = '';
foreach ($pieces as $val)
{
$new_mail .= '&#'.ord($val).';';
}
return $new_mail;
}
And here is the php using that function.
$lstEmail = convert_email_adr("{$row['email']}");
This does exactly as described, and I would assume that it works out quite well, provided the harvesters have not written code that identifies the string of special chars and decodes them.
So I decided: what if I combined these two methods? That is, I break the string into special chars, then use strrev on it, then use CSS to reverse the string... Simple...
Here is the added piece of PHP that reverses the actual string as seen in the page source:
$lstEmail = strrev($lstEmail);
and the css to reverse it again on the client side:
span.obfuscate { unicode-bidi:bidi-override; direction: rtl; }
And the html:
<p><span class='listHeadings'>eMail:</span> <span class='obfuscate' style='font-size:0.6em;'><a href='mailto: $lstEmail?subject=Testing 123'>$lstEmail</a></span></p>
But the problem is that the string is now in reverse and will not be validated... Here is an example:
;901#&;111#&;99#&;64#&;801#&;501#&;79#&;901#&;301#&;46#&;411#&;101#&;001#&;011#&;111#&;611#&;011#&;79#&;811#&;301#&;501#&;79#&;411#&;99#&
What happens is that the special characters are not decoded into actual characters, so all you see is the string of special character in reverse.
There is also the downside as described by Jeff Starr, that you cannot use the css method in mailto as you cannot use the span tag within the href attribute.
So now I am truly stuck on how to go about this task. I guess I might be able to live with forcing people to type in my email address themselves if they would like to mail me... But, on the other hand, I am not so sure about that.
Then there comes the task of validating special characters in reverse...
Would anyone be able to provide me with any type of input or support in this regard? Also any suggestions in different, LEGITIMATE ways of going about this task would be greatly appreciated!!
I say legitimate because i plan to use these functions in one of my live projects that is a business listing website (currently using the php function above)... The last thing i want to do is start playing around and create a gap and let out a bunch of info for the spammers! I think that would be very bad for business...
As a webmaster I always put my email in plain text on the contact page. It's the most comfortable solution for visitors, and it works regardless of whether CSS or JS is supported.
I have done this with several email addresses for 10 years. Yes, I get some spam, but not that much, about 3-5 a day. I've got a good spam filter; I look over the spam once a week and delete it.
I do not use mailto because a lot of people have not configured a local email program and do not know what to do with the popup when clicking the mailto link.
Just reverse it before you obfuscate it...
$email = 'blah@whatever.co.uk';
$new = convert_email_adr($email);
echo '<span style="unicode-bidi:bidi-override; direction: rtl;">'.$new.'</span>';
function convert_email_adr($email, $reverse = true, $obfuscate = true)
{
$email = trim($email);
if($reverse)
{
$email = strrev($email);
}
if($obfuscate)
{
$pieces = str_split($email);
$email = '';
foreach($pieces as $piece)
{
$email .= '&#'.ord($piece).';';
}
}
return $email;
}
Why don't you use it that way?
function convert_email_adr($email)
{
$pieces = str_split(strrev(trim($email)));
$new_mail = '';
foreach ($pieces as $val)
{
$new_mail .= '&#'.ord($val).';';
}
return $new_mail;
}
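To see why the order matters, here is a small sketch (the name reverse_then_encode is made up for illustration). Reversing the plain address first and encoding afterwards keeps every numeric entity intact, so the browser can still decode each one; the CSS rule then flips the decoded characters back into reading order. The round trip through html_entity_decode() only simulates what the browser does:

```php
<?php
// Hypothetical helper: reverse the plain text, then entity-encode it.
function reverse_then_encode($email)
{
    $encoded = '';
    foreach (str_split(strrev(trim($email))) as $char) {
        // Each character becomes a whole entity such as &#64;
        $encoded .= '&#' . ord($char) . ';';
    }
    return $encoded;
}

$encoded = reverse_then_encode('a@b.c');
echo $encoded, "\n";                             // &#99;&#46;&#98;&#64;&#97;
echo strrev(html_entity_decode($encoded)), "\n"; // a@b.c
```

Applying strrev() after encoding, as in the question, reverses the entity syntax itself, which is why the page showed the raw reversed codes instead of characters.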
Generally, a good solution to this is to provide a layer of abstraction around the email address entirely, by which I mean instead of just the email address, providing a contact form. They fill in their info, submit it, and your server sends along the information to the proper email address.
That's not an especially scalable approach, though; it is mostly applicable to a single "contact me" situation, not a "here are our listings of companies to contact" situation, in which case obfuscation runs directly counter to your goal of making sure customers can contact the targets as easily as possible. In that case you generally want to go with good spam protection.

How to validate non-english (UTF-8) encoded email address in Javascript and PHP?

Part of a website I am currently working on contains a registration process where users have to provide their email address. Just recently I became aware that non-ASCII domains are possible (and so are non-ASCII email addresses).
My backend is a UTF-8 encoded MySQL database, and I expect users in any locale to be able to enter their email address, but I don't know how to validate this kind of address.
Currently I am testing jQuery Tools; it validates English email addresses correctly but fails on non-ASCII ones. I also need to do the same on the server side with PHP. Is there a regular expression that can validate this kind of email address?
I have tried this, but it fails in jQuery Tools (this is just an example for demonstration; I don't understand it either):
闪闪发光@闪闪发光.com
Also, what will happen when they type their English email address (jonesmith@somemail.com) with their own IME? Can this be validated with the regular expression we currently have for English email validation? For now I don't have to worry about whether the email address exists or not.
Thanks
Attempting to validate email addresses may not be a good idea. The specifications (RFC5321, RFC5322) allow for so much flexibility that validating them with regular expressions is literally impossible, and validating with a function is a great deal of work. The result of this is that most email validation schemes end up rejecting a large number of valid email addresses, much to the inconvenience of the users. (By far the most common example of this is not allowing the + character.)
It is more likely that the user will (accidentally or deliberately) enter an incorrect email address than an invalid one, so actually validating is a great deal of work for very little benefit, with possible costs if you do it incorrectly.
I would recommend that you just check for the presence of an @ character on the client and then send a confirmation email to verify it; it's the most practical way to validate and it confirms that the address is correct as well.
Since 5.2, PHP has built-in validation for email addresses. But I'm not sure if it works for UTF-8 encoded strings:
echo filter_var($email, FILTER_VALIDATE_EMAIL);
In the original PHP source code you will find the reg exp for validating email, this can be used for manually validating when using PHP < 5.2.
Update
idn_to_ascii() can be used to "Convert domain name to IDNA ASCII form." Which then can be validated with filter_var($email, FILTER_VALIDATE_EMAIL);
// International domains
if (function_exists('idn_to_ascii') && strpos($email, '@') !== false) {
    $parts = explode('@', $email);
    $email = $parts[0].'@'.idn_to_ascii($parts[1]);
}
$is_valid = filter_var($email, FILTER_VALIDATE_EMAIL);
As offered by Mario, playing around a bit, I came up with the following regex to validate non-standard email address:
^([\p{L}\_\.\-\d]+)@([\p{L}\-\.\d]+)((\.(\p{L}){2,63})+)$
It would validate any proper email address with all kind of Unicode letters, with TLD limitations from 2 to 63 characters.
Please check it and let me know if there are any flaws.
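If you want to try this pattern from PHP, note that \p{L} only matches whole Unicode letters when the u modifier is set and the subject is valid UTF-8. A minimal sketch, assuming the pattern above with @ as the separator; the function name is hypothetical:

```php
<?php
// Hypothetical helper: the pattern above wrapped in / delimiters,
// with the u modifier so pattern and subject are treated as UTF-8.
function is_unicode_email($email)
{
    $pattern = '/^([\p{L}\_\.\-\d]+)@([\p{L}\-\.\d]+)((\.(\p{L}){2,63})+)$/u';
    // preg_match() returns false for invalid UTF-8 input, so compare with 1.
    return preg_match($pattern, $email) === 1;
}

var_dump(is_unicode_email('闪闪发光@闪闪发光.com')); // bool(true)
var_dump(is_unicode_email('no at sign.com'));        // bool(false)
```

Like any regex for this job it is stricter than the RFCs in some places; for example, it rejects a + in the local part.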
Example Online
a reg exp could be something like this:
[^ ]+@[^ ]+\.[^ ]{2,6}
Got this idea from Javascript tutorial page. It is basic but it works for me without worrying about complexity of regular expressions and unicode standards.
Client side validation
if(!$.trim(value).length) {
return false;
}
else {
AtPos = value.indexOf("@");
StopPos = value.lastIndexOf(".");
if (AtPos == -1 || StopPos == -1) {
return false;
}
if (StopPos < AtPos) {
return false;
}
if (StopPos - AtPos == 1) {
return false;
}
return true;
}
Serverside validation
if (!isset($_POST['emailaddr']) || trim($_POST['emailaddr']) == "") {
    //Error: Email required
}
else {
    $atpos = strpos($_POST['emailaddr'], '@');
    // last dot, to match lastIndexOf(".") in the client-side check
    $stoppos = strrpos($_POST['emailaddr'], '.');
    if (($atpos === false) || ($stoppos === false)) {
        //Error: invalid email
    }
    else {
        if ($stoppos < $atpos) {
            //Error: invalid email
        }
        else {
            if (($stoppos - $atpos) == 1) {
                //Error: invalid email
            }
        }
    }
}
Though it still has some loopholes, I guess users will not be fooling around with this stuff. Also, real validation is required for serious stuff, as suggested by 'Jeremy Banks'.
Hope this will be helpful for somebody else too.
Thanks and regards to all
On this subject I liked this page so much that I set up a blog exposing sites that do validation wrong (contributions gratefully received - don't let yours be on it!).
As far as using regexes go, those that say "it's wrong", tend to be light on alternatives, and TBH validation to the last letter of the RFC isn't really that critical - for example while noddy+!#$%&'*-/=?+_{}|~test@gmail.com is a perfectly valid address, it's not too unreasonable to reject it given that a surprisingly large proportion of users can't even type 'hotmail' correctly. Some domains are also quite restrictive on user names anyway, particularly hotmail. So I'm in favour of regexes that are demonstrably reasonable, and my favourite source for that is this page, though I don't like their current JS 'winner' and it would help if they set up a public test page.
jQuery's validate plugin uses this regex which is interestingly constructed, quite similar in style (but smaller!) to the ex-parrot one (actually my ISP!) linked by @powtac .
What about something like this:
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
mb_ereg('[\w]+@[\w]+\.com', $mail);

Calculating difference between username and email in javascript

For security reasons I want users on my website not to be able to register a username that resembles their email address. Someone with the email address user@domain.com can't register as user or us.er, etc.
For example, I want this not to be possible:
tester -> tester@mydomain.com (wrong)
tes.ter -> tester@mydomain.com (wrong)
etc.
But I do want to be able to use the following:
tester6 -> tester@mydomain.com (good)
etc.
//edit
tester6 is wrong too. I meant user6 -> tester@mydomain.com (good).
Does anyone have an idea how to achieve this, or something as close as possible? I am checking this in JavaScript, and after that on the server in PHP.
Ciao!
PS: Maybe there is some jQuery plugin to do this, but I can't find one so far. The downside of using a plugin for this, though, is that I have to implement the same thing in PHP. If it is a long plugin, it will take some time to translate.
//Edit again
If I only check the part before the @, they can still use userhotmailcom, or usergmail, etc. If they supply that, their email is obvious.
Typically, I use the Levenshtein distance algorithm to check whether a password looks like a login.
PHP has a native levenshtein function and here is one written in JavaScript.
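A minimal sketch of the server-side half using PHP's built-in levenshtein(). The function name, the stripping of punctuation, and the threshold of 2 edits are all my assumptions for illustration; you would tune the threshold to taste:

```php
<?php
// Hypothetical helper: true when the username is within $maxDistance
// edits of the part of the email address before the @.
function username_resembles_email($username, $email, $maxDistance = 2)
{
    // Keep only the part before the last @ (the whole string if there is none).
    $atPos = strrpos($email, '@');
    $local = ($atPos === false) ? $email : substr($email, 0, $atPos);

    // Drop dots, hyphens, underscores and other punctuation; ignore case.
    $username = strtolower(preg_replace('/[^a-z0-9]/i', '', $username));
    $local    = strtolower(preg_replace('/[^a-z0-9]/i', '', $local));

    return levenshtein($username, $local) <= $maxDistance;
}

var_dump(username_resembles_email('tes.ter', 'tester@mydomain.com')); // bool(true)
var_dump(username_resembles_email('tester6', 'tester@mydomain.com')); // bool(true)
var_dump(username_resembles_email('user6', 'tester@mydomain.com'));   // bool(false)
```

This only compares against the local part, so a name like userhotmailcom from the question's second edit would need an additional comparison against the full address with punctuation removed.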
Something like this?
var charsRe = /[.+]/g; // Add your characters here
if (username.replace(charsRe, '') == email.split('@')[0].replace(charsRe, ''))
doError();
If all you want is to disallow user names that vary from the email address only with periods (.), you can remove periods from the user name and compare it with email address.
//I don't know php - translating this pseudo code won't be hard
$email = "someone@something.com";
$emailname = $email.substring(0, $email.indexOf('@'));
$uname = "som.e.on.e";
$uname = $uname.replace(/\./g, "");//regex matching a '.' globally
if($uname === $emailname)
showInvalidNameErrorMessage();
Modified regex to prevent hyphens and underscores /[\-._]/g
Well, I am a newbie PHP developer. But the answer I have in mind is: wouldn't it be great if you just allowed them to register only with their email address (which won't be shared with others), then asked for their first name and last name separately and only showed their first name in public content (i.e. blogs, etc.)? I am not an expert in programming, so if I am wrong please correct me. I also still couldn't understand what you mean by security here. Sorry for the bad English, I am not a native English speaker.
