I've written this regex to check for valid emails: /^[-a-z0-9._]+@[-a-z0-9._]+\.+[a-z]{2,6}$/i
I want it to work for emails like name1+name2@domaine.com
How can I fix this regex?
I have a simpler solution.
if (filter_var($email, FILTER_VALIDATE_EMAIL))
{
//true
}
This is sufficient in most cases. It runs the check in C, which is faster, but if you want control over the regex in your application, the regex below is the one used for this check:
/^((\\\"[^\\\"\\f\\n\\r\\t\\b]+\\\")|([\\w\\!\\#\\$\\%\\&\\'\\*\\+\\-\\~\\/\\^\\`\\|\\{\\}\\=\\?]+(\\.[\\w\\!\\#\\$\\%\\&\\'\\*\\+\\-\\~\\/\\^\\`\\|\\{\\}\\=\\?]+)*))@((\\[(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))\\])|(((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9]))\\.((25[0-5])|(2[0-4][0-9])|([0-1]?[0-9]?[0-9])))|((([A-Za-z0-9\\-])+\\.)+[A-Za-z\\-]+))$/D
Another tip: a user may enter an address such as invalid@dontexists.com, which passes a format check even though the domain may not exist. If you want to make sure that dontexists.com is running a mail server, do:
$has_mx_server = (bool)checkdnsrr($domain,"MX");
If the domain has a registered MX record, the chance of the address being fake drops by a good chunk.
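A minimal sketch combining the two checks above: a format check with filter_var(), then an optional MX lookup with checkdnsrr(). The helper names email_domain() and passes_email_checks() and the $check_mx switch are hypothetical, and the type declarations assume PHP 7.1+.

```php
<?php
// Hypothetical helpers; assumes PHP 7.1+ for the ?string return type.
function email_domain(string $email): ?string
{
    // Everything after the last '@' is the domain.
    $at = strrpos($email, '@');
    return $at === false ? null : substr($email, $at + 1);
}

function passes_email_checks(string $email, bool $check_mx = true): bool
{
    // Step 1: format check (the filter_var() suggestion above).
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    if (!$check_mx) {
        return true;
    }
    // Step 2: checkdnsrr() performs a live DNS query, so this needs network access.
    return checkdnsrr(email_domain($email), 'MX');
}
```

Passing false for $check_mx skips the DNS step, which keeps the format check usable offline or in tests.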
The first part
[-a-z0-9._]+
does not accept the plus sign right now. Expand it:
[-+a-z0-9._]+
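A quick way to check the expanded class against the example address from the question. The sketch compares the original pattern, the pattern with + added to the first class, and the same class with the + escaped; variable names are arbitrary.

```php
<?php
// The question's pattern (no '+' allowed before '@'), the expanded class,
// and the same class with '+' escaped. Only the first character class differs.
$original = '/^[-a-z0-9._]+@[-a-z0-9._]+\.+[a-z]{2,6}$/i';
$expanded = '/^[-+a-z0-9._]+@[-a-z0-9._]+\.+[a-z]{2,6}$/i';
$escaped  = '/^[-\+a-z0-9._]+@[-a-z0-9._]+\.+[a-z]{2,6}$/i';

foreach (['name1+name2@domaine.com', 'name1@domaine.com'] as $email) {
    printf(
        "%s: original=%d expanded=%d escaped=%d\n",
        $email,
        preg_match($original, $email),
        preg_match($expanded, $email),
        preg_match($escaped, $email)
    );
}
```

For the plus address only the original pattern prints 0; the escaped and unescaped classes behave identically.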
Try
/^[-a-z0-9._+]+@[-a-z0-9._]+\.+[a-z]{2,6}$/i
Place the + inside the square brackets; you can also escape it with a backslash:
/^[-a-z0-9._\+]+@[-a-z0-9._]+\.+[a-z]{2,6}$/i
Outside a character class, "+" is a metacharacter meaning "one or more occurrences", so to match the literal character there it must be escaped. Inside square brackets it is already literal, so the backslash is optional but harmless.
I want to validate email addresses and websites in a comment box. When someone submits a comment, I want to check whether it contains an email address or a website and, if so, remove it.
I use the regular expression below for email:
"/(?:[a-z0-9!#$%&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/"
The expression above validates ordinary email addresses, but I also want to catch forms such as email[at]email[dot]com, email{at}email{dot}com and email(at)email(dot)com.
Similarly, for website validation I use the expression below:
"/((((http|https|ftp|ftps)\:\/\/)|www\.)[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,4}(\/\S*)?)/"
But I also want to catch websites written like website[dot]com and www[dot]website[dot]com.
Basically, wherever your regex matches the @ and . characters in the email pattern, or the . in the web URL pattern, you need to extend it with the alternative spellings you expect. So,
@ should be written as (?:@|[[({]at[\]})])
And,
\. should be written as (?:\.|[[{(]dot[\]})])
wherever they appear in your regex; it will then catch those strings as well.
Here is a modified regex for email.
(?:[a-z0-9!#$%&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")(?:@|[[({]at[\]})])(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(?:\.|[[{(]dot[\]})]))+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
In the same way, you can replace the . in your website regex, and the modified regex becomes:
(?:(?:(?:http|https|ftp|ftps)\:\/\/)|www(?:\.|[[{(]dot[\]})]))(?:[a-zA-Z0-9.-]|[[{(]dot[\]})])+(?:\.|[[{(]dot[\]})])[a-zA-Z]{2,4}(\/\S*)?
Besides [dot], {dot} and (dot), the regex will also match mismatched forms such as [dot}. Since you are trying to detect such strings, matching these is an added advantage rather than a problem, unless your context requires otherwise.
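A minimal sketch of how the @ and . alternations above might be used to scrub a submitted comment with preg_replace(). AT, DOT, strip_emails() and strip_urls() are hypothetical names, and the email and URL patterns are deliberately simplified stand-ins for the full expressions above, so treat them as an illustration rather than a drop-in replacement.

```php
<?php
// The two alternations from the answer, as reusable PCRE fragments.
const AT  = '(?:@|[[({]at[\]})])';
const DOT = '(?:\.|[[{(]dot[\]})])';

// Simplified email pattern (NOT the full RFC-style expression above).
function strip_emails(string $comment): string
{
    $email = '/[a-z0-9._%+-]+' . AT . '(?:[a-z0-9-]+' . DOT . ')+[a-z]{2,}/i';
    return preg_replace($email, '[removed]', $comment);
}

// Simplified URL pattern: like the answer's, it needs a scheme or a leading "www" + dot.
function strip_urls(string $comment): string
{
    $url = '/(?:(?:https?|ftps?):\/\/|www' . DOT . ')(?:[a-z0-9-]+' . DOT . ')+[a-z]{2,}(?:\/\S*)?/i';
    return preg_replace($url, '[removed]', $comment);
}

echo strip_emails('Mail me: john[at]example[dot]com today'), "\n";
echo strip_urls('See www[dot]example[dot]com now'), "\n";
```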
I am using the following function to extract email addresses from text.
function is_valid_email($email) {
if (preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.([a-z]){2,4})$/', $email)) return true;
else return false;
}
It works very smoothly, with one problem:
an address containing a dash does not work.
For example,
info-test@web-site.com comes out as test@web
Please advise.
A dash has a special meaning inside a character class (it denotes a range), so escaping it with a backslash makes it unambiguously literal. (A dash placed first or last in the class is already literal.) The updated code:
function is_valid_email($email) {
if (preg_match('/^[_a-z0-9\-]+(\.[_a-z0-9\-]+)*@[a-z0-9\-]+(\.[a-z0-9\-]+)*(\.([a-z]){2,4})$/', $email)) return true;
else return false;
}
You should escape the dash character, as it can have a special meaning (a range) in this context:
[_a-z0-9\-]
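With the $emailss typo corrected, the question's pattern can be exercised directly. The sketch below also checks whether escaping the dash changes anything: a - placed last in a character class is already literal, while a - between two characters forms a range. The sample strings are made up for illustration.

```php
<?php
// The question's function with $emailss corrected to $email.
function is_valid_email($email)
{
    return preg_match('/^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.([a-z]){2,4})$/', $email) === 1;
}

// A trailing '-' in a class is literal, so escaping it gives the same result.
$unescaped = preg_match('/^[_a-z0-9-]+$/', 'info-test');
$escaped   = preg_match('/^[_a-z0-9\-]+$/', 'info-test');

// Between two characters '-' forms a range: [a-c] matches 'b', not '-'.
$range = preg_match('/^[a-c]$/', '-');

var_dump(is_valid_email('info-test@web-site.com'), $unescaped, $escaped, $range);
```

Since is_valid_email('info-test@web-site.com') already returns true here, the truncation to test@web described in the question probably comes from code that isn't shown.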
There are myriad problems with that e-mail validation regexp. For example, it rejects many perfectly valid modern TLDs, including national ones, because it assumes a TLD has at most 4 letters. It doesn't allow dots . in arbitrary positions in the user account part, it doesn't allow pluses +, etc.
Generally, good practice for validating e-mails boils down to:
Minimal validation: just check that there's an @ in it, and that's all.
Just send the e-mail and don't check anything else. If it gets delivered, it's indeed a valid e-mail.
For more details, take a look at http://davidcel.is/blog/2012/09/06/stop-validating-email-addresses-with-regex/ or any similar articles.
I know many have commented that there are lots of existing email validators, but for this task I have to do it as described below.
I need to validate emails like this:
alphanumeric characters, followed by @, followed by alphanumeric characters, followed by ., followed by 2 to 4 more alphanumeric characters
This is what I have done. I know it's the last part, after the ., that I messed up, but I couldn't find where:
preg_match("/^([0-9]|[a-z])([0-9]|[a-z]|[_-])*@([0-9]|[a-z])*\.([0-9][a-z]){2,4}$/i","")
At the start I used ([0-9]|[a-z])([0-9]|[a-z]|[_-])* because I didn't want people to be able to start with _ or -, so the first character is forced to be a number or letter.
Countless people have written their own regex for email validation. If you only care about the email format, you can just use
$email = filter_var($email, FILTER_VALIDATE_EMAIL);
and if the result is false (empty), the input was not a validly formatted email address.
(As an extra step, you could validate the domain with checkdnsrr(): http://php.net/manual/en/function.checkdnsrr.php)
Try this:
^[0-9a-z_\-]+@[0-9a-z_\-]+\.[0-9a-z]{2,4}$
But as others have said, there are ready-to-use regexes that are much better than reinventing the wheel. Also, this approach does not match all valid addresses and accepts some invalid ones.
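A sketch of one reading of the rule stated in the question, which also keeps the "no _ or - at the start" requirement; matches_assignment_rule() is a hypothetical name. The question's own pattern is included to show why it fails: ([0-9][a-z]){2,4} demands digit-then-letter pairs after the dot, so an ordinary ending such as com never matches.

```php
<?php
// Hypothetical helper for the assignment's rule:
// letter/digit first, then letters/digits/_/-, "@", letters/digits, ".", 2-4 letters/digits.
function matches_assignment_rule(string $email): bool
{
    return preg_match('/^[a-z0-9][a-z0-9_-]*@[a-z0-9]+\.[a-z0-9]{2,4}$/i', $email) === 1;
}

// The question's pattern, unchanged apart from the @.
$question_pattern = '/^([0-9]|[a-z])([0-9]|[a-z]|[_-])*@([0-9]|[a-z])*\.([0-9][a-z]){2,4}$/i';

var_dump(matches_assignment_rule('abc_1@example.com'));  // allowed
var_dump(matches_assignment_rule('_abc@example.com'));   // leading underscore
var_dump(preg_match($question_pattern, 'abc@example.com'));
```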
What is the reason for the email validation? It is very frustrating when you try to enter your email and can't because of an overly strict validator. I think it is enough to check for the presence of the '@' and '.' signs, in case the user accidentally left one out.
$res = preg_match("/@[^@\.]*\./", $str);
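The one-liner above, wrapped in a function so its behaviour is easy to check; looks_like_email() is a hypothetical name. It only requires an @ followed later by a ., with no second @ in between, which matches the stated intent of catching an accidentally omitted sign rather than enforcing full syntax.

```php
<?php
// An '@' followed, somewhere later, by a '.' with no second '@' in between.
function looks_like_email(string $str): bool
{
    return preg_match('/@[^@\.]*\./', $str) === 1;
}

var_dump(looks_like_email('user@example.com')); // true
var_dump(looks_like_email('userexample.com'));  // false: no @
var_dump(looks_like_email('user@examplecom'));  // false: no . after the @
```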
I have the following part of a validation script:
$invalidEmailError .= "<br/>» You did not enter a valid E-mail address";
$match = "/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/";
That's the expression; here is the validation:
if ( !(preg_match($match,$email)) ) {
$errors .= $invalidEmailError; // checks validity of email
}
I think that's enough information; let me know if more is needed.
Basically, the message "You did not enter a valid E-mail address" gets echoed no matter what, whether a correct or an incorrect email address is entered.
Does anyone have an idea why?
EDIT: I'm running this on localhost (using Apache). Could that be why preg_match isn't working?
Thanks!
Amit
Your regex only includes [A-Z], not [a-z]. Try
$match = "/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i";
to make the regex case-insensitive.
You can test this live on http://regexpal.com.
However, I'd advise you to try one of the expressions on the page mentioned by strager: http://fightingforalostcause.net/misc/2006/compare-email-regex.php. They have been refined over time and will probably behave better. That said, Gmail users will be happy with yours, since it accepts plus aliases, which many validators incorrectly reject.
You likely got the regular expression you're using from regular-expressions.info. On that page, the author states (emphasis added):
If you want to use the regular expression above, there's two things you need to understand. First, long regexes make it difficult to nicely format paragraphs. So I didn't include a-z in any of the three character classes. This regex is intended to be used with your regex engine's "case insensitive" option turned on. (You'd be surprised how many "bug" reports I get about that.) Second, the above regex is delimited with word boundaries, which makes it suitable for extracting email addresses from files or larger blocks of text. If you want to check whether the user typed in a valid email address, replace the word boundaries with start-of-string and end-of-string anchors, like this: ^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$.
To solve this problem, add the i PCRE flag after your regular expression.
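A small sketch reproducing both points from the quoted passage with preg_match(): without the i modifier a lowercase address never matches, and \b boundaries find an address inside longer text where ^ and $ anchors do not. The sample address and variable names are made up for illustration.

```php
<?php
$email = 'amit@example.com';

// The question's pattern: only [A-Z] and no i modifier, so lowercase input never matches.
$no_flag = preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/', $email);
// The same pattern with the i modifier.
$with_flag = preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i', $email);

// \b only needs word boundaries, so it also finds an address inside longer text;
// ^ and $ require the whole input to be the address.
$text     = 'Contact: amit@example.com';
$bounded  = preg_match('/\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}\b/i', $text);
$anchored = preg_match('/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}$/i', $text);

printf("no_flag=%d with_flag=%d bounded=%d anchored=%d\n", $no_flag, $with_flag, $bounded, $anchored);
```

For validating a form field, the anchored, case-insensitive form is the one to use.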
You can always debug your regex with a simpler tool (I'm quite fond of Notepad++ for this) and test iteratively, i.e. make the expression more or less complicated and see whether that fixes or breaks things.
I'm looking for a decent regex to match a URL (a full URL with scheme, domain, path etc.)
I would normally use filter_var but I can't in this case as I have to support PHP<5.2!
I've searched the web but can't find anything that I'm confident will be fool-proof, and all I can find on SO is people saying to use filter_var.
Does anybody have a regex that they use for this?
My code (just so you can see what I'm trying to achieve):
function validate_url($url){
if (function_exists('filter_var')){
return filter_var($url, FILTER_VALIDATE_URL);
}
return preg_match(REGEX_HERE, $url);
}
I have created a solution for validating the domain. While it does not specifically cover the entire URL, it is very detailed and specific. The question you need to ask yourself is, "Why am I validating a domain?" If it is to see if the domain actually could exist, then you need to confirm the domain (including valid TLDs). The problem is, too many developers take the shortcut of ([a-z]{2,4}) and call it good. If you think along these lines, then why call it URL validation? It's not. It's just passing the URL through a regex.
I have an open-source class that validates the domain not only against the single authoritative source for TLD management (iana.org) but also via DNS records, to make sure it actually exists. DNS validation is optional, but the domain is always checked against the valid TLDs.
For example, example.ay is NOT a valid domain, because .ay is not a valid TLD. But the regex posted here ([a-z]{2,4}) would accept it. I have an affinity for quality, and I try to express that in the code I write. Others may not really care. So if you simply want to "check" the URL, you can use the examples in these responses. If you actually want to validate the domain in the URL, you can use the class I created to do just that. It can be downloaded at:
http://code.google.com/p/blogchuck/source/browse/trunk/domains.php
It validates based on the RFCs that "govern" (using the term loosely) what determines a valid domain. In a nutshell, here is what the domains class will do:
Basic rules of the domain validation
must be at least one character long
must start with a letter or number
contains letters, numbers, and hyphens
must end in a letter or number
may contain multiple nodes (i.e. node1.node2.node3)
each node can only be 63 characters long max
total domain name can only be 255 characters long max
must end in a valid TLD
can be an IPv4 address
It also downloads a copy of the master TLD file from iana.org, but only after checking your local copy: if the local copy is more than 30 days old, it downloads a new one. The TLDs in the file are used in the regex that validates the TLD of the domain you are checking. This prevents .ay (and other invalid TLDs) from passing validation.
This is a lengthy bit of code, but very compact considering what it does. And it is the most accurate. That's why I asked the question earlier. Do you want to do "validation" or simple "checking"?
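A rough sketch of the basic rules listed above, to show how they translate into PHP. It is NOT the domains.php class linked above: is_valid_domain() is a hypothetical name, and $tlds stands in for the IANA TLD list that the class downloads (a tiny hand-written sample is used in the test). DNS checks are left out.

```php
<?php
// Hypothetical sketch of the listed rules; $tlds is assumed to be a list of lowercase TLDs.
function is_valid_domain(string $domain, array $tlds): bool
{
    // "can be an IPv4 address"
    if (filter_var($domain, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return true;
    }
    // "total domain name can only be 255 characters long max"
    if ($domain === '' || strlen($domain) > 255) {
        return false;
    }
    $labels = explode('.', $domain);
    foreach ($labels as $label) {
        // Each node: 1-63 letters, digits or hyphens,
        // starting and ending with a letter or digit.
        if (!preg_match('/^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i', $label)) {
            return false;
        }
    }
    // "must end in a valid TLD"
    return in_array(strtolower(end($labels)), $tlds, true);
}
```

Keeping the TLD list as a parameter mirrors the class's approach of refreshing it from iana.org rather than hard-coding a pattern like [a-z]{2,4}.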
You could try this one. I haven't tried it myself but it's surely the biggest regexp I've ever seen, haha.
^(?#Protocol)(?:(?:ht|f)tp(?:s?)\:\/\/|~\/|\/)?(?#Username:Password)(?:\w+:\w+@)?(?#Subdomains)(?:(?:[-\w]+\.)+(?#TopLevel Domains)(?:com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum|travel|[a-z]{2}))(?#Port)(?::[\d]{1,5})?(?#Directories)(?:(?:(?:\/(?:[-\w~!$+|.,=]|%[a-f\d]{2})+)+|\/)+|\?|#)?(?#Query)(?:(?:\?(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)(?:&(?:[-\w~!$+|.,*:]|%[a-f\d]{2})+=?(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)*)*(?#Anchor)(?:#(?:[-\w~!$+|.,*:=]|%[a-f\d]{2})*)?$
!(https?://)?([-_a-z0-9]+\.)*([-_a-z0-9]+)\.([a-z]{2,4})(/?)(.*)!i
I use this regular expression for validating URLs. So far it hasn't failed me once :)
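Worth knowing before relying on it: the pattern has no ^ or $ anchors and ends in (.*), so preg_match() reports a match whenever any part of the input looks like name.tld. A quick sketch to see that; the inputs are made up for illustration.

```php
<?php
// The pattern from the answer above, unchanged.
$pattern = '!(https?://)?([-_a-z0-9]+\.)*([-_a-z0-9]+)\.([a-z]{2,4})(/?)(.*)!i';

$inputs = ['http://example.com/path', 'just some text', 'not a url: a.bc'];
foreach ($inputs as $input) {
    printf("%-25s => %d\n", $input, preg_match($pattern, $input));
}
```

The last input is clearly not a URL, yet it matches because of the a.bc substring.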
I've seen a regex that could actually validate any kind of valid URL but it was two pages long...
You're probably better off parsing the URL with parse_url() and then checking that all the parts you require are present.
Addition:
This is a snip of my URL class:
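public static function IsUrl($test)
{
    // Reject anything containing a space
    if (strpos($test, ' ') !== false)
    {
        return false;
    }
    // Require a dot somewhere after the first two characters
    if (strpos($test, '.') > 1)
    {
        // @ suppresses warnings from malformed URLs
        $check = @parse_url($test);
        return is_array($check)
            && isset($check['scheme'])
            && isset($check['host']) && count(explode('.', $check['host'])) > 1;
    }
    return false;
}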
It tests the given string and requires some basics in the url, namely that the scheme is set and the hostname has a dot in it.
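The question's validate_url() could fall back on this parse_url() approach when filter_var() is unavailable (PHP < 5.2). A sketch, assuming a standalone copy of IsUrl() named is_url() (a hypothetical name) and returning a boolean from both branches so callers get a consistent type.

```php
<?php
// Standalone copy of IsUrl() above.
function is_url($test)
{
    if (strpos($test, ' ') !== false) {
        return false;
    }
    if (strpos($test, '.') > 1) {
        $check = @parse_url($test);
        return is_array($check)
            && isset($check['scheme'])
            && isset($check['host']) && count(explode('.', $check['host'])) > 1;
    }
    return false;
}

// The question's function, with is_url() standing in for REGEX_HERE.
function validate_url($url)
{
    if (function_exists('filter_var')) {
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    }
    return is_url($url);
}
```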