I'm currently using
if(preg_match('~#(semo\.edu|uni\.uu\.se|)$~', $email))
as a domain check.
However I need to only check if the e-mail ends with the domains above. So for instance, all these need to be accepted:
hello#semo.edu
hello#student.semo.edu
hello#cool.teachers.semo.edu
So I'm guessing I need something after the # but before the ( which is something like "any random string or empty string". Any regexp-ninjas out there who can help me?
([^#]*\.)? works if you already know you're dealing with a valid email address. Explanation: it's either empty, or anything that ends with a period and does not contain a #. So student.cs.semo.edu matches, as does plain semo.edu, but not me#notreallysemo.edu. So:
~#([^#]*\.)?(semo\.edu|uni\.uu\.se)$~
Note that I've removed the last | from your original regex.
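To see how that pattern behaves on the addresses from the question, here is a minimal sketch. The helper name ends_with_allowed_domain is made up for illustration, and the separator is kept as # exactly as it is written in this thread.

```php
<?php
// Hypothetical helper; the pattern is the one suggested above.
function ends_with_allowed_domain(string $email): bool
{
    // ([^#]*\.)? allows an optional run of subdomains that must end in a dot,
    // so "notreallysemo.edu" cannot slip through as "semo.edu".
    return preg_match('~#([^#]*\.)?(semo\.edu|uni\.uu\.se)$~', $email) === 1;
}

var_dump(ends_with_allowed_domain('hello#semo.edu'));               // bool(true)
var_dump(ends_with_allowed_domain('hello#cool.teachers.semo.edu')); // bool(true)
var_dump(ends_with_allowed_domain('me#notreallysemo.edu'));         // bool(false)
```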
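You can use [a-zA-Z0-9\.]* to match zero or more characters (letters, digits, or dots). The trailing empty alternative is dropped here, since it would let any domain through. Note that this version also accepts me#notreallysemo.edu, because nothing forces a dot before the domain:
~#[a-zA-Z0-9\.]*(semo\.edu|uni\.uu\.se)$~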
Well, .* will match anything. But you don't actually want that. There are a number of characters that are invalid in a domain name (e.g. a space). Instead you want something more like this:
[\w.]*
I might not have all of the allowed characters, but that will get you [A-Za-z0-9_.]. The idea is that you list all the allowed characters inside the square brackets and then use * to say zero or more of them.
Related
I am using the following regex to validate emails and just noticed some problems and don't see what the issue is :
/^[a-z0-9_.-]+#[a-z0-9.-]+.[a-z]{2,6}$/i.test(value)
support#tes is invalid
support#test is valid
support#test.c is invalid
support#test.co is valid
The {2,6} is meant to require a trailing TLD of between 2 and 6 characters, and that does not appear to be working either. I am sure I had this working properly before.
In a regex, . is a wildcard (meaning any character). You need to escape it as \.
Keep in mind, though, that the regex is too restrictive. You can have non-alphanumeric characters in the address, like '
I notice you're not escaping the .. There might be more to it than that, but that jumps out at me.
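The effect of the unescaped dot can be checked directly. The sketch below runs the question's pattern and an escaped variant in PHP (the question uses JavaScript's .test, but the pattern syntax is the same for this case); the helper name matches is made up for illustration.

```php
<?php
// Hypothetical helper that returns true when the pattern matches.
function matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

$unescaped = '/^[a-z0-9_.-]+#[a-z0-9.-]+.[a-z]{2,6}$/i';  // "." matches any character
$escaped   = '/^[a-z0-9_.-]+#[a-z0-9.-]+\.[a-z]{2,6}$/i'; // "\." matches a literal dot

foreach (['support#tes', 'support#test', 'support#test.c', 'support#test.co'] as $addr) {
    printf(
        "%-16s unescaped=%s escaped=%s\n",
        $addr,
        matches($unescaped, $addr) ? 'valid' : 'invalid',
        matches($escaped, $addr) ? 'valid' : 'invalid'
    );
}
```

With the unescaped pattern, support#test is accepted because the wildcard consumes the "e" and the final "st" satisfies {2,6}; with the escaped pattern only support#test.co passes.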
This is a decent check for an e-mail with Regex
\w+([-+.']\w+)*#\w+([-.]\w+)*\.\w+([-.]\w+)*
However, you may want to read this: Using a regular expression to validate an email address
There are many ways to regex an email address, depending on how precise and restrictive you want it to be. To rewrite a working regex as close as possible to the one in your question, this should work:
^[\w_.-]+#[\w]+\.[\w]{2,6}$
support#tes - Invalid
support#test - Invalid
support#test.c - Invalid
support#test.co - Valid
supp34o.rt#tes.com - Valid
But also keep in mind ALL the characters allowed in a valid email address - What characters are allowed in an email address?
Problem: authors have added email addresses wrongly in a CMS - missing out the 'mailto:' text.
I need a regular expression, if possible, to do a search and replace on the stored MySQL content table.
Cases I need to cope with are:
No 'mailto:'
'mailto:' is already included (correct)
web address not email - no replace
multiple mailto: required (more than one in string)
Sample string would be: (line breaks added for readability)
add1#test.com and
add2#test.com and
real web link
second one to replace add3#test.com
Required output would be:
add1#test.com and
add2#test.com and
real web link
second one to replace add3#test.com
What I tried (in PHP) and issues:
pattern: /href="(.+?)(#)(.+?)(<\/a> )/iU
replacement: href="mailto:$1$2$3$4
This is adding mailto: to the correctly formatted mailto: and acting greedily over the last two links.
Thanks for any help. I have looked about, but am running out of time on this as it was an unexpected content issue.
If you are able to save me time and give the SQL expression, that would be even better.
Try replacing
/href="(?!(mailto:|http:\/\/|www\.))/iU
with
href="mailto:
(?!...) is a negative lookahead; it loosely means "the next characters aren't these".
Alternative:
Replace
/(href=")(?!mailto:)([^"]+#)/iU
with
$1mailto:$2
[^"]+ means 1 or more characters that aren't ".
You'd probably need a more complex matching pattern for guaranteed correctness.
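As a rough illustration of the second pattern, the sketch below applies it with preg_replace. The markup in $html is invented sample data (the original links in the question did not survive), and the U modifier is left out because this pattern does not depend on it.

```php
<?php
// Hypothetical sample content; the real CMS rows will look different.
$html = '<a href="add1#test.com">one</a> and '
      . '<a href="mailto:add2#test.com">two</a> and '
      . '<a href="http://www.example.com">real web link</a>';

// (?!mailto:) skips links that are already correct;
// [^"]+# only matches when the separator appears inside the same attribute.
$fixed = preg_replace('/(href=")(?!mailto:)([^"]+#)/i', '$1mailto:$2', $html);

echo $fixed, PHP_EOL;
```

Only the first link gains the mailto: prefix. A web URL that happens to contain the separator character would be rewritten too, which is one reason a stricter pattern may be needed.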
MySQL REGEX matching:
See this or this.
First you need a proper mail pattern (e.g. Using a regular expression to validate an email address); second, match an optional mailto: before the address (e.g. (mailto:|)); and last, preg_replace_callback is well suited to doing the replacement.
This seems to work as you wish (it only looks for email addresses inside double quotes):
$s = 'add1#test.com and
add2#test.com and
real web link
second one to replace add3#test.com';
echo preg_replace_callback(
'~"(mailto:|)([_a-z0-9-]+(\.[_a-z0-9-]+)*#[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,4}))"~i',
function($m) {
// print_r($m); #debug
return '"mailto:'. $m[2] .'"';
},
$s
);
Output, as desired:
add1#test.com and
add2#test.com and
real web link
second one to replace add3#test.com
Use the following as the pattern:
/(href=")(?!mailto:)(.+?#.+?")/i
and replace it with
$1mailto:$2
(?!mailto:) is a negative lookahead that checks that mailto: does not follow; only then is the rest of the pattern tried. (.+?#.+?") matches one or more characters followed by a #, followed by one or more characters, followed by a ". Both + quantifiers are non-greedy. Note that the U modifier is deliberately not used: it inverts greediness, so under /U the .+? would become greedy.
The matched text is replaced with the first capture group (href="), followed by mailto:, followed by the second capture group (up to the closing ").
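One detail worth checking whenever a pattern combines ? quantifiers with PHP's U modifier: U inverts greediness, so a quantifier written as .+? behaves greedily under /U. A small sketch on a made-up subject string:

```php
<?php
$subject = 'aXbYb';

// Without U, ".+?" is lazy and stops at the first "b".
preg_match('/a.+?b/', $subject, $lazy);

// With U, the same ".+?" becomes greedy and runs to the last "b".
preg_match('/a.+?b/U', $subject, $inverted);

echo $lazy[0], PHP_EOL;     // aXb
echo $inverted[0], PHP_EOL; // aXbYb
```

This would also explain the greedy behaviour reported in the question, where (.+?) was used together with /iU.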
This sounds strange, but I've been using this function for quite a while now and "suddenly, from one day to the other" it does not filter some addresses in the right way anymore. However, I cannot see why...
function validate_email($email)
{
/*
(Name) Letters, Numbers, Dots, Hyphens and Underscores
(# sign)
(Domain) (with possible subdomain(s) ).
Contains only letters, numbers, dots and hyphens (up to 255 characters)
(. sign)
(Extension) Letters only (up to 10 (can be increased in the future) characters)
*/
$regex = '/([a-z0-9_.-]+)'. # name
'#'. # at
'([a-z0-9.-]+){2,255}'. # domain & possibly subdomains
'.'. # period
'([a-z]+){2,10}/i'; # domain extension
if($email == '') {
return false;
}
else {
$eregi = preg_replace($regex, '', $email);
}
return empty($eregi) ? true : false;
}
For example, "some#gmail" is reported as correct, so it seems something happened with the TLD part. Can anybody tell me why?
Thank you very much in advance!
. means any character. You should escape it if you actually mean 'dot': \.
Your regex also has some other problems:
Uppercase letters are only accepted because of the /i modifier; without it the classes would have to be written as [a-zA-Z0-9]
No unicode characters are allowed in your regex (for example email addresses with é, ç, ... etc)
Some special characters such as + are in fact allowed in an email address
...
I would keep the email validation very simple: check that a # is present and pretty much leave it at that. If you really want to validate an email address, the regex becomes gruesome.
Check this SO answer for a more detailed explanation.
What you commented with "period":
'.'. # period
is in fact a placeholder for any character. It should be \. instead.
However, you're overcomplicating things. Such validation should exist to reject either empty fields or obviously wrong stuff (e.g. name put in the email field). So in my experience the best check is just to look whether it contains an # and don't worry too much about getting the structure right. You can in fact write a regex which will faithfully validate any valid email address and reject any invalid one. It's a monster spanning about a screen of text. Don't do that. KISS.
I think the error is in this line:
'.'. # period
You mean a literal period here. But periods have a special meaning in regular expressions (they mean "any character").
You need to escape it with a backslash.
What about FILTER_VALIDATE_EMAIL?
I am stuck on a particular problem. I have a username field where only letters, digits, and the characters . - and _ are allowed, and the value should always start with a letter.
Here are examples of what should be accepted:
someone#mydomain.com
something1234#mydomain.com
someething.something#mydomain.com
something-something#mydomain.com
something_something#mydomain.com
something_1234#mydomain.com
something.123#mydomain.com
something-456#mydomain.com
What I have so far is:
[a-zA-Z0-9]+[._-]{1,1}[a-zA-Z0-9]+#mydomain.com
This meets all my requirements, except that it doesn't match:
someone#mydomain.com
someont123#mydomain.com
and it also matches
someone_someone_something#mydomain.com
which should not be accepted. I really can't see how to solve this. One thing I tried is
[a-zA-Z0-9]+[._-]{0}[a-zA-Z0-9]+#mydomain.com
but this doesn't solve my problem either; now it accepts everything, such as
something+455#mydomain.com
which should not be accepted. Please help.
If you want to make the delimiter (., - or _) optional, replace the {1,1} (quantifier: exactly once) with a ? (quantifier: zero or one) here:
[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+#mydomain.com
The reason your regex also matches longer addresses such as someone_someone_something#mydomain.com is that you don't assert the whole string, just some part of it. Use start ^ and end $ anchors:
^[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+#mydomain\.com$
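A quick way to check the anchored pattern against the examples from the question is sketched below; the function name is_allowed_username_address is made up for illustration.

```php
<?php
// Hypothetical wrapper around the anchored pattern shown above.
function is_allowed_username_address(string $address): bool
{
    // ^ and $ force the whole string to match, so at most one delimiter is allowed.
    return preg_match('/^[a-zA-Z0-9]+[._-]?[a-zA-Z0-9]+#mydomain\.com$/', $address) === 1;
}

var_dump(is_allowed_username_address('someone#mydomain.com'));                   // bool(true)
var_dump(is_allowed_username_address('something.123#mydomain.com'));             // bool(true)
var_dump(is_allowed_username_address('someone_someone_something#mydomain.com')); // bool(false)
var_dump(is_allowed_username_address('something+455#mydomain.com'));             // bool(false)
```

Note that this pattern still needs at least two characters before the # and does not by itself enforce that the name starts with a letter; starting the pattern with [a-zA-Z] would be one way to add that.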
This is why we have filter_var($email, FILTER_VALIDATE_EMAIL).
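If the email address is valid, you then just have to check that it ends with #domain.com. strrpos($email, '#domain.com') only tells you where that suffix last occurs, so compare the tail of the string instead, for example substr($email, -strlen('#domain.com')) === '#domain.com'.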
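Put together, the two steps might look like the sketch below. It assumes that the # in the addresses quoted in this thread stands for the usual @ separator, which is what filter_var expects; the function name is_valid_address_at is made up for illustration.

```php
<?php
// Hypothetical helper: syntactic validation first, then an exact suffix comparison.
function is_valid_address_at(string $email, string $domain): bool
{
    // filter_var returns false when the string is not a syntactically valid address.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }

    // Compare the tail of the address, so "x@domain.com.evil.org" is rejected.
    $suffix = '@' . $domain;
    return substr($email, -strlen($suffix)) === $suffix;
}

var_dump(is_valid_address_at('someone@domain.com', 'domain.com'));          // bool(true)
var_dump(is_valid_address_at('someone@domain.com.evil.org', 'domain.com')); // bool(false)
var_dump(is_valid_address_at('not an address', 'domain.com'));              // bool(false)
```

The comparison here is case-sensitive; lowercasing both sides first would be a reasonable variation if mixed-case domains need to pass.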
What I need is not email validation.
It's simple.
Allow #hello.world, #hello_world, or #helloworld, but #helloworld. should be taken as #helloworld, and so should #helloworld?
In short: check for a letter or digit after a . or _; if there is none, take only the string before it.
My existing regex is /#.([A-Za-z0-9_]+)(?=\?|\,|\;|\s|\Z)/; it only handles #helloworld, not #hello.world or #hello_world.
Update:
So now I have a regex that deals with problem number 1, i.e. it allows #hello.world, #hello_world, and #helloworld. But what about #helloworld. which should be taken as #helloworld, and likewise #helloworld?
New RegEx: /#([A-Za-z0-9+_.-]+)/
Don't use a regex for that.
Use...
$valid = filter_var($str, FILTER_VALIDATE_EMAIL);
Regex will never be able to verify an email, only to do some very basic format checking.
The most comprehensive regex for matching email addresses was 8000 chars long, and that one is already invalid due to changes in what is accepted in emails.
Use a dedicated library for the check if you need real verification; otherwise just check for a # and some dots. Anything more and you will probably end up rejecting perfectly legal email addresses.
Some examples of perfectly legal email addresses (the leading and trailing " only mark the boundaries):
"dama#nodomain.se"
"\"dama\"#nodomain.se"
"da/ma#nodomain.se"
"dama#nõdomain.se"
"da.ma#nodomain.se"
"dama#παράδειγμα.δοκιμή"
"dama #nodomain .se"
"dama#nodomain.se "
You can use this regexp to validate email addresses (with case-insensitive matching enabled):
^[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,6}$
For more information and more complete expressions you can check here
I hope this helps you
Try this:
\#.+(\.|\?|;|[\r\n\s]+)
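For the requirement stated in the question (a . or _ only counts when a letter or digit follows it), another possible sketch is to allow a delimiter only as part of a delimiter-plus-alphanumerics group. The function name extract_mentions and the sample text are made up for illustration.

```php
<?php
// Hypothetical helper: returns every name that follows a "#".
function extract_mentions(string $text): array
{
    // [A-Za-z0-9]+ starts the name; (?:[._][A-Za-z0-9]+)* only takes a "." or "_"
    // when at least one letter or digit comes right after it.
    preg_match_all('/#([A-Za-z0-9]+(?:[._][A-Za-z0-9]+)*)/', $text, $m);
    return $m[1];
}

print_r(extract_mentions('#hello.world #hello_world #helloworld. #helloworld?'));
```

Because a trailing . or ? can never be followed by a letter or digit inside the group, both #helloworld. and #helloworld? yield helloworld.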