regexp not catching all emails - php

I am trying to extract email addresses in a PHP program, and I am using the following regexp:
([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b)siU
This appears to work fine for standard emails, such as me @ gmail.com or you @ hotmail.com.
Where it fails is on emails with endings such as co.uk. I have tried adding co.uk to my regexp like this:
([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|co.uk|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b)siU
But that gives me the same output as the original regexp: the matched email comes out as you @ yoursite.co. I also tried adding just uk. What am I missing on this one? Is the second period throwing it off?
Ideally I want it to catch all emails ending in .com, .net, .org, .co.uk, .au or .ca; basically I am searching for all US, UK, AU and CA emails. Can anyone spot my mistake, so that non-US emails are output properly, like you @ yoursite.co.uk instead of you @ yoursite.co?
Thank you. The spaces in the emails shown for example are only there to get this to post.
Edit: I am not trying to validate the emails. It's a series of emails, which can be anything, stored in an array, and I am trying to catch only specific ones for a database. Sorry for not making that clear initially.
Edit 2: Here is the pattern that ended up working for my issue. Thanks, everyone.
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$^

Do NOT use regex to validate email addresses. Use libraries that do it correctly.
See "I Knew How To Validate An Email Address Until I Read The RFC" for more information.

|co.uk|
Try this instead (outside of [], an unescaped dot means "any character except a line break"):
|co\.uk|

You have to escape the dot, like this:
co\.uk
If you are trying to match only valid emails, this regexp should do the trick:
(\s)?(([^\s]+)\@([^\s]+))

What am I missing on this one?
You are adding terms to a non-capturing group. So how can any output based on your regex contain anything in a non-capturing group? Not to mention the mistake in the term you added that nacholibre mentioned.
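A quick PHP check suggests why the match stops at .co even after the dot is escaped. Under the U (ungreedy) modifier every quantifier becomes lazy, so `(?:...\.)+` consumes only "yoursite." before the TLD alternation is tried, and `[A-Z]{2}` (case-insensitive because of i) accepts "co" followed by a word boundary. Dropping U lets the label group run greedily. The sketch below assumes the question's pattern with the dot in co.uk escaped; `firstMatch` is a hypothetical helper.

```php
<?php
// The question's pattern with the dot in co.uk escaped.
// A nowdoc keeps $, ' and ` literal.
$core = <<<'RE'
[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|co\.uk|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b
RE;

// "(" and ")" act as bracket-style delimiters, as in the question,
// so the whole match is in $m[0] (there is no capturing group).
$lazy   = '(' . $core . ')siU'; // U: quantifiers are lazy by default
$greedy = '(' . $core . ')si';  // same pattern, normal greedy quantifiers

// Hypothetical helper: returns the first match, or null if none.
function firstMatch(string $pattern, string $text): ?string
{
    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

echo firstMatch($lazy, 'you@yoursite.co.uk'), "\n";   // you@yoursite.co
echo firstMatch($greedy, 'you@yoursite.co.uk'), "\n"; // you@yoursite.co.uk
echo firstMatch($greedy, 'me@gmail.com'), "\n";       // me@gmail.com
```

Keeping U but listing co\.uk before [A-Z]{2} should also work, since alternatives are tried left to right.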

Don't use regex for email; you won't like it, and you will fail. Use filter_var():
http://php.net/manual/en/function.filter-var.php
with the FILTER_VALIDATE_EMAIL filter:
http://www.php.net/manual/en/filter.filters.validate.php
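Since the question's goal is to pick certain addresses out of an array rather than to validate one input, a minimal sketch could combine the two: let filter_var() decide whether a string is an address at all, then keep only the wanted endings. The list of endings is taken from the question; the `wantedEmails` name is hypothetical, and arrow-free code is used so it runs on older PHP versions.

```php
<?php
// Keep only strings that filter_var() accepts as email addresses and whose
// domain ends in one of the endings named in the question.
function wantedEmails(array $candidates): array
{
    $wanted = [];
    foreach ($candidates as $candidate) {
        $email = filter_var($candidate, FILTER_VALIDATE_EMAIL);
        if ($email === false) {
            continue; // not a syntactically valid address
        }
        if (preg_match('/\.(?:com|net|org|co\.uk|au|ca)$/i', $email) === 1) {
            $wanted[] = $email;
        }
    }
    return $wanted;
}

print_r(wantedEmails(['me@gmail.com', 'you@yoursite.co.uk', 'not-an-email', 'x@site.de']));
```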

Related

Searching messages using regex - gmail api

I am looking at building a small web application that looks for emails containing serial keys/codes. I have searched around, but I am not sure whether the Gmail API supports searching messages with regex.
Has anyone got any ideas, or used it before?
The Gmail API search has the same features the Gmail client has, which is documented here. It has no support for regex, sadly.
It's better to search with keywords rather than regex; Gmail doesn't support regex-based search. For example:
subject:dinner (words in the subject line)
"dinner and movie tonight" (an exact word or phrase)
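Because the search itself can't use a regex, one workaround is to narrow the results with an ordinary query (for example through the q parameter of users.messages.list) and then run a regex over the text of each message you fetch. The sketch covers only that local step; the serial-key format (five groups of five upper-case letters or digits) and the `findSerialKeys` helper are hypothetical, and fetching and decoding message bodies is not shown.

```php
<?php
// Hypothetical key format: five dash-separated groups of five
// upper-case letters or digits, e.g. ABCDE-12345-FGHIJ-67890-KLMNO.
function findSerialKeys(string $messageText): array
{
    preg_match_all('/\b[A-Z0-9]{5}(?:-[A-Z0-9]{5}){4}\b/', $messageText, $m);
    return $m[0]; // every full match found in the text
}

print_r(findSerialKeys("Your key: ABCDE-12345-FGHIJ-67890-KLMNO. Thanks"));
```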

RegEx extract website url from email address w/ sub-sub-domain

We are trying to extract from an email list a valid url for that organization.
abc@charleston.k12.il.us is easy, but sometimes we have
someone@u40gw.effingham.k12.il.us, where u40gw is a subdomain for internal mail.
Other examples are someone@mail.meridian223.org and someone@athletics.msstate.edu.
What would be the most efficient way to capture the .edu plus the preceding name only, without additional subdomains, or, in the case of high schools, the whole k12.il.us part plus the preceding name?
Tried so far:
/@(([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)|@([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*)([.])([a-zA-Z0-9]*))/
You can try the following regex pattern:
@.*?([^.]+[.]\w{3}|[^.]+[.]k12[.]il[.]us)$
You can replace \w{3} with your list of possible extensions, such as org, edu, net, etc. For example:
@.*?([^.]+[.](edu|org|net|info|com)|[^.]+[.]k12[.]il[.]us)$
You can see it working on regexr.com
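Run through preg_match() against the examples from the question, the second pattern captures the organisation part in group 1. The `orgDomain` helper is a hypothetical name used only for this sketch.

```php
<?php
// Apply the answer's pattern and return the captured organisation
// domain (group 1), or null if the address doesn't match.
function orgDomain(string $email): ?string
{
    $pattern = '/@.*?([^.]+[.](edu|org|net|info|com)|[^.]+[.]k12[.]il[.]us)$/';
    return preg_match($pattern, $email, $m) === 1 ? $m[1] : null;
}

$examples = [
    'abc@charleston.k12.il.us',          // charleston.k12.il.us
    'someone@u40gw.effingham.k12.il.us', // effingham.k12.il.us
    'someone@mail.meridian223.org',      // meridian223.org
    'someone@athletics.msstate.edu',     // msstate.edu
];
foreach ($examples as $e) {
    echo orgDomain($e), "\n";
}
```

The lazy .*? skips as few leading subdomains as possible, so the capture starts at the last label that still lets one of the alternatives reach the end of the string.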

Looking for a PHP regex or function to filter variations using . of an email for security

I am getting spam because Gmail allows the use of . in its addresses, from someone like this spammer:
q.i.n.ghu.im.i.n.g.o.u.r@gmail.com
who can get through by removing and/or adding periods in his address.
This happens to be on a Joomla install, so I am specifically looking to create a component that I can add to multiple sites, or a simple regex I can add inline to existing code. Also, is anything being done about this? It seems like this could be newly termed a "loosely typed" email address, which is crazy to me.
If your goal is to match this address against others that are equivalent to it (because you've already got them blacklisted), then I'd simply normalize the address to its most basic form before storing it. Lower-case it, split it at the @, and if the right side is "gmail.com", remove all dots from the left side and put the halves back together.
start with JOE.SCHMOE@GMAIL.COM
lower-case to joe.schmoe@gmail.com
split to joe.schmoe and gmail.com
since the right side is gmail.com, remove the dots from the left
reassemble to joeschmoe@gmail.com
Now you've got the base address that you can block/ban/whatever.
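The steps above can be sketched as a small PHP function. It follows the answer literally (only gmail.com is special-cased); the `normalizeEmail` name is hypothetical, and it additionally trims surrounding whitespace.

```php
<?php
// Normalise an address: lower-case it, split at the last "@",
// and strip dots from the local part when the domain is gmail.com.
function normalizeEmail(string $email): string
{
    $email = strtolower(trim($email));
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // no "@": leave it as-is
    }
    $local = substr($email, 0, $at);
    $domain = substr($email, $at + 1);
    if ($domain === 'gmail.com') {
        $local = str_replace('.', '', $local);
    }
    return $local . '@' . $domain;
}

echo normalizeEmail('JOE.SCHMOE@GMAIL.COM'), "\n"; // joeschmoe@gmail.com
```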
You could do something simple like: /^(?:[^@]+\.){5,}[^@]+@(?:[^@]+\.)+[^@]+/
This is just a quick throw-together, not meant for validation but as a pointer to tell you whether an email is sketchy. The key here is the {5,} quantifier: if the part before the @ has 5 or more dots (like a.b.c.d.e.f), the pattern matches; in other words, the address is flagged as sketchy.
I hope this helps!
Explanation: http://regex101.com/r/lB5vG3
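Wrapped in preg_match(), the heuristic becomes a yes/no check. As the answer says, this flags suspicious addresses rather than validating them; the `looksSketchy` name is hypothetical.

```php
<?php
// Flag addresses whose local part contains five or more dots.
// A heuristic only, not an email validator.
function looksSketchy(string $email): bool
{
    return preg_match('/^(?:[^@]+\.){5,}[^@]+@(?:[^@]+\.)+[^@]+/', $email) === 1;
}

var_dump(looksSketchy('q.i.n.ghu.im.i.n.g.o.u.r@gmail.com')); // bool(true)
var_dump(looksSketchy('joe.schmoe@gmail.com'));               // bool(false)
```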

regex help... php check entry format

I'm using PHP to develop an application, but I am running into some issues with regex.
I found a few sites that explain it, but for some reason it is over my head. Can someone please help explain regex arguments?
I uploaded a sample of what I am working on here...
First, click on the "+" button at top right to get to the add content view.
Basically, I need it so when you submit from this form, php will check that the values are formatted correctly.
Domain: this can be .com, .co, .biz, .info, etc. The user can enter a prefix, as in a URL, and PHP strips it, so the strings that end up in the array are just domain.com:
domain1.com
somedomain.biz
mydomain.co
Redirect: here PHP splits on the ',' so we are left with the IP and the domain key as separate strings. The IP can have 2-3 digits per section, so ###.##.##.### or even ##.##.##.##, and the domain key is a varchar (not so important):
##.##.##.##, domainkey
###.###.###.###, domainkey
Solution for redirect:
(\d{1,3}\.){3}\d{1,3}
/24s: this is similar to the redirect IP, but it will always end in '0/24':
##.##.###.0/24
##.##.###.0/24
Names: this one should be the easiest; it can only contain letters, no numbers, and can be any length:
randomname
thisisaname
May I suggest using some software, or even a website, that lets you test your regex, such as:
The Regex Coach
Regexpal
RegExr
Expresso
RegexDesigner
etc
It really depends on how strict you want to get with it and how fancy you want to make your regex.
/((\d{1,3})\.){3}(\d{1,3})(\/\d{2})?/
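Put together, one check per field from the question might look like the sketch below. All function names are hypothetical; the IP patterns follow the answer's \d{1,3} approach, so they only check the shape and do not reject values above 255, and the domain check assumes a single name plus a letters-only ending (like domain1.com).

```php
<?php
// Domain: strip an optional scheme and "www.", then expect name.tld.
function cleanDomain(string $input): ?string
{
    $d = preg_replace('~^(?:https?://)?(?:www\.)?~i', '', trim($input));
    return preg_match('/^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.[a-z]{2,}$/i', $d) === 1 ? $d : null;
}

// Redirect: "ip, domainkey", split on the first comma.
function parseRedirect(string $input): ?array
{
    $parts = array_map('trim', explode(',', $input, 2));
    if (count($parts) !== 2 || $parts[1] === '') {
        return null; // missing domain key
    }
    return preg_match('/^(?:\d{1,3}\.){3}\d{1,3}$/', $parts[0]) === 1 ? $parts : null;
}

// /24 block: three 1-3 digit sections followed by ".0/24".
function isSlash24(string $input): bool
{
    return preg_match('/^(?:\d{1,3}\.){3}0\/24$/', trim($input)) === 1;
}

// Name: letters only, at least one, any length.
function isName(string $input): bool
{
    return preg_match('/^[a-z]+$/i', $input) === 1;
}

var_dump(cleanDomain('http://www.domain1.com'));     // string(11) "domain1.com"
var_dump(parseRedirect('12.34.56.78, domainkey'));
```

For a stricter IP check, filter_var() with FILTER_VALIDATE_IP would be an alternative to the regex.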

Block certain email providers using Regex

Please don't downvote the question because the answer I'm looking for is not one anyone should pursue. I'm fully aware of that, but it's not my idea; I just have to deliver :D
In CakePHP, I have the following data entry in my model:
'email' => array(
    'email' => array(
        'rule' => array('email', false, '(^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$)')
    ),
)
The email rule is a common function in CakePHP data validation; the second and third parameters are optional, the third being the regex. I wasn't happy with the default regex, so I added my own. Now I want to exclude Gmail, Hotmail and Yahoo addresses.
How can I change the regular expression so that those addresses produce false as the result? I can't get it right.
Why on earth would you want to exclude Gmail, Hotmail and Yahoo addresses? There are plenty of people who have only one of these addresses and no other. This is a bad idea. If you are targeting a specific "audience", I'd suggest making a list of allowed domains instead.
Anyway, here's a working regex for you, which is shorter than the one you already have. Try it out:
\b[\w\.-]+@((?!gmail|googlemail|yahoo|hotmail).)[\w\.-]+\.\w{2,4}\b
Don't use a regex for this.
The proper solution is to explode() the email address at the @ sign and then use plain string comparisons, or even in_array(), to check whether the domain is blacklisted.
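A minimal sketch of that approach: split at "@", take the last piece as the domain, and look it up with a strict in_array(). The function name and the blacklist contents are illustrative assumptions; because the comparison is exact, regional domains such as yahoo.co.uk would need their own entries.

```php
<?php
// Return true when the address's domain is on the (example) blacklist.
function isBlockedProvider(
    string $email,
    array $blocked = ['gmail.com', 'googlemail.com', 'hotmail.com', 'yahoo.com']
): bool {
    $parts = explode('@', strtolower(trim($email)));
    $domain = end($parts); // text after the last "@"
    return count($parts) > 1 && in_array($domain, $blocked, true);
}

var_dump(isBlockedProvider('someone@Gmail.com'));   // bool(true)
var_dump(isBlockedProvider('someone@example.org')); // bool(false)
```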
