Please don't downvote the question just because the answer I'm looking for is not one anyone should pursue. I'm fully aware of that, but it's not my idea; I just have to deliver :D
In CakePHP, I have the following data entry in my model:
'email' => array(
'email' => array(
'rule' => array('email',false,'(^[a-zA-Z][\w\.-]*[a-zA-Z0-9]@[a-zA-Z0-9][\w\.-]*[a-zA-Z0-9]\.[a-zA-Z][a-zA-Z\.]*[a-zA-Z]$)')
),
)
The email rule is a common function in CakePHP data validation, and the second and third parameters are optional, the third being the regex. I wasn't happy with the default regex string, so I added my own. Now I want to exclude Gmail, Hotmail and Yahoo addresses.
How can I change the regular expression so that those addresses produce false as the result? I can't get it right.
Why on earth would you want to exclude Gmail, Hotmail and Yahoo addresses? There are plenty of people who only have one of these addresses and no other. This is a bad idea. If you are targeting a specific "audience", I'd suggest making a list of allowed domains instead.
Anyway, here's a functional regex for you which is shorter than the one you already have. Try it out:
\b[\w\.-]+@((?!gmail|googlemail|yahoo|hotmail).)[\w\.-]+\.\w{2,4}\b
Don't use a regex for this.
The proper solution is to explode() the email address at the @ sign and then use plain string comparisons or even in_array() to check if the domain is blacklisted.
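The explode() and in_array() approach could look roughly like the sketch below. The function name isBlockedEmailDomain and the contents of the blacklist are made up for illustration, and wiring it into the CakePHP model as a custom validation rule is left out.

```php
<?php
// Hypothetical helper: the name and the blacklist contents are illustrative only.
function isBlockedEmailDomain(string $email, array $blockedDomains): bool
{
    $parts = explode('@', $email);
    if (count($parts) < 2) {
        // No @ at all, so there is no domain to compare.
        // Whether the address is valid is left to the normal email rule.
        return false;
    }
    // Take the piece after the last @ and compare case-insensitively.
    $domain = strtolower(end($parts));
    return in_array($domain, $blockedDomains, true);
}

$blocked = ['gmail.com', 'googlemail.com', 'hotmail.com', 'yahoo.com'];
var_dump(isBlockedEmailDomain('someone@Gmail.com', $blocked));   // bool(true)
var_dump(isBlockedEmailDomain('someone@example.org', $blocked)); // bool(false)
```

Note that this only blocks the exact domains listed, so country variants such as yahoo.co.uk would need their own entries.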
Related
I'm trying to aggregate stats on referrers to my site, to give me a simple display of top referrers. Unfortunately referrer data is untrustworthy, and often dirty, so I'm just trying to make a good faith attempt to get something like usable data.
I've already filtered bad URLs, and used url_parts to get the host portion of each URL. I've then stripped common aliased subdomains and social media URL shorteners, like t.co or fb.me.
The big issue that remains is webmail. Many webmail providers shunt their users to a sub-subdomain as soon as they log in, for load balancing. This is easy to filter for mail services like Yahoo, as they are all something.something.mail.yahoo.com, so I can just check if the third from last segment is "mail" or a similar subdomain, and strip all previous segments.
But now I am left with the hard cases, subdomains like:
webmaila (like webmaila.juno.com)
email16 (like email16.secureserver.net)
webmailb (like webmailb.netzero.net)
I need to find entries that start with 'mail', 'webmail', 'email', or 'mailbox', followed by any string, and strip the string, leaving me with just the appropriate prefix.
How can I do that?
echo preg_replace('#^(webmaila|email16|webmailb)(.+)?#', '$1', $string);
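If the goal is to match on the four prefixes from the question rather than on the three example subdomains, a variation along these lines might work. It is only a sketch: the function name collapseMailSubdomain is made up, and it assumes the input is a bare host name.

```php
<?php
// Hypothetical helper; the prefix list is taken from the question.
function collapseMailSubdomain(string $host): string
{
    // Longer prefixes are listed first so "mailbox" is not cut down to "mail".
    // [^.]* drops whatever follows the prefix, up to the next dot.
    return preg_replace('/^(webmail|mailbox|email|mail)[^.]*/i', '$1', $host);
}

echo collapseMailSubdomain('webmaila.juno.com'), "\n";        // webmail.juno.com
echo collapseMailSubdomain('email16.secureserver.net'), "\n"; // email.secureserver.net
echo collapseMailSubdomain('webmailb.netzero.net'), "\n";     // webmail.netzero.net
```

Hosts that do not start with one of the prefixes are returned unchanged.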
I am trying to extract emails in a PHP program and I am using the following regexp:
([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b)siU
This appears to work fine for standard emails, such as me @ gmail.com or you @ hotmail.com.
Where this fails is on emails with endings such as co.uk. I have tried adding co.uk to my regexp like this:
([a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+(?:[A-Z]{2}|com|co.uk|org|net|gov|mil|biz|info|mobi|name|aero|jobs|museum)\b)siU
But that just gives me the same output as the original regexp: the email still comes out as you@co. I also tried adding just uk. What am I missing on this one? Is it the second period throwing it off?
Ideally I want it to catch all emails ending in .com, .net, .org, .co.uk, .au and .ca. Basically I am searching for all US, UK, AU and CA emails. Can anyone spot my mistake, so that non-US emails come out properly, like you @ yoursite.co.uk instead of you @ yoursite.co?
Thank you. The spaces in the emails shown for example are only there to get this to post.
Edit: I am not trying to validate the emails. It's a series of emails in an array that can be anything, and I am trying to catch only specific ones for a database. Sorry for not making that clear initially.
Edit2: Here is my working string for my issue. Thanks everyone
^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$^
Do NOT use regex to validate email addresses. Use libraries that do it correctly.
See I Knew How To Validate An Email Address Until I Read The RFC for more information.
|co.uk|
Try this instead (outside of [], a dot matches any character except a line break):
|co\.uk|
You have to escape the dot like this.
co\.uk
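Besides escaping the dot, the position of the new term inside the alternation seems to matter. With the i and U flags, the two-letter alternative listed first can already match "co", and \b succeeds right before the following dot, so a later co\.uk is never tried. The sketch below uses deliberately simplified character classes (my own assumption, not the pattern from the question) to show the difference.

```php
<?php
$subject = 'Contact you@yoursite.co.uk or me@example.com today';

// Two-letter alternative first: "co" satisfies it and \b matches before the dot.
$before = '/[a-z0-9._-]+@(?:[a-z0-9-]+\.)+(?:[a-z]{2}|com|co\.uk|org|net)\b/iU';

// Longer, escaped alternative first; the generic two-letter case goes last.
$after = '/[a-z0-9._-]+@(?:[a-z0-9-]+\.)+(?:co\.uk|com|org|net|[a-z]{2})\b/iU';

preg_match_all($before, $subject, $m1);
preg_match_all($after, $subject, $m2);

echo implode(', ', $m1[0]), "\n"; // you@yoursite.co, me@example.com
echo implode(', ', $m2[0]), "\n"; // you@yoursite.co.uk, me@example.com
```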
If you are trying to match only valid emails this regexp should do the trick
(\s)?(([^\s]+)\@([^\s]+))
What am I missing on this one?
You are adding terms to a non-capturing group. So how can any output based on your regex contain anything in a non-capturing group? Not to mention the mistake in the term you added that nacholibre mentioned.
Don't use regex for email. You won't like it, and you will fail.
filter_var()
http://php.net/manual/en/function.filter-var.php
FILTER_VALIDATE_EMAIL
http://www.php.net/manual/en/filter.filters.validate.php
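A minimal sketch of how filter_var() with FILTER_VALIDATE_EMAIL might be used to keep only the plausible addresses from an array. The sample addresses are invented for the example.

```php
<?php
// Sample input, invented for illustration.
$candidates = ['you@yoursite.co.uk', 'not an email', 'me@example.com'];

// filter_var() returns the address on success and false on failure.
$valid = array_values(array_filter($candidates, function (string $candidate): bool {
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
}));

print_r($valid);
```

Selecting by country or domain would still be a separate step on the addresses that survive.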
I am getting spam because Gmail allows the use of . in its addresses, so a spammer with an address like this one:
q.i.n.ghu.im.i.n.g.o.u.r@gmail.com
can get through by removing and/or adding another period in his naming structure.
This happens to be on a Joomla install, so I am specifically looking to create a component that I can add to multiple sites, or a simple regex that I can add inline to existing code. Also, is anything being done about this? It seems to amount to what could be called a loosely typed email address, which is crazy to me.
If your goal is to match this address against the others that are equivalent to it (because you've already got them blacklisted), then I'd simply normalize the address to its most basic state before storing it. Lowercase it, split it at the @, and if the right side is "gmail.com", remove all dots from the left side and put the halves back together.
start with JOE.SCHMOE@GMAIL.COM
lowercase to joe.schmoe@gmail.com
split to joe.schmoe and gmail.com
since right side is gmail.com, remove dots from left
reassemble to joeschmoe@gmail.com
Now you've got the base address that you can block/ban/whatever.
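The steps above could be sketched like this. The function name normalizeEmail is made up, and only gmail.com is handled, as in the description.

```php
<?php
// Hypothetical helper following the steps described above.
function normalizeEmail(string $email): string
{
    $email = strtolower($email);

    // Split at the last @ into local part and domain.
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // not an address we can split; return it lowercased
    }
    $local = substr($email, 0, $at);
    $domain = substr($email, $at + 1);

    // Dots are only stripped for gmail.com.
    if ($domain === 'gmail.com') {
        $local = str_replace('.', '', $local);
    }

    return $local . '@' . $domain;
}

echo normalizeEmail('JOE.SCHMOE@GMAIL.COM'), "\n"; // joeschmoe@gmail.com
```

Comparing or storing the normalized form means every dotted variant of the same Gmail address maps to one blacklist entry.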
You could do something simple like: /^(?:[^@]+\.){5,}[^@]+@(?:[^@]+\.)+[^@]+/
This is just something thrown together quickly. It is not meant for validation, but rather as a pointer to tell you whether an email is sketchy. The key here is the {5,} quantifier, which makes the pattern match if the part before the @ has 5 or more dots (like a.b.c.d.e.f). In other words, the address gets flagged as sketchy.
I hope this helps!
Explanation: http://regex101.com/r/lB5vG3
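For what it's worth, here is how that pattern might be applied in PHP, assuming the obscured character in the pattern is the @ sign. The function name looksSketchy is made up for the example.

```php
<?php
// Hypothetical wrapper around the pattern above.
function looksSketchy(string $email): bool
{
    // Matches when the part before the @ contains five or more dots.
    return preg_match('/^(?:[^@]+\.){5,}[^@]+@(?:[^@]+\.)+[^@]+/', $email) === 1;
}

var_dump(looksSketchy('q.i.n.ghu.im.i.n.g.o.u.r@gmail.com')); // bool(true)
var_dump(looksSketchy('joe.schmoe@gmail.com'));               // bool(false)
```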
I am trying to parse a prose paragraph for anything that might resemble an address. I have a database of addresses I am matching against, and these are the only addresses I am interested in. I'm using a LAMP server, but technology-specific answers aren't what I need right now. It's more a question of how.
Can anyone provide ideas? Perhaps a regex? Or perhaps I should use a database of cities, states, etc.?
Thanks.
It looks like this question hasn't gotten answered because it's entirely unclear what the problem parameters are. If you want a more specific answer to a problem, please describe your problem more fully.
In general I would suggest approaching a problem like this using some piece of known data, such as a small collection of words or formats that delineate an address, then matching on the context of those words to see if they really flesh out to a full address.
I'm trying to see what would be a good way to validate a US address. I know that there might not be a proper way of doing this, but I'm going for the basic way: #, Street name, City, State, and Zip Code.
Any ideas would be appreciated. Thanks
Don't try. Somebody is likely to have a post office box, or an apartment number etc., and they will be really irate with you. Even a "normal" street name can have numbers, like 125th Street (and many others) in New York City. Even a suburb can have some numbered streets.
And city names can have spaces.
Ask the user to enter parts of the address in separate fields (Street name, City, State, and Zip Code) and use whatever validation appropriate for such a field. This is the general practice.
Alternatively, if you want the simplest regex that matches four strings separated by three commas, try this:
/^(.+),([^,]+),([^,]+),([^,]+)$/
If things match, you can use additional pattern matching to check components of the address. There is no possible way to check the street address validity, but you might be able to test postal codes and state codes.
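A rough sketch of that two-step idea: split on the commas first, then test the state and zip parts separately. The function name splitAddress and the two component patterns (two uppercase letters for the state, 5 or 9 digits for the zip) are my own assumptions.

```php
<?php
// Hypothetical helper: splits "street, city, state, zip" and checks two parts.
function splitAddress(string $address): ?array
{
    if (!preg_match('/^(.+),([^,]+),([^,]+),([^,]+)$/', $address, $m)) {
        return null; // not four comma-separated strings
    }
    [$street, $city, $state, $zip] = array_map('trim', array_slice($m, 1));

    return [
        'street'  => $street,
        'city'    => $city,
        'state'   => $state,
        'zip'     => $zip,
        // Assumed formats: two uppercase letters, and ZIP or ZIP+4.
        'stateOk' => preg_match('/^[A-Z]{2}$/', $state) === 1,
        'zipOk'   => preg_match('/^\d{5}(?:-\d{4})?$/', $zip) === 1,
    ];
}

print_r(splitAddress('125 W 125th Street, New York, NY, 10027'));
```

The state check only looks at the shape of the code. Checking it against a list of real state abbreviations would be stricter.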
There are way too many variations in address to be able to do this using regular expressions. You're better off finding a web service that can validate addresses. USPS has one - you'll have to request permission to use it.
I agree with salman: have the user enter the data in different fields (one for zip, one for state, one for city, and one for the #/street name). Use a different regex for each field. For the street #/name the best expression I came up with was
/^[0-9]{1,7} [a-zA-Z0-9]{2,35}\a*/
This is not a bulletproof solution, but the assumption is that an address begins with a number for the street number and ends with a zip code, which can be either 5 or 9 digits.
([0-9]{1,} [\s\S]*? [0-9]{5}(?:-[0-9]{4})?)
Like I said, it's not bulletproof, but I've used it with marginal success in the past.
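As an illustration, this is how that expression might be run against a sentence in PHP. The sample text is invented, and the result is only as good as the assumption that the address starts with a number and ends with a zip code.

```php
<?php
// Sample text, invented for illustration.
$text = 'Please ship it to 1600 Pennsylvania Ave NW, Washington, DC 20500 by Friday.';

$found = null;
if (preg_match('/([0-9]{1,} [\s\S]*? [0-9]{5}(?:-[0-9]{4})?)/', $text, $m)) {
    $found = $m[1];
}

echo $found, "\n"; // 1600 Pennsylvania Ave NW, Washington, DC 20500
```

Any earlier number in the text followed by a space would start the match too early, which is one reason this is not bulletproof.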
Over here in New Zealand, you can license the official list of postal addresses from New Zealand Post - giving you the data needed to populate a table with every valid postal address in New Zealand.
Validating against this list is a whole lot easier than trying to come up with a Regex - and the accuracy is much much higher as well, as you end up with three cases:
The address you're validating is in the list, so you know it is a real address
The address you're validating is very similar to one in the list, so you know it is probably a real address
The address you're validating is not similar to anything in the list, so it may or may not be real.
The best you'll get with a RegEx is
The address you're validating matches the regex, so it might be a real address
The address you're validating does not match the regex, so it might not be a real address
Needing to know postal addresses is a pretty common situation for many businesses, so I believe that licensing a list will be possible in most areas.
The only sticky bit will be pricing.
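To make the three cases concrete, here is a small sketch that classifies an input against a list. Everything in it is an assumption for illustration: the function name classifyAddress, the sample addresses, the use of levenshtein() as the similarity measure, and the threshold of 3 edits. A real licensed list would be far too large for a linear scan like this.

```php
<?php
// Hypothetical sketch: $knownAddresses stands in for a licensed address list.
function classifyAddress(string $input, array $knownAddresses, int $maxDistance = 3): string
{
    $needle = strtolower(trim($input));
    $best = PHP_INT_MAX;

    foreach ($knownAddresses as $known) {
        // Edit distance between the input and this known address.
        // (Before PHP 8, levenshtein() only accepts strings up to 255 characters.)
        $distance = levenshtein($needle, strtolower($known));
        if ($distance === 0) {
            return 'exact';
        }
        $best = min($best, $distance);
    }

    // The threshold is arbitrary and would need tuning on real data.
    return $best <= $maxDistance ? 'similar' : 'unknown';
}

// Made-up sample data.
$known = ['12 Queen Street, Auckland 1010', '1 Willis Street, Wellington 6011'];

echo classifyAddress('12 Queen Street, Auckland 1010', $known), "\n"; // exact
echo classifyAddress('12 Queen Stret, Auckland 1010', $known), "\n";  // similar
echo classifyAddress('99 Nowhere Road, Atlantis', $known), "\n";      // unknown
```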