Email regex failing on certain email addresses - php

I'm using this code to parse out emails from a string:
function get_emails ($str) {
    $pattern = '/([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\#([a-z0-9])' .
        '(([a-z0-9-])*([a-z0-9]))+' . '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)/i';
    preg_match ($pattern, $str, $matches);
    return $matches;
}
It works well except when the address has more than one period in the domain. So johndoe#yahoo.com works fine, but johndoe#yahoo.co.uk gets cut off at johndoe#yahoo.co.
What can I change to fix this?
Thanks!

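You could add a + after the last group, just before the closing delimiter, so that the pattern ends in )+/i. That lets the final dot-plus-label group repeat, so both .co and .uk are matched.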
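As a quick check of that suggestion, here is the function from the question with the extra + applied. It is only a sketch: the pattern is otherwise unchanged, and the address is written with # exactly as it appears above.

```php
<?php
// Same pattern as in the question, with "+" added after the final group
// so that the ".label" part may repeat (".co" followed by ".uk").
function get_emails($str) {
    $pattern = '/([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\#([a-z0-9])'
        . '(([a-z0-9-])*([a-z0-9]))+'
        . '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)+/i';
    preg_match($pattern, $str, $matches);
    return $matches;
}

$found = get_emails('johndoe#yahoo.co.uk');
echo $found[0], "\n"; // prints johndoe#yahoo.co.uk
```

The full match is in $found[0]; the remaining entries are the pattern's many capture groups.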

Related

Validate url with query string containing email address using PHP

Hi, I have a problem validating a URL whose query string contains an email address, such as:
https://example.com/?email=john+test1#example.com
This email is of course a valid one: john+test1#example.com is an alias of john#example.com.
I have regex like this:
$page = trim(preg_replace('/[\\0\\s+]/', '', $page));
but it doesn't work as I expected, because it replaces + with an empty string, which is wrong. It should keep the + that forms the email alias and strip special characters while keeping the address correct.
Example of a wrong URL with +:
https://examp+le.com/?email=example#exam+ple.com
Other URLs without an email in the query string should still validate correctly using this regex.
Any idea how to solve it?
I think this is what you are looking for:
<?php
function replace_plus_sign($string){
    return
        preg_replace(
            '/#/',
            '+',
            preg_replace(
                '/\++/i',
                '',
                preg_replace_callback(
                    '/(email([\d]+)?=)([^#]+)/i',
                    function($matches){
                        return $matches[1] . preg_replace('/\+(?!$)/i', '#', $matches[3]);
                    },
                    $string
                )
            )
        );
}
$page = 'https://exam+ple.com/email=john+test1+#example.com&email2=john+test2#exam+ple.com';
echo replace_plus_sign($page);
Gives the following output:
https://example.com/email=john+test1#example.com&email2=john+test2#example.com
First, I replace the valid + signs in the email addresses with a #, then remove all the remaining + signs, and finally replace each # with a +.
This solution won't work if the URL already contains a #; in that case you will need to use another character instead of # for the temporary replacement.

Retrieve full email address from string

I'm currently building a Slack bot using Laravel, and one of the features is that it can receive an email address and send a message to it.
The issue is that email addresses (e.g. bob#example.com) come through as <mailto:bob#example.com|bob#example.com> from Slack.
I currently have a function that retrieves the email from this:
public function getEmail($string)
{
$pattern = '/[a-z0-9_\-\+]+#[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);
$matches = array_filter($matches);
return $matches[0][0];
}
This seemed to work fine with email addresses like bob#example.com; however, it fails with email addresses like bob.jones#example.com (which would come through as <mailto:bob.jones#example.com|bob.jones#example.com>).
In these cases, the function is returning jones#example.com as the email address.
I'm not great with regex, but is there something else I could use/change in my pattern, or a better way to fetch the email address from the string provided by Slack?
You could always take regex out of the equation if you know that's always the format it'll be in:
$testString = '<mailto:bob#example.com|bob#example.com>';
$testString = str_replace(['<mailto:', '>'], '', $testString);
$addresses = explode('|', $testString);
echo $addresses[0];
This method does the job without regular expressions, and it makes sure the returned value is a real email address by validating it with PHP's filter_var().
function getEmailAddress($string)
{
    $string = trim($string, '<>');
    $args = explode('|', $string);
    foreach ($args as $_ => $val) {
        if(filter_var($val, FILTER_VALIDATE_EMAIL) !== false) {
            return $val;
        }
    }
    return null;
}
echo getEmailAddress('<mailto:bob#example.com|bob#example.com>');
Output
bob#example.com
You know the strings containing the e-mail address will always be of the form <mailto:bob#example.com|bob#example.com>, so use that. Specifically, you know the string will start with <mailto:, will contain a |, and will end with >.
An added difficulty though, is that the local part of an e-mail address may contain a pipe character as well, but the domain may not; see the following question.
What characters are allowed in an email address?
public function getEmail($string)
{
    // The pipe is escaped (\|) so it matches a literal "|" instead of acting as alternation
    $pattern = '/^<mailto:([^#]+#[^|]+)\|(.*)>$/i';
    preg_match_all($pattern, $string, $matches);
    $matches = array_filter($matches);
    return $matches[1][0];
}
This matches the full line from beginning to end, but we capture the e-mail address within the first set of parentheses. $matches[1] contains all matches from the first capturing parentheses. You could use preg_match instead, since you're not looking for all matches, just the first one.
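The preg_match variant mentioned above might look like the sketch below. The function name getSlackEmail is hypothetical, the pipe is escaped as \| so that it is matched literally, and the addresses use # as they are written on this page.

```php
<?php
// Sketch: extract the address from a Slack-style "<mailto:ADDRESS|LABEL>" string.
// Returns null when the input does not have that shape.
function getSlackEmail(string $string): ?string
{
    // Group 1 is the address: everything up to the first "|" that follows the "#".
    if (preg_match('/^<mailto:([^#]+#[^|]+)\|(.*)>$/i', $string, $matches)) {
        return $matches[1];
    }
    return null;
}

echo getSlackEmail('<mailto:bob.jones#example.com|bob.jones#example.com>'), "\n";
// prints bob.jones#example.com
```

Because preg_match stops at the first match, $matches[1] is a plain string here rather than an array of matches.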

How to extract Email & Name from Full Email text using PHP?

I have a string as
$email_string='Aslam Doctor <aslam.doctor#gmail.com>';
From this I want to extract the name and email using PHP, so that I get
$email='aslam.doctor#gmail.com';
$name='Aslam Doctor';
Thanks in advance.
As much as people will probably recommend regular expressions, I'd say use explode().
explode() splits a string into several substrings using a delimiter.
In this case I use ' <' as the delimiter, which also strips the whitespace between the name and the e-mail.
$split = explode(' <', $email_string);
$name = $split[0];
$email = rtrim($split[1], '>');
rtrim() will trim the '>' character from the end of the string.
Using explode + list:
$email_string = 'Aslam Doctor <aslam.doctor#gmail.com>';
list($name, $email) = explode(' <', trim($email_string, '> '));
If you can use the IMAP extension, the imap_rfc822_parse_adrlist function is all you need.
/via https://stackoverflow.com/a/3638433/204774
The $text variable holds one paragraph that contains two email addresses. The extract_emails_from_string() function extracts those addresses from the paragraph.
preg_match_all returns every substring of the input that matches the regular expression.
function extract_emails_from_string($string){
    preg_match_all("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $string, $matches);
    return $matches[0];
}
$text = "Please be sure to answer the Please arun1#email.com be sure to answer the Please be sure to answer the Please be sure to answer the Please be sure to answer the Please be sure to answer the Please be sure to answer the arun#email.com";
$emails = extract_emails_from_string($text);
print(implode("\n", $emails));
This is what I use - works for email addresses with and without the angle bracket formatting. Because we are searching from right to left, this also works for those weird instances where the name segment actually contains the < character:
$email = 'Aslam Doctor <aslam.doctor#gmail.com>';
$address = trim(substr($email, strrpos($email, '<')), '<>');
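Building on the same right-to-left idea, a small helper could return the name as well as the address. This is a sketch only; split_name_and_email is a hypothetical name, and it assumes the address is either the whole string or the last <...> part.

```php
<?php
// Sketch: split "Name <address>" into [name, address], searching for the
// last "<" so that a "<" inside the name does not confuse the split.
function split_name_and_email(string $input): array
{
    $pos = strrpos($input, '<');
    if ($pos === false) {
        // No angle brackets: treat the whole string as the address.
        return ['', trim($input)];
    }
    $name = trim(substr($input, 0, $pos));
    $email = trim(substr($input, $pos), '<> ');
    return [$name, $email];
}

list($name, $email) = split_name_and_email('Aslam Doctor <aslam.doctor#gmail.com>');
echo $name, "\n";  // prints Aslam Doctor
echo $email, "\n"; // prints aslam.doctor#gmail.com
```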

Regex Get Email handle from Email Address

I have an email address that could either be
$email = "x#example.com"; or $email="Johnny <x#example.com>"
I want to get
$handle = "x"; for either version of the $email.
How can this be done in PHP (assuming regex). I'm not so good at regex.
Thanks in advance
Use the regex <?([^<]+?)# then get the result from $matches[1].
Here's what it does:
<? matches an optional <.
[^<]+? does a non-greedy match of one or more characters that are not <.
# matches the # in the email address.
A non-greedy match makes the resulting match the shortest necessary for the regex to match. This prevents running past the #.
Rubular: http://www.rubular.com/r/bntNa8YVZt
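A minimal sketch of how that regex could be used from PHP follows. The function name get_handle is hypothetical, and # is used as the separator exactly as in the question.

```php
<?php
// Sketch: return the part of the address before the "#", with or without
// a leading "Name <" prefix. Returns null if nothing matches.
function get_handle(string $email): ?string
{
    if (preg_match('/<?([^<]+?)#/', $email, $matches)) {
        return $matches[1];
    }
    return null;
}

echo get_handle('x#example.com'), "\n";          // prints x
echo get_handle('Johnny <x#example.com>'), "\n"; // prints x
```

In the second call no match can start inside "Johnny " because [^<] cannot cross the <, so the match begins at the bracket and the capture holds only the handle.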
Here is a complete PHP solution based on marcog's answer
function extract_email($email_string) {
    preg_match("/<?([^<]+?)#([^>]+?)>?$/", $email_string, $matches);
    return $matches[1] . "#" . $matches[2];
}
echo extract_email("ice.cream.bob#gmail.com"); // outputs ice.cream.bob#gmail.com
echo extract_email("Ice Cream Bob <ice.cream.bob#gmail.com>"); // outputs ice.cream.bob#gmail.com
Just search the string using this basic email-finding regex (with the case-insensitive flag, since the character classes list only uppercase letters): \b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b
It will match any email in any text: in your first string it will match the whole string, and in the second, only the part of the string that is the e-mail.
To quickly learn regexp this is the best place: http://www.regular-expressions.info
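The search described above can be sketched as follows, with the dot before the top-level domain escaped and the i flag added. The function name find_emails is hypothetical, and the addresses use # as written on this page.

```php
<?php
// Sketch: collect every address-like substring in a piece of text.
function find_emails(string $text): array
{
    preg_match_all('/\b[A-Z0-9._%+-]+#[A-Z0-9.-]+\.[A-Z]{2,4}\b/i', $text, $matches);
    return $matches[0]; // full matches only
}

print_r(find_emails('Johnny <x#example.com>'));
```

Note that {2,4} limits the last label to four letters, so longer top-level domains would not be matched by this pattern.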
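$email = 'x#gmail.com';
preg_match('/([a-zA-Z0-9\-\._\+]+#[a-z0-9A-Z\-\._]+\.[a-zA-Z]+)/', $email, $regex);
// Store the explode() result first: array_shift() takes its argument by reference
$parts = explode('#', $regex[1]);
$handle = array_shift($parts);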
Try that (Not tested)

Regular expression and newline

I have such text:
<Neednt#email.com> If you do so, please include this problem report.
<Anotherneednt#email.com> You can delete your
own
text from the attached returned message.
The mail system
<Some#Mail.net>: connect to *.net[82.*.86.*]: Connection timed
out
I have to parse email from it. Could you help me with this job?
Update:
Other email addresses may also appear in the <%here%> position. The match has to be tied to the text 'The mail system': I need the email address that follows that text.
Considering this text is stored in $text, what about this :
$matches = array();
if (preg_match('/<([^>]+)>/', $text, $matches)) {
    var_dump($matches[1]);
}
Which gives me :
string 'Some#Mail.net' (length=13)
Basically, I used a pretty simple regex, that matches :
a < character
anything that's not a > character : [^>]
at least one time : [^>]+
capturing it : ([^>]+)
a > character
So, it captures anything that's between < and >.
Edit after comments+edit of the OP :
If you only want the e-mail address that's after The mail system, you could use this :
$matches = array();
if (preg_match('/The mail system\s*<([^>]+)>/', $text, $matches)) {
    var_dump($matches[1]);
}
In addition to what I posted before, this expects :
The string The mail system
Any number of whitespace characters, including newlines : \s*
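Put together with the sample text from the question, the anchored pattern can be tried as below. This is a sketch: the function name address_after_mail_system is hypothetical, and the text is copied from the question as it appears here.

```php
<?php
$text = <<<'TXT'
<Neednt#email.com> If you do so, please include this problem report.
<Anotherneednt#email.com> You can delete your
own
text from the attached returned message.
The mail system
<Some#Mail.net>: connect to *.net[82.*.86.*]: Connection timed
out
TXT;

// Sketch: return the address in angle brackets that follows "The mail system",
// or null if that marker is not present.
function address_after_mail_system(string $text): ?string
{
    if (preg_match('/The mail system\s*<([^>]+)>/', $text, $matches)) {
        return $matches[1];
    }
    return null;
}

var_dump(address_after_mail_system($text));
```

The two earlier addresses are skipped because the match can only begin at the literal text "The mail system".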
You want to use preg_match(); looking at this input, it should be simple:
<?php
// The capture group is closed before the final ">" so the pattern compiles
if (preg_match('/<([^>]*?#[^>]*)>/', $data, $matches)) {
    var_dump($matches); // specifically look at $matches[1]
}
There are other patterns that would match it, you don't have to stick to that same pattern. The '<' and '>' in your input are helpful here.
