Retrieve full email address from string

I'm currently building a Slack bot using Laravel, and one of the features is that it can receive an email address and send a message to it.
The issue is that email addresses (e.g. bob@example.com) come through as <mailto:bob@example.com|bob@example.com> from Slack.
I currently have a function that retrieves the email from this:
public function getEmail($string)
{
    $pattern = '/[a-z0-9_\-\+]+@[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
    preg_match_all($pattern, $string, $matches);
    $matches = array_filter($matches);
    return $matches[0][0];
}
This seemed to work fine with email addresses like bob@example.com, but it fails with addresses like bob.jones@example.com (which come through as <mailto:bob.jones@example.com|bob.jones@example.com>).
In these cases, the function returns jones@example.com as the email address.
I'm not great with regex, but is there something else I could use/change in my pattern, or a better way to fetch the email address from the string provided by Slack?

Could always take regex out of the equation if you know that's always the format it'll be in:
$testString = '<mailto:bob@example.com|bob@example.com>';
$testString = str_replace(['<mailto:', '>'], '', $testString);
$addresses = explode('|', $testString);
echo $addresses[0];

This method does the job without regular expressions, and it makes sure the value returned is a valid email address by checking it with PHP's filter_var function.
function getEmailAddress($string)
{
    $string = trim($string, '<>');
    $args = explode('|', $string);
    foreach ($args as $val) {
        if (filter_var($val, FILTER_VALIDATE_EMAIL) !== false) {
            return $val;
        }
    }
    return null;
}
echo getEmailAddress('<mailto:bob@example.com|bob@example.com>');
Output
bob@example.com

You know the strings containing the e-mail address will always be of the form <mailto:bob@example.com|bob@example.com>, so use that. Specifically, you know the string will start with <mailto:, will contain a |, and will end with >.
An added difficulty, though, is that the local part of an e-mail address may contain a pipe character as well, but the domain may not; see the question "What characters are allowed in an email address?".
public function getEmail($string)
{
    // The pipe is escaped so that it matches the literal separator instead of acting as alternation.
    $pattern = '/^<mailto:([^@]+@[^|]+)\|(.*)>$/i';
    preg_match_all($pattern, $string, $matches);
    $matches = array_filter($matches);
    return $matches[1][0];
}
This matches the full line from beginning to end, but we capture the e-mail address within the first set of parentheses. $matches[1] contains all matches from the first capturing parentheses. You could use preg_match instead, since you're not looking for all matches, just the first one.
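As a rough sketch of the preg_match variant suggested above: the helper name getEmailFromSlack is hypothetical, and the code assumes the separator pipe is meant literally (so it is escaped) and that input not in the Slack format should yield null.

```php
<?php
// Hypothetical helper; assumes input of the form <mailto:addr|addr>.
function getEmailFromSlack(string $string): ?string
{
    // Group 1: everything between "<mailto:" and the pipe that follows the domain.
    $pattern = '/^<mailto:([^@]+@[^|]+)\|.*>$/i';

    if (preg_match($pattern, $string, $matches) === 1) {
        return $matches[1];
    }

    return null; // not in the expected Slack format
}

echo getEmailFromSlack('<mailto:bob.jones@example.com|bob.jones@example.com>'), PHP_EOL;
```

Because preg_match stops at the first match, there is no nested array to unpack, and the null return makes the "no match" case explicit for the caller.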

Related

Email regex failing on certain email addresses

I'm using this code to parse out emails from a string:
function get_emails ($str) {
    $pattern = '/([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@([a-z0-9])' .
    '(([a-z0-9-])*([a-z0-9]))+' . '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)/i';
    preg_match ($pattern, $str, $matches);
    return $matches;
}
It works well except when the address has more than one period in the domain. So johndoe@yahoo.com works fine, but johndoe@yahoo.co.uk gets cut at johndoe@yahoo.co
What can I change to fix this?
Thanks!

You could add a + after the final group, so that the dot-plus-label part of the domain can repeat, i.e. end the pattern with )+/i.
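A minimal sketch of that change, using the pattern from the question with only the trailing + added; the sample input string is made up for illustration.

```php
<?php
// Pattern from the question with a "+" appended to the final group, so the
// "dot plus label" part of the domain may repeat (.co.uk and similar).
function get_emails(string $str): array
{
    $pattern = '/([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@([a-z0-9])' .
        '(([a-z0-9-])*([a-z0-9]))+' .
        '(\.([a-z0-9])([-a-z0-9_-])?([a-z0-9])+)+/i';

    preg_match($pattern, $str, $matches);

    return $matches;
}

$found = get_emails('Mail johndoe@yahoo.co.uk now');
echo $found[0] ?? '', PHP_EOL; // the full match is at index 0
```

The nested quantifiers in this pattern can backtrack heavily on long input, so it is probably best kept to short strings.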

Unique email addresses by domain

I'm trying to make the function below return only one email address per domain.
Example: if I feed the function:
email1@domain.com email2@domain.com email1@domain.com
email1@domain.com email3@test.co.uk
I want it to return
email1@domain.com email3@test.co.uk
Here is the current function:
function remove_duplicates($str) {
    # match all email addresses using a regular expression and store them
    # in an array called $results
    preg_match_all("([\w-]+(?:\.[\w-]+)*@(?:[\w-]+\.)+[a-zA-Z]{2,7})",$str,$results);
    # sort the results alphabetically
    sort($results[0]);
    # remove duplicate results by comparing each value to the previous one
    # (foreach replaces each(), which was removed in PHP 8)
    $prev="";
    foreach ($results[0] as $key => $val) {
        if($val==$prev) unset($results[0][$key]);
        else $prev=$val;
    }
    # process the array and return the remaining email addresses
    $str = "";
    foreach ($results[0] as $value) {
        $str .= "<br />".$value;
    }
    return $str;
}
Any ideas how to achieve this?

Something along these lines:
$emails = array('email1@domain.com', 'email2@domain.com', 'email1@domain.com', 'email1@domain.com', 'email3@test.co.uk');
$grouped = array();
foreach ($emails as $email) {
    preg_match('/(?<=@)[^@]+$/', $email, $match);
    $grouped[$match[0]] = $email;
}
var_dump($grouped);
This keeps the last occurrence of each domain; it's not hard to modify it to keep the first instead if you require that.

You could use the array_unique function to drop exact duplicate addresses (on its own it does not limit the result to one address per domain):
$emails = explode(' ', $emailString);
$emails = array_unique($emails);

The $prev comparison is not reliable unless all equal hostnames sit in one continuous sequence. It would work if you sorted by hostname with a custom comparison function, but that's a bit of overkill.
Build an array with the hostnames, drop entries for which there is already a hostname in the array.
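For completeness, the sort-by-hostname variant mentioned above might look roughly like this. The helper hostOf and the tie-break on the full address are my own assumptions, added so the result does not depend on sort stability.

```php
<?php
// Hypothetical helper: the part after the last "@", lower-cased.
function hostOf(string $email): string
{
    return strtolower(substr(strrchr($email, '@'), 1));
}

$emails = ['email1@domain.com', 'email3@test.co.uk', 'email2@domain.com', 'email1@domain.com'];

// Sort by hostname (then by full address) so equal hostnames form one continuous run.
usort($emails, function (string $a, string $b): int {
    return [hostOf($a), $a] <=> [hostOf($b), $b];
});

// With equal hostnames adjacent, comparing against the previous hostname is reliable.
$unique = [];
$prevHost = null;
foreach ($emails as $email) {
    $host = hostOf($email);
    if ($host !== $prevHost) {
        $unique[] = $email;
        $prevHost = $host;
    }
}

print_r($unique);
```

This keeps the alphabetically first address of each hostname, which may or may not be the one you want; the array-keyed-by-hostname approach avoids the sort entirely.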

I'd suggest the following procedure:
Turn the single string into an array of addresses. You do this with preg_match_all, others might do it with explode; both are valid. So you have this already.
Extract the domain from each address. You could do this with another regular expression or some other way; I'd say it's trivial.
Now check whether the domain has already been used, and if not, pick that email address.
The last point is easily done with an array that uses the domain as key. You can then use isset to see whether it is already in use.
Edit: As deceze opted for a similar answer (he overwrites the matches per domain), the following code example is a small variation. Since you have string input, I iterate over it step by step to spare the temporary array of addresses and to parse address and domain in one go. To do that, you need to keep track of the offsets, which preg_match supports. Something similar is possible with preg_match_all, but then you would have the array again.
This code will pick the first and ignore the other addresses per domain:
$str = 'email1@domain.com email2@domain.com email1@domain.com email1@domain.com email3@test.co.uk';
$addresses = array();
$pattern = '/[\w-]+(?:\.[\w-]+)*@((?:[\w-]+\.)+[a-zA-Z]{2,7})/';
$offset = 0;
while (preg_match($pattern, $str, $matches, PREG_OFFSET_CAPTURE, $offset)) {
    list(list($address, $pos), list($domain)) = $matches;
    isset($addresses[$domain]) || $addresses[$domain] = $address;
    $offset = $pos + strlen($address);
}

Detecting emails in a text

I'm trying to create a function that translates every occurrence of a plain-text email address in a given string into its htmlized version.
Let's say I have the following code, where htmlizeEmails is the function I'm looking for:
$str = "Send me an email to bob@example.com.";
echo htmlizeEmails($str); // Echoes "Send me an email to <a href="mailto:bob@example.com">bob@example.com</a>."
If possible, I'd like this function to use the filter_var function to check if the email is valid.
Does anyone know how to do this? Thanks!
Edit:
Thanks for the answers. I used Shocker's regex to match potential email addresses, and each match is replaced only if filter_var validates it.
function htmlizeEmails($text)
{
    preg_match_all('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', $text, $potentialEmails, PREG_SET_ORDER);
    $potentialEmailsCount = count($potentialEmails);
    for ($i = 0; $i < $potentialEmailsCount; $i++) {
        if (filter_var($potentialEmails[$i][0], FILTER_VALIDATE_EMAIL)) {
            // Wrap the validated address in a mailto link.
            $text = str_replace($potentialEmails[$i][0], '<a href="mailto:' . $potentialEmails[$i][0] . '">' . $potentialEmails[$i][0] . '</a>', $text);
        }
    }
    return $text;
}

$str = preg_replace('/([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', '<a href="mailto:$1">$1</a>', $str);
where ([a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}) is the regular expression used to detect an email address. This is a general example: email addresses can be more complicated than this, so not every address is covered, and finding the perfect regex for emails is up to you.

There's always matching every sequence of non-space characters and testing those with filter_var, but this is probably one of those cases where it's just better to use regular expressions.
echo preg_replace('/(([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+\.)*([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+@((\w+[\.-])*[a-zA-Z]{2,}|\[(\d{1,3}\.){3}\d{1,3}\])/', '<a href="mailto:$0">$0</a>', $str);
I've tried to follow the standard as best I could without making it ridiculously compliant. And anybody who puts comments in his or her e-mail address can just be forgotten safely, I think. And it definitely works for common e-mails.
EDIT: After a long, difficult struggle, here's my regular expression to match everything:
((([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>@\[\]]|\\[ \\"])+")\.)*([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>@\[\]]|\\[ \\"])+"))@((([a-zA-Z0-9]([a-zA-Z0-9]*(\-[a-zA-Z0-9]*)*)?\.)*[a-zA-Z]{2,}|\[((0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\]|\[[Ii][Pp][vV]6(:[0-9a-fA-F]{0,4}){6}\]))
Enjoy escaping it!
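The first option mentioned in this answer, testing every run of non-space characters with filter_var, could be sketched as follows. The function name linkEmailTokens is hypothetical, and stripping trailing sentence punctuation before validating is an assumption of mine, not part of the answer.

```php
<?php
// Hypothetical helper: link every whitespace-separated token that
// filter_var() accepts as an email address.
function linkEmailTokens(string $text): string
{
    // Split on whitespace but keep the separators so the original spacing survives.
    $parts = preg_split('/(\s+)/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    foreach ($parts as $i => $part) {
        // Assumption: trailing sentence punctuation is not part of the address.
        $candidate = rtrim($part, '.,;:!?');
        if ($candidate !== '' && filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            $tail = substr($part, strlen($candidate));
            $safe = htmlspecialchars($candidate, ENT_QUOTES);
            $parts[$i] = '<a href="mailto:' . $safe . '">' . $safe . '</a>' . $tail;
        }
    }

    return implode('', $parts);
}

echo linkEmailTokens('Send me an email to bob@example.com.'), PHP_EOL;
```

This misses addresses glued to other characters (for example inside parentheses), which is one reason the regex route tends to be more practical.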

The code below should work fine, but regex is easier to go with.
$str = "Send me an email to bob@example.com.";
function htmlizestring($a){
    if(substr_count($a,"@") != 1){
        return false;
    }else{
        $b4 = stristr($a,"@",true);
        $b4pos = strripos($b4," ")+1;
        $b4 = trim(substr($b4,$b4pos));
        $after = stristr($a,"@");
        if(substr_count($after, " ") == 0){
            $after=rtrim($after," .,");
        }else{
            $after=trim(stristr($after," ",true));
        }
        $email = $b4.$after;
        echo $email;
        if(filter_var($email, FILTER_VALIDATE_EMAIL)){
            echo "Send me an email at: <a href='mailto:".$email."'>".$email."</a>";
        }else{
            return false;
        }
    }
}
htmlizestring($str);
I happen to use stristr() with the third parameter TRUE, which only works on PHP 5.3+.

filter_var is nice to validate an email, but Dominic Sayers' is_email is even better, and my personal choice.
source code: http://code.google.com/p/isemail/source/browse/PHP/trunk/is_email.php
about: http://isemail.info/about

Regular expression and newline

I have such text:
<Neednt@email.com> If you do so, please include this problem report.
<Anotherneednt@email.com> You can delete your
own
text from the attached returned message.
The mail system
<Some@Mail.net>: connect to *.net[82.*.86.*]: Connection timed
out
I have to parse the email address out of it. Could you help me with this?
Update:
There could be other email addresses in <%here%>. The match has to be tied to the 'The mail system' text: I need the email address that comes after that text.

Considering this text is stored in $text, what about this :
$matches = array();
if (preg_match('/<([^>]+)>/', $text, $matches)) {
var_dump($matches[1]);
}
Which gives me :
string 'Some@Mail.net' (length=13)
Basically, I used a pretty simple regex, that matches :
a < character
anything that's not a > character : [^>]
at least one time : [^>]+
capturing it : ([^>]+)
a > character
So, it captures anything that's between < and >.
Edit after comments+edit of the OP :
If you only want the e-mail address that's after The mail system, you could use this :
$matches = array();
if (preg_match('/The mail system\s*<([^>]+)>/', $text, $matches)) {
var_dump($matches[1]);
}
In addition to what I posted before, this expects:
The string The mail system
Any number of whitespace characters: \s*

You want to use preg_match() and looking at this input it should be simple:
<?php
if (preg_match('/<([^>]*?@[^>]*)>/', $data, $matches)) {
var_dump($matches); // specifically look at $matches[1]
}
There are other patterns that would match it, you don't have to stick to that same pattern. The '<' and '>' in your input are helpful here.

PHP Email Array Regular Expression

Given a list of emails, formatted:
"FirstName Last" <email@address.com>, "NewFirst NewLast" <email2@address.com>
How can I build this into a string array of only the email addresses? (I don't need the names.)

PHP’s Mailparse extension has a mailparse_rfc822_parse_addresses function you might want to try. Otherwise you should build your own address parser.

You could use preg_match_all (docs):
preg_match_all('/<([^>]+)>/', $s, $matches);
print_r($matches); // inspect the resulting array
Provided that all addresses are enclosed in < ... > there is no need to explode() the string $s.
EDIT In response to comments, the regex could be rewritten as '/<([^@]+@[^>]+)>/'. Not sure whether this is fail-safe, though :)
EDIT #2 Use a parser for any non-trivial data (see the comments below - email address parsing is a bitch). Some errors could, however, be prevented by removing duplicate addresses.
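As a rough illustration of the regex route with duplicates removed, under the assumption that every address is enclosed in < ... >: the extra ">" in the first character class and the filter_var check are my own additions, and the third entry in the sample string is made up to show the de-duplication.

```php
<?php
$s = '"FirstName Last" <email@address.com>, "NewFirst NewLast" <email2@address.com>, "Again" <email@address.com>';

// Capture whatever sits between "<" and ">" and contains an "@".
preg_match_all('/<([^@>]+@[^>]+)>/', $s, $matches);

// Drop duplicates, then keep only what filter_var() accepts.
$emails = array_values(array_filter(
    array_unique($matches[1]),
    function (string $address): bool {
        return filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    }
));

print_r($emails);
```

This is still not a real address-list parser: display names that themselves contain angle brackets will trip it up.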

<?php
$s = "\"FirstName Last\" <email@address.com>, \"NewFirst NewLast\" <email2@address.com>";
$emails = array();
// explode() replaces split(), which was removed in PHP 7.
foreach (explode(",", $s) as $full)
{
    preg_match("/.*<([^>]+)/", $full, $email);
    $emails[] = $email[1];
}
print_r($emails);
?>
