PHP Email Array Regular Expression - php

Given a list of emails, formatted:
"FirstName Last" <email#address.com>, "NewFirst NewLast" <email2#address.com>
How can I build this into a string array of only the email addresses (I don't need the names)?

PHP’s Mailparse extension has a mailparse_rfc822_parse_addresses function you might want to try. Otherwise you should build your own address parser.

You could use preg_match_all (docs):
preg_match_all('/<([^>]+)>/', $s, $matches);
print_r($matches); // inspect the resulting array
Provided that all addresses are enclosed in < ... > there is no need to explode() the string $s.
EDIT In response to comments, the regex could be rewritten as '/<([^#]+#[^>]+)>/'. Not sure whether this is fail-safe, though :)
EDIT #2 Use a parser for any non-trivial data (see the comments below - email address parsing is a bitch). Some errors could, however, be prevented by removing duplicate addresses.
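As a minimal sketch of the preg_match_all approach combined with the duplicate removal mentioned above: the capture group in $matches[1] already holds the bare addresses. The function name extract_bracketed_addresses is made up for illustration, and the sample data keeps the # separator used in the question.

```php
<?php
// Hypothetical helper: collects everything enclosed in <...> and drops duplicates.
function extract_bracketed_addresses(string $s): array
{
    // $matches[1] holds the text captured between the angle brackets.
    preg_match_all('/<([^<>]+)>/', $s, $matches);

    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($matches[1]));
}

$s = '"FirstName Last" <email#address.com>, "NewFirst NewLast" <email2#address.com>, "Again" <email#address.com>';
print_r(extract_bracketed_addresses($s));
```

This only works while every address is wrapped in angle brackets; for anything less regular, a real parser is still the safer choice.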

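<?php
$s = "\"FirstName Last\" <email#address.com>, \"NewFirst NewLast\" <email2#address.com>";
$emails = array();
// split() was removed in PHP 7; explode() does the same job for a literal delimiter.
foreach (explode(",", $s) as $full)
{
    // Only use the capture when the entry actually contains a <...> part.
    if (preg_match("/.*<([^>]+)/", $full, $email)) {
        $emails[] = $email[1];
    }
}
print_r($emails);
?>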

Related

Retrieve full email address from string

I'm currently building a Slack bot using Laravel, and one of the features is that it can receive an email address and send a message to it.
The issue is that email addresses (e.g. bob#example.com) come through as <mailto:bob#example.com|bob#example.com> from Slack.
I currently have a function that retrieves the email from this:
public function getEmail($string)
{
$pattern = '/[a-z0-9_\-\+]+#[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);
$matches = array_filter($matches);
return $matches[0][0];
}
This seemed to be working fine with email addresses like bob#example.com; however, it fails with email addresses like bob.jones#example.com (which come through as <mailto:bob.jones#example.com|bob.jones#example.com>).
In these cases, the function is returning jones#example.com as the email address.
I'm not great with regex, but is there something else I could use/change in my pattern, or a better way to fetch the email address from the string provided by Slack?
Could always take regex out of the equation if you know that's always the format it'll be in:
$testString = '<mailto:bob#example.com|bob#example.com>';
$testString = str_replace(['<mailto:', '>'], '', $testString);
$addresses = explode('|', $testString);
echo $addresses[0];
This method does the job without regular expressions, and it makes sure the returned value is a real email address by validating it with PHP's filter_var().
function getEmailAddress($string)
{
$string = trim($string, '<>');
$args = explode('|', $string);
foreach ($args as $_ => $val) {
if(filter_var($val, FILTER_VALIDATE_EMAIL) !== false) {
return $val;
}
}
return null;
}
echo getEmailAddress('<mailto:bob#example.com|bob#example.com>');
Output
bob#example.com
You know the strings containing the e-mail address will always be of the form <mailto:bob#example.com|bob#example.com>, so use that. Specifically, you know the string will start with <mailto:, will contain a |, and will end with >.
An added difficulty though, is that the local part of an e-mail address may contain a pipe character as well, but the domain may not; see the following question.
What characters are allowed in an email address?
public function getEmail($string)
{
$pattern = '/^<mailto:([^#]+#[^|]+)\|(.*)>$/i';
preg_match_all($pattern, $string, $matches);
$matches = array_filter($matches);
return $matches[1][0];
}
This matches the full line from beginning to end, but we capture the e-mail address within the first set of parentheses. $matches[1] contains all matches from the first capturing parentheses. You could use preg_match instead, since you're not looking for all matches, just the first one.
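A minimal sketch of the preg_match variant suggested above, assuming the pipe between the two copies of the address is meant literally and therefore escaped. The function name get_slack_email is hypothetical, and the sample strings keep the # separator used throughout this thread.

```php
<?php
// Hypothetical helper: returns the address from Slack's <mailto:addr|label> form, or null.
function get_slack_email(string $string): ?string
{
    // [^#]+ lets the local part contain a pipe; the domain part ([^|]+) may not.
    $pattern = '/^<mailto:([^#]+#[^|]+)\|.*>$/i';

    if (preg_match($pattern, $string, $m) === 1) {
        return $m[1];
    }
    return null;
}

echo get_slack_email('<mailto:bob.jones#example.com|bob.jones#example.com>'), "\n";
```

Returning null for input that does not match lets the caller decide how to handle a message without an address.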

Unique email addresses by domain

I'm trying to make the function below return only one email per domain.
Example: if I feed the function:
email1#domain.com email2#domain.com email1#domain.com
email1#domain.com email3#test.co.uk
I want it to return
email1#domain.com email3#test.co.uk
Here is the current function:
function remove_duplicates($str) {
# match all email addresses using a regular expression and store them
# in an array called $results
preg_match_all("([\w-]+(?:\.[\w-]+)*#(?:[\w-]+\.)+[a-zA-Z]{2,7})",$str,$results);
# sort the results alphabetically
sort($results[0]);
# remove duplicate results by comparing it to the previous value
$prev="";
foreach ($results[0] as $key => $val) {
if($val==$prev) unset($results[0][$key]);
else $prev=$val;
}
# process the array and return the remaining email addresses
$str = "";
foreach ($results[0] as $value) {
$str .= "<br />".$value;
}
return $str;
};
Any ideas how to achieve this?
Something along these lines:
$emails = array('email1#domain.com', 'email2#domain.com', 'email1#domain.com', 'email1#domain.com', 'email3#test.co.uk');
$grouped = array();
foreach ($emails as $email) {
preg_match('/(?<=#)[^#]+$/', $email, $match);
$grouped[$match[0]] = $email;
}
var_dump($grouped);
This keeps the last occurrence of a domain; it's not hard to modify it to keep the first instead if you require that.
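For the keep-the-first variant, a rough sketch could check with isset before assigning. The function name first_address_per_domain is invented here, lower-casing the domain is an added assumption, and the # separator follows the sample data in the question.

```php
<?php
// Hypothetical helper: maps each domain to the first address seen for it.
function first_address_per_domain(array $emails): array
{
    $byDomain = [];
    foreach ($emails as $email) {
        $pos = strrpos($email, '#');
        if ($pos === false) {
            continue; // no separator, so there is no domain to group by
        }
        // Assumption: domains are compared case-insensitively.
        $domain = strtolower(substr($email, $pos + 1));
        if (!isset($byDomain[$domain])) {
            $byDomain[$domain] = $email;
        }
    }
    return $byDomain;
}

$emails = ['email1#domain.com', 'email2#domain.com', 'email1#domain.com', 'email3#test.co.uk'];
print_r(first_address_per_domain($emails));
```

Calling array_values() on the result gives a plain list if the domain keys are not needed.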
You could simply use the array_unique function to do the job for you:
$emails = explode(' ', $emailString);
$emails = array_unique($emails);
The $prev approach is not reliable unless all equal hostnames are in one contiguous sequence. It would work if you sorted by hostname with a custom sorting function, but that's a bit of overkill.
Instead, build an array of hostnames and drop any entry whose hostname is already in the array.
I'd suggest the following trick/procedure:
Convert the single string into an array of addresses. You do this with preg_match_all; others might use explode. Both are valid, so you have this step already.
Extract the domain from the address. You could do this with another regular expression or by some other means; it's trivial.
Now check if the domain has been already used, and if not, pick that email address.
The last point can easily be done by using an array with the domain as the key. You can then use isset to see if it is already in use.
Edit: As deceze posted a similar answer (his overwrites the match per domain), the following code example is a small variation. Since your input is a string, I iterate over it step by step to avoid the temporary array of addresses and to parse the address and the domain in one pass. To do that, you need to keep track of the offsets, which preg_match supports. Something similar is possible with preg_match_all, but then you would have the array again.
This code will pick the first and ignore the other addresses per domain:
$str = 'email1#domain.com email2#domain.com email1#domain.com email1#domain.com email3#test.co.uk';
$addresses = array();
$pattern = '/[\w-]+(?:\.[\w-]+)*#((?:[\w-]+\.)+[a-zA-Z]{2,7})/';
$offset = 0;
while (preg_match($pattern, $str, $matches, PREG_OFFSET_CAPTURE, $offset)) {
list(list($address, $pos), list($domain)) = $matches;
isset($addresses[$domain]) || $addresses[$domain] = $address;
$offset = $pos + strlen($address);
}

How to extract Email & Name from Full Email text using PHP?

I have a string as
$email_string='Aslam Doctor <aslam.doctor#gmail.com>';
From this I want to extract the name and email using PHP, so that I get:
$email='aslam.doctor#gmail.com';
$name='Aslam Doctor';
Thanks in advance.
As much as people will probably recommend regular expression I'd say use explode().
Explode splits the string up in several substrings using any delimiter.
In this case I use ' <' as a delimiter to immediately strip the whitespace between the name and e-mail.
$split = explode(' <', $email_string);
$name = $split[0];
$email = rtrim($split[1], '>');
rtrim() will trim the '>' character from the end of the string.
Using explode + list:
$email_string = 'Aslam Doctor <aslam.doctor#gmail.com>';
list($name, $email) = explode(' <', trim($email_string, '> '));
If you can use the IMAP extension, the imap_rfc822_parse_adrlist function is all you need.
/via https://stackoverflow.com/a/3638433/204774
The $text variable holds one paragraph that contains two email addresses. The extract_emails_from_string() function extracts those addresses from the paragraph.
preg_match_all returns all substrings of the input that match the regular expression.
function extract_emails_from_string($string){
preg_match_all("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $string, $matches);
return $matches[0];
}
$text = "Please be sure to answer the Please arun1#email.com be sure to answer the Please be sure to answer the Please be sure to answer the Please be sure to answer the Please be sure to answer the Please be sure to answer the arun#email.com";
$emails = extract_emails_from_string($text);
print(implode("\n", $emails));
This is what I use - works for email addresses with and without the angle bracket formatting. Because we are searching from right to left, this also works for those weird instances where the name segment actually contains the < character:
$email = 'Aslam Doctor <aslam.doctor#gmail.com>';
$address = trim(substr($email, strrpos($email, '<')), '<>');
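The same right-to-left search can also return the name. The following is only a sketch under the assumption that the name is everything before the last < character; split_name_and_email is a made-up function name.

```php
<?php
// Hypothetical helper: splits 'Name <address>' at the last '<'.
function split_name_and_email(string $s): array
{
    $pos = strrpos($s, '<');
    if ($pos === false) {
        // No angle brackets: treat the whole string as the address.
        return ['name' => '', 'email' => trim($s)];
    }
    // Strip surrounding whitespace and optional double quotes from the name.
    $name = trim(substr($s, 0, $pos), " \t\"");
    $email = trim(substr($s, $pos), "<> \t");
    return ['name' => $name, 'email' => $email];
}

print_r(split_name_and_email('Aslam Doctor <aslam.doctor#gmail.com>'));
```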

Get more backreferences from regexp than parenthesis

Ok this is really difficult to explain in English, so I'll just give an example.
I am going to have strings in the following format:
key-value;key1-value;key2-...
and I need to extract the data to be an array
array('key'=>'value','key1'=>'value1', ... )
I was planning to use regexp to achieve (most of) this functionality, and wrote this regular expression:
/^(\w+)-([^-;]+)(?:;(\w+)-([^-;]+))*;?$/
to work with preg_match and this code:
for ($l = count($matches),$i = 1;$i<$l;$i+=2) {
$parameters[$matches[$i]] = $matches[$i+1];
}
However the regexp obviously returns only 4 backreferences - first and last key-value pairs of the input string. Is there a way around this? I know I can use regex just to test the correctness of the string and use PHP's explode in loops with perfect results, but I'm really curious whether it's possible with regular expressions.
In short, I need to capture an arbitrary number of these key-value; pairs in a string by means of regular expressions.
You can use a lookahead to validate the input while you extract the matches:
/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/
(?=(?:\w++-[^;-]++;?)++$) is the validation part. If the input is invalid, matching will fail immediately, but the lookahead still gets evaluated every time the regex is applied. In order to keep it (along with the rest of the regex) in sync with the key-value pairs, I used \G to anchor each match to the spot where the previous match ended.
This way, if the lookahead succeeds the first time, it's guaranteed to succeed every subsequent time. Obviously it's not as efficient as it could be, but that probably won't be a problem--only your testing can tell for sure.
If the lookahead fails, preg_match_all() will return zero (it returns false only on an error). If it succeeds, the matches will be returned in an array of arrays: one for the full key-value pairs, one for the keys, one for the values.
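A short sketch of how that pattern might be used. It switches to PREG_SET_ORDER so that each match carries its own key and value, which differs from the default array layout described above; the function name parse_pairs is hypothetical.

```php
<?php
// Hypothetical helper: returns key => value pairs, or an empty array for malformed input.
function parse_pairs(string $input): array
{
    // \G anchors each match where the previous one ended;
    // the lookahead validates the rest of the string on every attempt.
    $pattern = '/\G(?=(?:\w++-[^;-]++;?)++$)(\w++)-([^;-]++);?/';

    $pairs = [];
    if (preg_match_all($pattern, $input, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $pairs[$m[1]] = $m[2];
        }
    }
    return $pairs;
}

print_r(parse_pairs('key-value;key1-value1;key2-value2'));
```

Because the lookahead fails at the very first position for invalid input, no partial result is ever collected.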
Regex is a powerful tool, but sometimes it's not the best approach.
$string = "key-value;key1-value";
$array = array();
$s = explode(";",$string);
foreach($s as $k){
$e = explode("-",$k);
$array[$e[0]]=$e[1];
}
print_r($array);
Use preg_match_all() instead. Maybe something like:
$matches = $parameters = array();
$input = 'key-value;key1-value1;key2-value2;key123-value123;';
preg_match_all("/(\w+)-([^-;]+)/", $input, $matches, PREG_SET_ORDER);
foreach ($matches as $match) {
$parameters[$match[1]] = $match[2];
}
print_r($parameters);
EDIT:
to first validate if the input string conforms to the pattern, then just use:
if (preg_match("/^((\w+)-([^-;]+);)+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
EDIT2: the final semicolon is optional
if (preg_match("/^(\w+-[^-;]+;)*\w+-[^-;]+$/", $input) > 0) {
/* do the preg_match_all stuff */
}
No. Newer matches overwrite older matches. Perhaps the limit argument of explode() would be helpful when exploding.
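To illustrate the limit argument, here is a rough sketch that splits each pair at the first hyphen only, so a value may itself contain hyphens. That is an assumption beyond the format in the question, and parse_pairs_with_explode is an invented name.

```php
<?php
// Hypothetical helper: explode-based parsing with a limit of 2 per pair.
function parse_pairs_with_explode(string $input): array
{
    $result = [];
    foreach (explode(';', $input) as $pair) {
        if ($pair === '') {
            continue; // skip the empty piece after a trailing semicolon
        }
        // Limit 2: everything after the first '-' stays in the value.
        $parts = explode('-', $pair, 2);
        if (count($parts) === 2) {
            $result[$parts[0]] = $parts[1];
        }
    }
    return $result;
}

print_r(parse_pairs_with_explode('key-value;key1-a-b;'));
```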
what about this solution:
$samples = array(
"good" => "key-value;key1-value;key2-value;key5-value;key-value;",
"bad1" => "key-value-value;key1-value;key2-value;key5-value;key-value;",
"bad2" => "key;key1-value;key2-value;key5-value;key-value;",
"bad3" => "k%ey;key1-value;key2-value;key5-value;key-value;"
);
foreach($samples as $name => $value) {
if (preg_match("/^(\w+-\w+;)+$/", $value)) {
printf("'%s' matches\n", $name);
} else {
printf("'%s' does not match\n", $name);
}
}
I don't think you can do both validation and extraction of data with one single regexp, as you need anchors (^ and $) for validation and preg_match_all() for the data, but if you use anchors with preg_match_all() it will only return the last set matched.

Parse multiple predictably formatted substrings of user data existing in a single string

I have a really long string in a certain pattern such as:
userAccountName: abc userCompany: xyz userEmail: a#xyz.com userAddress1: userAddress2: userAddress3: userTown: ...
and so on. This pattern repeats.
I need to find a way to process this string so that I have the values of userAccountName:, userCompany:, etc. (i.e. preferably in an associative array or some such convenient format).
Is there an easy way to do this or will I have to write my own logic to split this string up into different parts?
Simple regular expressions like this userAccountName:\s*(\w+)\s+ can be used to capture matches and then use the captured matches to create a data structure.
If you can arrange for the data to be formatted as it is in a URL (ie, var=data&var2=data2) then you could use parse_str, which does almost exactly what you want, I think. Some mangling of your input data would do this in a straightforward manner.
You might have to use regex or your own logic.
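Are you guaranteed that the string ": " does not appear anywhere within the values themselves? If so, you could use explode to split the string on ": ". Every piece except the first and the last then holds a value followed by the name of the next key, so you'd walk through the pieces, split each one at its last space, and format the result the way you want. Here's a rough (probably inefficient) example I threw together quickly:
<?php
// Assumes $dataString holds the input and that key names contain no spaces.
$chunks = explode(': ', $dataString);
$firstKeyName = 'userAccountName';
$associativeDataArray = array();
$currentIndex = -1;
$key = array_shift($chunks); // the first chunk is the first key
$lastIndex = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if ($i < $lastIndex) {
        // "value nextKey": split at the last space; no space means an empty value.
        $pos = strrpos($chunk, ' ');
        $value = ($pos === false) ? '' : substr($chunk, 0, $pos);
        $nextKey = ($pos === false) ? $chunk : substr($chunk, $pos + 1);
    } else {
        $value = $chunk; // the last chunk is just the final value
        $nextKey = null;
    }
    if ($key == $firstKeyName) {
        $associativeDataArray[] = array(); // start a new record
        ++$currentIndex;
    }
    $associativeDataArray[$currentIndex][$key] = trim($value);
    $key = $nextKey;
}
var_dump($associativeDataArray);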
If you can write a regexp (for my example I'm assuming there are no colons in the values), you can parse it with preg_split or preg_match_all like this:
<?php
$raw_data = "userAccountName: abc userCompany: xyz";
$raw_data .= " userEmail: a#xyz.com userAddress1: userAddress2: ";
$data = array();
// /([^:]*\s+)?/ part works because the regexp is "greedy"
if (preg_match_all('/([a-z0-9_]+):\s+([^:]*\s+)?/i', $raw_data,
$items, PREG_SET_ORDER)) {
foreach ($items as $item) {
$data[$item[1]] = isset($item[2]) ? trim($item[2]) : ''; // group 2 is absent for empty values
}
print_r($data);
}
?>
If that's not the case, please describe the grammar of your string in a bit more detail.
PCRE is included in PHP and can meet your needs with a regexp that uses named groups, like:
if ($c=preg_match_all ("/userAccountName: (?<userAccountName>\w+) userCompany: (?<userCompany>\w+) userEmail: /", $txt, $matches))
{
// with preg_match_all, each named entry is an array with one element per match
$userAccountName = $matches['userAccountName'];
$userCompany = $matches['userCompany'];
// and so on...
}
The hardest part is getting the right regexp for your needs.
You can have a look at http://txt2re.com for some help.
I think the solution closest to what I was looking for, I found at http://www.justin-cook.com/wp/2006/03/31/php-parse-a-string-between-two-strings/. I hope this proves useful to someone else. Thanks everyone for all the suggested solutions.
If I were you, I'd try to convert the string into JSON format with some regexp.
Then simply use JSON.
