Extract e-mail from long text (PHP)

I have to find a way to extract an e-mail address from the source code of a web page.
$str = "<a h=ref=3D.mailto:rys#adres.pl.><img src=3D.http://www.lowiecki.pl/img/list.gif=
. border=3D.0.></a></td><td class=3D.bb.>";
$a = preg_split( "/ [:] /", $str );
for($i=0;$i<count($a);$i++)
echo $a[$i];
I tried that, but I don't know how to make the extracted substring end after "pl".

E-mail addresses can be far more complex than the forms we are used to, see examples of uncommon valid addresses.
An almost perfect, but very complex, regular expression for matching most e-mail address forms is proposed at https://emailregex.com/.
You could use this shorter, but more restrictive, expression derived from one proposed by Jan Goyvaerts at https://www.regular-expressions.info/email.html: /\b[A-Z0-9][A-Z0-9._%+-]{0,63}#(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}\b/i
In a PHP script, it could be implemented this way:
<?php
$str = "<a h=ref=3D.mailto:rys#adres.pl.><img src=3D.http://www.lowiecki.pl/img/list.gif=
. border=3D.0.></a></td><td class=3D.bb.><a h=ref=3D.mailto:second-address#example.com.>foo</a>";
preg_match_all(
'/\b[A-Z0-9][A-Z0-9._%+-]{0,63}#(?:[A-Z0-9-]{1,63}\.){1,125}[A-Z]{2,63}\b/i', # After https://www.regular-expressions.info/email.html
quoted_printable_decode($str), # An e-mail address may be corrupted by the quoted-printable encoding.
$matches
);
echo !empty($matches[0]) ? '<pre>'.print_r($matches[0], true).'</pre>' : 'No address found.'; # $matches[0] is always set after preg_match_all(), so test for emptiness instead.
?>
This script outputs:
Array
(
[0] => rys#adres.pl
[1] => second-address#example.com
)
Make sure to read $matches[0] to get the addresses that were found.
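To see why the input is decoded before matching, here is a minimal sketch of what quoted_printable_decode() does to this kind of debris. The sample string is made up for illustration, and it assumes the "#" shown in the addresses on this page stands in for "@".

```php
<?php
// Hypothetical quoted-printable fragment (not taken from the question):
// "=3D" encodes "=", and a "=" directly before a line break is a soft
// line break that the decoder removes, rejoining the split address.
$encoded = "<a href=3D\"mailto:user@example.com\">write to us at user@exam=\r\nple.com</a>";

$decoded = quoted_printable_decode($encoded);

echo $decoded, PHP_EOL;
```

Without the decoding step, the second address would be split across two lines and a regular expression would only see fragments of it.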

The following code searches for e-mail addresses. preg_match_all() returns the number of matches, stored here in $email, and puts the matched addresses into $listofemails, which you can then use as you wish.
$email = preg_match_all(
"/[a-z0-9]+([_\\.-][a-z0-9]+)*#([a-z0-9]+([\.-][a-z0-9]+)*)+\\.[a-z]{2,}/i",
$str,
$listofemails
);
if($email) {
echo "you got a match";
}
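As a rough illustration of where the results end up, the sketch below runs the same pattern on a made-up input string and prints the matches. It assumes "@" where this page shows "#", and the variable name $count is only illustrative.

```php
<?php
// Made-up sample input; not from the question.
$str = 'Contact john.doe@example.com or jane_roe@mail.example.org for details.';

// preg_match_all() returns the number of full matches (or false on error).
// The matched addresses themselves are collected in $listofemails[0].
$count = preg_match_all(
    '/[a-z0-9]+([_\.-][a-z0-9]+)*@([a-z0-9]+([\.-][a-z0-9]+)*)+\.[a-z]{2,}/i',
    $str,
    $listofemails
);

if ($count) {
    foreach ($listofemails[0] as $address) {
        echo $address, PHP_EOL;
    }
}
```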

Related

Retrieve full email address from string

I'm currently building a Slack bot using Laravel, and one of the features is that it can receive an email address and send a message to it.
The issue is that email addresses (e.g. bob#example.com) come through as <mailto:bob#example.com|bob#example.com> from Slack.
I currently have a function that retrieves the email from this:
public function getEmail($string)
{
$pattern = '/[a-z0-9_\-\+]+#[a-z0-9\-]+\.([a-z]{2,3})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);
$matches = array_filter($matches);
return $matches[0][0];
}
This seemed to work fine with email addresses like bob#example.com; however, it fails with email addresses like bob.jones#example.com (which come through as <mailto:bob.jones#example.com|bob.jones#example.com>).
In these cases, the function is returning jones#example.com as the email address.
I'm not great with regex, but is there something else I could use/change in my pattern, or a better way to fetch the email address from the string provided by Slack?
Could always take regex out of the equation if you know that's always the format it'll be in:
$testString = '<mailto:bob#example.com|bob#example.com>';
$testString = str_replace(['<mailto:', '>'], '', $testString);
$addresses = explode('|', $testString);
echo $addresses[0];
This method does the job without regular expressions, and it makes sure the value returned is a well-formed email address by validating it with PHP's built-in filter functions.
function getEmailAddress($string)
{
$string = trim($string, '<>');
$args = explode('|', $string);
foreach ($args as $_ => $val) {
if(filter_var($val, FILTER_VALIDATE_EMAIL) !== false) {
return $val;
}
}
return null;
}
echo getEmailAddress('<mailto:bob#example.com|bob#example.com>');
Output
bob#example.com
You know the strings containing the e-mail address will always be of the form <mailto:bob#example.com|bob#example.com>, so use that. Specifically, you know the string will start with <mailto:, will contain a |, and will end with >.
An added difficulty though, is that the local part of an e-mail address may contain a pipe character as well, but the domain may not; see the following question.
What characters are allowed in an email address?
public function getEmail($string)
{
$pattern = '/^<mailto:([^#]+#[^|]+)\|(.*)>$/i'; // the pipe must be escaped, otherwise it means alternation
preg_match_all($pattern, $string, $matches);
$matches = array_filter($matches);
return $matches[1][0];
}
This matches the full line from beginning to end, but we capture the e-mail address within the first set of parentheses. $matches[1] contains all matches from the first capturing parentheses. You could use preg_match instead, since you're not looking for all matches, just the first one.
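A minimal sketch of the preg_match variant mentioned above might look like this. The function name getEmailFromSlack is hypothetical, and "@" is assumed where this page shows "#".

```php
<?php
// Hypothetical helper: returns the address from a Slack-style
// "<mailto:address|label>" string, or null if the format differs.
function getEmailFromSlack(string $string): ?string
{
    // [^@]+ allows a pipe in the local part; [^|]+ stops the domain at the
    // first pipe; \| matches the literal separator before the label.
    $pattern = '/^<mailto:([^@]+@[^|]+)\|(.*)>$/i';

    if (preg_match($pattern, $string, $matches) === 1) {
        return $matches[1];
    }

    return null;
}

echo getEmailFromSlack('<mailto:bob.jones@example.com|bob.jones@example.com>'), PHP_EOL;
```

Because preg_match stops at the first match, the result is a flat array and the address is simply $matches[1].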

Extract value from header string

I am writing code to read bounced emails from an inbox. I am getting the body of the email like so:
$body = imap_body($conn, $i);
After I get the body string, I split it into an array with explode.
$bodyParts = explode(PHP_EOL, $body);
The bounced emails that I am concerned with all have a particular header set, namely X-OBJ-ID. I can loop through $bodyParts to check whether that header is set, but how do I get its value if the header exists? Currently, the header string looks like this for those bounced emails which had that header set:
"X-OBJ-ID: 24\r"
So, basically my question is: How do I extract 24 from the above string?
Lookbehinds can be helpful in such cases
/(?<=X-OBJ-ID: )\d+/
(?<=X-OBJ-ID: ) Lookbehind. Ensures that the digits are preceded by X-OBJ-ID: and a space.
\d+ Matches digits.
Example
preg_match("/(?<=X-OBJ-ID: )\d+/", "X-OBJ-ID: 24\r", $matches);
print_r($matches);
=> Array (
[0] => 24
)
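Applied to the loop over the body lines, the lookbehind could be used roughly as follows. This is only a sketch: the $body string is a made-up stand-in for the result of imap_body(), and it splits on "\n" rather than PHP_EOL because PHP_EOL depends on the platform the script runs on.

```php
<?php
// Made-up stand-in for the string returned by imap_body().
$body = "Subject: Undelivered Mail\r\nX-OBJ-ID: 24\r\nX-Other: abc\r\n";

$objId = null;
foreach (explode("\n", $body) as $line) {
    // The lookbehind requires "X-OBJ-ID: " right before the digits,
    // so $m[0] holds the digits only; the trailing "\r" is not matched.
    if (preg_match('/(?<=X-OBJ-ID: )\d+/', $line, $m) === 1) {
        $objId = (int) $m[0];
        break;
    }
}

var_dump($objId);
```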
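Try
$int = filter_var($str, FILTER_SANITIZE_NUMBER_INT);
Note that FILTER_SANITIZE_NUMBER_INT keeps "+" and "-" as well as digits, so for "X-OBJ-ID: 24\r" it returns "--24" rather than "24". Alternatively, you can do it with a regular expression:
preg_replace("/[^0-9]/","",$string);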
You could do something like this:
$str = "X-OBJ-ID: 24\r";
preg_match('/X-OBJ-ID:\s+(\d+)/', $str, $re);
print_r($re);
This should match your string and store the 24 in a capture group, which is then accessible as $re[1].
Try this code:
preg_replace('/\D/', '', $str);
It removes all the non-numeric characters from the string.
My solution:
<?php
$string = "X-OBJ-ID: 24\r"; // double quotes, so \r is a real carriage return
preg_match_all('/X-OBJ-ID: (.*?)\s*$/', $string, $matches);
echo !empty($matches[1]) ? trim($matches[1][0]) : 'No matches found';
?>
See it working here http://viper-7.com/kuMyVh

Detecting emails in a text

I'm trying to create a function that translates every occurrence of a plain text email address in a given string into its htmlized version.
Let's say I have the following code, where htmlizeEmails is the function I'm looking for:
$str = "Send me an email to bob#example.com.";
echo htmlizeEmails($str); // Echoes "Send me an email to <a href="mailto:bob#example.com">bob#example.com</a>."
If possible, I'd like this function to use the filter_var function to check if the email is valid.
Does anyone know how to do this? Thanks!
Edit:
Thanks for the answers. I used Shocker's regex to match potential email addresses, and each one is replaced only if filter_var validates it.
function htmlizeEmails($text)
{
preg_match_all('/([a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', $text, $potentialEmails, PREG_SET_ORDER);
$potentialEmailsCount = count($potentialEmails);
for ($i = 0; $i < $potentialEmailsCount; $i++) {
if (filter_var($potentialEmails[$i][0], FILTER_VALIDATE_EMAIL)) {
$text = str_replace($potentialEmails[$i][0], '<a href="mailto:' . $potentialEmails[$i][0] . '">' . $potentialEmails[$i][0] . '</a>', $text);
}
}
return $text;
}
$str = preg_replace('/([a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6})/', '<a href="mailto:$1">$1</a>', $str);
where ([a-zA-Z0-9._%+-]+#[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}) is the regular expression used for detecting an email address (this is a general example, email addresses may be more complicated than this and not all addresses may be covered, but finding the perfect regex for emails is up to you)
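If each candidate should also pass filter_var before being linked, one possible approach is preg_replace_callback. The following is a sketch, not a definitive implementation: it assumes "@" where this page shows "#", and the htmlspecialchars() call is an added precaution that the original answers do not mention.

```php
<?php
// Sketch: wrap every address that filter_var accepts in a mailto: link.
function htmlizeEmails(string $text): string
{
    $pattern = '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,6}/';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $email = $m[0];

        // Leave candidates that fail validation untouched.
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            return $email;
        }

        $safe = htmlspecialchars($email, ENT_QUOTES);
        return '<a href="mailto:' . $safe . '">' . $safe . '</a>';
    }, $text);

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $text;
}

echo htmlizeEmails('Send me an email to bob@example.com.'), PHP_EOL;
```

Doing the replacement inside the callback means each occurrence is handled once, at its own position, rather than through a later str_replace over the whole text.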
There's always matching every sequence of non-space characters and testing those with filter_var, but this is probably one of those cases where it's just better to use regular expressions.
echo preg_replace('/(([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+\.)*([\w!#$%&\'*+\-\/=?^`{|}~]|\\\\\\\\|\\\\?"|\\\\ )+#((\w+[\.-])*[a-zA-Z]{2,}|\[(\d{1,3}\.){3}\d{1,3}\])/', '<a href="mailto:$0">$0</a>', $str);
I've tried to follow the standard as best I could without making it ridiculously compliant. And anybody who puts comments in his or her e-mail address can just be forgotten safely, I think. And it definitely works for common e-mails.
EDIT: After a long, difficult struggle, here's my regular expression to match everything:
((([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>#\[\]]|\\[ \\"])+")\.)*([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~]+|"([a-zA-Z0-9!\#\$%&'*+\-\/=?^_`{|}~(),:;<>#\[\]]|\\[ \\"])+"))#((([a-zA-Z0-9]([a-zA-Z0-9]*(\-[a-zA-Z0-9]*)*)?\.)*[a-zA-Z]{2,}|\[((0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\.){3}(0?\d{1,2}|1\d{2}|2[0-4]\d|25[0-5])\]|\[[Ii][Pp][vV]6(:[0-9a-fA-F]{0,4}){6}\]))
Enjoy escaping it!
The code below should work fine, but a regex is easier to go with.
$str = "Send me an email to bob#example.com.";
function htmlizestring($a){
if(substr_count($a,"#") != 1){
return false;
}else{
$b4 = stristr($a,"#",true);
$b4pos = strripos($b4," ")+1;
$b4 = trim(substr($b4,$b4pos));
$after = stristr($a,"#");
if(substr_count($after, " ") == 0){
$after=rtrim($after," .,");
}else{
$after=trim(stristr($after," ",true));
}
$email = $b4.$after;
echo $email;
if(filter_var($email, FILTER_VALIDATE_EMAIL)){
echo "Send me an email at: <a href='mailto:".$email."'>".$email."</a>";
}else{
return false;
}
}
}
htmlizestring($str);
I happen to use stristr() with the third parameter TRUE, which only works on php 5.3+
filter_var is nice to validate an email, but Dominic Sayers' is_email is even better, and my personal choice.
source code: http://code.google.com/p/isemail/source/browse/PHP/trunk/is_email.php
about: http://isemail.info/about

Regular expression and newline

I have such text:
<Neednt#email.com> If you do so, please include this problem report.
<Anotherneednt#email.com> You can delete your
own
text from the attached returned message.
The mail system
<Some#Mail.net>: connect to *.net[82.*.86.*]: Connection timed
out
I have to parse an email address out of it. Could you help me with this?
Update:
There could be other email addresses in <%here%>. The one I need is tied to the text 'The mail system': I need the email address that comes right after that text.
Considering this text is stored in $text, what about this :
$matches = array();
if (preg_match('/<([^>]+)>/', $text, $matches)) {
var_dump($matches[1]);
}
Which gives me :
string 'Some#Mail.net' (length=13)
Basically, I used a pretty simple regex, that matches :
a < character
anything that's not a > character : [^>]
at least one time : [^>]+
capturing it : ([^>]+)
a > character
So, it captures anything that's between < and >.
Edit after comments+edit of the OP :
If you only want the e-mail address that's after The mail system, you could use this :
$matches = array();
if (preg_match('/The mail system\s*<([^>]+)>/', $text, $matches)) {
var_dump($matches[1]);
}
In addition to what I posted before, this expects :
The string The mail system
Any number of whitespace characters : \s*
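Run against the sample text from the question, the second pattern can be checked with a short script like the one below. It assumes "@" where this page shows "#"; otherwise the text is copied from the question.

```php
<?php
// Sample text from the question, in a nowdoc so nothing is interpolated.
$text = <<<'TEXT'
<Neednt@email.com> If you do so, please include this problem report.
<Anotherneednt@email.com> You can delete your
own
text from the attached returned message.
The mail system
<Some@Mail.net>: connect to *.net[82.*.86.*]: Connection timed
out
TEXT;

$matches = array();
// \s* also matches the line break between "The mail system" and "<".
if (preg_match('/The mail system\s*<([^>]+)>/', $text, $matches)) {
    echo $matches[1], PHP_EOL;
}
```

The two earlier addresses are skipped because neither is directly preceded by "The mail system".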
You want to use preg_match() and looking at this input it should be simple:
<?php
if (preg_match('/<([^>]*?#[^>]*)>/', $data, $matches)) {
var_dump($matches); // specifically look at $matches[1]
}
There are other patterns that would match it, you don't have to stick to that same pattern. The '<' and '>' in your input are helpful here.

PHP Email Array Regular Expression

Given a list of emails, formatted:
"FirstName Last" <email#address.com>, "NewFirst NewLast" <email2#address.com>
How can I build this into an array of strings containing only the email addresses (I don't need the names)?
PHP’s Mailparse extension has a mailparse_rfc822_parse_addresses function you might want to try. Otherwise you should build your own address parser.
You could use preg_match_all (docs):
preg_match_all('/<([^>]+)>/', $s, $matches);
print_r($matches); // inspect the resulting array
Provided that all addresses are enclosed in < ... > there is no need to explode() the string $s.
EDIT In response to comments, the regex could be rewritten as '/<([^#]+#[^>]+)>/'. Not sure whether this is fail-safe, though :)
EDIT #2 Use a parser for any non-trivial data (see the comments below - email address parsing is a bitch). Some errors could, however, be prevented by removing duplicate addresses.
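For simple input, removing duplicates and discarding obviously malformed captures could be sketched as follows. The input string is made up (with one repeated address), "@" is assumed where this page shows "#", and filter_var is used here only as a basic sanity check, not as a full address parser.

```php
<?php
// Made-up list with one duplicate entry.
$s = '"FirstName Last" <email@address.com>, "NewFirst NewLast" <email2@address.com>, "Again" <email@address.com>';

// Capture everything between < and > that contains an "@".
preg_match_all('/<([^@]+@[^>]+)>/', $s, $matches);

// Keep only captures that filter_var accepts as addresses.
$valid = array_filter($matches[1], function (string $candidate): bool {
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false;
});

// Drop exact duplicates and reindex the array from 0.
$emails = array_values(array_unique($valid));

print_r($emails);
```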
<?php
$s = "\"FirstName Last\" <email#address.com>, \"NewFirst NewLast\" <email2#address.com>";
$emails = array();
foreach (explode(",", $s) as $full) // split() was removed in PHP 7; explode() works for a plain delimiter
{
preg_match("/.*<([^>]+)/", $full, $email);
$emails[] = $email[1];
}
print_r($emails);
?>
