I want to extract an email address from a string, for example:
<?php // code
$string = 'Ruchika <ruchika#example.com>';
?>
From the above string I only want to get the email address ruchika#example.com.
Please recommend how to achieve this.
Try this
<?php
$string = 'Ruchika < ruchika#example.com >';
$pattern = '/[a-z0-9_\-\+\.]+#[a-z0-9\-]+\.([a-z]{2,4})(?:\.[a-z]{2})?/i';
preg_match_all($pattern, $string, $matches);
var_dump($matches[0]);
?>
Second method
<?php
$text = 'Ruchika < ruchika#example.com >';
preg_match_all("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $text, $matches);
print_r($matches[0]);
?>
Parsing e-mail addresses correctly is surprisingly hard and leads to a very complicated regular expression. For example, consider this regular expression for matching an RFC 822 e-mail address: http://www.ex-parrot.com/pdw/Mail-RFC822-Address.html
Amazing, right?
Instead, PHP has a function for this, mailparse_rfc822_parse_addresses(), which is provided by the mailparse PECL extension (so the extension must be installed and enabled).
It takes a string as its argument and returns an array of associative arrays with the keys display, address and is_group.
So,
$to = 'Wez Furlong <wez#example.com>, doe#example.com';
var_dump(mailparse_rfc822_parse_addresses($to));
would yield:
array(2) {
[0]=>
array(3) {
["display"]=>
string(11) "Wez Furlong"
["address"]=>
string(15) "wez#example.com"
["is_group"]=>
bool(false)
}
[1]=>
array(3) {
["display"]=>
string(15) "doe#example.com"
["address"]=>
string(15) "doe#example.com"
["is_group"]=>
bool(false)
}
}
try this code.
<?php
function extract_emails_from($string){
preg_match_all("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $string, $matches);
return $matches[0];
}
$text = "blah blah blah blah blah blah email2#address.com";
$emails = extract_emails_from($text);
print(implode("\n", $emails));
?>
This will work.
Thanks.
This is based on Niranjan's response and assumes the input email is enclosed within < and > characters. Instead of using a regular expression to match the email address itself, I grab the text between the < and > characters. If there is no such pair, I fall back to the whole string. Note that I don't validate the email address; whether you need to depends on your scenario.
<?php
$string = 'Ruchika <ruchika#example.com>';
$pattern = '/<(.*?)>/i';
preg_match_all($pattern, $string, $matches);
var_dump($matches);
$email = $matches[1][0] ?? $string;
echo $email;
?>
Of course, if my assumption isn't correct, then this approach will fail. But based on your input, I believe you wanted to extract emails enclosed within < and > chars.
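A minimal sketch of the same idea wrapped in a helper. The function name extractBracketedAddress is hypothetical, and like the snippet above it performs no validation. It also trims whitespace, so inputs such as '< ruchika#example.com >' give the bare address.

```php
<?php
// Hypothetical helper: returns the text between the first "<" and the next ">",
// or the whole trimmed input when no such pair exists. No validation is done.
function extractBracketedAddress(string $input): string
{
    if (preg_match('/<([^<>]*)>/', $input, $m)) {
        return trim($m[1]);
    }
    return trim($input);
}

echo extractBracketedAddress('Ruchika < ruchika#example.com >'), "\n";
echo extractBracketedAddress('ruchika#example.com'), "\n";
```

Both calls print ruchika#example.com.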
This function extracts all emails from a string and returns them in an array.
function extract_emails_from($string){
preg_match_all( '/([\w+\.]*\w+#[\w+\.]*\w+[\w+\-\w+]*\.\w+)/is', $string, $matches );
return $matches[0];
};
This works well and is minimal (it assumes the string ends with the closing >):
$email = strpos($from, '<') !== false ? substr($from, strpos($from, '<') + 1, -1) : $from;
Use (my) function getEmailArrayFromString to easily extract email addresses from a given string.
<?php
function getEmailArrayFromString($sString = '')
{
$sPattern = '/[\._\p{L}\p{M}\p{N}-]+#[\._\p{L}\p{M}\p{N}-]+/u';
preg_match_all($sPattern, $sString, $aMatch);
$aMatch = array_keys(array_flip(current($aMatch)));
return $aMatch;
}
// Example
$sString = 'foo#example.com XXX bar#example.com XXX <baz#example.com>';
$aEmail = getEmailArrayFromString($sString);
/**
* array(3) {
[0]=>
string(15) "foo#example.com"
[1]=>
string(15) "bar#example.com"
[2]=>
string(15) "baz#example.com"
}
*/
var_dump($aEmail);
Based on Priya Rajaram's code, I have optimised the function a little so that each email address appears only once.
If, for example, an HTML document is parsed, you usually get every address twice, because it is also used in the mailto link.
function extract_emails_from($string){
preg_match_all("/[\._a-zA-Z0-9-]+#[\._a-zA-Z0-9-]+/i", $string, $matches);
return array_values(array_unique($matches[0]));
}
This works even with subdomains. It extracts all emails from the text.
$matches[0] has all emails.
$pattern = "/[a-zA-Z0-9-_]{1,}#[a-zA-Z0-9-_]{1,}(\.[a-zA-Z]{1,}){1,}/";
preg_match_all($pattern, $string, $matches);
print_r($matches);
Array
(
[0] => Array
(
[0] => clotdesormakilgehr#prednisonecy.com
[1] => **********#******.co.za.com
[2] => clotdesormakilgehr#prednisonecy.com
[3] => clotdesormakilgehr#prednisonecy.mikedomain.com
[4] => clotdesormakilgehr#prednisonecy.com
)
[1] => Array
(
[0] => .com
[1] => .com
[2] => .com
[3] => .com
[4] => .com
)
)
A relatively straightforward approach is to use PHP's built-in functions for splitting text into words and validating e-mail addresses:
function fetchEmails($text) {
$words = str_word_count($text, 1, '.#-_1234567890');
return array_filter($words, function($word) {return filter_var($word, FILTER_VALIDATE_EMAIL);});
}
This returns the e-mail addresses found in $text.
I'm trying to scrape a product's price from a page. Unfortunately it's not in a nice clean div, so I have to strip out all the other junk.
Note: I have looked at several examples however they all assume you only have nice organised numbers in your variable, not raw HTML stuffed on the end.
An example of the string my variable may hold:
$2.87 <span>10% Off Sale</span>
I've played about with substr and strrpos, read the manual and still can't figure it out on my own.
I want to just cut the string two digits after the first decimal place is found... No doubt it's extremely simple when you know how!
What I want to end up with:
$2.87
An example of the mess I've got myself into trying:
$whatIWant = substr($data, strrpos($data, ".") + 2);
Thanks in advance,
Try this solution:
<?php
$string = "$2.87 <span>10% Off Sale</span>";
$matches = array();
preg_match('/(\$\d+\.\d{2})/', $string, $matches);
var_dump($matches);
Output:
array(2) {
[0]=>
string(5) "$2.87"
[1]=>
string(5) "$2.87"
}
For more info (why the result is an array, etc.), check the PHP manual entry for the preg_match() function.
The below should grab the matches for you;
$pattern = '/(\$\d+\.\d{2})/';
$string = '$2.87 <span>10% Off Sale</span>';
$matches = array();
preg_match($pattern, $string, $matches);
Outputs:
Array ( [0] => $2.87 [1] => $2.87 )
There is definitely a better way to do this. For a start, assuming the HTML structure you have above will always be the same, you could do something like:
$var = "$85.25 <span>10% Off Sale</span>";
$spl = explode("<", $var);
echo $spl[0];
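If the price is needed as a number rather than as text, the regex from the earlier answers can capture just the digits. This is only a sketch: the helper name extractPrice is hypothetical, and it assumes a dollar sign followed by digits, a dot, and exactly two decimals.

```php
<?php
// Hypothetical helper: returns the first dollar amount as a float,
// or null if the input contains no such amount.
function extractPrice(string $html): ?float
{
    // \$ matches a literal dollar sign; the group captures e.g. "2.87".
    if (preg_match('/\$(\d+\.\d{2})/', $html, $m)) {
        return (float) $m[1];
    }
    return null;
}

var_dump(extractPrice('$2.87 <span>10% Off Sale</span>'));
var_dump(extractPrice('<span>Sold out</span>'));
```

The first call yields float(2.87) and the second NULL, so the caller can tell a missing price apart from a price of zero.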
From the string of words, can I get only the words with a capitalized first letter? For example, I have this string:
Page and Brin originally nicknamed THEIR new search engine "BackRub",
because the system checked backlinks to estimate the importance of a
site.
I need to get: Page, Brin, THEIR, BackRub
A non-regex solution (based on Mark Baker's comment):
$result = array_filter(str_word_count($str, 1), function($item) {
return ctype_upper($item[0]);
});
print_r($result);
Output:
Array
(
[0] => Page
[2] => Brin
[5] => THEIR
[9] => BackRub
)
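Note that array_filter preserves the original keys, which is why the output above has gaps (0, 2, 5, 9). If consecutive indexes are wanted, one option is to wrap the call in array_values, as in this small sketch (the variable name $capitalized is just for illustration):

```php
<?php
$str = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';

// array_values() discards the preserved keys and reindexes from 0.
$capitalized = array_values(array_filter(
    str_word_count($str, 1),
    function ($word) {
        return ctype_upper($word[0]);
    }
));

print_r($capitalized);
```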
You can match that with
preg_match("/[A-Z]{1}[a-zA-Z]*/um", $searchText)
You can see on php.net how preg_match can be applied.
http://ca1.php.net/preg_match
EDIT, TO ADD EXAMPLE
Here's an example of how to get the array with full matches
$searchText = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
preg_match_all("/[A-Z]{1}[a-zA-Z]*/um", $searchText, $matches );
var_dump( $matches );
The output is:
array(1) {
[0]=>
array(4) {
[0]=>
string(4) "Page"
[1]=>
string(4) "Brin"
[2]=>
string(5) "THEIR"
[3]=>
string(7) "BackRub"
}
}
The way I would do it is explode by space, ucfirst the exploded strings, and check them against the original.
here is what I mean:
$str = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$strings = explode(' ', $str);
$i = 0;
$out = array();
foreach($strings as $s)
{
if($strings[$i] == ucfirst($s))
{
$out[] = $s;
}
++$i;
}
var_dump($out);
http://codepad.org/QwrS4HpE
I would use the strtok function (http://pl1.php.net/strtok), which returns the words in the string one by one. You can specify the delimiters between words:
$string = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$delimiter = ' ,."'; // specify valid delimiters here (add others as needed)
$capitalized_words = array(); // array to hold the found words
$tok = strtok($string,$delimiter); // get first token
while ($tok !== false) {
$first_char = substr($tok,0,1);
if (strtoupper($first_char)===$first_char) {
// this word ($tok) is capitalized, store it
$capitalized_words[] = $tok;
}
$tok = strtok($delimiter); // get next token
}
var_dump($capitalized_words); // print the capitalized words found
This prints:
array(4) {
[0]=>
string(4) "Page"
[1]=>
string(4) "Brin"
[2]=>
string(5) "THEIR"
[3]=>
string(7) "BackRub"
}
Good luck!
The only drawback I can see is that it doesn't handle multibyte characters. If you have only English characters, you're OK. If you have international characters, a modified or different solution may be needed.
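For international text, one possible variant is a Unicode-aware regex. The sketch below assumes UTF-8 input; the function name capitalizedWords is hypothetical. \p{Lu} matches any uppercase letter, and the lookbehind skips capitals that occur in the middle of a word.

```php
<?php
// Hypothetical helper, assuming UTF-8 input.
// \p{Lu}          : one uppercase letter in any script
// [\p{L}\p{M}]*   : the rest of the word (letters and combining marks)
// (?<![\p{L}\p{M}]): not preceded by a letter, so "iPhone" is not matched as "Phone"
function capitalizedWords(string $text): array
{
    preg_match_all('/(?<![\p{L}\p{M}])\p{Lu}[\p{L}\p{M}]*/u', $text, $m);
    return $m[0];
}

print_r(capitalizedWords('Émile and Zoë visited Åre with an iPhone and "BackRub".'));
```

This returns Émile, Zoë, Åre and BackRub.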
You can do this using explode and loop through with regex:
$string = 'Page and Brin originally nicknamed THEIR new search engine "BackRub", because the system checked backlinks to estimate the importance of a site.';
$list = explode(' ',$string);
$matches = array();
foreach($list as $str) {
if(preg_match('/[A-Z]+[a-zA-Z]*/um',$str)) $matches[] = $str;
}
print_r($matches);
I have a list of domains in a table, among other info:
<td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td>
I need to get the .com domains using a regex. I tried to use something like:
'<td>(.............).com'
But what can I write instead of the dots? What do I need to use?
I need to get the data between the tags: <td>domain.com</td> -> domain.com
'<td>([^<]+\.com)</td>'
- this is better, but I need the result without the tags
<?php
$html = '<td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td>';
$matches = array();
preg_match_all('/<td>(.*?\.com)<\/td>/i', $html, $matches);
var_dump($matches[1]);
prints:
array(3) {
[0]=>
string(12) "example1.com"
[1]=>
string(12) "example3.com"
[2]=>
string(12) "example4.com"
}
Something like that:
'<td>([^<]+\.com)</td>'
but you shouldn't use regular expressions to parse html.
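As a sketch of the parser-based alternative, PHP's DOM extension can read the cells and a plain string check can select the .com entries. The sample markup below wraps the cells in a table row, which is an assumption about the surrounding page, and the variable names are only illustrative.

```php
<?php
// Sample markup; the <table><tr> wrapper is assumed, not taken from the question.
$html = '<table><tr><td>example1.com</td><td>example2.org</td>'
      . '<td>example3.com</td><td>example4.com</td></tr></table>';

$doc = new DOMDocument();
// Collect libxml warnings internally instead of emitting them for imperfect markup.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$domains = [];
foreach ($doc->getElementsByTagName('td') as $cell) {
    $text = trim($cell->textContent);
    // Keep only cells whose text ends with ".com".
    if (substr($text, -4) === '.com') {
        $domains[] = $text;
    }
}

print_r($domains);
```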
You can use lookaheads and lookbehinds if you want to capture something but make sure it's surrounded by something else. Here we capture only the .com domains.
<?php
$html = '<td>example1.com</td>
<td>example2.org</td>
<td>example3.com</td>
<td>example4.com</td>';
$pattern = "!(?<=<td>).*\.com(?=</td>)!";
preg_match_all($pattern,$html,$matches);
$urls = $matches[0];
print_r($urls);
?>
Output
Array
(
[0] => example1.com
[1] => example3.com
[2] => example4.com
)
I have an output string in this format Fname Lname<fname#urmail.com>. I want to extract the email from here. How can I do that?
If you can be sure that the string format is consistent, a simple regular expression will do the trick:
$input = 'Fname Lname<fname#urmail.com>';
preg_match('~<(.*?)>~', $input, $output);
$email = $output[1];
Don't reinvent the wheel. Instead, use a parser. mailparse_rfc822_parse_addresses() is made for this specific task by professionals with an in-depth knowledge of the subject (and the possible quirks that you may run into).
Example #1 from the docs:
$to = 'Wez Furlong <wez#example.com>, doe#example.com';
var_dump(mailparse_rfc822_parse_addresses($to));
Gives (gentle formatting applied):
array(2) {
[0] => array(3) {
["display"] => string(11) "Wez Furlong"
["address"] => string(15) "wez#example.com"
["is_group"] => bool(false)
}
[1] => array(3) {
["display"] => string(15) "doe#example.com"
["address"] => string(15) "doe#example.com"
["is_group"] => bool(false)
}
}
See also: imap_rfc822_parse_adrlist() and Full name with valid email.
Use functions like substr and explode (an easier method than regular expressions, and it will do the trick):
<?php
$text = 'Fname Lname<fname#urmail.com>';
$pieces = explode('<',$text);
$mail=substr($pieces[1],0,-1);
echo $mail;
?>
This should print the e-mail address:
if (preg_match("/<(\S*)>/", $subject, $matches)) {
echo "E-Mail address: ".$matches[1];
}
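If the display name is needed as well, a single pattern can split both parts. This is a sketch for the consistent "Name<address>" format from the question only; the helper name splitNameAndAddress is hypothetical, and for anything less regular a real parser such as mailparse_rfc822_parse_addresses() is the safer route.

```php
<?php
// Hypothetical helper: splits "Display Name<address>" into its two parts.
// Returns null when the input does not end with an angle-bracketed part.
function splitNameAndAddress(string $input): ?array
{
    if (preg_match('~^(.*?)\s*<([^<>]+)>\s*$~', $input, $m)) {
        return ['name' => trim($m[1]), 'address' => trim($m[2])];
    }
    return null;
}

print_r(splitNameAndAddress('Fname Lname<fname#urmail.com>'));
```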
I have the below text displayed in the browser and am trying to get the URL from the string.
string 1 = voice-to-text from #switzerland: http://bit.ly/lnpDC12D
When I try to get the URL with preg_match, it fails:
$urlstr = "";
preg_match('/\b((?#protocol)https?|ftp):\/\/((?#domain)[-A-Z0-9.]+)((?#file)\/[-A-Z0-9+&##\/%=~_|!:,.;]*)?((?#parameters)\?[A-Z0-9+&##\/%=~_|!:,.;]*)?/i', $urlstr, $match);
echo $match[0];
I think the #switzerland: part contains one more http:// ... will that be a problem?
The above works perfectly for the string below:
voice-to-text: http://bit.ly/jDcXrZg
In this case I think parse_url will be a better choice than regex-based code. Something like this may work (assuming your URL always starts with http):
$str = "voice-to-text from #switzerland: http://bit.ly/lnpDC12D";
$pos = strrpos($str, "http://");
if ($pos !== false) {
var_dump(parse_url(substr($str, $pos)));
}
OUTPUT
array(3) {
["scheme"]=>
string(4) "http"
["host"]=>
string(6) "bit.ly"
["path"]=>
string(9) "/lnpDC12D"
}
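A variant that does not depend on the URL being the last thing in the string: match the first whitespace-delimited token that starts with http:// or https://, then hand it to parse_url. This is only a sketch; it assumes the URL contains no spaces, and trailing punctuation such as a full stop would be included in the match.

```php
<?php
$str = 'voice-to-text from #switzerland: http://bit.ly/lnpDC12D';

// \S+ takes everything up to the next whitespace character.
if (preg_match('~https?://\S+~i', $str, $m)) {
    $parts = parse_url($m[0]);
    echo $parts['host'], $parts['path'] ?? '', "\n";
}
```

For the sample string this prints bit.ly/lnpDC12D.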
As far as I understand your request, here is a way to do it:
$str = 'voice-to-text from <a href="search.twitter.com/…;: http://bit.ly/lnpDC12D';
preg_match("~(bit\.ly/\S+)~", $str, $m);
print_r($m);
output:
Array
(
[0] => bit.ly/lnpDC12D
[1] => bit.ly/lnpDC12D
)