Email validation with edu domains only - php

I have been trying to accept only email addresses whose domain ends with .edu, using the code below:
$email = $_REQUEST['email'];
$school = substr($email, strpos($email, "#") + 1);
Is there any way to do this?

You just need to take a substring containing the last 3 characters of the string.
<?php
$tld = substr($email, -3); // last three characters of the string
if ($tld === "edu") { // === compares; a single = would assign and always be true
// do stuff
}
?>

This should work to get your domain name and domain extension:
$email = 'test#website.edu';
$getDomain = explode('#', $email);
$explValue = explode('.', $getDomain[1], 2);
print_r($explValue);
The output is:
Array ( [0] => website [1] => edu )
After that you can check:
if($explValue[1] == 'edu'){
//your code here
}

If .edu is the last part of the email address, you could use strlen and substr:
$email = "test#test.edu";
$end = ".edu";
$string_end = substr($email, strlen($email) - strlen($end));
if ($end === $string_end) {
// Ok
}
Another option is to explode on # and then explode the domain part on dots, checking whether the resulting array contains edu:
$strings = [
"test#test.edu",
"test#test.edu.pl",
"test#test.com"
];
foreach ($strings as $string) {
if (in_array("edu", explode(".", explode("#", $string)[1]))) {
// Etc..
}
}

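strpos($email, ".edu.") should work for addresses where .edu is followed by a country code,
for example gensek#metu.edu.tr. Check the result with !== false, since strpos returns false when there is no match.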

You can use substr to get the last 4 characters: if they are .edu the email is valid, otherwise it is not.
$string = "xyzasd.edu";
echo $txt = substr($string,-4);
if($txt == ".edu"){
//Valid
}else{
//Not Valid
}
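Combining the suffix check with PHP's built-in syntax check gives a slightly stricter variant. This is a sketch, not taken from any answer above: the name is_edu_email is hypothetical, it is written for real addresses with an @ separator, and it only accepts domains ending in ".edu" (so an address like gensek@metu.edu.tr is rejected).

```php
<?php
// Sketch: accept an address only if it is syntactically valid and its
// domain (the part after the last "@") ends in ".edu", case-insensitively.
function is_edu_email(string $email): bool
{
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }
    // strrchr returns the tail starting at the last "@"; drop the "@" itself.
    $domain = strtolower(substr(strrchr($email, '@'), 1));
    // Require the dot before "edu", so e.g. "example.notedu" is rejected.
    return substr($domain, -4) === '.edu';
}

var_dump(is_edu_email('student@mit.edu'));        // bool(true)
var_dump(is_edu_email('student@MIT.EDU'));        // bool(true)
var_dump(is_edu_email('student@example.com'));    // bool(false)
var_dump(is_edu_email('student@example.notedu')); // bool(false)
```

Checking for ".edu" including the dot is the design choice that matters: a bare "edu" suffix test would also accept domains such as example.notedu.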

Related

PHP: Check if string is part of an array

I'm working on my little ticketing-system based on PHP.
Now I would like to exclude senders from being processed.
This is a possible list of excluded senders:
Array (
"badboy#example.com",
"example.org",
"spam#spamming.org"
)
Okay - now I would like to check if the sender of an mail matches one of these:
$sender = "badboy#example.com";
I think this is quite easy: I could solve it with in_array().
But what about
$sender = "me#example.org";
example.org is defined in the array, but me#example.org is not. Still, me#example.org should also be excluded, because example.org is in the forbidden-senders list.
How could I solve this?
Maybe you are looking for the stripos function.
<?php
if (!disallowedEmail($sender)) { // Check if email is disallowed
// Do your stuff
}
function disallowedEmail($email) {
$disallowedEmails = array (
"badboy#example.com",
"example.org",
"spam#spamming.org"
);
foreach($disallowedEmails as $disallowed){
// case-insensitive substring match: "example.org" also matches "me#example.org"
if ( stripos($email, $disallowed) !== false)
return true;
}
return false;
}
Another short alternative with stripos, implode and explode functions:
$excluded = array(
"badboy#example.com",
"example.org",
"spam#spamming.org"
);
$str = implode(",", $excluded); // compounding string with excluded emails
$sender = "www#example.com";
//$sender = "me#example.org";
$domainPart = explode("#",$sender)[1]; // extracting domain part from a sender email
$isAllowed = stripos($str, $sender) === false && stripos($str, $domainPart) === false;
var_dump($isAllowed); // output: bool(false)
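Both answers rely on substring matching, so an entry can block more than intended: in the last example, www#example.com is rejected only because example.com occurs inside badboy#example.com. A sketch that compares exactly instead, assuming each list entry is either a full address or a bare domain (is_excluded is a hypothetical name; the sketch is written for real addresses with @):

```php
<?php
// Sketch: a sender is blocked if its full address or its domain appears
// verbatim (case-insensitively) in the exclusion list.
function is_excluded(string $sender, array $excluded): bool
{
    $sender = strtolower(trim($sender));
    $list = array_map('strtolower', $excluded);
    // Domain = everything after the last "@" (or the whole string if there is none).
    $at = strrpos($sender, '@');
    $domain = $at === false ? $sender : substr($sender, $at + 1);
    return in_array($sender, $list, true) || in_array($domain, $list, true);
}

$excluded = ['badboy@example.com', 'example.org', 'spam@spamming.org'];
var_dump(is_excluded('badboy@example.com', $excluded)); // bool(true)
var_dump(is_excluded('me@example.org', $excluded));     // bool(true)
var_dump(is_excluded('www@example.com', $excluded));    // bool(false)
```

Exact comparison means a domain entry blocks only that domain; blocking subdomains such as mail.example.org would need an explicit suffix check on top of this.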

Mask email with alternate characters using PHP

Below is a dummy email ID, say:
abcdefghij#gmail.com
How to mask this email ID partially using PHP?
The output I need is:
a*c*e*g*i*#gmail.com
I have tried the code below, but it does not meet this requirement:
$prop=3;
$domain = substr(strrchr($Member_Email, "#"), 1);
$mailname=str_replace($domain,'',$Member_Email);
$name_l=strlen($mailname);
$domain_l=strlen($domain);
for($i=0;$i<=$name_l/$prop-1;$i++)
{
$start.='*';
}
for($i=0;$i<=$domain_l/$prop-1;$i++)
{
$end.='*';
}
$MaskMail = substr_replace($mailname, $start,2, $name_l/$prop).substr_replace($domain, $end, 2, $domain_l/$prop);
Try something like this:
$delimeter = '#';
$mail_id = 'abcdefghij#gmail.com';
$domain = substr(strrchr($mail_id, $delimeter), 1);
$user_id = substr($mail_id,0,strpos($mail_id, $delimeter));
$string_array = str_split($user_id);
$partial_id = NULL;
foreach($string_array as $key => $val){
if($key % 2 == 0){
$partial_id .=$val;
}else{
$partial_id .='*' ;
}
}
echo $partial_id.$delimeter.$domain;
Here's a no-loop approach that replaces every second character of an email username with a mask.
A custom PHP function using the native functions explode, preg_replace with the regex /(.)./, and implode:
echo email_mask('abcdefghij#gmail.com');
// a*c*e*g*i*#gmail.com
function email_mask($email) {
// explode() instead of split(), which was removed in PHP 7
list($email_username, $email_domain) = explode('#', $email);
$masked_email_username = preg_replace('/(.)./', '$1*', $email_username);
return implode('#', array($masked_email_username, $email_domain));
}
Regex Explanation:
The regular expression starts at the beginning of the string, matches 2 characters and captures the first of those two, replaces the match with the first character followed by an asterisk *. preg_replace repeats this throughout the remaining string until it can no longer match a pair of characters.
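The /(.)./ pattern works on bytes, so a username containing multibyte UTF-8 characters would be split in the middle of a character. A sketch of the same idea with the u modifier, which makes . match whole UTF-8 characters (mask_alternate is a hypothetical name; it is written for real addresses and takes the domain from the last @):

```php
<?php
// Sketch: mask every second character of the local part, leaving the domain
// untouched. The "u" modifier makes "." match whole UTF-8 characters.
function mask_alternate(string $email): string
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return $email; // not an address; leave unchanged
    }
    $local  = substr($email, 0, $at);
    $domain = substr($email, $at); // includes the "@"
    return preg_replace('/(.)./u', '$1*', $local) . $domain;
}

echo mask_alternate('abcdefghij@gmail.com'), "\n"; // a*c*e*g*i*@gmail.com
echo mask_alternate('ñandú@example.com'), "\n";    // ñ*n*ú@example.com
```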
$mail='abcdefghij#gmail.com';
$mail_first=explode('#',$mail);
$arr=str_split($mail_first[0]);
$mask=array();
for($i=0;$i<count($arr);$i++) {
if($i%2!=0) {
$arr[$i]='*';
}
$mask[]=$arr[$i];
}
$mask=join($mask).'#'.$mail_first[1];
echo $mask;
Result:
a*c*e*g*i*#gmail.com
Does it need to have that many asterisks?
It's hard to read that way.
I suggest keeping things simple.
Maybe something like this is enough:
https://github.com/fedmich/PHP_Codes/blob/master/mask_email.php
It masks an email to show the first 3 characters and then the last character before the # sign:
ABCDEFZ#gmail.com becomes
ABC***Z#gmail.com
Here is the full code, which is also in that GitHub link:
function mask_email( $email ) {
/*
Author: Fed
Simple way of masking emails
*/
$char_shown = 3;
$mail_parts = explode("#", $email);
$username = $mail_parts[0];
$len = strlen( $username );
if( $len <= $char_shown ){
return implode("#", $mail_parts );
}
//Logic: show asterisk in middle, but also show the last character before #
$mail_parts[0] = substr( $username, 0 , $char_shown )
. str_repeat("*", $len - $char_shown - 1 )
. substr( $username, $len - $char_shown + 2 , 1 )
;
return implode("#", $mail_parts );
}

PHP Get Subdomain But Not Actual Domain

I'm currently using the following to get the subdomain of my site
$subdomain = array_shift(explode(".",$_SERVER['HTTP_HOST']));
When I use this for http://www.website.com it returns "www" which is expected
However, when I use this with http://website.com it returns "website" as the subdomain. How can I make absolutely sure that it returns NULL when there is no subdomain, as in that example?
Thanks!
Please note that in the general case you should first apply parse_url to the incoming data and then use its [host] key. As for your question, you can use something like this:
preg_match('/([^\.]+)\.[^\.]+\.[^\.]+$/', 'www.domain.com', $rgMatches);
//2-nd level check:
//preg_match('/([^\.]+)\.[^\.]+\.[^\.]+$/', 'domain.com', $rgMatches);
$sDomain = count($rgMatches)?$rgMatches[1]:null;
But I'm not sure that it's exactly what you need, since a URL can contain a fourth-level domain, etc.
Do this:
function getSubdomain($domain) {
$expl = explode(".", $domain, -2);
$sub = "";
if(count($expl) > 0) {
foreach($expl as $key => $value) {
$sub = $sub.".".$value;
}
$sub = substr($sub, 1);
}
return $sub;
}
$subdomain = getSubdomain($_SERVER['HTTP_HOST']);
Works fine for me. Basically you need to use the explode limit parameter.
Detail and source: php.net - explode manual
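The loop can be replaced by implode(), because the negative limit already does the work: explode() then returns every piece except the last -limit ones. A sketch (subdomain_of is a hypothetical name) that also returns null when there is no subdomain, as the question asks; like the answer, it assumes a single-label TLD such as .com, so hosts like example.co.uk are misread.

```php
<?php
// Sketch: explode() with limit -2 drops the last two labels ("website.com"),
// leaving only the subdomain labels, if any.
function subdomain_of(string $host): ?string
{
    $labels = explode('.', $host, -2);
    return $labels === [] ? null : implode('.', $labels);
}

var_dump(subdomain_of('www.website.com')); // string(3) "www"
var_dump(subdomain_of('a.b.website.com')); // string(3) "a.b"
var_dump(subdomain_of('website.com'));     // NULL
```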
If your domain uses another TLD, such as .net or .org, just change the value accordingly:
$site['uri'] = explode(".", str_replace('.com', '', $_SERVER['HTTP_HOST']) );
// explode() always returns at least one element, so test for more than one
if( count($site['uri']) > 1 ) {
$site['subdomain'] = $site['uri'][0];
$site['domain'] = $site['uri'][1];
}
else {
$site['subdomain'] = null;
$site['domain'] = $site['uri'][0];
}
//For testing only:
print_r($site);
...or not (for more flexibility):
$site['uri'] = explode(".", $_SERVER['HTTP_HOST'] );
if( count($site['uri']) > 2 ) {
$site['subdomain'] = $site['uri'][0];
$site['domain'] = $site['uri'][1];
}
else {
$site['subdomain'] = null;
$site['domain'] = $site['uri'][0];
}
//For testing only:
print_r($site);

split full email addresses into name and email?

There seem to be many acceptable email address formats in the To: and From: raw email headers:
person#place.com
person <person#place.com>
person
Another Person <person#place.com>
'Another Person' <person#place.com>
"Another Person" <person#place.com>
After not finding any effective PHP functions for splitting out names and addresses, I've written the following code.
// validate email address
function validate_email( $email ){
return (filter_var($email, FILTER_VALIDATE_EMAIL)) ? true : false;
}
// split email into name / address
function email_split( $str ){
$name = $email = '';
if (substr($str,0,1)=='<') {
// first character = <
$email = str_replace( array('<','>'), '', $str );
} else if (strpos($str,' <') !== false) {
// possibly = name <email>
list($name,$email) = explode(' <',$str);
$email = str_replace('>','',$email);
if (!validate_email($email)) $email = '';
$name = str_replace(array('"',"'"),'',$name);
} else if (validate_email($str)) {
// just the email
$email = $str;
} else {
// unknown
$name = $str;
}
return array( 'name'=>trim($name), 'email'=>trim($email) );
}
// test it
$tests = array(
'person#place.com',
'monarch <themonarch#tgoci.com>',
'blahblah',
"'doc venture' <doc#venture.com>"
);
foreach ($tests as $test){
echo print_r( email_split($test), true );
}
Am I missing anything here? Can anyone recommend a better way?
I managed to write a single regex that handles your test cases:
person#place.com
person <person#place.com>
person
Another Person <person#place.com>
'Another Person' <person#place.com>
"Another Person" <person#place.com>
Using preg_match with this regex should help:
function email_split( $str ){
// PHP has no /g modifier; the hyphen goes last in [\w.-] so PCRE2 does not read it as a range
$sPattern = "/([\w\s\'\"]+[\s]+)?(<)?(([\w.-]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4}))?(>)?/";
preg_match($sPattern,$str,$aMatch);
if(isset($aMatch[1]))
{
echo $aMatch[1]; // this is the name
}
if(isset($aMatch[3]))
{
echo $aMatch[3]; // this is the email address
}
}
Note: the third test case, a bare "person", is not matched as a name by this regex, because the name group requires trailing whitespace. So at the start of your email_split function, append a space to the end of the string.
Then it works as expected.
Thanks, hope this helps.
Code I tried:
<?php
// validate email address
function validate_email($email) {
return (filter_var($email, FILTER_VALIDATE_EMAIL)) ? true : false;
}
// split email into name / address
function email_split($str) {
$str .=" ";
$sPattern = '/([\w\s\'\"]+[\s]+)?(<)?(([\w.-]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4}))?(>)?/';
preg_match($sPattern, $str, $aMatch);
//echo "string";
//print_r($aMatch);
$name = (isset($aMatch[1])) ? $aMatch[1] : '';
$email = (isset($aMatch[3])) ? $aMatch[3] : '';
return array('name' => trim($name), 'email' => trim($email));
}
// test it
$tests = array(
'person#place.com',
'monarch <themonarch#tgoci.com>',
'blahblah',
"'doc venture' <doc#venture.com>"
);
foreach ($tests as $test) {
echo "<pre>";
echo print_r(email_split($test), true);
echo "</pre>";
}
Output I got:
Array
(
[name] =>
[email] => person#place.com
)
Array
(
[name] => monarch
[email] => themonarch#tgoci.com
)
Array
(
[name] => blahblah
[email] =>
)
Array
(
[name] => 'doc venture'
[email] => doc#venture.com
)
How about this:
function email_split($str) {
$parts = explode(' ', trim($str));
$email = trim(array_pop($parts), "<> \t\n\r\0\x0B");
$name = trim(implode(' ', $parts), "\"\' \t\n\r\0\x0B");
if ($name == "" && strpos($email, "#") === false) { // only single string - did not contain '#'
$name = $email;
$email = "";
}
return array('name' => $name, 'email' => $email);
}
Looks like this is about twice as fast as the regex solution.
Note: the OP's third test case is not needed for my purposes, but to answer the question I added the if statement to produce the OP's expected results. This could have been done in other ways (for example, checking the last element of $parts for '#').
Use preg_match in PHP: http://php.net/manual/en/function.preg-match.php
Or, in my opinion, you can write your own function (say get_email_address) that finds the # character, then takes the 'rest-left-string' from # back to the '<' character and the 'rest-right-string' from # up to the '>' character.
For example, for the string monarch <themonarch#tgoci.com>, 'rest-left-string' is themonarch and 'rest-right-string' is tgoci.com, so get_email_address returns themonarch#tgoci.com.
Hopefully it helps. :)
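The approach described in that answer can be sketched as follows. This is an illustration, not the answerer's code: it is written for real addresses with @, and it assumes that when there is no "<" the address starts at the beginning of the string.

```php
<?php
// Sketch: locate "@", then walk left to the nearest "<" (or start of string)
// and right to the nearest ">" (or end). Returns null when there is no "@".
function get_email_address(string $str): ?string
{
    $at = strpos($str, '@');
    if ($at === false) {
        return null;
    }
    // Left boundary: just after the last "<" before the "@", else the start.
    $lt = strrpos(substr($str, 0, $at), '<');
    $start = $lt === false ? 0 : $lt + 1;
    // Right boundary: the first ">" after the "@", else the end.
    $gt = strpos($str, '>', $at);
    $end = $gt === false ? strlen($str) : $gt;
    return trim(substr($str, $start, $end - $start));
}

var_dump(get_email_address('monarch <themonarch@tgoci.com>')); // string(20) "themonarch@tgoci.com"
var_dump(get_email_address('person@place.com'));               // string(16) "person@place.com"
var_dump(get_email_address('blahblah'));                       // NULL
```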
Unfortunately the regex fails in a couple of cases involving the full name:
non-alphanumeric characters (e.g. "Amazon.it")
non-printable characters
emojis
I adjusted the expression this way:
$sPattern = '/([^<]*)?(<)?(([\w.-]+)#((?:[\w]+\.)+)([a-zA-Z]{2,4}))?(>)?/';
and now all characters in the name are correctly recognized and split. Note that because [^<]* is greedy, an address without angle brackets, such as person#place.com, now ends up entirely in the name group.
Tested with:
$address = "Test User # `` . !! 🔥 <test#email.com";
After 7 years, hope this helps :)

Get domain name (not subdomain) in php

I have a URL which can be any of the following formats:
http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar
Essentially, I need to be able to match any normal URL. How can I extract example.com (or .net, whatever the tld happens to be. I need this to work with any TLD.) from all of these via a single regex?
Well you can use parse_url to get the host:
$info = parse_url($url);
$host = $info['host'];
Then, you can do some fancy stuff to get only the TLD and the Host
$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
Not very elegant, but should work.
If you want an explanation, here it goes:
First we grab everything between the scheme (http://, etc.) and the path, by using parse_url's capabilities to... well... parse URLs. :)
Then we take the host name, and separate it into an array based on where the periods fall, so test.world.hello.myname would become:
array("test", "world", "hello", "myname");
After that, we take the number of elements in the array (4).
Then, we subtract 2 from it to get the second to last string (the hostname, or example, in your example)
Then, we subtract 1 from it to get the last string (because array keys start at 0), also known as the TLD
Then we combine those two parts with a period, and you have your base host name.
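Note that parse_url() only reports a host when the input has a scheme; for www.example.com or example.net/foo/bar it puts everything into the path instead. A sketch that prepends http:// when no scheme is present and then keeps the last two labels as described above (base_host is a hypothetical name; suffixes like co.uk are still not handled):

```php
<?php
// Sketch: add a scheme if missing so parse_url() can find the host,
// then keep only the last two labels of that host.
function base_host(string $url): ?string
{
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // malformed input or no host
    }
    return implode('.', array_slice(explode('.', $host), -2));
}

echo base_host('http://foo.bar.example.com/foo/bar'), "\n"; // example.com
echo base_host('www.example.com'), "\n";                     // example.com
echo base_host('example.net/foo/bar'), "\n";                 // example.net
```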
It is not possible to get the domain name without comparing against a TLD list, as there are many cases with exactly the same structure and length:
nas.db.de (Subdomain)
bbc.co.uk (Top-Level-Domain)
www.uk.com (Subdomain)
big.uk.com (Second-Level-Domain)
Mozilla's public suffix list should be the best option as it is used by all major browsers:
https://publicsuffix.org/list/public_suffix_list.dat
Feel free to use my function:
function tld_list($cache_dir=null) {
// we use "/tmp" if $cache_dir is not set
$cache_dir = isset($cache_dir) ? $cache_dir : sys_get_temp_dir();
$lock_dir = $cache_dir . '/public_suffix_list_lock/';
$list_dir = $cache_dir . '/public_suffix_list/';
// refresh list all 30 days
if (file_exists($list_dir) && @filemtime($list_dir) + 2592000 > time()) {
return $list_dir;
}
// use exclusive lock to avoid race conditions
if (!file_exists($lock_dir) && @mkdir($lock_dir)) {
// read from source
$list = @fopen('https://publicsuffix.org/list/public_suffix_list.dat', 'r');
if ($list) {
// the list is older than 30 days so delete everything first
if (file_exists($list_dir)) {
foreach (glob($list_dir . '*') as $filename) {
unlink($filename);
}
rmdir($list_dir);
}
// now set list directory with new timestamp
mkdir($list_dir);
// read line-by-line to avoid high memory usage
while ($line = fgets($list)) {
// skip comments and empty lines
if (trim($line) === '' || $line[0] == '/') {
continue;
}
// remove wildcard
if ($line[0] . $line[1] == '*.') {
$line = substr($line, 2);
}
// remove exclamation mark
if ($line[0] == '!') {
$line = substr($line, 1);
}
// reverse TLD and remove linebreak
$line = implode('.', array_reverse(explode('.', (trim($line)))));
// we split the TLD list to reduce memory usage
touch($list_dir . $line);
}
fclose($list);
}
@rmdir($lock_dir);
}
// repair locks (should never happen)
if (file_exists($lock_dir) && mt_rand(0, 100) == 0 && @filemtime($lock_dir) + 86400 < time()) {
@rmdir($lock_dir);
}
return $list_dir;
}
function get_domain($url=null) {
// obtain location of public suffix list
$tld_dir = tld_list();
// no url = our own host
$url = isset($url) ? $url : $_SERVER['SERVER_NAME'];
// add missing scheme ftp:// http:// ftps:// https://
$url = !isset($url[5]) || ($url[3] != ':' && $url[4] != ':' && $url[5] != ':') ? 'http://' . $url : $url;
// remove "/path/file.html", "/:80", etc.
$url = parse_url($url, PHP_URL_HOST);
// replace absolute domain name by relative (http://www.dns-sd.org/TrailingDotsInDomainNames.html)
$url = trim($url, '.');
// check if TLD exists
$url = explode('.', $url);
$parts = array_reverse($url);
foreach ($parts as $key => $part) {
$tld = implode('.', $parts);
if (file_exists($tld_dir . $tld)) {
return !$key ? '' : implode('.', array_slice($url, $key - 1));
}
// remove last part
array_pop($parts);
}
return '';
}
What makes it special:
it accepts any input: URLs, hostnames or domains, with or without a scheme
the list is downloaded line by line to avoid high memory usage
it creates one file per TLD in a cache folder, so get_domain() only needs a file_exists() check and does not have to load a huge database on every request the way TLDExtract does
the list is automatically updated every 30 days
Test:
$urls = array(
'http://www.example.com',// example.com
'http://subdomain.example.com',// example.com
'http://www.example.uk.com',// example.uk.com
'http://www.example.co.uk',// example.co.uk
'http://www.example.com.ac',// example.com.ac
'http://example.com.ac',// example.com.ac
'http://www.example.accident-prevention.aero',// example.accident-prevention.aero
'http://www.example.sub.ar',// sub.ar
'http://www.congresodelalengua3.ar',// congresodelalengua3.ar
'http://congresodelalengua3.ar',// congresodelalengua3.ar
'http://www.example.pvt.k12.ma.us',// example.pvt.k12.ma.us
'http://www.example.lib.wy.us',// example.lib.wy.us
'com',// empty
'.com',// empty
'http://big.uk.com',// big.uk.com
'uk.com',// empty
'www.uk.com',// www.uk.com
'.uk.com',// empty
'stackoverflow.com',// stackoverflow.com
'.foobarfoo',// empty
'',// empty
false,// empty
' ',// empty
1,// empty
'a',// empty
);
Recent version with explanations (German):
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm
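The function above strips the "*." and "!" markers from the list rather than applying them. For reference, the list's own matching rule works like this: a "*" label matches any single label, an exception rule ("!") wins over everything else, otherwise the rule with the most labels wins, and a TLD not in the list counts as a one-label suffix; the registrable domain is the public suffix plus one more label. A minimal sketch with a tiny hard-coded sample of rules (the real list is far larger; registrable_domain is a hypothetical name):

```php
<?php
// Sketch of Public Suffix List matching against a small sample rule set.
// $rules uses the list's syntax: "co.uk", "*.ck" (wildcard), "!www.ck" (exception).
function registrable_domain(string $host, array $rules): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $n = count($labels);
    $suffixLen = 1; // implicit default rule "*": an unlisted TLD is a public suffix

    foreach ($rules as $rule) {
        $isException = $rule[0] === '!';
        $ruleLabels = explode('.', ltrim($rule, '!'));
        $k = count($ruleLabels);
        if ($k > $n) {
            continue; // rule is longer than the host
        }
        // Compare the rule label by label with the host's last $k labels.
        $tail = array_slice($labels, -$k);
        foreach ($ruleLabels as $i => $ruleLabel) {
            if ($ruleLabel !== '*' && $ruleLabel !== $tail[$i]) {
                continue 2; // mismatch: try the next rule
            }
        }
        if ($isException) {
            // An exception rule wins; its leftmost label is not part of the suffix.
            $suffixLen = $k - 1;
            break;
        }
        $suffixLen = max($suffixLen, $k); // otherwise the longest matching rule wins
    }

    // Registrable domain = public suffix plus one label (none if the host IS a suffix).
    return $n > $suffixLen ? implode('.', array_slice($labels, -($suffixLen + 1))) : null;
}

$rules = ['com', 'uk', 'co.uk', '*.ck', '!www.ck'];
var_dump(registrable_domain('www.example.co.uk', $rules)); // string(13) "example.co.uk"
var_dump(registrable_domain('foo.bar.ck', $rules));        // string(10) "foo.bar.ck"
var_dump(registrable_domain('www.ck', $rules));            // string(6) "www.ck"
var_dump(registrable_domain('co.uk', $rules));             // NULL
```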
My solution is in https://gist.github.com/pocesar/5366899
and the tests are here: http://codepad.viper-7.com/GAh1tP
It works with any TLD, and with hideous subdomain patterns (up to 3 subdomains).
There's a test included with many domain names.
I won't paste the function here because of the awkward code indentation on StackOverflow (it could use fenced code blocks like GitHub).
echo getDomainOnly("http://example.com/foo/bar");
function getDomainOnly($host){
$host = strtolower(trim($host));
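$host = preg_replace('/^www\./', '', str_replace(array('https://', 'http://'), '', $host)); // not ltrim(..., "www."), which strips any leading "w" or "." characters (web.com -> eb.com)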
$count = substr_count($host, '.');
if($count === 2){
if(strlen(explode('.', $host)[1]) > 3) $host = explode('.', $host, 2)[1];
} else if($count > 2){
$host = getDomainOnly(explode('.', $host, 2)[1]);
}
$host = explode('/',$host);
return $host[0];
}
I recommend using the TLDExtract library for all operations on domain names.
I think the best way to handle this problem is:
$second_level_domains_regex = '/\.asn\.au$|\.com\.au$|\.net\.au$|\.id\.au$|\.org\.au$|\.edu\.au$|\.gov\.au$|\.csiro\.au$|\.act\.au$|\.nsw\.au$|\.nt\.au$|\.qld\.au$|\.sa\.au$|\.tas\.au$|\.vic\.au$|\.wa\.au$|\.co\.at$|\.or\.at$|\.priv\.at$|\.ac\.at$|\.avocat\.fr$|\.aeroport\.fr$|\.veterinaire\.fr$|\.co\.hu$|\.film\.hu$|\.lakas\.hu$|\.ingatlan\.hu$|\.sport\.hu$|\.hotel\.hu$|\.ac\.nz$|\.co\.nz$|\.geek\.nz$|\.gen\.nz$|\.kiwi\.nz$|\.maori\.nz$|\.net\.nz$|\.org\.nz$|\.school\.nz$|\.cri\.nz$|\.govt\.nz$|\.health\.nz$|\.iwi\.nz$|\.mil\.nz$|\.parliament\.nz$|\.ac\.za$|\.gov\.za$|\.law\.za$|\.mil\.za$|\.nom\.za$|\.school\.za$|\.net\.za$|\.co\.uk$|\.org\.uk$|\.me\.uk$|\.ltd\.uk$|\.plc\.uk$|\.net\.uk$|\.sch\.uk$|\.ac\.uk$|\.gov\.uk$|\.mod\.uk$|\.mil\.uk$|\.nhs\.uk$|\.police\.uk$/';
$domain = $_SERVER['HTTP_HOST'];
$domain = explode('.', $domain);
$domain = array_reverse($domain);
if (preg_match($second_level_domains_regex, $_SERVER['HTTP_HOST'])) {
$domain = "$domain[2].$domain[1].$domain[0]";
} else {
$domain = "$domain[1].$domain[0]";
}
$onlyHostName = implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2));
Using https://subdomain.domain.com/some/path as example
parse_url($link, PHP_URL_HOST) returns subdomain.domain.com
explode('.', parse_url($link, PHP_URL_HOST)) then breaks subdomain.domain.com into an array:
array(3) {
[0]=>
string(9) "subdomain"
[1]=>
string(6) "domain"
[2]=>
string(3) "com"
}
array_slice then slices the array so only the last 2 values are in the array (signified by the -2):
array(2) {
[0]=>
string(6) "domain"
[1]=>
string(3) "com"
}
implode then combines those two array values back together, ultimately giving you the result of domain.com
Note: this only works when the domain you expect has a single dot in it, like something.domain.com or else.something.domain.net.
It will not work for something.domain.co.uk, where you would expect domain.co.uk.
There are two ways to extract the subdomain from a host:
The first, more accurate method is to use a database of TLDs (like public_suffix_list.dat) and match the domain against it. This is a little heavy in some cases. There are some PHP classes for it, like php-domain-parser and TLDExtract.
The second way is less accurate than the first, but it is very fast and gives the correct answer in many cases. I wrote this function for it:
function get_domaininfo($url) {
// regex can be replaced with parse_url
preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
$parts = explode(".", $matches[2]);
$tld = array_pop($parts);
$host = array_pop($parts);
if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
$tld = "$host.$tld";
$host = array_pop($parts);
}
return array(
'protocol' => $matches[1],
'subdomain' => implode(".", $parts),
'domain' => "$host.$tld",
'host'=>$host,'tld'=>$tld
);
}
Example:
print_r(get_domaininfo('http://mysubdomain.domain.co.uk/index.php'));
Returns:
Array
(
[protocol] => http
[subdomain] => mysubdomain
[domain] => domain.co.uk
[host] => domain
[tld] => co.uk
)
Here's a function I wrote to grab the domain without subdomain(s), whether the domain uses a ccTLD or a new-style long TLD. There is no lookup or huge array of known TLDs, and there's no regex. It could be much shorter using the ternary operator and nesting, but I expanded it for readability. Note that it assumes every ccTLD domain sits under a second level such as co.uk: for www.example.de it returns www.example.de.
// Per Wikipedia: "All ASCII ccTLD identifiers are two letters long,
// and all two-letter top-level domains are ccTLDs."
function topDomainFromURL($url) {
$url_parts = parse_url($url);
$domain_parts = explode('.', $url_parts['host']);
if (strlen(end($domain_parts)) == 2 ) {
// ccTLD here, get last three parts
$top_domain_parts = array_slice($domain_parts, -3);
} else {
$top_domain_parts = array_slice($domain_parts, -2);
}
$top_domain = implode('.', $top_domain_parts);
return $top_domain;
}
function getDomain($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if(preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)){
return $regs['domain'];
}
return FALSE;
}
echo getDomain("http://example.com"); // outputs 'example.com'
echo getDomain("http://www.example.com"); // outputs 'example.com'
echo getDomain("http://mail.example.co.uk"); // outputs 'example.co.uk'
I had problems with the solution provided by pocesar.
When I would use for instance subdomain.domain.nl it would not return domain.nl. Instead it would return subdomain.domain.nl
Another problem was that domain.com.br would return com.br
I'm not sure it covers every case, but I fixed these issues with the following code (I hope it helps someone; if so, I am a happy man):
function get_domain($domain, $debug = false){
$original = $domain = strtolower($domain);
if (filter_var($domain, FILTER_VALIDATE_IP)) {
return $domain;
}
$debug ? print('<strong style="color:green">»</strong> Parsing: '.$original) : false;
$arr = array_slice(array_filter(explode('.', $domain, 4), function($value){
return $value !== 'www';
}), 0); //rebuild array indexes
if (count($arr) > 2){
$count = count($arr);
$_sub = explode('.', $count === 4 ? $arr[3] : $arr[2]);
$debug ? print(" (parts count: {$count})") : false;
if (count($_sub) === 2){ // two level TLD
$removed = array_shift($arr);
if ($count === 4){ // got a subdomain acting as a domain
$removed = array_shift($arr);
}
$debug ? print("<br>\n" . '[*] Two level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
}elseif (count($_sub) === 1){ // one level TLD
$removed = array_shift($arr); //remove the subdomain
if (strlen($arr[0]) === 2 && $count === 3){ // TLD domain must be 2 letters
array_unshift($arr, $removed);
}elseif(strlen($arr[0]) === 3 && $count === 3){
array_unshift($arr, $removed);
}else{
// non country TLD according to IANA
$tlds = array(
'aero',
'arpa',
'asia',
'biz',
'cat',
'com',
'coop',
'edu',
'gov',
'info',
'jobs',
'mil',
'mobi',
'museum',
'name',
'net',
'org',
'post',
'pro',
'tel',
'travel',
'xxx',
);
if (count($arr) > 2 && in_array($_sub[0], $tlds) !== false){ //special TLD don't have a country
array_shift($arr);
}
}
$debug ? print("<br>\n" .'[*] One level TLD: <strong>'.join('.', $_sub).'</strong> ') : false;
}else{ // more than 3 levels, something is wrong
for ($i = count($_sub); $i > 1; $i--){
$removed = array_shift($arr);
}
$debug ? print("<br>\n" . '[*] Three level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
}
}elseif (count($arr) === 2){
$arr0 = array_shift($arr);
if (strpos(join('.', $arr), '.') === false && in_array($arr[0], array('localhost','test','invalid')) === false){ // not a reserved domain
$debug ? print("<br>\n" .'Seems invalid domain: <strong>'.join('.', $arr).'</strong> re-adding: <strong>'.$arr0.'</strong> ') : false;
// seems invalid domain, restore it
array_unshift($arr, $arr0);
}
}
$debug ? print("<br>\n".'<strong style="color:gray">«</strong> Done parsing: <span style="color:red">' . $original . '</span> as <span style="color:blue">'. join('.', $arr) ."</span><br>\n") : false;
return join('.', $arr);
}
Here's one that works for all domains, including those with second level domains like "co.uk"
function strip_subdomains($url){
# credits to gavingmiller for maintaining this list
$second_level_domains = file_get_contents("https://raw.githubusercontent.com/gavingmiller/second-level-domains/master/SLDs.csv");
# presume sld first ...
$possible_sld = implode('.', array_slice(explode('.', $url), -2));
# and then verify it
if (strpos($second_level_domains, $possible_sld) !== false){
return implode('.', array_slice(explode('.', $url), -3));
} else {
return implode('.', array_slice(explode('.', $url), -2));
}
}
Looks like there's a duplicate question here: delete-subdomain-from-url-string-if-subdomain-is-found
Very late, but since you tagged the question with regex: my function works like a charm, and so far I haven't found a URL that fails:
function get_domain_regex($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}else{
return false;
}
}
If you want one without regex, here is one that I'm fairly sure I also took from this post:
function get_domain($url){
$parseUrl = parse_url($url);
$host = $parseUrl['host'];
$host_array = explode(".", $host);
$domain = $host_array[count($host_array)-2] . "." . $host_array[count($host_array)-1];
return $domain;
}
They both work well, but it took me a while to realize that if the URL doesn't start with http:// or https:// they fail, because parse_url() then treats the input as a path and sets no host. So make sure the URL string starts with the protocol.
Simply try this:
preg_match('/(www\.)?([^.]+\.[^.]+)$/', $yourHost, $matches);
echo "domain name is: {$matches[2]}\n"; // group 2 excludes a leading "www."
This works for the majority of domains.
This function returns the domain name without the extension for any URL given, even if the URL has no http:// or https:// prefix.
You can extend this part of the pattern
(?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?
with more extensions if you want to handle more second-level domain names.
function get_domain_name($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : $url;
$domain = strtolower($domain);
$domain = preg_replace('/.international$/', '.com', $domain);
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,90}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
if (preg_match('/(.*?)((?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?(?:\.asn)?.[a-z]{2,6})$/i', $regs['domain'], $matches)) {
return $matches[1];
}else return $regs['domain'];
}else{
return $url;
}
}
I'm using this to achieve the same goal and it has worked well for me; I hope it helps others.
$url = 'https://use.fontawesome.com/releases/v5.11.2/css/all.css?ver=2.7.5';
$handle = pathinfo( parse_url( $url )['host'] )['filename'];
// keep the label after the last dot; with strpos() a host without dots would lose its first character
$final_handle = ($dot = strrpos($handle, '.')) === false ? $handle : substr($handle, $dot + 1);
print_r($final_handle); // fontawesome
Simplest solution:
@preg_replace('#\/(.)*#', '', @preg_replace('#^https?://(www.)?#', '', $url));
Simply try this:
<?php
$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
