Parsing domain from a URL - php

I need to build a function which parses the domain from a URL.
So, with
http://google.com/dhasjkdas/sadsdds/sdda/sdads.html
or
http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.com
with
http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.co.uk.

Check out parse_url():
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'
parse_url doesn't handle really badly mangled urls very well, but is fine if you generally expect decent urls.

$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
This would return the google.com for both http://google.com/... and http://www.google.com/...

From http://us3.php.net/manual/en/function.parse-url.php#93983
for some odd reason, parse_url
returns the host (ex. example.com) as
the path when no scheme is provided in
the input url. So I've written a quick
function to get the real host:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
}
getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com

function get_domain($url = SITE_URL)
{
preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
return $_domain_tld[0];
}
get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr

The code that was meant to work 100% didn't seem to cut it for me, I did patch the example a little but found code that wasn't helping and problems with it. so I changed it out to a couple of functions (to save asking for the list from Mozilla all the time, and removing the cache system). This has been tested against a set of 1000 URLs and seemed to work.
function domain($url)
{
global $subtlds;
$slds = "";
$url = strtolower($url);
$host = parse_url('http://'.$url,PHP_URL_HOST);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub){
if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
}
return #$matches[0];
}
function get_tlds() {
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
$content = file($address);
foreach ($content as $num => $line) {
$line = trim($line);
if($line == '') continue;
if(#substr($line[0], 0, 2) == '/') continue;
$line = #preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(#$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
), $subtlds);
$subtlds = array_unique($subtlds);
return $subtlds;
}
Then use it like
$subtlds = get_tlds();
echo domain('www.example.com') //outputs: example.com
echo domain('www.example.uk.com') //outputs: example.uk.com
echo domain('www.example.fr') //outputs: example.fr
I know I should have turned this into a class, but didn't have time.

Please consider replacring the accepted solution with the following:
parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well.
Here are some examples:
$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'
echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com
echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk
Instead, you may consider this pragmatic solution.
It will cover many, but not all domain names -- for instance, lower-level domains such as 'sos.state.oh.us' are not covered.
function getDomain($url) {
$host = parse_url($url, PHP_URL_HOST);
if(filter_var($host,FILTER_VALIDATE_IP)) {
// IP address returned as domain
return $host; //* or replace with null if you don't want an IP back
}
$domain_array = explode(".", str_replace('www.', '', $host));
$count = count($domain_array);
if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
// SLD (example.co.uk)
return implode('.', array_splice($domain_array, $count-3,3));
} else if( $count>=2 ) {
// TLD (example.com)
return implode('.', array_splice($domain_array, $count-2,2));
}
}
// Your domains
echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk
// TLD
echo getDomain('https://shop.example.com'); // example.com
echo getDomain('https://foo.bar.example.com'); // example.com
echo getDomain('https://www.example.com'); // example.com
echo getDomain('https://example.com'); // example.com
// SLD
echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://bbc.co.uk'); // bbc.co.uk
// IP
echo getDomain('https://1.2.3.45'); // 1.2.3.45
Finally, Jeremy Kendall's PHP Domain Parser allows you to parse the domain name from a url. League URI Hostname Parser will also do the job.

If you want extract host from string http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, usage of parse_url() is acceptable solution for you.
But if you want extract domain or its parts, you need package that using Public Suffix List. Yes, you can use string functions arround parse_url(), but it will produce incorrect results sometimes.
I recomend TLDExtract for domain parsing, here is sample code that show diff:
$extract = new LayerShifter\TLDExtract\Extract();
# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return google.com
$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'
# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return 'search.google.com'
$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'

You can pass PHP_URL_HOST into parse_url function as second parameter
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'

Here is the code i made that 100% finds only the domain name, since it takes mozilla sub tlds to account. Only thing you have to check is how you make cache of that file, so you dont query mozilla every time.
For some strange reason, domains like co.uk are not in the list, so you have to make some hacking and add them manually. Its not cleanest solution but i hope it helps someone.
//=====================================================
static function domain($url)
{
$slds = "";
$url = strtolower($url);
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
if(!$subtlds = #kohana::cache('subtlds', null, 60))
{
$content = file($address);
foreach($content as $num => $line)
{
$line = trim($line);
if($line == '') continue;
if(#substr($line[0], 0, 2) == '/') continue;
$line = #preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(#$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(Array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
),$subtlds);
$subtlds = array_unique($subtlds);
//echo var_dump($subtlds);
#kohana::cache('subtlds', $subtlds);
}
preg_match('/^(http:[\/]{2,})?([^\/]+)/i', $url, $matches);
//preg_match("/^(http:\/\/|https:\/\/|)[a-zA-Z-]([^\/]+)/i", $url, $matches);
$host = #$matches[2];
//echo var_dump($matches);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub)
{
if (preg_match("/{$sub}$/", $host, $xyz))
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
return #$matches[0];
}

I'm adding this answer late since this is the answer that pops up most on Google...
You can use PHP to...
$url = "www.google.co.uk";
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk"
to grab the host but not the private domain to which the host refers. (Example www.google.co.uk is the host, but google.co.uk is the private domain)
To grab the private domain, you must need know the list of public suffixes to which one can register a private domain. This list happens to be curated by Mozilla at https://publicsuffix.org/
The below code works when an array of public suffixes has been created already. Simply call
$domain = get_private_domain("www.google.co.uk");
with the remaining code...
// find some way to parse the above list of public suffix
// then add them to a PHP array
$suffix = [... all valid public suffix ...];
function get_public_suffix($host) {
$parts = split("\.", $host);
while (count($parts) > 0) {
if (is_public_suffix(join(".", $parts)))
return join(".", $parts);
array_shift($parts);
}
return false;
}
function is_public_suffix($host) {
global $suffix;
return isset($suffix[$host]);
}
function get_private_domain($host) {
$public = get_public_suffix($host);
$public_parts = split("\.", $public);
$all_parts = split("\.", $host);
$private = [];
for ($x = 0; $x < count($public_parts); ++$x)
$private[] = array_pop($all_parts);
if (count($all_parts) > 0)
$private[] = array_pop($all_parts);
return join(".", array_reverse($private));
}

I've found that #philfreo's solution (referenced from php.net) is pretty well to get fine result but in some cases it shows php's "notice" and "Strict Standards" message. Here a fixed version of this code.
function getHost($url) {
$parseUrl = parse_url(trim($url));
if(isset($parseUrl['host']))
{
$host = $parseUrl['host'];
}
else
{
$path = explode('/', $parseUrl['path']);
$host = $path[0];
}
return trim($host);
}
echo getHost("http://example.com/anything.html"); // example.com
echo getHost("http://www.example.net/directory/post.php"); // www.example.net
echo getHost("https://example.co.uk"); // example.co.uk
echo getHost("www.example.net"); // example.net
echo getHost("subdomain.example.net/anything"); // subdomain.example.net
echo getHost("example.net"); // example.net

function getTrimmedUrl($link)
{
$str = str_replace(["www.","https://","http://"],[''],$link);
$link = explode("/",$str);
return strtolower($link[0]);
}

$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2))

parse_url didn't work for me. It only returned the path. Switching to basics using php5.3+:
$url = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/')) $url = strstr($url, '/', true);

I have edited for you:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
$host = trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
$parts = explode( '.', $host );
$num_parts = count($parts);
if ($parts[0] == "www") {
for ($i=1; $i < $num_parts; $i++) {
$h .= $parts[$i] . '.';
}
}else {
for ($i=0; $i < $num_parts; $i++) {
$h .= $parts[$i] . '.';
}
}
return substr($h,0,-1);
}
All type url (www.domain.ltd, sub1.subn.domain.ltd will result to : domain.ltd.

None of this solutions worked for me when I use this test cases:
public function getTestCases(): array
{
return [
//input expected
['http://google.com/dhasjkdas', 'google.com'],
['https://google.com/dhasjkdas', 'google.com'],
['https://www.google.com/dhasjkdas', 'google.com'],
['http://www.google.com/dhasjkdas', 'google.com'],
['www.google.com/dhasjkdas', 'google.com'],
['google.com/dhasjkdas', 'google.com'],
];
}
but wrapping this answer into function worked in all cases: https://stackoverflow.com/a/65659814/5884988

This will generally work very well if the input URL is not total junk. It removes the subdomain.
$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];
Example
Input: http://www2.website.com:8080/some/file/structure?some=parameters
Output: website.com

Combining the answers of worldofjr and Alix Axel into one small function that will handle most use-cases:
function get_url_hostname($url) {
$parse = parse_url($url);
return str_ireplace('www.', '', $parse['host']);
}
get_url_hostname('http://www.google.com/example/path/file.html'); // google.com

Just use as like following ...
<?php
echo $_SERVER['SERVER_NAME'];
?>

Related

PHP Regex - Remove second-level domain [duplicate]

I need to build a function which parses the domain from a URL.
So, with
http://google.com/dhasjkdas/sadsdds/sdda/sdads.html
or
http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.com
with
http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.co.uk.
Check out parse_url():
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'
parse_url doesn't handle really badly mangled urls very well, but is fine if you generally expect decent urls.
$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
This would return the google.com for both http://google.com/... and http://www.google.com/...
From http://us3.php.net/manual/en/function.parse-url.php#93983
for some odd reason, parse_url
returns the host (ex. example.com) as
the path when no scheme is provided in
the input url. So I've written a quick
function to get the real host:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
}
getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com
function get_domain($url = SITE_URL)
{
preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
return $_domain_tld[0];
}
get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr
The code that was meant to work 100% didn't seem to cut it for me, I did patch the example a little but found code that wasn't helping and problems with it. so I changed it out to a couple of functions (to save asking for the list from Mozilla all the time, and removing the cache system). This has been tested against a set of 1000 URLs and seemed to work.
function domain($url)
{
global $subtlds;
$slds = "";
$url = strtolower($url);
$host = parse_url('http://'.$url,PHP_URL_HOST);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub){
if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
}
return #$matches[0];
}
function get_tlds() {
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
$content = file($address);
foreach ($content as $num => $line) {
$line = trim($line);
if($line == '') continue;
if(#substr($line[0], 0, 2) == '/') continue;
$line = #preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(#$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
), $subtlds);
$subtlds = array_unique($subtlds);
return $subtlds;
}
Then use it like
$subtlds = get_tlds();
echo domain('www.example.com') //outputs: example.com
echo domain('www.example.uk.com') //outputs: example.uk.com
echo domain('www.example.fr') //outputs: example.fr
I know I should have turned this into a class, but didn't have time.
Please consider replacring the accepted solution with the following:
parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well.
Here are some examples:
$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'
echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com
echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk
Instead, you may consider this pragmatic solution.
It will cover many, but not all domain names -- for instance, lower-level domains such as 'sos.state.oh.us' are not covered.
function getDomain($url) {
$host = parse_url($url, PHP_URL_HOST);
if(filter_var($host,FILTER_VALIDATE_IP)) {
// IP address returned as domain
return $host; //* or replace with null if you don't want an IP back
}
$domain_array = explode(".", str_replace('www.', '', $host));
$count = count($domain_array);
if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
// SLD (example.co.uk)
return implode('.', array_splice($domain_array, $count-3,3));
} else if( $count>=2 ) {
// TLD (example.com)
return implode('.', array_splice($domain_array, $count-2,2));
}
}
// Your domains
echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk
// TLD
echo getDomain('https://shop.example.com'); // example.com
echo getDomain('https://foo.bar.example.com'); // example.com
echo getDomain('https://www.example.com'); // example.com
echo getDomain('https://example.com'); // example.com
// SLD
echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://bbc.co.uk'); // bbc.co.uk
// IP
echo getDomain('https://1.2.3.45'); // 1.2.3.45
Finally, Jeremy Kendall's PHP Domain Parser allows you to parse the domain name from a url. League URI Hostname Parser will also do the job.
If you want extract host from string http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, usage of parse_url() is acceptable solution for you.
But if you want extract domain or its parts, you need package that using Public Suffix List. Yes, you can use string functions arround parse_url(), but it will produce incorrect results sometimes.
I recomend TLDExtract for domain parsing, here is sample code that show diff:
$extract = new LayerShifter\TLDExtract\Extract();
# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return google.com
$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'
# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return 'search.google.com'
$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'
You can pass PHP_URL_HOST into parse_url function as second parameter
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'
Here is the code i made that 100% finds only the domain name, since it takes mozilla sub tlds to account. Only thing you have to check is how you make cache of that file, so you dont query mozilla every time.
For some strange reason, domains like co.uk are not in the list, so you have to make some hacking and add them manually. Its not cleanest solution but i hope it helps someone.
//=====================================================
static function domain($url)
{
$slds = "";
$url = strtolower($url);
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
if(!$subtlds = #kohana::cache('subtlds', null, 60))
{
$content = file($address);
foreach($content as $num => $line)
{
$line = trim($line);
if($line == '') continue;
if(#substr($line[0], 0, 2) == '/') continue;
$line = #preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(#$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(Array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
),$subtlds);
$subtlds = array_unique($subtlds);
//echo var_dump($subtlds);
#kohana::cache('subtlds', $subtlds);
}
preg_match('/^(http:[\/]{2,})?([^\/]+)/i', $url, $matches);
//preg_match("/^(http:\/\/|https:\/\/|)[a-zA-Z-]([^\/]+)/i", $url, $matches);
$host = #$matches[2];
//echo var_dump($matches);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub)
{
if (preg_match("/{$sub}$/", $host, $xyz))
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
return #$matches[0];
}
I'm adding this answer late since this is the answer that pops up most on Google...
You can use PHP to...
$url = "www.google.co.uk";
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk"
to grab the host but not the private domain to which the host refers. (Example www.google.co.uk is the host, but google.co.uk is the private domain)
To grab the private domain, you must need know the list of public suffixes to which one can register a private domain. This list happens to be curated by Mozilla at https://publicsuffix.org/
The below code works when an array of public suffixes has been created already. Simply call
$domain = get_private_domain("www.google.co.uk");
with the remaining code...
// find some way to parse the above list of public suffix
// then add them to a PHP array
$suffix = [... all valid public suffix ...];
function get_public_suffix($host) {
$parts = split("\.", $host);
while (count($parts) > 0) {
if (is_public_suffix(join(".", $parts)))
return join(".", $parts);
array_shift($parts);
}
return false;
}
function is_public_suffix($host) {
global $suffix;
return isset($suffix[$host]);
}
function get_private_domain($host) {
$public = get_public_suffix($host);
$public_parts = split("\.", $public);
$all_parts = split("\.", $host);
$private = [];
for ($x = 0; $x < count($public_parts); ++$x)
$private[] = array_pop($all_parts);
if (count($all_parts) > 0)
$private[] = array_pop($all_parts);
return join(".", array_reverse($private));
}
I've found that #philfreo's solution (referenced from php.net) is pretty well to get fine result but in some cases it shows php's "notice" and "Strict Standards" message. Here a fixed version of this code.
function getHost($url) {
$parseUrl = parse_url(trim($url));
if(isset($parseUrl['host']))
{
$host = $parseUrl['host'];
}
else
{
$path = explode('/', $parseUrl['path']);
$host = $path[0];
}
return trim($host);
}
echo getHost("http://example.com/anything.html"); // example.com
echo getHost("http://www.example.net/directory/post.php"); // www.example.net
echo getHost("https://example.co.uk"); // example.co.uk
echo getHost("www.example.net"); // example.net
echo getHost("subdomain.example.net/anything"); // subdomain.example.net
echo getHost("example.net"); // example.net
function getTrimmedUrl($link)
{
$str = str_replace(["www.","https://","http://"],[''],$link);
$link = explode("/",$str);
return strtolower($link[0]);
}
$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2))
parse_url didn't work for me. It only returned the path. Switching to basics using php5.3+:
$url = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/')) $url = strstr($url, '/', true);
I have edited for you:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
$host = trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
$parts = explode( '.', $host );
$num_parts = count($parts);
if ($parts[0] == "www") {
for ($i=1; $i < $num_parts; $i++) {
$h .= $parts[$i] . '.';
}
}else {
for ($i=0; $i < $num_parts; $i++) {
$h .= $parts[$i] . '.';
}
}
return substr($h,0,-1);
}
All type url (www.domain.ltd, sub1.subn.domain.ltd will result to : domain.ltd.
None of this solutions worked for me when I use this test cases:
public function getTestCases(): array
{
return [
//input expected
['http://google.com/dhasjkdas', 'google.com'],
['https://google.com/dhasjkdas', 'google.com'],
['https://www.google.com/dhasjkdas', 'google.com'],
['http://www.google.com/dhasjkdas', 'google.com'],
['www.google.com/dhasjkdas', 'google.com'],
['google.com/dhasjkdas', 'google.com'],
];
}
but wrapping this answer into function worked in all cases: https://stackoverflow.com/a/65659814/5884988
This will generally work very well if the input URL is not total junk. It removes the subdomain.
$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];
Example
Input: http://www2.website.com:8080/some/file/structure?some=parameters
Output: website.com
Combining the answers of worldofjr and Alix Axel into one small function that will handle most use-cases:
function get_url_hostname($url) {
$parse = parse_url($url);
return str_ireplace('www.', '', $parse['host']);
}
get_url_hostname('http://www.google.com/example/path/file.html'); // google.com
Just use as like following ...
<?php
echo $_SERVER['SERVER_NAME'];
?>

Trim a string to get only domain name in PHP [duplicate]

I need to build a function which parses the domain from a URL.
So, with
http://google.com/dhasjkdas/sadsdds/sdda/sdads.html
or
http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.com
with
http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html
it should return google.co.uk.
Check out parse_url():
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'google.com'
parse_url doesn't handle really badly mangled urls very well, but is fine if you generally expect decent urls.
$domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
This would return the google.com for both http://google.com/... and http://www.google.com/...
From http://us3.php.net/manual/en/function.parse-url.php#93983
for some odd reason, parse_url
returns the host (ex. example.com) as
the path when no scheme is provided in
the input url. So I've written a quick
function to get the real host:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
return trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
}
getHost("example.com"); // Gives example.com
getHost("http://example.com"); // Gives example.com
getHost("www.example.com"); // Gives www.example.com
getHost("http://example.com/xyz"); // Gives example.com
function get_domain($url = SITE_URL)
{
preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
return $_domain_tld[0];
}
get_domain('http://www.cdl.gr'); //cdl.gr
get_domain('http://cdl.gr'); //cdl.gr
get_domain('http://www2.cdl.gr'); //cdl.gr
The code that was meant to work 100% didn't seem to cut it for me, I did patch the example a little but found code that wasn't helping and problems with it. so I changed it out to a couple of functions (to save asking for the list from Mozilla all the time, and removing the cache system). This has been tested against a set of 1000 URLs and seemed to work.
function domain($url)
{
global $subtlds;
$slds = "";
$url = strtolower($url);
$host = parse_url('http://'.$url,PHP_URL_HOST);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub){
if (preg_match('/\.'.preg_quote($sub).'$/', $host, $xyz)){
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
}
return #$matches[0];
}
function get_tlds() {
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
$content = file($address);
foreach ($content as $num => $line) {
$line = trim($line);
if($line == '') continue;
if(#substr($line[0], 0, 2) == '/') continue;
$line = #preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(#$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
), $subtlds);
$subtlds = array_unique($subtlds);
return $subtlds;
}
Then use it like
$subtlds = get_tlds();
echo domain('www.example.com') //outputs: example.com
echo domain('www.example.uk.com') //outputs: example.uk.com
echo domain('www.example.fr') //outputs: example.fr
I know I should have turned this into a class, but didn't have time.
Please consider replacring the accepted solution with the following:
parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well.
Here are some examples:
$url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
echo $parse['host']; // prints 'www.google.com'
echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.com
echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
// Output: subdomain.example.co.uk
Instead, you may consider this pragmatic solution.
It will cover many, but not all domain names -- for instance, lower-level domains such as 'sos.state.oh.us' are not covered.
function getDomain($url) {
$host = parse_url($url, PHP_URL_HOST);
if(filter_var($host,FILTER_VALIDATE_IP)) {
// IP address returned as domain
return $host; //* or replace with null if you don't want an IP back
}
$domain_array = explode(".", str_replace('www.', '', $host));
$count = count($domain_array);
if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
// SLD (example.co.uk)
return implode('.', array_splice($domain_array, $count-3,3));
} else if( $count>=2 ) {
// TLD (example.com)
return implode('.', array_splice($domain_array, $count-2,2));
}
}
// Your domains
echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk
// TLD
echo getDomain('https://shop.example.com'); // example.com
echo getDomain('https://foo.bar.example.com'); // example.com
echo getDomain('https://www.example.com'); // example.com
echo getDomain('https://example.com'); // example.com
// SLD
echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
echo getDomain('https://bbc.co.uk'); // bbc.co.uk
// IP
echo getDomain('https://1.2.3.45'); // 1.2.3.45
Finally, Jeremy Kendall's PHP Domain Parser allows you to parse the domain name from a url. League URI Hostname Parser will also do the job.
If you want extract host from string http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, usage of parse_url() is acceptable solution for you.
But if you want extract domain or its parts, you need package that using Public Suffix List. Yes, you can use string functions arround parse_url(), but it will produce incorrect results sometimes.
I recomend TLDExtract for domain parsing, here is sample code that show diff:
$extract = new LayerShifter\TLDExtract\Extract();
# For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return google.com
$result = $extract->parse($url);
$result->getFullHost(); // will return 'google.com'
$result->getRegistrableDomain(); // will return 'google.com'
$result->getSuffix(); // will return 'com'
# For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'
$url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
parse_url($url, PHP_URL_HOST); // will return 'search.google.com'
$result = $extract->parse($url);
$result->getFullHost(); // will return 'search.google.com'
$result->getRegistrableDomain(); // will return 'google.com'
You can pass PHP_URL_HOST into parse_url function as second parameter
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$host = parse_url($url, PHP_URL_HOST);
print $host; // prints 'google.com'
Here is the code i made that 100% finds only the domain name, since it takes mozilla sub tlds to account. Only thing you have to check is how you make cache of that file, so you dont query mozilla every time.
For some strange reason, domains like co.uk are not in the list, so you have to make some hacking and add them manually. Its not cleanest solution but i hope it helps someone.
//=====================================================
static function domain($url)
{
$slds = "";
$url = strtolower($url);
$address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
if(!$subtlds = #kohana::cache('subtlds', null, 60))
{
$content = file($address);
foreach($content as $num => $line)
{
$line = trim($line);
if($line == '') continue;
if(#substr($line[0], 0, 2) == '/') continue;
$line = #preg_replace("/[^a-zA-Z0-9\.]/", '', $line);
if($line == '') continue; //$line = '.'.$line;
if(#$line[0] == '.') $line = substr($line, 1);
if(!strstr($line, '.')) continue;
$subtlds[] = $line;
//echo "{$num}: '{$line}'"; echo "<br>";
}
$subtlds = array_merge(Array(
'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk',
'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au',
),$subtlds);
$subtlds = array_unique($subtlds);
//echo var_dump($subtlds);
#kohana::cache('subtlds', $subtlds);
}
preg_match('/^(http:[\/]{2,})?([^\/]+)/i', $url, $matches);
//preg_match("/^(http:\/\/|https:\/\/|)[a-zA-Z-]([^\/]+)/i", $url, $matches);
$host = #$matches[2];
//echo var_dump($matches);
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
foreach($subtlds as $sub)
{
if (preg_match("/{$sub}$/", $host, $xyz))
preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
}
return #$matches[0];
}
I'm adding this answer late since this is the answer that pops up most on Google...
You can use PHP to...
$url = "www.google.co.uk";
$host = parse_url($url, PHP_URL_HOST);
// $host == "www.google.co.uk"
to grab the host but not the private domain to which the host refers. (Example www.google.co.uk is the host, but google.co.uk is the private domain)
To grab the private domain, you must need know the list of public suffixes to which one can register a private domain. This list happens to be curated by Mozilla at https://publicsuffix.org/
The below code works when an array of public suffixes has been created already. Simply call
$domain = get_private_domain("www.google.co.uk");
with the remaining code...
// find some way to parse the above list of public suffix
// then add them to a PHP array
$suffix = [... all valid public suffix ...];
function get_public_suffix($host) {
$parts = split("\.", $host);
while (count($parts) > 0) {
if (is_public_suffix(join(".", $parts)))
return join(".", $parts);
array_shift($parts);
}
return false;
}
function is_public_suffix($host) {
global $suffix;
return isset($suffix[$host]);
}
function get_private_domain($host) {
$public = get_public_suffix($host);
$public_parts = split("\.", $public);
$all_parts = split("\.", $host);
$private = [];
for ($x = 0; $x < count($public_parts); ++$x)
$private[] = array_pop($all_parts);
if (count($all_parts) > 0)
$private[] = array_pop($all_parts);
return join(".", array_reverse($private));
}
I've found that #philfreo's solution (referenced from php.net) is pretty well to get fine result but in some cases it shows php's "notice" and "Strict Standards" message. Here a fixed version of this code.
function getHost($url) {
$parseUrl = parse_url(trim($url));
if(isset($parseUrl['host']))
{
$host = $parseUrl['host'];
}
else
{
$path = explode('/', $parseUrl['path']);
$host = $path[0];
}
return trim($host);
}
echo getHost("http://example.com/anything.html"); // example.com
echo getHost("http://www.example.net/directory/post.php"); // www.example.net
echo getHost("https://example.co.uk"); // example.co.uk
echo getHost("www.example.net"); // example.net
echo getHost("subdomain.example.net/anything"); // subdomain.example.net
echo getHost("example.net"); // example.net
function getTrimmedUrl($link)
{
$str = str_replace(["www.","https://","http://"],[''],$link);
$link = explode("/",$str);
return strtolower($link[0]);
}
$domain = parse_url($url, PHP_URL_HOST);
echo implode('.', array_slice(explode('.', $domain), -2, 2))
parse_url didn't work for me. It only returned the path. Switching to basics using php5.3+:
$url = str_replace('http://', '', strtolower( $s->website));
if (strpos($url, '/')) $url = strstr($url, '/', true);
I have edited for you:
function getHost($Address) {
$parseUrl = parse_url(trim($Address));
$host = trim($parseUrl['host'] ? $parseUrl['host'] : array_shift(explode('/', $parseUrl['path'], 2)));
$parts = explode( '.', $host );
$num_parts = count($parts);
if ($parts[0] == "www") {
for ($i=1; $i < $num_parts; $i++) {
$h .= $parts[$i] . '.';
}
}else {
for ($i=0; $i < $num_parts; $i++) {
$h .= $parts[$i] . '.';
}
}
return substr($h,0,-1);
}
All type url (www.domain.ltd, sub1.subn.domain.ltd will result to : domain.ltd.
None of this solutions worked for me when I use this test cases:
public function getTestCases(): array
{
return [
//input expected
['http://google.com/dhasjkdas', 'google.com'],
['https://google.com/dhasjkdas', 'google.com'],
['https://www.google.com/dhasjkdas', 'google.com'],
['http://www.google.com/dhasjkdas', 'google.com'],
['www.google.com/dhasjkdas', 'google.com'],
['google.com/dhasjkdas', 'google.com'],
];
}
but wrapping this answer into function worked in all cases: https://stackoverflow.com/a/65659814/5884988
This will generally work very well if the input URL is not total junk. It removes the subdomain.
$host = parse_url( $Row->url, PHP_URL_HOST );
$parts = explode( '.', $host );
$parts = array_reverse( $parts );
$domain = $parts[1].'.'.$parts[0];
Example
Input: http://www2.website.com:8080/some/file/structure?some=parameters
Output: website.com
Combining the answers of worldofjr and Alix Axel into one small function that will handle most use-cases:
function get_url_hostname($url) {
$parse = parse_url($url);
return str_ireplace('www.', '', $parse['host']);
}
get_url_hostname('http://www.google.com/example/path/file.html'); // google.com
Just use as like following ...
<?php
echo $_SERVER['SERVER_NAME'];
?>

How to get word between particular symbols in string

My source string could be:
example.com or http://example.com or www.example.com or https://example.com or http://www.example.com or https://www.example.com
or
example.abc.com or http://example.abc.com or www.example.abc.com or https://example.abc.com or http://www.example.abc.com or https://www.example.abc.com
I want the result: example
How can we do this using php string functions? or in other way?
Try this
$str = 'http://example.abc.com';
$last = explode("/", $str, 3);
$ans = explode('.',$last[2]);
echo $ans[0];
You can use parse_url
<?php
// Real full current URL, this can be useful for a lot of things
$url = 'http'.((isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 's' : '').'://'.$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
// Or you can put another url
$url = 'https://www.example.foo.biz/';
// Get the host name
$hostName = parse_url($url, PHP_URL_HOST);
// Get the first part of the host name
$host = substr($hostName, 0, strpos($hostName, '.'));
print_r($url);
print_r($hostName);
// Here is what you want
print_r($host);
?>
you can use strpos:
<?php
$url = "http://www.example.com";
/* Use any of you want.
$url = "https://example.com";
$url = "https://www.example.abc.com";
$url = "https://www.www.example.com"; */
if ($found = strpos($url,'example') !== false) {
echo "it exists";
}
?>
EDIT:
So this is what I cam up with now, using explode and substr:
$url = "http://www.example.com";
/* Use any of you want.
$url = "https://example.com";
$url = "https://www.example.abc.com";
$url = "https://www.www.example.com"; */
$exp ='example';
if ($found = strpos($url, $exp) !== false) {
echo $str = substr($url, strpos($url, $exp));
echo "<br>". "it exists" . "<br>";
$finalword = explode(".", $str);
var_dump($finalword);
}
?>

How to remove path after domain name from string

I have the following code :
function removeFilename($url)
{
$file_info = pathinfo($url);
return isset($file_info['extension'])
? str_replace($file_info['filename'] . "." . $file_info['extension'], "", $url)
: $url;
}
$url1 = "http://website.com/folder/filename.php";
$url2 = "http://website.com/folder/";
$url3 = "http://website.com/";
echo removeFilename($url1); //outputs http://website.com/folder/
echo removeFilename($url2);//outputs http://website.com/folder/
echo removeFilename($url3);//outputs http:///
Now my problem is that when there is only only a domain without folders or filenames my function removes website.com too.
My idea is there is any way on php to tell my function to do the work only after third slash or any other solutions you think useful.
UPDATED : ( working and tested )
<?php
function removeFilename($url)
{
$parse_file = parse_url($url);
$file_info = pathinfo($parse_file['path']);
return isset($file_info['extension'])
? str_replace($file_info['filename'] . "." . $file_info['extension'], "", $url)
: $url;
}
$url1 = "http://website.com/folder/filename.com";
$url2 = "http://website.org/folder/";
$url3 = "http://website.com/";
echo removeFilename($url1); echo '<br/>';
echo removeFilename($url2); echo '<br/>';
echo removeFilename($url3);
?>
Output:
http://website.com/folder/
http://website.org/folder/
http://website.com/
Sounds like you are wanting to replace a substring and not the whole thing. This function might help you:
http://php.net/manual/en/function.substr-replace.php
Since filename is at last slash you can use substr and str_replace to remove file name from path.
$PATH = "http://website.com/folder/filename.php";
$file = substr( strrchr( $PATH, "/" ), 1) ;
echo $dir = str_replace( $file, '', $PATH ) ;
OUTPUT
http://website.com/folder/
pathinfo cant recognize only domain and file name. But if without filename url is ended by slash
$a = array(
"http://website.com/folder/filename.php",
"http://website.com/folder/",
"http://website.com",
);
foreach ($a as $item) {
$item = explode('/', $item);
if (count($item) > 3)
$item[count($item)-1] ='';;
echo implode('/', $item) . "\n";
}
result
http://website.com/folder/
http://website.com/folder/
http://website.com
Close to the answer of splash58
function getPath($url) {
$item = explode('/', $url);
if (count($item) > 3) {
if (strpos($item[count($item) - 1], ".") === false) {
return $url;
}
$item[count($item)-1] ='';
return implode('/', $item);
}
return $url;
}

How to remove http://, www and slash from URL in PHP?

I need a php function which produce a pure domain name from URL. So this function must be remove http://, www and /(slash) parts from URL if these parts exists. Here is example input and outputs:
Input - > http://www.google.com/ | Output -> google.com
Input - > http://google.com/ | Output -> google.com
Input - > www.google.com/ | Output -> google.com
Input - > google.com/ | Output -> google.com
Input - > google.com | Output -> google.com
I checked parse_url function, but doesn't return what I need.
Since, I'm beginner in PHP, it was difficult for me. If you have any idea, please answer.
Thanx in advance.
$input = 'www.google.co.uk/';
// in case scheme relative URI is passed, e.g., //www.google.com/
$input = trim($input, '/');
// If scheme not included, prepend it
if (!preg_match('#^http(s)?://#', $input)) {
$input = 'http://' . $input;
}
$urlParts = parse_url($input);
// remove www
$domain = preg_replace('/^www\./', '', $urlParts['host']);
echo $domain;
// output: google.co.uk
Works correctly with all your example inputs.
$str = 'http://www.google.com/';
$str = preg_replace('#^https?://#', '', rtrim($str,'/'));
echo $str; // www.google.com
There are lots of ways grab the domain out of a url I've posted 4 ways below starting from the shortest to the longest.
#1
function urlToDomain($url) {
return implode(array_slice(explode('/', preg_replace('/https?:\/\/(www\.)?/', '', $url)), 0, 1));
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');
#2
function urlToDomain($url) {
$domain = explode('/', preg_replace('/https?:\/\/(www\.)?/', '', $url));
return $domain['0'];
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');
#3
function urlToDomain($url) {
$domain = preg_replace('/https?:\/\/(www\.)?/', '', $url);
if ( strpos($domain, '/') !== false ) {
$explode = explode('/', $domain);
$domain = $explode['0'];
}
return $domain;
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');
#4
function urlToDomain($url) {
if ( substr($url, 0, 8) == 'https://' ) {
$url = substr($url, 8);
}
if ( substr($url, 0, 7) == 'http://' ) {
$url = substr($url, 7);
}
if ( substr($url, 0, 4) == 'www.' ) {
$url = substr($url, 4);
}
if ( strpos($url, '/') !== false ) {
$explode = explode('/', $url);
$url = $explode['0'];
}
return $url;
}
echo urlToDomain('http://www.example.com/directory/index.php?query=true');
All of the functions above return the same response: example.com
Try this, it will remove what you wanted (http:://, www and trailing slash) but will retain other subdomains such as example.google.com
$host = parse_url('http://www.google.com', PHP_URL_HOST);
$host = preg_replace('/^(www\.)/i', '', $host);
Or as a one-liner:
$host = preg_replace('/^(www\.)/i', '', parse_url('http://www.google.com', PHP_URL_HOST));
if (!preg_match('/^http(s)?:\/\//', $url))
$url = 'http://' . $url;
$host = parse_url($url, PHP_URL_HOST);
$host = explode('.', strrev($host));
$host = strrev($host[1]) . '.' strrev($host[0]);
This would return second level domain, though it would be useless for say .co.uk domains, so you might want to do some more checking, and include additional parts if strrev($host[0]) is uk, au, etc.
$value = 'https://google.ca';
$result = str_ireplace('www.', '', parse_url($value, PHP_URL_HOST));
// google.ca
First way is to use one regular expression to trim unnecesary parts of URL like protocol, www and ending slash
function trimUrlProtocol($url) {
return preg_replace('/((^https?:\/\/)?(www\.)?)|(\/$)/', '', trim($url));
}
echo trimUrlProtocol('http://sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('https://sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('http://www.sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('https://www.sandbox.onlinephpfunctions.com/') . PHP_EOL;
echo trimUrlProtocol('http://sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('https://sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('http://www.sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('https://www.sandbox.onlinephpfunctions.com') . PHP_EOL;
echo trimUrlProtocol('sandbox.onlinephpfunctions.com') . PHP_EOL;
By alternative way you can use parse_url, but you have to make additional cheks to check if host part exists and then use regular expression to trim www. Just use first way, it is simple and lazy.
This will account for "http/https", "www" and the ending slash
$str = 'https://www.google.com/';
$str = preg_replace('#(^https?:\/\/(w{3}\.)?)|(\/$)#', '', $str);
echo $str; // google.com
Just ask if you need help understanding the regex.
Use parse_url
http://www.php.net/manual/en/function.parse-url.php

Categories