PHP Get Subdomain But Not Actual Domain


I'm currently using the following to get the subdomain of my site
$subdomain = array_shift(explode(".",$_SERVER['HTTP_HOST']));
When I use this for http://www.website.com it returns "www" which is expected
However when I use this with http://website.com it returns "website" as the subdomain. How can I make absolute sure that if there is no subdomain as in that example, it returns NULL?
Thanks!

Please note that, in general, you should first apply parse_url to the incoming data and then use the [host] key from its result. As for your question, you can use something like this:
preg_match('/([^\.]+)\.[^\.]+\.[^\.]+$/', 'www.domain.com', $rgMatches);
//2-nd level check:
//preg_match('/([^\.]+)\.[^\.]+\.[^\.]+$/', 'domain.com', $rgMatches);
$sDomain = count($rgMatches)?$rgMatches[1]:null;
But I'm not sure that it's exactly what you need (since the url can contain a 4th domain level, etc.)

Do this:
function getSubdomain($domain) {
$expl = explode(".", $domain, -2);
$sub = "";
if(count($expl) > 0) {
foreach($expl as $key => $value) {
$sub = $sub.".".$value;
}
$sub = substr($sub, 1);
}
return $sub;
}
$subdomain = getSubdomain($_SERVER['HTTP_HOST']);
Works fine for me. Basically, you need to use the limit parameter of explode.
Detail and source: php.net - explode manual
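Since the question asks for NULL when there is no subdomain, here is a minimal sketch of the same explode-limit idea that returns null instead of an empty string. The name getSubdomainOrNull is made up for illustration, and the sketch assumes the main domain always has exactly two labels (like website.com), so a host such as website.co.uk would be misreported.

```php
<?php
// Hypothetical helper: returns the subdomain part of a host, or null if there is none.
// Assumption: the main domain is exactly two labels long (e.g. website.com).
function getSubdomainOrNull($host) {
    // drop an optional port such as ":8080" and normalise case
    $host = preg_replace('/:\d+$/', '', strtolower($host));
    // a negative limit makes explode() leave out the last two labels
    $labels = explode('.', $host, -2);
    return count($labels) > 0 ? implode('.', $labels) : null;
}

var_dump(getSubdomainOrNull('www.website.com')); // string(3) "www"
var_dump(getSubdomainOrNull('website.com'));     // NULL
var_dump(getSubdomainOrNull('a.b.website.com')); // string(3) "a.b"
```

Returning null rather than "" lets the caller test the result with a strict `=== null` check.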

If you have another domain, like a .net or .org etc., just change the value accordingly
$site['uri'] = explode(".", str_replace('.com', '', $_SERVER['HTTP_HOST']) );
// more than one part left after removing ".com" means there is a subdomain
if( count($site['uri']) > 1 ) {
$site['subdomain'] = $site['uri'][0];
$site['domain'] = $site['uri'][1];
}
else {
$site['subdomain'] = null;
$site['domain'] = $site['uri'][0];
}
//For testing only:
print_r($site);
...or not (for more flexibility):
$site['uri'] = explode(".", $_SERVER['HTTP_HOST'] );
if( count($site['uri']) > 2 ) {
$site['subdomain'] = $site['uri'][0];
$site['domain'] = $site['uri'][1];
}
else {
$site['subdomain'] = null;
$site['domain'] = $site['uri'][0];
}
//For testing only:
print_r($site);

Related

Get last part of string with multiple preg_match

Spotify has two ways to use url's/identifiers. I want to get the last part of the strings below (the ID)
example url's:
a. https://play.spotify.com/artist/6mdiAmATAx73kdxrNrnlao
b. spotify:artist:6mdiAmATAx73kdxrNrnlao
Can't get the code below to work, so I can add more options to it later as well. I tried basename first, but obviously that doesn't work with ':'.
$str = "https://play.spotify.com/artist/6mdiAmATAx73kdxrNrnlao";
or:
$str = "spotify:artist:6mdiAmATAx73kdxrNrnlao";
if (
preg_match('artist/([a-zA-Z0-9]{22})/', $str, $re) ||
preg_match('artist:([a-zA-Z0-9]{22})/', $str, $re)
) {
$spotifyId = $re[1];
}
Any help is appreciated!

Try this for urls having slashes. If the Spotify string uses colons(:) simply switch the / to a : in the explode() function:
// your url
$url = "https://play.spotify.com/artist/6mdiAmATAx73kdxrNrnlao/blah/blah/blah";
// get path only
$path = parse_url($url)['path'];
// separate by forward slash
$parts = explode('/', $path);
// go through them and find string with 22 characters
$id = '';
foreach ($parts as $key => $value) {
if (strlen($value) === 22 ) {
// found it, now store it
$id = $value;
break;
}
}
A rough sample of a helpful function would be as follows:
function getSpotifyId($spotifyUrl) {
// check for valid url
if (!filter_var($spotifyUrl, FILTER_VALIDATE_URL)) {
// split using colon
$parts = explode(':', parse_url($spotifyUrl)['path']);
} elseif (filter_var($spotifyUrl, FILTER_VALIDATE_URL)) {
// split using forward slash
$parts = explode('/', parse_url($spotifyUrl)['path']);
}
// loop through segments to find id of 22 chars
foreach ($parts as $key => $value) {
// assuming id will always be 22 characters
if (strlen($value) === 22 ) {
// found it, now return it
return $value;
}
}
return false;
}
$id1 = getSpotifyId('http://localhost/xampp/web_development/6mdiAmATAx73kdxrNrnlao/stack.php');
$id2 = getSpotifyId('spotify:artist:6mdiAmATAx73kdxrNrnlao');
$id3 = getSpotifyId('My name is tom');
results:
$id1 = '6mdiAmATAx73kdxrNrnlao'
$id2 = '6mdiAmATAx73kdxrNrnlao'
$id3 = false
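As an alternative that stays closer to the question's preg_match attempt, here is a rough sketch with a single pattern for both forms. The patterns in the question have no delimiters (which preg_match requires) and demand a trailing slash after the id, so they cannot match. The function name extractSpotifyArtistId is invented for this example, and it assumes ids are always exactly 22 alphanumeric characters, as in the question.

```php
<?php
// Hypothetical helper: extracts a 22-character id from either a
// play.spotify.com URL or a spotify:artist:... identifier.
// Returns null when nothing matches.
function extractSpotifyArtistId($str) {
    // "~" is used as the delimiter so "/" needs no escaping;
    // [/:] accepts both separators, and the lookahead rejects longer ids
    if (preg_match('~artist[/:]([a-zA-Z0-9]{22})(?![a-zA-Z0-9])~', $str, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(extractSpotifyArtistId('https://play.spotify.com/artist/6mdiAmATAx73kdxrNrnlao'));
var_dump(extractSpotifyArtistId('spotify:artist:6mdiAmATAx73kdxrNrnlao'));
var_dump(extractSpotifyArtistId('My name is tom'));
```

More identifier types could be supported later by widening the `artist` part into an alternation such as `(?:artist|album)`.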

How to hide part of the ip address and/or host name using `preg_replace()`?

I need to make these strings
192.168.0.1 and some.subdomain.domain.com
become
192.168.0.*** and ***.subdomain.domain.com
with the preg_replace() function in a single statement.
Is it possible?
I've tried
preg_replace('/([0123456789]$)|(^(.)[a-z][^\.])/','***',$a)
with $a as some.subdomain.domain.com and 192.168.0.1
but it seems like something is wrong with my pattern.
(Optionally): and is it possible to mask these parts with asterisks as exact number of masked letters/numbers, e.g. 127.0.0.1 -> ***.0.0.1, 10.4.8.2 -> **.4.8.2, and sub.domain.com - > ***.domain.com, s.domain.com -> *.domain.com?

For the first question, preg_replace can use an array as first parameter. So you could try something like this (cannot test at the moment):
$patterns = [
'/^[a-z]+/i',
'/[0-9]+$/'
];
$newHost = preg_replace($patterns, '***', $host);
For the second question I would try something like this (not tested yet either):
$patterns = [
'/^([a-z]*)([a-z])([a-z]*)/i',
'/([0-9]*)([0-9])([0-9]*)$/'
];
$newHost = preg_replace($patterns, '\1*\3', $host);
I would test tomorrow if you don't.
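For the optional part of the question (as many asterisks as masked characters), a fixed replacement string cannot change its length with the match, so a callback is one way to do it. Below is a minimal sketch using preg_replace_callback; the name maskFirstPart is made up, and it only covers the case from the question's examples where the first dot-separated part is masked.

```php
<?php
// Hypothetical helper: replaces the first dot-separated part of a host name
// or IPv4 address with the same number of asterisks.
function maskFirstPart($host) {
    return preg_replace_callback(
        '/^[^.]+/', // everything up to the first dot
        function ($m) {
            return str_repeat('*', strlen($m[0]));
        },
        $host
    );
}

echo maskFirstPart('127.0.0.1'), "\n";      // ***.0.0.1
echo maskFirstPart('10.4.8.2'), "\n";       // **.4.8.2
echo maskFirstPart('sub.domain.com'), "\n"; // ***.domain.com
echo maskFirstPart('s.domain.com'), "\n";   // *.domain.com
```

Masking the last part instead would only need the pattern changed to `/[^.]+$/`.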

Well, preg_replace() is a little bit slow; there are other ways to do it, like explode(). If you want to hide the last part of the ip, do something like:
<?php
$ip = "192.168.0.1";
$ip_items = explode('.', $ip);
$filtered_ip = ''; //The var to store the filtered ip
foreach($ip_items as $index => $item) {
if($index === count($ip_items) - 1) { //check if its the last part of the IP (by index, so an equal value earlier in the IP is not masked too)
$ip_part = '***';
} else {
$ip_part = $item . '.';
}
$filtered_ip .= $ip_part;
}
echo $filtered_ip;
?>
result: 192.168.0.***
and if you want to filter other parts of the ip like the first one, check for the first item instead of the last one (comparing by index rather than by value, so that an ip with repeated parts such as 1.168.0.1 is handled correctly)
example :
<?php
$ip = "192.168.0.1";
$ip_items = explode('.', $ip);
$filtered_ip = ''; //The var to store the filtered ip
foreach($ip_items as $index => $item) {
if($index === 0) { //check if its the first part of the IP
$ip_part = '***';
} else {
$ip_part = '.' . $item; //the separator goes in front, so there is no trailing dot
}
$filtered_ip .= $ip_part;
}
echo $filtered_ip;
?>
result: ***.168.0.1
EDIT: and for the second question you can use strlen to get the length and str_repeat to repeat the character
example:
<?php
$ip = "192.168.0.1";
$ip_items = explode('.', $ip);
$filtered_ip = ''; //The var to store the filtered ip
foreach($ip_items as $index => $item) {
if($index === count($ip_items) - 1) { //check if its the last part of the IP (by index, so an equal value earlier in the IP is not masked too)
$ip_part = str_repeat("*", strlen($item)) ;
} else {
$ip_part = $item . '.';
}
$filtered_ip .= $ip_part;
}
echo $filtered_ip;
?>

Parse url to get rid of all parameters after the first

From what I understand youtube.com uses three types of urls for their video links.
http://www.youtube.com/watch?v=8uLPtmCroQ8&feature=related
http://www.youtube.com/watch?v=8uLPtmCroQ8
http://youtu.be/8uLPtmCroQ8
I get this url submitted to my site in any one of these different ways and I store the url in a custom field called $video_code. I need to strip it of any parameters that come after the id of the video, so if a user submits the first url above, &feature=related gets stripped. I'm using php.

If I understand your problem correctly, you could use something like this to store the video id in the database and then construct the url as you like.
function getVideoId($url)
{
$parsedUrl = parse_url($url);
if ($parsedUrl === false)
return false;
if (!empty($parsedUrl['query']))
{
$query = array();
parse_str($parsedUrl['query'], $query);
if (!empty($query['v']))
return $query['v'];
}
if (in_array(strtolower($parsedUrl['host']), array('youtu.be', 'www.youtu.be')))
return trim($parsedUrl['path'], '/');
return false;
}
$input = array('http://www.youtube.com/watch?v=8uLPtmCroQ8&feature=related', 'http://www.youtube.com/watch?v=8uLPtmCroQ8', 'http://youtu.be/8uLPtmCroQ8');
foreach ($input as $url)
{
echo getVideoId($url) . PHP_EOL;
}

In which language did you want to do this? If it is in PHP you should look at this.

You could also use a regular expression to split the string. Take a look here: http://www.php.net/manual/en/function.preg-split.php

Use this code:
$arr=array(
'http://www.youtube.com/watch?v=8uLPtmCroQ8&feature=related',
'http://www.youtube.com/watch?v=8uLPtmCroQ8',
'http://youtu.be/8uLPtmCroQ8',
);
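for($i=0; $i<count($arr); $i++){
$urlarr = parse_url($arr[$i]);
if (!empty($urlarr['query'])) {
// parse into an array: calling parse_str() without a result array no longer works in PHP 8,
// and a leftover $v from a previous iteration can no longer leak into this one
parse_str($urlarr['query'], $params);
$qarr = array();
if (!empty($params['v']))
$qarr['v'] = $params['v'];
$urlarr['query'] = http_build_query($qarr);
// note: http_build_url() is provided by the pecl_http extension, not by core PHP
$arr[$i] = http_build_url('', $urlarr);
}
}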
print_r($arr);
OUTPUT:
Array
(
[0] => http://www.youtube.com/watch?v=8uLPtmCroQ8
[1] => http://www.youtube.com/watch?v=8uLPtmCroQ8
[2] => http://youtu.be/8uLPtmCroQ8
)

function getVideoCode($url){
$code_parse = parse_url($url);
if(empty($code_parse["query"])){
// short youtu.be links carry the id in the path
$videoCode = trim($code_parse["path"], "/");
}else{
$videoCode = clearQuery($code_parse["query"]);
}
echo $videoCode;
}
function clearQuery($query){
// keep only the "v" parameter; stripping substrings such as "v" or "related"
// would corrupt ids that happen to contain them
parse_str($query, $params);
return isset($params["v"]) ? $params["v"] : "";
}
It is not professional code, but it's easy to understand. When I call it like this:
getVideoCode("http://youtu.be/8uLPtmCroQ8");
getVideoCode("http://www.youtube.com/watch?v=8uLPtmCroQ8");
getVideoCode("http://www.youtube.com/watch?v=8uLPtmCroQ8&feature=related");
The Output is
8uLPtmCroQ8
8uLPtmCroQ8
8uLPtmCroQ8

PHP: building a URL path

I have a few strings to combine to build a full path. e.g.
$base = "http://foo.com";
$subfolder = "product/data";
$filename = "foo.xml";
// How to do this?
$url = append_url_parts($base, $subfolder, $filename); ???
String concatenation won't do, that would omit the necessary forward slashes.
In Win32 I'd use PathCombine() or PathAppend(), which would handle adding any necessary slashes between strings, without doubling them up. In PHP, what should I use?

Try this:
$base = "http://foo.com";
$subfolder = "product/data";
$filename = "foo.xml";
function stripTrailingSlash(&$component) {
$component = rtrim($component, '/');
}
$array = array($base, $subfolder, $filename);
array_walk_recursive($array, 'stripTrailingSlash');
$url = implode('/', $array);

when it comes down to something like this I like to use a special function with unlimited parameters.
define('BASE_URL','http://mysite.com'); //Without last slash
function build_url()
{
return BASE_URL . '/' . implode('/', func_get_args()); // separator first: the reversed argument order was removed in PHP 8
}
OR
function build_url()
{
$Path = BASE_URL;
foreach(func_get_args() as $path_part)
{
$Path .= '/' . $path_part;
}
return $Path;
}
So that when I use the function I can do
echo build_url('home'); //http://mysite.com/home
echo build_url('public','css','style.css'); //http://mysite.com/public/css/style.css
echo build_url('index.php'); //http://mysite.com/index.php
Hope this helps you; it works really well for me, especially within a framework environment.
To use it with query parameters, you can append them to the url like so, for simplicity.
echo build_url('home') . '?' . http_build_query(array('hello' => 'world'));
Would produce: http://mysite.com/home?hello=world

not sure why you say string concat won't do, because something like this is basically similar to a string concat. (untested semi-pseudo)
function append_url_parts($base, $subf, $file) {
$url = sprintf("%s%s%s", $base, (($subf)? "/$subf": ""), (($file)? "/$file": ""));
return $url;
}
with string concat, we'd have to write a slightly longer block like so:
function append_url_parts($base, $subf, $file) {
$subf = ($subf)? "/$subf": "";
$file = ($file)? "/$file": "";
$url = "$base$subf$file";
return $url;
}

I usually go simple:
<?php
$url = implode('/', array($base, $subfolder, $filename));
Either that or use a framework, and then use whatever route system it has.

There are a few considerations first.
Are you interested in getting the current path of the script or some other path?
How flexible do you need this to be? Is it something that is going to change all the time? Is it something an admin will set once and forget?
You want to be careful not to include the slash bug where your document has a slash added at the end because you were too lazy to figure out how to separate directory vars from the file var. There will only be one file and one base per URL and unknown number of directories in each path, right? :)

If you want to make sure there are no duplicate slashes within the resultant path, I like this little function... simply pass it an array of the path parts you want combined and it will return a formatted path - no need to worry whether any of the parts end with a slash already or not:
function build_url($arr)
{
$url = array();
foreach ( $arr as $path ) $url[] = rtrim ( $path, '/' );
// separator first: the reversed argument order was removed in PHP 8
return implode( '/', $url );
}
This should work on all versions of PHP too.

Not my code, but a handy function which takes an absolute URL and a relative URL and combines the two to make a new absolute path.
The function has been modified to ignore an absolute URL passed as relative (basically anything that includes a scheme).
$url = "http://www.goat.com/money/dave.html";
$rel = "../images/cheese.jpg";
$com = InternetCombineURL($url,$rel);
function InternetCombineUrl($absolute, $relative) { // "public" is only valid inside a class
$p = parse_url($relative);
if(isset($p["scheme"]))return $relative;
extract(parse_url($absolute));
$path = dirname($path);
if($relative[0] == '/') { // the {0} offset syntax was removed in PHP 8
$cparts = array_filter(explode("/", $relative));
}
else {
$aparts = array_filter(explode("/", $path));
$rparts = array_filter(explode("/", $relative));
$cparts = array_merge($aparts, $rparts);
foreach($cparts as $i => $part) {
if($part == '.') {
$cparts[$i] = null;
}
if($part == '..') {
$cparts[$i - 1] = null;
$cparts[$i] = null;
}
}
$cparts = array_filter($cparts);
}
$path = implode("/", $cparts);
$url = "";
if($scheme) {
$url = "$scheme://";
}
if(isset($user)) {
$url .= "$user";
if(isset($pass)) {
$url .= ":$pass";
}
$url .= "@";
}
if($host) {
$url .= "$host/";
}
$url .= $path;
return $url;
}

I wrote this function for all cases to combine url parts with no duplicate slashes.
It accepts many arguments or an array of parts.
Some parts may be empty strings, that does not produce double slashes.
It keeps starting and ending slashes if they are present.
function implodePath($parts)
{
if (!is_array($parts)) {
$parts = func_get_args();
if (count($parts) < 2) {
throw new \RuntimeException('implodePath() should take array as a single argument or more than one argument');
}
} elseif (count($parts) == 0) {
return '';
} elseif (count($parts) == 1) {
return $parts[0];
}
$resParts = [];
$first = array_shift($parts);
if ($first === '/') {
$resParts[] = ''; // It will keep one starting slash
} else {
// It may be empty or have some letters
$first = rtrim($first, '/');
if ($first !== '') {
$resParts[] = $first;
}
}
$last = array_pop($parts);
foreach ($parts as $part) {
$part = trim($part, '/');
if ($part !== '') {
$resParts[] = $part;
}
}
if ($last === '/') {
$resParts[] = ''; // To keep trailing slash
} else {
$last = ltrim($last, '/');
if ($last !== '') {
$resParts[] = $last; // Adding last part if not empty
}
}
return implode('/', $resParts);
}
Here is a check list from unit test. Left array is input and right part is result string.
[['/www/', '/eee/'], '/www/eee/'],
[['/www', 'eee/'], '/www/eee/'],
[['www', 'eee'], 'www/eee'],
[['www', ''], 'www'],
[['www', '/'], 'www/'],
[['/www/', '/aaa/', '/eee/'], '/www/aaa/eee/'],
[['/www', 'aaa/', '/eee/'], '/www/aaa/eee/'],
[['/www/', '/aaa/', 'eee/'], '/www/aaa/eee/'],
[['/www', 'aaa', 'eee/'], '/www/aaa/eee/'],
[['/www/', '/aaa/'], '/www/aaa/'],
[['/www', 'aaa/'], '/www/aaa/'],
[['/www/', 'aaa/'], '/www/aaa/'],
[['/www', '/aaa/'], '/www/aaa/'],
[['/www', '', 'eee/'], '/www/eee/'],
[['www/', '/aaa/', '/eee'], 'www/aaa/eee'],
[['/www/', '/aaa', ''], '/www/aaa'],
[['', 'aaa/', '/eee/'], 'aaa/eee/'],
[['', '', ''], ''],
[['aaa', '', '/'], 'aaa/'],
[['aaa', '/', '/'], 'aaa/'],
[['/', 'www', '/'], '/www/'],
It can be used as implodePath('aaa', 'bbb') or implodePath(['aaa', 'bbb'])

Get domain name (not subdomain) in php

I have a URL which can be any of the following formats:
http://example.com
https://example.com
http://example.com/foo
http://example.com/foo/bar
www.example.com
example.com
foo.example.com
www.foo.example.com
foo.bar.example.com
http://foo.bar.example.com/foo/bar
example.net/foo/bar
Essentially, I need to be able to match any normal URL. How can I extract example.com (or .net, whatever the tld happens to be. I need this to work with any TLD.) from all of these via a single regex?

Well you can use parse_url to get the host:
$info = parse_url($url);
$host = $info['host'];
Then, you can do some fancy stuff to get only the TLD and the Host
$host_names = explode(".", $host);
$bottom_host_name = $host_names[count($host_names)-2] . "." . $host_names[count($host_names)-1];
Not very elegant, but should work.
If you want an explanation, here it goes:
First we grab the host, i.e. everything between the scheme (http://, etc.) and the path, by using parse_url's capabilities to... well.... parse URLs. :)
Then we take the host name, and separate it into an array based on where the periods fall, so test.world.hello.myname would become:
array("test", "world", "hello", "myname");
After that, we take the number of elements in the array (4).
Then, we subtract 2 from it to get the second to last string (the hostname, or example, in your example)
Then, we subtract 1 from it to get the last string (because array keys start at 0), also known as the TLD
Then we combine those two parts with a period, and you have your base host name.

It is not possible to get the domain name without using a TLD list to compare with, as there exist many cases with exactly the same structure and length:
nas.db.de (Subdomain)
bbc.co.uk (Top-Level-Domain)
www.uk.com (Subdomain)
big.uk.com (Second-Level-Domain)
Mozilla's public suffix list should be the best option as it is used by all major browsers:
https://publicsuffix.org/list/public_suffix_list.dat
Feel free to use my function:
function tld_list($cache_dir=null) {
// we use "/tmp" if $cache_dir is not set
$cache_dir = isset($cache_dir) ? $cache_dir : sys_get_temp_dir();
$lock_dir = $cache_dir . '/public_suffix_list_lock/';
$list_dir = $cache_dir . '/public_suffix_list/';
// refresh list every 30 days
if (file_exists($list_dir) && @filemtime($list_dir) + 2592000 > time()) {
return $list_dir;
}
// use exclusive lock to avoid race conditions
if (!file_exists($lock_dir) && @mkdir($lock_dir)) {
// read from source
$list = @fopen('https://publicsuffix.org/list/public_suffix_list.dat', 'r');
if ($list) {
// the list is older than 30 days so delete everything first
if (file_exists($list_dir)) {
foreach (glob($list_dir . '*') as $filename) {
unlink($filename);
}
rmdir($list_dir);
}
// now set list directory with new timestamp
mkdir($list_dir);
// read line-by-line to avoid high memory usage
while (($line = fgets($list)) !== false) {
// skip comments and empty lines (fgets keeps the linebreak, so trim before testing)
$line = trim($line);
if ($line === '' || $line[0] == '/') {
continue;
}
// remove wildcard
if ($line[0] . $line[1] == '*.') {
$line = substr($line, 2);
}
// remove exclamation mark
if ($line[0] == '!') {
$line = substr($line, 1);
}
// reverse TLD and remove linebreak
$line = implode('.', array_reverse(explode('.', (trim($line)))));
// we split the TLD list to reduce memory usage
touch($list_dir . $line);
}
fclose($list);
}
@rmdir($lock_dir);
}
// repair locks (should never happen)
if (file_exists($lock_dir) && mt_rand(0, 100) == 0 && @filemtime($lock_dir) + 86400 < time()) {
@rmdir($lock_dir);
}
return $list_dir;
}
function get_domain($url=null) {
// obtain location of public suffix list
$tld_dir = tld_list();
// no url = our own host
$url = isset($url) ? $url : $_SERVER['SERVER_NAME'];
// add missing scheme ftp:// http:// ftps:// https://
$url = !isset($url[5]) || ($url[3] != ':' && $url[4] != ':' && $url[5] != ':') ? 'http://' . $url : $url;
// remove "/path/file.html", "/:80", etc.
$url = parse_url($url, PHP_URL_HOST);
// replace absolute domain name by relative (http://www.dns-sd.org/TrailingDotsInDomainNames.html)
$url = trim($url, '.');
// check if TLD exists
$url = explode('.', $url);
$parts = array_reverse($url);
foreach ($parts as $key => $part) {
$tld = implode('.', $parts);
if (file_exists($tld_dir . $tld)) {
return !$key ? '' : implode('.', array_slice($url, $key - 1));
}
// remove last part
array_pop($parts);
}
return '';
}
What makes it special:
it accepts every input like URLs, hostnames or domains, with or without scheme
the list is downloaded row-by-row to avoid high memory usage
it creates a new file per TLD in a cache folder, so get_domain() only needs to check through file_exists() whether a TLD exists; it does not need to load a huge database on every request like TLDExtract does.
the list will be automatically updated every 30 days
Test:
$urls = array(
'http://www.example.com',// example.com
'http://subdomain.example.com',// example.com
'http://www.example.uk.com',// example.uk.com
'http://www.example.co.uk',// example.co.uk
'http://www.example.com.ac',// example.com.ac
'http://example.com.ac',// example.com.ac
'http://www.example.accident-prevention.aero',// example.accident-prevention.aero
'http://www.example.sub.ar',// sub.ar
'http://www.congresodelalengua3.ar',// congresodelalengua3.ar
'http://congresodelalengua3.ar',// congresodelalengua3.ar
'http://www.example.pvt.k12.ma.us',// example.pvt.k12.ma.us
'http://www.example.lib.wy.us',// example.lib.wy.us
'com',// empty
'.com',// empty
'http://big.uk.com',// big.uk.com
'uk.com',// empty
'www.uk.com',// www.uk.com
'.uk.com',// empty
'stackoverflow.com',// stackoverflow.com
'.foobarfoo',// empty
'',// empty
false,// empty
' ',// empty
1,// empty
'a',// empty
);
Recent version with explanations (German):
http://www.programmierer-forum.de/domainnamen-ermitteln-t244185.htm

My solution in https://gist.github.com/pocesar/5366899
and the tests are here http://codepad.viper-7.com/GAh1tP
It works with any TLD, and hideous subdomain patterns (up to 3 subdomains).
There's a test included with many domain names.
Won't paste the function here because of the weird indentation for code in StackOverflow (could have fenced code blocks like github)

echo getDomainOnly("http://example.com/foo/bar");
function getDomainOnly($host){
$host = strtolower(trim($host));
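// ltrim($host, "www.") treats its second argument as a character list and would also strip the "w" from "web.com"
$host = preg_replace('/^www\./', '', str_replace("http://","",str_replace("https://","",$host)));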
$count = substr_count($host, '.');
if($count === 2){
if(strlen(explode('.', $host)[1]) > 3) $host = explode('.', $host, 2)[1];
} else if($count > 2){
$host = getDomainOnly(explode('.', $host, 2)[1]);
}
$host = explode('/',$host);
return $host[0];
}

I recommend using TLDExtract library for all operations with domain name.

I think the best way to handle this problem is:
$second_level_domains_regex = '/\.asn\.au$|\.com\.au$|\.net\.au$|\.id\.au$|\.org\.au$|\.edu\.au$|\.gov\.au$|\.csiro\.au$|\.act\.au$|\.nsw\.au$|\.nt\.au$|\.qld\.au$|\.sa\.au$|\.tas\.au$|\.vic\.au$|\.wa\.au$|\.co\.at$|\.or\.at$|\.priv\.at$|\.ac\.at$|\.avocat\.fr$|\.aeroport\.fr$|\.veterinaire\.fr$|\.co\.hu$|\.film\.hu$|\.lakas\.hu$|\.ingatlan\.hu$|\.sport\.hu$|\.hotel\.hu$|\.ac\.nz$|\.co\.nz$|\.geek\.nz$|\.gen\.nz$|\.kiwi\.nz$|\.maori\.nz$|\.net\.nz$|\.org\.nz$|\.school\.nz$|\.cri\.nz$|\.govt\.nz$|\.health\.nz$|\.iwi\.nz$|\.mil\.nz$|\.parliament\.nz$|\.ac\.za$|\.gov\.za$|\.law\.za$|\.mil\.za$|\.nom\.za$|\.school\.za$|\.net\.za$|\.co\.uk$|\.org\.uk$|\.me\.uk$|\.ltd\.uk$|\.plc\.uk$|\.net\.uk$|\.sch\.uk$|\.ac\.uk$|\.gov\.uk$|\.mod\.uk$|\.mil\.uk$|\.nhs\.uk$|\.police\.uk$/';
$domain = $_SERVER['HTTP_HOST'];
$domain = explode('.', $domain);
$domain = array_reverse($domain);
if (preg_match($second_level_domains_regex, $_SERVER['HTTP_HOST'])) {
$domain = "$domain[2].$domain[1].$domain[0]";
} else {
$domain = "$domain[1].$domain[0]";
}

$onlyHostName = implode('.', array_slice(explode('.', parse_url($link, PHP_URL_HOST)), -2));
Using https://subdomain.domain.com/some/path as example
parse_url($link, PHP_URL_HOST) returns subdomain.domain.com
explode('.', parse_url($link, PHP_URL_HOST)) then breaks subdomain.domain.com into an array:
array(3) {
[0]=>
string(9) "subdomain"
[1]=>
string(6) "domain"
[2]=>
string(3) "com"
}
array_slice then slices the array so only the last 2 values are in the array (signified by the -2):
array(2) {
[0]=>
string(6) "domain"
[1]=>
string(3) "com"
}
implode then combines those two array values back together, ultimately giving you the result of domain.com
Note: this will only work when the end domain you're expecting has only one . in it, like something.domain.com or else.something.domain.net
It will not work for something.domain.co.uk where you would expect domain.co.uk
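To illustrate that limitation, here is a rough sketch that keeps three labels when the last two form a known multi-label suffix. The function name registrableDomain and its default list are made up for this example; the list is a tiny hand-picked sample, not a complete one, so real code would still need something like the public suffix list mentioned in the other answers.

```php
<?php
// Hypothetical helper. The default $multiLabelSuffixes is only a small
// illustrative sample, NOT a complete list of such suffixes.
function registrableDomain($host, array $multiLabelSuffixes = array('co.uk', 'org.uk', 'com.au', 'co.nz')) {
    $labels = explode('.', strtolower(trim($host, '.')));
    $lastTwo = implode('.', array_slice($labels, -2));
    // keep three labels when the last two form a known suffix, otherwise two
    $keep = in_array($lastTwo, $multiLabelSuffixes, true) ? 3 : 2;
    if (count($labels) < $keep) {
        return null; // the host is a bare suffix such as "co.uk" or "com"
    }
    return implode('.', array_slice($labels, -$keep));
}

var_dump(registrableDomain('something.domain.co.uk'));   // string(12) "domain.co.uk"
var_dump(registrableDomain('else.something.domain.net')); // string(10) "domain.net"
var_dump(registrableDomain('co.uk'));                     // NULL
```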

There are two ways to extract subdomain from a host:
The first method that is more accurate is to use a database of tlds (like public_suffix_list.dat) and match domain with it. This is a little heavy in some cases. There are some PHP classes for using it like php-domain-parser and TLDExtract.
The second way is not as accurate as the first one, but it is very fast and it can give the correct answer in many cases. I wrote this function for it:
function get_domaininfo($url) {
// regex can be replaced with parse_url
preg_match("/^(https|http|ftp):\/\/(.*?)\//", "$url/" , $matches);
$parts = explode(".", $matches[2]);
$tld = array_pop($parts);
$host = array_pop($parts);
if ( strlen($tld) == 2 && strlen($host) <= 3 ) {
$tld = "$host.$tld";
$host = array_pop($parts);
}
return array(
'protocol' => $matches[1],
'subdomain' => implode(".", $parts),
'domain' => "$host.$tld",
'host'=>$host,'tld'=>$tld
);
}
Example:
print_r(get_domaininfo('http://mysubdomain.domain.co.uk/index.php'));
Returns:
Array
(
[protocol] => http
[subdomain] => mysubdomain
[domain] => domain.co.uk
[host] => domain
[tld] => co.uk
)

Here's a function I wrote to grab the domain without subdomain(s), regardless of whether the domain is using a ccTLD or a new style long TLD, etc... There is no lookup or huge array of known TLDs, and there's no regex. It can be a lot shorter using the ternary operator and nesting, but I expanded it for readability.
// Per Wikipedia: "All ASCII ccTLD identifiers are two letters long,
// and all two-letter top-level domains are ccTLDs."
function topDomainFromURL($url) {
$url_parts = parse_url($url);
$domain_parts = explode('.', $url_parts['host']);
if (strlen(end($domain_parts)) == 2 ) {
// ccTLD here, get last three parts
$top_domain_parts = array_slice($domain_parts, -3);
} else {
$top_domain_parts = array_slice($domain_parts, -2);
}
$top_domain = implode('.', $top_domain_parts);
return $top_domain;
}

function getDomain($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if(preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)){
return $regs['domain'];
}
return FALSE;
}
echo getDomain("http://example.com"); // outputs 'example.com'
echo getDomain("http://www.example.com"); // outputs 'example.com'
echo getDomain("http://mail.example.co.uk"); // outputs 'example.co.uk'

I had problems with the solution provided by pocesar.
When I would use for instance subdomain.domain.nl it would not return domain.nl. Instead it would return subdomain.domain.nl
Another problem was that domain.com.br would return com.br
I am not sure, but I fixed these issues with the following code (I hope it will help someone; if so, I am a happy man):
function get_domain($domain, $debug = false){
$original = $domain = strtolower($domain);
if (filter_var($domain, FILTER_VALIDATE_IP)) {
return $domain;
}
$debug ? print('<strong style="color:green">»</strong> Parsing: '.$original) : false;
$arr = array_slice(array_filter(explode('.', $domain, 4), function($value){
return $value !== 'www';
}), 0); //rebuild array indexes
if (count($arr) > 2){
$count = count($arr);
$_sub = explode('.', $count === 4 ? $arr[3] : $arr[2]);
$debug ? print(" (parts count: {$count})") : false;
if (count($_sub) === 2){ // two level TLD
$removed = array_shift($arr);
if ($count === 4){ // got a subdomain acting as a domain
$removed = array_shift($arr);
}
$debug ? print("<br>\n" . '[*] Two level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
}elseif (count($_sub) === 1){ // one level TLD
$removed = array_shift($arr); //remove the subdomain
if (strlen($arr[0]) === 2 && $count === 3){ // TLD domain must be 2 letters
array_unshift($arr, $removed);
}elseif(strlen($arr[0]) === 3 && $count === 3){
array_unshift($arr, $removed);
}else{
// non country TLD according to IANA
$tlds = array(
'aero',
'arpa',
'asia',
'biz',
'cat',
'com',
'coop',
'edu',
'gov',
'info',
'jobs',
'mil',
'mobi',
'museum',
'name',
'net',
'org',
'post',
'pro',
'tel',
'travel',
'xxx',
);
if (count($arr) > 2 && in_array($_sub[0], $tlds) !== false){ //special TLD don't have a country
array_shift($arr);
}
}
$debug ? print("<br>\n" .'[*] One level TLD: <strong>'.join('.', $_sub).'</strong> ') : false;
}else{ // more than 3 levels, something is wrong
for ($i = count($_sub); $i > 1; $i--){
$removed = array_shift($arr);
}
$debug ? print("<br>\n" . '[*] Three level TLD: <strong>' . join('.', $_sub) . '</strong> ') : false;
}
}elseif (count($arr) === 2){
$arr0 = array_shift($arr);
if (strpos(join('.', $arr), '.') === false && in_array($arr[0], array('localhost','test','invalid')) === false){ // not a reserved domain
$debug ? print("<br>\n" .'Seems invalid domain: <strong>'.join('.', $arr).'</strong> re-adding: <strong>'.$arr0.'</strong> ') : false;
// seems invalid domain, restore it
array_unshift($arr, $arr0);
}
}
$debug ? print("<br>\n".'<strong style="color:gray">«</strong> Done parsing: <span style="color:red">' . $original . '</span> as <span style="color:blue">'. join('.', $arr) ."</span><br>\n") : false;
return join('.', $arr);
}

Here's one that works for all domains, including those with second level domains like "co.uk"
function strip_subdomains($url){
# credits to gavingmiller for maintaining this list
$second_level_domains = file_get_contents("https://raw.githubusercontent.com/gavingmiller/second-level-domains/master/SLDs.csv");
# presume sld first ...
$possible_sld = implode('.', array_slice(explode('.', $url), -2));
# and then verify it
if (strpos($second_level_domains, $possible_sld) !== false){ // strpos() returns 0, which is falsy, for a match at the very start
return implode('.', array_slice(explode('.', $url), -3));
} else {
return implode('.', array_slice(explode('.', $url), -2));
}
}
Looks like there's a duplicate question here: delete-subdomain-from-url-string-if-subdomain-is-found

Very late, I see that you marked regex as a keyword and my function works like a charm, so far I haven't found a url that fails:
function get_domain_regex($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}else{
return false;
}
}
if you want one without regex I have this one, which I am sure I also took from this post
function get_domain($url){
$parseUrl = parse_url($url);
$host = $parseUrl['host'];
$host_array = explode(".", $host);
$domain = $host_array[count($host_array)-2] . "." . $host_array[count($host_array)-1];
return $domain;
}
They both work well, BUT (and this took me a while to realize) if the url doesn't start with http:// or https:// they will fail, so make sure the url string starts with the protocol.
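One way to deal with the missing protocol is to add a default scheme before calling parse_url, which otherwise puts a scheme-less input such as www.example.com/foo into the path key and leaves host unset. The sketch below assumes http:// is an acceptable default; the function name hostFromLooseUrl is invented for illustration.

```php
<?php
// Hypothetical helper: returns the lower-cased host of a URL, prepending
// "http://" when the input has no scheme so that parse_url() can find the host.
function hostFromLooseUrl($url) {
    $url = trim($url);
    // a scheme is a letter followed by letters, digits, "+", "." or "-", then "://"
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url; // assumption: http is a reasonable default
    }
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) ? strtolower($host) : null;
}

var_dump(hostFromLooseUrl('www.example.com/foo'));        // string(15) "www.example.com"
var_dump(hostFromLooseUrl('https://Foo.Example.com/bar')); // string(15) "foo.example.com"
var_dump(hostFromLooseUrl('example.net/foo/bar'));         // string(11) "example.net"
```

The returned host can then be passed to either of the two functions above in place of the parse_url step.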

Simply try this:
preg_match('/(www\.)?([^.]+\.[^.]+)$/', $yourHost, $matches);
// $matches[0] would still include a leading "www.", the domain itself is in group 2
echo "domain name is: {$matches[2]}\n";
This works for the majority of domains.

This function will return the domain name without the extension for any url given, even if you pass a url without the http:// or https://
You can extend this code
(?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?
with more extensions if you want to handle more second level domainnames.
function get_domain_name($url){
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : $url;
$domain = strtolower($domain);
$domain = preg_replace('/.international$/', '.com', $domain);
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,90}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
if (preg_match('/(.*?)((?:\.co)?(?:\.com)?(?:\.gov)?(?:\.net)?(?:\.org)?(?:\.id)?(?:\.asn)?.[a-z]{2,6})$/i', $regs['domain'], $matches)) {
return $matches[1];
}else return $regs['domain'];
}else{
return $url;
}
}

I'm using this to achieve the same target and it always works, I hope it will help others.
$url = 'https://use.fontawesome.com/releases/v5.11.2/css/all.css?ver=2.7.5';
$handle = pathinfo( parse_url( $url )['host'] )['filename'];
$final_handle = substr( $handle , strpos( $handle , '.' ) + 1 );
print_r($final_handle); // fontawesome

Simplest solution
@preg_replace('#\/(.)*#', '', @preg_replace('#^https?://(www.)?#', '', $url))

Simply try this:
<?php
$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
