Get subdomain if any - php

Is there any predefined method in PHP to get sub-domain from url if any?
url pattern may be:
http://www.sd.domain.com
http://domain.com
http://sd.domain.com
http://domain.com
where sd stands for the sub-domain.
Now method must return different values for every case:
case 1 -> return sd
case 2 -> return false or empty
case 3 -> return sd
case 4 -> return false or empty
I found some good links:
PHP function to get the subdomain of a URL
Get subdomain from url?
but they do not apply specifically to my cases.
Any help would be most appreciated.
Thanks

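Okay, here is a script I wrote :)
// Take the first label of the current host as the subdomain.
// Note: for a bare host such as domain.com this yields 'domain'.
$url = $_SERVER['HTTP_HOST'];
$host = explode('.', $url);
if( !empty($host[0]) && $host[0] != 'www' && $host[0] != 'localhost' ){
$domain = $host[0];
}else{
// Fall back to a default when there is no usable first label.
$domain = 'home';
}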

So, there are several possibilities...
First, regular expressions of course:
(http://)?(www\.)?([^\.]*?)\.?([^\.]+)\.([^\.]+)
The entry in the third parenthesis will be your subdomain. Of course, if your url would be https:// or www2 (seen it all...) the regex would break. So this is just a first draft to start working with.
My second idea is, just like yours, exploding the URL. I thought of something like this:
function getSubdomain($url) {
$parts = explode('.', str_replace('http://', '', $url));
if(count($parts) >= 3) {
return $parts[count($parts) - 3];
}
return null;
}
The idea behind this function is that if a URL is split on ".", the subdomain will almost always be the third-to-last entry in the resulting array. The protocol has to be stripped first (see case 3). This can certainly be done more elegantly.
I hope this gives you some ideas.
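To tie the explode idea back to the four cases in the question, here is a minimal sketch. The function name getSubdomainFromUrl is made up for illustration, and it assumes the main domain is always the last two labels (so it will be wrong for suffixes such as .co.uk).

```php
<?php
// Sketch only: assumes the main domain is always the last two labels
// (true for domain.com, not for suffixes such as co.uk).
function getSubdomainFromUrl(string $url)
{
    // parse_url() only fills in the host when a scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    $labels = explode('.', strtolower($host));
    // Drop a leading "www" so it is never reported as the subdomain.
    if ($labels[0] === 'www') {
        array_shift($labels);
    }
    // Everything left of the last two labels is the subdomain.
    $sub = array_slice($labels, 0, -2);
    return $sub ? implode('.', $sub) : false;
}
```

With this, http://www.sd.domain.com and http://sd.domain.com should both give sd, while http://domain.com gives false.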

Try this.
[update] We have a constant _SITE_ADDRESS defined, such as www.mysite.com; you could use a literal instead.
It works well in our system for what seems like that exact purpose.
public static function getSubDomain()
{
if($_SERVER["SERVER_NAME"] == str_ireplace('http://','',_SITE_ADDRESS)) return ''; //base domain
$host = str_ireplace(array("www.", _SITE_ADDRESS), "", strtolower(trim($_SERVER["HTTP_HOST"])));
$sub = preg_replace('/\..*/', '', $host);
if($sub == $host) return ''; //this is likely an ip address
return $sub;
}
There is a note on that function crediting an external source but no link, so apologies to any original developer whose code this is based on.

Related

Modify function getdomain, would need it without subdomain in PHP

I would need help with my code:
I have a function which only replaces the www. with an empty string.
For example:
If I add the url: www.testek.com
The user will see testek.com
But if I add the url: s.dada.testek.com
The user will see s.dada.testek.com
So for the domain s.dada.testek.com I would like the end user to see only testek.com, that is, only the main domain without any subdomains.
Code:
function getdomain($url){
$parsed = parse_url($url);
return str_replace('www.','', strtolower($parsed['host']));
}
I saw a post but it won't work for me.
Thanks for the help!
Now I've changed the code to:
function getdomain($url){
$parsed = parse_url($url);
$bits = explode(".",$parsed["host"]);
$mainDomain = array_filter($bits, function ($i) use ($bits) {
return $i >= count($bits)-2;
}, array(
'www.rover.ebay.com' => 'ebay.com',
's.click.aliexpress.com' => 'aliexpress.com', );
return implode(".", $mainDomain);
}
Am I thinking the right way?
Because now the end user sees like this:
http://i.stack.imgur.com/JddKB.jpg
If you simply want the last two segments of the URL's host name, you can do the following:
function getdomain($url){
$parsed = parse_url($url);
$bits = explode(".",$parsed["host"]);
$mainDomain = array_filter($bits, function ($i) use ($bits) {
return $i >= count($bits)-2;
}, ARRAY_FILTER_USE_KEY );
return implode(".", $mainDomain);
}
See how it works in https://eval.in/636860
Unfortunately, most of the time there's no "catch all" solution and you have to hard-code a lot of things, e.g. the UK has .co.uk but France just .fr, so depending on that you may need the last 3 or even 4 segments.
I've fixed it like this:
function getdomain($url){
$parsed = parse_url($url);
$replace = array ("rover.", "www.", "s.click.");
return str_replace($replace,'', strtolower($parsed['host']));
}
I've created an array with the "subdomains" which I don't want to be shown.
And now it works ok.
apokryfos thanks for your support and for opening my mind :)
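One thing to watch with the str_replace version: it removes the listed strings anywhere in the host, so a host like landrover.com would lose its "rover." part. A rough sketch of a variant that only strips them from the front (stripLeadingLabels is a hypothetical helper name, and the prefix list is the one from the post above):

```php
<?php
// Hypothetical helper: removes unwanted labels only from the front of the host.
function stripLeadingLabels(string $host, array $prefixes): string
{
    $host = strtolower($host);
    do {
        $changed = false;
        foreach ($prefixes as $prefix) {
            // strpos() === 0 means the host starts with this prefix.
            if (strpos($host, $prefix) === 0) {
                $host = substr($host, strlen($prefix));
                $changed = true;
            }
        }
    } while ($changed); // repeat so stacked prefixes like www.rover. are all removed
    return $host;
}

$prefixes = array('www.', 'rover.', 's.click.');
echo stripLeadingLabels('www.rover.ebay.com', $prefixes), "\n";     // ebay.com
echo stripLeadingLabels('s.click.aliexpress.com', $prefixes), "\n"; // aliexpress.com
echo stripLeadingLabels('landrover.com', $prefixes), "\n";          // landrover.com
```

It is still a hard-coded list, so it only covers the subdomains you know about in advance.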

regex to create link from url and strip www

I have a PHP function which takes a passed url and creates a clean link. It puts the full link in the anchor tags and presents just "www.domain.com" from the url. It works well but I would like to modify it so it strips out the "www." part as well.
<?php
// pass a url like: http://www.yelp.com/biz/my-business-name
// should return: yelp.com
function formatURL($url, $target=FALSE) {
if ($target) { $anchor_tag = "\\4"; }
else { $anchor_tag = "\\4"; }
$return_link = preg_replace("`(http|ftp)+(s)?:(//)((\w|\.|\-|_)+)(/)?(\S+)?`i", $anchor_tag, $url);
return $return_link;
}
?>
My regex skills are not that strong, so any help is greatly appreciated.
Take a look at parse_url: http://us2.php.net/manual/en/function.parse-url.php
This will simplify your logic quite a bit and can make replacing the www. a simple string replace.
$link = 'http://www.yelp.com/biz/my-business-name';
$hostname = parse_url($link, PHP_URL_HOST);
if(strpos($hostname, 'www.') === 0)
{
$hostname = substr($hostname, 4);
}
I have modified my original answer to account for the issue in the comments. The preg_replace in the post below will also work and is a bit more concise. I will leave this here to show an alternative that does not require invoking the regex engine, if you prefer that.
This will get you the domain name minus the www:
$url = preg_replace('/^www\./', '', parse_url($url, PHP_URL_HOST));
^ in the regex means only remove www from the start of the string
Working example : http://codepad.org/FTNikw8g
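Putting the two answers together, a rough sketch of the whole link formatter could look like this. formatUrlAsLink is a hypothetical name, the $target option from the question is left out, and the escaping with htmlspecialchars is an addition that the original function did not have.

```php
<?php
// Hypothetical replacement for formatURL(): the link text is the host without "www.".
function formatUrlAsLink(string $url): string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        // No host found (e.g. the URL has no scheme): return the escaped text unlinked.
        return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    }
    // Only strip "www." at the very start of the host.
    $label = preg_replace('/^www\./i', '', $host);
    return sprintf(
        '<a href="%s">%s</a>',
        htmlspecialchars($url, ENT_QUOTES, 'UTF-8'),
        htmlspecialchars($label, ENT_QUOTES, 'UTF-8')
    );
}

echo formatUrlAsLink('http://www.yelp.com/biz/my-business-name'), "\n";
// <a href="http://www.yelp.com/biz/my-business-name">yelp.com</a>
```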

PHP router without regular expressions

I have been working on a fancy router/dispatcher class for weeks now, trying to decide how I wanted it. I got it perfect IMO, except that performance is not what I want from it. It uses a route map array, /forums/viewthread/:id/:page => 'forums/viewthread/(?\d+)', and loops through the map with regex to find a match. I am trying to get something better for a high-traffic site; here is a start...
$uri = "forum/viewforum/id-522/page-3";
$parts = explode("/", $uri);
$controller = $parts['0'];
$method = $parts['1'];
if($parts['2'] != ''){
$idNumber = $parts['2'];
}
if($parts['3'] != ''){
$pageNumber = $parts['3'];
}
Where I need help: sometimes an id and a page will not be present, sometimes only one or the other, and sometimes both. Obviously my code above does not cover that; it assumes array item 2 is always the id and item 3 is always the page. Could someone show me a practical way of matching the page and id to variables only if they exist in the URI, without using regular expressions?
You can see what I have so far on my regular expressions versions in this question Is this a good way to match URI to class/method in PHP for MVC
This seems more extendable:
$parts = explode("/", $uri);
$parts_count=count($parts);
//set default values
$page_info=array('id'=>0,'page'=>0);
for($i=2;$i<$parts_count;$i++) {
if(strpos($parts[$i],'-')!==FALSE) {
list($info_type,$info_val)=explode('-',$parts[$i],2);
if(isset($page_info[$info_type])) {
$page_info[$info_type]=(int)$info_val;
}
}
}
Then just use the $page_info values. You can easily add other keys this way, as well as more levels of '/'.
if ( ! empty($parts['2']))
{
if (strpos($parts['2'], 'id-') !== FALSE)
{
$idNumber = str_replace('id-', '', $parts['2']);
}
elseif (strpos($parts['2'], 'page-') !== FALSE)
{
$pageNumber = str_replace('page-', '', $parts['2']);
}
}
And do the same for $parts['3'].
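For what it's worth, the key-value idea from the answers above can be wrapped into one function so the controller, method and optional parameters come back together. This is only a sketch: parseRoute and the shape of the returned array are assumptions, not something from the original router.

```php
<?php
// Hypothetical helper: splits "controller/method/key-value/..." without regex.
function parseRoute(string $uri, array $defaults = array('id' => 0, 'page' => 0)): array
{
    $parts = explode('/', trim($uri, '/'));
    $route = array(
        'controller' => isset($parts[0]) && $parts[0] !== '' ? $parts[0] : null,
        'method'     => isset($parts[1]) ? $parts[1] : null,
        'params'     => $defaults,
    );
    // Segments after the method are optional and may come in any order.
    foreach (array_slice($parts, 2) as $segment) {
        if (strpos($segment, '-') === false) {
            continue; // not a key-value segment
        }
        list($key, $value) = explode('-', $segment, 2);
        // Only accept keys that have a default, so unknown keys are ignored.
        if (array_key_exists($key, $route['params'])) {
            $route['params'][$key] = (int) $value;
        }
    }
    return $route;
}

$route = parseRoute('forum/viewforum/id-522/page-3');
echo $route['controller'], ' ', $route['method'], ' ',
     $route['params']['id'], ' ', $route['params']['page'], "\n";
// forum viewforum 522 3
```

If only page-3 is present, id simply keeps its default of 0.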

clean the url in php

I am trying to make a user submit link box. I've been trying all day and can't seem to get it working.
The goal is to turn all of these into example.com... (i.e. remove everything before the domain name)
Input is $url =
There are 4 types of URL:
www.example.com...
example.com...
http://www.example.com...
http://example.com...
Everything I make works on 1 or 2 types, but not all 4.
How can I do this?
You can use parse_url for that. For example:
function parse($url) {
$parts = parse_url($url);
if ($parts === false) {
return false;
}
return isset($parts['scheme'])
? $parts['host']
: substr($parts['path'], 0, strcspn($parts['path'], '/'));
}
This will leave the "www." part if it already exists, but it's trivial to cut that out with e.g. str_replace. If the url you give it is seriously malformed, it will return false.
Update (an improved solution):
I realized that the above would not work correctly if you try to trick it hard enough. So instead of whipping myself trying to compensate if it does not have a scheme, I realized that this would be better:
function parse($url) {
$parts = parse_url($url);
if ($parts === false) {
return false;
}
if (!isset($parts['scheme'])) {
$parts = parse_url('http://'.$url);
}
if ($parts === false) {
return false;
}
return $parts['host'];
}
Your input can be
www.example.com
example.com
http://www.example.com
http://example.com
$url_arr = parse_url($url);
echo $url_arr['host'];
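The output is the host, e.g. www.example.com or example.com. Note that parse_url only sets 'host' when the URL includes a scheme such as http://.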
There are a few steps you can take to get a clean URL.
First, you need to make sure there is a protocol so that parse_url works correctly:
// Make sure it has a protocol
if(substr($url,0,7) != 'http://' && substr($url,0,8) != 'https://')
{
$url = 'http://' . $url;
}
Now we run it through parse_url()
$segments = parse_url($url);
But this is where it gets complicated. A domain name can have any number of levels, so you cannot work out the registered domain from the URL alone; you need a precompiled list of TLDs to check against the end of the host name, so that you can strip the TLD and be left with the website's domain.
There is a list available here : http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1
You would be better off parsing this list into MySQL and then selecting the rows where the TLD matches the right-hand end of the domain string.
Then order by length, longest first, and limit to 1. If a match is found, you can do something like:
$db_found_tld = 'co.uk';
$domain = 'a.b.c.domain.co.uk';
$domain_name = substr($domain, 0, -strlen($db_found_tld) - 1); // also drops the dot before the TLD
This leaves a.b.c.domain, so the TLD has been removed. The domain name can now be extracted like so:
$parts = explode('.', $domain_name);
$base_domain = $parts[count($parts) - 1];
Now you have domain.
This seems very lengthy, but I hope it shows that it is not easy to get just the domain name without the TLD or subdomains.
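The same steps can be sketched without a database by passing the suffixes in as an array. Everything here is an assumption for illustration: getBaseDomain is a made-up name, the suffix array in the usage lines is a tiny hand-picked sample rather than the real list linked above, and unlike the steps above it returns the domain together with its suffix.

```php
<?php
// Sketch: $suffixes stands in for the suffix table; longest match wins.
function getBaseDomain(string $host, array $suffixes)
{
    $host = strtolower(rtrim($host, '.'));
    // Try longer suffixes first so "co.uk" is preferred over "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (substr($host, -strlen($tail)) === $tail) {
            // Remove the suffix, then keep only the last remaining label.
            $rest = substr($host, 0, -strlen($tail));
            $labels = explode('.', $rest);
            return end($labels) . $tail;
        }
    }
    return false; // suffix not in the list
}

$sample = array('uk', 'co.uk', 'com', 'fr'); // sample only, not the full list
echo getBaseDomain('a.b.c.domain.co.uk', $sample), "\n"; // domain.co.uk
echo getBaseDomain('www.example.fr', $sample), "\n";     // example.fr
```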

URL parse function

Given this variable:
$variable = 'foo.com/bar/foo';
What function would trim $variable to foo.com ?
Edit: I would like the function to be able to trim anything on a URL that could possibly come after the domain name.
Thanks in advance,
John
Working for OP:
$host = parse_url($url, PHP_URL_HOST);
The version of PHP I have to work with doesn't accept two parameters (Zend Engine 1.3.0). Whatever. Here's the working code for me - you do have to have the full URL including the scheme (http://). If you can safely assume that the scheme is http:// (and not https:// or something else), you could just prepend that to get what you need.
Working for me:
$url = 'http://foo.com/bar/foo';
$parts = parse_url($url);
$host = $parts['host'];
echo "The host is $host\n";
I'm using http://www.google.com/asdf in my example
If you're fine with getting the subdomain as well, you could split by "//" and take the element at index 1 to effectively remove the protocol and get www.google.com/asdf
You can then split by "/" and get the 0th element.
That seems ugly. Just brainstorming here =)
Try this:
function getDomain($url)
{
// FILTER_FLAG_HOST_REQUIRED was removed in PHP 8; FILTER_VALIDATE_URL already implies it
if(filter_var($url, FILTER_VALIDATE_URL) === FALSE)
{
return false;
}
/*** get the url parts ***/
$parts = parse_url($url);
/*** return the host domain ***/
return $parts['scheme'].'://'.$parts['host'];
}
$variable = 'foo.com/bar/foo';
// No scheme here, so validation fails, getDomain() returns false and nothing is printed.
echo getDomain($variable);
You can use PHP's parse_url function and then read the "host" key of the result to get the hostname.
