Extracting part of a domain name

I have a URL, members.exampledomain.com, and I would like to display only exampledomain on my page.
For example, the index page of http://members.exampledomain.com has something like
<img src="members/images/logoexampledomain.png" />

Try using:
print_r($_SERVER);
and see if there is something you like there :)
Otherwise, if the URL always has the same shape, you can use $pieces = explode('.', $url); (with $url holding just the host name, e.g. members.exampledomain.com) and $pieces[1] will contain exampledomain.

If that doesn't work for you, I have previously used JavaScript to pull information out of URLs on pages where PHP would not run. I believe I read the URL from window.location and had a function work out where to split it in order to extract the part I wanted.

If you wish to do this with the domain as suggested above, the following code should work for you.
$full_domain = 'members.example.com';
$parts = explode('.', $full_domain);
$parts = array_slice($parts, -2); // keep only the last two labels; array_slice() returns a new array
$domain = implode('.', $parts);
echo $domain;
This is a basic string parser for the domain. Things get much more complex when you are using multiple domains on the same script (aka Virtual hosting), and those domains have varying extensions such as .com.au .
If you wish to get the 'members' portion of the domain, you can add the following code after the above code:
$member = rtrim(str_replace($domain, '', $full_domain), '.');
echo $member;
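
As a minimal sketch of the explode() idea applied to the original question: the helper name second_level_label() is hypothetical, and it assumes a single-label extension such as .com (it would give the wrong answer for hosts ending in .com.au or .co.uk).

```php
<?php
// Hypothetical helper: return the label just before the extension,
// e.g. "exampledomain" for "members.exampledomain.com".
// Assumption: the extension is a single label such as "com".
function second_level_label(string $host): ?string
{
    // $_SERVER['HTTP_HOST'] may carry a port, so drop a trailing ":8080".
    $host = preg_replace('/:\d+$/', '', strtolower($host));
    $labels = explode('.', $host);

    // A host with fewer than two labels has no such part.
    return count($labels) >= 2 ? $labels[count($labels) - 2] : null;
}

echo second_level_label('members.exampledomain.com'), "\n";
```

On a live page the argument would typically be $_SERVER['HTTP_HOST'].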

Related

Issue with & in a string submitted with $_GET

I'm building an "away" page for my website: when a user posts a link to another website, every visitor who clicks that link is first redirected to away.php, which tells them that I am not responsible for the content of the linked website.
The code in away.php to fetch the incoming browser URI is:
$goto = $_GET['to'];
So far it works; however, there's a logical issue with dynamic URIs. For example,
www.mydomain.com/away.php?to=http://example.com
is working, but dynamic URIs like
www.mydomain.com/away.php?to=http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0
aren't working, since the linked URL contains an &, which ends the $_GET['to'] string too early.
The $goto variable contains only the part until the first &:
echo $_GET['to'];
===> "http://www.youtube.com/watch?feature=fvwp"
I understand why, but I'm looking for a solution, since I haven't found one on the internet yet.

Try using urlencode:
$link = urlencode("http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0") ;
echo $link;
urlencode percent-encodes the characters that have a special meaning in a URL, so the whole link can be carried as a single value.
The result looks like this and can be appended to a GET parameter:
http%3A%2F%2Fwww.youtube.com%2Fwatch%3Ffeature%3Dfvwp%26v%3Dj1p0_R8ZLB0
To get the original characters back (for example to output the link), there is the function urldecode; note that PHP already decodes the values in $_GET.
The function htmlentities may also be useful when you print the link into HTML.
You can test with this:
$link = urlencode("http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0") ;
$redirect = "{$_SERVER['PHP_SELF']}?to={$link}" ;
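if (!isset($_GET['to'])) {
    // First request: redirect to ourselves with the encoded link attached
    header("Location: $redirect");
    exit; // stop here so nothing else is sent after the redirect header
} else {
    echo $_GET['to'];
}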
EDIT:
OK, I have a solution for your particular situation, where the incoming link is not encoded.
It will work only if the parameter to is the last one in the query string.
// Match against the raw query string, because $_GET has already been split on &
if (preg_match("/to=(.+)/", $_SERVER['QUERY_STRING'], $parts)) { // We got a parameter TO
    echo $parts[1]; // Get everything after TO
}
So, $parts[1] will be your link.
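
A minimal sketch of the urlencode round trip described above. The helper names build_away_link() and target_from_query() and the /away.php path are assumptions made for illustration; the idea is simply to encode when the link is written and let PHP decode when it is read.

```php
<?php
// Hypothetical helper: build the link to away.php for an arbitrary target URL.
function build_away_link(string $target): string
{
    // urlencode() turns & into %26, so it no longer splits the query string.
    return '/away.php?to=' . urlencode($target);
}

// Hypothetical helper: recover the target from a raw query string, such as
// the one PHP exposes in $_SERVER['QUERY_STRING'].
function target_from_query(string $query): ?string
{
    parse_str($query, $params); // decodes percent-encoding, as $_GET does
    return isset($params['to']) && is_string($params['to']) ? $params['to'] : null;
}

$target = 'http://www.youtube.com/watch?feature=fvwp&v=j1p0_R8ZLB0';
$link = build_away_link($target);

echo $link, "\n";
echo target_from_query(parse_url($link, PHP_URL_QUERY)), "\n";
```

Inside away.php itself, $_GET['to'] would already hold the decoded target, so target_from_query() is only there to make the round trip visible in a standalone script.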

Reduce link (URL) size

Is it possible to reduce the size of a link (in text form) by PHP or JS?
E.g. I might have links like these:
http://www.example.com/index.html <- Redirects to the root
http://www.example.com/folder1/page.html?start=true <- Redirects to page.html
http://www.example.com/folder1/page.html?start=false <- Redirects to page.html?start=false
The purpose is to find out whether the link can be shortened and still point to the same location. In these examples the first two links can be reduced, because the first points to the root, and the second has parameters that can be omitted.
The third link is the case where the parameters can't be omitted, meaning that it can't be reduced any further than removing the http://.
So the above links would be reduced like this:
Before: http://www.example.com/index.html
After: www.example.com
Before: http://www.example.com/folder1/page.html?start=true
After: www.example.com/folder1/page.html
Before: http://www.example.com/folder1/page.html?start=false
After: www.example.com/folder1/page.html?start=false
Is this possible by PHP or JS?
Note:
www.example.com is not a domain I own or have access to besides through the URL. The links are potentially unknown, and I'm looking for something like an automatic link shortener that can work by getting the URL and nothing else.
Actually I was thinking of something like a linkchecker that could check if the link works before and after the automatic trim, and if it doesn't then the check will be done again at a less trimmed version of the link. But that seemed like overkill...

Since you want to do this automatically, and you don't know how the parameters change the behaviour, you will have to do this by trial and error: try to remove parts from a URL, and see if the server responds with a different page.
In the simplest case this could work something like this:
<?php
$originalUrl = "http://stackoverflow.com/questions/14135342/reduce-link-url-size";
$originalContent = file_get_contents($originalUrl);
if ($originalContent === false) {
    die("Could not fetch the original URL");
}
$trimmedUrl = $originalUrl;
while ($trimmedUrl) {
    // dirname() strips the last path segment from the URL
    $trialUrl = dirname($trimmedUrl);
    $trialContent = @file_get_contents($trialUrl);
    if ($trialContent === $originalContent) {
        $trimmedUrl = $trialUrl;
    } else {
        break;
    }
}
echo "Shortest equivalent URL: " . $trimmedUrl;
// output: Shortest equivalent URL: http://stackoverflow.com/questions/14135342
?>
For your usage scenario, your code would be a bit more complicated, as you would have to test for each parameter in turn to see if it is necessary. For a starting point, see the parse_url() and parse_str() functions.
A word of caution: this code is very slow, as it performs several requests for every URL you want to shorten. Also, it will likely fail to shorten many URLs, because the server might include things like timestamps in the response. This makes the problem very hard, and that's the reason why companies like Google have many engineers who think about stuff like this :).
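
The per-parameter test mentioned above could be sketched roughly as follows. The function name drop_redundant_params() is hypothetical, and the page fetcher is passed in as a callable so the logic can be tried without network access; in real use it might wrap file_get_contents(). The sketch ignores ports, credentials and fragments, and it inherits the timestamp problem described above.

```php
<?php
// Hypothetical helper: remove each query parameter in turn and keep the
// removal only if $fetch returns the same body as for the original URL.
function drop_redundant_params(string $url, callable $fetch): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return $url; // not something this sketch can safely rewrite
    }

    parse_str($parts['query'] ?? '', $params);
    $base = ($parts['scheme'] ?? 'http') . '://' . $parts['host'] . ($parts['path'] ?? '/');

    // Rebuild a URL from the base and whatever parameters are left.
    $build = function (array $p) use ($base): string {
        return $p ? $base . '?' . http_build_query($p) : $base;
    };

    $reference = $fetch($url);
    foreach (array_keys($params) as $name) {
        $trial = $params;
        unset($trial[$name]);
        if ($fetch($build($trial)) === $reference) {
            $params = $trial; // the parameter made no difference, drop it
        }
    }
    return $build($params);
}

// Stand-in for a real fetcher: pretend only start=false changes the page.
$fake = function (string $u): string {
    return strpos($u, 'start=false') !== false ? 'B' : 'A';
};

echo drop_redundant_params('http://www.example.com/folder1/page.html?start=true', $fake), "\n";
echo drop_redundant_params('http://www.example.com/folder1/page.html?start=false', $fake), "\n";
```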

Yea, that's possible:
JS:
var url = 'http://www.example.com/folder1/page.html?start=true';
url = url.replace('http://','').replace('?start=true','').replace('/index.html','');
php:
$url = 'http://www.example.com/folder1/page.html?start=true';
$url = str_replace(array('http://', '?start=true', '/index.html'), "", $url);
(Each item in the array() will be replaced with "")

Here is a JS for you.
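function trimURL(url, trimToRoot, trimParam) {
    // Strip a leading http:// or https:// if there is one;
    // the scheme is optional so URLs without it don't throw.
    var match = /^(https?:\/\/)?(.*)$/.exec(url);
    url = match[2]; // e.g. www.google.com/one/two.php?f=1
    if (trimParam === true) {
        // Drop the query string
        url = url.split('?')[0];
    }
    if (trimToRoot === true) {
        // Keep only the host
        url = url.split('/')[0];
    }
    return url;
}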
alert(trimURL('https://www.google.com/one/two.php?f=1'));
alert(trimURL('https://www.google.com/one/two.php?f=1', true));
alert(trimURL('https://www.google.com/one/two.php?f=1', false, true));
Fiddle: http://jsfiddle.net/5aRpQ/

Regex for Domains having three dots ex:- "gov.ac.in"

We have a list of URLs in this format (http://www.xyz.gov.ac.in). Not all of them look like this; some of them have normal domains. I am confused about how to get the domain name from a three-dotted URL. The code we have works fine for two-dotted domain names.
Here is the code we have:
function get_domain($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}
return false;
}
echo get_domain($url) ;
How can we modify the above code to accommodate for 3 dotted domains as well as the other types?
The echo results should be in this format xyz.gov.ac.in

Basically, you can't. At least not without a lookup table that has all "TLDs".
For example, in my country (The Netherlands) we have .nl and .co.nl. But www.gov.nl is a normal website (I'm trying to illustrate that you can't automatically say that gov. isn't a domain). And www.edu.nl doesn't exist.
Any standard regex that would try to parse them would tell you that the domain is www.gov.nl, while the domain is actually gov.nl. Same for edu.nl.
The only way you can accomplish what you want is by getting a list of all TLDs (and sub-TLDs) and using that to parse them.
I believe that Firefox and Chrome have such a list implemented (for coloring the domain name in the URL) and constantly keep it up-to-date. Maybe look in those sources?
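
A minimal sketch of the lookup-table approach. The function name registrable_domain() is hypothetical, and the $suffixes array is a tiny hand-written sample chosen to match the examples in this thread; it is not the real list, which is far larger and has wildcard and exception rules that this sketch does not handle.

```php
<?php
// Hypothetical helper: find the longest known suffix at the end of $host and
// return that suffix plus the one label in front of it.
function registrable_domain(string $host, array $suffixes): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));
    $n = count($labels);

    // Start at $i = 1 so at least one label is left in front of the suffix;
    // increasing $i tries shorter suffixes, so the longest match wins.
    for ($i = 1; $i < $n; $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, $suffixes, true)) {
            return $labels[$i - 1] . '.' . $suffix;
        }
    }
    return null; // no known suffix, or nothing in front of it
}

// Assumption: sample data only, not an authoritative suffix list.
$suffixes = ['com', 'nl', 'in', 'ac.in', 'gov.ac.in', 'uk', 'co.uk'];

echo registrable_domain('www.xyz.gov.ac.in', $suffixes), "\n";
echo registrable_domain('www.gov.nl', $suffixes), "\n";
```

The host passed in would come from parse_url(), as in the get_domain() function in the question.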

Try this:
/(^[\w|-]+\.)(?P<domain>([\w|-]+\.)+(\w+))/i
Hope this will help..

You should be able to use this Regex instead
/(?P<domain>([a-z0-9][a-z0-9\-]{1,63}\.)+[a-z\.]{2,6})$/i

PHP - allow domains, not subdomains

I would appreciate any help that can be provided with this matter.
I am creating a registration form. One field is for the user's domain, which I will check for validity with FILTER_VALIDATE_URL and for existence with dns_check_record.
However, the problem I'm having is that these two methods also allow subdomains to be submitted to the form, which I don't want.
Does anyone know a way to allow domains but not subdomains?
I've tested the following function, from http://syntax.cwarn23.net/PHP/Strip_URL_to_Domain:
function domain($domainb)
{
    // Strip the scheme and path, keeping only the host
    $bits = explode('/', $domainb);
    if ($bits[0] == 'http:' || $bits[0] == 'https:') {
        $domainb = $bits[2];
    } else {
        $domainb = $bits[0];
    }
    // Index of the third label from the end (assumes at least three labels)
    $bits = explode('.', $domainb);
    $idz = count($bits) - 3;
    if (strlen($bits[$idz + 2]) == 2) {
        // Two-letter ending: keep the last three labels
        $url = $bits[$idz] . '.' . $bits[$idz + 1] . '.' . $bits[$idz + 2];
    } else if (strlen($bits[$idz + 2]) == 0) {
        // Empty last label (trailing dot)
        $url = $bits[$idz] . '.' . $bits[$idz + 1];
    } else {
        // Otherwise keep the last two labels
        $url = $bits[$idz + 1] . '.' . $bits[$idz + 2];
    }
    return $url;
}
However, this isn't perfect: domains such as www.domain.uk.com come out as uk.com (not a common domain extension, I know).
Does anyone know a method better than the above function?

As pointed by Micheal Mior, you have to check for .co.uk, .com.br and many others.
Some browser vendors maintain a list of such suffixes, which are not TLDs but effectively behave like TLDs: http://publicsuffix.org/. The list is quite large.
There is a library here that uses this effective TLD list to implement the function you are looking for (downloads are here). (Found via https://wiki.mozilla.org/Gecko:Effective_TLD_Service.)

Combine them.
dns_check_record will fail on '.co.uk', so you can split your string on the dots, check the domain you get when you combine the last two parts, and if that fails, use a third part too, if any.
You will do a double check for invalid domains, but I assume that won't be an issue.
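
The combination described above might look roughly like this. The function name shortest_resolving_domain() is hypothetical, and the DNS check is passed in as a callable so the sketch can run offline; it also relies on the answer's assumption that the lookup fails for a bare suffix such as co.uk, which may not hold for every suffix or record type.

```php
<?php
// Hypothetical helper: try the last two labels first, then three, and so on,
// and return the first candidate for which $hasDns reports a record.
function shortest_resolving_domain(string $host, callable $hasDns): ?string
{
    $labels = explode('.', strtolower(trim($host, '.')));

    for ($take = 2; $take <= count($labels); $take++) {
        $candidate = implode('.', array_slice($labels, -$take));
        if ($hasDns($candidate)) {
            return $candidate;
        }
    }
    return null; // nothing resolved
}

// Stand-in for a real lookup: pretend only example.co.uk has DNS records.
// In real use this could be: function ($d) { return checkdnsrr($d, 'A'); }
$fakeDns = function (string $domain): bool {
    return $domain === 'example.co.uk';
};

$submitted = 'www.example.co.uk';
$main = shortest_resolving_domain($submitted, $fakeDns);
echo $main, "\n";
echo $main === $submitted ? "domain\n" : "subdomain\n";
```

A submitted host would then be accepted only when it equals the value returned by the function.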

First, you could use parse_url() to get only the host name: http://www.stackoverflow.com -> $url['host'] = 'www.stackoverflow.com'
Second, you could count the dots in the host name: explode() --> count(), or substr_count().
If the host has more than one dot, it may contain a subdomain.
Now you could use the solution mentioned by GolezTrol or arnaud576875.
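
A short sketch of those first two steps; the function name count_host_dots() is hypothetical. It only flags hosts that might contain a subdomain and cannot tell www.example.com apart from example.co.uk on its own.

```php
<?php
// Hypothetical helper: return the number of dots in the host part of a URL,
// or null when parse_url() finds no host (for example when the scheme is missing).
function count_host_dots(string $url): ?int
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    return substr_count($host, '.');
}

// More than one dot means the host needs the closer look described above.
var_dump(count_host_dots('http://www.stackoverflow.com') > 1);
var_dump(count_host_dots('http://stackoverflow.com') > 1);
```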

How do I detect subdomain and filter it?

I have a domain search function. In the search box you can enter any kind of domain name. What I am looking for is a way to filter subdomains out of the search, or else to trim the subdomain and keep only the main domain.
For example, if a user enters mail.yahoo.com, it should either be converted to yahoo.com or be omitted from the search.

Here's a more concise way to grab the domain and a likely subdomain from a URL.
function find_subdomain($url) {
    $parts = parse_url($url);
    $domain_parts = explode('.', $parts['host']);
    while (count($domain_parts) > 4)
        array_shift($domain_parts);
    return join('.', $domain_parts);
}
Keep in mind that not everything that looks like a subdomain is really a subdomain. Some countries have their own country-specific domains that everyone uses, like .co.uk and .com.au. You can not rely on the number of dots in the URL to tell you what is and is not a subdomain. In fact, you might need the opposite approach - first remove the top-level domain, then see what's left. Unfortunately then you're left with the second-level domain problem.
Can you tell us more about what exactly you are trying to accomplish? Why are you trying to detect subdomains? You mentioned a search box. What is being searched?
Edit: I have updated the function to keep up to four of the right-most parts of the domain. Given "http://one.two.three.four.five.six.com" it will return 'four.five.six.com'.

I customized a utility function that I'm using. It's close to perfect (as close as you can get without hard-coding the full list of domain extensions).
Here's the catch: it assumes that the main domain contains at least 4 characters, i.e. for sub.mail.com it returns mail.com, but for sub.aol.com it returns sub.aol.com.
function get_main_domain($host = '') {
    if (empty($host)) $host = $_SERVER['HTTP_HOST'];
    $domain_parts = explode('.', $host);
    $count = count($domain_parts);
    if ($count <= 2) return $host;
    // Walk from the right until a label longer than 3 characters is found;
    // $permit counts how many labels to keep.
    $permit = 0;
    for ($i = $count - 1; $i >= 0; $i--) {
        $permit++;
        if (strlen($domain_parts[$i]) > 3) break;
    }
    while (count($domain_parts) > $permit) array_shift($domain_parts);
    return join('.', $domain_parts);
}

Well, that doesn't work for every domain if you forget to list its extension in the array...
Here is my solution, but I need to compress it to a few lines. Is that possible?
function subdomain($domainb) {
    // Strip the scheme and path, keeping only the host
    $bits = explode('/', $domainb);
    if ($bits[0] == 'http:' || $bits[0] == 'https:') {
        $domainb = $bits[2];
    } else {
        $domainb = $bits[0];
    }
    $bits = explode('.', $domainb);
    $idz = count($bits) - 4; // number of leading labels to drop
    for ($idy = 0; $idy < $idz; $idy++) {
        unset($bits[$idy]);
    }
    $part = array_values($bits); // re-index the remaining labels
    // Drop the first label when the second one is longer than 4 characters
    if (isset($part[1]) && strlen($part[1]) > 4) {
        unset($part[0]);
    }
    return implode('.', $part);
}
