This is my first question, and by the way, I am not comfortable with regexes.
I am working on a PHP function that validates domains or URLs supplied by users. The (sub)domains are collected through an HTML input field.
So I have to deal with different formats such as http(s)://domain.tld and domain.tld, either of which may include a path or be invalid.
The function should correct nearly valid input rather than return false.
In the end, I want to return the format (sub.)domain.tld, but only for domains that actually exist.
My work-in-progress solution is below. What do you think of it?
function valDomain($url,$prefix=""){
$url = trim($url);
$url = str_replace(" ", "", $url);
$url = trim($url,'.');
$url = trim($url,'?');
$url = trim($url,'-');
$url = trim($url,'/');
$url = strtolower($url);
$url = substr($url,0,100);
if(strpos($url,'.') === false) {
return false;
}
if(strpos($url,'http') !== false) {
$x = parse_url($url);
if(isset($x['host'])){
$url = $x['host'];
}
}
if(strpos($url,'/') !== false) {
$x = explode("/", $url);
if(isset($x[0])){
$url = $x[0];
}
}
if(checkdnsrr($url,"A")){
return $prefix.$url;
} else {
return false;
}
}
To explain: it tidies up the user input, checks whether it can be a URL/domain at all, takes the host if it is a proper URL, strips the path, and then, when only the bare host should remain, checks whether a DNS entry exists for it. Only if one does is the validated domain returned. Otherwise it returns false.
Does this make sense?
(The $prefix argument can optionally be used to add http:// to the URL in order to render a hyperlink.)
The results will be stored in a database, so they need to be hack-safe.
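The steps described above could be split into a pure normalisation step and a separate DNS lookup, so that the first part can be checked without network access. The following is only a minimal sketch: the names normalizeHost and validateDomain are hypothetical, only http/https prefixes are stripped, and the host pattern accepts ASCII labels only (an internationalised name would have to arrive in punycode form).

```php
<?php
// Hypothetical helper: reduce free-form user input to a bare host name, or null.
function normalizeHost(string $input): ?string
{
    // Remove all whitespace, lowercase, and cap the length as the original does.
    $url = strtolower(preg_replace('/\s+/', '', $input));
    $url = substr($url, 0, 100);

    // Drop a leading http:// or https:// scheme if present.
    $url = preg_replace('~^https?://~', '', $url);

    // Keep only what precedes the first '/', '?' or '#'.
    $host = preg_split('~[/?#]~', $url, 2)[0];

    // Strip stray dots and hyphens left at either end.
    $host = trim($host, '.-');

    // Assumption: at least two ASCII labels separated by dots.
    $pattern = '/^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z0-9-]{2,63}$/';

    return preg_match($pattern, $host) === 1 ? $host : null;
}

// Hypothetical wrapper: only return hosts that currently have an A record.
function validateDomain(string $input, string $prefix = '')
{
    $host = normalizeHost($input);

    // checkdnsrr() performs a live DNS query, so this part needs network access.
    if ($host !== null && checkdnsrr($host, 'A')) {
        return $prefix . $host;
    }
    return false;
}
```

Because normalizeHost() never returns anything outside the characters a-z, 0-9, dot and hyphen, its result is easy to reason about, but it should still be written to the database through prepared statements.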
I'm getting a url from a form, this way:
$input_website = isset($_POST['website']) ? check_plain($_POST['website']) : 'None';
I need to get back a naked domain name (for some API integration). For example, http://www.example.com should return example.com,
and www.example.com should return example.com, and so on.
My current code returns the correct result for the first case, http://www.example.com, but returns nothing for www.example.com or even example.com:
function get_domain($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}
return false;
}
Can you please advise on the matter?
As per discussion with you:
$url = 'www.noamddd.com';
$arrUrl = explode("/", $url);
echo $arrUrl[0];
Old answer:
Put the following code in a function to get the domain name (see the parse_url documentation for details):
$url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
$parse = parse_url($url);
print $parse['host']; //google.com
Also you can do this in another way:
echo $domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));//google.com
If you just have the URL (and do not want the current domain name, as frayne-konok suggests) and want to extract the server name, you can use a regular expression like this:
$serverName = preg_replace('|.*?://(.*?)/.*|', '$1', $url);
I ended up doing something a bit different: checking whether the URL starts with a scheme and, if not, adding http:// using this function:
function addHttp($url) {
if (!preg_match("~^(?:f|ht)tps?://~i", $url)) {
$url = "http://" . $url;
}
return $url;
}
Only then do I pass it to my other function, which returns the domain.
It is surely not the best way, but it works.
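The approach described here (prepend a scheme when it is missing, then let parse_url() find the host) might look like the sketch below. hostFromInput is a hypothetical name, and stripping a single leading "www." is an assumption based on the examples in the question; nothing here knows about multi-part suffixes such as co.uk.

```php
<?php
// Hypothetical helper: return the host of a URL or bare domain without a leading "www.".
function hostFromInput(string $input)
{
    $url = trim($input);

    // parse_url() only fills in the host when a scheme (or a leading //) is present.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // malformed input or no host component
    }

    // Remove one leading "www." only, not every occurrence inside the host.
    return preg_replace('~^www\.~i', '', strtolower($host));
}
```

Anchoring the replacement at the start of the host avoids the side effect of str_ireplace('www.', ...), which would also alter a host like sub.www.example.com in the middle.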
I'm trying to compare two urls using PHP, ensuring that the domain name is the same. It cannot be the sub-domain. It has to literally be the same domain. Example:
http://www.google.co.uk would validate as true compared to http://www.google.co.uk/pages.html.
but
http://www.google.co.uk would validate as false compared to http://www.something.co.uk/pages.html.
Use parse_url(), and compare the "host" index in the array returned from the two calls to parse_url().
Use parse_url()
$url1 = parse_url("http://www.google.co.uk");
$url2 = parse_url("http://www.google.co.uk/pages.html");
if ($url1['host'] == $url2['host']){
//matches
}
simple, use parse_url()
$url1 = parse_url('http://www.google.co.uk');
$url2 = parse_url('http://www.google.co.uk/pages.html');
if($url1['host'] == $url2['host']){
// same domain
}
You could use parse_url for this
$url1 = parse_url('http://www.google.com/page1.html');
$domain1 = $url1['host'];
$url2 = parse_url('http://www.google.com/page2.html');
$domain2 = $url2['host'];
if($domain1 == $domain2){
// something
}
Expanding the answer given by Ariel, the code you could use is similar to the following one:
<?php
compare_host('http://www.google.co.uk', 'http://www.something.co.uk/pages.html');
function compare_host($url1, $url2)
{
// PHP prior to 5.3.3 emits a warning if URL parsing fails, hence the @ operator.
$info = @parse_url($url1);
if (empty($info['host'])) {
return FALSE;
}
$host1 = $info['host'];
$info = @parse_url($url2);
if (empty($info['host'])) {
return FALSE;
}
return (strtolower($host1) === strtolower($info['host']));
}
Is there a function in PHP to get the name of the subdomain?
In the following example I would like to get the "en" part of the URL:
en.example.com
Here's a one line solution:
array_shift((explode('.', $_SERVER['HTTP_HOST'])));
Or using your example:
array_shift((explode('.', 'en.example.com')));
EDIT: Fixed "only variables should be passed by reference" by adding double parenthesis.
EDIT 2: Starting from PHP 5.4 you can simply do:
explode('.', 'en.example.com')[0];
Uses the parse_url function.
$url = 'http://en.example.com';
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
$subdomain = $host[0];
echo $subdomain;
For multiple subdomains
$url = 'http://usa.en.example.com';
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
$subdomains = array_slice($host, 0, count($host) - 2 );
print_r($subdomains);
You can do this by first getting the domain name (e.g. sub.example.co.uk => example.co.uk) and then using strstr to get the subdomains.
$testArray = array(
'sub1.sub2.example.co.uk',
'sub1.example.com',
'example.com',
'sub1.sub2.sub3.example.co.uk',
'sub1.sub2.sub3.example.com',
'sub1.sub2.example.com'
);
foreach($testArray as $k => $v)
{
echo $k." => ".extract_subdomains($v)."\n";
}
function extract_domain($domain)
{
if(preg_match("/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i", $domain, $matches))
{
return $matches['domain'];
} else {
return $domain;
}
}
function extract_subdomains($domain)
{
$subdomains = $domain;
$domain = extract_domain($subdomains);
$subdomains = rtrim(strstr($subdomains, $domain, true), '.');
return $subdomains;
}
Outputs:
0 => sub1.sub2
1 => sub1
2 =>
3 => sub1.sub2.sub3
4 => sub1.sub2.sub3
5 => sub1.sub2
http://php.net/parse_url
<?php
$url = 'http://user:password@sub.hostname.tld/path?argument=value#anchor';
$array=parse_url($url);
$array['host']=explode('.', $array['host']);
echo $array['host'][0]; // returns 'sub'
?>
Because the domain registries are the only reliable source for domain suffixes, you cannot determine the subdomain without their data.
There is a list with all domain suffixes at https://publicsuffix.org. This site also links to a PHP library: https://github.com/jeremykendall/php-domain-parser.
Please find an example below. I also added the sample for en.test.co.uk which is a domain with a multi suffix (co.uk).
<?php
require_once 'vendor/autoload.php';
$pslManager = new Pdp\PublicSuffixListManager();
$parser = new Pdp\Parser($pslManager->getList());
$host = 'http://en.example.com';
$url = $parser->parseUrl($host);
echo $url->host->subdomain;
$host = 'http://en.test.co.uk';
$url = $parser->parseUrl($host);
echo $url->host->subdomain;
PHP 7.0: Use the explode function and create a list of all the results.
list($subdomain,$host) = explode('.', $_SERVER["SERVER_NAME"]);
Example: sub.domain.com
echo $subdomain;
Result: sub
echo $host;
Result: domain
Simply...
preg_match('/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i', $url, $match);
Just read $match[1]
Working example
It works perfectly with this list of urls
$url = array(
'http://www.domain.com', // www
'http://domain.com', // --nothing--
'https://domain.com', // --nothing--
'www.domain.com', // www
'domain.com', // --nothing--
'www.domain.com/some/path', // www
'http://sub.domain.com/domain.com', // sub
'опубликованному.значения.ua', // опубликованному ;)
'значения.ua', // --nothing--
'http://sub-domain.domain.net/domain.net', // sub-domain
'sub-domain.third-Level_DomaIN.domain.uk.co/domain.net' // sub-domain
);
foreach ($url as $u) {
preg_match('/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i', $u, $match);
var_dump($match);
}
Simplest and fastest solution.
$sSubDomain = str_replace('.example.com','',$_SERVER['HTTP_HOST']);
$REFERRER = $_SERVER['HTTP_REFERER']; // Or other method to get a URL for decomposition
$domain = substr($REFERRER, strpos($REFERRER, '://')+3);
$domain = substr($domain, 0, strpos($domain, '/'));
// This line will return 'en' of 'en.example.com'
$subdomain = substr($domain, 0, strpos($domain, '.'));
Using regexes, string functions, parse_url(), or combinations of them is not a real solution. Test any of the proposed solutions with the domain test.en.example.co.uk and none will give a correct result.
The correct solution is to use a package that parses domains with the Public Suffix List. I recommend TLDExtract; here is sample code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('test.en.example.co.uk');
$result->getSubdomain(); // will return (string) 'test.en'
$result->getSubdomains(); // will return (array) ['test', 'en']
$result->getHostname(); // will return (string) 'example'
$result->getSuffix(); // will return (string) 'co.uk'
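To see why a suffix list matters, here is a minimal sketch that splits a host using a tiny hard-coded list. The list DEMO_SUFFIXES and the function name splitHost are assumptions made purely for illustration; a real implementation would load the full Public Suffix List (for example through a library like the one above) and honour its wildcard and exception rules.

```php
<?php
// Illustrative only: the real list has thousands of entries plus wildcard rules.
const DEMO_SUFFIXES = ['co.uk', 'org.uk', 'com', 'net', 'org', 'uk'];

// Hypothetical helper: split a host into subdomain, domain and suffix, or return null.
function splitHost(string $host): ?array
{
    $host = strtolower(trim($host, '.'));

    // A bare suffix such as "co.uk" has no registrable domain.
    if (in_array($host, DEMO_SUFFIXES, true)) {
        return null;
    }

    $labels = explode('.', $host);

    // Try the longest candidate suffix first so that "co.uk" wins over "uk".
    for ($i = 1; $i < count($labels); $i++) {
        $suffix = implode('.', array_slice($labels, $i));
        if (in_array($suffix, DEMO_SUFFIXES, true)) {
            return [
                'subdomain' => implode('.', array_slice($labels, 0, $i - 1)),
                'domain'    => $labels[$i - 1],
                'suffix'    => $suffix,
            ];
        }
    }
    return null; // no known suffix found
}
```

With this list, test.en.example.co.uk splits into the subdomain test.en, the domain example and the suffix co.uk, which is the case that the "last two labels" approaches get wrong.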
The best and shortest solution I found is
array_shift(explode(".",$_SERVER['HTTP_HOST']));
For those who get 'Error: Strict Standards: Only variables should be passed by reference.'
Use like this:
$env = (explode(".",$_SERVER['HTTP_HOST']));
$env = array_shift($env);
$domain = 'sub.dev.example.com';
$tmp = explode('.', $domain); // split into parts
$subdomain = current($tmp);
print($subdomain); // prints "sub"
As seen in a previous question:
How to get the first subdomain with PHP?
There isn't really a 100% dynamic solution. I've been trying to figure it out as well, and because of the different domain extensions (TLDs) this task would be really difficult without parsing all of these extensions and checking them each time:
.com vs .co.uk vs org.uk
The most reliable option is to define a constant (or database entry etc.) that stores the actual domain name and remove it from the $_SERVER['SERVER_NAME'] using substr()
defined("DOMAIN")
|| define("DOMAIN", 'mymaindomain.co.uk');
function getSubDomain() {
if (empty($_SERVER['SERVER_NAME'])) {
return null;
}
$subDomain = substr($_SERVER['SERVER_NAME'], 0, -(strlen(DOMAIN)));
if (empty($subDomain)) {
return null;
}
return rtrim($subDomain, '.');
}
Now if you're using this function under http://test.mymaindomain.co.uk it will give you test or if you have multiple sub-domain levels http://another.test.mymaindomain.co.uk you'll get another.test - unless of course you update the DOMAIN.
I hope this helps.
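One caveat with the substr() call above is that it never checks that the host really ends with the configured domain. A minimal sketch of the same idea as a pure function follows; subdomainOf is a hypothetical name, and the base domain is passed in as a parameter instead of being read from a constant.

```php
<?php
// Hypothetical helper: return the part of $host in front of $baseDomain, or null.
function subdomainOf(string $host, string $baseDomain): ?string
{
    $host = strtolower($host);
    $tail = '.' . strtolower($baseDomain);

    // Require the host to end with ".<baseDomain>" so look-alike hosts do not match.
    if (strlen($host) <= strlen($tail) || substr($host, -strlen($tail)) !== $tail) {
        return null;
    }

    // Everything before the tail is the subdomain, possibly with several levels.
    return substr($host, 0, -strlen($tail));
}
```

It could then be called as subdomainOf($_SERVER['SERVER_NAME'], DOMAIN), returning null both for the bare domain and for unrelated hosts.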
Simply
reset(explode(".", $_SERVER['HTTP_HOST']))
I'm doing something like this
$url = 'https://en.example.com';
$splitedBySlash = explode('/', $url);
$splitedByDot = explode('.', $splitedBySlash[2]);
$subdomain = $splitedByDot[0];
Suppose current url = sub.example.com
$host = array_reverse(explode('.', $_SERVER['SERVER_NAME']));
if (count($host) >= 3){
echo "Main domain is = ".$host[1].".".$host[0]." & subdomain is = ".$host[2];
// Main domain is = example.com & subdomain is = sub
} else {
echo "Main domain is = ".$host[1].".".$host[0]." & subdomain not found";
// "Main domain is = example.com & subdomain not found";
}
This is my solution. It works with the most common domains, and you can adapt the array of extensions as needed:
$SubDomain = explode('.', explode('|ext|', str_replace(array('.com', '.net', '.org'), '|ext|',$_SERVER['HTTP_HOST']))[0]);
// For www.abc.en.example.com
$host_Array = explode(".",$_SERVER['HTTP_HOST']); // Get HOST as array www, abc, en, example, com
array_pop($host_Array); array_pop($host_Array); // Remove com and example
array_shift($host_Array); // Remove www (Optional)
echo implode(".", $host_Array); // Combine array abc.en
I know I'm really late to the game, but here goes.
What I did was take the HTTP_HOST server variable ($_SERVER['HTTP_HOST']) and the number of letters in the domain (so for example.com it would be 11).
Then I used the substr function to get the subdomain. I did
$numberOfLettersInSubdomain = strlen($_SERVER['HTTP_HOST'])-12;
$subdomain = substr($_SERVER['HTTP_HOST'], 0, $numberOfLettersInSubdomain);
I subtract 12 instead of 11 so that the dot separating the subdomain from the domain is dropped as well. So now if you entered test.example.com, the value of $subdomain would be test.
This is better than using explode because if the subdomain has a . in it, this will not cut it off.
if you are using drupal 7
this will help you:
global $base_path;
global $base_root;
$fulldomain = parse_url($base_root);
$splitdomain = explode(".", $fulldomain['host']);
$subdomain = $splitdomain[0];
$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
$domain = $matches[0];
$url = explode($domain, $host);
$subdomain = str_replace('.', '', $url[0]);
echo 'subdomain: '.$subdomain.'<br />';
echo 'domain: '.$domain.'<br />';
From PHP 5.3 you can use strstr() with true parameter
echo strstr($_SERVER["HTTP_HOST"], '.', true); //prints en
Try this...
$domain = 'en.example.com';
$tmp = explode('.', $domain);
$subdomain = current($tmp);
echo($subdomain); // echo "en"
function get_subdomain($url=""){
if($url==""){
$url = $_SERVER['HTTP_HOST'];
}
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['path']);
$subdomains = array_slice($host, 0, count($host) - 2 );
return implode(".", $subdomains);
}
you can use this too
echo substr($_SERVER['HTTP_HOST'], 0, strrpos($_SERVER['HTTP_HOST'], '.', -5));
I may be late, but even though the post is old, many others will land on it just as I did.
Today the wheel has already been invented: there is an actively maintained library called php-domain-parser, which offers two mechanisms.
One is based on the Public Suffix List and the other on the IANA list.
It is simple and effective, and lets us write small helpers for our projects knowing that the underlying data is maintained, in a world where extensions and their variants change frequently.
Many of the answers given in this post do not pass a battery of unit tests that checks current extensions and their multi-level variants, nor do they handle domains with extended characters.
Maybe it will serve you as it served me.
<?php
function get_domain($host) {
$parts = explode('.',$host);
$extension = $parts[count($parts)-1];
$name = $parts[count($parts)-2];
return $name.'.'.$extension;
}
echo get_domain("https://api.neoistone.com");
?>
If you only want what comes before the first period:
list($sub) = explode('.', 'en.example.com', 2);