Compare two strings (urls) for same domain - php

I'm trying to compare two urls using PHP, ensuring that the domain name is the same. It cannot be the sub-domain. It has to literally be the same domain. Example:
http://www.google.co.uk would validate as true compared to http://www.google.co.uk/pages.html.
but
http://www.google.co.uk would validate as false compared to http://www.something.co.uk/pages.html.

Use parse_url(), and compare the "host" index in the array returned from the two calls to parse_url().

Use parse_url()
$url1 = parse_url("http://www.google.co.uk");
$url2 = parse_url("http://www.google.co.uk/pages.html");
if ($url1['host'] == $url2['host']){
//matches
}

simple, use parse_url()
$url1 = parse_url('http://www.google.co.uk');
$url2 = parse_url('http://www.google.co.uk/pages.html');
if($url1['host'] == $url2['host']){
// same domain
}

You could use parse_url for this
$url1 = parse_url('http://www.google.com/page1.html');
$domain1 = $url1['host'];
$url2 = parse_url('http://www.google.com/page2.html');
$domain2 = $url2['host'];
if($domain1 == $domain2){
// something
}

Expanding the answer given by Ariel, the code you could use is similar to the following one:
<?php
compare_host('http://www.google.co.uk', 'http://www.something.co.uk/pages.html');
function compare_host($url1, $url2)
{
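// PHP versions before 5.3.3 emit a warning if URL parsing fails, hence the @.
$info = @parse_url($url1);
// Also covers URLs that parse but contain no host part.
if (empty($info['host'])) {
return FALSE;
}
$host1 = $info['host'];
$info = @parse_url($url2);
if (empty($info['host'])) {
return FALSE;
}
// Host names are case-insensitive, so compare them in lower case.
return (strtolower($host1) === strtolower($info['host']));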
}

Related

Thinking about domain validation

This is my first question, and by the way, I am uncomfortable with regexes.
I was thinking about a PHP function that validates domains or URLs, given by user input. (Sub)Domains shall be collected via html input field.
So I have to deal with different formats like http(s)://domain.tld and domain.tld both with the possibility of including a path or being invalid.
The function should rather correct almost correct user input instead of returning false.
In the end, I want to return the format (sub.)domain.tld, but only for real existing domains.
My WIP-solution is the following. What do you think about it?
function valDomain($url,$prefix=""){
$url = trim($url);
$url = str_replace(" ", "", $url);
$url = trim($url,'.');
$url = trim($url,'?');
$url = trim($url,'-');
$url = trim($url,'/');
$url = strtolower($url);
$url = substr($url,0,100);
if(strpos($url,'.') == false) {
return false;
}
if(strpos($url,'http') !== false) {
$x = parse_url($url);
if(isset($x['host'])){
$url = $x['host'];
}
}
if(strpos($url,'/') !== false) {
$x = explode("/", $url);
if(isset($x[0])){
$url = $x[0];
}
}
if(checkdnsrr($url,"A")){
return $prefix.$url;
} else {
return false;
}
}
To explain: it tidies up the user input, checks whether it can be a URL/domain at all, takes the host if it's a proper URL, deletes the path, and then, when only the raw host should be left, checks whether there is a DNS entry corresponding to it. Only if there is one does it return the validated domain. Otherwise it returns false.
Does this make sense?
(The $prefix argument can optionally be used to add a http:// to the url in order to render a hyperlink).
Retrieved results will be stored in database, so they need to be hack-safe.
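The clean-up steps described above can be sketched more compactly. This is only a sketch under stated assumptions: normalize_host() is a hypothetical name, the code needs PHP 7.1+ for the nullable return type, and it checks syntax only. The DNS lookup with checkdnsrr() from the question could still be applied to the returned host afterwards.

```php
<?php
// Hypothetical helper: reduce loosely formatted user input to a bare host name.
// Returns null when no plausible host can be extracted.
function normalize_host(string $input): ?string
{
    $input = strtolower(trim($input));
    if ($input === '') {
        return null;
    }
    // parse_url() only fills in the host when a scheme is present,
    // so assume http:// if the user left it out.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#', $input)) {
        $input = 'http://' . $input;
    }
    $host = parse_url($input, PHP_URL_HOST);
    // Require a string host with at least one dot (rejects inputs like "foo").
    if (!is_string($host) || strpos($host, '.') === false) {
        return null;
    }
    // FILTER_FLAG_HOSTNAME restricts the value to valid host name syntax.
    $valid = filter_var($host, FILTER_VALIDATE_DOMAIN, FILTER_FLAG_HOSTNAME);
    return $valid === false ? null : $valid;
}

var_dump(normalize_host('  HTTPS://Sub.Example.com/path?x=1 '));
var_dump(normalize_host('example.com/path'));
var_dump(normalize_host('foo'));
```

Letting parse_url() do the splitting avoids the chain of trim() and explode() calls. For the database part, a prepared statement is still needed regardless of how the value was validated.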

PHP check if url is valid

I wonder what would be the best way in php to check if provided url is valid... At first I tried with:
filter_var($url, FILTER_VALIDATE_URL) === false
But it does not accept www.example.com (without protocol). So I tried with a simple modification:
protected function checkReferrerUrl($url) {
if(strpos($url, '://') == false) {
$url = "http://".$url;
}
if(filter_var($url, FILTER_VALIDATE_URL) === false) {
return false;
}
return true;
}
Now it works fine with www.example.com, but it also accepts a plain foo, since that becomes http://foo. I don't think that is a valid public URL, though... so what would you suggest? Go back to a traditional regexp?
I recommend that you do not use filter_var with FILTER_VALIDATE_URL.
It has several side effects.
For example, these are valid URLs according to filter_var:
http://example.com/"><script>alert(document.cookie)</script>
http://example.ee/sdsf"f
Additionally FILTER_VALIDATE_URL does not support internationalized domain names (IDN).
I recommend using a regex combined with some additional checks afterwards (e.g. on the domain) for security reasons.
Without the security aspect, I use parse_url to extract the parts I need. But this function has a similar issue when the scheme (http/https) is missing.
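One way to combine the ideas from the question and this answer is to keep filter_var() as a first pass and add explicit checks on the parsed parts. The following is a minimal sketch, not a complete validator; looks_like_public_http_url() is a hypothetical name, and the "host must contain a dot" rule is an assumption about what counts as a public URL.

```php
<?php
// Hypothetical helper, not part of any library.
function looks_like_public_http_url(string $url): bool
{
    $url = trim($url);
    // Assume http:// when the user omitted the scheme.
    if (strpos($url, '://') === false) {
        $url = 'http://' . $url;
    }
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    $host = (string) parse_url($url, PHP_URL_HOST);
    // Require http(s) and at least one dot in the host,
    // which rejects inputs such as "foo" (normalized to "http://foo").
    return in_array($scheme, ['http', 'https'], true)
        && strpos($host, '.') !== false;
}

var_dump(looks_like_public_http_url('www.example.com'));
var_dump(looks_like_public_http_url('foo'));
```

A check like this only says that the string has the shape of a URL. It does not make the value safe to print into HTML, so output should still be escaped, for example with htmlspecialchars().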
Use this
<?php
$url = 'www.example.com';
if(validateURL($url)){
echo "Valid";
}else{
echo "invalid";
}
function validateURL($URL) {
$pattern_1 = "/^(http|https|ftp):\/\/(([A-Z0-9][A-Z0-9_-]*)(\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
$pattern_2 = "/^(www)((\.[A-Z0-9][A-Z0-9_-]*)+.(com|org|net|dk|at|us|tv|info|uk|co.uk|biz|se)$)(:(\d+))?\/?/i";
if(preg_match($pattern_1, $URL) || preg_match($pattern_2, $URL)){
return true;
} else{
return false;
}
}
?>
Try this one too
<?php
// Assign URL to $URL variable
$url = 'http://example.com';
// Check url using preg_match
if (preg_match("/^(https?:\/\/+[\w\-]+\.[\w\-]+)/i",$url)){
echo "Valid";
}else{
echo "invalid";
}
?>

Get URL with an if statement PHP

I am trying to get URL path and to save it as variable...
$setURL = true;
$getDomain = "http://".$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
$getSubdomain = "http://".$_SERVER['HTTP_HOST'].$_SERVER['REQUEST_URI'];
if ($setURL === true) {
$result = 'http://'.parse_url($getDomain, PHP_URL_HOST) . '/';
echo 'get domain';
} else {
$result = 'http://'.parse_url($getSubdomain, PHP_URL_HOST).parse_url($getSubdomain, PHP_URL_PATH);
echo 'get subdomain';
}
$siteURL = $result;
So basically, if I set $setURL = true; it returns the correct URL for a simple domain: http://domain-name.com
However, the else branch does not work as I want. It is there for subdomains, so if I set $setURL = false; it should return http://domain-name.com/path/
Unfortunately it returns more than that: it returns anything I type as the URL.
For http://domain-name.com/path/something/index.php it returns all of that!
Please help me fix this, as I have no idea how to make it work.
Formally, a subdomain precedes a domain name. For example, in ftp.debian.us, ftp is the subdomain.
It sounds like what you want is the first segment of the path in the URI. You can use PHP's explode() function to grab it.
$uriparts = explode('/', $_SERVER['REQUEST_URI']); // e.g. '/path/to/somewhere/index.html'
$path = $uriparts[1]; // = 'path' (index 0 is the empty string before the leading slash)
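Extracting the first path segment can also be wrapped in a small function that ignores the query string and copes with an empty path. This is a sketch only; first_path_segment() and build_site_url() are hypothetical names, and the http:// prefix simply mirrors the question's code.

```php
<?php
// Hypothetical helper: first segment of the path, or '' when there is none.
function first_path_segment(string $uri): string
{
    // PHP_URL_PATH drops the query string and fragment.
    $path = parse_url($uri, PHP_URL_PATH);
    if (!is_string($path)) {
        return '';
    }
    // Trimming the slashes puts the first real segment at index 0.
    $segments = explode('/', trim($path, '/'));
    return $segments[0];
}

// Hypothetical helper: rebuild "http://host/first-segment/" as the question asks.
function build_site_url(string $host, string $requestUri): string
{
    $segment = first_path_segment($requestUri);
    return 'http://' . $host . '/' . ($segment === '' ? '' : $segment . '/');
}

echo build_site_url('domain-name.com', '/path/something/index.php?x=1'), "\n";
```

In a request context the two arguments would be $_SERVER['HTTP_HOST'] and $_SERVER['REQUEST_URI'].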

PHP function to get the subdomain of a URL

Is there a function in PHP to get the name of the subdomain?
In the following example I would like to get the "en" part of the URL:
en.example.com
Here's a one line solution:
array_shift((explode('.', $_SERVER['HTTP_HOST'])));
Or using your example:
array_shift((explode('.', 'en.example.com')));
EDIT: Fixed "only variables should be passed by reference" by adding double parenthesis.
EDIT 2: Starting from PHP 5.4 you can simply do:
explode('.', 'en.example.com')[0];
Uses the parse_url function.
$url = 'http://en.example.com';
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
$subdomain = $host[0];
echo $subdomain;
For multiple subdomains
$url = 'http://usa.en.example.com';
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['host']);
$subdomains = array_slice($host, 0, count($host) - 2 );
print_r($subdomains);
You can do this by first getting the domain name (e.g. sub.example.co.uk => example.co.uk) and then using strstr to get the subdomains.
$testArray = array(
'sub1.sub2.example.co.uk',
'sub1.example.com',
'example.com',
'sub1.sub2.sub3.example.co.uk',
'sub1.sub2.sub3.example.com',
'sub1.sub2.example.com'
);
foreach($testArray as $k => $v)
{
echo $k." => ".extract_subdomains($v)."\n";
}
function extract_domain($domain)
{
if(preg_match("/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i", $domain, $matches))
{
return $matches['domain'];
} else {
return $domain;
}
}
function extract_subdomains($domain)
{
$subdomains = $domain;
$domain = extract_domain($subdomains);
$subdomains = rtrim(strstr($subdomains, $domain, true), '.');
return $subdomains;
}
Outputs:
0 => sub1.sub2
1 => sub1
2 =>
3 => sub1.sub2.sub3
4 => sub1.sub2.sub3
5 => sub1.sub2
http://php.net/parse_url
<?php
$url = 'http://user:password@sub.hostname.tld/path?argument=value#anchor';
$array=parse_url($url);
$array['host']=explode('.', $array['host']);
echo $array['host'][0]; // returns 'sub'
?>
As the only reliable source for domain suffixes are the domain registrars, you can't find the subdomain without their knowledge.
There is a list with all domain suffixes at https://publicsuffix.org. This site also links to a PHP library: https://github.com/jeremykendall/php-domain-parser.
Please find an example below. I also added the sample for en.test.co.uk which is a domain with a multi suffix (co.uk).
<?php
require_once 'vendor/autoload.php';
$pslManager = new Pdp\PublicSuffixListManager();
$parser = new Pdp\Parser($pslManager->getList());
$host = 'http://en.example.com';
$url = $parser->parseUrl($host);
echo $url->host->subdomain;
$host = 'http://en.test.co.uk';
$url = $parser->parseUrl($host);
echo $url->host->subdomain;
PHP 7.0: Use the explode function and create a list of all the results.
list($subdomain,$host) = explode('.', $_SERVER["SERVER_NAME"]);
Example: sub.domain.com
echo $subdomain;
Result: sub
echo $host;
Result: domain
Simply...
preg_match('/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i', $url, $match);
Just read $match[1]
Working example
It works perfectly with this list of urls
$url = array(
'http://www.domain.com', // www
'http://domain.com', // --nothing--
'https://domain.com', // --nothing--
'www.domain.com', // www
'domain.com', // --nothing--
'www.domain.com/some/path', // www
'http://sub.domain.com/domain.com', // sub
'опубликованному.значения.ua', // опубликованному ;)
'значения.ua', // --nothing--
'http://sub-domain.domain.net/domain.net', // sub-domain
'sub-domain.third-Level_DomaIN.domain.uk.co/domain.net' // sub-domain
);
foreach ($url as $u) {
preg_match('/(?:http[s]*\:\/\/)*(.*?)\.(?=[^\/]*\..{2,5})/i', $u, $match);
var_dump($match);
}
Simplest and fastest solution.
$sSubDomain = str_replace('.example.com','',$_SERVER['HTTP_HOST']);
$REFERRER = $_SERVER['HTTP_REFERER']; // Or other method to get a URL for decomposition
$domain = substr($REFERRER, strpos($REFERRER, '://')+3);
$domain = substr($domain, 0, strpos($domain, '/'));
// This line will return 'en' of 'en.example.com'
$subdomain = substr($domain, 0, strpos($domain, '.'));
Using regexes, string functions, parse_url() or combinations of them is not a real solution. Just test any of the proposed solutions with the domain test.en.example.co.uk; none of them gives a correct result.
The correct solution is to use a package that parses domains with the Public Suffix List. I recommend TLDExtract; here is sample code:
$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('test.en.example.co.uk');
$result->getSubdomain(); // will return (string) 'test.en'
$result->getSubdomains(); // will return (array) ['test', 'en']
$result->getHostname(); // will return (string) 'example'
$result->getSuffix(); // will return (string) 'co.uk'
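To see why a suffix list is needed at all, the idea can be illustrated without any library. The sketch below is not how TLDExtract works internally. It uses a tiny hard-coded list (SAMPLE_SUFFIXES, an assumption made for illustration) where a real application would load the full Public Suffix List, and split_host() is a hypothetical name.

```php
<?php
// Illustrative only: a real application should use the full Public Suffix List.
const SAMPLE_SUFFIXES = ['co.uk', 'org.uk', 'com', 'net', 'org', 'uk'];

// Hypothetical helper: split a host into subdomain and registrable domain.
function split_host(string $host, array $suffixes = SAMPLE_SUFFIXES): ?array
{
    $host = strtolower(rtrim($host, '.'));
    // Try the longest suffixes first so that "co.uk" wins over "uk".
    usort($suffixes, function ($a, $b) {
        return strlen($b) - strlen($a);
    });
    foreach ($suffixes as $suffix) {
        $tail = '.' . $suffix;
        if (substr($host, -strlen($tail)) !== $tail) {
            continue;
        }
        // Everything before the suffix; its last label is the registered name.
        $labels = explode('.', substr($host, 0, -strlen($tail)));
        $name = array_pop($labels);
        return [
            'subdomain' => implode('.', $labels),
            'domain'    => $name . '.' . $suffix,
        ];
    }
    return null; // No known suffix matched.
}

print_r(split_host('test.en.example.co.uk'));
```

With only the host string and no list, there is no way to tell that "co.uk" is a suffix while "example.com" is a registered name, which is why the simple explode()-based answers fail for such hosts.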
What I found the best and short solution is
array_shift(explode(".",$_SERVER['HTTP_HOST']));
For those who get 'Error: Strict Standards: Only variables should be passed by reference.'
Use like this:
$env = (explode(".",$_SERVER['HTTP_HOST']));
$env = array_shift($env);
$domain = 'sub.dev.example.com';
$tmp = explode('.', $domain); // split into parts
$subdomain = current($tmp);
print($subdomain); // prints "sub"
As seen in a previous question:
How to get the first subdomain with PHP?
There isn't really a 100% dynamic solution - I've just been trying to figure it out as well, and due to the different domain extensions (TLDs) this task would be really difficult without actually parsing all these extensions and checking them each time:
.com vs .co.uk vs org.uk
The most reliable option is to define a constant (or database entry etc.) that stores the actual domain name and remove it from the $_SERVER['SERVER_NAME'] using substr()
defined("DOMAIN")
|| define("DOMAIN", 'mymaindomain.co.uk');
function getSubDomain() {
if (empty($_SERVER['SERVER_NAME'])) {
return null;
}
$subDomain = substr($_SERVER['SERVER_NAME'], 0, -(strlen(DOMAIN)));
if (empty($subDomain)) {
return null;
}
return rtrim($subDomain, '.');
}
Now if you're using this function under http://test.mymaindomain.co.uk it will give you test or if you have multiple sub-domain levels http://another.test.mymaindomain.co.uk you'll get another.test - unless of course you update the DOMAIN.
I hope this helps.
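A variation on the same idea, sketched here as an assumption rather than as part of the answer above, first verifies that the host really ends with the configured domain before cutting it off. subdomain_of() is a hypothetical name, and the nullable return type needs PHP 7.1+.

```php
<?php
// Hypothetical helper: returns the subdomain part of $host relative to $base,
// '' when $host is exactly $base, or null when $host is not under $base.
function subdomain_of(string $host, string $base): ?string
{
    $host = strtolower($host);
    $base = strtolower($base);
    if ($host === $base) {
        return '';
    }
    // The leading dot prevents "notmymaindomain.co.uk" from matching.
    $tail = '.' . $base;
    if (substr($host, -strlen($tail)) !== $tail) {
        return null;
    }
    return substr($host, 0, -strlen($tail));
}

var_dump(subdomain_of('another.test.mymaindomain.co.uk', 'mymaindomain.co.uk'));
```

Returning null for unrelated hosts makes it possible to tell "no subdomain" apart from "wrong domain", which matters when the host comes from a request header.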
Simply
reset(explode(".", $_SERVER['HTTP_HOST']))
I'm doing something like this
$url = 'https://en.example.com';
$splitedBySlash = explode('/', $url);
$splitedByDot = explode('.', $splitedBySlash[2]);
$subdomain = $splitedByDot[0];
Suppose current url = sub.example.com
$host = array_reverse(explode('.', $_SERVER['SERVER_NAME']));
if (count($host) >= 3){
echo "Main domain is = ".$host[1].".".$host[0]." & subdomain is = ".$host[2];
// Main domain is = example.com & subdomain is = sub
} else {
echo "Main domain is = ".$host[1].".".$host[0]." & subdomain not found";
// "Main domain is = example.com & subdomain not found";
}
this is my solution, it works with the most common domains, you can fit the array of extensions as you need:
$SubDomain = explode('.', explode('|ext|', str_replace(array('.com', '.net', '.org'), '|ext|',$_SERVER['HTTP_HOST']))[0]);
// For www.abc.en.example.com
$host_Array = explode(".",$_SERVER['HTTP_HOST']); // Get HOST as array www, abc, en, example, com
array_pop($host_Array); array_pop($host_Array); // Remove com and example
array_shift($host_Array); // Remove www (Optional)
echo implode(".", $host_Array); // Combine array abc.en
I know I'm really late to the game, but here goes.
What I did was take the HTTP_HOST server variable ($_SERVER['HTTP_HOST']) and the number of letters in the domain (so for example.com it would be 11).
Then I used the substr function to get the subdomain. I did
$numberOfLettersInSubdomain = strlen($_SERVER['HTTP_HOST']) - 12;
$subdomain = substr($_SERVER['HTTP_HOST'], 0, $numberOfLettersInSubdomain);
I subtract 12 instead of 11 to account for the dot that separates the subdomain from the domain. So now if you entered test.example.com, the value of $subdomain would be test.
This is better than using explode because if the subdomain has a . in it, this will not cut it off.
if you are using drupal 7
this will help you:
global $base_path;
global $base_root;
$fulldomain = parse_url($base_root);
$splitdomain = explode(".", $fulldomain['host']);
$subdomain = $splitdomain[0];
$host = $_SERVER['HTTP_HOST'];
preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
$domain = $matches[0];
$url = explode($domain, $host);
$subdomain = str_replace('.', '', $url[0]);
echo 'subdomain: '.$subdomain.'<br />';
echo 'domain: '.$domain.'<br />';
From PHP 5.3 you can use strstr() with true parameter
echo strstr($_SERVER["HTTP_HOST"], '.', true); //prints en
Try this...
$domain = 'en.example.com';
$tmp = explode('.', $domain);
$subdomain = current($tmp);
echo($subdomain); // echo "en"
function get_subdomain($url=""){
if($url==""){
$url = $_SERVER['HTTP_HOST'];
}
$parsedUrl = parse_url($url);
$host = explode('.', $parsedUrl['path']);
$subdomains = array_slice($host, 0, count($host) - 2 );
return implode(".", $subdomains);
}
you can use this too
echo substr($_SERVER['HTTP_HOST'], 0, strrpos($_SERVER['HTTP_HOST'], '.', -5));
Maybe I'm late, but even though this post is old, many others will land on it just as I did.
Today the wheel has already been invented: there is an actively maintained library called php-domain-parser, which offers two mechanisms.
One is based on the Public Suffix List and the other on the IANA list.
It is simple and effective, and it lets us write small helpers for our projects while knowing that the underlying data is maintained, in a world where extensions and their variants change frequently.
Many of the answers given in this post do not pass a battery of unit tests that check current extensions and their multi-level variants, nor do they handle domains with extended characters.
Maybe it will serve you as it served me.
<?php
function get_domain($host) {
$parts = explode('.',$host);
$extension = $parts[count($parts)-1];
$name = $parts[count($parts)-2];
return $name.'.'.$extension;
}
echo get_domain("https://api.neoistone.com");
?>
If you only want what comes before the first period:
list($sub) = explode('.', 'en.example.com', 2);

php check if domain equals value, then perform action

I need to take a variable that contains a URL, and check to see if the domain equals a certain domain, if it does, echo one variable, if not, echo another.
$domain = "http://www.google.com/docs";
if ($domain == google.com)
{ echo "yes"; }
else
{ echo "no"; }
Im not sure how to write the second line where it checks the domain to see if $domain contains the url in the if statement.
This is done by using parse_url:
$host = parse_url($domain, PHP_URL_HOST);
if($host == 'www.google.com') {
// do something
}
Slightly more advanced domain-finding function.
$address = "http://www.google.com/apis/jquery";
if (get_domain($address) == "google.com") {
print "Yes.";
}
function get_domain($url)
{
$pieces = parse_url($url);
$domain = isset($pieces['host']) ? $pieces['host'] : '';
if (preg_match('/(?P<domain>[a-z0-9][a-z0-9\-]{1,63}\.[a-z\.]{2,6})$/i', $domain, $regs)) {
return $regs['domain'];
}
return false;
}
In addition to the other answers you could use a regex, like this one which looks for google.com in the domain name
$domain = "http://www.google.com/docs";
if (preg_match('{^http://[\w\.]*google.com/}i', $domain))
{
}
Have you tried parse_url()?
Note that you might also want to explode() the resulting domain on '.', depending on exactly what you mean by 'domain'.
You can use the parse_url function to divide the URL into the separate parts (protocol/host/path/query string/etc). If you also want to allow www.google.com to be a synonym for google.com, you'll need to add an extra substring check (with substr) that makes sure that the latter part of the host matches the domain you're looking for.
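The suffix check described above could look roughly like this. It is a sketch under the assumption that "same domain" means the host is either the domain itself or ends with a dot followed by the domain; url_in_domain() is a hypothetical name.

```php
<?php
// Hypothetical helper: true when the URL's host is $domain or a subdomain of it.
function url_in_domain(string $url, string $domain): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    // parse_url() yields null (no host) or false (malformed URL) on failure.
    if (!is_string($host)) {
        return false;
    }
    $host = strtolower($host);
    $domain = strtolower($domain);
    // Compare against ".google.com" rather than "google.com" so that
    // "notgoogle.com" does not count as a match.
    $tail = '.' . $domain;
    return $host === $domain || substr($host, -strlen($tail)) === $tail;
}

$domain = "http://www.google.com/docs";
echo url_in_domain($domain, 'google.com') ? "yes" : "no";
```

Anchoring the comparison at the end of the host also rejects hosts such as google.com.evil.example, which a plain substring search for google.com would accept.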
