I'm writing a PHP application where the user can enter a URL and some operations take place afterwards (further details not relevant to this question).
Requirement: If the user enters example.com, it should be converted to http://www.example.com.
The http:// part is straightforward, but I am struggling with the rules that determine whether www. is prepended. Since the URL could be anything that might work in a web browser, it could be localhost or 192.168.0.1, for example. For these, www. clearly shouldn't be prepended.
So the exclusion list so far is: "If the host is localhost or looks like an IPv4 address, don't prepend". But I expect there will be other cases that need to be covered. Could anyone advise, or suggest an alternative way of approaching this?
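You can check whether the user input is an IP address and decide whether to prepend "www." accordingly.
The user input can be "127.0.0.1", "127.0.0.1:8080", "http://127.0.0.1:8080" or "http://example.com:8080".
$input = "127.0.0.1:8080";
// trim() strips a set of characters, not a prefix, so remove the scheme with a regex
$hostPort = preg_replace('#^https?://#i', '', trim($input));
// Split off an optional port; the limit of 2 keeps this safe when no port is given
$parts = explode(':', $hostPort, 2);
$host = $parts[0];
$port = isset($parts[1]) ? ':' . $parts[1] : '';
if (filter_var($host, FILTER_VALIDATE_IP)) {
    header("Location: http://$host$port");
} else {
    header("Location: http://www.$host$port");
}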
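As for the "other cases" the question asks about, here is a rough sketch of one possible rule set, assuming PHP 7 or later. The function names shouldPrependWww() and normalizeUrl() are made up for illustration. It skips IP addresses (v4 and v6), single-label hosts such as localhost, and hosts that already start with www. Whether a host that already has another subdomain should get www. is a policy decision this sketch does not settle (it prepends).

```php
<?php
// Hypothetical helper: decide whether "www." should be prepended to a host.
function shouldPrependWww(string $host): bool
{
    // Drop the brackets that surround IPv6 literals in URLs.
    $host = strtolower(trim($host, '[]'));

    // IPv4 and IPv6 addresses never get a "www." prefix.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return false;
    }
    // Single-label names such as "localhost" or an intranet host.
    if (strpos($host, '.') === false) {
        return false;
    }
    // Already prefixed.
    if (strpos($host, 'www.') === 0) {
        return false;
    }
    return true;
}

// Hypothetical helper: turn loose user input into an absolute URL.
function normalizeUrl(string $input): string
{
    $input = trim($input);
    // parse_url() only recognises the host reliably when a scheme is present.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $input)) {
        $input = 'http://' . $input;
    }
    $parts = parse_url($input);
    if ($parts === false || empty($parts['host'])) {
        throw new InvalidArgumentException("Cannot parse URL: $input");
    }
    $host = $parts['host'];
    if (shouldPrependWww($host)) {
        $host = 'www.' . $host;
    }
    // Reassemble scheme, host, optional port, path and query.
    $url = $parts['scheme'] . '://' . $host;
    if (isset($parts['port'])) {
        $url .= ':' . $parts['port'];
    }
    $url .= $parts['path'] ?? '';
    if (isset($parts['query'])) {
        $url .= '?' . $parts['query'];
    }
    return $url;
}
```

Calling normalizeUrl('example.com') gives http://www.example.com, while localhost and 192.168.0.1 only get the scheme added.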
I have a problem that I thought would be simple to solve, but I cannot figure it out.
I have a database full of URLs like:
http://www.domain.com/page.html
http://domain.com/page.html
http://sub.domain.co.in/page.html
http://sub.sub.domain.it/page.html
http://other.domain.net/page.html
http://ok.domain.com/ok.html
etc...
Now, given http://www.domain.co.uk/page.html,
I need to figure out whether that page is already in the database, assuming that a different extension does not change the content.
The final goal is simple: I am building a site where people can submit pages, and those pages need to be unique to avoid duplicate content. Users are submitting Google Maps .com and Google Maps .co.in, creating duplicates of the same page. What I need to do is figure out whether the submitted page has already been submitted with a different domain extension. I will also check the title and content if a match is found, just in case the domain extension DOES change the content (like www.wyska.net and www.wyska.com).
in other words:
maps.google.com === maps.google.it === maps.google.co.in === maps.google.co.uk .....
only if the content is "similar" (I will also have to work out what "similar" means)
so far I have (but it doesn't work):
<?php
$url = 'http://www.domain.com/text.html'; //works with this domain
$parse = parse_url($url);
var_dump($parse);
var_dump(pathinfo($parse['host']));
$url = 'http://sub.sub.domain.co.in/text.html'; //does not work with this domain
$parse = parse_url($url);
var_dump($parse);
var_dump(pathinfo($parse['host']));
?>
If necessary, I can even break the domain into different parts and store those parts rather than the full domain.
I was thinking of doing a search and replace on the domain extension, but I haven't been able to find a full list of domain extensions to use. Something like: if the host ends with any of those strings, remove that part from the domain.
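The "if it ends with any of those strings, remove that part" idea could look roughly like the sketch below (PHP 7 or later assumed). The SAMPLE_SUFFIXES constant and the domainKey() function are hypothetical names, and the suffix list is a tiny hand-picked sample. A real version would load the Public Suffix List (publicsuffix.org) instead, which is the closest thing to a full list of domain extensions.

```php
<?php
// Hypothetical, hand-picked sample of suffixes; a real deployment would load
// the full Public Suffix List instead of hard-coding a few entries.
const SAMPLE_SUFFIXES = ['com', 'net', 'it', 'in', 'uk', 'co.in', 'co.uk'];

// Hypothetical helper: reduce a URL to a suffix-independent key such as
// "maps.google", which can be stored and compared instead of the full host.
function domainKey(string $url, array $suffixes = SAMPLE_SUFFIXES): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // no recognisable host in the input
    }
    $host = strtolower($host);
    // Treat www.domain.com and domain.com as the same site.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    // Pick the longest matching suffix so "co.in" wins over "in".
    $best = '';
    foreach ($suffixes as $suffix) {
        $needle = '.' . $suffix;
        if (strlen($needle) > strlen($best)
            && substr($host, -strlen($needle)) === $needle) {
            $best = $needle;
        }
    }
    return $best === '' ? $host : substr($host, 0, -strlen($best));
}
```

With this, maps.google.com and maps.google.co.in both reduce to maps.google, so a lookup on the stored key finds candidates for the title and content comparison.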
I am using the following code to check to see if the dns entry exists for a domain (basically to validate the domain is real)...
$sdip = gethostbyname("www.nonexistancedomainsdlkmsaldsa.com.");
$sddomain = gethostbyaddr($sdip);
print "IP: $sdip\n";
print "Domain: $sddomain\n";
It is returning....IP: 67.215.65.132 Domain: hit-nxdomain.opendns.com
So I am assuming that it is in fact searching for www.nonexistancedomainsdlkmsaldsa.com.myhostname.com (not exactly sure on this).
I read the article on php.net about adding a . following the domain, so I tried this and it didn't change anything. Anyone understand why I'm getting this? Basically, here is what I am trying to do...
User enters domain name (no http/https)....it checks to see if domain exists, if it does, it continues my script, if it does not exist, it fails with an error message.
I want them to be able to enter ANY valid domain name and it check it. So any other solutions would be helpful if there is anything better. Basically I am using this as a validation to see if domain name exists.
I would also (if at all possible) like to find out how to use a regexp to check the format sub.subdomain.domain.tld, where sub and subdomain are not required (basically to see whether the format is valid). I have searched Stack Overflow, but the answers there don't go into the detail I need to allow sub.subdomain without requiring them.
article on php.net:
If you do a gethostbyname() and there is no trailing dot after a
domainname that does not resolve, this domainname will ultimately be
appended to the server-FQDN by nslookup.
So if you do a lookup for nonexistentdomainname.be your server may
return the ip for nonexistentdomainname.be.yourhostname.com, which is
the server-ip.
To avoid this behaviour, just add a trailing dot to the domainname;
i.e. gethostbyname('nonexistentdomainname.be.')
EDIT
It isn't in fact returning my server IP (I overlooked that). So I don't know whether that article is relevant to this problem or not. I will leave it in just in case it is.
It sounds like you're using OpenDNS as your DNS server. When you query OpenDNS for a non-existent domain, it redirects you to one of its own "search result" pages/servers, which is why the lookup returns an IP address that reverse-resolves to hit-nxdomain.opendns.com. This behaviour is specific to resolvers that do this kind of redirection, such as OpenDNS; a standard DNS server should correctly return no result.
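The format check asked about in the question is independent of the DNS lookup and can be done with a regular expression. A minimal sketch, assuming PHP 7 or later, with a made-up function name isValidHostnameFormat(): it accepts any number of labels in front of the TLD, so sub. and subdomain. are allowed but not required. It assumes a purely alphabetic TLD, so internationalised (xn--) TLDs would be rejected.

```php
<?php
// Hypothetical helper: check that a name looks like domain.tld,
// sub.domain.tld, sub.subdomain.domain.tld and so on.
// This checks the format only; it does not tell you whether the domain exists.
function isValidHostnameFormat(string $name): bool
{
    // Allow a single trailing dot (fully qualified form).
    if (substr($name, -1) === '.') {
        $name = substr($name, 0, -1);
    }
    // Each label: 1 to 63 characters, letters, digits or hyphens,
    // not starting or ending with a hyphen. Whole name: at most 253 characters.
    // Assumption: the TLD consists of letters only.
    $pattern = '/^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$/iD';

    return preg_match($pattern, $name) === 1;
}
```

Running this before the DNS lookup lets you reject malformed input early, whatever the resolver does with non-existent names.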
I'm building an application that uses sub domains and custom domain names that sit in the database for users, so if a request comes from another domain I'll check from the database if that custom url is indeed there or when the request comes from a subdomain, I'll check if that's there. If it is I do my stuff.
Consider this a simple example of what I'm looking for:
if(is_user_request())
{
$url = get_url();
// assuming that get_url() magically decides whether to output ..
// a custom domain (http://domain.tld)
// or a subdomain's first part (eg. "this".domain.tld)
}
else
{
// otherwise it's not a sub domain nor a custom domain,
// so we're dealing with our own main site.
}
Now before you go ahead assuming that because I have 0 rep, I'm here asking for "teh codes". I have a completely working way of doing this, which is the following:
// hosts
$hosts = explode('.', $_SERVER['HTTP_HOST']);
// if there is a subdomain and that's under our $sitename
if(!empty($hosts[1]) AND $hosts[1] === Config::get('domain_mid_name'))
{
$url = $hosts[0];
$url_custom = false;
}
// if there is no subdomain, but the domain is our $sitename
elseif(!empty($hosts[0]) AND $hosts[0] === Config::get('domain_mid_name') AND !empty($hosts[1]) AND $hosts[1] !== Config::get('domain_mid_name'))
{
$url = false;
$url_custom = false;
}
// otherwise it's most likely that the request
// came from a entirely different domain name.
// which means it's probably $custom_site
else
{
$url = false;
$url_custom = implode('.', $hosts);
}
if($url)
{
return $url;
}
if($url_custom)
{
return $url_custom;
}
However, I'm sure there are better ways of doing this. First of all, HTTP_HOST does not include 'http://', so I need to add that manually, and I'm pretty sure this entire if/else construct is overkill. So, people smarter than me, enlighten me, please.
Oh and, no .. I do not have pre-defined sub-domains. I have a simple wildcard *.domain.tld set up, so all sub-domains go to the main script. I'm just saying this because from my search for a solution I found numerous answers that suggested to manually create a sub-domain, which is not even remotely related to what I'm asking, so let's skip that subject.
$_SERVER['HTTP_HOST'] is the correct way to do it unless you want to pass different parameters from your web server into PHP.
As for the protocol, be aware that the request protocol should be determined from $_SERVER['HTTPS'] rather than assumed to be http.
To extract the subdomain, you could use array_shift. It takes its argument by reference, so store the result of explode() in a variable first:
$parts = explode('.', $_SERVER['HTTP_HOST']);
$subdomain = array_shift($parts);
But generally what you have is how it should be done.
As already said, $_SERVER['HTTP_HOST'] is the way to go.
But there are errors in your code. You're assuming that host names sent consist of 2 or 3 components but you can't be sure of that. You should at least check count($hosts) too.
If, for example, you use domain.tld for your own site, you're better off first checking whether domain.tld itself was sent (you return your page, fast); then check whether substr($_SERVER['HTTP_HOST'], -11) === '.domain.tld' and, if so, return the subsite (this works with any level of subdomain and is still fast); otherwise do error recovery, since a completely foreign domain has been routed to you. The key thing to note is that matching domains from the top of the hierarchy means matching the hostname strings right-aligned:
.domain.tld | subsite-pattern
sub12.domain.tld | MATCH
sub12.dumain.tld | NO MATCH
sub12domain.tld | NO MATCH
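The right-aligned matching from the table above could be sketched like this (PHP 7 or later assumed). The function name classifyHost() and the shape of the returned array are made up for illustration. The length of the suffix is computed rather than hard-coded, and the leading dot in the suffix is what rejects sub12domain.tld.

```php
<?php
// Hypothetical helper: classify a Host header against our own base domain.
// Returns ['type' => 'main' | 'subdomain' | 'custom', 'value' => string|null].
function classifyHost(string $hostHeader, string $baseDomain): array
{
    // Lower-case and drop an optional ":port" suffix.
    $host = strtolower(preg_replace('/:\d+$/', '', $hostHeader));
    $base = strtolower($baseDomain);

    // 1. Exact match: our own main site.
    if ($host === $base) {
        return ['type' => 'main', 'value' => null];
    }

    // 2. Right-aligned match on ".domain.tld": a subsite at any depth.
    $suffix = '.' . $base;
    if (strlen($host) > strlen($suffix)
        && substr($host, -strlen($suffix)) === $suffix) {
        return ['type' => 'subdomain', 'value' => substr($host, 0, -strlen($suffix))];
    }

    // 3. Anything else is a foreign host, e.g. a custom domain to look up.
    return ['type' => 'custom', 'value' => $host];
}
```

Note that www.domain.tld is reported as the subdomain "www" here; add a special case if that should count as the main site.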
I have my site on the server http://www.myserver.uk.com.
On this server I have two domains:
one.com and two.com
I would like to get the current domain using PHP, but if I use $_SERVER['HTTP_HOST'] then it is showing me
myserver.uk.com
instead of:
one.com or two.com
How can I get the domain, and not the server name?
Try using this:
$_SERVER['SERVER_NAME']
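Or parse:
$_SERVER['REQUEST_URI']
(Note that REQUEST_URI normally holds only the path and query string, so on its own it will not give you the domain.)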
Reference: apache_request_headers()
The best use would be
echo $_SERVER['HTTP_HOST'];
And it can be used like this:
if (strpos($_SERVER['HTTP_HOST'], 'banana.com') !== false) {
echo "Yes this is indeed the banana.com domain";
}
This code below is a good way to see all the variables in $_SERVER in a structured HTML output with your keywords highlighted that halts directly after execution. Since I do sometimes forget which one to use myself - I think this can be nifty.
<?php
// Change banana.com to the domain you were looking for..
$wordToHighlight = "banana.com";
$serverVarHighlighted = str_replace( $wordToHighlight, '<span style=\'background-color:#883399; color: #FFFFFF;\'>'. $wordToHighlight .'</span>', $_SERVER );
echo "<pre>";
print_r($serverVarHighlighted);
echo "</pre>";
exit();
?>
The only secure way of doing this
The only guaranteed secure method of retrieving the current domain is to store it in a secure location yourself.
Most frameworks take care of storing the domain for you, so you will want to consult the documentation for your particular framework. If you're not using a framework, consider storing the domain in one of the following places:
Secure methods of storing the domain | Used By
A configuration file | Joomla, Drupal/Symfony
The database | WordPress
An environmental variable | Laravel
A service registry | Kubernetes DNS
The following work... but they're not secure
Hackers can make the following variables output whatever domain they want. This can lead to cache poisoning and barely noticeable phishing attacks.
$_SERVER['HTTP_HOST']
This gets the domain from the request headers which are open to manipulation by hackers. Same with:
$_SERVER['SERVER_NAME']
This one can be made better if the Apache setting UseCanonicalName is turned on, in which case $_SERVER['SERVER_NAME'] can no longer be populated with arbitrary values and will be secure. This is, however, non-default and not as common a setup.
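If you do need to look at the request host (for example, to tell two domains on one server apart), one option is to compare it against a list of domains you have stored yourself and fall back to a known-good value otherwise. A minimal sketch, assuming PHP 7 or later; the function name trustedHost() and its parameters are made up, and the trusted list would come from your configuration file, database or environment.

```php
<?php
// Hypothetical helper: return the request host only if it is on a trusted
// list that you stored yourself; otherwise return a configured fallback.
function trustedHost(array $server, array $trustedHosts, string $fallback): string
{
    // The Host header is attacker-controlled, so treat it as untrusted input.
    $host = strtolower($server['HTTP_HOST'] ?? '');
    // Drop an optional ":port" suffix before comparing.
    $host = preg_replace('/:\d+$/', '', $host);

    // Strict comparison against the stored list; no partial matching.
    return in_array($host, $trustedHosts, true) ? $host : $fallback;
}
```

A call might look like trustedHost($_SERVER, ['one.com', 'two.com'], 'one.com'). Any spoofed header then resolves to the fallback instead of ending up in generated links.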
In popular systems
Below is how you can get the current domain in the following frameworks/systems:
WordPress
$urlparts = parse_url(home_url());
$domain = $urlparts['host'];
If you're constructing a URL in WordPress, just use home_url or site_url, or any of the other URL functions.
Laravel
request()->getHost()
The request()->getHost function is inherited from Symfony, and has been secure since the 2013 CVE-2013-4752 was patched.
Drupal
The installer does not yet take care of making this secure (issue #2404259). But in Drupal 8 there is documentation you can follow at Trusted Host Settings to secure your Drupal installation, after which the following can be used:
\Drupal::request()->getHost();
Other frameworks
Feel free to edit this answer to include how to get the current domain in your favorite framework. When doing so, please include a link to the relevant source code or to anything else that would help me verify that the framework is doing things securely.
Addendum
Exploitation examples:
Cache poisoning can happen if a botnet continuously requests a page using the wrong Host header. The resulting HTML will then include links to the attacker's website, where they can phish your users. At first the malicious links will only be sent back to the hacker, but if the hacker makes enough requests, the malicious version of the page will end up in your cache, where it will be distributed to other users.
A phishing attack can happen if you store links in the database based on the Host header. For example, let's say you store the absolute URL to a user's profile on a forum. By using the wrong header, a hacker could get anyone who clicks on their profile link to be sent to a phishing site.
Password reset poisoning can happen if a hacker uses a malicious Host header when filling out the password reset form for a different user. That user will then get an email containing a password reset link that leads to a phishing site. Another, more complex form of this skips the user having to do anything by getting the email to bounce and resend to one of the hacker's SMTP servers (for example, CVE-2017-8295).
Here are some more malicious examples
Additional Caveats and Notes:
When UseCanonicalName is turned off, $_SERVER['SERVER_NAME'] is populated with the same header $_SERVER['HTTP_HOST'] would have used anyway (plus the port). This is Apache's default setup. If you or DevOps turn it on then you're okay -- ish -- but do you really want to rely on a separate team, or yourself three years in the future, to keep what would appear to be a minor configuration at a non-default value? Even though this makes things secure, I would caution against relying on this setup.
Red Hat, however, does turn UseCanonicalName on by default [source].
If ServerAlias is used in the virtual hosts entry, and the aliased domain is requested, $_SERVER['SERVER_NAME'] will not return the current domain, but will return the value of the ServerName directive.
If the ServerName cannot be resolved, the operating system's hostname command is used in its place [source].
If the Host header is left out, the server will behave as if UseCanonicalName was on [source].
Lastly, I just tried exploiting this on my local server, and was unable to spoof the Host header. I'm not sure if there was an update to Apache that addressed this, or if I was just doing something wrong. Regardless, this header would still be exploitable in environments where virtual hosts are not being used.
A Little Rant:
This question received hundreds of thousands of views without a single mention of the security problems at hand! It shouldn't be this way, but just because a Stack Overflow answer is popular, that doesn't mean it is secure.
Using $_SERVER['HTTP_HOST'] gets me (subdomain.)maindomain.extension. It seems like the easiest solution to me.
If you're actually 'redirecting' through an iFrame, you could add a GET parameter which states the domain.
<iframe src="myserver.uk.com?domain=one.com"/>
And then you could set a session variable that persists this data throughout your application.
Try $_SERVER['SERVER_NAME'].
Tips: Create a PHP file that calls the function phpinfo() and see the "PHP Variables" section. There are a bunch of useful variables we never think of there.
To get the domain:
$_SERVER['HTTP_HOST']
Domain with protocol:
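// SERVER_PROTOCOL holds a value like "HTTP/1.1" and never contains "https", so check $_SERVER['HTTPS'] instead
$protocol = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https' : 'http';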
$domainLink = $protocol . '://' . $_SERVER['HTTP_HOST'];
Protocol, domain, and queryString total:
$url = $protocol . '://' . $_SERVER['HTTP_HOST'] . '?' . $_SERVER['QUERY_STRING'];
Note: $_SERVER['SERVER_NAME'] is not reliable for multi-domain hosting!
I know this might not be entirely on the subject, but in my experience, I find storing the WWW-ness of the current URL in a variable useful.
In addition, please see my comment below, to see what this is getting at.
This is important when determining whether to dispatch Ajax calls with "www", or without:
$.ajax("url" : "www.site.com/script.php", ...
$.ajax("url" : "site.com/script.php", ...
When dispatching an Ajax call, the domain name must match the one in the browser's address bar; otherwise you will get an Uncaught SecurityError in the console.
So I came up with this solution to address the issue:
<?php
// true when the host starts with "www." (the dot avoids matching hosts like wwwexample.com)
$WWW = substr($_SERVER['SERVER_NAME'], 0, 4) === 'www.';
if ($WWW) {
/* We have www.example.com */
} else {
/* We have example.com */
}
?>
Then, based on whether $WWW is true, or false run the proper Ajax call.
I know this might sound trivial, but this is such a common problem that is easy to trip over.
Everybody is using the parse_url function, but sometimes a user may pass the argument in different formats.
So as to fix that, I have created a function. Check this out:
function fixDomainName($url='')
{
$strToLower = strtolower(trim($url));
$httpPregReplace = preg_replace('/^http:\/\//i', '', $strToLower);
$httpsPregReplace = preg_replace('/^https:\/\//i', '', $httpPregReplace);
$wwwPregReplace = preg_replace('/^www\./i', '', $httpsPregReplace);
$explodeToArray = explode('/', $wwwPregReplace);
$finalDomainName = trim($explodeToArray[0]);
return $finalDomainName;
}
Just pass the URL and get the domain.
For example,
echo fixDomainName('https://stackoverflow.com');
will return:
stackoverflow.com
It also handles input without a scheme, for example:
echo fixDomainName('stackoverflow.com/questions/id/slug');
And it will also return stackoverflow.com.
This quick & dirty approach works for me.
Whichever way you get the string containing the domain you want to extract, i.e. using a super global -$_SERVER['SERVER_NAME']- or, say, in Drupal: global $base_url, regex is your friend:
global $base_url;
preg_match("/\w+\.\w+$/", $base_url, $matches);
$domain = $matches[0];
The particular regex string I am using in the example will only capture the last two components of the $base_url string, of course (so for a host like domain.co.uk it yields co.uk), but you can add as many "\w+\." as desired.
Hope it helps.