I'm looking for a way to disable PHP's use of include_path in a project.
Background
Perhaps you're surprised: Why would I want to prevent the use of such a useful feature?
Unfortunately, while I'd consider the answer simple, it's not simple to accurately describe:
In brief, include_path is (obviously) package-/library-blind.
More specifically, it's possible to put two different versions of a library into two folders, both of which are in include_path, and end up with executed code that is a mixture of both. Consider the following setting:
include_path=/some/server/path/lib:/var/www/apache/htdocs/yourApplication/library
Now imagine the canonical location for a library you want to use is in /some/server/path/lib and your build process places it there, but a developer is trying to patch a part of it and erroneously syncing the library to /var/www/apache/htdocs/yourApplication/library. Now imagine this happens:
/some/server/path/lib/moduleYouWant
'--> A.php (old version)
/var/www/apache/htdocs/yourApplication/
'--> SomeFile.php (uses B.php from module; previously used A.php)
/var/www/apache/htdocs/yourApplication/library/moduleYouWant
'--> A.php (new version)
'--> B.php (new file; uses A.php from module)
Suddenly, your application will mysteriously use or autoload (if your autoloader puts include_path to use, which most framework-supplied autoloaders do, as far as I'm aware) only half of the changes you made - the new B.php but the old A.php.
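The mix-up can be reproduced in a few lines. The following is only an illustrative sketch: it builds two throwaway directories under the system temp dir as stand-ins for the two library locations above (the directory names are assumptions), then asks PHP which file each name resolves to using stream_resolve_include_path().

```php
<?php
// Stand-ins for the two library locations; created under the system temp dir.
$base      = sys_get_temp_dir() . '/include_path_demo_' . uniqid();
$canonical = $base . '/lib';      // plays the role of /some/server/path/lib
$stray     = $base . '/library';  // plays the role of .../yourApplication/library

mkdir($canonical . '/moduleYouWant', 0777, true);
mkdir($stray . '/moduleYouWant', 0777, true);

file_put_contents($canonical . '/moduleYouWant/A.php', '<?php // old A');
file_put_contents($stray . '/moduleYouWant/A.php', '<?php // new A');
file_put_contents($stray . '/moduleYouWant/B.php', '<?php // new B');

// Same order as in the setting above: canonical location first.
$previous = set_include_path($canonical . PATH_SEPARATOR . $stray);

$resolvedA = stream_resolve_include_path('moduleYouWant/A.php');
$resolvedB = stream_resolve_include_path('moduleYouWant/B.php');

set_include_path($previous);

// A.php comes from the first directory (old version),
// B.php exists only in the second directory (new file).
echo $resolvedA, PHP_EOL;
echo $resolvedB, PHP_EOL;
```

Each name is resolved independently against the list, which is exactly why the result can be a blend of two versions.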
You may, of course, object to the setup I've described on grounds that it should never happen for other reasons. That's fair. I'm likely to agree, even. The above scenario is a backwards-incompatible change, for example, and that's not very nice.
Nonetheless, so far I've seen it happen twice in the wild (in a sufficiently complex project, with chroots muddying the water) and it's eaten up countless confused debugging hours... and I don't want to have that issue again. Not in this sort of constellation - or any other.
I don't want PHP magically trying to find my files. I want PHP to demand I be specific about which files to load.
Why I don't need include_path
With __DIR__ (or dirname(__FILE__) in older PHP versions), semantically 'relative' paths are easy to construct. And if I need live- and test-server distinctions, constants defining absolute locations are my friend.
So... I want non-absolute paths supplied to include() (and related functions) to fail.
This Stack Overflow question from 2011 tells me that I can't blank include_path. Is there perhaps some other way I can disable the feature? My Google-fu is weak, but nonetheless I have a creeping suspicion the answer is 'no (unless you patch PHP itself)'.
If someone knows of a way I can put a stop to this other than by code convention agreement (or the related "please don't use include() but this custom function I wrote"), I'd love to know. :)
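For reference, the convention-based fallback mentioned above could look roughly like the sketch below. The function names is_absolute_path() and require_absolute() are hypothetical, not part of PHP, and the absolute-path check is a simple assumption covering Unix paths, Windows drive letters and UNC paths.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: true for "/unix/path", "C:\path", "C:/path" and UNC paths.
 */
function is_absolute_path(string $path): bool
{
    return $path !== ''
        && ($path[0] === '/'
            || preg_match('#^[A-Za-z]:[\\\\/]#', $path) === 1
            || strncmp($path, '\\\\', 2) === 0);
}

/**
 * Hypothetical wrapper: refuses anything include_path could influence.
 * Note that the required file runs in this function's scope, not the caller's.
 */
function require_absolute(string $path)
{
    if (!is_absolute_path($path)) {
        throw new InvalidArgumentException("Refusing non-absolute include: $path");
    }
    return require $path;
}
```

A call such as require_absolute(__DIR__ . '/library/moduleYouWant/A.php') would pass the check, while require_absolute('moduleYouWant/A.php') would throw before PHP ever consults include_path.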
You indicate that the root of the problem is that you have developers writing bad relative includes where they should be using absolute includes or autoloading.
The solution to this is not technical but cultural. You can enforce correct behavior by adding a pre-commit-hook to your versioning-system that tries to detect erroneous includes (either relative includes, or just all includes) and block the commit.
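As a rough idea of what such a hook could check, here is a minimal sketch. The function name find_relative_includes() is hypothetical, and the regular expression is a deliberately simple heuristic: it only flags include/require statements whose argument starts with a string literal that does not begin with "/". It would also flag Windows-style absolute literals and matching text inside comments.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical heuristic: returns the 1-based line numbers on which an
 * include/require is given a string literal that does not start with "/".
 * Paths built from __DIR__ or a constant do not start with a quote at that
 * position, so they are not flagged.
 */
function find_relative_includes(string $source): array
{
    $pattern = '/\b(?:include|require)(?:_once)?\s*\(?\s*[\'"](?!\/)/';
    $hits = [];
    foreach (explode("\n", $source) as $index => $line) {
        if (preg_match($pattern, $line) === 1) {
            $hits[] = $index + 1;
        }
    }
    return $hits;
}
```

A hook script could run this over the contents of each staged PHP file and exit with a non-zero status when the returned list is not empty, which is what makes the versioning system reject the commit.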
Just use the following:
set_include_path("./");
Then, when you need to include a file outside the current folder, you have to specify a fully qualified relative or absolute path.
Related
Whenever I want to include a document with PHP, or perform any other PHP action which requires a path to be described, I need to write something like, ../../../../../document.html. This works, but it's tedious, and in some cases, the path is wrong, resulting in code appearing on-page, and debugging.
This can, obviously, be bypassed by using the $_SERVER['DOCUMENT_ROOT'] variable, but that, too, requires a sometimes unmanageable amount of code (again, when many, many paths are present).
Is there any way to simply define all PHP paths site-wide to be document root-relative, as in HTML (/document.html is root relative)?
I have a detailed answer on this in another question:
finding a file in php that is 4 directories up
It explains the caveats of relative file paths in PHP. Use the magic constants and server variables mentioned there to overcome relative path issues.
Yes. Most experienced developers define constants in a config file for the paths that matter to the application. For example, you might define the web server document root as your application root, and have another path outside the web server directory where you place application includes (classes, etc.) that you don't want exposed in the web directory.
define('WEB_ROOT', $_SERVER['DOCUMENT_ROOT']);
define('INCLUDE_DIR', '/path/to/directory/');
You can then just reference these constants in your application.
I would certainly recommend moving away from relative paths, as they are problematic when refactoring your code or moving it from one server to another. If you need relative-style paths (for app portability, for example), you might be better served by the PHP magic constants __FILE__ and __DIR__.
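A minimal sketch of the magic-constant variant follows. It assumes a config file in the application root with an includes directory next to it; the names APP_ROOT, APP_INCLUDE_DIR and app_include_path() are hypothetical and only serve the illustration.

```php
<?php
// config.php, assumed to live in the application root (hypothetical layout).
define('APP_ROOT', __DIR__);
define('APP_INCLUDE_DIR', APP_ROOT . DIRECTORY_SEPARATOR . 'includes');

/**
 * Hypothetical helper: builds an absolute path below APP_INCLUDE_DIR.
 */
function app_include_path(string $relative): string
{
    return APP_INCLUDE_DIR . DIRECTORY_SEPARATOR . ltrim($relative, '/\\');
}

// Usage elsewhere in the application (the file name is a placeholder):
// require app_include_path('classes/User.php');
echo app_include_path('classes/User.php'), PHP_EOL;
```

Because everything hangs off __DIR__, moving the whole application to another server or directory requires no path changes.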
I used to do that and have those problems. Then I switched my site to use mod_rewrite for the urls. I then had all of my php pages in the same directory so I didn't have to go a confusing 4 times up the directory structure to find the root. You can have a php file on your server at:
/var/www/index.php
And, using mod_rewrite in your .htaccess file, you can have that map to:
http://domain.com/really/long/path/structure/page.html
When I moved over to that structure, it really helped me on the php side of things specifically regarding navigating to different directories.
Ok, to start with - I am addicted to using a root relative link structure for everything. Include in php always makes that difficult for me, but I happened upon a line of code that lets me include root-relatively.
It's really simple:
set_include_path( get_include_path() . PATH_SEPARATOR . $_SERVER['DOCUMENT_ROOT'] );
That's from a comment in the php manual
I have a pretty simple php site, but with many different subdirectories, and this makes it easy to use. Also - the company may be switching servers soon, and I am thinking this may ease the transition for many sites.
So is there a security risk here? I don't dynamically include files or remotely include them. Am I taking a performance hit including this at the top of every php file? or is it negligible?
There is no security risk as long as you control what you put in the include_path.
There is, however, a performance hit if you have too many paths in your include_path (as PHP will have to try each path before finding the file).
Given your code, the docroot is at the end of the include_path, so you'll only see a performance hit when an included file isn't found in the rest of the include_path (ie a missing file).
I'm in the middle of making my own custom forum system software. Much like phpbb, mybb, vbulletin, etc. except it's obviously quite less advanced. It's just a personal project for myself and I've run into some problems since I've never had to develop something that can be repackaged for others.
The file structure is as follows:
So, config.php is the end all be all of including files. It has the database connection information, it instantiates my database class as well, and none of the function files require/include any files since they'll always be accessed where config.php is required.
HERE'S THE QUESTION!
However, I'm running into simple but very annoying problems. For example, towards the top of config.php I call a function that checks the user's cookie values and makes sure they all belong to the same user, and if not it deletes the cookies. However, that call has to come after the database file is required. Also, a variable declared in config.php isn't always accessible, so sometimes I have to declare it in the header files.
Seems like it's not much of a question, but I guess it's just asking for how I can include/require in general without running into issues.
As a general note, most people don't mix config variables and code in one file. If you look at popular open source packages like WordPress, the config file just has config variables set. No code.
If you're using certain functions in anything more than a "one off" situation, you may want to consider putting them into your main class - that way they're available as needed.
@James is right, separate your config file. You can include it inside an "application.php" required file (so it's available globally).
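One way to sketch that separation is shown below. The setting names and values are placeholders, and the config file is written to a temp location only so that the example is self-contained; in a real project it would be an ordinary file on disk.

```php
<?php
// For a self-contained demo the config file is generated here; normally it
// would simply exist as config.php.
$configFile = sys_get_temp_dir() . '/forum_config_' . uniqid() . '.php';
file_put_contents($configFile, <<<'PHP'
<?php
// Settings only: no function calls, no output. Values are placeholders.
return [
    'db_host' => 'localhost',
    'db_name' => 'forum',
];
PHP
);

// application.php (hypothetical bootstrap): settings first, then the code
// that depends on them, in a fixed order.
$config = require $configFile;
// ... require the database class here, then run the cookie check ...
echo $config['db_name'], PHP_EOL;

unlink($configFile);
```

With the bootstrap owning the load order, anything that needs the database (such as the cookie check) is called only after the database code has been required.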
I have run into a situation where I absolutely needed HTTP Header information prior to page build. Though it seemed a little backward, the solution was to call that file first, then include the application.php file. Checking for a cookie should be fine.
In another situation, @include('myStubbonPricing.php') was the answer. I'm not an advocate of error suppression, but in my case it only output a shipping rate (if the zip code was entered). In my defense, !isset and the like would not fix the problem due to an XML request/response scenario.
I'm auditing my site design based on the excellent Essential PHP Security by Chris Shiflett.
One of the recommendations I'd like to adopt is moving all possible files out of webroot, this includes includes.
Doing so on my shared host is simple enough, but I'm wondering how people handle this on their development testbeds?
Currently I've got an XAMPP installation configured so that localhost/mysite/ matches up with D:\mysite\ in which includes are stored at D:\mysite\includes\
In order to keep include paths accurate, I'm guessing I need to replicate the server's path on my local disk? Something like D:\mysite\public_html\
Is there a better way?
This seems to be a sticking point for quite a few PHP developers, so let's address it well. Most PHP applications litter their code with include '../../library/someclass.php.class'. This isn't much good to anyone, because it's very easy to break, and no one likes doing path janitor work when they should be coding. It's also a bit like building a house of cards and cementing the joins for fear of any change. So OK, maybe we could just create a constant, and use the full path?
define('PATH', '/home/me/webroot/Application');
include(PATH . '/Library/someclass.php.class');
Well, that's pretty good, but, erm, what if we deploy on Windows? Also, are we going to define the path at every script entrance point? Not very DRY if you ask me. Plus, moving deployments is going to be a huge pain. Clearly, while we're closer, it's not much of an improvement.
Luckily, PHP provides a few magic bullet functions that can help us out immediately.
set_include_path
get_include_path
realpath
So let's just say you have a single entrance point for your application, or at the very least a shared header file. We can grab our deployment root pretty quickly if we know where our header file is relative to the code root. For example, in /home/me/webroot/Application/Init/set_paths.php
define('PATH_SITE', realpath(dirname(__FILE__) . '/../../'));
Awesome, that's our document root. It's OS independent and it's pretty easy to adapt if you change where set_paths.php lives. Now we can talk about some other locations in our application, just because constants are handy:
define('PATH_APPLICATION', realpath(PATH_SITE . "/Application"));
define('PATH_LIBRARY', realpath(PATH_SITE . "/Application/Library"));
define('PATH_CONFIG', realpath(PATH_SITE . "/Config"));
define('PATH_WRITE', realpath(PATH_SITE . "/Volatile"));
This is all very well and good, but it's not really much better than our previous solution. Enter the PHP include path. By adding the relevant constants to our path, we won't need to spell them out every time. The order of paths in the include path is actually pretty important for speed, so we make every effort to get them in order of usage.
$paths['inc'] = array_flip(explode(PATH_SEPARATOR, get_include_path()));
unset($paths['inc']['.']);
$paths['inc'] = array_flip($paths['inc']);
// The first item on the path is the external
// libs that get used all the time,
// then the application path, then the
// site path, and any PHP-configured items.
// The current directory should be last.
$paths = array_merge(array(PATH_LIBRARY, PATH_APPLICATION, PATH_SITE), $paths['inc'], array("."));
set_include_path(implode(PATH_SEPARATOR, $paths));
Now all the critical locations in our application are on the path, and you can include to your heart's content, regardless of where you decide to store your libraries, settings, etc.
include('someclass.php.class');
A step further
If you're working with a fairly well designed OOP Application, we can go a bit further. If you subscribe to one file, one class, then the PEAR naming convention makes life very simple.
The PEAR naming conventions dictate a 1:1 relation between the filesystem and the class. As an example, the class Foo_Bar_Baz would be found in the file "Foo/Bar/Baz.php" on your include_path.
source
Once you have a predictable mapping of files to classes, you can then register an autoloader with spl_autoload_register, and you can replace
include('someclass.php.class');
new SomeClass();
With simply
new SomeClass();
And have PHP deal with it for you.
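A minimal sketch of such an autoloader, assuming the PEAR convention quoted above and class files that are reachable through include_path. The helper name pear_class_to_path() is hypothetical, and namespaced classes are not handled here.

```php
<?php
/**
 * Hypothetical helper: Foo_Bar_Baz -> Foo/Bar/Baz.php
 */
function pear_class_to_path(string $class): string
{
    return str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
}

// Called by PHP whenever an unknown class is referenced.
spl_autoload_register(function (string $class): void {
    // Look the file up on include_path; returns false when nothing matches.
    $file = stream_resolve_include_path(pear_class_to_path($class));
    if ($file !== false) {
        require $file;
    }
});
```

With this registered once in the shared header file, new SomeClass() triggers a lookup for SomeClass.php on the include path, and nothing is loaded until the class is actually used.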
Yes, there is a better way. You should always be using relative paths, as in include('./includes/foo.php');. If your paths are relative, you don't have to worry about your local paths except that they should match the overall structure of the site (./includes could refer to D:\projects\web\foo-page\includes on your local machine and /home/andrew/foo-page/includes on the site).
Alternately, use a web server on your local machine or a virtual machine to mimic your production environment; in a properly configured environment, / will refer to your wwwroot, not to your root directory (like filesystem / or D:\ on Windows).
You could always use relative include paths. Either simply write require("../../something"); instead of require("D:\something\something"); (in that case you have to make sure that the number of .. segments before your path is correct; .. means go to the parent directory), or, if your include structure is very complex, use the __FILE__ constant, which always holds the full path of the PHP file in which it appears. You could take that value and then parse out the needed paths to your files.
Finally, if you want to keep the file structure as close to the production server as possible, but don't want to keep a lot of files in different locations, look up junctions (http://en.wikipedia.org/wiki/NTFS_junction_point) for Windows or symbolic links for *nix.
That way you could build up the right paths using junctions, at the same time keeping your original files where they were, thus only keeping 1 version of files.
Where should I put require_once statements, and why?
1. Always at the beginning of a file, before the class
2. In the actual method, when the file is really needed
3. It depends
Most frameworks put includes at the beginning and do not care if the file is really needed.
Using autoloader is the other case here.
Edit:
Surely, we all agree that the autoloader is the way to go. But that is the 'other case' I was not asking about here. (BTW, Zend Framework Application uses an autoloader, and the files are still hard-required and placed at the beginning.)
I just wanted to know, why do programmers include required files at the beginning of the file, even when they likely will not be used at all (e.g. Exception files).
Autoloading is a much better practice, as it will only load what is needed. Obviously, you also need to include the file which defines the __autoload function, so you're going to have some includes somewhere.
I usually have a single file called "includes.php" which then defines the __autoload and includes all the non-class files (such as function libraries, configuration files, etc). This file is loaded at the start of each page.
I'd say 3. It depends. If you're dealing with a lot of code, it may be worth loading the include file on request only, as loading code will take time to do, and eat up memory. On the other hand, this makes maintenance much harder, especially if you have dependencies. If you load includes "on demand", you may want to use a wrapper function so you can keep track of what module is loaded where.
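Such a wrapper could be sketched as follows. The class name ModuleLoader and its methods are hypothetical; the sketch simply loads each file at most once and remembers which caller asked for it first.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper around require_once that records who loaded what.
 */
final class ModuleLoader
{
    /** @var array<string, string> absolute file path => label of first requester */
    private static $loaded = [];

    public static function load(string $file, string $requestedBy): void
    {
        $real = realpath($file);
        if ($real === false) {
            throw new RuntimeException("Module not found: $file");
        }
        if (!isset(self::$loaded[$real])) {
            require_once $real;
            self::$loaded[$real] = $requestedBy;
        }
    }

    /** Returns the map of loaded files to their first requester. */
    public static function report(): array
    {
        return self::$loaded;
    }
}
```

Calling ModuleLoader::load($file, 'checkout') inside the method that needs the code keeps the load on demand, while ModuleLoader::report() gives a single place to inspect dependencies when maintenance gets confusing.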
I think the autoloader mechanism really is the way to go - of course, the application's design needs to be heavily object oriented for that to work.