OK, to start with: I am addicted to using a root-relative link structure for everything. PHP's include always makes that difficult for me, but I happened upon a line of code that lets me include root-relatively.
It's really simple:
set_include_path( get_include_path() . PATH_SEPARATOR . $_SERVER['DOCUMENT_ROOT'] );
That's from a comment in the PHP manual.
I have a pretty simple php site, but with many different subdirectories, and this makes it easy to use. Also - the company may be switching servers soon, and I am thinking this may ease the transition for many sites.
So is there a security risk here? I don't dynamically include files or remotely include them. Am I taking a performance hit including this at the top of every php file? or is it negligible?
There is no security risk as long as you control what you put in the include_path.
There is, however, a performance hit if you have too many paths in your include_path (as PHP will have to try each path before finding the file).
Given your code, the docroot is at the end of the include_path, so you'll only see a performance hit when an included file isn't found in the rest of the include_path (i.e. a missing file).
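As a rough sketch of the same idea, the one-liner from the question can be wrapped in a small helper that appends a directory only once, so repeated calls from several entry scripts don't grow the include_path. The name append_include_dir is made up for this illustration; it is not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): append a directory to the
// include_path unless it is already listed, so it is always searched last.
function append_include_dir(string $dir): void
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        $paths[] = $dir;
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
}

// DOCUMENT_ROOT is only populated under a web server, so guard for CLI runs.
if (!empty($_SERVER['DOCUMENT_ROOT'])) {
    append_include_dir($_SERVER['DOCUMENT_ROOT']);
}
```

After that, something like require 'includes/header.php'; (a hypothetical file name) would resolve against the docroot once the earlier include_path entries have been tried.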
I'm looking for a way to disable PHP's use of include_path in a project.
Background
Perhaps you're surprised: Why would I want to prevent the use of such a useful feature?
Unfortunately, while I'd consider the answer simple, it's not simple to accurately describe:
In brief, include_path is (obviously) package-/library-blind.
More specifically, it's possible to put two different versions of a library into two folders, both of which are in include_path, and ending up with executed code that is a mixture of both. Consider the following setting:
include_path=/some/server/path/lib:/var/www/apache/htdocs/yourApplication/library
Now imagine the canonical location for a library you want to use is in /some/server/path/lib and your build process places it there, but a developer is trying to patch a part of it and erroneously syncing the library to /var/www/apache/htdocs/yourApplication/library. Now imagine this happens:
/some/server/path/lib/moduleYouWant
'--> A.php (old version)
/var/www/apache/htdocs/yourApplication/
'--> SomeFile.php (uses B.php from module; previously used A.php)
/var/www/apache/htdocs/yourApplication/library/moduleYouWant
'--> A.php (new version)
'--> B.php (new file; uses A.php from module)
Suddenly, your application will mysteriously use or autoload (if your autoloader puts include_path to use, which most framework-supplied autoloaders do, as far as I'm aware) only half of the changes you made - the new B.php but the old A.php.
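The mix-up can be reproduced in miniature. The sketch below is only an illustration: it uses two throwaway temp directories in place of the real locations, drops the moduleYouWant subfolder for brevity, and has each file simply return a label so it is visible which copy was picked.

```php
<?php
// Two throwaway directories standing in for the two library locations.
$base      = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'incdemo_' . uniqid();
$canonical = $base . DIRECTORY_SEPARATOR . 'lib';      // stands in for /some/server/path/lib
$appLocal  = $base . DIRECTORY_SEPARATOR . 'library';  // stands in for .../yourApplication/library
mkdir($canonical, 0777, true);
mkdir($appLocal, 0777, true);

// The canonical location only has the old A.php; the patched copy has both files.
file_put_contents($canonical . '/A.php', "<?php return 'old A';");
file_put_contents($appLocal . '/A.php', "<?php return 'new A';");
file_put_contents($appLocal . '/B.php', "<?php return 'new B';");

$previous = set_include_path($canonical . PATH_SEPARATOR . $appLocal);

$a = include 'A.php'; // found in the first directory: 'old A'
$b = include 'B.php'; // missing there, found in the second: 'new B'

set_include_path($previous); // restore the original include_path
```

The first directory that contains a matching file wins for each include separately, which is why the result is a mixture of both versions.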
You may, of course, object to the setup I've described on grounds that it should never happen for other reasons. That's fair. I'm likely to agree, even. The above scenario is a backwards-incompatible change, for example, and that's not very nice.
Nonetheless, so far I've seen it happen twice in the wild (in a sufficiently complex project, with chroots muddying the water) and it's eaten up countless confused debugging hours... and I don't want to have that issue again. Not in this sort of constellation - or any other.
I don't want PHP magically trying to find my files. I want PHP to demand I be specific about which files to load.
Why I don't need include_path
With __DIR__ (or dirname(__FILE__) in older PHP versions), semantically 'relative' paths are easy to construct. And if I need live- and test-server distinctions, constants defining absolute locations are my friend.
So... I want non-absolute paths supplied to include() (and related functions) to fail.
This stackoverflow question from 2011 tells me that I can't blank include_path. Is there perhaps some other way I can disable the feature? My Googlefu is weak, but nonetheless I have a creeping suspicion the answer is 'no (unless you patch PHP itself)'.
If someone knows of a way I can put a stop to this other than by code convention agreement (or the related "please don't use include() but this custom function I wrote"), I'd love to know. :)
You indicate that the root of the problem is that you have developers writing bad relative includes where they should be using absolute includes or autoloading.
The solution to this is not technical but cultural. You can enforce correct behavior by adding a pre-commit-hook to your versioning-system that tries to detect erroneous includes (either relative includes, or just all includes) and block the commit.
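A minimal sketch of what such a check might look like, assuming the tokenizer extension is available. The function name find_relative_includes is invented for this example, and the check is deliberately naive: it only flags include/require statements whose argument starts with a plain string literal that is not an absolute path, and it ignores anything built from variables or constants.

```php
<?php
// Hypothetical checker: returns the line numbers of include/require
// statements whose argument starts with a non-absolute string literal.
function find_relative_includes(string $source): array
{
    $includeTokens = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    $tokens = token_get_all($source);
    $count  = count($tokens);
    $hits   = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || !in_array($tokens[$i][0], $includeTokens, true)) {
            continue;
        }
        // Skip whitespace and an optional opening parenthesis.
        $j = $i + 1;
        while ($j < $count
            && ((is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) || $tokens[$j] === '(')) {
            $j++;
        }
        // Only plain string literals are inspected; expressions such as
        // __DIR__ . '/x.php' start with a different token and are skipped.
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_CONSTANT_ENCAPSED_STRING) {
            $literal = substr($tokens[$j][1], 1, -1); // strip the surrounding quotes
            if (preg_match('#^(/|[A-Za-z]:[\\\\/])#', $literal) !== 1) {
                $hits[] = $tokens[$i][2]; // line number of the include keyword
            }
        }
    }
    return $hits;
}
```

A hook script could run this over each staged .php file and reject the commit when the returned list is not empty; how the hook obtains the staged files depends on the versioning system and is left out here.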
Just use the following:
set_include_path("./");
Then, when you need to include a file outside the current folder, you have to specify a fully qualified relative or absolute path.
I'm auditing my site design based on the excellent Essential PHP Security by Chris Shiflett.
One of the recommendations I'd like to adopt is moving all possible files out of the webroot; this includes includes.
Doing so on my shared host is simple enough, but I'm wondering how people handle this on their development testbeds?
Currently I've got an XAMPP installation configured so that localhost/mysite/ matches up with D:\mysite\ in which includes are stored at D:\mysite\includes\
In order to keep include paths accurate, I'm guessing I need to replicate the server's path on my local disk? Something like D:\mysite\public_html\
Is there a better way?
This seems to be a sticking point for quite a few PHP developers, so let's address it well. Most PHP applications litter their code with include '../../library/someclass.php.class'. This isn't much good to anyone, because it's very easy to break, and no one likes doing path janitor work when they should be coding. It's also a bit like building a house of cards and cementing the joins for fear of any change. So OK, maybe we could just create a constant, and use the full path?
define('PATH', '/home/me/webroot/Application');
include(PATH . '/Library/someclass.php.class');
Well, that's pretty good, but what if we deploy on Windows? Also, are we going to define the path at every script entrance point? Not very DRY if you ask me. Plus, moving deployments is going to be a huge pain. Clearly, while we're closer, it's not much of an improvement.
Luckily, PHP provides a few magic bullet functions that can help us out immediately.
set_include_path
get_include_path
realpath
So let's just say you have a single entrance point for your application, or at the very least a shared header file. We can grab our deployment root pretty quickly if we know where our header file is relative to the code root, i.e. in /home/me/webroot/Application/Init/set_paths.php
define('PATH_SITE', realpath(dirname(__FILE__) . '/../../'));
Awesome, that's our document root. It's OS independent and it's pretty easy to adapt if you change where set_paths.php lives. Now we can talk about some other locations in our application, just because constants are handy:
define('PATH_APPLICATION', realpath(PATH_SITE . "/Application"));
define('PATH_LIBRARY', realpath(PATH_SITE . "/Application/Library"));
define('PATH_CONFIG', realpath(PATH_SITE . "/Config"));
define('PATH_WRITE', realpath(PATH_SITE . "/Volatile"));
This is all very well and good, but it's not really much better than our previous solution. Enter the PHP include path. By adding the relevant constants to our path, we won't need to define them every time. The order of paths in the include path is actually pretty important for speed, so we make every effort to get them in order of usage.
$paths['inc'] = array_flip(explode(PATH_SEPARATOR, get_include_path()));
unset($paths['inc']['.']);
$paths['inc'] = array_flip($paths['inc']);
// The first items on the path are the external
// libs that get used all the time,
// then the application path, then the
// site path, and any PHP-configured items.
// The current directory should be last.
$paths = array_merge(array(PATH_LIBRARY, PATH_APPLICATION, PATH_SITE), $paths['inc'], array("."));
set_include_path(implode(PATH_SEPARATOR, $paths));
Now all the critical locations in our application are on the path, and you can include to your heart's content, regardless of where you decide to store your libraries, settings etc.
include('someclass.php.class');
A step further
If you're working with a fairly well designed OOP Application, we can go a bit further. If you subscribe to one file, one class, then the PEAR naming convention makes life very simple.
The PEAR naming conventions dictate a 1:1 relation between the filesystem and the class. As an example, the class Foo_Bar_Baz would be found in the file "Foo/Bar/Baz.php" on your include_path.
source
Once you have a predictable mapping of files to classes, you can then register an autoloader with spl_autoload_register, and you can replace
include('someclass.php.class');
new SomeClass();
With simply
new SomeClass();
And have PHP deal with it for you.
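For illustration, a minimal autoloader along those lines might look like the sketch below. It assumes the PEAR-style class-to-file mapping quoted above and that the library directory is already on the include_path; the helper name pear_class_to_path is made up for this example.

```php
<?php
// Hypothetical helper: map a PEAR-style class name to a relative file path,
// e.g. Foo_Bar_Baz becomes Foo/Bar/Baz.php (with the OS directory separator).
function pear_class_to_path(string $class): string
{
    return str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
}

// Register an autoloader that looks the file up on the include_path and
// quietly does nothing when no matching file exists.
spl_autoload_register(function (string $class): void {
    $file = stream_resolve_include_path(pear_class_to_path($class));
    if ($file !== false) {
        require_once $file;
    }
});
```

With this registered, new Foo_Bar_Baz() triggers a lookup for Foo/Bar/Baz.php the first time the class is used, and classes that are never referenced are never loaded.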
Yes, there is a better way. You should always be using relative paths, as in include('./includes/foo.php');. If your paths are relative, you don't have to worry about your local paths except that they should match the overall structure of the site (./includes could refer to D:\projects\web\foo-page\includes on your local machine and /home/andrew/foo-page/includes on the site).
Alternately, use a web server on your local machine or a virtual machine to mimic your production environment; in a properly configured environment, / will refer to your wwwroot, not to your root directory (like filesystem / or D:\ on Windows).
You could always have relative include paths. Either simply doing require("../../something");
instead of require("D:\something\something"); (Of course, in that case you have to make sure that the number of .. segments in your path is correct; .. means go to the parent directory), or, if your include structure is very complex, you could use the __FILE__ constant, which always holds the full path of the PHP file it appears in. You could get that value, and then parse out the needed paths to your file.
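As a small sketch of the second approach, paths can be built from the current file's directory instead of being parsed by hand. The helper name path_from and the directory layout in the comment are assumptions for illustration only.

```php
<?php
// Hypothetical helper: join a base directory and a relative path with
// exactly one separator between them, whatever slashes the caller used.
function path_from(string $baseDir, string $relative): string
{
    return rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . ltrim($relative, '/\\');
}

// Example use inside a file that lives one level below the project root
// (assumed layout): the same line works on D:\mysite and on /home/me/mysite.
// require path_from(dirname(__DIR__), 'includes/foo.php');
```

__DIR__ is equivalent to dirname(__FILE__), so the include no longer depends on the working directory or on how many .. segments the calling script needs.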
Finally, if you want to keep the file structure as exact as in production server as possible, but don't want to keep a lot of files in different locations, look up junctions http://en.wikipedia.org/wiki/NTFS_junction_point for windows or symbolic links for *nix.
That way you could build up the right paths using junctions, at the same time keeping your original files where they were, thus only keeping 1 version of files.
Just something I wonder about when including files:
Say I want to include a file, or link to it. Should I just for example:
include("../localfile.php");
or should I instead use
include("http://sameserver.com/but/adirect/linkto/localfile.php");
Is one better than the other? Or more secure? Or is it just personal preference?
Clearly it would be a necessity if you had a file that you would include into files in multiple directories, and THAT file includes a different file, or is there some other way of doing that?
Reading a file is much faster than making an HTTP request and getting the response. Never include(a_uri) if you can help it.
Use $_SERVER['DOCUMENT_ROOT'] if you want to calculate a complete file path for your include.
As said before, definitely include a local file rather than making an HTTP request (which takes more time, is not cached, and makes the contents technically viewable to anyone who knows where to look).
One more small detail: if you use full paths to your included files, it will even be faster than relative paths, especially if you use some kind of bytecode cache.
Definitely include the local file. The PHP script doesn't know or care that you're including a script on your own server, so the URL path causes an HTTP request, and network latency from HTTP requests is pretty much the bottleneck for rendering any HTML page in general. The fewer of them you have, the better off you're going to be.
Personally, I try to avoid using include and require in general, in favor of require_once, because using require_once means that you are writing your code reusably instead of writing code that executes immediately when you include it. Pull in class definitions, pull in function libraries, but try to avoid code that executes immediately when you include it, because that will make it harder to reuse.
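A quick sketch of the difference, using a throwaway temp file as a stand-in for a function library (the function name greet is invented for the example):

```php
<?php
// Write a throwaway library file that only defines a function.
$libFile = tempnam(sys_get_temp_dir(), 'lib');
file_put_contents(
    $libFile,
    "<?php\nfunction greet(string \$name): string { return \"Hello, \$name\"; }\n"
);

// The first call loads the file; the second sees it is already loaded and
// skips it. A plain require here would load the file twice and fail with a
// "cannot redeclare function" fatal error.
$first  = require_once $libFile;
$second = require_once $libFile;

$message = greet('world');
```

Because the file only defines things and runs nothing at load time, it does not matter which script pulls it in first or how many scripts ask for it.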
If your question is about keeping it so you don't have to change a billion paths when you move from staging to production, go with this little tidbit I learned:
define('BASE_DIR', '/path/to/root/');
Then use BASE_DIR in all of your path references. When it's time to move your site, just change that definition to the new path (which should just be / at that point).
In addition to what other people say, these invocations will have different results, since the remote invocation will give you the PHP output, not the file contents. Unless you stop PHP from processing the file, in which case you're exposing your code to the world, which is probably not what you want either.
Always include locally, because if you include remotely someone can serve a different file and do nasty things. The other problem is that you can't really test this with remote includes. As far as I know you should use require_once instead...
Each page on my website is rendered using PHP.
Each PHP file uses around 10 includes. So for every page that is displayed, the server needs to fetch 10 files, in addition to the rest of its functions (MySQL, etc).
Should I combine them into a single include file? Will that make ANY difference to the real-world speed? It's not a trivial task as there would be a spaghetti of variable scope to sort out.
Include files are processed on the server, so they're not "fetched" by the browser. The performance difference of using includes vs. copy and pasting the code or consolidating files is so negligible (and I'm guessing we're talking about in the 10 ms to 100 ms range, at the absolute most), that it isn't at all worth it.
Feel free to include and require to your heart's content. Clean code is substantially more important than shaving less than 100 ms off a page load. If you're building something where timing is that critical, you shouldn't be using PHP anyway.
What takes time is figuring out where the files are actually located in the include path. If you got multiple locations in your include path, PHP will search each location until it either finds the file or fails (in which case it throws an error). That's why you should put the include path where most of the included files are to be found on top of the include path.
If you use absolute paths in your include path, PHP will cache the path in the realpath cache, but note that this gets stale very quickly. So yes, including ten files is potentially slower than including one large file, simply because PHP has to check the include path more often. However, unless your webserver is a really weak machine, ten files is not enough to make an impact. This only gets interesting when you include hundreds of files or have many locations to search, in which case you should use an opcode cache anyway.
Also note that when including files, it is not good practice to include each and every file right at the beginning, because you might be including files that are never called by your application for a specific request.
Reference
http://de2.php.net/manual/en/ini.core.php#ini.include-path
http://de2.php.net/manual/en/ini.core.php#ini.sect.performance
http://en.wikipedia.org/wiki/List_of_PHP_accelerators
Although disk I/O operations are among the biggest performance-eaters, a regular site won't notice any sensible number of includes.
Before you hit any problems with includes, you probably already would have some opcode cache that eliminates this problem too.
include and require only open files on the server side, but that might be time-consuming depending on the hardware, filesystem, etc.
Anyway, if you can, use autoloader. Only needed files will be loaded that way.
Then if you think included files are a source of slowdown (and I think there are a lot of other places to look for improvement first), you can try to automatically merge the files. You still have one file per class when developing, but you can build a file that contains each class's definition to have only one include (something like cat <all your included file>.php > to_include.php).
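A plain cat would leave a second opening PHP tag in the middle of the merged file, so a build step needs to strip those. The sketch below is one naive way to do that; the function name merge_php_files is invented, and it assumes the files contain only PHP code with no namespace or declare statements and no inline HTML.

```php
<?php
// Hypothetical build helper: concatenate PHP source files into one string,
// keeping a single opening tag at the top of the result.
function merge_php_files(array $files): string
{
    $merged = "<?php\n";
    foreach ($files as $file) {
        $code = file_get_contents($file);
        // Drop the leading opening tag and a trailing closing tag, if any.
        $code = preg_replace('/^\s*<\?php\s*/', '', $code);
        $code = preg_replace('/\?>\s*$/', '', $code);
        $merged .= '// ---- ' . basename($file) . " ----\n" . rtrim($code) . "\n";
    }
    return $merged;
}
```

The result could be written out with file_put_contents('to_include.php', merge_php_files($files)); as part of a deployment script, keeping one file per class in development.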
The question might prompt some people to say a definitive YES or NO almost immediately, but please read on...
I have a simple website where there are 30 php pages (each has some php server side code + HTML/CSS etc...). No complicated hierarchy, nothing. Just 30 pages.
I also have a set of purely back-end php files - the ones that have code for saving stuff to database, doing authentication, sending emails, processing orders and the like. These will be reused by those 30 content-pages.
I have a master php file to which I send a parameter. This specifies which one of those 30 files is needed and it includes the appropriate content-page. But each one of those may require a variable number of back-end files to be included. For example one content page may require nothing from back-end, while another might need the database code, while something else might need the emailer, database and the authentication code etc...
I guess whatever back-end page is required, can be included in the appropriate content page, but one small change in the path and I have to edit tens of files. It will be too cumbersome to check which content page is requested (switch-case type of thing) and include the appropriate back-end files, in the master php file. Again, I have to make many changes if a single path changes.
Being lazy, I included ALL back-end files in the master file so that no content page can request something that is not included.
First question - is this a good practice? if it is done by anyone at all.
Second, will there be a performance problem or any kind of problem due to me including all the back-end files regardless of whether they are needed?
EDIT
The website gets anywhere between 3000 - 4000 visits a day.
You should benchmark. Time the execution of the same page with different includes. But I guess it won't make much difference with 30 files.
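A rough way to do that timing, as a sketch only: the function name time_includes is made up, and a single wall-clock measurement like this is noisy, so it is worth repeating the run several times and comparing averages.

```php
<?php
// Hypothetical benchmark helper: include every file in the list and
// return the elapsed wall-clock time in seconds.
function time_includes(array $files): float
{
    $start = microtime(true);
    foreach ($files as $file) {
        include $file;
    }
    return microtime(true) - $start;
}
```

Calling it once with the full list of back-end files and once with only the files a given page needs gives a first impression of how much the extra includes cost on your server.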
But you can save yourself the time and just enable APC in the php.ini (it is a PECL extension, so you need to install it). It will cache the parsed content of your files, which will speed things up significantly.
BTW: There is nothing wrong with laziness, it's even a virtue ;)
If your site is object-oriented I'd recommend using auto-loading (http://php.net/manual/en/language.oop5.autoload.php).
This uses a magic function (__autoload, which was deprecated in PHP 7.2 and removed in PHP 8.0 in favor of spl_autoload_register) to look for a class when needed (it's lazy, just like you!), so if a particular page doesn't need all the classes, it doesn't have to get them!
Again, though, this depends on if it is object-oriented or not...
It will slow down your site, though probably not by a noticeable amount. It doesn't seem like a healthy way to organize your application, though; I'd rethink it. Try to separate the application logic (e.g. most of the server-side code) from the presentation layer (e.g. the HTML/CSS).
It's not a bad practice if the files are small and contain just definitions and settings.
If they actually run code, or are extremely large, it will cause a performance issue.
Now, if your site has 3 visitors an hour, who cares? If you have 30000, that's another issue, and you need to work harder to minimize that.
You can mitigate some of the disadvantages of PHP code-compiling by using XCache. This PHP module will cache the PHP opcode, which reduces compile time and improves performance.
Considering the size of your website; if you haven't noticed a slowdown, why try to fix it?
When it comes to larger sites, the first thing you should do is install APC. Even though your current method of including files might not benefit as much from APC as it could, APC will still do an amazing job speeding stuff up.
If response-speed is still problematic, you should consider including all your files. APC will keep a cached version of your sourcefiles in memory, but can only do this well if there are no conditional includes.
Only when your PHP application is at a size where memory exhaustion is a big risk (note that for most large-scale websites Memory is not the bottleneck) you might want to conditionally include parts of your application.
Rasmus Lerdorf (the man behind PHP) agrees: http://pooteeweet.org/blog/538
As others have said, it shouldn't slow things down much, but it's not 'ideal'.
If the main issue is that you're too lazy to go changing the paths for all the included files (if the path ever needs to be updated in the future), then you can use a constant to define the path in your main file, and use the constant any time you need to include/require a file.
define('PATH_TO_FILES', '/var/www/html/mysite/includes/go/in/here/');
require_once PATH_TO_FILES.'database.php';
require_once PATH_TO_FILES.'sessions.php';
require_once PATH_TO_FILES.'otherstuff.php';
That way if the path changes, you only need to modify one line of code.
It will indeed slow down your website, mostly because of the relatively slow loading and processing of PHP. The more code you include, the slower the application will get.
I live by "include as little as possible, as much as necessary", so I usually just include my config and session handling for everything, and then each page includes just what it needs using an include path defined in the config include. That way, for path changes you still just need to change one file.
If you include everything the slowdown won't be noticeable until you get a lot of page hits (several hits per second) so in your case just including everything might be ok.