Organizing PHP includes in your development environment - php

I'm auditing my site design based on the excellent Essential PHP Security by Chris Shiflett.
One of the recommendations I'd like to adopt is moving all possible files out of the webroot, and that includes includes.
Doing so on my shared host is simple enough, but I'm wondering how people handle this on their development testbeds?
Currently I've got an XAMPP installation configured so that localhost/mysite/ matches up with D:\mysite\ in which includes are stored at D:\mysite\includes\
In order to keep include paths accurate, I'm guessing I need to replicate the server's path on my local disk? Something like D:\mysite\public_html\
Is there a better way?

This seems to be a sticking point for quite a few PHP developers, so let's address it well. Most PHP applications litter their code with include '../../library/someclass.php.class'. This isn't much good to anyone, because it's very easy to break, and no one likes doing path janitor work when they should be coding. It's also a bit like building a house of cards and cementing the joins for fear of any change. So OK, maybe we could just create a constant, and use the full path?
define('PATH', '/home/me/webroot/Application');
include(PATH . '/Library/someclass.php.class');
Well, that's pretty good, but erm, what if we deploy on Windows? Also, are we going to define PATH at every script entrance point? Not very DRY if you ask me. Plus, moving deployments is going to be a huge pain. Clearly, while we're closer, it's not much of an improvement.
Luckily, PHP provides a few magic bullet functions that can help us out immediately.
set_include_path
get_include_path
realpath
So let's just say you have a single entrance point for your application, or at the very least a shared header file. We can grab our deployment root pretty quickly if we know where our header file is relative to the code root. For example, in /home/me/webroot/Application/Init/set_paths.php:
define('PATH_SITE', realpath(dirname(__FILE__) . '/../../'));
Awesome, that's our document root. It's OS independent and it's pretty easy to adapt if you change where set_paths.php lives. Now we can talk about some other locations in our application, just because constants are handy:
define('PATH_APPLICATION', realpath(PATH_SITE . "/Application"));
define('PATH_LIBRARY', realpath(PATH_SITE . "/Application/Library"));
define('PATH_CONFIG', realpath(PATH_SITE . "/Config"));
define('PATH_WRITE', realpath(PATH_SITE . "/Volatile"));
This is all very well and good, but it's not really much better than our previous solution. Enter the PHP include path. By adding the relevant constants to our path, we won't need to define them every time. The order of entries in the include path is actually pretty important for speed, because PHP tries each entry in turn, so we make every effort to get them in order of usage.
$paths['inc'] = array_flip(explode(PATH_SEPARATOR, get_include_path()));
unset($paths['inc']['.']);
$paths['inc'] = array_flip($paths['inc']);
// The first item on the path is the external
// libs that get used all the time,
// then the application path, then the
// site path, and any php configured items.
// The current directory should be last.
$paths = array_merge(array(PATH_LIBRARY, PATH_APPLICATION, PATH_SITE), $paths['inc'], array("."));
set_include_path(implode(PATH_SEPARATOR, $paths));
Now all the critical locations in our application are on the path, and you can include to your heart's content, regardless of where you decide to store your libraries, settings, etc.
include('someclass.php.class');
A step further
If you're working with a fairly well designed OOP Application, we can go a bit further. If you subscribe to one file, one class, then the PEAR naming convention makes life very simple.
The PEAR naming conventions dictate a 1:1 relation between the filesystem and the class. As an example, the class Foo_Bar_Baz would be found in the file "Foo/Bar/Baz.php" on your include_path.
source
Once you have a predictable mapping of files to classes, you can then implement spl_autoload_register, and you can replace
include('someclass.php.class');
new SomeClass();
With simply
new SomeClass();
And have PHP deal with it for you.
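As a minimal sketch of that idea, assuming every class follows the PEAR convention and its directory is already on the include_path: the helper name pear_class_to_path is made up for illustration, and stream_resolve_include_path needs PHP 5.3.2 or later.

```php
<?php
// Hypothetical helper: maps a PEAR-style class name to a relative file
// path, e.g. Foo_Bar_Baz becomes Foo/Bar/Baz.php. Forward slashes are
// used because PHP accepts them on Windows as well.
function pear_class_to_path($class)
{
    return str_replace('_', '/', $class) . '.php';
}

spl_autoload_register(function ($class) {
    $file = pear_class_to_path($class);
    // stream_resolve_include_path() searches the include_path and returns
    // false when nothing matches, so an unknown class is simply skipped
    // here and left for any other registered autoloader.
    $resolved = stream_resolve_include_path($file);
    if ($resolved !== false) {
        require_once $resolved;
    }
});
```

With this registered in the shared header file, new Foo_Bar_Baz() triggers a lookup for Foo/Bar/Baz.php on the include_path the first time the class is used.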

Yes, there is a better way. You should always be using relative paths, as in include('./includes/foo.php');. If your paths are relative, you don't have to worry about your local paths except that they should match the overall structure of the site (./includes could refer to D:\projects\web\foo-page\includes on your local machine and /home/andrew/foo-page/includes on the site).
Alternately, use a web server on your local machine or a virtual machine to mimic your production environment; in a properly configured environment, / will refer to your wwwroot, not to your root directory (like filesystem / or D:\ on Windows).

You could always use relative include paths: either simply write require("../../something");
instead of require("D:\something\something"); (in which case you have to make sure the number of .. segments before your path is correct; .. means go to the parent directory), or, if your include structure is very complex, use the __FILE__ constant, which always holds the full path of the PHP file it is written in. You could get that value and then parse out the needed paths to your file.
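One way to do that parsing, sketched under the assumption that the includes directory sits a known number of levels above the current file. The function name project_path and the directory layout in the comments are hypothetical.

```php
<?php
// Hypothetical helper: starts at the directory of $fromFile, walks up
// $levelsUp parent directories, then appends $relative.
function project_path($fromFile, $levelsUp, $relative)
{
    $dir = dirname($fromFile);
    for ($i = 0; $i < $levelsUp; $i++) {
        $dir = dirname($dir);
    }
    return $dir . '/' . ltrim($relative, '/');
}

// Usage from <project>/public/pages/a.php, with includes in <project>/includes:
// require project_path(__FILE__, 2, 'includes/foo.php');
```

Because the starting point is the file itself rather than the current working directory, the result does not change when the script is included from somewhere else.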
Finally, if you want to keep the file structure as close to the production server as possible, but don't want to keep a lot of files in different locations, look up junctions (http://en.wikipedia.org/wiki/NTFS_junction_point) for Windows or symbolic links for *nix.
That way you could build up the right paths using junctions, at the same time keeping your original files where they were, thus only keeping 1 version of files.

Related

Disabling include_path in PHP

I'm looking for a way to disable PHP's use of include_path in a project.
Background
Perhaps you're surprised: Why would I want to prevent the use of such a useful feature?
Unfortunately, while I'd consider the answer simple, it's not simple to accurately describe:
In brief, include_path is (obviously) package-/library-blind.
More specifically, it's possible to put two different versions of a library into two folders, both of which are in include_path, and end up with executed code that is a mixture of both. Consider the following setting:
include_path=/some/server/path/lib:/var/www/apache/htdocs/yourApplication/library
Now imagine the canonical location for a library you want to use is in /some/server/path/lib and your build process places it there, but a developer is trying to patch a part of it and erroneously syncing the library to /var/www/apache/htdocs/yourApplication/library. Now imagine this happens:
/some/server/path/lib/moduleYouWant
'--> A.php (old version)
/var/www/apache/htdocs/yourApplication/
'--> SomeFile.php (uses B.php from module; previously used A.php)
/var/www/apache/htdocs/yourApplication/library/moduleYouWant
'--> A.php (new version)
'--> B.php (new file; uses A.php from module)
Suddenly, your application will mysteriously use or autoload (if your autoloader puts include_path to use, which most framework-supplied autoloaders do, as far as I'm aware) only half of the changes you made - the new B.php but the old A.php.
You may, of course, object to the setup I've described on grounds that it should never happen for other reasons. That's fair. I'm likely to agree, even. The above scenario is a backwards-incompatible change, for example, and that's not very nice.
Nonetheless, so far I've seen it happen twice in the wild (in a sufficiently complex project, with chroots muddying the water) and it's eaten up countless confused debugging hours... and I don't want to have that issue again. Not in this sort of constellation - or any other.
I don't want PHP magically trying to find my files. I want PHP to demand I be specific about which files to load.
Why I don't need include_path
With __DIR__ (or dirname(__FILE__) in older PHP versions), semantically 'relative' paths are easy to construct. And if I need live- and test-server distinctions, constants defining absolute locations are my friend.
So... I want non-absolute paths supplied to include() (and related functions) to fail.
This stackoverflow question from 2011 tells me that I can't blank include_path. Is there perhaps some other way I can disable the feature? My Googlefu is weak, but nonetheless I have a creeping suspicion the answer is 'no (unless you patch PHP itself)'.
If someone knows of a way I can put a stop to this other than by code convention agreement (or the related "please don't use include() but this custom function I wrote"), I'd love to know. :)
You indicate that the root of the problem is that you have developers writing bad relative includes where they should be using absolute includes or autoloading.
The solution to this is not technical but cultural. You can enforce correct behavior by adding a pre-commit-hook to your versioning-system that tries to detect erroneous includes (either relative includes, or just all includes) and block the commit.
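A rough sketch of what such a hook could run, assuming it is enough to flag includes whose argument begins with a plain string literal that is not an absolute path. The function name find_relative_includes is hypothetical, and the check deliberately ignores arguments that start with __DIR__, a constant, a variable, or an interpolated string.

```php
<?php
// Hypothetical checker: returns the line numbers of include/require
// statements whose first argument token is a quoted string that does not
// start with "/" or a Windows drive letter.
function find_relative_includes($source)
{
    $includeTokens = array(T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE);
    $tokens = token_get_all($source);
    $count = count($tokens);
    $hits = array();

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || !in_array($tokens[$i][0], $includeTokens, true)) {
            continue;
        }
        // Skip whitespace and opening parentheses to reach the argument.
        $j = $i + 1;
        while ($j < $count
            && ($tokens[$j] === '(' || (is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE))) {
            $j++;
        }
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_CONSTANT_ENCAPSED_STRING) {
            $path = trim($tokens[$j][1], "'\"");
            if (!preg_match('~^(/|[A-Za-z]:[\\\\/])~', $path)) {
                $hits[] = $tokens[$i][2]; // line number of the include keyword
            }
        }
    }
    return $hits;
}
```

A hook script could call this on the contents of each staged .php file and exit with a non-zero status when the returned array is not empty.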
Just use the following:
set_include_path("./");
Then, when you need to include a file outside the current folder, you have to specify the full relative or absolute path.

Is there any way to make all PHP paths in a site, root-relative?

Whenever I want to include a document with PHP, or perform any other PHP action which requires a path to be described, I need to write something like, ../../../../../document.html. This works, but it's tedious, and in some cases, the path is wrong, resulting in code appearing on-page, and debugging.
This can, obviously, be bypassed by using the $_SERVER['DOCUMENT_ROOT'] variable, but that, too, requires a sometimes unmanageable amount of code (again, when many, many paths are present).
Is there any way to simply define all PHP paths site-wide to be document root-relative, as in HTML (/document.html is root relative)?
I have a detailed answer on this in another question:
finding a file in php that is 4 directories up
It explains the caveats of relative file paths in PHP. Use the magic constants and server variables mentioned there to overcome relative path issues.
Yes. Most experienced developers would tend to define constants in a config file for the various paths important to the application. So perhaps something like this, if you want to define the web server document root as your application root, and perhaps have another path outside the web server directory where you place application includes (classes, etc.) that you don't want exposed in the web directory.
define('WEB_ROOT', $_SERVER['DOCUMENT_ROOT']);
define('INCLUDE_DIR', '/path/to/directory/');
You can then just reference these constants in your application.
I would certainly recommend going away from relative paths as they are problematic when refactoring your code or moving your code from one server to another. If you need relative type of paths (for app portability for example) you might be better served using the PHP magic constants like __FILE__ and __DIR__.
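To get something close to HTML-style root-relative paths on top of such constants, a small helper can join a root and a path. This is only a sketch; root_path is a hypothetical name, and the root passed in would be whichever constant the config file defines.

```php
<?php
// Hypothetical helper: resolves a root-relative path such as
// "/document.html" against a base directory, tolerating a trailing
// separator on the root and a leading one on the path.
function root_path($root, $relative)
{
    return rtrim($root, '/\\') . '/' . ltrim($relative, '/\\');
}

// Usage, given the constants defined in the config file:
// include root_path(WEB_ROOT, '/document.html');
// require root_path(INCLUDE_DIR, 'classes/User.php');
```

The call sites then read the same on every server, and only the constant changes between environments.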
I used to do that and have those problems. Then I switched my site to use mod_rewrite for the urls. I then had all of my php pages in the same directory so I didn't have to go a confusing 4 times up the directory structure to find the root. You can have a php file on your server at:
/var/www/index.php
And, using mod_rewrite in your .htaccess file, you can have that map to:
http://domain.com/really/long/path/structure/page.html
When I moved over to that structure, it really helped me on the php side of things specifically regarding navigating to different directories.

Preventing directory scanning from Acunetix

I have a PHP enabled site, with directory-listing turned off.
But when I used Acunetix (web vulnerability scanning software) to scan my site, and other high-profile websites, it was able to list all directories and files.
I don't know why this is happening, but I have this theory: maybe the software is using English words, trying to see if a folder exists by trying names like "include/", "css/", "/images", etc. Then, maybe it is able to list files that way.
Because, if directory listing is off, I don't know what more there is to do.
So, I devised this plan, that if I give my folders/files difficult names like I3Nc_lude, 11css11, etc., maybe it would be difficult for the software to find the names. What do you think?
I know, I could be dead-wrong about this, and the idea might be laughable but, that is why I am asking for help.
How do you Completely! Forbid directory listing??
Ensure all directories from the root of your site have directory
listings disabled. It is typically on by default when you set up a
new server.
Assuming that directory listing in your webserver is not your issue,
keep in mind that any resources you have in your site: CSS files, JS
sources, and of course HREFs can be traversed with little or no
effort (typically a few lines of javascript). There is no way to
hide anything that you've referenced. This is most likely what you
are seeing reflected in the scan.
Alternatively, if you use SVN or other version control systems to
deploy your site, often these can be used to determine the path of
every file in your codebase.
Probably the most common mistake people make when first creating sites is that they keep all their files in the webroot, and it becomes somewhat trivial to figure out where things are.
IMHO the best approach is have your code in a separate directory outside the webroot, and then load it as needed (this is how most MVC frameworks work). You can control entirely then what can and can not be accessed via the web. You can have 100s of classes in a directory and as long as they are not in the webroot, no one will ever be able to see them, even if directory listing were to become enabled.
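A minimal sketch of that layout, assuming a single index.php in the webroot and an app directory next to it (both names, and the function app_file, are hypothetical):

```php
<?php
// public/index.php would be the only PHP file inside the webroot.
// Everything else lives in ../app, which the web server never serves.
function app_file($webroot, $name)
{
    // basename() drops any directory part of $name, so a request for
    // "../../etc/passwd" cannot climb out of the app directory.
    return dirname($webroot) . '/app/' . basename($name) . '.php';
}

// Usage inside index.php:
// $page = isset($_GET['page']) ? $_GET['page'] : 'home';
// $file = app_file(__DIR__, $page);
// if (is_file($file)) { require $file; } else { http_response_code(404); }
```

Only the pages that index.php chooses to load are reachable, whatever the directory listing settings happen to be.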
The checkers aren't using some kind of language-based brute force attack; that would be far too costly and invasive even for the most inept hacker. Your web server (Apache, IIS, whatever) is serving up the structure to anyone who asks.
I found this solution at the link below; it should apply to you, I hope.
http://www.velvetblues.com/web-development-blog/dont-get-hacked-6-ways-to-secure-your-wordpress-blog/
Hide Your Directory Structure
It is also good practice to hide your directory structure. By default, many WordPress installations enable any visitors to snoop and see all files in folders lacking an index file. And while this might not seem dangerous, it really is. By enabling visitors to see what files are in each directory, they can better plot their attack.
To fix this problem, you can do one of two things:
Option 1: Use An Index File
For each directory that you want to protect, simply add an index file. A simple index.html file will suffice.
Option 2: Use An .htaccess File
The preferred way of hiding the directory structure is to use the following code in an .htaccess file.
Options -Indexes
That just sounds like a nightmare to manage. Focus on securing the files as best you can with all preventative measures. Don't rely on security through obscurity. If someone wants in, some random directory names will just slow them down slightly.

How to call a CSS file from the right place

I'm starting a project in PHP, and I want to structure my files properly from the start (unlike my last project, which had almost every file in a single directory). The problem is the following, which I will describe with an example:
Take the following files: index.php, includes/header.php, and css/common.css. index.php 'includes' the header (as will many other php files). The header then calls common.css so that its html elements can be placed properly. common.css will also provide styling for general elements in index.php and other files.
Notice that since the header is being included, when the header calls common.css, it does so from the location of the file calling it; in this case, index.php. But if I add, say, modules/friends.php and call the header with it, it will be looking for the CSS file in the wrong spot!
Initially I tried to remedy this by using the actual path for when I call CSS files. However, my local machine and web server have a different layout of directories, and therefore I cannot simply call /var/www/whatever.
Can anyone help me or redirect me to a place where this sort of thing is documented?
Thanks,
Paragon
Always specify absolute paths to all your resources: .css, .js, images, etc...
http://en.wikipedia.org/wiki/Absolute_path
However, my local machine and web server have a different layout of directories, and therefore I cannot simply call /var/www/whatever.
You can. Web paths are not the same thing as local filesystem paths. When you specify a path on the web, the leading / refers to the webroot (the directory your project is served from), not your filesystem root.
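For the header.php case in the question, that could look like the sketch below. The helper names asset_url and stylesheet_tag are hypothetical, and $base is an assumed URL prefix: empty when the project is served from the root of the domain, "/mysite" when it lives at localhost/mysite/.

```php
<?php
// Hypothetical helper: builds a web path that starts at the web root, so
// it resolves the same way no matter which PHP file emitted the tag.
function asset_url($base, $path)
{
    return rtrim($base, '/') . '/' . ltrim($path, '/');
}

// Hypothetical helper: wraps the path in a stylesheet link element,
// escaping it for use inside an HTML attribute.
function stylesheet_tag($base, $path)
{
    $href = htmlspecialchars(asset_url($base, $path), ENT_QUOTES, 'UTF-8');
    return '<link rel="stylesheet" href="' . $href . '">';
}

// Usage in includes/header.php:
// echo stylesheet_tag('', 'css/common.css');
```

Since the browser resolves the href against the web root, index.php and modules/friends.php both end up requesting the same file.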
Congratulations on recognizing a huge problem.
Yes, this is always the big, important question that you need to answer at the start.
I've finally learned -- and this is after quite a few years -- to try my best to make the file structure on the development machine (my PC, say) be exactly like the file structure on the host machine (a Linux host, for example). That one thing alone has saved me unending hours of grief.
If you can accomplish that, then the rest is a piece of cake, believe me. You can put files in whatever directories you want, wherever it makes sense to you, on both machines. You can figure out what files should go where.
If you don't bother to try for near-identical file-directory setups on both machines, you are forever going to be wondering, as you edit away, "Hey, what machine am I on? If I'm on the host, then very-important-file.php is in /toplevel, and everything else is under it. But if I'm on the PC, then very-important-file.php is over here in /my-files, see, and then other files are on different levels and did I delete that file and ..." My God, don't make me think, much less think about that mindless crap.
You can handle and remember just the root being in different spots on different machines, but other than that, forget it.
Now when you come to run your stuff, you will always know where the pieces of that stuff are: CSS files, JS files, whatever. PLUS you can (maybe; if you're lucky) debug your code on the PC or the host equally well, with no differences and with no changes anywhere. PLUS when you upload your new code, you can FTP it up to the host in one big chunk rooted where you like. (Which has the very nice ancillary benefit of your being able to move files around wherever you want on the development machine.)
Piece of cake! Don't pass up this chance to save yourself days or weeks (literally) of time.
Always IMHO.

My choice of Class Names is hampered by Windows XP Max Path Length issues with SVN / Domain Driven Design - any solutions

I'm using PHP 5.2 to make a website
I like to have explicit names for my classes
I also have a convention saying 'the path and name of a file' match the 'name of the class'
So a class called:
ABCSiteCore_Ctrlrs_DataTransfer_ImportMergeController
would sit in my svn working copy at:
C:\_my\websrv\ABCCoUkHosting2\webserve\my_library\vendor\ABCSiteCore-6-2\ABCSiteCore\Ctrlrs\DataTransfer\ImportMergeController.php
I find the naming convention gives me a better view of my code base, leading to better understanding and reducing the feeling of complexity.
Unfortunately there seems to be a max path length on my Windows XP PC. It seems to cause problems when I try to checkout Subversion files into my working copy.
If the path is too long, I can't check it out - the checkout fails.
So I find myself taking ages just to think of a name for a domain concept.
I might want to name a class "notification service", but I end up calling it something like "NtfctnSrvce". It also causes problems when I try to create a specification class.
Say, for example, I'd love to have a spec class with an explicit name, such as:
$hasBeenNotifiedSpec = new ABCSiteCore_Model_MssgSys_Rules_Customers_HasCustomerBeenSentNotificationOfOnlineTransactionPaymentByEmail($notificationLog);
if ($hasBeenNotifiedSpec->isSatisfiedBy($customer))
{
    // ...do something
}
By using my file-to-class-name naming convention, I can simply use Windows Explorer to get a good idea of what the class does, its place /role in the Model/View/Controller pattern etc.
ABCSiteCore\
  Model\
    MssgSys\
      Rules\
        Customer\
          HasCustomerBeenSentNotificationOfOnlineTransactionPaymentByEmail.php
Whenever I think of a name for a domain concept, I've got into the habit of pasting the potential path into a 'path length checker' to see if I can use it. It's just a piece of pre-formatted text in my working-notes wiki.
As you can see, that class name is unfortunately getting close to the limits:
C:\_my\websrv\ABCCoUkHosting2\webserve\my_library\vendor\ABCSiteCore-6-2\ABCSiteCore\Model\MssgSys\Rules\Customer\hasCustomerBeenSentNotificationOfOnlineTransactionPaymentByEmail.php
-------------------------------------------------------------------------------------------------------------------------------------------------------------------script path length danger zone------->|
----------------------------------------------------------------------------------------------------------------------------------------------------------------------max path length danger zone (inclusive .svn folder)------->|
C:\_my\websrv\ABCCoUkHosting2\webserve\my_library\vendor\ABCSiteCore-6-2\ABCSiteCore\Model\MssgSys\Rules\Customer\.svn\text-base\hasCustomerBeenSentNotificationOfOnlineTransactionPaymentByEmail.php.svn-base
Because of these path length restrictions, I tend to choose names for my entities that are not the best fit to the ubiquitous language of my domain model. This, can sometimes lead to misconceptions about how the system works, causes confusions and adds to the complexity - making development harder.
So:
How can I solve this issue?
Is it solvable, or is this just one of those practical constraints that we all just have to deal with?
Is this just a PC thing? It might be time to switch to Mac or Linux.
Here's some abstract advice which may help you: yes, switch operating systems, but not how you think. Get VirtualBox and load in a version of Linux (Ubuntu's nice and quick, and well supported), then use that virtual OS as your development OS. This way you get the best of both worlds, and when you're finished working you simply close the virtual machine (saving its state if you like) and have a nice clean PC to do what you want with.
The benefits are almost unlimited, for instance I have several virtual machines which I can load up whenever, some are testing setups, some mirror web server setups, I even have different editions of windows (such as an old XP with IE6) to test out browser bugs.
Surprisingly, I actually find my machine runs quicker using virtualbox; and please don't let the time to setup and run things worry you, as virtualbox is extremely quick, and you'll find your virtual machines load much faster than your primary os.
There are many other benefits which come with it, but overall it's a very liberating experience :)
The cause is actually mentioned here: http://svn.haxx.se/users/archive-2005-02/1088.shtml
The long and short of it is that if you use absolute paths for your
operations, you get access to 32k of path length. Relative paths are
limited to ~255.
The Subversion libraries use relative paths. (I hate it, always have,
but that's a battle I've no time to wage.) That said, if you feed
Subversion an absolute path, well, paths relative to an absolute path
are still themselves absolute, so you should be okay. Also, if you
use TortoiseSVN on Windows, you should be okay, because it always
feeds Subversion absolute paths.
Pretty embarrassing for something that is called an OS, if you ask me. Workaround: Use Tortoise SVN. However I do not like Tortoise on the other hand since it's so slow.
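If you want to automate the 'path length checker' from the question, a sketch along these lines might do. The function names are hypothetical, and the default limit of 255 simply reuses the approximate figure from the quoted post rather than a measured value.

```php
<?php
// Hypothetical helper: builds the working-copy path for a class that
// follows the "underscores map to directories" convention.
function class_file_path($libraryRoot, $class)
{
    return rtrim($libraryRoot, '\\/') . '\\' . str_replace('_', '\\', $class) . '.php';
}

// Returns true when the path is at most $limit characters long.
function path_fits($path, $limit = 255)
{
    return strlen($path) <= $limit;
}

// Usage:
// $path = class_file_path('C:\\my_library', 'ABCSiteCore_Ctrlrs_DataTransfer_ImportMergeController');
// var_dump(path_fits($path));
```

The .svn text-base copy of each file has a longer path than the file itself, so a stricter limit may be the safer choice when checking names for a Subversion working copy.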
