Efficient PHP auto-loading and naming strategies

Like most web developers these days, I'm thoroughly enjoying the benefits of solid MVC architecture for web apps and sites. When doing MVC with PHP, autoloading obviously comes in extremely handy.
I've become a fan of spl_autoload_register over simply defining a single __autoload() function, as this is obviously more flexible if you are incorporating different base modules that each use their own autoloading. However, I've never felt great about the loading functions that I write. They involve a lot of string checking and directory scanning in order to look for possible classes to load.
For example, let's say I have an app that has a base path defined as PATH_APP, and a simple structure with directories named models, views and controllers. I often employ a naming structure whereby files are named IndexView.php and IndexController.php inside the appropriate directory, and models generally have no particular scheme by default. I might have a loader function for this structure like this that gets registered with spl_autoload_register:
public function MVCLoader($class)
{
if (file_exists(PATH_APP.'/models/'.$class.'.php')) {
require_once(PATH_APP.'/models/'.$class.'.php');
return true;
}
else if (strpos($class,'View') !== false) {
if (file_exists(PATH_APP.'/views/'.$class.'.php')) {
require_once(PATH_APP.'/views/'.$class.'.php');
return true;
}
}
else if (strpos($class,'Controller') !== false) {
if (file_exists(PATH_APP.'/controllers/'.$class.'.php')) {
require_once(PATH_APP.'/controllers/'.$class.'.php');
return true;
}
}
return false;
}
If it's not found after that, I might have another function to scan sub-directories in the models directory. However, all the if/else-ing, string checking and directory scanning seems inefficient to me, and I'd like to improve it.
I'm very curious what file naming and autoloading strategies other developers might employ. I'm looking specifically for good techniques to employ for efficient autoloading, and not alternatives to autoloading.

This is what I have been using in all of my projects (lifted straight from the source of the last one):
public static function loadClass($class)
{
$files = array(
$class . '.php',
str_replace('_', '/', $class) . '.php',
);
foreach (explode(PATH_SEPARATOR, ini_get('include_path')) as $base_path)
{
foreach ($files as $file)
{
$path = "$base_path/$file";
if (file_exists($path) && is_readable($path))
{
include_once $path;
return;
}
}
}
}
If I look for SomeClass_SeperatedWith_Underscores it will look for SomeClass_SeperatedWith_Underscores.php followed by SomeClass/SeperatedWith/Underscores.php rooted at each directory in the current include path.
EDIT: I just wanted to put out there that I use this for efficiency in development, and not necessarily processing time. If you have PEAR on your path then with this you can just use the classes and don't have to include them when you need them.
I tend to keep my classes in a hierarchy of directories, with underscores breaking up namespaces... This code lets me keep the file structure nice and tidy if I want, or drop in a quick class file without nested directories if I want (for adding a single class or two to a library that it is dependent on, but that is not part of the project I am currently working on).

I landed on this solution:
I created a single script that traverses my class library folder (which contains subfolders for separate modules / systems), and parses the file contents looking for class definitions. If it finds a class definition in a php file (pretty simple regex pattern), it creates a symlink:
class_name.php -> actual/source/file.php
This lets me use a single, simple autoload function that needs only the class name and the path to the main symlink folder, and doesn't have to do any path/string manipulation.
The best part is that I can rearrange my source code completely or add a new subsystem and just run the link generating script to have everything autoloaded.
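The class-discovery step of such a script might look roughly like the sketch below. It is only an illustration, assuming a simple regex is good enough for the code base: it ignores namespaces and can be fooled by a declaration-like line inside a comment or string. The function name findClassNames is hypothetical and not taken from the original script.

```php
<?php
// Hypothetical helper: return the names of the classes, interfaces and
// traits declared in a string of PHP source code.
function findClassNames(string $source): array
{
    // Match "class Foo", "interface Foo" or "trait Foo" at the start of a
    // line, optionally preceded by "abstract" or "final".
    $pattern = '/^\s*(?:abstract\s+|final\s+)?(?:class|interface|trait)\s+([A-Za-z_][A-Za-z0-9_]*)/mi';
    preg_match_all($pattern, $source, $matches);
    return $matches[1];
}

$source = "<?php\nabstract class BaseModel {}\nfinal class User extends BaseModel {}\ninterface Storable {}\n";
print_r(findClassNames($source)); // BaseModel, User, Storable
```

For each name found, the link script would then call symlink() with the real file as the target and the class name plus ".php" as the link name inside the symlink folder.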

If you want efficiency then you shouldn't be using the autoload feature at all. The autoload feature is for being lazy. You should be providing an explicit path to your include files when you include them. If your autoload function can find these files then you could code to find them explicitly. When you are working on the view part of the code and about to load a new view class, by letting the autoload function handle it, it first assumes your class is a model class? That's inefficient. Instead your code should just be:
include_once $this->views_path . $class . '.php';
If you need multiple "view" paths, make a function that loads views:
public function load_view($class) {
// perhaps there's a mapping here instead....
foreach ($this->views_paths as $path) {
$filename = $path . $class . '.php';
if (file_exists($filename)) {
include_once $filename;
return; // stop at the first match, so the throw below is only reached when nothing was found
}
}
throw ....
}
In any case, at the point where the include occurs, you have the greatest/most accurate information about the class you want to load. Using that information to load the class fully is the only efficient class loading strategy. Yes, you may end up with more class variables or (heaven forbid) some global variables. But that is a better tradeoff than just being lazy and scanning parts of the file system for your class.

Related

PHP Autoload classes without importing namespaces on top of every script

Many developers writing object-oriented applications create one PHP
source file per class definition. One of the biggest annoyances is
having to write a long list of needed includes at the beginning of
each script (one for each class).
In PHP 5, this is no longer necessary. The spl_autoload_register()
function registers any number of autoloaders, enabling for classes and
interfaces to be automatically loaded if they are currently not
defined. Source: http://php.net/manual/en/language.oop5.autoload.php
Well, I found that this statement is not true, because I still end up writing a long list of imports in each file, simply because I am using different subfolders inside my includes folder, with namespaces that follow PHP-FIG's PSR-0 coding convention.
includes/core/database/
includes/core/html/
includes/domain/
etc.
spl_autoload_register() cannot automatically load the DB, HTML and domain logic classes because it does not know the folder structure where each file lives, so I am using namespaces to tell it, but that takes just as much space as having includes at the top of every script.
use MyProject\Core\Database;
use MyProject\Core\Html;
use MyProject\domain;
I use different classes per script, so I cannot simply make one big file and include_once() it; besides, importing namespaces does not work through include_once().
I instantiate class like this
try {
$DBQuery = new Database\DBQuery();
$HtmlGenerator = new Html\HtmlGenerator();
$domain = new domain\UserRegister();
} catch (Error $e) {
echo $e->getMessage();
}
My Autoload function
spl_autoload_register(function ($fullyQualifiedClassName) {
//change backslash in namespace name to DIRECTORY_SEPARATOR for file system
if ( stristr($fullyQualifiedClassName, "\\") ) {
$fullyQualifiedClassName = str_ireplace("\\", DIRECTORY_SEPARATOR, $fullyQualifiedClassName);
}
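//dirname() is used because THIS file is in the sub folder /includes and we need to go to the parent folder
//(assumption: the original snippet used an undefined $lib_patch; it is defined here to match the comment above)
$lib_path = dirname(__DIR__);
$class_path = $lib_path . DIRECTORY_SEPARATOR . "{$fullyQualifiedClassName}.php";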
if ( !is_file($class_path) ) {
throw new Error("Unable to load class with path: $class_path");
}
require_once $class_path;
});
Is there any way I can avoid importing multiple namespaces? I am open to dropping namespaces completely, but I'd like to keep my subfolder structure. Is there a way for the autoload function to know which folder my files are in, without code that loops through every subfolder looking for a file such as DBQuery.php? That would hurt performance.
Autoloading is saving you from including the files; you're now exclusively dealing with name resolution. If you don't want to write a bunch of use statements in your files, you could simply use the fully qualified names of those classes instead of aliasing them:
$db = new \MyProject\Core\Database;
$html = new \MyProject\Core\Html;
...
The use of use MyProject\Core\Database is that it enables you to write Database instead of \MyProject\Core\Database. Autoloading of the underlying file works the same.
If you even don't like that aspect, then it's hard to have your cake and eat it too. You could flatten your namespaces so you don't have as many different namespaces to import, but then your project organisation starts to become more prone to name clashes or harder to locate files. It's a tradeoff, something has to give somewhere. If you're not happy with some consequence of using namespaces, you need to find a new happy middleground for yourself.
Having said this, in many languages it is extremely common to have a bunch of import statements at the top of each file in one way or another. A decent IDE can largely auto-generate those while you write your code. It is something that you should rather get used to instead of fighting against it. It may be annoying, but the alternatives are more name clashes or giant imports. It's virtually impossible to have it modular, fast, extensible and terse all at the same time.
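On the question of how the autoloader can know the folder without searching: when the namespace mirrors the directory layout, the path can be computed directly from the class name. The following is a minimal sketch under that assumption. The helper name classToPath, the MyProject\ prefix and the includes base directory are illustrative, and on case-sensitive file systems the directory names would have to match the namespace segments exactly (Core/Database rather than core/database).

```php
<?php
// Hypothetical helper: map a fully qualified class name to a file path by
// stripping a namespace prefix and turning the remainder into directories.
function classToPath(string $class, string $prefix, string $baseDir): ?string
{
    $class = ltrim($class, '\\');
    // Ignore classes that live outside the given namespace prefix.
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;
    }
    $relative = substr($class, strlen($prefix));
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}

spl_autoload_register(function (string $class): void {
    // Assumed layout: MyProject\Core\Database\DBQuery is stored in
    // includes/Core/Database/DBQuery.php next to this file.
    $path = classToPath($class, 'MyProject\\', __DIR__ . '/includes');
    if ($path !== null && is_file($path)) {
        require $path;
    }
});

echo classToPath('MyProject\\Core\\Database\\DBQuery', 'MyProject\\', '/var/www/includes'), "\n";
// prints /var/www/includes/Core/Database/DBQuery.php
```

With a loader like this, each class costs one is_file() call, whether or not the calling script imports the namespace with use.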

How can I reconcile SplClassLoader's namespace requirement with my custom directory layout?

I recently started writing a custom MVC framework in PHP. It's basically a learning exercise.
My classes are located in the following directories:
system/libraries/
system/controllers/
system/models
application/libraries/
application/controlers/
application/models
I'm not using namespaces because I can't figure out how to instantiate controllers using namespaces and Apache 2 handler style URLs (controller/action/id). I created a Bootstrap class to autoload my other classes:
class Bootstrap
{
public function autoloadClasses($class)
{
$class .= '.php';
$classDirectories = array(
SYSTEM_LIBS_DIR,
SYSTEM_CONTROLLERS_DIR,
SYSTEM_MODELS_DIR,
APPLICATION_LIBS_DIR,
APPLICATION_CONTROLLERS_DIR,
APPLICATION_MODELS_DIR
);
foreach ($classDirectories as $classDirectory) {
$directoryIterator = new DirectoryIterator($classDirectory);
foreach($directoryIterator as $file) {
if ($file == $class) {
include $classDirectory . $class;
break 2;
}
}
}
}
public function register()
{
spl_autoload_register(array($this, 'autoloadClasses'), true);
}
public function init()
{
$this->register();
$loader = new Loader($_GET);
$controller = $loader->createController();
$controller->executeAction();
}
}
It works fine. However, I know I should really be using the implementation recommended by PSR-0:
https://gist.github.com/221634
However, I can't figure out how to get it to work without namespaces. It looks like the namespace is an optional parameter. However, if I do the following, nothing happens -- not even an error in the Apache logs:
$libLoader = new SplClassLoader('', 'system/libraries');
The goal of PSR-0 was to try and specify how external third-party library classes should be named, and where the files containing those classes should live on disk. This goal was accomplished, and from a high level, it's not too bad of a thing. Interoperability and not stepping all over other libraries is a good thing.
Your directory layout and class naming scheme doesn't mesh with PSR-0, which means SplClassLoader is going to be nigh-useless for you.
You have two options:
Rename all of your classes, shuffle them into a namespace hierarchy, and refactor the rest of the code that needs to worry about it, or
Don't use SplClassLoader and write your own autoloader.
If you're building a library intended for external distribution, it'll be a good idea to make yourself PSR-0 compliant, as it's pretty darn simple, logical and painless.
If you're building your own app for your own use and don't intend it as a library, then you are under no requirement to do all of that work, and you shouldn't, because it'd be silly. This looks like it's the case, so I can end with a big fat: don't bother.
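If you do write your own autoloader for a layout like the one in the question, it does not need to iterate over directory contents. A rough sketch, assuming every class Foo lives in a file named Foo.php directly inside one of the listed directories (the factory name makeDirectoryAutoloader is hypothetical):

```php
<?php
// Hypothetical factory: build an autoloader that checks a fixed list of
// directories for "<ClassName>.php", using one is_file() call per directory.
function makeDirectoryAutoloader(array $directories): callable
{
    return function (string $class) use ($directories): void {
        foreach ($directories as $directory) {
            $file = rtrim($directory, '/') . '/' . $class . '.php';
            if (is_file($file)) {
                require $file;
                return;
            }
        }
        // Not found: return quietly so other registered loaders can try.
    };
}

// Directory names follow the layout described in the question.
spl_autoload_register(makeDirectoryAutoloader(array(
    'system/libraries',
    'system/controllers',
    'system/models',
    'application/libraries',
    'application/controllers',
    'application/models',
)));
```

Compared with looping over a DirectoryIterator, this asks the file system one direct question per directory instead of listing every file in it.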
I got it to work. YAY!
Here is the code from my front controller (index.php). I'm going to refactor it, since it would be cleaner to simply make one call to some type of bootstrap class:
<?php
use NeilMVC\system\libraries\Loader;
require_once('conf/conf.php');
require_once('SplClassLoader.php');
$loadSystemLibraries = new SplClassLoader('NeilMVC\system\libraries');
$loadSystemControllers = new SplClassLoader('NeilMVC\system\controllers');
$loadSystemModels = new SplClassLoader('NeilMVC\system\models');
$loadApplicationLibraries = new SplClassLoader('NeilMVC\application\libraries');
$loadApplicationControllers = new SplClassLoader('NeilMVC\application\controllers');
$loadApplicationModels = new SplClassLoader('NeilMVC\application\models');
$loadSystemLibraries->register();
$loadSystemControllers->register();
$loadSystemModels->register();
$loadApplicationLibraries->register();
$loadApplicationControllers->register();
$loadApplicationModels->register();
$loader = new Loader($_GET);
$controller = $loader->createController();
$controller->executeAction();
I had to refactor some of the classes in order to resolve fully qualified class names to the unqualified names used in the MVC-style URLs. It wasn't that hard to do; I just had to tinker with it to understand it. If anyone wants to know more, you can email me through my website http://neilgirardi.com
Cheers and happy holidays!

Efficient autoload function

I currently am building my own PHP framework and am creating a lot of directories to store my classes in.
This is my current autoload function:
function __autoload($className)
{
$locations = array('', 'classes/', 'classes/calendar/', 'classes/exceptions/', 'classes/forms/', 'classes/table/', 'classes/user/', 'pages/', 'templates/');
$fileName = $className . '.php';
foreach($locations AS $currentLocation)
{
if(file_exists($currentLocation . $fileName))
{
include_once ($currentLocation . $fileName);
return;
}
}
}
Now in my main class file I do have all of the necessary classes already included so that they won't have to be searched for.
Here are my questions:
1. Is this function efficient enough? Will there be a lot of load time or is there a way for me to minimize the load time?
2. Is include_once() the way that I should go about including the classes?
3. Is there a way that I could write the function to guess at the most popular folders? Or would that take up too much time and/or not possible?
4. Would namespaces help me at all? (I am reading and learning about them right now.)
1. This is answered very well here: autoload and multiple directories
2. You should probably go with require, for two reasons: a) you don't need to have PHP track if the file has been already included, because if it has it won't need to call __autoload in the first place and b) if the file cannot be included you won't be able to continue execution anyway
3. The answer for point 1 covers this
4. Not necessarily; you need some namespace-like mechanism to implement faster loading (to only look where you have to) but you can fake it if necessary without using real namespaces
For reference, the interaction between __autoload and namespaces is documented here.
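One way to "fake" namespaces is to let a class-name prefix decide the directory, so the loader looks in exactly one place. The sketch below assumes a naming convention that is not in the question (calendar classes start with Calendar, form classes with Form, and so on); the helper name locateByPrefix is hypothetical.

```php
<?php
// Hypothetical helper: choose a directory from the class-name prefix, so the
// loader checks a single location instead of trying every directory in turn.
function locateByPrefix(string $className, array $prefixDirs, string $defaultDir): string
{
    foreach ($prefixDirs as $prefix => $dir) {
        if (strpos($className, $prefix) === 0) {
            return $dir . $className . '.php';
        }
    }
    return $defaultDir . $className . '.php';
}

// Assumed convention, using directory names from the question.
$prefixDirs = array(
    'Calendar' => 'classes/calendar/',
    'Form'     => 'classes/forms/',
    'Table'    => 'classes/table/',
);

spl_autoload_register(function ($className) use ($prefixDirs) {
    $file = locateByPrefix($className, $prefixDirs, 'classes/');
    if (is_file($file)) {
        require $file;
    }
});

echo locateByPrefix('CalendarEvent', $prefixDirs, 'classes/'), "\n"; // classes/calendar/CalendarEvent.php
```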

Would performance suffer using autoload in php and searching for the class file?

I've always struggled with how to best include classes into my PHP code. Pathing is usually an issue, but a few minutes ago I found this question, which helps with that dramatically. Now I'm reading about __autoload and thinking that it could make the process of developing my applications much easier. The problem is that I like to maintain a folder structure to separate areas of functionality, as opposed to throwing everything into a general /lib folder. So if I override autoload to do a deep search of a class folder including all subfolders, what performance hits can I expect?
Obviously this will depend on scale, depth of the folder structure and number of classes but generally I'm asking on a medium scale project will it cause problems.
__autoload is great, but calling stat() on all the files in a recursive search function is expensive. You might want to look at building a tree of files to use for autoloading. In my framework, I consistently name files for their classes and use a map that is cached for the data.
Check out http://trac.framewerk.org/cgi-bin/trac.fcgi/browser/trunk/index.php [dead link] starting at line 68 for an idea of how this can be done.
Edit: And to more directly answer your question, without caching, you can expect a performance hit on a site with medium to heavy traffic.
A common pattern (PEAR and Zend Framework, for example) is to make the class name reflect the path, so Db_Adapter_Mysql will be at /Db/Adapter/Mysql.php, relative to some directory that has been added to the include path.
There are 2 ways that you could easily do this, first of all, name your classes so that they'll define the structure of where to find them
function __autoload($classname)
{
try
{
if (class_exists($classname, false) OR interface_exists($classname, false))
{
return;
}
$class = explode('_', strtolower(strval($classname))); // split() was removed in PHP 7
if (array_shift($class) != 'majyk')
{
throw new Exception('Autoloader tried to load a class that does not belong to us ( ' . $classname . ' )');
}
switch (count($class))
{
case 1: // Core Class - matches Majyk_Foo - include /core/class_foo.php
$file = MAJYK_DIR . 'core/class_' . $class[0] . '.php';
break;
case 2: // Subclass - matches Majyk_Foo_Bar - includes /foo/class_bar.php
$file = MAJYK_DIR . $class[0] . '/class_' . $class[1] . '.php';
break;
default:
throw new Exception('Unknown Class Name ( ' . $classname .' )');
}
if (file_exists($file))
{
require_once($file);
if (!class_exists($classname, false) AND !interface_exists($classname, false))
{
throw new Exception('Class cannot be found ( ' . $classname . ' )');
}
}
else
{
throw new Exception('Class File Cannot be found ( ' . str_replace(MAJYK_DIR, '', $file) . ' )');
}
}
catch (Exception $e)
{
// spl_autoload($classname);
echo $e->getMessage();
}
}
Or, 2, use multiple autoloaders. PHP >= 5.1.2 has the SPL library, which allows you to add multiple autoloaders. You add one for each path, and PHP will find the class on its way through them. Or just add the paths to the include path and use the default spl_autoload().
An example
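function autoload_foo($classname)
{
// Check first: a failed require_once is fatal and would stop the next loader from running
if (file_exists('foo/' . $classname . '.php')) {
require_once('foo/' . $classname . '.php');
}
}
function autoload_bar($classname)
{
if (file_exists('bar/' . $classname . '.php')) {
require_once('bar/' . $classname . '.php');
}
}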
spl_autoload_register('autoload_foo');
spl_autoload_register('autoload_bar');
spl_autoload_register('spl_autoload'); // Default SPL Autoloader
Autoload is a great PHP feature that helps you very much...
Performance won't suffer if you use a smart taxonomy like this:
1. every library stays in the folders "packages"
2. every class is located by replacing the "_" in the class name with the "/" and adding a ".php" at the end
class = My_App_Smart_Object
file = packages/My/App/Smart/Object.php
The benefit of this approach (used by almost every framework) is also a smarter organization of your code :-)
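A minimal sketch of that underscore-to-directory convention, assuming the packages folder sits in the current working directory (the function name classNameToFile is illustrative):

```php
<?php
// Hypothetical helper: turn My_App_Smart_Object into
// packages/My/App/Smart/Object.php without touching the file system.
function classNameToFile(string $className, string $baseDir = 'packages'): string
{
    return $baseDir . '/' . str_replace('_', '/', $className) . '.php';
}

spl_autoload_register(function ($className) {
    $file = classNameToFile($className);
    // A single existence check per class; no directory scanning.
    if (is_file($file)) {
        require $file;
    }
});

echo classNameToFile('My_App_Smart_Object'), "\n"; // packages/My/App/Smart/Object.php
```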
Hunting for files all over the place will make things slower (many more disk hits). Loading all of your classes in case you might need them will make things take more memory. Specifying which classes you need in every file is difficult to maintain (i.e. they don't get removed if they're no longer used).
The real question is which of these is more important to you? They're all tradeoffs, in the end, so you have to pick one. It's arguable, though, that most of the overhead in the second and third options has to do with actually compiling the code. Using something like APC can significantly reduce the overhead of loading and compiling every class on every page load.
Given the use of APC, I would likely take the approach of dividing up my code into modules (e.g. the web interface module, the database interaction module, etc.) and have each of those modules import all the classes for their module, plus classes from other modules they may need. It's a tradeoff between the last two, and I've found it works well enough for my needs.
I tend to use a simple approach where __autoload() consults a hash mapping class names to relative paths, which is contained in a file that's regenerated using a simple script which itself performs the recursive search.
This requires that the script be run when adding a new class file or restructuring the code base, but it also avoids "cleverness" in __autoload() which can lead to unnecessary stat() calls, and it has the advantage that I can easily move files around within my code base, knowing that all I need to do is run a single script to update the autoloader.
The script itself recursively inspects my includes/ directory, and assumes that any PHP file not named in a short list of exclusions (the autoloader itself, plus some other standard files I tend to have) contains a class of the same name.
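A generator along those lines could be sketched as follows. This is not the author's script, just an illustration under the same assumption that a file Foo.php declares a class Foo; the function names and the choice to store the map as a PHP file that returns an array are mine.

```php
<?php
// Hypothetical generator: scan a directory tree and map class names to file
// paths, assuming each Foo.php declares a class named Foo.
function buildClassMap(string $root, array $excluded = array()): array
{
    $map = array();
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($iterator as $file) {
        // Skip non-PHP files and anything on the exclusion list.
        if ($file->getExtension() !== 'php' || in_array($file->getFilename(), $excluded, true)) {
            continue;
        }
        $map[$file->getBasename('.php')] = $file->getPathname();
    }
    ksort($map);
    return $map;
}

// Store the map as a PHP file that returns an array, so the autoloader can
// load it with a plain include.
function writeClassMap(array $map, string $target): void
{
    file_put_contents($target, '<?php return ' . var_export($map, true) . ";\n");
}
```

The generator would be run from the command line after adding or moving class files, for example writeClassMap(buildClassMap('includes', array('autoload.php')), 'includes/classmap.php'), where both file names are placeholders.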
Zend Framework's approach is to do autoload based on the PEAR folder standard (Class_Foo maps to /Class/Foo.php), however rather than using a set base path it uses the include_path.
The problem with their approach is there's no way to check beforehand if a file exists so the autoload will try to include a file that doesn't exist in any of the include_path's, error out, and never give any other autoload functions registered with spl_autoload_register a chance to include the file.
So a slight deviation is to manually provide an array of base paths where the autoload can expect to find classes setup in the PEAR fashion and just loop over the base paths:
<?php
//...
foreach( $paths as $path )
{
if( file_exists($path . $classNameToFilePath) )
{
include $path . $classNameToFilePath;
break; // stop at the first base path that contains the file
}
}
//...
?>
Granted, you'll still be searching, but for each autoload you'll be doing at worst n checks, where n is the number of base paths you are checking.
But if you find yourself still having to recursively scan directories the question is not "Will autoload hurt my performance," the question should be "why am I tossing my class files around in a random structure?" Sticking to the PEAR structure will save you so many headaches, and even if you decide to go with manually doing your includes as opposed to autoload, there will be no guessing as to where the class files are located when you do your include statements.

How to handle including needed classes in PHP

I'm wondering what the best practice is for handling the problem with having to "include" so many files in my PHP scripts in order to ensure that all the classes I need to use are accessible to my script.
Currently, I'm just using include_once to include the classes I access directly. Each of those would include_once the classes that they access.
I've looked into using the __autoload function, but that doesn't seem to work well if you plan to have your class files organized in a directory tree. If you did this, it seems like you'd end up walking the directory tree until you found the class you were looking for. Also, I'm not sure how this affects classes with the same name in different namespaces.
Is there an easier way to handle this?
Or is PHP just not suited to "enterprisey" type applications with lots of different objects all located in separate files that can be in many different directories.
In my applications I usually have a setup.php file that includes all core classes (i.e. the framework and accompanying libraries). My custom classes are loaded using an autoloader aided by a directory layout map.
Each time a new class is added, I run a command-line builder script that scans the whole directory tree in search of model classes and then builds an associative array with class names as keys and paths as values. Then the __autoload function looks up the class name in that array and gets the include path. Here's the code:
autobuild.php
define('MAP', 'var/cache/autoload.map');
error_reporting(E_ALL);
require 'setup.php';
print(buildAutoloaderMap() . " classes mapped\n");
function buildAutoloaderMap() {
$dirs = array('lib', 'view', 'model');
$cache = array();
$n = 0;
foreach ($dirs as $dir) {
foreach (new RecursiveIteratorIterator(new RecursiveDirectoryIterator($dir)) as $entry) {
$fn = $entry->getFilename();
if (!preg_match('/\.class\.php$/', $fn))
continue;
$c = str_replace('.class.php', '', $fn);
if (!class_exists($c)) {
$cache[$c] = ($pn = $entry->getPathname());
++$n;
}
}
}
ksort($cache);
file_put_contents(MAP, serialize($cache));
return $n;
}
autoload.php
define('MAP', 'var/cache/autoload.map');
function __autoload($className) {
static $map;
$map or ($map = unserialize(file_get_contents(MAP)));
$fn = array_key_exists($className, $map) ? $map[$className] : null;
if ($fn and file_exists($fn)) {
include $fn;
unset($map[$className]);
}
}
Note that the file naming convention must be [class_name].class.php. Change the directories in which classes are looked up in autobuild.php. You can also run the autobuilder from the autoload function when a class is not found, but that may send your program into an infinite loop.
Serialized arrays are darn fast.
@JasonMichael: PHP 4 is dead. Get over it.
You can define multiple autoloading functions with spl_autoload_register:
spl_autoload_register('load_controllers');
spl_autoload_register('load_models');
function load_models($class){
if( !file_exists("models/$class.php") )
return false;
include "models/$class.php";
return true;
}
function load_controllers($class){
if( !file_exists("controllers/$class.php") )
return false;
include "controllers/$class.php";
return true;
}
You can also programmatically determine the location of the class file by using structured naming conventions that map to physical directories. This is how Zend do it in Zend Framework. So when you call Zend_Loader::loadClass("Zend_Db_Table"); it explodes the classname into an array of directories by splitting on the underscores, and then the Zend_Loader class goes to load the required file.
Like all the Zend modules, I would expect you can use just the loader on its own with your own classes but I have only used it as part of a site using Zend's MVC.
But there have been concerns about performance under load when you use any sort of dynamic class loading, for example see this blog post comparing Zend_Loader with hard loading of class files.
As well as the performance penalty of having to search the PHP include path, it defeats opcode caching. From a comment on that post:
When using ANY Dynamic class loader APC can’t cache those files fully as its not sure which files will load on any single request. By hard loading the files APC can cache them in full.
__autoload works well if you have a consistent naming convention for your classes that tell the function where they're found inside the directory tree. MVC lends itself particularly well for this kind of thing because you can easily split the classes into models, views and controllers.
Alternatively, keep an associative array of names to file locations for your class and let __autoload query this array.
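A small sketch of the array-lookup variant, written against spl_autoload_register rather than __autoload. The map entries are made-up examples and makeMapAutoloader is a hypothetical name.

```php
<?php
// Hypothetical factory: build an autoloader that resolves classes through an
// associative array of class name => file path.
function makeMapAutoloader(array $classMap): callable
{
    return function (string $className) use ($classMap): void {
        // One array lookup; no directory scanning or string matching.
        if (isset($classMap[$className])) {
            require $classMap[$className];
        }
    };
}

// Example entries only; a real map would be maintained by hand or generated.
spl_autoload_register(makeMapAutoloader(array(
    'User'           => 'models/User.php',
    'UserController' => 'controllers/UserController.php',
    'UserView'       => 'views/UserView.php',
)));
```

Classes that are missing from the map are simply left for any other registered autoloader to handle.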
Of the suggestions so far, I'm partial to Kevin's, but it doesn't need to be absolute. I see a couple different options to use with __autoload.
1. Put all class files into a single directory. Name the file after the class, i.e. classes/User.php or classes/User.class.php.
2. Kevin's idea of putting models into one directory, controllers into another, etc. Works well if all of your classes fit nicely into the MVC framework, but sometimes, things get messy.
3. Include the directory in the classname. For example, a class called Model_User would actually be located at classes/Model/User.php. Your __autoload function would know to translate an underscore into a directory separator to find the file.
4. Just parse the whole directory structure once. Either in the __autoload function, or even just in the same PHP file where it's defined, loop over the contents of the classes directory and cache what files are where. So, if you try to load the User class, it doesn't matter if it's in classes/User.php or classes/Models/User.php or classes/Utility/User.php. Once it finds User.php somewhere in the classes directory, it will know what file to include when the User class needs to be autoloaded.
@Kevin:
I was just trying to point out that spl_autoload_register is a better alternative to __autoload since you can define multiple loaders, and they won't conflict with each other. Handy if you have to include libraries that define an __autoload function as well.
Are you sure? The documentation says differently:
If your code has an existing __autoload function then this function must be explicitly registered on the __autoload stack. This is because spl_autoload_register() will effectively replace the engine cache for the __autoload function by either spl_autoload() or spl_autoload_call().
=> you have to explicitly register any library's __autoload as well. But apart from that you're of course right, this function is the better alternative.
__autoload will work, but only in PHP 5.
