CodeSniffer sniff for generating dependency graphs for PHP code?

GOAL:
I'm interested in generating a DOT Format description of the class dependencies in a PHP program.
IDEA:
It shouldn't be hard to write a CodeSniffer "sniff" that can detect (and emit DOT records for) the following patterns in PHP source:
class SomeClassName extends BasicClassName { // SomeClassName refers to BasicClassName
...
new OtherClassName(); // SomeClassName refers to OtherClassName
ThisClassName::some_method(); // SomeClassName refers to ThisClassName
ThatClassName::$some_member; // SomeClassName refers to ThatClassName
RandomClassName::some_constant; // SomeClassName refers to RandomClassName
...
}
But I haven't found any published sniffs to emit this information (and any other patterns indicating a "real" class dependency relationship that I may have missed).
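As a rough illustration of the patterns listed above, here is a minimal sketch in Python. It is not a PHP_CodeSniffer sniff and it uses naive regular expressions rather than a tokenizer, so treat it only as a stand-in for the idea. The function names (`class_dependencies`, `to_dot`) are made up for this example. It does not track braces, so everything after a `class` line is attributed to that class.

```python
import re

# Naive stand-ins for the listed patterns; a real tool would use a tokenizer.
CLASS_RE = re.compile(r"\bclass\s+(\w+)(?:\s+extends\s+(\w+))?")
NEW_RE = re.compile(r"\bnew\s+([A-Za-z_]\w*)")
STATIC_RE = re.compile(r"\b([A-Za-z_]\w*)::")

def class_dependencies(php_source):
    """Return sorted (class, dependency) pairs found in php_source."""
    edges = set()
    current = None
    for line in php_source.splitlines():
        code = line.split("//")[0]  # drop trailing line comments (naive)
        match = CLASS_RE.search(code)
        if match:
            current = match.group(1)
            if match.group(2):
                edges.add((current, match.group(2)))
            continue
        if current is None:
            continue  # code outside any class is ignored in this sketch
        for name in NEW_RE.findall(code) + STATIC_RE.findall(code):
            if name not in ("self", "static", "parent", current):
                edges.add((current, name))
    return sorted(edges)

def to_dot(edges):
    """Render the edge list as a DOT digraph."""
    lines = ["digraph dependencies {"]
    lines += ['    "%s" -> "%s";' % edge for edge in edges]
    lines.append("}")
    return "\n".join(lines)

example = """class SomeClassName extends BasicClassName {
    function f() {
        new OtherClassName(); // SomeClassName refers to OtherClassName
        ThisClassName::some_method();
    }
}"""
print(to_dot(class_dependencies(example)))
```

Dynamic instantiation such as `new $name` is deliberately not matched, since the class name cannot be known statically.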
NOTE:
I specifically do not care about PHP's include() and require() statements (whose behavior I'm not convinced is even well-defined). For the purposes of this question let's assume that all PHP class resolution is handled via autoloading, and that we're looking to use only static code analysis to build the class dependency diagram.
EDIT:
Unfortunately, I see no general way to deal with the following:
class ThatClassName {
...
function generateClassName() {
// something too complicated to analyze statically...
}
function foo() {
$name = $this->generateClassName();
$instance = new $name; // ThatClassName refers to ... what?
...
}
...
}
But of course it would be possible to represent this scenario in a dependency graph by showing ThatClassName with a dependency on the generateClassName() method - perhaps shown with parens to make the method name easily distinguishable from a class name. And it probably wouldn't be a bad idea to establish a convention whereby any method which generates a class name dynamically must contain an annotation (in the associated comment) which indicates every class name which might possibly be generated - these "documented dynamic dependencies" could then be automatically included in the dependency graph.
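The annotation convention suggested above could be read by the same kind of tool. The sketch below assumes a hypothetical tag name, `@dynamic-dependency`, which is not an existing PHPDoc tag; any agreed-upon tag would do.

```python
import re

# "@dynamic-dependency" is a hypothetical tag name chosen for this sketch.
TAG_RE = re.compile(r"@dynamic-dependency\s+([A-Za-z_][\w\\]*)")

def documented_dynamic_dependencies(doc_comment):
    """Return the class names listed in a method's doc comment."""
    return TAG_RE.findall(doc_comment)

comment = """/**
 * Builds a handler class name from configuration.
 * @dynamic-dependency JsonHandler
 * @dynamic-dependency Legacy\\XmlHandler
 */"""
for name in documented_dynamic_dependencies(comment):
    print(name)
```

Each name found this way could be added as an extra edge from the class that contains the annotated method.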

This isn't really what PHP_CodeSniffer is designed to do, specifically because sniffs are not supposed to output data or write to files. But there is certainly nothing stopping you from doing this inside a sniff: it's just PHP code after all, and it doesn't need to report any errors or warnings.
I haven't come across any sniffs that are doing anything like you describe, so I think you'd have to write a new one.
If you want to create a new sniff, I'd recommend starting with an abstract scope sniff. This allows you to look for T_NEW and T_DOUBLE_COLON tokens inside T_CLASS tokens. Here is an example.
Or, if you also want to look into global functions and other code outside classes, you can just look for T_NEW and T_DOUBLE_COLON tokens inside a regular sniff.
If you're not sure how to get started or you just want some help writing the sniff, contact me and I can help write this with you. I'd just need to know what output you'd want for each case found, or I can just use something basic. If you want a hand, my email is: gsherwood at squiz dot net

I wrote a tool for this: PhpDependencyAnalysis.
This is an extensible static code analysis tool for object-oriented PHP projects (>= 5.3.3) based on namespaces. It creates dependency graphs on customizable levels, e.g. on package level or on class level. Thus, it's usable to visualize dependencies in general, but it's also usable to detect violations between layers in a tiered architecture, checking compliance with Separation of Concerns, the Law of Demeter and the Acyclic Dependencies Principle.
You can also change the output format to DOT.
Just check PhpDependencyAnalysis on GitHub.

Related

PHP Dependency Injection - 2 classes contain each other

I have 2 classes - UserService and ProfileService. I need each class to contain the other one. I am trying to do this with dependency injection:
$this->container['userService'] = function ($c) {
return new UserService($c['profileService']);
};
$this->container['profileService'] = function ($c) {
return new ProfileService($c['userService']);
};
In each class I define a constructor to handle these parameters. However, I am getting an "ERR_EMPTY_RESPONSE" error in Chrome. What's the right way to do this?
Circular dependencies are in most cases a sign of bad design. They mean that the two classes are very tightly coupled, which often implies mixing of concerns.
Instead of trying to force this to work, try to find out why there is such a dependency: why does one class need the other (and vice versa)? And is that part really related to a particular side of the problem?
Once the dependency is understood, the solution usually is to extract the offending part into a third (or fourth) class and let both classes refer to that, instead of referring to each other.
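A minimal sketch of that extraction, written in Python for brevity. The shared class `AccountStore` and the methods `display_name` and `profile_url` are hypothetical, since the question does not say what the two services actually need from each other. The container is modeled as a plain dict of factories, loosely mirroring the closures in the question.

```python
class AccountStore:
    """Hypothetical shared collaborator holding what both services need."""

    def __init__(self):
        self._names = {1: "alice"}

    def name_of(self, user_id):
        return self._names[user_id]


class UserService:
    def __init__(self, store):
        self.store = store

    def display_name(self, user_id):
        return self.store.name_of(user_id).title()


class ProfileService:
    def __init__(self, store):
        self.store = store

    def profile_url(self, user_id):
        return "/profiles/" + self.store.name_of(user_id)


# Each factory depends only on the store, so no factory calls back into another service.
container = {
    "store": lambda c: AccountStore(),
    "userService": lambda c: UserService(c["store"](c)),
    "profileService": lambda c: ProfileService(c["store"](c)),
}

user_service = container["userService"](container)
profile_service = container["profileService"](container)
print(user_service.display_name(1), profile_service.profile_url(1))
```

With the original wiring, building either service asks the container for the other one, which asks for the first again, so construction never finishes.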
Of course knowing more about your particular problem would also enable me to give you a better answer.

Is the Visitor pattern useful for dynamically typed languages?

The Visitor pattern allows operations on objects to be written without extending the object class. Sure. But why not just write a global function, or a static class, that manipulates my object collection from the outside? Basically, in a language like java, an accept() method is needed for technical reasons; but in a language where I can implement the same design without an accept() method, does the Visitor pattern become trivial?
Explanation: In the Visitor pattern, visitable classes (entities) have a method .accept() whose job is to call the visitor's .visit() method on themselves. I can see the logic of the java examples: The visitor defines a different .visit(n) method for each visitable type n it supports, and the .accept() trick must be used to choose among them at runtime. But languages like python or php have dynamic typing and no method overloading. If I am a visitor I can call an entity method (e.g., .serialize()) without knowing the entity's type or even the full signature of the method. (That's the "double dispatch" issue, right?)
I know an accept method could pass protected data to the visitor, but what's the point? If the data is exposed to the visitor classes, it is effectively part of the class interface since its details matter outside the class. Exposing private data never struck me as the point of the visitor pattern, anyway.
So it seems that in python, ruby or php I can implement a visitor-like class without an accept method in the visited object (and without reflection), right? If I can work with a family of heterogeneous objects and call their public methods without any cooperation from the "visited" class, does this still deserve to be called the "Visitor pattern"? Is there something to the essence of the pattern that I am missing, or does it just boil down to "write a new class that manipulates your objects from the outside to carry out an operation"?
PS. I've looked at plenty of discussion on SO and elsewhere, but could not find anything that addresses this question. Pointers welcome.
The place where Visitor is particularly useful is where the Visitor needs to switch on the type of Visitees, and for whatever reason, you don't want to encode that knowledge into the Visitees (think plugin architectures). Consider the following Python code:
Visitor style
class Banana(object):
    def visit(self, visitor):
        visitor.process_banana(self)
class Apple(object):
    def visit(self, visitor):
        visitor.process_apple(self)
class VisitorExample(object):
    def process_banana(self, banana):
        print("Mashing banana: ", banana)
    def process_apple(self, apple):
        print("Crunching apple: ", apple)
(Note that we could compress the visitee logic with a base class/mixin).
Compare with:
Non-visitor style
class NonVisitorVisitor(object):
    def process(self, fruit):
        verb = {Banana: "Mashing banana: ",
                Apple: "Crunching apple: "}[type(fruit)]
        print(verb, fruit)
In the second example, the fruits don't need any special support for the "visitor", and the "visitor" handles the absence of logic for the given type.
By contrast, in Java or C++ the second example is not really possible, and the visit method (in the visitees) can use one name to refer to all versions of the process method; the compiler will pick the version which applies to the type being passed; and the visitor can easily provide a default implementation for the root class for the type of visitees. It's also necessary to have a visit method in the visitees because the method variant (e.g. process(Banana b) vs process(Apple a)) is selected at compile time in the code generated for the visitee's visit method.
Accordingly, in languages like Python or Ruby where there is no dispatch on parameter types (or rather, the programmer has to implement it themselves), there is no need for the visitor pattern. Alternatively, one might say the visitor pattern is better implemented without the dispatching through visitee methods.
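One common way to write such a visitor without any visitee method is to look up the handler by the visited object's class name. The sketch below is only an illustration; the class `NameDispatchVisitor` and the extra `Cherry` type are made up for this example. Python's standard `ast.NodeVisitor` uses the same `visit_<ClassName>` plus `generic_visit` convention.

```python
class Banana:
    pass


class Apple:
    pass


class Cherry:
    pass


class NameDispatchVisitor:
    """Picks a handler from the visited object's class name; visitees need no accept()."""

    def visit(self, fruit):
        handler = getattr(self, "visit_" + type(fruit).__name__, self.generic_visit)
        return handler(fruit)

    def generic_visit(self, fruit):
        # Fallback for types this visitor has no specific handler for.
        return "No special handling for " + type(fruit).__name__

    def visit_Banana(self, banana):
        return "Mashing banana"

    def visit_Apple(self, apple):
        return "Crunching apple"


visitor = NameDispatchVisitor()
for fruit in (Banana(), Apple(), Cherry()):
    print(visitor.visit(fruit))
```

Unlike the dictionary lookup in the non-visitor example, an unknown type falls through to `generic_visit` instead of raising an error.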
In general in dynamic languages like Python, Ruby, or Smalltalk, it is better to have the "visitee" classes carry all the information needed (here, the verb applicable), and if necessary, provide hooks to support the "visitor", such as command or strategy patterns, or use the Non-visitor pattern shown here.
Conclusion
The non-Visitor is a clean way to implement the type-switching logic, notwithstanding that explicit type switching is usually a code smell. Remember that the Java and C++ way of doing it is also explicit switching in the Visitor; the elegance of the pattern in those languages is that it avoids having explicit switching logic in the visitees, which is not possible in dynamic languages with untyped variables. Accordingly, the Visitor pattern at the top is bad for dynamic languages because it reproduces the sin which the Visitor pattern in static languages seeks to avoid.
The thing with using patterns is that rather than slavishly reproducing UML diagrams, you must understand what they are trying to accomplish, and how they accomplish those goals with the language machinery concretely under consideration. In this case, the pattern to achieve the same merits looks different, and has a different pattern of calls. Doing so will allow you to adapt them to different languages, but also to different concrete situations within the same language.
Update: here's a ruby article on implementing this pattern: http://blog.rubybestpractices.com/posts/aaronp/001_double_dispatch_dance.html
The double dispatch seems rather forced to me; you could just do away with it, as far as I can tell.
This answer is written in ignorance of PHP etc., but the Visitor typically needs to call more than just a single method (you mentioned "serialize") on the entities. When the Visit() method is called on the concrete Visitor, the Visitor is capable of running distinctly different code for each entity subtype. I don't see how that is different in a dynamically-typed language (though I'd love some feedback).
Another nice benefit of Visitor is that it provides a clean separation of the code that is run on each entity from the code that enumerates the entities. This has saved me some serious code duplication in at least one large project.
As an aside, I've used Visitor in languages that did not have method overloading. You just replace Visit(TypeN n) with VisitN(TypeN n).
Follow up from comments.
This is visitor pseudocode, and I don't know how I would do it without the cooperation of the visited object (at least without a switch block):
abstract class ScriptCommand
{
    abstract void Accept(CommandVisitor v);
}
class MoveFileCommand inherits ScriptCommand
{
    string TargetFile;
    string DestinationLocation;
    void Accept(CommandVisitor v)
    {
        v.VisitMoveFileCmd(this); // this line is important because it eliminates the switch on object type
    }
}
class DeleteFileCommand inherits ScriptCommand
{
    string TargetFile;
    void Accept(CommandVisitor v)
    {
        v.VisitDeleteFileCmd(this); // this line is important because it eliminates the switch on object type
    }
}
// etc, many more commands
abstract class CommandVisitor
{
    abstract void VisitMoveFileCmd(MoveFileCommand cmd);
    abstract void VisitDeleteFileCmd(DeleteFileCommand cmd);
    // etc
}
// concrete implementation
class PersistCommandVisitor() inherits CommandVisitor
{
    void VisitMoveFileCmd(MoveFileCommand cmd)
    {
        // save the MoveFileCommand instance to a file stream or xml doc
        // this code is type-specific because each cmd subtype has vastly
        // different properties
    }
    void VisitDeleteFileCmd(DeleteFileCommand cmd)
    {
        // save the DeleteFileCommand instance to a file stream or xml doc
        // this code is type-specific because each cmd subtype has vastly
        // different properties
    }
}
The visitor infrastructure allows the handling of a wide array of command subtypes with no select case, switch, or if/else.
In regards to the visitor handling the enumerating, I think you are limiting yourself like that. That's not to say a cooperating class (an abstract VisitorEnumerator) can't be involved.
For example, note this visitor is unaware of the order of enumeration:
class FindTextCommandVisitor() inherits CommandVisitor
{
    string TextToFind;
    boolean TextFound = false;
    void VisitMoveFileCmd(MoveFileCommand cmd)
    {
        if (cmd.TargetFile.Contains(TextToFind) Or cmd.DestinationLocation.Contains(TextToFind))
            TextFound = true;
    }
    void VisitDeleteFileCmd(DeleteFileCommand cmd)
    {
        // search DeleteFileCommand's properties
    }
}
And this allows it to be reused like this:
ScriptCommand FindTextFromTop(string txt)
{
    FindTextCommandVisitor v = new FindTextCommandVisitor();
    v.TextToFind = txt;
    for (int cmdNdx = 0; cmdNdx < CommandList.Length; cmdNdx++)
    {
        CommandList[cmdNdx].Accept(v);
        if (v.TextFound)
            return CommandList[cmdNdx]; // return the first item matching
    }
    return null; // no command matched
}
and then enumerate the opposite way with the same visitor:
ScriptCommand FindTextFromBottom(string txt)
{
    FindTextCommandVisitor v = new FindTextCommandVisitor();
    v.TextToFind = txt;
    for (int cmdNdx = CommandList.Length-1; cmdNdx >= 0; cmdNdx--)
    {
        CommandList[cmdNdx].Accept(v);
        if (v.TextFound)
            return CommandList[cmdNdx]; // return the first item matching
    }
    return null; // no command matched
}
In real code I would create a base class for the enumerator and then subclass it to handle the different enumeration scenarios, while passing in the concrete Visitor subclass to completely decouple them. Hopefully you can see the power of keeping the enumeration separate.
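To make the decoupling concrete, here is a small sketch in Python. It is a simplification rather than the enumerator class hierarchy described above: the iteration order is passed in as an iterable, and a fresh visitor is created per command through a factory. All names mirror the pseudocode but are otherwise assumptions.

```python
class MoveFileCommand:
    def __init__(self, target_file, destination_location):
        self.target_file = target_file
        self.destination_location = destination_location

    def accept(self, visitor):
        visitor.visit_move_file_cmd(self)


class DeleteFileCommand:
    def __init__(self, target_file):
        self.target_file = target_file

    def accept(self, visitor):
        visitor.visit_delete_file_cmd(self)


class FindTextVisitor:
    """Knows how to search each command type, but nothing about enumeration order."""

    def __init__(self, text_to_find):
        self.text_to_find = text_to_find
        self.text_found = False

    def visit_move_file_cmd(self, cmd):
        if (self.text_to_find in cmd.target_file
                or self.text_to_find in cmd.destination_location):
            self.text_found = True

    def visit_delete_file_cmd(self, cmd):
        if self.text_to_find in cmd.target_file:
            self.text_found = True


def find_first(commands, make_visitor):
    """Return the first command, in the order given, for which the visitor reports a match."""
    for cmd in commands:
        visitor = make_visitor()
        cmd.accept(visitor)
        if visitor.text_found:
            return cmd
    return None


commands = [
    MoveFileCommand("a.txt", "/tmp/report"),
    DeleteFileCommand("report.log"),
    DeleteFileCommand("b.txt"),
]
from_top = find_first(commands, lambda: FindTextVisitor("report"))
from_bottom = find_first(reversed(commands), lambda: FindTextVisitor("report"))
```

The same visitor class serves both directions; only the iterable handed to `find_first` changes.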
I think you are using Visitor Pattern and Double Dispatch interchangeably. When you say,
If I can work with a family of heterogeneous objects and call their public methods without any cooperation from the "visited" class, does this still deserve to be called the "Visitor pattern"?
and
"write a new class that manipulates your objects from the outside to carry out an operation"?
you are defining what Double dispatch is. Sure, Visitor pattern is implemented by double dispatch. But there is something more to the pattern itself.
Each Visitor is an algorithm over a group of elements (entities) and new visitors can be plugged in without changing the existing code. Open/Closed principle.
When new elements are added frequently, the Visitor pattern is best avoided.
Maybe, it depends on the language.
The visitor pattern solves double and multiple-hierarchy problems in languages that don't feature multiple-dispatch. Take Ruby, Lisp and Python. They are all dynamically-typed languages, but only CLOS-Lisp implements multiple-dispatch in the standard. This is also called multimethods and Python and Ruby can apparently implement it by using extensions.
I like this curious comment on wikipedia stating that:
Lisp's object system [CLOS] with its multiple dispatch does not replace the Visitor pattern,
but merely provides a more concise implementation of it in which the pattern all but
disappears.
In other languages, even statically typed ones, you have to work around the absence of multimethods. The Visitor pattern is one such way.
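For comparison, Python's standard library offers `functools.singledispatch`, which selects a function implementation from the type of its first argument only. It is therefore not full multiple dispatch, but it shows how type-based dispatch can live outside the classes. The fruit classes and the `process` function below are illustrative.

```python
from functools import singledispatch


class Banana:
    pass


class Apple:
    pass


@singledispatch
def process(fruit):
    # Default implementation, used when no registered type matches.
    return "Don't know how to process " + type(fruit).__name__


@process.register(Banana)
def _(fruit):
    return "Mashing banana"


@process.register(Apple)
def _(fruit):
    return "Crunching apple"


print(process(Banana()))
print(process(Apple()))
```

New operations can be added as new dispatched functions, and new types by registering further implementations, without touching the classes themselves.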
Visitor Pattern to me meant to add new functionality to objects based on their type. Apparently having if/else ladders to perform type specific operations is bad (I would like an explanation for this :( ). In python, I was able to do this, without the whole double dispatch drama, by monkeypatching (another bad idea) certain functions as class methods.
I asked about this here.
In the below example, assume there's a base class ASTNode and a large class hierarchy under it (ASTVar, ASTModule, ASTIf, ASTConst, etc). These classes only have their specific data attributes and trivial methods.
Then, assume the class code is locked (or maybe the functionality is separated from the data). Now, I have methods which are dynamically assigned to classes. Note that in below example the iteration/recursion method call name (stringify) is different from the function name (nodeType_stringify).
def ASTNode__stringify(self):
    text = str(self)
    for child in self.children:
        text += ", { " + child.stringify() + " }"
    return text
def ASTConst__stringify(self):
    text = str(self)
    for child in self.children:
        text += ", [ " + child.stringify() + " ]"
    return text
def ASTIf__stringify(self):
    text = str(self)
    text += "__cond( " + self.op1.stringify() + ")"
    text += "__then { " + self.op2.stringify() + "}"
    text += "__else {" + self.op3.stringify() + "}"
    return text
I can extend the classes (possibly, one-time during module init) with functionality whenever I want (bad idea?).
# mainModule1.py
def extend_types():
    # ASTNode and all derived classes get this method
    ASTNode.stringify = ASTNode__stringify
    ASTConst.stringify = ASTConst__stringify
    ASTIf.stringify = ASTIf__stringify
Now, calling my_root_node.stringify() would appropriately call the right child methods (recursively), without explicitly checking for type.
Isn't this technique similar to adding methods to JavaScript prototypes (Visitor pattern in JS)?
Wasn't this what the goal of Visitor Pattern was? Extension of code-locked Types? Surely, the need to use double-dispatch (VisitorObject.visit(ConcreteObject) being called by ConcreteObject.Accept(VisitorObject)) wouldn't be necessary in python, which is dynamically typed. Probably, someone will formalize this for dynamically typed languages, and we will have a new pattern on hand, or not. After all, patterns are discovered, not invented (I don't remember where I read this).
The Visitor pattern does 2 things:
Allows for ad hoc polymorphism (the same function does different things to different "types").
Enables adding a new consuming algorithm without changing the provider of the data.
You can do the second in dynamic languages without Visitor or runtime type information. But the first requires some explicit mechanism, or a design pattern like Visitor.

Is it ever okay to have a class as a collection of methods and no properties?

I'm writing a bunch of generic-but-related functions to be used by different objects. I want to group the functions, but am not sure if I should put them in a class or simply a flat library file.
Treating them like a class doesn't seem right, as there is no one kind of object that will use them and such a class containing all these functions may not necessarily have any properties.
Treating them as a flat library file seems too simple, for lack of a better word.
What is the best practice for this?
Check out namespaces:
http://www.php.net/manual/en/language.namespaces.rationale.php
Wrapping them in a useless class is a workaround implementation of the concept of a namespace. This concept allows you to avoid collisions with other functions in large projects or plugin/module type deployments.
EDIT
Stuck with PHP 5.2?
There's nothing wrong with using a separate file (or files) to organize utility functions. Just be sure to document them with comments so you don't end up with bunchafunctions.php, a 20,000-line file of procedural code of dubious purpose.
There's also nothing wrong with prefixes. Using prefixes is another way to organize like-purpose functions, but be sure to avoid these "pseudo-namespaces" already reserved by the language. Specifically, "__" is reserved as a prefix by PHP [reference]. To be extra careful, you can also wrap your function declarations in function_exists checks, if you're concerned about conflicting functions from other libraries:
if (!function_exists('myFunction')) {
    function myFunction() {
        //code
    }
}
You can also re-consider your object structure, maybe these utility functions would be more appropriate as methods in a base class that all the other objects can extend. Take a look at inheritance: http://www.php.net/manual/en/language.oop5.inheritance.php. The base class pattern is a common and very useful one:
abstract class baseObject {
    protected function doSomething () {
        print 'foo bar';
    }
    public function getSomething () {
        return 'bar foo';
    }
}
class foo extends baseObject {
    public function bar () {
        $this->doSomething();
    }
}
$myObject = new foo();
$myObject->bar();
echo $myObject->getSomething();
You can experiment with the above code here: http://codepad.org/neRtgkcQ
I would usually stick them in a class anyway and mark the methods static. You might call it a static class, even though PHP actually has no such thing (you can't put the static keyword in front of a class). It's still better than having the functions globally because you avoid possible naming conflicts. The class becomes a sort of namespace, but PHP also has its own namespace which may be better suited to your purpose.
You might even find later that there are indeed properties you can add, even if they too are static, such as lazy-loaded helper objects, cached information, etc.
I'd use classes with static methods in such case:
class Tools {
    static public function myMethod() {
        return 1*1;
    }
}
echo Tools::myMethod();
EDIT
As already mentioned by Chris and yes123: if the hoster already runs PHP 5.3+, you should consider using namespace. I'd recommend a read of Matthew Weier O'Phinney's article Why PHP Namespaces Matter, if you're not sure if it's worth switching to namespaces.
EDIT
Even though those who dismissed the use of static methods as "bad practice" or "nonsense" did not explain why they consider it so (which, in my opinion, would have been more constructive), they still made me rethink and reread.
The typical argument is that static methods can create dependencies and can therefore make unit testing and class renaming impossible.
If unit testing isn't used at all (maybe programming for home/personal use, or low-budget projects, where no one is willing to pay the extra costs of unit testing implementations) this argument becomes obsolete, of course.
Even if unit testing is used, creation of static methods dependencies can be avoided by using $var::myMethod(). So you still could use mocks and rename the class...
Nevertheless I came to the conclusion that my answer is way too generalized.
I think I should have written: It depends.
As this would most likely result in an open-ended debate about the pros and cons of all the technically possible solutions, and of dozens of possible scenarios and environments, I'm not willing to go into this.
I upvoted Chris' answer now. It already covers most technical possibilities and should serve you well.
Treating them as a class does give you the benefit of a namespace, though you could achieve the same thing by prefixing them like PHP does with the array_* functions. Since you don't have any properties, that basically implies that all your methods are static (as Class::method()). This isn't an uncommon practice in Java.
By using a class, you also have the ability, if necessary, to inherit from a parent class or interface. An example of this might be class constants defined for error codes your functions might return.
EDIT: If PHP 5.3+ is available, the Namespace feature is ideal. However, PHP versions still lag in a lot of hosts and servers, especially those running enterprise-stable Linux distributions.
I've seen it a few different ways, all have their warts but all worked for the particular project in which they were utilized.
one file with all of the functions
one file with each function as its own class
one massive utilities class with all of the methods
one utils.php file that includes files in utils folder with each function in its own file
Yes, it's OK formally, as any class is methods + properties. But when you just pack some functions into a class, it is no longer ideal OOP. If you have a bunch of functions that are grouped together but don't use any class variables, it seems that you have a design problem somewhere.
My current feeling here is "Houston, we have a problem".
If you use plain functions, there is one reason to wrap them in a static class: the autoloader.
Of course, it creates high coupling, and it may be bad for testing (not always), but simple functions are no better than a static class in this case :) Same high coupling, etc.
In an ideal OOP architecture, all functions would be methods of some objects. That's a utopia, but we should build the architecture as close to the ideal as we can.
Writing a bunch of "generic-but-related" functions is usually a bad idea. Most likely you don't see the problem clearly enough to create proper objects.
It is a bad idea not because it is "not ideal OOP". It is not OOP at all.
"The base class pattern" brought up by Chris is another bad idea; google for "favor composition over inheritance".
"Being extra careful" with function_exists('myFunction') is not a bad idea. It is a nightmare.
This kind of code is currently avoided even in modern JavaScript...

How unique is PHP's __autoload()?

PHP's __autoload() (documentation) is pretty interesting to me. Here's how it works:
You try to use a class, like new Toast_Mitten()(footnote1)
The class hasn't been loaded into memory. PHP pulls back its fist to sock you with an error.
It pauses. "Wait," it says. "There's an __autoload() function defined." It runs it.
In that function, you have somehow mapped the string Toast_Mitten to classes/toast_mitten.php and told it to require that file. It does.
Now the class is in memory and your program keeps running.
Memory benefit: you only load the classes you need. Terseness benefit: you can stop including so many files everywhere and just include your autoloader.
Things get particularly interesting if
1) Your __autoload() has an automatic way of determining the file path and name from the class name. For instance, maybe all your classes are in classes/ and Toast_Mitten will be in classes/toast_mitten.php. Or maybe you name classes like Animal_Mammal_Weasel, which will be in classes/animal/mammal/animal_mammal_weasel.php.
2) You use a factory method to get instances of your class.
$Mitten = Mitten::factory('toast');
The Mitten::factory method can say to itself, "let's see, do I have a subclass called Toast_Mitten()? If so, I'll return that; if not, I'll just return a generic instance of myself - a standard mitten. Oh, look! __autoload() tells me there is a special class for toast. OK, here's an instance!"
Therefore, you can start out using a generic mitten throughout your code, and when the day comes that you need special behavior for toast, you just create that class and bam! - your code is using it.
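The two naming conventions from point 1 amount to a small string transformation. Here is a sketch of that mapping in Python, shown only to illustrate the logic an `__autoload()` implementation would perform before requiring the file; the function names are made up for this example.

```python
def flat_path(class_name):
    """Toast_Mitten -> classes/toast_mitten.php"""
    return "classes/" + class_name.lower() + ".php"


def nested_path(class_name):
    """Animal_Mammal_Weasel -> classes/animal/mammal/animal_mammal_weasel.php"""
    parts = class_name.lower().split("_")
    directories = parts[:-1]  # every name segment except the last becomes a directory
    return "/".join(["classes"] + directories + [class_name.lower() + ".php"])


print(flat_path("Toast_Mitten"))
print(nested_path("Animal_Mammal_Weasel"))
```

Either convention works as long as it is applied consistently, because the autoloader has nothing but the class name to go on.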
My question is twofold:
(Fact) Do other languages have similar constructs? I see that Ruby has an autoload, but it seems that you have to specify in a given script which classes you expect to use it on.
(Opinion) Is this too magical? If your favorite language doesn't do this, do you think, "hey nifty, we should have that" or "man I'm glad Language X isn't that sloppy?"
1 My apologies to non-native English speakers. This is a small joke. There is no such thing as a "toast mitten," as far as I know. If there were, it would be a mitten for picking up hot toast. Perhaps you have toast mittens in your own country?
Both Ruby and PHP get it from AUTOLOAD in Perl.
http://perldoc.perl.org/perltoot.html#AUTOLOAD:-Proxy-Methods
http://perldoc.perl.org/AutoLoader.html
Note that the AutoLoader module is a set of helpers for common tasks using the AUTOLOAD functionality.
Do not use __autoload(). It's a global thing so, by definition, it's somewhat evil. Instead, use spl_autoload_register() to register another autoloader with your system. This allows you to use several autoloaders, which is pretty common practice.
Respect existing conventions. Every part of namespaced class name is a directory, so new MyProject\IO\FileReader(); should be in MyProject/IO/FileReader.php file.
Magic is evil!
The Mitten::factory method can say to itself, "let's see, do I have a subclass called Toast_Mitten()? If so, I'll return that; if not, I'll just return a generic instance of myself - a standard mitten. Oh, look! __autoload() tells me there is a special class for toast. OK, here's an instance!"
Rather than such tricky code, use something simple and verbose:
try {
    $mitten = new ToastMitten();
    // or $mitten = Mitten::factory('toast');
} catch (ClassNotFoundException $cnfe) {
    // ClassNotFoundException is not built into PHP; this assumes the autoloader throws it
    $mitten = new BaseMitten();
}
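The first two points of this answer (several registered autoloaders, and one directory per namespace segment) can be sketched together. The Python below only models the lookup logic: `ResolverChain` is a hypothetical stand-in for the list of loaders that spl_autoload_register() maintains, and a set of known file names replaces the real filesystem.

```python
def namespaced_path(class_name):
    # MyProject\IO\FileReader -> MyProject/IO/FileReader.php
    return class_name.lstrip("\\").replace("\\", "/") + ".php"


def legacy_path(class_name):
    # Toast_Mitten -> classes/toast_mitten.php
    return "classes/" + class_name.lower() + ".php"


class ResolverChain:
    """Tries each registered resolver in order, like a stack of autoloaders."""

    def __init__(self):
        self._resolvers = []

    def register(self, resolver):
        self._resolvers.append(resolver)

    def resolve(self, class_name, existing_files):
        for resolver in self._resolvers:
            path = resolver(class_name)
            if path in existing_files:
                return path
        return None  # no resolver could locate the class


chain = ResolverChain()
chain.register(namespaced_path)
chain.register(legacy_path)

files = {"MyProject/IO/FileReader.php", "classes/toast_mitten.php"}
print(chain.resolve("MyProject\\IO\\FileReader", files))
print(chain.resolve("Toast_Mitten", files))
```

Because each resolver is independent, a library can ship its own convention without interfering with the application's.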
I think this feature comes in very handy, and I have not seen any features like it elsewhere. Nor have I needed these features elsewhere.
Java has something similar. It's called a ClassLoader. Probably other languages too, but they stick with some default implementation.
And while we're at it: it would have been nice if __autoload loaded any type of symbol, not just classes: constants, functions and classes.
See Ruby's Module#const_missing
I just learned this: Ruby has a method on Module called const_missing that gets called if you call Foo::Bar and Bar isn't in memory yet (although I suppose that Foo has to be in memory).
This example in ruby-doc.org shows a way to use that to implement an autoloader for that module. This is in fact what Rails uses to load new ActiveRecord model classes, according to "Eloquent Ruby" by Russ Olsen (Chapter 21, "Use method_missing for flexible error handling", which also covers const_missing).
It's able to do this because of the "convention over configuration" mindset: if you reference a model called ToastMitten, if it exists, it will be in app/models/toast_mitten.rb. If you could put that model any place you wanted, Rails wouldn't know where to look for it. Even if you're not using Rails, this example, and point #1 in my question, shows how useful it can be to follow conventions, even if you create them yourself.

Strategy for developing namespaced and non-namespaced versions of same PHP code

I'm maintaining a library written for PHP 5.2 and I'd like to create a PHP 5.3-namespaced version of it. However, I'd also keep the non-namespaced version up to date until PHP 5.3 becomes so old that even Debian stable ships it ;)
I've got rather clean code, about 80 classes following the Project_Directory_Filename naming scheme (I'd change them to \Project\Directory\Filename of course) and only a few functions and constants (also prefixed with the project name).
The question is: what's the best way to develop namespaced and non-namespaced versions in parallel?
Should I just create a fork in the repository and keep merging changes between branches? Are there cases where backslash-sprinkled code becomes hard to merge?
Should I write a script that converts the 5.2 version to 5.3 or vice versa? Should I use the PHP tokenizer? sed? The C preprocessor?
Is there a better way to use namespaces where available and keep backwards compatibility with older PHP?
Update: Decided against use of namespaces after all.
I don't think preprocessing the 5.3 code is a great idea. If your code is functionally identical in both PHP 5.2 and 5.3 with the exception of using namespaces instead of underscore-separated prefixes, why use namespaces at all? In that case it sounds to me like you want to use namespaces for the sake of using namespaces.
I do think you'll find that as you migrate to namespaces, you will start to 'think a bit differently' about organizing your code.
For this reason, I strongly agree with your first solution. Create a fork and do backports of features and bugfixes.
Good luck!
This is a followup to my previous answer:
The namespace simulation code has become quite stable. I can already get symfony2 to work (some problems still, but basically). There is still some stuff missing, though, like variable namespace resolution for all cases apart from new $class.
Now I wrote a script which will iterate recursively through a directory and process all files: http://github.com/nikic/prephp/blob/master/prephp/namespacePortR.php
Usage Instructions
Requirements for your code to work
Your classnames mustn't contain the _ character. If they do, classnames could get ambiguous while converting.
Your code mustn't redeclare any global functions or constants within a namespace. Thus it is ensured that all your code may be resolved at compile-time.
Basically these are the only restrictions to your code. Though I should note that in a default configuration the namespacePortR will not resolve things like $className = 'Some\\NS\\Class'; new $className, because it would require inserting additional code. It's better that this is patched up later (either manually or using an automated patching system.)
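The first requirement can be illustrated with a small sketch. The helper name namespacedToUnderscored is hypothetical and not part of prephp; it only shows why an underscore inside a class name makes the converted name ambiguous.

```php
<?php
// Hypothetical helper (not part of prephp): flatten a namespaced class name
// into a PHP 5.2-style underscore-separated name.
function namespacedToUnderscored($name)
{
    return str_replace('\\', '_', ltrim($name, '\\'));
}

// Two different PHP 5.3 classes...
$a = namespacedToUnderscored('Some\\NS\\Http_Client');
$b = namespacedToUnderscored('Some\\NS\\Http\\Client');

// ...collapse into the same PHP 5.2 name, so the two can no longer be told apart.
var_dump($a === $b); // bool(true)
```

Both calls return Some_NS_Http_Client, which is why class names containing _ have to be ruled out before converting.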
Configuration
As we have made the assumption that no global function or constant is redeclared in a namespace you must set the assumeGlobal class constant in the namespace listener. In the same file set the SEPARATOR constant to _.
In the namespacePortR change the configuration block to satisfy your needs.
PS: The script may be provided a ?skip=int option. This tells it to skip the first int files. You should not need it, if you have set the override mode to intelligent.
Here's what I've found:
Doing this with regular expressions is a nightmare. You can get most of it done with just a few simple expressions, but then the edge cases are a killer. I ended up with a horrible, fragile mess that barely works with one codebase.
It's doable with the built-in tokenizer and a simple recursive descent parser that handles only a simplified subset of the language.
I ended up with a rather ugly design (parser and transformer in one, mostly just changing or re-emitting tokens), because it seemed like too much work to build a useful syntax tree with whitespace maintained (I wanted the resulting code to be human-readable).
I wanted to try phc for this, but couldn't convince its configure script that I had built the required version of the Boost library.
I haven't tried ANTLR for this yet, but it's probably the best tool for this kind of task.
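To give an idea of the tokenizer approach, here is a minimal sketch, not the code described above. The function name underscoresToNamespaces and the Project prefix are assumptions for illustration. It only rewrites references to classes; class declarations (which need a namespace statement instead) are deliberately left out.

```php
<?php
// Hypothetical helper: rewrite references to underscore-named classes that
// start with $prefix into fully qualified namespaced names.
// Assumption: class declarations are handled elsewhere.
function underscoresToNamespaces($source, $prefix)
{
    $out = '';
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as ';' or '(' come back as plain strings.
            $out .= $token;
            continue;
        }
        list($id, $text) = $token;
        // Identifiers are T_STRING; string literals and comments have other ids,
        // so they pass through unchanged.
        if ($id === T_STRING && strpos($text, $prefix . '_') === 0) {
            $text = '\\' . str_replace('_', '\\', $text);
        }
        $out .= $text;
    }
    return $out;
}

$source = "<?php \$c = new Project_Db_Connection(); echo 'Project_Db_Connection';";
echo underscoresToNamespaces($source, 'Project');
```

The class reference after new is rewritten to \Project\Db\Connection, while the identical text inside the string literal is left alone, which is exactly the kind of edge case that is painful with plain regular expressions.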
I am working on a project that emulates PHP 5.3 on PHP 5.2: prephp. It includes namespace support (not yet complete though.)
From the experience of writing this, there is one ambiguity problem in namespace resolution: unqualified function calls and constant lookups fall back to the global namespace. So you could convert your code automatically only if you either qualified (or fully qualified) all your function calls and constant lookups, or never defined a function or constant in a namespace with the same name as a PHP built-in one.
If you strictly adhered to one of these practices (whichever you choose), it would be fairly easy to convert your code. It would be a subset of the code for emulating namespaces in prephp. If you need help with the implementation, feel free to ask me, I would be interested ;)
PS: The namespace emulation code of prephp isn't complete yet and may be buggy. But it may give you some insights.
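The fallback rule described above can be sketched as the decision a converter would have to make for each unqualified function call. This is not prephp's code; resolveFunctionCall and the $declared list are hypothetical, and the sketch assumes every function the project defines is known up front.

```php
<?php
// Sketch of the compile-time decision for an unqualified function call.
// $declared is a hypothetical list of functions (fully qualified, without a
// leading backslash) that the project itself defines.
function resolveFunctionCall($name, $currentNamespace, array $declared)
{
    $namespaced = $currentNamespace . '\\' . $name;
    // Function and namespace names are case-insensitive in PHP.
    if (in_array(strtolower($namespaced), array_map('strtolower', $declared), true)) {
        // The project defines the function in this namespace, so that one wins.
        return $namespaced;
    }
    // Otherwise the call falls back to the global function of the same name.
    return $name;
}

$declared = array('Project\\Util\\strlen');

echo resolveFunctionCall('strlen', 'Project\\Util', $declared), "\n";     // Project\Util\strlen
echo resolveFunctionCall('strtoupper', 'Project\\Util', $declared), "\n"; // strtoupper
```

PHP itself makes this choice at runtime, so a static converter can only get it right when the list of declared functions is complete, which is why the restrictions above matter.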
Here's the best answer I think you're going to be able to find:
Step 1: Create a directory called 5.3 for every directory w/ php5.3 code in it and stick all 5.3-specific code in it.
Step 2: Take a class you want to put in a namespace and do this in 5.3/WebPage/Consolidator.inc.php:
namespace WebPage;
require_once 'WebPageConsolidator.inc.php';
class Consolidator extends \WebPageConsolidator
{
    public function __construct()
    {
        echo "PHP 5.3 constructor.\n";
        parent::__construct();
    }
}
Step 3: Use a strategy function to use the new PHP 5.3 code. Place in non-PHP5.3 findclass.inc.php:
// Copyright 2010-08-10 Theodore R. Smith <phpexperts.pro>
// License: BSD License
function findProperClass($className)
{
    $namespaces = array('WebPage');
    $namespaceChar = '';
    if (PHP_VERSION_ID >= 50300)
    {
        // Search with namespaces. Use a separate variable so that the
        // original $className is not overwritten between iterations.
        foreach ($namespaces as $namespace)
        {
            $candidate = "$namespace\\$className";
            if (class_exists($candidate))
            {
                return $candidate;
            }
        }
        $namespaceChar = "\\";
    }
    // It wasn't found in the namespaces (or we're using 5.2), let's search global namespace:
    foreach ($namespaces as $namespace)
    {
        $candidate = "$namespaceChar$namespace$className";
        if (class_exists($candidate))
        {
            return $candidate;
        }
    }
    throw new RuntimeException("Could not find a suitable class named $className.");
}
Step 4: Rewrite your code to look like this:
<?php
require 'findclass.inc.php';
$classFile = 'WebPageConsolidator.inc.php';
if (PHP_VERSION_ID >= 50300)
{
    // The namespaced subclass from Step 2; it pulls in the 5.2 class itself.
    $classFile = '5.3/WebPage/Consolidator.inc.php';
}
require_once $classFile;
$className = findProperClass('Consolidator');
$consolidator = new $className;
// PHP 5.2 output: PHP 5.2 constructor.
// PHP 5.3 output: PHP 5.3 constructor. PHP 5.2 constructor.
That will work for you. It is a kludge performance-wise, but only a small one, and it can be done away with when you decide to stop supporting 5.2.
What I did, with a large codebase that used the underscore naming convention (among others) and relied on a whole lot of require_once calls in lieu of an autoloader, was to define an autoloader and, after renaming classes to play nicely with namespaces, add class_alias lines to the files that define them, aliasing each class to its old name.
I then started removing require_once statements where execution was not dependent on inclusion order, since the autoloader would pick stuff up, and namespace stuff as I went along fixing bugs and so on.
It's worked quite well so far.
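As a rough illustration of the class_alias part, a renamed class file might look like the sketch below. The class and namespace names are made up for the example; class_alias itself requires PHP 5.3 or later.

```php
<?php
namespace Project\Db;

// The class under its new, namespaced name (hypothetical example class).
class Connection
{
    public function ping()
    {
        return 'pong';
    }
}

// Keep the old underscore name working for callers that have not been
// migrated yet. Both names refer to the same class.
class_alias('Project\\Db\\Connection', 'Project_Db_Connection');
```

Code that still does new Project_Db_Connection() keeps working once this file has been loaded, so call sites can be migrated gradually. The autoloader would need to map the old name to this file as well; that mapping is not shown here.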
Well, I don't know if it is the "best" way, but in theory you could use a script to take your migrated 5.3 code and backport it to 5.2 (potentially even using PHP).
In your namespaced files you would want to convert something like:
namespace Project\Directory\Filename;
class MyClass {
public $attribute;
public function typedFunction(MyClass $child) {
if ($child instanceof MyClass) {
print 'Is MyClass';
}
}
}
To something like:
class Project_Directory_Filename_MyClass {
public $attribute;
public function typedFunction(Project_Directory_Filename_MyClass $child) {
if ($child instanceof Project_Directory_Filename_MyClass) {
print 'Is MyClass';
}
}
}
And in your namespace code you would need to convert from:
$myobject = new Project\Directory\Filename\MyClass();
To:
$myobject = new Project_Directory_Filename_MyClass();
While all your includes and requires would stay the same, I think you would almost need to keep some sort of cache of all your classes and namespaces to do the complex conversion around instanceof and typed parameters, if you use them. That is the trickiest thing I can see.
I haven't tested this myself, but you may take a look at this php 5.2 -> php 5.3 conversion script.
It is not the same as 5.3 -> 5.2, but maybe you will find some useful stuff there.
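A minimal sketch of how such a cache could be used while backporting one file follows. The function backportClassName and the $knownClasses array are hypothetical, and the sketch ignores use imports entirely.

```php
<?php
// Hypothetical lookup: resolve a class name as written in namespaced code to
// its underscore-separated PHP 5.2 name.
// $knownClasses is an assumed cache of every fully qualified class in the project.
function backportClassName($name, $currentNamespace, array $knownClasses)
{
    if (strpos($name, '\\') === 0) {
        // Already fully qualified.
        $qualified = substr($name, 1);
    } else {
        // Other names are resolved against the namespace of the current file.
        $qualified = ltrim($currentNamespace . '\\' . $name, '\\');
    }
    if (!in_array($qualified, $knownClasses, true)) {
        // Not one of the project's classes (for example a built-in): keep the name.
        return ltrim($name, '\\');
    }
    return str_replace('\\', '_', $qualified);
}

$known = array('Project\\Directory\\Filename\\MyClass');

// A type hint or instanceof operand inside namespace Project\Directory\Filename:
echo backportClassName('MyClass', 'Project\\Directory\\Filename', $known), "\n";
// A qualified name used from the global namespace:
echo backportClassName('Project\\Directory\\Filename\\MyClass', '', $known), "\n";
```

Both calls produce Project_Directory_Filename_MyClass, matching the conversion shown above, while a name such as \Exception is returned without the leading backslash and otherwise untouched.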
Our DMS Software Reengineering Toolkit can likely implement your solution pretty well. It is designed to carry out reliable source code transformations, by using AST to AST transforms coded in surface-syntax terms.
It has a PHP Front End which is a full, precise PHP parser, AST builder, and AST to PHP-code regenerator. DMS provides for AST prettyprinting, or fidelity printing ("preserve column numbers where possible").
This combination has been used to implement a variety of trustworthy PHP source code manipulation tools for PHP 4 and 5.
EDIT (in response to a somewhat disbelieving comment):
For the OP's solution, the following DMS transformation rule should do most of the work:
rule replace_underscored_identifier_with_namespace_path(namespace_path:N)
:namespace_path->namespace_path
"\N" -> "\complex_namespace_path\(\N\)"
if N=="NCLASS_OR_NAMESPACE_IDENTIFIER" && has_underscores(N);
This rule finds all "simple" identifiers that are used where namespace paths are allowed, and replaces those simple identifiers with the corresponding namespace path, constructed by tearing the identifier's string apart into its constituent elements at the underscores. One has to code some procedural help in DMS's implementation language, PARLANSE, to check that the identifier contains underscores ("has_underscores"), and to implement the tear-apart logic by building the corresponding namespace path subtree ("complex_namespace_path").
The rule works by abstractly identifying trees that correspond to language nonterminals (in this case, "namespace_path"), and replacing simple ones with more complex trees that represent the full namespace path. The rule is written as text, but the rule itself is parsed by DMS to construct the trees it needs to match PHP trees.
DMS rule application logic can trivially apply this rule everywhere throughout the AST produced by the PHP parser.
This answer may seem overly simple in the face of all the complicated stuff that makes up the PHP language, but all that other complexity is hidden in the PHP language definition used by DMS; that definition is some 10,000 lines of lexical and grammar definitions, but is already tested and working. All the DMS machinery, and these 10K lines, are indications of why simple regexes can't do the job reliably. (It is surprising how much machinery it takes to get this right; I've been working on DMS since 1995).
If you want to see all the machinery that makes up how DMS defines/manipulates a language, you can see a nice simple example.
