I'm attempting to automate the removal of namespaces from a PHP class collection to make them PHP 5.2 compatible. (Shared hosting providers do not fancy rogue PHP 5.3 installations. No idea why. Also the code in question doesn't use any 5.3 feature additions, just that syntax. Autoconversion seems easier than doing it by hand or reimplementing the codebase.)
For rewriting the *.php scripts I'm basically running over a tokenizer list. The identifier searching+merging is already complete. But I'm a bit confused now how to accomplish the actual rewriting.
function rewrite($name, $namespace, $use) {
global $identifiers2; // list of known/existing classes
/*
bounty on missing code here
*/
return strtr($name, "\\", "_"); // goal: backslash to underscore
}
That function is going to be invoked on each found identifier (whether class, function or const). It will receive some context information to transform a local identifier into an absolute/global $name:
$name =
rewrite(
"classfuncconst", # <-- foreach ($names as $name)
"current\name\space",
array(
'namespc' => 'use\this\namespc',
'alias' => 'from\name\too',
...
)
);
At this stage I've already prepared an $identifiers2 list. It contains a list of all known classes, functions and constant names (merged for simplicity here).
$identifiers2 = array( // Alternative suggestions welcome.
"name\space\Class" => "Class", // - list structure usable for task?
"other\ns\func1" => "func1", // - local name aliases helpful?
"blip\CONST" => "CONST", // - (ignore case-insensitivity)
);
The $name parameter as received by the rewrite() function can be a local, unqualified, \absolute or name\spaced identifier (but just identifiers, no expressions). The $identifiers2 list is crucial to resolve unqualified identifiers, which can refer to things in the current namespace, or if not found there, global stuff.
And the various use namespace aliases have to be taken into account and add some complication besides the namespace resolving and precedence rules.
So, how / in which order would you attempt to convert the variations of class/function names here?
Mental Laziness Bounty.
To make this a less blatant plzsendtehcodez question: an explanatory instruction list or pseudo-code answer would be eligible too. And if another approach would be more suitable for the task, please elaborate on that instead. (But no, upgrading PHP or changing the hoster is not an option.)
I think I've figured it out meanwhile, but the question is still open for answers / implementation proposals. (Otherwise the bounty will obviously go to nikic.)
In an existing question on migration of namespaces to pseudo namespaced code I already introduced a conversion tool I have written as part of a larger project. I have not maintained the project since then, but as far as I remember the namespace replacements did work. (I may reimplement this project using a proper parser at some point. Working with plain tokens has proven to be quite a tedious task.)
You will find my implementation of namespace -> pseudo-namespace resolution in the namespace.php. I based the implementation on the namespace resolution rules, which will probably be of help for you, too.
To make this a less blatant readmycodez answer, here are the basic steps the code performs:
Get the identifier to be resolved and ensure that it is not a class, interface, function or constant declaration (these are resolved in registerClass and registerOther by simply prepending the current namespace with ns separators replaced by underscores).
Determine what type of identifier it is: A class, a function or a constant. (As these need different resolution.)
Make sure we do not resolve the self and parent classes, nor the true, false and null constants.
Resolve aliases (use list):
If the identifier is qualified get the part before the first namespace separator and check whether there exists an alias with that name. If it does, replace the first part with the aliased namespace (now the identifier will be fully qualified). Otherwise prepend the current namespace.
If identifier is unqualified and the identifier type is class, check whether the identifier is an alias and if it is, replace it with the aliased class.
If the identifier is fully qualified now drop the leading namespace separator and replace all other namespace separators with underscores and end this algorithm.
Otherwise:
If we are in the global namespace no further resolution required, thus end this algorithm.
If the identifier type is class prepend the current namespace, replace all NS separators with underscores and end this algorithm.
Otherwise:
If the function / constant is defined globally leave the identifier as is and end this algorithm. (This assumes that no global functions are redefined in a namespace! In my code I don't make this assumption, thus I insert dynamic resolution code.)
Otherwise prepend the current namespace and replace all namespace separators with underscores. (Seems like I got a fault in my code here: I don't do this even if the assumeGlobal flag is set. Instead I always insert the dynamic dispatch code.)
Additional note: Don't forget that one can also write namespace\some\ns. I resolve these constructs in the NS function (which is also responsible for finding namespace declarations).
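The steps above can be sketched as a single function. This is only an illustration, not the code from namespace.php: the name resolveName, the $type argument ('class', 'function' or 'const') and the $globals list of globally defined functions/constants are hypothetical, and the sketch makes the simplifying assumption that no global function or constant is redefined inside a namespace. It also ignores case-insensitivity and the namespace\some\ns form.

```php
<?php
// Hypothetical helper: resolves one identifier to its pseudo-namespaced name.
function resolveName($name, $type, $namespace, array $use, array $globals)
{
    $lower = strtolower($name);
    // Special class names and constants are never rewritten.
    if ($type === 'class' && in_array($lower, array('self', 'parent'), true)) {
        return $name;
    }
    if ($type === 'const' && in_array($lower, array('true', 'false', 'null'), true)) {
        return $name;
    }

    if ($name[0] === '\\') {
        // Fully qualified: only drop the leading separator.
        $full = substr($name, 1);
    } elseif (strpos($name, '\\') !== false) {
        // Qualified: the first segment is either an alias or relative
        // to the current namespace.
        list($first, $rest) = explode('\\', $name, 2);
        $prefix = isset($use[$first])
            ? $use[$first]
            : ltrim($namespace . '\\' . $first, '\\');
        $full = $prefix . '\\' . $rest;
    } elseif ($type === 'class' && isset($use[$name])) {
        // Unqualified class name that is itself an alias.
        $full = $use[$name];
    } elseif ($namespace === '') {
        // Global namespace: nothing to resolve.
        return $name;
    } elseif ($type === 'class') {
        $full = $namespace . '\\' . $name;
    } elseif (in_array($name, $globals, true)) {
        // Assumed to be a global function/constant: leave untouched.
        return $name;
    } else {
        $full = $namespace . '\\' . $name;
    }

    // Backslashes become underscores in the pseudo-namespaced name.
    return strtr($full, '\\', '_');
}

echo resolveName('alias\\Baz', 'class', 'current\\name\\space',
    array('alias' => 'from\\name\\too'), array()), "\n";
```

Handling fully qualified and aliased names first keeps the fallback logic for unqualified functions and constants isolated in the last two branches, which is where the dynamic-dispatch alternative mentioned above would go.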
Essentially, I seek to pass a static class method to a callback, but do not wish to do so using a hard-coded string, but rather the fully-qualified class method literal. We can do that using classes like so:
$name = NS\FooClass::class;
instead of:
$name = 'NS\FooClass';
which will give us the string of the fully-qualified name of the class. I seek to be able to do something similar for a class method like so:
$name = NS\FooClass::foo_method::method;
instead of:
$name = 'NS\FooClass::foo_method';
It is more manageable and I can use the IDE functionality way better using the literals. Any similar way I can achieve what I want with the class methods without using strings?
There is currently no such mechanism built into the language. It has been suggested - see for instance this discussion from Feb 2020 - but there are more nuances to think about than might be immediately apparent; notably:
Should the syntax resolve at run-time and check the existence of the class and the method (::class in most cases doesn't; a bare function like strlen::func would have to because of the way namespaces resolve; an object implementing __callStatic could never be used this way)?
Should the result be a string, an array (see below), or a Closure object?
Anyway, that's a topic for elsewhere...
As the manual page on the callable type says, there are two ways to specify a static method for use as a callback:
As a string, as in your example 'NS\FooClass::foo_method'
As an array where the first part is a class name, and the second part is the method name: ['NS\FooClass', 'foo_method']
Since only the class name needs to be qualified with namespace information, you can use ::class with the second syntax to get nearly what you wanted:
$callback = [NS\FooClass::class, 'foo_method'];
This allows any decent IDE to spot the reference to the class, and allows you to reference it by an imported or aliased name.
It's worth noting that if the callable type is specified in a parameter or return type declaration or a docblock, some IDEs (e.g. PhpStorm) will "understand" either format as a reference to the method, and include it in features like "find usages" and "go to declaration".
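For illustration, here is a minimal sketch of the array form in use. The body of foo_method is made up for the example; only the callback syntax is the point.

```php
<?php
namespace NS;

class FooClass
{
    // Hypothetical method body, just so there is something to call.
    public static function foo_method($value)
    {
        return $value * 2;
    }
}

// ::class expands to the fully qualified name, so no class-name string is needed.
$callback = [FooClass::class, 'foo_method'];

var_dump(FooClass::class);            // the fully qualified class name
var_dump(is_callable($callback));     // checks that the method really exists
echo call_user_func($callback, 21), "\n";
```

The method name is still a plain string, so a typo there is only caught at run time (for example by is_callable) or by an IDE that understands callable arrays.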
I have been reading about Using namespaces: Aliasing/Importing in PHP. There are two things I don't understand.
It says,
Note that for namespaced names (fully qualified namespace names
containing namespace separator, such as Foo\Bar as opposed to global
names that do not, such as FooBar), the leading backslash is
unnecessary and not recommended, as import names must be fully
qualified, and are not processed relative to the current namespace.
Can someone please explain
What does it mean?
What's the purpose of namespace aliasing, given that I already know the purpose of using namespaces?
What does it mean?
It really means what it says and shows in the example. When importing a namespaced class, you should omit the first backslash:
use My\Full\Classname as Another; // recommended
use \My\Full\Classname as Another; // not recommended
The reason is that use always expects a fully qualified name; import names are never resolved relative to the current namespace. In other words, even if you are already in the My namespace, you cannot write use Full\Classname to import My\Full\Classname.
What's the purpose?
It's explained in the first chapter actually:
In the PHP world, namespaces are designed to solve two problems that authors of libraries and applications encounter when creating re-usable code elements such as classes or functions:
Name collisions between code you create, and internal PHP classes/functions/constants or third-party classes/functions/constants.
Ability to alias (or shorten) Extra_Long_Names designed to alleviate the first problem, improving readability of source code.
So, the purpose is to shorten and/or to avoid clashes, e.g. when you have two classes called Foo and need to use both, you have to have a way to resolve that conflict (at least if you don't want to use the fully qualified name each time):
use My\Very\Long\Namespaced\Class\Named\Foo as Foo;
use My\Other\Foo as OtherFoo;
And then you can use
$foo = new Foo;
$otherFoo = new OtherFoo;
So that's short and simple and doesn't clash. There really isn't much more to it.
You might need to import two classes from totally separate namespaces that happen to have the same name. For example, maybe you need to select data from MySQL and then insert it into Oracle, and you're using some database library which uses namespacing.
use Database\Mysql\Connection;
use Database\Oracle\Connection;
$conn = new Connection(); //which one is it??
You could either skip importing one of them and use its fully qualified name
use Database\Oracle\Connection;
$mysql = new \Database\Mysql\Connection();
$oracle = new Connection();
or alias at least one of them
use Database\Mysql\Connection as MysqlConnection;
use Database\Oracle\Connection as OracleConnection;
$conn = new MysqlConnection();
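A self-contained way to try the aliasing variant is to put everything in one file. The two Connection classes and their name() method below are stand-ins invented for this sketch, not part of any real database library.

```php
<?php
namespace Database\Mysql;

class Connection
{
    public function name() { return 'mysql'; }
}

namespace Database\Oracle;

class Connection
{
    public function name() { return 'oracle'; }
}

namespace App;

// Both classes are called Connection, so each import gets its own alias.
use Database\Mysql\Connection as MysqlConnection;
use Database\Oracle\Connection as OracleConnection;

$source = new MysqlConnection();
$target = new OracleConnection();

echo $source->name(), ' -> ', $target->name(), "\n";
```

Several namespace declarations in one file are used here only to keep the example short; in real code each class would normally live in its own file.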
I have some code I'm working with that was written by the guy before me and I'm trying to look it over and get a feel for the system and how it all works. I am also fairly new to PHP, so I have a few questions for those willing and able to provide.
The basic breakdown of the code in question is this:
$__CMS_CONN__ = new PDO(DB_DSN, DB_USER, DB_PASS);
Record::connection($__CMS_CONN__);
First question, I know the double underscore makes it magic, but I haven't been able to find anywhere exactly what properties that extends to it, beyond that it behaves like a constant, kind of. So what does that mean?
class Record
{
public static $__CONN__ = false;
final public static function connection($connection)
{
self::$__CONN__ = $connection;
}
}
Second, these two pieces go together. They are each in separate files. From what I've read, static variables can be referenced in the same way as static functions, so couldn't you just call the variable and set it directly instead of using the function?
I get the feeling it's more involved than I am aware, but I need to start somewhere.
This isn't a magic variable. The person who wrote that shouldn't really use double underscores for variable names like that because it can cause confusion.
This is just a static property on a class. Which means it is shared between instances of that class (in the same php request).
Have a look at the docs for static properties if you're unsure on how these work.
There are several predefined "magic constants" that use this naming style. However, I don't think the underscores mean anything special (as far as the language is concerned); i.e. defining your own variable like this won't bestow it any magical properties. It may be part of the previous programmer's naming convention, and if so, it's probably ill-advised.
Setting a property via a function can, in many circumstances, make the "client" code more resilient to changes in the implementation of the class. All implementation details can be hidden inside the method (known as a "setter"). However, there are strong feelings about whether this is a good idea or not (I, for one, am not a big fan).
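As a small sketch of the two options, using the Record class from the question: because the property is public, both direct assignment and the setter work. The string values are placeholders standing in for a PDO object.

```php
<?php
class Record
{
    public static $__CONN__ = false;

    final public static function connection($connection)
    {
        self::$__CONN__ = $connection;
    }
}

// Direct assignment is possible because the property is public.
Record::$__CONN__ = 'direct';

// The setter gives the class a single place to validate or change storage later.
Record::connection('via setter');

echo Record::$__CONN__, "\n";
```

If the property were declared private or protected, only the setter call would still work from outside the class.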
Two underscores do not make a variable magic.
It's better to use getters/setters than to access class properties directly.
The PHP manual has this to say on naming variables (and other symbols) with underscores:
PHP reserves all symbols starting with __ as magical. It is recommended that you do not create symbols starting with __ in PHP unless you want to use documented magical functionality.
Pay particular attention to the use of the words "reserves" and "documented". They mean double underscores shouldn't be used for user-defined symbols as it may lead to future conflicts, and that unless the symbol is explicitly mentioned in the manual as being magic, it's mundane.
In C#, variables and other things can be given reserved names such as "class" by prefixing the name with an @ sign. So, @class is a valid name. Is it possible to do this same thing in PHP? I am using a class of constants to simulate an enum for HTML attributes such as ID and Class. For now I am using "CssClass" but I'd rather use the name Class somehow.
Nope, not possible, at least not for class constants.
You cannot use any of the following [reserved] words as constants, class names, function or method names.
I don't know about C#, but there isn't any special symbol in PHP to transform a keyword into an identifier. As long as you don't name it exactly the same as a keyword (barring letter case), it'll just be any normal constant name.
How about a (different since it's not just CSS) prefix? Gets repetitive to type, but is a nice workaround. I realize this may be redundant as well if your class is named something like HTMLAttribute, but it's the easiest way out.
const A_ID = 'id';
const A_CLASS = 'class';
// etc
Yes, it is possible.
In fact you can define anything as constant:
define("define", 1);
define("class", 1);
define("if", 1);
define("=.+*", 1);
However, you can not use all defined constants.
You can query them with constant("if") again. But this is not exactly what you asked for. So unlike C# there is no shortcut to use any random constant. But as for naming them, there are almost no restrictions. (Might be a bug though. It's PHP.)
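A short sketch of what that looks like in practice. The values are arbitrary; the point is that such names can only be read back through constant(), never by writing the bare name.

```php
<?php
define('class', 1);
define('if', 2);

// Writing `echo class;` would be a parse error, so the lookup has to go
// through constant(), which takes the name as a string at run time.
var_dump(defined('class'));
var_dump(constant('class'));
var_dump(constant('if'));
```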
Constants:
The name of a constant follows the same rules as any label in PHP. A valid constant name starts with a letter or underscore, followed by any number of letters, numbers, or underscores. As a regular expression, it would be expressed thusly: [a-zA-Z_\x7f-\xff][a-zA-Z0-9_\x7f-\xff]*
List of reserved keywords:
These words have special meaning in PHP. Some of them represent things which look like functions, some look like constants, and so on--but they're not, really: they are language constructs. You cannot use any of the following words as constants, class names, function or method name.
[see list here]
Within these rules you're free to make up your names. So, for instance, you could name a constant _CLASS, but not CLASS. I'd avoid the use of such ambiguous names though and namespace constants that are particular to the app, like MYAPP_CLASS.
Going from PHP5 to PHP7, a class constant could be named almost anything:
class ReservedWord
{
// Works in PHP >= 7.0 only
const NULL = null;
const TRUE = true;
}
However, thanks to this part of the manual and this comment, I've found that a class constant cannot be named these few things (see the test here):
class
static
__halt_compiler (oh, that was so useful!)
Edit: As I found in an RFC, the reason class does not work as a class constant name is that it would collide with the ::class name resolution syntax. However, I still have no idea about the other two.
I'm pretty new to PHP, but I've been programming in similar languages for years. I was flummoxed by the following:
class Foo {
public $path = array(
realpath(".")
);
}
It produced a syntax error: Parse error: syntax error, unexpected '(', expecting ')' in test.php on line 5 which is the realpath call.
But this works fine:
$path = array(
realpath(".")
);
After banging my head against this for a while, I was told you can't call functions in an attribute default; you have to do it in __construct. My question is: why?! Is this a "feature" or sloppy implementation? What's the rationale?
The compiler code suggests that this is by design, though I don't know what the official reasoning behind that is. I'm also not sure how much effort it would take to reliably implement this functionality, but there are definitely some limitations in the way that things are currently done.
Though my knowledge of the PHP compiler isn't extensive, I'm going to try and illustrate what I believe goes on so that you can see where there is an issue. Your code sample makes a good candidate for this process, so we'll be using that:
class Foo {
public $path = array(
realpath(".")
);
}
As you're well aware, this causes a syntax error. This is a result of the PHP grammar, which makes the following relevant definition:
class_variable_declaration:
//...
| T_VARIABLE '=' static_scalar //...
;
So, when defining the values of variables such as $path, the expected value must match the definition of a static scalar. Unsurprisingly, this is somewhat of a misnomer given that the definition of a static scalar also includes array types whose values are also static scalars:
static_scalar: /* compile-time evaluated scalars */
//...
| T_ARRAY '(' static_array_pair_list ')' // ...
//...
;
Let's assume for a second that the grammar was different, and the noted line in the class variable declaration rule looked something more like the following, which would match your code sample (despite breaking otherwise valid assignments):
class_variable_declaration:
//...
| T_VARIABLE '=' T_ARRAY '(' array_pair_list ')' // ...
;
After recompiling PHP, the sample script would no longer fail with that syntax error. Instead, it would fail with the compile time error "Invalid binding type". Since the code is now valid based on the grammar, this indicates that there actually is something specific in the design of the compiler that's causing trouble. To figure out what that is, let's revert to the original grammar for a moment and imagine that the code sample had a valid assignment of $path = array( 2 );.
Using the grammar as a guide, it's possible to walk through the actions invoked in the compiler code when parsing this code sample. I've left some less important parts out, but the process looks something like this:
// ...
// Begins the class declaration
zend_do_begin_class_declaration(znode, "Foo", znode);
// Set some modifiers on the current znode...
// ...
// Create the array
array_init(znode);
// Add the value we specified
zend_do_add_static_array_element(znode, NULL, 2);
// Declare the property as a member of the class
zend_do_declare_property('$path', znode);
// End the class declaration
zend_do_end_class_declaration(znode, "Foo");
// ...
zend_do_early_binding();
// ...
zend_do_end_compilation();
While the compiler does a lot in these various methods, it's important to note a few things.
A call to zend_do_begin_class_declaration() results in a call to get_next_op(). This means that it adds a new opcode to the current opcode array.
array_init() and zend_do_add_static_array_element() do not generate new opcodes. Instead, the array is immediately created and added to the current class' properties table. Method declarations work in a similar way, via a special case in zend_do_begin_function_declaration().
zend_do_early_binding() consumes the last opcode on the current opcode array, checking for one of the following types before setting it to a NOP:
ZEND_DECLARE_FUNCTION
ZEND_DECLARE_CLASS
ZEND_DECLARE_INHERITED_CLASS
ZEND_VERIFY_ABSTRACT_CLASS
ZEND_ADD_INTERFACE
Note that in the last case, if the opcode type is not one of the expected types, an error is thrown – The "Invalid binding type" error. From this, we can tell that allowing the non-static values to be assigned somehow causes the last opcode to be something other than expected. So, what happens when we use a non-static array with the modified grammar?
Instead of calling array_init(), the compiler prepares the arguments and calls zend_do_init_array(). This in turn calls get_next_op() and adds a new INIT_ARRAY opcode, producing something like the following:
DECLARE_CLASS 'Foo'
SEND_VAL '.'
DO_FCALL 'realpath'
INIT_ARRAY
Herein lies the root of the problem. By adding these opcodes, zend_do_early_binding() gets an unexpected input and raises the "Invalid binding type" error. As the process of early binding class and function definitions seems fairly integral to the PHP compilation process, it can't just be ignored (though the DECLARE_CLASS production/consumption is kind of messy). Likewise, it's not practical to try and evaluate these additional opcodes inline (you can't be sure that a given function or class has been resolved yet), so there's no way to avoid generating the opcodes.
A potential solution would be to build a new opcode array that was scoped to the class variable declaration, similar to how method definitions are handled. The problem with doing that is deciding when to evaluate such a run-once sequence. Would it be done when the file containing the class is loaded, when the property is first accessed, or when an object of that type is constructed?
As you've pointed out, other dynamic languages have found a way to handle this scenario, so it's not impossible to make that decision and get it to work. From what I can tell though, doing so in the case of PHP wouldn't be a one-line fix, and the language designers seem to have decided that it wasn't something worth including at this point.
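Until then, the workaround mentioned in the question is to keep a static default in the declaration and do the function call in the constructor. A minimal sketch:

```php
<?php
class Foo
{
    // Only a static value is allowed here, so start with an empty array.
    public $path = array();

    public function __construct()
    {
        // Function calls are fine once we are inside executable code.
        $this->path = array(realpath("."));
    }
}

$foo = new Foo();
var_dump($foo->path);
```

This also answers the "when should it be evaluated" question explicitly: the call runs once per constructed object.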
My question is: why?! Is this a "feature" or sloppy implementation?
I'd say it's definitely a feature. A class definition is a code blueprint, and not supposed to execute code at the time of its definition. It would break the object's abstraction and encapsulation.
However, this is only my view. I can't say for sure what idea the developers had when defining this.
You can probably achieve something similar like this:
class Foo
{
public $path = __DIR__;
}
IIRC __DIR__ needs php 5.3+, __FILE__ has been around longer
It's a sloppy parser implementation. I don't have the correct terminology to describe it (I think the term "beta reduction" fits in somehow...), but the PHP language parser is more complex and more complicated than it needs to be, and so all sorts of special-casing is required for different language constructs.
My guess would be that you won't be able to have a correct stack trace if the error does not occur on an executable line... Since there can't be any error with initializing values with constants, there's no problem with that, but function can throw exceptions/errors and need to be called within an executable line, and not a declarative one.