Situation
I have a file which contains 2-3 classes. Each class may and may not contain a destructor function __destruct().
I am retrieving these classes using composer and have no real control over it, neither I can selectively say 'go on and ignore lines X~Z' where the destructor calls are located, as these lines may slightly change.
I need to get rid of all the destructors - or at least prevent them from executing on object destruction, as other part of code which I have no control over jumps into endless destruct loop.
Approaches I tried and considered:
Regex
My initial thought was to write a regex, that would match all these functions and removed them (best effort regex, I am sure there would be corner cases for which the regex would not work), but eventually, after remembering various questions about using regexes inappropriately, I reconsidered - not mentioning the fact, that coming up with working regex seems to be a problem for me as well.
Curly brackets block used for nested scope in C
I have some background in C, so for a second I thought I could just match all the lines with function __destruct(), verify there is no { on the same line and comment the definition out, so I would get something like this:
# function __destruct()
{
echo "initiating destructor function";
$this->callToSomething();
}
Which I thought would not execute (or at very least - would not execute on object destruction, which is why I need to comment this out). But PHP does not allow this syntax and I can not do that. Maybe it allows for some other syntax which would achieve the same results?
Parsing, Reflection? ...?
I do not really want to write myself a PHP parser of its own for this task, but if that is what it will take, I will do it. But since there is already a Reflection class in PHP, maybe there would be a way to load class into the memory, remove the destructors in runtime and then overwrite class definition on disk?
I do not see why you cannot use a regex here. Sure, you need a more advanced pattern.
But you should be never forced to touch it.
Here is a pattern that finds __destruct() blocks:
function\s+__destruct\s*\(\s*\)\s*({(?>[^{}\\]++|\\.|(?1))*+})
Demo
All you need to do is to load the files in question, run
$re = '/function\s+__destruct\s*\(\s*\)\s*({(?>[^{}\\\\]++|\\\\.|(?1))*+})/s';
$result = preg_replace($re, '', $str);
And save the files back to disk. C'est ça.
Related
Is there a way to detect or find files containing class and code outside it that would be executed while autoloaded or required? Like examples or so:
<?php
class SomeClass
{
private $variable;
public function __construct($variable) {
$this->variable = $variable;
}
}
// example usage
$foo = new SomeClass('test');
var_dump($foo);
// some other code
I need to find them in one older project and there are too many files to check them manually.
Thanks.
EDIT: I cannot autoload all of them (as I dont know in what files the unwanted code might be) and even that wouldnt necessary show me any executed code unless it would have some error in it.
Maybe you search for the word class and pipe the result on the word new ?
grep -r --include "*.php" 'class' | grep 'new'
The proper (and least error-prone but hard) way to do this would be using PHP's Tokenizer. You could use glob to get all the PHP files. Then go over them one by one using the tokenizer marking them as a class file when they contain the T_CLASS (or T_NAMESPACE) token, then skipping over the class itself and looking for anything but those tokens (and namespaces, comments and such) outputting you the filename when you find it.
The not so proper method (and surely one prone to false positives) would be writing your own simple parser (which is probably harder than using the Tokenizer) or using some simple regular expression with awk combination to find out stuff that is around classes and namespaces. I'm not too experienced with that though (and I'm pretty sure regexp alone isn't enough) so I can't really provide an example.
I do not have control over the code I'm executing. This is a third party function, but not user-entered. These are things that are versioned so it is impractical that I poke there and change all die()'s into something more sane. Because a new version is coming from time to time and then I could paradoxically make the code even more insecure by trying mess with it poor error handling.
so let's say we have a function:
fuction myfunc() {
// lot's of complicated code
if (!is_file('myfile.txt') exit('file not found');
// and so on
}
What I'm trying to do is to somewhat run that piece of code and return to my main thread and then act accordingly with that error.
I've tried die() or eval() but this returns the whole script.
Bummer ?
A Hail Mary approach is to use runkit's runkit_function_redefine or function_override to redefine the functions die and exit to throw an Exception instead.
A potential problem is that the 3rd party can catch those exceptions and might not deal with them correctly. It's also very likely that you can't properly deal with the exception either.
You can use register_shutdown_function to run code after exit has been called. You are somewhat limited in what you can do at this point as some services have already been shut down (such as autoloading). I think you can still output content, not sure about sessions and other headers.
Another approach would be to run the code in a seperate php process (or http request), for instance by calling php through exec.
A more solid approach can be to add predicitions to your own code, ensuring the bad states are never reached when calling the 3rd party code. It is possible that not all preconditions can be met.
Ideally only code that is an entry point (like a router script) may exit. Using exit anywhere else is just shoddy programming really.
If you have not read Halcyon's answer, you should have a look at it first.
Since you mention eval(), I assume you have the code as a string. I will refer to both die() and exit() by just the latter, but things should be relevant for both.
You can try and replace occurrences of die() and exit() with something, and then eval. It's simple but that can be very messy.
One thing to look out for, IF you decide to do this, is that you may end up replacing occurrences which are not really 'code'. For example, echo "Let him exit()";.
Also, when you consider possible equivalent syntax, like exit (); , exit; or exit(1);, it gets much more unpleasant. You'll have to handle those with a regex.
But if you can safely assume that
those two signatures (die() and exit()) are the only ones you need to worry about; and
that strings (or other content) are not going to contain those 'phrases'
then you can use this approach.
Let's say that at the beginning of a random function variable $variables['content'] is 1,000 characters long.
This random function is very long, with many nested functions within.
At the end of the function $variables['content'] is only 20 characters long.
How do you find which of nested functions modified this variable?
Not sure how you'd want to return it but you could use the magic constant __LINE__. That returns the line of the current document.
You could create a variable called $variables['line'] and assign __LINE__ as the value, where appropriate.
If it were me, first I'd consider breaking apart the 1000-line beast. There's probably no good reason for it to be so huge. Yes, it'll take longer than just trying to monkey-patch your current bug, but you'll probably find dozens more bugs in that function just trying to break it apart.
Lecture over, I'd do a search/replace for $variables['content'].*=([^;]*); to a method call like this: $variables['content'] = hello(\1, __LINE__);. This will fail if you are assigning strings with semicolons in them or something similar, so make sure you inspect every change carefully. Write a hello() function that takes two parameters: whatever it is you're assigning to $variables['content'] and the line number. In hello(), simply print your line number to the log or standard error or whatever is most convenient, and then return the first argument unchanged.
When you're done fixing it all up, you can either remove all those silly logging functions, or you can see if 'setting the $variables['content'] action' is important enough to have its own function that does something useful. Refactoring can start small. :)
I think this is a problem code tracing can help with.
In my case I had this variable that was being modified across many functions, and I didn't know where.
The problem is that at some point in the program the variable (a string) was around 40,000 characters, then at the end of the program something had cut it to 20 characters.
To find this information I stepped through the code with the Zend debugger. I found the information I wanted (what functions modified the variable), but it took me a while.
Apparently XDebug does tell you what line numbers, in what functions the variables are modified:
Example, tracing documentation, project home, tutorial article.
I'm pretty new to PHP, but I've been programming in similar languages for years. I was flummoxed by the following:
class Foo {
public $path = array(
realpath(".")
);
}
It produced a syntax error: Parse error: syntax error, unexpected '(', expecting ')' in test.php on line 5 which is the realpath call.
But this works fine:
$path = array(
realpath(".")
);
After banging my head against this for a while, I was told you can't call functions in an attribute default; you have to do it in __construct. My question is: why?! Is this a "feature" or sloppy implementation? What's the rationale?
The compiler code suggests that this is by design, though I don't know what the official reasoning behind that is. I'm also not sure how much effort it would take to reliably implement this functionality, but there are definitely some limitations in the way that things are currently done.
Though my knowledge of the PHP compiler isn't extensive, I'm going try and illustrate what I believe goes on so that you can see where there is an issue. Your code sample makes a good candidate for this process, so we'll be using that:
class Foo {
public $path = array(
realpath(".")
);
}
As you're well aware, this causes a syntax error. This is a result of the PHP grammar, which makes the following relevant definition:
class_variable_declaration:
//...
| T_VARIABLE '=' static_scalar //...
;
So, when defining the values of variables such as $path, the expected value must match the definition of a static scalar. Unsurprisingly, this is somewhat of a misnomer given that the definition of a static scalar also includes array types whose values are also static scalars:
static_scalar: /* compile-time evaluated scalars */
//...
| T_ARRAY '(' static_array_pair_list ')' // ...
//...
;
Let's assume for a second that the grammar was different, and the noted line in the class variable delcaration rule looked something more like the following which would match your code sample (despite breaking otherwise valid assignments):
class_variable_declaration:
//...
| T_VARIABLE '=' T_ARRAY '(' array_pair_list ')' // ...
;
After recompiling PHP, the sample script would no longer fail with that syntax error. Instead, it would fail with the compile time error "Invalid binding type". Since the code is now valid based on the grammar, this indicates that there actually is something specific in the design of the compiler that's causing trouble. To figure out what that is, let's revert to the original grammar for a moment and imagine that the code sample had a valid assignment of $path = array( 2 );.
Using the grammar as a guide, it's possible to walk through the actions invoked in the compiler code when parsing this code sample. I've left some less important parts out, but the process looks something like this:
// ...
// Begins the class declaration
zend_do_begin_class_declaration(znode, "Foo", znode);
// Set some modifiers on the current znode...
// ...
// Create the array
array_init(znode);
// Add the value we specified
zend_do_add_static_array_element(znode, NULL, 2);
// Declare the property as a member of the class
zend_do_declare_property('$path', znode);
// End the class declaration
zend_do_end_class_declaration(znode, "Foo");
// ...
zend_do_early_binding();
// ...
zend_do_end_compilation();
While the compiler does a lot in these various methods, it's important to note a few things.
A call to zend_do_begin_class_declaration() results in a call to get_next_op(). This means that it adds a new opcode to the current opcode array.
array_init() and zend_do_add_static_array_element() do not generate new opcodes. Instead, the array is immediately created and added to the current class' properties table. Method declarations work in a similar way, via a special case in zend_do_begin_function_declaration().
zend_do_early_binding() consumes the last opcode on the current opcode array, checking for one of the following types before setting it to a NOP:
ZEND_DECLARE_FUNCTION
ZEND_DECLARE_CLASS
ZEND_DECLARE_INHERITED_CLASS
ZEND_VERIFY_ABSTRACT_CLASS
ZEND_ADD_INTERFACE
Note that in the last case, if the opcode type is not one of the expected types, an error is thrown – The "Invalid binding type" error. From this, we can tell that allowing the non-static values to be assigned somehow causes the last opcode to be something other than expected. So, what happens when we use a non-static array with the modified grammar?
Instead of calling array_init(), the compiler prepares the arguments and calls zend_do_init_array(). This in turn calls get_next_op() and adds a new INIT_ARRAY opcode, producing something like the following:
DECLARE_CLASS 'Foo'
SEND_VAL '.'
DO_FCALL 'realpath'
INIT_ARRAY
Herein lies the root of the problem. By adding these opcodes, zend_do_early_binding() gets an unexpected input and throws an exception. As the process of early binding class and function definitions seems fairly integral to the PHP compilation process, it can't just be ignored (though the DECLARE_CLASS production/consumption is kind of messy). Likewise, it's not practical to try and evaluate these additional opcodes inline (you can't be sure that a given function or class has been resolved yet), so there's no way to avoid generating the opcodes.
A potential solution would be to build a new opcode array that was scoped to the class variable declaration, similar to how method definitions are handled. The problem with doing that is deciding when to evaluate such a run-once sequence. Would it be done when the file containing the class is loaded, when the property is first accessed, or when an object of that type is constructed?
As you've pointed out, other dynamic languages have found a way to handle this scenario, so it's not impossible to make that decision and get it to work. From what I can tell though, doing so in the case of PHP wouldn't be a one-line fix, and the language designers seem to have decided that it wasn't something worth including at this point.
My question is: why?! Is this a "feature" or sloppy implementation?
I'd say it's definitely a feature. A class definition is a code blueprint, and not supposed to execute code at the time of is definition. It would break the object's abstraction and encapsulation.
However, this is only my view. I can't say for sure what idea the developers had when defining this.
You can probably achieve something similar like this:
class Foo
{
public $path = __DIR__;
}
IIRC __DIR__ needs php 5.3+, __FILE__ has been around longer
It's a sloppy parser implementation. I don't have the correct terminology to describe it (I think the term "beta reduction" fits in somehow...), but the PHP language parser is more complex and more complicated than it needs to be, and so all sorts of special-casing is required for different language constructs.
My guess would be that you won't be able to have a correct stack trace if the error does not occur on an executable line... Since there can't be any error with initializing values with constants, there's no problem with that, but function can throw exceptions/errors and need to be called within an executable line, and not a declarative one.
I'm looking to improve my PHP coding and am wondering what PHP-specific techniques other programmers use to improve productivity or workaround PHP limitations.
Some examples:
Class naming convention to handle namespaces: Part1_Part2_ClassName maps to file Part1/Part2/ClassName.php
if ( count($arrayName) ) // handles $arrayName being unset or empty
Variable function names, e.g. $func = 'foo'; $func($bar); // calls foo($bar);
Ultimately, you'll get the most out of PHP first by learning generally good programming practices, before focusing on anything PHP-specific. Having said that...
Apply liberally for fun and profit:
Iterators in foreach loops. There's almost never a wrong time.
Design around class autoloading. Use spl_autoload_register(), not __autoload(). For bonus points, have it scan a directory tree recursively, then feel free to reorganize your classes into a more logical directory structure.
Typehint everywhere. Use assertions for scalars.
function f(SomeClass $x, array $y, $z) {
assert(is_bool($z))
}
Output something other than HTML.
header('Content-type: text/xml'); // or text/css, application/pdf, or...
Learn to use exceptions. Write an error handler that converts errors into exceptions.
Replace your define() global constants with class constants.
Replace your Unix timestamps with a proper Date class.
In long functions, unset() variables when you're done with them.
Use with guilty pleasure:
Loop over an object's data members like an array. Feel guilty that they aren't declared private. This isn't some heathen language like Python or Lisp.
Use output buffers for assembling long strings.
ob_start();
echo "whatever\n";
debug_print_backtrace();
$s = ob_get_clean();
Avoid unless absolutely necessary, and probably not even then, unless you really hate maintenance programmers, and yourself:
Magic methods (__get, __set, __call)
extract()
Structured arrays -- use an object
My experience with PHP has taught me a few things. To name a few:
Always output errors. These are the first two lines of my typical project (in development mode):
ini_set('display_errors', '1');
error_reporting(E_ALL);
Never use automagic. Stuff like autoLoad may bite you in the future.
Always require dependent classes using require_once. That way you can be sure you'll have your dependencies straight.
Use if(isset($array[$key])) instead of if($array[$key]). The second will raise a warning if the key isn't defined.
When defining variables (even with for cycles) give them verbose names ($listIndex instead of $j)
Comment, comment, comment. If a particular snippet of code doesn't seem obvious, leave a comment. Later on you might need to review it and might not remember what it's purpose is.
Other than that, class, function and variable naming conventions are up to you and your team. Lately I've been using Zend Framework's naming conventions because they feel right to me.
Also, and when in development mode, I set an error handler that will output an error page at the slightest error (even warnings), giving me the full backtrace.
Fortunately, namespaces are in 5.3 and 6. I would highly recommend against using the Path_To_ClassName idiom. It makes messy code, and you can never change your library structure... ever.
The SPL's autoload is great. If you're organized, it can save you the typical 20-line block of includes and requires at the top of every file. You can also change things around in your code library, and as long as PHP can include from those directories, nothing breaks.
Make liberal use of === over ==. For instance:
if (array_search('needle',$array) == false) {
// it's not there, i think...
}
will give a false negative if 'needle' is at key zero. Instead:
if (array_search('needle',$array) === false) {
// it's not there!
}
will always be accurate.
See this question: Hidden Features of PHP. It has a lot of really useful PHP tips, the best of which have bubbled up to the top of the list.
There are a few things I do in PHP that tend to be PHP-specific.
Assemble strings with an array.
A lot of string manipulation is expensive in PHP, so I tend to write algorithms that reduce the discrete number of string manipulations I do. The classic example is building a string with a loop. Start with an array(), instead, and do array concatenation in the loop. Then implode() it at the end. (This also neatly solves the trailing-comma problem.)
Array constants are nifty for implementing named parameters to functions.
Enable NOTICE, and if you realy want to STRICT error reporting. It prevents a lot of errors and code smell: ini_set('display_errors', 1); error_reporting(E_ALL && $_STRICT);
Stay away from global variables
Keep as many functions as possible short. It reads easier, and is easy to maintain. Some people say that you should be able to see the whole function on your screen, or, at least, that the beginning and end curly brackets of loops and structures in the function should both be on your screen
Don't trust user input!
I've been developing with PHP (and MySQL) for the last 5 years. Most recently I started using a framework (Zend) with a solid javascript library (Dojo) and it's changed the way I work forever (in a good way, I think).
The thing that made me think of this was your first bullet: Zend framework does exactly this as it's standard way of accessing 'controllers' and 'actions'.
In terms of encapsulating and abstracting issues with different databases, Zend_Db this very well. Dojo does an excellent job of ironing out javascript inconsistencies between different browsers.
Overall, it's worth getting into good OOP techniques and using (and READING ABOUT!) frameworks has been a very hands-on way of getting to understand OOP issues.
For some standalone tools worth using, see also:
Smarty (template engine)
ADODB (database access abstraction)
Declare variables before using them!
Get to know the different types and the === operator, it's essential for some functions like strpos() and you'll start to use return false yourself.