Why are dynamic constructs difficult for PHP compilers (HPHP)?

I was reading Paul Biggar's post at http://blog.paulbiggar.com/archive/a-rant-about-php-compilers-in-general-and-hiphop-in-particular/ and he mentions that HPHP doesn't fully support dynamic constructs. He then states, "Still, a naive approach is to just stick a switch statement in, and compile everything that makes sense." Is he saying that instead of a dynamic include, you could use a switch statement to include the proper file? If so, why would this work, and why is it "easier" for a compiler to compile? As always, thanks for your time!

From my understanding, if you've got this
include "$foo.php";
the compiler has no clue what you're going to include. On the other hand, with this
switch($foo) {
case 'bar' : include "bar.php"; break;
case 'quux' : include "quux.php"; break;
}
it can simply compile "bar" and "quux" and wrap them in an if statement that checks $foo and executes whichever is appropriate.
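As a rough conceptual sketch of that idea (not what HPHP actually emits), each known file could become a function and the switch a plain dispatch. The names compiled_bar, compiled_quux and dispatch_include are hypothetical stand-ins introduced only for illustration:

```php
<?php
// Hypothetical stand-ins for the compiled bodies of bar.php and quux.php.
function compiled_bar(): string
{
    return 'bar body ran';
}

function compiled_quux(): string
{
    return 'quux body ran';
}

// What the switch over known includes can reduce to once every
// candidate file is visible at compile time.
function dispatch_include(string $foo): ?string
{
    if ($foo === 'bar') {
        return compiled_bar();
    }
    if ($foo === 'quux') {
        return compiled_quux();
    }
    return null; // no case matched, nothing is included
}
```

Calling dispatch_include('bar') runs only the "bar" body; an unknown name falls through without loading anything.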

A compiler expects to be able to identify all of the source and binary files that might be used by the program being compiled.
include($random_file);
If the file named in $random_file declares constants, classes, or variables, the compiler will have no way of knowing, because the value of $random_file is not known at compile time. Your code using those constants, classes and variables will fail in difficult-to-debug ways. The switch statement makes the list of possible files known, so the compiler can discover any relevant declarations.
Languages designed to be compiled have dynamic linkers and foreign function interfaces that combine to provide similar functionality to include($random_file) without needing the explicit switch.

Related

Hook into call_user_func_array - possible? [duplicate]

I want to hook into core functions before they execute, or replace them outright. For example, I am going to prevent both include and require from accessing any scripts. Is there any way to do this without extra .dll's? Another case is is_array($myarr): I would like to hook it to array($myarr) === $myarr (it looks like it is faster), to avoid creating extra classes and functions.
P.S. One more question: how can I prevent all PHP execution after a certain point? I have HTML templates with PHP parts such as <?=$myvar?>, and I want to prevent the short syntax, and execution altogether, once my script finishes its work. What should I try?
About hooks on standard functions: there is no way to do that without external modules. The APD PECL module will do the job.
rename_function('require', 'internal_require'); // keep the original reachable under a new name
override_function('require', '$filename',
'print "require called"; internal_require($filename);');
Be aware that require and include are language constructs rather than functions, so this pattern may only work for real functions.
The second question is not very clear. Do you want to hook the standard is_array function, the array() language construct, or (array) type casting?
About stopping PHP interpretation: have a look at __halt_compiler. But keep in mind that succeeding blocks of PHP will just be embedded in the HTML (thus visible to everybody).
If you want to disable functions, you can use safe mode, but it is deprecated and not recommended. And as madfriend says, __halt_compiler just sends everything below it as text. Bear in mind that it can only be called from the outermost scope, i.e. not inside curly braces (if, loops, functions, etc.).

Looking for functions with PHP tokenizer

Right now, I have a script which uses PHP's tokenizer to look for certain functions within a PHP source code file. The pattern I am currently looking for is:
T_STRING + T_WHITESPACE (optional) + "("
This seems to match all of my test cases so far except variable functions, which I am ignoring for the purposes of this question.
The obvious problem here is that this pattern produces a lot of false positives, like matching function definitions:
public function foo() { // foo() should not be matched
My question is, is there a more reliable/accurate method for looking at source code and plucking out all the function invocations? Maybe a better method than using the tokenizer at all?
Edit:
In particular, I'm looking to emulate the functionality of the disable_functions PHP directive within a class file. So, if exec() should be disallowed, I'm trying to find any uses of that function within the analyzed file. I do realize that variable functions make this terribly difficult, so I am detecting these and disallowing them as well.
You first run the tokenizer (available in PHP). Then you run a parser on top of the tokens. The parser needs to read the tokens and should be able to tell you what a specific token has been used for. How reliable the outcome is depends on how reliable your parser is.
If your current parser (you have not shown any code) is not reliable enough, you need to write a better parser. It is that simple. You are probably not doing much more than tokenizing and reading the tokens as they pass, which might not be enough.
Instead of using the tokenizer, consider instead using a higher-level parser to analyze your code. For example, PHP-Parser can explicitly identify function declarations, as well as variable function calls.
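If you stay with the tokenizer, looking at the token before the name already removes the most common false positives. The following is a minimal sketch under that assumption; findFunctionCalls is a hypothetical helper, and it deliberately ignores variable functions, namespaced names, and by-reference declarations such as function &foo():

```php
<?php
// Hypothetical helper: list the names of plain function calls in $source.
function findFunctionCalls(string $source): array
{
    // Drop whitespace and comments so neighbouring tokens are easy to inspect.
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($token) {
            return !is_array($token)
                || !in_array($token[0], [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT], true);
        }
    ));

    // A T_STRING directly after one of these is a declaration, an
    // instantiation, or a method call, not a plain function call.
    $notACallAfter = [T_FUNCTION, T_NEW, T_OBJECT_OPERATOR, T_DOUBLE_COLON];

    $calls = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_STRING) {
            continue;
        }
        if (($tokens[$i + 1] ?? null) !== '(') {
            continue; // not followed by an opening parenthesis
        }
        $previous = $tokens[$i - 1] ?? null;
        if (is_array($previous) && in_array($previous[0], $notACallAfter, true)) {
            continue;
        }
        $calls[] = $token[1];
    }
    return $calls;
}
```

A disallow list can then be checked with array_intersect() against the returned names. For anything beyond this, a real parser remains the more reliable route.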

PHP: What are language constructs and why do we need them?

I keep coming across statements like:
echo is a language construct but print is a function and hence has a return value
and
die is a language construct
My question is what are these language constructs and more importantly why do we need them?
Language constructs are hard coded into the PHP language. They do not play by normal rules.
For example, whenever you try to access a variable that doesn't exist, you'd get an error. To test whether a variable exists before you access it, you need to consult isset or empty:
if (isset($foo))
If isset was a normal function, you'd get a warning there as well, since you're accessing $foo to pass it into the function isset. Since isset is a language construct though, this works without throwing a warning. That's why the documentation makes a clear distinction between normal functions and language constructs.
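The difference can be observed directly. This is a small sketch, assuming a hypothetical user-land imitation called my_isset; a temporary error handler collects whatever diagnostics PHP raises:

```php
<?php
// Hypothetical user-land imitation of isset().
function my_isset($value): bool
{
    return $value !== null;
}

function compareIssetWithFunction(): array
{
    $diagnostics = [];
    set_error_handler(function (int $severity, string $message) use (&$diagnostics): bool {
        $diagnostics[] = $message; // record instead of printing
        return true;
    });

    $viaConstruct = isset($neverDefined);    // the construct never reads the variable
    $viaFunction = my_isset($neverDefined);  // the argument is read before the call

    restore_error_handler();
    return [$viaConstruct, $viaFunction, count($diagnostics)];
}
```

Both checks report false, but only the function call leaves an "undefined variable" diagnostic behind.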
Language constructs are what makes up the language: things like "if" "for" "while" "function" and so on.
The mentions in the PHP manual of things like "echo", "die" or "return" are there to make it clear that these are NOT functions and that they do not always behave like functions.
You could call "echo" as "echo()", so it may confuse beginners. That's why they put the clear distinction in the manual: to make it absolutely clear to everyone.
Other examples of language constructs that could be mistaken for functions are "array()" and "list()".
To understand the answer for this question you must understand how parsers work. A language is defined by syntax and the syntax is defined through keywords.
The language constructs are pieces of code that make the base of PHP language. The parser deals with them directly instead of functions.
Not all of a language can be functions. There must be some base, somewhere, on which you implement those first functions. The elements of this base are the language constructs (alternately, built-ins). They don't always behave like "normal" functions do.
For the sake of completeness, a language construct is any instruction which is built into the language itself, while a function is an additional block of code.
In some cases, a language may choose to build in a particular feature or to rely on a separate function.
For example, PHP has the print language construct, which outputs a string. Many other languages, such as C don’t build it in, but implement it as a function. There might be technical reasons for taking one or other approach, but sometimes it is more philosophical — whether the feature should be regarded as core or additional.
For practical purposes, while functions follow a rigid set of logistic rules, language constructs don’t. Sometimes, that’s because they may be doing something which would otherwise traumatise a regular function. For example, isset(…), by its very purpose, may be referencing something which doesn’t exist. Functions don’t handle that at all well.
Here are some of the characteristics of language constructs:
Many don’t require parentheses; some do sometimes.
Language constructs are handled directly by the parser when the script is compiled; function calls are resolved later, when the code runs.
Some Language Constructs, such as isset do things which would be impossible as functions; some others, such as Array(…) could have gone either way.
Some Language Constructs certainly don’t look like functions. For example, the Array(…) construct can be written as […].
As the documentation keeps reminding us, language constructs cannot be called as variable functions. So $a='print_r'; $a(…); is OK, but $a='print'; $a(…); isn’t.
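The last point is easy to check. In this short sketch (variable names are illustrative only), a real function works as a variable function, while print is not even registered as a function:

```php
<?php
$function = 'strtoupper';
$upper = $function('abc');                       // variable function call works

$construct = 'print';
$printIsCallable = is_callable($construct);      // constructs are not callables
$printIsFunction = function_exists($construct);  // nor entries in the function table

$failed = false;
try {
    $construct('abc');                           // looks up a function named "print"
} catch (Error $e) {
    $failed = true;                              // call to undefined function
}
```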
Some things are just not possible using normal functions, consider this snippet:
list($el1, $el2) = array('el1', 'el2');
What it does is it takes the elements from a non-associative array and assigns the values to the variables defined in the list() construct.
Simply cannot be done with functions :)
A more subtle example is isset and empty. Though they look like functions, they do one thing that's not possible with functions alone: they do not generate "variable is undefined" or "undefined index" notices.
Language constructs can be written in more than one way, and some have a return value:
print("asdf"); is equivalent to print "asdf"; and returns 1.
echo("asdf"); is equivalent to echo "asdf"; but has no return value.
die("asdf"); is equivalent to exit("asdf"); and has no return value either.
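The return value of print can be seen by using it inside an expression. A minimal sketch, with output buffering used only so that the printed text can be inspected:

```php
<?php
ob_start();
$printResult = print "asdf";   // print is usable as an expression
$printed = ob_get_clean();     // capture what was written

// echo cannot be used this way: `$x = echo "asdf";` is a parse error.
```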

What are some useful PHP Idioms?

I'm looking to improve my PHP coding and am wondering what PHP-specific techniques other programmers use to improve productivity or workaround PHP limitations.
Some examples:
Class naming convention to handle namespaces: Part1_Part2_ClassName maps to file Part1/Part2/ClassName.php
if ( count($arrayName) ) // handles $arrayName being unset or empty
Variable function names, e.g. $func = 'foo'; $func($bar); // calls foo($bar);
Ultimately, you'll get the most out of PHP first by learning generally good programming practices, before focusing on anything PHP-specific. Having said that...
Apply liberally for fun and profit:
Iterators in foreach loops. There's almost never a wrong time.
Design around class autoloading. Use spl_autoload_register(), not __autoload(). For bonus points, have it scan a directory tree recursively, then feel free to reorganize your classes into a more logical directory structure.
Typehint everywhere. Use assertions for scalars.
function f(SomeClass $x, array $y, $z) {
assert(is_bool($z));
}
Output something other than HTML.
header('Content-type: text/xml'); // or text/css, application/pdf, or...
Learn to use exceptions. Write an error handler that converts errors into exceptions.
Replace your define() global constants with class constants.
Replace your Unix timestamps with a proper Date class.
In long functions, unset() variables when you're done with them.
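The error handler mentioned in the list above is commonly written around the built-in ErrorException class. A minimal sketch, in which riskyOperation is a hypothetical function that only exists to trigger a warning:

```php
<?php
error_reporting(E_ALL);

set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    if (!(error_reporting() & $severity)) {
        return false; // level is currently suppressed, fall back to PHP's handling
    }
    throw new ErrorException($message, 0, $severity, $file, $line);
});

// Hypothetical function that raises a warning instead of throwing.
function riskyOperation(): string
{
    trigger_error('something went wrong', E_USER_WARNING);
    return 'not reached';
}

try {
    $outcome = riskyOperation();
} catch (ErrorException $e) {
    $outcome = 'caught: ' . $e->getMessage();
}
```

With the handler installed, warnings and notices can be handled with ordinary try/catch blocks instead of being checked for separately.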
Use with guilty pleasure:
Loop over an object's data members like an array. Feel guilty that they aren't declared private. This isn't some heathen language like Python or Lisp.
Use output buffers for assembling long strings.
ob_start();
echo "whatever\n";
debug_print_backtrace();
$s = ob_get_clean();
Avoid unless absolutely necessary, and probably not even then, unless you really hate maintenance programmers, and yourself:
Magic methods (__get, __set, __call)
extract()
Structured arrays -- use an object
My experience with PHP has taught me a few things. To name a few:
Always output errors. These are the first two lines of my typical project (in development mode):
ini_set('display_errors', '1');
error_reporting(E_ALL);
Never use automagic. Stuff like autoLoad may bite you in the future.
Always require dependent classes using require_once. That way you can be sure you'll have your dependencies straight.
Use if(isset($array[$key])) instead of if($array[$key]). The second will raise a warning if the key isn't defined.
When defining variables (even in for loops), give them verbose names ($listIndex instead of $j).
Comment, comment, comment. If a particular snippet of code doesn't seem obvious, leave a comment. Later on you might need to review it and might not remember what its purpose is.
Other than that, class, function and variable naming conventions are up to you and your team. Lately I've been using Zend Framework's naming conventions because they feel right to me.
Also, and when in development mode, I set an error handler that will output an error page at the slightest error (even warnings), giving me the full backtrace.
Fortunately, namespaces are in 5.3 and 6. I would highly recommend against using the Path_To_ClassName idiom. It makes messy code, and you can never change your library structure... ever.
The SPL's autoload is great. If you're organized, it can save you the typical 20-line block of includes and requires at the top of every file. You can also change things around in your code library, and as long as PHP can include from those directories, nothing breaks.
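A minimal autoloader along these lines might look as follows. The lib directory and the classToPath helper are assumptions made for this sketch; the helper accepts both the underscore convention from the question and namespaced names:

```php
<?php
// Hypothetical helper: Part1_Part2_ClassName or Part1\Part2\ClassName
// both map to Part1/Part2/ClassName.php.
function classToPath(string $class): string
{
    return str_replace(['_', '\\'], '/', $class) . '.php';
}

spl_autoload_register(function (string $class): void {
    // Assumed layout: class files live under ./lib next to this script.
    $file = __DIR__ . '/lib/' . classToPath($class);
    if (is_file($file)) {
        require $file;
    }
});
```

Because the lookup is derived from the class name, moving a class only requires moving its file to the matching directory.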
Make liberal use of === over ==. For instance:
if (array_search('needle',$array) == false) {
// it's not there, i think...
}
will wrongly report the value as missing if 'needle' is at key zero, because 0 == false is true. Instead:
if (array_search('needle',$array) === false) {
// it's not there!
}
will always be accurate.
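The two comparisons can be put side by side. A short sketch with an illustrative array in which the match sits at key zero:

```php
<?php
$array = ['needle', 'hay'];

$position = array_search('needle', $array);    // int(0): found at the first key

$looseSaysMissing = ($position == false);      // 0 == false, so this misreports
$strictSaysMissing = ($position === false);    // int is never identical to false
```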
See this question: Hidden Features of PHP. It has a lot of really useful PHP tips, the best of which have bubbled up to the top of the list.
There are a few things I do in PHP that tend to be PHP-specific.
Assemble strings with an array.
A lot of string manipulation is expensive in PHP, so I tend to write algorithms that reduce the discrete number of string manipulations I do. The classic example is building a string with a loop. Start with an array(), instead, and do array concatenation in the loop. Then implode() it at the end. (This also neatly solves the trailing-comma problem.)
Array constants are nifty for implementing named parameters to functions.
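The string-assembly idiom described above can be sketched like this; joinWithCommas is a hypothetical name chosen for the example:

```php
<?php
// Hypothetical helper: collect the pieces first, join them once at the end.
function joinWithCommas(array $values): string
{
    $parts = [];
    foreach ($values as $value) {
        $parts[] = (string) $value;   // append to the array, no string concatenation
    }
    return implode(', ', $parts);     // separators go only between elements
}
```

Since implode() places the separator only between elements, there is no trailing comma to trim afterwards.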
Enable NOTICE and, if you really want to, STRICT error reporting. It prevents a lot of errors and code smells: ini_set('display_errors', 1); error_reporting(E_ALL | E_STRICT);
Stay away from global variables
Keep as many functions as possible short. Short functions read more easily and are easier to maintain. Some people say that you should be able to see the whole function on your screen, or at least that the opening and closing curly brackets of each loop and structure in the function should both be on your screen.
Don't trust user input!
I've been developing with PHP (and MySQL) for the last 5 years. Most recently I started using a framework (Zend) with a solid javascript library (Dojo) and it's changed the way I work forever (in a good way, I think).
The thing that made me think of this was your first bullet: Zend Framework does exactly this as its standard way of accessing 'controllers' and 'actions'.
In terms of encapsulating and abstracting issues with different databases, Zend_Db does this very well. Dojo does an excellent job of ironing out JavaScript inconsistencies between different browsers.
Overall, it's worth getting into good OOP techniques and using (and READING ABOUT!) frameworks has been a very hands-on way of getting to understand OOP issues.
For some standalone tools worth using, see also:
Smarty (template engine)
ADODB (database access abstraction)
Declare variables before using them!
Get to know the different types and the === operator, it's essential for some functions like strpos() and you'll start to use return false yourself.
