I am trying to identify pure functions in PHP code.
A pure function is one where both these statements about the function hold:
The function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change as program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices.
Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices.
(definition from Wikipedia)
Is it sufficient to say that a PHP function is pure if and only if
all its arguments are passed by value (no & in the argument list)
it does not use object members (no $this in the function body)
it does not use globals (it doesn't contain global in the function body)
it does not use superglobals (it doesn't contain $_ variables)
Are these statements true?
Am I missing any use cases?
You are missing a lot of use-cases
rand()
database interaction
file IO
static variables
calls to other functions with global
include/require statements inside functions
functions with internal state like ob_get_contents()
mutating array pointers
There is probably a lot of stuff I'm not thinking of. PHP has a very stateful design.
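To make a few items from the list above concrete, here is a minimal sketch. All function names are hypothetical. Each of the first three functions passes the four checks proposed in the question (no &, no $this, no global, no superglobals) and is still impure.

```php
<?php
// Hypothetical examples: each passes the four proposed checks,
// yet only add() is actually pure.

// Hidden state: a static variable survives between calls.
function nextId()
{
    static $id = 0;
    return ++$id;
}

// Non-determinism: the result does not depend only on the argument.
function roll($sides)
{
    return mt_rand(1, $sides);
}

// Side effect: writes to the output stream.
function greet($name)
{
    echo "Hello, $name\n";
    return strlen($name);
}

// A pure function for comparison.
function add($a, $b)
{
    return $a + $b;
}
```

Calling nextId() twice with the same (empty) argument list gives two different results, which is exactly what the first clause of the definition forbids.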
I'm wondering what you think the best practice is here-- does it buy you very much to type-check parameters in PHP? I.e have you actually seen noticeably fewer bugs on projects where you've implemented parameter type-checking vs. those that don't? I'm thinking about stuff like this:
public function __construct($screenName, $createdAt) {
if (!is_string($screenName) || !is_string($createdAt)) {
return FALSE;
}
}
Normally, in a PHP application, the scalar variable "types" are bound to what is actually string input (the HTTP request). PHP makes it easy to convert string input to numbers so that you can use it in calculations and the like.
However checking scalar values for is_string as proposed in your example does not make much sense. Because nearly any type of variable in the scalar family is a string or at least can be used as a string. So as for your class example, the question would be, does it actually make sense to check the variable type or not?
For the code you proposed it does not make sense, because you exit the constructor with a return FALSE statement. This ends the constructor early, but new still hands back an object, and that object is not properly initialized.
Instead you should throw an exception, e.g. an InvalidArgumentException, if a constructor argument does not provide the expected / needed type of value.
Leaving this aside and taking for granted that your object constructor needs to distinguish between a string and an integer or bool or any other of the scalar types, then you should do the checks.
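A minimal sketch of the exception-based constructor. The class name Tweet and the getter are hypothetical, since the question does not show the class itself.

```php
<?php
// Hypothetical class; the parameter names follow the question's example.
class Tweet
{
    private $screenName;
    private $createdAt;

    public function __construct($screenName, $createdAt)
    {
        // Throwing aborts construction, so no half-initialized
        // object can ever reach the caller.
        if (!is_string($screenName) || !is_string($createdAt)) {
            throw new InvalidArgumentException(
                'screenName and createdAt must both be strings'
            );
        }
        $this->screenName = $screenName;
        $this->createdAt = $createdAt;
    }

    public function getScreenName()
    {
        return $this->screenName;
    }
}
```

With this version, new Tweet(42, '2012-01-01') raises an InvalidArgumentException instead of returning an object whose members were never set.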
If you don't rely on the exact scalar types, you can cast to string instead.
Just ensure that the data hidden inside the object is always perfectly all-right and it's not possible that wrong data can slip into private members.
It depends. I'll generally use the type-hinting that is built into PHP for higher-level objects (stdClass $obj, array $arr, MyClass $mine), but when it comes to lower-level values, especially numbers and strings, it becomes a little less beneficial.
For example, if you had the string '12345', that becomes a little difficult to differentiate between that and the number 12345.
For everything else, the accidental casting of array to a string will be obvious. Class instances which are cast to strings, if they don't have a __toString, will make PHP yell. So your only real issue is classes which have a __toString method and, well, that really limits the number of times where it can come up. I really wonder if it is worth that level of overhead.
Checking function arguments is a very good practice. I suspect people often don't do that because their functions grow bigger and the code becomes uglier and less readable. Now with PHP 7 you can type-hint scalar types but there is still no solution for cases when you want your parameter to be one of two types: array or instance of \Traversable (which both can be traversed with foreach).
In this case, I recommend having a look at the args module from NSPL. The __construct from your example will look like this:
public function __construct($screenName, $createdAt)
{
expectsAll(string, [$screenName, $createdAt]);
}
// or require a non-empty array, string or instance of \ArrayAccess
function first($sequence)
{
expects([nonEmpty, arrayAccess, string], $sequence);
return $sequence[0];
}
More examples here.
Better documentation is more important when you're the only one interacting with the methods. Standard method definition commenting gives you well documented methods that can easily be compiled into an API that is then used in many IDEs.
When you're exposing your libraries or your inputs to other people, though, it is nice to do type checking and throw errors if your code won't work with their input. Type checking on user input protects you from errors and hacking attempts, and as a library letting other developers know that the input they provided is not what you're expecting is sometimes nice.
Some may know that PHP methods can be remotely invoked from Flash.
Sometimes the input parameter of a remote PHP method is an array of integers.
Because PHP is dynamically typed an attacker can pass an array of anything.
The array of integers has to be used in a SQL query.
At the moment I'm preventing injection like this:
foreach ($unsafeArray as $value)
$safeArray[] = (int)$value;
What would you recommend? Maybe I should start using Java :D
You could use this: $aSafeArray = array_map('intval', $aUnsafeArray); to make sure all passed values are integers.
My advice would be to start using prepared statements!
Example:
$o->bindParam(':anint', $iInt, PDO::PARAM_INT);
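The two suggestions above can be combined for the array case: convert every element with intval, then build one placeholder per value for an IN clause. This is only a sketch. The function name, the table name items and the column id are hypothetical, and the PDO calls appear only in comments because no database connection is assumed here.

```php
<?php
// Build a parameterized IN (...) query for a list of integer ids.
// The table "items" and column "id" are hypothetical.
function buildInQuery(array $unsafeIds)
{
    // Force every element to an integer; non-numeric strings become 0.
    $ids = array_map('intval', array_values($unsafeIds));
    if (count($ids) === 0) {
        throw new InvalidArgumentException('At least one id is required');
    }
    // One "?" placeholder per value, e.g. "?,?,?".
    $placeholders = implode(',', array_fill(0, count($ids), '?'));
    $sql = "SELECT * FROM items WHERE id IN ($placeholders)";
    return array($sql, $ids);
}

list($sql, $params) = buildInQuery(array('1', '2; DROP TABLE items', 'abc'));
// With an existing PDO connection in $pdo, the query could then be run as:
//   $stmt = $pdo->prepare($sql);
//   $stmt->execute($params);
```

Because the values travel separately from the SQL text, the query string never contains attacker-controlled content, whatever was in the input array.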
What would you recommend?
I'm not a Flash expert, but indeed a PHP method can be called just by knowing its name, and the parameters can be passed as an array. So the issue is actually not the remote method invocation, but the input filtering and validation.
Depending on the intended behaviour, I would use intval as opposed to a hard cast to int (AFAIR it would return 0 on an invalid value), otherwise you could throw an exception or whatever. You have to define its behavior first.
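If the chosen behaviour is to reject bad input rather than convert it silently, a sketch along these lines may help. The function name is hypothetical. It relies on filter_var with FILTER_VALIDATE_INT, which returns false for values that are not well-formed integers, whereas intval turns a string like '12abc' into 12.

```php
<?php
// Reject anything that is not a well-formed integer instead of
// silently converting it. The function name is hypothetical.
function requireIntArray(array $values)
{
    $result = array();
    foreach ($values as $value) {
        $int = filter_var($value, FILTER_VALIDATE_INT);
        if ($int === false) {
            throw new InvalidArgumentException(
                'Not an integer: ' . var_export($value, true)
            );
        }
        $result[] = $int;
    }
    return $result;
}
```

Whether to convert or to throw is the design decision mentioned above: conversion keeps the request alive with a default of 0, while throwing makes malformed input visible immediately.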
Maybe I should start using Java
No, unless you want a bloated solution both in terms of development speed and huge memory requirement at compile and runtime :p
I am always confused about when to create pass/call-by-reference functions. It would be great if someone could explain when exactly I should use them, with some realistic examples.
A common reason for calling by reference (or pointers) in other languages is to save on space - but PHP is smart enough to implement copy-on-write for arguments which are declared as passed-by-value (copies). There are also some hidden semantic oddities - although PHP5 introduced the practice of always passing objects by reference, array values are always stored as references, call_user_func() always calls by value - never by reference (because it itself is a function - not a construct).
But this is additional to the original question asked.
In general it's good practice to always declare your code as passing by value (copy) unless you explicitly want the value to be different after the invoked functionality returns. The reason is that you should know how the invoked functionality changes the state of the code you are currently writing. These concepts are generally referred to as isolation and separation of concerns.
Since PHP 5 there is no real reason to pass values by reference.
One exception is if you want to modify arrays in-place. Take for example the sort function. You can see that the array is passed by reference, which means that the array is sorted in place (no new array is returned).
Or consider a recursive function where each call needs to have access to the same datum (which is often an array too).
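Both cases can be sketched briefly. sort() is the built-in mentioned above; collectLeaves is a hypothetical recursive function in which every call appends to the same array.

```php
<?php
// sort() takes its array by reference and sorts it in place.
$numbers = array(3, 1, 2);
sort($numbers);

// Hypothetical recursive function: every level of the recursion
// appends to the same $found array because it is passed by reference.
function collectLeaves(array $tree, array &$found)
{
    foreach ($tree as $node) {
        if (is_array($node)) {
            collectLeaves($node, $found);
        } else {
            $found[] = $node;
        }
    }
}

$leaves = array();
collectLeaves(array(1, array(2, array(3, 4)), 5), $leaves);
```

Without the & on $found, each recursive call would work on its own copy and $leaves would still be empty afterwards.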
In php4 it was used for large variables. If you passed an array in a function the array was copied for use in the function, using a lot of memory and cpu. The solution was this:
function foo(&$arr)
{
echo $arr['value'];
}
$arr = array('value' => 'example');
foo($arr);
This way you pass only a reference (a link to the array), which saves memory and CPU. Since PHP 5, every object and array (I'm not sure about scalars like int) is passed by reference internally, so there isn't any need to do it yourself.
Passing by reference works best when your function always returns a modified version of the variable passed to it, and the result is assigned back to that same variable:
$var = modify($var);
function modify($var)
{
return $var.'ret';
}
If you will always return to the passed variable, using reference is great.
Also, when dealing with large variables and especially arrays, it is good to pass by reference wherever feasible. This helps save on memory.
Usually, I pass by reference when dealing with arrays, since I usually assign the modified array back to the original variable.
Let me elaborate on the title. Consider for example PHP_FUNCTION(session_start). Will I be able to invoke session_start from within session_id, which is another PHP_FUNCTION (this is just for illustration, not the actual purpose)?
Well, yes, but you should avoid it as much as possible. One of the main benefits of writing internal implementations of functions is that, contrary to what happens in PHP, C function calls are cheap. Additionally, calling PHP functions internally in C code is relatively painful.
For instance, in the case of session_start, you have php_session_start, which is exposed by the session extension. Owing to what I described in the first paragraph, extensions will usually export C functions that may be useful to others.
In fact, if the internal PHP function foo needed to call the internal PHP function bar, the best strategy, if possible, would be to define an auxiliary (non PHP_FUNCTION) C function with most of the implementation of bar. Then both PHP_FUNCTION(foo) and PHP_FUNCTION(bar) could call that auxiliary function.
Anyway, the easiest way to call PHP functions is to use call_user_function:
int call_user_function(HashTable *function_table, zval **object_pp,
zval *function_name, zval *retval_ptr, zend_uint param_count,
zval *params[] TSRMLS_DC);
The variant call_user_function_ex also allows prohibiting separation when the argument should be sent by reference but it's not, and specifying a symbol table.
This will work both if the relevant function is internal (PHP_FUNCTION) or was defined in userspace. If it's a regular function, you should use EG(function_table) as the first argument, the second should be NULL and I think you can figure out the others.
If you execute the function several times, this is not very efficient. In that case, see the functions in "Zend_API.h" that start with zend_fcall_.
I wouldn't recommend other options for calling internal functions, such as manually setting up the arguments stack and other trickery and then manually calling the underlying C function.
ie. session_start(session_id())
Yes, however in this case it doesn't make sense because session_id() requires the session to already be started.
I have a string that stores some variables that must be executed to produce a result, for example:
define('RUN_THIS', '$something.",".$somethingElse');
Which is then eval()-uated:
$foo = eval("return ".RUN_THIS.";");
I understand that eval is unsafe if the string that gets evaluated is from user input. However, if for example I wanted to have everything run off Facebook's HipHop which doesn't support eval() I couldn't do this.
Apparently I can use call_user_func() - is this effectively the same result as eval()? How is it deemed to be secure when eval() isn't, if that is indeed the case?
Edit:
In response to the comments, I didn't originally make it clear what the goal is. The constant is defined in advance in order that later code, be it inside a class that has access to the configuration constants, or procedural code, can use it in order to evaluate the given string of variables. The variables that need to be evaluated can vary (completely different names, order, formatting) depending on the situation but it's run for the same purpose in the same way, which is why I currently have the string of variables set in a constant in this way. Technically, eval() is not unsafe as long as the config.php that defines the constants is controlled but that wasn't the point of the question.
Kendall seems to have a simple solution, but I'll try to answer your other question:
Apparently I can use call_user_func() - is this effectively the same result as eval()? How is it deemed to be secure when eval() isn't, if that is indeed the case?
call_user_func is actually safer than eval because call_user_func can only call a single function, whereas eval executes the string as PHP code. With eval, an attacker can append '; (closing the string and starting a new "line" of code), add some more code, and then add ;' (ending the line of code and starting another string so that there is no syntax error). The constant RUN_THIS could then contain lots of PHP code that the user can run on the server, including deleting all your important files, retrieving information from databases, etc. NEVER LET THIS HAPPEN.
call_user_func doesn't let this happen. When you run call_user_func_array($func, $args), the user can only run a restricted set of functions because (a) only a single function is called rather than arbitrary code, and (b) you can control $func to ensure the user isn't able to run any function he/she wants, either by checking that $func is in a list of "allowed functions" or by prefixing something like user_ to the function names and to the $func variable itself (this way the user can run only functions beginning with user_). Note that call_user_func accepts any callable, including built-in functions, so the check in (b) is essential.
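The "allowed functions" check could look roughly like this. It is a sketch only: user_greet and callAllowed are hypothetical names, and the allow-list would normally come from your own configuration.

```php
<?php
// Hypothetical user-facing function, using the "user_" prefix.
function user_greet($name)
{
    return "Hello, $name";
}

// Call a function by name only if it appears in an explicit allow-list.
function callAllowed($func, array $args)
{
    $allowed = array('user_greet');
    // Only plain function names are accepted; strict comparison
    // avoids any type-juggling surprises.
    if (!is_string($func) || !in_array($func, $allowed, true)) {
        throw new InvalidArgumentException('Function not allowed');
    }
    return call_user_func_array($func, $args);
}
```

A request for callAllowed('system', array('rm -rf /')) is rejected before call_user_func_array is ever reached, while callAllowed('user_greet', array('Bob')) goes through.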
I can't see any reason why you can't just use double-quote string building.
$foo = "$something,$somethingElse";