In most other Object Oriented languages. it would be sacrilege to have each function receive a single associative array of Objects rather than enumerating each in the method signature. Why though, is it acceptable and commonly used in most popular frameworks for both of these languages to do this?
Is there some justification beyond wishing to have concise method signatures?
I do see a benefit in this -- that the API could remain unchanged as new, optional parameters are added. But Javascript and PHP already allow for optional parameters in their method signatures. If anything, it seems like Java or another OO language would benefit from this more... and yet I rarely see this pattern there.
What gives?
In my opinion, a lot of these functions are climbing in the number of arguments they accept, over 10 is not uncommon. Even if you do optional parameters, you still have to send them in order.
Consider a function like:
function myFunc(a0, a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, a13){
//...
}
Say you want to use arguments 3 and 11 only. Here is your code:
myFunc(null, null, null, 'hello', null, null, null, null, null, null, null, 'world');
Wouldn't you rather just:
myFunc({
a3 : 'hello',
a11 : 'world'
});
?
There are a couple of reasons for this.
The first is that not all argument lists have a natural order. If there is a use-case to only supply the 1st and 6th argument, now you have to fill in four default placeholders. Jage illustrates this well.
The second is that it's very hard to remember the order the arguments must occur in. If you take a series of numbers as your arguments, it's unlikely to know which number means what. Take imagecopyresampled($dest, $src, 0, 10, 0, 20, 100, 50, 30, 80) as an example. In this case, the configuration array acts like Python's named arguments.
The major reason is that those particular languages simply do not support having multiple calling conventions for the same function name. I.E. you can't do the following:
public function someFunc(SomeClass $some);
public function someFunc(AnotherClass $another);
public function someFunc(SomeClass $some, AnotherClass $another);
So you must find another way to create simpler ways to pass your variables around, in PHP we end up with someFunc(array('some'=>$some, 'another'=>$another)) because it is the only convenient way. In JavaScript we end up using objects, which isn't as bad: someFunc({some: something, another: anotherthing})
Ruby follows this methodology as well, and has even devised a more succinct syntax meant specifically for using hashes as initializers, as of Ruby 1.9:
# old syntax
myFunc(:user => 'john', :password => 'password');
# new syntax
myFunc(user: 'john', password: 'password');
The problem being addressed is that, within these languages, you cannot overload functions based on argument type. This results in complex classes with a single constructor that can have a huge and unwieldy argument list. Using hash-like objects to supply parameters allows for a sort of pseudo-overloading by parameter name rather than by type.
My only real problem with this practice is it becomes difficult to document your arguments:
PHPDoc for variable-length arrays of arguments
One of the most important reason why you don't see this in other OO languages is that you are probably referring to compiled languages like C++ or Java.
The compiler is responsible to determine, at compilation time not during execution, which method you want to call and this is normally done based on the signature, so basically it has to be done this way. This is also how method overload is done in these languages.
First of all such technique is very simple for the coders.
They do not need to remember order of all of the parameters of the all functions in your application. The code becomes more readable and more robust.
Any future development, improvements or refactoring will be no so hard to provide. The code becomes more maintainable.
But there is some pitfalls. For example, not every IDE will give you a simple code completing with such functions and their parameters.
In my opinion there 2 reasons why this has evolved:
Array's are very easy to create and manipulate, even with mixed types.
PHP/JS programmers have often a non academic background and are less indoctrinated from languanges like Java and C++.
Related
I'm trying to write more functional code in PHP without any helper libraries.
I need to return some JSON that includes the results of a transformed array AND the count of that array (for convenience on the data consumer end). Since you're not supposed to use variables in FP, I'm stumped on how to get the count of the array without recalculating/remapping the array.
Here's an example of what my code currently looks like:
$duplicates = array_filter( get_results(), 'find_duplicates' );
send_json( array(
"duplicates" => $duplicates,
"numDuplicates" => count( $duplicates )
) );
How can I do the same without storing the results of the filter in a temporary variable to avoid running array_filter() twice?
But first, acknowledge the following...
"Since you're not supposed to use variables in FP..." – that's a ludicrous understanding of functional programming. Variables are used constantly in functional programs. I'm guessing you saw point-free functional programs and then imagined that every program can be expressed in such a way...
the receiver of the JSON could easily get the number of duplicates using JSON.parse(json).duplicates.length because every Array in JavaScript has a length property – it's arguably silly to attach a numDuplicates in the first place. Anyway, let's assume your consumer has a specific API that requires the numDuplicates field...
functional programming is concerned with things like function purity – maybe you've simplified your code in your post (which is bad; don't do that) or that is in fact your actual code. In such a case, get_results() and send_json functions are impure; send_json has an obvious (but unknown) side effect (the return value is not used) — You ask for a functional solution but you have other outstanding non-functional code... so...
There's nothing wrong with the code you have. Sometimes removing a point (variable, or argument), it hurts the readability of the code. In your case, this code is perfectly legible. It is at this point that I feel you're only trying to shorten the code or make it more clever. Your intention is to improve it, but I think you'd actually harm it in this case.
What if I told you...
a variable assignment can be replaced with a lambda? 0_0
(function ($duplicates) {
send_json([
'duplicates' => $duplicates,
'numDuplicates' => count($duplicates)
});
}) (array_filter(get_results(), 'find_duplicates'));
But that made the code longer.. and there's added abstraction which hurts readability T_T In this case, using a normal variable assignment (as in your original code) would've been much better
Combinators
OK, so what if you had some combinators at your disposal to massage the data into the desired shape?
function apply (...$xs) {
return function ($f) use ($xs) {
return call_user_func($f, ...$xs);
};
}
function identity ($x) { return $x; }
// hey look, mom! no points!
send_json(
array_combine(
['duplicates', 'numDuplicates'],
array_map(
apply(
array_filter(get_results(), 'find_duplicates')),
['identity', 'count'])));
Did we achieve anything other than writing the weirdest PHP you or anyone else has probably seen? Not to mention, the input is strangely nested in the middle of the expression...
remarks
I'm nearly certain that you'll be disappointed with this answer (or disagree with me), but I'm also pretty confident that you're not sure what you're looking for. A guess: you saw functional programming that "doesn't use variables" and assumed that's how all programs can and should be written; but that's just not the case. Sometimes using a variable or two can dramatically improve the readability of a given expression.
Anyway, all of this is truly beside the point because attaching numDuplicates is arguably an anti-pattern in JSON anyway (point #2 above).
I've noticed some inconsistencies in the PHP manual; a number of core function signatures are documented to accept arguments by reference, however they accept arguments by value.
I posted a more specific question previously, and #cweiske provided a great answer (with reference to the pertinent PHP source) however these inconsistencies appear to be more rampant.
There are a number of functions that are affected by this (I'll update this list as warrants; also note, these tests were done in an error_reporting(-1) environment)
http://www.php.net/manual/en/function.current.php
This was already discussed at the linked question
http://www.php.net/manual/en/function.key.php
This was already discussed at the linked question
http://www.php.net/manual/en/function.array-replace-recursive.php
array array_replace_recursive ( array &$array , array &$array1 [, array &$... ] )
Accepts arguments $array, $array1, etc., by value. This has been corrected.
http://www.php.net/manual/en/function.array-multisort.php
bool array_multisort ( array &$arr [, mixed $arg = SORT_ASC [, mixed $arg = SORT_REGULAR [, mixed $... ]]] )
Accepts arguments $arr, etc., by value. This should throw an error, as it won't do anything if the argument isn't a variable.
Now I'm concerned, not because I'm being anal about the documentation, but because I fear that PHP devs are on the fence about the implementation details of these functions (or something equally unreliable)
I use array_replace_recursive() for instance, to take an array argument, and apply it against another array containing defaults. Some of my codebase has taken advantage of this inconsistency, to simply do:
$values = array_replace_recursive(array(
'setting_1' => array(
'sub-setting_1' => '',
'sub-setting_2' => '',
'sub-setting_3' => '',
),
'setting_2' => array(
'sub-setting_1' => 0,
'sub-setting_2' => 0,
),
'setting_3' => true,
), $values);
Thus producing a properly formatted array (to get around gratuitous isset() calls)
Should I be concerned with this? I'm thinking about submitting a documentation related bug request, but I'm first curious if anyone on SO (looking in your direction #cweiske) has insight on why this has been done.
Perhaps I'm not understanding your dilemma, but those functions don't directly modify the data in their parameters, so whether they accept via reference or value is sort of a moot point isn't it? Either way you're going to need to assign the return value if you want the function call to actually do anything for you.
In terms of insight into why you might be seeing the discrepancy. PHP 5 has had many changes made to the way references work. In fact I believe it's PHP 5.3+ that no longer permits certain usages of & to assign by reference. What you may be seeing is the transition of some of the PHP core functions to comply with the new rules about reference assignment. If memory serves me, PHP 5.3 branch was actually intended to be PHP 6, but was decidedly merged into the trunk as a branch instead of waiting for several complex features to be finished. This would account for why you've never seen confusion like this before in PHP.
I'm wondering what you think the best practice is here-- does it buy you very much to type-check parameters in PHP? I.e have you actually seen noticeably fewer bugs on projects where you've implemented parameter type-checking vs. those that don't? I'm thinking about stuff like this:
public function __construct($screenName, $createdAt) {
if (!is_string($screenName) || !is_string($createdAt) {
return FALSE;
}
}
Normally within a PHP application that makes use of the skalar variable "types" is bound to actually string input (HTTP request). PHP made this easier so to convert string input to numbers so you can use it for calculation and such.
However checking scalar values for is_string as proposed in your example does not make much sense. Because nearly any type of variable in the scalar family is a string or at least can be used as a string. So as for your class example, the question would be, does it actually make sense to check the variable type or not?
For the code you proposed it does not make any sense because you exit the constructor with a return false;. This will end the constructor to run and return a not-properly-initialized object.
Instead you should throw an exception, e.g. an InvalidArgumentException if a constructors argument does not provide the expected / needed type of value.
Leaving this aside and taking for granted that your object constructor needs to differ between a string and an integer or bool or any other of the scalar types, then you should do the checks.
If you don't rely on the exact scalar types, you can cast to string instead.
Just ensure that the data hidden inside the object is always perfectly all-right and it's not possible that wrong data can slip into private members.
It depends. I'll generally use the type-hinting that is built into PHP for higher-level objects ((stdClass $obj, array $arr, MyClass $mine)), but when it comes to lower level values -- especially numbers and strings, it becomes a little less beneficial.
For example, if you had the string '12345', that becomes a little difficult to differentiate between that and the number 12345.
For everything else, the accidental casting of array to a string will be obvious. Class instances which are cast to strings, if they don't have a __toString, will make PHP yell. So your only real issue is classes which have a __toString method and, well, that really limits the number of times where it can come up. I really wonder if it is worth that level of overhead.
Checking function arguments is a very good practice. I suspect people often don't do that because their functions grow bigger and the code becomes uglier and less readable. Now with PHP 7 you can type-hint scalar types but there is still no solution for cases when you want your parameter to be one of two types: array or instance of \Traversable (which both can be traversed with foreach).
In this case, I recommend having a look at the args module from NSPL. The __constructor from your example will have the following look:
public function __construct($screenName, $createdAt)
{
expectsAll(string, [$screenName, $createdAt]);
}
// or require a non-empty array, string or instance of \ArrayAccess
function first($sequence)
{
expects([nonEmpty, arrayAccess, string], $sequence);
return $sequence[0];
}
More examples here.
Better documentation is more important when you're the only one interacting with the methods. Standard method definition commenting gives you well documented methods that can easily be compiled into an API that is then used in many IDEs.
When you're exposing your libraries or your inputs to other people, though, it is nice to do type checking and throw errors if your code won't work with their input. Type checking on user input protects you from errors and hacking attempts, and as a library letting other developers know that the input they provided is not what you're expecting is sometimes nice.
In a PHP web application I'm working on, I see functions defined in two possible ways.
Approach 1:
function myfunc($arg1, $arg2, $arg3)
Approach 2:
// where $array_params has the structure array('arg1'=>$val1, 'arg2'=>$val2, 'arg3'=>$val3)
function myfunc($array_params)
When should I use one approach over another? It seems that if system requirements keep changing, and therefore the number of arguments for myfunc keep changing, approach 1 may require a lot of maintenance.
If the system is changing so often that using an indexed array is the best solution, I'd say this is the least of your worries. :-)
In general functions/methods shouldn't take too many arguments (5 plus or minus 2 being the maximum) and I'd say that you should stick to using named (and ideally type hinted) arguments. (An indexed array of arguments only really makes sense if there's a large quantity of optional data - a good example being configuration information.)
As #Pekka says, passing an array of arguments is also liable to be a pain to document and therefore for other people/yourself in 'n' months to maintain.
Update-ette...
Incidentally, the oft mentioned book Code Complete examines such issues in quite a bit of detail - it's a great tome which I'd highly recommend.
Using a params array (a surrogate for what is called "named arguments" in other languages") is great - I like to use it myself - but it has a pretty big downside: Arguments are not documentable using standard phpDoc notation that way, and consequently, your IDE won't be able to give you hints when you type in the name of a function or method.
I find using an optional array of arguments to be useful when I want to override a set of defaults in the function. It might be useful when constructing an object that has a lot of different configuration options or is just a dumb container for information. This is something I picked up mostly from the Ruby world.
An example might be if I want to configure a container for a video in my web page:
function buildVideoPlayer($file, $options = array())
{
$defaults = array(
'showAds' => true,
'allowFullScreen' = true,
'showPlaybar' = true
);
$config = array_merge($defaults, $options);
if ($config['showAds']) { .. }
}
$this->buildVideoPlayer($url, array('showAds' => false));
Notice that the initial value of $options is an empty array, so providing it at all is optional.
Also, with this method we know that $options will always be an array, and we know those keys have defaults so we don't constantly need to check is_array() or isset() when referencing the argument.
with the first approach you are forcing the users of your function to provide all the parameters needed. the second way you cannot be sure that you got all you need. I would prefer the first approach.
If the parameters you're passing in can be grouped together logically you could think about using a parameter object (Refactoring, Martin Fowler, p295), that way if you need to add more parameters you can just add more fields to your parameter class and it won't break existing methods.
There are pros and cons to each way.
If it's a simple function that is unlikely to change, and only has a few arguments, then I would state them explicitly.
If the function has a large number of arguments, or is likely to change a lot in the future, then I would pass an array of arguments with keys, and use that. This also becomes helpful if you have function where you only need to pass some of the arguments often.
An example of when I would choose an array over arguments, for an example, would be if I had a function that created a form field. possible arguments I may want:
Type
Value
class
ID
style
options
is_required
and I may only need to pass a few of these. for example, if a field is type = text, I don't need options. I may not always need a class or a default value. This way It is easier to pass in several combinations of arguments, without having a function signature with a ton arguments and passing null all the time. Also, when HTML 5 becomes standard many many years from now, I may want to add additional possible arguments, such as turning auto-complete on or off.
I know this is an old post, but I want to share my approach. If the variables are logically connected in a type, you can group them in a class, and pass an object of them to the function. for example:
class user{
public $id;
public $name;
public $address;
...
}
and call:
$user = new user();
$user->id = ...
...
callFunctions($user);
If a new parameter is needed you can just add it in the class and the the function signature doesn't need to change.
Recently, I have identified very strong code smell in one of the project, I'm working for.
Especially, in its cost counting feature. To count the total cost for some operation, we have to pass many kind of information to the "parser class".
For example:
phone numbers
selected campaigns
selected templates
selected contacts
selected groups
and about 2-4 information types more
Before refactoring there were all this parameters were passed through constructor to the Counter class(8 parameters, you can image that..).
To increase readability I have introduced a Data class, named CostCountingData with readable getters and setters to all this properties.
But I don't think, that that code became much readable after this refactoring:
$cost_data = new CostCountingData();
$cost_data->setNumbers($numbers);
$cost_data->setContacts($contacts);
$cost_data->setGroups($groups);
$cost_data->setCampaigns($campaigns);
$cost_data->setUser($user);
$cost_data->setText($text);
$cost_data->setTotalQuantity($total_quantity);
$CostCounter = new TemplateReccurentSendingCostCounter($cost_data);
$total_cost = $CostCounter->count();
Can you tell me whether there is some problem with readability of this code snippet?
Maybe you can see any code smells and can point me at them..
The only idea I have how to refactore this code, is to split this big data object to several, consisting related data types. But I'm not sure whether should I do this or not..
Whay do you think about it?
If what you want is named parameters (which it looks to me is what you are trying to achieve), would you ever consider just passing in an associative array, with the names as keys? There is also the Named Parameter Idiom, I can only find a good reference for this for C++ though, perhaps it's called something else by PHP programmers.
For that, you'd change your methods on CostCountingData so they all returned the original object. That way you could rewrite your top piece of code to this:
$cost_data = new CostCountingData();
$cost_data
->setNumbers($numbers)
->setContacts($contacts)
->setGroups($groups)
->setCampaigns($campaigns)
->setUser($user)
->setText($text)
->setTotalQuantity($total_quantity);
So there's two options. I would probably go for the named parameter idiom myself as it is self-documenting, but I don't think the difference is too great.
There are a few possibilities here:
If you only need a few of the parameters each time, the named parameter approach suggested elsewhere in this answer is a nice solution.
If you always need all of the parameters, I would suggest that the smell is coming from a broken semantic model of the data required to calculate a cost.
No matter how you dress up a call to a method that takes eight arguments, the fact that eight supposedly unrelated pieces of information are required to calculate something will always smell.
Take a step back:
Are the eight separate arguments really all unrelated, or can you compose intermediate objects that reflect the reality of the situation a little better? Products? invoice items? Something else?
Can the cost method be split into smaller methods that calculate part of the cost based on fewer arguments, and the total cost produced from adding up the component costs?
If I needed the data object (and classes called "data" are themselves a smell) I'd set its values in its constructor. But the interesting question is where are these values coming from? The source of the values should itself be an object of some sort, and its probably the one you are really interested in.
Edit: If the data is coming from POST, then I would make the class be wrapper around the POST data. I would not provide any setter functions, and would make the read accessors look like this (my PHP is more than a little rusty):
class CostStuff {
constructor CostStuff( $postdata ) {
$mypost = $postdata;
}
function User() {
return $mypost[ "user_name" ];
}
...
}
What do you think is wrong with your code snippet? If you have several variables that you've already used that you now need to get into $cost_data, then you'll need to set each one in turn. There's no way around this.
A question though, is why do you have these variables $numbers, $contracts, etc. that you're now deciding you need to move into $cost_data ?
Is the refactoring you're looking for possibly that the point in the code where you set $numbers should actually be the point you're setting $cost_data->numbers, and having the $numbers variable at all is superfluous?
E.g.
$numbers=getNumbersFromSomewhere() ;
// do stuff
$contracts=getContracstFromSomewhere() ;
// do stuff
$cost_data=new dataObject() ;
$cost_data->setNumbers($numbers);
$cost_data->setContracts($contracts) ;
$cost_data->someOperation() ;
becomes
$cost_data=new dataObject() ;
$cost_data->setNumbers(getNumbersFromSomewhere()) ;
// do stuff
$cost_data->setContracts(getContractsFromSomewhere()) ;
// do stuff
$cost_data->someOperation() ;