After some work in C and Java I've been more and more annoyed by the wild west laws in PHP. What I really feel that PHP lacks is strict data types. The fact that string('0') == (int)0 == (boolean)false is one example.
You cannot rely on what the data type a function returns is. You can neither force arguments of a function to be of a specific type, which might lead to a non strict compare resulting in something unexpected. Everything can be taken care of, but it still opens up for unexpected bugs.
Is it good or bad practice to typecast arguments received for a method? And is it good to typecast the return?
IE
public function doo($foo, $bar) {
$foo = (int)$foo;
$bar = (float)$bar;
$result = $bar + $foo;
return (array)$result;
}
The example is quite stupid and I haven't tested it, but I think everyone gets the idea. Is there any reason for the PHP-god to convert data type as he wants, beside letting people that don't know of data types use PHP?
For better or worse, loose-typing is "The PHP Way". Many of the built-ins, and most of the language constructs, will operate on whatever types you give them -- silently (and often dangerously) casting them behind the scenes to make things (sort of) fit together.
Coming from a Java/C/C++ background myself, PHP's loose-typing model has always been a source of frustration for me. But through the years I've found that, if I have to write PHP I can do a better job of it (i.e. cleaner, safer, more testable code) by embracing PHP's "looseness", rather than fighting it; and I end up a happier monkey because of it.
Casting really is fundamental to my technique -- and (IMHO) it's the only way to consistently build clean, readable PHP code that handles mixed-type arguments in a well-understood, testable, deterministic way.
The main point (which you clearly understand as well) is that, in PHP, you can not simply assume that an argument is the type you expect it to be. Doing so, can have serious consequences that you are not likely to catch until after your app has gone to production.
To illustrate this point:
<?php
function displayRoomCount( $numBoys, $numGirls ) {
// we'll assume both args are int
// check boundary conditions
if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');
// perform the specified logic
$total = $numBoys + $numGirls;
print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}
displayRoomCount(0, 0); // (ok) prints: "0 people: 0 boys, and 0 girls"
displayRoomCount(-10, 20); // (ok) throws an exception
displayRoomCount("asdf", 10); // (wrong!) prints: "10 people: asdf boys, and 10 girls"
One approach to solving this is to restrict the types that the function can accept, throwing an exception when an invalid type is detected. Others have mentioned this approach already. It appeals well to my Java/C/C++ aesthetics, and I followed this approach in PHP for years and years. In short, there's nothing wrong with it, but it does go against "The PHP Way", and after a while, that starts to feel like swimming up-stream.
As an alternative, casting provides a simple and clean way to ensure that the function behaves deterministically for all possible inputs, without having to write specific logic to handle each different type.
Using casting, our example now becomes:
<?php
function displayRoomCount( $numBoys, $numGirls ) {
// we cast to ensure that we have the types we expect
$numBoys = (int)$numBoys;
$numGirls = (int)$numGirls;
// check boundary conditions
if( ($numBoys < 0) || ($numGirls < 0) ) throw new Exception('argument out of range');
// perform the specified logic
$total = $numBoys + $numGirls;
print( "{$total} people: {$numBoys} boys, and {$numGirls} girls \n" );
}
displayRoomCount("asdf", 10); // (ok now!) prints: "10 people: 0 boys, and 10 girls"
The function now behaves as expected. In fact, it's easy to show that the function's behavior is now well-defined for all possible inputs. This is because the the cast operation is well-defined for all possible inputs; the casts ensure that we're always working with integers; and the rest of the function is written so as to be well-defined for all possible integers.
Rules for type-casting in PHP are documented here, (see the type-specific links mid-way down the page - eg: "Converting to integer").
This approach has the added benefit that the function will now behave in a way that is consistent with other PHP built-ins, and language constructs. For example:
// assume $db_row read from a database of some sort
displayRoomCount( $db_row['boys'], $db_row['girls'] );
will work just fine, despite the fact that $db_row['boys'] and $db_row['girls'] are actually strings that contain numeric values. This is consistent with the way that the average PHP developer (who does not know C, C++, or Java) will expect it to work.
As for casting return values: there is very little point in doing so, unless you know that you have a potentially mixed-type variable, and you want to always ensure that the return value is a specific type. This is more often the case at intermediate points in the code, rather than at the point where you're returning from a function.
A practical example:
<?php
function getParam( $name, $idx=0 ) {
$name = (string)$name;
$idx = (int)$idx;
if($name==='') return null;
if($idx<0) $idx=0;
// $_REQUEST[$name] could be null, or string, or array
// this depends on the web request that came in. Our use of
// the array cast here, lets us write generic logic to deal with them all
//
$param = (array)$_REQUEST[$name];
if( count($param) <= $idx) return null;
return $param[$idx];
}
// here, the cast is used to ensure that we always get a string
// even if "fullName" was missing from the request, the cast will convert
// the returned NULL value into an empty string.
$full_name = (string)getParam("fullName");
You get the idea.
There are a couple of gotcha's to be aware of
PHP's casting mechanism is not smart enough to optimize the "no-op" cast. So casting always causes a copy of the variable to be made. In most cases, this not a problem, but if you regularly use this approach, you should keep it in the back of your mind. Because of this, casting can cause unexpected issues with references and large arrays. See PHP Bug Report #50894 for more details.
In php, a whole number that is too large (or too small) to represent as an integer type, will automatically be represented as a float (or a double, if necessary). This means that the result of ($big_int + $big_int) can actually be a float, and if you cast it to an int the resulting number will be gibberish. So, if you're building functions that need to operate on large whole numbers, you should keep this in mind, and probably consider some other approach.
Sorry for the long post, but it's a topic that I've considered in depth, and through the years, I've accumulated quite a bit of knowledge (and opinion) about it. By putting it out here, I hope someone will find it helpful.
The next version of PHP (probably 5.4) will support scalar type hinting in arguments.
But apart from that: Dynamic type conversion really isn't something you should hate and avoid. Mostly it will work as expected. And if it doesn't, fix it by checking it is_* of some type, by using strict comparison, ..., ...
You can use type hinting for complex types. If you need to compare value + type you can use "===" for comparison.
(0 === false) => results in false
(0 == false) => results in true
Also you write return (array)$result; which makes no sense. What you want in this case is return array($result) if you want the return type to be an array.
I don't think it's bad, but I would go one step further: Use type hinting for complex types, and throw an exception if a simple type isn't one you expect. This way you make clients aware of any costs/problems with the cast (such as loss of precision going from int -> float or float -> int).
Your cast to array in the above code there though is misleading -- you should just create a new array containing the one value.
That all said, your example above becomes:
public function doo($foo, $bar) {
if (!is_int($foo)) throw new InvalidArgumentException();
if (!is_float($bar)) throw new InvalidArgumentException();
$result = $bar + $foo;
return array($result);
}
No, it's not good to typecast because you don't know what you'll have in the end. I would personally suggest using functions such as intval(), floatval(), etc.
Related
So, now we got new PHP7 and we can check if return type is what we want.
For example,
function foo(): bool
{
$num = 8;
if (10 === $num) {
return true;
} else {
return false;
}
}
foo();
OUTPUT: false
Okey, that was easy? It works like it should, but if return is not what we expect?
function bar(): bool
{
$num = 10;
if (10 === $num) {
return array(['apple', 'banana', 'strawberry']);
} else {
return false;
}
}
bar();
OUTPUT: Uncaught TypeError: Return value of foo() must be of the type boolean, array returned
That was very basic and those examples just show how it works.
If we have function like in code example 2, can we check for multiple return types? Like
function bar(): bool || array <---
{
$num = 10;
if (10 === $num) {
return array(['apple', 'banana', 'strawberry']);
} else {
return false;
}
}
bar();
But however this results: FATAL ERROR syntax error, unexpected '||' (T_BOOLEAN_OR), expecting '{' on line number 2
So is it possible to define multiple return types?
#scott-c-wilson has explained the mechanics of the language rule well.
From a design perspective, having a function that potentially returns different types of result indicates a potential design flaw:
If you're returning your result, or otherwise false if something didn't go to plan; you should probably be throwing an exception instead. If the function processing didn't go according to plan; that's an exceptional situation: leverage the fact. I know PHP itself has a habit of returning false if things didn't work, but that's just indicative of poor design - for the same reason - in PHP.
If your function returns potentially different things, then it's quite possible it's doing more than one thing, which is bad design in your function. Functions should do one thing. If your function has an if/else with the true/false block handling different chunks of processing (as opposed to just handling exit situations), then this probably indicates you ought to have two functions, not one. Leave it to the calling code to decide which is the one to use.
If your function returns two different object types which then can be used in a similar fashion in the calling code (ie: there's no if this: do that; else do this other thing in the calling code), this could indicate you ought to be returning an interface, not a concrete implementation.
there will possibly be legit situations where returning different types is actually the best thing to do. If so: all good. But that's where the benefit of using a loosely typed language comes in: just don't specify the return type of the function. But... that said... the situation probably isn't legit, so work through the preceding design considerations first to determine if you really do have one of these real edge cases where returning different types is warranted.
Is it possible to define multiple return types?
No. That's the point! Multiple return types are confusing and mean you need to do extra non-intuitive checking (I'm looking at you, "===").
From the manual:
"...return type declarations specify the type of the value that will be returned from a function." (emphasis mine)
The type, not the types.
Updating this as of 2020.
There has been vote for Union types and it was accepted to PHP 8.
https://wiki.php.net/rfc/union_types_v2
To have multiple return types in PHP you just need to have PHP 8, otherwise it is simply not possible at all. However, you should not be dumm and set everything as possible return, as it might make your code a mess. #Adam Cameron explains this very well in his answer.
Union types should exist, but they don't.
You can use ?int for example, but then it will expect NULL or INT.
The Answer:
It's not possible. There isn't any way natively through PHP.
However, you can omit the 'Type Declaration' which is ONLY there for strict typing. Just use DocBlocks #return bool|string for example. The IDE will recognize it, but PHP won't care.
http://docs.phpdoc.org/references/phpdoc/tags/return.html
I'm wondering what you think the best practice is here-- does it buy you very much to type-check parameters in PHP? I.e have you actually seen noticeably fewer bugs on projects where you've implemented parameter type-checking vs. those that don't? I'm thinking about stuff like this:
public function __construct($screenName, $createdAt) {
if (!is_string($screenName) || !is_string($createdAt) {
return FALSE;
}
}
Normally within a PHP application that makes use of the skalar variable "types" is bound to actually string input (HTTP request). PHP made this easier so to convert string input to numbers so you can use it for calculation and such.
However checking scalar values for is_string as proposed in your example does not make much sense. Because nearly any type of variable in the scalar family is a string or at least can be used as a string. So as for your class example, the question would be, does it actually make sense to check the variable type or not?
For the code you proposed it does not make any sense because you exit the constructor with a return false;. This will end the constructor to run and return a not-properly-initialized object.
Instead you should throw an exception, e.g. an InvalidArgumentException if a constructors argument does not provide the expected / needed type of value.
Leaving this aside and taking for granted that your object constructor needs to differ between a string and an integer or bool or any other of the scalar types, then you should do the checks.
If you don't rely on the exact scalar types, you can cast to string instead.
Just ensure that the data hidden inside the object is always perfectly all-right and it's not possible that wrong data can slip into private members.
It depends. I'll generally use the type-hinting that is built into PHP for higher-level objects ((stdClass $obj, array $arr, MyClass $mine)), but when it comes to lower level values -- especially numbers and strings, it becomes a little less beneficial.
For example, if you had the string '12345', that becomes a little difficult to differentiate between that and the number 12345.
For everything else, the accidental casting of array to a string will be obvious. Class instances which are cast to strings, if they don't have a __toString, will make PHP yell. So your only real issue is classes which have a __toString method and, well, that really limits the number of times where it can come up. I really wonder if it is worth that level of overhead.
Checking function arguments is a very good practice. I suspect people often don't do that because their functions grow bigger and the code becomes uglier and less readable. Now with PHP 7 you can type-hint scalar types but there is still no solution for cases when you want your parameter to be one of two types: array or instance of \Traversable (which both can be traversed with foreach).
In this case, I recommend having a look at the args module from NSPL. The __constructor from your example will have the following look:
public function __construct($screenName, $createdAt)
{
expectsAll(string, [$screenName, $createdAt]);
}
// or require a non-empty array, string or instance of \ArrayAccess
function first($sequence)
{
expects([nonEmpty, arrayAccess, string], $sequence);
return $sequence[0];
}
More examples here.
Better documentation is more important when you're the only one interacting with the methods. Standard method definition commenting gives you well documented methods that can easily be compiled into an API that is then used in many IDEs.
When you're exposing your libraries or your inputs to other people, though, it is nice to do type checking and throw errors if your code won't work with their input. Type checking on user input protects you from errors and hacking attempts, and as a library letting other developers know that the input they provided is not what you're expecting is sometimes nice.
I've always came away from stackoverflow answers and any reading I've done that "===" is superior to "==" because uses a more strict comparison, and you do not waste resources converting value types in order to check for a match.
I may be coming at this with the wrong assumption, so I assume part of this question is, "is my assumption true?"
Secondly,
I'm dealing specifically with a situation where I'm getting data from a database in the form of a string "100".
The code I am comparing is this...
if ($this->the_user->group == 100) //admin
{
Response::redirect('admin/home');
}
else // other
{
Response::redirect('user/home');
}
vs.
if ( (int) $this->the_user->group === 100) //admin
{
Response::redirect('admin/home');
}
else // other
{
Response::redirect('user/home');
}
or even
if (intval($this->the_user->group) === 100) //admin
{
Response::redirect('admin/home');
}
else // other
{
Response::redirect('user/home');
}
is any integrity (or performance) gained by manually casting or converting simply so you can use the identity ('===') comparison?
In your particular case == is the better option. As you (as can be seen in your code) have probably already found out many database functions will always return strings, even if you fetch an integer. So type strict comparison really only bloats your code.
Furthermore you are adding a potential (let's call it theoretic) security risk. E.g. (int) '100AB2' would yield 100. In your case this probably can't happen, but in others it may.
So: Don't overuse strict comparison, it's not always good. You mainly need it only in ambiguous cases, like the return value of strpos.
There is a performance difference between == and === - latter will be even twice as fast, see Equal vs identical comparison operator.
The difference, however is too small to be bothered with - unless the code is executed millions of times.
That's a really tiny optimization you're doing there. Personally, I don't think it's really worth it.
Any boost you gain from not casting the value when using === is lost when you explicitly cast the value. In your case, since the type is not important to you, you should just do == and be done with it.
My recommendation would be to keep === for when you need to check type as well - e.g. 0 evaluating to false and so on.
Any performance gains will be microscopically small, unless you're performing literally billions and trillions of these comparisons for days/months/years on-end. The strict comparison does have its uses, but it also is somewhat of anomally in PHP. PHP's a weakly typed language, and (usually) does the right thing for auto-converting/casting values to be the right thing. Most times, it's not necessary to do a strict comparison, as PHP will do the right thing.
But there are cases, such as when using strpos, where the auto-conversion will fail. strpos will return '0' if the needle you're searching is right at the start of the haystack, which would get treated as FALSE, which is wrong. The only way to handle this is via the strict comparison.
PHP has some WTF loose comparisons that return TRUE like:
array() == NULL
0 == 'Non-numeric string'
Always use strict comparison between a variable and a string
$var === 'string'
From what I know, checking preconditions is a good practice. If a method needs an int value then it's a good solution to do use something like this:
public function sum($input1, $input2) {
if (!is_int($input1)) throw new Exception('Input must be a integer');
However after looking to the source code of Zend/Codeigniter I don't see checks like this very often. Is there a reason for this ?
Because it is difficult / inefficient to test each and every variable before you use it. Instead they check just input variables - check visitors at the door, not once inside the house.
It is of course a good defensive programming technique to test at least more important vars before using them, especially if the input comes from many places.
This is a bit off-topic, but the solution I would recommend is to test input variables like this:
$username=get('username', 'string');
$a=get('a', 'int');
...
$_REQUEST and similar should never be used (or even be accessible) directly.
Also, when doing HTML output, you should always use this:
echo html($username); // replaces '<' with '<' - uses htmlentities
To avoid SQL injection attacks one can use MeekroDB, but it is unfortunately very limiting (MySQL only, single DB only,...). It has a good API though which promotes safety, so I would recommend checking it out.
For myself I have build a small DB library that is based on PDO and uses prepared statements. YMMV.
Specifying such strict preconditions in any case is not necessary and feels not useful in a dynamical typed language.
$sum = sum("1", "2");
Why one should forbid it? Additional if you throw an Exception, one tries to avoid it. This means, he will test and cast himself
function sum ($a, $b) {
if (!is_int($a)) throw new Exception('Input must be a integer');
if (!is_int($b)) throw new Exception('Input must be a integer');
return $a + $b;
}
if (!is_int($value1)) { $value1 = (int) $value1; }
if (!is_int($value2)) { $value2 = (int) $value2; }
$sum = sum($value1, $value2);
Every is_int() occurs multiple times just to avoid unnecessary Exceptions.
Its sufficient to validate values, when you receive them, not all over the whole application.
Speaking about ZF, i'd say that they try to minimize it in favour of interfaces and classes. You can see in many definitions across ZF something like this:
public function preDispatch(Zend_Request_Http $request)
which is fine enough. Also at critical places where ints/strings are needed there are some sanity checks. But mostly not in the form of is_string() but rather as isValidLocale() that calls some other class to check validity.
Summarized, I made a loop with a few iterations to check the efficiency of each test:
$iterations = 99999999;
$var = null;
isset comparasion
if ( isset( $var ) )
{
}
'===' comparasion
if ( $var === null )
{
}
And i have this log, in microseconds:
'isset()': 1.4792940616608
'===': 1.9428749084473
For me, this is a little curious. Why isset() function is faster than one comparison operator as ===?
The === comparison is a strict check, meaning that the two objects you're comparing have to be of the same type. When you break it down in plain English, it's actually not that weird that === needs some more time. Consider the parser to do this:
if (isset($var)) {
// Do I have something called $var stored in memory?
// Why yes, I do.. good, return true!
}
if ($var === null) {
// Do I have something called $var stored in memory?
// Why yes, I do.. good! But is it a NULL type?
// Yes, it is! Good, return true!
}
As you can see, the === operator needs to do an additional check before it can determine if the variable matches the condition, so it's not that strange that it is a little bit slower.
Isset is not a function: it is a language built-in. Using isset is faster than using a function.
The other thing is that isset is used all over the place, so it makes sense that it's been profiled to death, whereas === maybe hasn't received as much love.
Other than that, you'd have to dig in the PHP source with a profiler to see exactly what's going on.
I'm not sure I would call 100 million "a few iterations". You appear to have accumulated about a half-second difference, divide that by 100 million and you get a whopping 5 nanosecond difference per iteration if my math is correct. With the difference being so small it may simply come down to the fact that isset only has one operand in this context and === has two.
It's impossible to even discuss the Zend engine's implementation details of the two examples without specifying a specific PHP version; source code is a moving target. Even minute changes to the implementations are going to effect the results over that many passes. I would not be surprised if you found the opposite to be the case on some versions of PHP and/or in a different context.
isset itself is covered by three different op-codes in the VM depending upon the context:
"Simple" Compiled Variables like your example: ZEND_ISSET_ISEMPTY_VAR
Arrays: ZEND_ISSET_ISEMPTY_DIM_OBJ (requires 2 operands, the var and the index)
Object properties: ZEND_ISSET_ISEMPTY_PROP_OBJ (also 2 operands, var and prop name)
It's an interesting question for curiosity's sake but we're in hair splitting territory and it's probably not a real-world optimization strategy.