PHP: with an associative array of counters, is it more idiomatic to explicitly initialize each value to 0 before the first increment? - php

I'm writing some code that builds up associative array of counters. When it encounters a new item for the first time it creates a new key and initializes it to zero. I.e.:
if (!array_key_exists($item, $counters)) {
$counters[$item] = 0;
}
$counters[$item]++;
However, PHP actually does that first part implicitly. If I just do...
$counters[$item]++;
... then $counters[$item] will evaluate to NULL and be converted to 0 before it's incremented. Obviously the second way is simpler and more concise, but it feels a little sleazy because it's not obvious that $counters[$item] might not exist yet. Is one way or the other preferred in PHP?
For comparison, in Python the idiomatic approach would be to use collections.Counter when you want keys that initialize themselves to 0, and a regular dictionary when you want to initialize them yourself. In PHP you only have the first option.

Incrementing an uninitialized key will generate a PHP Notice, and is a bad idea. You should always initialize first.
However, the use of array_key_exists is not very idiomatic. I know coming from Python it may seem natural, but if you know that $counter has no meaningful NULL values it's more idiomatic to use isset() to test for array membership. (It's also much faster for no reason I can discern!)
This is how I would write a counter in PHP:
$counters = array();
foreach ($thingtobecounted as $item) {
if (isset($counters[$item])) {
$counters[$item]++;
} else {
$counters[$item] = 1;
}
}
Unfortunately unlike Python PHP does not provide any way to do this without performing two key lookups.

the first is preferred. the second option will generate a Notice in your logs that $counters[$item] is undefined. it still works but if you change display_errors = On; and error_reporting = E_ALL. in your php.ini file you will see these notices in your browser.

The first way is generally how you do it, if for nothing other than simpler maintenance. Remember, you may not be the one maintaining the code. You don't want error logs riddled with correctly operating code. Even worse, you may need to transfer methods to other languages (or earlier versions of PHP) where implicit initialization might not occur.

If you don't really need a check on each array index - or know that most of the indexes will be undefinded - why not suppress errors like: ?
(this way you save some performance on initializing [useless] indexes)
if (#!array_key_exists($item, $counters)) {

Related

Is it good practice to ignore non-fatal errors?

When I learned PHP I was taught to make my code error free, but to still hide errors in production code to ensure a clean user experience.
I've recently been involved in some projects where the original writer took the approach of leaving in errors and warnings and even utilizing them to achieve something, rather than write code without it.
For example, the code would look like this:
$numm = 0;
while($numm < 10){
$var = "something,".$var;
$numm++;
}
This code will throw a non-fatal Noticethe first time through the loop, because $var doesn't exist for the first concatenation.
There are tons of other examples where they either ignore errors, or even utilize them (to end loops, etc.) but then hide them from the user.
To me, this seems like bad practice, but I could just be OCD.
A Notice is a bug waiting to happen. I routinely run development with error_reporting(E_ALL); set. I want to find the bugs before they are a problem, and not simply ignore the problems, potential, or not.
Set a requirement of isset($var) in the while loop.
One thing that I have always found annoying was doing things like:
$var = isset($var) ? "something,".$var : "something,";
This one liner will prevent the error but not ideal way of doing it when you consider the number of possible uses. Imagine an associative array that returns that doesn't always have all it's key/values you would expect set.
One approach that i take to nearly all my apps is creating and using the following function:
function rtnVal(&$val, $default = null){
return isset($val) ? $val : $default;
}
so in this case, all I have to do is this:
$var = "something,".rtnVal($var);
Easy ain't it? In case you didn't know, defining
function rtnVal(&$var) { ... }
instead of:
function rtnVal($var) { ... }
(notice the & symbol) means that $var is a 'placeholder' (passed by reference) and not actually passed. So when you use it, it doesn't have to be previously set.
There is one limitation to this though and that's working with Objects, they don't like being passed by reference this way so for those, I have yet to find a better solution.

Is it alright to suppress/hide PHP notices?

I've suppressed notices for quite some time with no problems whatsoever but I am beginning to wonder if I'm doing the right thing. I can't seem to find any logical reason why I shouldn't just suppress them but some other people seem to think that suppressing them using error_reporting is a horrible thing to do, but why?
The closest thing to an answer I could find was in this question but that's still far from the answer I'm looking for. Is there some sort of unforeseen downside to hiding all the notices that PHP generates? For example, to include a variable from a POST call back into the form because there were errors, I would simply use:
<?= $_POST['variable'] ?>
This would generate a PHP notice. To fix that notice, I could use something like this:
<?= isset($_POST['variable']) ? $_POST['variable'] : '' ?>
But, is this really necessary? Will my code actually benefit any from doing this rather than just echoing the variable whether it exists or not and potentially creating a PHP notice? It seems to me that being able to ignore notices is a benefit of using PHP as then you don't have to worry about whether a variable is defined or not, especially for an example such as this where it doesn't seem to matter.
I also take advantage of PHP's ability to automatically change a variable's type/casting depending on how it's being used and you will often find code snippets such as this:
for ($i = 0; $i < $limit; $i++) $results[] = $i; // Example
where $results has not been previously defined, but is turned into an array when I try to add a new item to it as an array. I sort of prefer doing it this way because if no results are added to the array and I need to store that information or convert it to JSON for whatever reason, then that particular variable will not be defined and thus save additional bandwidth, even if it's minute.
$data = stdClass; // For reference, in my case this would be defined before this code
$data->results = array();
$limit = 0;
for ($i = 0; $i < $limit; $i++) $data->results[] = $i;
print json_encode($data);
// {"results":[]}
versus
$data = stdClass; // For reference
$limit = 0;
for ($i = 0; $i < $limit; $i++) $data->results[] = $i;
print json_encode($data);
// []
The question again: what real benefit, if any, do I gain from fixing notice errors rather than just suppressing them? How can/would it harm my code?
In my point of view, you should never suppress errors, any kind of them, notices or not. It might give you some convenience right now, but down the road, you'll face many, many problems with your code when you are maintaining it.
Suppose you have a variable you want to echo out like the above first example. Yes, using isset is a little complicated, but maybe your application should handle the special empty case anyway, thus improving the experience. Example:
if (isset($var)) {
echo $var;
} else {
echo "Nothing is found. Try again later.";
}
If you only had echo $var; and if this was a public facing view a user was reading, they would just see nothing there, which may cause confusion. Of course, this is just one special case where fixing PHP Notices can improve your application.
It shouldn't be taken as a trouble or inconvenience when you are taking care of notices in PHP code, because code is supposed to be clean. I'd rather have a notice-free code than seeing clean code when I open it in source. Of course, both is definitely better! :)
Again, the errors (even if they aren't fatal) will cause problems down the road. If you're already doing things like echo $var; without checking it, that's an assumption that a variable exists, even if you know it might not, it will just give you a habit of assuming things exist and work. This might be small right now, but after a while you'll find out that you'll cause yourself many, many problems.
The notices are there for a reason. If we all did error_reporting(E_ALL ^ E_NOTICE) in our code, we're just being irresponsible for our code. If you are able to fix it, then why are you being lazy and not doing so? Sure, ship 1.0 with notices, fix them later, that's what we all say. But it is better to do that as a habit, code perfect the first time. If you spend 15 minutes writing code plagued by notices, and then spend 2 hours in later development time fixing them, why not just spend an hour and a half perfecting the code as you write it in the first place?
Writing good code should be a habit, not an inconvenience.
Error messages are there for a reason. Respect them, fix them, and there you go, you're a responsible programmer.
You also pave a path for the future maintainers of your code.
Suppose that you have a notice that you shouldn't ignore.
This notice will be hidden into all notices that you usually ignore.
IMHO, warnings should not be ignored. You should always take care of warnings to prevent bug. Every time I have a notice in my log file, I treat it like a bug.
Also, if your site is accessed by a lot of user, you'll get a very big log file.
In my experience, notices usually indicate a good chance that there's a bug somewhere in your code, usually of the case where you expect a certain variable to be set at a certain point but there will be cases where it isn't, and you'll start wondering why some part of your page isn't showing up or starts crashing randomly.
Of course computers aren't all that smart, and there will be cases where the code is clear enough and you don't care if there are any warnings, but that's what the # operator is for.
I respectfully disagree with some of the comments that suggest that you should never suppress notices. If you know what you are doing, I do think using # is very useful, especially for handling unset variables or array elements. Don't get me wrong: I agree that in the hands of an inexperienced or sloppy programmer, # can be evil. However, consider this example:
public function foo ($array) {
if (isset ($array[0])) {
$bar = $array[0];
} else {
$bar = null;
}
// do something with $bar
}
is functionally identical to
public function foo ($array) {
$bar = #$array[0];
// do something with $bar
}
but is IMHO less readable and more work to type. In these types of cases, I know there are exactly two possibilities: a variable is set or it isn't. I don't know in advance, but I must proceed in both cases. I see nothing wrong with using # in that case. Yes, you could also write
public function foo ($array) {
$bar = isset ($array[0]) ? $array[0] : null;
// do something with $bar
}
but I find that only marginally better. To me, code readability and brevity are values in and of themselves, and bloating the code with isset-tests just out of principle to me is a little silly.
Of course, if I am not mistaken, using # takes a tiny bit more time to execute that an isset-test, but let's be honest: how much of our code is truly performance critical? In a loop that is executed a bazillion times, I would probably use isset instead, but in most cases, it makes no difference to the user.

Is calling array() without arguments of any use?

From my C++ knowledge base, I tend to initialize arrays in PHP by typing:
$foo = array()
Or I may bring this custom from Javascript, anyway, is this of any use?
As there's no problem in doing this:
$foo[45] = 'bar' without initializing it as an array, I guess not.
PS: the tags improvement is really good
Yes it is. At the very least in improves readability of code (so that you don't need to wonder 'where does $foo come from? Is it empty, or is there anything in it?`.
Also it will prevent 'Variable '$a' is not set notices, or Invalid argument passed to foreach in case you don't actually assign any values to array elements.
Either method is perfectly acceptable. As mentioned, this practice of using the array() construct is typically carried over from another language where you initialize before populating. With PHP, you can initialize an empty array and then populate later, or you can simply establish an array by assignments, such as $variableName[0] = "x";.
#petruz, that's the best way to do this, no only it will save you from nasty PHP error messages saying that function expects the parameter to be an array but, IMHO, this is the best way to write code. I initialise a variable before using it
Initializing variables before use is good practice. Even if it is not required.
I've had problems (in older versions of PHP, haven't tried recently) where I was acting on array with array_push or something and PHP barked at me. As a general rule it's not necessary, but it can be safer, especially if you're dealing with legacy code; perhaps you're expecting $foo to be an array, but it's actually a boolean? Bad things ensue.
It's good practice. Sooner or later you'll encounter a situation where you might want to do something like this:
array_push($foo, '45');
Which will throw a notice, whereas:
$foo = array();
array_push($foo, '45');
won't.
With initialization:
$myArray = array();
if ($myBoolean) {
$myArray['foo'] = 'bar';
}
return $myArray;
Without initialization:
if ($myBoolean) {
$myArray['foo'] = 'bar';
}
return $myArray;
In the first case it's clear what you want to happen if $myBoolean is false. In the second case it is not and php may throw a warning when you try and use $myArray later. Obviously this is a simplified case, but in a complex case the "if" may be a few lines down and/or not even exist until someone comes along and adds it later without realizing the array wasn't initialized.
While not necessary, I have seen lack of initialization cause non-obvious logic problems like this in complex functions that have been modified a lot over time.

PHP and undefined variables strategy

I am a C++ programmer starting with PHP. I find that I lose most of the debugging time (and my selfesteem!) due to undefined variables. From what I know, the only way to deal with them is to watch the output at execution time.
Are other strategies to notice these faults earlier (something like with C++ that a single compile gives you all the clues you need)?
This is a common complaint with PHP. Here are some ideas:
Use a code analysis tool. Many IDEs such as NetBeans will help also.
Just run the code. PHP doesn't have an expensive compilation step like C++ does.
Use unit testing. Common side effects include: better code.
Set error_reporting(-1), or the equivalent in your ini file.
Get xdebug. It's not preventative, but stack traces help with squishing bugs.
isset(), === null (identity operator), and guard clauses are your friends.
Loose and dynamic typing are a feature of the language. Just because PHP isn't strict about typing doesn't mean you can't be. If it really bugs you and you have a choice, you could try Python instead—it's a bit stricter with typing.
Log your E_NOTICE messages to a text file. You can then process logs with automated scripts to indicate files and lines where these are raised.
No. In PHP, you can only know a variable doesn't exist when you try to access it.
Consider:
if ($data = file('my_file.txt')) {
if (count($data) >= 0)
$line = reset($data);
}
var_dump($line);
You have to restructure your code so that all the code paths leads to the variable defined, e.g.:
$line = "default value";
if ($data = file('my_file.txt')) {
if (count($data) >= 0)
$line = reset($data);
}
var_dump($line);
If there isn't any default value that makes sense, this is still better than isset because you'll warned if you have a typo in the variable name in the final if:
$line = null;
if ($data = file('my_file.txt')) {
if (count($data) >= 0)
$line = reset($data);
}
if ($line !== null) { /* ... */ }
Of course, you can use isset1 to check, at a given point, if a variable exists. However, if your code relies on that, it's probably poorly structured. My point is that, contrary to e.g. C/Java, you cannot, at compile time, determine if an access to a variable is valid. This is made worse by the nonexistence of block scope in PHP.
1 Strictly speaking, isset won't tell you whether a variable is set, it tell if it's set and is not null. Otherwise, you'll need get_defined_vars.
From what I know the only way to deal with them is to watch the output at execution time.
Not really: To prevent these notices from popping up, you just need to make sure you initialize variables before accessing them the first time. We (sadly IMO) don't have variable declaration in PHP, but initializing them in the beginning of your code block is just as well:
$my_var = value;
Using phpDocumentor syntax, you can also kind of declare them to be of a certain a type, at least in a way that many IDEs are able to do code lookup with:
/** #desc optional description of what the variable does
#var int */
$my_var = 0;
Also, you can (and sometimes need to) use isset() / empty() / array_key_exists() conditions before trying to access a variable.
I agree this sucks big time sometimes, but it's necessary. There should be no notices in finished production code - they eat up performance even if displaying them is turned off, plus they are very useful to find out typos one may have made when using a variable. (But you already know that.)
Just watch not to do operations that requires the variable value when using it the first time, like the concatenate operator, .=.
If you are a C++ programmer you must be used to declare all variables. Do something similar to this in PHP by zeroing variables or creating empty array if you want to use them.
Pay attention to user input, and be sure you have registered globals off and check inputs from $_GET and $_POST by isset().
You can also try to code classes against structural code, and have every variable created at the beginning of a class declaration with the correct privacy policy.
You can also separate the application logic from the view, by preparing all variables that have to be outputted first, and when it goes to display it, you will be know which variables you prepared.
During development stages use
error_reporting(E_ALL);
which will show every error that has caused, all NOTICE errors, etc.
Keep an eye on your error_log as well. That will show you errors.
Use an error reporting system, example:
http://php.net/manual/en/function.set-error-handler.php
class ErrorReporter
{
public function catch($errno, $errstr, $errfile, $errline)
{
if($errno == E_USER_NOTICE && !defined('DEBUG'))
{
// Catch all output buffer and clear states, redirect or include error page.
}
}
}
set_error_handler(array(new ErrorReporter,'catch'));
A few other tips is always use isset for variables that you may / may not have set because of a if statement let’s say.
Always use if(isset($_POST['key'])) or even better just use if(!empty($_POST['key'])) as this checks if the key exists and if the value is not empty.
Make sure you know your comparison operators as well. Languages like C# use == to check a Boolean state whereas in PHP to check data-types you have to use === and use == to check value states, and single = to assign a value!
Unless I'm missing something, then why is no one suggesting to structure your page properly? I've never really had an ongoing problem with undefined variable errors.
An idea on structuring your page
Define all your variables at the top, assign default values if necessary, and then use those variables from there. That's how I write web pages and I never run into undefined variable problems.
Don't get in the habit of defining variables only when you need them. This quickly creates spaghetti code and can be very difficult to manage.
No one likes spaghetti code
If you show us some of your code we might be able to offer suggestions on how you can better structure it to resolve these sorts of errors. You might be getting confused coming from a C background; the flow may work differently to web pages.
Good practice is to define all variable before use, i.e., set a default value:
$variable = default_value;
This will solve most problems. As suggested before, use Xdebug or built-in debugging tools in editors like NetBeans.
If you want to hide the error of an undefined variable, then use #. Example: #$var
I believe that various of the Code Coverage tools that are available for PHP will highlight this.
Personally, I try and set variables, even if it's with an empty string, array, Boolean, etc. Then I use a function such as isset() before using them. For example:
$page_found = false;
if ($page_found==false) {
// Do page not found stuff here
}
if (isset($_POST['field'])) {
$value = $_POST['field'];
$sql = "UPDATE table SET field = '$value'";
}
And so on. And before some smart-ass says it: I know that query's unsafe. It was just an example of using isset().
I really didn't find a direct answer already here. The actual solution I found to this problem is to use PHP Code Sniffer along with this awesome extension called PHP Code Sniffer Variable Analysis.
Also the regular PHP linter (php -l) is available inside PHP Code Sniffer, so I'm thinking about customizing my configuration for regular PHP linting, detecting unused/uninitialized variables and validating my own code style, all in one step.
My very minimal PHPCS configuration:
<?xml version="1.0"?>
<ruleset name="MyConfig">
<description>Minimal PHP Syntax check</description>
<rule ref="Generic.PHP.Syntax" />
<rule ref="VariableAnalysis" />
</ruleset>

Unsetting a variable vs setting to ''

Is it better form to do one of the following? If not, is one of them faster than the other?
unset($variable);
or to do
$variable = '';
they will do slightly different things:
unset will remove the variable from the symbol table and will decrement the reference count on the contents by 1. references to the variable after that will trigger a notice ("undefined variable"). (note, an object can override the default unset behavior on its properties by implementing __unset()).
setting to an empty string will decrement the reference count on the contents by 1, set the contents to a 0-length string, but the symbol will still remain in the symbol table, and you can still reference the variable. (note, an object can override the default assignment behavior on its properties by implementing __set()).
in older php's, when the ref count falls to 0, the destructor is called and the memory is freed immediately. in newer versions (>= 5.3), php uses a buffered scheme that has better handling for cyclical references (http://www.php.net/manual/en/features.gc.collecting-cycles.php), so the memory could possibly be freed later, tho it might not be delayed at all... in any case, that doesn't really cause any issues and the new algorithm prevents certain memory leaks.
if the variable name won't be used again, unset should be a few cpu cycles faster (since new contents don't need to be created). but if the variable name is re-used, php would have to create a new variable and symbol table entry, so it could be slower! the diff would be a negligible difference in most situations.
if you want to mark the variable as invalid for later checking, you could set it to false or null. that would be better than testing with isset() because a typo in the variable name would return false without any error... you can also pass false and null values to another function and retain the sentinel value, which can't be done with an unset var...
so i would say:
$var = false; ...
if ($var !== false) ...
or
$var = null; ...
if (!is_null($var)) ...
would be better for checking sentinel values than
unset($var); ...
if (isset($var)) ...
Technically $test = '' will return true to
if(isset($test))
Because it is still 'set', it is just set to en empty value.
It will however return true to
if(empty($test))
as it is an empty variable. It just depends on what you are checking for. Generally people tend to check if a variable isset, rather than if it is empty though.
So it is better to just unset it completely.
Also, this is easier to understand
unset($test);
than this
$test = '';
the first immediately tells you that the variable is NO LONGER SET. Where as the latter simply tells you it is set to a blank space. This is commonly used when you are going to add stuff to a variable and don't want PHP erroring on you.
You are doing different things, the purpose of unset is to destroys the specified variable in the context of where you make it, your second example simply sets the variable to an empty string.
Unsetting a variable doesn't force immediate memory freeing, if you are concerned about performance, setting the variable to NULL may be a better option, but really, the difference will be not noticeable...
Discussed in the docs:
unset() does just what it's name says
- unset a variable. It does not force immediate memory freeing. PHP's
garbage collector will do it when it
see fits - by intention as soon, as
those CPU cycles aren't needed anyway,
or as late as before the script would
run out of memory, whatever occurs
first.
If you are doing $whatever = null;
then you are rewriting variable's
data. You might get memory freed /
shrunk faster, but it may steal CPU
cycles from the code that truly needs
them sooner, resulting in a longer
overall execution time.
I think the most relevant difference is that unsetting a variable communicates that the variable will not be used by subsequent code (it also "enforces" this by reporting an E_NOTICE if you try to use it, as jspcal said that's because it's not in the symbol table anymore).
Therefore, if the empty string is a legal (or sentinel) value for whatever you are doing with your variable, go ahead and set it to ''. Otherwise, if the variable is no longer useful, unsetting it makes for clearer code intent.
They have totally different meanings. The former makes a variable non-existant. The latter just sets its value to the empty string. It doesn't matter which one is "better" so to speak, because they are for totally different things.
Are you trying to clean up memory or something? If so, don't; PHP manages memory for you, so you can leave it laying around and it'll get cleaned up automatically.
If you're not trying to clean up memory, then you need to figure out why you want to unset a variable or set it to empty, and choose the appropriate one. One good sanity check for this: let's say someone inserted the following line of code somewhere after your unset/empty:
if(strcmp($variable, '') == 0) { do_something(); }
And then, later:
if(!isset($variable)) { do_something_else(); }
The first will run do_something() if you set the variable to the empty string. The second will run do_something_else() if you unset the variable. Which of these do you expect to run if your script is behaving properly?
There is one other 'gotcha' to consider here, the reference.
if you had:
$a = 'foobar';
$variable =& $a;
then to do either of your two alternatives is quite different.
$variable = '';
sets both $variable and $a to the empty string, where as
unset($variable);
removes the reference link between $a and $variable while removing $variable from the symbol table. This is indeed the only way to unlink $a and $variable without setting $variable to reference something else. Note, e.g., $variable = null; won't do it.

Categories