PHP_CodeSniffer Tutorials

Does anyone have any tutorials that are up to date and cover more complex rule processing? Most of the tutorials I am finding online do not deal with 1.4.3 and its ruleset.xml, but with the old PHP file that defined the coding standard.
Secondly, I want to do more in-depth processing, as our company has different coding standards that I need to code for and enforce, and I want a good starting place to understand the existing complex sniffs and the structures therein.
Our company uses different standards than the common libraries so that when reading the code, the developer knows if the method is from an external library (PEAR/Zend/etc...) as the naming convention will indicate that. If the coding standard is not our format, then the method is from an outside library, and chances are good it works well, without the need for the developer to re-implement something.
In larger code bases, you will see a class created and methods referenced without knowing their source anymore, unless you trace up the stack. Therefore, by using different standards, our classes will stand out.
For instance:
$Foo = Foo::Find(); // Mixed case - from a library or PHP itself
$Bar = BAR::Find(); // All uppercase - ours, may need to optimize the Find()
Variable declarations follow the same idea: we use a trailing underscore on methods and variables to indicate private scope. If someone is changing the scope, they would remove the underscore and then change or remove the private keyword, to clearly indicate that they understood the ramifications of their change.

Start here, but it is basic: http://pear.php.net/manual/en/package.php.php-codesniffer.coding-standard-tutorial.php
PHP_CodeSniffer comes with quite a lot of sniffs that do a lot of different things. It might be worth looking through some of those to see how they make use of the token stack.
Using the -vv command line argument is also a really good way to see how a file is converted into tokens. This will help you register to look for the correct token types and make use of the $phpcsFile->findNext() and $phpcsFile->findPrevious() methods that many sniffs use.
Here is a small sniff that might be worth looking at:
https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/PSR2/Sniffs/ControlStructures/ElseIfDeclarationSniff.php
And another that shows the usage of additional indexes in the token stack:
https://github.com/squizlabs/PHP_CodeSniffer/blob/master/CodeSniffer/Standards/PSR2/Sniffs/ControlStructures/ControlStructureSpacingSniff.php
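To get a feel for what walking a token stack looks like before writing a real sniff, the idea can be sketched with PHP's built-in tokenizer. This is only an illustration under stated assumptions: PHP_CodeSniffer builds its own, richer token stack, and the function name findNonUppercaseClassNames() is hypothetical. It flags declared classes whose names are not all uppercase, in the spirit of the BAR convention from the question.

```php
<?php
// Hypothetical helper, not part of PHP_CodeSniffer: returns the names of
// declared classes that are not written entirely in uppercase.
function findNonUppercaseClassNames($source)
{
    $tokens     = token_get_all($source);
    $count      = count($tokens);
    $violations = [];

    for ($i = 0; $i < $count; $i++) {
        // Multi-character tokens are arrays: [id, text, line].
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }

        // Skip the whitespace between "class" and the name.
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }

        // Anonymous classes (new class { ... }) have no name token here.
        if ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
            $name = $tokens[$j][1];
            if ($name !== strtoupper($name)) {
                $violations[] = $name;
            }
        }
    }

    return $violations;
}
```

Called on the source '<?php class BAR {} class Foo {}', it reports only Foo. A real sniff would do the same kind of lookup with $phpcsFile->findNext() and report the violation through the file object instead of returning an array.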

Related

PHP - get all declared resources (traits, classes, functions and constants) within a given script?

I designed a PHP 5.5+ framework comprised of more than 750 different classes to make both web applications and websites.
I would like to, ideally, be able to reduce its size by producing a version of itself containing just the bare essential files and resources needed for a given project (whether it's a website or a web application).
What I want to do is to be able to:
reduce the amount of traits, classes, constants and functions to the bare essential per project
compress the code files to achieve a lesser deployment size and faster execution (if possible)
So far, I've got the second part completed. But the most important part is the first, and that's where I'm having problems. I have a function making use of get_declared_classes() and get_declared_traits(), get_defined_constants() and get_defined_functions() to get the full list of user-defined classes, traits, functions and constants. But it gives me pretty much EVERYTHING and that's not what I want.
Is there a way to get all defined classes, functions and constants (no need for traits as I could run class_uses() on every class and get the list of traits in use by that class) for a single given script?
I know there's the token_get_all() function, but I tried it with no luck (or maybe I'm using it the wrong way).
Any hint? Any help would be greatly appreciated :)

You can use PHP Parser for this. It constructs abstract syntax trees based on the files you supply to it. Then you can analyze its output for each file, and produce a report usable to you.
Other than that, you can use the token_get_all() approach you've already mentioned and write a small parser yourself. Depending on your project, this might be easier or more difficult. For example, do you use a lot of new X() constructs, or do you tend to pass dependencies via constructors?
Unfortunately, these are about the only viable choices you have, since PHP is a dynamically typed language.
If you use dependency injection, however, you might want to take a look at your DI framework's internal cache files, which often contain such dependency maps. If you don't use such a framework, I recommend starting to, especially since your project is big, and that is where dependency injection excels. PHP-DI, one such framework, proved successful in some of my mid-size projects (25k SLOC).
Who knows? Maybe refactoring your project to use DI will let you accomplish the task you want without even getting to know all the dependencies. One thing I'm sure of is that it will help you maintain it.
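For the narrower question of what a single script declares, a runtime technique different from the parser approaches above is to snapshot the declared symbols, include the file, and diff. The sketch below is a rough illustration with a hypothetical function name (declaredBy). It assumes the file has not been loaded yet, and it executes the script, which may not be acceptable for files with side effects.

```php
<?php
// Hypothetical helper: report which symbols appear after including $file.
function declaredBy($file)
{
    $classesBefore   = get_declared_classes();
    $functions       = get_defined_functions();
    $functionsBefore = $functions['user'];
    $constantsBefore = array_keys(get_defined_constants());

    include_once $file; // note: this runs the script's top-level code

    $functions = get_defined_functions();

    return [
        'classes'   => array_values(array_diff(get_declared_classes(), $classesBefore)),
        // User function names are reported in lowercase by PHP.
        'functions' => array_values(array_diff($functions['user'], $functionsBefore)),
        'constants' => array_values(
            array_diff(array_keys(get_defined_constants()), $constantsBefore)
        ),
    ];
}
```

Anything the script pulls in through its own includes or through an autoloader shows up in the diff as well, so the result is "everything this script caused to be declared", which may be closer to what a per-project build needs.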

List PHP functions (builtin and optional) that are used by a site

UPDATE: resolved with a PHP parser. Let's reopen so that I can answer and accept.
I'm considering moving a PHP site to a new host. I've got a shared plan on both source and target hosts, therefore limited access to PHP customization.
Certain PHP function families (openssl, mcrypt, gd to name a few) may not be available on the new host. In order to match the API surface against the host, I'd like to statically list all PHP functions, both standalone and class methods, that my files reference.
There's very little dynamic code, so API references that are hidden behind eval are not a concern. Static analysis would be sufficient.
I've tried phpCallGraph with -p and Doxygen - both produce incomplete coverage. Are there any other tools to that effect out there, please?
EDIT: the solution in this question is utterly inapplicable. It's called functions I'm after, not defined ones.
EDIT2: I would like to avoid retesting the whole site. Just the portions that depend on module provided functions.

Solved with PHP-Parser and an hour of work.
Go through all PHP files in a folder and parse each one. Walk the parse tree, identify function calls, and store the function names. Identify function definition statements and store their names too. At the end, subtract the latter from the former and dump the result. Do the same for class method invocations.
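The "called minus defined" idea can be sketched without any dependency. Note that this swaps in PHP's built-in token_get_all() for PHP-Parser, so it is much cruder than the solution described above: it ignores namespaced names, by-reference functions and nullsafe calls, and it counts method definitions as definitions. The function name externalFunctionCalls() is hypothetical.

```php
<?php
// Hypothetical helper: names of plain functions that $source calls
// but does not define itself.
function externalFunctionCalls($source)
{
    // Drop whitespace and comments so neighbours are easy to inspect.
    $tokens = [];
    foreach (token_get_all($source) as $token) {
        $ignorable = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
        if (is_array($token) && in_array($token[0], $ignorable, true)) {
            continue;
        }
        $tokens[] = $token;
    }

    $called   = [];
    $defined  = [];
    // A name after one of these is a definition, a class, or a method call.
    $notACall = [T_FUNCTION, T_NEW, T_OBJECT_OPERATOR, T_DOUBLE_COLON];

    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_STRING) {
            continue;
        }
        $prev   = $i > 0 ? $tokens[$i - 1] : null;
        $next   = isset($tokens[$i + 1]) ? $tokens[$i + 1] : null;
        $prevId = is_array($prev) ? $prev[0] : null;
        $name   = strtolower($token[1]); // function names are case-insensitive

        if ($prevId === T_FUNCTION) {
            $defined[$name] = true;
        } elseif ($next === '(' && !in_array($prevId, $notACall, true)) {
            $called[$name] = true;
        }
    }

    return array_keys(array_diff_key($called, $defined));
}
```

Running this over every file and merging the results gives a list that can be compared against the extensions available on the target host, for example with function_exists().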

Getting base url, development variable, the best way, without using globals

In my application architecture I want to replace my globals with something that ain't gonna burn most of the developer's eyes, because I am using globals like this,
define('DEVELOPMENT_ENVIRONMENT', true);
// Shorten DIRECTORY_SEPARATOR global,
define('DS', DIRECTORY_SEPARATOR);
// Set full path to the document root
define('ROOT', realpath(dirname(__FILE__)) . DS);
How could I prevent this? I tried creating a class that reads an XML file, but this gives me longer code like this:
$c = new Config();
if($c->devmode === TRUE) {}
or maybe something like this
$c = new Config();
echo $c->baseurl;
Any better ways to do this?

I think questions like yours cannot be answered in general, but they probably deserve an answer anyway. There is simply no one golden rule or solution for this.
In the barest sense, the problem you describe is the context an application runs in. On a human level it has many facets. Take just this one constant:
define('DEVELOPMENT_ENVIRONMENT', true);
Although it is simple and easy to introduce, it comes at a high price. If it is already part of your application, first try to understand what the implications are.
You have one application codebase, and somewhere in it (concretely, everywhere the constant is used) there are branches of your code that are executed depending on whether this constant is TRUE or FALSE.
This on its own is problematic because such code tends to become complex and hard to debug. So regardless of how you express it (constant, variable, function, class), you should first of all reduce and prevent the usage of such constructs.
And honestly, using a (global) constant does not look that wrong to me. Compared with the alternatives, it is the preferable one in my eyes, because it lies less and is straightforward rather than complicated. In current PHP versions you could, however, turn this into a less dynamic constant by using the const keyword to declare it:
const DEVELOPMENT_ENVIRONMENT = TRUE;
This is one facet of this little line of code. Another one is the low level of abstraction it comes with. If you want to define environments for the application, saying that a development environment is true or false is ambiguous. Instead you normally have an environment which can be of different types:
const ENVIRONMENT_UNSPECIFIED = 0;
const ENVIRONMENT_DEVELOPMENT = 1;
const ENVIRONMENT_STAGING = 2;
const ENVIRONMENT_LIVE = 3;
const ENVIRONMENT = ENVIRONMENT_DEVELOPMENT;
However, this little example only illustrates what I mean by making the setting less ambiguous. It does not solve the general problem outlined above, nor the following one:
You introduce context to your application on the global level. That means any line of code inside a component (function, class) that relates to anything global (here: DEVELOPMENT_ENVIRONMENT) cannot be decoupled from the global state any longer. You have written code that only works inside that application's global context. This stands in your way if you want to write re-usable software components. Re-usability does not only mean a second application; it already matters in testing and debugging, or just in the next revision of your software. As you can imagine, that can get in your own way pretty fast, or let's say faster than you want.
So the problem here is less the constant on its own and more the reliance on the single context the code will run in, or better worded: global static state. The goal to aim for, when you would like to change things for the better, is to reduce this global static state. This is important when you're looking for alternatives, because it will help you make better decisions.
For example, instead of introducing a set of constants as I did in the last code example, find the places where you make use of DEVELOPMENT_ENVIRONMENT and think about why you put it there and whether it can be removed. So first think about whether it is needed at all (these environment flags are often a smell: once needed for some quick debugging, or added because it was thought "oh, how practical", and then left rotting in the code over weeks of no use). After you've considered whether it is needed and concluded that it is, you need to find out why it is needed at that place. Does it really belong there? Can't it, as you should do with anything that provides context, be turned into a parameter?
Normally objects by definition ship with their own context. If you've got a logger that behaves differently in development than in live, this should be a configuration and not a decision inside the application code somewhere. If your application always has a logger, inject it. The application code just logs.
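The logger case can be sketched roughly as follows. All names here (Logger, NullLogger, MemoryLogger, OrderService) are hypothetical and only illustrate the shape: the component receives its logger as a parameter and never asks a global flag which environment it is in.

```php
<?php
// Hypothetical logger contract; the application code depends only on this.
interface Logger
{
    public function log($message);
}

// A live setup might discard debug output entirely.
class NullLogger implements Logger
{
    public function log($message)
    {
    }
}

// A development or test setup might collect the messages for inspection.
class MemoryLogger implements Logger
{
    public $lines = [];

    public function log($message)
    {
        $this->lines[] = $message;
    }
}

class OrderService
{
    private $logger;

    public function __construct(Logger $logger)
    {
        $this->logger = $logger;
    }

    public function place($orderId)
    {
        // No environment check here: the service just logs.
        $this->logger->log("order $orderId placed");
        return true;
    }
}

// Only the bootstrap layer decides which logger fits the environment.
$logger  = new MemoryLogger();
$service = new OrderService($logger);
$service->place(42);
```

Swapping MemoryLogger for NullLogger in the bootstrap changes the behaviour without touching OrderService, which is the point of turning context into a parameter.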
So as you can imagine, how and when you can prevent this depends on many different things. I can only suggest that you find out now, and reduce the overall usage.
There are some practical tips on the way for common scenarios we face in applications. For the "root-path problem" you can use relative paths in conjunction with magic constants like __DIR__. For example if the front-endpoint in the webroot (e.g. index.php) needs to point to the private application directory hosting the code:
<?php
/**
* Turbo CMS - Build to race your website's needs to the win.
*
* Webroot Endpoint
*/
require(__DIR__ . '/../private/myapp/bootstrap.php');
The application then normally knows how it works and where to find files relative to itself. And if you return some application context object (and this must not be global(!)), you can inject the webroot folder as well:
<?php
/**
* Turbo CMS - Build to race your website's needs to the win.
*
* Webroot Endpoint
*/
/* @var $turboAppContext Turbo\App\WebappContext */
$turboAppContext = require(__DIR__ . '/../private/myapp/bootstrap.php');
$turboAppContext->setWebroot(__DIR__);
Now the context of your webserver configures the application defaults. This is actually a crucial part, because it touches a kind of context inside your application (though not in every component) that is inherent. You cannot prevent this context. It's like leaky abstractions. There is an environment (known as "the system") that your application runs in. Even so, you want to make the application as independent of it as possible.
As with the DEVELOPMENT_ENVIRONMENT constant above, it is crucial to reduce these points and to find the right place for them. It is equally important to allow only a very specific layer to set the input values (to change context) and only some high-level layers of your software to access these values. The largest part of your code base should work without any of these parameters. And you can only control the access by passing parameters around and by not using globals. Then code on a level that is allowed to access a certain setting (in the best meaning of the word) can access it, and everything else does not have that parameter. To get this safety, you need to kill globals as best as possible.
E.g. the functionality to redirect to another location needs the base URL of the current request. It should not fetch it from server variables directly, but from a request object that abstracts access to the server variables, so that you can replace things here (e.g. when you're moving the application behind a front proxy; not always the best example, but this can happen). If you have hard-coded your software against $_SERVER, you would then need to modify $_SERVER in some stages of your software. You don't want that. Instead you move away from this (again) global static state (here via a superglobal variable; spot those next to your global constants) by using objects that represent a certain functionality your application needs.
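A request object of that kind could look roughly like the sketch below. The Request class and its baseUrl() method are hypothetical and deliberately tiny; they assume the usual HTTPS, HTTP_HOST and SCRIPT_NAME keys and do not handle proxies or non-standard ports.

```php
<?php
// Hypothetical minimal request abstraction: it is handed the server
// variables once and nothing else in the application reads $_SERVER.
class Request
{
    private $server;

    public function __construct(array $server)
    {
        $this->server = $server;
    }

    public function baseUrl()
    {
        $https = isset($this->server['HTTPS'])
            && $this->server['HTTPS'] !== ''
            && $this->server['HTTPS'] !== 'off';
        $scheme = $https ? 'https' : 'http';

        $host = isset($this->server['HTTP_HOST'])
            ? $this->server['HTTP_HOST']
            : 'localhost';

        $script = isset($this->server['SCRIPT_NAME'])
            ? $this->server['SCRIPT_NAME']
            : '/index.php';

        // Directory of the front controller, normalised to forward slashes.
        $dir = rtrim(str_replace('\\', '/', dirname($script)), '/');

        return $scheme . '://' . $host . $dir . '/';
    }
}
```

The front controller would create it once with new Request($_SERVER) and pass it on, while a test can pass any plain array, which is exactly the replaceability described above.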
As long as we're talking about web-applications, take a look at Symfony's request and response abstraction (which is also used by many other projects which makes your application even more open and fluent). But this is just a side-note.
So whatever you want to base your decision on, do not be misled by how many letters you have to type. That benefit is very short-sighted once you consider the total number of letters you type while developing your software.
Instead, understand where you introduce context, where you can prevent that and where you can't. For the places where you can't, consider making context a parameter instead of a "property" of the code. More fluent code gives you more re-usable code, better tests and less hassle when you move to another platform.
This is especially important if you have a large installation base. Code on these bases with global static state is a mess to maintain: Late releases, crawling releases, disappointed developers, burdensome development. There are lessons to learn, and the lessons are to understand which implications certain features of the language have and when to use them.
The best rule I can give (and I'm not an academic developer at all) is to consider global as expensive. It can be a superb shortcut to establish something, but you should know the price it comes with. And the field is wide, because this applies not only to object-oriented programming but to procedural code as well. For object-oriented programming, much educational material exists that offers different ways to prevent global static state, so I would even say the situation there is quite well documented. But PHP is not purely OOP, so it's not always as easy as having an object at hand; you might first need to introduce some (but then, see the request and response abstractions that are already available).
So the really best suggestion I can give to improve your code in context of this question is: Stick to the constant(s) (maybe with const keyword to make them less dynamic and more constant-ly) and then just try to remove them. As written in comments already, PHP does a very fine job about cross-platform file-access, just use / as directory separator, this is well understood and works very well. Try to not introduce a root-path constant anyway - this should not be constant for the code you write but a parameter on some level - it can change, for example in sub-requests or sub-apps which can save you a life-span before re-inventing the wheel again.
The hard task is to keep things simple. But it's worth it.

Just put a server variable into the vhost config and prepare a different config file for each environment. Using Apache it would be (you'll need the mod_env module):
SetEnv ENVIRONMENT dev
And then in index just use something like:
$configFileName = getenv ('ENVIRONMENT').'.ini';
Now just load this file and derive all the application behaviour from the values it contains. Of course you can take this further if you use a framework, but this would be a good start.
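Loading the file could be sketched like this. The function name loadEnvironmentConfig(), the list of allowed environment names and the INI keys are assumptions for illustration. The whitelist is there because the environment value ends up in a file path.

```php
<?php
// Hypothetical loader: picks <configDir>/<ENVIRONMENT>.ini and parses it.
function loadEnvironmentConfig($configDir, $default = 'dev')
{
    $allowed = ['dev', 'staging', 'live']; // assumed environment names

    $env = getenv('ENVIRONMENT'); // false when the variable is not set
    if ($env === false || !in_array($env, $allowed, true)) {
        $env = $default;
    }

    $file = rtrim($configDir, '/') . '/' . $env . '.ini';
    if (!is_file($file)) {
        throw new RuntimeException("Config file not found: $file");
    }

    $config = parse_ini_file($file);
    if ($config === false) {
        throw new RuntimeException("Config file is not valid INI: $file");
    }

    return $config;
}
```

A dev.ini containing a line such as base_url = "http://localhost/" would then be available as $config['base_url'], and the rest of the application can receive that array (or an object built from it) as a parameter.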

You can encapsulate your constants in a class and then retrieve them through static methods:
if(Config::devMode()) {}
echo Config::baseUrl();
This way you save a line and some memory because you don't need to instantiate an object.
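The class behind those two calls might look like the following sketch. The stored values are placeholders, and only the method names devMode() and baseUrl() come from the usage lines above.

```php
<?php
// Minimal sketch of a static configuration holder (values are placeholders).
final class Config
{
    private static $values = [
        'devmode' => true,
        'baseurl' => 'http://localhost/',
    ];

    public static function devMode()
    {
        return self::$values['devmode'];
    }

    public static function baseUrl()
    {
        return self::$values['baseurl'];
    }
}

if (Config::devMode()) {
    echo Config::baseUrl();
}
```

Keep in mind that static accessors are still reachable from everywhere, so this shortens the call sites but remains global state in the sense discussed in the longer answer above.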

PHP (A few questions) OO, refactoring, eclipse

I am using PHP in eclipse. It works ok, I can connect to my remote site, there is colour coding of code elements and some code hints.
I realise this may be too long to answer all questions, if you have a good answer for one part, answering just that is ok.
Firstly General Coding
1) I have found that it is easy to lose track of included files and their variables. For example, if there was a database $cursor, it is difficult to remember or even know that it was declared in the included file (this becomes much worse the more files you include). How are people dealing with this?
2) How are people documenting their code, in particular the required GET and POST data?
Secondly OO Development:
3) Should I be going full OO in my development? Currently I have a functions library which I can include, and I have separated each "task" into a separate file. It is a bit nasty but it works.
4) If I go OO, how do I structure the directories in PHP? Java uses packages; what about PHP?
5) How should I name my files? Should I use all lower case with _ for spaces ("hello_world.php")? Should I name classes with uppercase like Java ("HelloWorld.php")? Is there a different naming convention for classes and regular function files?
Thirdly Refactoring
6) I must say this is a real pain. If I change the name of a variable in one place, I have to go through the whole document and each file that included this file and change the name there too. Of course, errors everywhere is what results. How are people dealing with this problem? In Java, if you change the name in one place it changes everywhere.
7) Are there any plugins to improve PHP refactoring? I am using the official PHP version of Eclipse from their website.
thanks

Firstly General Coding
1) OO can help you with that. As you encapsulate variables and functionality, they don't leak out and mess with the namespaces. Assuming I understand correctly what problem you hint at, an OO approach helps alleviate conflicts that can arise when you inadvertently redeclare variables. (Note: Alleviate. Not completely prevent on its own. ;))
Otherwise, a practice I have encountered is prefixing variable names with something like a 'package name', which merely shifts the problem one level up and isn't exactly beautiful either. :|
2) "However suits their purpose". PHPdoc is a good start; will help to create API documentation.
Secondly OO Development:
3) As said before -- "it depends". Do it when needed. You don't have to go full OO for "hello world". But you can. Weigh the costs and benefits of either route and choose wisely. Though I personally want to suggest when in doubt favour OOP over 'unstructured' approaches. Basically, know your tools and when to use them -- then you can make that call on your own easily. :)
4) As far as I can see, the directories "are structured like packages". Mind you, "directories" and "like". Having said that, various frameworks have solved that problem for themselves; cf. the other answers.
5) Again, however you please. There is not a definitive way You Have To Do It Or Else. Just stick to it once you chose your path ;3
Aside from that, certain frameworks etc. have their own naming conventions. Symfony, e.g., uses CamelCase like Java.
Thirdly Refactoring
I must say this is a real pain.
yes :3 But it pays off.
If I change the name of a variable in one place I have to go through whole
document and each file that included this file and change the name there too.
Of course, errors everywhere is what results. How are people dealing with
this problem? In Java if you change the name in one place it changes everywhere.
No, it doesn't. If you get yourself a tool with support you only have to use the refactoring tool once; but if you rename a class property in java, there is no magic bot that walks through the internet and automagically makes sure everyone on the planet uses the new name. ;)
But as for how to prevent it -- be smart. Honour program contracts, i.e. use interfaces. Do not use functions / members you shouldn't use directly. Watch the hierarchies. Use a reasonable division of code and respect this division's boundaries.
But how people deal with that problem? Well, search and replace I suppose ;)
As for the Eclipse-Plugin -- The dynamic nature of PHP makes it more difficult to automagically refactor code; we can't always use static type hinting etc., and divination of argument and return types is impossible more often than not. So, to the extent of my knowledge, 'automatic refactoring' is not as well-supported by tools as in the Java world. Though I am sure for the doable cases, there should be plugins. :)

I've found using a PHP framework (e.g. Zend, Cake, CodeIgniter, etc) can force class structures and naming conventions while generally addressing autoloading as well. Using PHPDoc formatting liberally helps with code-completion and hinting as well as documenting specific requirements (e.g. method parameter definitions).

For the OO Development part:
I am using the autoload functionality to load the classes dynamically. My directory structure is like packages in java. My classes are named like in java (e.g. HelloWorld.php). But the class is defined with the complete path to that class (e.g. class FW_package1_package2_HelloWorld {...}).
When a class is used, the autoload method replaces every _ with / and looks for the class file under the resulting path (e.g. FW/package1/package2/HelloWorld.php).
I am strongly influenced by Java, which is why I chose this approach.
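A minimal sketch of such an autoloader, assuming spl_autoload_register() and a hypothetical function name (registerUnderscoreAutoloader), could look like this:

```php
<?php
// Hypothetical helper: maps FW_package1_HelloWorld to
// <baseDir>/FW/package1/HelloWorld.php and loads it on first use.
function registerUnderscoreAutoloader($baseDir)
{
    spl_autoload_register(function ($class) use ($baseDir) {
        $relative = str_replace('_', '/', $class) . '.php';
        $file     = rtrim($baseDir, '/') . '/' . $relative;

        // Stay silent for unknown classes so other autoloaders can try.
        if (is_file($file)) {
            require $file;
        }
    });
}
```

After calling registerUnderscoreAutoloader() once in the bootstrap with the directory that contains FW/, a plain new FW_package1_package2_HelloWorld() triggers the lookup without any explicit include.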

Take a look at nWire for PHP. It is a plugin for Eclipse PDT which provides code exploration and visualization.
It can easily be used to trace dependencies within your application and it is very handy for OO projects, enabling you to visualize class hierarchies and much more.
It doesn't support refactoring, but it can assist by showing you the references of a given component (e.g. a function or a field).

Design Tips for PHP Function Include Files

Good design dictates only writing each function once. In PHP I'm doing this by using include files (like Utils.php and Authenticate.php), with the PHP command include_once. However I haven't been able to find any standards or best practices for PHP include files. What would you at StackOverflow suggest?
I'm looking for:
Naming Standards
Code Standards
Design Patterns
Suggestions for defining return types of common functions
(now I'm just using associative arrays).

One convention I like to use is to put each class in its own file named ClassName.class.php and then set up the autoloader to include the class files. Or sometimes I'll put them all in a classes/ subdirectory and just name them ClassName.php. Depends on how many class vs. non-class includes I'm expecting.
If you organize your utility functions into classes and make them static methods instead, you can get away with writing only a single require_once() in your top level files. This approach may or may not be appropriate for your code or coding style.
As for return types, I try to follow the conventions used in the built-in functions. Return a type appropriate to the request, or return false on failure. Just make sure you use the === operator when checking for false in the results.
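Why the === check matters is easiest to see with a value that is valid but falsy. The helper name findPosition() below is hypothetical; it simply wraps strpos(), which follows the built-in "result or false" convention.

```php
<?php
// Hypothetical helper following the built-in convention:
// returns the position of $needle in $haystack, or false when absent.
function findPosition($haystack, $needle)
{
    return strpos($haystack, $needle);
}

$pos = findPosition('abc', 'a'); // int(0): a match at the very start

// Loose comparison treats 0 as false and misreports the match as a failure.
$looseSaysMissing = ($pos == false);   // true

// Strict comparison also checks the type, so 0 is not confused with false.
$strictSaysMissing = ($pos === false); // false
```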
The fact that you're concerned about conventions suggests you're already on the right track. If you are familiar with any other OOP language like Java, C++, C#, etc., then you'll find you can follow a lot of the same conventions thanks to the OOP goodness in PHP5.

Whatever naming convention you end up using (I prefer to take cues from either Java or C# wherever possible), make sure that include files for functions do not actually execute any code upon inclusion, and never include the same file twice (use include_once or require_once).

Some such standards have been written already. Most large projects will follow a standard of their own.
Here is one written by Zend; it is the standard used in the Zend Framework.
http://framework.zend.com/manual/en/coding-standard.html
Also, PEAR always had some fairly strict coding standards:
http://pear.php.net/manual/en/standards.php
My preferred answer though is that for your own project you should use what you feel comfortable with, and be internally consistent. For other projects, follow their rules. The consistency allows for greatest code readability. My own standards are not the same as the PEAR ones. I do not indent with four spaces (I use tabs) and I never use camel case like function names, but nonetheless if I am editing something from another project I'll go with whatever that project does.

I've done the following. First, I created an intercepting filter to intercept all web requests. I also created a version which works with command line commands.
Both interceptors go to a bootstrap file, which sets up an autoloader. This file has the autoloading function and a hash. In the hash, the key is the class name and the value is the file path to the class file. The autoload function simply takes the class name and runs a require on the file.
A few performance tips if you need them: use single quotes when defining the file paths, as they're slightly faster since they're not interpolated. Also use require/include instead of their _once versions. The autoloader is guaranteed to run only once per class, and the plain versions are a fair bit faster.
The above is great, in fact, even with a large code base with a tonne of classes, the hash isn't that big and performance has never been a concern. And more importantly we're not married to some crazy pseudo name space class naming convention, see below.
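The hash-based autoloader described above might be sketched as follows. The function name registerClassMapAutoloader() and the example paths are hypothetical, and spl_autoload_register() is assumed as the registration mechanism.

```php
<?php
// Hypothetical helper: class name => file path lookup, no naming convention.
function registerClassMapAutoloader(array $classMap)
{
    spl_autoload_register(function ($class) use ($classMap) {
        // PHP calls this at most once per class, so a plain require is
        // enough and avoids the bookkeeping of require_once.
        if (isset($classMap[$class])) {
            require $classMap[$class];
        }
    });
}

// Example map as it might appear in the bootstrap file (paths are made up):
// registerClassMapAutoloader([
//     'ClassA' => '/code/ClassA.php',
//     'ClassB' => '/code/subfolder/ClassB.php',
// ]);
```

Because the map is explicit, classes can be moved between directories by editing one entry, which is the flexibility the answer contrasts with the delimited-name approach.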
The other option is the delimited-name, pseudo-namespace trick. This is less attractive, as namespaces will come with 5.3, and I see this being gross because renaming these classes across the code base will be less fun. Regardless, this is how it works: assume a root for all your code. Then all classes are named based on the directory traversal required to get there, delimited by a character such as '_', followed by the class name itself; the file, however, is named after the class alone. This way the location of the class is encoded in the name, and the autoloader can use that. The problem with this method, besides really_long_crazy_class_names_MyClass, is that there is a fair bit of processing on each call, but worrying about that might be premature optimisation, and again, namespaces are coming.
eg.
/code root
    ClassA                  ClassA.php
    /subfolder
        subFolder_ClassB    ClassB.php
