I do PHP coding a lot in my company and personal work. Usually my files get bigger, sometimes more than 2000-3000 lines long. Then, they get difficult to manage.
My question: What is (or should be) the standard length of a PHP code file in lines of code? At what length do you split it up?
Note: No Object Oriented programming (I don't use classes). Please answer accordingly.
Clarification of not using classes:
I do use functions a lot.
I don't use classes because the code is legacy. I have to maintain that and add new features.
I was a C programmer before, so going OO is somewhat tough for me. It's like learning a whole new way of doing things.
There is no good standard length. Some files grow bigger, some smaller.
A good guiding principle from Object Oriented Programming is separating tasks and concerns into classes, and splitting those classes into separate files.
That is the most logical separation, and allows using PHP 5's Autoloading. The basic principles may be worth adopting even if you don't want to get into serious OOP.
Related questions:
What are the advantages/disadvantages of monolithic PHP coding versus small specialized php scripts?
Code should not be split according to number of lines of code, it should be split according to functionality. Parts of your code that handle, say, templating, should go in different files (and possibly directories) than parts that handle, say, authentication. If you have a file that's thousands of lines long, it's almost certainly doing way too much and needs to be split up, if not refactored entirely.
Maybe you should start using classes then.
BTW, I definitely split the PHP code files at 1000 lines of code.
Use classes and OO programming. I once attended a workshop called "make love to your code" that advised avoiding functions that are longer than the space on your monitor (you should not have to scroll to see the whole function).
Even quite large code files can be reasonably easy to manage if you organise them well. You should keep your functions short, keep related functions together, and name them well.
You will also find it easier to manage if you use an IDE with a function lookup table. I use NetBeans, and on the left-hand side it gives me a panel with quick links to all the functions in my current file. It also lets me click on a line where a function is called and jump to the declaration (anywhere in the project).
On the other hand, if you have code files several thousand lines long which consist of a single function, then yes, the odds are it will be very hard to manage, and no amount of IDE cleverness will help.
I started a small project for me and a few friends to edit a few tables in a multi-database setup (MySQL). Now the project is over several hundred pages, and while it looks incredible on the outside, it is starting to feel cluttered inside. There is no structure. Here is what we have:
3 databases
several hundred tables make up the three DB.
The PHP project is designed to make it easy to edit these tables instead of editing them manually.
Does anyone have a suggestion for how to organize the code? I am starting to see repeated includes at the top of files, and certain code is starting to repeat (I have functions for the more common cases).
I would like to stay away from "CLASS" type programming (unless you feel this might be best) only because it is an open source project and some of my friends are not that great at PHP, so I want to keep it simple. But for the sake of organization, I could go to class style.
My biggest concern is that the majority of pages (the HTML part) are tons of cut and paste, so each page is like the others. I'm not sure how to consolidate those efficiently. I think once that part is figured out, the PHP code will trim up as well.
thanks
I'd go for the Object Oriented style here (or class type programming as you said :)). This will cut your code massively, it will also help if you need to change a function which is on multiple pages rather than changing multiple functions.
Your friends will thank you in the long run, especially when they embrace the goodness of OO.
If you mean "CLASS" as in OOP (Object Oriented Programming), it's definitely something you should consider. Arranging methods in objects is very convenient once you get used to it, and once you have discovered the autoloader you'll know why.
You should also take a look at the MVC frameworks on the market. MVC stands for Model, View and Controller and is a fairly common pattern amongst applications. I'd recommend looking at CodeIgniter, which is very easy to get started with, even without extensive PHP experience.
If you would rather stick to the purely procedural style, in other words spaghetti and functions, I'd split everything into files grouped by their area of functionality, like media.php, database.php, etc. Take a look at WordPress and the wp-includes folder and see how they've solved it. Good luck!
There are a lot of things you can do differently here. Here are three to get you started:
You're going to have to move to object-oriented programming, especially if you're wanting to go the route of code organization. Keep with the DRY principle at all times.
With that in mind, check out a good framework. I would recommend CodeIgniter. The MVC design pattern will remove a lot of the redundancy in your code if you use it correctly. If you choose not to go down the framework route, I would definitely look at some templating libraries to help you out.
Normalize your database. This will help you remove redundant model code.
My suggestion is to use an XML-like abstraction layer for all the database fields, and also to use OOP where you can and where you understand it. You can use an XML configuration file to separate out the database logic, and you can use the PHP XML extension to parse it. Once you have the XML tree in memory, you can run it through an engine to output the HTML. I use this approach a lot. TYPO3, for example, does something similar, though with a very large array rather than XML. Using this you can also create your own XML language and attributes. It's a bit like XSLT but not as deep, and it's better than TYPO3.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I've already awarded a 100 point bounty to mario's answer, but might start a second 100 point bounty if I see new good answers coming in. This is why I'm keeping the question open and will not choose a final answer, despite having awarded the bounty to mario.
This might seem like a simple question (study the code and refactor) but I'm hoping those with lots more experience can give me some solid advice.
The library is an open source, 20,000-line library that's all in a single file and which I haven't written myself. The code looks badly written, and the single file is an even bigger problem because it freezes Eclipse for at least half a minute every time I want to make a change. That is one of the reasons I think it's worth refactoring this library into smaller classes.
So aside from reading the code and trying to understand it, are there common (or not so common) tips when refactoring a library such as this? What do you advise to make my life a little easier?
Thanks to everyone for your comments.
A few generic principles apply:
Divide and conquer. Split the file into smaller, logical libraries and function groupings. You will learn more about the library this way, and make it easier to understand and test incrementally.
Remove duplication. Look for repeated functions and concepts, and replace them with standard library functions, or centralized functions within the library.
Add consistency. Smooth out parameters and naming.
Add unit tests. This is the most important part of refactoring a library. Use jUnit (or similar, such as PHPUnit for PHP), and add tests that you can use to verify both that the functions are correct and that their behavior has not changed.
Add docs. Document your understanding of the consistent, improved library as you write your tests.
If the code is badly written, it is likely that it has a lot of cloning. Finding and getting rid of the clones would then likely make it a lot more maintainable as well as reducing its size.
You can find a variety of clone detectors, these specifically for PHP:
Bergmann's PHPCPD
SourceForge PMD
Our CloneDR
ranked in least-to-most capability order (IMHO with my strong personal self-interest in CloneDR) in terms of qualitatively different ability to detect interesting clones.
If the code is badly written, a lot of it might be dead. It would be worthwhile to find out which part executes in practice, and which does not. A test coverage tool can give you good insight into the answer for this question, even in the absence of tests (you simply exercise your program by hand). What the test coverage tool says executes, obviously isn't dead. What doesn't execute... might be worth further investigation to see if you can remove it. A test coverage tool is also useful to tell you how much of the code is exercised by your unit tests, as suggested by another answer. Finally, a test coverage tool can help you find where some of the functionality is: exercise the functionality from the outside, and whatever code the test coverage tool says is executed is probably relevant.
Our PHP Test Coverage Tool can collect test coverage data.
If it's an open source library, ask the developers. First, it's very likely someone already has (or has attempted) a restructured version. And very occasionally the big bloated version of something was actually auto-generated from a more modular version.
I actually do that sometimes for one of my applications which is strictly pluginized, and allows a simple cat */*.php > monolithic.php, which eases distribution and handling. So ask if that might be the case there.
If you really want to restructure it, then use the time-proven incremental extension structure. Split up the class library into multiple files by segregating the original class. Split every ~2000 lines, and name the first part library0.php:
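class library0 {
    var $var1, $var2, $var3, $var4;
    function method1() { /* original body */ }
    function method2() { /* original body */ }
    function method3() { /* original body */ }
    function method4() { /* original body */ }
    function method5() { /* original body */ }
}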
The next part simply continues from there and holds the next few methods:
class library1 extends library0 {
    function method6() { /* original body */ }
    function method7() { /* original body */ }
    function method8() { /* original body */ }
    // ...
}
Do so until you have separated them all. Call the last file by its real name library.php, and class library extends library52 { should do it. That's so ridiculously simplistic, a regex script should be able to do it.
Now obviously, there are no memory savings here, and splitting it up like that buys you nothing in terms of structuring. With 20000 lines, however, it's difficult to get a quick overview and a sensible grouping right the first time. So start with an arbitrary restructuring in lieu of an obvious plan. Going from there, you could very well sort the code, put the least useful parts into the last file, and use the lighter base classes whenever they suffice. You'll need a dependency chart to see if this is workable, however, or else errors might blow up at runtime.
(I haven't tried this approach with a huge project like that. But arbitrarily splitting something into three parts, and then reshuffling it for sensibility did work out. That one time.)
I assume you are planning to break the library up into thematically relevant classes. Definitely consider using autoloading. It's the best thing since sliced bread, and makes inter-dependencies easy to handle.
Document the code using phpDoc compatible comments from the start.
Calling Side Approach
If you know the library use is limited to a particular class, module, or project it can be easier to approach the problem from the calling side. You can then do the following to clean the code and refactor it. The point of approaching from the calling side is because there are very few calls into the library. The fewer the calls the (potentially) less code that is actually used in the lib.
Write the Calling Side Tests
Write a test that mimics the calls that are done against the library.
Bury the Dead Code
If there is a lot of dead code this will be a huge win. Trace the actual calls into the library and remove everything else. Run the test and verify.
Refactor What's Left
Since you have the tests, it should be much easier to refactor (or even replace) the code in the library. You can then apply the standard refactoring rules (de-duplication, simplification, consolidation, etc.).
Apart from what was already stated I suggest to have a look at Martin Fowler's Catalog of Refactorings based on his book. The page also contains a large number of additional sources useful in understanding how refactoring should be approached. A more detailed catalog listing can be found at sourcemaking. Note that not all of these techniques and patterns can be applied to PHP code.
There are also a lot of useful tools to assist you with refactoring (and in general) at http://phpqatools.org. Use these to analyze your code to find things like dead or duplicated code, high cyclomatic complexity, often executed code and so on. Not only will this give you a better overview of your code, but it will also tell you which portions of your code are critical (and better left untouched in the beginning) and which could be candidates for refactoring.
Whatever you do, do write Unit-Tests. You have to make sure you are not breaking code when refactoring. If the library is not unit-tested yet, add a test before you change any code. If you find you cannot write a test for a portion of code you want to change, check if doing a smaller refactoring in some other place might let you do so more easily. If not, do not attempt the refactoring until you can.
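As a rough illustration of "add a test before you change any code", here is a minimal characterization test in plain PHP. The function legacy_format_price() is a hypothetical stand-in for some untested function in the library, and the expected values are simply what that stand-in returns today. In a real project you would more likely put the same cases into PHPUnit.

```php
<?php
// Hypothetical stand-in for an untested function from the legacy library.
function legacy_format_price($amount)
{
    return '$' . number_format($amount, 2);
}

// Characterization test: pin down what the code does today,
// then re-run it after every refactoring step.
function check_legacy_format_price(): array
{
    $cases = [
        [0, '$0.00'],
        [1234.5, '$1,234.50'],
        [-2, '$-2.00'],
    ];
    $failures = [];
    foreach ($cases as [$input, $expected]) {
        $actual = legacy_format_price($input);
        if ($actual !== $expected) {
            $failures[] = "input $input: expected $expected, got $actual";
        }
    }
    return $failures; // an empty array means the behavior is unchanged
}
```

The point is not that the current output is "right", only that it stays the same while you move code around.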
1. Write tests for the library so that every line of the code is covered (i.e. 100% coverage).
2. Use TDD. Start from the higher-level modules and refactor (top-down approach).
3. Run the tests mentioned in step 1 and verify them against the results of step 2.
I understand that 100% coverage (as mentioned in step 1) does not necessarily mean that all the features have been covered, but at least we are making sure that the output of the new system will be the same as the output of the current system.
A good book that answers your question with a lot of examples and details is: Working Effectively with Legacy Code, by Michael Feathers.
First of all, consider using a different IDE - Eclipse is notoriously terrible in terms of performance. Komodo is way faster. So is PhpStorm.
In terms of making the refactoring easier, I'd first try to identify the high-level picture - what functions are there? Are there classes? Can you put those classes into separate files just to start with?
http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882
Refactoring depends on your goals and the type of solution. This book will help you understand the basic concepts of good code.
If your problem includes the headache of manually placing the functions in different files, then maybe the strategy below can help.
Get your library file into a PHP variable
$code = file_get_contents('path/to/your/library.php');
eliminate tags
$code = str_replace('<?php' ,'' ,$code);
$code = str_replace('?>' ,'' ,$code);
separate all the functions
$code_array = explode('function',$code);
now body of all the functions and their names are in array
create separate files for each of the functions in folder 'functions'
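foreach ($code_array as $function)
{
    $funcTemp = explode('(', $function); // text before the first "(" is the function name
    $function_name = trim($funcTemp[0]);
    if (!preg_match('/^\w+$/', $function_name)) continue; // skip the chunk before the first function, closures, etc.
    $function_text = '<?php function ' . $function;
    file_put_contents('functions/' . $function_name . '.php', $function_text);
}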
Now all the functions of your library are in separate files in a common folder, and the files are named after the functions. You can easily look up your functions in the folder view and apply your strategies to manage them.
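Splitting on the bare word 'function' also matches that word inside strings and comments, so a more cautious variant might use PHP's built-in tokenizer instead. This is a different technique from the explode() approach above, shown only as a sketch. The helper name list_declared_functions() is made up for illustration, and it needs the tokenizer extension, which is normally enabled.

```php
<?php
// Hypothetical helper: list the names of functions declared in a PHP source string.
// Methods inside classes are reported too, since they also use the "function" keyword.
function list_declared_functions(string $source): array
{
    $names = [];
    $tokens = token_get_all($source);
    $count = count($tokens);
    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }
        // Look ahead for the name; reaching "(" first means an anonymous function.
        for ($j = $i + 1; $j < $count; $j++) {
            $token = $tokens[$j];
            if ($token === '(') {
                break;
            }
            if (is_array($token) && $token[0] === T_STRING) {
                $names[] = $token[1];
                break;
            }
        }
    }
    return $names;
}
```

With the list of names in hand you can decide by hand, or with a further script, which functions belong together in which file, rather than ending up with one file per function.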
You can also implement a __call() method to keep using the same call format:
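function __call($name, $params) // magic method: only takes effect when defined inside a class
{
    include_once('functions/' . $name . '.php');
    return call_user_func_array($name, $params); // pass the arguments on individually
}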
Hope it helps :)
Usually, a general rule of thumb is to remove repeated code. Also make sure to have useful documentation. If you're using Java, Javadoc is very useful, but a suitable equivalent is available for other languages.
I am using includes to pull in the various functions I am using, and I am now starting to use include to pull in chunks of HTML/PHP. Is there a point where I have overused includes?
As soon as you start having problems reading your own code that you wrote some time ago, it's definitely too much.
I recommend programming in object oriented PHP and using autoloaders to avoid include/require as far as possible. Excessive use of include/require often leads to unreadable and unmaintainable spaghetti code, which is very bad.
In small projects I usually just have one require statement to pull in my autoloader function(s) and in larger applications I use Zend Framework where I rely on Zend_Loader exclusively.
From a purist point of view I'd say: More than 3 includes/requires in your own code (without third party libs) is too much:
One for including some initialization stuff
One for loading the autoloader class/function
And the one in the autoloader itself. There should only be one function that actually includes/requires files. That function or method can then be reused in extended autoloader classes.
I mostly try to stick to that principle.
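A minimal sketch of that "one function that actually requires files" idea, using spl_autoload_register(). The classes/ directory and the rule for turning a class name into a file path are assumptions for illustration, not something the answers above prescribe.

```php
<?php
// Hypothetical layout: each class lives in classes/<Class/Name>.php next to this file.
function class_file_path(string $class, string $baseDir): string
{
    // Map namespace separators and PEAR-style underscores to directories.
    $relative = str_replace(['\\', '_'], DIRECTORY_SEPARATOR, $class);
    return rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $relative . '.php';
}

spl_autoload_register(function (string $class): void {
    $file = class_file_path($class, __DIR__ . '/classes');
    if (is_file($file)) {
        require $file; // the single place where class files get included
    }
});
```

After this, writing new Db_Table() somewhere would load classes/Db/Table.php on first use, with no include at the top of the calling file.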
I'd say it depends on the point up to which your code is still readable. If someone not working on your project has difficulty understanding your code, then yes, includes are overused.
You can overuse anything but it's probably not doing you that much harm (just a few extra stats here and there). You have to remember that large projects like Drupal and Wordpress do hundreds, if not thousands of includes.
If you're hooking in HTML, you might be getting a bit desperate. I'd personally have a good look at a proper templating language or even a framework that helped you into a MVC or MVT stance. It makes maintaining it a lot easier than chasing includes all over the place and (more importantly), keeps 95% of your logic out of your presentation files. Oh and they can maintain your databases in a much more programmatic modular method.
Basically Frameworks give you a lot of development benefits ;)
Symfony and CakePHP are both good frameworks, but if you just want a look at templating, have a go with Smarty.
If all you are using is includes then I would look into another way of doing it.
For example if you have a separate file for every function maybe look into putting them all in one file or putting them with similar functions.
It's really a matter of architecture and optimisation. Rather than discuss what's the optimal number of includes per script, I'd advise using a template engine, e.g. Smarty because it allows you to:
Separate markup from the program logic
Use template tags and built-in functions to considerably ease the development
Cache preprocessed PHP files making the whole thing a lot faster for your users
I'm creating a PHP file that does 2 mysql database calls and the rest of the script is if statements for things like file_exists and other simple variables. I have about 2000 lines of code in this file so far.
Is it better practice to include a separate file if a statement is true; or simply type the code directly in the if statement itself?
Is there a maximum number of lines of code for a single file that should be adhered to with PHP?
I would say there should not be any performance issue related to the number of lines in your php files, it can be as big as you need.
Now, for the patterns and best practices, I would say that you have to judge by yourself, I saw many well organized files of several thousand lines and a lot of actually small and difficult to read files.
My advice would be:
Judge the readability of the source code, always organize it well.
It's important to have a logical separation to some extent. If your file does everything at once (heavy database access, writing, modification, HTML rendering, AJAX and so on), you may want to separate things or use an object oriented approach.
Always look for the balance between logical separation and code. It should be neither messy nor extra-neat with a lot of 10-line files.
2000 lines of code in a single file is not exactly bad from a computer's point of view, but in most situations it is probably avoidable. Take a look at the MVC design pattern; it'll help you organize your code better.
Also, bear in mind that including (a lot of) files will slow down the execution of your code.
You may want to read a book like Clean Code by Bob Martin. Here are a few nuggets from that book:
A class should have one responsibility
A function should do one thing and do it well
With PHP, if you aren't using the Class approach; you're going to run into duplication problems. Do yourself a favor and do some reading on the subject; it'll save you a lot more time in extending and maintenance.
Line count is not a good indicator of performance. Make sure that your code is organized efficiently, divided into logical classes or blocks and that you don't combine unrelated code into single modules.
One of the problems with a language like PHP is that, barring some creative caching, every line of every included file must be tokenized, zipped through a parse tree and turned into meaningful instructions every time the hosting page is requested. Compiled platforms like .NET and Java do not suffer from this performance killer.
Also, since one of the other posters mentioned MVC as a way to keep files short: good code organization is a function of experience and common sense and is in no way tied to any particular pattern or architecture. MVC is interesting, but isn't a solution to this problem.
Do you need to focus on the number of lines? No, not necessarily. Just make sure your code is organized, efficient, and not unnecessarily verbose.
It really doesn't matter, so long as you have documented your code properly, modularised as much as possible, and checked for any inefficiencies. You may well have a 10,000 line file. Although I usually split at around 500-1000 for each section of an application.
2k lines sound too much to me... Though it depends what code style you are following, e.g. many linebreaks, many little functions or good api-contract comments can increase the size though they are good practice. Also good code formatting can increase lines.
Regarding PHP it would be good to know: Is it 2k lines with just one class, or just one big include with non-OOP PHP code? Is it a mix of template statements and program logic (as I often find in PHP code)?
Usually I don't count lines to decide when to split; it has just become a habit. If code gets confusing, I react and refactor. Still, having looked at some code we wrote recently as a team, I can see some patterns:
extract function/method if size is bigger than 20LOC (without comments) and usage of if/else clauses
extract to another class if size >200-300LOC
extract to another package/folder if artifacts >10
Still it depends what the kind of code I have. For instance if loads of logic is involved (if/else/switch/for), the LOC per function decreases. If there is hardly any logic involved (simple stupid one-path code statements) the limits increase. In the end the most-important rule is: Would a human understand the code. Will she/he be able to read it well.
I don't know any useful way to split code that's that simple, particularly if it all belongs together semantically.
It is probably more interesting to think about whether you can eliminate some of the code by refactoring. For example, if you often use a particular combination of checks with slightly different variables, it might help to outsource the combination of checks into a function and call it wherever appropriate.
I remember seeing a project once that was well-written for the most part, but it had a problem of that kind. For example, the code for parsing its configuration file was duplicated like this:
if (file_exists("configfile")) {
/* tons of code here */
} else if (file_exists("/etc/configfile")) {
/* almost the same code again */
}
That's an extreme example but you get the idea.
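One way that duplication could be removed is to pull the "which file exists" check into a small function and keep the parsing in a single place. In this sketch the helper names are invented, and parse_ini_file() is only a placeholder for whatever the "tons of code" actually did, on the assumption that the file is INI-style.

```php
<?php
// Return the first existing path from a list of candidates, or null if none exists.
function find_first_existing(array $candidates): ?string
{
    foreach ($candidates as $path) {
        if (file_exists($path)) {
            return $path;
        }
    }
    return null;
}

// The parsing now lives in exactly one place, whichever file is found.
// Assumption: the config file is INI-style; swap in the real parser here.
function load_config(array $candidates): array
{
    $path = find_first_existing($candidates);
    if ($path === null) {
        return [];
    }
    $parsed = parse_ini_file($path);
    return $parsed === false ? [] : $parsed;
}

$config = load_config(['configfile', '/etc/configfile']);
```

Adding a third location later then means adding one array entry instead of a third copy of the parsing code.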
Closed 13 years ago.
Possible Duplicate:
Why should I use templating system in PHP?
I was just curious as to how many developers actually do this?
Up to this time I haven't and I was just curious to whether it really helps make things look cleaner and easier to follow. I've heard using template engines like Smarty help out, but I've also heard the opposite. That they just create unnecessary overhead and it's essentially like learning a new language.
Does anyone here have experience with templates? What are your feelings on them? Are they helpful on big projects or just a waste of time?
On a side note: The company I work for doesn't have a designer, there are just two developers working on this project charged with the re-design/upgrade. I also use a bit of AJAX, would this have issues with a template engine?
Not only does this practice make the code look cleaner, it also has many long term and short term benefits.
You can never go wrong with organizing code. First off it makes it much easier to maintain and easier to read if someone else has to pick up after you. I have worked with Smarty before and it is nice, it keeps the designers work from interfering with the program code.
Using template systems and frameworks would make it much easier to accomplish tasks. There is a rule of thumb you can follow which is DRY (Don't Repeat Yourself). Frameworks help you achieve this goal.
You may want to look into MVC, the model that these frameworks are based on. You could implement this design structure without necessarily using a framework, which avoids the learning curve. For frameworks like Zend, the learning curve is much steeper than for some others.
I have found that CodeIgniter is fairly easy to use, and they have some VERY helpful video tutorials on their website.
Best of Luck!!
Actually it's the business logic that needs to be separated from the views. You can use php as a "template language" inside the views.
You can use AJAX with any template engine, I think.
Edit
My original response addressed the question whether to use a template engine or not to generate your html.
I argued that php is good enough for template tasks, as long as you separate business logic from presentation logic.
It's worth doing this even for simple pages, because it enables you to:
isolate the code that is the brain of your application from the code that is the face, so you can change the face without messing with the brain, or enhance the brain without breaking the looks
isolate 80% of bugs in 20% of your code
create reusable components: you could assign different presentation code to the same business code, and vice versa;
separate concerns of the feature requests (business code) from the concerns of the design requests (presentation code), which also usually are related to different people on the client side, and different people on the contractor side
use different people to write the business code and the presentation code; you can have the designer handle the presentation code directly, with minimal PHP knowledge;
A simple solution, which mimics MVC and doesn't use objects could be:
use a single controller PHP file, which receives all requests via a .htaccess file;
the controller decides what business and presentation code to use, depending on the request
the controller then uses an include statement to include the business php file
the business code does its magic, and then includes the presentation PHP file
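The steps above might look roughly like this in a single controller file, without any classes. The page names, the business/ and views/ directories and the $viewFile variable are all hypothetical; the whitelist is there so a request can never make the controller include an arbitrary path.

```php
<?php
// Map a requested page name to a business file and a presentation file.
function resolve_route(string $page): array
{
    $routes = [
        'home'  => ['business/home.php',  'views/home.php'],
        'login' => ['business/login.php', 'views/login.php'],
    ];
    // Unknown pages fall back to a not-found handler.
    return $routes[$page] ?? ['business/not_found.php', 'views/not_found.php'];
}

function dispatch(string $page, string $baseDir): void
{
    [$business, $view] = resolve_route($page);
    $viewFile = $baseDir . '/' . $view;
    // The business file does its work and finishes with: include $viewFile;
    include $baseDir . '/' . $business;
}

// The controller (e.g. index.php) would end with something like:
// dispatch($_GET['page'] ?? 'home', __DIR__);
```

Because the business file is included inside dispatch(), it sees $viewFile as a local variable and can hand control to the presentation file once its data is ready.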
PHP is a template engine (or if you prefer, a hypertext preprocessor). When HTML is mixed heavily with PHP logic, it does become very difficult to maintain, which is why you would have functions defined separately to build various parts and simply build the page from short function calls embedded in the HTML. Done like this, I don't see much of a difference between Smarty and raw PHP, other than the choice of delimiters.
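To make the "short function calls embedded in the HTML" idea concrete, here is a small sketch. The function names and the shape of the $users array are assumptions for illustration.

```php
<?php
// Escape a value for safe output in HTML.
function e($value): string
{
    return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
}

// Build one part of the page; the loop stays out of the markup.
function render_user_rows(array $users): string
{
    $html = '';
    foreach ($users as $user) {
        $html .= '<tr><td>' . e($user['name']) . '</td><td>' . e($user['email']) . '</td></tr>';
    }
    return $html;
}

/*
 In the page itself the markup then only needs a short call, for example:

 <table>
   <tr><th>Name</th><th>Email</th></tr>
   <?= render_user_rows($users) ?>
 </table>
*/
```

The page file stays almost pure HTML, and the row-building logic can be reused or tested on its own.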
Separation of concerns is a very important tenet of any type of software development, even on the web. Too many times I have found that people just throw everything into as few files as possible and call it a day. This is most certainly the wrong way to do it. As has been mentioned, it will help with maintainability of the code for others, but more than that, it helps you read the code yourself. When everything is separated out, you can think about it easily.
CodeIgniter, I have found, has been the easiest framework to learn for working with PHP. I pretty much started my current job and was up and running with it within a few days, going from never having heard of it to using it pretty efficiently. I don't see it as another language at all, either. Basically, using the framework forces me to organize things in a manageable way, and the added functionality is analogous to using plugins and such for jQuery, or importing packages in Java. The thought that it's like learning another language seems almost silly.
So, in short, organize organize organize. Keep in mind, though, that there is a level of abstraction that just becomes absurd. A rule of thumb is that a class (or file in our case) should do one thing very well. This doesn't mean a class that merely wraps around print, but one that takes a string, formats it using a complex algorithm and then prints it (this is just an example). Each class should do something specific, and you can do that without any framework. What makes MVC great, though, is that it lets you organize things further, not just on the single class level, but on the level of "packages", being Model, View, and Controller (at least in the case of these frameworks; there are other ways to package projects). So, now you have single classes that do things well, and then you have them grouped with similar classes that do other things well. This way, everything is kept very clean and manageable.
The last level to think about once you have things organized into classes, and then packages, is how these classes get accessed between packages. When using MVC, the access usually will go Model<->Controller<->View, thus separating the model (which is usually database stuff and "business" code in the PHP world), from the view (which usually takes information from the user, and passes it along to the controller, who will then get more information from the model, if necessary, or do something else with the input information). The controller kind of works like the switchboard between the two other packages usually. Again, there are other ways to go with packaging and such, but this is a common way.
I hope that helps.
Smarty and other php template frameworks really do nothing more than compile to PHP anyway, and they also cache their results in most cases to allow for faster processing. You can do this all on your own, but if you ever look at the compiled templates that Smarty generates, and compare to the original Smarty template you create, you can see that one is far more readable than the other.
I write mostly mod_perl these days and started using templates (HTML::Template) halfway through our ongoing project. If I had to make the decision again, I would use templates right from the start - rewriting later to use templates is kind of tedious, though rewarding because you get nicer and cleaner code. For anything bigger than 2-3 pages in php, I would also use some template engine.
One big advantage of a templating engine such as Smarty is that non-developers can use it to embed the necessary logic that is used on the front-end (one really can't separate logic and display on all but the simplest sites). However, if the developer is the one maintaining the pages then using PHP would be preferable in my opinion.
If you separate out large logic blocks and maintain a consistent pattern for looping and for-each flow control statements (i.e. don't use print statements, or only use print statements for one-liners, etc.), then that should be okay.