I have a php application that has about 50-55 code files. The file that has the maximum amount of code has about 1200 lines of code(this includes the spaces, tabs and multiple line breaks...) and rest of the code files are relatively smaller than this.
The application code in almost every file is a mix of html, sql and php(what you call spaghetti), except in a few files that are pure php include files....for example a file containing functions that are needed in many other places.
I have been considering whether it's a good idea to refactor this application to a mvc type architecture.
Now i know that a mvc application offers plenty of advantages like ease of maintenance, reuse and ease of further development etc but what about scalability and performance - specifically in this case ?
My assumption is that since this is a small application(i believe so, do you think it's small enough?), i don't envision having a hard time with maintenance or adding a few more features(at the max) which just may mean a few additions in existing files or maybe say addition of a maximum 5-10 new files.
So i am thinking i shouldn't be converting to mvc just for maintenance sake.
As far as i understand you may put each component of mvc on a separate server to spread the load so as to have a different server serving html, database and logic and do other optimization/caching further as well to make a hich is mvc application scale and perform.
I think even though in a small spaghetti application we cannot have different servers for html, database etc we can easily scale without degrading the performance by having a load balancer in front of web servers, databae server etc.(Supposing it comes to a situation where one server is not enough)
Even more so on it's own the spaghetti code should perform better than mvc, since it doesn't have any overheads like requiring includes or files or function calls from files placed under folders belonging to a different component of the mvc.
So, considering all these things do you really think it's useful to refactor a relatively small spaghetti application to mvc for scalability and performance?
Please don't tell me that refactoring will be useful in future (i know that will help and will consider if we really need to add much more code to the existing code base) but please provide me with a clear answer to
1)Do i really need to convert this application to a mvc architecture for scalability and performance ?
2)Can a spaghetti application like this scale and perform for atleast a minimum of 1 million request a day and half of which occur during some peak time?
As far as i understand you may put each component of mvc on a separate server to spread the load
I've never heard that myself - but I come from a .Net world where you'd run all your managed code on the same server anyway (it's not like in the Java world where you often have a separate "App" server and "Web" server).
The main reason you'd probably move to MVC (just as you mentioned) is for benefits in managing the code: separation of concerns, re-use, etc; not performance.
In theory you could do this with object / component based technologies like Java or .Net where the components communicate with each other - but in procedural code? i don't think so!
So, considering all these things do you really think it's useful to refactor a relatively small spaghetti application to mvc for scalability and performance?
No - assuming by scalability and performance you're refering to runtime qualities of the system, which I believe is what you meant.
If you put scalability and performance purely into the context of development (people coding - how fast and easily they can work on the system, how easily you can add developers to the project) then the answer would be yes.
2)Can a spaghetti application like this scale and perform for atleast a minimum of 1 million request a day and half of which occur during some peak time?
Nothings impossible - I love Gordons comments along those lines - but as I'm sure you'd agree, it's probably not the best footing you could be on.
No, you don't have to convert because with an infinite budget any application will scale infinitely. Just add more servers. The question is, do you have an infinite budget? If you dont have an infinite budget, find out what is cheaper: add more hardware or optimize code.
So the real answer is: maybe.
We cannot tell you what it takes for your application to reach your scalability goals. We don't know what it does, nor do you provide hard limits for the desired performance. For instance, how long may a request take until it is served? Run ab or Siege and measure your Status Quo. Run a profiler on your code and identify bottlenecks. Find out whether you are IO, CPU or RAM bound. Are you using an Opcode cache? Take all your findings and make an educated guess about cost. Then decide how to optimize.
There will be a point where the effort required to shed some microseconds is higher than simply adding better or more hardware. In practise, you will likely use a mixed strategy to find the most scalable and affordable solution for your needs.
Note that refactoring a big ball of mud into a pristine OOP application does not necessarily mean it will run faster afterwards. In fact, the more loosely coupled, the more indirection, the more layers, the slower the application will likely become. It's a tradeoff for better maintainability. Maintainbility is a cost saver too. It will cut down on your time to deliver new features.
Yes
No
50+ files is not small. Spaghetti code is unmaintainable and highly inefficient, so there is no performance advantage over a proper framework. Finally, a good framework offers well-designed, well-tested, and constantly updated plugins to achieve most common tasks, reducing the amount of code that you have to write and maintain yourself.
Back in 1995 high-traffic sites were running tangled messes of spaghetti-code. Today, you shouldn't even think about running a high-traffic site (or any kind of site!) on spaghetti code.
Related
I know there already are a lot of posts floating on the web regarding this topic.
However, many people tend to focus on different things when talking about it. My main goal is to create a scalable web application that is easy to maintain. Speed to develop and maintain is far more appreciated BY ME than raw performance (or i could have used Java instead).
This is because i have noticed that when a project grows in code size, you must have maintainable code. When I first wrote my application in the procedural way, and without any framework it became a nightmare only after 1 month. I was totally lost in the jungle of spaghetti code lines. I didn't have any structure at all, even though i fought so badly to implement one.
Then I realized that I have to have structure and code the right way. I started to use CodeIgniter. That really gave me structure and maintainable code. A lot of users say that frameworks are slowing things down, but I think they missed the picture. The code must be maintainable and easy to understand.
Framework + OOP + MVC made my web application so structured so that adding features was not a problem anymore.
When i create a model, I tend to think that it is representing a data object. Maybe a form or even a table/database. So I thought about ORM (doctrine). Maybe this would be yet another great implementation into my web application giving it more structure so I could focus on the features and not repeating myself.
However, I have never used any ORM before and I have only learned the basics of it, why it's good to use and so on.
So now Im asking all of you guys that just like me are striving for maintainable code and know how important that is, is ORM (doctrine) a must have for maintainable code just like framework+mvc+oop?
I want more life experience advices than "raw sql is faster" advices, cause if i would only care about raw performance, i should have dropped framework+mvc+oop in the first place and kept living in a coding nightmare.
It feels like it fits so good into a MVC framework where the models are the tables.
Right now i've got like 150 sql queries in one file doing easy things like getting a entry by id, getting entry by name, getting entry by email, getting entry by X and so on. i thought that ORM could reduce these lines, or else im pretty sure that this will grow to 1000 sql lines in the future. And if i change in one column, i have to change all of them! what a nightmare again just thinking about it. And maybe this could also give me nice models that fits to the MVC pattern.
Is ORM the right way to go for structure and maintainable code?
Ajsie,
My vote is for an ORM. I use NHibernate. It's not perfect and there is a sizable learning curve. But the code is much more maintainable, much more OOP. Its almost impossible to create an application using OOP without an ORM unless you like a lot of duplicate code. It will definitely eliminate probably the vast majority of your SQL code.
And here's the other thing. If you're are going to build an OOP system, you'll end up writing your own O/R Mapper anyway. You'll need to call dynamic SQL or stored procs, get the data as a reader or dataset, convert that to an object, wire up relationships to other objects, turn object modifications into sql inserts/updates, etc. What you write will be slower and more buggy than NHibernate or something that's been in the market for a long while.
Your only other choice really is to build a very data centric, procedural application. Yes it may perform faster in some areas. I agree that performance IS important. But what matters is that its FAST ENOUGH. If you save a few milliseconds here and there doing procedural code, your users will not notice the performance increase. But you 'll notice the crappy code.
The biggest performance bottle-necks in an ORM are in the right way to pre-fetch and lazy-load objects. This gets into the n-query problems with ORMs. However, these are easily solved. You just have to performance tune your object queries and limit the number of calls to the database, tell it when to use joins, etc. NHibernate also supports a rich caching mechanism so you don't hit the database at all at times.
I also disagree with those that say performance is about users and maintenance is about coders. If your code is not easily maintained, it will be buggy and slow to add features. Your users will care about that.
I wont say every application should have an ORM, but I think most will benefit. Also don't be afraid to use native SQL or stored procedures with an ORM every now and then where necessary. If you have to do batch updates to millions of records or write a very complex report (hopefully against a separate, denormalized reporting database) then straight SQL is the way to go. Use ORMs for the OOP, transactional, business logic and C.R.U.D. stuff, and use SQL for the exceptions and edge cases.
I'd recommend reading Jeffrey Palermo's stuff on NHibernate and Onion Architecture. Also, take his agile boot camp or other classes to learn O/R Mapping, NHibernate and OOP. Thats what we use: NHibernate, MVC, TDD, Dependency Injection.
A lot of users say that frameworks are
slowing things down, but I think they
missed the big picture. The code MUST
BE MAINTAINABLE and EASY TO
UNDERSTAND.
A well-structured, highly-maintainable system is worthless if its performance is Teh Suck!
Maintability is something which benefits the coders who construct an application. Raw performance benefits the real people who use the app for their work (or whatever). So, whose concerns ought to be paramount: those who build the system or those who pay for it?
I know it's not as simple as that, because the customer will eventually pay for a poorly structured system - perhaps more bugs, certainly more time to fix them, more time to implement enhancements to the application. As is usually the case, everything is a trade-off.
I've started developing like you, without orm tools.
Then i worked for companies where software development was more industrialized, and they all use some kind of orm mapping tool (with more or less features). The development is far easier, faster, produce more maintainable code, etc.
But i've also seen the drawbacks of these tools : very slow performance. But it was mostly misuses of the tool (hibernate in that case).
Orm tool are very complex tool, so it is easy to misuse them, but if you have experience with them, you should be able to get nearly the same performances as with raw sql. I would have three advices for you :
If performance is not critical, use an orm tool (choose a good one, i am not developing with php, so i can't give you a name)
Be sure for each feature you add, to check the sql that the orm tool produce and send to the database (thanks to a logging facility for example). Think if it is the way you would have written your queries. Most of the inefficiencies of orm tools come from unwanted data that are gathered from the db, unique request split in multiple ones, etc. Slowness rarely comes from the tool in itself
Do not use the tool for everything. Choose wisely when not to use it (you reduce maintainability each time you do raw db access), but sometimes, it isn't just worst trying to make the orm tool do something it was not developed for.
Edit:
Orm tool are most useful with very complex model : many relationships between entities. Which is most of the time encountered in configuration part of the application, or in complex business part of the application.
So it is less useful if you have only few entities, and if there is less chance they get changed (refactored).
The limit between few entities and many is not clear. I would say more that 50 differents Types (sql tables, without join tables) is many, and less than 10 is few.
I don't know what was used to build stackoverflow but it must have been very carefully performance tested before.
If you want to build a web site that will get such a heavy load, and if you don't have experience with that, try to get someone in your team that have already worked on such sites (performance testing with a real set of data and a representative number of concurrent users is not an easy and fast task to implement). Having someone that have experience with it will greatly speed up the process.
Its very important to have a maintainabilty that is high. Ive developed large scaled web application with lowlevel super high preformance. The big disadvantage was maintaining the system, that is, developing new features. If you'r to slow developing the customers will look for other systems/applications.. Its a trade of. Most of the orms has features if you need to do optmized queries direct to sql. The orm itself isnt the bottleneck. Ill say its more about a good db design.
I think you missed the picture. Performance is everyday for your users, they care not at all about maintainability. You are being ethnocentric, you are concerned only for your personal concerns and not those of the the people who pay for the system. It isn't all about your convenience.
Perhaps you should sit down with the users and watch them use your system for day or two. Then you should sit down at a PC that is the same power as the ones they use (not a dev machine) and spend an entire week doing nothing but using your system all day long. Then you might understand their point.
I am working on a Facebook game which is developed using Zend framework. Right now I don't have lots of traffic and already seen quite a large # of data usage / CPU time.
Actually, I'm not good at Zend. I good at coding from scratch for both PHP & JS.
so, I am curious about the performance of Zend framework. becuase I'm thinking about rebuilding the applciation using Zend as the backend to manage the data / session / logic. and use JS (native code or JQuery) for the front-end rendering UI and handle user action in the client side.
In between, use aJax to get data from Zend backend.. most likely REST.
Anyone has suggestion about this kind of structure? I want to cut down the server load by that and also easier to manage code, plus better user experience.
Appreciate if anyone has good idea. :)
(POST few days later)
so, baseline PHP should be faster and use less data transfer (if code correctly) then Zend (or any) framework, right? Code reusablility is not a big concern here. :)
One thing many php developers fall victim of is sacrificing good architecture and sound principles for what they perceive as performance.
You may decide to cut corners in your code, but remember, "premature optimization is the root of all evil". So if you need to optimize early make sure that you're actually doing something useful.
The Zend libraries are engineered with best practice in mind, not necessarily performance. The rationale is that there are many ways to speed things up later without sacrificing code maintenance and readability (caching, load balancing, hardware, queue management, etc).
That been said, I don't think what you're looking for are statistics on ZF's performance, but rather advice on how to setup your application with it. Specifically, I'd advise you to create a dedicated, very light bootstrap for ajax requests. In ajax, you would normally only need some minimal prerequisites before dropping inside your controller. For the non ajax requests, set them up normally using the recommended architecture (bootstrap, front controller+plugins, controller+helpers, model+views+helpers).
My personal rule of thumb is that if I only am going to serve about 100 requests throughout the day, then there's very little reason to optimize anything. When the application starts feeling sluggish, if it generates enough revenue, maybe I can get a dedicated server, if not, there are always solutions such as apc, memcached, beanstalkd, etc.
Here's what you're looking for: PHP framework comparison benchmarks
Zend Framework vs raw PHP code tends to be close to a 1:500 requests per second ratio.
If you have a lot of time and not a lot of money, develop in raw PHP. If you have little time and enough money, develop in whatever is fastest without regard for performance...you can always add hardware later. If you fall in between the two, balance it as you see fit.
As much as I dislike it, CakePHP is faster to develop in than ZF, so it's great for time crunches. CodeIgniter is faster than either, but doesn't have as much built-in, so it's closer to the performance end of things without going to raw PHP.
There is only one thing to do first if you think that a ZF-based app is slow. Measure it. Various tools exist that will take the profiling output of Xdebug to show you which parts are slowing down the process, then you can make some moves towards optimising those parts (such as a lighter weight initialisation, and/or some caching). One good resource is the the Zend Framework book at survivethedeepend.com, "performance optimisation for ZF Apps".
In my opinion, benchmarks of frameworks are useless. They say something about an abstract situation, but performance is very situational. You should measure your application and optimise that. Yes, there's probably a limit to how far you can make your application go if you use lots of components from Zend Framework, but then there's also a limit for how fast your application can go before you need to drop PHP and write it in C. It's possible to make Zend Framework perform just fine on a high load site, but you have to put an effort in to it. Just like you have to if you don't use it.
I used to use procedural-style PHP. Later, I used to create some classes. Later, I learned Zend Framework and started to program in OOP style. Now my programs are based on my own framework (with elements of cms, but without any design in framework), which is built on the top of the Zend Framework.
Now it consists of lots classes. But the more I program, more I'm afraid. I'm afraid that my program will be slow because of them I'm afraid to add every another one class which can help me to develop but can slow the application.
All I know is that including lots of files slows application (using eAccelerator + gathering all the code in one file can speed up application 20 times!), but I have no idea if creating new classes and objects slows PHP by itself.
Does anyone have any information about it?
This bugs me. See...procedural code is not always spaghetti code, yet the OOP fanboys always presume that it is. I've written several procedural based web apps as well as an IRC services daemon in PHP. Amazingly, it seems to outperform most of the other ones that are out there and editing it is super easy. One of my friends who generally does OOP took a look at it and said "no code has the right to be this clean"
Conversely, I wrote my own PHP framework (out of boredom) and it was done in a purely OOP manner.
A good programmer can write great procedural code without the overhead classes bring. A bad programmer who uses OOP will always write crappy OOP code that slows things down.
There is no one right answer to which is better for PHP, but rather which is better for the exact scenario.
Here's good article discussing the issue. I also have seen some anecdotal bench-marks that will put OOP PHP overhead at 10-15%
Personally I think OOP is better choice since at the end it may perform better just because it probably was better designed and thought through. Procedural code tends to be messy and hard to maintain. So at the end - it has to be how critical is performance difference for your app vs. ability to maintain, extend and simply comprehend
The most important thing to remember is, design first, optimize later. A better design, which is more maintainable, is better than spaghetti code. Otherwise, you might as well write your web app in assembler. After you're done, you can profile (instead of guess), and optimize what seems slowest.
Yes, every include makes your program slower, but there is more to it than that.
If you decompose your program, over many files, there is a point where you're including/parsing/executing the least amount of code, vs the overhead of including all those files.
Furthermore, having lots of files with little code ain't so bad, because, as you said, using things like eAccelerator, or APC, is a trivial way to get a crap ton of performance back. At the same time you get, if you believe in them, all the wonderful benefits of having and Object Oriented code base.
Also, slow on a per request basis != not scalable.
Updated
As requested, PHP is still faster at straight up array manipulation than it is classes. I vaguely remember the doctrine ORM project, and someone comparing hydration of arrays versus objects, and the arrays came out faster. It's not an order of magnitude, it is noticable, however -- this is in french, but the code and results are completely understandable.. Just a note, that doctrine uses magic methods __get, and __set a lot, and these are also slower than an explicit variable access, part of doctrine's object hydration slowness could be attributed to that, so I would treat it as a worst case scenario. Lastly, even if you're using arrays, if you have to do a lot of moving around in memory, or tonnes of tests, such as isset, or functions like 'in_array' (it's order N), you'll screw the performance benefits. Also remember that objects are just arrays underneath, the interpreter just treats them as a special. I would, personally, favour better code than a small performance increase, you'll get more benefit from having smarter algorithms.
If your project contains many files and due to the nature of PHP's file access checking and restrictions, I'd recommend to turn on realpath_cache, bump up the configuration settings to reasonable numbers, and turn off open_basedir and safe_mode. Ensure to use PHP-FPM or SuExec to run the php process under a user id which is restricted to the document root to get back the security one usually gains from open_basedir and/or safe_mode.
Here are a few pointers why this is a performance gain:
https://bugs.php.net/bug.php?id=46965
http://nirlevy.blogspot.de/2009/01/slow-lstat-slow-php-slow-drupal.html
Also consider my comment on the answer from #Ólafur:
I found especially auto-loading to be the biggest slow down. PHP is extremely slow for directory lookup and file open access, the more PHP function you use during a custom auto-loader, the bigger the slow-down. You can help it a bit with turning off safe-mode (deprecated anyways) or even open-basedir (but I would not do that), but the biggest improvement comes from not using auto-loading and simply use "require_once" with complete fs pathes to require all dependencies per php file you use.
Using large frameworks for web apps that actually do not require so large number of classes for everything is probably the worst problem that many are not aware of. Strip it down at least not to include every bit of code, keep just what you need and throw the rest.
If you're using include_once() then you are causing an unnecessary slowdown, regardless of OOP design or not.
OOP will add an overhead to your code but I will bet that you will never notice it.
You may reconsider to rethink your classes structure and how do you implement them. If you said that OOP is slower you may have to redesign your classes and how do you implement them. A class is just a template of an object, any bad designed method affects all the objects of that class.
Use inheritance and polimorfism the most you can, this will effectively reduce the amount of behaviors and independent methods your classes need, but first off all you need to create a good inheritance map, abstracting your first or mother classes as much as you can.
It is not a problem about how many classes do you have, the problem is how many methods, properties or fields they have and how well are those methods structured. Inheritance reduces the amount of methods to design drammatically and the amount of code to be compiled too.
As several other people have pointed out, there is a mild overhead to OO PHP, but you can offset it by focusing your optimization effort on the core classes that your various other classes derive from. This is why C++ is becoming increasingly popular in the world of high-performance computing, traditionally the realm of C and Fortran.
Personally, I've never seen a PHP server that was CPU-constrained. Check your RAM use (you can optimize the core classes for this as well) and make sure you're not making unnecessary database calls, which are orders of magnitude more expensive than any extra CPU work you're doing.
If you design a huge OOP object hog, that does everything rather than doing functional decomposition to various classes, you will obviously fill up the memory with useless ballast code. Also, with a slow framework you will not make a simply hello World any fast. I noticed it is a kind trend (bad habit) that for one single facebook icon, people include a hole awesome font library and then next there is a search icon with fontello included. Each time they accomplish something unusual, they connect an entire framework. If you want to create a fast loading oop app use one framework only like zephir-phalcon or whatever you fancy and stick to it.
There are ways to limit the penalty from the include_once entries, and that's by having functions declared in the 'include_once' file that themselves have their code content in an 'include' statement. This will load your library of code, but only those functions actually being used will load code as it is needed. You take a second file system hit for the included code, but memory usages drop to practically nothing for the library itself, and only the code used by your program gets loaded. The hit from the second file system access can be mitigated by caching. When dealing with a large project of procedural based PHP, this provides low memory usage and fast processing. DO NOT do this with classes. This would be for a production instance, a development server will show all the penalty of hits since you don't want caching turned on.
I'm thinking about frameworks. I know they make life a lot easier, but as far as I understand they also significantly burden the server, don't they?
I'm looking for some kind of benchmarks, famous frameworks to basic php comparisons.
I have a will to create a kind of my own framework - a way simplier (no ORM and other cool stuff, only main ideas), but more flexible and faster.
Maybe the main question I'm interested in - How slower frameworks are comparing to basic framework-like PHP code with classes and other stuff, but with no ORM.
I'm not really sure whether I made myself clear - sorry for that.
This benchmark might interest you : Labor Day Benchmarks ; and here is an update : A Siege On Benchmarks.
It kinda compares basic-html, basic-php, and some well-known frameworks.
Still, I cannot post such an answer without saying something like this (quoting Survie the Deep End's introduction ; emphasis mine) :
Base performance is something you must
take with a pinch of salt. Since every
framework has varying requirements and
feature mixes it's hard (many would
say even pointless) to compare them on
an equal footing. The main point I'd
make is that judicious use of caching
and optimisation eliminates most base
performance advantages any framework
has over any other framework. Don't
neglect that point please!
In the end of the day (or, when working on a big project), what really makes a difference, at least for "normal projects" is how long you need to develop what your client asked.
And using a big framework with great features reduces development time much -- and facilitates maintenance even more !
Even if it means you'll need one more server : hardware is pretty cheap, compared to a couple of days/weeks of developper-time !
I come from the same background you are now attempting to head into. I too worked exclusively with a framework of my own, and it was great, cause I knew what went wrong when things broke down.
Until you reach for the nearest framework (in my instance Zend) and see that all of the hard work is usually not worth the hassle. Good software takes 10 years to write, imagine just how long a good framework takes.
And at the end of the day, you can speed up you application dramatically with clever use of caching techniques and horizontal scaling, so the framework selection becomes second nature. Choose what you like, choose something that is well supported and you'll be fine.
I'm building a PHP site, but for now the only PHP I'm using is a half-dozen or so includes on certain pages. (I will probably use some database queries eventually.)
Are simple include() statements a concern for speed or scaling, as opposed to static HTML? What kinds of things tend to cause a site to bog down?
Certainly include() is slower than static pages. However, with modern systems you're not likely to see this as a bottleneck for a long time - if ever. The benefits of using includes to keep common parts of your site up to date outweigh the tiny performance hit, in my opinion (having different navigation on one page because you forgot to update it leads to a bad user experience, and thus bad feelings about your site/company/whatever).
Using caching will really not help either - caching code is going to be slower than just an include(). The only time caching will benefit you is if you're doing computationally-intensive calculations (very rare, on web pages), or grabbing data from a database.
Sounds like you are participating in a bit of premature optimization. If the application is not built, while performance concerns are good to be aware of, your primary concern should be getting the app written.
Includes are a fact of life. Don't worry about number, worry about keeping your code well organized (PEAR folder structure is a lovely thing, if you don't know what I'm talking about look at the structure of the Zend Framework class files).
Focus on getting the application written with a reasonable amount of abstraction. Group all of your DB calls into a class (or classes) so that you minimize code duplication (KISS principles and all) and when it comes time to refactor and optimize your queries they are centrally located. Also get started on some unit testing to prevent regression.
Once the application is up and running, don't ask us what is faster or better since it depends on each application what your bottleneck will be. It may turn out that even though you have lots of includes, your loops are eating up your time, or whatever. Use XDebug and profile your code once its up and running. Look for the segments of code that are eating up a disproportionate amount of time then refactor. If you focus too much now on the performance hit between include and include_once you'll end up chasing a ghost when those curl requests running in sync are eating your breakfast.
Though in the mean time, the best suggestions are look through the php.net manual and make sure if there's a built in function doing something you are trying to do, use it! PHP's C-based extensions will always be faster than any PHP code that you could write, and you'll be surprised how much of what you are trying to do is done already.
But again, I cannot stress this enough, premature optimization is BAD!!! Just get your application up off the ground with good levels of abstraction, profile it, then fix what actually is eating up your time rather than fixing what you think might eat up your time.
Strictly speaking, straight HTML will always serve faster than a server-side approach since the server doesn't have to do any interpretation of the code.
To answer the bigger question, there are a number of things that will cause your site to bog down; there's just no specific threshold for when your code is causing the problem vs. PHP. (keep in mind that many of Yahoo's sites are PHP-driven, so don't think that PHP can't scale).
One thing I've noticed is that the PHP-driven sites that are the slowest are the ones that include more than is necessary to display a specific page. OSCommerce (oscommerce.com) is one of the most popular PHP-driven shopping carts. It has a bad habit, however, of including all of their core functionality (just in case it's needed) on every single page. So even if you don't need to display an 'info box', the function is loaded.
On the other hand, there are many PHP frameworks out there (such as CakePHP, Symfony, and CodeIgniter) that take a 'load it as you need it' approach.
I would advise the following:
Don't include more functionality than you need for a specific page
Keep base functions separate (use an MVC approach when possible)
Use require_once instead of include if you think you'll have nested includes (e.g. page A includes file B which includes file C). This will avoid including the same file more than once. It will also stop the process if a file can't be found; thus helping your troubleshooting process ;)
Cache static pages as HTML if possible - to avoid having to reparse when things don't change
Nah includes are fine, nothing to worry about there.
You might want to think about tweaking your caching headers a bit at some point, but unless you're getting significant hits it should be no problem. Assuming this is all static data, you could even consider converting the whole site to static HTML (easiest way: write a script that grabs every page via the webserver and dumps it out in a matching dir structure)
Most web applications are limited by the speed of their database (or whatever their external storage is, but 9/10 times that'll be a database), the application code is rarely cause for concern, and it doesn't sound like you're doing anything you need to worry about yet.
Before you make any long-lasting decisions about how to structure the code for your site, I would recommend that you do some reading on the Model-View-Controller design pattern. While there are others this one appears to be gaining a great deal of ground in web development circles and certainly will be around for a while. You might want to take a look at some of the other design patterns suggested by Martin Fowler in his Patterns of Enterprise Application Architecture before making any final decisions about what sort of design will best fit your needs.
Depending on the size and scope of your project, you may want to go with a ready-made framework for PHP like Zend Framework or PHP On Trax or you may decide to build your own solution.
Specifically regarding the rendering of HTML content I would strongly recommend that you use some form of templating in order to keep your business logic separate from your display logic. I've found that this one simple rule in my development has saved me hours of work when one or the other needed to be changed. I've used http://www.smarty.net/">Smarty and I know that most of the frameworks out there either have a template system of their own or provide a plug-in architecture that allows you to use your own preferred method. As you look at possible solutions, I would recommend that you look for one that is capable of creating cached versions.
Lastly, if you're concerned about speed on the back-end then I would highly recommend that you look at ways to minimize your calls your back-end data store (whether it be a database or just system files). Try to avoid loading and rendering too much content (say a large report stored in a table that contains hundreds of records) all at once. If possible look for ways to make the user interface load smaller bits of data at a time.
And if you're specifically concerned about the actual load time of your html content and its CSS, Javascript or other dependencies I would recommend that you review these suggestions from the guys at Yahoo!.
To add on what JayTee mentioned - loading functionality when you need it. If you're not using any of the frameworks that do this automatically, you might want to look into the __autoload() functionality that was introduced in PHP5 - basically, your own logic can be invoked when you instantiate a particular class if it's not already loaded. This gives you a chance to include() a file that defines that class on-demand.
The biggest thing you can do to speed up your application is to use an Opcode cache, like APC. There's an excellent list and description available on Wikipedia.
As far as simple includes are concerned, be careful not to include too many files on each request as the disk I/O can cause your application not to scale well. A few dozen includes should be fine, but it's generally a good idea to package your most commonly included files into a single script so you only have one include. The cost in memory of having a few classes here and there you don't need loaded will be better than the cost of disk I/O for including hundreds of smaller files.