To give some context:
I had a discussion with a colleague recently about the use of Autoloaders in PHP. I was arguing in favour of them, him against.
My point of view is that autoloaders can help you minimise manual source dependencies, which in turn reduces the memory consumed by including lots of large files you may not need.
His response was that including files you do not need is not a big problem, because once a file has been included it is kept in memory by the Apache child process, and that portion of memory is available for subsequent requests. He argues that you should not be concerned about the number of included files, because soon enough they will all be loaded into memory and used on demand. Memory is therefore less of an issue, while the overhead of locating the file you need on the filesystem is much more of a concern.
He's a smart guy and tends to know what he's talking about. However, I always thought that the memory used by Apache and PHP was specific to that particular request being handled.
Each request is allowed memory up to the memory_limit PHP setting, and any source compilation and processing is only valid for the life of that request.
Even with opcode caches such as APC, I thought that each request still needs to load every file into its own portion of memory, and that APC is just a shortcut that hands the responding process a pre-compiled version.
I've been searching for some documentation on this but haven't managed to find anything so far. I would really appreciate it if someone can point me to any useful documentation on this topic.
UPDATE:
Just to clarify, the autoloader discussion part was more of a context :).
It may not have been clear but my main question is about whether Apache will pool together its resources to respond to multiple requests (especially memory used by included files), or whether each request will need to retrieve the code required to satisfy the execution path in isolation from other requests handled from the same process.
e.g.:
Files 1, 2, 3 and 4 are an equal size of 100KB each.
Request A includes file 1, 2 and 3.
Request B includes file 1, 2, 3 and 4.
In his mind, Request A will consume 300KB for the entirety of its execution, and Request B will only consume a further 100KB because files 1, 2 and 3 are already in memory.
In my mind it's 300KB and 400KB, because each request is processed independently (even if by the same process).
This brings him back to his argument that "just include the lot 'cos you'll use it anyway" as opposed to my "only include what you need to keep the request size down".
This is fairly fundamental to how I approach building a PHP website, so I would be keen to know if I'm off the mark here.
I've also always believed that for a large-scale website, memory is the most precious resource and more of a concern than the filesystem checks for an autoloader, which are probably cached by the kernel anyway.
You're right though, it's time to benchmark!
Here's how you win arguments: run a realistic benchmark, and be on the right side of the numbers.
I've had this same discussion, so I tried an experiment. Using APC, I tried a Kohana app with a single monolithic include (containing all of Kohana) as well as with the standard autoloader. The final result was that the single include was faster by a statistically insignificant margin (less than 1%) but used slightly more memory (according to PHP's memory functions). Running the test without APC (or XCache, etc.) is pointless, so I didn't bother.
So my conclusion was to continue using autoloading because it's much simpler. Try the same thing with your app and show your friend the results.
Now you don't need to guess.
Disclaimer: I wasn't using Apache. I cannot emphasize enough that you should run your own benchmarks, on your own hardware, with your own app. Don't trust that my experience will be yours.
You are the wiser ninja, grasshopper.
Autoloaders don't load a class file until the class is requested. This means they will use at most the same amount of memory as manual includes, but usually much less.
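For reference, a minimal sketch of the pattern (the classes/ directory layout and the underscore-to-path mapping here are illustrative assumptions, not a standard):

    <?php
    // Register a lazy loader: the class file is only read when the
    // class is first referenced, never before.
    spl_autoload_register(function ($class) {
        // Hypothetical convention: User_Repository -> classes/User/Repository.php
        $file = __DIR__ . '/classes/' . str_replace('_', '/', $class) . '.php';
        if (is_file($file)) {
            require $file;
        }
    });

    $repo = new User_Repository(); // triggers the include on first use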
Classes get read fresh from the file on each request, even though an Apache child can handle multiple requests, so your friend's "eventually they're all read into memory" argument doesn't hold water.
You can prove this by putting an echo 'foo'; above the class definition in the class file (see the sketch below). You'll see that on each new request the line is executed, regardless of whether you autoload or manually include the whole world of class files at the start.
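A sketch of that test (file and class names are hypothetical):

    <?php
    // classes/Foo.php
    // This marker runs every time the file is compiled for a request.
    echo 'Foo.php parsed at ' . date('H:i:s') . "\n";

    class Foo
    {
        public function greet()
        {
            return 'hello';
        }
    }

If your colleague were right, the marker would print only once per Apache child; in practice it prints on every single request.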
I couldn't find any good, concise documentation on this (I may write some, with memory usage examples), as I have also had to explain it to others and show evidence before it sank in. I think the folks at Zend assumed nobody would fail to see the benefits of autoloading.
Yes, APC and the like (as with all caching solutions) can overcome the resource negatives and even eke out small performance gains, but you eat up a lot of unneeded memory if you do this across a non-trivial number of libraries while serving a large number of clients. Try loading a healthy chunk of the PEAR libraries in a massive include file while handling 500 connections hitting your page at the same time.
Even with APC, you benefit from autoloaders when using non-namespaced classes (most existing PHP code today), as they help avoid global namespace pollution when dealing with large numbers of class libraries.
This is my opinion.
I think autoloaders are a very bad idea, for the following reasons:
I like to know what and where my scripts are grabbing the data/code from. Makes debugging easier.
There are also configuration problems, insofar as if one of your developers changes a file (an upgrade, etc.) or the configuration and things stop working, it is harder to find out where it broke.
I also think that it is lazy programming.
As for memory/performance issues, it is just as cheap to buy more memory for the machine if it is struggling.
Related
I am now writing a PHP framework. I am wondering whether PHP will slow down when it require/include-s (or require_once/include_once-s) too many files during a request.
Well of course it will. Doing anything too many times will cause a slow down.
On a more serious note, though: IO operations that touch disk are very slow compared to anything that happens in memory. So oftentimes, including files will be a major performance factor when using a large framework (just look at Zend Framework...).
However, there are typically ways to alleviate this such as APC and similar op code caches.
Sometimes programming approaches are also taken. For example, if I remember correctly, Doctrine 1 can bundle everything into one giant file so as to make fewer IO calls.
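If you wanted to experiment with that idea yourself, a naive sketch might look like this (the lib/ path and output file are assumptions, and a real tool would need to handle namespaces, closing tags and include order):

    <?php
    // build_compiled.php -- glue every library class into one include.
    $out = "<?php\n";
    foreach (glob(__DIR__ . '/lib/*.php') as $file) {
        $code = php_strip_whitespace($file);   // source minus comments/whitespace
        $out .= substr($code, strlen('<?php')) . "\n"; // drop each opening tag
    }
    file_put_contents(__DIR__ . '/compiled.php', $out);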
If in doubt, do some in-depth profiling of an application written with your framework and see if include/require/etc. are among the major slow points.
Yes, this will slow your application down. The *_once calls are generally more expensive, since PHP must check whether the file has already been included. With a lot of includes you get a lot of hard disk access and a lot of memory usage combined. I've developed applications with the Zend Framework that include a total of 150 to 200 files per request; you can really see the impact that has on overall performance.
Every additional file you include adds some load. However, if you have to choose between require and require_once, note that require_once/include_once cost more, because the server must check whether the same file has already been included elsewhere. If you can avoid that, you gain at least a little performance.
Unless you use a cache, every time a request comes in those files are included again and again. That will surely slow things down. Create a framework that only includes what needs to be included.
I've just noticed that my app is including over 148 PHP files on one page. Bear in mind this is the back-end admin and not the main site, but is this too many? What impact does a large number of includes have on a server, both under average load and when stressed? Would disk I/O be a problem?
Included File Stats
    File Type        Include Count   Combined File Size
    Index            1               0.00169 MB
    Bootstrap        1               0.01757 MB
    Helper           98              0.58557 MB (11 are profiler-related classes)
    Configuration    8               0.00672 MB
    Data Store       23              0.10836 MB
    Action           8               0.02652 MB
    Page             1               0.00094 MB
    I18n Resource    7               0.00870 MB
    Vendor Library   1               0.02754 MB
    Total            148             0.78362 MB
Time ran: 0.123920917511
Memory used: 2.891 MB
Edit 1. Should be noted that this is a worst case scenario page. It has many different template models, controllers and associated views because it handles publishing with custom fields.
Edit 2. Also, the front end has aggressive page caching, so the number of includes there is roughly 30-40 at the moment.
Edit 3. When the profiler is turned off its files aren't included, which will remove quite a few includes.
So, here's a breakdown of the potential problems.
The number of files itself is an issue. Unless you're using a bytecode cache (and you are), and that cache is configured to not stat the file prior to pulling in the compiled bytecode, PHP is going to stat every single one of those files on include, then read them in. In some cases, that can also mean path resolution and a naive autoloader that pokes and prods at numerous directories. This won't be "slow" because the OS will surely have things cached if the files are hit frequently, but it does add precious milliseconds to each request.
If every autoloader is designed properly and the codebase relies entirely on the autoloader to pull in the required classes (meaning nothing uses include/require/include_once/require_once on a class file), you can avoid having to open and read many of the files by gluing every single class together into a single large include. This is a bit on the impractical side of things, mainly because if there is no bytecode cache, PHP still has to parse, compile and interpret it all. Additionally, not every class is going to be used on every request, so it may be a bit wasteful.
The bottom line is that a well-configured bytecode cache will completely mitigate this problem. There's nothing wrong with telling your customers that they have to properly configure their servers for optimal performance. If they know what they're doing, they'll have everything correct to begin with.
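For illustration, a plausible APC configuration along those lines (the sizes are guesses; tune them to your own codebase):

    ; php.ini
    apc.enabled  = 1
    apc.shm_size = 64M  ; big enough to hold all compiled bytecode
    apc.stat     = 0    ; skip the per-include stat(); requires a cache
                        ; flush or restart whenever you deploy new code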
Yes, so many files can be a problem.
No, it is probably not a problem in your case, since this is only a back-end, which is probably accessed by a few people, and not too often.
In general, I would discourage having more than 20 PHP files loaded on each page. Even if the website and the server are highly optimized, for every page the server must check every file to see at least whether it changed since the last request (if there is no cache at this level).
Even if the time to access a file is tiny, it is time you are losing on each request. This tiny period multiplied by 148 can become an issue (and a huge scalability problem).
When I worked on a PHP framework project, I used a trick to reduce the number of files. Several files were combined into one minified file, and this single file was cached. Then, if there was a need to update the framework or the website, the cached file was automatically removed and rebuilt.
Even though I would personally discourage you from minifying the source code (because it is difficult to do, difficult to test, and creates a bunch of problems, like meaningless line numbers in errors), you can probably do the same thing by combining all your files into a single one.
Be careful: if page A uses half of those files and page B uses the other half, combining everything will probably decrease performance, since the PHP engine will have to parse more code.
Are the includes themselves doing something fancy, like db queries? And are they all at the top of the page, or are they included as-needed?
Those stats don't look bad, so if admin access is infrequent you may be OK. But you should examine this from a design angle: can things be organized in a way that prevents you from having to maintain so many includes? Separate from any performance issues, there is a risk here of creating hard-to-track dependency bugs.
(It could be as MainMa said, related to a framework, in which case you may have no control over the above. I only mention it in case you do.)
A couple things in case you didn't know already:
If it's just text or static HTML, you can get the contents with file_get_contents(), readfile(), etc. This is somewhat faster because the loaded file doesn't need parsing. But obviously, if it contains PHP code this won't help. (See the sketch after this list.)
You can use include_once() to prevent the same file from being included twice (if, for instance, it's included by two files that are themselves included by the top-level file).
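A quick sketch of the first point above (paths are placeholders):

    <?php
    // Static fragment: readfile() streams the bytes without running
    // the PHP compiler over them.
    readfile(__DIR__ . '/templates/footer.html');

    // By contrast, include compiles and executes the file -- only
    // worth the cost when the file actually contains PHP code.
    include __DIR__ . '/templates/sidebar.php';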
Disk I/O won't be your problem. The system will cache frequently accessed files in RAM, or if they aren't that frequently accessed, it won't matter.
Load times may be an issue, as each file has to be requested and interpreted by the server separately.
I don't know how the web server will cope with the many requests; it may not care. If the client doesn't do pipelined requests though, you'll pay for many many TCP connections built up and torn down, which also costs a goodly amount of latency.
Honestly, don't worry about it: 148 is nothing. Even with zero caching on the PHP side you're going to hit filesystem caches almost every time, and in the grand scheme of things virtually every open-source anything out there has way more files without a problem (Drupal, WordPress, Joomla, Elgg, anything).
Really, no problem here. Even if you managed to shave a millisecond off here or there, it's so far down the list of places where you can make speed gains that it's barely worth considering for more than a second.
Caveat: do try to use require_once and include_once where suited, and ensure you only load the classes/files needed to process a given request.
I read through the related questions and didn't find my answer. This isn't about require/require_once or the use of the __autoload function or even the name of the files.
My company builds large sites and as we've grown, the practice we've grown into is splitting up functions by their relation such as:
inc.functions-user.php
inc.functions-media.php
inc.functions-calendar.php
Each of these files tends to be 1,000 to 3,000 lines of code. Combining them would create a monster to maintain and make things more difficult when several developers work on them.
However, on some of our larger sites we end up with somewhere between 8 and 15 of these individual function files.
Is including all 15 function files in the header the best way, or should we find a way to combine them? Are 12 includes vs. 5 includes significantly detrimental to the running of our site?
If you care about performance, install an opcode cache like APC, which will save the compiled form of the script in memory.
If you don't want to install APC, the difference is minimal. Yes, accessing fewer files takes less time, but that's not where most of the time is spent (especially as the filesystem should be able to cache the uncompiled scripts in memory if they are requested often enough).
Calling include/require 5 times instead of 12 is not very different; what matters is the content of the included file(s).
Also, opcode caches such as APC or XCache are well suited to your purpose.
I would even suggest to split them into much more files.
Look at the MVC pattern, or other frameworks: they are extremely split up, so you can easily maintain "only" parts without worrying about breaking something, as long as you follow your structure.
Some points I also think about:
Rasmus Lerdorf has said frequently that "you shouldn't have more than about five includes". I can only assume he knows what he's talking about, because he made PHP. I am skeptical about the feasibility of this, however, especially on large projects.
I've found that it's better for development and milestones to make life easier on your developers. If that means separate files, then that's a good idea.
If you're worried about CPU usage or bandwidth, there are probably more obvious bottlenecks than liberal use of include. Optimizing slow functions is a better way to make the app faster, and paying attention to images and CSS or JS files is a good way to reduce bandwidth.
With vanilla PHP it is generally better to use as few include files as possible, but of course that makes maintenance a pain. Use an opcode cache such as APC and the performance problem will pretty much disappear. Also, 12 files isn't a very large number of includes, compared to the large MVC-frameworks and other libraries. Keeping the functions separated in a logical structure is the best way by far.
I want to know when building a typical site on the LAMP stack how do you optimize it for the best possible load times. I am picturing a typical DB-driven site.
This is a high-level look that could probably be pulled apart into separate questions, so let me break it down into each layer of the stack.
L - At the system level (setup and filesystem), what can you do to improve speed? One thing I can think of is image sizes; can compression here help optimize anything?
A - There must be a ton of settings related to site speed in the web server. Not my forte. It probably depends a lot on how many sites are running concurrently.
M - MySQL: in a database-driven site, DB performance is key. Is there a better normalization approach, e.g., using link tables? Web developers often just make simple monolithic tables resembling 1NF, and this can kill performance.
P - Aside from performance-boosting settings like caching, what can the programmer do to affect performance at a high level? I would really like to know whether MVC design approaches hurt performance more than quick-and-dirty code. Other simple tips, like whether sessions are faster than cookies, would be interesting to know.
Obviously you have to get down and dirty into the details and find what code is slowing you down. I also realize that many sites have many different performance characteristics, but let's assume a typical site that has more reads than writes.
I am just wondering if we can compile a bunch of best practices, and I fully expect people to link other questions so we can effectively work up a checklist.
My goal is to see if, in addition to the usual performance issues, some oddball things you might not think of crop up, to go along with a best-practices summary.
So my question is, if you were starting from scratch, how would you make sure your LAMP site was fast?
Here's a few personal must-dos that I always set up in my LAMP applications.
Install mod_deflate for Apache, and do not use PHP's gzip handlers. mod_deflate will allow you to compress static content, like JavaScript/CSS/static HTML, as well as the usual dynamic PHP output, and it's one less thing you have to worry about in your code. (A config sketch follows.)
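A minimal sketch of such a setup (the MIME-type list is an assumption; adjust it to the content you actually serve):

    # vhost or main config
    <IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE text/html text/plain text/css
        AddOutputFilterByType DEFLATE application/javascript application/json
    </IfModule>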
Be careful with .htaccess files! Enabling .htaccess files for directories in your app means that Apache has to scan the filesystem constantly, looking for .htaccess directives. It is far better to put directives inside the main configuration or a vhost configuration, where they are loaded once. Any time you can get rid of a directory-level access file by moving it into a main configuration file, you save disk access time.
Prepare your application's database layer to utilize a connection manager of some sort (I use a Singleton for most applications). It's not very hard to do, and reducing the number of database connections your application opens saves resources. (A sketch follows.)
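A bare-bones sketch of the Singleton approach (the DSN and credentials are placeholders):

    <?php
    class Db
    {
        private static $pdo = null;

        // Returns the one shared connection, opening it lazily on first use.
        public static function get()
        {
            if (self::$pdo === null) {
                self::$pdo = new PDO('mysql:host=localhost;dbname=app',
                                     'user', 'secret');
            }
            return self::$pdo;
        }
    }

    $rows = Db::get()->query('SELECT NOW()')->fetchAll();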
If you think your application will see significant load, memcached can perform miracles. Keep this in mind while you write your code: perhaps one day, instead of creating objects on the fly, you will be getting them from memcached. A little foresight will make implementation painless. (See the sketch below.)
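The pattern looks roughly like this (key names, TTL and the loader function are hypothetical):

    <?php
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    $user = $cache->get('user:42');
    if ($user === false) {                    // cache miss
        $user = load_user_from_db(42);        // hypothetical expensive call
        $cache->set('user:42', $user, 300);   // keep it for five minutes
    }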
Once your app is up and running, set MySQL's slow query time to a small number and monitor the slow query log diligently. This will show you where your problem queries are coming from, and allow you to optimize your queries and indexes before they become a problem. (Example settings below.)
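Something along these lines in my.cnf (these are the standard MySQL option names; the threshold is a judgment call, and fractional values need MySQL 5.1 or later):

    [mysqld]
    slow_query_log      = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time     = 0.5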
For serious performance tweakers, you will want to compile PHP from source. Installing from a package installs a lot of libraries that you may never use. Since PHP environments are loaded into every instance of an Apache thread, even a 5MB memory overhead from extra libraries quickly becomes 250MB of lost memory when there are 50 Apache threads in existence. I keep a list of my standard ./configure line I use when building PHP here, and I find it suits most of my applications. The downside is that if you end up needing a library, you have to recompile PHP to get it. Analyze your code and test it in a devel environment to make sure you have everything you need.
Minify your JavaScript.
Be prepared to move static content, such as images and video, to a non-dynamic web server. Write your code so that any URLs for images and video are easily configured to point to another server in the future. A web server optimized for static content can easily serve tens or even hundreds of times faster than a dynamic content server.
That's what I can think of off the top of my head. Googling around for PHP best practices will find a lot of tips on how to write faster/better code as well (Such as: echo is faster than print).
First, realize that performance is an iterative process. You don't build a web application in a single pass, launch it, and never work on it again. On the contrary, you start small, and address performance issues as your site grows.
Now, onto specifics:
Profile. Identify your bottlenecks. This is the most important step. You need to focus your effort where you'll get the best results. You should have some sort of monitoring solution in place (like cacti or munin), giving you visibility into what's going on on your server(s)
Cache, cache, cache. You'll probably find that database access is your biggest bottleneck on the back end -- but you should verify this on your own. Fortunately, you'll probably find that a lot of your traffic is for a small set of resources. You can cache those resources in something like memcached, saving yourself the database hit, and resulting in better backend performance.
As others have mentioned above, take a look at the YDN performance rules. Consider picking up the accompanying book. This'll help you with front end performance
Install PHP APC, and make sure it's configured with enough memory to hold all your compiled PHP bytecode. We recently discovered that our APC installation didn't have nearly enough RAM; giving it enough to work with cut our CPU time in half and disk activity by 10%.
Make sure your database tables are properly indexed. This goes hand in hand with monitoring the slow query log.
The above will get you very far. That is to say, even a fairly db-heavy site should be able to survive a frontpage digg on a single modestly-spec'd server if you've done the above.
You'll eventually hit a point where the default apache config won't always be able to keep up with incoming requests. When you hit this wall, there are two things to do:
As above, profile. Monitor your apache activity -- you should have an idea of how many connections are active at any given time, in addition to the max number of active connections when you get sudden bursts of traffic
Configure apache with this in mind. This is the best guide to apache config I've seen: Practical mod_perl chapter 11
Take as much load off of apache as you can. Apache's too heavy-duty to serve static content efficiently. You should be using a lighter-weight reverse proxy (like squid) or webserver (lighttpd or nginx) to serve static content, and to take over the job of spoon-feeding bytes to slow clients. This leaves Apache to do what it does best: execute your code. Again, the mod_perl book does a good job of explaining this.
Once you've gotten this far, it's largely an issue of caching more, and keeping an eye on your database. Eventually, you'll outgrow a single server. First, you'll probably add more front end boxes, all backed by a single database server. Then you're going to have to start spreading your database load around, probably by sharding. For an excellent overview of this growth process, see this livejournal presentation
For a more in-depth look at much of the above, check out Building Scalable Web Sites, by Cal Henderson, of Flickr fame. Google has portions of the book available for preview
I've used MySQLTuner for performance analysis on my MySQL servers, and it's given good insight into further issues to google, as well as making its own recommendations.
A resource you might find helpful is the YDN set of performance rules.
Don't forget the fact that your users will be thousands of miles away from your server, and downloading dozens of files to render a single page. That latency, and the overhead of rendering the page in their browsers can be larger than the amount of time that you spend collecting the information, and generating the page.
See the pages at Yahoo Developer Network about Best Practices for Speeding Up Your Web Site, and the YSlow tool for seeing what part of the downloading of the site is taking time.
Don't forget to turn off atime for your filesystem!
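For example, in /etc/fstab (the device and mount point are placeholders):

    /dev/sda1  /var/www  ext4  defaults,noatime,nodiratime  0  2

Without noatime, every read of an include file also triggers a metadata write to update its access time.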
I'd recommend using Jet Profiler for MySQL to find any bad queries. I've successfully used it on a couple of my sites. Really helpful, and much easier to digest than the slow query log.
I'd recommend starting with http://highscalability.com/
As for your suggestions:
Compression for images: definitely not. Filesystem tuning: yes, that could have some effect, but minimal. The best option is actually an in-memory reverse proxy, or even better a CDN.
For Apache, basically only load the modules you need; do not load anything else. Since with PHP you can only use the forking MPM, it's important to keep it slim. As for optimal settings, well, you have to fine-tune them to the specific application, hardware, etc. If you have enough CPU, using mod_deflate is recommended: the faster the server can send data to the client, the faster it can start processing the next request.
I was just reading over this thread where the pros and cons of using include_once and require_once were being debated. From that discussion (particularly Ambush Commander's answer), I've taken away the fact(?) that any sort of include in PHP is inherently expensive, since it requires the processor to parse a new file into OP codes and so on.
This got me to thinking.
I have written a small script which will "roll" a number of JavaScript files into one (appending all of their contents into one file), such that it can be packed to reduce HTTP requests and overall bandwidth usage.
Typically for my PHP applications, I have one "includes.php" file which is included on each page, and that then includes all the classes and other libraries which I need. (I know this probably isn't the best practice, but it works; the __autoload feature of PHP 5 is making this better in any case.)
Should I apply the same "rolling" technique on my PHP files?
I know of that saying about premature optimisation being evil, but let's take this question as theoretical, ok?
There is a problem with Apache/PHP on Windows which causes an application to be extremely slow when loading or even touching too many files (a page which loads approx. 50-100 files may spend a few seconds on file operations alone). This problem appears both with including/requiring and when working with files (fopen, file_get_contents, etc.).
So if you (or, more likely, anybody else, given the age of this post) ever run your app on Apache/Windows, reducing the number of loaded files is absolutely necessary. Combine several PHP classes into one file (an automated script for this would be useful; I haven't found one yet), or be careful not to touch any unneeded file in your app.
That would depend somewhat on whether it is more work to parse several small files or one big one. If you require files on an as-needed basis (not saying you necessarily should do things that way), then presumably for some execution paths there would be considerably less compilation required than if all your code were rolled into one big PHP file that the parser had to encode in its entirety, whether it was needed or not.
In keeping with the question, this is thinking aloud more than expertise on the internals of the PHP runtime; it doesn't sound as though there is any real-world benefit to getting too involved with this at all. If you run into a serious slowdown in your PHP, I would be very surprised if the use of require_once turned out to be the bottleneck.
As you've said: "premature optimisation ...". Then again, if you're worried about performance, use an opcode cache like APC, which makes this problem almost disappear.
This isn't an answer to your direct question, just about your "js packing".
If you leave your JavaScript files alone and allow them to be included individually in the HTML source, the browser will cache those files. Then, on subsequent requests when the browser requests the same JavaScript file, your server will return a 304 Not Modified header and the browser will use the cached version. However, if you're "packing" the JavaScript files together on every request, the browser will re-download the file on every page load. (A sketch of a conditional-GET handler follows.)
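If you do serve a packed file from PHP, you can still get the caching benefit by honouring conditional requests yourself. A sketch, assuming the combined file is written to a cache path (the path is a placeholder):

    <?php
    $file  = __DIR__ . '/cache/combined.js';
    $mtime = filemtime($file);

    // Answer a conditional request with 304 so the browser reuses its copy.
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        && strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
        header('HTTP/1.1 304 Not Modified');
        exit;
    }

    header('Content-Type: application/javascript');
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
    readfile($file);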