I've just noticed that my app is including over 148 PHP files on one page. Bear in mind this is the back-end admin and not the main site, but is this too many? What impact does a large number of includes have on a server, both under average load and when stressed? Would disk I/O be a problem?
Included File Stats
File Type - Include Count - Combined File Size
Index - 1 - 0.00169 MB
Bootstrap - 1 - 0.01757 MB
Helper - 98 - 0.58557 MB - (11 are Profiler related classes)
Configuration - 8 - 0.00672 MB
Data Store - 23 - 0.10836 MB
Action - 8 - 0.02652 MB
Page - 1 - 0.00094 MB
I18n Resource - 7 - 0.00870 MB
Vendor Library - 1 - 0.02754 MB
Total Files - 148 - 0.78362 MB
Time taken: 0.123920917511 s
Memory used: 2.891 MB
Edit 1: It should be noted that this is a worst-case-scenario page. It has many different template models, controllers and associated views because it handles publishing with custom fields.
Edit 2: The front end also has aggressive page caching, so the number of includes there is roughly 30-40 at the moment.
Edit 3: When the profiler is turned off its files are not included, which will remove quite a few of the includes.
So, here's a breakdown of the potential problems.
The number of files itself is an issue. Unless you're using a bytecode cache (and you should be), and that cache is configured not to stat each file before pulling in the compiled bytecode, PHP is going to stat every single one of those files on include, then read them in. In some cases, that can also mean path resolution and a naive autoloader that pokes and prods at numerous directories. This won't be "slow", because the OS will almost certainly have things cached if the files are hit frequently, but it does add precious milliseconds to each request.
If the autoloader is designed properly and the codebase relies entirely on it to pull in the required classes (meaning nothing uses include/require/include_once/require_once on a class file), you can avoid having to open and read many of the files by gluing every single class together into one large include. This is a bit on the impractical side, mainly because without a bytecode cache PHP still has to parse, compile and interpret it all. Additionally, not every class is going to be used on every request, so it may be a bit wasteful.
The bottom line is that a well-configured bytecode cache will completely mitigate this problem. There's nothing wrong with telling your customers that they have to properly configure their servers for optimal performance. If they know what they're doing, they'll have everything correct to begin with.
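A quick, hedged way to check that configuration on a given server (this assumes the OPcache extension; other caches such as APC have analogous settings, e.g. apc.stat):

```php
<?php
// Rough check of whether the bytecode cache is set up to skip the per-include stat().
// Assumes the OPcache extension is available; this is a sketch, not a definitive test.
if (function_exists('opcache_get_status') && opcache_get_status(false) !== false) {
    echo ini_get('opcache.validate_timestamps')
        ? "OPcache is on, but it still stats each file before using the cached bytecode.\n"
        : "OPcache is on and serves cached bytecode without touching the filesystem.\n";
} else {
    echo "No bytecode cache detected; every include is stat'd, read, parsed and compiled.\n";
}
```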
Yes, so many files can be a problem.
No, it is probably not a problem in your case, since this is only a back-end, which is probably accessed by a few people, and not too often.
In general, I would discourage having more than 20 PHP files loaded on each page. This is because, even if the website and the server are highly optimized, for every page the server must go and look at every file, at the very least to see if it changed since the last request (if there is no cache implemented at this level).
Even if the time to access a file is tiny, it is time you are losing on each request. This tiny period of time, multiplied by 148, can become an issue (and a huge scalability problem).
When I worked on a PHP framework project, I used a trick to reduce the number of files: several files were combined into one minified file, and this single file was cached. Then, if there was a need to update the framework or the website, the cached file was automatically removed and rebuilt.
Even though I would personally discourage minifying the source code (it is difficult to do, difficult to test, and creates a bunch of problems of its own, such as meaningless line numbers in error messages), you can probably do the same thing by combining all your files into a single file.
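A minimal sketch of that combine-and-cache approach, assuming a flat classes/ directory and a writable cache/ directory (all paths and names are illustrative, not from any particular framework):

```php
<?php
// Build step: glue several class files into one cached include so a request
// only has to open a single file.
$classFiles = glob(__DIR__ . '/classes/*.php');   // illustrative location
$cachePath  = __DIR__ . '/cache/combined.php';    // illustrative cache file

if (!file_exists($cachePath)) {
    $combined = "<?php\n";
    foreach ($classFiles as $file) {
        // Strip each file's opening tag before appending its contents.
        $combined .= preg_replace('/^<\?php\s*/', '', file_get_contents($file)) . "\n";
    }
    file_put_contents($cachePath, $combined, LOCK_EX);
}

require_once $cachePath; // one include instead of many

// Deleting cache/combined.php after an update forces a rebuild on the next request.
```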
Be careful: if page A uses half of those files and page B uses the other half, combining everything will probably decrease performance, since the PHP engine will have to parse more code than necessary.
Are the includes themselves doing something fancy, like db queries? And are they all at the top of the page, or are they included as-needed?
Those stats don't look bad, so if admin access is infrequent, you may be OK. But you should examine this from a design angle: can things be organized in a way that would prevent you from having to maintain so many includes? Separate from any performance issues, there is a risk here of creating hard-to-track dependency bugs.
(It could be as MainMa said, related to a framework, in which case you may have no control over the above. I only mention it in case you do.)
A couple things in case you didn't know already:
If it's just text or static HTML, you can get the contents with file_get_contents(), readfile(), etc. This is somewhat faster because the loaded file doesn't need parsing. But obviously if it contains PHP code, this won't help.
You can use include_once() to prevent the same file from being included twice (if, for instance, it's included by two files that are themselves included by the top-level file).
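For instance (the file names below are made up for illustration):

```php
<?php
// header.php and sidebar.php both need helpers.php; include_once ensures the
// function definitions in helpers.php are only pulled in once per request.
include_once __DIR__ . '/helpers.php';
include_once __DIR__ . '/helpers.php'; // second call is a no-op, no redeclaration error

// For a plain HTML fragment there is nothing to parse, so skip the PHP parser entirely:
readfile(__DIR__ . '/static-footer.html');
```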
Disk I/O won't be your problem. The system will cache frequently accessed files in RAM, or if they aren't that frequently accessed, it won't matter.
Load times may be an issue, as each file has to be requested and interpreted by the server separately.
I don't know how the web server will cope with the many requests; it may not care. If the client doesn't pipeline its requests, though, you'll pay for many TCP connections being built up and torn down, which also costs a good amount of latency.
Honestly, don't worry about it - 148 is nothing. Even if zero caching happened on the PHP side, you're going to be hitting filesystem caches almost every time - and in the grand scheme of things, virtually every open-source project out there has far more files without a problem (Drupal, WordPress, Joomla, Elgg, anything).
Really, there is no problem here - even if you managed to shave off a millisecond here or there, it's so far down the priority list of places where you can make speed gains that it's barely worth considering for more than a second.
Caveat: do try to use require_once and include_once where suited, and ensure you only load the classes/files that are needed to process a given request.
Related
To give some context:
I had a discussion with a colleague recently about the use of Autoloaders in PHP. I was arguing in favour of them, him against.
My point of view is that Autoloaders can help you minimise manual source dependency which in turn can help you reduce the amount of memory consumed when including lots of large files that you may not need.
His response was that including files that you do not need is not a big problem because after a file has been included once it is kept in memory by the Apache child process and this portion of memory will be available for subsequent requests. He argues that you should not be concerned about the amount of included files because soon enough they will all be loaded into memory and used on-demand from memory. Therefore memory is less of an issue and the overhead of trying to find the file you need on the filesystem is much more of a concern.
He's a smart guy and tends to know what he's talking about. However, I always thought that the memory used by Apache and PHP was specific to that particular request being handled.
Each request is allowed memory up to the memory_limit PHP setting, and any source compilation and processing is only valid for the life of that request.
Even with opcode caches such as APC, I thought that each individual request still needs to load every file into its own portion of memory, and that APC is just a shortcut to having it pre-compiled for the responding process.
I've been searching for some documentation on this but haven't managed to find anything so far. I would really appreciate it if someone can point me to any useful documentation on this topic.
UPDATE:
Just to clarify, the autoloader discussion part was more of a context :).
It may not have been clear but my main question is about whether Apache will pool together its resources to respond to multiple requests (especially memory used by included files), or whether each request will need to retrieve the code required to satisfy the execution path in isolation from other requests handled from the same process.
e.g.:
Files 1, 2, 3 and 4 are 100 KB each.
Request A includes file 1, 2 and 3.
Request B includes file 1, 2, 3 and 4.
In his mind he's thinking that Request A will consume 300 KB for the entirety of its execution and Request B will only consume a further 100 KB because files 1, 2 and 3 are already in memory.
In my mind it's 300KB and 400KB because they are both being processed independently (if by the same process).
This brings him back to his argument that "just include the lot 'cos you'll use it anyway" as opposed to my "only include what you need to keep the request size down".
This is fairly fundamental to how I approach building a PHP website, so I would be keen to know if I'm off the mark here.
I've also always been of the belief that for a large-scale website, memory is the most precious resource and more of a concern than the file-system checks for an autoloader, which are probably cached by the kernel anyway.
You're right though, it's time to benchmark!
Here's how you win arguments: run a realistic benchmark, and be on the right side of the numbers.
I've had this same discussion, so I tried an experiment. Using APC, I tried a Kohana app with a single monolithic include (containing all of Kohana) as well as with the standard autoloader. The final result was that the single include was faster by a statistically irrelevant margin (less than 1%) but used slightly more memory (according to PHP's memory functions). Running the test without APC (or XCache, etc.) is pointless, so I didn't bother.
So my conclusion was to continue using autoloading because it's much simpler to use. Try the same thing with your app and show your friend the results.
Now you don't need to guess.
Disclaimer: I wasn't using Apache. I cannot emphasize enough to run your own benchmarks on your own hardware on your own app. Don't trust that my experience will be yours.
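A minimal per-request measurement along those lines (this sketch assumes PHP 5.4+ for $_SERVER['REQUEST_TIME_FLOAT']; adapt it to your own app rather than treating it as a rigorous benchmark):

```php
<?php
// Drop this at the very end of the front controller and compare the numbers
// with the monolithic include vs. the autoloader under realistic traffic.
$elapsed = microtime(true) - $_SERVER['REQUEST_TIME_FLOAT'];

printf(
    "files included: %d | peak memory: %.2f MB | wall time: %.1f ms\n",
    count(get_included_files()),
    memory_get_peak_usage(true) / 1048576,
    $elapsed * 1000
);
```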
You are the wiser ninja, grasshopper.
Autoloaders don't load a class file until the class is requested. This means they will use at most the same amount of memory as manual includes, and usually much less.
Classes get read fresh from the file on each request, even though an Apache process can handle multiple requests, so your friend's "eventually they are all read into memory" argument doesn't hold water.
You can prove this by putting an echo 'foo'; above the class definition in the class file. You'll see that on each new request the line is executed, regardless of whether you autoload or manually include the whole world of class files at the start.
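A quick way to see this for yourself (the file and class names below are made up):

```php
<?php
// Foo.php - the echo makes it obvious every time this file is actually read and compiled.
echo "Foo.php was just parsed for this request\n";

class Foo {}
```

```php
<?php
// index.php - register a simple autoloader; Foo.php is only read when Foo is first used,
// and the echo above fires on every request that uses it, never "from memory".
spl_autoload_register(function ($class) {
    require __DIR__ . '/' . $class . '.php';
});

$foo = new Foo(); // triggers the include; refresh the page and the echo repeats
```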
I couldn't find any good, concise documentation on this (I may write some, with memory usage examples), as I have also had to explain this to others and show evidence to get it to sink in. I think the folks at Zend didn't expect anyone to miss the benefits of autoloading.
Yes, APC and the like (as with all caching solutions) can overcome the resource negatives and even eke out small performance gains, but you eat up a lot of unneeded memory if you do this across a non-trivial number of libraries while serving a large number of clients. Try loading a healthy chunk of the PEAR libraries in a massive include file while handling 500 connections hitting your page at the same time.
Even when using things like APC, you benefit from autoloaders with non-namespaced classes (most existing PHP code at the moment), as they help avoid global namespace pollution when dealing with large numbers of class libraries.
This is my opinion.
I think autoloaders are a very bad idea, for the following reasons:
I like to know what and where my scripts are grabbing the data/code from. Makes debugging easier.
This also creates configuration problems: if one of your developers changes a file (during an upgrade, etc.) or the configuration, and things stop working, it is harder to find out where it is broken.
I also think that it is lazy programming.
As for memory/performance issues, it is just as cheap to buy some more memory for the machine if it is struggling with that.
I am now writing a PHP framework, and I am wondering whether it will slow down when PHP has to require/include or require_once/include_once too many files during a request.
Well of course it will. Doing anything too many times will cause a slow down.
On a more serious note though, IO operations that touch disk are very slow compared to anything that happens in memory. So often times, including files will be a major performance factor when using a large framework (just look at Zend Framework...).
However, there are typically ways to alleviate this such as APC and similar op code caches.
Sometimes programming approaches are also taken. For example, if I remember correctly, Doctrine 1 has the capability to bundle everything into one giant file so as to have fewer I/O calls.
If in doubt, do some in-depth profiling of an application written with your framework and see whether include/require/etc. are among the major slow points.
Yes, this will slow your application down. The *_once calls are generally more expensive, since it must be checked whether that file has already been included. With a lot of includes, there is a lot of hard-disk access and memory usage involved. I've developed applications with the Zend Framework that include a total of 150 to 200 files on each request - you really can see the impact that has on overall performance.
The more files you include, the more load you add. However, if you have to choose between require and require_once, require_once/include_once take more work because a check needs to be done to see whether the same file has already been included. So if you can avoid that, you can at least boost performance a little.
Unless you use a cache, every time a request comes in those files will be included again and again. Surely that slows things down. Create a framework that only includes what needs to be included.
Just wondering if anyone has information on what "costs" are associated with including a LARGE (600 KB or more) PHP file containing hundreds of classes. Does it really make much difference compared with autoloading the individual files via an autoloader that, for instance, searches several directories before finding a match?
Would having APC caching on make this cost negligible?
Basically, the cost of including one big file depends on your use case. Let's say you have a large file with 200 classes.
If you only use 1 class, including the large file will be more expensive than including a small class file for that individual class.
If you use all 200 classes, including the large file will be significantly less expensive than including 200 small files.
Where the cutoff lies is really system dependent. I would imagine it would be somewhere around the 50% mark (i.e. if you're using fewer than 100 of the classes in any one request, autoload).
And using APC will likely shift the break-even point toward fewer classes (so without APC, 100 classes used might be the break-even point, but with it, the break-even might be at 50), since it makes the large single include much cheaper but only slightly lowers the overhead of each individual smaller include.
The exact break-even points will be 100% system dependent (how fast is your disk I/O, how fast are your processors, how much memory, etc). So the only way to know for sure on your platform is to test.
However, more is at stake than raw performance. Maintainability will suffer with one large file since it's harder to work on multiple classes at the same time (tabs in an IDE become useless). I personally would keep all the classes in separate files and make my life as the developer easier rather than making one giant monstrosity of a file.
Now, if you have Facebook-level traffic, it may be worth investigating further. But if you don't, I personally wouldn't worry about it...
I have conducted some tests on the various costs of PHP's include() which I'd like to share, as I see many programmers and CMS platforms overlooking these pre-runtime PHP costs.
The cost of the function itself is quite negligible. 100 file includes (with empty files) costs about 5ms; and no more than one microsecond when using an opcache.
So the cost savings of including a larger php file containing 100 classes, as opposed to 100 separate file includes, is only about 5ms. And using an OpCode cache makes that cost irrelevant.
The real cost comes with the size of your files, and with what PHP has to parse and/or compile. For a better idea of what those costs are, here are the results of tests I conducted on a 2010 Mac Mini Server with a 10,000 RPM drive, running PHP 5.3 with the eAccelerator opcache (optimizer enabled).
100 empty file includes: 1 µs with opcache, 5 ms without
100 × 32 KB file includes: 7 ms with opcache, 30 ms without
100 × 64 KB file includes: 14 ms with opcache, 60 ms without
100 × 128 KB file includes: 22 ms with opcache, 100 ms without
100 × 200 KB file includes: 38 ms with opcache, 170 ms without
Therefore, a 600 KB PHP file costs roughly 6 ms, or about 1 ms when using an opcode cache. What you really want to watch instead is the total size of all code included per request.
Merging files into combined bundles to try to save resources is definitely not a good idea, and would be a mistake when using an opcache. My test doesn't account for disk speed much, if at all, as I included the same file 100 times. That said, I don't feel the need to cover disk I/O at all, because having an opcache installed is really a prerequisite in terms of basic performance.
To gain as much performance as possible and save RAM, one must do the opposite: split files contextually as much as possible, with the use of an autoloader or a class-factory pattern, so as to include as little unused code as possible on each and every request.
To that effect, misusing include_once() can also have negative performance consequences...
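For reference, a rough version of the kind of micro-benchmark behind the figures above (the file name is a placeholder and the results will be entirely system dependent):

```php
<?php
// Time 100 includes of the same file, similar to how the numbers above were gathered.
// bench_target.php is a placeholder: fill it with ~32 KB / 64 KB / ... of code, but make
// sure it doesn't declare functions or classes, or the second include will fatal.
$target = __DIR__ . '/bench_target.php';

$start = microtime(true);
for ($i = 0; $i < 100; $i++) {
    include $target; // with an opcache the parse/compile step is skipped; without, it repeats
}

printf("100 includes took %.3f ms\n", (microtime(true) - $start) * 1000);
```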
In regard to your base classes: I have similar circumstances, but I only include a tiny portion of the table schema, mainly the field types and primary-key details. For performance reasons, I deliberately do not include the quite heavy full schemas of the tables all the time, because they are rarely used, and when they are, I use only a couple of them at most per request.
The full column details of a table average roughly 20-50 KB per schema array. Including 10-15 of them on any given request costs just about 1-3 ms for the arrays, which in itself is not much, but it becomes worthwhile when combined with a 500 KB RAM saving per request.
APC will save you a lot, but I don't know whether the cost will be negligible if your source is 600 KB. That is about 15,000 lines of code? Not that much for a website, but quite large for a single file.
You'd be better off with a more dynamic approach, isolating specific functionality in specific classes. Then, for each page, you can choose which code is needed.
Especially when you use APC, this approach will be better, because you don't have the overhead of file I/O which you will have when you load many small files from disk. I would choose to implement small, specified classes and put each of those in a separate file. You can use the PHP class loading mechanism (__autoload) to automatically load the right units.
When you figure out a good naming convention for your classes and units, this will make your development a lot easier.
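A minimal sketch of such a convention-based loader (the classes/ directory and the class name are assumptions for illustration, not part of any particular framework):

```php
<?php
// Map a class name like Mail_Transport to classes/Mail/Transport.php and let PHP
// pull the file in on demand. spl_autoload_register is preferable to defining
// __autoload directly, since several loaders can then coexist.
spl_autoload_register(function ($class) {
    $path = __DIR__ . '/classes/' . str_replace(array('_', '\\'), '/', $class) . '.php';
    if (is_file($path)) {
        require $path;
    }
});

$transport = new Mail_Transport(); // loads classes/Mail/Transport.php only when needed
```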
By "common script startup sequence", what I mean is that in the majority of pages on my site, the first order of business is to consult 3 specific files (via include()), which centrally define constants, certain functions used in many scripts, and a class or two, as well as providing the database credentials. I don't know if there's a more standard term for such a setup.
What I want to know is whether it's possible to have too many of these and make things slower as a result. I know that using include() has a certain amount of overhead because it's another file to look for in the filesystem, parse, and execute. If there is such a thing as too many includes, I want to know whether I am anywhere near that point. N.B. Some of my pages include() still more scripts that they specifically, individually need (for example, a script that defines a function used by only a few pages), and I do not count these occasional extra includes, which are used reasonably sparingly anyway. I'm only worrying about the 3 includes that occur on the majority of pages and set everything up.
What are the 3 includes?
Two of them are outside of webroot. common.php defines a bunch of functions, classes and other things that do not vary between the development and production sites. config.php defines various constants and paths that are different in the development and production sites (which database to connect to, among other things). Of course, it's desirable for this file in particular to be outside of webroot. config.php include()s common.php at the bottom.
The other one is inside webroot and contains a single line:
include [path to appropriate directory]/config.php
The directory differs between the development and production sites.
(Feel free to question the rationale behind setting up the includes this way, but I feel that this does provide a good, reliable system for preparing to execute each page, and my question is about whether it is bad to have that many includes as a baseline on each page.)
Use APC and your worries go away. The opcodes of your files will be cached in RAM and everything will go super fast. :) Facebook does this, so it will definitely help you scale.
You may not notice any difference between 1 include and 50 in terms of raw speed, but for an application with high concurrency, I/O can be a huge bottleneck. So the key is not speed, but scaling.
The best thing to do is use an accelerator of some kind, APC or eAccelerator or something similar, to keep the opcodes cached in RAM. There are quite a few reasons for this, and on a busy site it matters a lot.
For example, a friend ran an experiment on his website, which has about 15k users a day and an average page load time of 0.03 s. He removed most of the includes he used as templates, and the average load time dropped to 0.01 s. Then he added an accelerator: 0.002 s per page. I hope those numbers convince you that includes must be kept to a minimum on busy sites if you don't use an accelerator of some kind.
This is because of the high I/O which is needed to scan directories, find the files, open them, read them and so on.
So keep the includes to minimum. Study the most important parts of your site and optimize there by moving required parts to general includes and so on.
I don't believe performance has much to do with the number of includes per se; think of a case where one included file contains 500 lines of code, versus another where you have 50 included files with just one line of code each.
Or, if you happen to be using Windows as the OS, you can use WinCache:
http://php.net/manual/en/book.wincache.php
I was just reading over this thread where the pros and cons of using include_once and require_once were being debated. From that discussion (particularly Ambush Commander's answer), I've taken away the fact(?) that any sort of include in PHP is inherently expensive, since it requires the processor to parse a new file into OP codes and so on.
This got me to thinking.
I have written a small script which will "roll" a number of JavaScript files into one (appending all of their contents into another file), so that it can be packed to reduce HTTP requests and overall bandwidth usage.
Typically for my PHP applications, I have one "includes.php" file which is included on each page, and that then includes all the classes and other libraries which I need. (I know this probably isn't best practice, but it works - the __autoload feature of PHP 5 is making this better in any case.)
Should I apply the same "rolling" technique on my PHP files?
I know of that saying about premature optimisation being evil, but let's take this question as theoretical, ok?
There is a problem with Apache/PHP on Windows which causes an application to be extremely slow when loading or even touching too many files (a page which loads approx. 50-100 files may spend a few seconds just on file handling). This problem appears both with including/requiring and with working with files (fopen, file_get_contents, etc.).
So if you (or, more likely, anybody else, given the age of this post) ever run your app on Apache/Windows, reducing the number of loaded files is absolutely necessary. Combine multiple PHP classes into one file (an automated script for this would be useful; I haven't found one yet) or be careful not to touch any unneeded files in your app.
That would depend somewhat on whether it is more work to parse several small files or to parse one big one. If you require files on an as-needed basis (not saying you necessarily should do things that way), then presumably for some execution paths there would be considerably less compilation required than if all your code were rolled into one big PHP file that the parser had to compile in its entirety whether it was needed or not.
In keeping with the question, this is thinking aloud more than expertise on the internals of the PHP runtime; it doesn't sound as though there is any real-world benefit to getting too involved with this at all. If you run into a serious slowdown in your PHP, I would be very surprised if the use of require_once turned out to be the bottleneck.
As you've said: "premature optimisation ...". Then again, if you're worried about performance, use an opcode cache like APC, which makes this problem almost disappear.
This isn't an answer to your direct question, just about your "js packing".
If you leave your JavaScript files alone and allow them to be included individually in the HTML source, the browser will cache those files. Then on subsequent requests, when the browser asks for the same JavaScript file, your server will return a 304 Not Modified header and the browser will use its cached version. However, if you're "packing" the JavaScript files together on every request, the browser will re-download the combined file on every page load.
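If you do serve a combined file from a PHP script, you can get that caching behaviour back by honouring conditional requests yourself; a rough sketch (the cache path is illustrative):

```php
<?php
// Serve a pre-built combined.js and answer conditional requests with 304 Not Modified,
// so browsers keep using their cached copy until the file actually changes.
$file  = __DIR__ . '/cache/combined.js'; // illustrative path to the rolled-up file
$mtime = filemtime($file);
$since = isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    ? strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    : 0;

header('Content-Type: application/javascript');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');

if ($since !== false && $since >= $mtime) {
    header('HTTP/1.1 304 Not Modified'); // browser's copy is still fresh
    exit;
}

readfile($file);
```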