I was just reading over this thread where the pros and cons of using include_once and require_once were being debated. From that discussion (particularly Ambush Commander's answer), I've taken away the fact(?) that any sort of include in PHP is inherently expensive, since it requires PHP to parse a new file into opcodes and so on.
This got me to thinking.
I have written a small script which will "roll" a number of Javascript files into one (appending all the contents into another file), such that it can be packed to reduce HTTP requests and overall bandwidth usage.
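A minimal sketch of what such a "rolling" script might look like (the file names here are made up for illustration):

    <?php
    // roll.php - concatenate several JavaScript files into one packed file (sketch)
    $sources = array('jquery.plugins.js', 'menu.js', 'validation.js');  // hypothetical sources
    $rolled  = '';
    foreach ($sources as $file) {
        // append each file's contents, separated by a newline
        $rolled .= file_get_contents($file) . "\n";
    }
    file_put_contents('all.packed.js', $rolled);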
Typically for my PHP applications, I have one "includes.php" file which is included on each page, and that then includes all the classes and other libraries which I need. (I know this probably isn't the best practice, but it works - the __autoload feature of PHP5 is making this better in any case).
Should I apply the same "rolling" technique on my PHP files?
I know of that saying about premature optimisation being evil, but let's take this question as theoretical, ok?
There is a problem with Apache/PHP on Windows which causes an application to be extremely slow when it loads or even touches too many files (a page which loads approximately 50-100 files may spend several seconds on file handling alone). This problem appears both with including/requiring files and with file operations in general (fopen, file_get_contents, etc.).
So if you (or, more likely given the age of this post, anybody else) ever run your app on Apache/Windows, reducing the number of loaded files is absolutely necessary. Combine multiple PHP classes into one file (an automated script for this would be useful; I haven't found one yet), or be careful not to touch any unneeded file in your app.
That would depend somewhat on whether it is more work to parse several small files or to parse one big one. If you require files on an as-needed basis (not saying you necessarily should do things that way), then presumably for some execution paths there would be considerably less compilation required than if all your code was rolled into one big PHP file that the parser had to encode the entirety of, whether it was needed or not.
In keeping with the question, this is thinking aloud more than expertise on the internals of the PHP runtime - it doesn't sound as though there is any real-world benefit to getting too involved with this at all. If you run into a serious slowdown in your PHP, I would be very surprised if the use of require_once turned out to be the bottleneck.
As you've said: "premature optimisation ...". Then again, if you're worried about performance, use an opcode cache like APC, which makes this problem almost disappear.
This isn't an answer to your direct question, just about your "js packing".
If you leave your javascript files alone and allow them to be included individually in the HTML source, the browser will cache those files. Then on subsequent requests when the browser requests the same javascript file, your server will return a 304 Not Modified header and the browser will use the cached version. However, if you're "packing" the javascript files together on every request, the browser will re-download the file on every page load.
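If you do decide to serve a packed file from PHP, you can still let browsers cache it by answering conditional requests yourself. A rough sketch (the packed file name is hypothetical):

    <?php
    // serve-packed.php - send the combined JS with cache headers (sketch)
    $file  = 'all.packed.js';                 // hypothetical packed file
    $mtime = filemtime($file);
    header('Content-Type: application/javascript');
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');
    // if the browser already has this version, answer 304 and send no body
    if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
        && strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
        header('HTTP/1.1 304 Not Modified');
        exit;
    }
    readfile($file);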
Alternatively, does it copy the compiled code from the cache (APC), or load it from disk, for each call?
The reason I am asking this is that I have a large data structure that I have initialized in a file. Now I worry about the performance of the script.
PHP is actually very efficient at dealing with large data-structures. It is, however, always going to store them in memory (which is not shared between calls). If it is large enough, you might want to consider buffering it piece by piece or storing it in a datastore of some sort. If your data file is 100MiB, you're going to be loading at least 100MiB plus the memory required by PHP with every call.
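For example, instead of declaring the whole structure as a PHP literal, you could stream the data and keep only one record in memory at a time. A sketch, assuming a hypothetical line-based data file:

    <?php
    // process the data row by row instead of loading a 100 MiB structure at once
    $handle = fopen('big-data.txt', 'r');    // hypothetical data file, one record per line
    while (($line = fgets($handle)) !== false) {
        $record = json_decode($line, true);  // assumes one JSON record per line
        // ... work with $record here, then let it be garbage-collected ...
    }
    fclose($handle);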
APC won't entirely help this situation either. When PHP loads a script initially (without APC), it will perform the following steps:
Reading: the entire file is read into memory
Lexing: the file is tokenized into tokens that the parser can read
Parsing: the tokens are then used to generate the expressions required for compilation
Compiling: the expressions are turned into opcodes, similar to how Java "compiles" to bytecode
Executing: the opcodes are executed in the PHP runtime (this is where data actually gets manipulated)
You might have noticed that steps 1-4 are redundant across multiple calls, which is why compiled languages have a dedicated compiler to perform these steps once and a runtime, VM, or OS to run the generated bytecode or binary. APC tries to give PHP that same edge: by precompiling each file, it can store the (typically smaller) precompiled opcode version in memory and use it whenever someone accesses the page.
The problem with your use case is that this does absolutely nothing for literal data within a file. The data still must be declared and won't even be touched until step 5, which is why I am emphasizing the importance of using an external data store if you see a significant performance hit.
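One option along the "external data store" line is APC's user cache (which is separate from its opcode cache). A minimal sketch, with the key and the build function made up:

    <?php
    // fetch the structure from APC's user cache; rebuild and store it only on a miss
    $data = apc_fetch('big_structure', $hit);
    if (!$hit) {
        $data = build_big_structure();            // hypothetical expensive initialisation
        apc_store('big_structure', $data, 3600);  // cache it for an hour
    }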
Please use a profiler like Xdebug or something similar to gain more insight into what is actually slowing your script down (if anything), so you can make a more informed decision on where to go from here.
I'm developing a web application and I'm using a file called functions.php in which I have stored all the functions related to my application. Currently the file has about 1500 lines of code with over 30 functions.
I was wondering if it is a problem, could it slow down the process when calling functions? Should I make other files to move some of the functions there?
In most cases, the number of lines in the .php script isn't going to affect the speed of the program nearly as much as the code itself. If execution time is your number one concern, then optimizing your code should be your number one priority. Start with the functions that are called the most, and make sure the code there is as tight as possible.
Splitting the functions up into different files would technically make the script slower since the interpreter would have to do disk I/O to parse the files. But the speed hit would be infinitesimal, so I'd argue that splitting them up might save time in the long run since it'll be easier to debug and optimize if you're not always staring down a huge file with 30 functions in it.
Finally, if you do have to use a bunch of huge .php files in your app, you might want to look into using something like Zend that will compile your scripts on the server.
If you only use one or two functions per page, would it be faster to split them up into separate files and only include the ones you need? Yes, because PHP needs to read through the whole file to include it. Will it make a noticeable enough difference that it's worth splitting the file? In most cases the answer's probably no.
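For concreteness, "splitting them up" would look something like this (the file and function names here are hypothetical):

    <?php
    // profile.php only needs the user and formatting helpers,
    // so the other split-out files never get parsed for this page
    require_once __DIR__ . '/functions/user.php';
    require_once __DIR__ . '/functions/format.php';

    $user = get_user(42);        // defined in functions/user.php
    echo format_name($user);     // defined in functions/format.php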
To give some context:
I had a discussion with a colleague recently about the use of Autoloaders in PHP. I was arguing in favour of them, him against.
My point of view is that Autoloaders can help you minimise manual source dependency which in turn can help you reduce the amount of memory consumed when including lots of large files that you may not need.
His response was that including files that you do not need is not a big problem because after a file has been included once it is kept in memory by the Apache child process and this portion of memory will be available for subsequent requests. He argues that you should not be concerned about the amount of included files because soon enough they will all be loaded into memory and used on-demand from memory. Therefore memory is less of an issue and the overhead of trying to find the file you need on the filesystem is much more of a concern.
He's a smart guy and tends to know what he's talking about. However, I always thought that the memory used by Apache and PHP was specific to that particular request being handled.
Each request can use at most the amount of memory set by the memory_limit PHP option, and any source compilation and processing is only valid for the life of that request.
Even with op-code caches such as APC, I thought that the individual request still needs to load up each file in its own portion of memory and that APC is just a shortcut to having it pre-compiled for the responding process.
I've been searching for some documentation on this but haven't managed to find anything so far. I would really appreciate it if someone can point me to any useful documentation on this topic.
UPDATE:
Just to clarify, the autoloader discussion part was more of a context :).
It may not have been clear but my main question is about whether Apache will pool together its resources to respond to multiple requests (especially memory used by included files), or whether each request will need to retrieve the code required to satisfy the execution path in isolation from other requests handled from the same process.
e.g.:
Files 1, 2, 3 and 4 are an equal size of 100KB each.
Request A includes file 1, 2 and 3.
Request B includes file 1, 2, 3 and 4.
In his mind he's thinking that Request A will consume 300KB for the entirety of its execution and Request B will only consume a further 100KB because files 1, 2 and 3 are already in memory.
In my mind it's 300KB and 400KB because they are both being processed independently (if by the same process).
This brings him back to his argument that "just include the lot 'cos you'll use it anyway" as opposed to my "only include what you need to keep the request size down".
This is fairly fundamental to how I approach building a PHP website, so I would be keen to know if I'm off the mark here.
I've also always been of the belief that for a large-scale website, memory is the most precious resource and more of a concern than the file-system checks for an autoloader, which are probably cached by the kernel anyway.
You're right though, it's time to benchmark!
Here's how you win arguments: run a realistic benchmark, and be on the right side of the numbers.
I've had this same discussion, so I tried an experiment. Using APC, I tried a Kohana app with a single monolithic include (containing all of Kohana) as well as with the standard autoloader. The final result was that the single include was faster at a statistically irrelevant rate (less than 1%) but used slightly more memory (according to PHP's memory functions). Running the test without APC (or XCache, etc) is pointless, so I didn't bother.
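A rough way to take those two measurements in your own app (a sketch, not the exact harness used above):

    <?php
    $start = microtime(true);

    // ... either require the single monolithic include here,
    // or register the autoloader, then run the same request path ...

    printf("time: %.4f s, peak memory: %.2f MiB\n",
        microtime(true) - $start,
        memory_get_peak_usage() / 1048576);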
So my conclusion was to continue using autoloading because it's much simpler to use. Try the same thing with your app and show your friend the results.
Now you don't need to guess.
Disclaimer: I wasn't using Apache. I cannot emphasize enough to run your own benchmarks on your own hardware on your own app. Don't trust that my experience will be yours.
You are the wiser ninja, grasshopper.
Autoloaders don't load the class file until the class is requested. This means that they will use at most the same amount of memory as manual includes, but usually much less.
Classes get read fresh from the file on each request, even though an Apache thread can handle multiple requests, so your friend's "eventually they all get read" argument doesn't hold water.
You can prove this by putting an echo 'foo'; above the class definition in the class file. You'll see that on each new request the line is executed, regardless of whether you autoload or manually include the whole world of class files at the start.
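Something along these lines (class and file names made up); the trace line prints on every request, whether the file comes in via a manual include or via the autoloader:

    <?php
    // Foo.php - hypothetical class file
    echo 'foo';          // runs every time this file is compiled/included for a request
    class Foo {}

    // index.php
    spl_autoload_register(function ($class) {
        require __DIR__ . '/' . $class . '.php';   // naive class-to-file mapping, for illustration
    });
    $f = new Foo();      // triggers the autoloader; 'foo' is echoed again on this request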
I couldn't find any good, concise documentation on this - I may write some, with memory-usage examples - as I have also had to explain this to others and show evidence to get it to sink in. I think the folks at Zend didn't expect anyone to miss the benefits of autoloading.
Yes, APC and the like (as with all caching solutions) can overcome the resource negatives and even eke out small gains in performance, but you eat up lots of unneeded memory if you do this with a non-trivial number of libraries while serving a large number of clients. Try loading a healthy chunk of the PEAR libraries in one massive include file while handling 500 connections hitting your page at the same time.
Even when using things like APC, you benefit from using autoloaders with any non-namespaced classes (most existing PHP code at the moment), as they can help avoid global namespace pollution when dealing with large numbers of class libraries.
This is my opinion.
I think autoloaders are a very bad idea, for the following reasons:
I like to know what and where my scripts are grabbing the data/code from. Makes debugging easier.
This also creates configuration problems: if one of your developers changes a file (an upgrade, etc.) or the configuration and things stop working, it is harder to find out where it is broken.
I also think that it is lazy programming.
As for memory/performance issues, it is just as cheap to buy some more memory for the machine if it is struggling with that.
I'm working on a webapp that uses a lot of ajax to display data and I'm wondering if I could get any advice on what else I could do to speed up the app, and reduce bandwidth, etc.
I'm using php, mysql, freeBSD, Apache, Tomcat for my environment. I own the server and have full access to all config files, etc.
I have gzip/deflate compression turned on in the Apache httpd.conf file. I have obfuscated and minified all the .js and .css files.
My webapp works in this general manner. After login the user lands on the index.php page. All links on the index page are ajax calls to a .php class function that retrieves the HTML as a string and displays it inside a div somewhere on the main index.php page.
Most of the functions returning the html are returning strings like:
<table>
<tr>
<td>Data here</td>
</tr>
</table>
I don't return the full "<html><head>" stuff, because it already exists in the main index.php page.
However, the HTML strings returned are formatted with tabs, spaces, comments, etc. for easy reading of the code. Should I take the time to minify these responses and remove the tabs, comments and spaces? Or is it negligible to minify the .php pages because they're on the server?
I guess I'm trying to figure out whether the way I've structured the webapp is going to cause bandwidth issues, and whether reducing the .php class file sizes could improve performance. Most of the .php classes are 40-50KB, with the largest being 99KB.
For speed, I have thought about using memcache, but don't really know if adding it after the fact is worth it and I don't quite know how to implement it. I don't know if there is any caching turned on on the server...I guess I have left that up to the browser...I'm not very well versed in the caching arena.
Right now the site doesn't appear slow, but I'm the only user... I'm just wondering if it's worth the extra effort.
Any advice, or articles would be appreciated.
Thanks in advance.
My recommendation would be to NOT send the HTML over the AJAX calls. Instead, send just the underlying data (the "Data here" part) as JSON, then process that data with a client-side function that decorates it with the right HTML and injects it into the DOM. This will drastically speed up the Ajax calls.
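On the PHP side that might look roughly like this (the endpoint and data-access function are made up); the client-side function then builds the table markup from the decoded JSON:

    <?php
    // ajax/orders.php - return raw data instead of pre-rendered HTML (sketch)
    $rows = get_orders($_GET['customer_id']);   // hypothetical data-access function
    header('Content-Type: application/json');
    echo json_encode($rows);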
Memcache provides an API that allows you to cache data. What you additionally need (and what is, in my opinion, more important) is a strategy about what to cache and when to invalidate the cache. This cannot be determined by looking at the source code; it comes from how your site is used.
However, an opcode cache (e.g. APC) could be used right away.
A code beautifier is for humans, not for machines.
As part of your optimization, you should strip that formatting out.
Or simply add a flag check in your application: when a certain condition matches (like debug mode), return nicely formatted output; otherwise, the whitespace means nothing to the machine.
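One way to do that flag check is an output-buffer callback that collapses whitespace unless a debug flag is set (the constant name is made up, and the regex is deliberately crude; it would need care around pre blocks and inline scripts):

    <?php
    define('DEBUG_MODE', false);   // hypothetical flag

    if (!DEBUG_MODE) {
        // collapse whitespace between tags in the generated HTML before it is sent
        ob_start(function ($html) {
            return preg_replace('/>\s+</', '><', $html);
        });
    }
    // ... normal page output happens here ...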
APC
You should always use APC to compile and cache PHP scripts into opcodes.
Why?
changes are rarely made after deployment
if every script is already available as opcodes, your server does not need to compile the plain-text script into opcodes on the fly
compile once, use many times
What are the benefits?
fewer execution cycles spent compiling plain-text scripts
less memory consumed (the two are related)
some simple maths: if a request is served in 2 seconds in your current environment and in 0.5 seconds with APC, you gain 4 times the performance; the 2 seconds that served one request can now serve 4. That means where previously you could fit 50 concurrent users, you can now allow 200.
Memcache - NO GO?
it depends: if you are in a single-host environment, you probably won't gain much. The biggest advantage of memcache is information sharing and distribution (i.e. a multiple-server environment: cache once, use many times).
etc?
serve static files with an expiration header (prime-cache concept: no request is the fastest request, and it saves bandwidth)
cache your expensive requests in memcache/disk cache or even the database (expensive requests such as report/statistics generation); see the sketch after this list
always review your code for the best optimization (but do not overdo it)
always benchmark and compare the results (before and after)
fine-tune your Apache/Tomcat configuration
consider recompiling PHP with the minimum of libraries/extensions and loading only the necessary libraries at run-time (for example, if your application uses mysqli and not PDO, there is no reason to keep PDO)
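For the "cache your expensive requests" point above, a minimal sketch using the Memcached extension (the key, TTL and report function are made up):

    <?php
    $cache = new Memcached();
    $cache->addServer('127.0.0.1', 11211);

    $report = $cache->get('monthly_report');
    if ($report === false) {                          // cache miss
        $report = generate_monthly_report();          // hypothetical expensive query/aggregation
        $cache->set('monthly_report', $report, 600);  // keep it for 10 minutes
    }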
I've just noticed that my app is including over 148 php files on one page. Bear in mind this is the back end admin and not the main site, but is this too many? What impact does a large number of includes have on a server, both whilst under average load and stressed? Would disk I/o be a problem?
Included File Stats
File Type - Include Count - Combined File Size
Index - 1 - 0.00169 MB
Bootstrap - 1 - 0.01757 MB
Helper - 98 - 0.58557 MB - (11 are Profiler related classes)
Configuration - 8 - 0.00672 MB
Data Store - 23 - 0.10836 MB
Action - 8 - 0.02652 MB
Page - 1 - 0.00094 MB
I18n Resource - 7 - 0.00870 MB
Vendor Library - 1 - 0.02754 MB
Total Files - 148 - 0.78362 MB
Time ran 0.123920917511
Memory used 2.891 MB
Edit 1. Should be noted that this is a worst case scenario page. It has many different template models, controllers and associated views because it handles publishing with custom fields.
Edit 2. Also the frontend has aggressive page caching, so the number of includes on the front is roughly 30-40 at the moment.
Edit 3. The Profiler, when turned off, won't include its files, so this will remove quite a few includes.
So, here's a breakdown of the potential problems.
The number of files itself is an issue. Unless you're using a bytecode cache (and you are), and that cache is configured to not stat the file prior to pulling in the compiled bytecode, PHP is going to stat every single one of those files on include, then read them in. In some cases, that can also mean path resolution and a naive autoloader that pokes and prods at numerous directories. This won't be "slow" because the OS will surely have things cached if the files are hit frequently, but it does add precious milliseconds to each request.
If every autoloader is designed properly and the codebase relies entirely on the autoloader to pull in the required classes (meaning nothing uses include/require/include_once/require_once on a class file), you can avoid having to open and read many of the files by gluing every single class together into a single large include. This is a bit on the impractical side of things, mainly because if there is no bytecode cache, PHP still has to parse, compile and interpret it all. Additionally, not every class is going to be used on every request, so it may be a bit wasteful.
The bottom line is that a well-configured bytecode cache will completely mitigate this problem. There's nothing wrong with telling your customers that they have to properly configure their servers for optimal performance. If they know what they're doing, they'll have everything correct to begin with.
Yes, so many files can be a problem.
No, it is probably not a problem in your case, since this is only a back-end, which is probably accessed by a few people, and not too often.
In general, I would discourage having more than 20 PHP files called on each page. This is because even if the website and the server are highly optimized, for every page the server must go and look at every file to see, at the very least, whether it has changed since the last request (if there is no cache implemented at this level).
Even if the time to access a file is tiny, it is time you are losing on each request. This tiny period of time multiplied by 148 can become an issue (and a huge scalability problem).
When I worked on a PHP framework project, I used a trick to reduce the number of files. Several files were combined to one minified file, and this single file was cached. Then, if there was a need to update the framework or the website, the cached file was automatically removed, then rebuilt.
Even though I would personally discourage you from minifying the source code (because it is difficult to do, difficult to test, and creates a bunch of problems, like meaningless line numbers in errors), you can probably do the same thing by combining all your files into a single file.
Be careful: if page A uses half of those files and page B the other half, combining everything will probably decrease performance, since the PHP engine will have to parse more code.
Are the includes themselves doing something fancy, like db queries? And are they all at the top of the page, or are they included as-needed?
Those stats don't look bad, so, if admin access is infrequent, you may be ok. But you should examine this from a design angle: can things be organized in a way that would prevent you from having to maintain so many includes? Separate from any performance issues, there is a risk here of creating hard-to-track dependency bugs.
(It could be as MainMa said, related to a framework, in which case you may have no control over the above. I only mention it in case you do.)
A couple things in case you didn't know already:
If it's just text or static HTML, you can get the contents with file_get_contents(), readfile(), etc. This is somewhat faster because the loaded file doesn't need parsing. But obviously if it contains PHP code this won't help.
You can use include_once() to prevent the same file from being included twice (if, for instance, it's included by two files that are themselves included by the top-level file).
Disk I/O won't be your problem. The system will cache frequently accessed files in RAM, or if they aren't that frequently accessed, it won't matter.
Load times may be an issue, as each file has to be requested and interpreted by the server separately.
I don't know how the web server will cope with the many requests; it may not care. If the client doesn't pipeline requests, though, you'll pay for many TCP connections being built up and torn down, which also costs a goodly amount of latency.
Honestly, don't worry about it - 148 is nothing. Even with zero caching on the PHP side you're going to be hitting filesystem caches almost every time, and in the grand scheme of things virtually every open-source anything out there has way more files without a problem (Drupal, WordPress, Joomla, Elgg, anything).
Really, no problem here - even if you managed to shave a millisecond off here or there, it's so far down the priority list of places where you can make speed gains that it's barely worth considering for more than a second.
Caveat: do try to use require_once and include_once where suited, and ensure you only load the classes/files that are needed to process a given request.