Clearstatcache: what is the point of caching file info? - php

As we know, the return values of file_exists(), filesize() and similar functions are cached by PHP, causing considerable annoyance to developers. I often see and hear advice like "you must place clearstatcache() before your file info calls" or "write your own real_filesize() and put clearstatcache() on its first line". I have seen a lot of code littered with clearstatcache() calls. Additionally, before PHP 5.3 it was not even possible to clear the cache for a single file; you had to clear the whole cache every time.
Real software either a) requests file information rarely, or b) always needs fresh information. If someone really needs caching, they can easily implement it themselves with a little bit of code.
So currently I can only see the downside of this caching. I think filestat caching is one of the major broken things in PHP, and it is still present in PHP 7. The question, for anyone who knows: what are the benefits of caching file information in this unusable way?
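For anyone unfamiliar with the behaviour being complained about, here is a minimal illustration (the file name is just an example, and the per-file clear shown requires PHP 5.3 or later):

<?php
// stat-cache-demo.php -- illustrative only
file_put_contents('data.txt', '');
var_dump(filesize('data.txt'));      // int(0), and the result is now in the stat cache

file_put_contents('data.txt', 'hello');
var_dump(filesize('data.txt'));      // may still report int(0), served from the stat cache

clearstatcache(true, 'data.txt');    // PHP 5.3+: clear the cache for this one file only
// clearstatcache();                 // ...or clear the whole cache
var_dump(filesize('data.txt'));      // int(5)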

Related

PHP auto-prepend buggy after out of memory error

This may be better suited to server fault but I thought I'd ask here first.
We have a file that is prepended to every PHP file on our servers using auto-prepend that contains a class called Bootstrap that we use for autoloading, environment detection, etc. It's all working fine.
However, when there is an "OUT OF MEMORY" error directly preceding (i.e., less than a second or even at the same time) a request to another file on the same server, one of three things happens:
Our check, if (class_exists('Bootstrap')), which we used to wrap the class definition when we first got this error, returns true, meaning that the class has already been declared despite this being the auto-prepend file.
We get a "cannot redeclare class Bootstrap" error from our auto-prepended file, meaning that class_exists('Bootstrap') returned false but it somehow was still declared.
The file is not prepended at all, leading to a one-time fatal error for files that depend on it.
We could, of course, try to fix the out of memory issues since those seem to be causing the other errors, but for various reasons, they are unfixable in our setup or very difficult to fix. But that's beside the point - it seems to me that this is a bug in PHP with some sort of memory leak causing issues with the auto-prepend directive.
This is more curiosity than anything since this rarely happens (maybe once a week on our high-traffic servers). But I'd like to know - why is this happening, and what can we do to fix it?
We're running FreeBSD 9.2 with PHP 5.4.19.
EDIT: A few things we've noticed while trying to fix this over the past few months:
It seems to only happen on our secure servers. The out of memory issues are predominantly on our secure servers (they're usually from our own employees trying to download too much data), so it could just be a coincidence, but it's worth pointing out.
The dump of get_declared_classes when we have this issue contains classes that are not used on the page that is triggering the error. For example, the output of $_SERVER says the person is on xyz.com, but one of the declared classes is only used in abc.com, which is where the out of memory issues usually originate from.
All of this leads me to believe that PHP is not doing proper end-of-cycle garbage collection after getting an out of memory error, which causes the Bootstrap class to either be entirely or partly in memory on the next page request if it's soon enough after the error. I'm not familiar enough with PHP garbage collection to actually act on this, but I think this is most likely the issue.
You might not be able to "fix" the problem without fixing the out of memory issue. Without knowing the framework you're using, I'll just go down the list of areas that come to mind.
You stated "they're usually from our own employees trying to download too much data". I would start there, as it could be the biggest/loudest opportunity for optimizations, a few idea come to mind.
if the data being downloaded is files, perhaps you could use streams to chunk the reads, to a constant size, so the memory is not gobbled up on big downloads.
can you do download queueing, throttling.
if the data is coming from a database, besides optimizing your queries, you could rate limit them, reduce the result set sizes and ideally move such workloads to a dedicated environment, with mirrored data.
ensure your code is releasing file pointers and database connections responsibly, leaving it to PHP teardown, could result in delayed garbage collection and a sort of cascading effect, in high traffic situations.
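For the chunked-download idea mentioned above, a rough sketch of what that could look like (the file path and chunk size are made up):

<?php
// Stream a download in fixed-size chunks instead of loading the whole file into memory.
$path = '/data/export.zip';
$fp   = fopen($path, 'rb');
header('Content-Type: application/octet-stream');
header('Content-Length: ' . filesize($path));
while (!feof($fp)) {
    echo fread($fp, 8192);   // 8 KB at a time keeps memory usage flat regardless of file size
    flush();
}
fclose($fp);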
Other low-hanging fruit when it comes to memory limits:
you are running PHP 5.4.19; if your software permits it, consider updating to a more recent version (PHP 5.4 has not been patched since 2015); besides, PHP 7 comes with a whole slew of performance improvements.
if you have a client-side application involved, monitor its XHR and overall network activity; look for excessive polling and hanging connections.
as for your autoloader, based on your comment "The dump of get_declared_classes when we have this issue contains classes that are not used on the page that is triggering the error", you may want to check the implementation to make sure it's not loading some sort of bundled class cache; if you are using Composer, dump-autoload might be helpful.
sessions: I've seen some applications load files based on cookies and sessions; if you have such a setup, I would audit that logic and ensure there are no sticky sessions loading unneeded resources.
It's clear from your question that you are running a multi-tenancy server. Without proper stats it's hard to be more specific, but I would think it's clear the issue is not a PHP issue, as it seems to be somewhat isolated, based on your description.
Proper Debugging and Profiling
I would suggest installing a PHP profiler, even for a short time; New Relic is pretty good. You will be able to see exactly what is going on, and have the data to fix the right problem. I think they have a free trial, which should get you pointed in the right direction... There are others too, but their names escape me at the moment.
Note that class_exists() never returns true for an interface, so your check can pass even though an interface of the same name already exists -- and you cannot declare an interface and a class with the same name.
Try checking !class_exists('Bootstrap') && !interface_exists('Bootstrap') before the declaration to make sure you do not redeclare.
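A minimal guard around the declaration might look like this (a sketch only; passing false keeps the checks from triggering the autoloader):

<?php
// Auto-prepended file -- defensive declaration
if (!class_exists('Bootstrap', false) && !interface_exists('Bootstrap', false)) {
    class Bootstrap
    {
        // autoloading, environment detection, etc.
    }
}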
Did you have a look at the __autoload function?
I believe you could work around this issue by adding a function like this to your code:
function __autoload($className)
{
    // Load the class file if it exists...
    if (\file_exists($className . '.php')) {
        include_once $className . '.php';
    } else {
        // ...otherwise declare a "ghost" class that silently absorbs any method call.
        eval('class ' . $className . ' { public function __call($method, $args) { return false; } }');
    }
}
If you have a file called Bootstrap.php with the class Bootstrap declared inside it, PHP will load that file automatically; otherwise it declares a ghost class that can handle any method call, avoiding any error messages. Note that the ghost class relies on the __call magic method.

How does apache PHP memory usage really work? [closed]

To give some context:
I had a discussion with a colleague recently about the use of Autoloaders in PHP. I was arguing in favour of them, him against.
My point of view is that Autoloaders can help you minimise manual source dependency which in turn can help you reduce the amount of memory consumed when including lots of large files that you may not need.
His response was that including files that you do not need is not a big problem because after a file has been included once it is kept in memory by the Apache child process and this portion of memory will be available for subsequent requests. He argues that you should not be concerned about the amount of included files because soon enough they will all be loaded into memory and used on-demand from memory. Therefore memory is less of an issue and the overhead of trying to find the file you need on the filesystem is much more of a concern.
He's a smart guy and tends to know what he's talking about. However, I always thought that the memory used by Apache and PHP was specific to that particular request being handled.
Each request can use at most the amount of memory set by the memory_limit PHP option, and any source compilation and processing is only valid for the life of the request.
Even with opcode caches such as APC, I thought that the individual request still needs to load up each file in its own portion of memory, and that APC is just a shortcut to having it pre-compiled for the responding process.
I've been searching for some documentation on this but haven't managed to find anything so far. I would really appreciate it if someone can point me to any useful documentation on this topic.
UPDATE:
Just to clarify, the autoloader discussion part was more of a context :).
It may not have been clear but my main question is about whether Apache will pool together its resources to respond to multiple requests (especially memory used by included files), or whether each request will need to retrieve the code required to satisfy the execution path in isolation from other requests handled from the same process.
e.g.:
Files 1, 2, 3 and 4 are an equal size of 100KB each.
Request A includes file 1, 2 and 3.
Request B includes file 1, 2, 3 and 4.
In his mind he's thinking that Request A will consume 300KB for the entirety of its execution and Request B will only consume a further 100KB because files 1, 2 and 3 are already in memory.
In my mind it's 300KB and 400KB because they are both being processed independently (if by the same process).
This brings him back to his argument that "just include the lot 'cos you'll use it anyway" as opposed to my "only include what you need to keep the request size down".
This is fairly fundamental to how I approach building a PHP website, so I would be keen to know if I'm off the mark here.
I've also always been of the belief that for a large-scale website, memory is the most precious resource and more of a concern than the file-system checks for an autoloader, which are probably cached by the kernel anyway.
You're right though, it's time to benchmark!
Here's how you win arguments: run a realistic benchmark, and be on the right side of the numbers.
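For the memory half of the argument, a quick per-request measurement is often enough. A rough sketch (file1.php to file3.php stand in for your real includes):

<?php
// measure.php -- rough sketch, not a full benchmark
$before = memory_get_usage();
require 'file1.php';
require 'file2.php';
require 'file3.php';
printf("This request allocated %d bytes for its includes\n", memory_get_usage() - $before);
// Hit this script twice in quick succession: if memory were pooled across requests by the
// Apache child, the second hit would report roughly zero. In practice each request pays
// again; an opcode cache mainly removes the compile step, not the per-request allocations.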
I've had this same discussion, so I tried an experiment. Using APC, I tried a Kohana app with a single monolithic include (containing all of Kohana) as well as with the standard autoloader. The final result was that the single include was faster at a statistically irrelevant rate (less than 1%) but used slightly more memory (according to PHP's memory functions). Running the test without APC (or XCache, etc) is pointless, so I didn't bother.
So my conclusion was to continue using autoloading because it's much simpler to use. Try the same thing with your app and show your friend the results.
Now you don't need to guess.
Disclaimer: I wasn't using Apache. I cannot emphasize enough to run your own benchmarks on your own hardware on your own app. Don't trust that my experience will be yours.
You are the wiser ninja, grasshopper.
Autoloaders don't load the class file until the class is requested. This means that they will use at most the same amount of memory as manual includes, but usually much less.
Classes get read fresh from the file on each request, even though an Apache thread can handle multiple requests, so your friend's "eventually they are all read" argument doesn't hold water.
You can prove this by putting an echo 'foo'; above the class definition in the class file. You'll see that on each new request the line is executed, regardless of whether you autoload or manually include the whole world of class files at the start.
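A minimal version of that test, with made-up file names:

<?php
// Foo.php
echo "compiling/executing Foo.php\n";   // top-level code runs on every request that includes this file
class Foo {}

<?php
// index.php
require 'Foo.php';   // request this twice: the echo above fires both times,
                     // even when the same Apache child handles both requests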
I couldn't find any good concise documentation on this -- I may write some with memory usage examples -- as I have also had to explain this to others and show evidence to get it to sink in. I think the folks at Zend didn't think anyone would fail to see the benefits of autoloading.
Yes, APC and the like (as with all caching solutions) can overcome the resource negatives and even eke out small gains in performance, but you eat up lots of unneeded memory if you do this with a non-trivial number of libraries while serving a large number of clients. Try loading a healthy chunk of the PEAR libraries in one massive include file while handling 500 connections hitting your page at the same time.
Even when using things like APC, you benefit from autoloaders with non-namespaced classes (most existing PHP code today), as they help avoid global namespace pollution when dealing with large numbers of class libraries.
This is my opinion.
I think autoloaders are a very bad idea for the following reasons:
I like to know what and where my scripts are grabbing the data/code from. It makes debugging easier.
This also causes configuration problems: if one of your developers changes a file (an upgrade, etc.) or the configuration and things stop working, it is harder to find out where it broke.
I also think that it is lazy programming.
As for memory/performance issues, it is just as cheap to buy some more memory for the machine if it is struggling.

Page cache in PHP that handles concurrency?

I've read previous answers here about caching in PHP, and the articles they link to. I've checked out the oft-recommended Pear Cache_Light, QuickCache, and WordPress Super Cache. (Sorry - apparently I'm allowed to hyperlink only once.)
Either none deal with concurrency issues, or none explicitly call out that they do in their documentation.
Can anyone point me in the direction of a PHP page cache that handles concurrency?
This is on a shared host, so memcache and opcode caches are unfortunately not an option. I don't use a templating engine and would like to avoid taking a dependency on one. WP Super Cache's approach is preferable - i.e. storing static files under wwwroot to let Apache serve them - but not a requirement.
Thanks!
P.S. Examples of things that should be handled automatically:
Apache / the PHP cache is in the middle of reading a cached file. The cached file becomes obsolete and deletion is attempted.
A cached file was deleted because it was obsolete. A request for that file comes in, and the file is in the process of being recreated. Another request for the file comes in during this.
It seems PEAR::Cache_Lite has some kind of security to deal with concurrency issues.
If you take a look at the manual for the Cache_Lite::Cache_Lite constructor, you have these options:
fileLocking: enable / disable file locking. Can avoid cache corruption under bad circumstances.
writeControl: enable / disable write control. Enabling write control will slightly slow down cache writing, but not cache reading. Write control can detect some corrupt cache files, but it is not a perfect check.
readControl: enable / disable read control. If enabled, a control key is embedded in the cache file and compared with the one calculated after reading.
readControlType: type of read control (only if read control is enabled). Must be 'md5' (an md5 hash control, best but slowest), 'crc32' (a crc32 hash control, slightly less safe but faster) or 'strlen' (a length-only test, fastest).
Which one to use is still up to you, and will depend on what kind of performance you are ready to sacrifice -- and the risk of concurrency access that probably exists in your application.
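For illustration, here is a rough sketch of how those options get passed in (the cache directory, lifetime and the build_homepage() helper are made up):

<?php
require_once 'Cache/Lite.php';

$cache = new Cache_Lite(array(
    'cacheDir'        => '/tmp/cache/',
    'lifeTime'        => 300,        // seconds
    'fileLocking'     => true,       // avoid corruption from concurrent writes
    'writeControl'    => true,       // re-read after writing to detect corruption
    'readControl'     => true,       // verify an embedded control key on read
    'readControlType' => 'crc32',    // 'md5', 'crc32' or 'strlen'
));

if (($page = $cache->get('homepage')) === false) {
    $page = build_homepage();        // hypothetical, expensive page generation
    $cache->save($page, 'homepage');
}
echo $page;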
You might also want to take a look at Zend_Cache_Frontend_Output, to cache a page, using something like Zend_Cache_Backend_File as backend.
That one seems to support some kind of security as well -- the same kind of stuff that Cache_Lite already gives you (so I won't copy-paste it a second time).
As a sidenote, if your website runs on a shared host, I suppose it doesn't have that many users? So the risks of concurrent access are probably not that high, are they?
Anyway, I probably would not look any further than what those two frameworks propose: it is probably already more than enough for the needs of your application :-)
(I've never seen any caching mechanism "more secure" than what those allow you to do... and I've never run into a catastrophic concurrency problem of that sort yet, in 3 years of PHP development.)
Anyway : have fun !
I would be tempted to modify one of the existing caches. Zend Framework's cache should be able to do the trick. If not, I would change it.
You could create a really primitive locking strategy. The database could be used to track all of the cached items, allow locking for update, allow people to wait for someone else's update to complete, ...
That would handle your ACID issues. You could set the lock for someone else's update to a very short period, or possibly have it just skip the cache altogether for that round trip depending on your server load/capacity and the cost of producing the cached content.
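A rough sketch of that primitive locking idea, assuming MySQL and PDO (the lock name, timeout and $regenerate callback are hypothetical):

<?php
// Serialise cache regeneration with MySQL's GET_LOCK()/RELEASE_LOCK().
function rebuild_with_lock(PDO $pdo, $key, $regenerate)
{
    $stmt = $pdo->prepare('SELECT GET_LOCK(?, 2)');   // wait up to 2 seconds for the lock
    $stmt->execute(array('cache_' . $key));

    if ((int) $stmt->fetchColumn() !== 1) {
        // Someone else is rebuilding: skip the cache for this request (or serve stale data)
        return $regenerate();
    }

    $data = $regenerate();   // rebuild and store the cache entry here
    $pdo->prepare('SELECT RELEASE_LOCK(?)')->execute(array('cache_' . $key));
    return $data;
}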
Jacob
Concurrent resource creation, aka cache slamming / thread races, can be a serious issue on busy websites. That's why I've created a cache library that synchronizes read/write processes/threads.
It has an elegant and clear structure: interfaces -> adaptors -> classes for easy extension. On the GitHub page I explain in detail what the problem with slamming is and how the library resolves it.
Check it here:
https://github.com/tztztztz/php-no-slam-cache
Under Linux, generally, the file will remain "open" for reading, even if it's "deleted", until the process closes the file. This is built into the system, and can sometimes cause huge discrepancies in disk usage figures (deleting a 3GB file while it's still "open" means it stays allocated on disk, as in use, until the process closes it). I'm unsure whether the same is true under Windows.
Assuming a journalling filesystem (most Linux filesystems, and NTFS), then the file should not be seen as "created" until the process closes the file. This should show up as a non-existent file!
Assuming a journalling filesystem (most Linux filesystems, and NTFS), then the file should not be seen as "created" until the process closes the file. This should show up as a non-existent file!
Nope, it is visible as soon as it is created, you have to lock it.
Rename is atomic though. So you could open(), write(), close(), rename(), but this will not prevent the same cache item being re-created twice at the same time.
A cached file was deleted because it was obsolete.
A request for that file comes in, and the file is in the process of being recreated. Another request for the file comes in during this.
If it is not locked, a half-complete file will be served, or two processes will try to regenerate the same file at the same time, giving "interesting" results.
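To make those "interesting" results less likely, the usual trick is to combine an exclusive lock for regeneration with an atomic rename for publishing. A rough sketch (the paths and the $regenerate callback are made up):

<?php
// Only one process regenerates; readers never see a half-written file.
function cache_refresh($cacheFile, $regenerate)
{
    $lock = fopen($cacheFile . '.lock', 'c');

    if (flock($lock, LOCK_EX | LOCK_NB)) {
        // We won the race: build the content and publish it atomically.
        $content = $regenerate();
        $tmp = $cacheFile . '.' . uniqid('', true) . '.tmp';
        file_put_contents($tmp, $content);
        rename($tmp, $cacheFile);        // atomic on the same filesystem
        flock($lock, LOCK_UN);
    } else {
        // Someone else is rebuilding: serve the stale copy, or rebuild without caching.
        $content = is_file($cacheFile) ? file_get_contents($cacheFile) : $regenerate();
    }

    fclose($lock);
    return $content;
}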
You could cache pages in the database; just create a simple "name,value" table and store the cached pages in it.
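A minimal sketch of that approach with PDO and MySQL (the table layout and function names are made up):

<?php
// Assumes a table like:
//   CREATE TABLE page_cache (name VARCHAR(255) PRIMARY KEY, value MEDIUMTEXT, expires DATETIME);
function page_cache_get(PDO $pdo, $name)
{
    $stmt = $pdo->prepare('SELECT value FROM page_cache WHERE name = ? AND expires > NOW()');
    $stmt->execute(array($name));
    return $stmt->fetchColumn();   // false when missing or expired
}

function page_cache_set(PDO $pdo, $name, $value, $ttl = 300)
{
    $stmt = $pdo->prepare('REPLACE INTO page_cache (name, value, expires) VALUES (?, ?, ?)');
    $stmt->execute(array($name, $value, date('Y-m-d H:i:s', time() + $ttl)));
    // REPLACE (MySQL) means concurrent writers simply overwrite each other: last write wins.
}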

Optimizing Kohana-based Websites for Speed and Scalability

A site I built with Kohana was slammed with an enormous amount of traffic yesterday, causing me to take a step back and evaluate some of the design. I'm curious what are some standard techniques for optimizing Kohana-based applications?
I'm interested in benchmarking as well. Do I need to set up Benchmark::start() and Benchmark::stop() for each controller method in order to see execution times for all pages, or am I able to apply benchmarking globally and quickly?
I will be using the Cache-library more in time to come, but I am open to more suggestions as I'm sure there's a lot I can do that I'm simply not aware of at the moment.
What I will say in this answer is not specific to Kohana, and can probably apply to lots of PHP projects.
Here are some points that come to my mind when talking about performance, scalability, PHP, ...
I've used many of those ideas while working on several projects -- and they helped; so they could probably help here too.
First of all, when it comes to performances, there are many aspects/questions that are to consider:
configuration of the server (both Apache, PHP, MySQL, other possible daemons, and system); you might get more help about that on ServerFault, I suppose,
PHP code,
Database queries,
Whether requests even need to hit your webserver,
Can you use any kind of caching mechanism, or do you always need fully up-to-date data on the website?
Using a reverse proxy
The first thing that could be really useful is using a reverse proxy, like varnish, in front of your webserver: let it cache as many things as possible, so only requests that really need PHP/MySQL calculations (and, of course, some other requests, when they are not in the cache of the proxy) make it to Apache/PHP/MySQL.
First of all, your CSS/Javascript/Images -- well, everything that is static -- probably don't need to be always served by Apache
So, you can have the reverse proxy cache all those.
Serving those static files is no big deal for Apache, but the less it has to work for those, the more it will be able to do with PHP.
Remember: Apache can only serve a finite, limited number of requests at a time.
Then, have the reverse proxy serve as many PHP-pages as possible from cache: there are probably some pages that don't change that often, and could be served from cache. Instead of using some PHP-based cache, why not let another, lighter, server serve those (and fetch them from the PHP server from time to time, so they are always almost up to date)?
For instance, if you have some RSS feeds (We generally tend to forget those, when trying to optimize for performances) that are requested very often, having them in cache for a couple of minutes could save hundreds/thousands of request to Apache+PHP+MySQL!
Same for the most visited pages of your site, if they don't change for at least a couple of minutes (example: homepage?), then, no need to waste CPU re-generating them each time a user requests them.
Maybe there is a difference between pages served for anonymous users (the same page for all anonymous users) and pages served for identified users ("Hello Mr X, you have new messages", for instance)?
If so, you can probably configure the reverse proxy to cache the page that is served for anonymous users (based on a cookie, like the session cookie, typically)
It'll mean that Apache+PHP has less to deal with: only identified users -- which might be only a small part of your users.
About using a reverse-proxy as cache, for a PHP application, you can, for instance, take a look at Benchmark Results Show 400%-700% Increase In Server Capabilities with APC and Squid Cache.
(Yep, they are using Squid, and I was talking about varnish -- that's just another possibility ^^ Varnish being more recent, but more dedicated to caching)
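If a reverse proxy is not an option (shared hosting, for instance), the same idea can be approximated inside PHP itself. A rough sketch, not specific to any framework, with made-up paths and TTL:

<?php
// Serve a cached copy of the page to anonymous visitors only.
$isAnonymous = empty($_COOKIE[session_name()]);   // no session cookie => treat as anonymous
$cacheFile   = '/tmp/pagecache/' . md5($_SERVER['REQUEST_URI']) . '.html';
$ttl         = 120;                               // seconds

if ($isAnonymous && is_file($cacheFile) && filemtime($cacheFile) > time() - $ttl) {
    readfile($cacheFile);                         // cache hit: no framework, no database
    exit;
}

ob_start();
// ... generate the page normally (framework dispatch, DB queries, ...) ...
$html = ob_get_flush();

if ($isAnonymous) {
    file_put_contents($cacheFile, $html, LOCK_EX); // ideally written atomically (tmp file + rename)
}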
If you do that well enough, and manage to stop re-generating too many pages again and again, maybe you won't even have to optimize any of your code ;-)
At least, maybe not in any kind of rush... And it's always better to perform optimizations when you are not under too much pressure...
As a sidenote: you are saying in the OP:
A site I built with Kohana was slammed with
an enormous amount of traffic yesterday,
This is the kind of sudden situation where a reverse-proxy can literally save the day, if your website can deal with not being up to date by the second:
install it, configure it, let it always -- every normal day -- run:
Configure it to not keep PHP pages in cache; or only for a short duration; this way, you always have up to date data displayed
And, the day you take a slashdot or digg effect:
Configure the reverse proxy to keep PHP pages in cache; or for a longer period of time; maybe your pages will not be up to date by the second, but it will allow your website to survive the digg-effect!
About that, How can I detect and survive being “Slashdotted”? might be an interesting read.
On the PHP side of things:
First of all: are you using a recent version of PHP? There are regularly improvements in speed, with new versions ;-)
For instance, take a look at Benchmark of PHP Branches 3.0 through 5.3-CVS.
Note that performance is quite a good reason to use PHP 5.3 (I've made some benchmarks (in French), and the results are great)...
Another pretty good reason being, of course, that PHP 5.2 has reached its end of life, and is not maintained anymore!
Are you using any opcode cache?
I'm thinking about APC - Alternative PHP Cache, for instance (pecl, manual), which is the solution I've seen used the most -- and that is used on all servers on which I've worked.
See also: Slides APC Facebook,
Or Benchmark Results Show 400%-700% Increase In Server Capabilities with APC and Squid Cache.
It can really lower the CPU-load of a server a lot, in some cases (I've seen the CPU-load on some servers go from 80% to 40%, just by installing APC and activating its opcode-cache functionality!)
Basically, execution of a PHP script goes in two steps:
Compilation of the PHP source-code to opcodes (kind of an equivalent of JAVA's bytecode)
Execution of those opcodes
APC keeps those in memory, so there is less work to be done each time a PHP script/file is executed: only fetch the opcodes from RAM, and execute them.
You might need to take a look at APC's configuration options, by the way
there are quite a few of those, and some can have a great impact on both speed / CPU-load / ease of use for you
For instance, disabling [apc.stat](https://php.net/manual/en/apc.configuration.php#ini.apc.stat) can be good for system load; but it means modifications made to PHP files won't be taken into account unless you flush the whole opcode cache; about that, for more details, see for instance To stat() Or Not To stat()?
Using cache for data
As much as possible, it is better to avoid doing the same thing over and over again.
The main thing I'm thinking about is, of course, SQL queries: many of your pages probably run the same queries, and the results of some of them are probably almost always the same... which means lots of "useless" queries hit the database, which has to spend time serving the same data over and over again.
Of course, this is true for other stuff, like Web Services calls, fetching information from other websites, heavy calculations, ...
It might be very interesting for you to identify:
Which queries are run lots of times, always returning the same data
Which other (heavy) calculations are done lots of times, always returning the same result
And store these data/results in some kind of cache, so they are easier to get -- faster -- and you don't have to go to your SQL server for "nothing".
Great caching mechanisms are, for instance:
APC: in addition to the opcode-cache I talked about earlier, it allows you to store data in memory,
And/or memcached (see also), which is very useful if you literally have lots of data and/or are using multiple servers, as it is distributed.
of course, you can think about files; and probably many other ideas.
I'm pretty sure your framework comes with some cache-related stuff; you probably already know that, as you said "I will be using the Cache-library more in time to come" in the OP ;-)
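As a small illustration of the data-cache side of APC (the key, TTL and the expensive_query() helper are made up):

<?php
// Cache the result of an expensive operation in APC's user cache.
function get_top_articles()
{
    $data = apc_fetch('top_articles', $hit);
    if (!$hit) {
        $data = expensive_query();               // e.g. a heavy SQL query or web-service call
        apc_store('top_articles', $data, 600);   // keep it for 10 minutes
    }
    return $data;
}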
Profiling
Now, a nice thing to do would be to use the Xdebug extension to profile your application: it often allows you to find a couple of weak spots quite easily -- at least, if there is any function that takes lots of time.
Configured properly, it will generate profiling files that can be analysed with some graphic tools, such as:
KCachegrind: my favorite, but works only on Linux/KDE
WinCacheGrind for Windows; it does a bit less than KCacheGrind, unfortunately -- it typically doesn't display callgraphs.
Webgrind which runs on a PHP webserver, so works anywhere -- but probably has less features.
For instance, here are a couple screenshots of KCacheGrind:
[KCacheGrind screenshots omitted (source: pascal-martin.fr)]
(BTW, the callgraph presented on the second screenshot is typically something neither WinCacheGrind nor Webgrind can do, if I remember correctly ^^ )
(Thanks #Mikushi for the comment) Another possibility that I haven't used much is the xhprof extension: it also helps with profiling and can generate callgraphs, but is lighter than Xdebug, which means you should be able to install it on a production server.
You should be able to use it alongside XHGui, which will help with the visualisation of the data.
On the SQL side of things:
Now that we've spoken a bit about PHP, note that it is more than possible that your bottleneck isn't the PHP-side of things, but the database one...
At least two or three things, here:
You should determine:
What are the most frequent queries your application is doing
Whether those are optimized (using the right indexes, mainly?), using the EXPLAIN instruction, if you are using MySQL
See also: Optimizing SELECT and Other Statements
You can, for instance, activate log_slow_queries to get a list of the requests that take "too much" time, and start your optimization by those.
whether you could cache some of these queries (see what I said earlier)
Is your MySQL well configured? I don't know much about that, but there are some configuration options that might have some impact.
Optimizing the MySQL Server might give you some interesting information about that.
Still, the two most important things are:
Don't go to the DB if you don't need to: cache as much as you can!
When you have to go to the DB, use efficient queries: use indexes; and profile!
And what now?
If you are still reading, what else could be optimized?
Well, there is still room for improvements... A couple of architecture-oriented ideas might be:
Switch to an n-tier architecture:
Put MySQL on another server (2-tier: one for PHP; the other for MySQL)
Use several PHP servers (and load-balance the users between those)
Use other machines for static files, with a lighter webserver, like:
lighttpd
or nginx -- this one is becoming more and more popular, btw.
Use several servers for MySQL, several servers for PHP, and several reverse-proxies in front of those
Of course: install memcached daemons on whatever server has any amount of free RAM, and use them to cache as much as you can / makes sense.
Use something "more efficient" that Apache?
I hear more and more often about nginx, which is supposed to be great when it comes to PHP and high-volume websites; I've never used it myself, but you might find some interesting articles about it on the net;
for instance, PHP performance III -- Running nginx.
See also: PHP-FPM - FastCGI Process Manager, which is bundled with PHP >= 5.3.3, and does wonders with nginx.
Well, maybe some of those ideas are a bit overkill in your situation ^^
But, still... Why not study them a bit, just in case ? ;-)
And what about Kohana?
Your initial question was about optimizing an application that uses Kohana... Well, I've posted some ideas that are true for any PHP application... Which means they are true for Kohana too ;-)
(Even if not specific to it ^^)
I said: use cache; Kohana seems to support some caching stuff (You talked about it yourself, so nothing new here...)
If there is anything that can be done quickly, try it ;-)
I also said you shouldn't do anything that's not necessary; is there anything enabled by default in Kohana that you don't need?
Browsing the net, it seems there is at least something about XSS filtering; do you need that?
Still, here's a couple of links that might be useful:
Kohana General Discussion: Caching?
Community Support: Web Site Optimization: Maximum Website Performance using Kohana
Conclusion?
And, to conclude, a simple thought:
How much will it cost your company to pay you 5 days? -- considering it is a reasonable amount of time to do some great optimizations
How much will it cost your company to buy (pay for?) a second server, and its maintenance?
What if you have to scale larger?
How much will it cost to spend 10 days? more? optimizing every possible bit of your application?
And how much for a couple more servers?
I'm not saying you shouldn't optimize: you definitely should!
But go for "quick" optimizations that will get you big rewards first: using some opcode cache might help you get between 10 and 50 percent off your server's CPU-load... And it takes only a couple of minutes to set up ;-) On the other side, spending 3 days for 2 percent...
Oh, and, btw: before doing anything: put some monitoring stuff in place, so you know what improvements have been made, and how!
Without monitoring, you will have no idea of the effect of what you did... Not even if it's a real optimization or not!
For instance, you could use something like RRDtool + cacti.
And showing your boss some nice graphics with a 40% CPU-load drop is always great ;-)
Anyway, and to really conclude: have fun!
(Yes, optimizing is fun!)
(Ergh, I didn't think I would write that much... Hope at least some parts of this are useful... And I should remember this answer: might be useful some other times...)
Use XDebug and WinCacheGrind or Webgrind to profile and analyze slow code execution.
[profiling screenshot omitted (source: jokke.dk)]
Profile code with XDebug.
Use a lot of caching. If your pages are relatively static, then reverse proxy might be the best way to do it.
Kohana is out of the box very, very fast, except for the use of database objects. To quote Zombor: "You can reduce memory usage by ensuring you are using the database result object instead of result arrays." This makes a HUGE performance difference on a site that is being slammed. Not only do result arrays use more memory, they slow down the execution of scripts.
Also - you must use caching. I prefer memcache and use it in my models like this:
public function get($e_id)
{
    $event_data = $this->cache->get('event_get_'.$e_id.Kohana::config('config.site_domain'));

    if ($event_data === NULL)
    {
        $this->db_slave
            ->select('e_id,e_name')
            ->from('Events')
            ->where('e_id', $e_id);

        $result = $this->db_slave->get();
        $event_data = ($result->count() == 1) ? $result->current() : FALSE;

        $this->cache->set('event_get_'.$e_id.Kohana::config('config.site_domain'), $event_data, NULL, 300); // 5 minutes
    }

    return $event_data;
}
This will also dramatically increase performance. The above two techniques improved a site's performance by 80%.
If you gave some more information about where you think the bottleneck is, I'm sure we could give some better ideas.
Also check out yslow (google it) for some other performance tips.
Strictly related to Kohana (you probably already have done this, or not):
In production mode:
Enable internal caching (this will only cache the Kohana::find_file results, but this can actually help a lot).
Disable profiler
Just my 2 cents :)
I totally agree with the XDebug and caching answers. Don't look into the Kohana layer for optimization until you've identified your biggest speed and scale bottlenecks.
XDebug will tell you where you spend most of your time and identify 'hotspots' in your code. Keep this profiling information so you can baseline and measure performance improvements.
Example problem and solution:
If you find that you're building up expensive objects from the database each time, that don't really change often, then you can look at caching them with memcached or another mechanism. All of these performance fixes take time and add complexity to your system, so be sure of your bottlenecks before you start fixing them.

Rolling and packing PHP scripts

I was just reading over this thread where the pros and cons of using include_once and require_once were being debated. From that discussion (particularly Ambush Commander's answer), I've taken away the fact(?) that any sort of include in PHP is inherently expensive, since it requires the processor to parse a new file into OP codes and so on.
This got me to thinking.
I have written a small script which will "roll" a number of Javascript files into one (appending the all contents into another file), such that it can be packed to reduce HTTP requests and overall bandwidth usage.
Typically for my PHP applications, I have one "includes.php" file which is included on each page, and that then includes all the classes and other libraries which I need. (I know this isn't probably the best practise, but it works - the __autoload feature of PHP5 is making this better in any case).
Should I apply the same "rolling" technique on my PHP files?
I know of that saying about premature optimisation being evil, but let's take this question as theoretical, ok?
There is a problem with Apache/PHP on Windows which causes the application to be extremely slow when loading or even touching too many files (a page that loads approx. 50-100 files may spend a few seconds on file handling alone). This problem appears both with including/requiring and with working with files (fopen, file_get_contents, etc.).
So if you (or, more likely, anybody else, given the age of this post) ever run your app on Apache/Windows, reducing the number of loaded files is absolutely necessary. Combine multiple PHP classes into one file (an automated script for this would be useful -- I haven't found one yet; see the sketch below) or be careful not to touch any unneeded files in your app.
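Purely as an illustration, a naive combining script could look like this (paths are made up, and it assumes each class file starts with a plain <?php tag):

<?php
// combine.php -- naive "roller" that concatenates class files into a single include.
$out = "<?php\n";
foreach (glob(__DIR__ . '/classes/*.php') as $file) {
    $code = file_get_contents($file);
    $code = preg_replace('/^<\?php\s*/', '', $code);   // strip the opening tag
    $code = preg_replace('/\?>\s*$/', '', $code);      // and a trailing closing tag, if any
    $out .= "\n// ---- " . basename($file) . " ----\n" . $code . "\n";
}
file_put_contents(__DIR__ . '/combined.php', $out);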
That would depend somewhat on whether it was more work to parse several small files or to parse one big one. If you require files on an as-needed basis (not saying you necessarily should do things that way), then presumably for some execution paths there would be considerably less compilation required than if all your code was rolled into one big PHP file that the parser had to encode in its entirety whether it was needed or not.
In keeping with the question, this is thinking aloud more than expertise on the internals of the PHP runtime -- it doesn't sound as though there is any real-world benefit to getting too involved with this at all. If you run into a serious slowdown in your PHP, I would be very surprised if the use of require_once turned out to be the bottleneck.
As you've said: "premature optimisation ...". Then again, if you're worried about performance, use an opcode cache like APC, which makes this problem almost disappear.
This isn't an answer to your direct question, just about your "js packing".
If you leave your javascript files alone and allow them to be included individually in the HTML source, the browser will cache those files. Then, on subsequent requests when the browser requests the same javascript file, your server will return a 304 Not Modified header and the browser will use the cached version. However, if you're "packing" the javascript files together on every request, the browser will re-download the file on every page load.
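If you do serve a packed file through PHP, you can still let browsers cache it by honouring conditional requests. A rough sketch (the packed-file path is made up):

<?php
// packed-js.php -- serve the combined Javascript with Last-Modified / 304 support.
$packed = __DIR__ . '/cache/packed.js';
$mtime  = filemtime($packed);

header('Content-Type: application/javascript');
header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $mtime) . ' GMT');

if (isset($_SERVER['HTTP_IF_MODIFIED_SINCE'])
    && strtotime($_SERVER['HTTP_IF_MODIFIED_SINCE']) >= $mtime) {
    header('HTTP/1.1 304 Not Modified');
    exit;   // the browser already has this version
}

readfile($packed);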
