Making PHP performance profiling predictable - php

I'm using xdebug with PHP to do some performance profiling. But when I run the same script more than once, I often get very different times. So it's hard to know how much faith to put in the results.
Obviously there's a lot happening on a machine that can affect PHP's performance. But is there anything I can do to reduce the number of variables, so multiple tests are more consistent?
I'm running PHP under Apache, on Mac OS X.

Reduce the number of unrelated services on the box as much as possible.
Cut down on the number of Apache processes.
Prime the various caches by loading your script a few times. Possibly use a benchmarking tool like Apache's ab or siege, to make sure all Apache children are hit.
Profile your script from the command line using curl or wget so that Apache only serves one resource: the script itself.
There may be an argument for getting more "real world" numbers by omitting some of these steps. I look forward to other answers this question may receive.

There are two different tasks, measuring performance and finding problems.
For measuring the time it takes, you should expect variability, because it depends on what else is going on in the machine. That's normal.
For finding problems, what you need to know is the percent of time used by various activities. Percent doesn't change too much as a function of other things, and the exact value of the percent doesn't matter much anyway.
What matters is that you find activities responsible for a healthy percent, that you can fix, and then that you fix them. When you do, you can expect to save time up to that percent, but the finding is what you need to do. The measuring is secondary.
Added: You might want to ask "Don't you have to measure in order to find?"
Consider an example. Suppose you run your program with debugging turned on, and you randomly pause it, and you see it in the process of closing a log file. You continue it, and then pause it again, and see the same thing. Well that rough "measurement" says it's spending 100% of its time doing that. Naturally, the time spent doing it isn't really 100%, but whatever it is, it's big, and you've found it. So then maybe you don't have to open/close the file so often, or something. Typically, more samples are needed, but not too many.

As others have said, reduce the running services and programs to a minimum
Run your test multiple times in succession and average to account for outliers
Make sure caching of any sort is disabled (unless you specifically want to test it with a cache)
If the results still vary widely, the problem is most likely in your profiling code. It might have race conditions or depend on network connections. You will get more details if you provide the code.
You also might be hitting some bottlenecks on some of the runs. If you carefully profile different parts of the script, you might be able to catch them.
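To make the "run it several times and average" advice concrete, here is a minimal sketch. The function names `summarizeTimings` and `timeRepeatedly` are made up for this example, and the warm-up count is an assumption. It reports the median alongside the mean, since a single outlier run can drag the mean a long way.

```php
<?php
// Summarize repeated timings so one outlier does not dominate the result.
function summarizeTimings(array $seconds)
{
    if (!$seconds) {
        throw new InvalidArgumentException('Need at least one timing.');
    }
    sort($seconds);
    $n = count($seconds);
    $mid = (int) floor($n / 2);
    $median = ($n % 2 === 1)
        ? $seconds[$mid]
        : ($seconds[$mid - 1] + $seconds[$mid]) / 2;

    return [
        'runs'   => $n,
        'min'    => $seconds[0],
        'median' => $median,
        'mean'   => array_sum($seconds) / $n,
        'max'    => $seconds[$n - 1],
    ];
}

// Time a callable several times; warm-up runs prime caches and are discarded.
function timeRepeatedly(callable $fn, $runs = 10, $warmup = 2)
{
    for ($i = 0; $i < $warmup; $i++) {
        $fn();
    }
    $timings = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $fn();
        $timings[] = microtime(true) - $start;
    }
    return summarizeTimings($timings);
}
```

If the median is stable between sessions while the max jumps around, the variation is probably coming from the machine rather than from your script.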

Related

debugging long running PHP script

I have a PHP script running as a cron job that makes extensive use of third-party code. The script itself is a few thousand LOC. Basically it's a data import / processing script (JSON to MySQL, but it also makes a lot of HTTP calls and some SOAP calls).
Now, performance degrades over time. When testing with a few records (around 100), performance is OK: it finishes in 10-20 minutes. When running the whole import (about 1600 records), the mean import time per record grows steadily, and the whole thing takes more than 24 hours, so at least 5 times longer than expected.
Memory seems not to be a problem, usage growing as it should, without unexpected peaks.
So, I need to debug it to find the bottleneck. It can be some problem with the script, underlying code base, php itself, database, os or network part. I am suspecting for now some kind of caching somewhere which is not behaving well with a near 100 % miss ratio.
I cannot use Xdebug: the profile file grows too fast to be manageable.
So question is: how can I debug this kind of script?
PHP version: 5.4.41
OS: Debian 7.8
I can have root privileges if necessary, and install the tools. But it's the production server and ideally debugging should not be too disrupting.
Yes, it's possible. You can use Kint (a PHP debugging tool).
What is it?
Kint for PHP is a tool designed to present your debugging data in the absolutely best way possible.
In other words, it's var_dump() and debug_backtrace() on steroids. Easy to use, but powerful and customizable. An essential addition to your development toolbox.
Still lost? You use it to see what's inside variables.
It can act as a replacement for debug_backtrace(), too.
You can download it, along with the full documentation, from the Kint project page.
Plus, it also supports almost all PHP frameworks:
CodeIgniter
Drupal
Symfony
Symfony 2
WordPress
Yii framework
Zend Framework
All the Best.... :)
There are three things that come to mind:
Set up an IDE so you can debug the PHP script line by line
Add some logging to the script
Look for long running queries in MySQL
Debug option #2 is the easiest. Since this is running as a cron job, you add a bunch of echo's in your script:
<?php
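function log_message($type, $message) {
    // Function calls are not interpolated inside double-quoted strings,
    // so build the line with printf() and end it with a newline.
    printf("[%s, %s] %s\n", strtoupper($type), date('d-m-Y H:i:s'), $message);
}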
log_message('info', 'Import script started');
// ... the rest of your script
log_message('info', 'Import script finished');
Then pipe stdout to a log file in the cron job command.
01 04 * * * php /path/to/script.php >> /path/to/script.log
Now you can add log_message('info|warn|debug|error', 'Message here') all over the script and at least get an idea of where the performance issue lies.
Debug option #3 is just straight investigation work in MySQL. One of your queries might be taking a long time, and it might show up in a long running query utility for MySQL.
Profiling tool:
There is a PHP profiling tool called Blackfire which is currently in public beta. There is specific documentation on how to profile CLI applications. Once you have collected the profile, you can analyze the application control flow with time measurements in a nice UI.
Memory consumption suspicious:
"Memory seems not to be a problem, usage growing as it should, without unexpected peaks."
Growing memory usage actually sounds suspicious! If the current dataset does not depend on all previous datasets of the import, then growing memory most probably means that all imported datasets are kept in memory, which is bad. PHP may also frequently try to garbage collect, just to find out that there is nothing to remove from memory. Long-running CLI tasks are especially affected, so be sure to read the blog post that discovered the behavior.
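One cheap way to check whether memory really grows per record is to print a checkpoint every so often. This is only a sketch: `memoryCheckpoint` is a made-up helper, and the interval of 100 records is an arbitrary assumption. It uses only built-in functions (`memory_get_usage`, `memory_get_peak_usage`, `gc_collect_cycles`).

```php
<?php
// Format a memory checkpoint line; returns null when no checkpoint is due.
function memoryCheckpoint($recordIndex, $every = 100)
{
    if ($every < 1 || $recordIndex % $every !== 0) {
        return null;
    }
    // Free cyclic garbage first so the reading reflects memory still referenced.
    $freed = gc_collect_cycles();
    return sprintf(
        '[%s] record %d: %.1f MB in use, %.1f MB peak, %d cycles collected',
        date('Y-m-d H:i:s'),
        $recordIndex,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576,
        $freed
    );
}

// Usage inside a hypothetical import loop:
// foreach ($records as $i => $record) {
//     importRecord($record);              // placeholder for the real work
//     if (($line = memoryCheckpoint($i)) !== null) {
//         echo $line, PHP_EOL;
//     }
// }
```

If the "in use" figure climbs steadily even right after a collection, something is still holding references to records that were already imported.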
Use strace to see what the program is basically doing from the system perspective. Is it hanging in IO operations etc.? strace should be the first thing you try when encountering performance problems with whatever kind of Linux application. Nobody can hide from it! ;)
If you find that the program hangs in network-related calls like connect, recvfrom and friends, meaning the network communication hangs at some point while connecting or waiting for responses, then you can use tcpdump to analyze this.
Using the above methods you should be able to find out most common performance problems. Note that you can even attach to a running task with strace using -p PID.
If the above methods don't help, I would profile the script using Xdebug. You can analyse the profiler output using tools like KCachegrind.
Although you don't say so, my guess is that you are dealing with records one at a time, but in one big cron job.
i.e. grab record #1, munge it somehow, add value to it, reformat it, then save it, then move on to record #2.
I would consider breaking the big cron job down, i.e.:
Cron#1: grab all the records, and cache all the salient data locally (to that server). Set a flag if this stage is achieved.
Cron #2: Now you have the data you need, munge and add value, cache that output. Set a flag if this stage is achieved.
Cron #3: Reformat that data and store it. Delete all the files.
This kind of "divide and conquer" will ease your debugging woes, and lead to a better understanding of what is actually going on, and as a bonus give you the opportunity to rerun say, cron 2.
I've had to do this many times, and for me logging is the key: it identifies weaknesses in your code, exposes poor assumptions about data quality, and can hint at where latency is causing a problem.
I've run into strange slowdowns when doing network heavy efforts in the past. Basically, what I found was that during manual testing the system was very fast but when left to run unattended it would not get as much done as I had hoped.
In my case the issue I found was that I had default network timeouts in place and many web requests would simply time out.
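As a sketch of making timeouts explicit and visible, assuming the HTTP calls go through PHP's stream wrappers (`file_get_contents` with `allow_url_fopen` enabled): the helper names and the 5 second default below are my own choices for illustration. cURL and SoapClient have their own timeout settings, which this does not cover.

```php
<?php
// Build a stream context with an explicit timeout for HTTP(S) requests.
function httpContextWithTimeout($seconds)
{
    return stream_context_create([
        'http' => [
            'timeout'       => (float) $seconds, // read timeout in seconds
            'ignore_errors' => true,             // still return the body on 4xx/5xx
        ],
    ]);
}

// Fetch a URL and report how long the request took, whether it succeeded or not.
function timedFetch($url, $timeoutSeconds = 5.0)
{
    $start = microtime(true);
    $body = @file_get_contents($url, false, httpContextWithTimeout($timeoutSeconds));
    return [
        'url'     => $url,
        'ok'      => $body !== false,
        'seconds' => microtime(true) - $start,
        'body'    => $body,
    ];
}
```

Logging `seconds` for every failed fetch makes it obvious when requests are piling up right at the timeout value.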
In general, though it is not an external tool, you can use the difference between two microtime(TRUE) calls to time sections of code. To keep the log small, give each check a counter and only log a slow event while that counter is above zero, decrementing it each time an event is logged. You can have individual counters for individual code segments, or even for different time limits within a code segment.
$flag['name'] = 10; // How many times to fire
$slow['name'] = 0.5; // How long in seconds before it's a problem?
$start = microtime(TRUE);
do_something($parameters);
$used = microtime(TRUE) - $start;
if ( $flag['name'] && $used >= $slow['name'] )
{
logit($parameters);
$flag['name']--;
}
If you output which URL, or other data/event, took too long to process, then you can dig into that particular item later to see if you can find out how it is causing trouble in your code.
Of course, this assumes that individual items are causing your problem and not simply a general slowdown over time.
EDIT:
I (now) see it's a production server. This makes editing the code less enjoyable. You'd probably want to keep the changes to the code minimal, with the timing logic, and possibly the supported tags/flags and their limits, kept in an external file.
setStart('flagname');
// Do stuff to be checked for speed here
setStop('flagname',$moredata);
For maximum robustness the methods/functions would have to ensure they handled unknown tags, missing parameters, and so forth.
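A rough sketch of what such helpers could look like. I've wrapped them in a class (`SectionTimer`, a made-up name) rather than plain functions, and the shape of the config array is an assumption; in practice it would be loaded from the external file. Unknown tags and a stop without a matching start are silently ignored so the import itself never breaks.

```php
<?php
// Hypothetical helper: per-tag timing with a cap on how many slow events get logged.
class SectionTimer
{
    private $config;        // tag => ['limit' => int, 'slow' => float seconds]
    private $started = [];  // tag => start timestamp
    private $entries = [];  // logged slow events

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    public function setStart($tag)
    {
        if (!isset($this->config[$tag])) {
            return; // unknown tag: ignore rather than break the import
        }
        $this->started[$tag] = microtime(true);
    }

    public function setStop($tag, $moreData = null)
    {
        if (!isset($this->config[$tag], $this->started[$tag])) {
            return; // unknown tag, or stop without a matching start
        }
        $used = microtime(true) - $this->started[$tag];
        unset($this->started[$tag]);

        if ($this->config[$tag]['limit'] > 0 && $used >= $this->config[$tag]['slow']) {
            $this->config[$tag]['limit']--;
            $this->entries[] = ['tag' => $tag, 'seconds' => $used, 'data' => $moreData];
        }
    }

    public function entries()
    {
        return $this->entries;
    }
}
```

Usage would be something like `$timer = new SectionTimer(['fetch' => ['limit' => 10, 'slow' => 0.5]]);`, then `$timer->setStart('fetch')` and `$timer->setStop('fetch', $url)` around the code being checked, and writing `$timer->entries()` to the log at the end.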
xdebug_print_function_stack is an option, but what you can also do is create a "function trace". There are three output formats. One is meant as a human-readable trace, another is more suited for computer programs as it is easier to parse, and the last one uses HTML for formatting the trace.
http://www.xdebug.org/docs/execution_trace
Okay, basically you have two possibilities: it's either inefficient PHP code or inefficient MySQL code. Judging by what you say, it's probably inserting a lot of records one at a time into an indexed table, which causes the insertion time to skyrocket. You should either disable indexes and rebuild them after insertion, or optimize the insertion code.
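One way to "optimize the insertion code" is to send rows in batches with a single multi-row INSERT instead of one statement per record. A minimal sketch: `buildBatchInsert` is a made-up helper, and the table and column names in the usage comment are hypothetical. Table and column names are placed into the SQL directly, so they must come from your own code, never from input.

```php
<?php
// Build one multi-row INSERT with placeholders instead of one statement per record.
function buildBatchInsert($table, array $columns, array $rows)
{
    if (!$columns || !$rows) {
        throw new InvalidArgumentException('Need at least one column and one row.');
    }
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $params = [];
    foreach ($rows as $row) {
        foreach ($columns as $column) {
            $params[] = $row[$column]; // keep values in column order
        }
    }
    $sql = sprintf(
        'INSERT INTO %s (%s) VALUES %s',
        $table,
        implode(', ', $columns),
        implode(', ', array_fill(0, count($rows), $rowPlaceholder))
    );
    return [$sql, $params];
}

// Usage with PDO (connection setup omitted, names are placeholders):
// list($sql, $params) = buildBatchInsert('products', ['sku', 'name'], $batch);
// $pdo->prepare($sql)->execute($params);
```

Whether this helps depends on where the time actually goes, so measure a batch of a few hundred rows before and after.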
But, about the tools.
You can configure the system to automatically log slow MySQL queries:
https://dev.mysql.com/doc/refman/5.1/en/slow-query-log.html
You can also do the same with PHP scripts, but you need a PHP-FPM environment (and you probably have Apache).
https://rtcamp.com/tutorials/php/fpm-slow-log/
These tools are very powerful and versatile.
P.S. 10-20 minutes for 100 records seems like A LOT.
You can use https://github.com/jmartin82/phplapse to record the application's activity for a set period of time.
For example, start recording after n iterations with:
phplapse_start();
And stop it in next iteration with:
phplapse_stop();
This gives you a snapshot of the execution at the point where everything seems to run slowly.
(I'm the author of the project; don't hesitate to contact me to improve the functionality.)
I have a similar thing running each night (a cron job to update my database). I have found the most reliable way to debug is to set up a log table in the database and regularly insert / update a json string containing a multi-dimensional array with info about each record and whatever useful info you want to know about each record. This way if your cron job does not finish you still have detailed information about where it got up to and what happened along the way. Then you can write a simple page to pull out the json string, turn it back into an array and print useful data onto the page including timing and passed tests etc. When you see something as an issue you can concentrate on putting more info from that area into the json string.
The regular "top" command can show you whether CPU usage by PHP or MySQL is the bottleneck. If not, the delays may be caused by the HTTP calls.
If CPU usage by mysqld is low but constant, disk I/O may be the bottleneck.
You can also check your bandwidth usage by installing and using "speedometer" or other tools.

PHP Xdebug Profiling - Understand what to check

I'm new to Xdebug. I see it as a "must have" tool to make sure the app I'm coding is done well.
Here's my setup :
- MAMP on Macbook Air.
- Zend XDebug activated in PHP.INI
- Webgrind for reports
I made all the configuration to make the profiler running and it works great.
My only question is what I should look at and worry about.
Some people say that the whole PHP process shouldn't take more than 100ms, and the closer to 50ms the better.
OK, that's a good start...
Could anybody be clearer about what to check, what is acceptable and what is not?
Thanks.
It's not so much a matter of what's acceptable.
It's more a matter of seeing what it is spending a lot of time doing, and seeing if you can think of a way to reduce that.
xDebug shows stack traces if you interrupt it (by Ctrl-C, or Escape, or whatever), and that is very useful information.
For example, suppose it is spending 40% of its time allocating some chunk of memory, and discarding it, when it could be done just once, or parsing some string multiple times when it could be done just once, or something neither of us could guess ahead of time, but once you see it, you slap your head and say "I can do something about that!"
Well, when you interrupt it, there's a 40% chance you will see it (on the stack).
Interrupt it again, and again, until you've seen it twice.
On average it should take 2/0.4 interrupts, or about 5.
When you've seen it twice, you've found a juicy speedup.
(You don't know it's juicy until you see it twice.)
Then rinse and repeat, because something that was smaller before is now a larger percent of the time. You will quickly make the code as speedy as anyone's.
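If you want to convince yourself of the "2/0.4, or about 5" figure, here is a small simulation sketch. It assumes each interrupt independently lands in the hot spot with probability p, which is a simplification; the function name and the trial count are made up for the example.

```php
<?php
// Simulate random pauses: each pause lands in the hot spot with probability $p.
// Returns the average number of pauses needed to see it $needed times.
function averagePausesUntilSeen($p, $needed = 2, $trials = 20000)
{
    if ($p <= 0 || $p > 1) {
        throw new InvalidArgumentException('p must be in (0, 1].');
    }
    $totalPauses = 0;
    for ($t = 0; $t < $trials; $t++) {
        $seen = 0;
        while ($seen < $needed) {
            $totalPauses++;
            if (mt_rand() / mt_getrandmax() < $p) {
                $seen++;
            }
        }
    }
    return $totalPauses / $trials;
}

mt_srand(42); // fixed seed so repeated runs give the same estimate
printf("average pauses: %.2f (theory: %.2f)\n", averagePausesUntilSeen(0.4), 2 / 0.4);
```

The simulated average should land close to 5, and it shrinks quickly as the hot spot gets bigger, which is why big problems take very few samples to find.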

Crawl page faster [PHP]

I have a small question about crawling a web page in PHP. I have to crawl about 90 000 products on one big eshop. I tried it in PHP, but one product takes about 2-3 sec and that's bad. Any tips, how to do it faster? Maybe a C++ multithread version? But what about time of a HTTP request? I mean, is it PHP's limitation or not? Thank you for the tips.
That's an extremely vague question. When you benchmarked the code you have, what was the slowest part? Was it network transfer times? Using a different language (or multiple threads) won't change that.
Was it time spent parsing the page? How are you doing that? If you're using an XML library to parse the entire DOM, could you get away with just looking for keywords (or even regular expressions)? That's less precise (and in some sense less correct) but perhaps it's faster.
What algorithms are you using for your analysis? Would other data structures provide better performance? As one simple example, if you spend a lot of time iterating over an array, perhaps a hash map is more appropriate.
PHP can be run in multiple processes. What happens if you kick off multiple instances of your script at once (on different pages)? Does the total time decrease?
Ultimately you've described a very general problem so I can't offer very specific solutions, but there is no inherent reason why PHP is inappropriate for this task. When you've identified what's slow (regardless of what language you're using) you should be able to more precisely address how to fix it.
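To put rough numbers on the multiple-processes idea, here is a back-of-the-envelope sketch. It assumes the work is dominated by waiting on the network, that it splits evenly, and that the shop's server tolerates the extra concurrency; both function names are made up.

```php
<?php
// Rough wall-clock estimate when items are split evenly across parallel workers.
function estimateCrawlHours($items, $secondsPerItem, $workers = 1)
{
    if ($workers < 1) {
        throw new InvalidArgumentException('Need at least one worker.');
    }
    $itemsPerWorker = (int) ceil($items / $workers);
    return $itemsPerWorker * $secondsPerItem / 3600;
}

// Split a list of product URLs into at most one chunk per worker process.
function splitForWorkers(array $urls, $workers)
{
    if (!$urls || $workers < 1) {
        return [];
    }
    return array_chunk($urls, (int) ceil(count($urls) / $workers));
}
```

With the figures from the question, `estimateCrawlHours(90000, 2.5)` is 62.5 hours for a single process and `estimateCrawlHours(90000, 2.5, 10)` is 6.25 hours for ten, if the assumptions hold.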
I don't think it's PHP's problem, but it could be, depending on connection speed/computer speed. I've never had a speed problem with PHP/cURL though.
Just do multiple threads (ie. multiple connections at once), I suggest you use cURL but that's only because I'm familiar with it.
Here's a guide I've used for multiple threads for scraping with cURL:
http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading
Be VERY careful not to accidentally cause a denial of service situation with your scripts. But I'm sure you're already aware of that possibility.
If your program is running slowly, my advice would be to run a profiler on it, and analyse why it's running slowly.
This advice applies to any language, but in the case of PHP, the profiler software you need is called xDebug.
This is a PHP extension, so you need to install it into your server. If you're running on an ISP's server, then you may not have permission to do this, but you can always install it with PHP on your local PC and run your tests there.
Once you've got xDebug installed, switch on the profiling features in PHP.ini (see the xDebug documentation for instruction on this), and run your program. It will then generate profiler files, which can be used to analyse what the program is doing.
Download KCacheGrind to perform the analysis. This will generate call tree information, showing exactly what happened as the program ran, and how long every function call took.
With this information, you can look for the function calls that are running slowly, and work out what's happening. Usually the reason for slow code is some kind of inefficiency in the way something is written; xDebug will help you find it.
Hope that helps.
You have 99% probability that PHP is NOT the problem. It is rather the eshop webserver or any other network latency.
I know this for sure because I have been doing this for months now, and even if your code has lots of regular expressions, data scraping is really fast in PHP.
The solution to speed this up? Pre-cache the whole website with a command-line crawler, since disk space is cheap. curl can do this, and so can httrack. It will be much faster and more stable than having PHP do the crawling.
Then let PHP do only the parsing; hopefully you will see PHP chomping through dozens of pages per minute. Hope this helps :)

From PHP workers to Python threads

Right now I'm running 50 individual PHP workers (processes, in CLI mode) per machine that are waiting to receive their workload (job). For example, the job of resizing an image. In the workload they receive the image (binary data) and the desired size. The worker does its work and returns the resized image. Then it waits for more jobs (it loops in a smart way). I'm presuming that I have the same executable, libraries and classes loaded and instantiated 50 times. Am I correct? Because this does not sound very efficient.
What I'd like to have now is one process that handles all this work and being able to use all available CPU cores while having everything loaded only once (to be more efficient). I presume a new thread would be started for each job and after it finishes, the thread would stop. More jobs would be accepted if there are less than 50 threads doing the work. If all 50 threads are busy, no additional jobs are accepted.
I am using a lot of libraries (for Memcached, Redis, MogileFS, ...) to have access to all the various components that the system uses and Python is pretty much the only language apart from PHP that has support for all of them.
Can Python do what I want, and will it be faster and more efficient than the current PHP solution?
Most probably - yes. But don't assume you have to do multithreading. Have a look at the multiprocessing module. It already has an implementation of a Pool included, which is what you could use. And it basically solves the GIL problem (multithreading can run only 1 "standard python code" at any time - that's a very simplified explanation).
It will still fork a process per job, but in a different way than starting it all over again. All the initialisations done and libraries loaded before entering the worker process will be inherited in a copy-on-write way. You won't do more initialisations than necessary, and you will not waste memory on the same library/class as long as you don't change it from its pre-pool state.
So yes - looking only at this part, python will be wasting less resources and will use a "nicer" worker-pool model. Whether it will really be faster / less CPU-abusing, is hard to tell without testing, or at least looking at the code. Try it yourself.
Added: If you're worried about memory usage, Python may also help you a bit, since it has a "proper" garbage collector, while in PHP GC is not a priority and not that good (and for a good reason too).
Linux has shared libraries, so those 50 php processes use mostly the same libraries.
You don't sound like you even have a problem at all.
"this does not sound very effective." is not a problem description, if anything those words are a problem on their own. Writing code needs a real reason, else you're just wasting time and/or money.
Python is a fine language and won't perform worse than PHP. Python's multiprocessing module will probably help a lot too. But there isn't much to gain if the PHP implementation is not completely insane. So why even bother spending time on it when everything works? That is usually the goal, not a reason to rewrite ...
If you are on a sane operating system then shared libraries should only be loaded once and shared among all processes using them. Memory for data structures and connection handles will obviously be duplicated, but the overhead of stopping and starting the systems may be greater than keeping things up while idle. If you are using something like gearman it might make sense to let several workers stay up even if idle and then have a persistent monitoring process that will start new workers if all the current workers are busy up until a threshold such as the number of available CPUs. That process could then kill workers in a LIFO manner after they have been idle for some period of time.

Optimizing Kohana-based Websites for Speed and Scalability

A site I built with Kohana was slammed with an enormous amount of traffic yesterday, causing me to take a step back and evaluate some of the design. I'm curious what are some standard techniques for optimizing Kohana-based applications?
I'm interested in benchmarking as well. Do I need to setup Benchmark::start() and Benchmark::stop() for each controller-method in order to see execution times for all pages, or am I able to apply benchmarking globally and quickly?
I will be using the Cache-library more in time to come, but I am open to more suggestions as I'm sure there's a lot I can do that I'm simply not aware of at the moment.
What I will say in this answer is not specific to Kohana, and can probably apply to lots of PHP projects.
Here are some points that come to my mind when talking about performance, scalability, PHP, ...
I've used many of those ideas while working on several projects -- and they helped; so they could probably help here too.
First of all, when it comes to performance, there are many aspects/questions to consider:
configuration of the server (Apache, PHP, MySQL, other possible daemons, and the system); you might get more help about that on ServerFault, I suppose,
PHP code,
Database queries,
Using or not a reverse proxy in front of your webserver?
Can you use any kind of caching mechanism? Or do you always need fully up-to-date data on the website?
Using a reverse proxy
The first thing that could be really useful is using a reverse proxy, like varnish, in front of your webserver: let it cache as many things as possible, so only requests that really need PHP/MySQL calculations (and, of course, some other requests, when they are not in the cache of the proxy) make it to Apache/PHP/MySQL.
First of all, your CSS/Javascript/Images -- well, everything that is static -- probably don't need to be always served by Apache
So, you can have the reverse proxy cache all those.
Serving those static files is no big deal for Apache, but the less it has to work for those, the more it will be able to do with PHP.
Remember: Apache can only serve a finite, limited number of requests at a time.
Then, have the reverse proxy serve as many PHP-pages as possible from cache: there are probably some pages that don't change that often, and could be served from cache. Instead of using some PHP-based cache, why not let another, lighter, server serve those (and fetch them from the PHP server from time to time, so they are always almost up to date)?
For instance, if you have some RSS feeds (we generally tend to forget those when trying to optimize for performance) that are requested very often, having them in cache for a couple of minutes could save hundreds/thousands of requests to Apache+PHP+MySQL!
Same for the most visited pages of your site, if they don't change for at least a couple of minutes (example: homepage?), then, no need to waste CPU re-generating them each time a user requests them.
Maybe there is a difference between pages served for anonymous users (the same page for all anonymous users) and pages served for identified users ("Hello Mr X, you have new messages", for instance)?
If so, you can probably configure the reverse proxy to cache the page that is served for anonymous users (based on a cookie, like the session cookie, typically)
It'll mean that Apache+PHP has less to deal with: only identified users -- which might be only a small part of your users.
About using a reverse-proxy as cache, for a PHP application, you can, for instance, take a look at Benchmark Results Show 400%-700% Increase In Server Capabilities with APC and Squid Cache.
(Yep, they are using Squid, and I was talking about varnish -- that's just another possibility ^^ Varnish being more recent, but more dedicated to caching)
If you do that well enough, and manage to stop re-generating too many pages again and again, maybe you won't even have to optimize any of your code ;-)
At least, maybe not in any kind of rush... And it's always better to perform optimizations when you are not under too much pressure...
As a sidenote: you are saying in the OP:
"A site I built with Kohana was slammed with an enormous amount of traffic yesterday,"
This is the kind of sudden situation where a reverse-proxy can literally save the day, if your website can deal with not being up to date by the second:
install it, configure it, let it always -- every normal day -- run:
Configure it to not keep PHP pages in cache; or only for a short duration; this way, you always have up to date data displayed
And, the day you take a slashdot or digg effect:
Configure the reverse proxy to keep PHP pages in cache; or for a longer period of time; maybe your pages will not be up to date by the second, but it will allow your website to survive the digg-effect!
About that, How can I detect and survive being “Slashdotted”? might be an interesting read.
On the PHP side of things:
First of all: are you using a recent version of PHP? There are regularly improvements in speed, with new versions ;-)
For instance, take a look at Benchmark of PHP Branches 3.0 through 5.3-CVS.
Note that performance is quite a good reason to use PHP 5.3 (I've made some benchmarks (in French), and the results are great)...
Another pretty good reason being, of course, that PHP 5.2 has reached its end of life, and is not maintained anymore!
Are you using any opcode cache?
I'm thinking about APC - Alternative PHP Cache, for instance (pecl, manual), which is the solution I've seen used the most -- and that is used on all servers on which I've worked.
See also: Slides APC Facebook,
Or Benchmark Results Show 400%-700% Increase In Server Capabilities with APC and Squid Cache.
It can really lower the CPU-load of a server a lot, in some cases (I've seen CPU-load on some servers go from 80% to 40%, just by installing APC and activating its opcode-cache functionality!)
Basically, execution of a PHP script goes in two steps:
Compilation of the PHP source-code to opcodes (kind of an equivalent of Java's bytecode)
Execution of those opcodes
APC keeps those in memory, so there is less work to be done each time a PHP script/file is executed: only fetch the opcodes from RAM, and execute them.
You might need to take a look at APC's configuration options, by the way
there are quite a few of those, and some can have a great impact on both speed / CPU-load / ease of use for you
For instance, disabling [apc.stat](https://php.net/manual/en/apc.configuration.php#ini.apc.stat) can be good for system-load; but it means modifications made to PHP files won't be taken into account unless you flush the whole opcode-cache; about that, for more details, see for instance To stat() Or Not To stat()?
Using cache for data
As much as possible, it is better to avoid doing the same thing over and over again.
The main thing I'm thinking about is, of course, SQL queries: many of your pages probably do the same queries, and the results of some of those are probably almost always the same... Which means lots of "useless" queries made to the database, which has to spend time serving the same data over and over again.
Of course, this is true for other stuff, like Web Services calls, fetching information from other websites, heavy calculations, ...
It might be very interesting for you to identify:
Which queries are run lots of times, always returning the same data
Which other (heavy) calculations are done lots of time, always returning the same result
And store these data/results in some kind of cache, so they are easier to get -- faster -- and you don't have to go to your SQL server for "nothing".
Great caching mechanisms are, for instance:
APC: in addition to the opcode-cache I talked about earlier, it allows you to store data in memory,
And/or memcached, which is very useful if you literally have lots of data and/or are using multiple servers, as it is distributed.
of course, you can think about files; and probably many other ideas.
I'm pretty sure your framework comes with some cache-related stuff; you probably already know that, as you said "I will be using the Cache-library more in time to come" in the OP ;-)
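The pattern behind all of these is the same: look in the cache first, and only do the expensive work on a miss. Here is a minimal sketch using a plain array as a stand-in for the real store; the `cached` helper is made up for this example, and with APC you would use `apc_fetch()`/`apc_store()` in its place.

```php
<?php
// Cache-aside with a time-to-live. The array stands in for APC or memcached.
function cached(array &$store, $key, $ttlSeconds, callable $compute)
{
    $now = time();
    if (isset($store[$key]) && $store[$key]['expires'] > $now) {
        return $store[$key]['value']; // hit: skip the expensive work
    }
    $value = $compute();              // miss: run the query or calculation once
    $store[$key] = ['value' => $value, 'expires' => $now + $ttlSeconds];
    return $value;
}

// Usage with a hypothetical query function:
// $latest = cached($store, 'latest_posts', 120, function () use ($db) {
//     return fetchLatestPosts($db);  // placeholder for the real query
// });
```

Even a TTL of a minute or two can remove most of the repeated queries on a busy page, at the cost of data being slightly out of date.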
Profiling
Now, a nice thing to do would be to use the Xdebug extension to profile your application: it often lets you find a couple of weak spots quite easily -- at least, if there is any function that takes lots of time.
Configured properly, it will generate profiling files that can be analysed with some graphic tools, such as:
KCachegrind: my favorite, but works only on Linux/KDE
Wincachegrind for windows; it does a bit less stuff than KCacheGrind, unfortunately -- it doesn't display callgraphs, typically.
Webgrind, which runs on a PHP webserver, so works anywhere -- but probably has fewer features.
For instance, here are a couple of screenshots of KCacheGrind:
[Two KCacheGrind screenshots, the second showing a callgraph; source: pascal-martin.fr]
(BTW, the callgraph presented on the second screenshot is typically something neither WinCacheGrind nor Webgrind can do, if I remember correctly ^^ )
(Thanks @Mikushi for the comment) Another possibility that I haven't used much is the xhprof extension: it also helps with profiling and can generate callgraphs, but is lighter than Xdebug, which means you should be able to install it on a production server.
You should be able to use it alongside XHGui, which will help with visualising the data.
On the SQL side of things:
Now that we've spoken a bit about PHP, note that it is more than possible that your bottleneck isn't the PHP-side of things, but the database one...
At least two or three things, here:
You should determine:
What are the most frequent queries your application is doing
Whether those are optimized (using the right indexes, mainly?), using the EXPLAIN instruction, if you are using MySQL
See also: Optimizing SELECT and Other Statements
You can, for instance, activate the slow query log (log_slow_queries, or slow_query_log in MySQL 5.1 and later) to get a list of the requests that take "too much" time, and start your optimization with those.
Whether you could cache some of these queries (see what I said earlier)
Is your MySQL well configured? I don't know much about that, but there are some configuration options that might have some impact.
Optimizing the MySQL Server might give you some interesting information about that.
Still, the two most important things are:
Don't go to the DB if you don't need to: cache as much as you can!
When you have to go to the DB, use efficient queries: use indexes; and profile!
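The first of these two points boils down to a "look in the cache first, query only on a miss" pattern. Here is a minimal sketch of it, assuming PHP 7.1 or later. ArrayCache is a hypothetical in-memory stand-in for memcached or APC, remember() is a made-up helper name, and the closure only pretends to be a database query.

```php
<?php
// Hypothetical in-memory cache with a time-to-live; stands in for memcached/APC.
final class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return null; // never stored
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt < time()) {
            unset($this->items[$key]); // expired
            return null;
        }
        return $value;
    }

    public function set(string $key, $value, int $ttl): void
    {
        $this->items[$key] = [$value, time() + $ttl];
    }
}

// Return the cached value, or call $loader once (the expensive query) and store it.
function remember(ArrayCache $cache, string $key, int $ttl, callable $loader)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $loader();
        $cache->set($key, $value, $ttl);
    }
    return $value;
}

$cache = new ArrayCache();
$dbCalls = 0;
$loadEvent = function () use (&$dbCalls) {
    $dbCalls++; // stands in for a real SELECT
    return ['e_id' => 42, 'e_name' => 'Launch party'];
};

$first  = remember($cache, 'event_42', 300, $loadEvent);
$second = remember($cache, 'event_42', 300, $loadEvent); // served from the cache
```

After both calls, $dbCalls is 1: the second lookup never reaches the "database". With a real backend, the TTL (300 seconds here) is the knob that trades freshness for load.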
And what now?
If you are still reading, what else could be optimized?
Well, there is still room for improvements... A couple of architecture-oriented ideas might be:
Switch to an n-tier architecture:
Put MySQL on another server (2-tier: one for PHP; the other for MySQL)
Use several PHP servers (and load-balance the users between those)
Use another machine for static files, with a lighter webserver, like:
lighttpd
or nginx -- this one is becoming more and more popular, btw.
Use several servers for MySQL, several servers for PHP, and several reverse-proxies in front of those
Of course: install memcached daemons on whatever server has any amount of free RAM, and use them to cache as much as you can / makes sense.
Use something "more efficient" than Apache?
I hear more and more often about nginx, which is supposed to be great when it comes to PHP and high-volume websites; I've never used it myself, but you might find some interesting articles about it on the net;
for instance, PHP performance III -- Running nginx.
See also: PHP-FPM - FastCGI Process Manager, which is bundled with PHP >= 5.3.3, and does wonders with nginx.
Well, maybe some of those ideas are a bit overkill in your situation ^^
But, still... Why not study them a bit, just in case ? ;-)
And what about Kohana?
Your initial question was about optimizing an application that uses Kohana... Well, I've posted some ideas that are true for any PHP application... Which means they are true for Kohana too ;-)
(Even if not specific to it ^^)
I said: use cache; Kohana seems to support some caching stuff (You talked about it yourself, so nothing new here...)
If there is anything that can be done quickly, try it ;-)
I also said you shouldn't do anything that's not necessary; is there anything enabled by default in Kohana that you don't need?
Browsing the net, it seems there is at least something about XSS filtering; do you need that?
Still, here's a couple of links that might be useful:
Kohana General Discussion: Caching?
Community Support: Web Site Optimization: Maximum Website Performance using Kohana
Conclusion?
And, to conclude, a simple thought:
How much will it cost your company to pay you for 5 days? -- considering it is a reasonable amount of time to do some great optimizations
How much will it cost your company to buy (pay for?) a second server, and its maintenance?
What if you have to scale larger?
How much will it cost to spend 10 days? more? optimizing every possible bit of your application?
And how much for a couple more servers?
I'm not saying you shouldn't optimize: you definitely should!
But go for "quick" optimizations that will get you big rewards first: using some opcode cache might help you get between 10 and 50 percent off your server's CPU-load... And it takes only a couple of minutes to set up ;-) On the other side, spending 3 days for 2 percent...
Oh, and, btw: before doing anything: put some monitoring stuff in place, so you know what improvements have been made, and how!
Without monitoring, you will have no idea of the effect of what you did... Not even if it's a real optimization or not!
For instance, you could use something like RRDtool + cacti.
And showing your boss some nice graphics with a 40% CPU-load drop is always great ;-)
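Even before real monitoring is in place, a crude before/after comparison is better than nothing. The sketch below times a callable several times and reports min, median and max, since a single run is too noisy to trust. The benchmark() helper and the md5 workload are hypothetical, and the code assumes PHP 7 or later; it does not replace server-level monitoring.

```php
<?php
// Hypothetical helper: time a callable several times and summarise the samples.
function benchmark(callable $work, int $runs = 20): array
{
    if ($runs < 1) {
        throw new InvalidArgumentException('runs must be at least 1');
    }

    $samples = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = microtime(true);
        $work();
        $samples[] = (microtime(true) - $start) * 1000.0; // milliseconds
    }
    sort($samples);

    // Median: middle sample, or the mean of the two middle samples.
    $middle = intdiv($runs, 2);
    $median = ($runs % 2 === 1)
        ? $samples[$middle]
        : ($samples[$middle - 1] + $samples[$middle]) / 2;

    return [
        'runs'   => $runs,
        'min'    => $samples[0],
        'median' => $median,
        'max'    => $samples[$runs - 1],
    ];
}

// Illustrative workload; replace with the code path you are optimizing.
$stats = benchmark(function () {
    $out = [];
    for ($i = 0; $i < 10000; $i++) {
        $out[] = md5((string) $i);
    }
    return $out;
}, 15);

printf("min %.3f ms, median %.3f ms, max %.3f ms\n", $stats['min'], $stats['median'], $stats['max']);
```

Run it once before a change and once after, on the same machine under similar load, and compare the medians rather than individual runs.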
Anyway, and to really conclude: have fun!
(Yes, optimizing is fun!)
(Ergh, I didn't think I would write that much... Hope at least some parts of this are useful... And I should remember this answer: might be useful some other times...)
Use XDebug and WinCacheGrind or Webgrind to profile and analyze slow code execution.
(source: jokke.dk)
Profile code with XDebug.
Use a lot of caching. If your pages are relatively static, then a reverse proxy might be the best way to do it.
Kohana is very, very fast out of the box, except for the use of database objects. To quote Zombor: "You can reduce memory usage by ensuring you are using the database result object instead of result arrays." This makes a HUGE performance difference on a site that is being slammed. Result arrays not only use more memory, they also slow down script execution.
Also - you must use caching. I prefer memcache and use it in my models like this:
public function get($e_id)
{
    // Build the cache key once so get() and set() always use the same key.
    $cache_key = 'event_get_'.$e_id.Kohana::config('config.site_domain');
    $event_data = $this->cache->get($cache_key);
    if ($event_data === NULL)
    {
        // Cache miss: read the event from the slave database.
        $this->db_slave
            ->select('e_id,e_name')
            ->from('Events')
            ->where('e_id', $e_id);
        $result = $this->db_slave->get();
        // FALSE (not NULL) marks "no single matching row", so that answer is cached too.
        $event_data = ($result->count() == 1) ? $result->current() : FALSE;
        $this->cache->set($cache_key, $event_data, NULL, 300); // 5 minutes
    }
    return $event_data;
}
This will also dramatically increase performance. The above two techniques improved a site's performance by 80%.
If you gave some more information about where you think the bottleneck is, I'm sure we could give some better ideas.
Also check out yslow (google it) for some other performance tips.
Strictly related to Kohana (you probably already have done this, or not):
In production mode:
Enable internal caching (this will only cache the Kohana::find_file results, but this actually can help a lot).
Disable profiler
Just my 2 cents :)
I totally agree with the XDebug and caching answers. Don't look into the Kohana layer for optimization until you've identified your biggest speed and scale bottlenecks.
XDebug will tell you where you spend most of your time and identify 'hotspots' in your code. Keep this profiling information so you can baseline and measure performance improvements.
Example problem and solution:
If you find that you're building up expensive objects from the database each time, that don't really change often, then you can look at caching them with memcached or another mechanism. All of these performance fixes take time and add complexity to your system, so be sure of your bottlenecks before you start fixing them.
