Why is Magento so slow? [closed] - php

Closed 10 years ago.
Is Magento usually so terribly slow?
This is my first experience with it and the admin panel simply takes ages to load and save changes. It is a default installation with the test data.
The server where it is hosted serves other non-Magento sites super fast. What is it about the PHP code that Magento uses that makes it so slow, and what can be done to fix it?

I've only been tangentially involved in optimizing Magento for performance, but here are a few reasons why the system is so slow:
Parts of Magento use an EAV database system implemented on top of MySQL. This means querying for a single "thing" often means querying multiple rows.
There are a lot of things behind the scenes (application configuration, system config, layout config, etc.) that involve building up giant XML trees in memory and then "querying" those same trees for information. This takes both memory (storing the trees) and CPU (parsing the trees). Some of these (especially the layout tree) are huge. Also, unless caching is on, these trees are built up from files on disk on each request.
Magento uses its configuration system to allow you to override classes. This is a powerful feature, but it means that any time a model, helper, or controller is instantiated, extra PHP instructions need to run to determine whether an original class file or an override class file is needed. This adds up.
Besides the layout system, Magento's template system involves a lot of recursive rendering. This adds up.
In general, the Magento engineers were tasked, first and foremost, with building the most flexible, customizable system possible, and worrying about performance later.
The first thing you can do to ensure better performance is turn caching on (System -> Cache Management). This will relieve some of the CPU/disk blocking that goes on while Magento is building up its various XML trees.
The second thing you'll want to do is ensure your host and operations team have experience performance-tuning Magento. If you're relying on the $7/month plan to see you through, well, good luck with that.
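To make the EAV point above concrete, here is a minimal sketch in Python with SQLite. It is not Magento's actual schema (which is more elaborate); the table and column names are invented for illustration. It only shows that loading one "thing" from an EAV layout touches one row per attribute, where a flat table touches a single row.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database that stores one product in two layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Flat layout: one row holds the whole product.
        CREATE TABLE product_flat (
            id INTEGER PRIMARY KEY, name TEXT, price TEXT, color TEXT
        );
        INSERT INTO product_flat VALUES (1, 'T-shirt', '9.99', 'red');

        -- EAV layout: one row per (entity, attribute) pair.
        CREATE TABLE product_eav (entity_id INTEGER, attribute TEXT, value TEXT);
        INSERT INTO product_eav VALUES
            (1, 'name', 'T-shirt'),
            (1, 'price', '9.99'),
            (1, 'color', 'red');
        """
    )
    return conn


def load_flat(conn, product_id):
    """Return (product dict, number of rows read) from the flat table."""
    row = conn.execute(
        "SELECT name, price, color FROM product_flat WHERE id = ?",
        (product_id,),
    ).fetchone()
    return {"name": row[0], "price": row[1], "color": row[2]}, 1


def load_eav(conn, product_id):
    """Return (product dict, number of rows read) from the EAV table."""
    rows = conn.execute(
        "SELECT attribute, value FROM product_eav WHERE entity_id = ?",
        (product_id,),
    ).fetchall()
    return dict(rows), len(rows)
```

Both loaders return the same product, but the EAV version reads three rows for three attributes, and the row count grows with every attribute you add.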

Further to Alan Storm's recommendations on caching, there are two things I'd specifically recommend you look into:
- Make sure caching is in memcached, rather than on disk.
I look after a couple of Magento installs, and once you get any sort of load on the system, memcached starts to perform much faster. And it's dead easy to change it over (relative to doing other Magento stuff at least!)
A good starting point is here: http://www.magentocommerce.com/boards/viewthread/12998/P30/ - but if you've not used memcached at all before, it's worth looking at some general info about it as well.
- Enable template/view caching.
This is a good article: http://inchoo.net/ecommerce/magento/magento-block-caching/
There are good ones on the Magento site too (google "magento block caching"), but it's down at the moment.
To add my two cents on block caching, I'd advise you to create your own blocks in /app/code/local, extending the core ones and defining the cache parameters, name them xxx_Cache, and then update your layout to use these blocks instead of the core ones. This way, you avoid losing your changes or breaking the system when you upgrade Magento.

If you haven't seen it yet, Magento and Rackspace teamed up to create a white paper on performance tuning Magento. It's excellent.
https://support.rackspace.com/whitepapers/building-secure-scalable-and-highly-available-magento-stores-powered-by-rackspace-solutions/
--- edit ---
Another great resource, newly available (Oct 2011) is:
http://www.sessiondigital.com/assets/Uploads/Mag-Perf-WP-final.pdf
(Thanks due to Alan Storm on this one.)

There is possibly also a very non-obvious reason why your admin interface is very slow. Magento has a module named Mage_AdminNotification. Try disabling that extension, because what it does is query magentocommerce.com for new update messages. If their servers are slow, your admin page waits and is in effect slow because of the network lag and the loading of the external news. If you have locked down outgoing connections from your server with a firewall, this can be even more frustrating, since the admin interface will wait for the timeout when it cannot reach magentocommerce.com.
To disable it: go to System -> Configuration, scroll to the bottom and click Advanced (in the Advanced section). Now disable Mage_AdminNotification and save!

I only have superficial experience with Magento. I installed it on a shared grid server and the page loading was dismal, ~5+ seconds. On a lark, I installed it on my dedicated server, which is optimized for CMS sites, and it felt very, very snappy.
My dedicated hosting had ~10 Joomla! sites and a vBulletin site running.
My guess is it's just not going to be performant on shared hosting. The over-subscription just won't allow enough resources for Magento to run as it ought.

I'm more involved in the managed server optimization in my company but I may have a few tips for you. First, you can look at the code more closely using the code tracing feature of Zend server. It will allow you to see where and when the things get dirty.
I totally share benlumley's view regarding the cache. Most of the sites we host don't even have block caching enabled. This cache has to be explicitly called and not "assumed". So if your code doesn't yet take advantage of this mechanism, it's something you definitely want to try. If you have the EE version, you can turn on full page caching to get the best out of the beast.
A reverse proxy will also help a lot. It'll cache the static resources, significantly lowering the pressure on the PHP interpretation stack of your front servers.
Don't forget to write the sessions and the Magento cache to a RAM disk. This will also definitely get you to another level of performance.
There's still a lot to be said here but I'm running out of time. You should know that a good site, well coded on a 1.4.1 CE version, running on a 2x5650 Xeon + 16 GB RAM server with a reverse proxy in front, can take up to 50,000 unique visitors a day and serve pages smoothly to everybody.

Switching from Apache to LiteSpeed helped us a lot, in addition to editing MySQL's settings, installing Fooman Speedster (a module to compress/combine JS and CSS files), and installing APC. Magento has also posted a white paper on how to get the best performance out of the Enterprise Edition, but it is equally applicable to the other versions: http://www.magentocommerce.com/whitepaper/

There are many reasons why your Magento shopping cart could be running slow, but no excuses, because there are a variety of ways to alleviate the problem and make it pretty darn fast. Enabling gzip by modifying your .htaccess file is a start. You can also install the Fooman Speedster extension. The type of server used will also determine the speed of your store. More tips and a better explanation here: http://www.interactone.com/how-to-speed-up-magento/

Magento is very slow because the database design is not very good. The code is a mess and very hard to update and optimize, so all optimizations are done via caching instead of in code.
On the other hand, it is a webshop with a lot of tools. So if you need a flexible webshop, just buy a very powerful server and you will be OK.

When I first installed it, I had pages that were taking 30 seconds to load. My server was not maxed out on RAM or processor, so I didn't know what to do. Looking at Firebug's Net panel, it was loading about 100 files per page, and each one took a long time to connect. After installing Fooman Speedster and enabling gzip in the .htaccess, load times were down to 3 seconds, like they had been on other shopping carts on my server.
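As a rough illustration of why gzip helps with text assets like CSS and JS, here is a small Python sketch. The stylesheet is made up, and real savings depend on your actual files; it only shows the kind of size comparison you can run yourself before and after enabling compression on the server.

```python
import gzip


def gzip_savings(text):
    """Return (raw size, gzipped size) in bytes for a text asset."""
    raw = text.encode("utf-8")
    return len(raw), len(gzip.compress(raw))


# A made-up, repetitive stylesheet standing in for a real CSS file.
css = "\n".join(
    f".product-item-{i} {{ margin: 0; padding: 4px; color: #333; }}"
    for i in range(200)
)

raw_size, gz_size = gzip_savings(css)
print(f"raw: {raw_size} bytes, gzipped: {gz_size} bytes")
```

Note that compression only reduces bytes on the wire. The "100 files per page" problem is a separate one, which is what combining JS and CSS files addresses.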

It will also come down to functionality versus performance.
Raw performance is gained using nginx, PHP-FPM, memcached, APC and a properly designed server.
Functionality (such as Plesk) and Magento performance can both be managed by taking the entire infrastructure into account when designing a Magento performance cloud.

Related

Why is drupal slow?

I have built a site on Drupal.
My site has 7500 users, and approximately 20 to 50 anonymous and 2 to 10 logged-in users are online at a time (which is not heavy traffic, I think).
The site is on a dedicated server. I have enabled the settings under Performance in the Drupal admin and also installed memcache and eAccelerator.
I looked at the query logs using the Devel module. It is firing a total of 600 to 900 queries on each page.
When I installed the path.inc patch to reduce the queries from drupal_lookup_path(), it reduced the queries to around 400.
I have also made some positive changes in the MySQL (my.cnf) file, but there are still many identical queries run from the user_load() function again and again.
I have 60 to 70 modules enabled and all are useful. I can't remove the modules.
Still, the site is running slow; a page takes approximately 10 to 15 seconds.
Now I don't know why the site is running so slow.
Is it because the drupal has the large php code ?
Is it because it is firing so many queries on each page?
Does the InnoDB engine improve the performance?
Please, any kind of suggestions are welcome
400 queries per request is suicide (even 50+ is too many).
You should implement some kind of HTML cache. My website generally doesn't even make the DB connection. It just serves the HTML cached in a file.
Some additional things to look into:
Install a tool like Yslow/PageSpeed to see how much of those 10-15s are client and server time.
Install XHProf (on a development site, not live) together with Devel to see which functions use the most time. Look into these first. Edit, now with link: http://groups.drupal.org/node/82889
Using Pressflow might help a bit, but since you are already using the path.inc patch, probably not so much.
You mentioned that you installed memcache. Did you also install the memcache module and configure the cache plugin to use memcache?
EDIT: Yes, switching to InnoDB can help. One of the main performance advantages of InnoDB is row-level locking (as opposed to table-level locking of MyISAM), which means that multiple INSERT/UPDATE queries against the same table won't block each other unless really necessary. However, InnoDB does not perform well at all out of the box, you really need to fine-tune your mysql configuration for your specific site. So this is a step that you should only take carefully and after testing and optimizing on a development site. There are various questions already on this site and elsewhere about InnoDB tuning...
Anything else than that is then site specific and depends on the modules you are using. But especially things like complex node_access setups and multiple languages (i18n!) tend to either cause slow queries and/or a lot of them.
Not all modules make use of the caching mechanisms you can switch on in the performance settings area. It would be worth trying to identify which ones are doing the most/slowest queries and attempting to get the developer(s) to improve them.
Alternatively, examine whether you could achieve things with fewer modules. Some modules do overlap somewhat in functionality, so you may be able to reorganise the way the site functions a bit.
Additionally, you need to look at whether your MySQL settings allow enough memory for these queries to be carried out. Most MySQL distributions come with different versions of my.ini labelled 'small', 'medium', 'huge' etc. Copy the 'huge' one to my.ini (back up the old one first) and restart the DB to see if maxing out all the cache sizes makes a difference. You may well have a bottleneck there, but it can be hard to work out which setting is causing it.
Same goes for PHP. Set memory_limit in php.ini to 500MB or something and see if it helps. Of course, you may not be able to do this, depending on your hosting arrangements, but it will eliminate one possible cause (or not) if you can.
Performance of your Drupal website also depends on how well your hosting platform is tuned for Drupal. Drupal requires special optimization of LAMP stack components. You can try Drupal-specific hosting companies http://www.drupalspecific.com to make your website run faster.
I'm facing the Drupal slowness issue myself, but my issue is very different from the others mentioned.
I disabled all the content, and also the Drupal header, for a Drupal page of a specific content type.
Still, this page takes more than 20 seconds to load!
I used YSlow and the Firebug Net panel to investigate.
Looking at them, I noticed:
Each JS and CSS file inclusion takes 2 to 3 seconds, and there are a fair number of inclusions, so in total it takes about 20 seconds.
But I am not able to figure out why the JS and CSS inclusions are taking so much time. (This includes the normal Drupal core JS and CSS files as well.)

How to optimize this website? Need general advice

This is my first question here, and it is about optimizing a specific website.
A few months ago, we launched [site] for one of our clients, which is a kind of community website.
Everything works great, but now this website is getting bigger and it shows some slowness when the pages are loading.
The server specs:
PHP 5.2.1 (I think we need to upgrade to 5.3 to make use of the new garbage collector)
Apache 2.2
Quad Core Xeon Processor @ 2.8 GHz and 4 GB DDR3 RAM.
XCACHE 1.3 (we added this a few months ago)
Mysql 5.1 (we are using innodb as engine)
Codeigniter framework
Here is what we did so far and what we intend to do further :
Besides XCache, we don't really use a caching mechanism, because most of the content is live, and we also didn't want to optimize prematurely because we didn't know what to expect as far as traffic was concerned.
On the other hand, we have installed memcached and we want to implement a cache system based on memcached.
Regarding the database structure, we have reached 3NF with most of our tables, and yes, we have some slow queries (which we plan to optimize), but I think that is because the tables that produce slow queries are quite big: blog comments (~44,408 rows), user log tracking (~725,837 rows), user comments (~698,964 rows), etc. The entire database is 697.4 MB in size for now.
Also, here are some stats for January 2011:
Monthly unique visitors: - 127.124
Monthly unique views: 4.829.252
Monthly unique visits: 242.708
Daily average:
Unique new visitors: 7.533
Unique new views : 179.680
Just let me know if you need more details.
Any advice is highly appreciated.
Thank you.
When it comes to performance issues, there is no golden rule or sticky note that tells you up front that the problem is related to the database. What I would suggest is to do performance profiling; there are many free and paid tools on the Internet that allow you to do so.
Start with the web server layer, and make sure everything is configured correctly and optimized as far as possible.
Then move on to the next layer (which I assume is your database). Normally, whenever someone mentions InnoDB or MySQL, we assume that indexes have been created to optimize search operations. How the indexes are used is also quite important, because you don't want to index the wrong thing and make matters worse. My advice is to get a DBA or equivalent to troubleshoot using a staging environment.
Another thing you could look at is the content, from web page content to database data: make sure you show and keep data only where it is needed, do not store unnecessary information in the database, and use a smart layout on the web page. Cutting a second or two can make a big difference in terms of usability and response time.
It is very hard to go into detail here without in-depth information about your application, its architecture and your environment, but the above are some common directions people take to troubleshoot such incidents.
Good luck!
This site has excellent resources http://www.websiteoptimization.com/
The books that are mentioned are excellent. There are just too many techniques to list here and we do not know what you have tried so far.
Sorry for the delay, guys. I have been very busy trying to find the issue, and I found it.
Well, the problem was mostly Apache. I had an access log of almost 300 GB, which was parsed at midnight to generate Webalizer stats. The website was very, very slow mostly while this was happening. I disabled Webalizer for the domain and cleared the logs, and what do you know, it is very fast again, no matter what hour you access it.
I now have just a few slow queries that I intend to fix today.
I also updated to CI 2.0 Reactor as suggested and started to use the memcached driver.
Who would have known that Apache logs could be so problematic...
Based on the stats, I don't think you are hitting load problems... on a hunch, I would look to the database first. Database partitioning might be a good place to start.
But you should really do some profiling of your application first. How much time is spent in the application versus database. Are there application methods that are using lots of time and just need some tweaking? Are database queries not written efficiently? Do you need more or better database indices?
Everything looks pretty good-- if upgrading codeigniter is an option, the new codeigniter 2.0 (reactor) adds support for memcache (New Cache driver with file system, APC and memcache support). Granted you're already using xcache, these new additions may be worth looking at.
When cache objects weren't enough for our multi-domain platform that saw huge traffic, we went the route of throwing more hardware at it-- ram, servers/database. Then we moved to database clustering to handle single account forecasted heavy load. And now switching from apache to nginx... It's a never ending battle, but what worked for us was being smart about what we cached and increasing server memory then distributing this load across servers...
Cache as many database calls as you can. In my CI application I have a settings table that rarely changes, so I cache all calls made to it as I am constantly querying the settings table.
Cache your views and even your controllers as well. I tend to cache basically as much as I can in my CI applications and then refresh the cache when a file changes.
Only autoload important libraries, models and helpers. I've seen people autoload up to 10 libraries and on-top of that a few helpers and then a model. You only really need to autoload the database and session libraries if you are using them.
Regarding point number 3, are you autoloading many things in your config/autoload.php file by any chance? It might help speed things up only loading things you need in your controllers as you need them with exception of course the session and database libraries.
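The "cache calls to a rarely changing settings table" advice above can be sketched roughly like this. It is a plain in-process Python stand-in, not CodeIgniter's cache driver or memcached; `TTLCache` and `fetch_settings_from_db` are hypothetical names, and the "database" is just a counter so you can see how many queries are actually issued.

```python
import time


class TTLCache:
    """Tiny cache-aside helper: load on miss, reuse until the entry expires."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expiry time, value)

    def get_or_load(self, key, loader):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: no database call
        value = loader()  # cache miss: hit the database once
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry, e.g. right after the settings table is updated."""
        self._store.pop(key, None)


stats = {"queries": 0}


def fetch_settings_from_db():
    """Hypothetical stand-in for a real query against the settings table."""
    stats["queries"] += 1
    return {"site_name": "Example", "items_per_page": 20}


cache = TTLCache(ttl_seconds=300)
for _ in range(100):
    settings = cache.get_or_load("settings", fetch_settings_from_db)
```

A hundred lookups cost a single query here. The part that needs care in a real application is calling `invalidate` (or its equivalent) whenever the underlying data changes.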

Optimizing Drupal via PHP Apache and Mysql optimization

I installed Drupal Commons from Acquia and am using it for my college intranet website. I configured it on Ubuntu Lucid Lynx Desktop edition running the latest XAMPP. I want to increase the performance of the website. My database server and web server are on the same machine.
Can anyone suggest methods to increase the performance on the following points?
What should the ideal hardware configuration be?
What parameters should I change in PHP to get the best performance?
How can I optimize Apache and MySQL to get the best performance out of both?
Are there tweaks in Drupal which can make it faster?
Are there any additional packages, for caching etc., which can improve the speed?
Also, try Varnish if you're using PressFlow, as suggested by berkes. It helps a lot if you have to serve content for anonymous users.
Varnish can cache in memory all the content that Drupal produces, reducing hits to your web server and database.
Here a good start point for configuring Varnish with Pressflow:
https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow
Google some for more details.
And don't forget about non Drupal related optimization, like reducing the number of http requests, serving web page elements from different domains to reduce browser pipelining, etc. Use YSlow and follow Yahoo's excellent rules. Google for "yahoo Best Practices for Speeding Up Your Web Site" (can't include link due to SO limitation for new users).
1. This is not specific to Drupal; it applies to every PHP setup and, more generally, to every web app. I advise you to start with O'Reilly's Building Scalable Websites.
2. See above. For Drupal, note the memory limit; many people just crank it up to ridiculous values, following the logic: Drupal needs more than 38MB, so I'll just give it 250MB to be safe.
3. Again, see above. For Drupal, pay extra attention to the number of queries. If you focus on slow queries only, you may miss that single tiny query hammering your DB 100+ times per request.
4. Lots. My advice is to start by looking at Pressflow, an optimised Drupal. It has all the tweaks you are looking for built in. And more.
5. Yes, many, but start with memcached. And if you rely on search a lot, consider moving search to Solr.
Many more tips for starters can be found at Drupal performance Blog
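To spot the "tiny query run 100+ times per request" case mentioned above, one approach is to collapse logged queries into shapes and count them. The sketch below assumes you already have the queries of one request as a list of strings (for example, copied from a query log); the sample queries and the `normalize` rules are simplified illustrations, not a full SQL parser.

```python
import re
from collections import Counter


def normalize(query):
    """Replace literal values so repeated queries collapse into one shape."""
    query = re.sub(r"'[^']*'", "?", query)  # quoted strings
    query = re.sub(r"\b\d+\b", "?", query)  # bare numbers
    return re.sub(r"\s+", " ", query).strip()


def most_repeated(queries, top=3):
    """Return the most frequent query shapes as (shape, count) pairs."""
    return Counter(normalize(q) for q in queries).most_common(top)


# Made-up sample: one cheap lookup repeated 120 times, plus a one-off query.
queries = [f"SELECT * FROM users WHERE uid = {i}" for i in range(120)]
queries.append("SELECT title FROM node WHERE nid = 7")

for shape, count in most_repeated(queries):
    print(count, shape)
```

None of these queries would show up in a slow query log individually, but the count makes the repeated one obvious.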
The question you ask is very broad, so it is hard to give any specifics in answers. A good place to start is drupal's own handbook on performance tuning.
I would also highly recommend the boost module if your site serves largely anonymous users, as this allows requests to not even go to drupal and be served entirely from a static cache.
Drupal's Devel module has a Performance module that will log memory usage and access times to the Reports section of your site.
Use this to determine which pages on your site are slow.
Load xdebug (a PHP extension) and turn on the profiling feature. Make requests to your performance-intensive pages and it will create (very large) dumps of the entire request. Open up the cache file in a program like KCacheGrind or WinCacheGrind and you will be able to see every function call that Drupal made when building the page. From here you can see which parts are slowest and optimize them.
This should get you a good 30-80% improvement in performance if you have a slow site. In my experience, there's usually a few blocks or views that account for a huge part of any performance issues.
Pro Drupal 7 Development has a whole section regarding fine-tuning called "optimizing drupal".
I think you will find it quite interesting. It also discusses hardware architectures which is of your interest.
Regarding the 4th question, you can for a start checkout the boost module and disable modules you are not using.
Additionally, to improve page performance you can enable page caching from Configuration -> Performance. On the same page you can enable the options to aggregate and compress CSS (and JS) files into one; this way you reduce the number of HTTP requests per page and the overall size of the downloaded page.
You should also check whether cron is set up. Not running cron can fill up the DB with logs, stale cache entries and other "garbage".
A last suggestion is to convert your DB from MyISAM to InnoDB, but I think this requires some investigation because InnoDB is not always faster. With InnoDB there is less time lost to table locking, while MyISAM is faster at table reads.

Is it possible to get a <200ms response with Drupal (without caching)?

The question, simply put, is the one in the title. Is it possible?
So far, my experience with scripting languages is that, to increase performance, you need to cache everything and later just serve the generated HTML files.
That's ok for some use cases, but when you really need to generate a new page in realtime, it's just impossible.
Drupal can take up to 3 seconds (or more!) to render some web pages (PHP execution time, not DB). That's crazy. Completely crazy.
If many projects (like Facebook) are using PHP, obviously the problem is mine. But googling for this problem shows that it's common. Too common.
(Of course I installed APC for PHP. It certainly helps, but PHP is still ultra-slow).
Must I assume this is the reality for Drupal / PHP?
Thanks.
Short answer is no. But why would you not want to cache?
What do you mean by 'generate a new page in realtime'? Authenticated users (anyone logged in) can see new content right away. Anonymous users may have to wait a little bit (if you are using Boost, for example), BUT, you can always control it, or flush it when new content is added. You should cache as much as you can.
You can install Boost (static HTML files), Memcache, and enable Drupal cache. It's encouraged, especially the last one. You can also run nginx on the server.
You can also try using Pressflow, a drop-in replacement for Drupal that will give you better performance.
http://pressflow.org/
It's been discussed many times: you can make Drupal extremely fast if you want to. Check out some of the 2bits articles:
http://2bits.com/contents/articles
Utilizing the available caching methods will help you keep your hosting costs low, instead of throwing more hardware at an unoptimized site.
As you say, Facebook uses PHP, and they clearly have reason to need good performance. Their solution was to write their own compiler for PHP called HipHop, which they released as open source. If you're worried about PHP's performance, you should give it a try as it will definitely improve things.
The downside is that it doesn't (yet) cover 100% of the PHP function set, so some PHP programs may not compile. I don't know where Drupal fits into this, but it would be worth trying it out - there's nothing to be lost by doing a test compilation; if it's not going to work, you won't have lost anything.
In a similar vein, there is a project in the Drupal community to convert parts of the Drupal core into a PHP extension, meaning that some key Drupal functions are then built into the PHP runtime as compiled code. See the project page here. But note that this is still at a fairly early stage of development: it's still listed as experimental, and only covers a small number of functions. It might be worth keeping an eye on the project, though.
According to http://groups.drupal.org/node/34076, yes you can get a < 200ms response time with Drupal without caching.
The tip I've received from some friends regarding Drupal load performance is to install fewer than 40 modules.
With more than 40, especially if those contrib modules use too many hooks and too much memory, performance will decrease.
Other tips:
remove imagecache ui and views ui on production site
if possible, put the .htaccess directives in vhost.conf so that they are only read once, on Apache start
use throttle module
use gzip for all html, css and js files
use cdn module and amazon server solution
use ajax for some parts or blocks of your site
last and if there is enough budget, migrate to oracle

Best practices for optimizing LAMP sites for speed? [closed]

Closed 10 years ago.
I want to know when building a typical site on the LAMP stack how do you optimize it for the best possible load times. I am picturing a typical DB-driven site.
This is a high-level look and could probably pull in other questions, so let me break it down into each layer of the stack.
L - At the system level (setup and filesystem), what can you do to improve speed? One thing I can think of is image sizes; can compression here help optimize anything?
A - There have to be a ton of settings related to site speed here in the web server. Not my Forte. Probably depends a lot on how many sites are running concurrently.
M - MySQL in a database driven site, DB performance is key. Is there a better normalization approach i.e, using link tables? Web developers often just make simple monolithic tables resembling 1NF and this can kill performance.
P - aside from performance-boosting settings like caching, what can the programmer do to affect performance at a high level? I would really like to know if MVC design approaches hit performance more than quick-and-dirty. Other simple tips like are sessions faster than cookies would be interesting to know.
Obviously you have to get down and dirty into the details and find what code is slowing you down. Also, I realize that different sites have very different performance characteristics, but let's assume a typical site that has more reads than writes.
I am just wondering if we can compile a bunch of best practices and fully expect people to link other questions so we can effectively workup a checklist.
My goal is to see if even in addition to the usual issues in performance we can see some oddball things you might not think of crop up to go along with a best-practices summary.
So my question is, if you were starting from scratch, how would you make sure your LAMP site was fast?
Here's a few personal must-dos that I always set up in my LAMP applications.
Install mod_deflate for apache, and do not use PHP's gzip handlers. mod_deflate will allow you to compress static content, like javascript/css/static html, as well as the usual dynamic PHP output, and it's one less thing you have to worry about in your code.
Be careful with .htaccess files! Enabling .htaccess files for directories in your app means that Apache has to scan the filesystem constantly, looking for .htaccess directives. It is far better to put directives inside the main configuration or a vhost configuration, where they are loaded once. Any time you can get rid of a directory-level access file by moving it into a main configuration file, you save disk access time.
Prepare your application's database layer to utilize a connection manager of some sort (I use a Singleton for most applications). It's not very hard to do, and reducing the number of database connections your application opens saves resources.
If you think your application will see significant load, memcached can perform miracles. Keep this in mind while you write your code... perhaps one day instead of creating objects on the fly, you will be getting them from memcached. A little foresight will make implementation painless.
Once your app is up and running, set MySQL's slow query time to a small number and monitor the slow query log diligently. This will show you where your problem queries are coming from, and allow you to optimize your queries and indexes before they become a problem.
For serious performance tweakers, you will want to compile PHP from source. Installing from a package installs a lot of libraries that you may never use. Since PHP environments are loaded into every instance of an Apache thread, even a 5MB memory overhead from extra libraries quickly becomes 250MB of lost memory when there's 50 Apache threads in existence. I keep a list of my standard ./configure line I use when building PHP here, and I find it suits most of my applications. The downside is that if you end up needing a library, you have to recompile PHP to get it. Analyze your code and test it in a devel environment to make sure you have everything you need.
Minify your Javascript.
Be prepared to move static content, such as images and video, to a non-dynamic web server. Write your code so that any URLs for images and video are easily configured to point to another server in the future. A web server optimized for static content can easily serve tens or even hundreds of times faster than a dynamic content server.
That's what I can think of off the top of my head. Googling around for PHP best practices will find a lot of tips on how to write faster/better code as well (Such as: echo is faster than print).
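The connection-manager item in the list above could look roughly like the following. This is only a sketch of the Singleton idea in Python, with an in-memory SQLite database standing in for MySQL; `ConnectionManager` is a hypothetical name, and a PHP version would follow the same shape with a static property.

```python
import sqlite3


class ConnectionManager:
    """Hand out one shared database connection instead of opening many."""

    _connection = None
    opened = 0  # how many real connections were opened (for illustration)

    @classmethod
    def get(cls):
        if cls._connection is None:
            # The first caller pays for the connection; later callers reuse it.
            cls._connection = sqlite3.connect(":memory:")
            cls.opened += 1
        return cls._connection


first = ConnectionManager.get()
second = ConnectionManager.get()
print(first is second, ConnectionManager.opened)
```

Every part of the application that asks for a connection gets the same object back, so a request that touches the database in ten places still opens it once.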
First, realize that performance is an iterative process. You don't build a web application in a single pass, launch it, and never work on it again. On the contrary, you start small, and address performance issues as your site grows.
Now, onto specifics:
Profile. Identify your bottlenecks. This is the most important step. You need to focus your effort where you'll get the best results. You should have some sort of monitoring solution in place (like Cacti or Munin), giving you visibility into what's happening on your server(s).
Cache, cache, cache. You'll probably find that database access is your biggest bottleneck on the back end -- but you should verify this on your own. Fortunately, you'll probably find that a lot of your traffic is for a small set of resources. You can cache those resources in something like memcached, saving yourself the database hit, and resulting in better backend performance.
As others have mentioned above, take a look at the YDN performance rules. Consider picking up the accompanying book. This will help you with front-end performance.
Install PHP APC, and make sure it's configured with enough memory to hold all your compiled PHP bytecode. We recently discovered that our APC installation didn't have nearly enough RAM; giving it enough to work in cut our CPU time in half, and disk activity by 10%.
Make sure your database tables are properly indexed. This goes hand in hand with monitoring the slow query log.
The above will get you very far. That is to say, even a fairly database-heavy site should be able to survive a front-page Digg on a single modestly-spec'd server if you've done the above.
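The caching step above usually takes the cache-aside shape: check the cache, fall back to the database on a miss, then store the result. The sketch below is only an illustration of that pattern. `ArrayCache`, `cached()`, and the product lookup are hypothetical names, and the in-memory array stands in for memcached so the example runs without any extension installed.

```php
<?php
// Minimal in-memory stand-in for a cache client so the sketch runs anywhere.
// In production this role would be played by a memcached client.
class ArrayCache
{
    private $items = [];

    public function get(string $key)
    {
        // Returns null on a miss (an assumption of this sketch).
        return array_key_exists($key, $this->items) ? $this->items[$key] : null;
    }

    public function set(string $key, $value): void
    {
        $this->items[$key] = $value;
    }
}

/**
 * Cache-aside lookup: try the cache first, fall back to the loader
 * (e.g. a database query) on a miss, then store the result.
 */
function cached(ArrayCache $cache, string $key, callable $loader)
{
    $value = $cache->get($key);
    if ($value === null) {
        $value = $loader();
        $cache->set($key, $value);
    }
    return $value;
}

// Hypothetical expensive lookup; the counter shows how often it really runs.
$dbHits = 0;
$loadProduct = function () use (&$dbHits) {
    $dbHits++;
    return ['id' => 42, 'name' => 'Example product'];
};

$cache = new ArrayCache();
cached($cache, 'product:42', $loadProduct); // miss: runs the loader
cached($cache, 'product:42', $loadProduct); // hit: served from the cache
echo "Loader ran {$dbHits} time(s)\n";
```

A real setup would also need expiry times and invalidation when the underlying row changes, which this sketch leaves out. It also never caches a loader result of null, since null is what it uses to signal a miss.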
You'll eventually hit a point where the default Apache config won't always be able to keep up with incoming requests. When you hit this wall, there are two things to do:
As above, profile. Monitor your Apache activity -- you should have an idea of how many connections are active at any given time, in addition to the max number of active connections when you get sudden bursts of traffic.
Configure Apache with this in mind. This is the best guide to Apache config I've seen: Practical mod_perl chapter 11.
Take as much load off of Apache as you can. Apache's too heavy-duty to serve static content efficiently. You should be using a lighter-weight reverse proxy (like squid) or web server (lighttpd or nginx) to serve static content, and to take over the job of spoon-feeding bytes to slow clients. This leaves Apache to do what it does best: execute your code. Again, the mod_perl book does a good job of explaining this.
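One common way to turn those measurements into a setting is to cap the number of Apache workers at what fits in the memory you can spare, so a traffic burst queues requests instead of pushing the box into swap. The sketch below is only that arithmetic; `max_workers()` and the figures are made up, and the result would feed a limit such as prefork's MaxClients (called MaxRequestWorkers in Apache 2.4).

```php
<?php
/**
 * Rough upper bound on concurrent Apache workers for a given memory budget.
 * Both inputs are measurements you would take on your own server.
 */
function max_workers(int $memoryBudgetMb, int $perWorkerMb): int
{
    if ($perWorkerMb <= 0) {
        throw new InvalidArgumentException('per-worker memory must be positive');
    }
    // Integer division: a fraction of a worker does not fit.
    return intdiv($memoryBudgetMb, $perWorkerMb);
}

// Made-up figures: 2048 MB set aside for Apache, 40 MB per PHP-enabled worker.
echo max_workers(2048, 40), "\n";
```

Treat the number as a starting point and compare it against the connection peaks you observe while profiling.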
Once you've gotten this far, it's largely an issue of caching more, and keeping an eye on your database. Eventually, you'll outgrow a single server. First, you'll probably add more front-end boxes, all backed by a single database server. Then you're going to have to start spreading your database load around, probably by sharding. For an excellent overview of this growth process, see this LiveJournal presentation.
For a more in-depth look at much of the above, check out Building Scalable Web Sites, by Cal Henderson, of Flickr fame. Google has portions of the book available for preview.
I've used MySQLTuner for performance analysis on my MySQL servers, and it's given good insight into further issues to google, as well as making its own recommendations.
A resource you might find helpful is the YDN set of performance rules.
Don't forget that your users may be thousands of miles away from your server, downloading dozens of files to render a single page. That latency, plus the overhead of rendering the page in their browsers, can be larger than the time you spend collecting the information and generating the page.
See the pages at Yahoo Developer Network about Best Practices for Speeding Up Your Web Site, and the YSlow tool for seeing what part of the downloading of the site is taking time.
Don't forget to turn off atime for your filesystem (mount it with the noatime option)!
I'd recommend using Jet Profiler for MySQL to find any bad queries. I've successfully used it on a couple of my sites. Really helpful, and much easier to digest than the slow query log.
I'd recommend starting with http://highscalability.com/
As for your suggestions:
Compression for images, definitely no. File system tuning, yes, that could have some effect, but a minimal one. Actually, the best option is to use an in-memory reverse proxy or, even better, a CDN.
For Apache, basically only load the modules you need. Do not load anything else. Since with PHP you can only use the forking (prefork) MPM, it's important to keep it slim. As for optimal settings, you have to fine-tune them to the specific application, hardware, etc. If you have enough CPU, it's advisable to use mod_deflate. The faster the server can send data to the client, the faster it can start processing the next request.
