I'm involved in a project that will end up creating around 10 million new pages on an existing site. The site, and the new project, are built with CodeIgniter and connecting to MySQL.
I've never dealt with a site of this size before, and I'm concerned about how we should handle caching. Has anyone dealt with caching on a PHP site of this size that could give me some pointers? I'm used to the CodeIgniter caching system and similar, but the number of cache files that would create worries me.
Any suggestions would be appreciated.
I haven't done anything on that scale, but I don't see a problem with file-based caching as long as the caching mechanism isn't completely dumb, and you're using a modern filesystem. Distributing cache files throughout a directory tree is smart enough.
If you're worried, that's good. Of course, I would suggest writing a wrapper around CI's built-in mechanism, so that you can easily swap it out for something else (like Zend_Cache, possibly with a beefy memcached server, or some smarter file-based system of your own design).
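For illustration, a minimal sketch of such a wrapper (the interface, the class names, and the use of CodeIgniter's cache driver, which ships with CI 2.x, are assumptions on my part, not something from the original post):

```php
<?php
// A tiny cache abstraction the application codes against.
interface CacheAdapter
{
    public function get($key);
    public function set($key, $value, $ttl = 3600);
}

// Backed by CodeIgniter's file cache today.
class FileCacheAdapter implements CacheAdapter
{
    private $ci;

    public function __construct()
    {
        $this->ci =& get_instance();
        $this->ci->load->driver('cache', array('adapter' => 'file'));
    }

    public function get($key)
    {
        return $this->ci->cache->get($key);
    }

    public function set($key, $value, $ttl = 3600)
    {
        return $this->ci->cache->save($key, $value, $ttl);
    }
}
// Later, a MemcachedCacheAdapter (or a Zend_Cache-backed one) can implement
// the same interface without touching any calling code.
```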
There are several layers of caching available to PHP and CodeIgniter, but you shouldn't have to worry about the number of cached files on a standard Linux server (various file systems can handle hundreds of millions of files per mount point). But to pick your caching method, you need to measure carefully.
Options:
Opcode caching (Zend, eAccelerator, and more)
CodeIgniter view caching (configured per view)
CodeIgniter read query caching
General web caching (more info)
Optimize your database (more info)
(and so on)
Additionally, you can improve the file caches by using memory file systems and in-memory tables.
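As a concrete illustration of the two CodeIgniter-specific options listed above, here is a hedged sketch (the controller, table and view names are made up):

```php
<?php
class Articles extends CI_Controller
{
    public function view($id)
    {
        // View caching: store the fully rendered output of this page for 60 minutes.
        $this->output->cache(60);

        // Read query caching: cache the result of the SELECT below to disk.
        $this->db->cache_on();
        $article = $this->db->get_where('articles', array('id' => $id))->row();
        $this->db->cache_off();

        $this->load->view('article_view', array('article' => $article));
    }
}
```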
The real question is, how do you pick caching strategies? Capacity planning. You model your system (users, accounts, pages, files), simulate, measure, and add caches based on best theories. Measure again. Produce new theories and measurements until you have approaches that fit your desired scale.
In my experience, view caching and web caching are a big gain for widely read sites (WP Super Cache, for example). Opcode caching (and other forms of minimisation) are useful for heavily dynamic sites, as is database performance tuning.
FYI: if the system runs on a Windows server, Windows can (or at least could) only hold roughly 65,000 files in a single folder, including cache folders. I'm not sure whether this upper limit has been lifted in newer versions.
All big guys use APC.
The number of webpages is not relevant.
The relevant number is the number of hits (page views).
And if you design for speed, ditch the Windows machines.
I'm using Cache_Lite for HTML and array caching in my project. I found that Cache_Lite can lead to high system IO, perhaps because its performance is not good.
So I'm asking: is there a stable PHP HTML/page cache I can use instead?
I already have APC installed for opcode caching and Memcached installed for common data/array caching.
I've had the exact same problem with Cache_Lite, as the library doesn't properly implement file locks.
I solved it with a new library that is a drop-in replacement for Cache_Lite:
https://github.com/mpapec/simple-cache/blob/master/example_clite1.php
https://github.com/mpapec/simple-cache/blob/master/example_clite2.php
https://github.com/mpapec/simple-cache/blob/master/example_clite3.php
Just to mention that the library lacks some features that I didn't find useful anyway, like cache cleaning and in-memory caching (the _memoryCaching property, which is false by default and marked as "beta quality" in the original library).
The algorithm used for file locking follows this diagram:
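Since the diagram itself isn't reproduced here, the following is a rough sketch of that style of locking in plain PHP (illustrative only, not the library's actual code): writers take an exclusive lock, readers a shared lock, so a reader never sees a half-written file.

```php
<?php
// Write a cache entry under an exclusive lock.
function cache_write($file, $data)
{
    $fh = fopen($file, 'c');          // create if missing, don't truncate until locked
    if ($fh === false || !flock($fh, LOCK_EX)) {
        return false;
    }
    ftruncate($fh, 0);
    fwrite($fh, serialize($data));
    fflush($fh);
    flock($fh, LOCK_UN);
    fclose($fh);
    return true;
}

// Read a cache entry under a shared lock; returns false on miss.
function cache_read($file)
{
    $fh = @fopen($file, 'r');
    if ($fh === false || !flock($fh, LOCK_SH)) {
        return false;
    }
    $raw = stream_get_contents($fh);
    flock($fh, LOCK_UN);
    fclose($fh);
    return $raw === '' ? false : unserialize($raw);
}
```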
Without more information it is hard to know whether you are currently experiencing an IO problem or are likely to experience one in the future. (If your site is not getting much traffic, or you are using an SSD, you are unlikely to have a problem.)
Cache_Lite appears to be a file-based caching system. This may lead to IO problems if your site experiences heavy load, has lots of concurrent users, is hosted on a shared server, or has other programs heavily using the filesystem.
An alternative to Cache_Lite is memcache, a key/value store that keeps data in memory. This may not be suitable if you are storing large amounts of data or your server does not have any spare RAM, since it keeps all of its information in memory. The benefit of memory is that it is much faster than accessing files on disk. If you are only accessing a small amount of data, or the same data repeatedly, this is not likely to matter much anyway because of disk/OS caching.
I would suggest checking to see if your system is currently experiencing any issues with IO before worrying about IO performance (unless you plan on getting slashdotted or something)
You could install a tool like Munin (http://munin-monitoring.org/) and monitor your system to see if IO is a problem or is becoming one. Once installed, check the CPU graph and look at the iowait data.
EDIT: Just saw the comment above. Depending on your needs, reverse proxies are another great tool; check out https://www.varnish-cache.org/ . At work we use a combination of the two (memcache and Varnish): we have one machine serving over 900,000 page views per month, and the site includes static and dynamic content.
If you're talking about https://pear.php.net/package/Cache_Lite then I could tell you a story. We used it once, but it proved to be unreliable for websites with lots of requests.
We then switched to Zend_Cache (ZF1) in combination with memcached. It can be used as a standalone component.
However, you have to tune it a bit in order to use tags. There are a few implementations out there to get the job done: https://github.com/bigwhoop/taggable-zend-memcached-backend
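As a rough sketch of how Zend_Cache can be used standalone with the memcached backend (the cache key and render_homepage() are made up; note that the stock backend is exactly the one that lacks tag support, which is why a taggable backend like the one linked above is needed):

```php
<?php
require_once 'Zend/Cache.php';

$frontendOptions = array(
    'lifetime'                => 7200,
    'automatic_serialization' => true,
);
$backendOptions = array(
    'servers' => array(
        array('host' => '127.0.0.1', 'port' => 11211),
    ),
);

$cache = Zend_Cache::factory('Core', 'Memcached', $frontendOptions, $backendOptions);

if (($html = $cache->load('homepage_html')) === false) {
    $html = render_homepage();              // expensive page build
    $cache->save($html, 'homepage_html');
}
echo $html;
```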
I installed Drupal Commons from Acquia and am using it for my college intranet website. I configured it on Ubuntu Lucid Lynx desktop edition running the latest XAMPP. I want to increase the performance of the website. My database server and web server are on the same machine.
Can anyone suggest methods to increase performance on the following points:
What should be the ideal hardware configuration?
What parameters should I change in PHP to get the best performance?
How can I optimize Apache and MySQL to get the best performance out of both?
Are there tweaks in Drupal which can make it faster?
Are there any additional packages, for caching etc., which can improve the speed?
Also, try Varnish if you're using Pressflow, as suggested by berkes. It helps a lot if you have to serve content to anonymous users.
Varnish can cache in memory all the content that Drupal produces, reducing hits to your web server and database.
Here's a good starting point for configuring Varnish with Pressflow:
https://wiki.fourkitchens.com/display/PF/Configure+Varnish+for+Pressflow
Google some for more details.
And don't forget about non-Drupal-related optimization, like reducing the number of HTTP requests, serving web page elements from different domains so browsers can download more of them in parallel, etc. Use YSlow and follow Yahoo's excellent rules. Google for "Yahoo Best Practices for Speeding Up Your Web Site" (can't include the link due to the SO limitation for new users).
This is not specific to Drupal, but applies to every PHP setup, and more generally to every web app. I advise you to start with O'Reilly's Building Scalable Websites.
See above. For Drupal, watch the memory limit; many people just crank it up to ridiculous values, following the logic: Drupal needs more than 38MB, so I'll just give it 250MB to be safe.
Again, see above. For Drupal, pay extra attention to the number of queries. If you focus on slow queries only, you may miss that single tiny query hammering your DB 100+ times per request.
Lots. My advice is to start looking at Pressflow, an optimised Drupal. It has all the tweaks you are looking for built in. And more.
Yes, many, but start with memcached. And if you rely on search a lot, consider moving search to Solr.
Many more tips for starters can be found at Drupal performance Blog
The question you ask is very broad, so it is hard to give any specifics in answers. A good place to start is Drupal's own handbook on performance tuning.
I would also highly recommend the Boost module if your site serves largely anonymous users, as it allows requests to be served entirely from a static cache without even reaching Drupal.
Drupal's Devel module has a performance logging feature that will log memory usage and access times to the Reports section of your site.
Use this to determine which pages on your site are slow.
Load Xdebug (a PHP extension) and turn on the profiling feature. Make requests to your performance-intensive pages and it will create (very large) dumps of the entire request. Open the cachegrind file in a program like KCacheGrind or WinCacheGrind and you will be able to see every function call Drupal made when building the page. From there you can see which parts are slowest and optimize them.
This should get you a good 30-80% improvement in performance if you have a slow site. In my experience, there's usually a few blocks or views that account for a huge part of any performance issues.
Pro Drupal 7 Development has a whole section on fine-tuning called "Optimizing Drupal".
I think you will find it quite interesting. It also discusses hardware architectures which is of your interest.
Regarding the 4th question, you can start by checking out the Boost module and disabling modules you are not using.
Additionally, to improve page performance you can enable page caching under Configuration -> Performance. On the same page you can enable the "Aggregate and compress CSS/JS files" option; this reduces the number of HTTP requests per page and the overall size of the downloaded page.
You should also check that cron is set up. Not running cron can fill up the database with logs, stale cache entries and other garbage.
A last suggestion is to convert your tables from MyISAM to InnoDB, but I think this requires some investigation because it is not always the case that InnoDB is faster. With InnoDB less time is lost to table locking, while MyISAM is faster at table reads.
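If you do decide to try it, something along these lines converts the MyISAM tables in one database (database name and credentials are placeholders; take a backup and benchmark before and after):

```php
<?php
$pdo = new PDO('mysql:host=localhost;dbname=drupal', 'user', 'pass');

// Find every MyISAM table in the schema...
$stmt = $pdo->prepare(
    "SELECT TABLE_NAME FROM information_schema.TABLES
     WHERE TABLE_SCHEMA = ? AND ENGINE = 'MyISAM'"
);
$stmt->execute(array('drupal'));

// ...and switch each one to InnoDB.
foreach ($stmt->fetchAll(PDO::FETCH_COLUMN) as $table) {
    $pdo->exec("ALTER TABLE `$table` ENGINE=InnoDB");
}
```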
I am going to develop a social + professional networking website using PHP (Zend or Yii framework). We are targeting over 5000 requests per minute. I have experience developing advanced websites using MVC frameworks.
But this is the first time I am going to develop something with scalability in mind. So I would really appreciate it if someone could tell me about the technologies I should be looking at.
I have read about memcache and APC. Which one should I look at? Also, should I use a single MySQL server or a master/slave combination (and if the latter, why and how)?
Thanks!
You'll probably want to architect your site to use, at minimum, a master/slave replication system. You don't necessarily need to set up replicating MySQL boxes to begin with, but you want to design your application so that database reads use a different connection than writes (even if in the beginning both connections connect to the same DB server).
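A minimal sketch of that read/write split with PDO (hostnames, database name and credentials are placeholders; both DSNs can point at the same server today and diverge later):

```php
<?php
class Db
{
    private static $read;
    private static $write;

    public static function read()
    {
        if (self::$read === null) {
            self::$read = new PDO('mysql:host=db-replica;dbname=app', 'user', 'pass');
        }
        return self::$read;
    }

    public static function write()
    {
        if (self::$write === null) {
            self::$write = new PDO('mysql:host=db-master;dbname=app', 'user', 'pass');
        }
        return self::$write;
    }
}

// SELECTs use the read connection, INSERT/UPDATE/DELETE use the write connection.
$users = Db::read()->query('SELECT id, name FROM users LIMIT 10')->fetchAll();
Db::write()->prepare('UPDATE users SET last_seen = NOW() WHERE id = ?')->execute(array(42));
```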
You'll also want to think very carefully about what your caching strategy is going to be. I'd be looking at memcache, though with Zend_Cache you could use a file-based cache early on, and swap in memcache if/when you need it. In addition to record caching, you also want to think about (partial) page-level caching, and what kind of strategies you want to plan/implement there.
You'll also want to plan carefully how you'll handle the storage and retrieval of user-generated media. You'll want to be able to easily move that stuff off the main server onto a dedicated box to serve static content, or some kind of CDN (content distribution network).
Also, think about how you're going to handle session management, and make sure you don't do anything that will prevent you from using a non-file-based session storage ((dedicated) database, or memcache) in the future.
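For example, one way to move sessions into memcache later is just two ini settings, assuming the pecl memcached extension is installed (these would normally live in php.ini rather than application code):

```php
<?php
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '127.0.0.1:11211');

session_start();
$_SESSION['user_id'] = 42;   // now stored in memcached, shared by every web server
```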
If you think carefully, and abstract data storage/retrieval, you'll be heading in a good direction.
Memcached is a distributed caching system, whereas APC is non-distributed and mainly an opcode cache.
If (and only if) your website has to live on different web servers (load balancing), you have to use memcache for distributed caching. If not, just stick to APC and its cache.
For the MySQL database, I would advise grid hosting that can autoscale according to requirements.
Depending on the requirements of your site, it's likely the database will be your bottleneck.
MVC frameworks tend to sacrifice performance for ease of coding, especially in the case of ORM. Don't rely on the ORM; instead, benchmark different ways of querying the database and see which suits. You want to minimise the number of database queries, and fetch a chunk of data at once instead of doing multiple small queries.
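A small example of that last point, assuming a PDO connection and a users table (both placeholders): fetch a whole chunk with one IN() query rather than one query per id.

```php
<?php
$ids = array(1, 5, 9, 12);

// Slow: one round trip per id (what a naive ORM loop often does).
// foreach ($ids as $id) { /* SELECT ... WHERE id = ? */ }

// Better: a single query for the whole chunk.
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$stmt = $pdo->prepare("SELECT id, name, email FROM users WHERE id IN ($placeholders)");
$stmt->execute($ids);
$users = $stmt->fetchAll(PDO::FETCH_ASSOC);
```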
If you find that your PHP code is a bottleneck (profile it before optimizing), you might find Facebook's HipHop useful.
I am creating a new PHP framework built on top of Zend Framework.
It will be a general purpose MVC framework for web development.
I am worried about 2 aspects:
Logging:
Should I use logging? Are there any substantial performance problems with logging?
Caching database queries:
I am caching some queries from database.
I am concerned about caching user-related information. Suppose there is some information related to users, like their personal info, etc.
If I cache such data, a cache file will be generated in my data folder for every user. Now suppose there are 10,000 - 20,000 users online in a 2-hour span of time. That means there will be 20,000 files in my folder.
My question is: will this affect the performance of my server? Is there an upper limit on how many files a folder can have on the server?
Do not use a file-based cache. Filesystem operations are exceptionally slow: http://imgur.com/X1Hi1.gif . Use memcached; you don't need a lot of memory, contrary to what the above post says, since the amount of memory you need is proportional to how much stuff you want to store, plus memcached can evict data based on access frequency.
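A minimal memcached version of the user-data case above (assumes the pecl memcached extension and a local memcached daemon; the key, TTL and load_profile_from_db() are made up):

```php
<?php
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$key = 'user_profile_' . $userId;
$profile = $mc->get($key);
if ($profile === false) {
    $profile = load_profile_from_db($userId);   // fall back to the database on a miss
    $mc->set($key, $profile, 600);              // keep for 10 minutes
}
```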
1) You definitely want logging; I'd recommend Xdebug, available at http://www.xdebug.org/. You can read more about the performance overhead on their site. (Plus it integrates nicely with Eclipse's PHP version.)
2) I'm not really sure I'd want to cache much user information, but memcache is probably one of the better choices for caching in PHP (http://se2.php.net/memcache). And yes, there's no limit on the number of files, and you'll probably not be going over the 32-bit filesize limit either =)
Caching is a real problem; it's almost impossible to get it right from a user/programmer perspective. I wouldn't cache things as simple as user data. This is already cached in the database. Focus more on complex queries and complete webpages (or parts of them).
Unless you have a site like Stack Overflow, where I see really few ways to cache anything, you have to search hard: check your logfiles for what users actually do on your site and you will soon see some hotspots.
I don't recommend memcache unless you have a lot of memory (> 8GB) on your machine. Memcache works best if you throw in dedicated memcache servers with 16GB doing nothing else than caching things.
For smaller sites, hardware and requirements, you should consider APC, as it is a very low-overhead cache for data and it speeds up the execution of PHP at the same time (you don't want to run a production server without a bytecode cache).
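A small APC data-cache sketch in that spirit (the key, TTL and build_navigation() are made up; apc_fetch()/apc_store() are APC's user-cache functions):

```php
<?php
$nav = apc_fetch('site_navigation', $hit);
if (!$hit) {
    $nav = build_navigation();                 // expensive to compute
    apc_store('site_navigation', $nav, 300);   // keep for 5 minutes
}
```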
I want to know when building a typical site on the LAMP stack how do you optimize it for the best possible load times. I am picturing a typical DB-driven site.
This is a high-level question, so let me break it down into each layer of the stack.
L - At the system level (setup and filesystem), what can you do to improve speed? One thing I can think of is image sizes; can compression here help optimize anything?
A - There have to be a ton of settings related to site speed here in the web server. Not my forte. It probably depends a lot on how many sites are running concurrently.
M - For MySQL in a database-driven site, DB performance is key. Is there a better normalization approach, i.e. using link tables? Web developers often just make simple monolithic tables resembling 1NF, and this can kill performance.
P - Aside from performance-boosting settings like caching, what can the programmer do to affect performance at a high level? I would really like to know whether MVC design approaches hurt performance more than quick-and-dirty code. Other simple tips, like whether sessions are faster than cookies, would be interesting to know.
Obviously you have to get down and dirty into the details and find what code is slowing you down. Also, I realize that many sites have different performance characteristics, but let's assume a typical site that has more reads than writes.
I am just wondering if we can compile a bunch of best practices, and I fully expect people to link other questions so we can effectively work up a checklist.
My goal is to see if, in addition to the usual performance issues, there are some oddball things you might not think of, to go along with a best-practices summary.
So my question is, if you were starting from scratch, how would you make sure your LAMP site was fast?
Here's a few personal must-dos that I always set up in my LAMP applications.
Install mod_deflate for apache, and do not use PHP's gzip handlers. mod_deflate will allow you to compress static content, like javascript/css/static html, as well as the usual dynamic PHP output, and it's one less thing you have to worry about in your code.
Be careful with .htaccess files! Enabling .htaccess files for directories in your app means that Apache has to scan the filesystem constantly, looking for .htaccess directives. It is far better to put directives inside the main configuration or a vhost configuration, where they are loaded once. Any time you can get rid of a directory-level access file by moving it into a main configuration file, you save disk access time.
Prepare your application's database layer to utilize a connection manager of some sort (I use a Singleton for most applications). It's not very hard to do, and reducing the number of database connections your application opens saves resources.
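A tiny sketch of what that can look like with PDO (DSN and credentials are placeholders):

```php
<?php
class ConnectionManager
{
    private static $pdo;

    public static function get()
    {
        if (self::$pdo === null) {
            self::$pdo = new PDO(
                'mysql:host=localhost;dbname=app',
                'user',
                'pass',
                array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION)
            );
        }
        return self::$pdo;
    }
}

// Every caller reuses the same connection instead of opening its own.
$rows = ConnectionManager::get()->query('SELECT NOW()')->fetchAll();
```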
If you think your application will see significant load, memcached can perform miracles. Keep this in mind while you write your code... perhaps one day, instead of creating objects on the fly, you will be getting them from memcached. A little foresight will make implementation painless.
Once your app is up and running, set MySQL's slow query time to a small number and monitor the slow query log diligently. This will show you where your problem queries are coming from, and allow you to optimize your queries and indexes before they become a problem.
For serious performance tweakers, you will want to compile PHP from source. Installing from a package installs a lot of libraries that you may never use. Since PHP environments are loaded into every instance of an Apache thread, even a 5MB memory overhead from extra libraries quickly becomes 250MB of lost memory when there are 50 Apache threads in existence. I keep a list of my standard ./configure line I use when building PHP here, and I find it suits most of my applications. The downside is that if you end up needing a library, you have to recompile PHP to get it. Analyze your code and test it in a devel environment to make sure you have everything you need.
Minify your Javascript.
Be prepared to move static content, such as images and video, to a non-dynamic web server. Write your code so that any URLs for images and video are easily configured to point to another server in the future. A web server optimized for static content can easily serve tens or even hundreds of times faster than a dynamic content server.
That's what I can think of off the top of my head. Googling around for PHP best practices will find a lot of tips on how to write faster/better code as well (Such as: echo is faster than print).
First, realize that performance is an iterative process. You don't build a web application in a single pass, launch it, and never work on it again. On the contrary, you start small, and address performance issues as your site grows.
Now, onto specifics:
Profile. Identify your bottlenecks. This is the most important step. You need to focus your effort where you'll get the best results. You should have some sort of monitoring solution in place (like Cacti or Munin), giving you visibility into what's going on on your server(s).
Cache, cache, cache. You'll probably find that database access is your biggest bottleneck on the back end -- but you should verify this on your own. Fortunately, you'll probably find that a lot of your traffic is for a small set of resources. You can cache those resources in something like memcached, saving yourself the database hit, and resulting in better backend performance.
As others have mentioned above, take a look at the YDN performance rules. Consider picking up the accompanying book. This'll help you with front end performance
Install PHP APC, and make sure it's configured with enough memory to hold all your compiled PHP bytecode. We recently discovered that our APC installation didn't have nearly enough RAM; giving it enough to work with cut our CPU time in half, and disk activity by 10%.
Make sure your database tables are properly indexed. This goes hand in hand with monitoring the slow query log.
The above will get you very far. That is to say, even a fairly db-heavy site should be able to survive a frontpage digg on a single modestly-spec'd server if you've done the above.
You'll eventually hit a point where the default apache config won't always be able to keep up with incoming requests. When you hit this wall, there are two things to do:
As above, profile. Monitor your apache activity -- you should have an idea of how many connections are active at any given time, in addition to the max number of active connections when you get sudden bursts of traffic
Configure apache with this in mind. This is the best guide to apache config I've seen: Practical mod_perl chapter 11
Take as much load off of apache as you can. Apache's too heavy-duty to serve static content efficiently. You should be using a lighter-weight reverse proxy (like squid) or webserver (lighttpd or nginx) to serve static content, and to take over the job of spoon-feeding bytes to slow clients. This leaves Apache to do what it does best: execute your code. Again, the mod_perl book does a good job of explaining this.
Once you've gotten this far, it's largely an issue of caching more, and keeping an eye on your database. Eventually, you'll outgrow a single server. First, you'll probably add more front end boxes, all backed by a single database server. Then you're going to have to start spreading your database load around, probably by sharding. For an excellent overview of this growth process, see this livejournal presentation
For a more in-depth look at much of the above, check out Building Scalable Web Sites, by Cal Henderson, of Flickr fame. Google has portions of the book available for preview
I've used MySQLTuner for performance analysis on my MySQL servers, and it has given good insight into further issues to google, as well as making its own recommendations.
A resource you might find helpful is the YDN set of performance rules.
Don't forget the fact that your users will be thousands of miles away from your server, and downloading dozens of files to render a single page. That latency, and the overhead of rendering the page in their browsers can be larger than the amount of time that you spend collecting the information, and generating the page.
See the pages at Yahoo Developer Network about Best Practices for Speeding Up Your Web Site, and the YSlow tool for seeing what part of the downloading of the site is taking time.
Don't forget to turn off atime for your filesystem!
I'd recommend using Jet Profiler for MySQL to find any bad queries. I've successfully used it on a couple of my sites. Really helpful, and much easier to digest than the slow query log.
I'd recommend starting with http://highscalability.com/
As for your suggestions:
Compression for images: definitely not. Filesystem-type tuning: yes, that could have some effect, but minimal. Actually, the best option is to use an in-memory reverse proxy, or even better a CDN.
For Apache, basically only load the modules you need and nothing else. Since with PHP you can only use the forking MPM, it's important to keep it slim. As for optimal settings, well, you have to fine-tune them to the specific application, hardware, etc. If you have enough CPU, it's recommended that you use mod_deflate. The faster the server can send data to the client, the faster it can start processing the next request.