I have a high-traffic website and I need to make sure my site is fast enough to display my pages to everyone rapidly.
I searched Google for articles about speed and optimization, and here's what I found:
Cache the page
Save it to the disk
Caching the page in memory:
This is very fast but if I need to change the content of my page I have to remove it from cache and then re-save the file on the disk.
Save it to disk
This is very easy to maintain, but every time the page is accessed I have to read it from the disk.
Which method should I go with?
Jan & idm are right, but here's how to do it:
Caching (pages or contents) is crucial for performance. The fewer calls you make to the database or the file system, the better, whether your content is static or dynamic.
You can use a PHP accelerator if you need to run dynamic content:
My recommendation is to use Alternative PHP Cache (APC)
Here are some benchmarks:
What is the best PHP accelerator to use?
PHP Accelerators : APC vs Zend vs XCache with Zend Framework
Lighttpd – PHP Acceleration Benchmarks
For caching content and even pages you can use: Memcached or Redis.
Memcached:
Free & open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load. Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Redis
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
Both are very good tools for caching content or variables.
Here are some benchmarks so you can choose which one you prefer:
Redis vs Memcached
Redis vs Memcached
Redis VS Memcached (slightly better bench)
On Redis, Memcached, Speed, Benchmarks and The Toilet
You can also install Varnish, nginx, or G-WAN
Varnish:
Varnish is an HTTP accelerator designed for content-heavy dynamic web sites. In contrast to other HTTP accelerators, such as Squid, which began life as a client-side cache, or Apache, which is primarily an origin server, Varnish was designed from the ground up as an HTTP accelerator.
nginx
nginx (pronounced "engine-x") is a lightweight, high-performance Web server/reverse proxy and e-mail (IMAP/POP3) proxy, licensed under a BSD-like license. It runs on Unix, Linux, BSD variants, Mac OS X, Solaris, and Microsoft Windows.
G-WAN
G-WAN is a Web server with ANSI C scripts and a Key-Value store which outperform all other solutions.
Here are some benchmarks so you can choose which one you prefer:
Serving static files: a comparison between Apache, Nginx, Varnish and G-WAN
Web Server Performance Benchmarks
Nginx+Varnish compared to Nginx
Apache, Varnish, nginx and lighttpd
G-WAN vs Nginx
You have a good idea, which is close to what I do myself. If I have a page that is 100% static, I'll save an HTML version of it and serve that to the user instead of generating the content again every time. This saves both MySQL queries and, in some cases, several I/O operations. Every time I make a change, my administration interface simply removes the HTML file and recreates it.
This method has proven to be around 100x faster on my server.
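A minimal sketch of that approach, not the exact code I run. The function names, the `$dir` cache directory and the `$render` callback (whatever builds the page from MySQL) are hypothetical names chosen for illustration:

```php
<?php
// Map a page name to its cached HTML file (md5 keeps the name filesystem-safe).
function cache_path(string $page, string $dir): string
{
    return rtrim($dir, '/') . '/' . md5($page) . '.html';
}

// Return the saved HTML if it exists; otherwise render once and save it.
function serve_page(string $page, callable $render, string $dir): string
{
    $file = cache_path($page, $dir);
    if (is_file($file)) {
        return file_get_contents($file);
    }

    $html = $render($page); // hypothetical callback that generates the page

    if (!is_dir($dir)) {
        mkdir($dir, 0775, true);
    }
    // Write to a temp file and rename it, so readers never see a half-written page.
    $tmp = $file . '.' . uniqid('', true) . '.tmp';
    file_put_contents($tmp, $html);
    rename($tmp, $file);

    return $html;
}

// Called by the administration interface after the content changes.
function invalidate_page(string $page, string $dir): void
{
    $file = cache_path($page, $dir);
    if (is_file($file)) {
        unlink($file);
    }
}
```

The next request after `invalidate_page()` simply falls through to `$render` and recreates the file.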
The big question with website performance is "do you serve static pages, or do you serve dynamic pages?".
Static pages
The best way to speed up static pages is to cache them outside your website. If you can afford to, serve them from a CDN (Akamai, Cotendo, Level3). In this case, the traffic never hits your site. There are several ways to control the cache - from fixed duration to the standard HTTP cache directives.
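As a rough illustration of the "standard HTTP cache directives" part, here is a small PHP sketch that builds the response headers a CDN or proxy would look at. The function name and the one-hour lifetime in the usage note are assumptions for the example, not recommendations:

```php
<?php
// Build headers telling shared caches (CDN, proxy) and browsers that the page
// may be reused for $maxAge seconds, plus a Last-Modified date for revalidation.
function cache_headers(int $maxAge, int $lastModified): array
{
    return [
        'Cache-Control: public, max-age=' . $maxAge,
        'Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT',
    ];
}
```

In a page script you would loop over `cache_headers(3600, $timestampOfLastChange)` and pass each line to `header()` before any output is sent.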
Even if you can't serve your HTML from a CDN, storing your images, javascript and other static assets on a CDN can speed up your site - you could use a cloud service like Amazon for this.
If you can't afford a CDN for your HTML, you could use your own caching proxy layer, as book of Zeus suggests. I've had good results with Varnish. Ideally, you'd run your caching proxy on its own hardware - but you can run it on your existing servers.
Dynamic pages
Dynamic pages are harder to cache - so then you need to concentrate on making the pages themselves as efficient as possible. This basically means hunting the bottleneck - in most systems, the bottleneck is the database (but by no means always).
If you're confident your bottleneck is the database, there are several caching options - you can cache "snippets" of HTML, or you can cache database queries. Using an accelerator helps with this - I wouldn't invent one from scratch. This probably means re-architecting (parts of) your application.
You have to profile your site first.
Instead of guessing wildly, you have to identify the specific bottleneck(s) and then solve that specific problem.
Caching is not a silver bullet, nor is it a synonym for optimization.
Sometimes caching is not applicable (for ads, for example); sometimes it won't help at all, because the cause of the site's slowness is in some unrelated spot.
Your site may be running out of memory, in which case memory caching will make things worse.
I can't believe someone has a high-traffic site and said not a word about prior profiling. How can you run it knowing nothing of its internals? CPU load, memory load, disk I/O and such.
I can add:
Cache everything you can
Minimize number of includes
Use an accelerator
Please investigate what makes your site slow. Don't forget about YSlow and similar tools; they can help you a lot.
Besides, if you have heavy calculations you could write a PHP extension for them, but I don't think this is your case.
I'm new to PHP and want to try caching (for the first time), so I made a website that has:
dynamic home page
dynamic portfolio page
dynamic contact page
static about page
static admin page
So I read a tutorial about caching and tried to make my own caching system:
It uses a file cache keyed on the requested page. When a page is requested, the cache system checks whether there is a cache file in the cache directory for that specific ID (based on the URI). If there is no cache file yet, it writes all the output (HTML) from the PHP script (in this case, the output from the output buffer) to one; if there is, it just include_once()s the HTML file.
Then I read that CodeIgniter (I built this website with CI) supports APC for caching, so I read up on APC. What I read is that it caches DB results, and now I'm confused about which one I should use.
What I've gathered so far:
File caching would probably be slower if there are a lot of requests (I don't know if this is true or not, but I read it somewhere via a search engine)
APC is fast
But I'm still confused about which I should use. I'm on shared hosting.
The levels of caching most relevant in a PHP application:
File / Script caching - The operating system will actually do this to a large extent. When a file is opened it's added to an OS-level cache. It stays there until the file is touched or the OS needs to free memory for other processes. A homegrown PHP solution isn't a good replacement for this.
Opcode caching - In order to function, PHP needs to parse and compile a script into opcodes. A mechanism like APC will cache the opcodes of every PHP script executed by Apache, provided that the cache doesn't overflow. A homegrown PHP solution built on top of APC can partially do this, but APC already does it ... so don't bother.
Query caching - If your script accesses a lot of data that doesn't change very frequently, or wherein some latency between updates and the visibility of those updates is acceptable, caching the results from complex queries is beneficial. A homegrown PHP solution built on APC is acceptable and beneficial at this level. But a database level solution is also appropriate here, and often more appropriate.
Output caching - If your page is largely deterministic and/or the same sort of latency applicable to query caching is acceptable, you can cache the entire output of the script using output buffering and APC. A homegrown PHP solution built on APC is acceptable here, but generally not necessary. If the page is static, you're probably not saving yourself any re-computation. And if it's dynamic, it's usually preferable to just re-render the page anyway.
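For the output caching level, a hedged sketch of the "output buffering plus APC" idea. It assumes APCu (the user-data successor to APC's store) rather than the original `apc_*` functions, and it simply skips caching when APCu is unavailable. `cached_output` and its parameters are hypothetical names:

```php
<?php
// Cache the full output of $render under $key for $ttl seconds.
// Assumption: APCu is used as the store; without it, the page is just rendered.
function cached_output(string $key, int $ttl, callable $render): string
{
    $useApcu = function_exists('apcu_enabled') && apcu_enabled();

    if ($useApcu) {
        $html = apcu_fetch($key, $hit);
        if ($hit) {
            return $html; // served from shared memory, no re-rendering
        }
    }

    ob_start();              // capture everything the page echoes
    $render();
    $html = ob_get_clean();

    if ($useApcu) {
        apcu_store($key, $html, $ttl);
    }

    return $html;
}
```

A page script would then do something like `echo cached_output('home', 300, function () { include 'home_view.php'; });`, where the key, TTL and view file are placeholders.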
In a dedicated or virtual-dedicated environment you'd need to install APC (or something similar) yourself. But in a shared hosting environment, it's very likely that APC is already installed. And if it weren't, you couldn't install it yourself anyway.
And, due to my own uncertainty, I'd recommend not performing any query or output caching with APC in a shared environment -- I'm not sure whether APC segregates caches by virtual host. Even if it does, I wouldn't assume that my site is truly a separate virtual host.
I'm using Cache_Lite for HTML and array caching in my project. I found that Cache_Lite may lead to high system I/O, maybe because its performance is not good.
Is there a stable PHP HTML/page cache I could use instead?
I already have APC installed for opcode cache, Memcached installed for common data/array cache.
I've had the exact same problem with Cache Lite, as the library doesn't properly implement file locks.
I solved it with a new library that is a drop-in replacement for Cache Lite.
https://github.com/mpapec/simple-cache/blob/master/example_clite1.php
https://github.com/mpapec/simple-cache/blob/master/example_clite2.php
https://github.com/mpapec/simple-cache/blob/master/example_clite3.php
Just to mention that the library lacks some features that I didn't find useful, like cache cleaning and caching in memory (the _memoryCaching property, which is false by default and marked as "beta quality" in the original library).
The algorithm used for file locking follows this diagram.
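For readers who want the general idea in code: the following is only a generic sketch of lock-protected cache file access using PHP's `flock()`, not the actual algorithm of the library linked above. `cache_write` and `cache_read` are hypothetical names:

```php
<?php
// Write a cache entry while holding an exclusive lock, so concurrent
// writers cannot interleave and readers cannot see a partial file.
function cache_write(string $file, string $data): bool
{
    $fp = fopen($file, 'c'); // create if missing, but do not truncate yet
    if ($fp === false) {
        return false;
    }
    try {
        if (!flock($fp, LOCK_EX)) {
            return false;
        }
        ftruncate($fp, 0);   // safe to truncate only once the lock is held
        fwrite($fp, $data);
        fflush($fp);
        flock($fp, LOCK_UN);
        return true;
    } finally {
        fclose($fp);
    }
}

// Read a cache entry under a shared lock; returns null on a miss.
function cache_read(string $file): ?string
{
    if (!is_file($file)) {
        return null;
    }
    $fp = fopen($file, 'r');
    if ($fp === false) {
        return null;
    }
    try {
        if (!flock($fp, LOCK_SH)) {
            return null;
        }
        $data = stream_get_contents($fp);
        flock($fp, LOCK_UN);
        return $data === false ? null : $data;
    } finally {
        fclose($fp);
    }
}
```

Opening with mode `'c'` instead of `'w'` matters here: `'w'` would truncate the file before the lock is acquired, which is exactly the window in which another request could read an empty cache entry.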
Without more information it is hard to know if you are currently experiencing an I/O problem or are likely to experience one in the future. (If your site is not getting much traffic or you are using an SSD, you are unlikely to have a problem.)
Cache Lite appears to be a file based caching system. This may lead to IO problems if your site experiences heavy load / lots of concurrent users / is hosted on a shared server / has other programs heavily using the filesystem.
An alternative to Cache Lite is memcache, which is a key/value store that stores data in memory. This may not be suitable if you are storing large amounts of data or your server does not have any spare RAM, as it stores all of its information in memory. A benefit of memory is that it is much faster than accessing files from the disk. If you are only accessing a small amount of data or the same data repeatedly, this is not likely to be an issue though, because of disk/OS caching.
I would suggest checking whether your system is currently experiencing any issues with I/O before worrying about I/O performance (unless you plan on getting slashdotted or something).
You could install a tool like Munin http://munin-monitoring.org/ and monitor your system to see if I/O is a problem or is becoming a problem. Once installed, check the CPU graph and look at the iowait data.
EDIT: Just saw the comment above. Depending on your needs, reverse proxies are another great tool; check out https://www.varnish-cache.org/ . At work we use a combination of the two (memcache and Varnish). We have 1 machine serving over 900,000 page views per month; this site includes static and dynamic content.
If you're talking about https://pear.php.net/package/Cache_Lite then I could tell you a story. We used it once, but it proved to be unreliable for websites with lots of requests.
We then switched to Zend_Cache (ZF1) in combination with memcached. It can be used as a standalone component.
However, you have to tune it a bit in order to use tags. There are a few implementations out there to get the job done: https://github.com/bigwhoop/taggable-zend-memcached-backend
Trying to get to grips with the different types of cache engines File, APC, Xcache, Memcache. Anybody know of any good resources/links?
Note: I am using Linux, PHP and MySQL
There are 2 types of caching terminology thrown around in PHP.
First is an opcode cache:
http://en.wikipedia.org/wiki/PHP_accelerator
Second is a data cache:
http://simas.posterous.com/php-data-caching-techniques
A few of the technologies can cross boundaries into both realms, but the basics behind them are simple. The idea is: keep as much data as possible in RAM and precompiled, because compiling and HD seeks are very expensive operations. HD seeks happen when finding a file to compile, querying the DB to get data, or looking for a temp file, and every time that happens it slows down the user experience.
Memcached is generally the way to go, but it has some "features": once you save some data to the cache, it doesn't necessarily guarantee that it will be available later, as it dynamically removes old entries to make way for new ones. It's also fairly basic; you'll need to roll your own system for handling timeouts and preventing cascading, but it's all fairly simple. There's tons of info in the Memcached FAQ, or feel free to ask and I'll post some code examples. Memcached can also act as a session handler, which is great if you have lots of users or more than one server.
Otherwise disk caching is good if you only have one server or don't mind generating separate caches on each server. It is generally faster than memcached as it doesn't have the network overhead (unless you have memcached on the same server). There are plenty of good disk caching frameworks, but probably the best are PEAR Cache_Lite and APC.
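A sketch of what "rolling your own" might look like, reading "cascading" as many requests rebuilding the same missing entry at once. To keep it runnable without the extension, `ArrayCache` is a hypothetical in-process stand-in with Memcached-like `get`/`set`/`add`/`delete` semantics; `fetch_or_rebuild` and the lock key prefix are likewise illustrative names:

```php
<?php
// Hypothetical stand-in for a Memcached-like store; a real setup would talk
// to memcached instead. Note: a stored value of false looks like a miss.
final class ArrayCache
{
    private array $items = [];

    public function get(string $key)
    {
        $item = $this->items[$key] ?? null;
        if ($item === null || $item['expires'] <= time()) {
            unset($this->items[$key]);
            return false; // miss: never stored, expired, or evicted
        }
        return $item['value'];
    }

    public function set(string $key, $value, int $ttl): bool
    {
        $this->items[$key] = ['value' => $value, 'expires' => time() + $ttl];
        return true;
    }

    // Store only if the key is absent; this is what makes it usable as a lock.
    public function add(string $key, $value, int $ttl): bool
    {
        return $this->get($key) === false ? $this->set($key, $value, $ttl) : false;
    }

    public function delete(string $key): bool
    {
        unset($this->items[$key]);
        return true;
    }
}

// Always treat the cache as optional: on a miss, only the request that wins
// the short-lived lock rebuilds; the others wait briefly for its result.
function fetch_or_rebuild(ArrayCache $cache, string $key, int $ttl, callable $rebuild)
{
    $value = $cache->get($key);
    if ($value !== false) {
        return $value;
    }

    if ($cache->add("lock:$key", 1, 10)) {
        try {
            $value = $rebuild();
            $cache->set($key, $value, $ttl);
            return $value;
        } finally {
            $cache->delete("lock:$key");
        }
    }

    // Someone else is rebuilding: poll a few times before giving up.
    for ($i = 0; $i < 5; $i++) {
        usleep(100000); // 0.1 s
        $value = $cache->get($key);
        if ($value !== false) {
            return $value;
        }
    }
    return $rebuild(); // last resort: compute without caching
}
```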
APC also has the added advantage that it can cache your compiled PHP code which may help on high-performance websites.
I'm involved in a project that will end up creating around 10 million new pages on an existing site. The site, and the new project, are built with CodeIgniter and connecting to MySQL.
I've never dealt with a site of this size before, and I'm concerned about how we should handle caching. Has anyone dealt with caching on a PHP site of this size that could give me some pointers? I'm used to the CodeIgniter caching system and similar, but the number of cache files that would create worries me.
Any suggestions would be appreciated.
I haven't done anything on that scale, but I don't see a problem with file-based caching as long as the caching mechanism isn't completely dumb, and you're using a modern filesystem. Distributing cache files throughout a directory tree is smart enough.
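To make "distributing cache files throughout a directory tree" concrete, here is one possible sketch: hash the cache key and use the first hex characters as nested directory names. The function name, the two-level depth and the `.cache` suffix are arbitrary choices for illustration:

```php
<?php
// Spread cache files over 256 * 256 = 65,536 directories based on the key's hash.
function sharded_cache_path(string $baseDir, string $key): string
{
    $hash = md5($key);
    return sprintf(
        '%s/%s/%s/%s.cache',
        rtrim($baseDir, '/'),
        substr($hash, 0, 2), // first level: 256 possible directories
        substr($hash, 2, 2), // second level: 256 more under each
        $hash
    );
}
```

With 10 million pages spread over 65,536 directories, that works out to roughly 150 files per directory on average.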
If you're worried, that's good. Of course, I would suggest writing a wrapper around CI's built-in mechanism, so that you can easily swap it out for something else (Like Zend_Cache, possibly with a beefy memcached server, or some smarter file-based system of your own design).
There are several layers of caching available to PHP and CodeIgniter, but you shouldn't have to worry about the number of cached files on a standard linux server (various file systems can handle hundreds of millions of files per mount point). But to pick your caching method, you need to measure carefully.
Options:
Opcode caching (Zend, eAccelerator, and more)
CodeIgniter view caching (configured per view)
CodeIgniter read query caching
General web caching (more info)
Optimize your database (more info)
(and so on)
Additionally, you can improve the file caches by using memory file systems and in-memory tables.
The real question is, how do you pick caching strategies? Capacity planning. You model your system (users, accounts, pages, files), simulate, measure, and add caches based on best theories. Measure again. Produce new theories and measurements until you have approaches that fit your desired scale.
In my experience, view caching and web caching are a big gain for widely read sites (WPSuperCache, for example). Opcode caching (and other forms of minimisation) are useful for heavily dynamic sites, as is database performance tuning.
FYI: If the system runs on a Windows server: Windows can (could?) hold a maximum of approx. 65,000 files in a folder, including cache folders. Not sure if this upper limit has been fixed in newer versions.
All big guys use APC.
The number of webpages is not relevant.
The relevant number is the number of hits (pageviews).
And if you design for speed, ditch the Windows machines.
I'd like to have your opinion about writing web apps in PHP vs. a long-running process using tools such as Django or Turbogears for Python.
As far as I know:
- In PHP, pages are fetched from the hard-disk every time (although I assume the OS keeps files in RAM for a while after they've been accessed)
- Pages are recompiled into opcode every time (although tools from eg. Zend can keep a compiled version in RAM)
- Fetching pages every time means reading global and session data every time, and re-opening connections to the DB
So, I guess PHP makes sense on a shared server (multiple sites sharing the same host) to run apps with moderate use, while a long-running process offers higher performance with apps that run on a dedicated server and are under heavy use?
Thanks for any feedback.
After you apply memcache, opcode caching, and connection pooling, the only real difference between PHP and other options is that PHP is short-lived and process-based, while other options are typically long-lived and multithreaded.
The advantage PHP has is that it's dirt simple to write scripts. You don't have to worry about memory management (it's always released at the end of the request), and you don't have to worry about concurrency very much.
The major disadvantage, I can see anyways, is that some more advanced (sometimes crazier?) things are harder: pre-computing results, warming caches, reusing existing data, request prioritizing, and asynchronous programming. I'm sure people can think of many more.
Most of the time, though, those disadvantages aren't a big deal. You can scale by adding more machines and using more caching. The average web developer doesn't need to worry about concurrency control or memory management, so taking the minuscule hit from removing them isn't a big deal.
With APC, which is soon to be included by default in PHP, compiled bytecode is kept in RAM.
With mod_php, which is the most popular way to use PHP, the PHP interpreter stays in the web server's memory.
With APC data store or memcache, you can have persistent objects in RAM instead of for example always creating them all anew by fetching data from DB.
In a real-life deployment you'd use all of the above.
PHP is fine for either use in my opinion, the performance overheads are rarely noticed. It's usually other processes which will delay the program. It's easy to cache PHP programs with something like eAccelerator.
As many others have noted, neither PHP nor Django is going to be your bottleneck. Hitting the hard disk for the bytecode on PHP is irrelevant for a heavily trafficked site because caching will take over at that point. The same is true for Django.
Model/View and user experience design will have order of magnitude benefits to performance over the language itself.
PHP is a language like Java etc.
Only your executable is the php binary and not the JVM! You can set a different maximum runtime for PHP scripts without any problems (if your shared hosting provider lets you do so).
Where your apps run shouldn't depend on the kind of server. It should depend on the resources used by the application (CPU time, RAM) and what your server/vserver/shared host provides!
For performance tuning reasons you should have a look at eAccelerator etc.
Apache also supports modules for connection pooling! See mod_dbd.
If you need to scale (like in a cluster) you can use distributed memory caching systems like memcached!