I use MySQL for my primary database, where I keep the actual objects. When an object is rendered using a template, rendering takes a lot of time.
Because of that I've decided to cache the produced HTML. Right now I store the cache in appropriately named files, and it does work significantly faster. I am, however, aware that this is not the best way to do it.
I need a (preferably key-value) database to store my cache in. I cannot use a caching proxy because I still need to process the cached HTML. Is there such a database with a PHP front end?
Edit: If I use memcached, and I cache about a million pages, won't I run out of RAM?
Edit 2: And again, I have a lot of HTML to cache (gigabytes of it).
If I use memcached, and I cache about a million pages, won't I run out of RAM?
Memcached
Memcached is also a really solid product (though I like Redis more), used at many big sites to keep them up and running. Almost all active tweets (the ones users fetch) are stored in memcached for insane performance.
If you want to be fast, you should keep your active dataset in memory. But yes, if the dataset is bigger than your available memory, you should store it in a persistent datastore such as MySQL (you should always do that anyway, because memcached is volatile). When an item is not available in memory, you fetch it from the datastore and cache it in memcached for future reference (with an expiry time).
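A minimal sketch of the fetch-on-miss pattern just described, in Python for brevity. The TTLCache class is a hypothetical in-process stand-in for a memcached client, the dict stands in for MySQL, and the key name and 300-second expiry are assumptions made for illustration only.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client (hypothetical, for illustration)."""

    def __init__(self):
        self._items = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries behave like misses
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (time.monotonic() + ttl_seconds, value)


def get_page(cache, datastore, key, ttl_seconds=300):
    """Return cached HTML, falling back to the persistent store on a miss."""
    html = cache.get(key)
    if html is None:
        html = datastore[key]              # slow path: the persistent datastore
        cache.set(key, html, ttl_seconds)  # keep it in memory for next time
    return html


datastore = {"page:1": "<h1>Hello</h1>"}       # stands in for MySQL
cache = TTLCache()
first = get_page(cache, datastore, "page:1")   # miss, loads from the datastore
second = get_page(cache, datastore, "page:1")  # hit, served from memory
```

Because every read goes through get_page, losing a cached entry (expiry, eviction, restart) only costs one slow read rather than correctness.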
Redis
I really like Redis because it is an advanced key-value store with insane performance.
Redis is an advanced key-value store. It is similar to memcached but the dataset is not volatile, and values can be strings, exactly like in memcached, but also lists, sets, and ordered sets. All this data types can be manipulated with atomic operations to push/pop elements, add/remove elements, perform server side union, intersection, difference between sets, and so forth. Redis supports different kind of sorting abilities.
Redis has a VM, so you don't need a separate persistent datastore. I really like Redis because of all the available commands (power :)?). This tutorial by Simon Willison displays (a lot of) the raw power that Redis has.
Speed
Redis is pretty fast: 110,000 SETs/second and 81,000 GETs/second on an entry-level Linux box. Check the benchmarks.
Commits
Redis is more actively developed. Eight hours ago antirez (Redis) committed something, whereas memcached's latest commit was on 12 November.
Install Redis
Redis is insanely easy to install. It has no dependencies. You only have to run make to compile Redis (awesome :)?) and then start the server:
make
./redis-server redis.conf #start redis
Install Memcached
Memcached has a dependency (libevent), which makes it more difficult to install.
wget http://memcached.org/latest
tar -zxvf memcached-1.x.x.tar.gz
cd memcached-1.x.x
./configure
make && make test
sudo make install
That is not the whole story, because memcached depends on libevent and ./configure will fail if libevent is missing. Then again, there are packages, which are cool, but they require root to install.
Redis is pretty fast: 110,000 SETs/second
If speed is a concern, why use the network layer?
According to: http://tokutek.com/downloads/mysqluc-2010-fractal-trees.pdf
InnoDB inserts: 43,000 records per second AT ITS PEAK*
TokuDB inserts: 34,000 records per second AT ITS PEAK*
G-WAN KV inserts: 100,000,000 records per second
(*) After a few thousand inserts, performance degrades severely for InnoDB and TokuDB, which end up writing to disk once their cache, the system cache and the disk controller cache are full. See the PDF for an interesting discussion of the problems caused by the topology of the InnoDB index (which severely breaks locality, while the fractal tree topology scales much better... but still not linearly).
To clarify the answers into logical views:
Flat files are as fast as the storage medium being used (DISK or RAM)
An environment which caches in RAM the MRU (Most Recently Used) items
Solution has a smart/fast hash index to all locations (what SQL systems rely on)
That combination will get you the best solution that you are looking for.
For argument's sake, flat file or not - excluding a MEMORY ONLY solution - all engines use some form of flat file. The magic is knowing where your data is, and tuning reads to pull the data back as efficiently as possible. In the '80s at IBM we used a fixed record length flat file design - which wasn't optimized for disk space, it was optimized for I/O. Indexes then were based on Record Length * ROWID.
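To make the Record Length * ROWID idea concrete, here is a small Python sketch. The 32-byte record size and the helper names are assumptions chosen for this example, not the original IBM design, and an in-memory buffer stands in for the file on disk.

```python
import io

RECORD_LENGTH = 32  # bytes per record; an arbitrary size chosen for this sketch


def write_record(f, row_id, text):
    """Write one record at its fixed offset, padded to RECORD_LENGTH bytes."""
    data = text.encode("utf-8")
    if len(data) > RECORD_LENGTH:
        raise ValueError("record too long")
    f.seek(RECORD_LENGTH * row_id)               # offset = record length * ROWID
    f.write(data.ljust(RECORD_LENGTH, b"\x00"))  # pad so every record is the same size


def read_record(f, row_id):
    """Read one record with a single seek, no index lookup needed."""
    f.seek(RECORD_LENGTH * row_id)
    return f.read(RECORD_LENGTH).rstrip(b"\x00").decode("utf-8")


f = io.BytesIO()  # a real file opened in "r+b" mode works the same way
for i, name in enumerate(["alpha", "beta", "gamma"]):
    write_record(f, i, name)
record = read_record(f, 2)
```

The padding wastes space, which is the trade-off described above: the layout is optimized for I/O, not for disk usage.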
Now to your need, your ultimate performance for scale is to introduce a smart combination - we host over 1 million companies, with over 10 pages per company - 10 million files, plus js, css and images.
Theory 1) - You know your limitation is RAM - spool dynamic content to disk when feasible and drop such features as hit counters. Leverage NGINX or HIGHLY tune APACHE (or, as we have done since 2001, write your own web servers) - the whole concept is to leverage RAM for the MOST USED content, and to have a very intelligent lookup for disk-based content - normally the URI is fine.
Theory 2) - Trend Analysis and User Anticipation - I have spent years researching and developing systems that track trends. If I know a user will go path A, B, C, D - then when he hits B, I have already prefetched C and D. If I know a user will go A, B but may then go E and then D, you have the choice to pre-cache C and E, or for RAM's sake prefetch only D and fetch C or E on demand when the user picks one.
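One simple way to approximate this kind of anticipation is to count which page users visit next and prefetch the most common followers. The Python sketch below uses plain first-order transition counts; that technique, and the PathPredictor name, are assumptions for illustration and not the tracking system described above.

```python
from collections import Counter, defaultdict


class PathPredictor:
    """Counts page-to-page transitions seen in past sessions."""

    def __init__(self):
        self._next = defaultdict(Counter)  # page -> Counter of pages visited next

    def record_visit(self, path):
        # Count every consecutive pair (current page, following page).
        for current, following in zip(path, path[1:]):
            self._next[current][following] += 1

    def likely_next(self, page, limit=2):
        """Return up to `limit` pages most often visited right after `page`."""
        return [p for p, _ in self._next[page].most_common(limit)]


predictor = PathPredictor()
predictor.record_visit(["A", "B", "C", "D"])
predictor.record_visit(["A", "B", "C", "D"])
predictor.record_visit(["A", "B", "E", "D"])
to_prefetch = predictor.likely_next("B")  # candidates to warm when a user hits B
```

Lowering limit trades prediction coverage for RAM, which is the same choice described in the paragraph above.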
The web server we have developed, along with some accounting systems I have developed over the years, integrates Theory 2 to prefetch, combined with smart caching. We also store the content on disk deflated - so the transport layer simply pumps the content onto the stack, as 99% of browsers support deflated streams. (It's faster to inflate before sending for that 1% than to deflate 99% of the time.)
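A rough Python sketch of the store-deflated idea, assuming a hypothetical respond helper that receives the request's Accept-Encoding value. zlib.compress produces the zlib-wrapped stream that HTTP's "deflate" content coding refers to; the header handling is deliberately simplified.

```python
import zlib


def store_deflated(html):
    """Compress once, at cache-write time."""
    return zlib.compress(html.encode("utf-8"))


def respond(stored, accept_encoding):
    """Return (headers, body) for a request, given the stored compressed bytes."""
    if "deflate" in accept_encoding.lower():
        # Common case: send the stored bytes untouched.
        return {"Content-Encoding": "deflate"}, stored
    # Rare case: inflate on the fly for clients without deflate support.
    return {}, zlib.decompress(stored)


page = "<html><body>" + "hello " * 100 + "</body></html>"
stored = store_deflated(page)
headers, body = respond(stored, "gzip, deflate")
plain_headers, plain_body = respond(stored, "")
```

The compression cost is paid once per page rather than once per request, which is where the saving comes from.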
Per the thought of MEMCACHED and SWAP - Disk speed is your enemy, however, tying up the kernel to manage that enemy is an epic fail! If you want to beat MEMCACHED performance, learn how to set up a RAM DISK and keep your deflated HOT requested items there!
** DISCLAIMER: This all assumes that you have enough bandwidth that your Infrastructure/Users bandwidth is not your bottleneck, but your servers are. #3FINC
http://memcached.org/ + http://php.net/manual/en/book.memcache.php
Flat files are "technically" the fastest - but if you're looking for something with a PHP front end and just screams - take a look at postgres.
http://www.wikivs.com/wiki/MySQL_vs_PostgreSQL#Raw_Speed
For memory caching look at memcached
http://memcached.org/
*Edit: from your edit ... (redundant yes) ... if you cache that volume in memory you will have issues. Look into postgres columnar table queries or a quasi-custom flat file solution.
As far as I know, using the file system is actually the fastest way to cache rendered templates without resorting to storing them in memory. Any database would simply add overhead and would make the whole thing slower by comparison.
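For illustration, a file cache keyed by URI could look like the Python sketch below. Hashing the URI and spreading files over two levels of subdirectories is an assumption added here to avoid one huge folder; the function names and the .html suffix are likewise hypothetical.

```python
import hashlib
import os
import tempfile


def cache_path(root, uri):
    """Map a URI to a file path, spread over 256 * 256 subdirectories."""
    digest = hashlib.sha1(uri.encode("utf-8")).hexdigest()
    return os.path.join(root, digest[:2], digest[2:4], digest + ".html")


def store(root, uri, html):
    path = cache_path(root, uri)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return path


def load(root, uri):
    """Return the cached HTML, or None when the page has not been cached."""
    try:
        with open(cache_path(root, uri), encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError:
        return None


with tempfile.TemporaryDirectory() as root:  # a real cache would use a fixed directory
    store(root, "/company/42/about", "<p>About us</p>")
    hit = load(root, "/company/42/about")
    miss = load(root, "/company/43/about")
```

The lookup is a pure function of the URI, so no index or database is needed to find a cached page.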
I would use memcached or APC, depending on whether you need the cache shared between servers. Memcached is a daemon you connect to, whereas APC lives inside the PHP instance (which makes it a little faster). Both of them store the cache in memory, so they're blazing fast.
In fact, storing the cache in files is really the fastest way to do this. But if you're really interested in putting it into a database, you can check out MongoDB. MongoDB is a document-oriented database, so there are no server-side joins, which is why it's faster than MySQL (at least with PHP; there are a lot of benchmarks on the internet).
Related
I'm primarily wondering what the speed difference is in accessing the object cache of APC vs. memcached (NOT the op-code cache). The primary advantage of memcached is that it is distributed and not restricted to the local machine. However, since it is accessed over the network, there is some latency involved.
I was wondering whether the speed difference between accessing APC (on the machine) and memcached (on another server) is big enough to warrant having a staged caching scheme, where the program first tries APC, then memcached, and finally the database if all else fails.
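For concreteness, the staged scheme could be sketched as follows in Python. Plain dicts stand in for APC (the local tier) and memcached (the shared tier), and TieredCache and load_from_db are hypothetical names used only for this illustration.

```python
class TieredCache:
    """Tries a local tier first, then a shared tier, then the database."""

    def __init__(self, local, shared, load_from_db):
        self.local = local        # dict standing in for APC (per machine)
        self.shared = shared      # dict standing in for memcached (all machines)
        self.load_from_db = load_from_db

    def get(self, key):
        if key in self.local:
            return self.local[key]
        if key in self.shared:
            value = self.shared[key]
        else:
            value = self.load_from_db(key)  # last resort: the database
            self.shared[key] = value
        self.local[key] = value             # promote into the fastest tier
        return value


db_reads = []


def load_from_db(key):
    db_reads.append(key)
    return f"row for {key}"


shared = {}                                # shared between both "servers"
server_a = TieredCache({}, shared, load_from_db)
server_b = TieredCache({}, shared, load_from_db)
server_a.get("user:7")                     # misses both tiers, reads the database
server_a.get("user:7")                     # served from server A's local tier
value = server_b.get("user:7")             # served from the shared tier
```

One caveat with this layout: the local tier on each machine can go stale when another machine updates the shared tier, so in practice it would need a short expiry or explicit invalidation.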
Like most everything else: it depends.
If you have a lot of calculations and can store the results then caching will speed things up. If you're just basically storing rows from the database then in memory caching will help but memcached may not add a huge amount of difference vs. a database (assuming the db queries are all simple). On the other hand if you're doing complex queries, or a lot of programmatic work to create something, then caching makes much more sense.
To give you an example, I recently worked on a site that was written by a 3rd party contractor who did not do any performance work during design. It was slow as an ox because it had a lot of unoptimized includes and such. Adding APC improved the performance by roughly 10x. Adding memcached decreased load times by 10 - 20 ms.
If you're far enough along then do some performance testing (look up xdebug, or another tool) and see where your bottlenecks are, then plan accordingly.
Keep in mind that if you fill up your APC cache with other things, APC will have to recompile the op-code for your pages again. This can cause problems if the pages keep evicting objects and then, once a page runs, the objects keep evicting pages. Not fun.
Just be safe and don't be tempted to use APC for anything but config values which won't cause your pages to be removed to make space.
TL;DR Once APC gets full your site will slow down and your server will work much harder.
Trying to get to grips with the different types of cache engines File, APC, Xcache, Memcache. Anybody know of any good resources/links?
Note I am using Linux, PHP and mysql
There are 2 types of caching terminology thrown around in PHP.
First is an opcode cache:
http://en.wikipedia.org/wiki/PHP_accelerator
Second is a data cache:
http://simas.posterous.com/php-data-caching-techniques
A few of the technologies cross over into both realms, but the basics behind them are simple. The idea is: keep as much data as possible in RAM and precompiled, because compiling and HD seeks are very expensive. An HD seek happens when finding a file to compile, querying the DB for data, or looking for a temp file, and every time that happens it slows down the user experience.
Memcached is generally the way to go, but it has some "features": once you save some data to the cache, there is no guarantee that it will be available later, as memcached dynamically removes old entries to make way for new ones. It's also fairly basic; you'll need to roll your own system for handling timeouts and preventing cascading, but it's all fairly simple. There's tons of info in the Memcached FAQ, or feel free to ask and I'll post some code examples. Memcached can also act as a session handler, which is great if you have lots of users or more than one server.
Otherwise disc caching is good if you only have one server or don't mind generating separate caches on each server. It is generally faster than memcached as it doesn't have the network overhead (unless you have memcached on the same server). There are plenty of good disc caching frameworks, but probably the best are Pear Cache_Lite and APC.
APC also has the added advantage that it can cache your compiled PHP code which may help on high-performance websites.
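As a rough illustration of preventing the cascading mentioned for memcached above (many requests regenerating the same expired entry at once), here is a Python sketch using a per-key lock inside one process. StampedeGuard is a hypothetical name and the dict stands in for the cache; across several PHP processes the same idea is usually built on memcached's atomic add command acting as the lock, which is not shown here.

```python
import threading
import time


class StampedeGuard:
    """Lets only one caller rebuild a missing entry; the others wait for it."""

    def __init__(self):
        self._cache = {}                    # stands in for the real cache
        self._locks = {}
        self._locks_guard = threading.Lock()

    def _lock_for(self, key):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, rebuild):
        value = self._cache.get(key)
        if value is not None:
            return value
        with self._lock_for(key):
            value = self._cache.get(key)    # another thread may have rebuilt it
            if value is None:
                value = rebuild()
                self._cache[key] = value
            return value


guard = StampedeGuard()
rebuilds = []


def slow_render():
    time.sleep(0.05)                        # simulate an expensive template render
    rebuilds.append(1)
    return "<html>page</html>"


threads = [threading.Thread(target=guard.get, args=("page:1", slow_render))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Eight concurrent requests for the same missing page trigger a single render; the rest wait briefly and then read the fresh entry.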
I'm working on some old(ish) software in PHP that maintains a $cache array to reduce the number of SQL queries. I was thinking of just putting memcached in its place and I'm wondering whether or not to get rid of the internal caching. Would there still be a worthwhile performance increase if I keep the internal caching, or would memcached suffice?
According to this blog post, the PHP internal array is way faster than any other method:
Cache Type                            Cache Gets/sec
Array Cache                           365000
APC Cache                             98000
File Cache                            27000
Memcached Cache (TCP/IP)              12200
MySQL Query Cache (TCP/IP)            9900
MySQL Query Cache (Unix Socket)       13500
Selecting from table (TCP/IP)         5100
Selecting from table (Unix Socket)    7400
It seems likely that memcache (which is implemented on the metal) would be faster than some php interpreted caching scheme.
However: if it's not broken, don't fix it.
If you remove the custom caching code, you might have to deal with other code that depends on the cache. I can't speak for the quality of the code you have to maintain but it seems like one of those "probably not worth it" things.
Let me put it this way: Do you trust the original developer(s) to have written code that will still work if you rip out the caching? (I probably wouldn't)
So unless the existing caching is giving you problems I would recommend against taking it out.
There's an advantage in using memcache vs local caching if:
1) you have multiple webservers running off the same database, and have memcache set up to run across multiple nodes
2) the database does not implement query result caching or is very slow to access
Otherwise, unless the caching code is very poor, you shouldn't expect to see much performance benefit.
HTH
C.
I need to run Linux-Apache-PHP-MySQL application (Moodle e-learning platform) for a large number of concurrent users - I am aiming 5000 users. By concurrent I mean that 5000 people should be able to work with the application at the same time. "Work" means not only do database reads but writes as well.
The application is not very typical, since it does a lot of inserts/updates on the database, so caching techniques are not helping too much. We are using the InnoDB storage engine. In addition, the application is not written with performance in mind. For instance, one Apache thread usually occupies about 30-50 MB of RAM.
I would be grateful for information on what hardware is needed to build a scalable configuration that is able to handle this kind of load.
We are using right now two HP DLG 380 with two 4 core processors which are able to handle much lower load (typically 300-500 concurrent users). Is it reasonable to invest in this kind of boxes and build cluster using them or is it better to go with some more high-end hardware?
I am particularly curious
how many and how powerful servers are needed (number of processors/cores, size of RAM)
what network equipment should be used (what kind of switches, network cards)
any other hardware, like particular disc storage solutions, etc, that are needed
Another thing is how to put everything together, that is, what the optimal architecture is. Clustering with MySQL is rather hard (people are complaining about MySQL Cluster, even here on Stackoverflow).
Once you get past the point where a couple of physical machines aren't giving you the peak load you need, you probably want to start virtualising.
EC2 is probably the most flexible solution at the moment for the LAMP stack. You can set up their VMs as if they were physical machines, cluster them, spin them up as you need more compute-time, switch them off during off-peak times, create machine images so it's easy to system test...
There are various solutions available for load-balancing and automated spin-up.
If you can make your app fit, you can get use out of their non-relational database engine as well. At very high loads, relational databases (and MySQL in particular) don't scale effectively. The peak load of SimpleDB, BigTable and similar non-relational databases can scale almost linearly as you add hardware.
Moving away from a relational database is a huge step though, I can't say I've ever needed to do it myself.
I'm not so sure about hardware, but from a software point-of-view:
With an efficient data layer that will cache objects and collections returned from the database then I'd say a standard master-slave configuration would work fine. Route all writes to a beefy master and all reads to slaves, adding more slaves as required.
Cache data as objects returned from your data-mapper/ORM and not as HTML, and use Memcached as your caching layer. If you update an object, write it to the db and update it in memcached; the IdentityMap pattern is the best fit for this. You'll probably need quite a few Memcached instances, although you could get away with running these on your web servers.
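A small Python sketch of that update flow, with dicts standing in for the database and for memcached. UserRepository, the user:ID key format and the per-request identity map are assumptions made for this illustration, not part of any particular ORM.

```python
class UserRepository:
    """Data mapper with an identity map in front of a cache and a database."""

    def __init__(self, db, cache):
        self.db = db                  # dict standing in for the master database
        self.cache = cache            # dict standing in for memcached
        self._identity_map = {}       # one object per id for this request

    def find(self, user_id):
        if user_id in self._identity_map:
            return self._identity_map[user_id]
        key = f"user:{user_id}"
        user = self.cache.get(key)
        if user is None:
            user = dict(self.db[user_id])  # cache miss: read from the database
            self.cache[key] = user
        self._identity_map[user_id] = user
        return user

    def save(self, user_id, user):
        self.db[user_id] = dict(user)      # write to the db first...
        self.cache[f"user:{user_id}"] = user  # ...then refresh the cached copy
        self._identity_map[user_id] = user


db = {1: {"name": "Ann"}}
cache = {}
repo = UserRepository(db, cache)
user = repo.find(1)
same_user = repo.find(1)              # identity map returns the very same object
repo.save(1, {"name": "Ann B."})
```

Updating the cache on write, rather than only deleting the key, keeps the next read fast, at the cost of having to update every cached representation of the object.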
We could never get MySQL clustering to work properly.
Be careful with the SQL queries you write and you should be fine.
Piotr, have you tried asking this question on moodle.org yet? There are a couple of similar scoped installations whose staff members answer that currently.
Also, depending on what your timeframe for deployment is, you might want to check out the moodle 2.0 line rather than the moodle 1.9 line, it looks like there are a bunch of good fixes for some of the issues with moodle's architecture in that version.
also: memcached rocks for this. php acceleration rocks for this. serverfault is probably the better *exchange site for this question though
I am creating a new PHP framework depending on Zend Framework.
It will be a general purpose MVC framework for web development.
I am worried about 2 aspects:
Logging:
Should I use logging? Are there any substantial performance problems when using logging?
Caching database queries:
I am caching some queries from database.
I am concerned about caching user-related information. Suppose there is some information related to users, like their personal info, etc.
If I cache such data, a cache file will be generated in my data folder for every user. Now suppose there are 10,000 - 20,000 users online within a 2-hour span of time. This means that there will be 20,000 files in my folder.
My question is: will this affect the performance of my server? Is there any upper limit on how many files a folder can have on a server?
Do not use a file-based cache. File system operations are exceptionally slow: http://imgur.com/X1Hi1.gif . Use memcached. Contrary to what the above post says, you don't need a lot of memory: the amount you need is proportional to how much you want to store, and memcached can cull data based on access frequency.
1) You definitely want logging, I'd recommend xdebug available at http://www.xdebug.org/. You can read further about the performance overheads at their site. (plus it integrates nicely with Eclipse's PHP version.)
2) I'm not really sure I'd want to cache much user information, but memcache is probably one of the better choices for caching in php (http://se2.php.net/memcache). And yeah, there's no limit on file number, and you'll probably not be going over the 32-bit filesize limit either =)
Caching is a real problem; it's almost impossible to get it right from a user/programmer perspective. I wouldn't cache things as simple as user data. This is already cached in the database. Focus more on complex queries and complete webpages (or parts of them).
Unless you have a page like stackoverflow, where I see very few ways to cache anything, you have to search hard and check your logfiles to see what users do on your site, and you will soon see some hotspots.
I don't recommend Memcache unless you have a lot of memory (> 8GB) on your machine. Memcache works best if you throw in Memcache servers with 16 GB doing nothing else than caching things.
For smaller sites, hardware and requirements you should consider APC as this is a very low overhead cache for data and it speeds up the execution of php at the same time (you don't want to run a production server without a bytecode cache).