My setup:
4 webservers
Static content server (NFS mount)
2 db servers
2 "do magic" servers
An additional 8 machines designated multi-purpose.
I'm writing a wrapper for three caching mechanisms so that they can be used in a somewhat normalized manner: Filesystem, Memcached and APC. I'm trying to come up with examples for use (and what to actually put in each cache).
File System
Handles content that we generate and then statically serve. RSS feeds, old report data, user specific pages, etc... This is all cached to the static server.
Memcached
PHP session data, MySQL query results, generally things that need to be available across our systems. We have 8 machines that can be included in the server pool.
APC
I have no idea. The two "do magic" servers are not part of any distributed system, so it seems likely that they could cache query results in APC and work from there. Past that, I can't think of anything.
Query Caching
Given the nature of our SQL use, query caching reduces performance. I've disabled this.
In general, what types of data should be stored where? Does this setup even make sense?
Is there any use for an APC data cache in a distributed system (I can't think of one)?
Is there something that I'm missing that would make things easier or more efficient?
Edit: I've figured out what Pascal was saying, finally. I had it stuck in my head that I would only be moving a portion of my config / whatever to APC, and still loading the rest of the file from disk. Any other suggestions?
I'm using the same kind of caching mecanism for some projects ; and we are using APC + memcached as caching systems.
There are two/three main differences between APC and memcached, when it comes to caching data :
APC access is a bit faster (something like 5 times faster than memcached, if I remember correctly), as it's only local -- i.e. no network involved.
Using APC, your cache is duplicated on each server ; using memcached, there is no duplication accross servers
whic means that memcached ensures that all servers have the same version of the data ; while data stored in APC can be different on each server
We generally use :
APC for data that has to be accessed very often, is quick to generate, and :
either is not modified often
or it doesn't matter if it's not identical on all servers
memcached for data that takes more time to generate, and/or is less used.
Or for data for which modifications must be visible immediatly (i.e. when there is a write to the DB, the cached entry is regenerated too)
For instance, we could :
Use APC to store configuration variables :
Don't change often
Are accessed very often
Are small
Use memcached for content of articles (for a CMS application, for example) :
Not that small, and there are a lot of them, which means it might need more memory than we have on one server alone
Pretty hard/heavy to generate
A couple of sidenotes :
If several servers try to write to the same file that's shared via NFS, there can be problems, as there is no locking mecanism on NFS (as far as I remember)
APC can be used to cache data, yes -- but the most important reason to use it is it's opcode caching feature (can save a large amount of CPU on the PHP servers)
Entries in memcached are limited in size : you cannot store an entry that's bigger than 1M (I've sometimes run into that problem -- rarely, but it's not good when it happens ^^ )
Related
I'm working with a couple of Web Servers behind a Load Balancer and I can enable Sticky Sessions to hold a user to the one specific Web Servers - this will work.
I have been reading about PHP Sessions & MemCache. I must say what I've read is a touch confusing as some pages say its a good idea and others the opposite.
Questions:
is it possible to keep php sessions in memcache?
is it better to use sticky sessions over memcache?
what are the problems with php sessions in memcache - note: I can get enough cache (amazon so its expandable).
1: YES. And I strongly recommend storing PHP sessions in Memcached. Here's why:
Memcached is great for storing small chunks of data that are frequently accessed by the database and filesystem.
Memcached was designed specifically for sessions. It was originally the brainchild of the lead developer of livejournal.com and later used to also cache the content of users' posts. The benefit was immediate: most of the action was taking place in memory. Page load times greatly improved.
Thankfully, PHP and Apache have an easy implementation to handle sessions with Memcached. Simply install with a few shell commands
example for Debian:
sudo apt-get -t stable install php7.4-memcached
and
change your php.ini settings to something similar to:
(taken from https://www.php.net/manual/en/memcached.sessions.php)
session.save_handler = memcached
; change server:port to fit your needs...
session.save_path = "localhost:11211"
The key is the session.save_path
It will no longer point to a relative file path on your server.
APC was mentioned - APC for the caching of .php files used by the program. APC and Memcached will reduce IO significantly and leave Apache/Nginx free to server resources, such as images, faster.
2: No
3: The fundamental disadvantage of using Memcached is data volatility
Session data is not persistent in Memcached. So if and when the server crashes, all data in memory is lost. Everyone will have to log in again.
And then you have memory consumption...
Remember: the sessions are stored in the memory. If your website handles a large number of concurrent users, you may have to shell out a little extra money for a larger memory allocation.
1. Yes, it is possible to keep PHP sessions in memcached.
The memcache extension even comes with a session handler that takes very little configuration to get up and running. http://php.net/manual/en/memcached.sessions.php
2. Memcache/Sticky Sessions
I don't really know which is "better". I feel this is going to be one of those "it depends" answers. It likely depends on your reasons for load balancing. If a small number of users cause lots of load each, or if it's a large number causing a small load each.
3. Cons of Memcache
There are probably 2 main cons to using memcache for sessions storage.
Firstly, it is volatile. This means, if one of your memcached instances is restarted/crashes etc. any sessions stored in that instance are lost. While if they were using traditional file based sessions, they will be still there when the server returns.
Secondly and probably more relevant, memcached doesn't guarantee persistance, it is only meant to be a cache. Data can be purged from memcached at any time, for any reason. While, in reality, the only reasons data should be purged is if the cache is nearing its size limits. The least recently accessed data will be expelled. Again, this might not be an issue, as the user is probably gone if their session is stale, but it depends on your needs.
If you want to use "memcacheD" extension not "memcache" (there are two different extensions) for session control, you should pay attention to modify php.ini.
Most web resources from Google is based on memcache because it's earlier version than memcacheD. They will say as following:
session.save_handler = memcache
session.save_path = "tcp://localhost:11211"
But it's not valid when it comes to memcacheD.
You should modify php.ini like that:
session.save_handler = memcached
session.save_path = "localhost:11211"
There is no protocol indentifier.
From: http://php.net/manual/en/memcached.sessions.php#99646
As my point of view its not recommended storing sessions in Memcached.If a session disappears, often the user is logged out,If a portion of a cache disappears or either due to a hardware crash it should not cause your users noticable pain.According to the memcached site, “memcached is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.” So while developing your application, remember that you must have a fall-back mechanism to retrieve the data once it is not found in the Memcached server.
I'm using PHP APC on production servers of a web service with 10M's hits/day successfuly for a long time.
I'm considering offloading much more data to the APC local cache.
Theoretically it looks to me that since APC call is mainly a local memory access. It should not become an issue to call it 10,000s of times/sec. As far as I can tell its limits can be in the memory size but as long as the server has free CPU it should not have performance or corruption issues at high rates.
Is there any limit I'm not aware of that might prevent me from using APC's local object cache in very high rate on app server (ubuntu).
Update:
Apparently according to the answers below my question wasn't clear. I'm not looking for alternaive caching options (memcache,redis etc..). My question is whether there is any concern or limit in using local APC in very high rates and read concurrency.
I'm personally a big fan of using memcached for this kind of storage. It has several advantages:
It's a program that focuses entirely on the storage, and development of memcached will always focus on that. APC is primarily a cache for code, which just happens to offer some access to user storage.
When you reload or restart Apache (or whatever webserver you use), APC's cache gets emptied. When you use a standalone solution such as memcached, you can control when the cache gets emptied. This is really something that was very important in my case, as I sometimes have to make changes to Apache's configuration and really don't want to clear the cache when I do, as it creates a large CPU spike (loading data into the cache again).
It has the possibility to create a distributed cache, making it more scalable. When you have to add a second server because your website becomes large, you don't want two caches which cache the same stuff. memcached scales well, while APC's cache doesn't.
There are many other advantages of using memcached over APC's user cache, but for me these were the primary three reasons to not use APC's user cache. I do use APC, of course, just not the user cache.
Currently we are having a site which do a lot of api calls from our parent site for user details and other data. We are planning to cache all the details on our side. I am planning to use memcache for this. as this is a live site and so we are expecting heavier traffic in coming days(not that like FB but again my server is also not like them ;) ) so I need your opinion what issues we can face if we are going for memcache and cross opinions of yours why shouldn't we go for it. Any other alternative will also help.
https://github.com/steveyen/community-site/blob/master/db_doc/main/WhyNotMemcached.wiki
Memcached is terrific! But not for every situation...
You have objects larger than 1MB.
Memcached is not for large media and streaming huge blobs.
Consider other solutions like: http://www.danga.com/mogilefs
You have keys larger than 250 chars.
If so, perhaps you're doing something wrong?
And, see this mailing list conversation on key size for suggestions.
Your hosting provider won't let you run memcached.
If you're on a low-end virtual private server (a slice of a machine), virtualization tech like vmware or xen might not be a great place to run memcached. Memcached really wants to take over and control a hunk of memory -- if that memory gets swapped out by the OS or hypervisor, performance goes away. Using virtualization, though, just to ease deployment across dedicated boxes is fine.
You're running in an insecure environment.
Remember, anyone can just telnet to any memcached server. If you're on a shared system, watch out!
You want persistence. Or, a database.
If you really just wish that memcached had a SQL interface, then you probably need to rethink your understanding of caching and memcached.
You should implement a generic caching layer for the API calls first. Within the domain of the caching layer you can then change the strategy which backend you want to use. If you then see that memcache is not fitting you can actually switch (and/or testwise monitor how it works compared with other backends).
Even better, you can first code this build upon the filesystem quite easily (which has multiple backends, too) without the hurdle to rely on another daemon, so already get started with caching - probably file system is already enough for your caching needs?
Memcache is fast, but it also can use a lot of memory if you want to get the most out of it. Whenever you hit the disk for I/O, you're increasing the latency of your application. Pull items that are frequently accessed and put them on memcache. For my large scale deployments, we cache sessions there because DB is slow as well as filesystem session storage.
A recommendation to add to your stack is APC. It caches PHP files and lessens the overall memory usage per page.
Alternative: Redis
Memcached is, obviously, limited by your available memory and will start to jettison data when memory thresholds are reached. You may want to look redis which is as fast (faster in some benchmarks) as memcached but allows the use of both volatile and non-volatile keys, more complex data structures, and the option of using virtual memory to put Least Recently Used (LRU) key values to disk.
What is the advantage of PHP data cache ?
Where i can use that , is it good only for browser search or
it is good for data export to csv or text also .
How can i achieve data cache using PHP?
Theres a Bunch of Caches for PHP, for starters i would recommend APC http://php.net/manual/de/book.apc.php since scv and text are just strings every Cache Backend including APC is perfectly for caching them.
If you're generating data and you want to prevent computation of the same data over and over again, a good cache to look into is Memcached. Memcached is a piece of software that you run on your server. It stores anything you give it in key/value format in memory, and returns those values when you ask for them on subsequent requests. It isn't persistent (if your server goes down, everything is wiped out), though this can be useful for debugging or management purposes.
Companies like Digg and Facebook, which both rely heavily on PHP, use Memcached extensively to make sure their respective sites are fast.
Personally, I use Memcached to store things like URL routing information (40ms/request speed increase), feed caching (1-3sec/request speed increase), and social graph caching (300-400ms/request speed increase). Depending on what kind of computation you're performing, you can see various types of increases. Generally, for reasonably sized data sets (i.e.: a 1000+ line CSV file), you'll see pretty substantial increases in speed. Keep in mind, though, that Memcached uses RAM and not disk space for storage, so you can easily run out of memory if it is not properly configured. Placing Memcached on a separate server can help to alleviate this, especially on servers with PHP scripts that use a lot of memory.
Hope this helps!
I need a simple way for multiple running PHP scripts to share data.
Should I create a MySQL DB with a RAM storage engine, and share data via that (can multiple scripts connect to the same DB simultaneously?)
Or would flat files with one piece of data per line be better?
Flat files? Nooooooo...
Use a good DB engine (MySQL, SQLite, etc). Then, for maximum performance, use memcached to cache content.
In this way, you have the ease and reliability of sharing data between processes using proven server software that handles concurrency, etc... But you get the speed of having your data cached.
Keep in mind a couple things:
MySQL has a query cache. If you are issuing the same queries repeteadly, you can gain a lot of performance without adding a caching layer.
MySQL is really fast anyway. Have you load-tested to demonstrate it is not fast enough?
Please don't use flat files, for the sanity of the maintainers.
If you're just looking to have shared data, as fast as possible, and you can hold it all in RAM, then memcached is the perfect solution.
If you'd like persistence of data, then use a DBMS, like MySQL.
Generally, a DB is better, however, if you are sharing a small, mostly static amount of data, there might be performance benefits (and simplicity) of doing it with flat files.
Anything other than trivial data sharing and I would pick a DB however.
1- Where the flat file can be usefull:
Flat file can be faster than a database, but in very specific applications.
They are faster if the data is read from start to finish without any search or write.
If the data dont fit in memory and need to be read fully to get the job done, It 'can' be faster than a database. Also if there is lot more write than read, flat file also shine, most default databases setups will need to make the read queries wait for the write to finish in order maintain indexes and foreign keys. Making the write queries usually slower than simple reads.
TD/LR vesion:
Use flat files for jobs based system(Aka, simple logs parsing), not for web searches queries.
2- Flat files pit falls:
If your going with a flat file, you will need to synchronize your scripts when the file change using custom lock mechanism. Which can lead to slowdown, corruption up to dead lock if you have a bug.
3- Ram based Database ?
Most databases have in memory cache for query results, search indexes, making them very hard to beat with a flat file. Because they cache in memory, making it run entirely from memory is most of the time ineffective and dangerous. Better to properly tune the database configuration.
If your looking to optimize performance using ram, I would first look at running your php scrips, html pages, and small images from a ram drive. Where the cache mechanism is more likely to be crude and hit the hard drive systematically for non changing static data.
Better result can be reach with a load balancer, clustering with a back plane connections up to ram based SAN array. But that's a whole other topic.
5- can multiple scripts connect to the same DB simultaneously?
Yes, its called connection pooling. In php (client side) its the function to open a connection its mysql-pconnect(http://php.net/manual/en/function.mysql-pconnect.php).
You can configure the maximum open connection in php.ini I think. Similar setting on mysql server side define the maximum of concurrent client connections in /etc/mysql/my.cnf.
You must do this in order to take advantage of parrallel processessing of the cpu and avoid php script to wait the query of each other finish. It greatly increase performance under heavy load.
There is also one connection pool/thread pool in Apache configuration for regular web clients. See httpd.conf.
Sorry for the wall of text, was bored.
Louis.
If you're running them on multiple servers, a filesystem-based approach will not cut it (unless you've got a consistent shared filesystem, which is unlikely and may not be scalable).
Therefore you'll need a server-based database anyway to allow the sharing of data between web servers. If you're serious about either performance or availability, your application will support multiple web servers.
I would say that the MySql DB would be better choice unless you have some mechanism in place to deal with locks on the flat files (and some way to control access). In this case the DB layer (regardless of specific DBMS) is acting as an indirection layer, letting you not worry about it.
Since the OP doesn't specify a web server (and PHP actually can run from a commandline) then I'm not certain that the caching technologies are what they're after here. The OP could be looking to do some sort of flying data transform that isn't website driven. Who knows.
If your system has a PHP cache (that caches compiled PHP code in memory, like APC), try putting your data into a PHP file, as PHP code. If you have to write data, there are some security issues.
I need a simple way for multiple
running PHP scripts to share data.
APC, and memcached are both good options depending on context. shared memory may also be an option.
Should I create a MySQL DB with a RAM
storage engine, and share data via
that (can multiple scripts connect to
the same DB simultaneously?)
That's also a decent option, but will probably not be as fast as APC or memcached.
Or would flat files with one piece of
data per line be better?
If this is read-only data, that's a possibility -- but may be slower than any of the options above. Especially if the data is large. Rather than writing custom parsing code, however, consider simply building a PHP array, and include() the file.
If this is a datastore that may be accessed by several writers simultaneously, by all means do NOT use a flat file! Writing to a flat file from multiple processes is likely to lead to file corruption. You can lock the file, but you risk lock contention issues, and long lock wait times.
Handling concurrent writes is the reason applications like mysql and memcached exist.