In our current use of memcached, we are running into problems on a high-volume server because so much time is spent setting up and tearing down connections to our memcached server. Would using persistent connections to memcached help alleviate this problem?
Also, what is the preferred way to set up and use persistent memcached connections? I was thinking of setting a "pool_size" variable, then randomly choosing a number from 1 to $pool_size and using that connection:
$mem = new Memcached(rand(1, $pool_size));
Either I am looking in the wrong place or there is not a lot of information on this out there.
Both pecl/memcache and pecl/memcached support persistent connections per process. However, a bug affecting persistent connections does exist in pecl/memcached at this time (see the memory-leak answer below).
The PHP client doesn't handle persistent connections. You either need to use your pooling idea, or use a third-party memcached client for PHP that supports persistent connections, like this one:
http://github.com/andreiz/php-memcached/tree/master
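For what it's worth, here is a minimal sketch of the pooling idea from the question using pecl/memcached's persistent_id constructor argument (the pool size and server address are placeholders):

$pool_size = 4;
// Instances constructed with the same persistent_id reuse one
// connection across requests within a PHP process.
$mem = new Memcached('pool_' . rand(1, $pool_size));
// The server list persists along with the connection, so only add
// servers the first time this id is seen.
if (count($mem->getServerList()) === 0) {
    $mem->addServer('127.0.0.1', 11211);
}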
I have read that the persistent connections feature is broken in the "memcached" PHP extension.
First: the "persistent" connection is not destroyed. (This is ok.)
Second: when you try to reuse it, it creates a new one! (This is bad!)
Result: memory leaks, increasingly consuming all available RAM.
Check here: http://brian.moonspot.net/php-memcached-issues
As I said, I haven't experienced this myself - I just read this information in the linked article.
Related
Besides the drawback that restarting memcached loses all sessions and logs users out, what other drawbacks are there to using memcached for storing PHP session data instead of files? Any security concerns? Is performance better with memcached than with standard files on disk?
Although many have been able to optimize database performance through the use of Memcached, it may not be the best solution for every situation.
Some of the drawbacks of Memcached:
Size limits (items are capped at 1 MB by default)
Not much documentation support
Volatility (if a Memcached server instance crashes, any session data stored in it is gone)
Security (there is no authentication built into Memcached)
Still, Memcached is a good choice in many apps for the following reasons:
Memcached never blocks, and can compensate for the performance cost of a database's ACID guarantees.
Memcached is cross-platform
Cross-DBMS
It's cheap
Let's look at the brighter side!
Not a security concern specific to using memcached for sessions, but rather something I often come across: you absolutely must make sure that your memcached instances are either using Unix sockets or, if they're bound to a port, that the port is blocked from the outside. Otherwise, people can just telnet in and view, modify and delete (session) data.
Also, as the name implies, it is a caching solution, not a storage solution. As such, if you decide to use memcached for session storage, you ought to have it backed by a database or file storage, so that on a cache miss (an entry deleted due to timeout, manual removal, a flush, or because the assigned memory was full and it got pruned) it can check a more persistent type of storage before saying "nope, it isn't there".
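For completeness, a minimal hedged sketch of pointing PHP's session handler at memcached via the pecl/memcached extension (the host and port are placeholders):

// Must be set before session_start(); requires pecl/memcached.
ini_set('session.save_handler', 'memcached');
ini_set('session.save_path', '127.0.0.1:11211');
session_start();
$_SESSION['user_id'] = 42; // now stored in memcached, subject to eviction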
I just ran a test creating 1000 non-persistent connections to MongoDB via nginx/PHP FastCGI, which took about 2.1 seconds on my dev machine. I then tried the same test using persistent connections: same result. I think I read somewhere that persistence in the PHP driver is always enabled now anyway. Next, I tried storing the connection in APC, which resulted in a 7-9ms response time after the first request. Now I'm wondering a few things here:
There's almost never a time I can think of where I'd want to create more than one connection in my app at once, and from what I understand, with a persistent connection new connections are created as needed by the Mongo driver anyway.
Creating a single connection seems to take about the same time as pulling the stored connection object from APC. Will caching the connection object ever really provide a benefit?
Caching the connection would of course still require some sort of check that it's even still a valid connection. In performing this check each time, I wonder if it would negate the performance gain (if any) from pulling it from the cache.
I can't seem to find any material covering any of this, so I'm assuming that's because I'm confused in my understanding. Have any of you experimented with this?
Thanks!
First, as far as I know, APC serializes data while storing it, so it would not make any sense to store a connection in APC.
Then, persistent connections will be reused by the PHP process across requests, whereas a non-persistent connection will be re-established for each request the PHP process receives.
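As a hedged illustration of the serialization point (using PDO, since I can't speak to the Mongo driver's internals; the DSN and credentials are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');
try {
    // APC serializes objects on store, and a live connection cannot
    // survive serialization (PDO refuses to be serialized at all).
    apc_store('conn', $pdo);
    $cached = apc_fetch('conn'); // even if something comes back, it
                                 // holds no live socket
} catch (Exception $e) {
    echo 'cannot cache a live connection: ' . $e->getMessage() . "\n";
}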
I'm implementing an application that will have a lot of clients querying lots of small data packages from my webserver. Now I'm unsure whether to use persistent data connections to the database or not. The database is currently on the same system as the webserver and could connect via the socket, but this may change in the near future.
As far as I know, persistent connections were dropped from mysqli a few PHP releases ago because they behaved suboptimally; in the meantime the feature seems to be back again (via the "p:" host prefix).
Given my scenario, I suppose I have no way to handle thousands of queries per minute other than with lots of persistent connections and a MySQL configuration that reserves only minimal resources per connection, right?
Thanks for your input!
What happened when you tested it?
With the best will in the world, there's no practical way you can convey all the information required for people to provide a definitive answer in an SO response. However, there is usually very little overhead in establishing a MySQL connection, particularly if the database resides on the same system as its client (in this case the webserver). There's even less overhead if you use a filesystem socket rather than a network socket.
So I'd suggest abstracting all your database calls so you can easily switch between connection types, but write your system to use on-demand connections, and ensure your code explicitly releases the connection as soon as practical; then see how it behaves.
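A rough sketch of that abstraction, assuming mysqli (the class name and credentials are illustrative):

class Db {
    private static $link = null;

    // On-demand: nothing is opened until the first query needs it.
    public static function get() {
        if (self::$link === null) {
            // Swap in 'p:localhost' here to trial persistent connections.
            self::$link = new mysqli('localhost', 'user', 'pass', 'mydb');
        }
        return self::$link;
    }

    // Release explicitly as soon as practical.
    public static function release() {
        if (self::$link !== null) {
            self::$link->close();
            self::$link = null;
        }
    }
}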
C.
Are PHP persistent connections evil?
The problem is there can be only so many connections active between Host “Apache” and Host “MySQL”
Persistent connections usually cause problems in that you hit the maximum number of connections. Also, in your case they don't give a great benefit, since your database server is on the same host. Keep to normal connections for now.
As they say, your mileage may vary, but I've never had good experiences using persistent connections from PHP, with either MySQL or Oracle (both ODBC and OCI8). Every time I've tested it, the system fails to reuse connections. Under high load, I end up hitting the connection limit while hundreds of connections sit idle.
So my advice is that you actually try it and find out whether your set-up is reusing connections properly. If it isn't working as expected, it won't be a big loss anyway: opening a MySQL connection is not particularly costly compared to other DBMSes.
Also, don't forget to reset all relevant settings when appropriate (whatever session value you change will be waiting for you the next time you establish a connection and happen to reuse that one).
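One hedged way to run that test with mysqli (credentials are placeholders): compare MySQL's CONNECTION_ID() across requests; if the id keeps changing even with a persistent connection, reuse isn't happening.

$db = new mysqli('p:localhost', 'user', 'pass', 'test'); // 'p:' = persistent
$row = $db->query('SELECT CONNECTION_ID()')->fetch_row();
// A stable id across successive requests means the connection is
// actually being reused; a new id every time means it isn't.
error_log('mysql thread id: ' . $row[0]);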
When querying the DB, is it plausible to feel extremely paranoid? I go as far as opening and closing the MySQL connection every time a new query has to be run. I am afraid that (especially with Ajax-enabled pages) this causes significant performance degradation.
Should I continue with this method, or at least open and close connections once per page (instead of per query)? (I'm writing in PHP, btw.)
Thank you.
Yes, the overhead of connecting every time will be considerable. I suggest you just close it once you're done; it's very unlikely that simply having an open connection without running queries on it will open you up to vulnerabilities.
I'd recommend a connection pool if it's possible with PHP. It's a way to simultaneously maximize performance and minimize connection time.
You should not close MySQL connections immediately. It's better to use a single connection for the entire PHP script; PHP will automatically close the connection at the end if you don't do so explicitly.
Opening a new connection incurs a small time penalty, particularly if MySQL lives on another server on the network. New TCP connections require a three-way handshake, and each TCP connection consumes kernel resources for at least two minutes after it closes (the TIME_WAIT state).
Although PHP doesn't support full-fledged connection pooling, the procedural MySQL API does support persistent connections. See mysql_pconnect() for more details. At my office we use pconnect to avoid crashing the TCP stacks on our high-traffic PHP site.
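For reference, the legacy call looks roughly like this (ext/mysql is deprecated and was removed in PHP 7; the credentials are placeholders):

// Reuses an existing persistent connection to the same
// host/user/password combination if one is available.
$link = mysql_pconnect('localhost', 'USER', 'PASSWORD');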
Consider PHP Data Objects (PDO), which can keep a persistent connection for you. You really only connect with credentials once and then cache the connection. Create the connection with something along these lines:
$dbh = new PDO('mysql:host=HOSTNAME;dbname=DBNAME',
               'USER', 'PASSWORD',
               array(PDO::ATTR_PERSISTENT => true));
For more information, see this page in the PHP manual.
Can't speak for PHP, but when I write this kind of thing in Apache/Perl I generally do two things to boost performance:
a) allow MySQL handles to stay open as long as the Apache daemon does.
b) keep cached statement handles around (using an LRU cache).
On production servers, some of those MySQL handles have been around for days, serving many thousands of quite complicated SQL queries. It isn't a problem.
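For PHP readers, a hedged analogue of (b) using PDO (a plain array keyed by the SQL text rather than a true LRU cache):

$stmtCache = array();

function cachedPrepare(PDO $dbh, $sql) {
    global $stmtCache;
    // Prepare each distinct statement once, then reuse the handle
    // on subsequent calls.
    if (!isset($stmtCache[$sql])) {
        $stmtCache[$sql] = $dbh->prepare($sql);
    }
    return $stmtCache[$sql];
}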
Am I right in thinking that until I am able to afford dedicated servers or have any spare servers, I could successfully run a small number of memcached servers through EC2?
With the announcement of the new auto-scaling and load balancing by Amazon today, do you guys think this would be a viable option?
And what would be the basic technical steps you'd recommend me taking?
Thanks
Currently, I have one dedicated server and no memcached servers. I want to use the power of EC2 to setup a few instances and run such memcached servers. That's my current setup.
Load balancing has nothing to do with Memcached -- the client uses a hashing algorithm to decide which server each key lives on.
I highly recommend not using autoscaling with Memcached -- adding servers breaks the hashing algorithm and invalidates your cache. Data will go missing and you'll have to recache.
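A hedged illustration of why that happens, using naive modulo hashing (real clients use more sophisticated schemes, but the principle is the same):

$servers = array('10.0.0.1', '10.0.0.2', '10.0.0.3');
$key = 'user:42';
// The target index depends on the pool size, so adding or removing a
// server remaps most keys to different servers and misses the cache.
$index = crc32($key) % count($servers);
echo "key '$key' maps to {$servers[$index]}\n";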
You'll want to check the latency from your servers to EC2 -- if it's more than 50ms, you'll be hurting your performance significantly. Well, I'd assume anyway.
You can pull multiple keys (see here for how) with one request to reduce the latency effect, but you'll still take the initial hit. It also means you need to know all the keys you're going to get before you make the call; otherwise each request adds 50ms (or more) to the execution time of your script.
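With pecl/memcached, that multi-key fetch looks roughly like this (the server address and keys are placeholders):

$mem = new Memcached();
$mem->addServer('ec2-cache.example.com', 11211);
// One round-trip for all three keys instead of three latency hits.
$values = $mem->getMulti(array('user:1', 'user:2', 'config:site'));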
Consider the data you're trying to cache. Is a 64 MB slab large enough to help you? You can probably run it on your main servers.
To really take advantage of memcached you need to have your memcache communicating with your code as quickly as possible. You may want to investigate how much latency you'd have between the EC2 servers and your own.
Ultimately, you might be better served upping the ram on your current box to something like 4 gigs (should run you about 50 bucks) and putting memcached on the main server. The documentation actually recommends that you install memcached on the same server that is serving out requests. Depending on the size of your application and what it does, a memcached instance with a gig or two may be way more than what you need.
Also, if you're not using a PHP opcode cache like APC or eAccelerator, adding one will also help.
Recently AWS released a new web service, Amazon ElastiCache. This service is protocol-compliant with Memcached.
For more details refer to : http://aws.amazon.com/elasticache/
How much free memory do you normally have on your current box? Could you not just set up a memcached instance there? I'm thinking that it's possible the latency/overhead/etc. from having remote caches is such that you'd negate any benefits, but perhaps that's not the case.
More generally:
If you want to use any type of caching mechanism, it makes sense to have your servers VERY CLOSE to your cache servers. Example: database servers and Memcached servers should be in the same colocation facility or the same AWS region.
If you use a caching system, it's because you want to improve performance. If you put the caching system far away from your servers, you're basically wasting all the benefits.
Best,