600+ memcache req/s problems - help!

600+ memcache req/s problems - help! - php

I am running memcached on my server and when it hits 600+ req/s it becomes unstable and causes a big load of problems. It appears when the request rate gets that high, my PHP applications at random times are unable to connect to the memcache server, causing slow load times which makes nginx and php-fpm freak out and I receive a bunch of 104: Connection reset by peer errors in my nginx logs.
I would like to point out that in my memcache server I have 'hot objects' - objects that at times receive 90% of the memcache requests. I also noticed when so many requests hit a single object, it slightly adds a little more load time to the overall page (when it manages to load).
I would greatly appreciate any help to this problem. Thanks so much!

Switch away from using TCP sockets and going to UNIX sockets (assuming you are on a unix based server)
Start memcached with a socket enabled:
Add -s /tmp/memcached.socket to your memcached startup line (Note, sockets disables networking support)
Then in PHP, connect using persistent connections, and to the new memcache socket:
$memcache_obj = new Memcache;
$memcache_obj->pconnect('unix:///tmp/memcached.socket', 0);
Another recommendation, if you have multiple "types" of cached objects, start a memcached instance for each "type" and distribute your hot items amongst them.
Drupal does this, you can see how their config file and memcached init is setup here.
Also, it sounds to me like your memcached timeout is set WAY to high. If it's anything above 1 or 2 seconds, you can lock scripts up. The timeout should be reached, and the script should default to retrieving the object via another method (SQL, file, etc)
The other thing is verify that your memcache isn't being put into a swap file, if your cache is smaller than your average free ram, try starting memcache with the -k option, this will force it's cache to always stay in ram and can't be swapped.
If you have a multi-core server, also make sure memcached is compiled with thread support, and enable it using -t <numcores>

600 requests per second is profoundly low for memcached.
If you're establishing a connection for every request, you'll spend more time connecting than requesting and burn through your ephemeral ports very rapidly which might be the problem you're seeing.

There's a couple of things you could try:
If you have memcached running locally, you can use the named socket 'localhost' instead of '127.0.0.1'
Use persisntent connections

Related

change the 'max number of client reached' redis error message to ''

is it possible to change the error message 'max number of client reached' to null or empty string?
I'm using redis as a cache for my DB values and in cases that I can't get the values from the cache I will get it from the DB.
if I could configure it in the redis it self it would be the best option for me because my code won't have to change in order to support that edge case.
if someone has some tips on how to avoid such errors it would be nice as well :) (I'm using php scripts with predis package)

The error message max number of clients reached is clearly indicate that Redis is reached client limit and unable to serve any new requests.
this issue probably can be related to incorrect use of Predis\Client in code. Instead to create a connection object once (singleton) and use it across process lifetime. The code probably create a new object on every request to Redis and keep all these connections open.
another thing is worth to check how php processes are managed by a web server. The web server (e.g. apache prefork, nginx php-fpm) might leave processes for a long time both holding connections to Redis and exhaust server resources (mem, cpu).
if nothing from above is true - the issue (bug) might be in the predis library.
Bottom line: the code/web server exhaust maxclients limit.
If you don't have control over code/web server (e.g. nginx), to reduce amount of error messages you can:
increase maxclients over 10k (depends on your Redis server resources). This will reduce frequency of error messages.
consider to enable (disabled by default) connection timeout (use it with cautious, as your code may assume that connections are never timeout). This will release old connections from a connection pool.
decrease tcp-keepalive from 300 seconds to less than timeout. This will close connections to dead peers (clients that cannot be reached even if they look connected).

How should I handle a "too many connections" issue with mysql?

The web startup I'm working at gets a spike in number of concurrent web users from 5000 on a normal day to 10,000 on weekends. This Saturday the traffic was so high that we started getting a "too many connections" error intermittently. Our CTO fixed this by simply increasing the max_connections value on the tatabase servers. I want to know if using one persistent connection is a better solution here?
i.e. instead of using:
$db = new mysqli('db_server_ip', 'db_user', 'db_user_pass', 'db_name');
We use:
$db = new mysqli('p:db_server_ip', 'db_user', 'db_user_pass', 'db_name');
We're already using multiple MySQL servers and as well as multiple web servers (Apache + mod_php).

You should share the database connection across multiple web requests. Every process that is running on the application server should get an own mysql connection, that is kept open as long as the process is running and reused for every web request that comes in.

From the PHP Docs:
Persistent connections are good if the overhead to create a link to your SQL server is high.
And
Note, however, that this can have some drawbacks if you are using a database with connection limits that are exceeded by persistent child connections. If your database has a limit of 16 simultaneous connections, and in the course of a busy server session, 17 child threads attempt to connect, one will not be able to.
Persistent connections aren't the solution to your problem. Your problem is that your burst usage is beyond the limits set in your database configuration, and potentially your infrastructure. What your CTO did, increasing the connection limit, is a good first step. Now you need to monitor the resource utilization on your database servers to make sure they can handle the increased load from additional connections. If they can, you're fine. If you start seeing the database server running out of resources, you'll need to set up additional servers to handle the burst in traffic.

Too Many Connections
Cause
This is error is caused by
a lot of simultaneous connections, or
by old connections not being released soon enough
You already did SHOW VARIABLES LIKE "max_connections"; and increased the value.
Permanent Connections
If you use permanent or persistent database connections, you have to always take the MySQL directive wait_timeout into account. Closing won't work, but you could lower the timeout. So used resources will be faster available again. Utilize netstat to find out whats going on exactly as described here https://serverfault.com/questions/355750/mysql-lowering-wait-timeout-value-to-lower-number-of-open-connections.
Do not forget to free your result sets to reduce wasting of db server resources.
Be advised to use temporary, short lived connections instead of persistent connections.
Introducing persistence is pretty much against the whole web request-response flow, because it's stateless. You know: 1 pconnect request, causes an 8 hour persistant connection dangling around at the db server, waiting for the next request, which never comes. Multiply by number of users and look at your resources.
Temporary connections
If you use mysql_connect() - do not forget to mysql_close().
Set new_link set to false and pass the CLIENT_INTERACTIVE flag.
You might adjusting interactive_timeout, which helps in stopping old connections blocking up the work.
If the problem persists, scale
If the problem remains, then decide to scale.
Either by adding another DB server and putting a proxy in front,
(MySQL works well with HAProxy) or by switching to an automatically scaling cloud-service.
I really doubt, that your stuff is correctly configured.
How can this be a problem, when you are already running multiple MySQL servers, as well as multiple web servers? Please describe your load balancing setup.
Sounds like Apache 2.2 + mod_php + MySQL + unknown balancer, right?
Maybe try
Apache 2.4 + mod_proxy_fcgi + PHP 5.5/5.6 (php-fpm) + MySQL (InnoDb) + HAProxy or
Nginx + PHP 5.5/5.6 (php-fpm) + MySQL (InnoDb) + HAProxy.

Why should I close or keep Redis connections open?

I'm using Redis in a PHP project. I use phpredis as a client. Sometimes, during long CLI-scripts, I experience PHP segmentation faults.
I've experienced before that phpredis has problems when the connection times out. As my Redis config is configured to automatically close idle connections after 300 seconds, I guess that causes the segmentation fault.
In order to be able to choose whether to increase the connection timeout or default it to 0 (which means "never timeout"), I would like to know what the possible advantages and disadvantages are?
Why should I never close a connection?
Why should I make sure connections don't stay open?
Thanks

Generally, opening a connection is an expensive operation so modern best practices are to keep them open. On the other hand, open connections requires resources (from the database) to manage so keeping a lot of idle connections open can also be problematic. This trade off is usually resolved via the use of connection pools.
That said, what's more interesting is why does PHP segfault. The timeout is, evidently, caused by a long running command (CLI script in your case) that blocks Redis (which is mostly single threaded) from attending to the PHP app's connections. While this is a well-known Redis behavior, I would expect PHP (event without featuring reconnect at the client library) not to s**t its pants so miserably.

The answer to your question much depends on cases of redis usage in your application. So, should your never close a connection with idle connection timeout?
In general no, your should keep it default - 0. Why or when:
Any types of long living application. Such as CLI-script ot background worker. Why - phpredis do not has builded in reconnection feature so your should take care about this by yourself or do not your idle timeout.
Each time your request processed or CLI script die - all connections would be closed by php engine. Redis server close all connection for closed client sockets. You will have no problems like zombie connection or something like that. As extension, phpredis close connection in destructor - so your may be sure connections don't stay open.
p.s. Of course your can implement reconnection insome proxy class in php by yourself. We have redis in high load environment - ~4000 connections per second on instance. After 2.4 version we do not use idle connection timeout. And do not have any types of troubles with that.

Redis request latency

I'm using redis (2.6.8) with php-fpm and phpredis driver and have some trouble with redis latency issues. Under certain load first request to redis from our application takes about 1-1.5s and redis-cli --latency shows the same latency.
I've already checked the latency guide.
We use redis on the same host with Unix sockets
slowlog has no entries longer 5ms
we don't use AOF
redis takes about 3.5Gb memory of 16Gb available (i suppose it's not too much)
our system is not swapping
there is no other process doing disk I/O
I'm using persistent connections and amount of connected client is varying from 5 to 25 (sometimes strikes to 60-80).
Here is the graph.
It looks like problems starts when there are 20 or more simultaneously connected clients.
Can you help me to figure out where is the problem?
Update
I investigated the problem and it seemed like redis did not have enough processor time for some reason to operate properly.
I thoroughly checked communication between php-fpm and redis with the help of network sniffer. Redis received request over tcp but sent the answer back only after one and a half seconds. It obviously signified that the problem is inside redis, that it cannot process so many requests in the given conditions (possibly processor starvation as the processor was only 50% loaded for the whole system).
The problem was resolved by moving redis to other server that was nearly idle. I suppose that we should have played with linux scheduler to make it work on the same server, but have not done it yet.

Bear in mind that Redis is single-threaded. If the operations that you're doing err on the processor-intensive side, your requests could be blocking on each other. For instance, if you're doing HVALS against hashes with very large values, you're going to make all of your clients wait while you pull out all that data and copy it to the output buffer.
Part of what you need to do here (regardless if this is the issue) is to look at all of the commands that you're using and determine the complexity of each command. If you're doing a bunch of O(N) commands against very large amounts of data, it's not impossible that you're simply doing too much stuff at a time.
TL;DR Nobody on here can debug this issue with real certainty without knowing which commands you're using and what your data looks like. But you can look up the time complexity of each method you're using and make sure it's reasonable.

I ran across this in researching an issue I'm working on but thought it might help here:
https://groups.google.com/forum/#!topic/redis-db/uZaXHZUl0NA
If you read through the thread there is some interesting info.

PHP REDIS/MYSQL, issue with concurrent connections

I have worked on NodeJs and Redis before. Since NodeJs is a web server I could maintain a single connection to Redis and all the http requests use same Redis client to connect to Redis.
But in PHP each page upon HTTP request creates a new connection to Redis Server and this is slowing down the performance. How do they maintain the connection state in PHP? It must be same issue with PHP-Mysql too so I guess there are solutions out there?

The way php works, it that it is a program, not a server. Every time you request a page on your web server, PHP is called to run the program. Once the page is done loading, the thread is ended. PHP is not a server, so once a page is done loading, all connections associated with it are terminated. Therefor, every time a page is requested, a new connection to the database has to be made. If you are noticing a performance issue when connecting, you should try php-redis if you are not already doing so.

Let says you are using php-fpm. php-fpm has a master process and run multiple worker process according the pool configuration.
Each worker process is independent (but can use shared resource such as opcache/APUc cache ...), consume CPU and memory (memory is most important factor to tune the pool configuration max-children property).
So yes, 1 HTTP query = 1 php-fpm worker (fresh or reuse) = 1 new socket connection (or reuse persistent connection), to scale:
Redis cluster
Proxy such as HAProxy between php and redis (so you can limit maxconn)
Use local cache such as APCu to limit Redis access (complicate but the most powerful)
Check OS ulimit and opened file descriptors

We Keep Coding

PHP, A popular general-purpose scripting language that is especially suited to web development.