Memcached consistent hashing not working with 3 of 4 servers down - php

Story
I have 3 memcached servers running, and I shut one or another of them down to investigate how PHP-memcached behaves when a server is not reachable.
I have defined 4 servers in PHP, one of which simulates a server that is mostly offline (a spare server). When I shut down 1 server (=> 2 are still online), the third ->get() gives me a result.
When I shut down one more server (=> 1 is still online), it no longer finds objects pushed to that last server.
Sample output
First run, 3 of 4 servers up:
Entity not found in cache on 1st try: NOT FOUND
Entity not found in cache on 2nd try: NOT FOUND
Entity not found in cache on 3rd try: NOT FOUND
Entity not found in cache on 4th try: NOT FOUND
Second run, 3 of 4 servers up:
Entity found in Cache: SUCCESS
Third run, 2 of 4 servers up:
Entity not found in cache on 1st try: CONNECTION FAILURE
Entity not found in cache on 2nd try: SERVER IS MARKED DEAD
Entity not found in cache on 3rd try: NOT FOUND
Entity not found in cache on 4th try: NOT FOUND
Fourth run, 1 of 4 servers up:
Entity not found in cache on 1st try: CONNECTION FAILURE
Entity not found in cache on 2nd try: SERVER IS MARKED DEAD
Entity not found in cache on 3rd try: CONNECTION FAILURE
Entity not found in cache on 4th try: SERVER IS MARKED DEAD
Although there is one server left online, and I push my object to memcached every time it is not found in the cache, the key is never found again.
I think it should also work with only a single server left.
Can you explain this behaviour to me?
It looks like it is not possible to implement something that stays safe even when I shut down 19 of 20 servers.
Side question: libketama is not really maintained anymore; is it still a good idea to use it? The logic behind the library is rather good and is also used in the Varnish caching server.
Appendix
My Script:
<?php
require_once 'CachableEntity.php';
require_once 'TestEntity.php';
echo PHP_EOL;
$cache = new Memcached();
$cache->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$cache->setOption(Memcached::OPT_DISTRIBUTION, Memcached::DISTRIBUTION_CONSISTENT);
$cache->setOption(Memcached::OPT_SERVER_FAILURE_LIMIT, 1);
$cache->setOption(Memcached::OPT_REMOVE_FAILED_SERVERS, true);
$cache->setOption(Memcached::OPT_AUTO_EJECT_HOSTS, true);
$cache->setOption(Memcached::OPT_TCP_NODELAY, true);
//$cache->setOption(Memcached::OPT_RETRY_TIMEOUT, 10);
$cache->addServers([
    ['localhost', '11212'],
    ['localhost', '11213'],
    ['localhost', '11214'],
    ['localhost', '11215'], // always offline
]);
$entityId = '/test/test/article_123456789.test';
$entity = new TestEntity($entityId);
$found = false;
$cacheKey = $entity->getCacheKey();
$cacheResult = $cache->get($cacheKey);
if (empty($cacheResult)) {
    echo 'Entity not found in cache on 1st try: ' . $cache->getResultMessage(), PHP_EOL;
    $cacheResult = $cache->get($cacheKey);
    if (empty($cacheResult)) {
        echo 'Entity not found in cache on 2nd try: ' . $cache->getResultMessage(), PHP_EOL;
        $cacheResult = $cache->get($cacheKey);
        if (empty($cacheResult)) {
            echo 'Entity not found in cache on 3rd try: ' . $cache->getResultMessage(), PHP_EOL;
            $cacheResult = $cache->get($cacheKey);
            if (empty($cacheResult)) {
                echo 'Entity not found in cache on 4th try: ' . $cache->getResultMessage(), PHP_EOL;
                $entity
                    ->setTitle('TEST')
                    ->setText('Hellow w0rld. Lorem Orem Rem Em M IpsuM')
                    ->setUrl('http://www.google.com/content-123456789.html');
                $cache->set($cacheKey, $entity->serialize(), 120);
            }
        } else {
            $found = true;
        }
    } else {
        $found = true;
    }
} else {
    $found = true;
}
if ($found === true) {
    echo 'Entity found in Cache: ' . $cache->getResultMessage(), PHP_EOL;
    $entity->unserialize($cacheResult);
    echo 'Title: ' . $entity->getTitle(), PHP_EOL;
}
echo PHP_EOL;

The behaviour you are experiencing is consistent: when a server is not available, it is first marked with a failure and then marked as dead.
The odd part is that this would only be coherent if you had set Memcached::OPT_SERVER_FAILURE_LIMIT to 2, whereas you set it to 1; that would have explained why you get two error lines per unreachable server (CONNECTION FAILURE, then SERVER IS MARKED DEAD).
This seems to be related to the retry timeout: adding a usleep() after a failure, matched to the OPT_RETRY_TIMEOUT value, lets the failed server be dropped from the list (see the following bug comment); a sketch of the idea follows.
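A minimal sketch of that retry idea, assuming a 1-second retry timeout, a reduced server list and a placeholder key for illustration:
<?php
// Sketch only: give libmemcached a short retry window, then wait at least
// that long before the second attempt so the failed server can be ejected.
$cache = new Memcached();
$cache->setOption(Memcached::OPT_LIBKETAMA_COMPATIBLE, true);
$cache->setOption(Memcached::OPT_SERVER_FAILURE_LIMIT, 1);
$cache->setOption(Memcached::OPT_REMOVE_FAILED_SERVERS, true);
$cache->setOption(Memcached::OPT_RETRY_TIMEOUT, 1); // seconds (assumed value)
$cache->addServers([['localhost', 11212], ['localhost', 11213]]);

$value = $cache->get('some-key'); // 'some-key' is a placeholder
if ($value === false && $cache->getResultCode() !== Memcached::RES_NOTFOUND) {
    usleep(1100000);                  // wait slightly longer than OPT_RETRY_TIMEOUT
    $value = $cache->get('some-key'); // second attempt, after the dead server can be dropped
}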
The value does not get replicated to the next server, because only the keys are distributed; the data itself is not.
Note that OPT_LIBKETAMA_COMPATIBLE does not use libketama itself, it only reproduces the same algorithm, so it does not matter that libketama is no longer active. This is also the configuration recommended by the PHP documentation:
It is highly recommended to enable this option if you want to use consistent hashing, and it may be enabled by default in future releases.
EDIT:
In my understanding of your post, the message "Entity found in Cache: SUCCESS" only appears on the second run (1 server offline) because nothing has changed since the previous run and the server hosting this key is still available (memcached deduces from the key that the value is stored on either the 1st, 2nd or 3rd server). Let's call the four servers John, George, Ringo and Paul.
At the start of the third run, memcached deduces from the key which of the four servers owns the value (say, John). It asks John twice before giving up, because John is now down. Its algorithm then only considers 3 servers (not knowing that Paul is also dead) and deduces that George should hold the value.
George answers twice that it does not have the value, and the script then stores it there.
On the fourth run, John, George and Paul are all off. Memcached tries John twice, then tries George twice, and finally stores the value on Ringo.
The problem here is that unavailable servers are not remembered between runs, and that within a single run a server has to be asked twice before it is removed.

Redundancy
Since Memcached 3.0.0, there is a redundancy configuration.
It can be set in the extension config file:
/etc/php/7.0/mods-available/memcached.ini (may be different among operating systems)
memcache.redundancy=2
or at runtime with ini_set('memcache.redundancy', 2).
This parameter is not really documented. You may replace "2" with the number of servers; this adds a slight overhead because of the extra writes.
Losing 19/20 servers
With redundancy, you may lose some servers and still keep a "read success".
Notes:
Losing 95% of a server pool will put a lot of stress on the remaining ones.
Cache servers are built for performance; "a large number of servers can slow down a client from a few angles".
libketama
The GitHub repository has not received any commits since 2014, and libketama is looking for a new maintainer: https://github.com/RJ/ketama

Related

Using msg_send() in PHP results in error 11... Cannot find a solution

I am currently building a small application that uses the message queue built into PHP.
I have 1 "server" process and 1 "client" process. Messages flow from server to client.
They are simple JSON objects that are serialised and then sent.
This code is used:
<?php
// $q is the message queue handle
// MESSAGE_TYPE_EXECUTION is the integer 1
// $update is the JSON string
// 4th argument (true): the message is serialised
// 5th argument (false): the send is non-blocking
// $error gets filled with an error code when an error occurs (see below)
$send = msg_send($q, MESSAGE_TYPE_EXECUTION, $update, true, false, $error);
if (isset($error) && $error != 0) {
    echo 'Execution error: ' . $error . PHP_EOL;
}
This works without issue, until it does not.
Sometimes after a couple of minutes, sometimes after a couple of hours the following error appears:
PHP Warning: msg_send(): msgsnd failed: Resource temporarily unavailable in
/var/www/server.php on line 57
The value of the $error variable is the integer 11.
All messages that follow this error also fail with error 11, until I restart the process and everything works again (for a while, until the same error appears again).
I have been searching but cannot find any explanation of what error 11 is, or how it can be handled and fixed without restarting the process.
Any clue, information, example etc is welcome. I would really like for server.php to be reliable.
-- edit --
client.php is the process that fetches the messages (which are all more or less the same, but with other values)
it uses this to fetch the messages from the queue (filled by server.php):
<?php
// Note: the receive flags should be combined with a bitwise OR (|), not a logical AND (&&).
$update = msg_receive($q, 0, $messagetype, 1024, $message, true, MSG_IPC_NOWAIT | MSG_NOERROR, $error);
if ($update) {
    // Do stuff
}
usleep(1000000);
I have not yet checked memory usage, will look into that
Platform used
PHP 7.1.3
Centos 7
So, the solution was found after some information and leads (read the comments on my original question) brought up by @ChrisHaas (thanks again!). After some tinkering, everything is running smoothly now, without error 11 from msg_send().
PHP's msg_send() is basically a wrapper around the msgsnd system call,
so a lot of information can be found there, including the errors you might encounter (in combination with the flags used when reading messages with msg_receive()).
The queue is limited in total size and total messages it can hold (I, however, have not found a way to increase the total size of the queue).
The reason I was getting error 11 was due to a couple of things:
The client I created was too slow at fetching messages from the queue, so the queue hit its maximum and failed. I did not find a way to recover from this situation other than restarting all processes involved, only for the same thing to happen again.
I also increased the max_size used when reading messages with msg_receive(), as some messages were big (most were small). If you declare too small a size, the big messages remain in the queue and clog it up until it fails. Increasing max_size helped with fetching the bigger messages too.
Long story short: from what I can tell, error 11 (EAGAIN, "Resource temporarily unavailable") means the message queue is full (I still do not have a 100% clear, documented answer, though).
Pointers to fix the issue:
Make sure your max_size is large enough to fetch the big messages, so they do not linger in the queue.
Be sure to read messages out at least as fast as you send them into the queue.
Check your queue(s) with the ipcs -q command in the terminal. It shows the queues that are currently active, and keeping an eye on it lets you watch a queue slowly filling up when there is a problem.
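If you would rather monitor from inside PHP than via ipcs, msg_stat_queue() exposes the same numbers. A small sketch (the queue key 0x1234 is an assumption for illustration):
<?php
// Sketch: check how full the queue is before sending more messages.
$q = msg_get_queue(0x1234);   // assumed queue key
$stat = msg_stat_queue($q);
if ($stat !== false) {
    echo 'messages queued : ' . $stat['msg_qnum'], PHP_EOL;
    echo 'queue size limit: ' . $stat['msg_qbytes'] . ' bytes', PHP_EOL;
}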
Wish the documentation on php.net was better in this case...

Why does Memcached add() always succeed, regardless of expire time?

I'm adding a key using Memcached like so:
$valueToStore = time(); // some number
$success = $memcached->add( 'test_key', $valueToStore, 20 ); // cache for 20 seconds
But it's always succeeding when I call it in a different session, even before 20 seconds have passed. According to the docs at http://php.net/manual/en/memcached.add.php, it should be returning FALSE until the key expires (because the key already exists).
I'm running on a single development server with plenty of free cache space. Any idea what might be happening?
php -v returns: PHP 5.5.9-1ubuntu4.3
memcached version 2.1.0
libmemcached version 1.0.8.
You need to be clear about whether you are using the Memcache class or the Memcached class. Your cache design is also a bit strange: you should first check the cache to see if the item is there, and only store it if it is not. Also, Memcache has some strange behaviour when a boolean is used as the third argument; you should use MEMCACHE_COMPRESSED there. I think you are using Memcache.
To illustrate how to fix your problem:
$in_cache = $memcached->get('test_key');
if ($in_cache) {
    return $in_cache;
} else {
    $valueToStore = time();
    $memcached->add('test_key', $valueToStore, MEMCACHE_COMPRESSED, 20);
}

Is phpredis pipeline the same as using the protocol for mass insertion?

I'm moving part of my site from a relational database to Redis and need to insert millions of keys in as short a time as possible.
In my case, the data must first be fetched from MySQL, prepared by PHP and then added to the corresponding sorted sets (time as the score + ID as the value). Currently I'm taking advantage of the phpredis multi method with the Redis::PIPELINE parameter. Despite noticeable speed improvements, it turned out to block reads and slow down loading times while the import is running.
So here comes the question: is using a pipeline in phpredis equivalent to the mass insertion described in http://redis.io/topics/mass-insert?
Here's an example:
phpredis way:
<?php
// All necessary requires etc.
$client = Redis::getClient();
$client->multi(Redis::PIPELINE); // OR $client->pipeline();
$client->zAdd('key', 1, 2);
...
$client->zAdd('key', 1000, 2000);
$client->exec();
vs protocol from redis.io:
cat data.txt | redis-cli --pipe
I'm one of the contributors to phpredis, so I can answer your question. The short answer is that it is not the same but I'll provide a bit more detail.
What happens when you put phpredis into Redis::PIPELINE mode is that instead of sending the command when it is called, it puts it into a list of "to be sent" commands. Then, once you call exec(), one big command buffer is created with all of the commands and sent to Redis.
After the commands are all sent, phpredis reads each reply and packages the results according to each command's specification (e.g. HMGET calls come back as associative arrays, etc.).
The performance on pipelining in phpredis is actually quite good, and should suffice for almost every use case. That being said, you are still processing every command through PHP, which means you will pay the function call overhead by calling the phpredis extension itself for every command. In addition, phpredis will spend time processing and formatting each reply.
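As a small illustration of that reply handling (assuming a local Redis instance and placeholder key names), exec() returns one entry per buffered command, already unpacked by phpredis:
<?php
$client = new Redis();
$client->connect('127.0.0.1', 6379); // assumed host/port

$client->multi(Redis::PIPELINE);
$client->zAdd('myzset', 1, 'one');        // queued, not sent yet
$client->hMGet('myhash', ['f1', 'f2']);   // queued, not sent yet
$replies = $client->exec();               // one buffer sent, all replies read back

// $replies[0] is the integer reply of zAdd,
// $replies[1] the associative array phpredis builds for hMGet.
var_dump($replies);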
If your use case requires importing MASSIVE amounts of data into Redis, especially if you don't need to process each reply (but instead just want to know that all commands were processed), then the mass-import method is the way to go.
I've actually created a project to do this here:
https://github.com/michael-grunder/redismi
The idea behind this extension is that you call it with your commands and then save the buffer to disk, which will be in the raw Redis protocol and compatible with cat buffer.txt | redis-cli --pipe style insertion.
One thing to note is that at present you can't simply replace any given phpredis call with a call to the RedisMI object, as commands are processed as variable argument calls (like hiredis), which work for most, but not all phpredis commands.
Here is a simple example of how you might use it:
<?php
$obj_mi = new RedisMI();

// Some context we can pass around in RedisMI for whatever we want
$obj_context = new StdClass();
$obj_context->session_id = "some-session-id";

// Attach this context to the RedisMI object
$obj_mi->SetInfo($obj_context);

// Set a callback when a buffer is saved
$obj_mi->SaveCallback(
    function($obj_mi, $str_filename, $i_cmd_count) {
        // Output our context info we attached
        $obj_context = $obj_mi->GetInfo();
        echo "session id: " . $obj_context->session_id . "\n";

        // Output the filename and how many commands were sent
        echo "buffer file: " . $str_filename . "\n";
        echo "commands   : " . $i_cmd_count . "\n";
    }
);

// A thousand SADD commands, adding three members each time
for ($i = 0; $i < 1000; $i++) {
    $obj_mi->sadd('some-set', "$i-one", "$i-two", "$i-three");
}

// A thousand ZADD commands
for ($i = 0; $i < 1000; $i++) {
    $obj_mi->zadd('some-zset', $i, "member-$i");
}

// Save the buffer
$obj_mi->SaveBuffer('test.buf');
?>
Then you can do something like this:
➜ tredismi php mi.php
session id: some-session-id
buffer file: test.buf
commands : 2000
➜ tredismi cat test.buf|redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 2000
Cheers!

Options to lessen open database connections?

I have a free application that is heavily used, and I get around 500 to 1000 concurrent users from time to time.
It is a desktop application that communicates with my website's API to receive data every 5~15 minutes, as well as to send back a minimal amount of data (about 3 selects at most) every 15 minutes.
Since users can turn the application on and off as they wish, the timer with which each of them queries my API varies, and as a result I have been hitting the maximum connection limit of my hosting plan.
Not wanting to upgrade the plan, both for financial reasons and because the application is non-profit for the moment, I am looking for other options to reduce the number of connections and to cache whatever information can be cached.
The first thing that came to mind was to use FastCGI with Perl. I have been testing it for some time now and it seems to work great, but I have two problems while using it:
1. If for whatever reason the application goes idle for 60, the server kills it, and for the next few requests it replies with error 500 until the script is respawned, which takes about 3+ minutes (yes, it really takes that long; I have tried my code locally on my own test server and it comes up instantly, so I am sure it is a server issue of my hosting company, but they don't seem to want to resolve it).
2. The kill timeout, which is set to 300, will kill/restart the script after that period, which again results in the respawn problem described in 1).
Given that, I am now looking for alternatives that are not based on FastCGI, if there are any.
Also, due to the limitations of the shared host, I can't run my own daemon and my ability to compile anything is very limited.
Are there any good options to achieve this with either Perl or PHP?
Mainly: reduce the open database connections to a minimum and still be able to cache some select queries for returning data... The main work of the application is inserting/updating data anyway, so there isn't much to cache.
This was the simple code I was using for testing it:
#!/usr/bin/perl -w
use CGI::Simple;   # Can't use CGI as it doesn't clear the data for the
                   # next request; haven't investigated further, but needed
                   # something working to test, and using CGI::Simple was
                   # the fastest solution found.
use DBI;
use strict;
use warnings;
use lib qw( /home/my_user/perl_modules/lib/perl/5.10.1 );
use FCGI;

my $dbh = DBI->connect('DBI:mysql:mydatabase:mymysqlservername',
                       'username', 'password',
                       {RaiseError=>1,AutoCommit=>1}
          ) || die &dbError($DBI::errstr);

my $request = FCGI::Request();

while($request->Accept() >= 0)
{
    my $query  = new CGI::Simple;
    my $action = $query->param("action");
    my $id     = $query->param("id");
    my $server = $query->param("server");
    my $ip     = $ENV{'REMOTE_ADDR'};

    print $query->header();

    if ($action eq "exp")
    {
        my $sth = $dbh->prepare(qq{
            INSERT INTO
                my_data (id, server) VALUES (?,INET_ATON(?))
            ON DUPLICATE KEY UPDATE
                server = INET_ATON(?)});
        my $result = $sth->execute($id, $server, $server)
                     || die print($dbh->errstr);
        $sth->finish;

        if ($result)
        {
            print "1";
        }
        else
        {
            print "0";
        }
    }
    else
    {
        print "0";
    }
}

$dbh->disconnect || die print($DBI::errstr);
exit(0);

sub dbError
{
    my ($txt_erro) = @_;
    my $query = new CGI::Simple;
    print $query->header();
    print "$txt_erro";
    exit(0);
}
Run a proxy. Perl's DBD::Proxy should fit the bill. The proxy server shouldn't be under your host's control, so its 60-???-of-inactivity rule shouldn't apply here.
Alternatively, install a cron job that runs more often than the FastCGI timeout, simply to wget some "make activity" page on your site, and discard the output. Some CRMs do this to force a "check for updates" for example, so it's not completely unusual, though somewhat of an annoyance here.
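A minimal sketch of that cron keep-alive, written in PHP since that is what the rest of this page uses; the URL and schedule are assumptions:
<?php
// keepalive.php - run from cron more often than the FastCGI idle timeout, e.g.
//   * * * * * php /home/my_user/keepalive.php > /dev/null 2>&1
// Hit a cheap page on the site and discard the output.
@file_get_contents('http://example.com/cgi-bin/api.fcgi?action=ping'); // assumed URL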
FWIW, you probably want to look at CGI::Fast instead of CGI::Simple to resolve your CGI.pm not dealing in the expected manner with persistent variables...

Redirect user to a static page when server's usage is over a limit

We would like to implement a method that checks the MySQL load or the total sessions on the server, and
if this number is bigger than a threshold, the next visitor of the website is redirected to a static web page with a message like "Too many users, try again later".
One way I implemented this on my website is to handle the error message MySQL outputs when it denies a connection.
Sample PHP code:
function updateDefaultMessage($userid, $default_message, $dttimestamp, $db) {
    $queryClients = "UPDATE users SET user_default_message = '$default_message', user_dtmodified = '$dttimestamp' WHERE user_id = $userid";
    $resultClients = mysql_query($queryClients, $db);

    if (!$resultClients) {
        log_error("[MySQL] code:" . mysql_errno($db) . " | msg:" . mysql_error($db) . " | query:" . $queryClients, E_USER_WARNING);
        $result = false;
    } else {
        $result = true;
    }
}
In the JS:
function UpdateExistingMsg(JSONstring)
{
    var MessageParam = "msgjsonupd=" + JSON.encode(JSONstring);
    var myRequest = new Request({
        url: urlUpdateCodes,
        onSuccess: function(result) { if (!result) window.open(foo); },
        onFailure: function(result) { bar }
    }).post(MessageParam);
}
I hope the above code makes sense. Good luck!
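A hedged sketch of the same idea with mysqli (MySQL error 1040 is "Too many connections"; the credentials and the static page path are assumptions):
<?php
$db = @mysqli_connect('localhost', 'user', 'password', 'mydb'); // assumed credentials
if ($db === false && mysqli_connect_errno() === 1040) {
    // MySQL refused the connection because it is at its limit:
    // send the visitor to the static "too many users" page instead.
    header('Location: /too_many_users.html');
    exit;
}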
Here are some alternatives to user-lock-out that I have used in the past to decrease load:
APC Cache
PHP APC cache (speeds up access to your scripts via in-memory caching of the scripts): http://www.google.com/search?gcx=c&ix=c2&sourceid=chrome&ie=UTF-8&q=php+apc+cache
I don't think that will solve "too many MySQL connections" for you, but it should really help your website's speed in general, and that will help MySQL threads open and close more quickly, freeing resources. It's a pretty easy install on a Debian system, and hopefully on anything with package management (perhaps harder if you're using a shared server).
Cache the results of common MySQL queries, even if only within the same script execution. If you know that you're calling for certain data in multiple places (e.g. client_info() is one that I do a lot), cache it via a static caching variable keyed on the info parameter, e.g.:
function client_info($incoming_client_id) {
    static $client_info;
    static $client_id;

    if ($incoming_client_id == $client_id) {
        return $client_info;
    } else {
        // do stuff to get new client info, store it in $client_info,
        // remember $client_id for the next call, then return $client_info
    }
}
You also talk about having too many sessions. It's hard to tell whether you're referring to $_SESSION sessions, or just browsing users, but too many $_SESSION sessions may be an indication that you need to move away from use of $_SESSION as a storage device, and too many browsing users, again, implies that you may want to selectively send caching headers for high use pages. For example, almost all of my php scripts return the default caching, which is no cache, except my homepage, which displays headers to allow browsers to cache for a short 1 hour period, to reduce overall load.
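A small sketch of that selective caching-header idea (the 1-hour lifetime comes from the paragraph above; the $is_homepage flag is a hypothetical placeholder):
<?php
if ($is_homepage) {
    // High-traffic page: let browsers cache it for a short period.
    header('Cache-Control: public, max-age=3600');
    header('Expires: ' . gmdate('D, d M Y H:i:s', time() + 3600) . ' GMT');
} else {
    // Default for everything else: no caching.
    header('Cache-Control: no-store, no-cache, must-revalidate');
}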
Overall, I would definitely look into your caching procedures in general in addition to setting a hard limit on usage that should ideally never be hit.
This should not be done in PHP. You should do it naturally by means of existing hard limits.
For example, if you configure Apache with a known maximal number of clients (MaxClients), once it reaches the limit it will reply with error code 503, which you can in turn catch on your nginx frontend and show a static web page:
proxy_intercept_errors on;
error_page 503 /503.html;

location = /503.html {
    root /var/www;
}
This isn't as hard to do as it may sound.
PHP isn't the right tool for the job here because once you really hit the hard limit, you will be doomed.
The seemingly simplest answer would be to count the number of session files in ini_get("session.save_path"), but giving the web app access to that directory is a security problem.
The second method is to have a database that atomically counts the number of open sessions. For small numbers of sessions, where performance really isn't an issue but you want to be especially accurate about the number of open sessions, this is a good choice.
The third option, which I recommend, is to set up a cron job that counts the number of files in the ini_get('session.save_path') directory and then prints that number to a file in some public area on the filesystem (only if it has changed) that is visible to the web app. This job can be configured to run as frequently as you'd like, say once per second if you want better resolution. Your bootstrap loader then opens this file for reading, checks the number, and serves the static page if it is above X.
Of course, this third method won't create a hard limit. But if you're just looking for a general threshold, this seems like a good option.
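A hedged sketch of that third option; the file paths, the limit of 500 and the cron schedule are assumptions:
<?php
// counter.php - run from cron, e.g. once per minute:
//   * * * * * php /home/my_user/counter.php
$count = count(glob(ini_get('session.save_path') . '/sess_*'));
file_put_contents('/var/www/shared/session_count.txt', $count); // readable by the web app

// bootstrap.php - very early in every request:
$limit = 500;
$count = (int) @file_get_contents('/var/www/shared/session_count.txt');
if ($count > $limit) {
    header('HTTP/1.1 503 Service Unavailable');
    readfile('/var/www/static/too_many_users.html'); // static "try later" page
    exit;
}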
