Duplicate keys in memcached - php

I'm having some trouble with memcache on my PHP site. Occasionally I'll get a report that the site is misbehaving, and when I look at memcache I find that a few keys exist on both servers in the cluster. The data is not the same between the two entries (one is older).
My understanding of memcached was that this shouldn't happen: the client should hash the key and always pick the same server. So either my understanding is wrong or my code is. Can anyone explain why this might be happening?
FWIW, the servers are hosted on Amazon EC2.
All my connections to memcache are opened through this function:
$mem_servers = array(
    array('ec2-000-000-000-20.compute-1.amazonaws.com', 11211, 50),
    array('ec2-000-000-000-21.compute-1.amazonaws.com', 11211, 50)
);

function ConnectMemcache()
{
    global $mem_servers;
    static $memcon = null; // persist the connection across calls
    if ($memcon === null) {
        $memcon = new Memcache();
        foreach ($mem_servers as $server) {
            $memcon->addServer($server[0], $server[1], true);
        }
    }
    return $memcon;
}
and values are stored through this:
function SetData($key, $data)
{
    global $mem_global_key;
    if (MEMCACHE_ON_OFF) {
        $key = $mem_global_key . $key;
        $memcache = ConnectMemcache();
        $memcache->set($key, $data);
        return true;
    } else {
        return false;
    }
}

I think this blog post touches on the problem you're having.
http://www.caiapps.com/duplicate-key-problem-in-memcache-php/
From the article, it sounds like the following happens:
- a memcache server that originally held the key drops out of the pool
- the key is recreated on the 2nd server with updated data
- the 1st server comes back online and rejoins the cluster with the old data
- now the key is saved on 2 servers with different data
Sounds like you may need to use Memcache::flush to clear out the memcache cluster before you write, to help minimize how long duplicates might exist in your cluster.
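A related mitigation (my assumption, not from the article): the pecl/memcache extension can transparently fail over to another server when one drops out, which is exactly the behavior that redistributes keys. Disabling failover and enabling consistent hashing keeps each key pinned to one server; verify these settings against your installed extension version before relying on them:

```php
// Sketch of php.ini / runtime settings for the pecl/memcache extension:
// with failover off, writes fail loudly while a server is down instead of
// quietly landing on the other node; consistent hashing minimizes how many
// keys get remapped when the pool changes.
ini_set('memcache.allow_failover', '0');
ini_set('memcache.hash_strategy', 'consistent');
```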

Related

PHP Memcached get returning 0 always

I have memcache (installed on php5) and memcached (installed on php7.2 via libmemcached), both connecting to the same memcached daemon/server.
Memcache::get works perfectly fine and fetches the data as I expect. But when I do Memcached::get, it always returns 0.
I have checked that I have compression off when using both extensions. I also tried toggling Memcached::OPT_BINARY_PROTOCOL for memcached, and it still produces the same null result.
Interestingly, when I add a key/value pair using the memcached extension and retrieve it using the same key, I get the proper/correct value that I added.
I am now clueless as to what could be the reason it's not working for data already stored in the memcached server.
EDIT 1: I did telnet to my memcached server and checked that it actually has the value. Also, I checked that the result code returned by Memcached::getResultCode is not any kind of failure.
EDIT 2: I may have narrowed it down further. I noticed that when I save ["key1" => "value1"] from the memcache-php5 script, it stores and retrieves the data correctly. But when I try to retrieve that same data with the memcached-php7.1 script, it returns 0.
After that, I removed the data with key "key1" from the memcached server using telnet. Then I saved ["key1" => "value1"] using the memcached-php7.1 script, and it can retrieve that data correctly. But when trying to retrieve it using the memcache-php5 script, it returns the serialized data "a:1:{s:4:\"key1\";s:6:\"value1\";}" (this is the json_encoded output).
So in order to upgrade, I may have to delete/flush everything and recreate the entries in the memcached server using the memcached extension.
P.S.: I know the differences between these two PHP extensions. I have read all the comments on this question, and it's not a duplicate of mine.
As you already know, memcache and memcached are two different extensions. Even though they're used for the same purpose, connecting to a memcached server, each one of them serializes data differently.
That means you can't safely switch between them without a proper cache flush on the server or independent server instances.
<?php
$host = '127.0.0.1'; // adjust to your server
$servers = array(array($host, 11211));

$memcache = new Memcache;
$memcacheD = new Memcached;
$memcache->addServer($host, 11211);
$memcacheD->addServers($servers);

$checks = array(
    123,
    4542.32,
    'a string',
    true,
    array(123, 'string'),
    (object) array('key1' => 'value1'),
);

foreach ($checks as $i => $value) {
    print "Checking WRITE with Memcache\n";
    $key = 'cachetest' . $i;
    $memcache->set($key, $value);
    usleep(100);
    $val = $memcache->get($key);
    $valD = $memcacheD->get($key);
    if ($val !== $valD) {
        print "Not compatible!";
        var_dump(compact('val', 'valD'));
    }

    print "Checking WRITE with MemcacheD\n";
    $memcacheD->set($key, $value);
    usleep(100);
    $val = $memcache->get($key);
    $valD = $memcacheD->get($key);
    if ($val !== $valD) {
        print "Not compatible!";
        var_dump(compact('val', 'valD'));
    }
}

PHP DB caching, without including files

I've been searching for a suitable PHP caching method for MSSQL results.
Most of the examples I can find suggest storing the results in an array, which would then get included into the page. This seems great unless a request for the content is made at the same time as it is being updated/rebuilt.
I was hoping to find something similar to ASP's application-level variables, but as far as I'm aware, PHP doesn't offer this functionality?
The problem I'm facing is that I need to perform six queries per page to populate dropdown boxes, and this happens on the vast majority of pages. It's also not an option to combine the queries. The cached data will also need to be rebuilt sporadically when the system changes. This could be once a day, once a week, or once a month. Any advice will be greatly received, thanks!
You can use Redis server and phpredis PHP extension to cache results fetched from database:
$redis = new Redis();
$redis->connect('/tmp/redis.sock');

$sql = "SELECT something FROM sometable WHERE condition";
$sql_hash = md5($sql);
$redis_key = "dbcache:{$sql_hash}";
$ttl = 3600; // values expire in 1 hour

if ($result = $redis->get($redis_key)) {
    $result = json_decode($result, true);
} else {
    $result = Db::fetchArray($sql);
    $redis->setex($redis_key, $ttl, json_encode($result));
}
(Error checks skipped for clarity)
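The same read-through pattern works with any backend. Here is a runnable sketch where a plain array stands in for Redis and a closure stands in for the database call (both are stand-ins I introduced, not part of the answer above), so the control flow can be seen without a server:

```php
<?php
// Read-through cache sketch. $store stands in for Redis; the $fetch closure
// stands in for the real database call and counts how often it runs.
function cachedQuery($sql, callable $fetchFromDb, array &$store)
{
    $key = 'dbcache:' . md5($sql);
    if (isset($store[$key])) {
        return json_decode($store[$key], true); // cache hit
    }
    $result = $fetchFromDb($sql);         // cache miss: hit the "DB"
    $store[$key] = json_encode($result);  // populate for next time
    return $result;
}

$store = array();
$calls = 0;
$fetch = function ($sql) use (&$calls) {
    $calls++;
    return array(array('id' => 1, 'name' => 'example'));
};

$a = cachedQuery('SELECT 1', $fetch, $store);
$b = cachedQuery('SELECT 1', $fetch, $store); // served from cache
print $calls; // the DB closure ran only once
```

The second call never reaches the fetch closure, which is the whole point: six dropdown queries per page collapse into six cache reads.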

what's the optimal chunk size for laravel

I've been working on a web app that reads some data from a remote server on its own server side. I'm using Laravel, and I initially thought it would be easier to develop my own PHP file with methods to connect to the remote DB. What this PHP code does: fetch the data from the remote server (PostgreSQL) and insert it into Laravel using Eloquent. Here are some code snippets.
try {
    $dal = connect();
    // ... some validations not relevant to the question
    $result = pg_query($query) or die('Query failed: ' . pg_last_error());
    $data = array_values(pg_fetch_all($result));
    $chunkOfData = array_chunk($data, 1000);
    foreach ($chunkOfData as $chunk) {
        insertChunkToDB($chunk);
    }
    closeDB($dal);
} catch (Exception $e) {
    Log::error('Error syncing both databases, more details: ' . $e);
    exit(1);
}
My question focuses on the array_chunk.
I had to do this because PHP crashed with an "out of memory" error. I used the insertChunk function so the garbage collector would clean up data that has already been inserted. Note this code is fully functional (as far as I know).
But if pg_fetch_all already retrieved the data, isn't it already in memory? Why didn't the program crash then? As a side question, how fast can Laravel insert data? Would using smaller chunks (like 100) slow the program down due to jumping between iterations/garbage collecting? What would be the splitting number that makes things fastest?
Oh, by the way this is the function
function insertChunkToDB($chunk)
{
    foreach ($chunk as $element) {
        $object = json_decode(json_encode($element), false);
        insertObjectToDB($object);
    }
}
The encode/decode is done so I can do this:
function insertObjectToDB($element)
{
    $LaravelModel->id = $element->id;
    $LaravelModel->name = $element->name; // and so on...
    $LaravelModel->save();
}
When recording foreign keys, I do a quick check to see if I have the corresponding value, and if not, I quickly issue an extra query to the remote server to record that data in the corresponding table.
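As an aside on the per-row save() above: Eloquent also exposes a bulk Model::insert() that takes an array of rows, so each chunk could become a single query instead of 1000. A sketch; the model name MyModel and the column list are my assumptions, and the actual insert call is commented out since it needs a Laravel app:

```php
<?php
// Hypothetical sketch: turn a fetched chunk into insert-ready row arrays,
// then insert the whole chunk in one query. Only the row-building part
// below is plain PHP and runnable here.
$chunk = array(
    array('id' => 1, 'name' => 'alpha'),
    array('id' => 2, 'name' => 'beta'),
);

$rows = array_map(function ($element) {
    return array('id' => $element['id'], 'name' => $element['name']); // and so on...
}, $chunk);

// MyModel::insert($rows); // one query per chunk (requires a Laravel app)
print count($rows);
```

Note that bulk insert skips Eloquent model events and timestamps, so it is only a fit when those don't matter for the sync.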

Get Computer Unique ID from PHP

I've created an application using PHP, and I'm going to sell it on my local market. I will personally go to customers' locations to install/configure Apache & MySQL as well as install my own code.
I would like a security system so that if anyone attempts to copy my code to an unauthorized machine, it won't run.
I know no one can prevent reverse engineering of an application. Even .exe (binary) files are cracked, and with PHP (source code) anyone can do it.
In my country those reverse engineers are really hard to find, so I would like to propose minimal security measures, like:
1) Create a class (say, Navigation) which identifies system information like CPU ID, computer name, or any combination of hardware IDs to make a UNIQUE_ID and matches it against my given UNIQUE_ID (issued to the individual to whom I sold the application). If it's valid, it returns the navigation menu. Otherwise it will simply destroy the database and halt execution by throwing an exception, maybe like:
class Navigation {
    public function d() {
        // return current system UNIQUE_ID
    }

    public function get() {
        $a = file_get_contents('hash');
        $c = $this->d();
        if (crypt($c, $a) != $a) {
            // destroy database
            throw new Exception('');
        } else {
            return "<ul><li><a>home</a></li></ul>"; // navigation menu
        }
    }
}
2) Then during the installation process I'll put the system's UNIQUE_ID in the "hash" file, create an object, and save it into a file (nav.obj):
(install.php)
<?php
$a = new Navigation;
$out = serialize($a);
file_put_contents('nav.obj', $out);
3) In header.php (which gets included in every file):
<?php
$menu = file_get_contents('nav.obj');
$menu = unserialize($menu);
echo $menu->get();
?>
I know this method isn't foolproof, but I'm pretty sure that around 60% of PHP developers won't be able to crack it!
Now I only need to get the current system's UNIQUE_ID.
I have created this function to get a unique ID based on hardware (hard disk UUID). It is possible to use different resources, like machine names, domains, or even hard disk size, to get a better approach depending on your needs.
function UniqueMachineID($salt = "")
{
    if (strtoupper(substr(PHP_OS, 0, 3)) === 'WIN') {
        $temp = sys_get_temp_dir() . DIRECTORY_SEPARATOR . "diskpartscript.txt";
        if (!file_exists($temp) && !is_file($temp)) {
            file_put_contents($temp, "select disk 0\ndetail disk");
        }
        $output = shell_exec("diskpart /s " . $temp);
        $lines = explode("\n", $output);
        $result = array_filter($lines, function ($line) {
            return stripos($line, "ID:") !== false;
        });
        if (count($result) > 0) {
            $values = array_values($result);
            $result = array_shift($values); // avoid passing a function result by reference
            $result = explode(":", $result);
            $result = trim(end($result));
        } else {
            $result = $output;
        }
    } else {
        $result = shell_exec("blkid -o value -s UUID");
        if (stripos($result, "blkid") !== false) {
            $result = $_SERVER['HTTP_HOST'];
        }
    }
    return md5($salt . md5($result));
}

echo UniqueMachineID();
As per http://man7.org/linux/man-pages/man5/machine-id.5.html
$machineId = trim(shell_exec('cat /etc/machine-id 2>/dev/null'));
EDIT for Tito:
[ekerner#**** ~]$ ls -l /etc/machine-id
-r--r--r--. 1 root root 33 Jul 8 2016 /etc/machine-id
EDIT 2 for Tito: Some things to consider, and scenarios:
Is the user allowed to get a new machine? I'd guess yes.
Or run on multiple devices?
It sounds like the machine could be irrelevant in your case.
If it's user-only (no machine restrictions), then I'd go for a licensing service (relies on network access).
There are many services for this:
Google Play (for Android apps) is a good example: https://developer.android.com/google/play/licensing/index.html
MS and Apple have similar services.
Otherwise, just search the web for the term "Software Licensing Service" or "Cloud-Based Software Licensing Service".
If it's user + single device, then you'll need to pass the device ID up to whatever service you use or build, then allow the machine ID to be updated, but not allow a revert to a previous machine ID (which would mean multiple devices).
That said, these services will give you client code which should take care of that if it's a requirement.
Two scenarios from experience:
1: User on any device: we simply made an API in the cloud (in a website) and a login screen in the app. When the user logged in, it authenticated via the API and kept a token, and whenever the device was connected to the net, the app would query the API and update the login and/or token.
You could alternatively have the login screen in the purchase flow (maybe they already logged into a site to purchase), generate a key, and pack it with or bind it into the app.
2: User plus machine:
Same thing, except when the API is queried the machine ID is passed up. The machine ID can change as many times as the user updates their device, but we kept a record of machine IDs and made a ban rule: if we saw an old (previously used) machine ID, a certain amount of time had to have passed. This allowed the user to break their machine and pull out an old one.
Also consider, if you build one yourself: how will you stop the app from working? People are pretty clever; it would need to be compiled into the core.
All that being said, the various licensing services are pros at this and can cater to most needs. Plus, with their experience, they've already overcome the security pitfalls. I'd name one that I like, except it's yours to search out.
It would be nice if you could come back with any positive or negative outcomes from your trials.
function getMachineId()
{
    $fingerprint = [php_uname(), disk_total_space('.'), filectime('/'), phpversion()];
    return hash('sha256', json_encode($fingerprint));
}
This will get a probably-unique ID based on a hash of:
- the server's OS, OS version, hostname, and architecture
- the total space (not free space) on the drive where the PHP script is
- the Unix timestamp creation time of the computer's root file system
- the currently installed PHP version
Unlike the other answers it doesn't depend on shell_exec() being enabled.
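A quick, runnable sanity check on the approach above (the function body is copied from the answer): all four inputs are constant within a run, so the hash must come out identical every time, as a 64-character hex string.

```php
<?php
// Verify the fingerprint hash is deterministic: same inputs, same SHA-256.
function getMachineId()
{
    $fingerprint = [php_uname(), disk_total_space('.'), filectime('/'), phpversion()];
    return hash('sha256', json_encode($fingerprint));
}

$a = getMachineId();
$b = getMachineId();
print ($a === $b && strlen($a) === 64 && ctype_xdigit($a)) ? "stable" : "unstable";
```

The flip side of this determinism is that moving the script to a drive of the same size on a clone of the machine may reproduce the ID, so it is a fingerprint, not a guarantee.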

High-traffic percentage-based set picks?

The setup: High traffic website and a list of image URLs that we want to display. We have one image spot, and each item in the set of image URLs has a target display percentage for the day. Example:
Image1 - 10%
Image2 - 30%
Image3 - 60%
Because the amount of traffic can vary from day to day, I'm doing the percentages within blocks of 1000. The images also need to be picked randomly, but still fit the distribution accurately.
Question: I've implemented proof-of-concept code for doing this in memcache, but I'm uncomfortable with the way the data is stored (multiple hash keys mapped by a "master record" with metadata). This also needs to be able to fall back to a database if the memcache servers go down. I'm also concerned about concurrency issues with the master record.
Is there a simpler way to accomplish this? Perhaps a fast MySQL query, or a better way to bring memcache into this?
Thanks
You could do what you said, pregenerate a block of 1000 values pointing at the images you'll return:
$distribution = "011022201111202102100120 ..." # exactly evenly distributed
Then store that block in MySQL and memcache, and use another key (in both MySQL and memcache) to hold the current index value for the above string. Whenever the image script is hit increment the value in memcache. If memcache goes down, go to MySQL instead (UPDATE, then SELECT; there may be a better way to do this part).
To keep memcache and MySQL in sync you could have a cron job copy the current index value from memcache to MySQL. You'll lose some accuracy but that may not be critical in this situation.
You could store multiple distributions in both MySQL and memcache and have another key that points to the currently active distribution. That way you can pregenerate future image blocks. When the index exceeds the distribution the script would increment the key and go to the next one.
Roughly:
function FetchImageFname()
{
    $images = array(0 => 'image1.jpg', 1 => 'image2.jpg', 2 => 'image3.jpg');
    $distribution = FetchDistribution();
    $currentindex = FetchCurrentIndex();
    $x = 0;
    while ($distribution[$currentindex] == '' && $x < 10) {
        IncrementCurrentDistribKey();
        $distribution = FetchDistribution();
        $currentindex = FetchCurrentIndex();
        $x++;
    }
    if ($distribution[$currentindex] == '') {
        // XXX Tried and failed. Send error to central logs.
        return $images[0];
    }
    return $images[$distribution[$currentindex]]; // map the digit to a filename
}

function FetchDistribution()
{
    $current_distrib_key = FetchCurrentDistribKey();
    $distribution = FetchFromMemcache($current_distrib_key);
    if (!$distribution) {
        $distribution = FetchFromMySQL($current_distrib_key);
    }
    return $distribution;
}

function FetchCurrentIndex()
{
    $current_index = MemcacheIncrement('foo');
    if ($current_index === false) {
        $current_index = MySQLIncrement('foo');
    }
    return $current_index;
}
.. etc. The function names kind of stink, but I think you'll get the idea. When the memcache server is back up again, you can copy the data from MySQL back to memcache and it is instantly reactivated.
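To round out the sketch, the pregenerated block itself could be built like this (BuildDistribution is a hypothetical helper I'm adding, not part of the answer above; str_shuffle gives the random-but-exact distribution the question asks for):

```php
<?php
// Build a block of $size picks where digit $imageIndex appears exactly
// according to its target percentage, in random order.
function BuildDistribution(array $percentages, $size = 1000)
{
    $block = '';
    foreach ($percentages as $imageIndex => $pct) {
        $block .= str_repeat((string) $imageIndex, (int) round($size * $pct / 100));
    }
    return str_shuffle($block); // exact counts, random order
}

// Image1 10%, Image2 30%, Image3 60%, as in the question.
$block = BuildDistribution(array(0 => 10, 1 => 30, 2 => 60));
print strlen($block) . ' ' . substr_count($block, '2');
```

Because the counts are exact before shuffling, the day's distribution is guaranteed regardless of how random the per-request picks look.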
A hit to the database will most likely take longer, so I would stick with memcache. You are going to have more concurrency issues with MySQL than with memcache. Memcache is better equipped to handle a lot of requests, and if the servers go down, that will be the least of your worries on a high-traffic website.
Maybe a MySQL expert can pipe in here with a good query structure if you give us more specifics.
