Get expiration time of a memcache item in PHP?

I'm caching tweets on my site (with 30 min expiration time). When the cache is empty, the first user to find out will repopulate it.
However, at that moment the Twitter API may fail to return a 200. In that case I'd like to keep serving the previous data for another 30 minutes, but by then the previous data will already be lost.
So instead I'd like to look into repopulating the cache, say, 5 minutes before the expiration time, so that I don't lose any data.
So how do I know the expiration time of an item when using PHP's Memcache::get()?
Also, is there a better way of doing this?

In that case, isn't this the better logic?
If the cache is older than 30 minutes, attempt to pull from Twitter
If new data was successfully retrieved, overwrite the cache
Cache data for an indefinite amount of time (or much longer than you intend to cache anyway)
Note the last time the cache was updated (current time) in a separate key
Rinse, repeat
The point being: only replace the data with something new if you have it; don't let the old data be thrown away automatically.
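The steps above can be sketched in PHP. Here `ArrayCache` is an in-memory stand-in for a Memcache connection, and the key names (`tweets`, `tweets_updated_at`) and the `$fetch` callable are illustrative assumptions, not anything from the original question:

```php
<?php
// Sketch of the "only replace on success" pattern described above.
// ArrayCache stands in for a connected Memcache instance.

class ArrayCache {
    private $data = [];
    public function get($key) { return $this->data[$key] ?? false; }
    public function set($key, $value, $flags = 0, $ttl = 0) {
        $this->data[$key] = $value;
        return true;
    }
}

const REFRESH_AFTER = 1800; // refresh if older than 30 minutes

function get_tweets($cache, callable $fetch) {
    $updated = $cache->get('tweets_updated_at');
    if ($updated === false || time() - $updated > REFRESH_AFTER) {
        $fresh = $fetch(); // Twitter call; returns false on failure
        if ($fresh !== false) {
            // Overwrite only on success; TTL 0 = store indefinitely.
            $cache->set('tweets', $fresh, 0, 0);
            $cache->set('tweets_updated_at', time(), 0, 0);
        }
        // On failure, fall through and keep serving the old data.
    }
    return $cache->get('tweets');
}
```

A failed fetch never clobbers the cached copy; only the "last updated" key decides when a refresh is attempted.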

Don't store critical data in memcached; it guarantees nothing.
If you always need the latest good copy, store the data in some persistent storage as well, such as a database or a flat file.
In that case, if nothing is found in the cache, you make the Twitter API request; if it fails, you read the data from persistent storage, and on the next HTTP request you go through the same steps again.
Alternatively, you can put the data from persistent storage into memcache with a fairly short lifetime, a few minutes for example (1-5), to give the Twitter servers time to recover, and repeat the request after it expires.

When you put your data into memcache, you also set how long the cache entry is valid. So you could additionally store the time when the entry was created and/or when it will expire. Later, after fetching from the cache, you can check how much time is left until it expires and decide what to do.
But letting the cache be repopulated on a user visit can still be risky at some point. Say you want to repopulate the cache when it is about 5 minutes from expiring, and suddenly no visitors come in during the last 6 minutes before it expires; the entry will then expire with nobody around to repopulate it. If you want to be sure that a cache entry always exists, you need to check periodically, for example with a cronjob that checks the cache and fills it up.
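The "store the creation time alongside the value" idea can be sketched as below; the 30-minute TTL and 5-minute refresh window are the example figures from this thread, and the helper names are assumptions:

```php
<?php
// Sketch: store the creation time next to the value so a later get()
// can tell how close the entry is to expiring.

const TTL = 1800;           // entry lifetime: 30 minutes
const REFRESH_WINDOW = 300; // start refreshing 5 minutes before expiry

function wrap($value) {
    // this wrapped array is what you would actually set into memcache
    return ['created_at' => time(), 'value' => $value];
}

function needs_refresh(array $entry) {
    $expires_at = $entry['created_at'] + TTL;
    // true once we are within the refresh window of the deadline
    return time() >= $expires_at - REFRESH_WINDOW;
}
```

A cronjob (or the request that fetched the entry) can call `needs_refresh()` and repopulate early, before memcache drops the entry.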

Related

Does zend disk cache get deleted after ttl has expired?

Following zend_disk_cache_store documentation about the last parameter: "The Data Cache keeps objects in the cache as long as the TTL is not expired. Once the TTL is expired, the object is removed from the cache. The default value is 0."
The documentation does not explicitly say whether the data is removed from disk or just ignored by Zend. From my testing, it does not remove it from disk. Is there any resource on Zend to make sure the cache is removed from disk?
The Data Cache Lock-On-Expire feature reduces the load spike of a busy application by guaranteeing that an application gathers an expired piece from the data source only once, and by avoiding a situation where multiple PHP processes simultaneously detect that the data in the cache has expired, and repeatedly run high-cost operations.
How does it work?
When a stored Data Cache entry expires, the following process takes place:
1. The first attempt to fetch it receives a 'false' response.
2. All subsequent requests receive the expired object stored in the Data Cache, for a duration of up to 120 seconds.
3. During this time period, the PHP script that received the 'false' response generates an updated data entry and stores it in the Data Cache under the same key.
4. As soon as the updated data entry is created, it is returned to subsequent fetch requests.
If this does not occur within the 120-second period, the entire process (1-4) repeats.
More here:
http://files.zend.com/help/Zend-Server/zend-server.htm#working_with_the_data_cache.htm

How is data replaced in memcached when it is full, and memcache performance?

From the memcached wiki:
When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order.
I have the following questions:
Which data will be purged: the data that is oldest by insertion, or the data that was least recently used? For example, if d1 is the oldest item by insertion but was accessed recently and the cache is full, will d1 be replaced?
I am using PHP for interacting with memcached. Can I have control over how data is replaced in memcached? Like I do not want some of my data to get replaced until it expires even if the cache is full. This data should not be replaced instead other data can be removed for insertion.
When data is expired is it removed immediately?
What is the impact of the number of keys stored on memcached performance?
What is the significance of -k option in memcached.conf? I am not able to understand what "lock down all paged memory" means. Also, the description in README is not sufficient.
When memcached needs to store new data in memory and the memory is already full, what happens is this:
memcached searches for a suitable* expired entry, and if one is found, it replaces the data in that entry. This answers point 3: data is not removed immediately, but its space is reallocated when new data needs to be set.
If no expired entry is found, the one that is least recently used is replaced.
*Keep in mind how memcached deals with memory: it allocates blocks of different sizes, so the size of the data you are going to set in the cache plays a role in deciding which entry is removed. The entries are 2K, 4K, 8K, 16K... etc. up to 1M in size.
All this information can be found in the documentation, so just read it carefully. As @deceze says, memcached does not guarantee that the data will be available in memory, and you have to be prepared for a cache miss storm. One interesting approach to avoid a miss storm is to set the expiration time with some random offset, say 10 + [0..10] minutes, which means some items will be stored for 10 minutes and others for up to 20 (the goal being that not all items expire at the same time).
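The randomized-offset idea is a one-liner; the figures below are the example ones from the paragraph above, and the function name is an assumption:

```php
<?php
// Sketch of TTL jitter: a 10-minute base plus up to 10 random minutes,
// so entries written together do not all expire at the same moment.

function jittered_ttl($base = 600, $spread = 600) {
    return $base + mt_rand(0, $spread);
}

// usage (assuming a connected $memcache):
// $memcache->set($key, $value, 0, jittered_ttl());
```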
And if you want to preserve something in the cache, you have to do two things:
a warm-up script that regularly requests the data from the cache, so it is always recently used;
two expiration times for the item: one real expiration time, let's say 30 minutes, and another, cached along with the item, a logical expiration time, let's say 10 minutes. When you retrieve the data from the cache, you check the logical expiration time, and if it has passed, you reload the data and set it in the cache for another 30 minutes. This way you never hit the real cache expiration time, and the data is periodically refreshed.
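The two-expiry scheme might be sketched as follows. `ArrayCache` is an in-memory stand-in for a Memcache connection (it ignores TTLs), and the function names and figures are assumptions matching the example above:

```php
<?php
// Sketch of logical vs. real expiry: memcache enforces a real TTL of
// 30 minutes, while the entry carries its own 10-minute logical
// deadline; once that passes, a reader reloads while the stale copy
// is still safely in the cache.

class ArrayCache {
    private $data = [];
    public function get($key) { return $this->data[$key] ?? false; }
    public function set($key, $value, $flags = 0, $ttl = 0) {
        $this->data[$key] = $value;
        return true;
    }
}

const REAL_TTL = 1800;   // what memcache enforces
const LOGICAL_TTL = 600; // when we choose to refresh

function store($cache, $key, $value) {
    $entry = ['refresh_at' => time() + LOGICAL_TTL, 'value' => $value];
    $cache->set($key, $entry, 0, REAL_TTL);
}

function load($cache, $key, callable $reload) {
    $entry = $cache->get($key);
    if ($entry === false || time() >= $entry['refresh_at']) {
        $value = $reload(); // hits the real data source
        if ($value !== false) {
            store($cache, $key, $value);
            return $value;
        }
    }
    return $entry === false ? false : $entry['value'];
}
```

If the reload fails, the stale copy is still served until the real TTL finally runs out, which is the whole point of the scheme.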
5) What is the significance of the -k option in memcached.conf? I am not able to understand what "lock down all paged memory" means. Also, the description in README is not sufficient.
No matter how much memory you allow memcached, it will use only the amount it needs, i.e. it allocates only the memory actually used. With the -k option, however, the entire amount is reserved when memcached starts, so it always allocates the whole chunk of memory, whether it needs it or not.

PHP Memcache potential problems?

I'll most probably be using MemCache for caching some database results.
As I haven't ever written or done caching, I thought it would be a good idea to ask those of you who have already done it. The system I'm writing may have concurrent scripts running at some point in time. This is what I'm planning on doing:
1. I'm writing a banner exchange system.
2. The information about banners is stored in the database.
3. There are different sites, with different traffic, loading a PHP script that generates the code for those banners (so that the banners are displayed on the client's site).
4. When a banner is displayed for the first time, it gets cached with memcache.
5. The banner cache has a lifetime of, for example, 1 hour.
6. Every hour the cache is renewed.
The potential problem I see in this task is at steps 4 and 6.
If we have, for example, 100 sites with heavy traffic, the script may have several instances running simultaneously. How can I guarantee that when the cache expires it'll get regenerated only once and the data will be intact?
How can I guarantee that when the cache expires it'll get regenerated only once and the data will be intact?
The approach to caching I take is, for lack of a better word, a "lazy" implementation. That is, you don't cache something until you retrieve it once, with the hope that someone will need it again. Here's the pseudo code of what that algorithm would look like:
// returns false if there is no value or the value is expired
$result = cache_check($key);
if ($result === false)
{
    $result = fetch_from_db();
    // set it for next time, until it expires anyway
    cache_set($key, $result, $expiry);
}
This works pretty well for what we want to use it for, as long as you use the cache intelligently and understand that not all information is the same. For example, in a hypothetical user comment system, you don't need an expiry time because you can simply invalidate the cache whenever a new user posts a comment on an article, so the next time comments are loaded, they're recached. Some information however (weather data comes to mind) should get a manual expiry time since you're not relying on user input to update your data.
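The invalidate-on-write idea for the hypothetical comment system might look like this; `ArrayCache` stands in for a Memcache connection, and the key naming and the array-backed `$db` are illustrative assumptions:

```php
<?php
// Sketch: invalidate rather than expire. Posting a comment deletes
// the cached comment list, so the next read recaches it.

class ArrayCache {
    private $data = [];
    public function get($key) { return $this->data[$key] ?? false; }
    public function set($key, $value, $flags = 0, $ttl = 0) {
        $this->data[$key] = $value;
        return true;
    }
    public function delete($key) {
        unset($this->data[$key]);
        return true;
    }
}

function post_comment($cache, array &$db, $article_id, $comment) {
    $db[$article_id][] = $comment;          // stand-in for the INSERT
    $cache->delete("comments:$article_id"); // next read repopulates
}
```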
For what it's worth, memcache works well in a clustered environment, and you should find that setting something like that up isn't hard to do, so this should scale pretty easily to whatever you need it to be.

PHP Memcache Key Expire

I am not sure if this is possible. I was storing some information in a memcache server. The entry I was storing was supposed to have an expiry of 30 minutes. During those 30 minutes I could update the value of that entry, referenced by the same key. But when I update the value I do not want to change the expire time. For example:
The key is created and set to expire in 30 minutes.
10 minutes go by, the value of the key is requested, and we change the value.
I replace the value of the key using Memcache::replace (I do not provide a new expire time because it is optional). I want the expire time to be 30 - 10 = 20 minutes, because the key was created 10 minutes ago and was set to expire in 30 minutes.
But since I did not set an expire time, it defaults to 0 and the key will never expire.
Now, is there a way of setting items in memcache with an expire time, and then getting/replacing the item while keeping the expire time at x minutes after I originally set the item in the cache?
I might possibly be able to use a unix timestamp instead of seconds-to-expire when setting into memcache, also store that timestamp with the value, and when I set the item back into memcache, just reuse the timestamp stored with the value. Or is there a better way of doing this?
BTW, I am using Memcache, not Memcached.
I know this question is old, but I thought I'd add a caution to gprime's solution.
It sounds like gprime's "little ugly hack" is to store the expiration as a separate value in memcache. The problem is, memcache may end up purging the expiry value while it is still needed. This can happen even when the memory allocated to memcached is not full.
(See http://sparklewise.com/?p=506 for further explanation.)
This could be a problem if your code doesn't account for the possibility that the previously-stored expiry is gone. Even if you do account for that, you could end up with values living longer than expected in the cache.
It's probably not a huge deal in 99.999% of the cases, but it's one of those gotchas that will cause massive hair-pulling and head-scratching when it does happen. Hopefully this post will help someone avoid that pain. :-)
Essentially memcache does exactly what you want it to. It does its job very well: getting and setting values. I think the answer you're looking for is outside the default functionality of memcache. You could put more control in your codebase: check a timestamp that you store with your blob and use it to set expire times for future updates.
I don't know what you're using for your non-memcache persistent storage, but I would store an expiry date there, then use that value to update your memcache.
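The timestamp approach discussed above could be sketched as follows. The helper names are assumptions, and `ArrayCache` is an in-memory stand-in for Memcache (it ignores the expire argument). The key fact it relies on: memcache treats expire values larger than 30 days as absolute unix timestamps, so passing the stored deadline back as the expire argument preserves the original expiry.

```php
<?php
// Sketch: store the absolute unix expiry inside the value, and reuse
// it as the expire argument on replace, so the deadline never moves.

class ArrayCache {
    private $data = [];
    public function get($key) { return $this->data[$key] ?? false; }
    public function set($key, $value, $flags = 0, $ttl = 0) {
        $this->data[$key] = $value;
        return true;
    }
}

function set_with_deadline($cache, $key, $value, $ttl) {
    $deadline = time() + $ttl; // absolute timestamp, > 30 days from epoch
    $cache->set($key, ['expires_at' => $deadline, 'value' => $value],
                0, $deadline);
    return $deadline;
}

function replace_keeping_deadline($cache, $key, $value) {
    $entry = $cache->get($key);
    if ($entry === false) {
        return false; // entry already expired; nothing to preserve
    }
    // Reuse the stored absolute timestamp as the expire argument.
    return $cache->set($key, ['expires_at' => $entry['expires_at'],
                              'value' => $value], 0, $entry['expires_at']);
}
```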

Memcached: is this OK?

I'm new to memcached.
Is this code vulnerable to the expired cache race condition?
How would you improve it?
$memcache = new Memcache;
$connected = $memcache->connect('127.0.0.1');
$arts = $connected ? $memcache->get($qparams) : FALSE;
if ($arts === FALSE) {
    $arts = fetchdb($q, $qparams);
    if ($connected) {
        $memcache->add($qparams, $arts, MEMCACHE_COMPRESSED, 60*60*24*3);
    }
}
if ($arts !== FALSE) {
    // do stuff
} else {
    // empty dataset
}
$qparams contains the parameters to the query, so I'm using it as the key.
$arts gets an array with all the fields I need for every item.
Let's say that query X gets 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
What should I do about the cache?
How can I know if row #50 is cached?
Should I invalidate ALL the entries in the cache? (Sounds like overkill to me.)
Is this code vulnerable to the expired cache race condition? How would you improve it?
Yes. Two (or more) simultaneous clients can try to fetch the same key, miss, and all end up pulling it from the database. You will have spikes on the database, and for periods of time the database will be under heavy load. This is called a cache stampede. There are a couple of ways to handle this:
For new items, preheat the cache (basically, preload the objects you require before the site goes live).
For items that expire periodically, create an expire time that is a bit further in the future than the actual expire time (let's say 5-10 minutes). Then, when you pull the object from the cache, check if the expire time is close; if it is, re-cache it further into the future to prevent any other client from updating it, and update it from the database. For this to work with no cache stampedes you would need to implement either key locking or CAS tokens (the latter requires the latest client library).
For more info check the memcached faq.
Let's say that query X gets 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
You have three types of data in cache:
Objects
Lists of Objects
Generated data
What I usually do is keep the objects under separate keys and then use cache "pointers" in lists. In your case you have N objects somewhere in the cache (let's say under keys 1, 2, ..., N), and then you have your list of objects in an array: array(1, 2, 3, 10, 42, ...). When you decide to load the list with objects, you load the list key from the cache, then load the actual objects from the cache (using getMulti to reduce requests). This way, if any of the objects gets updated, you update it in one spot only and it is automatically updated everywhere (not to mention that you save a huge amount of space with this technique).
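The pointer-list layout might be sketched as below. The key names (`obj:N`, `list:recent`) are illustrative assumptions, and `ArrayCache` stands in for a Memcache connection whose `get()` also accepts an array of keys (the procedural-client equivalent of `getMulti`):

```php
<?php
// Sketch: objects live under their own keys, lists hold only ids,
// and one multi-get loads all the objects in a single round trip.

class ArrayCache {
    private $data = [];
    public function get($key) {
        if (is_array($key)) { // multi-get
            $found = [];
            foreach ($key as $k) {
                if (isset($this->data[$k])) {
                    $found[$k] = $this->data[$k];
                }
            }
            return $found;
        }
        return $this->data[$key] ?? false;
    }
    public function set($key, $value, $flags = 0, $ttl = 0) {
        $this->data[$key] = $value;
        return true;
    }
}

function load_list($cache, $list_key) {
    $ids = $cache->get($list_key); // e.g. [1, 2, 42]
    if ($ids === false) {
        return false;
    }
    $keys = array_map(function ($id) { return "obj:$id"; }, $ids);
    return $cache->get($keys); // one round trip for all objects
}
```

Updating an object means rewriting only its `obj:N` key; every list that points at it picks up the new value automatically.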
Edit: Decided to add a bit more info regarding the lookahead time expiration.
You set up your object with an expiration date x and save it into the cache with an expiration of x + 5 minutes. These are the steps you take when you load the object from the cache:
Check if it is time to update (time() >= x).
If so, lock the key so nobody can update it while you are refreshing the item. If you cannot lock the key, then somebody else is already updating it, and it becomes a SEP (Somebody Else's Problem). Since memcached has no built-in solution for locks, you have to devise your own mechanism. I usually do this by adding a separate key consisting of the original key's name with ":lock" appended. You must set this key to expire in the shortest time possible (for memcached that is 1 second).
If you obtained a lock on the key, first save the object with a new expiration time (this way you are sure no other clients will try to lock the key), then go about your business, update the key from the database, and save the new value again with the appropriate lookahead expiration (see point 1).
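The lock key from step 2 can be sketched with `add()`, which only one client can win because it fails if the key already exists. `ArrayCache` is an in-memory stand-in for Memcache, and the helper names are assumptions:

```php
<?php
// Sketch of the ":lock" sibling key: created with add() so only one
// client wins; the 1-second TTL (memcached's shortest) frees the lock
// even if the winning worker crashes mid-refresh.

class ArrayCache {
    private $data = [];
    public function add($key, $value, $flags = 0, $ttl = 0) {
        if (isset($this->data[$key])) {
            return false; // add() fails if the key already exists
        }
        $this->data[$key] = $value;
        return true;
    }
    public function delete($key) {
        unset($this->data[$key]);
        return true;
    }
}

function try_lock($cache, $key) {
    return $cache->add("$key:lock", 1, 0, 1);
}

function unlock($cache, $key) {
    $cache->delete("$key:lock");
}
```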
Hope this clears everything up :)
You have to invalidate any cached object that contains a modified item. Either you have to modify the cache mechanism to store items at a more granular level, or invalidate the entire entry.
It's basically the same as saying you're caching the entire DB in a single cache-entry. You either expire it or you don't.
