Following the zend_disk_cache_store documentation about the last parameter: "The Data Cache keeps objects in the cache as long as the TTL is not expired. Once the TTL is expired, the object is removed from the cache. The default value is 0."
The documentation does not explicitly say whether the data is removed from disk or merely ignored by Zend. From my testing, it is not removed from disk. Is there any resource on Zend to make sure the cache is removed from disk?
The Data Cache Lock-On-Expire feature reduces the load spike of a busy application by guaranteeing that an application gathers an expired piece from the data source only once, and by avoiding a situation where multiple PHP processes simultaneously detect that the data in the cache has expired, and repeatedly run high-cost operations.
How does it work?
When a stored Data Cache entry expires, the following process takes place:
1. The first attempt to fetch it receives a 'false' response.
2. All subsequent requests receive the expired object stored in the Data Cache, for a duration of 120 seconds.
3. During this time period, the PHP script that received the 'false' response generates an updated data entry and stores it in the Data Cache under the same key.
4. As soon as the updated data entry is created, it is returned to subsequent fetch requests.
If this does not occur within the 120-second window, the entire process (1-4) repeats itself.
More here:
http://files.zend.com/help/Zend-Server/zend-server.htm#working_with_the_data_cache.htm
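A minimal sketch of the consumer side under this scheme, assuming the Zend Data Cache API (zend_disk_cache_fetch / zend_disk_cache_store) and a hypothetical rebuild_report() helper:
$data = zend_disk_cache_fetch('myapp::report');
if ($data === false) {
    // This process drew the 'false' response, so it rebuilds the entry;
    // other processes keep receiving the expired copy for up to 120 seconds.
    $data = rebuild_report(); // hypothetical expensive operation
    zend_disk_cache_store('myapp::report', $data, 300); // TTL of 5 minutes
}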
I have a question regarding speed and latency when showing real-time data.
Let's assume I want to show real-time data to users by firing an AJAX request every second that gets data from a MySQL table with a simple SELECT query.
Currently, these two options come to mind:
MySQL / Amazon Aurora
File system
Which of these options would be better? Or is there another solution?
From practical testing: if we open one page in the browser, the AJAX requests respond in less than 500 ms on a PHP, MySQL, Nginx stack.
But if we open more pages, the same AJAX requests take more than 1 second, while the response should stay below 500 ms for every visitor.
So as visitors increase, the AJAX responses degrade badly.
I also tried Node.js + MySQL, but got the same result.
Is it a good idea to create JSON files for the records and fetch the data from those files? Or is there another solution?
Indeed, you have to use a database to store the actual data, but you can easily add a memory cache (it could be an internal dictionary or a separate component) to track updates.
Then your typical AJAX request will look something like this:
Memcache, do we have anything new for user 123?
The last update was 10 minutes ago.
Aha, so nothing new; let's return null.
When you write data:
Put the data into the database.
Update the last-updated time for clients in memcache.
The actual key might be different, e.g. a chat room id. The idea is to read the database only when updates have actually happened.
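A minimal PHP sketch of this read/write pattern, assuming the pecl/memcache client; the key scheme lastupdated:123 and the database helpers are hypothetical:
$memcache = new Memcache;
$memcache->connect('127.0.0.1');

// Read path: consult memcache before touching the database
$lastUpdated = $memcache->get('lastupdated:123');
if ($lastUpdated !== false && $lastUpdated <= $clientLastSeen) {
    return null; // nothing new for user 123 since their last poll
}
$rows = fetch_rows_from_db(123); // hypothetical query, runs only on real updates

// Write path: store the data, then bump the marker
store_rows_in_db(123, $newRows); // hypothetical
$memcache->set('lastupdated:123', time());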
Level 2:
A high number of calls will burn your web server and the client's internet connection. You can do something like this:
DateTime start = DateTime.Now;
// Long poll: hold the request open for up to 30 seconds
while (DateTime.Now.Subtract(start) < TimeSpan.FromSeconds(30))
{
    if (hasUpdates) return updates; // respond as soon as data arrives
    Thread.Sleep(100);              // re-check every 100 ms
}
return null; // no updates within the window
Then the client will call the server once per 30 seconds, and it will get a response immediately when the server notices new data.
From the memcached wiki:
When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order.
I have the following questions:
1. Which data will be purged: the data that is oldest by insertion, or the data that is least recently used? I mean, if recently accessed data d1 is the oldest by insertion and the cache is full, will d1 be replaced?
2. I am using PHP to interact with memcached. Can I control how data is replaced in memcached? For example, I do not want some of my data to be replaced until it expires, even if the cache is full; other data should be removed instead to make room for the insertion.
3. When data expires, is it removed immediately?
4. What is the impact of the number of stored keys on memcached's performance?
5. What is the significance of the -k option in memcached.conf? I am not able to understand what "lock down all paged memory" means. Also, the description in the README is not sufficient.
When memcached needs to store new data in memory and the memory is already full, this is what happens:
memcached searches for a suitable* expired entry, and if one is found, it replaces the data in that entry. This answers point 3: data is not removed immediately, but its space is reallocated when new data is set.
if no expired entry is found, the entry that is least recently used is replaced.
*Keep in mind how memcached deals with memory: it allocates blocks of different sizes, so the size of the data you are about to set plays a role in deciding which entry is removed. The blocks are 2K, 4K, 8K, 16K... etc., up to 1M in size.
All this information can be found in the documentation, so just read it carefully. As @deceze says, memcached does not guarantee that the data will be available in memory, and you have to be prepared for a cache miss storm. One interesting approach to avoiding a miss storm is to set the expiration time with a random offset, say 10 + [0..10] minutes, which means some items will be stored for 10 minutes and others for up to 20 (the goal being that not all items expire at the same time).
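A one-line sketch of that random offset with the pecl/memcache client ($key and $value are illustrative):
// Spread expirations across 10-20 minutes so items do not all expire at once
$ttl = 600 + mt_rand(0, 600); // 10 minutes + [0..10] minutes, in seconds
$memcache->set($key, $value, 0, $ttl);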
And if you want to preserve something in the cache, you have to do two things:
a warm-up script that asks the cache to load the data, so that it is always recently used;
two expiration times per item: a real expiration time, let's say 30 minutes, and a logical expiration time, cached along with the item, let's say 10 minutes. When you retrieve the data from the cache, you check the logical expiration time, and if it has expired, you reload the data and set it in the cache for another 30 minutes. This way you never hit the real cache expiration time, and the data is refreshed periodically.
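A minimal sketch of the two-expiration-times idea, assuming the pecl/memcache client and a hypothetical load_from_db() helper:
$realTtl    = 30 * 60; // real memcached expiration: 30 minutes
$logicalTtl = 10 * 60; // logical expiration stored with the item: 10 minutes

$entry = $memcache->get($key);
if ($entry === false || $entry['expires'] < time()) {
    // Logically expired (or missing): refresh well before the real TTL hits
    $entry = array('expires' => time() + $logicalTtl, 'data' => load_from_db());
    $memcache->set($key, $entry, 0, $realTtl);
}
$data = $entry['data'];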
5) What is the significance of the -k option in memcached.conf? I am not able to understand what "lock down all paged memory" means. Also, the description in the README is not sufficient.
No matter how much memory you allow memcached to use, it normally takes only the amount it needs, i.e. it allocates only the memory actually used. With the -k option, however, the entire memory is reserved when memcached is started, so it always allocates the whole amount of memory, whether it needs it or not.
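For illustration, in the Debian-style memcached.conf each option goes on its own line, so enabling it is just the bare flag:
# /etc/memcached.conf
# Lock down all paged memory (requires sufficient memlock limits)
-k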
I have a very extensive caching system in place for each and every API call. A unique fingerprint is created from every command and its request parameters, together with a specific timeout.
When a request is made that does not state an acceptable cache timestamp, the request is served without the cache being consulted, so the program goes through everything by itself. The result of this is stored in the cache with a new timestamp.
If a request is made that declares it is willing to accept a 5-minute-old cache, and the system finds such an entry, then the system returns the result from the cache.
This means that each cache record includes a key (the unique fingerprint), the result, and a timestamp for when it was made.
Currently the cache is stored in the filesystem, and the timestamp is the file modification time, which causes I/O requests that are a killer under higher loads.
Having read multiple articles, I realized that Memcache and Memcached are recommended for reducing these calls.
But Memcache and Memcached only store a fingerprint and a value. There is no timestamp, which technically means that I would lose on-demand cache timestamp acceptance and its flexibility. I would technically have to start storing two records per cache entry:
Fingerprint-data and the data
Fingerprint-time and the timestamp
...which seems dirty. Are there any alternatives?
If you know at creation time how long your cached objects should last inside the cache, then Memcached has the functionality you need. The Memcache::set function has a parameter called $expire, where you can set the lifetime of the cached object in seconds.
If you only know the lifetime when you retrieve the object from cache, this will not work.
I agree that using two keys per cached entity is not feasible, because the cache could lose one of the two while keeping the other.
A (still "dirty", but better) solution could be to store a timestamp with each object you put in the cache. You could do this by not caching the objects directly, but rather an array containing the timestamp and the object.
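A minimal sketch of that wrapping approach with the pecl/memcache client; $fingerprint, $result, and $acceptableAge are illustrative names:
// Store the result together with its creation time under a single key
$memcache->set($fingerprint, array('created' => time(), 'data' => $result));

// On read, let each request decide how old an entry it will accept
$entry = $memcache->get($fingerprint);
if ($entry !== false && time() - $entry['created'] <= $acceptableAge) {
    $result = $entry['data']; // fresh enough for this request
} else {
    // recompute the result and re-store it with a new timestamp
}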
I'm caching tweets on my site (with a 30-minute expiration time). When the cache is empty, the first user to find out repopulates it.
However, at that moment the Twitter API call may fail (i.e. not return a 200). In that case I'd like to keep serving the previous data for another 30 minutes, but the previous data will already be lost.
So instead I'd like to look into repopulating the cache, say, 5 minutes before the expiration time so that I don't lose any data.
So how do I know the expiration time of an item when using PHP's memcache::get()?
Also, is there a better way of doing this?
In that case, isn't this the better logic?
If the cache is older than 30 minutes, attempt to pull from Twitter.
If new data was successfully retrieved, overwrite the cache.
Cache the data for an indefinite amount of time (or much longer than you intend to cache it anyway).
Note the last time the cache was updated (the current time) in a separate key.
Rinse, repeat.
The point being: only replace the data with something new when you actually have it; don't let the old data be thrown away automatically.
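A minimal sketch of that two-key logic with the pecl/memcache client; the key names and fetch_tweets_from_twitter() are hypothetical:
$tweets      = $memcache->get('tweets');
$lastUpdated = $memcache->get('tweets_updated');

if ($lastUpdated === false || time() - $lastUpdated > 30 * 60) {
    $fresh = fetch_tweets_from_twitter(); // hypothetical API call
    if ($fresh !== false) {
        // Only overwrite the cache when the API actually returned data
        $memcache->set('tweets', $fresh, 0, 0); // 0 = no automatic expiry
        $memcache->set('tweets_updated', time(), 0, 0);
        $tweets = $fresh;
    }
    // On failure, keep serving the stale $tweets
}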
Don't store critical data in memcached; it guarantees nothing.
If you always need the "latest good" cache, you have to keep the data in some persistent storage as well, such as a database or a flat file.
In that case, if nothing is found in the cache, you make the Twitter API request; if it fails, you read the data from persistent storage, and on the next HTTP request you go through the same iteration one more time.
Or you can put the data from persistent storage into memcache with a pretty short lifetime, a few minutes for example (1-5), to give the Twitter servers time to get healthy, and repeat the request after it expires.
When you put your data into memcache, you also set how long the cache entry is valid. So you could theoretically also store the time when the entry was created and/or when it will expire. Later, after fetching from the cache, you can always check how much time is left until the entry expires and decide what you want to do.
But letting the cache be repopulated on user visits can still be risky at some point. Let's say you want to repopulate the cache when it is ~5 minutes before the expiration time, but suddenly no visitors come in during the last 6 minutes before the cache expires; then the cache will still expire and nobody will cause it to be repopulated. If you want to be sure that the cache entry always exists, you need to do checks periodically, for example with a cronjob that performs the cache checks and fill-ups.
I'm new to memcached.
Is this code vulnerable to the expired cache race condition?
How would you improve it?
$memcache = new Memcache;
$memcache->connect('127.0.0.1');
// Use the query parameters as the cache key
$arts = ($memcache === FALSE) ? FALSE : $memcache->get($qparams);
if ($arts === FALSE) {
    // Cache miss: query the database and cache the result for 3 days
    $arts = fetchdb($q, $qparams);
    $memcache->add($qparams, $arts, MEMCACHE_COMPRESSED, 60*60*24*3);
}
if ($arts <> FALSE) {
    // do stuff
} else {
    // empty dataset
}
$qparams contains the parameters to the query, so I'm using it as the key.
$arts gets an array with all the fields I need for every item.
Let's say that query X gets 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
What should I do about the cache?
How can I know if row #50 is cached?
Should I invalidate ALL the entries in the cache? (Sounds like overkill to me.)
Is this code vulnerable to the expired cache race condition? How would you improve it?
Yes. Two (or more) simultaneous clients can try to fetch the same key, miss, and all end up pulling it from the database. You will get spikes on the database, and for periods of time it will be under heavy load. This is called a cache stampede. There are a couple of ways to handle it:
For new items, preheat the cache (basically, preload the objects you require before the site goes live).
For items that expire, create a real expiry time that is a bit further in the future than the logical expiry time (let's say 5-10 minutes). Then, when you pull the object from the cache, check if its logical expiry time is close; if so, push the stored expiry into the future to prevent any other client from updating the cache, and refresh the object from the database. For this to work with no cache stampedes you need to either implement key locking or use CAS tokens (which requires a recent client library).
For more info check the memcached faq.
Let's say that query X gets 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
You have three types of data in cache:
Objects
Lists of Objects
Generated data
What I usually do is keep the objects as separate keys and then use cache "pointers" in lists. In your case you have N objects somewhere in the cache (let's say under keys 1, 2, ..., N), and then you have your list of objects as an array: array(1, 2, 3, 10, 42, ...). When you decide to load the list with objects, you load the list key from the cache, then load the actual objects from the cache (using getMulti to reduce requests). In this scheme, if any object gets updated, you update it in one spot only and it is automatically updated everywhere (not to mention that you save a huge amount of space with this technique).
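A minimal sketch of the pointer-list technique, using the pecl/memcached client (which provides getMulti); the key names are illustrative:
$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

// The list key stores only object ids, not the objects themselves
$ids  = $mc->get('list:articles'); // e.g. array(1, 2, 3, 10, 42)
$keys = array();
foreach ($ids as $id) {
    $keys[] = "article:$id";
}

// One round trip fetches every object referenced by the list
$objects = $mc->getMulti($keys);

// Updating one object updates it for every list that points to it
$mc->set('article:42', $updatedArticle);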
Edit: I decided to add a bit more info regarding lookahead time expiration.
You set up your object with a (logical) expiration time x and save it into the cache with a real expiration time of x + 5 minutes. These are the steps you take when you load the object from the cache:
Check if it is time to update (time() >= x).
If so, lock the key so nobody else can update it while you are refreshing the item. If you cannot lock the key, then somebody else is already updating it, and it becomes a SEP (Somebody Else's Problem). Since memcached has no built-in locks, you have to devise your own mechanism. I usually do this by adding a separate key whose name is the original key's name with ":lock" appended. You should set this key to expire in the shortest time possible (for memcached that is 1 second).
If you obtained the lock on the key, you first re-save the object with a new expiration time (this way you are sure no other client will try to lock the key), then go about your business: update the value from the database and save it again with the appropriate lookahead expiration (see point 1).
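A minimal sketch of this locking scheme with the pecl/memcache client, relying on Memcache::add, which only succeeds when the key does not yet exist; the TTL values are illustrative:
$logicalTtl = 10 * 60;           // time until the item should be refreshed (x)
$realTtl    = $logicalTtl + 300; // real memcached expiry: x + 5 minutes

$entry = $memcache->get($key); // array('expires' => x, 'data' => ...)
if ($entry !== false && time() >= $entry['expires']) {
    // Try to become the single client that refreshes this item;
    // add() fails if somebody else already holds the lock
    if ($memcache->add($key . ':lock', 1, 0, 1)) { // lock dies after 1 second
        // Push the logical expiry out so other clients stop trying to lock
        $entry['expires'] = time() + 60;
        $memcache->set($key, $entry, 0, $realTtl);

        // Refresh from the database, then store with a new lookahead expiry
        $entry = array('expires' => time() + $logicalTtl,
                       'data'    => fetchdb($q, $qparams));
        $memcache->set($key, $entry, 0, $realTtl);
        $memcache->delete($key . ':lock');
    }
    // If the lock was taken, serve the stale $entry['data'] for now
}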
Hope this clears everything up :)
You have to invalidate any cached object that contains a modified item. Either you have to modify the cache mechanism to store items at a more granular level, or invalidate the entire entry.
It's basically the same as saying you're caching the entire DB in a single cache-entry. You either expire it or you don't.