I have a very extensive caching system in place for each and every API call. A unique fingerprint is created from each command, its request parameters, and a specific timeout.
When a request is made and no cache entry with an acceptable timestamp is found, the request is executed without the cache being used, and the program works everything out by itself. The result is then stored in the cache with a new timestamp.
If a request declares that it is willing to accept a 5-minute-old cache, and the system finds such an entry, then the system returns the result from the cache.
This means that each cache record for me includes a key (the unique fingerprint), the result, and a timestamp for when it was made.
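For illustration, the fingerprint is built roughly like this (a simplified sketch; the helper name is made up):

// Simplified sketch of the fingerprint (names are illustrative):
function cacheFingerprint($command, array $params) {
    return md5($command . '|' . serialize($params));
}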
Currently the cache is stored in the filesystem, and the timestamp is the file modification time, which causes I/O requests that are a killer under higher loads.
Having read multiple articles, I gathered that Memcache and Memcached are recommended for reducing these calls.
But Memcache and Memcached only store a fingerprint and a value. There is no timestamp, which technically means that I would lose the on-demand cache-age acceptance and its flexibility. I would technically have to start storing two records per cache entry:
Fingerprint-data and Data
Fingerprint-time and Timestamp
...which seems dirty. Are there any alternatives?
If you know at the time of creation how long your cached objects should last in the cache, then Memcached has the functionality you need. The Memcache::set function has a parameter called $expire, where you can set the lifetime of the cached object in seconds.
If you only know the lifetime when you retrieve the object from cache, this will not work.
I agree that using two keys per cached entity is not feasible, because the cache could lose one of the two while keeping the other.
A (still "dirty", but better) solution could be to store a timestamp with each object you put in the cache. You could do this by not caching the objects directly, but rather an array containing the timestamp and the object.
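For example, something like this (a minimal sketch; $fingerprint and runRequest() are placeholders):

$memcache = new Memcache;
$memcache->connect('127.0.0.1');

// Each request decides how old a cache entry it is willing to accept.
$maxAge = 300; // this request accepts a 5-minute-old cache
$entry  = $memcache->get($fingerprint);
if ($entry !== FALSE && $entry['created'] >= time() - $maxAge) {
    $result = $entry['data'];   // acceptable hit
} else {
    $result = runRequest();     // hypothetical: recompute the result
    // Store the creation timestamp alongside the value.
    $memcache->set($fingerprint, array('created' => time(), 'data' => $result), 0, 0);
}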
Following the zend_disk_cache_store documentation about the last parameter: "The Data Cache keeps objects in the cache as long as the TTL is not expired. Once the TTL is expired, the object is removed from the cache. The default value is 0."
The documentation does not explicitly say whether the data is removed from disk or just ignored by Zend. From my testing, it is not removed from disk. Is there any resource on Zend that makes sure the cache is removed from disk?
The Data Cache Lock-On-Expire feature reduces the load spike of a busy application by guaranteeing that the application gathers an expired piece of data from the data source only once, avoiding a situation where multiple PHP processes simultaneously detect that the data in the cache has expired and repeatedly run high-cost operations.
How does it work?
When a stored Data Cache entry expires, the following process takes place:
1. The first attempt to fetch it receives a 'false' response.
2. All subsequent requests receive the expired object stored in the Data Cache, for a duration of 120 seconds.
3. During this period, the PHP script that received the 'false' response generates an updated data entry and stores it in the Data Cache under the same key.
4. As soon as the updated data entry is created, it is returned to subsequent fetch requests.
If this does not occur within the 120-second period, the entire process (1-4) repeats itself.
More here:
http://files.zend.com/help/Zend-Server/zend-server.htm#working_with_the_data_cache.htm
I am using CI's sessions with a database, so all of our sessions are in the ci_sessions table, and it can accumulate a lot of rows, considering that the session_id changes every 5 minutes.
Do we need to empty the table, say, once a month / week maybe?
While what @Marc-Audet said is true, if you take a look at the code, you can see it is a really lousy way to clean up sessions.
The constructor calls the _sess_gc function every time it is initialized, so basically on each request to your server if you have the library autoloaded.
Then it generates a random number below 100 and checks whether it is below a certain value (by default, 5). If that condition is met, it removes any rows in the session table whose last_activity value is less than the current time minus your session expiration.
While this works for most cases, it is technically possible (if the world is truly random) that the generator does not produce a number below 5 for a long time, in which case your sessions will not be cleaned up.
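In PHP terms, the logic boils down to something like this (a simplified sketch of the idea, not CI's actual source):

// Simplified sketch of CI2-style probabilistic session garbage collection.
function maybe_gc_sessions(PDO $db, $gc_probability = 5, $sess_expiration = 7200)
{
    // Run the cleanup only ~$gc_probability% of the time.
    if (mt_rand(0, 99) < $gc_probability) {
        $stmt = $db->prepare('DELETE FROM ci_sessions WHERE last_activity < ?');
        $stmt->execute(array(time() - $sess_expiration));
    }
}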
Also, if you have your session expiry set to a long time (if you set it to 0, CI will set it to 2 years), those rows are not going to get deleted anyway. And if your site is good enough to get a decent number of visitors, your DBA will be pointing fingers at the session table some time soon :)
It works for most cases, but I would not call it a proper solution. Their session id regeneration really should have been built to remove the records pertaining to the previous ids, and the garbage collection really should not be left to a random number; in theory, it is possible that the required number is not generated as frequently as you would wish.
In our case, I removed the session garbage collection from the session library and I take care of it manually once a day (with a cron job and a reasonable session expiration time). This reduces the number of unnecessary hits to the DB and does not leave a massive table in the DB. It is still a big table, but a lot smaller than it used to be.
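A minimal sketch of such a daily cleanup (assuming the default ci_sessions schema; the credentials and schedule are placeholders):

<?php
// cleanup_sessions.php, run from cron once a day, e.g.:
//   0 4 * * * /usr/bin/php /path/to/cleanup_sessions.php
$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'pass');
$sess_expiration = 7200; // keep in sync with your session expiration setting
$db->prepare('DELETE FROM ci_sessions WHERE last_activity < ?')
   ->execute(array(time() - $sess_expiration));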
Given that the OP's question doesn't have a CodeIgniter 2 tag, I'll answer how to deal with session cleanup when the database keeps growing, for CodeIgniter 3.
Issue:
When you set (in the config.php file) the sess_expiration key too high (let's say 1 year) and the sess_time_to_update key low (let's say 5 minutes), the session table will keep growing as users browse through your website, until the session rows expire and are garbage collected (which takes 1 year).
Solution:
Setting the sess_regenerate_destroy key to TRUE (it defaults to FALSE) will delete the old session when it regenerates itself with a new id, thus cleaning your table automatically.
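In config.php that looks something like this (the values are just examples):

// application/config/config.php
$config['sess_expiration']         = 31536000; // 1 year
$config['sess_time_to_update']     = 300;      // 5 minutes
$config['sess_regenerate_destroy'] = TRUE;     // drop the old row on id regeneration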
No, CodeIgniter cleans up after itself...
Note
According to the CodeIgniter documentation:
The Session class has built-in garbage collection which clears out expired sessions so you do not need to write your own routine to do it.
CodeIgniter's Session class probably checks the session table and cleans up expired entries. However, the documentation does not say when the cleanup happens. Since there are no cron jobs as part of CodeIgniter, the cleanup must occur when the Session class is invoked. I suppose if the site remains idle forever, the session table will never be cleared, but that would be an unusual case.
CodeIgniter implements the SessionHandlerInterface (see the docs for the custom driver).
CodeIgniter defines a garbage collector method named gc() for each driver (database, file, redis, etc) or you can define your custom gc() for your custom driver.
The gc() method is registered with PHP via the session_set_save_handler() function, so the garbage collector is called internally by PHP based on the session.gc_divisor and session.gc_probability settings.
For example, with the following settings:
session.gc_probability = 1
session.gc_divisor = 100
There is a 1% chance that the garbage collector process starts on each request.
So, you do not need to clean the session table if your settings are properly set.
When you call:
$this->session->sess_destroy();
it deletes the session information from the database by itself.
Since PHP 7, the probability-based GC method is disabled by default on some systems, as per the documentation at https://www.php.net/manual/en/function.session-gc.php. I stumbled upon this because a legacy application suddenly stopped working, hitting a system limitation because sessions were never cleaned up. A cron job to clean up the sessions would be a good idea...
It is always good practice to clear the table. Otherwise, if you're querying the session data, say for creating reports or something, it will be slow and unreliable. Nevertheless, given the performance of MySQL, yes, do so.
From the memcached wiki:
When the table is full, subsequent inserts cause older data to be purged in least recently used (LRU) order.
I have the following questions:
Which data will be purged: the data that is oldest by insertion, or the data that was least recently used? For example, if d1 is the oldest item by insertion but was accessed recently, and the cache is full, will memcached replace d1 when storing new data?
I am using PHP to interact with memcached. Can I control how data is replaced in memcached? For example, I do not want some of my data to be replaced until it expires, even if the cache is full; other data should be removed instead to make room.
When data expires, is it removed immediately?
What is the impact of the number of keys stored on memcached performance?
What is the significance of the -k option in memcached.conf? I am not able to understand what "lock down all paged memory" means, and the description in the README is not sufficient.
When memcached needs to store new data in memory and the memory is already full, here is what happens:
memcached searches for a suitable* expired entry, and if one is found, it replaces the data in that entry. This answers point 3): data is not removed immediately; its space is reallocated when new data needs to be set
if no expired entry is found, the entry that was least recently used is replaced
*Keep in mind how memcached deals with memory: it allocates blocks of different sizes, so the size of the data you are going to set in the cache plays a role in deciding which entry is removed. The blocks are 2K, 4K, 8K, 16K... etc., up to 1M in size.
All this information can be found in the documentation, so just read it carefully. As @deceze says, memcached does not guarantee that the data will be available in memory, and you have to be prepared for a cache-miss storm. One interesting approach to avoiding a miss storm is to set the expiration time with a random offset, say 10 + [0..10] minutes, which means some items will be stored for 10 minutes and others for up to 20 (the goal being that not all items expire at the same time).
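For example (a sketch; the base TTL and offset are arbitrary):

// Base TTL of 10 minutes plus a random 0-10 minute offset per item,
// so entries written together do not all expire together.
$ttl = 600 + mt_rand(0, 600);
$memcache->set($key, $value, 0, $ttl);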
And if you want to preserve something in the cache, you have to do two things:
a warm-up script that asks the cache for the data periodically, so it is always recently used
two expiration times for the item: one real expiration time, let's say 30 minutes, and another, cached along with the item, a logical expiration time, let's say 10 minutes. When you retrieve the data from the cache, you check the logical expiration time, and if it has expired, you reload the data and set it in the cache for another 30 minutes. This way you will never hit the real cache expiration time, and the data will be periodically refreshed.
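A sketch of the second idea (the key and rebuildReport() are placeholders):

$hardTtl = 1800; // real memcached expiry: 30 minutes
$softTtl = 600;  // logical expiry stored with the item: 10 minutes

$item = $memcache->get('report');
if ($item === FALSE || $item['expires'] < time()) {
    $data = rebuildReport(); // hypothetical reload from the source
    $memcache->set('report', array('expires' => time() + $softTtl, 'data' => $data), 0, $hardTtl);
} else {
    $data = $item['data'];
}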
5) What is the significance of the -k option in memcached.conf? I am not able to understand what "Lock down all paged memory" means. Also the description in the README is not sufficient.
No matter how much memory you allow memcached to use, it normally allocates only the amount it actually needs. With the -k option, however, the entire amount is reserved when memcached is started, so it always allocates the whole memory, whether it needs it or not.
I'm caching tweets on my site (with a 30-minute expiration time). When the cache is empty, the first user to find out will repopulate it.
However, at that time the Twitter API may not return a 200. In that case I'd like to prolong the previous data for another 30 minutes. But the previous data will already be lost.
So instead I'd like to look into repopulating the cache, say, 5 minutes before the expiration time, so that I don't lose any data.
So how do I know the expiration time of an item when using php's memcache::get()?
Also, is there a better way of doing this?
In that case, isn't this the better logic?
If the cache is older than 30 minutes, attempt to pull from Twitter
If new data was successfully retrieved, overwrite the cache
Cache data for an indefinite amount of time (or much longer than you intend to cache anyway)
Note the last time the cache was updated (current time) in a separate key
Rinse, repeat
The point being: only replace the data with something new if you actually have it; don't let the old data be thrown away automatically.
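A sketch of that flow (the key names and fetch function are made up):

$tweets  = $memcache->get('tweets');
$updated = $memcache->get('tweets_updated_at');

if ($updated === FALSE || $updated < time() - 1800) { // cache older than 30 min?
    $fresh = fetchTweetsFromTwitter();                // hypothetical API call
    if ($fresh !== FALSE) {
        $memcache->set('tweets', $fresh, 0, 0);       // 0 = no automatic expiry
        $memcache->set('tweets_updated_at', time(), 0, 0);
        $tweets = $fresh;
    }
    // On API failure we simply keep serving the old $tweets.
}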
Don't store critical data in memcached; it guarantees nothing.
If you always need the "latest good" cache, you need to store the data in some persistent storage as well, such as a database or a flat file.
In that case, if nothing is found in the cache, you make the Twitter API request. If it fails, you read the data from persistent storage. On the next HTTP request you make the same iteration one more time.
Or you can put the data from persistent storage into memcache with a pretty short lifetime, a few minutes for example (1-5), to give the Twitter servers time to get healthy, and after it expires, repeat the request.
When you put your data into memcache, you also set how long the cache is valid. So theoretically you could also store the time when the cache entry was created and/or when it will expire. Later, after fetching from the cache, you can always check how much time is left until it expires and decide what to do.
But letting the cache be repopulated on user visits can still be risky at some point. Let's say you want to repopulate the cache when it reaches ~5 minutes before its expiration time, but suddenly no visitors come in during the last 6 minutes before the cache expires; then the cache will still expire and no one will cause it to be repopulated. If you want to be sure a cache entry always exists, you need to check periodically, for example with a cron job that does cache checks and fill-ups.
I'm new to memcached.
Is this code vulnerable to the expired cache race condition?
How would you improve it?
$memcache = new Memcache;
$connected = $memcache->connect('127.0.0.1');
$arts = ($connected === FALSE) ? FALSE : $memcache->get($qparams);
if ($arts === FALSE) {
    $arts = fetchdb($q, $qparams);
    $memcache->add($qparams, $arts, MEMCACHE_COMPRESSED, 60*60*24*3);
}
if ($arts !== FALSE) {
// do stuff
} else {
// empty dataset
}
$qparams contains the parameters to the query, so I'm using it as the key.
$arts gets an array with all the fields I need for every item.
Let's say that query X returns 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
What should I do about the cache?
How can I know whether row #50 is cached?
Should I invalidate ALL the entries in the cache? (Sounds like overkill to me.)
Is this code vulnerable to the expired cache race condition? How would you improve it?
Yes. If two (or more) simultaneous clients try to fetch the same key, miss, and each ends up pulling it from the database, you will get spikes on the database, and for periods of time the database will be under heavy load. This is called a cache stampede. There are a couple of ways to handle it:
For new items, preheat the cache (this basically means that you preload the objects you require before the site goes live).
For items that expire periodically, keep a logical expiry time with the item and give the memcached entry a real expiry a bit further in the future (let's say 5-10 minutes). Then, when you pull the object from the cache, check whether the logical expiry is close; if it is, push it into the future to prevent any other client from updating the cache, and update the object from the database. For this to work with no cache stampedes you would need to either implement key locking or use cas tokens (which requires the latest client library).
For more info check the memcached faq.
Let's say that query X returns 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
You have three types of data in cache:
Objects
Lists of Objects
Generated data
What I usually do is keep the objects under separate keys and then use cache "pointers" in lists. In your case you have N objects somewhere in the cache (let's say under keys 1, 2, ..., N), and then you have your list of objects in an array: array(1, 2, 3, 10, 42, ...). When you decide to load the list with objects, you load the list key from the cache, then load the actual objects from the cache (using getMulti to reduce requests). This way, if any of the objects gets updated, you update it in one spot only and it is automatically updated everywhere (not to mention that you save a huge amount of space with this technique).
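With the Memcached client that looks roughly like this (the key names are made up):

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);

$ids = $mc->get('articles:frontpage');            // e.g. array(1, 2, 3, 10, 42)
if ($ids !== FALSE) {
    $keys     = array_map(function ($id) { return "article:$id"; }, $ids);
    $articles = $mc->getMulti($keys);             // one round trip for all objects
}
// Updating article 42 now means rewriting only the "article:42" key;
// every list that points to it picks up the new version automatically.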
Edit: I decided to add a bit more info regarding the lookahead expiration.
You set up your object with a logical expiration time x and save it into the cache with a real expiration time of x + 5 minutes. These are the steps you take when you load the object from the cache:
Check whether it is time to update (time() >= x).
If so, lock the key so nobody can update it while you are refreshing the item. If you cannot lock the key, then somebody else is already updating it, and it becomes a SEP (Somebody Else's Problem). Since memcached has no solution for locks, you have to devise your own mechanism. I usually do this by adding a separate key whose name is the original key's name + ":lock". You must set this key to expire in the shortest time possible (for memcached that is 1 second).
If you obtained a lock on the key, you first save the object with a new expiration time (this way you are sure no other clients will try to lock the key), then go about your business, update the object from the database, and save the new value again with the appropriate lookahead expiration (see point 1).
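Putting these steps together (a sketch; the key names, TTLs, and rebuildReportFromDb() are made up; $mc is a connected Memcached instance as in the earlier sketch):

$item = $mc->get('report'); // stored as array('expires' => x, 'data' => ...)
if ($item !== FALSE && $item['expires'] <= time()) {
    // Memcached::add() is atomic: it fails if the key already exists,
    // so only one client wins the right to refresh.
    if ($mc->add('report:lock', 1, 1)) {          // 1-second lock key
        // Push the logical expiry forward first so other clients back off...
        $item['expires'] = time() + 60;
        $mc->set('report', $item, 300);
        // ...then rebuild from the database and store with a fresh lookahead expiry.
        $data = rebuildReportFromDb();            // hypothetical
        $mc->set('report', array('expires' => time() + 240, 'data' => $data), 300);
        $mc->delete('report:lock');
    }
    // Clients that lost the lock just serve the stale $item['data'] (SEP).
}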
Hope this clears everything up :)
You have to invalidate any cached object that contains a modified item. Either modify the cache mechanism to store items at a more granular level, or invalidate the entire entry.
It's basically the same as saying you're caching the entire DB in a single cache-entry. You either expire it or you don't.