I am using CI's sessions in connection with a database, so all of our sessions live in this ci_sessions table in our database, and it can accumulate a lot of rows, considering that the session_id changes every 5 minutes.
Do we need to empty the table, say, every month or week?
While what @Marc-Audet said is true, if you take a look at the code, you can see it is a really lousy way to clean up sessions.
The constructor calls the _sess_gc function every time it is instantiated - so, basically, on every request to your server if you have the session library autoloaded.
Then, it generates a random number below 100 and checks whether it is below a certain value (5 by default). If this condition is met, it removes any rows in the session table whose last_activity value is less than the current time minus your session expiration.
While this works for most cases, it is technically possible (if the world is truly random) that the random number generator does not produce a number below 5 for a long time, in which case your sessions will not be cleaned up.
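For illustration, the logic is roughly this (a simplified sketch of the _sess_gc() method in CI 2.x's Session library; table and column names are the framework defaults):

// Inside the Session library (simplified sketch)
function _sess_gc()
{
    $probability = 5; // CI ships with gc_probability = 5

    srand(time());
    if ((rand() % 100) < $probability) {
        // delete everything idle longer than sess_expiration
        $expire = time() - $this->sess_expiration;
        $this->CI->db->where('last_activity < ' . $expire);
        $this->CI->db->delete('ci_sessions');
    }
}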
Also, if you have your session expiry set to a long time (if you set it to 0, CI will set it to 2 years), those rows are not going to get deleted anyway. And if your site is good enough to get a decent amount of visitors, your DBA will be pointing fingers at the session table some time soon :)
It works for most cases, but I would not call it a proper solution. Their session id regeneration really should have been built to remove the records pertaining to the previous ids, and garbage collection really should not be left to a random number - in theory, it is possible that the required number is not generated as frequently as you wish.
In our case, I removed the garbage collection from the session library and take care of it manually once a day (with a cron job and a reasonable session expiration time). This reduces the number of unnecessary hits to the DB and does not leave a massive table in the DB either. It is still a big table, but a lot smaller than it used to be.
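Such a cron job can be as small as this sketch (connection details are placeholders; it assumes the default ci_sessions table with a last_activity column):

// cleanup_sessions.php - scheduled once a day, e.g. in crontab:
// 0 3 * * * php /path/to/cleanup_sessions.php
$db = new PDO('mysql:host=localhost;dbname=myapp', 'user', 'password');

$expiration = 7200; // keep this in sync with your sess_expiration setting

$stmt = $db->prepare('DELETE FROM ci_sessions WHERE last_activity < ?');
$stmt->execute(array(time() - $expiration));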
Given that the OP's question doesn't have a CodeIgniter 2 tag, I'll answer how to deal with session cleanup when the database keeps growing, for CodeIgniter 3.
Issue:
When you set (in the config.php file) the sess_expiration key too high (let's say 1 year) and the sess_time_to_update key low (let's say 5 minutes), the session table will keep growing as users browse through your website, until the session rows expire and are garbage collected (which takes 1 year).
Solution:
Setting the sess_regenerate_destroy key to TRUE (it defaults to FALSE) will delete an old session when it regenerates itself with a new id, thus cleaning your table automatically.
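In config.php, that combination would look something like this (the values are just examples):

// application/config/config.php (CodeIgniter 3)
$config['sess_driver']             = 'database';
$config['sess_save_path']          = 'ci_sessions';
$config['sess_expiration']         = 31536000; // 1 year
$config['sess_time_to_update']     = 300;      // regenerate the session id every 5 minutes
$config['sess_regenerate_destroy'] = TRUE;     // drop the old row when the id regenerates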
No, CodeIgniter cleans up after itself...
Note
According to the CodeIgniter documentation:
The Session class has built-in garbage collection which clears out expired sessions so you do not need to write your own routine to do it.
CodeIgniter's Session Class probably checks the session table and cleans up expired entries. However, the documentation does not say when the cleanup happens. Since there are no cron jobs as part of CodeIgniter, the cleanup must occur when the Session class is invoked. I suppose if the site remained idle forever, the session table would never be cleared, but that would be an unusual case.
CodeIgniter implements the SessionHandlerInterface (see the docs for the custom driver).
CodeIgniter defines a garbage collector method named gc() for each driver (database, file, redis, etc.), or you can define your own gc() for your custom driver.
The gc() method is registered with PHP via the session_set_save_handler() function, so the garbage collector is called internally by PHP based on the session.gc_divisor and session.gc_probability settings.
For example, with the following settings:
session.gc_probability = 1
session.gc_divisor = 100
There is a 1% chance that the garbage collector process starts on each request.
So, you do not need to clean the session table if your settings are properly set.
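To make the mechanics concrete, here is a minimal sketch of where gc() lives in a custom handler (the PDO storage, table name, and REPLACE INTO upsert are assumptions for illustration, not CodeIgniter's actual implementation):

class MySessionHandler implements SessionHandlerInterface
{
    private $db;

    public function __construct(PDO $db) { $this->db = $db; }

    public function open($savePath, $name) { return true; }
    public function close() { return true; }

    public function read($id)
    {
        $stmt = $this->db->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute(array($id));
        return (string) $stmt->fetchColumn();
    }

    public function write($id, $data)
    {
        // REPLACE INTO is MySQL-specific upsert shorthand
        $stmt = $this->db->prepare(
            'REPLACE INTO sessions (id, last_activity, data) VALUES (?, ?, ?)');
        return $stmt->execute(array($id, time(), $data));
    }

    public function destroy($id)
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute(array($id));
    }

    // PHP itself calls this on roughly gc_probability / gc_divisor
    // of all requests, passing session.gc_maxlifetime as $maxlifetime
    public function gc($maxlifetime)
    {
        $stmt = $this->db->prepare('DELETE FROM sessions WHERE last_activity < ?');
        return $stmt->execute(array(time() - $maxlifetime));
    }
}

session_set_save_handler(new MySessionHandler($db), true);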
When you call:
$this->session->sess_destroy();
It deletes the information in database by itself.
Note that on some systems the probability-based GC is disabled by default (see the documentation at https://www.php.net/manual/en/function.session-gc.php). I stumbled upon this because a legacy application suddenly stopped working after hitting a system limit, since its sessions were never cleaned up. A cron job to clean up the sessions would be a good idea...
It is always good practice to clear the table. Otherwise, if you're querying the session data, say for building reports, it will be slow and unreliable. Given the performance of MySQL, yes, do so.
Related
I have a form with multiple pages. I use the $_SESSION array to store the user input. Each page starts with
session_start();
Sometimes the $_SESSION variables are lost. I guess this happens if the user stays AFK for too long and the garbage collector then removes the variables.
If I understand it correctly, the function session_status() only checks whether a session has been started, not whether the garbage collector has recently removed any entries.
When the garbage collector becomes active, does it delete all entries of the $_SESSION array or just some of them? In other words, could I check whether my session has expired by simply doing the following:
session_start();
if(empty($_SESSION)){
// Garbage collector removed entries because user was AFK too long
}
The overall mechanism is not as sophisticated as you probably think.
Sessions can have several storage back-ends, the default being the built-in file handler, which merely creates, well, files. The only way to link a given file to a given session is the session ID, which is part of the file name: the file handler stores each session in session.save_path as a file named sess_<session_id>.
Garbage collection is a file removal based on last modification time. Once it happens, files are gone forever. There's just no trace or record that the file ever existed.
In general, you don't need to worry about this case. Just make sure you define a lifetime that's long enough for your application. The default value on many systems ranges from 20 to 30 minutes, which is fairly short. Also, make sure your app has its own session directory, so other apps with a shorter lifetime won't remove your files:
session_save_path('/home/foo/app/sessions');
ini_set('session.gc_maxlifetime', 86400); // 1 day (in seconds)
P.S. Some Linux systems disable PHP garbage collection and replace it with a custom cron script, which prevents custom locations from being cleaned up. For that reason I normally set these other directives, just in case:
// Restore the default values
ini_set('session.gc_probability', 1);
ini_set('session.gc_divisor', 100);
The scenario:
User logs in
Cookie is set to length of session
After 1 hour of inactivity I wish to log out the user
How I think I can solve this:
Set the session.gc_maxlifetime to 1 hour (3600)
Set the session.gc_probability to 1
Set the session.gc_divisor to 1
Therefore having 100% certainty that garbage collection will occur for any idle session after 1 hour.
My question:
All the posts and documentation I've read never mention setting a GC chance of 100%, so is it bad to do this? Is there a better way?
It's a symfony app, and long term I would like to do something like this http://symfony.com/doc/master/components/http_foundation/session_configuration.html#session-meta-data but for now I was hoping to just do something simple with session.gc_*
One post I read (How do I expire a PHP session after 30 minutes?) implies that having a 100% garbage collection chance is "cost-intensive". Is this true? If so, how cost-intensive?
Cheers!
The gc_probability and gc_divisor are there to let you define the "probability" of firing up the garbage collection (GC).
Since GC (like everything) comes with a cost, you usually wouldn't want it to run on each and every web request processed by your server - that would mean every page load and every AJAX request served by PHP would trigger the GC.
So, depending on the actual server load and usage, the admin is expected to make an educated guess about how often GC should run: once in 100, once in 10,000, or once in a million requests.
But there's a flaw in the OP's original reasoning - that garbage collection will occur on any idle session. The way I read the manual, garbage collection will occur on ANY session, not just idle ones:
session.gc_maxlifetime integer: specifies the number of seconds after which data will be seen as 'garbage' and potentially cleaned up.
So, the session's lifetime (idle or not) is decided by gc_maxlifetime, while the moment the GC actually starts (as the docs say: "potentially") is decided by gc_probability and gc_divisor.
To sum up, my late answer to the question would be: I would not, under normal conditions, have GC run on each and every request (the 1/1 scenario you mentioned), because
that seems like serious overkill - on every run the GC would check thousands (if not more) of sessions, with only the occasional one actually being expired;
you would log out ANY user on your system after 60 minutes, not just the idle ones.
There are much better ways of doing this.
If this isn't for something particularly secure, you can set an expiration date/length for the session cookies on the client-side. A technically minded user could tweak the expiration in this case, so you wouldn't want to use this on a bank site.
If you need something more secure, just store an expiration time along with the other session data and check against it. If it's exceeded, destroy their session and force them to log back in.
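A minimal sketch of that second approach (the last_activity key name is just an illustration):

session_start();

$timeout = 3600; // allowed inactivity: 1 hour

if (isset($_SESSION['last_activity'])
        && time() - $_SESSION['last_activity'] > $timeout) {
    // session is stale: destroy it and force a fresh login
    session_unset();
    session_destroy();
    header('Location: /login.php');
    exit;
}

$_SESSION['last_activity'] = time(); // refresh the timestamp on every request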
I am building a session system for my website using PHP and MySQL. The idea is that a user's session lasts around 5 minutes if they are inactive, and a cron job runs every now and then, checks whether sessions have expired, and removes them if so.
The issue:
Every time someone loads a page, it has to check the database to see if their session is still valid. I am wondering if, in that cron job task, I could find that user's PHP session and change a variable like $_SESSION['isValidSession'], setting it to false.
So once they load the page, it just checks that variable to see whether the session is valid.
Sorry for the wall of text!
TL;DR: I want to modify session variables of different specified sessions.
Thanks.
Every time someone loads a page, it has to check the database to see if their session is still valid. I am wondering if, in that cron job task, I could find that user's PHP session and change a variable like $_SESSION['isValidSession'], setting it to false.
You have to do this regardless. When the users load their page, the system must verify whether the session exists in the database (I assume that you're using a DB).
If you run the cron job every minute and expire all sessions older than five (which seems rather excessive - I often stay inactive on a site for five, ten, even fifteen minutes if I am reading a long page), this will automatically "mark invalid" (actually remove) the expired sessions.
Normally you would keep a TIMESTAMP column with the time of last update of that row (meaning that session), and the cron job would DELETE all rows with a timestamp older than five minutes ago. When reloading the page, the system would no longer find the relevant session row and would deduce (correctly) that the session has expired.
However, what you want (modifying a session knowing its session ID) can be accomplished by having the cron job (which you can code in PHP) read the session in: either load it as an extant session given its ID, or read the DB column holding the serialized data with SELECT SessionData FROM SessionTable WHERE id = 'SessionId'; and de-serialize it. Then you modify the inflated object, re-serialize it, and store it back in the database with an SQL UPDATE. Hey presto, the session has now been modified!
But be aware that this will likely cause concurrency problems with active clients, and it cannot be done in SQL in one fell swoop - you can't execute UPDATE Sessions SET isInactive = 1 WHERE expiry... directly. Normally you need to read the rows of interest one by one, unserialize them, process them with PHP code, and store them back.
You can do it indirectly with two different workarounds.
One: you change your session code to use unserialized data. This will impact maintainability and performance (you can't "just add" something to a session; you have to create a column for it).
Two: you take advantage of the fact that, in serialized form, "0" and "1" have the same length. That is, the serialized session containing isValidSession (a name of 14 characters) will contain the text
...{s:14:"isValidSession";b:1;}...
and you can replace that piece of string with {s:14:"isValidSession";b:0;}, thus making isValidSession false. This is not particularly good practice - you're messing with the system's internals. Of course, I don't think anybody expects PHP's serialized data syntax to change anytime soon (...or do they?).
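A sketch of how the cron job could apply that same-length swap directly to the stored blob (reusing the SessionTable/SessionData names from above; $db and $sessionId are assumed to be set up already):

$stmt = $db->prepare('SELECT SessionData FROM SessionTable WHERE id = ?');
$stmt->execute(array($sessionId));
$data = $stmt->fetchColumn();

// same-length replacement: b:1; -> b:0; keeps every byte offset intact
$data = str_replace(
    's:14:"isValidSession";b:1;',
    's:14:"isValidSession";b:0;',
    $data
);

$db->prepare('UPDATE SessionTable SET SessionData = ? WHERE id = ?')
   ->execute(array($data, $sessionId));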
<?php var_dump($_SESSION); ?>
You should store the time of each user's last request in the database.
In the cron job, check each user's last-seen time against the current time to find the users whose sessions have expired.
Then update a column in the database, marking those sessions as invalid for the expired users.
After that, you can easily find out which users should be logged out just by checking that column in the database.
I'll most probably be using MemCache for caching some database results.
As I haven't ever written or done caching, I thought it would be a good idea to ask those of you who have already done it. The system I'm writing may have concurrent scripts running at some point. This is what I'm planning to do:
I'm writing a banner exchange system.
The information about banners is stored in the database.
There are different sites, with different traffic, loading a PHP script that generates the code for those banners (so that the banners are displayed on the clients' sites).
When a banner is displayed for the first time, it gets cached with memcache.
The banner has a cache lifetime of, for example, 1 hour.
Every hour the cache is renewed.
The potential problem I see in this task is at steps 4 and 6.
If we have, for example, 100 sites with big traffic, the script may have several instances running simultaneously. How can I guarantee that, when the cache expires, it gets regenerated only once and the data stays intact?
How can I guarantee that, when the cache expires, it gets regenerated only once and the data stays intact?
The approach to caching I take is, for lack of a better word, a "lazy" implementation: you don't cache something until it has been retrieved once, in the hope that someone will need it again. Roughly, the algorithm looks like this (sketched with the Memcache extension; fetch_from_db() stands in for your query code):

// get() returns false if there is no value or the value has expired
$result = $memcache->get($key);
if ($result === false) {
    $result = fetch_from_db();
    // set it for next time, until it expires anyway
    $memcache->set($key, $result, 0, $expiry);
}
This works pretty well for what we use it for, as long as you use the cache intelligently and understand that not all information is the same. For example, in a hypothetical user comment system, you don't need an expiry time, because you can simply invalidate the cache whenever a new user posts a comment on an article; the next time comments are loaded, they are re-cached. Some information, however (weather data comes to mind), should get a manual expiry time, since you're not relying on user input to update your data.
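For the comment-system example, invalidation is then a single call whenever new content arrives (the key name here is made up):

// after saving a new comment, drop the cached list;
// the next page load re-runs the query and re-caches it
$memcache->delete('article_comments_' . $articleId);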
For what it's worth, memcache works well in a clustered environment, and you should find that setting something like that up isn't hard to do, so this should scale pretty easily to whatever you need it to be.
I'm new to memcached.
Is this code vulnerable to the expired cache race condition?
How would you improve it?
$memcache = new Memcache;
$connected = $memcache->connect('127.0.0.1');

// new Memcache never returns FALSE, so test the connection instead
$arts = $connected ? $memcache->get($qparams) : FALSE;

if ($arts === FALSE) {
    $arts = fetchdb($q, $qparams);
    if ($connected) {
        $memcache->add($qparams, $arts, MEMCACHE_COMPRESSED, 60*60*24*3);
    }
}

if ($arts != FALSE) {
    // do stuff
} else {
    // empty dataset
}
$qparams contains the parameters to the query, so I'm using it as the key.
$arts gets an array with all the fields I need for every item.
Let's say that query X gets 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
What should I do about the cache?
How can I know if row #50 is cached?
Should I invalidate ALL the entries in the cache? (sounds like overkill to me).
Is this code vulnerable to the expired cache race condition? How would you improve it?
Yes. If two (or more) simultaneous clients try to fetch the same key from the cache at the moment it expires, they will all end up pulling it from the database. You will get spikes on the database, and for periods of time the database will be under heavy load. This is called a cache stampede. There are a couple of ways to handle it:
For new items, preheat the cache (this basically means preloading the objects you require before the site goes live).
For items that expire, store them with an expiry time a bit further in the future than the actual (logical) expiry time (let's say 5-10 minutes more). Then, when you pull the object from the cache, check whether its logical expiry is close; if so, push the cached expiry into the future to prevent any other client from updating the cache, and refresh the object from the database. For this to work with no cache stampedes, you need to either implement key locking or use CAS tokens (which would require the latest client library).
For more info, check the memcached FAQ.
Let's say that query X gets 100 rows. A little later, row #50 is modified by another process (let's say the retail price gets increased).
You have three types of data in cache:
Objects
Lists of Objects
Generated data
What I usually do is keep the objects as separate keys and then use cache "pointers" in lists. In your case, you have N objects somewhere in the cache (let's say their keys are 1, 2, ..., N), and then you have your list of objects in an array: array(1, 2, 3, 10, 42, ...). When you decide to load the list with objects, you load the list key from the cache, then load the actual objects from the cache (using getMulti to reduce requests). This way, if any of the objects gets updated, you update it in one spot only and it is automatically updated everywhere (not to mention that you save a huge amount of space with this technique).
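A sketch of that pointer technique (key names are invented; getMulti comes from the newer Memcached extension):

$memcached = new Memcached();
$memcached->addServer('127.0.0.1', 11211);

// the list key holds only ids, e.g. array(1, 2, 3, 10, 42)
$ids = $memcached->get('banner_list');

$keys = array();
foreach ($ids as $id) {
    $keys[] = 'banner_' . $id;
}

// fetch all referenced objects in one round trip
$banners = $memcached->getMulti($keys);

// updating one object later touches exactly one key, and every
// list pointing to it sees the new value:
// $memcached->set('banner_42', $banner, 3600);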
Edit: I decided to add a bit more info regarding the lookahead expiry time.
You set up your object with a logical expiry time x and save it into the cache with an expiry time of x + 5 minutes. These are the steps you take when you load the object from the cache:
Check if it is time to update (time() >= x).
If so, lock the key so nobody else can update it while you are refreshing the item. If you cannot lock the key, then somebody else is already updating it, and it becomes a SEP (Somebody Else's Problem). Since memcached has no built-in solution for locks, you have to devise your own mechanism. I usually do this by adding a separate key whose name is the original key's name with ":lock" appended. You should set this key to expire in the shortest time possible (for memcached, that is 1 second).
If you obtain the lock, first save the object with a new expiry time (so no other client tries to lock the key), then go about your business: update the object from the database and save the new value again with the appropriate lookahead expiry (see point 1).
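Putting the steps together, here is a sketch using the (old) Memcache client; the key, the expires_at field, and fetchdb() (borrowed from the question) are illustrative:

$key  = 'banner_42';
$item = $memcache->get($key); // array('value' => ..., 'expires_at' => x)

if ($item !== false && time() >= $item['expires_at']) {
    // add() is atomic: only one client wins the lock,
    // and the lock key expires by itself after 1 second
    if ($memcache->add($key . ':lock', 1, 0, 1)) {
        // immediately push the stale value into the future
        // so no other client tries to lock and refresh it
        $item['expires_at'] = time() + 60;
        $memcache->set($key, $item, 0, 600);

        // now rebuild from the database at our own pace
        $item['value']      = fetchdb($q, $qparams);
        $item['expires_at'] = time() + 300; // logical expiry (point 1)
        $memcache->set($key, $item, 0, 600); // physical expiry, a bit later
    }
    // else: somebody else is refreshing; serve the stale value meanwhile
}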
Hope this clears everything up :)
You have to invalidate any cached object that contains a modified item. Either you have to modify the cache mechanism to store items at a more granular level, or invalidate the entire entry.
It's basically the same as saying you're caching the entire DB in a single cache-entry. You either expire it or you don't.