This is a beginner question...
In a website, what type of data should or should not be included inside the session? I understand that I should not include any info that needs to remain secure. I'm more interested in programming best practice. For example, it is possible to put into the session some data that would otherwise be passed from page to page via dependency injection. Wouldn't that amount to creating a global variable?
Generally speaking, what kind of data does or doesn't belong in a session table?
Thanks,
JDelage
The minimum amount of information needed to maintain state between requests.
You should treat your session as write-once, read-many storage, but storage which is rather volatile - i.e. the state of your underlying application data should be consistent (or recoverable) if all the sessions suddenly disappeared.
There are some exceptions to this (normally the shopping basket would be stored in the session - but you might want to perform stock adjustments to 'reserve' items prior to checkout). Here items may be added/edited/changed multiple times - so it's not really write-once - but by pre-reserving stock items you maintain the recoverability of the database. An implication of this is that you should reverse the stock adjustments when the session expires without the purchase being completed.
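The reserve-then-release idea can be sketched as follows. This is only an illustration in Python (not PHP), using in-memory dictionaries in place of a real stock table and session store; the names `Inventory`, `BasketSessions` and `expire` are made up for the example.

```python
import time


class Inventory:
    """Toy stock table: item name -> units available."""

    def __init__(self, stock):
        self.stock = dict(stock)


class BasketSessions:
    """Baskets kept per session; stock is reserved as items are added."""

    def __init__(self, inventory, ttl_seconds):
        self.inventory = inventory
        self.ttl = ttl_seconds
        self.sessions = {}  # session_id -> {"items": {item: qty}, "last_seen": ts}

    def add_item(self, session_id, item, qty, now=None):
        now = time.time() if now is None else now
        if self.inventory.stock.get(item, 0) < qty:
            raise ValueError("not enough stock")
        self.inventory.stock[item] -= qty  # reserve the units
        session = self.sessions.setdefault(
            session_id, {"items": {}, "last_seen": now}
        )
        session["items"][item] = session["items"].get(item, 0) + qty
        session["last_seen"] = now

    def expire(self, now=None):
        """Drop idle sessions and give their reserved units back."""
        now = time.time() if now is None else now
        stale = [
            sid for sid, s in self.sessions.items()
            if now - s["last_seen"] > self.ttl
        ]
        for sid in stale:
            for item, qty in self.sessions.pop(sid)["items"].items():
                self.inventory.stock[item] += qty
```

In a real application the reversal would more likely run as a scheduled job against the database, so that stock is released even if the session store itself is lost.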
If you start trying to store information about the data relating to individual page turns, you're quickly going to get into problems when the user starts clicking on the forward/back buttons or opens a new window.
In general you can put anything you like in a session. It's bad practice to put information in a session that has to be present to make your page run without (technical) errors.
I suggest minimizing the amount of data in your session as much as possible.
Stuff you can save in the session so that you don't have to make another database query for info that isn't going to change, like their username, address, phone number, account balance, security permissions on your site, etc.
(This is perhaps more than you're looking for, but might make for good additional information to add to the good answers already posted.)
Since you mention best practices, you may want to look into some projects/technologies which can be used to take the idea of session state a bit further. One common pitfall with horizontally scaling web applications across multiple servers is maintaining session state between them. (User A logs in to Server A which stores the user's session, but on the next request hits Server B which doesn't know about User A's session, etc.)
One of the things I always end up saying to myself and to colleagues is that session by itself isn't really the best place to store data, even if that data is highly transient in nature. A web server is a request/response system, not a data store. It's highly tuned to the former, but not always so great for the latter.
Thus, there are ways to externalize your application's session data (or any stateful data, which should really be kept to a design minimum in the RESTful stateless nature of the web) from your web server and to another system. Memcached is a very common tool for this. There are also drop-in session replacements (or configurable session options for various frameworks/environments) which store the session in a SQL database such as MySQL.
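To make the Server A / Server B problem concrete, here is a rough Python sketch of two application servers sharing one external session store. The `ExternalSessionStore` class is a plain dictionary standing in for Memcached or a database table, and all names are hypothetical.

```python
import secrets


class ExternalSessionStore:
    """Stand-in for Memcached or a database table; just a dict here."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        # Return a copy so callers must save() to persist changes.
        return dict(self._data.get(session_id, {}))

    def save(self, session_id, session):
        self._data[session_id] = dict(session)


class AppServer:
    """A web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def login(self, user_id):
        session_id = secrets.token_hex(16)  # random, unguessable ID
        self.store.save(session_id, {"user_id": user_id})
        return session_id

    def whoami(self, session_id):
        return self.store.load(session_id).get("user_id")
```

Because neither server holds the session itself, a request that logs in on one server can be answered by the other, and either server can be restarted without logging anyone out.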
One idea I've been toying with lately is to store session data (well, any transient data where it's OK to lose it in a catastrophe) in a NoSQL database. CouchDB and MongoDB are my current top choices for this, but there's no shortage of other options. CouchDB has excellent horizontal scaling, MongoDB is ridiculously fast when run entirely in-memory, etc.
One of the major benefits of something like this, at least for me, is that deployments can easily become non-events. The web services on any given server can be re-started and the applications therein re-initialized without losing stateful data. If the data is persisted to the disk (that is, not entirely run in-memory) then the server can even be rebooted without losing it. Servers/services can drop in and out of the farm and users would never know the difference.
Additionally, externalizing this data allows you to analyze the data in potentially useful ways. Query it, run metrics on it, interface with it via other web applications or entirely offline tools, etc. It really opens up the options as a project grows in complexity.
(Again, this isn't really intended to answer your question, but rather to just add information that you may find useful. It's something my colleagues and I have been tinkering with as of late and your question seemed like a good place to mention it.)
Is it secure to store user chat messages in database as plain text?
And another question: Where to store page content - in database or in files? WordPress holds blog entries in the database, but it takes 25 database requests to display a page, so website performance decreases.
Summary of comments:
Considering the nature of the application (a chatroom on the Internet), I feel that saving the chats as plain text is acceptable assuming that the users do not chat about private/sensitive/confidential information.
The above assumption can be made given some faith and optimism in the human race (i.e.: that it is smart enough to realize that a chatroom or a PM session is not the time or place to give out passwords, SINs, credit card numbers, etc.).
Concerning the, well... concerns surrounding the situation in which a person betrays the above assumption, I am of the belief that no amount of foolproofing is enough for the most ingenious fool. While encrypting the chat is most definitely more secure than not, the cost of encrypting each and every chat as opposed to the applicability of the added benefit to chats (that is, none whatsoever unless the chat contains sensitive information, which is a rare case at best) provides little incentive to encrypt those chats. A much simpler solution would be to simply disclaim any responsibility for private information leaked from the chats.
One last tidbit on handling sensitive information (this one's for Internet users everywhere): don't do it through email, chat, or any unsecured connection. Try as much as possible to avoid putting sensitive information where it will be logged, unless you have absolute confidence the logs won't be breached.
Separate questions really ought to be posted as separate questions...
And neither question is sufficiently well-defined to give anything but the broadest of answers...
But here we go:
Whether or not something is "secure" depends on your threat model (i.e. your definition of "secure"). But what alternatives are available to you? If you encrypt the messages, where will you store the decryption key?
Where to store data depends on the structure of such data and how you intend for it to be used. If it is "static" and will always be queried in predictable ways, a filesystem may provide sufficient structure for good performance; however, if the data is "dynamic" (i.e. your application will modify it), then a database may offer greater flexibility or better performance. As with most problems in computing, the design decision you take is a trade-off for which the best answer will depend on your own requirements (and indeed, your metric of what is "best").
As long as you keep your database credentials safe, it is fine to keep the messages in the database as plain text. I do not see any reason to encrypt them.
What kind of page content are you talking about? If it is CMS-type content and you keep each page's content as a separate file, how many files are you going to end up keeping?
You should use a database for this kind of thing. It makes the content easier to update in the future. And you do not need to worry about querying your database on every request: find the items that are not updated frequently and cache them. When your page needs that content, get it from the cache layer instead of the DB tables.
Is it secure to store user chat messages in database as plain text?
Depends on how sensitive the information is and how secure the database itself is. Example: Can the database be accessed from outside or only from localhost? However, if you feel that you need an extra layer of security, then it doesn't hurt to use some simple form of obfuscation.
Where to store html page content - in database or in files?
Most of the time, accessing files is quicker than accessing a database.
A database should be used to store/access information in a structured way, enabling elaborate searches, data changes, etc. Static HTML is probably better saved in the filesystem. However, sometimes it might be best to store HTML content in the database.
Examples:
Storing content in the database makes it easier (and safer) to change from an admin page than changing the file system. So for dynamically created pages, or in a CMS (such as Drupal or WordPress), it might be best to store content and "layouts" in the database.
Forum posts contain "content" as well as markup (styling). You don't usually separate one from the other.
Database storage ends up in files on disk too. The database may also compress the data, so it can take up less space in the end.
I am using the Zend Framework but my question is broadly about sessions / databases / auth (PHP MySQL).
Currently this is my approach to authentication:
1) User signs in, the details are checked in database.
- Standard stuff really.
2) If the details are correct, only the user's unique ID and a security token (user unique ID + IP + browser info + salt) are stored in the session. The session is written to the filesystem.
I've been reading around and many are saying that storing stuff in sessions is not a good idea, and that you should really only write a unique ID which refers back to the user's details and a security token to prevent session hijacking. So this is the approach I've taken; I used to write the user's details to the session, but I've moved them out. I wanted to know your opinions on this.
I'm keeping sessions in the filesystem since I don't run on multiple servers, and since I'm only writing a tiny bit of data to sessions, I thought that performance would be better keeping sessions in the filesystem to reduce load on the database. Once the session is written on authentication, it really is read-only from then on.
3) The rest of the user's details (like subscription details, permissions, account info etc) are cached in the filesystem (this can always be easily moved to memory if i wanted even more performance).
So rather than keeping the user's details in session, the user's details are cached in the file system. I'm using Zend_Cache and the unique cache id is something like md5(/cache/auth/2892), the number is the unique id of the user. I guess the benefit of this method is that once the user is logged in, there are essentially no database queries being run to get the user's details. I just wonder if this approach is better than keeping the whole lot in the session...
4) As the user moves throughout the site the only thing that is checked is the ID in the session and the security token.
So, overall, the questions are: 1) is the filesystem more efficient than a database for this purpose, 2) have I taken enough security precautions, and 3) is separating the user's details from the session into a cached file a pointless task?
Thanks.
You're asking a range of things.
Sessions
Sessions in PHP are fast and efficient. Thousands of small disk-based sessions on a moderately up-to-date server are not going to be a performance bottleneck. Neither is writing your own handlers (very easy; the PHP manual has examples) to put them in a database.
About the only best-practice rule about sessions is: only give the web browser one thing, the session ID. Putting just the logged-in user ID in the session and retrieving the user's details from the DB when you need them is also best practice. It also means that user information can be changed and the user sees the change on the next page load.
It doesn't sound like you will have this problem, but beware of just throwing a lot of stuff into a session. A few K of data (say, a few dozen scalars) is fine. Tossing many objects and large arrays of data in there will be noticed. If you do this for a specific page, remember to remove it from the session once the page is done with it.
You may also want to implement your own login timeout with a session variable. The garbage collection settings in php.ini are intended for managing the storage of session data, not for doing login timeouts.
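A login timeout driven by a session variable might look roughly like this. The sketch is in Python with a plain dict playing the role of the session; the `last_activity` key and the 15-minute limit are assumptions chosen for the example.

```python
import time

IDLE_LIMIT_SECONDS = 15 * 60  # assumed idle limit; pick what suits your site


def check_login(session, now=None):
    """Return True if the user is logged in and has not been idle too long.

    `session` is any dict-like session object. On success the activity
    timestamp is refreshed; on timeout the session is emptied.
    """
    now = time.time() if now is None else now
    if "user_id" not in session:
        return False
    if now - session.get("last_activity", 0) > IDLE_LIMIT_SECONDS:
        session.clear()  # force a fresh login
        return False
    session["last_activity"] = now
    return True
```

The check runs on every request, so the timeout is enforced by the application itself and does not depend on when the session garbage collector happens to run.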
Caching
This is a complex topic and you will probably need to start gathering metrics (generally page load times) before implementing anything.
To implement any sort of caching, you do need to consider the lifetime of the data you're caching and how expensive re-generating it will be on a cache miss. Just throwing memcache at the problem is not a solution; you still need to understand your caching parameters and how memcache interprets them. This also applies to any persistent storage solution, including disk-based sessions, but I'm highlighting memcache because it is high-profile and has quite an aggressive expiry mechanism.
An often overlooked example is loading the same data from the database multiple times in a page: a good ORM will avoid that for you without relying on MySQL query caching. Another overlooked example is small queries that run on every page: cache these for just a few seconds on a moderately busy server and the database load will drop considerably.
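Caching a small per-page query for a few seconds can be sketched like this. It is a minimal in-process Python illustration, not a replacement for memcached or an ORM-level cache; `TTLCache` and `get_or_compute` are names invented for the example.

```python
import time


class TTLCache:
    """Tiny in-process cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the query
        value = compute()  # cache miss: run the (expensive) query
        self._entries[key] = (now + self.ttl, value)
        return value
```

With a TTL of, say, five seconds, a query that would otherwise run on every page view runs at most once per five seconds per key, at the price of data that may be up to five seconds stale.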
Finally, caching at multiple levels is often much more effective and scalable than caching at one level, because the levels can leverage each other's expiries. It also abstracts well: for example, hide it in your ORM and it's theoretically available invisibly and automatically for all your objects.
1) You can easily test which is faster by writing a script that runs each approach in a loop. Anyway, a drawback of using the filesystem is that you need to update the cached file every time you update the DB. Copies of data are in general a bad thing. Also, unless you have millions of visitors, I don't think there will be any practical difference in speed between any of the strategies. And, not to forget, sessions are also stored in the filesystem: one file for each session.
Is a query faster than the filesystem? It depends. Is query caching enabled? If it is (older MySQL versions have a query cache; it was removed in MySQL 8.0), you might be lucky and only need a memory access. If not, the DB needs to do a filesystem access anyway. Second, how well is your query optimized with indexes? How busy is the server's hard disk?
3) Depends on the speed of fetching it from the DB. In general, caching can do magic for performance, but caching in memory using memcached or something similar would be even better. In general I would avoid copies of the data in files. But of course, if it takes seconds to query the data from the DB, then go for filesystem caching. Also, if you have many users, like 10,000+, you have to set up some folder structure, since putting 10,000 cached files in the same folder slows down access time...
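One common way to build such a folder structure is to use the first characters of a hash of the cache key as subdirectory names. The Python sketch below only computes the path; the `/cache` root and the `auth/<id>` key format are assumptions borrowed loosely from the question, and MD5 is used purely for naming, not for security.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(root, user_id):
    """Spread cache files over two levels of 256 subdirectories each."""
    digest = hashlib.md5(f"auth/{user_id}".encode()).hexdigest()
    # e.g. <root>/ab/cd/abcd... so no single folder holds every file
    return PurePosixPath(root) / digest[:2] / digest[2:4] / digest
```

Two levels of two hex characters give 65,536 buckets, so even a few million cached files leave each directory with a modest number of entries.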
I currently have a custom session handler class which simply builds on PHP's session functionality (and ties in some MySQL tables).
I have a wide variety of session variables that best suit my application (primarily kept on the server side). However, I am also using jQuery to improve the usability of the front-end, and I was wondering if feeding some of the session variables (some basics and some browse preference IDs) to a JS object would be a bad way to go.
Currently, if I need to access any of this information at the front-end, I make an Ajax request to a PHP page specifically written to provide the appropriate response, although I am unsure if this is the best practice (actually I'm pretty sure this just creates an excessive number of Ajax requests).
Has anyone got any comments on this? Would this be the best way to have this sort of information available to the client side?
I really guess it depends on many factors. I'm always having "premature optimization ..." in the back of my head.
In earlier years I rushed every little idea that came to my mind into the app. That often led to "I made it cool but I didn't take the time to fully grasp the problem I'm trying to solve; was there a problem anyway?"
Nowadays I use the obvious approach (like yours), which is fast (without sacrificing performance completely on the first try), and then analyze whether I'm getting into problems or not.
In other words:
1. How often do you need to access this information from different kinds of loaded pages (because if you load the information once without the user reloading, there's probably not much point in re-fetching it anyway), multiplied by the number of concurrent clients?
2. If you write the information into a client-side cookie for fast JS access, can harm be done to your application if it is abused (modified without the application's consent)? Replace "JS" and "cookie" with any kind of offline storage like the one WHATWG proposes, if #1 applies.
The "fast" approach suits me, because often there's not the big investment into prior-development research. If you've done that carefully ... but then you would probably know that answer already ;)
As 3. you could always push the HTML to your client already including the data you need in JS, maybe that can work in your case. Will be interesting to see what other suggestions will come!
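Pushing the data along with the HTML could look roughly like the following. The sketch is in Python rather than PHP, and the `render_bootstrap_script` function and the `window.APP_PREFS` variable name are made up for the example; only non-sensitive values should be exposed this way.

```python
import json


def render_bootstrap_script(prefs):
    """Serialise selected session values into an inline <script> tag."""
    payload = json.dumps(prefs)
    # Escape "<" so a value containing "</script>" cannot end the tag early.
    payload = payload.replace("<", "\\u003c")
    return f"<script>window.APP_PREFS = {payload};</script>"
```

The page then reads `window.APP_PREFS` directly, which saves one Ajax round trip per page load. Anything placed there is visible to (and editable by) the visitor, so the server must not trust it if it is sent back.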
As a side note: I've had PHP sessions stored in a DB too, until I moved them over to memcached (alert: it's a cache and not a persistent store, so it may not be a good idea for your case; I can live with it, I just make sure it's always running), which gave an average drop of 20% in database queries and, through this, a 90% drop in write queries. And I wasn't even using any fancy Ajax yet; that was just from the number of concurrent users.
I would say that's definitely overkill on the AJAX front. Is this session data private, or important not to show to a visitor? Just to throw it out there: a cookie is the easiest option on both counts. Having the data in a JavaScript object makes it just as easily readable to a visitor, and as for cookies being enabled or not, without cookies you wouldn't have sessions anyway.
http://www.quirksmode.org/js/cookies.html is a good source about cookie handling in JS and includes two functions for reading and writing cookies.
I just figured out that I can actually store objects in the $_SESSION and I find it quite cool because when I jump to another page I still have my object. Now before I start using this approach I would like to find out if it is really such a good idea or if there are potential pitfalls involved.
I know that if I had a single point of entry I wouldn't need to do that but I'm not there yet so I don't have a single point of entry and I would really like to keep my object because I don't lose my state like that. (Now I've also read that I should program stateless sites but I don't understand that concept yet.)
So in short: Is it ok to store objects in the session, are there any problems with it?
Edit:
Temporary summary: By now I understand that it is probably better to recreate the object even if it involves querying the database again.
Further answers could maybe elaborate on that aspect a bit more!
I know this topic is old, but this issue keeps coming up and has not been addressed to my satisfaction:
Whether you save objects in $_SESSION, or reconstruct them whole cloth based on data stashed in hidden form fields, or re-query them from the DB each time, you are using state. HTTP is stateless (more or less; but see GET vs. PUT) but almost everything anybody cares to do with a web app requires state to be maintained somewhere. Acting as if pushing the state into nooks and crannies amounts to some kind of theoretical win is just wrong. State is state. If you use state, you lose the various technical advantages gained by being stateless. This is not something to lose sleep over unless you know in advance that you ought to be losing sleep over it.
I am especially flummoxed by the blessing received by the "double whammy" arguments put forth by Hank Gay. Is the OP building a distributed and load-balanced e-commerce system? My guess is no; and I will further posit that serializing his $User class, or whatever, will not cripple his server beyond repair. My advice: use techniques that are sensible to your application. Objects in $_SESSION are fine, subject to common sense precautions. If your app suddenly turns into something rivaling Amazon in traffic served, you will need to re-adapt. That's life.
It's OK as long as, by the time the session_start() call is made, the class declaration/definition has already been encountered by PHP or can be found by an already-installed autoloader. Otherwise PHP would not be able to deserialize the object from the session store.
HTTP is a stateless protocol for a reason. Sessions weld state onto HTTP. As a rule of thumb, avoid using session state.
UPDATE:
There is no concept of a session at the HTTP level; servers provide this by giving the client a unique ID and telling the client to resubmit it on every request. Then the server uses that ID as a key into a big hashtable of Session objects. Whenever the server gets a request, it looks up the Session info out of its hashtable of session objects based on the ID the client submitted with the request. All this extra work is a double whammy on scalability (a big reason HTTP is stateless).
Whammy One: It reduces the work a single server can do.
Whammy Two: It makes it harder to scale out because now you can't just route a request to any old server - they don't all have the same session. You can pin all the requests with a given session ID to the same server. That's not easy, and it's a single point of failure (not for the system as a whole, but for big chunks of your users). Or, you could share the session storage across all servers in the cluster, but now you have more complexity: network-attached memory, a stand-alone session server, etc.
Given all that, the more info you put in the session, the bigger the impact on performance (as Vinko points out). Also as Vinko points out, if your object isn't serializable, the session will misbehave. So, as a rule of thumb, avoid putting more than absolutely necessary in the session.
@Vinko You can usually work around having the server store state by embedding the data you're tracking in the response you send back and having the client resubmit it, e.g., sending the data down in a hidden input. If you really need server-side tracking of state, it should probably be in your backing datastore.
(Vinko adds: PHP can use a database for storing session information, and having the client resubmit the data each time might solve potential scalability issues, but opens a big can of security issues you must pay attention to now that the client's in control of all your state)
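One common way to reduce the tampering risk when the client carries the state is to sign the value with a server-side secret and verify it on the way back. The Python sketch below uses an HMAC for this; it is an illustration of that general technique rather than something either answer prescribes, and `SECRET_KEY`, `sign` and `verify` are hypothetical names.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; never sent to the client


def sign(value: str) -> str:
    """Return the value with an HMAC-SHA256 tag appended after a '|'."""
    mac = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    return f"{value}|{mac}"


def verify(signed: str):
    """Return the original value if the tag matches, otherwise None."""
    value, _, mac = signed.rpartition("|")
    expected = hmac.new(SECRET_KEY, value.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences
    return value if hmac.compare_digest(mac, expected) else None
```

Signing only detects modification. The client can still read the value and can replay an old signed value, so secrets and anything time-sensitive still belong on the server.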
Objects which cannot be serialized (or which contain unserializable members) will not come out of the $_SESSION as you would expect
Huge sessions put a burden on the server (serializing and deserializing megs of state each time is expensive)
Other than that I've seen no problems.
In my experience, it's generally not worth it for anything more complicated than a stdClass with some properties. The cost of unserializing has always been more than recreating from a database given a session-stored identifier. It seems cool, but (as always) profiling is the key.
I would suggest not using state unless you absolutely need it. If you can rebuild the object without using sessions, do it.
Having state in your web application makes the application more complex to build: for every request you have to work out what state the user is in. Of course, there are times when you cannot avoid using a session (for example, a user has to be kept logged in during their visit to the web application).
Last I would suggest keeping your session object as small as possible as it impacts performance to serialize and unserialize large objects.
You'll have to remember that resource types (such as DB connections or file pointers) won't persist between page loads, and you'll need to invisibly re-create these.
Also consider the size of the session: depending on how it is stored, you may have size restrictions or latency issues.
I would also bring up upgrading software libraries. We upgraded our software, and the old version had objects in the session with the V1 software's class names. The new software crashed when it tried to build the objects that were in the session: as the V2 software didn't use those same classes anymore, it couldn't find them. We had to put in some fix code to detect session objects, delete the session if found, and reload the page. The biggest pain initially was recreating this bug when it was first reported (all too familiar: "well, it works for me" :) as it only affected people who had been in and out of the old and new systems recently. However, it's a good job we found it before launch, as all of our users would surely have had the old session variables in their sessions and it could have crashed for all of them; it would have been a terrible launch :)
Anyway, as you suggest in your amendment, I also think it's better to re-create the object. So maybe just storing id and then on each request pulling the object from the database, is better/safer.
Hi guys. I'm currently working on a project which uses a lot of data stored in session variables. My question is how reliable this method is and whether it affects server performance and memory usage. Basically, what would you choose between session variables and cookies?
In general, session variables are going to be a lot more secure in that the user cannot edit them locally on his/her machine.
But the real question is, what are you looking to store? With a bit more information we might be able to give you a better answer as to where you would want to store it :)
Edit:
If you are looking to store user actions, I might recommend building a UserActions table or something along those lines. A table that contains the following:
id INT (generic ID for the record),
timestamp TIMESTAMP/DATETIME (whatever your DB supports),
userid INT (lookup to the user table),
action VARCHAR (what action you want to record),
etc etc (whatever else you want to store)
Then when a user performs an action you want to record, just log it into the table itself, instead of making it travel along with the user in a session/cookie. Really the page itself doesn't need to know what actions the user has performed in the past, unless it's a "multi-step wizard" type application. In that case, it probably would be best to pass them as a session variable.
Then you are pushing the storage into a true storage component (being the database) instead of session/cookie as storage.
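The table described above could be sketched like this, using Python's built-in SQLite driver so the example runs anywhere. The column types are SQLite's rather than MySQL's, and the `log_action` helper and the sample action names are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the example
conn.execute(
    """
    CREATE TABLE user_actions (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        timestamp TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
        userid    INTEGER NOT NULL,
        action    TEXT    NOT NULL
    )
    """
)


def log_action(conn, userid, action):
    """Record one user action; the timestamp is filled in by the database."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO user_actions (userid, action) VALUES (?, ?)",
            (userid, action),
        )


log_action(conn, 42, "viewed_product")
log_action(conn, 42, "added_to_basket")

rows = conn.execute(
    "SELECT action FROM user_actions WHERE userid = ? ORDER BY id", (42,)
).fetchall()
```

The session then only needs to carry the user's ID; the history lives in the table and can be queried later without touching any session data.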
I mean we still don't really have an idea of exactly what you are developing, but I hope it helps.
Session variables are generally preferable to cookies. That said, they are usually stored in the /tmp directory on your web server, which is world-readable and world-writable. This could be a breeding ground for mischief if you don't control your server or you run in a shared environment. Not storing sensitive information in session variables, and not relying on them for stuff that has to work, is good practice.
You should only use cookies if you need the data-per-user to persist across sessions. That is to say, if they revisit the site outside of the session expiry time and you need the data there.
Otherwise, if the data is only for their current session, then go ahead and put it in $_SESSION. That's what it's there for.
Session data is usually stored in files on the server or in a database. So how much data is there depends only on your scripts. If you want to store big binary files in the sessions, you will probably reach memory limits quickly.
Storing the data in cookies is not always a good idea. This data is visible to the client, who can easily change it, and in some cases that's just something you mustn't allow.
Session variables do not require submission by the user; they are simply loaded based on a session key. Memory usage depends on the session implementation, since there is the cost of retrieving the session from your database (or file system, or memory, or whatever).
It's always a tradeoff between keeping information on the server (more memory used) and pushing some of that data off to the client machine (more bandwidth and less secure). As a rule of thumb, I prefer sessions: they are more secure and easy to manage.
When I wrote this question I was thinking of non-sensitive data, in an application for logging user activity on a website. I think that, for a busy server with a large number of users, it's better to use cookies instead, as this takes load off the server's resources (memory, hard-drive I/O). In terms of performance, I think that session variables are a better solution.
Anyway, I don't know how well the session-variable solution will scale.
Using any session variables - at all - means that your application servers need to maintain session state with appropriate synchronisation.
This has an overhead and may negatively affect the scalability of your application, because every server needs to know about (potentially) every session - which is going to mean a lot of cross-server traffic for session data and synchronisation.
While you only have one server, it's ok.
When you get more and more servers geographically distributed, it gets more and more painful.
There is some overhead for the serialisation / unserialisation of the session, but in practice that's not such a problem as it will be relatively fixed per request, hence scalable to high traffic applications.