Assumption
I understand that it's not good to store too much data in the session, and that it should be kept as simple as possible.
State today
Right now I store only the minimum needed, using simple data types (ints and strings),
mainly the user's ID and a flag for whether they are logged in.
Most of my functions are static, or singletons that have to be rebuilt on each POST/GET.
I have trouble representing the current state and changing it,
so I end up with a largely static site.
Most of the state representation ends up in JavaScript.
Target
On the other hand, if I create an object that represents the entire website, it will be much easier for me to maintain user input, including database interaction.
Simple question: how much data should be stored there?
Example
One of the things I want to implement is objects that map to database tables.
Let's take a page for car.update().
If I store an object for that page, one that extends a database connection and has CRUD methods,
then when I handle a postback from that page I can just set the relevant properties and call the update method.
Situation now: I need to create a new object with those details and call a static update.
Another example
Storing a previous search result and filtering it with new data.
In many cases the ideal amount would be none. Store the username in a cookie along with an HMAC hash used to verify the cookie was created by your site, and get everything else from the database (or cache) as needed. This makes it easy to load balance across servers because any server can handle any request and there's no state that needs to be shared between them.
This approach wouldn't be appropriate for banking or other top-security uses, because if someone gets your cookie they can connect as you. But for sites where you're not doing anything super critical it's great. The risk can also be mitigated somewhat by adding an expiration mechanism to your cookie handling. See Chubbard's great answer to another HMAC question for more info.
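As a minimal sketch of that idea (assuming PHP's hash_hmac/hash_equals and a made-up username|expiry|signature cookie format):

<?php
// Illustrative only: sign the username with a server-side secret so the
// cookie can be verified on later requests. Names and format are assumptions.
$secret = 'replace-with-a-long-random-server-side-secret';

function make_auth_cookie($username, $secret) {
    $expires = time() + 86400;                            // 1-day expiry to limit replay
    $payload = $username . '|' . $expires;
    return $payload . '|' . hash_hmac('sha256', $payload, $secret);
}

function read_auth_cookie($cookie, $secret) {
    $parts = explode('|', $cookie);
    if (count($parts) !== 3) {
        return false;                                     // malformed cookie
    }
    list($username, $expires, $sig) = $parts;
    $expected = hash_hmac('sha256', $username . '|' . $expires, $secret);
    if ((int)$expires > time() && hash_equals($expected, $sig)) {
        return $username;                                 // valid and unexpired
    }
    return false;                                         // tampered with or expired
}

// setcookie('auth', make_auth_cookie('alice', $secret), time() + 86400, '/', '', true, true);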
Note that you can change the way PHP stores session data using session_set_save_handler. Then you don't have to change any of your existing session calls, and you gain the efficiency and maintainability of a database-backed store.
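For example, a rough sketch of a MySQL-backed handler, assuming a sessions table with columns (id, data, updated_at) and MySQL's REPLACE INTO; the table layout and connection details are assumptions:

<?php
class DbSessionHandler implements SessionHandlerInterface
{
    private $pdo;

    public function __construct(PDO $pdo) { $this->pdo = $pdo; }

    public function open($savePath, $sessionName) { return true; }
    public function close() { return true; }

    public function read($id)
    {
        $stmt = $this->pdo->prepare('SELECT data FROM sessions WHERE id = ?');
        $stmt->execute(array($id));
        $row = $stmt->fetch(PDO::FETCH_ASSOC);
        return $row ? $row['data'] : '';
    }

    public function write($id, $data)
    {
        $stmt = $this->pdo->prepare(
            'REPLACE INTO sessions (id, data, updated_at) VALUES (?, ?, ?)');
        return $stmt->execute(array($id, $data, time()));
    }

    public function destroy($id)
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE id = ?');
        return $stmt->execute(array($id));
    }

    public function gc($maxlifetime)
    {
        $stmt = $this->pdo->prepare('DELETE FROM sessions WHERE updated_at < ?');
        return $stmt->execute(array(time() - $maxlifetime));
    }
}

// $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
// session_set_save_handler(new DbSessionHandler($pdo), true);
// session_start();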
The minimum would be the user ID, assuming it is a login-based interface. But it is often helpful to include the most commonly used aspects of that user, like their permissions and other items that are stored in the database but frequently referenced when constructing pages.
You shouldn't store an enormous amount of data, but you can without problems store some user information if it helps you serve your pages faster.
But if you want to build a more dynamic website, you will probably retrieve more and more data from the database. And if you're connecting to a database anyway, you can skip storing all kinds of information in the session, because you can just as well get it from the database. Most databases (including MySQL) have a quite efficient query cache that will make repeated queries lightning fast.
So in that case you'll need to save little more than the user ID and maybe a small number of flags.
Related
For large arrays, is it better to save the data to global variables or query the database each time I need them? In my situation keeping them local scope and passing them to functions isn't an option.
I'm using wordpress and in most pages I get every user and all metadata attached to them. Often times I use these variables in multiple places on the same page. Unfortunately wordpress won't let me pass variables between templates so I'm stuck either using global variables or calling the database each time. Eventually, this will be hundreds of users with a lot of metadata attached to each. Should I call the database each time to keep the variables local, or should save them to global variables to save on database queries? What are the considerations? Should I worry about performance, overhead, and/or other issues?
Thanks so much!
The only real solution to your problem is using some kind of cache system (Memcache and Redis are your best options). Fortunately, there are plenty of Wordpress plugins that make the integration an easy thing. For instance:
Redis: https://wordpress.org/plugins/redis-object-cache/
Memcache: https://wordpress.org/plugins/memcached/
EDIT
If you only want to cache a few database calls, you can forget about Wordpress plugins and start coding a bit. Let's say you only want to cache the call that retrieves the list of users from the database, and let's assume you are using Memcache to accomplish this task (Memcache stores key-value pairs and allows super fast access to a value given a key).
1. Query Memcache asking for the key "users".
2. Memcache doesn't have that key yet, so you'll get a cache miss; after that, query your database to retrieve the user list. Now serialize the database response (serialize and json_encode are two different ways to do this) and store the key "users" with this serialized value in your Memcache.
3. Next time you query your Memcache asking for "users", you'll get a hit. At that point you just have to unserialize the value and work with your user list.
And that's all. Now you just have to decide what you want to cache and apply this procedure to those elements.
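A rough sketch of that cache-aside flow using the PECL Memcached extension (server address, key name, TTL and the wp_users query are assumptions):

<?php
$memcache = new Memcached();
$memcache->addServer('127.0.0.1', 11211);

$users = $memcache->get('users');

if ($users === false) {                         // cache miss
    $pdo  = new PDO('mysql:host=localhost;dbname=wordpress', 'user', 'pass');
    $rows = $pdo->query('SELECT ID, user_login FROM wp_users')
                ->fetchAll(PDO::FETCH_ASSOC);

    $users = json_encode($rows);                // serialize the database response
    $memcache->set('users', $users, 300);       // cache it for five minutes
}

$userList = json_decode($users, true);          // unserialize and work with the list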
You shouldn't have to perform the calls more than once per page, though you might have to execute them once for every page. So I would suggest creating some sort of class to interact with your database that you can call on to get the data you need. I would also recommend using stored procedures and functions on your database instead of straight queries, since this helps both with security and with separating application logic from data functionality.
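A hypothetical sketch of such a class; the stored procedure name get_all_users and the connection details are placeholders, not anything WordPress provides:

<?php
class UserRepository
{
    private $pdo;

    public function __construct(PDO $pdo) { $this->pdo = $pdo; }

    // Fetch the user list through a stored procedure instead of an inline query.
    public function allUsers()
    {
        $stmt = $this->pdo->query('CALL get_all_users()');
        return $stmt->fetchAll(PDO::FETCH_ASSOC);
    }
}

// $repo  = new UserRepository(new PDO('mysql:host=localhost;dbname=app', 'user', 'pass'));
// $users = $repo->allUsers();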
I've implemented an Access Control List using 2 static arrays (for the roles and the resources), but I added a new table in my database for the permissions.
The idea of using a static array for the roles is that we won't create new roles all the time, so the data won't change often. I thought the same about the resources, also because I think the resources are something that only the developers should touch, since they're more related to the code than to data. Do you know of any reasons to use a static array instead of a database table? When/why?
The problem with hardcoding values into your code is that compared with a database change, code changes are much more expensive:
You usually need to create a new package to deploy. That package needs to be regression tested to verify that no bugs have been introduced. Hint: even if you only change one line of code, regression tests are necessary to verify that nothing went wrong in the build process (e.g. a library isn't correctly packaged, causing a module to fail).
Updating code can mean downtime, which also increases risk: what if the update fails? There is always a risk of that.
In an enterprise environment it is usually a lot quicker to get DB updates approved than a code change.
All that costs time/effort/money. Note, in my opinion holding reference data or static data in a database does not mean a hit on performance, because the data can always be cached.
Your static array is an example of 'hard-coding' your data into your program, which is fine if you never ever want to change it.
In my experience, for your use case, this is not ever going to be true, and hard-coding your data into your source will result in you being constantly asked to update those things you assume will never change.
Protip: to a project manager and/or client, nothing is immutable.
I think this just boils down to how you think the database will be used in the future. If you leave the data in arrays, and then later want to create another application that interacts with this database, you will start to have to maintain the roles/resources data in both code bases. But, if you put the roles/resources into the database, the database will be the one authority on them.
I would recommend putting them in the database. You could read the tables into arrays at startup, and you'll have the same performance benefits and the flexibility to have other applications able to get this information.
Also, when/if you get to writing a user management system, it is easier to display the roles/resources of a user by joining the tables than it is to get back the roles/resources IDs and have to look up the pretty names in your arrays.
Using static arrays you get performance, since you don't need to hit the database all the time, but safety is more important than performance, so I suggest you handle permission control in the database.
Read up on RBAC.
Things considered static should be coded static. That is if you really consider them static.
But I suggest using class constants instead of static array values.
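For instance, a small illustrative sketch (the role names and the getRole() accessor are made up):

<?php
class Role
{
    const GUEST  = 'guest';
    const EDITOR = 'editor';
    const ADMIN  = 'admin';
}

// Usage: the role is referenced by name, with no array lookup or database query.
if ($user->getRole() === Role::ADMIN) {
    // grant access to the admin-only resource
}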
I want to add some static information associated with string keys to all of my pages. The individual PHP pages use some of that information, filtered by a query string. Which is the better approach to add this information? Generate a 100K (or larger, if more info is needed later) PHP file with an associative array, or add another DB table with this info and query that?
The first solution involves loading the 100K file every time, even if I use only some of the information on the current page. The second, on the other hand, adds an extra database call to the rendering of every page.
Which is the less costly if there are a large number of pages? Loading a PHP file or making an extra db call?
Unless it is shown to really be a bottleneck (be it including the php file or querying the database), you should choose the option that is best maintainable.
My guess is that it is the second option. Store it in a database.
Storing it in a database is a much better plan. With the database you can provide better data constraints, more easily cross reference with other data and create strong relationships. You may or may not need that at this time, but it's a much more flexible solution in the end.
What is the data used for? I'm wondering if the data you need could be stored in a session variable/cookie once it is pulled from the database which would allow you to not query the db on the rendering of every page.
If you were to leverage a PHP file, then utilizing APC or some other opcode cache will mitigate performance concerns, as your PHP file will only be re-parsed when it changes.
However, as others have noted, a database is the best place to store this stuff as it is much easier to maintain (this should be your priority to begin with).
Having ensured ease of maintenance and a working application, should you require a performance boost then generally accepted practice would be to cache this static data in an in-memory key/value store such as memcached. This will give you rapid access to your static values (for most requests).
I wouldn't call this information "static".
To me, it's just a routine call to get some information from the database, among the other calls being made to assemble the whole page. What am I missing?
And I do agree with Dennis: all optimizations should be based on real needs and profiling. Otherwise the effect could be the opposite.
If you want to utilize some caching, consider implementing a conditional GET for the whole page.
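A minimal sketch of a conditional GET, assuming a hypothetical render_page() that builds the full page:

<?php
$content = render_page();                       // hypothetical page-building function
$etag    = '"' . md5($content) . '"';

header('ETag: ' . $etag);

// If the client already has this exact version, answer 304 and send no body.
if (isset($_SERVER['HTTP_IF_NONE_MATCH'])
    && trim($_SERVER['HTTP_IF_NONE_MATCH']) === $etag) {
    header('HTTP/1.1 304 Not Modified');
    exit;
}

echo $content;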
I am thinking about using a noSQL (mongoDB) paired with memcached to store sessions with in my webapp. The idea is that upon each page load, the user data is compared to the data in the memcache and if something has changed, the data would be written to both memcached and mySQL. This way the reads would be greatly reduced and memcached utilized to do what it does best.
However, I am a bit concerned about using a non-ACID database for session storage, especially with the memcached layer. Let's say something goes wrong while updating the session in the DB, and our users get an instant headache wondering why the product they put in the cart doesn't show up...
What's an appropriate approach to this? Should we go for a mySQL session storage or is it fine to keep a non-acid supportive database for sessions?
Thanks!
I'm using MongoDB as session storage currently. It is possible to avoid race conditions mentioned by pilif. I found a class that implements a session handler for MongoDB (http://www.jqueryin.com/projects/mongo-session/) and forked it on github to suit my needs (http://github.com/halfdan/MongoSession).
If you don't want to lose your data, stick with ACID tested databases.
What's the payoff you're looking for?
If you want a secure system, you can't trust anything from the user, save for perhaps selected integers, so letting them store the information is typically a really bad idea.
I don't see the payoff for storing sessions outside of your MySQL database. You can cron cleanup on the tables if that's your concern, but why bother? Some users will shop on a site and then get distracted for a while. They would then come back a day or two later.
If you use cookies or something really temporary to store their session info, there is a really good chance their shopping time was wasted. Users really value their time... so if you stored their session info in the database, you can write something sexy to manage that data.
Plus, the nice side effect of this is that you'll generate a lot of residual information about what people like on your website that wouldn't perhaps be available to you later on. Like you could even consider some of it to be like a poll or something where the items people are adding to their cart could impact how you manage your business, order inventory or focus your marketing.
If you go with something really temporary then you lose out on getting residual benefits.
Without any locking on the session, be really, really careful of what you are storing. Never ever store anything that is dependent on what you have read before as the data might change between you reading and writing - especially in case of ajax where multiple requests can go out at once.
An example of what you must not store in a non-locked session is a shopping cart: to add a product, you have to read, unserialize, add the product and then serialize again. If another request does the same thing between the first request's read and write, whichever request writes last silently overwrites the other one's change.
Have a look at this article for detail: http://thwartedefforts.org/2006/11/11/race-conditions-with-ajax-and-php-sessions/
Keep Sessions on your filesystem (where PHP locks them for you), in your database (where you have to do manual locking) or never, ever, write anything of value to your session if that value is derived of a previous read.
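As a rough illustration of the "manual locking" case, one option (assuming MySQL and a database-backed cart) is a named lock keyed by the session ID around the read-modify-write:

<?php
// Illustrative only: serialize concurrent cart updates for one session so
// two overlapping requests can't clobber each other's changes.
$pdo      = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
$lockName = 'cart_' . session_id();

$stmt = $pdo->prepare('SELECT GET_LOCK(?, 10)');  // wait up to 10 seconds for the lock
$stmt->execute(array($lockName));

// ... read the cart row, add the product, write the cart row back ...

$stmt = $pdo->prepare('SELECT RELEASE_LOCK(?)');
$stmt->execute(array($lockName));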
When using memcached as a cache in front of a database, it is the user who has to ensure data consistency between the database and the cache. If you want to scale up and add more servers, there is a chance of going out of sync with the database even if everything seems OK.
Instead you may consider Hazelcast. As of 1.9 it also supports the memcache protocol. Compared to memcached, Hazelcast wants you to implement a Map Persister, and it then updates the database itself for the updated entries. This way you don't have to handle the "check cache, if data changed update database" kind of stuff.
If you write your app so that the user stores all session information client side, then you just verify that information as needed, you won't need to worry about sessions on the server side. This is one of the principles in REST style architecture. For instance, if the user is requesting adding an item to their shopping cart, just store the itemID list and count on the client side. When you hit the cart page, you can easily look up the item information from the list of itemIDs they are telling you are in their cart.
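A small sketch of rebuilding the cart page from such a client-side list (the cookie name, its itemID => count JSON format, and the items table are all assumptions):

<?php
$cart = isset($_COOKIE['cart'])
    ? json_decode($_COOKIE['cart'], true)       // e.g. {"12": 2, "99": 1}
    : array();

if ($cart) {
    $pdo          = new PDO('mysql:host=localhost;dbname=shop', 'user', 'pass');
    $placeholders = implode(',', array_fill(0, count($cart), '?'));
    $stmt         = $pdo->prepare(
        "SELECT id, name, price FROM items WHERE id IN ($placeholders)");
    $stmt->execute(array_keys($cart));
    $items = $stmt->fetchAll(PDO::FETCH_ASSOC);
    // render $items together with the per-item counts from $cart
}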
During checkout, go directly against the database with transactions to ensure you aren't getting any race conditions, and check your live inventory. If inventory isn't there when they go to check out, just say, "sorry, we just sold out". Of course, at that point you should go update any caches you have out there that are telling people you have inventory.
I would look at how much the user costs to acquire and then ask what the cost is for implementing a really good system. Keep in mind that users are a biological retry method. "I'm bored... press reload again..." While this isn't the most perfect solution, it is sometimes acceptable versus the cost comparison for "not lose anything - ever".
If you want additional security, you can have your sessions cached to a separate set of memcache servers so there are no accidental flushes. :)
There are a number of other systems membase.org, and some other persistent memcache solutions (java implementations) that will persist storage to disk. If you want to modify your client somewhat, or how you access memcache, you can do your own replication of memcache session objects.
-daniel
I have a sort of vague question that you guys are perfect for answering. I've many times come across a situation where I've had a form for my user to fill out that consisted of many different pages. So far, I've been saving the data in a session, but I'm a little worried about that practice since a session could expire, and it seems a rather volatile way of doing it.
I could see, for example, having a table for temporary forms in SQL that you save to at the end of each page. I could see posting all the data taken so far to the next page. Things along those lines. How do you guys do it? What's good practice for these situations?
Yes, you can definitely save the intermediate data in the database, and then flip some bit to indicate that the record is finished when the user submits the final result. Depending on how you are splitting up the data collection, each page may be creating a row in a different table (with some key tying them together).
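As a rough sketch of that idea (the form_progress table with user_id/data/finished columns, MySQL's REPLACE INTO, and the $userId and $isLastPage variables are all placeholders):

<?php
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');

// Merge this page's fields into whatever was saved so far for this user.
$stmt = $pdo->prepare('SELECT data FROM form_progress WHERE user_id = ?');
$stmt->execute(array($userId));
$saved = json_decode($stmt->fetchColumn() ?: '{}', true);
$saved = array_merge($saved, $_POST);

// Write it back; the finished flag is flipped on the final page.
$stmt = $pdo->prepare(
    'REPLACE INTO form_progress (user_id, data, finished) VALUES (?, ?, ?)');
$stmt->execute(array($userId, json_encode($saved), $isLastPage ? 1 : 0));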
You may also want to consider saving the data in a more free-form manner, such as XML in a single column. This will allow you to maintain complex data structures in a simple data schema, but it will make querying the data difficult (unless your database supports xml column types, which most modern enterprisey databases do).
Another advantage to storing the interim data in the database is that the user can return to it later if he wishes. Just send the user an email when he starts, with a link to his work item. Of course, you may need to add whatever security layers on top of that to make sure someone else doesn't return to his work item.
Storing the interim data in the DB also allows the user to skip around from one page to another, and revisit past pages.
Hidden fields are also a good approach, but they will not allow the user to return later.
I would avoid storing large data structures in session, since if the user doesn't invalidate the session explicitly, and if you don't have a good mechanism for cleaning up old sessions, these expired sessions may stick around for a long time.
In the end, it really depends on your specific business needs, but hopefully this gives you something to think about.
I would stick with keeping the data in the session, as it is more or less temporary at this stage: what would you do if a user does not complete the forms? You would have to check the SQL table for incomplete data regularly, making your whole application more complex.
By the way, there is a reason for sessions expiring, namely security. And you can define yourself when the session expires.
Why not just pass things along in hidden parameters?
Ahh, good question.
I've found a great way to handle this (if the flow is linear). The following will work especially well if you are including different content (pages) into one PHP page (MVC, for example). However, if you need to go from URL to URL, it can be difficult, because you cannot POST across a redirect (well, you can, but no browsers support it).
You can fill in the details.
$data = isset($_POST['data'])
    ? unserialize(base64_decode($_POST['data']))   // later pages: restore the data carried over so far
    : array();                                     // first page: start with an empty array

// ... add this page's form values to $data here ...

// Serialize again so the next page can pick it up from the hidden field.
$data = base64_encode(serialize($data));

<input type="hidden" name="data" value="<?= htmlspecialchars($data, ENT_QUOTES); ?>" />