Is there any pro or con to having either (one database or several) for a website that stores only a small amount of user data? Right now I have one database with all the needed user data (addresses, telephone numbers, etc.). Now I'm considering making a database to keep track of current events, some of which are linked to specific users. Other than perhaps making it more "tasteful" to separate data by "type" (a user database and an "event database"), is there any real reason to do so? The number of users will never go past 100.
If you're going to be linking user data and current event data, it will be significantly easier to have that data in separate tables in a single database.
If you actually mean having multiple databases for a small website, then no. You would want one database for the entire website, especially if it is small. You would want multiple tables, though.
I assume you are mixing up a database and a table, which would make your question make sense; in that case, yes, you would want multiple tables in your database to store different kinds of information.
It's always best to split things. Use one table for users and another for events. With so few users you won't have a problem if you use only one, but keep it in mind for future reference.
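A minimal sketch of the one-database, two-table layout, using Python's built-in sqlite3 module as a stand-in for whatever database the site actually runs. The table and column names are assumptions for illustration:

```python
import sqlite3

# One database, two tables. Names are hypothetical, chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE users (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    telephone TEXT
);
CREATE TABLE events (
    id      INTEGER PRIMARY KEY,
    title   TEXT NOT NULL,
    -- NULL for events that are not linked to any user
    user_id INTEGER REFERENCES users(id)
);
""")
conn.execute("INSERT INTO users (id, name, telephone) VALUES (1, 'Alice', '555-0100')")
conn.executemany(
    "INSERT INTO events (title, user_id) VALUES (?, ?)",
    [("Board meeting", 1), ("Public holiday", None)],
)

# Because both tables live in the same database, linking them is a plain JOIN.
rows = conn.execute("""
    SELECT events.title, users.name
    FROM events
    LEFT JOIN users ON users.id = events.user_id
    ORDER BY events.id
""").fetchall()
print(rows)  # [('Board meeting', 'Alice'), ('Public holiday', None)]
```

The LEFT JOIN keeps events that have no linked user; a plain JOIN would drop them.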
Related
First of all, I apologize if a similar question has been asked and answered. I searched and found similar questions, but none quite close enough.
My question is basically whether it is a good idea to separate tables of virtually the same data in my particular circumstance. The tables track data for two very different groups: product licensing data for individual users and product licensing data for enterprise users. I am thinking of separating them into two tables so that the user verification process runs faster, especially for individual users, since their number of records is significantly lower (e.g. ~500 individual records vs. ~10,000 enterprise records). Lastly, there is a significant difference between the user types that isn't apparent in the table structure: individual users all have a fixed number of activations, while enterprise users may have up to unlimited activations, and for them the purpose of tracking is more about activation stats.
The reason I think separating the tables would be a good idea is that each table would be smaller, resulting in faster queries (at least I think it would...). On the other hand, I would have to do two queries to obtain analytical data. Additionally, I may wish to change the data I am tracking from time to time, and obviously this is more of a pain with two duplicate tables.
I am guessing the query time difference is probably insignificant, even with tens of thousands of records? However, I would like to hear people's thoughts on this (mainly regarding efficiency and overall best practices) if you would be so kind as to share.
Thanks in advance!
When designing your database structure, you should try to normalize your data as much as possible. So, to answer your question:
"whether or not it is a good idea to separate tables of virtually the same data in my particular circumstance."
If you normalize your database correctly, the answer is no: it's not a good idea to create two tables with almost identical information. With normalization you should be able to separate similar data out into mapping tables, which will allow you to write more complex queries that run faster.
A very basic example of this kind of normalization: you have a table of users, and in that table a column for role. Instead of storing the literal word "admin" or "member", you store an id that maps to another table called roles, where 1 = admin and 2 = member. The idea is that it is more efficient to store repeated ids rather than repeated words like admin and member.
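Applied to the licensing question, a minimal sketch might keep both customer groups in one table and map the group to a lookup table, in the same way as the roles example. Python's sqlite3 stands in for MySQL here, and all table and column names are hypothetical. The UNIQUE constraint on license_key gives verification an index lookup, so its cost should barely depend on how many enterprise rows share the table:

```python
import sqlite3

# Hypothetical schema: one licenses table for both groups, with the group
# stored as a small integer id mapped to a lookup table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE license_types (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
INSERT INTO license_types (id, name) VALUES (1, 'individual'), (2, 'enterprise');

CREATE TABLE licenses (
    id          INTEGER PRIMARY KEY,
    license_key TEXT NOT NULL UNIQUE,   -- UNIQUE implies an index for lookups
    type_id     INTEGER NOT NULL REFERENCES license_types(id),
    activations INTEGER NOT NULL DEFAULT 0
);
""")
conn.executemany(
    "INSERT INTO licenses (license_key, type_id) VALUES (?, ?)",
    [("IND-0001", 1), ("ENT-0001", 2), ("ENT-0002", 2)],
)

def verify(key):
    """Return the license type name for a key, or None if the key is unknown."""
    row = conn.execute(
        """SELECT t.name
           FROM licenses AS l
           JOIN license_types AS t ON t.id = l.type_id
           WHERE l.license_key = ?""",
        (key,),
    ).fetchone()
    return row[0] if row else None

# Analytics across both groups stays a single query instead of two.
counts = dict(conn.execute(
    """SELECT t.name, COUNT(*)
       FROM licenses AS l JOIN license_types AS t ON t.id = l.type_id
       GROUP BY t.name"""
).fetchall())
print(verify("IND-0001"))  # individual
```

Group-specific rules (a fixed activation limit for individuals, none for enterprises) could then live in extra columns on license_types rather than in a duplicated table.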
I'm using MySQL as my main database for a simple "Social Network" I'm spending a few weeks on.
As with all social networks, the user requires a connection with their friends in order to make it social.
My first idea was to add another column to my user table and name it connections. There I would store a string of user ids separated by commas, then split them when needed.
Another idea was to create a completely new table, connections, with two columns, "user_1" and "user_2". When searching for friends, the database would then perform a select looking for the user's id, and so on.
The question is though: What would be the most efficient? If I'm to support large numbers of users, is it risky going with option 2?
Some advice would be greatly appreciated,
Thanks!
A normalized structure (option #2) is highly preferable for the type of data that you describe. It will be far more efficient to query a narrow table with two integer columns than to parse an ever-growing list of IDs.
I would suggest reading about the different normalization forms: http://en.wikipedia.org/wiki/Database_normalization (see "Normal Forms")
The second approach is much better. You're creating relations between users by using a table 'connections'. This way you can create 'n:m' relations. If you want to add some kind of connection type ('love interest', 'friend'), you can easily add it to a table, but not to a string.
There's another benefit: you don't have to think about the number of connections a user has. What would you use for the connections? A varchar? A text? Do you really want to parse this mess each and every time? How do you make sure that you don't add a connection twice?
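A minimal sketch of option 2 that also answers "how do you make sure you don't add a connection twice?". It assumes friendships are mutual and uses Python's sqlite3 as a stand-in for MySQL; the canonical-ordering trick (smaller id always in user_1) is one common approach, not the only one:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- One row per friendship. Storing the smaller id in user_1 makes (1, 2) and
-- (2, 1) the same row, so the primary key rejects a duplicate connection.
CREATE TABLE connections (
    user_1 INTEGER NOT NULL REFERENCES users(id),
    user_2 INTEGER NOT NULL REFERENCES users(id),
    PRIMARY KEY (user_1, user_2),
    CHECK (user_1 < user_2)
);
-- The primary key covers lookups by user_1; this index covers user_2.
CREATE INDEX idx_connections_user_2 ON connections (user_2);
""")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob"), (3, "Cid")])

def add_connection(a, b):
    """Insert a friendship; return False if it exists or is a self-link."""
    lo, hi = min(a, b), max(a, b)
    try:
        conn.execute("INSERT INTO connections (user_1, user_2) VALUES (?, ?)", (lo, hi))
        return True
    except sqlite3.IntegrityError:  # raised for PRIMARY KEY and CHECK violations
        return False

def friends_of(user_id):
    """Return the sorted ids of everyone connected to user_id."""
    rows = conn.execute(
        """SELECT user_2 FROM connections WHERE user_1 = ?
           UNION
           SELECT user_1 FROM connections WHERE user_2 = ?""",
        (user_id, user_id),
    ).fetchall()
    return sorted(r[0] for r in rows)

add_connection(1, 2)
add_connection(3, 1)
print(add_connection(2, 1))  # False: same friendship as (1, 2)
print(friends_of(1))         # [2, 3]
```

A connection type such as 'friend' or 'love interest' would simply be one more column on the connections table.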
tl;dr: Use a table to store relations.
Option 1 will not end well. Go with a separate table.
A separate table called connections would without a doubt be easier. Having multiple values in one column defeats the purpose of a database: can you imagine searching for all friends of user1 with option 1?
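To make the search problem with option 1 concrete, here is a small sketch (Python's sqlite3, hypothetical data) that asks "which users list user 1 as a friend?" against a comma-separated column. The naive substring match also hits ids like 10 and 21; padding with commas fixes correctness, but neither pattern can use an ordinary index, so every row is scanned. MySQL's FIND_IN_SET() avoids the substring trap in the same way and has the same full-scan cost:

```python
import sqlite3

# Option 1 from the question: friend ids packed into one comma-separated column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, connections TEXT)")
conn.executemany(
    "INSERT INTO users (id, connections) VALUES (?, ?)",
    [(1, "2"), (2, "1,12"), (12, "2"), (5, "10,21")],
)

# Naive substring search: wrong, because "1" also occurs inside "10" and "21".
naive = [r[0] for r in conn.execute(
    "SELECT id FROM users WHERE connections LIKE '%1%' ORDER BY id")]

# Padding both sides with commas gives exact matches, but still a full scan.
# (|| is SQLite string concatenation; MySQL would use CONCAT().)
padded = [r[0] for r in conn.execute(
    "SELECT id FROM users WHERE ',' || connections || ',' LIKE '%,1,%' ORDER BY id")]

print(naive)   # [2, 5]  (user 5 matched only because of "10" and "21")
print(padded)  # [2]
```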
MySQL can certainly deliver good performance with option 2. It's easier to select friends and to do calculations. There's a lot you can do with caching, multiple servers, load balancing and all that.
And realistically speaking: by the time you reach a large number of users, you'll be rewriting the system anyway to incorporate all the lessons you've learned along the way.
I'm designing a blog-like website system from the ground up, based off of PHP and MySQL. It works based on this structure:
Everything has a unique ID, known as an entity ID or ENID.
A master table contains all ENIDs, so there are no duplicates.
There are four types of entities: posts, revisions, modules, and users
This is all so that id.php can be asked for any resource on the site and know what to do with it.
Posts are categorized to a module. For example, documents, messages, events, etc. all belong to a separate module.
Posts reference a specific row in a revisions table to be displayed to the user.
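The ENID structure described above fits naturally in a single schema, where foreign keys can guard every reference. This is a minimal sketch using Python's sqlite3; the column names, the kind values, and the resolve() helper are hypothetical stand-ins for whatever id.php actually does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Master table: every ENID is allocated here exactly once.
CREATE TABLE entities (
    enid INTEGER PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('post', 'revision', 'module', 'user'))
);
CREATE TABLE modules (
    enid INTEGER PRIMARY KEY REFERENCES entities(enid),
    name TEXT NOT NULL
);
CREATE TABLE revisions (
    enid INTEGER PRIMARY KEY REFERENCES entities(enid),
    body TEXT NOT NULL
);
CREATE TABLE posts (
    enid          INTEGER PRIMARY KEY REFERENCES entities(enid),
    module_enid   INTEGER NOT NULL REFERENCES modules(enid),
    -- the revision currently displayed to the user
    revision_enid INTEGER NOT NULL REFERENCES revisions(enid)
);
""")

def new_entity(kind):
    """Allocate a fresh ENID of the given kind from the master table."""
    return conn.execute("INSERT INTO entities (kind) VALUES (?)", (kind,)).lastrowid

module = new_entity("module")
conn.execute("INSERT INTO modules VALUES (?, ?)", (module, "events"))
rev = new_entity("revision")
conn.execute("INSERT INTO revisions VALUES (?, ?)", (rev, "First draft"))
post = new_entity("post")
conn.execute("INSERT INTO posts VALUES (?, ?, ?)", (post, module, rev))

def resolve(enid):
    """Hypothetical dispatcher step: look up which kind of entity an ENID is."""
    row = conn.execute("SELECT kind FROM entities WHERE enid = ?", (enid,)).fetchone()
    return row[0] if row else None

print(resolve(post))  # post
```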
I'm wondering, would it be best to split the four entities and the master table up across separate databases, or would it be best to keep them all in one? Security is a TOP priority.
I don't see how splitting those things to separate databases could increase security by itself. It will only complicate your application code without any necessity.
Store them in the same database and focus your security efforts on other areas: firewalls, SQL sanitizing, etc.
Keeping authentication information in the same database is absolutely acceptable. Most people do it this way. Just make sure you don't store passwords in plain text (you should store a salted hash instead).
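A minimal sketch of "store a salted hash instead" using only Python's standard library (PBKDF2-HMAC-SHA256). The iteration count is an assumption to tune for your hardware and current guidance, and the function names are hypothetical:

```python
import hashlib
import hmac
import os

# Assumed work factor; raise or lower it to suit your hardware.
ITERATIONS = 600_000

def hash_password(password: str) -> tuple:
    """Return (salt, digest); store both in the users table, never the password."""
    salt = os.urandom(16)  # a fresh random salt for every user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)

salt, digest = hash_password("correct horse")
print(verify_password("correct horse", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))    # False
```

In the PHP code from the question, the built-in password_hash() and password_verify() functions handle the salting and comparison for you.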
I personally think one DB would be enough.
Also, I don't think using more than one DB would increase security in any way, but I could be wrong.
One database should be fine. I usually set things up so that each application/service uses one database, with different tables in it for the various information that I want to store.
I think you don't need to store your information in different databases. All of it belongs to one system. Working with different databases will create extra work for you, and you will have to take care of many databases instead of one.
You'd better have just one database in this case and focus on its security issues.
By the way, don't forget that you will need relations between key columns for various reasons. At the very least, working with different databases will force you to do more work than having one database with different tables.
Next to your normal user table "user" (user_id/user_email/user_pwd/etc.), what is the best way to store profile information?
Would one just add fields to the user table like "user"
(user_id/user_email/user_pwd/user_firstname/user_lastname/user_views/etc)
or create another table called "profiles"
(profile_id/user_id/user_firstname/user_lastname/user_views/etc)
or would one go for a table with property definitions and another table to store those values?
I know the last one is the most flexible, as you can add and remove fields easily.
But for a big site (50k users and up), would this be fast?
Things to consider with your approaches
Storing User Profile in Users Table
This is generally going to be the fastest approach in terms of getting at the profile data, although you may have a lot of redundant data in here (columns that may not have any information in them).
Pro: quick (especially if you only pull the columns you need from the DB)
Con: wasted data
Con: more difficult to work with and maintain (arguably, with interfaces such as phpMyAdmin)
Storing User Profile in User_Profile Table 1-1 relationship to users
Should still be quite quick with a join and you may eliminate some data redundancy if user profiles aren't created unless a user fills one in.
Pro: easier to work with
Con: ever so slightly slower due to the join (or a second query)
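A minimal sketch of the 1-1 profile table, using Python's sqlite3 and the question's column names. Making user_id both the primary key and the foreign key (instead of a separate profile_id) is a design assumption that guarantees at most one profile per user. The LEFT JOIN returns users who never filled in a profile, without storing an empty row for them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (
    user_id    INTEGER PRIMARY KEY,
    user_email TEXT NOT NULL UNIQUE,
    user_pwd   TEXT NOT NULL
);
-- 1-1 with users: user_id is both primary key and foreign key.
CREATE TABLE user_profile (
    user_id        INTEGER PRIMARY KEY REFERENCES users(user_id),
    user_firstname TEXT,
    user_lastname  TEXT
);
""")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "a@example.com", "hash-a"), (2, "b@example.com", "hash-b")])
# Only user 1 has filled in a profile, so no row exists for user 2.
conn.execute("INSERT INTO user_profile VALUES (1, 'Ada', 'Lovelace')")

rows = conn.execute("""
    SELECT u.user_id, p.user_firstname
    FROM users AS u
    LEFT JOIN user_profile AS p ON p.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)  # [(1, 'Ada'), (2, None)]
```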
Storing User Profile as properties and values in tables
*i.e. Table to store possible options, table to store user_id, option_id and value*
Pro: no redundant data stored; all data is relevant
Pro: most normalised method
Con: slower to retrieve and update data
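A minimal sketch of the properties-and-values approach (one table of possible options, one table of user_id, option_id and value), using Python's sqlite3 with hypothetical names. Adding a field is just an INSERT into profile_options, but reading a profile back as one row needs a pivot with one conditional aggregate per property, which is where the extra retrieval cost comes from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table of possible options (property definitions).
CREATE TABLE profile_options (
    option_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
-- One row per (user, option) value.
CREATE TABLE profile_values (
    user_id   INTEGER NOT NULL,
    option_id INTEGER NOT NULL REFERENCES profile_options(option_id),
    value     TEXT,
    PRIMARY KEY (user_id, option_id)
);
INSERT INTO profile_options (option_id, name) VALUES (1, 'firstname'), (2, 'lastname');
INSERT INTO profile_values VALUES (1, 1, 'Ada'), (1, 2, 'Lovelace'), (2, 1, 'Alan');
""")

def profile_of(user_id):
    """Pivot one user's property rows back into a single (id, first, last) row."""
    return conn.execute("""
        SELECT v.user_id,
               MAX(CASE WHEN o.name = 'firstname' THEN v.value END) AS firstname,
               MAX(CASE WHEN o.name = 'lastname'  THEN v.value END) AS lastname
        FROM profile_values AS v
        JOIN profile_options AS o ON o.option_id = v.option_id
        WHERE v.user_id = ?
        GROUP BY v.user_id
    """, (user_id,)).fetchone()

print(profile_of(2))  # (2, 'Alan', None)
```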
My impression is that most websites use the 2nd method and store profile information in a second table. It's common for larger websites (Twitter, Facebook) to de-normalize the database to achieve greater read performance at the expense of slower write performance.
I would think that keeping the profile information in a second table is likely the way to go when you are looking at 50,000 records. For optimum performance you want to keep write-heavy data separated from read-heavy data so that the cache can work effectively.
A table with property definitions isn't a good idea. I suggest using three tables to store the data:
user(id, login, email, pwd, is_banned, expired, ...)
-- rarely changed, keep small, extremely fast search, easy to cache, admin data
profile(id, user_id, firstname, lastname, hobby, description, motto)
-- data often changed by the user, ...
user_stats(id, user_id, last_login, first_login, post_counter, visit_counter, comment_counter)
-- counters are updated very often, and every DML write invalidates the cache
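A minimal sketch of the three-table split, using Python's sqlite3. The table is named users rather than user, and user_id serves as the primary key of profile and user_stats instead of a separate id; both are assumptions, and the column lists are trimmed. The point it shows is that bumping a counter only rewrites the small user_stats row, leaving the rarely changed users row (and any cache of it) alone:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE users (            -- rarely changed, small, easy to cache
    id        INTEGER PRIMARY KEY,
    login     TEXT NOT NULL UNIQUE,
    email     TEXT NOT NULL,
    pwd       TEXT NOT NULL,
    is_banned INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE profile (          -- edited by the user now and then
    user_id   INTEGER PRIMARY KEY REFERENCES users(id),
    firstname TEXT,
    lastname  TEXT,
    motto     TEXT
);
CREATE TABLE user_stats (       -- write-heavy counters
    user_id         INTEGER PRIMARY KEY REFERENCES users(id),
    last_login      TEXT,
    post_counter    INTEGER NOT NULL DEFAULT 0,
    comment_counter INTEGER NOT NULL DEFAULT 0
);
""")
conn.execute("INSERT INTO users (id, login, email, pwd) VALUES (1, 'ada', 'a@example.com', 'hash')")
conn.execute("INSERT INTO profile (user_id, firstname) VALUES (1, 'Ada')")
conn.execute("INSERT INTO user_stats (user_id) VALUES (1)")

def record_post(user_id):
    """Bump the post counter; only the user_stats row is written."""
    conn.execute(
        "UPDATE user_stats SET post_counter = post_counter + 1 WHERE user_id = ?",
        (user_id,),
    )

record_post(1)
record_post(1)
posts = conn.execute("SELECT post_counter FROM user_stats WHERE user_id = 1").fetchone()[0]
print(posts)  # 2
```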
The better way to store authorisation and authentication data is LDAP.
You need way more than 3 tables. How will he store data like multiple emails, multiple addresses, multiple educational histories, multiple "looking for" relationships, etc.? Each needs its own row, assuming many values will be lookups like city, sexual preference, school names, etc. So either normalize it fully or go the NoSQL route; there's no point in hanging in the middle, where you lose the best of both worlds.
You can duplicate rows, but it won't be good. Social networks do not survive with 50,000 users: either you will be successful and have millions of users, or you will crash and close it, because running one takes money, which will only come if you have a solid user base. With only 50,000 users for life, investors won't invest, ad revenue won't cover the costs, and you will close it. So design it as if you want to be the next Facebook right from day one. Think big!
I have a pretty large social-network-type site that I have been working on for about 2 years (high traffic and hundreds of files). I have spent the last couple of years experimenting with tweaks for maximum performance under this traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am redesigning the MySQL databases and everything.
Below is a picture I made of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site, they very rarely need to hit that table again unless they are editing their email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where my question is: would it be better for performance to split the user table into smaller tables? For example, in the user table you will see several fields that I have marked as "setting_". Should I just create a separate settings table? I also have fields marked with "count", which could be the total count of comments, photos, friends, mail messages, etc. So should I create another table to store just the total counts?
The reason I have them all in one table now is that I thought it might be better to cut down on MySQL queries: instead of hitting 3 tables to get information on every page load, I could hit 1.
Sorry if this is confusing, and thanks for any tips.
[Image: the login and user tables, http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg]
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
should I just create a seperate setting table?
So should I create another table to store just the total count of things?
There is not a single correct answer for this; it depends on what your application is doing.
What you can do is measure and extrapolate the results in a dev environment.
On one hand, using a separate table will save you some space and the code will be easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
About the counts, I think it's fine to keep them there. Although it is often said that it is better to calculate this kind of thing, I don't think it will hurt you at all in this situation.
But again, the only way to know what's better for you and your specific app is to measure, profile, and find out what the benefit of the change actually is. You might gain only a 2% improvement.
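A minimal measurement sketch of the kind this answer recommends: it times the same page-load lookup against a wide table and against a split pair of tables. The layouts, names, row count, and repetition count are all assumptions, and numbers from in-memory SQLite say little about MySQL on real hardware, so treat it as a template to run against a copy of the real schema and data:

```python
import sqlite3
import timeit

# Two hypothetical layouts holding the same data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE user_wide (id INTEGER PRIMARY KEY, name TEXT, setting_theme TEXT);
CREATE TABLE user_narrow (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY, setting_theme TEXT);
""")
N = 10_000
conn.executemany("INSERT INTO user_wide VALUES (?, ?, ?)",
                 ((i, f"user{i}", "dark") for i in range(N)))
conn.executemany("INSERT INTO user_narrow VALUES (?, ?)",
                 ((i, f"user{i}") for i in range(N)))
conn.executemany("INSERT INTO user_settings VALUES (?, ?)",
                 ((i, "dark") for i in range(N)))

def page_load_wide(uid=4242):
    """One query against the single wide table."""
    return conn.execute(
        "SELECT name, setting_theme FROM user_wide WHERE id = ?", (uid,)).fetchone()

def page_load_split(uid=4242):
    """The same data fetched through a join of the split tables."""
    return conn.execute(
        """SELECT u.name, s.setting_theme
           FROM user_narrow AS u JOIN user_settings AS s ON s.user_id = u.id
           WHERE u.id = ?""", (uid,)).fetchone()

# Both layouts must return the same answer before timing means anything.
assert page_load_wide() == page_load_split()
for fn in (page_load_wide, page_load_split):
    seconds = timeit.timeit(fn, number=5_000)
    print(f"{fn.__name__}: {seconds:.4f}s for 5000 lookups")
```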
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
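One simple caching strategy is a read-through cache with a time-to-live, sketched below in Python. The class, its parameters, and the 30-second TTL are hypothetical; it keeps entries in process memory, whereas a PHP site would more likely put them in a shared cache such as memcached. The clock is injectable so that expiry can be demonstrated without waiting:

```python
import time
from typing import Any, Callable, Dict, Tuple

class ProfileCache:
    """Read-through cache: serve a profile from memory until it is ttl seconds old."""

    def __init__(self, loader: Callable[[int], Any], ttl: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader  # e.g. a function that runs the profile SELECT
        self._ttl = ttl
        self._clock = clock    # injectable so expiry can be tested
        self._entries: Dict[int, Tuple[float, Any]] = {}

    def get(self, user_id: int) -> Any:
        now = self._clock()
        hit = self._entries.get(user_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._loader(user_id)  # miss or stale entry: go to the database
        self._entries[user_id] = (now, value)
        return value

    def invalidate(self, user_id: int) -> None:
        """Call after the user edits their profile so the next read is fresh."""
        self._entries.pop(user_id, None)

calls = []

def load_profile(uid):
    calls.append(uid)  # stands in for a database query
    return {"id": uid, "name": f"user{uid}"}

fake_now = [0.0]
cache = ProfileCache(load_profile, ttl=30.0, clock=lambda: fake_now[0])
cache.get(7)
cache.get(7)        # served from memory, loader not called
fake_now[0] = 31.0
cache.get(7)        # entry expired, loader runs again
print(calls)        # [7, 7]
```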
You should consider putting the counter columns and frequently updated timestamps in their own table: every time you bump them, the entire row is written.
I wouldn't consider your user table terribly large in number of columns, just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removing redundancy. Perhaps you have a lot of users who have the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row in order to find out whether retrieval is expensive. You should also use indexes when looking up data.
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else... depends on the data saved there.
Also, since the social network site that uses this data also handles authentication and authorization (I'd guess so), separating the login and user tables should perform well, because the login data is "short enough", while the profile could be accessed only once, immediately after a successful login. Just apply the usual tricks to improve DB performance and you're done.
(Remember to visualize tables as entities, name them as an entity, not as a collection of them)
Two things you will want to consider when deciding whether to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths, that will help performance at the potential cost of disk space. From what I can tell, a common approach is to put fixed-length data in its own table while the variable-length data goes somewhere else.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time then it may not be worth splitting it up as you will be slowing down both inserts and quite potentially reads. However, if there is some data in that table that does not get accessed as often then that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement but I do recall in a MySQL Performance talk given by Jay Pipes that he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is but regardless joins will usually take longer than queries out of a single table.