Besides your normal user table "user" (user_id/user_email/user_pwd/etc.), what is the best way to store profile information?
Would one just add fields to the user table like "user"
(user_id/user_email/user_pwd/user_firstname/user_lastname/user_views/etc)
or create another table called "profiles"
(profile_id/user_id/user_firstname/user_lastname/user_views/etc)
or would one go for a table with property definitions and another table to store those values?
I know the last one is the most flexible, as you can add and remove fields easily.
But for a big site (50k users and up), would this be fast?
Things to consider with your approaches
Storing User Profile in Users Table
This is generally going to be the fastest approach in terms of getting at the profile data, although you may have a lot of redundant data in here (columns that may not have any information in them).
Quick (especially if you only pull columns you need from the db)
Wasted Data
More difficult to work with / maintain (arguably, with interfaces such as phpMyAdmin)
Storing User Profile in User_Profile Table 1-1 relationship to users
Should still be quite quick with a join and you may eliminate some data redundancy if user profiles aren't created unless a user fills one in.
Easier to work with
Ever so slightly slower due to join (or 2nd query)
Storing User Profile as properties and values in tables
*i.e. Table to store possible options, table to store user_id, option_id and value*
No redundant data stored, all data is relevant
Most normalised method
Slower to retrieve and update data
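To make the property/value layout concrete, here is a minimal sketch. It assumes Python's built-in sqlite3 module as a stand-in for MySQL, and the table names profile_option and profile_value are hypothetical. It shows why reads are slower: rebuilding one profile takes a join and one row per field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (user_id INTEGER PRIMARY KEY, user_email TEXT NOT NULL);
    -- One row per possible profile field (hypothetical table name).
    CREATE TABLE profile_option (option_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
    -- One row per (user, field) pair that actually has a value.
    CREATE TABLE profile_value (
        user_id   INTEGER NOT NULL REFERENCES user(user_id),
        option_id INTEGER NOT NULL REFERENCES profile_option(option_id),
        value     TEXT NOT NULL,
        PRIMARY KEY (user_id, option_id)
    );
    INSERT INTO user VALUES (1, 'ann@example.com');
    INSERT INTO profile_option VALUES (1, 'firstname'), (2, 'lastname'), (3, 'motto');
    INSERT INTO profile_value VALUES (1, 1, 'Ann'), (1, 2, 'Lee');
""")

def load_profile(user_id):
    """Rebuild one user's profile as a dict from the value rows."""
    rows = conn.execute(
        "SELECT o.name, v.value FROM profile_value v "
        "JOIN profile_option o ON o.option_id = v.option_id "
        "WHERE v.user_id = ?", (user_id,))
    return dict(rows.fetchall())

profile = load_profile(1)
print(profile)
```

Note that the unset "motto" field costs no storage at all, which is the flexibility the question refers to.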
My impression is that most websites use the second method and store profile information in a second table. It's also common for larger websites (Twitter, Facebook) to denormalize the database to achieve greater read performance at the expense of slower write performance.
I would think that keeping the profile information in a second table is likely the way to go when you are looking at 50,000 records. For optimum performance, you want to keep data that is written heavily separated from data that is read heavily, so that caching can work effectively.
A table with property definitions isn't a good idea. I suggest using three tables to store the data:
user(id,login,email,pwd, is_banned, expired, ...)
-- rarely changed, keep small, extremely fast search, easy to cache, admin data
profile(id, user_id, firstname,lastname, hobby,description, motto)
--data often changed by user,...
user_stats(id,user_id,last_login,first_login,post_counter, visit_counter, comment_counter)
--counters are updated very often, and DML invalidates the cache
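A rough sketch of the three-table split above, using Python's sqlite3 as a stand-in for MySQL. The column lists are trimmed and the sample data is made up. The point is that a page view only writes to the small, hot user_stats row and leaves user and profile untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        id INTEGER PRIMARY KEY, login TEXT UNIQUE NOT NULL, email TEXT NOT NULL,
        pwd TEXT NOT NULL, is_banned INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE profile (
        id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE NOT NULL REFERENCES user(id),
        firstname TEXT, lastname TEXT, motto TEXT
    );
    CREATE TABLE user_stats (
        id INTEGER PRIMARY KEY, user_id INTEGER UNIQUE NOT NULL REFERENCES user(id),
        visit_counter INTEGER NOT NULL DEFAULT 0
    );
    INSERT INTO user (id, login, email, pwd) VALUES (1, 'ann', 'ann@example.com', 'hash-goes-here');
    INSERT INTO profile (user_id, firstname, lastname) VALUES (1, 'Ann', 'Lee');
    INSERT INTO user_stats (user_id) VALUES (1);
""")

def record_visit(user_id):
    # Only the frequently updated stats row is written.
    conn.execute(
        "UPDATE user_stats SET visit_counter = visit_counter + 1 WHERE user_id = ?",
        (user_id,))

record_visit(1)

# Reading everything back still takes only one query with two joins.
row = conn.execute(
    "SELECT u.login, p.firstname, s.visit_counter "
    "FROM user u JOIN profile p ON p.user_id = u.id "
    "JOIN user_stats s ON s.user_id = u.id WHERE u.id = ?", (1,)).fetchone()
print(row)
```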
The better way to store authorisation and authentication data is LDAP.
You need way more than 3 tables. How will he store data like multiple emails, multiple addresses, multiple educational histories, multiple "looking for" relationships, etc.? Each needs its own row, and many values will be lookups (city, sex preference, school names, etc.). So either normalize it fully or go the NoSQL route. There is no point in hanging in the middle, where you lose the best of both worlds.
You can duplicate rows, but it won't be good. Social networks do not live with 50,000 users. Either you will be successful and have millions of users, or you will crash and close it, because to run these you need $$$, which will only come if you have a solid user base. With only 50,000 users for life, investors won't invest, ad revenues won't cover the cost, and you will close it. So design it like you want to be the next Facebook right from day one. Think big!
Related
Is there any pro or con to having either for a website with a small amount of user data stored? As of now, I have one database with all needed user data (addresses, telephone etc). Now I'm considering making a database to keep track of current events, of which some are linked to specific users. Other than perhaps making it more "tasteful" to separate data by "type" (a user database, and an "event database"), is there any real reason to do so? The amount of users will never go past 100.
If you're going to be linking user data and current event data, it will be significantly easier to have that data in separate tables in a single database.
If you actually mean having multiple databases for a small website, then no. You would want 1 database for the entire website, especially if it is small. You would want multiple tables though.
I am assuming you are getting a database and a table mixed up, which would then make your question make sense, and then yes, you would want multiple tables in your database to store different information.
It's always best to split things. Use one table for users and another for the events. With a few users you won't have a problem if you use only one, but keep it in mind for future reference.
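A small sketch of the "one database, two tables" advice from the answers above. It uses Python's sqlite3 as a stand-in for MySQL, and the column names are assumptions. An event optionally points at a user, and a LEFT JOIN lists all events whether or not they are linked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # one database for the whole site
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT);
    CREATE TABLE events (
        id INTEGER PRIMARY KEY, title TEXT NOT NULL,
        user_id INTEGER REFERENCES users(id)  -- NULL when no user is linked
    );
    INSERT INTO users VALUES (1, 'Ann', '555-0100');
    INSERT INTO events VALUES (1, 'Board meeting', 1), (2, 'Site maintenance', NULL);
""")

# Because both tables live in the same database, linking them is a plain join.
rows = conn.execute(
    "SELECT e.title, u.name FROM events e "
    "LEFT JOIN users u ON u.id = e.user_id ORDER BY e.id").fetchall()
print(rows)
```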
So I'm working on a site that will replace an older site with a lot of traffic, and I will also have a lot of data in the DB, so my question to you guys is: what is the best way to design MySQL tables for growth?
I was thinking of splitting, let's say, a table with 5,000,000 rows into 5 tables with 1,000,000 rows each and creating a relationship between the tables, but I guess this isn't a good option, since I will spend a lot of resources and time figuring out which table my data is in.
Or can you guys give me some tips, maybe some useful articles?
No, you're absolutely right on the relationships. This technique is called normalization: you define separate tables because the data in each table changes over time independently of the other tables.
So if you have a hotel database that keeps track of rooms and guests, then you know normalization is necessary because rooms and guests are independent of each other.
But you will have foreign keys/surrogate keys in each table (for instance, room_id) that relate a particular guest to a particular room.
Normalization, in your case, could help you optimize those 5,000,000 rows of yours, as it would not be optimal for a loop to go over 5,000,000 elements and retrieve all of the data.
Here is a strong example for why normalization is essential in database management.
Partitioning as mentioned in a comment is one way to go, but the first path to check out is even determining if you can break down the tables with the large amounts of data into workable chunks based on some internal data.
For instance, let's say you have a huge table of contacts. You can essentially break down the data into contacts that start with a-d, e-j, etc. Then, when you go to add records, you just make sure you add them to the correct table (I'd suggest checking out stored procedures for handling this, so that the logic is regulated in the database). You'd probably also set up stored procedures to get data from the same tables. By doing this, however, you have to realize that auto-incrementing IDs won't work correctly, as you won't be able to maintain unique IDs across all of the tables without doing some work yourself.
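The manual split described above might look roughly like this. It is only a sketch: it uses Python's sqlite3 rather than MySQL, it routes in application code instead of stored procedures (SQLite has none), and the table names and letter ranges are hypothetical. The shared counter illustrates the "some work yourself" needed to keep IDs unique across tables.

```python
import itertools
import sqlite3

# Hypothetical letter ranges, one table per range.
RANGES = {
    "contacts_a_d": "abcd",
    "contacts_e_j": "efghij",
    "contacts_k_z": "klmnopqrstuvwxyz",
}

conn = sqlite3.connect(":memory:")
for table in RANGES:
    # Table names come from the fixed RANGES dict, never from user input.
    conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# One shared counter, because per-table auto-increment would hand out
# the same id in several tables.
next_id = itertools.count(1)

def table_for(name):
    """Pick the table whose letter range contains the name's first letter."""
    first = name[0].lower()
    for table, letters in RANGES.items():
        if first in letters:
            return table
    raise ValueError(f"no table for {name!r}")

def add_contact(name):
    contact_id = next(next_id)
    conn.execute(
        f"INSERT INTO {table_for(name)} (id, name) VALUES (?, ?)", (contact_id, name))
    return contact_id

def find_contact(name):
    return conn.execute(
        f"SELECT id, name FROM {table_for(name)} WHERE name = ?", (name,)).fetchone()

add_contact("Alice")
add_contact("Frank")
add_contact("Zoe")
print(table_for("Frank"), find_contact("Zoe"))
```

Every read and write has to go through table_for, which is exactly the bookkeeping cost the question was worried about.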
These of course are the simple solutions. There are tons of solutions for large data sets, which also include looking at other storage solutions, clustering, partitioning, etc. Doing some of these things manually can give you a little bit of an understanding of some of the possible "manual solutions".
I am in the process of creating a website where I need to have the activity for a user (similar to your inbox on Stack Overflow) stored in SQL. Currently, my teammates and I are arguing over the most effective way to do this; so far, we have come up with two alternative ways:
Create a new table for each user and have the table name be theirusername_activity. Then when I need to get their activity (posting, being commented on, etc.) I simply get that table and see the rows in it...
In the end I will have a TON of tables
Possibly Faster
Have one huge table called activity, with an extra field for their username; when I want to get their activity I simply get the rows from that table "...WHERE username=".$loggedInUser
Fewer tables, cleaner
(assuming I index the tables correctly, will this still be slower?)
Any alternate methods would also be appreciated
"Create a new table for each user ... In the end I will have a TON of tables"
That is never a good way to use relational databases.
SQL databases can cope perfectly well with millions of rows (and more), even on commodity hardware. As you have already mentioned, you will obviously need usable indexes to cover all the possible queries that will be performed on this table.
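As an illustration of the single-table option with a usable index, here is a minimal sketch using Python's sqlite3 in place of MySQL. The column names and the index name idx_activity_username are assumptions. It also passes the username as a bound parameter rather than concatenating it into the SQL string, which avoids the injection risk of building the WHERE clause by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE activity (
        id INTEGER PRIMARY KEY, username TEXT NOT NULL,
        action TEXT NOT NULL, created_at TEXT NOT NULL
    );
    -- Composite index: find one user's rows, already ordered by date.
    CREATE INDEX idx_activity_username ON activity (username, created_at);
""")
conn.executemany(
    "INSERT INTO activity (username, action, created_at) VALUES (?, ?, ?)",
    [("ann", "posted", "2024-01-01"),
     ("bob", "commented", "2024-01-02"),
     ("ann", "was commented on", "2024-01-03")])

QUERY = "SELECT action FROM activity WHERE username = ? ORDER BY created_at DESC"

def activity_for(username):
    # The placeholder keeps the username out of the SQL text itself.
    return [row[0] for row in conn.execute(QUERY, (username,))]

# Ask the planner how it would run the query; the last column is a description.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("ann",)))
recent = activity_for("ann")
print(recent)
print(plan)
```

In MySQL the equivalent check would be an EXPLAIN on the same SELECT; the exact plan text differs between engines and versions.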
Number 1 is just plain crazy. Can you imagine going to manage it and seeing all those tables?
Can you imagine the backup! Or the dump! That many CREATE TABLE statements... that would be crazy.
Get yourself a good index, and you will have no problem sorting through records.
Here we are talking about MySQL. So why would it be faster to make separate tables?
Query cache efficiency: each insert from one user wouldn't empty the query cache for the others
Memory & pagination: tables in use would fit in buffers, and unused data would simply not be loaded there
But as everybody here said, it seems quite crazy in terms of management. And in terms of performance, having a lot of tables adds another problem in MySQL: you may run out of file descriptors or simply wipe out your table cache.
It may be more important here to choose the right engine, like MyISAM instead of InnoDB, as this is an insert-only table. And as #RC said, a good partitioning policy would fix the memory & pagination problem by avoiding loading rarely used data into active memory buffers. This should be done with an intelligent application design as well, where you avoid loading the whole activity history by default. If you reduce it to recent activity and restrict parsing of the complete history table to batch processes and advanced screens, you'll get a nice effect from the partitioning. You can even try a user-based partitioning policy.
For query cache efficiency, you'll get a bigger gain by using an application-level cache (like memcache), with per-user history elements saved there and emptied at each new insert.
You want the second option, and you add the userId (and possibly a separate table for userid, username, etc.).
If you do a lookup on that id on a properly indexed field, you'd only need something like log(n) steps to find your rows. This is hardly anything at all. It will be way faster, way clearer and way better than option 1. Option 1 is just silly.
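To put a number on "something like log(n) steps", here is a back-of-the-envelope calculation. It assumes a plain binary search over sorted keys; real B-tree indexes hold many keys per node, so they typically need even fewer page reads than this.

```python
import math

def approx_lookup_steps(n_rows):
    """Rough number of comparisons for a binary search over n_rows sorted keys."""
    return math.ceil(math.log2(n_rows))

for n in (50_000, 5_000_000, 1_000_000_000):
    print(f"{n:>13,} rows -> about {approx_lookup_steps(n)} steps")
```

Going from fifty thousand rows to a billion only roughly doubles the number of steps, which is why a single well-indexed table scales so comfortably.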
In some cases, the first option is, in spite of not being strictly "the relational way", slightly better, because it makes it simpler to shard your database across multiple servers as you grow. (Doing this is precisely what allows wordpress.com to scale to millions of blogs.)
The key is to only do this with tables that are entirely independent from a user to the next -- i.e. never queried together.
In your case, option 2 makes the most sense: you'll almost certainly want to query the activity across all or some users at some point.
Use option 2, and not only index the username column, but partition (consider a hash partition) on that column as well. Partitioning on username will provide you some of the same benefits as the first option and allow you to keep your sanity. Partitioning and indexing the column this way will provide a very fast and efficient means of accessing data based on the username/user_key. When querying a partitioned table, the SQL Engine can immediately lop off partitions it doesn't need to scan as it can tell based off of the username value queried vs. the ability of that username to reside within a partition. (in this case only one partition could contain records tied to that user) If you have a need to shard the table across multiple servers in the future, partitioning doesn't hinder that ability.
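The pruning behaviour described above can be illustrated with a toy model in Python. This is not how MySQL implements partitioning internally: the partition count is arbitrary and crc32 is only a stable stand-in for whatever hash function the engine uses. It just shows why a lookup by username needs to touch a single partition.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical partition count

def partition_for(username):
    # crc32 is only a stable stand-in hash; the database uses its own function.
    return zlib.crc32(username.encode("utf-8")) % NUM_PARTITIONS

# Each list plays the role of one physical partition of the activity table.
partitions = [[] for _ in range(NUM_PARTITIONS)]

def insert(username, action):
    partitions[partition_for(username)].append((username, action))

def activity_for(username):
    # Pruning: only one partition can hold this user's rows,
    # so the other seven are never scanned.
    candidates = partitions[partition_for(username)]
    return [action for (user, action) in candidates if user == username]

insert("ann", "posted")
insert("bob", "commented")
insert("ann", "liked")
print(activity_for("ann"))
```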
You will also want to normalize the table by separating the username field (and any other elements in the table related to username) into its own table with a user_key. Ensure a primary key on the user_key field in the username table.
This depends mainly on where you need to retrieve the values. If it's a page for a single user, then use the first approach. If you are showing data for all users, you should use a single table. The multiple-table approach is also clean, but in SQL, if the number of records in a single table is very high, data retrieval is very slow.
I have been creating a web app and am looking to expand. In my web app I have a table for users which includes privileges in order to track whether a user is an administrator, a very small table for a dynamic content section of a page, and a table for tracking "events" on the website.
Not being very experienced with web application creation, I'm not really sure how professionals would create systems of databases and tables for a web application. In my web app, I plan to add further user settings for each member of the website and even a messaging system. I currently use PHP with a MySQL database that I query for all of my commands, but I would be willing to change any of this if necessary. What would be the best way to track content such as messages that are interpersonal, and also specific user settings for each user? Would I want to have multiple databases at any point? Would I want to have multiple tables for each user, perhaps? Any information on how this is done or should be done would be quite helpful.
I'm sorry about the broadness of the question, but I've been wanting to reform this web app since I feel that my ideas for table usage are not on par with those that experienced programmers have.
Here's my seemingly long, hopefully not too convoluted answer to your question. I think I've covered most, if not all of your queries.
For your web app, you could have a table of users called "Users", settings table called "UserSettings" or something equally as descriptive, and messages in "PrivateMessages" table. Then there could be child tables that store extra data that is required.
User security can be a tricky thing to design and implement. Do you want to do it by groups (if you plan on having many users, making it easier to manage their permissions), or just assign individually due to a small user base? For security alone, you'd end up with 4 tables:
Users
UserSettings
UserGroups
UserAssignedGroups
That way you can have user info, settings, groups they can be assigned to and what they ARE assigned to separated properly. This gives you a decent amount of flexibility and conforms to normalization standards (as mentioned above by DrSAR).
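A minimal sketch of the group tables, using Python's sqlite3 as a stand-in for MySQL. The table names are the ones listed above; the column names and the is_in_group helper are assumptions. UserAssignedGroups is a plain many-to-many link table between Users and UserGroups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (UserID INTEGER PRIMARY KEY, Username TEXT UNIQUE NOT NULL);
    CREATE TABLE UserGroups (GroupID INTEGER PRIMARY KEY, GroupName TEXT UNIQUE NOT NULL);
    -- Link table: one row per (user, group) assignment.
    CREATE TABLE UserAssignedGroups (
        UserID  INTEGER NOT NULL REFERENCES Users(UserID),
        GroupID INTEGER NOT NULL REFERENCES UserGroups(GroupID),
        PRIMARY KEY (UserID, GroupID)
    );
    INSERT INTO Users VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO UserGroups VALUES (1, 'admin'), (2, 'member');
    INSERT INTO UserAssignedGroups VALUES (1, 1), (1, 2), (2, 2);
""")

def is_in_group(user_id, group_name):
    """True if the user has been assigned to the named group."""
    row = conn.execute(
        "SELECT 1 FROM UserAssignedGroups a "
        "JOIN UserGroups g ON g.GroupID = a.GroupID "
        "WHERE a.UserID = ? AND g.GroupName = ?", (user_id, group_name)).fetchone()
    return row is not None

print(is_in_group(1, "admin"), is_in_group(2, "admin"))
```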
With your messages, don't store them with the username, but rather the User ID. For instance, in your PrivateMessages table, you would have a MessageID, SenderUserID, RecipientUserID, Subject, Body and DateSent to store the most basic info. That way, when a user wants to check their received messages, you can query the table saying:
SELECT * FROM PrivateMessages WHERE RecipientUserID = 123556
A list of tables for your messages could be as such:
PrivateMessages
MessageReplies
The PrivateMessages table can store the parent message, and then the MessageReplies table can store the subsequent replies. You could store it all in one table, but depending on traffic and possibly writing recursive functions to retrieve all messages and replies from one table, a two table approach would be simplest I feel.
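The two-table message layout could be sketched like this, again with Python's sqlite3 standing in for MySQL. The PrivateMessages columns follow the list given above; the MessageReplies columns, the index, and the helper functions are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PrivateMessages (
        MessageID INTEGER PRIMARY KEY, SenderUserID INTEGER NOT NULL,
        RecipientUserID INTEGER NOT NULL, Subject TEXT NOT NULL,
        Body TEXT NOT NULL, DateSent TEXT NOT NULL
    );
    CREATE INDEX idx_pm_recipient ON PrivateMessages (RecipientUserID, DateSent);
    CREATE TABLE MessageReplies (
        ReplyID INTEGER PRIMARY KEY,
        MessageID INTEGER NOT NULL REFERENCES PrivateMessages(MessageID),
        SenderUserID INTEGER NOT NULL, Body TEXT NOT NULL, DateSent TEXT NOT NULL
    );
    INSERT INTO PrivateMessages VALUES (1, 7, 123556, 'Hello', 'Hi there', '2024-01-01');
    INSERT INTO MessageReplies VALUES (1, 1, 123556, 'Hi back', '2024-01-02');
    INSERT INTO MessageReplies VALUES (2, 1, 7, 'How are you?', '2024-01-03');
""")

def inbox(user_id):
    """Parent messages received by a user, newest first."""
    return conn.execute(
        "SELECT MessageID, Subject FROM PrivateMessages "
        "WHERE RecipientUserID = ? ORDER BY DateSent DESC", (user_id,)).fetchall()

def thread(message_id):
    """Parent body first, then replies in the order they were sent."""
    parent = conn.execute(
        "SELECT Body FROM PrivateMessages WHERE MessageID = ?", (message_id,)).fetchone()
    replies = conn.execute(
        "SELECT Body FROM MessageReplies WHERE MessageID = ? ORDER BY DateSent",
        (message_id,))
    return [parent[0]] + [r[0] for r in replies]

print(inbox(123556))
print(thread(1))
```

With replies in their own table, fetching a whole thread is two flat queries and no recursion.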
If I were you, I'd sit down with a pencil and paper, and write down/draw what I want to track in my database. That way you can then draw links between what you want to store, and see how it will come together. It helps me when I'm trying to visualise things.
For the scope of your web app you don't need multiple databases. You do need, however, multiple tables to store your data efficiently.
For user settings, always use a separate table. You want your "main" users table as lean as possible, since it will be accessed (= searched) every time a user tries to log in. Store IDs, username, password (hashed, of course) and any other field that you need to access when authenticating. Put all the extra information in a separate table. That way your login will only query a smaller table, and once the user is authenticated you can use their ID to get all other information from the secondary table(s).
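Here is one way the lean login table plus a separate settings table might look. It is a sketch under several assumptions: sqlite3 stands in for MySQL, the column names are made up, and the password hashing uses PBKDF2 from Python's standard library with an illustrative iteration count (in PHP you would more likely reach for the built-in password_hash function).

```python
import hashlib
import hmac
import os
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Lean table: only what authentication needs.
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE NOT NULL,
                        salt BLOB NOT NULL, pwd_hash BLOB NOT NULL);
    -- Everything else lives in a secondary table keyed by the user id.
    CREATE TABLE user_settings (user_id INTEGER PRIMARY KEY REFERENCES users(id),
                                theme TEXT, signature TEXT);
""")

def hash_password(password, salt):
    # The iteration count is an illustrative choice, not a recommendation.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def register(username, password, theme):
    salt = os.urandom(16)
    cur = conn.execute(
        "INSERT INTO users (username, salt, pwd_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)))
    conn.execute(
        "INSERT INTO user_settings (user_id, theme) VALUES (?, ?)", (cur.lastrowid, theme))
    return cur.lastrowid

def login(username, password):
    """Return the user id on success, else None. Touches only the lean table."""
    row = conn.execute(
        "SELECT id, salt, pwd_hash FROM users WHERE username = ?", (username,)).fetchone()
    if row is None:
        return None
    user_id, salt, stored = row
    ok = hmac.compare_digest(stored, hash_password(password, salt))
    return user_id if ok else None

def settings_for(user_id):
    # Fetched only after authentication has succeeded.
    return conn.execute(
        "SELECT theme FROM user_settings WHERE user_id = ?", (user_id,)).fetchone()

ann_id = register("ann", "s3cret", "dark")
```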
Messages can be trickier because they're an order of magnitude bigger: you might have tens or hundreds for each user. You need to design your table structure based on your application's logic. A table for each user is clearly not a feasible solution, so go for a general messages table but implement procedures to keep it to a manageable size. An example would be "archiving" messages older than X days, which would move them to another table (which works well if your users aren't likely to access their old messages too often). But like I said, it depends on your application.
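The archiving idea could be sketched as below. The table names, columns, and cutoff handling are assumptions, and sqlite3 stands in for MySQL. Both statements run in a single transaction so a message is never lost or duplicated halfway through the move.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, recipient_id INTEGER NOT NULL,
                           body TEXT NOT NULL, sent_at TEXT NOT NULL);
    CREATE TABLE messages_archive (id INTEGER PRIMARY KEY, recipient_id INTEGER NOT NULL,
                                   body TEXT NOT NULL, sent_at TEXT NOT NULL);
    INSERT INTO messages VALUES (1, 1, 'old news', '2023-01-01'),
                                (2, 1, 'fresh',    '2024-06-01');
""")

def archive_older_than(cutoff):
    """Move messages sent before `cutoff` (ISO date string) into the archive."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "INSERT INTO messages_archive SELECT * FROM messages WHERE sent_at < ?",
            (cutoff,))
        moved = conn.execute(
            "DELETE FROM messages WHERE sent_at < ?", (cutoff,)).rowcount
    return moved

moved = archive_older_than("2024-01-01")
print(moved)
```

A job like this would typically run on a schedule, with the cutoff computed as "today minus X days".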
Good luck!
Along the lines of Cristian Radu's comments: you need to split your data into different tables. The lean user table will (in fact, should) have one unique ID per user. This (unique) key should be repeated in the secondary tables. It will then be called a foreign key. Obviously, you want a key that's unique. If your username can be guaranteed to be unique (e.g. you require users to be identified by their email address), then you can use that. If user names are real names (e.g. Firstname Surname), then you don't have that guarantee and you need to keep a userid, which becomes your key. Similarly, the table containing your posts could (but doesn't have to) have a field with userids indicating who wrote each post, etc.
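To show what the foreign key buys you, here is a small sketch with Python's sqlite3 as a stand-in for MySQL; the posts table and its columns are hypothetical. With enforcement switched on, the database refuses a post whose userid does not exist in the users table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, email TEXT UNIQUE NOT NULL);
    CREATE TABLE posts (
        postid INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL REFERENCES users(userid),  -- the foreign key
        body   TEXT NOT NULL
    );
    INSERT INTO users VALUES (1, 'ann@example.com');
    INSERT INTO posts VALUES (1, 1, 'first post');
""")

def try_orphan_post():
    """Attempt to insert a post for a user that does not exist."""
    try:
        conn.execute("INSERT INTO posts VALUES (2, 999, 'no such author')")
    except sqlite3.IntegrityError:
        return "rejected"
    return "accepted"

result = try_orphan_post()
print(result)
```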
You might want to read a bit about database design and the concept of normalization: (http://dev.mysql.com/tech-resources/articles/intro-to-normalization.html) No need to get bogged down with the n-th form of normalization but it will help you at this stage where you need to figure out the database design.
Good luck and report back ;-)
I have a pretty large social network type site that I have been working on for about 2 years (high traffic and hundreds of files). I have been experimenting for the last couple of years with tweaking things for max performance for the traffic, and I have learned a lot. Now I have a huge task: I am planning to completely re-code my social network, so I am redesigning the MySQL DBs and everything.
Below is a picture I made of a couple of MySQL tables that I have a question about. I currently have the login table, which is used in the login process; once a user is logged into the site, they very rarely need to hit the table again unless editing an email or password. I then have a user table, which is basically the user's settings and profile data for the site. This is where I have questions: would it be better for performance to split the user table into smaller tables? For example, if you view the user table you will see several fields that I have marked as "setting_"; should I just create a separate setting table? I also have fields marked with "count", which could be the total count of comments, photos, friends, mail messages, etc. So should I create another table to store just the total count of things?
The reason I have them all on 1 table now is because I was thinking maybe it would be better if I could cut down on mysql queries, instead of hitting 3 tables to get information on every page load I could hit 1.
Sorry if this is confusing, and thanks for any tips.
Image: http://img2.pict.com/b0/57/63/2281110/0/800/dbtable.jpg
As long as you don't SELECT * FROM your tables, having 2 or 100 fields won't affect performance.
Just SELECT only the fields you're going to use and you'll be fine with your current structure.
should I just create a separate setting table?
So should I create another table to store just the total count of things?
There is no single correct answer for this; it depends on how your application is doing.
What you can do is measure and extrapolate the results in a dev environment.
On the one hand, using a separate table will save you some space and the code will be easier to modify.
On the other hand, you may lose some performance (as you already suspect) by having to join information from different tables.
As for the counts, I think it's fine to have them there. Although it is often said that it is better to calculate this kind of thing, I don't think it will hurt you at all in this situation.
But again, the only way to know what's better for you and your specific app is to measure, profile and find out what the benefit of doing so is. You would probably only gain a 2% improvement.
You'll need to compare performance testing results between the following:
Leaving it alone
Breaking it up into two tables
Using different queries to retrieve the login data and profile data (if you're not doing this already) with all the data in the same table
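A comparison like the one listed above can be started with a tiny timing harness. The sketch below uses Python's sqlite3 and timeit with made-up tables and row counts, so its numbers say nothing about your MySQL setup; it only shows the shape of such a test, where both layouts must return the same data before their timings are compared.

```python
import sqlite3
import timeit

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user_wide (id INTEGER PRIMARY KEY, name TEXT,
                            setting_theme TEXT, photo_count INTEGER);
    CREATE TABLE user_lean (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_setting (user_id INTEGER PRIMARY KEY, setting_theme TEXT);
""")
rows = [(i, f"user{i}", "dark", i % 50) for i in range(1, 10_001)]
conn.executemany("INSERT INTO user_wide VALUES (?, ?, ?, ?)", rows)
conn.executemany("INSERT INTO user_lean VALUES (?, ?)", [(r[0], r[1]) for r in rows])
conn.executemany("INSERT INTO user_setting VALUES (?, ?)", [(r[0], r[2]) for r in rows])

def one_table(user_id=5000):
    # Everything in one row; select only the columns that are needed.
    return conn.execute(
        "SELECT name, setting_theme FROM user_wide WHERE id = ?", (user_id,)).fetchone()

def two_tables(user_id=5000):
    # Same data, split across two tables and joined back together.
    return conn.execute(
        "SELECT u.name, s.setting_theme FROM user_lean u "
        "JOIN user_setting s ON s.user_id = u.id WHERE u.id = ?", (user_id,)).fetchone()

for fn in (one_table, two_tables):
    seconds = timeit.timeit(fn, number=2000)
    print(f"{fn.__name__}: {seconds:.4f}s for 2000 lookups")
```

For a meaningful result, run the same kind of harness against a copy of the production schema with realistic data volumes and query mixes.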
Also, you could implement some kind of caching strategy on the profile data if the usage data suggests this would be advantageous.
You should consider putting the counter columns and frequently updated timestamps in their own table, because every time you bump them the entire row is written.
I wouldn't consider your user table terribly large in number of columns, just my opinion. I also wouldn't break that table into multiple tables unless you can find a case for removal of redundancy. Perhaps you have a lot of users who have the same settings; that would be a case for breaking the table out.
You should take into account the average size of a single row in order to find out whether retrieval is expensive. Also, you should try to use indexes when looking for data...
The most important thing is to design properly, not just to split because "it looks large". Maybe the IP or IPs could go somewhere else... depends on the data saved there.
Also, as the social network site using this data also handles the authentication and authorization processes (I guess so), the separation between the login and user tables should offer good performance, because the data in login is "short enough", while the profile can be accessed only once, immediately after the successful login. Just do the right tricks to improve DB performance and it's done.
(Remember to visualize tables as entities, name them as an entity, not as a collection of them)
Two things you will want to consider when deciding whether or not to break up a single table into multiple tables are:
MySQL likes small, consistent datasets. If you can structure your tables so that they have fixed row lengths, that will help performance at the potential cost of disk space. One approach that, from what I can tell, is common is to take the fixed-length data and put it in its own table, while the variable-length data goes somewhere else.
Joins are in most cases less performant than not joining. If the data currently in your table will normally be accessed all at the same time then it may not be worth splitting it up as you will be slowing down both inserts and quite potentially reads. However, if there is some data in that table that does not get accessed as often then that would be a good candidate for moving out of the table for performance reasons.
I can't find a resource online to substantiate this next statement but I do recall in a MySQL Performance talk given by Jay Pipes that he said the MySQL optimizer has issues once you get more than 8 joins in a single query (MySQL 5.0.*). I am not sure how accurate that magic number is but regardless joins will usually take longer than queries out of a single table.