I am creating user profiles on my site and am lost on how to design this. There are many fields; some are 1:1, like city of residence, birthday, etc. But there are over 50 fields which are 1:many (or many-to-many?), like favorite movies, sports teams, dating preferences, screen names, phone numbers, email addresses, etc. It gets more complex when we have previous companies worked at, previous schools, etc. A person can belong to many companies, and there are many fields in this group, like dates worked, department, company name, industry name, etc.
So the question is how to store all this? If we normalize all these profile fields, there will be many, many tables to join. As far as I have read, people recommend a denormalized approach for social networks. Either way, I am currently storing all user and profile details in the main user table, so each row is a unique user. If I have to store all these multi-valued preferences there too (favorite movies alone can run into the hundreds, and past companies have a whole group of fields of their own), there will be lots of duplication in the user table.
What approach do social networks take for this?
Social network data storage questions are really no different from data storage questions in general... normalized and related data is the best way to 'store' this data efficiently. The RDBMS is made to handle these relationships - the PK-FK relationships and JOINs are the MAIN point of relational DBs... so even though YOU 'see' join join join etc., the DB is (should be) efficient in handling these joins.
From a USAGE standpoint of getting to the pertinent data - make sure your indexes are accurate and optimized - and make use of VIEWS to 'flatten' the data you need for display purposes...
So whatever application server you are using to get the data will call the VIEW - that will 'appear' to you, the developer, as a 'flatter' representation of the data, making UI and app server interaction cleaner and more efficient (both in resources and in coding).
As a general guideline - flattening of data is generally considered 'acceptable' in a data warehousing environment... of course I don't want to open up the monstrous debate of "just how normalized is 'normalized'" (first - sixth form of normalization...)
I guess you could think of a SN as more of an OLAP, than the OLTP. In which case 'some' de-normalized data storage is common - and acceptable - really, YOU get to decide just how de-normalized you want things... For instance - in your examples, of employment history and movies, sports. I'd think that a simple 1:many allowing duplicate entries on such items would be fine, and probably easier to maintain...
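To make the "normalize, index, then flatten with a VIEW" advice concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module only so that it runs anywhere; the same idea applies to MySQL. All table, column and view names (users, favorite_movies, employment, user_profile) are made up for illustration and are not from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        city    TEXT                 -- 1:1 fields stay on the user row
    );
    CREATE TABLE favorite_movies (   -- 1:many: one row per user and movie
        user_id INTEGER NOT NULL REFERENCES users(user_id),
        title   TEXT NOT NULL
    );
    CREATE TABLE employment (        -- 1:many with its own group of fields
        user_id    INTEGER NOT NULL REFERENCES users(user_id),
        company    TEXT NOT NULL,
        department TEXT,
        start_year INTEGER
    );
    -- Index the foreign keys so the per-user lookups stay cheap.
    CREATE INDEX idx_movies_user ON favorite_movies(user_id);
    CREATE INDEX idx_employment_user ON employment(user_id);

    -- The view flattens the related rows into one row per user for display.
    CREATE VIEW user_profile AS
    SELECT u.user_id, u.name, u.city,
           (SELECT group_concat(m.title, ', ')
              FROM favorite_movies m WHERE m.user_id = u.user_id) AS movies,
           (SELECT group_concat(e.company, ', ')
              FROM employment e WHERE e.user_id = u.user_id) AS companies
    FROM users u;

    INSERT INTO users VALUES (1, 'Ann', 'Oslo');
    INSERT INTO favorite_movies VALUES (1, 'Alien'), (1, 'Heat');
    INSERT INTO employment VALUES (1, 'Initech', 'IT', 2015);
""")

# The application only ever queries the flat view.
profile = conn.execute(
    "SELECT name, city, movies, companies FROM user_profile WHERE user_id = ?",
    (1,),
).fetchone()
print(profile)
```

The user row stays unique, the multi-valued fields live in their own tables, and the application code still sees one flat row per user.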
Hope that was helpful,
You should stick with normalization when creating your schema. The queries might be a pain, and you should handle them with care, especially when dealing with joins. If you are a .NET developer, I guess LINQ will handle the pain for you. I believe your RDBMS is smart enough to handle your queries with good performance. One thing to take note of is your query structure: write performance-minded queries. As I said, LINQ should do this best... cheers
I have different post types, like status updates, projects, donations, etc. Each type of post has one or more tables of its own in the database. A user can create all post types. Each user has a wall, like on Facebook, where he can see the different post types he has created in chronological order (whichever post was created last should be at the top of the wall).
What would be the most appropriate approach?
1. Fetch the data from the database with different queries, store it in an array and then manipulate the array?
2. Write a single complex query which can fetch data from the different tables in chronological order?
3. Make a separate table for user activity and store data whenever the user performs any activity?
4. Some approach different from the above?
1. Simple to set up, but doesn't perform very well (it has a very bad worst case).
2. This is the simplest. You say complex, but you can do this fairly easily with a UNION + ORDER BY construction. Performance will be pretty good.
3. This will perform the best, I think, but there will be some duplication and things might get a little complex. Relational databases are not very good at polymorphism.
What's important to realize is that it's relatively easy to switch between these solutions if you have a service-oriented architecture (or just good design in general). So I wouldn't be too worried about which approach you pick. If in the future it turns out your chosen approach doesn't work too well, you can switch to another.
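For reference, the UNION + ORDER BY construction could look roughly like the sketch below. It runs on Python's built-in sqlite3 so it can be tried without a server; the table and column names (status_updates, projects, donations, created_at) are assumptions, since the question does not show its schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_updates (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT, created_at TEXT);
    CREATE TABLE projects       (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT, created_at TEXT);
    CREATE TABLE donations      (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL, created_at TEXT);

    INSERT INTO status_updates VALUES (1, 1, 'Hello', '2024-01-01 10:00:00'),
                                      (2, 2, 'Other user', '2024-01-04 10:00:00');
    INSERT INTO projects  VALUES (1, 1, 'New site', '2024-01-03 09:00:00');
    INSERT INTO donations VALUES (1, 1, 25.0, '2024-01-02 12:00:00');
""")

# Each branch maps its own table onto the same four columns, so the
# combined result can be sorted once by creation time.
WALL_SQL = """
    SELECT 'status' AS post_type, id, body AS summary, created_at
      FROM status_updates WHERE user_id = :uid
    UNION ALL
    SELECT 'project', id, title, created_at
      FROM projects WHERE user_id = :uid
    UNION ALL
    SELECT 'donation', id, CAST(amount AS TEXT), created_at
      FROM donations WHERE user_id = :uid
    ORDER BY created_at DESC
    LIMIT 20
"""

wall = conn.execute(WALL_SQL, {"uid": 1}).fetchall()
for post_type, post_id, summary, created_at in wall:
    print(created_at, post_type, post_id, summary)
```

UNION ALL is used rather than UNION because rows from different tables cannot be duplicates of each other here, so the de-duplication step would be wasted work.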
Let's say I have this MySQL db, and all the tables in the db are related to one another; primary keys, foreign keys, etc. are all set. Now is it possible to predict, just from the database design, what the queries will be used for the application? Since the database dictates the application's capabilities, we should be able to predict from the design which queries will be used in the application, right?
If it is possible, is there a strategy or automated way to generate the possible queries?
I have written a book on the subject of analyzing data using SQL and Excel, and have spent many years working with databases.
Yes, from a database structure, you can figure out how tables are going to be joined together. You are not going to figure out the harder -- and generally more business relevant -- things that users need. Here are some examples:
You can have a database where the primary table is telephone calls, with the associated information. From this database, you may need to know the maximum number of active calls at one time. Or you may need to know how many different people someone calls in a month.
You can have a database of subscriber records. You may need to figure out the probability that someone will stop after a given amount of time.
You can have a database of products and purchases. You may need to figure out the most common combinations of three products that occur together.
You can have a database of credit card purchases. You may need to figure out who spends more than $200 in a restaurant more than 50 miles from their billing address.
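To illustrate how such a business question goes beyond what the table structure tells you, here is a small sketch of the "how many different people someone calls in a month" example. The calls table and its columns are invented for the illustration, and Python's sqlite3 is used only so the query can be run as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (
        call_id    INTEGER PRIMARY KEY,
        caller     TEXT NOT NULL,
        callee     TEXT NOT NULL,
        started_at TEXT NOT NULL
    );
    INSERT INTO calls (caller, callee, started_at) VALUES
        ('alice', 'bob',   '2024-03-01 09:00:00'),
        ('alice', 'bob',   '2024-03-05 10:00:00'),
        ('alice', 'carol', '2024-03-09 11:00:00'),
        ('alice', 'dave',  '2024-04-02 08:30:00'),
        ('bob',   'alice', '2024-03-02 14:00:00');
""")

# Nothing in the schema says "count distinct callees per caller per month";
# that requirement comes from the business, not from the keys.
distinct_callees = conn.execute("""
    SELECT caller,
           strftime('%Y-%m', started_at) AS month,
           COUNT(DISTINCT callee)        AS people_called
    FROM calls
    GROUP BY caller, month
    ORDER BY caller, month
""").fetchall()

for row in distinct_callees:
    print(row)
```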
The point is: a database does not represent "application capabilities". A database represents entities and relationships between them, presumably in the real world. It is hubris to think that you can look at a database and know what the business questions are.
Instead, the purpose of a database is to support data, which in turn, supports applications. The needs of applications will change over time. The beauty of databases, as opposed to many other data storage technologies, is that the technology scales as the data increases, supports changes to the structure, and allows new entities and relationships to be added into the system, without completely rewriting it.
Over time, and with experience, you might develop intuition on what's important. Even if you do, you will be constantly surprised at the varied needs of your users.
I am sincerely not trying to be smart here, but the answer is: yes and no.
Yes, because a 3NF design usually outlines the business rules behind it pretty well, so you can to a degree tell what the business logic behind it is. You can create an object or graph model from it and get a good idea of what kinds of questions can be asked, based on connections/relations and accessible properties.
No, because combinatorially you might have an intractable number of possible questions from a graph. Hence, you can't really enumerate the questions one might ask in a reasonable, non-exponential amount of time.
In general, if the design is good and the tables are meaningfully named, you can get a pretty good idea of what is going on.
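As a rough sketch of the "yes" half: the join paths really can be read mechanically from the declared foreign keys. The example below does this for SQLite via PRAGMA foreign_key_list (MySQL exposes comparable metadata in information_schema); the three tables and the helper name join_candidates are hypothetical. Note that it only recovers how tables connect, not which questions anyone will ask.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id)
    );
    CREATE TABLE order_items (
        item_id  INTEGER PRIMARY KEY,
        order_id INTEGER REFERENCES orders(order_id),
        sku      TEXT
    );
""")

def join_candidates(conn):
    """List one JOIN clause per declared foreign key."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    joins = []
    for table in tables:
        # Row layout: id, seq, parent table, child column, parent column, ...
        for fk in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            parent, child_col, parent_col = fk[2], fk[3], fk[4]
            joins.append(
                f"{table} JOIN {parent} ON {table}.{child_col} = {parent}.{parent_col}")
    return joins

joins = join_candidates(conn)
for clause in joins:
    print(clause)
```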
Theoretically it's possible, but due to the combinatorial explosion of N rows by X columns by Z tables by W possible functions by Q possible values on each column/row, this is an amazingly large number.
The issue here is that you need to take the data into account too. Some queries only make sense when particular data is present and others don't. So you are essentially considering a massively large hypercube.
I work with multidimensional databases (denormalised cubes), which are essentially denormalised databases. Have a read up on OLAP theory and you'll see why.
So in short: no, as it's practically impossible.
Now is it possible to predict, just from the database design, what the queries will be used for the application?
You can, at least in principle, predict which queries can be answered efficiently. Which queries the applications will actually try to execute is another matter.
In an ideal world, the database model would take into account all the querying needs of all the applications, now and in the future. We don't live in that world yet ;)
If it is possible, is there a strategy or automated way to generate the possible queries?
No, that requires human understanding of what the model actually means. Unfortunately, there is no good way to teach a tool to have that level of understanding.
A good model will immediately make sense to a person experienced in database modeling and in the domain being modeled. Such a person will typically be able to predict a fair portion of the queries actually being used, but rarely all of them, so documentation alongside the database model itself is desirable. And of course, not all models are good...
I am currently embarking on a new venture to learn PHP and MySQL. I have done some simple databases in the past using Access, but this one is to be a web-centric database for tracking a myriad of data including contacts and project information. I will need to link the various tables in various relationships, and I am not sure the best way to do that. Since I am just starting out with PHP/MySQL I am researching online sources for learning as much as possible. If anyone has recommendations on books or websites, I would appreciate it.
In setting up my tables, one major area that I am concerned with is contacts. I will have a variety of contacts, including employees, clients, vendors, subcontractors, etc. A single contact can be of multiple types, and each type would have various additional fields that pertain to it. My thought was to have one contacts table that links to other tables for the various contact types. I'm not sure which field types or table setup options are best... Thoughts?
This scenario will likely play out in other areas of the database as well for projects and products.
Any pointers/direction would be appreciated.
WES
Are you familiar with Object Oriented Design?
In a RDBMS, such as MySQL, I would design the database as follows:
Contacts (first name, last name, etc)
Employees
Vendors
Clients
etc
Your tables that extend Contacts would hold their specific data as well as a contact_id column, which creates the relationship.
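A minimal sketch of that layout, written against Python's built-in sqlite3 so it is runnable as-is (the DDL carries over to MySQL with minor type changes). The specific columns (job_title, payment_terms) are invented for illustration; only the contact_id link comes from the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked to
conn.executescript("""
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    -- Each "extending" table holds its type-specific data plus contact_id.
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        contact_id  INTEGER NOT NULL UNIQUE REFERENCES contacts(contact_id),
        job_title   TEXT
    );
    CREATE TABLE vendors (
        vendor_id     INTEGER PRIMARY KEY,
        contact_id    INTEGER NOT NULL UNIQUE REFERENCES contacts(contact_id),
        payment_terms TEXT
    );

    INSERT INTO contacts VALUES (1, 'Ada', 'Lovelace'), (2, 'Bob', 'Stone');
    -- Ada is both an employee and a vendor; Bob is a vendor only.
    INSERT INTO employees (contact_id, job_title) VALUES (1, 'Engineer');
    INSERT INTO vendors (contact_id, payment_terms) VALUES (1, 'Net 30'), (2, 'Net 60');
""")

# LEFT JOINs keep contacts that lack a given type; those columns come back NULL.
rows = conn.execute("""
    SELECT c.first_name, e.job_title, v.payment_terms
    FROM contacts c
    LEFT JOIN employees e ON e.contact_id = c.contact_id
    LEFT JOIN vendors   v ON v.contact_id = c.contact_id
    ORDER BY c.contact_id
""").fetchall()
print(rows)
```

Because each type table carries its own contact_id, one contact can appear in several of them, which covers the "a single contact can be multiple types" requirement.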
As an aside, NoSQL solutions solve this problem natively, as they don't have a rigid schema, meaning you could save different data for each record.
I would give a little thought to how you're going to use those extended fields. Certainly create a table for contacts, and you definitely could create a table for each contact type (employees, clients, etc) with a column connecting the record to the contact table (thus employees would have employee_id, contact_id, propertyOne, propertyTwo, etc).
Another option though, which may be convenient if your application is contact centric and you really just want to be able to associate different kinds of information with contacts, would be to have a contact table, a table containing the types of extended information (say "contactTypes" and it would have the information that a vendor type has a billing address for example) and a third table to actually hold all the data (name-value pairs). This is a bit more fluid in that it will let you add new types of contacts or add fields to a type without actually altering your schema. The first option (Jason McCreary's) might scale better if you're going to have many, many records...
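Here is a rough sketch of that second, name-value option, again on Python's sqlite3 for easy experimentation. The table names (contact_types, type_fields, contact_values) and the helper load_fields are my own guesses at one reasonable shape, not a fixed recipe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_types (type_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    -- Which fields each contact type has (e.g. a vendor has a billing address).
    CREATE TABLE type_fields (
        field_id   INTEGER PRIMARY KEY,
        type_id    INTEGER NOT NULL REFERENCES contact_types(type_id),
        field_name TEXT NOT NULL
    );
    -- The actual data, stored as name-value pairs.
    CREATE TABLE contact_values (
        contact_id INTEGER NOT NULL REFERENCES contacts(contact_id),
        field_id   INTEGER NOT NULL REFERENCES type_fields(field_id),
        value      TEXT,
        PRIMARY KEY (contact_id, field_id)
    );

    INSERT INTO contacts VALUES (1, 'Acme Supplies');
    INSERT INTO contact_types VALUES (1, 'vendor');
    INSERT INTO type_fields VALUES (1, 1, 'billing_address');
    INSERT INTO contact_values VALUES (1, 1, '1 Main St');
""")

# Adding a field to the vendor type is an INSERT, not an ALTER TABLE.
conn.execute("INSERT INTO type_fields VALUES (2, 1, 'tax_id')")
conn.execute("INSERT INTO contact_values VALUES (1, 2, 'X-42')")

def load_fields(conn, contact_id):
    """Return {type name: {field name: value}} for one contact."""
    rows = conn.execute("""
        SELECT t.name, f.field_name, v.value
        FROM contact_values v
        JOIN type_fields   f ON f.field_id = v.field_id
        JOIN contact_types t ON t.type_id  = f.type_id
        WHERE v.contact_id = ?
        ORDER BY t.name, f.field_name
    """, (contact_id,))
    result = {}
    for type_name, field_name, value in rows:
        result.setdefault(type_name, {})[field_name] = value
    return result

vendor_fields = load_fields(conn, 1)
print(vendor_fields)
```

The trade-off is visible even in this toy: every value is stored as text and every read needs joins plus a pivot in application code, which is part of why the table-per-type option may scale better.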
Regarding resources - there's so much out there, I can't even begin to narrow it down for you - look at the php manual and just google "php mysql tutorial" - tons of stuff.
A book I would recommend that helped me a lot is Beginning PHP and MySQL: From Novice to Professional.
And make sure you watch the following video, but it might be advanced if you do not know the basics. Bbuild-a-login-system-for-a-simple-website/
For database relations as well as an introduction visit Coding HORROR
I hope this helps..
My question is very similar to this question but a bit more specific.
My application has multiple companies and multiple users per company. It makes the most sense to me (at this point) for each company to have a "private" set of tables. This makes security extremely simple as I don't have to worry about JOIN-ing up my structure tree to be sure I only get data for the specific company. I can also extend the mysqli database extension and have it put a prefix on the table names in the query so that I never have to worry about security while writing my queries.
One other major advantage that I can see is that if one of the companies needs a customization, I can modify their specific tables and not have to take into account everyone else. The way that my app is designed it is extremely modular and implementing custom code is very simple.
There are some disadvantages that I can see, but so far it seems that the above advantages would outweigh them. The proposed system does sort of grate against my (possibly) hyper-normalized database schema preferences up to this point. Another obvious disadvantage is implementing schema alterations, but I can script them and be safe enough. One point that I'm not sure about is performance. If I have MySQL working with so many tables, will I create bottlenecks for myself?
I look forward to your thoughts!
Your proposal sounds reasonable to me. I would suggest that instead of prefixing your tables with the company name, you store the tables for each company in a separate schema. That way you can have tables with the same name, reducing your problems in the code, and have each set of tables protected by a different username and password in a convenient manner. Backups and replication would then all be distinguishable at need.
Lookup tables could be stored in yet another schema to which all users have access.
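A small sketch of the schema-per-company idea. Since it has to be runnable without a MySQL server, it uses SQLite's ATTACH DATABASE as a stand-in: each attached database plays the role of one MySQL schema. The tenant names, the invoices and countries tables, and the tenant_invoices helper are all hypothetical, and per-schema usernames/passwords are not modeled here.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so ATTACH
# is never issued inside an open transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

TENANTS = ("acme", "globex")  # hypothetical company schemas

# One schema for shared lookup tables, plus one per company.
conn.execute("ATTACH DATABASE ':memory:' AS shared")
for tenant in TENANTS:
    conn.execute(f"ATTACH DATABASE ':memory:' AS {tenant}")

conn.execute("CREATE TABLE shared.countries (code TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO shared.countries VALUES (?, ?)",
                 [("DE", "Germany"), ("JP", "Japan")])

# Same table name in every company schema, so no prefixes in the queries.
for tenant in TENANTS:
    conn.execute(
        f"CREATE TABLE {tenant}.invoices (number TEXT PRIMARY KEY, country_code TEXT)")

conn.execute("INSERT INTO acme.invoices VALUES ('A-1', 'DE')")
conn.execute("INSERT INTO globex.invoices VALUES ('G-1', 'JP')")

def tenant_invoices(conn, tenant):
    """Fetch one company's invoices joined to the shared lookup table."""
    # Schema names cannot be bound as parameters, so check an allow-list.
    if tenant not in TENANTS:
        raise ValueError(f"unknown tenant: {tenant}")
    sql = f"""
        SELECT i.number, c.name
        FROM {tenant}.invoices AS i
        JOIN shared.countries AS c ON c.code = i.country_code
        ORDER BY i.number
    """
    return conn.execute(sql).fetchall()

acme_rows = tenant_invoices(conn, "acme")
globex_rows = tenant_invoices(conn, "globex")
print(acme_rows, globex_rows)
```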
I am trying to teach myself how to use SQL, namely MySQL.
What I am trying to understand is how to deal with many different types of data within the same table. Say I am building a web application, and I have many different content types (blog item, comment item, files, pages, forms), each of which needs different data fields stored. Would I create a new table for each content type, since each content type has its own unique field requirements, or is there a better way to do this? It seems a little much to create a new table for each content type. If I had 30 types of content in my web app, that would be 30 tables just for the types. And if I had a new content type, I would have to create a new table containing all the fields required for that type.
Is there a better way to do something like this, when I have many different types of content that each requires different fields of data that needs to go into the database? Can I somehow check to see what type the content is, then select another table that holds all the different field types?
A little confused about what to do.
Just to give an example:
Stack Overflow itself uses the same database table (called Posts) for questions and answers. Even though these two types of data are not identical, the site creators considered them similar enough to put them into one table. There's a PostTypeId field that says whether this post is a question or an answer. On answers, the Title field would be NULL, on questions, other columns might be ignored.
Comments, on the other hand, are in a different table. Of course you could theoretically put them into the same Posts table and have a PostTypeId for comments. But the overhead this would create (because of the lightweightness of comments) justifies creating a new table.
I know this isn't really an answer, and other developers might even have decided to put questions and answers into different tables; but it gives some perspective. Long story short: It depends :)
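To show what the single-table approach looks like in practice, here is a sketch loosely modeled on the description above. It is not the real Stack Overflow schema: the lowercase column names, the numeric type codes (1 = question, 2 = answer) and the parent_id column are assumptions made for the example, and it runs on Python's sqlite3 rather than the actual database.

```python
import sqlite3

QUESTION, ANSWER = 1, 2  # assumed type codes, for illustration only

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (
        id           INTEGER PRIMARY KEY,
        post_type_id INTEGER NOT NULL CHECK (post_type_id IN (1, 2)),
        parent_id    INTEGER REFERENCES posts(id),  -- assumed: question an answer belongs to
        title        TEXT,                          -- NULL on answers
        body         TEXT NOT NULL
    );
    INSERT INTO posts VALUES
        (1, 1, NULL, 'How do I store many content types?', 'Question body'),
        (2, 2, 1,    NULL, 'Use one table per type.'),
        (3, 2, 1,    NULL, 'It depends.');
""")

# The type column is what tells the two kinds of row apart.
answers = conn.execute(
    "SELECT id, title, body FROM posts "
    "WHERE post_type_id = ? AND parent_id = ? ORDER BY id",
    (ANSWER, 1),
).fetchall()

question_count = conn.execute(
    "SELECT COUNT(*) FROM posts WHERE post_type_id = ?", (QUESTION,)
).fetchone()[0]

print(question_count, answers)
```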
Sketch interactions
First, try not to think about database design, but about how the entities should interact with each other. Think of each entity as having its own class, which represents the required data.
It's always a good start to take pencil and paper and sketch the interactions (or relations) between these entities that you are trying to accomplish. Learning the Database design process
Extendability and reuse
For example, you want to have a User who can post BlogPosts; each BlogPost can have a set of Tags and a relevant set of Comments. Attachments can be added to a BlogPost and also to a Comment.
Reusability and extendability are the key. When sketching your interactions, try to isolate dependencies. Think of it in an OO manner. Let's explore the Attachment a little more. You can create an Attachment table and then extend Attachment by creating BlogPostAttachment and CommentAttachment, where you can easily create relations between these dependent entities. This creates an easily extendable content type which you can further reuse in, e.g., UserDetailsAttachment.
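One possible reading of the Attachment example as tables, sketched with Python's sqlite3 so it can be run directly. The snake_case table names and the helper attachments_for_blog_post are illustrative choices of mine; an ORM such as Doctrine or Propel would generate something in this spirit, but not necessarily this exact layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blog_posts (blog_post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE comments (
        comment_id   INTEGER PRIMARY KEY,
        blog_post_id INTEGER NOT NULL REFERENCES blog_posts(blog_post_id),
        body         TEXT NOT NULL
    );
    -- Shared base: everything any attachment has.
    CREATE TABLE attachments (attachment_id INTEGER PRIMARY KEY, filename TEXT NOT NULL);
    -- Extensions: tie an attachment to one kind of owner.
    CREATE TABLE blog_post_attachments (
        attachment_id INTEGER PRIMARY KEY REFERENCES attachments(attachment_id),
        blog_post_id  INTEGER NOT NULL REFERENCES blog_posts(blog_post_id)
    );
    CREATE TABLE comment_attachments (
        attachment_id INTEGER PRIMARY KEY REFERENCES attachments(attachment_id),
        comment_id    INTEGER NOT NULL REFERENCES comments(comment_id)
    );

    INSERT INTO blog_posts VALUES (1, 'Schema design');
    INSERT INTO comments VALUES (1, 1, 'Nice post');
    INSERT INTO attachments VALUES (1, 'diagram.png'), (2, 'notes.pdf'), (3, 'reply.jpg');
    INSERT INTO blog_post_attachments VALUES (1, 1), (2, 1);
    INSERT INTO comment_attachments VALUES (3, 1);
""")

def attachments_for_blog_post(conn, blog_post_id):
    """Filenames attached directly to a blog post (not to its comments)."""
    rows = conn.execute("""
        SELECT a.filename
        FROM blog_post_attachments bpa
        JOIN attachments a ON a.attachment_id = bpa.attachment_id
        WHERE bpa.blog_post_id = ?
        ORDER BY a.attachment_id
    """, (blog_post_id,))
    return [row[0] for row in rows]

post_files = attachments_for_blog_post(conn, 1)
print(post_files)
```

Reusing attachments for another owner (say, user details) would then mean adding one more small extension table rather than touching the existing ones.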
ORM's to rescue
By studying example usage of object-relational mappers like Doctrine or Propel, you can pick up some ideas for table extendability. Practical examples are always the best.
Related SO questions, which you may be interested in
Good Resources for Relational Database Design
Good PHP ORM Library?
How should a programmer learn great database design?
I know, it's a long way to go, but considering what goes into creating large-scale DB applications with many relations and entity types, it is best to use the help of an ORM in the long run.
You needn't be afraid of using many many tables - the database will happily deal with lots of them without complaining. If you let each content type have its own table, you get certain advantages:
Simplicity: Each table can be fairly simple, and the constraints are straightforward. For example if ContentType1 has a field with a relation to another table, you can make that a foreign key in the database design and the RDBMS will take care of data integrity for you.
Indexing efficiency: if ContentType2 needs to be indexed by date but ContentType3 needs to be indexed by name (to take a simple example), having them in two separate tables means each index is there for exactly the data it needs and nothing else. Combining them in one table means you need both indexes covering the combined dataset, which is messier and uses up more disk space.
If you need to output a list combining two content types, a UNION of the two tables is easy; and if you need to do that often with large amounts of data, an indexed view (where your RDBMS supports them) can make it cheap.
On the other hand, if you have two content types which are very similar (as in the StackOverflow case above for example), you can get some advantages from combining them into one table:
Simplicity: You only need to code the table once - if done right (i.e. the two content types are really very similar), this can make your codebase smaller and simpler.
Extensibility: if a third content type crops up which is again similar to the first two, and similar in the same way that the first two match each other, the table can straightforwardly be extended to store all three content types.
Indexing for performance. If the most common way of getting at the data is to combine the two content types and order them by date (say), a field which is common to both content types, then it can be inefficient to have two separate tables which must repeatedly be UNIONed and then sorted. Combining the two content types in one table lets you put a single index on the date field, allowing faster querying (though remember you can get a similar benefit from indexed views).
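As a sketch of that last point, here are two similar content types sharing one table with a single index on the common date field. The content table, its columns and the index name are hypothetical, and sqlite3 is used only to keep the example runnable. Printing the query plan is one way to check whether the index is being picked up; its exact wording differs between database engines and versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE content (
        content_id   INTEGER PRIMARY KEY,
        content_type TEXT NOT NULL,      -- e.g. 'blog' or 'page'
        title        TEXT NOT NULL,
        created_at   TEXT NOT NULL
    );
    -- One index serves the combined, date-ordered listing.
    CREATE INDEX idx_content_created ON content(created_at);

    INSERT INTO content (content_type, title, created_at) VALUES
        ('blog', 'First post',  '2024-02-01'),
        ('page', 'About us',    '2024-02-03'),
        ('blog', 'Second post', '2024-02-02');
""")

LATEST_SQL = """
    SELECT content_type, title
    FROM content
    ORDER BY created_at DESC
    LIMIT 10
"""

# No UNION and no separate sort of two result sets is needed here.
latest = conn.execute(LATEST_SQL).fetchall()
print(latest)

# Inspect how the engine plans to run the query.
for step in conn.execute("EXPLAIN QUERY PLAN " + LATEST_SQL):
    print(step)
```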
If you normalize rigorously, you will have a database where every entity type has its own table in the database. However, denormalization in various ways (such as combining two entity types in one table) can have benefits which might (depending on the size and shape of your data) outweigh the costs. I'd advise a strategy of keeping all content types separate at least at first, and considering combining them as a tactical denormalization if it turns out to be necessary.
You need to read a book about building websites with PHP and MySQL. It's good practice to google first, because some programmers will otherwise consider it a lazy question. I suggest reading "Learning PHP, MySQL, and JavaScript".
Anyway, before you start coding your site, you need to plan what kind of information you will store; then you design your database. Say a registration form will contain First_Name, Second_Name, DateOfBirth, Country, Gender and Email. You create a table named, say, "USER_INFO" and assign each column a datatype matching the data you would like to store (a number, text, a date, and so on). Then, via PHP, you connect to MySQL and store or retrieve the data you want. You really need to read a book or a tutorial to get a full answer, AND GOOGLE :P
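To give a feel for the create-table, store, retrieve cycle described above, here is a minimal sketch. It uses Python's built-in sqlite3 instead of PHP and MySQL purely so it can be run without installing anything; the flow is the same in PHP. The register helper and the sample values are made up, and the column types are SQLite's (MySQL would typically use VARCHAR and DATE).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_info (
        user_id       INTEGER PRIMARY KEY,
        first_name    TEXT NOT NULL,
        second_name   TEXT NOT NULL,
        date_of_birth TEXT NOT NULL,     -- ISO date string; DATE in MySQL
        country       TEXT,
        gender        TEXT,
        email         TEXT NOT NULL UNIQUE
    )
""")

def register(conn, first_name, second_name, date_of_birth, country, gender, email):
    """Store one registration form; placeholders keep user input out of the SQL text."""
    cur = conn.execute(
        "INSERT INTO user_info "
        "(first_name, second_name, date_of_birth, country, gender, email) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (first_name, second_name, date_of_birth, country, gender, email),
    )
    conn.commit()
    return cur.lastrowid

new_id = register(conn, "Wes", "Smith", "1990-05-17", "US", "M", "wes@example.com")

user = conn.execute(
    "SELECT first_name, date_of_birth, email FROM user_info WHERE user_id = ?",
    (new_id,),
).fetchone()
print(new_id, user)
```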