I'm thinking about how to represent a complex structure in a SQL Server database.
Consider an application that needs to store details of a family of objects, which share some attributes, but have many others not common. For example, a commercial insurance package may include liability, motor, property and indemnity cover within the same policy record.
It is trivial to implement this in C#, etc, as you can create a Policy with a collection of Sections, where Section is inherited as required for the various types of cover. However, relational databases don't seem to allow this easily.
I can see that there are two main choices:
Create a Policy table, then a Sections table, with all the fields required, for all possible variations, most of which would be null.
Create a Policy table and numerous Section tables, one for each kind of cover.
Both of these alternatives seem unsatisfactory, especially as it is necessary to write queries across all Sections, which would involve numerous joins, or numerous null-checks.
What is the best practice for this scenario?
#Bill Karwin describes three inheritance models in his SQL Antipatterns book, when proposing solutions to the SQL Entity-Attribute-Value antipattern. This is a brief overview:
Single Table Inheritance (aka Table Per Hierarchy Inheritance):
Using a single table as in your first option is probably the simplest design. As you mentioned, many attributes that are subtype-specific will have to be given a NULL value on rows where these attributes do not apply. With this model, you would have one policies table, which would look something like this:
+------+---------------------+----------+----------------+------------------+
| id | date_issued | type | vehicle_reg_no | property_address |
+------+---------------------+----------+----------------+------------------+
| 1 | 2010-08-20 12:00:00 | MOTOR | 01-A-04004 | NULL |
| 2 | 2010-08-20 13:00:00 | MOTOR | 02-B-01010 | NULL |
| 3 | 2010-08-20 14:00:00 | PROPERTY | NULL | Oxford Street |
| 4 | 2010-08-20 15:00:00 | MOTOR | 03-C-02020 | NULL |
+------+---------------------+----------+----------------+------------------+
\------ COMMON FIELDS -------/ \----- SUBTYPE SPECIFIC FIELDS -----/
Keeping the design simple is a plus, but the main problems with this approach are the following:
When it comes to adding new subtypes, you would have to alter the table to accommodate the attributes that describe these new objects. This can quickly become problematic when you have many subtypes, or if you plan to add subtypes on a regular basis.
The database will not be able to enforce which attributes apply and which don't, since there is no metadata to define which attributes belong to which subtypes.
You also cannot enforce NOT NULL on attributes of a subtype that should be mandatory. You would have to handle this in your application, which in general is not ideal.
Concrete Table Inheritance:
Another approach to tackle inheritance is to create a new table for each subtype, repeating all the common attributes in each table. For example:
--// Table: policies_motor
+------+---------------------+----------------+
| id | date_issued | vehicle_reg_no |
+------+---------------------+----------------+
| 1 | 2010-08-20 12:00:00 | 01-A-04004 |
| 2 | 2010-08-20 13:00:00 | 02-B-01010 |
| 3 | 2010-08-20 15:00:00 | 03-C-02020 |
+------+---------------------+----------------+
--// Table: policies_property
+------+---------------------+------------------+
| id | date_issued | property_address |
+------+---------------------+------------------+
| 1 | 2010-08-20 14:00:00 | Oxford Street |
+------+---------------------+------------------+
This design will basically solve the problems identified for the single table method:
Mandatory attributes can now be enforced with NOT NULL.
Adding a new subtype requires adding a new table instead of adding columns to an existing one.
There is also no risk that an inappropriate attribute is set for a particular subtype, such as the vehicle_reg_no field for a property policy.
There is no need for the type attribute as in the single table method. The type is now defined by the metadata: the table name.
However this model also comes with a few disadvantages:
The common attributes are mixed with the subtype specific attributes, and there is no easy way to identify them. The database will not know either.
When defining the tables, you would have to repeat the common attributes for each subtype table. That's definitely not DRY.
Searching for all the policies regardless of the subtype becomes difficult, and would require a bunch of UNIONs.
This is how you would have to query all the policies regardless of the type:
SELECT date_issued, other_common_fields, 'MOTOR' AS type
FROM policies_motor
UNION ALL
SELECT date_issued, other_common_fields, 'PROPERTY' AS type
FROM policies_property;
Note how adding new subtypes would require the above query to be modified with an additional UNION ALL for each subtype. This can easily lead to bugs in your application if this operation is forgotten.
Class Table Inheritance (aka Table Per Type Inheritance):
This is the solution that #David mentions in the other answer. You create a single table for your base class, which includes all the common attributes. Then you would create specific tables for each subtype, whose primary key also serves as a foreign key to the base table. Example:
CREATE TABLE policies (
policy_id int,
date_issued datetime,
-- // other common attributes ...
);
CREATE TABLE policy_motor (
policy_id int,
vehicle_reg_no varchar(20),
-- // other attributes specific to motor insurance ...
FOREIGN KEY (policy_id) REFERENCES policies (policy_id)
);
CREATE TABLE policy_property (
policy_id int,
property_address varchar(20),
-- // other attributes specific to property insurance ...
FOREIGN KEY (policy_id) REFERENCES policies (policy_id)
);
This solution solves the problems identified in the other two designs:
Mandatory attributes can be enforced with NOT NULL.
Adding a new subtype requires adding a new table instead of adding columns to an existing one.
No risk that an inappropriate attribute is set for a particular subtype.
No need for the type attribute.
Now the common attributes are not mixed with the subtype specific attributes anymore.
We can stay DRY, finally. There is no need to repeat the common attributes for each subtype table when creating the tables.
Managing an auto incrementing id for the policies becomes easier, because this can be handled by the base table, instead of each subtype table generating them independently.
Searching for all the policies regardless of the subtype now becomes very easy: No UNIONs needed - just a SELECT * FROM policies.
I consider the class table approach as the most suitable in most situations.
The names of these three models come from Martin Fowler's book Patterns of Enterprise Application Architecture.
The 3rd option is to create a "Policy" table, then a "SectionsMain" table that stores all of the fields that are in common across the types of sections. Then create other tables for each type of section that only contain the fields that are not in common.
Deciding which is best depends mostly on how many fields you have and how you want to write your SQL. They would all work. If you have just a few fields then I would probably go with #1. With "lots" of fields I would lean towards #2 or #3.
In addition at the Daniel Vassallo solution, if you use SQL Server 2016+, there is another solution that I used in some cases without considerable lost of performances.
You can create just a table with only the common field and add a single column with the JSON string that contains all the subtype specific fields.
I have tested this design for manage inheritance and I am very happy for the flexibility that I can use in the relative application.
With the information provided, I'd model the database to have the following:
POLICIES
POLICY_ID (primary key)
LIABILITIES
LIABILITY_ID (primary key)
POLICY_ID (foreign key)
PROPERTIES
PROPERTY_ID (primary key)
POLICY_ID (foreign key)
...and so on, because I'd expect there to be different attributes associated with each section of the policy. Otherwise, there could be a single SECTIONS table and in addition to the policy_id, there'd be a section_type_code...
Either way, this would allow you to support optional sections per policy...
I don't understand what you find unsatisfactory about this approach - this is how you store data while maintaining referential integrity and not duplicating data. The term is "normalized"...
Because SQL is SET based, it's rather alien to procedural/OO programming concepts & requires code to transition from one realm to the other. ORMs are often considered, but they don't work well in high volume, complex systems.
The another way to do it, is using the INHERITS component. For example:
CREATE TABLE person (
id int ,
name varchar(20),
CONSTRAINT pessoa_pkey PRIMARY KEY (id)
);
CREATE TABLE natural_person (
social_security_number varchar(11),
CONSTRAINT pessoaf_pkey PRIMARY KEY (id)
) INHERITS (person);
CREATE TABLE juridical_person (
tin_number varchar(14),
CONSTRAINT pessoaj_pkey PRIMARY KEY (id)
) INHERITS (person);
Thus it's possible to define a inheritance between tables.
Alternatively, consider using a document databases (such as MongoDB) which natively support rich data structures and nesting.
I lean towards method #1 (a unified Section table), for the sake of efficiently retrieving entire policies with all their sections (which I assume your system will be doing a lot).
Further, I don't know what version of SQL Server you're using, but in 2008+ Sparse Columns help optimize performance in situations where many of the values in a column will be NULL.
Ultimately, you'll have to decide just how "similar" the policy sections are. Unless they differ substantially, I think a more-normalized solution might be more trouble than it's worth... but only you can make that call. :)
Related
I am not sure how to formulate the question, so I didn't find anything useful enough regarding my problem.
1. In PRO(projects) I have 3 columns: "organizer", "partners", "other". I want to use information from table ORG(organisations) in all of these 3. Also, I need to show more than one partner. Is it possible? For example, I have
org:
name |country|city |
apple |shop |fruits part |
cherry|plate |big |
orange|plate |little |
banana|shop |frozen fruits|
I want to show in view.php:
All projects:
name |organizer|partner |other |place |
salad |banana |apple,cherry |orange |plate,little|
salad2|banana |apple |orange |plate |
Info for place in PRO is taken from two tables, country and city. But country and city are also used by organisation. organisation's country doesn't equal project's country(for example, project takes place in London, but none of the participants'organisations are based in London).
Are all of these things doable with what I already have?
I get "circled" relationship thanks to country/city double usage, is it allowed?(my teacher said no or should be avoided -? I don't remember- and I got different opinions from web).
If you want to Add multiple partners, there are really two ways to do it.
You could add a table to store partner information you could then have a column that contains a foreign key to the projects primary key that they are assigned to.
(this would be best if a partner is never assigned to multiple projects)
this is really an extension of #1. except in this case if you need to have projects assigned to multiple partners and partners assigned to multiple projects you could add a another table with only two columns. The first would be a foreign key to a project primary key, and the second would be a foreign key to partner's primary key. This way if you want to query for all of the partners assigned to project you could do:
Select * from Partners_Table where id=(Select partner_id from CrossReference_Table);
you can do the same the other way too.
Your Country Fields should be foreign keys to your country table, Same with city because your projects are not necessarily in the same city as there respective organization, therefor they should not be linked to the organization's city and country field. Everything else looks good from what I can tell.
I am trying to add Propel 1.7 to a legacy php project as part of a refactor. I have generated the schema.xml file, and am now adding in the many-to-many relations.
However, there is a table that encodes a three-way many to many relationship, so it has three columns as its primary key, each of which refers to a different id column in a different table.
Basically, websites are a concept defined by a country and a language.
ads_to_websites:
ad_id | country_id | lang_id
refer to
ads->id
country->id
lang->id
Is this supported at all in Propel? I understand how to add foreign keys and primary keys to the schema.xml, but what happens when I want to do a join through the table? Can I use isCrossRef? The documentation is slightly vague, I find.
Or should I make a new table, so that we have
websites:
id | country_id | lang_id
refering to
country->id
lang->id
and
ads_to_websites:
website_id | ad_id
refering to
websites->id
ads->id
Which is possible and I think fits the data model, but a pain to implement (another project uses the same db).
Alternatively, would another ORM like Doctrine do this better?
What would be an efficient way to store "Quests" in an SQL database? Let's say the context is RPG. (Here was a previous question: How to store Goals (think RPG Quest) in SQL)
To summarize a Quest may be a combination of the following:
Discover [Location]
Kill n [MOB Type]
Acquire n of [Object]
Achieve a [Skill] in [Skillset]
All the other things you get in RPGs
The answer listed out in the link was:
For the Quest table:
| ID | Title | FirstStep (Foreign key to GuestStep table) | etc.
The QuestStep table
| ID | Title | Goal (Foreign key to Goal table) | NextStep (ID of next QuestStep)
I actually think it's pretty neat, but I have two things I would like to add:
Let's say I want to create it so that a quest can only be active only on certain days (e.g. M W F only) and/or active only at a certain time span (e.g. Halloween). What would be the ideal way of doing this?
Another thing: Let say I want to have a quest with two steps and a quest with 8 steps. We can create a table that is 8 columns wide but we would have lots of empty space. And what if the stars align and I needed an 9 step-wide quest?
The QuestStep table actually has a NextStep, sort of like a linked list, but what about Quests that you can do out of order?
P.S: As you can see it is potentially read-heavy, and the schema is potentially... non-schematic. Is NosSQL a vying option? (Redis seems memory only, so I'll more likely go with MongoDB)
I am developing a (potentially) large-scale tracking software that tracks customer data, along with tickets that are created for tasks associated with said customers. This system is written entirely in PHP, and the database is MySQL.
The system currently supports multiple "locations" (stores for example), and each has its own table for customer data (in the same database, each database can be host to a whole different business' installation). For example:
store1_customers
customer_id | customer_firstname | customer_lastname
----------------------------------------------------
1 | John | Doe
2 | Bill | Bob
store2_customers
customer_id | customer_firstname | customer_lastname
----------------------------------------------------
1 | Jill | Smith
2 | Jimmy | Person
This works great for keeping locations separate for different business needs. However, we are running into the need to have "global" customers for other instances that can be accessed from any location, while keeping other customers separate.
The two options I can think of are to either make a new "global_customers" table that can then be pulled from separately, or to merge all of the data into one large table.
I have concerns with both methods. The first would require a new column in every table that references the customer to determine which customer table to pull from. For example, store1_tickets would have to know whether to pull the customer ID of 1 from store1_customers or from global_customers. This seems to be a bit dirty, and I think would present problems with trying to do my multiple JOIN queries.
The second method of making one giant table concerns me in two ways: the first being the size of the table (each table so far can have potentially 20k+ records, and there are 7 locations for just one particular installation of the "software"). I know this point may be moot due to how MySQL works and can handle it. The second concern is merging the existing data. I see it being a nightmare since each table has a 1-20k customer ID, and I would have to have some way of changing thousands upon thousands of existing records in other tables to match the new numbering of this table.
Is there a better way, or more proper way of accomplishing this? I'm sorry if this question does seem subjective, but it does come down to a database problem and how to handle the data in a reasonable way.
Merge all the data into one large table. That is how databases are designed to be used.
For data migration, you will end up with new Keys, there is no way around that. You could, however, add a new column to store the 'legacy' ID. This is just some of the pain assoicatied with normalizing a database. Take the pain now rahter than presisting with a sub-optimal database design.
Customer type would be another column within the cusotmer table, probably (but depending on your requirements) this would be a FK to a CustomerType table.
So, I'm dealing with an enormous online form right now. It's separated into different sections visually, but those tend to change. With around 300 fields, it almost seems ridiculous to put them into a single table, though if I separate them and someone decides to move a field to a different section on the front end on several different occasions, it will become a mess in the database and fields won't match their front end sections.
I'm essentially asking: What is the best way to organize something like this in a normalized fashion?
You could move the field names to another table and reference them in the value table.
Example
field_id | field_name
------------------------
1 | first_name
2 | last_name
Then reference from the values:
value_id | field_id | value
--------------------------------
1 | 1 | John
2 | 2 | Doe
3 | 1 | Max
4 | 2 | Jefferson
If you're going to use a SQL database, then the Entity-Attribute-Value model (EAV) described above is probably a good answer. You might also want to mix in a couple of denormalized tables with common or specialized data.
Another option might be a document store though; this sounds like just the kind of problem that inspired data stores like MongoDB. In MongoDB you just store everything as a giant json document. If some data isn't needed for some records and is left out, it isn't considered "bad" in the way sparsely populated wide SQL database tables are.
You can group your fields. Separate them as components and you will probably notice, that you can make multiple tables out off that one. Also by separating table you cant make form with, for example:
fieldset tags
separate it in multiple steps (it think the best solution)
multiple ajax requests for each form after previous is filled
form separated by open/close javascript windows
Database design, object design, and form design are three very different elements. If there are relationships between the data in a one to may fashion, you should have different tables to normalize the data. If however, everything is a One-to-one relationship then having all 300 in the same table is perfectly acceptable. I find it difficult to believe that there is a logical or even physical construct that has 300 elements unto itself; but it's possible. If you start getting into attribute data of something lets say we're talking about a vehicle. We could be talking about a car, a truck, a semi, a motorcycle, a bicycle, etc... each of those types of vehicles have different properties which would be managed in separate tables to normalize the data. moving elements of them to different pages wouldn't make a whole lot of sense; but moving common attributes might. For example I wouldn't ask about color on section 1 and again in section 4. But I might section things out to describe make, model, and then custom attributes.