database table design dilemma, a lot of check boxes? - php

I want to begin with Thank you, you guys have been good to me.
I will go straight to the question.
Having a table with over 400 columns, is that bad?
I have web forms that consist mainly of questions that require check box answers.
The total number of check boxes can run up to 400 if not more.
I actually modeled one of the forms, and put each check box in a column (took me hours to do).
Because of my unfamiliarity with database design, I did not feel like that was the right way to go.
So I read somewhere that some people use the serialize function to store a group of check boxes as text in a column.
I just want to know if that would be the best way to store these check boxes.
Oh, and some more info: I will be using the CakePHP ORM with these tables.
Thanks again in advance.
My database looks something like this
Table: Patients, Table: admitForm, Table: SomeOtherForm
Each form table will have a PatientId.
As I stated above, I first attempted creating a table for each form and then putting each check box in a column. That took me forever to do.
Then I read somewhere that serializing the check boxes per question would be a good idea.
So I'm asking what would be a good approach.

For questions with multiple options, just add another table.
The question that nobody has asked you yet is: do you need to do data mining, or put the answers to these checkbox questions into a WHERE clause in a query? If you don't need to run any queries that look at the data contained in these answers, then you can simply serialize them into a few fields. You could even pack them into numbers (though everyone who comes after you will hate you if you pack the data).
Here's my idea of a schema.

== Edit #3 ==
Updated the ERD with the ability to store free-form answers; also linked patient_response_option to the question_option_link table so a patient's response is saved with the correct option context (we know which question the response is to). I will post a few queries soon.
== Edit #2 ==
Updated ERD with form data
== Edit #1 ==
The short answer to your question is no, 400 columns is not the right approach. As an alternative, check out the following schema:
== Original ==
According to your recent edit, you will want to incorporate a pivot table. A pivot table breaks up an M:M relationship between 'patients' and 'options'; for example, many patients can have many options. For this to work, you don't need a table with 400 columns; you just need the aforementioned pivot table.
Example schema:
// patient table
tableName: patient
id: int(11), autoincrement, unsigned, not null, primary key
name_first: varchar(100), not null
name_last: varchar(100), not null
// Options table
tableName: option
id: int(11), autoincrement, unsigned, not null, primary key
name: varchar(100), not null, unique key
// pivot table
tableName: patient_option_link
id: int(11), autoincrement, unsigned, not null, primary key
patient_id: Foreign key to patient (`id`) table
option_id: Foreign key to option (`id`) table
With this schema you can have any number of 'options' without having to add a new column to the patient table, which, if you have a large number of rows, would crush your database every time you had to run an ALTER TABLE ... ADD COLUMN command.
I added an id to the pivot table, so if you ever need to handle individual rows, they will be easier to work with, versus having to know both the patient_id and the option_id.
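For example, a query to pull everything a given patient has checked is just a join through the pivot table (a sketch; the patient id value is made up):
-- List every option patient 42 has checked (42 is a hypothetical id)
SELECT o.name
FROM patient_option_link AS pol
JOIN `option` AS o ON o.id = pol.option_id   -- backticks in case OPTION clashes with a reserved word
WHERE pol.patient_id = 42;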

I think I would split this out into 3 tables. One table representing whatever entity is answering the questions. A second table containing the questions themselves. Finally, a third junction table that will be populated with the primary key of the first table and the id of the question from the second table whenever the entity from the first table selects the check box for that question.
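A minimal sketch of that idea in SQL (the table and column names here are assumptions, not anything from the question):
-- Hypothetical tables: respondent, question, and the junction table respondent_question
-- A checked box for question 12 by respondent 7 becomes one junction row
INSERT INTO respondent_question (respondent_id, question_id)
VALUES (7, 12);

-- "Is that box checked?" is then an existence test
SELECT EXISTS (
    SELECT 1
    FROM respondent_question
    WHERE respondent_id = 7 AND question_id = 12
) AS is_checked;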

Usually 400 columns means your data could be normalized better and broken into multiple tables. 400 columns might actually be appropriate, though, depending on the use case. An example where it might be appropriate is if you need these fields on every single query AND you need to filter records using these columns (i.e., use them in your WHERE clause). In that case the SQL JOINs will likely be more expensive than having a sparsely populated "wide" table.
If you never need to use SQL to filter out records based on these "checkboxes" (I'm guessing they are yes/no boolean/tinyint type values), then serializing is a valid approach. I would go this route if I needed to use the checkbox values most of the time I query the table, but didn't need them in a WHERE clause.
If you don't need these checkbox values, or only need a small subset of them, on a majority of requests to your table, then you should likely work on breaking your table into multiple tables. One approach is to have a table with the checkbox values (id, record_id, checkbox_name, checkbox_value), where record_id is the id of your primary table record. This implies a one-to-many relationship between your primary records and your checkbox values.
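A sketch of that checkbox table, using the column names above (the unique key is my assumption, to stop the same checkbox being stored twice for one record):
-- Checkbox answers stored as rows instead of 400 columns
CREATE TABLE checkbox_value (
    id             INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    record_id      INT UNSIGNED NOT NULL,          -- id of the row in your primary table
    checkbox_name  VARCHAR(100) NOT NULL,
    checkbox_value TINYINT(1) NOT NULL DEFAULT 0,  -- 1 = checked, 0 = unchecked
    UNIQUE KEY uq_record_checkbox (record_id, checkbox_name)
);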

Related

Should I include auto-incremental id in all related tables?

I have multiple tables in a Laravel app with 1-to-1 relationships, such as users, users_settings, user_financial,
and some 1-to-many relationships, such as users_histories.
My questions are:
1. Should I always include an auto-incrementing id as the first column?
For example, is the id necessary in Table #2 below?
Table 1:
id (primary,increments) , name, email, password
Table 2:
id (primary,increments), user_id, something_extra
^ Why does every guide include this? // e.g. https://appdividend.com/2017/10/12/laravel-one-to-one-eloquent-relationships/
Can't I just use user_id as the primary key and skip the incremental key? (I want to auto-insert into table 2 as soon as data is inserted in table 1.)
2. How should I name 1-to-1 and 1-to-many tables in Laravel?
I searched but didn't find any naming convention for the different types of relationships...
Currently I do:
users table with primary key id is the base.
1-to-1: users_settings with foreign key user_id
1-to-many: users_histories with foreign key user_id
many-to-many: users_groups with foreign key user_id
Should the first two tables be named settings/setting and histories/history instead? Sorry, I'm a little confused here.
I actually asked a similar question around two days ago. It's up to you, but I'd say yes. In my case, if I don't auto_increment all the ids in the related tables, data won't be associated with the correct user. However, there is an argument that auto_increment columns should not be used in this case, even though they are useful for other things. According to some, the relationships might not be as meaningful, so it comes down to the specifics of your tables and how meaningful the relationship needs to be. Regardless, you should research the advantages and possible disadvantages of auto-incrementing the ids in related tables before deciding. Either way is fine, but each offers different trade-offs, which you'll need to weigh against your specific case.
This is a well-debated topic. IMHO, no, you shouldn't. Every column in a database should have a purpose. Following this, for your example, I agree that the auto_increment id is redundant, simply because it doesn't have a purpose. The second table still uniquely describes the user, so the primary key should be user_id.
Beyond that, there is another principle I use to decide whether I need an auto_increment id: whether I can see the table as an entity. For example, a user is clearly an entity, but a relationship is not (in most cases), i.e., a composite key can serve the purpose. But when a relationship table is extended to have more attributes, it starts to make sense for it to have an auto_increment id.
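A sketch of that "no surrogate id" variant for the 1-to-1 case, written as plain MySQL rather than a Laravel migration (the column type and something_extra are assumptions):
-- users_settings: 1-to-1 with users, so user_id is both primary key and foreign key
CREATE TABLE users_settings (
    user_id         BIGINT UNSIGNED NOT NULL,   -- type should match users.id
    something_extra VARCHAR(255) NULL,
    PRIMARY KEY (user_id),                      -- the user id itself uniquely identifies the row
    CONSTRAINT fk_users_settings_user
        FOREIGN KEY (user_id) REFERENCES users (id)
        ON DELETE CASCADE
);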
I don't have much experience with Laravel, but the naming of a database table should not be dictated by a framework. Comparing history and user_history: what would a new DBA or developer expect from the two names without looking at the data? user_history describes the table more precisely.

Better approach for updating multiple data

I have this MySQL table, where contact_id is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, the server will receive a new list of contacts for a specific user_id and needs to update this table, inserting, deleting or updating whatever differs from the previous information.
For example, the current data is:
Then the server receives this data:
And the new data becomes:
As you can see, the first row (John) was updated, the second row (Mary) was deleted and another row (Jeniffer) was added.
Today what I am doing is deleting all rows for a specific user_id and inserting the new data. But the auto-increment field (hist_id) keeps getting bigger and bigger...
Note: the table has about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop through the new data, selecting each user_id / contact_id and comparing values to update?
PS: By "better approach" I mean the most efficient way.
Thank you so much for any help!
In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of 'upsert', which here is shorthand for INSERT ... ON DUPLICATE KEY UPDATE. What this means is that if the key pair you're inserting already exists, the specified columns are updated with the specified data. You then shorten your logic and reduce increments. Here's an example, using your table structure, that should work. This also assumes that you have a unique key on the user_id and contact_id pair.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query retains the contact_id but overwrites the preexisting data with the new data.
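If that unique key pair doesn't exist yet, something like this would add it (a sketch against the table definition in the question):
-- Needed so ON DUPLICATE KEY UPDATE fires on the (user_id, contact_id) pair
ALTER TABLE history
    ADD UNIQUE KEY uq_user_contact (user_id, contact_id);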
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto-incremented values. I could have a varchar field as the primary key containing names of fruits and vegetables. Is this optimized for performance? Probably not. There are many situations that might call for auto-increment, and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and rethink how you wish to store and access the data. Do you want to write more logic to control the data, or do you want the data to flow naturally by itself? You've made a history table that functions more like a hybrid many-to-one crosswalk at first glance. Without looking at the remaining table structure, I can't say on a whim that it's not a good idea. What I can say is that I would do this a bit differently; I will answer this more specifically in the next question.
Or maybe the better approach is to loop through the new data, selecting each user_id / contact_id and comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL, and it does that job well. Sometimes we might find ourselves in a situation where we must do this, either to extract data in a specific format or to repair data in some way; however, avoid doing this for inserting or updating data. It can negatively impact performance, and you will likely paint yourself into a corner.
Back to what I said toward the end of your second question, which will help you see what I am talking about. I am going to assume that user_id is an auto-incremented primary key in your user table. I will do some guesstimation here and show you an example of how you could redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationships between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
[Screenshot of the database model (source: xonos.net)]
Here's the SQL so that you can look at it more closely.
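(A minimal sketch of that model, assuming the person/user/contact split with cascading foreign keys described below; the exact column names are assumptions.)
CREATE TABLE person (
    person_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50) NOT NULL,
    last_name  VARCHAR(50) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE user (
    user_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    person_id INT UNSIGNED NOT NULL,
    username  VARCHAR(50) NOT NULL UNIQUE,
    CONSTRAINT fk_user_person FOREIGN KEY (person_id)
        REFERENCES person (person_id)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;

CREATE TABLE contact (
    contact_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    person_id  INT UNSIGNED NOT NULL,
    CONSTRAINT fk_contact_person FOREIGN KEY (person_id)
        REFERENCES person (person_id)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;

CREATE TABLE phone_number (
    phone_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    contact_id INT UNSIGNED NOT NULL,
    phone      VARCHAR(30) NOT NULL,
    CONSTRAINT fk_phone_contact FOREIGN KEY (contact_id)
        REFERENCES contact (contact_id)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;
-- Deleting a person cascades through user, contact and phone_number automatically.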
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone; in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But "Jim Bean" happens to own a side business and does contract work for the company at the same time, so he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean containing the same data, which could quickly become outdated or incorrect.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him, as if he never existed. All we have to do is delete Jim Bean from the person table. That's it. Since the foreign key relationships have CASCADE on update/delete, this automatically propagates and clears out the rows related to him in the other tables.
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.

Populating a third table to maintain efficiency

I am currently working on a PHP/MySQL project for an assignment. While studying efficient database design for the assignment, I noticed that in many cases it is good practice to create a third table when working with only two sets of data.
For example, if we have a table for "Students" and a table for "Addresses", it appears to be a good idea to create a third table, i.e. "Student_Addresses", since a student can hypothetically have more than one address (separated parents, etc.) and a single address can represent more than one student (siblings).
My question is: How do we go about populating that third table? Is there a way that it is done automatically using primary and/or foreign keys?
I've tried Google and my textbook to understand this but I've gotten nowhere. Links to tutorials or articles would be greatly appreciated.
Thanks for your help. I hope the question and example are clear.
n:m or 1:m normalization rule
Option 1:
user table
id
f_name
s_name
......
user address table
id
user_id // plain index only, not unique; a unique key here would restrict the relationship to 1:1
address line 1
address line 2
address line 3
address_type (home, office ....)
Option 2:
user table
id
f_name
s_name
......
address table
id
address line 1
address line 2
address line 3
address_type (home, office ....)
user_address table
userId
addressId
According to your description, option 2 would be the right solution. After adding the data to the user table and the address table, you then need to add the data to the user_address table yourself. Some object-relational mappers (ORMs) can add the data to the third table automatically, but you need to define the relations; check http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/association-mapping.html. A sketch of the manual SQL follows the links below.
http://docstore.mik.ua/orelly/linux/sql/ch02_02.htm
http://www.keithjbrown.co.uk/vworks/mysql/mysql_p7.php
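A sketch of that manual insert in plain SQL, inside a transaction (table names follow option 2 above; the address columns are written with underscores, which is an assumption):
START TRANSACTION;

INSERT INTO user (f_name, s_name) VALUES ('Jane', 'Doe');
SET @new_user_id = LAST_INSERT_ID();

INSERT INTO address (address_line_1, address_type) VALUES ('12 Main St', 'home');
SET @new_address_id = LAST_INSERT_ID();

-- the junction row that links the two
INSERT INTO user_address (userId, addressId)
VALUES (@new_user_id, @new_address_id);

COMMIT;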
You can save the data in the third table using triggers when the data is inserted/updated/deleted in your base tables. You can learn more about triggers at
mySQL Triggers
However, in your case it would be better to write the logic at the application level to make the entry in the third table. You can set up foreign key relationships from this table to your base tables so that the data remains consistent.
There is no native method in MySQL to populate Student_Addresses in your situation; you have to take care of entering the data (connections) yourself, but you can use transactions, for example. See the answers in this topic: SQL Server: Is it possible to insert into two tables at the same time?
To keep the connections consistent, make the relations to the Student ID and the Address ID not-null fields in Student_Addresses, make both fields a unique key together, and use ON UPDATE CASCADE and ON DELETE CASCADE. This takes care of removing records from the junction table when records are removed from either of the two other tables, and also won't allow you to add the same address to the same student twice.
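A sketch of that junction table, assuming the Students and Addresses tables each use an integer id primary key:
CREATE TABLE student_addresses (
    student_id INT UNSIGNED NOT NULL,
    address_id INT UNSIGNED NOT NULL,
    UNIQUE KEY uq_student_address (student_id, address_id),  -- same address can't be linked to the same student twice
    CONSTRAINT fk_sa_student FOREIGN KEY (student_id)
        REFERENCES students (id) ON UPDATE CASCADE ON DELETE CASCADE,
    CONSTRAINT fk_sa_address FOREIGN KEY (address_id)
        REFERENCES addresses (id) ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;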
I don't think the data will be populated automatically; rather, it's the responsibility of the application to insert it.
I am not sure about PHP, but with Hibernate and Java this can be done seamlessly. Since the student and address data could be coming in through a web application, Hibernate can map Java objects to table records and also populate the relationship table.

Merge several mySQL databases with equivalent structure

I would like to write a PHP script that merges several databases, and I would like to be sure how to go about it before I start anything.
I have 4 databases which have the same structure and almost the same data. I want to merge them without any duplicate entries while preserving (or re-linking) the foreign keys.
For example, there is a db1.products table which is almost the same as db2.products, so I think I would have to use a LIKE comparison on the name and description columns to be sure that I only insert new rows. But then, when merging the orders table, I have to make sure that the productID still points to the right product.
So I thought of 2 solutions:
Either I use, for each table, insert into db1.x select * from db2.x and then make the new links and check for duplicates using triggers.
Or I delete the duplicate entries and update the new foreign keys (after having dropped the constraints), and then insert the rows into the main database.
I just heard of MySQL Data Compare and Toad for MySQL; could they help me merge the tables?
Could someone point me to the right solution?
Sorry for my English, and thank you!
First thing: how are you determining whether products are the same? You mentioned a LIKE comparison on name and description. You need to establish a rule that says a product is one and the same across your db1, db2 and so on.
However, let's assume that a product's name and description are the attributes that define it.
ALTER TABLE products ADD UNIQUE (`name`, `description`);
Run this on all of your databases.
After you've done that, select one of the databases you wish to import into and run the following query:
INSERT IGNORE INTO db1.products SELECT * FROM db2.products;
Repeat for the remaining databases.
Naturally, this all fails if you can't determine how you're going to compare the products.
Note: avoid using reserved words or common keywords (such as "name") for your column names.
Firstly, good luck with this - sounds like a tricky job.
Secondly, I wouldn't do this with PHP - I'd write SQL to do the work, assuming this is a one-off migration task and not a recurring task.
As an approach, I would do the following.
Create a database with the schema you want - it sounds like each of your 4 databases has small variations in the schema. Just create the schema for now; don't worry about the data.
Create a "working" database, with the same schema, but with columns for "old" primary keys. For instance:
table ORDER
order_id int primary key auto increment
old_order_id int not null
...other columns...
table ORDER_LINE
order_line_id int primary key auto increment
old_order_line_id int not null
order_id int foreign key
...other columns...
Table by table, insert into your working database from your first source database. Let the primary keys auto-increment, but put the original primary key into the "old_" column.
For instance:
insert into workingdb.orders
select null, order_id, ....other columns...
from db1.orders
Where you have a foreign key, populate it by finding the record in the old_ column.
For instance:
insert into workingdb.order_line
select null, ol.order_line_id, o.order_id
from db1.order_line ol,
workingdb.orders o
where ol.order_id = o.old_order_id
Rinse and repeat for the other databases.
Finally, copy the data from your working database into the "proper" database. This is optional - it may help to retain the old IDs for lookups etc.
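For that final copy, a sketch (the target database name and column list are placeholders):
-- Move the re-keyed rows across, leaving the old_ id columns behind
INSERT INTO finaldb.orders (order_id /* ...other columns... */)
SELECT order_id /* ...other columns... */
FROM workingdb.orders;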

MySQL design for associated IDs

I'm new to programming so forgive my simple questions.
Basically, I have two different tables containing data related to one another. I'd like to create a new column called "id" which will associate rows in both tables so that I can appropriately display the data.
When a user takes an action, a row is inserted into both tables.
What kind of properties should "id" have? A primary key? Auto-increment on both tables or just one? How do I ensure that the same ID is inserted into both rows? Do I insert into table1 first, then grab that ID and insert into table2?
Any help appreciated. Thanks
It's somewhat difficult to answer your question without knowing what the two tables contain, but I suggest you read about database normalization.
Regardless of how many tables you decide to have, each table should have an id column of some sort. Having a way to uniquely refer to a single row makes life a lot easier down the road when you need to make changes to the data. Auto-increment saves you from having to come up with your own unique primary key values.
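For the "same ID in both rows" part of the question, a common MySQL pattern is to insert into the first table and reuse the generated id for the second (a sketch; table1/table2 and their columns are placeholders from the question):
START TRANSACTION;

INSERT INTO table1 (some_column) VALUES ('first part of the action');
SET @shared_id = LAST_INSERT_ID();   -- the AUTO_INCREMENT id generated by table1

-- table2.id is not auto-increment; it simply stores the same value
INSERT INTO table2 (id, other_column) VALUES (@shared_id, 'second part of the action');

COMMIT;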
