I'm new to programming so forgive my simple questions.
Basically, I have two different tables containing data related to one another. I'd like to create a new column called "id" which will associate rows in both tables so that I can appropriately display the data.
When a user takes an action, a row is inserted into both tables.
What kind of properties should "id" have? Primary key, auto-increment on both tables or only one? How do I ensure that the same ID is inserted into both rows? Do I insert into table1 first, then grab that ID and insert it into table2?
Any help appreciated. Thanks
It's somewhat difficult to answer your question without knowing what the two tables contain, but I suggest you read about database normalization.
Regardless of how many tables you decide to have, each table should have an id column of some sort. Having a way to uniquely refer to a single row makes life a lot easier down the road when you need to make changes to the data. Auto-increment saves you from having to come up with your own unique primary key values.
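To make the "insert into table1 first, then grab that ID" idea from the question concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere; the column names (action, detail) and the helper record_action are made up for illustration. Only table1 generates ids, and table2 reuses them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- ids are generated here only
        action TEXT NOT NULL                       -- hypothetical column
    );
    CREATE TABLE table2 (
        id     INTEGER PRIMARY KEY,                -- copied from table1, never generated
        detail TEXT NOT NULL,                      -- hypothetical column
        FOREIGN KEY (id) REFERENCES table1 (id)
    );
""")


def record_action(conn, action, detail):
    """Insert one row into each table so that both rows share the same id."""
    with conn:  # one transaction: both rows are saved, or neither
        cur = conn.execute("INSERT INTO table1 (action) VALUES (?)", (action,))
        new_id = cur.lastrowid  # the id table1 just generated
        conn.execute(
            "INSERT INTO table2 (id, detail) VALUES (?, ?)", (new_id, detail)
        )
    return new_id


first_id = record_action(conn, "signup", "came from the web form")
second_id = record_action(conn, "login", "came from the mobile app")

# The shared id is what lets the two tables be joined for display.
rows = conn.execute(
    "SELECT t1.id, t1.action, t2.detail "
    "FROM table1 AS t1 JOIN table2 AS t2 ON t2.id = t1.id ORDER BY t1.id"
).fetchall()
print(rows)
```

In MySQL the same pattern would rely on LAST_INSERT_ID() (or the client library's equivalent) instead of lastrowid.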
I have this MySQL table, where contact_id is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, the server will receive a new list of contacts for a specific user_id and needs to update this table by inserting, deleting or updating whatever differs from the previous data.
For example, the current data is:
Then the server receives this data:
And the new data is:
As you can see, the first row (John) was updated, the second row (Mary) was deleted, and another row (Jeniffer) was added.
Today what I am doing is deleting all rows with a specific user_id, and inserting the new data. But the autoincrement field (hist_id) is getting bigger and bigger...
Note: The table has about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
PS. By "better approach" I mean the most efficient one.
Thank you so much for any help!
In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of an 'upsert' (update-or-insert), which MySQL provides as INSERT ... ON DUPLICATE KEY UPDATE. What this means is that if the key pair you're inserting already exists, the specified columns are updated with the specified data instead of a new row being inserted. You then shorten your logic and reduce increments. Here's an example, using your table structure, that should work. It assumes that you have a unique index covering user_id and contact_id together.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query should keep the existing row for that contact_id but overwrite the preexisting name and phone with the new data.
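An upsert on its own handles new and changed contacts but not contacts that disappeared from the incoming list (Mary in the question). One possible way to cover all three cases is sketched below. It runs on Python's built-in sqlite3 module rather than MySQL, so it uses SQLite's ON CONFLICT ... DO UPDATE (SQLite 3.24 or newer) where MySQL would use ON DUPLICATE KEY UPDATE. The helper sync_contacts, the UNIQUE constraint and the sample phone numbers are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE history (
        hist_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id    INTEGER NOT NULL,
        contact_id INTEGER NOT NULL,
        name       TEXT,
        phone      TEXT,
        UNIQUE (user_id, contact_id)   -- assumed: the upsert needs this key
    );
""")


def sync_contacts(conn, user_id, contacts):
    """Make history match `contacts`, a list of (contact_id, name, phone)."""
    with conn:  # one transaction for the whole sync
        # Insert new contacts and update existing ones in place.
        conn.executemany(
            """INSERT INTO history (user_id, contact_id, name, phone)
               VALUES (?, ?, ?, ?)
               ON CONFLICT (user_id, contact_id)
               DO UPDATE SET name = excluded.name, phone = excluded.phone""",
            [(user_id, cid, name, phone) for cid, name, phone in contacts],
        )
        # Remove contacts that are no longer in the incoming list.
        keep = [cid for cid, _, _ in contacts]
        if keep:
            placeholders = ",".join("?" * len(keep))
            conn.execute(
                "DELETE FROM history WHERE user_id = ? "
                f"AND contact_id NOT IN ({placeholders})",
                [user_id, *keep],
            )
        else:
            conn.execute("DELETE FROM history WHERE user_id = ?", (user_id,))


def hist_id_of(conn, user_id, contact_id):
    return conn.execute(
        "SELECT hist_id FROM history WHERE user_id = ? AND contact_id = ?",
        (user_id, contact_id),
    ).fetchone()[0]


# Hypothetical sample data; the original tables were not preserved.
sync_contacts(conn, 1, [(10, "John", "555-0100"), (11, "Mary", "555-0101")])
hist_before = hist_id_of(conn, 1, 10)

sync_contacts(conn, 1, [(10, "John Smith", "555-0199"), (12, "Jeniffer", "555-0102")])
hist_after = hist_id_of(conn, 1, 10)

rows = conn.execute(
    "SELECT contact_id, name, phone FROM history WHERE user_id = 1 ORDER BY contact_id"
).fetchall()
print(rows)
```

John's row keeps its hist_id because it is updated rather than deleted and re-inserted.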
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto-incremented values. I could have a varchar field as the primary key containing names of fruits and vegetables. Is this optimized for performance? Probably not. There are many situations that might call for auto-increment, and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and rethink how you wish to store and access the data. Do you want to write more logic to control the data OR do you want the data to flow naturally by itself? At first glance, you've made a history table that functions more like a hybrid many-to-one crosswalk. Without looking at the remaining table structure, I can't say offhand that it's not a good idea. What I can say is that I would do this a bit differently. I will answer this more specifically in the next question.
Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL, and it does this job well. Sometimes we find ourselves in a situation where we must loop, either to extract data in a specific format or to repair data in some way. However, avoid doing this for inserting or updating data. It can hurt performance, and you will likely paint yourself into a corner.
Back to what I said at the end of my answer to your second question, which will help you see what I am talking about. I am going to assume that user_id is an auto-incremented primary key in your user table. I will do some guesswork here and show you an example of how you can redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationships between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
(source: xonos.net)
Here's the SQL so that you can look at it more closely.
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone, in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But, "Jim Bean" happens to own a side business and does contact work for the company at the same time. So, he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean that contain the same data which could become outdated or incorrect, quickly.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him, as if he never existed. All that we have to do is delete Jim Bean from the Person table. That's it. Since the foreign key relationships have "CASCADE" on update/delete, the deletion automatically propagates and clears out the rows related to him in the other tables.
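The cascading delete in the second scenario can be tried out with a small sketch. The original SQL for the model is not reproduced here, so the three tables below (person, users, contacts) and their columns are a simplified guess at its shape, and Python's built-in sqlite3 module stands in for MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY,
        person_id INTEGER NOT NULL
            REFERENCES person (person_id) ON UPDATE CASCADE ON DELETE CASCADE
    );
    CREATE TABLE contacts (
        contact_id INTEGER PRIMARY KEY,
        person_id  INTEGER NOT NULL
            REFERENCES person (person_id) ON UPDATE CASCADE ON DELETE CASCADE
    );
""")

TABLES = ("person", "users", "contacts")


def row_counts(conn):
    """Return {table_name: number_of_rows} for the three tables."""
    return {
        t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in TABLES
    }


with conn:
    # Jim Bean is stored once as a person, then referenced as user and contact.
    conn.execute("INSERT INTO person (person_id, name) VALUES (1, 'Jim Bean')")
    conn.execute("INSERT INTO users (user_id, person_id) VALUES (1, 1)")
    conn.execute("INSERT INTO contacts (contact_id, person_id) VALUES (1, 1)")
before = row_counts(conn)

with conn:
    # Deleting the person row is the only statement we issue.
    conn.execute("DELETE FROM person WHERE person_id = 1")
after = row_counts(conn)

print(before, after)
```

In MySQL the cascade requires a storage engine that enforces foreign keys, such as InnoDB.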
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.
I am developing a MySQL db for a user list, and I am trying to determine the most efficient way to design it.
My issue comes in that there are 3 types of users: "general", "normal", and "super". General and normal users differ only in the values of certain columns, so the schema to store them is identical. However, super users have at least 4 extra columns of info that needs to be stored.
In addition, each user needs a unique user_id for reference from other parts of the site.
So, I can keep all 3 users in the same table, but then I would have a lot of NULL values stored for the general and normal user rows.
Or, I can split the users into 2 tables: general/normal and super. This would get rid of the abundance of NULLs, but would require a lot more work to keep track of the user_ids and ensure they are unique, as I would have to handle that in my PHP instead of just doing a SERIAL column in the single table solution above.
Which solution is more efficient in terms of memory usage and performance?
Or is there another, better solution I am not seeing?
Thanks!
If each user needs a unique id, then you have the answer to your question: You want one users table with a UserId column. Often, that column would be an auto-incremented integer primary key column -- a good approach to the implementation.
What to do about the other columns? This depends on a number of different factors, which are not well explained in your question.
You can store all the columns in the same table. In fact, you could then implement views so you can see users of only one type. However, if a lot of the extra columns are fixed-width (such as numbers) then space is still allocated. Whether or not this is an issue is simply a question of the nature of the columns and the relative numbers of different users.
You can also store the extra columns for each type in its own table. This would have a foreign key relationship to the original table, using the UserId. If both these keys are primary keys, then the joins should be very fast.
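As a rough sketch of the separate-table option: the question does not name the extra super-user columns, so department and audit_level below are hypothetical stand-ins, and Python's built-in sqlite3 module is used in place of MySQL. Every user gets its id from the single users table, and only super users have a row in the details table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id   INTEGER PRIMARY KEY AUTOINCREMENT,
        user_type TEXT NOT NULL
            CHECK (user_type IN ('general', 'normal', 'super')),
        name      TEXT NOT NULL
    );
    -- Same primary key as users, so the relationship is at most one-to-one.
    CREATE TABLE super_user_details (
        user_id     INTEGER PRIMARY KEY REFERENCES users (user_id),
        department  TEXT,      -- hypothetical extra column
        audit_level INTEGER    -- hypothetical extra column
    );
""")

with conn:
    conn.execute("INSERT INTO users (user_type, name) VALUES ('general', 'Ann')")
    cur = conn.execute("INSERT INTO users (user_type, name) VALUES ('super', 'Bob')")
    conn.execute(
        "INSERT INTO super_user_details (user_id, department, audit_level) "
        "VALUES (?, ?, ?)",
        (cur.lastrowid, "ops", 3),
    )

# A LEFT JOIN returns every user; the extra columns are NULL for non-super users.
rows = conn.execute("""
    SELECT u.user_id, u.name, u.user_type, d.department, d.audit_level
    FROM users AS u
    LEFT JOIN super_user_details AS d ON d.user_id = u.user_id
    ORDER BY u.user_id
""").fetchall()
print(rows)
```

No NULLs are stored for general and normal users; they only appear in the query result.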
There are more exotic possibilities as well. If the columns do not need to be indexed, then MySQL 5.7 has support for JSON, so they could all go into one column. Some databases (particularly column-oriented ones) allow "vertical partitioning", where different columns in a single table are stored in separate allocation units. MySQL does not (yet) support vertical partitioning.
Why not build an extra table, but only for the extra columns you need for super users? That gives you two tables: one with all the users and one with the super users' extra info.
If you want this type of schema, try creating a relation
like:
tb_user > user_id , user_type_id(int)
tb_user_type > user_type_id(int) , type_name
This way you will have just two tables, and if the type is not set you can assign a default value to the user.
thanks in advance for any help.
I have a question about foreign keys. I understand the concept of having the data from one table inserted into another for reference. But my question is, how does it get there?
Currently I have two tables and two forms. One form inserts data into table A, the other form inserts into B. Then I use a function to get the id from the last insert into A and insert it into B. Is this the proper way to do this or am I missing something?
There are two possibilities :
You know the primary key before the insertion in table A => Then your technique isn't the right one, since you're retrieving something you already added.
You don't know it (Example: auto-incremented id's) => Then your technique is the right one, and I don't think there is any other better way to achieve what you are asking for.
Note that what I called the primary key is the primary key of the row in table A, and a foreign key for rows in table B.
Short answer: I don't believe you are missing anything. There are many ways to accomplish what you are after, but the approach you describe is probably the most common and straightforward.
Another way is to use a trigger on table A to populate table B after insert (this only works if you do not need any additional user input, like form input to insert into table B).
Since you cannot insert both ids at the same time, yes, that is a proper way to do it.
First, insert the record into the primary table.
Second, get the last insert id using the mysqli_insert_id() function.
Then insert the data into the foreign table using that primary key.
quick question.
In my user database I have 5 separate tables all containing different information. 4 tables are connected by foreign key to the primary key of the first table.
I want to trigger row inserts on the other 4 tables when I do an insert on the first (primary) table. I thought that ON UPDATE CASCADE would do this for me, but after trying it I realised it did not... I know, the clue is in the name: ON UPDATE!
I also tried and failed at multiple triggers on the same table but found this was not possible either.
What I am planning on doing is putting a trigger on the first to INSERT on the second and then putting a trigger on the second to insert on the third......etc
Would just like to know if this is a wise thing to do or not or if I am missing a better and simpler way of doing this.
Any help/advice much appreciated.
Based on the given information, it "feels" as if there might be a flaw in the database design if each of the child tables requires a row for every single row in the parent table. There is a reason that "ON INSERT CASCADE" does not exist; it is typically not considered meaningful.
The first thought that comes to mind is that the child tables should actually be part of the parent table; it sounds as if there is a one-to-one relationship. It still may make sense to have separate tables from an organizational standpoint (and size of records), but it is something to think about.
If there is not a one-to-one relationship, then the ability to add meaningful data beyond default values to the child tables would imply there might be a bit more normalization of data required. If the only values to be added are NULLs, then one could maybe argue that there is no real point in having the record because a LEFT JOIN could produce the same results without that record.
Having said all that, if it is required, I would think that it would be better to have a single trigger on the parent table add all the records to the child tables rather than chain them in several triggers. That way the logic would be contained in a single location.
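A single trigger that fills every child table might look like the sketch below. It is written for Python's built-in sqlite3 module so that it can be run as-is; MySQL's trigger syntax differs slightly (for example, it requires FOR EACH ROW). The question mentions four child tables without naming them, so the two shown here (user_profile, user_settings) and their default-valued columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (
        user_id  INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL
    );
    CREATE TABLE user_profile (
        user_id INTEGER PRIMARY KEY REFERENCES users (user_id),
        bio     TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE user_settings (
        user_id INTEGER PRIMARY KEY REFERENCES users (user_id),
        theme   TEXT NOT NULL DEFAULT 'light'
    );

    -- One trigger on the parent keeps all of the child-row logic in one place.
    CREATE TRIGGER users_after_insert AFTER INSERT ON users
    BEGIN
        INSERT INTO user_profile  (user_id) VALUES (NEW.user_id);
        INSERT INTO user_settings (user_id) VALUES (NEW.user_id);
    END;
""")

with conn:
    # Only the parent insert is issued by the application.
    conn.execute("INSERT INTO users (username) VALUES ('alice')")

profile = conn.execute("SELECT user_id, bio FROM user_profile").fetchall()
settings = conn.execute("SELECT user_id, theme FROM user_settings").fetchall()
print(profile, settings)
```

This only works because every child column has a default; anything that needs user input cannot come from a trigger.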
Not understanding your structure (the information you need in each of these tables is pertinent to answering correctly), I can only guess that a trigger might not be what you want here. If your tables have other fields beyond what is in table 1 and they do not have default values, how will you get the values for those other fields in the trigger? Personally, I would use a stored proc to insert into table1, get the id value back from the insert, and then insert into the other tables with the additional information needed, putting it all in a transaction so that if one insert fails, all are rolled back.
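The all-or-nothing behaviour described above can be illustrated with the following sketch. SQLite has no stored procedures, so a plain Python function (the hypothetical create_user) plays that role here, using the built-in sqlite3 module; the table and column names are also made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,
        username TEXT NOT NULL
    );
    CREATE TABLE table2 (
        user_id INTEGER PRIMARY KEY REFERENCES table1 (id),
        email   TEXT NOT NULL       -- no default: the caller must supply it
    );
""")


def create_user(conn, username, email):
    """Insert into both tables atomically; return the new id, or None on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            cur = conn.execute(
                "INSERT INTO table1 (username) VALUES (?)", (username,)
            )
            conn.execute(
                "INSERT INTO table2 (user_id, email) VALUES (?, ?)",
                (cur.lastrowid, email),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        return None


ok_id = create_user(conn, "alice", "alice@example.com")
failed_id = create_user(conn, "bob", None)  # second insert violates NOT NULL

# The failed call left nothing behind in table1, even though its first insert ran.
names = [row[0] for row in conn.execute("SELECT username FROM table1")]
print(ok_id, failed_id, names)
```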
I want to begin with Thank you, you guys have been good to me.
I will go straight to the question.
Having a table with over 400 columns, is that bad?
I have web forms that consist mainly of questions that require check box answers.
The total number of check boxes can run up to 400 if not more.
I actually modeled one of the forms, and put each check box in a column (took me hours to do).
Because of my unfamiliarity with database design, I did not feel like that was the right way to go.
So I read somewhere that some people use the serialize function, to store a group of check boxes as text in a column.
I just want to know if that would be the best way to store these check boxes.
Oh and some more info I will be using cakephp orm with these tables.
Thanks again in advance.
My database looks something like this
Table : Patients, Table : admitForm, Table : SomeOtherFOrm
each form table will have a PatientId
As I stated above, I first attempted creating a table for each form and then putting each check box in a column. That took me forever to do.
So I read somewhere that serializing the check boxes per question would be a good idea.
So I'm asking what would be a good approach.
For questions with multiple options, just add another table.
The question that nobody has asked you yet is: do you need to do data mining on the answers to these checkbox questions, or put them into a WHERE clause in a query? If you don't need to run any queries that look at the data contained in these answers, then you can simply serialize them into a few fields. You could even pack them into numbers (though all who come after you will hate you if you pack the data).
Here's my idea of a schema.
== Edit #3 ==
Updated ERD with the ability to store free-form answers. I also linked patient_reponse_option to the question_option_link table so a patient's response will be saved with the correct option context (we know which question the response is to). I will post a few queries soon.
== Edit #2 ==
Updated ERD with form data
== Edit #1 ==
The short answer to your question is no, 400 columns is not the right approach. As an alternative, check out the following schema:
== Original ==
According to your recent edit, you will want to incorporate a pivot table. A pivot table breaks up a M:M relationship between 'patients' and 'options', for example, many patients can have many options. For this to work, you don't need a table with 400 columns, you just need to incorporate the aforementioned pivot table.
Example schema:
// patient table
tableName: patient
id: int(11), autoincrement, unsigned, not null, primary key
name_first: varchar(100), not null
name_last: varchar(100), not null
// Options table
tableName: option
id: int(11), autoincrement, unsigned, not null, primary key
name: varchar(100), not null, unique key
// pivot table
tableName: patient_option_link
id: int(11), autoincrement, unsigned, not null, primary key
patient_id: Foreign key to patient (`id`) table
option_id: Foreign key to option (`id`) table
With this schema you can have any number of 'options' without having to add a new column to the patient table. That matters because, on a table with a large number of rows, running an ALTER TABLE ... ADD COLUMN command will crush your database.
I added an id to the pivot table, so if you ever need to handle individual rows, they will be easier to work with, vs having to know the patient_id and option_id.
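Here is a small runnable sketch of the pivot-table schema above, using Python's built-in sqlite3 module in place of MySQL. The sample patients and option names are invented, and the UNIQUE (patient_id, option_id) constraint is an addition of mine to stop the same box being recorded twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        name_first TEXT NOT NULL,
        name_last  TEXT NOT NULL
    );
    -- Quoted because OPTION is a keyword in some SQL dialects.
    CREATE TABLE "option" (
        id   INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE patient_option_link (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        patient_id INTEGER NOT NULL REFERENCES patient (id),
        option_id  INTEGER NOT NULL REFERENCES "option" (id),
        UNIQUE (patient_id, option_id)   -- assumption: one link per checked box
    );
""")

with conn:
    conn.executemany(
        "INSERT INTO patient (name_first, name_last) VALUES (?, ?)",
        [("Ann", "Lee"), ("Raj", "Patel")],
    )
    conn.executemany(
        'INSERT INTO "option" (name) VALUES (?)',
        [("smoker",), ("diabetic",), ("allergies",)],
    )
    # A checked box is one link row; an unchecked box is simply absent.
    conn.executemany(
        "INSERT INTO patient_option_link (patient_id, option_id) VALUES (?, ?)",
        [(1, 1), (1, 3), (2, 3)],
    )

# Which boxes did patient 1 check?
checked = [name for (name,) in conn.execute(
    'SELECT o.name FROM patient_option_link AS l '
    'JOIN "option" AS o ON o.id = l.option_id '
    'WHERE l.patient_id = ? ORDER BY o.name', (1,))]

# Which patients checked "allergies"? This is the query a serialized column cannot answer easily.
with_allergies = [first for (first,) in conn.execute(
    'SELECT p.name_first FROM patient AS p '
    'JOIN patient_option_link AS l ON l.patient_id = p.id '
    'JOIN "option" AS o ON o.id = l.option_id '
    'WHERE o.name = ? ORDER BY p.id', ("allergies",))]

print(checked, with_allergies)
```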
I think I would split this out into 3 tables. One table representing whatever entity is answering the questions. A second table containing the questions themselves. Finally, a third junction table that will be populated with the primary key of the first table and the id of the question from the second table whenever the entity from the first table selects the check box for that question.
Usually 400 columns means your data could be normalized better and broken into multiple tables. 400 columns might actually be appropriate, though, depending on the use case. An example where it might be appropriate is if you need these fields on every single query AND you need to filter records using these columns (ie: use them in your WHERE clause)... in that case the SQL JOINs will likely be more expensive than having a sparsely populated "wide" table.
If you never need to use SQL to filter out records based on these "checkboxes" (I'm guessing they are yes/no boolean/tinyint type values), then serializing is a valid approach. I would go this route if I needed to use the checkbox values most of the time I query the table, but don't need to use them in a WHERE clause.
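For comparison, the serializing route could look roughly like this. The sketch stores the checked boxes of one form as a JSON list in a single text column, using Python's built-in json and sqlite3 modules in place of PHP's serialize and MySQL. The admitForm and PatientId names come from the question; the answers column, the helper functions and the checkbox names are assumptions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE admitForm (
        id        INTEGER PRIMARY KEY AUTOINCREMENT,
        PatientId INTEGER NOT NULL,
        answers   TEXT NOT NULL     -- hypothetical column holding a JSON list
    )
""")


def save_form(conn, patient_id, checked):
    """Store the names of the checked boxes as one JSON string."""
    with conn:
        conn.execute(
            "INSERT INTO admitForm (PatientId, answers) VALUES (?, ?)",
            (patient_id, json.dumps(sorted(checked))),
        )


def load_form(conn, patient_id):
    """Return the set of checked box names for a patient (empty if none saved)."""
    row = conn.execute(
        "SELECT answers FROM admitForm WHERE PatientId = ?", (patient_id,)
    ).fetchone()
    return set(json.loads(row[0])) if row else set()


save_form(conn, 7, {"q1_fever", "q3_cough"})
restored = load_form(conn, 7)
print(restored)
```

The whole form round-trips in one column, but the database itself cannot cheaply answer "which patients checked q3_cough?".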
If you don't need these checkbox values, or only need a small subset of them, on a majority of requests to your table, then it's likely you should work on breaking your table into multiple tables. One approach is to have a table with the checkbox values (id, record_id, checkbox_name, checkbox_value) where record_id is the id of your primary table record. This implies a one-to-many relationship between your primary records and your checkbox values.