Let's say you've got two tables like the following in a MySQL database:
TABLE people:
primary key: PERSON_ID,
NAME,
SURNAME, etc.
TABLE addresses:
primary key: ADDRESS_ID,
foreign key: PERSON_ID,
addressLine1, etc.
If you manage the creation of rows (in both tables) and the retrieval of data through PHP, do you still need to create a physical relationship in the database? If yes, why?
Yes, one concrete reason is faster retrieval of rows when you join the tables: creating a foreign key constraint automatically creates an index on the column.
So the Address table's schema should look like this (assuming the People table's primary key is Person_ID):
CREATE TABLE Address
(
Address_ID INT,
Person_ID INT,
......,
CONSTRAINT tb_pk PRIMARY KEY (Address_ID),
CONSTRAINT tb_fk FOREIGN KEY (Person_ID)
REFERENCES People(Person_ID)
)
Strictly speaking, you don't need to use FKs. Careful indexing and well-written queries might seem sufficient. However, FKs, and certainly FK constraints, are very useful when it comes to securing data consistency (avoiding orphaned data, for example).
Suppose you wrote your application, everything is tested and it works like a charm. Great, but who's to say that you'll be around every time something has to be changed? Are you going to maintain the code by yourself, or is it likely that someone else might end up doing a quick fix/tweak or implementing another feature down the road? In reality, you're never going to be the only one writing and maintaining the code, and even if you are, you're almost certainly going to encounter bugs as time passes... Foreign keys inform both your co-workers and you that data in tbl1 depends on data in tbl2 and vice versa. Just like comments, this makes the application easier to maintain.
Bugs are easier to detect. Suppose you create a method that deletes a record from tbl1, but forget to update tbl2 to reflect the change made to the first table. When this happens, the data is corrupted, but the query that caused it won't result in errors: the SQL is syntactically correct and the action it performs is the desired action. These kinds of bugs can remain hidden for quite some time, and by the time they're spotted, god knows how much data has been corrupted...
Lastly, and this is an argument that is used all too often: what if the connection to the DB is lost mid-way through a series of update/delete queries? I haven't actually seen this happen, but I don't know of anybody who doesn't write code to protect against just such a scenario: you're deleting or updating several related records, but mid-way through, the connection with the DB gets cut off for some reason. You might have edited tbl2, but the connection was lost before the query for tbl1 was sent. Again, we end up with corrupted data. FK constraints enable you to cascade certain actions. Delete from tbl1 with an ON DELETE CASCADE rule in place, and you can rest assured that the related records are deleted from tbl2. In the same situation, ON DELETE RESTRICT can be a fairly useful rule, too.
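To make the cascade concrete, here's a minimal sketch (tbl1/tbl2 and the column names are illustrative; InnoDB is assumed, since MyISAM ignores FK constraints):
CREATE TABLE tbl1 (
    id INT PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE tbl2 (
    id INT PRIMARY KEY,
    tbl1_id INT NOT NULL,
    CONSTRAINT fk_tbl2_tbl1 FOREIGN KEY (tbl1_id)
        REFERENCES tbl1 (id)
        ON DELETE CASCADE  -- deleting a tbl1 row removes its tbl2 children
) ENGINE=InnoDB;
-- This one statement now also removes all related tbl2 rows, atomically:
DELETE FROM tbl1 WHERE id = 1;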
Note that FKs aren't the ultimate answer to life, the universe and everything (that's 42, as we all know), but they are a vital part of truly relational database design.
Referential integrity is an article that you should read and comprehend.
There are two ways:
-The first is to handle everything on the coding end: you yourself manage what happens when deleting or updating a record. But when you use a foreign key, you are enforcing the relation, and the DB won't allow you to delete records covered by a foreign key constraint. That is exactly what you want when the related records shouldn't be deleted, though situations do occur where you need to perform that kind of task.
-The second way is to manage things on the DB side. If you have 1-to-many or many-to-many relations in the database, foreign keys will be very useful. They also offer some useful referential actions - RESTRICT, CASCADE, SET NULL, NO ACTION - that can do some of the work for you.
Related
I have this MySQL table, where contact_id is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, the server will receive a new list of contacts for a specific user_id and needs to update this table, inserting, deleting, or updating whatever differs from the previous information.
For example, the current data is:
Then the server receives this data:
And the new data is:
As you can see, the first row (John) was updated, the second row (Mary) was deleted, and another row (Jeniffer) was included.
Today what I am doing is deleting all rows with a specific user_id, and inserting the new data. But the autoincrement field (hist_id) is getting bigger and bigger...
Note: the table has about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop over the new data, selecting each user_id / contact_id pair and comparing values to decide what to update?
P.S. By better approach I mean the most efficient way.
Thank you so much for any help!
In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of 'upsert', which is short for 'insert on duplicate key update'. What this means is that if the key pair you're inserting already exists, the specified columns are updated with the specified data. You then shorten your logic and cut down on auto-increment growth. Here's an example using your table structure that should work. This also assumes that you have made the (user_id, contact_id) pair unique.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query should retain the contact_id but overwrite the preexisting data with the new data.
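If that unique pair isn't in place yet, it can be added with a single statement (the key name uq_user_contact is an arbitrary choice):
ALTER TABLE history
ADD UNIQUE KEY uq_user_contact (user_id, contact_id);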
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto-incremented values. I could have a varchar field as the primary key containing the names of fruits and vegetables. Is this optimized for performance? Probably not. There are many situations that call for auto increment, and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and re-think how you wish to store and access the data. Do you want to write more logic to control the data, OR do you want the data to flow naturally by itself? You've made a history table that, at first glance, functions more like a hybrid many-to-one crosswalk. Without looking at the remaining table structure, I can't say on a whim that it's a bad idea. What I can say is that I would do this a bit differently. I will answer this more specifically in the next question.
Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL, and it does the job well. Sometimes we may find ourselves in a situation where we must do this, either to extract data in a specific format or to repair data in some way; however, avoid doing it to insert or update data. It can negatively impact performance, and you will likely paint yourself into a corner.
Back to what I said toward the end of your second question, which will help you see what I am talking about. I am going to assume that user_id is an auto-incremented primary key in your user table. I will do some guesstimation here and show you an example of how you could redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationships between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
(source: xonos.net)
Here's the SQL so that you can look at it more closely.
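Here's a minimal sketch of the kind of structure I mean (all table and column names are illustrative assumptions, not the exact schema from the model):
CREATE TABLE person (
    person_id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50)
) ENGINE=InnoDB;
-- A person can be a user of the system...
CREATE TABLE user (
    user_id INT AUTO_INCREMENT PRIMARY KEY,
    person_id INT NOT NULL,
    FOREIGN KEY (person_id) REFERENCES person (person_id)
        ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;
-- ...and/or a contact, without duplicating the name data.
CREATE TABLE contact (
    contact_id INT AUTO_INCREMENT PRIMARY KEY,
    person_id INT NOT NULL,
    FOREIGN KEY (person_id) REFERENCES person (person_id)
        ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;
-- Phone numbers hang off the person, so users and contacts share them.
CREATE TABLE phone (
    phone_id INT AUTO_INCREMENT PRIMARY KEY,
    person_id INT NOT NULL,
    phone VARCHAR(30),
    FOREIGN KEY (person_id) REFERENCES person (person_id)
        ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;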
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone; in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But "Jim Bean" happens to own a side business and does contract work for the company at the same time. So he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean containing the same data, which could quickly become outdated or incorrect.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him - as if he never existed. All we have to do is delete Jim Bean from the person table. That's it. Since the foreign relationships have "CASCADE" on update/delete, this automatically propagates and clears out the records related to him in the other tables.
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.
I have two tables in MySQL. When I insert/delete values in the first table, I want the values to be duplicated in table2 to keep them "aligned".
table1:
id - username
1 - test_user
table2:
Same id as table1 and username as table1 (on insert/delete)
I want to keep the data between the tables aligned without doing multiple queries. I've read about triggers, but I'm not sure if that's the correct road; I am a beginner.
I said two tables, but I will need to do this across multiple tables.
You can use MySQL triggers. This way you can automatically insert/update/delete data in the second table.
MySQL: Using Triggers
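A minimal sketch of such a trigger for the insert case (the trigger name is arbitrary):
CREATE TRIGGER table1_after_insert
AFTER INSERT ON table1
FOR EACH ROW
    INSERT INTO table2 (id, username)
    VALUES (NEW.id, NEW.username);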
When you INSERT new records, given that you don't want to do two inserts for some reason, using a trigger to insert into the second table will work. For UPDATE and DELETE you might want to look at the CASCADE option with foreign keys. If all you are doing is keeping the data consistent between tables, that's exactly what cascade is for.
When you create table2 you just add a foreign key like this:
FOREIGN KEY (id, username)
REFERENCES table1(id, username) ON UPDATE CASCADE ON DELETE CASCADE
Then whenever you alter table1 the changes will automatically get pushed through to table2.
A couple of prerequisites for this to work:
You have to use a storage engine that supports foreign keys, such as InnoDB, not MyISAM
You need to have an index on (id, username) in table1; the foreign key needs to match a key in the parent table
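For that second prerequisite, a one-statement sketch (the index name idx_id_username is an arbitrary choice):
ALTER TABLE table1 ADD INDEX idx_id_username (id, username);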
You should read the doc page for foreign keys. There are a couple other ways you can tweak them, and you should figure out what works best for your purposes.
You can certainly put triggers on your table1 to make parallel changes to your other tables as your application changes table1.
See here for the documentation: http://dev.mysql.com/doc/refman/5.0/en/trigger-syntax.html
But, you should think over your design. It will take multiple queries to do your inserts and updates; they'll just be done "behind your back" on the server. They'll still take time. Triggers can really slow things down.
Also, triggers are a little bit fragile. If you add a column to a table, you'll have to rework your triggers. Triggers are generally a pain in the neck to keep in a source-control system and a huge pain in the neck to test, so using them will make your application more troublesome to maintain.
Could you think of another approach to handling this need for duplication? Could you, for example, use a view or a join to present the data you need to your application program without actually duplicating tables and the rows in them? If you figure out how to do that you'll be much happier in the long run.
CREATE VIEW table2 AS
SELECT *
FROM table1;
will produce a "fake" table2 with the contents of table1.
Or if you're hoping to view only the test users in a second table, a view can do that for you too, for example:
CREATE VIEW table3 AS
SELECT *
FROM table1
WHERE usertype = 'test_user' ;
If you're using duplicate tables for "backup," that's a bad way to make sure your information is safe. Instead, you need to back up your MySQL server instance.
Formal relational database design principles teach us to avoid duplicating data, and instead use views and joins to structure the data the way applications need to see it.
I am trying to create a site where users can register and create a profile, so I am using two MySQL tables within a database, e.g. users and user_profile.
The users table has an auto increment primary key called user_id.
The user_profile table has the same primary key, called user_id; however, it is not auto increment.
*see note for why I have multiple tables.
When a user signs up, data from the registration form is inserted into users, then the last_insert_id() is input into the user_id field of the user_profile table. I use transactions to ensure this always happens.
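In other words, something like this (a sketch of the pattern; columns other than user_id are just placeholders, and InnoDB is assumed so the transaction actually takes effect):
START TRANSACTION;
INSERT INTO users (email, password_hash)
VALUES ('jane@example.com', '...');
INSERT INTO user_profile (user_id, bio)
VALUES (LAST_INSERT_ID(), 'Hello there');
COMMIT;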
My question is, is this bad practice?
Should I have a unique auto increment primary key for the user_profile table, even though one user can only ever have one profile?
Maybe there are other downsides to creating a database like this?
I'd appreciate it if anyone can explain why this is a problem or why it's fine; I'd like to make sure my database is as efficient as possible.
Note: I am using separate tables for users and user_profile because user_profile contains fields that are potentially null and will also be requested much more often than the users table, since its data is displayed on a public profile.
Maybe this is also bad practice and they should be lumped into one table?
I find this a good approach; I'd give bonus points if you use a foreign key relation, and preferably cascade when deleting the user from the users table.
As to separating the core user data in one table and the optional profile data in another - good job. Nothing is more annoying than a 50-field dragonish entry with 90% empty values.
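A minimal sketch of that foreign key (assuming InnoDB and the column names from the question):
ALTER TABLE user_profile
ADD CONSTRAINT fk_user_profile_user
    FOREIGN KEY (user_id) REFERENCES users (user_id)
    ON DELETE CASCADE;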
It is generally frowned upon, but as long as you can provide the reasoning for the 1-to-1 relationship, I'm sure it is fine.
I have used them when I have hundreds of columns (and it would be more logical to split them out into separate tables), or when I need a thinner table to speed up full scans.
In your case I would use a single table and create a couple of views.
see: http://dev.mysql.com/doc/refman/5.0/en/create-view.html
In general, a single-table approach is more logical, quicker, simpler, and uses less space.
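As a sketch of the single-table-plus-views idea (the column names here are made up for illustration):
CREATE VIEW user_auth AS
SELECT user_id, username, password_hash
FROM users;
CREATE VIEW user_public_profile AS
SELECT user_id, display_name, bio
FROM users;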
I don't think it's a bad practice. Sometimes it's quite useful, especially if you want one class to deal with authentication without loading all the profile data. You can then modify how your authentication works, build web services, and so on, with little concern about maintaining the data structures for profile information, which is likely to change as your project evolves.
This is very good practice.
It's right at the core of writing good, modular, normalised relational database structures.
This is for a sort of proof-of-concept draft to get things working, but I don't want to have completely crap code. For my database, I tried to get true foreign key relations going using InnoDB, but couldn't get it working.
Instead of using foreign keys, I decided to just pull mysql_insert_id() after inserts, saving it as a variable, then putting that variable into the related table.
Is this horrible? Everything seems to work well, and I'm able to connect and relate IDs as needed. What benefits would using foreign keys give me over my method (besides cascading updates/deletes)?
To create a relation (master->detail), you always have to supply the keys yourself, whether using mysql_insert_id(), natural keys, or keys generated by your application. A FOREIGN KEY is not going to do that work for you.
What a FOREIGN KEY does is:
Help you enforce the relationship and the integrity of your data (so a "detail" record cannot point to an invalid parent)
Handle deletion or key alterations of master records (ON DELETE ..., ON UPDATE ...)
Create an index in your "detail" table on the "master_id" column if one doesn't exist yet (okay, you could also do that without a FOREIGN KEY)
Serve a documenting purpose; for example, an ERM tool could reverse-engineer the relationship model from your schema (okay, this point is a slight long shot)
The cost of adding the FOREIGN KEY constraint statement is small compared to its benefits.
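A quick sketch of that first point (table and column names are illustrative):
CREATE TABLE master (
    master_id INT AUTO_INCREMENT PRIMARY KEY
) ENGINE=InnoDB;
CREATE TABLE detail (
    detail_id INT AUTO_INCREMENT PRIMARY KEY,
    master_id INT NOT NULL,
    FOREIGN KEY (master_id) REFERENCES master (master_id)
        ON DELETE CASCADE
) ENGINE=InnoDB;
-- Fails with a foreign key error instead of silently creating an
-- orphaned row, because master 999 does not exist:
INSERT INTO detail (master_id) VALUES (999);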
Referring to this question, I've decided to duplicate the tables every year, creating tables that hold the data of one year, for example:
orders_2008
orders_2009
orders_2010
etc...
Well, I know the speed problem could probably be solved with just two tables for each element, such as orders_history and orders_actual, but I thought that once the handler code has been written, there will be no difference... just many tables.
Those tables will even have some children with foreign keys;
for example, orders_2008 will have the child items_2008:
CREATE TABLE orders_2008 (
id serial NOT NULL,
code character(5),
customer text
);
ALTER TABLE ONLY orders_2008
ADD CONSTRAINT orders_2008_pkey PRIMARY KEY (id);
CREATE TABLE items_2008 (
id serial NOT NULL,
order_id integer,
item_name text,
price money
);
ALTER TABLE ONLY items_2008
ADD CONSTRAINT items_2008_pkey PRIMARY KEY (id);
ALTER TABLE ONLY items_2008
ADD CONSTRAINT "$1" FOREIGN KEY (order_id) REFERENCES orders_2008(id) ON DELETE CASCADE;
So, my problem is: what do you think is the best way to replicate those tables every 1st of January while, of course, keeping the table dependencies?
A PHP/Python script that, query after query, rebuilds the structure for the new year (called by a cron job)?
Can PostgreSQL functions be used in that way?
If yes, how (a little example would be nice)?
Actually I'm going for the first way (a .sql file containing the structure, and a PHP/Python script loaded by a cron job that rebuilds the structure), but I'm wondering if this is the best way.
Edit: I've seen the pgsql statement CREATE TABLE ... LIKE, but the foreign keys must be added afterwards... otherwise the new tables will keep referencing the old ones.
PostgreSQL has a feature that lets you create a table that inherits fields from another table. The documentation can be found in their manual. That might simplify your process a bit.
You should look at Partitioning in Postgresql. It's the standard way of doing what you want to do. It uses inheritance as John Downey suggested.
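A minimal sketch of the inheritance-based approach (assuming an order_date column is added to drive the yearly split; note that primary keys, indexes and foreign key constraints are not inherited, so they must be created on each child table):
CREATE TABLE orders (
    id serial NOT NULL PRIMARY KEY,
    code character(5),
    customer text,
    order_date date NOT NULL
);
-- Child table holding only 2008 rows; it inherits all columns from orders.
CREATE TABLE orders_2008 (
    CHECK (order_date >= DATE '2008-01-01' AND order_date < DATE '2009-01-01')
) INHERITS (orders);
-- Each new year then needs only one such statement:
CREATE TABLE orders_2009 (
    CHECK (order_date >= DATE '2009-01-01' AND order_date < DATE '2010-01-01')
) INHERITS (orders);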
Very bad idea.
Have a look at partitioning, and keep your eyes on your real goal:
You don't want table sets for every year - that is not actually your problem. Lots of systems work perfectly well without them :)
You want to solve some performance and/or storage space issues.
I'd recommend orders and orders_history... just periodically roll the old orders into the history, which is now a read-only dataset, so you can add an index to cater for every single query you require, and it should (if your data structures are half decent) remain performant.
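A sketch of that periodic roll-over (assuming an order_date column and a one-year cut-off; the transaction ensures a dropped connection can't leave the two tables inconsistent):
BEGIN;
INSERT INTO orders_history
SELECT * FROM orders
WHERE order_date < now() - interval '1 year';
DELETE FROM orders
WHERE order_date < now() - interval '1 year';
COMMIT;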
If your history table starts getting "too big", it's probably time to start thinking about data warehousing... which really is marvelous, but it certainly ain't cheap.
As others mentioned in your previous question, this is probably a bad idea. That said, if you are dead set on doing it this way, why not just create all the tables up front (say, 2008-2050)?