I am unable to figure out an efficient way to establish relationships between tables. I want to have a database of books, authors, publishers and the users that sign-up and have their bookshelves (Read, Currently Reading, Want to Read (or Plan to Read)). I want the users to be able to select which books they've read, want to read or are currently reading.
P.S. I am aware of PKs and FKs in database table relations.
Edit: maybe this is a better way of doing it:
Then I shall use "Status" = (Read, Plan to Read and Currently Reading) - please tell me if this is good and efficient!
You'll need an N:M link between books and authors, since a book might have multiple authors and each author might have written more than one book. In an RDBMS that means you'll need a written_by table.
The link between books and publishers, however, is different. Any given book can have only one publisher (unless in your system different editions of a book are considered the same book). So all you need here is a publisher_id foreign key in books.
Lastly, and most importantly, you're looking at the readers/users and their relation to books. Naturally, this is also an N:M relation. I sure hope that people read more than one book (we all know what happens if you only ever read one...) and surely a book is read by more than one person. That calls for a book_users connection table. The real question here is how to design it. There are three basic designs.
Separate tables by type of relation (as outlined by @just_somebody). Advantages: You only have INSERTs and DELETEs, never UPDATEs. Disadvantages: While this looks kind of neat, and somewhat helps with query optimization, most of the time it serves no actual purpose other than showing off a big database chart.
One table with a status indicator (as outlined by @Hardcoded). Advantages: You only have one table. Disadvantages: You'll have INSERTs, UPDATEs and DELETEs - something an RDBMS can easily handle, but which has its flaws for various reasons (more on that later). Also, a single status field implies that one reader can have only one connection to a given book at any time, meaning they could only be in the plan_to_read, is_reading or has_read status at any point, and it assumes an order in which these happen. If that person ever planned to read the book again, or paused, then reread it from the beginning, etc., such a simple series of status indicators can easily fail, because all of a sudden that person is_reading now, but also has_read the thing. For most applications this is still a reasonable approach, and there are usually ways to design status fields so they are mutually exclusive.
A log. You INSERT every status as a new row in a table - the same combination of book and reader will appear more than once. You INSERT the first row with plan_to_read, and a timestamp. Another one with is_reading. Then another one with has_read. Advantages: You will only ever have to INSERT rows, and you get a neat chronology of things that happened. Disadvantages: Cross table joins now have to deal with a lot more data (and be more complex) than in the simpler approaches above.
You may ask yourself: why the emphasis on whether you INSERT, UPDATE or DELETE in which scenario? In short, whenever you run an UPDATE or DELETE statement, you are very likely to in fact lose data. At that point you need to stop in your design process and think, "What am I losing here?" In this case, you lose the chronological order of events. If what users are doing with their books is the center of your application, you might very well want to gather as much data as you can. Even if it doesn't matter right now, this is the type of data which might allow you to do "magic" later on. You could find out how fast somebody reads, how many attempts they need to finish a book, etc. All that without asking the user for any extra input.
So, my final answer is actually a question:
Would it be helpful to tell someone how many books they read last year?
Edit
Since it might not be clear what a log would look like, and how it would function, here's an example of such a table:
CREATE TABLE users_reading_log (
    user_id INT,
    book_id INT,
    status ENUM('plans_to_read', 'is_reading', 'has_read'),
    ts TIMESTAMP DEFAULT NOW()
);
Now, instead of updating the "user_read" table in your proposed schema whenever the status of a book changes, you INSERT that same data into the log, which fills up with a chronology of information:
INSERT INTO users_reading_log SET
user_id=1,
book_id=1,
status='plans_to_read';
When that person actually starts reading, you do another insert:
INSERT INTO users_reading_log SET
user_id=1,
book_id=1,
status='is_reading';
and so on. Now you have a database of "events", and since the timestamp column fills itself automatically, you can tell what happened when. Please note that this system does not ensure that only one 'is_reading' row exists per user-book pair. Somebody might stop reading and later continue. Your joins will have to account for that.
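For instance, here is a sketch (one way, not the only one) of how a join can pull each reader's most recent status per book out of this log, using the groupwise-maximum pattern; ties on ts would need an extra tiebreaker:
SELECT l.user_id, l.book_id, l.status, l.ts
FROM users_reading_log l
JOIN (
    -- newest log entry per (user, book)
    SELECT user_id, book_id, MAX(ts) AS max_ts
    FROM users_reading_log
    GROUP BY user_id, book_id
) latest ON latest.user_id = l.user_id
        AND latest.book_id = l.book_id
        AND latest.max_ts  = l.ts;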
A database table is a mathematical relation, in other words a predicate and a set of tuples ("rows") for which that predicate is true. That means each "row" in a "table" is a (true) proposition.
This may all look scary, but the basic principles are really simple and worth knowing and applying rigorously: you'll better know what you're doing.
Relations are simple if you start small, with the binary relation. For example, there's a binary relation > (greater than) on the set of all integers which "contains" all ordered pairs of integers x, y for which the predicate x > y holds true. Note: you would not want to materialize this specific relation as a database table. :)
You want Books, Authors, Publishers and Users with their bookshelves (Read, Currently Reading, Want to Read). What are the predicates in that? "User U has read book B", "user U is reading book B", "user U wants to read book B" would be some of them; "book B has ISBN# I, title T, author A" would be another, but some books have multiple authors. In that case, you'll do well to split it out into a separate predicate: "book B was written by author A".
CREATE TABLE book (
id INT NOT NULL PRIMARY KEY
);
CREATE TABLE author (
id INT NOT NULL PRIMARY KEY
, name TEXT NOT NULL
);
CREATE TABLE written_by (
book INT NOT NULL REFERENCES book (id)
, author INT NOT NULL REFERENCES author (id)
);
CREATE TABLE reader (
id INT NOT NULL PRIMARY KEY
);
CREATE TABLE has_read (
reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
CREATE TABLE is_reading (
reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
CREATE TABLE plans_reading (
reader INT NOT NULL REFERENCES reader (id)
, book INT NOT NULL REFERENCES book (id)
);
etc etc.
Edit: for further reading, see C. J. Date's An Introduction to Database Systems.
If I were you, I'd use a schema much like the following:
TABLE user
-- Stores user's basic info.
( user_id INTEGER PRIMARY KEY
, username VARCHAR(50) NOT NULL
, password VARCHAR(50) NOT NULL
, ...
, ...
, ...
);
TABLE author
-- Stores author's basic info
( author_id INTEGER PRIMARY KEY
, author_name VARCHAR(50)
, date_of_birth DATE
, ...
, ...
, ...
);
TABLE publisher
-- Stores publisher's basic info
( publisher_id INTEGER PRIMARY KEY
, publisher_name VARCHAR(50)
, ...
, ...
, ...
);
TABLE book
-- Stores book info
( book_id INTEGER PRIMARY KEY
, title VARCHAR(50) NOT NULL
, author_id INTEGER NOT NULL
, publisher_id INTEGER NOT NULL
, published_dt DATE
, ...
, ...
, ...
, FOREIGN KEY (author_id) REFERENCES author(author_id)
, FOREIGN KEY (publisher_id) REFERENCES publisher(publisher_id)
);
TABLE common_lookup
-- This table stores common values that are used in various select lists.
-- The first three values are going to be
-- a - Read
-- b - Currently reading
-- c - Want to read
( element_id INTEGER PRIMARY KEY
, element_value VARCHAR(2000) NOT NULL
);
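For completeness, a sketch of seeding those first three values; the numeric element_id values here are assumptions, not prescribed:
INSERT INTO common_lookup (element_id, element_value)
VALUES (1, 'Read')
     , (2, 'Currently reading')
     , (3, 'Want to read');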
TABLE user_books
-- This table records which user has read / is reading / wants to read which book.
-- There is a many-to-many relationship between users and books.
-- One user may read many books and one single book can be read by many users.
-- Hence we use this table to maintain that information.
( user_id INTEGER NOT NULL
, book_id INTEGER NOT NULL
, status_id INTEGER NOT NULL
, ...
, ...
, ...
, FOREIGN KEY (user_id) REFERENCES user(user_id)
, FOREIGN KEY (book_id) REFERENCES book(book_id)
, FOREIGN KEY (status_id) REFERENCES common_lookup(element_id)
);
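To show how the pieces join up, here is a sketch of pulling one user's "Currently reading" shelf with the tables defined above (the user_id value is just an example):
SELECT b.title, a.author_name, p.publisher_name
FROM user_books ub
JOIN book b ON b.book_id = ub.book_id
JOIN author a ON a.author_id = b.author_id
JOIN publisher p ON p.publisher_id = b.publisher_id
JOIN common_lookup cl ON cl.element_id = ub.status_id
WHERE ub.user_id = 1
  AND cl.element_value = 'Currently reading';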
TABLE audit_entry_log
-- This is an audit entry log table where you can track changes and log them here.
( audit_entry_log_id INTEGER PRIMARY KEY
, audit_entry_type VARCHAR(10) NOT NULL
-- Stores the entry type or DML event - INSERT, UPDATE or DELETE.
, table_name VARCHAR(30)
-- Stores the name of the table which got changed
, column_name VARCHAR(30)
-- Stores the name of the column which was changed
, primary_key INTEGER
-- Stores the PK column value of the row which was changed.
-- This is to uniquely identify the row which has been changed.
, ts TIMESTAMP
-- Timestamp when the change was made.
, old_number DECIMAL(36, 2)
-- If the changed field was a number, the old value should be stored here.
-- If it's an INSERT event, this would be null.
, new_number DECIMAL(36, 2)
-- If the changed field was a number, the new value should be stored here.
-- If it's a DELETE statement, this would be null.
, old_text VARCHAR(2000)
-- Similar to old_number but for a text/varchar field.
, new_text VARCHAR(2000)
-- Similar to new_number but for a text/varchar field.
, old_date DATE
-- Similar to old_number but for a date field.
, new_date DATE
-- Similar to new_number but for a date field.
, ...
, ... -- Any other data types you wish to include.
, ...
);
I would then create triggers on a few tables that would track changes and enter data in the audit_entry_log table.
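As a sketch of what one such trigger might look like, here is one that logs status changes on user_books; it assumes audit_entry_log_id is AUTO_INCREMENT, and the exact column mapping is my assumption, not prescribed:
DELIMITER //
CREATE TRIGGER user_books_audit_upd AFTER UPDATE ON user_books
FOR EACH ROW
BEGIN
    IF NEW.status_id <> OLD.status_id THEN
        INSERT INTO audit_entry_log
            (audit_entry_type, table_name, column_name, primary_key, ts, old_number, new_number)
        VALUES
            ('UPDATE', 'user_books', 'status_id', NEW.book_id, NOW(), OLD.status_id, NEW.status_id);
    END IF;
END //
DELIMITER ;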
First of all, create 4 tables for the books, authors, publishers and users. Then:
Create a table books_authors which has a relationship with the books and authors tables.
Create a table books_publishers which has a relationship with the books and publishers tables.
Create a table books_users which has a relationship with the books and users tables. In this table, also use a flag to show each book's status for the user: Read, Currently Reading, or Want to Read (Plan to Read).
This is just a mock-up; try it.
I would have a Books table, containing: title, author, publisher, isbn. A Book_Statuses table, containing an id (PK) and a status (Read, Reading, etc.). A third table, user_books, in which there would be a fk_book_id related to the Books table, a fk_user_id for the user, and a fk_status_id linked to the Book_Statuses table.
All this together gives you an easily accessible data structure.
This is all assuming I understand your question, i.e. that you want tables for authors, publishers and books. Otherwise, I'd need clarification on your needs.
Your answer is the best way to do this. For example, suppose you have books and categories tables, and a book can fit more than one category. The best way to keep this data is to create a third table to hold book-category relations. Otherwise you would have to create a column for every category:
ID | name     | comedy | adventure | etc
5  | BookName | yes    | no        | no
like this. This is the worst thing you could do, believe me. Your solution is the best way to do it.
Also, don't be afraid of PKs & FKs in database table relations. If you use them well, it will be faster and safer than doing their work manually.
I have two tables (InnoDB) in a MySQL database that share a similar column, account_no. I want to keep both columns as integers and keep both free from collisions when inserting data.
There are 13 instances of this same question on Stack Overflow; I have read them all. In all of them, the recommended solutions were:
1) Using a GUID: this is good, but I am trying to keep the numbers short and easy for the users to remember.
2) Using a sequence: I do not fully understand how to do this, but I am thinking it involves making a third table that has an auto_increment and getting the values for the two major tables from it.
3) Using IDENTITY (1, 10) [1,11,21...] for the first table and IDENTITY (2, 10) [2,12,22...] for the second: this works fine, but in the long term might not be such a good idea.
4) Using the PHP function uniqid(,TRUE): not going to work; it is not completely collision-free, and the columns in my case have to be integers.
5) Using the PHP function mt_rand(0,10): might work, but I would still have to check for collisions before inserting data.
If there is no smarter way to achieve my goal, I will stick with the adjusted IDENTITY (1, 10) and (2, 10).
I know this question seems a bit redundant, seeing all the options I have available, but the most recent answer on a similar topic was from 2012; there may have been improvements in MySQL since then that I do not know about yet.
Also, I am using PHP to insert the data. Thanks.
Basically, you are saying that you have two flavors of an entity. My first recommendation is to try to put them in a single table. There are three methods:
If most columns overlap, just put all the columns in a single table (accounts).
If one entity has more columns, put the common columns in one table and have a second table for the wider entity.
If only some columns overlap, put those in a single table and have a separate table for each subentity.
Let me assume the third situation for the moment.
You want to define something like:
create table accounts (
AccountId int auto_increment primary key,
. . . -- you can still have common columns here
);
create table subaccount_1 (
AccountId int primary key,
constraint foreign key (AccountId) references accounts(AccountId),
. . .
);
create table subaccount_2 (
AccountId int primary key,
constraint foreign key (AccountId) references accounts(AccountId),
. . .
);
Then, you want an insert trigger on each sub-account table (a sketch follows below the list). This trigger does the following on insert:
inserts a row into accounts
captures the new accountId
uses that for the insert into the subaccount table
You probably also want something on accounts that prevents inserts into that table, except through the subaccount tables.
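A minimal sketch of such a trigger in MySQL; the empty INSERT assumes accounts has no other required columns, so adjust it if yours does:
DELIMITER //
CREATE TRIGGER subaccount_1_bi BEFORE INSERT ON subaccount_1
FOR EACH ROW
BEGIN
    INSERT INTO accounts () VALUES ();      -- claim the next shared id
    SET NEW.AccountId = LAST_INSERT_ID();   -- use it for this subaccount row
END //
DELIMITER ;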
A big thank you to Gordon Linoff for his answer. I want to fully explain how I solved the problem using his answer, to help others understand better.
Original tables:
Table A (account_no, first_name, last_name)
Table B (account_no, likes, dislikes)
Problem: account_no needs to auto-increment across both tables, be unique across both tables, and remain a medium positive integer (see original question).
I had to make an extra Table_C which holds all the inserted data at first, auto-increments it, and checks for collisions through the use of the primary key:
CREATE TABLE Table_C (
account_no int NOT NULL AUTO_INCREMENT,
first_name varchar(50),
last_name varchar(50),
likes varchar(50),
dislikes varchar(50),
which_table varchar(1),
PRIMARY KEY (account_no)
);
Then I changed my MySQL INSERT statement to insert into Table_C, with an extra column which_table saying which table the inserted data belongs to. On insert, Table_C performs the auto-increment and collision check, then re-inserts the data into the desired table via a trigger, like so:
DELIMITER //
CREATE TRIGGER `sort_tables` AFTER INSERT ON `Table_C` FOR EACH ROW
BEGIN
    IF new.which_table = 'A' THEN
        INSERT INTO Table_A
        VALUES (new.account_no, new.first_name, new.last_name);
    ELSEIF new.which_table = 'B' THEN
        INSERT INTO Table_B
        VALUES (new.account_no, new.likes, new.dislikes);
    END IF;
END //
DELIMITER ;
I have these tables:
create table person (
person_id int unsigned auto_increment,
person_key varchar(40) not null,
primary key (person_id),
constraint uc_person_key unique (person_key)
)
-- person_key is a varchar(40) that identifies an individual, unique
-- person in the initial data that is imported from a CSV file to this table
create table marathon (
marathon_id int unsigned auto_increment,
marathon_name varchar(60) not null,
primary key (marathon_id)
)
create table person_marathon (
person_marathon_id int unsigned auto_increment,
person_id int unsigned,
marathon_id int unsigned,
primary key (person_marathon_id),
foreign key (person_id) references person (person_id),
foreign key (marathon_id) references marathon (marathon_id),
constraint uc_marathon_person unique (person_id, marathon_id)
)
Person table is populated by a CSV that contains about 130,000 rows. This CSV contains a unique varchar(40) for each person and some other person data. There is no ID in the CSV.
For each marathon, I get a CSV that contains a list of 1k - 30k persons. The CSV contains essentially just a list of person_key values that show which people participated in that specific marathon.
What is the best way to import the data into the person_marathon table to maintain the FK relationship?
These are the ideas I can currently think of:
Pull the person_id + person_key information out of MySQL and merge the person_marathon data in PHP to get the person_id in there before inserting into the person_marathon table
Use a temporary table for insert... but this is for work and I have been asked to never use temporary tables in this specific database
Don't use a person_id at all and just use the person_key field but then I would have to join on a varchar(40) and that's usually not a good thing
Or, for the insert, make it look something like this:
insert into person_marathon
select p.person_id, m.marathon_id
from ( select 'person_a' as p_name, 'marathon_a' as m_name union
select 'person_b' as p_name, 'marathon_a' as m_name )
as imported_marathon_person_list
join person p
on p.person_key = imported_marathon_person_list.p_name
join marathon m
on m.marathon_name = imported_marathon_person_list.m_name
The problem with that insert is that to build it in PHP, the imported_marathon_person_list would be huge because it could easily be 30,000 select union items. I'm not sure how else to do it, though.
I've dealt with similar data conversion problems, though at a smaller scale. If I'm understanding your problem correctly (which I'm not sure of), it sounds like the detail that makes your situation challenging is this: you're trying to do two things in the same step:
import a large number of rows from CSV into mysql, and
do a transformation such that the person-marathon associations work through person_id and marathon_id, rather than the (unwieldy and undesirable) varchar personkey column.
In a nutshell, I would do everything possible to avoid doing both of these things in the same step. Break it into those two steps - import all the data first, in tolerable form, and optimize it later. Mysql is a good environment to do this sort of transformation, because as you import the data into the persons and marathons tables, the IDs are set up for you.
Step 1: Importing the data
I find data conversions easier to perform in a mysql environment than outside of it. So get the data into mysql, in a form that preserves the person-marathon associations even if it isn't optimal, and worry about changing the association approach afterwards.
You mention temp tables, but I don't think you need any. Set up a temporary column, personkey, on the persons_marathons table. When you import all the associations, you'll leave person_id blank for now and just import personkey. Importantly, ensure that personkey is an indexed column both on the associations table and on the persons table. Then you can go through later and fill in the correct person_id for each personkey, without worrying about MySQL being inefficient.
I'm not clear on the nature of the marathons table data. Do you have thousands of marathons to enter? If so, I don't envy you the work of handling 1 spreadsheet per marathon. But if it's fewer, then you can perhaps set up the marathons table by hand. Let mysql generate marathon IDs for you. Then as you import the person_marathon CSV for each marathon, be sure to specify that marathon ID in each association relevant to that marathon.
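As a sketch, step 1 for one marathon's CSV could look like this with LOAD DATA; the file name, layout, and marathon id 42 are assumptions for illustration:
LOAD DATA LOCAL INFILE 'marathon_42_participants.csv'
INTO TABLE persons_marathons
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(personkey)
SET marathon_id = 42;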
Once you're done importing the data, you have three tables:
* persons - you have the ugly personkey, as well as a newly generated person_id, plus any other fields
* marathons - you should have a marathon_id at this point, right? either newly generated, or a number you've carried over from some older system.
* persons_marathons - this table should have marathon_id filled in & pointing to the correct row in the marathons table, right? You also have personkey (ugly but present) and person_id (which is still null).
Step 2: Use personkey to fill in person_id for each row in the association table
Then you either use straight Mysql, or write a simple PHP script, to fill in person_id for each row in the persons_marathons table. If I'm having trouble getting mysql to do this directly, I'll often write a php script to deal with a single row at a time. The steps in this would be simple:
look up any 1 row where person_id is null but personkey is not null
look up that personkey's person_id
write that person_id in the associations table for that row
You can tell PHP to repeat this 100 times then end the script, or 1,000 times, if you keep getting timeout problems or anything like that.
This transformation involves a huge number of lookups, but each lookup only needs to be for a single row. That's appealing because at no point do you need to ask mysql (or PHP) to "hold the whole dataset in its head".
At this point, your associations table should have person_id filled in for every row. It's now safe to delete the personkey column, and voila, you have your efficient foreign keys.
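If MySQL will cooperate and do it directly, the whole backfill is a single multi-table UPDATE; a sketch using the names above:
UPDATE persons_marathons pm
JOIN persons p ON p.personkey = pm.personkey
SET pm.person_id = p.person_id
WHERE pm.person_id IS NULL;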
Weirdly, I have done a lot of development with MySQL and never encountered some of the things I have encountered today.
So, I have a user_items table
ID | name
---------
1 | test
I then have an item_data table
ID | item | added | info
-------------------------
1 | test | 12345 | important info
2 | test | 23456 | more recent important info
I then have an emails table
ID | added | email
------------------
1  | 12345 | old@b.com
2  | 23456 | a@b.com
3  | 23456 | b@c.com
and an emails_verified table
ID | email
-----------
1 | a@b.com
Now I appreciate the setup of these tables may not be efficient etc, but this cannot be changed, and is a lot more complex than it may seem.
What I want to do is as follows. I want to be able to search through a user's items and display the associated info, as well as any associated emails, as well as displaying whether each email has been verified.
user_items.name = item_data.item
item_data.added = emails.added
emails.email = emails_verified.email
So for user item 1, 'test', I want to be able to return its ID, its name, the most recent information, the most recent emails, and their verification status.
So I would like to return:
ID => 1
name => test
information => more recent important info
emails => array('0' => array('email' => 'a@b.com' , 'verified' => 'YES'),'1' => array('email' => 'b@c.com' , 'verified' => 'NO'))
Now I could do this with multiple queries with relative ease. My research, however, suggests that this is significantly more resource/time costly than using one (albeit very complex) MySQL query with loads of join statements.
The reason one query would also be useful (I believe) is that I can then add search functionality with relative ease, by adding complex WHERE statements to the query.
To complicate matters further, I am using CodeIgniter. I cannot be too picky :) so any non-CI answers would still be very useful.
The code I have got thus far is as follows. It is, however, very much in 'I'm not too sure what I'm doing' territory.
function test_search()
{
    $this->load->database();
    $this->db->select('user_items.*, item_data.*');
    $this->db->select('GROUP_CONCAT( emails.email SEPARATOR "," ) AS emails', FALSE);
    $this->db->select('GROUP_CONCAT( IF(emails_verified.email IS NULL, "NO", "YES") SEPARATOR "," ) AS emailed', FALSE);
    $this->db->where('user_items.name', 'test');
    $this->db->join('item_data', 'user_items.name = item_data.item', 'LEFT');
    $this->db->join('emails', 'item_data.added = emails.added', 'LEFT');
    $this->db->join('emails_verified', 'emails.email = emails_verified.email', 'LEFT');
    $this->db->group_by('user_items.name');
    $res = $this->db->get('user_items');
    print_r($res->result_array());
}
Any help with this would be very much appreciated.
This is really complex sql - is this really the best way to achieve this functionality?
Thanks
UPDATE
Following on from Cryode's excellent answer.
The only thing wrong with it is that it only returns one email. By using GROUP_CONCAT however I have been able to get all emails and all email_verified statuses into a string which I can then explode with PHP.
To clarify: is the subquery,
SELECT item, MAX(added) AS added
FROM item_data
GROUP BY item
essentially creating a temporary table?
Similar to that outlined here
Surely the subquery is necessary to make sure you only get one row from item_data - the most recent one?
And finally, to answer the notes about the poorly designed database:
The database was designed this way as item_data is changed regularly but we want to keep historical records.
The emails are part of the item data, but because there can be any number of emails and we wanted them to be searchable, we opted for a separate table. Otherwise the emails would have to be serialized within the item_data table.
The emails_verified table is separate, as an email can be associated with more than one item.
Given that, although (clearly) complicated for querying, it still seems a suitable setup...?
Thanks
FINAL UPDATE
Cryode's answer is a really useful one relating to database architecture in general.
Having conceptualised this a little more: if we store the version id in user_items, we don't need the subquery.
Because none of the data between versions is necessarily consistent, we will scrap his proposed items table (for this case).
We can then get the correct version from the item_data table.
We can also get the items_versions_emails rows based on the version id, and from those get the respective emails from our 'emails' table.
I.e. it works perfectly.
The downside of this is that when I add new version data in item_data, I have to update the user_items table with the new version that has been inserted.
This is fine, but simply as a generalized point: what is quicker?
I assume the reason such a setup has been suggested is that it is quicker: an extra update each time new data is added is worth it to save potentially hundreds of subqueries when lots of rows are being displayed, especially given that we display the data more than we update it.
Just for knowledge: when designing database architecture in future, does anyone have any links/general guidance on what is quicker and why, so that we can all build better-optimized databases?
Thanks again to Cryode!
Using your database structure, this is what I came up with:
SELECT ui.name, id.added, id.info, emails.email,
CASE WHEN ev.id IS NULL THEN 'NO' ELSE 'YES' END AS email_verified
FROM user_items AS ui
JOIN item_data AS id ON id.item = ui.name
JOIN (
SELECT item, MAX(added) AS added
FROM item_data
GROUP BY item
) AS id_b ON id_b.item = id.item AND id_b.added = id.added
JOIN emails ON emails.added = id.added
LEFT JOIN emails_verified AS ev ON ev.email = emails.email
But as others have pointed out, the database is poorly designed. This query will not perform well on a table with a lot of data, since there are no aggregate functions suited to this purpose. I understand that in certain situations you have little to no control over database design, but if you want to create the best situation, you should be emphatic with whoever can control it that it can be improved.
One of the biggest optimizations that could be made is to add the current item_data ID to the user_items table. That way the subquery to pull that wouldn't be necessary (since right now we're essentially joining item_data twice).
Converting this to CI's query builder is kind of a pain in the ass because of the sub query. Assuming you're only working with MySQL DBs, just stick with $this->db->query().
Added from your edit:
This query returns one email per row, it does not group them together. I left the CONCAT stuff out because it's one more thing that slows down your query -- your PHP can put the emails together afterwards much faster.
Yes, the subquery is that part: a query within a query (pretty self-explanatory name). I wouldn't call it creating a temporary table, because that's something you can actually do. It's more like retrieving a subset of the information in the table and using it kind of like a WHERE clause. The subquery is what finds the most recent row in your item_data table, since we have to figure it out ourselves (again, proper database design would eliminate this).
When we say you can optimize your database design, it doesn't mean you can't have it set up in a similar way. You made it sound like the DB could not be altered at all. You have the right idea as far as the overall scheme, you're just implementing it poorly.
Database Design
Here's how I would lay this out. Note that without knowing the whole extent of your project, this may need modification. It may also not be 100% the best-optimized schema on the planet -- I'm open to suggestions for improvement. Your mileage may vary.
User Items
CREATE TABLE `users_items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Defines the relationship between a base item and a user.
Items
CREATE TABLE `items` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`item_name` varchar(50) NOT NULL DEFAULT '',
`created_on` datetime NOT NULL,
`current_version` int(10) unsigned NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Your items table should have all of your items' base information -- things that will not change on a per-revision basis. Notice the current_version column -- this is where you'll store the ID from the versions table, indicating which is most recent (so we don't have to figure it out ourselves).
Items Versions (history)
CREATE TABLE `items_versions` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`item_id` int(10) unsigned NOT NULL,
`added` datetime NOT NULL,
`info` text,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Here is where you'd store the history of an item -- each update would create a new row here. Note that the item_id column is what ties this row to a particular base item.
Emails
CREATE TABLE `emails` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`email` varchar(100) NOT NULL DEFAULT '',
`verified` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Since emails can be shared between multiple items, we'll end up using what's called a many-to-many relationship: emails can be tied to multiple items, and an item can be tied to multiple emails. Here we define our emails and include a verified column for whether each has been verified or not.
Item Emails
CREATE TABLE `items_versions_emails` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`version_id` int(11) NOT NULL,
`email_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Assuming the emails are tied to an item version and not the base item, this is the structure you want. Unfortunately, if you have a ton of versions and never change the email(s), this will result in a lot of repeated data. So there's room for optimization here. If you tie emails to the base item, you'll have less repeated data, but you'll lose the history. So there's options for this. But the goal is to show how to set up DB relationships, not be 100% perfect.
That should give you a good start on how to better lay out your DB structure.
Another Update
Regarding speed, inserting a new item version and then updating the related item row with the new version ID will give you much better performance than requiring a subquery to pull the latest update. You'll notice in the solution for your original structure, the item_info table is being joined twice -- once to join the most recent rows, and again to grab the rest of the data from that recent row (because of the way GROUP BY works, we can't get it in a single join). If we have the recent version ID stored already, we don't need the first join at all, which will improve your speed dramatically (along with proper indexing, but that's another lesson).
I wouldn't recommend ditching the base items table, but that's really up to you and your application's needs. Without a base item, there's no real way to track the history of that particular item. There's nothing in the versions that shows a common ancestor/history, assuming you're removing the item_id column.
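To make the speed point concrete, here is a sketch of the lookup once current_version is stored; no subquery or double join is needed (names follow the schema above, and the user_id value is just an example):
SELECT ui.user_id, i.item_name, iv.added, iv.info
FROM users_items ui
JOIN items i ON i.id = ui.item_id
JOIN items_versions iv ON iv.id = i.current_version
WHERE ui.user_id = 1;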
In MySQL, is it possible to have a column in two different tables that auto-increments across both? Example: table1 has a column 'secondaryid' and table2 also has a column 'secondaryid'. Is it possible to have table1.secondaryid and table2.secondaryid draw from the same pool of values? For example, table1.secondaryid could hold the values 1, 2, 4, 6, 7, 8, etc. and table2.secondaryid could hold 3, 5, 9, 10? The reason for this is twofold: 1) the two tables will be referenced in a separate table of 'likes' (similar to users liking a page on Facebook), and 2) the data in table2 is a subset of table1 using a primary key, so the information housed in table2 is dependent on table1, as they are the topics of different categories (categories being table1 and topics being table2). Is it possible to do something like what I described, or is there some other structural workaround that I'm not aware of?
It seems you want to differentiate categories and topics in two separate tables, but have the ids of both of them be referenced in another table likes to facilitate users liking either a category or a topic.
What you can do is create a super-entity table with subtypes categories and topics. The auto-incremented key would be generated in the super-entity table and inserted into only one of the two subtype tables (based on whether it's a category or a topic).
The subtype tables reference this super-entity via the auto-incremented field in a 1:1 relationship.
This way, you can simply link the super-entity table to the likes table just based on one column (which can represent either a category or a topic), and no id in the subtype tables will be present in both.
Here is a simplified example of how you can model this out:
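Since the diagram itself doesn't carry over here, a sketch of the tables; the names and the choice of common fields are assumptions based on the description below:
CREATE TABLE superentity (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    title VARCHAR(100) NOT NULL,  -- common fields abstracted up from the subtypes
    url VARCHAR(255)
) ENGINE=InnoDB;
CREATE TABLE categories (
    id INT UNSIGNED NOT NULL PRIMARY KEY,  -- same value as superentity.id (1:1)
    FOREIGN KEY (id) REFERENCES superentity (id)
) ENGINE=InnoDB;
CREATE TABLE topics (
    id INT UNSIGNED NOT NULL PRIMARY KEY,  -- same value as superentity.id (1:1)
    category_id INT UNSIGNED NOT NULL,     -- a topic belongs to a category
    FOREIGN KEY (id) REFERENCES superentity (id),
    FOREIGN KEY (category_id) REFERENCES categories (id)
) ENGINE=InnoDB;
CREATE TABLE likes (
    user_id INT UNSIGNED NOT NULL,
    superentity_id INT UNSIGNED NOT NULL,  -- may point at a category or a topic
    PRIMARY KEY (user_id, superentity_id),
    FOREIGN KEY (superentity_id) REFERENCES superentity (id)
) ENGINE=InnoDB;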
This model would allow you to maintain the relationship between categories and topics, while having both entities generalized in the superentity table.
Another advantage to this model is you can abstract out common fields in the subtype tables into the superentity table. Say for example that categories and topics both contained the fields title and url: you could put these fields in the superentity table because they are common attributes of its subtypes. Only put fields which are specific to the subtype tables IN the subtype tables.
If you just want the IDs in the two tables to be different, you can initially set table2's AUTO_INCREMENT to some big number:
ALTER TABLE `table2` AUTO_INCREMENT=1000000000;
You can't have an auto_increment value shared between tables, but you can make it appear that it is:
SET @@auto_increment_increment = 2; -- change auto-increment to increase by 2
create table evens (
id int auto_increment primary key
);
alter table evens auto_increment = 0;
create table odds (
id int auto_increment primary key
);
alter table odds auto_increment = 1;
The downside to this is that you're changing a global setting, so ALL auto_inc fields will now be growing by 2 instead of 1.
It sounds like you want a MySQL equivalent of sequences, which can be found in DBMSs like PostgreSQL. There are a few known recipes for this, most of which involve creating table(s) that track the name of the sequence and an integer field that keeps its current value. This approach lets you query the sequence table and use the value on one or more tables, if necessary.
There's a post here that has an interesting approach to this problem. I have also seen this approach used in the now-obsolete DB PEAR module.
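A sketch of that recipe with a generic sequence table; the names here are my assumptions, not the referenced post's exact code:
CREATE TABLE sequences (
    name VARCHAR(30) NOT NULL PRIMARY KEY,
    current_value INT NOT NULL
);
INSERT INTO sequences VALUES ('account_no', 0);
-- Claim the next value atomically, then read it back in the same session:
UPDATE sequences
SET current_value = LAST_INSERT_ID(current_value + 1)
WHERE name = 'account_no';
SELECT LAST_INSERT_ID();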
You need to set the other table's AUTO_INCREMENT value manually, either from the client or inside MySQL via SQL:
ALTER TABLE users AUTO_INCREMENT = 3;
So after inserting into table1, you get back the last auto-increment id, then adjust the other table's AUTO_INCREMENT by that.
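MySQL won't accept a variable directly in ALTER TABLE, so one way to script that adjustment is a prepared statement; a sketch, with the table names following the question:
-- after a normal INSERT into table1:
SET @next = LAST_INSERT_ID() + 1;
SET @sql = CONCAT('ALTER TABLE table2 AUTO_INCREMENT = ', @next);
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;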
I'm confused by your question. If table 2 is a subset of table 1, why would you have it share the primary key values? Do you mean that the categories are split between table 1 and table 2?
If so, I would question the design choice of putting them into separate tables. It sounds like you have one of two different situations. The first is that you have a "category" entity that comes in two flavors. In this case, you should have a single category table, perhaps with a type column that specifies the type of category.
The second is that your users can "like" things that are different. In this case, the "user likes" table should have a separate foreign key for each object. You could pull off a trick using a composite foreign key, where you have the type of object and a regular numeric id afterwards. So, the like table would have "type" and "id". The person table would have a column filled with "PERSON" and another with the numeric id. And the join would say "on a.type = b.type and a.id = b.id". (Or the part on the "type" could be implicit, in the choice of the table).
You could do it with triggers:
-- see http://dev.mysql.com/doc/refman/5.0/en/information-functions.html#function_last-insert-id
CREATE TABLE sequence (id INT NOT NULL);
INSERT INTO sequence VALUES (0);
CREATE TABLE table1 (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
secondaryid INT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
CREATE TABLE table2 (
id INT UNSIGNED NOT NULL AUTO_INCREMENT,
secondaryid INT UNSIGNED NOT NULL DEFAULT 0,
PRIMARY KEY (id)
);
DROP TRIGGER IF EXISTS table1_before_insert;
DROP TRIGGER IF EXISTS table2_before_insert;
DELIMITER //
CREATE
TRIGGER table1_before_insert
BEFORE INSERT ON
table1
FOR EACH ROW
BEGIN
UPDATE sequence SET id=LAST_INSERT_ID(id+1);
SET NEW.secondaryid = LAST_INSERT_ID();
END;
//
CREATE
TRIGGER table2_before_insert
BEFORE INSERT ON
table2
FOR EACH ROW
BEGIN
UPDATE sequence SET id=LAST_INSERT_ID(id+1);
SET NEW.secondaryid = LAST_INSERT_ID();
END;
//
DELIMITER ;
Hello, I'm in the midst of creating a social networking site and I would like to know how to go about creating the relationships between users. Many sites say that I should create a relationship/friend table, but I'm looking to the future and believe this would be ineffective. This idea could be as popular as Facebook and I want to be ready for that many users. Facebook has 400 million users, so a friends table would be at least 150 times that. Doing a query for one's friends would be very slow, I would think. So would the solution be a separate table for each user containing their friends' IDs, or an associated CSV file containing the IDs? Any help would be greatly appreciated in the design of my site. Thanks.
Build the schema you need today, not the one you think you'll need 5 years from now.
Do you think facebook designed their schema to support 400 million users on day one? Of course not. Building for that kind of scale is complicated, expensive, and honestly, if you try it now, you'll probably get it wrong and have to redo it later anyway.
And let's be honest: you have a better chance of winning the lottery than hitting 400 million users any time soon. Even if you do, your project will have hundreds of engineers by then -- plenty of bandwidth for redesigning your schema.
Now's the time to build simple.
Edit to add some solid examples:
YouTube:
They went through a common evolution: a single server, then a single master with multiple read slaves, then a partitioned database, and finally they settled on a sharding approach.
Keep it simple! Simplicity allows you to rearchitect more quickly so you can respond to problems. It's true that nobody really knows what simplicity is, but if you aren't afraid to make changes then that's a good sign simplicity is happening.
LiveJournal also grew from a single database on a single server to multiple sharded, replicated databases.
I'm sure you could find a dozen more examples on the High Scalability blog.
While you think of eventually supporting millions of users, you're only ever fetching one particular person's friends list; that limits the actual amount of data per query substantially...
In order to maintain normalized friendship relationships in the database, you'd need two tables:
USERS
user_id (primary key)
username
FRIENDS
user_id (primary key, foreign key to USERS(user_id))
friend_id (primary key, foreign key to USERS(user_id))
This will stop duplicates (i.e. (1, 2)) from happening, but won't stop reversals, because (2, 1) is valid. You'd need a trigger to enforce that there's only one instance of the relationship (a sketch follows the snippet below)...
In your code, when inserting relationships into the table, follow a convention:
issueSQLQuery("INSERT INTO relationships (friend1, friend2)
VALUES (?, ?)", min(friend_1_ID, friend_2_ID), max(friend_1_ID, friend_2_ID))
Do similarly for retrievals, as well. Of course, this could be done in a stored procedure.
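If you'd rather enforce the convention in the database than in application code, here is a sketch of a normalizing trigger; the table and column names follow the snippet above:
DELIMITER //
CREATE TRIGGER relationships_bi BEFORE INSERT ON relationships
FOR EACH ROW
BEGIN
    DECLARE tmp INT;
    IF NEW.friend1 > NEW.friend2 THEN
        -- swap so the smaller id always comes first;
        -- a unique key on (friend1, friend2) then blocks reversed duplicates
        SET tmp = NEW.friend1;
        SET NEW.friend1 = NEW.friend2;
        SET NEW.friend2 = tmp;
    END IF;
END //
DELIMITER ;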
Both of the alternatives you've suggested would no doubt result in grief - imagine 400 million tables, or managing 400 million files.
Definitely best to maintain a properly indexed relationships table.
If you expect the level of success attained by Facebook (I like your confidence), you will soon realize what they realized: relational databases begin to fall short, and you'll want to look into NoSQL solutions.
That being said, why pre-optimize for 400 million users? Build a system that will work now for, say, 500,000 users. If you need to redesign after that, then you must be very successful and will have the resources to do so.
something like this should do you initially: http://pastie.org/1127206
drop table if exists user_friends;
drop table if exists users;
create table users
(
user_id int unsigned not null auto_increment primary key,
username varchar(32) unique not null,
created_date datetime not null
)
engine=innodb;
delimiter #
create trigger users_before_ins_trig before insert on users
for each row
begin
set new.created_date = now();
end#
delimiter ;
create table user_friends
(
user_id int unsigned not null,
friend_user_id int unsigned not null,
created_date datetime not null,
primary key (user_id, friend_user_id), -- note clustered composite PK
foreign key (user_id) references users(user_id),
foreign key (friend_user_id) references users(user_id)
)
engine=innodb;
delimiter #
create trigger user_friends_before_ins_trig before insert on user_friends
for each row
begin
set new.created_date = now();
end#
delimiter ;
drop procedure if exists insert_user;
delimiter #
create procedure insert_user
(
in p_username varchar(32)
)
proc_main:begin
insert into users (username) values (p_username);
end proc_main #
delimiter ;
drop procedure if exists insert_user_friend;
delimiter #
create procedure insert_user_friend
(
in p_user_id int unsigned,
in p_friend_user_id int unsigned
)
proc_main:begin
if p_user_id = p_friend_user_id then
leave proc_main;
end if;
insert into user_friends (user_id, friend_user_id) values (p_user_id, p_friend_user_id);
end proc_main #
delimiter ;
drop procedure if exists list_user_friends;
delimiter #
create procedure list_user_friends
(
in p_user_id int unsigned
)
proc_main:begin
select
u.*
from
user_friends uf
inner join users u on uf.friend_user_id = u.user_id
where
uf.user_id = p_user_id
order by
u.username;
end proc_main #
delimiter ;
call insert_user('f00');
call insert_user('bar');
call insert_user('bish');
call insert_user('bash');
call insert_user('bosh');
select * from users;
call insert_user_friend(1,2);
call insert_user_friend(1,3);
call insert_user_friend(1,4);
call insert_user_friend(1,1); -- oops
call insert_user_friend(2,1);
call insert_user_friend(2,5);
select * from user_friends;
call list_user_friends(1);
call list_user_friends(2);
-- call these stored procs from your php !!
You could accomplish this using a table to represent the "Relationship" that one user has with another user. This is essentially a JOIN table between two different rows in the same table. An example join table might include the following columns:
USER_1_ID
USER_2_ID
To get a list of friends, write a query that performs an INNER JOIN from the USER in question through the RELATIONSHIP table back to a second instance of the USER table; a sketch follows.
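A sketch of that query, assuming a users table like the earlier one and a relationship table named relationships with the two columns above:
SELECT friend.*
FROM users me
INNER JOIN relationships r ON r.USER_1_ID = me.user_id
INNER JOIN users friend ON friend.user_id = r.USER_2_ID
WHERE me.username = 'f00';
-- If a row may store this user in either column, also check r.USER_2_ID = me.user_id
-- (e.g. via a UNION) so relationships aren't missed.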