I have several entries (that will grow into thousands) in a database that I want to be able to index via keywords.
So say I have a table such as:
id | username | email | description
And I want to attach multiple keywords to that user such as:
Tall | Blonde | Male | Skinny
How would I structure the table for that? Much like the keywords used in stackoverflow at the bottom of the new question?
Thanks
you need three tables
users:
id | username | email | description
tags: (will contain Tall, Blonde, Male, Skinny etc... in the name field)
id | name
users_tags:
user_id | tag_id
to make a user have a tag, you add a row to the users_tags table with the ids of the associated items.
both user_id and tag_id should be foreign keys, referring to the appropriate table, and combined, they should form a unique key (possibly primary key).
If I understood correctly, have 3 tables:
- users (user_id, username)
- keywords (keyword_id, keyword)
- user_keywords (user_id, keyword_id)
and have relations between them (on same column names as of in an example).
This is a many-to-many relationship.
You have a table of keywords and a table of objects (which can be associated with keywords).
You then have a third table which has two columns which form a composite key, each of which is a foreign key on on of the other two tables.
e.g.
object_keyword_relationship
===========================
object_id | keyword_id
1 | 1
1 | 2
Related
This question already has answers here:
Is storing a delimited list in a database column really that bad?
(10 answers)
Closed 4 years ago.
I have 2 table on my MySQL database. And My Tables like above.
Note: Its an offline and old script.
My Products Table ;
| ID | ProductName | ProductCategory |
---------------------------------------------
| 1 | Example Name | 1,2,3,4,5,6,7 |
---------------------------------------------
| 2 | Example Name | 1,2,10,11,12 |
---------------------------------------------
And My Query is like below for list my products by category.
mysql_query("SELECT * FROM products where FIND_IN_SET('".$catid."', ProductCategory)");
I can sort them for desc or asc but i want to sort them manually like i desired.
i need an idea.
Your database design violates first normal form. You SHOULDNOT insert multiple values into a column by separating with commas.
Your database schema should look like below
create table ProductTable(ProductID integer primary key, ProductName varchar(30));
create table CategoryTable(CategoryID integer primary key, CategoryName varchar(30));
create table ProductCategoryRelationTable(ProductID integer, CategoryID integer);
Why we created ProductCategoryRelationTable is because one Product can have multiple Categories and a category can belong to multiple products (multi multi relationship). Also ProductCategoryRelationTable should have composite primary key on (ProductID,CategoryID) and foreign key constraints with ProductTable and CategoryTable.
Once you have created above tables, try to express your queries around these tables.
I'm currently developing an application that allows a customer to register for an event through a custom form. That custom form will be built by the event admin for specific input by the customer.
The customer will go to the form, complete the input and pick a venue that will then display the available time-slots. I'm stuck with these two database designs and wondering which one is a better approach.
Pivot table with 3 foreign keys
Table 'Customers' -
| id | name |
Table 'Events' -
| id | name | form_fields (json)
Table 'Venues' -
| id | address | event_id |
Table 'Timeslots' -
| id | datetime | slots | venue_id |
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
Two pivot tables
Table 'Customers' -
| id | name |
Table 'Events' -
| id | name | form_fields (json)
Table 'Venues' -
| id | address | event_id |
Table 'Timeslots' -
| id | datetime | slots | venue_id |
Pivot Table 'Tickets' -
| id | customer_id | timeslot_id |
Pivot Table 'EventCustomers' -
| id | customer id | event_id | form_data (json)
In addition, I will store the HTML markup of the custom form built by admin in 'form_fields' (json) and have the customer complete the form and store the values in 'form_data' (json).
Is it also sensible to have the custom form and data saved in json?
Thank you.
To answer your question(even if it's a bit off topic):
None of the above.
To model data we must ask ourselves what are the constraints. Data is often easier to define by what it cannot do, not what it can do.
For example, can you have a Tickets record that:
Does not have a customer record ( customer_id = null )
Does not have a timeslot ( timeslot_id = null) -timeslot is related to venue or the location and time of the event.
Does not have an event ( event_id = null )
If you answered no to all of these then we have to bring this data all together at one time (but, not necessarily in the same table).
Now in my mind, it's pretty clear you could/should not have a ticket that:
wasn't assigned to a customer
does not have an event
does not have a timeslot
does not have a venue
whose number exceeds the number of slots for the event (this you mostly missed on)
So I will assume these are our "basic" constraints
Problems with your second case:
you could sell a ticket to a customer for a particular timeslot ( at a venue ), but for an unknown event. Record in Tickets, and No record in the EventCustomers table
you could also have a customer registered to an event, with no ticket or timeslot/venue. Record in EventCustomers and No record in the Tickets table
To me that seems somewhat illogical, and indeed it violates the constraints I outlined above.
Problems with your first case:
On the surface the first case looks fine as far as our constraints above look. But as I worked though it some issues popped up. To understand these, as a general rule, we always want a unique index on all the foreign keys in a pivot table ( aka a unique compound key ).
So in the first case we want this(idealy):
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
//for this table you would want this compound, unique index
Unique Key ticket (customer_id,timeslot_id,event_id)
This lead me to the number of "slots" as this would imply that a customer could only have one tickets record per event and timeslot/venue. This relates back to the part I said that you mostly missed on, i.e. you have no way to track how many you have used. At first you might want to allow duplicates in this table. "We can just add some more tickets in right?" - you think, and this is the easy fix, not.
Exhibit A:
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
| 1 | 1 | 1 | 1 | {}
| 2 | 1 | 1 | 1 | {}
While contemplating Exhibit A consider some basic DB design rules:
In a good DB design you always want ( ideally )
a surrogate primary key, a key with no relation to the data, this is id
a natural key, a unique key that is part of the data. An example would be if you had an email field attracted to customer, you could make this unique to prevent adding duplicate customers. It's a piece of the data that is by it's nature unique and part of the data.
The first one (surrogate keys) allow you use the data with no knowledge of the data itself. This is good as it gives us some separation of concerns, some abstractions between our code and the data. When you join two tables on their primary key, foreign key relationship you don't need to know anything else about the data.
The second (natural key) is essential to prevent duplicate data. In the case of a pivot table the foreign keys, which are surrogate keys in their respective tables, become a natural key in the pivot table. These are now part of the data in the context of the pivot table and they uniquely and naturally identify that data.
Why is uniqueness so important?
Once you allow duplicates with the pivot tables you will run into several issues (especially if you have accessory data like the form_data):
How to tell those records apart?
Which of the duplicates is the authoritative copy, which is in charge.
How do you synchronize that accessory data, if you need to change form_data, which record do you change it in. Only one? Which one? Both? how do you maintain synchronizing all the duplicates.
What if an accidental duplicate gets entered, how will you know it was accidental? How do you know it's a real duplicate or true duplicate and not a valid record.
Even if you knew it was an accidental duplicat, how do you decide which one of the duplicates should be removed, this goes back to which is the authoritative record.
In short order, it really becomes a mess to deal with.
Finally (what I would suggest)
Table 'customer' -
| id | name |
Table 'event' -
| id | name | form_fields (json)
Table 'venue' -
| id | address | slots |
Table 'show' -
| id | datetime | venue_id | event_id |
Table 'purchase' -
| id | show_id | customer_id | slots | created |
Table 'ticket' ( customers_shows )
| id | purchase_id | guid |
I changed quite a few things (you can use some or all of these changes):
I changed the plural names to singular. I only use plurals when I do pivot tables that have a no accessory data, such a name would be venues_events. This is because a record from customer is a single entity, I don't need to do any joins to get useful data. A record from our hypothetical venues_events would encompass 2 entities, so I would know right away I need to do a join no matter what as there is no other data besides the foreign keys.
Now in the case of show, you may notice that is essentially a pivot table. So why did I not name it venues_events as I listed above. The reason is we have a datetime column in there, which is what I mean by "accessory" data. So in this case I could pull data just from show if I just wanted the datetime and I would not need a join to do it. So it can be considered a single entity that has some Many to One relationships. ( A Many to Many is a Many to One and a One to Many that's why we need pivot tables ) More on this table later.
Letter Casing and spacing. I would suggest using all lowercase and no spaces. MySql is case sensitive and doesn't play nice with spaces. It's just easier from a standpoint of not having to remember did we name it venuesEvents or VenuesEvents or Venuesevents etc... Consistency in naming convention is paramount in good DB design.
The above is largely Opinion based, it's my answer so it's my opinion. Deal with it.
Table show
I moved the slotscolumn to venue. I am assuming that the venue will determine how many slots are available, in my mind this is a physical requirement or attribute of the venue itself. For example a Movie theater has only X number of seats, no matter what time the movie is at doesn't change how many seats are there. If those assumptions are correct then it saves us a lot of work trying to remember how many seats a venue has every time we enter a show.
The reason I changed timeslot to show is that in both your original cases, there is some disharmony in the data model. Some things that just don't tie together as well as they should. For example your timeslots have no direct relation to the event.
Exhibit B (using your structure):
Table 'event' -
| id | name | form_fields (json) |
| 1 | "Event A" | "{}" |
| 2 | "Event B" | "{}" |
Table 'Venues' -
| id | address | event_id |
| 1 | "123 ABC SE" | 1 |
| 2 | "123 AB SE" | 2 | //address entered wrong as AB instead ABC
Table 'Timeslots' -
| id | datetime | slots | venue_id |
| 1 | "2018-01-27 04:41:23" | 200 | 1 |
| 2 | "2018-01-27 04:41:23" | 200 | 2 |
In the above exhibit, we can see right away we have to duplicate the address to create more then one event at a given venue. So if the address was entered wrong, it could be correct in some venues and incorrect in others. This can be a real issue as programmatically how do you know that AB was supposed to be ABC when the venue ID and event ID are both different for this record. Basically how do you tell those records apart at run time? You will find that it is very difficult to do. The main problem is you have to much data in Veneues, your trying to do to much with it and the relationship doesn't fit the constraints of the data.
That's not even the worst of it as a further problem creeps in, because now that the venue_id is different we can corrupt our Timeslots table and have 2 records in there at the same time for the same venue. Then, because the slots are tied to this table, we can also corrupt things down stream such as selling more tickets then we should for that time and place. Everything just starts to fracture.
Even counting the numbers of shows at a given venue becomes a real challenge, this "flaw" is in both data models you presented.
The same Data in my Model
#with Unique compound Key datetime_venue_id( show.datetime, show.venue_id)
Table 'event' -
| id | name | form_fields (json) |
| 1 | "Event A" | "{}" |
#| 2 | "Event B" | "{}" |
Table 'venue' -
| id | address | slots |
| 1 | "123 ABC SE" | 200 |
Table 'show' -
| id | datetime | venue_id | event_id |
| 1 | "2018-01-27 04:41:23" | 1 | 1 |
#| 2 | "2018-01-27 04:41:23" | 1 | 2 |
As you can see, you no longer have the duplicate address. And while it looks like you could enter in 2 shows for the same venue at the same time, this is only because we don't have a compound unique key that includes the datetime and venue_id a.k.a. Unique Key datetime_venue_id( datetime, venue_id). If you tried inserting that data with that constraint MySql would blowup on you. And if you included both inserts ( event and show ) in the same "Transaction" (which is how I would do it, in innodb engine) the whole thing would fail and get rolled back and neither the event or show would get inserted.
Now you could try to argue that you could have the same Unique constraint on Exhibit B, but as the Venue ID is different there, you would be wrong.
Anyway, show is our new main pivot table with foreign keys from event and venue and then the accessory data datetime.
Besides what I went over above, this setup gives us several advantages over the old structure, in this one table we now have access to:
what and where is the event (by joining on Table event )
when is the event ( timestamp )
how many slots available for the event (by joining on Table venue)
This centers everything around the show record. We can build a "show" independent of a customer or tickets. Because really a customer is not part of the show, and including them to soon (or to late depending how you look at it) in the data model muddies everything up.
Exhibit C
#in your first case
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
#in your second case
Pivot Table 'Tickets' -
| id | customer_id | timeslot_id |
Pivot Table 'EventCustomers' -
| id | customer id | event_id | form_data (json)
AS I said above, you cant put what I am calling a show the what,where and when together without having a customer ID (in either of your data models). As you build your application around this later it will become a huge issue. This may be insurmountable at run time. Basically, you need all that data assembled and waiting on the customer_id. In both of your models that's not the case, and there is data you may not have easy access to. For example for the first case (of the old structure) how would you know that timeslot_id=20 AND event_id=32 plus a customer equals a valid ticket? There is no direct relationship between timeslot and event outside of the pivot table that contains the customer. timeslot_id=20 could be valid for any event and you have no way to know that.
It's so much easier to grab say show=32 and check how many slots are left and then just do the purchase record. Everything is ready and waiting for it.
Table purchase
I also added purchase or an order table, even if the "shows" are free this table provides us with some great utility. This is also a pivot table, but it has some accessory data just like show does. ( slots and created ).
This table
we bind the customer table to the show table here
we have a 'created' field so you will know when this record was created, when the tickets where purchased
we also have a number of slots the customer will use, we can do an aggregate sum of slots grouped on the show_id to see how many slots we have "sold". With one join from show to venue we can find out how many total slots this "show" has with the same integer key (show.id) that we used above to aggregate. Then it would be a simple matter to compare the two, if you wanted to get fancy you may be able to do this all in one query.
Table ticket
Now you may or may not even need this table. It has a many to one relationship to table purchase. So One order can have Many tickets. The records in here would be generated when a purchase is made, the number dependent on what is in slots. The primary use of this table is just to provide a unique record for each individual ticket. For this I have guid column which can just be a unique hash. Basically this would give you some tracking ability on individual tickets, I don't really have enough information to know how this will work in your case. You may even be able to replace this table with JSON data if searching on it is not a concern, and that would make maintenance of it easier in the case that some tickets are refunded. But as I hinted this is very dependent on your particular use case.
Some brief SQL examples
Joining Everything (just to show the relationships):
SELECT
{some fields}
FROM
ticket AS t
JOIN
puchase AS p ON t.purchase_id = p.id
JOIN
customer AS c ON p.customer_id = c.id
JOIN
show AS s ON p.show_id = s.id
JOIN
venue AS v ON s.venue_id = s.id
JOIN
event AS e ON s.event_id = e.id
Counting the used slots for a show:
SELECT
SUM(slots) AS used_slots
FROM
puchase
WHERE
show_id = :show_id
GROUP BY show_id
Get the available slots for a show:
SELECT
v.name,
v.slots
FROM
venue AS v
JOIN
show AS s ON s.venue_id = v.id
WHERE
v.show_id = :show_id
# or you could do s.id = :show_id
It also works out nice that all the tables start with a different letter, which makes aliasing a bit easier.
-note- The table name event may be a reserved word in MySql, I am not sure off the top of my head if it will work as a table name. Some reserved words still work in some parts of the query based on the context it's used in. Even if that is true, I am sure you can come up with a work around for it. Coincidentally this is why I named purchase that instead of order as "order" is a reserved word. (I just happen to think of event)
I hope that helps and makes sense. I probably spent way more time on this then I should have, but I design things like this for a living and I really enjoy the data architecture part of it, so I can get a bit carried away at times.
I have several Posts tables and one Votes table. How can I prevent inserting non-existing post_id (in Posts tables) in the Votes table?
// Posts_1 // Posts_2 // Posts_3
+----+---------+ +----+---------+ +----+---------+
| id | content | | id | content | | id | content |
+----+---------+ +----+---------+ +----+---------+
// Votes
+----+---------+
| id | post_id |
+----+---------+
It should be noted, in reality the structure of Posts tables is different. (all Posts tables have not the same structure), Then I can not combine all Posts tables as one table.
Now I want to prevent of inserting invalid rows in the Votes table. (invalid = post_id is not exist in the none of Posts tables)
So, If I have just one table, I can create a foreign key on the Votes.post_id reference to Posts.id, But the problem is having several Posts table. ok, well, Is there any suggest?
The table structure is crazy. You need to have a POST Index Table, which combines all the posts to one single place and gives it like this:
// Posts_Index
+----+---------+------------+
| id | post_id | post_table |
+----+---------+------------+
// Votes
+----+---------+
| id | post_id |
+----+---------+
Else you need to reverse map the way. So that, post_id -> votes.id.
A solution could be that you use post_table_ref in votes table to identify from which table the post has been voted for. This post_table_ref must be fixed for the whole application.
Depending on table from which post is coming from, you'll have to give post_table_ref in you votes update query.
E.g: if the post is coming from post_table_01 you'll have to set post_table_ref to 1 in votes update query. At the time of getting records from votes table you'll use the same post_table_ref in WHERE clause of your SELECT FROM VOTES query.
You can't in this data structure, because you don't have one primary key to reference, but three, which can have the same value multiple times in different tables.
Using a FK column for each posts_# table can have your Votes referencing no or multiple posts_# tables.
If you -really- need 3 or more different posts tables, you could create one central post_base table with PK and let the other posts tables reference it.
Table posts_base:
Id int primary key,
-- possibly other common data
then give the posts_# tables the posts_base.Id as FK and let Votes.post_id reference posts_base.Id!
Also, read some books about relational database design first; it's rather easy to create tables with somethin in it, but without knowing about foreign keys, normalization and the like, the database quickly becomes a messy trash dump, which can neither meet performance nor data integrity requirements.
Or, if you don't need it or want to code all that yourself (easier in the beginning than learning relational DB design, but troublesome later on), shift to NoSQL, where you just dump posts as documents and add Votes to these documents when submitted (so they can never be orphaned).
I have a store where I sell products with duration (expiration time for users).
I have a mysql table for users who look like this
| user_id | user_name | user_password | user_email |
and another one for products :
| product_id | product_name | product_price |
I'm selling two products, so now I'm wondering if should I add two columns in user table so it will look like this .
| user_id | user_name | user_password | user_email | product_one | product_two |
and in those fields put the date of expiration for the users who already bought the products (both of them will be blank by default),
or should I just make a new table for the purchased products and then store the appropriate user_id.
Thanks in advance, any help is appreciated.
The second choice is more like a 3NF (3rd Normal Form). I have some little experience in e-shops (mainly in Opencart) and in my opinion, and from whatever I've already seen, this is how they're working.
In fact, it's far better to have one more table which will hold the 'Orders' and a 'User_Id', and another table that will hold the 'Orders' and the 'Product_Ids' in them.
I'm neither a database nor an e-shop platform expert, but according to my experience, I'd go with the second one.
EDIT
I'm editing my current answer to add an example. So, you already have two tables, one for users (customers) and one for products. These two table are (as already mentioned) the following (I don't know the actual table names, so I'll put mine).
table 'users':
| user_id | user_name | user_password | user_email |
table 'products':
| product_id | product_name | product_price |
So, my suggestion is to introduce a new entity (let's name that entity 'order') and create a table that will contains each order matched with the user that made it. So the 'orders' table will be something like this:
| order_id | user_id |
Then you will have another table that will match each order with a product_id. In this table you can have also your 'expiration time' field. A sample of such table is the following:
table 'order_products':
| order_id | product_id | product_exp_date |
However, tha last table has a flaw: it has not a PRIMARY KEY. You have to be a little creative here and import a field in order to hold a primary key, such as order_product_id, which will hold a UNIQUE identifier for each separate product in each separate order. But you'll have to find a way on how to do this.
Hope this clarified my thought.
You want a separate table, which I will name users_products. This will allow you to add products. It's generally more flexible.
It will have these columns
user_id
product_id
expiration
You can find what current products a user possesses like this:
select u.user_name, u.user_email, p.product_name
from users u
left join users_products up on u.user_id = p.user_id
left join products p on up.product_id = p.product_id
and p.expiration >= NOW()
The primary key of your users_products table should be a compound key made of all three columns.
When you sell a user with ID 123 the product with id 321, expiring in 30 days, you represent that in your database with this query.
INSERT INTO users_products
(user_id, product_id, expiration)
VALUES ( 123, 321, NOW() + INTERVAL 30 DAY)
I'm new to database design so please bear with me. I'm using PHP and MySQL.
I have a 'movies' table that contains some details about a movie. This includes genres, which have an (if I understand correctly) many to many relationship with movies, implying a single movie can belong to different genres and a single genre can belong to different movies.
From what I gather about database design, storing this kind of relationship in one table is not a good idea as it will either violate First Normal form or Second Normal form rules.
How would I design my tables to avoid this; would I have to create a table for each genre separately or... ?
This leads my to my next question: separate tables need to have foreign keys to identify which information belongs to what row. In theory, if I had a unique key identifying each movie which I would then like to use to identify a director in a separate table, how would I create this relationship in MySQL?
Thank you for your time - if I've made anything unclear please let me know and I will try my best to clarify.
This includes genres, which have an (if I understand correctly) many
to many relationship with movies, implying a single movie can belong
to different genres and a single genre can belong to different movies.
That's right.
From what I gather about database design, storing this kind of
relationship in one table is not a good idea
Right. You're looking for something loosely along these lines.
create table movies (
movie_id integer primary key
-- other columns
);
create table genres (
genre varchar(15) primary key
-- other columns?
);
create table movie_genres (
movie_id integer not null,
genre varchar(15) not null,
primary key (movie_id, genre),
foreign key (movie_id) references movies (movie_id),
foreign key (genre) references genres (genre)
);
For directors, assuming there is only one director per movie, you can use this instead of the movies table above
create table movies (
movie_id integer primary key,
director_id integer not null,
foreign key (director_id) references directors (director_id) -- not shown.
-- other columns
);
You need one table Movies, one table Genres and one table Movies_and_Genres. The first two contain unique primary keys which you can create using the mysql autoincrement field type. The third table contains pairs of those primary keys.
create table movies (id integer not null primary key auto_increment, title ... );
create table genres (id integer not null primary key auto_increment, genre ... );
create table movies_and_genres (id_movie integer not null, id_genre integer not null);
As for the directors, this is a question of data modeling. If a movie can have more than one director, then you need a directors and a movies_and_directors table. Otherwise you need only the directors table and a director column in the movies table.
you need/should use junction tables
table movie:
| id | title | rating |
-----------------------
| 1 | foo | 5 |
| 2 | bar | 4 |
table genre:
| id | name |
----------------
| 1 | comedy |
| 2 | romance |
table director:
| id | name |
----------------
| 1 | hello |
| 2 | world |
table movie_genre:
| id | movie_id | genre_id |
----------------------------
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 1 |
table movie_director:
| id | movie_id | director_id |
-------------------------------
| 1 | 1 | 1 |
| 2 | 2 | 1 |
| 3 | 2 | 2 |
You're quite right, what you need here is to have a table of movies (e.g. tbl_movies);
pk id
name
... etc
a table for genres (eg tbl_genres);
pk id
name
... etc
and a table linking the two (eg tbl_movie_genres);
fk id_movies
fk id_genres
You can either set the pk of the tbl_movie_geners to be the two foreign keys, or you can set a standalone pk (e.g id like in the tbl_movies and tbl_genres tables above).
this way you can list as many genres per movies and they're linked through the tbl_movie_genres table; eg:
tbl_movies:
id name
1 Movie 1
2 Movie 2
tbl_genres:
id name
1 Horror
2 Action
3 Rom Com
tbl_movie_genres
id_movies id_genres
1 3
2 1
2 2
Would show you that 'Movie 1' is a rom com and 'Movie 2' is an action horror.
To satisfy the many-many relationship use a join table that holds foreign keys to both the movie and genre tables:
MOVIE
-----
ID (PK)
GENRE
-----
ID (PK)
MOVIE_GENRE
-----------
MOVIE_ID (FK that references MOVIE(ID))
GENRE_ID (FK that references GENRE(ID))
Here the MOVIE table has a primary key of ID, the GENRE table has a primary key of ID, and the MOVIE_GENRE table has two foreign key references: one to MOVIE.ID and another to GENRE.ID. The primary key for MOVIE_GENRE could either be the composite key of (MOVIE_ID,GENRE_ID) as that will be unique but you could use a synthetic key as well.
For dealing with the director table and relationship, if it is a one-to-many relationship (one director for many movies), simply add a foreign key to the MOVIE table:
DIRECTOR
--------
ID (PK)
MOVIE
-----
ID (PK)
DIRECTOR_ID (FK TO DIRECTOR(ID))
If the off chance you need to support another many-to-many relationship (many directors for many movies), use the join table approach like above.