How would things like customer reviews be stored in a database? I can't imagine there would be rows for each item and columns for each review, as one product may have 2 reviews and another may have 100+. I'd presume they were stored in a separate file for reviews, but then surely not one file per item! I don't know enough about storing data to be able to figure this one out by myself!
A similar situation is something like an online calendar: there is all the information about each appointment (time, duration, location, etc.) and there can be many of these on each day, every day, for all users! A logical way would be to have a table for each user with all their appointments in, but at the same time that seems illogical because if you have 1000+ users, that's a lot of tables!
Basically I'd like to know what the common/best practice way is of storing this 'big dynamic data'.
Customer reviews can easily be stored by using two tables in one-to-many relationship.
Suppose you have a table containing products/articles/whatever worth reviewing. Each of them has a unique ID and other attributes.
Table "products"
+-------------------------------------+
| id | name | attribute1 | attribute2 |
+-------------------------------------+
Then you make another table, with its name indicating what it's about. It should contain at least a unique ID and a column for the IDs from the other table. Let's say it will also have the email of the user who submitted the review and (obviously) the review text itself:
Table "products_reviews"
+--------------------------------------------+
| id | product_id | user_email | review_text |
+--------------------------------------------+
So far, so good. Let's assume you're selling apples.
Table "products"
+-------------------------------+
| 1 | 'Apple' | 'green' | '30$' |
+-------------------------------+
Then, two customers come, each one buys one apple worth 30$ and likes it, so they both leave a review.
Table "products_reviews"
+-------------------------------------------------------------------------------+
| 1 | 1 | alice#mail.com | 'I really like these green apples, they are awesome' |
| 2 | 1 | bob#mail.com | 'These apples rock!' |
+-------------------------------------------------------------------------------+
So now all you have to do is to fetch all the reviews for your apples and be happy about how much your customers like them:
SELECT *
FROM products_reviews
INNER JOIN products ON products_reviews.product_id = products.id
WHERE products.name = 'Apple';
You can now display them under the shopping page for apples (just don't mention they cost 30$).
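To try the two-table layout end to end, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for a real database server, and the email addresses and column types are assumptions made up for the demo, not part of the schema above.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for a real server.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce REFERENCES
conn.executescript("""
CREATE TABLE products (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE products_reviews (
    id          INTEGER PRIMARY KEY,
    product_id  INTEGER NOT NULL REFERENCES products(id),
    user_email  TEXT NOT NULL,
    review_text TEXT NOT NULL
);
""")

conn.execute("INSERT INTO products (id, name) VALUES (1, 'Apple')")
# One product row, any number of review rows pointing at it (one-to-many).
conn.executemany(
    "INSERT INTO products_reviews (product_id, user_email, review_text) VALUES (?, ?, ?)",
    [
        (1, "alice@example.com", "I really like these green apples"),
        (1, "bob@example.com", "These apples rock!"),
    ],
)

reviews = conn.execute(
    """
    SELECT r.user_email, r.review_text
    FROM products_reviews AS r
    INNER JOIN products AS p ON r.product_id = p.id
    WHERE p.name = ?
    ORDER BY r.id
    """,
    ("Apple",),
).fetchall()
print(reviews)
```

A product with 2 reviews and one with 100+ use exactly the same two tables; only the number of rows in products_reviews differs.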
The same principle applies for things like an online calendar. You have one table with users, and many tables with other stuff - appointments, meetings, etc. which relate to that user.
Keep in mind, however, that things like meetings are better modelled with a many-to-many table, since they are (usually) shared by many people. Here's a link that visualizes it very well, and here's a question here on SO with sample code for PHP. Go ahead and test it for yourself.
Cheers :)
Related
I'm currently developing an application that allows a customer to register for an event through a custom form. That custom form will be built by the event admin for specific input by the customer.
The customer will go to the form, complete the input and pick a venue that will then display the available time-slots. I'm stuck with these two database designs and wondering which one is a better approach.
Pivot table with 3 foreign keys
Table 'Customers' -
| id | name |
Table 'Events' -
| id | name | form_fields (json)
Table 'Venues' -
| id | address | event_id |
Table 'Timeslots' -
| id | datetime | slots | venue_id |
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
Two pivot tables
Table 'Customers' -
| id | name |
Table 'Events' -
| id | name | form_fields (json)
Table 'Venues' -
| id | address | event_id |
Table 'Timeslots' -
| id | datetime | slots | venue_id |
Pivot Table 'Tickets' -
| id | customer_id | timeslot_id |
Pivot Table 'EventCustomers' -
| id | customer_id | event_id | form_data (json)
In addition, I will store the HTML markup of the custom form built by admin in 'form_fields' (json) and have the customer complete the form and store the values in 'form_data' (json).
Is it also sensible to have the custom form and data saved in json?
Thank you.
To answer your question(even if it's a bit off topic):
None of the above.
To model data we must ask ourselves what are the constraints. Data is often easier to define by what it cannot do, not what it can do.
For example, can you have a Tickets record that:
Does not have a customer record ( customer_id = null )
Does not have a timeslot ( timeslot_id = null) -timeslot is related to venue or the location and time of the event.
Does not have an event ( event_id = null )
If you answered no to all of these then we have to bring this data all together at one time (but, not necessarily in the same table).
Now in my mind, it's pretty clear you could/should not have a ticket that:
wasn't assigned to a customer
does not have an event
does not have a timeslot
does not have a venue
whose number exceeds the number of slots for the event (this you mostly missed on)
So I will assume these are our "basic" constraints
Problems with your second case:
you could sell a ticket to a customer for a particular timeslot ( at a venue ), but for an unknown event. Record in Tickets, and No record in the EventCustomers table
you could also have a customer registered to an event, with no ticket or timeslot/venue. Record in EventCustomers and No record in the Tickets table
To me that seems somewhat illogical, and indeed it violates the constraints I outlined above.
Problems with your first case:
On the surface the first case looks fine as far as our constraints above go. But as I worked through it, some issues popped up. To understand these, note that as a general rule we always want a unique index on all the foreign keys in a pivot table (a.k.a. a unique compound key).
So in the first case we want this (ideally):
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
//for this table you would want this compound, unique index
Unique Key ticket (customer_id,timeslot_id,event_id)
This led me to the number of "slots", as the unique key would imply that a customer could only have one Tickets record per event and timeslot/venue. This relates back to the part I said you mostly missed, i.e. you have no way to track how many slots you have used. At first you might want to allow duplicates in this table. "We can just add some more tickets in, right?", you think, and it looks like the easy fix. It is not.
Exhibit A:
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
| 1 | 1 | 1 | 1 | {}
| 2 | 1 | 1 | 1 | {}
While contemplating Exhibit A consider some basic DB design rules:
In a good DB design you always want ( ideally )
a surrogate primary key, a key with no relation to the data, this is id
a natural key, a unique key that is part of the data. An example would be if you had an email field attached to customer; you could make this unique to prevent adding duplicate customers. It's a piece of the data that is by its nature unique.
The first one (the surrogate key) allows you to use the data with no knowledge of the data itself. This is good as it gives us some separation of concerns, some abstraction between our code and the data. When you join two tables on their primary key/foreign key relationship, you don't need to know anything else about the data.
The second (natural key) is essential to prevent duplicate data. In the case of a pivot table the foreign keys, which are surrogate keys in their respective tables, become a natural key in the pivot table. These are now part of the data in the context of the pivot table and they uniquely and naturally identify that data.
Why is uniqueness so important?
Once you allow duplicates with the pivot tables you will run into several issues (especially if you have accessory data like the form_data):
How to tell those records apart?
Which of the duplicates is the authoritative copy, which is in charge.
How do you synchronize that accessory data? If you need to change form_data, which record do you change it in? Only one? Which one? Both? How do you keep all the duplicates synchronized?
What if an accidental duplicate gets entered, how will you know it was accidental? How do you know whether it's a true duplicate or a valid record?
Even if you knew it was an accidental duplicate, how do you decide which one of the duplicates should be removed? This goes back to which is the authoritative record.
In short order, it really becomes a mess to deal with.
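The effect of the compound unique key can be sketched as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module rather than MySQL, and the column types are guesses, but the UNIQUE constraint behaves the same way in principle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE tickets (
    id          INTEGER PRIMARY KEY,      -- surrogate key
    customer_id INTEGER NOT NULL,
    timeslot_id INTEGER NOT NULL,
    event_id    INTEGER NOT NULL,
    form_data   TEXT,
    -- natural key of the pivot table: the combination of its foreign keys
    UNIQUE (customer_id, timeslot_id, event_id)
)
""")

insert_sql = (
    "INSERT INTO tickets (customer_id, timeslot_id, event_id, form_data) "
    "VALUES (1, 1, 1, '{}')"
)
conn.execute(insert_sql)

try:
    conn.execute(insert_sql)  # same customer, timeslot and event again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    # The database refuses the second row, so Exhibit A cannot happen.
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0]
print(duplicate_rejected, row_count)
```

With the constraint in place the question "which duplicate is authoritative?" never comes up, because the second row is never stored.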
Finally (what I would suggest)
Table 'customer' -
| id | name |
Table 'event' -
| id | name | form_fields (json)
Table 'venue' -
| id | address | slots |
Table 'show' -
| id | datetime | venue_id | event_id |
Table 'purchase' -
| id | show_id | customer_id | slots | created |
Table 'ticket' ( customers_shows )
| id | purchase_id | guid |
I changed quite a few things (you can use some or all of these changes):
I changed the plural names to singular. I only use plurals for pivot tables that have no accessory data; such a name would be venues_events. This is because a record from customer is a single entity, and I don't need to do any joins to get useful data. A record from our hypothetical venues_events would encompass 2 entities, so I would know right away that I need to do a join no matter what, as there is no other data besides the foreign keys.
Now in the case of show, you may notice that is essentially a pivot table. So why did I not name it venues_events as I listed above. The reason is we have a datetime column in there, which is what I mean by "accessory" data. So in this case I could pull data just from show if I just wanted the datetime and I would not need a join to do it. So it can be considered a single entity that has some Many to One relationships. ( A Many to Many is a Many to One and a One to Many that's why we need pivot tables ) More on this table later.
Letter casing and spacing. I would suggest using all lowercase and no spaces. Depending on the operating system and the lower_case_table_names setting, MySQL table names can be case sensitive, and names containing spaces have to be quoted with backticks everywhere. It's just easier not having to remember whether we named it venuesEvents or VenuesEvents or Venuesevents, etc. Consistency in naming convention is paramount in good DB design.
The above is largely Opinion based, it's my answer so it's my opinion. Deal with it.
Table show
I moved the slots column to venue. I am assuming that the venue determines how many slots are available; in my mind this is a physical requirement or attribute of the venue itself. For example, a movie theater has only X number of seats, and the time the movie is shown doesn't change how many seats are there. If those assumptions are correct, then it saves us a lot of work trying to remember how many seats a venue has every time we enter a show.
The reason I changed timeslot to show is that in both your original cases, there is some disharmony in the data model. Some things that just don't tie together as well as they should. For example your timeslots have no direct relation to the event.
Exhibit B (using your structure):
Table 'event' -
| id | name | form_fields (json) |
| 1 | "Event A" | "{}" |
| 2 | "Event B" | "{}" |
Table 'Venues' -
| id | address | event_id |
| 1 | "123 ABC SE" | 1 |
| 2 | "123 AB SE" | 2 | //address entered wrong as AB instead ABC
Table 'Timeslots' -
| id | datetime | slots | venue_id |
| 1 | "2018-01-27 04:41:23" | 200 | 1 |
| 2 | "2018-01-27 04:41:23" | 200 | 2 |
In the above exhibit, we can see right away that we have to duplicate the address to create more than one event at a given venue. So if the address was entered wrong, it could be correct in some venues and incorrect in others. This can be a real issue: programmatically, how do you know that AB was supposed to be ABC when the venue ID and event ID are both different for this record? Basically, how do you tell those records apart at run time? You will find that it is very difficult to do. The main problem is that you have too much data in Venues; you're trying to do too much with it, and the relationship doesn't fit the constraints of the data.
That's not even the worst of it, as a further problem creeps in: now that the venue_id is different, we can corrupt our Timeslots table and have 2 records in there at the same time for the same venue. Then, because the slots are tied to this table, we can also corrupt things downstream, such as selling more tickets than we should for that time and place. Everything just starts to fracture.
Even counting the numbers of shows at a given venue becomes a real challenge, this "flaw" is in both data models you presented.
The same Data in my Model
#with Unique compound Key datetime_venue_id( show.datetime, show.venue_id)
Table 'event' -
| id | name | form_fields (json) |
| 1 | "Event A" | "{}" |
#| 2 | "Event B" | "{}" |
Table 'venue' -
| id | address | slots |
| 1 | "123 ABC SE" | 200 |
Table 'show' -
| id | datetime | venue_id | event_id |
| 1 | "2018-01-27 04:41:23" | 1 | 1 |
#| 2 | "2018-01-27 04:41:23" | 1 | 2 |
As you can see, you no longer have the duplicate address. And while it looks like you could enter 2 shows for the same venue at the same time, that is only possible if we don't have a compound unique key that includes the datetime and venue_id, a.k.a. Unique Key datetime_venue_id( datetime, venue_id). If you tried inserting the commented-out rows with that constraint in place, MySQL would reject the insert. And if you included both inserts (event and show) in the same transaction (which is how I would do it, with the InnoDB engine), the whole thing would fail and get rolled back, and neither the event nor the show would get inserted.
Now you could try to argue that you could have the same Unique constraint on Exhibit B, but as the Venue ID is different there, you would be wrong.
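A minimal sketch of that behaviour, assuming SQLite (via Python's sqlite3 module) in place of MySQL/InnoDB and guessing at column types: the event and the show are inserted in one transaction, the unique key on (datetime, venue_id) rejects the clashing show, and the rollback removes the new event as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE venue (id INTEGER PRIMARY KEY, address TEXT NOT NULL, slots INTEGER NOT NULL);
CREATE TABLE show (
    id       INTEGER PRIMARY KEY,
    datetime TEXT NOT NULL,
    venue_id INTEGER NOT NULL REFERENCES venue(id),
    event_id INTEGER NOT NULL REFERENCES event(id),
    UNIQUE (datetime, venue_id)   -- one show per venue at any given time
);
INSERT INTO event (id, name) VALUES (1, 'Event A');
INSERT INTO venue (id, address, slots) VALUES (1, '123 ABC SE', 200);
INSERT INTO show (datetime, venue_id, event_id) VALUES ('2018-01-27 04:41:23', 1, 1);
""")

try:
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("INSERT INTO event (id, name) VALUES (2, 'Event B')")
        conn.execute(
            "INSERT INTO show (datetime, venue_id, event_id) "
            "VALUES ('2018-01-27 04:41:23', 1, 2)"  # same venue, same time
        )
    booked = True
except sqlite3.IntegrityError:
    booked = False

event_count = conn.execute("SELECT COUNT(*) FROM event").fetchone()[0]
print(booked, event_count)
```

After the failed booking only 'Event A' remains in event, so no orphaned event is left behind.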
Anyway, show is our new main pivot table with foreign keys from event and venue and then the accessory data datetime.
Besides what I went over above, this setup gives us several advantages over the old structure, in this one table we now have access to:
what and where is the event (by joining on Table event )
when is the event ( the datetime column )
how many slots available for the event (by joining on Table venue)
This centers everything around the show record. We can build a "show" independent of a customer or tickets. Because really a customer is not part of the show, and including them too soon (or too late, depending on how you look at it) in the data model muddies everything up.
Exhibit C
#in your first case
Pivot Table 'Tickets' -
|id | customer_id | timeslot_id | event_id | form_data (json)
#in your second case
Pivot Table 'Tickets' -
| id | customer_id | timeslot_id |
Pivot Table 'EventCustomers' -
| id | customer_id | event_id | form_data (json)
As I said above, in either of your data models you can't put together what I am calling a show (the what, where and when) without having a customer ID. As you build your application around this, it will become a huge issue, and it may be insurmountable at run time. Basically, you need all that data assembled and waiting on the customer_id. In both of your models that's not the case, and there is data you may not have easy access to. For example, in the first case (of the old structure), how would you know that timeslot_id=20 AND event_id=32 plus a customer equals a valid ticket? There is no direct relationship between timeslot and event outside of the pivot table that contains the customer. timeslot_id=20 could be valid for any event and you have no way to know that.
It's so much easier to grab say show=32 and check how many slots are left and then just do the purchase record. Everything is ready and waiting for it.
Table purchase
I also added purchase or an order table, even if the "shows" are free this table provides us with some great utility. This is also a pivot table, but it has some accessory data just like show does. ( slots and created ).
This table
we bind the customer table to the show table here
we have a 'created' field so you will know when this record was created, i.e. when the tickets were purchased
we also have a number of slots the customer will use, we can do an aggregate sum of slots grouped on the show_id to see how many slots we have "sold". With one join from show to venue we can find out how many total slots this "show" has with the same integer key (show.id) that we used above to aggregate. Then it would be a simple matter to compare the two, if you wanted to get fancy you may be able to do this all in one query.
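The "all in one query" idea from the last point might look like the sketch below. It is written against SQLite through Python's sqlite3 module rather than MySQL, the sample rows are invented, and remaining_slots is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE venue (id INTEGER PRIMARY KEY, address TEXT, slots INTEGER NOT NULL);
CREATE TABLE show (id INTEGER PRIMARY KEY, datetime TEXT, venue_id INTEGER, event_id INTEGER);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY, show_id INTEGER, customer_id INTEGER,
    slots INTEGER NOT NULL, created TEXT
);
INSERT INTO venue VALUES (1, '123 ABC SE', 200);
INSERT INTO show VALUES (1, '2018-01-27 04:41:23', 1, 1);
INSERT INTO purchase (show_id, customer_id, slots, created)
VALUES (1, 1, 2, '2018-01-01'), (1, 2, 3, '2018-01-02');
""")


def remaining_slots(conn, show_id):
    """Return venue capacity minus sold slots for a show, or None if the show is unknown."""
    row = conn.execute(
        """
        SELECT v.slots - COALESCE(SUM(p.slots), 0)
        FROM show AS s
        JOIN venue AS v ON s.venue_id = v.id
        LEFT JOIN purchase AS p ON p.show_id = s.id  -- LEFT JOIN keeps shows with no sales
        WHERE s.id = ?
        GROUP BY s.id, v.slots
        """,
        (show_id,),
    ).fetchone()
    return None if row is None else row[0]


remaining = remaining_slots(conn, 1)
print(remaining)
```

In a real application this check and the insert into purchase would need to run in the same transaction, otherwise two customers could both see the last free slots.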
Table ticket
Now you may or may not even need this table. It has a many to one relationship to table purchase. So One order can have Many tickets. The records in here would be generated when a purchase is made, the number dependent on what is in slots. The primary use of this table is just to provide a unique record for each individual ticket. For this I have guid column which can just be a unique hash. Basically this would give you some tracking ability on individual tickets, I don't really have enough information to know how this will work in your case. You may even be able to replace this table with JSON data if searching on it is not a concern, and that would make maintenance of it easier in the case that some tickets are refunded. But as I hinted this is very dependent on your particular use case.
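If you do keep the ticket table, generating its rows when a purchase is made could look roughly like this. It is a sketch under assumptions: SQLite via Python's sqlite3 module instead of MySQL, a random UUID as the "unique hash", and issue_tickets as a hypothetical helper name.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE ticket (
    id          INTEGER PRIMARY KEY,
    purchase_id INTEGER NOT NULL,
    guid        TEXT NOT NULL UNIQUE   -- one unguessable identifier per ticket
)
""")


def issue_tickets(conn, purchase_id, slots):
    """Create one ticket row per purchased slot and return the generated guids."""
    guids = [uuid.uuid4().hex for _ in range(slots)]
    with conn:  # all ticket rows are committed together, or none at all
        conn.executemany(
            "INSERT INTO ticket (purchase_id, guid) VALUES (?, ?)",
            [(purchase_id, guid) for guid in guids],
        )
    return guids


tickets = issue_tickets(conn, purchase_id=1, slots=3)
print(tickets)
```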
Some brief SQL examples
Joining Everything (just to show the relationships):
SELECT
{some fields}
FROM
ticket AS t
JOIN
purchase AS p ON t.purchase_id = p.id
JOIN
customer AS c ON p.customer_id = c.id
JOIN
`show` AS s ON p.show_id = s.id
JOIN
venue AS v ON s.venue_id = v.id
JOIN
event AS e ON s.event_id = e.id
Counting the used slots for a show:
SELECT
SUM(slots) AS used_slots
FROM
purchase
WHERE
show_id = :show_id
GROUP BY show_id
Get the available slots for a show:
SELECT
v.address,
v.slots
FROM
venue AS v
JOIN
`show` AS s ON s.venue_id = v.id
WHERE
s.id = :show_id
# filter on show.id, since venue has no show_id column
It also works out nice that all the tables start with a different letter, which makes aliasing a bit easier.
-note- In MySQL, event is a keyword but not a reserved word, so it works as a table name without quoting. show, on the other hand, is a reserved word, so as a table name it has to be wrapped in backticks (`show`). Coincidentally, this is why I named purchase that instead of order, as "order" is a reserved word too.
I hope that helps and makes sense. I probably spent way more time on this than I should have, but I design things like this for a living and I really enjoy the data architecture part of it, so I can get a bit carried away at times.
This is a bit confusing. I can't find how to word it for Google and I just can't wrap my head around the logic to do this.
I have "contacts" and "sites" tables that I am storing data for in a database. We need to have a page showing the contacts information and what "sites" they are associated with AND have a page for information about a "site" and show what contacts are associated with it.
Right now I have a field for "contacts" that has comma separated ids of each site that its associated with and a field for "sites" that also has comma separated ids of each contact associated with that site.
When I create a new site with an associated contact, how should the logic go that will update the "associated sites" field on the contact's row in MySQL while also updating its own "associated contacts" field?
I think you are speaking about many-to-many relationships.
I assume you have a table design like this:
tbl_contacts tbl_sites
id | full_name id | label
1 | John Doe 1 | my website
3 | Maria Doe 2 | super website
You need a table to "link" the tables. This is a many-to-many table:
tbl_contacts2sites
id | contact_id | site_id
1 | 1 | 2
5 | 1 | 1
3 | 3 | 2
So, John Doe is assigned to both sites, but Maria only to the "super website".
This is the common way to design your relationships. You should avoid comma-separated lists for this kind of relationship.
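Both pages from the question (sites per contact, contacts per site) then become one join each. The following is a minimal sketch using Python's sqlite3 module instead of MySQL; the column types and the extra UNIQUE constraint are assumptions on top of the tables shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_contacts (id INTEGER PRIMARY KEY, full_name TEXT NOT NULL);
CREATE TABLE tbl_sites (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE tbl_contacts2sites (
    id         INTEGER PRIMARY KEY,
    contact_id INTEGER NOT NULL REFERENCES tbl_contacts(id),
    site_id    INTEGER NOT NULL REFERENCES tbl_sites(id),
    UNIQUE (contact_id, site_id)   -- the same link cannot be stored twice
);
INSERT INTO tbl_contacts VALUES (1, 'John Doe'), (3, 'Maria Doe');
INSERT INTO tbl_sites VALUES (1, 'my website'), (2, 'super website');
INSERT INTO tbl_contacts2sites (contact_id, site_id) VALUES (1, 2), (1, 1), (3, 2);
""")

# Contact page: every site linked to contact 1 (John Doe).
sites_for_john = [row[0] for row in conn.execute(
    """
    SELECT s.label
    FROM tbl_sites AS s
    JOIN tbl_contacts2sites AS cs ON cs.site_id = s.id
    WHERE cs.contact_id = ?
    ORDER BY s.id
    """,
    (1,),
)]

# Site page: every contact linked to site 2 ("super website").
contacts_for_super = [row[0] for row in conn.execute(
    """
    SELECT c.full_name
    FROM tbl_contacts AS c
    JOIN tbl_contacts2sites AS cs ON cs.contact_id = c.id
    WHERE cs.site_id = ?
    ORDER BY c.id
    """,
    (2,),
)]
print(sites_for_john, contacts_for_super)
```

Creating a new site with an associated contact is then two inserts (one into tbl_sites, one into tbl_contacts2sites); nothing on the contact's own row has to be updated.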
The best solution would be to create a third table (contacts2sites) with three columns: id, siteid, contactid.
In this table you can add every connection between sites and contacts. To get the data back out, you can use a MySQL query that joins all three tables.
little example:
solution with joins
I am developing a personal finance tracker (for fun!) and I have a table of categories. Each category is an entry in the table and at the end of the month they are all duplicated with their relevant balances reset to the start of the month reading for the new month.
Among others, these categories can be of type 'savings' and so have a running total. If I want to retrieve a category or update it then I used the category_id field and this works fine for the current working month but linking months together is breaking my brain. For the savings categories I want to show how the running_total has increased over the previous six months but in my current DB design, categories don't "know" about their previous months as they are created new at the start of each month.
The only way I could currently retrieve the last 6 months of a savings running_total is to search by the category name but this is potentially unreliable.
I have considered adding a field to the table, "previous_month_category_id", which would work as a way to link the categories together but would be expensive to implement, as it would require 6 MySQL operations, each time grabbing the "previous_month_category_id" from the result and then re-running the query.
If MySQL can do some kind of recursion then maybe this could work, but I feel like there is a more obvious answer staring me in the face.
I'm using Codeigniter and MYSQL but not scared of vanilla PHP if required.
Help on how to do this would be great.
UPDATE 1:
Below is a sample of what the savings category might look like mixed in amongst other categories. At the end of each month the entry is duplicated with the same category_name, type, budget, year, and users_id, but the category_id auto-increments, the month updates to the new month number and the running total is the previous running_total + the budget. How would I do one database query to retrieve these without using the category_name? The name could change if the user decided to call it "Bigger TV" at the end of July.
+-------------+--------------+------+--------+---------------+------+-------+----------+
| category_id |category_name | type | budget | running_total | year | month | users_id |
+-------------+--------------+------+--------+---------------+------+-------+----------+
| 44 | Big TV | sav | 20 | 240 | 2012 | 8 | 77 |
+-------------+--------------+------+--------+---------------+------+-------+----------+
| 32 | Big TV | sav | 20 | 220 | 2012 | 7 | 77 |
+-------------+--------------+------+--------+---------------+------+-------+----------+
| 24 | Big TV | sav | 20 | 200 | 2012 | 6 | 77 |
UPDATE 2:
I'm not sure I'm explaining myself very well So I'll put some more detail around how the app works and see if that helps.
I have tables called "categories", "transactions" and "users". A category can be one of three types, 1: Cash, 2: Regular Payment, 3: Savings. Think of cash and regular payment types as buckets, at the start of each month each bucket is full and the aim is to take money out of it and make sure there is still a bit left at the end of the month (or at least not negative).
This is fine on a month by month basis and works very well (for me, I have used this system for 2 years now I think). The trip up comes with Savings as they are linked month by month and are more like a big bucket that is added to each month (with a set increment called budget) until it overspills and is then drained (like Big TV would be when you buy it), or taken from a little bit here and there and the aim is to build up an emergency fund (like "When my car breaks down" type thing).
When the relevant information is displayed for each category only the current month is shown for cash and regular as that is all that is important, for the savings however the current amount is also shown but it would be nice to show a small history graph of how it had built up (or depleted) over time. To do this I need some way of searching for the previous end of month states of these categories so that the graph can be plotted but currently I can't work out how to link them all by anything other than the category_name.
I have tried to implement a bit of DB normalisation but this is the first schema I've implemented having known about normalisation so I've probably missed some aspects of it and possibly avoided any over normalisation where it didn't feel right.
Below are my tables:
categories
+-------------+--------------+------+--------+---------------+------+-------+----------+
| category_id |category_name | type | budget | running_total | year | month | users_id |
+-------------+--------------+------+--------+---------------+------+-------+----------+
transactions
+----------------+--------------+--------+------+----------+------------------------+
| transaction_id | description | amount | date | users_id | categories_category_id |
+----------------+--------------+--------+------+----------+------------------------+
they are joined on categories_category_id which is a foreign key
I have always worked off the premise that each category needs an new entry for each month but it seems from the comments and answers below that I would be better off having just one category entry regardless of month and then just calculating everything on the fly?
However, the budgets can be changed by the user, so for record keeping I'm not sure if this would work. Also, the "deposits" never really happen; it is just the category being duplicated at the end of the month, so I guess that would need to be dealt with.
The aim of this app has always been to decouple financial tracking from the physical transactions that occur in a bank account and provide a layer over someone's finances, thus allowing the user to avoid hard-to-explain transactions etc. and just focus on overall cash position. There is no concept of an "income" in this system, or a bank account.
It seems to me like your database design could use some work. I'm still not completely familiar with what you're really trying to do, but my initial thoughts would be to store each transaction as a single row in a table, and then query that table in different ways to generate different types of reports on it. Something like this:
transactions:
+----+---------+--------+---------------+-----------+-------------+
| id | user_id | amount | running_total | datestamp | category_id |
+----+---------+--------+---------------+-----------+-------------+
categories:
+----+------+------+
| id | name | type |
+----+------+------+
Don't increment the categories based on time. Add an entry to the categories table when you actually have a new category. If a transaction could possibly belong to multiple categories, then use a third (relational) table that relates transactions (based on transaction ID) to categories (based on category ID).
When you have a deposit, the amount field will be positive and for withdrawals, it will be negative. You can get your current running total by doing something like:
SELECT running_total FROM transactions
WHERE id = (SELECT MAX(id) FROM transactions WHERE user_id = '$userID');
You can find your total difference for a particular month by doing this:
SELECT SUM(amount) FROM transactions WHERE DATE_FORMAT(datestamp, '%c') = '$monthNumber';
You can find the total spending for a particular category by doing this:
SELECT SUM(t.amount) FROM transactions t
INNER JOIN categories c ON t.category_id = c.id WHERE c.name = 'Big TV';
There are plenty of other possibilities, but the purpose here is just to demonstrate a possibly better way to store your data.
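For the savings history graph from the question, the same layout can be queried per month. The sketch below makes several assumptions: it uses SQLite through Python's sqlite3 module instead of MySQL, the sample amounts are made up to mirror the "Big TV" rows, and the running total is computed from the amounts rather than stored in a running_total column.

```python
import sqlite3
from itertools import accumulate

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT NOT NULL, type TEXT NOT NULL);
CREATE TABLE transactions (
    id          INTEGER PRIMARY KEY,
    user_id     INTEGER NOT NULL,
    amount      INTEGER NOT NULL,      -- positive = deposit, negative = withdrawal
    datestamp   TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'Big TV', 'sav');
INSERT INTO transactions (user_id, amount, datestamp, category_id) VALUES
    (77, 200, '2012-06-01', 1),
    (77,  20, '2012-07-01', 1),
    (77,  20, '2012-08-01', 1);
""")

# Net change per month for one category, looked up by id rather than by name.
monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', datestamp) AS month, SUM(amount)
    FROM transactions
    WHERE category_id = ?
    GROUP BY month
    ORDER BY month
    """,
    (1,),
).fetchall()

months = [month for month, _ in monthly]
totals = accumulate(amount for _, amount in monthly)  # running total, month by month
history = list(zip(months, totals))[-6:]              # last six months for the graph
print(history)
```

Because the category row is never duplicated, renaming "Big TV" to "Bigger TV" changes one name and leaves the history intact.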
Take a look at the items table below. As you can see, this table is not normalized. Name should be in a separate table to normalize it.
mysql> select * from items;
+---------+--------+-----------+------+
| item_id | cat_id | name | cost |
+---------+--------+-----------+------+
| 1 | 102 | Mushroom | 5.00 |
| 2 | 2 | Mushroom | 5.40 |
| 3 | 173 | Pepperoni | 4.00 |
| 4 | 109 | Chips | 1.00 |
| 5 | 35 | Chips | 1.00 |
+---------+--------+-----------+------+
This table is not normalized because on the backend admin site, staff simply select a category and type in the item name to add data quickly. It is very quick. There are hundreds of items with the same name, but the cost is not always the same.
If I do normalize this table to something like this:
mysql> select * from items;
+---------+--------+--------------+------+
| item_id | cat_id | item_name_id | cost |
+---------+--------+--------------+------+
| 1 | 102 | 1 | 5.00 |
| 2 | 2 | 1 | 5.40 |
| 3 | 173 | 2 | 4.00 |
| 4 | 109 | 3 | 1.00 |
| 5 | 35 | 3 | 1.00 |
+---------+--------+--------------+------+
mysql> select * from item_name;
+--------------+-----------+
| item_name_id | name |
+--------------+-----------+
| 1 | Mushroom |
| 2 | Pepperoni |
| 3 | Chips |
+--------------+-----------+
Now how can I add an item (data) on the admin backend (from a data entry point of view) once this table has been normalized? I don't want something like a dropdown to select the item name: there will be thousands of different item names, and it will take a lot of time to find the item name and then type in the cost.
There needs to be a way to add items/data as quickly as possible. What is the solution to this? I have developed the backend in PHP.
Also, what is the solution for editing the item name? Staff might rename an item completely, for example Fish Kebab to Chicken Kebab, and that will affect all the categories without them realising it. There will also be some spelling mistakes that need correcting, like F1sh Kebab which should be Fish Kebab (this is where normalized tables are useful, as I will see the item name updated in every category).
I don't want something like a dropdown to select the item name: there will be thousands of different item names, and it will take a lot of time to find the item name and then type in the cost.
There are options for selecting existing items other than drop down boxes. You could use autocompletion, and only accept known values. I just want to be clear there are UI friendly ways to achieve your goals.
As for whether to do so or not, that is up to you. If the product names are varied slightly, is that a problem? Can small data integrity issues like this be corrected with batch jobs or similar if they are a problem?
Decide what your data should look like first, based on the design of your system. Worry about the best way to structure a UI after you've made that decision. Like I said, there are usable ways to design UI regardless of your data structuring.
I think you are good to go with your current design. For you, name is the product name and not the category name, and you probably want to avoid cases where renaming a single product would rename too many of them at once.
Normalization is a good thing, but you have to measure it against your specific needs, and in this case I really would not add an extra table item_name as you have shown above.
just my two cents :)
What are the dependencies supposed to be represented by your table? What are the keys? Based on what you've said I don't see how your second design is any more normalized than your first.
Presumably the determinants of "name" in the first design are the same as the determinants of "item_name_id" in the second? If so then moving name to another table won't make any difference to the normal forms satisfied by your items table.
User interface design has nothing to do with database design. You cannot let the UI drive the database design and expect sensible results.
You need to validate the data and check for existence prior to adding it to see if it's a new value.
$value = trim($_POST['userSubmittedValue']);
//never trust user input: a prepared statement keeps it out of the SQL text
//($mysqli is assumed to be an already-open mysqli connection)
$stmt = $mysqli->prepare('SELECT item_name_id FROM item_name WHERE name = ?');
$stmt->bind_param('s', $value);
$stmt->execute();
$stmt->bind_result($itemNameId);
if($stmt->fetch())
{
//the name already exists: add the record to the items table using $itemNameId
}
else
{
//this is a new value, so run queries to add it to both the item_name and items tables
}
$stmt->close();
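The same check-then-insert flow can be sketched in Python, with the standard-library sqlite3 module standing in for MySQL. The table and column names follow the ones above; the helper name get_or_create_item_name and the UNIQUE constraint on name are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_name ("
    " item_name_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL UNIQUE)"  # UNIQUE is an assumption: one row per name
)

def get_or_create_item_name(conn, name):
    """Return the id for name, inserting a new row only if it is unknown."""
    name = name.strip()
    row = conn.execute(
        "SELECT item_name_id FROM item_name WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]  # existing value: reuse its id
    cur = conn.execute("INSERT INTO item_name (name) VALUES (?)", (name,))
    return cur.lastrowid  # new value: id of the row just added

first = get_or_create_item_name(conn, "Fish Kebab")
second = get_or_create_item_name(conn, "Fish Kebab")  # same id, no new row
other = get_or_create_item_name(conn, "Chicken Kebab")
```

The returned id is what would then be written to the items table.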
There needs to be a way to add items/data as quickly as possible. What is the
solution to this? I have developed the backend in PHP.
User interface issues and database structure are separate issues. For a given database structure, there are usually several user-friendly ways to present and change the data. Data integrity comes from the database. The user interface just needs to know where to find unique values. The programmer decides how to use those unique values. You might use a drop-down list, pop up a search form, use autocomplete, compare what the user types to the elements in an array, or query the database to see whether the value already exists.
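As one illustration of the autocomplete option, the UI could ask the database for known names that start with what the user has typed so far. This is a rough sketch using Python's sqlite3; the function name suggest_names, the sample rows, and the limit of 10 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item_name (item_name_id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO item_name (name) VALUES (?)",
    [("Fish Kebab",), ("Chicken Kebab",), ("Fish Burger",), ("Mushroom",)],
)

def suggest_names(conn, typed, limit=10):
    """Return known item names that start with the typed prefix."""
    # Escape LIKE wildcards so the user's text is matched literally.
    escaped = typed.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT name FROM item_name"
        " WHERE name LIKE ? ESCAPE '\\'"
        " ORDER BY name LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [r[0] for r in rows]

# SQLite's LIKE ignores ASCII case by default, so "fi" matches "Fish ...".
suggestions = suggest_names(conn, "fi")
```

Only values returned by such a lookup would be accepted, so staff still type rather than scroll through thousands of entries.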
From your description, it sounds like you had a very quick way to add data in the first place: "staff simply select a category and type in the item name to add data quickly". (Replacing "mushroom" with '1' doesn't have anything to do with normalization.)
Also what is the solution for editing the item name? Staff might
rename the item name completely for example: Fish Kebab to Chicken
Kebab and that will affect all the categories without realising it.
You've allowed the wrong person to edit item names. Seriously.
This kind of issue arises in every database application. Allow only someone trained and trustworthy to make these kinds of changes. (See your dbms docs for GRANT and REVOKE. Also take a look at ON UPDATE RESTRICT.)
In our production database at work, I can insert new states (for the United States), and I can change existing state names to whatever I want. But if I changed "Alabama" to "Kyrgyzstan", I'd get fired. Because I'm supposed to know better than to do stuff like that.
But even though I'm the administrator, I can't edit a San Francisco address and change its ZIP code to '71601'. The database "knows" that '71601' isn't a valid ZIP code for San Francisco. Maybe you can add a table or two to your database, too. I can't tell from your description whether something like that would help you.
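One way a database can "know" this, sketched under assumptions: a lookup table of valid city/ZIP pairs and a composite foreign key from the address table. The table names, columns, and sample rows below are hypothetical and are not the schema of the production database described above; Python's sqlite3 is used only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
CREATE TABLE city_zip_codes (
    city TEXT NOT NULL,
    zip  TEXT NOT NULL,
    PRIMARY KEY (city, zip)
);
CREATE TABLE addresses (
    id     INTEGER PRIMARY KEY,
    street TEXT NOT NULL,
    city   TEXT NOT NULL,
    zip    TEXT NOT NULL,
    FOREIGN KEY (city, zip) REFERENCES city_zip_codes (city, zip)
);
INSERT INTO city_zip_codes VALUES ('San Francisco', '94103');
INSERT INTO addresses (street, city, zip)
VALUES ('1 Example St', 'San Francisco', '94103');
""")

try:
    # ('San Francisco', '71601') is not a listed pair, so this must fail.
    conn.execute("UPDATE addresses SET zip = '71601' WHERE id = 1")
    update_rejected = False
except sqlite3.IntegrityError:
    update_rejected = True

stored_zip = conn.execute("SELECT zip FROM addresses WHERE id = 1").fetchone()[0]
```

The rejection happens in the database itself, so it applies to every user and every application, administrators included.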
On systems where I'm not the administrator, I'd expect to have no permissions to insert rows into the table of states. In other tables, I might have permission to insert rows, but not to update or delete them.
There will be some spelling mistakes that may need correcting, like F1sh
Kebab, which should be Fish Kebab
The lesson is the same. Some people should be allowed to update items.name, and some people should not. Revoke permissions, restrict cascading updates, increase data integrity using more tables, or increase training.
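To illustrate the "restrict cascading updates" option, here is a minimal sketch in which items refer to names by value and the foreign key is declared ON UPDATE RESTRICT. The item_names table keyed on the name itself, the category value, and the cost are assumptions for the example. SQLite (via Python's sqlite3) is used because it runs anywhere; it has no GRANT or REVOKE, so permissions are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce FKs
conn.executescript("""
CREATE TABLE item_names (name TEXT PRIMARY KEY);
CREATE TABLE items (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL,
    name     TEXT NOT NULL REFERENCES item_names (name) ON UPDATE RESTRICT,
    cost     REAL
);
INSERT INTO item_names VALUES ('Fish Kebab');
INSERT INTO items (category, name, cost) VALUES ('Kebabs', 'Fish Kebab', 5.0);
""")

try:
    # The name is still referenced by a row in items, so the rename is refused.
    conn.execute(
        "UPDATE item_names SET name = 'Chicken Kebab' WHERE name = 'Fish Kebab'"
    )
    rename_blocked = False
except sqlite3.IntegrityError:
    rename_blocked = True

names = [r[0] for r in conn.execute("SELECT name FROM item_names")]
```

Under this design a rename cannot silently ripple through every category; whoever is allowed to rename has to deal with the referencing rows deliberately.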
I have one table GAMES and another PLAYERS. Currently each "game" has a column for players_in_game but I have nothing reciprocating in the PLAYERS table. Since this column is an array (Comma separated list of the player's ID #s) I'm thinking that it would probably be better to have each player's record also contain a list of the games they are a member of. On the other hand, duplicating the information in two separate tables might actually require more DB calls.
For perspective, there aren't likely to be more than a dozen players in a game (generally 4-6 is the norm) but there could potentially be a large number of games.
Is there a good way to figure out which would be more efficient?
Thanks.
Normalization is generally a good thing. Comma-delimited lists in a table are a sign that the table is in desperate need of a foreign key. If you're worried about extra queries, check out JOINs.
dbo.games
+----+----------+
| id | name     |
+----+----------+
| 1  | war      |
| 2  | invaders |
+----+----------+
dbo.players
+----+----------+---------+
| id | name     | game_id |
+----+----------+---------+
| 1  | john     | 1       |
| 2  | mike     | 1       |
+----+----------+---------+
SELECT games.name, COUNT(players.id) AS total_players FROM games LEFT JOIN players ON games.id = players.game_id GROUP BY games.name;
Result:
+-----------+---------------+
| name      | total_players |
+-----------+---------------+
| war       | 2             |
| invaders  | 0             |
+-----------+---------------+
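A runnable sketch of the counting query against the sample data, using Python's sqlite3 as a stand-in for the server (so the dbo. prefix is dropped). It runs the join both ways to show that only a LEFT JOIN keeps a game with no players, such as invaders with a count of 0; an INNER JOIN drops that row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE games (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE players (
    id INTEGER PRIMARY KEY,
    name TEXT,
    game_id INTEGER REFERENCES games (id)
);
INSERT INTO games VALUES (1, 'war'), (2, 'invaders');
INSERT INTO players VALUES (1, 'john', 1), (2, 'mike', 1);
""")

COUNT_SQL = """
SELECT games.name, COUNT(players.id) AS total_players
FROM games {join} JOIN players ON games.id = players.game_id
GROUP BY games.id, games.name
ORDER BY games.id
"""

# INNER JOIN only returns games that have at least one matching player.
inner_rows = conn.execute(COUNT_SQL.format(join="INNER")).fetchall()
# LEFT JOIN keeps every game; COUNT(players.id) ignores the NULL filler row.
left_rows = conn.execute(COUNT_SQL.format(join="LEFT")).fetchall()
```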
Sidenote: Go Hokies :)
Oh god, please don't use CSVs!! I know it's tempting when you're new to SQL, but it becomes unqueryable...
You need 3 tables: games, players, and players_in_games. games and players should each have a primary auto-incrementing key like id, and then players_in_games needs just two fields, player_id and game_id. This is called a "many to many" relationship. A player can play many games, and a game can have many players.
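The three-table layout might look like the sketch below, written with Python's sqlite3 so it can be run as-is. The sample rows and the helper names players_in and games_of are made up for the example; the composite primary key on the link table is an assumption that stops the same player being added to the same game twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE games   (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE players (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT NOT NULL);
CREATE TABLE players_in_games (
    player_id INTEGER NOT NULL REFERENCES players (id),
    game_id   INTEGER NOT NULL REFERENCES games (id),
    PRIMARY KEY (player_id, game_id)
);
INSERT INTO games (name) VALUES ('war'), ('invaders');
INSERT INTO players (name) VALUES ('john'), ('mike');
-- john plays both games, mike plays only war
INSERT INTO players_in_games (player_id, game_id) VALUES (1, 1), (2, 1), (1, 2);
""")

def players_in(conn, game_name):
    """Names of all players in one game."""
    rows = conn.execute(
        "SELECT p.name FROM players p"
        " JOIN players_in_games pg ON pg.player_id = p.id"
        " JOIN games g ON g.id = pg.game_id"
        " WHERE g.name = ? ORDER BY p.name",
        (game_name,),
    ).fetchall()
    return [r[0] for r in rows]

def games_of(conn, player_name):
    """Names of all games one player belongs to."""
    rows = conn.execute(
        "SELECT g.name FROM games g"
        " JOIN players_in_games pg ON pg.game_id = g.id"
        " JOIN players p ON p.id = pg.player_id"
        " WHERE p.name = ? ORDER BY g.name",
        (player_name,),
    ).fetchall()
    return [r[0] for r in rows]
```

Both directions of the relationship come from the same link rows, so nothing has to be stored twice.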
The right answer is a table called PlayersInGames that has a player id and a game id per row.
I would create a third table that links the players and games. Your comma-delimited list is effectively a third table, but parsing your list is almost certainly going to be less efficient than letting the database do it for you.
Ask yourself what happens if you remove a row from the GAME table. Now you'll have to loop over all the PLAYER rows, parse the list, figure out which ones contain a reference to the removed GAME, and then update all the lists.
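With a link table, that cleanup can be left to the database. A small sketch, assuming the link table's foreign keys are declared ON DELETE CASCADE (an option not mentioned above) and using Python's sqlite3 with made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades need FK enforcement on
conn.executescript("""
CREATE TABLE games   (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE players (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE players_in_games (
    player_id INTEGER NOT NULL REFERENCES players (id) ON DELETE CASCADE,
    game_id   INTEGER NOT NULL REFERENCES games (id)   ON DELETE CASCADE,
    PRIMARY KEY (player_id, game_id)
);
INSERT INTO games VALUES (1, 'war'), (2, 'invaders');
INSERT INTO players VALUES (1, 'john'), (2, 'mike');
INSERT INTO players_in_games VALUES (1, 1), (2, 1), (1, 2);
""")

# One statement: the links to game 1 are removed by the cascade,
# with no list parsing and no per-player updates.
conn.execute("DELETE FROM games WHERE id = 1")

remaining_links = conn.execute(
    "SELECT player_id, game_id FROM players_in_games ORDER BY player_id, game_id"
).fetchall()
player_count = conn.execute("SELECT COUNT(*) FROM players").fetchone()[0]
```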
Bad design. Let SQL do what it was born for. The query will be fast enough if you index it properly. Micro-optimizations like this are the wrong approach.
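What "index it properly" could mean for a link table, as a hedged sketch: the primary key serves lookups by game, and a second index serves lookups by player. The index name idx_pig_player and the column order are assumptions, and the plan text shown by EXPLAIN QUERY PLAN is specific to SQLite (used here through Python's sqlite3) and varies between versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players_in_games (
    player_id INTEGER NOT NULL,
    game_id   INTEGER NOT NULL,
    PRIMARY KEY (game_id, player_id)      -- serves "who is in this game?"
);
-- Reverse index, serves "which games is this player in?"
CREATE INDEX idx_pig_player ON players_in_games (player_id, game_id);
""")

def plan_for(conn, sql, params):
    """Return SQLite's query plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " ".join(row[3] for row in rows)  # 4th column holds the detail text

by_player_plan = plan_for(
    conn, "SELECT game_id FROM players_in_games WHERE player_id = ?", (1,)
)
by_game_plan = plan_for(
    conn, "SELECT player_id FROM players_in_games WHERE game_id = ?", (1,)
)
```

If both plans mention an index rather than a full table scan, lookups in either direction stay cheap even with a large number of games.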