CakePHP Normalization

I am debating over the amount of normalization to use in my tables.
For example, I have a database table called players with columns such as name, hometown, etc.
Other columns are options such as bats (right, left, or switch) or status (active, injured) that will be displayed as radio buttons or drop-downs.
Currently, our database stores these options in their own tables, bats and statuses, and we reference the related table with the fields bat_id and status_id.
If the bats and statuses tables are simply storing a list of names and ids and will always have less than 10 values, should I flatten the database and simply store the values directly in the players table?
When creating radio inputs for those fields I might have to execute a GROUP BY query on a large table. Would it make sense to store the possible values globally as an array in app/config/bootstrap.php, or by using the Configure class?

In my opinion, you can really seldom overuse normalization. I'd avoid globals as much as I can.
If your bats and statuses tables will hold "only" configuration or status key data you might put all of them together in a single table, accessible by namespace.
E.g.:
id | namespace | value
---------------------------
1 | bats | left
2 | bats | right
3 | bats | switch
4 | status | active
5 | status | injured
... etc, you get it. Simply have an index on the namespace to help the database - unless there are really only a few lines in there where a decent DB would ignore the index anyway.
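A minimal sketch of the single lookup table described above, assuming SQLite (through Python's sqlite3 module) as a stand-in for the real database. The table name options, the UNIQUE constraint (which doubles as the index on namespace), and the helper load_options are hypothetical names chosen for illustration, not part of the original answer.

```python
import sqlite3

# SQLite in memory stands in for the production database (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE options (
        id        INTEGER PRIMARY KEY,
        namespace TEXT NOT NULL,
        value     TEXT NOT NULL,
        UNIQUE (namespace, value)   -- also serves as an index led by namespace
    );
    CREATE TABLE players (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        bat_id    INTEGER NOT NULL REFERENCES options (id),
        status_id INTEGER NOT NULL REFERENCES options (id)
    );
""")
conn.executemany(
    "INSERT INTO options (namespace, value) VALUES (?, ?)",
    [("bats", "left"), ("bats", "right"), ("bats", "switch"),
     ("status", "active"), ("status", "injured")],
)
conn.execute(
    "INSERT INTO players (name, bat_id, status_id) VALUES (?, ?, ?)",
    ("Example Player", 3, 4),
)

def load_options(conn, namespace):
    """Return {id: value} for one namespace, e.g. to build radio buttons."""
    rows = conn.execute(
        "SELECT id, value FROM options WHERE namespace = ? ORDER BY id",
        (namespace,),
    )
    return dict(rows.fetchall())

def player_details(conn):
    """Resolve both option ids back to their display values."""
    return conn.execute("""
        SELECT p.name, b.value, s.value
        FROM players AS p
        JOIN options AS b ON b.id = p.bat_id
        JOIN options AS s ON s.id = p.status_id
    """).fetchall()

print(load_options(conn, "bats"))
print(player_details(conn))
```

The form only ever needs one small query per namespace, so no GROUP BY over the players table is required to list the choices.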

Related

MySQL & jQuery: Store dynamic generated content in database

I have a dynamic form that creates some inputs that will generate values and should be saved into the database. Each set of values should be saved separately in a single field in the database called "Education":
Should be stored like this:
+-----+--------+----------------------------------------------------+
| id  | name   | Education                                          |
+-----+--------+----------------------------------------------------+
| 100 | John   | [Harvard, Marketing,2009,2014] [MIT,CS,2005,2009]  |
| 101 | Daniel | [TEC, Marketing,2009,2014] [Stanford,CS,2001,2005] |
+-----+--------+----------------------------------------------------+
The Education field can have up to 10 sets of values, I'm just showing 2.
Please look at the JSFIDDLE to see how it actually works: http://jsfiddle.net/YueX2/6/
When a single set of values is edited, how can I save it so that only that set is updated in the database?
Also, is this the best way to do it?

There is a practice in designing databases called normalization which would lead you to the best way to go about it. Based on your Question and jsFiddle you would end up with 2 separate database tables.
Ex: tbl_Users: which would contain fields such as
userID
userName
Then you would have another table Ex: tbl_Education which would contain a few fields such as
record_id
userID
schoolName
In this table you would set the particular user's id in the userID field, which has to match the userID field from tbl_Users, and then a single school they attended in the schoolName field. If they attended multiple schools, they would have multiple entries in the tbl_Education table but only a single entry in the tbl_Users table. To retrieve the data you would perform a SQL query on the two tables and join on the userID field. This would result in multiple records being returned, but with all of the data needed.
Any information which is specific to the particular part of their education would go in the tbl_Education table and anything specific to the user (hair color, eye color, height, etc) would go in the tbl_Users table.
Ex SQL Query:
SELECT tbl_Users.userID, tbl_Users.userName, tbl_Education.schoolName
FROM tbl_Users, tbl_Education
WHERE tbl_Users.userID = tbl_Education.userID;
The WHERE clause is essentially the join between the two tables. There are many ways to write this query, I used the method which seems visually the easiest to see what is going on.
Here is the wikipedia link for normalization to get you started.
http://en.wikipedia.org/wiki/Database_normalization
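As a rough illustration of the two-table layout above, here is a sketch using SQLite through Python's sqlite3 module (an assumption; the question is about MySQL). It reuses the table and column names from the answer, writes the join with an explicit JOIN instead of the comma form, and shows that editing one education entry touches exactly one row. The sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Users (
        userID   INTEGER PRIMARY KEY,
        userName TEXT NOT NULL
    );
    CREATE TABLE tbl_Education (
        record_id  INTEGER PRIMARY KEY,
        userID     INTEGER NOT NULL REFERENCES tbl_Users (userID),
        schoolName TEXT NOT NULL
    );
    INSERT INTO tbl_Users VALUES (100, 'John'), (101, 'Daniel');
    INSERT INTO tbl_Education (userID, schoolName) VALUES
        (100, 'Harvard'), (100, 'MIT'), (101, 'TEC');
""")

def update_school(conn, record_id, school_name):
    """Edit a single education entry; no other row is rewritten."""
    cur = conn.execute(
        "UPDATE tbl_Education SET schoolName = ? WHERE record_id = ?",
        (school_name, record_id),
    )
    return cur.rowcount

def users_with_schools(conn):
    """Same result as the WHERE-style query, written as an explicit JOIN."""
    return conn.execute("""
        SELECT u.userID, u.userName, e.schoolName
        FROM tbl_Users AS u
        JOIN tbl_Education AS e ON e.userID = u.userID
        ORDER BY u.userID, e.record_id
    """).fetchall()

changed = update_school(conn, 2, "Stanford")
print(changed, users_with_schools(conn))
```

Because each education entry has its own record_id, the form can send that id along with the edited values and the update stays scoped to one row.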

This is not the best way to solve this problem. You would be better off with two tables.
Table users:
+--------+---------+
| id | name |
+--------+---------+
Table education:
+--------+----------+------------+-------+-----+------+
| id | location | discipline | start | end | user |
+--------+----------+------------+-------+-----+------+
In education.user you store a foreign key referencing users.id in the users table. The input form is somewhat more complex to implement, but you have no limit on the number of entries per user and little overhead in your database.

Followers/following database structure

My website has a followers/following system (like Twitter's). My dilemma is creating the database structure to handle who's following who.
What I came up with was creating a table like this:
id | user_id | followers | following
1 | 20 | 23,58,84 | 11,156,27
2 | 21 | 72,35,14 | 6,98,44,12
... | ... | ... | ...
Basically, I was thinking that each user would have a row with columns for their followers and the users they're following. The followers and people they're following would have their user id's separated by commas.
Is this an effective way of handling it? If not, what's the best alternative?

That's the worst way to do it. It's against normalization. Have two separate tables: Users and User_Followers. Users will store user information. User_Followers will be like this:
id | user_id | follower_id
1 | 20 | 45
2 | 20 | 53
3 | 32 | 20
User_Id and Follower_Id will be foreign keys referencing the Id column in the Users table.

There is a better physical structure than proposed by other answers so far:
CREATE TABLE follower (
    user_id INT, -- References user.
    follower_id INT, -- References user.
    PRIMARY KEY (user_id, follower_id),
    UNIQUE INDEX (follower_id, user_id)
);
InnoDB tables are clustered, so the secondary indexes behave differently than in heap-based tables and can have unexpected overheads if you are not cognizant of that. Having a surrogate primary key id just adds another index for no good reason [1] and makes indexes on {user_id, follower_id} and {follower_id, user_id} fatter than they need to be (because secondary indexes in a clustered table implicitly include a copy of the PK).
The table above has no surrogate key id and (assuming InnoDB) is physically represented by two B-Trees (one for the primary/clustering key and one for the secondary index), which is about as efficient as it gets for searching in both directions [2]. If you only need one direction, you can abandon the secondary index and go down to just one B-Tree.
BTW what you did was a violation of the principle of atomicity, and therefore of 1NF.
[1] And every additional index takes space, lowers the cache effectiveness and impacts the INSERT/UPDATE/DELETE performance.
[2] From followee to follower and vice versa.
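A small sketch of the composite-key design, assuming SQLite through Python's sqlite3 module rather than MySQL/InnoDB. SQLite's WITHOUT ROWID option is only a rough analogue of InnoDB's clustered primary key, and the index name follower_reverse and both helper functions are hypothetical. The rows reuse the sample ids from the earlier answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE follower (
        user_id     INTEGER NOT NULL,   -- the user being followed
        follower_id INTEGER NOT NULL,   -- the user who follows
        PRIMARY KEY (user_id, follower_id)
    ) WITHOUT ROWID;
    -- Reverse index for "who does X follow?" lookups.
    CREATE UNIQUE INDEX follower_reverse ON follower (follower_id, user_id);
""")
conn.executemany(
    "INSERT INTO follower (user_id, follower_id) VALUES (?, ?)",
    [(20, 45), (20, 53), (32, 20)],
)

def followers_of(conn, user_id):
    """Ids of users who follow user_id (served by the primary key)."""
    rows = conn.execute(
        "SELECT follower_id FROM follower WHERE user_id = ? ORDER BY follower_id",
        (user_id,),
    )
    return [r[0] for r in rows]

def following_of(conn, follower_id):
    """Ids of users that follower_id follows (served by the reverse index)."""
    rows = conn.execute(
        "SELECT user_id FROM follower WHERE follower_id = ? ORDER BY user_id",
        (follower_id,),
    )
    return [r[0] for r in rows]

print(followers_of(conn, 20), following_of(conn, 20))
```

With the pair itself as the primary key, inserting the same relationship twice is rejected by the database, so no separate duplicate check is needed.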

One weakness of that representation is that each relationship is encoded twice: once in the row for the follower and once in the row for the following user, making it harder to maintain data integrity and updates tedious.
I would make one table for users and one table for relationships. The relationship table would look like:
id | follower | following
1 | 23 | 20
2 | 58 | 20
3 | 84 | 20
4 | 20 | 11
...
This way adding new relationships is simply an insert, and removing relationships is a delete. It's also much easier to roll up the counts to determine how many followers a given user has.

No, the approach you describe has a few problems.
First, storing multiple data points as comma-separated strings has a number of issues. It's difficult to join on (you can join using LIKE, but it will slow down performance), it's difficult and slow to search on, and it can't be indexed the way you would want.
Second, if you store both a list of followers and a list of people following, you have redundant data (the fact that A is following B will show up in two places), which is both a waste of space, and also creates the potential of data getting out-of-sync (if the database shows A on B's list of followers, but doesn't show B on A's list of following, then the data is inconsistent in a way that's very hard to recover from).
Instead, use a join table. That's a separate table where each row has a user id and a follower id. This allows things to be stored in one place, allows indexing and joining, and also allows you to add additional columns to that row, for example to show when the following relationship started.

Storing a User's Statistics in a Table: which of these two methods should I be using?

What's the best way to store site statistics for specific users? Basically I want to store how many times a user has done a specific task. The data will be coming from a potentially large table and will be referenced frequently, so I want to avoid COUNT() and store them in their own table.
Method A
Have a table with the following fields, then have a row for each user to store the count for each field:
User_id | posted_comments | comment_replies | post_upvotes | post_downvotes
50 | 12 | 7 | 23 | 54
Method B
Have one table storing the actions, and another storing the count for that action:
Table 1:
Id | Action
1 | posted_comments
2 | comment_replies
3 | post_upvotes
4 | post_downvotes
Table 2
User_id | Action | Count
50 | 1 | 12
50 | 2 | 7
50 | 3 | 23
50 | 4 | 54
I can't see me having more than 25-30 actions in total, but I'm not sure if that is too many to store horizontally as in method A.

I think you answered your question. If you don't know what the actions are, then store each action in a separate row. That would be the second option.
Be sure that you have the proper indexes on the table. One possibility is (user_id, action, count). With this index, it will be fast to denormalize the table at the user level.
If you have a well-defined problem and won't need to be adding/removing/renaming columns in a table, then the first version is also feasible. Otherwise, just stick with inserting rows. The queries may seem a little bit more complicated, but the application is more flexible.
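To make the row-per-action option concrete, here is a hedged sketch in Python with SQLite (an assumption; the question does not name a database). The table names actions and user_action_counts, the column total (used in place of Count), and both helpers are hypothetical. The composite primary key plays the role of the suggested index, and stats_for turns one user's rows back into a single record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actions (
        id     INTEGER PRIMARY KEY,
        action TEXT NOT NULL UNIQUE
    );
    CREATE TABLE user_action_counts (
        user_id   INTEGER NOT NULL,
        action_id INTEGER NOT NULL REFERENCES actions (id),
        total     INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (user_id, action_id)
    );
    INSERT INTO actions VALUES
        (1, 'posted_comments'), (2, 'comment_replies'),
        (3, 'post_upvotes'), (4, 'post_downvotes');
    INSERT INTO user_action_counts VALUES
        (50, 1, 12), (50, 2, 7), (50, 3, 23), (50, 4, 54);
""")

def record_action(conn, user_id, action_id):
    """Create the counter row if it is missing, then increment it."""
    conn.execute(
        "INSERT OR IGNORE INTO user_action_counts (user_id, action_id, total) "
        "VALUES (?, ?, 0)",
        (user_id, action_id),
    )
    conn.execute(
        "UPDATE user_action_counts SET total = total + 1 "
        "WHERE user_id = ? AND action_id = ?",
        (user_id, action_id),
    )

def stats_for(conn, user_id):
    """Pivot one user's per-action rows into a {action_name: total} dict."""
    rows = conn.execute(
        """SELECT a.action, c.total
           FROM user_action_counts AS c
           JOIN actions AS a ON a.id = c.action_id
           WHERE c.user_id = ?""",
        (user_id,),
    )
    return dict(rows.fetchall())

print(stats_for(conn, 50))
```

Adding a new kind of action is then one INSERT into actions, with no schema change.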

Seems like a typical BI question to me. The real question is not how many "actions" you have in your dimension, but how often they change.
Table A is denormalized and quick and easy to read: with a "SELECT" you get your information in the proper format.
Table B is normalized and easier to maintain. It is highly recommended if your list of actions is difficult to define in advance, and is a must if it is dynamic.
Converting back and forth between Table A and Table B is known as a pivot operation, for which you can find standard tools, but which is never easy to code manually. So do not jump too quickly to the conclusion that Table B is better just because everybody has said so since Codd in 1970.
I suggest you ask yourself how often your COUNT(*) table(s) will be read. If you can live with yesterday's statistics, then compute BOTH tables every night.

Worried about too many fields in table

I am currently planning my next project, which is a text-based MMORPG. I am trying to design certain parts of the database and have hit a problem that I have never had before. One part of the game allows the player to buy a car and add addons to it. I was going to have a separate table to manage the addons for the car, but a user could have up to 100 addons for a single car, which would require over 100 fields. I am not happy with that many fields in one table, as it could become difficult to manage. Is there any other way to split them up into multiple tables?
Thanks

Why does each addon have to be a separate column? Couldn't you have a many-to-many join table that would link car to addon?
Car
ID | Owner
1 | Jacob
2 | Mary
Addon
ID | Name | Price
1 | Flame decal | $10
2 | CD Changer | $150
Car_Addon
Car_ID | Addon_Id
1 | 1
1 | 2
2 | 2
This indicates that Jacob's car has a flame decal and a cd changer, while Mary's car only has a cd changer.
Advantages of this approach:
You can use foreign key constraints to ensure that no invalid records can be created
It's easy to query in either direction -- which addons does this car have or which cars have a given addon
The meaning of the relation is clear -- you're not relying on decoding serialized data within a single field
You can store data about the association between car and addon -- the car_addon table can have a column for when the addon was added to that car, how it was paid for, whether it was part of a discount package, etc.
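The join-table layout above can be sketched as follows, assuming SQLite through Python's sqlite3 module (the question does not name a database). Table and column names follow the answer, prices are stored as whole dollars for simplicity, and the two helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
    CREATE TABLE car   (id INTEGER PRIMARY KEY, owner TEXT NOT NULL);
    CREATE TABLE addon (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                        price INTEGER NOT NULL);
    CREATE TABLE car_addon (
        car_id   INTEGER NOT NULL REFERENCES car (id),
        addon_id INTEGER NOT NULL REFERENCES addon (id),
        PRIMARY KEY (car_id, addon_id)
    );
    INSERT INTO car VALUES (1, 'Jacob'), (2, 'Mary');
    INSERT INTO addon VALUES (1, 'Flame decal', 10), (2, 'CD Changer', 150);
    INSERT INTO car_addon VALUES (1, 1), (1, 2), (2, 2);
""")

def addons_for_car(conn, car_id):
    """Which addons does this car have?"""
    rows = conn.execute(
        """SELECT a.name FROM car_addon AS ca
           JOIN addon AS a ON a.id = ca.addon_id
           WHERE ca.car_id = ? ORDER BY a.name""",
        (car_id,),
    )
    return [r[0] for r in rows]

def owners_with_addon(conn, addon_id):
    """Which owners have a car with this addon?"""
    rows = conn.execute(
        """SELECT c.owner FROM car_addon AS ca
           JOIN car AS c ON c.id = ca.car_id
           WHERE ca.addon_id = ? ORDER BY c.owner""",
        (addon_id,),
    )
    return [r[0] for r in rows]

print(addons_for_car(conn, 1), owners_with_addon(conn, 2))
```

A car with 100 addons is simply 100 rows in car_addon, and a row pointing at a car or addon that does not exist is rejected by the foreign key.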

You have a many-to-many relationship between cars and addons. You need an intermediary junction table to resolve that relationship.

No.
Split them into multiple tables. If you have 100+ fields in a table, 99.9% of the time you haven't normalized your design enough. A sure sign of a badly structured database is a lot of sparsely populated fields.
Why are you hesitant to split it?

You should have a table for cars, for example CAR (ID, Name), a table for addons, ADDON (ID, Name), and another table to link these tables, called CAR_ADDON (idCar, idADDON).
That would be the best approach.

Database Normalisation and Data Entry (admin backend)

Take a look at the items table below. As you can see, this table is not normalized: name should be in a separate table to normalize it.
mysql> select * from items;
+---------+--------+-----------+------+
| item_id | cat_id | name | cost |
+---------+--------+-----------+------+
| 1 | 102 | Mushroom | 5.00 |
| 2 | 2 | Mushroom | 5.40 |
| 3 | 173 | Pepperoni | 4.00 |
| 4 | 109 | Chips | 1.00 |
| 5 | 35 | Chips | 1.00 |
+---------+--------+-----------+------+
This table is not normalized because on the backend admin site, staff simply select a category and type in the item name to add data quickly. It is very quick. There are hundreds of items with the same name, but the cost is not always the same.
If I do normalize this table to something like this:
mysql> select * from items;
+---------+--------+--------------+------+
| item_id | cat_id | item_name_id | cost |
+---------+--------+--------------+------+
| 1 | 102 | 1 | 5.00 |
| 2 | 2 | 1 | 5.40 |
| 3 | 173 | 2 | 4.00 |
| 4 | 109 | 3 | 1.00 |
| 5 | 35 | 3 | 1.00 |
+---------+--------+--------------+------+
mysql> select * from item_name;
+--------------+-----------+
| item_name_id | name |
+--------------+-----------+
| 1 | Mushroom |
| 2 | Pepperoni |
| 3 | Chips |
+--------------+-----------+
Now, how can I add an item on the admin backend (from a data entry point of view) now that this table has been normalized? I don't want something like a dropdown to select the item name: there will be thousands of different item names, and it would take a lot of time to find the item name and then type in the cost.
There needs to be a way to add items as quickly as possible. What is the solution to this? I have developed the backend in PHP.
Also, what is the solution for editing an item name? Staff might rename an item completely, for example Fish Kebab to Chicken Kebab, and that would affect all the categories without them realising it. There will also be spelling mistakes that need correcting, like F1sh Kebab, which should be Fish Kebab. (This is where normalized tables are useful: the corrected item name shows up in every category.)

I don't want like a dropdown to select item name - there will be thousands of different item name - it will take a lot of time to find the item name and then type in the cost.
There are options for selecting existing items other than drop down boxes. You could use autocompletion, and only accept known values. I just want to be clear there are UI friendly ways to achieve your goals.
As for whether to do so or not, that is up to you. If the product names are varied slightly, is that a problem? Can small data integrity issues like this be corrected with batch jobs or similar if they are a problem?
Decide what your data should look like first, based on the design of your system. Worry about the best way to structure a UI after you've made that decision. Like I said, there are usable ways to design UI regardless of your data structuring.

I think you are good to go with your current design. In your case name is the product name, not the category name, and you probably want to avoid cases where renaming a single product would rename too many of them at once.
Normalization is a good thing, but you have to weigh it against your specific needs, and in this case I really would not add an extra item_name table as you have shown above.
just my two cents :)

What are the dependencies supposed to be represented by your table? What are the keys? Based on what you've said, I don't see how your second design is any more normalized than your first.
Presumably the determinants of "name" in the first design are the same as the determinants of "item_name_id" in the second? If so, then moving name to another table won't make any difference to the normal forms satisfied by your items table.
User interface design has nothing to do with database design. You cannot let the UI drive the database design and expect sensible results.

You need to validate the data and check for existence prior to adding it to see if it's a new value.
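// Assumes $pdo is an existing PDO connection. PDO prepared statements are
// used here in place of the mysql_* functions, which were removed in PHP 7.
$value = trim($_POST['userSubmittedValue'] ?? '');
// Never trust user input: bind it as a parameter instead of
// interpolating it into the SQL string.
$stmt = $pdo->prepare('SELECT item_name_id FROM item_name WHERE name = :name');
$stmt->execute([':name' => $value]);
$row = $stmt->fetch(PDO::FETCH_ASSOC);
if ($row !== false)
{
    // existing name: add the record to the items table using $row['item_name_id']
}
else
{
    // new value: run queries to add it to both the item_name and items tables
}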
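The same check-then-insert flow, with both branches filled in, might look like the sketch below. It is written in Python with SQLite purely so that it is self-contained and runnable; the function add_item, the cost_cents column (cost stored as integer cents), and the UNIQUE constraint on name are assumptions for illustration, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item_name (
        item_name_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL UNIQUE
    );
    CREATE TABLE items (
        item_id      INTEGER PRIMARY KEY,
        cat_id       INTEGER NOT NULL,
        item_name_id INTEGER NOT NULL REFERENCES item_name (item_name_id),
        cost_cents   INTEGER NOT NULL
    );
""")

def add_item(conn, cat_id, name, cost_cents):
    """Reuse the name's id if it already exists, otherwise create it first."""
    name = name.strip()
    row = conn.execute(
        "SELECT item_name_id FROM item_name WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        name_id = row[0]            # known name: reuse its id
    else:
        name_id = conn.execute(     # new name: add it to item_name
            "INSERT INTO item_name (name) VALUES (?)", (name,)
        ).lastrowid
    return conn.execute(
        "INSERT INTO items (cat_id, item_name_id, cost_cents) VALUES (?, ?, ?)",
        (cat_id, name_id, cost_cents),
    ).lastrowid

add_item(conn, 102, "Mushroom", 500)
add_item(conn, 2, "Mushroom", 540)
add_item(conn, 173, "Pepperoni", 400)
```

Staff can keep typing the item name as before; the lookup decides whether a new item_name row is needed.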

There needs to be a way to add items as quickly as possible. What is the
solution to this? I have developed the backend in PHP.
User interface issues and database structure are separate issues. For a given database structure, there are usually several user-friendly ways to present and change the data. Data integrity comes from the database. The user interface just needs to know where to find unique values. The programmer decides how to use those unique values. You might use a drop-down list, pop up a search form, use autocomplete, compare what the user types to the elements in an array, or query the database to see whether the value already exists.
From your description, it sounds like you had a very quick way to add data in the first place: "staff simply select a category and type in the item name to add data quickly". (Replacing "mushroom" with '1' doesn't have anything to do with normalization.)
Also, what is the solution for editing an item name? Staff might
rename an item completely, for example Fish Kebab to Chicken
Kebab, and that would affect all the categories without them realising it.
You've allowed the wrong person to edit item names. Seriously.
This kind of issue arises in every database application. Allow only someone trained and trustworthy to make these kinds of changes. (See your dbms docs for GRANT and REVOKE. Also take a look at ON UPDATE RESTRICT.)
In our production database at work, I can insert new states (for the United States), and I can change existing state names to whatever I want. But if I changed "Alabama" to "Kyrgyzstan", I'd get fired. Because I'm supposed to know better than to do stuff like that.
But even though I'm the administrator, I can't edit a San Francisco address and change its ZIP code to '71601'. The database "knows" that '71601' isn't a valid ZIP code for San Francisco. Maybe you can add a table or two to your database, too. I can't tell from your description whether something like that would help you.
On systems where I'm not the administrator, I'd expect to have no permissions to insert rows into the table of states. In other tables, I might have permission to insert rows, but not to update or delete them.
There will also be spelling mistakes that need correcting, like F1sh
Kebab, which should be Fish Kebab.
The lesson is the same. Some people should be allowed to update items.name, and some people should not. Revoke permissions, restrict cascading updates, increase data integrity using more tables, or increase training.
