Database Normalisation and Data Entry (admin backend)

Take a look at the items table below. As you can see, this table is not normalized; name should be in a separate table to normalize it.
mysql> select * from items;
+---------+--------+-----------+------+
| item_id | cat_id | name      | cost |
+---------+--------+-----------+------+
|       1 |    102 | Mushroom  | 5.00 |
|       2 |      2 | Mushroom  | 5.40 |
|       3 |    173 | Pepperoni | 4.00 |
|       4 |    109 | Chips     | 1.00 |
|       5 |     35 | Chips     | 1.00 |
+---------+--------+-----------+------+
This table is not normalized because on the backend admin site, staff simply select a category and type in the item name to add data quickly. It is very quick. Hundreds of items share the same name, but the cost is not always the same.
If I do normalize this table to something like this:
mysql> select * from items;
+---------+--------+--------------+------+
| item_id | cat_id | item_name_id | cost |
+---------+--------+--------------+------+
|       1 |    102 |            1 | 5.00 |
|       2 |      2 |            1 | 5.40 |
|       3 |    173 |            2 | 4.00 |
|       4 |    109 |            3 | 1.00 |
|       5 |     35 |            3 | 1.00 |
+---------+--------+--------------+------+
mysql> select * from item_name;
+--------------+-----------+
| item_name_id | name      |
+--------------+-----------+
|            1 | Mushroom  |
|            2 | Pepperoni |
|            3 | Chips     |
+--------------+-----------+
Now that the table has been normalized, how can I add an item on the admin backend (from a data entry point of view)? I don't want something like a dropdown to select the item name: there will be thousands of different item names, and it will take a lot of time to find the item name and then type in the cost.
There needs to be a way to add an item as quickly as possible. What is the solution to this? I have developed the backend in PHP.
Also, what is the solution for editing the item name? Staff might rename an item completely, for example Fish Kebab to Chicken Kebab, and that will affect all the categories without them realising it. There will also be some spelling mistakes that need correcting, like F1sh Kebab, which should be Fish Kebab (this is where normalized tables are useful, as I will see the item name updated in every category).

I don't want something like a dropdown to select the item name: there will be thousands of different item names, and it will take a lot of time to find the item name and then type in the cost.
There are options for selecting existing items other than drop down boxes. You could use autocompletion, and only accept known values. I just want to be clear there are UI friendly ways to achieve your goals.
As for whether to do so or not, that is up to you. If the product names are varied slightly, is that a problem? Can small data integrity issues like this be corrected with batch jobs or similar if they are a problem?
Decide what your data should look like first, based on the design of your system. Worry about the best way to structure a UI after you've made that decision. Like I said, there are usable ways to design UI regardless of your data structuring.
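As a rough illustration of the autocompletion idea, here is a minimal PHP sketch of the server-side lookup. The function name `suggestItemNames` and the in-memory array are assumptions made for the example; in a real backend the candidate names would more likely come from a query against the item names, and the result would be returned to the browser as JSON.

```php
<?php
/**
 * Return up to $limit known item names that start with what the user typed.
 * Hypothetical helper: $knownNames stands in for names loaded from the database.
 */
function suggestItemNames(array $knownNames, string $typed, int $limit = 10): array
{
    $typed = trim($typed);
    if ($typed === '') {
        return []; // nothing typed yet, nothing to suggest
    }
    $matches = [];
    foreach ($knownNames as $name) {
        // Case-insensitive comparison of the first strlen($typed) bytes.
        if (strncasecmp($name, $typed, strlen($typed)) === 0) {
            $matches[] = $name;
        }
    }
    sort($matches, SORT_STRING | SORT_FLAG_CASE);
    return array_slice($matches, 0, $limit);
}

// Example: the staff member has typed "ch" so far.
$suggestions = suggestItemNames(['Mushroom', 'Pepperoni', 'Chips', 'Chicken Kebab'], 'ch');
echo json_encode($suggestions), PHP_EOL;
```

Accepting only values that appear in the suggestion list keeps the set of names clean, while staff still only type a few characters.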

I think you are good to go with your current design. For you, name is the product name and not the category name, and you probably want to avoid cases where renaming a single product would rename too many of them at once.
Normalization is a good thing, but you have to weigh it against your specific needs, and in this case I really would not add an extra item_name table as you have shown above.
just my two cents :)

What are the dependencies supposed to be represented by your table? What are the keys? Based on what you've said, I don't see how your second design is any more normalized than your first.
Presumably the determinants of "name" in the first design are the same as the determinants of "item_name_id" in the second? If so, then moving name to another table won't make any difference to the normal forms satisfied by your items table.
User interface design has nothing to do with database design. You cannot let the UI drive the database design and expect sensible results.

You need to validate the data and check for existence prior to adding it to see if it's a new value.
$value = trim($_POST['userSubmittedValue'] ?? '');
// Never trust user input: bind $value as a parameter instead of
// interpolating it into the SQL string.
// $mysqli is assumed to be an open mysqli connection (the old mysql_*
// functions were removed in PHP 7).
$stmt = $mysqli->prepare('SELECT item_name_id FROM item_name WHERE name = ?');
$stmt->bind_param('s', $value);
$stmt->execute();
$stmt->bind_result($itemNameId);
if ($stmt->fetch())
{
    //add the record with the id in $itemNameId to the items table
}
else
{
    //this will be a new value so run queries to add the new value to both items and item_name tables
}
$stmt->close();
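The two branches left as comments above could be filled in roughly as follows. This is only a sketch under a few assumptions: it uses PDO rather than mysqli, the helper names `findOrCreateItemNameId` and `addItem` are made up for the example, the connection is in exception mode, and `item_name.name` is assumed to carry a UNIQUE index so that two staff members adding the same new name at the same moment cannot create duplicates (the second insert would fail and could be retried).

```php
<?php
/**
 * Return the item_name_id for $name, inserting a new item_name row if needed.
 * Assumes $pdo uses PDO::ERRMODE_EXCEPTION and item_name.name is UNIQUE.
 */
function findOrCreateItemNameId(PDO $pdo, string $name): int
{
    $name = trim($name);
    $select = $pdo->prepare('SELECT item_name_id FROM item_name WHERE name = ?');
    $select->execute([$name]);
    $id = $select->fetchColumn();
    if ($id !== false) {
        return (int) $id; // existing name: reuse its id
    }
    $insert = $pdo->prepare('INSERT INTO item_name (name) VALUES (?)');
    $insert->execute([$name]);
    return (int) $pdo->lastInsertId();
}

/**
 * Add an item inside one transaction, so an items row never points at an
 * item_name row that failed to be written.
 */
function addItem(PDO $pdo, int $catId, string $name, string $cost): int
{
    $pdo->beginTransaction();
    try {
        $nameId = findOrCreateItemNameId($pdo, $name);
        $insert = $pdo->prepare(
            'INSERT INTO items (cat_id, item_name_id, cost) VALUES (?, ?, ?)'
        );
        $insert->execute([$catId, $nameId, $cost]);
        $itemId = (int) $pdo->lastInsertId();
        $pdo->commit();
        return $itemId;
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
}
```

From the staff member's point of view nothing changes: they still pick a category, type a name and a cost, and the backend decides whether the name is new.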

There needs to be a way to add an item as quickly as possible. What is the
solution to this? I have developed the backend in PHP.
User interface issues and database structure are separate issues. For a given database structure, there are usually several user-friendly ways to present and change the data. Data integrity comes from the database. The user interface just needs to know where to find unique values. The programmer decides how to use those unique values. You might use a drop-down list, pop up a search form, use autocomplete, compare what the user types to the elements in an array, or query the database to see whether the value already exists.
From your description, it sounds like you had a very quick way to add data in the first place: "staff simply select a category and type in the item name to add data quickly". (Replacing "mushroom" with '1' doesn't have anything to do with normalization.)
Also, what is the solution for editing the item name? Staff might
rename an item completely, for example Fish Kebab to Chicken
Kebab, and that will affect all the categories without them realising it.
You've allowed the wrong person to edit item names. Seriously.
This kind of issue arises in every database application. Allow only someone trained and trustworthy to make these kinds of changes. (See your dbms docs for GRANT and REVOKE. Also take a look at ON UPDATE RESTRICT.)
In our production database at work, I can insert new states (for the United States), and I can change existing state names to whatever I want. But if I changed "Alabama" to "Kyrgyzstan", I'd get fired. Because I'm supposed to know better than to do stuff like that.
But even though I'm the administrator, I can't edit a San Francisco address and change its ZIP code to '71601'. The database "knows" that '71601' isn't a valid ZIP code for San Francisco. Maybe you can add a table or two to your database, too. I can't tell from your description whether something like that would help you.
On systems where I'm not the administrator, I'd expect to have no permissions to insert rows into the table of states. In other tables, I might have permission to insert rows, but not to update or delete them.
There will also be some spelling mistakes that need correcting, like F1sh
Kebab, which should be Fish Kebab
The lesson is the same. Some people should be allowed to update items.name, and some people should not. Revoke permissions, restrict cascading updates, increase data integrity using more tables, or increase training.

Related

mySQL table organisation for multi-dimensional categorisation

Existing System:
I have a mySQL database that stores category related information for approximately 200 different unique users. The information being stored and retrieved for each user is in the hierarchy of
imageCategories
  > Parent Category 1
      > Child Category 1 : "45,19,3,4,8"
      > Child Category 2 : "17,1,99"
      > ... etc
  > Parent Category 2
      > Child Category 1 : "83,6"
      > Child Category 2 : "19,74,26"
      > ... etc
  > etc
The string value of each child category is a series of comma-separated ids which reference descriptions (on a separate table) stored under that child category. I store all of this as an array in a column for each user by means of a json_encoded string in the form of:
{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],"Child Category 2":["17,1,99"]},"Parent Category 2":{"Child Category 1":["83,6"],"Child Category 2":["19,74,26"]}}
The system works by retrieving this json string when a user logs in and decoding it to a session array. Whenever any changes are made to it, it's re-encoded to a json string, saved to the database, and the session array is updated to reflect this. This works fine. Although my research way back when led me to do it this way, I was never quite sure whether storing a multi-dimensional array in mySQL is best practice. What I do know is that this keeps organising it quite stress-free and I haven't noticed it causing a lot of overhead, which is not to say that it doesn't.
The conundrum:
What I want to do now is add a string description to each Child Category in the database. Potentially to each Parent Category later but baby steps first.
I was initially going to start a third dimension for the overall array. Instead of:
"Child Category Key" : "id string"
I would change it to:
"Child Category Key" : ["id string", "description string"]
or:
"Child Category Key" : ["id string", id for description on another table]
I don't see an issue with either, but I'm wondering if I'm veering way off best practices. Should I be creating a new table for the entire category structure, rather than storing all of it as a json string in a column with other user settings (it's never going to get too unwieldy in terms of character length)? The current structure is quite easy to get my head around, and I wouldn't necessarily jump to a solution that would provide minimal overhead benefits if its structure makes managing the database unnecessarily complicated (keep in mind some of us aren't naturals at this and our brains process this kinda structure a little slower than others).
Design Requirements:
I may miss out on describing specifics needed as I'm unsure what the most pertinent information is from what's relevant. I can elaborate where needed. What seems the most important design requirement is that each user has unique category keys and values. They can only be in the form of parent > child > csv of ids but each user will have custom key titles and a different number of each. The order of each is also essential.
I'm currently running on a server with ssd disk, 1gb of memory and a single 2ghz core from an Intel hexcore. Requests to the database are primarily retrieving the categories on both a front and backend. The majority use little traffic so nothing has been too taxing apart from occasional spikes. I will upgrade when I see a bottleneck approaching. Just trying to use what I have as efficiently as possible at the moment and keep best practices in play.
Database Structure:
Right now my table structure is in the form of (omitting other columns not relevant to the question):
Table usersettings:
+-----+----------------------+-----+
| id | imageCategories | ... |
+-----+----------------------+-----+
| 1 | {"Parent Category... | ... |
| 2 | {"Parent Category... | ... |
| 3 | {"Parent Category... | ... |
| ... | | |
+-----+----------------------+-----+
Table users:
+-----+----------------------+---------+--------+
| id | username | cluster | server |
+-----+----------------------+---------+--------+
| 1 | johndoe | 1 | 1 |
| 2 | katedoe | 1 | 1 |
| 3 | ellendoe | 1 | 1 |
| ... | | | |
+-----+----------------------+---------+--------+
Table descriptions_0001:
+-----+---------+---------------+-----+
| id | title | descriptions | ... |
+-----+---------+---------------+-----+
| 11 | Title 1 | Description 1 | ... |
| 56 | Title 2 | Description 2 | ... |
| 78 | Title 3 | Description 3 | ... |
| ... | | | |
+-----+---------+---------------+-----+
There is a matching row in users for every usersettings entry, with the same id. So a user's username etc. can always be referenced from usersettings by knowing its own id number. Currently I only have one database, but in an attempt to future proof it to some degree I store descriptions in a table with an index in its name, and each user has a cluster number value as well as a server number value. Each user has, on average, about 100 description rows, so this is coming to 20,000 rows at the moment. When this creates a bottleneck I'll start a descriptions table 0002, and later a second server should it be needed. Perhaps I'm naive in my workflow but it seems like it should help.
Summary:
So in summary, should I adapt my categories array to store a string description for child categories by:
1. Making the child category's key hold an array value rather than the current string value, containing the current string value and an additional string description.
2. Like 1, but making the string description an id number that references a string on a new table.
3. Not using a json encoded array at all, and moving the entire category structure into its own table.
4. Creating a table for parent categories, one for child categories and one for the csv contents. Include a description column (per the conundrum above) and an order column (essential, per the design requirements above) in each. Or is there a better method of storing order than retrieving and updating the order column for each relevant row when the table will contain unique category information for multiple users? It sounds like it may require a lot of overhead.

I ended up going for a solution somewhat similar to (4). I also better appreciate the importance of describing the design requirements now as what led me to this decision was the realisation that it was more efficient in processing (I believe?) and simpler to comprehend working with select levels of a hierarchy at a time.
For example, if I'm dealing with all descriptions under parent category 2, child category 1, I just fetch or insert all descriptions in a description table with a shared identifier, rather than dealing with a multidimensional array that contains all hierarchies. The latter made organising users in the db easier, but the categorisation was becoming large enough that I decided it did warrant separate tables for each level of the hierarchy. There are enough situations where I'm working with only an isolated level of the categorisation hierarchy that putting the entire categorisation into a single md array felt like the poorer choice.
In terms of overhead difference, I'm unsure for now. There's less sorting of arrays happening in php to isolate the data I need, but there are far more calls to the db.
My hesitation in understanding the design requirements (and still not giving a thorough answer on this) is that I'm new to large user databases and am not good at forecasting the needs. I'm designing it in such a way that it feels scalable to me and so, again, the table for each level of the hierarchy feels the least cumbersome (after the cumbersome set up - I'm currently redoing tonnes of code to make functions work with the new set up) and more scaleable as needs change.
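As a rough illustration of moving from the json column to one row per hierarchy level, the sketch below flattens one user's decoded imageCategories array into rows that carry an explicit sort position. The function name and the row keys (`parent_order`, `child_order`, `description_ids`) are hypothetical, chosen only for this example, and each child is assumed to hold a non-empty csv string as in the question.

```php
<?php
/**
 * Turn one user's decoded imageCategories array into flat rows, one per
 * child category. Positions are taken from the order of the JSON keys,
 * which json_decode() preserves when decoding to an associative array.
 */
function flattenCategories(int $userId, array $categories): array
{
    $rows = [];
    $parentOrder = 0;
    foreach ($categories as $parentTitle => $children) {
        $parentOrder++;
        $childOrder = 0;
        foreach ($children as $childTitle => $idStrings) {
            $childOrder++;
            // The stored value is a one-element array holding a csv string of ids.
            $ids = array_map('intval', explode(',', implode(',', $idStrings)));
            $rows[] = [
                'user_id'         => $userId,
                'parent_title'    => $parentTitle,
                'parent_order'    => $parentOrder,
                'child_title'     => $childTitle,
                'child_order'     => $childOrder,
                'description_ids' => $ids,
            ];
        }
    }
    return $rows;
}

$json = '{"Parent Category 1":{"Child Category 1":["45,19,3,4,8"],"Child Category 2":["17,1,99"]},'
      . '"Parent Category 2":{"Child Category 1":["83,6"],"Child Category 2":["19,74,26"]}}';
$rows = flattenCategories(1, json_decode($json, true));
echo count($rows), " child categories\n";
```

Each returned row maps naturally onto an insert into a child category table, and a description column can then be added to that table without touching the other levels.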

MySQL: can I implement row-level AND column level security?

Say I have a table like this:
itemID | PriceA | PriceB | PriceC | other columns...
1 | 8.0 | 6.95 | 0.5 | ...
2 | 5.9 | 6.97 | 4.1 | ...
3 | 0.2 | 1.12 | 3.5 | ...
I want a user to log in, but only see certain rows, and only one Price column. For example, user Susie can see only rows 1 and 2, and only Price B for those items. User Hanna can see rows 2 and 3 at Price A.
Maybe it doesn't need to be database-level security. Basically, users will log in on a website (a Wordpress site) and, after logging in, will see certain products at a certain price.
Also, more than one user can access any given row or column. It isn't a one-to-one relationship. I think this differs from typical row-level mysql security.
I have 2 questions:
Should this be database-level security or should it be something else? PHP code?
Any suggestions on how I can implement this?

Actually, I think creating views will solve my problem. Does that seem secure?
I found this: How can I allow users sql access to a table limited to certain rows?
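If the filtering ends up in PHP rather than in views, a minimal sketch could look like the following. The `$permissions` structure and the function name `visiblePrices` are assumptions for illustration; in practice the permissions would probably live in their own table so that several users can share rows and price columns.

```php
<?php
/**
 * Return only the rows and the single price column a user may see.
 * $permissions is a hypothetical map: user => allowed item ids and price column.
 */
function visiblePrices(array $items, array $permissions, string $user): array
{
    if (!isset($permissions[$user])) {
        return []; // unknown users see nothing
    }
    $allowedIds  = $permissions[$user]['items'];
    $priceColumn = $permissions[$user]['price'];
    $result = [];
    foreach ($items as $item) {
        if (in_array($item['itemID'], $allowedIds, true)) {
            // Copy only the id and the permitted price; other columns never leave here.
            $result[] = ['itemID' => $item['itemID'], 'price' => $item[$priceColumn]];
        }
    }
    return $result;
}

$items = [
    ['itemID' => 1, 'PriceA' => 8.0, 'PriceB' => 6.95, 'PriceC' => 0.5],
    ['itemID' => 2, 'PriceA' => 5.9, 'PriceB' => 6.97, 'PriceC' => 4.1],
    ['itemID' => 3, 'PriceA' => 0.2, 'PriceB' => 1.12, 'PriceC' => 3.5],
];
$permissions = [
    'Susie' => ['items' => [1, 2], 'price' => 'PriceB'],
    'Hanna' => ['items' => [2, 3], 'price' => 'PriceA'],
];
print_r(visiblePrices($items, $permissions, 'Susie'));
```

This keeps the rule in one place, but it only protects data that passes through this function; anything that queries the table directly bypasses it, which is the argument for views or database-level grants.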

Customer reviews and calendar entries, etc in a database

How would things like customer reviews be stored in a database? I can't imagine there would be rows for each item and columns for each review, as one product may have 2 reviews and another may have 100+. I'd presume they are stored in a separate file for reviews, but then surely not one file per item! I don't know enough about storing data to be able to figure this one out by myself!
A similar situation is something like an online calendar: there is all the information about each appointment (time, duration, location, etc) and there can be many of these on each day, every day, for all users! A logical way would be to have a table for each user with all their appointments in, but at the same time that seems illogical, because if you have 1000+ users, that's a lot of tables!
Basically I'd like to know what the common/best practice way is of storing this 'big dynamic data'.

Customer reviews can easily be stored by using two tables in one-to-many relationship.
Suppose you have a table containing products/articles/whatever is worth reviewing. Each of them has a unique ID and other attributes.
Table "products"
+-------------------------------------+
| id | name | attribute1 | attribute2 |
+-------------------------------------+
Then you make another table, with its name indicating what it's about. It should contain at least a unique ID and a column for the IDs from the other table. Let's say it will also have the email of the user who submitted the review and (obviously) the review text itself:
Table "products_reviews"
+--------------------------------------------+
| id | product_id | user_email | review_text |
+--------------------------------------------+
So far, so good. Let's assume you're selling apples.
Table "products"
+-------------------------------+
| 1 | 'Apple' | 'green' | '30$' |
+-------------------------------+
Then, two customers come, each one buys one apple worth 30$ and likes it, so they both leave a review.
Table "products_reviews"
+-------------------------------------------------------------------------------+
| 1 | 1 | alice@mail.com | 'I really like these green apples, they are awesome' |
| 2 | 1 | bob@mail.com | 'These apples rock!' |
+-------------------------------------------------------------------------------+
So now all you have to do is to fetch all the reviews for your apples and be happy about how much your customers like them:
SELECT *
FROM products_reviews
INNER JOIN products ON products_reviews.product_id = products.id
WHERE products.name = 'Apple';
You can now display them under the shopping page for apples (just don't mention they cost 30$).
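From PHP, the same join could be wrapped roughly like this. It is a sketch assuming an open PDO connection and the two tables described above; the function name `fetchReviews` is made up for the example.

```php
<?php
/**
 * Fetch every review for the product with the given name, oldest first.
 * The product name is bound as a parameter rather than concatenated into the SQL.
 */
function fetchReviews(PDO $pdo, string $productName): array
{
    $stmt = $pdo->prepare(
        'SELECT r.user_email, r.review_text
           FROM products_reviews AS r
          INNER JOIN products AS p ON r.product_id = p.id
          WHERE p.name = ?
          ORDER BY r.id'
    );
    $stmt->execute([$productName]);
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}
```

A product with no reviews simply yields an empty array, and a product with 100+ reviews yields 100+ rows, without any change to the table structure.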
The same principle applies for things like an online calendar. You have one table with users, and many tables with other stuff - appointments, meetings, etc. which relate to that user.
Keep in mind, however, that things like meetings are better represented with a many-to-many table, since they are (usually) shared by many people. Here's a link that visualizes it very well, and here's a question here on SO with sample code for PHP. Go ahead and test it for yourself.
Cheers :)

"horizontal" vs. "vertical" table design, SQL

Apologies if this has been covered thoroughly in the past - I've seen some related posts but haven't found anything that satisfies me with regards to this specific scenario.
I've been recently looking over a relatively simple game with around 10k players. In the game you can catch and breed pets that have certain attributes (i.e. wings, horns, manes). There's currently a table in the database that looks something like this:
-------------------------------------------------------------------------------
| pet_id | wings1 | wings1_hex | wings2 | wings2_hex | horns1 | horns1_hex | ...
-------------------------------------------------------------------------------
| 1 | 1 | ffffff | NULL | NULL | 2 | 000000 | ...
| 2 | NULL | NULL | NULL | NULL | NULL | NULL | ...
| 3 | 2 | ff0000 | 1 | ffffff | 3 | 00ff00 | ...
| 4 | NULL | NULL | NULL | NULL | 1 | 0000ff | ...
etc...
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes. A new attribute is added every 1-2 months which requires table columns to be added. The table is rarely updated and read frequently.
I've been proposing that we move to a more vertical design scheme for better flexibility as we want to start adding larger volumes of attributes in the future, i.e.:
----------------------------------------------------------------
| pet_id | attribute_id | attribute_color | attribute_position |
----------------------------------------------------------------
| 1 | 1 | ffffff | 1 |
| 1 | 3 | 000000 | 2 |
| 3 | 2 | ffffff | 1 |
| 3 | 1 | ff0000 | 2 |
| 3 | 3 | 00ff00 | 3 |
| 4 | 3 | 0000ff | 1 |
etc...
The old developer has raised concerns that this will create performance issues as users very frequently search for pets with specific attributes (i.e. must have these attributes, must have at least one in this colour or position, must have > 30 attributes). Currently the search is quite fast as there are no JOINS required, but introducing a vertical table would presumably mean an additional join for every attribute searched and would also triple the number of rows or so.
The first part of my question is if anyone has any recommendations with regards to this? I'm not particularly experienced with database design or optimisation.
I've run tests for a variety of cases but they've been largely inconclusive - the times vary quite significantly for all of the queries that I ran (i.e. between half a second and 20+ seconds), so I suppose the second part of my question is whether there's a more reliable way of profiling query times than using microtime(true) in PHP.
Thanks.

This is called the Entity-Attribute-Value-Model, and relational database systems are really not suited for it at all.
To quote someone who deems it one of the five errors not to make:
So what are the benefits that are touted for EAV? Well, there are none. Since EAV tables will contain any kind of data, we have to PIVOT the data to a tabular representation, with appropriate columns, in order to make it useful. In many cases, there is middleware or client-side software that does this behind the scenes, thereby providing the illusion to the user that they are dealing with well-designed data.
EAV models have a host of problems.
Firstly, the massive amount of data is, in itself, essentially unmanageable.
Secondly, there is no possible way to define the necessary constraints -- any potential check constraints will have to include extensive hard-coding for appropriate attribute names. Since a single column holds all possible values, the datatype is usually VARCHAR(n).
Thirdly, don't even think about having any useful foreign keys.
Finally, there is the complexity and awkwardness of queries. Some folks consider it a benefit to be able to jam a variety of data into a single table when necessary -- they call it "scalable". In reality, since EAV mixes up data with metadata, it is lot more difficult to manipulate data even for simple requirements.
The solution to the EAV nightmare is simple: Analyze and research the users' needs and identify the data requirements up-front. A relational database maintains the integrity and consistency of data. It is virtually impossible to make a case for designing such a database without well-defined requirements. Period.
The table goes on like that and currently has 100+ columns, but in general a single pet will only have around 1-8 of these attributes.
That looks like a case for normalization: Break the table into multiple, for example one for horns, one for wings, all connected by foreign key to the main entity table. But do make sure that every attribute still maps to one or more columns, so that you can define constraints, data types, indexes, and so on.

Do the join. The database was specifically designed to support joins for your use case. If there is any doubt, then benchmark.
EDIT: A better way to profile the queries is to run them directly in the MySQL interpreter on the CLI. It will report the time the query took to run. Timing with the PHP microtime() function also includes other latencies (Apache, PHP, server resource allocation, the network if connecting to a remote MySQL instance, etc).
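If the timing still has to be done from PHP, repeating the measurement and taking the median tends to be less noisy than a single microtime(true) reading. The helper below is only a sketch; the name `medianMillis` is an assumption, and the usleep() call stands in for the query under test.

```php
<?php
/**
 * Run $work several times and return the median wall-clock time in milliseconds.
 * hrtime(true) returns a monotonic timestamp in nanoseconds (PHP 7.3+).
 */
function medianMillis(callable $work, int $runs = 5): float
{
    $runs = max(1, $runs); // at least one sample is needed
    $samples = [];
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $work();
        $samples[] = (hrtime(true) - $start) / 1e6;
    }
    sort($samples);
    $middle = intdiv(count($samples), 2);
    // For an even number of runs, average the two middle samples.
    return count($samples) % 2 === 1
        ? $samples[$middle]
        : ($samples[$middle - 1] + $samples[$middle]) / 2;
}

// Stand-in workload; replace the closure body with the query being profiled.
$ms = medianMillis(function () {
    usleep(2000);
}, 5);
printf("median: %.2f ms\n", $ms);
```

The first run of a query is often slower because data has to be read into memory, which is one reason a spread from half a second to 20+ seconds can appear; the median discounts such outliers.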

What you are proposing is called 'normalization'. This is exactly what relational databases were made for - if you take care of your indexes, the joins will run almost as fast as if the data were in one table.
Actually, they might even go faster: instead of loading 1 table row with 100 columns, you can just load the columns you need. If a pet only has 8 attributes, you only load those 8.

This question is very subjective. If you have the resources to update the middleware to reflect each column that has been added then, by all means, go with horizontal; there is nothing safer and easier to learn than a fixed structure. One thing to remember: any time you update a table's structure you have to update each one of its dependencies, unless there is some catch-all like *, which I suggest you stay away from unless you are just dumping data to a screen and the order of columns is irrelevant.
With that said, vertical is the way to go if you don't have all of your requirements in place or don't have the desire to update code in n number of areas. Most of the time you just need storage containers to store data. I would segregate things like numbers, dates, binary, and text in separate columns to preserve some data integrity, but there is nothing wrong with vertical storage, as long as you know how to formulate and structure queries to bring back the data in the appropriate format.
FYI, Wordpress uses vertical data storage for the majority of the dynamic content it has to store for the millions of users it has.

The first thing, from a database point of view, is that your data should grow vertically, not horizontally. So, adding a new column is not a good design at all. Second, this is a very common scenario in DB design, and the way to solve it is to create three tables: the first for Pets, the second for Attributes, and the third a mapping table between these two. Here is the example:
Table 1 (Pet)
Pet_ID | Pet_Name
1 | Dog
2 | Cat
Table 2 (Attribute)
Attribute_ID | Attribute_Name
1 | Wings
2 | Eyes
Table 3 (Pet_Attribute)
Pet_ID | Attribute_ID | Attribute_Value
1 | 1 | 0
1 | 2 | 2
About Performance:
Pet_ID and Attribute_ID are primary keys, which are indexed (http://developer.mimer.com/documentation/html_92/Mimer_SQL_Engine_DocSet/Basic_concepts4.html), so the search is very fast. This is the right way to solve the problem. I hope it is clear to you now.
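On the worry that the vertical layout needs one join per searched attribute: a "must have all of these attributes" search can usually be written as a single grouped query instead. The sketch below assumes the Pet_Attribute table from the example, with (Pet_ID, Attribute_ID) as its primary key; the function name is hypothetical, and how it performs on real data would still need benchmarking.

```php
<?php
/**
 * Return the ids of pets that have every attribute in $attributeIds.
 * One GROUP BY / HAVING query replaces one join per attribute.
 */
function findPetsWithAllAttributes(PDO $pdo, array $attributeIds): array
{
    $attributeIds = array_values(array_unique(array_map('intval', $attributeIds)));
    if ($attributeIds === []) {
        return []; // "IN ()" is not valid SQL, so stop early
    }
    $placeholders = implode(',', array_fill(0, count($attributeIds), '?'));
    $stmt = $pdo->prepare(
        "SELECT Pet_ID
           FROM Pet_Attribute
          WHERE Attribute_ID IN ($placeholders)
          GROUP BY Pet_ID
         HAVING COUNT(DISTINCT Attribute_ID) = ?
          ORDER BY Pet_ID"
    );
    // Bind everything as integers so the HAVING comparison is numeric.
    $position = 1;
    foreach ($attributeIds as $id) {
        $stmt->bindValue($position++, $id, PDO::PARAM_INT);
    }
    $stmt->bindValue($position, count($attributeIds), PDO::PARAM_INT);
    $stmt->execute();
    return array_map('intval', $stmt->fetchAll(PDO::FETCH_COLUMN));
}
```

Changing the HAVING condition to `>= 1` would turn this into an "at least one of these attributes" search, and dropping the WHERE clause while comparing the count to a number gives "has more than N attributes".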

Find Similar Descriptions in Database PHP/MySQL

We are building a help desk application for running our service company, and I am trying to figure out how to assist the call center people in assigning a category based on the problem description from the customer.
My primary idea is to compare the description the customer gave to prior descriptions, and use the category that was most commonly assigned in those prior service calls.
Any ideas how to do it?
My description field is a blob field as some descriptions are quite long. I would prefer to find a way to do this that requires the least system resources.
Thanks for any input :)
Mike

I'm a person of custom code; I don't feel the job is done right if you use big, bloated systems, so take this with a grain of salt if you are not wanting to code this yourself. However, this might not be as hard as you're making it; yes, I would definitely go with a tagging system. However, it doesn't have to be so complicated.
Here's how I would handle it:
First, make a database with 3 tables; one for categories, tags, and 'links' (links between categories and tags).
Then, create a PHP function that initializes an array (empty works just fine) and pushes new (lowercased) words if they don't exist. An example of this might be:
<?php
// Pass the new description to this
// function. It returns the unique words it contains.
function getCategory($description)
{
    // Lowercase it all
    $description = strtolower($description);
    // Kill anything that isn't a letter, a number or whitespace (ASCII only).
    // trim() only strips characters from the two ends of a string, so a
    // regex is used here instead. Just don't take out spaces; we need them!
    $description = preg_replace('~[^a-z0-9\s]+~', '', $description);
    // Kill extra whitespace
    $description = trim(preg_replace('~\s+~', ' ', $description));
    // Now the description should just contain words with a single space in between them.
    // Let's break them up.
    $dict = ($description === '') ? array() : explode(" ", $description);
    // And find the unique ones...
    $dict = array_values(array_unique($dict, SORT_STRING));
    // If you wanted to, you could trim either common words you specify,
    // or any words under, say, 4 characters. Up to you!
    return $dict;
}
?>
Next, populate your database how you want; make a few categories and some tags, and then link them together (if you want to get fancy, switch the MySQL engine to InnoDB and make relationships. Makes things a bit quicker!)
Table `Categories`
|-------------------------|
| Column: Category |
| Rows: |
| Food |
| Animals |
| Plants |
| |
|-------------------------|
Table `Tags`
|-------------------------|
| Column: Tag |
| Rows: |
| eat |
| hamburger |
| meat |
| leaf |
| stem |
| seed |
| fur |
| hair |
| claws |
| |
|-------------------------|
Table `Links`
|-------------------------|
| Columns: tag, category |
| Rows: |
| eat, Food |
| hamburger, Food |
| meat, Food |
| leaf, Food |
| leaf, Plants |
| stem, Plants |
| fur, Animals |
| ... |
|-------------------------|
If the links table stores integer ids that reference the categories and tags tables (enforced with InnoDB foreign keys), each link row stays small, because the tag and category text is stored only once. Note that the foreign keys themselves do not save space; the saving comes from storing ids instead of repeating the strings.
Now, for the kicker, a clever mysql query to the database, which follows these steps:
For each category, sum up the tags belonging both to the category and the description dictionary (which we created in the earlier PHP function).
Sort them from greatest to least
Pull the top 1 or 3 or however many suggested categories you'd like!
This will get you a nice list of categories that have the highest matching count of tags. How you want to craft the MySQL query is up to you.
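For cases where the links are already loaded into a PHP array, the three steps above can be sketched without SQL as follows; the equivalent query would be a GROUP BY over the links table. The function name `suggestCategories` and the pair format of `$links` are assumptions for this example.

```php
<?php
/**
 * Count, per category, how many description words are tags linked to it,
 * and return the best-scoring categories first.
 * $links is a list of [tag, category] pairs mirroring the Links table;
 * a pair that appears twice counts twice, which acts as a simple weight.
 */
function suggestCategories(array $words, array $links, int $top = 3): array
{
    $wordSet = array_flip($words); // word => index, for fast isset() lookups
    $scores = [];
    foreach ($links as [$tag, $category]) {
        if (isset($wordSet[$tag])) {
            $scores[$category] = ($scores[$category] ?? 0) + 1;
        }
    }
    arsort($scores); // highest score first, category names kept as keys
    return array_slice($scores, 0, $top, true);
}

$links = [
    ['eat', 'Food'], ['hamburger', 'Food'], ['meat', 'Food'], ['leaf', 'Food'],
    ['leaf', 'Plants'], ['stem', 'Plants'], ['fur', 'Animals'],
];
$words = ['the', 'leaf', 'and', 'stem', 'fell', 'off'];
$suggested = suggestCategories($words, $links);
print_r($suggested);
```

Here `$words` plays the role of the dictionary returned by the earlier function, and the result maps each suggested category to its matching tag count.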
While this seems like a lot of setup, it really isn't. You have 3 tables at most, one or two PHP functions and a few MySQL queries. The database will only be as big as the categories, the tags and the references to both (in the links table; references don't take up much space!)
To update the database, simply put in tags that don't exist to the tags database and link them to the category you decided to assign to the description. This will broaden your database's range of tags and will, over time, get your database more tuned to your descriptions (i.e. more accurate).
If you wanted to get really detailed, you'd insert duplicate links between categories and tags to create a sort of weighted tag system, which would make your system even more accurate.
