I am using a MySQL database with PHP to build a web application.
I have a child table attachment, which is a common table for many master tables: teacher, student, classRoom (and others).
The number of master tables exceeds 10; let's say there are n of them.
My question is: which of the following is good practice?
Create just one table called 'attachment' and relate it to all of its masters.
This results in n foreign key columns in the attachment table (i.e. n-1 unused columns per row), which also leads to n-1 attributes in the model that are never initialized or used each time I create a model instance.
Create a table for each master table (master_i) called master_i_attachment and relate it only to its master. But this leads to n attachment tables and n attachment models in my code.
Any advice?
What you can do is have a single table with the following fields: id, reference_id (the id in one of your parent tables), reference_type (i.e. which table the reference_id belongs to), plus all the other fields of your attachment table.
Then, if you want to get the attachments for a particular parent type, you can run a SELECT query filtering on that type, e.g. WHERE reference_type='classroom'.
Or if you want to get the attachment for the classroom with a specific ID:
SELECT * FROM attachment WHERE reference_id=<ID> AND reference_type = 'classroom';
You will probably want a composite unique key on (reference_id, reference_type), which ensures that you won't get duplicated attachments (unless you want a given ID of a given type to be able to have more than one attachment, in which case the key should not be unique).
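A minimal sketch of that table, assuming MySQL/InnoDB; file_path stands in for whatever attachment fields you actually have:

-- a minimal sketch of the single attachment table described above;
-- file_path is a placeholder column, not from the question
CREATE TABLE attachment (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    reference_id INT UNSIGNED NOT NULL,      -- id in the parent table
    reference_type VARCHAR(32) NOT NULL,     -- 'teacher', 'student', 'classroom', ...
    file_path VARCHAR(255) NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_reference (reference_id, reference_type)
);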
Whether this solution suits your needs depends on how you are going to use the data, i.e. what kind of queries you are going to run most often.
Based on database normalization principles, storing redundant and uninitialized (or NULL) values is discouraged. Normalization tends to split the data into more and more tables in order to avoid anomalies. BUT you can deliberately ignore the rules and denormalize your database for performance reasons.
In your case, I think the simplest (and normalized) way would be option #2 (a separate table for each attachment type). But you can tweak your design as Ashalynd suggests: put a type column in your table to specify the parent table. Be aware that this method adds complexity for cascading changes, since the polymorphic reference cannot be a real foreign key and cascading deletes have to be handled in your application.
Related
For the past couple of years I've been working on my own lightweight PHP CMS that I use for my personal projects. The one thing it's missing is an easy database solution.
I am looking to create a simple content-type database framework in which I can specify a new type (user, book, event, etc.) and then be able to load everything related to it automatically.
For some content types there will be fields that can only have one value, and some that can have zero to many values, so I will use a new table for the latter. Take this example:
table: event
columns: id, name, description, date
table: event_people
columns: id_event, id_user
table: event_pictures:
columns: id_event, picture
Events will have a bunch of fields that contain a value such as the description, but there could also be a bunch of pictures and people going to it.
I want to be able to create a generic PHP class that will load all the information for a content type. My current thought is to write an entity loader function that I can give an id and a type:
Entity:load($id, "event");
From this I was going to get all of the tables with the prefix "event", load all of the data with the passed-in ID and then store it in a multidimensional array. I feel like there is probably a more efficient way to do this, however. I'd like to stay away from having a config file someplace that specifies all of the content types and their child tables, because I want to be able to add a new child table and have it picked up automatically.
Is there any way to store this relationship directly within MySQL? I don't do a lot of database work and I've only recently started to use foreign keys (what a life saver). Would it be more efficient to look up which tables have a foreign key pointing at the id column of the event table, and if so how would this be done? I'm also open to different ways of storing this information.
Note: I'm doing this just for fun so please don't refer me to use any premade frameworks. I'd like to create this myself.
I think your approach of searching for all tables with the prefix event is sensible. The only way I can think of to be more efficient is to have an "entity_relationship" table that you can query. It gives you flexibility in your naming convention, avoids naming conflicts, and the lookup should be more efficient than a pattern-match search.
Then, whenever a new object type with its own table is added, you make an entry in the relationship table.
INSERT INTO entity_relationship VALUES
('event','event_people'),
('event','event_pictures'),
('event','event_documents'),
('event','event_performers');
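A sketch of a matching two-column table and a lookup query (the column names parent_table and child_table are assumptions, since the answer does not name them):

-- hypothetical layout matching the two-value INSERT above
CREATE TABLE entity_relationship (
    parent_table VARCHAR(64) NOT NULL,   -- e.g. 'event'
    child_table VARCHAR(64) NOT NULL,    -- e.g. 'event_people'
    PRIMARY KEY (parent_table, child_table)
);

-- find all child tables for a given content type
SELECT child_table FROM entity_relationship WHERE parent_table = 'event';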
I am developing a MySQL db for a user list, and I am trying to determine the most efficient way to design it.
My issue comes in that there are 3 types of users: "general", "normal", and "super". General and normal users differ only in the values of certain columns, so the schema to store them is identical. However, super users have at least 4 extra columns of info that need to be stored.
In addition, each user needs a unique user_id for reference from other parts of the site.
So, I can keep all 3 users in the same table, but then I would have a lot of NULL values stored for the general and normal user rows.
Or, I can split the users into 2 tables: general/normal and super. This would get rid of the abundance of NULLs, but would require a lot more work to keep track of the user_ids and ensure they are unique, as I would have to handle that in my PHP instead of just using a SERIAL column as in the single-table solution above.
Which solution is more efficient in terms of memory usage and performance?
Or is there another, better solution I am not seeing?
Thanks!
If each user needs a unique id, then you have the answer to your question: You want one users table with a UserId column. Often, that column would be an auto-incremented integer primary key column -- a good approach to the implementation.
What to do about the other columns? This depends on a number of different factors, which are not well explained in your question.
You can store all the columns in the same table. In fact, you could then implement views so you can see users of only one type. However, if a lot of the extra columns are fixed-width (such as numbers) then space is still allocated. Whether or not this is an issue is simply a question of the nature of the columns and the relative numbers of different users.
You can also store the extra columns for each type in its own table. This would have a foreign key relationship to the original table, using the UserId. If both these keys are primary keys, then the joins should be very fast.
There are more exotic possibilities as well. If the columns do not need to be indexed, then MySQL 5.7 has support for JSON, so they could all go into one column. Some databases (particularly column-oriented ones) allow "vertical partitioning", where different columns of a single table are stored in separate allocation units. MySQL does not (yet) support vertical partitioning.
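A rough sketch of the extension-table option (one users table plus a table holding only the super-user columns; the extra column names are placeholders):

CREATE TABLE users (
    user_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    user_type ENUM('general','normal','super') NOT NULL,
    name VARCHAR(100) NOT NULL,              -- placeholder for the shared columns
    PRIMARY KEY (user_id)
);

-- only super users get a row here; the primary key doubles as the foreign key
CREATE TABLE super_user_details (
    user_id INT UNSIGNED NOT NULL,
    extra_col_1 VARCHAR(100),                -- placeholders for the 4+ extra columns
    extra_col_2 VARCHAR(100),
    PRIMARY KEY (user_id),
    FOREIGN KEY (user_id) REFERENCES users(user_id)
);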
Why not build an extra table, but only for the extra columns you need for super users? That gives you 2 tables: one with all the users and one with the super users' extra info.
If you want to have this type of schema, try to create a relation
like:
tb_user > user_id , user_type_id(int)
tb_user_type > user_type_id(int) , type_name
This way you will have just 2 tables, and if the type is not set you can give a user a default value.
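A minimal sketch of those two tables in MySQL (everything beyond the columns named above is assumed):

CREATE TABLE tb_user_type (
    user_type_id INT NOT NULL,
    type_name VARCHAR(32) NOT NULL,          -- 'general', 'normal', 'super'
    PRIMARY KEY (user_type_id)
);

CREATE TABLE tb_user (
    user_id INT NOT NULL AUTO_INCREMENT,
    user_type_id INT NOT NULL DEFAULT 1,     -- fall back to a default type when none is given
    PRIMARY KEY (user_id),
    FOREIGN KEY (user_type_id) REFERENCES tb_user_type(user_type_id)
);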
I have many tables in my database; one example is the table fs_user. The following is an extract of its columns (dealing with privacy settings):
4 Columns from the table fs_user:
show_email_to
show_address_to
show_gender_to
show_interested_in_to
Like many social networks, I need not only to specify which data is private and which is public, but also which data is available to chosen users and which is not.
As I have about 30 fields like the 4 above, I think it would be bad to create one table for every field and make a many-to-many relation with the table fs_user.
This is why I got the idea of saving this data as JSON in each such column (of type TEXT), for example:
show_email_to => {1:'ALL',2:'BUT',3:'3'}
This means: show the email to all users except the user whose id=3.
Another example:
show_email_to => {1:'NONE',2:'BUT',3:'3',4:'80',5:'10'}
This means no user will see the email except the users with id=3, id=80 and id=10.
Of course, the MySQL query will select this data, and PHP/JS will extract what I need from the JSON.
Another point is that sometimes a user wants to show data only to his friends, except 3 of them.
This will do:
show_email_to => {1:'FRIENDS',2:'BUT',3:'3'}
This means that the email will be shown to all his friends, except the user with id=3.
My question is: how performant and flexible (for other uses) would this system be compared to the many-to-many solution (which requires spreading the data across many tables)?
Note: I already know that saving many elements in one column is bad practice, but here I think this is a JSON element and can be considered a single object.
This is a good question. What you propose is, with respect, a very bad idea indeed if you're using any flavor of SQL. You are proposing to denormalize your tables in a way that will defeat every attempt to speed up searching or querying in the future.
What should you do instead? You could take a look at using an XML-centric dbms like MarkLogic. It's capable of creating indexes that accelerate various Xpath-style queries, so you would be able to search on relationships. If you do that, I hope you have a big budget.
Or, you could use normalized permission tables.
item_to_show (item id)
order (an integer specifying rule ordering, needed for this)
recipient (user id)
isdenied (0 means recipient is allowed, 1 means she is denied)
In this table, the primary key is a compound key constructed of the first two columns.
I'm aware that you have many types of items. You assert that it's bad to have an extra table for each item type in your system. I don't agree that it's inherently bad. I believe your proposed solution is far worse.
You could arrange to give each item a unique id number to allow you to use a single permission table. See this question for an example of how to do that: Fastest way to generate 11,000,000 unique ids
Or you could have a single permission table with a type id.
item_to_show (item id)
item_type_to_show (item type id)
order (an integer specifying rule ordering, needed for this)
recipient (user id)
isdenied (0 means recipient is allowed, 1 means she is denied)
In this case the primary key is the first three columns.
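A sketch of that second variant in MySQL (the table name is an assumption, and the order column is renamed rule_order here because ORDER is a reserved word):

CREATE TABLE item_permission (
    item_to_show INT NOT NULL,          -- item id
    item_type_to_show INT NOT NULL,     -- item type id
    rule_order INT NOT NULL,            -- rule ordering
    recipient INT NOT NULL,             -- user id the rule applies to
    isdenied TINYINT(1) NOT NULL,       -- 0 = recipient is allowed, 1 = denied
    PRIMARY KEY (item_to_show, item_type_to_show, rule_order)
);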
Or, you can do what you don't want to do and have a separate permission table for each item type.
You say, "As I have about 30 data like the 4 data above, I think it will be bad to create one table for every data, and make a many to many relation with the table fs_user"
I agree with the first part of your statement only. You only need one table. For the sake of a name, I'll call it ShowableItems. Fields would be ShowableItemId (PK) and Item. Some of these items would be email, gender, address, etc.
Then you need a many-to-many table that shows which items can be shown to whom. Your three fields would be: the id of the person who owns the item, the showable item id, and the id of the person who can see it.
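A minimal sketch of those two tables (the name ItemVisibility and its column names are placeholders, since the answer does not name them):

CREATE TABLE ShowableItems (
    ShowableItemId INT NOT NULL AUTO_INCREMENT,
    Item VARCHAR(64) NOT NULL,               -- 'email', 'gender', 'address', ...
    PRIMARY KEY (ShowableItemId)
);

-- hypothetical many-to-many visibility table
CREATE TABLE ItemVisibility (
    owner_user_id INT NOT NULL,              -- the user who owns the item
    ShowableItemId INT NOT NULL,             -- which item is being shown
    viewer_user_id INT NOT NULL,             -- the user allowed to see it
    PRIMARY KEY (owner_user_id, ShowableItemId, viewer_user_id),
    FOREIGN KEY (ShowableItemId) REFERENCES ShowableItems(ShowableItemId)
);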
I am currently working on a PHP/MySQL project for an assignment. In studying the efficient design of databases while working on the assignment I notice that in many cases it is good practice to create a third table when working with only two sets of data.
For example, if we have a table for "Students" and a table for "Addresses" it appears to be a good idea to create a third table i.e. "Student_Addresses" since a student can hypothetically have more than one address (separated parents etc.) and a single address can represent more than one student (siblings).
My question is: How do we go about populating that third table? Is there a way that it is done automatically using primary and/or foreign keys?
I've tried Google and my textbook to understand this but I've gotten nowhere. Links to tutorials or articles would be greatly appreciated.
Thanks for your help. I hope the question and example are clear.
This comes down to whether the relationship is 1:m or n:m.
Option 1:
user table
id
f_name
s_name
......
user address table
id
user_id // foreign key to user.id, indexed but not unique, so one user can have several addresses
address line 1
address line 2
address line 3
address_type (home, office ....)
Option 2:
user table
id
f_name
s_name
......
address table
id
address line 1
address line 2
address line 3
address_type (home, office ....)
user_address table
userId
addressId
According to your description, option 2 would be the right solution. After adding the data to the user table and the address table, you then need to add the data to the user_address table manually. Some object-relational mappers (ORMs) can add the data to the third table automatically, but you need to define the relations; check http://docs.doctrine-project.org/projects/doctrine-orm/en/latest/reference/association-mapping.html.
http://docstore.mik.ua/orelly/linux/sql/ch02_02.htm
http://www.keithjbrown.co.uk/vworks/mysql/mysql_p7.php
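A rough sketch of doing it manually in SQL, assuming the option 2 tables above with AUTO_INCREMENT ids (the exact column names are illustrative):

-- insert the user and remember its generated id
INSERT INTO user (f_name, s_name) VALUES ('Jane', 'Doe');
SET @user_id = LAST_INSERT_ID();

-- insert the address and remember its generated id
INSERT INTO address (address_line_1, address_type) VALUES ('1 Main St', 'home');
SET @address_id = LAST_INSERT_ID();

-- link the two in the junction table
INSERT INTO user_address (userId, addressId) VALUES (@user_id, @address_id);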
You can save the data in the third table using triggers when the data is inserted/updated/deleted in your base tables. You can learn more about triggers at
mySQL Triggers
However, in your case it would be better to write the logic at the application/code level to make the entry in the third table. You can set up foreign key relationships from this table to your base tables so that the data remains consistent.
There is no native method in MySQL to populate Student_Addresses in your situation - you have to take care of entering the data (the connections) yourself, but you can, for example, use transactions - see the answers in this topic: SQL Server: Is it possible to insert into two tables at the same time?
To keep the connections consistent, make the columns in Student_Addresses that reference the Student ID and the Address ID NOT NULL, put a composite unique key on the pair, and use ON UPDATE CASCADE and ON DELETE CASCADE on the foreign keys. This takes care of removing records from the junction table when records are removed from either of the other two tables, and it also won't let you add the same address to the same student twice.
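A sketch of such a junction table, assuming InnoDB and integer id primary keys in Student and Address (names are illustrative):

CREATE TABLE Student_Addresses (
    student_id INT NOT NULL,
    address_id INT NOT NULL,
    UNIQUE KEY uq_student_address (student_id, address_id),   -- no duplicate pairs
    FOREIGN KEY (student_id) REFERENCES Student(id)
        ON UPDATE CASCADE ON DELETE CASCADE,
    FOREIGN KEY (address_id) REFERENCES Address(id)
        ON UPDATE CASCADE ON DELETE CASCADE
) ENGINE=InnoDB;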
I don't think the data will be populated automatically; rather, it is the application's responsibility to insert it.
I am not sure about PHP, but using Hibernate and Java this can be done seamlessly. Since the student and address data could be coming through a web application, Hibernate can map Java objects to table records and also populate the relationship table.
I don't really have any idea about normalization, although I've done some reading on the topic. All I know is that it is used to make the data in the database more efficient and easier to handle, that it can also be used to save disk space, and lastly that if you use normalization you will end up generating more tables.
Now I have a lot of questions to ask.
First, how will normalization help to save disk space, or whatever space the database occupies?
Second, is it possible to add data to multiple tables using only 1 query?
Please help, I'm just a newbie wanting to learn from you. Thanks.
Ok, couple of things:
PHP has got nothing to do with this; normalization is about modelling data.
Normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data integrity.
Normalization is typically described in a few stages or 'normal forms'. In practice, people who design relational databases often intuitively 'get it right' most of the time, but it is still good to be aware of the normal forms and their characteristics. There is a lot of documentation on this on the internet (e.g. http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do your own research, but the most important stages are:
Unnormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion about what tabular really means, and experts disagree with one another, but most people agree that data is unnormalized if there are multi-valued attributes (= columns that, for one row, can contain a list as their value), or if there are repeating groups (= multiple columns or groups of columns for storing the same type of data).
Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there can be several phone numbers, stored in one column.
Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.
Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order; they just represent different aspects of an order. child1 through childN do not - children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance).
In both cases, multi-valued columns and repeating groups, you basically have a "nested table" structure - a table within a table. Data is said to be in 1NF (first normal form) if neither of these occurs.
1NF is about structural characteristics: the tabular form of the data. All subsequent normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places, and if you forget to change one of them, you have inconsistent data - the data contradicts itself.
There are a number of processes that eliminate redundancy, each leading to a higher normal form, all the way from 1NF up to 6NF. However, most databases are adequately normalized at 3NF (or a slight variation of it called Boyce-Codd normal form, BCNF). You should study 2NF and 3NF, but the principle is very simple: a table is adequately normalized if:
the table is in 1NF
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row - i.e. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).
Functional dependency means that a column's value can be derived from another column. A simple example:
order_item (order_id, item_number, customer_id, product_code, product_description, amount)
Let's assume (order_id, item_number) is the key. product_code and product_description are functionally dependent upon each other: for one particular product_code, you will always find the same product_description (as if the description were a function of product_code). The problem: suppose the product description changes for a particular product code; you then have to change all orders that use that product_code. Forget only one and you have an inconsistent database.
The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as key, and then, instead of storing all product fields in order_item, only store a reference to a row in the product table in the order_item records (in this case, order_item only needs to keep product_code, which is sufficient to look up a row in the product table and find the product_description).
So as you can see, with this solution you do actually save space (by not storing the product description in every order_item that happens to order the product) and you do get more tables (product is split off from order_item). But remember that the point is not saving disk space: it is eliminating redundancy, which makes the data easier to maintain, because now you only have to change one row in the product table to change the description.
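A quick sketch of that split in SQL (the column types are assumptions):

-- product facts live in exactly one place
CREATE TABLE product (
    product_code VARCHAR(20) NOT NULL,
    product_description VARCHAR(255) NOT NULL,
    PRIMARY KEY (product_code)
);

-- order_item keeps only the reference to the product
CREATE TABLE order_item (
    order_id INT NOT NULL,
    item_number INT NOT NULL,
    customer_id INT NOT NULL,
    product_code VARCHAR(20) NOT NULL,
    amount DECIMAL(10,2) NOT NULL,
    PRIMARY KEY (order_id, item_number),
    FOREIGN KEY (product_code) REFERENCES product(product_code)
);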
There are a lot of similar questions on StackOverflow already, for example, Can someone please give an example of 1NF, 2NF and 3NF in plain english?
Look in the Related sidebar to the right for a bunch of them. That'll get you started.
As for your specific questions:
Normalization saves disk space by reducing redundant data storage. This has another benefit: if you have multiple copies of a given entity attribute in your database, they can get out of sync, while if you have a normalized database and use referential integrity, this cannot happen.
The INSERT statement references only one table. A TRIGGER on the insert statement can add rows to other tables, but there's no way to supply data to the trigger other than those columns in the table that spawned it.
When you need to insert dependent rows after inserting a row to the parent table, use the LAST_INSERT_ID() function to retrieve the auto-generated primary key value of the last INSERT statement in your session.
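A small illustration, assuming a parent table with an AUTO_INCREMENT primary key and a child table referencing it (table and column names are placeholders):

INSERT INTO parent (name) VALUES ('example');
-- LAST_INSERT_ID() returns the auto-generated key of this session's last INSERT
INSERT INTO child (parent_id, detail) VALUES (LAST_INSERT_ID(), 'dependent row');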
I think you will learn this when you start creating the schema for your database.
Think in reverse when you add a field that already exists somewhere else in your database.
By reverse I mean: ask yourself, if I have to modify this field, how many queries do I have to run?
You will probably end up with the answer that you have to run the query 2 or X times to modify the content of your column.
Keep it simple: assign an ID to each piece of content you have duplicated in your database.
For example, take an address column.
This is not good:
update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;
A better approach would be to
create an addresses table
-- and run a single query
update addresses set address = 'new address' where addressid=100;
Then use the address id 100 everywhere in your database as a foreign key reference (in clients and orders). This way the id 100 itself never changes, but if you update the content of the address, all linked tables pick up the change.
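A minimal sketch of that setup (table and column names are illustrative):

CREATE TABLE addresses (
    addressid INT NOT NULL AUTO_INCREMENT,
    address VARCHAR(255) NOT NULL,
    PRIMARY KEY (addressid)
);

-- clients (and likewise orders) store only the reference, not the address text
CREATE TABLE clients (
    clientid INT NOT NULL AUTO_INCREMENT,
    addressid INT NOT NULL,
    PRIMARY KEY (clientid),
    FOREIGN KEY (addressid) REFERENCES addresses(addressid)
);

-- a single update now changes the address everywhere it is referenced
UPDATE addresses SET address = 'new address' WHERE addressid = 100;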
Level 3 of normalization (third normal form) is enough for you in this case.
Normalization is a set of rules. The more you follow, the higher the "level" of normalization your database has. In general, level 3 is the highest level sought after.
Normalized data is theoretically "purer" than non-normalized data. This makes it easier to reason about, and it removes redundancy, which reduces the chance of data getting out of sync.
From a practical viewpoint, however, normalized data isn't always the best design, even if it is in theory. If you don't really know the finer points, though, aiming for normalized data isn't such a bad idea.
In phpMyAdmin > 4.3.0, under Structure -> Table structure, above the table you get:
"Print", "Propose table structure", "Track table", "Move columns", "Improve table structure". Under "Improve table structure" you get a wizard which says:
Improve table structure (Normalization):
Select up to what step you want to normalize
First step of normalization (1NF)
Second step of normalization (1NF+2NF)
Third step of normalization (1NF+2NF+3NF)
To question 2: no, it is not possible to insert data into multiple tables with one query.
See the INSERT syntax.
In addition to other answers, you can also search here on SO for normalization and find e.g. the question: Normalization in MySQL