I am trying to create a site where users can register and create a profile, therefore I am using two MySQL tables within a database e.g. users and user_profile.
The users table has an auto increment primary key called user_id.
The user_profile table has the same primary key called user_id however it is not auto increment.
*see note for why I have multiple tables.
When a user signs up, data from the registration form is inserted into users, then the last_insert_id() is input into the user_id field of the user_profile table. I use transactions to ensure this always happens.
My question is, is this bad practice?
Should I have a unique auto increment primary key for the user_profile table, even though one user can only ever have one profile?
Maybe there are other downsides to creating a database like this?
I'd appreciate if anyone can explain why this is a problem or if it's fine, I'd like to make sure my database is as efficient as possible.
Note: I am using seperate tables for user and user_profile because user_profile contains fields that are potentially null and also will be requested much more than the user table, due to the data being displayed on a public profile.
Maybe this is also bad practice and they should be lumped in one table?
I find this a good approach, I'd give bonus point if you use a foreign key relation and preferably cascade when deleting the user from the user table.
As too separated the core user data in one table, and the option profile data in another - good job. Nothing more annoying then a 50 field dragonish entry with 90% empty values.
It is generally frowned upon, but as long as you can provide the reasoning for the 1 to 1 relationship I'm sure it is fine.
I have used them when I have hundreds of columns (and it would be more logical to split them out into separate tables)
or I need a thinner table to speed up fullscans
In your case I would use a single table and create a couple of views.
see: http://dev.mysql.com/doc/refman/5.0/en/create-view.html
In general a single table approach is more logical, quicker, simpiler, and uses less space.
I don't think it's a bad practice. Sometimes it's quite useful, especially if you want one class to deal with authentication, and not load all profile data. You can then modify how your authentication works, build web services and so on, with little care about maintaining data structures about profiles information which is likely to change as your project evolves.
This is very good practice.
It's right at the core of writing good, modular, normalised relational database structures.
Related
In a relational database which is the best way to store a foriegn key, or relate a key, not sure how best to phrase it.
For example if you had tables:
User
ID
Name
Email
Email:
ID
Email
This is not a table I am actually creating just simple enough to get the idea across.
In the user table if I would create a key with the ID then when I select the user table the ID of the email is the value in the email field. I can reference the actual email field, this fixes that issue, but is that not wasteful?
I am using PHPMyAdmin to setup the databases and keys.
Any advice the best way to do this?
PhpMyAdmin provides a Graphical User Interface (GUI) to MySQL operations.
The best way to do this is databases, which I presume you will have already created, is to simply run a quick query!
MySQL looks pretty scary, but its actually really simple, once you grasp the concept of it!
By looking at your example, you'll most likely be looking to run a query like this one. (This is rough pseudo and may not run in itself)
ALTER TABLE User
ADD FOREIGN KEY (ID) REFERENCES Email(ID);
Easy peasy! :)
The best resource for finding out more about the syntax required is over on W3 Schools! This is the article you'll want to look at!
Edit
As a general database rule, I'd recommend making all your column names (fields) unique! This avoids confusion. For example, you could use UserID and EmailID.
I have this MySQL table, where row contact_id is unique for each user_id.
history:
- hist_id: int(11) auto_increment primary key
- user_id: int(11)
- contact_id: int(11)
- name: varchar(50)
- phone: varchar(30)
From time to time, server will receive a new list of contacts for a specific user_id and need to update this table, inserting, deleting or updating data that is different from previous information.
For example, currenty data is:
So, server receive this data:
And the new data is:
As you can see, first row (John) was updated, second row (Mary) was deleted and some other row (Jeniffer) was included.
Today what I am doing is deleting all rows with a specific user_id, and inserting the new data. But the autoincrement field (hist_id) is getting bigger and bigger...
Obs: Table have about 80 thousand records, and this update will occur 30 times a day or more.
I have some (related) questions:
1. In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
2. What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
3. Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
PS. For better approach I mean the most efficient way
Thank you so much for any help!
In this scenario, do you think deleting all records from a specific user_id and inserting updated data is a good approach?
Short Answer
No. You should be taking advantage of 'upsert' which is short for 'insert on duplicate key update'. What this means is that if they key pair you're inserting already exists, update the specified columns with the specified data. You then shorten your logic and reduce increments. Here's an example, using your table structure that should work. This is also assuming that you have set the user_id and contact_id fields to unique.
INSERT INTO history (user_id, contact_id, name, phone)
VALUES
(1, 23, 'James Jr.', '(619)-543-6222')
ON DUPLICATE KEY UPDATE
name=VALUES(name),
phone=VALUES(phone);
This query should retain the contact_id but overwrite the prexisting data with the new data.
What about removing the autoincrement field? I don't need it, but I think it is not a good idea to have a table without a primary key.
Primary keys do not imply auto incremented values. I could have a varchar field as the primary key containing names of fruits and vegetables. Is this optimized for performance? Probably not. There many situations that might call for auto increment and there are definite reasons to avoid it. It all depends on how you wish to access the data and how this can impact future expansion. In your situation, I would start over on the table structure and re-think how you wish to store and access the data. Do you want to write more logic to control the data OR do you want the data to flow naturally by itself? You've made a history table that is functioning more like a hybrid many-to-one crosswalk at first glance. Without looking at the remaining table structure, I can't necessarily say on a whim that it's not a good idea. What I can say is that I would do this a bit differently. I will answer this more specifically in the next question.
Or maybe the better approach is to loop new data, selecting each user_id / contact_id for comparing values to update?
I would avoid looping through the data in order to update it. That is a job for SQL and it does this job well. Sometimes, we might find ourselves in a situation where we must do this to either extract data in a specific format or to repair data in some way however, avoid doing this for inserting or updating the data. It can negatively impact performance and you will likely paint yourself into a corner.
Back to what I said toward the end of your second question which will help you see what I am talking about. I am going to assume that user_id is a primary key that is auto-incremented in your user table. I will do some guestimation here and show you an example of how you can redesign your user, contact and phone number structure. The following is a quick model I threw together that shows the foreign key relationship between the tables.
Note: The column names and overall data arrangement could be done differently but I did this quickly to give you a decent example of a normalized database structure. All of the foreign keys have a structural layout which separates your data in a way that enables you to control the flow of data as it enters and leaves your system. Here's the screenshot of the database model I threw together using MySQL Workbench.
(source: xonos.net)
Here's the SQL so that you can look at it more closely.
You'll notice that the "person" table is extracted from users but shares data with contacts. This enables you to store all "people" in one place, all "users" in another and all "contacts" in another. Now, why would we do this? The number one reason can be explained in two scenarios.
1.) Say we have someone, in this example I'll call him "Jim Bean". "Jim Bean" works for the company, so he is a user of the system. But, "Jim Bean" happens to own a side business and does contact work for the company at the same time. So, he is both a contact and a user of the system. In a more "flat table" environment, we would have two records for Jim Bean that contain the same data which could become outdated or incorrect, quickly.
2.) Let's say that Jim did some bad things and the company wants nothing to do with him anymore. They don't want any record of him - as if he never existed. All that we have to do is delete Jim Bean from the Person table. That's it. Since the foreign relationship has "CASCADE" on update/delete - this automatically propagate and clears out the other tables related to him.
I highly recommend that you do some reading on normalized data structure. It has saved me many hours once I got the hang of it and I will never go back.
I am developing a MySQL db for a user list, and I am trying to determine the most efficient way to design it.
My issue comes in that there are 3 types of users: "general", "normal", and "super". General and normal users differ only in the values of certain columns, so the schema to store them is identical. However, super users have at least 4 extra columns of info that needs to be stored.
In addition, each user needs a unique user_id for reference from other parts of the site.
So, I can keep all 3 users in the same table, but then I would have a lot of NULL values stored for the general and normal user rows.
Or, I can split the users into 2 tables: general/normal and super. This would get rid of the abundance of NULLs, but would require a lot more work to keep track of the user_ids and ensure they are unique, as I would have to handle that in my PHP instead of just doing a SERIAL column in the single table solution above.
Which solution is more efficient in terms of memory usage and performance?
Or is there another, better solution I am not seeing?
Thanks!
If each user needs a unique id, then you have the answer to your question: You want one users table with a UserId column. Often, that column would be an auto-incremented integer primary key column -- a good approach to the implementation.
What to do about the other columns? This depends on a number different factors, which are not well explained in your question.
You can store all the columns in the same table. In fact, you could then implement views so you can see users of only one type. However, if a lot of the extra columns are fixed-width (such as numbers) then space is still allocated. Whether or not this is an issue is simply a question of the nature of the columns and the relative numbers of different users.
You can also store the extra columns for each type in its own table. This would have a foreign key relationship to the original table, using the UserId. If both these keys are primary keys, then the joins should be very fast.
There are more exotic possibilities as well. If the columns do not need to be indexed, then MySQL 5.7 has support for JSON, so they could all go into one column. Some databases (particularly columnar-oriented ones) allows "vertical partitioning" where different columns in a single table are stored in separate allocation units. MySQL does not (yet) support vertical partitioning.
why not build an extra table; but only for the extra coloumns you need for super users? so 2 tables one with all the users and one with super users's extra info
If you want to have this type of schema. try to create a relation
like:
tb_user > user_id , user_type_id(int)
tb_user_type > user_type_id(int) , type_name
this way you will have just 2 tables and if the type is not set you can set a default value to a user.
I need help in how to setup my tables in my mysql database.
I have an user. An user gets 10 tokens to use as they wish. A token can be used only for one thing only. They can use it to create a cartoon page, game page, bio page, question page, and etc. The token can be reassigned to another page also.
I want to see what the best way to structure my database would be and the relationships.
I am thinking of having a user table, token table, cartoon table, game table, bio table, and question table. But how do I relate to all of them.
Relational database table design is not a hard science. It's a highly flexible paradigm that can and will change depending on the nature of your application. That being said, there are certain design principles which you should typically adhere to. See e.g. http://en.wikipedia.org/wiki/Database_design.
That being said, I'll take a shot at this from an approach that seems highly dynamic, since I'm not sure about the application specifics. Firstly, you will most definitely need a users table that might look like:
users:
first_name
last_name
user_id
n_tokens
I put this here in case a user's number of tokens is dynamic
Now that you have a way of specifying a user, you'll want to see how that user has used his/her tokens:
tokens:
token_id
This is used to specify what this (or these) token(s) are used for.
user_id
This should have a duplicate foreign key constraint on users.user_id.
n_tokens
This is different from users.n_tokens. It is not clear from your problem
whether or not you'll always have 1 token per feature. This value is a
forward-looking approach at possible future enhancements.
pagetype_id
This is to specify what type of page type the token is used for. It'll have
a foreign key constraint on pagetypes.pagetype_id (see below).
As noted above, you want people to use tokens on specific page types, like cartoon pages, bio pages, etc. which this table defines:
pagetypes:
pagetype_id
Used to define a unique identifier to this type of page
pagetype_name
String field to label the page types, e.g. "cartoon," "bio," etc.
Now, you may have several tables defining the types of pages, so I'll specify the layout for just one:
bio_pages:
pagetype_id
This id has a duplicate foreign key constraint on pagetypes.pagetype_id.
The value will ALWAYS be constant for this table. This is not the most
normalized design, as you MAY want to have a more generic 'pages' table,
but several complex joins can be confusing to some developers and even
worse, slow for queries.
token_id
Foreign key constraint on tokens.token_id.
OTHER
The rest of the fields are user-defined, as you wish to specify what
belongs on a bio page.
SUMMARY
To summarize the above table layouts, you'll want to design the database so that self-contained entities do not span multiple tables, and multiple entities don't combine into one table. With that said, I separated out the users table from the tokens table from the different <pagetype>_pages tables. To make the design more intuitive I'll make some sample queries that might help with your understanding of the layout.
To get all the bio pages that a user has created:
SELECT bio_pages.*
FROM bio_pages
INNER
JOIN tokens
ON bio_pages.token_id=tokens.token_id
INNER
JOIN users
ON tokens.user_id=users.user_id
WHERE users.first_name='john'
AND users.last_name='doe'
To get the total number of coins that a user has used, and also the maximum number of coins they can use:
SELECT SUM(tokens.n_tokens) AS used_tokens,
users.n_tokens AS max_tokens
FROM tokens
INNER
JOIN users
ON tokens.user_id=users.user_id
WHERE users.first_name='john'
AND users.last_name='doe'
GROUP BY users.user_id
Hopefully this all makes sense and is intuitive. Please let me know if anything is unclear.
Please I don't have any idea. Although I've made some readings on the topic. All I know is it is used to make the data in the database more efficient and easy to handle. And It can also be used to save disk space. And lastly, if you used normalization. You will have to generate more tables.
Now I have a lot of questions to ask.
First, how will normalization help to save disk space or whatever space occupied by the database.
Second, Is it possible to add data on multiple tables using only 1 query.
Please help, I'm just a newbie wanting to learn from you. Thanks.
Ok, couple of things:
php has got nothing to do with this. normalization is about modelling data
normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data-integrity.
normalization is typically described in a few stages or 'normal forms'. In practice, people that design relational databases often intuitively 'get it right' most of the time. But it is still good to be aware of the normal forms and what their characteristics are. There is a lot of documentation on that on the internet (fe http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do you own research, but the most important stages are:
unormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion of what tabular really means, and experts disagree with one another. but most people agree that data is unnormalized in case there are multi-valued attributes (=columns that can for one row contain lists as value), or in case there are repeating groups (=multiple columns or multiple groups of columns for storing the same type of data)
Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there could be more phonenumbers, stored in one column
Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.
Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order, both just represent a different aspect of an order. child1 thru child10 do not - children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance)
In both cases, multi-valued columns and repeating groups, you basically have "nested table" structure - a table within a table. Data is said to be in 1NF (first normal form) if neither of these occur.
The 1NF is about structural characeristics: the tabular form of the data. All subsequenct normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places. If you forget to chance one of them, you have inconsistent data - the data is contradicting itself.
There are a lot of processes that can eliminate redundancy, each leading to a higher normal form, all the way from 1nf up to 6nf. However, typically most databases are adequately normalized at 3nf (or a lsight variation of that called boyce-codd normal form, BCNF) You should study 2nf and 3nf, but the principle is very simple: a table is adequately normalized, if:
the table is in 1nf
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row - ie. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).
functional dependency means that a column's value can be derived from another column. simple example:
order_item (order_id, item_number, customer_id, product_code, product_description, amount)
let's assume (order_id, item_number) is key. product_code and product description are functionally dependent upon each other: for one particular product_code, you will always find the same product description (as if product description is a function of product_code). The problem is now: suppose a product description changes for a particualr product code, you have to change all orders that us that product_code. forget only one and you have an inconsistent database.
The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as key, and then instead of storing all product fields in order, only store a reference to a row in the product table in the order_item records (in this case, order_item should only keep product_code, which is sufficient to look up a row in the product table and find the product_description)
So as you u can see, with this solution you do actually save space (by not storing all these product descriptions in each order_item that happens to order the product) and you do get more tables (split off product from order_item) But just remember that it is not because of saving diskspace: it is because you eliminate redundancy, thus making it easier to maintain the data. because now you only have to change one row in the product table to change the description
There are a lot of similar questions on StackOverflow already, for example, Can someone please give an example of 1NF, 2NF and 3NF in plain english?
Look in the Related sidebar to the right for a bunch of them. That'll get you started.
As for your specific questions:
Normalization saves disk space by reducing redundant data storage. This has another benefit: if you have multiple copies of a given entity attribute in your database, they can get out of sync, while if you have a normalized database and use referential integrity, this cannot happen.
The INSERT statement references only one table. A TRIGGER on the insert statement can add rows to other tables, but there's no way to supply data to the trigger other than those columns in the table that spawned it.
When you need to insert dependent rows after inserting a row to the parent table, use the LAST_INSERT_ID() function to retrieve the auto-generated primary key value of the last INSERT statement in your session.
I think you will learn this when you start creating the schema for your database.
Please think reverse when you add a field that exists somewhere else in your database.
By reverse I mean, ask yourself: if I have to modify the field, how many queries do I have to run?
Probably you end up, with the answer, that you will have to run 2 or X times the query to modify the content of your column.
Keep it simple, that means assign an ID to each content you have duplicated in your database.
For example taking column address
this is not good
update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;
good approach would be
create a addresses table
//and run a single query
update addresses set address = 'new address' where addressid=100;
And use the address id 100 everywhere in your database table as a foreign key reference (clients+orders), this way you achieve that the id 100 is not changed, but if you update the content of the address all linked tables will pick up the change.
Level 3 of normalization is enough this time for you.
Normalization is a set of rules. The more you follow, the higher a "level" of normalisation your database has. In general, level 3 is the highest level sought after.
Normalised data is theoretically "purer" than non-normalised data. This makes it easier to rationalise about it, and it removes redundancy, which is reduces the chance of data getting out of sync.
From a pratical viewpoint however, normalised data isn't always the best design, even if it is in theory. If you don't really know the finer points, aiming for normalised data isn't such a bad idea though.
in phpmyadmin > 4.3.0, in structure -> Table structure, we got above the table:
"Print" "Propose table structure" "Track table" "Move columns" "Improve table structure" , in "Improve table structure" you got a wizard which says :
Improve table structure (Normalization):
Select up to what step you want to normalize
First step of normalization (1NF)
Second step of normalization (1NF+2NF)
Third step of normalization (1NF+2NF+3NF)
To question 2: No it is not possible to insert data into multiple tables with one query.
See the INSERT syntax.
In addition to other answers, you can also search here on SO for normalization and find e.g. the question: Normalization in MySQL