One simple question that I couldn't find any answers to:
Should a name be stored in 2 different DB columns (name / surname) or in 1 column (name + surname)?
In all the projects I've worked on they were in 2 different columns, but now I have to start a new project and I was wondering which way is better to store it. I mean, the 2 different columns gave me a bit of trouble and sometimes slowed performance down. Please note this very important thing:
A very important part of the public part of the site will be an advanced search and it WILL search for the full name in about 200k records.
So, what do you suggest? 2 columns or 1? I am inclined towards the 1 column solution because I cannot find any advantages in using 2, but maybe I am wrong?
EDIT
Thank you for the answers. The only reason for this question was for the performance issue, I need all the extra boost I can get.
The point of a relational database is to relate data. If you store a full name (e.g. John Smith) in a single field, you lose the ability to easily separate out the first and last names.
If you store them in separate fields, you can VERY easily rejoin them into a single full name, but it's quite difficult to reliably pull a name apart into separate first + last name components.
Two columns are much more flexible. E.g.:
Do you ever want to sort by surname?
Do you ever want to address the person formally (eg: Dear Mr Cosmin)?
Will you ever want to search by surname and not forename, or vice versa?
200K records is a trivial amount in any properly designed database.
You may find this an interesting read on the subject of names
With two columns, you can sort by surname without having to do expensive substring operations in your select statement. It is easy to do a CONCAT to get the full name in situations that call for it, but harder to parse the last name out of names such as "John Doe-Smith" or "John Doe III".
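As a rough illustration (the table and column names here are made up, not taken from the question), both needs are one-liners when the parts are stored separately:
-- sort by surname, rebuilding the full name only for display
SELECT CONCAT(FirstName, ' ', LastName) AS FullName
FROM People
ORDER BY LastName, FirstName;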
Using 2 columns helps you in:
easy sorting data by surname
communication with user by name (eg. "Hello Michael" used on many websites etc.)
displaying a lot of data in multiple columns (you can display only surname when you have no space on screen)
Names stored in the format "Surname Name" are still easy to sort, but that may be seen as inelegant in some countries.
In my opinion, I'd rather design it as two different columns because you have various ways to handle the record. As for the performance issue, add an index on the two columns to make searching faster.
There are times when you want to search for John Doe and want it to match even when it is reversed as Doe John. That's one advantage of having separate fields for the name.
Sample design of schema,
CREATE TABLE PersonList
(
ID INT AUTO_INCREMENT,
FirstName VARCHAR(25),
LastName VARCHAR(25),
-- other fields here,
CONSTRAINT tb_pk PRIMARY KEY (ID),
CONSTRAINT tb_uq UNIQUE (FirstName, LastName)
)
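As a usage sketch against that table, a full-name search that also matches the reversed word order (the literal values are just placeholders) could look like this; the UNIQUE constraint above already gives you an index on (FirstName, LastName) that such a search can use:
SELECT ID, FirstName, LastName
FROM PersonList
WHERE (FirstName = 'John' AND LastName = 'Doe')
   OR (FirstName = 'Doe' AND LastName = 'John');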
Related
This is my first time making my own MySQL database, and I was hoping for some pointers. I've looked through previous questions and found that it IS possible to search multiple tables at once... so that expanded my possibilities.
What I am trying to do, is have a searchable / filterable listing of Snowmobile clubs on a PHP page.
These clubs should be listable by state, county or searchable for by name / other contained info.
I'd also like them to be alphabetized in the results, despite the order of entry.
Currently my thinking is to have a table for each state: NY, PA, etc.
With columns for county (varchar), clubname (varchar), street address (long text), phone (varchar), email (varchar), website address (varchar).
Should I really be making multiple tables for each county, such as NY.ALBANY, NY.MADISON?
Are the field formats I have chosen the sensible ones?
Should Address be broken into subcomponents... such as street1, street2, city, state, zip?
Eventually, I think I'd like a column "trailsopen" with a yes or no, and change the tr background to green or red based on input.
Hope this makes sense...
Here is how I would setup your db:
state
id (tinyint) //primary key auto incremented unsigned
short (varchar(2)) // stores NY, PA
long (varchar(20)) // Stores New York, Pennsylvania
county
id (int) //primary key auto incremented unsigned
state_id (tinyint) //points to state.id
name (varchar(50))
club_county
id (int) //primary key auto incremented unsigned
county_id (int) //points to county.id
club_id (int) //points to club.id
club
id (int) //primary key auto incremented unsigned
name (varchar(100))
address (varchar(100))
city (varchar(25))
zip (int)
etc...
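If it helps, here is a rough MySQL sketch of that layout (sizes and names are just the ones suggested above, so treat it as a starting point rather than the final schema):
CREATE TABLE state (
  id     TINYINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  short  VARCHAR(2) NOT NULL,   -- 'NY', 'PA'
  `long` VARCHAR(20) NOT NULL   -- 'New York', 'Pennsylvania'
);
CREATE TABLE county (
  id       INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  state_id TINYINT UNSIGNED NOT NULL,  -- points to state.id
  name     VARCHAR(50) NOT NULL
);
CREATE TABLE club (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(100) NOT NULL,
  address VARCHAR(100),
  city    VARCHAR(25),
  zip     INT
);
CREATE TABLE club_county (
  id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  county_id INT UNSIGNED NOT NULL,  -- points to county.id
  club_id   INT UNSIGNED NOT NULL   -- points to club.id
);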
From my perspective, it seems like 1 table will be enough for your needs. MySQL is so robust that there are many ways to do just about anything. I recommend downloading and using MySQL Workbench, which makes creating tables, changing tables, and writing queries easier and quicker than embedding them in a webpage.
Download MySQL Workbench -> http://dev.mysql.com/downloads/workbench/
You will also need to learn a lot about the MySQL queries. I think you can put all the info that you need in one table, and the trick is which query you use to display the information.
For example, assume you only have 1 table, with all states together. You can display just the snow mobile clubs from NY state with a query like this:
select * from my_table where state = "NY";
If you want to display the result alphabetic by Club Name, then you would use something like this:
select * from my_table where state = "NY" order by clubname;
There is A LOT of documentation online. So I would suggest doing quite a few hours of research and playing with MySQL Workbench.
The purpose of Stack Overflow is to answer more specific questions that have to do with specific code or queries. So once you have built a program, and get stumped on something, you can ask the specific question here. Good luck!
You can create a single table with a composite key constraint. For example:
Say I have 3 departments in a company and each has multiple sub-departments, so I can create a table like this:
Dept_id || sub_dept_id || Name || Sal || Address || Phone
...where Dept_id and sub_dept_id will jointly represent the primary key and guarantee uniqueness.
But remember, if your database is going to be very large, think before taking this step; you might need clustering or an index for that scenario.
As for the address: it's good practice to divide a main field into a number of sub-fields, so yes, you can break up the Address.
As for your yes/no... use an integer field and plan it so that if it's YES it stores 1, else 0 (zero).
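A minimal sketch of that department example in MySQL (the table name and column types are assumptions):
CREATE TABLE sub_department (
  Dept_id     INT NOT NULL,
  Sub_dept_id INT NOT NULL,
  Name        VARCHAR(100),
  Sal         DECIMAL(10,2),
  Address     VARCHAR(255),
  Phone       VARCHAR(20),
  -- composite primary key: each (Dept_id, Sub_dept_id) pair can appear only once
  PRIMARY KEY (Dept_id, Sub_dept_id)
);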
You shouldn't make individual tables for the individual counties. What you should do instead is create a table for states, a table for counties, and a table for addresses.
The result could look something like this:
state (id, code, name),
county (id, stateID, name),
club (id, countyID, name, streetAddress, etc...)
The process used to determine what to break up and when is called "database normalisation" - there are actually algorithms that do this for you. The wiki page on that is a good place to start: http://en.wikipedia.org/wiki/Database_normalization
One long text for the street address is fine, btw, as are varchars for the other fields.
Should I really be making multiple tables for each county, such as NY.ALBANY, NY.MADISON?
It depends, but in your described case an alternative might be to have one database table with all the snowmobile clubs, and one table for all the states/counties. In the clubs table you could have an id field as foreign key which links the entry to a specific state/county entry.
To get all the info together you'd just have to do a JOIN-operation on the tables (please refer to mysql documentation).
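A minimal sketch of such a join (table and column names here are assumptions, not from the question):
select clubs.clubname, regions.state, regions.county
from clubs
join regions on clubs.region_id = regions.id
order by clubs.clubname;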
Are the field formats I have chosen the sensible ones?
They would work..
Should Address be broken into subcomponents... such as street1, street2, city, state, zip?
Essentially the question here is whether you need it broken down into subcomponents, either now or in the future. If it is broken down, you have the data separated, which makes further processing (e.g. generation of form letters, automated lookups...) potentially simpler, but that depends on your processing; if you don't need it separated, why make life more complicated?
so many answers already.. agreed.
Which is good practice? To store data as a comma separated list in the database or have multiple rows?
I have a table for accounts, classes, and enrolments.
If the enrolment table has 3 fields: ID, AccountID and ClassID, is it better for ClassID to be a varchar containing a comma separated list such as this: "24,21,182,12" or for it to be just an int and have one entry per enrolment?
tldr: Don't do this. That is, don't use a "packed array" here.
Use a correctly normalized design with "multiple rows". This is likely a good candidate for a Many-to-Many relationship. Consider this structure:
Classes 1:M Enrollments(Class,Student) M:1 Students
Following a properly normalized design will reduce pain. In addition, here are some other advantages:
Referential integrity (use InnoDB)
Consistent model described with relationships
Type enforcement (can't have "foo,,")
JOIN and query without needing custom code
"What are the names of the students in class A?"
"Who is taking more than one class?"
Columns can be usefully indexed (query performance)
Generally faster than handling locally in code
More flexible and consistent
Can attach attributes to enrollments such as status
No need to have code to handle serialization at access sites
More accommodating of placeholders and ORMs
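To make that structure concrete, here is a rough sketch of the three tables and one of the example queries (names are assumptions; the point is the many-to-many join table):
CREATE TABLE students (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;
CREATE TABLE classes (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;
CREATE TABLE enrollments (
  class_id   INT NOT NULL,
  student_id INT NOT NULL,
  status     VARCHAR(20),              -- extra attributes attach naturally here
  PRIMARY KEY (class_id, student_id),
  FOREIGN KEY (class_id)   REFERENCES classes(id),
  FOREIGN KEY (student_id) REFERENCES students(id)
) ENGINE=InnoDB;
-- "What are the names of the students in class A?"
SELECT s.name
FROM students s
JOIN enrollments e ON e.student_id = s.id
JOIN classes c     ON c.id = e.class_id
WHERE c.name = 'A';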
Never ever ever cram multiple values into a single database field by combining them with some sort of delimiter, like a comma, or fixed length substrings. In the rare cases where this clearly gives a benefit in storage requirements or performance ... see rule #1: never ever ever. Ever.
When you cram multiple values into a single field, you sabotage all the clever features built into the database engine to help you retrieve and manipulate values.
Like let's say you have this -- I guess it's some sort of student database.
Plan A
student (student_id, account_id, class_id_mash)
Plan B
student (student_id, account_id)
student_class (student_id, class_id)
Okay, let's say you want a list of all the students taking class #27. With Plan B you write:
select student_id
from student join student_class on student.student_id=student_class.student_id
where class_id=27
Easy.
How would you do it with Plan A? You might think
select student_id
from student
where class_id_mash like '%27%'
But that will not only find all students in class 27, but also all those in class 127 or 272.
Okay, how about:
select student_id
from student
where class_id_mash like '%,27,%'
There, now we won't find 127 or 272! But, oops, we also won't find it if the 27 happens to be the first or last one in the list, because then there aren't commas on both sides.
So okay, maybe we could get around that with more rules about delimiters or with a more complex matching expression. But it would be unnecessarily complex and painful.
And even if we did it, every search for a class id has to be a full sequential scan. With one value per field and multiple records, you can create an index on the class_id field for fast, efficient retrieval. (Some database engines have ways to index into the middle of text fields, but again, why get into complicated solutions when there's an easy one?)
How do we validate the class_id's? With separate fields, we can say "class_id references class" and the database engine will ensure that we don't enter an illegal value. With the mash, no such free validation.
I have done both, but instead of storing the information in the database as comma separated, I use another delimiter, such as | (so that I don't worry about formatting on insert into the db). It's more about how often you will query the data.
If you are only going to need the complete list, it is fine to store it as a comma separated value. But if you need to query the list, they should be stored separately.
Currently I'm storing 3 bits of information about a person's name.
name,nicename,searchname = ("Mr.Joe bloggs", "Mr-Joe-bloggs", "mrjoebloggs")
Name used for a user's display name, nicename for the url and searchname for realtime searching the database (so speed is a must, milliseconds matter!)
Currently one table holds all 3 fields, but how much more efficient would it be to store each field in a separate table and relate everything by id?
Or would that just waste extra selects relating them to one another? The DB will have over 100m records.
If you insist on keeping those three fields, you'd be creating a one-to-one relationship with every piece of data. It would make sense to keep them all in the same row.
However, you might find it better to only store the name. When you need the "nice name", you can use a regex to replace periods and space (and other characters) with hyphens (or remove them). When a user searches for "mr joe bloggs", you can make a simple searching algorithm by dividing up the three words and using the LIKE clause.
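As a rough sketch of that approach (assuming a single name column on a users table; both names are placeholders), the search could look like:
SELECT name
FROM users
WHERE name LIKE '%mr%'
  AND name LIKE '%joe%'
  AND name LIKE '%bloggs%';
Bear in mind that LIKE patterns with a leading wildcard can't use an ordinary index, so at 100m rows you would probably want a FULLTEXT index (or an external search engine) rather than plain LIKE.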
I don't even know if calling it a serialized column is right, but I'm going to explain myself. For example, I have a table for users, and I want to store the users' phone numbers (cellphone, home, office, etc.). I was thinking of making a column for each number type, but at the same time an idea came to my head: what if I save a JSON string in a single column? So, I will never have a column that probably will never be used, and I can turn that string into a PHP array when reading the data from the database. But I would like to hear the goods and bads of this practice; maybe it is just a bad idea, but first I want to know what other people have to say about it.
Thanks
Short Answer, Multiple columns.
Long Answer:
For the love of all that is holy in the world, please do not store multiple data sets in a single text column.
I am assuming you will have a table that will either be
+------------------------------+ +----------------------+
| User | cell | office | home | OR | User | JSON String |
+------------------------------+ +----------------------+
First I will say that neither of these solutions is the best, but if you were to pick from the two, the first is better. There are a couple of reasons, but mainly the ability to modify and query specific fields is really important. Think about the algorithm needed to modify the second option:
SELECT `JSON` FROM `table` WHERE `User` = ?
Then you have to do a search and replace in either your server side or client side language
Finally you have to reinsert the JSON string
This solution totals 2 queries and a search and replace algorithm. No Good!
Now think about the first solution.
SELECT * FROM `table` WHERE `User` = ?
Then you can do a simple JSON encode to send it down
To modify you only need one Query.
UPDATE `table` SET `cell` = ? WHERE `User` = ?
To update more than one, it's again a simple single query:
UPDATE `table` SET `cell` = ?, `home` = ? WHERE `User` = ?
This is clearly better, but it is not the best.
There is a third solution. Say you want a user to be able to insert any number of phone numbers.
Lets use a relation table for that so now you have two tables.
+---------+     +-------------------------------------+
| Users   |     | Phone                               |
+---------+     +-------------------------------------+
| U_name  |     | user_name | phone_number | type     |
+---------+     +-------------------------------------+
Now you can query all the phone numbers of a user via a join:
SELECT Users.*, Phone.* FROM Phone, Users WHERE Phone.user_name = Users.U_name AND Users.U_name = ?
Inserts are just as easy and type checking is easy too.
Remember, this is a simple example, but SQL really provides a ton of power for your data structure; you should use it rather than avoid it.
I would only do this with non-essential data, for example, the user's favorite color, favorite type of marsupial (obviously 'non-essential' is for you to decide). The problem with doing this for essential data (phone number, username, email, first name, last name, etc) is that you limit yourself to what you can accomplish with the database. These include indexing fields, using ORDER BY clauses, or even searching for a specific piece of data. If later on you realize you need to perform any of these tasks it's going to be a major headache.
Your best bet in this situation is using a relational table for 1 to many objects - e.g. UserPhoneNumbers. It would have 3 columns: user_id, phone_number, and type. The user_id lets you link the rows in this table to the appropriate User table row, the phone_number is self explanatory, and the type could be 'home', 'cell', 'office', etc. This lets you still perform the tasks I mentioned above, and it also has the added benefit of not wasting space on empty columns, as you only add rows to this table as you need to.
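A minimal sketch of that relational table plus a join back to the user row (the Users table's column names here are assumptions):
CREATE TABLE UserPhoneNumbers (
  user_id      INT NOT NULL,           -- links to the Users table's primary key
  phone_number VARCHAR(20) NOT NULL,
  type         VARCHAR(10) NOT NULL    -- 'home', 'cell', 'office', ...
);
-- all of one user's numbers in a single query
SELECT u.id, p.type, p.phone_number
FROM Users u
JOIN UserPhoneNumbers p ON p.user_id = u.id
WHERE u.id = 42;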
I don't know how familiar you are with MySQL, but if you haven't heard of database normalization and query JOINs, now is a good time to start reading up on them :)
Hope this helps.
If you work with JSON, there are more elegant ways than MySQL. I would recommend using either another database that handles JSON better, like MongoDB, or a wrapper for SQL like Persevere, http://www.persvr.org/Documentation (see "Perstore").
I'm not sure what the advantages of this approach would be. You say "so, i will never have a column that probably will never be used..." What I think you meant was (in your system) that sometimes a user may not have a value for each type of phone number available, and that being the case, why store records with empty columns?
Storing records with some empty columns is not necessarily bad. However, if you wanted to normalize your database, you could have a separate table for user_phonenumber, and create a 1:many relationship between user and user_phonenumber records. The user_phonenumber table would basically have four columns:
id (primary key)
userid (foreign key to user table)
type (e.g. cellphone, home, office, etc.)
value (the phone number)
Constraints would be that id is a primary key, userid is a foreign key for user.id, and type would be an enum (of all possible phone number types).
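In MySQL that could look roughly like this (the ENUM values are only the ones mentioned above, and the user table/column names are assumed):
CREATE TABLE user_phonenumber (
  id     INT AUTO_INCREMENT PRIMARY KEY,
  userid INT NOT NULL,
  type   ENUM('cellphone', 'home', 'office') NOT NULL,
  value  VARCHAR(20) NOT NULL,
  FOREIGN KEY (userid) REFERENCES user(id)
) ENGINE=InnoDB;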
I'm working on a food database; every food has a list of properties (fats, energy, vitamins, etc.).
These properties are made up of 50 different columns for proteins, fat, carbohydrates, vitamins, elements, etc. (there are a lot of them).
The number of columns could increase in the future, but not by too much; 80 in an extreme case.
Each column needs an individual reference to one bibliography entry from a whole list in another table (needed to check whether the value is reliable or not).
Consider the ids: each should contain a number, a NULL value, or 0 for one specific exception reference (which will point to another table).
I've thought of some solutions, but they are very different from each other, and I'm a rookie with databases, so I have no idea which is best.
Consider value_1 as proteins, value_2 as carbohydrates, etc.
The best (I hope) 2 alternatives I thought are:
(1) create one varchar(255?) column, with all 50 ids, so something like this:
column energy (7.00)
column carbohydrates (89.95)
column fats (63.12)
column value_bibl_ids (165862,14861,816486) ## as a varchar
etc...
In this case, I can split it on "," into an array and check the ids, but I'm still worried about coding practicality... this would save a lot of columns, but I don't know how practical it would be in terms of scalability either.
Mainly, I thought of this option for query optimization (I hope!).
(2) Simply using an additional id column for every value, so:
column energy (7.00)
column energy_bibl_id (165862)
column carbohydrates (89.95)
column carbohydrates_bibl_id (14861)
column fats (63.12)
column fats_bibl_id (816486)
etc...
It seems like a hefty number of columns, but it is much clearer than the first, especially in terms of the relation between each value column and its ID.
(3) Create a relational table between values and bibliographies, so:
table values
energy
carbohydrates
fats
value_id --> point to table values_and_bibliographies val_bib_id
table values_and_bibliographies
val_bib_id
energy_id --> point to table bibliographies biblio_id
carbohydrates_id --> point to table bibliographies biblio_id
fats_id --> point to table bibliographies biblio_id
table bibliographies
biblio_id
biblio_name
biblio_year
I don't know if these are the best solutions, and I shall be grateful if someone will help me to bring light on it!
You need to normalize that table. What you are doing is madness and will cause you to lose hair. They are called relational databases so you can do what you want without adding columns. You want to structure it so you add rows.
Please use real names and we can whip a schema out.
EDIT: Good edit. #3 is getting close to a sane design. But you are still very unclear about what a bibliography is doing in a food schema! I think this is what you want: you can have a food and its components linked to a bibliography. I assume a bibliography is like a recipe?
FOODS
id name
1 broccoli
2 chicken
COMPONENTS
id name
1 carbs
2 fat
3 energy
BIBLIOGRAPHIES
id name year
1 chicken soup 1995
FOOD_COMPONENTS links foods to their components
id food_id component_id bib_id value
1 1 1 1 25 grams
2 1 2 1 13 ounces
So to get data you use a join.
SELECT * from FOOD_COMPONENTS fc
INNER JOIN COMPONENTS c on fc.component_id = c.id
INNER JOIN FOODS f on fc.food_id = f.id
INNER JOIN BIBLIOGRAPHIES b on fc.bib_id = b.id
WHERE
b.name = 'Chicken Soup'
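For completeness, a rough DDL sketch of those tables (the value column is kept as a varchar here only because the sample rows mix units like grams and ounces; that's an assumption, and a numeric amount plus a unit column would also work):
CREATE TABLE FOODS (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);
CREATE TABLE COMPONENTS (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL
);
CREATE TABLE BIBLIOGRAPHIES (
  id   INT AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(100) NOT NULL,
  year INT
);
CREATE TABLE FOOD_COMPONENTS (
  id           INT AUTO_INCREMENT PRIMARY KEY,
  food_id      INT NOT NULL,
  component_id INT NOT NULL,
  bib_id       INT NOT NULL,
  value        VARCHAR(20),   -- '25 grams', '13 ounces' in the sample rows
  FOREIGN KEY (food_id)      REFERENCES FOODS(id),
  FOREIGN KEY (component_id) REFERENCES COMPONENTS(id),
  FOREIGN KEY (bib_id)       REFERENCES BIBLIOGRAPHIES(id)
);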
You seriously need to consider redesigning your database structure - it isn't recommended to keep adding columns to a table when you want to store additional data that relates to it.
In a relational database you can relate tables to one another through the use of foreign keys. Since you want to store a bunch of values that relate to your data, create a new table (called values or whatever), and then use the id from your original table as a foreign key in your new table.
The design you have proposed will make writing queries a major headache, not to mention the abundance of null values you will have in your table, assuming you don't need to fill every column.
Here's one approach you could take to allow you to add attributes all day long without changing your schema:
Table: Food - each row is a food you're describing
Id
Name
Description
...
Table: Attribute - each row is a numerical attribute that a food can have
Id
Name
MinValue
MaxValue
Unit (probably a 'repeating group', so should technically be in its own table)
Table: Bibliography - I don't know what this is, but you do
Id
...
Table: FoodAttribute - one record for each instance of a food having an attribute
Food
Attribute
Bibliography
Value
So you might have the following records
Food #1 = Cheeseburger
Attribute #1 = Fat (Unit = Grams)
Bibliography #1 = whatever relates to cheeseburgers and fat
Then, if a cheeseburger has 30 grams of fat, there would be an entry in the FoodAttribute table with 1 in the Food column, 1 in the Attribute column, a 1 in the Bibliography column, and 30 in the Value column.
(Note, you may need some other mechanisms to deal with non-numeric attributes.)
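For concreteness, a quick sketch of how that cheeseburger/fat entry might be written and read back (the table and column names simply follow the outline above; nothing here is an existing schema):
-- record that food #1 (Cheeseburger) has attribute #1 (Fat) with value 30, per bibliography #1
INSERT INTO FoodAttribute (Food, Attribute, Bibliography, Value)
VALUES (1, 1, 1, 30);
-- read it back with the food, attribute and unit names
SELECT f.Name AS Food, a.Name AS Attribute, fa.Value, a.Unit
FROM FoodAttribute fa
JOIN Food f      ON f.Id = fa.Food
JOIN Attribute a ON a.Id = fa.Attribute
WHERE f.Name = 'Cheeseburger' AND a.Name = 'Fat';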
Read about Data Modeling and Database Normalization for more info on how to approach these types of problems...
Appending more columns to a table isn't recommended or popular in the DB world, except with a NoSQL system.
Elaborate your intentions please :)
Why, for the love of $deity, are you doing this by columns? That way lies madness!
Decompose this table into rows, then put a column on each row. Without knowing more about what this is for and why it is like it is, it's hard to say more.
I re-read your question a number of times and I believe you are in fact attempting a relational schema and your concern is with the number of columns (you mention possibly 80) associated with a table. I assure you that 80 columns on a table is fine from a computational perspective. Your database can handle it. From a coding perspective, it may be high.
Proposal (1) will fail when you want to add a column. You're effectively storing all your columns in a single comma-delimited column. Bad.
I don't understand (2). It sounds the same as (3).
(3) is correct in spirit, but your example is muddled and unclear. Whittle your problem down to a simple case with five columns or something and edit your question or post again.
In short, don't worry about number of columns right now. Low on the priority list.
If you have no need to form queries based on arbitrary key/value pairs you'd like to add to every record, you could in a pinch serialize()/unserialize() an associative array and put that into a single field.