I'm in the early stages of creating a database using MySQL and PHP and would like some advice, please. I have started to collate the data and would like to start typing it into .csv files ready to import into my tables. Before I do, I'm unsure how to lay out my columns and tables properly.
Ok, I'll try my best to make clear what I'm trying to create. I'd have my home page structure where you have a choice of selecting a list of players by season or by an A-Z list of all-time players. Once you click on a specific player from the list of players it would show something like this for their player profile: http://stats.touch-line.com/playerdet.asp?playerid=41472&cust=2&lang=0&FromSTR=TRUE&compid=&teamid=1&H2H=
How many tables would I need to create?
A player table with playerID,playerName,playerDOB,playerBirthplace,playerPosition etc.
A team table with teamID,teamName,teamNickname,teamGround,teamFounded etc.
A season table with seasonID,playerID,teamID,playerApps,playerGoals?
Or is there a quicker, more efficient way without the need to use so many tables to link the data? Any advice would be much appreciated. Thanks in advance. ;)
How many tables do I need to create?
The short answer is: one table for each "entity" type. An entity can be defined as a person, place, thing, concept or event which can be uniquely identified, is of interest to the business, and about which we can store information.
One key to database design is data analysis (Richard Perkinson "Data Analysis: The Key to Database Design", QED c.1993)
You've identified some of the important entities in your model: player, team, season. There may be some other key entities that are missing, which may be discovered later.
The attributes of each entity need to be identified, and should be dependent on the key of the entity, and not some other key. (Every attribute should be dependent on the key, the whole key, and nothing but the key, so help me Codd.)
You also need to identify the relationships that exist between the entities. Can a player be a member of more than one team? Can a player have more than one position? If a player is traded (moves from one team to another), how will that be represented in the model?
Where we encounter "many-to-many" relationships, those are represented in separate relationship tables. Repeating attributes also get broken out into separate child tables.
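For example, here is a minimal sketch of how those entities and the many-to-many relationship between player and team could look in MySQL. The table and column names (and the season column) are only illustrative assumptions, not a finished design:

-- One row per player (entity table)
CREATE TABLE player (
  player_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  player_name VARCHAR(100) NOT NULL,
  player_dob  DATE,
  birthplace  VARCHAR(100)
) ENGINE=InnoDB;

-- One row per team (entity table)
CREATE TABLE team (
  team_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  team_name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

-- Relationship table: which player played for which team in which season,
-- plus the attributes that depend on that combination (appearances, goals)
CREATE TABLE player_team_season (
  player_id   INT UNSIGNED NOT NULL,
  team_id     INT UNSIGNED NOT NULL,
  season      CHAR(9) NOT NULL,   -- e.g. '2012-2013'; could also be a FK to a season table
  appearances SMALLINT UNSIGNED DEFAULT 0,
  goals       SMALLINT UNSIGNED DEFAULT 0,
  PRIMARY KEY (player_id, team_id, season),
  FOREIGN KEY (player_id) REFERENCES player (player_id),
  FOREIGN KEY (team_id)   REFERENCES team (team_id)
) ENGINE=InnoDB;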
It's important that you get the model right before you start combining multiple entities into the same table. That kind of premature optimization usually results in a broken model; it rarely fixes a model that doesn't work.
Databases are designed to handle large numbers of rows efficiently when the queries are in line with the model. Databases with dozens of tables can run very efficiently, and can run more efficiently than databases with fewer tables.
I'd be more concerned with getting a database design that works, than I would be concerned with optimizing a design that doesn't work.
Related
I'm creating a database in MySQL for a small app.
The problem is that there are many fields that are identical across different tables, like:
Table 1: Municipal Issues:
ID,
UserID,
Title,
Location,
Description,
ImageURL,
Table 2: Harassment Issues:
ID,
UserID,
Title,
Location,
Description,
ImageURL
Table 3 is the same as above.
All of these tables have almost the same columns.
I want to ask whether it's better to use relations and create a table for handling IDs and link it with the other details, or whether it's better to create a single table with an extra column for the issue type.
On one hand there will be too many tables with identical columns.
On the other hand there will be few tables with too many rows in them.
What will be best for performance: more rows or more tables?
I'm using MySQL.
Firstly, unless you expect millions of records, don't worry too much about performance; care more about the structure of your data and how easy it will be to access. Literally write down a list of the data you plan to extract in your app, e.g. "find all issues reported today", "find all unresolved issues older than 6 months", and then try to build real SQL queries against your expected structure. If they're getting hard to write, try changing the structure.
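For example, just as a rough sketch, assuming for illustration a single issues table with created_at and resolved columns (neither exists in your current schema):

-- All issues reported today
SELECT * FROM issues WHERE DATE(created_at) = CURDATE();

-- All unresolved issues older than 6 months
SELECT * FROM issues
WHERE resolved = 0
  AND created_at < NOW() - INTERVAL 6 MONTH;

If such queries come out clean, your structure is probably fine; if they turn into contortions, rethink the structure.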
To answer your question: it depends. The current structure has the following benefits:
It's easy to query a certain type of issue
It's easy to build a PHP application - just make one template form (or model) and then copy-paste it with slight changes for the other tables
In case of performance problems it may be easier to create a cluster by simply putting each table on a different db server.
and the following downsides:
It's inflexible. Adding a new field that you forgot to add in the beginning will be painful, since you'll have to change 3 (or more) tables and then the same number of pieces in your app.
Adding new types of issues will be painful and will require creating a new table.
Writing SQL for queries like "all unresolved issues (regardless of type)" will require complicated UNIONs. Moreover, these UNIONs will require creating a virtual issue-type column, otherwise you can't tell which table a certain id came from (see the sketch below).
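A sketch of that last point; the table names and the resolved flag are assumptions, since they aren't in your column list:

-- "All unresolved issues regardless of type" against the current structure:
-- every branch needs a hard-coded virtual issue_type column so you can still
-- tell which table each row came from
SELECT ID, UserID, Title, 'municipal'  AS issue_type FROM municipal_issues  WHERE resolved = 0
UNION ALL
SELECT ID, UserID, Title, 'harassment' AS issue_type FROM harassment_issues WHERE resolved = 0
UNION ALL
SELECT ID, UserID, Title, 'other'      AS issue_type FROM other_issues      WHERE resolved = 0;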
The classical db approach recommends using one table for common fields and create derived tables for fields that are different. So:
issues table should have all common fields and is identified by PK issue_id
municipal_issues uses the foreign key to issues.issue_id and has only the specific fields
harassment_issues uses the foreign key to issues.issue_id and has only the specific fields
also, the issues table has an issue_type field that takes values like "harassment" and "municipal", and tells you which child table stores the additional data.
This pattern is called "Class Table Inheritance" and you may check out the SQL Antipatterns presentation for more info and other approaches. It solves the flexibility issue and still allows re-creating each of the original tables with only one simple JOIN, which runs fast.
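A minimal sketch of that layout in MySQL; the common columns come from your question, while the type-specific column is a made-up placeholder:

-- Parent table: everything the issue types have in common
CREATE TABLE issues (
  issue_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id     INT UNSIGNED NOT NULL,
  issue_type  ENUM('municipal', 'harassment') NOT NULL,
  title       VARCHAR(255) NOT NULL,
  location    VARCHAR(255),
  description TEXT,
  image_url   VARCHAR(255)
) ENGINE=InnoDB;

-- Child table: only the fields specific to municipal issues
CREATE TABLE municipal_issues (
  issue_id   INT UNSIGNED NOT NULL PRIMARY KEY,
  department VARCHAR(100),   -- placeholder for a type-specific field
  FOREIGN KEY (issue_id) REFERENCES issues (issue_id)
) ENGINE=InnoDB;

-- Re-creating the "original" municipal table takes only one simple JOIN
SELECT i.*, m.department
FROM issues i
JOIN municipal_issues m ON m.issue_id = i.issue_id;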
Also, as a side note, you may look into the db schema of bug trackers like Mantis, since this looks like the same domain.
I'm not a database expert, so I'm not sure how to ask this question briefly and succinctly. I am trying to copy data with the following characteristics: many of the tables with data being copied contain references to other tables with data being copied; i.e., a patient might attend a class where their weight is recorded, so I need to copy both the class attendance row as well as the weight value stored in another table, which is referenced by the class attendance row. There are other, even more complex, examples in this database, but it seems that I need to perform some kind of recursive copy of these inter-referenced items so I can maintain the cross-references in the copied data.
So, is there any kind of standard approach to this problem? If there isn't a direct answer, could someone share the terminology of what I'm trying to do so that I can look it up on my own? I'm certain this problem has been tackled many times before, but I don't know how to find the solution. I understand the basic concepts of JOINs and FKs, but this solution seems to require a way to copy the rows from various tables while also going back and updating the cross-references (in some cases, these are FKs, and in other cases, they are not; I'm stuck with the schema as it is).
PS: If it's such an obvious solution, why won't anyone just provide it or characterize it below so we can move on? Most of humanity is capable of asking the occasional dumb question, and this may very well be one of mine, but I'm seriously stuck on this one and would appreciate some assistance.
Here's a sketch of a small part of the schema to try to illustrate the issue:
When we copy a patient's data record, we need to 1) create a new row in patient; 2) create a corresponding new row in edclass_session_labs; 3) create a new row in patient_lab_weight; and (here's what I see as the tricky part) 4) also update the reference in edclass_session_labs to the new row in patient_lab_weight. What I'm looking for is a way to do this programmatically and algorithmically. I'm sure problems like this have been tackled before, so that's why I'm asking for advice here.
I didn't fully understand what you mean by "copy patient data", so there are two options:
1) If you want to "copy" the data to a report, you need to link many tables with related information, so you have to study the concept of JOINs and FOREIGN KEYs. This is what we do when we need to convert relational data into a flat table that can be easily read by non-IT people.
2) If you need to copy specific data from database tables to other database tables, you also have to study FOREIGN KEYs and table relationship. You need to understand how table rows relate to rows on other tables (one to many, many to one, many to many), so you can create INSERT statements based on SELECTs that will filter the exact data you need.
This is very general, but I think it's sufficient to point you in the right direction.
EDIT:
Since the issue is related to creating a merged structure of patient data, let's say we have patient 1 and patient 2. They are duplicates of the same person, and need to be merged. I would do this, in this order:
a) Create a patient 3; this one will be the target of our merging. Simply copy each field from patient 1 or 2 to this new record.
b) Create as many new records as needed in the table "patient_lab_weight". For example, if patient 1 has 2 records there and patient 2 has 4 records, you will have to create 6 records, which are copies of the records related to patients 1 and 2 but with patient_id = 3. After creating each record here, obtain the auto_increment value generated for "patient_lab_weight_id" and insert a new record in "edclass_session_labs" with patient_id = 3 and patient_lab_weight_id = the obtained ID. Do that for each insert on "patient_lab_weight".
c) After all that, disable patients 1 and 2 in your application.
If you use this approach, you will slowly build up your new structure, linked in a consistent way.
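A rough sketch of step b) in SQL, using the table names above; the weight and recorded_at columns and the id 42 are made-up placeholders, and in practice you would loop over the source rows from your application code:

-- Copy one weight record from patient 1 over to the merged patient 3
INSERT INTO patient_lab_weight (patient_id, weight, recorded_at)
SELECT 3, weight, recorded_at
FROM patient_lab_weight
WHERE patient_lab_weight_id = 42;   -- the source row being copied

-- Capture the auto_increment id of the row just created ...
SET @new_weight_id = LAST_INSERT_ID();

-- ... and point the copied class-attendance row at it
INSERT INTO edclass_session_labs (patient_id, patient_lab_weight_id)
VALUES (3, @new_weight_id);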
I'm developing software for conducting online surveys. When a lot of users are filling in a survey simultaneously, I'm experiencing trouble handling the high database write load. My current table (MySQL, InnoDB) for storing survey data has the following columns: dataID, userID, item_1 .. item_n. The item_* columns have different data types corresponding to the type of data acquired with the specific items. Most item columns are TINYINT(1), but there are also some TEXT item columns. Large surveys can have more than a hundred items, leading to a table with more than a hundred columns. A user answers around 20 items in one HTTP POST, and the corresponding row has to be updated accordingly. The user may skip a lot of items, leading to a lot of NULL values in the row.
I'm considering the following solution to my write load problem. Instead of having a single table with many columns, I would set up several tables corresponding to the data types used, e.g. data_tinyint_1, data_smallint_6, data_text. Each of these tables would have only the following columns: userID, itemID, value (the value column has the data type corresponding to its table). For one HTTP POST with e.g. 20 items, I might then have to create 19 rows in data_tinyint_1 and one row in data_text (instead of updating one large row with many columns). However, for every item, I need to determine its data type (via two table joins) so I know in which table to create the new row. My Zend Framework based application code would get more complicated with this approach.
My questions:
Will my solution be better for heavy write load?
Do you have a better solution?
Since you're getting to the point of abstracting this schema to mimic actual data types, it might stand to reason that you should simply create new table sets per survey instead. The benefit will be that locking will lessen, and you could isolate heavy loads on separate machines if the load becomes unbearable.
The single-survey database structure then can more accurately reflect your real world conditions and data input handlers. It ought to make your abstraction headaches go away.
There's nothing wrong with creating tables on the fly. In some configurations, soft sharding is preferable.
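As a rough sketch of what such an on-the-fly, per-survey table could look like (the survey id 123 and its items are made up):

-- One response table per survey, generated when the survey is published
CREATE TABLE survey_123_responses (
  data_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  user_id INT UNSIGNED NOT NULL,
  item_1  TINYINT(1),
  item_2  TINYINT(1),
  item_3  TEXT
) ENGINE=InnoDB;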
The obvious solution here would be to use a document database for the fast writes and then bulk-insert the answers into MySQL asynchronously, using cron or something like that. You can create a view in the document database for quick statistics, but allow filtering and other complicated queries only in MySQL if you're not a fan of document DBMSs.
I don't really have any idea, although I've done some reading on the topic. All I know is that it is used to make the data in the database more efficient and easier to handle, and that it can also be used to save disk space. Lastly, if you use normalization, you will have to create more tables.
Now I have a lot of questions to ask.
First, how will normalization help to save disk space or whatever space is occupied by the database?
Second, is it possible to add data to multiple tables using only one query?
Please help, I'm just a newbie wanting to learn from you. Thanks.
OK, a couple of things:
PHP has got nothing to do with this: normalization is about modelling data
normalization is not about saving disk space. It is about organizing data so that it is easily maintainable, which in turn is a way to maintain data integrity.
normalization is typically described in a few stages or 'normal forms'. In practice, people who design relational databases often intuitively 'get it right' most of the time. But it is still good to be aware of the normal forms and their characteristics. There is a lot of documentation on this on the internet (e.g. http://en.wikipedia.org/wiki/Database_normalization), and you should certainly do your own research, but the most important stages are:
Unnormalized data: in this stage, data is not truly tabular ('relational'). There is a lot of discussion of what tabular really means, and experts disagree with one another, but most people agree that data is unnormalized if there are multi-valued attributes (columns that can contain a list of values in a single row), or if there are repeating groups (multiple columns or multiple groups of columns for storing the same type of data).
Example of multi-valued column: person (first_name, last_name, phonenumbers)
Here, phonenumbers implies there could be more than one phone number, all stored in one column
Example of repeating group: person(first_name, last_name, child1_first_name, child1_birth_date, child2_first_name, child2_birth_date..., childN_first_name, childN_birth_date)
Here, the person table has a number of column pairs (child_first_name, child_birth_date) to store the person's children.
Note that something like order (shipping_address, billing_address) is not a repeating group: the addresses for billing and shipping may be similar pieces of data, but each has its own distinct role for an order; they just represent different aspects of an order. child1 through childN do not - children do not have specific roles, and the list of children is variable (you never know how many groups you should reserve in advance).
In both cases, multi-valued columns and repeating groups, you basically have a "nested table" structure - a table within a table. Data is said to be in 1NF (first normal form) if neither of these occurs.
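A sketch of the phone-number example, first with the multi-valued column and then in 1NF (column types are assumptions):

-- Unnormalized: a multi-valued column
CREATE TABLE person (
  person_id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  first_name   VARCHAR(50),
  last_name    VARCHAR(50),
  phonenumbers VARCHAR(255)   -- '555-1234, 555-5678, ...' crammed into one column
) ENGINE=InnoDB;

-- 1NF: the nested list becomes its own child table, one row per phone number
-- (and the phonenumbers column is dropped from person)
CREATE TABLE person_phonenumber (
  person_id   INT UNSIGNED NOT NULL,
  phonenumber VARCHAR(20) NOT NULL,
  PRIMARY KEY (person_id, phonenumber),
  FOREIGN KEY (person_id) REFERENCES person (person_id)
) ENGINE=InnoDB;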
The 1NF is about structural characteristics: the tabular form of the data. All subsequent normal forms have to do with eliminating redundancy. Redundancy occurs when the same information is independently stored multiple times. Redundancy is bad: if you want to change some fact, you have to change it in multiple places. If you forget to change one of them, you have inconsistent data - the data contradicts itself.
There are a lot of processes that can eliminate redundancy, each leading to a higher normal form, all the way from 1NF up to 6NF. However, most databases are adequately normalized at 3NF (or a slight variation of it called Boyce-Codd normal form, BCNF). You should study 2NF and 3NF, but the principle is very simple: a table is adequately normalized if:
the table is in 1nf
the table has a key (a column or column combination whose values are required, and which uniquely identifies a row - ie. there can be only one row having that combination of values in the key columns)
there are no functional dependencies between the non-key columns
non-key columns are not functionally dependent upon part of the key (but are completely functionally dependent upon the entire key).
Functional dependency means that a column's value can be derived from another column. A simple example:
order_item (order_id, item_number, customer_id, product_code, product_description, amount)
Let's assume (order_id, item_number) is the key. product_code and product_description are functionally dependent upon each other: for one particular product_code, you will always find the same product_description (as if the product description were a function of product_code). The problem is: suppose a product description changes for a particular product code; you then have to change all orders that use that product_code. Forget only one and you have an inconsistent database.
The way to solve it is to create a new product table with (product_code, product_description), having (product_code) as the key, and then, instead of storing all product fields in the order, store only a reference to a row in the product table in the order_item records (in this case, order_item should keep only product_code, which is sufficient to look up a row in the product table and find the product_description).
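A sketch of that split (column types are assumptions):

-- Product facts live in exactly one place
CREATE TABLE product (
  product_code        VARCHAR(20)  NOT NULL PRIMARY KEY,
  product_description VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- order_item keeps only the reference; the description is looked up with a JOIN
CREATE TABLE order_item (
  order_id     INT UNSIGNED NOT NULL,
  item_number  INT UNSIGNED NOT NULL,
  customer_id  INT UNSIGNED NOT NULL,
  product_code VARCHAR(20)  NOT NULL,
  amount       DECIMAL(10,2),
  PRIMARY KEY (order_id, item_number),
  FOREIGN KEY (product_code) REFERENCES product (product_code)
) ENGINE=InnoDB;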
So as you can see, with this solution you do actually save space (by not storing all these product descriptions in each order_item that happens to order the product) and you do get more tables (product split off from order_item). But remember that it is not about saving disk space: it is about eliminating redundancy, thus making it easier to maintain the data, because now you only have to change one row in the product table to change the description.
There are a lot of similar questions on StackOverflow already, for example, Can someone please give an example of 1NF, 2NF and 3NF in plain english?
Look in the Related sidebar to the right for a bunch of them. That'll get you started.
As for your specific questions:
Normalization saves disk space by reducing redundant data storage. This has another benefit: if you have multiple copies of a given entity attribute in your database, they can get out of sync, while if you have a normalized database and use referential integrity, this cannot happen.
The INSERT statement references only one table. A TRIGGER on the insert statement can add rows to other tables, but there's no way to supply data to the trigger other than those columns in the table that spawned it.
When you need to insert dependent rows after inserting a row to the parent table, use the LAST_INSERT_ID() function to retrieve the auto-generated primary key value of the last INSERT statement in your session.
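For example (the parent/child tables and their columns are just placeholders):

-- Insert the parent row first ...
INSERT INTO parent (name) VALUES ('example');

-- ... then use its auto-generated id for the dependent row
INSERT INTO child (parent_id, detail)
VALUES (LAST_INSERT_ID(), 'dependent data');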
I think you will learn this when you start creating the schema for your database.
Think in reverse when you add a field that already exists somewhere else in your database.
By reverse I mean: ask yourself, if I have to modify this field, how many queries will I have to run?
You will probably end up with the answer that you have to run the query 2 or more times to modify the content of your column.
Keep it simple: assign an ID to each piece of content you have duplicated in your database.
For example, take the column address.
This is not good:
update clients set address = 'new address' where clientid=500;
update orders set address = 'new address' where orderid=300;
A good approach would be to
create an addresses table
-- and run a single query
update addresses set address = 'new address' where addressid=100;
And use the address id 100 everywhere in your database tables as a foreign key reference (clients + orders). This way the id 100 itself never changes, but if you update the content of the address, all linked tables will pick up the change, as sketched below.
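A sketch of that layout (column types are assumptions; orders would get the same address_id foreign key as clients):

CREATE TABLE addresses (
  address_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  address    VARCHAR(255) NOT NULL
) ENGINE=InnoDB;

-- clients (and orders) store only the reference
CREATE TABLE clients (
  client_id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  address_id INT UNSIGNED NOT NULL,
  FOREIGN KEY (address_id) REFERENCES addresses (address_id)
) ENGINE=InnoDB;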
Level 3 of normalization should be enough for you this time.
Normalization is a set of rules. The more you follow, the higher a "level" of normalisation your database has. In general, level 3 is the highest level sought after.
Normalised data is theoretically "purer" than non-normalised data. This makes it easier to reason about, and it removes redundancy, which reduces the chance of data getting out of sync.
From a practical viewpoint, however, normalised data isn't always the best design, even if it is in theory. If you don't really know the finer points, aiming for normalised data isn't such a bad idea though.
In phpMyAdmin >= 4.3.0, under Structure -> Table structure, above the table you get:
"Print", "Propose table structure", "Track table", "Move columns", "Improve table structure". Under "Improve table structure" you get a wizard which says:
Improve table structure (Normalization):
Select up to what step you want to normalize
First step of normalization (1NF)
Second step of normalization (1NF+2NF)
Third step of normalization (1NF+2NF+3NF)
To question 2: no, it is not possible to insert data into multiple tables with one query.
See the INSERT syntax.
In addition to other answers, you can also search here on SO for normalization and find e.g. the question: Normalization in MySQL
I've never created a shopping cart or forum in PHP, aside from viewing and analyzing another person's project or viewing tutorials that show how to make or begin such a project. How would a person know how to design the database structure to create such a thing? I'm guessing it's probably through trial and error...
You should read up on and understand the basics of normalization. For most projects, normalizing to 3rd normal form will be just fine. There are always certain scenarios where you want more or less normalization, but understanding the concepts behind it will allow you to think about how your database is structured in a normalized format.
Here's a very basic example of normalizing a table:
students
student_id
student_name
student_class
student_grade
A pretty standard table containing various data, but we can see some issues right away. We can see that a student's name is dependent on their ID; however, a student may be involved in more than one class, and each class would conceivably have a different grade. We can then think about the tables like this:
students
student_id
student_name
class
class_id
class_name
This is not bad: now we can see we have various students and various classes, but we haven't captured the students' grades.
grades
student_id
class_id
grade
Now we have a 3rd table, which allows us to understand the relation between a particular student, a particular class, and the grade associated with that class. From our initial single table, we now have 3 tables in a normalized database (let's assume we don't need to normalize grades any further for the sake of the example :) )
A few things we can glean from this very basic example (there's also a SQL sketch of the three tables at the end):
Our data is all tied to a key of some sort (student_id, class_id, and student_id + class_id); these are unique identifiers within each table.
With our keyed relations, we're able to relate pieces of information to each other (how many classes is student #4096 enrolled in?)
We can see that our tables will not contain duplicated data now (think about our first table, where student_class could be the same value for many students: if we had to change the class name, we'd have to update all those records, whereas in our normalized format we can just update the class_name for that class_id)
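And here's the promised sketch of those three tables in MySQL (the column types are assumptions):

CREATE TABLE students (
  student_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  student_name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

CREATE TABLE class (
  class_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  class_name VARCHAR(100) NOT NULL
) ENGINE=InnoDB;

-- the grade depends on the combination of student and class
CREATE TABLE grades (
  student_id INT UNSIGNED NOT NULL,
  class_id   INT UNSIGNED NOT NULL,
  grade      CHAR(2),
  PRIMARY KEY (student_id, class_id),
  FOREIGN KEY (student_id) REFERENCES students (student_id),
  FOREIGN KEY (class_id)   REFERENCES class (class_id)
) ENGINE=InnoDB;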
The main technique you can learn about database design is called Database Normalization.
Database normalization has its limits, especially if you have many transactions. At some point you may be forced to denormalize.
But IMHO it's always better to start with a normalized database design.
I would also recommend using a visual editor for creating your database schema. I have recently been using: http://dev.mysql.com/workbench/
Once you create a database design, have someone with more experience look it over and give you feedback. But know that, however much time you spend designing, you will eventually have to implement it, and you will find out you are missing something or could do an even better job another way.
So, design, get feedback, but don't be afraid to change.