mysql distinct query using joins


I have a complex database relationship (to me it's complex). In theory, I think it is a good design, but my roadblock now is getting data out of it in as few queries as possible. Here is the database structure I have:
student table:
some fields like name, phone, email, etc.
students_requirements table (mapping table):
student_id,
requirement_id,
date
requirements table (belongs to a requirement type):
id,
requirement_type_id,
name
requirement_type table (has many requirements):
id,
type,
name,
Ok, so here is an example of how it is used. I can build requirement types. An example would be something like an assignment. Each assignment has multiple requirements. A student can pass off requirements for a specific assignment, but doesn't necessarily have requirements passed off for all assignments. So I would want to query all assignments by student. So say there are 50 assignments entered in the system, and jon smith has entered requirements for 4 of those assignments. I would like to query by jon smith id to find all assignments that he has entered any requirements for.
I hope that makes sense. My only guess is to use a join, but to be honest, I really don't understand them very well.
Any help would be awesome!

Try this:
SELECT DISTINCT requirement_type_table.*
FROM student_table, students_requirements_table,
requirements_table, requirement_type_table
WHERE student_table.name = 'Jon Smith'
AND students_requirements_table.student_id = student_table.id
AND requirements_table.id = students_requirements_table.requirement_id
AND requirement_type_table.id = requirements_table.requirement_type_id;
Check that the table names are accurate, as I've had to assume a couple of things (such as there being underscores in some of your table names). The query is split across multiple lines only for readability; MySQL accepts it either way.
I don't have a LAMP rig set up at the moment, so I can't mock this up to test it, and it's been a while since I had to write MySQL joins, but I think this is on the right track.
If you need to use LEFT JOIN then take a look at this page: Left joins to link three or more tables.
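To make the idea concrete, here is a minimal sketch that runs the same kind of query with explicit JOIN ... ON clauses and DISTINCT, so each assignment (requirement type) is listed once per student. It uses Python's built-in sqlite3 instead of MySQL so it can run anywhere; the table names follow the question, while the sample rows and the lookup by student id are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE requirement_type (id INTEGER PRIMARY KEY, type TEXT, name TEXT);
CREATE TABLE requirements (
    id INTEGER PRIMARY KEY,
    requirement_type_id INTEGER REFERENCES requirement_type(id),
    name TEXT
);
CREATE TABLE students_requirements (
    student_id INTEGER REFERENCES student(id),
    requirement_id INTEGER REFERENCES requirements(id),
    date TEXT
);
-- Hypothetical sample data: Jon has passed off requirements for 2 of 3 assignments.
INSERT INTO student VALUES (1, 'Jon Smith'), (2, 'Ann Lee');
INSERT INTO requirement_type VALUES
    (1, 'assignment', 'Essay'), (2, 'assignment', 'Lab'), (3, 'assignment', 'Quiz');
INSERT INTO requirements VALUES
    (10, 1, 'Outline'), (11, 1, 'Draft'), (20, 2, 'Report'), (30, 3, 'Answers');
INSERT INTO students_requirements VALUES
    (1, 10, '2012-01-01'), (1, 11, '2012-01-02'), (1, 20, '2012-01-03'),
    (2, 30, '2012-01-04');
""")

# Walk from the mapping table to the requirement, then to its type.
# DISTINCT collapses the two 'Essay' requirements into one assignment row.
query = """
SELECT DISTINCT rt.id, rt.name
FROM students_requirements AS sr
JOIN requirements AS r ON r.id = sr.requirement_id
JOIN requirement_type AS rt ON rt.id = r.requirement_type_id
WHERE sr.student_id = ?
ORDER BY rt.id
"""
assignments = conn.execute(query, (1,)).fetchall()
print(assignments)
```

Without DISTINCT, the Essay assignment would come back twice, once for each requirement Jon passed off.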

Related

SQL Multi Table and Multi Column Select

I'm making a MySQL database that has one table for each student in a school, and each table holds the timetable of that student. I need to be able to run a script that will search every table in the database and every column for 2 values. For example, it needs to search all tables and columns for teacher "x" where day_week = MondayA. In each table, there are 11 columns total: one for the day_week, then 5 for the lesson in each period (period 1 lesson, period 2 lesson, etc.), then another 5 for the teacher they have for each period.
Any help would be much appreciated.
Thanks.

Fix your schema
First of all, your schema sounds very bad. Every time you add a new student, you have to change it (add a new table), and if this were for a real school, that would be an absolute disaster! Changing the schema is more expensive than simply inserting a row into a table, and if your web application can directly change the database, then any security exploits that might be exposed could potentially lead to people messing with your tables without you realizing it.
On top of that, it makes querying, say, the number of students an absolute pain. Ideally, your data should be laid out in a way that lets you answer any and all questions you might ever have for it. Not just questions you have now, but further down the road.
And if that's not bad enough, it makes querying a nightmare. You have to keep track of the number of tables somehow, and their names, so that every time you query information it's running an entirely different query. Some queries, like 'List students that joined in the last year', grow in size, complexity, and time to run as the list of students (the number of tables) grows. This may be what you're running into already, though it's hard to tell simply from your question.
Normalization
Normalization is, put simply, 'Designing the schema well'. It's a bit of a vague topic, but it's broken down into varying levels; and each level depends on the last.
To be perfectly honest, I don't understand the wording of the different levels, and I'm a little bit of a newb at databases myself, but here is the gist of normalization, from what I've been taught:
Every value means one, small, simple thing
Basically, don't go crazy and put a bunch of stuff in a single column. It's bad design to have a column like, 'Categories', and the value be a long string that reads like, "Programming, Databases, Web Development, MySQL, Cows".
First of all, parsing strings is time consuming, especially the longer they are, and second of all, if those categories are associated with anything else - like, perhaps you have a table of categories for people to choose from - then now you're checking larger strings for the contents of smaller strings. If you want to pull up every item of a certain category, you will be matching that string against the ENTIRE database... Which can be excruciatingly slow.
I'm not sure if this is part of normalization, but what I've learned to do is to make a numeric 'ID' for everything I refer to in more than one table. For example, instead of a database table that has the columns 'Name', 'Address', 'Birthday', I'll have, 'ID', 'Name', 'Address', 'Birthday'. ID would be a unique number for every row, a primary key, and if at any time I wanted to refer to ANY of the people in it, I'd just use that number.
Numbers are much quicker to compare/match, much quicker to look up, and overall much nicer for the database to deal with, and let you create queries that run at very tiny fractions of the amount of time as with a string-based database.
To complete the example, you could have three tables; say, 'Articles', 'Categories', and 'Article_Categories'.
'Articles' would hold all the actual articles and their properties. Something like, 'ID', 'Title', 'Content'.
'Categories' would hold all of the individual categories available, with 'ID' and 'Category' fields.
'Article_Categories' would hold the combinations of articles to categories; a unique combination of 'Article_ID' and 'Category_ID'.
What this might look like:
Articles
1, 'Web Cow Geniuses', 'Cows have been shown to know how to create great databases for websites using MySQL.';
2, 'Why to use MySQL', "It's free, duh!";
Categories
1, Cows;
2, Databases;
3, MySQL;
4, Programming;
5, Web Development;
Article_Categories
1, 1;
1, 2;
1, 3;
1, 4;
1, 5;
2, 2;
2, 3;
Notice that each combination in 'Article_Categories' is unique; you never see, for example, '1, 3' twice. But '1' is in the first column multiple times, and '3' is in the second column multiple times.
This is called a 'many to many' table. You use it when you have a relationship between two data sets, where there are multiple combinations for mixing them. Essentially, where any number of items in one can correspond to any number of items from the other.
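As a rough sketch of how those three tables fit together, the following uses Python's built-in sqlite3 (a stand-in for MySQL so the example is runnable as-is). The table and column names and the rows come from the example above; the composite primary key on 'Article_Categories' is one reasonable way to enforce the "each combination is unique" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Articles (ID INTEGER PRIMARY KEY, Title TEXT, Content TEXT);
CREATE TABLE Categories (ID INTEGER PRIMARY KEY, Category TEXT UNIQUE);
CREATE TABLE Article_Categories (
    Article_ID INTEGER REFERENCES Articles(ID),
    Category_ID INTEGER REFERENCES Categories(ID),
    PRIMARY KEY (Article_ID, Category_ID)  -- each pair may appear only once
);
INSERT INTO Articles VALUES
    (1, 'Web Cow Geniuses', 'Cows have been shown to know how to create great databases for websites using MySQL.'),
    (2, 'Why to use MySQL', 'It''s free, duh!');
INSERT INTO Categories VALUES
    (1, 'Cows'), (2, 'Databases'), (3, 'MySQL'), (4, 'Programming'), (5, 'Web Development');
INSERT INTO Article_Categories VALUES
    (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3);
""")

# Every article in a given category: two integer joins, no string parsing.
titles = [row[0] for row in conn.execute("""
    SELECT a.Title
    FROM Articles AS a
    JOIN Article_Categories AS ac ON ac.Article_ID = a.ID
    JOIN Categories AS c ON c.ID = ac.Category_ID
    WHERE c.Category = ?
    ORDER BY a.ID
""", ("MySQL",))]

# Trying to record the pair (1, 3) a second time violates the primary key.
try:
    conn.execute("INSERT INTO Article_Categories VALUES (1, 3)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(titles, duplicate_rejected)
```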
Do not mix data and metadata
Basically, data is the content of the tables. The values inside the rows. Metadata is the tables themselves; the table names, the value types, and the relationships between two different sets of data.
Metadata inside data
Here's an example of putting metadata inside data:
A 'People' table that has, as columns, 'isStudent' and 'isTeacher'.
When data is put in 'People', you might have a row where they are both a teacher and a student, so you put something like 'ID', 'Name', 'yes', 'yes'. This doesn't sound bad, and there may well be a teacher who's taking classes at the same school so it is possible.
However, it takes up more space since you have to have a value of some sort in both columns, even if they are only one or the other.
A better way to make this would be to split it out into three separate tables:
A 'People' table that has an ID, name, and other data that every person has.
A 'Students' table that uses only the values of the 'People.ID' as data.
A 'Teachers' table that uses only the values of the 'People.ID' as data.
This way, everybody who is a student gets referenced to in 'Students', and everyone who's a teacher gets referenced in 'Teachers'. As mentioned previously, we use the 'ID' field because it's quicker to match up across tables. Now, there are only as many Teachers referenced as there need to be, and the same goes for Students. This initially takes up more space due to the size overhead of having them as separate tables, but as the database grows, this is more than made up for.
This also allows you to reference teachers directly. Say you have a table of 'Classes', and you only want Teachers capable of being the, well, Teacher. Your 'Classes' table, in the 'Teachers' column, can have a foreign key to 'Teachers.ID'. That way, if a Student hacks the database and tries to put themselves as teaching a class somehow, it's impossible for them to do so.
Data inside metadata
This is quite similar to what you appear to be having problems with.
Data is, essentially, what it is we are trying to store. Student names, teacher names, schedules for both, etc. However, sometimes we put data - like a student's name - inside of metadata - like the name of a table.
Whenever you see yourself regularly adding onto or changing the schema of a database, it is a HUGE sign that you are putting data inside of metadata. In your case, every student having their own table is essentially putting their name in the metadata.
Now, there are times where you kinda want to do this, when the number of tables will not change THAT often. It can make things simpler. For example, if you have a website selling underwear, you might have both 'Mens_Products' and 'Womens_Products' tables. Obviously the 'neater' solution would be to have a 'Product_Categories' table, in case you want to add transgender products or sell other products to both genders, but in this case it doesn't matter that much. It wouldn't be hard to add a 'Trans_Products' table, and it's not like you'd be adding new tables frequently.
Do not duplicate data
At first, this'll sound like I'm contradicting EVERYTHING I've just said. "How am I supposed to copy those IDs everywhere if I'm not supposed to duplicate data?!" But alas, that's not exactly what I mean. In fact, this is another reason for having a separate ID for each item you might refer to!
Essentially, you don't want to have to update more data than you need to. If, for example, you had a 'Birthday' column in your 'Students' and your 'Teachers' tables in the above example, and you had someone who was both a Student and a Teacher, suddenly their birthday is recorded in two different spots! Now, what if the birthday was wrong, and you wanted to change it? You'd have to change it twice!
So instead, you put it in your 'People' table. That way, for each person, it only exists once.
This might seem like an obvious example, but you'd be surprised at how often it can occur by accident. Just be careful, and watch for anything that requires you to update the same value in two different locations.
Queries
So, with all that out of the way, how should you query? What sort of SELECT statement should you use?
Let's say you have the following schema (primary key columns are marked 'Primary'):
People:
ID (Primary)
Name (Unique)
Birthday
Teachers:
People_ID (Primary; Foreign: People.ID)
Students:
People_ID (Primary; Foreign: People.ID)
Classes:
ID (Primary)
Name (Unique)
Teacher_ID (Foreign: Teachers.People_ID)
Class_Times:
Class_ID (Primary; Foreign: Classes.ID)
Day (Primary; Enum: 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday')
Start_Time
Student_Classes:
Student_ID (Primary; Foreign: Students.People_ID)
Class_ID (Primary; Foreign: Classes.ID)
First note that the primary key of 'Student_Classes' is made up of two columns. This makes the combination of the two unique, not the individual ones, which makes it a many-to-many table, as discussed earlier. I did the same for 'Class_ID' and 'Day' in 'Class_Times' so that you can't put the same class twice on the same day.
Also, it may be bad that we use an Enum for the days of the week... If we wanted to add Sunday classes, we'd have to change it, which is a change in the schema, which could potentially break things. However, I didn't feel like adding a 'Days' table and all that.
At any rate, if you wanted to find all of the teachers who were teaching on a Monday, you could just do this:
SELECT DISTINCT
People.Name
FROM
People
INNER JOIN
Teachers
ON
People.ID = Teachers.People_ID
INNER JOIN
Classes
ON
Teachers.People_ID = Classes.Teacher_ID
INNER JOIN
Class_Times
ON
Classes.ID = Class_Times.Class_ID
WHERE
Class_Times.Day = 'Monday';
Or, formatted in one big long string (like it'll be when you put it in your other programming language):
SELECT DISTINCT People.Name FROM People INNER JOIN Teachers ON People.ID = Teachers.People_ID INNER JOIN Classes ON Teachers.People_ID = Classes.Teacher_ID INNER JOIN Class_Times ON Classes.ID = Class_Times.Class_ID WHERE Class_Times.Day = 'Monday';
Essentially, here is what we do:
Select the main thing we want, the teacher's name. The name is stored in the 'People' table, so we select from that first. DISTINCT keeps a teacher with several Monday classes from being listed more than once.
We then inner join it to the 'Teachers' table, telling it that all of the People we select must be a Teacher. (A LEFT JOIN would not enforce this on its own, because it keeps People rows that have no match.)
After that, we do the same with 'Classes', narrowing it down to only Classes that the Teacher actually teaches themselves.
Then we also grab 'Class_Times' (important for the final step), but only for those Classes that the Teacher is teaching.
Finally, we specify that the Day the Class takes place must be a 'Monday'.
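The steps above can be tried out with a small sketch. It uses Python's built-in sqlite3 rather than MySQL so that it runs without a server; the table and column names follow the schema described above, while the people, classes and times are made-up sample data. It uses INNER JOIN and DISTINCT, and also shows the foreign key refusing a class whose teacher is not in 'Teachers'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE People (ID INTEGER PRIMARY KEY, Name TEXT UNIQUE, Birthday TEXT);
CREATE TABLE Teachers (People_ID INTEGER PRIMARY KEY REFERENCES People(ID));
CREATE TABLE Classes (
    ID INTEGER PRIMARY KEY,
    Name TEXT UNIQUE,
    Teacher_ID INTEGER REFERENCES Teachers(People_ID)
);
CREATE TABLE Class_Times (
    Class_ID INTEGER REFERENCES Classes(ID),
    Day TEXT,
    Start_Time TEXT,
    PRIMARY KEY (Class_ID, Day)
);
-- Hypothetical data: Alice and Bob teach, Carol does not.
INSERT INTO People VALUES (1, 'Alice', NULL), (2, 'Bob', NULL), (3, 'Carol', NULL);
INSERT INTO Teachers VALUES (1), (2);
INSERT INTO Classes VALUES (1, 'Math', 1), (2, 'Art', 1), (3, 'History', 2);
INSERT INTO Class_Times VALUES
    (1, 'Monday', '09:00'), (2, 'Monday', '11:00'), (3, 'Tuesday', '09:00');
""")

# Alice teaches two Monday classes; DISTINCT lists her once.
monday_teachers = [row[0] for row in conn.execute("""
    SELECT DISTINCT People.Name
    FROM People
    INNER JOIN Teachers ON People.ID = Teachers.People_ID
    INNER JOIN Classes ON Teachers.People_ID = Classes.Teacher_ID
    INNER JOIN Class_Times ON Classes.ID = Class_Times.Class_ID
    WHERE Class_Times.Day = ?
""", ("Monday",))]

# Carol (ID 3) is not in Teachers, so she cannot be assigned a class.
try:
    conn.execute("INSERT INTO Classes VALUES (4, 'Chemistry', 3)")
    non_teacher_rejected = False
except sqlite3.IntegrityError:
    non_teacher_rejected = True

print(monday_teachers, non_teacher_rejected)
```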

First, it's worth noting this is probably not the best approach. A table per student sounds like a bad idea. You are going to be generating massive amounts of dynamic queries and will not be able to leverage indexing, so performance will suffer. I would highly recommend finding an approach to get the tables into one table and the time series into a join table. Or look at a NoSQL (non-relational) approach. A document database seems like it might be a fit here.
That said, to answer your question: You need to query the schema (information_schema tables) for lists of tables and columns and then loop through querying the tables.
Start with the mysql docs here on information_schema
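For a sense of what that loop looks like, here is a rough sketch. It runs against Python's built-in sqlite3, so it lists tables from sqlite_master; on MySQL the equivalent list would come from information_schema.tables. The table names (student_ann, student_bob) and column names (p1_lesson, p1_teacher, and so on) are hypothetical, chosen to match the 11-column layout described in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
periods = range(1, 6)
lesson_cols = [f"p{i}_lesson" for i in periods]
teacher_cols = [f"p{i}_teacher" for i in periods]
columns = ["day_week"] + lesson_cols + teacher_cols  # 11 columns per table

def create_student_table(name, rows):
    """Create one per-student timetable table (the layout the question uses)."""
    column_defs = ", ".join(f"{c} TEXT" for c in columns)
    conn.execute(f'CREATE TABLE "{name}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)

create_student_table("student_ann", [
    ("MondayA", "Math", "Art", "PE", "French", "Physics", "x", "y", "z", "y", "z"),
])
create_student_table("student_bob", [
    ("MondayA", "Art", "Math", "PE", "French", "Physics", "y", "y", "z", "y", "z"),
    ("TuesdayA", "Math", "Art", "PE", "French", "Physics", "x", "y", "z", "y", "z"),
])

# Step 1: ask the catalog which tables exist.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

# Step 2: build and run one query per table, checking all five teacher columns.
teacher_match = " OR ".join(f"{c} = ?" for c in teacher_cols)
matches = []
for table in tables:
    sql = f'SELECT 1 FROM "{table}" WHERE day_week = ? AND ({teacher_match}) LIMIT 1'
    params = ["MondayA"] + ["x"] * len(teacher_cols)
    if conn.execute(sql, params).fetchone():
        matches.append(table)

print(matches)
```

The number of queries grows with the number of students, which is the cost the answers here warn about.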

You need to create one table for students and one for the timetable, with a foreign key to the student in the timetable. Use best practices: if you have 1000 students, you will end up creating 1000 tables, while the database is there to make life easier. Create one table and add as many entries as you want.
Secondly, ask your question more clearly using this structure so we may be able to help you.

Table 1: Student:
id firstName lastName
Table 2: Schedule:
studentID day period classID
studentID(relates to Student.id)
classID(relates to Classes.id)
Table 3: Classes:
id className teacherName
BOLD is primary key
This will gather all students that have that teacher (replace XXXX with the teacher's name):
Select S1.firstName, S1.lastName, C.teacherName from Student as S1 join Schedule as S2 on S1.id = S2.studentID join Classes as C on S2.classID = C.id where C.teacherName = 'XXXX'
This will gather all students that are in a certain class (replace XXXX with the class id):
Select S1.firstName, S1.lastName from Student as S1 join Schedule as S2 on S1.id = S2.studentID where S2.classID = XXXX
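Here is a minimal runnable sketch of this three-table layout, answering the original "teacher x on MondayA" question with a single query. It uses Python's built-in sqlite3 in place of MySQL. The table and column names follow the structure above; the sample rows and the choice of (studentID, day, period) as the Schedule primary key are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (id INTEGER PRIMARY KEY, firstName TEXT, lastName TEXT);
CREATE TABLE Classes (id INTEGER PRIMARY KEY, className TEXT, teacherName TEXT);
CREATE TABLE Schedule (
    studentID INTEGER REFERENCES Student(id),
    day TEXT,
    period INTEGER,
    classID INTEGER REFERENCES Classes(id),
    PRIMARY KEY (studentID, day, period)  -- assumed: one class per student per slot
);
-- Hypothetical sample data.
INSERT INTO Student VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
INSERT INTO Classes VALUES (1, 'Math', 'x'), (2, 'Art', 'y');
INSERT INTO Schedule VALUES
    (1, 'MondayA', 1, 1), (1, 'MondayA', 2, 2),
    (2, 'MondayA', 1, 2), (2, 'TuesdayA', 1, 1);
""")

# One query covers every student; no per-student tables to loop over.
rows = conn.execute("""
    SELECT S1.firstName, S1.lastName, S2.period
    FROM Student AS S1
    JOIN Schedule AS S2 ON S1.id = S2.studentID
    JOIN Classes AS C ON S2.classID = C.id
    WHERE C.teacherName = ? AND S2.day = ?
    ORDER BY S1.id, S2.period
""", ("x", "MondayA")).fetchall()

print(rows)
```

Bob also has teacher x, but on TuesdayA, so only Ann is returned.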

inserting form data into mysql

Please advise how to do this PHP/MySQL form and data insert. I already searched this site and couldn't find any question regarding this.
I have a form that collects student information - student_info(fields: id, name, sex, dob). I can insert this to a table. Now I would like to create two other tables like this
male_students (id, student_info_id, male_names)
female_students (id, student_info_id, female_names).
My idea for these two separate tables is because I can show the list of male and female easily by a SELECT query.
To do this, I thought I can do this but I am not sure how and if this is even a right approach.
for example I have a script called form_submit.php - this has the form
filling and submitting the form would insert data into student_info tables.
When doing step 2, I would like to check if ($sex == male) or if ($sex == female), and do an insert into male_students or female_students respectively.
But I am stuck:
Should I just write three individual queries inside form_submit.php?
How do I get the student_info_id for these two tables? I thought of LAST_INSERT_ID but I am confused about what will happen if two users fill out the form at the same time. So how should I approach this?
If this is not even a right way to approach, how to populate the data for those two tables?
Please advise.
regards

There is absolutely no reason to split "males" and "females" into their own tables in this scenario. (And I'm at a loss to imagine any scenario where it would make sense.)
The entity you're storing is, for lack of a better term, a Person. (User, Individual, etc. could be used in this context as well. Stick with whatever language is appropriate for the domain.) So a Person is a record in a table. Gender is an attribute of a Person, so it's a data element on that table. A highly simplified structure to convey this might be:
Person
----------
ID (integer)
GivenName (string)
FamilyName (string)
Gender (enumeration)
The Gender value would simply be a selected value from whichever possible options are available. Such options might include:
Male
Female
Unknown
Undisclosed
There are medical cases where there may be even more options, and psychological cases may indeed further add to the set. But for most domains that might be covered by "Unknown" or "Undisclosed" (or perhaps "Other" as an option, though that might look strange on the form to the vast majority of users).
To select this information, you'd simply add a WHERE clause to your query. Something like this:
SELECT * FROM Person WHERE Gender=1
If 1 maps to, for example, Male then this would select all Persons who have a Gender attribute of Male.
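A small sketch of this single-table approach, using Python's built-in sqlite3 as a stand-in for MySQL. The 'Person' columns come from the structure above; the numeric codes other than 1 = Male, the CHECK constraint standing in for an enumeration, and the sample people are assumptions.

```python
import sqlite3

# Assumed mapping; only "1 maps to Male" is taken from the text above.
GENDER = {"Male": 1, "Female": 2, "Unknown": 3, "Undisclosed": 4}

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Person (
    ID INTEGER PRIMARY KEY,
    GivenName TEXT,
    FamilyName TEXT,
    Gender INTEGER NOT NULL CHECK (Gender IN (1, 2, 3, 4))
)
""")

people = [("Jon", "Smith", "Male"), ("Ann", "Lee", "Female"), ("Raj", "Patel", "Male")]
new_ids = []
for given, family, gender in people:
    cur = conn.execute(
        "INSERT INTO Person (GivenName, FamilyName, Gender) VALUES (?, ?, ?)",
        (given, family, GENDER[gender]),
    )
    new_ids.append(cur.lastrowid)  # id generated by this insert on this connection

# Listing one gender is just a WHERE clause on the single table.
males = [row[0] for row in conn.execute(
    "SELECT GivenName FROM Person WHERE Gender = ? ORDER BY ID", (GENDER["Male"],))]

print(new_ids, males)
```

On the question's concurrency worry: MySQL tracks LAST_INSERT_ID() per connection, so another user submitting the form at the same time on a different connection does not change the value you read back.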

Storing database info as array

Which is good practice? To store data as a comma separated list in the database or have multiple rows?
I have a table for accounts, classes, and enrolments.
If the enrolment table has 3 fields: ID, AccountID and ClassID, is it better for ClassID to be a varchar containing a comma separated list such as this: "24,21,182,12" or for it to be just an int and have one entry per enrolment?

tldr: Don't do this. That is, don't use a "packed array" here.
Use a correctly normalized design with "multiple rows". This is likely a good candidate for a Many-to-Many relationship. Consider this structure:
Classes 1:M Enrollments(Class,Student) M:1 Students
Following a properly normalized design will reduce pain. In addition, here are some other advantages:
Referential integrity (use InnoDB)
Consistent model described with relationships
Type enforcement (can't have "foo,,")
JOIN and query without needing custom code
"What are the names of the students in class A?"
"Who is taking more than one class?"
Columns can be usefully indexed (query performance)
Generally faster than handling locally in code
More flexible and consistent
Can attach attributes to enrollments such as status
No need to have code to handle serialization at access sites
More accommodating of placeholders and ORMs

Never ever ever cram multiple values into a single database field by combining them with some sort of delimiter, like a comma, or fixed length substrings. In the rare cases where this clearly gives a benefit in storage requirements or performance ... see rule #1: never ever ever. Ever.
When you cram multiple values into a single field, you sabotage all the clever features built into the database engine to help you retrieve and manipulate values.
Like let's say you have this -- I guess it's some sort of student database.
Plan A
student (student_id, account_id, class_id_mash)
Plan B
student (student_id, account_id)
student_class (student_id, class_id)
Okay, let's say you want a list of all the students taking class #27. With Plan B you write
select student.student_id
from student join student_class on student.student_id=student_class.student_id
where class_id=27
Easy.
How would you do it with Plan A? You might think
select student_id
from student
where class_id_mash like '%27%'
But that will not only find all students in class 27, but also all those in class 127 or 272.
Okay, how about:
select student_id
from student
where class_id_mash like '%,27,%'
There, now we won't find 127 or 272! But, oops, we also won't find it if the 27 happens to be the first or last one in the list, because then there aren't commas on both sides.
So okay, maybe we could get around that with more rules about delimiters or with a more complex matching expression. But it would be unnecessarily complex and painful.
And even if we did it, every search for class id has to be a full-table sequential search. With one value per field and multiple records, you can create an index on the class_id field for fast, efficient retrieval. (Some database engines have ways to index into the middle of text fields, but again, why get into complicated solutions when there's an easy solution?)
How do we validate the class_id's? With separate fields, we can say "class_id references class" and the database engine will ensure that we don't enter an illegal value. With the mash, no such free validation.
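The difference between the two plans is easy to reproduce. The sketch below uses Python's built-in sqlite3 instead of MySQL; the Plan A table is named student_a here only so both plans can live in one database, and the enrolment data is made up to include the awkward cases (27 first in a list, 127, and 272).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Plan A: all class ids mashed into one text column.
CREATE TABLE student_a (student_id INTEGER PRIMARY KEY, account_id INTEGER, class_id_mash TEXT);
-- Plan B: one row per (student, class) pair.
CREATE TABLE student (student_id INTEGER PRIMARY KEY, account_id INTEGER);
CREATE TABLE student_class (
    student_id INTEGER REFERENCES student(student_id),
    class_id INTEGER,
    PRIMARY KEY (student_id, class_id)
);
CREATE INDEX idx_student_class_class_id ON student_class (class_id);
""")

# Hypothetical enrolments: students 1 and 3 take class 27; 2 and 4 do not.
enrolments = {1: [27, 5], 2: [127], 3: [4, 27, 9], 4: [272, 8]}
for sid, classes in enrolments.items():
    mash = ",".join(str(c) for c in classes)
    conn.execute("INSERT INTO student_a VALUES (?, ?, ?)", (sid, sid * 100, mash))
    conn.execute("INSERT INTO student VALUES (?, ?)", (sid, sid * 100))
    conn.executemany("INSERT INTO student_class VALUES (?, ?)", [(sid, c) for c in classes])

def ids(sql, *params):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql, params)]

# Plan A, first attempt: also matches 127 and 272.
loose = ids("SELECT student_id FROM student_a "
            "WHERE class_id_mash LIKE '%27%' ORDER BY student_id")
# Plan A, second attempt: misses student 1, whose list starts with 27.
strict = ids("SELECT student_id FROM student_a "
             "WHERE class_id_mash LIKE '%,27,%' ORDER BY student_id")
# Plan B: exact and able to use the index on class_id.
plan_b = ids("SELECT student.student_id FROM student "
             "JOIN student_class ON student.student_id = student_class.student_id "
             "WHERE class_id = ? ORDER BY student.student_id", 27)

print(loose, strict, plan_b)
```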

I have done both, but instead of storing the information in the database as comma separated, I use another delimiter, such as | (so that I don't worry about formatting on insert into db). It's more about how often you will query the data.

If you are only going to need the complete list, it is fine to store it as a comma separated value. But if you need to query the list, they should be stored separately.

more efficient database structure across multiple tables

I am setting up a MySQL database with multiple tables. Several of the tables will have fields with similar names that aren't necessarily for the same purpose.
For example, there's a users table that will have a name field, a category table with a name field and so on.
I've previously seen this set up either with or without a prefix on the field name, so in the above example using user_name, cat_name etc.
As these are all in separate tables, is there any benefit to structuring the database with or without this prefix? I know that when using joins and calling the data through PHP you have to add a SELECT users.name AS username... to keep the fields from overwriting each other when using mysql_fetch_array. But I'm not sure if there are any efficiencies in using one method over the other?

It depends on what your shop does or your preference. There is nothing about a prefix that will make this better. Personally I would just keep it as name since: Users.Name and Orders.Name and Products.Name all contain tuples with different object types.
At the end of the day you want to be consistent. If you prefer a cat_ and a user_ prefix just be consistent with your design and include this prefix for all object types. To me less is more.

It's really just a matter of preference. I personally prefer the approach of using just name.
One thing to watch out for though, if you're doing any SELECT * FROM ... queries (which you shouldn't be; always select fields explicitly), you may end up selecting the wrong data.

One disadvantage is if anyone is stupid enough to use natural joins (you can guess that I find this a poor practice but mysql does allow it so you need to consider if that will happen) you may end up joining on those fields with the same name by accident.
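To illustrate the overwriting problem the question mentions and how aliases avoid it, here is a small sketch in Python with the built-in sqlite3 module (standing in for PHP and MySQL). The users and categories tables, the category_id link between them, and the fetch_assoc helper are hypothetical; the helper mimics an associative-array fetch by keying each row on its column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE categories (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT,
    category_id INTEGER REFERENCES categories(id)
);
INSERT INTO categories VALUES (1, 'admins');
INSERT INTO users VALUES (1, 'alice', 1);
""")

def fetch_assoc(sql):
    """Return rows as dicts keyed by column name, like an associative fetch."""
    cur = conn.execute(sql)
    names = [d[0] for d in cur.description]
    return [dict(zip(names, row)) for row in cur]

# Both columns are typically reported as just "name", so one value
# overwrites the other when the row is turned into a dict.
collided = fetch_assoc("""
    SELECT users.name, categories.name
    FROM users JOIN categories ON categories.id = users.category_id
""")

# Explicit aliases give each column its own key, whatever the schema calls it.
aliased = fetch_assoc("""
    SELECT users.name AS user_name, categories.name AS category_name
    FROM users JOIN categories ON categories.id = users.category_id
""")

print(collided)
print(aliased)
```

Either naming scheme works once columns are selected and aliased explicitly, which fits the advice above that it comes down to consistency.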

Is it considered bad form to encode object-oriented data directly into single rows in a relational database?

I'm relatively new to databases so I apologize if there's an obvious way to approach this or if there is some fundamental process I'm missing. I'm using PHP and MySQL in a web application involving patient medical records. One requirement is that users be able to view and edit the medical records from a web page.
As I envisage it, a single Patient object has basic attributes like id, name, and address, and then each Patient also has an array of Medication objects (med_name, dose, reason), Condition objects (cond_name, date, notes), and other such objects (allergies, family history, etc.). My first thought was to have a database schema with tables as follows:
patients (id, name, address, ...)
medications ( patient_id, med_name, dose, reason)
conditions ( patient_id, cond_name, date, notes)
...
However, this seems wrong to me. Adding new medications or conditions is easy enough, but deleting or editing existing medications or conditions seems ridiculously inefficient - I'd have to, say, search through the medications table for a row matching patient_id with the old med_name, dose, and reason fields, and then delete/edit it with the new data. I could add some primary key to the medications and conditions tables to make it more efficient to find the row to edit, but that would seem like an arbitrary piece of data.
So what if I just had a single table with the following schema?
patients (id, name, address, meds, conds, ...)
Where meds and conds are simply representations (say, binary) of arrays of Medication and Condition objects? PHP can interpret this data and fetch and update it in the database as needed.
Any thoughts on best practices here would be welcome. I'm also considering switching to Ruby on Rails, so if that affects any decisions I should make I'm interested to hear that as well. Thanks a lot folks.

The 'badness' or 'goodness' of encoding your data like that depends on your needs. If you NEVER need to refer to individual smaller chunks of data in those 'meds' and 'conds' fields, then there's no problem.
However, then you're essentially reducing your database to a slightly-smarter-than-dumb storage system, and lose the benefits of the 'relational' part of SQL databases.
e.g. if you ever need to run a query for "find all patients who are taking viagra and have heart conditions", then the DBMS won't be able to directly run that query, as it has no idea how you've "hidden" the viagra/heart condition data inside those two fields, whereas with a properly normalized database you'd have:
SELECT ...
FROM patients
LEFT JOIN conditions ON patients.id = conditions.patient_id
LEFT JOIN medications ON patients.id = medications.patient_id
WHERE (medications.med_name = 'Viagra') AND (conditions.cond_name = 'Heart Disease')
and the DBMS handles everything automatically. If you're encoding everything into a single field, then you're stuck with substring operations (assuming the data's in some readable ascii format), or at worst, having to suck the entire database across to your client app, decode each field, check its contents, then throw away everything that doesn't contain viagra or heart disease - highly inefficient.
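A runnable sketch of that normalized query, using Python's built-in sqlite3 in place of MySQL. Table and column names follow the schema in the question (patients, medications, conditions); the patients and their records are invented sample data. Because the WHERE clause requires a match in both joined tables, plain JOINs give the same rows as the LEFT JOINs would.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE medications (
    patient_id INTEGER REFERENCES patients(id),
    med_name TEXT, dose TEXT, reason TEXT
);
CREATE TABLE conditions (
    patient_id INTEGER REFERENCES patients(id),
    cond_name TEXT, date TEXT, notes TEXT
);
-- Hypothetical records: only Patient A has both the medication and the condition.
INSERT INTO patients VALUES (1, 'Patient A', NULL), (2, 'Patient B', NULL), (3, 'Patient C', NULL);
INSERT INTO medications VALUES
    (1, 'Viagra', NULL, NULL), (2, 'Viagra', NULL, NULL), (3, 'Aspirin', NULL, NULL);
INSERT INTO conditions VALUES
    (1, 'Heart Disease', NULL, NULL), (2, 'Asthma', NULL, NULL), (3, 'Heart Disease', NULL, NULL);
""")

# The database does the matching; nothing has to be decoded in application code.
at_risk = [row[0] for row in conn.execute("""
    SELECT DISTINCT patients.name
    FROM patients
    JOIN conditions ON patients.id = conditions.patient_id
    JOIN medications ON patients.id = medications.patient_id
    WHERE medications.med_name = ? AND conditions.cond_name = ?
    ORDER BY patients.name
""", ("Viagra", "Heart Disease"))]

print(at_risk)
```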

This breaks first normal form. You can never query on object attributes that way.
I'd recommend either an ORM solution, if you have objects, or an object database.

I'd have to, say, search through the medications table for a row
matching patient_id with the old med_name, dose, and reason fields,
and then delete/edit it with the new data.
Assuming the key was {patient_id, med_name, start_date}, you'd just do a single update. No searching.
update medications
set reason = 'Your newly edited reason, for example.'
where patient_id = ?
and med_name = ?
and start_date = ?
Your app will already know the patient id, med name, and start date, because the user will have to somehow "select" the row those are in before any change will make sense.
If you're going to change the dosage, you need two changes, an update and an insert, in order to make sense.
update medications
set stop_date = '2012-01-12'
where patient_id = ?
and med_name = ?
and start_date = ?
-- I'm using fake data in this one.
insert into medications (patient_id, med_name, start_date, stop_date, dosage)
values (1, 'that same med', '2012-01-12', '2012-01-22', '40mg bid')
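The two cases can be sketched end to end as follows, using Python's built-in sqlite3 as a stand-in for MySQL. The key {patient_id, med_name, start_date} and the column names come from the statements above; the initial row (including its '20mg bid' dosage) is made-up sample data, and wrapping the update and insert in one transaction is a design choice of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE medications (
    patient_id INTEGER,
    med_name TEXT,
    start_date TEXT,
    stop_date TEXT,
    dosage TEXT,
    reason TEXT,
    PRIMARY KEY (patient_id, med_name, start_date)
);
-- Hypothetical existing prescription.
INSERT INTO medications VALUES
    (1, 'that same med', '2012-01-01', NULL, '20mg bid', 'Original reason');
""")

key = (1, "that same med", "2012-01-01")  # the row the user selected

# Editing a field: a single UPDATE addressed by the full key, no searching.
cur = conn.execute(
    "UPDATE medications SET reason = ? "
    "WHERE patient_id = ? AND med_name = ? AND start_date = ?",
    ("Your newly edited reason, for example.", *key),
)
edited = cur.rowcount

# Changing the dosage: close the old row and open a new one, atomically.
with conn:  # commits on success, rolls back if either statement fails
    conn.execute(
        "UPDATE medications SET stop_date = ? "
        "WHERE patient_id = ? AND med_name = ? AND start_date = ?",
        ("2012-01-12", *key),
    )
    conn.execute(
        "INSERT INTO medications (patient_id, med_name, start_date, stop_date, dosage) "
        "VALUES (?, ?, ?, ?, ?)",
        (1, "that same med", "2012-01-12", "2012-01-22", "40mg bid"),
    )

history = conn.execute(
    "SELECT start_date, stop_date, dosage FROM medications "
    "WHERE patient_id = ? ORDER BY start_date", (1,)).fetchall()

print(edited, history)
```

Keeping the old row with a stop_date preserves the dosage history instead of overwriting it.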
