Optimising Database Structure

I'm developing a reward system for our VLE which uses three separate technologies - JavaScript for most of the client-side/display processing, PHP to communicate with the database and MySQL for the database itself.
I've attached three screenshots of my "transactions" table: its structure, a few example records and an overview of its details.
The premise is that members of staff award points to students for good behaviour etc. This can mean that classes of 30 students are given points at a single time. Staff have a limit of 300 points/week and there are around 85 staff currently accessing the system (this may rise).
The way I'm doing it at the moment, every "transaction" has a "Giver_ID" (the member of staff awarding points), a "Recipient_ID" (the student receiving the points), a category and a reason. This way, every time a member of staff issues 30 points, I'm putting 30 rows into the database.
This seemed to work early on, but within three weeks I already have over 12,000 transactions in the database.
At this point it gets a bit more complicated. On the Assign Points page (another screenshot attached), when a teacher clicks into one of their classes or searches for an individual student, I want the students' points to be displayed. The only way I can currently do this on my system is to do a "SELECT * FROM `transactions`" and put all the information into an array using the following JS:
// Running total of points per student, keyed by Recipient_ID.
var Points = {};
function getPoints(data) {
    for (var i = 0; i < data.length; i++) {
        var id = data[i].Recipient_ID;
        // Parse with radix 10 so every total is stored as a number, not a string.
        Points[id] = (Points[id] || 0) + parseInt(data[i].Points, 10);
    }
}
When logging in to the system internally, this appears to work quickly enough. When logging in externally however, this process takes around 20 seconds, and thus doesn't display the students' points values until you've clicked/searched a few times.
I'm using the following code in my PHP to access these transactions:
function getTotalPoints() {
    $sql = "SELECT *
            FROM `transactions`";
    $res = mysql_query($sql);
    $rows = array();
    while ($r = mysql_fetch_assoc($res)) {
        $rows[] = $r;
    }
    if ($rows) {
        return $rows;
    } else {
        $err = array("err_id" => 1);
        return $err;
    }
}
So, my question is, how should I actually be approaching this? Full-text indices; maybe a student table with their total points values which gets updated every time a transaction is entered; mass-transactions (i.e. more than one student receiving the same points for the same category) grouped into a single database row? These are all things I've contemplated but I'd love someone with more DB knowledge than myself to provide enlightenment.
Screenshots: Example records, Table structure, Table overview, Assign Points interface.
Many thanks in advance.

Your problem is your query:
SELECT * FROM `transactions`
As your data set gets bigger, this will take longer to load and require more memory to store. Instead, determine exactly which data you need. If it's for a particular user:
SELECT SUM(points) FROM `transactions` WHERE Recipient_ID=[x]
Or if you want all the sums for all your students:
SELECT Recipient_ID, SUM(points) AS Total_Points FROM `transactions` GROUP BY Recipient_ID;
To speed up selections on a particular field, you can add an index on that field. The benefit becomes more noticeable as the table grows.
ALTER TABLE `transactions` ADD INDEX Recipient_ID (Recipient_ID);
Or if you want to display a paginated list of all the entries in transactions:
SELECT * FROM `transactions` ORDER BY Datetime LIMIT [page*num_records_per_page],[num_records_per_page];
e.g.: SELECT * FROM `transactions` ORDER BY Datetime LIMIT 0,25; # First 25 records
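The aggregate queries and the index suggested above can be tried end to end with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, so the index syntax differs slightly from the ALTER TABLE statement; the Transaction_ID column, the index name idx_recipient and the sample rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "transactions" table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " Transaction_ID INTEGER PRIMARY KEY,"  # hypothetical key column
    " Giver_ID INTEGER, Recipient_ID INTEGER, Points INTEGER)"
)
# Index on the column used for filtering and grouping.
conn.execute("CREATE INDEX idx_recipient ON transactions (Recipient_ID)")

# Made-up sample data: (Giver_ID, Recipient_ID, Points).
rows = [(1, 101, 1), (1, 102, 1), (2, 101, 3), (2, 103, 2), (1, 101, 1)]
conn.executemany(
    "INSERT INTO transactions (Giver_ID, Recipient_ID, Points) VALUES (?, ?, ?)",
    rows,
)

# One result row per student instead of one row per transaction.
totals = dict(conn.execute(
    "SELECT Recipient_ID, SUM(Points) AS Total_Points"
    " FROM transactions GROUP BY Recipient_ID"
))

# Total for a single student, with the ID passed as a bound parameter.
one_total = conn.execute(
    "SELECT SUM(Points) FROM transactions WHERE Recipient_ID = ?", (101,)
).fetchone()[0]

print(totals)
print(one_total)
```

The point of the design is that only the per-student totals cross the network, so the payload grows with the number of students rather than with the number of transactions.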

Adding to Tom's suggestion, you may want to consider normalizing your database further. I assume you now have 3 tables:
students (id, name, ...)
staff (id, name, ...)
transactions (id, student_id, staff_id, points, date, reason)
A more normalized form uses more tables with less data:
students (id, name, ...)
staff (id, name, ...)
transactions (id, staff_id, points, date, reason)
transactions_students (transaction_id, student_id)
Adding a transaction then becomes a two-step process: First you create one transaction record, and then you insert multiple records into transactions_students, each one linking the transaction to one student. Note that you can create a view that behaves exactly like the original denormalized table for selecting, something like:
CREATE VIEW vw_transactions AS SELECT transactions.*, transactions_students.student_id FROM transactions INNER JOIN transactions_students ON transactions_students.transaction_id = transactions.id;
This will drastically reduce the number of records in the transactions table, and it avoids storing the date and the reason redundantly. The downside is that linking transactions to students requires one extra join, but if you have your foreign keys and indexes set up properly, this doesn't have to be a problem at all.
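The two-step insert and the view can be sketched as follows. This is a minimal illustration using Python's sqlite3 module with an in-memory database instead of MySQL; the helper name award_points and the sample values are assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE transactions (
    id INTEGER PRIMARY KEY, staff_id INTEGER, points INTEGER,
    date TEXT, reason TEXT
);
CREATE TABLE transactions_students (
    transaction_id INTEGER REFERENCES transactions(id),
    student_id INTEGER,
    PRIMARY KEY (transaction_id, student_id)
);
CREATE VIEW vw_transactions AS
    SELECT t.*, ts.student_id
    FROM transactions t
    INNER JOIN transactions_students ts ON ts.transaction_id = t.id;
""")

def award_points(conn, staff_id, points, date, reason, student_ids):
    """Hypothetical helper: one transaction row, then one link row per student."""
    with conn:  # commit both steps together, or roll back on error
        cur = conn.execute(
            "INSERT INTO transactions (staff_id, points, date, reason)"
            " VALUES (?, ?, ?, ?)",
            (staff_id, points, date, reason),
        )
        conn.executemany(
            "INSERT INTO transactions_students VALUES (?, ?)",
            [(cur.lastrowid, s) for s in student_ids],
        )
    return cur.lastrowid

award_points(conn, 7, 1, "2012-05-26", "Good behaviour", [101, 102, 103])

n_transactions = conn.execute("SELECT COUNT(*) FROM transactions").fetchone()[0]
view_rows = conn.execute(
    "SELECT student_id, points FROM vw_transactions ORDER BY student_id"
).fetchall()
print(n_transactions, view_rows)
```

Wrapping both steps in one database transaction means a failure halfway through cannot leave a transaction row without its students.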

I'd index Recipient_ID so you can search for one person specifically at any given point, or at least GROUP your data more effectively. If you do opt to group by Category_ID, then I'd add a separate or combined index that covers Category_ID too.
The second suggestion would be to GROUP and AGGREGATE your data on the fly. For example:
SELECT Recipient_ID, Category_ID, SUM(points) FROM transactions GROUP BY Recipient_ID, Category_ID
These two suggestions should improve your performance considerably because, instead of calculating the total points for your students on the PHP/JS side, you do it directly in the database.

Related

How do I improve the speed of these PHP MySQLi queries without indexing?

Let's start by saying that I can't use INDEXING as I need the INSERT, DELETE and UPDATE for this table to be super fast, which they are.
I have a page that displays a summary of order units collected in a database table. To populate the table an order number is created and then individual units associated with that order are scanned into the table to record which units are associated with each order.
For the purposes of this example the table has the following columns.
id, UID, order, originator, receiver, datetime
The individual unit quantities can be in the 1000's per order and the entire table is growing to hundreds of thousands of units.
The summary page displays the number of units per order and the first and last unit number for each order. I limit the number of orders to be displayed to the last 30 order numbers.
For example:
Order 10 has 200 units. first UID 1510 last UID 1756
Order 11 has 300 units. first UID 1922 last UID 2831
..........
..........
Currently the response time for the query is about 3 seconds as the code performs the following:
Look up the last 30 orders by id and sort by order number
While looking at each order number in the array
-- Count the number of database rows that have that order number
-- Select the first UID from all the rows as first
-- Select the last UID from all the rows as last
Display the result
I've determined the majority of the time is taken by the Count of the number of units in each order ~1.8 seconds and then determining the first and last numbers in each order ~1 second.
I am really interested in if there is a way to speed up these queries without INDEXING. Here is the code with the queries.
First request selects the last 30 orders processed selected by id and grouped by order number. This gives the last 30 unique order numbers.
$result = mysqli_query($con, "SELECT `order`, ANY_VALUE(receiver) AS receiver, ANY_VALUE(originator) AS originator, ANY_VALUE(id) AS id
    FROM scandb
    GROUP BY `order`
    ORDER BY id DESC
    LIMIT 30");
While fetching the last 30 order numbers count the number of units and the first and last UID for each order.
while ($row = mysqli_fetch_array($result)) {
    $count = mysqli_fetch_array(mysqli_query($con, "SELECT `order`, COUNT(*) AS count FROM scandb WHERE `order` = '".$row['order']."'"));
    $firstLast = mysqli_fetch_array(mysqli_query($con, "SELECT (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID LIMIT 1) AS 'first', (SELECT UID FROM scandb WHERE `order` = '".$row['order']."' ORDER BY UID DESC LIMIT 1) AS 'last'"));
    echo "<td align=center>".$count['count']."</td>";
    echo "<td align=center>".$firstLast['first']."</td>";
    echo "<td align=center>".$firstLast['last']."</td>";
}
With 100K lines in the database this whole query is taking about 3 seconds. The majority of the time is in the $count and $firstlast queries. I'd like to know if there is a more efficient way to get this same data in a faster time without Indexing the table. Any special tricks that anyone has would be greatly appreciated.
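One direction that needs no index is to collapse the per-order loop into a single aggregate query, so the table is read once instead of being scanned again for every order. The sketch below is only an illustration under assumptions: it uses Python's sqlite3 module with an in-memory table (where the reserved word order is quoted with double quotes rather than MySQL backticks), it treats "last 30 orders" as the orders with the highest MAX(id), and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE scandb (id INTEGER PRIMARY KEY, UID INTEGER, "order" INTEGER)'
)
# Made-up sample rows: (UID, order number).
conn.executemany(
    'INSERT INTO scandb (UID, "order") VALUES (?, ?)',
    [(1510, 10), (1600, 10), (1756, 10), (1922, 11), (2831, 11)],
)

# Count, first UID and last UID per order, computed in one query.
summary = conn.execute(
    'SELECT "order", COUNT(*) AS units, MIN(UID) AS first_uid, MAX(UID) AS last_uid'
    ' FROM scandb GROUP BY "order" ORDER BY MAX(id) DESC LIMIT 30'
).fetchall()

for order_no, units, first_uid, last_uid in summary:
    print(f"Order {order_no} has {units} units. first UID {first_uid} last UID {last_uid}")
```

Whether this is fast enough on the real table would have to be measured; it removes the repeated queries, not the cost of reading the table once.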

Design your database with caution
This first tip may seem obvious, but the fact is that most database problems come from badly-designed table structure.
For example, I have seen people storing information such as client info and payment info in the same database column. For both the database system and developers who will have to work on it, this is not a good thing.
When creating a database, always split information across separate tables, use clear naming standards and make use of primary keys.
Know what you should optimize
If you want to optimize a specific query, it is extremely useful to be able to get an in-depth look at the result of a query. Using the EXPLAIN statement, you will get lots of useful info on the result produced by a specific query, as shown in the example below:
EXPLAIN SELECT * FROM ref_table,other_table WHERE ref_table.key_column=other_table.column;
Don’t select what you don’t need
A very common way to get the desired data is to use the * symbol, which will get all fields from the desired table:
SELECT * FROM wp_posts;
Instead, you should definitely select only the desired fields as shown in the example below. On a very small site with, let’s say, one visitor per minute, that wouldn’t make a difference. But on a site such as Cats Who Code, it saves a lot of work for the database.
SELECT title, excerpt, author FROM wp_posts;
Avoid queries in loops
When using SQL along with a programming language such as PHP, it can be tempting to use SQL queries inside a loop. But doing so is like hammering your database with queries.
This example illustrates the whole “queries in loops” problem:
foreach ($display_order as $id => $ordinal) {
    $sql = "UPDATE categories SET display_order = $ordinal WHERE id = $id";
    mysql_query($sql);
}
Here is what you should do instead:
UPDATE categories
    SET display_order = CASE id
        WHEN 1 THEN 3
        WHEN 2 THEN 4
        WHEN 3 THEN 5
    END
WHERE id IN (1,2,3)
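In practice the CASE statement has to be generated from the same mapping the loop iterated over. The following is a minimal sketch of that step, written in Python against an in-memory SQLite table; the function name build_case_update is an assumption, and the values are passed as bound parameters rather than pasted into the SQL string.

```python
import sqlite3

def build_case_update(display_order):
    """Return (sql, params) for one UPDATE covering every id in the mapping."""
    if not display_order:
        raise ValueError("display_order must not be empty")
    whens = " ".join("WHEN ? THEN ?" for _ in display_order)
    in_list = ", ".join("?" for _ in display_order)
    sql = (
        f"UPDATE categories SET display_order = CASE id {whens} END "
        f"WHERE id IN ({in_list})"
    )
    # Parameters follow placeholder order: id/ordinal pairs, then the ids again.
    params = [value for pair in display_order.items() for value in pair]
    params += list(display_order)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE categories (id INTEGER PRIMARY KEY, display_order INTEGER)")
conn.executemany("INSERT INTO categories VALUES (?, 0)", [(1,), (2,), (3,), (4,)])

sql, params = build_case_update({1: 3, 2: 4, 3: 5})
conn.execute(sql, params)
result = conn.execute("SELECT id, display_order FROM categories ORDER BY id").fetchall()
print(result)
```

The WHERE id IN (...) clause matters: without it, rows not listed in the CASE would have display_order set to NULL.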
Use join instead of subqueries
As a programmer, you can be tempted to use and abuse subqueries. Subqueries, as shown below, can be very useful:
SELECT a.id,
(SELECT MAX(created)
FROM posts
WHERE author_id = a.id)
AS latest_post FROM authors a
Although subqueries are useful, they can often be replaced by a join, which is frequently faster to execute.
SELECT a.id, MAX(p.created) AS latest_post
FROM authors a
INNER JOIN posts p
ON (a.id = p.author_id)
GROUP BY a.id
Source: http://20bits.com/articles/10-tips-for-optimizing-mysql-queries-that-dont-suck/
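When swapping a subquery for a join as in the last tip, it is worth checking that both return the same rows. The sketch below, using Python's sqlite3 module with made-up authors and posts, suggests that the INNER JOIN version drops authors who have no posts, while a LEFT JOIN matches the subquery result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY);
CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, created TEXT);
INSERT INTO authors (id) VALUES (1), (2), (3);
INSERT INTO posts (author_id, created) VALUES
    (1, '2012-01-21'), (1, '2012-05-26'), (2, '2012-03-02');
""")

# Correlated subquery: one row per author, NULL when the author has no posts.
subquery = conn.execute("""
    SELECT a.id,
           (SELECT MAX(created) FROM posts WHERE author_id = a.id) AS latest_post
    FROM authors a ORDER BY a.id
""").fetchall()

# INNER JOIN: authors without posts disappear from the result.
inner_join = conn.execute("""
    SELECT a.id, MAX(p.created) AS latest_post
    FROM authors a INNER JOIN posts p ON a.id = p.author_id
    GROUP BY a.id ORDER BY a.id
""").fetchall()

# LEFT JOIN: same rows as the subquery version.
left_join = conn.execute("""
    SELECT a.id, MAX(p.created) AS latest_post
    FROM authors a LEFT JOIN posts p ON a.id = p.author_id
    GROUP BY a.id ORDER BY a.id
""").fetchall()

print(subquery, inner_join, left_join)
```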

JOIN query too slow on real database, on small one it runs fine

I need help with this mysql query that executes too long or does not execute at all.
(What I am trying to do is a part of more complex problem, where I want to create PHP cron script that will execute few heavy queries and calculate data from the results returned and then use those data to store it in database for further more convenient use. Most likely I will make question here about that process.)
First lets try to solve one of the problems with these heavy queries.
Here is the thing:
I have table: users_bonitet. This table has fields: id, user_id, bonitet, tstamp.
First important note: when I say user, please understand that users are actually companies, not people. So user.id is id of some company, but for some other reasons table that I am using here is called "users".
Three key fields in users_bonitet table are: user_id ( referencing user.id), bonitet ( represents the strength of user, it can have 3 values, 1 - 2 - 3, where 3 is the best ), and tstamp ( stores the time of bonitet insert. Every time when bonitet value changes for some user, new row is inserted with tstamp of that insert and of course new bonitet value.). So basically some user can have bonitet of 1 indicating that he is in bad situation, but after some time it can change to 3 indicating that he is doing great, and time of that change is stored in tstamp.
Now, I will just list other tables that we need to use in query, and then I will explain why. Tables are: user, club, club_offer and club_territories.
Some users ( companies ) are members of a club. Member of the club can have some club offers ( he is representing his products to the people and other club members ) and he is operating on some territory.
What I need to do is to get bonitet value for every club offer ( made by some user who is member of a club ) but only for specific territory with id of 1100000; Since bonitet values are changing over time for each user, that means that I need to get the latest one only. So if some user have bonitet of 1 at 21.01.2012, but later at 26.05.2012 it has changed to 2, I need to get only 2, since that is the current value.
I made an SQL Fiddle with an example db schema and the query that I am using right now. On this small database the query does what I want and it is fast, but on the real database it is very slow, and sometimes does not execute at all.
See it here: http://sqlfiddle.com/#!9/b0d98/2
My question is: am I using the wrong query to get all this data? I am getting the right result, but maybe my query is bad and that is why it executes so slowly? How can I speed it up? I have tried adding indexes using phpMyAdmin, but it didn't help very much.
Here is my query:
SELECT users_bonitet.user_id, users_bonitet.bonitet, users_bonitet.tstamp,
club_offer.id AS offerId, club_offer.rank
FROM users_bonitet
INNER JOIN (
SELECT max( tstamp ) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id
)lastDate ON users_bonitet.tstamp = lastDate.lastDate
AND users_bonitet.user_id = lastDate.user_id
JOIN users ON users_bonitet.user_id = users.id
JOIN club ON users.id = club.user_id
JOIN club_offer ON club.id = club_offer.club_id
JOIN club_territories ON club.id = club_territories.club_id
WHERE club_territories.territory_id = 1100000
So I am selecting bonitet values for all club offers made by users that are members of a club and operate on territory with an id of 1100000. Important thing is that I am selecting club_offer.id AS offerId, because I need to use that offerId in my application code so I can do some calculations based on bonitet values returned for each offer, and insert data that was calculated to the field "club_offer.rank" for each row with the id of offerId.

Your query looks fine. I suspect your query performance may be improved if you add a compound index to help the subquery that finds the latest entry from users_bonitet for each user.
The subquery is:
SELECT max( tstamp ) AS lastDate, user_id
FROM users_bonitet
GROUP BY user_id
If you add (user_id, tstamp) as an index to this table, that subquery can be satisfied with a very efficient loose index scan.
ALTER TABLE users_bonitet ADD KEY maxfinder (user_id, tstamp);
Notice that if this users_bonitet table has an autoincrementing id number in it, your subquery could be refactored to use that instead of tstamp. That would eliminate the possibility of duplicates and be even more efficient, because there's a unique id for joining. Like so.
FROM users_bonitet
INNER JOIN (
    SELECT MAX(id) AS id
    FROM users_bonitet
    GROUP BY user_id
) ubmax ON users_bonitet.id = ubmax.id
In this case your compound index would be (user_id, id).
Pro tip: Don't add lots of indexes unless you know you need them. It's a good idea to read up on how indexes can help you. For example: http://use-the-index-luke.com/
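Both "latest row per user" variants from this answer can be checked on a few rows. This sketch uses Python's sqlite3 module and an in-memory table instead of MySQL, so it shows only that the queries return the current bonitet per user, not how MySQL's optimizer uses the index; the sample data and the alias newest are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users_bonitet (
    id INTEGER PRIMARY KEY, user_id INTEGER, bonitet INTEGER, tstamp TEXT
);
CREATE INDEX maxfinder ON users_bonitet (user_id, tstamp);
INSERT INTO users_bonitet (user_id, bonitet, tstamp) VALUES
    (1, 1, '2012-01-21'), (1, 2, '2012-05-26'),
    (2, 3, '2012-02-10'), (2, 1, '2012-04-01');
""")

# Variant 1: join each user to the row carrying their newest tstamp.
latest = conn.execute("""
    SELECT ub.user_id, ub.bonitet, ub.tstamp
    FROM users_bonitet ub
    INNER JOIN (
        SELECT user_id, MAX(tstamp) AS lastDate
        FROM users_bonitet
        GROUP BY user_id
    ) newest ON ub.user_id = newest.user_id AND ub.tstamp = newest.lastDate
    ORDER BY ub.user_id
""").fetchall()

# Variant 2: use the highest id per user, assuming ids grow with time.
latest_by_id = conn.execute("""
    SELECT ub.user_id, ub.bonitet
    FROM users_bonitet ub
    INNER JOIN (
        SELECT MAX(id) AS id FROM users_bonitet GROUP BY user_id
    ) ubmax ON ub.id = ubmax.id
    ORDER BY ub.user_id
""").fetchall()

print(latest, latest_by_id)
```

On the real MySQL table, running EXPLAIN on the subquery before and after adding the index would show whether the index is actually used.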

MySQL: Select a table (for email) based on another table's data

Here is the issue: I have 4 tables in a DB. One is "calls" and the other 3 are the support teams "IT", "Maintenance" and "Engineering". When a row is created in the "calls" table there is a field named "Support team", and there are 3 possible options for this field: it, maintenance, and engineering. I need to be able to email these teams based on which team has been requested in the "calls" table. All of the email info is stored in the individual team's table. I hope this makes sense. If not I can diagram the issue.

Create 3 string variables to go into support_team field of the calls table: it, maintenance, engineering.
Let the variables have the same name as the database tables it, maintenance and engineering.
After inserting a new record in the calls table, use this
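$res = mysqli_query($con, "SELECT support_team FROM calls WHERE id = " . (int) $last_id); // $con: assumed open mysqli connection
$team = mysqli_fetch_assoc($res)['support_team'];
// Only interpolate a known table name into the second query.
if (in_array($team, array('it', 'maintenance', 'engineering'), true)) $members = mysqli_query($con, "SELECT * FROM `$team`");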
Where $last_id is the id of the recently inserted record in the calls table.
Since $team gets the name of one of the tables (it, maintenance or engineering), the second query just gets the details of that team's members.

Since the tables are limited you could do a bunch of left joins:
SELECT * FROM calls
LEFT JOIN team_it ON calls.`support team` = 'it' AND calls.id = team_it.id
... etc
I didn't know what the join conditions are, so I guessed the calls.id = team_it.id
If the three tables already have a "foreign key" to calls, you can just left join on that instead.

Best method for storing quiz results in MySQL

I'm trying to record test/quiz scores in a database. What's the best method to do this when there might be a lot of tests and users?
These are some options I considered: should I create a new column for each quiz and a row for each user, or does this have its limitations? Might this be slow? Should I create a new row for each user and quiz? Should I stick to my original 'user' database and encode it in text?
Elaborating a little on the plan: a JavaScript quiz submits the score with AJAX, and a script sends it to the database. I'm new to PHP so I'm not sure about a good approach.
Any help would be greatly appreciated :) this is for a school science fair

I'd suggest 3 data tables in your database: students, tests, and scores.
Each student needs to have fields for an ID and whatever else (name, dob, etc) you want to record about them.
Tests should have fields for an ID and whatever else (name, date, weight, etc).
Scores should have the student ID, a test ID, and the score (and anything else).
This means you can query a student and join with the scores table to get all the student's scores. You can also join the tests table to these results to put a label on each score and to calculate a grade based on scores and weight.
Alternately you can query for a test and join with the scores to get all the scores on a given test to get the class stats.

I would say create one database table that lists all students (name, dob, student id), and then one for all tests (score, date, written by). Will only you access the db, or can your students access it too? If the latter, you need to set up proper security or "views" to ensure each student can only see their own grades (not everyone's).

Definitely do not create dynamic columns (no column for each quiz)! Also, adding columns to the user table (or generally any table) when they do not describe the user (or generally the table's item) is a bad approach.
This is a textbook example of normalization: you should avoid storing redundant data. To do that you would create 3 tables, with foreign keys to ensure scores always reference an existing user and quiz. E.g.:
users - id, nickname, name
quizzes - id, quizName, quizOtherData
scores - id, user_id (references users.id), quiz_id (references quizzes.id), score
And then add rows to the scores table per user per quiz. Additionally, you could create a UNIQUE key on the columns user_id and quiz_id to prevent users from completing one quiz more than once.
This will be fast and will not store redundant (unneeded extra) data.
To get the results of the quiz with id e.g. 4, plus the user info of the people who have submitted this quiz, ordered from highest to lowest score, you would run a query like:
SELECT users.*, scores.score
FROM scores RIGHT JOIN users ON(users.id=scores.user_id)
WHERE scores.quiz_id = 4
ORDER BY score DESC
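I used a RIGHT JOIN here because there might be users who didn't do this quiz, while every score always has an existing user and quiz (due to the foreign keys). Note, however, that the WHERE condition on scores.quiz_id discards the rows for users without a score, so as written the query behaves like an INNER JOIN; to list those users too, move the quiz_id condition into the ON clause.
To get overall info of all users, quizzes and scores you would do something like: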
SELECT *
FROM quizzes
LEFT JOIN scores ON(quizzes.id=scores.quiz_id)
LEFT JOIN users ON(users.id=scores.user_id)
ORDER BY quizzes.id DESC, scores.score DESC, users.name ASC
BTW: If you are new to PHP (or anybody reading this), use PHP's PDO interface to communicate with your database :) AVOID functions like mysql_query, at least use mysqli_query, but for portability I would recommend stay with PDO.
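The three-table layout with its foreign keys and UNIQUE key can be tried out in a few lines. This is a sketch under assumptions: it uses Python's sqlite3 module with an in-memory database rather than MySQL with PDO, the helper try_insert and the sample users are made up, and the ranking query uses an INNER JOIN because only users who submitted the quiz are wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, nickname TEXT, name TEXT);
CREATE TABLE quizzes (id INTEGER PRIMARY KEY, quizName TEXT);
CREATE TABLE scores (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    quiz_id INTEGER NOT NULL REFERENCES quizzes(id),
    score INTEGER NOT NULL,
    UNIQUE (user_id, quiz_id)
);
INSERT INTO users (id, nickname, name) VALUES (1, 'al', 'Alice'), (2, 'bo', 'Bob');
INSERT INTO quizzes (id, quizName) VALUES (4, 'Chemistry');
INSERT INTO scores (user_id, quiz_id, score) VALUES (1, 4, 70), (2, 4, 90);
""")

def try_insert(user_id, quiz_id, score):
    """Hypothetical helper: True if the row was stored, False if a constraint rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO scores (user_id, quiz_id, score) VALUES (?, ?, ?)",
                (user_id, quiz_id, score),
            )
        return True
    except sqlite3.IntegrityError:
        return False

second_attempt = try_insert(1, 4, 100)  # same user, same quiz: blocked by UNIQUE
unknown_user = try_insert(99, 4, 50)    # no such user: blocked by the foreign key

ranking = conn.execute("""
    SELECT users.name, scores.score
    FROM scores INNER JOIN users ON users.id = scores.user_id
    WHERE scores.quiz_id = 4
    ORDER BY scores.score DESC
""").fetchall()
print(second_attempt, unknown_user, ranking)
```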

Creating class schedule table in SQL

I need to create a table to store users' class schedules. These schedules have 7 blocks a day for Monday through Friday. However, not all blocks are filled with classes.
I was planning on creating a table that stored a user's id, the period id, the class name, and the class subject in each record. If I implement it this way, what is the best way to determine when a user does not have classes using PHP? Is there a better layout for this?

You need to make three tables, and set up a many-to-many relationship.
But if you don't want to get really complex, why not just insert the student's free time like a class, call it 'free time', and then just search for those?
SELECT * FROM records WHERE student_id = '0001' AND class = 'free time'
Otherwise, I'm not sure how you'd find an empty block without having a table devoted to the blocks.

I wouldn't say you NEED to do anything, but I think you'll eventually find that normalizing is a very good idea here. http://en.wikipedia.org/wiki/Database_normalization
you probably want tables for:
student (id, name, whatever)
course (id, name, subject/dept)
section (id, course id and time info)
student_to_section (student id, section id)
Time depends. You can put start/end times on sections (SQL timestamps or integer unix time stamps would each be fine) or keep a table of time slots with unique id (then sections would just have a foreign key to this id). EDIT: sounds like you've chosen the second way
As for your free time, find the time periods for all sections taken by a student; free time is what's left. The following will give the time blocks where a student is BUSY.
SELECT T.*
FROM section S
INNER JOIN time_blocks T on S.time_id = T.id
INNER JOIN student_to_section STS on STS.section_id = S.id
WHERE STS.student_id = ###
For free time, use:
SELECT T2.*
FROM time_blocks T2
WHERE T2.id NOT IN
(put the above statement here, selecting only T.id instead of T.*)
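Put together, the free-time query can be sketched like this. It uses Python's sqlite3 module with an in-memory database; the day and block columns, the function name free_blocks and the sample rows are assumptions, and the inner query selects S.time_id directly since that is the value compared against time_blocks.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_blocks (id INTEGER PRIMARY KEY, day TEXT, block INTEGER);
CREATE TABLE section (id INTEGER PRIMARY KEY, course_id INTEGER, time_id INTEGER);
CREATE TABLE student_to_section (student_id INTEGER, section_id INTEGER);
INSERT INTO time_blocks (id, day, block) VALUES
    (1, 'Mon', 1), (2, 'Mon', 2), (3, 'Mon', 3);
INSERT INTO section (id, course_id, time_id) VALUES (10, 100, 1), (11, 101, 3);
INSERT INTO student_to_section VALUES (1, 10), (1, 11), (2, 10);
""")

# All time blocks that are not taken by any of the student's sections.
FREE_BLOCKS = """
    SELECT T2.id, T2.day, T2.block
    FROM time_blocks T2
    WHERE T2.id NOT IN (
        SELECT S.time_id
        FROM section S
        INNER JOIN student_to_section STS ON STS.section_id = S.id
        WHERE STS.student_id = ?
    )
    ORDER BY T2.id
"""

def free_blocks(student_id):
    """Return the (id, day, block) rows where the student has no class."""
    return conn.execute(FREE_BLOCKS, (student_id,)).fetchall()

print(free_blocks(1))
print(free_blocks(2))
```

One caveat with NOT IN: if section.time_id can be NULL, the comparison yields no rows at all, so that column is best declared NOT NULL.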
