I need to determine which table a piece of data comes from, for a news feed. The feed must say something like "Person has uploaded a video" or "Person has updated their bio", so I need to know where each item came from, since different types of data live in different tables. I am hoping this can be done in SQL, but probably not, so PHP is the option. I have no idea how to do this, so I just need pointing in the right direction.
I'll briefly describe the database as I don't have time to make a diagram.
There is a table titled members with all the basic info such as email, password and ID. The ID is the primary key.
All other tables have a foreign key ID column linking back to the ID in the members table.
The other tables include: tracks, status, pics, videos. All pretty self-explanatory from there.
I need to determine somehow which table the updated data comes from, so I can then tell the user what so-and-so has done. Preferably I would want only one SQL statement for the whole feed, so all the tables are joined and ordered by timestamp, making everything much simpler for me. Hopefully I can do both, but as I said, I'm really not sure.
A basic outline of the statement (the real one will be longer, but I have simplified it):
SELECT N.article, N.ID, A.ID, A.name, A.url, N.timestamp
FROM news N
LEFT JOIN artists A ON N.ID = A.ID
WHERE N.ID = A.ID
ORDER BY N.timestamp DESC
LIMIT 10
Members table:
CREATE TABLE `members` (
`ID` int(111) NOT NULL AUTO_INCREMENT,
`email` varchar(100) COLLATE latin1_general_ci NOT NULL,
`password` varchar(100) COLLATE latin1_general_ci NOT NULL,
`FNAME` varchar(100) COLLATE latin1_general_ci NOT NULL,
`SURNAME` varchar(100) COLLATE latin1_general_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`ID`),
UNIQUE KEY `email` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
Tracks table (all the other tables are pretty much the same):
CREATE TABLE `tracks` (
`ID` int(11) NOT NULL,
`url` varchar(200) COLLATE latin1_general_ci NOT NULL,
`name` varchar(100) COLLATE latin1_general_ci NOT NULL,
`timestamp` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00' ON UPDATE CURRENT_TIMESTAMP,
`track_ID` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`track_ID`),
UNIQUE KEY `url` (`url`),
UNIQUE KEY `track_ID` (`track_ID`),
KEY `ID` (`ID`),
CONSTRAINT `tracks_ibfk_1` FOREIGN KEY (`ID`) REFERENCES `members` (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci
Previously I tried using a MySQL query for each table, putting everything into an array and echoing it out. This seemed long-winded and I had no luck with it. I have since deleted all that code, as that was a week or so ago.
Please do not feel you have to go into depth with this; just point me in the right direction.
ADDITION:
Here is the SQL I have written for the trigger that was suggested. I'm not sure what is wrong, as I have never used a trigger before. When inserting something into tracks, this error comes up:
#1054 - Unknown column 'test' in 'field list'
The values in the query are just for testing at the moment
delimiter $$
CREATE
TRIGGER tracks_event AFTER INSERT
ON tracks FOR EACH ROW
BEGIN
INSERT into events(ID, action)
VALUES (3, test);
END$$
delimiter ;
UPDATE!
I have now created a table called events, as suggested, and used triggers to update it AFTER an insert on one of several tables.
Here is the query I have tried, but it is wrong. The query needs to fetch the info referenced in the events table from all the other tables and order it by timestamp.
SELECT T.url, E.ID, T.ID, E.action, T.name, T.timestamp
FROM tracks T
LEFT JOIN events E ON T.ID = E.ID
WHERE T.ID = E.ID
ORDER BY T.timestamp DESC
In that query I have only included the events and tracks tables for simplicity, but the problem is still there. There will be many more tables, so the problem will only worsen.
It's hard to describe the problem, but basically, because there is an ID in every table and one ID can perform several actions, an action can be shown with the wrong outcome, in this case the wrong url.
I will explain what's in the events table and the tracks table, and show the query outcome to illustrate.
In the events table:
4 has uploaded a track.
3 has some news.
4 has become an NBS artist.
In the tracks table:
2 uploads/abc.wav Cannonballs & Stones 2012-08-20 23:59:59 1
3 uploads/19c9aa51c821952c81be46ca9b2e9056.mp3 test 2012-08-31 23:59:59 2
4 uploads/2b412dd197d464fedcecb1e244e18faf.mp3 testing 2012-08-31 00:32:56 3
4 111 111111 0000-00-00 00:00:00 111111
Outcome of the query:
uploads/19c9aa51c821952c81be46ca9b2e9056.mp3 3 3 has some news. test 2012-08-31 23:59:59
uploads/2b412dd197d464fedcecb1e244e18faf.mp3 4 4 has uploaded a track. testing 2012-08-31 00:32:56
uploads/2b412dd197d464fedcecb1e244e18faf.mp3 4 4 has become an NBS artist. testing 2012-08-31 00:32:56
111 4 4 has become an NBS artist. 111111 0000-00-00 00:00:00
111 4 4 has uploaded a track. 111111 0000-00-00 00:00:00
As you can see, the query gives unwanted results. Every action for an ID is paired with every url for that ID, so a url can be shown more than once and with the wrong action. Because only the tracks table is in that query, the only action I would want showing is 'has uploaded a track.'
It's hard to provide the statement you want without the full details of your schema. For example, the question refers to a news table and an artists table, but doesn't provide the schemas for those, or indicate how the statement that contains those references relates to any of the other tables mentioned in the question.
Still, I think what you want can be done entirely in MySQL, without any fun PHP tricks, especially if there are common fields in each of the various tables.
But first: this might not be the answer you're really wanting, but using triggers on your various tables to update an "events feed" table is likely the best solution. I.e., when an insert or update happens on the status table, have a trigger on the status table that inserts into the "events feed" table the ID of the person and the type of action. You could have separate insert and update triggers to record different events for the same data type.
Then it'd be super-easy to have an events feed, because you're just selecting straight from that events feed table.
Check out the CREATE TRIGGER syntax.
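For example, here is a minimal sketch of an insert trigger on your tracks table feeding the events table (assuming events has ID and action columns, as in your test; note the action value must be a quoted string, because an unquoted word like test is parsed as a column name, which is exactly what your #1054 error was complaining about):
DELIMITER $$
CREATE TRIGGER tracks_after_insert AFTER INSERT
ON tracks FOR EACH ROW
BEGIN
    -- NEW.ID is the member ID from the row just inserted into tracks
    INSERT INTO events (ID, action)
    VALUES (NEW.ID, 'has uploaded a track.');
END$$
DELIMITER ;
Reading the feed is then a single cheap select (assuming you also give events a timestamp column):
SELECT ID, action
FROM events
ORDER BY timestamp DESC
LIMIT 10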
That said, you might also have a look at the CASE and UNION keywords.
You can then construct a query that grabs data from all the tables and outputs a descriptive string for each row. You could then turn that query into a view and use it as an "events feed" table to select from directly.
Say you have a list of members (which you do) and the various tables that contain actions from those members (i.e., tracks, status, pics, videos), all of which have a key pointing back to your members table. You don't need to select from members to generate a list of activity, then; you can just UNION together the tables that record the events.
SELECT
  events.member_id
, events.table_id
, events.table_name
, events.action
, events.when_it_happened
, CASE
    WHEN events.table_name = 'tracks' THEN 'Did something with tracks'
    WHEN events.table_name = 'status' THEN 'Did something with status'
  END AS feed_description
FROM (
  ( SELECT
      tracks.ID AS member_id
    , tracks.track_ID AS table_id
    , 'tracks' AS table_name
    , CONCAT(tracks.url, ' ', tracks.name) AS action
    , tracks.timestamp AS when_it_happened
    FROM tracks
    ORDER BY tracks.timestamp DESC
    LIMIT 10 )
  UNION
  ( SELECT
      status.ID AS member_id
    , status.status_id AS table_id
    , 'status' AS table_name
    , status.value AS action
    , status.timestamp AS when_it_happened
    FROM status
    ORDER BY status.timestamp DESC
    LIMIT 10 )
  UNION
  ...
) events
ORDER BY events.when_it_happened DESC
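If you do go the view route, note that older MySQL versions don't allow a derived table in a view's FROM clause, so you would push the UNION straight into the view body. A sketch, reusing the same assumed columns (the view name is illustrative):
CREATE VIEW events_feed AS
SELECT tracks.ID AS member_id, tracks.track_ID AS table_id,
       'tracks' AS table_name,
       CONCAT(tracks.url, ' ', tracks.name) AS action,
       tracks.timestamp AS when_it_happened
FROM tracks
UNION
SELECT status.ID, status.status_id, 'status', status.value, status.timestamp
FROM status;

SELECT * FROM events_feed ORDER BY when_it_happened DESC LIMIT 10;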
I still think you'd be better off creating a feed table built by triggers, because it'll perform a lot better if you're querying for the feed more often than generating events.
Related
I have a table where I log member activity.
There are 1,486,044 records here.
SELECT * FROM `user_log` WHERE user = '1554143' order by id desc
However, this query takes 5 seconds. What do you recommend?
Table structure below:
CREATE TABLE IF NOT EXISTS `user_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`user` int(11) NOT NULL,
`operation_detail` varchar(100) NOT NULL,
`ip_adress` varchar(50) NOT NULL,
`l_date` datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
COMMIT;
For this query:
SELECT * FROM `user_log` WHERE user = 1554143 order by id desc
You want an index on (user, id desc).
Note that I removed the single quotes around the filtering value for user, since this column is a number. This does not necessarily speed things up, but it is cleaner.
Also: SELECT * is not good practice, and not good for performance. You should enumerate the columns you want in the result set (if you don't need them all, do not select them all). If you do want all the columns, then since your table does not have many of them, you might want to try a covering index on all 5 columns, like: (user, id desc, operation_detail, ip_adress, l_date).
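A sketch of both suggestions (index names are illustrative; the DESC key part is only honored as of MySQL 8.0, and earlier versions parse and ignore it, which is still fine here because an index can be scanned in reverse):
ALTER TABLE user_log ADD INDEX idx_user_id (user, id DESC);

-- covering variant, if you do need all 5 columns
ALTER TABLE user_log ADD INDEX idx_user_covering (user, id DESC, operation_detail, ip_adress, l_date);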
In addition to the option of creating an index on (user, id), which has already been mentioned, a likely better option is to convert the table to InnoDB and create an index only on (user). InnoDB secondary indexes implicitly include the primary key, so an index on (user) already stores its entries in (user, id) order.
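A sketch of that conversion (the index name is illustrative):
ALTER TABLE user_log ENGINE=InnoDB;

-- behaves like (user, id) thanks to the implicit primary-key suffix
ALTER TABLE user_log ADD INDEX idx_user (user);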
I cleaned the question a little bit because it was getting very big and unreadable.
Running on my localhost.
The query takes 755.15 ms when selecting from the job table, which contains 15,000 rows (with the WHERE conditions matching 6,650 of them).
The table Company contains 1000 rows.
The table geo__name contains approximately 84,300 rows and is not giving me any problem, so I believe the issue is the database structure or something similar.
The structure of these 2 tables is the following:
Table Job is:
CREATE TABLE `job` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`company_id` int(11) NOT NULL,
`activity_sector_id` int(11) DEFAULT NULL,
`status` int(11) NOT NULL,
`active` datetime NOT NULL,
`contract_type_id` int(11) NOT NULL,
`salary_type_id` int(11) NOT NULL,
`workday_id` int(11) NOT NULL,
`geoname_id` int(11) NOT NULL,
`title` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`minimum_experience` int(11) DEFAULT NULL,
`min_salary` decimal(7,2) DEFAULT NULL,
`max_salary` decimal(7,2) DEFAULT NULL,
`zip_code` int(11) DEFAULT NULL,
`vacancies` int(11) DEFAULT NULL,
`show_salary` tinyint(1) NOT NULL,
PRIMARY KEY (`id`),
KEY `created_at` (`created_at`,`active`,`status`) USING BTREE,
CONSTRAINT `FK_FBD8E0F823F5422B` FOREIGN KEY (`geoname_id`) REFERENCES `geo__name` (`id`),
CONSTRAINT `FK_FBD8E0F8398DEFD0` FOREIGN KEY (`activity_sector_id`) REFERENCES `activity_sector` (`id`),
CONSTRAINT `FK_FBD8E0F85248165F` FOREIGN KEY (`salary_type_id`) REFERENCES `job_salary_type` (`id`),
CONSTRAINT `FK_FBD8E0F8979B1AD6` FOREIGN KEY (`company_id`) REFERENCES `company` (`id`),
CONSTRAINT `FK_FBD8E0F8AB01D695` FOREIGN KEY (`workday_id`) REFERENCES `workday` (`id`),
CONSTRAINT `FK_FBD8E0F8CD1DF15B` FOREIGN KEY (`contract_type_id`) REFERENCES `job_contract_type` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=15001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The table company is:
CREATE TABLE `company` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`name` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`logo` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`created_at` datetime NOT NULL,
`updated_at` datetime NOT NULL,
`website` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`user_id` int(11) NOT NULL,
`phone` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`cifnif` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
`type` int(11) NOT NULL,
`subscription_id` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
UNIQUE KEY `UNIQ_4FBF094FA76ED395` (`user_id`),
KEY `IDX_4FBF094F9A1887DC` (`subscription_id`),
KEY `name` (`name`(191)),
CONSTRAINT `FK_4FBF094F9A1887DC` FOREIGN KEY (`subscription_id`) REFERENCES `subscription` (`id`),
CONSTRAINT `FK_4FBF094FA76ED395` FOREIGN KEY (`user_id`) REFERENCES `user` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
The query is the following:
SELECT
j0_.id AS id_0,
j0_.status AS status_1,
j0_.title AS title_2,
j0_.min_salary AS min_salary_3,
j0_.max_salary AS max_salary_4,
c1_.id AS id_5,
c1_.name AS name_6,
c1_.logo AS logo_7,
a2_.id AS id_8,
a2_.name AS name_9,
g3_.id AS id_10,
g3_.name AS name_11,
j4_.id AS id_12,
j4_.name AS name_13,
j5_.id AS id_14,
j5_.name AS name_15,
w6_.id AS id_16,
w6_.name AS name_17
FROM
job j0_
INNER JOIN company c1_ ON j0_.company_id = c1_.id
INNER JOIN activity_sector a2_ ON j0_.activity_sector_id = a2_.id
INNER JOIN geo__name g3_ ON j0_.geoname_id = g3_.id
INNER JOIN job_salary_type j4_ ON j0_.salary_type_id = j4_.id
INNER JOIN job_contract_type j5_ ON j0_.contract_type_id = j5_.id
INNER JOIN workday w6_ ON j0_.workday_id = w6_.id
WHERE
j0_.active >= CURRENT_TIMESTAMP
AND j0_.status = 1
ORDER BY
j0_.created_at DESC
When executing the above query I have these results:
In MYSQL Workbench: 0.578 sec / 0.016 sec
In Symfony profiler: 755.15 ms
The question is: is the duration of this query normal? If not, how can I improve its speed? It seems far too slow.
I also checked the Symfony debug toolbar, the EXPLAIN output and the timeline (screenshots omitted); they confirm I'm only fetching the data I really need.
The MySQL server can't handle the load being placed on it. This could be due to resource contention, because it has not been appropriately tuned, or it could also be a problem with your hard drive.
First, I would start by adding the MySQL keyword STRAIGHT_JOIN, which tells MySQL to join the tables in the order I have written them instead of working out the relationships itself. However, with your dataset being so small and already at about half a second, I don't know if that will help as much, but on larger datasets I have known it to SIGNIFICANTLY improve performance.
Next, you appear to be fetching lookup descriptions via the PK/FK relationships. Not seeing the indexes on those tables, I would suggest covering indexes that contain both the key and the description, so the join can get the data straight from the index pages it uses for the JOIN, instead of using the index page, then finding the actual data pages to get the description, and continuing.
Last, your job table with the index on (created_at, active, status) might perform better if the index were ordered ( status, active, created_at ) instead.
With your existing index, think of it this way: each day of data is put into a single box. Within each day's box, the rows are sorted by the active timestamp (even if simplified to the active date), THEN by status.
So, for each day CREATED, you open a box. Inside it you look at secondary boxes, one for each "active" timestamp (e.g., per day). Only within each active timestamp (day) can you now see which records have status = 1. So you open each active-timestamp day, check for status = 1, then close that created-day box, go to the next created-day box and repeat. Consider how labor-intensive that is: opening each box per day, and each active box within that day.
Now, under the suggested index starting with status, you have a very finite number of boxes, one for each status. You open only the one box for status = 1; these are the only records you want to consider, and you can ignore all the others. Inside that box, the records are sub-sorted by the ACTIVE timestamp, so you can jump directly to those at the current timestamp; from the first qualifying record onwards, everything in the box qualifies. Done. And since this index ALSO has created_at as part of the key, it can optimize the descending sort.
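In MySQL terms, the suggested reordering would look something like this (the new index name is illustrative; your existing index is named created_at):
-- replace the (created_at, active, status) index with one that filters first
ALTER TABLE job DROP INDEX created_at;
ALTER TABLE job ADD INDEX idx_status_active_created (status, active, created_at);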
For ensuring "covering indexes" for the other lookup tables if they do not yet exist, I suggest the following.
table index
company ( id, name, logo )
activity_sector (id, name )
geo__name ( id, name )
job_salary_type ( id, name )
job_contract_type ( id, name )
workday ( id, name )
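As a sketch, two of these could be created like so, assuming the name columns are short enough to index in full (the others follow the same pattern; note company.name is utf8mb4 and already prefix-indexed in your schema, so indexing it in full may hit the index length limit):
ALTER TABLE activity_sector ADD INDEX idx_sector_cover (id, name);
ALTER TABLE geo__name ADD INDEX idx_geo_cover (id, name);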
And the MySQL Keyword...
SELECT STRAIGHT_JOIN (rest of query...)
There are several reasons why Symfony can be slow.
1. Server fault
First, it could be the server's fault. Server performance may hinder your query time.
2. Data size and deferred rendering
Then comes the data size. On one of my projects, the query returns about 50 MB of data (currently about 20k rows).
Parsing 50 MB into HTML can take some time, mostly because of loops.
Still, there are solutions for this, like deferred rendering.
Deferred rendering is quite simple: instead of parsing the data in your Twig template, you send all the data to a JavaScript variable and use JavaScript to parse/render the data once the DOM is loaded.
3. Query optimisation
As I wrote in a comment, you can check the following question, in which I explained why custom queries are important.
Are Doctrine relations affecting application performance?
In that question, you will read that order matters... It's in fact the most important thing.
While static data in your database is often inserted in the right order, that's rarely the case for dynamic data (data provided by users during the life of the website).
This is why using ORDER BY in your query will often speed up the page rendering, as Doctrine won't be doing extra queries on its own.
As an example, one of my sites has about 700 entries displayed on the index page.
First, here is the query count while using findAll():
It shows 254 queries (253 duplicates) in 144 ms, plus 39 ms of render time.
Next, using the second parameter of findBy(), ORDER BY, I get this result:
You can see the full query here (the screenshot is big).
Much better: only 1 query, in 8 ms, and about the same render time.
But here I don't use any fields from associations.
The moment I do, Doctrine will run extra queries, and the query count and time will skyrocket.
In the end, it turns back into something like findAll().
And last, this is the custom query:
In this custom query, the query time went from 8 ms to 38 ms.
But, unlike the previous query, I get far more data in the result, which prevents Doctrine from doing extra queries.
Again, ORDER BY matters in this query. Without it, I skyrocket back to 84 queries.
4. Partials
When you write a custom query, you can load partial objects instead of full entities.
As you said in your question, the description field seems to slow down your loading speed; with partials, you can avoid loading some fields from the table, which will speed up the query.
First, instead of your regular syntax, this is how you create the query builder:
$em=$this->getEntityManager();
$qb=$em->createQueryBuilder();
Just in case, I prefer to keep $em as a separate variable (if I want to fetch some class repository, for example).
Then you can start your partial select. Careful: the first select can't include any association fields:
$qb->select("partial job.{id, status, title, minimum_experience, min_salary, max_salary, zip_code, vacancies}")
->from(Job::class, "job");
Then you can add your associations:
$qb->addSelect("company")
->join("job.company", "company");
Or even add a partial association in case you don't need all the data of the association:
$qb->addSelect("partial activitySector.{id}")
->join("job.activitySector", "activitySector");
$qb->addSelect("partial job.{id, company_id, activity_sector_id, status, active, contract_type_id, salary_type_id, workday_id, geoname_id, title, minimum_experience, min_salary, max_salary, zip_code, vacancies, show_salary}");
5. Caches
You could also use various caches, like Zend OPcache for PHP; you will find some advice in this question: Why Symfony3 so slow?
There is also the HTTP cache Varnish.
That rounds up about everything I can share to lower your loading time.
Hope it proves useful and you are able to solve your problem.
So many keys; try to minimize the number of keys.
A couple of years ago I designed a reward system for 11-16yo students in PHP, JavaScript and MySQL.
The premise is straightforward:
Members of staff issue points to students under various categories ("Positive attitude and behaviour", "Model citizen", etc)
Students accrue these points then spend them in our online store (iTunes vouchers, etc)
Existing system
The database structure is also straightforward (probably too much so):
Transactions
239,189 rows
CREATE TABLE `transactions` (
`Transaction_ID` int(9) NOT NULL auto_increment,
`Datetime` date NOT NULL,
`Giver_ID` int(9) NOT NULL,
`Recipient_ID` int(9) NOT NULL,
`Points` int(4) NOT NULL,
`Category_ID` int(3) NOT NULL,
`Reason` text NOT NULL,
PRIMARY KEY (`Transaction_ID`),
KEY `Giver_ID` (`Giver_ID`),
KEY `Datetime` (`Datetime`),
KEY `DatetimeAndGiverID` (`Datetime`,`Giver_ID`),
KEY `Recipient_ID` (`Recipient_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=249069 DEFAULT CHARSET=latin1
Categories
34 rows
CREATE TABLE `categories` (
`Category_ID` int(9) NOT NULL,
`Title` varchar(255) NOT NULL,
`Description` text NOT NULL,
`Default_Points` int(3) NOT NULL,
`Groups` varchar(125) NOT NULL,
`Display_Start` datetime default NULL,
`Display_End` datetime default NULL,
PRIMARY KEY (`Category_ID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
Rewards
82 rows
CREATE TABLE `rewards` (
`Reward_ID` int(9) NOT NULL auto_increment,
`Title` varchar(255) NOT NULL,
`Description` text NOT NULL,
`Image_URL` varchar(255) NOT NULL,
`Date_Inactive` datetime NOT NULL,
`Stock_Count` int(3) NOT NULL,
`Cost_to_User` float NOT NULL,
`Cost_to_System` float NOT NULL,
PRIMARY KEY (`Reward_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=91 DEFAULT CHARSET=latin1
Purchases
5,889 rows
CREATE TABLE `purchases` (
`Purchase_ID` int(9) NOT NULL auto_increment,
`Datetime` datetime NOT NULL,
`Reward_ID` int(9) NOT NULL,
`Quantity` int(4) NOT NULL,
`Student_ID` int(9) NOT NULL,
`Student_Name` varchar(255) NOT NULL,
`Date_DealtWith` datetime default NULL,
`Date_Collected` datetime default NULL,
PRIMARY KEY (`Purchase_ID`)
) ENGINE=InnoDB AUTO_INCREMENT=6133 DEFAULT CHARSET=latin1
Problems
The system ran perfectly well for a period of time. It's now starting to slow down massively on certain queries.
Essentially, every time I need to access a student's reward-points total, the required query takes ages. Here are a few example queries and their run-times:
Top 15 students, excluding attendance categories, across whole school
SELECT CONCAT( s.Firstname, " ", s.Surname ) AS `Student` , s.Year_Group AS `Year Group`, SUM( t.Points ) AS `Points`
FROM frog_rewards.transactions t
LEFT JOIN frog_shared.student s ON t.Recipient_ID = s.id
WHERE t.Datetime > '2013-09-01' AND t.Category_ID NOT IN ( 12, 13, 14, 26 )
GROUP BY t.Recipient_ID
ORDER BY `Points` DESC
LIMIT 0 , 15
Run-time: 44.8425 sec
SELECT Recipient_ID, SUM(Points) AS Total_Points FROM transactions GROUP BY Recipient_ID
Run-time: 9.8698 sec
Now I appreciate that, especially with the second query, I shouldn't ever be running a call that returns such a vast quantity of rows, but the limitations of the framework within which the system runs meant I had no other choice if I wanted to display students' total reward points for teachers/tutors/year managers/leadership to view and analyse.
Time for a solution
Fortunately the framework we've been forced to use is changing. We'll now be using OAuth rather than a horrible, outdated JavaScript widget format.
Unfortunately - or, I guess, fortunately - it means we'll have to rewrite quite a lot of the system.
One of the main areas I intend to look at when rewriting the system is the database structure. As time goes on it will only get bigger, so I need to do a bit of future-proofing.
As such, my main question is this: what is the most efficient and effective way of storing students' point totals?
The only idea I can come up with is to have a separate table called totals with Student_ID and Points fields. Every time a member of staff gives out some points, it adds a row into the transactions table but also updates the totals table.
Is that efficient? Would it be efficient to also have a Points_Since_Monday type field? How would I update/keep on top of that?
On top of the main question, if anyone has suggestions for general improvement with regard to optimisation of the database table, please let me know.
Thanks in advance,
Duncan
There is nothing particularly wrong with your design which should make it as slow as you have reported. I'm thinking there must be other factors at work, such as the server it is running on being overloaded or slow, for example. Only you will be able to find out if that is the case.
In order to test your design I recreated it on the SQL Server 2008 instance I have running on my desktop computer. I have a standard computer: single hard disc, not SSD, not RAID, etc., so on a proper database server the results should be even better. I had to make some changes to the design as you are using MySQL, but none of the changes should affect performance; it's just so I can run it on my database.
Here's the table structure I used. I had to guess at what you would have in the Student and Staff tables as you do not describe those. I also took the liberty of changing the field names in the Transaction table for Giver_ID and Recipient_ID, as I assume only staff give points and students receive them.
I generated random data to fill the tables with the same number of rows as you said you have in your database
I ran the two queries you said are taking a long time. I've changed them to suit my design, but (I hope) the results are the same:
SELECT TOP 15
Firstname + ' ' + Surname
,Year_Group
,SUM(Points) AS Points
FROM points.[Transaction]
INNER JOIN points.Student ON points.[Transaction].Student_ID = points.Student.Student_ID
WHERE [Datetime] > '2013-09-01'
AND Category_ID NOT IN ( 12, 13, 14, 26 )
GROUP BY Firstname + ' ' + Surname
,Year_Group
ORDER BY SUM(Points) DESC
SELECT Student_ID
,SUM(Points) AS Total_Points
FROM points.[Transaction]
GROUP BY Student_ID
Both queries returned results in about 1s. I have not created any additional indexes on the tables other than the CLUSTERED indexes generated by default on the primary keys. Looking at the execution plan the query processor estimates that implementing the following index could improve the query cost by 81.0309%
CREATE NONCLUSTERED INDEX [<Name of Missing Index>]
ON [points].[Transaction] ([Datetime],[Category_ID])
INCLUDE ([Student_ID],[Points])
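Translated back to the MySQL schema in the question, that suggestion would look roughly like this (MySQL has no INCLUDE clause, so the included columns simply go on the end of the key; the index name is illustrative):
ALTER TABLE transactions
    ADD INDEX idx_datetime_category (`Datetime`, `Category_ID`, `Recipient_ID`, `Points`);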
As others have commented I would look elsewhere for bottlenecks before spending a lot of time redesigning your database.
Update:
I realised I never actually addressed your specific question:
what is the most efficient and effective way of storing students'
point totals?
The only idea I can come up with is to have a separate table called
totals with Student_ID and Points fields. Every time a member of staff
gives out some points, it adds a row into the transactions table but
also updates the totals table.
I would not recommend keeping a separate point total unless you have explored every other possible way to speed up the database. A separate tally can become out of sync with the transactions and then you have to reconcile everything and track down what went wrong, and what the correct total should be.
You should always focus on maintaining the correctness and consistency of the data before trying to increase speed. Most of the time a correct (normalised) data model will operate quickly enough.
In one place I worked we found the most cost effective way to speed up our database was simply to upgrade the hardware; much quicker and cheaper than spending many man-hours redesigning the database :)
I'm working on a search engine using CakePHP 2.0 and am having difficulty finding the most efficient way to get the result I want.
Say I'm querying people and I get a set of 20 results: 5 are age 20, 10 are age 30 and 5 are age 40. In addition, 15 of these people have brown eyes, 3 have blue and 2 have green. I want to find the most efficient way to get those specific counts. I'll then display these counts on the page so that users can see what's in the results for those parameters, and they'll be able to click one of them to add that search parameter to the current query.
This isn't something that I can store in a database or cache at all because each search could be different and could/will return different results.
If I'm not explaining clearly what I'm trying to do (which is likely), there are several websites that do this. Cars.com uses this method when searching for cars: you run a generic search, and links on the side then let you narrow your results. Those links include counts of how many items in the current result set fall within each parameter.
One idea has been to fetch the full result set and then parse through it generating the counts. This would work, but in my project I'm dealing with thousands of records, so it seems like this could add load time to the page and/or strain on the server.
Cars.com is likely using associated tags for each counted feature. With associated tags, a car record has and belongs to many feature tags.
So that they don't have to create a tag for each individual car price, they create price-range tags.
For every car record there are associated tag records holding all the features of that car. You can then cache a count in each tag record of how many cars have that feature.
The SQL table structure might be something like this.
CREATE TABLE `cars` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`make` varchar(45) DEFAULT NULL,
`model` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `features` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`name` varchar(45) DEFAULT NULL,
`count` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
CREATE TABLE `cars_features` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`car_id` int(11) NOT NULL,
`feature_id` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
For every Car record there can be multiple Feature records. These are associated to each Car via the cars_features table. When someone searches and finds Car XXXX you can then look up the Features of that car, and also display a cached count of how many cars have that feature.
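One way to keep that cached count current is a trigger on the join table; a minimal sketch (you would want a matching AFTER DELETE trigger that decrements, and CakePHP's counterCache option can achieve the same at the application level):
DELIMITER $$
CREATE TRIGGER cars_features_after_insert AFTER INSERT
ON cars_features FOR EACH ROW
BEGIN
    -- bump the cached count on the feature the car just gained
    UPDATE features SET count = count + 1 WHERE id = NEW.feature_id;
END$$
DELIMITER ;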
EDIT:
To narrow the counts so that they are limited to only the cars found by the search, you'll first need to get a list of all the Car IDs and then perform a COUNT using a JOIN between the cars_features table and the features table.
Here is some sample data.
INSERT INTO `cars` (`make`, `model`) VALUES ('Ford', 'Explorer');
INSERT INTO `cars` (`make`, `model`) VALUES ('Honda', 'Civic');
INSERT INTO `cars` (`make`, `model`) VALUES ('Honda', 'Civic');
INSERT INTO `features` (`name`, `count`) VALUES ('Red', 2);
INSERT INTO `features` (`name`, `count`) VALUES ('Green', 1);
INSERT INTO `cars_features` (`car_id`, `feature_id`) VALUES (1, 1);
INSERT INTO `cars_features` (`car_id`, `feature_id`) VALUES (2, 1);
INSERT INTO `cars_features` (`car_id`, `feature_id`) VALUES (3, 2);
Assuming we ran a search that returned two items, so that our Car IDs were (1,2), we could find the feature counts using the following SQL query.
SELECT `features`.`id`,COUNT(`features`.`id`)
FROM `cars_features`
JOIN `features` ON (`cars_features`.`feature_id`=`features`.`id`)
WHERE `cars_features`.`car_id` IN (1,2)
GROUP BY `features`.`id`
This will report the count for each Feature, limited to just the Car records found.
I'll try to write the above in CakePHP model format:
$this->CarFeature->find('all',array(
'conditions'=>array('CarFeature.car_id'=>$ids),
'fields'=>array('Feature.id','COUNT(Feature.id)'),
'group'=>array('Feature.id'),
'contain'=>'Feature'
));
I have a social network similar to MySpace, but I use PHP and MySQL. I have been looking for the best way to show users bulletins posted only by themselves and by users they are confirmed friends with.
This involves 3 tables:
friend_friend = stores records of who is whose friend
friend_bulletin = stores the bulletins
friend_reg_user = the main user table with all user data, such as name and photo URL
I will post the bulletin and friend table schemas below; for the user table I will only list the fields that matter.
-- Table structure for table friend_bulletin
CREATE TABLE IF NOT EXISTS `friend_bulletin` (
`auto_id` int(11) NOT NULL AUTO_INCREMENT,
`user_id` int(10) NOT NULL DEFAULT '0',
`bulletin` text NOT NULL,
`subject` varchar(255) NOT NULL DEFAULT '',
`color` varchar(6) NOT NULL DEFAULT '000000',
`submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`status` enum('Active','In Active') NOT NULL DEFAULT 'Active',
`spam` enum('0','1') NOT NULL DEFAULT '1',
PRIMARY KEY (`auto_id`),
KEY `user_id` (`user_id`),
KEY `submit_date` (`submit_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=245144 ;
-- Table structure for table friend_friend
CREATE TABLE IF NOT EXISTS `friend_friend` (
`autoid` int(11) NOT NULL AUTO_INCREMENT,
`userid` int(10) DEFAULT NULL,
`friendid` int(10) DEFAULT NULL,
`status` enum('1','0','3') NOT NULL DEFAULT '0',
`submit_date` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
`alert_message` enum('yes','no') NOT NULL DEFAULT 'yes',
PRIMARY KEY (`autoid`),
KEY `userid` (`userid`),
KEY `friendid` (`friendid`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=2657259 ;
friend_reg_user table fields that will be used
auto_id = this is the users ID number
disp_name = this is the users name
pic_url = this is a thumbnail image path
Bulletins should show all bulletins posted by any user ID in our friend list.
They should also show all bulletins that we posted ourselves.
This needs to scale well; the friends table is several million rows.
// 1 Old method uses a subselect
SELECT auto_id, user_id, bulletin, subject, color, fb.submit_date, spam
FROM friend_bulletin AS fb
WHERE (user_id IN (SELECT userid FROM friend_friend WHERE friendid = $MY_ID AND status =1) OR user_id = $MY_ID)
ORDER BY auto_id
// Another old method that I used on accounts with a small number of friends, because it needs an extra query
// that returns a string of all their friend IDs in this format: $str_friend_ids = "1,2,3,4,5,6,7,8"
select auto_id,subject,submit_date,user_id,color,spam
from friend_bulletin
where user_id=$MY_ID or user_id in ($str_friend_ids)
order by auto_id DESC
I know these are not good for performance, as my site is getting really large, so I have been experimenting with JOINs.
I believe the query below gets everything I need, except it needs to be modified to also fetch bulletins posted by myself. When I add that into the WHERE part, it seems to break and return multiple rows for each bulletin posted; I think it tries to return results for users I am a friend of, and when I then also treat myself as a friend, it doesn't work well.
My main point in this whole post, though, is that I am open to opinions on the most performant way to do this task. Many big social networks have a similar function that returns a list of items posted only by your friends, so there have to be faster ways. I keep reading that JOINs are not great for performance, but how else can I do this? Keep in mind that I do use indexes and have a dedicated database server, but my user base is large and there is no way around that.
SELECT fb.auto_id, fb.user_id, fb.bulletin, fb.subject, fb.color, fb.submit_date, fru.disp_name, fru.pic_url
FROM friend_bulletin AS fb
LEFT JOIN friend_friend AS ff ON fb.user_id = ff.userid
LEFT JOIN friend_reg_user AS fru ON fb.user_id = fru.auto_id
WHERE (
ff.friendid =1
AND ff.status =1
)
LIMIT 0 , 30
First of all, you can try to partition the database so that you're only accessing a table with the primary rows you need. Move rows that are used less often to another table.
JOINs can impact performance, but from what I've seen, subqueries are not any better. Try refactoring your query so that you're not pulling all that data at once. It also seems like parts of that query could be run once elsewhere in your app, with the results either stored in variables or cached.
For example, you can cache an array of friends who are connected for each user and just reference that when running the query, and only update the cache when a new friend is added/removed.
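For example, once the friend ID list is cached, the feed query collapses to a simple IN lookup with no join against friend_friend at all; a sketch, where (1,2,3) stands in for the cached IDs:
SELECT fb.auto_id, fb.user_id, fb.bulletin, fb.subject, fb.color, fb.submit_date,
       fru.disp_name, fru.pic_url
FROM friend_bulletin AS fb
JOIN friend_reg_user AS fru ON fb.user_id = fru.auto_id
WHERE fb.user_id = $MY_ID
   OR fb.user_id IN (1,2,3)
ORDER BY fb.auto_id DESC
LIMIT 0, 30
Because friend_friend is no longer joined, each bulletin appears exactly once, including your own.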
It also depends on the structure of your system and your code architecture; your bottleneck may not be entirely in the db.