How to efficiently calculate averages from a big table?

I have a table called ratings with the following fields:
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| rating_id | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | NULL | |
| movie_id | int(11) | NO | | NULL | |
| rating | float | NO | | NULL | |
+-----------+------------+------+-----+---------+----------------+
Indexes on this table:
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ratings | 0 | PRIMARY | 1 | rating_id | A | 100076 | NULL | NULL | | BTREE | | |
| ratings | 0 | user_id | 1 | user_id | A | 564 | NULL | NULL | | BTREE | | |
| ratings | 0 | user_id | 2 | movie_id | A | 100092 | NULL | NULL | | BTREE | | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I have another table called movie_average_ratings which has the following fields:
+----------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------+------+-----+---------+-------+
| movie_id | int(11) | NO | PRI | NULL | |
| average_rating | float | NO | | NULL | |
+----------------+---------+------+-----+---------+-------+
As is probably obvious by this point, I want to calculate the average rating of each movie from the ratings table and update the movie_average_ratings table. I tried the following SQL query:
UPDATE movie_average_ratings
SET average_rating = (SELECT AVG(rating)
FROM ratings
WHERE ratings.movie_id = movie_average_ratings.movie_id);
Currently, there are around 10,000 movie records and 100,000 rating records, and I get a "Lock wait timeout exceeded; try restarting transaction" error. The number of records can grow significantly, so I don't think increasing the timeout is a good solution.
So, how can I write a 'scalable' query to achieve this? Is iterating over the movie_average_ratings records and calculating each average individually the most efficient solution?

Without the EXPLAIN output, it's hard to be clear on what's holding you up. It's also not clear that you will get a performance improvement by storing this aggregated data in a denormalized table: if the query that calculates the ratings executes in 0.04 seconds, querying your denormalized table is unlikely to be much faster.
In general, I recommend only denormalizing if you know you have a performance problem.
But that's not the question.
I would do the following:
DELETE FROM movie_average_ratings;
INSERT INTO movie_average_ratings
SELECT movie_id, AVG(rating)
FROM ratings
GROUP BY movie_id;
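The delete-and-reinsert approach above can be tried out with a minimal sketch. It is written in Python with the standard-library sqlite3 module as a stand-in for MySQL, so the column types and the in-memory database are assumptions for illustration, and the sample rows are made up. Both statements run inside one transaction so readers never see an empty table.

```python
import sqlite3


def rebuild_averages(conn: sqlite3.Connection) -> None:
    """Replace every row of movie_average_ratings with a fresh per-movie average."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM movie_average_ratings")
        conn.execute(
            "INSERT INTO movie_average_ratings (movie_id, average_rating) "
            "SELECT movie_id, AVG(rating) FROM ratings GROUP BY movie_id"
        )


# Hypothetical sample data; SQLite types stand in for the MySQL ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (
        rating_id INTEGER PRIMARY KEY,
        user_id   INTEGER NOT NULL,
        movie_id  INTEGER NOT NULL,
        rating    REAL NOT NULL
    );
    CREATE TABLE movie_average_ratings (
        movie_id       INTEGER PRIMARY KEY,
        average_rating REAL NOT NULL
    );
    INSERT INTO ratings (user_id, movie_id, rating) VALUES
        (1, 10, 4.0), (2, 10, 2.0), (1, 20, 5.0);
""")

rebuild_averages(conn)
rows = conn.execute(
    "SELECT movie_id, average_rating FROM movie_average_ratings ORDER BY movie_id"
).fetchall()
print(rows)
```

Note that a GROUP BY rebuild only produces rows for movies that have at least one rating, so movies without ratings will be absent from the table afterwards.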

I just found something in another post:
What is happening is, some other thread is holding a record lock on
some record (you're updating every record in the table!) for too long,
and your thread is being timed out.
This means that some of your records are locked. You can force-unlock them from the console:
1) Enter MySQL:
mysql -u your_user -p
2) List the tables that are currently in use:
mysql> show open tables where in_use>0;
3) List the current processes; one of them is locking your table(s):
mysql> show processlist;
4) Kill one of these processes:
mysql> kill put_process_id_here;

You could redesign the movie_average_ratings table to hold:
movie_id (int)
sum_of_ratings (int)
num_of_ratings (int)
Then, when a new rating is added, you can add it to the running totals in movie_average_ratings and calculate the average (sum_of_ratings / num_of_ratings) only when it is needed.
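A rough sketch of this running-totals design follows, using Python's sqlite3 module as a stand-in for MySQL. The helper names add_rating and average_rating are hypothetical, and sum_of_ratings is declared REAL here (an assumption) because the rating column in the question is a float.

```python
import sqlite3


def add_rating(conn: sqlite3.Connection, movie_id: int, rating: float) -> None:
    """Fold one new rating into the running totals for a movie."""
    with conn:  # keep the update-or-insert pair atomic
        cur = conn.execute(
            "UPDATE movie_average_ratings "
            "SET sum_of_ratings = sum_of_ratings + ?, "
            "    num_of_ratings = num_of_ratings + 1 "
            "WHERE movie_id = ?",
            (rating, movie_id),
        )
        if cur.rowcount == 0:  # first rating for this movie
            conn.execute(
                "INSERT INTO movie_average_ratings "
                "(movie_id, sum_of_ratings, num_of_ratings) VALUES (?, ?, 1)",
                (movie_id, rating),
            )


def average_rating(conn: sqlite3.Connection, movie_id: int):
    """Return the average for a movie, or None if it has no ratings yet."""
    row = conn.execute(
        "SELECT sum_of_ratings, num_of_ratings "
        "FROM movie_average_ratings WHERE movie_id = ?",
        (movie_id,),
    ).fetchone()
    if row is None or row[1] == 0:
        return None
    return row[0] / row[1]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_average_ratings (
        movie_id       INTEGER PRIMARY KEY,
        sum_of_ratings REAL    NOT NULL,  -- assumption: REAL, since ratings are floats
        num_of_ratings INTEGER NOT NULL
    )
""")
for value in (4.0, 2.0, 3.0):
    add_rating(conn, 10, value)
print(average_rating(conn, 10))
```

Each new rating then touches a single row, so no statement ever has to lock or scan the whole ratings table.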

Related

What is causing this memory leak when (inner) joining this table?

I have SQL that in my head, would and should run in under 1 second:
SELECT mem.`epid`,
mem.`model_id`,
em.`UKM_Make`,
em.`UKM_Model`,
em.`UKM_CCM`,
em.`UKM_Submodel`,
em.`Year`,
em.`UKM_StreetName`,
f.`fit_part_number`
FROM `table_one` AS mem
INNER JOIN `table_two` em ON mem.`epid` = em.`ePID`
INNER JOIN `table_three` f ON `mem`.`model_id` = f.`fit_model_id`
LIMIT 1;
When I run in the terminal this SQL executes in 16 seconds. However, if I remove the line:
INNER JOIN `table_three` f ON `mem`.`model_id` = f.`fit_model_id`
then it executes in 0.03 seconds. Unfortunately for me, I'm not too sure how to debug MySQL performance issues. This causes my PHP script to run out of memory trying to execute the query.
Here are my table structures:
table_one
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| epid | int(11) | YES | | NULL | |
| model_id | int(11) | YES | | NULL | |
+----------+---------+------+-----+---------+-------+
table_two
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| ePID | int(11) | NO | | NULL | |
| UKM_Make | varchar(100) | NO | | NULL | |
| UKM_Model | varchar(100) | NO | | NULL | |
| UKM_CCM | int(11) | NO | | NULL | |
| UKM_Submodel | varchar(100) | NO | | NULL | |
| Year | int(11) | NO | | NULL | |
| UKM_StreetName | varchar(100) | NO | | NULL | |
| Vehicle Type | varchar(100) | NO | | NULL | |
+----------------+--------------+------+-----+---------+-------+
table_three
+-----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+----------------+
| fit_fitment_id | int(11) | NO | PRI | NULL | auto_increment |
| fit_part_number | varchar(50) | NO | | NULL | |
| fit_model_id | int(11) | YES | | NULL | |
| fit_year_start | varchar(4) | YES | | NULL | |
| fit_year_end | varchar(4) | YES | | NULL | |
+-----------------+-------------+------+-----+---------+----------------+
The above is output from describe $table_name
Is there anything that I'm obviously missing and if not, how can I try to find out why including table_three causes such a slow response time?
EDIT ONE:
After the indexing suggestion (I used CREATE INDEX fit_model ON table_three (fit_model_id)), the query runs in 0.00 seconds (in MySQL). With the LIMIT removed, however, the query is still running after applying the suggestion, so I'm not quite there. Following Anton's suggestion, I used EXPLAIN and got this output:
+------+-------------+-------+------+---------------+-----------+---------+----------------------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+-----------+---------+----------------------+-------+-------------------------------------------------+
| 1 | SIMPLE | mem | ALL | NULL | NULL | NULL | NULL | 5587 | Using where |
| 1 | SIMPLE | f | ref | fit_model | fit_model | 5 | mastern.mem.model_id | 14 | |
| 1 | SIMPLE | em | ALL | NULL | NULL | NULL | NULL | 36773 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+-----------+---------+----------------------+-------+-------------------------------------------------+
EDIT TWO
I've added a Foreign Key based on suggestions using the below query:
ALTER TABLE `table_one`
ADD CONSTRAINT `model_id_fk_tbl_three`
FOREIGN KEY (`model_id`)
REFERENCES `table_three` (`fit_model_id`)
MySQL is still executing the command. There are a lot of rows, so I half-expected this behaviour. With PHP I can break up the query and build my array that way, so I guess that possibly solves the issue. Still, is there anything more I can do to try and reduce execution time?

Based on everyone's comments etc. I managed to perform a few things that made my query run a hell of a lot quicker and not crash my script.
1) Indexes
I created an index on my table_three for the field fit_model_id:
CREATE INDEX fit_model ON `table_three` (`fit_model_id`);
This made my LIMIT 1 query go from 16 seconds to 0.03 seconds execution time (in MYSQL CLI).
However, 100 rows or so would still take a lot longer than I thought.
2) Foreign Keys
I created a foreign key that linked table_one.model_id = table_three.fit_model_id using the below query:
ALTER TABLE `table_one`
ADD CONSTRAINT `model_id_fk_tbl_three`
FOREIGN KEY (`model_id`)
REFERENCES `table_three` (`fit_model_id`)
This definitely helped, but still felt like more could be done.
3) OPTIMIZE TABLE
I then used OPTIMIZE TABLE on these tables:
table_one
table_three
This then made my script work and my query as fast as ever. However, the issue I had was a large data set, so I let the query run in the MySQL CLI while increasing the LIMIT by 1000 on each script run to help the indexing process; I got all the way to 30K rows before it started crashing.
CLI took 31 minutes and 8 seconds to complete. So I did this:
31 x 60 = 1860
1860 + 8 = 1868
1868 / 448476 = 0.0042
So each row took 0.0042 seconds to complete - which is fast enough in my eyes.
Thanks to everyone for commenting and helping me debug and fix the issue :)

Based on the comments, the correct answer is as follows:
1. If a SELECT statement takes a long time to execute, put EXPLAIN in front of the SELECT.
2. Check whether possible_keys is empty (NULL) for any of the tables in the output.
3. Add FOREIGN KEYs (or plain indexes on the join columns) for the tables found in step 2. For very large tables it's recommended to adjust the MAX_EXECUTION_TIME variable (this can be done for a single query).
4. After massive insert/update/delete operations, OPTIMIZE TABLE can also improve performance.
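The first steps above (inspect the plan, spot the table without a usable key, add an index) can be illustrated with a small sketch. It uses Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN, which is only an analogue of MySQL's EXPLAIN: the output format differs and there is no possible_keys column. The plan helper is a hypothetical name.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> list:
    """Return the human-readable plan lines SQLite reports for `sql`."""
    # The descriptive text is the last column of each EXPLAIN QUERY PLAN row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE table_three (
        fit_fitment_id  INTEGER PRIMARY KEY,
        fit_part_number TEXT NOT NULL,
        fit_model_id    INTEGER
    )
""")
query = "SELECT fit_part_number FROM table_three WHERE fit_model_id = 42"

plan_before = plan(conn, query)  # no usable index yet: a full table scan
conn.execute("CREATE INDEX fit_model ON table_three (fit_model_id)")
plan_after = plan(conn, query)   # the new index can now be used for the lookup

print(plan_before)
print(plan_after)
```

In MySQL the equivalent signal is a row with type ALL and NULL in possible_keys, as seen for mem and em in the EXPLAIN output in the question.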

Speed up inserting IDs from one large mysql table to another

I can achieve what I want to do it's just terribly slow and I'm pretty sure there is a better way. Here is an abstract:
I have two tables I imported from two text files (via LOAD DATA INFILE). There is no primary and foreign key relationship between these tables yet.
Table 1 - Events: 1.5 million rows (person_id values are empty)
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| event_id | int(11) | NO | PRI | NULL | auto_increment |
| person_id | int(11) | NO | UNI | NULL | |
| event_details| longtext | NO | | NULL | |
| person_name | varchar(30) | NO | | NULL | |
| mother_name | varchar(30) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
Table 2 - People: 140 000 rows
+--------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+--------------+------+-----+---------+----------------+
| person_id | int(11) | NO | PRI | NULL | auto_increment |
| person_name | varchar(30) | NO | | NULL | |
| mother_name | varchar(30) | NO | | NULL | |
+--------------+--------------+------+-----+---------+----------------+
I'm trying to insert the person_id from the people table into the events table.
I've tried a few methods. Basically, I loop through each row of the events table and find the person_id in the people table using the person_name and mother_name from the current events row, then write that person_id into the events table.
My current query takes about 5 days to update all 1.5 million rows in the events table. How can I make this faster?
I'm using Laravel and PHP, but any suggestions are welcome.
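One possible alternative to the row-by-row loop described above is a single set-based UPDATE backed by an index on the lookup columns. The sketch below uses Python's sqlite3 module as a stand-in for MySQL, with a simplified schema; the index name people_names and the helper link_events_to_people are hypothetical, and it assumes that (person_name, mother_name) identifies exactly one person. In MySQL the same idea would more commonly be written as an UPDATE with a JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (
        person_id   INTEGER PRIMARY KEY,
        person_name TEXT NOT NULL,
        mother_name TEXT NOT NULL
    );
    CREATE TABLE events (
        event_id    INTEGER PRIMARY KEY,
        person_id   INTEGER,            -- empty until linked
        person_name TEXT NOT NULL,
        mother_name TEXT NOT NULL
    );
    INSERT INTO people (person_name, mother_name) VALUES
        ('Ann', 'Mary'), ('Bob', 'Jane');
    INSERT INTO events (person_name, mother_name) VALUES
        ('Bob', 'Jane'), ('Ann', 'Mary'), ('Bob', 'Jane'), ('Zed', 'Nobody');
""")


def link_events_to_people(conn: sqlite3.Connection) -> int:
    """Fill events.person_id for every event with a matching person; return rows updated."""
    with conn:
        # The composite index turns each name lookup into an index search.
        conn.execute(
            "CREATE INDEX IF NOT EXISTS people_names "
            "ON people (person_name, mother_name)"
        )
        cur = conn.execute("""
            UPDATE events
            SET person_id = (
                SELECT p.person_id FROM people AS p
                WHERE p.person_name = events.person_name
                  AND p.mother_name = events.mother_name
            )
            WHERE EXISTS (
                SELECT 1 FROM people AS p
                WHERE p.person_name = events.person_name
                  AND p.mother_name = events.mother_name
            )
        """)
        return cur.rowcount


updated = link_events_to_people(conn)
linked = conn.execute(
    "SELECT event_id, person_id FROM events ORDER BY event_id"
).fetchall()
print(updated, linked)
```

Events with no matching person are left untouched rather than overwritten, which makes it easy to find them afterwards.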

MySQL is not correctly selecting rows (sometimes)

This is an update to this question, wherein I was casting around trying to work out what on earth was going on:
MySQL sometimes erroneously returns 0 for count(*)
I ended up accepting an answer there because it did answer the question I posed ("why might this happen") even though it didn't answer the question I really wanted to know about ("why is this happening to me"). But I've managed to narrow things down a little bit on the latter question, and think I can definitively say that something is wrong in a way that I don't understand and have never seen before.
The issue has been really difficult to debug because, for reasons beyond my comprehension, logging in to the database automagically fixes it. However, today I managed to trigger the problematic state while having an open MySQL session in a terminal. Here are some queries and the subsequent responses taken from that session:
First, this is my table layout:
mysql> describe forum_posts;
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| post_id | int(11) | NO | PRI | NULL | auto_increment |
| thread_id | int(11) | YES | MUL | NULL | |
| forum_id | int(11) | YES | MUL | NULL | |
| user_id | int(11) | YES | MUL | NULL | |
| moderator | tinyint(1) | NO | | 0 | |
| message | mediumtext | YES | MUL | NULL | |
| date | int(11) | NO | MUL | NULL | |
| edited | int(11) | YES | | NULL | |
| deleted | tinyint(1) | YES | MUL | 0 | |
| bbcode | tinyint(1) | NO | | 1 | |
+-----------+------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)
Now, let's look at how many posts there are in a given forum thread:
mysql> SELECT count(post_id) as num FROM `forum_posts` where thread_id=5243;
+-----+
| num |
+-----+
| 195 |
+-----+
1 row in set (0.00 sec)
OK, but I only want forum posts that don't have the deleted flag set:
mysql> SELECT count(post_id) as num FROM `forum_posts` where thread_id=5243 and deleted=0;
+-----+
| num |
+-----+
| 0 |
+-----+
1 row in set (0.06 sec)
mysql> select post_id,deleted from forum_posts where thread_id=5243 and deleted=0;
Empty set (0.06 sec)
OK, let's just double-make-sure that they aren't actually all deleted:
mysql> select post_id,deleted from forum_posts where thread_id=5243;
+---------+---------+
| post_id | deleted |
+---------+---------+
| 104081 | 0 |
| 104082 | 0 |
[snip]
| 121162 | 0 |
| 121594 | 0 |
+---------+---------+
195 rows in set (0.00 sec)
Every row in that table has 'deleted' set to 0, and yet adding and deleted=0 to the query yields no results. Until I open a new session by logging in to MySQL again from a terminal window, after which I can once again properly select rows where 'deleted' is 0.
What on earth?
UPDATES:
@miken32 in the comments below suggested I try an EXPLAIN SELECT ..., so:
mysql> explain select post_id,deleted from forum_posts where thread_id='5243' and deleted=0;
+----+-------------+-------------+-------------+-------------------+-------------------+---------+------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------------+-------------------+-------------------+---------+------+------+--------------------------------------------------------------+
| 1 | SIMPLE | forum_posts | index_merge | thread_id,deleted | thread_id,deleted | 5,2 | NULL | 97 | Using intersect(thread_id,deleted); Using where; Using index |
+----+-------------+-------------+-------------+-------------------+-------------------+---------+------+------+--------------------------------------------------------------+
1 row in set (0.00 sec)

Based on the comment that using FORCE KEY alters the result of the query, it is very likely that we are dealing with a bug in the index merge optimization. EXPLAIN of the original query shows that the optimizer reads from the thread_id key, then from the deleted key, and intersects the results. When we force it to bypass that code, the problem goes away.
The next steps from this point:
try it on the same data with the most recent 5.6 version of MySQL
if the issue reproduces, try to isolate it to the most minimal test case, visit http://bugs.mysql.com/ and report the bug

Exorcise the daemons and ghosts! Add this index to avoid any "merge" bug:
INDEX(deleted, thread_id)
and DROP the key on just deleted.
An index on a flag is almost always useless. This time it was worse than useless.
This will be cheaper, faster, and safer than FORCE INDEX.

Ranking players in assassins game from mysql table

I am trying to figure out how to calculate the rankings for a game of assassins that I am running. I wish to rank people primarily by kills, then by time of kills (those who got their kills earlier are ranked higher), and finally to rank people who have already been assassinated below those who are still alive.
My table for logging assassinations looks like this:
mysql> describe assassinations;
+-----------+-----------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-----------------------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| assassin | int(11) | NO | | NULL | |
| target | int(11) | NO | | NULL | |
| timestamp | int(11) | NO | | NULL | |
| ver | enum('assassin','target','both','none') | NO | | none | |
| confirmed | bit(1) | NO | | b'0' | |
+-----------+-----------------------------------------+------+-----+---------+----------------+
I am thinking that there must be a way to order the mysql results just like the way I want it to be ranked, but I don't know how. I got as far as trying to get the most common assassin value :(. I am using PHP with MySQL so a PHP solution would also work. (Please note, ignore the "confirmed" field, but "ver" must be both for it to be a valid kill).
Any help would be much appreciated. :)

Use COUNT and MIN to get the number of kills and the time of the first kill, and an EXISTS subquery to tell whether the assassin has already been killed. Then you can use all these values in the ORDER BY clause to rank the players.
SELECT a1.assassin, COUNT(*) AS kills, MIN(a1.timestamp) AS killtime,
EXISTS (SELECT * FROM assassinations AS a2
WHERE a2.target = a1.assassin AND a2.ver = 'both') AS killed
FROM assassinations AS a1
WHERE a1.ver = 'both'
GROUP BY a1.assassin
ORDER BY kills DESC, killtime ASC, killed ASC
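To see how the three sort keys interact, here is a small sketch that runs a ranking query of this shape with Python's sqlite3 module (standing in for MySQL) on made-up data. The sample rows and player numbers are assumptions; the query reads from the assassinations table described in the question and counts only rows where ver is 'both', including in the EXISTS check.

```python
import sqlite3

RANKING_SQL = """
    SELECT a1.assassin,
           COUNT(*)          AS kills,
           MIN(a1.timestamp) AS killtime,
           EXISTS (SELECT 1 FROM assassinations AS a2
                   WHERE a2.target = a1.assassin
                     AND a2.ver = 'both') AS killed
    FROM assassinations AS a1
    WHERE a1.ver = 'both'
    GROUP BY a1.assassin
    ORDER BY kills DESC, killtime ASC, killed ASC
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assassinations (
        id        INTEGER PRIMARY KEY,
        assassin  INTEGER NOT NULL,
        target    INTEGER NOT NULL,
        timestamp INTEGER NOT NULL,
        ver       TEXT    NOT NULL DEFAULT 'none'
    )
""")
conn.executemany(
    "INSERT INTO assassinations (assassin, target, timestamp, ver) VALUES (?, ?, ?, ?)",
    [
        (1, 2, 100, "both"),   # player 1: two verified kills
        (1, 3, 200, "both"),
        (4, 5, 50, "both"),    # player 4: one kill at t=50, later killed by 8
        (9, 10, 50, "both"),   # player 9: one kill at t=50, still alive
        (6, 7, 150, "both"),
        (5, 6, 300, "none"),   # unverified, so it counts for nothing
        (8, 4, 400, "both"),
    ],
)

ranking = conn.execute(RANKING_SQL).fetchall()
for row in ranking:
    print(row)  # (assassin, kills, killtime, killed)
```

Players 4 and 9 tie on kills and on first-kill time, so the killed flag decides between them and the still-living player 9 ranks higher. Players with no verified kills do not appear in the result at all.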

What methods are there for storing the order of items in a database?

I'm creating a portfolio website that has galleries that contain images. I want the user of this portfolio to be able to order the images within a gallery. The problem itself is fairly simple I'm just struggling with deciding on a solution to implement.
There are 2 solutions I've thought of so far:
Simply adding an order column (or priority?) and then querying with an ORDER BY clause on that column. The disadvantage of this being that to change the order of a single image I'd have to update every single image in the gallery.
The second method would be to add 2 nullable columns next and previous that simply store the ID of the next and previous image. This would then mean there would be less data to update when the order was changed; however, it would be much more complex to set up and I'm not entirely sure how I'd actually implement it.
Extra options would be great.
Are those options viable?
Are there better options?
How could / should they be implemented?
The current structure of the two tables in question is the following:
mysql> desc Gallery;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| subtitle | varchar(255) | NO | | NULL | |
| description | varchar(5000) | NO | | NULL | |
| date | datetime | NO | | NULL | |
| isActive | tinyint(1) | NO | | NULL | |
| lastModified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+------------------+------+-----+-------------------+-----------------------------+
mysql> desc Image;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| galleryId | int(10) unsigned | NO | MUL | NULL | |
| description | varchar(250) | YES | | NULL | |
| path | varchar(250) | NO | | NULL | |
| lastModified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+------------------+------+-----+-------------------+-----------------------------+
Currently there is no implementation of ordering in any form.

While option 1 is a bit ugly, you can do:
UPDATE `table` SET `order` = `order` + 1 WHERE `order` >= 'orderValueOfItemYouCareAbout';
This will shift all the following images down by one, and you won't have to do a ton of legwork.
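As a sketch of how that shift can be paired with an insert, the snippet below uses Python's sqlite3 module as a stand-in for MySQL. The displayOrder column, the helper insert_image_at, and the sample paths are hypothetical; the Image table is cut down to the columns that matter here.

```python
import sqlite3


def insert_image_at(conn: sqlite3.Connection, gallery_id: int,
                    path: str, position: int) -> int:
    """Insert an image at `position`, shifting later images down by one."""
    with conn:  # shift and insert succeed or fail together
        conn.execute(
            "UPDATE Image SET displayOrder = displayOrder + 1 "
            "WHERE galleryId = ? AND displayOrder >= ?",
            (gallery_id, position),
        )
        cur = conn.execute(
            "INSERT INTO Image (galleryId, path, displayOrder) VALUES (?, ?, ?)",
            (gallery_id, path, position),
        )
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Image (
        id           INTEGER PRIMARY KEY,
        galleryId    INTEGER NOT NULL,
        path         TEXT    NOT NULL,
        displayOrder INTEGER NOT NULL   -- hypothetical ordering column
    )
""")
conn.executemany(
    "INSERT INTO Image (galleryId, path, displayOrder) VALUES (20, ?, ?)",
    [("a.jpg", 1), ("b.jpg", 2), ("c.jpg", 3)],
)

insert_image_at(conn, 20, "new.jpg", 2)
paths = [row[0] for row in conn.execute(
    "SELECT path FROM Image WHERE galleryId = 20 ORDER BY displayOrder"
)]
print(paths)
```

Restricting the shift to one gallery keeps the number of touched rows small even when the Image table as a whole is large.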

As bart2puck has said and I stated in the question, option 1 is a little bit ugly; it is however the option I have chosen to go with to simplify the solution all round.
I have added a column (displayOrder int UNSIGNED) to the Image table after path. When I want to re-order a row in the table I simply swap rows around. So, if I have 3 rows:
mysql> SELECT id, galleryId, description, displayOrder FROM Image ORDER BY displayOrder;
+-----+-----------+----------------------------------+--------------+
| id | galleryId | description | displayOrder |
+-----+-----------+----------------------------------+--------------+
| 271 | 20 | NULL | 1 |
| 270 | 20 | Tracks leading into the ocean... | 2 |
| 278 | 20 | NULL | 3 |
+-----+-----------+----------------------------------+--------------+
3 rows in set (0.00 sec)
If I want to re-order row 278 to appear second rather than third, I'll simply swap it with the second by doing the following:
UPDATE Image SET displayOrder =
CASE displayOrder
WHEN 2 THEN 3
WHEN 3 THEN 2
END
WHERE galleryId = 20
AND displayOrder BETWEEN 2 AND 3;
Resulting in:
mysql> SELECT id, galleryId, description, displayOrder FROM Image ORDER BY displayOrder;
+-----+-----------+----------------------------------+--------------+
| id | galleryId | description | displayOrder |
+-----+-----------+----------------------------------+--------------+
| 271 | 20 | NULL | 1 |
| 278 | 20 | NULL | 2 |
| 270 | 20 | Tracks leading into the ocean... | 3 |
+-----+-----------+----------------------------------+--------------+
3 rows in set (0.00 sec)
One possible issue that some people may find is that you can only alter the position by one place with this method, i.e. to move image 278 to appear first I'd have to make it second, then first, otherwise the current first image would appear third.
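The one-place limitation could be lifted by moving an image and shifting everything in between within a single statement. The following is only a sketch of that idea, not the method used above: it relies on Python's sqlite3 module as a stand-in for MySQL, the helper move_image is hypothetical, and it assumes there is no unique constraint on displayOrder.

```python
import sqlite3


def move_image(conn: sqlite3.Connection, gallery_id: int,
               old_pos: int, new_pos: int) -> None:
    """Move the image at old_pos to new_pos, shifting the images in between."""
    if old_pos == new_pos:
        return
    low, high = min(old_pos, new_pos), max(old_pos, new_pos)
    # Moving up pushes the rows in between down by one, and vice versa.
    shift = 1 if new_pos < old_pos else -1
    with conn:
        conn.execute(
            """
            UPDATE Image
            SET displayOrder = CASE
                    WHEN displayOrder = :old THEN :new
                    ELSE displayOrder + :shift
                END
            WHERE galleryId = :gallery
              AND displayOrder BETWEEN :low AND :high
            """,
            {"old": old_pos, "new": new_pos, "shift": shift,
             "gallery": gallery_id, "low": low, "high": high},
        )


def ordered_ids(conn: sqlite3.Connection, gallery_id: int) -> list:
    """Return image ids of a gallery in display order."""
    return [row[0] for row in conn.execute(
        "SELECT id FROM Image WHERE galleryId = ? ORDER BY displayOrder",
        (gallery_id,),
    )]


conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Image (
        id           INTEGER PRIMARY KEY,
        galleryId    INTEGER NOT NULL,
        displayOrder INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Image (id, galleryId, displayOrder) VALUES (?, 20, ?)",
    [(271, 1), (270, 2), (278, 3)],
)

move_image(conn, 20, 3, 1)  # bring image 278 from third place to first
after_move = ordered_ids(conn, 20)
print(after_move)
```

Only the rows between the old and new positions are updated, so the cost grows with the distance moved rather than with the size of the gallery.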
