What is causing this memory leak when (inner) joining this table?

I have an SQL query that, in my head, should run in under 1 second:
SELECT mem.`epid`,
mem.`model_id`,
em.`UKM_Make`,
em.`UKM_Model`,
em.`UKM_CCM`,
em.`UKM_Submodel`,
em.`Year`,
em.`UKM_StreetName`,
f.`fit_part_number`
FROM `table_one` AS mem
INNER JOIN `table_two` em ON mem.`epid` = em.`ePID`
INNER JOIN `table_three` f ON `mem`.`model_id` = f.`fit_model_id`
LIMIT 1;
When I run this SQL in the terminal, it executes in 16 seconds. However, if I remove the line:
INNER JOIN `table_three` f ON `mem`.`model_id` = f.`fit_model_id`
then it executes in 0.03 seconds. Unfortunately for me, I'm not too sure how to debug MySQL performance issues. This causes my PHP script to run out of memory trying to execute the query.
Here are my table structures:
table_one
+----------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------+---------+------+-----+---------+-------+
| epid | int(11) | YES | | NULL | |
| model_id | int(11) | YES | | NULL | |
+----------+---------+------+-----+---------+-------+
table_two
+----------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| ePID | int(11) | NO | | NULL | |
| UKM_Make | varchar(100) | NO | | NULL | |
| UKM_Model | varchar(100) | NO | | NULL | |
| UKM_CCM | int(11) | NO | | NULL | |
| UKM_Submodel | varchar(100) | NO | | NULL | |
| Year | int(11) | NO | | NULL | |
| UKM_StreetName | varchar(100) | NO | | NULL | |
| Vehicle Type | varchar(100) | NO | | NULL | |
+----------------+--------------+------+-----+---------+-------+
table_three
+-----------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+----------------+
| fit_fitment_id | int(11) | NO | PRI | NULL | auto_increment |
| fit_part_number | varchar(50) | NO | | NULL | |
| fit_model_id | int(11) | YES | | NULL | |
| fit_year_start | varchar(4) | YES | | NULL | |
| fit_year_end | varchar(4) | YES | | NULL | |
+-----------------+-------------+------+-----+---------+----------------+
The above is output from describe $table_name
Is there anything that I'm obviously missing and if not, how can I try to find out why including table_three causes such a slow response time?
EDIT ONE:
After the indexing suggestion (I used CREATE INDEX fit_model ON table_three (fit_model_id)), the LIMIT 1 query runs in 0.00 seconds (in MySQL). With the LIMIT removed, the query is still running after applying the suggestion, so I'm not quite there yet. I also followed Anton's suggestion to use EXPLAIN and got this output:
+------+-------------+-------+------+---------------+-----------+---------+----------------------+-------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-------+------+---------------+-----------+---------+----------------------+-------+-------------------------------------------------+
| 1 | SIMPLE | mem | ALL | NULL | NULL | NULL | NULL | 5587 | Using where |
| 1 | SIMPLE | f | ref | fit_model | fit_model | 5 | mastern.mem.model_id | 14 | |
| 1 | SIMPLE | em | ALL | NULL | NULL | NULL | NULL | 36773 | Using where; Using join buffer (flat, BNL join) |
+------+-------------+-------+------+---------------+-----------+---------+----------------------+-------+-------------------------------------------------+
EDIT TWO
I've added a Foreign Key based on suggestions using the below query:
ALTER TABLE `table_one`
ADD CONSTRAINT `model_id_fk_tbl_three`
FOREIGN KEY (`model_id`)
REFERENCES `table_three` (`fit_model_id`)
MySQL is still executing the command. There are a lot of rows, so I was half-expecting this behaviour. With PHP I can break up the query and build my array that way, so I guess that possibly solves the issue, though is there anything more I can do to try and reduce the execution time?

Based on everyone's comments etc. I managed to perform a few things that made my query run a hell of a lot quicker and not crash my script.
1) Indexes
I created an index on my table_three for the field fit_model_id:
CREATE INDEX fit_model ON `table_three` (`fit_model_id`);
This made my LIMIT 1 query go from 16 seconds to 0.03 seconds execution time (in MYSQL CLI).
However, 100 rows or so would still take a lot longer than I thought.
2) Foreign Keys
I created a foreign key that linked table_one.model_id = table_three.fit_model_id using the below query:
ALTER TABLE `table_one`
ADD CONSTRAINT `model_id_fk_tbl_three`
FOREIGN KEY (`model_id`)
REFERENCES `table_three` (`fit_model_id`)
This definitely helped, but still felt like more could be done.
3) OPTIMIZE TABLE
I then used OPTIMIZE TABLE on these tables:
table_one
table_three
This then made my script work and my query as fast as ever. However, the issue I had was the large data set, so I let the query run in the MySQL CLI while increasing the LIMIT by 1000 on each script run to help the indexing process; I got all the way to 30K rows before it started crashing.
CLI took 31 minutes and 8 seconds to complete. So I did this:
31 x 60 = 1860
1860 + 8 = 1868
1868 / 448476 = 0.0042
So each row took 0.0042 seconds to complete - which is fast enough in my eyes.
Thanks to everyone for commenting and helping me debug and fix the issue :)

Based on the comments, the correct answer is as follows:
1) If a SELECT statement takes a long time to execute, put EXPLAIN in front of the SELECT.
2) Check whether possible_keys is empty (NULL) for any of the tables listed in the output.
3) Add FOREIGN KEYs (or plain indexes) on the join columns of the tables found in step 2. For very large tables it is also recommended to adjust the MAX_EXECUTION_TIME variable (this can be done for a single query).
4) After massive insert/update/delete operations, OPTIMIZE TABLE can also improve performance.
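The EXPLAIN check described above can be tried on a small scale. This is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in for MySQL (an assumption, so that it runs without a server; SQLite's EXPLAIN QUERY PLAN output looks different from MySQL's EXPLAIN). Table, column, and index names follow the question; the helper plan() is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stop SQLite from building temporary indexes on its own, so that the
# effect of the explicit index is visible in the plan.
conn.execute("PRAGMA automatic_index = OFF")
conn.executescript("""
    CREATE TABLE table_one (epid INTEGER, model_id INTEGER);
    CREATE TABLE table_three (
        fit_fitment_id INTEGER PRIMARY KEY,
        fit_part_number TEXT NOT NULL,
        fit_model_id INTEGER
    );
""")

QUERY = """
    SELECT mem.epid, f.fit_part_number
    FROM table_one AS mem
    INNER JOIN table_three AS f ON mem.model_id = f.fit_model_id
"""

def plan(connection, sql):
    """Return the human-readable 'detail' column of the query plan."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

plan_before = plan(conn, QUERY)   # no usable index on the join column yet
conn.execute("CREATE INDEX fit_model ON table_three (fit_model_id)")
plan_after = plan(conn, QUERY)    # the join can now look rows up by index

print("before:", plan_before)
print("after: ", plan_after)
```

In MySQL the equivalent signal is the possible_keys and key columns changing from NULL to the index name, as in the EXPLAIN output shown in the question's first edit.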

Related

How to efficiently calculate averages from a big table?

I have a table called ratings with the following fields:
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| rating_id | bigint(20) | NO | PRI | NULL | auto_increment |
| user_id | int(11) | NO | MUL | NULL | |
| movie_id | int(11) | NO | | NULL | |
| rating | float | NO | | NULL | |
+-----------+------------+------+-----+---------+----------------+
Indexes on this table:
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| ratings | 0 | PRIMARY | 1 | rating_id | A | 100076 | NULL | NULL | | BTREE | | |
| ratings | 0 | user_id | 1 | user_id | A | 564 | NULL | NULL | | BTREE | | |
| ratings | 0 | user_id | 2 | movie_id | A | 100092 | NULL | NULL | | BTREE | | |
+---------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
I have another table called movie_average_ratings which has the following fields:
+----------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------+---------+------+-----+---------+-------+
| movie_id | int(11) | NO | PRI | NULL | |
| average_rating | float | NO | | NULL | |
+----------------+---------+------+-----+---------+-------+
As is obvious by this point, I want to calculate the average rating of movies from the ratings table and update the movie_average_ratings table. I tried the following SQL query.
UPDATE movie_average_ratings
SET average_rating = (SELECT AVG(rating)
FROM ratings
WHERE ratings.movie_id = movie_average_ratings.movie_id);
Currently, there are around 10,000 movie records and 100,000 rating records, and I get a "Lock wait timeout exceeded; try restarting transaction" error. The number of records can grow significantly, so I don't think increasing the timeout is a good solution.
So, how can I write a 'scalable' query to achieve this? Is iterating over the movie_average_ratings records and calculating the averages individually the most efficient solution?

Without an explain, it's hard to be clear on what's holding you up. It's also not clear that you will get a performance improvement by storing this aggregated data as a denormalized table - if the query to calculate the ratings executes in 0.04 seconds, it's unlikely querying your denormalized table will be much faster.
In general, I recommend only denormalizing if you know you have a performance problem.
But that's not the question.
I would do the following:
delete from movie_average_ratings;
insert into movie_average_ratings
Select movie_ID, avg(rating)
from ratings
group by movie_id;
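As an illustration of the rebuild above, here is a minimal sketch in Python with the standard-library sqlite3 module standing in for MySQL (an assumption made so the example is self-contained). The sample ratings are invented for the demonstration. The delete and the insert run inside one transaction, so readers never see a half-empty table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ratings (
        rating_id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        movie_id INTEGER NOT NULL,
        rating REAL NOT NULL
    );
    CREATE TABLE movie_average_ratings (
        movie_id INTEGER PRIMARY KEY,
        average_rating REAL NOT NULL
    );
""")
# Invented sample data: two ratings for movie 10, one for movie 20.
conn.executemany(
    "INSERT INTO ratings (user_id, movie_id, rating) VALUES (?, ?, ?)",
    [(1, 10, 4.0), (2, 10, 2.0), (1, 20, 5.0)],
)

# Rebuild the summary table in a single pass over ratings.
with conn:  # commits on success, rolls back on error
    conn.execute("DELETE FROM movie_average_ratings")
    conn.execute("""
        INSERT INTO movie_average_ratings (movie_id, average_rating)
        SELECT movie_id, AVG(rating) FROM ratings GROUP BY movie_id
    """)

averages = dict(conn.execute(
    "SELECT movie_id, average_rating FROM movie_average_ratings"))
print(averages)
```

One difference from the original UPDATE: movies without any ratings get no row in movie_average_ratings after the rebuild.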

I just found something in another post:
What is happening is, some other thread is holding a record lock on
some record (you're updating every record in the table!) for too long,
and your thread is being timed out.
This means that some of your records are locked. You can force-unlock them from the console:
1) Enter MySQL: mysql -u your_user -p
2) List the tables that are currently in use: mysql> show open tables where in_use>0;
3) List the current processes; one of them is locking your table(s): mysql> show processlist;
4) Kill one of these processes: mysql> kill put_process_id_here;

You could redesign the movie_average_ratings table to
movie_id (int)
sum_of_ratings (int)
num_of_ratings (int)
Then, when a new rating is added, you can add it to sum_of_ratings, increment num_of_ratings, and calculate the average as sum_of_ratings / num_of_ratings whenever it is needed.
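A minimal sketch of this running-totals design, written in Python with the standard-library sqlite3 module as a stand-in for MySQL (an assumption for runnability). The helpers add_rating() and average_rating() are hypothetical names. The sum is stored as a floating-point column here because rating is a float in the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE movie_average_ratings (
        movie_id INTEGER PRIMARY KEY,
        sum_of_ratings REAL NOT NULL,
        num_of_ratings INTEGER NOT NULL
    )
""")

def add_rating(connection, movie_id, rating):
    """Fold one new rating into the running totals for a movie."""
    with connection:
        cur = connection.execute(
            "UPDATE movie_average_ratings "
            "SET sum_of_ratings = sum_of_ratings + ?, "
            "    num_of_ratings = num_of_ratings + 1 "
            "WHERE movie_id = ?",
            (rating, movie_id),
        )
        if cur.rowcount == 0:
            # First rating for this movie: create its totals row.
            connection.execute(
                "INSERT INTO movie_average_ratings VALUES (?, ?, 1)",
                (movie_id, rating),
            )

def average_rating(connection, movie_id):
    """Return the current average, or None if the movie has no ratings."""
    row = connection.execute(
        "SELECT sum_of_ratings / num_of_ratings "
        "FROM movie_average_ratings WHERE movie_id = ?",
        (movie_id,),
    ).fetchone()
    return row[0] if row else None

for value in (4.0, 2.0, 3.0):
    add_rating(conn, 10, value)
print(average_rating(conn, 10))
```

Each new rating touches a single row, so the cost of keeping the average current no longer depends on the size of the ratings table.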

How can a simple MySQL insert/update be slower than an external web request?

Due to some performance issues I've been optimizing several SQL queries and adding indexes to certain tables/columns to speed up things.
Been running some time tests using microtime() in PHP (looping the queries a couple hundred times and calling RESET QUERY CACHE in each loop). I'm somewhat baffled by the results from one of the functions that does 3 things:
1. Inserts a row in a sessions table (InnoDB).
2. Updates a row in a users table (InnoDB).
3. Sends the session ID to a remote server, which inserts the session ID in a session table of its own (MongoDB).
Step 1. generally takes 30 - 40 ms, step 2. 20 - 30 ms and step 3. 7 - 20 ms.
I've tried looking up some expected query times for MySQL, but haven't found anything useful, so I don't know what to expect. Having said that, those query times seem somewhat high, and I would definitely not expect the web request to finish faster than the MySQL queries to the local database.
Any idea if those query times are reasonable compared to the web request?
SQL/system information
Both servers (the remote and the one with the MySQL database) are virtual servers running on the same physical server with shared storage (a multiple-SSD RAID setup). The remote server has a single CPU and 2 GB RAM assigned; the MySQL server has 8 CPUs and 32 GB RAM assigned. Both servers are on the same LAN.
The sessions insert query:
INSERT INTO sessions (
session_id,
user_id,
application,
machine_id,
user_agent,
ip,
method,
created,
last_active,
expires
)
VALUES (
?, -- session_id (string)
?, -- user_id (int)
?, -- application (string)
?, -- machine_id (string)
?, -- user_agent (string)
?, -- ip (string)
?, -- method (string)
CURRENT_TIMESTAMP, -- created
CURRENT_TIMESTAMP, -- last_active
FROM_UNIXTIME(?) -- expires: bind NULL or a PHP timestamp
)
The sessions table (contains ~500'000 rows):
+-------------+---------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+---------------+------+-----+---------+----------------+
| sessions_id | int(11) | NO | PRI | NULL | auto_increment |
| session_id | char(32) | NO | UNI | NULL | |
| user_id | int(11) | NO | MUL | NULL | |
| application | varchar(128) | NO | | NULL | |
| machine_id | varchar(36) | NO | | NULL | |
| user_agent | varchar(1024) | NO | | NULL | |
| ip | varchar(15) | NO | | NULL | |
| method | varchar(20) | NO | | NULL | |
| created | datetime | NO | | NULL | |
| last_active | datetime | NO | | NULL | |
| expires | datetime | YES | MUL | NULL | |
+-------------+---------------+------+-----+---------+----------------+
The users update query:
UPDATE users
SET last_active = ? -- string, for example '2016-01-01 00:00:00'
WHERE user_id = ? -- int
The users table (contains ~200'000 rows):
+------------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------------+------+-----+---------+----------------+
| user_id | int(11) | NO | PRI | NULL | auto_increment |
| username | varchar(64) | NO | MUL | NULL | |
| first_name | varchar(256) | NO | | NULL | |
| last_name | varchar(256) | NO | | NULL | |
| info | varchar(512) | NO | | NULL | |
| address1 | varchar(512) | NO | | NULL | |
| address2 | varchar(512) | NO | | NULL | |
| city | varchar(256) | NO | | NULL | |
| zip_code | varchar(128) | NO | | NULL | |
| state | varchar(256) | NO | | NULL | |
| country | varchar(128) | NO | | NULL | |
| locale | varchar(5) | NO | | NULL | |
| phone | varchar(128) | NO | | NULL | |
| email | varchar(256) | NO | MUL | NULL | |
| password | char(60) | NO | MUL | NULL | |
| permissions | bigint(20) unsigned | NO | | 0 | |
| created | datetime | YES | | NULL | |
| last_active | datetime | YES | | NULL | |
+------------------------+---------------------+------+-----+---------+----------------+

It seems that the problem was simply our MySQL settings (they were all default).
I ran a MySQL profile on the users update query and found that the step query end was taking up the majority of the time spent executing the query.
Googling that led me to https://stackoverflow.com/a/12446986/736247 - rather than using all the suggested values directly (which cannot be recommended, because some of them can have adverse effects on data integrity) I found some more info, including this page on Percona: https://www.percona.com/blog/2013/09/20/innodb-performance-optimization-basics-updated/.
The MySQL manual page "InnoDB Startup Options and System Variables" was also useful: http://dev.mysql.com/doc/refman/5.7/en/innodb-parameters.html
I ended up setting new values for the following settings:
innodb_flush_log_at_trx_commit
innodb_flush_method
innodb_buffer_pool_size
innodb_buffer_pool_instances
innodb_log_file_size
This resulted in significantly shorter query times (measured in the same way as I did in the question):
Insert a row in a sessions table: ~8 ms (down from 30-40 ms).
Update a row in a users table: ~2.5 ms (down from 20-30 ms).

MySQL is not correctly selecting rows (sometimes)

This is an update to this question, wherein I was casting around trying to work out what on earth was going on:
MySQL sometimes erroneously returns 0 for count(*)
I ended up accepting an answer there because it did answer the question I posed ("why might this happen") even though it didn't answer the question I really wanted to know about ("why is this happening to me"). But I've managed to narrow things down a little bit on the latter question, and think I can definitively say that something is wrong in a way that I don't understand and have never seen before.
The issue has been really difficult to debug because, for reasons beyond my comprehension, logging in to the database automagically fixes it. However, today I managed to trigger the problematic state while having an open MySQL session in a terminal. Here are some queries and the subsequent responses taken from that session:
First, this is my table layout:
mysql> describe forum_posts;
+-----------+------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------+------+-----+---------+----------------+
| post_id | int(11) | NO | PRI | NULL | auto_increment |
| thread_id | int(11) | YES | MUL | NULL | |
| forum_id | int(11) | YES | MUL | NULL | |
| user_id | int(11) | YES | MUL | NULL | |
| moderator | tinyint(1) | NO | | 0 | |
| message | mediumtext | YES | MUL | NULL | |
| date | int(11) | NO | MUL | NULL | |
| edited | int(11) | YES | | NULL | |
| deleted | tinyint(1) | YES | MUL | 0 | |
| bbcode | tinyint(1) | NO | | 1 | |
+-----------+------------+------+-----+---------+----------------+
10 rows in set (0.00 sec)
Now, let's look at how many posts there are in a given forum thread:
mysql> SELECT count(post_id) as num FROM `forum_posts` where thread_id=5243;
+-----+
| num |
+-----+
| 195 |
+-----+
1 row in set (0.00 sec)
OK, but I only want forum posts that don't have the deleted flag set:
mysql> SELECT count(post_id) as num FROM `forum_posts` where thread_id=5243 and deleted=0;
+-----+
| num |
+-----+
| 0 |
+-----+
1 row in set (0.06 sec)
mysql> select post_id,deleted from forum_posts where thread_id=5243 and deleted=0;
Empty set (0.06 sec)
OK, let's just double-make-sure that they aren't actually all deleted:
mysql> select post_id,deleted from forum_posts where thread_id=5243;
+---------+---------+
| post_id | deleted |
+---------+---------+
| 104081 | 0 |
| 104082 | 0 |
[snip]
| 121162 | 0 |
| 121594 | 0 |
+---------+---------+
195 rows in set (0.00 sec)
Every row in that table has 'deleted' set to 0, and yet adding and deleted=0 to the query yields no results. Until I open a new session by logging in to MySQL again from a terminal window, after which I can once again properly select rows where 'deleted' is 0.
What on earth?
UPDATES:
@miken32 in the comments below suggested I try an EXPLAIN SELECT ..., so:
mysql> explain select post_id,deleted from forum_posts where thread_id='5243' and deleted=0;
+----+-------------+-------------+-------------+-------------------+-------------------+---------+------+------+--------------------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------------+-------------------+-------------------+---------+------+------+--------------------------------------------------------------+
| 1 | SIMPLE | forum_posts | index_merge | thread_id,deleted | thread_id,deleted | 5,2 | NULL | 97 | Using intersect(thread_id,deleted); Using where; Using index |
+----+-------------+-------------+-------------+-------------------+-------------------+---------+------+------+--------------------------------------------------------------+
1 row in set (0.00 sec)

Based on the comment that using FORCE KEY alters the result of the query, it is very likely that we are dealing with an index merge optimizer bug. EXPLAIN of the original query shows that it is executed by reading from the thread_id key, then from the deleted key, and then intersecting the results. When we force the optimizer to bypass that code, the problem goes away.
The next steps from this point:
try it on the same data with the most recent 5.6 version of MySQL
if the issue reproduces, try to isolate it to the most minimal test case, visit http://bugs.mysql.com/ and report the bug

Exorcise the daemons and ghosts! Add this index to avoid any "merge" bug:
INDEX(deleted, thread_id) and DROP the key on just deleted
An index on a flag is almost always useless. This time it was worse than useless.
This will be cheaper, faster, and safer than FORCE INDEX.

Ranking players in assassins game from mysql table

I am trying to figure out how to calculate the rankings for a game of assassins that I am running. I want to rank people primarily by kills, then by time of kills (those who got their kills earlier are ranked higher), and lastly to rank people who have already been assassinated below those who are still alive.
My table for logging assassinations looks like this:
mysql> describe assassinations;
+-----------+-----------------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+-----------------------------------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| assassin | int(11) | NO | | NULL | |
| target | int(11) | NO | | NULL | |
| timestamp | int(11) | NO | | NULL | |
| ver | enum('assassin','target','both','none') | NO | | none | |
| confirmed | bit(1) | NO | | b'0' | |
+-----------+-----------------------------------------+------+-----+---------+----------------+
I am thinking that there must be a way to order the mysql results just like the way I want it to be ranked, but I don't know how. I got as far as trying to get the most common assassin value :(. I am using PHP with MySQL so a PHP solution would also work. (Please note, ignore the "confirmed" field, but "ver" must be both for it to be a valid kill).
Any help would be much appreciated. :)

Use COUNT and MIN to get the number of kills and the time of the first kill. And an EXISTS subquery to tell if the assassin has already been killed. Then you can use all these values in the ORDER BY clause to rank the players.
SELECT a1.assassin, COUNT(*) AS kills, MIN(a1.timestamp) AS killtime,
EXISTS (SELECT * FROM assassinations AS a2
WHERE a2.target = a1.assassin AND a2.ver = 'both') AS killed
FROM assassinations AS a1
WHERE a1.ver = 'both'
GROUP BY a1.assassin
ORDER BY kills DESC, killtime ASC, killed ASC;
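To see how the three sort keys interact, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL (an assumption for runnability; the enum column becomes plain text). The sample rows are invented. The query reads from the assassinations table described in the question and only treats a player as killed when the kill has ver = 'both'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE assassinations (
        id INTEGER PRIMARY KEY,
        assassin INTEGER NOT NULL,
        target INTEGER NOT NULL,
        timestamp INTEGER NOT NULL,
        ver TEXT NOT NULL DEFAULT 'none'
    )
""")
# Invented sample data: (assassin, target, timestamp, ver)
conn.executemany(
    "INSERT INTO assassinations (assassin, target, timestamp, ver) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 4, 100, "both"),
        (2, 5, 50, "both"),
        (3, 6, 50, "both"),
        (1, 2, 200, "both"),      # player 2 is now dead
        (3, 1, 300, "assassin"),  # unverified, so it must not count
    ],
)

ranking = conn.execute("""
    SELECT a1.assassin,
           COUNT(*) AS kills,
           MIN(a1.timestamp) AS killtime,
           EXISTS (SELECT 1 FROM assassinations AS a2
                   WHERE a2.target = a1.assassin
                     AND a2.ver = 'both') AS killed
    FROM assassinations AS a1
    WHERE a1.ver = 'both'
    GROUP BY a1.assassin
    ORDER BY kills DESC, killtime ASC, killed ASC
""").fetchall()

for row in ranking:
    print(row)  # (assassin, kills, killtime, killed)
```

Players 2 and 3 tie on kills and on first-kill time, so the killed flag decides between them. Players with no verified kills never appear as an assassin and are therefore missing from the result; listing them would need a separate players table, which the question does not show.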

What methods are there for storing the order of items in a database?

I'm creating a portfolio website that has galleries that contain images. I want the user of this portfolio to be able to order the images within a gallery. The problem itself is fairly simple I'm just struggling with deciding on a solution to implement.
There are 2 solutions I've thought of so far:
Simply adding an order column (or priority?) and then querying with an ORDER BY clause on that column. The disadvantage of this being that to change the order of a single image I'd have to update every single image in the gallery.
The second method would be to add 2 nullable columns next and previous that simply store the ID of the next and previous image. This would then mean there would be less data to update when the order was changed; however, it would be much more complex to set up and I'm not entirely sure how I'd actually implement it.
Extra options would be great.
Are those options viable?
Are there better options?
How could / should they be implemented?
The current structure of the two tables in question is the following:
mysql> desc Gallery;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| title | varchar(255) | NO | | NULL | |
| subtitle | varchar(255) | NO | | NULL | |
| description | varchar(5000) | NO | | NULL | |
| date | datetime | NO | | NULL | |
| isActive | tinyint(1) | NO | | NULL | |
| lastModified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+------------------+------+-----+-------------------+-----------------------------+
mysql> desc Image;
+--------------+------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+-----------------------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| galleryId | int(10) unsigned | NO | MUL | NULL | |
| description | varchar(250) | YES | | NULL | |
| path | varchar(250) | NO | | NULL | |
| lastModified | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+--------------+------------------+------+-----+-------------------+-----------------------------+
Currently there is no implementation of ordering in any form.

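While option 1 is a bit ugly, you can do:
UPDATE `table` SET `order` = `order` + 1 WHERE `order` >= 'orderValueOfItemYouCareAbout';
This will update all the rest of the images, and you won't have to do a ton of leg work. (table and order are placeholders here; both are reserved words in MySQL, so they are quoted with backticks.)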

As bart2puck has said and I stated in the question, option 1 is a little bit ugly; it is however the option I have chosen to go with to simplify the solution all round.
I have added a column (displayOrder int UNSIGNED) to the Image table after path. When I want to re-order a row in the table I simply swap rows around. So, if I have 3 rows:
mysql> SELECT id, galleryId, description, displayOrder FROM Image ORDER BY displayOrder;
+-----+-----------+----------------------------------+--------------+
| id | galleryId | description | displayOrder |
+-----+-----------+----------------------------------+--------------+
| 271 | 20 | NULL | 1 |
| 270 | 20 | Tracks leading into the ocean... | 2 |
| 278 | 20 | NULL | 3 |
+-----+-----------+----------------------------------+--------------+
3 rows in set (0.00 sec)
If I want to re-order row 278 to appear second rather than third, I'll simply swap it with the second by doing the following:
UPDATE Image SET displayOrder =
CASE displayOrder
WHEN 2 THEN 3
WHEN 3 THEN 2
END
WHERE galleryId = 20
AND displayOrder BETWEEN 2 AND 3;
Resulting in:
mysql> SELECT id, galleryId, description, displayOrder FROM Image ORDER BY displayOrder;
+-----+-----------+----------------------------------+--------------+
| id | galleryId | description | displayOrder |
+-----+-----------+----------------------------------+--------------+
| 271 | 20 | NULL | 1 |
| 278 | 20 | NULL | 2 |
| 270 | 20 | Tracks leading into the ocean... | 3 |
+-----+-----------+----------------------------------+--------------+
3 rows in set (0.00 sec)
One possible issue that some people may find is that you can only alter the position by one place with this method, i.e. to move image 278 to appear first I'd have to make it second, then first, otherwise the current first image would appear third.
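The one-place limitation can be lifted by shifting every image between the old and new positions in the same statement. This is a minimal sketch in Python with the standard-library sqlite3 module standing in for MySQL (an assumption for runnability); move_image() is a hypothetical helper, and the rows mirror the three-image example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Image (
        id INTEGER PRIMARY KEY,
        galleryId INTEGER NOT NULL,
        displayOrder INTEGER NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Image (id, galleryId, displayOrder) VALUES (?, ?, ?)",
    [(271, 20, 1), (270, 20, 2), (278, 20, 3)],
)

def move_image(connection, gallery_id, old_pos, new_pos):
    """Move the image at old_pos to new_pos, shifting the images in between."""
    if old_pos == new_pos:
        return
    low, high = min(old_pos, new_pos), max(old_pos, new_pos)
    # Moving an image up the list pushes the others down by one, and vice versa.
    shift = 1 if new_pos < old_pos else -1
    with connection:
        connection.execute(
            "UPDATE Image SET displayOrder = CASE "
            "  WHEN displayOrder = ? THEN ? "
            "  ELSE displayOrder + ? END "
            "WHERE galleryId = ? AND displayOrder BETWEEN ? AND ?",
            (old_pos, new_pos, shift, gallery_id, low, high),
        )

move_image(conn, 20, 3, 1)  # image 278 goes straight from third to first
order = [row[0] for row in conn.execute(
    "SELECT id FROM Image WHERE galleryId = 20 ORDER BY displayOrder")]
print(order)
```

Only the rows between the two positions are touched. If a unique index on (galleryId, displayOrder) were added in MySQL, this single-statement shift could hit duplicate-key errors part-way through, so the sketch assumes there is no such constraint.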
