This is more of a theoretical query than anything else, but I have a complex join (upwards of 1,900 records in the main table, combined with all the sub-result tables; the join is shown below), and the resulting web page takes 5-10 minutes on my local machine to finish building. I realize this could easily come down to many factors, but I am hoping for some hints. Basically, I load an array of names from two tables (one holds cross-references, so the array is used to sort the data on the names, with links and a field noting whether an entry is a cross-reference); then, if a name is not a cross-reference, I issue this join:
select
n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio,
n.HeadShotPhoto, n.HeadShotPhotographer, n.HeadShotContributor,
x.NameCode, x.NameAKA, x.AlternateName,
g.NameLink, g.`Group Name`,
p.NameLink, p.`Relationship Type`, p.`Related To Link`,
p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`,
p2.`Date Started`, p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink,
p2.`Screentip Text`,
a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode = ?
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate;
In order to output the various parts of the data, I:
1) Start a table,
2) Output the name and some other info in the first row,
3) Then in order to process, say, the groups (sub-groups someone associates themselves with within the organization), I issue:
mysqli_data_seek( $result, 0 ); // to rewind to top of data so we're at first row
and see if there's anything to process for subgroups (not everyone has anything ...),
4) I repeat for personal relationships, and other sections, going back to the top of the data and looping back through if there's anything to process.
When done with that individual, I close off the table, and loop back in the array to the next name, and repeat ...
While this works, 5-10 minutes is way too long to load a web page.
I am pondering ideas to resolve this, but I am not sure whether any specific aspect of my code is to blame. Is it the seeks back to the top of the result set? Is it the tables in the browser? Is it a combination of both (very possibly)? The program is too big to post here in its entirety. I am feeling rather flummoxed about how to resolve this, and I hope someone has pointers to help me speed up the processing; I also hope the details I've given are enough to work with.
Based on comments and feedback below, I ran the following in phpMyAdmin:
explain select n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio, n.HeadShotPhoto,
n.HeadShotPhotographer, n.HeadShotContributor,
x.NameCode, x.NameAKA, x.AlternateName,
g.NameLink, g.`Group Name`,
p.NameLink, p.`Relationship Type`, p.`Related To Link`,
p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`, p2.`Date Started`,
p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink, p2.`Screentip Text`,
a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode=638
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate
This returned:
id select_type table type possible_keys key key_len ref rows Extra
1 SIMPLE n const PRIMARY,ix1_names PRIMARY 4 const 1 Using temporary; Using filesort
1 SIMPLE x ref ix2_crossref ix2_crossref 4 const 1 NULL
1 SIMPLE g ref ix3_groups ix3_groups 4 const 3 NULL
1 SIMPLE p ref ix4_personal ix4_personal 4 const 1 NULL
1 SIMPLE p2 ref ix5_positions ix5_positions 4 const 13 NULL
1 SIMPLE a ref ix6_arts ix6_arts 4 const 28 NULL
This appears to be just a list of the indexes, so it doesn't seem to be helping me.
Since you are using a SINGLE main table and the rest of the joins are all OUTER JOINs, there's a single most important index that can make your query faster:
create index ix1_names on who_names (NameCode, Name);
Also, the Nested Loop Joins (NLJ) against the related tables will benefit from the following indexes. You may already have several of these, so check first; if you don't, then create them:
create index ix2_crossref on who_crossref (NameCode);
create index ix3_groups on who_groups (NameLink);
create index ix4_personal on who_personal (NameLink);
create index ix5_positions on who_positions (NameLink);
create index ix6_arts on who_arts (`Name Link`);
But again, the first one is the one I consider the most important.
You'll need to test for real to see if the performance improves with it/them.
If the query is still slow, please retrieve the execution plan, as @memo suggested, by using:
explain select ...
First, try removing the "order by" clause and see if that improves anything. Sometimes it can happen that the query itself is fast, but the re-ordering is slow, requiring temporary files.
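For the first suggestion, a trimmed-down timing test might look like this (columns cut for brevity; the tables and the NameCode value are from the question):

select n.NameCode, n.Name, g.`Group Name`, p2.`Date Started`, a.EventDate
from who_names as n
left outer join who_groups as g on n.NameCode = g.NameLink
left outer join who_positions as p2 on n.NameCode = p2.NameLink
left outer join who_arts as a on n.NameCode = a.`Name Link`
where n.NameCode = 638;
-- if this returns quickly, the sort ("Using temporary; Using filesort") is the bottleneck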
Second, feed the query to an EXPLAIN statement (e.g. EXPLAIN SELECT whathaveyou FROM table...). Check out the output for bottlenecks, missing indexes etc. (https://dev.mysql.com/doc/refman/8.0/en/using-explain.html)
After a lot of work, I found a few issues I was able to resolve: I was opening some tables just to get row counts when it wasn't necessary (it seemed to make sense at the time); I dropped the big join and now open the sub-tables only as needed; I cleaned up a few other places in the code; and I added a few more indexes on another set of tables that weren't in the original join. This brought the load time down from 4 minutes to 45 seconds. While 45 seconds is still a long time to load a page, this page handles up to 1,500 (sometimes more) primary records, pulls data from up to 10 different tables, and does heavy formatting (tables inside tables, etc.), so 45 seconds is probably doable, with a note at the top of the page and a progress bar displayed while the page loads. Thanks, all. The indexes did help, and the other explanations also helped a lot.
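The per-section lookups are now small single-table queries roughly along these lines, each run only when that section has rows (names as in the original join):

select g.`Group Name`
from who_groups as g
where g.NameLink = ?
order by g.`Group Name`;
-- who_personal, who_positions, and who_arts get the same one-table treatment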
I have a database where the results from a shooter game are stored. I put them into 3NF to allow for extensions of the system. So it looks like this:
Player
-------------------
GameId integer
PlayerId integer
TeamId integer
Hits
-------------------
GameId integer
FromId integer
ToId integer
Hits integer
So basically every game has an ID, and every player and team has its own ID (with the names stored in other tables).
Now I want to calculate points for each player. I need the points for each game but, more importantly, the total per player. The points are basically: 3 points for each hit on an opponent, -2 points for each hit on a team member, and -2 points for each hit taken.
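For example, under these rules a player with 10 hits on opponents, 2 hits on team members, and 4 hits taken scores 3*10 - 2*2 - 2*4 = 18 points for that game.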
The calculation of the number of team hits alone requires a JOIN across three tables, and I fear for performance in a production environment. (Each game has ~8 players, so the Player table size is 8n and the Hits table size is (8-1)^2*n for n games.)
And at the end I need to calculate the points per player for each game and sum those up, because the minimum points per game should be zero. Finally, I need a rank for each player (player x has the 2nd-most total points, etc.).
I feel like I'm getting lost in overly complicated queries that will kill the database's performance at some point.
Could anyone judge the design and maybe give me some pointers on where to start looking further? I thought about storing the TeamHits and Points per game in the Player table (Points for summing over them, TeamHits for statistical purposes), but that would of course break normalization.
PS: I'm working with PHP 5 and MySQL. I also thought about fetching each game from the database, calculating the points in PHP (which I'm already doing when I show a game), and writing the result back (ideally when the game is entered into the DB, but also whenever the scoring parameters change).
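In SQL terms, that write-back would be something like the sketch below (Points and TeamHits would be new, denormalized columns on player):

ALTER TABLE player
    ADD COLUMN Points INT NULL,
    ADD COLUMN TeamHits INT NULL;

-- run when a game is entered, or re-run for every row when the scoring rules change
UPDATE player
SET Points = ?, TeamHits = ?
WHERE GameId = ? AND PlayerId = ?;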
Edit: my idea to avoid subselects would be:
SELECT p.*, SUM(h.Hits) AS TeamHits, SUM(h2.Hits) as Hits
FROM player p
LEFT JOIN
(hits h
INNER JOIN player p2
ON h.GameId=p2.GameId AND h.ToId=p2.PlayerId
)
ON p.GameId=p2.GameId AND h.FromId=p.PlayerId AND p.TeamId=p2.TeamId
GROUP BY p.PlayerId, p.GameId
LEFT JOIN hits h2
ON h2.GameId=p.GameId AND h2.FromId=p.PlayerId
But of course this does not work. Is it even possible to combine grouping with joins like this, or will I have to use subqueries?
The best I have is the following (team hits are also counted in Hits, so they get the combined (-2-3) weight to net out at -2):
SELECT p.PlayerId, SUM((-2-3)*IFNULL(th.TeamHits, 0) + (3)*IFNULL(h.Hits, 0) + (-2)*IFNULL(ht.HitsTaken, 0)) AS Points
FROM player p
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS TeamHits
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.FromId
INNER JOIN player p2
ON p.GameId=p2.GameId AND p2.PlayerId=h.ToId AND p.TeamId=p2.TeamId
GROUP BY p.PlayerId, p.GameId) th
ON p.GameId=th.GameId AND p.PlayerId=th.PlayerId
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS Hits
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.FromId
GROUP BY p.PlayerId, p.GameId) h
ON p.GameId=h.GameId AND p.PlayerId=h.PlayerId
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS HitsTaken
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.ToId
INNER JOIN player p2
ON p.GameId=p2.GameId AND p2.PlayerId=h.FromId AND p.TeamId!=p2.TeamId
GROUP BY p.PlayerId, p.GameId) ht
ON p.GameId=ht.GameId AND p.PlayerId=ht.PlayerId
GROUP BY p.PlayerId
Fiddle: http://sqlfiddle.com/#!9/dc0cb/4
Current problem: for a database with about 10,000 games, calculating the points for all players takes about 18 seconds. This is unusable, so I need to improve it...
Joins are not that expensive; subqueries are. As long as you can avoid subqueries, you're not hitting the database too badly.
Remember, a database is built for this stuff these days.
Just make sure you have the proper indexes on the right fields so it's optimised: TeamId, GameId, and PlayerId should all be indexed.
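For the schema in the question, that would be something along the lines of (index names are illustrative):

ALTER TABLE player ADD INDEX idx_player_game (GameId, PlayerId, TeamId);
ALTER TABLE hits ADD INDEX idx_hits_from (GameId, FromId);
ALTER TABLE hits ADD INDEX idx_hits_to (GameId, ToId);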
Just run it in phpMyAdmin and see how many milliseconds it takes to execute. If it takes more than 50 ms, it's a heavy query, but usually it's pretty hard to hit that; I once managed to make a very heavy query that joined 100,000+ rows out of different tables and views and still ran in 5 ms...
What number of requests per hour are we talking about? 200 players a day? 200,000 players a day? How often do the requests happen: 10 per second per player, or once a minute? How loaded is your database?
I think all these numbers are low, so you shouldn't worry about this optimisation yet.
Get your game up and running, clean up the PHP code where real gains can be had, and stay clear of complex subqueries or views.
As long as your query sticks to joins and unions, it's pretty darn fast. And if you must do a subquery, see if there isn't an alternative using a linking table to tie certain results to other tables, so you can do a join instead of a subquery.
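As a sketch of that idea applied to the scoring query above: the three aggregating subqueries can be cut down to a single derived table that scores each hit row once from the shooter's side and once from the victim's side, then groups it in one pass (untested against the fiddle; table and column names are from the question, and the per-game floor of zero is kept):

SELECT x.GameId, x.PlayerId, GREATEST(SUM(x.Pts), 0) AS Points
FROM (
    -- shooter's view: +3 per opponent hit, -2 net per team hit
    SELECT h.GameId, h.FromId AS PlayerId,
           CASE WHEN pf.TeamId = pt.TeamId THEN -2 * h.Hits ELSE 3 * h.Hits END AS Pts
    FROM hits h
    INNER JOIN player pf ON pf.GameId = h.GameId AND pf.PlayerId = h.FromId
    INNER JOIN player pt ON pt.GameId = h.GameId AND pt.PlayerId = h.ToId
  UNION ALL
    -- victim's view: -2 per hit taken from an opponent (as in the HitsTaken subquery)
    SELECT h.GameId, h.ToId AS PlayerId,
           -2 * h.Hits AS Pts
    FROM hits h
    INNER JOIN player pf ON pf.GameId = h.GameId AND pf.PlayerId = h.FromId
    INNER JOIN player pt ON pt.GameId = h.GameId AND pt.PlayerId = h.ToId
    WHERE pf.TeamId <> pt.TeamId
) AS x
GROUP BY x.GameId, x.PlayerId;

Summing those clamped per-game rows then gives the per-player totals; note that players with no hit rows at all drop out here and would need a LEFT JOIN from player to appear with zero points.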
I am new to all of this and I have Googled and searched on here, but to no avail. Using Google and some of the responses here, I've managed to solve a separate problem, but this is what I'm really interested in, and I am wondering whether it is even possible and how to accomplish it.
I have a MySQL table that looks like this:
id type of game players timestamp
1 poker a,b,c,d,e,f,g,h 2011-10-08 08:00:00
2 fencing i,j,k,l,m,n,o,p 2011-10-08 08:05:00
3 tennis a,e,k,g,p,o,d,z 2011-10-08 08:10:00
4 football x,y,f,b 2011-10-08 08:15:00
There are 7 types of games, and either 4 or 8 players separated by commas for each gametype.
However, the players are IRC nicknames so potentially there could be new players with unique nicknames all the time.
What I am trying to do is look in the players column of the entire table and find the top 10 players in terms of games played, regardless of the gametype, and print it out to a website in this format, e.g.:
Top 10 Players:
a (50 games played)
f (39 games played)
o (20 games played)
......
10 g (2 games played)
Does anyone have any idea how to accomplish this? Any help is appreciated! Honestly, without this website I would not have even come this far in my project!
My suggestion is that you don't keep a list of the players for each game in the same table, but rather implement a relationship between a games table and a players table.
The new model could look like:
TABLE Games:
id type of game timestamp
1 poker 2011-10-08 08:00:00
2 fencing 2011-10-08 08:05:00
3 tennis 2011-10-08 08:10:00
4 football 2011-10-08 08:15:00
TABLE Players:
id name
1 a
2 b
3 c
.. ..
TABLE PlayersInGame:
id idGame idPlayer current
1 1 1 true //Player a is currently playing poker
When a player starts a game, add it to the PlayersInGame table.
When a player exits a game, set the current status to false.
To retrieve the number of games played by a player, query the PlayersInGame table.
SELECT COUNT(*) FROM PlayersInGame WHERE idPlayer = 1
For faster processing you need to de-normalize the table (not actually denormalization, but I don't know what else to call it) and keep track of the number of games for each player in the Players table. This would increase the table size but provide better speed.
So add a games_played column to Players and then query:
SELECT * FROM Players ORDER BY games_played DESC LIMIT 10
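If you go this route, one way to keep the counter in sync (a sketch, assuming the tables above) is an AFTER INSERT trigger:

ALTER TABLE Players ADD COLUMN games_played INT NOT NULL DEFAULT 0;

-- bump the counter whenever a player is added to a game
CREATE TRIGGER trg_games_played
AFTER INSERT ON PlayersInGame
FOR EACH ROW
    UPDATE Players
    SET games_played = games_played + 1
    WHERE id = NEW.idPlayer;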
EDIT:
As Ilmari Karonen pointed out, to gain speed from this you must create an INDEX for the column games_played.
Unless you have a huge number of players, you probably don't need the denormalization step suggested at the end of Luchian Grigore's answer. Assuming tables structured as he initially suggests, and an index on PlayersInGame (idPlayer), the following query should be reasonably fast:
SELECT
name,
COUNT(*) AS games_played
FROM
PlayersInGame AS g
JOIN Players AS p ON p.id = g.idPlayer
GROUP BY g.idPlayer
ORDER BY games_played DESC
LIMIT 10
This does require a filesort, but only on the grouped data, so its performance will only depend on the number of players, not the number of games played.
P.S. If you do end up adding an explicit games_played column to the Players table, do remember to create an index on it; otherwise the denormalization will gain you nothing.
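For example:

ALTER TABLE Players ADD INDEX idx_games_played (games_played);
-- lets ORDER BY games_played DESC LIMIT 10 read the top rows straight off the index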
I'm creating an app for a trading card game called Magic: the Gathering, and I have made a query that checks all user-submitted decks and gives you the percentage of the deck's cards that you already have in your inventory. But my problem is that it does this for all the decks in the database. What I want is to return only the decks for which I already have at least 50% of the cards.
Here is the query:
SELECT
SUM(t.qty_inv) / t.deck_cards completed,
t.deck_id
FROM (
SELECT
CASE WHEN m.qty_inv IS NULL THEN 0 WHEN m.qty_inv > dc.card_qty THEN dc.card_qty ELSE m.qty_inv END qty_inv,
dc.deck_id,
d.deck_cards
FROM mtgb_test.decks d
INNER JOIN mtgb_test.decks_cards dc ON (d.deck_id = dc.deck_id)
LEFT OUTER JOIN (
SELECT
COUNT(*) qty_inv, item_print_id print_id
FROM mtgb_test.inventories_items
WHERE item_user_id = 1
GROUP BY item_print_id
) m ON (m.print_id = dc.card_print_id)
) t
GROUP BY deck_id
ORDER BY completed DESC;
The problem with this query is that I can't use the derived completed field in the where clause like so:
WHERE completed > 0.5
I don't know if variables can solve this problem; I tried a bit, but it got mish-mashed, as I'm very new to user-defined variables.
Edit: some good people answered below that I needed the HAVING syntax, which is the correct and obvious answer. I'll probably just choose the best answer based on the postscript question below.
Another thing: if I had 200,000 decks in my database and 2,000 cards in my inventory, I'm fine with this query looping through all of those and finding which decks I already have half the cards for. But my worry is the LEFT OUTER JOIN. I don't know how MySQL really works inside, but maybe someone who does can point it out: when looping through all 200,000 decks, will it run the left join's subquery for every deck, or will it be smart enough to cache it so that it queries my inventory items only once?
Thanks in advance,
Ramon
You can use HAVING completed > 0.5 to further reduce the number of rows returned. You can read more about the HAVING clause at http://dev.mysql.com/doc/refman/5.5/en/select.html.
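Applied to the query from the question, only the tail changes; HAVING is evaluated after GROUP BY, so it can see the completed alias:

SELECT
    SUM(t.qty_inv) / t.deck_cards completed,
    t.deck_id
FROM (
    SELECT
        CASE WHEN m.qty_inv IS NULL THEN 0 WHEN m.qty_inv > dc.card_qty THEN dc.card_qty ELSE m.qty_inv END qty_inv,
        dc.deck_id,
        d.deck_cards
    FROM mtgb_test.decks d
    INNER JOIN mtgb_test.decks_cards dc ON (d.deck_id = dc.deck_id)
    LEFT OUTER JOIN (
        SELECT
            COUNT(*) qty_inv, item_print_id print_id
        FROM mtgb_test.inventories_items
        WHERE item_user_id = 1
        GROUP BY item_print_id
    ) m ON (m.print_id = dc.card_print_id)
) t
GROUP BY deck_id
HAVING completed > 0.5
ORDER BY completed DESC;

As for the postscript: MySQL materializes a derived table in the FROM clause once as a temporary table for the duration of the query, so the inventory subquery is not re-run per deck.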
I have made a website for a 'walking challenge', which has a table that logs miles walked.
The target is 2105 miles (Newcastle, UK to Istanbul).
On the home page I have a leaderboard which currently shows the 5 teams who have racked up the most miles.
I am using the following query to achieve this:
SELECT
SUM(log.distance) AS l,
log.*,
team.*
FROM
team
RIGHT JOIN
log ON team.teamname = log.teamname
GROUP BY
log.teamname
ORDER BY
l DESC
However, I want this leaderboard to show the 5 teams that finished first rather than the 5 that have walked the furthest, i.e. the teams who reached 2105 miles first.
The current website can be viewed here
Add a nullable completedDate field to the table and populate it whenever someone completes the race. Order by the completed date.
There'd be no way to order by who finished first otherwise.
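A sketch of that idea, assuming the column is added to the team table from the question's query:

ALTER TABLE team ADD COLUMN completedDate DATETIME NULL;

-- set once, at the moment a team's total first reaches the target
UPDATE team
SET completedDate = NOW()
WHERE teamname = ? AND completedDate IS NULL;

-- leaderboard: the first five teams to finish
SELECT *
FROM team
WHERE completedDate IS NOT NULL
ORDER BY completedDate ASC
LIMIT 5;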
Since you have a timestamp field that records the current time every time a team enters the number of miles it has walked, you could do something like this:
SELECT
    SUM(log.distance) AS l,
    MAX(log.timestamp) AS t,
    log.*,
    team.*
FROM
    team
RIGHT JOIN
    log ON team.teamname = log.teamname
GROUP BY
    log.teamname
HAVING
    l >= 2105
ORDER BY
    t ASC
LIMIT 5
Keep in mind that this will only work if you don't allow a team to add extra miles after completing the target distance.
If they are able to add extra miles after completing the target, let me know, and I'll try looking for another query.