I have a database where the results from a shooter game are stored. I put them into 3NF to allow extensions of the system. So it looks like this:
Player
-------------------
GameId integer
PlayerId integer
TeamId integer
Hits
-------------------
GameId integer
FromId integer
ToId integer
Hits integer
So basically for every game there is an ID, and every player and team has its ID (with their names stored in other tables).
Now I want to calculate points for each player. I need the points for each game, but more importantly the total per player. The scoring is basically: 3 points for each hit on an opponent, -2 points for each hit on a team member, and -2 points for each hit taken. (For example, 10 opponent hits, 2 team hits and 5 hits taken score 3*10 - 2*2 - 2*5 = 16 points.)
The calculation of the number of team hits alone requires a JOIN over 3 tables, and I fear for performance in a production environment. (Each game has ~8 players, so for n games the player table holds 8n rows and the hits table roughly (8-1)^2*n rows.)
And at the end I need to calculate the points per player for each game and sum those up, because the minimum points per game should be zero. Finally, I need a rank for each player (player x has the 2nd-most total points, etc.).
I feel like I'm getting lost in overly complicated queries that will kill the database's performance at some point.
Could anyone judge the design and maybe give me some pointers where to start looking further? I thought about storing the team hits and points per game in the player table (points for summing over them, team hits for statistical purposes), but that would of course break normalization.
PS: I'm working with PHP 5 and MySQL. I also thought about fetching each game from the database, calculating the points in PHP (which I'm already doing when I show the game) and writing the result back (ideally when the game is inserted into the DB, but also whenever the scoring parameters change).
Edit: My idea to avoid the subselects was:
SELECT p.*, SUM(h.Hits) AS TeamHits, SUM(h2.Hits) as Hits
FROM player p
LEFT JOIN
(hits h
INNER JOIN player p2
ON h.GameId=p2.GameId AND h.ToId=p2.PlayerId
)
ON p.GameId=p2.GameId AND h.FromId=p.PlayerId AND p.TeamId=p2.TeamId
GROUP BY p.PlayerId, p.GameId
LEFT JOIN hits h2
ON h2.GameId=p.GameId AND h2.FromId=p.PlayerId
But of course this does not work. Is it even possible to combine groupings with joins or will I have to use subqueries?
Best I have is:
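-- Note: h.Hits below counts ALL hits made by the player, including team hits,
-- so the (-2-3) factor on TeamHits nets a team hit to -2 after its +3 from Hits.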
SELECT p.PlayerId, SUM((-2-3)*IFNULL(th.TeamHits, 0) + (3)*IFNULL(h.Hits, 0) + (-2)*IFNULL(ht.HitsTaken, 0)) AS Points
FROM player p
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS TeamHits
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.FromId
INNER JOIN player p2
ON p.GameId=p2.GameId AND p2.PlayerId=h.ToId AND p.TeamId=p2.TeamId
GROUP BY p.PlayerId, p.GameId) th
ON p.GameId=th.GameId AND p.PlayerId=th.PlayerId
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS Hits
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.FromId
GROUP BY p.PlayerId, p.GameId) h
ON p.GameId=h.GameId AND p.PlayerId=h.PlayerId
LEFT JOIN
(SELECT p.GameId, p.PlayerId, SUM(h.Hits) AS HitsTaken
FROM player p
INNER JOIN hits h
ON h.GameId=p.GameId AND p.PlayerId=h.ToId
INNER JOIN player p2
ON p.GameId=p2.GameId AND p2.PlayerId=h.FromId AND p.TeamId!=p2.TeamId
GROUP BY p.PlayerId, p.GameId) ht
ON p.GameId=ht.GameId AND p.PlayerId=ht.PlayerId
GROUP BY p.PlayerId
Fiddle: http://sqlfiddle.com/#!9/dc0cb/4
Current problem: for a database with about 10,000 games, calculating the points for all players takes about 18 seconds. This is unusable, so I need to improve it...
Joins are not that expensive; subqueries are. As long as you can avoid subqueries you're not hurting too badly.
Remember, a database is built for this kind of work these days.
Just make sure you have the proper indexes on the right fields so it's optimised: columns like TeamId, GameId and PlayerId should be indexed.
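For the schema in the question, that could look like this (a sketch; the index names are mine, and existing primary keys may already cover some of these):
-- Covering index for the player lookups by game/player/team:
CREATE INDEX idx_player_game ON player (GameId, PlayerId, TeamId);
-- Indexes for the two directions the hits table is probed in:
CREATE INDEX idx_hits_from ON hits (GameId, FromId);
CREATE INDEX idx_hits_to   ON hits (GameId, ToId);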
Just run the query in phpMyAdmin and see how many milliseconds it takes to execute. If it takes more than 50 ms it's a heavy query, but usually it's pretty hard to hit that. I once managed to write a very heavy query that joined 100,000+ rows out of different tables and views and still ran in 5 ms...
What number of requests per hour are we talking about? 200 players a day? 200,000 players a day? How often do the requests happen: 10 per second per player, or once a minute? How loaded is your database?
I think all these parameters are low, so you shouldn't worry about this optimisation yet.
Get your game up and running, clean up the PHP code where real gains can be had, and stay clear of complex subqueries or views.
As long as your queries stick to joins and unions they are pretty darn fast. And if you must do a subquery, see if there isn't an alternative way, such as a linking table that relates certain results to certain other tables, so you can do a join instead of a subquery.
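Following that advice, one way to cut down the question's 18-second query is a single pass over hits with conditional aggregation instead of three separate derived tables. A sketch, untested (the ps/pv aliases for shooter/victim and the GREATEST() per-game zero floor are my additions; it still uses derived tables for the two grouping levels, but hits is scanned only once):
SELECT PlayerId, SUM(GREATEST(GamePoints, 0)) AS Points
FROM (
    SELECT PlayerId, GameId, SUM(pts) AS GamePoints
    FROM (
        -- points as shooter: +3 per opponent hit, -2 per team hit
        SELECT ps.PlayerId, h.GameId,
               CASE WHEN pv.TeamId = ps.TeamId THEN -2 * h.Hits ELSE 3 * h.Hits END AS pts
        FROM hits h
        JOIN player ps ON ps.GameId = h.GameId AND ps.PlayerId = h.FromId
        JOIN player pv ON pv.GameId = h.GameId AND pv.PlayerId = h.ToId
        UNION ALL
        -- points as victim: -2 per hit taken from an opponent
        SELECT pv.PlayerId, h.GameId, -2 * h.Hits
        FROM hits h
        JOIN player ps ON ps.GameId = h.GameId AND ps.PlayerId = h.FromId
        JOIN player pv ON pv.GameId = h.GameId AND pv.PlayerId = h.ToId
        WHERE ps.TeamId <> pv.TeamId
    ) raw
    GROUP BY PlayerId, GameId
) per_game
GROUP BY PlayerId;
Players with no hit rows at all drop out here; LEFT JOIN from player if you need them listed with 0 points.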
Related
Currently I'm developing a backend CMS for an online shop.
I split the tables in my database as follows:
-products
-productdetails (description...)
-productimages
-product variants (colors...)
-product cross selling
Now on the product edit page I need to fetch all the data for a single product.
So my question is how I can get those details more efficiently than making 3-5 database calls.
Or would processing it in PHP be less efficient than making those 3-5 calls?
At the moment the query looks like this:
SELECT
pr.id, pr.categorieid, pr.itemnumber, pr.barcode, pr.price, pr.weight, pr.gender, pr.manufracture, pr.fsk18, pr.condition, pc.id AS pcid, pc.productcrossid, pc.sort, pd.productname,
pd.productdesc, pd.additional, pd.linktitle, pd.metatitle, pd.metadesc, pd.urlkeywords, pi.id AS piid, pi.wichimage, pi.variantid, pi.image, pi.imagealt, pv.id AS pvid, pv.variant,
pv.variantvalue, pv.sku, pv.price AS pvprice, pv.weight AS pvweight, pv.stock, pv.special
FROM
products pr
LEFT JOIN
productcross as pc
ON pr.id = pc.productid
LEFT JOIN
productdetails as pd
ON pr.id = pd.productid
LEFT JOIN
productimage as pi
ON pr.id = pi.productid AND pd.lang = pi.lang
LEFT JOIN
productvariants as pv
ON pr.id = pv.productid
WHERE
pr.id = :id
ORDER BY pd.lang ASC
As a result I receive many rows, because with the LEFT JOINs each value gets joined with the rows joined before.
The problem is that there is a dynamic number of rows for cross selling, variants and images, so it is arbitrary whether there are more variants or more images (otherwise I could at least group them, since each variant can get its own image, but there can also be more images than variants).
Products: 1 row; productdetails: one row per language in use, most likely 3.
Edit: According to EXPLAIN and the indexes I set, the performance of this single query is very good.
Edit: Following Paul Spiegel's suggestion I tried using GROUP_CONCAT:
SELECT
pr.id, pr.categorieid, pr.itemnumber, pr.barcode, pr.price, pr.weight, pr.gender, pr.manufracture, pr.fsk18, pr.condition, pc.id AS pcid, pc.productcrossid, pc.sort, pd.productname,
pd.productdesc, pd.additional, pd.linktitle, pd.metatitle, pd.metadesc, pd.urlkeywords
FROM
products pr
LEFT JOIN
productdetails as pd
ON pr.id = pd.productid
LEFT JOIN (
SELECT
GROUP_CONCAT(productcrossid) AS pcproductcrossid, GROUP_CONCAT(sort) AS pcsort, GROUP_CONCAT(id) AS pcid, productid
FROM productcross
WHERE productid = :id
) pc
ON pr.id = pc.productid
WHERE
pr.id = :id
ORDER BY pd.lang ASC
As a result I receive many rows, because with the LEFT JOINs each value gets joined with the rows joined before.
That's not what LEFT means.
X JOIN Y ON ... delivers rows that show up in both X and Y.
X LEFT JOIN Y ON ... delivers all the rows of X even if there is no matching row (or rows) in Y.
You might get "many rows" because the relationship is 1:many. Think of Classes JOIN Students: with JOIN you get multiple rows per Class (one per Student), and Classes without any Students are dropped. With LEFT JOIN, you additionally get a row for any Class with no Students.
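Using the product tables from this question (illustrative only):
-- JOIN: only products that actually have details
SELECT pr.id, pd.productname
FROM products pr
JOIN productdetails pd ON pd.productid = pr.id;
-- LEFT JOIN: every product; productname is NULL where no details exist
SELECT pr.id, pd.productname
FROM products pr
LEFT JOIN productdetails pd ON pd.productid = pr.id;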
Your query with products will be a huge explosion of rows: all products, expanded by multiple details, by multiple images, etc. It will be a mess.
In the EXPLAIN, multiply the numbers in the "Rows" column -- that gives a crude metric of how big the result set will be.
Use one query to get the images, another to get the colors, etc. Use JOIN (or LEFT JOIN) only when needed.
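That is, something like the following instead of one giant join (table names taken from the question):
SELECT * FROM productcross    WHERE productid = :id;
SELECT * FROM productdetails  WHERE productid = :id ORDER BY lang;
SELECT * FROM productimage    WHERE productid = :id;
SELECT * FROM productvariants WHERE productid = :id;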
GROUP_CONCAT() is handy sometimes. It might be useful to list the "colors". But for "images" you would then have to split the result apart so you can build multiple <img..> tags. That's rather easy to do, but it is extra work.
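For example, to list a product's variant values in one row (a sketch; column names taken from the question's SELECT list):
SELECT productid, GROUP_CONCAT(variantvalue) AS variantvalues
FROM productvariants
WHERE productid = :id
GROUP BY productid;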
It is usually 'wrong' to have a 1:1 mapping between tables. In such cases, why not have a single table?
Do not fear 3-5 queries. We are talking milliseconds. The rendering of the page is likely to take several times as long as the SELECTs. I often run several dozen queries to build a web page, yet I am satisfied with the performance. And yes, I subscribe to the notion of putting all the info about one 'product' on the page at once (when practical). It's much better than having to click here to get the colors and click there to see the images, etc.
Rather than hitting the database with so many queries, you can look at the concept known as flat tables in Magento.
The logic behind this concept is that whatever data is required on the front end is stored in a single flat table, in addition to being stored in the respective normalized tables.
So when querying you just pick the data from that flat table, rather than querying multiple tables and increasing the execution time.
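Outside Magento, a hand-rolled flat table could look like this sketch (the product_flat table, its column types and the :lang filter are my assumptions):
CREATE TABLE product_flat (
    productid   INT PRIMARY KEY,
    productname VARCHAR(255),
    price       DECIMAL(10,2),
    imagelist   TEXT
);
-- Rebuild a product's flat row whenever that product changes:
REPLACE INTO product_flat (productid, productname, price, imagelist)
SELECT pr.id, pd.productname, pr.price, GROUP_CONCAT(pi.image)
FROM products pr
JOIN productdetails pd ON pd.productid = pr.id AND pd.lang = :lang
LEFT JOIN productimage pi ON pi.productid = pr.id
WHERE pr.id = :id
GROUP BY pr.id, pd.productname, pr.price;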
For reference, please check out the link below; hope this helps.
Visit http://excellencemagentoblog.com/blog/2015/06/10/magento-flat-tables/
I do know the question is not about Magento, but you can build your own logic to achieve the same mechanism.
This is more of a theoretical question than anything else, but I have a complex join (shown below) that yields upwards of 1900 records in the main table, combined with all the sub-result tables in the join, and the resulting web page takes 5-10 minutes on my local machine to process and finish building. I realize this could easily be down to many factors, but I am hoping to get some hints. Basically I load an array of names from two tables (one is cross-references, so the array is used to sort the data on the names, with links and a field noting if it is a cross-reference), then, if a name is not a cross-reference, I issue this join:
select
n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio,
n.HeadShotPhoto, n.HeadShotPhotographer, n.HeadShotContributor,
x.NameCode, x.NameAKA, x.AlternateName,
g.NameLink, g.`Group Name`,
p.NameLink, p.`Relationship Type`, p.`Related To Link`,
p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`,
p2.`Date Started`, p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink,
p2.`Screentip Text`,
a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode = ?
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate;
In order to output the various parts of the data, I:
1) Start a table,
2) Output the name and some other info in the first row,
3) Then in order to process, say, the groups (sub-groups someone associates themselves with within the organization), I issue:
mysqli_data_seek( $result, 0 ); // to rewind to top of data so we're at first row
and check whether there's anything to process for subgroups (not everyone has anything...),
4) I repeat this for personal relationships and the other sections, going back to the top of the result set and looping through it again whenever there's anything to process.
When done with that individual, I close off the table, move on to the next name in the array, and repeat...
While this works, 5-10 minutes is way too long to load a web page.
I am pondering ideas to resolve this, but I am not sure whether it is any specific aspect of my code. Is it the seeks back to the top of the row set? Is it the tables in the browser? Is it a combination of both (very possibly)? The program is too big to post here in its entirety. I feel rather flummoxed about how to resolve this, and I hope the details I've given are enough to give you something to work with.
Based on comments and feedback below, I ran the following in phpMyAdmin:
explain select n.NameCode, n.AL_NameCode, n.Name, n.Name_HTML, n.Region, n.Local, n.Deceased,
n.ArmsLink, n.RollOfArms, n.Blazon, n.PreferredTitle, n.ShortBio, n.HeadShotPhoto,
n.HeadShotPhotographer, n.HeadShotContributor,
x.NameCode, x.NameAKA, x.AlternateName,
g.NameLink, g.`Group Name`,
p.NameLink, p.`Relationship Type`, p.`Related To Link`,
p2.Position_ID, p2.NameLink, p2.`Position Held`, p2.`Times Held`, p2.`Date Started`,
p2.`Date Ended`, p2.Hyperlink as pos_Hyperlink, p2.`Screentip Text`,
a.`Name Link`, a.Description, a.EventDate, a.Hyperlink, a.`Screentip Text`,
a.ExternalLink
from who_names as n
left outer join who_crossref as x on n.NameCode=x.NameCode
left outer join who_groups as g on n.NameCode=g.NameLink
left outer join who_personal as p on n.NameCode=p.NameLink
left outer join who_positions as p2 on n.NameCode=p2.NameLink
left outer join who_arts as a on n.NameCode=a.`Name Link`
where n.NameCode=638
order by n.Name desc, g.`Group Name`, p2.`Date Started`, a.EventDate
This returned:
id  select_type  table  type   possible_keys      key            key_len  ref    rows  Extra
1   SIMPLE       n      const  PRIMARY,ix1_names  PRIMARY        4        const  1     Using temporary; Using filesort
1   SIMPLE       x      ref    ix2_crossref       ix2_crossref   4        const  1     NULL
1   SIMPLE       g      ref    ix3_groups         ix3_groups     4        const  3     NULL
1   SIMPLE       p      ref    ix4_personal       ix4_personal   4        const  1     NULL
1   SIMPLE       p2     ref    ix5_positions      ix5_positions  4        const  13    NULL
1   SIMPLE       a      ref    ix6_arts           ix6_arts       4        const  28    NULL
Which appears to just be a list of the indexes, so it doesn't seem to be helping me.
Since you are using a SINGLE main table and the rest of the joins are all OUTER JOINs, there is a single most important index that can make your query faster:
create index ix1_names on who_names (NameCode, Name);
Also, the nested loop joins (NLJ) against the related tables will benefit from the following indexes. You may already have several of these, so check first; if you don't, create them:
create index ix2_crossref on who_crossref (NameCode);
create index ix3_groups on who_groups (NameLink);
create index ix4_personal on who_personal (NameLink);
create index ix5_positions on who_positions (NameLink);
create index ix6_arts on who_arts (`Name Link`);
But again, it's the first one that I consider the most important.
You'll need to test for real to see if performance improves with it/them.
If the query is still slow, please retrieve the execution plan, as @memo suggested, by using:
explain select ...
First, try removing the ORDER BY clause and see if that improves anything. Sometimes the query itself is fast, but the reordering is slow, requiring temporary files.
Second, feed the query to an EXPLAIN statement (e.g. EXPLAIN SELECT whathaveyou FROM table...). Check the output for bottlenecks, missing indexes, etc. (https://dev.mysql.com/doc/refman/8.0/en/using-explain.html)
After a lot of work, I found a few issues that I was able to resolve. I was opening some tables just to get row counts when it wasn't necessary (it seemed to make sense at the time); I dropped the big join and instead open the sub-tables as needed; I cleaned up a few other places in the code; and I added a few more indexes on another set of tables that weren't in the original join. This reduced the time from 4 minutes to 45 seconds. While 45 seconds is a long time to load a page, this page handles up to 1500 (sometimes more) primary records, pulls data from up to 10 different tables, and does heavy formatting (tables inside tables, etc.), so 45 seconds is probably doable with a note at the top of the page and a progress bar that displays while the page loads. Thanks, all. The indexes did help, and the other explanations also helped a lot.
So I have a query that works exactly as intended but needs heavy optimization. I am using software to track load times across my site, and 97.8% of all load time is the result of this one function. To explain my database a little before getting to the query:
First, I have a films table, a competitions table and a votes table. Films can be in many competitions and competitions have many films; since this is a many-to-many relationship, there is a pivot table (filmCompetition) to represent it. When loading the competition page, the films need to be ordered by their votes (most voted at the top, least at the bottom).
In the query below you can see what I am doing: grabbing the films from the filmCompetition table that match the current competition id $competition->id, then ordering by the total number of votes for each film. Like I said, this works, but it is super inefficient, and I cannot think of another way to do it.
$films = DB::select( DB::raw("SELECT f.*, COUNT(v.value) AS totalVotes
FROM filmCompetition AS fc
JOIN films AS f ON f.id = fc.filmId AND fc.competitionId = '$competition->id'
LEFT JOIN votes AS v ON f.id = v.filmId AND v.competitionId = '$competition->id'
GROUP BY f.id
ORDER BY totalVotes DESC
") );
For this query, you want indexes on filmCompetition(competitionId, filmId), films(id), and votes(filmId, competitionId).
However, it is probably more efficient to write the query like this:
SELECT f.*,
(SELECT COUNT(v.value)
FROM votes v
WHERE v.filmId = f.id and v.competitionId = '$competition->id'
) AS totalVotes
FROM films f
WHERE EXISTS (SELECT 1
FROM filmCompetition fc
WHERE fc.filmId = f.id AND
fc.competitionId = '$competition->id'
)
ORDER BY TotalVotes DESC
This saves the outer aggregation, which should be a performance win. For this version, the indexes you want are filmCompetition(filmId, competitionId) and votes(filmId, competitionId).
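In DDL terms, that would be something like (index names are mine):
CREATE INDEX idx_fc_film_competition    ON filmCompetition (filmId, competitionId);
CREATE INDEX idx_votes_film_competition ON votes (filmId, competitionId);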
The solution I actually ended up using was a little different, but since Gordon answered the question I marked his answer as the correct one.
My Solution
To actually fix this I took a slightly different approach: rather than trying to do all of this in SQL, I ran my three queries separately and then joined the results together in PHP. While this can be slower, in my case doing it my original way took about 15 seconds, doing it Gordon's way took about 7, and doing it my new way took about 600 ms.
Info: I have this table (PERSONS):
PERSON_ID int(10)
POINTS int(6)
4 OTHER COLUMNS of type int(5) or int(6)
The table consists of 25M rows and is growing by 0.25M a day. Points range from 0 to 300, and 85% of the table has 0 points.
Question: I would like to return to users their rank, provided they have at least 1 point. What would be the fastest way to do this: in SQL, in PHP, or a combination?
Extra info: those lookups can happen 100 times per second. The solutions I have seen so far are not fast enough; if more info is needed, please ask.
Any advice is welcome; as you can tell, I am new to PHP and MySQL :)
Create an index on persons(points) and another on persons(person_id, points).
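In MySQL DDL terms (index names are mine):
CREATE INDEX idx_persons_points        ON persons (points);
CREATE INDEX idx_persons_person_points ON persons (person_id, points);
Then run the following query: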
select count(*)
from persons p
where p.points >= (select p2.points from persons p2 where p2.person_id = <particular person>)
The subquery should use the second index as a point lookup; the outer count should then be a range scan on the first index.
Sometimes MySQL can be a little strange about optimization. So, this might actually be better:
select count(*)
from persons p cross join
(select points from persons where person_id = <particular person>) const
where p.points >= const.points;
This just ensures that the lookup for the points for the given person happens once, rather than for each row.
Partition your table into two partitions - one for people with 0 points and one for people with one or more points.
Add one index on points to your table and another on person_id (if these indexes don't already exist).
To find the dense rank of a specific person, run the query:
select count(distinct p2.points)+1
from person p1
join person p2 on p2.points > p1.points
where p1.person_id = ?
To find the non-dense rank of a specific person, run the query:
select count(*)
from person p1
join person p2 on p2.points >= p1.points
where p1.person_id = ?
(I would expect the dense rank query to run significantly faster.)
I've searched the site for similar posts, but I found just one, where the developer tried to do his calculations (wins, losses, draws) with an enormous SQL query. I would like to do the calculations in my controller but don't really know where to start.
I have 2 tables which look like this:
Teams
teamID teamName
Games
gameID matchday homeTeamID awayTeamID homeScore awayScore
Now I'm trying to produce a league ranking out of these match results, but I need some insight into how to approach this...
At the moment I have a query which selects all the match results and resolves the home and away team names from their IDs, like this:
"SELECT g.gameID, g.matchday, g.homeTeamID, g.awayTeamID, g.homeScore, g.awayScore, th.teamName as homeTeam, ta.teamName as awayTeam
FROM games AS g
INNER JOIN teams as th ON g.homeTeamID = th.teamID
INNER JOIN teams as ta ON g.awayTeamID = ta.teamID
JOIN submenu_teams AS s ON g.submenuID = s.submenuID"
Can anybody explain where to go from here to get a nice ranking of the teams according to how many points they won during the season?
Thanks!
I would suggest keeping track of the points in a table (e.g. season1), so that every time a page is requested you don't have to compute the rankings again: you just fetch them from the table.
Every time a new match is played, run a script that adds X points to the winner and subtracts Y points from the loser.
To display the ranking, fetch the results and order by score.
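A sketch of what that could look like (the standings table, the :placeholders and the 3-points-for-a-win scheme are my assumptions; adjust to your scoring rules):
CREATE TABLE standings (
    teamID INT PRIMARY KEY,
    points INT NOT NULL DEFAULT 0
);
-- After each match:
UPDATE standings SET points = points + 3 WHERE teamID = :winnerID;           -- win
UPDATE standings SET points = points + 1 WHERE teamID IN (:homeID, :awayID); -- draw
-- To display the ranking:
SELECT t.teamName, s.points
FROM standings s
JOIN teams t ON t.teamID = s.teamID
ORDER BY s.points DESC;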
You're done !
(Was it my post on rankings and SQL that you read?)