I need to calculate the average of each player's 3 most recent scores (golf rounds). If it matters to the code / syntax, this is only required for players who have at least 3 scores.
I have a view that has these fields:
round_id
player_id
score
round_date
As new scores are entered into the database, I would like to keep track of this average and be notified. I thought my options were either to keep this within the database (somehow) or to write PHP code to do the equivalent. I suspect keeping it inside the database would handle new insertions / updates better, since the PHP version would only run when a page is loaded.
I have seen some examples of nested select statements, and some that use MySQL variables (my SQL skills are basic and I have not really gone into variables, so those would need explaining). None seem to relate directly to my specific needs.
Thanks
Something like this (untested):
select player_id, avg(substring_index(substring_index(scores,',',round),',',-1))
from
(
select 1 round union all select 2 union all select 3
) last_rounds
cross join
(
select player_id, group_concat(score order by round_date desc) scores
from player_round
group by player_id
having count(*) >= 3
) player_scores
group by player_id
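For comparison, the same result can be sketched without GROUP_CONCAT, using a correlated subquery that keeps a round only when fewer than three newer rounds exist for that player. The sketch below runs the query from Python against an in-memory SQLite database purely so it can be tried without a server. The sample rows are made up, the table name player_round is taken from the query above, and it assumes no player has two rounds on the same round_date. The SELECT uses only standard constructs, so it should carry over to MySQL, though that is untested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE player_round ("
    "round_id INTEGER PRIMARY KEY, player_id INTEGER, score INTEGER, round_date TEXT)"
)
conn.executemany(
    "INSERT INTO player_round (player_id, score, round_date) VALUES (?, ?, ?)",
    [
        (1, 80, "2024-01-01"),  # oldest round for player 1, should be ignored
        (1, 90, "2024-01-08"),
        (1, 70, "2024-01-15"),
        (1, 76, "2024-01-22"),
        (2, 85, "2024-01-05"),  # player 2 has only two rounds, should be skipped
        (2, 88, "2024-01-12"),
        (3, 72, "2024-01-03"),
        (3, 75, "2024-01-10"),
        (3, 78, "2024-01-17"),
    ],
)

# Keep a round only if fewer than 3 rounds of the same player are newer,
# then average per player and drop players with fewer than 3 rounds.
QUERY = """
SELECT pr.player_id, AVG(pr.score) AS avg_last_3
FROM player_round AS pr
WHERE (SELECT COUNT(*)
       FROM player_round AS newer
       WHERE newer.player_id = pr.player_id
         AND newer.round_date > pr.round_date) < 3
GROUP BY pr.player_id
HAVING COUNT(*) = 3
ORDER BY pr.player_id
"""

averages = conn.execute(QUERY).fetchall()
for player_id, avg_last_3 in averages:
    print(player_id, round(avg_last_3, 2))
```

On a large table the correlated subquery would probably want an index on (player_id, round_date); that is a guess about the workload, not something measured.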
Related
I know this has been covered somewhat, but it hasn't helped my situation; rather, there have been discrepancies in the rank. Wondering if anyone would be able to help!
So I have a game; the database tables are:
users, maps, nicknames, user_game_scores
I'm developing a leaderboard and can easily get the information ordered by score. Fantastic.
But I want to rank this so that I can pull a specific user's scores and have the rank be relative to all scores, e.g.:
GLOBAL SCORES
user info - Score - (rank)1
user info - Score - (rank)2
user info - Score - (rank)3
etc.
Whereas USER SCORES are more likely to be:
user info - Score - (rank)82
user info - Score - (rank)94
user info - Score - (rank)115
etc.
I imagine the implementation to be this:
SELECT users.first_name, users.surname, player_nicknames.nickname, maps.map_name, user_game_scores.score,
FIND_IN_SET( score, ( SELECT GROUP_CONCAT( score ORDER BY score DESC ) FROM user_game_scores ) ) AS rank
FROM `user_game_scores`
INNER JOIN users ON user_game_scores.user_id = users.user_id
INNER JOIN maps ON user_game_scores.map_id = maps.map_id
INNER JOIN player_nicknames ON user_game_scores.user_id = player_nicknames.user_id
WHERE user_game_scores.deleted is null
AND users.deleted is null
AND player_nicknames.deleted is null
ORDER BY user_game_scores.score DESC
But it returns this: (click here) - names etc have been removed from the image as it may not be appropriate to display
As you can see, the rank tends to miss a number or two (numbers 2 and 23). I understand that tied scores, such as those at rank 24, will share a rank and the numbering will then continue (which I prefer in that instance), but I don't understand why some ranks are missing, and I really don't want to post-process this.
Sorry this is long, but I thought I'd provide as much information as I can. Thanks in advance!
It's probably because your SELECT GROUP_CONCAT subquery doesn't filter "deleted" (deleted is null) entries. – Paul Spiegel 9 hours ago
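The effect described in that comment can be reproduced outside the database. The sketch below is plain Python with invented data: find_in_set is a hypothetical stand-in that mimics how FIND_IN_SET returns the 1-based position of the first match, and each row is a (score, deleted) pair. When the ranked list still contains a deleted score, that score occupies a position that no visible row ever reports.

```python
def find_in_set(needle, csv):
    """Mimic MySQL FIND_IN_SET: 1-based position of the first match, 0 if absent."""
    items = csv.split(",")
    return items.index(needle) + 1 if needle in items else 0

# (score, deleted) pairs; the data is invented for illustration
rows = [(90, False), (85, True), (80, False), (80, False), (70, False)]

def ranks(rows, filter_deleted):
    # Scores that go into the GROUP_CONCAT list
    pool = [s for s, deleted in rows if not (filter_deleted and deleted)]
    csv = ",".join(str(s) for s in sorted(pool, reverse=True))
    # Rows the outer query displays (deleted rows are always hidden there)
    visible = sorted((s for s, deleted in rows if not deleted), reverse=True)
    return [find_in_set(str(s), csv) for s in visible]

unfiltered = ranks(rows, filter_deleted=False)
filtered = ranks(rows, filter_deleted=True)
print(unfiltered)  # rank 2 is taken by the deleted score and never shown
print(filtered)    # only the gap caused by the tie remains
```

If this is the cause, adding the same deleted filters to the GROUP_CONCAT subquery should leave only the gaps that follow tied scores.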
The current implementation is a single complex query with multiple joins and temporary tables, but it is putting too much stress on my MySQL server and is taking upwards of 30 seconds to load the table. The data is retrieved by PHP via a JavaScript Ajax call and displayed on a webpage. Here are the tables involved:
Table: table_companies
Columns: company_id, ...
Table: table_manufacture_line
Columns: line_id, line_name, ...
Table: table_product_stereo
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, ...
Table: table_product_television
Columns: product_id, line_id, company_id, assembly_datetime, serial_number, warranty_expiry, ...
A single company can have 100k+ items split between the two product tables. The product tables are unioned and filtered by the line_name, then ordered by assembly_datetime and limited depending on the paging. The datetime value is also reliant on timezone and this is applied as part of the query (another JOIN + temp table). line_name is also one of the returned columns.
I was thinking of splitting the line_name filter out from the product union query. Essentially I'd determine the ids of the lines that correspond to the filter, then do a UNION query with a WHERE condition WHERE line_id IN (<results from previous query>). This would cut out the need for joins and temp tables, and I can apply the line_name to line_id and timezone modification in PHP, but I'm not sure this is the best way to go about things.
I have also looked at potentially using Redis, but the large number of individual products is leading to a similarly long wait time when pushing all of the data to Redis via PHP (20-30 seconds), even if it is just pulled in directly from the product tables.
Is it possible to tweak the existing queries to increase the efficiency?
Can I push some of the handling to PHP to decrease the load on the SQL server? What about Redis?
Is there a way to architect the tables better?
What other solution(s) would you suggest?
I appreciate any input you can provide.
Edit:
Existing query:
SELECT line_name,CONVERT_TZ(datetime,'UTC',timezone) datetime,... FROM (SELECT line_name,datetime,... FROM ((SELECT line_id,assembly_datetime datetime,... FROM table_product_stereos WHERE company_id=# ) UNION (SELECT line_id,assembly_datetime datetime,... FROM table_product_televisions WHERE company_id=# )) AS union_products INNER JOIN table_manufacture_line USING (line_id)) AS products INNER JOIN (SELECT timezone FROM table_companies WHERE company_id=# ) AS tz ORDER BY datetime DESC LIMIT 0,100
Here it is formatted for some readability.
SELECT line_name,CONVERT_TZ(datetime,'UTC',tz.timezone) datetime,...
FROM (SELECT line_name,datetime,...
FROM (SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos WHERE company_id=#
UNION
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
) AS union_products
INNER JOIN table_manufacture_line USING (line_id)
) AS products
INNER JOIN (SELECT timezone
FROM table_companies
WHERE company_id=#
) AS tz
ORDER BY datetime DESC LIMIT 0,100
IDs are indexed; Primary keys are the first key for each column.
Let's build this query up from its component parts to see what we can optimize.
Observation: you're fetching the 100 most recent rows from the union of two large product tables.
So, let's start by trying to optimize the subqueries fetching stuff from the product tables. Here is one of them.
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
But look, you only need the 100 newest entries here. So, let's add
ORDER BY assembly_datetime DESC
LIMIT 100
to this query. Also, you should put a compound index on this table as follows. This will allow both the WHERE and ORDER BY lookups to be satisfied by the index.
CREATE INDEX id_date ON table_product_stereos (company_id, assembly_datetime)
All the same considerations apply to the query from table_product_televisions. Order it by the time, limit it to 100, and index it.
If you need to apply other selection criteria, you can put them in these inner queries. For example, in a comment you mentioned a selection based on a substring search. You could do this as follows
SELECT t.line_id,t.assembly_datetime datetime,...
FROM table_product_stereos AS t
JOIN table_manufacture_line AS m ON m.line_id = t.line_id
AND m.line_name LIKE '%test'
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
Next, you are using UNION to combine those two query result sets into one. UNION has the function of eliminating duplicates, which is time-consuming. (You know you don't have duplicates, but MySQL doesn't.) Use UNION ALL instead.
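The difference in behaviour is easy to see on a toy example. This is a minimal sketch using Python's built-in sqlite3 module with two made-up tables (stereos and televisions, hypothetical stand-ins for the real product tables) that happen to share one identical row. UNION removes the duplicate, which requires comparing rows, while UNION ALL simply concatenates the two result sets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stereos (line_id INTEGER, assembled TEXT);
CREATE TABLE televisions (line_id INTEGER, assembled TEXT);
INSERT INTO stereos VALUES (1, '2024-01-01'), (2, '2024-01-02');
INSERT INTO televisions VALUES (1, '2024-01-01'), (3, '2024-01-03');
""")

# UNION must detect and drop the row that appears in both tables
union_rows = conn.execute(
    "SELECT line_id, assembled FROM stereos "
    "UNION SELECT line_id, assembled FROM televisions"
).fetchall()

# UNION ALL keeps every row and skips the duplicate check
union_all_rows = conn.execute(
    "SELECT line_id, assembled FROM stereos "
    "UNION ALL SELECT line_id, assembled FROM televisions"
).fetchall()

print(len(union_rows), len(union_all_rows))
```

The two forms are interchangeable only when the selected columns cannot collide across the tables, for example when a serial number is part of the select list.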
Putting this all together, the innermost sub query becomes this. We have to wrap up the subqueries because SQL is confused by UNION and ORDER BY clauses at the same query level.
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_stereos
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS st
UNION ALL
SELECT * FROM (
SELECT line_id,assembly_datetime datetime,...
FROM table_product_televisions
WHERE company_id=#
ORDER BY assembly_datetime DESC
LIMIT 100
) AS tv
That gets you 200 rows. It should get those rows fairly quickly.
200 rows are guaranteed to be enough to give you the 100 most recent items later on after you do your outer ORDER BY ... LIMIT operation. But that operation only has to crunch 200 rows, not 100K+, so it will be far faster.
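The claim that 200 rows are enough can be checked with a small simulation. The sketch below is plain Python with random integers standing in for assembly_datetime values; the list sizes and the helper name newest are made up for illustration. It compares the 100 newest rows of the full union with the 100 newest rows taken from the two per-table top-100 lists.

```python
import heapq
import random

def newest(rows, n):
    """Return the n largest timestamps, newest first (ORDER BY ... DESC LIMIT n)."""
    return heapq.nlargest(n, rows)

random.seed(42)
stereos = [random.randrange(10**9) for _ in range(50_000)]
televisions = [random.randrange(10**9) for _ in range(50_000)]

PAGE = 100
# Sort everything, then cut: what the original query effectively does
full_scan = newest(stereos + televisions, PAGE)
# Cut each table first, then sort only 200 rows
pruned = newest(newest(stereos, PAGE) + newest(televisions, PAGE), PAGE)

same = full_scan == pruned
print(same)
```

One caveat for paging: with LIMIT offset, count in the outer query, each inner query would need LIMIT offset + count, because all of the earlier rows could come from a single table.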
Finally wrap up this query in your outer query material. Join the table_manufacture_line information, and fix up the timezone.
If you do the indexing and the ORDER BY ... LIMIT operation earlier, this query should become very fast.
The comment dialog in your question indicates to me that you may have multiple product types, not just two, and that you have complex selection criteria for your paged display. Using UNION ALL on large numbers of rows slams performance: it converts multiple indexed tables into an internal list of rows that simply can't be searched efficiently.
You really should consider putting your two kinds of product data in a single table instead of having to UNION ALL multiple product tables. The setup you have now is inflexible and won't scale up easily. If you structure your schema with a master product table and perhaps some attribute tables for product-specific information, you will find yourself much happier two years from now. Seriously. Please consider making the change.
Remember: Index fast, data slow. Use joins over nested queries. Nested queries return all of the data fields whereas joins just consider the filters (which should all be indexed - make sure there's an index on table_product_*.line_id; it cannot be unique, since many products share a line). It's been a while, but I'm pretty sure you can join "ON company_id=#", which should cut down the results early on.
In this case, all of the results refer to the same company (or a much smaller subset) so it makes sense to run that query separately (and it makes the query more maintainable).
So your data source would be:
SELECT ...
FROM table_product_stereos AS prod
INNER JOIN table_manufacture_line AS ml ON prod.line_id = ml.line_id AND prod.company_id=#
UNION
SELECT ...
FROM table_product_televisions AS prod
INNER JOIN table_manufacture_line AS ml ON prod.line_id = ml.line_id AND prod.company_id=#
where each SELECT list names the prod.* or ml.* fields you require.
PHP is not a solution at all...
Redis can be a solution.
But the main thing I would change is the index creation for the tables (add missing indexes)... If you're running into temp tables, the tables aren't indexed well. And 100k rows is not much at all.
But I can't help you without the table creation statements and the queries you run.
Make sure the columns in your WHERE clause form the leading (leftmost) columns of your B-tree index.
I have this SQL query here that grabs the 5 latest news posts. I want to make it so it also grabs the total likes and total news comments in the same query. But the query I made seems to be a little slow when working with large amounts of data so I am trying to see if I can find a better solution. Here it is below:
SELECT *,
`id` as `newscode`,
(SELECT COUNT(*) FROM `likes` WHERE `type`="newspost" AND `code`=`newscode`) as `total_likes`,
(SELECT COUNT(*) FROM `news_comments` WHERE `post_id`=`newscode`) as `total_comments`
FROM `news` ORDER BY `id` DESC LIMIT 5
Here is a SQLFiddle as well: http://sqlfiddle.com/#!2/d3ecbf/1
I would recommend adding a total_likes and total_comments fields to the news table which gets incremented/decremented whenever a like and/or comment is added or removed.
Your likes and news_comments tables should be used for historical purposes only.
This strenuous counting should not be performed every time a page is loaded because that is a complete waste of resources.
You could rewrite this using joins; MySQL has known performance issues with correlated subqueries, especially when dealing with large data sets:
SELECT n.*,
`id` as `newscode`,
COALESCE(l.TotalLikes, 0) AS `total_likes`,
COALESCE(c.TotalComments, 0) AS `total_comments`
FROM `news` n
LEFT JOIN
( SELECT Code, COUNT(*) AS TotalLikes
FROM `likes`
WHERE `type` = "newspost"
GROUP BY Code
) AS l
ON l.`code` = n.`id`
LEFT JOIN
( SELECT post_id, COUNT(*) AS TotalComments
FROM `news_comments`
GROUP BY post_id
) AS c
ON c.`post_id` = n.`id`
ORDER BY n.`id` DESC LIMIT 5;
The reason is that when you use a join as above, MySQL will materialise the results of the subquery when it is first needed. E.g. at the start of this query, MySQL will put the results of:
SELECT post_id, COUNT(*) AS TotalComments
FROM `news_comments`
GROUP BY post_id
into an in-memory table and hash post_id for faster lookups. Then, for each row in news, it only has to look up TotalComments in this hashed table. When you use a correlated subquery, it will execute the query once for each row in news, which results in a large number of executions when news is large. If the initial result set is small, you may not see a performance benefit, and it may even be worse.
Examples on SQL Fiddle
Finally, you may want to index the relevant fields in news_comments and likes. For this particular query I think the following indexes will help:
CREATE INDEX IX_Likes_Code_Type ON Likes (Code, Type);
CREATE INDEX IX_newcomments_post_id ON news_comments (post_id);
Although you may need to split the first index into two:
CREATE INDEX IX_Likes_Code ON Likes (Code);
CREATE INDEX IX_Likes_Type ON Likes (Type);
First check for helpful indexes on the columns id, post_id, and (type, code).
I assume this is T-SQL, as that is what I am most familiar with.
First I would check indexes. If those look good, then I'd check the statement. Take a look at your query plan to see how it's populating your result.
The order in which you write the AND conditions does not fix the order of evaluation; the optimizer picks it. Here it has to find the rows matching code and type, and then give you a count.
Right now, you're grabbing everything with certain codes, regardless of date. When you stated that you want the latest, I assume there is a date column somewhere.
In order to speed things up, add another AND to your WHERE and account for the date. Either last 24 hours, last week, whatever.
I have a high score (top scores) system, which calculates positions by each player's experience.
But now I need to use the player's rank in other places too, maybe in more places on the website, like personal high scores, where it will show the player's rank in that skill.
Therefore just looping and incrementing a counter (rank++) in the loop won't really work, because I need to save that rank for other places.
What I could do is loop through all players and then send a query to update each player's rank, but what if I have 1000 players, or more? That means 1000 queries per load.
I wondered whether there is a SQL query that can do the same thing in one or two queries.
How can I do this? I calculate ranks by ordering by the player's experience, so my table structure looks like this:
Tables:
Players
id (auto_increment) integer(255)
displayname varchar(255) unique
rank integer(255) default null
experience bigint(255)
This should give you the rank for user with id = 1. If you want every player, just remove the WHERE clause:
SELECT a.id, a.displayname, a.rank, a.experience
FROM (
SELECT id, displayname, @rank:=@rank+1 AS rank, experience
FROM players, (SELECT @rank:=0) tmp
ORDER BY experience DESC) a
WHERE a.id = 1
I wouldn't have rank in the players table directly, since this would mean that you would have to recalculate it every time a user changes experience. You could do this query anytime you want to get the rank for a player or for a leaderboard.
If you still want to update it, You can do an INNER JOIN with this query to UPDATE the original table with the rank from this query.
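If the rank does have to be stored, the whole table can be refreshed in a single statement instead of one query per player. The following is a minimal sketch using Python's built-in sqlite3 module, with a cut-down players table and made-up rows. It defines a player's rank as 1 plus the number of players with strictly more experience, so tied players share a rank. Note the assumption about engines: SQLite accepts a subquery on the table being updated, whereas MySQL rejects that form (error 1093), so on MySQL the UPDATE would have to join against the derived ranking query as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE players ('
    'id INTEGER PRIMARY KEY, displayname TEXT UNIQUE, "rank" INTEGER, experience INTEGER)'
)
conn.executemany(
    "INSERT INTO players (displayname, experience) VALUES (?, ?)",
    [("ann", 500), ("bob", 900), ("cy", 500), ("dee", 100)],
)

# One statement refreshes every stored rank:
# rank = 1 + number of players with strictly more experience.
conn.execute(
    'UPDATE players SET "rank" = ('
    "SELECT COUNT(*) + 1 FROM players AS better "
    "WHERE better.experience > players.experience)"
)

stored = conn.execute(
    'SELECT displayname, "rank" FROM players ORDER BY id'
).fetchall()
print(stored)
```

Such a refresh could run after experience changes or on a schedule; which of the two fits depends on how fresh the stored rank needs to be.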
I have a table with the fields id, votes (for each user), and rating.
Task: compute each user's rating based on the votes for him and for the others. That is, each time I update the votes field, the rating field needs to be recalculated.
For example, someone can be in 3rd place; after a vote for him he moves up to 2nd place, and the other user goes the opposite way, from 2nd to 3rd (in the rating field).
How do I solve this? Recalculating the users' ratings in PHP on every update and running a lot of UPDATE queries in MySQL is very expensive.
If you want to get the ratings with a select without having a rating column, then this is the way. However from a performance perspective I cannot guarantee this will be your best option. The way it works is that if two users have the same amount of votes they will have the same rating and then it will skip ahead the necessary number for the next different rating:
set @rating:=0;
set @count:=1;
select id,
case when @votes<>votes then @rating:=@rating+@count
else @rating end as rating,
case when @votes=votes then @count:=@count+1
else @count:=1 end as count,
@votes:=votes as votes
from t1
order by votes desc
sqlfiddle
This gives you an extra column which you can ignore, or you could wrap this select into a subquery and have:
select t2.id,t2.votes,t2.rating from (
select id,
case when @votes<>votes then @rating:=@rating+@count
else @rating end as rating,
case when @votes=votes then @count:=@count+1
else @count:=1 end as count,
@votes:=votes as votes
from t1
order by votes desc) as t2
but the sqlfiddle is strangely giving inconsistent results so you'd have to do some testing. If anyone knows why this is I'd be interested in knowing the reason.
If you want to get the rating for just one user then doing the subquery option and using a where after the from should give you the desired result. sqlfiddle - but again, inconsistent results, run it a few times and sometimes it gives rating as 10 other times as 30. I think testing in your db to see what happens will be best.
Well it depends on a lot of factors
Do you have a large system that is growing exponentially?
Do you require the voting data for historical reporting?
Do users need to register when they vote?
Will this system be used only for one voting type throughout the system life cycle or will more voting on different subjects take place?
If all of the answers are NO then your current update method will work just fine. Just ensure that you apply best coding and MySQL table practices anyway.
Let's assume most or all of your answers were YES; then I would suggest the following:
Every time a vote takes place INSERT the record into your table
With the INSERT, add a timestamp and a user id or, if that is not possible, maybe an IP address/location
Assign a subject id as foreign key from the vote_subject table. In this table store the subject and date of voting
Now you can create a SELECT statement that can count the votes and calculate the ratings. The person top of the vote count list will get rating 1 in the SELECT. Furthermore you can filter per subject, per day, per user and you should also be able to determine volume depending on the result required.
All this is of course dependent on how your system will scale in future. This might be way overkill but something to think about.
Yes, aggregations are expensive. You could update a rank table every five minutes or so and query from there. The query, as you probably already know, is this:
select id, count(*) as votes
from users
group by id
order by votes desc
Instead of having the fields id, votes and rating, alter the table to have the fields id, rating_sum and rating_count. Each time you have a new rating, you query the database like this:
"UPDATE `ratings` SET `rating_count` = `rating_count` + 1, `rating_sum` = `rating_sum`+ $user_rating WHERE `id` = $id"
Now the rating is just the average -> rating_sum / rating_count. No need to have a field with the rating.
Also, to prevent a user from rating more than once, you could create a table named rating_users that has 2 foreign keys, users.id and ratings.id. The primary key will be (users.id, ratings.id). So each time a user tries to rate, you first check this table.
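A rough sketch of how the two pieces could fit together, using Python's built-in sqlite3 module so that it runs without a server. The column names user_id and rating_id and the helper rate are assumptions made for this example (the users table is left out), and the composite primary key is what rejects a second rating from the same user. The values are bound as parameters rather than interpolated into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ratings (
    id INTEGER PRIMARY KEY,
    rating_sum INTEGER NOT NULL DEFAULT 0,
    rating_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE rating_users (
    user_id INTEGER NOT NULL,
    rating_id INTEGER NOT NULL REFERENCES ratings(id),
    PRIMARY KEY (user_id, rating_id)
);
INSERT INTO ratings (id) VALUES (1);
""")

def rate(conn, user_id, rating_id, value):
    """Record one rating; return False if this user already rated this item."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "INSERT INTO rating_users (user_id, rating_id) VALUES (?, ?)",
                (user_id, rating_id),
            )
            conn.execute(
                "UPDATE ratings SET rating_count = rating_count + 1, "
                "rating_sum = rating_sum + ? WHERE id = ?",
                (value, rating_id),
            )
    except sqlite3.IntegrityError:
        return False  # the primary key rejected a duplicate (user, item) pair
    return True

first = rate(conn, 7, 1, 4)
repeat = rate(conn, 7, 1, 5)   # same user again, should be refused
second = rate(conn, 8, 1, 2)
average = conn.execute(
    "SELECT rating_sum * 1.0 / rating_count FROM ratings WHERE id = 1"
).fetchone()[0]
print(first, repeat, second, average)
```

Letting the primary key reject the duplicate, instead of running a SELECT first, avoids a race between the check and the insert.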
I would recommend doing this when querying the data. It would be much simpler. Order by votes descending.
Perhaps create a view and use the view when querying the data.
You could try something like this:
SET @rank := 0;
select id, count(*) as votes, @rank := @rank + 1
from users
group by id
order by votes desc
Or
SET @rank := 0;
select id, votes, @rank := @rank + 1
from users
order by votes desc