I am making a ranking app and am getting a user's position in the ranking this way:
$sql = "SELECT fk_player_id FROM ".$prefix."_publicpoints
WHERE date BETWEEN '2013-01-01' AND '2013-12-31'
GROUP BY fk_player_id
HAVING SUM(points) > 235";
This works as it should, but it has one downfall: the query can get quite heavy if I have a ranking with 500,000 users, because it has to run through all the users who have more than 235 points. Let's say that 235 points gives a position of #345,879. That's a lot of rows... How can I do this in a better way, at least when I call the DB?
Hoping for help, and thanks in advance :-)
Three possible solutions, which you may or may not combine depending on the situation:
Add indices to the ranking columns
pre-compute the ranks only when it changes
pre-compute the ranks with a cron job - it should not matter if it is 10 minutes late (see the sketch after this list)
If it is a generic ranking page, you can also pre-render the page with a template engine and cache it
you may be able to optimize your MySQL performance as well, either with more RAM or by configuring query caching and temp tables
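For illustration, a minimal sketch of both the direct approach and the pre-computed one. The position itself only needs a COUNT, so the 345,879 rows never have to travel to PHP; here nuke_publicpoints stands in for your real ".$prefix."_publicpoints table, and everything except fk_player_id and points is an invented name:

SELECT COUNT(*) + 1 AS position
FROM (
    SELECT fk_player_id
    FROM nuke_publicpoints
    WHERE date BETWEEN '2013-01-01' AND '2013-12-31'
    GROUP BY fk_player_id
    HAVING SUM(points) > 235
) AS better_players;

And a rank table that a cron job rebuilds, so position lookups become indexed point queries:

CREATE TABLE player_ranks (
    rank_pos     INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    fk_player_id INT NOT NULL,
    total_points INT NOT NULL,
    UNIQUE KEY idx_player (fk_player_id)
);

-- cron job: TRUNCATE resets the AUTO_INCREMENT counter, and the ordered
-- INSERT then assigns rank 1 to the highest total
TRUNCATE TABLE player_ranks;
INSERT INTO player_ranks (fk_player_id, total_points)
SELECT fk_player_id, SUM(points)
FROM nuke_publicpoints
WHERE date BETWEEN '2013-01-01' AND '2013-12-31'
GROUP BY fk_player_id
ORDER BY SUM(points) DESC;

SELECT rank_pos FROM player_ranks WHERE fk_player_id = 42;  -- one player's position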
Related
I'm using PHP 7, MySQL, and a small custom-built forum, with a query that grabs 7 columns via 2 SQL JOINs for a "latest posts" page. When the time comes that I hit 1 million rows, will the LIMIT 30 stop at 30 rows, or will it have to sort the entire DB on each run?
The reason I'm asking is that I'm trying to wrap my head around how to paginate this custom forum I've built, and whether that pagination will be "ok" once it has to (theoretically) read through a million rows.
EDIT: My current query is a LIMIT 30, sorted DESC.
EDIT 2: Currently I'm getting about 500-600 posts a day, give or take 50. It's quickly adding up, so I'm trying to monitor this before I get to 1 million. That being said, I'm only looking up one table right now, tblTopics, with topic_id, topic_name, and topic_author (an FK). Then I'm doing another lookup after that using the topic's own foreign keys, topic_rating and topic_category. The original lookup is where I have the sort and limit.
Sort is applied to the complete set, and limit is applied after the sort, so adding a LIMIT to an ORDER BY query does not make it a lot faster.
It depends.
SELECT ... FROM tbl ORDER BY x LIMIT 30;
INDEX(x)
will probably use the index and stop after 30 rows, not 1 million.
SELECT ... FROM tbl GROUP BY zz ORDER BY x LIMIT 30;
will scan all million rows, do the grouping, write to a tmp table, sort that tmp table, and only then deliver 30 rows.
SELECT ... FROM tbl WHERE yy = 123 ORDER BY x LIMIT 30;
INDEX(yy)
will probably prefer INDEX(yy), and it is hard to say how efficient it will be.
SELECT ... FROM tbl WHERE yy = 123 ORDER BY x LIMIT 30;
INDEX(yy, x)
will be very efficient -- not only can it use the index for filtering, but also for the ORDER BY and the LIMIT. Only 30 rows will be touched.
SELECT ... FROM tbl LIMIT 30;
is of dubious use. You will get some 30 rows, but who knows which 30? But it will be fast.
Well, this is still not answering your question. Your question involves a JOIN. Can you guess how much more complex the question becomes with a JOIN involved?
If you would like to discuss your specific query, please provide the query and SHOW CREATE TABLE for each table and how many rows in each table.
If you are joining a 1-row table to a million row table, the 1-row table probably does not add any complexity.
If you are joining two million-row tables together without any indexes, then you are looking at a trillion intermediate 'rows' to work with!
Oh, and then you will want the 'second' 30 rows? That adds another dimension of complexity. I could spend a few more paragraphs on what can go wrong with OFFSET.
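To sketch just one of those OFFSET pitfalls and its usual fix: LIMIT 30 OFFSET 60000 still reads and discards 60,000 rows. "Keyset" pagination instead remembers the last x value delivered, so each page is an indexed range scan. Reusing the tbl/x placeholders above, with 1234 standing in for the last value of the previous page:

SELECT ... FROM tbl
WHERE x > 1234      -- the last x value shown on the previous page
ORDER BY x
LIMIT 30;

With INDEX(x) this touches about 30 rows per page, no matter how deep the user has paged.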
If this forum is somewhat open-ended, where anyone can post "topics" and be the originating author, you probably want at a minimum a topics table with a PK ID, name, and author as you have, but also the date added, the most recent post date, and a count of posts against it. Too many times people build web sites that want counters all over the place and try to compute aggregates or find the most recent post on the fly. Speaking of the most recent post: hold the ID of the most recent post too, so you don't have to find the max date and then join based on that.
Then a secondary table would hold the details associated with a given post.
Then, via a trigger on your detail table for whatever you are posting against, you can update the parent topic row: increment the count, stamp the most recent date with now, and set the last ID to the ID of the newest record just created.
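A minimal sketch of such a trigger, reusing tblTopics and topic_id from the question; the posts table (tblPosts) and the counter columns are assumed names:

DELIMITER $$
CREATE TRIGGER trg_posts_after_insert
AFTER INSERT ON tblPosts
FOR EACH ROW
BEGIN
    -- keep the parent topic's counters current on every new post
    UPDATE tblTopics
    SET post_count     = post_count + 1,
        last_post_date = NOW(),
        last_post_id   = NEW.post_id
    WHERE topic_id = NEW.topic_id;
END$$
DELIMITER ;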
So now, joining to get that most recent context entry is a simple join and not overly complex.
Put an index on your topics table's most recent post date, so you are now getting, e.g., the most recent 30 topics, not necessarily the most recent 30 posts (where 3 busy topics might account for all 30). Get 30 distinct topics, then let the user see the details as they select the topic of interest. Your top-level query never goes against the underlying details.
Obviously I'm brief on the true context of your website, but hopefully these suggestions make sense for you to run with.
I'm working on a management system for a small library. I proposed that they replace the Excel spreadsheet they are using now with something more robust and professional like PhpMyBibli - https://en.wikipedia.org/wiki/PhpMyBibli - but they are scared by the amount of fields to fill in, and the interfaces are not fully translated into Italian.
So I made a very trivial DB, with basically a table for the authors and a table for the books. The authors table exists because I'm tired of having to explain that "Gabriele D'Annunzio" != "Gabriele d'Annunzio" != "Dannunzio G." and so on.
My test tables are now populated with ~ 100k books and ~ 3k authors, both with plausible random text, to check the scripts under pressure.
For the public consultation I want to make an interface like that of Gallica, the website of the Bibliothèque nationale de France, which I find pretty useful. A sample can be seen here: http://gallica.bnf.fr/Search?ArianeWireIndex=index&p=1&lang=EN&f_typedoc=livre&q=Computer&x=0&y=0
The concept is pretty easy: for each menu, e.g. the author one, I generate a fancy <select> field with all the names retrieved from the DB, and this works smoothly.
The issue arises when I try to add, beside every author name, the number of their books, as Gallica does, in this way (warning - conceptual code, not actual PHP):
SELECT id, surname, name FROM authors
foreach row {
    SELECT COUNT(*) AS num FROM books WHERE id_auth = id
    echo "<option>$surname, $name ($num)</option>";
}
With the code above, a core of the CPU jumps to 100% and no results are shown in the browser. Not surprising, since that's 3k queries against a 100k-row table in a very short time.
Just to try, I added a LIMIT 100 to the first query (on the authors table). The page then required 3 seconds to be generated, and 15 seconds when I raised the LIMIT to 500 (seems like a linear increase). But of course I can't show library users a reduced list of authors.
I don't know which hardware/software is used by Gallica to achieve their results, but I bet their budget is far above that of a small village library using 2nd hand computers.
Do you think that adding a "number_of_books" field to the authors table, updated every time a new book is inserted, could be a practical solution, rather than browsing the whole list at every request?
BTW, a similar procedure must be done for the publication date, the language, the theme, and some other fields, so the query time will be hit again, even if the other tables are a lot smaller than the authors one.
Your query style is very inefficient - try a JOIN and GROUP BY structure instead:
SELECT
    authors.id,
    authors.surname,
    authors.name,
    COUNT(books.id) AS numbooks
FROM authors
INNER JOIN books ON books.id_auth = authors.id
GROUP BY authors.id
ORDER BY numbooks DESC
;
EDIT
Just to clear up some things I did not explicitly say:
Of course you no longer need a query in the PHP loop, just the displaying portion
Indices on books.id_auth and authors.id (the latter primary or unique) are assumed - see the sketch below
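If they are not already in place, the assumed indices would look like this:

ALTER TABLE books ADD INDEX idx_books_id_auth (id_auth);
-- authors.id is normally the PRIMARY KEY already and needs nothing extra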
EDIT 2
As @GordonLinoff pointed out, the IFNULL() is redundant in an inner join, so I removed it.
To get all themes, even those without any books, just use a LEFT JOIN (this time including the IFNULL(), in case your provider's MySQL is old):
SELECT
    themes.id,
    themes.main,
    themes.sub,
    IFNULL(COUNT(books.theme), 0) AS num
FROM themes
LEFT JOIN books ON books.theme = themes.id
GROUP BY themes.id
;
EDIT 3
Of course a stored value will give you the best performance - but this denormalization comes at a cost: your database now has the potential to become inconsistent in a user-visible way.
If you do go with this method, I strongly recommend you use triggers to auto-fill this field (and of course those triggers must sit on the books table).
Be prepared to see slowed-down inserts - this might of course be okay, as I guess you will see a much higher rate of SELECTs than INSERTs.
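A minimal sketch of those triggers, using the number_of_books column proposed in the question (a delete trigger is included, since removed books must decrement the count too):

DELIMITER $$
CREATE TRIGGER trg_books_after_insert
AFTER INSERT ON books
FOR EACH ROW
    UPDATE authors SET number_of_books = number_of_books + 1
    WHERE id = NEW.id_auth$$

CREATE TRIGGER trg_books_after_delete
AFTER DELETE ON books
FOR EACH ROW
    UPDATE authors SET number_of_books = number_of_books - 1
    WHERE id = OLD.id_auth$$
DELIMITER ;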
After reading a lot about how the JOIN statement works, with the help of useful answer 1 and useful answer 2, I realised I had used it some 15 or 20 years ago, then forgot about it since I never needed it again.
I made a test using the options I had:
reply with the JOIN query with IFNULL(): 0.5 seconds
reply with the JOIN query without IFNULL(): 0.5 seconds
reply using a stored value: 0.4 seconds
That DB will run on some single-core old iron, so I think a 20% difference could be significant, and I decided to use stored values, updating the count every time a new book is inserted (i.e. not often).
Anyway, thanks a lot for having refreshed my memory: JOIN queries will be useful elsewhere in my DB.
update
I used the JOIN method above to query the book themes, which are stored in a far smaller table, in this way:
SELECT themes.id, themes.main, themes.sub, COUNT(books.theme) AS num
FROM themes
JOIN books ON books.theme = themes.id
GROUP BY themes.id
ORDER BY themes.main ASC, themes.sub ASC
It works fine, but for themes which are not in the books table I obviously don't get a 0 count, so I don't have lines like Contemporary Poetry - Etruscan (0) to show as disabled options for the sake of list completeness.
Is there a way to get my theme.main and theme.sub back?
I want to know what's the best way to get the last 10 new entries from a database (MySQL). Sure, at the moment I'm using:
(SELECT whatever FROM whatever ORDER BY id (or whatever) DESC LIMIT 0,10)
But what happens if you have hundreds or thousands of entries? Does MySQL still select and just "read" only the last ten entries - without losing speed and time crawling through all the other entries?
For my purpose I'll always just need the last 10-20 entries from the database; the rest, the older ones, are more for archive stuff. Every entry/record has an auto-increment ID, which I use with ORDER BY and SELECT to fetch my entries (using PHP ~ PDO and prepared statements), and I love minimal solutions that don't require a lot of resources.
Good enough or are there better ways?
Thanks for your thoughts and explanations! :)
Your solution will always work fast no matter the size of the database, provided you have an index on the relevant column (in this case id, which I assume is a primary key).
The reason is that indexes are stored as B-trees with low height and are therefore extremely fast to search. I recommend this website as background reading: http://use-the-index-luke.com/sql/anatomy/the-tree
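You can confirm this on your own table with EXPLAIN, reusing the placeholder names from the question; assuming id is the primary key, the plan should show key = PRIMARY and only a handful of examined rows:

EXPLAIN SELECT whatever FROM whatever ORDER BY id DESC LIMIT 0, 10;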
I have a result set from MYSQL which I'm displaying in a paging scenario using PHP. (Prev/Next links)
Many result rows may have "child" rows associated with them, i.e. they share a column containing the same "root number".
Due to the paging and limit arguments in my query, those groups of rows with common root numbers can be split between pages, which makes the display awkward.
I need the query to take that root number column into consideration and NOT split those child rows across to a second page. Instead, it should go ahead and include all of the rows sharing that root number on the same page together. In my mind, to achieve this, the query would take the root number into account and adjust the LIMIT upwards if the last row in the select has other rows with the same root number.
Seems like the offset value could also be exploited to achieve the desired result, but I'm not sure how I might do that on the fly.
Does anyone have thoughts on how to accomplish this?
SELECT * FROM (`tablename`) LIMIT 3600, 100
Example data:
id name rootnumber
-------------------------------------------------
1 Joe 789
2 Susan 789
3 Bill 789
4 Peter 123
Pagination with LIMIT has several problems. Normally you count the complete result set, which takes almost as much work for the MySQL server as retrieving the whole set; and as soon as you have LIMIT 2000, 50, you put as much work on the server as retrieving the first 2050 rows and throwing the first 2000 away. The third problem is that there is no other solution as easy as LIMIT. ;-)
So, you could try different things:
Send bigger data packets of many pages to the client and do the pagination in HTML/JavaScript/CSS; just fetch a new packet when the user comes to the last of those pages. There you can work with the trick of fetching one row more than needed, so you can see whether that extra row has the same rootnumber as the last one (so you discard that rootnumber completely) or a new rootnumber (so the last rootnumber was completely read).
Give the user better search parameters - no user really reads through 250 lines completely; the user normally just searches for a certain date, a certain keyword, or some property of the root. As soon as the user 'paginates' through months or weeks, she has a clue what time period she is in. This does have the problem of sometimes very different 'page' sizes, but you could fix that in the client.
The MySQL server is very happy to do searches like where date between '2013-12-01' and '2014-01-01' or where color='blue' and customer.sex='f'; there it can work its magic with indices. Much better than that limit 2000, 50.
This is work, and it is not easy, but if you are good you can find better solutions for the customer, who does not really like to read all the lines in between anyway.
EDIT:
There are technical solutions for this, too. Since you show entries together when they share a root number, you are presumably sorting by it. So run an inner query first (we do hope your MySQL server likes subqueries) that fetches only the root numbers:
select t.* from tablename as t
inner join
(
    select distinct rootnumber from tablename
    order by rootnumber  # put your real sort order here
    limit 3600, 50
) as mt on mt.rootnumber = t.rootnumber;
As soon as your MySQL server version uses indices on where in (subquery) (try EXPLAIN), you can also use the nicer version - though beware that many MySQL versions refuse LIMIT inside an IN subquery outright, so test it first:
/* TRY EXPLAIN AND BEWARE OF A FULL TABLE SCAN! */
select t.* from table_name as t where t.rootnumber in
    ( select distinct rootnumber from table_name order by rootnumber limit 3600, 50 )
;
But right now that might be really slow.
But: try to provide search parameters to reduce the table walking to an absolute minimum!
Much Fun!
Trying not to reinvent the wheel here, so I thought I'd ask you guys:
There is an existing database for a computer game that records the map name, the time it took to finish the map, the difficulty level, and the ID of the player. This database is used to record the best finish times for each player, so a player can type a certain command and see the best finish times for a particular map.
Now I would like to create a ranking system that rewards players with points for finishing maps, based on the difficulty level - e.g. completing a map on easy rewards 1 point, medium 2 points, etc. This ranking system will show the top players with the most points.
My question is, would it be better to use the current database and use PHP to accomplish the new ranking system
or
create a new database to accomplish it?
In either case, a simple logic example would be appreciated.
I think it is best to just use your already existing database. What do you mean by a logic example?
Could you try this:
SELECT user_id, SUM(difficulty) AS total
FROM `map`
GROUP BY user_id
ORDER BY total DESC
LIMIT 10
difficulty is a table field, which is 1 for easy, 2 for medium, etc., so summing it per user gives the total points.
As Kyle said, use your current database/table and let PHP/SQL do the work. I would do something like:
SELECT player,
       map,
       COUNT(*) AS completions
FROM mytable
GROUP BY player, map
That should give you a count of each player's completions per map. Test this though. After you get the counts, you can loop through them and, based on each map's difficulty, multiply by the points awarded - or fold that step into SQL, as sketched below.
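If you would rather fold that multiplication into SQL, here is a sketch; the difficulty column and its point values are assumptions (and if difficulty is already stored as the numeric point value, a plain SUM(difficulty) is enough):

SELECT player,
       SUM(CASE difficulty
             WHEN 'easy'   THEN 1
             WHEN 'medium' THEN 2
             WHEN 'hard'   THEN 3
             ELSE 0
           END) AS points
FROM mytable
GROUP BY player
ORDER BY points DESC
LIMIT 10;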